Please note that by using this software you agree to the terms of use (please see 'Terms of use' tab).
For performance reasons, the number of input peptides that can be processed on the website is limited to 10000. To predict for larger datasets, please use HLAthena with Terra/FireCloud; an example can be found here.
A dockerized version of HLAthena is also available. To use the docker version:
Please pull the following docker image (assuming you already have docker installed):
docker pull ssarkizova/hlathena-external
Then run:
docker run -v `pwd`:`pwd` -w `pwd` ssarkizova/hlathena-external predict --help
You should see help output with run examples and descriptions of the available options. The parameters are very similar to those on the website and the examples should explain how to build up the command for your use case. More details on the input and output formats are available on the website: Predict -> How to.
If you have questions please reach out at the HLAthena discussion group.
HLAthena is made available for use under the terms of this license.
Third party code acknowledgment
Parameter name | Values | Description |
---|---|---|
alleles | click on one or more alleles from the drop-down list | The list of allele(s) for which predictions will be run |
peptides | Type or paste peptides into the input box or load a file (sample input files provided) | A list or file containing the input peptides that predictions should be computed for. This can be either in a peptide format (i.e. one peptide per line) or in a fasta format. Please see below for more details. |
Peptide column name | e.g. “pep” | The name of the column with peptide sequences |
Context available? | “yes” or “no” | Specifies whether peptide context data exists in the input file. See also the more detailed description of the input file format below |
Expression column name: | e.g. “TPM” | The name of the column containing peptide expression level; or blank if this information is not available. See also the more detailed description of the input file format below |
Log-transform expr? | “yes” or “no” | Whether the values in the expression column (expr_col_name) should be log-transformed. See also the more detailed description of the input file format below |
Aggregate by peptide? | “yes” or “no” | Whether the input should be aggregated when the same peptide sequence appears multiple times. For example, the same peptide sequence can be derived from multiple source proteins and can therefore appear in the input file with different context sequences or different expression values. Setting this parameter to “yes” will aggregate the input on the peptide sequences and provide a single final score for each peptide. String columns are semicolon-concatenated, cleavability (and other numerical columns) is averaged, and expression is summed |
Assign peptides to alleles by: | “ranks” or “scores” | The workflow will provide both log-likelihood presentation scores ([0,1], 1 is good) as well as percentile ranks ([0, 100], 0 is good). If multiple alleles are provided, the workflow will pick the best allele for each peptide. The best allele will be determined either based on “scores” or based on “ranks” depending on the value of this input |
Threshold: | e.g. “0.1” | Set the threshold value for assigning a peptide to an allele. For example, if assign_by_ranks_or_scores is set to “ranks”, assign_threshold can be set to “0.1”, which means that a peptide will be assigned to an allele if it scores within the top 0.1 percentile amongst a large set of background decoys. Alternatively, if assign_by_ranks_or_scores is set to “scores”, assign_threshold can be set to, for example, “0.98”. |
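The assignment logic described in the last two rows can be sketched as follows. This is only an illustration, not the HLAthena implementation: the function name `assign_alleles`, the allele labels, and the inclusive comparison against the threshold are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def assign_alleles(
    values: Dict[str, float],
    assign_by: str,
    threshold: float,
) -> Tuple[str, float, List[str]]:
    """Return (best_allele, best_value, assigned_alleles) for one peptide.

    `values` maps an allele label to its score (assign_by == "scores")
    or to its percentile rank (assign_by == "ranks").
    """
    if assign_by == "ranks":
        # Percentile ranks live in [0, 100] and lower is better.
        best = min(values, key=values.get)
        passing = [a for a, v in values.items() if v <= threshold]
    elif assign_by == "scores":
        # Scores live in [0, 1] and higher is better.
        best = max(values, key=values.get)
        passing = [a for a, v in values.items() if v >= threshold]
    else:
        raise ValueError("assign_by must be 'ranks' or 'scores'")
    # An empty `passing` list corresponds to "no allele satisfies the threshold".
    return best, values[best], passing


# Hypothetical allele labels and values, for illustration only.
print(assign_alleles({"A0201": 0.05, "B0702": 2.3}, "ranks", 0.1))
print(assign_alleles({"A0201": 0.5, "B0702": 0.99}, "scores", 0.98))
```

Note that the best allele is reported even when it does not satisfy the threshold, which is why the two results are returned separately.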
A file containing the input peptides that predictions will be computed for. This can be either in a peptide format (i.e. one peptide per line) or in a fasta format:
peptide format
The peptide format should be tab-delimited and should contain a header line. The file can contain peptides of different lengths; lengths 8, 9, 10, and 11 are supported. If a peptide or context sequence contains non-standard amino acid symbols, the corresponding row will be excluded from the analysis.
Example:
pep | ctex_up | ctex_dn | TPM |
---|---|---|---|
AADIFYSRY | AAAAAAAAAGAGGGGFPHPAAAAAGGNFSV | AANQCRNLMAHPAPLAPGAASAYSSAPGEA | 1.5 |
AADLNLVLY | AAAAAAAAGAGGGGFPHPAAAAAGGNFSVA | ANQCRNLMAHPAPLAPGAASAYSSAPGEAP | 8.3 |
AADLVEALY | AAAAAAAGAGGGGFPHPAAAAAGGNFSVAA | NQCRNLMAHPAPLAPGAASAYSSA------ | 12.5 |
AIDEDVLRY | ---AAAALVSDSFSCGGSPGSSAFSLTSSS | AASSSPFANDYSVFQAPGVSGGSGGGGGGG | 0.5 |
IDLLKEIY | AAAAAALVSDSFSCGGSPGSSAFSLTSSSA | ASSSPFANDYSVFQAPGVSGGSGGGGGGGG | 30.2 |
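To check an input file before uploading it, the row-level rules above can be approximated with a small validator. This is a sketch under stated assumptions: "standard" is taken to mean the 20 canonical one-letter amino acid codes in upper case, the dash is allowed only as context padding, and the function names are hypothetical. How the tool itself treats peptides of unsupported length is not stated here.

```python
# Assumption: "standard" means the 20 canonical amino acid letters.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
SUPPORTED_LENGTHS = {8, 9, 10, 11}


def is_valid_peptide(pep: str) -> bool:
    """True if the peptide has a supported length and only standard symbols."""
    return len(pep) in SUPPORTED_LENGTHS and set(pep) <= STANDARD_AA


def is_valid_context(ctex: str) -> bool:
    """True if a context string holds only standard symbols or '-' padding."""
    return set(ctex) <= STANDARD_AA | {"-"}


print(is_valid_peptide("AADIFYSRY"))   # valid 9-mer
print(is_valid_peptide("AADIFYSRX"))   # 'X' is not a standard symbol
print(is_valid_context("---AAAALVSDSFSCGGSPGSSAFSLTSSS"))
```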
The only required column is the peptide column, which contains the peptide sequences (see peptide_col_name). If only this column is provided, only the MSintrinsic (MSi) predictors will be run. If exists_ctex is set to “true” and the file also contains ctex_up and ctex_dn columns, the MSiC predictor will also be run. The ctex_up and ctex_dn columns should contain the 30 upstream and 30 downstream amino acid residues for each peptide. If the peptide lies near a protein boundary (i.e. the N- or C-terminus) such that fewer than 30 residues are available in the source sequence, ctex_up and ctex_dn should be padded with dashes '-' to a length of 30.
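One way to build the two context columns from a source protein is sketched below. The helper name `context_windows` and the 0-based `start` index are assumptions for illustration. The padding side (left for upstream, right for downstream) follows the example table above.

```python
from typing import Tuple


def context_windows(
    protein: str, start: int, length: int, flank: int = 30
) -> Tuple[str, str]:
    """Return (ctex_up, ctex_dn) for the peptide protein[start:start + length].

    `start` is a 0-based index into `protein`.
    """
    end = start + length
    up = protein[max(0, start - flank):start]
    dn = protein[end:end + flank]
    # Pad towards the protein boundary so both strings are `flank` long.
    return up.rjust(flank, "-"), dn.ljust(flank, "-")


# Toy protein: the peptide sits 3 residues from the N-terminus.
protein = "MKT" + "AADIFYSRY" + "Q" * 35
up, dn = context_windows(protein, start=3, length=9)
print(up)  # 27 dashes followed by MKT
print(dn)  # 30 Q residues
```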
Similarly, if exists_expr is set to “true”, the MSiCE models will be run as well. Expression should be provided as TPM (transcripts per million) values or log(TPM+1) values. The name of the column containing the expression information is specified by the expr_col_name input. These values will be log-transformed if logtransform_expr is set to “true”.
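For reference, the log(TPM+1) transform can be computed as below when preparing a file with pre-transformed values. The base of the logarithm is not stated on this page; the natural logarithm used here is an assumption, and the function name is hypothetical.

```python
import math


def log_transform_tpm(tpm: float) -> float:
    """Return log(TPM + 1); the natural log is an assumption, not confirmed."""
    return math.log1p(tpm)


# Raw TPM values taken from the example table above.
for tpm in (1.5, 8.3, 12.5):
    print(tpm, log_transform_tpm(tpm))
```

If the values in the file are already log-transformed, logtransform_expr should presumably be left off so the transform is not applied twice.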
If the input file contains any other columns they will be carried through to the final predictions output file, with the exception of a few column names that are used internally that may be removed.
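The effect of the “Aggregate by peptide?” option on such a file can be sketched as follows: string columns are joined with semicolons, numerical columns are averaged, and expression is summed. This is an illustration only. The function name and the extra `gene` and `cleav` columns are hypothetical, and whether the tool removes duplicate strings or sums expression before or after log-transformation is not specified here.

```python
from collections import defaultdict
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def aggregate_by_peptide(
    rows: List[Row], pep_col: str = "pep", expr_col: str = "TPM"
) -> List[Row]:
    """Collapse rows that share a peptide sequence into one row each."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[pep_col]].append(row)

    merged_rows: List[Row] = []
    for pep, group in groups.items():
        merged: Row = {pep_col: pep}
        for col in group[0]:
            if col == pep_col:
                continue
            vals = [r[col] for r in group]
            if col == expr_col:
                merged[col] = sum(vals)              # expression is summed
            elif all(isinstance(v, (int, float)) for v in vals):
                merged[col] = sum(vals) / len(vals)  # numeric columns averaged
            else:
                merged[col] = ";".join(str(v) for v in vals)  # strings joined
        merged_rows.append(merged)
    return merged_rows


rows = [
    {"pep": "AADIFYSRY", "gene": "G1", "cleav": 0.2, "TPM": 1.5},
    {"pep": "AADIFYSRY", "gene": "G2", "cleav": 0.4, "TPM": 2.5},
    {"pep": "IDLLKEIY", "gene": "G3", "cleav": 0.9, "TPM": 30.2},
]
for row in aggregate_by_peptide(rows):
    print(row)
```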
fasta format
The fasta format input is assumed to follow standard fasta formatting. In addition, we assume that amino acid sequences are provided (rather than codon sequences). Spaces within the sequence are accepted, as are both upper and lower case amino acid letters.
Example:
>HPV-16 E7
mhgdtptlhe ymldlqpett dlycyeqlnd sseeedeidg pagqaepdra hynivtfcck cdstlrlcvq sthvdirtle dllmgtlgiv cpicsqkp
>KRAS_G12D
MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLL
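A minimal reader that tolerates the spacing and letter case shown in the example might look like this. It is a sketch of the normalization described above (drop whitespace inside sequences, accept either case), not the parser HLAthena uses; the function name is hypothetical, and converting to upper case is a choice made here for convenience.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each fasta header (without '>') to its cleaned sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Remove internal whitespace and normalize to upper case.
            records[name] += "".join(line.split()).upper()
    return records


example = """>HPV-16 E7
mhgdtptlhe ymldlqpett
dlycyeqlnd
>KRAS_G12D
MTEYKLVVVGADGVGKSALTIQLIQ
"""
for header, seq in parse_fasta(example).items():
    print(header, seq)
```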
Column name | Description |
---|---|
model_<allele> | The type of model used for this {allele, length} combination, either allele-and-length specific or pan-allele-pan-length model |
MSi_<allele> | The predicted score for this allele [0,1], 1 is good |
prank.MSi_<allele> | The percentile ranks corresponding to the predicted score for this allele [0,100], 0 is good |
MSiC_<allele>; MSiCE_<allele> | Analogous to the predicted score above, for the MSiC and MSiCE models, if expression and sequence context data are available and the exists_ctex and exists_expr input options were set to true |
prank.MSiC_<allele>; prank.MSiCE_<allele> | Analogous to the percentile ranks above, for the MSiC and MSiCE models |
best.<model> | The best model score across input alleles. model here is one of MSi, MSiC, MSiCE, where the most integrative model available is considered. Note that the best score does not necessarily satisfy the allele assignment threshold |
best.<model>_allele | The allele which corresponds to the best score |
assign.<model>_ranks OR assign.<model>_scores | Depending on the value of *assign_by_ranks_or_scores* one of these two columns will be present in the output file. The numerical values of the score or rank which satisfy the threshold criteria set for peptide-to-allele assignment, or “NA” if none of the input alleles do |
assign.<model>_allele | The allele(s) which satisfy the threshold criteria set for peptide-to-allele assignment, or “unknown” if none of the input alleles do |
Sample data files can be downloaded from the left-hand side menu in both peptide and fasta formats.
If you have further questions regarding HLAthena please reach out on the HLAthena discussion group.
[1] Sarkizova S*, Klaeger S*, Le PM, Li WL, Oliveira G, Keshishian H, Hartigan CR, Zhang W, Braun DA, Ligon KL, Bachireddy P, Zervantonakis IK, Rosenbluth JM, Ouspenskaia T, Law T, Justesen S, Stevens J, Lane WJ, Eisenhaure T, Zhang GL, Clauser KR, Hacohen N#, Carr SA#, Wu CJ#, Keskin DB#. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nature Biotechnology. 2019 Dec 16; doi: 10.1038/s41587-019-0322-9. [Epub ahead of print]
[2] Abelin JG*, Keskin DB*, Sarkizova S*, Hartigan CR, Zhang W, Sidney J, Stevens J, Lane W, Zhang GL, Eisenhaure TM, Clauser KR, Hacohen N#, Rooney MS#, Carr SA#, Wu CJ#. Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction. Immunity. 2017 Feb 21; 46:315-26.