Ranking of binding and nonbinding peptides to MHC class I molecules using inverse folding approach: implications for vaccine design.

T cell recognition of the peptide-MHC complex initiates a cascade of immunological events necessary for immune responses. Accurate T-cell epitope prediction is an important part of vaccine design. Development of predictive algorithms based on sequence profiles requires a very large amount of experimental data on peptide binding to major histocompatibility complex (MHC) molecules. Here we used an inverse folding approach to study the peptide specificity of MHC class I molecules, with the aim of better differentiating between binding and nonbinding sequences. Overlapping peptides spanning the entire protein sequence are threaded through the backbone coordinates of a known peptide fold in the MHC groove, and their interaction energies are evaluated using statistical pairwise contact potentials. We used the Miyazawa & Jernigan and Betancourt & Thirumalai tables of pairwise contact potentials and two distance criteria (nearest atom < 4.0 Å and C-beta < 7.0 Å), and ranked the peptides in ascending order of their energy values; in most cases, known antigenic peptides are highly ranked. The predictions from threading improved when multiple templates and an average scoring scheme were used. In general, when structural information about a protein-peptide complex is available, the current application of the threading approach can be used to screen a large library of peptides and select the best binders to the target protein. The proposed scheme may significantly reduce the number of peptides that need to be tested in the wet laboratory for epitope-based vaccine design.


Background:
Development of epitope-based vaccines critically requires identification of the regions in non-self and mutated proteins that are recognized by cytotoxic T lymphocytes (CTLs). The recognition of such regions by CTLs is a multistep process in which binding of peptides to MHC class I molecules is an important step, followed by transport of the peptide-MHC complex to the surface of the antigen-presenting cell [1]. Much information has accumulated regarding the specific binding of peptides to MHC class I molecules. A number of computational methods have been developed for the prediction of MHC-binding peptides; according to the data and computational approach they apply, they are either sequence based or structure based. Sequence-based approaches, which include motifs, quantitative matrices and machine learning models, have been successfully applied in the discovery of novel T-cell epitopes involved in cancer immunity [2, 3]. Although sequence-based approaches are well established, they require large sets of experimentally tested peptides and are not feasible where insufficient experimental binding data are available [4, 5]. The availability of crystallographically solved MHC-peptide complexes provides an opportunity for the inverse folding (threading) approach, which does not rely on previous binding data but uses structural considerations to account for the contributions of the individual amino acids along the peptide that allow it to fit into the groove of the MHC allele [6, 7, 8].
In this paper, an approach developed to address the inverse protein folding problem is applied to the prediction of potential binding peptides for a specific MHC molecule, and their interaction energies [9] are evaluated using the statistical pairwise contact potentials MJ [10] and BT [11]. The number of conformations the peptide can adopt in the binding groove is limited and is defined by the peptide-MHC structure, which imposes physical constraints on the peptide [12]. Residues were considered to be in contact or not according to two different distance criteria [6, 13]. We also investigated whether using multiple template structures and taking the average improves the predictions. From these analyses, we found that using the BT potential with the criterion that any two atoms are closer than 4 Å, and taking multiple peptide conformations into consideration, improves the ability of the threading procedure to discriminate between binding and nonbinding peptides. Hence, the compatibility of the peptide sequence with the space in the binding groove has an important role in molecular recognition, which implies that the peptide conformation should be taken into consideration to improve the predictions of threading methods.

Methodology: Template structures
The available data in the PDB are redundant; we therefore created a non-redundant set by selecting, among related structural complexes with identical sequence information, the entries with the best resolution [14]. The non-redundant dataset consists of fifty four class I MHC-peptide complexes (Table 1 in supplementary material). All the complexes chosen for the study were characterized using the IMGT/3Dstructure-DB Structural Query tool [15] and the MHC-Peptide Interaction Database (MPID) [16], including eleven 8-mer peptide-H2-Kb, seventeen 9-mer peptide-HLA-A*0201, twelve 9-mer peptide-H2-Db and four 10-mer peptide-HLA-A*0201 complexes. The MHC nonbinding peptide dataset for the selected alleles was retrieved from the AntiJen database [17] and covers a wide range of IC50 values, from 5000 to 440000 (Table 2 in supplementary material). The interface of the peptide-MHC complexes is described using two parameters, Interface Area and Gap Index [18]. The interface area for class I MHC-peptide complexes was defined as the change in solvent accessible surface area (delta ASA) on going from a monomeric MHC molecule to a dimeric MHC-peptide complex, whereas the gap index is used as a means of evaluating the complementarity of the interacting surfaces. The gap index is calculated using the formula Gap Index = Gap Volume / delta ASA.
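The gap index formula above can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; in practice the surface areas and gap volume would come from a structure analysis tool, which is not specified here.

```python
def delta_asa(asa_free_mhc: float, asa_mhc_in_complex: float) -> float:
    """Change in solvent accessible surface area (in square angstroms)
    of the MHC molecule on going from the free to the peptide-bound state."""
    return asa_free_mhc - asa_mhc_in_complex


def gap_index(gap_volume: float, d_asa: float) -> float:
    """Gap Index = Gap Volume / delta ASA (result in angstroms)."""
    if d_asa <= 0:
        raise ValueError("delta ASA must be positive for a real interface")
    return gap_volume / d_asa


# Illustrative, made-up inputs (not taken from any structure in this study).
example_delta_asa = delta_asa(12900.0, 12000.0)       # 900.0
example_gap_index = gap_index(1800.0, example_delta_asa)
print(example_gap_index)
```

Under this definition, a smaller gap index corresponds to less empty volume per unit of buried surface, that is, to more closely complementary surfaces.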

Threading with a contact potential matrix
In this method, the binding affinity of a peptide is predicted from its total energy of interaction with the contacting residues. The contacts of the peptide in the available template co-crystal structure are determined according to two different criteria: 1) β-carbon atoms are closer than 7 Å [6]; and 2) any two atoms are closer than 4 Å [13]. The amino acid sequence of the query peptide is then threaded onto the coordinates of the peptide in the template using the MODPROPEP web server [19]. The contacts are assumed to be conserved, and the total interaction energy is obtained by summing the interaction energy values of the peptide residues using a contact potential matrix. The contacting residues are determined for the conformation in the known structure and are therefore only approximate for the different sequences threaded. Energy values for amino acid-to-amino acid interactions are taken from the tables of statistical pairwise contact potentials derived by MJ and BT [10, 11]. The experimental binding energies are related to binding affinity (IC50) by the expression ∆Gexp = -RT ln(IC50), where R is the gas constant and T the absolute temperature [20]. The predicted contact energies are given in dimensionless units of RT.
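The scoring step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the MODPROPEP implementation: the residue representation, the coordinates, the function names and the TOY_POTENTIAL values are all hypothetical, and the real MJ and BT tables [10, 11] would replace the toy table. Falling back to the alpha carbon when a residue has no beta carbon is also an assumption of this sketch.

```python
import math

# Hypothetical toy energies in units of RT; NOT the published MJ or BT values.
TOY_POTENTIAL = {
    ("A", "W"): -0.5, ("A", "Y"): -0.25,
    ("L", "W"): -1.0, ("L", "Y"): -0.75,
    ("D", "W"): 0.25, ("D", "Y"): 0.5,
}

# Made-up template coordinates (angstroms): a 2-residue peptide and 2 MHC residues.
TEMPLATE_PEPTIDE = [
    {"aa": "A", "atoms": {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)}},
    {"aa": "L", "atoms": {"CA": (3.8, 0.0, 0.0), "CB": (3.8, 1.5, 0.0)}},
]
MHC_RESIDUES = [
    {"aa": "Y", "atoms": {"CA": (0.0, 6.0, 0.0), "CB": (0.0, 5.0, 0.0)}},
    {"aa": "W", "atoms": {"CA": (3.8, 5.0, 0.0), "CB": (3.8, 4.0, 0.0)}},
]


def in_contact(res_a, res_b, criterion):
    """Apply one of the two distance criteria to a residue pair."""
    if criterion == "cbeta":
        # Assumption: use CA when CB is absent (for example, glycine).
        a = res_a["atoms"].get("CB", res_a["atoms"]["CA"])
        b = res_b["atoms"].get("CB", res_b["atoms"]["CA"])
        return math.dist(a, b) < 7.0
    if criterion == "nearest":
        return any(math.dist(p, q) < 4.0
                   for p in res_a["atoms"].values()
                   for q in res_b["atoms"].values())
    raise ValueError(f"unknown criterion: {criterion}")


def template_contacts(peptide, mhc, criterion):
    """Contacts as (peptide position, MHC amino acid), taken from the template."""
    return [(i, m["aa"]) for i, p in enumerate(peptide)
            for m in mhc if in_contact(p, m, criterion)]


def threading_energy(query, contacts, potential):
    """Sum pair energies over the conserved template contacts for a query sequence."""
    return sum(potential[tuple(sorted((query[i], aa)))] for i, aa in contacts)


nearest = template_contacts(TEMPLATE_PEPTIDE, MHC_RESIDUES, "nearest")
cbeta = template_contacts(TEMPLATE_PEPTIDE, MHC_RESIDUES, "cbeta")
for query in ("AL", "AD"):
    print(query, threading_energy(query, nearest, TOY_POTENTIAL),
          threading_energy(query, cbeta, TOY_POTENTIAL))
```

Note that the contact list is computed once from the template and reused for every query sequence, which is what makes the contacts only approximate for sequences other than the template's own peptide.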

Results and discussion:
The peptide sequences in the test dataset (Tables 1 and 2, see supplementary material) were threaded onto the crystal structures of the MHC class I-peptide complexes. Different statistical potential matrices (MJ and BT) were used to obtain an estimate of the binding affinity of the threaded sequences, with the goal of ranking the binding and nonbinding sequences in the selected dataset (see Methodology). We applied the method of Altuvia and colleagues [21] to score and rank the binding affinities of peptides to MHC class I molecules. Tables 3 and 4 (supplementary material) give the ranking of peptides according to the binding affinities predicted by threading with the MJ and BT potentials, using the 1VAC, 1LEG (H2-Kb/8); 1INQ, 1JPG (H2-Db/9); 1HHI, 1AO7 (HLA-A*0201/9); and 1I4F, 2CLR (HLA-A*0201/10) complex structures as templates, with two different distance criteria (nearest atom < 4.0 Å and C-beta < 7.0 Å) to define the contacting residues. Although it is reasonable to use the same distance criterion as in the parameterization of the statistical contact potentials, we applied both distance criteria to enable a direct comparison of the results.
Here, we found that using the nearest atom < 4.0 Å distance criterion to determine the contacting residues gives better predictions than the C-beta < 7.0 Å criterion (Tables 3 and 4 in supplementary material). Surprisingly, although it still ranks high, the template structure's own peptide does not have the highest score, indicating that this force field may not have adequate precision. Overall, nonbinding peptides tend to be ranked lower than binding ones, but it is not possible to differentiate binders from nonbinders using these rankings.
The pairwise potential is used to estimate the binding energies of peptide sequences threaded onto the different structural templates. The MJ pairwise contact potential table places much emphasis on hydrophobic interactions for MHC alleles that contain various pockets of hydrophobic character. Although most peptides are relatively buried within the binding groove of the MHC molecule, one cannot assume that hydrophobic interactions are the main ones that tell binding peptides apart from nonbinding peptides. We therefore used the BT table, which modifies the MJ table by changing the reference state from solvent to a single defined solvent-like molecule, the amino acid threonine; this improved the ranking of the template peptide. However, in some cases (HLA-A*0201-10/1I4F), the template structure's own peptide has a very poor score and is predicted to have a binding affinity even lower than that of nonbinding peptides. The results of threading depend strongly on the template structure used, as a peptide ranks high if its binding mode is similar to that of the template peptide. Hence, using multiple templates should potentially provide a better fit for the binding peptides. This crude force field is therefore not accurate enough to distinguish the subtle differences between the various peptide sequences. For the other sequences, it is not possible to differentiate binding and nonbinding peptides based on energy using a single template; however, some binders have low scores with one template and high scores with the other. Using multiple templates covers more of the conformations accessible in the binding groove that binding sequences can assume. Therefore, taking the average of the results from the two templates improves the results, as seen in Table 5 (supplementary material). The nonbinders are ranked lower than the binders but, once again, the binding and nonbinding peptides are not separated significantly. In another test to justify the use of the threading method, we evaluated its performance by analyzing the rank of the binding peptide among the overlapping peptides derived from its source protein sequence. The BT potentials generally rank the template structure's own peptide high among all possible 8-, 9- and 10-mers in the source protein (Table 6 in supplementary material).
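The two procedures discussed above, averaging the scores from two templates and ranking a peptide among all overlapping peptides of its source protein, can be sketched as follows. The source sequence, the function names and the per-template energies are hypothetical values invented for illustration; they are not results from Tables 5 or 6.

```python
def overlapping_peptides(protein: str, length: int) -> list[str]:
    """All overlapping peptides of a given length spanning the protein sequence."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def average_scores(scores_by_template: list[dict[str, float]]) -> dict[str, float]:
    """Average each peptide's predicted energy over all templates."""
    n = len(scores_by_template)
    return {pep: sum(s[pep] for s in scores_by_template) / n
            for pep in scores_by_template[0]}


def rank_ascending(scores: dict[str, float]) -> list[str]:
    """Order peptides from lowest (most favorable) to highest energy."""
    return sorted(scores, key=scores.get)


def rank_of(peptide: str, scores: dict[str, float]) -> int:
    """1-based rank of a peptide in the ascending energy ordering."""
    return rank_ascending(scores).index(peptide) + 1


# Hypothetical source sequence and made-up energies (units of RT).
peptides = overlapping_peptides("MKTAYIAKQR", 8)
template_1 = dict(zip(peptides, (-4.0, -6.0, -1.0)))
template_2 = dict(zip(peptides, (-7.0, -3.0, -2.0)))
averaged = average_scores([template_1, template_2])

print(rank_ascending(template_1))
print(rank_ascending(template_2))
print(rank_ascending(averaged))
```

In this toy case the top-ranked peptide differs between the two templates, and the averaged score gives a single ordering that depends less on either template alone.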

Conclusion:
A threading methodology employing two different statistical contact potentials (MJ and BT) and two distance criteria (nearest atom < 4.0 Å and C-beta < 7.0 Å) was applied to MHC class I molecules with a test set consisting of both natural binding peptides and nonbinding peptide sequences. The aim was to find which force field gives better predictions for ranking and differentiating between the two groups in the test dataset, and hence to determine which factors are important in peptide recognition by MHC class I molecules. We found that using the BT force field, the nearest atom < 4.0 Å distance criterion and the average of the results from multiple template structures gives better predictions. Nevertheless, we could not obtain results that separate the binders from the nonbinders in the test dataset, even when we used multiple templates. This leads to the idea that shapes, rather than certain amino acids, are recognized by the MHC. Although the MHC also adapts to bind different sequences, the binding groove restricts the conformations accessible to the bound peptide.
The affinity of the peptide is thus affected by how well it can fit into the volume defined by the binding groove. This finding suggests that the "fitness" of a given peptide to the conformations accessible in the bound form is an important determinant of its binding affinity. It also indicates that the force field precisely defines the energy of the peptide when the exact conformation is available. Thus the inverse folding approach is advantageous for MHC alleles that lack binding data but have a solved structure in complex with a peptide or, alternatively, a structural model of the complex based on known structures. In this postgenomic era, the approach is potentially useful for screening a library of potential binding sequences from newly discovered proteins to develop epitope-based vaccines.

Table 2 :
MHC nonbinding peptide dataset used in the study.

Table 3 :
Ranking of MHC binding and nonbinding peptides according to their predicted binding affinity by threading using a scoring matrix (MJ & BT) and two distance criteria (nearest atom < 4.0 Å & C-beta < 7.0 Å). * Structure used as template.

Table 4 :
Ranking of MHC binding and nonbinding peptides according to their predicted binding affinity by threading using a scoring matrix (MJ & BT) and two distance criteria (nearest atom < 4.0 Å & C-beta < 7.0 Å). * Structure used as template.

Table 5 :
Ranking of MHC binding and nonbinding peptides according to their average predicted binding affinity by threading using a scoring matrix (MJ & BT) and two distance criteria (nearest atom < 4.0 Å & C-beta < 7.0 Å). * Structure used as template.

Table 6 :
Ranking of MHC binding peptides within their source protein sequence according to their predicted binding affinity by threading, using their own template, for scoring matrices (MJ & BT) and two distance criteria (nearest atom < 4.0 Å & C-beta < 7.0 Å).