Comparative analysis of epitope predictions: proposed library of putative vaccine candidates for HIV.

Designing a vaccine for a disease is one of the crucial tasks that involve millions and billions of dollars, several decades and yet there is no guarantee of successful results. Several pharmaceutical companies are investing their money and time in such activities. Computational biology could be of great help in these activities by proving a library of plausible candidates that might actually show some positive responses. MHC binding peptide prediction is one such area where the immense power of computers could be used to get a breakthrough. In this direction several databases and servers have been developed by many labs to predict the MHC binding peptides. These short peptides on the antigen surface are recognized by the MHC molecule and are presented to the receptors of T-cells for further immune response. Peptides that bind to a given MHC molecule share sequence similarity. Here we present a comparative study of servers that can predict the MHC binding peptides in a given protein sequence of the antigen. Based on this comparative analysis on HIV data, we are able to propose a library of putative vaccine candidates for the env GP-160 protein of HIV-1.


Background:
With the progression and success rates of several genome projects, we are provided with exponentially increasing number of proteins. This protein sequence data also include vital hidden information of pathogenic response of an organism. Sequences from pathogens provide a huge amount of potential vaccine candidates. The first step in the development of peptide vaccines is the identification of the immunodominant peptides along with proteins sequence. Theoretically every sub-sequence along with the protein could be antigenic. The experimental identification of peptide binding affinity to Major Histocompatibility Complex (MHC) [1] molecules requires a binding assay of each peptide, which is a time consuming and costly process. Therefore, a number of alternative research efforts have been carried out in an attempt to discover the laws of binding peptide sequence patterns. One could perform several computational analyses to screen and develop libraries of such peptides. Such libraries could help the research and development process of several pharmaceutical companies saving them money and time and will also insure less hazardous situation to handle the pathogens.
One of the major roles of immune system is to recognize and destroy cells expressing non-self or mutated proteins. The cascade of immune response begins when the cytotoxic T lymphocytes (CTLs) recognizes the specific antigen and trigger the specific immune response against them [2]. If the antigens aren't recognized properly on the early onset of the infection, might even lead to lethal diseases. MHC molecules play an important role in the immune system. MHC helps T-Lymphocyte cells to identify the antigens and to trigger the immune response. The complex of bound peptide and MHC complex induces the naïve T cells to proliferate and differentiate into armed effector T cells that help to remove the antigens. There is a great diversity in the selectivity of peptides by MHC molecules which makes it difficult for pathogens to escape the immune response. Each different MHC molecule can bind to a set of different peptides [3]. This brings into the consideration, the several supertypes of MHC molecules with different alleles in many supertypes.
The env gene in HIV encodes a single protein, gp160. When gp160 is synthesized in the cell, cellular enzymes add complex carbohydrates and turn it from a protein into a glycoprotein -hence the name gp160 rather than 'p160'. gp160 is found on the outer surface or envelope of the HIV. It is composed of gp120, which protrudes from the envelope, and gp41, which is embedded in the envelope. gp160 is the unit that helps the virus to adhere and to interact with the surface proteins of host. The majority of HIV subunit vaccines are based on the envelope proteins of HIV namely gp120 and gp41, which form the gp160 glycoprotein complex, or on selected epitopes identified within these proteins [4; 5] HLA-A2 is a human leukocyte antigen serotype within HLA-A "A" serotype group. The serotype is determined by the antibody recognition of α 2 subset of HLA-A α-chains. For A2, the alpha "A" chain are encoded by the HLA-A*02 allele group and the β-chain are encoded by B2M locus [6]. The serotype identifies the gene products of many HLA-A*02 alleles, including HLA-A*0201, *0202, *0203, *0206, and *0207 gene products. A2 is the most diverse serotype, showing diversity in Eastern Africa and Southwest Asia. While the frequency of A*0201 in Northern Asia is high, its diversity is limited to A*0201 the less common Asian variants A*0203, A*0206. Due to its diversity, it is an interesting entity to study the antigenantibody interactions.
This study deals with a comparative analysis of epitope and MHC binding peptide prediction on envelope proteins of HIV-1. The potential value of a preventative and cost-effective vaccine stratagem to protect against HIV is inevitable. Based on this analysis a putative vaccine candidate library has been fabricated which focuses mainly on the performance of servers used and their predicted accuracy with their respective parameters. Generated library will be a useful resource in the process of vaccine design for HIV-1 and it will also help in the generation of similar libraries for other pathogens.

Material and methods:
Here in this study, we tested the same sequences over several servers with the similar prediction conditions and compared the results obtained. The protein sequences were fetched from the National Center for Biotechnology Information (NCBI) through their enterz search engine. We considered HIV-1 surface and envelope proteins in our study.
In our study, we compared 8 comprehensive servers (Table 1 see Supplementary material), based on their results and scores they assigned to different peptides predicted. Additional information provided by every server has also been analyzed along with the results. Analysis has been performed on the basis of final score given by the server. Basic description and working principles of servers is given in the text following. , and is able to identify good binders only for MHC molecules with hydrophobic binding pockets. This server returns the peptide "energy score" value. The peptides are ranked according to their energy score (the lower the better).

RankPep [10]
server predicts MHC-I and MHC-II peptide binders from protein sequence or sequence alignments using Position Specific Scoring Matrices (PSSMs) or profiles from set aligned peptides known to bind to a given MHC molecule as the predictor of MHC-peptide binding. In addition, it predicts those MHC-I ligands with a C-terminal end is that likely to be the result of proteasomal cleavage. Profiles basically consist of a table listing the observed sequence-weighted frequency of all amino acids in every column of a sequence alignment. This server includes a selection of 102 and 80 PSSMs for the prediction of peptide binding MHCI and MHCII molecules, respectively.

HLA-BIND [11]
allows users to locate and rank peptides that contain peptide-binding motifs for HLA class I molecules. By default, this server predicts 9-mer peptides using 20-by-9 coefficient matrix for the selected HLA molecule for the scoring. The estimated numerical score for the subsequence in case of HLA-A2 is calculated based upon the half-time of dissociation of complexes containing the peptide at 37 o C at pH 6.5. For other molecules, the estimate is based on the observed anchor residue preferences.

ProPred1 [12]
is a matrix based method that allows the prediction of MHC binding sites in an antigenic sequence for 47 MHC class-I alleles. The matrices used in ProPred1 have been obtained from BIMAS [11] server and matrices described by Toes [13]. ProPred1 also allows the prediction of the standard proteasome and immunoproteasome cleavage sites in an antigenic sequence. It allows filtering of MHC binders, who have cleavage site at C terminus. The most of matrices in Propred1 is multiplication type where the score of each predicted peptide is calculated by multiplying scores of each position.
We carried out the multiple sequence alignment (MSA) of various HIV envelop protein sequences through ClustalX, to select one representative sequence. We selected CAQ63623.1, being the most consistent based on the blocks appeared among all the sequences in their MSA. Resulted and computed epitopes were ranked according to comparative and statistical analysis system. Additionally all the epitopes have been compared with the LANL immunological database collection of T-cells [14]. As the protein is an envelope protein that helps the virus to attach with the host cell, our hypothesis is that, if the CTL can trigger an immune response against the antigen virus targeting the envelop proteins, then virus will not be able to cause any infection.

Results and discussion:
Three-tier screening was performed on the results obtained from various servers. Pre-screening: to collect the top 25 scoring peptides from all the servers. In the second round, out of the top 25 scoring peptides we selected 20 most consistent and common peptide sequences and compared the ranks and their respective positions results from every server against all other servers. This yielded a comprehensive list of 158 peptides (data not shown). Out of the comprehensive list, we selected the peptides with at least 50% occurrence and calculated their average rank ( An interesting aspect of this analysis was extraction of few interesting plausible epitopes while compared with LANL immunological database collection of T-cells [14]. Four epitopic peptide sequences viz. 'WLWYIKIFI', 'NVWATHACV', 'LLDTIAIAV', 'YIKIFIMIV', have been found in LALN database. The most important looks 'WLWYIKIFI', as it has been ranked 2 nd in our analytical system and is for HLA A*0201, A2 and A2.1. Besides that we also propose another plausible epitopic sequence 'TLFNNSWTL', as it has been ranked one in our system and can be verified experimentally and might be a part of immunological databases in future. Other sequences can also included in the immunological databases after successful experimental verification.

Conclusion and Future Prospects:
Design and development of an effective HIV vaccine is exigent because of complex host-virus pathogenesis. From the study carried out and results obtained it is proposed that the given library contains the most putative vaccine peptide candidates for HIV-1. If we can target these peptide candidates, we might be able to have substantial vaccine for the cure of this disease. The putative vaccine candidates obtained might support our proposed hypothesis and will help in hindering the antigenic effect of the virus. Further, similar study can be carried out to design a consolidate library over the whole proteome level which might provide many more putative candidates, increasing the success rate of winning over the disease analysis. This study also proposes a model for carrying analyses on specific data with diversified techniques to extract something attentiongrabbing and decisive.
We would like to extend this work towards the integration of various features of many servers used and would also like to apply more statistical techniques to make results more significant and better. Structural aspects of epitopes will also be incorporated to make results more robust and significant.