T-cell epitopes predicted from the Nucleocapsid protein of Sin Nombre virus restricted to 30 HLA alleles common to the North American population

Hantavirus cardiopulmonary syndrome (HCPS) in North America is caused by Sin Nombre virus (SNV) and poses a public health problem. We identified T-cell epitopes restricted to HLA alleles commonly seen in the North American population. The nucleocapsid (N) protein is 428 amino acids in length, binds RNA and also functions as a key molecule between virus and host cell processes. The predicted epitopes from the N protein that bind to class I MHC were analyzed for human proteasomal cleavage, TAP efficiency, immunogenicity and antigenicity. We identified 8 epitopes through MHC binding prediction, proteasomal cleavage prediction and TAP efficiency. Epitope VMGVIGFSF had the highest Vaxijen score and epitope TNRAYFITR had the highest immunogenicity score. Epitopes AAVSALETK and TIACGLFPA had 100% homology to many HCPS-causing viruses. Our study focused on T-cell epitope prediction specific to the restricted HLA haplotypes of racial groups in North America for potential vaccine development. Among the candidate epitopes, FLAARCPFL was conserved in SNV, which makes it suitable for a vaccine specific to this virus genotype. Peptide-based vaccines can be designed to include multiple determinants from several hantavirus genotypes, or multiple epitopes from the same genotype. The immune response will thereby focus solely on relevant epitopes, avoiding non-protective responses or immune evasion. Other advantages include the absence of infectious material, unlike in live or attenuated vaccines: there is no risk of reversion or formation of adverse reassortants leading to virulence, and no risk of genetic integration or recombination. Together these form a rationale for vaccine design, including for distinct geographical regions.


Background:
Sin Nombre virus (SNV) belongs to the genus Hantavirus (family Bunyaviridae). Goldsmith et al. [1] documented the virus morphology using electron microscopy and immunoelectron microscopy. SNV is the causative agent of hantavirus cardiopulmonary syndrome (HCPS) in humans and is transmitted by its rodent reservoir, the North American deer mouse (Peromyscus maniculatus). Chizhikov et al. [2] reported the complete genetic characterization of SNV and the exact 5'- and 3'-terminal sequences of the three genomic segments. Remote sensing and geographic information system maps of SNV infections in deer mouse populations have been documented by Boone et al. [3]. The relationship between host density and infection dynamics has also been studied [4]. Terajima and Ennis [5] reported the quantitative measurement of viral RNA in human samples and indicated that antibody-bound and unbound viruses were measurable by quantitative RT-PCR. SNV remains the predominant hantavirus causing HCPS in the United States [6] and Canada [7]. As of January 2016, 659 HCPS cases had been reported in the USA, with a case fatality rate of 36% (http://www.cdc.gov/hantavirus/surveillance/annualcases.html).
Ye et al. [8] reported the presence of high titers of neutralizing antibodies months after recovery. The nucleocapsid (N) protein, coded by the S segment of the virus genome, has been used for diagnosis because of its antigenic properties [9]. The amino and carboxy termini of the N protein are inferred to be involved in trimer formation of the protein [10]. Diagnosis by PCR testing, both virus-specific and pan-hantavirus, has been reported. No specific antiviral treatment is available; patients receive only supportive therapy and blood oxygenation. Minimizing or eliminating contact with rodents helps prevent exposure to the virus and could thereby prevent this condition. Safronetz et al. [11] and Brocato et al. [12] have successfully used animal models to establish persistent infection, in which it may be possible to test antiviral agents and vaccines. Vaccines against SNV for use in avoiding outbreaks are still under development [13].
The other fatal infection caused by hantaviruses is hemorrhagic fever with renal syndrome (HFRS), which is seen predominantly in Asian countries. Increased vascular permeability and leakage in the kidneys and the lungs, respectively, account for the characteristic difference between the diseases caused by the different genotypes. A prophylactic T-cell epitope-based vaccine could induce CTL immunity that protects against viral disease, as in the case of dengue virus [14]. As with vaccine-preventable viral diseases, preexisting T- and B-cell immunity could avert disease. Good CD4 T-cell priming by peptide vaccination could also improve the antibody response during natural infection of immunized individuals; "primed by vaccines, boosted by natural infection" is a good vaccine strategy [15]. Infection can occur in vaccinated individuals without causing disease; in the case of killed poliovirus vaccine, even gut infection by poliovirus is prevented [16].
The increased understanding of antigen recognition at the molecular level has resulted in the development of rationally designed peptide vaccines. In the present study, we used immunoinformatics strategies to design vaccine candidate T-cell epitopes. These peptide epitopes are important for the development of T-cell epitope-based vaccines that could bind to specific Class I MHC molecules and thereby stimulate T-cell immune responses.
We aimed to identify candidate T-cell epitopes of SNV that are restricted to HLA alleles common to the North American population, where this virus is widespread. Epitopes that bind to Class I MHC were also analyzed for cleavage at their flanking regions by human proteasomes and for transporter associated with antigen processing (TAP) efficiency.

Retrieval of nucleotide sequences:
All available complete S segment amino acid (aa) sequences (n=11) of strains of Sin Nombre virus, which causes hantavirus cardiopulmonary syndrome, were retrieved from the GenBank database [17] as of October 2016. A consensus aa sequence was identified using the CLC Sequence Viewer 7 program (https://www.qiagenbioinformatics.com/). The program identifies the consensus sequence based on the most frequent residue found at each position in the sequence alignment. The consensus sequence was used for further analysis to identify T-cell epitopes.
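
The most-frequent-residue rule described above can be illustrated with a short sketch. This is not the CLC Sequence Viewer implementation; it is a minimal Python illustration that assumes the sequences are already aligned to equal length. The function name, the tie-breaking behaviour and the toy fragments are assumptions made for illustration only and are not SNV data.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Return the most frequent residue at each alignment column."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    residues = []
    for column in zip(*aligned_seqs):
        # most_common(1) gives the top residue; on a tie, the residue
        # seen first in the column wins (an arbitrary choice made here).
        residues.append(Counter(column).most_common(1)[0][0])
    return "".join(residues)


# Toy aligned fragments (hypothetical, not real SNV sequences).
toy_alignment = ["MSTLKEV", "MSTLQEV", "MSNLKEV"]
print(consensus(toy_alignment))  # MSTLKEV
```

A real alignment would also need a policy for gap characters, which this sketch treats like any other symbol.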

Selection of MHC alleles:
We selected the top 30 human Class I MHC alleles reported for the White, Black, Hispanic and Asian or Pacific Islander population groups of the North American population [18]. The alleles were selected based on the percentage chance of a haplotype being expressed in an individual, as identified from the HLA matchmaker program available at http://www.epitopes.net/.
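
As a rough sketch of how per-group allele lists can be pooled, the following Python fragment takes the top-n alleles of each group, forms the non-redundant union and reports the alleles shared by every group. The group labels and frequency values are hypothetical placeholders, and ranking by a single frequency value per allele is an assumption standing in for the haplotype percentages used in the study.

```python
def top_alleles(freqs, n):
    """Return the n most frequent alleles from an {allele: frequency} map."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return [allele for allele, _ in ranked[:n]]


def pool_alleles(groups, n):
    """Pool the top-n alleles of each group.

    Returns (non_redundant, shared): the sorted union across groups and
    the sorted subset present in every group's top-n list.
    """
    tops = [set(top_alleles(freqs, n)) for freqs in groups.values()]
    non_redundant = sorted(set().union(*tops))
    shared = sorted(set.intersection(*tops))
    return non_redundant, shared


# Hypothetical frequencies, for illustration only.
groups = {
    "group_1": {"A*02:01": 0.27, "B*07:02": 0.13, "C*07:02": 0.15},
    "group_2": {"A*02:01": 0.12, "B*53:01": 0.11, "C*04:01": 0.18},
    "group_3": {"A*02:01": 0.19, "B*35:01": 0.06, "C*04:01": 0.16},
}
non_redundant, shared = pool_alleles(groups, n=2)
print(non_redundant)  # ['A*02:01', 'C*04:01', 'C*07:02']
print(shared)         # ['A*02:01']
```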

Prediction of epitopes from the N protein of Sin Nombre virus with affinity to Class I MHC molecules:
Using the identified consensus aa sequence as the input, T-cell epitopes that bind to MHC Class I were predicted using the NetMHCpan 3.1 online server. This program predicts binding of peptides to any MHC molecule of known sequence using artificial neural networks (ANNs) [19]. Epitopes of 9-mer and 10-mer lengths were derived. The program also offers a wide choice of alleles to select as a query. HLA alleles that occur most commonly in the North American population were selected for epitope identification. The default thresholds for strong and weak binding in terms of % rank, 0.5 and 2 respectively, were used in our study, as in previous reports on other analytical approaches. Strong binders alone were selected and used for further analysis.
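
The peptide enumeration and the % rank cut-offs described above can be sketched as follows. This does not call NetMHCpan; it only illustrates, in Python, how overlapping 9-mers and 10-mers are generated from a sequence and how a % rank value would be labelled against the 0.5 and 2 thresholds. The function names are hypothetical, and treating a value exactly at a threshold as passing is an assumption.

```python
def kmers(sequence, lengths=(9, 10)):
    """Yield (start, peptide) for every window of the given lengths."""
    for k in lengths:
        for i in range(len(sequence) - k + 1):
            yield i + 1, sequence[i:i + k]  # 1-based start position


def classify(rank, strong=0.5, weak=2.0):
    """Label a peptide by its % rank using the thresholds quoted above."""
    if rank <= strong:  # assumption: the boundary value counts as strong
        return "strong"
    if rank <= weak:
        return "weak"
    return "non-binder"


# Toy sequence (hypothetical): 11 residues give three 9-mers and two 10-mers.
for start, peptide in kmers("MSTLKEVQDNI"):
    print(start, peptide)
```

For a 428-residue protein, this enumeration yields 420 nonamers and 419 10-mers.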

Prediction of proteasomal cleavage:
Proteasomal cleavage was predicted using the MAPPP (MHC-I Antigenic Peptide Processing Prediction) program [20]. The program generates a probability for the cleavage of each possible peptide from a protein by the proteasome in the cell; the probability is based on a statistical-empirical method. The algorithms in the program were earlier implemented in FRAGPREDICT. The minimum probability for cleavage after a single residue and for cleavage of a fragment was set to the default value of 0.5.

Prediction of TAP efficiency:
To predict the candidate epitope(s) based on the processing of the peptide(s) in vivo, the transport efficiency of the transporter associated with antigen processing (TAP) was assessed using the TAPPred server program [21]. The prediction approach used in this study was cascade Support Vector Machines (SVM), in which the prediction is based on the sequence and the properties of the amino acids.

Prediction of antigenicity/immunogenicity:
The identified epitope(s) were used to predict whole protein antigenicity (protective antigen) using the Vaxijen 2.0 server program with a threshold of 0.5 [22]. Epitopes scoring above 0.5, the threshold associated with the highest accuracy, were considered probable antigens and were selected for further analysis. In addition, class I immunogenicity analysis was carried out using the online server tool available at http://tools.iedb.org/immunogenicity/. This tool uses amino acid properties as well as their position within the peptide to predict the immunogenicity of a peptide-MHC (pMHC) complex.

Results
The HLA alleles were selected for their preponderance in the North American racial groups. Analysis for epitopes restricted to the specified class I MHC alleles resulted in 478 possible epitopes [HLA-A* (n=171), HLA-B* (n=146) and HLA-C* (n=161)]. The results are presented in Table 1.
Common HLA alleles were found in the four groups of the North American population, and many common T-cell epitopes were identified from different HLA alleles due to promiscuous presentation of the same T-cell epitope via two or more HLA class I molecules. Therefore, a non-redundant set of 63 HLA alleles [HLA-A* (n=21), HLA-B* (n=25) and HLA-C* (n=17)] was generated, and an epitope dataset (n=85) restricted to these alleles was identified. Among the top 30 alleles in the North American population, alleles A*02:01, B*44:03, C*03:04, C*04:01, C*06:02 and C*07:02 were present in all four population groups.
TAPpred analysis was carried out using the full-length consensus amino acid sequence of the Sin Nombre virus nucleocapsid protein. The analysis resulted in 420 possible epitopes with varying affinities classified as high (n=164), intermediate (n=186) and low or detectable (n=70). A total of 47 epitopes were identified by both the NetMHCpan 3.0 and TAPpred programs. These epitopes were then subjected to proteasome cleavage analysis.
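
The overlap between the two prediction outputs can be expressed as a set intersection. The following is a minimal Python sketch, assuming each tool's output has already been reduced to a list of peptide strings. The function name and the placeholder peptides are hypothetical and are not taken from the study. Restricting the comparison to 9-mers reflects the fact that TAPpred predicts nonamers only.

```python
def shared_nonamers(mhc_binders, tap_binders):
    """Return the 9-mer peptides reported by both prediction tools."""
    nonamers = {p for p in mhc_binders if len(p) == 9}
    return sorted(nonamers & set(tap_binders))


# Placeholder peptides (hypothetical); the third MHC entry is a 10-mer.
mhc_binders = ["ACDEFGHIK", "LMNPQRSTV", "ACDEFGHIKL"]
tap_binders = ["ACDEFGHIK", "WYACDEFGH"]
print(shared_nonamers(mhc_binders, tap_binders))  # ['ACDEFGHIK']
```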
Further screening based on proteasome cleavage resulted in 8 epitopes with scores ranging from 0.5009 to 1 (Table 2). Among them, six were identified as probable antigens by the Vaxijen program and were further analyzed for immunogenicity. Epitope VMGVIGFSF had the highest Vaxijen score of 1.
HLA molecules significantly overlap in peptide binding specificity. Class I HLA peptide binding shows a high degree (>60%) of promiscuity [33]. HLA allelic variation occurs among different ethnicities [34] and must therefore be an important consideration when designing and developing T-cell epitope-based diagnostics or vaccines, where multiple epitopes with different HLA binding specificities are screened.
HLA allele frequencies exhibit ethnic variation, with some alleles widely distributed among populations and others found almost exclusively within a particular ethnic group. The Class I and II loci reside on a relatively small region of human chromosome 6, and specific haplotypes were apparently present at high frequencies in founding populations or were selected for generating immune responses to infectious organisms. In this setting, linkage disequilibrium results in a significant over-representation of certain haplotypes [35]. Ethnic and geographical differences in HLA have been shown to be associated with disease outcome, such as viral persistence or viral clearance [36]. Therefore, HLA diversity data have become increasingly important in the design of population-specific T-cell-based vaccines [37]. HLA diversity data were thus used in our study to predict T-cell epitopes specific to the population in which the infection is widespread. The present approach is to use peptide sequence data for experimental determination of affinity. Such findings have been used in the construction of many T-cell epitope prediction algorithms, and the outcome of such analysis is robust [40]. Previously, however, HLA diversity for a given population was not considered while developing vaccines.
Conventional experimental HLA typing using next-generation sequencing tools and mapping of an optimal CD8 T-cell epitope are laborious and expensive. Bioinformatic tools have now been developed that predict peptides that bind to a specific MHC molecule. Although experimental fine mapping of epitopes is unmatched in its efficacy [41], prediction methods are equally indispensable alongside experimental validation for better vaccine development [42].
The application of information from the fields of pharmacogenomics, pharmacogenetics and bioinformatics to vaccine design, termed 'vaccinomics', has potential advantages.
The conventional experimental approaches are seen as a bottleneck in developing new vaccines because potential candidate epitopes may go unnoticed. Pathogen genomes are now a key source of information, and computer programs with powerful algorithms can handle even huge datasets for an informatics-based approach to vaccine design. Moreover, the ability to predict T-cell epitopes that bind to a specific HLA class or allele, together with transporter associated with antigen processing (TAP) affinity prediction and proteasomal cleavage prediction, is highly beneficial. Screening peptide-based vaccines using an in silico bioinformatic approach has been shown to be particularly useful when hypervariable viruses like HIV and HCV are examined [43]. We believe that this applies to hantaviruses as well, because they are very diverse, cause different clinical syndromes in different areas and are each transmitted by different rodent hosts. The ample choice of T-cell epitopes identified through these bioinformatic approaches can be developed into synthetic polyvalent peptide vaccines suitable for the diverse HLA types in each population.
In the course of Class I MHC presentation, antigens that are synthesized in the cytosol undergo proteasomal degradation, and transporter associated with antigen processing (TAP) molecules [44] transport the generated peptides into the endoplasmic reticulum (ER). Inside the ER, the peptides bind to Class I MHC molecules and are carried to the cell surface. The MHC-I and peptide complex is then recognized by CTLs. Cytotoxic T cells encounter small peptides, eight to ten amino acids in length. Peters et al. [45] reported that combining in silico predictions of MHC-I binding affinities with predictions of TAP transport efficiency leads to improved identification of epitopes, compared to predictions of MHC-I binding affinities combined with predictions of C-terminal cleavages made by the proteasome. Nevertheless, the proteasome system plays an important role in MHC Class I antigen processing and presentation [46] and, as a result, in activation of CD8+ T cells, as well as in activation of the NF-κB pathway [47] for mounting an immune response. Ip et al. [48] reported that combining the prediction of MHC class I epitopes for HCV with the prediction of proteasomal cleavage sites at the flanking regions of epitopes enhances the precision of identification of functional HCV-specific CTL epitopes. In our study, we screened for T-cell epitopes as potential vaccine candidates using bioinformatic approaches that integrate both proteasome cleavage prediction and TAP affinity prediction along with antigenic and immunogenic abilities. This integration improves the strength of the predictions carried forward for further evaluation in animal models and finally in human populations.
Previously, we demonstrated an immunodominant B-cell epitope of SNV in the N protein [49]. The 3D structure generated using the I-TASSER program is shown in Figure 2. In our study, the number of candidate T-cell epitopes (9-mer and 10-mer) generated ranged from three to thirteen per allele. No epitope was identified for HLA-A*29:02 by the program. The NetMHCpan 3.0 program used in our study is based on a neural network machine-learning algorithm. It allows insertions and deletions in a pan-specific MHC-I binding machine-learning model and also enables combining information across both multiple MHC molecules and peptide lengths. This pan-allele/pan-length algorithm is a state-of-the-art method with increased accuracy for ligand identification [50].
Figure 2: 3D structure of SNV N protein generated by the I-TASSER program.
MAPPP, which stands for MHC-I Antigenic Peptide Processing Prediction, predicts proteasomal cleavage with peptide anchoring to MHC I molecules. This program accepts fragment lengths between 9 and 11. Although the TAP transporter can translocate peptides of 8-40 amino acids, with a preference for peptides of 8 to 11 amino acids, many programs, including TAPpred used in our study, predict nonamers (9-mers) only. Therefore, the 10-mer epitopes predicted by the MHC binding program and the MAPPP program were eliminated in the TAP efficiency analysis. For this reason, the finalized epitopes were all nonamers.
The steps of the MHC class I antigen presentation pathway are evaluated by three scoring systems. 1. The proteasomal score reflects the efficiency of antigen processing by examining the use of the cleavage site that releases the peptide C-terminus. 2. The TAP score predicts the efficiency of epitope transport by the transporter molecule; this is achieved by estimating the binding of a given peptide to TAP. The highest affinity score for a peptide indicates the highest transport rate and affinity for the MHC molecule. The scores are expressed logarithmically; higher values indicate higher predicted efficiency.
Following this, identifying the variables that influence immunogenicity is also recognized as an important step in the investigation of T-cell epitopes and the understanding of cellular immune responses [51]. The immunogenicity analysis program we used takes into consideration positions P4-6 of a presented peptide and amino acids with large and aromatic side chains, which are associated with immunogenicity. The program also assumes that T cells are better equipped to recognize viral than human (self) peptides. Similarly, the Vaxijen model for prediction of protective viral antigens was used. The model was reported to have a prediction accuracy of up to 89% [52].
Highlights of our study include T-cell epitope prediction specific to geographically restricted HLAs for potential vaccine development against hantavirus infection. Among the candidate epitopes identified in our study, FLAARCPFL was conserved in Sin Nombre virus, which makes it suitable for a vaccine specific to this virus genotype. Other epitopes were conserved across the HCPS-causing hantaviruses and are suitable for a pan-hantavirus vaccine. The data generated in this study have potential for more rational approaches to vaccine design. SNV continues to be a significant cause of morbidity and mortality in North America, and its control is not possible because of several epidemiological features and the lack of specific therapy. Development and application of an effective vaccine may be one important approach to be explored for the control of SNV infection.
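
Whether an epitope is genotype-specific or shared across viruses can be checked by exact substring matching against each protein sequence. The following is a minimal Python sketch of such a conservation check. The virus labels and the sequences are hypothetical placeholders built around the epitope strings for illustration; they are not real N protein sequences, and the function name is an assumption.

```python
def conservation(epitope, proteins):
    """Return names of proteins containing the epitope as an exact match."""
    return [name for name, seq in proteins.items() if epitope in seq]


# Hypothetical sequences, for illustration only: virus_B carries a single
# substitution (A to S) inside the first epitope.
proteins = {
    "virus_A": "MKKFLAARCPFLQQAAVSALETKDD",
    "virus_B": "MKKFLSARCPFLQQAAVSALETKDD",
}
print(conservation("FLAARCPFL", proteins))  # ['virus_A']
print(conservation("AAVSALETK", proteins))  # ['virus_A', 'virus_B']
```

An epitope matched in a single genotype would be a candidate for a genotype-specific vaccine, whereas one matched in all sequences would be a candidate for a broader vaccine.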