Selection of epitope-based vaccine targets of HCV genotype 1 of Asian origin: a systematic in silico approach

Hepatitis C is a major global health problem, affecting approximately 200 million people worldwide and about 10 million people in Pakistan. Developing countries are especially burdened by HCV infection. The goal of this study was therefore to identify antigenic epitopes that could serve as effective vaccine targets for HCV genotype 1 of Asian origin against HLA alleles frequently distributed in Asian countries. A total of 85 complete genome sequences of HCV 1 of Asian origin were retrieved from the HCV sequence database. Using in silico tools, T cell epitopes were predicted from conserved regions of all the available HCV 1 subtypes against Asian HLA alleles. Using 10 MHC I supertypes, 51 epitopes were predicted as promiscuous binders. MHC class I supertypes A2 and B7 were found to be good promiscuous binders for a large number of predicted epitopes. Alleles of other MHC I supertypes (B57, B27, BX, B44) either did not act as promiscuous binders or bound only a limited number of epitopes. Against 8 predominant Asian alleles of the DRB1 supertype, 42 epitopes were predicted as promiscuous binders. MHC class II alleles DRB1-0101, DRB1-0701 and DRB1-1501 bound the largest share of the predicted promiscuous epitopes, while DRB1-0301 bound the smallest share. A literature survey of the predicted epitopes via IEDB also confirmed that a large number of them are true positives. Hence, careful selection of viral proteins and MHCs yielded conserved promiscuous epitopes that can be used as effective vaccine candidates for all Asian countries.

Abbreviations: HCV - hepatitis C virus, MHC - major histocompatibility complex, HLA - human leukocyte antigen, CTL - cytotoxic T lymphocytes.


Background:
Hepatitis C virus (HCV), a member of the genus Hepacivirus in the family Flaviviridae, is a major global health problem. It is a positive-sense RNA virus affecting about 200 million people worldwide (3.3%) [1]. The single-stranded virus genome encodes three structural and six nonstructural proteins [2]. The inability of the virus to proofread results in a very high mutation rate (8-18 mutations per genome per year), and about 10^12 virions are produced per day, which accelerates viral evolution [3]. Such a high mutation rate is the main hindrance not only to vaccine design but also to the treatment of infected individuals. In Pakistan about 10 million people are infected, roughly 6% of the population [4]. Infection may be acute, but most seropositive individuals (about 70%) are chronically infected [5]. Chronically infected patients often remain asymptomatic and undiagnosed for a long time, until chronic hepatitis leads to severe fibrosis, cirrhosis, hepatic failure or hepatocellular carcinoma. These long-term complications have made HCV a leading emerging infectious disease worldwide [6].
Globally, this virus occurs as six genotypes with numerous subtypes [7]. Sequences of different HCV genotypes differ by about one third. HCV genotypes 1, 2 and 3 are distributed globally. In Pakistan, HCV seroprevalence (4.7%) is considerably higher than in other Asian countries. The most frequent HCV genotype in Pakistan is genotype 3, for which epitopes have been predicted as part of ongoing research [8]. HCV 1 is the second most frequent genotype in the country (Punjab 12.14%; Sindh 8.33%; Balochistan 32.12%). Pakistan shares a long border with Iran, where genotype 1a is most prevalent. Neighboring Asian countries, especially China, have also been reported to have a high prevalence of HCV-1 (HCV-1b). The most common HCV-1 subtypes in Thailand are 1a and 1b. HCV-1d was found exclusively in the Indonesian population. The vast majority of HCV isolates found in the Philippines were HCV-1a and HCV-1b. These figures are an alarming signal that major steps are needed to reduce viral infection, because various HCV-1 subtypes are associated with severe cirrhosis [8-10].
It has also been reported that individuals chronically infected with HCV show reduced antibody titers against other viral vaccines such as hepatitis A, hepatitis B and HIV. This impaired immune response is attributed to a defect in antigen-presenting cell function [5]. Moreover, the HLA region of the human genome is highly gene-dense, containing approximately 200 genes. Many of these genes play an important role in the immune response, and some are highly polymorphic [11].
The study was designed to predict conserved, promiscuous MHC I and MHC II binding epitopes of HCV genotype 1 of Asian origin against HLA alleles that are frequent in Asian countries, in order to select the best epitopes as vaccine candidates for the whole Asian population. Because Asian countries share similar climatic and hygienic conditions and modes of viral infection, it was hypothesized that predicting promiscuous epitopes from conserved regions of viral sequences infecting human populations across a wide geographical region (especially Asia) can give a clearer picture of viral mutation and help locate regions that have been conserved in the past and are likely to mutate less in the future. Such an analysis can provide potential vaccine candidates from conserved viral sequences with lower expected mutation rates and therefore better vaccination outcomes.

Methodology:

Data collection and preparation
Excluding sites that have mutated in the past, and anticipating future mutations, are both important for epitope-based vaccine design. Hence, it is important to collect all the available sequences over an extended period of time and geographical distribution, representing the possible genetic variants of the virus of interest. For that purpose, the available complete genome sequences of HCV genotype 1 and its subtypes of Asian origin were collected from the HCV database (http://hcv.lanl.gov/content/index). Collected sequences were edited manually to remove duplicates, discrepancies and precursor polyprotein entries. The selected sequences were then aligned and compared using the multiple sequence alignment software ClustalW2.
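
The duplicate-removal step described above was done manually in the study. As a rough illustration only, the same idea can be sketched in Python, assuming the collected sequences are available as FASTA text. The function names `parse_fasta` and `drop_duplicates`, the headers and the short sequences are all hypothetical and are not taken from the study.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_duplicates(records):
    """Keep only the first record for each distinct sequence."""
    seen, unique = set(), []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


# Toy input: hypothetical headers and made-up sequence fragments.
example = """>seqA
MSTNPKPQRK
>seqB
MSTNPKPQRK
>seqC
MSTNPKPQKK
"""
unique_records = drop_duplicates(parse_fasta(example))
```

Only exact duplicates are removed here; the discrepancy checks and the removal of precursor polyprotein entries mentioned in the text would still need manual curation.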

Identification of conserved sequences
Conserved regions in all the collected protein sequences of HCV-1 genotypes were identified by a consensus-sequence-based approach, for each country separately. The consensus sequences of all origins were then aligned by multiple sequence alignment. Segments with a minimum length of nine amino acids were selected if they were 100% conserved in all HCV 1 subtypes of Asian origin, with at least 80% representation of each subtype. A minimum length of nine residues is important for many immunological applications because it is the typical length of peptides that bind to HLA molecules [12].
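
The selection of fully conserved segments of at least nine residues can be illustrated with a minimal Python sketch. It assumes the consensus sequences have already been aligned to equal length, with `-` marking gaps. The function name `conserved_segments` and the toy sequences are hypothetical, and the 80% subtype-representation criterion is not modelled here.

```python
def conserved_segments(aligned, min_len=9):
    """Return (start, segment) for each gap-free run of fully conserved
    alignment columns that is at least min_len long (start is 1-based)."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    segments, run_start = [], None
    # Iterate one position past the end so a run reaching the end is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if conserved and run_start is None:
            run_start = i
        elif not conserved and run_start is not None:
            if i - run_start >= min_len:
                segments.append((run_start + 1, aligned[0][run_start:i]))
            run_start = None
    return segments


# Toy alignment (made-up sequences); the second one differs at column 11.
toy = ["MSTNPKPQRKTKRNT", "MSTNPKPQRKAKRNT", "MSTNPKPQRKTKRNT"]
segments = conserved_segments(toy)
```

In this toy case only the first ten columns form a conserved run long enough to be kept; the four conserved columns after the mismatch are discarded because they are shorter than nine residues.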

Entropy-based analysis of HCV sequence variability
The variability of peptides of any length can be measured by a rigorous method based on information entropy, which also supports assumptions about evolutionary stability. A low entropy value at a site indicates that the site is stable. An entropy of 0 corresponds to 100% conservation, and increasing values correspond to decreasing conservation [12]. Hence, the entropy of HCV genotype 1 was calculated using the Shannon Entropy-One tool available at the HIV sequence database. The results show that HCV 1 sequences of different Asian origins have distinct patterns of highly conserved and variable regions. The low-entropy regions were restricted to distinct short regions, which corresponded to the conserved sequences selected by the consensus-sequence method.
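
For readers unfamiliar with the measure, the per-site Shannon entropy H = -sum(p * log2(p)) over the residues observed in an alignment column can be sketched as follows. This is not the Entropy-One tool itself. The base-2 logarithm and the treatment of a gap as an ordinary symbol are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned):
    """Entropy of every column of an alignment of equal-length sequences."""
    return [column_entropy(col) for col in zip(*aligned)]


# Toy alignment (made-up sequences): only the last column varies.
entropies = alignment_entropy(["ACG", "ACT"])
```

A fully conserved column has entropy 0, and a column split evenly between two residues has entropy 1 bit, which matches the scale described in the text.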

HLA Selection & Epitope Prediction
Epitopes of HCV genotype 1 were predicted against the MHC I and II alleles most frequently found in Asian countries. These comprise 10 MHC I supertypes and 1 MHC II supertype, along with their respective alleles, which together cover about 99% of the Asian population [13,14]. Using these alleles, epitopes of HCV genotype 1 were predicted with NetMHCpan (http://www.cbs.dtu.dk/services/NetMHCpan) and NetMHCpanII (http://www.cbs.dtu.dk/services/NetMHCIIpan-2.0/), both based on artificial neural networks (ANNs). HCV epitopes were predicted as nonamers from the protein sequence in FASTA format. Any epitope that fell in a hotspot or warm spot was rejected. Promiscuous epitopes from conserved regions are tabulated along with their sequence, start position in the protein, average score and HLA binding.
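
The nonamer enumeration and the selection of promiscuous binders can be illustrated with a short Python sketch. It does not call NetMHCpan; it assumes the per-allele predictions have already been collected into a dictionary. The function names, the `min_alleles` cutoff, the toy protein fragment and the scores are all hypothetical, and the study does not state the cutoff it used to call an epitope promiscuous.

```python
def nonamers(protein, k=9):
    """Yield (start, peptide) for every k-residue window (start is 1-based)."""
    for i in range(len(protein) - k + 1):
        yield i + 1, protein[i:i + k]


def promiscuous(predictions, min_alleles=2):
    """Select peptides predicted to bind at least min_alleles alleles.

    predictions maps peptide -> {allele: score} for predicted binders only.
    Returns (peptide, allele count, average score), most promiscuous first.
    """
    rows = []
    for peptide, binders in predictions.items():
        if len(binders) >= min_alleles:
            avg = sum(binders.values()) / len(binders)
            rows.append((peptide, len(binders), round(avg, 2)))
    return sorted(rows, key=lambda r: (-r[1], r[0]))


# Toy protein fragment and made-up scores, for illustration only.
windows = list(nonamers("MSTNPKPQRKTK"))
toy_predictions = {
    "MSTNPKPQR": {"A*0201": 100.0, "A*0206": 50.0},
    "STNPKPQRK": {"A*0201": 30.0},
}
selected = promiscuous(toy_predictions)
```

Sorting by allele count first mirrors the way the results rank epitopes by the number of alleles covered and report the average score alongside.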

Validation of predicted Epitopes
All the predicted epitopes were submitted to the IEDB database (http://www.immuneepitope.org/), which contains experimentally confirmed data on antibody and T cell epitopes, MHC binding, host organism, MHC restriction, MHC class, etc. Those found to be true positives are marked with (*) in the predicted epitope tables.
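
The marking of confirmed epitopes amounts to a membership check. A minimal sketch follows, assuming the experimentally confirmed epitope sequences have been exported from IEDB into a local collection. The function name `flag_confirmed` and the example peptides are hypothetical, and no IEDB interface is called.

```python
def flag_confirmed(predicted, confirmed):
    """Pair each predicted peptide with '*' if it is experimentally confirmed."""
    confirmed = set(confirmed)
    return [(peptide, "*" if peptide in confirmed else "") for peptide in predicted]


# Made-up peptides, for illustration only.
flags = flag_confirmed(["MSTNPKPQR", "STNPKPQRK"], {"STNPKPQRK"})
```

Exact string matching is the simplest possible criterion; whether the study also accepted partial or overlapping matches is not stated.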

Results:
A total of 85 complete genome sequences of HCV genotype 1 retrieved from the HCV sequence database were used for the present analysis. The average molecular weight across all subtypes of HCV genotype 1 was 327171.61 Da (about 327 kDa) for 3010 amino acid residues, as calculated with the Composition/Molecular weight tool of the PIR search and analysis suite (http://pir.georgetown.edu/pirwww/search/comp_mw.shtml). The most frequent amino acid residue in the whole viral proteome was leucine (9.97%), while methionine was the least frequent (1.96%). Among Asian countries, China and Japan were the major contributors of HCV 1 full genome sequences to the HCV sequence database. The other significant contributors were India, Taiwan, Korea and the Philippines.
Using all the available full genome sequences of HCV 1 of Asian origin from the public database, promiscuous epitopes were predicted against the selected HLA alleles. Against 10 MHC I supertypes, 51 epitopes were predicted as promiscuous binders. Epitope LSAFSLHSY was predicted to be the most promiscuous binder, covering 9 MHC I alleles of Asian origin at a binding score of 149.22. YLVAYQATV is also a good promiscuous epitope, with a binding score of 69.41 and covering 8 MHC I alleles. About 9.8% of the predicted epitopes (FSIFLLALL, SVIDCNTCV, YLNTPGLPV, ILSPGALVV and LMTHFFSIL) bind 7 MHC I alleles of Asian origin, at average binding scores of 113.14, 34.25, 66.32, 133.89 and 185.63 respectively (Table 1, see supplementary material). MHC class I supertypes A2 and B7 were found to be good binders for a large number of the predicted promiscuous epitopes. MHC class I alleles A*0206 and A*0203 were the highest binders of the predicted epitopes of HCV genotype 1, at 11.71%. They were followed by MHC I alleles A*0202, A*0201 and A*0205 of the A2 supertype, at 10.36% and 9.46% respectively. The MHC I alleles B*3801 (A24 supertype), B*4402, B*4403 (B44 supertype), B*5702 (B57 supertype) and B*5109 (BX supertype) did not bind any promiscuous epitope and thus restricted the binding efficiency. The other MHC I alleles (A*0207, A*3301, A*6601, B*0702, B*5101, B*5401, A*0101, A*2402, B*4002, B*2705 and B*2706) were the weakest binders of the predicted promiscuous epitopes of HCV genotype 1 of Asian origin (Figure 1).
Against 8 predominant Asian alleles of the DRB1 supertype, 42 epitopes were predicted as promiscuous binders. Epitope VNLLPAILS was predicted to be the most promiscuous binder, covering 62.5% of the Asian alleles at an average score of 274.65. About 11.9% of the predicted promiscuous MHC II binding epitopes (YKVLVLNPS, LVLNPSVAA, IQYLAGLST, INALSNSLL and LITSCSSNV) cover 50% of the MHC II alleles of Asian origin, at binding scores of 96.19, 84.26, 236.19, 74.25 and 171.45 respectively. About 14.28% of the predicted epitopes cover 37.5% of the MHC II alleles of Asian origin. The rest of the predicted epitopes are provided in Table 2 (see supplementary material). MHC class II allele DRB1-0101 was the highest binder at 40.77%, followed by DRB1-0701 and DRB1-1501 at binding specificities of 22.33% and 16.50% respectively, while DRB1-0301 was the weakest binder (0.97%) of the predicted promiscuous epitopes of HCV genotype 1 of Asian origin (Figure 2).

Discussion:
In this study, 85 publicly available full-length sequences of HCV 1 of Asian origin were subjected to physicochemical and immunoinformatics analysis. HCV-specific immune responses are closely correlated with CD4+ and CD8+ T cells [15-17]. In patients who have recovered from HCV infection, the CD4+ proliferative T-cell response remains vigorous and multi-specific, while the CD4+ T-cell response is weak and narrowly focused in persistently infected patients [18]. CD8+ memory T cell maturation and maintenance is determined by their vigorous [...]
The IEDB analysis of the predicted epitopes indicates that the server predicts true positive results at default thresholds. This is why a considerable number of the predicted epitopes were found to be experimentally confirmed; these are highlighted by (*) in the tables against the respective epitope sequence. The small number of remaining epitopes will be confirmed in the future. All the true positive binders could be effective vaccine candidates, singly or fused as polyepitopes. However, the major problem is that MHC proteins are not only polygenic (i.e. multiple genes for MHC I and MHC II) but also polymorphic (i.e. various alleles of each gene). MHC alleles usually differ by 30 amino acid residues, which are often located within the binding site. These variations in the peptide binding site result in high peptide specificity and thus determine T cell recognition. Different polymorphic MHC alleles exhibit different peptide binding specificities, and each allele binds a particular peptide sequence pattern [20]. Hence, the promiscuous epitopes were predicted from HCV glycoprotein isolated in Asian countries against MHC alleles that are frequent in Asian countries, in order to capture the best epitopes as vaccine candidates for the Asian population. Moreover, CD4+ and CD8+ T cell epitopes were predicted as nonamers because most MHC molecules respond most strongly to nonamers. The predicted epitopes of HCV genotype 1 vary in their positions within the viral proteins as well as in their binding specificity.
Hence, the peptide binding core of the viral protein, the position of the peptide in the full-length viral sequence, the binding MHCs and their average scores have been tabulated. All the predicted nonamers are antigenic and can be used singly as potent vaccine candidates or fused together as polytopes. These epitopes thus considerably reduce the effect of viral mutation and represent the whole viral genome, giving vaccine candidates with potential control over the immune response while eliminating side effects [14].

Conclusion:
The presented in silico approach to HCV of Asian origin should prove generic, as it has already been applied successfully to other viruses, mainly dengue, HIV and influenza [12]. Hence this approach can serve as a template for the study of other emerging viruses and their subtypes over a wide geographical distribution. It is therefore possible to significantly reduce the cost and effort of experimental screening for effective vaccine candidates.

Figure 1: Binding specificity of different MHC I alleles of Asian origin to originally predicted epitopes of HCV genotype 1 viral proteins.

Figure 2: Binding specificity of different MHC II alleles of Asian origin to originally predicted epitopes of HCV genotype 1 viral proteins.

Table 1: Predicted promiscuous MHC I epitopes of HCV 1 genotype (Asian origin) with sequence, start position, binding score and respective MHC binding alleles; (*) in the 2nd column indicates that the predicted epitope has been experimentally confirmed in the past, as recorded in IEDB.

Table 2: Predicted promiscuous MHC II epitopes of HCV 1 genotype (Asian origin) with sequence, start position, binding score and respective MHC binding alleles; (*) in the 2nd column indicates that the predicted epitope has been experimentally confirmed in the past, as recorded in IEDB.