A comprehensive analysis of amino-peptidase N1 protein (APN) from Anopheles culicifacies for epitope design using Immuno-informatics models

The amino-peptidase N (APN) protein from Anopheles culicifacies has been considered as a vector-based transmission blocking vaccine (TBV) target for malaria vaccine development. Short peptides as potential epitopes for B cells, cytotoxic T cells and/or helper T cells were identified using prediction models provided by the NetCTL and IEDB servers. Antigenicity, allergenicity, immunogenicity, epitope conservancy, atomic interactions with HLA allele-specific structure models and population coverage were investigated in this study. The analysis of the target protein helped to identify conserved regions as potential epitopes of APN in various Anopheles species. The T cell epitope-like peptides were further analyzed by molecular docking to check their interactions with the allele-specific HLA models. Thus, we report the predicted B cell (VDERYRL) and T cell (RRYLATTQF for HLA class I and LKATFTVSI for HLA class II) epitope-like peptides from the APN protein of Anopheles culicifacies (Diptera: Culicidae) for further consideration as vaccine candidates following in vitro and in vivo analysis.


©Biomedical Informatics (2019) bound ubiquitous zinc metallo-proteases (ZMP). In the absence of any effective and economical control strategy, TBVs promise a more efficient approach to malaria control. Other studies have shown that the APN protein is a candidate antigen for vaccine development [8]. Studies on the APN1 gene of Anopheles gambiae have shown it to be a potential candidate for inducing specific humoral and cellular immunity in BALB/c mice [9]. Structural analysis of midgut APN1 in Anopheles gambiae has revealed B cell epitope-based malaria transmission blocking activity [10]. However, T-cell-based epitope mapping is lacking for cellular immunity, which is also essential for clearing parasite infection.
The aim of vaccination is to induce immunity against specific pathogens by selectively stimulating antigen-specific cytotoxic T cells, helper T cells and B cells. Vaccine epitopes fall into two classes: first, a B-cell epitope together with a helper T-cell epitope, and second, a CTL epitope. Using a combination of these epitope-like peptides, a vaccine can induce a specific humoral or cellular immune response against the target pathogen [11]. It is of interest to identify conserved regions as epitopes in various Anopheles species that elicit both neutralizing antibodies and cellular immunity against the parasite, towards the development of an effective transmission blocking vaccine for malaria.
An. culicifacies (Diptera: Culicidae) is an important malaria vector responsible for 60-70% of cases in India [12]. We therefore carried out a comprehensive analysis of the amino-peptidase N1 protein (APN) from Anopheles culicifacies for epitope design using immuno-informatics models. The data reported here will help identify epitopes and inform a strategy for transmission blocking malaria vaccine development.

Materials and Methods:
Retrieval of protein sequence from database: The protein sequence of the APN1 gene (accession no. QCO76330) from An. culicifacies A was downloaded from the NCBI database (Figure 1). The antigenicity of the sequence was predicted using the VaxiJen v2.0 server [13] with default parameters. Further, APN1 protein sequences from different mosquito species (Diptera: Culicidae) were downloaded from the VectorBase database (https://www.vectorbase.org/). Multiple sequence alignment (MSA) of the APN1 protein sequences from these species was performed using Clustal W.

Secondary structure analysis:
Antigenicity depends on the protein secondary structure. Therefore, the physicochemical and secondary-structure properties of APN1 were analyzed. The ExPASy ProtParam server [14] was used to compute parameters including the amino acid composition, extinction coefficient, instability index, aliphatic index and molecular weight. The self-optimized prediction method (SOPMA) [15] was also used to study transmembrane helices, solvent accessibility, and globular and coiled regions for the analysis of secondary structure in the APN1 protein.
These methods provided information about the protein stability with potential functional role for APN1.
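Two of the sequence-derived parameters mentioned above have simple closed-form definitions, and the following minimal sketch shows how they can be computed. It assumes the standard Kyte-Doolittle hydropathy scale for the grand average of hydropathicity (GRAVY) and the usual aliphatic index formula, X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent. The function names are illustrative and are not part of ProtParam; the short peptide in the example is only a stand-in, since the full APN1 sequence is not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue (negative = hydrophilic)."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Stand-in input: a short peptide, not the full-length protein.
    peptide = "LKATFTVSI"
    print(f"GRAVY = {gravy(peptide):.3f}")
    print(f"Aliphatic index = {aliphatic_index(peptide):.2f}")
```

Applied to a full-length sequence, a negative GRAVY value indicates an overall hydrophilic protein, and a higher aliphatic index is usually read as an indicator of greater thermostability.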

Prediction of B cell epitope:
Immune Epitope Database (IEDB) was used to predict B cell epitopes.

Prediction of cytotoxic T cell epitopes:
The NetCTL server [23] was used to predict T-cell epitopes in this study. The threshold parameter was set at 50, which gives a specificity of 0.94 and a sensitivity of 0.89. All available HLA supertypes were selected for the analysis of the antigen protein sequence. The scores were obtained from a combined algorithm that integrates class I HLA-peptide binding, Transporter associated with Antigen Processing (TAP) transport efficiency and proteasomal cleavage efficiency. The best epitope was selected based on the combined score values.
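The idea of a combined score can be sketched as a weighted sum of the three component predictions followed by thresholding and ranking. The sketch below is only an illustration of that idea: the weights `w_cleavage` and `w_tap`, the threshold passed to `select_epitopes`, and the peptide labels are assumptions made for the example, not values taken from this study, and the exact weighting used by the server should be checked against its documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PeptideScores:
    peptide: str
    mhc: float       # predicted HLA class I binding score
    cleavage: float  # predicted C-terminal proteasomal cleavage score
    tap: float       # predicted TAP transport efficiency score


def combined_score(p: PeptideScores,
                   w_cleavage: float = 0.15,  # assumed weight
                   w_tap: float = 0.05) -> float:  # assumed weight
    """Weighted sum of binding, cleavage and TAP transport scores."""
    return p.mhc + w_cleavage * p.cleavage + w_tap * p.tap


def select_epitopes(peptides: List[PeptideScores],
                    threshold: float) -> List[Tuple[str, float]]:
    """Keep peptides at or above the threshold, best combined score first."""
    scored = [(p.peptide, combined_score(p)) for p in peptides]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical component scores for three placeholder peptides.
    candidates = [
        PeptideScores("pep1", mhc=0.60, cleavage=0.90, tap=2.0),
        PeptideScores("pep2", mhc=0.30, cleavage=0.50, tap=1.0),
        PeptideScores("pep3", mhc=0.95, cleavage=0.20, tap=0.5),
    ]
    for name, score in select_epitopes(candidates, threshold=0.5):
        print(name, round(score, 3))
```

Raising the threshold trades sensitivity for specificity, which is the trade-off described for the server's threshold parameter.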

Results:
Retrieval of protein sequence and antigenicity determination: The APN1 protein sequence of An. culicifacies, retrieved from NCBI in FASTA format, was screened using the VaxiJen server to predict immunogenicity. APN1 (QCO76330) was predicted to be an antigenic protein based on its overall immunogenicity prediction score.

B-cell epitope identification:
Linear B cell epitopes were predicted using five algorithms available on IEDB: Parker hydrophilicity, Emini surface accessibility, Chou and Fasman beta-turn prediction, Kolaskar and Tongaonkar antigenicity and BepiPred linear epitope prediction. All values greater than the average value were considered potential antigenic determinants. Three epitopes scored above the threshold and were non-allergenic, namely VDERYRL, MPQQETFN and TVFQRTP (Table 1). These epitopes lie in surface-accessible regions; their positions on the 3D structure and their accessible surface areas are shown in Figure 3. Among these three epitopes, VDERYRL is conserved in the various Anopheles species considered in this study (Figure 4). Conformational B-cell epitopes were also obtained in four chains of the APN1 protein using ElliPro. ElliPro approximates the tertiary structure of the protein with a number of ellipsoids and assigns each output epitope a score, which is the Protrusion Index (PI) value averaged over the epitope residues. The highest probability of a conformational epitope was calculated at 74% (PI score: 0.74). The residues involved in conformational epitopes, their number, location and scores were also predicted.
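The rule that values above the average mark potential antigenic determinants can be illustrated with a short routine that scans a per-residue score profile and reports contiguous stretches above the mean. This is a simplified sketch, not the IEDB implementation: the function name, the `min_len` parameter and the example profile are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def above_average_segments(scores: Sequence[float],
                           min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based with end exclusive, for runs of
    consecutive positions whose score is strictly above the profile mean."""
    mean = sum(scores) / len(scores)
    segments = []
    start = None
    for i, value in enumerate(scores):
        if value > mean and start is None:
            start = i                      # a run begins
        elif value <= mean and start is not None:
            if i - start >= min_len:       # a run ends; keep it if long enough
                segments.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))  # run reaching the last residue
    return segments


if __name__ == "__main__":
    # Hypothetical per-residue scores from a single prediction method.
    profile = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0, 4.0]
    print(above_average_segments(profile))
    print(above_average_segments(profile, min_len=2))
```

In practice one such profile would be produced per algorithm, and candidate peptides would be those stretches that are above threshold in several profiles at once.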

Cytotoxic T-cell epitopes identification:
Epitopes with high combined scores, as predicted by NetCTL, were considered the most promising candidates. HLA-I allele interactions with these epitopes were predicted using the SMM-based IEDB HLA-I binding prediction tool. The epitopes with higher affinity (IC50 less than 200) for MHC-I alleles were selected for further analysis (Table 2). The binding affinity of the epitopes for the HLA-I alleles is inversely proportional to the IC50 value. The proteasome, TAP, HLA, processing and HLA-I binding scores are summarized as a total score in Table 2. These epitopes are antigenic and non-allergenic. Among these five T-cell epitopes, the 9-mer epitope RRYLATTQF had the highest combined score, and it interacts with twelve HLA-I alleles. The conservancy analysis indicated that this epitope was 78% conserved (Figure 4), the maximum among all epitopes. However, another epitope, NLAERTMLI, was 56% conserved and had a larger number of allelic interactions, with good population coverage, than the other epitopes.
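The IC50-based selection step amounts to a filter over predicted peptide-allele pairs, keeping only those below the cutoff and counting how many alleles each peptide binds. The sketch below shows one way to express this. The cutoff of 200 follows the text, with nM assumed as the unit; the function name, peptide labels, allele labels and IC50 values are placeholders invented for the example.

```python
from typing import Dict, Iterable, List, Tuple

Prediction = Tuple[str, str, float]  # (peptide, allele, predicted IC50)


def strong_binders(predictions: Iterable[Prediction],
                   ic50_cutoff: float = 200.0
                   ) -> Dict[str, List[Tuple[str, float]]]:
    """Group peptide-allele pairs with IC50 below the cutoff by peptide.

    A lower IC50 means stronger predicted binding, so each peptide's
    alleles are listed from strongest to weakest binder.
    """
    by_peptide: Dict[str, List[Tuple[str, float]]] = {}
    for peptide, allele, ic50 in predictions:
        if ic50 < ic50_cutoff:
            by_peptide.setdefault(peptide, []).append((allele, ic50))
    for hits in by_peptide.values():
        hits.sort(key=lambda hit: hit[1])
    return by_peptide


if __name__ == "__main__":
    # Hypothetical predictions; values are not from this study.
    table = [
        ("pep1", "allele_A", 35.0),
        ("pep1", "allele_B", 150.0),
        ("pep1", "allele_C", 900.0),
        ("pep2", "allele_A", 480.0),
    ]
    for peptide, hits in strong_binders(table).items():
        print(peptide, len(hits), hits)
```

The number of alleles retained per peptide is the quantity that later feeds into the population coverage estimate.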

Helper T-cell epitope identification:
Putative helper T-cell epitope candidates (9-mer sequences) were antigenic and non-allergenic, and showed interactions with numerous HLA-DR alleles (Table 3). The epitope LKATFTVSI had the maximum number of allele binding interactions, with the highest population coverage and 60% epitope conservancy (Figure 4), which is the maximum among all selected epitopes.
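Epitope conservancy can be thought of as the percentage of homologous sequences that contain the epitope at or above a chosen identity level. The following sketch implements that definition with an ungapped sliding window. It is a simplification offered for illustration rather than the IEDB conservancy tool itself, and the function names and the toy sequences in the example are invented.

```python
from typing import Sequence


def best_identity(epitope: str, sequence: str) -> float:
    """Highest fraction of matching positions between the epitope and any
    ungapped window of the same length in the sequence."""
    k = len(epitope)
    best = 0.0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope: str, sequences: Sequence[str],
                min_identity: float = 1.0) -> float:
    """Percent of sequences containing the epitope at >= min_identity."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(best_identity(epitope, s) >= min_identity for s in sequences)
    return 100.0 * hits / len(sequences)


if __name__ == "__main__":
    # Toy sequences invented for the example, not real APN1 orthologs.
    orthologs = ["AAVDERYRLAA", "GGVDERYKLGG", "TTTTTTTTTTT"]
    print(round(conservancy("VDERYRL", orthologs), 1))
    print(round(conservancy("VDERYRL", orthologs, min_identity=0.85), 1))
```

Relaxing `min_identity` below 1.0 counts near-identical variants as conserved, which can matter when comparing epitopes across closely related Anopheles species.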

Population coverage:
The population coverage of the predicted epitopes was analyzed based on their binding to alleles in sixteen ethnic groups and geographical regions across the world. High population coverage was found for all putative helper T-cell epitopes and CTL epitopes in these 16 geographic regions. The percentage population coverage for the selected MHC I epitope 'RRYLATTQF' and MHC II epitope 'LKATFTVSI' of the APN1 protein is shown in Figure 5. The 3D structures of the proposed CTL, HTL and B cell epitopes of the An. culicifacies APN1 protein were illustrated using PyMOL (Figure 6). An ASA plot for the APN model over the residues of all three epitopes was also produced. The extent to which an amino acid interacts with the solvent and with the protein core is naturally proportional to the surface area exposed to these environments.
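A rough feel for how allele binding translates into population coverage can be obtained from allele frequencies alone. The sketch below is a simplified model and not the IEDB population coverage algorithm: it assumes Hardy-Weinberg proportions within each HLA locus and independence between loci, so an individual is counted as covered if at least one of their alleles is among those bound by the epitope. The function names and the frequencies in the example are invented.

```python
from typing import Dict, Sequence


def locus_coverage(bound_allele_freqs: Sequence[float]) -> float:
    """Fraction of individuals carrying at least one bound allele at a locus.

    With p the summed frequency of bound alleles, the probability that
    both copies are unbound is (1 - p)^2 under Hardy-Weinberg assumptions.
    """
    p = min(sum(bound_allele_freqs), 1.0)
    return 1.0 - (1.0 - p) ** 2


def population_coverage(freqs_by_locus: Dict[str, Sequence[float]]) -> float:
    """Combine loci, assuming they are inherited independently."""
    not_covered = 1.0
    for freqs in freqs_by_locus.values():
        not_covered *= 1.0 - locus_coverage(freqs)
    return 1.0 - not_covered


if __name__ == "__main__":
    # Hypothetical frequencies of epitope-binding alleles in one population.
    bound = {"locus_A": [0.10, 0.20], "locus_B": [0.50]}
    print(f"{100 * population_coverage(bound):.2f}% covered")
```

Real HLA loci are in linkage disequilibrium, so the independence assumption tends to be optimistic; dedicated tools work from population-specific genotype data instead.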

Docking simulation:
Binding interactions between epitopes and HLA alleles were assessed using AutoDock Vina. The 3D structures of the epitopes were predicted using PEP-FOLD, and energy minimization was carried out using YASARA. Three-dimensional structures of the receptors were obtained from RCSB; the receptors used for docking were previously reported HLAs. The epitope RRYLATTQF was used as the ligand for the HLA class I alleles. Grid coordinates were selected on each receptor molecule for docking with its epitope, and a spacing of 1 Å was used to define the binding site. The grid box was positioned carefully so that the ligands docked at the binding groove of the receptors. The binding energies of the predicted epitope with the respective allele receptors are shown in Table 4. HLA-C*07:02 showed the best interaction with the RRYLATTQF epitope, with the lowest binding energy (-8.4 kcal/mol). The predicted peptides showed significant binding affinities with all HLAs (Figure 7). The more negative the ΔG binding value, the stronger the interaction between the epitope and the HLA. The binding energies of the predicted epitopes were also compared with those of experimentally verified peptides and were found to be negative. Similarly, molecular docking simulations of the epitope LKATFTVSI with HLA class II alleles are shown in Figure 8. The LKATFTVSI-HLA-DRB1*11:01 complex showed the lowest ΔG binding value (-7.9 kcal/mol) among all the complexes (Table 5). The strong binding affinities suggest that a peptide vaccine designed using these epitopes may work efficiently in vivo to elicit humoral and cell-mediated immunity.
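The statement that a more negative ΔG implies stronger binding follows from the thermodynamic relation ΔG = RT ln(Kd). The sketch below converts a binding free energy into a nominal dissociation constant under that relation, assuming a temperature of 298.15 K. Docking scores are empirical estimates rather than measured free energies, so the resulting values should be read as a way to compare complexes, not as predicted affinities; the function name is illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal_per_mol: float,
                    temperature_k: float = 298.15) -> float:
    """Nominal dissociation constant (mol/L) from delta G = RT ln(Kd)."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    # The two binding energies reported in the text above, in kcal/mol.
    for delta_g in (-8.4, -7.9):
        print(f"dG = {delta_g} kcal/mol -> nominal Kd = "
              f"{kd_from_delta_g(delta_g):.2e} M")
```

Because the relation is exponential, a difference of 0.5 kcal/mol between two complexes corresponds to a little over a twofold difference in nominal Kd at room temperature.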

Discussion:
A malaria transmission blocking vaccine helps control malaria without causing ecological imbalance. In the present study, the most potent B and T cell epitopes for a transmission blocking vaccine were predicted in the APN1 protein of An. culicifacies using computational techniques. APN1 was predicted to be immunogenic by the VaxiJen server, and it has also been indicated as a lead TBV candidate [5]. The analysis of the secondary structure of APN1 revealed that its antigenic part is more likely to be the beta sheet region, as also reported elsewhere [40]. The presence of threonine residues (10.5%), predominantly in the beta sheet, also indicates the protein's antigenicity. The predicted negative value (-0.096) of the grand average of hydropathicity (GRAVY) of this protein sequence indicates its hydrophilic nature and suggests that most residues are located on the surface. In addition, this protein is stable and aliphatic in nature because its instability index (33.25) is smaller than 40 and its aliphatic index (85.53) is high. A high aliphatic index seems to be responsible for increasing the thermostability of globular proteins. A higher proportion of coiled regions also provides more stability.
B and T cell epitopes are involved in humoral and cell-mediated immunity. B cell epitopes are of two types: linear epitopes and conformational epitopes. We predicted three linear (continuous) epitopes whose scores were above the threshold values of five algorithms available on IEDB: Parker hydrophilicity, Emini surface accessibility, Chou and Fasman beta-turn prediction, Kolaskar and Tongaonkar antigenicity and BepiPred linear epitope prediction. Scores above the threshold level in all five algorithms indicate that these candidate epitopes (VDERYRL, MPQQETFN and TVFQRTP) could be effective antigenic peptides for B cell responses. The localization of conformational (discontinuous) epitopes on the A and B chains of the APN1 protein, using a 3D representation of the residues, revealed that the presumptive antigenic epitope sequences are placed in a way that enables direct interaction with immune receptors. The B-cell epitope residues 66VDERYRL72, situated on the surface of the B chain of the APN1 protein, had a good Protrusion Index (PI) score (0.738), indicative of high accessibility. An ellipsoid PI value of 0.73 indicates that 73% of the protein residues lie within the ellipsoid and the remaining 27% lie outside. The PI score and solvent accessibility are directly proportional to each other: the higher the PI score, the greater the solvent accessibility of the residues. Thus, these could be putative vaccine candidates.
T-cell-based vaccine development seems promising because, through antigenic drift, foreign particles can readily escape the antibody memory response. In addition, T-cell-mediated immunity tends to be long lasting. A peptide is considered a good epitope candidate if it passes several criteria: antigenicity, non-allergenicity, high immunogenicity, good conservancy, good interaction with HLA molecules and sufficient population coverage. In the present study, the epitope NLAERTMLI emerged as a potential candidate because it had the maximum number of HLA binding alleles among the CTL epitopes, although it had lower conservancy and a lower combined score. This inconsistency in the immunological features of epitopes indicates that other parameters are also needed for screening. An epitope should be highly conserved among different species of Anopheles. The conservancy analysis indicated that RRYLATTQF had the maximum conservation across almost all Anopheles species considered in this study. It also had a higher combined score and immunogenicity score than NLAERTMLI. Armistead et al. (2014) indicated that a 135-amino-acid fragment located at residues 60-195 of An. gambiae APN1 is safe and highly immunogenic in murine models, even in the absence of an adjuvant. Interestingly, the CTL epitope (RRYLATTQF) and the B cell epitope (VDERYRL) predicted in the present study coincide with this location.
The maximum number of allele binding interactions with MHC class II was observed for the epitope LKATFTVSI using the IEDB server. This epitope was also predicted to have the maximum conservancy among the epitopes. These epitopes were non-allergenic and antigenic. The peptides that fulfilled the above parameters, RRYLATTQF for MHC class I and LKATFTVSI for MHC class II, were chosen for docking studies. Docking simulation of the predicted MHC peptides with HLA molecules was performed to assess whether the designed epitopes are likely to elicit sufficient immunological responses in vivo. The binding energy of the predicted MHC I epitope with the HLA-B*27:05 receptor was -7.9 kcal/mol, lower than the binding energy reported for the predicted Nipah virus V protein epitope (NPTAVPFTL) with HLA-B*27:05 (-3.13 kcal/mol) [43]. The interaction between the epitope and the HLA is stronger when the ΔG binding value is more negative. Similar results were found in the molecular docking simulation between the MHC class II-restricted epitope and HLA. The LKATFTVSI-HLA-DRB1*11:01 complex had the lowest binding energy (-7.6 kcal/mol) of all the studied complexes. The strong binding affinity suggests that a peptide vaccine designed using these selected epitopes might work well in vivo to elicit cell-mediated and humoral immunity.
HLA is highly polymorphic across ethnic populations, and HLA proteins restrict the response to T-cell epitopes. Therefore, to stimulate immune responses in human populations across the world, the HLA specificity of T-cell epitopes has to be considered a main criterion for epitope selection. Accordingly, epitope candidates should bind the maximum number of HLA alleles to achieve better population coverage. In this study, the five HTL and CTL epitopes showed good population coverage (74% for MHC I and 59% for MHC II on average) and reached above-average values in the European, North American, North African and South Asian populations. Further analysis is reported for the CTL epitope RRYLATTQF (33%) for MHC class I and the helper T-cell epitope LKATFTVSI (60%) for MHC class II (which bind the maximum number of HLA alleles). It should be noted that An. culicifacies is a prominent species in India. NPTAVPFTL for MHC class I shows the highest population coverage in India. These epitopes have good population coverage and may provide broad immune protection to human beings from different regions of the world. The predicted CTL epitope RRYLATTQF for cellular immunity,