Linear epitope prediction in HPV type 16 E7 antigen and their docked interaction with human TMEM 50A structural model

Human Papilloma Virus (HPV) HPV type 16 E7 antigen is a known target in cervical cancer. We report the predicted potential epitopes in the E7 antigen. We further describe the subsequent interaction of these linear epitope peptides with the human TMEM 50 A structural model using molecular docking. This data finds application in the development of components towards HPV associated disease prevention.


Model refinement and evaluation:
The SAV server, which include various tools such as WHATIF [10], PROCHECK [11], PROVE, ERRAT [12], VERIFY-3D [13] and ProSA was used for model refinement. The stereo chemical quality and accuracy of the predicted models was evaluated with PROCHECK by Ramachandran plot analysis [14]. The best model was selected on the basis of overall G-factor, number of residues in core, allowed, generously allowed and disallowed regions. The selected model was further analyzed by VERIFY 3D, WHAT IF and ERRAT programme. ProSA was used to display Z-scores, energy plots and visualized with Discovery studio Modeler 3.5.

The molecular dynamic simulation of TMEM 50A
The Schrodinger software package (https://www.schrodinger .com) was used for the molecular dynamics simulation of the predicted TMEM 50A model.

Active sites prediction of TMEM 50A:
We used Q-SiteFinder to locate binding site. Q-SiteFinder (http://www.bioinformatics.leeds.ac.uk/qsitefinder) uses the interaction energy between the protein and a simple van der Waals probe to locate energetically favorable binding sites. It also generates predicted sites with the lowest average volumes of the protein. The Active site finder (AADS) [15] was used to identify all cavities in a protein and scores them based on the physicochemical properties of functional groups lining the cavities in the protein. It is a freely available at http://www.scfbio-iitd.res.in/dock/ActiveSite_new.jsp.

Computational epitope prediction for HPV type 16 E7 proteins:
The FASTA amino acid sequence of HPV type16 E7 protein was retrieved from Swiss--Prot (Accession No. P03129) at http: //expasy.org/sprot/. The protein sequence of HPV 16 E7 was submitted to BCPREDS server (http://ailab.cs.iastate.edu/bcpreds/) and ABCPred online tool available at (http://www.imtech.res.in/raghava/abcpred) [16] for B-cell epitope prediction. IEDB analysis tool was used (http://tools.immuneepitope.org) was used for T-cell epitope prediction. This server covers a broad range of tools facilitating the prediction of new B-cell and T-cell epitopes in proteins of interest, and tools for the analysis of epitopes sets collected from within the IEDB. Alleles considered for prediction in this study include HLA-A*01:01, HLA-A*02:01, HLA-DRB1*01:02 and HLA-DRB1*01:01.

Docking of Human TMEM 50A (human Transmembrane Protein) with predicted T-cell epitopes of HPV type 16 E7 antigens:
The 3D structure prediction of all the HLA allele specific predicted epitopes was completed using the HHPred server (http://toolkit.tuebingen.mpg.de/hhpred). The antigen-antibody docking of modeled epitopes of HPV and human protein TMEM 50A were completed using the PatchDock server (http://bioinfo3d.cs.tau.ac.il/PatchDock). PatchDock is a geometry-based molecular docking algorithm, which is aimed at finding docking transformation, Global energy, attractive vdW, repulsive vdW, hydrogen bond and ACE that yield good molecular shape complementarity [17]. The top ranked predicted epitopes were also docked with TMEM 50A (receptor) human Trans membrane protein model. The epitope-receptor docked complex were further refined using the FireDock program (http://bioinfo3d.cs.tau.ac.il/FireDock/). FireDock is an efficient method for refinement and re-scoring of rigid-body proteinprotein docking solutions [18]. The docked complexes based on their energy scores (kJ/mole), a comparative analysis was carried out and visualized with the help of Discovery studio 3.5.

Prediction of protein-protein interaction of docked molecules:
InterProSurf was used for protein-protein interactions studies of the docked molecules, which investigated the role of hydrogen bond formation, hydrophobic residues and overall electrostatics with total accessible surface area of the complex protein molecules, which includes polar area/energy and apolar area/energy. InterProSurf analyzed each chain within the complex and provides interface residues, interface area of each residue and a change in the surface area of each residue upon complex formation. It is freely available at http://curie.utmb.edu/prosurf.html [19].

Results and Discussion: HPV 16 E7 Protein sequence Analysis:
Analysis of HPV 16 E7 ProtPram provides molecular weight (11.02 kD), theoretical pI (4.20), protein iso-electric point as pH 3.97 and total number of negatively charged residues (Asp + Glu) as 19, total number of positively charged residues (Arg + Lys) as five and ext. coefficient was 6335. The estimated half-life is: 30 hours (mammalian reticulocytes, in vitro) >20 hours (yeast, in vivo), total number of atoms found as 1499 and instability index second was computed to be 63.00. This classified the protein HPV type 16 E7 as stable and it also measured aliphatic index as 78.57 and grand average of hydropathicity (GRAVY) as 0.405.

Characterization of TMEM 50A (Human protein):
TMHMM server 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and Target-P 1.1 server (http://www.cbs.dtu.dk/services/ TargetP/) [20] predicted the presence of four transmembrane domains within the signal peptide, which may be required to direct the protein to secretary pathway. Physiological properties found as molecular weight (17400Da), basal is electronic point (5.57), 17388.70 (monoisotopic mass) and the result of K-NN prediction using PSORT-II gives percentage of plasma-membrane (43.5%), endoplasmic reticulum (26.1%), vacuolar (17.4%), Golgi bodies (4.3%), mitochondrial (4.3%). Post-translation modification using PhosphoSitePlus predicted two serine phosphorylation sites found at amino acid 82 and 84 residue, one possible N-linked glycosylation site located at amino acid 74, and one possible tyrosine phosphorylation site is found. Expasy's ProtPram proteomics server predicted the instability index (II) was computed to be 28.66, aliphatic index score as 95.10 and grand average of hydropathicity (GRAVY) as 0.580 which classified the protein as stable.

Model refinement and evaluation of Human TMEM 50A
PROCHECK server provided Ramachandran plot analysis of the predicted model. The best model in terms of stereo chemical quality showed Overall G-factor value of -0.38 which indicates that geometry of the model corresponds to high probability confirmation with 85.8% Residues in most favored regions [A, B, L], 9.0% residues in additional allowed regions [a, b, l, p], 2.2% residues in generously allowed regions [~a, ~b, ~l, ~p] and 3.0% of the residues was present in the disallowed region of the plot. Verify 3D analysis revealed that 80% of the residues had an average 3D-1D score of <0.2, predicting that the model is compatible with its sequence. The amino acid environment was evaluated using ERRAT plots, which assess the distribution of different types of atoms with respect to one another in the protein model and is used for making decision about its reliability. ERRAT showed an overall quality factor of 31.544, a result expected for crystallographic models with resolution>2.5A. Bfactor analysis is done with WHAT IF server reflected the mobility or flexibility of various parts of the molecule. Averaged B-factor deviation for protein backbone was 0.082 (Z score mean) and averaged standard deviation was 2.011. Since average deviation value was less than standard deviation, so it reflected a good quality model Figure 3.

Molecular dynamic simulation of TMEM 50A
The Molecular dynamic stability of TMEM 50A protein has been established by performing the molecular dynamic simulation study using Schrodinger software packages showed the energy of the molecule, which was found constant throughout the time period of simulation. In Figure 4 the radius of gyration increases in between 200 to 400 ps time but after 400 ps it decreases up to 1000 ps and finally omens almost constant for rest of the duration of the simulation. This suggests that the Human TMEM 50A protein model has a compact structure, which provides the required stability to the molecule. In Figure 5 the root mean square deviation (RMSD) during the simulation was increasing in the beginning and at 600 ps it showed highest peak value but after that it becomes almost constant for rest of the duration of the simulation. This suggests that the Human TMEM 50Amodel has lesser RMSD for Its backbone and it also has less flexibility, indicating that this protein has a stable dynamic behavior structure. In Figure 6 the root mean square (RMS) fluctuations were very less because most of the atoms were free from the RMS fluctuations but some atoms showed this at C and N terminal due to the presence of loop regions of modeled TMEM 50A.

Active site prediction result:
Q-SiteFinder predicted site volume and protein volume in cubic Angstrom with the minimum and maximum co-ordinates [21] and Active site finder (AADS) generated top 15 cavity points for TMEM 50A each pocket was represented by a single cavity point. Then it was sorted into cavities in the descending order of their volumes. The cavities generated in protein along with the Cartesian coordinates of the cavities are shown in Table 2. Table 3 and Table 4 shows the predicted T-cell MHC class-I, epitopes for alleles HLA-A*01:01 and HLA-A*02:01 respectively. These small peptides (epitopes) were predicted by the IEDB analysis tool and ranked according to their consensus percentile rank. These percentile ranks refer to the percentage of rank out of one million random peptides from swiss-prot proteins [22]. In this the tap scores of the predicted epitopes were measured as local sequence to structure fitness based on torsion angle propensities normalized against the global minimum and maximum. In table 3 top four epitopes MHGDTPTLHE, HGDTPTLHEY, GDTPTLHEYM and DTPTLHEYML were selected for docking with receptor protein. While In the table 4 the score rank of ANN and SMM were shown in the units of IC50 (nM), which demonstrated that better binders have a low value of ANN and SMM rank. In this data include measured binding affinities for a total of 15 ranked epitopes for allele HLA-A*02:01. The top ranked epitopes YMLDLQPETT, DLLMGTLGIV, LLMGTLGIVC and TLEDLLMGTL was selected for epitope modeling and docking process. In particular a value of 500 (IC50) is often used as the threshold between binders and non-binders. So, peptides sequence YMLDLQPETT, DLLMGTLGIV, LLMGTLGIVC and LLMGTLGIVC represents highest affinity for binding with receptor proteins.

MHC Class I binding epitope prediction:
MHC class II binding epitope prediction for allele HLA DRB1*01:01: Table 5 and Table 6 shows peptides for MHC Class-II alleles HLA-DRB1*01:01 and HLA-DRB1*01:02 predicted by IEDB analysis tool which were ranked according to their consensus percentile rank. In these predicted epitopes HVDIRTLEDLLMGTL, IRTLEDLLMGTLGIV, VDIRTLEDLLMGTLG and RTLEDLLMGTLGIVC were selected for docking with TMEM 50A. In Table 6, for each peptide, a percentile rank for Sturniolo method was generated by comparing the peptides score against the score of five million random isomers selected from Swiss-Prot database. It should be noted that small numbered percentile rank indicates high affinity. So the peptide sequences DLLMGTLGIVCPICS, LLMGTLGIVCPICSQ, LMGTLGIVCPICSQK and MGTLGIVCPICSQKP represents high affinity towards receptor proteins because these contained same percentile score and Sturniolo rank of 0.78.

Docking of MHC Class-I epitopes and TMEM 50A
Top ranked predicted epitopes for different alleles of MHC Class I was docked with human TMEM 50A receptor using Patch Dock server. The results have been shown in the Table 7 and Table 8. The outputs of these tables were ranked by the global energy value, which was represented as the binding energy of the docked structures. ACE was used to calculate the free energies of transferring side-chains from protein interior into water. An ACE provides a reasonably accurate and rapidly calculated salvation component of free energy, and thus makes possible a range of docking, design and protein folding calculations [23]. Besides this, attractive Vdw and repulsive Vdw show the contribution of the Van der Waals forces to the global binding energy. In Table 7, docked structure of modeled epitope MHGDTPTLHE of MHC class-I (HLA-A*01:01) and TMEM 50A showed global energy as -16.45 while other modeled epitopes HGDTPTLHEY and DTPTLHEYML of this class showed global energy as -60.15 and -40.25 with TMEM 50A receptor protein while in Table 8 docked structure of modeled epitope YMLDLQPETT of MHC class I (HLA-A*02:01) and TMEM 50A showed global energy as -60.59.

Docking of MHC class-II epitopes and TMEM 50A
Top ranked predicted epitopes for different alleles of MHC class-II were docked with human TMEM 50A receptor using Patch Dock server. The results are shown in the Table 9 and  Figure 7 docked molecule showed binding of modeled epitope YMLDLQPETT of MHC Class-I (HLA-A*02:01) with human TMEM 50A receptor protein with global energy -60.59, attractive Vdw -24.24 and repulsive Vdw 26.54. In this, the contribution of the atomic contact energy (ACE) and hydrogen bonds (HB) energy to the global binding energy was found as -9.33 and -1.39. After this InterProSurf server predicts the total accessible surface area of this docked structures as 1568.43 in which polar area/energy was 610.73 and apolar area/energy was 957.70 with 1.4 prob radiuses.
In  Figure 9 docked molecules showed binding of modeled epitope DLLMGTLGIVCPICS of MHC-II (HLA-DRB1*01:02) to with human TMEM 50A receptor protein with global energy -47.59, attractive vdW -29.83 and repulsive vdW 7.75, which gave more accurate vaccine candidate structure in comparison to rest of the molecules. In this the contribution of the atomic contact energy (ACE) and hydrogen bonds (HBs) energy to the global binding energy was found as -11.27 and -0.95.     Note: In this Iden 1 shows the percentage sequence identity of the templates in the threading aligned region and Ident 2 is the percentage sequence identity of the whole template chains with the TMEM 50A sequence and Norm Z-score is the normalized Z-score of the threading alignments. Alignment with a Normalized Z-score > 1 means a good alignment.        Table 9: Docking using FireDock for HLA-DRB1*01:01 specific epitopes with human MEM 50A    Beside this, InterProSurf server predicted the total accessible surface area of the docked structure as 11286.18 in which polar area/energy were 2870.52 and apolar area/energy was 8415.65 with 1.4prob radiuses. It also predicted the total number of surface atoms as 859 and total buried atoms were 371. In Figure  10, docked molecule showed binding of modeled epitope HVDIRTLEDLLMGTL of MHC Class-II (HLA-DRB1*01:01) to with human TMEM 50A receptor protein. ) In which modeled epitope protein showed in blue color and human TMEM 50 A shown in green color visualized with Discovery Studio 3.5 client tool with global energy -38.45, attractive Vdw -26.15 and repulsive Vdw 27.01 which gave more accurate vaccine candidate structure in comparison to rest of the molecules. In this the contribution of the atomic contact energy (ACE) and hydrogen bonds (HBs) energy to the global binding energy was found as -9.33 and -1.39. After this InterProSurf server predicted the total accessible surface area of this docked structure as 9569.38 in which polar area/energy was 2740.68 and apolar area/energy was 6828.70 with 1.4 prob radiuses. In this total surface atoms and number of buried atoms were found as 775 and 455. Amino acid residues found at protein interface were GLU, ARG and CYS at residue number 6, 9 and 10 with complex area 46.11, 74.52 and 98.17.