Naturally occurring capsid protein variants L1 of human papillomavirus genotype 16 in Morocco

The HPV L1 protein is a cornerstone of HPV structure: it forms the viral capsid, is widely used as systematic material, and is considered the main component in vaccine development and production. The present study aims to characterize the genetic variation of the L1 gene in HPV 16 specimens and to evaluate in silico the impact of the major variants on epitope changes affecting its conformational structure. A fragment of the L1 gene from 35 confirmed HPV 16 specimens was amplified by PCR and sequenced. Overall, five amino acid changes were found: T390P in 18 specimens, M425I and M431I in 2 cases, and an insertion of serine at position 460 and a deletion of aspartic acid at position 477 in all analyzed cases. The generated 3D model showed that the T389P substitution is located in the H-I loop and that the two substitutions M424I and M430I are both located in the H2 helix. The serine insertion and the aspartic acid deletion are located in the H4 helix and the B-C loop, respectively. Superimposition of the modeled structures showed that they share a very similar conformation, indicating that the reported amino acid variations do not affect the structure of the L1 protein. However, T389P, which is located in the H-I loop (identified as an immunogenic region of the L1 capsid) and was found in 51.4% of cases, could interact with vaccine-induced monoclonal antibodies, suggesting a potential impact on the efficacy of available anti-HPV vaccines.


Background:
Worldwide, cervical cancer is the fourth most common cancer in women, with an estimated 528,000 new cases in 2012. More than 85% of the global burden occurs in developing countries, where it accounts for 13% of all female cancers [1]. There is evidence that persistent infection with high-risk human papillomavirus (HPV) is the main etiological factor in the development of cervical cancer [2]. Of the high-risk types, HPV-16 and HPV-18 are responsible for about 70% of cervical cancers [3]. HPV genomes are circular dsDNA characterized by eight open reading frames (ORFs), which are all transcribed from the same DNA strand and in the same orientation. They yield two classes of proteins, classified on the basis of their temporal expression as non-structural regulatory proteins (E1-E7) and structural proteins (L1 and L2). Infectious HPV is primarily composed of 72 pentameric capsomeres of the L1 protein arranged in a T = 7 icosahedral capsid; the capsomeres are associated with 12 or more copies of the L2 protein [4]. The intrinsic capacity of L1 proteins to assemble into empty capsid-like structures has been used to develop virus-like particles (VLPs), which are widely used to induce protective immunity in animal models [5] and to develop prophylactic vaccines against HPV infection [6][7][8]. Accordingly, two prophylactic vaccines, Cervarix (GSK) and Gardasil (Merck), based on the L1 proteins of HPV16 and HPV18, have been introduced into the immunization schedule in many developed and some developing countries [9].
On the surface of the pentamers, specific loop structures of the L1 protein contain type-specific epitopes [10], and the vaccine-induced type-specific protection is likely mediated by neutralizing antibodies targeting the surface-exposed loops of L1.
Studies with monoclonal antibodies suggest that epitopes composed of the FG and HI loops are immunodominant for HPV 16 [11][12], whereas the BC, DE and HI loops are important for neutralization of HPV 6 and 11 [13]. Polymorphism within these loops is likely to result in the generation of neutralizing antibodies with different binding affinities, because different HPV types display distinct features on their surfaces [14]. Given that the prevalence of cervical cancer varies between regions and countries, a number of studies have addressed the possible association of E6-based HPV16 variant status with different risks of progression to malignancy and have suggested that HPV variants can influence viral persistence and the development of cervical cancer [15][16][17][18]. Naturally occurring intratypic molecular variants of HPV-16 are defined as isolates whose primary DNA sequence differs from the prototype sequence by no more than 2% of the L1 open reading frame (ORF) [19]. Such variants have been shown to be specific to, or more prevalent in, certain parts of the world [20].
Previous studies have reported that variations in the L1 gene can affect viral assembly by altering the protein structure or conformation, leading to altered biological functions with clinical significance, including immunological recognition by the host [21][22]. In addition, intratypic L1 variants can restrict the immune response by escaping the consensus B- and T-cell epitopes of the available vaccines. These variants may also provide new epitopes for targeting a particular geographical population, which may not be presented by the available vaccines [23]. In Morocco, as in the other North African countries, cervical cancer is the second most common cancer among women, and its incidence is the highest in this region, with an age-standardized incidence rate (ASR) of 13.5 per 100,000 women [24]. In our previous HPV monitoring studies, we showed that the two most prevalent high-risk HPV types among women, before and at the time of introduction of HPV vaccination in Morocco, were HPV16 and HPV18 [25-28]. We also analyzed the intratypic variation of HPV16 based on the naturally occurring sequence variations of the E6 and E7 genes, to obtain a global picture of the HPV16 variants circulating in Morocco [17][18]. However, to the best of our knowledge, no study has reported on the L1 variants of HPV16 in Morocco. Thus, the present study was planned to characterize the genetic variation of the L1 gene in a sample of Moroccan women with cervical cancer, to identify the L1 HPV16 variants circulating in Morocco, and to evaluate in silico the impact of the major L1 variants on epitope changes affecting the conformational immune-reactive epitope regions of HPV16.

Methodology:

Clinical Specimens:
DNAs from 35 HPV16-positive cervical cancer samples were available from our laboratory DNA bank [17][18]. Overall, 36 cases (90%) were diagnosed at advanced stages (IIB and IIIB), whereas 4 patients (10%) were admitted at an earlier stage (IB). Pathological analysis revealed that all cases were squamous cell carcinoma (SCC), except one case, which was a well-differentiated adenocarcinoma. The ethics committee of the Pasteur Institute in Morocco approved the study, and written informed consent was obtained from each study subject.

Variant analysis of L1 gene by PCR and direct sequencing:
A 450 bp fragment of the L1 gene was amplified using the MY09/MY11 consensus primers (MY09: 5'-GCMCAGGGWCATAAYAATGG-3'; MY11: 5'-CGTCCMARRGGAWACTGA-3'). PCR amplification was performed in a 25 µl volume containing 1.5 mM MgCl2, 100 µM of each dNTP, 0.2 µM of forward and reverse primers, 100 ng of genomic DNA and 0.25 U of Gold Taq DNA polymerase (Applied Biosystems, USA) in 1x PCR buffer. The amplification mixtures were first denatured at 94°C for 7 min. Then, thirty-five cycles of PCR were performed, with denaturation at 94°C for 1 min, primer annealing at 55°C for 1 min and primer extension at 72°C for 1 min. At the end of the last cycle, the mixtures were incubated at 72°C for 7 min. For every reaction, a positive control, using DNA extracted from SiHa (an HPV16-positive cell line), and a negative control, without template DNA, were included. PCR products were checked on a 2% agarose gel stained with ethidium bromide.
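The consensus primers above contain IUPAC degenerate bases (M, W, Y, R), so each primer stands for a pool of unambiguous oligonucleotides. The following Python sketch, given for illustration only, enumerates that pool. The function name `expand_primer` is hypothetical, and the primer strings are copied as printed in the text.

```python
from itertools import product

# IUPAC codes that occur in the two primers (a subset of the full code).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",  # A or C
    "R": "AG",  # A or G
    "W": "AT",  # A or T
    "Y": "CT",  # C or T
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences as printed in the text (labels follow the text).
primers = {
    "MY09": "GCMCAGGGWCATAAYAATGG",
    "MY11": "CGTCCMARRGGAWACTGA",
}

# Number of distinct oligonucleotides represented by each primer.
pool_sizes = {name: len(expand_primer(seq)) for name, seq in primers.items()}

for name, size in pool_sizes.items():
    print(name, size)
```

Each degenerate position with two alternatives doubles the pool, so three such positions give 8 sequences and four give 16.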
The ExoSAP-IT clean-up system (USB, USA) was used to purify positive PCR products. Sequencing of purified PCR products was performed with the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). The sequencing reaction was performed in a final volume of 10 µl containing 1 µl of BigDye v3.1, 10 pmol of forward primer and 2 µl of purified PCR product. The mixture was incubated at 96°C for 1 min, followed by 25 cycles of denaturation at 96°C for 10 s, primer annealing at 50°C for 5 s and extension at 60°C for 4 min. The reaction was then set to 30 µl. To eliminate the excess of labeled ddNTPs, sequencing reaction products were purified using Sephadex G-50 gel-exclusion chromatography (GE Healthcare Life Sciences). Direct sequencing of amplified PCR products was performed on an ABI 3130xL Genetic Analyzer (Applied Biosystems).

Sequence alignment:
The compiled nucleotide sequences were aligned using ClustalW2 software [http://www.ebi.ac.uk/clustalw/] and the protein sequences were aligned using MUSCLE (MUltiple Sequence Comparison by Log-Expectation) [29]. SeqLogo was used to generate sequence logos from the amino acid sequence alignment, to evaluate sequence variability and to represent information on the consensus sequence [30].
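In a sequence logo, the height of each column is conventionally the information content in bits, that is, log2(20) minus the Shannon entropy of the residue frequencies for a protein alignment. The Python sketch below illustrates that calculation under simplifying assumptions: the small-sample correction applied by logo tools is omitted, the function names are hypothetical, and the toy alignment is invented for illustration and is not taken from the study data.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Gaps ('-') are ignored; no small-sample correction is applied.
    """
    residues = [r for r in column if r != "-"]
    total = len(residues)
    if total == 0:
        return 0.0  # a gap-only column carries no information
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

def alignment_information(sequences):
    """Per-column information content of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*sequences)]

# Hypothetical toy alignment: column 1 is fully conserved, column 2 is
# split evenly between two residues (as for a variant found in half of
# the isolates), column 3 is conserved.
toy_alignment = ["KTP", "KTP", "KPP", "KPP"]
bits = alignment_information(toy_alignment)

for position, value in enumerate(bits, start=1):
    print(position, round(value, 3))
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues loses exactly one bit.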

3D prediction of partial L1 protein:
The secondary structure of the HPV16 L1 protein sequences was predicted using the Swiss Model Server (http://swissmodel.expasy.org). Predicted 3D structures of HPV16 L1 were obtained using the PHYRE2 Server [31]. The alignment was performed against the crystal structure of the HPV16 L1 protein retrieved from the Protein Data Bank (PDB ID: 1dzl); this diffraction structure has a resolution of 3.5 Å, an R-free value of 0.290 and an R value of 0.280. The PROCHECK server was used to evaluate the stereochemical quality of the protein models and to analyze the residue-by-residue geometry and the overall structure geometry [32]. The predicted partial protein structures of HPV16 L1 carrying the amino acid variations were generated, visualized and analyzed with the PyMOL program [33].

Results:
Results from the multiple sequence alignment of the 35 samples with the L1 reference sequence (ID: K02718.1) are reported in Table 1. Overall, 11 patterns were identified, comprising a total of 17 nucleotide changes. Among them, 5 were non-synonymous, including A6693C observed in 18 cases, G6800A in 2 cases and G6818A in 2 cases. Of particular interest, an ATC insertion at position 6901 and a deletion of GAT at position 6950 were common to all analyzed samples.

Table 1 (codon rows):
Original codon: GCA GGC CAC ACT AAG ATG ATG TTG CTA CCC ACA ACT - AAA GAT TAC ACT
Altered codon: GCC GGT CAT CCT AAA ATA ATA TTA TTA CCT ACC ACG ATC AAG Del TAT

The 3D structure of the L1 protein generated by PyMOL, representing an individual L1 protein molecule resulting from expression of the HPV16 L1 ORF, is shown in Figure 3. This structure showed the presence of 12 helices, 25 sheets and 26 loops. The DE and FG loops are known to interact with the L2 protein and to elicit conformational neutralizing antibodies. The variable regions of the HI and BC loops are thought to be involved in the interaction with the trans-regulatory protein E2. The 3D cartoon structures of the L1 prototype and of the three obtained variants are illustrated in Figure 5. The 4 generated models were of good quality, with a score of 1 obtained by Global Model Quality Estimation (QMEAN6 score). Superimposition of the modeled structures showed that they share a very similar conformation, indicating that the reported amino acid variations do not affect the structure of the L1 protein.
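The distinction between synonymous and non-synonymous changes among the codon substitutions of Table 1 can be checked by translating each codon pair with the standard genetic code. The Python sketch below is illustrative only: it assumes that the original and altered codons pair up in the order in which they are listed, leaves out the insertion, the deletion and the final entry (whose altered codon is not available), and uses hypothetical names such as `classify`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codon substitutions from Table 1 (assumed pairing by listing order).
substitutions = [
    ("GCA", "GCC"), ("GGC", "GGT"), ("CAC", "CAT"), ("ACT", "CCT"),
    ("AAG", "AAA"), ("ATG", "ATA"), ("ATG", "ATA"), ("TTG", "TTA"),
    ("CTA", "TTA"), ("CCC", "CCT"), ("ACA", "ACC"), ("ACT", "ACG"),
    ("AAA", "AAG"), ("TAC", "TAT"),
]

def classify(pairs):
    """Split codon pairs into synonymous and non-synonymous changes."""
    synonymous, non_synonymous = [], []
    for original, altered in pairs:
        aa_before, aa_after = CODON_TABLE[original], CODON_TABLE[altered]
        record = (original, altered, aa_before, aa_after)
        if aa_before == aa_after:
            synonymous.append(record)
        else:
            non_synonymous.append(record)
    return synonymous, non_synonymous

synonymous, non_synonymous = classify(substitutions)

for original, altered, aa_before, aa_after in non_synonymous:
    print(f"{original}>{altered}: {aa_before}>{aa_after}")
```

Under this pairing, three substitutions change the encoded residue (one Thr to Pro and two Met to Ile); together with the insertion and the deletion, this gives the five amino acid changes reported.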

Figure 2:
Variability of amino acid residues using SeqLogo. The twenty-three sequences were used to estimate amino acid residue variation using SeqLogo. Sequence conservation is represented by the height of the residue logos (indicated in bits), and the arrow indicates the change in amino acid residues.

The partial sequence of the L1 gene was used for a phylogenetic analysis of HPV16 circulating in Morocco. Two different groups, belonging to the African 1 and African 2 lineages, prevail, with 37.1% and 51.5% respectively. These results are in agreement with our previously reported data on the same isolates using the E6 and E7 genes. Indeed, DNA sequencing of the E6 and E7 genes highlighted the predominance of HPV16 African variants, with the majority of isolates belonging to the African 1 and 2 lineages [17].
A computer model of the L1 protein was generated for each pattern to assess the impact of the non-synonymous mutations on the structure of the major capsid protein and therefore on its potential interaction with available vaccines. Of particular interest, fine-mapping of the epitope footprint showed that the five non-synonymous amino acid changes, including the ATC insertion and the GAT deletion, are localized in the H2 and H4 helices and in the H-I and B-C loop regions. The T389P amino acid substitution is located in the H-I loop, identified as an immunogenic region of the L1 capsid and believed to contribute to cross-neutralizing antibody responses [41]. The M424I and M430I amino acid substitutions are located in the H4 helix, whereas the Ser insertion is located in the H2 helix. Both structural elements, the H2 and H4 helices, are near the C-terminal end of L1 and are important for the assembly of papillomaviruses into particles. Moreover, the H2 region, in association with H3, is essential for L1 folding and pentamer formation, whereas the H4 region is indispensable for the assembly of both the virus particle and the T = 1 and T = 7 virus-like particles [6]. The D475 deletion is located in the BC loop. The BC loop contains lysine residues that can facilitate binding to heparan sulfate proteoglycans, the initial step required for successful HPV infection [41].
There is evidence that, among the 5 non-synonymous mutations obtained in this study, 4 do not affect the immunogenic site of L1 and therefore do not interfere with the binding of monoclonal antibodies targeting HPV 16; only the T389P amino acid substitution, present in 51.4% of cases, was associated with a potential interaction with vaccine-induced monoclonal antibodies. On the other hand, the generated 3D cartoon structures of the HPV16 L1 homology models harboring the 5 non-synonymous mutations clearly showed that the L1 structure remains unchanged, suggesting that these mutations do not affect the overall structure of the L1 protein.
The present study provides, for the first time, data on the genetic diversity of the HPV16 L1 gene in a Moroccan population and offers a prediction of the potential reaction to available anti-HPV vaccines. However, its main limitation was the small number of HPV 16 DNA samples, which may not reflect the real situation in Morocco or give an exhaustive picture of L1 gene diversity and of the mutations that could modify the L1 protein structure and consequently affect the neutralization epitopes.

Conclusion:
The present study gives evidence of genetic diversity in the HPV 16 L1 gene without any impact on the structure of the L1 protein. The study highlights the presence of the T390P mutation, which is located in the H-I loop and could interact with vaccine-induced monoclonal antibodies, suggesting a potential impact of this mutation on the efficacy of available anti-HPV vaccines. Further studies on larger samples, sequencing the entire HPV 16 L1 gene, are needed to predict the efficacy of HPV16