Genetic characterization of Chikungunya virus 2009 isolates from South India

Chikungunya Virus (CHIKV) is a single stranded positive sense enveloped RNA virus. Re-emergence of CHIKV caused a massive outbreak with severe clinical manifestation affecting multiple organs. The genetic diversity of CHIKV, which caused recurring outbreaks in India, was studied. Blood samples were collected from suspected human cases of CHIKV infection in Chennai, Tamil Nadu and three Northern districts of Kerala in Southern India during the CHIKV outbreak in 2009. A partial E2 gene segment was amplified by RT-PCR. Among 119 samples 37 samples were positive for CHIKV by RT-PCR. Phylogenetic analysis revealed that the isolated sequences belonged to Indian Ocean Lineage (IOL) of ECSA genotype. The mutational analysis revealed the presence of substitutions such as S299N, T312M, A344T, S375T, V386G, W339R and S375P in the current study. In addition, a novel mutation V386G was observed in all the sequences. Two isolates found with unique substitutions W339R and S375P are reported. The structural analysis of the wild type and mutant proteins revealed that the structural changes are accompanied by modification in the intraprotein interactions.


Background:
CHIKV is a single stranded positive-sense RNA virus belonging to alphavirus genus of the Togoviridae family. The 12kb viral genome encodes four non-structural proteins (nsP1, nsP2, nsP3 and nsP4), a core protein and three structural proteins (E1, E2 and E3, E1 and E2 are connected by a short 6K region) [1]. Various gene mutations in structural and non-structural protein coding regions may contribute to infectivity and enhanced virulence rate of the virus [2]. Several mutations in the protein coding regions were detected during the 2006 epidemic in Reunion Island [3]. A point mutation in E1 protein (A226V) altered vector specificity, which resulted in the re-emergence of CHIKV and has been detected among CHIKV isolates during 2007 epidemic in the Indian subcontinent including Kerala [4]. E2 and E1 act as trimers of heterodimers (E2-E1) on the viral particle surface. E1 is a class II viral protein that mediates the fusion of viral membrane with host cell membrane under acidic pH. E1 homotrimer attaches to the host membrane and forms a hairpin loop; E2 most likely mediates cell attachment. The E2 protein has domains called A, B and C. Domain B and A has significant role in receptor binding. Domain B is located at the membrane distal part and forms the tip of E2. Domain A is located at the center and domain C is close to the viral membrane. Domain C is a transmembrane protein that attaches to the hairpin loop of E1 and aids in the viral entry [5]. Therefore, it is of interest to explore the gene mutations of the highly conserved C domain of E2 protein during the 2009 outbreak to understand the impact of domain C in viral entry. The potential selection pressure sites were analysed for their influence on evolution. The effect of mutations on protein structures provides valuable insights to functional consequences.

Methodology: Ethics statement:
Institutional ethical committee approval was obtained to carry out the study (IEC-NI/09/APR/09/11). The study participants were recruited after obtaining a written informed consent.

Sample collection:
Blood samples were collected from the suspected CHIKV cases from Kozhikode and Kannur districts of Kerala and a tertiary care centre in Chennai, Tamil Nadu during October-November, 2009. Blood samples were collected in suitable appropriately labeled vacutainer tubes (Beckton Dickinson, USA) and transported to the laboratory at 4 o C. A detailed proforma on clinical symptoms and duration of illness was also obtained. Plasma was separated and stored in aliquots at −80 o C.

RNA extraction and RT-PCR:
The viral RNA was extracted from the plasma using QIAamp Viral RNA Mini kit (Qiagen, Hilden, Germany) according to manufacturer's protocol. The partial E2 gene of CHIKV was amplified by RT-PCR using Qiagen One step RT-PCR kit (Qiagen, Hilden, Germany) on a thermo cycler (Applied Biosystems, Veriti, USA) using the primers forward: TATCCTGACCACCCAACGCTCC and reverse: ACATGCACATCCCACCTGCC [6]. The amplicons were verified for the size (305 bp) using agarose gel electrophoresis.

Partial E2 gene Sequencing:
DNA Sequencing was performed in our laboratory. The amplified DNA was mixed with RNase free water and made up to the volume of 100µl and filtered under vaccum in the Multiscreen HTS PCR plate (Millipore, USA). Big dye terminator cycle sequencing ready reaction kit (Applied Biosystems, USA) was used for sequencing with specific primers targeting partial E2 region [6]. The sequencing mixture was again purified by Montage SEQ 96 filtration (Millipore, USA). DNA sequencing was performed on ABI genetic analyzer 3730 (Applied Biosystems, USA). The sequences (37) were deposited in GenBank database and the accession numbers are KJ577651-KJ577660, KC977313-KC977322 and KM275635-KM275651.

Phylogenetic analysis:
From GenBank database, (n=360 sequences) CHIKV sequences were retrieved (DOA: 24/02/2016) and analysed along with study sequence. E2 gene was taken for analysis along with our sequences. Sequences were categorized based on the year of isolation, continent and genotypes. Likelihood mapping analysis was done to avoid the false conclusion about evolutionary relationships among strains using MEGA 7 [7]. Consensus was made for each category using codon code aligner (Tamura, Stecher et al. 2013).

Selection pressure analysis:
The nucleotide sequences (n=37) were aligned using HyPhy package implemented in Datamonkey server [8] and checked for redundancy. The potential recombination in the dataset was screened using GARD method. The selection pressure in the nucleotide sequence alignment was assessed using Parris method. The specific site selection on the gene was analysed using Single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), random effects likelihood (REL), MEME and fast-unconstrained bayesian approximation (FUBAR) algorithm. Positive selection was defined as p-value ≤ 0.1 for SLAC, FEL and Posterior probability ≥ 0.9 for FUBAR.

E2 protein structural analysis:
The secondary structure of the protein was predicted by PDBSUM [9]. The entire protein structure was unavailable in PDB; an ab-initio structure prediction was carried out using I-TASSER [10]. The best model was selected using confidence score (C-score) and validated using Ramachandran plot predicted by SAVES server [11]. The substitutions reported in all the study isolates were modelled using Swiss PDB viewer [12]. All the structures were optimized using Gromacs 4.5.6 [13]. The structural consequences were investigated by comparing the wild type and mutant protein structures. The structures were visualized using PyMol. Changes in the intra-molecular interactions were analysed using PIC server [14].

Results & Discussion: Demography and clinical features of the patients:
All 119 samples were collected from the study participants with acute onset of fever along with joint pain, myalgia and headache. Of these 119 samples, CHIKV infection was confirmed in 37 study participants by RT-PCR. Since all the samples were collected during acute phase illness (<5 days), Chikungunya IgM was not performed. Age of the participant's age ranged between 7 and 75 years (mean 35.7 years, p < 0.0001). We compared the prevalence rate based on the age of the patients by grouping them into four categories as; < 14, 15-24, 25-50 and 51-75 years. Significant detection of CHIKV was observed in the adults who are 25 to 50 years old (p < 0.05). The male female ratio was 1:1.43. Table I lists the demographic profile of 37 participants positive for CHIKV by RT-PCR. The clinical features fever (100%) and severe joint pain (97.29%) were most frequent joint inflammation (7%), maculo-papular rash on trunk and limbs (8.10%) and headache (10.81%) were also observed.

Nucleotide sequencing and Phylogenetic analysis:
The partial E2 protein-coding region showed changes at twenty nucleotide positions. Among them, seven unique nucleotide changes were consistent in all the sequences ( Table 2). These nucleotide substitutions were common among studied isolates [15].
Phylogenetic tree constructed using geographic sequences of different genotypes has shown that all the sequences belonged to ECSA genotype (Figure 1). ECSA genotype group was rooted with strains isolated from Congo during 1960. Three clusters were observed in the ECSA lineage. Majority (n=24) of the sequences were found in Clade III along with sequences isolated from Asian countries. Rest of the sequences (n=13) were found in-group I with the sequences of Central Africa, Italy and South East Asian countries. None of the sequences were found in clade II (which denotes sequences from South East Asia and India). The strains isolated from India after 2012 were found in-group II. This may be due to the amino acid divergence. Nucleotide divergence was found to be 5.2%, 4.8% and 5.5% among Clade I, Clade II and Clade III respectively based on the CHIKV prototype strain. No specific genetic diversity was noted in the consensus sequence of 2007-2010 isolated from different periods. Indian 2010 isolates were closely associated with these sequences. Branch length of 2011-2016 consensus sequence was more when compared to all other sequences due to nucleotide and amino acid divergence. Year wise analysis showed that isolates from 2006 to 2009 were found in the same cluster (Figure not shown).  (Table 3). Based on the prediction of protein secondary structure, codon 35 is located on the thirtieth beta turn, codon 45 on the tenth beta-bulge and codon 78 on the sixth helix. Codons 35, 45 and 78, that are under episodic positive selection experience purifying selection for their evolution [18]. Mutations at such sites may face transient positive selection, which indicates adaptive evolution towards new vector adaptability, cell tropism of vector and human.

Effect of mutation on protein structure:
The four substitutions that were prevalent in all the sequences were compared with the wild type to assess the changes in the protein structure. The secondary structure analysis of the wild type and mutant partial E2 proteins revealed that no major structural change had been enforced; however, the S299N substitution lead to loss of gamma turns and the mutated residue (Asparagine) is found within a beta-strand (Figure 2).
The C-score of the protein structures predicted by I-Tasser ranged between -2.35 and 1.02. The model 5 with a C-score of 1.02 was selected for further analysis. The quality of the best model was verified using Ramachandran plot. The predicted protein model had 75.9% of the residues in the most favoured region, 23.9% of the residues in additionally allowed region and 0.3% of residues in the disallowed region validating the quality of the model generated. A structural deviation of 0.019 Å was observed when the wild type and mutant proteins were superimposed (Figure 3). The intra-molecular interaction analysis revealed various changes in the main chain hydrogen bonds, main chainside chain hydrogen bonds, side chain-chain hydrogen bonds and hydrophobic interactions (Table 4 and 5). Possibly, this may alter the interaction of E2 with other proteins, particularly with cellular receptors, and may impact on tissue tropism [18]. A study proved that mutation in E2 gene leads to more severe inflammation and damage in the tissues distal to the site of the inoculation in the mice that were inoculated with CHIKV strain with mutation in E2 and in 3'UTR. Mutation in E2 along with 3'UTR enhances dissemination of virus to different organs with increases in titre of virus [19]. Since the recent outbreak is evidenced with severe joint pain in majority of the patients, it may be due to amino acid substitution in viral proteins, which leads to enhancement of host cell binding affinity. The comparison of the intramolecular analysis showed deviations in the bond formed by the residues. In addition, many new interactions were gained and few interactions were lost during the substitution process. The changes in the protein structure in the mutant may be compensated by gain or loss of new bonds to maintain their structural integrity leading to their adaptability and evolution.  Figure 1: Phylogenetic tree of CHIKV isolates and with the sequences derived from GenBank database generated using a maximum likely hood on the partial sequence of E2 gene (305bp). Bootstrap analysis was performed with 1000 replicates to determine confidence values on the clades within trees. The S27 strain (AF369024) prototype strain and O'nyong nyong (out group) were used in the tree.

Conclusion:
All sequences showed V386G substitution and we hypothesize that this substitution cause conformation changes in the protein leading to better affinity towards human host tissues. More studies are required to understand the effects of novel substitutions in the E2 region. Our study results along with published reports [20-23] indicate that there is a persistent change in the genome of CHIKV strains. Amino acid substitutions, which lead to structural changes of the proteins, lead to enhanced pathogenicity and virulence. Further studies are required to correlate the amino acid substitutions with disease severity, vector adaptation, host-cell interaction and drug affinity.