Effect of HIV-1 Subtype C integrase mutations implied using molecular modeling and docking data

The degree of sequence variation in HIV-1 integrase genes among infected patients and their impact on clinical response to Anti retroviral therapy (ART) is of interest. Therefore, we collected plasma samples from 161 HIV-1 infected individuals for subsequent integrase gene amplification (1087 bp). Thus, 102 complete integrase gene sequences identified as HIV-1 subtype-C was assembled. This sequence data was further used for sequence analysis and multiple sequence alignment (MSA) to assess position specific frequency of mutations within pol gene among infected individuals. We also used biophysical geometric optimization technique based molecular modeling and docking (Schrodinger suite) methods to infer differential function caused by position specific sequence mutations towards improved inhibitor selection. We thus identified accessory mutations (usually reduce susceptibility) leading to the resistance of some known integrase inhibitors in 14% of sequences in this data set. The Stanford HIV-1 drug resistance database provided complementary information on integrase resistance mutations to deduce molecular basis for such observation. Modeling and docking analysis show reduced binding by mutants for known compounds. The predicted binding values further reduced for models with combination of mutations among subtype C clinical strains. Thus, the molecular basis implied for the consequence of mutations in different variants of integrase genes of HIV-1 subtype C clinical strains from South India is reported. This data finds utility in the design, modification and development of a representative yet an improved inhibitor for HIV-1 integrase.


Background:
The enzyme integrase (IN) is a key retroviral enzyme that plays a significant role in the replication of the virus in its susceptible host [1]. The HIV-1 pol gene encodes for the IN enzyme, the reverse transcriptase enzyme (RT) and Protease (Pr). In HIV-1 replication mechanism IN catalyzes the chromosomal integration of newly synthesized double-stranded DNA by strand transfer and incorporates it into the host genomic DNA [2]. The HIV-1 IN also plays a role in the stabilization of preintegration complex (PIC) which acts as the permanent genetic reservoir and also in the process of initiating the production of new HIV-1 virions [3]. The HIV-1 IN enzyme is a 32 kilo Dalton protein and is composed of 288 amino acids. This gene is also an important target for molecular diagnosis in HIV-1 infected individuals because the region is highly conserved compared to reverse transcriptase and protease.
The HIV IN enzyme comprises three structurally and functionally distinct domains the N terminal domain (residues 1-50), the catalytic core domain (residues 51-212) and the C terminal domain (residues 213-288) contains an SH3 like domain binds non specifically to DNA. The IN active site is located in the catalytic domain which is composed of 2 Asp residues and 1 Glu residue in the conserved DDE motif which are required for catalysis. The conserved DDE motif coordinates two divalent Mg 2+ ions essential for the catalysis of integration [4,5]. The critical step during retroviral life cycle is the integration of the viral double stranded DNA into the host chromosome [6]. The HIV-1 integrase enzyme plays a crucial role by removing a dinucleotide next to the conserved cytosineadenine sequence from each 3'end of the viral DNA. Subsequently then integrase catalyzes by joining of the processed viral 3'end to 5'end of strand breaks in the host DNA [6].
Currently there are three regimens named Raltegravir (RAL), Elvitegravir (EVG) and Dolutegravir (DTG) which can be used as IN inhibitors and are approved by the US Food and Drug Administration. Regimen including the IN inhibitors can even be used among HIV infected individuals who are treatment naïve as per guide lines [7]. These drugs can halt the HIV-1 viral replication and significantly reduce the disease progression to AIDS. The IN inhibitors can be very useful in countries where the numbers of individuals infected with HIV are very high such as South Africa, Nigeria and India [8,9]. The amino acid substitutions reflect resistance to RAL in three distinct genetic pathways. The two major resistance pathways are Q148HRQ/G140S and N155H/E92Q and third less frequent pathway identified as Y143CRH/T97A specifically described for raltegravir and specific resistance mutations [10]. The only described resistance pathways for EVG was T66I and S147G. The DTG has a resistance profile markedly distinct from those of RAL and EVG. The non polymorphic residues such as S153YF, R263K and G118R, have been described as being selected by DTG [10].
The molecular informatics and computer aided applications have contributed for best understanding of the mechanism of integration, drug binding and resistance development [11]. The computer based methods are becoming increasingly important and complementary to in vitro assays in studying the structure and function of bio molecules. Molecular docking is a frequently used tool in analyzing the structure based drug design. The most frequently used method and an effective way to predict the binding of substrate with its receptor is docking simulation, which is successfully implemented in many applications including the understanding of drug resistance [12,13]. The study reported here was intended to assess the frequency of mutations in the IN region of HIV-1 among infected treatment naïve (IN inhibitor) population and also to predict the response to treatment of strains with mutations by minimal molecular modeling and docking studies.

Methodology:
The study was carried out at the department of Clinical Virology of a tertiary care hospital in South India with the approval from its Institutional Review Board. The analysis was carried out in a university attached Computer-Aided Drug Design and Molecular Modeling Lab in South India. Blood samples were collected from HIV-1 infected individuals who had come to the department Clinical Virology for CD4+ T-cell estimation and/or HIV-1 drug resistance testing. The study was conducted during the years 2009-2012. A total of 128 samples collected from treatment naïve HIV infected individuals and 33 samples collected from HIV-1 infected individuals on anti retroviral therapy (ART with NRTI/NNRTI/PI) showing clinical failure. None of these 33 individuals were on IN inhibitors. All study participants were enrolled for the study only after getting the informed consent.

Sample collection
Eight ml of whole blood samples were collected in a sterile K2 EDTA vacutainer (Beckton Dickinson, New Jersey, USA) from study subjects who were not on any IN inhibitors. Fresh whole blood was used for CD4 estimation and in parallel plasma samples are separated as required and made in to multiple aliquots and stored at -70˚C until testing. The CD4 + T cell count estimation for all these patients was performed using the standard procedure by Guava Technologies with Easy CD4™ and cytosoft™ software version 2.2 (Guava Technologies, California, USA) or BD FACS Count system with FACS Count CD4/CD4 SW, Version 1.0 as described earlier [14].

HIV-1 integrase gene amplification
The plasma samples were used to extract HIV-1 RNA using QIAamp® viral RNA extraction (Qiagen GmbH, Hilden, Germany) reagents as per the manufacturer's instruction. The amplification of HIV-1 IN gene was carried out by an in-house nested PCR assay using 20µl of extracted RNA with reagents from Qiagen one step RT PCR assay (Hilden, Germany) which combines both conversion of RNA to cDNA and amplification of IN using specific primers. The 1 st round of PCR amplification was done using 20 picomoles of forward GCAGGATTCGGGATTAGAAG and reverse primers CTTTCTCCTGTATTCAGACC. The thermal cycling conditions used were as follows; 50 0 C for 30 minutes, 95 0 C for 15 min for activation of the Hot start Taq enzyme followed by 40 cycles of 94 0 C for 30 sec, 52 0 C for 30 sec and 68 0 C for 1 minute followed by a final extension at 68 0 C for 7 minutes. The second round PCR was carried out using 2 µL of first round DNA input to a PCR mix containing 2.5 units of Hot star Taq master mix (Qiagen, Hilden, Germany) and 20 pico moles of forward AAGGTCTATCTGGCATGGGTA and reverse primers TCCCCTAGTGGGATGTGTACTTC in a total reaction volume of 50 µL. The PCR was carried out in thermal cycler PTC-100 (MJ research, California, USA) with following thermal cycling conditions; 95 0 C for 15 min for activation of the Hot start Taq polymerase enzyme followed by 40 cycles of 94 0 C for 30 sec, 52 0 C for 30 sec and 72 0 C for 1 min followed by a final extension at 68 0 C for 7 min. Following amplification the PCR products were first analyzed on an ethidium bromide stained agarose (2%) gel to check for the 1087 bp specific amplicon using the gel documentation system Gel doc 2000 (Bio-Rad, California, USA) using the software Quantity one version 4.6.9. 3, 4-oxadiazol-2-yl) formamido] propan-2-yl}-6-oxo-1, 6-dihydropyrimidine-4carboxamide) with PFV intasome (PDB ID: 3OYA). This was generated using software Carbon atoms in PFV are colored brown and RAL are colored in cyan. This docking protocol using PFV was only used for validation purpose for showing the accuracy of our docking method.

Sequencing of the PCR products
The PCR amplified products were subjected to pre-sequencing purification using sequencing clean up reagents (Millipore Corporation, Montage, MA, USA). The sequencing PCR was carried out with respective forward and reverse primers used for 2 nd round PCR amplification. The sequencing reaction was carried out using Big Dye Terminator Cycle Sequencing assay kit (Applied Biosystems, CA, USA), subsequently run on ABI PRISM 310 genetic analyzer (Applied Bio-System, California, USA) followed by post-sequencing PCR clean up by the Millipore clean up system (Montage, MA, USA).The sequences were aligned using the Bio Edit software [www.mbio.ncsu.edu/bioedit/bioedit.html] and Finch TV software (version 1.4.0). These sequences were submitted to the Stanford HIV drug resistance database (http:// hivdb.stanford.edu) for analysis and interpretation on a regular basis.

HIV-1 IN gene functional domain analysis
The IN study strains with amino acid sequences were aligned using the Bio Edit software. The study strains consensus amino acid sequences was also constructed and compared to the global subtype C and the global subtype B consensus sequences [2]. The amino acid sequence alignment shown in our study strains had complete IN region (288 residues). The functional domains (F1-A49, M50-E212 and L213-D288) on the test consensus were compared with HIV-1 global C and global B subtypes, in order to determine conserved and substituted amino acid residues. The mean genetic distances among the amino acid sequences were determined using Kimura 2parameter model [2].

HIV Sub typing and phylogenetic analysis
All the samples were subjected to HIV-1 sub typing using the REGA HIV-1 Sub typing Tool -version 2.0 in HIV drug resistance Stanford data base (http://hivdb.stanford.edu). The HIV-1 sub typing was also determined by phylogenetic tree analysis using the nucleotide sequences of the study strains aligned using clustal W with reference strains of HIV-1 subtypes (A-D, F-H, J and K) taken from the previously published literature [2]. The tree was generated by the neighbor joining method using 1000 bootstrap replicates in MEGA4 software. All the IN sequences generated from this study was submitted to Gen Bank and accession numbers obtained for the study samples

Criteria used for construction of IN 3D modeling
The consensus amino acid sequence generated from the total study samples (n=102) was utilized for the 3D modeling. The generated consensus amino acid sequence had not shown any major or minor mutations and hence was considered as wild type for 3D model. The same amino acid sequence was further incorporated with appropriate mutations in single or in combinations at appropriate positions while generating the mutation models. The model was built by introducing 3 individual accessory mutations (L74M, T97A and G163R), two major mutations (Q148H, N155H) also combination of accessory (L74M, T97A and G163R) mutations and accessory with major mutations (L74M, T97A, G163R and Q148H), (L74M, T97A, G163R and N155H).

Protein model preparation
The wild and mutant HIV-1 IN and viral DNA complexes were prepared by a multi-step process through Schrödinger Protein Preparation Wizard [24]. The right bond orders as well as charges and atom types were assigned, and hydrogen atoms were added to the structures. Each model was subjected to energy minimization using the Optimized Potentials for Liquid Simulations (OPLS)-2005 force field with implicit solvation. Each model was minimized and the minimization was terminated when root mean square deviation (RMSD) of heavy atoms in the minimized structure relative to the model structure exceeded 0.3 Å. We prepared one wild type, five single mutants (L74M, T97A, G163, Q148H, N155H), combination of accessory mutations (L74M, T97A and G163R), and combination of major with accessory mutations (L74M, T97A, G163R, Q148H), (L74M, T97A, G163R and N155H).

Ligand preparation
The HIV-1 IN inhibitors RAL, EVG and DTG were prepared using the LigPrep [25]. First all the hydrogen atoms were added to the ligand molecules as they had implicit hydrogen atoms. The bond orders of these ligands were fixed. The ionization states of the ligands were generated in the pH range of 5.0-9.0 using Epik [26]. Most probable tautomers and all possible stereo isomers were generated to study the activity of individual stereotypes of each ligand. In the final stage of Lig prep, compounds were minimized with OPLS-2005 force field.

Molecular Docking
Molecular docking was analyzed to study the relative estimates of interaction between IN molecule of the clinical isolate having different mutations and with the IN therapeutic drugs. The 3D modeling of HIV-1 IN mutation that can lead to potential low level drug resistance with accessory mutations (L74M, T97A and G163R) and high level drug resistance with major mutation (Q148H, N155H) were chosen for docking analysis. We performed standard precision (SP) docking using the program Glide and semi-flexible docking protocols [27, 28]. The ligands being docked were kept flexible, in order to explore an arbitrary number of torsional degrees of freedom spanned by the translational and rotational parameters. The final energy evaluation is performed with Glide Score. The Glide Score was used as a function of fitness and best-fit pose for a given ligand was determined by the Emodel score. The choice of the best docked structure for each ligand was made using model energy score Emodel that combines Glide score, the non bonded interaction energy and the excess internal energy of the generated ligand conformation. The IN inhibitors like RAL, EVG and DTG were chosen to investigate their binding mode within the active site of modeled structure. Glide SP was carried out in each modeled structure.

Structural validation
The structural validation was done for all the generated HIV-1 IN models using the web server based Ramachandran plot (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php).The Ramachandran plot shows the phi (ϕ) psi (ψ) torsion angles for all residues except glycine and proline in the structure.

Results:
A total of 143 samples of 161 tested were amplified and of these only 102 samples were successfully sequenced for the complete length of the IN region from pol protein (288 amino acid). Of the 102 sequences with complete IN amino acid residues, 81 were from ART naïve while 21 samples were from individuals who were on NRTI/NNRTI/PI drug regimen and showed clinical failure. None of them were on any IN inhibitors. Fourteen (14%) among the 102 strains showed accessory mutations that can contribute to potential low level IN inhibitor drug resistance. None of the individuals had any major mutations that can contribute drug resistance to the IN inhibitors. The mutation data profile on the HIV-1 IN drug resistance accessory mutations among the study subjects and the significance of the mutation on the drug activity is shown in Table 1. Among the 102 individuals the CD4 counts were available for 101 individuals, these ranged from 2-1295 cells/µl. The CD4 counts did not show any significant difference (p = 0.39) between the individuals who presented with or without mutations in the IN region.
Six strains (6%) showed mutation at position 74 while 3 strains (3%) had mutation at 95 th amino acid position. The remaining 1 each strain showed accessory mutations at positions 97, 138, 157, 163 and 230. All the 102 sequences from the clinical samples were identified as HIV-1 subtype C according to REGA HIV-1 Sub typing Tool. Based on the phylogenetic tree analysis all the clinical strains were also identified as subtype C and this data was shown in Figure 1. The amino acid residues of these study samples were aligned with global subtype B and C consensuses of HIV-1 (residues 1-288) using Bio Edit software and this comparison is shown in Figure 2. The genetic variability of the IN gene among the study strains was very low as shown by the kimura 2-parameter analysis. The mean genetic distance observed was in the range of 0.0012 to 0.0413 (0.1 -4%). The complete predicted study amino acid sequence alignment (up to 288 amino acid residues) showed that the consensus of the study strains was identical to the global subtype C consensus, except at 8 positions (D25E, K42Q, M50I, K110V, V111K, D182I, T218I and R269K). It differed from the global subtype B consensus at 12 positions (R14K, K42Q, V72I, F100Y, L101I, K110V, V111K, V113I, N134G, E167D, D182I and D278A).
The structural validation based on the Ramachandran plot showed a good validation score for all generated HIV-1 IN mutation models. The distribution of ϕ, ψ angles which describes the rotations of the polypeptide backbone around the bonds between N-Cα (Phi, ϕ) and Cα-C (Psi, ψ) showed a range from 88.4-90.5% residues in the most favorable region, 6.7-7.7% range of residues in allowed region and 2.8-3.9% residues in the generous region. Overall 96.1-97.1% of the residues were within the allowed region. Molecular docking was analyzed to study the relative estimates of interaction between the clinical isolate protein sequences with mutations and the therapeutic drug. The molecular docking scores obtained for various mutation models like 3 individual accessory mutations (L74M, T97A and G163R), major mutation (Q148H, N155H) and also in combination of accessory (L74M, T97A and G163R), accessory and major mutations (L74M, T97A, G163R and Q148H), (L74M, T97A, G163R and N155H) were compared with wild type strain. The molecular docking scores using the study strains showed significant effect on the molecule binding affinity compared with wild type. (Subtype H), AF082395 (Subtype J) and AJ249239 (Subtype K) for repective HIV-1 subtypes were taken from literature as described elsewhere [2]. The tree was generated by the neighbor joining (N-J) method with 1000 bootstrap replicates using MEGA4 software. The docking algorithm used in the study was validated to reproduce the co crystallized pose of RAL in the PFV IN-DNA binding pocket. The docking study yielded a good agreement between the crystal and docked structure (RMSD 0.6 Å) shown in Figure-3. This docking protocol using PFV was only used for validation purpose for showing the accuracy of our docking method. The most significant method of evaluating the accuracy of a docking procedure is to determine how closely the lowest energy poses (binding conformation) predicted by the object scoring function, Glide score. The model with single accessory mutation at positions like 74, 97, 163 had a mean difference in binding score of -0.26 (range -0.14 to -0.39) with RAL, -0.68 (range -0.38 to -1.13) with EVG and -0.55 (range -0.22 to -0.71) with DTG docking analysis. The mean difference in the binding score for the two major mutations Q148H and N155H were -0.23, -0.62 and -0.80 respectively for RAL, EVG and DTG. However, when there were three accessory mutation co-exist the reduction in the binding scores for all 3 inhibitors showed -0.25, -0.35 and -0.22 respectively. This difference in the binding energy score was higher when there is a combination of accessory and major mutations (-2.07 to -2.53 for RAL, -1.82 to -2.41 for EVG and -1.92 to -2.98 for DTG ). The predicted binding affinity, energy of best docked compound and their interaction in active site for the individual mutation models and wild type with RAL, EVG, and DTG analogs is shown in Table 2. The representation of 3D and 2D docking pose of HIV-1 wild type strain structure with RAL, EVG, and DTG analogs is shown in Figure 4.   *Emodel is a specific combination of Glide Score, the non-bonded interaction energy between the ligand and the receptor and the internal torsion energy of the ligand conformer, expressed in kcal/mol. Minor mutations -L74M, T97A, G163R, Major mutations -Q148H, N155H, **3 minor -combination of L74M, T97A, G163R; The difference in the binding energy score was higher when there is a combination of accessory and major mutations. However these studies showed the significance of subtypespecific polymorphisms in both ARV-naïve and experienced individuals. [33][34] Our study where in HIV-1 subtype C strains were analyzed and did not demonstrate any major mutations while 14 (14%) individuals had accessory IN mutations at 7 positions of which majority were L74M. This was similar to the findings from a Brazilian study where 8% of the study population had accessory mutations like V72I and V201I among treatment naïve individuals among the HIV-1 B, C, and F subtypes [35]. Our study also demonstrated these two polymorphisms, V72I and V201I reported in the Brazilian study. Studies from Europe and USA also reported absence of major IN mutations and presence of accessory mutations among treatment naïve subjects infected with HIV-1 B and non-B subtypes [36,37]. Compared to the European study our frequency of accessory mutations were less but were higher than the one reported from USA [37]. Among the RAL treated individuals the major IN resistance mutations reported were G140S, Q148H and N155H, V151I, E92EQ, V151I, G163R [1]. These major mutations were not present in any of our study strains.

Discussion
The reported genetic variability of IN gene based on the mean genetic distance among HIV-1 subtype C from south Africa varied from 0.0124 to 0.1004 [2]. Compare to this our subtype C study strains had a lower genetic variability of the IN gene with a mean genetic distance range of 0.0012 to 0.0413 using the kimura 2-parameter method. Since the mean genetic distance was low the cross contamination between sequences in the data set were also verified and ruled out.
A study from reported Canada had analysed the structural modeling of R263K mutation to show a change in 3' processing and strand transfer activities with viral LTR DNA and target DNA compared to the wild type. Based on their modeling studies they reported that IN containing R263K mutation against DTG showed a slight decrease in 3' processing and strand transfer activities compared to the wild type. They also reported based on their structural modeling analysis by in vitro IN-DNA binding assays that R263K mutation can affect the IN DNA interactions [37]. Another study analyzed the molecular dynamics approach to estimate the binding energy of HIV-1 IN inhibitors (including RAL and EVG) and correlated with in vitro activity [38]. They reported that RAL and EVG interaction energy values closely matched with the experimental binding energy [35]. In one of the structural analysis it was found that two mutations like G118R and E138K lead to the introduction of a positively charged arginine at position 118, near the catalytic amino acid 116, this might decrease Mg (2+) binding, compromising the enzyme function and which can lead to significant reduction in viral integration and replication capacity [39].
In yet another study difference in the susceptibility of IN strand transfer inhibitors (RAL and EVG) among HIV-1 Subtypes B and CRF02_AG were compared by looking at the binding affinity. The study showed that the sequence variations between the INs of CRF02_AG and B strains did not lead to any notable difference in the structural features of the enzyme and did not impact the susceptibility to the IN inhibitors [40]. Studies had also evaluated various IN inhibitors including RAL and EVG impact on the specific mutations associated with IN resistance [41,11]. Those investigators had stated that distinct binding energy difference was observed between different classes of IN inhibitors which also correlated with resistance patterns [11].
The phenotypic analysis of clinically derived IN for G118R and F121Y was analysed in a study which showed high resistance to all 3 inhibitors with a fold change >100. This finding was also supported by their results which showed G118R and F121Y was associated with reduced binding affinities to each of the inhibitors and with a decreased number of hydrogen bonds compared with the wild-type complexes [10]. Marsden et al 2011 in their experimental analysis found the presence of a single primary raltegravir resistance mutation (Q148H, Q148R, N155H, or N155S) is sufficient to provide resistance to RAL during macrophage infection. They also had reported that N155 pathway mutations typically produce lower-level resistance than Q148 pathway mutations [42]. Another study from USA had looked at the genotypic and phenotypic susceptibility to IN inhibitors using 39 clinical isolates. In their study the susceptibility to DTG and RAL was determined for RAL resistant clinical isolates. Of 39 clinical isolates 30 isolates had genotypic and phenotypic resistance to RAL, in which the median level of phenotypic resistance to RAL was high, while the DTG containing the Y143 and N155 mutations level of resistance to was close to that of wild-type variants. But for isolates with Q148 plus additional IN mutations showed more reduced susceptibility to DTG [43]. Based on these phenotypic studies it was observed that there was reduced drug susceptibility when clinical isolates contain combination of mutations. When compared to these studies our docking analysis also supports these findings.
In our study we look at the drug binding affinity with mutation singly or in combination. This difference in the binding energy was observed for RAL, EVG and DTG analogs. The binding energy was decreased with either a single accessory or major mutation. But in combinations of accessory mutations as well as in combination of major mutations with accessory mutations the binding energy has been significantly reduced. Our findings with the docking experiments is well correlating with HIV stanford drug resistance data base findings as it suggests these accessory mutations cause low level resistance when presented individually (http://hivdb.stanford.edu). But when presented together they cause high level resistance against IN inhibitors among infected individuals.

Conclusion:
This study reports the analysis and interpretation of HIV-1 subtype C integrase sequences with identified accessory mutations among infected individuals from South India for the first time. This data aided with molecular models and docked inhibitor conformers finds utility in understanding the functional consequences of mutations among integrase variants for predicting drug susceptibility and or resistance. An accessory mutation in addition to other major mutations in HIV-1 subtype C integrase variants is a challenge for inferring inhibitor resistance in HIV-1 infected individuals in South India when combinational regimens are scheduled.

Ethical clearance:
It should be noted that ethical clearance board of the institute has reviewed the use of samples from human subjects for this study and appropriate approval was given in this regard.