Genomic signatures of protease and reverse transcriptase genes from HIV-1 subtype C isolated from first-line ART patients in India

Genomic signatures of the protease and reverse transcriptase genes of HIV-1 were analyzed in HIV-infected North Indian patients who had been on ART for 1 to ≤ 7 years. DNA from plasma samples of 9 patients and RNA from 57 patients were isolated and subjected to amplification of the protease and reverse transcriptase genes of HIV-1 subtype C. Sequencing was then carried out following the WHO dried blood spot protocol. Drug resistance mutation patterns were analyzed using the HIV Drug Resistance Database, Stanford University, USA. The lamivudine-associated drug resistance mutations M184V/M184I, the nevirapine-associated mutations Y181C and H221Y, and the efavirenz-associated mutation M230I were observed in the reverse transcriptase gene of archived DNA from two HIV-1 infected patients. No mutation was observed in the remaining 7 patients. Computational tools and websites, namely viral epidemiological signature pattern analysis (VESPA), hypermutation, SNAP version 2.1.1, and entropy, were used to analyze the signature pattern of amino acids, hypermutation, selection pressure, and Shannon entropy in the protease and reverse transcriptase gene sequences of the 9 archived DNA samples and in the 56 protease and 51 reverse transcriptase gene sequences amplified as DNA from HIV-1 RNA. HIV-1 subtype C (GenBank accession number: AB023804) and the first isolate HXB2 (GenBank accession number: K03455.1) were taken as reference sequences. Signature amino acids were identified in the protease and reverse transcriptase genes, no hypermutation was detected, the amino acid positions with the highest entropy were marked, and the synonymous to non-synonymous nucleotide substitution ratio was calculated for the 9 archived DNA sequences and for the 56 protease and 51 reverse transcriptase gene sequences of HIV-1 subtype C isolates.


Background:
Acquired immunodeficiency syndrome (AIDS) is mainly caused by human immunodeficiency virus-1 (HIV-1) and human immunodeficiency virus-2 (HIV-2). The molecular characterization of HIV-1 and HIV-2 in Yaounde, Cameroon has been reported [1]. The HIV-1 genome is approximately 9.1 kilobases long and encodes 15 proteins that completely regulate the life cycle of the virus within human beings [2]. HIV-1 replicates within the host [3]. The reverse transcriptase of HIV-1 synthesizes the DNA strands, and the protease cleaves the viral polyprotein precursors to form mature particles [4]. The reverse transcriptase and protease genes of HIV-1 have been sequenced for drug resistance studies [5]. Several types of antiretroviral therapy are used to treat HIV-1 patients in the antiretroviral therapy program in India. According to the ART guidelines, HIV-1 patients initially start first-line antiretroviral therapy, which consists of nucleoside reverse transcriptase inhibitors (NRTIs) such as zidovudine, lamivudine, tenofovir, and abacavir, and non-nucleoside reverse transcriptase inhibitors (NNRTIs) such as efavirenz and nevirapine [6,7]. If first-line ART fails, switching to second-line antiretroviral therapy is an option to decrease the viral copy number [8]. High viral load or low CD4 count, as well as the absence of drug resistance mutations, was associated with mortality in patients with AIDS-defining conditions [9]. Nevertheless, drug resistance mutation analysis is essential in patients infected with HIV-1 [10]. Expert groups have highlighted the lists of mutation patterns for each drug panel [11]. The full-length HIV-1 proviral genome has been characterized in patients undergoing first-line highly active antiretroviral therapy [12]. HIV-1 proviral DNA drug resistance mutations have been reported in a community treatment program [13].
Drug resistance mutation analysis of the archived DNA could help in choosing the proper regimen for patients with low-level or suppressed viremia [14]. Functional analysis of the genomic signatures of HIV-1 gives important information on strain subtypes, epidemiological signatures, nucleotide substitution rates, Shannon entropy, etc. The molecular epidemiology of human immunodeficiency virus transmission was reported for a dentist with AIDS [15]. Signature patterns have been analysed using the viral epidemiological signature pattern analysis (VESPA) program [16]. An epidemiological signature pattern analysis of the HIV-1 genome was reported in a Southern Indian clinical cohort study [17]. Therefore, it is of interest to document the genomic signatures of the protease and reverse transcriptase genes of HIV-1 subtype C isolated from first-line ART patients in India.

Materials and Methods: Epidemiological Investigation of HIV-1 patients:
The patients were enrolled for first-line ART at the antiretroviral therapy centre, Sarojini Naidu Medical College, Agra, India from December 2009 to November 2016 as per the treatment guidelines of the National AIDS Control Organization (NACO), Govt. of India. The clinical and sociodemographic profiles were collected as per the published leaflet [18]. These patients were on first-line ART regimens: ZLE (Zidovudine + Lamivudine + Efavirenz), ZLN (Zidovudine + Lamivudine + Nevirapine), TLE (Tenofovir + Lamivudine + Efavirenz), TLN (Tenofovir + Lamivudine + Nevirapine), SLE (Stavudine + Lamivudine + Efavirenz) and SLN (Stavudine + Lamivudine + Nevirapine). The details of the genotyping, including the polymerase chain reaction and sequencing primers and the amplification conditions of the two-step polymerase chain reaction (1st round and 2nd round), were reported earlier [19]. After genotyping, drug resistance mutation analysis of the archived DNA and plasma RNA samples was performed using the HIV drug resistance database, Stanford University, USA (http://hivdb.stanford.edu/pages/algs/sierra_sequence.html). The 7 protease and 9 reverse transcriptase gene sequences from the archived DNA samples, and the 56 protease and 51 reverse transcriptase gene sequences from the RNA samples, were included in the genomic signature analysis as a molecular epidemiological study.

GenBank accession numbers:
All 57 partial polymerase gene sequences from RNA samples (accession numbers MG788697 to MG788753) and the 9 partial polymerase gene sequences from archived DNA (accession numbers MH503757 to MH503765) are available at NCBI, USA.

A computational approach for the analysis of the archived DNA and plasma RNA:
The nucleotide sequences of the protease (PR) and reverse transcriptase (RT) genes were aligned using multiple sequence alignment tools (www.ebi.ac.uk/tools/mas). The nucleotide sequences were converted into amino acid sequences by EMBOSS Transeq (www.ebi.ac.uk/Tools/st/emboss_transeq). The complete 9 PR gene lengths (nucleotide positions: 1 to 297) and the 7 RT gene lengths (nucleotide positions: 310 to 988) of the archived DNA were taken for signature pattern analysis, hypermutation analysis, selection pressure analysis, and Shannon entropy calculation. The complete 56 PR gene lengths (nucleotide positions: 1 to 297) and the 51 RT gene lengths (nucleotide positions: 310 to 988) were taken as the background sequences for signature pattern analysis, hypermutation analysis, selection pressure analysis, and Shannon entropy calculation.
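The translation and region-extraction steps described above can be illustrated with a minimal Python sketch. It is not the EMBOSS Transeq implementation: it assumes the standard genetic code, writes any codon containing an ambiguous base as X, and drops a trailing incomplete codon. The function names `translate` and `region` are hypothetical; the coordinates in the comments (1 to 297 for PR, 310 to 988 for RT) are those given in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no alternative code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def region(seq, start, end):
    """Return the 1-based, inclusive slice start..end (e.g. 1..297 for PR, 310..988 for RT)."""
    return seq[start - 1:end]

def translate(nt):
    """Translate in frame 1; ambiguous codons become 'X', a trailing partial codon is dropped."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))
```

Positions 310 to 988 span 679 nucleotides, which is not a multiple of three, so under these assumptions the final base of the RT region does not contribute to the translated sequence.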
The VESPA (Viral Epidemiology Signature Pattern Analysis) program was used to compare the amino acid sequences of the PR and RT genes for the archived DNA. The VESPA program is available in the HIV databases (https://www.hiv.lanl.gov). The reference sequences of the protease and reverse transcriptase genes of HIV-1 were taken from GenBank (accession number: AB023804) and HXB2 (accession number: K03455.1). The drug resistance-associated NRTI and NNRTI mutations were identified in the archived DNA sequences of the RT gene from 9 patients and the PR gene from 7 patients, and these sequences were then taken as background sequences for VESPA, selection pressure, hypermutation, and Shannon entropy analysis. The computational analysis of the PR and RT gene sequences of HIV-1 subtype C was performed by using the Los Alamos Laboratory
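The idea behind a signature pattern comparison can be sketched in a few lines of Python. This is not the VESPA program; it is a simplified illustration that assumes both sets are already aligned amino acid sequences of equal length, and it defines a signature position as one where the most common residue in the query set differs from the most common residue in the background set. The function name `signature_positions` is hypothetical.

```python
from collections import Counter

def signature_positions(query, background):
    """List positions where the majority residue of the query set differs from the background.

    Each result is (position, query_residue, frequency_in_query,
                    background_residue, frequency_of_query_residue_in_background).
    Positions are 1-based. Ties are resolved by first occurrence (an assumption).
    """
    results = []
    for i in range(len(query[0])):
        q_counts = Counter(seq[i] for seq in query)
        b_counts = Counter(seq[i] for seq in background)
        q_aa, q_n = q_counts.most_common(1)[0]
        b_aa, _ = b_counts.most_common(1)[0]
        if q_aa != b_aa:
            results.append((i + 1, q_aa, q_n / len(query),
                            b_aa, b_counts[q_aa] / len(background)))
    return results
```

In this sketch, the frequency reported for a signature residue is simply the fraction of query sequences carrying it at that position.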

Results and Discussion: Genomic signatures of the PR and RT gene of the archived DNA:
The regimen profiles, viral load, and drug resistance mutations of the protease and reverse transcriptase genes of the archived DNA isolated from 9 patients treated with first-line ART are given in Table 1. The signature amino acids of the protease genes were found with reference to the background sequences at frequencies of 0.57 to 0.85. The signature amino acids of the reverse transcriptase genes were found with reference to the background sequences at frequencies of 0.44 to 1.0. The signature frequencies of the protease and reverse transcriptase genes are given in Table 3 and Table 4.

[Table 2; Tables 5 to 9. Alignment positions: 35, 39, 48, 60, 103, 121, 173, 177, 184, 200, 207, 211, 214, 245, 291, 292, 293]

Shannon entropy calculation of the protease and reverse transcriptase gene from plasma RNA:
To evaluate the genomic stability of the protease and reverse transcriptase genes of HIV-1 subtype C, Shannon entropy analysis of each amino acid codon was performed with the references HXB2 (accession number: K03455) and Indian subtype C (AB023804). In the reverse transcriptase gene of HIV-1 subtype C, the highest entropy values relative to the reference sequences AB023804 and HXB2 were observed at amino acid positions A121 (1.151), I135 (1.338), and H162 (1.136). Similarly, in the protease gene, the highest entropy values were observed at amino acid positions S12 (0.579), X19 (0.830), I36 (0.546), and N37 (0.722). The entropy differences between the two sets were assessed in 100 randomizations with replacement, with a cut-off for conserved signature of 5.
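For readers unfamiliar with the measure, the per-position Shannon entropy of an alignment column is H = -Σ p(a) log p(a), where p(a) is the fraction of sequences carrying residue a at that position. The Python sketch below illustrates this calculation only; it is not the entropy tool used in the study, the use of the natural logarithm is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy of one alignment column (natural log; the log base is an assumption)."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def alignment_entropy(sequences):
    """Entropy at every position of a set of aligned, equal-length sequences."""
    return [column_entropy(col) for col in zip(*sequences)]
```

A fully conserved position has an entropy of zero, and the value grows as more residues share the position at similar frequencies.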

Evaluation of selection pressure:
The selection pressure acting on the PR and RT genes was evaluated at the codon level by subjecting all 56 PR gene and 51 RT gene sequences of the clinical isolates, along with the Indian subtype C reference (AB023804) and HXB2, to synonymous and non-synonymous substitution analysis with SNAP version 2.1.1.
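A codon-level comparison of synonymous and non-synonymous substitutions can be sketched as follows. This is not the SNAP program; it is a simplified pairwise illustration that assumes Nei-Gojobori style counting of sites and differences with a Jukes-Cantor correction, the standard genetic code, and in-frame sequences of equal length. All function names are hypothetical, and codons that are ambiguous or stop codons are skipped.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Synonymous sites in a codon: each synonymous single-base change adds 1/3."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if base != codon[pos] and CODE[mutant] == CODE[codon]:
                s += 1 / 3
    return s

def codon_differences(c1, c2):
    """Synonymous and non-synonymous changes, averaged over paths that avoid stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(diff):
        cur, sd, nd = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:
            paths.append((sd, nd))
    if not diff or not paths:
        return 0.0, 0.0  # identical codons, or no stop-free path (simplifying assumption)
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))

def jukes_cantor(p):
    """Jukes-Cantor distance; undefined for p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)

def ds_dn(seq1, seq2):
    """Return (dS, dN) for two aligned, in-frame nucleotide sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    return jukes_cantor(Sd / S), jukes_cantor(Nd / N)
```

The ratio of the two returned values gives the synonymous to non-synonymous ratio for one sequence pair; a set-level analysis would average over many pairs, which this sketch does not attempt.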

Hypermutation:
Hypermutated sequences of the genomic DNA were detected by identifying an excess of G→A changes consistent with the APOBEC3G/F signature. The 56 PR gene and 51 RT gene sequences were analyzed for hypermutation against the reference sequences AB023804 and HXB2. No hypermutation (P value = 1) was observed in the PR and RT genes of HIV-1 subtype C with either reference sequence (a P value below 0.05 indicates a hypermutant).
The diversity of the HIV genome is a major factor, arising from error-prone reverse transcription, recombination, etc. This diversity contributes to the failure of the immune system [22]. To examine the diversity of HIV-1, the amino acid patterns of the protease and reverse transcriptase genes were characterized through the VESPA method, as they are associated with drug resistance towards first-line antiretroviral therapy. The primers of the WHO dried blood spot protocol 2010 were used for the analysis of drug resistance mutations in patients with low RNA copy numbers (≤40 copies/ml) [...] replication through G-to-A hypermutations that could generate drug-resistant progenies with or without antiretroviral therapy. Human APOBEC3G/F-mediated hypermutation is associated with the development of drug-resistant mutants in HIV-1 subtype C infected Southern Indian patients on first-line ART [17]. In the Northern Indian patients, however, no G-to-A hypermutation was observed even though the patients were infected with drug-resistant HIV-1 subtype C isolates. These data suggest that the North Indian patients were more protected by innate immunity against the drug-resistant isolates. We suggest that physiological and clinical studies of patients living with drug-resistant viruses are essential.
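The hypermutation screen described above rests on comparing G→A changes in APOBEC3G/F-favoured sequence contexts with G→A changes in other contexts. The Python sketch below illustrates that comparison with a one-sided Fisher's exact test. It is not the hypermutation tool used in the study; the context definition (a reference G followed in the query by A or G and then A, G or T), the treatment of every other context as a control, and all function names are assumptions made for illustration.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]: P(X >= a)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(row1, x) * comb(n - row1, col1 - x)
                     for x in range(a, upper + 1))
    return favourable / comb(n, col1)

def context_counts(reference, query):
    """Count reference G sites by context (APOBEC-like or control) and G->A status in the query."""
    reference, query = reference.upper(), query.upper()
    a = b = c = d = 0
    for i in range(min(len(reference), len(query)) - 2):
        if reference[i] != "G":
            continue
        nxt, nxt2 = query[i + 1], query[i + 2]
        if nxt not in "ACGT" or nxt2 not in "ACGT":
            continue  # skip gaps and ambiguous bases in the downstream context
        mutated = query[i] == "A"
        apobec = nxt in "AG" and nxt2 in "AGT"  # assumed APOBEC3G/F-like context
        if apobec and mutated:
            a += 1
        elif apobec:
            b += 1
        elif mutated:
            c += 1
        else:
            d += 1
    return a, b, c, d

def hypermutation_p(reference, query):
    """P value for enrichment of G->A changes in the assumed APOBEC-like context."""
    return fisher_right_tail(*context_counts(reference, query))
```

Under these assumptions, a query with no G→A changes relative to the reference yields a P value of 1, and values below 0.05 would flag a candidate hypermutant.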
Shannon entropy measures the signalling complexity of the genome and could be helpful in medicine [37]; the greater the genome complexity, the higher the entropy. In this study, when we measured the Shannon entropy of the PR and RT genes of drug-resistant isolates against the HXB2 and AB023804 isolates, we found no change in entropy at the highest-entropy amino acid positions in either gene. It can thus be inferred that the amino acid complexity has remained the same since the origin of the virus, despite the pressure of several drugs within the human host.
Enormous genetic diversity has been observed within infected individuals, with implications for vaccine design and drug treatment. A new infection gives rise to a homogeneous viral population in early infection. The diversification of the transmitted virus provides information about the selection pressures acting during transmission of the virus within the host [38]. In this study, the selection pressure on the amino acids of the PR and RT genes of the drug-resistant isolates was greater relative to HXB2 than relative to AB023804. Thus, biased amino acid changes in the HIV-1 drug-resistant mutants may account for the rise in selection pressure in the present context.

Conclusion:
The analysis of drug resistance mutations in the protease and reverse transcriptase genes of the archived DNA of HIV-1 subtype C infected patients over 1 to ≤ 7 years of first-line ART may help inform treatment guidelines. A few signature amino acids persisted in the reverse transcriptase and protease genes of both the archived DNA and the plasma RNA samples from HIV-1 subtype C infected patients over 1 to ≤ 7 years of first-line ART in comparison to the reference sequences. Drug resistance mutation testing of archived DNA could be an important tool when RNA testing, in which DNA is amplified in vitro from RNA samples in the plasma of HIV-1 patients, is unsuccessful.