HIV-1 envelope accessible surface and polarity: clade, blood, and brain

The human immunodeficiency virus type-1 (HIV-1) gp160 (gp120-gp41 complex) trimer envelope (ENV) protein is a potential vaccine candidate for HIV/AIDS. HIV-1 vaccine development has been problematic, and charge polarity as well as sequence variation across clades may relate to the difficulties. Further obstacles are caused by sequence variation between blood- and brain-derived sequences, since the brain is a separate compartment for HIV-1 infection. We utilize a three-dimensional residue measure of solvent exposure, accessible surface area (ASA), which shows that major segments of the known gp120 and gp41 structures are solvent exposed across clades. We demonstrate a large percent sequence polarity for solvent-exposed residues in gp120 and gp41. The range of sequence polarity varies across clades, blood, and brain from different geographical locations. Regression analysis shows that the percent sequence polarity range of blood and brain gp120 and gp41 correlates with mean Shannon entropy. These results point to the use of protein modifications to enhance HIV-1 ENV vaccines across multiple clades, blood, and brain. It should be noted that we do not address the issue of protein glycosylation here; however, this is an important issue for vaccine design and development.

Abbreviations: HIV-1 - human immunodeficiency virus type 1, AIDS - acquired immunodeficiency syndrome, ENV - envelope, gp160 - 160,000d glycoprotein, gp120 - 120,000d glycoprotein, gp41 - 41,000d glycoprotein, LANL - Los Alamos National Laboratory, PDB - Protein Data Bank, HVTN - STEP HIV vaccine trial, AA - amino acids, MSA - multiple sequence alignment, ASA - accessible surface area, SNPs - single nucleotide polymorphisms, HAART - Highly Active Antiretroviral Therapy, CCR5 - C-C chemokine receptor type 5, CNS - central nervous system, HIVE - HIV encephalitis, P - polarity, NP - non-polarity, CTL - cytotoxic T lymphocyte, NIAID - National Institute of Allergy and Infectious Diseases.

inter-subunit interactions, inner (core)-outer (solvent) molecular interactions, receptor and co-receptor attachment, fusion required for viral entry, and antibody and CTL immunity. Another key issue is that although AA sequences may vary widely within and among HIV-1 clades, there may be immunologically conserved three-dimensional structures that provide foci for improved vaccine development [6][7][8][9][10].
AA positive selection is a consequence of the effects of immunity on HIV-1 ENV evolution. In 2007, the LANL HIV sequence database was examined and AA positive selection sites in the ENV protein were identified. Asian isolates had a higher positive selection level than North American isolates. Most of the positive selection sites detected were in the C3, C4, and C5 conserved domains, whereas C1 and C2 were primarily free of positive selection [11]. In other studies, ClustalX was used to align 300 gp160 sequences and to locate AAs with a high degree of conservation that were in proximity to one another in three-dimensional maps. For example, in HIV-1 clade A, conserved AAs occurred at positions including 32, 137, 441, and 915. Several AAs at such conserved sites were classified according to their polarity or non-polarity [12].
Each component of the gp120-gp41 complex has specific functions. For example, the complex is anchored via gp41, a transmembrane protein [13]. The gp120 V3 variable region has long been considered crucial for ENV function. The V3 variable region binds to the CCR5 or CXCR4 cell surface coreceptors and contains conserved regions including a band, arch, and hydrophobic core [14]. HIV-1 gp41 N- and C-domains mediate virus-membrane fusion. AA sequence residues 512-681 from 862 isolates were analyzed in HIV-1 clades A, B, C, D, E, F, G, H, I, J, and O. A highly conserved segment, GIVQQQ, was identified on the C-terminus of the C-domain; it is involved in the formation of the three interfaces between neighboring helices in the trimer [15]. The HIV-1 gp41 amino-terminal region is a pretransmembrane domain. It contains an amphipathic-at-interface sequence that is non-polar (aromatic AA-rich) and is conserved among several viral strains. The amphipathic-at-interface sequence also includes a beta-turn structure with a nonhelical extended region. Interaction of the amphipathic-at-interface sequence with the fusion peptide region reduces its fusion ability [16]. Additional studies from 357 HIV-1 clades A, B, C, and D also indicated that the gp41 C-terminal tail loop and three beta-sheet membrane-spanning domains are involved in membrane fusion [17].
In addition to ENV AA sequence and charge studies, Shannon entropy has been used as a measure of the diversity of AA sequences; the higher the entropy, the greater the sequence diversity [18]. For example, during investigations of HIV-1 vaccine development, Shannon entropy was used to assess the intra- and inter-clade sequence variation of proteomes of HIV-1 clades A1, B, C, and D. Mean entropies were compared for strings of AA sequences and used to identify protein regions with little diversity that harbored epitopes [19,20]. Entropy was also used to pinpoint clade sequence differences. For example, between clades B and C for HIV-1 gp120, amphipathicity was maintained whereas there was elevated entropy at the polar face of the C3 region alpha2-helix. In clade B there was increased hydrophobicity and in clade C, V4 loops were shorter [21]. Entropy analysis also helped identify protein regions with little AA variation that harbored CTL epitopes; these regions frequently occurred in alpha helices [19,22]. Shannon entropy increased for V3 regions from patients treated with CCR5 antagonists vs. baseline. This may be due to treatment-resistant viruses being able to produce a wider range of sequence variation than baseline viral strains [23].
The loss of effectiveness of immunological and drug therapy against HIV-1 and difficulties with producing an effective vaccine are due in part to viral immune escape and protein sequence diversity. However, a study of protein structure, selection, and sequence diversity of HIV-1 proteins demonstrated a ceiling to the diversity reachable by HIV-1 despite its high mutation rate [24,25]. Moreover, concomitant with the sequence variability ceiling, variability tended to occur at restricted locations in HIV-1 proteins including in ENV. Entropy studies for both clades B and C demonstrated that increased entropy occurred at AA sites with less constraint, and low entropy at sites with greater constraint. Entropies of AA sites in the protein core were lower than for sites on the protein surface. This is consistent with a paradigm in which loops have greater solvent accessibility than the more constrained core AAs. In this context, protein-protein interaction regions are considered solvent inaccessible (hydrophobic) [24,25].
Since the early 1990s, phylogenetic analyses have supported the paradigm of the brain as a reservoir for HIV-1 infection sequestered from blood. In studies of the V3 region, both polar and non-polar AA residues were prevalent for brain and blood, with predominantly negative and neutral AAs for the brain. Entropy calculations for HIV-1 derived from six patients indicated lower entropy in V3 sequences from brain vs. blood. Thus, the complex sequence and structure relationships for HIV-1 sequences in brain also need to be addressed in vaccine design and production [26-32].

Materials and Methodology:

Datasets:

Structures:
The structural data for gp120 (Table 1

Solvent Accessibility:
The solvent accessibility of residues in gp120 and gp41 structures was assessed using ASA calculations (Figure 4). ASA was calculated using the Lee and Richards [35] algorithm as implemented in the Surface Racer software [36], with a probe radius of 1.4 Å.
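The per-position summary of ASA values (mean, standard deviation, and an exposed/buried call using the ASA > 0 Å² criterion) can be illustrated with a minimal Python sketch. It assumes that per-residue ASA values have already been exported from a tool such as Surface Racer and aligned by residue position; the dictionary `asa_by_structure`, its values, and the function name `summarize_asa` are hypothetical and are not data or code from this study.

```python
from statistics import mean, stdev

# Hypothetical per-residue ASA values (in square angstroms) for three
# structures aligned by position; None marks a position that is missing
# from a structure. These numbers are illustrative only.
asa_by_structure = {
    "struct_1": [0.0, 35.2, 110.4, None],
    "struct_2": [0.0, 41.0, 98.7, 12.5],
    "struct_3": [0.0, 38.6, 105.1, 10.1],
}

def summarize_asa(asa_by_structure):
    """Return a list of (mean, sd, exposed) tuples, one per aligned position.

    A position is called solvent exposed when its mean ASA is > 0.
    """
    summary = []
    for column in zip(*asa_by_structure.values()):
        values = [v for v in column if v is not None]
        if not values:
            # No structure covers this position.
            summary.append((None, None, False))
            continue
        m = mean(values)
        # Sample standard deviation needs at least two observations.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary.append((m, sd, m > 0.0))
    return summary

for position, (m, sd, exposed) in enumerate(summarize_asa(asa_by_structure), start=1):
    print(position, m, sd, "exposed" if exposed else "buried")
```

Whether the sample or the population standard deviation was used in the original analysis is not stated; the sample form is assumed here.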

Compositional polarity:
We used percent compositional polarity to estimate sequence variations among known clades. This allows calculation of the percent polarity range among clades, within which the sequences vary. The percent compositional polarity (S, T, N, Q, H, Y, D, E, R, W, C, and K residues) and non-polarity (G, A, P, V, I, L, F, and M residues) were calculated for gp120 and gp41, including blood and brain sequences (Figure 5). It should be noted that W and C were included in the polar group because of their partially polar character.
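The polar and non-polar groupings described above can be turned into a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, gap and ambiguity characters are assumed to be ignored, and the "polarity range" is assumed to be the difference between the largest and smallest percent polarity in a set of sequences, which the text does not define explicitly.

```python
# Residue groups as listed in the text (W and C are counted as polar).
POLAR = set("STNQHYDERWCK")
NONPOLAR = set("GAPVILFM")

def percent_polarity(sequence):
    """Return (percent polar, percent non-polar) for one AA sequence.

    Characters outside the 20 standard residues (gaps, X, etc.) are
    ignored; this is an assumption of the sketch.
    """
    residues = [aa for aa in sequence.upper() if aa in POLAR or aa in NONPOLAR]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    n_polar = sum(aa in POLAR for aa in residues)
    pct_polar = 100.0 * n_polar / len(residues)
    return pct_polar, 100.0 - pct_polar

def polarity_range(sequences):
    """Return max minus min percent polarity over a group of sequences."""
    values = [percent_polarity(seq)[0] for seq in sequences]
    return max(values) - min(values)

# Toy sequences, not HIV-1 data.
print(percent_polarity("STGA"))            # two polar, two non-polar residues
print(polarity_range(["STGA", "SSSG"]))    # spread between the two sequences
```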

Shannon Entropy:
Shannon entropy for each AA residue position was calculated as $-\sum_{aa} P_{aa} \log(P_{aa})$, where $P_{aa}$ is the proportion of each AA at its respective site [37]. This equation is based on a calculation of informational entropy, and the LANL methods have been described in detail [38, 39] (http://www.hiv.lanl.gov/content/sequence/ENTROPY/entropy_one.html).
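As an illustration of the per-site entropy formula, the following Python sketch computes the entropy of each column of a multiple sequence alignment. It is not the LANL implementation: the function names are hypothetical, the natural logarithm is assumed, and gap characters are assumed to be excluded from the proportions.

```python
import math
from collections import Counter

def column_entropy(column, gap_chars="-."):
    """Shannon entropy of one alignment column: -sum(P_aa * log(P_aa)).

    Gap characters are excluded (an assumption of this sketch); the
    natural logarithm is used.
    """
    residues = [aa for aa in column if aa not in gap_chars]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # abs() only normalizes the -0.0 produced by a fully conserved column.
    return abs(-sum((c / n) * math.log(c / n) for c in counts.values()))

def alignment_entropies(aligned_sequences):
    """Per-site entropies for equal-length aligned sequences."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*aligned_sequences)]

# Toy alignment, not HIV-1 data: site 1 is conserved, site 2 is split evenly.
entropies = alignment_entropies(["AG", "AA", "AG", "AA"])
mean_entropy = sum(entropies) / len(entropies)
print(entropies, mean_entropy)
```

A fully conserved site gives an entropy of zero, and a site split evenly between two residues gives log 2, so higher values indicate greater diversity at that site.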

Polarity range and Shannon Entropy relation:
The Pearson correlation between polarity range and Shannon entropy was calculated using Microsoft Excel (Figure 6). The statistical significance of the relation between Shannon entropy and polarity of gp120 and gp41 sequences in blood and brain was assessed using two-way ANOVA. In addition, the post-test for linear trend was applied to confirm the linear regression between the groups using GraphPad Prism (version 5) software.

Figure 3: [...] Table 1 and Table 2) were superimposed using the software SPDBV (Swiss PDB Viewer version 3.7) [http://www.expasy.org/spdbv/]. The variable loops (V1-V5) and the constant regions (C1-C4) of the gp120 structure, with inner and outer domains, are shown in (a). The superimposition of the gp120 trimer (PDB ID: 3DNL), solved through NMR at 20.0 Å, is shown in (b). The N heptad and the C heptad regions of the superimposed gp41 structure, solved by X-ray crystallography with a resolution of 2.10 Å, are shown in (c). Only the N heptad region of gp41 is available in trimeric form, as illustrated in (d).
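The Pearson correlation between polarity range and mean Shannon entropy described in the methods can be sketched in a few lines of Python. This is a generic implementation of the standard product-moment formula, not the spreadsheet used in the study; the function name and the example values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only: polarity range (%) and mean entropy per group.
polarity_ranges = [2.0, 3.5, 5.0, 6.5]
mean_entropies = [0.10, 0.22, 0.25, 0.40]
r = pearson_r(polarity_ranges, mean_entropies)
print(r, r ** 2)  # r and its square for a simple linear fit
```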

Results:
The Graphical Abstract outlines the flow of analysis from the databases to the tables and figures. The LANL and PDB databases were utilized to obtain sequence and structure information for HIV-1 gp120 and gp41. We generated datasets from the PDB of known structures produced by X-ray crystallography for gp120 and gp41, shown in Tables 1 and 2, respectively. These datasets include clades B, C, and A/E for gp120, and clades B and D for gp41. From Tables 1 and 2, we produced MSAs for gp120 and gp41, shown in Figures 1 and 2, respectively. The structures corresponding to the sequence alignments were then used to construct the structural superimpositions in Figure 3 of gp120 (a), its trimer (b), gp41 (c), and its trimer (d). The superimposition of multiple structures shows that several clades share the same structural folds for gp120 (clades B, C, A/E) and gp41 (clades B, D). The ASA distribution, mean, and standard deviation are shown for each residue position for gp120 and gp41 in Figure 4, based on Tables 1 and 2. These values show the degree of structural variation associated with the differences in sequence, as represented by the ASA measure, and help identify residue positions that are solvent exposed (ASA > 0 Å²). The mean distribution shows that most residues in gp120 and gp41 are solvent exposed. As of 12-31-2010, there were approximately 14,925 gp120 and 14,472 gp41 sequences in the LANL database for clades A-K. Figure 5 shows the sequence percent polarity for several clades, blood, and brain by geographical location from this database. In addition, the sequence percent polarity range is shown for gp120 (clades B, C, A/E) and gp41 (clades B, D), for which structures are known from the PDB database. (The figures that utilize sequences from the full LANL database for clades A-K and the sequences from known structures from PDB do not specify geographical location.)
The ranges in percent compositional polarity among clades in blood and brain sequences are shown in [...]. There is a correlation between gp120 Shannon entropy and gp120 polarity range across clades, and between gp41 Shannon entropy and gp41 polarity range among clades. The Pearson correlation coefficients (r) are 0.734 and 0.588 for gp120 and gp41, respectively (Figure 6). Two-way ANOVA of Shannon entropy and polarity of gp120 as well as gp41 sequences shows significant variation in both blood (F = 509.6; P < 0.0001) and brain (F = 790.9; P < 0.0001). Furthermore, the post-test for linear trend indicates a positive linear relationship between Shannon entropy and polarity, with coefficients of determination of R² = 0.558 in blood and R² = 0.483 in brain.

Discussion:
The gp120-gp41 complex trimer protein is a potential HIV-1 vaccine candidate [42]. The gp120 and gp41 protein subunits interact with each other at an interface, forming a gp120-gp41 complex involved in trimer assembly. The gp41-interactive region of the gp120 protein has a layered structure with conformational mobility (flexibility) at the interface with gp41 [43]. The gp120 structures in the PDB database are available only in ligand-bound states, which may be partly due to the limited stability of the protein without supporting ligands. The trimer is unstable when produced in vitro, and this may be caused by its sequence composition and conformation [44][45][46]. The polar composition of the gp120-gp41 complex trimer protein influences its surface, immunological, and stability properties. The bottleneck for the in vitro synthesis and production of this protein in stable form may be the prevalence of solvent-exposed polar residues, as described in our findings. Moreover, trimer instability is probably also due to the difficulty of exactly mimicking, in vitro, the in vivo environment for protein folding and assembly of the complex. The structure of the gp120-gp41 complex is similar among clades, and homologous sequences share common structural folds and shapes, although they differ in side chain packing and residue orientation [47,48]. Our analysis, using a solvent exposure measure, ASA, shows three-dimensional structural variation for each AA position and shows that the residues are solvent exposed. In addition, in major segments of gp120 and gp41, we demonstrate a large percent polarity for solvent-exposed residues across clades, blood, and brain. Thus, our findings support more open and dynamic ENV structures and conformations.
The variation in biochemical properties is relevant to manufacturing a stable and effective HIV-1 ENV protein. An additional concern in using ENV as an HIV/AIDS vaccine candidate is its high sequence variation among clades from different geographical locations; the LANL database (as of 12-31-2010) contained 14,925 gp120 and 14,472 gp41 sequences. Potential gp120-gp41 global vaccine candidates should take into account immunological specificity and AA mutant variation across clades from different geographical locations. Our analysis of the known sequences for gp120 and gp41, which estimates the polarity changes caused by AA variation, shows that the percent polarity range among clades, blood, and brain correlates with the mean Shannon entropy. This reflects sequence variation that changes surface properties within and across clades, blood, and brain, and this is anticipated to affect their respective immunological responses. Although structural similarities have been determined so far across clades for blood, it is not yet known whether they hold for blood vs. brain. The sequence polarity range for brain is comparable to that for blood for gp120 and is greater for brain than for blood for gp41. These sequence variations could be due to differences in immune selection between brain and blood as well as to the structure of ENV; e.g., gp120 juts further into the solvent than does gp41, since gp41 is partially submerged in the membrane. These findings help quantify the percent compositional polarity range within which the gp120 and gp41 sequences vary among clades; from this we infer structural folding related to conformations pertinent to the immune response. This is further relevant to the design of suitable HIV-1 ENV vaccines specific for multiple HIV-1 clades across blood- and brain-derived sequences. Vaccines that induce neutralizing antibody are currently insufficient to the task; thus, new methodologies are needed to optimize this approach.
A method currently under development starts from a neutralizing antibody, works its way back to reconstruct the epitopes (reverse engineering), and then uses a structure-based design technology to optimize the epitopes [49]. The structural information presented in this article should enhance methods that augment the stability of the gp120-gp41 complex and trimer. This work points to the need for the development of supporting ligands that assist in protein conformation stabilization, as well as for AA mutants and other protein modifications that neutralize antigen charge where needed. In addition, brain-related HIV-1 should be dealt with as strains of HIV-1 that similarly require a specific antigenic enhancement approach. For this tactic, it is important to understand the charge characteristics of the antigen; thus, we characterize the HIV-1 ENV from the point of view of compositional percent charge polarity and its variation across clades, brain, and blood.
There are several additional concerns in developing HIV-1 ENV vaccines that indicate the complexity of the problem and the sophistication required. These include issues of antigen integrity, delivery, and cross-reactivity across clade, blood, and brain viral strains. Furthermore, potential vaccines may have advantageous or deleterious effects depending on other factors as well. Such vaccine-related factors include host immune responses that might cause vaccine-induced inflammation, which could be more deleterious in the presence of HIV-1 infection. Thus, application of the knowledge of antigen structure may be different for vaccines in uninfected vs. HIV-1 infected individuals. Virologic and immunologic factors (e.g., pre-existing viral strains and cognate host immune responses), once one introduces a vaccine into the CNS of an infected individual, might result in further inflammation as well. Moreover, it is unknown how selective pressure on virus evolution might lead to vaccine-resistant strains that vitiate vaccine effects. Would a vaccine be possible that stimulates the immune response sufficiently rapidly to halt virus replication prior to mutant virus spread? The potential interactions of vaccines with neurovirulent macrophage-tropic HIV-1 strains (which predominantly infect the CNS) are also unknown. In addition, which potential vaccines will inhibit or prevent their neurovirulence as well as suppress or prevent brain infections due to CNS strains? In host genetics, for example, the CCR5-32-delta polymorphism results in a less functional or non-functional HIV-1 coreceptor. Therefore, individuals with that allele would be anticipated to have improved potential survival with a vaccine. However, NeuroAIDS susceptibility allele SNPs may play a deleterious role during vaccine exposure coupled with virus exposure, pre- or post-vaccination.
Because NeuroAIDS appears to result largely from host immune state and viral strain, host variability in the immune response to vaccine introduction is a crucial concomitant concern [10, 27, 29, 30, 31, 50, 51].
The effects of HIV-1 infection can be directly measured in the brain, and the neuropathology of HIVE has changed since the use of HAART commenced in 1995 [52]. Concomitant use of vaccines and anti-viral therapy may have an interactive impact on these therapies systemically and in the brain. Escape mutants, which have been demonstrated systemically by automated deep sequencing techniques following anti-CCR5 strain antiviral therapy, could also be produced within the brain [53]. During the last several years, the HIV-1 infected population has shown increased aging and increased incidence of Alzheimer's disease, and these factors may further complicate vaccine use in at-risk and already infected individuals [52]. An additional complicating factor in the use of vaccines is the possibility of autoimmunity as a component of increased inflammation that could occur peripherally as well as within the brain.

Table footnotes: a GP120 = Larger subunit of ENV glycoprotein; b Res = Atomic resolution of PDB structure; c no information available; d Core region of gp120; e GARS = Glycine, Alanine, Arginine, Serine.