Differential binding of SARS-CoV-2 Spike protein variants to its cognate receptor hACE2 using molecular modeling based binding analysis

The current emergence of novel coronavirus, SARS-CoV-2 and its ceaseless expansion worldwide has posed a global health emergency that has adversely affected the humans. With the entire world striving to understand the newly emerged virus, differences in morbidity and infection rate of SARS-CoV-2 have been observed across varied geographic areas, which have been ascribed to viral mutation and evolution over time. The homotrimeric Spike (S) glycoprotein on the viral envelope surface is responsible for binding, priming, and initiating infection in the host. Our phylogeny analysis of 1947 sequences of S proteins indicated there is a change in amino acid (aa) from aspartate (Group-A) to glycine (Group-B) at position 614, near the receptor- binding domain (RBD; aa positions 331-524). The two variants are reported to be in circulation, disproportionately across the world, with Group-A dominant in Asia and Group-B in North America. The trimeric, monomeric, and RBD of S protein of both the variant groups (A & B) were modeled using the Swiss-Model server and were docked with the human receptor angiotensin-converting enzyme 2 (hACE2) employing the PatchDock server and visualized in PyMol. Group-A S protein's RBD bound imperceptibly to the two binding clefts of the hACE2 protein, on the other hand, Group-B S protein's RBD perfectly interacted inside the binding clefts of hACE2, with higher number of hydrogen and hydrophobic interactions. This implies that the S protein's amino acid at position 614 near the core RBD influences its interaction with the cognate hACE2 receptor, which may induce its infectivity that should be explored further with molecular and biochemical studies.


Background:
In December 2019, the city of Wuhan, Hubei province of China, witnessed patients inflicted with severe atypical pneumonia and respiratory illness, reporting the first case of novel coronavirus (CoV-2) infection in December 2019 [1,2]. Since then, it has grasped and restricted the entire globe, with nearly 85 million cases and more than 1.8 million fatalities as of the first week of January 2021 [3]. The genome of SARS-CoV-2 is around 30 kb, with a 5′ end comprising orf1ab encoding orf1ab polyproteins, while the 3′ end consists of genes encoding structural proteins, including spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. The large replicase polyproteins pp1a and pp1ab are proteolytically cleaved into 16 putative nonstructural proteins (nsps). Structurally, S proteins (1273 amino acids) are homotrimeric, with each monomer of about 180 kDa, consisting of two subunits, S1 and S2. SARS-CoV-2 has a functional polybasic (furin) cleavage site at the S1-S2 boundary through the insertion of 12 nucleotides. The viral entry, a complex series of events, is prompted by the binding of the S1 subunit to the host cell receptor, that in turn, triggers priming cleavage by cellular proteases at S1/S2 domain, and activation cleavage at S2', followed by insertion of the putative fusion peptide (amino acids 770-788) into the target cell membrane, and thus forming a stable post-fusion complex [4,5,6,7]. It has been well documented that binding the S1 subunit to the human Angiotensin-Converting Enzyme 2 (hACE2) receptor acts as a gateway for coronaviral entry inside the host. A stable hot spot is known to consist of a Lys353-Asp38 salt bridge surrounded by four hydrophobic tunnel walls, two each being contributed by the hACE2 receptor and viral S protein, respectively. The stability of viral-receptor interaction is crucial for viral endocytosis and subsequent infection [8]. Any mutation in the viral or receptor domain might curb a stable complex's formation, affecting the infectivity. In this study, we examined 977 S protein sequences deposited across the globe and focused our study on two broad groups, Group-A and B, found primarily in Asia and North America, respectively. These two groups differed at a single amino acid (aa) from aspartate (Group-A) to glycine (Group-B) at position 614, which is close to a highly structurally conserved 194 amino acid long receptor-binding domain (RBD), localized at position 331-524 of the S1 subunit [9]. Using homology modeling (HM) and molecular docking, we tried to analyze differences in the viralreceptor binding, which could plausibly affect variants' pathogenicity.

Model Building, validation, and quality evaluation
The target-template sequence alignments were performed for finding the highly homologous protein template deposited in PDB for generating the quaternary structure or models for all target sequences. The produced models' geometry was parameterized in a CHARMM27 force field using the OpenMM library of ProMod version 33.0.0 available at the SWISS-MODEL server. The best homology models were selected according to three statistical parameters -Global Model Quality Estimation (GMQE), QMEAN, and quaternary structure quality estimate (QSQE) [10]. The structure validation of modeled proteins was performed through Ramachandran plot (RP) analysis using the SWISS-MODEL structure assessment server to visualize energetically most favored regions for two dihedral angles (Phi (Φ) / (Psi (Ψ)) of backbone amino acid residues.

Structural comparison of the S protein of two groups of SARS-CoV-2
The human SARS-CoV-2 S glycoprotein (PDB ID 6VSB) was downloaded from the RCSB protein databank. The .pdb format of the modeled SARS-CoV-2 trimeric S proteins (of Group-A and B) were opened in PyMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC) and superimposed with the native Electron Microscopic (EM) structure of the human SARS-CoV-2 S protein (PDB ID 6VSB) [7]. The structural or amino acid difference in SARS-CoV-2 Group-A S protein with Group-B S protein was observed by superimposing these modeled proteins in PyMOL.

Molecular docking of human SARS-CoV-2 S protein of two groups with human ACE2 receptor:
We performed three levels of molecular docking of human angiotensin-converting enzyme 2 (hACE2) receptor (PDB ID: 2AJF) [11] with two distinct groups (-A & -B) of human SARS-CoV-2 S protein. Initially, the hACE2 receptor was docked with human trimeric S glycoprotein SARS-CoV-2. Then, SARS-CoV-2 monomeric S protein was docked with the human hACE2 receptor. Finally, the hACE2 receptor was docked with the core RBD (331-524 aa) along with its proximal (from 300-330) and distal (from 525-615) regions of two distinct groups of SARS-CoV-2 S proteins. PatchDock server (http://bioinfo3d.cs.tau.ac.il/PatchDock/), which works on a Geometric Hashing Algorithm (GHA), was used for docking purposes. The default value of 4.0 Å of clustering RMSD for protein-protein docking and default complex type was tuned as the set docking parameters. The PatchDock output lists the top 20 potential docked complexes sorted by geometric shape complementarity (GSC) score and approximate interface (AI) area. The output also shows Atomic Contact Energy (ACE) and 3D ©Biomedical Informatics (2021) transformation, including three rotational angles plus three translational parameters. The protein-protein interaction analysis was performed using the DIMPLOT module of LigPlot+ version 1.4.5. The hydrogen bond formation between docked complexes of hACE2 receptor with RBD of human SARS-CoV-2 S-glycoprotein (Group-A and B) was visualized in the PyMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC).

Results and Discussion: Modeled S proteins exhibit accuracy and reliability
A total of 1947 S protein sequences were analyzed from different geographical regions of the world, and 36 prominent mutations were observed. The mutation at 614 amino acid position was the most predominant, with 1105 (56%) had GLY (G), while 842 (44%) had ASP (D) at 614 positions ( Figure 1A). A total of 176 protein templates were found in the SMTL based on BLAST, and a total of 676 templates were found by HHblits database searching. Among them, 24 top filtered template structures were found suitable for homology modeling. Ultimately, the human SARS-CoV-2 S glycoprotein (PDB ID 6VSB) was selected as the most suitable template for homology modeling of stars-CoV-2 Group-A and B S proteins. The Group-A and -B S proteins showed 99.26% and 99.17% sequence identity, respectively, and an equal coverage (95%) with the template protein (PDB ID: 6VSB). The high QSQE score and GMQE value of 0.87 and 0.72, respectively, showed the accuracy and reliability of the SARS-CoV-2 Group-A and -B S protein models, deeming them reliable to do structural and docking analysis. Also, a low QMEAN of -2.81 and -3.10 resonates that the model of SARS-CoV-2 Group-A and -B S protein is of good quality. The number of observed two dihedral angles, i.e., Φ / Ψ pairs, determines the contour lines. In SARS-CoV-2 Group-A and -B S protein 90.49 % and 89.74 %, amino acids except proline or glycine were within the first contour line known as Ramachandran favored region (data not shown). The alignment of the modeled RBDs of human SARS-CoV-2 variants has the same scaffold as the template crystal structure of SARS-CoV-2 (PDB ID: 6vsb). The mutant site D614G is distal to RBD and found in the variable loop region. The core RBD of SARS-CoV-2 S protein is comprised of five major antiparallel β-sheets (β1-β5), three minor β-sheets (β'β'') and six alpha-helices (α1-α6), which is structurally highly conserved. The modeled SARS-CoV-2 trimeric S protein of Group-A and Group-B was individually superimposed and compared with the native electron microscopic (EM) structure of the SARS-CoV-2 trimeric S protein template (PDB ID: 6VSB). The perfectly superimposed secondary structure elements of modeled proteins with native SARS-CoV-2 trimeric S protein reflect that the modeling is accurate. The ribbon diagrams clearly showed no significant observable structural changes in the modeled trimeric S proteins of group-A with group-B except one amino acid change ASP to GLY at position 614, present in the variable loop region (shown by arrow marks) (Figure 1 B).

S protein variants displayed altered interaction with the receptor hACE2
The molecular docking of hACE2 receptor with SARS-CoV-2 S protein of two distinct groups by PatchDock showed that GSC score (20400) and AI area (3946.10) was maximum for docking complex of hACE2 receptor with RBD of S1 subunit of S protein belonging to Group-B SARS-CoV-2, while the GSC score (17410) was lowest for docking complex of hACE2 receptor with trimeric S protein of Group-A human SARS-CoV-2. The visualization of the docked complexes of the hACE2 receptor with trimeric Group A S protein showed that hACE2 interacted far away from the RBD (Figure 2A  The protein-protein interaction analysis revealed that the amino acids of group-A RBD peripherally interacted with two chains (A and B) of the hACE2 receptor. A closer investigation u n v e i l s that the RBD of group-A S protein improperly binds inside the cleft of chains A and B of the hACE2 receptor ( Figure 3A). Interestingly, it was found that the ASP-614 amino acid containing motif is protruded from the interface (Figure 3C I). The interfacing amino acids between group-A RBD and hACE2 that are hydrogen bond-forming (red dots) are shown in Figure  3A.
Interestingly, on the other hand group-B RBD of S protein binds perfectly inside both the domains of chains A and B of the hACE2 receptor with several hydrogen bonds (red dots) and hydrophobic interactions ( Figure 3B). The GLY614 amino acid containing motif is embedded in chain A of the hACE2 receptor (Fig 3C.II) The exact interfacing amino acids between group-B RBD of S protein and hACE2 are shown in Figure 3B.
It was found that 19 aa residues of Group-A RBD interact with 21 aa residues of the hACE2 receptor ( Table 1). Out of 19 amino acids of RBD of Group-A S protein, seven amino acids (ARG355, ARG357, THR33, LEU335, SER359, GLU536, and SER530) formed hydrogen b o n d s with the hACE2 receptor amino acids (GLN552, ASN556, LEU91, GLN338, ARG559, LYS419 and GLN89). Besides, 73 hydrophobic interactions between RBD of Group-A S proteins with 21 aa residues of hACE2 were observed (Table1). Moreover, this interaction pattern shows that the distal aa residues from core RBD, i.e., LYS529, SER530, GLU536, GLN580, THR581, and LEU582 of Group-A S protein are interacting with the hACE2. It is interesting to note that THR333 and THR500 of RBD strongly interact with the LEU91 and SER370 of hACE2 by forming one hydrogen and several hydrophobic bonds formation in proximity (2.66, 2.97Å) ( Table 1).      Analysis of Group B S protein interaction with hACE2 unveils those 34 amino acid residues of core RBD strongly interact (8 hydrogens and 117 hydrophobic bonds) with 35 amino acid residues of the hACE2 receptor (Table 2). Here eight amino acids, i.e., ASN370, TYR508, TYR369, GLU536, THR430, PRO412, ASP427 and THR500 of RBD form eight hydrogen bonds with seven amino acid residues of hACE2 receptor viz., SER254, GLU160, ALA246, LYS419, GLY286, ASN290, and ASN134 (Table 2). Interestingly, only one distal amino acid residue (GLU536) from core RBD was interacting with the LYS419 of hACE2, forming single hydrogen and hydrophobic interaction. More than 100 hydrophobic interactions between amino acids of RBD of Group-B S protein and hACE2 receptor were observed (Table 2). Of interest, the TYR369, THR430, and THR500 residues of RBD strongly interacted with the ALA246, GLY286, and ASN134 of hACE2, the formation of one hydrogen bond (with each) and 30 hydrophobic bonds formation in proximity (Table 2) The  (Table 3). It is also noticeable that the solventaccessible surface area of five distal amino acids of Group-B RBD is lower (547.525 Å) than Group-A RBD (689.913 Å). The ASP614 of Group-A RBD is more accessible (96%) than GLY614 (92%) of group-B RBD of SARS-CoV-2 to the hACE2 receptor. This implies that GLY614 is probably influencing the neighboring aa residues for overall increased accessibility of Group-B RBD to the hACE2 receptor.

Discussion:
The Spike protein and its RBD have emerged as a mutational hotspot in this novel strain of coronavirus [12]. Recent reports suggest enhanced binding affinity of SARS-CoV-2 to ACE2 receptor compared to SARS-CoV, which is probably responsible for increased infectivity and transmissibility [6,13]. The SARS-CoV-2 S protein adopts a homodimer architecture, of which the RBD undergoes a hinge-like conformational change from perfusion to post-fusion upon binding to the hACE2 receptor. The rotation of trimeric architecture at an angle of 52.2° (determined by aa residues D405-V622-V991) decreases the atomic collision or steric hindrance. It converts ACE2 inaccessible or close or down-conformation of trimeric spike protein into ACE2 accessible or open or up conformation of RBD of trimeric S protein [14]. Studies using computer-aided molecular modeling and protein-protein docking of SARS-CoV-2 spike protein with human ACE2 receptors have shed light on amino acid residues potentially involved in the effective protein-protein interaction [13]. However, the specific actual amino acid residues, which mediate this protein-protein interaction is still unknown. Our in-silico analysis elucidates the interacting amino acid residues between the SARS-CoV-2 RBD of Group-A and B S protein and hACE2 in detail. Through a single aa change at position 614, the S proteins' overall secondary structure showed a similar folding pattern, with no significant structural difference, though minor differences at specific sections of coils or loops were spotted. The structurally preserved functional domains (RBD) and motifs (RBM) in S1 and S2 subunits suggest a highly conserved SARS-CoV-2 S protein structure. However, loop or coil modifications are known to affect the pathogenesis of murine hepatitis virus (MHV) and modulate neurovirulence and invasiveness of human coronavirus (HCoV-OC43) within the central nervous system (CNS) [15,16]. Comparative analysis of structural features and interaction with hACE2 shows that Group-B forms a large binding interface and a larger number of interacting residues than Group-A. Since the trimeric and monomeric forms of spike protein are very bulky in size, it may cause steric hindrance in the visualization of direct interaction of hACE2 receptor to the RBD of S protein. The high-resolution X-ray crystallographic structural data of the SARS-CoV-2 RBD-ACE2 complex unveiled by Lan and their Group show that a total of 17 residues of the RBD are interacting with 20 residues of ACE2 [14].
Our molecular docking shows that 34 amino acid residues of RBD S1 domain of S protein of Group-B SARS-CoV-2 strongly interact with 35 amino acid residues of the hACE2 receptor. Amino acid residues, THR500, ASN501, and GLY502 amino acid residues of RBD of S protein, which interact with the hACE2 receptor, were found common. It is worth noting that their study revealed that THR500 of SARS-CoV-2 RBD forms a 2.6 Å hydrogen bond with Y41 of hACE2, but in our analysis, we found that THR500 forms a 2.18Å hydrogen bond with ASN134 of hACE2. It is interesting to observe that two amino acids THR500 and ASN501, which are under the top five critical residues involved in the interaction with hACE2have been observed in other studies too [14, 17, 18, 19]. Our docking analysis reveals that the RBD-hACE2 binding interface consists of hydrophobic residues forming an intricate hydrophobic region and hydrogen-bonding network as reported previously [20]. A key mutation at aa position 614 from aspartic acid (D), a polar residue to glycine (G), a nonpolar one in the SARS-CoV-2 S protein, is expected to change the solvent-accessible hydrophobic surface area of the folded protein. This replacement of polar with nonpolar residue resulted in decreased electrostatic energy of Group-B S protein. The amino acid residues LYS529, SER530, GLU536, GLN580, THR581, and LEU582, are distant residues from the RBD (331-524), formed non-specific interaction of RBD of Group-A SARS-CoV-2 to the host hACE2 receptor. On the other hand, Group-B protein is close to the receptor forming a compactly localized interaction compared to Group-A. In a way, it implies a point mutation of ASP614 into GLY614 may have sharply increased the selective affinity of Group-B SARS-CoV-2 RBD to the hACE2 receptor. Single amino acid change enhancing protein interaction efficiency has been observed frequently in other proteins [21]. An insertion of basic aa residues into the loops in the avian influenza virus (H5N1) has reportedly caused its conversion from low pathogenicity to a highly pathogenic state [22]. Also, modifications in S1/S2 cause changes in feline coronavirus (FCoV) pathogenesis [23,24,25]. It could be hypothesized that the D614G substitution, distal to the receptor-binding motif in the RBD increased the binding capacity of Group B S protein to the human ACE2 receptor, which could be one of the reasons for its increased infectivity and pathogenicity.

Conclusion:
In conclusion, our study implies that the Group-B S protein having Gly at 614 aa position is structurally more accessible or exposed and binds compactly with hACE2 than the Group-A S protein having Asp at 614. Knowledge of the whole repertoire of residues involved in the interaction between SARS-CoV-2 trimeric S protein (not only core RBD) and hACE2 is needed to properly understand infectivity, transmissibility, and pathogenicity of novel coronaviruses. However, it is essential to note that the S protein mutation may not be the only determining factor for increased transmissibility and infectivity. The combination of mutations in other viral proteins could also affect replication efficiency and other life cycle steps, which need to be examined using the native virus.