Molecular modeling and analysis of human and plant endo-β-N-acetylglucosaminidases for mutation effects on function

Endo-β-N-acetylglucosaminidases (ENGases) are enzymes that catalyze both hydrolysis and transglycosylation reactions. ENGases are of interest because of their ability to synthesize glycopeptides. Homology models of human, Arabidopsis thaliana and Sorghum ENGases were developed, and their active sites were identified based on the structure of the Arthrobacter protophormiae ENGase (PDB ID: 3FHQ). These models were then docked with the natural substrate GlcNAc-Asn and the inhibitor Man3GlcNAc-thiazoline. The catalytic triad of Asn, Glu and Tyr (N171, E173 and Y205 in the bacterial enzyme) was found to be conserved across phyla. The Y299F mutation of Endo-A is known to give about 3 times higher transglycosylation activity than the wild type; in the bacterial enzyme, transglycosylation activity increased while hydrolytic activity remained unchanged. This Y-to-F substitution occurs naturally in the human and Arabidopsis thaliana ENGases and is expected to contribute to higher transglycosylation rates in these enzymes. LigPlot analysis of the ligand complexes revealed interactions with amino acids bearing hydrophobic and polar uncharged side chains. Thus, structure-based analysis of model-ligand interactions provides insights into the catalytic mechanism of ENGases and can assist in their rational engineering.

Published August 30, 2014
ENGases also exhibit transglycosylation activity, i.e., they transfer the released oligosaccharide moiety to a suitable acceptor other than water. The transglycosylation activity of ENGases has attracted much attention in recent years for the chemoenzymatic synthesis of oligosaccharides, glycopeptides and glycoproteins [12,13].
To understand the structure-function relationships of ENGases, we studied the orthologous human, Arabidopsis thaliana and Sorghum ENGases using the structure of the bacterial (Arthrobacter protophormiae) enzyme. Owing to the lack of crystal structures of the human and plant ENGases, the Arthrobacter protophormiae ENGase crystal structure was used as the template for homology modeling. These structural comparisons, together with docking of GlcNAc-Asn and Man3GlcNAc-thiazoline, help to explain the dual catalytic activities of ENGases, hydrolysis and transglycosylation.

Methodology:
The protein sequences of human (UniProt ID: Q8NFI3, www.uniprot.org), Arabidopsis (NCBI ID: 79507164, www.ncbi.nlm.nih.gov) and Sorghum (UniProt ID: C5YW98, www.uniprot.org) ENGases were retrieved. A protein-protein BLAST search against the Protein Data Bank (www.rcsb.org) was used to identify homologous sequences with known three-dimensional structures. The sequence of the selected PDB entry was then aligned with the target sequences using Clustal X for multiple sequence alignment.
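Identity figures from an alignment depend on how the columns are counted. The sketch below is a minimal illustration, assuming a pairwise alignment already exported with '-' gaps (for example from Clustal X) and counting identity only over columns where neither sequence has a gap. The toy sequences are hypothetical, not ENGase data, and BLAST "similarity" (positives) also credits conservative substitutions, so it is a different measure from this one.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contain a residue (no gap).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment (not real ENGase sequences).
template = "MNDY-WEAL"
target = "MNEYFWE-L"
print(f"{percent_identity(template, target):.1f}%")
```

Dividing by the shorter sequence length or by the full alignment length instead would give different percentages for the same alignment, so the convention should be stated alongside any reported value.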

Homology Modeling and Energy Minimization
Homology models of the human, Arabidopsis and Sorghum ENGases were built with the Prime (3.1) module of the Schrödinger Suite (Schrödinger, LLC, New York, NY). The secondary structure of the three target sequences was predicted using the SSpro program bundled with Prime. The target (human, Arabidopsis and Sorghum ENGase) and template (Arthrobacter protophormiae ENGase) sequences were aligned using the Clustal X method of Prime, followed by manual adjustment to avoid large gaps within secondary structure elements. The original ligand in the template structure was removed before homology modeling.
The models were prepared and minimized using the Protein Preparation Wizard (Schrödinger Suite version 9.6). First, water molecules were removed from the crystallographic structure and hydrogen atoms were added. Atom charges and atom types were then assigned. Finally, the structure was energy-minimized and refined with the OPLS-2005 force field until it converged to an RMSD of 0.3 Å. The optimized models were used for docking. The structural quality of the human, Arabidopsis thaliana and Sorghum ENGase homology models was evaluated using online tools such as RAMPAGE, PDBsum and PSVS (Protein Structure Validation Software suite).
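The 0.3 Å criterion is an RMSD between atomic positions before and after restrained minimization. A minimal sketch of that calculation follows, assuming the two coordinate lists contain the same atoms (for example heavy atoms) in the same order and that no refitting is applied, since restrained minimization keeps the structure in its original frame. The coordinates are hypothetical, and this is not the Protein Preparation Wizard's own code.

```python
import math

Coord = tuple[float, float, float]


def rmsd(before: list[Coord], after: list[Coord]) -> float:
    """Root-mean-square deviation between matched atoms, without superposition."""
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(before, after)
    )
    return math.sqrt(total / len(before))


def converged(before: list[Coord], after: list[Coord], threshold: float = 0.3) -> bool:
    """True if the minimized structure stays within `threshold` Å RMSD of the start."""
    return rmsd(before, after) <= threshold


# Hypothetical atom coordinates (Å) before and after minimization.
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
end = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (1.5, 1.5, 0.3)]
print(round(rmsd(start, end), 3), converged(start, end))
```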

Binding Site Prediction
The binding site of the bacterial ENGase was determined from the ligand interaction diagram generated by LIGPLOT. All modeled structures were superimposed on the bacterial ENGase (3FHQ) using the superimposition tool of the Schrödinger Suite. SiteMap (Schrödinger, LLC) was used to identify plausible active sites in the modeled proteins. Based on the SiteMap scores and cavity size, the top-ranked potential ligand-binding site of each modeled protein was identified, and a receptor grid was generated for molecular docking.
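Superimposing a model on 3FHQ amounts to finding the rigid-body rotation and translation that minimize the RMSD between equivalent atoms (for example Cα atoms of aligned residues). The sketch below is a generic least-squares superposition, not the algorithm of the Schrödinger tool. It uses Horn's quaternion formulation: the minimal RMSD follows from the largest eigenvalue of a 4×4 matrix built from the centered coordinates. That eigenvalue is found here by shifted power iteration so that only the standard library is needed. The atom correspondence is assumed to be given (e.g. from the sequence alignment), and the coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def _centered(coords: list[Coord]) -> list[Coord]:
    """Translate coordinates so that their centroid sits at the origin."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(mobile: list[Coord], target: list[Coord], iters: int = 1000) -> float:
    """Minimal RMSD between matched atoms after optimal rotation and translation."""
    if len(mobile) != len(target) or len(mobile) < 3:
        raise ValueError("need at least three matched atom pairs")
    a, b = _centered(mobile), _centered(target)  # removes the translation
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    shift = ga + gb
    if shift == 0.0:  # all atoms coincide in both sets
        return 0.0
    # Cross-covariance matrix S[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric key matrix: its largest eigenvalue is the largest
    # sum of dot products (R a_k) . b_k over all rotations R.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Eigenvalues lie in [-(ga+gb)/2, (ga+gb)/2]; adding (ga+gb) makes them all
    # positive, so power iteration converges to the largest one.
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iters):
        w = [sum(key[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector recovers the unshifted eigenvalue.
    lam = sum(v[i] * sum(key[i][j] * v[j] for j in range(4)) for i in range(4))
    # Sum of squared residuals after superposition is ga + gb - 2 * lam.
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))


# Hypothetical check: a rotated (90 degrees about z) and translated copy.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
template = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in model]
print(f"{superposed_rmsd(model, template):.6f}")
```

A rigidly moved copy should superimpose with an RMSD close to zero, whereas a genuine change of shape, such as the loop differences discussed in the Results, leaves a residual RMSD that no rotation can remove.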

Preparation of Ligand Molecules
GlcNAc-Asn and Man3GlcNAc-thiazoline were used as ligands in the present study. The ligands were prepared with the LigPrep module. Preparation involved generating tautomers, up to 32 stereoisomers, and protonation states corresponding to pH 7 ± 2. Finally, the ligands were energy-minimized with the OPLS-2005 force field.
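Sampling protonation states over pH 7 ± 2 matters because the fraction of an ionizable group in each state changes steeply around its pKa. The worked Henderson-Hasselbalch sketch below uses approximate textbook pKa values for free asparagine (about 2.0 for the α-carboxyl and 8.8 for the α-amino group) as illustrative stand-ins for the free termini of GlcNAc-Asn. These values are assumptions, not values computed by LigPrep.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its protonated form at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Approximate pKa values for free asparagine (assumed, for illustration only).
GROUPS = {"alpha-COOH": 2.0, "alpha-NH3+": 8.8}

for ph in (5.0, 7.0, 9.0):  # lower bound, centre and upper bound of pH 7 +/- 2
    summary = ", ".join(
        f"{name} {100 * fraction_protonated(pka, ph):.1f}% protonated"
        for name, pka in GROUPS.items()
    )
    print(f"pH {ph}: {summary}")
```

With these assumed values, the amino group is almost fully protonated at pH 5 but only about 39% protonated at pH 9. A single fixed charge state would therefore miss forms that are relevant within the sampled range.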

Molecular Docking Studies
The active site predicted by SiteMap in each of the three modeled proteins was used as the docking target. Because the docking protocol is grid-based, receptor grid generation is an essential step. The grid box was centered on the centroid of the active-site residues of each modeled structure, and default parameters were used otherwise. The Glide extra precision (XP) protocol (Glide, Schrödinger Suite) was employed to dock the prepared ligands into the modeled proteins. Ligand interaction diagrams were used to analyze the interactions between the ligands and the modeled ENGases.
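Centering the grid on the centroid of the active-site residues can be sketched as follows. The coordinates, the 10 Å padding and the function names are hypothetical choices for illustration. This is not Glide's grid-generation code and does not reproduce its default box sizes.

```python
Coord = tuple[float, float, float]


def centroid(points: list[Coord]) -> Coord:
    """Arithmetic mean of a set of 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def grid_box(points: list[Coord], padding: float = 10.0) -> tuple[Coord, Coord]:
    """Return (center, edge lengths) of a box centred on the site centroid.

    Each edge is twice the largest atom offset from the centroid along that
    axis plus the (assumed) padding, so every site atom lies inside the box.
    """
    center = centroid(points)
    size = tuple(
        2.0 * (max(abs(p[i] - center[i]) for p in points) + padding) for i in range(3)
    )
    return center, size


# Hypothetical coordinates (Å) of three active-site atoms, e.g. one per
# catalytic-triad residue; real values would come from the modeled structure.
site = [(10.0, 4.0, -2.0), (14.0, 6.0, 0.0), (12.0, 2.0, 2.0)]
center, size = grid_box(site)
print(center, size)
```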

Results & discussion:
Predicting a protein's 3D structure from its amino acid sequence becomes feasible when the structure of a sufficiently similar homologous protein has been determined. A PSI-BLAST search was used to identify similar proteins, and the Arthrobacter protophormiae endo-β-N-acetylglucosaminidase was chosen as the template for modeling. BLAST results showed 25% similarity with the human and Arabidopsis thaliana sequences and 24% with the Sorghum sequence. The crystal structure 3FHQ, at a resolution of 2.45 Å, was retrieved from the Protein Data Bank (www.rcsb.org). The multiple sequence alignment (MSA) of the four ENGases, performed with Clustal X, is shown in Figure 1. The 3D models were generated with Prime and refined with the Protein Preparation Wizard, and were evaluated with PROCHECK. The Ramachandran plot for the Arabidopsis thaliana model showed 73.7% of residues in the most favored regions, 23% in allowed regions and 3.3% in disallowed regions. The human ENGase model showed 77.8% in the most favored regions, 20.4% in allowed regions and only 1.8% in disallowed regions. The Sorghum ENGase model showed 76.7% in the most favored regions, 20.5% in allowed regions and only 2.8% in disallowed regions. For all three models, more than 96% of residues lie in the most favored or allowed regions of the Ramachandran plot, indicating that the homology models are of good quality and reliable. Superimposition of the modeled proteins on the bacterial structure revealed, in the human protein, a helix broken into a helix-loop-helix conformation at residues 680-689 and an extended loop at residues 446-456. Four extended loops were seen in the Arabidopsis thaliana model, at residues 22-28, 553-557, 570-576 and 584-603. An extended loop at residues 570-581 was also seen in the Sorghum model, as depicted in Figure 2.
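The model-quality statement can be checked directly against the reported numbers. This sketch tabulates the Ramachandran region percentages quoted above and confirms that each model's three regions sum to 100%. It then reports the share of residues outside the disallowed region. The dictionary layout and the 0.05% rounding tolerance are illustrative choices.

```python
# Ramachandran region percentages reported above: (most favored, allowed, disallowed).
RAMACHANDRAN = {
    "Arabidopsis thaliana": (73.7, 23.0, 3.3),
    "Human": (77.8, 20.4, 1.8),
    "Sorghum": (76.7, 20.5, 2.8),
}


def favored_plus_allowed(regions: tuple[float, float, float]) -> float:
    """Percentage of residues in the most favored or allowed regions."""
    favored, allowed, disallowed = regions
    # The three regions should account for every residue (allowing for rounding).
    if abs(favored + allowed + disallowed - 100.0) > 0.05:
        raise ValueError("region percentages do not sum to 100")
    return favored + allowed


for model, regions in RAMACHANDRAN.items():
    print(f"{model}: {favored_plus_allowed(regions):.1f}% favored or allowed")
```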
Glide XP docking with GlcNAc-Asn and Man3GlcNAc-thiazoline as ligands was performed for all four ENGases, with the ligands placed inside the binding pockets identified by SiteMap. The position of the binding cavity is common to all four ENGases, but the residue numbers differ among them, as depicted in Figure 3. In terms of residue composition, all four cavities are dominated by aromatic and acidic amino acids. This suggests that the cavities of the three modeled structures are consistent with that of the crystallized protein and may have similar functions.
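The composition argument amounts to classifying the pocket-lining residues by side-chain type and counting them. In this minimal sketch, the residue string is hypothetical rather than taken from Figure 3. The grouping is one common convention and is an assumption, since schemes differ (for example for His, Gly and Cys).

```python
from collections import Counter

# One common side-chain grouping (an assumption; classification schemes differ).
CLASSES = {
    "aromatic": set("FWY"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "polar uncharged": set("STCNQ"),
    "hydrophobic": set("GAVLIMP"),
}


def classify(residue: str) -> str:
    """Map a one-letter amino acid code to its side-chain class."""
    for name, members in CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue code: {residue!r}")


def pocket_composition(residues: str) -> Counter:
    """Count pocket-lining residues per side-chain class."""
    return Counter(classify(r) for r in residues)


# Hypothetical pocket-lining residues (one-letter codes), not the Figure 3 data.
pocket = "YWEDFNEYLS"
for name, count in pocket_composition(pocket).most_common():
    print(f"{name}: {count}")
```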
Comparison with the catalytic triad of the bacterial enzyme (3FHQ) showed that N171, E173 and Y205 are conserved in the human and plant ENGases, as shown in Table 1 (see supplementary material).

Considering previous mutagenesis studies of Endo-M [15] and the structures of binary complexes and mutational studies of Endo-A, the Y299F mutant of Endo-A showed 3 times higher transglycosylation activity than the wild type, while its hydrolytic activity remained unchanged [14]. Interestingly, bacterial Y299 is not conserved: it is replaced by phenylalanine (F) at position 361 in the human enzyme and at position 300 in the A. thaliana enzyme, and it is deleted entirely from the Sorghum sequence. The loss of the hydroxyl group in the Y-to-F substitution might allow faster product release, thereby increasing the transglycosylation rate in the human and plant ENGases. This substitution could also reflect natural selection for elevated enzyme activity in higher phyla.
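Statements such as "Y299 is replaced by F at position 361" or "deleted in Sorghum" come from reading residue numbers across an alignment. The following minimal sketch assumes a pairwise alignment with '-' gaps and a known number for the first residue of each sequence. The toy sequences and starting numbers are hypothetical and do not reproduce the actual Endo-A or ENGase numbering.

```python
from typing import Optional


def residue_number_map(aligned_ref: str, aligned_tgt: str,
                       ref_start: int = 1, tgt_start: int = 1) -> dict[int, Optional[int]]:
    """Map each reference residue number to the aligned target residue number.

    None means the target has a gap at that column (the residue is deleted).
    """
    if len(aligned_ref) != len(aligned_tgt):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, Optional[int]] = {}
    ref_no, tgt_no = ref_start, tgt_start
    for r, t in zip(aligned_ref, aligned_tgt):
        if r != "-":
            mapping[ref_no] = tgt_no if t != "-" else None
            ref_no += 1  # counters advance only on real residues, never on gaps
        if t != "-":
            tgt_no += 1
    return mapping


def describe(aligned_ref: str, aligned_tgt: str, ref_pos: int,
             ref_start: int = 1, tgt_start: int = 1) -> str:
    """Report which target residue (if any) corresponds to reference residue ref_pos."""
    mapping = residue_number_map(aligned_ref, aligned_tgt, ref_start, tgt_start)
    ref_res = aligned_ref.replace("-", "")[ref_pos - ref_start]
    tgt_pos = mapping[ref_pos]
    if tgt_pos is None:
        return f"{ref_res}{ref_pos}: deleted in target"
    tgt_res = aligned_tgt.replace("-", "")[tgt_pos - tgt_start]
    return f"{ref_res}{ref_pos} -> {tgt_res}{tgt_pos}"


# Hypothetical toy alignment: the target has an insertion before the Y position.
ref = "AN-LEYGTW"
print(describe(ref, "ANQLEFGTW", 14, ref_start=10, tgt_start=20))
print(describe(ref, "AN-LE-GTW", 14, ref_start=10, tgt_start=5))
```

The same lookup can be used to check whether the catalytic triad positions of the reference map onto identical residues in each target.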

Conclusion:
ENGases are inherently hydrolases: with natural N-glycans as substrates, their hydrolytic activity is generally higher than their transglycosylation activity. A major concern for the chemoenzymatic approach is product hydrolysis, because the product formed tends to become a substrate for the enzyme. The lack of crystal structures has been a challenge both for ENGase engineering and for understanding the structural basis of hydrolysis and transglycosylation. This study provides homology models of human and plant ENGases and identifies mutations that may enhance transglycosylation activity. The human and plant enzymes are excellent candidates for engineering by selective modification of the catalytic triad responsible for hydrolysis.