Comparative modeling and genomics for galactokinase (Gal1p) enzyme.

The Gal1p (galactokinase) protein is known for its role in the regulation of D-galactose metabolism. It catalyzes the formation of galactose-1-phosphate from alpha-D-galactose, an important step in galactose catabolism. Knowledge of the Gal1p structure, its interacting protein partners and its functional site residues will provide insight into the functional role of Gal1p. Such studies are lacking for the Gal1p kinase enzyme. The structure of this enzyme has already been determined in S. cerevisiae; however, no structural information is available for the protein in K. lactis and E. coli. We used a homology modeling approach to model the structures of Gal1p for K. lactis and E. coli. Furthermore, functional residues were predicted for these Gal1 proteins, and the strength of interaction between Gal1p and other Gal proteins was estimated by protein-protein interaction studies with the PatchDock software. The interaction studies revealed that the affinity of Gal1p for other Gal proteins varies between organisms. Sequence- and structure-based comparison of the Gal1p kinase enzyme showed that the orthologs in K. lactis and S. cerevisiae are more similar to each other than to the ortholog in E. coli. These studies will help in better understanding galactose metabolism. Our sequence and structure comparisons also revealed that human Gal1p is more similar to the Gal1p protein of E. coli. The above studies may be applied to human Gal1p, where they can help in gaining useful insight into the disease galactosemia.


Background:
In Saccharomyces cerevisiae, the GAL1 gene is one of the structural genes of the galactose pathway, along with GAL7, GAL10 and GAL2 [1, 2]. These structural genes are regulated at the transcriptional level by Gal4p, Gal80p and Gal3p [3]. Gal4p acts as a transcriptional activator [4] and Gal80p as a transcriptional repressor [5]. Gal3p acts as a ligand sensor that sequesters Gal80p in the cytoplasm [6]. As a result, Gal4p becomes active and turns on the expression of the GAL genes that metabolize galactose [7].
Gal1p is needed for inducible galactose uptake [10]. It can also act as a weak transcriptional regulator in the absence of Gal3p [11]. When grown on a medium in which galactose is the sole carbon source, null mutants for GAL1 are unable to metabolize galactose and cannot grow [12, 13]. The protein shares a high amino acid similarity of 90% with Gal3p [12]. Addition of two amino acids (serine and alanine) at position 165 of Gal3p confers kinase activity on Gal3p [12]. The enzyme is conserved from E. coli through S. cerevisiae and K. lactis to humans [13].
There are two forms of Gal1p in humans, GALK1 and GALK2. Mutations in GALK1, which is an ortholog of yeast Gal1p, have been associated with a potentially lethal disease called galactosemia II [14]. It is a genetic metabolic disorder in which affected individuals are unable to metabolize galactose. It is characterized by the accumulation of galactose and galactitol, which results in the formation of cataracts [15]. This disease is rare and is associated with isolated gene pools [16]. Patients suffering from this disease must avoid food that contains galactose [17].
We therefore carried out comparative structural studies of the Gal1p proteins of E. coli, K. lactis and S. cerevisiae and extended them to humans in order to better understand the Gal1p protein. We then determined the putative functional site residues of Gal1p to identify the residues that are important for the kinase activity of the protein in all of these organisms. Furthermore, using a protein-protein interaction tool, we determined the possible interacting proteins of Gal1p in the whole genome of each organism. Finally, we performed sequence and structure comparisons to establish the evolutionary relationship of human GALK1 with the Gal1p proteins of E. coli, K. lactis and S. cerevisiae.

Methodology:
Input file:
The protein sequences of Gal1p from S. cerevisiae, K. lactis and E. coli were used as input for the sequence similarity search. These sequences were also submitted for 3D model development to the SWISS-MODEL and EsyPred3D (Modeller 6v2) software.
The research work was divided into the following steps: (1) homology modeling of the Gal1p proteins of K. lactis and E. coli; (2) identification of the pattern of interface amino acids required for protein-protein interaction between Gal1p and its nearby partners; (3) detection of the evolutionary relationship among the Gal1p proteins of S. cerevisiae, K. lactis and E. coli, and assignment of a reference model for human Gal1p through sequence and structure similarity; (4) generation of a putative protein-protein interaction map among the GAL proteins of K. lactis, S. cerevisiae and E. coli and estimation of their interaction affinity.

Homology Modeling:
The protein sequences of Gal1p from K. lactis and E. coli were submitted to the SWISS-MODEL and EsyPred3D (Modeller 6v2) software for homology modeling [18]. Procheck was then used to generate the Ramachandran plot, which was used to assess the accuracy of the developed Gal1p models. In addition, ProSA (https://prosa.services.came.sbg.ac.at/prosa.php) was used to assess the similarity of the models to known protein structures from NMR and X-ray experiments. Table 1 shows the details of the homology modeling and the Ramachandran plot analysis. Note that a model structure is generated only when the sequence similarity is more than 30%.
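The 30% cutoff mentioned above can be checked on a pairwise alignment before a sequence is submitted for modeling. The following is a minimal sketch, not the procedure used by SWISS-MODEL or EsyPred3D. It assumes that the two sequences are already aligned (gaps written as "-"), that "similarity" is taken as percent identity, and that identity is counted over columns in which neither sequence has a gap; other conventions for the denominator exist and give slightly different values. The function names and the example strings are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    # An alignment with no ungapped columns is reported as 0% identity.
    return 100.0 * matches / compared if compared else 0.0


def above_modeling_cutoff(identity: float, cutoff: float = 30.0) -> bool:
    """True when the identity exceeds the cutoff (30% by assumption)."""
    return identity > cutoff


# Toy alignment (not real Gal1p sequences): 4 matches in 5 ungapped columns.
identity = percent_identity("MKT-LL", "MRTALL")
print(identity, above_modeling_cutoff(identity))
```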

Model Optimization:
The developed model was further refined by calculating the energy of the system and minimizing it with the GROMOS96 implementation incorporated in Swiss-PdbViewer. The goal was to find the optimized model structure of the Gal1p protein. Energy minimization repairs the distorted geometries of the protein that remain after modeling. It proceeded in the following steps: (1) the query protein was prepared as input for energy minimization; (2) the number of cycles was set to 200 for steepest descent (SD), and all other SD parameters were left at their defaults; (3) the bond, angle, torsion, improper, non-bonded and electrostatic terms were selected for the computation; (4) the atoms of the query protein were moved in all possible directions to release internal constraints (during energy minimization, protonation of atoms sometimes also takes place); (5) the side chains were displaced by gently pushing apart atoms that clash, thereby removing steric hindrance; and (6) finally, the repaired geometry was obtained.
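To illustrate what a steepest-descent cycle does, the sketch below minimizes a toy energy function. It is not the GROMOS96 force field: it assumes a chain of atoms on a line joined by harmonic bonds, and the force constant, rest length and step size are arbitrary illustrative values. Only the cycle count of 200 is taken from the text above.

```python
def bond_energy(x, k=100.0, r0=1.5):
    """Harmonic energy of a 1D chain: sum of 0.5 * k * (d - r0)^2 over bonds."""
    return sum(0.5 * k * ((x[i + 1] - x[i]) - r0) ** 2 for i in range(len(x) - 1))


def bond_gradient(x, k=100.0, r0=1.5):
    """Analytic gradient of bond_energy with respect to each coordinate."""
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        stretch = (x[i + 1] - x[i]) - r0
        grad[i] -= k * stretch
        grad[i + 1] += k * stretch
    return grad


def steepest_descent(x, cycles=200, step_size=0.001):
    """Move every coordinate a small step against the gradient, `cycles` times."""
    x = list(x)
    for _ in range(cycles):
        grad = bond_gradient(x)
        x = [xi - step_size * gi for xi, gi in zip(x, grad)]
    return x


start = [0.0, 1.0, 3.2]          # one compressed bond and one stretched bond
relaxed = steepest_descent(start)
print(bond_energy(start), bond_energy(relaxed))
```

Each cycle moves the atoms downhill along the negative gradient, so strained bonds relax toward their rest length. A real force field sums many more terms (angles, torsions, impropers, non-bonded and electrostatic interactions), but the update rule has the same form.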

Protein functional sites:
The model structures of Gal1p from K. lactis and E. coli were submitted to functional site prediction servers such as PINTS, PROFUNC and Q-SITEFINDER. These servers predicted the active-site residues of Gal1p.

Comparative genomics study:
The degree of sequence and structure similarity among the Gal1p proteins from S. cerevisiae, K. lactis and E. coli was determined. Sequence alignment was performed with BLAST. Structural similarity between the Gal1p proteins was estimated with the structure-structure superposition tool of Swiss-PdbViewer [19]. These comparative studies were extended to human Gal1p. We also used the neighbor-joining method to show the relationship of human galactokinase to common prokaryotes and eukaryotes. In addition, a protein interaction network was generated for Gal1p from S. cerevisiae, K. lactis and E. coli with STRING (version 8.2), and protein-protein interaction affinity was estimated with the PatchDock software [20]. The PatchDock algorithm has three major stages. First, it computes the molecular surface of the protein and detects geometric patches on it (concave, convex and flat surface pieces); only the best patches, those that retain "hot spot" residues, are kept. Next, these patches are matched with the patches of the other query protein using a hybrid of the Geometric Hashing and Pose-Clustering matching techniques: concave patches are matched with convex patches, and flat patches with patches of any type. Finally, candidate complexes with unacceptable penetration of receptor atoms into ligand atoms are discarded, and the remaining candidates are ranked by a geometric shape complementarity score. See Figure 1 for the overall methodology.
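The patch-matching rule and the final filtering and ranking step described above can be sketched in a few lines. This is only an illustration of the logic, not PatchDock code: the patch labels, the dictionary fields "score" and "penetration", the penetration limit and the example values are all hypothetical.

```python
PATCH_KINDS = {"concave", "convex", "flat"}


def patches_compatible(kind_a: str, kind_b: str) -> bool:
    """Concave pairs with convex; a flat patch pairs with any kind of patch."""
    if kind_a not in PATCH_KINDS or kind_b not in PATCH_KINDS:
        raise ValueError("unknown patch kind")
    if "flat" in (kind_a, kind_b):
        return True
    return {kind_a, kind_b} == {"concave", "convex"}


def rank_candidates(candidates, max_penetration):
    """Drop candidates that penetrate too deeply, then sort by score, best first."""
    kept = [c for c in candidates if c["penetration"] <= max_penetration]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


# Made-up candidate complexes, for illustration only.
candidates = [
    {"name": "pose_a", "score": 10.0, "penetration": 0.5},
    {"name": "pose_b", "score": 30.0, "penetration": 5.0},  # rejected: too deep
    {"name": "pose_c", "score": 20.0, "penetration": 1.0},
]
print([c["name"] for c in rank_candidates(candidates, max_penetration=2.0)])
```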

Results:
The 3D model structures of Gal1p from Kluyveromyces lactis and E. coli were generated by homology modeling with SWISS-MODEL and EsyPred3D (Modeller 6v2) (Figure 2). The protein sequences of Gal1p were submitted to both programs with default parameters. SWISS-MODEL built the models of Gal1p from K. lactis and E. coli using the known galactokinase structures from S. cerevisiae, 2AJ4 (chain B, sequence identity 60.35%), and from human, 1WUU (chain A, sequence identity 42.22%), respectively (Table 1, see supplementary materials). EsyPred3D (Modeller 6v2) used the same template proteins but with different chains and sequence identities: the query sequence from K. lactis selected chain A of 2AJ4 with a sequence identity of 55.10%, and that from E. coli selected chain A of 1WUU with a sequence identity of 42.5% (Table 1). The number of bad contacts per 100 residues was only one. Additionally, the ProSA-web server was used to measure the similarity of the models to known protein structures from NMR and X-ray experiments. This analysis showed that the SWISS-MODEL structure of K. lactis Gal1p fell within the region occupied by X-ray-determined native protein structures of the same size, with a Z-score of -10.96 (Figure 3). The Gal1p model of E. coli likewise resembled X-ray-determined structures of known proteins, with a Z-score of -9.67. Models from EsyPred3D produced no significant hits from the ProSA server (Figure 3). Based on these analyses, we selected the SWISS-MODEL protein models for further studies (Figure 2).
The models were further optimized by energy minimization with GROMOS96. The Gal1p model of K. lactis was stabilized from an energy of -10939.251 kJ/mol to -18470.979 kJ/mol, and the model of E. coli Gal1p from -8056.374 kJ/mol to -15065.842 kJ/mol. Structural similarity was then estimated with the DALI server, which gave significant matches for Gal1p from K. lactis with 2AJ4 (B) (Z-score 59.6, RMSD = 0.8 Å) and for E. coli with 1WUU (A) (Z-score 64.5, RMSD = 0.6 Å); DALI thus selected the same template proteins as SWISS-MODEL. Furthermore, the modeled structures of Gal1p (kinase enzyme) from K. lactis and E. coli were submitted to functional site prediction servers such as PINTS [24], PROFUNC [25] and Q-SITEFINDER [26] to find putative active-site residues. With significant matches, these servers predicted the following active-site residues: in Gal1p of E. coli, R28, H35, D37, G124, S128, S129, S130, G171 and D174; and in Gal1p of K. lactis, R43, E49, H50, D52, G153, G155, S157, S158, N201, G202, D205 and K252. The functional sites predicted by the Q-SITEFINDER server also matched those from the PROFUNC server (data not shown).
Sequence similarity (by BLAST) and structure similarity (by Swiss-PdbViewer) were estimated between the Gal1p proteins of S. cerevisiae, K. lactis and E. coli. The Gal1p sequences from S. cerevisiae, K. lactis and E. coli did not show any similarity at the nucleotide level, but the protein sequences showed significant similarity to each other. The Gal1p protein of S. cerevisiae showed a sequence identity of 59% (e-value 7e-175, score 596) with the Gal1p protein of K. lactis. With E. coli Gal1p, however, the sequence identity was 27% (e-value 4e-21, score 85.1), lower than that between S. cerevisiae and K. lactis. Gal1p of K. lactis and Gal1p of E. coli showed a sequence identity of 28% (e-value 4e-18, score 75.1). These identities were also reflected in the dot matrix plot, where, among all Gal1p proteins, those of S. cerevisiae and K. lactis align along the diagonal (Figure 4). When the comparative analysis was applied to human GALK1, human Gal1p proved to be most closely related to E. coli Gal1p, with a sequence identity of 46% (e-value 2e-54, score 194) (Table 2, see supplementary materials). Structural comparison agreed: superposition of human Gal1p with the Gal1p of E. coli gave a lower RMSD (0.52 Å) than superposition with the Gal1p of S. cerevisiae (1.21 Å) or K. lactis (1.2 Å) (Table 3, see supplementary materials). An evolutionary tree constructed with the neighbor-joining method gave the same result (Figure 5). We also obtained the putative protein-protein interaction networks for the Gal1p proteins of S. cerevisiae, K. lactis and E. coli with the STRING software (version 8.2) (http://string.embl.de/) (Figure 6).
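For readers unfamiliar with the RMSD values quoted above, the quantity is the root of the mean squared distance between paired atoms of two structures. The sketch below is a minimal illustration and does not reproduce the Swiss-PdbViewer procedure: it assumes that the two coordinate sets have already been superposed and that atoms are paired in order. The coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two toy "structures" whose paired atoms are each exactly 1 unit apart.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```

A lower value means that the paired atoms lie closer together after superposition, which is why the smaller RMSD for the human and E. coli pair is read as greater structural similarity.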
The D-galactose pathway is regulated by several proteins that are known to interact with each other and to regulate the synthesis of the galactose-metabolizing enzymes. Gal1p may also interact with neighboring proteins to carry out its function; we therefore estimated the affinity between Gal1p and the other GAL proteins present in K. lactis, S. cerevisiae and E. coli, using the PatchDock software for the protein-protein interaction study. The Gal1p of S. cerevisiae showed greater affinity for its Gal4p protein, with a PatchDock score of 17350, than for its other Gal proteins (Table 4, see supplementary materials). The Gal1p of K. lactis, on the other hand, showed greater affinity for its Gal80p, with a PatchDock score of 17312. The Gal1p of E. coli showed greater interaction with Gal10p (galE), with a PatchDock score of 16562 (Table 4, see supplementary materials). The residues making interactions with the Gal1p proteins are shown in Figure 7. These selections are based on good geometric shape complementarity between the proteins; complexes with lower complementarity were discarded.

Discussion and Conclusion:
Gal1p is a galactokinase enzyme that participates in the Leloir pathway of D-galactose metabolism. Here we have predicted the 3D structures of Gal1p from K. lactis and E. coli by comparative homology modeling.
The models were built with SWISS-MODEL and verified with Procheck and ProSA. This is the first report of putative structures of Gal1p from K. lactis and E. coli. After modeling the structures, we predicted the functional residues and the putative interacting partners, along with the strength of their affinity. These studies will help in understanding the mechanism of action of the Gal1p protein. This information can also be used in the biotechnology industry, where Gal proteins are used for protein production, or in drug design. Our 3D models may help biologists to better understand the role of Gal1p in the galactose pathways of K. lactis and E. coli. We also deduced the comparative