Identification of the sequence motif of glycoside hydrolase 13 family members

A bioinformatics analysis of the sequences of glycoside hydrolase (GH) 13 family members such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase has been carried out using hidden Markov model (HMM) profiles in order to find the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests that such sequence motifs exist and that residues of these motifs constitute the −1 to +3 catalytic subsites of the enzymes. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, the α-amylase sequence motif shows lower sequence conservation than the motifs of the other GH13 family members.

There are also regions in which the sequence similarity is rather high, although strict conservation of the residues is not observed. These are region V (173-LPDLD-177), region VI (56-GFTAIWITP-64) and region VII (323-GIPIIYAGQ-331; throughout the introduction, residue numbers are given according to 6TAA (TAKA α-amylase) unless stated otherwise) [11]. Of these, region V is at the C-terminus of domain B of these enzymes. A sequence analysis of 79 experimentally characterized proteins has suggested that the signature sequences QpDln and MPKln (single-letter amino acid symbols are used; upper case letters indicate total conservation whereas lower case letters indicate partial conservation) define the oligo-1,6-glucosidase and neopullulanase subfamilies, respectively, in region V. The signature sequence MPDLN characterizes the intermediary group, which includes enzymes with mixed specificities of α-amylase, cyclomaltodextrinase and neopullulanase [12]. Currently, the catalytic triad residues Asp206, Glu230 and Asp297 seem to be the only residues that are absolutely invariant among all the GH-H clan members [11]. In addition, a few residues such as Gly56 and Pro64 [13] (flanking the second β-strand), Tyr82 [13], His122 and His296 [11] are present in most of the members. However, Arg204 has been found to be conserved only in the amylolytic members [6,14].
A larger number of residues are conserved within subgroups of enzymes such as neopullulanases and oligo-α-1,6-glucosidases [12]. For example, Lys or Arg is conserved at position 209 in 91% of the α-(1→4)-linkage-specific members of the GH13 and GH77 families [6]. Similarly, Gly207 and His210 are present in many α-(1→4)-linkage-specific enzymes. In some of the enzymes, Gly207 is replaced by an aromatic residue that mimics the interactions of His210. However, in archaeal and plant α-amylases, His210 is replaced by Gly. Trp and Tyr/Phe are found at positions 231 and 232 in CGTases and one maltogenic amylase but not in other GH13 family members [6]. Enzymes that act on α-(1→6)-linkages (e.g., pullulanases, isoamylases, glycogen debranching enzymes) have a conserved aromatic residue in the loop that links the second β-strand and the second helix. On the other hand, enzymes that act on α-(1→4)-linkages have a conserved aliphatic residue at this position (Ala120), suggesting that such regions provide the enzyme with distinct "activity" and/or "substrate" specificities. Certain fungal proteins of the GH13 family, some of which are involved in cell wall synthesis, share a few conserved residues that are absent in α-amylases from other phyla (plants, animals and bacteria) [15]. These residues are His (Thr41 in Taka-amylase), Arg (Gly44), Cys (Thr66), Leu (Ala120), Tyr (Val231), Trp (Leu232), Cys (Ile326) and Leu (Glu332). The sequence motifs that are responsible for the reaction specificity of the enzymes of the GH13 family are not well understood. Therefore, the sequence motifs of the α-amylase, CGTase, branching enzyme and cyclomaltodextrinase (CDase) subfamilies of the GH13 family were identified. All these motifs are contiguous with the conserved region III and occupy almost the same position with respect to each other. These sequence motifs belong to a region that has not been explored so far and include residues 225 to 264.
This analysis identifies the sequence motifs that are responsible for the reaction specificity of the GH13 family. The newly discovered sequence motifs, along with the previous analyses of the structures of these enzymes [16,17], will help not only in understanding the structure-function relationship of these enzymes but also in identifying GH13 family members.

Generation of dataset and analysis strategy:
The analysis was performed on those members of the GH13 family that use α-glucan as a substrate and produce disaccharides to polysaccharides as final products. The 90% sequence identity cutoff option of the UniProt database (http://www.uniprot.org/) was used to retrieve the sequences, and only reviewed Swiss-Prot sequences for α-amylase, branching enzyme and cyclodextrin glycosyltransferase (CGTase) were selected. For CDase enzymes, all the reviewed Swiss-Prot entries were chosen, as the number of sequences was very low. The analysis was performed on experimentally characterized sequences rather than on computationally annotated sequences (which would give a larger dataset) because, despite high overall sequence similarity, changes in key residues may confer a different activity or no activity at all. All peptides and exceptionally large sequences were excluded to ensure proper alignment. These selection criteria led to a dataset consisting of 59 α-amylases, 12 CGTases, 166 branching enzymes, 3 maltogenic α-amylases, 3 neopullulanases and 2 cyclomaltodextrinases (CDases) (Supplementary Table 1 - available with author). The CDase, neopullulanase and maltogenic α-amylase enzymes are considered together as the CDase subfamily because these enzymes have similar enzymatic activities [23].
The conserved regions of the α-amylase, CGTase, branching enzyme and CDase subfamilies were obtained by multiple sequence alignment and were then used to generate sequence logos. The conserved regions were selected by visual inspection in BioEdit. While selecting the conserved regions, their length and location were kept the same as far as possible. A sequence logo shows the relative frequencies of the various residues at a given position, indicated by proportionally varying the size of each symbol. The order of predominance of the residues at a given position is indicated by showing the most frequently occurring residue at the top of the stack and the least frequently occurring residue at the bottom. The height of the logo at a given position is proportional to the degree of conservation at that position.
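The logo description above can be illustrated with a minimal sketch. It assumes that the stack height is the per-column information content in bits, log2(20) minus the Shannon entropy, as is common for protein sequence logos; the text does not state which logo tool or height formula was used, and no small-sample correction is applied here. The function name `column_profile` and the toy alignment are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def column_profile(aligned_seqs, alphabet_size=20):
    """For each alignment column, return (ranked residues, information content).

    Ranked residues are (residue, relative frequency) pairs, most frequent
    first, mirroring the top-to-bottom order of a logo stack. Information
    content is log2(alphabet_size) minus the Shannon entropy of the column
    (an assumed height measure; gaps are ignored).
    """
    n_cols = len(aligned_seqs[0])
    if any(len(seq) != n_cols for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] != "-")
        total = sum(counts.values())
        if total == 0:
            # Column contains only gaps: nothing to draw.
            profile.append(([], 0.0))
            continue
        freqs = {res: n / total for res, n in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
        profile.append((ranked, math.log2(alphabet_size) - entropy))
    return profile


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["GIPIIY", "GIPVIY", "GLPIIF", "GIPIIY"]
profile = column_profile(toy_alignment)
for position, (ranked, bits) in enumerate(profile, start=1):
    print(position, ranked, round(bits, 2))
```

In this sketch a fully conserved column reaches the maximum height, while a column with mixed residues is shorter and lists its residues in decreasing order of frequency.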

Sensitivity and specificity:
Sensitivity is a parameter that reflects the ability of a profile to detect true positive sequences, while specificity reflects its ability to reject false positive sequences. Sensitivity = TP/(TP+FN), where TP is the number of true positives and FN is the number of false negatives. Specificity = TP/(TP+FP), where FP is the number of false positives.
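The two formulas can be written as a short sketch. The function names and the hit and reference identifiers below are hypothetical; only the formulas come from the text. Note that the quantity defined here as specificity, TP/(TP+FP), is often called precision or positive predictive value elsewhere.

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN), as defined in the text."""
    if tp + fn == 0:
        raise ValueError("TP + FN must be greater than zero")
    return tp / (tp + fn)


def specificity(tp, fp):
    """Specificity = TP / (TP + FP), as defined in the text."""
    if tp + fp == 0:
        raise ValueError("TP + FP must be greater than zero")
    return tp / (tp + fp)


def evaluate_hits(hit_ids, true_ids):
    """Score a list of profile hits against the set of known family members."""
    hits, truth = set(hit_ids), set(true_ids)
    tp = len(hits & truth)   # family members that were found
    fp = len(hits - truth)   # hits that are not family members
    fn = len(truth - hits)   # family members that were missed
    return sensitivity(tp, fn), specificity(tp, fp)


# Hypothetical identifiers, for illustration only.
known_members = ["A1", "A2", "A3", "A4"]
search_hits = ["A1", "A2", "A3", "X9"]
sens, spec = evaluate_hits(search_hits, known_members)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```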

Results and Discussion:
The conserved regions of α-amylase, CGTase, branching enzyme and CDase are present at equivalent positions in the multiple sequence alignment (Supplementary Figure 1 - available with author) and include the conserved region III. Some of the residues of these motifs constitute the -1 to +3 catalytic subsites in the 3D structures of the enzymes (that is, they lie within 4.5 Å of the ligand; data not shown).
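A distance criterion of this kind can be sketched as follows. The text does not describe how the 4.5 Å contacts were computed, so this is only one plausible reading: a residue counts as part of a subsite if any of its atoms lies at or within the cutoff from any ligand atom. The function name, residue labels and coordinates are hypothetical.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.5):
    """Return the sorted labels of residues with any atom within `cutoff` Å.

    protein_atoms: iterable of (residue_label, (x, y, z)) tuples.
    ligand_atoms:  list of (x, y, z) tuples.
    """
    near = set()
    for label, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            near.add(label)
    return sorted(near)


# Hypothetical coordinates in Å, for illustration only.
toy_protein = [
    ("Res1", (0.0, 0.0, 0.0)),
    ("Res2", (3.0, 0.0, 0.0)),
    ("Res3", (10.0, 0.0, 0.0)),
]
toy_ligand = [(1.0, 0.0, 0.0)]
print(residues_near_ligand(toy_protein, toy_ligand))
```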

α-amylase:
A search of the Swiss-Prot database with the HMM profile of the α-amylase sequence motif identified α-amylases with a sensitivity of 95% and a specificity of 99% (Figure 1a). The four false positive hits included two CGTases and two uncharacterized glycosyl hydrolases. The E-values of the CGTases are 0.0023 and 0.014, while those of the uncharacterized glycosyl hydrolases are 4.5e-05 and 0.0085. The results suggest that the sequences shared by the α-amylase motif and the false positive CGTases may be responsible for the α-(1→4) hydrolytic activity. However, the role of these residues needs to be experimentally investigated. The low E-values of the uncharacterized glycosyl hydrolase hits suggest that these enzymes may be α-amylases. The sequence logo of this motif suggests that, besides the catalytic glutamate, this motif also contains three highly conserved residues (Glu, Arg and an aromatic residue, Tyr or Trp) that are absent in other enzymes at equivalent positions. Mutations in the α-amylases of Bacillus stearothermophilus (A271Y, A271F) [25] and Bacillus licheniformis (V271F) [26] (Figure 1a; P04745 residue number, position 59 in sequence logo) caused an increase in the transglycosylation reaction compared with the wild-type enzymes. Interestingly, the mutant A271Y performed the transglycosylation reaction more efficiently than A271F. In human salivary α-amylase, the introduction of a bulkier tryptophan residue in place of Phe271 (Figure 1a; P04745 residue number, position 59 in sequence logo) disrupted the water chain involved in hydrolysis, leading to a 70-fold reduction in hydrolytic activity [27]. Thus, mutational analyses suggest that residues of this motif may contribute to the hydrolytic activity of α-amylases (Figure 1a).

CGTase:
The CGTase-specific motif was identified from 12 experimentally characterized sequences. A search of the Swiss-Prot database with the HMM profile of the CGTase sequence motif identified CGTases with a sensitivity of 100% and specificity (Figure 1b). The lower specificity of this motif is due to two false positive hits (an α-amylase and a maltogenic α-amylase). Both of these false positive hits have very low E-values, 4.7e-28 for the α-amylase and 3.2e-06 for the maltogenic α-amylase. However, the bit score for the maltogenic α-amylase is low, and the hit aligns with only a short stretch of the query motif. As CGTase can perform both hydrolysis and transglycosylation reactions on α-(1→4)-linked polysaccharides, this might have led to the picking of these false positive hits. Maltogenic α-amylase can efficiently catalyze both hydrolysis and transglycosylation reactions, suggesting that the sequences shared by the CGTase motif and the false positive maltogenic α-amylase may be responsible for hydrolytic and/or transglycosylation activity. The role of these residues needs to be experimentally validated. The sequence logo of this motif shows highly conserved residues at several positions. The mutation W286V in Bacillus stearothermophilus CGTase (Figure 1b; P26827 residue number, position 9 in sequence logo) caused a decrease in the cyclization and disproportionation reactions along with an increase in hydrolytic activity. These mutational analyses suggest that such residues are central to the cyclization reaction [35]. The E292A replacement (Figure 1b; P26827 residue number, position 18 in sequence logo) in Bacillus circulans strain 251 implies that this residue may be involved in the disproportionation reaction [32]. Thus, the above mutational analyses suggest that this CGTase-specific motif may confer the reaction specificity of this enzyme.

Branching enzyme:
The sequence logo of the 166 aligned sequences shows the presence of a highly conserved sequence. A search of the Swiss-Prot database with the HMM profile of the branching enzyme sequence motif identified branching enzymes with a sensitivity and a specificity of 100% (Figure 1c). Many residues, such as Ala, are highly conserved and might be responsible for the reaction specificity of the branching enzymes. Unlike other enzymes, in which an aromatic or hydrophobic residue is present next to the catalytic Glu, the branching enzymes have an acidic residue such as Glu or Asp at this position. Thus, the conserved residues of this motif may be responsible for the reaction specificity of the enzyme.

CDase:
The CDase subfamily includes cyclomaltodextrinase (CDase), maltogenic α-amylase and neopullulanase. Despite having different EC numbers, these enzymes have similar enzymatic activities [23] and hence are treated together in the present analysis. A search of the Swiss-Prot database with the HMM profile of the CDase subfamily sequence motif identified the enzymes of the CDase subfamily with a sensitivity of 100% and a specificity of 53% (Figure 1d). The false positive hits include four amylopullulanases and one maltodextrin glucosidase. The sequences shared by the CDase motif and the false positive amylopullulanases may be responsible for the hydrolytic activity. The I355W mutation (Figure 1d; Q08751 residue number, position 8 in sequence logo) in the CDase of Bacillus stearothermophilus reduced the affinity of this enzyme for α-(1→6)-glycosidic-linked substrates. It also led to a reaction specificity similar to that of a typical starch-saccharifying α-amylase [36]. However, the I355V mutant has a high affinity for α-(1→6)-glycosidic-linked substrates. The mutation W356A in Thermoactinomyces vulgaris neopullulanase II suggests that W356 (Figure 1d; Q08751 residue number, position 9 in sequence logo) is crucial for the binding of different substrates, which it achieves through a stacking interaction [37]. However, the Y374 residue is required to make this stacking interaction possible. The Y374A replacement (Figure 1d; Q08751 residue number, position 36 in sequence logo) results in a decrease in the Km value for pullulan as a substrate [38]. The Y374 residue also helps in the hydrolysis of different substrates by providing catalytic water near the catalytic site. It has been observed that replacing Y374 with a hydrophilic residue (D/S) in Bacillus stearothermophilus neopullulanase decreased transglycosylation.
Further, the M372L and Y374F mutants (Figure 1d; Q08751 residue number, positions 34 and 36 in sequence logo) have been observed to have higher transglycosylation activity than the wild-type enzyme [36]. Thus, the above mutational analyses indicate that the residues of the CDase motif may govern the reaction specificities of the enzymes of this subfamily.

Conclusion:
Sequence variation in the α-amylase enzymes is higher than in the other enzymes of the GH13 family. This may be because α-amylase is present in a wide variety of organisms and may have evolved earlier than the other enzymes of the GH13 family. Consequently, more mutations may have accumulated in α-amylase during evolution, allowing it to perform its activity in a wide variety of biological systems or environments. As suggested by a number of mutational studies, replacing the residues of one motif with the sequence of another motif at equivalent positions may change the reaction specificity of the enzyme. Hence, these motifs can be used as a guide for the interconversion of GH13 family members. Residues of these motifs constitute the -1 to +3 catalytic subsites of the GH13 family members, suggesting that these four subsites are mainly responsible for the reaction specificities of some of the GH13 family members.