Comparative characterization of commercially important xylanase enzymes

Xylanase is an industrially important enzyme with a wide range of applications, especially in the paper industry. It is crucial to understand the structural and functional aspects of xylanases produced by diverse sources. In this study, a bioinformatics and molecular modeling approach was adopted to explore the properties and structure of xylanases. Physico-chemical properties were predicted, and motifs, disulfide bridges and secondary structure were predicted for functional characterization. In addition, three-dimensional structures were constructed and their stereochemical quality was evaluated with several structure validation tools. Comparative catalytic site analysis was performed to identify the important residues. Asn72 was found to be a common residue in the active sites of the proteins P35809 and Q12603.

This group of enzymes has attracted considerable attention in the recent past because of its potential applicability in a spectrum of industrial processes [1]. Xylanases are mainly exploited in the Kraft process for the removal of lignin-carbohydrate complexes [7,8,9]. Other important processes in which xylanases are frequently used include the extraction and preparation of beverages [4]; clarification of juices [10]; detergents [11]; generation of protoplasts in plant cells [12]; production of pharmacologically active polysaccharides for use as antimicrobial agents [13] or antioxidants [14]; production of surfactants [15]; and bioconversion of lignocellulosic materials to fuels. Xylanases are broadly classified into two families, Family 10 (F) and Family 11 (G), based on hydrophobic cluster analysis and sequence homology [16,17,18]. Xylanases differ in their physicochemical properties, structures, specific activities, thermostability and yields, thus providing a great deal of choice in their potential usage. In this paper, we report in silico analysis and characterization studies of 8 xylanases from various organisms.

Methodology:
Xylanase protein sequences were retrieved from SWISS-PROT, a public domain protein database [19]. The keyword 'Xylanase' was used during sequence retrieval. The database search yielded 76 xylanase protein sequences. Sequences annotated as putative, partial, precursor or fragment were excluded from the study. Hence, 8 unique proteins were retained and considered for this study (Table 1 in supplementary material). The selected xylanase protein sequences were retrieved in FASTA format and used for further analysis.
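The exclusion step described above can be sketched as a simple filter on FASTA description lines. This is a minimal illustration, not the procedure actually used to query SWISS-PROT: the function names (`parse_fasta`, `keep_record`, `filter_records`) are hypothetical, and the sketch assumes that the annotation terms appear literally in each description line.

```python
# Terms named in the text as grounds for exclusion.
EXCLUDE_TERMS = ("putative", "partial", "precursor", "fragment")


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_record(header):
    """Keep xylanase entries whose description has none of the excluded terms."""
    lowered = header.lower()
    return "xylanase" in lowered and not any(t in lowered for t in EXCLUDE_TERMS)


def filter_records(records):
    """Return only the records that pass keep_record."""
    return [(h, s) for h, s in records if keep_record(h)]
```

A keyword filter of this kind only narrows the candidate list; the final choice of 8 unique proteins would still need manual review of each database entry.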

Physico-chemical characterization:
Theoretical isoelectric point (pI), molecular weight, total number of positive and negative residues, extinction coefficient [20], instability index [21], aliphatic index [22] and grand average hydropathy (GRAVY) [23] were computed using Expasy's ProtParam server [24] (http://us.expasy.org/tools/protparam.html) (Table 3 in supplementary material). The amino acid composition of a protein sequence can reveal its nature; hence, amino acid composition was also computed (Table 2 in supplementary material).

Functional characterization:
Disulphide bonds are important in determining functional linkages, so SS bonds were analyzed from the primary protein sequence data with CYS_REC (Table 7 in supplementary material). CYS_REC identifies the positions of cysteines and the total number of cysteines present, and computes the most probable SS bond pairing pattern in the protein sequence. Motifs in the considered sequences were scanned using Motif Search (Table 5 in supplementary material) [26]. The SOSUI server [27] was used to predict the transmembrane tendency of the proteins considered for this study (Table 6 in supplementary material). Hydrophobicity scores and plots were obtained using the Kyte and Doolittle method with a window size of 7 (Figure 1).

Tertiary structure prediction and structure validation:
Since crystal structures for P35809, Q12603, P26223 and P48791 are not available, SWISSMODEL [28] was used to model the 3D structures of these proteins based on the best template. The details of the templates and the criteria used for selection are listed in Table 8 in supplementary material. No model could be built for P48791 using the first approach mode of SWISSMODEL. Structure validation tools such as ERRAT [29], PROVE [30], PROCHECK [31], WHATCHECK [32] and Verify 3D [33] were employed to evaluate the stereochemistry and quality of the models (Table 9 in supplementary material).
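Two of the sequence-level tallies mentioned above, amino acid composition and the positions of cysteine residues, are simple enough to sketch directly. The function names are hypothetical and the code is only an illustration: it does not reproduce ProtParam or CYS_REC, and in particular it makes no attempt to predict which cysteines pair into SS bonds.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return {residue: percentage of the sequence} for a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def cysteine_positions(sequence):
    """Return the 1-based positions of every cysteine (C) in the sequence."""
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa == "C"]
```

A sequence with no entries in `cysteine_positions` cannot form any disulphide bridge, which is the simplest case a pairing predictor has to handle.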

Active site analysis:
Possible catalytic sites were assessed using CASTp [34]. Of the many binding sites predicted, the active site was selected on the basis of maximum surface area and volume. Important residues in the active sites were identified and compared across the modeled proteins (Table 10 in supplementary material).
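The selection rule described above (largest pocket by surface area and volume, followed by a comparison of residues across proteins) can be sketched as follows. The `Pocket` record, the function names and the tie-breaking order (area first, then volume) are assumptions made for illustration; they do not represent the CASTp output format or the exact criterion applied in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    pocket_id: int
    area: float           # pocket surface area, in square angstroms
    volume: float         # pocket volume, in cubic angstroms
    residues: frozenset   # residue labels lining the pocket, e.g. "ASN72"


def select_active_site(pockets):
    """Pick the pocket with the largest area, using volume to break ties."""
    if not pockets:
        raise ValueError("no pockets supplied")
    return max(pockets, key=lambda p: (p.area, p.volume))


def shared_residues(site_a, site_b):
    """Return the residue labels present in both selected sites, sorted."""
    return sorted(site_a.residues & site_b.residues)
```

Comparing residue labels by name and number, as `shared_residues` does, is only meaningful when the two models use comparable numbering; otherwise a structural or sequence alignment would be needed first.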

Discussion:
Amino acid composition determines the fundamental properties of the enzyme. The amino acid composition of the xylanase sequences is presented in Table 2 (see supplementary material). The isoelectric point (pI) is the pH at which the net charge on the protein is zero. The pI values of the protein sequences range from 4.78 to 8.71, indicating that all considered xylanase sequences are acidic except P56588 and P48793. The calculated pI is useful because, at the pI, solubility is lowest and mobility in an electrofocusing system is zero. The instability index, which gives an indication of the stability of a protein in vitro, can be calculated using equation 1 (see supplementary material). All the considered sequences were classified as stable, with values ranging from 13.57 to 37.23; a value > 40 indicates an unstable protein.
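The instability index is a sum of dipeptide weights scaled by 10/L, where L is the sequence length, and the result is compared against the threshold of 40. The sketch below shows that arithmetic only. The function names are hypothetical, and the weight table must be supplied by the caller; the `TOY_WEIGHTS` values are placeholders chosen to make the arithmetic easy to follow and are NOT the published instability weight values.

```python
def instability_index(sequence, weights):
    """Compute (10 / L) * sum of dipeptide weights over positions 1..L-1.

    `weights` maps each two-letter dipeptide to its instability weight value.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    total = sum(weights[seq[i:i + 2]] for i in range(len(seq) - 1))
    return 10.0 / len(seq) * total


def is_stable(index, threshold=40.0):
    """A value above the threshold indicates an unstable protein."""
    return index <= threshold


# Placeholder weights for illustration only (hypothetical values).
TOY_WEIGHTS = {"AG": 1.0, "GA": 2.0}

if __name__ == "__main__":
    # Dipeptides in "AGAG" are AG, GA, AG: sum = 4.0, index = 10/4 * 4.0.
    print(instability_index("AGAG", TOY_WEIGHTS))
```

With a complete weight table, the same function would reproduce the classification used in the text, where all indices between 13.57 and 37.23 fall on the stable side of the threshold.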
The aliphatic index (AI), defined as the relative volume of a protein occupied by aliphatic side chains, is regarded as a positive factor for the thermal stability of globular proteins. It can be calculated using equation 2 (see supplementary material). The aliphatic index of the xylanase sequences ranged from 48.71 to 87.76. The high aliphatic index of these xylanase sequences indicates that they may be stable over a wide temperature range. From the molar extinction coefficients of tyrosine, tryptophan and cystine (cysteine does not absorb appreciably at wavelengths >260 nm, while cystine does) at a given wavelength, the extinction coefficient of the native protein in water can be computed using equation 3 (see supplementary material).
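Equations 2 and 3 are not reproduced in the text, so the sketch below uses the forms commonly documented for ProtParam as an assumption: the aliphatic index as the mole percent of Ala plus 2.9 times that of Val plus 3.9 times that of Ile and Leu combined, and the extinction coefficient at 280 nm as a weighted count of Tyr, Trp and cystine. The default molar coefficients (1490, 5500 and 125 M^-1 cm^-1) are likewise assumed and may differ from those used in the original calculation; the function names are hypothetical.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(sequence, n_cystine=0,
                           e_tyr=1490, e_trp=5500, e_cystine=125):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1, assumed values).

    `n_cystine` is the number of disulphide-bonded cysteine pairs; free
    cysteine is treated as not contributing.
    """
    seq = sequence.upper()
    if not 0 <= n_cystine <= seq.count("C") // 2:
        raise ValueError("n_cystine exceeds the number of possible pairs")
    return (seq.count("Y") * e_tyr
            + seq.count("W") * e_trp
            + n_cystine * e_cystine)
```

Keeping the coefficients as parameters makes it straightforward to substitute another published set of values without changing the calculation itself.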
The computed protein concentration and extinction coefficients help in the quantitative study of protein-protein and protein-ligand interactions in solution. The grand average of hydropathy (GRAVY) value for a peptide or protein is calculated as the sum of the hydropathy values of all the amino acids, divided by the number of residues in the sequence. The GRAVY indices of the xylanases range from -0.608 to -0.173. These negative values indicate the possibility of better interaction with water. The secondary structure indicates whether a given amino acid lies in a helix, strand or coil. Secondary structure features predicted using SOPMA are presented in Table 4 (see supplementary material). The results revealed that random coils dominated among secondary structure elements, followed by alpha helix, extended strand and beta turns, in P40942, P81536, P35809, P26223, P48791 and P48793, while alpha helix outnumbered random coils in Q12603 and P56588. A motif is a set of conserved amino acid residues located in close proximity that provides clues to function. Motifs predicted using Motif Search are shown in Table 5 (see supplementary material).
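The GRAVY definition given above, together with the window-based Kyte and Doolittle profile used in this study, can be sketched as below. The hydropathy values are the standard Kyte and Doolittle scale; the function names are hypothetical, and reporting each window at its central residue is an assumption about how the profile is plotted.

```python
# Kyte and Doolittle hydropathy scale for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _scores(sequence):
    """Map a sequence to hydropathy values, rejecting non-standard residues."""
    seq = sequence.upper()
    unknown = sorted(set(seq) - set(KD_SCALE))
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {unknown}")
    return [KD_SCALE[aa] for aa in seq]


def gravy(sequence):
    """Sum of hydropathy values divided by the number of residues."""
    scores = _scores(sequence)
    return sum(scores) / len(scores)


def hydropathy_profile(sequence, window=7):
    """Return (centre position, mean hydropathy) for each sliding window.

    Positions are 1-based and refer to the central residue of the window.
    """
    scores = _scores(sequence)
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    half = window // 2
    return [
        (start + half + 1, sum(scores[start:start + window]) / window)
        for start in range(len(scores) - window + 1)
    ]
```

A negative `gravy` value corresponds to a hydrophilic protein overall, while a run of windows well above zero in `hydropathy_profile` marks a locally hydrophobic stretch of the kind expected for a transmembrane segment.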
P40942 and P26223 were found to contain the Glycosyl hydrolases family 10 motif. Glycosyl hydrolases family 11 contains two signature motifs, viz. signature 1, which spans up to 11 residues, and signature 2, of 12 residues. The average length of the motif predicted was 11 in both Glycosyl hydrolases family 10. Motifs could not be predicted for Q12603 and P48791.
SOSUI distinguishes between membrane and soluble proteins and predicts transmembrane helices from amino acid sequences quickly and with high precision. The xylanase from Dictyoglomus thermophilum was classified as a membrane protein by the SOSUI server, while all other xylanases were predicted to be soluble proteins (Table 6 in supplementary material). The predicted transmembrane region was rich in hydrophobic amino acids. This is also evident in the Kyte and Doolittle mean hydrophobicity profile generated with an online tool (http://gcat.davidson.edu/rakarnik/kyte-doolittle.htm) (Figure 1), in which many points lie above the 0.0 line and a clear peak indicates the plausible transmembrane region.
As disulphide bridges play an important role in determining the thermostability of these enzymes, CYS_REC was used to identify cysteine residues and disulphide bonds. CYS_REC predicted no cysteine residues in P40942 and P48793. Possible pairings and patterns, with probabilities indicated by scores, are presented in Table 7 (see supplementary material).
Since experimental structures are lacking for 4 of the xylanases considered, SWISSMODEL was used to predict the 3D structures of these proteins. 1YNA_A (Thermomyces lanuginosus), 1N82 (Bacillus stearothermophilus) and 2F8Q (Bacillus sp. NG-27) were selected as templates from the PDB database for P35809, Q12603 and P26223 respectively, based on sequence identity (Table 8 in supplementary material). The final modeled structures are shown in Figure 2. The predicted structures were validated using various structure validation servers.
PROCHECK analysis revealed that 90.5%, 81.4% and 82.5% of amino acids lie in the most favored regions of the Ramachandran plot for the structures modeled for P35809, Q12603 and P26223 respectively. The predicted structures conformed well to the stereochemistry, indicating reasonably good quality, and were used for further analysis. These structures will provide a good foundation for functional analysis in the absence of experimentally derived crystal structures. It is important to explore and characterize the active site of an enzyme to understand its interactions with substrate. CASTp was used to investigate the possible active sites (Figure 3). The results indicate that Asn72 was present in the active sites of both P35809 and Q12603, implying an essential role in enzyme-substrate interactions.

Conclusion:
To obtain desirable results in industrial applications, it is essential to manipulate the characteristic properties of an enzyme, which is a tedious task. Protein engineering techniques used to achieve this goal require sound knowledge of the protein at both the sequence and structure levels. In this study, 8 xylanase sequences were selected to gain an understanding of their physico-chemical properties and the various levels of protein structure using in silico techniques. Primary structure analysis reveals that most of the xylanases under study are hydrophilic in nature and that three of them contain disulphide linkages. Secondary structure analysis established that, in most of the sequences, random coils dominated among secondary structure elements, followed by alpha helix, extended strand and beta turns. Three-dimensional structures were predicted for proteins for which such data were unavailable, and active sites were explored to determine important residues. This study provides insight into the physicochemical properties and function of xylanases and may thus aid in formulating their uses in industry.

Equation 1 (instability index):
$$II = \frac{10}{L} \sum_{i=1}^{L-1} DIWV(x_i x_{i+1})$$
where L denotes the length of the sequence and DIWV(x_i x_{i+1}) is the instability weight value for the dipeptide starting at position i.