In silico analysis of onion chitinases using transcriptome data

Chitinases are a family of glycoside hydrolase (GH) proteins with multifaceted roles in plants. It is of interest to identify and characterize chitinase-encoding genes from the popular bulbous plant onion (Allium cepa L.). We used onion EST sequences to identify chitinase-encoding contigs and elucidated their features through sequence, structural, and functional analyses. Domain architecture analysis placed these contigs in the GH19 chitinase family. MEME-based motif analysis revealed highly conserved chitinase motifs, including motifs exclusive to plant chitinases. Estimated biochemical properties suggest that these proteins are largely stable and hydrophilic, and they are predicted to localize extracellularly and in vacuoles. DeepGO function prediction further suggests that they participate in multiple cellular processes, including defense. Phylogenetic analysis grouped them into class I and class VII plant chitinases, suggesting that class I chitinases may be abundant in onion. These observations should aid the isolation and functional validation of onion chitinases.

chitinases, but mainly consists of chitinases from animals, fungi, bacteria and viruses. In contrast, most plant chitinases belong to the GH19 family, along with a few chitinases from nematodes and bacteria [10]. Plant chitinases are divided into seven classes (classes I-VII) that span both the GH18 and GH19 families [4]. The functions and localizations of the different chitinase classes differ; for instance, class I chitinases are basic and localize to the vacuole, whereas class II chitinases are acidic and localize extracellularly [11].
The availability of EST data in public databases such as dbEST facilitates the mining, prediction, and characterization of candidate genes by computational methods. Genes with vital roles in processes such as seed development [12], plant growth [13], and defense response [14,15], as well as microsatellites [16] and microRNAs [17], have been identified by mining dbEST and the genome survey sequence database (dbGSS). In the current work, chitinases in A. cepa were identified by EST mining, using previously reported plant chitinase sequences as bait. Further in silico analysis of the highly homologous contigs revealed the structural organization and domain architecture of the identified onion chitinases. Their functional annotations and biochemical properties were predicted using several bioinformatics tools. Finally, the identified onion chitinases were assigned to chitinase classes based on their predicted structural features and phylogenetic classification.

Methodology: EST dataset of onion chitinases:
The NCBI public database dbEST contains single-pass cDNA sequences, or expressed sequence tags (ESTs), from animals, plants, and microorganisms. A total of 20255 ESTs, expressed in different tissues under different physiological conditions and deposited in dbEST, were downloaded in FASTA format. All 20255 ESTs were screened against the NCBI UniVec database [18] to detect vector and adapter contamination, and the contaminating sequences detected were removed. The resulting clean reads were then assembled with the CAP3 sequence assembly program [19] to generate a nonredundant set of contigs.
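In practice, UniVec screening is usually done with NCBI VecScreen, a BLAST-based search against UniVec. As a much simpler, hypothetical stand-in that only illustrates the clean-up step, the Python sketch below parses FASTA text, trims exact matches of a single known adapter (or a terminal fragment of it) from both read ends, and discards reads that end up too short. The adapter string, the 8-base minimum overlap, and the 100-base length cutoff are illustrative assumptions, not parameters from this study.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def trim_adapter(seq: str, adapter: str, min_overlap: int = 8) -> str:
    """Strip the adapter, or a terminal piece of it >= min_overlap bases, from both ends."""
    # 5' end: the read may begin with the tail of the adapter
    for k in range(len(adapter), min_overlap - 1, -1):
        if seq.startswith(adapter[-k:]):
            seq = seq[k:]
            break
    # 3' end: the read may end with the head of the adapter
    for k in range(len(adapter), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            seq = seq[:-k]
            break
    return seq


def clean_reads(text: str, adapter: str, min_len: int = 100) -> Dict[str, str]:
    """Trim every read and keep only those still at least min_len bases long."""
    cleaned = {}
    for header, seq in read_fasta(text):
        trimmed = trim_adapter(seq, adapter)
        if len(trimmed) >= min_len:
            cleaned[header] = trimmed
    return cleaned
```

Exact-match trimming misses adapters containing sequencing errors, which is one reason alignment-based screening such as VecScreen is preferred; the cleaned reads would then be written back to FASTA for assembly with CAP3.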

Sequence homologs of onion chitinases:
The Basic Local Alignment Search Tool (BLAST) variant TBLASTN was used to perform reverse alignment of selected, previously reported chitinases against the A. cepa contigs. All chitinase clusters found in the A. cepa contig set were translated to obtain their protein sequences; the open reading frame (ORF) of each contig was obtained using the ExPASy Translate tool [20]. The deduced protein sequences were used in a second round of BLASTP searches against the NCBI non-redundant protein database to identify their closest homologs. Multiple alignment of the proteins deduced from the selected contigs was performed using the Clustal Omega program [21].
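The ExPASy Translate step amounts to a six-frame translation followed by choosing the reading frame of interest. The sketch below is an illustrative re-implementation, not the ExPASy tool itself: it translates a contig in all six frames with the standard genetic code and returns the longest stretch that starts at a methionine and ends at a stop codon or, for a contig truncated at its 3' end, at the end of the frame. The function names and the "longest ORF" selection rule are assumptions made for this example.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(contig: str) -> Tuple[str, str]:
    """Return (frame, protein) for the longest Met-initiated ORF over six frames."""
    contig = contig.upper()
    best_frame, best_protein = "", ""
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            protein = translate(seq[offset:])
            # Segments between stop codons; the last segment may be open (no stop)
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best_protein):
                    best_frame, best_protein = f"{strand}{offset + 1}", segment[start:]
    return best_frame, best_protein
```

For example, `longest_orf("CCATGGCTGCTGCTGCTGCTTAAGG")` reports frame `+3` and the peptide `MAAAAA`. Real contigs may need extra care, such as ORFs that begin upstream of the assembled sequence.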

De novo motifs and domain architecture:
De novo motif prediction and elicitation for the selected contigs were performed using the Multiple Em for Motif Elicitation (MEME) tool [22]. The motif searches used the zero-or-one-occurrence-per-sequence model to restrict the number of statistically overrepresented motifs in the dataset. Default motif widths were used, with minimum and maximum widths of 6 and 50, respectively. Additional domains were detected using the Simple Modular Architecture Research Tool (SMART) [23]. The folding states of the identified onion chitinases were predicted using the FoldIndex program [24].
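For readers reproducing the motif search with the stand-alone MEME Suite rather than the web server, the settings above map onto MEME command-line options roughly as shown below: `-mod zoops` selects the zero-or-one-occurrence-per-sequence model, and `-minw`/`-maxw` set the motif width range. The input and output paths and the number of motifs requested are hypothetical, since the text does not state them.

```python
from typing import List


def meme_command(fasta_path: str, out_dir: str, n_motifs: int) -> List[str]:
    """Build a MEME argument list mirroring the settings described above."""
    return [
        "meme", fasta_path,
        "-protein",               # input sequences are protein
        "-mod", "zoops",          # zero or one motif occurrence per sequence
        "-minw", "6",             # minimum motif width
        "-maxw", "50",            # maximum motif width
        "-nmotifs", str(n_motifs),
        "-oc", out_dir,           # output directory (overwritten if it exists)
    ]
```

`shlex.join(meme_command("onion_chitinases.fasta", "meme_out", 10))` prints the equivalent shell command, and passing the list to `subprocess.run(..., check=True)` would execute it if MEME is installed; building an argument list rather than a shell string avoids quoting problems with file names.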

Estimation of biochemical parameters:
The molecular weight (Mw) and isoelectric point (pI) of the proteins encoded by the selected onion contigs were predicted using Compute pI/Mw [25]. Further peptide properties, including amino acid composition, instability index, aliphatic index, and grand average of hydropathicity (GRAVY), were predicted using the ProtParam tool [26]. Subcellular localization of the onion chitinases was predicted using the mGOASVM (Plant V2) server [27].
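Two of the ProtParam indices have simple closed forms. GRAVY is the mean Kyte-Doolittle hydropathy over all residues (negative values indicate a hydrophilic protein), and the aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X denotes mole percent. The sketch below computes both; it illustrates the definitions rather than reproducing ProtParam's code, and it assumes that non-standard residue letters can simply be skipped. The instability index (values below 40 are predicted stable) also needs a dipeptide weight table and is omitted; Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` provides `gravy()` and `instability_index()` if a library implementation is preferred.

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(seq: str) -> List[str]:
    """Keep only the 20 standard residues (assumption: skip X, B, Z, etc.)."""
    return [aa for aa in seq.upper() if aa in KD]


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    residues = _standard_residues(seq)
    return sum(KD[aa] for aa in residues) / len(residues)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    residues = _standard_residues(seq)
    n = len(residues)

    def mol_percent(aa: str) -> float:
        return 100.0 * residues.count(aa) / n

    return mol_percent("A") + 2.9 * mol_percent("V") + 3.9 * (mol_percent("I") + mol_percent("L"))
```

Applied to each deduced onion chitinase sequence, a negative `gravy()` result corresponds to the hydrophilic character described in the results.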

Phylogenetic analysis:
Multiple sequence alignment of the selected nine onion contigs was performed using the MUSCLE program.
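The remainder of the phylogenetic procedure is not described in this section. Purely as an illustrative assumption, a distance-based analysis would start from a pairwise p-distance matrix computed over the MUSCLE alignment, as sketched below. The p-distance (fraction of differing sites, ignoring columns where either sequence has a gap) and the function names are choices made for this example; a tree-building method such as neighbor-joining would then be applied to the matrix.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric matrix (as a dict keyed by name pairs) of pairwise p-distances."""
    dist: Dict[Tuple[str, str], float] = {}
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        d = p_distance(s1, s2)
        dist[(n1, n2)] = dist[(n2, n1)] = d
    return dist
```

Uncorrected p-distances underestimate divergence between distant sequences, so model-based corrections are common in practice; the sketch only shows how aligned sequences become a distance matrix.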

Results & Discussion:
Assembling the cleaned EST sequences with CAP3 yielded a total of 4175 contigs. Reverse alignment against these contigs was performed using TBLASTN with previously reported plant chitinases as queries. The bait chitinase sequences, comprising both GH18 and GH19 family members, were taken from the two widely used model plants Arabidopsis thaliana and Oryza sativa, as listed by Grover [30]. Sequence homology assessment through consecutive rounds of BLAST searches identified nine onion contigs (AcCon16, AcCon72, AcCon198, AcCon213, AcCon387, AcCon703, AcCon1214, AcCon2325, and AcCon3094) that are highly homologous to previously reported plant chitinases (Table 1). All nine contigs carry the GH19 domain and were therefore identified as GH19 family chitinases. No GH18 chitinase was identified by our in silico approach. In angiosperms, GH19 family chitinases are abundant, and their distribution is largely restricted to higher plants.
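Screening TBLASTN results for candidate contigs can be sketched as a filter over BLAST tabular output (`-outfmt 6`, whose default 12 columns are listed in `FIELDS` below). In a reverse alignment the bait proteins are the queries and the contigs are the subjects. The e-value and percent-identity cutoffs used here are hypothetical, since the text does not report the thresholds applied; for each contig the sketch keeps only the highest-scoring bait hit.

```python
from typing import Dict, Iterable

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_contigs(lines: Iterable[str], max_evalue: float = 1e-10,
                      min_pident: float = 40.0) -> Dict[str, dict]:
    """Best-scoring bait hit per contig (sseqid) among hits passing both cutoffs."""
    best: Dict[str, dict] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue = float(hit["evalue"])
        pident = float(hit["pident"])
        bits = float(hit["bitscore"])
        if evalue > max_evalue or pident < min_pident:
            continue
        contig = hit["sseqid"]
        if contig not in best or bits > best[contig]["bitscore"]:
            best[contig] = {"bait": hit["qseqid"], "pident": pident,
                            "evalue": evalue, "bitscore": bits}
    return best
```

The surviving contigs would then be translated and checked again with BLASTP against the non-redundant database, as in the second search round described in the methodology.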
By contrast, GH19 family chitinases are rare in microorganisms such as bacteria and are absent from archaea. De novo motif prediction from the selected onion contig sequences using the MEME tool revealed several conserved motifs in both the N-terminal and C-terminal regions (Figure 2). Motif M1 ([WY]N), which is highly conserved across GH19 family chitinases, was present in all nine onion chitinases. The M1 motif has been reported to form the substrate-binding cleft of plant chitinases [31]. In addition, nine further motifs that are structurally conserved in GH19 chitinases were identified in the onion contig sequences. Motifs M3, M4, and M6 are conserved in chitinases from purple bacteria, actinobacteria, and plants [10], whereas motifs M8, M10, M11, M12, and M13, also found in the onion chitinases, occur exclusively in plant GH19 chitinases. Thus, the presence of both highly conserved GH19 structural motifs and plant-specific GH19 motifs strongly suggests that all nine onion chitinases are functional chitinases.
Nine onion contigs with high sequence homology to previously reported plant chitinases and containing conserved canonical GH19 chitinase motifs were identified in our study. Their predicted peptide lengths ranged from 164 (AcCon3094) to 240 (AcCon387 and AcCon1214) amino acids, and their predicted molecular weights ranged from 17.28 to 25.53 kDa. The predicted proteins of all contigs except AcCon198 were stable, with an instability index below 40, and their aliphatic indices ranged from 42.33 to 72.70 (Table 2). All nine chitinases had negative GRAVY values, indicating that they are hydrophilic.
Gene ontology predictions for all nine onion chitinases were performed using DeepGO [32]. DeepGO predicts protein function from sequence with a deep learning model that uses the dependencies between gene ontology (GO) classes as background information. Function prediction for all nine contigs indicated possible catalytic and hydrolase activities, which are the key functions of a plant chitinase (Table 2). In addition, biological process prediction indicated that all nine onion chitinases potentially participate in cellular or multi-organism processes. AcCon198, AcCon387, AcCon1214, and AcCon2325 were also predicted to function in response to stimulus, suggesting a possible role in onion defense responses. These predicted functional attributes strengthen the assumption that the identified contigs are functional chitinases.
Chitinases are a diverse group of enzymes with different enzymatic activities in different parts of the plant and diverse subcellular localizations. We predicted the subcellular localization of the identified onion chitinases using the mGOASVM server, whose prediction accuracy has been reported to be significantly higher than that of conventional predictors such as TargetP, SignalP, and iLoc-Plant [27]. mGOASVM predicted that AcCon16, AcCon213, and AcCon703 localize to the vacuole, whereas AcCon72, AcCon198, AcCon387, AcCon1214, AcCon2325, and AcCon3094 are secretory chitinases. Because the phylogenetic analysis placed the chitinases in classes I and VII, we also predicted their protein folding states using the FoldIndex program. The folding states of all nine onion chitinases were predicted (Figure 4). FoldIndex estimates whether a protein sequence is folded from its net charge and average hydrophobicity [24]. The onion chitinases showed different predicted folding properties: AcCon16, AcCon72, AcCon213, AcCon387, AcCon703, AcCon1214, and AcCon2325 contained a higher percentage of disordered residues, whereas AcCon198 and AcCon3094 showed little unfolding and the fewest disordered regions. These results agree with Mishra et al. [11], who reported that class I chitinases possess more disordered sequences than other classes.
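FoldIndex builds on the charge-hydropathy boundary described by Uversky and co-workers: for a segment it computes 2.785 <H> - |<R>| - 1.151, where <H> is the mean Kyte-Doolittle hydropathy rescaled to the range 0 to 1 and <R> is the mean net charge, with negative values indicating likely disorder. The sketch below applies that formula over a sliding window. The 51-residue window is assumed here to match the FoldIndex default, and counting only Lys/Arg as +1 and Asp/Glu as -1 (ignoring His and the termini) is a simplifying assumption.

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified residue charges near neutral pH (assumption: His and termini ignored)
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def fold_index(seq: str) -> float:
    """Charge-hydropathy fold index for one segment; negative suggests disorder."""
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("no standard residues in segment")
    n = len(residues)
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in residues) / n  # rescale -4.5..4.5 to 0..1
    mean_r = sum(CHARGE.get(aa, 0) for aa in residues) / n
    return 2.785 * mean_h - abs(mean_r) - 1.151


def sliding_fold_index(seq: str, window: int = 51) -> List[float]:
    """Fold index of every full window; a single score if seq is not longer than window."""
    if len(seq) <= window:
        return [fold_index(seq)]
    return [fold_index(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def disordered_window_fraction(seq: str, window: int = 51) -> float:
    """Fraction of windows with a negative fold index (a rough disorder summary)."""
    scores = sliding_fold_index(seq, window)
    return sum(score < 0 for score in scores) / len(scores)
```

The window fraction is only a coarse summary; FoldIndex itself reports disordered regions along the sequence, which is what the percentages of disordered residues above refer to.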

Conclusion:
It is of interest to perform a comprehensive structural evaluation of onion chitinases using various computational approaches. We identified nine onion contigs that are highly homologous to other plant chitinases and carry conserved motifs. Their domain architecture includes a well-conserved GH19 domain in addition to chitin-binding domain (CBD) structures and signal peptides. DeepGO function prediction suggests that four onion chitinases have a role in defense responses. Phylogenetic classification placed the onion chitinases in classes I and VII. These observations provide a framework for the future characterization and functional assessment of onion chitinases and add to our understanding of their distribution and diversity.