Identification of drought responsive proteins using gene ontology hierarchy

The availability of complete genome sequences has facilitated access to the information needed to identify proteins. The determination of the Arabidopsis genome sequence has had a great impact on data annotation. The genome sequencing of Sorghum bicolor has only recently been completed, and hitherto the global response to abiotic stresses in this important crop remains largely unexplored. We used a 2-D gel electrophoresis based proteomic approach combined with MALDI-TOF to analyze drought-stress responsive proteins in sorghum. Major changes in the protein complement of sorghum were observed in hydroponic cultures at 96 hours under drought stress. The six most highly expressed proteins were excised for functional identification. Here, we developed a method to obtain functional distances between GO terms and analyzed the distance values to allocate the shortest path (SP) in the GO hierarchy. The shortest paths for the expressed proteins were noted for the most informative common ancestor (MICA) terms, viz. binding, catalytic activity and primary metabolic process. We observed that the expressed proteins belonged to the functional groups of signal transduction mechanisms and carbohydrate transport and metabolism. These identified protein functions suggest a distinct mechanism of drought-stress tolerance in sorghum. The novel approach applied in this study may have great importance in further identifying proteins involved in abiotic and biotic stress conditions in crops.


Background:
Food productivity decreases under the effect of various abiotic stresses. Drought and salinity are the major abiotic environmental stresses that negatively affect the growth and productivity of plants. Minimizing these losses and improving food productivity by developing stress-tolerant crops is a major global challenge. More than one abiotic stress can occur simultaneously, and one abiotic stress decreases a plant's ability to resist a second. Plant responses to salt and drought stress have much in common [1]. The primary effect of drought is osmotic stress; salinity also induces osmotic stress through its effect on ionic homeostasis in the plant cell. The reduced ability of plants to take up water under salinity quickly causes a reduction in growth rate, along with a group of other metabolic changes that are identical to those caused by drought stress. Photosynthesis is among the primary processes affected by drought and salinity, which reduce the synthesis of photosynthetic pigments [1]. These changes in the amount of photosynthetic pigments are closely associated with plant biomass yield. Besides photosynthesis, reduced water potential also disrupts membranes along with other essential physiological and biochemical processes such as respiration, ion uptake, and carbohydrate and nutrient metabolism [2]. Reduced water availability limits CO2 fixation and reduces NADP+ regeneration in the Calvin cycle. These adverse conditions increase the formation of reactive oxygen species (ROS) such as H2O2 (hydrogen peroxide), O2- (superoxide) and OH (hydroxyl) radicals, through enhanced leakage of electrons to molecular oxygen. ROS are involved in the stress signal transduction pathway as secondary messengers; however, excessive ROS production causes oxidative stress, which damages the plant's photosynthetic pigments, membrane lipids, proteins and nucleic acids. To keep the levels of active oxygen species under control, plants have antioxidant defense systems that protect cells from oxidative damage [3].
Sorghum bicolor is a stress-tolerant cereal crop species of considerable economic importance in the drought- and salinity-affected arid and semiarid regions of the world. The crop is grown for food, feed and biofuels. The genome sequencing of S. bicolor has recently been completed [4]. Although a large amount of data is currently available on gene expression in response to salt, drought and other stresses, the global response to these abiotic stresses in sorghum remains largely unexplored. The increasing availability of genomic sequences of several plants, in combination with high-throughput bioinformatics tools and databases, provides new opportunities for understanding the roles of proteins. The stress defence protein network has recently been described in Arabidopsis and rice using multi-parallel gene expression analysis techniques [5]. Currently, several computational approaches are available for the functional identification of proteins, such as sequence similarity, phylogenetic profiles, protein-protein interaction (PPI) and gene expression [6]. The protein-protein interaction method is based on the assumption that interacting proteins usually share the same function [7]. In this paper, we report a method based on the analysis of functional distances between Gene Ontology terms to identify six expressed drought responsive proteins in sorghum. The semantic similarity of Gene Ontology (GO) terms provides a functional relationship between proteins. The semantic similarity between two proteins is usually calculated from the similarity of their GO terms [8]. Several methods for measuring the semantic similarity between GO terms have been reported earlier, viz. Resnik's, Lin's and Jiang-Conrath's [9]. Gene Ontology is a structured and controlled vocabulary that characterizes the functional annotation of proteins. It is composed of three independent ontologies: biological process (BP), molecular function (MF), and cellular component (CC). In the GO database, the GO terms are structured as a directed acyclic graph (DAG) with 'is-a' and 'part-of' relationships [10].
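As a rough illustration of the three measures named above, the following Python sketch computes them on a toy ontology. The term names, the hierarchy and the annotation probabilities are hypothetical and are not taken from this study. The formulas are the commonly cited information-content definitions, with IC(t) = -log p(t); tools such as ProteInOn may normalize these values differently.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parents.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic": {"root"},
    "ion_binding": {"binding"},
    "protein_binding": {"binding"},
}

# Hypothetical annotation probabilities p(t): the fraction of proteins
# annotated to the term or to any of its descendants.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalytic": 0.4,
    "ion_binding": 0.2,
    "protein_binding": 0.1,
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: IC(t) = -log p(t), written as log(1 / p(t))."""
    return math.log(1.0 / PROB[term])


def mica(t1, t2):
    """Most informative common ancestor: the shared ancestor with highest IC."""
    return max(ancestors(t1) & ancestors(t2), key=ic)


def resnik(t1, t2):
    """Resnik similarity: IC of the MICA."""
    return ic(mica(t1, t2))


def lin(t1, t2):
    """Lin similarity: 2 * IC(MICA) / (IC(t1) + IC(t2))."""
    denom = ic(t1) + ic(t2)
    return 2.0 * ic(mica(t1, t2)) / denom if denom > 0 else 1.0


def jiang_conrath_distance(t1, t2):
    """Jiang-Conrath distance: IC(t1) + IC(t2) - 2 * IC(MICA)."""
    return ic(t1) + ic(t2) - 2.0 * ic(mica(t1, t2))
```

In this toy example, mica("ion_binding", "protein_binding") is "binding", whereas a pair that shares only the root, such as "ion_binding" and "catalytic", has a Resnik similarity of zero.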
The Gramene plant genome database [11] (Release 34b) includes a wealth of information on proteins from Arabidopsis, rice and other plant species. However, only limited information is available for sorghum to date. In the present study, a 2-DE based proteomic technique with MALDI-TOF was used to separate and identify six highly expressed proteins in sorghum. Drought (no water supply) was imposed on 7-day-old sorghum plants for up to 96 hours to investigate cellular responses. To identify the six expressed drought responsive proteins, we proposed a new method based on functional distances between GO terms.

Methodology:
Plant material
Seeds of Sorghum bicolor (L.) genotype csv-17 were used as plant material and grown hydroponically in Hoagland's solution. After 7 days of germination, drought stress (no water supply) was imposed on the germinated seeds and leaves.

Two-dimensional (2-DE) gel electrophoresis
The method of Mechin et al. [12] was followed for the extraction of proteins from the plant sample. Protein samples were purified using a 2D-cleanup kit (Bio-Rad) and the protein pellet was finally resuspended in sample rehydration buffer (8 M urea, 2% CHAPS, 15 mM DTT and 0.5% IPG buffer pH 4-7). Protein spots were visualized by staining with Coomassie Brilliant Blue G-250. Gel images were captured with a GS800 densitometer (Bio-Rad, USA).

Protein spots identification by mass spectrometry
The gel spots were washed with proteomic grade de-ionized water and proteins were identified using a MALDI-TOF mass spectrometer (Ultraflex III, Bruker Daltonics, Germany). The expressed proteins were analyzed using the Mascot sequence matching software (http://www.matrixscience.com) with MSDB in the taxonomy group of green plants. The Mascot search parameters were: a maximum of one missed cleavage by trypsin, fixed modification of oxidation, charge state of +1, peptide mass tolerance of 50 ppm, and fragment mass tolerance of ±1.0 Da.
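To make the peptide mass tolerance concrete, the short Python sketch below converts a tolerance in ppm into an absolute mass window. The function names and the example mass of 2000 Da are hypothetical and serve only to illustrate the arithmetic; they are not part of the Mascot software.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass window for a tolerance given in ppm."""
    delta = mass * ppm / 1e6  # absolute tolerance in the same unit as mass
    return mass - delta, mass + delta


def within_tolerance(observed, theoretical, ppm=50.0):
    """Check whether an observed mass matches a theoretical mass within ppm."""
    return abs(observed - theoretical) <= theoretical * ppm / 1e6
```

For a hypothetical peptide of 2000 Da, a tolerance of 50 ppm corresponds to a window of about ±0.1 Da, so an observed mass of 2000.08 Da would be accepted and 2000.12 Da would not.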

Data identification and network illustration
A total of six expressed proteins in sorghum were identified by homology comparison with closely related Oryza sativa proteins. The homology search was carried out using blastp at the UniProt database (http://www.uniprot.org/blast/) with the following parameters: Database-UniprotKB; Threshold-0.

Functional distance algorithm
We proposed an algorithm to obtain functional distances using the sum of distance values between GO terms from the root to its associated terms in the GO hierarchy. The ProteInOn tool (http://xldb.di.fc.ul.pt/tools/proteinon/) was used to calculate the semantic similarities with the Jiang-Conrath measure, without ignoring annotations inferred from electronic annotation (IEA). The shortest path was confirmed by analysis of the distance values in the corresponding paths. The path with the smallest distance value was referred to as the shortest path. The shortest path algorithm is defined in terms of the following quantities: D1 is the functional distance between GO terms (t1, t2) and SS is their semantic similarity. The pathA is the path of GO terms connected to the root term. The total distance (d) used to identify the shortest path was calculated by summing the functional distance values between GO terms.

Discussion:
Sorghum seedlings were grown hydroponically and drought stress was imposed on 7-day-old germinated seeds. The drought stress induced chlorosis and reduced the growth of leaves. The MALDI-TOF/MS analysis identified the expression of six drought inducible proteins in sorghum seedlings as compared to the corresponding control with water (Figure 1).
The whole genome sequencing of Sorghum bicolor has recently been completed [4], and to date no comparable information on drought inducible proteins from this crop is available. Hence, the identified homologues in sorghum were described as hypothetical proteins. Here, bioinformatics approaches were applied to place the expressed drought responsive proteins in a functional context (Table 1, see supplementary material). The results from the COGnitor program revealed that the differentially expressed proteins belonged to the functional categories of signal transduction mechanisms (spots S1 and S2) and carbohydrate transport and metabolism (spot S5). Among the expressed proteins, cupin1 (spot S3), glycoside hydrolase (spot S4) and raffinose synthase (spot S6) did not match any COG record. Hence, the families of all six expressed proteins were identified (Table 1, see supplementary material).
Figure 2: GO hierarchy network of highly expressed proteins under drought stress. In the graph, M1, M2 and M3 are MICA terms and n1-n11 are GO terms of the expressed proteins.
In this study, we have developed a method to identify proteins using functional distances between hierarchically structured GO terms. The shortest path was identified on the basis of an analysis of the distance values of the corresponding paths. In the results, the path with the smallest distance value is referred to as the shortest path (Table 2, see supplementary material). While finding the shortest paths for the assigned GO terms of the six expressed proteins, the most informative common ancestor (MICA) terms were noted. The identified MICA terms facilitated the functional annotation of the drought stress responsive proteins. The shortest path method developed in this study was applied to the hierarchically structured GO terms in molecular function, biological process and cellular component.
The same calculation was performed to find the shortest path for the other GO terms associated with the root term, molecular function. The GO hierarchy of the root term biological process (GO: 0008150) was associated with the GO terms [(n9) carbohydrate metabolic process (GO: 0005975) (spots S4, S6)] and [(n10) trehalose biosynthetic process (GO: 0005992) (spot S5)] (Figure 2). To identify the shortest path from the root to its associated GO terms, we applied the method based on the algorithm described in the methodology. Consequently, the MICA [(M3) "primary metabolic process" (GO: 0044238)] was noted for these GO terms. The GO terms (n9) and (n10) share the same hierarchy path (Figure 2).
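The path selection described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy hierarchy and the semantic similarity (SS) values are hypothetical, and because the exact expression for D1 is not reproduced in this text, the sketch assumes D1 = 1 - SS. Every path from the root to a target term is enumerated, its functional distances are summed to give the total distance d, and the path with the smallest d is taken as the shortest path.

```python
# Hypothetical toy hierarchy (parent -> children); not real GO data.
CHILDREN = {
    "root": ["a", "b"],
    "a": ["target"],
    "b": ["target"],
    "target": [],
}

# Hypothetical semantic similarity (SS) values for each parent-child pair.
SS = {
    ("root", "a"): 0.2,
    ("root", "b"): 0.5,
    ("a", "target"): 0.6,
    ("b", "target"): 0.7,
}


def functional_distance(t1, t2):
    """ASSUMPTION: D1(t1, t2) = 1 - SS(t1, t2); the paper's formula may differ."""
    return 1.0 - SS[(t1, t2)]


def all_paths(start, goal, path=None):
    """Yield every path from start down to goal in the acyclic hierarchy."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for child in CHILDREN[start]:
        yield from all_paths(child, goal, path)


def total_distance(path):
    """Total distance d: the sum of functional distances along the path."""
    return sum(functional_distance(a, b) for a, b in zip(path, path[1:]))


def shortest_path(start, goal):
    """Return the path with the smallest total distance."""
    return min(all_paths(start, goal), key=total_distance)
```

With these toy values, the path through "a" has a total distance of about 1.2 and the path through "b" about 0.8, so shortest_path("root", "target") selects the path through "b". Exhaustive enumeration is used here only because it mirrors the described comparison of path totals; for a large hierarchy a standard shortest-path algorithm would be more practical.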

Conclusion:
It can be concluded that the approach applied in this study can be used to identify proteins. We described the isolation and characterization of six drought stress responsive proteins expressed in sorghum using 2DE-MS proteomic approaches. We developed a new method to obtain functional distances between GO terms and analyzed the distance values to allocate the shortest path (SP) in the GO hierarchy. Further, we observed that the identified proteins belonged to the functional categories of signal transduction mechanisms. These identified proteins may point to a distinct mechanism of drought-stress adaptation in sorghum. The approach applied in this study may have great importance in further identifying proteins involved in abiotic and biotic stress conditions in crops.

Figure 1: 2-DE gel showing spots differentially expressed (>1.5-fold difference) in leaves from Sorghum bicolor plants grown under drought stress (no water supply) (A) as compared to the control (B). The identified spots are marked with arrows.

The GO terms were obtained from the Gene Ontology database, which is cross-linked at the UniProt database. We used the GO hierarchies of Arabidopsis thaliana to obtain GO directed acyclic graphs (DAGs). The graph was illustrated using Cytoscape version 2.8.1 [14].