Identification of putative drought-responsive genes in rice using gene co-expression analysis

Drought is one of the major abiotic stresses causing yield losses and restricted growing area for several major crops. Rice being a major staple food crop and sensitive to water-deficit conditions bears heavy yield losses due to drought stress. To breed drought tolerant rice cultivars, it is of interest to understand the mechanisms of drought tolerance. In this regard, large amount of publicly available transcriptome datasets could be utilized. In this study, we used different transcriptome datasets obtained under drought stress in comparison to normal conditions (control) to identify novel drought responsive genes and their underlying molecular mechanisms. We found 517 core drought responsive differentially expressed genes (DEGs) and different modules using gene co-expression analysis to specifically predict their biological roles in drought tolerance. Gene ontology and KEGG analyses showed key biological processes and metabolic pathways involved in drought tolerance. Further, network analysis pinpointed important hub DEGs and major transcription factors regulating the expression of drought responsive genes in each module. These identified novel DEGs and transcription factors could be functionally characterized using systems biology approaches, which can significantly enhance our knowledge about the molecular mechanisms of drought tolerance in rice.


Background:
Rice is a model plant species and a major staple food crop among cereal feeding around half of the world population [1]. Since, rice has very high-water requirements for good production, drought is a major cause for its yield reduction and limits the growing area worldwide [2]. Several projects were launched for the annotation of rice genome which enabled researchers to utilize the genomeannotation information for the functional characterization of rice genes. One of the promising projects is Rice Genome Annotation Project by MSU (http://rice.plantbiology.msu.edu/index.shtml) with its 7th release providing annotation of all the 12 chromosomes and about 40,000-60,000 estimated rice genes [3]. Recently the reduced cost of transcriptome analysis and availability of efficient omics tools have shifted the attention of molecular biologists to utilize the genome annotation information via computational approaches. Since most of the biological pathways involve coordinated regulation of dozens-to-hundreds of genes (known as co-expressed genes), identification of these co-expressed genes provides useful information for the characterization of key genes via systems level approaches. The development of various omics databases such as the gene ontology (GO) consortium, 2017 (http://geneontology.org/) [4], AGRIGO V2.0 (http://systemsbiology.cau.edu.cn/agriGOv2/) [5], and kyoto encyclopedia of genes and genomes (KEGG) (https://www.genome.jp/kegg/), etc. have provided the opportunity to explore the molecular mechanisms and biological pathways of genes involved in biotic and abiotic stresses by utilizing the available large transcriptome datasets with the minimum cost. 481 ©Biomedical Informatics (2019) In this study, we used several computational and omics tools to understand the molecular components and biological processes involved in drought tolerance in rice. Recently, co-expression network analysis has emerged as a very useful approach for the functional annotation of novel genes [6,7]. It is based on the idea that all the genes involved in a particular biological pathway will be connected to each other in a network. The nodes in the network represent genes and edges represent their connection to other genes. One of the promising applications of network analysis is to identify the functional modules. Weighted gene co-expression network analysis (WGCNA), a package in R, has been used for the network analysis to detect the networks of co-expressed genes and divide the core genome into different modules [8]. These modules can then be used to pinpoint drought responsive key hub genes and predict their putative role in particular biological processes and metabolic pathways, thus facilitate for the elucidation of molecular and biochemical mechanisms using systems biology approaches. For example, using WGCNA analysis of the RNA-seq data, researchers have identified key regulators of flower and fruit development in strawberry [9]. In addition, WGCNA analysis is an important source for the functional annotation of novel genes in rice, thereby providing a helping hand for the functional characterization of novel genes in rice [6]. Loss-of-function mutants have been widely used to characterize various genes for their involvement in particular phenotype and understand the involved molecular mechanisms [10,11]. However, it mostly involves development of large mapping populations for the map-based cloning approach which is a very time taking, laborious and tricky approach. On the other hand, identification of putative target genes involved in particular phenotype or biological process using transcriptome datasets has becoming popular approach for the characterization of genes involved in target phenotype [12]. This can help reduce the long time and large land area needed for the development of mapping populations or near isogenic lines of the specific genes. Here, we identified several putative drought responsive uncharacterized genes using co-expression network analysis from large transcriptomic datasets that will serve as an important resource for the identification of genes involved in drought tolerance using loss-of-function knockout mutants. In addition, we have highlighted several biological processes and metabolic pathways of the identified modules that are involved in drought tolerance by the particular module's genes.

Material and Methods: Plant material and growth conditions
The seeds of japonica rice cultivar 'Hunan' were collected from Hunan Rice Research Institute, Changsha, China. The experiment was conducted in controlled environment in a greenhouse. The seeds were surface sterilized with 1% NaOCl solution to remove the contaminants. Sterilized seeds were immersed in water at 37°C for two days followed by germination at 30°C with a photoperiod of 16 h (light)/8h (dark) and a relative humidity set at 70%. Seedlings were grown for a week in 2000 ml pots containing a½ strength Hoagland nutrient solution. Then, the drought stress treatment was applied by adding to the nutrient solution 20% PEG6000 solution and then the root samples were harvested after 0, 1, 3 and 5 d time period. The control seedlings were maintained in a nutrient solution without drought treatment and samples were collected in parallel. Three biological replicates were maintained for control and drought treatment for each time point.

RNA isolation and qRT-PCR gene expression analysis
Plant total RNA from control and drought-treated root samples was isolated using an RNA extraction kit (Tiangen), and the first-strand cDNA was synthesized from 2 ng of RNA by reverse transcriptase (Invitrogen, Carlsbad, CA, USA), and then diluted (1:100) for use in qRT-PCR with SYBR Premix ExTaq Mix (Takara, Dalian, Liaoning, China) in a total volume of 15 mL. Reactions were performed in a LightCycler 480 thermal cycler (Roche, Basel, Switzerland), following the manufacturer's instructions. Three biological replicates were performed for each sample, and the expression level was normalized to that of rice Actin-1 gene (LOC4333919), which was used as the endogenous control gene. The primer sequences used in this study are given in supplemental material 1.

Weighted gene co-expression network analysis (WGCNA)
Gene co-expression networks were constructed using WGCNA package in R software [8]. The core DEGs were further divided into three modules using WGCNA and correlation of each module with drought stress was calculated. Genes in each module that is blue, grey and turquoise are listed in supplemental material 3. Moduletrait associations were estimated using the correlation between the module eigengene and the stress treatments. Network visualization for each module was performed using the Cytoscape software version 3.6.1 with a cut off of the weight parameter obtained from the WGCNA, set at 0.3. The gene co-expression network is a scalefree weighted gene network with multiple nodes connected to different nodes via edges. Each node represents a gene which is connected to different number of genes. The gene which is connected to a greater number of genes is denoted with bigger size and is more important for its interaction with large number of genes. In addition, key transcription factors involved in drought tolerance and their putative target genes were highlighted.

Identification of the drought-responsive core DEGs in rice
In this study, we analyzed global gene expression profiles of japonica rice cultivar Nipponbare for drought stress response using different datasets.  In order to identify the drought responsive core DEGs, we crosscompared the DEGs among different datasets used in this study, which displayed 517 core DEGs that are conserved among different drought treatments ( Figure 1B). The global gene expression profiles of these core DEGs vary across the datasets and among control and stressed treatments, confirming that these core DEGs are responsive to drought ( Figure 1C). We then employed these 517 core DEGs to get further insight into the mechanisms of drought responses. To achieve, we first performed gene ontology (GO) to identify the significantly enriched biological processes contributed by these DEGs. GO analysis unveiled that biological process related to "sequence-specific DNA binding" was the most enriched biological process followed by "response to stress" with the q-value lower than 0.1, suggesting that specific transcription factors may be involved in drought tolerance by regulating the expression of stress-responsive genes (Figure 2A). We then performed kyoto encyclopedia of genes and genomes (KEGG) analysis to identify enriched pathways contributed by 517 core drought responsive DEGs. Plant hormone signal transduction was the most significantly enriched pathway followed by carbohydrate metabolism with 24% and 21% of the core DEGs involved in these pathways, respectively, indicating that particular hormones played key roles in drought tolerance (Figure 2B, supplemental material 4).

WGCNA of the drought-responsive core genome
In order to identify the different modules involved in drought tolerance in rice, we performed WGCNA which divided the core 517 hub DEGs into three modules; blue, grey and turquoise ( Figure  3). Blue module contains 218 DEGs, grey module contains 76 DEGs and turquoise module contains 223 DEGs (supplemental material 3). Blue and turquoise modules had the positive correlation (r=0.82 and r=0.79, respectively) with drought stress suggesting that genes in these modules positively regulate the drought tolerance in rice (Figure 3). However, grey module had the negative correlation (r=-0.91) with drought stress which indicates that genes in this module negatively regulate the drought tolerance in rice, and thus should be down-regulated under drought stress in order for the plant to survive the drought stress (Figure 3). Hence, it can be conferred that biological pathways enriched in grey module will be different from blue and turquoise modules.
We further expanded the study by performing the GO and KEGG analyses of each module to identify its involvement in particular biological process and metabolic pathway. GO analysis of the 218 DEGs belonging to blue module displayed "sequence-specific DNA binding" as the most enriched biological process, suggesting that transcription factors may be involved in drought tolerance by regulating the expression of drought responsive genes ( Figure 4A). Moreover, KEGG analysis of these DEGs identified "plant hormone signal transduction" as the most enriched metabolic pathway followed by carbohydrate metabolism related pathways ( Figure  4B). Similarly, GO and KEGG analysis of the 223 turquoise DEGs identified "sequence-specific DNA binding" as the most enriched biological process and "plant hormone signal transduction" as the most enriched metabolic pathway, similar to the blue module genes suggesting that blue and turquoise modules may participate in the same biological processes and metabolic pathways (Figure 5).
In contrast, GO analysis of the 76 DEGs belonged to grey module detected "transporter activity" as the most enriched biological process under drought stress ( Figure 6A). Further KEGG analysis revealed that "porphyrin and chlorophyll metabolism" was the most significantly enriched metabolic pathway suggesting that photosynthesis rate might be affected under drought stress probably due to closing of stomata to avoid water loss ( Figure 6B).

Networks displaying relationships among genes within coexpressed modules
We further extended the study to network analysis of detected modules to identify key hub genes and their correlation. Genes in the blue module were divided into four clusters, each having network of different number of genes ( Figure 7A). Genes encoding transcription factors (TFs) are represented with different node colors except sky blue. The size of node circle is positively correlated with the number of genes it interacts.

qRT-PCR validation of selected genes from each module under drought stress
We selected 16 genes from the three modules and performed a qRT-PCR analysis of their gene expression after 5 d under drought stress simulated by PEG6000 treatment in an independent rice cultivar 'Hunan'. The results showed that the expression levels of all the selected genes were significantly changed (|FC>2|) at different time points under drought stress (supplemental material 5). This confirms that the genes identified through our computational analysis are well responsive to drought in rice.

Discussion:
Drought is a major abiotic stress that negatively affects crop production in major crops including rice [16], wheat [17], maize [18,19] and cotton [20]. Rice being a major staple food crop is particularly sensitive to drought stress due to its large water requirements. Thus, understanding the molecular mechanisms of drought tolerance in rice is particularly important to cope with the challenge. Transcriptome-based studies have provided a new platform to understand the molecular mechanisms and biological processes involved in drought tolerance. These RNA-seq data generates a bundle of information for a target phenotype or stress, however, resourceful utilization of this data has been a bottleneck.
Recently, availability of several bioinformatics and statistical tools have helped the researchers to pinpoint key biological processes and metabolic pathways involved in biotic or abiotic stress tolerance, thus efficiently utilizing the RNA-seq data [12, 21]. In this study, we aimed at understanding the key players of drought tolerance in rice using efficient statistical and bioinformatics tools by utilizing large transcriptome datasets. Datasets contain gene expression profiles from RNA-seq data of japonica rice cv. Nipponbare under control and drought stress conditions. WGCNA is a powerful package in R which divides the core DEGs in different modules based on correlation among coexpressed genes involved in particular metabolic pathway [8]. In the present investigation, we first identified drought responsive core DEGs by cross comparison of various transcriptome datasets of rice under drought stress (Figure 1). This core droughtresponsive DEGs were subjected to gene ontology enrichment which unveiled "sequence-specific DNA binding" as the most enriched biological process in drought tolerance in rice (Figure 2A). Transcription factors have been well documented to pl ay key role in stress tolerance by regulation of stress-responsive gene expression [22]. The TFs work by binding to the promoter region of 486 ©Biomedical Informatics (2019) particular genes and thus regulate their expression. Thus, GO analysis highlighted the possible role of TFs in drought tolerance in rice. KEGG analysis further revealed "plant hormone signal transduction" as the most enriched metabolic pathway of these core DEGs which further extended our understanding of the metabolic pathways involved in drought tolerance in rice ( Figure 2B). Later, we did WGCNA of these core DEGs that divided them into three modules, each of them contributes to drought tolerance in a unique metabolic pathway. Modules blue and turquoise had positive correlation with drought stress treatment, thus genes in these modules positively regulate drought tolerance in rice. On the other hand, grey module had a negative correlation with drought stress, thus genes in this module negatively regulate drought tolerance in rice (Figure 3).
To get a deeper insight of how these modules participate in drought tolerance, we performed GO and KEGG analysis of each module separately. Results revealed that "sequence-specific DNA binding" is the most enriched biological process in blue and turquoise modules, suggesting that TFs are involved in drought tolerance. In addition, both of these modules play role in drought tolerance in similar pathway and genes in these pathways positively regulate drought tolerance. However, GO analysis revealed transporter activity as the most enriched biological process in grey module and genes in this module negatively regulate drought tolerance (Figure 5). In addition, KEGG analysis showed that "porphyrin and chlorophyll metabolism" was the most enriched metabolic pathway in grey module. Porphyrin is an important metabolite and its production under drought stress is critical for stress tolerance in rice. Rice plants over-expressing proto porphyrinogen oxidase showed improved drought tolerance under limited water supply [23]. Similarly, heat and drought stresses also affect the rate of photosynthesis in plants due to closing of stomata and reduced chlorophyll contents [18,24,25]. Genes involved in sustained photosynthesis under drought stress would be imperative to play role in drought tolerance in rice. Since grey module genes negatively regulate drought tolerance, it could be suggested that these genes would be playing role for the negative regulation of "porphyrin and chlorophyll metabolism" under drought stress in rice. However, down regulation of these genes could be utilized for the improvement of drought tolerance in rice.
Since TFs have been well characterized to be involved in abiotic stress tolerance, we further extended the study to identify key hub genes and TFs in each module. Network analysis is a very useful tool to pinpoint association among different genes particularly those playing role in a certain pathway [26]. It also delineates the putative interaction among TFs and their target genes. HSFs are a big class of TFs having important role in abiotic stress tolerance particularly drought and heat stress via regulating the expression of drought and heat stress responsive genes [27,28]. Identification of putative HSF genes responsive to drought stress in rice would improve our knowledge about the mechanism of drought tolerance in rice. In this study, network analysis identified two novel uncharacterized HSF genes (LOC_Os02g13800 and LOC_Os02g325900), which showed interaction with several genes in the blue and turquoise module, respectively. These genes could be the putative target genes involved in drought stress response (Figure 7). WRKY TFs are well known TFs involved in biotic and abiotic stress tolerance in plants [29]. In this report, we identified a novel WRKY TF (LOC_Os02g26430) interacting with several genes of blue module and had highest expression in reproductive tissues including panicle and anthers in rice (http://ricexpro.dna.affrc.go.jp/). Since rice is most susceptible to drought and heat stress at reproductive stage, thus this could be a potential TF regulating drought tolerance in rice at reproductive stage [1,30]. In addition, Homeodomain, MYB, bZIP and C2H2 TFs were shown to be involved in drought tolerance in crop plants by modulating the expression of several genes [31 -33]. Here, we report novel drought responsive TFs belonging to blue module, including Homeodomain protein (LOC_Os04g49450), MYB (LOC_Os05g04210 and LOC_Os04g43680), bZIP (LOC_Os06g10880) and C2H2 (LOC_Os03g32230) which could be functionally characterized using systems biology approaches to improve our understanding of the molecular mechanism of drought tolerance.
Further, in grey module, we identified Zinc finger domain TF (LOC_Os08g06280) interacting with three genes including LOC_Os01g69070 (Auxin efflux carrier family protein), LOC_Os02g35180 (OsRR2 type-A response regulator) and LOC_Os05g06920 (Ca2+-activated RelA/spot homolog) that could be putative target genes in the drought tolerance mechanism. Zinc finger TFs have been previously reported to regulate drought tolerance in rice via regulating expression of particular target genes [34,35]. Since grey module was negatively correlated with drought stress treatment, this transcription factor may negatively regulate drought tolerance in rice. However, further characterization of this TF or its target genes using loss of function mutants would further advance our understanding of the mechanism of drought tolerance in rice [11]. Another TF identified in the present study in the turquoise module belongs to the homeobox-domain family (LOC_Os04g45810) which interacts with LOC_Os06g44190 and LOC_Os03g17790. Homeodomain transcription factors were also reported to regulate drought tolerance in rice via binding at particular cis-elements. For example, OsWox13 enhances drought tolerance in rice via regulating expression of genes involved in drought escape [31]. Thus, homeobox TF identified in our study could be a potential regulator of genes involved in drought tolerance in rice particularly those containing its cis-elements in their promoter region.

Conclusions:
We report the core drought responsive DEGs and the co-expressed modules from diverse transcriptome datasets under drought stress in rice. GO and KEGG analysis showed "sequence-specific DNA binding" and "plant hormone signal transduction" as key 487 ©Biomedical Informatics (2019) biological process and metabolic pathway involved in drought tolerance in rice. Network analysis pinpointed several putative TFs from HSF, WRKY, Zinc-finger domain, homeobox domain, bZIP and MYB families that could be key regulators of drought tolerance in rice. , the publisher presents BIOINFORMATION since 2005 …

Supplementary materials (available with authors):
The journal is indexed in