Computational predictions of common transcription factor binding sites on the genes of proline metabolism in plants

Proline, an imino acid, is well documented to be associated with the stress response induced in plants by abiotic factors such as drought, cold and salinity and by biotic factors such as bacterial and fungal attack. However, the regulatory mechanisms controlling proline metabolism, its intercellular and intracellular transport, and its connections to other metabolic pathways are poorly understood. F-Match analysis combined with composite module analysis (CMA) revealed that binding sites matching the matrices for O2 and OCSBF-1 were overrepresented in the promoters of differentially expressed proline metabolism genes. The presence of MYBAS1 consensus binding sites in combination with O2 and OCSBF-1 sites in the promoters of proline biosynthesis genes, and of SBF-1 and GT-1 consensus binding sites in combination with O2 and OCSBF-1 sites in the promoters of proline catabolic genes, suggests their involvement in the modulation of proline metabolism and accumulation in plants.

Accumulation of proline in plant cells under stress could be due to de novo synthesis, decreased degradation, or both. Synthesis of proline in plants occurs in the cytosol and in plastids (such as chloroplasts in green tissues) and involves the sequential action of pyrroline-5-carboxylate synthetase (P5CS) and pyrroline-5-carboxylate reductase (P5CR), which convert glutamate to pyrroline-5-carboxylate (P5C) and P5C to proline, respectively [11,12]. Several studies have indicated that P5CS is the critical enzyme in proline biosynthesis under salt and water stress [13,14]. Proline biosynthesis in Arabidopsis is controlled by two P5CS genes, encoding one housekeeping and one stress-specific P5CS isoform. Arabidopsis P5CS1 is induced by osmotic and salt stresses and is activated by H2O2-derived signals and an abscisic acid (ABA)-dependent pathway [4,15]. P5CR is encoded by only one gene, but the enzyme seems to be active in both chloroplasts and cytosol [16,12].
Proline is oxidized by the sequential action of proline dehydrogenase (PDH), which converts proline to P5C, and Δ1-pyrroline-5-carboxylate dehydrogenase (P5CDH), which converts P5C to glutamate [17,18]. Proline oxidation contributes to NADP/NADPH cycling and redox balance and is therefore important for the cell. The enzyme PDH is bound to the inner membrane of mitochondria. An alternative source of P5C, the substrate of P5CDH, is the conversion of arginine to ornithine and the subsequent conversion of ornithine to P5C by ornithine aminotransferase [11]. P5CDH is a single-copy gene in Arabidopsis, and the encoded protein is localized to mitochondria [18].
Whereas proline biosynthesis is upregulated by light and osmotic stresses, proline catabolism is activated in the dark and during stress relief [19,20]. P5CS1 gene activation and proline accumulation are promoted by light and repressed by brassinosteroids [21,19]. Under non-stressed conditions, phospholipase D (PLD) functions as a negative regulator of proline accumulation [22]; in contrast, calcium signaling and phospholipase C (PLC) trigger P5CS transcription and proline accumulation during salt stress. In some halophytes, however, PLD functions as a positive regulator, whereas PLC exerts negative control on proline accumulation [23,24]. Calcium signals can be transmitted by a specific calmodulin, CaM4, which interacts with the MYB2 transcription factor and upregulates P5CS1 transcription [25]. Conversion of P5C to proline is not a rate-limiting step in proline biosynthesis, yet control of P5CR activity involves complex transcriptional regulation, which has been shown to respond to developmental and osmotic signals [12]. Promoter analysis of Arabidopsis P5CR identified a 69-bp promoter region that is responsible for tissue-specific expression [26]. However, trans-acting factors that can bind to this promoter region have not yet been identified. Downregulation of PDH expression during stress is widely accepted as one control point that can promote proline accumulation under stress [17,27,28]. PDH transcription is activated by rehydration and proline but repressed by dehydration, thus preventing proline degradation during abiotic stress [29]. Promoter analysis of PDH1 identified the proline- and hypo-osmolarity-responsive element (PRE) motif ACTCAT, which is necessary for the activation of the PDH gene [30]. Basic leucine zipper (bZIP) transcription factors (AtbZIP-2, -11, -44, -53) have been identified as candidates for binding to this motif [31].
The P5CDH gene is expressed at a low basal level in all Arabidopsis tissues and can be upregulated by proline [32]. A short sequence similar to the PRE motif has been identified in the promoters of P5CDH genes in Arabidopsis and cereals [33].
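As an illustration of how a short cis-element such as the PRE motif can be located in a promoter sequence, the following minimal Python sketch scans both strands for ACTCAT. The function names and the example sequence are hypothetical and are not taken from the cited studies; real analyses would typically also allow for degenerate positions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str = "ACTCAT"):
    """List (position, strand) for each exact motif match on either strand.

    Positions are 0-based and refer to the forward strand. A match on the
    reverse strand is reported where the reverse complement of the motif
    occurs in the forward sequence.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits


# Hypothetical toy sequence, for illustration only (not a real promoter).
toy_promoter = "GGACTCATTTATGAGTCC"
hits = find_motif(toy_promoter)
```

Exact string matching is sufficient here only because the PRE motif is given as a fixed hexamer; matrix-based searches score every window instead of requiring identity.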
Although the importance of proline accumulation in conferring hyperosmotic stress tolerance has been well demonstrated, the regulatory molecules and the molecular signals involved in the expression of proline biosynthetic genes are not understood. Comprehensive studies are required at the physiological, molecular and genetic levels to explore the signal transduction events of proline synthesis and degradation. Insight into proline metabolism would be of interest both to those seeking to better understand plant stress physiology and to those seeking to understand metabolic regulation. In the present study, we analyze the promoters of genes involved in proline metabolism in order to improve the understanding of the underlying physiological, biochemical and molecular events in plant stress tolerance.

Promoter sequence analysis of differentially expressed genes of proline biosynthesis
Promoter sequence analysis of differentially expressed genes of proline metabolism was done using the BIOBASE Knowledge Library Plant Edition (BKL-Plant) and the ExPlain Plant Analysis System. A promoter window from -1,000 to +100 bp was uploaded into the ExPlain Plant Analysis System, and the F-Match module was used to identify transcription factor binding sites overrepresented in the differentially expressed proline biosynthesis gene sets against a background set of 200 ubiquitously present genes. Composite Module Analysis was then used to determine which combination of binding sites, or composite module, was most commonly found within the sets of genes. Matrices with a Yes/No score >1.3, a p-value <0.05, and a matched-promoters p-value <0.1 in the F-Match analyses were selected for the composite module analysis (CMA).
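The selection criteria above amount to a simple three-way filter. A minimal Python sketch is given below; the record layout, field names and matrix identifiers are hypothetical placeholders, not the ExPlain export format, and the numbers are toy values rather than results of this study.

```python
from dataclasses import dataclass


@dataclass
class MatrixResult:
    """One F-Match result row (hypothetical layout, not the ExPlain format)."""
    name: str
    yes_no_score: float
    p_value: float
    matched_promoters_p_value: float


def select_for_cma(results, min_score=1.3, max_p=0.05, max_promoter_p=0.1):
    """Keep matrices that pass all three thresholds stated in the text."""
    return [
        r for r in results
        if r.yes_no_score > min_score
        and r.p_value < max_p
        and r.matched_promoters_p_value < max_promoter_p
    ]


# Toy values for illustration only.
toy_results = [
    MatrixResult("matrix_1", 2.10, 0.010, 0.05),  # passes all thresholds
    MatrixResult("matrix_2", 1.20, 0.010, 0.05),  # Yes/No score too low
    MatrixResult("matrix_3", 1.80, 0.200, 0.05),  # p-value too high
    MatrixResult("matrix_4", 1.80, 0.010, 0.30),  # matched-promoters p too high
]
selected = select_for_cma(toy_results)
```

All comparisons are strict, matching the ">" and "<" signs used in the stated criteria.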

Results and Discussion:
Regulation of gene expression plays an important role in a variety of biological processes such as development and responses to environmental stimuli, including biotic and abiotic stresses. These responses are modulated by transcriptional regulation of various genes. One such plant response to environmental stress is the accumulation of proline. Although proline metabolism has been studied for a very long time in plants, little is known about the signaling pathways involved in its regulation. Proline biosynthesis is activated and its catabolism repressed during dehydration, whereas rehydration triggers the opposite regulation [13,34]. Based on the literature and genome information, the Arabidopsis genome was parsed for the proline biosynthetic genes using Map Viewer. In general, the cellular concentration of compatible solutes can be regulated by increasing biosynthesis, decreasing degradation, and/or modifying rates of uptake or release of these compounds. Therefore, the genes were divided into three sub-groups according to their metabolic functions. Group 1 comprises pyrroline-5-carboxylate synthetase (P5CS) and pyrroline-5-carboxylate reductase (P5CR). There are two P5CS gene loci in the nuclear genome of Arabidopsis thaliana; of these, AtP5CS1 is reported to be involved in stress tolerance. Group 2 comprises osmotic stress-responsive proline dehydrogenase (PO/PDH) and pyrroline-5-carboxylate dehydrogenase (P5CDH). Group 3 comprises the proline transporter with affinity for glycine betaine, proline and GABA (AtProT1), proline transporter 2 (AtProT2) and proline transporter 3 (AtProT3) (Table 1).

Composite Module Analysis of proline biosynthetic Genes
A large number (>1,500) of transcription factors (TFs) in plants control the expression of tens or hundreds of target genes in various, sometimes intertwined, signal transduction cascades. Transcription factor binding sites (TFBSs) are the functional elements that determine the timing and location of transcriptional activity. In plants and other higher eukaryotes, these elements are primarily located in the long non-coding sequences upstream of a gene, although functional elements in introns and untranslated regions have been described as well. The discovery of regulatory motifs and their organization in promoter sequences is an important first step towards understanding gene expression and regulation. Since coexpressed genes are likely to be regulated by the same TF, the identification of shared, and thus overrepresented, motifs in sets of potentially co-regulated genes may provide insight into the regulation of expression of a whole metabolic pathway.
F-Match analysis compares the number of sites found in a query sequence set against a background set and reports the Position Weight Matrices (PWMs) whose site frequencies are higher in the query set than in the background set. The F-Match results showed overrepresented TFBSs in the proline biosynthetic gene set (Figure 1). Composite module analysis was then used to determine the combination of binding sites most commonly found within the sets of genes. In this analysis, binding sites matching the matrices for Opaque 2 (O2) and Ocs element binding factor 1 (OCSBF-1) appeared most commonly, in differing combinations, within the promoters of proline biosynthesis genes (Figure 2 and Table 2). The information available on O2 and closely related monocot genes indicates that they regulate seed storage protein production by interacting with the PBF protein. Data derived from monocot and dicot species suggest that homologues of group S bZIPs are transcriptionally activated after stress treatments [38]. The presence of O2 and OCSBF-1 consensus binding sites in the members of the proline metabolism gene set suggests a common sensor for the concerted regulatory control of proline metabolism. Similar regulatory control of proline metabolism is seen upon illumination, where proline biosynthesis is upregulated by light and osmotic stresses while proline catabolism is downregulated [19,20,21]. BLAST analysis showed that O2 and OCSBF-1 are homologous to members of the C and S groups of the Arabidopsis AtbZIP transcription factor family, respectively. The transcription factors closest to O2 are AtbZIP10 and AtbZIP25, and those closest to OCSBF-1 are AtbZIP53 and AtbZIP2. Weltmeier et al. [39] used a chromatin immunoprecipitation (ChIP) assay to show that ProDH is a direct target of the group S bZIP transcription factor AtbZIP53 and the group C factor AtbZIP10.
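The idea of comparing site frequencies between a query set and a background set can be sketched in a few lines of Python. This is not the F-Match implementation: the ratio below is a plain sites-per-sequence ratio, and a one-sided hypergeometric test on promoters containing at least one site is substituted as an assumed significance measure. All counts in the example are hypothetical.

```python
from math import comb


def frequency_ratio(sites_query, n_query, sites_background, n_background):
    """Ratio of mean sites per sequence in the query set to the background set."""
    query_density = sites_query / n_query
    background_density = sites_background / n_background
    return query_density / background_density


def hypergeom_upper_tail(k, n_query, n_with_site, n_total):
    """P(X >= k) for a hypergeometric draw.

    n_total:      all promoters (query plus background)
    n_with_site:  promoters in n_total that contain at least one site
    n_query:      size of the query set
    k:            query promoters that contain at least one site
    """
    total = comb(n_total, n_query)
    upper = min(n_with_site, n_query)
    favourable = sum(
        comb(n_with_site, i) * comb(n_total - n_with_site, n_query - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 4 query promoters with 12 sites in total,
# 200 background promoters with 200 sites in total.
ratio = frequency_ratio(12, 4, 200, 200)
# Hypothetical counts: all 4 query promoters carry a site; 60 of the
# 204 promoters overall carry one.
p_value = hypergeom_upper_tail(4, 4, 60, 204)
```

A ratio above 1 indicates that sites for the matrix are denser in the query promoters than in the background, and the tail probability indicates how surprising the observed number of site-containing query promoters would be under random sampling.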
Promoter analysis of Arabidopsis P5CR identified a 69-bp promoter region that is responsible for tissue-specific expression [38]. The trans-acting factors that can bind to this promoter region, however, have not yet been identified. The regulation of other genes in proline metabolism by specific transcription factors has yet to be studied.
Composite module analysis also showed consensus binding sites of MYBAS1 in the P5CS1 and P5CR promoters, SBF-1 in the PDH promoter and GT-1 in the P5CDH promoter (Supplementary data), suggesting that the proline metabolism gene cluster is likely to be under the combinatorial control of more than one class of transcription factors. Plant MYB proteins have been shown to regulate diverse developmental processes and to be involved in environmental signaling and secondary metabolism [40]. SBF-1 (silencer binding factor 1) binds to the silencer region of a chalcone synthase promoter [41]. Its consensus binding sequence resembles the binding site for the GT-1 factor in light-responsive elements of the pea rbcS-3A gene, suggesting that the two factors may have similar functions [41]. The plant transcription factor GT-1 was identified by its specific binding activity to Box II, a promoter cis-element with the core DNA sequence 5′-GGTTAA-3′ found initially in light-regulated genes, and may activate transcription through direct interaction with the transcriptional pre-initiation complex [42].
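Combinatorial control of this kind is usually examined by asking which binding sites occur together in the same promoter. The Python sketch below counts pairwise co-occurrence across a set of promoters. It is a simplified stand-in for composite module analysis, not the CMA algorithm itself, and the promoter and factor labels are hypothetical placeholders rather than results from this study.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(sites_by_promoter):
    """Count promoters in which each unordered pair of factors both have sites.

    sites_by_promoter maps a promoter label to the factors with at least one
    predicted site in that promoter; duplicate entries are ignored.
    """
    counts = Counter()
    for factors in sites_by_promoter.values():
        for pair in combinations(sorted(set(factors)), 2):
            counts[pair] += 1
    return counts


# Hypothetical placeholder data, for illustration only.
toy_sites = {
    "promoter_1": ["TF_A", "TF_B", "TF_C"],
    "promoter_2": ["TF_A", "TF_B"],
    "promoter_3": ["TF_A", "TF_D", "TF_A"],
}
pair_counts = pair_cooccurrence(toy_sites)
most_common_pair, n_promoters = pair_counts.most_common(1)[0]
```

This sketch ignores the distance and orientation between sites and does not score modules against a background set, both of which a full composite module search would normally take into account.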
The F-Match and CM analyses, together with the published literature, suggest the involvement of both trans- and cis-acting regulatory elements that sense proline levels either directly or indirectly. Therefore, to determine the precise regulatory mechanism of proline metabolism, reverse genetic approaches (random or specific gene disruption, overexpression of strong activators or repressors or their inducible counterparts) could be combined with microarray technology. Such studies, together with metabolomic analyses, should pave the way towards a better understanding of the functional diversification of plant TFs and regulatory molecules under stress conditions.

Conclusion:
Composite module analysis was used to determine the combination of binding sites most commonly found within the sets of proline metabolism genes. Binding sites matching the matrices for O2 and OCSBF-1 appeared most commonly, in differing combinations, within the promoters of proline biosynthesis genes. O2 and OCSBF-1 showed homology with members of the C and S groups of the AtbZIP transcription factor family, respectively. The presence of MYBAS1 consensus binding sites in combination with O2 and OCSBF-1 sites in the promoters of proline biosynthesis genes, and of GT-1 and SBF-1 consensus binding sites in combination with O2 and OCSBF-1 sites in the promoters of proline catabolic genes, suggests the involvement of these transcription factors in the regulation of cellular proline levels in plants exposed to abiotic and biotic stresses.