Genome wide identification and functional assignments of C2H2 Zinc-finger family transcription factors in Dichanthelium oligosanthes

Transcription factors (TFs) are biological regulators of gene function in response to various internal and external stimuli. C2H2 zinc finger proteins (C2H2-ZFPs) are a large family of TFs that play crucial roles in plant growth and development, hormone signalling and responses to biotic and abiotic stresses. While C2H2-ZFPs have been well characterized in many model and crop plants, they have yet to be characterized in the evolutionarily important C3 plant Dichanthelium oligosanthes (Heller's rosette grass). In the present study, we report 32 C2H2-ZF genes (DoZFs) belonging to three different classes (Q-type, C-type and Z-type) based on structural elucidation and phylogenetic analysis. Sequence comparisons revealed paralogs within the DoZFs and orthologs among rice ZF genes. Motif assignment showed the presence of the distinctive C2H2-ZF conserved domain "QALGGH" in these proteins. Cis-element analysis indicated that the majority of the predicted C2H2-ZFPs are associated with hormone signalling and abiotic stress responses. Further, predicted functional assignments indicated roles in nucleic acid binding and transcriptional regulation. Thus, we report an overview of the C2H2-ZF gene family in D. oligosanthes that could serve as the basis for future experimental studies on the isolation and functional implications of these genes in different biological mechanisms of C3 plants.

Dichanthelium oligosanthes, also known as Heller's rosette grass, is a frost-tolerant perennial wild panicoid grass species that utilizes the C3 pathway for carbon fixation and lacks Kranz anatomy [18]. Therefore, it can be used as a model species to understand the evolutionary developmental pattern of C4 photosynthesis when compared with important grass relatives, including rice, wheat, and maize. The draft genome of D. oligosanthes has recently been sequenced, and a small suite of transcription factors associated with C4 photosynthesis has been identified [19]. While extensive studies of C2H2-ZFPs and their association with biological and physiological mechanisms have been conducted in many plant species, no report is available from D. oligosanthes so far. Therefore, it is important to perform a genome-wide identification and characterization of the C2H2-ZF family of transcription factors to illuminate their molecular role in D. oligosanthes. In the present study, we identified 32 C2H2-ZF genes from D. oligosanthes utilizing varied bioinformatics tools. The structural organization of the identified genes, including exon-intron arrangements, 5'/3' untranslated regions (UTRs), conserved protein motifs and promoter cis-elements, was determined. Further, the identified proteins were analyzed for their phylogenetic relationships and orthology/paralogy within D. oligosanthes as well as with other model plant species. Additionally, the functional characteristics of the identified C2H2-ZFPs were predicted using gene ontology (GO) analyses. These results will form the basis for future gene functional studies of C2H2-ZFPs towards understanding physiological responses in D. oligosanthes.

Structural organization and identification of conserved motifs
The individual cDNA sequences of the C2H2-ZF genes and their corresponding genomic sequences were compared using the Gene Structure Display Server (GSDS 2.0; http://gsds.cbi.pku.edu.cn/index.php) to generate the intron/exon organization. Motif structures of the predicted proteins were analyzed using the Multiple Expectation Maximization for Motif Elicitation (MEME) tool [22] with the following parameters: occurrence of motif repeats, any number; maximum number of motifs to be predicted, 20; and minimum/maximum motif width, 10/100.
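The MEME settings listed above can also be expressed for the command-line version of MEME. The following is only a minimal sketch: the text does not state whether the web server or the command-line tool was used at this step, and the file name `dozf_proteins.fasta`, the output directory `meme_out` and the helper `build_meme_command` are hypothetical names introduced for illustration. The flags `-mod anr`, `-nmotifs`, `-minw` and `-maxw` are the MEME options that correspond to "any number of repetitions", the motif count and the width limits.

```python
from typing import List


def build_meme_command(fasta_path: str, out_dir: str,
                       n_motifs: int = 20,
                       min_width: int = 10,
                       max_width: int = 100) -> List[str]:
    """Assemble a MEME command line mirroring the parameters in the text.

    The list can be passed to subprocess.run() on a machine where the
    MEME Suite is installed; nothing is executed here.
    """
    return [
        "meme", fasta_path,
        "-protein",                 # input sequences are proteins
        "-oc", out_dir,             # output directory (overwritten if present)
        "-mod", "anr",              # any number of motif repetitions per sequence
        "-nmotifs", str(n_motifs),  # maximum number of motifs to report
        "-minw", str(min_width),    # minimum motif width
        "-maxw", str(max_width),    # maximum motif width
    ]


# Hypothetical input and output names, for illustration only.
command = build_meme_command("dozf_proteins.fasta", "meme_out")
print(" ".join(command))
```

Building the argument list separately from running it keeps the parameter choices easy to inspect and record alongside the results.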

Promoter cis-element analysis and identification of paralogs and orthologs
Promoter sequences of about 2 kb upstream of the translation start site for all the C2H2-ZF genes were obtained from the NCBI database. The cis-acting regulatory elements in the putative C2H2-ZF promoter regions were predicted using PlantCARE [23]. All the cDNA sequences of the C2H2-ZF genes were compared amongst themselves (all-against-all) using BLASTn to identify the paralogous ZFs in D. oligosanthes. After each round of BLASTn, sequences showing ≥ 40% sequence similarity over an alignment of at least 300 bp were considered to be paralogous [24]. To predict the orthologs in rice, each of the rice C2H2-ZF sequences was used as a query to search against all DoZF sequences using BLASTn. The best BLASTn hit with an alignment of at least 300 bp to a DoZF was considered to be an ortholog [24].
©Biomedical Informatics (2019)
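The paralog criterion described above (≥ 40% similarity over an alignment of at least 300 bp) can be illustrated with a short filter over BLASTn tabular output. This is a minimal sketch under stated assumptions rather than the pipeline used in the study: it assumes the standard 12-column tabular format (`-outfmt 6`), uses the percent-identity column as a stand-in for "sequence similarity", and the function name `find_paralog_pairs` and the example hits are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_paralog_pairs(blast_lines: Iterable[str],
                       min_identity: float = 40.0,
                       min_length: int = 300) -> List[Tuple[str, str]]:
    """Return unique gene pairs that pass the identity and length cut-offs.

    Expects tab-separated BLAST lines whose first four columns are
    query id, subject id, percent identity and alignment length.
    """
    pairs = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if query == subject:
            continue  # ignore self-hits from the all-against-all search
        if identity >= min_identity and length >= min_length:
            # Sort the ids so that A-vs-B and B-vs-A count as one pair.
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Made-up hits for illustration; only the first four columns are needed.
example_hits = [
    "DoZF1\tDoZF1\t100.0\t900",  # self-hit, ignored
    "DoZF1\tDoZF2\t85.2\t450",   # passes both thresholds
    "DoZF2\tDoZF1\t85.2\t450",   # reciprocal hit, same pair
    "DoZF1\tDoZF3\t92.0\t120",   # alignment too short
    "DoZF4\tDoZF5\t35.0\t600",   # identity too low
]
print(find_paralog_pairs(example_hits))
```

The same function could be reused for the ortholog step by keeping, for each rice query, only its best-scoring hit before applying the 300 bp length cut-off.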

Sub-cellular localization and gene ontology (GO) analysis
The subcellular localization of C2H2-ZF proteins was predicted using the mGOASVM (Plant V2) server [25]. The functional grouping of C2H2-ZF sequences from D. oligosanthes and the annotation data were computed using Blast2GO v3.0 [26] and cross-verified using the DeepGO protein function prediction tool with the protein GO classes [27]. Blast2GO annotation associates genes or transcripts with GO terms classified into three categories: biological processes, molecular functions and cellular components. Table 1.

Results & Discussion
The HMM profile of the C2H2-ZF domain (PF00096) was used as a query to search for C2H2-ZF genes of D. oligosanthes within the protein databases using HMMER software. A total of 57 candidate C2H2-ZF genes were initially obtained. A recent study using a similar approach identified 14 Squamosa promoter-binding protein-like (SPL) TFs in D. oligosanthes [28]. To further reveal the diversification of C2H2-ZFPs in D. oligosanthes, conserved protein motif sequences were predicted using the MEME web server [22]. A total of 15 distinct structural motifs were predicted (Figure 2; Table 2). Motifs 1, 2, 7 and 11 represented distinctive conserved regions of the C2H2-ZFPs. Motifs 7 and 11 constituted the plant-specific conserved domain "QALGGH" and were found in 11 DoZFPs, which were identified as Q-type. Among the Q-types, DoZF29 has a modified conserved sequence, "ALGGH", and was classified as an M-type C2H2-ZFP. Likewise, 15 DoZFPs contained motif 1 with the conserved sequence "CGKGFQRDQNLQLHRRGH" and motif 2 with the conserved sequence "CGKGFKRDANLRMHMRGH", the characteristic features of the Z-type C2H2-ZFPs. The remaining 6 DoZFPs (DoZFP4, DoZFP9, DoZFP13, DoZFP15, DoZFP25 and DoZFP32) did not contain any known conserved motif in the ZF region and were categorized as C-type C2H2-ZFPs. Additionally, 11 uncharacterized conserved motifs were detected, distributed across the DoZFPs without an apparent pattern. Taken together, our results suggest that a functionally divergent group of C2H2-ZFPs is associated with numerous plant developmental and physiological processes of D. oligosanthes.
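The classification logic described above can be summarized in a few lines of code. The sketch below is a deliberate simplification and not the procedure used in the study: MEME reports motifs as position-specific profiles, whereas this example uses exact substring matches against the consensus strings quoted in the text. The order of the checks, the function name `classify_c2h2` and the toy sequences are assumptions made for illustration.

```python
# Consensus strings quoted in the text for motifs 1 and 2 (Z-type).
Z_TYPE_CONSENSUS = ("CGKGFQRDQNLQLHRRGH", "CGKGFKRDANLRMHMRGH")


def classify_c2h2(protein_sequence: str) -> str:
    """Assign a C2H2-ZFP class using exact consensus matches.

    Q-type: contains the plant-specific "QALGGH" sequence.
    M-type: contains the modified "ALGGH" sequence without the leading Q.
    Z-type: contains the motif 1 or motif 2 consensus.
    C-type: none of the above.
    """
    seq = protein_sequence.upper()
    if "QALGGH" in seq:
        return "Q-type"
    if "ALGGH" in seq:
        return "M-type"
    if any(consensus in seq for consensus in Z_TYPE_CONSENSUS):
        return "Z-type"
    return "C-type"


# Toy sequences invented for this example; they are not DoZFP sequences.
toy_sequences = {
    "toy_Q": "MKRFCSQALGGHQNAH",
    "toy_M": "MKRFCSKALGGHQNAH",
    "toy_Z": "AACGKGFQRDQNLQLHRRGHAA",
    "toy_C": "MCPECGKSFHLKRH",
}
for name, sequence in toy_sequences.items():
    print(name, classify_c2h2(sequence))
```

In practice, real sequences diverge from the consensus, so a profile-based scan (for example with the motif matrices produced by MEME) would be needed to reproduce the reported class counts.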

To explore the evolutionary association of the identified DoZFPs, full-length protein sequences of 32 DoZFPs, 15 AtZFPs and 29 OsZFPs were used to construct a neighbor-joining tree (Figure 3).

Conclusion:
A comprehensive genome-wide analysis of the C2H2-ZF gene family in D. oligosanthes, including phylogenetic relationships, structural prediction, conserved motif analysis and gene function prediction, was performed. Our analysis identified 32 C2H2-ZF genes in D. oligosanthes. Phylogenetic analysis grouped the DoZFPs into three clusters similar to their orthologs in Arabidopsis and rice. Structural and motif elucidation demonstrated the presence of conserved domains, including "QALGGH", suggesting their involvement in DNA binding and transcription factor activity. Further, the cis-element analysis of the DoZFs indicated their involvement in hormone signalling and stress responses. These data form the basis for functional characterization of suitable candidate genes to untangle their different roles in biological regulation.