Mining of miRNAs from EST data in Dendrobium nobile

Dendrobium nobile is an orchid species highly popular for its therapeutic properties and is often used as a medicinal herb. Documenting miRNA-target associations in D. nobile is an important step to facilitate functional genomics studies in this species. Therefore, it is of interest to identify miRNA sequences from EST data available in public databases using known techniques and tools. We report 14 potential miRNAs from three ESTs of D. nobile. They belong to 3 miRNA families (miR390, miR528 and miR414) linking to transcription factor regulation, signal transduction, DNA and protein binding, and various cellular processes covering 34 different metabolic networks in KEGG. These results help in the understanding of miRNA-mRNAs functional networks in Dendrobium nobile.

MicroRNAs are a class of endogenous small, non-coding, single stranded RNAs that act as post-transcriptional regulators in eukaryotic organisms [9]. Each miRNA is capable of regulating the expression of many genes -either by translational repression or mRNA cleavage-allowing them to simultaneously regulate multiple cellular signalling and biosynthetic pathways [10]. Plant microRNAs play important roles in plant growth and development including leaf morphology and polarity, organ development, cell differentiation and proliferation, programmed cell death, signal transduction, stress responses, hormone signalling, floral organ identity and maturity, phase transition and reproduction [11][12][13]. For miRNAs to be reliably distinguishable from other RNAs, Ambros et al. (2003) developed a set of criteria for miRNA identification and annotation and their guidelines for experimental verification [14]. However, those criteria for miRNA annotation have been revised by Axtell and Meyers (2018), which has been followed in this study [15]. The first miRNA to be discovered was lin-4, predicted to be of 22 nucleotides in length and found in the larval form of Caenorhabditis elegans [16]. It is responsible for regulation of the pathway that triggers the transitions of first larval stage cell division to the second [17]. In plants, RNA polymerase II is responsible to transcribe majority of primary miRNA transcripts (pri-miRNAs) from miRNA genes. Processing of pri-miRNAs to precursor miRNAs and then further to mature miRNA-miRNA* duplex is brought about by the DCL1 (Dicer-like 1) enzyme [18]. The duplex is methylated by HUA ENHANCER 1 (HEN1) and transported to the cytoplasm by HASTY, after which the guide miRNA strand is then incorporated into ARGONAUTE (AGO) protein [19]. Once a suitable pairing event between a miRNA and target mRNA occurs, the RISC (RNA-induced silencing complex) then triggers almost complete inhibition of protein expression by either cleavage of mRNA targets or by inhibiting protein translation [20]. The repressional activity of miRNA is mainly based on the property of regulation of gene expression at the post-transcriptional level either by cleavage mediated mRNA degradation or inhibition of translation [21]. Discovery of genetic modulators in various plants has helped to comprehend their specific regulatory modules involved in complex biological processes. Understanding the biological functions of miRNAs, identification of miRNAs and their target genes is an important step in interpreting the roles of miRNAs in regulation of specific characters. Documentation of miRNAs and their targets have been very effective in a number of plants such as Arabidopsis, rice, maize, wheat, soybean, cotton and tea [22,23].

Identification of putative miRNAs:
Sequence and structural homologies are used for computer based predictions of miRNAs. Computational strategies provide less time consuming, valuable and efficient means for prediction and identification of miRNA genes and their targets ( Figure 1). NCBI-BLAST+ program was used to screen the ESTs against the reference miRNAs obtained from miRBase by searching for homologous hits [24]. A maximum of two mismatches, threshold e-value of <0.001 and word-size value of 7 was set for the blast+ analysis. After removing redundancy the ESTs with matched hits were subjected to Blastx analysis with NR protein database, and the non-protein coding sequences were retained for further analysis of RNA

Prediction of putative target genes:
A plant small RNA Target Analysis Server viz. psRNATarget was used for predicting the targets of the newly identified miRNA by using Schema V2 (2017 Release) with the maximum expectation value threshold as 3 and rest of the values set as default [28]. A maximum of two mismatches were allowed in the complementary region of target genes with the miRNAs, whereas mismatch inhibition was maintained at 10 th and 11 th nucleotide position along the aligned region. Target genes were identified against Arabidopsis thaliana transcript, TAIR V10 as genome or transcriptome sequences of D. nobile are not available in public domain.

Gene Ontology, KEGG pathway and Phylogenetic analysis:
Annotations of the target genes were carried out using a Blastx analysis with an e-value of 10 -3 against the NCBI non-redundant protein database. Blast2go version 5.2 (https://www.blast2go.com/blast2go-pro/) was used for the gene ontology and KEGG (Kyoto Encyclopaedia of Genes and Genomes) pathway analysis of the annotated target genes in order to assess the phenotypic traits which may get affected by expression of the identified miRNAs of D. nobile [29]. The phylogenetic trees were constructed using MEGA7 -a Windows OS based software. The precursor sequences of family members of the identified miRNAs, belonging to other plant species were downloaded from miRBase and collated with the D.nobile miRNA precursors. Multiple sequence alignment was carried out using MUSCLE algorithm and phylogenetic trees developed using the Maximum likelihood approach.   (2018) for plant miRNA annotation, these were further filtered to retain only the miRNAs ≥ 19 nucleotides in length [15]. As a result only 249 miRNAs were taken from which further removal of redundancies in miRNAs and ESTs yielded 247 potential miRNA sequences. Blastx analysis of these ESTs against the NCBI non-redundant database resulted in identification of 89 sequences as non-coding sequences.

miRNA secondary structure:
The potential miRNAs were subjected to structural validation analysis in Mfold v3.5 for prediction of miRNA secondary structure. The miRNAs which showed valid stem-loop hairpin precursor, presence of complementary miRNA* sequence in the precursor with less than 6 mismatches, and an MFEI value greater than 0.5 were considered for further analysis of their target genes. Fourteen such conserved miRNAs were identified belonging to three miRNA families (Figure 2). miR528 is represented by two members, miR390 represented by 11 members and miR414 by one member. The ΔG values ranged from -47.8 to -35.5 kcal/mol. It is often considered that, lower the value of ΔG, higher is the thermodynamic stability of the miRNA precursor [30]. A lower value of ΔG corresponds to a higher MFEI value as MFE is equivalent to (-ΔG). miRNA characterization indicates that the precursor length of miRNAs varied between 79-169 bases and the mature miRNA length ranged from 19 to 21 nucleotides ( Table 1).

Target gene prediction and annotation:
It has been demonstrated in several studies that most plant miRNAs bind to their target mRNA sequences with perfect or nearperfect sequence complementarity [31,32]. This provides an effective approach for discovering probable miRNA targets by comparing and aligning miRNAs with mRNA sequences. In order to identify genes plausibly recognised by the potential miRNAs, psRNA Target -a web-based server was used for searching target genes against A. thaliana transcriptome acquired from TAIR10. A total of 138 genes were identified as target genes of 14 identified miRNAs, where 4 genes having unknown functions were discarded. Out of the 134 retained targets, only 3 genes exhibit translational repression by corresponding miRNAs whereas all the rest of the genes show cleavage mode of regulation ( Table 2).

GO and KEGG pathway analysis:
To further understand the regulatory functions of miRNAs, the target genes were subjected to Gene Ontology (level 2) and KEGG pathway enrichment analysis, using Blast2Go v5.2. The results suggested that D.nobile miRNAs were involved in regulation of 14 broadly defined biological processes and 3 basic molecular functions. The target genes were also found to be part of 9 different types of cellular components (Figure 3). Pathway enrichment analysis of target genes based on KEGG database demonstrated the participation of identified miRNAs in 34 different metabolism networks (Figure 4). These networks are involved in various important pathways such as purine metabolism, antibiotic synthesis, caffeine metabolism, pentose phosphate pathway and TCA cycle.

Phylogenetic Analysis:
Phylogenetic analysis was carried out to understand the relationship between the identified miRNAs in D. nobile with the other plant species available in miRNA database for same family identification ( Figure 5).  The G+C% of most of the miRNAs was found to be in the range of 41-47%, however only the members of miR528 family presented a G+C% value greater than 50. Among the predicted targets 13% genes are sequence specific transcription factors, 33% genes with various catalytic functions and 54% genes act as sequence specific DNA-binding, metal ion binding or protein binding factors. In the gene ontology analysis, the two main categories represented among the biological processes are cellular processes and metabolic processes (18% and 16% genes respectively). 22% of the target proteins have been found to be part of the nucleus, 18% proteins are present in various cell organelles and 13% proteins act as integral part of the cell membrane.
Transcription factors (TFs) are the master regulators of gene expression patterns in eukaryotes, and are responsible for facilitation of growth and development in plants [37]. dno-miR414 identified in this study has been shown to target several transcription factors including those from MYB as well as TCP family of TFs. Members of MYB DNA-binding domain superfamily protein are involved in many important biochemical and physiological processes in plants [38]. Furthermore, previous studies have also reported that miR414 can target the MYB family transcription factors in Allium cepa, Solanum tuberosum and Brachypodium distachyon [39][40][41]. The plant-specific TCP (TEOSINTE BRANCHED 1, CYCLOIDEA, PCF 1 and 2) transcription factor family is involved in plant development throughout its vegetative phase, i.e. from seed germination until the formation of flowers and fruits [42]. Members of a few other families of transcription factors have also been found to be probable targets of dno-miR414, such as ERF, GATA and WRKY family of transcription factors. The ERF (Ethylene responsive) transcription factors are responsible for establishment of floral meristem and tissue repair processes [43]. GATA transcription factors (binding to GATA rich sequences) are the DNA motifs that have been mostly implicated in lightdependent gene regulation in plants [44], and the WRKY family of transcription factors has a significant role in regulation of abiotic stress responses in plants [45]. Our results also show that dno-miR414 and dno-miR528a may also target several genes which encode various F-box proteins. These proteins are characterized as components of the SCF ubiquitin-ligase complexes (Skp I, Cullin, and an F-box protein), in which they bind substrates for ubiquitinmediated proteolysis [46]. Protein ubiquitination is considered as a critical post-translational modification process that is employed by eukaryotes in order to regulate various types of cellular processes [47]. Another important gene found to be targeted by dno-miR528 family is the co-chaperone that assists in protein folding mediated by HSP70 or HSP90 [48]. The KEGG pathway analysis also reveals involvement of miRNAs in regulation of genes associated with various significant metabolic pathways. Our findings have shown that ESTs can be a major source of functional information similar to previous reports of SSRs identified from ESTs [49].

Conclusion:
We report the mining of miRNAs from EST data in Dendrobium nobile. We describe 14 potential miRNAs from 3 ESTs of D. nobile. They belong to 3 miRNA families (miR390, miR528 and miR414) linking to transcription factor regulation, signal transduction, DNA and protein binding, and various cellular processes covering 34 different metabolic networks in KEGG. These results help in the understanding of miRNA-mRNAs functional networks in D. nobile.