In silico identification of miRNAs and their targets from the expressed sequence tags of Raphanus sativus

MicroRNAs (miRNAs) are a novel growing family of endogenous, small, non- coding, single-stranded RNA molecules directly involved in regulating gene expression at the posttranscriptional level. High conservation of miRNAs in plant provides the foundation for identification of new miRNAs in other plant species through homology alignment. Here, previous known plant miRNAs were BLASTed against the Expressed Sequence Tag (EST) database of Raphanus sativus, and according to a series of filtering criteria, a total of 48 miRNAs belonging to 9 miRNA families were identified, and 16 potential target genes of them were subsequently predicted, most of which seemed to encode transcription factors or enzymes participating in regulation of development, growth and other physiological processes. Overall, our findings lay the foundation for further researches of miRNAs function in R.sativus.


Background:
MicroRNAs (miRNAs) are a novel growing family of endogenous, small, non-coding, single-stranded RNA molecules encoded in the genomes of plants and animals that repress mRNA translation or mediate mRNA degradation in a sequence-specific manner [1]. The discovery of the first microRNA lin-4 in Caenorhabditis elegans by Ambros laboratory emerged as biology's unusual or unique findings [2]. These tiny bits of RNA play a major role in gene regulation, which involves in negative regulation of gene targets. In recent years, identification and functional studies of miRNA has made great progress in research. The exposition of miRNA in plants is still a continuing process hence, till date a number of plant miRNAs have been discovered and functionally identified. Key roles of miRNA in biological processes are revealed by plant studies, which include regulation of leaf development [3], stem development [4], root development [5], signal transduction, developmental timing, floral differentiation and development, and defense response against every sort of stress. Plant miRNAs are usually evolutionary conserved and are observed in regions of the genome distinct from previously annotated genes. Different approaches used for miRNA identification includes, gene cloning technology and Bioinformatics strategies. Gene cloning is a conventional method to identify the new miRNA accurately, even though it has disadvantages, such as difficulty in finding miRNAs which express at low levels, difficulty in cloning, degradation of RNA during sample separation etc [6]. Rapid development in the field of bioinformatics has brought a number of computational programs and other tools to successfully predict the miRNA [7,8]. This process is purely based on the genomic databases like expressed sequence tags EST and other. Since the miRNAs are more conserved in plant species, it is possible to identify novel miRNAs using computational techniques. Now a days miRNAs are identified using the computational or bioinformatics based approach, since it is very useful in predicting the novel miRNA, which cannot be done by cloning. Radish (R.sativus) is an edible root vegetable that has been found to possess a number of pharmacological and antioxidant properties. Radish is rich in folic acid, Vitamin C and anthocyanins. These nutrients make it a very effective cancer-fighting food. It is said that radish is effective in fighting oral cancer, colon cancer and intestinal cancer as well as kidney and stomach cancers.
In this study, all previously known plant miRNAs from A. thaliana, rice, and other plant species were used to search the R.sativus homologs of miRNAs in the publicly available expressed sequence tag (EST) (National Center for Biotechnology Information, NCBI, (http://www.ncbi.nlm.nih.gov/). A total of 48 potential miRNAs were detected. Using these potential miRNAs sequences, R.sativus mRNA database was further blasted to find 16 potential miRNA-targeted genes. Most of the target mRNAs were found to be coding transcription factors which are involved in regulating plant growth, development and metabolism.

Datasets of miRNAs, EST and mRNA sequences
To search potential miRNAs in R.sativus, previously known plant miRNAs including their precursor sequences from Arabidopsis thaliana, Zea mays, Oryza sativa, Glycine max, Sorghum bicolor and other plant species were downloaded from the miRBase (Release 17:April 2011) (http://www.mirbase.org/) [7]. After removal of the repeated sequences, 1,876 were considered as reference set R.sativus expressed sequence tag (EST) and mRNA database were downloaded from GenBank database (http://www.ncbi.nlm.nih.gov).

Availability of Computational Software
Comparative software BLAST-2.2.24 [8] was downloaded from NCBI. BLASTX (http://www.ncbi.nlm.nih.gov/BLAST/) was used for analysis of potential targets. RNA secondary structure and the free energy were calculated by web server Mfold 3.2 [9].

Prediction of R.sativus miRNAs
The Computational prediction of miRNAs in R.sativus is shown in Figure: 1. the plant miRNA sequences were initially scanned to remove repeats. The reference set was subjected to the BLAST [10] search for R.sativus homologs of miRNAs against EST database. The initial BLAST-2.2.24 search was carried out with default parameters. Mature miRNA sequence should be no less than 18 nt, and the mismatches should be less than 4. Later these precursor sequences were BLASTXed online to remove the protein coding sequences. Pre-miRNAs secondary structure was run on MFOLD 3.2 (http://www.bioinfo.rpi.edu/). The following steps were considered for screening the candidate miRNA homologs: (1) The RNA sequence folding into an appropriate stem-loop hairpin secondary structure that contains the ~22 nt mature miRNA sequence located in one arm of the hairpin structure; (2) The predicted mature miRNAs with no more than 6 mismatches with the opposite miRNA* sequence in the other arm; (3) maximum size of 3 nt for a bulge in the miRNA sequence was allowed; (4) miRNA precursors with secondary structures had higher negative minimal free energies and minimal free energy index (MFEI) than other different type of RNAs; and MFEI of greater than 0.85 [11]; (5) The A+U content of pre-miRNA within 30-70% was considered; (6) no loop or break in miRNA sequences was allowed. These criteria significantly reduced false positives and required that the predicted miRNAs fit the criteria proposed by Ambros and coworkers  Computational prediction of potential targets of miRNAs miRNA targets prediction was performed by aligning the predicted miRNA sequences with mRNA sequences of R.sativus through the BLAST program. The targets were screened according to these criteria: the number of mismatches should be less than 4, and no gaps were allowed at the complementary sites. After removal of the repeated sequences, the potential target genes were BLASTed against protein databases to predict their function.

Results:
To signify new miRNAs in R.sativus by Bioinformatics strategy, the flowchart is shown in Figure 1. Setting the default e-value, we began our BLASTn search for homologous miRNA sequences against the EST databases of R.sativus. After the blast, all blasted hits with non-coding sequences were retained for analysis of secondary structure; those meeting the criteria, discussed in Methodology were termed as miRNA homologs. Finally, 48 potential R.sativus miRNAs belonging to 9 miRNA families were identified and they were named according to Ambros [12]. The details on predicted R.sativus miRNAs including family names, Sources, mismatches, mature miRNA lengths, precursors, A+U Content, and Minimal folding free energy index (MFEI) were listed in Table 1 (see supplementary  material. During the screening of the potential miRNAs, candidate miRNAs were evaluated for A+U content. The sequences of the miRNA precursors had A+U content ranging from 31.57% to 61.90% Table 1, which is in agreement with the previous results [13]. The length of the 48 predicted miRNAs ranged from 18nt to 22nt. All the MFEI of these hairpin structures were over 0.85, which was thought to be gold standard to differentiate miRNAs from other ones [14]. The 48 miRNAs represent 9 miRNA families in R.sativus (Figure. 2). miRNA 156 has nine members; miRNA 160, 319 and 171 have 7 members; miRNA 396 and 162 have 6 members; miRNA 167, 170 and 399 have 4,1 and 1 members respectively. The current results confirm that the approach for EST analysis is a relatively efficient way to identify miRNAs. , about 10000 ESTs contained one miRNA, so about 15 miRNAs should be predicted theoretically from the total of 150665 ESTs, but in this work we found total 48 miRNAs belonging to 9 families showing the higher value than the previous prediction results and different length of mature miRNAs from the same precursor were regarded as different ones, considering they corresponded to different target genes. Compared to the nucleotide number of animal miRNA precursors (typically with 70-80), the R.sativus miRNA precursors show more diversity in structure and size Table 1. The length of miRNA precursors in R.sativus varied from 77 to 181 nucleotides. The different size of the identified miRNAs within different families suggests that they may offer unique functions for regulation of miRNA biogenesis or gene expression [13]. The diversity of the identified miRNAs can also be found in the location of mature miRNA sequences. The sequences of miR 156b/d/f/g/h/i, miR 160/b/c/d/e/f, miR 162/a, miR 167b/c/d, miR 170, miR 319c, miR396a/c/d and miR399a were located at the 5′ end of the miRNA precursors, while the miR 156a/c/e, miR160a, miR162b/c/d/e, miR167a, miR 171a/b/c/d/e/f/g, miR319a/b/d/e/f/g and miR 396b/e/f were found at the 3′ end. Based on the complementarity between miRNAs and their target genes in plants, the R.sativus EST database was searched for homology to the new miRNA sequences with a BLASTN and BLASTX algorithm for the discovery of miRNA targets. A total of 16 potential targets for R.sativus miRNAs were identified. These potential miRNA targets belonged to a number of gene families that had different biological functions Table 2 (see  supplementary material), including the control of transcription factors, organizing and segregating chromosomes for partition. The miRNAs and their putative targets with known functions are listed in Table 2.

Discussion:
In plant kingdom most of the mature miRNAs are evolutionarily conserved from species to species. This information enables us to predict new miRNA homologs or orthologs by insilico method [13].Therefore, we used all previously known plant mature miRNAs from miRBase to search for homologs of miRNAs and their target genes in radish in the publicly available EST and GSS database of R.sativus. Finally, 48 potential R.sativus miRNAs belonging to 9 miRNA families were identified. In the present study, the length of predicted miRNA precursors varies from 77 to 181 NT. The different sizes of the identified miRNAs within the different families suggest that they may perform unique functions in the regulation of miRNA biogenesis or gene expression [13]. MFEI is an important characteristic that distinguish miRNA from other non-coding and coding RNAs. The MFEI is a unique criterion to designate miRNAs. When the MFEI is more than 0.85, the sequence is most likely to be miRNA. We observed that the MFEIs of the hairpin structures ranged from 1.34 to 3.73 Table 1. All the mature sequences of Radish miRNAs are in the stem portion of the hairpin structures, as shown in Figure 3. According to the estimation approximately 10,000 ESTs in plants contain one miRNA [13] that means 150,665 ESTs in R.sativus should contain 15 miRNAs. But in this study, 48 miRNAs were detected, showing higher value than the previous prediction (Figure. 3). The current results confirm that the approach of EST analysis is a relatively efficient way to identify miRNAs.
To understand the biological function of miRNAs in plant development, it is necessary to identify their targets. In miRNA target prediction, the screening criterion was set according to the description in Methodology. Finally, 16 potential targets for R.sativus miRNAs were identified Table 2. The miRNA 156 have been predicted to target Squamosa promoter Binding Proteins (SBP) or SBP-like (SPL) genes, Table 2. It is identified that, most of the predicted miRNA targets were coding genes for transcription factors mainly involved in the synthesis of enzymes participating in regulation of development, growth and other physiological processes.
The general characteristic of the miRNA sequence is, it is complementary to their target gene, and in some case single miRNA can be complementary to more than one target gene. For example, in this work we identified that the miR156 and miR396 have more regulatory targets, even though their structure and sequence is dissimilar. SMC proteins are recently identified important factors that influence chromosome structure during mitosis and development. SMC proteins were identified very first in the budding yeast through genetic analysis of chromosome segregation. These proteins are known to involve in both prokaryote and eukaryote chromosome structure maintainance [15]. Thioredoxins (Trx) are proteins which play essential role in the cellular redox reactions. These are small and molecular weight is approximately 120 KD. Trx are widely distributed in different forms in bacteria, fungi and plants etc. Especially plants have different Trx proteins found in chloroplast Trx f and Trx m, Trx h found in cytosol, ER and mithochondria [16]. The findings of this study considerably broaden the scope of understanding the function of miRNA in R.sativus.