Survey and characterization of NBS-LRR (R) genes in Curcuma longa transcriptome.

Resistance genes are among the most important gene classes for plant breeding because they are responsible for activating plant defense mechanisms. Among them, the nucleotide binding site-leucine rich repeat (NBS-LRR) class of R-genes is the most abundant and is found in all types of plants. In silico characterization of the EST database detected 28 NBS-type R-gene sequences in Curcuma longa. All 28 sequences contained the NB-ARC domain; 21 of them had highly conserved motif characteristics and were categorized as regular NBS genes. The open reading frames ranged from 112 (CL.CON.1267) to 361 (CL.CON.3566) amino acids, with an average of 279 amino acids. Most best alignments were with monocots (67.8%), chiefly Oryza sativa and Zingiber sequences. Best alignments with dicots occurred with Arabidopsis thaliana, Populus trichocarpa and Medicago sativa. The NBS-type R-genes detected in Curcuma longa can serve as a valuable resource for molecular marker development, molecular mapping of R-genes, identification of resistance gene analogs, and functional and evolutionary characterization of NBS-LRR-encoding resistance genes in asexually reproducing plants.

Bangladesh and Thailand [24]. The International Trade Centre, Geneva, has estimated an annual growth rate of 10% in the world demand for turmeric. Continuous domestication of the preferred genotypes, coupled with their exclusively vegetative propagation, seems to have eroded the genetic base of these crops; as a result, all cultivars available today are equally susceptible to major diseases such as rhizome rot caused by Pythium aphanidermatum, leaf blotch caused by Taphrina maculans and leaf spot caused by Colletotrichum capsici. Moreover, turmeric is completely sterile and is propagated exclusively by vegetative means using the rhizome. In this context, characterization of resistance-related sequences may provide a lead towards retrieving resistance specificities suitable for the improvement of this crop. Recent advances in Curcuma genomic technologies have generated a large number of expressed sequence tags (ESTs) that have been made available in public databases. As of July 2011, GenBank had released 12,593 EST sequences from Curcuma longa. This database can be used as starting material for the characterization of NBS-LRR class R-gene sequences in turmeric. Thus, our objective is to perform a data mining-based identification of plant NBS-LRR class R-genes in the Curcuma longa EST database, using well-known R-gene sequences as templates and comparing the identified sequences with known R-genes deposited in public DNA and protein databases.

Methodology:
The Curcuma longa transcriptome database was searched for NBS-LRR R-gene homologues using amino acid sequences of known genes as queries. Accession numbers of the sequences used at NCBI (National Center for Biotechnology Information; http://www.ncbi.nlm.nih.gov) are shown in Table 1 (see Supplementary material).

Results and Discussion:
R-genes are quite abundant in higher plants, but most functionally defined R-genes belong to a class that encodes cytoplasmic receptor-like proteins characterized by an N-terminal nucleotide-binding site (NBS) and a leucine-rich repeat (LRR) domain. A set of 28 non-redundant NBS sequences was retrieved through TBLASTN alignment against 4035 Curcuma longa contig sequences. Each was annotated for one or more R-genes (data summarized in Table 2; see Supplementary material). Five resistance gene analogues (RGAs) had previously been isolated and characterized in Curcuma longa [27]. However, all five of those RGAs were of the CC-NBS-LRR class and showed no significant variation in the characteristics of the NBS-type R-gene domain. In contrast, similar genes grouped in the same class were expected to cause some level of redundancy [28]. The contigs representing NBS-type R-genes fell into the following classes: X-NBS-LRR: 10; CC-NBS-NBS-LRR: 2; CC-NBS-LRR: 9; NBS-LRR: 2; NBS: 3; and CC-NBS: 2 (Figure 1).

In 21 of the 28 NBS genes, all the motifs characteristic of the NBS domain were conserved, and these were categorized as regular NBS genes. The others differed markedly in structure from the majority, or were simply truncated, and were categorized as non-regular NBS genes. Two non-regular NBS genes yielded higher P values when they were hit by TBLASTN in the NBS regions and had standard LRR regions, while 3 genes had only some of the conserved NBS motifs. Two non-regular NBS genes encoded a coiled-coil (CC) motif but were highly divergent in the NBS region and lacked LRR regions. In the N-terminal region, 10 regular NBS genes contained some unknown motifs, which were symbolized as X.
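The class counts above can be cross-checked with a short script. This is a minimal sketch, not the authors' procedure: the function `classify_architecture` and its input format are hypothetical, and the split into regular and non-regular classes is one reading of the text (X-NBS-LRR, CC-NBS-NBS-LRR and CC-NBS-LRR taken as regular).

```python
def classify_architecture(domains):
    """Build an architecture label such as 'CC-NBS-LRR' from an ordered domain list.

    `domains` is a hypothetical list of domain names in N- to C-terminal order,
    e.g. ["CC", "NBS", "LRR", "LRR"]; "X" stands for an unknown N-terminal motif.
    Consecutive LRR repeats are collapsed into a single "LRR".
    """
    collapsed = []
    for name in domains:
        if name == "LRR" and collapsed and collapsed[-1] == "LRR":
            continue  # tandem LRR repeats count once in the label
        collapsed.append(name)
    return "-".join(collapsed)


# Number of contigs per architecture, as reported in the text.
reported = {
    "X-NBS-LRR": 10,
    "CC-NBS-NBS-LRR": 2,
    "CC-NBS-LRR": 9,
    "NBS-LRR": 2,
    "NBS": 3,
    "CC-NBS": 2,
}

# Assumption: these three classes are the "regular" NBS genes.
regular_classes = {"X-NBS-LRR", "CC-NBS-NBS-LRR", "CC-NBS-LRR"}

total = sum(reported.values())
regular = sum(n for label, n in reported.items() if label in regular_classes)

print(total, regular, total - regular)  # 28 21 7
```

Under this reading the six classes sum to the 28 reported sequences, with 21 regular and 7 non-regular genes, which matches the figures given in the text.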
Eleven regular NBS genes encoded the CC motif (CNL and CNNL), while the rest were without the CC motif (XNL). No genes encoded TIR motifs. The TIR motif is thought to be absent in monocotyledonous plants [4], while being present in all dicotyledonous taxa studied so far. The sizes of Curcuma longa contigs aligned to NBS-LRR R-genes ranged from 452 nucleotides (CL.CON.1267) to 1256 nucleotides (CL.CON.1529). Prediction of the contig coding regions revealed ORFs in both forward and reverse reading frames, with an average length of 279 amino acids (aa). ORF sizes ranged from 112 (CL.CON.1267) to 361 amino acids (CL.CON.3566). The search for conserved domains (CD-Search) revealed conserved motifs in all the analyzed contig clusters. All 28 Curcuma longa contig clusters contained the NB-ARC domain. In the LRR region, Pfam detected 32 LRR motifs in the 28 NBS genes. This number is higher than the number of Curcuma longa contigs with NBS-LRR R-genes because the motifs occur in tandem repeats. These LRR sequences are sometimes imperfect and may be difficult to recognize with available in silico tools, so a larger number may be identified manually. Two of the contig clusters, CL.CON.1267 and CL.CON.3620, had a poorly developed NBS motif and represented very short ORFs of 112 and 123 amino acids, respectively.

Of the best matches to the 28 Curcuma longa NBS-LRR contigs identified, 9 were from plants of dicotyledonous families such as Arabidopsis thaliana, Populus trichocarpa, Pyrus communis, Glycine max, Cajanus cajan and Medicago sativa. Among monocots, rice (O. sativa) sequences appeared most often as best matches (9 contig clusters), followed by Zingiber officinale (3 contig clusters). A comprehensive list of all the sequences that aligned with Curcuma longa NBS-LRR contig clusters is presented in Table 2 (see Supplementary material). Our results on the organization of the detected Curcuma longa NBS-LRR genes were compared mainly with rice and ginger.

Most of the information on R-genes available in databases refers to herbaceous model and crop plants such as rice and Arabidopsis, perhaps because most identified and sequenced R-genes came from mapping approaches that have been performed extensively in these plants. The larger number of sequences from Oryza sativa among the best alignments to Curcuma does not indicate a higher similarity to this species; rather, it reflects the large number of sequences of this model plant deposited in GenBank. Barbosa-da-Silva et al. (2005) [29] also found that Eucalyptus, despite being a woody plant, exhibited maximum alignment of R-genes with the herbaceous Arabidopsis thaliana. Other arguments can also be made: (i) Curcuma belongs to the same family as ginger (Zingiberaceae), and (ii) both Curcuma and rice are monocots and exhibit similar levels of complexity. However, we cannot ignore the fact that significant sequence similarity was also detected with dicot plants. This suggests that Curcuma longa might be positioned at the transition point between dicots and monocots as far as resistance genes are concerned. However, detailed characterization of the NBS-LRR genes in turmeric is needed before a valid conclusion can be drawn on this evolutionary aspect.

The number of NBS-type R-genes identified here is quite low considering the total size of the EST database. However, there may be other types of R-genes in Curcuma longa that were not targeted in this study. Moreover, the EST database was not obtained under pathogen stress conditions. This may suggest that the identified NBS sequences are expressed constitutively, but it also leads to the supposition that a higher number of R-genes may be expressed in Curcuma under other experimental conditions. Thus, the generation of additional ESTs, especially under pathogen infection, could make it possible to detect many new NBS genes from Curcuma longa.
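The monocot share of best matches quoted in the abstract can be derived from the counts given in this section. The sketch below is a worked check only, and it assumes that every best match not attributed to a dicot came from a monocot; the variable names are illustrative.

```python
# Counts taken from the text: 28 contigs, 9 of which had a dicot best match.
total_contigs = 28
dicot_best_hits = 9

# Assumption: all remaining best matches are from monocots.
monocot_best_hits = total_contigs - dicot_best_hits

monocot_share = 100 * monocot_best_hits / total_contigs
dicot_share = 100 * dicot_best_hits / total_contigs

print(f"{monocot_best_hits} monocot best hits ({monocot_share:.2f}%)")
print(f"{dicot_best_hits} dicot best hits ({dicot_share:.2f}%)")
```

Nineteen of 28 is about 67.86%, which corresponds to the 67.8% reported in the abstract if the value is truncated rather than rounded.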

Conclusion
Using bioinformatics tools, it was possible to detect and characterize NBS-type R-genes from the Curcuma longa transcriptome. Twenty-eight (28) NBS-type R-genes with a distinct NB-ARC domain were detected, 21 of which were regular NBS genes. This in silico detection of NBS-LRR type R-genes in Curcuma longa is reported here for the first time. The identified sequences will be valuable resources for the development of markers for molecular breeding and for the identification of RGAs (resistance gene analogs) in Curcuma and related species. A few of the NBS-type R-genes of Curcuma isolated in this study may also be used for fluorescent in situ hybridization (FISH) on Eucalyptus chromosomes, which would also help in comparing different parental species and their respective hybrids. Further, these in silico detected NBS-type R-genes will provide further insights into the organization, function and evolution of NBS-LRR-encoding resistance genes in asexually reproducing plants.

Supplementary material:
The CAP3 program was used to assemble the EST sequences into contigs to create a non-redundant dataset. The program TBLASTN [25] was used to perform reverse alignment on the Curcuma longa contigs. The reading frame of each cluster in the TBLASTN alignment was used to predict the open reading frame (ORF) of each contig searched. For this purpose, the Expasy Translate Tool (bo.expasy.org/tools/dna.html) was used, which gives the ORF of a DNA sequence as the corresponding amino acid sequence in FASTA format. The ORFs obtained were then submitted to a Reverse Position Specific BLAST (RPS-BLAST) search against the Conserved Domain Database [26] to identify patterns or motifs in the predicted cluster products. Reciprocal alignments were conducted for the ORFs using the nr databank and the stand-alone BLAST package from NCBI. Matched sequences were annotated for later comparison.
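The ORF prediction step can be illustrated with a small six-frame translation. This is a minimal sketch under stated assumptions and not the Expasy tool itself: it uses the standard genetic code, picks the longest stop-free stretch across all six frames instead of the frame indicated by the TBLASTN alignment, and does not require a start codon, since EST contigs are often partial. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Return (frame, peptide) for the longest stop-free stretch in six frames.

    Frames 1..3 are on the forward strand, -1..-3 on the reverse strand.
    """
    seq = seq.upper()
    best = (0, "")
    for strand, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            protein = translate(strand_seq[offset:])
            for segment in protein.split("*"):  # '*' marks a stop codon
                if len(segment) > len(best[1]):
                    best = (strand * (offset + 1), segment)
    return best


print(translate("ATGGCCTAA"))  # MA*
```

For a contig sequence `contig`, `longest_orf(contig)` returns the frame and the peptide, which could then be written out in FASTA format for a conserved domain search. A negative frame number indicates that the ORF lies on the reverse strand, as the text reports for some contigs.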

Figure 1: Graphical representation of the NBS-LRR R-genes retrieved from the Curcuma longa EST database.

Table 1 (see Supplementary material), together with sequence features and accession numbers. They are grouped according to the conserved domains previously described. All turmeric sequences used during this work were obtained from the Curcuma longa EST database. EST database of NCBI contains 12953
//ftp.ncbi.nih.gov/pub/UniVec/) for detecting vector and adapter sequences by using the program Cross_Match.

Table 1: Classification and features of NBS-LRR R-genes used as queries against the Curcuma longa EST database. The genes are grouped in three classes (I: NBS+LRR; II: CC+NBS+LRR; III: TIR+NBS+LRR) with the respective NCBI accession number, source species, gene name and domain range.

Table 2: BLAST results and sequence evaluation of Curcuma NBS-LRR genes, including data about the query (homologous sequence, NCBI gi number) and features and evaluation results of Curcuma clusters related to R-genes: cluster size in nucleotides (n), ORF (open reading frame) size in amino acids (aa) and e-value.