Computational analysis of common bean (Phaseolus vulgaris L., genotype BAT93) lycopene β-cyclase and β-carotene hydroxylase gene's cDNA

Identifying genes and understanding their expression and regulation in common bean (Phaseolus vulgaris L.) is necessary for planning its improvement through genetic engineering. Generating expressed sequence tags (ESTs) is useful for the rapid isolation, identification and characterization of genes. To study gene expression in P. vulgaris pod tissue, EST generation work was initiated. Early-stage and late-stage bean-pod-tissue cDNA libraries were constructed using the CloneMiner cDNA library construction kit. In total, 5972 EST clones were isolated using the random method of gene isolation. While processing the ESTs, we found lycopene β-cyclase (PvLCY-β) and β-carotene hydroxylase (PvCHY-β) cDNAs. In the carotenoid biosynthesis pathway, PvLCY-β catalyzes the production of carotene, and PvCHY-β is known to catalyze the production of lutein and zeaxanthin. To understand more about PvLCY-β and PvCHY-β, both strands of both cDNA clones were sequenced using M13 forward and reverse primers. The nucleotide and deduced protein sequences were analyzed and annotated using online bioinformatics tools. The PvLCY-β and PvCHY-β cDNAs are 1639 and 1107 bp in length and contain open reading frames (ORFs) encoding 502 and 305 amino acid residues, respectively. Analysis of the deduced protein sequences also showed the presence of the conserved domains needed for PvLCY-β and PvCHY-β function. Phylogenetic analysis showed that the PvLCY-β and PvCHY-β proteins are closely related to the LCY-β and CHY-β proteins from Glycine max, respectively. The nucleotide sequences of the PvLCY-β and PvCHY-β cDNAs and their annotation are reported in this paper.

consortium was developed to establish the necessary framework of knowledge and materials for the advancement of bean genomics, transcriptomics, and proteomics; its main goal is to help generate new common bean varieties that are suitable for and desired by farmers and consumers [3]. As part of the international consortium for Phaseolus genomics [3], research on the generation of P. vulgaris expressed sequence tags (ESTs) was initiated at Melaka Institute of Biotechnology, Malaysia.
Anonymous cDNA clones isolated at random on a large scale are treated as ESTs and are used extensively in studies of gene expression and regulation [4]. EST data are also used to evaluate the gene content and structure of genomes, to compare gene expression between different plant tissues using computational tools [5], and to discover new genes [6]. In monocot and dicot plants, various new genes have been identified by random isolation of cDNA clones followed by nucleotide sequencing [7][8][9][10][11]. Hence, ESTs were generated to study gene expression and regulation in bean-pod tissue, in line with the agenda of the international consortium for Phaseolus genomics [3].
To date, we have generated 5972 ESTs, and the annotated ESTs were deposited in the EST database hosted by the National Center for Biotechnology Information (NCBI) GenBank / DDBJ / EMBL (our unpublished work). While processing and analysing the generated ESTs, we found lycopene β-cyclase and β-carotene hydroxylase cDNAs [12,13]. Because the source of these cDNAs is P. vulgaris, the lycopene β-cyclase and β-carotene hydroxylase cDNAs were designated PvLCY-β and PvCHY-β, respectively. In the carotenoid biosynthesis pathway, PvLCY-β catalyzes the production of carotene (α-carotene and β-carotene) [12,14], and PvCHY-β is known to catalyze the production of lutein and zeaxanthin [13].
Because of the antioxidant properties of carotenes (β-carotene), several health benefits associated with their consumption have been reported [15]. Similarly, the benefits of lutein and zeaxanthin consumption have been reported by many researchers, and these reports reflect the importance of these natural products (carotenes, lutein and zeaxanthin) in human health [16][17][18][19][20][21][22].
Both the PvLCY-β and PvCHY-β cDNA clones have potential applications in the genetic engineering of P. vulgaris and other plants; for example, they could be used to elevate the levels of carotene, lutein and zeaxanthin in P. vulgaris. For this reason, both clones were fully sequenced. To understand more about PvLCY-β and PvCHY-β, the nucleotide and deduced protein sequences of their cDNAs were analysed and annotated in this study using computational tools. The nucleotide sequences of the PvLCY-β and PvCHY-β cDNAs and their annotation are reported in this paper.

Plant Materials
The seeds of P. vulgaris genotype BAT93 were kindly provided by Patricia Lariguet, Laboratoire de Biologie Moléculaire des Plantes Supérieures, Department of Plant Biology, University of Geneva, Geneva, Switzerland. Seeds were germinated in soil obtained from a nursery (Melaka, Malaysia), and the seedlings were grown in an open area at Melaka Institute of Biotechnology, Malaysia.

PvLCY-β and PvCHY-β cDNA clones isolation
The PvLCY-β and PvCHY-β cDNA clones were identified among the ESTs generated using the random method of gene isolation [7,8,23]. The cDNA clone encoding PvLCY-β was isolated from the 20-day-old [days after anthesis (DAA)] bean-pod-tissue cDNA Entry Library, and the cDNA clone encoding PvCHY-β was isolated from the 5-day-old bean-pod-tissue cDNA Entry Library. The cDNA libraries were constructed (our unpublished data) using the 'CloneMiner cDNA library construction kit' procured from Invitrogen Corporation.

Plasmid DNA isolation
Individual cultures of Escherichia coli strain DH5α cells harbouring recombinant plasmids with the PvLCY-β and PvCHY-β cDNA clones were grown in 10 ml LB medium supplemented with 40 µg/ml kanamycin. Cultures were incubated in the dark at 37 °C and 160 rpm for 18 h. Plasmid DNA was isolated from the harvested E. coli cells using the Wizard® Plus SV Minipreps DNA purification system, a commercial kit (Promega).

cDNA and deduced protein sequence analysis
For both the PvLCY-β and PvCHY-β cDNA clones, the nucleotide sequences of the plus (+) and minus (-) strands were aligned using the BLAST (bl2seq) program available at NCBI [http://blast.ncbi.nlm.nih.gov/]. The 5' and 3' ends of the cDNA sequences were edited to eliminate adaptor and vector sequences. The finalized cDNA sequences were analyzed using online bioinformatics tools.
Similarity searches were performed using the BLAST programs (BlastN and BlastP) available at NCBI. Online bioinformatics tools available at JustBio [http://www.justbio.com/] were used to deduce the protein sequences and to determine the general features of the PvLCY-β and PvCHY-β cDNA and deduced protein sequences. The EMBOSS Water pairwise sequence alignment tool [http://www.ebi.ac.uk/Tools/emboss/align/] was used to compare the cDNA and deduced protein sequences with their counterparts from other species and to determine the percentage similarity. Guanine and cytosine (GC%) content was calculated using the 'DNA/RNA base composition calculator'. Multiple protein (amino acid) sequences were aligned using the ClustalW program, and the phylograms were constructed using the BioEdit and TreeView programs [24,25]. Protein sequences were aligned using the CLUSTAL 2.1 multiple sequence alignment program to find the conserved residues in both the PvLCY-β and PvCHY-β deduced proteins.
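The GC% calculation mentioned above is simple to reproduce. The following Python sketch is illustrative only and is not the 'DNA/RNA base composition calculator' used in this study; the function name gc_percent and the toy sequence are assumptions introduced for the example. It counts G and C over the unambiguous bases A, C, G and T (U is treated as T) and ignores any other symbols.

```python
def gc_percent(sequence: str) -> float:
    """Return GC content as a percentage of the unambiguous bases A, C, G, T."""
    seq = sequence.upper().replace("U", "T")
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Hypothetical toy sequence, not the PvLCY-β or PvCHY-β cDNA.
    toy = "ATGGCTGAATAA"
    print(f"GC content: {gc_percent(toy):.1f}%")
```

Applied to a finalized cDNA sequence with adaptor and vector sequences already removed, a function of this kind should return a value comparable to that of the online calculator, provided both treat ambiguous bases in the same way.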

Discussion:
PvLCY-β and PvCHY-β cDNA clones isolation
The full-length PvLCY-β and PvCHY-β cDNA clones were isolated from the 20-day-old and 5-day-old bean-pod-tissue cDNA libraries, respectively. The isolated lycopene β-cyclase and β-carotene hydroxylase cDNA clones were designated PvLCY-β and PvCHY-β to indicate their identity and the plant species from which they originate.

Nucleotide sequencing
Both the sense (+) and antisense (-) strands of both cDNA clones were sequenced using M13 forward and M13 reverse primers. After elimination of the vector and adaptor sequences, the sense and antisense strand sequences of each cDNA were compared using the BLAST (bl2seq) program. The results showed that the PvLCY-β and PvCHY-β cDNAs are 1639 and 1107 bp in length, respectively.

Figure 1:
Open reading frame (ORF) and 3' non-coding region of cDNA are shown in capital and small letters, respectively. The deduced amino-acid sequence is given below the nucleotide sequence, and numbered at both ends of each sequence line. The ORF encodes a protein of 502 amino acid residues (blue). Amino acid residues are numbered from the initial Methionine (M) to the last Glutamic acid (E) residue. Initiation and termination codons are shown in green and red colour, respectively. The asterisk (*) represents the termination codon. This cDNA clone was isolated from the P. vulgaris 20-day-old pod-tissue cDNA library.
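How a protein sequence is deduced from a cDNA can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the JustBio tool used in this study: it scans only the three forward reading frames, uses the standard genetic code, and takes the longest stretch from an ATG to the first in-frame stop codon. The names translate, longest_orf and orf_length_nt, and the toy sequence, are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate a nucleotide string codon by codon ('X' for unknown codons)."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(cdna: str):
    """Return (start, end, protein) of the longest ATG-to-stop ORF.

    Positions are 0-based; 'end' is exclusive and includes the stop codon.
    Only the three forward frames are searched.
    """
    seq = cdna.upper().replace("U", "T")
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif CODON_TABLE.get(codon) == "*":
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3, translate(seq[start:i]))
                start = None
    return best


def orf_length_nt(n_residues: int) -> int:
    """Nucleotides needed to encode n residues plus one stop codon."""
    return 3 * (n_residues + 1)


if __name__ == "__main__":
    # Hypothetical toy cDNA: untranslated bases in lower case, ORF in capitals.
    print(longest_orf("ccATGGCTGAATAAgg"))
    print(orf_length_nt(502), orf_length_nt(305))
```

As a consistency check on the figures reported in this paper, an ORF encoding 502 residues plus a stop codon spans 3 × 503 = 1509 nucleotides, which fits within the 1639 bp PvLCY-β cDNA; likewise, 305 residues require 3 × 306 = 918 nucleotides, which fits within the 1107 bp PvCHY-β cDNA.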

cDNA and Deduced Protein Sequence Analysis
The identity of both cDNA clones was confirmed by analyzing each finalized cDNA sequence and its deduced amino acid sequence. The annotated nucleotide sequences of the PvLCY-β and PvCHY-β cDNAs were deposited in GenBank/DDBJ/EMBL under the accession numbers HQ199604 and JN255133, respectively. The annotated general features of the cDNA nucleotide and protein sequences are summarized in Table 1 (see supplementary material), and the nucleotide sequences of the PvLCY-β and PvCHY-β cDNAs, along with their deduced amino acid sequences, are shown in Figure 1 & Figure 2, respectively.

Figure 2:
Nucleotide and deduced amino acid sequences of Phaseolus vulgaris beta-carotene hydroxylase (PvCHY-β) cDNA clone. Open reading frame (ORF) and non-coding regions of cDNA are shown in capital and small letters, respectively. The deduced amino-acid sequence is given below the nucleotide sequence, and numbered at both ends of each sequence line. The ORF encodes a protein of 305 amino acid residues (blue). Amino acid residues are numbered from the initial Methionine (M) to the last Serine (S) residue. Initiation and termination codons are shown in green and red colour, respectively. The asterisk (*) represents the termination codon. This cDNA clone was isolated from the P. vulgaris 5-day-old pod-tissue cDNA library.
The percentage similarity of the PvLCY-β and PvCHY-β cDNA nucleotide and deduced protein sequences to their counterparts from other species is shown in Table 2 & Table 3 (see supplementary material), respectively. Amino acid sequence analysis showed that both the PvLCY-β and PvCHY-β proteins are Leucine (L) rich (Supplementary Figure 1 & 2). Comparison of the PvLCY-β protein with its counterparts from other species showed that 217 of its 502 residues (43.23%) are fully conserved. In the case of the PvCHY-β protein, however, only 67 of 305 residues (21.97%) are fully conserved. A subsequent search for conserved domains in the PvLCY-β and PvCHY-β protein sequences detected their conserved domains, and the results are summarised in Table 4 (see supplementary material). Phylograms were constructed to understand the phylogenetic relationship of the PvLCY-β and PvCHY-β proteins with their counterparts from other species. The phylograms for the PvLCY-β and PvCHY-β proteins are shown in Figure 3 & Figure 4, respectively.
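The percentages of fully conserved residues quoted above can be reproduced with a short calculation. The Python sketch below is an assumption about how such a count can be made from a multiple sequence alignment and is not the CLUSTAL 2.1 output itself: a column is counted as fully conserved when every aligned sequence carries the same residue and the reference sequence has no gap there. The function names and the three toy sequences are hypothetical.

```python
def percent_conserved(conserved: int, total: int) -> float:
    """Percentage of conserved residues, rounded to two decimals."""
    return round(100.0 * conserved / total, 2)


def fully_conserved(aligned, ref_index=0):
    """Count alignment columns that are identical in all sequences.

    'aligned' is a list of equal-length strings with '-' marking gaps.
    Returns (conserved_columns, reference_length, percentage), where the
    reference length is the ungapped length of aligned[ref_index].
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    conserved = sum(
        1
        for column in zip(*aligned)
        if column[ref_index] != "-" and len(set(column)) == 1
    )
    ref_len = sum(1 for ch in aligned[ref_index] if ch != "-")
    return conserved, ref_len, percent_conserved(conserved, ref_len)


if __name__ == "__main__":
    # Hypothetical toy alignment, not the PvLCY-β or PvCHY-β alignment.
    toy = ["MKLV-A", "MKIVQA", "MRLV-A"]
    print(fully_conserved(toy))
    # Figures reported in this paper: 217 of 502 and 67 of 305 residues.
    print(percent_conserved(217, 502), percent_conserved(67, 305))
```

The percentage is expressed relative to the length of the reference protein (502 or 305 residues) rather than the length of the alignment, which matches the way the values are stated in the text.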
Understanding the identified genes, their expression patterns and their regulation is crucial for planning how to manipulate any biosynthesis pathway of interest in plants. To suppress the expression of a gene, a partial sequence of that gene can be used to induce posttranscriptional gene silencing (PTGS) [26-28]. However, the full-length gene or its cDNA is required for over-expression in order to increase the production of either desired vital proteins or natural products [29]. Therefore, an understanding of the gene of interest and its cDNA is a prerequisite before it can be used in recombinant DNA (rDNA) technology to genetically manipulate any plant or other organism of interest [30].
The main goal of this study was to annotate the PvLCY-β and PvCHY-β cDNAs and their deduced proteins (amino acid sequences). The PvLCY-β cDNA clone was identified in the 20-day-old pod-tissue cDNA library, which indicates that PvLCY-β is expressed in the bean's 20-day-old developing pod tissue. Likewise, the PvCHY-β cDNA clone was identified in the 5-day-old pod-tissue cDNA library, which indicates that PvCHY-β is expressed in the bean's 5-day-old developing pod tissue. However, the expression level, expression pattern and tissue specificity of both genes are not yet clear, because we have not characterised their expression. This can be done using either the Northern hybridization technique or the microarray technique [31].

Figure 3:
... Table 2). The location of the PvLCY-β protein in the phylogram is shown in a pink box.
The GC content of the PvLCY-β and PvCHY-β cDNAs is 42% and 46%, respectively. Both values are higher than the GC content (39.4%) reported for the nuclear DNA of the broad bean [32]. Phaseolus vulgaris is a valuable source of protein in the human diet, and it is important to increase the yield of this essential crop [3,39]. Several research teams are using GM technology to improve the yield of beans [30,40], for instance by developing P. vulgaris that is resistant to herbicide [41] and to viral infection [42]. In addition, there is vast scope to modify beans genetically to improve the nutritional quality of their pods and seeds. This type of genetic manipulation appears feasible because rice (Oryza sativa) has been genetically engineered to increase its β-carotene content for use as a source of vitamin A [43]. Similarly, the β-carotene content of beans could be increased by over-expressing PvLCY-β in the carotenoid biosynthesis pathway (Supplementary Figure 3). Furthermore, the content of therapeutically beneficial lutein and zeaxanthin in beans could also be increased by over-expressing PvCHY-β [44].
Genetic modification of agricultural crop plants to improve yield and nutritional quality is a viable option, and it is important for human wellbeing [30,40]. Both the isolated PvLCY-β and PvCHY-β cDNAs are reasonably well annotated in this study, and we believe that the annotated cDNA sequences could be useful in designing a strategy for the construction of transformation vectors. Further research along these lines is needed to achieve the ultimate goal of generating new common bean varieties that are suitable for and desired by farmers and consumers.

Conclusion:
This study has annotated the salient features of the PvLCY-β and PvCHY-β cDNA clones. Computational analysis of the deduced PvLCY-β and PvCHY-β proteins revealed the presence of conserved domains. Furthermore, comparative analysis of the deduced PvLCY-β and PvCHY-β protein sequences with their counterparts from other species revealed the fully conserved amino acid residues. However, further study is required to understand the expression and regulation of the PvLCY-β and PvCHY-β genes in bean-pods. Over-expression of both genes in bean-pods can be considered in further research to explore the possibility of improving the nutritional quality of bean-pods and bean-seeds.