Stress induced MAPK genes show distinct pattern of codon usage in Arabidopsis thaliana, Glycine max and Oryza sativa

Mitogen-activated protein kinase (MAPK) genes confer resistance to various biotic and abiotic stresses. Codon usage profiling reveals characteristic features of genes such as nucleotide composition, gene expressivity and optimal codons. The present study is a comparative analysis of codon usage patterns in different MAPK genes of three organisms, viz. Arabidopsis thaliana, Glycine max (soybean) and Oryza sativa (rice). The study revealed a high AT content in the MAPK genes of Arabidopsis and soybean, whereas rice genes show a balanced AT-GC content at the third synonymous codon position. The genes show low codon usage bias, as reflected in the high values (50.83 to 56.55) of the effective number of codons (Nc). The predicted gene expression profiles suggest that these genes might be under selective pressure for translational optimization, as reflected in their low codon adaptation index (CAI) values, ranging from 0.147 to 0.208.


Background:
Many codons in the genetic code are functionally synonymous: most amino acids are encoded by two to six codons. These synonymous codons do not occur with equal frequency in a gene, and when some codons occur much more often than their synonyms, codon usage is said to be biased. The study of codon usage bias may assist in the design of DNA primers [1]. Synonymous codon usage varies both between genomes and among genes within a genome [2,3]. This variation may result from natural selection and/or mutational pressure, and it may affect the accuracy and efficiency of translation. Studying the synonymous codon usage patterns of a gene in various organisms could therefore provide insight into its evolutionary history as well as its level of expression [4,5].

Over the course of evolution, plants have developed mechanisms to combat the biotic and abiotic stresses to which they are continually subjected. Mitogen-activated protein kinase (MAPK) genes in plants respond actively to various stresses. Abiotic stresses mainly include drought, high salinity, heat, cold, freezing, limited nutrient availability and heavy metals [6,7]. MAPK genes are generally classified into three families: MAPKKK, MAPKK and MAPK. All three families take part in the chain of reactions in stress signalling pathways: MAPKKKs phosphorylate MAPKKs on serine/threonine residues in their activation loop, and MAPKKs in turn dually phosphorylate MAPKs at the T-X-Y motif (T-E-Y or T-D-Y) in their activation loop. Stress stimuli activate these kinases, which form cascades that make up the signalling pathways regulating the stress response. The cascades act downstream of cell-surface receptors or sensors and transduce extracellular signals into intracellular responses [8,11].
Chilling stress is a serious problem for the production of major crops such as rice and maize in temperate zones [12]. Based on structural motifs and sequence similarity, Arabidopsis MAPKs are classified into four groups (A-D): groups A, B and C possess a T-E-Y motif, whereas group D possesses a T-D-Y motif [13,14]. A signalling pathway comprising MEKK1-MKK2-MPK4/MPK6 has been reported to regulate the response to abiotic stresses in Arabidopsis [15]. Abiotic stresses induce the production of reactive oxygen species (ROS); several MAPK signalling pathways are induced only by ROS and in turn regulate ROS production. Many MAPK pathways involved in abiotic stress responses have been studied in rice. Experiments on long-term exposure of rice plants to chilling stress revealed the involvement of MAPK cascade components, namely MEK1 (a MAPKK) and MAP1 (a MAPK) [16]. The rice MAPK5 gene responds to both biotic and abiotic stresses. Drought resistance in rice is provided by DSM1, a MAPKKK of the B3 subgroup [17], and MAPK33 has also been implicated in the drought response [18]. Microarray analysis of MPK4 activity in soybean revealed that MPK4 negatively regulates salicylic acid (SA) and H2O2 accumulation. Accordingly, silencing MPK4 in soybean significantly increases SA and H2O2 accumulation and up-regulates defense-response genes, giving the plants better resistance to downy mildew and soybean mosaic virus than vector-control plants. MPK4 has also been reported to down-regulate genes involved in growth and development, such as those in auxin signalling pathways and in the cell cycle and proliferation [19]. Despite a substantial literature on the physiological roles of MAPKs, very little information is available on the codon usage patterns of MAPK genes across plant species. The present study was therefore undertaken to elucidate the codon usage of MAPK genes.
A comparative analysis of the codon usage patterns of the MAPK genes reported in A. thaliana, G. max and O. sativa could help assess how conserved these genes are, characterise their codon usage patterns and clarify the role of nucleotide composition in determining optimal codons. Codon usage information could also help reveal the evolutionary history of individual genes within or between organisms, as well as their expression.

Methodology:
Coding DNA sequence data
Coding sequences (127 CDS in total) of genes from the different MAPK families of Arabidopsis, soybean and rice were retrieved from NCBI (www.ncbi.nlm.nih.gov). The genes and their accession numbers are listed in Table 1 (see supplementary material). Codon usage bias parameters were estimated and analysed for each gene.

Analysis of the codon usage profile
Several parameters were used to characterise the gene sequences. Members of each of the three MAPK families were analysed and compared family-wise across the three organisms.

RSCU
The relative synonymous codon usage (RSCU) of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. It indicates whether the codons in a sequence are used without bias or whether certain codons are preferred. Codons with RSCU values greater than 1 are considered to have positive codon usage bias, and those with values less than 1 negative codon usage bias [20]. RSCU was estimated with the online tool codonW (mobyle.pasteur.fr/cgi-bin/portal.py#forms::codonW) and is defined as:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}} \qquad (1)$$

where $X_{ij}$ is the number of occurrences of the $j$th codon for the $i$th amino acid and $n_i$ is the number of alternative synonymous codons available for the $i$th amino acid.
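To illustrate the RSCU formula, RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), the following Python sketch computes RSCU values from the codon counts of a single coding sequence. It assumes the standard genetic code (NCBI translation table 1), reads the sequence in frame 1 and skips codons containing ambiguous bases; it is not the codonW implementation used in the study, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds: str) -> Counter:
    """Count complete codons in reading frame 1, skipping codons with ambiguous bases."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if c in CODE)


def rscu(counts: Counter) -> dict:
    """RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij) for each sense codon of an observed amino acid."""
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":  # stop codons are not synonymous codons of an amino acid
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(codons)  # count expected under uniform usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    values = rscu(codon_counts("GTTGTTGTCGCA"))
    print(round(values["GTT"], 3), values["GCA"])  # prints: 2.667 4.0
```

Within one synonymous family the RSCU values sum to the number of codons in that family, so each value can be read directly against the unbiased expectation of 1.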

GC content
GC content is the percentage of guanine (G) and cytosine (C) bases in the entire coding sequence. GC3 is the corresponding percentage at the third synonymous codon position, and AT3 is its AT counterpart.
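A minimal Python sketch of the two composition measures follows. GC content is computed over all unambiguous bases, and GC3 (here named gc3s) over the third positions of codons whose third position is synonymous, i.e. excluding Met (ATG), Trp (TGG) and stop codons. This exclusion rule, the use of the standard genetic code and the function names are assumptions for illustration, not details taken from the study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gc_content(seq: str) -> float:
    """Percentage of G + C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc3s(cds: str) -> float:
    """GC percentage at synonymous third codon positions (reading frame 1).

    Met (ATG), Trp (TGG), stop codons and codons with ambiguous bases are skipped,
    because their third position is not a synonymous site.
    """
    cds = cds.upper()
    thirds = [
        cds[i + 2]
        for i in range(0, len(cds) - 2, 3)
        if CODE.get(cds[i:i + 3], "*") not in ("*", "M", "W")
    ]
    return 100.0 * sum(b in "GC" for b in thirds) / len(thirds) if thirds else 0.0
```

For a sequence without ambiguous bases, AT3 is then simply 100 - GC3.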

Results:
Characteristic patterns of nucleotide composition
The nucleotide composition of the MAPK genes is broadly similar across the three organisms, with a few exceptions in rice (Figure 1). In the MAPKKK family, the overall GC content of the Arabidopsis genes is low (35.6%), i.e. these genes are AT-rich. The overall GC content of the rice and soybean gene sequences is 43.99% and 53.10%, respectively. This suggests that AT and GC content are balanced in the soybean genes, while in rice the overall AT content slightly exceeds the GC content. The GC3 values of all three gene families show that GC is markedly under-represented at the third synonymous codon position in Arabidopsis and soybean. The rice genes differ consistently from those of the other two species in this respect: in all rice gene families, GC3 (40.02%-45.39%) is only slightly lower than AT3 (54.61%-59.08%). Thus, the comparison of the MAPKKK, MAPKK and MAPK families shows that Arabidopsis and soybean have similar patterns of nucleotide usage, whereas rice genes deviate from those of the other two species (Table 1).

Synonymous codon usage pattern
Codons with RSCU values greater than 1 are considered to have positive codon usage bias [22]. The most preferred codon (the one with the highest RSCU value) for each of the 18 amino acids encoded by more than one codon is shown in Table 2 (see supplementary material). The overall RSCU values reveal that TTT, GTT, AAT and GAT, coding for phenylalanine, valine, asparagine and aspartic acid respectively, are the most preferred codons in all three organisms. In addition, Arabidopsis and soybean (G. max) show a strong resemblance in codon preference: 11 of the 18 amino acids have the same preferred codon. Rice shows no such trend; it shares the preferred codon with soybean for only two amino acids and with Arabidopsis for only three. The resemblance between Arabidopsis and soybean is further evident in the MAPKK family, where 12 of the 18 amino acids have the same preferred codon. Only one amino acid, valine, is encoded by the same most preferred codon (GTT) in all three MAPK gene families across all three species. The MAPKKK family shows the strongest resemblance between Arabidopsis and soybean, with 15 of the 18 amino acids sharing the same preferred codon, whereas in rice the most preferred codon for most amino acids differs from those of Arabidopsis and soybean. The RSCU values of the most preferred codon for each amino acid in the three organisms are compared for each gene family in Figure 2. In all three gene families, the preferred codons predominantly carry T at the third codon position, followed by A; this preference could be due to translational selection or mutational bias.
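The family-wise comparison described above can be sketched in Python: pick the highest-RSCU codon for each of the 18 amino acids encoded by more than one codon, count the amino acids for which two species share the preferred codon, and tally the third-position base of the preferred codons. The standard genetic code is assumed, the helper names are hypothetical, and the RSCU inputs below are toy values, not data from this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def preferred_codons(rscu: dict) -> dict:
    """Map each amino acid with two or more codons to its highest-RSCU codon."""
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    prefs = {}
    for aa, codons in families.items():
        scored = [c for c in codons if c in rscu]
        if len(codons) > 1 and scored:  # Met and Trp are skipped, leaving 18 amino acids
            # max() keeps the first codon (in TCAG order) when RSCU values tie
            prefs[aa] = max(scored, key=lambda c: rscu[c])
    return prefs


def shared_preferences(prefs_a: dict, prefs_b: dict) -> list:
    """Amino acids for which two species have the same preferred codon."""
    return sorted(aa for aa, codon in prefs_a.items() if prefs_b.get(aa) == codon)


def third_base_tally(prefs: dict) -> Counter:
    """How often each nucleotide occupies the third position of the preferred codons."""
    return Counter(codon[2] for codon in prefs.values())


# Toy RSCU values for two hypothetical species (not data from this study).
species_x = {"TTT": 1.3, "TTC": 0.7, "GTT": 1.6, "GTC": 0.8, "GTA": 0.7, "GTG": 0.9}
species_y = {"TTT": 0.9, "TTC": 1.1, "GTT": 1.5, "GTC": 0.9, "GTA": 0.6, "GTG": 1.0}
prefs_x, prefs_y = preferred_codons(species_x), preferred_codons(species_y)
print(shared_preferences(prefs_x, prefs_y), third_base_tally(prefs_x))
# prints: ['V'] Counter({'T': 2})
```

Applied to full RSCU tables for each gene family, the length of the shared list gives counts of the "n of 18 amino acids" kind reported above.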

Expressivity of genes
The codon adaptation index (CAI), proposed by Sharp and Li (1987), is a measure used to predict the expressivity of genes [24]. CAI values range from 0 to 1, with higher values indicating closer adaptation to the codon usage of highly expressed genes. The CAI values of the genes in all families were very low, ranging from 0.147 to 0.208 (Table 1), which indicates that the genes are probably not optimized for high expression (Figure 3). This may be because MAPK genes are stress-inducible rather than housekeeping genes; their low CAI values are therefore consistent with low expression under non-stress conditions.
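Following Sharp and Li's definition, the relative adaptiveness of a codon is w_ij = X_ij / X_i,max, computed from a reference set of highly expressed genes, and the CAI of a gene is the geometric mean of w over its codons. The Python sketch below assumes the standard genetic code, excludes Met, Trp and stop codons, and replaces zero reference counts with a pseudocount of 0.5 (an assumed convention); the reference sequences in the example are made up, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codons(seqs) -> Counter:
    """Pool codon counts (reading frame 1) over a set of coding sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    return counts


def relative_adaptiveness(ref_counts: Counter, pseudocount: float = 0.5) -> dict:
    """w_ij = X_ij / X_i,max from reference (highly expressed) gene counts."""
    families = {}
    for codon, aa in CODE.items():
        if aa not in ("*", "M", "W"):  # single-codon amino acids and stops are excluded
            families.setdefault(aa, []).append(codon)
    w = {}
    for codons in families.values():
        # Assumed convention: a codon absent from the reference set gets a pseudocount.
        x = {c: ref_counts[c] or pseudocount for c in codons}
        x_max = max(x.values())
        for c in codons:
            w[c] = x[c] / x_max
    return w


def cai(cds: str, w: dict) -> float:
    """Geometric mean of w over the gene's codons that have a w value."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set; a real analysis would use known highly expressed genes.
w = relative_adaptiveness(count_codons(["GTTGTTGTTGTC", "GCTGCTGCA"]))
print(round(cai("GTTGCA", w), 4))  # prints: 0.7071
```

Because CAI is a geometric mean of values between 0 and 1, a gene dominated by codons that are rare in the reference set is pulled toward 0, which is how values such as 0.147 to 0.208 arise.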

Bias in codon usage
The Nc values of all genes are high, ranging from 50.83 to 56.55. Because Nc ranges from 20 (only one codon used per amino acid) to 61 (all synonymous codons used equally), these high values indicate low codon usage bias in the gene sequences.
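Wright's effective number of codons can be sketched as follows: for each amino acid with N observed codons and codon frequencies p_j, the homozygosity is F = (N Σ p_j² - 1) / (N - 1); F is averaged within the 2-, 3-, 4- and 6-fold degenerate classes, and Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. This Python version assumes the standard genetic code and uses the commonly cited fallback of averaging F2 and F4 when isoleucine (the only 3-fold class) is absent; it is an illustration with a hypothetical function name, not the software used in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def effective_number_of_codons(cds: str) -> float:
    """Wright's effective number of codons (Nc) for one coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    f_by_class = {}  # degeneracy class (2, 3, 4 or 6) -> per-amino-acid F values
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F needs at least two observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f_by_class.setdefault(k, []).append((n * sum_p2 - 1) / (n - 1))
    avg = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in avg and 2 in avg and 4 in avg:
        avg[3] = (avg[2] + avg[4]) / 2  # fallback when isoleucine is missing
    if any(avg.get(k, 0.0) <= 0 for k in (2, 3, 4, 6)):
        return float("nan")  # a degeneracy class is missing or uninformative
    nc = 2 + 9 / avg[2] + 1 / avg[3] + 5 / avg[4] + 3 / avg[6]
    return min(nc, 61.0)
```

A gene that uses a single codon for every amino acid gives Nc = 20, and evenly spread usage reaches the cap of 61, which is why values of 50.83 to 56.55 point to weak bias.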

Discussion:
The comparative study of codon usage in the MAPK genes of Arabidopsis, soybean and rice shows that the first two species are highly similar. Their MAPK genes closely resemble each other in codon preference, and the family-wise comparison across the three organisms revealed that Arabidopsis and soybean are compositionally alike. In contrast, rice showed a somewhat different nucleotide composition. This may be because Arabidopsis and soybean are dicots, whereas rice is a monocot. Arabidopsis and soybean MAPK genes are AT-rich, both overall and at the third synonymous codon position. The rice MAPK sequences, on the other hand, are GC-rich, both overall and at the third synonymous codon position. Guo et al. (2007) also reported a high GC content in rice genes [25]. These results suggest that codon usage in the MAPK genes of all three species might be influenced by their base composition. Nc values are consistently high for all genes in the three species, indicating low codon usage bias in these genes.
Overall, the RSCU values for the three gene families revealed a high similarity in codon usage between Arabidopsis and G. max. Most of the preferred codons in the genes of these two species end in T at the silent third codon position, which may reflect the high AT content of these genes. In O. sativa, however, the nucleotide preference at the third synonymous codon position is balanced in the MAPK and MAPKK genes, as is evident from the nearly equal overall GC and AT content of these two gene families. In the MAPKKK family of O. sativa, by contrast, the preferred codons showed an increased tendency to use T at the third position. Valine is encoded by the most preferred codon GTT in all three species. Highly expressed genes generally tend to use a limited set of preferred codons [24].
The CAI values suggest that the MAPK genes are probably not highly expressed. Since stress induces the expression of MAPK genes, these genes might have evolved to be expressed at low levels under non-stress or normal conditions. The CAI values of all genes are close to 0, suggesting low expressivity [24]. Translational selection might have contributed to this, optimizing the MAPK genes for low expression under non-stress conditions.

Conclusion:
This work is the first attempt to gain insight into the codon usage profiles of stress-induced MAPK genes across three plant species. Mutational pressure and natural selection have been proposed as the main forces behind codon usage bias in organisms ranging from prokaryotes to plants and animals [21,26,27]. In the present study, the results indicate that, apart from mutational pressure, translational selection might play a pivotal role in optimizing the translation of these genes. The MAPK families comprise a large number of genes that interact with each other to combat different environmental stresses, and both codon usage and gene expression may vary with the type of stress and the species involved. Further detailed analysis of codon usage in MAPK and associated cascade genes under biotic and abiotic stress in different species is therefore needed.

Equations
(1) $$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$
where $X_{ij}$ is the number of occurrences of the $j$th codon for the $i$th amino acid and $n_i$ is the number of alternative synonymous codons available for the $i$th amino acid.
(2) $$w_{ij} = \frac{X_{ij}}{X_{i,\max}}$$
where $X_{ij}$ is the number of occurrences of the $j$th codon in the reference set of highly expressed genes and $X_{i,\max}$ is the maximum $X_{ij}$ for the $i$th amino acid.