Codon bias and gene expression of mitochondrial ND2 gene in chordates

Background: The mitochondrial ND2 gene (MT-ND2) encodes subunit 2 of NADH dehydrogenase, the first enzyme of the mitochondrial electron transport chain. Leigh syndrome, a neurodegenerative disease caused by a mutation in the ND2 gene (T4681C), is associated with bilateral symmetric lesions in the basal ganglia and subcortical brain regions. It is therefore of interest to analyze mitochondrial DNA to glean information on evolutionary relationships. This study analyzes the compositional dynamics and selection pressure that shape the codon usage patterns in the coding sequence of the MT-ND2 gene across pisces, aves and mammals, using bioinformatic measures such as the effective number of codons (ENC), codon adaptation index (CAI) and relative synonymous codon usage (RSCU). Results: We observed a low codon usage bias, as reflected by high ENC values, in the MT-ND2 gene among pisces, aves and mammals. The most frequently used codons ended with A/C at the 3rd codon position, and the gene was AT rich in all three classes. The codons TCA, CTA, CGA and TGA were over-represented in all three classes. The F1 axis of the correspondence analysis showed significant positive correlation with G, T3 and CAI, while the F2 axis showed significant negative correlation with A and T but significant positive correlation with G, C, G3, C3, ENC, GC, GC1, GC2 and GC3. Conclusions: The codon usage bias in the MT-ND2 gene is not associated with expression level. Mutation pressure and natural selection affect the codon usage pattern in the MT-ND2 gene.


Background
The mitochondrial genome is an ideal molecular marker for species identification and systematic phylogenetic studies owing to its small size. It is easily amplified, mostly conserved in gene content, and characterized by lack of recombination, maternal inheritance and a high evolutionary rate [1]. The mitochondrial respiratory chain comprises four complexes (complex I-IV), and the mitochondrial genome carries 37 genes: two ribosomal RNA (rRNA), twenty-two transfer RNA (tRNA) and thirteen protein-coding genes. Complex I of the mitochondrial respiratory chain is the first enzyme, NADH dehydrogenase, and its seven subunits (ND1-6 and ND4L) play a pivotal role in diverse pathological processes [2]. Subunit 2 of NADH dehydrogenase is encoded by the ND2 gene, and its function is not yet fully understood. However, a mutation in the ND2 gene (T4681C) has been reported in patients with Leigh syndrome, a neurodegenerative disease characterized by bilateral symmetric lesions in the basal ganglia and subcortical brain regions [3]. Urrutia and Hurst (2003) reported that codon usage in humans is positively related to gene expression but inversely related to the rate of synonymous substitution [4]. Several genomic factors, such as gene expression level, protein secondary structure, and translational preferences balancing mutational pressure against natural selection, contribute to synonymous codon usage variation in different organisms [5,6]. Information on the synonymous codon usage pattern therefore provides significant insight into the prediction, classification and evolution of a gene at the molecular level, and also helps in designing highly expressed genes.
In the present study we carried out a comparative analysis of ND2 codon usage and codon context patterns among the mitochondrial genomes of three chordate classes (pisces, aves and mammals), using several bioinformatics tools, in order to understand the molecular mechanism and the functional conservation of gene expression over the course of evolution.

Retrieval of Sequence data
The coding sequences (cds) of the MT-ND2 gene from five species each of pisces, aves and mammals were retrieved from the National Center for Biotechnology Information, USA (http://www.ncbi.nlm.nih.gov/). The accession numbers of the species are AP006806, AP006813, AP006778, AP006825, AP006858, X52392, AF090337, AF090341, AF090338, AF090340, U96639, AJ001562, X14848, Y11832 and AJ001588. A Perl program was used to analyze the compositional features and codon usage bias parameters.

Compositional properties
The overall frequencies of the bases A, T, G and C, their frequencies at the 3rd codon position, and the GC, GC1, GC2 and GC3 contents were calculated using the Perl script.
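The Perl script used in the study is not shown, so the following is only a minimal Python sketch of how these compositional properties could be computed. The function name `composition` and the choice to drop a trailing partial codon are assumptions made for illustration.

```python
def composition(cds):
    """Percentage base composition of a coding sequence.

    Returns overall A/T/G/C, third-position A3/T3/G3/C3, and GC, GC1, GC2, GC3.
    Assumption: a trailing partial codon (for example an incomplete stop
    codon) is dropped before counting.
    """
    seq = cds.upper()
    seq = seq[: len(seq) - len(seq) % 3]
    if not seq:
        raise ValueError("sequence contains no complete codon")

    def pct(bases, subseq):
        # Share (in %) of the given bases within subseq.
        return 100.0 * sum(subseq.count(b) for b in bases) / len(subseq)

    # Bases found at codon positions 1, 2 and 3.
    positions = [seq[i::3] for i in range(3)]

    result = {b: pct(b, seq) for b in "ATGC"}
    result.update({b + "3": pct(b, positions[2]) for b in "ATGC"})
    result["GC"] = pct("GC", seq)
    for i, pos in enumerate(positions, start=1):
        result["GC%d" % i] = pct("GC", pos)
    return result
```

Calling `composition("ATGGCC")` treats the input as the two codons ATG and GCC and reports the overall and position-specific percentages in a single dictionary.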

Codon Adaptation Index (CAI)
Codon adaptation index (CAI) is used to estimate gene expression level. The CAI is calculated as

$$\mathrm{CAI} = \exp\left(\frac{1}{L}\sum_{k=1}^{L}\ln \omega_k\right)$$

where $\omega_k$ is the relative adaptiveness of the kth codon and L is the number of synonymous codons in the gene [7].
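As an illustration, CAI can be sketched as the geometric mean of the relative adaptiveness values of the codons in a gene. The text does not state which reference set the weights were derived from, so the sketch takes the reference codon counts as an input. The function names and the rule of skipping codons that are absent from the reference set are assumptions.

```python
import math


def relative_adaptiveness(ref_counts, families):
    """Weight of each codon = its count / count of the most used synonym.

    ref_counts : dict codon -> count in a reference gene set (user supplied)
    families   : iterable of lists of synonymous codons
    Codons never seen in the reference set receive no weight.
    """
    weights = {}
    for family in families:
        top = max(ref_counts.get(c, 0) for c in family)
        for c in family:
            if ref_counts.get(c, 0) > 0:
                weights[c] = ref_counts[c] / top
    return weights


def cai(codons, weights):
    """Geometric mean of the weights of the codons of a gene.

    Assumption: codons without a weight are skipped, which avoids log(0).
    """
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the gene has a weight")
    return math.exp(sum(logs) / len(logs))
```

A gene that uses only the most frequent codon of every family scores 1, and the score falls as rarer synonyms are used.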

Effective Number of Codons (ENC)
The effective number of codons (ENC) is the most extensively used parameter to measure the usage bias of the synonymous codons [8]. The ENC value ranges from 20 (when only one codon is used for each amino acid) to 61 (when all codons are used randomly). It is calculated as

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_k$ (k = 2, 3, 4, 6) is the mean of the $F_k$ values for the k-fold degenerate amino acids.
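A hedged Python sketch of an ENC calculation follows. It uses the homozygosity estimate F = (n Σp² − 1)/(n − 1) per amino acid and sums, over degeneracy classes, the number of amino acids in the class divided by the class mean of F. The codon table is the vertebrate mitochondrial code (NCBI translation table 2), in which 12 amino acids are two-fold, 6 are four-fold and 2 are six-fold degenerate; whether the authors' script used these class sizes is not stated, so this is an assumption. Raising an error when a class has no usable amino acid is also a simplification of this sketch.

```python
from collections import Counter, defaultdict

# Vertebrate mitochondrial code (NCBI translation table 2), bases in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
         "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def enc(cds):
    """Effective number of codons of a coding sequence (sketch)."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    class_size = Counter()          # degeneracy k -> number of amino acids
    f_values = defaultdict(list)    # degeneracy k -> usable F values
    for family in families.values():
        k = len(family)
        class_size[k] += 1
        n = sum(counts[c] for c in family)
        if k == 1 or n <= 1:
            continue
        f = (n * sum((counts[c] / n) ** 2 for c in family) - 1) / (n - 1)
        if f > 0:
            f_values[k].append(f)

    total = 0.0
    for k, size in class_size.items():
        if k == 1:
            total += size           # single-codon amino acids count as 1 each
            continue
        if not f_values[k]:
            raise ValueError("no usable amino acid in the %d-fold class" % k)
        total += size / (sum(f_values[k]) / len(f_values[k]))
    return total
```

When every amino acid is encoded by a single codon, each F equals 1 and the function returns 20, the lower bound quoted above.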

Relative Synonymous Codon Usage (RSCU)
Relative synonymous codon usage was calculated as the ratio of the observed frequency of a codon to its expected frequency if all the synonymous codons of a particular amino acid are used equally [9]. The RSCU value is calculated using the formula

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

where $n_i$ is the number of synonymous codons for the ith amino acid and $X_{ij}$ is the frequency of occurrence of the jth codon for the ith amino acid (any $X_{ij}$ with a value of zero is arbitrarily assigned a
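The ratio described above (observed codon count divided by the count expected under equal use of all synonyms) can be sketched in Python as follows. The codon table is the vertebrate mitochondrial code (NCBI translation table 2). The function name is an assumption, and the sketch leaves zero counts as zero rather than substituting a pseudo-count.

```python
from collections import Counter, defaultdict

# Vertebrate mitochondrial code (NCBI translation table 2), bases in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
         "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds):
    """RSCU of every sense codon whose amino acid occurs in the sequence."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so it is omitted
        for c in family:
            # observed count / (family total / number of synonyms)
            values[c] = counts[c] * len(family) / total
    return values
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and the values within one synonymous family sum to the number of codons in that family.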

Correspondence analysis (COA)
Correspondence analysis is a multivariate statistical method used to study the major trends in synonymous codon usage variation among coding sequences; it positions the codons along axis 1 and axis 2 according to these trends [12].

Software used
Novel software written in Perl by SC (corresponding author) was used to calculate all the codon usage bias parameters and the nucleotide composition. The vertebrate mitochondrial genetic code, which has 60 sense codons and is available in the NCBI database, was used for the present analysis. The RSCU values of each codon from the different species were clustered by hierarchical clustering using XLSTAT.

Statistical analysis
Correlation analysis was used to identify the relationship between the overall nucleotide composition and each base at the 3rd codon position. All statistical analyses were done using SPSS software.

Results & Discussion:
The overall nucleotide composition of the coding sequence of the MT-ND2 gene among pisces, aves and mammals was analyzed (Table 1, see supplementary material). Nucleobase C was the most abundant (%) in pisces and aves, whereas nucleobase A was the most abundant in mammals; G was the least abundant in all three classes. At the 3rd codon position, A3 was the highest and G3 the lowest in pisces, aves and mammals. This indicates that compositional constraint might influence the codon usage pattern of the MT-ND2 gene [13].
The effective number of codons (ENC) values for the MT-ND2 gene among pisces, aves and mammals were estimated [14]. The overall GC content was also found to be less than 50%, so the gene was AT rich. This phenomenon has also been reported in AT-rich species such as Plasmodium falciparum [15].
We calculated the codon adaptation index (CAI) values for the MT-ND2 gene in order to estimate its expression level among pisces, aves and mammals (Figure 1). The CAI values (mean ± SD) were 0.7851 ± 0.05, 0.7667 ± 0.05 and 0.7635 ± 0.02 in pisces, aves and mammals, respectively. We applied an unpaired t-test between pisces and aves as well as between pisces and mammals, but the differences were not statistically significant.
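For readers who want to check this comparison, the sketch below computes an unpaired t statistic from the summary values reported above. It assumes n = 5 species per class (as stated in the sequence retrieval section) and the pooled-variance (Student) form of the test. The text does not say which variant was run in SPSS, and the reported SDs are rounded, so the numbers are only indicative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's unpaired t statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# Reported CAI summaries; n = 5 per class is an assumption taken from the methods.
t_aves, df = pooled_t(0.7851, 0.05, 5, 0.7667, 0.05, 5)      # pisces vs aves
t_mammals, _ = pooled_t(0.7851, 0.05, 5, 0.7635, 0.02, 5)    # pisces vs mammals
print(round(t_aves, 2), round(t_mammals, 2), df)  # 0.58 0.9 8
```

Both statistics lie well below 2.306, the two-sided 5% critical value of the t distribution with 8 degrees of freedom, which agrees with the non-significant differences reported.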
Wei et al. (2014) also reported that the average CAI of mitochondrial protein-coding genes ranged from 0.5 to 0.7 in B. mori [16]. In addition, we performed a correlation analysis between ENC and gene expression level as measured by CAI and found no significant relationship, suggesting that the codon usage bias in the MT-ND2 gene is not associated with expression level among the three classes.
Moreover, we calculated the relative synonymous codon usage (RSCU) values in the coding sequences of the MT-ND2 gene among pisces, aves and mammals (Table 3). The overall percentages of GC content at the different codon positions were calculated (see supplementary material, Table S2). To assess the roles of mutation pressure and natural selection, we constructed a neutrality plot of GC12 against GC3 (Figure 3, a-c) [17].
The linear regression coefficient of GC12 on GC3 indicated that natural selection plays a major role, while mutation pressure plays a minor role, in shaping the codon usage patterns in the MT-ND2 gene. Our result was similar to the findings of Wei et al. (2014) in their codon usage analysis of the mitochondrial DNA of B. mori [16].
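The regression behind a neutrality plot can be sketched as an ordinary least-squares fit of GC12 (the mean of GC1 and GC2) on GC3. In the usual reading of such plots, a slope near 1 points to mutation pressure and a slope near 0 to natural selection. The function names and the data values in the sketch are hypothetical and are not taken from the study.

```python
def gc12(gc1, gc2):
    """Mean GC content of codon positions 1 and 2."""
    return (gc1 + gc2) / 2.0


def regress(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical GC3 and GC12 percentages for five sequences (illustration only).
gc3_values = [30.0, 34.0, 38.0, 42.0, 46.0]
gc12_values = [44.0, 44.5, 44.8, 45.4, 45.9]
slope, intercept = regress(gc3_values, gc12_values)
```

Fitting each class separately, as in a three-panel plot, only requires calling `regress` once per class.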
We performed correspondence analysis (CoA) based on RSCU values to analyze the codon usage variation in the MT-ND2 gene among pisces, aves and mammals. The 1st axis (F1) accounted for 34.50% of the total variation and the 2nd axis (F2) for 12.51% (Figure 4). Further, correlation analysis was done to determine the interrelationships between the first two principal axes (F1 and F2), nucleotide constraints and indices of natural selection (CAI, Gravy, Aromo) for the MT-ND2 gene. The F1 axis showed significant positive correlation with G, T3 and CAI, whereas the F2 axis showed significant positive correlation with G, C, G3, C3, ENC, GC and GC1-3 but significant negative correlation with A and T (Table 4, see supplementary material). These results suggest that both compositional constraint under mutation pressure and natural selection affect the codon usage pattern in MT-ND2. The results were similar to the findings of Butt et al. [18].

Conclusion:
The codon usage bias in the MT-ND2 gene is weak, and the gene has a high expression level. Natural selection and mutation pressure both affect the codon usage pattern in the MT-ND2 gene.