Genome-wide codon usage bias analysis in Beauveria bassiana

Codon usage bias analysis helps identify the factors that influence and shape the evolution of organisms. It is therefore of interest to analyze the 10363 gene sequences of Beauveria bassiana. The GC content (51.50%) is higher than the AT content (48.50%) in B. bassiana. Fungal nuclear genes tend to be GC rich and predominantly G/C ending. The codon usage bias of B. bassiana was assessed from the Relative Synonymous Codon Usage (RSCU) values of the 61 sense codons, of which 28 have an RSCU value larger than 1. Other factors such as nucleotide composition, mutational pressure and selection also have a role in shaping the codon usage bias. We identified 24 optimal codons that end with G or C. Correlation analysis suggests the existence of selection for translational efficiency. Based on the GC3s distribution, mutation pressure contributes to the evolution of the B. bassiana genes. ENC may be the major factor in shaping the codon usage bias. This study provides insights into the compositional selection pressure on the genes of B. bassiana.


Background:
Codon bias is the preferential use of one codon over other codons that encode the same amino acid. Different codons that encode the same amino acid are known as synonymous codons. Even though synonymous codons encode the same amino acid, it has been shown for a wide variety of organisms that different synonymous codons are used with different frequencies. This phenomenon is termed codon bias [1]. It is found in all eukaryotic and prokaryotic genomes. Codons that are used more often are referred to as optimized or preferred codons. Synonymous codon usage may vary or be similar across genomes and among different genes within a genome. Several factors influence the variation in codon usage patterns, including genetic drift, mutational pressure and natural selection [2], and these factors are largely responsible for differences in codon usage among organisms. Multiple forms of selection may act together, resulting in different clusters of synonymous codon usage patterns among the genes within a genome [3].
An analysis of genome-wide codon usage bias patterns investigates their causes and consequences and helps identify the selective forces that shape the evolution of codon usage, which in turn helps in understanding genome biology [4]. The codon usage bias of several organisms has been analyzed; however, very little is known about codon usage bias in B. bassiana, an entomopathogen belonging to the Hypocrealean fungi (Cordycipitaceae, Ascomycota) that is used as a potential biopesticide. It is an environmentally friendly, commercially available mycoinsecticide whose genome has been sequenced, shedding light on its differential gene expression and adaptability to different niches [5]. Apart from its biopesticidal activity, it has diverse roles: it occurs as an endophyte, both naturally and in inoculated samples, and has a role in suppressing plant pathogens [6, 7], which makes its genetic content worth investigating further. The accuracy and efficiency of protein production can be modulated by differences in codon usage while maintaining the same protein sequence [8]. Identifying synonymous codon usage patterns is useful for finding the genes that are likely under translational selection [9]. In this study we analyzed the codon usage bias of B. bassiana. The objectives of the present study are to investigate the presence of codon bias, to identify the preferred codons in the B. bassiana genome and to examine the contribution of the factors that influence the usage of synonymous codons.

Methodology:
The flowchart for the methodology is given in Figure 3.

Sequence data:
The dataset of 10363 coding sequences (CDS) of B. bassiana (ASM28067v1), derived from the whole genome sequence, was downloaded from the National Center for Biotechnology Information (NCBI) in FASTA (fasta and fna) format (http://www.ncbi.nlm.nih.gov/genome/).

Effective Number of Codons (ENC) and ENC plot:
ENC is a measure of the non-uniformity of codon usage within synonymous groups of codons [11]. ENC values range from 20 (extreme bias, i.e., only one codon is used for each amino acid) to 61 (no bias, i.e., synonymous codons are used randomly). ENC values were plotted against GC3s values to identify the factors influencing codon usage bias [11].
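The following is a minimal sketch of how ENC could be computed for a single coding sequence. It assumes the commonly used formulation in which a homozygosity value F is averaged within each degeneracy class (2-, 3-, 4- and 6-fold) of the standard genetic code; the study does not state which software or variant was used, so the function name, the handling of sparse amino acids and the cap at 61 are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumed; the study does not name a table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Group the 61 sense codons into synonymous families, one per amino acid.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def effective_number_of_codons(cds):
    """Return ENC for one in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    f_by_class = defaultdict(list)  # degeneracy class -> list of F values
    for codons in FAMILIES.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations to estimate F
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # If isoleucine is unusable, approximate the 3-fold class from its neighbours.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("too few codons to estimate ENC")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)  # values above 61 are capped at the upper bound
```

A sequence that uses a single codon for every amino acid gives the lower bound of 20, while a sequence that uses all synonymous codons equally reaches the cap of 61.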

Relative Synonymous Codon Usage (RSCU):
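RSCU is defined as the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally. An RSCU value of 1 indicates that the codon is not biased, a value >1 indicates that the codon is used more frequently than expected, and a value <1 indicates that it is used less frequently than expected.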
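The definition above can be illustrated with a short sketch that pools codon counts over a set of coding sequences and divides each count by the mean count of its synonymous family. The standard genetic code and the function name are assumptions made for illustration; the tool actually used in the study is not stated.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def rscu(sequences):
    """Return {codon: RSCU} for the 61 sense codons (hypothetical helper)."""
    counts = Counter()
    for cds in sequences:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            # Skip stop codons and codons with ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values


# Toy example: alanine encoded three times by GCC and once by GCT.
example = rscu(["GCCGCCGCCGCT"])
print(example["GCC"], example["GCT"], example["GCA"])
```

In this toy example GCC is over-represented (RSCU of 3), GCT is used exactly as expected (RSCU of 1) and the unused alanine codons have an RSCU of 0. Amino acids that do not occur at all are reported as NaN.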

Codon Adaptation Index (CAI):
CAI is a measure of the relative adaptiveness of the codon usage of a gene towards the codon usage of highly expressed genes. CAI values range from 0 to 1; higher values indicate a higher level of gene expression as well as stronger codon bias [12].
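As an illustration, CAI can be sketched as the geometric mean of per-codon weights, where each weight is the usage of a codon in a reference set relative to the most used synonymous codon. The reference set, the pseudocount of 0.5 for codons absent from it and all function names below are assumptions; the study does not specify which reference genes or software were used.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def split_codons(cds):
    """Split an in-frame sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """Weights w = count / max count within each synonymous family."""
    counts = Counter(c for cds in reference_cds for c in split_codons(cds))
    weights = {}
    for codons in FAMILIES.values():
        if len(codons) == 1:
            continue  # Met and Trp carry no information about codon choice
        # Assumed convention: unseen codons get a small pseudocount.
        adjusted = {c: counts[c] if counts[c] else pseudocount for c in codons}
        best = max(adjusted.values())
        for c in codons:
            weights[c] = adjusted[c] / best
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of all scorable codons in a gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

A gene that uses only the most frequent codon of each family in the reference set scores 1, and the score falls as rarer codons are used.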

Neutrality plot:
The GC content is calculated separately for the first, second and third codon positions (GC1, GC2 and GC3, respectively). GC12 is the average of GC1 and GC2. The neutrality plot (GC12 against GC3) is used to analyze the relationship between GC12 and GC3 and the factors influencing the codon usage bias [13,15].
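A minimal sketch of the position-specific GC calculation is given below. The function name is hypothetical, and the sketch computes plain GC3 over all codons; GC3s, which is usually restricted to synonymous third positions, would additionally require excluding Met, Trp and stop codons.

```python
def gc_by_position(cds):
    """Return GC1, GC2, GC3 and GC12 (as fractions) for one in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")

    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base from this offset
        fractions.append(sum(b in "GC" for b in bases) / len(bases))

    gc1, gc2, gc3 = fractions
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}


# Toy example with the codons ATG, GCC and AAG.
print(gc_by_position("ATGGCCAAG"))
```

For the toy sequence, one of three bases is G or C at each of the first and second positions, while all three third positions are G or C, so GC12 is about 0.33 and GC3 is 1.0.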

Results:

Neutrality plot:
The neutrality plot was drawn to characterize the correlation among the three codon positions of GC content; it reveals the relationship between GC12 and GC3 (Figure 1). The genes of B. bassiana exhibit a wide range of GC3 values, from 20.16% to 95.78%. If a gene is located on the diagonal line with a significant correlation between GC12 and GC3, the gene is under neutral selection pressure. The points (genes) were located above the regression curve (bold line), which had a slope less than 1, indicating that natural selection pressure dominates the composition of the codons in B. bassiana. GC12 and GC3s showed a significant positive correlation (r = 0.3348, p < 0.001). The slope of the regression line for all genes was 0.1196, which indicates that mutation pressure accounts for 11.96% of the effect and other factors account for around 88.04%.
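The way the regression slope is read as a share of mutation pressure can be sketched as follows. The sketch assumes an ordinary least-squares fit of GC12 on GC3 with one value pair per gene; the function names and the toy input are hypothetical, and only the slope of 0.1196 comes from the analysis above.

```python
import math


def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x); returns slope, intercept, r."""
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equally long lists with at least two genes")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 and GC12 must both vary across genes")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


def pressure_shares(slope):
    """Percent attributed to mutation pressure and to all other factors."""
    mutation = slope * 100
    return mutation, 100 - mutation


# Slope reported for all B. bassiana genes.
mutation, other = pressure_shares(0.1196)
print(f"mutation pressure: {mutation:.2f}%, other factors: {other:.2f}%")
```

A slope of 1 would place all genes on the diagonal (fully neutral), whereas a slope near 0 means that GC12 barely follows GC3, which is read as a dominant contribution of selection and other factors.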

Effective Number of Codons (ENC) and GC3s association:
The ENC of B. bassiana genes ranges from 24.68 to 61.00, with an average of 48.02. Among the 10363 genes, only 808 (about 7.8%) exhibited high codon bias (ENC < 35), indicating that B. bassiana genes in general exhibit random codon usage without strong codon bias. An ENC plot was generated to explore the influence of GC3s on codon bias in B. bassiana. If a gene is located on the expected curve, its codon usage shows no bias beyond that expected from its GC3s composition. The GC3s distribution lay between 0.4 and 0.99, indicating that B. bassiana mainly evolved by mutation pressure (Figure 2). The distribution of ENC versus GC3s reveals that most of the points with low ENC values lie below the expected curve. This indicates that mutational pressure and other factors are likely to contribute to codon bias.
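The expected curve of an ENC plot is usually drawn from the null relationship ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s. The sketch below assumes that this standard formula was used for Figure 2, which the text does not state explicitly; the function names and the example GC3s value are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped by GC3s composition alone."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def below_expected_curve(enc, gc3s):
    """True if a gene lies below the curve, i.e. is more biased than expected."""
    return enc < expected_enc(gc3s)


# Mean ENC reported above (48.02), placed at an illustrative GC3s of 0.5.
print(expected_enc(0.5), below_expected_curve(48.02, 0.5))
```

The curve peaks at 60.5 when GC3s is 0.5 and falls towards 31 to 32 at the extremes, so a gene with a given ENC can lie on or below the curve depending on its GC3s.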

Correlation between codon usage bias, gene length, hydrophobicity and aromaticity in B. bassiana:
The correlation among gene length, codon usage bias, hydrophobicity and aromaticity was determined using Spearman correlation analysis (Table 3).
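Spearman correlation is the Pearson correlation of the ranks of two variables, with tied values receiving their average rank. The following sketch shows the calculation for two per-gene indices; the function names and the toy values are hypothetical, and no significance test is included.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two values")
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: gene lengths against a made-up bias index.
print(spearman([300, 900, 1500, 2400, 3000], [0.41, 0.44, 0.52, 0.58, 0.60]))
```

Because only ranks are used, any monotonic relationship between two indices gives a coefficient of 1 or -1, regardless of whether the relationship is linear.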
Each amino acid is encoded by a set of synonymous codons, and the number of synonymous codons differs among amino acids. The putative optimal codons of B. bassiana are given in Table 4. There were 25 optimal codons, all of which end with G or C (G = 10/25, C = 15/25), which suggests that the third position of the preferred codons may be related to the GC content. There are two or three optimal codons for each amino acid, indicating that these codons were significantly correlated with translation levels.

Discussion:
Codon usage bias is an essential feature of all genomes [14]. In fungal species in particular, codon usage bias was driven by selection [22, 23, 24] and partly by genetic interference in the model organism Neurospora crassa [25]. Codon usage bias is recognized as a critical factor contributing to gene expression and cellular function through its effects on processes ranging from RNA processing to translation and protein folding [26]. Optimal codons were identified by comparing the low and high bias datasets. If these codons correlate significantly with translational levels [19], they would be helpful in designing degenerate primers to investigate evolutionary aspects of B. bassiana. B. bassiana exhibits no strong codon usage bias; its codon usage is largely random. There is strong evidence for selection for translational efficiency, and mutational pressure and other factors also contribute to codon usage bias. Natural selection pressure dominates codon usage in B. bassiana.

Conclusion:
The present study describes the codon usage of the entomopathogenic fungus Beauveria bassiana. We found no strong codon bias in B. bassiana. The random or selective component of codon bias is attributable to mutational pressure and other factors such as natural selection. Translational efficiency also influences the shaping of codon usage bias. Our analysis lays the groundwork for studying the genetic and evolutionary aspects of B. bassiana. Further studies may reveal more details relating to the evolution and other molecular aspects of these fungi.