Microsatellite repeat dynamics in mitochondrial genomes of phytopathogenic fungi: frequency and distribution in the genic and intergenic regions

The frequency and distribution of microsatellites were analyzed in the mitogenomes of 19 phytopathogenic fungi covering five phyla. Our analysis revealed that the frequency and relative abundance of SSRs varied across all the mitogenomes studied and were influenced neither by genome size nor by GC content. SSRs were found to be differentially distributed in genic and intergenic regions. An average of 5.14 (23.6%) SSRs were present in genic sequences and 21.7 (76.4%) SSRs were located in intergenic sequences. The relative abundance of SSRs was highest in Aspergillus tubingensis and lowest in Phaeosphaeria nodorum, with an average of 0.45. Trinucleotide repeats were the most abundant motifs in both the genic and intergenic regions of the mitogenomes. Among the genes, cox1 harbored the most SSRs, whereas cox3 and nad7 contained the fewest. Based on the presence of SSRs in particular genes, genetic relationships among the individual organisms were also established.


Background:
Microsatellites, or Simple Sequence Repeats (SSRs), are portions of the genome consisting of tandem repeats of a short string of nucleotides, one to six bases in length. These repeats are highly unstable, undergoing additions or deletions of repeat units, which leads to variation in the number of copies of the repeated stretch [1]. Microsatellites can be found in both the protein-coding and noncoding regions of the genome. Because of their abundance and inherent potential for variation, microsatellites are a valuable source of genetic markers and are widely used in population genetics, genetic diversity studies, fingerprinting and forensic analysis in many organisms, including bacteria, fungi, plants and humans [2,3]. Apart from their application as molecular markers, determining the abundance and density of microsatellites may help to establish whether these sequences have any functional or evolutionary significance. Large-scale genome sequencing initiatives on a growing number of organisms have provided an opportunity to evaluate the abundance and relative distribution of microsatellites in different genera. Several microsatellite-mining reports in various organisms, including fungi, have appeared in recent years, providing important data for the comparative analysis of microsatellite distribution [4,5]. These reports focused on nuclear genomes, and relatively few microsatellite analyses have been conducted in organelle genomes, especially in mitochondria.
Mitogenomes are generally characterized by low GC content, conservation in gene function, and high copy number [6], and they can evolve at their own rate relative to the nuclear genomes of the organisms in which they occur [7]. The size and topology of the mitogenome, the number and nature of the proteins it encodes, and even the genetic code itself can vary greatly between species [8]. The mitogenomes of fungi are generally an order of magnitude smaller than those of plants but larger than animal mitogenomes, and usually contain 14 genes encoding hydrophobic subunits of respiratory chain complexes, as well as genes for the large (rnl) and small (rns) ribosomal subunits and a set of tRNAs [8].
The availability of mitochondrial genome sequences is increasing as a result of recent technological advances in molecular biology. Although many fungal mitogenomes are publicly available, no formal analyses of microsatellites in these sequences have been reported. Thus, the aims of this study were (1) to reveal various facets of the distribution and dynamics of microsatellites in the mitogenomes of phytopathogenic fungi and (2) to construct the phylogenetic relationships between them. To accomplish this, an in-silico approach was used to analyze the frequency and distribution of microsatellites in the genic and intergenic regions of the genomes.

Sources of mitochondrial genomic sequences
Twenty-one mitochondrial genome sequences of different phytopathogenic fungi were retrieved from the National Center for Biotechnology Information and the Broad Institute (www.broadinstitute.org), Table 1 (see supplementary material). Files were obtained from the respective mitochondrial genome of each organism in FASTA format. Of the 19 fungal species taken, ten belong to the phylum Ascomycota, seven belong to Basidiomycota and the remaining two were from Chytridiomycota and Zygomycota. The frequency of repeat motifs in the genic and intergenic regions was analyzed using the WebSat online software [9], which is accessible over the internet and requires no program installation. Repeats of at least twelve bases were considered SSRs, which means a minimum of twelve occurrences for mononucleotide repeats, six for dinucleotide repeats, four for trinucleotide repeats, and three for tetra-, penta- and hexanucleotide repeats. All SSRs were analyzed for their frequency of occurrence and relative abundance. Relative abundance was calculated as SSRs per kb of sequence.
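The repeat thresholds and the relative-abundance measure described above can be illustrated with a minimal Python sketch. This is not the WebSat algorithm: the names MIN_REPEATS, find_ssrs and relative_abundance are hypothetical, only perfect (uninterrupted) repeats are considered, and hits are reported left to right without overlap, all of which are assumptions made for illustration.

```python
# Minimum repeat counts per motif length, taken from the thresholds in the text:
# mono >= 12, di >= 6, tri >= 4, tetra/penta/hexa >= 3.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq):
    """Scan left to right and return non-overlapping perfect SSRs
    as a list of (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for this motif length
            if not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            # Count how many times the motif repeats in tandem from position i.
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= MIN_REPEATS[k]:
                # Keep the candidate covering the longest stretch of sequence.
                if best is None or count * k > best[2] * len(best[1]):
                    best = (i, motif, count)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # jump past the reported repeat
        else:
            i += 1
    return hits


def relative_abundance(seq):
    """Number of SSRs per kb of sequence."""
    return len(find_ssrs(seq)) / (len(seq) / 1000.0)
```

In practice the same functions would be applied separately to genic and intergenic coordinates of each mitogenome; how those coordinates are obtained is outside the scope of this sketch.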

Statistical analysis
A binary data matrix was generated on the basis of the presence or absence of repeat motifs in each mitochondrial gene and analyzed with the SIMQUAL routine of NTSYS-PC (version 2.1) to compute Jaccard's similarity coefficients. These similarity coefficients were used to construct a dendrogram depicting the genetic relationships among the species by employing the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm and SAHN clustering. The robustness of the dendrogram was evaluated by bootstrap analysis of the binary dataset using WINBOOT software (version 2.0).
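The clustering step can be sketched in plain Python as follows. This is only an illustration of Jaccard similarity and average-linkage (UPGMA) clustering, not the NTSYS-PC or WINBOOT implementation; the function names and the input layout (one 0/1 vector per species, one position per gene) are assumptions, and the bootstrap step is omitted.

```python
def jaccard(a, b):
    """Jaccard similarity between two equal-length presence/absence (0/1) vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def upgma(names, profiles):
    """Average-linkage clustering on Jaccard distances (1 - similarity).

    Returns (tree, merges): tree is a nested tuple of species names and
    merges lists (left_subtree, right_subtree, distance) in merge order."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {}
    for i in range(n):
        for j in range(i + 1, n):
            dist[(i, j)] = 1.0 - jaccard(profiles[i], profiles[j])
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of clusters (first pair wins on ties).
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances (the UPGMA update).
        updated = {}
        for c in clusters:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            updated[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c, v in updated.items():
            dist[(c, next_id)] = v  # c is always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, d))
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, merges
```

For example, calling upgma(["sp_A", "sp_B"], [[1, 1, 0], [1, 0, 0]]) with two hypothetical species joins them at a Jaccard distance of 0.5.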

Evaluation of polymorphism
The Polymorphism Information Content (PIC) was measured as described by Botstein et al. [10]. PIC is defined as the probability that two randomly chosen copies of a gene within a population will be different alleles. The PIC value was calculated with the formula described in the supplementary material.
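Because the exact formula is given only in the supplementary material, the following Python sketch assumes the form commonly attributed to Botstein et al., PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i is the frequency of allele i. The function name pic and the choice to accept either counts or frequencies are illustrative assumptions, and the formula actually used in the study may differ.

```python
from itertools import combinations


def pic(allele_values):
    """Polymorphism Information Content for one locus.

    allele_values may be allele counts or frequencies; they are
    normalized to frequencies that sum to 1 before use.
    Assumed formula: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    total = float(sum(allele_values))
    if total <= 0:
        raise ValueError("allele counts or frequencies must sum to a positive value")
    p = [v / total for v in allele_values]
    homozygosity = sum(x * x for x in p)
    # Correction term for pairs of alleles (uninformative matings in Botstein's derivation).
    cross = sum(2.0 * (x * x) * (y * y) for x, y in combinations(p, 2))
    return 1.0 - homozygosity - cross
```

With two equally frequent alleles, pic([0.5, 0.5]) gives 1 - 0.5 - 0.125 = 0.375, and a locus with a single allele gives 0.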

Discussion:
Recent technological advances in genome sequencing have accelerated the decoding of the genetic architecture of organisms. This has generated a huge amount of sequence data from various organisms, which can be exploited for evolutionary and comparative genomics, including phylogenomics.
In the present study, we estimated the number and frequency of SSRs in the mitogenomes of 19 phytopathogenic fungi. The relative abundance of SSRs varied across species without any consistent relationship to genome size. For example, two Sordariomycetes fungi, F. graminearum and F. oxysporum, which differ greatly in genome size, showed similar relative abundances, whereas F. verticillioides showed a higher value. Karaoglu et al. [11] observed the same trend when comparing SSRs in the nuclear genomes of fungi. The reason for this uneven distribution of SSRs may lie in the mechanism by which SSRs are generated. Two models have been proposed to explain microsatellite generation: replication slippage and unequal recombination. Replication slippage generally involves DNA polymerase pausing and dissociation [12]. Differences in mutability and bias in the repair efficiency of the mismatch-repair system could also lead to over-representation of SSRs in certain genomes [13]. When comparing the different repeat classes of SSRs, we observed that trinucleotide repeats were the most abundant class in both genic and intergenic regions, with the motif AAT/TTA occurring frequently (Figure 1). In genic regions this is a common trend in other reports, because trinucleotide SSRs in coding regions are translated into amino-acid repeats, which possibly contribute to the biological function of the protein [4,5]. However, a high frequency of trinucleotide SSRs in intergenic regions is rare. Rajendrakumar et al. [14] observed an abundance of dinucleotide SSRs in the genic and intergenic regions of the mitochondrial genome of rice. Our results show that a high frequency of trinucleotide repeats is a characteristic feature of the mitogenomes of phytopathogenic fungi. The abundance of the motif AAT/TTA may be attributed to the nucleotide composition of the genome.
It has been proposed that the base usage of motifs is significantly correlated with the nucleotide composition of the genome: the percentage of AT-rich motifs rises as the AT content of the genome increases [15]. The longest repeat in the intergenic region was a mononucleotide (C) repeated 23 times in P. nodorum. Similarly, the longest repeat in the genic region was also a mononucleotide (T), repeated 17 times in P. sojae. We observed that SSRs in the genic and intergenic regions of the mitogenomes had low repeat numbers (4-5 U). A possible reason for this is that SSRs generally arise through slippage, and slippage can begin even at low repeat numbers. Since mitogenomes are generally conserved, it is likely that SSRs first arise by chance substitutions that create a short repetitive sequence, which can then undergo slippage if it is above some threshold size [16]. Low repeat numbers are also evident in the nuclear genomes of fungi, where short repeats (5-7 U) predominate, accounting for around 90% of all motifs [11,17]. Cross-genera conservation of motifs in a particular gene was also observed in our analysis; e.g., the motif (tat)5 was conserved in the cox1 gene of five species. This conservation is expected because the cox1 gene is slightly more conserved than other genes and has been proposed as an appropriate region for genetic barcoding of species [18]. In our study, the presence of SSRs in specific genes was used to establish the genetic relationships among different species (Figure 2). The result indicates that a set of genes, coupled with the presence of SSRs, was capable of resolving the genetic relationships. Protein-coding genes (cox3 and nad7) were found to be the most informative, whereas the ssRNA and lsRNA genes were the least informative. Mitochondrial cytochrome genes are also known to perform well in establishing phylogenies in vertebrates [19]. A possible explanation for the poor performance of the RNA genes lies in their higher variability.
We observed that fungi from Ascomycota showed greater proximity to Oomycota, whereas Zygomycetes were close to Basidiomycetes. This is expected because we considered only the presence of repeats in a particular gene when estimating phylogenetic relationships. Finer resolution could be obtained by considering the presence of a particular motif or repeat number; however, this would lead to incongruent results.

Conclusion:
This study identified the pattern of distribution of microsatellites in the genic and intergenic regions of 19 mitogenomes of phytopathogenic fungi and established the relationships between individual genomes. Most of the repeat motifs were confined to the intergenic regions; hence, the occurrence of SSRs is non-random in mitogenomes. Unlike in nuclear genomes, trinucleotide repeat motifs were found to be the most abundant class of repeats in the intergenic regions. Genetic relationships can be inferred on the basis of SSRs present in mitochondrial protein-coding genes. To the best of our knowledge, this is the first extensive study of SSR dynamics in the mitogenomes of phytopathogenic fungi.