SSR repeat dynamics in mitochondrial genomes of five domestic animal species.

Simple sequence repeats (SSRs) are abundant in genomes. In the mitochondrial genomes of animals, their distribution, size dynamics and usefulness for inferring phylogenetic relationships are not well understood. The present investigation examines the organisation of SSRs in genic and intergenic regions, their length and repeat motif dynamics, the extent of conservation of their flanking regions, and the appropriateness of SSR data for establishing phylogenetic relationships. Contrary to the abundance of SSRs in non-coding regions of eukaryotic nuclear genomes, we found them to be more abundant in coding regions. As with nuclear SSRs, the most hypermutable repeats in the non-coding regions of the mitochondrial genome had dinucleotide motifs; however, whereas tetranucleotide repeats are highly mutable in humans, in these mitochondrial genomes high mutability was found in tri-motif repeats. SSRs of mitochondrial genomes also show cyclical expansion and shrinkage over time in a pattern resembling simple harmonic motion (SHM). Because this pattern is non-linear, SSR data are not appropriate for phylogenetic analysis, even though the flanking regions of these SSRs are conserved, as they are for nuclear SSRs.


Background:
The near-absence of genetic recombination and a high mutation rate with some selectivity make mitochondrial DNA a useful source for analyzing the dynamics of microsatellites, also called simple tandem repeats (STRs) or simple sequence repeats (SSRs), on the archaeological time scale. The difference between the mean repeat sizes of two lineages is a linear function of the time since they diverged [1]. Some factors prevent alleles from becoming or staying large, since large alleles cease to behave like microsatellites [2]. There is therefore a maximum threshold of SSR allele size beyond which alleles start shrinking because of background mutation and repair by DNA polymerase I; they shrink to a minimum threshold allele size and then start increasing in size again as a function of time. Repeat length is thus cyclical, elongating and shrinking over evolutionary time. Flanking regions have been found to be extensively conserved across distant taxa. For example, the presence of homologous loci in each marine species tested from two families (Cheloniidae and Dermochelyidae), as well as in a freshwater species (Trachemys scripta, family Emydidae), indicates such constancy over approximately 300 million years of divergent evolution [3]. The flanking region is conserved and passed down through evolution, whereas the SSR locus itself shows cyclical variation in size (shrinkage and expansion) over time, which has been well documented for nuclear SSRs [4].
Although an earlier attempt has been made to relate the major cereal crops (rice, wheat, maize and sorghum) phylogenetically using SSRs of organellar genomes (mitochondria and chloroplast) [5], no such attempt has been made in animals. Unlike for these crops, the relative abundance of repeat motif types and their dynamics across species have not been documented in a comparative study of domestic animals. In advanced eukaryotes these repeats lie almost entirely in intronic (non-coding) regions, but whether the same holds for organellar SSRs is not yet known. Nor is it known whether these genomes have universal flanking regions bracketing SSR loci, or whether a phylogenetic tree can be built from SSR data of organellar genomes as in the crop studies [5]. The cyclical dynamism of repeat length over the archaeological time line has also not been studied in organellar genomes.
The present in silico work takes five domestic animal species (buffalo, cattle, goat, sheep and yak) as a model to investigate the relative abundance of repeat motif types, their distribution in coding and non-coding regions, the appropriateness of mitochondrial SSR data for phylogenetic tree construction, the dynamism of repeat length and motif, and the extent of conservation of flanking regions across loci with respect to time.

Mining of STR from mitochondrial genomes:
To mine microsatellite data, in particular the type of repeat and motif (di-, tri-, tetra-, penta- and hexa-nucleotide), the online programme SSRIT (Simple Sequence Repeat Identification Tool) was used [6] (Table 1 in supplementary material). For a particular repeat motif in each genome, the corresponding alleles in the other genomes were identified by the presence of the same flanking sequence.
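The kind of search this mining step performs can be illustrated with a minimal Python sketch. It is not SSRIT itself: the function name `find_ssrs`, the regular-expression approach and the `min_repeats` threshold are assumptions made for illustration, and only perfect (uninterrupted) di- to hexa-nucleotide repeats are reported.

```python
import re

def find_ssrs(seq, min_repeats=3, motif_sizes=range(2, 7)):
    """Return perfect tandem repeats as sorted (start, motif, repeat_count) tuples.

    Hypothetical helper: min_repeats is an illustrative threshold,
    not a setting taken from SSRIT.
    """
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "CACA" is really "CA").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Toy sequence: a (CA)5 repeat and an (ATG)4 repeat separated by short flanks.
toy = "GGT" + "CA" * 5 + "TTC" + "ATG" * 4 + "C"
print(find_ssrs(toy))
```

The start positions are 0-based. Because the scan is leftmost-first, a repeat can be reported in a shifted frame (for example TGA instead of ATG) when the upstream flank happens to share bases with the motif.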

Designing of universal primers:
SSR regions were identified in all five mitochondrial genomes using FastPCR. A set of universal primers was designed across the flanking regions of the identified SSRs using the online software Primer 3 ([8], http://fokker.wi.mit.edu/primer3/input.htm) (Table 2 in supplementary material). While designing universal primers across species, null alleles (mutations in the 3' end region of a primer) were resolved by fixing one primer and increasing the expected PCR product length until a compatible counterpart primer was found. Before the designed primers were used for ePCR, primer quality was evaluated in silico on three parameters: self-dimer, cross-dimer and self-hairpin loop formation for each primer pair. The delta G values of these evaluations are shown in Table 2 in supplementary material.

ePCR on mitochondrial genomes:
All five mitochondrial genomes analysed were equally eligible to serve as the focal species for cross-species electronic PCR amplification. For primer design, two sets, one each from buffalo and yak (first and last in alphabetical order), were treated as focal-species sets, from which the universal primers were evaluated and further designed in the remaining four 'heterologous' species. The estimated PCR product sizes obtained while designing compatible universal primers in the heterologous species were treated as species-specific allelic data for length polymorphism and microsatellite dynamics.
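How an electronic PCR yields an allele size can be sketched as follows. This is a simplified illustration rather than the tool used in the study: it assumes exact primer matches on a linear top strand, and the function names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def epcr_product_size(template, forward, reverse):
    """Amplicon length for an exact-match primer pair, or None if a primer site is missing."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the top strand
    # its site reads as the reverse complement of the primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start

# Two toy "species" sharing flanks but differing in (CA)n repeat number.
flank_left, flank_right = "TTGACCTGAA", "GGATCCTTAG"
species_a = flank_left + "CA" * 6 + flank_right
species_b = flank_left + "CA" * 4 + flank_right
for template in (species_a, species_b):
    print(epcr_product_size(template, "GACCTGAA", "AAGGATCC"))
```

With conserved flanks, the difference in product size between the two toy templates equals the difference in repeat number times the motif length, which is what lets product sizes stand in for allelic data.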

Microsatellite dynamics analysis across species:
The data generated for both SSR loci (Table 2 in supplementary material) in the five species were used to establish allele size dynamics (shrinkage/expansion) as a function of time (Figure 1). The relationship established for organellar SSRs was compared with the reported dynamism of nuclear SSRs over time [4]. The extent of mitochondrial SSR constancy across the five species was compared to assess the magnitude of allele size dynamism with respect to time.

Discussion:
The in silico mining of SSRs in the mitochondrial genomes of five species reveals that they are more abundant in coding regions, unlike in nuclear genomes, where SSRs are usually present in intronic/non-coding regions [9]. The percentages of SSRs in coding regions of the buffalo, cattle, sheep, goat and yak mitochondrial genomes are 65.28 %, 68.59 %, 65.01 %, 64.40 % and 65.58 %, respectively. On average, SSRs account for 29.66 % of the whole genome in these five species (28.95 %, 29.59 %, 29.45 %, 30.31 % and 30.01 % in buffalo, cattle, sheep, goat and yak, respectively). In contrast to nuclear SSRs, organellar SSRs are thus more abundant in coding than in non-coding regions in all five mitochondrial genomes. Such SSR abundance in coding regions has been reported in lower eukaryotes [10]. This might be because of the prokaryotic origin of mitochondria (endosymbiont hypothesis) or an evolutionary legacy in lower eukaryotes.
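The two kinds of percentage reported here can be reproduced with a short Python sketch. The helper `percent_in_coding` and its toy coordinates are hypothetical; only the five whole-genome percentages are taken from the text.

```python
def percent_in_coding(ssr_starts, coding_intervals):
    """Percentage of SSR start positions inside any coding interval (0-based, end-exclusive)."""
    if not ssr_starts:
        return 0.0
    inside = sum(any(s <= pos < e for s, e in coding_intervals) for pos in ssr_starts)
    return 100.0 * inside / len(ssr_starts)

# Toy coordinates (hypothetical): three of four SSRs start inside a gene.
print(percent_in_coding([5, 50, 120, 300], [(0, 100), (250, 400)]))  # 75.0

# Mean of the whole-genome SSR percentages reported for
# buffalo, cattle, sheep, goat and yak.
reported = [28.95, 29.59, 29.45, 30.31, 30.01]
print(round(sum(reported) / len(reported), 2))  # 29.66
```

Classifying an SSR by its start position is a simplification; a repeat that straddles a gene boundary would need an explicit overlap rule.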
The microsatellites mined from the whole genomes were classified into two classes: class I, containing only mono-repeat motifs, and class II, containing di- to hexa-motif repeats. The density of class II microsatellites was found to be 108 bp-113 bp per kbp in the exonic region and 118 bp-129 bp per kbp in the intronic region. Di-motif repeats are the most abundant in each case (Table 1 in supplementary material). In the coding region, the maximum number of di-motif repeats (754) was found in goat and the minimum (704) in buffalo. In the non-coding region they were less abundant (maximum 396 in goat and minimum 350 in cattle). The least frequent repeats were those with penta- and hexa-motifs. The goat mitochondrial genome had the maximum number of penta- (50) and hexa- (24) motif repeats. The maximum number of tetra-motif repeats (155) was found in sheep and the minimum (126) in cattle. Among the di-motif repeats, (CA)n were abundant, where n varies from 148 to 204. Our data show that hexa- and penta-motif repeats are less abundant, which reflects the selectivity involved in maintaining microsatellites within a certain size range. A similar constraint on repeat number and length over time has been reported in different taxa [10]. The general conclusion from these studies is that there is an exceptionally high rate of mutation adding or subtracting a small number of perfect repeats. In humans, the average overall mutation rate for 28 di- and tetra-nucleotide microsatellites was estimated at about 0.001, with the tetranucleotide repeats significantly more mutable than the dinucleotide repeats. The most popular explanation for the high mutation rate is polymerase slippage [15], a hypothesis that received considerable support from an elegant in vitro analysis showing that polymerase tends to miscopy repeated tracts of DNA [16]. A subset of SSRs, the trinucleotide repeats, plays an important role in eukaryotes because these triplet repeats expand and the rate of mutation depends on the number of tandem units within the repeat; this is the basis of dynamic mutation [17].
The SSR allelic data generated by electronic PCR with the two sets of universal primers show that the flanking regions are well conserved in all five species. Such conservation of SSR flanking regions has been reported over evolutionary periods as long as 300 million years [3]. Although the flanking regions are conserved, the STR loci show different alleles in all five species investigated.
In all five species, the allele size dynamics (shrinkage/expansion) as a function of time (Figure 1) show a simple harmonic motion (SHM) pattern. This pattern is similar to the established dynamism of nuclear SSRs [4]. It arises from distinct mutational processes [18, 19], namely slippage mutation, which involves the addition or subtraction of one repeat unit.

Conclusion:
The present study revealed the relative abundance and motifs of SSRs in the mitochondrial genomes of five domestic species. Contrary to the abundance of SSRs in non-coding regions of eukaryotic nuclear genomes, we found them to be more abundant in coding regions. As with nuclear SSRs, the most hypermutable repeats in the non-coding regions of the mitochondrial genome had dinucleotide motifs; however, whereas tetra-repeats are highly mutable in humans, in these mitochondrial genomes high mutability was found in tri-motif repeats. SSRs of mitochondrial genomes also show cyclical expansion and shrinkage in an SHM pattern over long evolutionary time, which is non-linear. Because of this, SSR data are not appropriate for phylogenetic analysis, even though the flanking regions of these SSRs are conserved, as they are for nuclear SSRs. The SSR lengths obtained are useful for predicting their influence on transcriptional activity in promoter regions. The present work does not take account of intra-species variation, which remains to be verified in wet-lab work with a reasonable sample size, not done so far, to validate these findings further.

Figure 1: Allelic size variation on archaeological time line.(a) primer set I; (b) primer set II.
The mutational differences among di- (Figure 4(A) and Figure 4(B)), tri- (Figure 4(C) and Figure 4(D)) and tetra- (Figure 4(E) and Figure 4(F)) motifs between the five species were compared in both coding and non-coding regions of the genomes. In non-coding regions di-motifs mutated faster, whereas in coding regions tri-motifs had the highest mutation rate (Figure 3(A) and Figure 3(B)). This is contrary to human STRs, where tetra-repeats show hypermutation [11]. Interestingly, in rare cases in human coding regions, triplet repeat slippage gives rise to copy number mutation or dynamic mutation, for example in Fragile X syndrome [12]. Microsatellite mutation processes have been inferred by direct observation both on artificial constructs in yeast [13] and in human pedigrees [14].

www.bioinformation.net Hypothesis ISSN 0973-2063 (online) 0973-8894 (print) Bioinformation 4(4): 158-163 (2009) © 2009 Biomedical Informatics
Evaluation of mitochondrial SSR data for phylogenetic tree construction, based on length polymorphism and structural polymorphism, revealed that the trees do not conform to the established phylogenetic relationship of these five taxa. The rooted (Figure 2(A)) and unrooted (Figure 2(B)) trees based on structural polymorphism and the UPGMA-based rooted tree built from length polymorphism data (allelic length difference) (Figure 2(C)) all failed to conform. A second set of STR data analysed with the same set of trees shows the same unusual pattern (Figure 2(D), Figure 2(E) and Figure 2(F)). Although the UPGMA-based trees of the two data sets differ, neither conforms to the established phylogenetic relationship among the five model species studied. SHM was shown by the SSRs but not by their flanking regions. The cause is that SSRs mutate at a rate of 10^-3/cell/generation, a hypermutation 10^6 times the background rate (with addition of repeats far exceeding deletion of repeats), whereas the flanking regions are subject only to background mutation at a rate of 10^-9/cell/generation, which acts on the SSR region (converting single simple repeats into compound interrupted repeats) as well as on the flanking region [20].
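The UPGMA clustering on allelic length differences referred to above can be sketched in plain Python. The allele sizes below are hypothetical placeholders, not values from Table 2, and the functions `allele_distances` and `upgma` are illustrative; the sketch shows only the mechanics of turning allele sizes into a rooted tree topology.

```python
from itertools import combinations

def allele_distances(sizes):
    """Pairwise absolute allele-size differences, keyed by frozenset of the two names."""
    return {frozenset((a, b)): abs(sizes[a] - sizes[b]) for a, b in combinations(sizes, 2)}

def upgma(names, dist):
    """UPGMA clustering; returns the tree topology as nested tuples (branch lengths omitted)."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, number of leaves)
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:
            # Distance to the merged cluster is the leaf-weighted average.
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]

# Hypothetical allele sizes in bp, for illustration only.
sizes = {"buffalo": 200, "cattle": 204, "goat": 230, "sheep": 232, "yak": 205}
print(upgma(list(sizes), allele_distances(sizes)))
```

Because allele size is taken to cycle over time, two distant lineages can share a similar size by chance; a tree built this way groups taxa by current allele length rather than by descent, which is consistent with the non-conformity described above.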

Figure 2: Rooted and unrooted trees based on structural and length polymorphism for primer sets I and II.

Figure 4: Di-motif repeat variation on archaeological time line (a: coding region; b: non-coding region); tri-motif repeat variation on archaeological time line (a: coding region; b: non-coding region); tetra-motif repeat variation on archaeological time line (a: coding region; b: non-coding region).
(Second set of primer) T A = Annealing temperature; AS = Allele size; Delta G value = calculated value (worst case) for forward primer, calculated value (worst case) for reverse primer.