Classification of DNA sequences based on thermal melting profiles.

A new classification scheme based on the melting profile of DNA sequences simulated thermal melting profiles. This method was applied in the classification of (a) several species of mammalian - β globin and (b) α-chain class II MHC genes. Comparison of the thermal melting profile with the molecular phylogenetic trees constructed using the sequences shows that the melting temperature based approach is able to reproduce most of the major features of the sequence based evolutionary tree. Melting profile method takes into account the inherent structure and dynamics of the DNA molecule, does not require sequence alignment prior to tree construction, and provides a means to verify the results experimentally. Therefore our results show that melting profile based classification of DNA sequences could be a useful tool for sequence analysis.


Background:
The DNA double helix has more information built into its structure, that are both local such as variation in base pairing interactions and stacking interactions, as well as long range such as dynamic superhelical stress [1-3]. These interactions are responsible for the physical chemistry of the sequences and they are reflected in the thermodynamic properties such as the melting temperature. Therefore, sequences that have a high homology are expected to have similar thermodynamic parameters such as the melting temperature (Tm). Melting of a DNA molecule involves the denaturation the double-stranded DNA molecule into two single strands and it is the reverse process of hybridization. The denaturation process can be affected by many means such as an increase in temperature or denaturant concentrations [4]. The melting curve represents the denaturation process as a function of increasing sample temperature. Experimentally, the melting profile can be monitored by optical techniques such as absorption and fluorescence microscopy as the interactions among stacked bases cause a decrease in UV absorption. Melting of double-stranded DNA at elevated temperatures involves breaking the hydrogen bonds of the base pairs and a decrease of base stacking. This results in an increase in UV absorption, a hyperchromicity, which can be measured with a spectrophotometer [4].
Melting curve analysis has been used in many applications such as the detection of single nucleotide polymorphisms (SNPs) [5-7] and has recently been proposed as an approach to DNA sequencing [8]. DNA melting profile analysis has also been used in many clinical research applications [9]; these include genotyping [10-13], mutation scanning [14,15] and simultaneous genotyping and mutation scanning [16][17][18]. Experiments based on melting profiles have also been used as a rapid, economical means of screening close relatives for transplant compatibility [19]. The melting behavior and melting temperature of oligonucleotides can be predicted by a wide range of thermodynamic models which assumes that the stability of a DNA duplex depends on the identity and orientation of the neighboring base pairs [5-7, 20-25]. The idea of using the thermal stability, in particular the melting temperature to differentiate between DNA sequences was originally suggested in the pioneering work by King and Wilson [26] and later followed by others [26][27][28][29]. King and Wilson used the nucleic acid hybridization melting temperature to quantify the resemblance between human and chimpanzee genes [26]. The difference in melting temperature between the reannealed human DNA and human-chimpanzee hybrid DNA is about 1.1° C, and that to the sibling species of Drosophila, congeneric species of mice and congeneric species of Drisophila are 3° C, 5° C and 19° C, respectively. Higher the difference in the Tm, larger the evolutionary distance between the species. As longer DNA sequences tend to have several localized melting events [30,31], the melting profile of the DNA sequence has additional information [9]. In one of the early works, Schmid and Marks [28] demonstrated the use of DNA hybridization as a guide to phylogeny using a model system of heteroduplexes formed between human β globin CDNA and four β -like globin genes isolated from a different species (chimpanzee).
In this study, we present a simple method for classifying the nucleotide sequences using simulated melting profiles. We demonstrate the utility of this method in β -globin and gene clusters of MHC class II α-chain proteins across multiple mammalian species. Comparison of the melting profile generated phylogenetics with that of the conventional sequence based approach reproduces many of the major features, but do not show a perfect match. The major advantage of this method is that it provides a way to verify phylogenetics constructed only from the DNA sequence and the molecular evolutionary process experimentally.

Methodology:
The first exon of the β -globin gene is often used as a standard example in many DNA-based graphing methods [32] is used as the first example. The gene family of β -globin varies between 86-105 bases and has a significant biological role in oxygen transport. βglobin gene across the 11 species is listed in

Results:
The gene family of β-globin across the 11 species are listed in Table 1 along with the percentage of the various bases, GC content as well the estimated Tm, for each sequence. Figure 1 shows the simulated melting profile using the program MELTSIM of the denaturation process (θ vs. temperature) (in Figure.1a) for all the 11 sequences and its first derivative dθ/dT (in Figure. 1b). The population value 0.5 and -0.5 represents the double-helical and denatured (single strand) DNAs, respectively and the melting temperature (Tm) is defined to be where these populations are equal.
The first derivative of the melting profile is more illustrative of the process of denaturation, as the melting temperature is represented as a peak (dθ/dT). Each derivative profile shows peak value that corresponds to the Tm and additional features such the width, shape and other low intensity peaks are manifestations of the sequence composition [41,42]. As β-globin is represented by a relatively small number of bases such distinct features are not pronounced, except for the opossum (highlighted with dark curve in Figure 1). The DNA sequence of opossum has the lowest of the GC content of the sequences and consequently has the lowest Tm [43]. Melting profile based phylogenetics of β -globin is shown in Figure 2a, while the similar profile using the sequence is shown in Figure 2b. Most of the tree structures are reproduced in the melting profile based phylogenetics, with few notable differences. In both approaches the sequence of Opossum is clearly differentiated and the relative distances between Rat, Lemur and Gallus are also reproduced. Sequence based approach shows the cluster of Human, Chimpanzee and Gorilla, while the melting profile based approach keeps only the cluster of Chimpanzee and Gorilla together.  Figure 3a shows the plot of the profiles and the change in the Tm is shown in Figure 3b. Increase in salt concentration increases Tm (70°C at 0.01M to 98°C at 0.5M) and follows the empirical relationship between Tm vs. log(Concentration) [44]. Increasing the salt concentration also drastically changes the melting profile such as drastic loss of fine features that represent local melting events. Both melting profile and the Tm are important to differentiate the sequences in determining the phylogenetics and therefore optimal concentration of the salt is expected to be critical. Figure 4 show the melting profiles and molecular phylogenetics, respectively for the genes of class II MHC α-chain genes [33]. The origins and divergence times of mammalian class II MHC genes have been studied in detail by Takahashi et al [33] and the data presented here is one of the subgroups of gene clusters studied elsewhere ( [33]). The sequence lengths of class II MHC α-chain genes are much larger than that of β -globin. Total of 31 different sequences and sequence length close to 612 for most sequences (see supporting material for the list genetic sequences) were used. As expected the melting profiles of these genes are complex suggesting the presence of additional features that could be useful to differentiate one sequence from the other. Original sequence based analysis of class II MHC α-chain genes showed four sequences (Zebra fish A1, A2 and A3 and shark) belong to an out-group and the melting temperature analysis (Fig. 4) also reproduces the same result. Mammalian class II MHC genes are clustered into four major groups, DRA, DPA, DQA and DNA [33]. Melting temperature based phylogeny is able to reproduce three of the four clusters only with overlaps between the closer clusters DQA and DPA.   Salt concentration is expected to play a significant role on the applicability of DNA melting based differentiation between the sequences (Figure 2). The hypochromicity of DNA, responsible for the denaturing of the double helix is explained in terms of the interaction of the bases when they are stacked in the double helical array as delineated by Watson -Crick Model [49][50][51]. Semiquantitative models developed in the 1960s continue to provide significant insight to the melting profiles of DNAs [44,52]. Another major variable generating the melting profiles of the DNA sequences is the choice of the simulation program. In this work we have used the one of the widely used method, MELTSIM (materials and methods). For short oligonucleotides (16-32 bases), Panjkovich and Melo [53] performed an extensive comparison of the various methods. In their study it was noted that large and significant differences in the estimations of Tm were obtained while using different methods and no conclusive recommendations were provided on the choice of simulation methods to determine the melting profiles or its accuracy. Here, we are suggesting an approach to classify DNA sequences has potential implication for sequence analysis; DNA sequences could be classified purely from the experimental melting profiles and sequence information is not mandatory as it depends. This method is expected to find wider applications once the sensitivity of the results is established by experiments.