Classifying glycerol dehydratase by its functional residues and purifying selection in its evolution.

Glycerol dehydratase (GD) catalyses the conversion of glycerol to 3-hydroxypropionaldehyde (3-HPA), the first step in the microbial conversion of glycerol to 1,3-propanediol. GD has been functionally characterised and two main groups have been described, one vitamin B(12)-dependent and the other B(12)-independent. Here, GD evolutionary history is described and an exhaustive analysis is made to detect the functional residues responsible for type I divergence. The GD phylogenetic tree topology was statistically robust and the data indicated strong purifying selection operating on the GD proteins within it. Two clades were identified, one for the vitamin B(12)-dependent class and the other for the B(12)-independent class. The ancient hot-spot residues responsible for protein divergence in each clade were also identified. The basic evolutionary biology of GD proteins has thus been described, opening the way for rational mutagenesis studies.


Background:
Interest in glycerol dehydratase (GD; EC 4.2.1.30) has increased beyond academic circles in the past few years because of its role in the fermentation pathway for producing industrial 1,3-propanediol (1,3-PD). Two kinds of GD have been characterised to date. The first catalyses glycerol conversion to 3-hydroxypropionaldehyde via a radical mechanism depending on the extensively studied 5'-deoxyadenosylcobalamin (coenzyme B12) [1]; the other performs the same function but is B12-independent. Both enzymes belong to the new radical SAM superfamily of proteins, which has been identified in all kingdoms of life and has been shown to catalyse a diverse array of chemical reactions of significant medical and biotechnological importance. GDs belong specifically to the lyase family that cleaves carbon-oxygen bonds [2].
The cofactors required for this common activation mechanism are a [4Fe-4S]+ cluster (three Fe2+ ions and one Fe3+ ion) and S-adenosylmethionine (SAM). Glycerol dehydratase is a key enzyme in the dihydroxyacetone (DHA) pathway [3]. According to Raynaud et al., the C. butyricum enzyme shares its highest identity (47%) with E. coli pyruvate formate lyase (PFL), specifically in the C-terminal domain (the radical loop). Its overall structure is a β/α barrel that carries its catalytic properties. The B12-independent enzyme is a monomer that assembles into a functional dimer [4], whereas the B12-dependent enzyme exists as a dimer of αβγ heterotrimers. The α subunit corresponds to the β/α barrel [5].
Neither the basic evolutionary biology of this class of protein nor the type of residues considered to be evolutionary hot spots has been deduced to date. This study examined GD molecular evolutionary history to determine whether the evolutionary process has been responsible for the high degree of sequence conservation. Different methodological approaches were used for analysing synonymous (pS) and nonsynonymous (pN) changes in 31 GD sequences. PRATT software was used for predicting the GD motif signature and the Evolutionary Trace server was used for determining evolutionary traces for the GD protein. The specific amino acids responsible for selective restriction and for the phylogenetic divergence of this protein were then identified. DIVERGE 1.0 software was used for evaluating all protein sequences.

Methodology:
Sequences:
An exhaustive search was made in the GenBank, EMBL and Swiss-Prot databases for GD nucleotide and protein sequences. The search was optimised with BLAST, PSI-BLAST and WU-BLAST software [6], using the Clostridium butyricum protein sequence as the query (accession number ABX56860.2). A total of 103 hits were obtained and then filtered by removing partial and redundant sequences. Complete protein sequences were included by strain; the final working population consisted of 31 complete protein sequences. SMART software was used for scrutinising all sequences in the search for typical GD protein domains [7]. GD crystal structures were downloaded from the PDB database; the 1r9d structure [4] was used as the template for divergent functional residue analysis.
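The filtering step described above can be illustrated with a minimal sketch. The function name `filter_hits`, the length-based definition of a "partial" sequence and the 0.9 threshold are assumptions made for illustration only; the criteria actually applied in the study are not stated.

```python
def filter_hits(records, min_fraction=0.9):
    """Drop partial and redundant sequences from a list of (identifier, sequence) hits.

    Assumption: a hit is treated as partial when it is shorter than
    `min_fraction` of the median hit length (a hypothetical criterion).
    """
    # Normalise case and strip alignment gaps before comparing sequences.
    cleaned = [(ident, seq.upper().replace("-", "")) for ident, seq in records]
    if not cleaned:
        return []

    lengths = sorted(len(seq) for _, seq in cleaned)
    median_length = lengths[len(lengths) // 2]

    seen = set()
    kept = []
    for ident, seq in cleaned:
        if len(seq) < min_fraction * median_length:
            continue  # partial sequence
        if seq in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(seq)
        kept.append((ident, seq))
    return kept
```

Exact-match deduplication is the simplest possible choice; clustering at an identity threshold would be a stricter alternative.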

Alignment and phylogenetic reconstruction:
MUSCLE software [8] was used with default parameters for gene and protein alignment of the 31 previously collected sequences. dS and dN percentage changes were computed using a modified version of the Nei-Gojobori method; the Tajima test was calculated using MEGA 4.0 software and the SNAP server [9]. A combined strategy was used for phylogenetic analysis. First, the neighbour-joining (NJ) method was used for phylogenetic reconstruction, with p-distance as the distance model [10]. Statistical robustness was assessed with 5,000 bootstrap replicates. MEGA 4.0 software was used throughout [11].
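For reference, the p-distance used as the distance model is simply the proportion of compared sites at which two aligned sequences differ. The sketch below is illustrative and is not the MEGA implementation; the function names and the pairwise deletion of gapped sites are assumptions.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites with a gap in either sequence are skipped
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping sequence name to aligned sequence."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }
```

A matrix of this kind is the input that a neighbour-joining procedure would cluster into a tree.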
Second, the alignment was analysed using ProtTest [12] to determine the protein evolution model best fitting the GD sequence alignment. Phylogenetic analysis was then performed with PhyML 3.0.1 [13], using 1,000 bootstrap replicates. The phylogenetic tree was visualised using NJplot software [14]. The best tree topology is shown.
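The bootstrap procedure used to assess tree robustness resamples alignment columns with replacement and rebuilds a tree from each pseudo-replicate. A minimal sketch of the resampling step alone is given below; the function names and the fixed seed are illustrative assumptions, and tree building itself is left to the phylogenetic software.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping columns intact."""
    n_cols = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every sequence.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate `n_replicates` pseudo-alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

The bootstrap value of a branch is then the fraction of replicate trees in which that branch is recovered.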

Analysing type I functional residues:
A conceptual statistical framework for modelling functional divergence was used to estimate the coefficient of functional divergence (θ) as an indicator of the level of type I functional divergence. GD protein alignments were used for determining divergence points (DIVERGE 1.0 software) [16].

Discussion:
Glycerol conversion to 1,3-PD involves a coenzyme B12-dependent glycerol dehydratase [5]. However, one report described that Clostridium butyricum VPI1718 glycerol dehydratase (extracted from 1,3-PD-producing cells) was not stimulated by coenzyme B12 and was extremely oxygen sensitive, suggesting that it might be a B12-independent enzyme [4]. It seems that the B12-dependent and B12-independent enzymes are encoded by orthologous genes which have evolved in separate lines; however, β/α barrel homology indicates an ancestral relationship.
GD evolution is characterised by ancient gene duplications (supported by high basal bootstrap values) followed by bifurcation with long branches, indicating independent evolution for each clade. Although similar tree branching was observed with both strategies (see methodology), the second seemed to be the more parsimonious because it required fewer steps to reproduce the topology with a good bootstrap value (Figure 1A, B). Interestingly, the longer basal branches of the tree (1,756 for B12-independent and 2,175 for B12-dependent nodes) indicated a deep common ancestor even though each current GD clade has its own evolutionary mode. This hypothesis is supported by structural analysis of both enzymatic types, in which the B12-dependent type has additional chains (unlike the B12-independent type). JTT+γ was the evolutionary model which best fitted our protein sequences [17]; this model was not calculated by MEGA 4.0 but is the default in Phyml 1.0 software. This strategy proved effective in predicting the best model for GD evolution.
Several approaches were applied for testing natural selection. The results suggested that the dS level was higher than dN (Table 1, see supplementary material). A probability of 0.000 was obtained in the Z-test (dS-dN=3.538). The Tajima D value was 4.857740 and dS/dN was 1.3432 on the SNAP server. This suggested that a birth-and-death model subject to strong purifying selection best fitted GD protein evolution.
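As an illustration of how such a Z-test relates to its reported probability, the sketch below computes the statistic from dS, dN and their variances and converts it to a one-tailed P value under a standard normal distribution. The function names are hypothetical, and treating the reported value 3.538 as the Z statistic is an assumption; this is not the MEGA implementation.

```python
import math


def z_statistic(d_s, d_n, var_d_s, var_d_n):
    """Z = (dS - dN) / sqrt(Var(dS) + Var(dN))."""
    return (d_s - d_n) / math.sqrt(var_d_s + var_d_n)


def one_tailed_p(z):
    """Upper-tail probability of a standard normal variate exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumption: 3.538 is the Z statistic reported for the dS > dN test.
p_value = one_tailed_p(3.538)
print(f"one-tailed P = {p_value:.6f}")
```

Under this assumption the resulting P value is below 0.001, which would be displayed as 0.000 at three decimal places.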
Such a combination has thus favoured the polymorphism best suited to each niche, explored according to species. This indicated that GD genes have been present in the bacterial genome for a long time. It also suggested that GD was a target of natural selection and thereby contributed to the divergence of these kinds of bacterial species. It is possible that the GD protein belongs to the radical SAM superfamily, but the BLAST results suggested that it fits better with the RNR-PFL superfamily (data not shown). Such enzymes are strictly anaerobic (like GD) and it has been further suggested that the diversity of chemical reactions catalysed by this class of protein exceeds that of reactions catalysed by B12 [18]. Glycerol is the primary substrate of GD, but the enzyme acts on a wide variety of substrates according to its evolutionary mode. GD displayed a broad substrate spectrum in this work. GD can catalyse 1,2-ethanediol → acetaldehyde + H2O, 1,2-propanediol → propionaldehyde + H2O [19] and ethylene glycol → acetaldehyde + H2O [20], and it may have a plethora of substrates which have not yet been discovered.
Several residues have been identified as important for GD function. The GD radical has been located at Gly763 within the clostridial Gly-radical domain (identified as the site of free radical formation), with Cys433 located near it. Binding of glycerol and 1,2-propanediol in the active site is mediated by residues H281, H164, S282, D447, E435, Y640, C433 and Y339 [1-4]. R782 may be important for functional contact between GD and its reactivase protein [4].

Conclusion:
On the one hand, GD protein evolution can be clearly explained by birth-and-death evolution under purifying selection, which opens the way for future mutagenesis studies aimed at improving enzymatic activity based on the traces identified here. On the other hand, it is important to develop non-conventional data mining strategies for the optimal identification of RNR-PFL protein family members in the databases.