Evolutionary trace analysis of plant haemoglobins: implications for site-directed mutagenesis

Haemoglobins are found ubiquitously in eukaryotes and in many bacteria. In plants, haemoglobins were first identified in species that can fix nitrogen via symbiosis with bacteria. Recent findings suggest that another class of haemoglobins, termed non-symbiotic haemoglobins, is present throughout the plant kingdom and is expressed differentially during plant development. The limited data available suggest that non-symbiotic haemoglobins are involved in the response to hypoxic stress and to oversupply of nutrients. Because little is known about the structurally conserved, functionally important residues in non-symbiotic haemoglobins, further studies to elucidate the molecular mechanisms underlying their biological role are hampered. To determine functionally important residues in non-symbiotic haemoglobins, I have analyzed a number of sequences from the plant haemoglobin family, in the context of the known crystal structures of plant haemoglobins, by the evolutionary trace method. The results indicate that the evolutionary trace method, like conventional phylogenetic analysis, could resolve phylogenetic relationships within the plant haemoglobin family. Evolutionary trace analysis identified candidate functional (trace) residues that uniquely characterize the heme-binding pocket, the dimer interface and possible novel functional surfaces. Such residues form specific three-dimensional clusters and might be of functional importance in non-symbiotic haemoglobins. These data, together with our improved knowledge of possible functional residues, can be used in future structure-function analysis experiments.

Though the isolation of haemoglobin from Parasponia andersonii, a non-leguminous plant, led to the hypothesis that the presence of haemoglobin may extend beyond legumes, more concrete evidence came with the discovery of barley haemoglobin and the subsequent demonstration that haemoglobin was present in a number of other cereals, such as maize, wheat, rye and triticale. [4] To date, non-symbiotic haemoglobin genes have been identified in both nitrogen-fixing and non-nitrogen-fixing dicot species and in monocot species. Non-symbiotic haemoglobins differ markedly from the symbiotic haemoglobins in gene sequence. Current studies indicate that most plants have two or more different non-symbiotic haemoglobin genes that are individually regulated. The function of non-symbiotic haemoglobins in plants is the subject of much current research. These proteins are expressed at low levels and have expression patterns different from those of the symbiotic haemoglobins. Based on the predicted oxygen-binding kinetics and probable concentration of the non-symbiotic haemoglobins, Appleby et al. (1988) first suggested that one possible role of the non-symbiotic Hbs could be to sense oxygen levels. [5] Further work indicates that they are expressed differentially, with patterns that are complex and vary from tissue to tissue, over time and in response to different stressors, which has led to several speculative functions for non-symbiotic haemoglobins. Suggested roles include oxygen storage and transport, detoxification of nitric oxide and/or other reactive oxygen species, and sensing of oxygen or other small heme ligands. Current research shows that plant non-symbiotic haemoglobins are involved in nitric oxide signaling.

Plants have a general response to many environmental stresses, including pathogen attack, wounding and adverse environmental conditions. In response to pathogens, plants also produce an oxidative burst composed of superoxide, hydrogen peroxide and other reactive oxygen species, whose production involves nitric oxide. In addition, the expression of plant non-symbiotic haemoglobins is stimulated by somatic embryogenesis, germination and sucrose, each of which turns on the plant disease resistance pathway, generating nitric oxide and other reactive oxygen species. These proteins cannot facilitate oxygen transport because their oxygen affinities are too high and their dissociation kinetics are too slow. Furthermore, it is now evident that, unlike the symbiotic haemoglobins, which show reversible oxygen binding, the non-symbiotic proteins display highly stable oxygen binding and are unlikely to act as oxygen carriers. Hence the physiological function of non-symbiotic haemoglobins in plants must involve something other than traditional oxygen storage and transport, and their exact physiological role remains unclear.

www.bioinformation.net Bioinformation, an open access forum © 2007 Biomedical Informatics Publishing Group

Hypothesis
Although the details of leghaemoglobin structure and function have been extensively studied, the more recently discovered non-symbiotic plant haemoglobins are still a mystery. The most unusual feature of non-symbiotic haemoglobins is hexacoordination of the heme group. This differs markedly from 'traditional' pentacoordinate haemoglobins, with their open binding site for exogenous ligands. Instead, the heme group in non-symbiotic haemoglobins is coordinated by two histidines, as in bovine cytochrome b5. The molecular details of this binding reaction are currently one of the principal questions about hexacoordination in haemoglobins. Although such hexacoordination was first observed in the plant non-symbiotic haemoglobins, it has now also been identified in haemoglobins from the photosynthetic microorganisms Synechocystis and Chlamydomonas, which are members of the 'truncated hemoglobin' family. This family contains both pentacoordinate and hexacoordinate haemoglobins that are phylogenetically and structurally different from other haemoglobins and are common to many bacteria and protists. Truncated Hbs (tHbs) are short versions of the globin fold. A putative member of the truncated haemoglobins was recently identified in Arabidopsis. [6] However, unlike bovine cytochrome b5, which is unreactive towards oxygen and other gaseous ligands, non-symbiotic haemoglobins bind exogenous ligands rapidly and with high affinity. Because hexacoordinate haemoglobins are upregulated by similar conditions in both plants and animals, there might be a common function for these proteins that involves nitric oxide, reactive oxygen species and hypoxia. Owing to their diverse spatio-temporal expression patterns, the biggest challenge now is to decipher the molecular function of these proteins. [7]

Several attempts have been made to decipher the exact function of haemoglobins in plants. Nakajima et al. used computational methods such as the average distance map (ADM) method to predict the folding kinetics of selected plant haemoglobins. [8] In order to identify the function of various amino acid residues, site-directed mutagenesis experiments were also performed on the conserved Phe40 (B10), which is close to His61 (E7), from rice Hb1. [9] However, the impact of a large number of single or cumulative mutations on the function of non-symbiotic haemoglobins remains to be tested. Since mutating all residues randomly would consume an enormous amount of time and resources, ways of narrowing down the choice of possible mutational targets must be sought.

In this paper I discuss the use of the evolutionary trace (ET) method to identify potential targets for mutagenesis in plant haemoglobins, with the aim of finding specificity determinants for Lbs and nsHbs from plants. Evolutionary trace, developed by Olivier Lichtarge, exploits the fact that residues important to the structure or function of a protein strongly tend to be conserved across species. [10] Briefly, the method uses a multiple sequence alignment of a protein family to generate a 'trace'. It compares the consensus sequences of groups of proteins that originate from a common node in the phylogenetic tree and are characterized by a common evolutionary time cutoff (ETC), and it classifies each residue as one of three types: absolutely conserved, class-specific or neutral. Here, 'class-specific' denotes residues occupying a strictly conserved location in the sequence alignment, but differing in the nature of their conservation between various subgroups. The trace residues identified can then be mapped onto known protein structures, allowing us to identify clusters of important amino acids and to distinguish between buried and exposed residues. Depending on the ETC value for which a trace is generated, it is possible to maximize the specificity of the analysis over its sensitivity and vice versa. Thus ET analysis allows for a wide range of 'functional resolution'.
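The three residue types described above can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned and grouped into classes for a single partition. The function name `classify_columns`, the class labels and the four-residue toy alignment are hypothetical, chosen only for illustration; they are not taken from the ET server or from the haemoglobin alignment analyzed here.

```python
def classify_columns(classes):
    """Label each alignment column as 'conserved', 'class-specific' or 'neutral'.

    classes: dict mapping a class name to a list of aligned sequences
             (all sequences must have the same length; '-' marks a gap).
    """
    all_seqs = [seq for seqs in classes.values() for seq in seqs]
    if not all_seqs:
        raise ValueError("no sequences supplied")
    length = len(all_seqs[0])
    if any(len(seq) != length for seq in all_seqs):
        raise ValueError("sequences must be aligned to the same length")

    labels = []
    for i in range(length):
        # Set of residues observed at column i within each class.
        per_class = [{seq[i] for seq in seqs} for seqs in classes.values()]
        if any(len(residues) != 1 or "-" in residues for residues in per_class):
            # At least one class varies (or has a gap) at this position.
            labels.append("neutral")
        elif len(set().union(*per_class)) == 1:
            # Every class is invariant and all classes share the same residue.
            labels.append("conserved")
        else:
            # Every class is invariant, but the residue differs between classes.
            labels.append("class-specific")
    return labels


# Hypothetical toy alignment: two classes, four columns.
toy_classes = {
    "class_A": ["HFAL", "HFAV"],
    "class_B": ["HYAL", "HYGI"],
}
toy_labels = classify_columns(toy_classes)
print(toy_labels)
```

In this toy case the first column is invariant everywhere, the second is invariant within each class but differs between classes, and the last two vary within at least one class.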

Methodology:
Homologous sequences were obtained with BLASTp [11] using the SWISS-PROT [12] protein database and aligned with ClustalW. [13] After removal of sequences with gaps and of outliers in the sequence similarity tree, an alignment of 74 sequences of the plant haemoglobin family was obtained. Several protein weight matrices, such as BLOSUM30, BLOSUM45, BLOSUM62, Gonnet and PAM250, were used to generate the sequence similarity index. However, as no significant differences were observed in the conserved amino acid residues with respect to the heme-binding residues, ET analysis was carried out using BLOSUM62 to allow for substitution of similar amino acids. The aligned sequences and the 1D8U and 1BIN coordinates were submitted to the Cambridge University ET server [14] with default parameters. Briefly, the server constructs a phylogenetic tree based on a PHYLIP distance matrix computed from all the aligned sequences. The sequences on different branches of the phylogram are then grouped into different evolutionary classes depending on percent similarity. To generate these classes, the phylogram is divided by evolutionary time cutoff lines into ten evenly distributed partitions, P01 to P10, in order of increasing divergence. Within any given partition, all the sequences that originate from a common node at the evolutionary time cutoff line defining that partition constitute a class, which ensures that the most similar sequences belong to the same class. Random inclusion of sequences was not necessary, as ten partitions were sufficient to distinguish all the three plant haemoglobin classes and no distinct class-specific surfaces were created after partition P10. Sequences within the different classes of a given partition were aligned separately, and the resulting aligned classes were compared to derive their consensus (trace) residues for that partition. The trace residues were mapped onto the known structures of soya bean leghaemoglobin (1BINA) and non-symbiotic haemoglobin from rice (1D8U), obtained from the Protein Data Bank [15], and visualized using PyMOL. [16] A similar analysis was also carried out using the Evolutionary Trace report_maker, which takes only a Protein Data Bank identifier, pools information about protein sequence, structure and elementary annotation from different sources, and superimposes on that background inferences about the evolutionary behavior of individual residues obtained with the real-valued evolutionary trace method. [17]

Results and discussion:
A PHYLIP distance matrix based on sequence identity was generated by the CLUSTAL program for the 74 plant haemoglobin sequences identified in the SWISS-PROT database. After careful examination, the resulting sequence alignment was submitted to the TraceSuite II server for ET analysis, where a rooted phylogenetic tree was built by the Kitsch algorithm. The partitions P01 to P10 divide the phylogenetic tree into classes that vary from partition to partition (Fig. 1). Individual partitions contain different numbers of classes, where each class consists of a cluster of similar sequences originating from a given node within that partition.
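How cutoff lines turn a rooted tree into partition-dependent classes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: the tree is a hypothetical six-leaf example written as nested `(depth, children)` tuples, where `depth` is the distance of an internal node from the root, all leaves are assumed to lie at depth 1.0, and the ten cutoffs are simply placed at 0.1, 0.2, ..., 1.0. The leaf names and the function names are invented for the example.

```python
def leaves(node):
    """Return the leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    _, children = node
    names = []
    for child in children:
        names.extend(leaves(child))
    return names


def classes_at_cutoff(node, cutoff):
    """Group leaves into classes for one evolutionary time cutoff.

    An internal node lying at or beyond the cutoff line defines one class
    containing all of its leaves; nodes before the line are split further.
    """
    if isinstance(node, str):
        return [[node]]
    depth, children = node
    if depth >= cutoff:
        return [leaves(node)]
    groups = []
    for child in children:
        groups.extend(classes_at_cutoff(child, cutoff))
    return groups


# Hypothetical rooted tree: (distance of node from root, [children]).
TREE = (0.0, [
    (0.25, ["moss_1", "moss_2"]),
    (0.15, [
        (0.55, ["lb_1", "lb_2"]),
        (0.65, ["ns_1", "ns_2"]),
    ]),
])

# Ten evenly spaced cutoffs standing in for partitions P01 to P10.
cutoffs = [k / 10 for k in range(1, 11)]
class_counts = [len(classes_at_cutoff(TREE, c)) for c in cutoffs]
for k, count in enumerate(class_counts, start=1):
    print(f"P{k:02d}: {count} classes")
```

In this toy tree the number of classes grows from two at the first cutoff to one class per sequence at the last, which mirrors the way the number of classes changes from partition to partition in a real trace.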

Trace residues and their position on the crystal structures
In order to study the structural differences between the known symbiotic and non-symbiotic plant haemoglobins, I first superimposed the crystal structures of the rice and soya bean haemoglobins. The known crystal structures for the plant haemoglobin family have a relative overall root mean square deviation of < 1.1 Å, indicating conservation of the overall structure with the characteristic globin fold. Analysis of the mapped traces for partitions P01-P10 nevertheless reveals clusters of potentially important residues on the 3D structures of symbiotic and non-symbiotic plant haemoglobins (Fig. 3 A and B). The amino acids are colored according to their relative importance, based on the estimated evolutionary pressure they experience. Yellow indicates the most highly conserved amino acid residues (>95%), whereas white indicates the least conserved residues (<5%). The main components of these functional clusters are visible as early as partition P01, and the number of trace residues increases slowly with ETC, which makes it difficult to define cluster boundaries accurately (Fig. 2). Visual inspection of the mapped traces nevertheless led to choosing P08 as the partition displaying the highest ratio of functional resolution to random signal. This is a subjective choice, and partitions with both lower and higher ETC values should also be considered in order to extract the maximum amount of functional information.

Class specific trace residues in plant haemoglobins
Analysis of class-specific residues in the plant haemoglobins is a convenient and objective way to identify features that distinguish the symbiotic and non-symbiotic haemoglobins. Class specificity is determined by sequence conservation within defined subgroups of a protein family. The most striking result of the ET analysis is the designation of the catalytic histidine and nearby conserved residues as class specific. Evolutionary trace analysis identified the previously known histidine H108, which facilitates heme iron co-ordination in both symbiotic (pentacoordinate) and non-symbiotic (hexacoordinate) haemoglobins, and H73 from non-symbiotic haemoglobin; these histidines occupy the E7 and F8 helix positions. Aside from the histidines, many of the class-specific residues identified by ET surround the positions of invariant or nearly invariant residues that have not been described previously. These residues include F54, F118, Y150, L153 and I157, which are conserved in non-symbiotic haemoglobins. In symbiotic haemoglobins, S45 is conserved at the interface with the heme iron, whereas it is not conserved in non-symbiotic haemoglobins.

Another significant cluster of class-specific residues in the plant haemoglobin family is located at the dimer interface. In general, plant haemoglobins form homodimers in their active form. Evolutionary trace analysis identified conserved residues at the dimer interface. In rice non-symbiotic haemoglobins these residues include E45, A47, P48 and V120, whereas in the symbiotic haemoglobins an additional trace residue, W121, is identified at the dimer interface.

In addition to the conserved trace residues at the heme-iron binding pocket and the dimer interface, the current analysis identified possible novel functional surfaces (Fig. 3 C and D). Table 1 lists all the trace residues and known substitutions at these possible novel functional surfaces in non-symbiotic haemoglobins. The information provided in the 'substitutions' column refers to the other amino acids seen at the same position in the alignment. These amino acid types may be interchangeable at that position, with no effect on protein function, so substitutions between them should be avoided in site-directed mutagenesis experiments. The method identifies clusters of functionally important residues and outlines the large and complex binding pocket of non-symbiotic haemoglobin thoroughly, while indicating that no other areas of the protein structure appear to be essential for its function. The conserved, functional, class-specific amino acids and the known substitutions reported in the present study could potentially be utilized in future mutational experiments on the plant non-symbiotic haemoglobins.
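The root mean square deviation quoted for the superimposed crystal structures at the start of this section is the square root of the mean squared distance between equivalent atoms. The short Python sketch below shows only that final calculation. It assumes the two coordinate sets have already been optimally superimposed and that atoms are paired in the same order; the function name `rmsd` and the two-atom coordinates are hypothetical and do not come from 1D8U or 1BIN.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates, in the same units as the input.

    No fitting is performed: the structures are assumed to be
    superimposed already, with atoms paired by position in the list.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates: every atom is displaced by 1 unit along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
print(toy_rmsd)
```

In practice the superposition itself is carried out by a structure-alignment tool before this value is computed; that fitting step is deliberately left out of the sketch.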

Figure 1: Evolutionary trace based dendrogram containing plant haemoglobins. Partitions P01-P10 are shown as thin vertical lines. ETC increases from P01 to P10.

ET partition-dependent classes
The phylogenetic tree containing 74 plant haemoglobin genes consists of four major classes, viz. Class 0, Class I, Class IIa and Class IIb. Nodes one and two create two branches in partition P01. One of these branches contains the haemoglobin sequences from the liverwort Marchantia polymorpha (Q941Q0) and bryophytes, representing the class 0 plant haemoglobins. Unlike conventional phylogenetic analysis, ET could distinguish between the moss and

Figure 2: Traces for partitions P01-P10, aligned with the amino acid sequences of 1BINA and 1D8U. Conserved residues are surrounded by boxes, class-specific residues are denoted by an X, and solvent-accessible side chains are shaded.

Figure 3: Trace residues mapped onto the structures of (A) rice non-symbiotic haemoglobin 1D8UA and (B) soya bean symbiotic haemoglobin 1BINA. (C) and (D) represent novel functional surfaces (shaded in blue) in symbiotic and non-symbiotic haemoglobins, respectively.