Genome-wide survey and molecular modeling of hypothetical proteins containing 2Fe-2S and FMN binding domains suggests Rieske dioxygenase activity, highlighting their potential roles in bioremediation

‘Conserved hypothetical’ proteins pose a challenge not just for functional genomics, but also for biology in general. As long as there are hundreds of conserved proteins with unknown function in model organisms such as Escherichia coli, Bacillus subtilis or Saccharomyces cerevisiae, any discussion of a ‘complete’ understanding of these biological systems will remain wishful thinking. In silico approaches show great promise for inferring the plausible roles of these hypothetical proteins. The majority of genomic proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multi-domain proteins created as a result of gene duplication events. Aromatic ring-hydroxylating dioxygenases, also called Rieske dioxygenases (RDOs), are a class of multi-domain proteins that catalyze the initial step in microbial aerobic degradation of many aromatic compounds. The investigations here address the computational characterization of hypothetical proteins containing Ferredoxin and Flavodoxin signatures. A consensus sequence for each class of oxidoreductase was obtained by phylogenetic analysis involving clustering methods based on evolutionary relationships. A synthetic sequence was developed by combining the consensus sequences and was used as the basis to search for homologs via BLAST. The exercise yielded 129 multi-domain hypothetical proteins containing both 2Fe-2S (Ferredoxin) and FNR (Flavodoxin) domains. In the current study, 17 proteins with an N-terminus FNR domain and a C-terminus 2Fe-2S domain are characterized through homology modelling and docking exercises, which suggest dioxygenase activity and indicate their plausible roles in the degradation of aromatic moieties.


Background:
Over the last decade, more than 150 complete genomes of diverse bacteria, archaea and eukaryotes have been sequenced, and many more are currently in the pipeline [1]. It is well known that, in any newly sequenced bacterial genome, as many as 30-40% of the genes do not have an assigned function [2]. This figure is even higher for archaeal and eukaryotic genomes and for the relatively large genomes of bacteria with a complex lifestyle, such as Anabaena and Streptomyces [3,4].
As long as there are hundreds of conserved proteins of unknown function even in model organisms, such as Escherichia coli, Bacillus subtilis or Saccharomyces cerevisiae, any discussion of a 'complete' understanding of these organisms as biological systems will remain in the realm of wishful thinking. Although it appears likely that the central pathways of information processing and metabolism are already known, crucial elements of these systems could still be lurking among the 'conserved hypotheticals', and important mechanisms of signalling and stress response would, in all likelihood, remain undiscovered [6].
Because of the inherent thermodynamic stability of the aromatic ring, natural turnover of aromatic compounds is slow and relies on complex microbial degradation pathways. With aromatic compounds comprising >25% of the earth's biomass, these pathways play a crucial role in the biogeochemical carbon cycle. However, despite the abundance of microbial degraders, man-made aromatic pollutants are often recalcitrant to existing bioprocessing pathways. As a result, these xenobiotic compounds, many of which are derived from the processing of crude oil, persist in the environment, causing irreversible damage to the biosphere [7].
Aromatic ring-hydroxylating dioxygenases, also called Rieske dioxygenases (RDOs), are a class of multi-domain proteins that catalyze the initial step in microbial aerobic degradation of many aromatic compounds. Two hydroxyl groups are introduced into the aromatic ring, yielding cyclic cis-dihydrodiols or cis-diol carboxylic acids (Figure 1; substituents X and Y can be hydrogen atoms or any of several other groups) [8, 9]. More than three dozen distinct RDOs have been identified. RDOs consist of a reductase, an oxygenase and, in some cases, an additional ferredoxin that mediates electron transfer between the former two components. The oxygenase component catalyzes the insertion of both atoms of molecular oxygen into the aromatic substrate, which is believed to occur at a mononuclear iron site and to be accompanied by electron insertion from a Rieske-type [2Fe-2S] centre. Either the reductase or, where present, the intermediary ferredoxin component supplies the two electrons from NAD(P)H to the dioxygenase [10]. RDOs have been empirically classified according to the various combinations of subunits and electron transfer co-factors involved in reducing the oxygenase component [10,11], as listed in Table 1 (see supplementary material).
Here we present a protocol to data mine and computationally characterize redox hypothetical proteins possessing multiple domains. Most proteins consist of multiple domains, and domains determine the function and evolutionary relationships of proteins [12]. Thus, it is important to understand the principles of domain combinations and their associated inter-domain interactions, especially in hypothetical proteins.
Primarily, 2Fe-2S (Ferredoxins) and FMN/FAD (Flavodoxins) were considered because of their vital and diverse roles in biological systems, the most important being their role in electron transport mechanisms. Ferredoxins are small, acidic, electron transfer proteins that are ubiquitous in biological redox systems. Members of the 2Fe-2S ferredoxin family have a general core structure consisting of beta(2)-alpha-beta(2). They are proteins of around one hundred amino acids with four conserved cysteine residues to which the 2Fe-2S cluster is ligated [13]. Flavoenzymes have the ability to catalyse a wide range of biochemical reactions. They are involved in the dehydrogenation of a variety of metabolites, in electron transfer from and to redox centres, in light emission, and in the activation of oxygen for oxidation and hydroxylation reactions. About 1% of all eukaryotic and prokaryotic proteins are predicted to encode a flavin adenine dinucleotide (FAD) or

2PIA based model for GI ID 289441001
The query protein 289441001 from Mycobacterium tuberculosis was successfully modelled using the SWISS-MODEL interface; the overall identity between the query and template is 26.3%. The alignment between the template and query is shown in Figure 11. In spite of the low overall sequence identity, the binding regions of 2Fe-2S and FMN exhibit high conservation. The RMSD between the modelled structure and the template is 0.22 Å for Cα atoms (with 93.2% of the atoms superposed). The quality of the model was assessed with PROCHECK (Ramachandran map analysis): 97.7% of the residues were in allowed regions and only 2.3% were in disallowed regions. Interestingly, none of the residues in the disallowed regions are among the functionally important residues. The 2Fe-2S and FMN ligands were docked into the model, and all the models were judged to possess clashes within acceptable limits. Table 3 summarises the details of all 17 models generated with 2PIA (which contains 322 aa) as the template.
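As an illustration of the identity figure quoted above, the following minimal Python sketch computes percent identity from a gapped pairwise alignment. It assumes identity is counted over columns in which neither sequence has a gap; SWISS-MODEL and other tools may use a different denominator, so the value need not match theirs. The function name and the toy sequences are hypothetical and are not the actual query/template pair.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the
    same pairwise alignment, so they have equal length.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gapped columns
        compared += 1
        if q == t:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment (hypothetical sequences, for illustration only)
query = "MKV-LTGCA"
template = "MRVALT-CA"
print(round(percent_identity(query, template), 1))
```

A low overall value from such a calculation does not rule out strong local conservation, which is why the binding regions are inspected separately in the alignment.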

Figure 6: MSA of group 1 of FNR reductase family.

Figure 8: Pie-chart showing the distribution of domains in the 129 hypothetical proteins.

Figure 9: Phylogenetic tree of the hypothetical proteins containing N-terminus FNR and C-terminus 2Fe-2S domains.

Results & Discussions:
Upon critical evaluation of the 129 multi-domain hypothetical sequences through CDD, significant differences in the location of the 2Fe-2S domain, relative to other domains, were found. Of these 129 sequences, 61 contained an N-terminus 2Fe-2S and a C-terminus FNR domain, while this order was reversed in 25 sequences, as shown in Figure 7. The remaining 43 sequences contained an N-terminus MOSC domain [21] (pfam03473 and pfam03476), which is a superfamily of beta-strand-rich domains identified in the molybdenum cofactor sulfurase and several other proteins from both prokaryotes and eukaryotes. The MOSC domain is predicted to be a sulfur-carrier domain that receives sulfur abstracted by the pyridoxal phosphate-dependent NifS-like enzymes, on its conserved
interactions were found similar to that of the template. The binding of the 2Fe-2S ligand and FMN is shown in Figures 12, 13 & 14. Table 2 (see supplementary material) summarizes the residues forming the pharmacophore (4 Å radius) for the FMN ligand in the template, the FMN ligand redocked to the template, and the model; high residue conservation is observed. FMN was redocked to the template (using the program FlexX) to re-confirm the ligand binding pose and to account for any artefacts due to the software. The residues highlighted in bold form H-bonds with FMN, which further supports favourable binding of the ligand. The modelled and docked structures were deposited at the Protein Model Data Bank (PMDB) [26].
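The 4 Å pharmacophore selection described above can be sketched as follows. This is a minimal illustration, assuming a residue belongs to the pharmacophore if any of its atoms lies within 4 Å of any ligand atom; the exact criterion used by the authors' software is not stated. The function name, residue labels and coordinates are hypothetical.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=4.0):
    """Return sorted residue labels having any atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z)) tuples, one per atom.
    ligand_coords: iterable of (x, y, z) tuples for the ligand atoms.
    """
    ligand_coords = list(ligand_coords)
    near = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            near.add(residue)
    return sorted(near)


# Toy coordinates in Å (hypothetical, not taken from 2PIA or any model)
protein_atoms = [
    ("SER45", (1.0, 0.0, 0.0)),
    ("SER45", (6.0, 0.0, 0.0)),
    ("ARG80", (0.0, 3.5, 0.0)),
    ("GLY120", (9.0, 9.0, 9.0)),
]
ligand_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]

print(residues_near_ligand(protein_atoms, ligand_coords))
```

Running the same selection on the template, the redocked template and the model, and then intersecting the resulting residue sets, gives one simple way to quantify the residue conservation summarised in Table 2.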

Figure 14: A) Surface representation of the ligand binding region in the model; B) residues at the pharmacophore (4 Å radius) in the model; C) 2D representation of the ligand-residue interactions in the model.

Conclusion:
129 hypothetical proteins from across the genomes have been data mined, and the 3D description of 17 sequences has been derived with confidence. The statistics from the comparative modelling and docking studies (with acceptable energy values) have revealed a strong interaction of two redox ligands, viz., 2Fe-2S and FMN, with the binding residues, which further strengthens the argument that these proteins are involved in the cleavage of aromatic compounds. Though degradation of aromatic compounds by microorganisms is a well-established fact [27, 28], characterization of hypothetical sequences in the present study could aid in a better understanding of these microbial systems. A large number of microbial systems containing these dioxygenases have also been mined and characterized in the present investigation, which could provide insights into their degradation properties. Thus, this study on multi-domain hypothetical proteins could prove critical in two ways, viz., in understanding the mechanism of uptake of nutrients that contain aromatic ring structures, and in enabling the engineering of these proteins towards effective degradation of harmful xenobiotics.

Table 2: Residues in the pharmacophore in the template and model for FMN

Table 3: Summary of 17 models