Sequence analysis and homology modeling of peroxidase from Medicago sativa

Plant peroxidases are one of the most extensively studied group of enzymes which find applications in the environment, health, pharmaceutical, chemical and biotechnological processes. Class III secretary peroxidase from alfalfa (Medicago sativa) has been characterized using bioinformatics approach Physiochemical properties and topology of alfalfa peroxidase were compared with that of soybean and horseradish peroxidase, two most popular commercially available peroxidase preparations. Lower value of instability index as predicted by ProtParam and presence of extra disulphide linkages as predicted by Cys_REC suggested alfalfa peroxidase to be more stable than either of the commercial preparations. Multiple Sequence Alignment (MSA) with other functionally similar proteins revealed the presence of highly conserved catalytic residues. Three dimensional model of alfalfa peroxidase was constructed based on the crystal structure of soybean peroxidase (PDB Id: 1FHF A) by homology modelling approach. The model was checked for stereo chemical quality by PROCHECH, VERIFY 3D, WHAT IF, ERRAT, 3D MATCH AND ProSA servers. The best model was selected, energy minimized and used to analyze structure function relationship with substrate hydrogen peroxide by Autodock 4.0. The enzyme substrate complex was viewed with Swiss PDB viewer and one residue ASP43 was found to stabilize the interaction by hydrogen bonds. The results of the study may be a guiding point for further investigations on alfalfa peroxidase.


Background:
Peroxidases (PODs, EC 1.11.1.7)are well known haem containing proteins that use hydrogen peroxide (H2O2) as the electron acceptor to catalyse a number of oxidative reactions [1-3].PODs from bacteria, fungi, plants and animals constitute a superfamily consisting of three major classes [4].Class I PODs include intracellular enzymes in plant, bacteria and yeast.Class II PODs consist of secretary fungal enzymes and Class III are typical secretary plant PODs, which have multiple tissue specific functions: e.g., removal of H2O2 from chloroplasts and cytosol, oxidation of toxic compounds, biosynthesis of cell wall, defence responses towards wounding, indole 3 acetic acid catabolism, ethylene biosynthesis etc. [5].Class III PODs are glycoprotein's which belong to a large multigene family containing four conserved disulphide bridges and two calcium ions [6].Besides being vital for plants, the enzyme also has serious commercial applications in medicine as a component of diagnostic tools and in the bioremediation and bio bleaching industries [7,8].Horseradish POD and soybean POD are two important commercially exploited sources of the enzyme because of their higher stability [9].A couple of preliminary experiments have already established the usefulness of alfalfa POD for biotechnological applications.The enzyme immobilized in self assembled monolayers has been successfully used for determination of thiodicarb [10] and the plant has also been studied for analysis of POD activity under abiotic stress [11].Detailed study of alfalfa POD still needs to be done in order to raise the potential of its use for various academic and industrial applications.Hence, the present paper enlists some of the physiochemical and functional properties of alfalfa POD and provides insight into its three dimensional structure.Docking studies were performed to predict the ligand binding site within the protein.

Retrieval of target sequence
The 353 amino acid long sequence of POD from alfalfa (Medicago sativa) was obtained from the Protein sequence database of NCBI (GenBank Id: AAB41811) and blasted [12] against Protein Data Bank (PDB) entries to find similar sequences.Parameter values for BLAST 2.2.26 search were set as default.

Characterization of target sequence
Transmembrane helices in the query sequence were predicted by TMHMM 2.0, ABTMpro and TargetP 1.1 server.Subcellular localization of protein using amino acid composition was achieved by MultiLoc

Protein Topology prediction
Secondary structures within alfalfa POD were calculated with SOPMA (Self Optimized Prediction Method with Alignment) [18] and disulfide bonds were predicted by the Cys_REC tool from softberry.

Multiple Sequence Alignment
In order to know the key residues responsible for catalytic activity of the enzyme, amino acid sequence of alfalfa POD was compared with amino acid sequences from other PODs of known crystallographic structures.Pairwise sequence alignment server Clustal-W [19] was employed to compare POD form alfalfa, soybean (PDB Id: 1FHF), peanut (PDB Id: 1SCH) and horseradish (PDB Id: 1ATJ).

Homology modeling
High resolution (2.80A°) crystal structure of soybean POD (PDB Id: 1FHF) was selected as template to create the three dimensional model for POD from alfalfa.The homology modeling program, Modeller 9v9 [20] was used to generate a total of 10 models of target protein.

Model refinement and evaluation
The models constructed were solvated and subjected to energy minimization using the steepest descent and conjugate gradient technique to eliminate bad contacts between protein atoms and structural water molecules.

Docking Studies
The structure of substrate H2O2 was obtained from NCBI PubChem in psf format and converted to pdb format using OpenBabel [28].To understand the molecular interactions between H2O2 and the protein receptor, flexible small moleculerigid protein docking experiments were performed using Autodock 4.0 [29].In the protein non-polar hydrogen atoms were merged and total gasteiger charge of -3.9953 was added to the protein.It was made sure that there are no non-bonded atoms in the protein.Gasteiger partial charges were also assigned to the ligand and all torsions were allowed to rotate during docking.The grid box was centred at the modeled structure and affinity maps were calculated by AutoGrid.Fifty Lamarckian Genetic Algorithm runs with default parameter settings were performed and the docked confirmations were played ranked by energy.The lowest energy complex of ligand and protein was further viewed using Swiss PDB viewer.

Results and Discussion:
The 353 amino acid long sequence of alfalfa POD retrieved from Protein: sequence database of NCBI (GenBank ID: AAB41811) had a putative signal peptide of 28 amino acids at N-terminus and a mature peptide of 325 amino acid residues.The sequence of mature protein was blasted against the PDB database for proteins with similar sequence and known three dimensional structure using compositionally adjusted substitution matrices.A 304 amino acid long sequence of POD from soybean (PDB Id: 1FHF) had 69% residues identical with the target protein with 0% gaps.Structure of soybean POD has been solved at a resolution of 2.80 A°, hence it was chosen as template to build the 3D structure for target protein.

Characterization of POD
TargetP 1.1 designated alfalfa POD to be a protein of secretary pathway and TMHMM server 2.0 predicted the presence of one transmembrane domain within the signal peptide which may be required to direct the protein to secretary pathway.ABTMpro server also indicated that alfalfa POD is an alpha helical transmembrane protein.Multiloc predicted the protein to be extracellular in nature.Eight asparagines residues were found to be N-glycosylated by NetNGlyc server.Glycosylation of asparagines residues is required for correct folding of protein before being exported.
A comparison of physiochemical properties of POD from three different sources viz.horseradish, soybean and alfalfa computed using ProtParam tools is presented in Table 1 (see supplementary material).Horseradish and soybean are two of the most popular sources for isolation of POD and hence were included in the study for comparison purpose.Data shown in Table 1 suggested that alfalfa POD is superior over POD from both the sources.The computed isoelectric point (pI value) revealed alfalfa POD to be more acidic than horseradish POD but less acidic than soybean POD.The computed isoelectric point will be useful for separating the protein on a polyacrylamide gel by isoelectric focusing.The extinction coefficient of a protein as calculated by the program depends on the molar extinction coefficient of Tyr, Trp and Cys residues.The extinction coefficient can be used to calculate the concentration of a protein in solution.Stability of the three PODs was studied by analyzing the values for instability index, aliphatic index and Grand average of hydropathicity (GRAVY) index.Instability index relies upon the occurrence of certain dipeptides along the length of the protein to distinguish between the unstable and stable protein [16].The value of instability index was least for alfalfa POD (42.94, 30.18 and 23.17 for horseradish, soybean and alfalfa POD respectively); hence it could be safely predicted as most stable of the three proteins.The aliphatic index refers to the relative volume of a protein that is occupied by aliphatic side chains and contributes to the increased thermo stability of protein.Aliphatic index of POD from all the three sources was comparable.GRAVY index indicates the solubility of proteins: a negative GRAVY value for POD from all the compared sources designates it to be hydrophilic in nature.

Topology of alfalfa POD
A comparison of secondary structure elements as calculated by SOPMA showed that random coil occupied the largest part of the protein followed by alpha helix, extended strand and beta turns for all three PODs

ClustalW analysis
The identification of catalytic residues is key to understanding the function of enzymes.The information from other functionally similar sequences with known crystallographic structures was exploited to identify the key catalytic residues.ClustalW alignment of alfalfa POD with soybean (PDB Id: 1FHF), peanut (PDB Id: 1SCH) and horseradish (PDB Id: 1ATJ) PODs as shown in Table 3 (see supplementary material) depicted a high degree of conservation among the sequences [30, 31].Moreover POD in higher plants is a part of large multigene family where a number of isozymes are formed as a result of posttranscriptional and posttranslational changes.Hence, the compared sequences varied in length but essentially conserved the key catalytic residues which have been highlighted with an asterisk (*) symbol.

Model building, refinement and evaluation
Crystal structure of soybean POD was used as template to generate three dimensional coordinates for alfalfa POD.Ten models generated by Modeller 9v8 were viewed with Swiss PDB viewer and energy minimized.Ramachandran plot analysis of the ten models was obtained by PROCHECK server.The best model in terms of stereo chemical quality showed a overall G-factor value of -0.04 which indicates that geometry of the model corresponds to high probability confirmation with 92.1% residues in the core region of Ramachandran plot (Figure 1A).The number of residues in allowed and generously allowed regions were 6.6% and 1.4% respectively.It is generally accepted that if 90% residues are in the allowed region, the quality of the model is evaluated as good and reliable.None of the residues was present in the disallowed region of the plot.Superimposition of the model with the template (used as reference layer) with Swiss PDB viewer as shown in (Figure 1B) showed a very low RMSD of 0.26 A  , suggesting high similarity between them.3D Match program was also used for alignment of protein 3D structures.The RMSD score of 0.256A  with a Z score of 7.930 indicated that both template and target proteins have similar folds.
Verify 3D analysis revealed that 84.97 % of the residues had an average 3D-1D score of <0.2, predicting that the model is compatible with its sequence.The amino acid environment was evaluated using ERRAT plots, which assess the distribution of different types of atoms with respect to one another in the protein model and is used for making decision about its reliability.ERRAT showed an overall quality factor of 85.050, a result expected for crystallographic models with resolution >2.5A  .B-factor analysis is done with WHAT IF server reflected the mobility or flexibility of various parts of the molecule.Averaged B-factor deviation for protein backbone was 30.594 and averaged standard deviation was 38.491.Since average deviation value was less than standard deviation, so it reflected a good quality model.

Docking Studies
Autodock was used to generate fifty different conformations of the ligand H2O2.The confirmations were played ranked by their energy and the confirmation having lowest binding energy (-1.88 kcal/mol) was chosen for further analysis of protein-ligand complex.Docking site interaction was stabilized by hydrogen bonds between ASP43 OD2 and two hydrogen atoms of the ligand (Figure 3A).Residues His40 and Cys49 were found within a distance of 3 A  from the ligand.When the distance was increased to 4A  , residues Leu101, Ala98, Asp50 and Cys44 were found to interact with the ligand.In the template protein both ASP43 and ASP50 are involved in binding of calcium ions, which in turn may help in binding the H2O2 molecule.The key ligand binding residues (Arg38, Phe41 and His42) as found in other functionally similar proteins (PDB Id: 1SCH & 1ATJ) were found within a distance of 5A  from H2O2, which might be due to rigid biomolecule docking.Complex of alfalfa POD with H2O2 as viewed by Deep view Swiss PDB viewer is shown in (Figure 3B).Further analysis revealed the presence of fourteen alpha helices in the modelled protein.Surface view of the modeled structure (Figure 3C) showed that ligand binding site was buried inside a channel.LGRRDSLTANRTLANQNLPAPFFNLTQLKASFAVQGLN-TLDLVTLSGGHTFGRARCSTF 179 1ATJ|F LGRRDSLQAFLDLANANLPAPFFTLPQLKDSFRNVGLNRSSDLVALSGGHTFGKNQCRFI 180 1SCH|A LGRRDSTTASLSSANSDLPAPFFNLSGLISAFSNKGFT [13].NetNGlyc 1.0 Server predicted N-Glycosylation sites using artificial neural networks.Physiochemical properties such as molecular weight, theoretical pI, total number of negatively (Asp+Glu) and positively (Arg+Lys) charged residues, extinction coefficients [14], instability index [15], aliphatic index [16] and grand average of hydropathicity (GRAVY) [17] of the mature protein were computed using Expasy's ProtParam Proteomics server.

Figure 1 :
Figure 1: (A) Ramachndran plot analysis for alfalfa POD.The plot statistics are: Total number of residues-325 with 92.1% residues in the core region (red); 6.6 % residues in allowed (yellow) and 1.4% in generously allowed (light yellow) region.Number of glycine residues (labelled as G) is 21 and Number of Pro residues is 16 (B) Superimposition of CA atoms of alfalfa POD (red) with soybean POD (green) as done with Deep View Swiss PDB viewer gave a RMS value of 0.26 A  , arrows within helices and sheets points towards the C-terminus.Gln1 is the first amino acid at N-terminus whereas Oxt 325 represents the terminal oxygen atom at C-terminus.
RMS Z-scores for anomalous bond lengths and bond angles as determined by WHAT IF 0.895 and 1.209 respectively, which is very close to 1.0 suggesting high model quality.ProSA was used to check three dimensional model of alfalfa POD for BIOINFORMATION open access ISSN 0973-2063 (online) 0973-8894 (print) Bioinformation 8(20): 974-979 (2012) 977 © 2012 Biomedical Informatics potential errors.The program displays two characteristics of the input structure: its Z-score and a plot of its residue energies.The Z-score indicates overall model quality and measures the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations.As shown in (Figure 2A) the Z-score for alfalfa POD is -8.26 and for soybean POD is -7.04.The scores are well within the range of scores typically found for proteins of similar size indicating a highly reliable structure.The energy plot shows the local model quality by plotting energies as a function of amino acid sequence position.In general, positive values correspond to problematic or erroneous part of a model.Figure 2B displays a comparable energy plot for both the target and template structures.

Figure 2 :
Figure 2: ProSA-web service analysis of alfalfa and soybean peroxidase.ProSA-web z-scores of all protein chains in PDB determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length.The z-scores of alfalfa (A) and soybean (B) peroxidase are highlighted as large dots (C) Energy plot of alfalfa peroxidase (D) Energy plot of soybean peroxidase.

21] by Ramachandran plot analysis [22]. The
best model was selected on the basis of overall G-factor, number of residues in core, allowed, generously allowed and disallowed regions.The selected model was further put to analysis by VERIFY 3D [

Table 1 :
Comparison of properties of POD from three different sources as predicted by ProtParam program * First value is based on the assumption both cysteine residues form cystines and the second assumes that both cysteine residues are reduced. *

Table 2 :
Secondary structure elements as predicted by SOPMA for PODs