Structure prediction and functional characterization of secondary metabolite proteins of Ocimum.

Various species of Ocimum have acquired special attention due to their medicinal properties. Different parts of the plant (root, stem, flower, leaves) are used in the treatment of a wide range of disorders from centuries. Experimental structures (X-ray and NMR) of proteins from different Ocimum species, are not yet available in the Protein Databank (PDB). These proteins play a key role in various metabolic pathways in Ocimum. 3D structures of the proteins are essential to determine most of their functions. Homology modeling approach was employed in order to derive structures for these proteins. A program meant for comparative modeling- Modeller 9v7 was utilized for the purpose. The modeled proteins were further validated by Prochek and Verify-3d and Errat servers. Amino acid composition and polarity of these proteins was determined by CLC-Protein Workbench tool. Expasy's Prot-param server and Cys_rec tool were used for physico-chemical and functional characterization of these proteins. Studies of secondary structure of these proteins were carried out by computational program, Profunc. Swiss-pdb viewer was used to visualize and analyze these homology derived structures. The structures are finally submitted in Protein Model Database, PMDB so that they become accessible to other users for further studies.


Background:
Botanically, basil belongs to the genus Ocimum of the family Lamiaceae.More than 160 species of Ocimum are reported from different parts of the world.Different parts (roots, stem, leaves, seeds and flowers) of Ocimum have been used for treatment of variety of diseases such as bronchitis, malaria, diarrhea, dysentery, skin diseases, arthritis etc. Ocimum sp.contains monoterpene derivatives such as camphor, limonene, thymol, citral, geraniol and linalool.A detailed analysis of protein sequences from Ocimum, their probable structures and mode of action are yet to be accomplished.Plants synthesize chemicals in their leaves in order to protect themselves from herbivores.One such class of defense compounds that has been used extensively by humans are members of phenylpropenoid class namely eugenol, chavicol and their derivatives.It has been reported that in basil glands, two closely related (90% identical) enzymes chavicol o-methyltransferase (CVOMT) and eugenol o-methyltransferase (EOMT) catalyze the formation of methylchavicol and methyleugenol from chavicol and eugenol respectively [1].The enzymes are involved in aroma production in basil.From an evolutionary perspective plant and microbial PALs (phenylalanine ammonia lyase) are part of superfamily of enzymes from plants, fungi and bacteria, and are likely derived from a precursor of the widespread histidine ammonia lyase (HAL) family in the histidine degradation pathway [2].PAL catalyses the non-oxidative deamination of phenylalanine to trans-cinnamate and directs the carbon flow from the shikimate pathway to the various branches of the general phenylpropanoid metabolism.Lipooxygenase (Fatty-acid metabolism) is one of the most widely studied enzymes found in more than 60 species of plant and animal kingdom.The enzyme catalyses the biooxygenation of polyunsaturated fatty acids (PUFA) containing a cis, cis-1, 4-pentadiene unit to form conjugated hydroperoxydienoic acids.Lipoxygenase has considerable application in food related products such as in bread making.The enzyme plays a significant role in formation of secondary metabolites in sweet basil.
Enzymatic browning of fruits and vegetables is caused mainly by the conversion of native phenolic compounds to quinones which are then polymerized to brown, red or black pigments imparting colour to various plant parts.The enzymes responsible for catalyzing this sequence of reactions are termed as polyphenol oxidases, but are also known as tyrosinases, catecholases, cresolases and phenolases [3].Because of deleterious effect of enzymatic browning on fruits and vegetables much work is devoted as to retard or at least delay the browning process.Polyphenol oxidase being the causative agent responsible for browning is exploited for the purpose.The enzyme is involved in iso-quinoline alkaloid biosynthesis and in biosynthesis of other secondary metabolites.In order to understand biochemical function and interaction properties of the protein at molecular level, three dimensional structure of protein is foremost requirement.However, the number of available protein sequences exceeds far behind the available three dimensional protein structures.In order to compensate this, homology modeling approach came into being.These methods are believed to be cost-effective and time-effective when compared to X-rays crystallography and NMR techniques.Computational methods make use of hidden information inside amino acid sequences in order to predict protein structure and function.

Materials and Methodology:
The amino acid sequences of secondary metabolite proteins of Ocimum whose structures are not yet available in RCSB Protein Databank (PDB) were retrieved from SWISSPROT, a public resource of curated protein sequences [4] and subjected to NCBI BLAST [5].Based on high score, lower e-value and maximum sequence identity, the best template was selected which was then used as reference structure to build a 3D model.Template and target proteins considered for the study have been shown in Table 1 (see Supplementary material).

Model building and evaluation:
The three dimensional structures of proteins were modeled using Modeler 9v8

Computation of amino acid composition:
Amino acid composition (Table 2 see supplementary material) of Ocimum proteins under study was calculated using CLC protein workbench tool (www.clcbio.com/protein).The tool also provides estimation of percentage of hydrophobic and hydrophilic residues present in the protein (Table 3 see Supplementary material).

Physiochemical characterization:
For physiochemical characterization, theoretical pI (isoelectric point), molecular weight, -R and +R (total number of positive and negative residues), EI (extinction coefficient)

Submission of the modeled proteins in protein model database (PMDB):
The models generated for various Ocimum proteins were successfully submitted in Protein model database, PMDB [19] without any stereochemical errors.The submitted models can be accessed via their PMIDs (Table 7 see Supplementary material).

Results and Discussion:
As experimental structures of some of the important secondary metabolite proteins of Ocimum are not available, homology modeling approach was used in order to derive their structures.

Model building, refinement and evaluation: PROCHECK analysis:
Ramachandran plot for Chavicol O-methyltransferase (D3KYA1) has been illustrated in Figure 1.Altogether more than 90% of the residues were found to be in favoured and allowed regions, which validate the quality of homology models.The overall G-factor for D3KYA1 was -0.19.As the value is greater than the acceptable value -0.50, this suggests that the modeled structure is acceptable.The modeled structures were also validated by other structure verification servers such as Verify 3D and Errat.Verify 3D assigned a 3D-1D score of >0.2 for all the modeled proteins.This implies that the models are compatible with its sequence.ERRAT showed overall quality factor of 49.62 for D3KYA1.The plot generated by Verify-3D and Errat for Chavicol omethyltransferase has been illustrated in Figure 2A & 2B.

Physiochemical characterization:
The physiochemical parameters viz., theoretical isoelectric point (Ip), molecular weight, total number of positive and negative residues, extinction coefficient, half-life, instability index, aliphatic index and grand average hydropathy (GRAVY) were computed using the Expasy's ProtParam tool (Table 4).The computed pI value for A8D7D8, B2ZA17, B6VQV5, B6VQV6, D3KYA1 (pI<7) indicated their acidic nature, whereas pI for A8D6D7, B2ZA12, B2ZA16 (pI>7) revealed there basic behaviour.The computed isoelectric point (pI) will be useful for developing buffer system for purification by isoelectric focusing method.Extinction coefficient values for Ocimum proteins at 280 nm ranged from 1490 to 50795 M-1cm-1 for B6VQV6 and D3KYA1 indicating the presence of higher concentration of Tyr and Trp.Cys was very low in concentration in all the eight Ocimum proteins studied.This indicates that these proteins cannot be analyzed using UV spectral methods.On the basis of instability index Expasy's ProtParam classified the B2ZA17 (Eugenol o-methyltransferase), A8D7D8 (Lipoxygenase), B2ZA12 (Eugenol o-methyltransferase) and B2ZA16 (Eugenol o-methyltransferase) proteins as unstable (Instability index>40) and other Ocimum proteins as stable (Instability index<40).The aliphatic index (AI) which is defined as the relative volume of a protein occupied by aliphatic side chain is regarded as the positive factor for the increase of thermal stability of globular proteins.The very high aliphatic index of all Ocimum proteins infers that these proteins may be stable for a wide range of temperature.The very low GRAVY index of proteins B6VQV6 and D3KYA1 infers that these proteins could result in a better interaction with water.

Functional characterization:
The result of primary analysis suggests that all the Ocimum proteins under study were hydrophobic in nature due to the presence of high non-polar residues content (Table 2 & 3).As percentage of Cysteine(C) is very low in all the Ocimum proteins under study (Table 2), none of these proteins have disulphide bond linkages, as indicated by CYS_REC result (Table 4).The extensive hydrogen bonding may provide stability to these proteins in absence of disulphide bonds.Proteins B2ZA12, A8D6D7 and B6VQV5 have high percentage of methionine(M), alanine(A), leucine(L) and lysine(K).As these amino acids have high helix-forming propensities, alpha helix are dominant in these proteins.This is also evident from analysis of PROFUNC result (Table 6).Rest of the Ocimum proteins had mixed secondary structures i.e. alphahelices, beta-strands and coils.All the proteins showed high percentage of glycine and proline (Table 2).As these amino acids are common in turns, other secondary structures such as Beta turns and Gamma turns are dominant in these proteins (Table 6).

Submission of modeled proteins in PMDB:
The modeled structures of proteins from various species of Ocimum were successfully deposited in Protein Model Database (PMDB).The PMDB ID for the submitted structures has been presented in Table 7 (see Supplementary material).These 3D structures may be further used in characterizing the protein experimentally.

Conclusion:
In this study proteins from various species of Ocimum were modeled using homology modeling approach.Different parameters such as isoelectric point, molecular weight, total number of positive and negative residues, extinction coefficient, instability index, aliphatic index and grand average hydropathy (GRAVY) were computed for these proteins in order to determine their physiochemical characteristics.All the proteins were found to be deficient in amino acid cystein, and therefore lack presence of disulphide linkages as also inferred from analysis of cys_rec result.In the absence of disulphide bond, extensive hydrogen bonding is believed to be responsible for stability of these proteins.Polarity studies using CLC protein work bench tool confirmed all the studied proteins to be hydrophobic in nature.This may be due to the presence of a large number of non-polar residues.Secondary structure studies showed that all the studied proteins contain high proportion of other secondary structures ie.Beta-turns and Gamma-turns.This is attributed to the presence of higher concentration of proline and glycine residues.The modeled structures can be accessed through protein model database PMDB via there PMID's.Homology derived models are extensively used in wide range of applications such as virtual screening, site-directed mutagenesis experiments or in rationalizing the effects of sequence variation.These structures will serve as cornerstone for functional analysis of experimentally derived crystal structures.

[ 6 ]
. Quality of generated models was evaluated with PROCHECK [7] by Ramachandran plot analysis [8].Stereochemical quality and accuracy of the selected models was further improved by subjecting it to energy minimization with the GROMOS 96 43B1 parameters set, implementation of Swiss-PDB Viewer [9].Validation of generated models was further performed by VERIFY 3D [10] and ERRAT [11] programs.ProSA [12] was used for the analysis of Zscores and energy plots.The three dimensional structures of modeled proteins were analyzed using Deep View Swiss PDB viewer.Root Mean Square Deviation (RMSD) values were calculated between the set of targets and template protein to see how much modeled protein deviates from the template protein structure.
[13], II (instability index[14])[15], AI (aliphatic index) and GRAVY (grand average hydropathy) [16] were computed using the Expasy's ProtParam server[17] for set of proteins (http://us.expasy.org/tools/protparam.html).The results are shown in Table4(see Supplementary material) Functional characterization: CYS_REC (http://sunl.softberry.com/berry.phtml?topic) was used to locate "SS bond" between the pair of cystein residues, if present.The tool yields position of cysteins, total number of cysteins present and pattern, if present, of pairs in the protein sequence as output.All the Ocimum proteins under study showed absence of disulphide bonds.The results are presented in

Figure 2 :
Figure 2: (A) Verify-3D plot, (B) Errat plot PROSA analysis: The z-score for all the modeled proteins was found to be within the range of scores typically found for native proteins of similar size showing good quality of the model.Energy Plot for chavicol o-methyltransferase (D3KYA1) with chain length (257 AA) and z-score (-7.26) is presented in Figure 3A & B.

Figure 3 :
Figure 3: (A) Prosa-web z-score plot, (B) Prosa-web plot of residue scoresSwiss-PDB viewer analysis of predicted model:Visualization and analysis of the model using Swiss-PDB reveals that there are no steric hindrances between the residues and thus modeled structures are stable.Structure-structure superimposition was done in order to calculate Root

Figure 4 :
Figure 4: Modeled structure of Chavicol o-methyltransferase as viewed by Swiss-PDB viewer In the present study, In silico analysis and homology modeling studies on uncharacterized proteins in different species of Ocimum like O. basilicum, O. tenuiflorum, O. citriodorum, O. seloi, O. gratissimum and O. americanum whose structures are not yet available in PDB have been accomplished.

Table 5 (see Supplementary material). Secondary structure prediction: Profunc [18] was
employed for calculating the secondary structural features of Ocimum protein sequences.The results are presented in

Table 1 :
Target protein and template protein considered for the study

Table 3 :
Hydrophilic and hydrophobic residues content

Table 4 :
Physiochemical characters as predicted by Expasy's prot-param program

Table 5 :
Presence of disulphide (ss) bond as predicted by Cys_Rec

Table 6 :
Calculated secondary structure elements by Profunc

Table 7 :
Protein submitted in PMDB