Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling

Background: The fatty-acid profile of a vegetable oil determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), whereas palm-oil obtained from the American oil-palm [Elaeis oleifera] contains only 25% C16:0. The β-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is partly responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about the E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know their structures. Hence, this study was undertaken. Objective: The objective of this study was to predict the three-dimensional (3D) structures of the EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences of the KASII proteins were retrieved from the protein database of the National Center for Biotechnology Information (NCBI), USA. The 3D structures of both proteins were predicted using the homology modelling and ab-initio approaches to protein structure prediction. Molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated, and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: Homology modelling showed that the EgKASII and EoKASII proteins are 78% and 74% similar to Streptococcus pneumoniae KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by the ab-initio approach show 6% and 9% deviation, respectively, from the corresponding structures predicted by homology modelling. Structure refinement and validation indicated that the predicted structures are accurate. Conclusion: The 3D structures of the EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of the EgKASII and EoKASII proteins with their substrates.

biosynthesis pathway [8]. However, the level of C16:0 in the palm-oil obtained from the American oil-palm fruit mesocarp tissue is only 25%, which is about 43.2% lower than the C16:0 content of palm-oil obtained from the African oil-palm [8].
Understanding KASII in the African and American oil-palms is important for explaining the difference in its activity between the two oil-palm species. To this end, full-length KASII cDNA clones were previously isolated from both E. guineensis (EgKASII) and E. oleifera (EoKASII) [2]. The EgKASII and EoKASII cDNAs are 2011 and 2138 base pairs in length, respectively [2].
As of March 1, 2014, the structural features of the oil-palm KASII proteins had not been reported, although they are needed to understand the differences in KASII efficiency between the African and the American oil-palm species. This warrants a study of the oil-palm KASII protein structures. Protein structures can be determined using X-ray crystallography and/or NMR techniques [9, 10]. Alternatively, they can be predicted with computational tools to gain a quick understanding of the protein structure [11]. Therefore, this study was undertaken to predict the three-dimensional (3D) structures of the African and American oil-palm KASII proteins by comparative modelling and to elucidate their unique features. The predicted 3D structures of the EgKASII and EoKASII proteins are reported in this paper.

Methodology:
Protein sequence retrieval
The nucleotide database of NCBI contains full-length KASII cDNA sequences for both EgKASII and EoKASII [GenBank Accession Numbers: AF220453 (EgKASII) and FJ940767 (EoKASII)] [2]. The deduced amino-acid sequences of the EgKASII and EoKASII proteins are available in the NCBI protein database. These sequences were retrieved from that database and used in the experiments to predict the 3D structures.

Secondary structure prediction
The secondary structures of the EgKASII and EoKASII proteins were predicted using the PHYRE server [12] and visualized using PyMOL [13]. The MEMSAT-SVM server [14] was used to predict the protein topology and to identify any signal peptide and transmembrane helices within the EgKASII and EoKASII proteins.

Template selection and 3D structure prediction
Two molecular modelling approaches, namely homology modelling with MODELLER [15] and the ab-initio approach with the I-TASSER server, were used in this study to predict the 3D structures of the EgKASII and EoKASII proteins [16,17].
In homology modelling, the templates were identified using a position-specific profile search, which improves the accuracy of sequence alignments and extends the boundaries of detectable sequence similarity. The position-specific iterative basic local alignment search tool (PSI-BLAST) [18] was used to derive a position-specific scoring matrix (PSSM), or profile, from the multiple sequence alignment (MSA) of sequences found by protein-protein BLAST. After the templates had been identified, a global alignment was carried out between the query sequence and each identified template. The best template was selected on the basis of the lowest E-value, the highest score, the highest number of matching secondary structures and the largest aligned region between the query and the template.
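The template-selection criteria above can be illustrated with a short Python sketch. It assumes the criteria are applied in the order listed (E-value first, then score, secondary-structure match and aligned fraction), which the text does not state explicitly. The `TemplateHit` fields, the template names and all numbers are hypothetical placeholders, not values from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateHit:
    """One candidate template (all fields are illustrative assumptions)."""
    name: str
    e_value: float    # lower is better
    score: float      # alignment score, higher is better
    ss_match: float   # fraction of matching secondary-structure states
    coverage: float   # fraction of query residues aligned to the template


def rank_templates(hits):
    """Sort hits best-first: lowest E-value, then highest score,
    then highest secondary-structure match, then largest coverage."""
    return sorted(
        hits,
        key=lambda h: (h.e_value, -h.score, -h.ss_match, -h.coverage),
    )


# Hypothetical hits, for illustration only.
hits = [
    TemplateHit("TEMPLATE_A", 1e-80, 310.0, 0.82, 0.75),
    TemplateHit("TEMPLATE_B", 1e-95, 350.0, 0.85, 0.79),
    TemplateHit("TEMPLATE_C", 1e-95, 340.0, 0.90, 0.80),
]

ranked = rank_templates(hits)
best = ranked[0]
print(best.name)
```

A strict lexicographic ordering is only one possible reading; a weighted combination of the four criteria would be an equally plausible design.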
In the ab-initio approach (I-TASSER), the 3D structure models were built from multiple-threading alignments produced by the Local Meta-Threading Server (LOMETS) [19], followed by iterative template fragment assembly simulations. The five top decoys were predicted for each of the EgKASII and EoKASII proteins. The structure with the lowest C-score was selected as the best model for each protein.

Refinement and validation of predicted 3D structures
The predicted 3D structures of EgKASII and EoKASII were then refined and validated. Energy minimization and molecular dynamics (MD) simulations were performed on the predicted 3D structures using GROMACS v4.5.4 [20] with the GROMOS96 53a6 force field on a Linux system. Energy minimization used the steepest descent algorithm and was run until it converged to machine precision or until the maximum force on any atom fell below 100 kJ/mol/nm. Each 3D structure was centered in a rhombic dodecahedral cell filled with simple point charge (SPC) water, with the box edge set at 1.0 nanometer (nm). Sodium or chloride ions were added as needed to neutralize the overall charge of the system. Position restraints were applied to all heavy atoms of the protein. The system was first equilibrated for 100 picoseconds (ps) in the NVT ensemble, in which the number of particles, the volume of the system and the temperature are kept constant, at 300 K using the velocity rescaling method [21]. This was followed by 100 ps of equilibration in the NPT ensemble, in which the number of particles, the pressure and the temperature are kept constant, at 1 bar. The temperature and pressure were controlled by the Nosé-Hoover thermostat and the Parrinello-Rahman barostat, respectively [22, 23].
Once the system was well equilibrated, we ran a 30 nanosecond (ns) MD simulation for the protein structures predicted by the ab-initio approach. A time step of 2 femtoseconds (fs) was used, with all bonds constrained using the linear constraint solver (LINCS) algorithm [24]. Coulomb interactions were calculated using Particle Mesh Ewald (PME) electrostatics [25] with a cubic-spline interpolated grid of 0.16 nm spacing. The stereochemical quality of the models was determined using PROCHECK [26].
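The force-based stopping criterion used for energy minimization can be illustrated on a toy system. The sketch below is not the GROMACS implementation. It assumes a simple harmonic potential in one dimension per particle and a common adaptive step rule (accept and enlarge the step when the energy drops, otherwise reject and halve it). The function names, the force constant and the starting coordinates are hypothetical.

```python
def harmonic(positions, k=1000.0):
    """Toy potential V = 0.5 * k * x^2 per particle.
    Positions in nm, k in kJ/mol/nm^2, forces in kJ/mol/nm."""
    energy = 0.5 * k * sum(x * x for x in positions)
    forces = [-k * x for x in positions]
    return energy, forces


def steepest_descent(positions, energy_fn, emtol=100.0, step=0.01,
                     max_steps=10_000):
    """Move along the force direction until the largest force component
    falls below emtol (kJ/mol/nm) or max_steps is reached."""
    x = list(positions)
    energy, forces = energy_fn(x)
    for n in range(max_steps):
        fmax = max(abs(f) for f in forces)
        if fmax < emtol:
            return x, energy, fmax, n
        # Displace every coordinate along its force, scaled so that the
        # largest displacement equals the current step size.
        trial = [xi + fi / fmax * step for xi, fi in zip(x, forces)]
        trial_energy, trial_forces = energy_fn(trial)
        if trial_energy < energy:
            x, energy, forces = trial, trial_energy, trial_forces
            step *= 1.2   # accepted: try a larger step next time
        else:
            step *= 0.5   # rejected: retry with a smaller step
    return x, energy, max(abs(f) for f in forces), max_steps


start = [0.5, -0.3, 0.8]            # hypothetical starting coordinates (nm)
start_energy, _ = harmonic(start)
final, final_energy, final_fmax, n_iter = steepest_descent(start, harmonic)
print(final_fmax < 100.0, final_energy < start_energy)
```

The default `emtol=100.0` matches the 100 kJ/mol/nm threshold quoted in the text; everything else is chosen only to keep the example small.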
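The run lengths above translate into integration step counts by dividing the duration by the time step. The sketch below shows that arithmetic and renders a partial GROMACS-style parameter listing. The option names are standard GROMACS `.mdp` keys, but the mapping of the text onto them is an assumption, and the listing is incomplete (for example, coupling groups and time constants are omitted), so it should not be read as the parameter file used in this study.

```python
def n_steps(duration_ps, dt_ps=0.002):
    """Number of integration steps for a run of duration_ps picoseconds."""
    return round(duration_ps / dt_ps)


def render_mdp(params):
    """Render a dict as aligned 'key = value' lines."""
    width = max(len(key) for key in params)
    return "\n".join(f"{key:<{width}} = {value}"
                     for key, value in params.items()) + "\n"


# Assumed mapping of the production-run description onto .mdp options.
production = {
    "integrator": "md",
    "dt": 0.002,                      # 2 fs time step, in ps
    "nsteps": n_steps(30_000),        # 30 ns = 30,000 ps
    "tcoupl": "nose-hoover",
    "ref_t": 300,                     # K
    "pcoupl": "Parrinello-Rahman",
    "ref_p": 1.0,                     # bar
    "constraints": "all-bonds",
    "constraint_algorithm": "lincs",
    "coulombtype": "PME",
    "pme_order": 4,                   # order 4 = cubic interpolation
    "fourierspacing": 0.16,           # nm
}

equilibration_steps = n_steps(100)    # each 100 ps equilibration phase
mdp_text = render_mdp(production)
print(mdp_text)
```

With a 2 fs time step, each 100 ps equilibration phase corresponds to 50,000 steps and the 30 ns production run to 15,000,000 steps.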

Results:
The EgKASII and EoKASII protein sequences were retrieved, and their secondary structures were predicted. The topologies of the EgKASII and EoKASII proteins, showing their secondary structures, are presented in Figure 1 and Figure 2, respectively. The 3D structures predicted for the EgKASII and EoKASII proteins by homology modelling with MODELLER cover amino acids 120 to 567 and 145 to 562, respectively. The 3D structures predicted for the EgKASII and EoKASII proteins by homology modelling (MODELLER) and by the ab-initio method (I-TASSER) are shown in Figure 3.
The superimposition of the active sites of EgKASII (Cys316, His456, His492) and EoKASII (Cys316, His453, His489) on the respective templates used in homology modelling is depicted in Figure 4A & B. The superimposition of the EgKASII and EoKASII active sites from the structures generated by MODELLER and I-TASSER is shown in Figure 4C & D, respectively. The total energy of the protein models after energy minimization and equilibration is shown in supplementary Figure 1. All four models were evaluated with PROCHECK for the stereochemical quality of the protein structures. A comparison of these structures using Ramachandran plots is shown in supplementary Figure 2. Other details, such as the main-chain and side-chain parameters and the proportion of bond lengths, bond angles and planar groups within limits, are given in Table 1 (see supplementary material). The RMSF for the ab-initio generated models of EgKASII and EoKASII is shown in supplementary Figure 3. Most of the residues, including the active-site residues, fluctuate within the range of 0.1 to 0.2 nm, which is acceptable. The radius of gyration plot (supplementary Figure 4) shows that the EgKASII and EoKASII proteins remain compact and stably folded after 30 ns (30,000 ps) of simulation.
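For reference, the two trajectory measures reported above can be written out in a few lines of Python. This is a minimal sketch of the standard definitions, not the analysis tool used in this study. It assumes the frames have already been fitted to a common reference, uses equal masses unless masses are supplied, and runs on a made-up two-atom trajectory.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about the time-averaged position.
    trajectory: list of frames, each a list of (x, y, z) tuples in nm."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames
                for d in range(3)]
        msd = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def radius_of_gyration(frame, masses=None):
    """Mass-weighted radius of gyration of one frame (nm)."""
    if masses is None:
        masses = [1.0] * len(frame)   # assumption: equal masses
    total = sum(masses)
    com = [sum(m * atom[d] for m, atom in zip(masses, frame)) / total
           for d in range(3)]
    weighted = sum(
        m * sum((atom[d] - com[d]) ** 2 for d in range(3))
        for m, atom in zip(masses, frame)
    )
    return math.sqrt(weighted / total)


# Toy trajectory: atom 0 oscillates along x, atom 1 stays fixed.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.0, 0.0, 0.0)],
]

fluctuations = rmsf(toy_trajectory)
rg_first_frame = radius_of_gyration(toy_trajectory[0])
print(fluctuations, rg_first_frame)
```

A roughly constant radius of gyration over the trajectory is what the text interprets as the protein remaining compact and stably folded.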

Discussion:
The secondary structures of both the EgKASII and EoKASII proteins consist mostly of alpha helices (45%) and coils (43%), with only 12% beta strands. The EgKASII protein sequence is known to be highly (95%) similar to the EoKASII protein sequence [2]. Consistent with this, the secondary structure analyses suggest that both proteins have the same number of alpha helices, coils and beta strands. The analysis also suggests that most of the differences between the EgKASII and EoKASII protein sequences reported previously [2] are located in the loop regions. This is in line with commonly observed evolutionary patterns in proteins [27].
The template 1OX0_A (Streptococcus pneumoniae KASII) was used for the EgKASII protein structure prediction on the basis of its resolution (1.3 Å) and the highest score calculated by MODELLER [15]. For the EoKASII protein structure prediction, the most suitable template was 3KZU_A (Brucella melitensis KASII). The 3D model structures produced by homology modelling with MODELLER cover amino acids 120 to 567 of EgKASII and 145 to 562 of EoKASII. Superimposition of the alpha carbons of EgKASII and EoKASII on their respective templates gave RMSDs of 1.54 Å and 1.92 Å, respectively. These RMSD values are within the range of accuracy attainable for a model [28].
We predicted the three-dimensional (3D) structures of the KASII proteins of both Elaeis species and compared them at the sequence and structural levels. We believe that the predicted 3D structures should be close to the real structures of the respective proteins. However, we suggest further wet-lab experimental work to validate these predicted structures using X-ray crystallography or NMR techniques [9,10]. Similarly, the active-site residues of the KASII proteins have been identified computationally but have not been tested experimentally. To confirm the predicted structures and their active sites, molecular docking and simulation of the complexes formed between the predicted protein structures and their respective substrates need to be carried out for the structures predicted by MODELLER, so that the 3D structures predicted by MODELLER and I-TASSER can be compared.
Palm oil derived from the African oil palm contains a high amount (~54%) of saturated fatty acids [3] in comparison to palm oil obtained from the American oil palm [4]. A systematic study of the key oil-palm genes involved in the fatty acid biosynthesis pathway, together with comparative modelling of the key proteins in fatty acid biosynthesis, will help to elucidate their unique features. Molecular modelling can be used to understand and predict the microscopic and macroscopic properties of proteins [29]. It is also useful for studying the binding affinity of enzymes [30], for virtual screening of natural products [31] and for predicting molecular interactions [32], and it saves time and money. Our aim was to use molecular modelling tools to predict the 3D structures of the African and American oil palm KASII proteins.
Recently, the Malaysian Palm Oil Board (MPOB) and collaborators published the genomes of the oil palms (Elaeis guineensis Jacq. and E. oleifera) [33]. They also reported a unique gene that controls the oil yield in oil palm fruits [34]. These advances in oil palm research will have a significant impact on the oil palm industry. It is estimated that there are at least 34,802 genes in the oil palm genome. Determining the structures of all oil palm proteins by wet-lab work would require an impractical amount of experimental effort. Molecular modelling, however, is useful for understanding the structural features of important proteins in a short time. For a definitive understanding, protein structures should be determined by NMR or X-ray crystallography. Nevertheless, the 3D structures predicted in this study could serve as a foundation for further research on the oil palm KASII protein and could contribute to a clearer understanding of the fatty acid biosynthesis pathway in oil palm.
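Composition figures such as those quoted at the start of this discussion are simple fractions of a per-residue secondary-structure assignment. The sketch below assumes a three-state string in which H is helix, E is strand and C is coil. The toy string is constructed to reproduce the reported percentages and is not the actual predicted assignment for either protein.

```python
from collections import Counter


def secondary_structure_composition(ss_string, states="HEC"):
    """Percentage of residues in each secondary-structure state."""
    if not ss_string:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss_string)
    total = len(ss_string)
    return {state: 100.0 * counts.get(state, 0) / total for state in states}


# Toy 100-residue assignment, for illustration only.
toy_assignment = "C" * 43 + "H" * 45 + "E" * 12

composition = secondary_structure_composition(toy_assignment)
print(composition)
```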
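The RMSD between matched alpha carbons is the square root of the mean squared distance between paired atoms. The minimal sketch below assumes that the two coordinate sets are already optimally superimposed and listed in the same residue order. It does not perform the superposition itself, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates (Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha traces: the template is the model shifted by 1 Å in y.
model_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

value = rmsd(model_ca, template_ca)
print(value)
```

In practice the optimal rotation and translation are found first (for example with the Kabsch algorithm or a structure viewer's alignment command), and the RMSD is then evaluated as above.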

Conclusion:
We predicted the three-dimensional structures of the E. guineensis and E. oleifera KASII proteins. The RMSF values for the three active-site residues of EgKASII and EoKASII were around 0.1 nm. Both structures remained compact and stably folded after 30 ns of simulation, with average radii of gyration of 2.41 nm and 2.38 nm for EgKASII and EoKASII, respectively. Molecular docking and simulation studies are required to understand the interactions between the predicted KASII proteins and their substrates. In addition, further wet-lab research is required to validate the predicted structures.

Acknowledgement:
SJB acknowledges the financial support from Malaysia's Ministry of Education to EW under the MyMaster programme. The authors are grateful to the University Putra Malaysia, University Technology Malaysia, and University Technical Malaysia (Melaka) for providing access to their facilities.