Prediction of the three-dimensional structure of serine/threonine protein kinase Pto of Solanum lycopersicum by homology modelling

The resistance gene Pto of Solanum lycopersicum interacts with the avrPto gene product of the bacterial pathogen Pseudomonas syringae pv. tomato to launch a cascade of molecular events that triggers the hypersensitive disease-resistance response in tomato. This paper describes an attempt to predict the structure of the serine/threonine protein kinase encoded by Pto in order to understand its mechanism and function. A three-dimensional model based on the crystal structures of the effector protein AvrPtoB complexed with kinase Pto and of the bacterial effector protein AvrPto was generated using Modeller 9v7. We adopted different modelling approaches for our study: initially, we generated a model based on a single template protein, and then a model based on multiple templates. The models generated through these approaches were further assessed with ANOLEA energy assessment, the RAMPAGE server and PROCHECK for stereochemistry and geometry checks. Comparative analysis suggested that the model generated was better than the templates. This study paves the way for generating computational molecular models of proteins whose crystal structures are not available, which would aid in studying protein-protein interactions.


Background:
Pto encodes a cytoplasmic serine/threonine protein kinase [1] that interacts with the avrPto gene product of the bacterial pathogen Pseudomonas syringae pv. tomato [2]. The interaction appears to launch a cascade of molecular events that triggers the hypersensitive disease-resistance response [3]. These experiments provided the first molecular confirmation of Flor's (1956) gene-for-gene hypothesis, which predicted that a host resistance (R) gene encodes a receptor that recognizes a ligand encoded or produced by the corresponding Avr gene. Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleoside triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change that affects protein function. The enzymes are classified into two broad classes based on substrate specificity: serine/threonine-specific and tyrosine-specific [4]. Protein kinases also play a major role in a multitude of cellular processes, including division, proliferation, apoptosis and differentiation [5]. The catalytic subunits of protein kinases are highly conserved, and several structures have been determined [6,7].

Methodology:
The amino acid sequence of serine/threonine protein kinase Pto of Solanum lycopersicum (311 amino acids) was retrieved (Accession No: AAB47421) from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov). A BLAST [8] search (PDB-BLAST) [9] was performed with the amino acid sequence of serine/threonine protein kinase Pto. The PDB-BLAST search returned two best entries: the crystal structure of effector protein AvrPtoB complexed with kinase Pto (PDB ID: 3HGK, Chain A) [10] and bacterial effector protein AvrPto (PDB ID: 2QKW, Chain B) [11]. Both proteins share a sequence identity of 83% (269/321) with serine/threonine protein kinase Pto. Additionally, their crystal structures have resolutions of 3.30 Angstrom (3HGK) and 3.20 Angstrom (2QKW) respectively, making them excellent reference templates for homology modelling. In the present study, we used three different modelling approaches with the Modeller 9v7 [12] software to model the three-dimensional structure of serine/threonine protein kinase Pto. First, we aligned the sequence of serine/threonine protein kinase Pto with the sequence of the template protein 3HGK (Accession No: 3HGKA) using the ALIGN2D command of Modeller 9v7 and built a model. In the next phase, we aligned the sequence of serine/threonine protein kinase Pto with the sequence of the template protein 2QKW (Accession No: 2QKWB) using Modeller 9v7 to build another model. Finally, we created a model using multiple templates, Chain A of 3HGK and Chain B of 2QKW. The rough models generated were further refined using the loop.py script in Modeller 9v7.
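Modeller reads the target sequence from a PIR-format file before the ALIGN2D step. The sketch below shows one way such an entry could be written from a retrieved sequence. The alignment code "pto" and the 20-residue string are placeholders introduced here for illustration; the string is not the Pto sequence, and the helper to_pir is hypothetical rather than part of Modeller.

```python
import textwrap


def to_pir(code: str, sequence: str, width: int = 75) -> str:
    """Format a target sequence as a PIR entry of the kind Modeller reads."""
    # Drop whitespace and line breaks that a FASTA download may contain.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    header = f">P1;{code}"
    # Ten colon-separated fields; a bare target sequence only needs the first two.
    description = f"sequence:{code}:::::::0.00: 0.00"
    # The sequence must end with '*'; long sequences are wrapped to 'width' columns.
    body = textwrap.fill(seq + "*", width=width)
    return "\n".join([header, description, body]) + "\n"


# Placeholder residues, NOT the real Pto sequence (assumption for illustration).
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
print(to_pir("pto", toy_sequence))
```

The real 311-residue sequence would be substituted for the placeholder, and the resulting file passed to the alignment script.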

Discussion: Sequence analysis, alignment and Model generation
Initially, a structural analysis of the two template structures 3HGK and 2QKW was performed using ICM Molsoft Browser (www.molsoft.com/icm_browser.html), which revealed that the crystal structure of Chain A of 3HGK consisted of only 288 amino acid residues (LYS31-GLU312), whereas the crystal structure of Chain B of 2QKW consisted of 292 amino acid residues (PRO30-ILE321). Therefore, we manually trimmed the N-terminal residues from MET01 up to PHE29 of our target sequence and used the segment PRO25-ILE311 (a total of 287 residues) for homology modelling to generate the best-fitting structure. For convenience, PRO25 was designated as the first residue, i.e. PRO01.
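The renumbering described above amounts to a fixed offset. As a minimal sketch, assuming the modelled segment runs from residue 25 to residue 311 of the original sequence, the mapping can be written as follows (the function name renumber is a hypothetical helper):

```python
def renumber(original_index: int, first_kept: int = 25, last_kept: int = 311) -> int:
    """Map a residue number in the full sequence to its number in the modelled segment."""
    if not first_kept <= original_index <= last_kept:
        raise ValueError(f"residue {original_index} is outside the modelled segment")
    # The first retained residue becomes residue 1.
    return original_index - first_kept + 1


segment_length = renumber(311)
print(renumber(25), segment_length)
```

With these bounds PRO25 maps to 1 and ILE311 maps to 287, which matches the stated segment length of 287 residues.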
Using 3HGK as a template, we created an alignment (using the ALIGN2D command of Modeller 9v7) between the sequence of serine/threonine protein kinase Pto and the sequence of Chain A of 3HGK, and generated five different structures in 185.80 seconds using the Modeller 9v7 program. Of the five structures generated, the best was chosen based on the molecular Probability Density Function, the DOPE score [13] and the GA341 score [14]. The best structure had a molecular Probability Density Function of 2520.30786, a DOPE score of -29699.56836 and a GA341 score of 1.0. In the next phase, 2QKW was used as a template to create another alignment between the sequence of serine/threonine protein kinase Pto and the sequence of Chain B of 2QKW. We again generated five structures and chose the best structure, with a molecular Probability Density Function of 2494.73193, a DOPE score of -30093.53711 and a GA341 score of 1.0. Finally, using both templates, we created an alignment between serine/threonine protein kinase Pto, 3HGK and 2QKW. Using this alignment we generated another five structures and chose the best structure, with a molecular Probability Density Function of 9821.59473.
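The text does not state how the three scores were weighed against each other. One plausible selection rule, sketched here as an assumption, keeps models whose GA341 score passes a reliability cutoff (0.7 is assumed) and then takes the lowest, i.e. most negative, DOPE score. The example applies that rule to the two single-template scores reported above; the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    name: str
    molpdf: float
    dope: float
    ga341: float


def pick_best(models: List[ModelScore], ga341_cutoff: float = 0.7) -> ModelScore:
    """Return the model with the lowest DOPE score among those passing the GA341 cutoff."""
    reliable = [m for m in models if m.ga341 >= ga341_cutoff]
    if not reliable:
        raise ValueError("no model passes the GA341 cutoff")
    # Lower (more negative) DOPE indicates a more favourable model.
    return min(reliable, key=lambda m: m.dope)


# Scores of the best single-template models as reported in the text.
reported = [
    ModelScore("3HGK-based", 2520.30786, -29699.56836, 1.0),
    ModelScore("2QKW-based", 2494.73193, -30093.53711, 1.0),
]
best = pick_best(reported)
print(best.name, best.dope)
```

DOPE scores are only comparable between models of the same sequence, which holds here because both models cover the same 287-residue target segment.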

Model refinement and assessment:
The best structure generated using 3HGK as a template was evaluated with the ANOLEA server [15,16], which gave a total non-local energy of -742. The structure was further assessed with a Ramachandran plot on the RAMPAGE server [17,18], which showed 250 residues in the favoured regions, 27 residues in the allowed regions and 8 residues in the outlier region. The ANOLEA energy assessment revealed that the energies of a few loop regions were high, with positive values; therefore, loop refinement was performed [19], which generated a structure with a total non-local energy of -902. The refined structure showed 261 residues in the favoured regions, 22 residues in the allowed regions and 2 residues in the outlier region of the Ramachandran plot. The best structure generated using 2QKW as a template was likewise evaluated with ANOLEA [15,16] and showed a total non-local energy of -871. On the RAMPAGE server [17,18] it showed 255 residues in the favoured regions, 22 residues in the allowed regions and 8 residues in the outlier region. Further loop refinement of the high-energy loop regions generated a structure with a total non-local energy of -1110, with 271 residues in the favoured regions, 13 residues in the allowed regions and 1 residue in the outlier region of the Ramachandran plot.
Similarly, the best structure generated using both 3HGK and 2QKW as templates was evaluated with ANOLEA [15,16] and showed a total non-local energy of -886. On the RAMPAGE server [17,18] it showed 255 residues in the favoured regions, 20 residues in the allowed regions and 10 residues in the outlier region. As the ANOLEA energy assessment displayed a few loop regions with high energy, we performed loop refinement on those regions and generated a structure with a total non-local energy of -1043, with 257 residues in the favoured regions, 19 residues in the allowed regions and 9 residues in the outlier region of the Ramachandran plot. A detailed comparison of the ANOLEA energy assessment, Ramachandran analysis and Z-scores [20] of the three models built on the templates 3HGK, 2QKW and 3HGK-2QKW, alongside the template structures 3HGK and 2QKW, is shown in Table 1.
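To compare the effect of loop refinement across the three models, the reported Ramachandran counts can be converted to percentages and the ANOLEA energies to differences. The sketch below uses only the figures given in the text; the class name and layout are illustrative assumptions, not output from the RAMPAGE or ANOLEA servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    energy: float   # ANOLEA total non-local energy, as reported in the text
    favoured: int   # Ramachandran residue counts, as reported in the text
    allowed: int
    outlier: int

    @property
    def residues(self) -> int:
        return self.favoured + self.allowed + self.outlier

    def favoured_pct(self) -> float:
        return 100.0 * self.favoured / self.residues


# (before refinement, after refinement) for each model.
reported = {
    "3HGK": (Assessment(-742, 250, 27, 8), Assessment(-902, 261, 22, 2)),
    "2QKW": (Assessment(-871, 255, 22, 8), Assessment(-1110, 271, 13, 1)),
    "3HGK-2QKW": (Assessment(-886, 255, 20, 10), Assessment(-1043, 257, 19, 9)),
}

for name, (before, after) in reported.items():
    print(
        f"{name}: favoured {before.favoured_pct():.1f}% -> {after.favoured_pct():.1f}%, "
        f"energy change {after.energy - before.energy:+.0f}"
    )
```

Every set of counts sums to 285 residues, so the percentages are directly comparable between models.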

Structural Comparison:
The accuracy of a homology model is straightforward to assess when the experimental structure is known. In our study we used the most common method of comparing two protein structures, the root-mean-square deviation (RMSD) metric, which measures the average distance between the corresponding atoms of two superimposed structures. We superimposed the backbones of the built models on their templates and computed the RMSD: the 3HGK-based model over the template 3HGK (RMSD 1.33), the 2QKW-based model over the template 2QKW (RMSD 2.17) and the 3HGK-2QKW-based model over the templates 3HGK and 2QKW (RMSD 1.72), using ICM Molsoft Browser (www.molsoft.com/icm_browser.html) (Figure 1).
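For reference, the RMSD of two structures that have already been superimposed is the square root of the mean squared distance between paired atoms. The sketch below computes it for two equal-length lists of coordinates. It does not perform the superposition itself, and the coordinates are toy values chosen for illustration, not atoms from the Pto models.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between paired atoms of two structures that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between one pair of corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (assumption): the second set is the first shifted by 2 units along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
print(rmsd(model, template))
```

In practice the paired atoms would be the backbone atoms of aligned residues, and the superposition would be done first by the structure viewer.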
Additionally we performed a comparative analysis on the secondary structure of the three built models and found out the