Prediction of kinase-inhibitor binding affinity using energetic parameters

The combination of physicochemical properties and energetic parameters derived from protein-ligand complexes play a vital role in determining the biological activity of a molecule. In the present work, protein-ligand interaction energy along with logP values was used to predict the experimental log (IC50) values of 25 different kinase-inhibitors using multiple regressions which gave a correlation coefficient of 0.93. The regression equation obtained was tested on 93 kinase-inhibitor complexes and an average deviation of 0.92 from the experimental log IC50 values was shown. The same set of descriptors was used to predict binding affinities for a test set of five individual kinase families, with correlation values > 0.9. We show that the protein-ligand interaction energies and partition coefficient values form the major deterministic factors for binding affinity of the ligand for its receptor.

The three-dimensional structures of proteins with bound ligand are available in the Protein Data Bank [11] along with their experimental binding affinity information. Binding affinity data such as Ki, Kd, IC50 etc. obtained from experimental studies are also available in databases such as BindingDB [12], Binding MOAD [13], PDBbind [14] etc. Availability of valuable resources regarding kinase inhibitors made computational biologists to develop statistical models to accurately predict the binding affinity of complexes.
Structure-based virtual screening methods use docking programs to explore the possible binding modes of a ligand within the target binding site, and scoring functions to estimate the affinity of the ligand for the binding site [15,16]. While docking methods at present are in general successful in predicting the correct binding conformations of ligand molecules, they do not perform well in correctly predicting the binding affinity for the predicted ligand conformations [2]. Hence, it is essential to predict the binding affinity of a given ligand to its target known as the 'scoring problem' [17]. Table 1: Experimentally observed and predicted IC50 values for 25 kinase-inhibitor complexes.
Note: The deviation between the predicted and experimental IC50 values is given in parenthesis.
As a pioneering work, Bohm [18] (1994) developed a simple empirical function (LUDI) to estimate the binding constant for a protein-ligand complex of known structure. This empirical scoring function takes into account hydrogen bonds, ionic interactions, the lipophilic protein-ligand contact surface and the number of rotatable bonds in the ligand. Head et al. (1996) in their VALIDATE approach used electrostatic and steric interaction energies, octanol-water partition coefficient, polar and nonpolar contact surfaces, and a term to describe intramolecular flexibility [19]. Following the approach of Bohm, [18] Eldridge et al. [20] (1997) included intramolecular flexibility in ChemScore and Wang et al. [21] (1998) classified hydrogen bonds and included the occurrence of interstitial water molecules in SCORE. Based on the statistical analysis of experimentally observed distributions While the above methods use the known three dimensional structures to predict binding affinities, the Quantitative Structure-Activity Relationship (QSAR) methods serve as an alternative way of binding affinity predictions in the absence of 3D structure of target proteins or their complexes with ligands. These methods make use of physicochemical and structural properties (descriptors) of ligands to relate their biological activity using regression methods. Combined QSAR approaches in binding affinity predictions have been recently reported [29,30].
In the present work, we have correlated the experimental IC50 values (in their logarithmic form) of 25 different kinase-inhibitor complexes with their protein-interaction energy and partition coefficient (logP) values for multiple regression analysis, which shows a good correlation with the experimental IC50 values. This shows that the protein-ligand interaction energies and logP values form the major factors that determine the ligand binding affinity of proteins. By incorporating these energetic as well as solvent terms, docking methodologies can be highly successful in predicting the binding affinity for the generated poses of their correct ligand binding modes.

Methodology Information resources
Twenty five different protein kinase-inhibitor complexes solved by x-ray crystallography method were obtained from Protein Data Bank [11]. The complexes in the dataset have resolution less than 2.5 Å with known IC50 values were treated as training set. The number of non-hydrogen atoms of the ligands and energetic profile comprising of i) total ligand-receptor interaction energy, ii) van der Waals energy, iii) electrostatic energy, iv) hydrogen bond energy, v) solvation free energy, vi) conformational entropy and vii) ligand-water-receptor binding energy were obtained from the PEARLS server for each of the kinase-inhibitor complexes. The PEARLS server uses the AMBER force field [31] for computing the above energetic contributions [28]. LogP (octanol/water partition coefficient) values of the ligands were calculated from the Molinspiration server [32] by providing SMILES code of the ligand as input.

Training set construction and validation
Multiple regression analysis was carried out to establish a relationship between the above-mentioned descriptors and experimental log (IC50). A back-check test was carried out for predicting the binding affinity by re-substituting the values in the regression equation obtained. For the jack-knife test, coefficients of multiple regressions were determined using (n-1) data (omitting one protein-ligand complex at a time) and then predicting binding affinity of the omitted protein-ligand complex.

Test set information
The regression equation obtained from the training set was tested on i) a set of 93 kinase-inhibitor complexes with IC50 values, and ii) a set of 9 approved kinase inhibitors [2].
To further assess the predominant role of the chosen descriptors in binding affinity predictions, the experimental log (IC50) values were regressed with the same set of nine descriptors in five independent protein-kinase families comprising 17 cyclic AMPdependent kinase-inhibitors, 12 casein kinase-inhibitors, 15 hepatocyte growth factor receptor kinase-inhibitors, 12 cyclindependent kinase-inhibitors and 16 mitogen-activated kinaseinhibitors. For each of the five kinase families, five different regression equations were obtained which were then validated by back-check analysis. The dataset information of all the kinaseinhibitor complexes used in the present study, including PDB ID, protein name, ligand ID, x-ray resolution (Å), experimental IC50 values (nM) with their logarithmic form, and descriptor values are provided in the Appendix.

Discussion
The following multiple regression equation (1) Figure 1a and 1b. Note: The mean value between the logarithm of minimum and maximum experimental IC50 values are given in parenthesis. Note: The deviation between the predicted and experimental IC50 values is given in parenthesis.

a) Test set of diverse protein kinases (Test set I)
The regression equation (1) obtained was tested on 93 kinaseinhibitor complexes (results are provided as Table 7 in the Supporting Information file) and the relationship between the experimental and predicted log (IC50) values is presented as a scatter plot (Figure 2). An average deviation of 0.92 from the original log (IC50) values was observed for the 93 kinase-inhibitor complexes. The difference between the experimental and calculated log (IC50) values was found to be less than ±1 log unit for 64 out of 93 kinase-inhibitor complexes.

b) Approved kinase inhibitors as test set (Test set II)
To further test the predictability of our regression equation (1) Using the regression equation (Eq. 2), log (IC50) values for 17 cyclic AMP-dependent protein kinase-inhibitor complexes were predicted. The experimental as well as predicted log (IC50) values are presented (Table 3) and plotted (Figure 3a). The average deviation for the back-check test was 0.28 from the experimental values.
ii) Casein kinase-inhibitor complexes 12 casein kinase-inhibitor complexes were taken for the multiple regression analysis which has shown a good correlation of r = 0.97 for the regression equation (3) log (IC50) = 0.01 NHA -1. The set of 12 casein kinase-inhibitor complexes with their experimental and predicted values has been provided ( Table 4).
The scatter plot shows the relationship between the experimental and predicted IC50 values (Figure 3b), the average deviation being 0.02 for back-check predictions.

iii) Hepatocyte growth factor receptor kinase-inhibitor complexes
A set of 15 hepatocyte growth factor receptor kinase-inhibitors has shown a correlation coefficient value of 0.90 when subjected to regression with multiple descriptors, the equation (4)  The experimental and predicted log (IC50) values are provided in Table 5. The correlation between experimental and calculated values for the 15 hepatocyte growth factor receptor kinaseinhibitors is shown in Figure 3c. An average deviation of 0.31 was observed.

iv) Cyclin-dependent kinase-inhibitor complexes
A very good correlation of r = 0.94 was obtained for 12 cyclindependent kinase-inhibitor dataset using the regression equation The predicted results of 12 cyclin-dependent kinase-inhibitor complexes are tabulated ( Table 6). The average deviation value from the experimental value was found to be 0.49. The results are plotted (Figure 3d).

Note:
The deviation between the predicted and experimental IC50 values is given in parenthesis.

v) Mitogen-activated protein kinase-inhibitor complexes
The multiple regression analysis of 16 mitogen-activated protein kinase-inhibitors gave a correlation of r = 0.94 using the regression equation (6) log (IC50) = -0. 13  The observed and computed values for a dataset of 16 mitogenactivated protein kinase-inhibitors are presented (Table 7) showing an average deviation value of 0.29. The predicted IC50 values were plotted against the experimental values (Figure 3e). In the present study, we have used the various energetic components as independent variables along with logP values, to predict the experimental binding affinity. This set of descriptors developed from a small set of 25 kinase-inhibitor complexes were able to predict IC50 values for 93 test set complexes spanning 4 orders of magnitude of IC50 values. The same set of descriptors was also found to be suitable for family specific regression models as well. Hepatocyte growth factor receptor kinase-inhibitor complexes; (d)12 cyclin-dependent kinase-inhibitor complexes; (e) 16 mitogen-activated protein kinase-inhibitor complexes.
As docking methods improve to reproduce conformations observed through x-ray crystallographic and NMR determined structures, it will be possible to use our present approach to predict the IC50 values for various protein targets, more significantly for specific protein families. Alternatively, if IC50 values for kinase-inhibitor complexes are known, the method can also be used to predict the pose of a given ligand as well.

Conclusion:
Despite intensive research over more than two decades, accurate prediction of the binding affinities of large set of diverse protein ligand complexes remains one of the most important open problems in computational molecular biology [42]. The issues currently being addressed are the scoring of modelled protein conformations, and including the binding free energy due to presence of water molecules [43]. In the present work, we have addressed these issues by using energetic and solvent descriptors to predict the binding affinity of kinase-inhibitor complexes using multiple regression analysis. A high correlation value of 0.9 between the predicted and experimental binding affinity was obtained for a test set of kinase-inhibitor complexes. The method was validated by predicting a test of 93 kinase-inhibitor complexes covering five kinase families which has shown a good predictive ability. Our methodology can provide valuable insights for the prediction accuracy of molecular docking strategies. Further studies will be required to validate the general applicability of these set of descriptors to predict the binding affinity for a diverse set of enzyme-inhibitor complexes.