CoMFA and CoMSIA 3D-QSAR analysis of DMDP derivatives as anti-cancer agents.

Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) based on three dimensional quantitative structure–activity relationship (3D-QSAR) studies were conducted on a series (78 compounds) of 2, 4-diamino-5-methyl-5-deazapteridine (DMDP) derivatives as potent anticancer agents. The best prediction were obtained with a CoMFA standard model (q2 = 0.530, r2 = 0.903) and with CoMSIA combined steric, electrostatic, hydrophobic and hydrogen bond donor fields (q2 = 0.548, r2 = 0.909). Both models were validated by a test set of ten compounds producing very good predictive r2 values of 0.935 and 0.842, respectively. CoMFA and CoMSIA contour maps were then used to analyze the structural features of ligands to account for the activity in terms of positively contributing physiochemical properties such as steric, electrostatic, hydrophobic and hydrogen bond donor fields. The resulting contour maps produced by the best CoMFA and CoMSIA models were used to identify the structural features relevant to the biological activity in this series of analogs. This study suggests that the highly electropositive substituents with low steric tolerance are required at 5 position of the pteridine ring and bulky electronegatve substituents are required at the meta-position of the phenyl ring. The information obtained from CoMFA and CoMSIA 3-D contour maps can be used for the design of deazapteridine-based analogs as anticancer agents.


Background:
The enzyme dihydrofolate reductase (DHFR) (5, 6, 7, 8tetrahydrofolate, NADP + oxido-reductase EC 1.5.1.3), utilizing NADPH as a cofactor, catalyzes the reduction of dihydrofolate to tetrahydrofolate. Tetrahydrofolate and its derivatives are essential cofactors in the synthesis of thymidylate, purines and some amino-acids. Inhibition of DHFR results in the depletion of reduced folate pools, inhibition of DNA synthesis, and cell death. Therefore, DHFR has proven to be an important target in cancer therapy. Methotrexate (MTX), a tight binding inhibitor of DHFR, is used to treat patients with leukemia and solid tumors in addition to a variety of other non-malignant diseases. The anti-folate methotrexate has been rationally-designed nearly 60 years ago to potently block the folate-dependent enzyme DHFR thereby achieving temporary remissions in childhood acute leukemia.
DHFR is not a new target. However, there is active enthusiasm for the development of improved derivatives for DHFR specific inhibitor [1-6]. A unique feature of DHFR is the selectivity that is possible in the design of inhibitors. This makes it an ideal yet old target for rational and effective design for anticancer agents. The anti-folate compounds evaluated in this investigation are derivatives of DMDP, having structures similar to the trimetrexate/piritrexim class of anti-folates. Due to an interest in new anticancer drugs, several DMDP inhibitors were chosen from the Southern Research Institute chemical repository for screening against human DHFR. [7]. A sound understanding of the structural requirements for anticancer activity in DMDP is important in guiding and optimizing drug design efforts.
However, there is no comprehensive structure activity relationship study of DHFR in the literature.
QSAR based on the 3D structures of ligands involve two methods namely CoMFA [8] and CoMSIA [9]. Recently, more advanced techniques have attempted to model the receptor environment for accommodating ligand structure. QSAR studies incorporate 3D data for ligands and provide a more detailed analysis of ligandreceptor interactions. Here, we describe 3D QSAR studies of 78 CoMFA and CoMSIA. Thus, the resulting CoMFA and CoMSIA studies will not only illustrate the conformation or spatial orientation of anti-cancer DMDP derivatives but also provide useful indicators for the design of new drug candidates for cancer. These results are applicable to the prediction of the activities of new DHFR inhibitors and provide structural implications for designing potent and selective DHFR inhibitors as anticancer agents.

Methodology: Molecular structures and optimization
Seventy eight molecules selected for the present study were taken from an earlier report (7). The structures of the compounds and their biological data are given in Table 1 (see supplementary  material). The IC 50 values were converted to the corresponding pIC 50 (-logIC 50 ) and used as dependent variables in CoMFA and CoMSIA analysis. The pIC 50 values span a range of 3-log units providing a broad and homogenous data set for 3D-QSAR study. The 3D QSAR models were generated using a training set of 68  Table 1 under supplementary material). The test compounds were selected manually such that the structural diversity and wide range of activity in the data set were included.

Molecular alignment
CoMFA results may be extremely sensitive to a number of factors such as alignment rules, overall orientation of the aligned compounds, lattice shifting step size and probe atom type [10]. The accuracy of prediction of CoMFA models and the reliability of the contour models depend strongly on the structural alignment of the molecules [10]. Thus, we applied molecular alignment to align all the molecules used in the present study. The molecular alignment was achieved by SYBYL routine database align. The most active compound (compound 63) was used as an alignment template and the rest of the molecules were aligned to it by using the common substructure. Partial atomic charges were calculated using the MMFF94 charges.

CoMFA studies
Steric and electrostatic interactions were calculated using the Tripos force field [11] with a distance-dependent dielectric constant at all interactions in a regularly spaced (2Å) grid taking a sp 3 carbon atom as steric probe and a +1 charge as electrostatic probe. The cutoff was set to 30kcal/mol. With standard options for scaling of variables, the regression analysis was carried out using the fully cross-validated partial least squares (PLS) method (leave one out) [12]. The minimum sigma (column filtering) was set to 2.0 kcal/mol to improve the signal to noise ratio by omitting those lattice points whose energy variation was below this threshold. The final model which is non cross-validated conventional analysis was developed with the optimum number of components to yield a non cross-validated r 2 value.

CoMSIA studies
In CoMSIA, a distance-dependent Gaussian-type physicochemical property has been adopted to avoid singularities at the atomic positions and dramatic changes of potential energy for those grids in the proximity of the surface. With the standard parameters and no arbitrary cutoff limits, three physico-chemical properties, namely, steric, electrostatic and hydrophobic fields were calculated. The steric contribution was reflected by the third power of the atomic radii of the atoms. Electrostatic properties were introduced as atomic charges resulted from molecular docking. An atom-based hydrophobicity was assigned according to the parameterization developed by Ghose and colleagues [13]. The lattice dimensions were selected with a sufficiently large margin (>4Å) to enclose all the binding conformations of the inhibitors. In general, similarity indices, A F,K between the compounds of interest were computed by placing a probe atom at the intersections of the lattice points using Equation 1 (see supplementary material).
In the present study, similarity indices were computed using a probe atom (W probe,k ) with charge +1, radius 1Å, hydrophobicity +1, and attenuation factor a of 0.3 for the Gaussian type distance. The statistical evaluation for the CoMSIA analyses was performed in the same way as described for CoMFA. Partial least square (PLS) analysis PLS method (8) was used to linearly correlate the CoMFA fields to biological activity values. The cross-validation was performed using leave-one-out (LOO) method in which one compound is removed from the dataset and its activity is predicted using the model derived from the rest of the molecules in the dataset. Equal weights for CoMFA were assigned to steric and electrostatic fields using CoMFA STD scaling option. To speed up the analysis and to reduce noise, a minimum column filter value of 2.0 kcal/mol was used for the cross-validation. Non-cross-validation was performed to calculate conventional r 2 using the same number of components. To further assess the robustness and statistical confidence of the derived models, bootstrapping analysis for 100 runs was performed (14) and (15). Bootstrapping involves the generation of many new data sets from original dataset and is obtained by randomly choosing samples from the original dataset. The statistical calculation is performed on each of these bootstrapping samplings. The difference between the parameters calculated from the original dataset and the average of the parameters calculated from many bootstrapping samplings is a measure of the bias of the original calculations. The entire crossvalidated results were analyzed considering the fact that a value of q 2 above 0.3 indicates that probability of chance correlation is less than 5% [14].

Hardware and software
Insight II 2000.1 (16) and Sybyl 7.1 (17) were used for molecular modeling on a SGI Origin 300 workstation equipped with 4 * 600 Mhz R12000 processors.

Discussion: 3D QSAR STUDIES
CoMFA and CoMSIA 3D-QSAR models were derived using previously reported DHFR inhibitors. The chemical structures of molecules and their experimental pIC 50 values are given in Table  1 (see supplementary material).

CoMFA analysis
Sixty eight compounds out of the total seventy eight DHFR inhibitors were used as training set and ten compounds were used as test set. The test set compounds were selected manually so that the structural diversity and wide range of activity in the dataset were included. PLS analysis was carried out for the training set and a cross-validated q 2 of 0.530 for six components was obtained. The non cross-validated PLS analysis with the optimum components revealed a conventional r 2 value of 0.903, F = 94.349 and an estimated standard error of prediction (SEP) 0.386. The steric field descriptors explain 52.2 % of the variance, while the electrostatic descriptors explain 47.8 % of the variance. Bootstrap analysis for 100 runs was then carried out for further validation of the model by statistical sampling of the original dataset to create new datasets.  Table 1 (see supplementary material) and the correlation between the predicted activities and the experimental values is depicted in Figure 1. Figure 1 illustrate the predicted activities using the CoMFA model are in good agreement with the experimental data, suggesting that the CoMFA model should have a satisfactory predictive ability.

CoMSIA analysis
The CoMSIA analyses were performed using four descriptor fields: steric, electrostatic, hydrophobic and hydrogen bond donor. The CoMSIA study revealed a cross validated q 2 of 0.548 with optimum number of component 6, a conventional r 2 of 0.909 with a standard error of 0.373 and F = 101.992 for training set. The steric field descriptor explains 17.4 % of the variance and, the electrostatic descriptor explains 43.0 %, the hydrophobic field explains 28.4% while the hydrogen bond donor explains 11.4 % of the variance. Bootstrap analysis for 100 runs was then carried out for further validation of the model by statistical sampling of the original dataset to create new datasets. Thus, the difference in the parameters calculated from the original data and the average of the parameters calculated from N(=100) runs of bootstrapping sampling is a measure of the bias of the original calculation. This yielded higher r 2 bootstrap value 0.939 for CoMSIA with standard error value of 0.297 supporting the statistical validity of the developed models. The predicted inhibitory activities are listed in Table 1 (see supplementary material). The correlation between the experimental and predicted bioactivities is shown in Figure 1. Results show that prediction by the CoMSIA model is reasonably accurate.  CoMSIA analysis hydrogen bond donor field do not have effect on the variance. Therefore, we found only two cyan contours in the hydrogen bond donor maps near the NH 2 group in deazaptredine ring of compound 63, indicating that hydrogen bond donor functionalities in this region will enhance the activity (Figure 2d). In summary, the CoMFA and CoMSIA models for anticancer activity indicated that highly electropositive substituents with low steric tolerance are required at position 5 of the pteridine ring and bulky electronegative substituents are required at meta position of the phenyl ring.

Validation of 3D-QSAR models
The ten manually selected compounds (Table 1 in supplementary material) were used as testing set to verify the stability and predictive ability of the CoMFA and CoMSIA models. The predicted pIC 50 with the QSAR models are in good agreement with the experimental data within a statistically tolerable error range, with a predicted correlation coefficient of r pred 2 = 0.935 and 0.842 and standard error of prediction value 0.244 and 0.380 for CoMFA and CoMSIA, respectively. The correlation between CoMFA and CoMSIA predicted activities and the experimental activities of the testset compounds are depicted in Figure 1. The testing results indicate that the CoMFA and CoMSIA models can be reliably used in the design of novel DHFR inhibitors.

Contour analysis
The visualization of the results of the CoMFA and CoMSIA models have been performed using the StDev*Coeff mapping option contoured by contribution. The default level of contour with contribution, 80% for favored region and 20% for disfavored region was set during contour analysis.

CoMFA steric and electrostatic contours are shown in Figures 2. The steric interaction is represented by green and yellow contours, while electrostatic interaction is denoted by red and blue contours.
A large yellow contour is located around the O-methyl substituted phenyl ring of compound 29 and -CH 2 OCH 3 on the 5-position of the pteridine ring suggesting that groups with low steric tolerance are required at this position to increase the activity (Figure 2a). This is possibly a reason why, compounds 45, 50 and 51 are less potent. In case of compound 63, large green contour was found near the plane of Chloro substituted napthyl ring of compound indicating that bulky substituents were preferred in this region (Figure 2b). Similarly, in compound 41 bromo group extends towards the large green contour which is the favorable region for bulkier groups and hence compound 41 is high in binding affinity.  . compounds 59, 60, 61, 62, 63, 64, 65 and 66 are more potent than molecules with smaller substituents, such as compounds 29, 32, 45, 50 and 51.
The CoMFA electrostatic contour plot is displayed in Figure 2a and Figure 2b. A blue contour indicate that substituents should be electron deficient for high binding affinity and red color indicates that substituents should be electron rich for high binding affinity with the protein. A red contour was found overlapping -CH 2 OCH 3 substitution at 5-position of compound 29 with an electron rich functionality and hence compound 29 exhibit low activity (Figure 2a). Similarly in case of other low active compounds like compounds 50 and 51 the substitution of -CH 2 OCH 3 at 5-position overlaps blue contour which is a favorable region for electron deficient moieties and thus lead to decrease in the activity. In contrast, compound 63, most potent inhibitor, has no functional group with high electron density extended to the blue areas (Figure 2b). Similarly, other highly active compounds like compound 41, 46, 47, 48, 49, 50, 64 and 66 have no electron rich substituents at 5-position of the pteridine ring and hence these molecules are having greater binding affinity towards human DHFR. A red contour was found near the Chloro substituted napthyl ring of compound 63 indicating a preference for negatively charged substituents in these areas (Figure 2b). The presence of negative charges favored red contours near the napthyl ring and this indicates the addition of electron rich groups may increase high binding affinity.

CoMSIA contour maps
The contours maps of CoMSIA were derived using steric, electrostatic, hydrophobic and hydrogen bond donor fields. CoMSIA steric and electrostatic are more or less similar to those of CoMFA steric and electrostatic contour maps. In CoMFA, a large green contour was found overlapping the plane of Chloro substituted napthyl ring of compound 63, indicating that bulky substituents were preferred in this region. Similarly, in case of another highly active compound 41, green contour overlaps with methyl and Bromo substituted phenyl ring which indicates that bulkier groups are preferred in this region. A large yellow contour is found overlapping the O-methyl substitution at 2-position of the phenyl ring in case of compound 29 which is the region where bulkier substituents are not preferred and that is why compounds having bulkier substituents extended to this region and hence  compounds 1, 2, 3, 6, 7, 8, 29, 54 and 67 are either less active or moderate active. There is one red contour overlapping the -NH group compound 29, which means that this group is not favored in this region and will lead to decrease in the inhibitory potency. Similarly, in compound 78, a blue contour was found overlapping the -O(CH 2 ) 2 CH 3 and this leads to decrease in activity for compound 78.
Hydrophobic contour map from CoMSIA is shown in Figure 2c. Hydrophobic region is represented by white contours and unfavorable regions are represented by yellow contours. CoMSIA hydrophobic contour map showed a big white contour covering chloro substituted napthyl ring of compound 63, suggesting that increase in hydrophobicity in this region is expected to improve the activity of molecule (Figure 2c). A small yellow contour covers the -CH 3 group of compound 63 indicating that hydrophobic substituents are not preferred in this region. Further increase in hydrophobicity at this position will bring down the activity, that is why 63 is more active than the 29, 45, 50 and 51. Substituents which decrease the net hydrophobicity of the moiety like NH 2 may lead to increase in the activity of the molecule. active compound 63 respectively. The favorable steric areas (contribution level 80%) with more bulkiness are indicated by green isopleths and the dis-favorable steric areas (contribution level 20%) are shown by yellow isopleths. The favorable electrostatic areas (contribution level 80%) with positive charges are indicated by blue isopleths and the favorable electrostatic areas (contribution level 20%) with negative charges are shown by red isopleths. 2(c) and 2(d) are the hydrophobic contour maps and hydrogen-bond contour maps of CoMSIA model for high active compound 63, respectively. The favorable hydrophobic areas (contribution level 80%) indicated by yellow isopleths and the disfavorable hydrophobic areas (contribution level 20%) are shown by white isopleths. The hydrogen bond contour maps of CoMSIA model are also shown. Cyan isopleths contour maps (contribution level 80%) beyond the ligands where a hydrogen -bond donor group in the ligand will be favorable for biological activity and purple isopleths (contribution level 20%) represents hydrogen -bond acceptor in the ligands unfavorable for bioactivity. where q represents a grid point, i is the summation index over all atoms of the molecule j under computation, W ik is the actual value of the physicochemical property k of atom i, and W probe,k is the value of the probe atom.