Docking of phosphonate and trehalose analog inhibitors into M. tuberculosis mycolyltransferase Ag85C: Comparison of the two scoring fitness functions GoldScore and ChemScore, in the GOLD software.

The Ag85 family enzymes are responsible for the synthesis of cell wall components in mycobacterial species. Inhibitors to these enzymes are potential antimycobacterial agents. We have carried out the docking of phoshonate and trehalose analog inhibitors into the three dimensional structure of mycolyltransferase enzyme, Ag85C of M. tuberculosis using the GOLD software. The inhibitor binding positions and affinity were evaluated using both the scoring fitness functions- GoldScore and ChemScore. We observed that the inhibitor binding position identified using the GoldScore was marginally better than the ChemScore. A qualitative agreement between the reported experimental biological activities (IC50) and the GoldScore was observed. We identified that amino acid residues Arg541, Trp762 are important for inhibitor recognition via hydrogen bonding interactions. This information can be exploited to design Ag85C specific inhibitors.

In M. tuberculosis, a major secreted protein complex, antigen 85, constitutes three proteins antigen 85A, 85B and 85C [1] that are responsible for the synthesis of cell envelope. These enzymes catalyze the transfer of mycolyl residue from one molecule of α, α' trehalosemonomycolate (TMM) to another TMM leading to the formation of α, α' trehalosedimycolate (TDM) [2] and are hence termed mycolyltransferases. These enzymes have the ability to bind the human fibronectin [3] through which the bacterium gets attached to host cells. Due to their immunodominant and secretory nature, the components of Ag85 complex represent promising protective antigen candidates. Also, the cell envelope of M. tuberculosis has been one of the targets of antimycobacterial agents. Drugs such as isoniazid, ethionamide, isoxyl, thiolactomycin, and triclosan and ethambutol, target the synthesis of mycolic acids in M. tuberculosis. In addition, pyrazinamide was shown to inhibit fatty acid synthase type I that in turn, provides precursors for fatty acid elongation to long-chain mycolic acids by fatty acid synthase II.
[4] Mycolic acid biosynthesis is known to be essential for mycobacterium growth, in particular trehalose mycolates aid in virulence of the organism and the structure of mycolates has been found to be important for initial replication and persistence in vivo.
[5] A mutated M. tuberculosis strain lacking the functional Ag85C gene showed a 40% decrease in the amount of cell wall linked mycolic acids indicating its role in cell wall synthesis.
[6] The importance of mycolyltransferases in pathogenesis and cell wall formation makes them an attractive target for drug design studies. A series of 6,6'-bis(sulfonamido), N,N'-dialkylamino and related derivatives of 6,6'-dideoxytrehalose were designed and synthesized to inhibit the Ag85 complex. were determined for both native and substrate bound forms. The protein structure corresponds to a α/β hydrolase fold and the catalytic triad responsible for the mycolyltransferase activity comprise the amino acid residues Ser624, Glu728 and His760 (numbering according to the PDB_ID: 1DQZ, B chain). This active site is typical of a serine protease family. Structural comparison of mycolyltransferases revealed that their backbone superimposes with an overall RMSD (root mean square deviation) of 0.577 Å. The catalytic residues are highly superimposed in these structures indicating that the structure and function of these isozymes are highly similar. Minor deviations are observed in the loop region connecting strand 4 and helix 3, comprising the sequence motif, 85Q-S-N-G-Q-N90 in Ag85C. This region is away from the substrate binding site and therefore will not affect substrate/inhibitor binding to the enzyme. A putative picture of the catalytic mechanism of the mycolyl transfer reaction been proposed by Ronning et al., 2000. [10] In the first step, catalytic serine attacks the indicated that for "drug like" and "fragment like" ligands, the docking accuracies obtained from GoldScore and ChemScore are similar, while for larger ligands, GoldScore gave better results. Docking involves the identification of ligand conformation and orientation in the protein binding pockets. The scoring functions are helpful to predict the biological activity of the ligand. The aim of the present work is to study the docking of phosphonate and trehalose analog inhibitors into the active site of Ag85C using the GOLD software. [12,13] In this work, we compared the scoring functions, GoldScore and ChemScore that are available in the Gold docking software. The three mycolyltransferases have highly similar structure, we have therefore chosen the three dimensional structure of Ag85C (PDB ID: 1DQZ) as a representative structure for docking studies. Also, the inhibition of Ag85C by the phosphonate inhibitors by Gobec et al., 2004 has been studied and the experimental IC50 values are reported. Since this enzyme functions as a serine protease, these docking studies will help characterize the inhibitor binding site. The information about the inhibitor binding site can be used to design Ag85C specific inhibitors that would not interfere with the other physiologically important ubiquitous serine proteases in humans.

Methodology:
Energy minimization of protein: The crystal structure coordinates of the Ag85C (PDB_ID: 1DQZ) were obtained from the protein data bank (http://www.rcsb.org/) and the B chain was selected for docking studies. All hydrogen atoms were added to the protein, including those necessary to define the correct ionization and tautomeric states of amino acid residues such as Asp, Ser, Glu, Arg and His using Cache software (www.cachesoftware.com/cache). A two stage protocol was set up for the energy minimization of protein using Hyperchem7 (Hypercube Inc.), molecular modeling software. In the first stage, all hydrogen atoms in the protein were allowed to optimize. The hydrogen locations are not specified by the X-ray structure but these are necessary to improve the hydrogen bond geometries. In the second stage, all protein atoms were allowed to relax. Minimization in both stages was performed using 100 steps of steepest descent and 2000 steps of conjugate gradient algorithm.
Binding site analysis: The binding site identification of mycolyltransferases, Ag85A, Ag85B and Ag85C was carried out using the Binding Site Analysis module available in Insight II (Acclerys, USA). The default parameters for grid size 1.00 Å and site open size 7.00 Å were used in binding site calculation.

Selection of docking molecules:
A set of 17 phosphonate inhibitors is taken from the reported literature [11] for the docking studies. A list of these molecules is provided in Figure 1a and a schematic representation of the molecules with least IC50 values is provided in Figure 1b. Similarly, a set of 9 trehalose analogs is taken from the reported literature [7] for docking studies. A list of these molecules is provided in Figure 5a and a schematic representation of the molecules with least MIC values are provided in Figure 5b. We have adapted the numbering of these inhibitors as represented in the original papers. [7, 11]

Molecular Modeling:
The phosphonate and trehalose analog inhibitors were built using the Hyperchem7. The structures were energy minimized using the steepest descent algorithm with a convergence gradient value of 0.001 kcal/mol. Further, the geometry optimization was carried out for each compound using the MOPAC 6 package and the semi empirical AM1 Hamiltonian.
Docking: Docking was carried out using GOLD (Genetic optimization for Ligand Docking) software, that uses the Genetic algorithm (GA). This method allows a partial flexibility of protein and full flexibility of ligand. All water molecules and hetero atoms were removed from the protein to evaluate the two scoring functions in GOLD software. For each of the 25 independent GA runs, a maximum number of 100000 GA operations were performed on a set of five groups with a population size of 100 individuals. Operator weights for crossover, mutation, and migration were set to 95, 95, and 10, respectively. Default cutoff values of 2.5 Å (dH-X) for hydrogen bonds and 4.0 Å for van der Waals distance were employed. When the top three solutions attained RMSD values within 1.5 Å, GA docking was terminated. The RMSD values for the docking calculations are based on the RMSD matrix of the ranked solutions. We observed that the best ranked solutions were always among the first 10 GA runs, and the conformation of molecules based on the best fitness score was further analyzed. All the 17 phosphonate and 9 trehalose analog inhibitors were analyzed using docking studies and the molecules with least IC50 and MIC values respectively in each group are described in detail in the results and discussion section. The two scoring functions available in the GOLD software to measure the affinity of ligand for protein binding site are: GoldScore fitness function: GoldScore performs a force field based scoring function and is made up of four components: 1. Protein-ligand hydrogen bond energy (external H-bond); 2. Protein-ligand van der Waals energy (external vdw); 3. Ligand internal van der Waals energy (internal vdw); 4. Ligand intramolecular hydrogen bond energy (internal-H-bond). The external vdw score is multiplied by a factor of 1.375 when the total fitness score is computed. This is an empirical correction to encourage protein-ligand hydrophobic contact. The fitness function has been optimized for the prediction of ligand binding positions. GoldScore = S(hb_ext) + S(vdw_ext) + S(hb_int) + S(vdw_int) Where S(hb_ext) is the protein-ligand hydrogen bond score, S(vdw_ext) is the protein-ligand van der Waals score, S(hb_int) is the score from intramolecular hydrogen bond in the ligand and S(vdw_int) is the score from intramolecular strain in the ligand.
ChemScore fitness function: ChemScore is an empirical scoring function to estimate the free energy of ligand binding to protein. contact terms to estimate lipophilic and metal-ligand binding contributions, including hydrogen bonding interactions. It does not differentiate between different types of hydrogen bonds, depending on the nature and geometry of the interaction. It adds a clash penalty and internal torsion terms, which influence against close contacts in docking and poor internal conformations. Covalent and constraint scores may also be included.
ChemScore estimates the total free energy change that occurs on ligand binding to protein as per the equation given below Each component of this equation is the product of a term dependent on the magnitude of a particular physical contribution to free energy (e.g. hydrogen bonding) and a scale factor determined by regression. ChemScore = ∆G binding + P clash +C internal P internal + (C covalent P covalent + P constraint ) ….. (2) Results and Discussion: Binding site analysis of mycolyltransferases, Ag85A, Ag85B and Ag85C identified that, their binding pockets are identical and the largest binding pocket overlaps with the trehalose binding position in the PDB ID: 1F0P (data not shown). This further increased our confidence that the structures of mycolyltransferases are highly similar and we have therefore taken Ag85C as the representative structure for docking studies. The docking of phosphonate and trehalose analog inhibitors into the active site of mycolyltransferase, Ag85C was carried out using the GOLD software and the docking evaluations were made on the basis of two fitness functions, GoldScore and ChemScore. For both the scoring functions, the docking conformation generated with best fitness has been analyzed using two different criteria. These are 1) ligand binding position, and 2) fitness function score comparison.
Phosphonate inhibitors: Criteria-1: Ligand binding position. From the docking of molecule 3c into the active site of Ag85C, we observed that, the hydrogen bond acceptor atoms are 6, 7 and 8, and atom 8 is also a hydrogen bond donor, as indicated in Figure 1b.
GoldScore fitness function: One hydrogen bond was observed when the molecule 3c was docked into the active site of Ag85C, see Figure 2a and Table 1a. The alkyl chain of inhibitor entered into hydrophobic region of the protein comprising amino acid residues indicated in Table 2a. The best RMSD was found to be 0.28 Å.
ChemScore fitness function: Two hydrogen bonds were observed when the molecule 3c was docked into active site of Ag85C, see Figure 3a and Table 1a. The alkyl chain of inhibitor entered into hydrophobic region of protein comprising amino acid residues indicated in Table 2a. The best RMSD was found to be 0.62 Å.  From the docking of molecule 4a into the active site of Ag85C, we observed that the hydrogen bond acceptor atoms are 2, 3, 6 and 14 as indicated in Figure 1b.

Molecule
GoldScore fitness function: One hydrogen bond was observed when the molecule 4a was docked into the active site of Ag85C, see Figure 2b and ChemScore fitness function: One hydrogen bond was observed when the molecule 4a was docked into active site of Ag85C, see Figure 3b and Table 1a. The alkyl chain of inhibitor entered into hydrophobic region of protein comprising amino acid residues, as shown in Table 2a. The best RMSD was found to be 0.80 Å.
From the docking of molecule 5e into the active site of Ag85C, we observed that, the hydrogen bond acceptor atoms are 2, 3, 4, 6, 11, and 12 as indicated in Figure 1b.
GoldScore fitness function: Two hydrogen bonds were observed when the molecule 5e was docked into active site of Ag85C, see Figure 2c and Table 1a. The alkyl chain of inhibitor entered into hydrophobic region of protein comprising amino acid residues, as indicated in Table 2a. The best RMSD was found 0.61 Å. ChemScore fitness function: One hydrogen bonding interaction was observed when the molecule 5e was docked into the active site of Ag85C, see Figure 3c and Table 1a. The inhibitor alkyl chain entered into hydrophobic region of protein comprising amino acid residues as indicated in Table 2a. The best RMSD was found to be 0.82 Å.
From the docking of molecule 6b into the active site of Ag85C, we observed that, the hydrogen bond acceptor atoms are 1, 4, 5, 13, 14 and 15, and the atom 14 is also a hydrogen bond donor, as indicated in Figure 1b.
GoldScore fitness function: Three hydrogen bonds were observed when molecule 6b was docked into the active site of Ag85C, see Figure 2d and Table 1a. The alkyl chain of ligand entered into hydrophobic region of protein comprising residues as indicated in Table 2a. The best RMSD was found to be 1.04 Å. ChemScore fitness function: One hydrogen bonding interaction was observed when molecule 6b was docked into the active site of Ag85C with a distance of 2.41 Å, see Figure 3d and Table 1a. The alkyl chain entered into hydrophobic region of protein comprising amino acid residues as indicated in Table 2a. The best RMSD was found to be 1. Criteria-2: Fitness function score comparison. The phosphonate inhibitors were docked into the active site of Ag85C and evaluated based on both the fitness scoring functions-GoldScore and ChemScore available in the GOLD software. We observed that the GoldScore correlated well with the reported IC50 values as shown in Figure 4a. There are no compounds with high GoldScore and low activity (high IC50 values). The exceptions are molecules 3b and 3c that bind the Ag85C with a GoldScore of 50.30 and 51.18, while the IC50 values are 3.5 μM and 1.06 μM, respectively. We hypothesize that the low GoldScore for these molecules is due to the presence of OH substitution alone on the phosphate head group. When the docked molecules were evaluated based on the ChemScore, there is no such correlation with the IC50 values, see Figure 4b. These results indicate that the GoldScore is a better parameter to assess the binding of phosphonate inhibitors to Ag85C and there is an overall 70-80 % correlation between the GoldScore and IC50 values.  ChemScore fitness function: Four hydrogen bonds were observed when the molecule 11b was docked into active site of Ag85C, see Figure 7a and Table 1b. The alkyl chain of the inhibitor entered into hydrophobic region of protein comprising amino acid residues, as indicated in Table 2b. The best RMSD was found to be 2.64 Å.
GoldScore fitness function: Four hydrogen bonds were observed when the molecule 15a was docked into the active site of Ag85C, see Figure 6b and Table 1b. The alkyl chain of the inhibitor entered into the hydrophobic region of the protein comprising amino acid residues as indicated in Table 2b. The best RMSD was found to be 0.78 Å.
ChemScore fitness function: Three hydrogen bonds were observed when the molecule 15a was docked into active site of Ag85C, see Figure 7b and Table 1b. The alkyl chain of inhibitor entered into hydrophobic region of protein comprising amino acid residues, as indicated in Table 2b. The best RMSD was found to be 1.60 Å.
GoldScore fitness function: Three hydrogen bonds were observed when the molecule 18b was docked into the active site of Ag85C, see Figure 6c and Table 1b. The alkyl chain of the inhibitor entered into the hydrophobic region of the protein comprising amino acid residues, as indicated in Table 2b. The best RMSD was found to be 2.34 Å.
ChemScore fitness function: Two hydrogen bonds were observed when the molecule 18b was docked into active site of Ag85C, see Figure 7c and Table 1b. The alkyl chain of inhibitor entered into hydrophobic region of protein comprising amino acid residues, as indicated in Table 2b. The best RMSD was found to be 2.28 Å.
For the trehalose analog inhibitors, Rose et al., 2002 [7] have reported the MIC values in order to estimate their activity as antimycobacterial agents, therefore, we have not correlated them with the GoldScore and ChemScore.
In the crystal structure of Ag85B complexed with trehalose substrate (PDB ID: 1F0P), it has been shown that Ser124 (equivalent of Ser624 in Ag85C, B chain) forms a hydrogen bonding interaction with atom O6 of trehalose.
[9] From our docking studies, we have shown that this serine forms a hydrogen bonding interactions with phosphonate and trehalose analog inhibitors (see Tables 1a  and 1b). Further, we also identified additional amino acid residues, Arg541 and Trp762 important for inhibitor recognition mediated via non-bonding interactions. Sequence analysis identified that Arg541, Ser624 and Trp762 are highly conserved in mycolyltransferases [17] of M. tuberculosis. Current docking studies using the GoldScore fitness function reported a hydrophobic tunnel to accommodate the hydrocarbon alkyl chain of phosphonate and R1 alkyl chain of trehalose analog inhibitors. This hydrophobic tunnel comprises the residues Leu540, Phe576, Phe650, Trp658, Leu661, Ile662, Leu623, Met625, Ala665 and Leu727. In the reported crystal structures of Ag85B and Ag85C [9, 10], the authors have described a hydrophobic tunnel comprising these residues to accommodate αalkyl chain of mycolic acids. It was also proposed that βalkyl chain of mycolic acids would fit the trough on the surface of the protein. Docking of trehalose analog inhibitors identified the R2 alkyl chain to bind the region comprising amino acid residues Tyr510, Gln525, Ala542, Gln543, Asp545, Gly548, Asp550, Ile551, Asn552, Pro554 that are located on the surface of the protein. These results indicate that the predicted nonbonding interactions and hydrophobic region to accommodate the alkyl chains are similar to the experimental data obtained from crystal structures. Ala542, Gln543, Gly548, Asn547, Ile551, Asn552 Ala665, Ile662, Leu661,Trp658, Leu730, Phe650, Leu727 Ala542,Gln543, Asp545, Ile551, Gly548