Virtual screening of a MDR-TB WhiB6 target identified by gene expression profiling

Multidrug resistance in M. tb has become a huge global problem due to drug resistance. Hence, the treatment remains a challenge, even though short term chemotherapy is available. Therefore, it is of interest to identify novel drug targets in M.tb through gene expression profiling complimented by a subtractive proteome model. WhiB6 is a transcriptional regulator protein and a known drug resistant marker that is critical in the secretion dependent regulation of ESX-1, which is specialized for the deployment of host membrane-targeting proteins. The WhiB6 protein structure was modelled ab initio and was docked with a library of 173 phytochemicals with potential antituberculosis activity to the identified drug marker to find novel lead molecules. UDP-galactopyranose and GDP-L-galactose were identified to be potential lead molecules to inhibit the target WhiB6. The results were compared with the first line drugs for MDR-TB by docking with WhiB6. Data showed that Ethambutol showed better binding ability to WhiB6 but the afore mentioned top ranked phytochemicals were found to be better candidate molecules. The chosen candidate lead molecules should be further validated by suitable in vitro or in vivo investigation.


Background:
Every year about 10 million people are affected by tuberculosis and among which 1.6 million people die. [1-2] Across the world about 10 million people developed tuberculosis as of 2017 about two third of all new cases occurred in 8 countries like India, China, Indonesia, Philippines, Pakistan, Nigeria, Bangladesh and South Africa which are designated the status of high TB burden countries along with 22 other countries. These countries contribute to 87% of world cases.
[1] Multidrug resistance in Mycobacterium tuberculosis has emerged as a major problem in treatment even though short-term chemotherapy is available; development of resistance to antibiotics has become a global menace. [3] MDR-TB does not acquire drug resistance due to transposable element or a plasmid carrying drug resistant marker, but instead it is acquired by stepwise new mutations in genes for different drug targets. [4] Resistance against the major first line antituberculosis drugs -Streptomycin, Ethambutol, Pyrazinamide, Isoniazid and Rifampicin makes it necessary for treatment with second line drugs with greater toxicity and lesser efficacy.
[5] Exuding antibiotic is due to the impermeable cell wall, that is mediated by efflux mechanisms by several ABC (ATP -binding cassette) transporter and major facilitator super family (MFS) proteins. Among the other causes for drug resistance, efflux mechanism contributes in a major way to intrinsic resistance to drugs. [6] Currently the growing trends of drug resistance in M.tb have led to a wide range of drug discoveries and to look for the functional protein that which is of key focus to target a lead molecule. In this scenario alternate treatment protocols with lesser toxicity can help clinicians battle MDR TB with greater ease. In the current study we have attempted to recognize novel drug target in M.tb through gene expression profiling approach complimented by a subtractive proteomic approach. Subsequently a library of Phytochemicals with potential antituberculosis activity, virtual screening was performed against the identified biomarkers to find

Materials and Methods:
Systematic search for gene expression datasets pertaining to MDR-TB: A comprehensive literature mining of all eligible studies on Mycobacterium tuberculosis gene expression was carried out by searching GEO datasets (as on December 2016) based on the search terms X1 AND (("I" OR "i" AND (T OR t)) X2 AND (("I" OR "i" AND (T OR t)) Where, X1 = Gene expression; X2 = Microarray; I =Mycobacterium tuberculosis; i= Mtb; T = Tuberculosis; t = tb The concept concordance was limited to Tuberculosis, so that only datasets containing studies or data related to TB would be pulled out. Further the confidence of mining was tested by simple scoring algorithm. (Shown in Table 1) Out of these only those gene expression datasets pertaining to Multidrug resistant tuberculosis strains and /or clinical isolates were considered for analysis.

Gene expression profiling:
Gene expression profiling is a technique aimed at understanding transcription pattern in a cell at a given time frame. Measuring mRNA levels is accomplished by measuring mRNA levels of individual genes. Usually relative mRNA levels in two or more experimental conditions (case Vs control) are measured to analyze and understand specific gene expression pattern in given condition Pre-processed datasets were chosen by systematic text mining technique as described above.
[7]Based on the systematic literature search as described above, microarray datasets were retrieved from NCBI.GEO repository https://www.ncbi.nlm.nih.gov/gds/?term= mycobacterium+tuberculosis) using accession number GSE3201 annotated in GPL2787 platform which provides complete coverage of the Human Genome (Build 133, April 20, 2001) plus 6500 additional genes for analysis of over 47,000 transcripts. Gene expression profiling analysis of the chosen dataset using GEO2R.
[8] The dataset comprised of gene expression data from 11clinical isolates and H37Rv as the (reference strain) as control. Each of the 11 clinical isolates was compared against H37Rv individually by using GEO2R log transformation was applied to all the data prior to analysis. Bonferroni adjustment was applied to the p-values. In each of the 11 comparisons, only those genes, which showed log, fold change >1.5 was taken for the further analysis (depicted in table). The upregulated genes which were common in all the clinical isolates (while comparing them with H37Rv) were chosen as candidate drug targets. The genes-MmpL10, WhiB6, Rv1052, PPE39, and Rv2035 were found to be up regulated in all the isolates. From these 5 genes WhiB6 was chosen as the suitable candidate drug target based upon several filtering parameters discussed in detail in the results and discussion section.

Protein Modelling:
Determination of protein 3D structure is an essential part of many aspects of molecular research. In the absence of an experimentally determined protein structure (from X-diffraction or NMR) computational prediction of protein 3D structure becomes the only alternative. Computational protein structure prediction is highly beneficial in gaining insights on the protein function and drugs screening. [9]

Ab-initio Modelling:
The primary sequence of WhiB6 from H37Rv retrieved from UniprotKB ID No P9WF37. The protein sequence was subjected to a PSI Blast against PDB database to recognize suitable template for modelling WhiB6 by homology method. Due the absence of any structurally similar orthologs with a solved structure, Ab-initio modelling was chosen. Ab-initio protein structural modelling is employed when the protein of interest does not have any homologue with solved structure to be used as template for modelling. Ab-initio modelling performs a conformational scan based on designed energy function. QUARK is a computer algorithm for Ab initio protein structure prediction and protein peptide folding, which constructs the correct protein 3D model from small fragments, by replica exchange Monte Carlo simulation under the guidance of an atomic level knowledge-based force field. It conducts a conformational search of a designated energy function, which enables to generate a number of possible suitable structures.
[10] The sequence was subjected to PSI-Blast against the human genome to rule out the presence of human orthologs with high sequence similarity.

Model validation:
The model obtained by Ab-initio modelling using Ramachandran plot, ERRAT2 and ProSA. Ramachandran plot was obtained from the Pdbsum server.

Library of Phytochemicals-used as potential lead molecule against tuberculosis:
Phytochemical were searched for using systematic literature search.
Only those compounds with pro1 antituberculosis activity were chosen and their 3D structures in Dot Sdf format were taken. Those Phytochemicals, which did not abide by Lipinski's rule of 5 were filtered out and rest of the compound was taken for further

Molecular docking:
The library of Phytochemicals with reported antituberculosis activity subjected to virtual screening against WhiB6 (H37Rv) using Molegro virtual docker. Molegro Virtual Docker (MVD) 5.0 uses MolDock scoring system and it is based on a hybrid search algorithm, called guided differential evolution. This algorithm combines the technique of differential evolution optimization with a cavity prediction algorithm. The modelled protein structure was loaded on to MVD 5.0 platform for the molecular docking process. The built-in cavity detection algorithm of MVD 5.0 was used to identify the potential binding sites which are also referred to as active sites or cavities. The search algorithm used was MolDock SE and 10 was the number of runs taken while 2000 was the maximum iterations for a population size of 50 having 100 as the energy threshold. At every step, least 'min' torsions/translations/rotations were sought and the molecule having the lowest energy was preferred. After molecular docking simulation, the poses (binding modes) obtained were classified by re-rank score. Using the ligand preparation module of MVD 5.0, the selected ligands were manually prepared. Bond order, flexible torsion and the ligands were deducted. After the careful removal of hetero atoms and water molecules, the target protein structures were prepared and its electrostatic surface was produced. The grid resolution was set at 0.3 Å. The maximum interaction and maximum population size were set at 1500 and 50 respectively. Further the first line MDR-TB drugs-Ethambutol, Streptomycin, Pyrazinamide, Isoniazid, Rifampicin were docked against WhiB6 to measure the relative affinity and mode of interaction of these first-line drugs in comparison with the Phytochemicals which were found to posses the best binding affinity towards WhiB6.

Results and Discussion: Gene expression profiling
Gene expression profiling of the 11 clinical isolates was performed using GEO2R by comparing each of the isolates against H37Rv (taken as control). Bonferroni correction was applied to the p-values to counteract the problem of multiple comparisons. Those genes that were at least 1.5 fold upregulated in each of these clinical isolates were tabulated and were shown in Modelling of WhiB6 and Target validation by subtractive proteomic approach: PSI-Blast was performed to predict the suitable template with solved 3D structure to model the WhiB6 (H37Rv), this revealed that no structural orthologs with more than 40% of sequence similarity with WhiB6. Therefore homology modelling could not be employed for structure prediction of WhiB6, so Ab-initio modelling was employed as an alternative. WhiB6 protein was modeled by Ab initio modelling method by using QUARK server by taking small fragments through replica exchange Monte Carlo simulation method utilizing atomic level knowledge based force field. The built protein model was validated using Ramachandran plot to evaluate the stereochemicals stability of the modelled WhiB6.
Ramachandran plot revealed that out of the total 101 non-glycine, non-proline residues present in WhiB6 -59 amino acids were present in the most favoured regions. 35 were present in the additionally allowed regions and further 5 amino acids were present in the generously allowed regions-totally constituting 98.0% of all residues. The number of amino acids in the disallowed regions was mere 2.01%. The presence of the vast majority of amino acids in the allowed regions of the plot shows that the modeled WhiB6 was stereochemically stable.
[24] Errat2 server was employed to study the non-bonded interactions between the various atom types in the model protein. ProSA analysis revealed Z score of -5.69. Human protein shared more than 31% of similarity with H37Rv and WhiB6. It is generally hypothesized that protein sharing high degree of sequence similarity will also have structural similarity (Figure 1). Therefore lack of sequence and structural homologues in humans suggest that a lead molecule inhibiting M.tb WhiB6 will have very low propensity to cross bind with human Whib6 leading to adverse effects.  Gene Expression AND (("Mycobacterium tuberculosis" OR " Mtb" AND (Tuberculosis OR tb)) 1253 2 Microarray AND (( "Mycobacterium tuberculosis" OR " Mtb" AND (Tuberculosis OR tb)) 548 3 Total 1801        Table 3).
UDP-galactopyranose belong to the class of Uridine Diphosphate Sugars commonly found in Cucurbit Fruit, Melons, and Legumes and GDP-L-galactose belong to the class of organophosphate oxoanion commonly found in tomato fruit, and strawberry are potential lead molecules against WhiB6 of M.tb based on their high binding affinity and the ability to form strong H-bonds. UDP-galactopyranose is further suitable as a lead molecule as it abides by all the Lipinski's rule of five.
[11] Whereas GDP-L-galactose has a molecule weight of 605.34 and thereby might not be suitable for oral administration. The first line MDR-TB drugs were docked against WhiB6 to identify their potential WhiB6 inhibiting activity in comparison with the identified Phytochemical lead molecules. The molecular docking of Pyrazinamide, Isoniazid, Ethambutol, and Streptomycin against WhiB6 revealed that streptomycin and Rifampicin do not bind with WhiB6 as shown by a positive MolDock score 34.2929 for streptomycin and 967.456 for Rifampicin (Table 4). The H-bond score are 4.88673 and -5.15092 respectively. (Figure 3) Ethambutol showed the highest binding affinity towards WhiB6 compare to all the other first line MDR-TB drugs which is shown by a MolDock score of -78.1277 and it formed 6 H-bonds with amino acids-Asp108, Arg107, and Val111 but while comparing the binding affinity with top ranked Phytochemicals, the compounds such as UDP-galactopyranose, GDP-L-galactose showed much stronger binding affinity with WhiB6 and formed more H-bonds.

Conclusion:
WhiB6 is a transcriptional regulator protein, which is a known drug resistant associated marker in M.tb. It is an ideal candidate drug target to combat MDR-TB based on the results from gene expression profiling and subtractive proteomic approach. UDPgalactopyranose and GDP-L-galactose is the potential lead molecule to bind and inhibit WhiB6. The invitro and invivo efficacy of UDP-galactopyranose and GDP-L-galactose needs to be investigated further.