Genome subtraction for novel target definition in Salmonella typhi

Large-scale genome sequencing of pathogens and of the human genome has produced immense genomic and proteomic data sets that are valuable for identifying novel targets in pathogens. The subtractive genomic approach is one of the most useful strategies for identifying potential targets. It works by subtracting the genes or proteins that are homologous between host and pathogen, leaving the set of genes or proteins that are essential for the pathogen and present exclusively in it. Here, the subtractive genomic approach is employed to identify novel targets in Salmonella typhi. The pathogen has 4718 proteins, of which 300 were found to be essential (“indispensable to support cellular life”) with no human homolog. Metabolic pathway analysis of these 300 essential proteins revealed that 149 proteins are exclusively involved in several metabolic pathways of S. typhi. Eight metabolic pathways are present exclusively in the pathogen and comprise 27 enzymes unique to it. These 27 proteins may therefore serve as prospective drug targets. Sub-cellular localization prediction for the 300 essential proteins indicated that 11 proteins lie on the outer membrane of the pathogen and could be probable vaccine candidates.


Background:
The availability of large amounts of genomic data from microbial genome projects and the human genome project has revolutionized drug discovery against threatening human pathogens [1]. These large genomic data sets are useful for identifying and characterizing novel therapeutic targets and virulence factors in pathogens. The subtractive genomic strategy rests on two assumptions about a novel target. First, it should be essential for the pathogen: involved in replication or survival, and an important component of the pathogen's metabolic pathways and mechanisms. Second, it should be absent from the human host, with no human homolog, so that a drug or lead compound designed against the target acts only on the mechanisms and functions of the pathogen and not on those of the host. Subtractive genomics has been used successfully to locate novel drug targets in Pseudomonas aeruginosa [2]. That work was effectively complemented by the compilation of the Database of Essential Genes (DEG) for a number of pathogenic microorganisms [3]. The present study uses the subtractive genomics approach and DEG to analyze the complete genome of Salmonella typhi in search of drug targets and of potential vaccine candidates, which would be expected to lie on the surface membrane of the pathogen.
Salmonella enterica serovar Typhi is a human-specific gram-negative pathogen that causes enteric (typhoid) fever, a severe infection of the reticuloendothelial system [4], [5], [6]. It has two strains, CT18 (multidrug resistant) [7] and Ty, with a complete proteome of 4718 proteins. Worldwide, typhoid fever affects millions of people annually and causes deaths. The disease is characterized by the sudden onset of a sustained, systemic fever, severe headache, nausea, and loss of appetite. Other symptoms include constipation or diarrhea, enlargement of the spleen, possible development of meningitis, and general depression. Untreated typhoid fever results in mortality rates of 12-30%, whereas treated cases show 99% survival. Early antibiotic treatment has proven highly effective in eliminating infection, but indiscriminate use of antibiotics has led to the emergence of multidrug-resistant strains of S. enterica serovar Typhi [8]. Chloramphenicol was the drug of choice for this infection until plasmid-mediated chloramphenicol resistance was encountered [9]. Ciprofloxacin, a safer and more effective drug than chloramphenicol, then became the mainstay of treatment. After clinical resistance to ciprofloxacin emerged in patients with enteric fever, the remaining choice is an expensive drug such as ceftriaxone or cefixime [10]. Resistance to ceftriaxone has been reported to the CDC (Centers for Disease Control and Prevention) [11], and mild to moderate side effects have been reported for ceftriaxone. The novel targets identified here by subtractive genomics should improve understanding of the biology of the pathogen and support the development of more cost-effective medication.

Methodology:
The systematic identification and characterization of potential targets in Salmonella typhi is illustrated in Figure 1.

Retrieval of proteomes of host and pathogen:
The complete proteome of Salmonella typhi was retrieved from SwissProt [12], and the protein sequences of Homo sapiens were downloaded from NCBI [13]. The Database of Essential Genes (DEG) was accessed at http://tubic.tju.edu.cn/deg/.

Identification of essential proteins in S. typhi:
The S. typhi proteins were clustered at a 60% threshold using CD-HIT [14] to identify paralogs or duplicate proteins within the S. typhi proteome. The paralogs were excluded, and the remaining proteins were subjected to BLASTP against the Homo sapiens protein sequences with an expectation value (E-value) cutoff of 10^-4. The resulting dataset contained proteins with no homologs in Homo sapiens. These non-homologous S. typhi protein sequences were then searched by BLASTP against DEG with an E-value cutoff of 10^-100. A minimum bit score cutoff of 100 was used to screen for proteins that appeared to represent essential genes. The protein sequences obtained are the non-homologous essential proteins of S. typhi.
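
The two BLASTP filtering steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the scripts used in the study: it assumes the BLASTP results were saved in the standard 12-column tabular format (E-value in column 11, bit score in column 12), and the function names, protein identifiers, and example rows are hypothetical.

```python
def parse_hits(tabular_text):
    """Yield (query_id, evalue, bitscore) from 12-column BLAST tabular text."""
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        yield cols[0], float(cols[10]), float(cols[11])


def non_homologs(all_ids, human_hits_text, max_evalue=1e-4):
    """Keep proteins that have no human hit at or below the E-value cutoff."""
    with_homolog = {q for q, e, _ in parse_hits(human_hits_text) if e <= max_evalue}
    return set(all_ids) - with_homolog


def essential(candidates, deg_hits_text, max_evalue=1e-100, min_bitscore=100.0):
    """Keep candidates with a DEG hit passing both the E-value and bit score cutoffs."""
    passing = {
        q for q, e, b in parse_hits(deg_hits_text)
        if e <= max_evalue and b >= min_bitscore
    }
    return set(candidates) & passing


# Hypothetical example data: identifiers and scores are made up.
human_hits = "\n".join([
    "protA\thumanX\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-50\t180",
    "protB\thumanY\t25.0\t80\t60\t0\t1\t80\t1\t80\t0.5\t30",
])
deg_hits = "\n".join([
    "protB\tDEG001\t90.0\t400\t40\t0\t1\t400\t1\t400\t1e-150\t700",
    "protC\tDEG002\t35.0\t150\t97\t0\t1\t150\t1\t150\t1e-20\t95",
])

kept = non_homologs(["protA", "protB", "protC"], human_hits)
targets = essential(kept, deg_hits)
print(sorted(kept))     # ['protB', 'protC']
print(sorted(targets))  # ['protB']
```

In this toy run, protA is subtracted because it has a significant human hit, protB survives both filters, and protC is dropped because its DEG hit fails both cutoffs.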

Metabolic pathway analysis:
Metabolic pathway analysis of the essential proteins of S. typhi was performed with the KAAS server at KEGG to identify potential targets. KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database. The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways [15].

Sub-cellular Localization prediction:
Protein sub-cellular localization prediction is the computational prediction of where a protein resides in a cell. It is an important step because it informs protein function prediction and genome annotation, and it can aid the identification of targets. Sub-cellular localization analysis of the essential protein sequences was performed with Proteome Analyst Specialized Subcellular Localization Server v2.5 (PA-SUB) [16] to identify surface membrane proteins that could be probable vaccine candidates.
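
Once localization predictions are available, selecting vaccine candidates reduces to a simple filter. The sketch below assumes the predictions have been collected into a dictionary mapping protein identifier to a predicted localization label; the identifiers, the labels, and the function name are hypothetical and do not reflect the actual PA-SUB output format.

```python
def outer_membrane_candidates(predictions, label="outer membrane"):
    """Return sorted protein IDs whose predicted localization matches the label."""
    wanted = label.strip().lower()
    return sorted(
        protein_id
        for protein_id, location in predictions.items()
        if location.strip().lower() == wanted
    )


# Hypothetical predictions for four essential proteins.
predictions = {
    "prot1": "cytoplasm",
    "prot2": "Outer membrane",
    "prot3": "inner membrane",
    "prot4": "outer membrane ",
}

print(outer_membrane_candidates(predictions))  # ['prot2', 'prot4']
```

Normalizing case and whitespace before comparison guards against minor inconsistencies in how labels are recorded.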

Discussion:
The computational analysis reveals that, of the 4718 proteins in Salmonella typhi, 159 were identified as duplicates by CD-HIT at 60% similarity. The remaining 4559 non-paralogous proteins were subjected to subtraction against the human proteome, which left 3570 proteins without human homologs. When these 3570 proteins were searched by BLASTP against DEG, 300 were found to be essential for the pathogen. The results of the subtractive proteome approach, metabolic pathway analysis, and sub-cellular localization are listed in Table 1 (Supplementary material). The purpose of the present study was to locate those essential proteins of S. typhi that play vital roles in the normal functioning of the bacterium within the host and to select them as targets. Detecting essential S. typhi proteins that lack human homologs, followed by screening of the proteome for the corresponding protein products, is likely to lead to drugs that interact exclusively with the pathogen. Surface proteins without human homologs would represent potential vaccine candidates. In total, 300 essential proteins were without human homologs. Metabolic pathway analysis of these 300 proteins with the KAAS server at KEGG revealed that 149 may be considered unique and are linked with essential metabolic and signal transduction pathways. Presumably, screening such novel targets for functional inhibitors will lead to the discovery of novel therapeutic compounds active against bacteria, including the growing number of antibiotic-resistant clinical strains [17].
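
The subtraction funnel reported above can be checked with a few lines of arithmetic. The four input counts are taken from the text; the two intermediate differences (proteins removed for having a human homolog, and non-homologous proteins without a qualifying DEG hit) are derived here and are not stated in the text.

```python
# Counts reported in the text.
total_proteins = 4718
duplicates = 159                  # removed by CD-HIT at the 60% threshold
non_homologous = 3570             # no human homolog after BLASTP
essential_non_homologous = 300    # qualifying hit in DEG

# Derived quantities.
non_paralogous = total_proteins - duplicates
with_human_homolog = non_paralogous - non_homologous
without_deg_hit = non_homologous - essential_non_homologous
essential_percent = round(100 * essential_non_homologous / total_proteins, 1)

print(non_paralogous)       # 4559
print(with_human_homolog)   # 989
print(without_deg_hit)      # 3270
print(essential_percent)    # 6.4
```

Under these numbers, roughly 6.4% of the proteome survives all three filters.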
Metabolic pathway analysis of the 149 essential proteins revealed that 15 proteins are involved in Carbohydrate Metabolism, 10 in Energy Metabolism, 5 in Lipid Metabolism, 4 in Nucleotide Metabolism, 30 in Amino Acid Metabolism, 20 in Glycan Biosynthesis and Metabolism, 16 in Metabolism of Cofactors and Vitamins, 20 in genetic information processing, 26 in environmental information processing, and 2 in human disease. The results are summarized in Table 2 (Supplementary material). Comparative analysis of the metabolic pathways of the host (Homo sapiens) and the pathogen (S. typhi) using the Kyoto Encyclopedia of Genes and Genomes (KEGG) reveals 8 pathways that are unique to S. typhi. Each selected pathway was then screened for the unique enzymes and proteins involved. The peptidoglycan layer of the bacterial cell wall is the major structural element and plays an important role in pathogenesis because it provides resistance to osmotic lysis. D-alanine is the central molecule in peptidoglycan assembly and cross-linking. D-alanine-D-alanine ligase (ddlA) is an important target because it is involved in D-alanine metabolism. Lipopolysaccharides (LPS) are one of the main constituents of the outer cell wall of gram-negative bacteria and play an important role in the survival of the pathogen. Of the 14 enzymes involved in the LPS biosynthesis pathway, 13 were found to be essential for the viability of the bacterium and showed no homology with any human protein, so they could be probable drug targets.
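
The host-versus-pathogen pathway comparison is, in essence, a set difference followed by a lookup of the enzymes in the pathogen-only pathways. The sketch below illustrates that logic only. The pathway and gene names are borrowed from the text, but the membership lists are a small hand-made subset used as toy data, not the KEGG results of the study, and the function names are hypothetical.

```python
def unique_pathways(pathogen_pathways, host_pathways):
    """Return pathways present in the pathogen but absent from the host."""
    return sorted(set(pathogen_pathways) - set(host_pathways))


def enzymes_in(pathways, pathway_to_enzymes):
    """Collect the enzymes annotated to the given pathways."""
    found = set()
    for name in pathways:
        found |= set(pathway_to_enzymes.get(name, ()))
    return sorted(found)


# Toy data (illustrative subset only).
pathogen = [
    "D-Alanine metabolism",
    "Lipopolysaccharide biosynthesis",
    "Two-component system",
    "Pyruvate metabolism",
]
host = ["Pyruvate metabolism"]
pathway_to_enzymes = {
    "D-Alanine metabolism": ["ddlA"],
    "Pyruvate metabolism": ["ppc"],
}

pathogen_only = unique_pathways(pathogen, host)
print(pathogen_only)
# ['D-Alanine metabolism', 'Lipopolysaccharide biosynthesis', 'Two-component system']
print(enzymes_in(pathogen_only, pathway_to_enzymes))  # ['ddlA']
```

Pathways shared with the host, such as pyruvate metabolism in this toy example, drop out of the pathway-level comparison even though individual enzymes in them may still lack a human homolog.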
Two-component systems represent the primary signal transduction paradigm in prokaryotic organisms. Eight essential enzymes in this pathway were found to be potential targets. Tryptophan synthase beta chain (trpB) is an important enzyme because it is involved in the tyrosine and tryptophan biosynthesis pathway. Chemotaxis protein (MotA) and chemotaxis protein methyltransferase (CheR) are essential because of their involvement in multiple pathways, such as cell motility, bacterial chemotaxis, and flagellar assembly. Phosphoenolpyruvate carboxylase (ppc) has been identified as a possible target because of its involvement in carbon fixation in photosynthetic organisms, pyruvate metabolism, and the reductive carboxylate cycle. The focus of the present study was to search for potential targets in S. typhi by a computational approach. The sub-cellular localization prediction by PA-SUB identified 11 proteins lying on the surface of the pathogen, which could represent promising candidates for further characterization and analysis in support of vaccine design. The results are summarized in Table No.

Conclusion:
The availability of complete genomic and proteomic sequences from sequencing projects, together with computational tools for identifying and characterizing probable drug targets, is an emerging trend in pharmacogenomics. The Database of Essential Genes helps to identify potential drug targets in pathogens. The present study contributes to the characterization of proteins that could be targets for efficient drug design against Salmonella typhi. Because the subtractive genomic approach is used to identify the drug targets, a resulting drug would be specific for the pathogen and not lethal to the host. Molecular modeling of the targets can help to identify the best possible active sites to target in drug design simulations. Virtual screening against these potential targets might be useful in the discovery of therapeutic compounds against Salmonella typhi.

References:
[1] L Miesel et al., Nat Rev Genet. 4: 442 (2003)

This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for noncommercial purposes, provided the original author and source are credited.