In silico synteny based comparative genomics approach for identification and characterization of novel therapeutic targets in Chlamydophila pneumoniae

Chlamydophila pneumoniae is one of the most important and well-studied Gram-negative bacteria associated with community-acquired pneumonia and with other diseases such as chronic obstructive pulmonary disease (COPD), chronic asthma, Alzheimer's disease, atherosclerosis and multiple sclerosis, and it has great potential to infect humans and many other mammals. According to a WHO prediction, COPD is set to become the third leading cause of death by 2030. Unfortunately, the molecular mechanisms leading to chronic infection are poorly understood, and the difficulty of culturing C. pneumoniae under experimental conditions, together with the lack of entirely satisfactory serological methods for diagnosis, is a hurdle for drug discovery and development. We performed an in silico synteny-based comparative genomics analysis of C. pneumoniae and eight other Chlamydial organisms to investigate why C. pneumoniae can cause COPD whereas the other Chlamydial organisms cannot, although some of them are involved in human pathogenesis. We identified a total of 354 protein sequences as non-orthologous to the other Chlamydial organisms. Excluding hypothetical proteins, 70 were found to be functional, of which 60 are non-homologous to the Homo sapiens proteome; among these, 18 protein sequences were found to be essential for the survival of C. pneumoniae based on a BLASTP search against the DEG database of essential genes. CELLO analysis showed that about 80% of the proteins are cytoplasmic. Of the 18 proteins, 5 were identified as bacterial exotoxins and 2 as bacterial endotoxins; the remaining 11 were found to be involved in DNA binding, RNA binding, catalytic activity, ATP binding, oxidoreductase activity, hydrolase activity and proteolysis. It is expected that our data will facilitate the selection of C. pneumoniae proteins for successful entry into drug design pipelines.


Background:
The genus Chlamydia was established in 1966. The Chlamydiaceae family consists of obligate intracellular Gram-negative bacteria, which possess a distinctive and complex biphasic developmental cycle [1]. C. pneumoniae was separated as a distinct species in 1992 [2]. It is perhaps one of the most successful Chlamydial species and has established a niche in a range of homeothermic and poikilothermic hosts, including humans, other animals, amphibians and reptiles [3]. C. pneumoniae is a common cause of upper respiratory tract infections and pneumonia and has been associated with several chronic inflammatory conditions such as atherosclerosis [4] and COPD [5]. It is a very common bacterium worldwide, and almost everyone is infected at some point in their life. In some cases, acute C. pneumoniae infection can become chronic. The molecular mechanism of chronic infection is poorly understood, which has led to a major setback in combating these pathogens. C. pneumoniae infection has also been reported as a cause of lower respiratory tract infection. Lower respiratory tract infection has a direct impact on the pathogenesis, diagnosis and prognosis of COPD in several ways. Several recent group studies suggest that lung growth is impaired by childhood lower respiratory tract infection, making these individuals more susceptible to developing COPD. Chronic colonization of the lower respiratory tract by bacterial pathogens could induce a chronic inflammatory response with lung damage.
C. pneumoniae, usually regarded as a human respiratory pathogen, has been demonstrated to cause intracellular infections of upper and lower respiratory tract tissue worldwide. C. pneumoniae infections are often persistent, and an acute infection may sometimes turn chronic. Acute C. pneumoniae infection can cause bronchitis, emphysema and pneumonia and, in addition, more serious diseases such as atherosclerosis and stroke [6], myocarditis, Alzheimer's disease [7] and multiple sclerosis [8]. COPD has been associated with chronic C. pneumoniae infection. Chronic infection with C. pneumoniae is being seriously investigated as a cause of several systemic diseases, and studies reveal an elevated incidence of C. pneumoniae infection in COPD [5].
These chronic infections of respiratory tissues, which have proven extremely difficult to diagnose and impossible to treat with current antibiotics, could contribute to the pathogenesis of COPD by altering the host response. The development of safe and effective vaccines therefore represents a cost-effective approach that would have a greater impact on the high prevalence of Chlamydia infections and on the prevention of severe long-term sequelae. New antichlamydial drug targets likewise urgently need to be identified. The first identified case of C. pneumoniae infection was reported in Taiwan. The organism was identified as TWAR, a name derived from the two original isolates: Taiwan (TW-183) and an acute respiratory isolate designated AR-39. Given the availability of the genome sequence of C. pneumoniae AR-39 [9], we took AR-39 as the reference, since all C. pneumoniae strains share ~99.5% of their genome but AR-39 has comparatively more coding genes and gene products. The availability of the whole genome sequence, first sequenced in 1999 and deposited in the GenBank database, has paved the way for this research.
The application of newer molecular and genomics techniques and tools, together with modern approaches such as Synteny Based Comparative Genomics (SBCG), helps us to pinpoint the important genes that are conserved and play a role in bacterial infection in COPD, and that can be identified as potential drug targets.

Prediction of non-orthologous (uncommon) genes / proteins
The tool displays the strict conservation of gene order between a reference species and the species compared with it. Each gene present in the genome of the reference species is depicted by a color-coded rectangle. Blue, yellow and grey rectangles represent, respectively, genes on the positive strand, genes on the negative strand, and genes without orthologs belonging to a synteny block in another species. Based on this color code, the non-orthologous genes were selected and used for further analysis. The gene information panel on the left-hand side of the tool was used to retrieve sequence-related information: the gi number (Identifier), the name of the species (Species), gene name, function, location (Chromosome) and sequence information. The sequence information was used to retrieve the protein sequence data for further study (Figure 1).

In silico identification of non-orthologous essential genes and their localization
The predicted non-orthologous protein sequences of C. pneumoniae were subjected to BLASTP at http://blast.ncbi.nlm.nih.gov/ [11], specifically against DEG 7.0 (Database of Essential Genes) at http://tubic.tju.edu.cn/deg/ [12], to screen for essential genes. The DEG database compiles literature and sequences of experimentally verified essential genes and proteins from Gram-positive and Gram-negative bacteria. The cutoff values used for the database search were an E-value < 10^-10, a bit score > 100 and an identity > 35% at the amino acid level.
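The screening thresholds above can be sketched as a filter over BLASTP results. This is only a minimal illustration, assuming the hits were exported in the standard 12-column BLAST tabular format; the function names, protein identifiers and DEG accessions below are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, NamedTuple

# Cutoffs stated in the text: E-value < 1e-10, bit score > 100, identity > 35%.
MAX_EVALUE = 1e-10
MIN_BITSCORE = 100.0
MIN_IDENTITY = 35.0


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tabular BLAST lines (qseqid, sseqid, pident, ..., evalue, bitscore)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), float(cols[10]), float(cols[11])))
    return hits


def filter_essential_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only hits that satisfy all three cutoffs at once."""
    return [
        h for h in hits
        if h.evalue < MAX_EVALUE and h.bitscore > MIN_BITSCORE and h.identity > MIN_IDENTITY
    ]


# Hypothetical example rows (identifiers are placeholders).
sample = [
    "protA\tDEG_x1\t48.2\t310\t150\t4\t1\t305\t3\t312\t2e-85\t270",
    "protB\tDEG_x2\t31.0\t200\t130\t6\t5\t198\t10\t205\t1e-12\t110",
    "protC\tDEG_x3\t52.0\t90\t40\t1\t1\t90\t1\t90\t3e-08\t60",
]
essential_candidates = sorted({h.query for h in filter_essential_hits(parse_hits(sample))})
print(essential_candidates)
```

In this sketch a query protein counts as an essential-gene candidate if at least one of its DEG hits passes all three cutoffs; protB fails on identity and protC fails on E-value and bit score.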

In silico prediction of non-host genes and prediction of toxigenesis
The BLASTP analysis was carried out between host and pathogen for the identification of non-host proteins from C. pneumoniae.

Testing the druggability and prediction of therapeutic targets
The druggability of the predicted non-orthologous essential genes involved in toxin production in pathogenic systems was analyzed using the DrugBank database at http://www.drugbank.ca/ [16] through its Pharma search. Proteins were subjected to BLAST against the KEGG database at http://www.genome.jp/kegg/ [17] to determine the pathways in which they are involved. All the predicted putative therapeutic targets were found to be present in the unique metabolic pathways of C. pneumoniae (unpublished data).

Results & Discussion:
Genome comparison and Non-Orthologous gene prediction
C. pneumoniae was the reference organism, comprising 1128 synteny blocks. It was compared with the other Chlamydial organisms, which lack the ability to cause COPD in human beings, to identify organism-specific synteny blocks. Out of these, 354 non-orthologous synteny blocks were identified and their protein sequences were retrieved (Table 1). Hypothetical proteins were then filtered out of the 354 synteny blocks to reduce noise.
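The removal of hypothetical proteins can be illustrated with a simple annotation filter. This is a sketch under the assumption that each retrieved protein carries a free-text description and that hypothetical proteins are recognizable by the word "hypothetical" in it; the identifiers and the helper name `drop_hypothetical` are placeholders, and the study does not state how the filtering was actually performed.

```python
from typing import Dict


def drop_hypothetical(annotations: Dict[str, str]) -> Dict[str, str]:
    """Keep only proteins whose description does not mark them as hypothetical."""
    return {
        pid: desc
        for pid, desc in annotations.items()
        if "hypothetical" not in desc.lower()
    }


# Placeholder records: identifiers are invented for illustration.
records = {
    "p1": "hypothetical protein",
    "p2": "serine protease",
    "p3": "conserved hypothetical protein",
    "p4": "Clp protease",
}
functional = drop_hypothetical(records)
print(sorted(functional))
```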

Non-Orthologous Essential gene prediction and localization
Protein coding sequences were searched against the human proteome using BLASTP, and 60 proteins were identified as non-homologous to the human proteome. These were subjected to protein BLAST against the DEG database with an E-value cutoff of 10^-10 and 35% identity. Eighteen proteins, comprising 1% of the total number of protein coding sequences in C. pneumoniae AR39, were found to be essential and non-host (Figure 2).
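The two-stage screen described here amounts to a set difference followed by a set intersection. The sketch below assumes the BLASTP searches have already been reduced to sets of protein identifiers with significant hits; the function name and identifiers are hypothetical.

```python
from typing import Set


def non_host_essential(functional: Set[str], human_hits: Set[str], deg_hits: Set[str]) -> Set[str]:
    """Return proteins with no human homolog that also match an essential gene in DEG."""
    non_host = functional - human_hits   # stage 1: discard proteins homologous to the host
    return non_host & deg_hits           # stage 2: keep those with a significant DEG hit


# Placeholder identifiers for illustration only.
functional = {"p1", "p2", "p3", "p4"}
human_hits = {"p2"}            # proteins with a significant hit in the human proteome
deg_hits = {"p1", "p2", "p4"}  # proteins with a significant hit in DEG

targets = non_host_essential(functional, human_hits, deg_hits)
print(sorted(targets))
```

Here p2 is excluded despite its DEG hit because it has a human homolog, and p3 is excluded because it has no match among essential genes.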

Non-Host gene prediction and Bacterial Toxins prediction
Bacterial protein toxins are the most potent poisons known and may show activity at very high dilutions. Where toxins are the major determinant of virulence, virulent strains of a bacterium usually produce a range of toxins that is not observed in non-virulent strains. Protein toxins are soluble proteins secreted by living bacteria, and they are essential for bacterial survival, attack and defense. We have been successful in identifying these toxins in C. pneumoniae; they can be classified into exotoxins and endotoxins, the latter being specific to Gram-negative bacteria such as Chlamydia.
Among the 18 putative uncharacterized essential proteins, toxins were predicted using the BTXPred tool, which predicts bacterial toxins from protein sequence and classifies them into exotoxins and endotoxins. Five proteins were identified as bacterial exotoxins and 2 as bacterial endotoxins. These proteins are involved in toxigenesis, with the help of attached lipopolysaccharide components, to produce cytotoxicity, and they play an important role in pathogenesis.

Testing Druggability and prediction of Therapeutic targets
We report these bacterial toxins as potential candidate targets. No specific drugs were found when they were checked in DrugBank, so there is an urgent need to develop specific inhibitors of these targets, as they are the primary cause of the pathogenesis of C. pneumoniae and of related organisms that cause disease in humans. We characterized the remaining proteins using InterPro, which analyzes a protein's function, classifies it by family and domains, provides information on important sites, and integrates protein signature information from its member databases.
We successfully predicted 1 RNA binding protein, 6 DNA binding proteins, 4 proteins involved in catalytic activity, 3 ATP binding proteins, 1 chaperonin protein, 1 protein involved in oxidoreductase activity, 1 protein involved in hydrolase activity and 1 protein involved in proteolysis. We recommend 4 proteins: signal recognition particle protein FtsY and cysteinyl-tRNA synthetase, which are endotoxins involved in pathogenesis, and serine protease and Clp protease, which are ubiquitous proteins that serve as virulence factors in causing the disease.

Conclusion:
The SBCG analysis of C. pneumoniae has led to the identification of several proteins in the C. pneumoniae genome that can be targeted for potential drug design and vaccine development. As various identified drug targets have been reported to play a vital role in the important metabolic pathways that regulate bacterial pathogenicity and necessary nutrient uptake, a novel systematic approach to developing drugs against these targets would likely be very promising for the treatment of COPD.
The present study has thus identified several proteins that are present in unique metabolic pathways of C. pneumoniae (unpublished data). The drug targets identified are relatively few in number and can be targeted for effective drug design and vaccine development against C. pneumoniae. It is expected that drugs developed against the identified targets will be specific to the pathogen and have little or no toxicity for the host. Homology modeling of these target proteins will help to identify crucial sites that can be targeted in drug design.