MTB-PCDB: Mycobacterium tuberculosis proteome comparison database.

The Mycobacterium tuberculosis Proteome Comparison Database (MTB-PCDB) is an online database providing integrated access to proteome sequence comparison data for the five strains of Mycobacterium tuberculosis (MTB) that have been completely sequenced so far: H37Rv, H37Ra, CDC 1551, F11 and KZN 1435. MTB-PCDB currently hosts 40252 protein sequence comparisons obtained through inter-strain comparison of these five proteomes. Using MTB H37Rv as the reference strain, 2373 proteins were found to be identical in all five strains. To enable wide use of these data, MTB-PCDB provides a set of tools for searching, browsing, analyzing and downloading them. By bringing together proteome comparisons among virulent and avirulent strains, and among drug-susceptible and drug-resistant strains, MTB-PCDB provides a discovery platform for comparative proteomics that may give insights into the discovery and development of TB drugs, vaccines and biomarkers.

Availability:
The database is available for free at http://www.bicjbtdrc-mgims.in/MTB-PCDB/


Background:
One third of the world's population is considered to be infected with Mycobacterium tuberculosis, which leads to nearly 9.4 million new cases and 3 million deaths every year [1]. Multi-drug-resistant strains of this pathogen, emerging in association with HIV, have added a frightening dimension to the problem [2]. Outbreaks of extensively drug-resistant (XDR) tuberculosis have also been an increasing threat in certain regions around the world [3]. M. tb H37Rv is virulent and susceptible to most of the antitubercular drugs used so far, H37Ra is an avirulent strain [4], and the M. tb KZN strain is resistant to several drugs, including isoniazid, rifampicin, kanamycin, ofloxacin, ethambutol and pyrazinamide [5]. These phenotypic differences suggest that genetic or proteomic differences exist among the strains, so genomic as well as proteomic analysis of different MTB strains is needed to characterize the variation among them. The complete genome sequences of four clinical strains of Mycobacterium tuberculosis (H37Rv, CDC 1551, F11 and KZN 1435) and one avirulent strain (H37Ra) are available. In this study we compared the proteomes of these MTB strains using NCBI's standalone BLAST.

Database Architecture & Design:
The standalone BLAST program from NCBI was downloaded and configured on a local system. The proteome sequences were formatted using the formatdb program of standalone BLAST, followed by pairwise comparison (local BLAST) between strains using the blastall program, taking a whole proteome at a time. The Mycobacterium tuberculosis Proteome Comparison Database (MTB-PCDB) was developed using Microsoft SQL Server as the back end. The BLAST output was then parsed and stored in MS SQL relational database tables using Perl code developed in-house. While parsing the BLAST output, the percentage identities, positives, number of gaps, identical residues, bits, bits score, e-value, query length, subject length, query sequence, subject sequence, consensus sequence etc. of the first hit obtained were recorded for each protein comparison.
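The formatting, pairwise comparison and first-hit parsing steps described above can be sketched as follows. This is only an illustrative outline in Python, not the in-house Perl code: the FASTA file names, the use of tabular (`-m 8`) output and the choice to run every ordered query/subject pair are assumptions made for the example, and the commands are only built, not executed.

```python
from itertools import permutations

# Strain names come from the text; the file names derived from them are hypothetical.
STRAINS = ["H37Rv", "H37Ra", "CDC1551", "F11", "KZN1435"]


def fasta_path(strain):
    """Hypothetical naming scheme for the per-strain proteome FASTA files."""
    return f"{strain}.faa"


def build_commands(strains):
    """Return legacy BLAST command lines: one formatdb call per strain,
    then one blastall (blastp) call per ordered query/subject pair."""
    commands = []
    for strain in strains:
        # -p T tells formatdb that the input file holds protein sequences.
        commands.append(["formatdb", "-i", fasta_path(strain), "-p", "T"])
    for query, subject in permutations(strains, 2):
        commands.append([
            "blastall", "-p", "blastp",
            "-i", fasta_path(query),    # whole query proteome
            "-d", fasta_path(subject),  # database created by formatdb
            "-m", "8",                  # tabular output (assumption for this sketch)
            "-o", f"{query}_vs_{subject}.tsv",
        ])
    return commands


def first_hits(tabular_lines):
    """Keep only the first reported hit for each query protein.

    Expects legacy tabular rows: query, subject, % identity, alignment
    length, mismatches, gap openings, q.start, q.end, s.start, s.end,
    e-value, bit score.
    """
    hits = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query = fields[0]
        if query in hits:
            continue  # later hits for the same query are ignored
        hits[query] = {
            "subject": fields[1],
            "percent_identity": float(fields[2]),
            "alignment_length": int(fields[3]),
            "mismatches": int(fields[4]),
            "gap_openings": int(fields[5]),
            "evalue": float(fields[10]),
            "bit_score": float(fields[11]),
        }
    return hits
```

Each command list could be passed to `subprocess.run` on a machine where legacy BLAST is installed. Tabular output does not carry the aligned sequences or the positives count, so a parser that stores those fields, as described above, would need to read the default pairwise report instead.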

Data Access:
The interfaces of MTB-PCDB are designed to help users navigate the database easily and retrieve information for analysis. The site's sections include Comparison, Useful Links and Help. The database can be queried for proteome sequence comparison information in several ways through a user-friendly web interface (Figure 1):
i) The user can search protein sequence comparison data between any two strains of MTB by specifying the desired identities and percentage similarity.
ii) Advanced Search options such as identity, similarity, query coverage, bits, bits score etc. are provided for retrieving more specific information on pairwise proteome comparisons.
iii) A dynamic result page appears after any search, in which the user can sort the comparison results by identities, similarities, query coverage, bits score, query length, subject length etc.
iv) The user can restrict the number of items shown per page in the search results.
v) The user can also download sequence comparison data.
vi) Each comparison links to the details of the comparison between the two sequences of the respective strains, i.e., Protein Name, Protein Length, Start, End, Strand, Accession No., Gene ID, Locus etc., along with the whole alignment between the query and subject sequences and the consensus sequence showing the matches, mismatches and gaps in the alignment.
vii) There is also an advanced comparison page for comparing the proteomes of multiple strains at a time.
These features may help users identify mutations involved in drug resistance and pathogenicity.
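The search, sort and paging options listed above can be illustrated with a small in-memory sketch. It does not reproduce the MTB-PCDB web interface or its SQL Server queries. The `Comparison` record, its field names and the three helper functions are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One stored first-hit comparison (hypothetical field names)."""
    query_id: str
    subject_id: str
    identity: float        # percent identical residues
    similarity: float      # percent positives
    query_coverage: float  # percent of the query covered by the alignment
    bit_score: float


def search(records, min_identity=0.0, min_similarity=0.0, min_coverage=0.0):
    """Return the records that meet every minimum threshold."""
    return [
        r for r in records
        if r.identity >= min_identity
        and r.similarity >= min_similarity
        and r.query_coverage >= min_coverage
    ]


def sort_results(records, key="identity", descending=True):
    """Sort by any numeric field of Comparison, e.g. 'bit_score'."""
    return sorted(records, key=lambda r: getattr(r, key), reverse=descending)


def page(records, page_number, per_page):
    """Return one page of results; page numbers start at 1."""
    start = (page_number - 1) * per_page
    return records[start:start + per_page]
```

A query such as `page(sort_results(search(records, min_identity=90.0), key="bit_score"), 1, 25)` would then return the 25 highest-scoring comparisons with at least 90% identity. In a database-backed interface the same three steps would normally be expressed as a single SQL query with filtering, ordering and row limits.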

Utility:
MTB-PCDB is a comprehensive database with a total of 40252 protein sequence comparisons. The proteomic variation found among the five M. tuberculosis strains may play a vital role in each strain. This comparative study may help in understanding the mechanism of pathogenesis and survival of M. tuberculosis within the host. The information may also facilitate the design of new antitubercular vaccines and therapeutic agents based on the identified virulence-associated mutations.
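As one example of how such comparison data might be used, the sketch below collects the reference-strain proteins whose first hit is identical in every other strain. The text does not give the criterion used for "identical", so the definition here (no gaps, and identical residues spanning the full length of both sequences) is an assumption, as are the dictionary layout and all names.

```python
def is_identical(hit, query_length):
    """Assumed criterion: the alignment has no gaps and the identical
    residues span the full length of both query and subject."""
    return (
        hit["gaps"] == 0
        and hit["identical_residues"] == query_length == hit["subject_length"]
    )


def identical_in_all(reference_lengths, hits_by_strain):
    """Return reference protein IDs with an identical first hit in every strain.

    reference_lengths: {protein_id: length} for the reference proteome.
    hits_by_strain:    {strain: {protein_id: first-hit dict}}.
    """
    conserved = set(reference_lengths)
    for strain_hits in hits_by_strain.values():
        conserved &= {
            protein_id
            for protein_id, hit in strain_hits.items()
            if protein_id in reference_lengths
            and is_identical(hit, reference_lengths[protein_id])
        }
    return conserved
```

Proteins in the reference set that fall outside the returned set are the candidates for closer inspection, since they differ in, or are missing from, at least one strain.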

Caveats:
MTB-PCDB does not cover all strains of M. tb, because not all of them have been completely sequenced.

Future Developments:
As new TB strains are sequenced and become available in public databases, we shall attempt to update MTB-PCDB with the new proteome comparison data. We will continue analyzing the proteomic variation among different strains and correlating it with their drug resistance, virulence and pathogenic properties.