SALMONELLABASE - An online database of druggable targets of Salmonella species

Salmonellosis, caused by Salmonella serovars, is one of the most common and widely distributed foodborne diseases. The emergence of multidrug-resistant strains has become a serious public health problem, and targeting unique effectors of this pathogen is a promising strategy for drug design. SalmonellaBase is an online web portal that serves as an integrated source of information about Salmonella serovars, providing the data required for structural and functional studies and for the analysis of druggable targets in Salmonella. We have identified several target proteins that contribute to the pathogenicity of the organism and predicted their structures. The database holds information on completely sequenced genomes of Salmonella species, together with the complete set of protein sequences, determined structures, predicted protein structures and biochemical pathways of the respective strains. In addition, we provide the name and source of each protein, UniProt and Protein Data Bank codes, and literature information. Furthermore, SalmonellaBase is linked to related databases and other resources. We have set up a web interface with different search and display options so that users can retrieve the data in several ways. SalmonellaBase is a freely available database. Availability: http://www.salmonellabase.com/


Background:
Salmonellosis, caused by Salmonella serovars, is one of the most common and widely distributed foodborne diseases. Salmonella enterica serovar Typhi is a human-specific pathogen that causes enteric (typhoid) fever, a severe infection of the reticuloendothelial system. Early antibiotic treatment has proven highly effective in eliminating infections, but the indiscriminate use of antibiotics has led to the emergence of multidrug-resistant strains of S. enterica serovar Typhi [1-4], which has become a serious public health problem. Salmonella is responsible for an estimated 3 billion human infections worldwide and kills 217,000 people every year [5].
Since typhoid is becoming difficult to treat with conventional drugs, information about the whole genome sequence and genes of S. enterica serovar Typhi will help to reveal more specific targets for drugs aimed at disease treatment and for vaccines aimed at prevention. Targeting unique effectors of this pathogen can be considered a powerful strategy for designing drugs against drug-resistant bacterial variants [6]. Over the past few decades, nucleotide and protein sequence data have accumulated and become available in a number of public databases [7, 8]. However, such databases are broad in scope, and there is a gap between the public databases and the small curated databases that focus on a particular organism or type of data. Therefore, we have developed SalmonellaBase, an online database with the data required for structural and functional studies of the druggable targets of Salmonella serovars.

Methodology:
The database consists of records of nucleotide sequences, protein sequences, biological pathways, determined 3D structures, and the predicted druggable protein targets of 16 different strains of Salmonella species. They are organized to simplify the task of finding relevant data for proteins in the related strains (Figure 1). Complete genome and proteome sequence data were collected from NCBI. The protein records include all the functionally assigned protein sequences as well as the hypothetical proteins; each record, when accessed, returns the primary sequence in FASTA format. The Database of Essential Genes (DEG) [9] (http://tubic.tju.edu.cn/deg) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [10] pathway database were used to identify putative drug targets for the individual strains. SalmonellaBase was constructed with standard HTML and JavaScript. The database uses PHP for the front end and MySQL as the back end. The database is available online with user-friendly search methods and graphical browsers (Figure 2).
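As an illustration of how a stored protein record might be returned in FASTA format, the following is a minimal sketch only. It uses Python with an in-memory SQLite database as a stand-in for the PHP and MySQL setup described above; the table name, column names, identifier and sequence are hypothetical and are not taken from SalmonellaBase.

```python
import sqlite3


def to_fasta(identifier, description, sequence, width=60):
    """Format one sequence as FASTA: a '>' header line, then wrapped residues."""
    header = f">{identifier} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)


def build_demo_db():
    """Create an in-memory table with one made-up record (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein ("
        "protein_id TEXT PRIMARY KEY, strain TEXT, name TEXT, sequence TEXT)"
    )
    conn.execute(
        "INSERT INTO protein VALUES (?, ?, ?, ?)",
        ("DEMO_0001", "demo strain", "hypothetical protein", "MKT" * 25),
    )
    conn.commit()
    return conn


def fetch_fasta(conn, protein_id):
    """Look up one record by identifier and return it as FASTA text, or None."""
    row = conn.execute(
        "SELECT protein_id, strain, name, sequence FROM protein "
        "WHERE protein_id = ?",
        (protein_id,),
    ).fetchone()
    if row is None:
        return None
    pid, strain, name, seq = row
    return to_fasta(pid, f"{name} [{strain}]", seq)


if __name__ == "__main__":
    print(fetch_fasta(build_demo_db(), "DEMO_0001"))
```

The parameterized query keeps the lookup safe when the identifier comes from a web form, which would matter for any public search interface of this kind.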

3D Structures of druggable targets:
One of the main focuses of the database is to provide structural information on the druggable targets found in the completely sequenced genomes of Salmonella strains. The database holds predicted structures of proteins that could serve as targets for novel drug design. The target proteins were identified by a differential genomics approach; their structures were modeled using comparative and homology modeling approaches, refined by energy minimization and validated by Ramachandran plot. Along with the predicted structures of protein targets, structure files of experimentally determined proteins obtained from the PDB are also curated. These structures are available for download from the respective pages in the database.
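The text names a differential genomics approach but does not give its criteria. The following is a minimal sketch of one common form of such a filter, assuming that a candidate is kept when it is flagged as essential (for example, through a hit in an essential-gene set such as DEG) and has no close homolog in the host. The field names, the 35% identity cut-off and the demo data are all hypothetical and are not taken from SalmonellaBase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    protein_id: str
    essential: bool        # hypothetical flag: hit in an essential-gene set
    host_identity: float   # hypothetical: best % identity to a host protein


def select_targets(candidates, max_host_identity=35.0):
    """Keep essential proteins whose best host match is below the cut-off."""
    return [
        c.protein_id
        for c in candidates
        if c.essential and c.host_identity < max_host_identity
    ]


if __name__ == "__main__":
    demo = [
        Candidate("DEMO_A", essential=True, host_identity=12.0),
        Candidate("DEMO_B", essential=True, host_identity=68.0),
        Candidate("DEMO_C", essential=False, host_identity=5.0),
    ]
    print(select_targets(demo))
```

In this sketch only `DEMO_A` passes both conditions; `DEMO_B` is too similar to a host protein and `DEMO_C` is not flagged as essential.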

Features of the database:
The database consists of a collection of the primary sequences of all hypothetical proteins, functionally assigned proteins, complete genome sequences, biological pathways and the tertiary structure files of protein targets. The important feature of SalmonellaBase is the integrated availability of data. The completely sequenced genomes of over 16 strains of Salmonella serovars are incorporated in this database (Table 1; see supplementary material). Each protein record shows (i) the start and end positions of the protein in the genome; (ii) the length of the nucleotide sequence and the length of the amino acid sequence; and (iii) the primary protein sequence in FASTA format. In addition, (iv) the protein summary page lists the hypothetical protein sequences of all the strains with their positions in the genome; (v) users can search for a protein of interest by entering a keyword in the filter option; and (vi) each biological pathway record includes the pathway name with its image.
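The record fields and the keyword filter listed above can be sketched as follows. This is an illustrative model only: the class, its field names and the demo entries are hypothetical, the coordinates are assumed to be 1-based and inclusive, and the filter is assumed to be a case-insensitive substring match on the protein name.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    name: str
    start: int       # assumed 1-based, inclusive genome coordinate
    end: int         # assumed 1-based, inclusive genome coordinate
    sequence: str    # primary amino acid sequence

    @property
    def nucleotide_length(self):
        """Length of the genomic span under the inclusive-coordinate assumption."""
        return self.end - self.start + 1

    @property
    def amino_acid_length(self):
        """Number of residues in the stored protein sequence."""
        return len(self.sequence)


def filter_records(records, keyword):
    """Return records whose name contains the keyword, ignoring case.

    An empty keyword matches every record.
    """
    needle = keyword.strip().lower()
    return [r for r in records if needle in r.name.lower()]


if __name__ == "__main__":
    demo = [
        ProteinRecord("hypothetical protein", 101, 400, "M" * 99),
        ProteinRecord("putative kinase", 500, 1099, "M" * 199),
    ]
    for rec in filter_records(demo, "Kinase"):
        print(rec.name, rec.nucleotide_length, rec.amino_acid_length)
```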