CHIKVPRO - a protein sequence annotation database for chikungunya virus.

In the recent past, there has been a resurgence of interest in Chikungunya virus (CHIKV) attributed to massive outbreaks of Chikungunya fever in the South-East Asia Region. This has reflected in substantial increase in submission of CHIKV genome sequences to NCBI (National Center for Biotechnology Information) database. Hereby we submit a database “CHIKVPRO” containing structural and functional annotation of Chikungunya virus proteins (25 strains) submitted in the NCBI repository. The CHIKV genome encodes for 9 proteins:4 non-structural and 5 structural. The CHIKVPRO database aims to provide the virology community with a single accession authoritative resource for CHIKV proteome- with reference to physiochemical and molecular properties, proteolytic cleavage sites, hydrophobicity, transmembrane prediction, and classification into functional families using SVMProt and other Expasy tools. Availability The database is freely available at http://www.chikvpro.info/


Background:
The Chikungunya disease burden is highly associated with the large-scale morbidity. The disease manifests itself with an acute febrile illness that lasts for 2-5 days, followed by prolonged joint pain which may persist for weeks, months or even years [1]. The causative agent Chikungunya virus was first isolated in Tanzania in 1953 and since then it was associated with a number of epidemic outbreaks in Africa, Southeast Asia and India [2]. Aedes mosquitoes, primarily Aedes aegypti and Aedes albopictus have been identified as the vectors responsible for disease transmission. Development of new tools for diagnosis, prevention and treatment of Chikungunya need to be supported. There is a need to characterize the proteome of various CHIKV strains isolated during different outbreaks. CHIKVPRO, a database with computational analysis and integrated information of different CHIKV strains is an attempt in this direction to provide proteomic data to those involved in virology research.

Methodology: Dataset:
The complete genome sequences of 25 Chikungunya strains (submitted till October 2009) were downloaded from NCBI nucleotide database [4]. The translated protein sequence for each strain was extracted from NCBI protein database in two sets -the non-structural polyprotein sequence and the structural-polyprotein. For all strains, the non structural polyprotein sequence was divided into 4 protein sequences -non structural proteins nsP1, nsP2, nsP3, and nsP4; and the structural polyprotein sequence into 5 protein sequences -capsid, 6K, envelope protein E1, E2 and E3. This fragmentation was done computationally using parameters such as amino acid positions and sequence alignment in reference to the strains 37997 and African S27 characterized in Uniprot database

Database:
MySQL was used to construct a relational database to store information of viral proteins in the form of related tables. Data consistency and non redundancy was maintained by using normalisation techniques. HTML and PHP were used to provide a dynamic web interface which was appropriately connected with the database. The database is freely available to view and download data.

Database features:
CHIKVPRO provides its users to search a protein through its name, accession ID and virus strain (Figure 1). After selecting a particular protein, the user can perform the following: (1) Molecular predictiongives molecular weight, amino acid composition, atomic composition and physiochemical properties like theoretical pI, instability index, aliphatic index and grand average of hydrophobicity (GRAVY). The tool used for this analysis was Protparam

Database utility:
The in silico analysis gives us a brief idea about the role of each protein and helps us to understand its biological significance. Such information might help virologists better understand the mode of virus replication, its mechanism of pathogenesis, strain specific variations and to develop potential anti-viral agents [8]. Since earlier studies have shown that a single mutation in the virus affects vector specificity, severity and epidemic potential, this database may also provide useful information for determining the virulence of the new isolates [9, 10]. The database provides its users a web-interface which is highly user-friendly and easy to access (Figure 1).

Conclusions and future aspects:
CHIKVPRO, a protein sequence annotation database of Chikungunya virus was designed to provide an easy access to the large and growing volume of data. The database provides useful resource of information on viral proteins for molecular biologists and virologists working in related areas.
In the submitted version, CHIKVPRO provides information on physiochemical, molecular and functional properties of each protein. It also illustrates the various peptide cleavage sites, transmembrane and other functional domains present in each protein. We plan to further include additional features such as higher structural conformations, post translational modifications, etc. and also upload the data for the remaining strains in our advanced version of CHIKVPRO. The database shall be updated timely as more data on viral proteins is generated through our ongoing experimental analysis and future sequence submissions.