Integrating T-cell epitope annotations with sequence and structural information using DAS

Immunoinformatics is an emerging field that benefits from computational analyses and tools that facilitate the understanding of the immune system. A large number of immunoinformatics resources, such as immune-related databases and analysis software, are available through the World Wide Web for the benefit of the research community. However, immunoinformatics developments have sometimes remained isolated from mainstream bioinformatics. There is therefore a clear need for integration, which will enable the quick and efficient exchange of data and annotations within the scientific community. Here, we have chosen the Distributed Annotation System (DAS) to integrate in-house annotations on experimental and predicted HLA I-restriction elements of CD8 T-cell epitopes with sequence and structural information.


Background:
Recent years have witnessed the birth of immunoinformatics, an emerging subdiscipline of bioinformatics. With the explosion of immunological data, computational analysis has become an essential element of immunology research, facilitating the understanding of immune function by modeling the interactions among immunological components [1]. Another major role of immunoinformatics is the efficient management, storage, and annotation of such data. Following those principles, a large number of immunoinformatics resources, including immune-related databases and sophisticated analysis software, are available through the World Wide Web. Collectively, these resources contribute to the advances made in immunological research. Yet a major step remains to be taken towards the integration of all these resources since, ideally, multiple research groups should be able to exchange and compare their data quickly and efficiently.
The Distributed Annotation System (DAS) defines a communication protocol used to exchange biological annotations from a number of heterogeneous distributed databases [2]. The key idea behind DAS is that annotations should not be provided by single centralized databases but should instead be spread over multiple sites. DAS follows a simple HTTP-based client-server protocol in which clients make requests to the servers in the form of a URL and receive simple XML responses. The basic system is composed of a reference server, one or more annotation servers, and an annotation viewer. The reference server is responsible for serving genome maps, sequences, and information related to the sequencing process. Annotation servers are responsible for returning the annotations on a defined region (given start and stop coordinates) of the genome or proteome.
The annotation viewer can be either a simple web browser, which displays the raw XML data provided by the server, or a graphical client that renders the XML annotations, such as the Center for Biological Sequence Analysis (CBS) DAS viewer [3], accessible at http://www.cbs.dtu.dk/cgi-gin/das.
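As an illustration of the request/response cycle described above, the following Python sketch builds a DAS features request URL and parses a response. It assumes the DAS 1.x conventions of a /das/<source>/features?segment=ID:start,stop URL and a DASGFF/GFF/SEGMENT/FEATURE XML layout. The server address, the source name, the accession P12345 and the sample XML are placeholders made up for the example, not output from a real server.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET


def das_features_url(server, source, segment_id, start=None, stop=None):
    """Build a DAS 'features' request URL (assumed DAS 1.x URL layout)."""
    segment = quote(segment_id, safe="")
    if start is not None and stop is not None:
        segment += f":{start},{stop}"
    return f"{server.rstrip('/')}/das/{source}/features?segment={segment}"


def parse_features(xml_text):
    """Extract a list of feature dicts from a DASGFF-style XML response."""
    root = ET.fromstring(xml_text)
    features = []
    for seg in root.iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            features.append({
                "segment": seg.get("id"),
                "id": feat.get("id"),
                "label": feat.get("label"),
                "type": (feat.findtext("TYPE") or "").strip(),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                "note": (feat.findtext("NOTE") or "").strip(),
            })
    return features


# Hypothetical response, hand-written for this example only.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF>
    <SEGMENT id="P12345" start="1" stop="100">
      <FEATURE id="epitope_1" label="HLA-A*0201">
        <TYPE id="experimental">experimental</TYPE>
        <START>10</START>
        <END>18</END>
        <NOTE>illustrative note text</NOTE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    print(das_features_url("http://example.org", "tepidas", "P12345", 1, 100))
    for feature in parse_features(SAMPLE_RESPONSE):
        print(feature)
```

A real client would fetch the URL over HTTP and pass the response body to parse_features; the sketch skips the network step so that it runs without a server.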
In this article, we show how an epitope database can be integrated with other database resources using DAS. To that end, we describe TEPIDAS, a DAS annotation server of HLA I-restricted CD8 T-cell epitopes specific to human pathogenic organisms. TEPIDAS falls into the category of annotation servers; it has been registered at the DAS registry since February 2008 under the unique ID DS_545.

Description:
Overview
TEPIDAS is a DAS annotation server that follows the UniProt coordinate system to annotate the experimental and potential HLA I-restriction elements of a set of CD8 T-cell epitopes. TEPIDAS is implemented using ProServer. The annotations in TEPIDAS are precalculated and stored in a relational database; in the TEPIDAS server, ProServer simply retrieves the relevant information from this database and composes the XML response. The coordinate system defined for TEPIDAS has UniProt [5] as the "authority" and Protein Sequence as the "type". As for its capabilities, TEPIDAS implements the "types" and "features" queries.
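The retrieve-and-compose step can be sketched as follows. This is not the ProServer implementation (ProServer is a separate Perl package); it is a minimal Python illustration of the same idea, assuming a hypothetical one-table schema, invented rows, and a DASGFF-style response layout.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db():
    """Create an in-memory database with a hypothetical annotation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        "accession TEXT, feature_id TEXT, hla TEXT, "
        "evidence TEXT, start INTEGER, stop INTEGER)"
    )
    # Invented rows for illustration; these are not TEPIDAS data.
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("P12345", "ep1", "HLA-A*0201", "experimental", 10, 18),
            ("P12345", "ep2", "HLA-B*0702", "predicted", 40, 48),
            ("P12345", "ep3", "HLA-A*0201", "predicted", 120, 128),
        ],
    )
    return conn


def features_response(conn, accession, start, stop):
    """Return an XML string with the stored features overlapping [start, stop]."""
    rows = conn.execute(
        "SELECT feature_id, hla, evidence, start, stop FROM annotation "
        "WHERE accession = ? AND start <= ? AND stop >= ? "
        "ORDER BY start, feature_id",
        (accession, stop, start),
    ).fetchall()
    root = ET.Element("DASGFF")
    gff = ET.SubElement(root, "GFF")
    segment = ET.SubElement(
        gff, "SEGMENT", id=accession, start=str(start), stop=str(stop)
    )
    for feature_id, hla, evidence, f_start, f_stop in rows:
        feature = ET.SubElement(segment, "FEATURE", id=feature_id, label=hla)
        ET.SubElement(feature, "TYPE", id=evidence).text = evidence
        ET.SubElement(feature, "START").text = str(f_start)
        ET.SubElement(feature, "END").text = str(f_stop)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(features_response(build_demo_db(), "P12345", 1, 50))
```

Because the annotations are precalculated, answering a query reduces to a range lookup plus serialization, which is why the server side can stay this simple.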

Annotations served by TEPIDAS
TEPIDAS annotates the HLA I molecules that can restrict a set of 3250 CD8 T-cell epitopes. Epitopes were obtained from the EPIMHC [6] and IMMUNEEPITOPE (http://www.immuneepitope.org/) databases and were selected for having been experimentally defined in humans infected with the pathogen or immunized with the relevant source antigen. HLA I-restriction annotations are classified as experimental, when determined experimentally, or predicted. Predictions of epitope binding to HLA I molecules were obtained using a set of 72 position-specific scoring matrices (PSSMs), also known as weight matrices or profiles, which are derived from aligned peptides known to bind to the relevant HLA I molecules. This predictive method is described in full detail in [7]. In addition to the experimental and predicted data, the cumulative phenotypic frequency (CMV) of the T-cell epitope HLA I restriction is also provided for five ethnic groups (Black, Caucasian, Hispanic, North American natives and Asian). CMV was computed using the gene and haplotype frequencies of the relevant HLA I alleles [8].
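To make the PSSM idea concrete, the sketch below scores a peptide by summing one position-specific weight per residue and reports the molecules whose score reaches a cutoff. It illustrates the general technique only: the matrix values, the molecule name "HLA-X", the peptides and the threshold are all invented, and the thresholding rule is an assumption; the actual procedure is the one described in [7].

```python
# Toy 9-position PSSM: one dict of residue weights per peptide position.
# Residues not listed at a position score 0.0. All values are invented.
TOY_PSSM = [
    {},
    {"L": 2.0, "M": 1.5},   # position 2
    {}, {}, {}, {}, {}, {},
    {"V": 2.0, "L": 1.5},   # position 9
]


def score_peptide(pssm, peptide):
    """Sum the position-specific weight of each residue in the peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the number of PSSM positions")
    return sum(weights.get(residue, 0.0) for weights, residue in zip(pssm, peptide))


def predicted_restrictions(pssms, thresholds, peptide):
    """Return the molecules whose PSSM score meets its (assumed) threshold."""
    hits = []
    for molecule, pssm in pssms.items():
        if len(pssm) != len(peptide):
            continue  # this matrix models a different peptide length
        if score_peptide(pssm, peptide) >= thresholds[molecule]:
            hits.append(molecule)
    return sorted(hits)


if __name__ == "__main__":
    pssms = {"HLA-X": TOY_PSSM}        # hypothetical molecule name
    thresholds = {"HLA-X": 3.0}        # hypothetical cutoff
    for peptide in ("ALAAAAAAV", "AAAAAAAAA"):
        print(peptide, score_peptide(TOY_PSSM, peptide),
              predicted_restrictions(pssms, thresholds, peptide))
```

With one matrix per HLA I molecule, the same loop yields the list of predicted restriction elements for each epitope, which can then be stored as precalculated annotations.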
The potential population protection coverage of a T-cell epitope-based vaccine is determined by the percentage of the population that could mount a T-cell response to the epitopes, which in turn is given by the CMV of the HLA I molecules restricting these epitopes.
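As a rough illustration of how a phenotypic frequency follows from gene frequencies, the sketch below uses a simplified single-locus model: assuming Hardy-Weinberg equilibrium, the fraction of individuals carrying at least one allele from a set with summed gene frequency p is 1 - (1 - p)^2. This is an assumption made for the example; the calculation in [8] also uses haplotype frequencies to combine loci, which this sketch ignores. The function name and all frequencies are invented.

```python
def cumulative_phenotypic_frequency(gene_frequencies):
    """Fraction of a population carrying at least one of the given alleles.

    Simplifying assumptions (not taken from the source): all alleles belong
    to a single locus and the population is in Hardy-Weinberg equilibrium.
    """
    total = sum(gene_frequencies)
    if not 0.0 <= total <= 1.0:
        raise ValueError("summed gene frequencies must lie between 0 and 1")
    # Probability that neither chromosome carries one of the alleles is
    # (1 - total) ** 2; the complement is the phenotypic frequency.
    return 1.0 - (1.0 - total) ** 2


if __name__ == "__main__":
    # Invented gene frequencies of two restricting alleles in two groups.
    groups = {"group A": [0.20, 0.10], "group B": [0.05, 0.02]}
    for name, frequencies in groups.items():
        coverage = cumulative_phenotypic_frequency(frequencies)
        print(f"{name}: {coverage:.3f}")
```

For example, two alleles with gene frequencies 0.20 and 0.10 give a coverage of about 0.51 under these assumptions, noticeably more than the summed gene frequency of 0.30, because each individual carries two copies of the locus.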
Accessing TEPIDAS from the SPICE graphical client
SPICE [9] is a Java program that can be used to visualize annotations of protein sequences and protein structures. It is available at: http://www.efamily.org.uk/software/dasclients/spice. SPICE accepts either a PDB or a UniProt accession code and integrates information from four different types of DAS servers: 1) a protein sequence server that provides the sequence (typically UniProt), 2) an alignment server that provides the alignment between the protein sequence and its structure, 3) a structure server that serves the 3D coordinates displayed, and 4) several feature servers that provide pre-calculated annotations, TEPIDAS among them. SPICE retrieves the protein sequence pertaining to the selected UniProt accession number and displays it as a ruler with relative position numbers. Annotations, such as TEPIDAS annotation features, are listed below the sequence (Figure 1).
On the left of the panel, below the 'tepidas' descriptor, appears the type of HLA I molecule of the corresponding feature, which is shown as a colored rectangle on the right. When the user clicks on a feature, a pop-up window appears containing all the information on the feature, including the explanatory note. In addition, the PDB coordinates of the selected feature, if available, are highlighted in the left panel, enabling the epitope to be located on the 3D structure whenever there is a match between sequence and structure (Figure 1).