A searchable database for the genome of Phomopsis longicolla (isolate MSPL 10-6)

Phomopsis longicolla (syn. Diaporthe longicolla) is an important seed-borne fungal pathogen that primarily causes Phomopsis seed decay (PSD) in most soybean production areas worldwide. This disease severely decreases soybean seed quality by reducing seed viability and oil quality, altering seed composition, and increasing the frequency of moldy and/or split beans. To facilitate investigation of the genetic basis of fungal virulence factors and to understand the mechanism of disease development, we designed and developed a database for P. longicolla isolate MSPL 10-6 that contains information about the genome assemblies (contigs), gene models, gene descriptions and Gene Ontology (GO) functional annotations. A web-based front end to the database was built using ASP.NET, which allows researchers to search and mine the genome of this important fungus. This database represents the first reported genome database for a seed-borne fungal pathogen in the Diaporthe-Phomopsis complex. The database will also be a valuable resource for research and agricultural communities, and it will aid in the development of new control strategies for this pathogen. Availability: http://bioinformatics.towson.edu/Phomopsis_longicolla/HomePage.aspx


Background:
Phomopsis longicolla (syn. Diaporthe longicolla) is an important seed-borne fungal pathogen that primarily causes Phomopsis seed decay (PSD) in most soybean production areas worldwide [1,2]. This disease severely decreases soybean seed quality by reducing seed viability and oil quality, altering seed composition, and increasing the frequency of moldy and/or split beans [3][4][5][6]. Research on the internal transcribed spacer (ITS) region [7], the small subunit of the mitochondrial ribosomal RNA gene [8], and other genes/regions of P. longicolla has been reported. Recently, the genome of P. longicolla isolate MSPL 10-6, which was isolated from field-grown soybean seed in Mississippi, USA, was sequenced [9]. Development of a database for P. longicolla isolate MSPL 10-6 that contains information about the genome assemblies (contigs), gene models, gene descriptions and Gene Ontology (GO) functional annotations will allow researchers to search and mine the genome of this important fungus. The database will be a valuable resource for research and agricultural communities, and will facilitate investigation of the genetic basis of fungal virulence factors and an understanding of the mechanism of disease development. To our knowledge, this database represents the first reported genome database for a seed-borne fungal pathogen in the Diaporthe-Phomopsis complex.

Methodology of Development:
The database was designed, implemented and hosted using Microsoft SQL Server 2008 Enterprise Edition. Microsoft Visual Studio 2013 was used to design and implement the web pages, which were programmed using the ASP.NET 4.0 framework and the C# programming language. Both the database and the website are hosted on the same server at Towson University in Baltimore, MD, USA. The server runs Microsoft Windows Server 2012 and Internet Information Services (IIS 7.0). The database stores the assembly of the P. longicolla MSPL 10-6 genome (108 scaffolds) [9] and its annotations. In addition to the sequences, the database also houses information on gene function and GO term distributions.
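The schema of the database is not given in this paper. As a rough illustration of how scaffolds, gene models and GO annotations could be related, the following is a minimal sketch in Python, with SQLite standing in for Microsoft SQL Server. All table names, column names and the helper functions `create_database` and `go_distribution` are hypothetical and are not taken from the actual implementation.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# and SQLite stands in for Microsoft SQL Server.
SCHEMA = """
CREATE TABLE scaffold (
    scaffold_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    sequence    TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id     INTEGER PRIMARY KEY,
    scaffold_id INTEGER NOT NULL REFERENCES scaffold(scaffold_id),
    name        TEXT NOT NULL UNIQUE,
    description TEXT,
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL
);
CREATE TABLE go_annotation (
    gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
    go_id    TEXT NOT NULL,
    go_term  TEXT NOT NULL,
    category TEXT NOT NULL CHECK (category IN
             ('biological_process', 'molecular_function', 'cellular_component')),
    PRIMARY KEY (gene_id, go_id)
);
"""


def create_database(path=":memory:"):
    """Open a database, enforce foreign keys, and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def go_distribution(conn):
    """Count distinct annotated genes per GO category."""
    rows = conn.execute(
        "SELECT category, COUNT(DISTINCT gene_id) FROM go_annotation "
        "GROUP BY category ORDER BY category"
    )
    return dict(rows.fetchall())
```

Keeping GO annotations in their own table allows one gene to carry several terms, and a per-category count of this kind is the sort of summary that could feed a GO distribution chart.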

Utility to the biological community:
The database contains the genome sequence of P. longicolla MSPL 10-6 and the 16,597 genes that were annotated. The annotation includes GO terms, assigned to most genes, in three categories (biological process, molecular function and cellular component). The database's web-accessible interface (Figure 1) provides an easy way to search, browse and download the sequences and functional annotation data stored in the database. The following are the main functions the website provides:
[1] Search: Users can search by GO term or by sequence description (Figure 2). A partial string can be used if the user is not sure of the full GO term or gene name. Both search types return their results in a tabular format that allows the user to select any record to see details about that specific sequence/gene. The details include the sequence name, sequence description, sequence length, BLAST E-value, gene ontology, InterProScan results and the actual sequence in FASTA format.
[2] Statistics and Graphs: The website provides static pages that display the annotation statistics (lengths of coding regions, number of exons, etc.) along with bar graphs depicting the GO term distributions.
[3] Download: The website allows users to download the complete assembled genome (FASTA format) and the annotations in both FASTA and GFF3 formats. Raw sequences can be found in the SRA database at: http://www.ncbi.nlm.nih.gov/nuccore/AYRD00000000/
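The partial-string search and the FASTA view of a selected record, as described under Search above, could be sketched as follows. This is an illustration only, not the website's ASP.NET/C# code: the table `gene_record`, its columns, the placeholder rows and the functions `demo_connection`, `search_genes` and `to_fasta` are all hypothetical, and SQLite is again used in place of SQL Server.

```python
import sqlite3


def demo_connection():
    """Build an in-memory table holding two made-up placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_record (name TEXT, description TEXT, "
        "go_term TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene_record VALUES (?, ?, ?, ?)",
        [
            ("gene_a", "placeholder transporter", "transmembrane transport", "ATGAAA"),
            ("gene_b", "placeholder kinase", "protein phosphorylation", "ATGCCC"),
        ],
    )
    return conn


def escape_like(text):
    """Escape LIKE wildcards so that user input is matched literally."""
    return text.replace("!", "!!").replace("%", "!%").replace("_", "!_")


def search_genes(conn, fragment, field="description"):
    """Return (name, description, go_term) rows whose field contains fragment."""
    allowed = {"description", "go_term"}  # whitelist guards the column name
    if field not in allowed:
        raise ValueError(f"unsupported search field: {field}")
    sql = (
        "SELECT name, description, go_term FROM gene_record "
        f"WHERE {field} LIKE ? ESCAPE '!' ORDER BY name"
    )
    # The fragment itself is passed as a bound parameter, never interpolated.
    return conn.execute(sql, (f"%{escape_like(fragment)}%",)).fetchall()


def to_fasta(name, description, sequence, width=60):
    """Format one record as FASTA text with fixed-width sequence lines."""
    lines = [f">{name} {description}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

For example, `search_genes(demo_connection(), "kinase")` matches only the placeholder record `gene_b`. Binding the search fragment as a query parameter and escaping the wildcard characters keeps user input from being interpreted as SQL or as a pattern.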