MFPPI – Multi FASTA ProtParam Interface

Physico-chemical properties reflect the functional and structural characteristics of a protein. A comparative study of these properties helps to clarify the role of a protein and to explore its molecular evolution. A number of online and offline tools are available for calculating the physico-chemical properties of a single protein sequence. However, no tool is available for a comparative study of Multi-FASTA sequences with graphical visualization. Hence, we describe the development and utility of MFPPI V.1.0 (a web interface developed on the JAVA platform), which submits each FASTA sequence from a Multi-FASTA file to the ProtParam web server for the calculation of physico-chemical properties. MFPPI V.1.0 calculates different physico-chemical properties for a given set of proteins in a single run and saves the data in an MS Excel sheet. Furthermore, it provides a graphical representation of protein physico-chemical properties for analysis and visualization of data in a user-friendly manner. The output of the analysis therefore helps to understand compositional changes and functional relationships in evolution among organisms. We have demonstrated the utility of MFPPI V.1.0 using 17 mtATP6 protein sequences from different mammalian species. It is available for free at http://insilicogenomics.in/mfpcalc/mfppi.html.


Background:
The physico-chemical properties of proteins are critical for sustainability, efficiency, and stability in a biological system. Various physico-chemical parameters of proteins, such as amino acid composition, extinction coefficient [1], instability index [2, 3], grand average of hydropathicity (GRAVY), aliphatic index, theoretical pI, atomic composition and molecular weight, allow us to understand the stability, activity and nature of a protein.
Many web-based and standalone software tools are available that compute physico-chemical properties of proteins. AACompIdent is a web-based tool at ExPASy that identifies proteins using amino acid composition [1].
Protein/Peptide Property Calculator [4] is a web-based tool to calculate the peptide chemical formula, molecular weight, net charge at neutral pH, hydrophilicity, hydrophobicity, isoelectric point and extinction coefficient. It also predicts hydrophobic or hydrophilic regions, the secondary structure of the protein, trans-membrane regions and flexible regions of the input protein or peptide sequence of interest. However, it is useful only for single-sequence analysis.
The Molinspiration server also offers a number of chemoinformatics tools to calculate LogP (octanol/water partition coefficient), molecular polar surface area and molecular volume [5]. ProtParam [6] from the ExPASy [7] server is a reliable tool to compute physico-chemical properties. However, it accepts a single sequence per analysis through its interface. Moreover, current methods do not analyze multiple sequences for comparative analysis. ProtParam also does not provide options for downloading results for subsequent analysis. Therefore, it is of interest to develop a novel interface using ProtParam to analyze multiple sequences from a Multi-FASTA file, producing results for comparative inference with evolutionary insights. It is also of interest to develop methods to download and store results in an ".xls" format for further analysis. Hence, we describe the development and utility of MFPPI V.1.0 on the JAVA platform, version JRE7 (simple, object-oriented, reliable, secure and portable), for this purpose.

Methodology:
Sequence retrieval and construction of Multi-FASTA file
Mitochondrial protein (mtProtein) sequences of 17 different mammalian species were retrieved in FASTA format from the National Center for Biotechnology Information (NCBI), and a single Notepad file with a ".txt" extension was created from them. Each FASTA header must start with ">lcl|", followed by an accession number or description. The header should end with at least one pair of brackets "[ ]", which may contain the species name or other details; the sequence should start after the bracket. The input FASTA file of different mammalian proteins is illustrated in Figure 1.
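The header rules above can be checked before a file is submitted. The following is a minimal sketch, not the MFPPI source: the class name FastaHeaderCheck, the method isAccepted and the exact regular expression are assumptions made for illustration, based only on the rules stated in the text (a ">lcl|" prefix, some accession or description, and a closing bracket pair).

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks one FASTA header line against the rules described above. */
final class FastaHeaderCheck {

    // ">lcl|", then at least one character of accession/description,
    // then a final "[...]" group (which may hold a species name), then optional whitespace.
    private static final Pattern HEADER =
            Pattern.compile(">lcl\\|.+\\[[^\\[\\]]*\\]\\s*");

    static boolean isAccepted(String headerLine) {
        return headerLine != null && HEADER.matcher(headerLine).matches();
    }

    public static void main(String[] args) {
        // Placeholder headers, not real accessions.
        System.out.println(isAccepted(">lcl|ACCESSION description [Homo sapiens]"));
        System.out.println(isAccepted(">ACCESSION description"));
    }
}
```

A header that fails this check would, under the stated rules, need to be edited by hand before the file is used as input.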

Script Development
Java GUI programming involves two packages: the original Abstract Window Toolkit (AWT) and the newer Swing toolkit. Swing is the primary Java GUI widget toolkit. The script of the web interface was developed in four steps.

Input data
The Multi-FASTA text file of mtProteins was declared as a string that contains several sequences in FASTA format, separated by the greater-than (">") symbol.

Splitting and storing Multi-FASTA sequence into raw sequence
Each sequence was split and converted into raw format (without any symbol or description line) and then stored in a separate file. To separate the sequence from its description line, each FASTA record was read into a string and the split method was applied to remove the span that starts at the greater-than symbol (">") and ends at "]".
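The splitting step can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the published script: the class name MultiFastaSplitter and the method split are hypothetical, the records are kept in memory rather than written to separate files, and a record without a closing bracket falls back to treating its first line as the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits Multi-FASTA text into description -> raw sequence pairs. */
final class MultiFastaSplitter {

    /** Returns the records in file order; a repeated description overwrites the earlier entry. */
    static Map<String, String> split(String multiFasta) {
        Map<String, String> records = new LinkedHashMap<>();
        for (String chunk : multiFasta.split(">")) {
            if (chunk.trim().isEmpty()) {
                continue; // text before the first ">" or empty records
            }
            // The description runs from ">" to the last "]"; the residues follow it.
            int close = chunk.lastIndexOf(']');
            int cut = close >= 0 ? close + 1 : chunk.indexOf('\n');
            if (cut < 0) {
                continue; // description only, no sequence
            }
            String description = chunk.substring(0, cut).trim();
            // Raw format: strip line breaks and spaces, keep residues only.
            String raw = chunk.substring(cut).replaceAll("\\s+", "").toUpperCase();
            records.put(description, raw);
        }
        return records;
    }

    public static void main(String[] args) {
        // Placeholder input, not real accessions or sequences.
        String demo = ">lcl|ID1 protein [Species one]\nMNEN\nLFAS\n>lcl|ID2 protein [Species two]\nMNEK\n";
        split(demo).forEach((d, s) -> System.out.println(d + " -> " + s));
    }
}
```

Using an insertion-ordered map keeps the output rows in the same order as the input file, which matters when the results are later compared species by species.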

Fetching raw sequence into ProtParam server
To submit the sequences to the ProtParam server one by one, a connection was established with the server using the following syntax.
Syntax: URL siturl = new URL ("http://web.expasy.org/cgibin/ProtParam/ProtParam");
A redirect method was applied to move on to the next sequence, and the output condition was set to "true" so that the results were printed once the physico-chemical property calculation was complete.
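A request of this kind could be written with the standard java.net classes roughly as shown here. This is a sketch under assumptions, not the MFPPI code: the class ProtParamClient and its methods are hypothetical, the form field name "sequence" is assumed rather than taken from the text, and the endpoint is passed in by the caller. One plausible reading of the redirect and "true" output settings is setInstanceFollowRedirects(true) and setDoOutput(true), which is how they are used below.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical client: posts one raw sequence to a ProtParam-style endpoint. */
final class ProtParamClient {

    /** Builds the URL-encoded form body; the field name "sequence" is an assumption. */
    static String formBody(String rawSequence) {
        try {
            return "sequence=" + URLEncoder.encode(rawSequence, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Sends the sequence and returns the response page as text. */
    static String submit(String endpoint, String rawSequence) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);                 // we are going to write a request body
        conn.setInstanceFollowRedirects(true);  // follow HTTP redirects from the server
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody(rawSequence).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling submit once per raw sequence, in a loop over the split records, reproduces the sequential one-by-one submission described above; the returned page would still need to be parsed for the individual parameters.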

Saving data into MS-Excel file
After the calculated parameters were compiled at the ProtParam server, the results were saved sequentially in an MS-Excel (.xls) file.
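How the .xls file is produced is not specified here. As a dependency-free illustration, the sketch below writes tab-delimited text, which MS-Excel can open; this is a swapped-in technique and not a binary .xls writer (a true .xls file would need a spreadsheet library such as Apache POI). The class ResultTable, its methods and the column names in the demo are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: one row per sequence, one column per parameter. */
final class ResultTable {

    /** Joins the header and rows into tab-delimited text with Windows line endings. */
    static String toTabDelimited(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.join("\t", header)).append("\r\n");
        for (List<String> row : rows) {
            sb.append(String.join("\t", row)).append("\r\n");
        }
        return sb.toString();
    }

    /** Writes the table to disk as UTF-8 text. */
    static void save(Path file, String table) throws IOException {
        Files.write(file, table.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values only; real rows would come from the server responses.
        List<String> header = Arrays.asList("Sequence", "Parameter A", "Parameter B");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("seq1", "value", "value"),
                Arrays.asList("seq2", "value", "value"));
        save(Paths.get("results.tsv"), toTabDelimited(header, rows));
    }
}
```

Appending one row per processed sequence keeps the table in submission order, so the columns can be plotted directly for comparison.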

Graphical User Interface
The graphical user interface was kept simple and user friendly. It contains a text field, a browse button, a submit button and a process status indicator. The logo of the software, with its name in Hindi and English, as well as the logos of Banaras Hindu University, Varanasi and Sam Higginbottom Institute of Agriculture Technology & Sciences, Allahabad, were also added. MFPPI V.1.0 is a fully automated web interface tool for ProtParam to calculate physico-chemical properties. We also divided the software into six different packages, each for a particular calculation.

General features
The graphical user interface of MFPPI V.1.0 has only two buttons, browse and submit (Figure 2). The server is able to calculate the total number of amino acids, molecular weight, theoretical pI, the number of each amino acid residue and its percentage, the total number of negatively charged residues (D + E), instability index, aliphatic index, and grand average of hydropathicity (GRAVY) for several protein sequences simultaneously.
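Some of these parameters are simple enough to recompute locally as a sanity check on the downloaded values. The sketch below is not part of MFPPI, which obtains its values from ProtParam; the class SequenceStats and its methods are hypothetical. It uses the standard published definitions: GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of a residue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: local recomputation of a few sequence-derived parameters. */
final class SequenceStats {

    // Kyte-Doolittle hydropathy scale for the 20 standard residues.
    private static final Map<Character, Double> HYDROPATHY = new HashMap<>();
    static {
        String residues = "ARNDCQEGHILKMFPSTWYV";
        double[] values = {1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
                           3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2};
        for (int i = 0; i < residues.length(); i++) {
            HYDROPATHY.put(residues.charAt(i), values[i]);
        }
    }

    /** Number of occurrences of one residue. */
    static int count(String seq, char residue) {
        int n = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == residue) n++;
        }
        return n;
    }

    /** Percentage of the sequence made up by one residue. */
    static double percent(String seq, char residue) {
        return seq.isEmpty() ? 0.0 : 100.0 * count(seq, residue) / seq.length();
    }

    /** Total number of negatively charged residues (D + E). */
    static int negativelyCharged(String seq) {
        return count(seq, 'D') + count(seq, 'E');
    }

    /** Grand average of hydropathicity: sum of hydropathy values divided by length. */
    static double gravy(String seq) {
        double sum = 0.0;
        for (int i = 0; i < seq.length(); i++) {
            Double v = HYDROPATHY.get(seq.charAt(i));
            if (v == null) {
                throw new IllegalArgumentException("Non-standard residue: " + seq.charAt(i));
            }
            sum += v;
        }
        return seq.isEmpty() ? 0.0 : sum / seq.length();
    }

    /** Aliphatic index from the mole percent of A, V, I and L. */
    static double aliphaticIndex(String seq) {
        return percent(seq, 'A') + 2.9 * percent(seq, 'V')
                + 3.9 * (percent(seq, 'I') + percent(seq, 'L'));
    }
}
```

For instance, SequenceStats.gravy(raw) and SequenceStats.aliphaticIndex(raw) can be compared against the corresponding columns of the saved sheet for each raw sequence.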

Example analysis
The results from MFPPI V.1.0 for 17 mtATP6 protein [8] sequences from different mammalian species are given in Table 1 and Table 2. A graph drawn using Table 1 is shown in Figure 3. This is an example of comparative analysis of multiple sequences. The sequences are poor in amino acid C and rich in L. D occurred at low frequency across the species and was absent in Saimiri boliviensis and Gorilla gorilla gorilla. The residues R, E, K, W and Y were also present at low frequency compared with the higher frequencies of N, Q, G, H, M, F, P and V. The frequencies of residues A, I, S and T were relatively higher in all species.

Other features
The interface also provides values for molecular weight, extinction coefficient, instability index, aliphatic index and grand average of hydropathicity (GRAVY) [9] for the protein sequences (Table 2) in a comparative manner among 17 mammalian species. This provides insight for functional analysis and molecular evolution. The composition graph shows that mtATP6 is rich in amino acid L and poor in C.

Conclusion:
The added feature of the MFPPI V.1.0 interface is its ability to calculate physico-chemical properties of multiple protein sequences, along with comparative analysis of several physico-chemical parameters, using the ExPASy ProtParam server. The interface provides output in Excel sheet format for further statistical analysis and for graph generation and visualization. MFPPI V.1.0 finds utility in understanding compositional changes and functional relationships in evolution among organisms. We have demonstrated this using 17 mtATP6 protein sequences from different mammalian species.