PHYSICO2: a UNIX-based standalone procedure for computation of physicochemical, window-dependent and substitution-based evolutionary properties of protein sequences along with an automated block preparation tool, version 2

Automated genome sequencing is enriching sequence databases very rapidly. Efficient software is required to keep the analysis of sequences in pace with their entry into the databases. To this end, PHYSICO2 is more efficient than the earlier PHYSICO and other public-domain tools in that it i] extracts physicochemical, window-dependent and homologous-position-based-substitution (PWS) properties, including positional and BLOCK-specific diversity and conservation, ii] gives users optional flexibility in setting relevant input-parameters, iii] helps users prepare BLOCK-FASTA-files with the program's Automated Block Preparation Tool (ABPT), iv] performs fast, accurate and user-friendly analyses and v] writes itemized outputs in Excel format along with a detailed methodology. The program package contains documentation describing the application of the methods. Overall, the program acts as an efficient PWS-analyzer and finds application in sequence bioinformatics.

Availability: PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users.

Several web-tools perform either physicochemical [5,6] or window-dependent [6] analysis on a per-sequence basis for one [5,6] or a few properties [6]. There are also web-tools that use amino acid index values [7] to predict the interaction profile of a single sequence [8,9] or multiple sequences [9] per run. However, web-tools that allow mass-scale, user-friendly analyses of PWS (3-in-1) properties in a single run using any form of FASTA-file are rare. Gaining insight into PWS differences among taxonomic groups is of great significance in sequence bioinformatics [1,2,4], but doing so by analyzing one sequence at a time and then computing the average would be computationally costly. Moreover, managing the analytical and graphical web-data produced by the latter procedure is cumbersome and error-prone. Further, sharing the same web-software among users worldwide may reduce throughput.

PHYSICO [10], in contrast, performs batch analyses of PWS properties, but the range of properties it covers is less exhaustive. To serve users better, an upgraded version was urgently needed so that users i] have flexibility in setting relevant input-parameters, ii] obtain additional outputs, namely window-dependent profiles for RAW-FASTA input, pI-profiles and, in the same output format as PHYSICO, novel reports on substitution-based positional and BLOCK-specific diversity and conservation, and iii] can access detailed documentation on the principles and methodologies used in the program. Although PHYSICO can analyze BLOCK-FASTA-files, it cannot perform the painstaking preparation of such files, which is necessary not only for extracting additional information but also for comparing novel evolutionary properties among different taxonomic groups of a given family. PHYSICO2 incorporates all of the above attributes, along with a comprehensive upgrade of PWS properties relative to the earlier version, and is thus a unique tool in sequence bioinformatics.
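To illustrate why a single batch run is cheaper than scoring sequences one at a time and averaging the results by hand, the following Python sketch parses a multi-sequence FASTA-file and computes one physicochemical property, the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale, for every sequence in one pass, together with the group mean and sample standard deviation. The choice of property and the function names (parse_fasta, gravy, batch_gravy) are illustrative assumptions, not PHYSICO2's code; PHYSICO2 itself reports a much wider range of PWS properties.

```python
from statistics import mean, stdev

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def gravy(seq):
    """Grand average of hydropathy; gaps and non-standard symbols are ignored."""
    values = [KD[aa] for aa in seq.upper() if aa in KD]
    return sum(values) / len(values) if values else float("nan")

def batch_gravy(text):
    """Score every sequence of a multi-FASTA file in one pass, then summarize the group."""
    scores = {name: gravy(seq) for name, seq in parse_fasta(text)}
    values = list(scores.values())
    sd = stdev(values) if len(values) > 1 else 0.0
    return scores, mean(values), sd
```

For example, `batch_gravy(">s1\nII\n>s2\nRR\n")` scores both sequences and returns their group mean of 0.0 in one call, which is the kind of per-group summary that would otherwise be assembled manually from per-sequence web results.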

Methodology:
The program accepts an input FASTA-file of any form (Figure 1: F2). Upon execution, it optionally allows users to replace the default input-parameters (DPAR), such as residue classes, pI-method and Shannon-threshold, with their own (UPAR). The program then enters the first phase (P1) of analysis. In contrast to BLOCK-FASTA (F2) input, RAW-FASTA (F1) input produces only one output (Figure 1: R1), because in this case homologous positions are not aligned and hence not comparable; the column-specific analyses, which produce three additional outputs for BLOCK-FASTA input (B2, B3 and B4), are therefore skipped. A RAW-FASTA-file (F1) harboring sequences from one or more taxonomic groups can be readily converted into one BLOCK-FASTA-file or several BLOCK-FASTA-files of identical width, respectively, using the program's Automated Block Preparation Tool (ABPT; F3). In the second phase of computation (P2), the program performs window-dependent property analysis. If the input is BLOCK-FASTA (where homologous positions are aligned), all sequence-specific profiles are written to one Excel table (R5), which facilitates computation of the mean and standard deviation for taxonomically related sequences (Documentation); otherwise, each sequence-specific profile is saved separately (R21, R22, etc.) in a named directory (Documentation).
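Column-specific analyses need aligned homologous positions. As a hedged illustration of positional diversity and conservation, the Python sketch below computes the Shannon entropy (in bits) of each column of a BLOCK-FASTA alignment and labels columns whose entropy falls below a threshold as conserved. It also mirrors the rule that RAW-FASTA input, whose sequences generally differ in width, is skipped. The gap symbol "-", the exclusion of gaps from the counts, the default threshold and the two-way conserved/diverse rule are assumptions; PHYSICO2's exact definitions, including how its Shannon-threshold is applied, are given in its Documentation.

```python
import math
from collections import Counter

GAP = "-"  # assumption: gaps are written as '-' in the BLOCK-FASTA alignment

def is_block(seqs):
    """BLOCK-FASTA input: every sequence has the same width, so columns are comparable."""
    return len(seqs) > 0 and len({len(s) for s in seqs}) == 1

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, gaps excluded."""
    residues = [r for r in column.upper() if r != GAP]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = 0.0
    for count in Counter(residues).values():
        p = count / n
        entropy -= p * math.log2(p)
    return entropy

def positional_diversity(seqs, threshold=1.0):
    """Label each aligned column 'conserved' (entropy < threshold) or 'diverse'.

    Returns None for RAW input (unequal widths), where column analysis is skipped.
    The threshold semantics are a hypothetical stand-in for PHYSICO2's Shannon-threshold.
    """
    if not is_block(seqs):
        return None
    report = []
    for position, column in enumerate(zip(*seqs), start=1):
        h = column_entropy("".join(column))
        report.append((position, h, "conserved" if h < threshold else "diverse"))
    return report
```

With three aligned sequences "ACD", "ACE" and "AC-", the first two columns have zero entropy and are labeled conserved, while the third column (D and E in equal proportion) has an entropy of 1 bit and is labeled diverse at a threshold of 1.0.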

Program input:
PHYSICO2 has been extensively tested in the CYGWIN (32-bit) environment. It accepts either RAW-FASTA (Figure 1: F1) or BLOCK-FASTA (F2) as input. Whereas the former can be used directly after downloading from the database, the latter must be prepared (either manually or programmatically) before it can be used as input. Unlike PHYSICO, which requires users to prepare the BLOCK-FASTA-file manually, PHYSICO2 includes the Automated Block Preparation Tool (ABPT) for this purpose (Documentation). Users are also prompted for input-parameters such as residue-classes, pI-method and Shannon-threshold.
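The pI-method parameter determines how the isoelectric point is computed. One common approach, shown here as a minimal Python sketch rather than PHYSICO2's implementation, finds the pH at which the Henderson-Hasselbalch net charge of the sequence is zero by bisection; swapping the pKa table passed to the function is one way a user-selectable pI-method could be realized. The pKa values below are the set distributed with the EMBOSS iep program, included here as an assumption; the pI methods actually offered by PHYSICO2 are described in its Documentation.

```python
# Assumed pKa set (as distributed with EMBOSS iep); pass a different dict to change the method.
PKA_EMBOSS = {
    "Nterm": 8.6, "Cterm": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,           # positively charged when protonated
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,  # negatively charged when deprotonated
}

POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")

def net_charge(seq, ph, pka):
    """Henderson-Hasselbalch net charge of a peptide at the given pH."""
    seq = seq.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - pka["Nterm"]))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (pka["Cterm"] - ph))  # deprotonated C-terminus
    for aa in POSITIVE:
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka[aa]))
    for aa in NEGATIVE:
        charge -= seq.count(aa) / (1.0 + 10 ** (pka[aa] - ph))
    return charge

def isoelectric_point(seq, pka=PKA_EMBOSS, tol=1e-4):
    """Bisect on pH in [0, 14]; net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid, pka) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection is used because the net charge is a strictly decreasing function of pH, so the zero crossing is unique and is bracketed by the interval 0 to 14.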

Method of computation, performance of the program and experimental validation:
In each output file, the results for each item are preceded by a detailed description of the method. We performed PHYSICO2-based analyses on representative sequences from two taxonomic groups (metazoa: 31 sequences; cyanobacteria: 32 sequences) of the "cytochrome c family" on a PC with an Intel(R) Core™ i3 CPU M330 @ 2.13 GHz running CYGWIN (32-bit). We also performed the same analyses using "PROTPARAM" for physicochemical properties (8 properties [6] and their averages) and "PROTSCALE" for window-dependent properties [6] (individual and average profiles), over an "Alliance Broadband: PRIME (54 Mbps) package" internet connection. Even when used efficiently, these web-tools required ≥ 10 hours to produce these results in Excel, whereas PHYSICO2 obtained the above and additional properties in PWS-format in only 6 minutes.
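For reference, a window-dependent profile of the kind produced by PROTSCALE is a sliding-window average of a residue scale along the sequence. The Python sketch below (an illustration under stated assumptions, not PHYSICO2's code) computes such a profile on the Kyte-Doolittle scale for each aligned sequence, then the per-window mean and population standard deviation across the group, analogous to an average profile and to the mean and standard deviation computed from the R5 table. The window size of 9, the unweighted average and the rule that windows containing a gap or unknown symbol are left empty (None) are all assumptions.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_profile(seq, window=9):
    """Unweighted mean hydropathy of each full window; None if the window has a gap/unknown."""
    values = [KD.get(aa) for aa in seq.upper()]
    profile = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        profile.append(None if None in chunk else sum(chunk) / window)
    return profile

def group_profile(aligned_seqs, window=9):
    """Per-window (mean, population SD) across aligned sequences of identical width."""
    profiles = [window_profile(s, window) for s in aligned_seqs]
    summary = []
    for column in zip(*profiles):
        scores = [v for v in column if v is not None]
        summary.append((mean(scores), pstdev(scores)) if scores else (None, None))
    return summary
```

Because the input sequences are aligned and of identical width, window i refers to the same homologous stretch in every sequence, which is what makes a per-window mean meaningful; for unaligned RAW-FASTA input the individual profiles would instead be kept separate.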
To compare itemized results, the same set of sequences was analyzed with the available web-tools [5,6] and with PHYSICO2. Although not all items could be compared, owing to the lack of public-domain programs for some of them (e.g. substitution hetero-pair diversity), the items that could be compared gave identical results (data not shown).