CARBANA: Carbon analysis program for protein sequences.

A great deal of work has gone into understanding the nature of proteins. Hydrophobic interaction is the dominant force that drives proteins to carry out biochemical reactions in all living systems, and carbon is the only element that contributes to this interaction. Studies find that globular proteins prefer a carbon content of 31.45% for stability. Taking this value as a standard, a carbon analysis program has been developed to study the carbon distribution profile of protein sequences. The program is available online at www.rajasekaran.net.in/tools/carbana.html. It is hoped that the program will help in the identification and development of active sites, the study of protein stability, the evolutionary understanding of proteins, gene identification and ligand binding site identification, and that it will help to solve the long-standing problem of protein-protein and protein-DNA interactions.


Background:
A great deal of work has gone into proteins to understand the information they carry [1-3]. Hydrophobic interaction, the dominant force, arises from the presence of carbon. Recent studies reveal that proteins prefer to have 31.45% carbon in their structure and in their sequence [2]. This work was undertaken to understand further the information buried in proteins.

Methodology:
The idea behind this method is to visualise the molecule in terms of its actual constituents: the basic units of proteins are elements such as carbon, sulphur, nitrogen, oxygen and hydrogen. In this method the amino acid sequence is converted into an atomic sequence. An example is given in the supplementary material.
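The conversion from an amino acid sequence to an atomic sequence can be sketched as follows. This is a minimal illustration in Python, not the original C program. It assumes that each residue is counted as the free amino acid minus one water molecule and that the atoms of a residue are written in the order C, H, N, O, S; neither choice is stated in the text, and the names RESIDUE_ATOMS and to_atomic_sequence are hypothetical.

```python
# Atom counts per residue as (C, H, N, O, S).
# Assumption: residue = free amino acid minus one water (H2O).
RESIDUE_ATOMS = {
    "A": (3, 5, 1, 1, 0),   "R": (6, 12, 4, 1, 0),
    "N": (4, 6, 2, 2, 0),   "D": (4, 5, 1, 3, 0),
    "C": (3, 5, 1, 1, 1),   "Q": (5, 8, 2, 2, 0),
    "E": (5, 7, 1, 3, 0),   "G": (2, 3, 1, 1, 0),
    "H": (6, 7, 3, 1, 0),   "I": (6, 11, 1, 1, 0),
    "L": (6, 11, 1, 1, 0),  "K": (6, 12, 2, 1, 0),
    "M": (5, 9, 1, 1, 1),   "F": (9, 9, 1, 1, 0),
    "P": (5, 7, 1, 1, 0),   "S": (3, 5, 1, 2, 0),
    "T": (4, 7, 1, 2, 0),   "W": (11, 10, 2, 1, 0),
    "Y": (9, 9, 1, 2, 0),   "V": (5, 9, 1, 1, 0),
}
ELEMENTS = "CHNOS"  # assumed order of atoms within one residue


def to_atomic_sequence(protein):
    """Return (atoms, owners): a string of element symbols and, for each
    atom, the 1-based number of the residue it belongs to."""
    atoms = []
    owners = []
    for number, residue in enumerate(protein.upper(), start=1):
        counts = RESIDUE_ATOMS[residue]  # KeyError for non-standard codes
        for element, count in zip(ELEMENTS, counts):
            atoms.extend(element * count)
            owners.extend([number] * count)
    return "".join(atoms), owners


atoms, owners = to_atomic_sequence("GA")
print(atoms)        # atomic sequence of glycine followed by alanine
print(len(atoms))   # total number of atoms
```

Keeping the residue number of every atom alongside the atomic sequence makes it possible to map any atom position back to the residue that carries it.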
A protein sequence of 100 amino acids is expected to have about 1555 atoms in its atomic sequence. The percentage of carbon in the first 500 atoms is computed and recorded as the carbon percentage at atom 250; the residue that carries the 250th atom is taken as the reference point. The window is then shifted by 5 atoms, so that the next group of 500 atoms ends at atom 505, and the carbon percentage computed for it is assigned to the reference point at atom 255. In this way, shifting by 5 atoms each time, the carbon percentage is computed for every 500-atom group and assigned to the corresponding reference point. The shift of 5 atoms can be increased or decreased depending on the resolution required. Similarly, the window length of 500 atoms (~32 amino acids) can be changed for different calculations. The carbon percentage is plotted against the reference point to identify the carbon distribution profile along the sequence. A C program has been written to carry out all these calculations. A sample input, output (Table 1, see Supplementary material) and plot (Figure 1) are given and discussed.
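The windowed calculation described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the original C program: the function name carbon_profile and its parameters are hypothetical, the atomic sequence is assumed to be a string of one-letter element symbols, and owners is assumed to hold the 1-based residue number of each atom.

```python
def carbon_profile(atoms, owners, window=500, shift=5):
    """Slide a window of `window` atoms along the atomic sequence in steps
    of `shift` atoms. For each window return a tuple
    (reference atom, reference residue, carbon percentage), where the
    reference atom is the 1-based position of the window's middle atom
    (atom 250 for the first 500-atom window)."""
    profile = []
    for start in range(0, len(atoms) - window + 1, shift):
        group = atoms[start:start + window]
        percent = 100.0 * group.count("C") / window
        reference_atom = start + window // 2      # 1-based position
        residue = owners[reference_atom - 1]      # residue carrying that atom
        profile.append((reference_atom, residue, percent))
    return profile


# Toy input: four glycine residues, 7 atoms each (C2 H3 N O).
toy_atoms = "CCHHHNO" * 4
toy_owners = [i // 7 + 1 for i in range(len(toy_atoms))]
for point in carbon_profile(toy_atoms, toy_owners, window=14, shift=7):
    print(point)
```

A sequence shorter than the window yields an empty profile in this sketch; how the original program treats such sequences is not stated in the text.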

Discussion:
The program reads a protein sequence and converts it into an array of elements. The percentage of carbon is computed for each group of atoms and assigned to the reference-point residue. Normally a shift value of 5 is used; it can be increased or decreased depending on the resolution required. Reducing the shift value creates too many points and makes the plot congested. A shift value of 17 may be optimum: this is half of the smallest unit (35 atoms) that produces 31.45% carbon. Further improvements to represent all amino acids (including the first and last 17 residues) in the output and in the figures are underway. The computation of carbon percentage at the alpha carbon position will also be implemented for mutational studies and other applications.
A window length of 500 atoms is used for the carbon percentage calculation (Figure 1). This value may be increased or decreased depending on the required resolution, within a range of 35 to 1000 atoms. The lower limit of 35 atoms is chosen because it is the smallest unit that can produce a carbon percentage close to 31.45%, with 11 carbons in 35 atoms (about 31.43%). Carbon accumulation in the active site or in the core can be easily identified at a length of 500, so a length of 500 atoms is used by default for general carbon profile studies. To identify the residues contributing to stabilization or destabilization, one can reduce this length. For mutational studies a length of 50 atoms may be appropriate. A sample input and output are given for a length of 500 atoms and a shift of 17 atoms (Table 1, see Supplementary material).
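The arithmetic behind these parameter choices can be checked with a few lines of Python. This is only a worked illustration: the figure of 15.55 atoms per residue is derived from the estimate of about 1555 atoms per 100 amino acids given in the Methodology, and the names used here are hypothetical.

```python
# Estimate from the Methodology: about 1555 atoms per 100 amino acids.
ATOMS_PER_RESIDUE = 1555 / 100


def residues_spanned(window_atoms):
    """Approximate number of residues covered by a window of atoms."""
    return window_atoms / ATOMS_PER_RESIDUE


# Smallest unit: 11 carbons in 35 atoms.
smallest_unit_percent = 100 * 11 / 35
# Shift of about half the smallest unit.
half_unit_shift = 35 // 2

print(round(smallest_unit_percent, 2))   # carbon percentage of the 35-atom unit
print(half_unit_shift)                   # shift value in atoms
print(round(residues_spanned(500), 1))   # residues in the default window
print(round(residues_spanned(50), 1))    # residues in a 50-atom window
```

On this estimate the default window of 500 atoms covers roughly 32 residues, while a 50-atom window covers only about 3, which is why the shorter length suits residue-level questions such as mutational studies.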

Conclusion:
Carbon profile analysis software [CARBANA] has been developed and is presented here. The program is capable of locating sites of carbon accumulation in proteins. It can clearly identify the hydrophobic and hydrophilic regions along the sequence, and it can pinpoint an amino acid that causes instability. Atomic level representation of proteins can yield better results. The program is available online. It is hoped that it will address several biological problems based on hydrophobicity. In particular, it can help in the identification and development of active sites, study proteins in diseased and healthy states, characterize disordered proteins, address the role of carbon in half of proteins, and help in understanding patterns and repeats in proteins.