Distribution of amino acids in functional sites of proteins with high melting temperature

The stability of proteins in its native state has an important implication on its function and evolution. The functional site analysis may lead to better understanding of how these amino acid distributions influence the melting temperature of proteins. It has been reported that increasing the fraction of hydrophobic contacts in a protein tends to raise melting temperature; increasing the fraction of repulsive charge contacts decrease the melting temperature and consistent with a destabilizing effect. The role of amino acid distribution as hydrophobic, charged and polar residues in proteins and mainly in its functional sites has been studied. Due to limited data availability, redundancy check and controlled environment parameters, the study was carried out with ten single chain-wild proteins having melting temperature above 80°C at pH 7. The analysis depicts that, the entire protein, hydrophobic residues are more frequent in single chain proteins and charged residues are more frequent in multi-chains proteins. In functional sites of these proteins, hydrophobic and charged residues are equally frequent in single chain proteins and charged residues are very high in multi-chains proteins. But, the polar residue distribution remains same for both single chain and multi-chain proteins and its functional sites.


Background:
The melting temperature is an important characteristic feature of a sequence in the prediction of thermal stability of proteins. Increasing the fraction of hydrophobic contacts in a protein tends to raise melting temperature; increasing the fraction of repulsive charge contacts decrease the melting temperature and consistent with a destabilizing effect. It has been revealed that thermophilic and mesophilic proteins have both similar polar and non-polar contribution to the surface area and compactness. Salt bridges and main chain hydrogen bonds show an increase in the majority of thermophilic proteins than their mesophilic homologues. In thermophilic proteins, hydrophobic residues are significantly more frequent and polar residues are less frequent [1]. It has been described that thermophiles prefer to have contacts between residues with hydrogen-bond-forming capability. The contact density is not significantly correlated with protein melting temperature. More contacts are observed between polar and non-polar residues in thermophiles than mesophiles. Tyr has good contacts with several other residues, and Cys has considerably higher longrange contacts in thermophiles compared with mesophiles [2]. It has been reported that both thermophilic and mesophilic proteins have similar hydrophobicities, compactness, oligomeric states, polar non-polar contribution to surface areas, main-chain and side-chain hydrogen bonds. Salt bridges and side-chain hydrogen bonds increase in the majority of the thermophilic proteins. Arg and Tyr are more frequent, Cys, Ser are less frequent in thermophilic proteins. Thermophiles have a larger fraction of their residues in the α-helices and avoid Pro in its α-helices to a greater extent than the mesophiles. It may be the cause to withstand high temperature [3].
The stabilizing group consists of polar-charged residues and non-polar residues that possess high surrounding hydrophobicity. It has been described as the polar uncharged residues destabilize the molecule against temperature and Ser being the most destabilizing residue. A very high co-operativity exists among the stabilizing non-polar residues. In small globular proteins, the melting temperature remains mainly a function of amino acid composition and in complex molecules it depends on other factors also [4]. The maximum melting points of proteins with respect to pH are reported that the correlation coefficient of hydrophobic index versus melting point was 0.622; average residue volumes versus melting points was 0.960; the average residue volume versus hydrophobic index was 0.697 [5]. In proteins, the strongest contributors to thermostability are increased in ion pairs on the surface and the very strong hydrophobic interior [6]. Identifying the protein regions that are most likely to be involved in function is useful to gain information about its potential role. The combination of experimental and in-silico approaches will help us to interpret the genetic information in functional terms and be the final goal of the so-called post-genomic era [7]. It has been analyzed that multiple peptide motifs are capable of different temperature dependent transitions [8]. In the present work, we have studied the distribution of amino acids in both thermophilic proteins and in its functional sites. In this amino acids distribution studies, the classification of amino acids as hydrophobic, charged and polar residues is followed. Then, the distribution of such classified amino acids and the melting temperature transition among proteins is analyzed.

Methodology: Datasets
The database ProTherm [9] available at: http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.h tml is used for our study. The melting temperature of wild proteins measured by different techniques such as DSC, CD, Adsorption, Fluorescence, and NMR, for both single chain and multi-chains proteins is mainly focused in this study. Certain conditions such as measurement, state, chain type, pH etc. is set to derive a proper dataset for the analysis. To avoid bias in prediction of result, the size of the derived dataset was reduced by redundancy check.

Amino acid composition (%)
The percentage of amino acids present in the protein sequences were collected using Protparam tool [10] available at http://web.expasy.org/protparam/. The percentages of collected amino acids were classified as hydrophobic, Hp (A, I, L, M, F, P, V), charged (R, D, E, K) and polar (N, C, Q, G, H, S, T, W, Y) residues.

Non-polar residues (hydrophobic)
Hydrophobic amino acids are found at the core of many proteins and they mainly composed of non-polar residues. A significant amount of non-polar residues are also found on the surface of proteins. These are not favorably interact with water but stabilized by numerous van der Waals interactions.

Charged residues
Charged amino acids are found on the surface of the protein as well as seldom buried in the interior of a folded protein. They can interact with water and other important biological molecules. The positively and negatively charged amino acids form salt bridges.

Polar residues (hydrophilic)
Polar amino acids are found both at buried position as well as on the surface of proteins and posses hydrophilic groups which either form hydrogen bonds with other polar residues in the protein or with water. Polar amino acid side chains may form side chain-side chain or side chains-main chain hydrogen bonds. These interactions are often crucial for the stabilization of protein tertiary structure and are normally conserved.

Functional/ Active sites
The CASTp server [11, 18] available at http://stsfw.bioengr.uic.edu/castp/ is used. Using this server, the amino acids present in the active sites of single chains proteins were identified. The POOL program (Partial Order Optimal Likelihood) available at: http://www.pool.neu.edu:8080/wPOOL/, which is a machine learning method that identifies the functionally important residues in a given protein structure [12] is also used in this research.

Multiple sequence alignment
Analysis of multiple alignments was performed using the software MAFFT. It is a multiple alignment program for amino acid or nucleotide sequences [13,14] available at http://mafft.cbrc.jp/alignment/software/. This program is used to understand the phylogenetic relationships among wild proteins particularly with single chain and Tm > 80˚C at pH7. Jalview version-2 was used for editing and for obtaining phylogenetic tree [14].

Discussion:
As per previous report, in complex molecules the melting temperature depends on other factors also [4]. So, the dataset with defined conditions such as pH, state, number of chains etc is generated. Due to these limitations, the study is carried out with limited data. The overview of available Tm measured by different techniques at different pH is given in (Figure 1a, 1b,  1c, 1d & 1e). These scatter plots depicting the size of Tm dataset measured by DSC and CD measurements is larger than the datasets derived through adsorption, florescence, NMR measurements. Since the reliability of result prediction attained by large dataset, the research is focused on Tm obtained through DSC. The distribution of single chain and multi-chains proteins with Tm > 80˚C at different pH are shown in (Figure  2a, 2b). The study is to find out the type of amino acids significant for maintaining the temperature in thermophilic proteins. While considering the whole protein with Tm > 100˚C at pH 7, hydrophobic residues are more frequent followed by polar residues in single chain proteins (4 data) and hydrophobic/ charged residues are more frequent and polar residues are very less frequent in multi-chains proteins (4 data) which illustrated in (Figure 3a & 3b). As per the previous report the hydrophobic residues are significantly more frequent and polar residues which are less frequent in thermophilic proteins [1]. The dataset of 16 proteins with many single chains as well as few multi-chains having Tm > 40˚C at pH 7 make it clear that almost 50% are shared by polar residues and the remaining 50% are shared by mostly hydrophobic residues which followed by charged residues (Figure 3.2). This dataset is taken just for the comparison of different melting temperature transition.
While considering the whole protein with Tm > 80˚C at pH 7, not much variations is observed in polar residue distribution for both single chain and multi-chain proteins. In single chain proteins (11 data), hydrophobic residues are more frequent and in multi-chains proteins (7 data), charged residues are more frequent (Figure 3.3 & 3.4) Figure 3.5a & 3.5b) illustrates the top ten residues Which constitute the functional site of proteins having Tm > 80˚C at pH 7. While considering the functional sites of these proteins, not much variation is observed in polar residue distribution for both single chain and multi-chain proteins. It seems polar residues remain same in protein as well as in functional sites for both single chain proteins. In single chain proteins (10 data), hydrophobic and charged residues are equally frequent and in multi-chains proteins (7 data), charged residues are high frequent (Figure 3.5a & 3.5b). As reported earlier more contacts have been observed between polar and non-polar (Hp) residues in thermophiles than in mesophiles [2]. The individual amino acid distribution of single and multichain proteins having Tm>100˚C at different pH is shown in (Figure 4)  Identifying the protein regions that are most likely to be involved in function may give the direction to melting temperature transition. So, the study is focused on the amino acid distribution in its functional sites. As reported by Pool server, either because of atoms with undefined parameters or because of high-potential regions in the protein, the functional site residues for our entire dataset could not be collected. And also, it is given as the top 8-10% of the residues in the rank order gives the best performance in this server. Hence, the top 10 residues were considered for work Table 1 & 2 (see supplementary material). The Accession number and PDB ID of proteins used in the multiple sequence alignment for the dataset generated using the conditions such as: single chain, two state, pH7, Tm>80˚C Table 3 (see supplementary  material). The phylogenetic tree shown in (Figure 5) depicted that 1BSQ, 1QG5 are similar proteins; 1ONC and 1J2V/1NZA might have another internal ancestor. In this, 1J2V and 1NZA belong to same family. 1NZA, 1ONC and 1J2V are of almost same length and the percentage of top 10 functional site residues are also same for single chain proteins with Tm>80˚C at pH7.

Conclusion:
Previous report described whether it is a single chain or multichain protein, the polar amino acids composition is often crucial for the thermal stabilization of proteins. The present study concludes that while considering the entire protein, hydrophobic residues are more frequent in single chain proteins and charged residues are more frequent in multi-chains proteins. While considering the functional sites of these proteins, hydrophobic and charged residues are equally frequent in single chain proteins and charged residues are very high in multi-chains proteins. But, the polar residue distribution remains same for both single chain and multi-chain proteins and its functional sites. From the phylogenetic tree of single chain, two state proteins having Tm>80˚C at pH7, it is clear that the thermophilic proteins having similar amino acid length might have same internal ancestor. This may lead to the conservation of functional site residues and its chemical nature is same which may also responsible for high melting temperature. Instead of considering the entire protein, the functional site analysis leads to better understanding of how these distributions of amino acids may affect the melting temperature of proteins.