Analysis of protein chameleon sequence characteristics

Conversion of the local structural state of a protein from an α-helix to a β-strand is usually associated with a major change in the tertiary structure. Similar changes have been observed during the self-assembly of amyloidogenic proteins into fibrils, which are implicated in severe disease conditions, e.g., Alzheimer's disease. Studies have emphasized that certain protein sequence fragments, known as chameleon sequences, do not have a strong preference for either the helical or the extended conformation. Surprisingly, information on the local sequence neighborhood can be used to predict their secondary structure with high accuracy. Here we report a large-scale analysis of chameleon sequences to estimate their propensities to be associated with different local structural states such as α-helices, β-strands and coils. Using propensity information derived from the amino acid composition, we underline their complexity: more than one quarter of them prefer the coil state over the regular secondary structures, about one fifth show a preference for both α-helix and β-strand conformations, and about half favor only one of these two states.


Background:
Repetitive secondary structures like α-helices and β-strands have been viewed as key building blocks of proteins. These local protein structures are stabilized mainly by hydrogen bonds within the protein backbone. In 1984, Kabsch and Sander identified identical sequence fragments of limited length that are found in both α-helices and β-strands, namely chameleon sequences [1]. This suggests that local sequence composition and the order of amino acids alone are not sufficient to predict the secondary structure accurately [2]. The number of examples supporting this idea has increased strikingly in the recent past [3]. Elegant experimental studies have shown the importance of non-local interactions in guiding the formation of an α-helix or a β-strand, e.g., in the IgG-binding domain of protein G (GB1) [4]. Chameleon sequences have also been designed, e.g., MATa2 and MCM1 DNA complexes [5]. Studies have emphasized that these chameleon sequences have no strong preference for either α-helical or β-strand conformations [6]. Nonetheless, information on the local sequence neighborhood can be used to predict their secondary structure with high accuracy [3,7]. Here, we have analyzed chameleon sequences to estimate their propensities to form not only the regular secondary structures, α-helix and β-strand, but also coil [8].

Description:
Unlike previous studies, which focused only on limited parts of the Protein Data Bank [9], we used all the protein structures available in 2007 (~40,000 protein structures). Secondary structures were assigned to these proteins using the DSSP algorithm [10] with the default parameters of the program. Only proteins with complete side-chain coordinates and without multiple breaks in the chain were considered, leading to a final number of 14,692,070 amino acid residues, each associated with a given secondary structure. The 8 secondary structure assignments made by DSSP were reduced to the 3 classical states: helix includes α-, 3₁₀- and π-helices, strand includes only the β-strand assignments, and coil covers the rest of the assignments (β-bridges, turns, bends, and coil).
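The reduction from 8 DSSP states to 3 can be illustrated with a minimal sketch. It assumes the standard one-letter DSSP codes (H, G, I, E, B, T, S, and blank or '-' for unassigned residues); the function name `reduce_dssp` and the output letters H/E/C are illustrative choices, not part of the original pipeline.

```python
# Hypothetical helper: reduce DSSP's 8-state codes to the 3 classical states.
HELIX_CODES = {"H", "G", "I"}   # alpha-helix, 3-10 helix, pi-helix
STRAND_CODES = {"E"}            # extended beta-strand only


def reduce_dssp(dssp_string: str) -> str:
    """Map each DSSP code to 'H' (helix), 'E' (strand) or 'C' (coil)."""
    reduced = []
    for code in dssp_string:
        if code in HELIX_CODES:
            reduced.append("H")
        elif code in STRAND_CODES:
            reduced.append("E")
        else:
            # B (isolated bridge), T (turn), S (bend) and blank/'-' become coil
            reduced.append("C")
    return "".join(reduced)
```

Note that isolated bridges (B) fall into the coil class here, following the statement that strand contains only the β-strand assignments.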
In the second step, we searched for chameleon sequences of length L, with L ranging from 4 to 8 amino acids. A fragment is considered a chameleon sequence if all of its residues are found at least once in the helical conformation and at least once in the β-strand conformation. Numerous chameleon sequences were located in this way: 63,228 (for L = 4 residues), 34,408 (for L = 5), 2,423 (for L = 6), 179 (for L = 7) and 64 (for L = 8). As the dataset is large and complete compared with those used in previous studies, more examples were found, especially for the longer fragments [3].
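The search can be sketched as a sliding-window scan. This is an illustration under one reading of the definition above, namely that a fragment qualifies when it occurs at least once with every residue in helix and at least once with every residue in strand. The function name `find_chameleons`, the input format (pairs of sequence and 3-state string) and the toy chains in the usage lines are assumptions made for the example.

```python
from collections import defaultdict


def find_chameleons(chains, length):
    """Return fragments seen fully helical ('H') and fully strand ('E').

    chains: iterable of (sequence, three_state_string) pairs of equal length.
    length: fragment length L.
    """
    seen = defaultdict(set)
    for seq, states in chains:
        for start in range(len(seq) - length + 1):
            window = states[start:start + length]
            fragment = seq[start:start + length]
            if window == "H" * length:
                seen[fragment].add("H")
            elif window == "E" * length:
                seen[fragment].add("E")
    # Keep only fragments observed in both regular states
    return {frag for frag, found in seen.items() if found == {"H", "E"}}


# Toy data, invented for illustration only
toy_chains = [
    ("AMLILK", "CHHHHC"),
    ("GMLILV", "CEEEEC"),
    ("AAAAA", "HHHHH"),
]
chameleons = find_chameleons(toy_chains, 4)
```

On a databank of the size described here, the same scan would be run once per value of L from 4 to 8.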
Our main goal is to check whether chameleon sequences indeed have no strong preference for either helical or strand conformations [6], and also to extend the question to the preference of chameleon sequences for the coil state, which was not directly addressed in previous works. For this purpose, we used a simple methodology based on a non-redundant databank containing proteins with no more than 20% pairwise sequence identity. The selected chains have X-ray crystallographic resolutions better than 1.6 Å and an R-factor less than 0.25 (details can be found in [11]). Using this non-redundant databank, we computed the propensity of an amino acid k to be associated with a given secondary structure state i, denoted p_{i,k} (see equation 1 in the supplementary material). Here i corresponds to the α-helix, β-strand or coil state, and k corresponds to one of the 20 amino acids.
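As an illustration of how such propensities might be computed, the sketch below uses the common frequency-ratio form, (N_ik / N_i) / (N_k / N), where N_ik counts residues of type k in state i. This form is an assumption: the exact definition used in this work is equation 1 of the supplementary material, which is not reproduced here. The function name `propensities` and the input format are likewise hypothetical.

```python
from collections import Counter


def propensities(chains):
    """Return {(state, residue): propensity} from annotated chains.

    chains: iterable of (sequence, three_state_string) pairs.
    Assumed definition: (N_ik / N_i) / (N_k / N), so that 1.0 means the
    residue is as frequent in the state as in the databank overall.
    """
    pair_counts = Counter()
    state_counts = Counter()
    residue_counts = Counter()
    total = 0
    for seq, states in chains:
        for residue, state in zip(seq, states):
            pair_counts[(state, residue)] += 1
            state_counts[state] += 1
            residue_counts[residue] += 1
            total += 1
    return {
        (state, residue): (n / state_counts[state]) / (residue_counts[residue] / total)
        for (state, residue), n in pair_counts.items()
    }


# Toy data, invented for illustration only
toy_table = propensities([("AAGG", "HHHC")])
```

Pairs that are never observed are simply absent from the returned table; a real analysis would need to decide how to treat them.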
Hence, each chameleon sequence X_S is associated with three scores, S_α, S_β and S_coil. As these scores are propensity products, a score S_i of 1.0 corresponds to the random value. If S_i is greater than 1.0, the chameleon sequence is preferentially associated with the secondary structure state i; if S_i is lower than 1.0, the sequence is under-represented in that state. This measure is crude but gives some basic insight into the behavior of chameleon sequences.
Figure 1a shows a plot of S_α versus S_β for the 63,228 chameleon sequences (for L = 4 residues). Adequacy scores greater than 4.0 were set to a maximum value of 4.0. The figure shows that 53.7% and 47.3% of the chameleon sequences have S_β and S_α scores greater than 1.0, respectively. Thus, the four squares delineated by the red lines are roughly equally populated. S_β scores reach much higher values than S_α scores: 16% of the S_β scores are greater than 2.0, 5.3% are greater than 3.0 and 2.7% are greater than 4.0, whereas only 5.1% of the S_α scores are greater than 2.0 and 0.2% are greater than 3.0. For 21.6% of the chameleon sequences, both S_α and S_β are greater than 1.0, with an average S_coil of 0.42 (i.e., less than half the random value). For 25.7% of these fragments, α-helix is statistically preferred over β-strand, with an average S_coil of 0.68, while for 24.7%, only β-strand is preferred (average S_coil of 0.65). Interestingly, 27.9% of the chameleon sequences have both S_α and S_β less than 1.0, i.e., the coil state is favored.
Figure 1b shows the chameleon sequence fragment MLIL, which has S_α and S_β scores greater than 2.0 (shown as the blue dot in Figure 1a). In type-1 β-hydroxysteroid dehydrogenase, this chameleon sequence forms the central β-strand of a β-sheet composed of 5 β-strands (Figure 1b, left), while in the hyperthermophilic tungstopterin enzyme aldehyde ferredoxin oxidoreductase, this sequence is in the middle of a long α-helix (Figure 1b, right).
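The S_α and S_β scoring and the four-way split used for Figure 1a can be sketched as follows. The sketch assumes that S_i is the plain product of the per-residue propensities of the fragment, which is how "propensity products" is read here; the function names, the category labels and the toy propensity values are hypothetical.

```python
from math import prod


def score(fragment, propensity, state):
    """Product of per-residue propensities for one state (assumed form of S_i)."""
    return prod(propensity[(state, residue)] for residue in fragment)


def classify(s_alpha, s_beta, threshold=1.0):
    """Assign a fragment to one of the four squares of the S_alpha/S_beta plot."""
    if s_alpha > threshold and s_beta > threshold:
        return "both"
    if s_alpha > threshold:
        return "helix"
    if s_beta > threshold:
        return "strand"
    return "coil"   # both scores at or below the random value


# Toy propensities, invented for illustration only
toy_propensity = {
    ("H", "A"): 2.0, ("H", "L"): 1.5,
    ("E", "A"): 0.5, ("E", "L"): 1.2,
}
s_alpha = score("AL", toy_propensity, "H")
s_beta = score("AL", toy_propensity, "E")
label = classify(s_alpha, s_beta)
```

For plotting, scores could additionally be capped with `min(s, 4.0)`, matching the maximum value of 4.0 mentioned above.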
With this simple approach, we have underlined that chameleon sequences have no strong preference for either the α- or the β-conformation. We have also found that chameleon sequences are very diverse: some show a higher preference for either helical or strand conformations, some show a preference for both, and some favor the coil state over the regular secondary structures. These observations again support the idea that non-local factors [2, 3] have a major influence on the secondary structure that an amino acid sequence adopts. Supplementary information can be found on our website: http://www.dsimb.inserm.fr/~joseph/chameleon/