The relationship between relative solvent accessible surface area (rASA) and irregular structures in protean segments (ProSs)

Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure under physiological conditions, yet they exhibit numerous biological activities. Protean segments (ProSs) are the functional regions of IDPs that undergo disorder-to-order transitions upon binding to their partners. Example ProSs were collected from the intrinsically disordered proteins with extensive annotations and literature (IDEAL) database. The ProS interfaces were classified into core, rim, and support based on the relative solvent accessible surface area (rASA), and their secondary structure elements (SSEs) were analyzed. The amino acid compositions and the rASAs of ProS SSEs at the interface, core, and rim were compared to those of heterodimers. The average number of contacts of alpha helix and irregular residues was calculated for each ProS and heterodimer. Furthermore, the ProSs were classified into high and low efficient ProSs based on their average number of contacts at the interface. The results indicate that the irregular structures of ProSs and heterodimers are significantly different. The rASA of irregular structures in the monomeric state (rASAm) is large, which leads to a larger ΔrASA and many contacts in ProSs.

Protean segments (ProSs) are the functional regions of intrinsically disordered proteins that undergo disorder-to-order transitions upon binding to their partners (i.e., coupled folding and binding) [9][10][11][12]. The ProS interface is composed of a small core and a large rim. The average number of contacts of the ProS interface with its interaction partners is greater than that of heterodimers. This indicates that effective interactions of ProSs take place in the rim region as well as in the core. The key to effective interactions of ProSs is the solvent exposure of rim residues in the monomeric state (rASAm) [13].
The goal of this work is to investigate the properties of secondary structure elements (SSEs) at the interface of ProSs relative to those of heterodimers. The interfaces of ProSs and heterodimers were classified into the core, rim, and support based on their relative solvent accessible surface area (rASA) [14]. The average number of contacts of alpha helix and irregular residues was calculated for each ProS and heterodimer. Furthermore, the ProSs were classified into high and low efficient ProSs based on their average number of contacts at the interface. Compared to heterodimers, irregular residues of ProSs have a larger number of contacts than their alpha helices. Moreover, irregular residues of ProSs have a larger ΔrASA than their alpha helices. The rASA of irregular structures in the monomeric state is large, which leads to a larger ΔrASA and many contacts in ProSs. In addition, high efficient ProSs have a larger average rASA in the monomeric state (rASAm) and a larger average ΔrASA than low efficient ProSs.

Materials and Methods: ProSs and heterodimers
All ProSs (210) in 70 protein sequences were collected from the IDEAL database (as of August 2013) [11,12]. If more than one ProS was found in a protein and their positions overlapped, the longest ProS was selected. Sequence redundancy was removed at 80% sequence similarity (based on the CLUSTALW alignment) [15]. Hierarchical clustering was done with R [16] using complete-linkage clustering, and the longest ProS in each cluster was selected as the representative. The resulting non-redundant set contained 99 ProSs [13]. DNA-binding ProSs and one-to-many binding ProSs (a single ProS that binds to two or more different partners) were discarded [17]. Both X-ray and NMR structures were used in this study.
A non-redundant dataset of 276 heterodimers was selected from the Protein Data Bank (PDB) [18], using the PDB's advanced search interface (as of July 2014). The search criteria were as follows: (1) less than 30% sequence identity; (2) the macromolecule type contained only proteins; (3) the oligomeric state was heterodimer; (4) each chain was longer than 100 residues; and (5) structures determined by X-ray crystallography had a resolution better than 3 Å. Only the smaller protomer of each heterodimer was analyzed as the reference for ProSs.
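The redundancy-removal step can be illustrated with a minimal sketch. It assumes that pairwise similarities (in percent) are already available, for example from an alignment program, and it re-implements complete-linkage clustering in plain Python rather than calling R. The function names, the toy similarity values, and the segment lengths are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def complete_linkage_clusters(names, similarity, min_similarity=80.0):
    """Agglomerative clustering with complete linkage.

    `similarity` maps frozenset({a, b}) to a percent similarity.
    Two clusters are merged only while their weakest cross pair
    still reaches `min_similarity`.
    """
    clusters = [[n] for n in names]

    def linkage(a, b):
        # Complete linkage: cluster similarity is the worst pairwise value.
        return min(similarity[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < min_similarity:
            break  # no remaining pair of clusters is similar enough
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def pick_representatives(clusters, lengths):
    """Keep the longest segment of each cluster."""
    return sorted(max(c, key=lambda n: lengths[n]) for c in clusters)

# Hypothetical toy data: four segments with made-up similarities and lengths.
toy_similarity = {
    frozenset(("A", "B")): 90.0, frozenset(("A", "C")): 85.0,
    frozenset(("B", "C")): 70.0, frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 20.0, frozenset(("C", "D")): 15.0,
}
toy_lengths = {"A": 30, "B": 45, "C": 20, "D": 12}
toy_clusters = complete_linkage_clusters(["A", "B", "C", "D"], toy_similarity)
toy_representatives = pick_representatives(toy_clusters, toy_lengths)
print(toy_representatives)
```

In this toy case A and B merge, but C stays separate because its similarity to B is below the cutoff, which is the behaviour that distinguishes complete linkage from single linkage.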

Secondary structure analysis
The program DSSP [19] was used to assign secondary structures. The eight types calculated by DSSP were reduced to three classes: alpha helices (H, G and I), beta strands (E) and irregulars (B, S, T and C). The amino acid propensity, average number of contacts and relative solvent accessible surface areas (rASAs) of alpha helices and irregulars were analyzed in detail.
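The eight-to-three reduction described above amounts to a lookup table. The following sketch assumes the DSSP assignment is available as a string of one-letter codes; treating a blank or "-" as coil (C) is an assumption about the input format, and the function names are illustrative.

```python
from collections import Counter

# Reduction of the eight DSSP states to three classes, as listed in the text.
# Mapping a blank or "-" to the irregular class is an assumption about the input.
DSSP_TO_CLASS = {
    "H": "helix", "G": "helix", "I": "helix",
    "E": "strand",
    "B": "irregular", "S": "irregular", "T": "irregular",
    "C": "irregular", " ": "irregular", "-": "irregular",
}

def reduce_dssp(states):
    """Map a string of DSSP codes to a list of three-class labels."""
    return [DSSP_TO_CLASS[s] for s in states]

def composition(states):
    """Fraction of residues in each of the three classes."""
    counts = Counter(reduce_dssp(states))
    return {cls: counts[cls] / len(states)
            for cls in ("helix", "strand", "irregular")}

# Made-up ten-residue assignment string.
example_composition = composition("HHHGTTS-EE")
print(example_composition)
```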

Calculation of amino acid propensities
The propensities of amino acids are represented as the Chou-Fasman parameters [20], $\mathrm{CF}(a, P) = \dfrac{N_a(P)/N(P)}{N_a^{\mathrm{all}}/N^{\mathrm{all}}}$, where $N_a(P)$ is the number of residues of amino acid $a$ in place $P$, $N(P)$ is the total number of residues in $P$, $N_a^{\mathrm{all}}$ is the total number of residues of amino acid $a$ in the protein sequence, and $N^{\mathrm{all}}$ is the total number of residues in the protein sequence. For $P$, the alpha helix and irregular residues of ProSs and heterodimers were considered. To calculate the reference states (the denominator), the same secondary structure types of PDBSelect25 [21] proteins were used. PDBSelect25 contains a representative set of PDB entries with less than 25% sequence identity.
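As a worked illustration of the Chou-Fasman ratio defined above, the sketch below computes CF(a, P) from two residue strings: the residues found in place P and the residues of the reference state. The function name and the toy sequences are hypothetical; in the study the reference would be the matching secondary structure type in PDBSelect25.

```python
from collections import Counter

def chou_fasman(place_residues, reference_residues):
    """CF(a, P) = (N_a(P) / N(P)) / (N_a_all / N_all) for each amino acid a.

    Amino acids absent from the reference are skipped, because the
    ratio is undefined for them.
    """
    place = Counter(place_residues)
    reference = Counter(reference_residues)
    n_place = len(place_residues)
    n_reference = len(reference_residues)
    return {
        aa: (place[aa] / n_place) / (reference[aa] / n_reference)
        for aa in reference
    }

# Toy example: leucine is twice as frequent in P as in the reference.
toy_cf = chou_fasman("LLAK", "LLAAKKGG")
print(toy_cf)
```

A value above 1 means the amino acid is enriched in P relative to the reference, and a value below 1 means it is depleted.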

Analysis of relative ASA (rASA) and residue contacts
The interfaces of each ProS and heterodimer were classified into the core, rim and support based on the definitions of Levy [14]. The relative solvent accessible surface area (rASA) of a residue indicates its degree of solvent exposure. It is calculated by normalizing the accessible surface area (ASA) of a residue in the protein structure by the ASA of that residue in its most solvent-exposed state [22]. The program Naccess [23], which is an implementation of Lee and Richards' algorithm [24], was used to calculate the rASA of each residue in the monomeric (rASAm) and complex (rASAc) states for ProSs and heterodimers. The change in relative solvent accessible surface area (ΔrASA) of each residue was calculated as the difference between the rASAs of the monomeric (rASAm) and complex (rASAc) states. The rASAs were averaged over the interface, core and rim residues to derive the average rASAs of proteins.
Two residues, i and j, were considered to be in contact if any atom of residue i was within 4.5 Å of any atom of residue j [25,26]. The average number of external contacts and the relative solvent accessible surface areas (rASAs) at the interface, core and rim in alpha helix and irregular residues were calculated for each ProS and heterodimer. External contacts are defined as the contacts between the proteins and their interaction partners. The support and beta strand residues were excluded from this study because of their scarcity in ProSs.
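The classification and the ΔrASA calculation can be sketched as follows. The 25% cutoff and the exact rules (support: rASAm below the cutoff; core: rASAm above and rASAc below the cutoff; rim: rASAc above the cutoff; all requiring ΔrASA > 0) are not stated in this text; they are assumed here from the scheme of Levy [14] and should be checked against that reference. rASA values are taken to be percentages, and the residue values are made up.

```python
def classify_residue(rasa_m, rasa_c, cutoff=25.0):
    """Assign one residue to support, core, rim, or non-interface.

    rasa_m: rASA in the monomeric state (percent)
    rasa_c: rASA in the complex state (percent)
    cutoff: assumed 25% threshold, see the hedge in the lead-in
    """
    delta = rasa_m - rasa_c          # ΔrASA = rASAm - rASAc
    if delta <= 0:
        return "non-interface"       # no burial upon complex formation
    if rasa_m < cutoff:
        return "support"             # already buried in the monomer
    if rasa_c < cutoff:
        return "core"                # exposed in monomer, buried in complex
    return "rim"                     # remains exposed in the complex

# Hypothetical (rASAm, rASAc) pairs for four residues.
toy_residues = [(60.0, 10.0), (80.0, 40.0), (15.0, 5.0), (50.0, 50.0)]
toy_labels = [classify_residue(m, c) for m, c in toy_residues]

# Average ΔrASA over the interface residues only.
toy_interface = [(m, c) for (m, c), label in zip(toy_residues, toy_labels)
                 if label != "non-interface"]
toy_avg_delta = sum(m - c for m, c in toy_interface) / len(toy_interface)
print(toy_labels, toy_avg_delta)
```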
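The contact definition above translates directly into a distance test over atom pairs. In this sketch each residue is represented as a list of (x, y, z) atom coordinates in Å; averaging the contact count per interface residue is an assumption about how the average was taken, and the coordinates are made up.

```python
import math

def in_contact(residue_i, residue_j, cutoff=4.5):
    """True if any atom of residue_i is within `cutoff` Å of any atom of residue_j."""
    return any(math.dist(a, b) < cutoff for a in residue_i for b in residue_j)

def average_external_contacts(protein_residues, partner_residues, cutoff=4.5):
    """Mean number of partner residues contacted per protein residue."""
    counts = [
        sum(in_contact(r, p, cutoff) for p in partner_residues)
        for r in protein_residues
    ]
    return sum(counts) / len(counts)

# Hypothetical coordinates: two residues of the protein, three of the partner.
toy_protein = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
toy_partner = [[(4.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)], [(12.0, 0.0, 0.0)]]
toy_avg_contacts = average_external_contacts(toy_protein, toy_partner)
print(toy_avg_contacts)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single interface; a spatial index would be the usual choice for larger structures.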

High and low efficient ProSs
Based on the average number of contacts in the interface, the ProSs were classified into high and low efficient ProSs. High and low efficient ProSs were defined as ProSs with an average number of contacts greater than 4 and less than 2.5, respectively. Short ProSs (fewer than 11 residues) were excluded from this classification. Several properties were analyzed for each high and low efficient ProS (see Results and Discussion). The datasets contain 11 high efficient and 14 low efficient ProSs [13]. The radius of gyration was calculated using the Bio3D package [27] in R [16].
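The classification rule above can be written as a small function. The thresholds (4, 2.5, and 11 residues) come from the text; the function name, the handling of values exactly on a threshold, and returning None for ProSs that fall in neither class are illustrative choices.

```python
def classify_efficiency(avg_contacts, length,
                        high_cutoff=4.0, low_cutoff=2.5, min_length=11):
    """Return "high", "low", or None for a ProS.

    avg_contacts: average number of contacts at the interface
    length: number of residues in the ProS
    """
    if length < min_length:
        return None              # short ProSs are excluded
    if avg_contacts > high_cutoff:
        return "high"
    if avg_contacts < low_cutoff:
        return "low"
    return None                  # intermediate ProSs belong to neither class

# Hypothetical (average contacts, length) pairs.
toy_pros = [(4.6, 20), (2.1, 15), (3.0, 30), (5.0, 8)]
toy_classes = [classify_efficiency(c, n) for c, n in toy_pros]
print(toy_classes)
```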

Statistical analysis
The Wilcoxon rank-sum test was performed in RStudio [28] to calculate the P-values. P < 0.01 was considered statistically significant.
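For readers who want to see what the rank-sum test computes, the sketch below is a plain-Python version that uses midranks for ties and a normal approximation for the two-sided P-value. It is not the R routine used in the study: R's `wilcox.test` can use an exact distribution or a continuity correction, so its P-values may differ slightly. The sample values are made up.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided P-value). Ties receive midranks
    and the variance is corrected for ties; no continuity correction.
    """
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided
    return u, p_value

# Hypothetical samples with no overlap between the two groups.
toy_u, toy_p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(toy_u, toy_p, toy_p < 0.01)
```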

Figure 1: Distribution of secondary structure elements (SSEs) in ProS and heterodimer interfaces. The composition of secondary structure elements (SSEs) in the ProS interface (A) and the heterodimer interface (B). The program DSSP [19] was used to assign secondary structures. The eight types calculated by DSSP were reduced to three classes: alpha helices, beta strands, and irregulars. The distributions of alpha helices, beta strands and irregulars are colored in green, violet and yellow, respectively. Because of the scarcity of beta strand residues in ProSs, only alpha helices and irregulars were considered for further analysis. Box-plots of the rates of (C) alpha helix residues and (D) irregular residues in the interfaces of ProSs (red) and heterodimers (blue). The distribution of the irregulars is significantly different as assessed by the Wilcoxon rank-sum test (P-values: alpha helices = 0.03, irregulars = 1.05e-07).

Results and Discussion: Secondary structure analysis of ProSs and heterodimers
The secondary structure assignments for each ProS and heterodimer interface were determined by the DSSP program [19]. This analysis (see Figure 1A and B) showed that 33% of the residues in the ProS dataset were alpha helices, 6% were beta strands, and 61% were residues of the irregular structure. The secondary structure distribution of the ProS interface is very different from that of heterodimers. The largest differences between ProSs and heterodimers are in the content of irregular structures and beta strands, whereas alpha helices are almost equally abundant in both data sets. The ProS interface contains 15% more irregular residues, 13% fewer beta strands and 2% fewer alpha helices than heterodimers.
The differences between the distributions were evaluated, and the boxplots of the rates of alpha helices and irregulars are shown in Figure 1C and D. The alpha helix residues of ProSs and heterodimers are not significantly different (P-value = 0.03). In contrast, the irregular structures of ProSs and heterodimers are significantly different (P-value = 1.05e-07).

Interactions of secondary structure elements (SSEs)
The amino acid propensities of the different secondary structure elements (SSEs) (alpha helices and irregular structures) for ProSs vs. heterodimers were examined. The Chou-Fasman parameters [20] for alpha helix and irregular residues at the interface were calculated. Figure 2A and B show the correlations between ProS alpha helices vs. heterodimer alpha helices and ProS irregulars vs. heterodimer irregulars at the interface. In both cases, positive correlations were observed, with correlation coefficients of 0.50 and 0.61 for alpha helix and irregular residues, respectively. This indicates that the amino acid composition of the ProS secondary structure elements (SSEs) is moderately similar to that of heterodimers.
The core residues at the interface are the hydrophobic residues, generally in the central region of the interface, and play an important role in the interaction. The rim residues are the polar residues, located on the outer edges of the interface. The support residues represent the intersection between the interior and the interface [14].
Previous studies have indicated that the ProS interface can be in contact with a larger number of residues of the interaction partners than the heterodimer interface [13,29]. To examine the efficiency of interactions in different secondary structure elements (SSEs), the average number of external contacts of the interface, core, and rim residues was calculated for each ProS and heterodimer (see Figure 3 A-F). Compared to heterodimers, irregular residues of ProSs have a larger number of contacts than their alpha helices. The P-values for alpha helices and irregulars at the interface, core, and rim are shown in Tables 1 and 2, respectively.

Relative ASA (rASA) of secondary structure elements (SSEs)
Our previous study showed that the average ΔrASA correlates well with the average number of contacts in ProSs [13]. The ΔrASA of each residue is defined as the difference between the rASA of the unbound state (rASAm) and that of the bound state (rASAc) (ΔrASA = rASAm − rASAc), and both rASAs are used to define the core, rim and support residues [14]. Here, the relative solvent accessible surface areas (rASAs) of the alpha helices and irregular structures in each ProS and heterodimer at the interface, core and rim were analyzed in detail. Figure 4 A-C, D-F, and G-I show the distributions of the average rASAm, rASAc and ΔrASA of ProS alpha helices for the interface, core, and rim, respectively, compared with those of heterodimers. Similarly, Figure 5 A-C, D-F and G-I show the distributions of the average rASAm, rASAc and ΔrASA of ProS irregulars for the interface, core, and rim, respectively, compared with those of heterodimers. In both the core and rim, irregular residues of ProSs have a larger rASA in the monomeric state than those of heterodimers. The differences are confirmed by a statistical test (see Tables 1 and 2). The rASA of ProS irregular residues in the monomeric state (rASAm) is large, which results in a larger ΔrASA and leads to the formation of many contacts. Contour plots of the average rASAm and rASAc of alpha helices and irregular structures are shown in Figures 7 and 8.

High and low efficient ProSs
Based on the average number of contacts at the interface, the ProSs were classified into high and low efficient ProSs (see Methods). To examine the properties of high efficient ProSs, several factors were analyzed for each high and low efficient ProS: average rASAm, average rASAc, average ΔrASA, rate of the interface, rate of the core, rate of the rim, radius of gyration (Rg), and ProS length. Boxplots of the distributions for high and low efficient ProSs are shown in Figure 6 A-H, and the corresponding P-values are shown in Table 3.
(Figure caption, beginning missing) ... core (C and D) and rim (E and F). The distributions of ProSs and heterodimers are colored in red and blue, respectively (these colors are used throughout this paper). The differences between the distributions were evaluated, and the P-values are shown in Table 1.

Conclusion
The properties of secondary structure elements (SSEs) at the interface, core, and rim of ProSs were analyzed relative to those of heterodimers. The results demonstrate that irregular structures of ProSs and heterodimers are significantly different. Irregular structures have a larger rASA in the monomeric state (rASAm), which leads to the formation of many contacts in ProSs.