Structural features differentiate the mechanisms between 2S (2 state) and 3S (3 state) folding homodimers.

The formation of homodimer complexes for interface stability, catalysis and regulation is intriguing. The mechanisms of homodimer complexations are even more interesting. Some homodimers form without intermediates (two-state (2S)) and others through the formation of stable intermediates (three-state (3S)). Here, we analyze 41 homodimer (25 2S and 16 3S) structures determined by X-ray crystallography to estimate structural differences between them. The analysis suggests that a combination of structural properties such as monomer length, subunit interface area, ratio of interface to interior hydrophobicity can predominately distinguish 2S and 3S homodimers. These findings are useful in the prediction of homodimer folding and binding mechanisms using structural data.

where N₂ is the native dimer state, I is the intermediate monomeric species, I₂ is the intermediate dimeric species, and U is the unfolded monomeric state. 3SDI and 3SMI are commonly considered together as three-state (3S). 2S interfaces have been found to resemble protein cores, whereas 3SMI interfaces resemble monomer surfaces.[4] 2S and 3SMI dimerization were also studied by following the evolution of two identical 20-letter residue chains within the framework of a lattice model, using Monte Carlo simulations.[5] That study found that the folding of 2S sequences depends on a significantly larger number of conserved amino acids than the folding of 3SMI sequences. The effects of monomer and interface geometry on the 2S and 3S association mechanisms were also studied with the energetically minimally frustrated Gō model.[6] That work found that the native protein 3D structure is the major factor governing the choice of binding mechanism.
Mei and colleagues investigated the importance of 2S and 3S dimers using structural and folding data.[2] Apiyo and colleagues proposed, using 13 obligomers (multimers with permanent interfaces), that small obligomers (molecular mass < 20 kDa) unfold through 2S.[7] In contrast, large obligomers (molecular mass > 35 kDa) unfold through an oligomeric intermediate (3SDI) and those of intermediate size unfold through a monomeric intermediate (3SMI). Moreover, Levy and colleagues proposed, using 21 homodimers, that 2S and 3SMI dimers can be effectively classified based on the ratio of intra-molecular to inter-molecular contacts and on interface hydrophobicity.[6] Here, we created an extended dataset of 41 homodimers (2S: 25; 3SDI: 6; and 3SMI: 10) to design a methodology for the discrimination of 2S, 3SDI and 3SMI dimers using 3D structural properties.

Interface area
The solvent accessible surface area (ASA) was computed using the program NACCESS.[9] The dimeric interface area (B) was calculated as ΔASA, the change in ASA upon complex formation from the monomer state to the dimer state.[10] Because homodimer complexes have two-fold symmetry, we then calculated the subunit interface area as B/2.
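The ΔASA calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that the total ASA of each isolated subunit and of the dimer have already been obtained (for example from NACCESS output), and the function name and the numerical values are hypothetical.

```python
def interface_area(asa_monomer_a, asa_monomer_b, asa_dimer):
    """Return (B, B/2) in square angstroms.

    B is taken as the ASA lost on complex formation:
    ASA(subunit A alone) + ASA(subunit B alone) - ASA(dimer).
    Assumption: each subunit ASA was computed on the chain
    extracted from the dimer coordinates.
    """
    buried = asa_monomer_a + asa_monomer_b - asa_dimer
    if buried < 0:
        raise ValueError("dimer ASA exceeds the sum of the subunit ASAs")
    return buried, buried / 2.0


# Hypothetical ASA totals (A^2), chosen only to illustrate the arithmetic.
b, b_half = interface_area(9500.0, 9500.0, 16152.0)
print(f"B = {b:.0f} A^2, B/2 = {b_half:.0f} A^2")
```

Dividing by two relies on the two-fold symmetry noted above; for an asymmetric complex the per-subunit contributions would have to be computed separately.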

Interior, interface and exterior residues
Homodimer residues were classified into three categories (interior, interface and exterior) based on relative ASA. The percentage relative ASA was obtained by dividing the accessible surface area by the total surface area of a side-chain in an extended conformation in the tripeptide GXG. Exterior residues were defined as having a relative ASA > 5%, interior residues as having a relative ASA < 5%, and interface residues as satisfying the conditions ΔASA > 1 Å² and relative ASA < 5%. The 5% cut-off was optimized elsewhere by Miller et al.[11]

Fraction of interface to interior hydrophobicity (F_hp)
F_hp (fraction of interface to interior hydrophobicity) was defined by the equation (H_inf - H_ext)/(H_int - H_ext), where H_int is interior hydrophobicity, H_inf is interface hydrophobicity and H_ext is exterior hydrophobicity. The individual hydrophobicity values were calculated using the equation Σ n_i h_i / Σ n_i, where n_i is the number of residues of type i and h_i is the hydrophobicity value (based on SES (solvent excluded surface) and SAS (solvent accessible surface)) of residue type i, as described elsewhere.[12]

Small and large homodimers
Small homodimers were defined as those with ML (monomer length) less than the dataset mean length (185 residues). Large homodimers were defined as those with ML larger than the dataset mean length (185 residues).
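As a worked illustration of the hydrophobicity fraction defined above, the sketch below computes the composition-weighted mean hydrophobicity Σ n_i h_i / Σ n_i for each residue class and then combines the three means into the ratio (H_inf - H_ext)/(H_int - H_ext). The function names, the three-residue scale and the residue counts are all hypothetical placeholders; they are not the per-residue values used in the paper.

```python
def mean_hydrophobicity(counts, scale):
    """Composition-weighted mean: sum(n_i * h_i) / sum(n_i)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues in this class")
    return sum(n * scale[res] for res, n in counts.items()) / total


def hydrophobicity_fraction(h_inf, h_int, h_ext):
    """(H_inf - H_ext) / (H_int - H_ext)."""
    denominator = h_int - h_ext
    if denominator == 0:
        raise ZeroDivisionError("interior and exterior hydrophobicity are equal")
    return (h_inf - h_ext) / denominator


# Hypothetical scale and residue counts, for illustration only.
toy_scale = {"LEU": 1.0, "ALA": 0.5, "LYS": -1.0}
interior = {"LEU": 6, "ALA": 4}
interface = {"LEU": 3, "ALA": 2, "LYS": 1}
exterior = {"ALA": 2, "LYS": 6}

h_int = mean_hydrophobicity(interior, toy_scale)
h_inf = mean_hydrophobicity(interface, toy_scale)
h_ext = mean_hydrophobicity(exterior, toy_scale)
fhp = hydrophobicity_fraction(h_inf, h_int, h_ext)
print(f"H_int={h_int:.3f} H_inf={h_inf:.3f} H_ext={h_ext:.3f} F_hp={fhp:.3f}")
```

With this normalisation, a value near 1 means the interface is about as hydrophobic as the interior, and a value near 0 means it resembles the exterior.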

Homodimers with small and large B/2
Homodimers with small B/2 were defined as those whose B/2 is less than the dataset mean B/2 (1424 Å²). Homodimers with large B/2 were defined as those whose B/2 is larger than the dataset mean B/2 (1424 Å²).

Results:
Distribution of 2S and 3S in a Cartesian plane of monomer length and subunit interface area
Figure 1 shows the distribution of 2S and 3S homodimers in the Cartesian plane of ML (monomer length) and B/2 (subunit interface area). It shows that 76% of small proteins form 2S and 60% of large proteins form 3S homodimers. Figure 1 also shows that 68% of 2S have a large interface area and 45% of 3S have a small interface area. 2S have ML in the range of 45-270 residues and 3S have ML in the range of 70-850 residues. Within 3S, 3SMI lie within 90-380 residues and 3SDI lie within 70-850 residues. 2S and 3S dimers have significantly different ML ranges (p = 0.05 in F test), whereas 2S and 3SMI have similar ML ranges (p = 0.05 in F test). The dataset mean ML is 185 residues, which lies between the 2S mean (125 residues) and the 3S mean (282 residues). The data also show that the 2S and 3S ML means are different (p < 0.05). The mean ML for 3SDI is 405 residues, which is much greater than the mean ML for 2S (125) and 3SMI (208).
The B/2 ranges for 2S (650-2500 Å²) and 3S (300-2317 Å²) overlap and are not significantly different (p = 0.21). However, 3SMI and 3SDI are distinguished by their B/2 ranges (p < 0.05): 3SMI have a small B/2 range (300-1550 Å²) and 3SDI have a large B/2 range (1350-2317 Å²). The dataset mean for B/2 is 1424 Å², which lies between the 2S mean (1509 Å²) and the 3S mean (1239 Å²). Interestingly, the 3SMI mean (1068 Å²) is close to the 3S mean B/2 (p = 0.25) and the 3SDI mean (1705 Å²) is close to the 2S mean B/2 (p = 0.35). In Figure 1, the distribution of 2S and 3S is divided into four regions (G1 to G4) based on the dataset means of ML and B/2. Entries in G1 are small proteins with large B/2 and entries in G4 are large proteins with small B/2 (refer to the methodology section for the definitions of small and large proteins). Entries in G2 are small proteins with small B/2 and those in G3 are large proteins with large B/2. This grouping shows that 84% of homodimers in G1 are 2S and 66% of homodimers in G4 are 3S. In G3, 44% of homodimers are 2S and 56% are 3S. In G2, 67% of homodimers are 2S and 33% are 3S. It should be noted that the 3S entries in G2 are solely 3SMI. The results show that 2S and 3S are clearly separated in G1 and G4 but much less so in G2 and G3. The distribution of 2S and 3S in regions G1 to G4 provides insight into their structural preferences in terms of ML and B/2.
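The assignment of an entry to regions G1 to G4 can be sketched as below, using the dataset means quoted in the text (185 residues and 1424 Å²) as thresholds. The function name is hypothetical, and the handling of values exactly equal to a mean is an assumption, since the definitions only cover "less than" and "larger than".

```python
MEAN_ML = 185          # dataset mean monomer length (residues), from the text
MEAN_B_HALF = 1424.0   # dataset mean subunit interface area (A^2), from the text


def region(ml, b_half, mean_ml=MEAN_ML, mean_b_half=MEAN_B_HALF):
    """Return 'G1'..'G4' for a homodimer.

    G1: small protein, large B/2    G2: small protein, small B/2
    G3: large protein, large B/2    G4: large protein, small B/2
    Assumption: a value exactly equal to a mean is placed on the 'large' side.
    """
    small_protein = ml < mean_ml
    small_interface = b_half < mean_b_half
    if small_protein:
        return "G2" if small_interface else "G1"
    return "G4" if small_interface else "G3"


# Hypothetical entries: (monomer length, B/2)
for ml, b_half in [(100, 2000.0), (100, 800.0), (300, 2000.0), (300, 800.0)]:
    print(ml, b_half, region(ml, b_half))
```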

Exterior, interior and interface hydrophobicity in 2S and 3S
Table 1 gives the hydrophobicity of interior, interface and exterior residues for each 2S, 3SDI and 3SMI homodimer, together with the mean values for each group in the dataset. Very small 2S (≤ 90 residues) have greater interface hydrophobicity than interior hydrophobicity. However, this is not true of larger 2S (> 90 residues). It is also interesting to observe that a majority of 3SMI have lower interface hydrophobicity than interior hydrophobicity, whereas this is not true of a majority of 3SDI. Table 1 shows that the mean interface hydrophobicity values follow the order 2S > 3SDI > 3SMI. However, the mean interior hydrophobicity values follow a different order (2S > (3SDI = 3SMI)). The ratio of interface to interior hydrophobicity is ~1 for 2S and 3SDI, while it is < 1 for 3SMI.

F_hp (fraction of interface to interior hydrophobicity) value in 2S and 3S
Figure 1 shows that 92% of entries in G1 have a high F_hp value (> 0.5) and 83% of entries in G4 have a low F_hp value (< 0.5). Interestingly, 75% of entries in G2 and 78% of entries in G3 have a high F_hp value. Moreover, Figure 1 shows that 91% of 2S in G1 have a high F_hp value and 75% of 3S in G4 have a low F_hp value. However, 100% of 3S (2 entries) in G1 have a high F_hp value and 100% of 2S (2 entries) in G4 have a low F_hp value. In G2, 75% of 2S and 67% of 3S have a high F_hp value. In G3, 100% of 3S and 50% of 2S have a high F_hp value. The mean F_hp value for 2S and 3SDI is 1, while it is 0.5 for 3SMI. Together, these values describe the F_hp distribution of 2S and 3S across regions G1 to G4.

Discussion:
The mechanism of homodimer folding and binding has been investigated using denaturation experiments.[14-52] Three-dimensional structures are also available for many of these homodimers with known folding and binding mechanisms (Table 1). The homodimer folding and binding data collected from the literature are classified into three groups: 2S, 3SMI and 3SDI. The study of homodimer folding and binding using energy models is computationally intensive and time consuming. Alternatively, the study of folding and binding using structural data has been found useful.[2] Recently, Mei and colleagues documented the differences between 2S, 3SMI and 3SDI homodimers using 3D structure data.[2]
That study provided structural insight into the mechanism of 2S and 3S folding. However, the analysis did not document parameters to differentiate 2S, 3SMI and 3SDI homodimers using structural data. In this study, we analyze an extended dataset of homodimer complexes to distinguish 2S and 3S homodimers using structural features. The results show that 76% of small proteins are 2S homodimers and 60% of large proteins are 3S homodimers. Thus, protein size plays an important role in determining the pathways of homodimer folding and binding. The results also show that 68% of 2S have a large subunit interface area and 45% of 3S have a small subunit interface area. These observations suggest the importance of protein size and subunit interface area in determining the mechanism of homodimer formation.
The distribution of 2S and 3S in the G1 and G4 regions of Figure 1 shows differences between them based on protein size, subunit interface area and F_hp. In G1, 84% of dimers are 2S and 92% of dimers have high F_hp (> 0.5). Thus, entries with high F_hp are grouped in G1, a region that represents small proteins with large subunit interface area. Moreover, 91% of 2S in G1 have high F_hp. This implies that a majority of small proteins with large subunit interface area and high F_hp are 2S. The 3S entries in G1 also have high F_hp, which explains the presence of these exceptional 3S entries in G1. Similarly, in G4, 66% of dimers are 3S and 83% of dimers have low F_hp (< 0.5). Thus, entries with low F_hp are grouped in G4, a region that represents large proteins with small subunit interface area. Furthermore, 75% of 3S in G4 have low F_hp. The 2S entries in G4 also have low F_hp, which explains the presence of these unusual 2S entries in G4. Entries in G2 and G3 are a mixture of 2S and 3S with low and high F_hp values, which differs from the distribution in G1 and G4. In G3, 100% of 3S and 50% of 2S have high F_hp, and thus dimers in G3 are not distinguished by their folding mechanisms using structural parameters.
The mean F_hp for 2S and 3SDI is 1, while it is 0.5 for 3SMI. The similarity between 2S and 3SDI in F_hp is interesting. It implies that binding after folding, as displayed by 3SMI, resembles the association of protein-protein complexes. Thus, we show that small homodimers with large interface area and high F_hp are prevalently 2S. Similarly, large homodimers with small interface area and low F_hp are prevalently 3S. Hence, it is possible to distinguish 2S and 3S dimers using 3D structural data. However, small homodimers with small interface area and large homodimers with large interface area are not significantly distinguished into 2S and 3S using the structural parameters ML, B/2 and F_hp. It should be noted that the conclusions made in this report are based on the limited set of homodimers given in Table 1.
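The qualitative trends summarized above can be written as a small rule of thumb. The sketch below is only an illustration of those trends, not a validated classifier or the authors' procedure: the function name and return labels are hypothetical, the thresholds are the dataset means and the 0.5 F_hp cut-off quoted in the text, and every combination outside the two prevalent cases is reported as undetermined.

```python
MEAN_ML = 185          # residues, dataset mean from the text
MEAN_B_HALF = 1424.0   # A^2, dataset mean from the text
FHP_CUTOFF = 0.5       # high/low F_hp boundary used in the text


def prevalent_mechanism(ml, b_half, fhp):
    """Return '2S', '3S' or 'undetermined' from ML, B/2 and F_hp.

    '2S': small monomer, large subunit interface, high F_hp.
    '3S': large monomer, small subunit interface, low F_hp.
    Anything else (including values equal to a threshold) is left
    undetermined, mirroring the mixed behaviour reported for G2 and G3.
    """
    if ml < MEAN_ML and b_half > MEAN_B_HALF and fhp > FHP_CUTOFF:
        return "2S"
    if ml > MEAN_ML and b_half < MEAN_B_HALF and fhp < FHP_CUTOFF:
        return "3S"
    return "undetermined"


# Hypothetical entries: (monomer length, B/2, F_hp)
print(prevalent_mechanism(110, 1800.0, 0.9))
print(prevalent_mechanism(320, 900.0, 0.3))
print(prevalent_mechanism(320, 1900.0, 0.9))
```

Because the reported percentages are well below 100% even in G1 and G4, the labels should be read as tendencies in this dataset and not as predictions with known accuracy.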

Conclusion:
The mechanisms of homodimer complexation have implications in drug discovery. However, elucidating the homodimer mechanism using unfolding experiments is difficult. Prediction of homodimer folding and binding using structural data has application in target validation. Here, we show that small proteins with large interface area and high F_hp prevalently form 2S homodimers. We also show that large proteins with small interface area and low F_hp prevalently form 3S homodimers. Therefore, it is feasible to differentiate 2S and 3S homodimers using structural data.

Figure 1: Correlation between monomer length (ML) and subunit interface area (B/2) for three groups of homodimers. 2S: two-state; 3SDI: three-state with dimeric intermediate; 3SMI: three-state with monomeric intermediate. The two dashed lines through 185 aa and 1424 Å² represent the mean monomer length and the mean B/2 for all homodimers, respectively. They divide the dimers into four regions (G1, G2, G3 and G4). The distributions of 2S, 3SDI and 3SMI dimers are given for each region. The value within parentheses is the hydrophobicity factor (F_hp), calculated by the equation (H_inf - H_surf)/(H_int - H_surf), where H_inf is interface hydrophobicity, H_int is interior hydrophobicity and H_surf is surface hydrophobicity.

Table 1: Dataset of homodimeric proteins divided into three groups according to their unfolding pathways. In the hydrophobicity calculation, n_i is the number of residues of type i and h_i is the ASA hydrophobicity factor (based on SES (solvent excluded surface) and SAS (solvent accessible surface)) of residue type i, from Pacios.