A study of interface roughness of heteromeric obligate and non-obligate protein-protein complexes.

A number of studies aimed to distinguish the structural patterns at the interfaces of obligate and non-obligate protein-protein complexes. These studies revealed better geometric complementarity of protomers in obligate complexes over non-obligates. We showed that protein surface roughness can be used to explain this observation. Using smoothened atomic fractal dimension (SAFD) as a descriptor, this work investigates the role of interface roughness in the molecular recognition of these two types of protein-protein complexes. We studied 52 obligate and 62 nonobligate heteromeric high quality crystal structures from benchmark data sets. We found that distribution of interface roughness values obligate and non-obligates are quite similar. However, we observed a distinct preference for obligate protomers to complex with chains having similar roughness. The roughness pairing is correlated in obligates only. The later indicates, an increase/decrease of roughness in one chain causes a proportional change in roughness in its binding partner. Based on these observations we proposed that similar and correlated roughness pairing leads to more interdigitation and contacts at the interface leading to better geometric fit in obligates. We propose that roughness information can find useful application in improving machine learning based complex type classifiers and filtering protein-protein docking solutions.


Background:
Protein-protein interactions (PPI) in living cells mediate numerous if not most biological processes.Structural data, when available for such protein complexes, could enable us to model the underlying molecular mechanism of such interactions.Such models could be potentially helpful for predicting protein function by identifying the complex type.Two different types of protein complexes can be distinguished on the basis of protomer affinity.In an obligate interaction, the protomers are not found as stable structures on their own in vivo.Such complexes are generally also functionally obligate.Most homodimers and some heterodimers belong to this category.Non-obligate interactions are those whose protomers exist independently.The components of such protein-protein complexes are often initially not co-localized and thus need to be independently stable.Protein complexes are also categorized as permanent and transient according to their lifetime in vivo.Structurally and functionally obligate interactions are also mostly permanent and non-obligatory interactions can be transient or weaker (compared to obligatory interactions) .Transient interactions are more stable compared to the weaker ones and requires a molecular trigger for association or dissociation in vivo.compiled a nonredundant set of 82 obligatory and 30 non-obligatory complexes to compare their structural properties.Among the various differences in interface properties, obligates showed somewhat better shape complementarity at the interfaces compared to non-obligates using Colman's method.The differences in interface area as well as residue propensities were only marginally different for the two types of complexes.postulated that oligomeric protein-protein interfaces were rough compared to the other regions of the protein chain.In a similar study Pettit & Bowie [9] used a smoothened atomic fractal dimension (SAFD) to measure protein surface roughness.As the measurement of individual surface area of atoms is prone to statistical error, The SAFD calculates the roughness around each atom by smoothing over its neighborhood.The SAFD (fi) is defined in equation 2 (see supplementary material).

Previous workers like Ofran and Rost
Analyzing the active site SAFD of a small data set (26 complexes), they came to an almost opposite conclusion that oligomeric interfaces are not much rougher than non interfacial regions of the complexes, particularly when the interface area is 600Ǻ 2 or more.On the other hand small molecule binding pockets showed significantly higher surface roughness values.In a later work Pettit et.al used SAFD in a neural network based protein functional site prediction software, HOTPATCH [10].Toufic et. al [11] used fractal dimension as one of the structural descriptors for the characterization and prediction of different protein-RNA binding sites.Protein-RNA interface residues were found to be rougher compared to the non-interfacial ones.In another study the SAFD and Fractal Dimension were used to evaluate the accuracy of protein homology models [12] From the previous studies it became apparent that surface roughness has an important role in molecular recognition.However, earlier studies were performed on limited number of complexes and no discrimination was made between obligates and non-obligates.This motivated us to probe the role of surface roughness, in obligate and non-obligate interactions.We specifically studied if SAFD is a discriminating feature for these two types of complexes and if so, whether it could possibly provide a plausible mechanism for complex formation.We hypothesized that similar interface roughness pairing would provide more interdigitation leading to better geometric fit in obligates and vice versa in non-obligates.We tested our hypothesis against well characterized high quality benchmarked datasets of obligates and non-obligates.The correlations (or lack of it) between the roughness values of interacting chains in two types of complexes substantiated our hypothesis.

Methodology: Generation of obligate and non-obligate dataset:
Most homodimeric proteins fall into the obligate category.Obligate homodimers are also in most cases symmetric in a way, that the interface contains equivalent atoms belonging to the same surfaces of the protomers.Such symmetric homomeric obligates were excluded from our study.Because, in such complexes, the difference in interface SAFD would be essentially zero, making it unsuitable for this analysis.Therefore our dataset comprised of heteromeric subunits only.For obligates, a subset of heteromeric complexes from the dataset of Mintseris & Weng [13] were considered.This dataset of permanent complexes was an extension of [4] and was generated by automated means.From this set we chose binary complexes having no more than 30% sequence identity between protomers.We rejected complexes having ligands in the interfaces, as described in [6].To select obligates we subjected the rest of the complexes to the NOXclass multistage classifier [6] which has high prediction accuracy.The NOXclass features used for classification include interface area, interface area ratio, area-based amino acid composition, correlation between interface and protein surface and gap volume index.Complexes that were predicted to be obligate with greater than 55 % certainty were accepted.Finally a total of 52 obligate complexes were considered (Table 1a: Supplementary data).For the non-obligate protein complexes, the entire NOXclass [6] non-obligate data set was taken, comprising of 62 hetero complexes (Table 1b: Supplementary data).All the PDB files corresponding to the data sets were downloaded and 'ATOM' records of complexes and individual chains were written into separate files.

Calculation of SAFD values:
We used "Ezprot 2.2", a free suite of programs, developed by Frank Pettit [14] to calculate SAFD, and interface residues.

Finding interface residues:
To identify the amino acids at the interface of each chain, the 'listoligface' program which comes along the Ezprot suite was used.'listoligface' reads one or more protein complex , and for each chain, identifies any oligomeric interfaces; the output is a list of which residues on each chain contact which neighboring chains.Here, a 'neighboring chain' is defined as any chain within a fixed distance to any heavy atom on a different chain.We used the default fixed distance of 4.2 Angstroms.This method is akin to one of the methods described in [5] for defining interface residues using a lenient accessible surface area change criterion.

Calculation of Fractal dimension:
To calculate the fractal dimension of each atom, and to remove the statistical error, the program 'rufness' calculates the SAFD for each atom of the protein according to equation (2).In the default option the probe radius is 1.4 Å and the small change in probe radius to be used for calculating areas and fractal dimension is 0.1 Å.We used these default values in all our calculations.The 'rufness' program can optionally calculate the SAFD for solvent exposed atoms only.We used this feature to calculate the atomic SAFD averaged over all neighboring atoms that were solvent exposed and not buried.The SAFD i values for each exposed atom of all the amino acid residues present at the interface, of each chain were calculated, and averaged to get the average SAFD (ISAFD) of the chain interface atoms.Gap volume Index (GVI) is a measure of interface geometric complementarity, was calculated using the NOXclass server [6].A lesser value of GVI indicated better shape or geometric complementarity at the interface.

Statistical analysis:
Graphical and statistical analysis of SAFDs, viz.Box plots, correlation coefficient and t-test calculations were done using the 'R' software suite.Correlation plots were made using Microsoft Excel.

Discussion: Interacting protein chains in different complexes:
In this study a total of 52 obligate and 62 non-obligate heteromeric complexes of high resolution were used from curated benchmark data sets.[6, 13]

Interface SAFD values of obligates and non-obligates:
The interface averages SAFD (ISAFD) of the two protomers in the two types of complexes were studied using boxplots and unpaired ttest (Figure 1).For the obligate complexes, the ISAFD ranged from 2.94 to 3.45, the median value being 3.18.For the non-obligates the ISAFD ranged from 2.84 to 3.77, the median value being 3.15.However the spread of ISAFD values is a little more in non-obligates.The two tailed unpaired t-test gave P value of 0.4689, considered insignificant.The boxplots and statistics show that the median ISAFD values of protomers didn't differ in two types of complexes and the distributions of roughness values are quite close.The median interface roughness values comes close to the average value of 3.12 reported by Pettit [9] for large functional sites (interface area >600 Ǻ 2 )

ISAFD differences between chains in obligates and non-obligates:
For the two types of complexes, the absolute differences of ISAFD (d-ISAFD) between the protomers were calculated, and analyzed by boxplots and unpaired t-test (Figure 2).The median d-ISAFD values for obligate and non-obligates are 0.109 and 0.179 respectively.Thus there was a noticeable, more than 1.6 fold increase in the median value of ISAFD between protomers for non-obligates over obligates.The two tailed unpaired t-test gave P value < 0.0001, considered highly significant.This indicates that in obligates the interface roughness difference is much less than that of non-obligates.As a result a tight distribution of d-ISAFD values in obligates was seen in the plot.In non-obligates, d-ISAFD values were much scattered, indicating the two chains of the complex had a wide range of interface roughness.

Correlation of protomer ISAFD in obligates and non-obligates:
Correlations between the ISAFD of protomers in both types of complexes were quantified by Pearson correlation coefficient.A high positive correlation coefficient of 0.621 (P value < 0.0001, considered highly significant) was observed between the interface SAFD values of the two chains involved in obligatory interactions.The correlation coefficient for non-obligates was 0.027(P value 0.84, considered not significant), which was negligible when compared with obligates.This showed that there was practically no correlation of roughness between the non-obligatory chains.A visual inspection of the scatter plots (Figure 3, 4) also revealed that the chains in obligates have more correlated surface roughness with respect to non-obligates.Our study showed the predominant preference of protomers of similar roughness to form obligate interactions.From correlation studies, it is apparent that in obligates, the increase or decrease in roughness in one of the protomers is also reflected in its binding partner by a proportional increase or decrease of roughness.The underlying mechanism can be described by Evidently, chances of having more contacts are more for rough/rough or smooth/smooth association, compared to a rough/smooth association.
Shape complementarity, in form of GVI, was successfully used as one of the six descriptors in NOXCLASS [6] to characterize complex types by a Support vector machine, achieving 91.8% accuracy.In our study too we have obtained high discriminatory values of correlation coefficients (0.621, 0.027) and median d-IASFD values (0.0109 and 0.179) for obligate and non-obligates respectively.We propose that the difference in interface SAFD (d-ISAFD) can be used as a discriminatory feature for classification of complexes along with other descriptors.

Conclusion:
We studied interface roughness properties of a high quality, nonredundant obligate and non-obligate heteromeric protein complex dataset.We found that subunits with similar roughness values at the interfaces tend to form obligate complexes, and those with different values tend to form non-obligate complexes.One of the essential features of obligate complexes is the better geometric fit between the protomers.Our structural model provides an explanation of such complementarity from observed similar and correlated roughness pairing of subunits with better interdigitation at the interface.The model also explains the observed higher interface contacts in obligates compared to non-obligates.There are a few practical implications of our findings.Firstly, along with various other surface based descriptors interface roughness property can be used to discriminate among heteromeric obligate and non-obligate complexes and thus can further improve the performance of existing classifiers.This could be an important tool for protein function prediction in a structural genomics framework.Secondly, preferential roughness pairing can be used to filter out wrong candidates in computational protein-protein docking when the type of protein complex is known beforehand, viz.from biological data.Thirdly, protein design experiments can also benefit from these findings for designing appropriate binding interfaces. www.bioinformation.net

[ 1 ]
analyzed six types of protein complexes, including homo-obligomers and hetero-obligomers.Nooren and Thornton [2] studied homo and heterodimeric transient protein complexes.This study revealed that transient complex interfaces have lesser area, are more planar and polar compared to those homodimers which show permanent and stable interactions.Block et.al identified several physiochemical interface descriptors [3] and used machine learning techniques to classify permanent and transient complexes.The first breakthrough in classifying permanent and transient complexes was achieved by Mintseris and Weng using atomic contact vectors [4] as a structural descriptor.Dey et.al [5]

Figure 5 .
The surfaces of protomers of similar roughness can interdigitate well to form better geometric fit (Figure5: I, II).Given that the roughness values are similar, rougher protomers can geometrically fit well, leaving lesser gap at the interface, just as well the two protomers with comparatively smooth interfaces can fit together.This type of protomer pairings are observed in our obligate data set.On the other hand, protomer surfaces with marked difference in interface roughness can't interdigitate well, resulting in poor geometric fit with more gaps in the interface.The later explains the case of non-obligates (Figure5: III, IV).This structural model is in accordance with the better shape complementarity observed in obligates over non-obligates [5,6,15].The average GVI for our obligate data set was 2.15 as compared to 5.3 for the non-obligate set [6] indicating better geometric fit in obligate complexes.This further corroborates our proposed mechanism.In obligates the correlation allows the chains to preserve structural complementarity as the roughness of one protomer is changed, unlike the non-obligates.The proposed model also explains the more number of contacts observed at obligate interfaces over non-obligates [5].

[7, 8]. From a study of surface roughness study Lewis & Rees [7] and Aqvist & Tapia [8]
Zhu et.al [6]used a non redundant training set and six interface properties to train a Support vector machine to classify obligate, non-obligate and crystal packing interactions.Zhu's NOXclass classifier achieved a remarkable accuracy of 91.8% in classifying obligates from non-obligates.Of the various interface properties Gap volume and gap volume index were studied as a measure of shape complementarity.In accordance with previous studies obligate complexes show better geometric complementarity over non-obligate ones.