Insights from the structural analysis of protein heterodimer interfaces.

Protein heterodimer complexes are often involved in catalysis, regulation, assembly, immunity and inhibition. These functions depend on the formation of stable interfaces between the interacting partners. Hence, it is of interest to describe heterodimer interfaces using known structural complexes. We use a non-redundant dataset of 192 heterodimer complex structures from the Protein Data Bank (PDB) to identify interface residues and describe the interfaces using amino acid residue property preferences. Analysis of the dataset shows that heterodimer interfaces are often abundant in polar residues. The analysis also shows two classes of interfaces in heterodimer complexes. The first class (class A), with more polar residues than the core but fewer than the surface, is already known. These interfaces are more hydrophobic than surfaces, and protein-protein binding is largely hydrophobic. The second class (class B), with more polar residues than both core and surface, is shown here. These interfaces are more polar than surfaces, and binding is mainly polar. Thus, these findings provide insights into protein-protein interactions.

The classical work by Chothia & Janin (1975) showed that protein interfaces are dominantly hydrophobic [1]. Jones & Thornton (1995) later showed in detail that interfaces have more hydrophobic residues than the surface but fewer than the core [2]. The role of interface hydrophobic residues in binding was also acknowledged by Tsai et al. (1997) [3]. Large and strong hydrophobic patches were found to be dominating features at the interface [4]. The use of a hydrophobic mean-field potential for protein subunit docking was subsequently demonstrated [5]. Hydrophobic interfaces with few charged groups have been described [6]. That study also documented that interface residues are either "abundantly polar" or "abundantly hydrophobic". The presence of distinctly clustered yet conserved residues at the interface is known [7]. Interfaces have also been described using features (e.g. protein size, interface size, interface area, gap volume, gap index, planarity, hydrogen bonds, salt bridges, residue propensity, etc.) based on mean statistics for large datasets [8,9,10,11,12,13]. Online web servers are also available for studying protein-protein interactions (PPI) using these features [14,15,16]. Thus, considerable progress has been made in understanding the molecular principles of protein-protein binding. It should be stated that these studies use datasets consisting of both heterodimers and homodimers. The formation of homodimers and their folding through 2-state (2S, without intermediate) and 3-state (3S, with a stable intermediate) mechanisms is distinct from that of heterodimers [20]. Therefore, our interest here is to study and understand heterodimer complexes only, using interface residue types. Moreover, it is known that non-specific interfaces are less pronounced in heterodimer complexes and hence the need to distinguish true and false complexes is not compelling [9].
We use the percentage of polar residues to describe the interface in comparison with the core and surface for 192 heterodimer complexes and classify them into distinct classes.

Materials & Methodology: Heterodimers dataset:
We created an updated, non-redundant heterodimer dataset from the Protein Data Bank (PDB) [21]. Precompiled datasets are described in the ProtorP [16] and PQS [22] online servers. However, ProtorP provides no option for download and PQS has not been updated since 1999. Therefore, it was essential to create an updated, non-redundant heterodimer dataset from PDB (Table 1, see Supplementary material) using the procedure outlined in Figure 1. In this procedure, we downloaded 5,387 entries from the PDBelite web interface using the predefined keywords "hetero AND dimer" [23]. However, this dataset was redundant and corresponded to about 28,525 sequence chains. This is more than the expected 10,774 (5,387 × 2) because several entries contain multiple sequence chains (>2 chains). Therefore, we extracted the PDB entries (984) with just two sequence chains. Thus, a sequence set of 1,968 sequences corresponding to 984 PDB entries was created. This set was redundant at the sequence level and was therefore subjected to CD-HIT (a sequence redundancy removal program) [24] at a 40% sequence similarity cut-off (with word size n = 2). This resulted in 680 unique sequences corresponding to 457 PDB entries. It should be noted that the number of complexes is more than half the number of chains. This is because each interface is a combination of two chains, and the interfaces themselves are therefore non-redundant. This set contained about 60 RNA/DNA, homodimer and HETATOM structures, and these entries were removed. The remaining 397 protein complexes were further refined by removing entries with short peptides (chain length ≤ 50 residues) and entries with resolution poorer than 3.5 Å. This resulted in a non-redundant dataset of 192 heterodimer protein complexes (Table 1). The dataset was subsequently characterized for protein size distribution (Figure 2).
ISSN 0973-2063 (online) 0973-8894 (print). Bioinformation 6(4): 137-143 (2011). © 2011 Biomedical Informatics.
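The two structural filters in this procedure (entries with exactly two chains, then chain length and resolution cut-offs) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `(pdb_id, chain_id)` input format and the identifiers are hypothetical, and the CD-HIT redundancy step is not reproduced here.

```python
from collections import Counter


def two_chain_entries(chain_records):
    """Return the PDB IDs that contain exactly two sequence chains.

    chain_records: iterable of (pdb_id, chain_id) pairs (hypothetical format).
    """
    chains_per_entry = Counter(pdb_id for pdb_id, _ in set(chain_records))
    return sorted(pdb_id for pdb_id, n in chains_per_entry.items() if n == 2)


def passes_quality_filters(chain_lengths, resolution):
    """Keep a complex only if no chain is a short peptide (<= 50 residues)
    and the resolution is not poorer than 3.5 angstroms."""
    return min(chain_lengths) > 50 and resolution <= 3.5


# Made-up identifiers, for illustration only.
records = [("XXX1", "A"), ("XXX1", "B"),
           ("XXX2", "A"), ("XXX2", "B"), ("XXX2", "C"),
           ("XXX3", "A")]
print(two_chain_entries(records))
print(passes_quality_filters([120, 64], 2.8))
```

Only the entry with exactly two chains survives the first filter; the second filter is applied per complex after redundancy removal.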

Source organism based grouping:
Each heterodimer complex is made up of two protein monomer subunits. The two subunits come either from different organisms (DO) or from the same organism (SO) (Table 1). The formation of a protein complex with interacting partners from DO is possible only in heterodimers, often for a non-essential (non-obligatory, e.g. inhibitory) role. Thus, the dataset is divided based on the source organism of the interacting partners. The dataset also contains five complexes with at least one synthetic partner (SP).

Figure 2:
Characterization of the dataset based on protein size.

Functional grouping of complexes:
We extracted "descriptive" functional data (usually semantic) for each complex from the PDB header annotation records. These data were manually curated ("by domain expert decision") through visual inspection using available literature information. Thus, complexes were broadly grouped by function into catalysis (enzymes), regulatory (cellular), assembly (structural), immunity and inhibitory (Table 1). It should be noted that this exercise is not comprehensive. However, we made a reasonable effort on a case-by-case basis to classify complexes into their respective functional groups. Manual inspection of PDB description records suggests that DO complexes are often inhibitory (e.g. PDB code: 1K9O) or immune related (e.g. PDB code: 1GH6) (Table 3, see Supplementary material). In contrast, SO complexes are associated with catalysis, regulation, assembly and immunity. The SP group consists of complexes with a synthetic partner for in vitro inhibitory or regulatory studies. A complex may align with two different functional groups; such complexes are grouped based on an "expert decision" using known information.

Interface residues:
Interface (I) residues in heterodimers are identified using the change in accessible surface area (ΔASA) from a "monomer state" to a "dimer state". Residues with ΔASA > 0 Å² are considered to be at the interface. Thus, interface residues contributed by subunits A and B were identified.
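The ΔASA criterion can be illustrated with a short sketch. It assumes that per-residue ASA values for the monomer and dimer states have already been computed by some external tool; the dictionary layout, the residue keys and the function name are hypothetical.

```python
def interface_residues(asa_monomer, asa_dimer, cutoff=0.0):
    """Return residues whose ASA decreases on going from monomer to dimer.

    asa_monomer, asa_dimer: dicts mapping a residue key, here
    (chain, residue_number, residue_name), to ASA in square angstroms.
    cutoff: a residue is at the interface when delta-ASA > cutoff.
    """
    delta = {res: asa_monomer[res] - asa_dimer[res] for res in asa_monomer}
    return sorted(res for res, d in delta.items() if d > cutoff)


# Illustrative values only.
monomer = {("A", 10, "LEU"): 45.0, ("A", 11, "SER"): 30.0, ("A", 12, "VAL"): 0.0}
dimer = {("A", 10, "LEU"): 5.0, ("A", 11, "SER"): 30.0, ("A", 12, "VAL"): 0.0}
print(interface_residues(monomer, dimer))
```

With computed (floating point) ASA values, a small positive cutoff may be preferable to an exact zero; that choice is an assumption and not part of the stated criterion.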

Interface size and Interface area:
The distribution of complexes with interface size (number of interface residues) is given in Figure 3. The relationship between interface size and interface area is given in Figure 4.

Interface property abundance:
The interface between two interacting subunits is made of both polar and hydrophobic residues. The number of polar and hydrophobic residues at the interface varies from complex to complex. Some interfaces are rich in polar residues, while others are rich in hydrophobic residues. Therefore, we calculated the percentage of polar and hydrophobic residues at the interface for each complex. The difference between the percentages of polar (P) and hydrophobic (H) residues at the interface is then measured (Figure 5). Thus, interface residues have "polar abundance" when %P − %H > 0 and "hydrophobic abundance" when %P − %H < 0. This helps to classify interfaces as having "abundant polar" or "abundant hydrophobic" residues.
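As a minimal sketch of the abundance measure, the following computes %P − %H for a string of one-letter interface residue codes, using the polar and hydrophobic residue groups defined later in this section. The function names and the "balanced" label for a difference of exactly zero are assumptions; the text does not say how that case is handled.

```python
POLAR = frozenset("RNDQHKSTYE")
HYDROPHOBIC = frozenset("ACGILMFPVW")


def abundance(interface_codes):
    """Return %P - %H for a sequence of one-letter interface residue codes."""
    n = len(interface_codes)
    if n == 0:
        raise ValueError("no interface residues given")
    pct_polar = 100.0 * sum(aa in POLAR for aa in interface_codes) / n
    pct_hydrophobic = 100.0 * sum(aa in HYDROPHOBIC for aa in interface_codes) / n
    return pct_polar - pct_hydrophobic


def abundance_label(interface_codes):
    """Label an interface by the sign of %P - %H."""
    diff = abundance(interface_codes)
    if diff > 0:
        return "polar"
    if diff < 0:
        return "hydrophobic"
    return "balanced"  # assumption: the zero case is not defined in the text


print(abundance("RKDLV"), abundance_label("RKDLV"))
```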

Surface residues:
Surface (S) residues in heterodimers are identified using residue ASA values in a "dimer state". Residues with ASA > 0 Å² are considered surface residues. Thus, surface residues in subunits A and B of the complex were identified.

Core residues:
Core (C) residues in heterodimers are identified using residue ASA values in a "monomer state". Residues with ASA = 0 Å² are considered core residues. Thus, core residues in subunits A and B were identified.

Interface, surface and core polarity:
A protein heterodimer complex consists of three distinct regions (core (C), interface (I) and surface (S)), as shown in Figure 6. The interface is the interacting region between the two protein partners. The core is the buried region in the individual monomers. The surface is the solvent-exposed region in the complex state. The interface, surface and core residues thus identified are further classified as polar {R, N, D, Q, H, K, S, T, Y, E} or hydrophobic {A, C, G, I, L, M, F, P, V, W} based on residue type. We then estimated the percentage of polar residues at the interface (I), surface (S) and core (C) for each complex.
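The three region definitions and the polarity estimate can be sketched together as below. The input format (one-letter code, monomer ASA, dimer ASA per residue) and the function names are hypothetical. Note that, applying the stated thresholds literally, a partly buried interface residue can also satisfy the surface criterion; how such overlap was handled is not stated in the text.

```python
POLAR = frozenset("RNDQHKSTYE")


def regions(residues):
    """Assign residues to core (C), interface (I) and surface (S).

    residues: list of (one_letter_code, asa_monomer, asa_dimer) tuples,
    with ASA in square angstroms.
    """
    return {
        "C": [aa for aa, mono, dim in residues if mono == 0],
        "I": [aa for aa, mono, dim in residues if mono - dim > 0],
        "S": [aa for aa, mono, dim in residues if dim > 0],
    }


def percent_polar(codes):
    """Percentage of polar residues, or None for an empty region."""
    if not codes:
        return None
    return 100.0 * sum(aa in POLAR for aa in codes) / len(codes)


# Illustrative values only.
example = [("L", 0.0, 0.0), ("S", 0.0, 0.0), ("K", 80.0, 10.0),
           ("V", 40.0, 0.0), ("D", 60.0, 60.0), ("A", 20.0, 20.0)]
by_region = regions(example)
polarity = {name: percent_polar(codes) for name, codes in by_region.items()}
print(by_region)
print(polarity)
```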

Classification of complexes:
Complexes were grouped into four distinct classes based on the relative difference in percentage polar residues (referred to hereafter as polarity) between interface and core (Figure 7; Table 2).
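A minimal sketch of the class assignment from the three polarity values follows. Classes A, B and C use the definitions given in the Results (A: interface polarity greater than core but less than surface; B: greater than both; C: less than both). The definition of class D as the remaining combination (less than core but greater than surface) is an assumption, as is the "unclassified" label for ties.

```python
def classify(interface, core, surface):
    """Assign a complex to class A-D from percentage polar residues
    at the interface, core and surface."""
    if interface > core and interface < surface:
        return "A"
    if interface > core and interface > surface:
        return "B"
    if interface < core and interface < surface:
        return "C"
    if interface < core and interface > surface:
        return "D"  # assumed definition: the remaining combination
    return "unclassified"  # ties are not addressed in the text


# Illustrative polarity values only.
print(classify(45.0, 30.0, 60.0))
print(classify(70.0, 30.0, 60.0))
```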

Results:
The principles of PPI were studied using a dataset of 192 heterodimer complexes (Table 1) created using the procedure described in Figure 1. The dataset is divided based on the source organism of the interacting partners. Thus, SO, DO and SP groups of complexes were identified (Table 1). The distribution of complexes based on interacting protein size is given in Figure 2. This describes the size of the interacting protein partners forming the complex. These partners interact through interface residues. The distribution of interface size among heterodimer complexes is given in Figure 3. Interface area correlates with interface size (Figure 4). The chemical nature of interface residues in complexes is given in Figure 5. This shows that interface residues in complexes are either "abundantly polar" or "abundantly hydrophobic". However, the majority of interfaces (121/192, 63%) have abundantly polar residues. The classification of complexes into classes A-D using the relative polarity of interface, core and surface is shown in Table 2 and Figure 8. This grouping shows that the majority (191/192, 99%) of interfaces have polarity greater than core [I>C], as shown in Figure 7. However, the interface in one complex (1/192, <1%) has polarity less than core [I<C]. We further found that 64% (122/192) of complexes are grouped under "class A", having interface polarity greater than core but less than surface. It was also noted that 36% (69/192) of complexes are "class B", with interface polarity greater than both core and surface. Complexes having interface polarity less than core and surface (class C) are rare (1/192, <1%) in the dataset. It should be stated that "class D" complexes are absent from the dataset. Grouping of complexes based on the source organism of interacting partners shows that DO complexes are mostly inhibitory and SO complexes are usually associated with catalysis, regulation and assembly (Table 1; Table 3).
Thus, the DO and SO groups of complexes show functional preference (p = 0.019). However, this is not true for the classes (A-D), as shown in Table 4 (p = 0.12). Table 2 shows that complexes grouped in classes A, B, C and D do not show a significant difference in function preference.
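The class counts reported above can be cross-checked with a few lines of arithmetic. The counts are taken from the text; the variable names are illustrative.

```python
# Class counts as reported in the Results.
class_counts = {"A": 122, "B": 69, "C": 1, "D": 0}

total = sum(class_counts.values())
percentages = {name: round(100.0 * n / total, 1) for name, n in class_counts.items()}

# Interfaces more polar than the core are classes A and B together.
more_polar_than_core = class_counts["A"] + class_counts["B"]

print(total, percentages, more_polar_than_core)
```

The total matches the 192 complexes in the dataset, and classes A and B together account for the 191 interfaces with polarity greater than core.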

Discussion:
Protein-protein interactions are vital for cellular function. Two different proteins associate with one another for functions (catalysis, regulation and assembly) that are often obligatory (essential for cellular activity). However, this is not always true. They also interact in inhibitory and immune-related roles, where their association is frequently non-obligatory (not essential for cellular activity). The dataset shows that obligatory roles are usually observed among SO complexes and non-obligatory functions are common among DO complexes. Thus, the functional roles of complexes grouped by organism source are significantly distinct (p = 0.019). However, the molecular principles for such associations are not clearly known. The molecular forces behind protein interactions are inferred through the analysis of known structural complexes. Hence, we describe the analysis of a dataset of 192 heterodimer complexes using the polarity of the interface, surface and core to classify them into classes A-D.
Analysis of protein structural complexes showed that interfaces are either "dominantly polar" [6] or "dominantly hydrophobic" [1, 2, 6]. It is also known that interfaces have more hydrophobic residues than the surface but fewer than the core [2]. Hydrophobic interfaces are similar to the surface, with few charged groups [6].
Our analysis shows that class A complexes have interface polarity greater than core but less than surface, as reported elsewhere [2]. Thus, this study confirms that observation using an extended dataset. Interfaces are part of the surfaces of the monomers; in class A the interface has more hydrophobic residues than the rest of the surface, and the partners interact through relatively hydrophobic forces. It should be noted that we identified an unusual complex (PDB code: 2F95) under class C, describing a rhodopsin II/transducer interaction. In this complex the core contains more polar residues than the interface. Thus, protein binding is hydrophobic, although folding of the individual monomers is driven by polar residues, as in several non-globular proteins. We also identified class B complexes with interface polarity greater than both core and surface. In this class, the interface has more polar residues than the rest of the surface and the partners interact through polar interactions. Thus, relative polarity is the driving force in class B complexes. This class of interfaces has not been described in the literature and is novel. The driving force for protein binding is hydrophobic in class A and polar in class B complexes. These observations using interface residue properties are important to the understanding of protein binding in heterodimer complexes. This study should be extended using a combined formulation of residue types and atomic features in future investigations. It should also be noted that interfaces between partners are part of the surfaces of the interacting monomers. These interfaces are clearly defined in known structural complexes. However, an interacting monomer often has several binding sites under in vivo conditions, and these have not yet been characterized. Therefore, experiments should be formulated to capture these combined features in future studies.

Conclusion:
Proteins associate with one another as a result of the combined effect of polar and hydrophobic residues at the interface. The unresolved challenge is to quantify this combined effect. Inter-subunit scoring functions for polar and hydrophobic effects are available, but they are based on a limited set of structural complexes and are inadequate to describe new classes of interfaces. It is known that interface residues are either "abundantly polar" or "abundantly hydrophobic". It is also known that interfaces are less hydrophobic than core but more hydrophobic than surface in one class of complexes. We document a new class of complexes whose interfaces have more polar residues than both core and surface. Thus, the driving force for protein-protein interaction is selectively either hydrophobic or polar for different classes of interfaces.

PDB codes: 1A6D, 1PDK, 2OVP, 1A50, 1LDJ, 2F9Y, 1BKD, 1B8M, 1BI7, 1J7D, 1B34, 1R1K, 2P7V, 1B7Y, 1NVI, 2GAF, 1H3P, 1CAU, 1OC0, 1BP3, 1RKE, 2PJW, 1BMQ, 1PG5, 2GGV, 1HDM, 1P5V, 1XT9, 1C3A, 1RY7