SpecP: A tool for spectral partitioning of protein contact graph

SpecP is an open-source Python module that performs spectral partitioning on Protein Contact Graphs. A Protein Contact Graph is a graph-theoretic representation of a protein structure in which each amino acid forms a ‘vertex’ and spatial contact between any two amino acids is an ‘edge’ between them. SpecP carries out spectral partitioning based on the second smallest spectral value (eigenvalue) of the Protein Contact Graph: the vertices are divided into two clusters according to the sign of the corresponding entry in the eigenvector associated with that eigenvalue. The spectral partitioning algorithm is applied repeatedly until the desired number of partitions is obtained. SpecP visualizes the spectrally partitioned clusters of the protein structure along with the Protein Contact Map and Protein Contact Graph, all of which can be saved for later use. It also offers an interactive mode in which the user can zoom, pan, resize and manually save these images in various formats (.eps, .jpg, .png). SpecP is a stand-alone, extensible tool for the structural analysis of proteins.


Background:
Spectral partitioning is a graph partitioning algorithm that divides data represented as a graph G = (V,E), with vertex set V and edge set E, into smaller components with specific properties [1]. It has gained momentum in recent times owing to its simplicity and performance, and it has been applied successfully in protein science [2]. Proteins are linear, ordered chains of amino acids that fold by virtue of chemical forces to form a 3D structure. Coarse-grained, graph-theoretic models of proteins are used to gain insight into protein structure. A protein is depicted as a graph with the amino acids as nodes, and edge connectivity, or 'contact', is derived from the positions of the Cα atoms that form the backbone of the protein structure. This graph-theory-based network forms the Protein Contact Graph [3]. The Protein Contact Graph of the first 10 nodes of the protein with PDB id 4q21 is shown in (Figure 1). Protein Contact Maps are a reduced representation of the Protein Contact Graph [4], providing a quick way of visually inspecting structural features. A contact map, or adjacency matrix, is a square matrix M where Mi,j = 1 if the distance between the Cα atoms of residues i and j is below the cut-off threshold atomic distance, and Mi,j = 0 otherwise. The cut-off threshold atomic distance is > 8 Å for long-range contact networks, between 4 Å and 8 Å (inclusive) for medium-range contact networks and < 4 Å for short-range contact networks.
The spectral partitioning algorithm is applied to the Protein Contact Map obtained from the Protein Contact Graph. The algorithm uses the eigenvector (Fiedler vector) associated with the second smallest eigenvalue, an eigenvalue that yields a lower bound on the optimal cost of a ratio-cut partition, and bisects the graph into two disjoint sets based on the sign of the corresponding vector entry [5]. The algorithm is repeated until the desired number of partitions is obtained. SpecP can generate spectral partitions, the Protein Contact Graph and the Protein Contact Map, all of which can be visualized and then saved for later use.

Methodology:
Computing the adjacency matrix is the first step in Spectral partitioning. The adjacency matrix for the first 10 amino acids of the protein with PDB id 4q21 is given in (Figure 2).
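The adjacency matrix step can be sketched in a few lines of NumPy. This is a minimal illustration and not SpecP's own code: the function name `contact_map`, the default cut-off of 8.0 Å, the choice to zero the diagonal (no self-contacts) and the toy coordinates are all assumptions made for the example.

```python
import numpy as np

def contact_map(ca_coords, cutoff=8.0):
    """Binary contact map from an (n, 3) array of C-alpha coordinates.

    M[i, j] = 1 if the distance between residues i and j is below `cutoff`
    (in Angstrom), else 0. The default cutoff is an assumed value.
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise coordinate differences via broadcasting: shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    m = (dist < cutoff).astype(int)
    np.fill_diagonal(m, 0)  # assumption: a residue is not in contact with itself
    return m

# Hypothetical toy coordinates on a line (not taken from 4q21)
toy = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [20.0, 0.0, 0.0]]
M = contact_map(toy)
print(M)
```

In this toy case the first three residues lie within 8 Å of one another and the fourth is isolated, so only the upper-left 3 x 3 block of M contains contacts.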
To compute the Laplacian matrix from the adjacency matrix, the degree matrix must first be obtained. The degree matrix is a diagonal matrix that holds the degree of each vertex of the graph; all of its off-diagonal elements are 0. The degree of a vertex is computed by summing the row of the adjacency matrix that corresponds to that vertex, and the sum is placed in the diagonal element of that row. The degree matrix of the above adjacency matrix is given in (Figure 2).
The Laplacian matrix, on which spectral partitioning operates, is computed as L = D - A, where D is the diagonal degree matrix and A is the adjacency matrix. The Fiedler vector, the eigenvector associated with the second smallest eigenvalue of L, bisects the graph into two partitions based on the sign of the corresponding vector entry. By repeatedly applying the spectral partitioning algorithm, the desired number of partitions can be obtained.
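As a small worked example of the row-sum rule, the sketch below builds a degree matrix with NumPy. The helper name `degree_matrix` and the 4-vertex adjacency matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def degree_matrix(adjacency):
    """Diagonal matrix whose i-th diagonal entry is the row sum of row i."""
    a = np.asarray(adjacency)
    return np.diag(a.sum(axis=1))

# Hypothetical 4-vertex graph: edges 0-1, 0-2, 1-2, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
D = degree_matrix(A)
print(D)
```

Vertex 2 touches three edges, so its diagonal entry is 3, while vertex 3 has a single neighbour and a diagonal entry of 1.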
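The bisection and its repeated application might look as follows in NumPy. This is a sketch under stated assumptions and not SpecP's implementation: the function names are hypothetical, vertices with a zero Fiedler entry are placed on the non-negative side, and the largest cluster is split at each round (in SpecP the user selects which partition to split).

```python
import numpy as np

def fiedler_bisect(adjacency, nodes):
    """Split `nodes` in two by the sign of the Fiedler vector entries."""
    nodes = np.asarray(nodes)
    sub = adjacency[np.ix_(nodes, nodes)]      # adjacency of the induced subgraph
    laplacian = np.diag(sub.sum(axis=1)) - sub  # L = D - A
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 belongs to the second smallest eigenvalue.
    _, vecs = np.linalg.eigh(laplacian)
    fiedler = vecs[:, 1]
    return nodes[fiedler >= 0].tolist(), nodes[fiedler < 0].tolist()

def spectral_partition(adjacency, n_parts):
    """Repeat the bisection until `n_parts` clusters exist (or no split is possible)."""
    adjacency = np.asarray(adjacency, dtype=float)
    parts = [list(range(adjacency.shape[0]))]
    while len(parts) < n_parts:
        parts.sort(key=len)
        biggest = parts.pop()  # assumption: always split the largest cluster
        if len(biggest) < 2:
            parts.append(biggest)
            break
        left, right = fiedler_bisect(adjacency, biggest)
        if not left or not right:
            # Can happen for a disconnected subgraph, where the Fiedler vector
            # is not uniquely defined; stop rather than loop forever.
            parts.append(biggest)
            break
        parts.extend([left, right])
    return parts

# Hypothetical test graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
parts = spectral_partition(A, 2)
print(parts)
```

For this graph the sign pattern of the Fiedler vector separates the two triangles, cutting only the bridging edge. Which triangle ends up on the non-negative side depends on the arbitrary sign of the eigenvector returned by the solver.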

Software Input:
The user provides a PDB (Protein Data Bank) file, which can either be retrieved from the PDB site (http://www.rcsb.org) or loaded from the local disk. Clustering is performed on activating the 'Spectral Partitioning' button; the user is prompted to select the threshold atomic distance before partitioning. The 'Cluster Again' button spectrally partitions the selected partition, and this can be repeated until the desired number of partitions is obtained. The graphical user interface of the SpecP tool is shown in (Figure 3). (Figure 5) displays the SpecP tool producing 2, 3, 4 and 5 partitions of the protein (PDB id 4Q21).
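For readers who want to prepare the same input outside the GUI, the Cα coordinates can be read from the fixed-column ATOM records of a PDB file. The sketch below is an illustration only and does not reflect how SpecP parses its input: the function name `read_ca_coords` is hypothetical, the sample records are made-up toy lines, and keeping only blank or 'A' alternate locations is an assumed simplification.

```python
def read_ca_coords(pdb_lines, chain=None):
    """Collect (x, y, z) of C-alpha atoms from the lines of a PDB file.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    alternate location in 17, chain ID in 22, x/y/z in 31-38, 39-46, 47-54.
    """
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skips HETATM, HEADER, TER, etc.
        if line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21:22] != chain:
            continue
        if line[16:17] not in (" ", "A"):
            continue  # assumption: ignore alternate conformations other than 'A'
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

# Made-up toy records in PDB column layout (not real 4Q21 data)
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "ATOM      9  CA  THR A   2      26.913  26.639   3.531",
    "HETATM  100  O   HOH A 201      10.000  10.000  10.000",
]
coords = read_ca_coords(sample)
print(coords)
```

With a real file, the same function could be called on `open(path).readlines()`, and the resulting coordinate list is what a distance-threshold contact map would be built from.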

Caveat & Future development:
SpecP is a stand-alone package developed in Python 2.6 on the Windows platform. The GUI was designed using Tkinter. The NumPy and SciPy modules were used for scientific computing; the igraph and NetworkX modules were added to create and manipulate graphs and to represent and manipulate complex network structures, respectively. Matplotlib and Pylab were imported to plot data and generate output in different formats.
A future version will be a web-based application capable of performing spectral partitioning on the basis of surrounding hydrophobicity, and built to compute the clustering coefficient, cyclic coefficient, characteristic path length, associative coefficient and triangle density from the generated Protein Contact Network.