Application of centrality measures in the identification of critical genes in diabetes mellitus

The connectivity of a protein and its structure is related to its functional properties. Many experimental approaches have been employed for the identification of Diabetes Mellitus (DM) associated candidate genes. Therefore, it is of interest to use var ious graph centrality measures integrated with the genes associated with the human Diabetes Mellitus network for the identification of potential targets. We used 2728 genes known to cause Diabetes Mellitus from Jensenlab (Novo Nordisk Foundation Center for Protein Research, Denmark) for this analysis. A protein-protein interaction network was further constructed using a tool Centralities in Biological Networks (CentiBiN) with 1020 nodes after eliminating the duplicates, parallel edges, self -loop edges and unknown Human Protein Reference Database (HPRD) IDS. We used fourteen centralities measures which are useful in identifying the structural characteristic of individuals in the network. The results of the centrality measures are highly correlated. Thus, we identified genes that are critically associated with DM. We further report the top ten genes of all fourteen centrality measures for further consideration as targets for DM.

Diabetes Mellitus is not a single disease but a group of disorders with glucose intolerance in common. Many online databases are available to research genes across species. Different databases available that allows access to information about phenotypes, pathways, and variations of many genes across species. Before the candidate-gene approach was fully developed, various other methods were used to identify genes linked to disease-states. However, these methods are not as beneficial when studying complex diseases for several reasons. In this scenario, candidate gene approaches were found in identifying the risk variants associated with various diseases of interest such as dementia, cancer, diabetes, asthma, and hypertension. The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within prespecified genes of interest and phenotypes or disease states.

[3, 4]
With the tremendous escalation of human protein interaction data, the entanglement of the techniques can be conquered through protein-protein interaction networks (PPINs). The function and activity of a protein are often modulated by other proteins with which it interacts [5,6]. Data might be represented as networks, in which the vertices (e.g. transcripts, proteins or metabolites) are linked by edges (correlations, interactions or reactions, respectively). Structural analysis of networks can lead to new insights into biological systems and is a helpful method for proposing new hypotheses [7-10].

Methodology:
Proteins are the representatives of the biological networks and they are realized only if the relationship between essentiality and topological properties such as the degree distribution, clustering coefficients, centrality measures, and community structures of the network are studied [9]. Network centralities are used to rank elements of a network according to a given importance concept [11].
However, the use of centralities as a structural analysis method for biological networks is controversial and several centrality measures should be considered within an exploratory process [16]. To support such analysis and due to the complexity of both biological networks and centrality calculations, a tool is needed to facilitate these investigations. Here we present CentiBin, an application for the calculation and visualization of centralities for biological networks.
The human protein interaction data was obtained from Human Protein Reference Database (HPRD). The main purpose of using HPRD dataset is it focuses on likely true Protein-Protein Interaction (PPI) set by generating sub networks around proteins of interest. HPRD represents a centralized platform to visually depict and integrate information pertaining to domain architecture, posttranslational modifications, interaction networks and disease association for each protein in the human proteome [17]. We have followed the procedure mentioned in Figure 1 for identifying the critical genes for diabetes mellitus.

Data Set:
We have extracted the human gene involving in Diabetes mellitus from the database developed by Jensen Group (Jensenlab) of Novo Nordisk Foundation Center for Protein Research, Denmark. Jensenlab is maintaining a DISEASES database. DISEASES database is a frequently updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. We have mined the Jensenlab DISEASES database for the genes causing Diabetes mellitus. We got that 2728 genes causing diabetes mellitus, after eliminating duplicate entries it reduced to 2017 genes.

Result & Discussion:
With the help of bioDBnet (http://biodbnet.abcc.ncifcrf.gov) we find the equivalent HPRD ids for the 2017 genes. Out of 2017 proteins we got HPRD IDs for 1834 proteins. We are unable to get the equivalent HPRD IDs for 183 proteins, so we find the HPRD IDs through their aliases. Still we couldn't find the HPRD ids for 39 proteins because these are the new entries in the database. After eliminating duplicates we got 1876 unique genes.

PPI Network
To construct the Protein-Protein Interaction network, we have downloaded and deployed the interactions database from HPRD website (http://hprd.org) in our local database. We have retrieved the PPIs for 1876 unique proteins where both source and sink proteins are in 1876 unique proteins. With that we have constructed a network using CentiBin with 1151 vertices and 3389 edges. Finally we got 1020 vertices with 2891 edges after eliminating the self edges, parallel edges from the network.
Using CentiBin we have calculated fourteen different graph centrality measures such as degree, eccentricity, closeness, radiality, centroid values, Stress, shortest-path betweenness, current-flow closeness, current-flow betweenness, Katz status index, Eigen vector, hits-authority, hits-hubs and Page Rank for the PPI network constructed. The top ten genes of each centrality measure are presented in Table 1 (see  supplementary material).

Correlation analysis on centrality properties
The pair wise correlation coefficients of the fourteen centrality measures depicted for the Diabetes Mellitus elucidate that they all are positively correlated and their correlation value lies above 0.52 as represented in Table 2 (see supplementary material), Figure 2. Here, the difference represents the difference in the ranks of each observation on the two variables which here represents the centrality scores.

Conclusion:
Many experimental approaches have been used to identify candidate genes in DM. We used various graph centrality measures integrated with the genes to identify potential drug targets. We calculated fourteen centralities measures for the constructed network with positive correlation having values greater than 0.52. This helped to identify genes that are highly critical in DM. We thus report the top 10 genes of all fourteen centralities for consideration as potential targets for DM.