Insights from the protein-protein interaction network analysis of Mycobacterium tuberculosis toxin-antitoxin systems

Protein-protein interaction (PPI) network analysis is a powerful strategy to understand M. tuberculosis (Mtb) system level physiology in the identification of hub proteins. In the present study, the PPI network of 79 Mtb toxin-antitoxin (TA) systems comprising of 167 nodes and 234 edges was investigated. The topological properties of PPI network were examined by 'Network analyzer' a cytoscape plugin app and STRING database. The key enriched biological processes and the molecular functions of Mtb TA systems were analyzed by STRING. Manual curation of the PPI data identified four proteins (i.e. Rv2762c, VapB14, VapB42 and VapC42) to possess the highest number of interacting partners. The top 15% hub proteins were identified in the PPI network by employing two statistical measures, i.e. betweenness and radiality by employing cytohubba. Insights gained from the molecular protein models of VapC9 and VapC10 are also documented.

Interaction sources selected for generation of PPI network were text mining, experiments, databases, co-expression, neighborhood, gene fusion, and cooccurrence. PPIs that possessed at least a medium confidence score of 0.400 were considered for network generation. Network construction and visualization was done by cytoscape v 3.5.1 [10] and STRING v10.5.

Figure 1:
A flowchart representing the methodology applied in the study; arrows represent flow of information and transition from one step to another

Topological and functional enrichment analysis of PPI network:
The topological parameters of PPI network, i.e. number of nodes, number of edges, average node degree, etc were evaluated by STRING v10.5. The protein-protein association data obtained from STRING was further utilized to compute several other topological parameters such as average clustering coefficients, topological coefficients, and shortest path lengths etc. via Network analyzer, a cytoscape plugin app, by treating the network as directed graph. In addition, functional enrichment of input seed sequences of Mtb TA systems was carried out by STRING to identify significantly enriched GO (Gene Ontology) biological processes and molecular functions.

Identification of Hub proteins:
Cyto-Hubba [11], a java plugin for Cytoscape software, was employed to determine the hub proteins of PPI network of Mtb TA systems. In this study, two centrality measurements, i.e. betweenness and radiality were applied to mine the top 15% hub proteins of the network. In addition, we further mined hub proteins, which were commonly identified by both the algorithms.

Model evaluation:
Energy minimized 3D models of VapC9 and VapC10 were subjected to various model validation servers to evaluate the stereo chemical properties. To determine parameters such as Zscore, QMean Score, D-fire Energy and residue by residue geometry, energy minimized theoretical models were subjected to SWISS-MODEL server [17]

Results and Discussion: PPI network analysis of Mtb TA systems:
Protein-protein association data of 79 TA systems of Mtb H37Rv was extracted from STRING v 10.5. To completely explore the PPI data, the search was set to include all the source parameters. The mined PPI data was comprised of total 468 PPI's as depicted in Figure 2. Manual curation of the PPI data revealed that 63 proteins out of 468 PPIs possessed ≥4 interacting partners in the network (Suppl Table 2), whereas 64 proteins were associated with only one interacting partner. Two groups of proteins, comprising of 20 and 15 proteins were highly connected with each member having five and six interacting partners, respectively. Notably, two members of VapBC family, i.e. antitoxin VapB45 and antitoxin VapB14 were found to possess 8 interacting partners each. Interestingly, both the toxin and antitoxin of VapBC42 were highly connected with each possessing 9 interacting partners. Strikingly, antidote HigA1 possessed the highest number (10) of interacting partners. For 17 proteins, no PPI information could be extracted from STRING. PPI information extracted from STRING v10.5 ranged from medium to highest confidence scores. In fact, 84 (~18%) of the protein-protein associations fell within highest confidence interval (CI, S > 0.9), 176 (37.6%) within high CI (0.7 ≤ S < 0.9), and 208 (44.44%) within medium CI (0.4 ≤ S < 0.7).
In addition, topological and functional enrichment analysis of input Mtb TA systems was carried out by STRING and 'network analyzer' a cytoscape plugin. Network statistics obtained by STRING database revealed that the extracted interactome was comprised of 167 nodes and 234 edges. The average node degree and average local clustering coefficient of the network was determined to be 2.8 and 0.628, respectively (Table 1). On the other hand, 'network analyzer' a cytoscape plugin estimated several other topological parameters such as network diameter, network radius, shortest path, characteristic path length and average number of neighbors (Table 1). GO biological process and molecular function enrichment analysis of input seed sequences was carried out by STRING v10.5. The majority of Mtb TA system proteins were significantly enriched in biological processes associated with regulation of growth (1.46E-77), nucleic acid phosphodiester bond hydrolysis (1.08E-61), RNA phosphodiester bond hydrolysis (3.73E-60) and negative regulation of growth (2.65e-45, Table 2). Furthermore, molecular functions of such proteins were primarily related with nuclease activity (6.27e-62), ribonuclease activity (7.72e-61), and metal ion binding (7.36e-12, Table 3). Similar to gene enrichment analysis of this study, activation of TA systems leading to growth arrest by the toxin partners of VapBC, RelBE, MazEF, and HigBA families has also been reported [3]. Hub proteins of PPI network represent highly connected nodes with special biological properties and are more evolutionary conserved than non-hubs. In fact, removal of such hubs can lead to network disruption and thus are considered as attractive drug targets [24,25]. Identification of hub proteins can be carried out by in silico tools such as Hubba, cytohubba, and CHAT etc. [26,27]. In the present study, cytohubba was used to explore the hub proteins of Mtb TA systems PPI network. The top 15% hub proteins were identified on the basis of radiality and betweenness algorithms ( Table 3, 4). HigA1 and VapB45 antitoxins were determined to be the top scorer hub proteins by betweenness and radiality method, respectively. Antitoxin HigA1 (Rv1956) belonging to HigBA family has been reported among the 10 top most upregulated Mtb TA systems of drug tolerant persisters [3]. On the other hand, VapB45 (Rv2018) is the antitoxin partner of VapC45, but no experimental data is available to elucidate their role in Mtb pathogenesis. In addition, we mined the hub proteins that were commonly identified by both the algorithms (Table 4). Notably, majority of the hub proteins identified belonged to the VapBC family apart from the members of HigBA and RelBE families. VapBC is the largest family of Mtb TA systems characterized by the presence of PIN domain and functions mostly as ribonucleases [3,28]. Interestingly, Rv2762, which was not part of input seed proteins, was also detected as a top ranker hub protein.
Antitoxins are reported to be small proteins with less order in their structure. Therefore, it is difficult to find druggable pockets on their surface that can accommodate small-molecule inhibitors, whereas toxins are more stable and ordered in their structure, and are also considered as attractive targets for drug-design [28]. Therefore, we focused to elucidate the structural insight of toxins identified by in silico analysis. It was found that 5 out of 10 top ranking hub proteins were antitoxins, i.e. higA1, VapB45, VapB14, VapB11 and RelB, whereas three proteins were toxins, i.e. VapC1, VapC9 and VapC10. Since the structure of VapBC1 is available at PDB, we focused to determine the structural insights of VapC9 and VapC10 proteins. Domain identification tools used in the study, i.e NCBI CDD, Pfam and InterProscan revealed the presence of PIN domain in both VapC9 and VapC10. In fact, PIN domains are small protein domains of ~130 amino acids, which are characterized by the presence of three invariant amino acid residues and fourth lesser-conserved acidic residue [3,29]. Physicochemical properties were computed by protparam tool of ExPASy for VapC9 and VapC10 (Table 5). Protein BLAST was carried out against PDB database to identify homologs with resolved 3D structure for structure prediction of VapC9 and VapC10. The crystal structure of PAE0151 from Pyrobaculum aerophilum (2FE1) was selected as the best suitable template on the basis of maximum query coverage (100%) and maximum identity (79%) for input VapC9 protein sequence (Figure 3). In a similar manner, crystal Structure of Fitacb from Neisseria gonorrhoeae (2H1C) was chosen as the best template structure for the development of homology model of VapC10 based on maximum sequence coverage (81%) and maximum sequence identity (32%) (Figure 4). Five theoretical models generated for each protein by Modeler v9.17 were assessed on the basis of DOPE score and GA341 score. The model with highest GA341 score and lowest DOPE score was selected and was further subjected to energy minimization by Chimera v1.11.2.
Theoretical models of VapC9 and VapC10 were subjected to various structural validation servers to assess the correctness of the models. Ramachandran plot obtained for VapC9 by PROCHECK of swiss model server revealed that 93 % of amino acid residues were present in most favored regions, 6.1% in additionally allowed regions and 0. 9 % in generously allowed regions (Table 6). On the other hand, Ramachandran plot for VapC10 showed that 84.5 % residues were present in the most favored region, 12.9 % in additionally allowed region and 2.6 % in generously allowed region (Table 6). Notably, for both the generated models, none of the amino acid residues was observed in the disallowed region. Overall G-factor score computed for VapC9 and VapC10 by swiss model server fell in the acceptable cut-off range. In addition, Z-score, Q-mean score and D-fire score were also estimated by swiss model server, which further confirmed the reliability of the models generated ( Table 6). The overall quality factor score obtained after ERRAT analysis was 97.3 % for VapC9 and 93.5 % for VapC10. Prosa web Z score for VapC9 and VapC10 was -4.07 and -3.64 respectively, thereby suggesting that the structures are of good quality (Table 6). Predicted resolution by resprox along with LG and Maxsub score determined by ProQ also indicated the reliability of 3D models. In addition, MOLPROBITY revealed that none of the residue possessed bad bonds or Cβ deviations > 0.25A ( Table 6). The 3D models generated in the study were of reliable quality as assessed by various structural assessment reports such as PROCHECK of swiss model server, ERRAT, ProSA-web, ProQ, MOLPROBITY and ResProx. In addition, active site of VapC9 and VapC10 was identified by Metapocket v2.0. The generated homology models and active site determined by metapocket v2.0 of VapC9 and VapC10, respectively are shown ( Figure 5-6 A, B).
The developed models can be used for structure based drug designing.

Conclusion:
We reported the construction and extensive analysis of PPI network of 79 Mtb TA systems. Our computational analysis revealed significantly enriched gene ontology terms for pathways and molecular functions of Mtb TA systems and topological properties of PPI network. The major contribution is the identification of hub proteins of PPI network that can be explored as promising drug targets and for vaccine development. In addition, homology models of hub proteins VapC9 and VapC10 provide insights to its molecular functions.