Phytochemical derivatives targeting fliJ flagellar protein from Escherichia coli

Approximately 50 per cent of nosocomial infections are caused by the use of indwelling medical devices. The surfaces of these devices are ideal sites for bacterial attachment and favour increased biofilm formation. Biofilms have been a constant concern because their complex extracellular matrix (ECM) contributes to multiple drug resistance. E. coli is known to associate with biofilms. It is therefore of interest to identify the proteins associated with biofilm formation in Escherichia coli through a literature survey, to investigate their protein-protein interactions and to identify the proteins indispensable for biofilm formation. These proteins were further analysed, and fliJ was identified as the target based on betweenness, centrality and radiality. Eighty-seven phytochemicals reported to be associated with the microbe were docked with the target using Molegro Virtual Docker (MVD) 5.0. Geranyl pyrophosphate, ferulic acid 4-O-β-D-glucuronide, 5-8'-dehydrodiferulic acid and geranyl diphosphate showed the highest predicted activity. A combinatorial library of 96 models was generated from these four phytochemicals binding with fliJ.


Background:
Biofilms have been a constant concern due to their compact yet complex extracellular matrix (ECM). Their eradication is made difficult by their complex signalling and the diversity of their structural composition [1]. These properties allow microorganisms in biofilms to survive and withstand hostile conditions such as starvation and desiccation, enabling them to cause a broad range of chronic infections. Biofilms are often found on the surfaces of medical devices. Around 50% of nosocomial infections are caused by the use of indwelling medical devices such as cardiac pacemakers, catheters, dentures, lenses, prosthetic valves and joint prostheses [2]. The surfaces of such devices are ideal sites of attachment for bacterial cells, and a rise in biofilm formation has been noticed in the presence of indwelling medical devices [3].
Microbial colonization begins within 24 hours after the insertion of catheters [4]. Central venous catheter-related bloodstream infections (CRBSIs) are one of the principal causes of nosocomial infections and are associated with substantial morbidity, mortality and cost. Causative organisms of CRBSIs include Escherichia coli, Klebsiella pneumoniae, Staphylococcus aureus, Pseudomonas aeruginosa and Acinetobacter baumannii, of which eight per cent were attributed to E. coli [5]. Biofilms harbour multiple microorganisms, which communicate through a complex signalling process known as quorum sensing.
It is of interest to identify the proteins associated with biofilm formation in Escherichia coli by literature survey, to investigate their protein-protein interactions and to identify the proteins indispensable for biofilm formation. These proteins will be further analysed to identify an appropriate target based on betweenness, centrality and radiality. Phytochemicals found to be associated with E. coli will be docked with the target protein, and a combinatorial library of the identified phytochemicals will be built to enable synthetic production of the ligand.

Methodology: Study of Protein-Protein Interactions
A total of 338 E. coli proteins involved in biofilm formation were identified through a literature survey. Interactions between the proteins were studied using the STRING 10.0 database. The STRING results were further analysed using Cytoscape and its plug-ins M-CODE and CENTISCAPE.
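As a minimal sketch of the network-construction step, the Python function below turns a STRING-style tab-separated export into an undirected adjacency map of the kind Cytoscape builds on import. It assumes the export has columns `node1`, `node2` and `combined_score` (the header may start with `#`) with scores on a 0 to 1 scale; the function name, the 0.4 cut-off and the protein names `P1` to `P4` are hypothetical illustration values, not settings from this study.

```python
import csv
import io
from collections import defaultdict


def load_string_network(tsv_text, min_score=0.4):
    """Build an undirected adjacency map from a STRING-style TSV export.

    Assumed format (hypothetical for this sketch): a header row containing
    'node1', 'node2' and 'combined_score', optionally prefixed with '#',
    and combined scores on a 0-1 scale.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Strip a leading '#' so '#node1' can be looked up as 'node1'.
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    graph = defaultdict(set)
    for row in reader:
        if float(row["combined_score"]) < min_score:
            continue  # drop low-confidence interactions
        a, b = row["node1"], row["node2"]
        if a == b:
            continue  # ignore self-interactions
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical three-edge export; the P3-P4 edge falls below the cut-off.
tsv = "#node1\tnode2\tcombined_score\nP1\tP2\t0.90\nP2\tP3\t0.75\nP3\tP4\t0.20\n"
net = load_string_network(tsv)
```

Lowering `min_score` keeps more, less reliable interactions; the cut-off is a design choice the analysis would need to report.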

Identification of Drug Targets
In graph theory, a clique is a subset of vertices of an undirected graph in which every two distinct vertices are adjacent. The M-CODE plug-in was used to extract dense cliques, that is, densely connected subnetworks. Eleven dense cliques were obtained, of which five had an M-CODE score above the threshold of 5. The M-CODE analysis helped to separate the protein networks by function. CENTISCAPE analysis was then used to identify the subnetwork with the maximum interaction of proteins using betweenness, centrality and radiality. The maximum betweenness centrality was observed in the flagellar protein subnetwork, among three proteins: fliJ, fliP and flgN.
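The clique definition and the cluster scoring can be illustrated with a small Python sketch. It assumes the network is a dict mapping each protein to the set of its neighbours and scores a candidate cluster as subgraph density times number of nodes, the product MCODE reports as its cluster score; MCODE's own vertex weighting and cluster-growing steps are not reproduced, Cytoscape's implementation may use a density definition that counts self-loops, and the function names and example graphs are hypothetical.

```python
from itertools import combinations


def induced_edge_count(graph, nodes):
    """Count edges of `graph` whose two endpoints are both in `nodes`."""
    nodes = sorted(set(nodes))
    return sum(1 for a, b in combinations(nodes, 2) if b in graph.get(a, set()))


def is_clique(graph, nodes):
    """True if every two distinct vertices in `nodes` are adjacent."""
    n = len(set(nodes))
    return induced_edge_count(graph, nodes) == n * (n - 1) // 2


def cluster_score(graph, nodes):
    """Density (edges / possible edges, no self-loops) times node count."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    density = induced_edge_count(graph, nodes) / (n * (n - 1) / 2)
    return density * n


# Hypothetical example: a complete graph on six proteins scores 6.0,
# which would pass a score threshold of 5.
proteins = [f"P{i}" for i in range(1, 7)]
k6 = {p: {q for q in proteins if q != p} for p in proteins}
print(is_clique(k6, proteins), cluster_score(k6, proteins))  # True 6.0
```

A sparse chain of five proteins, by contrast, has density 0.4 and scores only 2.0, so a threshold on this score favours clusters that are both large and dense.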

Protein Modelling
The properties of the proteins fliJ, fliP and flgN, such as sequence, sequence length, mass and the availability of 3-D structures, were studied. A PSI-BLAST search was run and a template for the fliJ protein was obtained. fliJ plays a pivotal role in flagellar assembly and is involved in the response to chemotactic stimuli. The template chosen to model the protein had 100% identity and 88% query coverage. The template used was Chain A of the fliJ protein from Salmonella enterica subspecies. Homology modelling of fliJ was performed using SWISS-MODEL. The quality of the resulting model was checked using ERRAT2, ProSA and PDBsum.
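The two template-selection figures quoted above, percent identity and query coverage, can be computed from a pairwise alignment as in the Python sketch below. It assumes a single aligned region given as two equal-length gapped strings, identity measured over alignment columns (as BLAST reports it) and coverage measured as aligned query residues over full query length; the function name and the short sequences are hypothetical, not the fliJ data.

```python
def alignment_stats(query_aln, subject_aln, query_length):
    """Return (percent identity, percent query coverage) for one alignment.

    query_aln / subject_aln: aligned sequences of equal length, '-' = gap.
    query_length: length of the full, unaligned query sequence.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-"
    )
    identity_pct = 100.0 * identities / len(query_aln)
    aligned_query_residues = sum(1 for q in query_aln if q != "-")
    coverage_pct = 100.0 * aligned_query_residues / query_length
    return identity_pct, coverage_pct


# Hypothetical example: 8 residues of a 10-residue query aligned without mismatches.
print(alignment_stats("MKTAYIAK", "MKTAYIAK", query_length=10))  # (100.0, 80.0)
```

The example shows how a template can reach 100% identity while covering only part of the query, which is the situation described for the fliJ template.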

Identification of Lead Molecules against E. coli:
Phytochemicals with reported antimicrobial activity against E. coli were identified and their structures were retrieved. The phytochemical molecules that satisfied Lipinski's Rule of Five were selected.
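A minimal Python sketch of the Rule-of-Five filter follows, using the usual cut-offs (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors). It assumes the descriptors were computed elsewhere, for example with a cheminformatics toolkit, and, following common practice, tolerates at most one violation; the text does not say how many violations were allowed, and both example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed descriptors for one molecule (values supplied by the user)."""
    name: str
    mol_weight: float  # Da
    logp: float        # octanol-water partition coefficient (log P)
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Number of Rule-of-Five criteria the molecule breaks."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    # Assumption: at most one violation is tolerated (common practice).
    return lipinski_violations(d) <= max_violations


candidates = [
    Descriptors("hypothetical_small", 194.2, 1.5, 2, 4),
    Descriptors("hypothetical_large", 720.0, 6.3, 7, 12),
]
selected = [d.name for d in candidates if passes_rule_of_five(d)]
print(selected)  # ['hypothetical_small']
```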

Virtual Screening by Molecular Docking
Phytochemicals that satisfied Lipinski's Rule of Five were docked against the fliJ protein model using Molegro Virtual Docker (MVD) 5.0. MVD 5.0 uses the MolDock scoring function together with a hybrid search algorithm called guided differential evolution, which combines differential evolution optimization with a cavity prediction algorithm. The modelled protein structure was loaded onto the MVD 5.0 platform for docking, and its built-in cavity detection algorithm was used to identify potential binding sites, also referred to as active sites or cavities.
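To illustrate the optimization idea behind this search, the Python sketch below implements classic differential evolution (the DE/rand/1/bin scheme), not MVD's guided variant: it has no cavity prediction and minimizes a hypothetical toy energy over three coordinates instead of the MolDock score over ligand position, orientation and torsions. The function names, the mutation factor `f` and the crossover rate `cr` (common textbook defaults) are assumptions.

```python
import random


def differential_evolution(energy, bounds, pop_size=50, max_iter=2000,
                           f=0.5, cr=0.9, seed=0):
    """Classic DE/rand/1/bin minimization inside a box given by `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [energy(x) for x in pop]
    for _ in range(max_iter):
        for i in range(pop_size):
            # Mutation: three distinct members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for k in range(dim):
                if k == j_rand or rng.random() < cr:  # binomial crossover
                    lo, hi = bounds[k]
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][k])
            trial_score = energy(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_energy(x, centre=(1.0, -2.0, 0.5)):
    """Hypothetical energy: squared distance from an assumed cavity centre."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, centre))


best_x, best_e = differential_evolution(toy_energy, [(-10.0, 10.0)] * 3,
                                        pop_size=20, max_iter=500)
```

In a docking setting the vector would encode a ligand pose and `energy` would be the scoring function, which is why restricting the search box to a detected cavity helps.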
The MolDock SE search algorithm was used with 10 runs, a maximum of 2000 iterations, a population size of 50 and an energy threshold of 100. At every step, the torsions, translations and rotations giving the minimum energy were sought, and the pose with the lowest energy was preferred. After the docking simulation, the poses (binding modes) obtained were ranked by Rerank Score.
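The search settings above can be recorded in one place, as in this Python sketch. It also works out the upper bound on scoring-function calls they imply, assuming one evaluation per population member per iteration (an assumption about the search, not a documented MVD figure), and ranks poses by Rerank Score, assuming lower (more negative) scores are better as with MVD's energy-like scores. The `DockingSettings` class and `rank_poses` helper are hypothetical, not part of MVD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingSettings:
    """Hypothetical record of the MolDock SE settings stated in the text."""
    algorithm: str = "MolDock SE"
    runs: int = 10
    max_iterations: int = 2000
    population_size: int = 50
    energy_threshold: float = 100.0

    def max_score_evaluations(self) -> int:
        # Upper bound, assuming one scoring call per individual per iteration.
        return self.runs * self.max_iterations * self.population_size


def rank_poses(poses):
    """Sort (pose_id, rerank_score) pairs, assuming lower scores are better."""
    return sorted(poses, key=lambda pose: pose[1])


print(DockingSettings().max_score_evaluations())  # 1000000
```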
The selected ligands were prepared manually using the ligand preparation module of MVD 5.0, and their bond orders and flexible torsions were detected. The target protein structure was prepared by carefully removing hetero atoms and water molecules, and its electrostatic surface was generated. Docking was focused on the amino acid residues found to take part in the interaction of fliJ with geranyl phosphates and ferulic acids. The grid resolution was set at 0.3 Å. The maximum number of iterations and the maximum population size were set at 1500 and 50, respectively [6]. A combinatorial library was developed from the phytochemical molecules that showed maximum activity with the target protein, using SmiLib v2.0 [7]. SmiLib is a free, platform-independent software tool for rapid combinatorial library generation in SMILES notation.
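The combinatorial step amounts to enumerating every combination of building blocks at a scaffold's attachment points. The Python sketch below shows that idea with plain string substitution; the `{R1}`-style placeholders, the `enumerate_library` function and the phenol-type scaffold are hypothetical, they do not reproduce SmiLib's input syntax or the four phytochemicals used here, and the output SMILES are not chemically validated.

```python
from itertools import product


def enumerate_library(scaffold, building_blocks):
    """Enumerate all products of a scaffold with hypothetical '{Rn}' slots.

    scaffold: SMILES string with placeholders such as '{R1}', '{R2}'.
    building_blocks: dict mapping placeholder name -> list of SMILES fragments.
    """
    slots = sorted(building_blocks)
    library = []
    for combo in product(*(building_blocks[slot] for slot in slots)):
        smiles = scaffold
        for slot, fragment in zip(slots, combo):
            smiles = smiles.replace("{" + slot + "}", fragment)
        library.append(smiles)
    return library


library = enumerate_library(
    "{R1}c1ccc({R2})cc1",
    {"R1": ["O", "N"], "R2": ["C", "Cl", "C(=O)O"]},
)
print(len(library), library[0])  # 6 Oc1ccc(C)cc1
```

The library size is the product of the number of fragments at each slot (here 2 × 3 = 6), so it grows quickly as attachment points or building blocks are added.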

Results: Study of Protein-Protein Interactions
The Centiscape plug-in of Cytoscape computes network centrality indices, including betweenness centrality, centrality and radiality. These graph-theory and network-analysis terms denote, respectively, a measure of centrality based on shortest paths (betweenness centrality, which reflects how often a node lies on the shortest paths between other nodes), the identification of the most important vertices within a graph (centrality, whose applications include identifying the most influential protein in a network) and a measure of how close a node is to all other nodes relative to the network diameter (radiality). Among the interacting proteins in the subnetwork (dense clique) in Cluster 1, three proteins were selected for further study: flgN, fliP and fliJ (Figure 1).
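The two path-based measures can be made concrete with the Python sketch below. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs, and radiality as the sum over all other nodes of (diameter + 1 - distance) divided by (n - 1), which is our reading of the radiality formula in the CentiScaPe documentation and should be checked against it. The graph is assumed to be a dict of neighbour sets and, for radiality, connected; the function names and the four-node star network are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2.0 for v, c in bc.items()}  # each pair was counted twice


def radiality(graph):
    """Sum of (diameter + 1 - d(v, w)) over w != v, divided by n - 1."""
    dists = {v: bfs_distances(graph, v) for v in graph}
    n = len(graph)
    if any(len(d) != n for d in dists.values()):
        raise ValueError("this radiality formula assumes a connected graph")
    diameter = max(max(d.values()) for d in dists.values())
    return {
        v: sum(diameter + 1 - d for w, d in dists[v].items() if w != v) / (n - 1)
        for v in graph
    }


# Hypothetical star network: hub "H" linked to three leaves.
star = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
print(betweenness(star)["H"], radiality(star)["H"])  # 3.0 2.0
```

In the star, every shortest path between two leaves passes through the hub, so the hub has the maximum betweenness and the highest radiality, which is the pattern used here to single out hub proteins.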

Protein Modelling of fliJ:
The properties of the proteins fliJ, fliP and flgN, such as amino acid sequence length, mass and the availability of 3-D structures, were studied in UniProtKB. A PSI-BLAST alignment (Figure 2) was run and a template for the fliJ protein was obtained. The template chosen to model the protein had 100% identity and 88% query coverage. The template used was Chain A of the fliJ protein from Salmonella enterica subspecies. Homology modelling of fliJ was performed using SWISS-MODEL, and the quality of the model was checked using ERRAT2, PDBsum and ProSA. The ERRAT2 analysis gave the modelled structure an overall quality factor of 99.2188, which indicates a high-quality model. In PDBsum,

Identification of Lead Molecules against E. coli:
A total of 87 molecules (Table 1) were reported in the literature to have antimicrobial activity against E. coli.

Molecular Docking:
All 87 phytochemical molecules were docked with the fliJ protein, and the docking results were tabulated for all compounds. For each compound, out of the many docking poses, only the pose with the best MolDock score and relatively good hydrogen-bond interactions was retained. The few compounds that displayed the best affinity for the interaction site were then selected.
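The pose- and compound-selection rule can be sketched in Python as below. It assumes each pose is a (compound, MolDock score, hydrogen-bond energy) tuple, that lower (more negative) values of both are more favourable, as with MVD's energy-like scores, and that "relatively good" hydrogen bonding means an energy at or below a cut-off; the function name, the `hbond_cutoff` and `top_k` values and the compound names are hypothetical.

```python
def select_best_compounds(poses, hbond_cutoff=-1.0, top_k=4):
    """Keep each compound's best pose, then return the top_k compounds.

    poses: iterable of (compound, moldock_score, hbond_energy) tuples.
    Assumes lower (more negative) scores and H-bond energies are better.
    """
    best = {}
    for compound, score, hbond in poses:
        if hbond > hbond_cutoff:
            continue  # H-bonding too weak for this (hypothetical) cut-off
        if compound not in best or score < best[compound]:
            best[compound] = score
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked[:top_k]


# Hypothetical poses for three hypothetical compounds.
poses = [
    ("cmpd_A", -120.0, -3.0),
    ("cmpd_A", -130.0, -0.5),  # better score but weak H-bonding: skipped
    ("cmpd_B", -110.0, -2.5),
    ("cmpd_C", -90.0, -4.0),
]
print(select_best_compounds(poses, top_k=2))
# [('cmpd_A', -120.0), ('cmpd_B', -110.0)]
```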
The molecular docking results (
The principal objective of our study was to identify phytochemicals that may target essential proteins in Escherichia coli. The interacting amino acids of geranyl pyrophosphate were Arg50, Tyr69 and Trp66, suggesting a strong physical interaction between the flagellar protein fliJ and the phytochemical geranyl pyrophosphate. The other phytochemicals that showed good activity with the target were ferulic acid 4-O-β-D-glucuronide, 5-8'-dehydrodiferulic acid and geranyl diphosphate. Trp66 is the common thread: it appears in the list of interacting amino acids of all four phytochemicals that showed maximum activity in MVD 5.0. M-CODE analysis in Cytoscape yielded 11 subnetworks, of which 5 had a score above the threshold of 5. The M-CODE analysis helped to separate the protein networks by function. CENTISCAPE analysis was used to identify the subnetwork with the maximum interaction of proteins using betweenness, centrality and radiality. The maximum betweenness and centrality were observed in the flagellar protein subnetwork, among three proteins: fliJ, fliP and flgN.
The properties of the proteins fliJ, fliP and flgN, such as sequence, sequence length, mass and the availability of 3-D structures, were studied. A PSI-BLAST search was run and a template for the fliJ protein was obtained. fliJ plays a role in flagellar assembly and is involved in the response to chemotactic stimuli. The template chosen to model the protein had 100% identity and 88% query coverage. The template used was Chain A of the fliJ protein from Salmonella enterica subspecies. Homology modelling of fliJ was performed using SWISS-MODEL, and the model quality was checked using ERRAT2, ProSA and PDBsum. A total of 87 molecules were reported in the literature to have antimicrobial activity against E. coli. All of these phytochemical molecules were docked with the fliJ protein and the docking results were tabulated for all compounds. For every compound, only the pose with the best MolDock score and good hydrogen-bond interactions was retained, and the few compounds that showed very good affinity for the interaction site were picked.
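Identifying a residue shared by all top-scoring ligands, as with Trp66 in the discussion of the docking results, is a set intersection over each ligand's contact list. In the Python sketch below only the geranyl pyrophosphate contacts (Arg50, Tyr69, Trp66) come from the text; the other two contact lists are hypothetical placeholders, with `Xaa` marking an unspecified residue, and `common_residues` is an illustrative helper name.

```python
def common_residues(contacts):
    """Residues present in every ligand's contact list.

    contacts: dict mapping ligand name -> iterable of residue labels.
    """
    residue_sets = [set(residues) for residues in contacts.values()]
    return set.intersection(*residue_sets) if residue_sets else set()


contacts = {
    # From the text: geranyl pyrophosphate contacts.
    "geranyl pyrophosphate": ["Arg50", "Tyr69", "Trp66"],
    # Hypothetical placeholders, not results from this study.
    "ligand_2": ["Trp66", "Xaa12"],
    "ligand_3": ["Xaa30", "Trp66", "Tyr69"],
}
print(sorted(common_residues(contacts)))  # ['Trp66']
```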

Conclusion:
Medical biofilms are a ubiquitous threat, so disrupting them is of considerable interest. The molecular interactions between the bacterial flagellar protein fliJ and geranyl pyrophosphate, ferulic acid 4-O-β-D-glucuronide, 5-8'-dehydrodiferulic acid and geranyl diphosphate suggest that these compounds could prevent biofilm formation in Escherichia coli strains. Geranyl pyrophosphate exhibited the highest binding affinity and is proposed for further consideration against Escherichia coli biofilms.