Molecular docking enabled updated screening of the matrix protein VP40 from Ebola virus with millions of compounds in the MCULE database for potential inhibitors

Ebola virus has caused several outbreaks of hemorrhagic fever in West Africa. This RNA virus is associated with high fatality and easy transmission. Recently, an effective vaccine and a monoclonal antibody cocktail have been introduced to combat Ebola virus infection. The matrix protein VP40 of Ebola virus is a known drug target that is essential for the viral life cycle through its participation in RNA transcription and in the budding of the mature virus. Residues phenylalanine 125 and arginine 134 of VP40 are known to be involved in the interaction with RNA. Therefore, it is of interest to screen VP40 against the millions of compounds in the mcule.com database for potential inhibitors. The output hits were ranked according to their minimum binding energy to matrix protein VP40. We further calculated the pharmacokinetic and toxicological properties of the best five hits using several predictive ADME web tools. We report a candidate lead (compound #5: (10R)-10-(4-hydroxyphenyl)-11,12,14,16-tetraazatetracyclo[7.7.0.0^{2,7}.0^{11,15}]hexadeca-1(16),2(7),3,5,8,12,14-heptaen-8-ol) with a high drug-likeness score, promising lead-likeness behaviour and a high median lethal dose. The candidate lead compound #5 engages in hydrogen bonding and hydrophobic interactions with VP40 active site residues. Thus, lead compound #5 is recommended for further in vitro and in vivo validation.


Background:
The Ebola virus (EboV) is an antisense RNA virus that has been responsible for several outbreaks of hemorrhagic fever, mainly in West Africa. This virus has been categorized as a bio-weapon hazard due to its high transmission capacity and a fatality rate of about 90% [1,2]. Attempts to combat Ebola virus infection with an effective vaccine or drug have culminated in the development of a recombinant vesicular stomatitis virus Ebola vaccine and a monoclonal antibody cocktail named ZMapp [3,4]. Additionally, computational approaches are currently employed to generate novel anti-EboV compounds. These approaches are also being applied to repurpose FDA-approved drugs such as ibuprofen for the treatment and prevention of EboV infection [5]. The RNA genome of Ebola virus is about 19 kb long and encodes seven structural proteins. These conserved proteins are glycoprotein (GP), matrix protein VP40, VP35, VP30, VP24, nucleoprotein (NP) and polymerase L protein. These proteins can serve as potential drug targets for high throughput screening (HTS) [1,6].
628 ©Biomedical Informatics (2019)
A three-dimensional structure representation of matrix protein VP40 can be seen in Figure 1. The blue-colored N-terminal domain in Figure 1 is believed to be responsible for VP40 oligomerization into different conformations. These conformations may explain the different cellular functions carried out by VP40 within the Ebola virus life cycle [12,14].
Several previous studies have virtually screened large chemical databases such as the ZINC database and the traditional Chinese medicine (TCM) database [15,16]. In these in silico projects, the binding potential of the chemicals against VP40 of Ebola virus was assessed and several pharmacokinetic parameters were estimated. In this project, we virtually screened the full version of the mcule.com chemical database [17] against matrix protein VP40 of EboV. The minimum binding free energies of the top five hits were reported and analyzed. Several chemical, pharmacokinetic and toxicological parameters were also estimated for these top chemicals. Our aim is to identify potential lead compounds for probable use against Ebola virus infection by using a computational approach.

Methodology:
Structure-based virtual screening: We used the mcule.com [17] online platform to carry out our hit identification workflow. This platform provides a collection of online drug discovery tools and well-curated chemical databases. We used the MCULE purchasable (full) database as updated in July 2019. This version of the database contains more than 42 million chemical compounds. For our screening protocol, we used the default options to select our final hits. It is worth mentioning that we applied both sampler and diversity filters to randomly select diverse chemical structures with no more than 10 rotatable bonds, 5 chiral centers and 1 violation of Lipinski's rule of five. These filters help to stay within the usage limits of the online tools and speed up the preliminary screening process. To eliminate promiscuous ligands and minimize non-selective and frequent hitters, we added a REOS (rapid elimination of swill) filter. Finally, the filtered chemicals were virtually screened for binding potential against Ebola virus matrix protein VP40 by using AutoDock Vina.
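The property thresholds stated above (at most 10 rotatable bonds, 5 chiral centers and 1 violation of Lipinski's rule of five) can be illustrated with a minimal Python sketch. This is not the mcule.com implementation: the Compound record, the compound identifiers and all descriptor values are hypothetical, the descriptors are assumed to be precomputed, and the sampler, diversity and REOS filters are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Precomputed descriptors for one compound (all values hypothetical)."""
    compound_id: str
    mol_weight: float      # g/mol
    logp: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    chiral_centers: int


def lipinski_violations(c: Compound) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def passes_prefilter(c: Compound, max_rotatable: int = 10,
                     max_chiral: int = 5, max_violations: int = 1) -> bool:
    """Apply the three property thresholds stated in the Methodology."""
    return (c.rotatable_bonds <= max_rotatable
            and c.chiral_centers <= max_chiral
            and lipinski_violations(c) <= max_violations)


# Hypothetical mini-library used only to exercise the filter.
library = [
    Compound("HYP-001", 283.3, 2.1, 2, 5, 1, 1),
    Compound("HYP-002", 612.7, 6.3, 3, 9, 8, 2),   # two Lipinski violations
    Compound("HYP-003", 455.5, 3.9, 1, 7, 12, 0),  # too many rotatable bonds
    Compound("HYP-004", 520.6, 4.2, 2, 8, 6, 3),   # one violation, still passes
]

kept = [c.compound_id for c in library if passes_prefilter(c)]
print(kept)
```

Only compounds that satisfy all three thresholds would proceed to docking in such a workflow; the thresholds are keyword arguments so that stricter or looser cut-offs can be tried.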

Pharmacokinetics and toxicology prediction:
We used two web servers, namely pkCSM [23] and SwissADME [24], to predict different pharmacokinetic features and drug-likeness model scores for the top five hits. These online platforms implement both molecular similarity and predictive regression to analyze submitted molecules [25,26]. We also estimated the median lethal dose (LD50) for these compounds by using the ProTox-II server [27,28].

Results:
The chemical structures and characteristics of the top five hits from the virtual screen against Ebola virus VP40 protein are shown in Figure 2 and Table 1, respectively. The compounds were ordered according to their minimum binding energy to VP40. The minimum binding energy, pharmacokinetic and toxicological features estimated for these selected compounds are given in Table 2.
According to this table, the binding energy of these chemicals ranged from -7.0 to -5.9 kcal/mol. These compounds follow Lipinski's rule of five, except compounds 2 and 3. Only compound 5 shows a high drug-likeness score and may be a potential lead candidate. All of the selected hits have a high percentage of predicted intestinal absorption with a relatively low volume of distribution. Total clearance ranges from 1.3 to 3.6 ml/min/kg for these five compounds. Unfortunately, compound 2 failed the virtual AMES test and may possess mutagenic potential. Both compounds 2 and 5 have a relatively high median lethal dose (LD50) compared to the others. The docking image in Figure 3 reveals that compound 5, our probable lead candidate, forms a conventional hydrogen bond with the Glycine 126 residue. It is also engaged in a Pi hydrophobic interaction and van der Waals (vdW) bonding with Phenylalanine 125 and Arginine 134, respectively.
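The ordering of hits by minimum binding energy can be sketched as follows. This is a minimal illustration, assuming the docking scores are already available as a mapping from compound identifier to energy in kcal/mol; the identifiers and scores are hypothetical and are not the values reported in Table 2.

```python
def top_hits(scores, n=5):
    """Return the n compounds with the lowest (most negative) binding energy."""
    return sorted(scores.items(), key=lambda item: item[1])[:n]


# Hypothetical docking scores in kcal/mol (more negative = stronger predicted binding).
docking_scores = {
    "HYP-A": -6.1, "HYP-B": -7.0, "HYP-C": -5.2, "HYP-D": -6.4,
    "HYP-E": -5.9, "HYP-F": -4.8, "HYP-G": -6.8,
}

for rank, (compound_id, energy) in enumerate(top_hits(docking_scores), start=1):
    print(f"{rank}. {compound_id}: {energy:.1f} kcal/mol")
```

Sorting in ascending order places the most negative energy first, which matches the convention that a lower docking score indicates stronger predicted binding.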

Discussion:
Due to its high fatality rate and transmission capacity, Ebola virus has been recognized as a type "A" bio-weapon microorganism [1, 2]. Management of Ebola hemorrhagic fever has focused on mitigating symptoms and controlling related complications. Recently, a potential Ebola vaccine and a monoclonal antibody medication have been developed [3,4,29]. Furthermore, target-based virtual high throughput screening has been implemented to identify potential anti-EboV hits [2]. Along these lines, Tamilvanan and Hopper virtually screened both the Traditional Chinese medicine (TCM) database and the Asinex database against the VP40 crystal structure with code 1H2C. Using a Glide-based three-tiered docking strategy, they reported five natural and five synthetic potential inhibitors of Ebola virus VP40. Of these ten possible inhibitors, compound ASN03576800 (2-[2-(1,3-benzodioxol-5-ylamino)-2-oxoethyl]sulfinylacetic acid) displayed a promising minimum binding energy and an interesting orientation within the VP40 active site pocket. Interestingly, these two researchers used docking grid box coordinates very close to those used in our project [16]. Additionally, Abazari et al. (2015) screened 120,000 compounds from the ZINC database for potential inhibitors of matrix protein VP40 (code 4LDB). Using AutoDock Vina for virtual screening [20], they reported four drug-like chemicals with binding energies ranging from -11.3 to -10.1 kcal/mol [15]. Later, Alam El-Din et al. (2016) searched the PubChem database for conformers of pyrimidine-2,4-dione using similarity fingerprints. The 1800 retrieved compounds were virtually screened against the VP40 crystal structure of Sudan Ebola virus using AutoDock 4 [30]. They then employed virtual ADMET web tools to estimate the pharmacokinetic and toxicity properties of these compounds, and reported seven hits with promising minimum binding energy and low virtual toxicity [29]. Recently, Nagarajan et al. (2019) virtually evaluated 48 sugar alcohols for binding to the 1H2C crystal structure of VP40. Using both virtual docking and molecular dynamics (MD) simulations, they found that sorbitol had the best binding affinity to VP40. The sorbitol-VP40 complex was also stable throughout the MD simulation period [31].
In our project, we applied structure-based virtual screening to identify novel compounds with the potential to interfere with Ebola virus VP40 function. We used the mcule.com platform with different filters to accelerate the screening of a database of more than 42 million compounds. Various pharmacokinetic and toxicological characteristics were estimated using predictive regression web servers. [...] (2) The partition coefficient (XLOGP3) should be ≤ 3.5; (3) The number of rotatable bonds should be ≤ 7. Additionally, compound 5 has the highest estimated water solubility among the five hits. This compound has a high predicted median lethal dose (LD50) and no mutagenic potential, and may therefore be a relatively safe compound. The two-dimensional docking diagram in Figure 3 shows that compound 5 forms a hydrogen bond with the Glycine 126 residue. It is also involved in multiple hydrophobic and van der Waals interactions with the Phenylalanine 125, Arginine 134 and Tyrosine 171 residues. This predicted behavior may enable compound 5 to interfere effectively with the VP40-RNA interaction and VP40 octamer formation. Further in vitro analysis is required to evaluate the impact of compound 5 on Ebola virus replication.
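The two lead-likeness criteria that are legible in the Discussion (XLOGP3 ≤ 3.5 and no more than 7 rotatable bonds) can be checked with a short Python sketch. Criterion (1) is not recoverable from the text, so it is deliberately left out rather than guessed; the function name and the input values are hypothetical, and this is not the implementation used by the web servers cited above.

```python
def leadlikeness_violations(xlogp3, rotatable_bonds):
    """List which of the two stated lead-likeness criteria a compound fails."""
    violations = []
    if xlogp3 > 3.5:
        violations.append("XLOGP3 > 3.5")
    if rotatable_bonds > 7:
        violations.append("rotatable bonds > 7")
    return violations


# Hypothetical descriptor values, for illustration only.
print(leadlikeness_violations(xlogp3=2.1, rotatable_bonds=1))
print(leadlikeness_violations(xlogp3=4.2, rotatable_bonds=9))
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it clear why a given hit would not be considered lead-like under these two conditions.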

Conclusion:
The known drug target VP40 from Ebola virus was screened against 42 million compounds in the mcule.com database for potential lead inhibitors. The filtered hits were ranked based on their minimum binding energy and the top five compounds were selected for detailed study. Various pharmacokinetic and toxicological properties were estimated using available predictive regression and molecular similarity tools. The lead compound #5 ((10R)-10-(4-hydroxyphenyl)-11,12,14,16-tetraazatetracyclo[7.7.0.0^{2,7}.0^{11,15}]hexadeca-1(16),2(7),3,5,8,12,14-heptaen-8-ol) possesses a high drug-likeness score and displays lead-likeness features. Molecular docking analysis shows that compound #5 forms hydrogen bonds and hydrophobic interactions at the active site of the matrix protein VP40. The compound also has a high LD50 value compared to the other filtered hits. Therefore, lead compound #5 is recommended for further evaluation using in vitro and in vivo models.