Analysis of Hepatitis E virus (HEV) X-domain structural model

Hepatitis E virus (HEV) infection is emerging as a global health concern that needs to be addressed. Viral replication and release are attributed to the different genomic components of HEV. However, a few proteins/domains, such as the X and Y domains, remain unexplored, so we aimed to explore the physicochemical, structural and functional features of the HEV ORF-1 X domain. Molecular modeling of the uncharacterized X domain was carried out using Phyre2 and SWISS-MODEL. Active ligand-binding sites were predicted using Phyre2. The X-domain protein was found to be stable and acidic in nature, with high thermostability and good hydrophilicity. Twelve binding sites were predicted, along with putative transferase and catalytic activity. Homology modeling showed 10 binding sites, with Mg²⁺ and Zn²⁺ as metallic heterogen ligands binding to the predicted ligand-binding sites. This study may help to decipher the role of the unexplored X domain of HEV, thereby improving our understanding of the pathogenesis of HEV infection.


Background:
Hepatitis E virus (HEV) is emerging as a global disease with neurological and haematological manifestations in addition to acute and chronic liver infection [1,2]. HEV accounts for 20-30% mortality among infected pregnant women in their third trimester [3]. Recent evidence of HEV in solid organ transplant patients and blood donors, together with vertical transmission to newborns with severe maternal and fetal outcomes, underlines the need to explore the virus itself in depth. Recent reports of ribavirin resistance in HEV are also alarming, as there is no FDA-approved vaccine against HEV [4].
A recent molecular study by Parvez MK (2017) [14] suggested a role for the Y domain sequence (aa 239-439) in the HEV life cycle through gene regulation and/or ER membrane binding in replication complexes. Allen et al. (2003) [15] classified the X domain as an ADP-ribose-1''-monophosphatase (Appr-1''-pase) of the macro-domain protein family. Although viral X domains lack significant sequence homology with phosphatases, some viruses have been shown to have Appr-1''-pase activity [16,17], owing to the common macro-domain fold with its asparagine (Asn)-rich catalytic site.
However, the structure of the X domain has not yet been reported, and neither its detailed physicochemical characterization nor a putative structure with ligand-binding active sites has been elucidated. We therefore propose an in silico 3D structure prediction of the HEV X domain using homology modelling.

Methodology: Retrieval of the target (X-Domain) amino acid sequence:
The amino acid sequence of X-Domain (HEV ORF

Secondary structure prediction of HEV X-domain protein:
The self-optimized prediction method with alignment (SOPMA) software [23] and the PSIPRED program (http://bioinf.cs.ucl.ac.uk/psipred) were used to predict the secondary structure of the X domain protein (target). Disorder prediction was performed using the DISOPRED tool. The PredictProtein server (https://predictprotein.org), including PROFsec, was also used to predict secondary structure [24].
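Secondary structure tools such as SOPMA summarise their per-residue predictions as the percentage of residues in each state. A minimal sketch of that arithmetic follows, assuming a per-residue state string that uses SOPMA-style one-letter codes (h, e, t, c); the function and variable names are hypothetical and this is not SOPMA's own code.

```python
from collections import Counter

# Assumed SOPMA-style one-letter state codes (hypothetical input format).
STATE_NAMES = {"h": "alpha helix", "e": "extended strand",
               "t": "beta turn", "c": "random coil"}


def composition(states: str) -> dict[str, float]:
    """Percentage of residues in each state, rounded to 2 decimals.

    Codes outside STATE_NAMES (e.g. ambiguous states) still count
    toward the total length but are not reported.
    """
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    total = len(states)
    return {code: round(100 * counts[code] / total, 2) for code in STATE_NAMES}


if __name__ == "__main__":
    # Illustrative counts only (55 h, 32 e, 7 t, 64 c; 158 residues),
    # chosen because they reproduce the SOPMA percentages in the results.
    example = "h" * 55 + "e" * 32 + "t" * 7 + "c" * 64
    for code, pct in composition(example).items():
        print(f"{STATE_NAMES[code]}: {pct}%")
```

The 158-residue example is inferred from the reported SOPMA and PROFsec percentages (for instance 64/158 = 40.51%); the source does not state these counts.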

Protein binding sites and Gene ontology prediction of X domain:
Protein-protein binding sites were predicted with profISIS [25], which identifies interacting residues from sequence alone by combining predicted structural features with evolutionary information. Molecular, cellular and biological functions were predicted with the Gene Ontology (GO) prediction method Metastudent [26] via homology to known annotated proteins.

Homology modelling and validation of X-domain:
There is no experimentally determined 3D structure of the X domain protein in the Protein Data Bank (PDB); therefore, homology modelling of the X domain was done using two programs, SWISS-MODEL and Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2) [27][28]. Secondary structure was also predicted using Phyre2. The 3D models of the X domain generated by SWISS-MODEL and Phyre2 were compared, and only the most suitable model was selected for final validation. The final modelled structure was validated for stereochemical quality by Ramachandran plot analysis (PROCHECK) (http://nihserver.mbi.ucla.edu/SAVES). The final predicted model was submitted to the 3DLigandSite [29] server to predict potential binding sites. Table 1.
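A Ramachandran plot places each residue by its backbone dihedral angles φ and ψ. As a minimal sketch (not PROCHECK's implementation), the torsion angle defined by four atom positions can be computed as below, using the IUPAC sign convention; the function name and the coordinate inputs are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (range -180..180) defined by four points.

    phi: C(i-1), N(i), CA(i), C(i)
    psi: N(i), CA(i), C(i), N(i+1)
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal of the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Using atan2 rather than an arccosine keeps the sign of the angle, which is what separates, for example, the right-handed helix region (negative φ) from the left-handed region (positive φ) on the plot.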

Secondary structure prediction:
SOPMA was run with its default parameters (similarity threshold: 8; window width: 17), for which a prediction accuracy of >70% is reported. Using a sub-database of 511 proteins and 15 aligned proteins, SOPMA predicted 40.51% of residues as random coil, compared with alpha helix (34.81%), extended strand (20.25%) and beta turn (4.43%), as shown in Table 2. PSIPRED predicted helix, strand and coil with high confidence (Figure 1). PROFsec (PredictProtein), which employs a neural network system, provides a secondary structure prediction accuracy of more than 72%; it predicted 42.41% helix conformation (α-, π- and 3₁₀-helix), 44.30% loop (L) and 13.29% beta strand (E, extended strand in beta-sheet conformation) in the X domain. The intrinsic disorder profile was computed using DISOPRED: >90% of the amino acids fell below the disorder confidence score of 0.5, suggesting a low likelihood of intrinsic disorder and supporting the stability of the predicted protein.
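The disorder summary above reduces to counting residues on either side of a confidence cut-off. A minimal sketch, assuming the per-residue DISOPRED confidence scores have already been read into a list (the parsing step is omitted, and the helper names are hypothetical); treating a score of exactly 0.5 as disordered is also an assumption.

```python
def fraction_below(scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose disorder confidence is below `threshold`."""
    if not scores:
        raise ValueError("no scores given")
    return sum(1 for s in scores if s < threshold) / len(scores)


def disordered_positions(scores: list[float], threshold: float = 0.5) -> list[int]:
    """1-based positions with score at or above `threshold` (assumed disordered)."""
    return [i for i, s in enumerate(scores, start=1) if s >= threshold]


if __name__ == "__main__":
    # Hypothetical scores for a short stretch of residues.
    demo = [0.05, 0.12, 0.61, 0.33, 0.08]
    print(f"{fraction_below(demo):.0%} of residues below 0.5")
    print("possibly disordered:", disordered_positions(demo))
```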

Protein binding sites and Gene ontology prediction:
Binding sites were predicted using profISIS (PredictProtein), which identified 12 protein binding sites at positions 28-30; 46-47; 49; 59-60; 73-78; 88-89; 93; 108; 128; 131-133; 135; 141 (data not shown). Gene Ontology prediction categorized the functional aspects as cellular, molecular and biological: the X domain protein was predicted to be extracellular or part of the host cell or membrane, and to be involved in metabolic processes such as primary and cellular metabolic processes, including cyclic, heterocyclic and aromatic compound metabolic processes (data not shown).
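The binding-site list mixes single positions and inclusive ranges. A small hypothetical helper that expands such a list into explicit residue numbers makes it easy to count sites and residues or to map them onto a structure; it is a sketch, not part of profISIS.

```python
# Positions copied from the profISIS result above.
PROFISIS_SITES = ("28-30; 46-47; 49; 59-60; 73-78; 88-89; 93; "
                  "108; 128; 131-133; 135; 141")


def parse_sites(spec: str) -> list[list[int]]:
    """Turn '28-30; 49' into [[28, 29, 30], [49]] (ranges are inclusive)."""
    sites = []
    for token in spec.split(";"):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start, end = (int(x) for x in token.split("-"))
            sites.append(list(range(start, end + 1)))
        else:
            sites.append([int(token)])
    return sites


if __name__ == "__main__":
    sites = parse_sites(PROFISIS_SITES)
    residues = sorted({r for site in sites for r in site})
    print(len(sites), "sites covering", len(residues), "residues")
```

For the positions listed above this yields 12 sites covering 24 residues.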
The stereochemical quality of the SWISS-MODEL-predicted X-domain structure was evaluated with a Ramachandran plot (PROCHECK). Of the total residues (137), 85.1% were found in the core (A; B; L) regions, 13.2% in the allowed (a; b; l) regions and 1.8% in the disallowed regions. For reference, PROCHECK's benchmark, based on an analysis of 118 structures with resolution of at least 2.0 Å and R-factor no greater than 20%, expects a good-quality model to have over 90% of residues in the most favoured regions. The PROCHECK analysis showed a maximum deviation of 21.0 (residue properties), a bond length/angle value of 5.8, and 77.8% of planar groups within limits.
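The region percentages reported by PROCHECK are tallies of per-residue region codes. A minimal sketch of that bookkeeping, assuming a hypothetical list of PROCHECK-style codes per residue (A/B/L core; a/b/l/p allowed; ~a/~b/~l/~p generously allowed; anything else disallowed). PROCHECK computes these statistics over non-glycine, non-proline residues, so such residues are assumed to be filtered out before the call.

```python
from collections import Counter

CORE = {"A", "B", "L"}
ALLOWED = {"a", "b", "l", "p"}
GENEROUS = {"~a", "~b", "~l", "~p"}
CLASSES = ("core", "allowed", "generously allowed", "disallowed")


def region_percentages(labels: list[str]) -> dict[str, float]:
    """Percent of residues per Ramachandran region class, rounded to 0.1."""
    if not labels:
        raise ValueError("no residues given")
    tally = Counter()
    for lab in labels:
        if lab in CORE:
            tally["core"] += 1
        elif lab in ALLOWED:
            tally["allowed"] += 1
        elif lab in GENEROUS:
            tally["generously allowed"] += 1
        else:
            tally["disallowed"] += 1  # any unrecognised code
    n = len(labels)
    return {k: round(100 * tally[k] / n, 1) for k in CLASSES}


if __name__ == "__main__":
    # Hypothetical five-residue example.
    print(region_percentages(["A", "B", "a", "~b", "XX"]))
```

Because each class is rounded separately, the four percentages can sum to slightly more or less than 100%.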

Homology modelling and structural validation of X-domain:
Similarly, homology modelling of the X domain was performed with Phyre2. Based on six templates (c5fsuA, c2x47A, c5iitC, c5kivA, c5fszA and d2acfa1), selected by heuristics to maximise confidence, percentage identity and alignment coverage, a protein model was generated with 87% of residues modelled at >90% confidence (Fig. 3); the model dimensions (Å) were X: 51.738, Y: 33.604, Z: 42.515. Phyre2 predicted the secondary structure as 13% disordered, 36% alpha helix and 22% beta strand (data not shown). The stereochemical quality of the Phyre2 model was evaluated with a Ramachandran plot (PROCHECK): 84.1% of residues were in the core (A; B; L) regions, 12.1% in the allowed (a; b; l) regions, 2.3% in the generously allowed (~a; ~b; ~l) regions and 1.5% in the disallowed regions. Among residue properties, the maximum deviation was 4.1 and the bond length/angle value was 10.5; 2 cis-peptides were found, and 98.3% of planar groups were within limits.

Conclusion:
We report a structural model of the HEV X domain with predicted ligand-binding active sites. This model may provide insights into the functional role of the X domain in viral pathogenesis.