Modeling and structural analysis of human Guanine nucleotide-binding protein-like 3,nucleostemin

Human GNL3 (nucleostemin) is a recently discovered nucleolar protein with pivotal functions in maintaining genomic integrity and determining cell fates of various normal and cancerous stem cells. Recent reports suggest that targeting this GTP-binding protein may have therapeutic value in cancer. Although, sequence analyzing revealed that nucleostemin (NS) comprises 5 permuted GTP-binding motifs, a crystal structure for this protein is missing at Protein Data Bank (PDB). Obviously, any attempt for predicting of NS structure can further our knowledge on its functional sites and subsequently designing molecular inhibitors. Herein, we used bioinformatics tools and could model 262 amino acids of NS (132-393 aa). Initial models were built by MODELLER, refined with Scwrl4 program, and validated with ProsA and Jcsc databases as well as PSVS software. Then, the best quality model was chosen for motif and domain analyzing by Pfam, PROSITE and PRINTS. The final model was visualized by vmd program. This predicted model may pave the way for next studies regarding ligand binding states and interaction sites as well as screening of databases for potential inhibitors.


Background:
Human GNL3 (nucleostemin) has been viewed as a nucleolar protein with variety of roles, including pre-rRNA processing, cell-cycle control, telomere stability, genomic integrity and selfrenewal maintaining of embryonic and tissue stem cells [1,2]. Since its discovery in 2002, there are accumulating reports that nucleostemin (NS) is also abundantly expressed in many cancerous cells, and contributed directly to formation of cancer stem cells (CSCs), highlighting its importance as diagnostic marker and/or therapeutic target in cancer [1,2]. In this line, we and other groups evidenced that NS depletion can inhibit tumor growth in in vitro and in vivo and can lead to inhibition of proliferation and induction of cell death [3,4]. Although its exact mechanism(s) of action is not clear, this nucleolar protein can interact with some important functional proteins in nucleoplasm, such as p53, mouse double minute protein 2 (MDM2), and telomeric repeat binding factor 1 (TRF1), thereby modulating different fates of the cells [5]. In fact, GTP status of NS is the key factor in its nucleolar-nucleoplasm recycling and interaction with nucleoplasmic proteins [4][5].  [7]. Despite these data, however, there is no crystal (three-dimensional) structure for NS in the literature, offering an emerging work for predicting its structure by bioinformatics tolls. In this study we represented a predicted model for target sequence of NS, particularly its GTP-binding motifs, which may be helpful for better understanding its functional sites and subsequently designing therapeutic drugs.

Disordered residues
Disordered proteins are a kind of protein that lacks a fixed or ordered three dimensional structures and therefore cannot be predicted, so we firstly tried to find which residues of NS can be potentially disordered through a disorder prediction server (http://iupred.enzim.hu/).

Sequence alignment and homology modeling
Different blast algorithms (blastp, PSI-blast and PHI-blast) were used against Protein Data Bank (PDB) to choose a suitable template [9]. The template and target sequence were aligned subsequently using ClustalW with the default parameters. Finally, the aligned sequence was used as the input for modeler 9.14 to generate a model.

Model refinement
Predicted models were then refined using Scwrl4 program for prediction of protein side chain conformations. This program is based on a new algorithm and new potential function that result in improved accuracy [10].

Structural validation
The resulted structure was subjected to structural quality assessment. The ProsA program was used to assess the energy of residue-residue interaction using a distance based pair potential and the energy was transformed to a score called Zscore. Also, predicted models were assessed with Jcsg [11] server indexes and the best one was chosen. Moreover, we used PSVS program for model validation that give results in PROCHECK [12], MolProbity [13], Verify3D [14] and ProsaII plots and graphs.

Secondary structure
Stride database was used for secondary structure prediction and computation of α -helical, β -strand and coiled regions [15].

Domain and motif analysis
The Pfam database is a large collection of protein families [16], each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam results showed one conserved MMR-HSR1 domain in our query sequence. Amino acid residues within a domain that occur consistently and are responsible for specific function are called motifs. They can be used as fingerprints in evolutionary studies and in assigning a recently sequenced protein to a particular family. Fingerprint in our query sequence were found by motif search server PROSITE [17] and PRINTS [18].

Structure visualization
The predicted model is visualized by vmd software [19].

Results & discussion:
Although NS plays an important role in physiological and pathological conditions of stem and cancerous cells, the data on its structure is insufficient [3][4][5][6][7]. In a try to predict structure of NS we engaged bioinformatics tools in this study. The first issue that should be consider before starting the modeling process is sequence evaluation of target protein regarding to its tendency for being disordered. The results in Figure 1A showed a low disorder tendency in our target sequence while high tendency was seen in amino acids of 1-131 and 393-549. After evaluation of disordered residues we started modeling process. Different blast algorithms were also performed. Among blast results, the best score belonged to 3CNL chain A which contains 263 amino acids and had 99% coverage with amino acids of 132-393 of NS; there was no structural template with acceptable scores for whole protein at PDB. Although, a secondary structure prediction for whole NS were performed via http://cho-fas.sourceforge.net/index.php (data not shown), we decided to predict our model based on amino acids 132-393 largely because any models predicted from disorder regions (1-131 and 393-549 aa) may not reliable. In addition, the functional GTP-binding motifs were found among amino acids of 132-393. Blast results for target sequence showed 86.7 total score, 8e-20 E value and 26% identity, meaning that it is proper for choosing as a template. In the next step, the template and target sequence were aligned using ClustalW with the default parameters ( Figure 2) and further, the aligned sequence was used as the input for modeler 9.14 to generate ten (10) models. After model refinement, different indexes of models were compared by jcsg database to choose the best one and then validated with ProsA and PSVS software (Figure 1). The constructed model is monomer with 263 amino acids; its molecular weight estimated as 28875 Dalton. The Prosa analysis showed that Z-score in the predicted model is -2.29, indicating reasonable side chain interactions ( Figure 1B). Ramachandran Plot from Procheck evaluation for refined model showed 82.6% residues in allowed regions, 13.0% in additionally allowed regions and 2.6% in disallowed regions ( Figure 1C). Global quality scores are given in supplementary Tables 1 & 2 (see  supplementary material). In addition, PROCHECK G-factor for phi-psi and all dihedral angles was performed ( Figure 1D & E). With respect to mean and standard deviation for a set of 252 Xray structures < 500 residues, of resolution <= 1.80 Å, R-factor <= 0.25 and R-free <= 0.28; a positive value indicates a 'better' score. Results showed no close contacts within 2.2 Å, 2.4 ° RMS deviation for bond angles and 0.018 Å for RMS deviation for bond lengths (Figure 1D & E). ProsaII and Verify3D results also demonstrated that approximately 60% of residues represent a score over 0.2 which indicate an acceptable model (Figure 1F &  G). Secondary structure was predicted with Stride database and analyzed with PSVS ( Figure 1H). As depicted from Figure  1H, the predicted secondary structure contains 7 α-helices, 3 βstrands, a 3-helix and a lot of Turn and Coil structures. In fact, amino acids 2A-10A, 55A-67A, 137A-144A, 187A-195A, 207A-221A, 232A-240A, 253A-262A forms α-helices and 15A-19A, 44A-48A, 72A-74A forms β-strands. By contrast, much less βsheets were observed in 1-132 and 393-549 regions (data not shown). Finally, the predicted three-dimensional (3D) structure was shown in Figure 3A. Motif and domain analysis showed the conserved MMR-HSR1 domain is located between amino acids 256-370 where order of GTP-binding motifs is G4 (amino acids 47-50), G5 (amino acids 75-77), G1 (amino acids 131-138), G2 (amino acids 157-161) and G3 (amino acids 175-178) in our model ( Figure 3B).
GTP binding sites are good therapeutic targets for drug designing against GTP-binding proteins. Therefore, the sites of 5 GTP-binding motifs in the predicted model are highly important and can be considered as valuable targets. Indeed, GTP-binding is the main mechanism that can control the functions of NS by its shuttling between nucleolus and nucleoplasm [5][6][7]. For instance, it has been reported that NS can enter into nucleoplasm to interact with MDM2 and as a results induce p53-depndent cell-cycle arrest and/or apoptosis [4][5][6][7]. To get more insights on predicted model, we served the target sequence to corresponding database (http://iupred. enzim.hu/) and predicted disorder and anchor tendency in all residues. Then, we compared given scores for each GTPbinding motif and calculated the average number of score for being disorder and ANCHOR (data not shown). The results presented in Figure S1 showed that G1 and G4 orderly present the lowest scores (0.11 and 0.18) for being disorder while the best ANCHOR score belonged to G4 (0.25). Since a higher ANCHOR score represents a more fixed structure, we proposed that G4 motif may be more reliable target for drug design.
Collectively, this study represented a general view of NS structure. Predicted structural model can fit a variety of applications in future studies including, ligand binding states and interaction sites as well as screening of databases for potential lead compounds or inhibitors. It should be mentioned that our model for target sequence of NS is only a predictive model and need to be confirmed in further studies.