Insights from the Molecular Modelling and Docking Analysis of AIF-NLS complex to infer Nuclear Translocation of the Protein

Apoptosis Inducing Factor (AIF) protein has a dual role that depends on its localization: in the mitochondrion it supports energy production, and in the nucleus it induces apoptosis. The protein normally favors the mitochondrion and is transported to the nucleus only upon cell damage. Altering its Nuclear Localisation Signal (NLS) tag could aid nuclear translocation. In this study, AIF was conjugated with strong NLS tags and its binding affinity with Importin was studied using in silico approaches such as molecular modeling and docking. The aim is to improve the docking affinity of the AIF-Importin complex, thus allowing nuclear translocation and inducing caspase-independent apoptosis of the cell.

Cancerous cell death can be induced using a protein, Apoptosis-inducing factor 1, encoded by the AIFM1 gene located on the X chromosome. The protein can localize to the mitochondria (for energy production and subsequent cell growth) as well as to the nucleus (inducing caspase-independent apoptosis) [3][4][5]. However, owing to its weak Nuclear Localization Signal (NLS), the protein does not localize to the nucleus except in response to apoptotic stimuli, preferring to carry out its mitochondrial function [6].
An NLS is a monopartite or bipartite signal rich in positively charged amino acid residues (lysine and arginine) that tags a protein for import into the nucleus [7]. The NLS is recognized by Importin, a type of Karyopherin involved in transporting proteins into the nucleus [8]. Importin consists of α and β subunits. Importin α is an adaptor protein that recognises and binds to the NLS of a nuclear protein [9]. The Importin α-NLS complex then binds to Importin β by means of the Importin β binding (IBB) domain, a 44-amino-acid sequence at the N-terminus of Importin α [10].
Binding to Importin facilitates the movement of the cargo protein across the Nuclear Pore Complex (NPC); once this is complete, the Importin-NLS complex dissociates upon binding of Ran-GTP. This releases the NLS, and thus the protein, while Importin is captured once again by the NPC, recycled and used for the transport of further proteins [11,12]. The nuclear transport of the Apoptosis-inducing factor protein is not facilitated unless a DNA-damaging event occurs [13,14]. When such an event occurs, the protein is released from the mitochondria following mitochondrial outer membrane permeabilization, thereby inducing caspase-independent cell death [15][16][17][18]. Moreover, the AIF protein lacks a strong NLS, which prevents it from localizing to the nucleus.

©2018
This study aims at modifying the NLS tag of the AIF protein such that it aids in the translocation of the protein to the nucleus by taking precedence over its mitochondrial function.

Methodology: AIF Protein:
The AIF protein was selected because it contains both a Mitochondrial Localisation Signal (MLS) and an NLS, and its dual localization is attributed to the MLS being stronger than the NLS. This led to the reasoning that if the protein were reinforced with a stronger NLS tag, it would relocate to the nucleus after synthesis and thus induce apoptosis. The sequence of the Apoptosis Inducing Factor (AIF) protein was retrieved from the UniProt database under accession number O95831. An NLS score of 8 and above signifies a strong signal, and it was therefore chosen as the initial cut-off. Since more than 150 such NLSs were obtained, the cut-off was raised to 15.

Conjugation of NLS with AIF protein:
The best-scoring native NLS of the AIF protein spans amino acid positions 26 to 56 and has a score of 3.5 (Figure 1). This site was replaced with a stronger NLS tag obtained from proteins in the Nuclear Protein Database. Fusing the NLS tag at this particular site also ensures that the existing Mitochondrial Localization Signal is interrupted. The replacement was done by obtaining both sequences in FASTA format and editing the AIF sequence so that the new NLS sequence takes the place of the pre-existing tag.
The recombinant protein, labelled RecAIF, was modelled with I-TASSER (Iterative Threading ASSEmbly Refinement) using a threading approach, owing to the lack of suitable template structures. I-TASSER generates the 3-dimensional structure of a protein using "fold recognition" and also provides parameters such as the RMSD value and C-score, which can be used to choose the best model [27-28]. I-TASSER also provides Gene Ontology terms, which predict the molecular function, biological process and cellular component of the modelled protein [29].
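The replacement step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used in the study: the function name `replace_segment` and the toy sequence are hypothetical, positions are treated as 1-based and inclusive (26 to 56), and the real AIF sequence would be read from the UniProt FASTA entry O95831. The NLS string is the DDX18 sequence quoted in the results.

```python
def replace_segment(sequence: str, start: int, end: int, insert: str) -> str:
    """Replace residues start..end (1-based, inclusive) with `insert`."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("segment lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return sequence[:start - 1] + insert + sequence[end:]


# NLS from ATP-dependent RNA helicase DDX18, as quoted in the text.
DDX18_NLS = "KKKKKRKMVNDAEPDTKKAKTE"

# Toy 60-residue placeholder (hypothetical); NOT the real AIF sequence.
toy_aif = "A" * 25 + "G" * 31 + "A" * 4

# Residues 26..56 (the 31 G's) are swapped for the 22-residue NLS.
rec_aif = replace_segment(toy_aif, 26, 56, DDX18_NLS)
```

Because the inserted tag (22 residues) is shorter than the replaced segment (31 residues), the recombinant sequence is shorter than the native one, so downstream residue numbering shifts accordingly.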

Validation of the results:
Models obtained from I-TASSER were evaluated using the phi-psi Ramachandran plot (RC plot) generated by the RAMPAGE server [30]. ProtParam analysis was also performed on the primary sequence using the ExPASy server to compute various physicochemical properties such as instability index, energy values, estimated half-life and GRAVY (Grand Average of Hydropathy).
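GRAVY is the mean Kyte-Doolittle hydropathy value over all residues of a sequence. As a minimal sketch of that one property (the function name `gravy` is illustrative, and the ProtParam server remains the tool actually used in this study), it can be computed as follows:

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: sum of residue values / sequence length."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)
```

A negative GRAVY indicates an overall hydrophilic sequence, which is what one would expect for a lysine- and arginine-rich NLS tag.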

RecAIF-Importin docking:
To facilitate the nuclear transport of the RecAIF protein, it should form a stable complex with Importin α, which carries the protein across the nuclear pore complex. Therefore, the interaction between the NLS of the recombinant protein and its corresponding binding site on Importin α was studied. The NLS binding site is present in Importin α from position 142 to 238. ClusPro [31-35] was used to perform protein-protein docking.

Result and Discussion: cNLS mapping:
The NLS mapper revealed an NLS in the AIF protein from position 26 to 56. This NLS was chosen for replacement with the NLSs obtained from the Nuclear Protein Database because a Mitochondrial Localization Signal (MLS) is present in the protein from position 1 to 30 [6], and replacing residues 26 to 56 disrupts the MLS. This should enhance the prospects of the protein localizing to the nucleus.

List of NLS sites:
The search for proteins with NLS sites in the Nuclear Protein Database resulted in 16 proteins having 24 NLS sites with a score of 15 or greater. The proteins "Histone-lysine N-methyltransferase" and "NUT family member 1" have 3 NLS sites each. Four proteins have 2 NLS sites and the remaining proteins have only one NLS site, giving a total of 24 NLS sites. The NLS scores range from 15 to 24. NLSs from "NUT family member 1" had the two highest NLS scores, 24 and 21.6 (table 1). The identified NLS sites were conjugated with the target protein at positions 26 to 56, thus replacing the AIF protein's N-terminal NLS and interrupting its MLS.
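The score-based selection can be illustrated with a short Python sketch. It assumes the predicted sites are available as (protein, score) pairs and that the cut-off is inclusive (scores of 15 are retained, consistent with the reported 15 to 24 range); the function name and the two "hypothetical protein" records are made up for illustration, and only the NUT family member 1 scores are taken from the text.

```python
from collections import defaultdict


def strong_nls_sites(records, cutoff=15.0):
    """Group NLS sites scoring at or above `cutoff` by protein name."""
    by_protein = defaultdict(list)
    for protein, score in records:
        if score >= cutoff:
            by_protein[protein].append(score)
    return dict(by_protein)


records = [
    ("NUT family member 1", 24.0),     # score quoted in the text
    ("NUT family member 1", 21.6),     # score quoted in the text
    ("hypothetical protein X", 9.5),   # made-up: passes 8, fails 15
    ("hypothetical protein Y", 15.0),  # made-up: exactly at the cut-off
]

strong = strong_nls_sites(records)
total_sites = sum(len(scores) for scores in strong.values())

# Consistency check of the reported tally: 2 proteins with 3 sites,
# 4 proteins with 2 sites, and the remaining 10 of 16 with 1 site.
reported_total = 2 * 3 + 4 * 2 + (16 - 2 - 4) * 1
```

The final expression reproduces the 24 sites reported for the 16 proteins.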

Model Validation:
Models for all the conjugated proteins were generated and subjected to Ramachandran plot and physicochemical property analysis. The most important criteria for selecting a recombinant protein model are the RMSD value, the C-score, the energy value and the instability index. The RMSD and C-score are reported with the modelled structures, and the remaining properties were obtained from the ProtParam analysis, which also reports GRAVY [36].

RC plot Analysis:
Model validation was carried out using the Ramachandran plot obtained from the RAMPAGE server. The cut-off is that at least 90% of the recombinant protein's residues should lie within the favoured and allowed regions. Barring model ID 7.2, all models met this cut-off, i.e., >90% of their residues lie within the allowed/favoured regions, implying that the phi and psi dihedral angles of the backbone are free of steric hindrance. This removes an obstacle that could otherwise hinder protein-protein docking, validates the structure based on the position of its residues and indicates the possible conformations of the phi and psi angles for the amino acid residues of the protein. No model had >10% of its residues in the outlier region, implying that all the models obtained had high viability and passed the validation.
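The 90% cut-off amounts to a simple fraction test. The sketch below assumes that per-model residue counts for the favoured, allowed and outlier regions are available from a Ramachandran analysis; the function name and the example counts in the usage note are hypothetical and are not taken from the study's tables.

```python
def passes_ramachandran(favoured: int, allowed: int, outlier: int,
                        cutoff: float = 0.90) -> bool:
    """True if the favoured + allowed fraction of residues meets the cut-off."""
    total = favoured + allowed + outlier
    if total == 0:
        raise ValueError("no residues counted")
    return (favoured + allowed) / total >= cutoff
```

For instance, a hypothetical model with 450 favoured, 60 allowed and 40 outlier residues has 510 of 550 residues (about 92.7%) in the favoured and allowed regions and passes, whereas one with 400, 50 and 100 (about 81.8%) does not.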

C-Score and RMSD Validation:
The C-score is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score should be in the range [-5, 2] for the model to be acceptable. As seen from table 2, all the models satisfy this criterion, with C-scores ranging from -2.02 to -1.14. The RMSD values of all the structures in table 2 (≈ 10±5 Å) are greater than the accepted 2.5±1 Å, suggesting that the recombinant AIF protein may adopt a fold different from that of the native AIF.

Energy of the Models:
The energy of the models ranges from -16630.957 to -5429.86 kJ/mol.

Validation by Instability Index:
At first glance, the model containing the NLS from NUT family member 1 seems superior because (1) it has a lower energy score and (2) it has a higher-scoring NLS sequence. However, on checking the instability indices of all the recombinant models, the model with the lowest instability index by quite a margin is model 16. Thus, recombinant model 16 was selected; it contains the NLS sequence KKKKKRKMVNDAEPDTKKAKTE from the protein ATP-dependent RNA helicase DDX18. This model was chosen because (1) it had an acceptable C-score of -1.55, (2) it had the second-lowest energy value, -15068.126, (3) it had the lowest instability index, 46.43, close to that of the native model, and (4) in the RAMPAGE analysis it had the highest number of residues in the favoured region and the second-lowest number of residues in the outlier region (490 and 38, respectively) of the 24 models generated.
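One way to make this selection reproducible is to express it as an explicit rule. The sketch below is an assumption about how the stated criteria could be combined, not the procedure actually followed: it keeps models whose C-score lies in [-5, 2] and then picks the lowest instability index, using energy only as a tie-breaker. The values for model 16 are those quoted in the text; the two "illustrative" models and all names in the code are made up.

```python
from typing import NamedTuple, Sequence


class Model(NamedTuple):
    name: str
    c_score: float
    energy: float             # kJ/mol
    instability_index: float


def select_model(models: Sequence[Model]) -> Model:
    """Pick the acceptable model with the lowest instability index."""
    acceptable = [m for m in models if -5.0 <= m.c_score <= 2.0]
    if not acceptable:
        raise ValueError("no model has an acceptable C-score")
    # Lowest instability index first; lower energy breaks ties.
    return min(acceptable, key=lambda m: (m.instability_index, m.energy))


models = [
    Model("model 16 (DDX18 NLS)", -1.55, -15068.126, 46.43),  # from the text
    Model("illustrative model A", -1.20, -16000.0, 55.0),     # made-up values
    Model("illustrative model B", -6.00, -9000.0, 40.0),      # made-up; C-score out of range
]

best = select_model(models)
```

With these inputs the rule returns model 16: model B is excluded by its C-score, and model A has a lower energy but a higher instability index.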

Gene Ontology annotations:
The GO terms for this recombinant protein appear promising: "Nucleotide binding" is one of its molecular functions, which is an important factor for the apoptotic activity of AIF. The biological process term "Establishment of localization" suggests that the protein will be localized in accordance with the signal peptide it carries (the NLS in this case), and the biological process term "Response to chemical stimulus" suggests that the localization process would alter the state of the cell (apoptosis in this case).

Importin and Recombinant AIF Protein Docking:
Protein-protein interactions play a vital role in the structural and functional organization of the cell, and studying them gives a better understanding of cell processes such as metabolic control, signal transduction and gene regulation. Molecular modeling approaches can be used to understand the details of protein-protein interactions at the atomic level. The ClusPro server, which uses an FFT-based algorithm, was used to study the interaction between the two proteins. ClusPro clusters and filters the docked complexes. In total, 110 clusters were generated by four methods, namely (i) balanced, (ii) electrostatic-favoured, (iii) hydrophobic-favoured and (iv) VdW+Elec, which gave 29, 29, 23 and 29 clusters, respectively. The lowest energy values of the docked AIF-Importin complex from the balanced, electrostatic-favoured, hydrophobic-favoured and VdW+Elec methods were -998.3, -111.6, -1219.2 and -317.3, respectively.
The best-docked complexes were analyzed manually to identify the possible interaction sites. The residues A222, C223, G224, E180, W184 and R227 were found to be involved in the binding interaction. The complex 002.29 has the major group of interacting residues from the NLS binding site. The residues H177, E180, S219, L221, A222, C223, G224 and Y225 are known to be part of the NLS binding site. The surface model of Importin and Recombinant AIF is shown in figure 3. The binding pocket and NLS site residues are shown in yellow, and the details of the interacting residues are given in cartoon representation in figure 4. The total accessible surface areas of the interacting residues between Importin and Recombinant AIF are given in table 4. All values in table 4 (except for ratios) are in Å². Out and In refer to exposed/buried residues.
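The residue lists reported here can be cross-checked with a few lines of Python. This is only a bookkeeping sketch using the residue labels, the 142 to 238 binding-site range and the cluster counts stated in the text; the variable and function names are illustrative.

```python
# Residues reported as involved in the binding interaction.
interacting = ["A222", "C223", "G224", "E180", "W184", "R227"]

# Residues reported as part of the NLS binding site of Importin α.
nls_site = ["H177", "E180", "S219", "L221", "A222", "C223", "G224", "Y225"]


def position(residue: str) -> int:
    """Numeric position from a label such as 'A222' (one-letter code + number)."""
    return int(residue[1:])


# Interacting residues that are also listed NLS binding site residues.
shared = sorted(set(interacting) & set(nls_site), key=position)

# Interacting residues that fall inside the 142-238 binding-site range.
in_range = [r for r in interacting if 142 <= position(r) <= 238]

# Cluster counts per ClusPro scoring scheme, as reported.
cluster_counts = {
    "balanced": 29,
    "electrostatic-favoured": 29,
    "hydrophobic-favoured": 23,
    "VdW+Elec": 29,
}
total_clusters = sum(cluster_counts.values())
```

Four of the six interacting residues (E180, A222, C223 and G224) appear in the listed NLS binding site, all six lie within positions 142 to 238, and the cluster counts sum to the reported 110.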

Contribution:
The interaction within the AIF-Importin complex was enhanced by the addition of an NLS from ATP-dependent RNA helicase, thus favouring nuclear translocation over mitochondrial translocation.

Conclusion:
From the above results, it is evident that the recombinant protein arising from the fusion of the NLS of the ATP-dependent RNA helicase to the AIF protein has comparatively low values for instability index, RMSD and energy. The GO biological process term "Establishment of localization" suggests that the protein will be localized to the nucleus owing to the NLS tag it carries. Protein-protein docking also shows that the NLS in the recombinant AIF protein interacts with its binding site on Importin α. Therefore, it can be concluded that the NLS of the protein ATP-dependent RNA helicase is the best NLS tag for the AIF protein to enhance its interaction with