Insight into gene fusion from molecular dynamics simulation of fused and un-fused IGPS (Imidazole Glycerol Phosphate Synthetase).

Gene fusion produces proteins with novel structural architectures during evolution. Recent comparative genome analyses show several cases of fusion/fission across distant phylogenies. However, the selection forces driving gene fusion are not fully understood owing to the lack of structural, dynamics and kinetics data. The Protein Data Bank (PDB) contains only a limited number of structural pairs describing fused and un-fused orthologs. Nonetheless, we identified a pair of IGPS (imidazole glycerol phosphate synthetase) structures, comprising a HisH glutaminase unit and a HisF cyclase unit, from S. cerevisiae (SC) and T. thermophilus (TT). HisF and HisH are domains of a single chain in SC and separate subunits in TT; hence, they are fused in SC and un-fused in TT. Consequently, HisF and HisH form a domain-domain interface in SC and a subunit-subunit interface in TT. Our aim is to document the differences in structure and dynamics between fused and un-fused IGPS. We therefore probed the structures of fused IGPS from SC and un-fused IGPS from TT using 5 ns molecular dynamics simulations. The simulations show that fused IGPS in SC has a larger HisF-HisH interface area and a greater radius of gyration than un-fused IGPS in TT. These structural features demonstrate, for the first time, the evolutionary advantage of generating proteins with novel structural architectures through gene fusion.


Background:
Gene fusion generates proteins with novel structural architectures in one species relative to another. [1, 2] Proteome-wide comparative analyses within and across kingdoms have revealed a large number of fused structures. [3] Proteins created by gene fusion have been shown to play enhanced roles in pathways (Yanai et al. [4]), to mimic interactions between protein subunits (Marcotte et al. [5]), to acquire novel functions (Long [6]), to show enhanced substrate specificity (Katzen et al. [7]) and to confer enzyme multifunctionality (Berthonneau and Mirande [8]). These reports document several isolated cases of proteins fused by gene fusion during evolutionary history. However, the advantage (in structure, dynamics and kinetics) of producing fused proteins in one species over un-fused orthologs in another species is not fully understood.

Molecular dynamics simulation:
All molecular mechanics calculations were carried out using the TRIPOS force field [15] in SYBYL (Molecular Modeling Software Package, Version 6.8, Tripos Associates Inc.) running on a Silicon Graphics workstation. The force-field energy was defined as the sum of six contributions: bond stretching, angle bending, torsion, van der Waals, electrostatic and planarity (for aromatic conjugated systems). The potential energy of the system was minimized using the Simplex algorithm and the Powell torsional gradient algorithm as implemented in SYBYL, terminating when the energy gradient fell below 0.5 kcal/(mol·Å). A distance-dependent dielectric constant of 1.0 was used to compute electrostatic effects. The non-bonded cutoff distance was 8 Å, and the net atomic charges of the residues were calculated by the Gasteiger-Hückel method. [16,17] The system was simulated in vacuo in the canonical ensemble (constant number of particles, volume and temperature; NVT) at 300 K with a temperature coupling constant of 100 fs. Initial atomic velocities were assigned from a Maxwell-Boltzmann distribution and scaled to the target temperature. The non-bonded pair list was updated every 25 fs, with an 8 Å cutoff. The integration time step was 1 fs, and snapshots were saved every 1000 steps (1 ps). A total of 5000 structures were thus generated over the 5 ns simulation, and the simulation properties were derived from analyses of these snapshots.
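To make some of the settings above concrete, the following minimal Python sketch encodes three of them: a pairwise Coulomb term with a distance-dependent dielectric ε(r) = ε₀·r (ε₀ = 1.0) truncated at the 8 Å non-bonded cutoff, a velocity scaling factor for coupling to a 300 K bath with a 100 fs coupling constant and a 1 fs step, and the bookkeeping that yields 5000 snapshots. It assumes the coupling constant refers to a Berendsen-type weak-coupling thermostat and uses the common 332.0636 kcal·Å/(mol·e²) Coulomb conversion factor; these are assumptions, not a description of SYBYL's internal implementation, and the function names are hypothetical.

```python
import math

# Converts q_i*q_j/r (charges in e, distance in Å) to kcal/mol.
COULOMB_KCAL = 332.0636

def coulomb_distance_dependent(q_i, q_j, r, eps0=1.0, cutoff=8.0):
    """Pair electrostatic energy (kcal/mol) with dielectric eps(r) = eps0 * r.

    Pairs beyond the non-bonded cutoff (Å) contribute nothing; plain truncation
    without a switching function is an assumption of this sketch.
    """
    if r > cutoff:
        return 0.0
    return COULOMB_KCAL * q_i * q_j / (eps0 * r * r)

def berendsen_scale(t_current, t_target=300.0, dt_fs=1.0, tau_fs=100.0):
    """Velocity scaling factor for Berendsen weak coupling (assumed thermostat)."""
    return math.sqrt(1.0 + (dt_fs / tau_fs) * (t_target / t_current - 1.0))

# Trajectory bookkeeping: 5 ns at a 1 fs time step, saving every 1000 steps.
TOTAL_TIME_FS = 5_000_000
TIMESTEP_FS = 1
SAVE_EVERY = 1000
n_steps = TOTAL_TIME_FS // TIMESTEP_FS
n_snapshots = n_steps // SAVE_EVERY

if __name__ == "__main__":
    print(coulomb_distance_dependent(0.5, -0.5, 4.0))  # negative: attractive pair
    print(coulomb_distance_dependent(0.5, -0.5, 9.0))  # beyond cutoff: 0.0
    print(berendsen_scale(310.0))  # below 1: too hot, velocities scaled down
    print(n_steps, n_snapshots)    # 5000000 5000
```

With a distance-dependent dielectric the electrostatic energy falls off as 1/r² rather than 1/r, which partly mimics solvent screening in an in vacuo simulation.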

Analysis:
We performed a comprehensive analysis of the structures in each trajectory to detect structural differences between the two simulated systems. The flexibility of the structures was assessed by computing the gap volume, gap index, interface area and radius of gyration. Figure 2 illustrates the fused and un-fused IGPS structures from SC and TT, respectively. A short linker connects HisH (glutaminase) and HisF (cyclase) in SC, so IGPS is fused in SC. However, this linker is absent in TT, where HisH and HisF are separate subunits. Figure 3 shows structural snapshots of TT IGPS and SC IGPS at 0 and 5 ns of simulation, together with the HisH-HisF interface in each. The linker connecting HisH and HisF in SC is labeled; it has no counterpart in TT. Thus, the interface is formed by the HisH and HisF domains in SC and by the HisH and HisF subunits in TT. This represents an evolutionary transition from a subunit-subunit interface in TT to a domain-domain interface in SC.
Figure 4 shows the interface area in TT IGPS and SC IGPS for structures generated over the 5 ns simulation. The interface area is the change in solvent accessible surface area upon interface formation between HisH and HisF, calculated using NACCESS, which implements the Lee and Richards algorithm. [18] The interface area between HisH and HisF is substantially larger (> 1000 Å²) in fused SC IGPS than in un-fused TT IGPS throughout the simulation period. The data also show that the interface residues of HisF and HisH are more conserved between TT and SC than the surface residues. HisF has 113 conserved residues: 95 of them (14 + 20 + 61) lie in the same structural region in both structures, and the remaining 18 are located in different regions (interior, interface or surface) in the TT and SC structures. The same holds for the HisH structures from TT and SC.
Figure 4: Interface area between HisH and HisF for IGPS from SC and TT over a 5 ns molecular dynamics simulation. The domain-domain interface area in SC is larger than the subunit-subunit interface area in TT throughout the simulation period.
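NACCESS computes solvent accessible surface area (SASA) with the Lee and Richards algorithm. As a self-contained illustration of the interface-area definition used here (the change in SASA upon complex formation), the sketch below swaps in the simpler Shrake-Rupley point-sampling method, which approximates the same surface; it is not NACCESS. Atoms are assumed to be (x, y, z, radius) tuples in Å, and the 1.4 Å probe radius, the point count and all function names are assumptions. Some studies report half of this buried area as the interface area per partner; the sketch returns the total.

```python
import math

PROBE_RADIUS = 1.4  # water probe radius in Å (common default; assumption)

def sphere_points(n):
    """Approximately uniform points on the unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        y = 1.0 - 2.0 * (k + 0.5) / n  # heights evenly spaced in (-1, 1)
        ring = math.sqrt(1.0 - y * y)   # radius of the horizontal circle at height y
        theta = golden_angle * k
        points.append((ring * math.cos(theta), y, ring * math.sin(theta)))
    return points

def sasa(atoms, n_points=960, probe=PROBE_RADIUS):
    """Shrake-Rupley SASA (Å^2); atoms is a list of (x, y, z, radius) in Å."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of atom i's probe-expanded sphere
        # Only atoms whose expanded spheres intersect atom i's can bury its points.
        neighbours = [
            (xj, yj, zj, (rj + probe) ** 2)
            for j, (xj, yj, zj, rj) in enumerate(atoms)
            if j != i and math.dist((xi, yi, zi), (xj, yj, zj)) < big_r + rj + probe
        ]
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A test point is accessible if it lies outside every neighbour's expanded sphere.
            if all((px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= r2
                   for xj, yj, zj, r2 in neighbours):
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n_points
    return total

def interface_area(part_a, part_b, **kwargs):
    """Buried SASA on complex formation: ASA(A) + ASA(B) - ASA(A and B together)."""
    return sasa(part_a, **kwargs) + sasa(part_b, **kwargs) - sasa(part_a + part_b, **kwargs)

if __name__ == "__main__":
    # Toy single-atom "partners" (hypothetical coordinates, not IGPS data).
    part_h = [(0.0, 0.0, 0.0, 1.8)]
    part_f = [(3.0, 0.0, 0.0, 1.8)]
    print(interface_area(part_h, part_f))  # positive: expanded spheres overlap
```

The all-pairs neighbour search is quadratic in the number of atoms, which is fine for illustration but slow for a full IGPS complex; production tools use spatial grids or cell lists.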
Figure 5 shows the gap volume (calculated using SURFNET [19]) between HisH and HisF in SC IGPS and TT IGPS for structures generated over the 5 ns simulation. As with the interface area, the gap volume is consistently larger in SC IGPS than in TT IGPS throughout the simulation period.
Figure 5: Gap volume between HisH and HisF for IGPS from SC and TT over a 5 ns molecular dynamics simulation. The domain-domain gap volume in SC is larger than the subunit-subunit gap volume in TT throughout the simulation period.
Figure 6 shows the gap index (ratio of gap volume to interface area) between HisH and HisF in SC IGPS and TT IGPS for structures generated over the 5 ns simulation. Unlike the interface area and gap volume, the gap index remains similar for SC and TT throughout the simulation period.
Figure 6: Gap index (ratio of gap volume to interface area) between HisH and HisF for IGPS from SC and TT over a 5 ns molecular dynamics simulation. The gap index is similar for the HisH-HisF interfaces from SC and TT.
Figure 7 shows the radius of gyration of SC IGPS and TT IGPS for structures generated over the 5 ns simulation. As with the interface area and gap volume, the radius of gyration of SC IGPS is considerably larger than that of TT IGPS throughout the simulation period. These observations underline the importance of documenting the selection forces that generate proteins with fused structural architectures; structural evidence on the dynamics of such fused structures in the evolution of orthologous proteins has so far been lacking.
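The gap index combines the gap volume and interface area plotted in Figures 5 and 4. A minimal sketch of the per-snapshot calculation and its trajectory summary follows; the inputs are assumed to hold one gap volume (Å³, e.g. from SURFNET) and one interface area (Å², e.g. from NACCESS) per saved snapshot, and the function names are hypothetical. Because it is a volume divided by an area, the gap index has units of Å.

```python
from statistics import mean, pstdev

def gap_index_series(gap_volumes, interface_areas):
    """Per-snapshot gap index (Å) = gap volume (Å^3) / interface area (Å^2)."""
    if len(gap_volumes) != len(interface_areas):
        raise ValueError("need one gap volume and one interface area per snapshot")
    return [v / a for v, a in zip(gap_volumes, interface_areas)]

def summarize(series):
    """Trajectory mean and population standard deviation of a per-snapshot series."""
    return mean(series), pstdev(series)

if __name__ == "__main__":
    # Toy values only (not data from this study): volume and area grow together,
    # so the gap index stays constant.
    gi = gap_index_series([3000.0, 3300.0], [1000.0, 1100.0])
    print(gi)             # [3.0, 3.0]
    print(summarize(gi))  # (3.0, 0.0)
```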

Results:
The interface residues between HisF and HisH in TT and SC are more conserved than the surface residues (Table 1). The similarity of the interface residues implies conservation of catalytic function at the interface. The structural properties of IGPS from TT and SC are given for the initial and final structures (Table 2). The interface area, gap volume and gap index are greater in SC than in TT in both the initial and final structures, and all three values increased during the simulation in both SC and TT. However, the radius of gyration of TT is larger than that of SC in the initial structure but not in the final structure (Table 2). Interestingly, the radius of gyration increased in SC and decreased in TT during the simulation.
The results in Figures 3 to 7 illustrate the structural dynamics of fused IGPS in SC compared with un-fused IGPS in TT. IGPS in SC forms a domain-domain interface between HisH and HisF, whereas IGPS in TT forms a subunit-subunit interface. This transition from a subunit-subunit interface in TT to a domain-domain interface in SC is of particular interest. The domain-domain interface area in SC remains larger than the subunit-subunit interface area in TT over the 5 ns molecular dynamics simulation, exceeding it by 1400 Å² (Figure 4). Because the size of the interface determines the extent of atomic interactions across it, the larger HisH-HisF interface in SC implies more extensive interaction between these two domains than between the corresponding subunits in TT. Better interaction between HisF and HisH facilitates greater stability and improved kinetics in SC. This is assisted largely by the linker segment connecting the HisF and HisH domains in SC.
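Statements such as "larger throughout the simulation" and "exceeding it by 1400 Å²" correspond to two simple checks on per-snapshot series: whether the SC value exceeds the TT value in every saved frame, and the difference between the trajectory means. A hedged sketch, assuming both series are sampled at the same 1 ps interval and have equal length; the function name is hypothetical, and the same check applies to gap volume or radius of gyration.

```python
from statistics import mean

def compare_series(sc_values, tt_values):
    """Compare one per-snapshot property between SC and TT trajectories.

    Returns (sc_larger_in_every_frame, mean(SC) - mean(TT)).
    """
    if len(sc_values) != len(tt_values):
        raise ValueError("series must cover the same snapshots")
    sc_larger_everywhere = all(s > t for s, t in zip(sc_values, tt_values))
    return sc_larger_everywhere, mean(sc_values) - mean(tt_values)

if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(compare_series([10.0, 12.0], [7.0, 8.0]))  # (True, 3.5)
```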
The gap volume between the HisF and HisH domains of SC IGPS is larger than that between the HisF and HisH subunits of TT IGPS (Figure 5). The increased gap volume in SC IGPS may aid substrate flow into the active sites formed by the HisH and HisF domains. In TT IGPS, by contrast, substrate flow is relatively restricted in exchange for the interface stability provided by subunit interaction. The larger gap volume in SC IGPS is partly helped by the linker between HisH and HisF, which gives these two domains greater flexibility. Interestingly, the larger gap volume in SC IGPS is not reflected in the gap index (ratio of gap volume to interface area), which is similar for SC IGPS and TT IGPS (Figure 6). This suggests that the larger gap volume in SC is proportional to its larger interface area relative to TT.
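The proportionality argument above can be written out explicitly. Writing G for the gap index, V for the gap volume and A for the interface area (symbols introduced here for convenience), the ratio of the two gap indices factors into a volume ratio over an area ratio, so similar gap indices imply that the two ratios are similar:

```latex
G = \frac{V}{A}, \qquad
\frac{G_{\mathrm{SC}}}{G_{\mathrm{TT}}}
  = \frac{V_{\mathrm{SC}} / A_{\mathrm{SC}}}{V_{\mathrm{TT}} / A_{\mathrm{TT}}}
  = \frac{V_{\mathrm{SC}} / V_{\mathrm{TT}}}{A_{\mathrm{SC}} / A_{\mathrm{TT}}}
\quad\Longrightarrow\quad
G_{\mathrm{SC}} \approx G_{\mathrm{TT}}
  \iff \frac{V_{\mathrm{SC}}}{V_{\mathrm{TT}}} \approx \frac{A_{\mathrm{SC}}}{A_{\mathrm{TT}}}
```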
The radius of gyration of a protein is a measure of its size and reflects its compactness. The radius of gyration of IGPS from SC and TT (Figure 7) describes the unfolding of the structures during the simulation. The flexibility conferred by the linker between HisF and HisH in SC IGPS is reflected in its larger radius of gyration compared with TT IGPS throughout the 5 ns simulation period. The average radius of gyration of SC IGPS exceeds that of TT IGPS by about 1.76 Å. This provides an explanation for the increased stability, leading to improved kinetics, that the linker confers on the fused structure of SC IGPS.
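For reference, a minimal sketch of the mass-weighted radius of gyration of a single snapshot, Rg = sqrt(Σ mᵢ|rᵢ − r_cm|² / Σ mᵢ), is given below. The atom selection and weighting used in the original analysis are not stated; the sketch assumes coordinates in Å, optional per-atom masses (uniform weights otherwise), and a hypothetical function name. Averaging its output over the 5000 snapshots of each trajectory gives the kind of average values compared above.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration (Å) of one snapshot.

    coords: sequence of (x, y, z) in Å; masses: optional per-atom masses
    (uniform weights if omitted).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Centre of mass.
    cx = sum(m * x for m, (x, _, _) in zip(masses, coords)) / total
    cy = sum(m * y for m, (_, y, _) in zip(masses, coords)) / total
    cz = sum(m * z for m, (_, _, z) in zip(masses, coords)) / total
    # Mass-weighted mean squared distance from the centre of mass.
    sq = sum(m * ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
             for m, (x, y, z) in zip(masses, coords))
    return math.sqrt(sq / total)

if __name__ == "__main__":
    # Two unit-weight toy points 2 Å apart: each lies 1 Å from the centre.
    print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
```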
The rise and fall of the interface area, gap volume and gap index in TT during the simulation is unusual. This may be due to large interface movements between the weakly associated subunits. We propose that gene fusion is driven by structural determinants that provide increased stability, dynamics and kinetics and that are favored during evolutionary selection. This is supported by the structure and dynamics of IGPS in SC and TT, as described by the interface area, gap volume and radius of gyration.

Conclusion:
A number of fusion proteins have been identified by comparative genome analysis using sequence comparison. This suggests that gene fusion is common in evolutionary phylogeny. However, the selection force driving gene fusion in organism