The RAS subfamily Evolution – tracing evolution for its utmost exploitation

In the development of multicellularity, signaling proteins has played a very important role. Among them, RAS family is one of the most widely studied protein family. However, evolutionary analysis has been carried out mainly on super family level leaving sub family information in scanty. Thus, a subfamily evolutionary study on RAS evolutionary expansion is imperative as it will aid in better drug designing against dreadful diseases like Cancer and other developmental diseases. The present study was aimed to understand RAS evolution on both holistic as well as reductive level. All human RAS family genes and protein were subjected to BLAST tools to find orthologs and paralogs with different parameters followed by phylogenetic tree generation. Our results clearly showed that H-RAS is the most primitive RAS in higher eukaryotes and then diverged into other RAS family members due to different gene modification events. Furthermore, a site specific selection pressure analysis was carried out using SELECTON server which showed that H-RAS, M-RAS and N-RAS are evolving faster than K-RAS and R-RAS. Thus, the results ascertain a new ground to cancer biologists to exploit negatively selected K-RAS and R-RAS as potent drug targets in cancer therapeutics.


Background:
Development of multicellularity requires a sophisticated but robust machinery to communicate among different cells.In this machinery, the sensing unit is usually activated by ligandreceptor interaction, generating specific effects mostly in gene transcription, morphology, motility, adhesion and endo/exocytosis [1].Ras (Rat sarcoma) proteins family comprises of highly conserved molecular switches involved in controlling canonical process like homeostasis, cell growth, differentiation and apoptosis [2].These switches works by altering their conformation to Guanosine triphosphate (GTP) bound active mode to Guanosine diphosphate (GDP) bound inactive mode [3].Their significance in human life form can easily be assessed by the fact that nearly in 25 -30% of human tumors, RAS mutations are present [2,4].Its role in cancer development has raised this promising protein family as one of the most lucrative and extensively studies cancer drug target for nearly three decades with more than 40,000 research articles published on it till year 2011 [5].However, comparatively fewer efforts have been made to decipher its evolutionary history creating a huge lacuna in its utmost exploitation.Thus, an indepth evolutionary study will broaden the scope of understanding of this particular target and will aid in designing better drugs against different menacingly dangerous cancers and developmental diseases.
Ras family consists of key regulatory proteins mainly conserved in eukaryotes [3].Despite of early reports on its evolutionary presence in lower eukaryotes and prokaryotes by Robins et al. in the year 1990 [6] and Dong et al. in 2007 [7], a comprehensive route to its evolutionary expansion in higher and complex eukaryotes was still missing.To facilitate the understanding of higher eukaryotic evolution, RAS evolutionary study was imperative.Ras evolutionary canvas can be drawn in two levels viz., Ras Superfamily and Subfamily.The Ras superfamily consists of five major families i.e., Ras, Rho, Arf/Sar, Ran, and Rab.However, to best of our knowledge, evolutionary studies have been conducted on RAS superfamily [3, 7, 8, 9], Rho subfamily [1] and Rab subfamily [10, 11] only, leaving rest of the family members still untouched.The present study is an endeavor to understand RAS subfamily evolutionary history and sequence structure relationship to an unprecedented indepth level.The aim of this study was to decipher the evolutionary path followed by individual RAS subfamily members viz., K-Ras, H-Ras, M-Ras, N-Ras and R-Ras [4] during the course of evolution on both sequence and structural level.

Homolog (Ortholog and Paralog) Searching
A homolog searching was carried out for all the RAS family members.A rigid parameter was set to get only the true positive results.The searching was done against Non Redundant Database by using NCBI's BLAST tools, blastn and blastp [12].
All the subject sequences which were synthetic constructs and sequences that aligned to the queries with e-value more than 0, identities less than 100 % and coverage less than 90% were eliminated.Apart from tracing the distribution of RAS family among the different biodiversity, our endeavor was also to understand the extent of evolution in individual genomes.Therefore, a paralog sequence search was conducted against all the homologous genomes already found out through previous analysis.However, the analysis was done with flexible parameters i.e., all the sequences with their identity more than 90% to the query and spanned more than 90% of the query were considered as paralogous sequences.The flexibility in the parameters was introduced to extract all the false negative sequence from the entire raw data.

Multiple Sequence Alignment, Structure Prediction and Validation
Multiple sequence alignment was carried out through guide tree approach using windows ClustalX program [13] on both gene and protein level.To leverage the potentiality of protein structures on tracing evolutionary level, 3D structure of all the longest isoform of each individual Human RAS family members were predicted using Modeller software [14], except for RRAS1 which was modeled by I-TASSER server using ab-initio methodology [15] due to its high degree of divergence from other RAS subfamily members.The structural models were further evaluated using Structural Analysis and Verification Server's PROCHECK tool which checks stereochemical quality of a protein structure utilizing Ramachandran plot [16].The best models from each individual isoforms were taken under consideration for further analysis.

Phylogenetic tree generation and Selection Pressure analysis
Topali software [17]  However, the rigid parameters also removed false negative data from results i.e., due to long evolutionary distance and time, the RAS eukaryotic forms substantially diverged from lower organisms showing no significant similarity on gene and protein level for entire stretch of sequence.Therefore, all the prokaryotic and lower eukaryotic RAS family members were subjected to BLAST all-to-all method.Surprisingly even in lower eukaryotes and prokaryotes, no significant similarity for entire stretch of sequences was found indicating a long evolutionary distance and many parallel evolutionary events among lower organisms.Thus to decipher the evolutionary path from prokaryotes to eukaryotes, gene and protein sequences were utilized to draw phylogenetic tree using neighbor joining approach followed by clustering (Figure 1 & Figure 2).
From both the gene and protein tree, it was evident that it was H-RAS which came from prokaryotes to eukaryotes which eventually due to evolutionary gene modification events like gene duplication, bifurcated among different RAS forms present today in just higher eukaryotes.The study clearly showed that HRAS subfamily has further evolved, diverged into NRAS on one cluster and KRAS, MRAS and RRAS in the other cluster/sub clusters.To assess the different selection pressures on all the sites of all RAS forms, SELECTON server was used.
All the protein structures of longest RAS forms were modeled through homology modeling except R-RAS1 which was modeled by I-TASSER server using ab-initio methodology Table 3 (see supplementary material) and then were subjected to site specific selection pressure analysis using SELECTON server Table 4 (see supplementary material) [18].While considering the distribution of calculated Ka/Ks ratio (ω), the evaluation of selection pressures indicated that maximum number of the codons in RRAS and KRAS were under purifying selection and HRAS, NRAS and MRAS showed neutral selection.It is also interesting to note the correlation on location of RAS members on the human chromosome arm i.e., the classical RAS members, HRAS (11p15.5),KRAS (12p21.1),NRAS (1p13.2) are located on the short arm (p) of human chromosome whereas MRAS (3q22.3)and RRAS (19q13.3-qter)are located on the long arm (q) of the human chromosome.The phylogenetic tree and selection pressure results showed that HRAS diverged into KRAS and NRAS on the short arm and then a gene duplication event was observed where either of the then present form duplicated to long arm of the human chromosome with sudden evolutionary expansion giving rise to more evolutionary dynamic RRAS and MRAS.It is worth mentioning that RRAS itself comprises of 53.5% of the entire RAS subfamily bio-distribution.The maximum percentage biodistribution density was observed in RRAS2c (24.56) followed by RRAS2b (14.91),HRAS1 (13.16),NRAS (8.78), HRAS2 (7.89), MRAS1 (7.89) and RRAS1 (7.89), RRAS2a (6.14), MRAS2 (3.52) and KRAS1 (2.63) and KRAS2 (2.63).

Conclusion:
Ras family proteins play pivotal role in key regulatory process in eukaryotes.The study on evolution of this family has immense significance in both understanding of multicellularity evolution and cancer biology.From our finding, we speculate and hypothesize that RAS family in higher eukaryotes started with H-RAS and then parallely evolved in two distinct domains viz.K-RAS with N-RAS and M-RAS with R-RAS where N-RAS and M-RAS with their higher divergence rate are involved in ever expanding RAS forms.Moreover evidence of K-RAS and R-RAS being negatively selected opens opportunity to exploit these proteins as preferred targets over other family members.

Figure 1 :
Figure 1: Phylogenetic tree on gene basis

Figure 2 :
Figure 2: Phylogenetic tree on Protein basis

Table 1
The existence and role of RAS family in lower eukaryotic and prokaryotic development are reported early by Dong et al, and Robbins et al. [6, 7, 19].Moreover, Dı´ez et al., showed expansion of RAS family in complex higher order of eukaryotes like humans [3].However, the route followed by this canonical family during the course of evolution was still eluded.This study was carried out to understand the biodiversity of RAS family and how this family might have been derived during evolution.All the form and isoforms of RAS subfamily in human were traced from NCBI gene database using Boolean queries like AND and OR [18]ence and the modeled structures were subjected to SELECTON server which calculated substitution rate ratios of nonsynonymous (Ka) versus synonymous (Ks) mutations.A Ka/Ks ratio greater than 1 suggests positive selection and a ratio less than 1 suggests purifying selection[18].Discussion:

Table 1 :
The gene-protein architecture (forms and isoforms) of all RAS subfamily members i.e., H-Ras, K-Ras, M-Ras, N-Ras and R-Ras in Homo sapiens by generating Boolean queries against NCBI's Gene database

Table 2 :
The Bio-distribution of RAS forms in terms of both orthologs and paralogs generated through BLAST tools.