A phylogenetic analysis of the heparanase (HPSE) gene

The current study aimed to investigate the possible role of the heparanase protein (HPSE-1; NCBI Entrez ref|NP_001092010.1|, heparanase isoform 1 preproprotein [Homo sapiens]) in evolution by studying the phylogenetic relationship and divergence of the HPSE-1 gene using computational methods. HPSE protein sequences from various species were retrieved from the GenBank database and compared by sequence alignment. Multiple sequence alignment was done using ClustalW with default parameters, and phylogenetic trees for the gene were built using the neighbor-joining method as implemented in BLAST version 2.2.26+. A total of 112 BLAST hits were found for the heparanase query sequence, and these hits showed a putative conserved domain of the Glyco_hydro_79n superfamily. We then narrowed down the search by manually removing the proteins that were not HPSE-1. The remaining sequences were subjected to phylogenetic analyses using the PhyML and TreeDyn software. Our study indicated that HPSE-1 is a conserved protein in the classes Mammalia, Aves, Amphibia, Actinopterygii and Insecta, emphasizing its importance in the physiology of cell membranes. The occurrence of this gene across evolution with conserved sites supports a role for the HPSE-1 gene and helps in better understanding the biochemical processes that may lead to cancer.


Background:
Cancer remains an important cause of chronic illness. A better understanding of various genes and proteins and their implications at the cellular and molecular levels helps in identifying appropriate preventive and diagnostic measures. Studies in our lab on heparanase (HPSE-1; NCBI Entrez ref|NP_001092010.1|, heparanase isoform 1 preproprotein [Homo sapiens]) gene polymorphisms associated with cancer have provided a preliminary understanding of the role of this gene in the pathophysiology of cancer (unpublished). Heparanase is a predominant endoglycosidase that cleaves heparan sulfate, a principal polysaccharide of the basement membrane and extracellular matrix. Cleavage of heparan sulfate disrupts the structural integrity of the basement membrane and releases angiogenic and growth-promoting mediators [1, 2]. Heparanase is synthesized as an inactive 65 kDa proheparanase in the Golgi apparatus and later transferred to endosomes/lysosomes for transport to the cell surface [3]. The human heparanase gene (HPSE) is located on chromosome 4 (4q21.3) and is expressed as 5-kb and 1.7-kb mRNAs produced by alternative splicing [4]. The heparanase mRNA encodes a 61.2-kDa protein of 543 amino acids. This proenzyme is then post-translationally cleaved into 8-kDa (Gln36 to Glu109) and 50-kDa (Lys158 to Ile543) subunits that associate noncovalently to form the active heparanase [5, 6]. The formation of this heterodimer is essential for heparanase activity.

Penetration of the endothelial cell layer lining the surface of blood vessels is an important process in tumor metastasis. Heparan sulfate (HS) proteoglycans are important components of this layer [7]. Therefore, increased metastatic potential may correspond to increased heparanase activity [8]. Because of this inherent role of heparanase, inhibition of its activity is a potential target for anti-cancer therapies. Heparanase also elicits an indirect neovascular response by releasing HS-bound angiogenic factors resident in the extracellular matrix [9]. Considering the importance of the HPSE-1 gene in cancer, we aimed to elucidate whether this gene varies among species over the course of evolution by performing a phylogenetic analysis of its published protein sequences.

Methodology: Data Set, Sequence Alignment and Phylogenetic Tree Building
The GenBank database [10] was queried to retrieve all available protein sequences of the heparanase gene. These sequences were saved in FASTA format and then aligned using the Clustal W [11] algorithm with default parameters. The initial first-pass phylogenetic tree was constructed using the Neighbour-Joining

Results:
From the NCBI GenBank database, 112 sequences of heparanase isoform 1 covering the putative conserved Glyco_hydro_79n domain were obtained and used to construct a first-pass phylogenetic tree. However, repetitive sequences and sequences not related to HPSE-1 were deleted. The short-listed sequences mainly belonged to the classes Mammalia, Aves, Amphibia, Actinopterygii and Insecta (lancelets, bony fishes, frogs and toads, marsupials, rodents, primates, even-toed ungulates, carnivores, rabbits and hares, odd-toed ungulates, lizards, birds and placentals). The accession information for these sequences is available in Table 1 (see supplementary material).

Analysis of the sequences revealed a high degree of sequence similarity of the heparanase protein among many of the mammalian species used for the phylogeny reconstruction, implying that this protein is largely conserved in most mammalian species. The two species belonging to class Insecta also share a high degree of sequence similarity with each other, but show a very low degree of sequence similarity with the other classes. Putative conserved domains were observed in many taxa at the glycosyl hydrolase family 79 N-terminal domain, a family of endo-beta-N-glucuronidases (Figure 1) [22]. The actual alignment was detected with superfamily member pfam03662 (E-value: 9.29e-12).

BLAST produced 112 hits (Figure 2); these sequences were screened manually, and only those related to the sequence in question (HPSE-1) from different taxa were retained for further analyses. This produced a total of 31 sequences from various taxa. Multiple sequence alignment results for these short-listed sequences are presented in Figure 3. Using the PhyML program, a tree was constructed for these 31 sequences; the results are presented in Figure 4.
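The degree of sequence similarity referred to above can be illustrated with a percent-identity calculation on two aligned sequences. The following is a minimal sketch in Python, not the procedure used in this study: the function name `percent_identity`, the choice to skip columns containing a gap, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns in which either sequence has a gap are skipped,
    so identity is measured over the ungapped aligned region only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # columns where the two residues are identical
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real heparanase sequences.
    human_like = "MLLRSKPALP"
    mouse_like = "MLLRS-PAMP"
    print(percent_identity(human_like, mouse_like))
```

Other conventions exist, for example dividing by the full alignment length or by the shorter sequence length, and they give different percentages for the same alignment.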
of heparanase protein among various groups of organisms (e.g., species, taxa, phyla). We aimed to uncover the relationships of the protein encoded by the HPSE-1 gene, using computational phylogenetics to identify similar genes in different organisms. This was done with algorithms, methods and programs for phylogenetic analysis that assemble a phylogenetic tree from a set of genes, species, or other taxa [24, 25]. ClustalW was used for multiple sequence alignment, and the neighbor-joining method was used to build trees from the genetic distances calculated from the alignments. The JTT approach is an efficient method for generating mutation data matrices from protein sequences: using a peptide-based sequence comparison algorithm, the sequences are clustered at the 85% identity level, the most closely related pairs of sequences are aligned, and the observed amino acid exchanges are tallied in a matrix [26]. Later, a phylogenetic tree is constructed from closely related sequences [20].

Several studies, and especially the BAliBASE benchmark, showed that MUSCLE achieved the highest ranking of any method at the time of its publication. Gblocks eliminates poorly aligned positions and divergent regions (that is, it removes alignment noise). PhyML was shown to be at least as accurate as other existing phylogeny programs on simulated data, while being one order of magnitude faster. Furthermore, the system offers the possibility to check the results of each step before launching the next program, so that users can modify and adjust parameters for a given task. This is possible by checking the "Step by step" option.

The BLAST algorithm was used for comparing primary biological sequence information; this program uses a heuristic method to find homologous sequences by locating short matches (seeding) and then extending them by local alignment. The main reason for initially using BLAST is its ability to produce high-scoring sequence alignments between the query sequence and database sequences. The BLAST algorithm offers very good speed and relatively good accuracy. However, it is widely held that BLAST should be used only as a first-pass sequence alignment.

The phylogenetic trees constructed in this study indicated that the HPSE protein is conserved and may play an important role in organismal evolution (Figures 3 & 4). It is interesting to note that the conserved regions seen in Homo sapiens are similar to those found in other organisms that have this conserved gene. The presence of heparanase in the major groups of organisms indicates that it is crucial for the development of the physiology of cell membranes. Its high conservation at certain domains indicates that its function is preserved. This gene was absent in lower organisms such as the invertebrates and the prokaryotes.

In conclusion, the evolutionary relationship of the HPSE gene was established based on sequence alignment, conserved sequences and phylogenetic trees. The published protein sequences of this gene are highly conserved, especially at certain domains. Human sequences consistently clustered with their mammalian orthologs, which indicates the importance of this gene in evolution.
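To make the distance-based tree building discussed above more concrete, the following is a minimal sketch of the neighbor-joining algorithm in plain Python. It is illustrative only and is not the implementation used by BLAST or ClustalW; the function name `neighbor_joining`, the internal node labels (`node0`, `node1`, ...) and the small distance matrix are assumptions, and the matrix is a textbook-style toy example rather than heparanase data.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from a symmetric distance matrix.

    names: list of taxon labels
    dist:  square list of lists, dist[i][j] = distance between names[i] and names[j]
    Returns a list of edges (node_u, node_v, branch_length).
    """
    # Copy the matrix into a dict-of-dicts so new internal nodes can be added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        # Pick the pair minimising Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best

        # Create the internal node joining a and b, with its two branch lengths.
        u = f"node{next_id}"
        next_id += 1
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((u, a, len_a))
        edges.append((u, b, len_b))

        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c in (a, b):
                continue
            d_uc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
            d[u][c] = d_uc
            d[c][u] = d_uc

        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Connect the last two remaining nodes.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    # Toy five-taxon distance matrix (hypothetical, for illustration only).
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    for u, v, length in neighbor_joining(taxa, matrix):
        print(u, v, length)
```

Neighbor joining is fast and needs only pairwise distances, which is why it suits a first-pass tree; the maximum likelihood method used for the final tree instead evaluates an explicit substitution model over the full alignment.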
[12] method (maximum sequence difference of 0.85) using Basic Local Alignment Search Tool (BLAST) pairwise alignments between a query and the database sequences searched [13]. The evolutionary distance between two sequences, modeled as the expected fraction of amino acid substitutions per site given the fraction of mismatched amino acids in the aligned region, was computed by the software using the Grishin method [14]. Using the results from BLAST [15], we created a first-pass phylogenetic tree, after which we applied a purpose-built computational phylogenetic pipeline using the Phylogeny.fr software. Sequences were aligned with MUSCLE (v3.7) [16] configured for highest accuracy (MUSCLE with default settings). After alignment, ambiguous regions (i.e., those containing gaps and/or poorly aligned positions) were removed with Gblocks (v0.91b) [17-19]. The phylogenetic tree was reconstructed using the maximum likelihood method implemented in the PhyML program (v3.0 aLRT) [15-20]. The Jones-Taylor-Thornton (JTT) substitution model was selected, assuming an estimated proportion of invariant sites (0.108) and 4 gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data (gamma = 1.163). Reliability of internal branches was assessed using the aLRT test (SH-like). Graphical representation and editing of the phylogenetic tree were performed with TreeDyn (v198.3) [21].
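The distance correction mentioned above, from the fraction of mismatched amino acids to the expected number of substitutions per site, can be sketched numerically. The example below assumes the relation q = ln(1 + 2d) / (2d) attributed to Grishin, where q = 1 - p is the fraction of identical aligned residues and d is the number of substitutions per site; whether the BLAST tree software uses exactly this variant is an assumption here, and the function name `grishin_distance` is hypothetical.

```python
import math


def grishin_distance(p, tol=1e-12):
    """Estimate substitutions per site d from mismatch fraction p.

    Assumption: d solves  1 - p = ln(1 + 2d) / (2d).
    The equation has no closed form, so it is solved by bisection.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if p == 0.0:
        return 0.0

    q = 1.0 - p  # fraction of identical aligned residues

    def expected_identity(d):
        # Decreases monotonically from 1 (d -> 0) towards 0 (d -> infinity).
        return math.log1p(2.0 * d) / (2.0 * d)

    # Grow the upper bound until it brackets the root.
    lo, hi = 0.0, 1.0
    while expected_identity(hi) > q:
        hi *= 2.0

    # Bisection on the bracket [lo, hi].
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_identity(mid) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Mismatch fractions up to the 0.85 maximum sequence difference.
    for p in (0.1, 0.3, 0.5, 0.85):
        print(p, grishin_distance(p))
```

The corrected distance is always larger than the raw mismatch fraction, because multiple substitutions at the same site are hidden in a pairwise comparison, and the gap widens as the sequences diverge.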

Figure 1: Putative sequence of heparanase in superfamilies. The actual alignment was detected with superfamily member pfam03662 (Cd Length: 320; Bit Score: 65.50; E-value: 9.29e-12).

Figure 2: First-pass phylogenetic tree constructed by multiple alignment using BLAST pairwise alignments. Results are presented using taxonomic names [112 hits].

Discussion:
Heparan sulfate proteoglycans (HSPGs) play a key role in the self-assembly, insolubility and barrier properties of basement membranes and extracellular matrices [23]. Hence, cleavage of heparan sulfate (HS) affects the integrity and functional state of tissues, and thereby fundamental normal and pathological phenomena involving cell migration and the response to changes in the extracellular micro-environment [1, 2]. Heparanase, an enzyme that degrades heparan sulfate at specific intra-chain sites, is synthesized as a latent protein of approximately 65 kDa that is processed at the N-terminus into a highly active form of approximately 50 kDa. Experimental evidence suggests that heparanase may facilitate both tumour cell invasion and neovascularization, which are critical steps in cancer progression. The enzyme is also involved in cell migration associated with inflammation and autoimmunity. Our preliminary studies (unpublished) have indicated that the HPSE gene may be associated with the pathophysiology of cancer. Phylogenetics was performed in this study to evaluate the evolutionary relatedness