Phylogenetic analysis of cubilin (CUBN) gene

Cubilin (CUBN; also known as intrinsic factor-cobalamin receptor [Homo sapiens Entrez Pubmed ref NM_001081.3; NG_008967.1; GI: 119606627]), located in the epithelium of the intestine and kidney, acts as a receptor for intrinsic factor-vitamin B12 complexes. Mutations in CUBN may play a role in autosomal recessive megaloblastic anemia. The current study investigated the possible role of CUBN in evolution using phylogenetic testing. A total of 588 BLAST hits were found for the cubilin query sequence, and these hits showed a putative conserved domain, the CUB superfamily (as of 27 November 2012). A first-pass phylogenetic tree was constructed to identify the taxa that most often contained CUBN sequences. Following this, we narrowed the search by manually deleting sequences that were not CUBN. A repeat phylogenetic analysis of 25 taxa was performed using the PhyML, RAxML and TreeDyn software packages to confirm that CUBN is a conserved protein. The CUB domain is an extracellular domain present in proteins mostly known to be involved in development; it occurs in many chordate taxa but is not found in prokaryotes, plants and yeast. No horizontal gene transfers were found between different taxa.


Background:
Cubilin (CUBN; also known as intestinal intrinsic factor receptor, intrinsic factor-cobalamin receptor or intrinsic factor-vitamin B12 receptor) acts as a co-transporter and helps in the uptake of lipoprotein, vitamin and iron. It functions as a transporter in many absorptive epithelia (intestine, renal proximal tubules and embryonic yolk sac) [1]. Mutations in the CUBN gene have been hypothesized to play a role in the etiology of autosomal recessive megaloblastic anemia [2]. This hereditary condition is also known as MGA1 Norwegian type or Imerslund-Gräsbeck syndrome. The disease is characterized by defective absorption of vitamin B12 and impaired function of the enzyme thymidine synthase; therefore, DNA synthesis, particularly during erythropoiesis, is affected [1,3]. Cubilin interacts with megalin in a calcium-dependent manner, forming a dual-receptor complex called cubam that facilitates the uptake of specific ligands such as hemoglobin and uteroglobin [4]. In addition, CUBN controls and facilitates endocytosis of various ligands [5,6]. In proximal tubule cells, cubilin helps in the reabsorption of vitamin D binding protein from glomerular filtrates and assists in the synthesis of 1α,25-dihydroxyvitamin D(3) [7,8]. In addition, a missense mutation in the CUBN gene was found to alter the levels of [9].
The cubilin protein has 27 CUB domains and 7 EGF-like domains and is encoded by the human CUBN gene, which is located on ...

Figure 1:
First-pass phylogenetic tree constructed by multiple alignment using BLAST pairwise alignments; results are presented using taxonomic names. The first-pass tree showed that the CUBN sequences were mostly from arthropods, placozoans, nematodes, tunicates, rabbits, hares, primates, rodents, placentals, odd-toed ungulates, even-toed ungulates, bivalves, lancelets, hemichordates, sea urchins, bony fishes, amphibians, birds, lizards, marsupials and monotremes.

Methodology: Data Set, Sequence Alignment and Construction of Phylogenetic Tree
We queried the GenBank database [11] for all available protein sequences of CUBN. The retrieved sequences were saved in FASTA format. An initial first-pass phylogenetic tree was constructed using the neighbor-joining method.
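To illustrate the neighbor-joining step, the following is a minimal sketch of the algorithm in plain Python. It is not the implementation used in this study: the function name `neighbor_joining`, the taxon labels and the toy distance matrix are hypothetical, and in practice the distances would be estimated from the aligned protein sequences rather than typed in by hand.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build a Newick string by neighbor joining.

    names: list of taxon labels (hypothetical labels in this sketch)
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    """
    nodes = list(names)
    d = dict(dist)  # working copy; new internal nodes are added to it
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Pick the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # The last two nodes are connected by a single branch.
    a, b = nodes
    return f"({a}:{d[frozenset((a, b))]:g},{b});"


# Toy distance matrix for five hypothetical taxa (not CUBN data).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {
    frozenset((names[x], names[y])): matrix[x][y]
    for x, y in combinations(range(len(names)), 2)
}
tree = neighbor_joining(names, dist)
print(tree)
```

The resulting Newick string can be loaded into any tree viewer. Because neighbor joining produces an unrooted tree, the position of the outermost parentheses does not imply a root.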

Results:
From the NCBI GenBank database, 588 sequences of CUBN covering the CUB domain (Figure 1) were used to construct a first-pass phylogenetic tree. The sequences were mostly from arthropods, placozoans, nematodes, tunicates, rabbits, hares, primates, rodents, placentals, odd-toed ungulates, even-toed ungulates, bivalves, lancelets, hemichordates, sea urchins, bony fishes, amphibians, birds, lizards, marsupials and monotremes. This tree, however, contained many repetitive and unrelated sequences, which were deleted. A high degree of CUBN sequence similarity was observed among many of the selected sequences during phylogeny reconstruction. Putative conserved domains were observed in many taxa at the CUB domain (cd00041), an extracellular domain present in proteins mostly known to be involved in development and not found in prokaryotes, plants and yeast [22]. The DELTA-BLAST alignment matched cd00041 (Cd length: 113; bit score: 139.08; E-value: 4.18e-37). The final accession information for the tested sequences (n=25) is presented in Table 1 (see supplementary material); the multiple sequence alignment is presented in Appendix 1. A tree was constructed for these sequences using the PhyML program (Figure 2). RAxML reported 3782 distinct alignment patterns, and the proportion of gaps and completely undetermined characters in the alignment was 24.21%. RAxML rapid bootstrapping and the subsequent ML search gave an ML estimate of 25 per-site rate categories (Figure 3). No horizontal gene transfers were observed in the selected taxa.
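The two alignment summary statistics reported by RAxML can be approximated with a few lines of Python. The sketch below counts distinct alignment columns and the fraction of gap or undetermined characters. It rests on two assumptions: that "distinct alignment patterns" corresponds to unique columns, and that only `-` and `?` count as gaps or undetermined characters (RAxML's own rules for protein data may differ). The function name `alignment_stats` and the toy sequences are hypothetical.

```python
def alignment_stats(seqs, missing="-?"):
    """Return (distinct column patterns, fraction of missing characters).

    seqs:    list of aligned sequences of equal length
    missing: characters treated as gap/undetermined (an assumption here)
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Each alignment column becomes a tuple; identical columns collapse in the set.
    patterns = len(set(zip(*seqs)))
    n_missing = sum(s.count(ch) for s in seqs for ch in missing)
    return patterns, n_missing / (len(seqs) * length)


# Toy alignment (invented strings, not CUBN sequences).
toy = ["MKL-LV", "MKLALV", "MRL-LV", "M?LALV"]
patterns, gap_fraction = alignment_stats(toy)
print(patterns, f"{gap_fraction:.2%}")
```

In this toy alignment the third and fifth columns are identical, so six columns collapse to five patterns.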

Figure 2:
PhyML: Phylogenetic tree of CUBN sequences. The final phylogenetic tree, constructed from the selected sequences, revealed that the CUBN sequences were highly similar in most organisms. The closest similarity was observed among the primates.

Discussion:
The acquisition of cubilin and its importance in evolution was assessed by using a mix of programs to construct a phylogenetic tree [23,24]. It is interesting to note that CUBN, a protein with receptor function, appears to have arisen only within the eukaryotes and is not found in prokaryotes or plants. Cubilin aids the absorption of vitamin B12 across the intestinal epithelium. Plants and fungi neither produce nor use vitamin B12, whereas animals and protists depend on it for various biochemical pathways but cannot synthesize it, owing to the very complex de novo biosynthesis pathway [25]. From an evolutionary standpoint, the acquisition of cubilin in these phyla explains why mammals have an efficient mechanism for the uptake of extrinsic vitamin B12. We used the neighbor-joining method to create trees based on multiple sequences in ClustalW; a JTT matrix method was used to cluster sequences at the 85% identity level [26], followed by construction of a phylogenetic tree [18]. The PhyML software was used because it is accurate and slightly faster than other phylogeny programs. DELTA-BLAST, which uses a heuristic method to identify homologous sequences, produced high-scoring sequence alignments that were used to generate a first-pass phylogenetic tree, from which the relevant sequences were narrowed down. Kalign was used for multiple sequence alignment of the shortlisted sequences because it is very fast, is suitable for large alignments and concentrates on local regions, providing insight into evolutionary relationships [18]. The RAxML software was used to infer and validate the best-scoring maximum likelihood tree [20]. In addition, we evaluated horizontal gene transfers in the selected organisms using the T-Rex software.
Horizontal gene transfer is the transfer of genetic material from one lineage to another and is commonly found in prokaryotes as a mechanism of adaptation to the environment [21]. However, the selected organisms did not show this phenomenon. The final phylogenetic tree indicated that CUBN is a relatively new protein that evolved very late in evolution (Figures 2 & 3). The absence of horizontal gene transfer among the shortlisted organisms is consistent with the largely conserved domains within the amino acid sequences.
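The idea of grouping sequences at an 85% identity level, mentioned in the discussion of methods, can be sketched as a greedy clustering over aligned sequences. This is only an illustration and not the JTT-based procedure used in the study: it uses plain percent identity over ungapped positions, and the function names (`identity`, `greedy_cluster`), the taxon labels and the toy sequences are all hypothetical.

```python
def identity(a, b):
    """Fraction of identical residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(seqs, threshold=0.85):
    """Assign each sequence to the first cluster whose representative it matches.

    seqs: dict of name -> aligned sequence; the first member of each
          cluster acts as its representative.
    """
    clusters = []
    for name, seq in seqs.items():
        for members in clusters:
            if identity(seq, seqs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([name])
    return clusters


# Invented toy sequences for illustration only (not real CUBN data).
toy = {
    "human": "MKTLLVAGCLLAVSQAEDGS",
    "chimp": "MKTLLVAGCLLAVSQAEDGT",
    "mouse": "MKTLIVAGCLLAVSQVEDGS",
    "fly":   "MRSFIVAGALWTVNQPEDKH",
}
clusters = greedy_cluster(toy)
print(clusters)
```

Greedy clustering depends on input order, because each cluster is defined by its first member. A substitution-matrix-based distance such as JTT would weight conservative replacements differently from plain identity.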

Conclusion:
The acquisition of CUBN in relatively evolved organisms indicates its crucial role in cell physiology. The higher conservation at the CUB domains indicates preserved function. The phylogenetic tree for CUBN revealed the presence of this gene in eukaryotes and indicated its importance in the biochemical pathways related to the absorption of many ligands across epithelial linings, re-emphasizing the importance of certain proteins in evolution.