Sequence analysis and phylogenetic study of some toxin proteins of snakes and related non-toxin proteins of chordates

Snakes are equipped with a venom armory to tackle prey and predators in an adverse natural world. Snake venom is a mix of biologically active proteins and polypeptides. Among its components, cytotoxins and short neurotoxins are non-enzymatic polypeptides. Both structurally resemble the scaffold specific to the three-finger protein superfamily. Non-toxin members of the three-finger protein superfamily are involved in different biological roles. In the present study we analyzed snake venom cytotoxins, short neurotoxins and related non-toxin proteins of different chordates in terms of the diversification profile of their amino acid sequences, the polarity profile of the sequences, the conserved patterns of amino acids and the phylogenetic relationship of these toxin and non-toxin protein sequences. Sequence alignment analysis demonstrates a polarity-specific molecular enrichment strategy for better system adaptivity. Amino acid substitutions are numerous in toxin sequences and fewer in non-toxin body proteins. With the help of conserved residues these proteins maintain the three-finger protein scaffold. Due to system-specific adaptation, toxin and non-toxin proteins exhibit varied amino acid residue distributions along the sequence stretch. Understanding nature's invention scheme (recruitment of venom proteins from normal body proteins) may help us to develop futuristic engineered biomolecules with remedial properties.


Background:
In biology and biochemistry, interpreting the evolutionary process of life is of great importance. Studying the molecular evolution of proteins helps us to understand the evolution of life. Evolution, assisted by natural selection, generates improved biologically functional protein molecules [1]. Different organisms use different adaptive strategies to exist in an adverse natural world. Since ancient times mankind has been enchanted by snakes and their astonishing features. The process of evolution gifted snakes a precious treasury, their venom. The primary function of snake venom is to incapacitate and immobilize prey (as an offensive armory). Venom that evolved to aid in catching prey exhibits fatal and enfeebling effects. The secondary function of venom is to serve as defensive machinery against predators. Snake venom also assists in digesting the varied diets of snakes [2]. The evolution of venoms favors the survival of snakes in different environments [3]. Snake venom is a cornucopia of biologically active polypeptides and non-polypeptide constituents [4]. Among the different venom components, cytotoxins and short neurotoxins are non-enzymatic polypeptides with a molecular weight of 5-10 kDa [4]. In cobra venom, cytotoxins account for fifty percent of the dry weight [5]. The characteristics of cytotoxins and short neurotoxins are i) three β-strand loops, ii) a small globular hydrophobic core, iii) four conserved disulfide bridges and iv) 60-62 amino acid residues [6,7,8]. The lethality of cytotoxins is due to their capacity to form pores in biological membranes [9]. Short neurotoxins exert their effect by blocking neuromuscular transmission through selective binding to muscle nicotinic acetylcholine receptors (nAChR) [10]. Cytotoxins and short neurotoxins are enriched with disulfide bonds.
As members of the 'three-finger protein' superfamily, cytotoxins and short neurotoxins exhibit the 'three-finger' appearance that is the chief structural characteristic of this superfamily. The name refers to the three loops that stretch out from the core region of the protein like three outstretched fingers of a hand [6,8]. For proper structural maintenance, three-finger proteins are rich in disulfide bonds [8]. The three-finger protein scaffold is also present in different chordates, where it is employed in diverse biological functions. Plethodontid modulating factor (PMF), Lymphocyte antigen 6H (Ly6H), CD59 glycoprotein (CD59) and Ly-6/neurotoxin-like protein 1 (Lynx-1) are non-toxin members of the three-finger protein superfamily.
These non-toxin family members are involved in diverse physiological systems, namely the pheromone system, the complement system, the cellular communication system and the central nervous system [8,11,12]. Snake venom proteins were recruited into the venom proteome by an evolutionary process in which genes of normal body proteins engaged in key regulatory processes within the body were duplicated. Selective expression of these duplicated genes in the venom gland produces a deadly cocktail of lethal toxin molecules [12]. A previous study characterized snake venom toxin proteins and related non-toxin proteins of chordates to compare their composition, physicochemical properties and functions [13]. The aim of the present study was to analyze the diversification profile of amino acid sequences, the polarity profile of amino acid sequences, the conservation pattern of amino acid residues and the phylogenetic relationship of snake venom toxin proteins and related non-toxin three-finger proteins of different chordates such as hagfish, bird, frog, mouse, rat and human.

Methodology:
Protein sequences of cytotoxins, short neurotoxins and related non-toxin body proteins were retrieved from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) [14]. The presence of signal peptides within the amino acid sequences and the cleavage sites of the signal peptides were analyzed using the SignalP 4.0 algorithm, a neural network-based method (http://www.cbs.dtu.dk/services/SignalP/) [15]. The main chains of the protein sequences were selected for further analysis. For simplicity, a sequence ID code was given to each molecule. Sequence-related information was mined from the Protein Information Resource (PIR) knowledgebase [16]. Multiple sequence alignment was done with the Clustal-X program, followed by manual inspection for errors [17]. Aligned sequence sets were represented in the Clustal-X coloring scheme and the polarity coloring scheme using Clustal-X and the Jalview tool, respectively [17,18].
Sequence logos were generated using the WebLogo application, which helps to depict the sequence conservation and relative frequencies of the amino acid residues at each position [19,20]. Family-specific conserved patterns of amino acid residues in toxin and non-toxin proteins were extracted using PRATT version 2.1 [21]. Phylogenetic trees were reconstructed using the neighbor-joining method implemented in MEGA version 4.0 [22]. For Bayesian inference of phylogeny, MrBayes 3.1.2 was used [23].
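The conservation measure displayed by a sequence logo can be illustrated with a minimal sketch. It assumes a 20-letter amino acid alphabet, ignores gap characters, and omits the small-sample correction that logo tools typically apply, so its values will not exactly match WebLogo output. The function names and the toy alignment are hypothetical and are not taken from the data set of this study.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return (stack_height_bits, {residue: letter_height_bits}) for one column.

    Gaps ('-') are ignored. No small-sample correction is applied
    (an assumption of this sketch).
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0, {}
    counts = Counter(residues)
    total_count = len(residues)
    freqs = {res: n / total_count for res, n in counts.items()}
    # Shannon entropy of the observed residue frequencies, in bits.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Stack height: maximum possible entropy minus observed entropy.
    stack_height = math.log2(alphabet_size) - entropy
    # Each letter takes a share of the stack proportional to its frequency.
    letter_heights = {res: p * stack_height for res, p in freqs.items()}
    return stack_height, letter_heights


def alignment_information(aligned_sequences):
    """Apply column_information to every column of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy_alignment = ["LKCN", "LRCN", "MKCN"]
    for position, (bits, letters) in enumerate(alignment_information(toy_alignment), 1):
        print(position, round(bits, 2), letters)
```

A fully conserved column reaches the maximum of log2(20) bits in this sketch, whereas a column split evenly between two residues loses exactly one bit.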

Results & Discussion:
In this work, various snake venom cytotoxins and short neurotoxins (of Naja annulifera and Naja naja) and related chordate-specific non-toxin proteins were analyzed with different bioinformatics packages to address the evolutionary process of these molecules (Table 1, see supplementary material). The main chains of the toxin and non-toxin proteins comprise 60 to 90 amino acid residues (Table 1). Multiple sequence alignment of the protein sequences (represented in the polarity coloring scheme) shows variable sites as well as conserved sites, which demonstrate the sequence enrichment strategy of these sequences for adaptation to different physiological systems (Figure 1). The first residue is Leu (a hydrophobic amino acid), which is conserved in all peptides except the short neurotoxins (replaced by Met), Secreted Ly-6/uPAR-related protein 1 (SLURP1) of human (replaced by Phe) and Secreted Ly-6/uPAR-related protein 2 (SLURP2) of human (replaced by Ile).
In terms of polarity, Leu, Met, Phe and Ile are all hydrophobic. These replacements (non-synonymous substitutions) by residues of the same polarity hint at a conservation strategy at the level of physicochemical characteristics. Cysteine is conserved at a total of seven sites, which act as anchoring residues for maintenance of the three-finger fold of the three-finger protein superfamily (Supplement Figure 1). The residue that follows the last cysteine is Asn, which is conserved in all sequences. These conserved residues help to create the signature of the three-finger protein domain. Substitutions (point mutations) are numerous in toxin-specific sequences. Polarity shuffling and hydrophobic (aliphatic) amino acid substitutions are very prevalent in cytotoxin and short neurotoxin sequences (Figure 1).
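The distinction between identical sites, physicochemically conserved sites and variable sites can be sketched as follows. The polarity grouping below is an assumption made for illustration (it places Gly among the hydrophobic residues, as in the text) and is not necessarily the exact scheme used by Jalview. The function names and the toy alignment are hypothetical.

```python
# Assumed polarity grouping, for illustration only.
POLARITY_GROUPS = {
    "hydrophobic": set("AVLIMFWPG"),
    "polar": set("STNQYC"),
    "positive": set("KRH"),
    "negative": set("DE"),
}

# Reverse lookup: one-letter residue code -> polarity group name.
RESIDUE_TO_GROUP = {
    residue: group
    for group, members in POLARITY_GROUPS.items()
    for residue in members
}


def classify_column(column):
    """Classify one alignment column; gap characters ('-') are ignored."""
    residues = {r for r in column if r != "-"}
    if not residues:
        return "gap"
    if len(residues) == 1:
        return "identical"
    groups = {RESIDUE_TO_GROUP.get(r, "unknown") for r in residues}
    if len(groups) == 1 and "unknown" not in groups:
        # Different residues, same polarity class (e.g. Leu/Met/Phe/Ile).
        return "physicochemically conserved"
    return "variable"


def classify_alignment(aligned_sequences):
    """Classify every column of a set of equal-length aligned sequences."""
    return [classify_column(col) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the study data.
    toy_alignment = ["LKCD", "MKCG", "FRCD"]
    for position, label in enumerate(classify_alignment(toy_alignment), 1):
        print(position, label)
```

Under this grouping, a column containing Leu, Met, Phe and Ile is reported as physicochemically conserved, while an Asp to Gly or Ile to Thr change makes the column variable.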
In the other related non-toxin proteins, amino acid substitutions at the family level are somewhat fewer. In the xenoxin family, the substitutions observed in terms of polarity are from a negatively charged amino acid (Asp) to a hydrophobic amino acid (Gly) and from a hydrophobic amino acid (Ile) to a polar amino acid (Thr). Positively charged residues outnumber negatively charged residues throughout the xenoxin sequence stretch. In Hep21 protein (HEP21) sequences, negatively charged and positively charged amino acids are equal in number. In PMF, negatively charged amino acids outnumber positively charged amino acids. A very versatile amino acid substitution profile is observed among the family members of CD59. Both within the family and at the superfamily level, CD59 shows a diversified sequence profile. The low level of amino acid sequence conservation among CD59 molecules of different species results in a varied amino acid distribution profile of CD59 [24]. In Ly6H and Lynx-1, amino acid substitutions are uncommon and the sequences are highly conserved within their corresponding families (Figure 1). The WebLogo program was used to create sequence logos, which present the pattern of sequence conservation in graphical form (Figure 2). Each stack of letters in a sequence logo represents one amino acid position. The stack height (measured in bits) signifies the degree of sequence conservation at that position, and the height of each letter depicts the relative frequency of the corresponding amino acid [19,20]. Multiple sequence alignment of cytotoxins and short neurotoxins (represented as a sequence logo) reveals that, despite the difference in their mechanisms of action (cytotoxins act upon the cell membrane, whereas short neurotoxins act upon nAChR), 24 sites are conserved in their sequences [9,10].
Eighteen sites (amino acid residues) are identical and the residues at 6 sites are physicochemically conserved (non-synonymous substitutions) within the sequences (Supplement Figure 2). The sequence logo of toxin and non-toxin proteins exhibits the pattern of sequence conservation across the three-finger protein superfamily (Figure 2). It is noteworthy that the conserved cysteine profile of all the toxin and non-toxin sequences indicates their common evolutionary ancestry. With the help of these cysteines the three-finger fold is well maintained in three-finger proteins [8]. Variable sites signify the diversification process of evolution, in which a single scaffold is used in different systems with distinct roles. Family-specific conserved patterns in the protein sequences are tabulated in Table 2 (see supplementary material). Phylogenetic analysis of the toxin and non-toxin protein sequences depicts the evolutionary relationship among cytotoxins, short neurotoxins and related non-toxin proteins of other chordates (Supplement Figure 3). Bayesian phylogenetic analysis shows that cytotoxins and short neurotoxins occupy entirely separate positions in the phylogenetic tree (Figure 3). Cytotoxins and short neurotoxins belong to two separate groups but share a common branch point. A species-specific distribution of toxin molecules in the phylogenetic tree was observed (i.e., cytotoxins of Naja annulifera and Naja naja are not mixed within a cluster). Related non-toxin proteins form an entirely separate group of non-toxin body proteins. Lynx1 and Hep21 share a common branch point with the toxin proteins, which is reasonable because cytotoxins and short neurotoxins originated from an ancestral Lynx1-like molecule [12].
All other non-toxin protein molecules, i.e., HEP21, SLURP1, SLURP2, Ly6H, xenoxins, HMLP1, PMF and CD59, form separate clusters with their corresponding family members.
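Distance-based tree building such as the neighbor-joining method starts from a matrix of pairwise distances between aligned sequences. The sketch below computes the simple p-distance (the proportion of differing sites). This choice is an assumption made for illustration, since the distance model used in MEGA is not stated here. Sites with a gap in either sequence are skipped, and the sequence names and toy alignment are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def distance_matrix(aligned):
    """Return {(name_x, name_y): p_distance} for every pair in a name -> sequence dict."""
    names = list(aligned)
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the study data.
    toy = {"toxin_A": "LKCNKL", "toxin_B": "LKCHKL", "body_C": "MECN-L"}
    for pair, dist in sorted(distance_matrix(toy).items()):
        print(pair, round(dist, 3))
```

Sequences of the same family would be expected to show small pairwise distances and therefore to cluster together, which is the pattern described above for the non-toxin families.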

Conclusion:
Together, the results signify that cytotoxins and short neurotoxins, two important components of snake venom, originated from ordinary body proteins and were enriched by different sequence-specific substitution strategies to meet biological needs. Efficient utilization of hydrophobic, positively charged and negatively charged amino acids and their distribution profile in toxin sequences makes them tailored killer elements in snake venom. An enhanced amphipathic nature gives cytotoxins an extra advantage in exerting harmful action upon biological membranes. Variations in the physicochemical properties of amino acids within the toxin sequences gave additional opportunity for the generation of improved, potent toxins. PMF also makes efficient use of negatively charged amino acids along its sequence stretch for better receptor attachment. An even distribution of negatively charged amino acids at regular intervals gives PMF a better chance to attach to the pheromone receptor in a specific manner. CD59 has a varied amino acid residue distribution in its sequences. The highly conserved amino acid sequences of Ly6H and Lynx1 reflect their involvement in two important systems, i.e., the cellular communication system and the central nervous system. In nature these two systems are critical for any life form. The distribution of different amino acid residues along the sequence length of three-finger proteins is customized for their adapted biological functions. With conserved cysteines these proteins maintain the structural scaffold, while variation of amino acid residues in other parts of the sequence accommodates different system-dependent necessities. Comparative sequence-specific analysis of protein sequences demonstrates how proteins are generated within nature's testing ground for tailor-made biological needs.
Tracing the natural protein engineering scheme of three-finger proteins enriches our knowledge, which in turn may help to generate biomolecules with remedial properties.