Physicochemical characterization and functional site analysis of lactoferrin gene of Vechur cow

The Bos indicus Vechur breed cow milk is known for its medicinal value and the breed is listed under the category of critically maintained breeds by the Food and Agriculture Organization. The lactoferrin protein in milk is known for its nutritional value. Gene polymorphisms have been reported for Bovine lactoferrin. Mutations in the evolutionarily conserved sites tend to impair protein function and are related with the physicochemical difference between the known variants with 11 SNPs within the wild type. Structural differences are located due to these SNPs that may lead to functional variations. The structural variation is seen primarily in the first 48 residues at 5' end in all the samples modelled. Out of 11 SNPs 5 amino acid variations fall under alpha helix and beta sheet region, this might be of functional significance. This result may provide evidence that the SNPs detected in lactoferrin gene might have potential effects on milk composition. Our result demonstrates one major domain that could be a common binding pocket to all the samples, and important as an active site common to all the breeds that could be utilized for effective drug designing. Moreover, at some SNP positions in Vechur breed, antimicrobial peptides were located indicating importance of those residues for enhanced antimicrobial activity in lactoferrin of Vechur breed. Second binding pocket found in N- lobe region with the three required residues aspartic acid, histidine and tyrosine for iron binding, was considered as major binding site.


Background:
India possesses the largest population of cattle in the world and Vechur (Bos indicus) is one of the indigenous cattle breeds of Kerala, India. It is the smallest cow (nearly 90 cm high) in the world that produces relatively high amount of milk as compared to other breeds. Moreover, it has got a low level of feed requirements and high disease resistance [1]. Milk of Vechur cow has got a high proportion of small fat globules and saturated fatty acids, hence suitable for infants and sick [1]. This milk is considered of medicinal value and is used in Ayurveda system of medicine. Food and Agriculture Organization (FAO) has listed Vechur cow breed under the category of critical breeds. Lactoferrin (LTf) belongs to the serum transferrin gene family and the bovine LTf gene is localized to chromosome 22 and syntenic group U12 [2]. It is a potential gene for milk composition and body measurement traits in dairy cows. Though, genotype, sex, age, breed, herd, location, and other random environmental factors affect milk composition in the organism, single nucleotide polymorphisms (SNPs) reportedly have some very important role. Though most of the SNPs are silent, without an observable phenotype, some SNPs make an individual unique, genetically. Apart from LTf's antibacterial and tumor growth-inhibiting functions, its physiological roles in bone growth and the efficient expression of the milk protein are also very important [3]. Few SNPs of the bovine LTf gene have been reported. Associations between bovine LTf gene polymorphism and somatic cell count in milk produce are well reported [4] by many scientists but are not much studied in this species. The information required for determining and characterizing a protein's function are hidden in its amino acid sequence. The physicochemical properties of the amino acids present in the proteins are well understood with the use of computational tools used in proteomics and molecular biology [5]. If the bioinformatics methods are standardized and optimized, it may lead to the characterization of sequences and identification of functional targets. Moreover, many attempts have been well documented for characterization and prediction of functional pockets of a query protein by its amino acid sequence [6]. It is widely recognized that mutations in evolutionarily conserved sites tend to impair protein function. There is also weak evidence that protein impairment are somewhat correlated with the physicochemical difference between the original amino acid and the missense variant [7]. Much attention has been focused on the possible phenotypic effect of SNPs that cause amino acid changes. The volume of available information together with the development of more sophisticated methods of protein structure prediction has led to different attempts of relating the effect of amino acid changes to structural distortions and, consequently possible phenotypic effects [8]. Recently we cloned and sequenced the full coding region of LTf gene of Vechur breed of B. indicus. The objectives of the present study are to detect SNPs in the LTf gene, its physicochemical characterization and binding site analysis in some animals of B. indicus Vechur breed and its comparison with 'wild type' B. taurus using bioinformatics tools.

Structure prediction and analysis:
LTf protein homologs from Vechur breed and the B. taurus sequences were modelled using 'Modeller9v8' software which implements comparative protein structure modelling by spatial restraints to construct protein models [10,11]. LTf protein variants were modelled using structure of 'diferric buffalo lactoferrin' (PDB ID: 1BIY) obtained from protein data bank. To gain more insight into locations of these amino acids, the domains were predicted and analyzed using PDBsum from Procheck. Procheck verifies the stereo chemical quality of a protein structure and provides an at-a-glance overview of every macromolecular structure deposited in the Protein Data Bank. PyMOL program was employed for interactive visualization and analysis of molecular structures [12]. Structural superposition was performed between the B. indicus protein variants and the B. taurus protein sequence using a combination of MatchMaker and Match-Align subroutine of Chimera [13]. A novel score incorporating both secondary structure and residue type is used to align the sequences in Match Maker while Match-Align constructs sequence alignments from pre-existing superpositions of structures. These tools when used together enhance the understanding of sequence information in the context of structure and vice versa [14].

Binding pocket exploration:
Surface topography of proteins was computed using CASTp server [15]. The weighted Delaunay triangulation and the alpha complex for shape measurements method is utilized by CASTp server that compute identification and measurements of surface accessible pockets as well as interior inaccessible cavities, for proteins and other molecules. This tool measures analytically the area and volume of each pocket and cavity, both in solvent accessible surface and molecular surface.

Result and Discussion: Comparative sequence analysis:
In the present study, the full coding region of the LTf gene of Vechur breed of B.indicus was cloned and sequenced. The LTf sequences of five animals of B.indicus Vechur breed namely LTfV1 (NCBI ID: ACY01187), LTfV2, LTfV3, LTfV4 and LTfV5 (Accession Nos. JF926526, JF926527 and JF926528) were analyzed using various bioinformatics tools to find variation in LTf of Vechur breed and B.taurus LTf represented as LTfBt (NCBI ID: AAA30609). These variations in the sequences could be the probable cause for better milk yield in Vechur breed. Comparative sequence analysis for Vechur breeds LTf gene samples with 'wild type' B.taurus revealed fifteen single nucleotide variations at the very onset using MUSCLE [9] software, resulting in eleven amino acid differences. The assembled results present evidence that, the difference in LTf gene might have potential effect on milk composition in these species. Arginine residue at position 176 in all five LTfV samples, at 264 in LTfV3 and LTfV4 while at 632 position in LTfV3, LTfV4 and LTfV5 are of prime importance. This study explains primarily the effect of SNPs detected in the LTf gene. The nonsynonymous mutation that results in more arginine residue in LTfV samples further illustrate powerful bioactivity of LTf in the Vechur breed that is expected to have an increase of a part of phenotypic variation, especially on milk protein. Table 1 (see Supplementary material) provides details of the physicochemical analysis which shows more of the positively charged residues in the sequences and shows instability index ranging from 40.47 to 41.34. A protein whose instability index is smaller than 40 (+5%) is predicted as stable hence these proteins might be stable in a test tube. GRAVY indicates the solubility of the proteins and was computed negative in all the samples (hydrophilic). Calculation of protein isoelectric point (theoretical pI) and molecular weight (Mw) of a specified amino acid sequence are useful to know the approximate region of a 2-D gel where a protein may be found. Theoretical pI ranges from 8.69 to 8.76 in the given samples, showing protein pI as basic. The PROTPARAM tool calculates an accurate (+/-5% in most cases) molar extinction coefficient for proteins at 280 nm, simply from knowledge of the amino acid composition. The extinction coefficient indicates how much light a protein absorbs at a certain wavelength and is useful while purifying a protein.

Physicochemical characterization:
Our results establish that difference in sequence at discovered positions leads to physicochemical variation and is highly predictive of the impact of amino acid substitutions on protein function. Antimicrobial peptides are an evolutionarily conserved component of the innate immune response which demonstrates it as novel therapeutic agents. In our desire to better understand the functions of antimicrobial properties of LTf in innate immunity in Vechur breed with that of the wild type, APD2: Antimicrobial Peptide Predictor and AntiBP Server was employed for the prediction of the antibacterial peptides using antimicrobial peptide database (APD). The distribution of sequences in antimicrobial motif was predominantly studied that result in many potential antibacterial peptides that need to be investigated further for confirmation. These softwares confidently predict antimicrobial peptides at 61 and 428 positions and show some chance of being an antimicrobial peptide at 538 and 546 positions in the LTfV samples which was not found in LTfBt. This could be the cause for enhanced innate immunity in Vechur breed ( Table 2

Structure prediction and analysis:
In order to determine difference in tertiary structure LTfV protein variants and the reference LTfBt samples were modeled using coordinates of diferric buffalo LTf (PDB: 1BIY) [16] and the modeled structures were validated with the Ramachandran plot as well as its RMSD value. Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% residues in the most favored regions [17]. LTfV1 and LTfV2 resulted 90.5%, LTfV3 and LTfV4 resulted 90.6% while LTfV5 resulted 91.5% residues in the most favored regions in the Ramachandran plot, hence was designated as a stable structure. The RMSD value also scored less then one, an acceptable range for all the samples modeled. To gain more insight into locations of these amino acids, the modeled structures were superimposed using Match-Align subroutine of Chimera software. The result demonstrates moderate difference in structure at 5' end for 48 residues. This could be quite significant as residue property plot confirmed residues ranging from 30 to 48 forming helical region (Figure 1). Moreover, the structure analysis from residue property plot confirmed the secondary structure elements of some of the SNP positions in beta sheet region (176 and 428 positions) and helix region (546, 627 and 632 positions) (Figure 1) indicating an important structural and functional role of these residues. It is very interesting to note that LTfV samples have more alpha helical strands than beta sheets which are also found more than LTfBt sample when compared. This can be of functional significance especially with the disparity in some of the alpha helices. This finding is substantial and falls in line with the previously documented studies of computational protein design efforts on the surface of GCN4, a homodimeric coiled coil that suggest helix propensity is the key factor in sequence design for surface helical positions

Binding pocket exploration:
While many surface sites in proteins can tolerate a wide variety of side chains, surface residues can still have a significant effect on protein structure and stability. Binding sites and active sites of proteins and DNAs are often associated with structural pockets and cavities. The binding pocket search resulted in 136 binding pockets that did not show reasonable difference in all the structures except that of LTfV5, and need to be investigated further. This result suggests that despite of some variation leading to physicochemical variations, the binding sites in four LTfV samples are exactly similar as in LTfBt, while it is found rather different in LTfV5 sample. Based on area and volume of the binding pockets, four pockets were selected as major binding pockets for further study (Table 3 see Supplementary material). This study has its importance in drug development based on LTf protein structure, as one drug developed for wild type can also be employed for other variants with similar structure. The procured consensus region may be utilized for effective drug designing [20]. Subsequently, analysis was preformed to discover binding pocket position and the residues located in it (Table 3). Binding pocket exploration produced a total of 136 binding pockets. Four binding pockets were considered 'major' based on area and volume of the site. The largest binding pocket (Green color in Figure 2) discloses eight aspartic acid residues, one tyrosine residue and no histidine residue. The second (Blue color in Figure 2) and fourth (Yellow color in Figure 2) binding pockets show all the three amino acid residues but one aspartic acid. Third pocket though (Cyan color in Figure  2) provides all the three amino acid residues in the required number for iron binding that is two aspartic acids and one each of histidine and tyrosine. The 708 amino acids of LTf, forms two homologous globular domains named Nand C-lobes. N-lobe corresponds to amino acid residues 1-333 and C-lobe to 345-692. Literature suggests that N lobe is important for binding and the ends of the domains are connected by a short α-helix [21]. The metal binding sites of transferrin domains are localized in each of the two protein globules. There, each ion is bonded with six ligands: four from the polypeptide chain (two tyrosine residues, one histidine residue and one aspartic acid residue) and two from carbonate or bicarbonate ions. Therefore, we can establish as per the literature discussed above that the second binding pocket is considered more prominent for binding, as it does not only contain all the three required residues for iron binding but also falls in N-lobe region.

Conclusions:
In conclusion, SNPs identified in the LTfV samples result in missense mutations as well. The sequence and structural difference due to these variations was investigated in the present research. Our results provide evidence that the SNPs detected in the LTf gene might have potential effects on milk composition. Binding site analysis demonstrates one major domain that could be a common binding pocket to all the samples that could be utilized for effective drug designing. Furthermore, antimicrobial peptide regions at SNP positions might be a reason for improved antimicrobial property of the milk of Vechur breed. Therefore, further study will be essential using the SNP for marker-assisted selection in larger populations to substantiate the result.