Computational analysis of bovine alpha-1 collagen sequences

Bovine collagen alpha-1 is a naturally occurring extracellular matrix protein found in tendons and other connective tissues. It plays a vital role in cell growth, differentiation, attachment and migration. Recent findings have established that collagen alpha-1 is involved in the osteogenesis imperfecta phenotype in cattle, but detailed information about the other members of this large family is not yet available. To look for new leads and for correlations among the well-annotated bovine alpha-1 collagen sequences, we carried out a comparative analysis of the 28 members of the collagen family using computational tools. Based on physico-chemical, secondary structural, functional and phylogenetic classifications, we selected collagen 12, 14 and 20 as candidate targets in pathological conditions. These proteins belong to the FACIT family and showed markedly low glycine and proline content together with high instability and aliphatic indices. Moreover, FACIT collagens contain multiple triple helical domains; as members of this family, bovine collagen 12, 14 and 20 do not form fibrils by themselves but are associated with collagen 1 fibrils. These collagen molecules might be crucial candidates for detecting and understanding the process of matrix remodeling in diseases, especially at the level of cellular compartments.


Background:
Collagen is the most abundant family of fibrous proteins in mammals and is secreted by connective tissue cells [1]. It is found mostly in flesh and connective tissues of vertebrates [2]. Collagen is a triple helix in which three alpha chains are wound around one another to form a superhelix, which gives the protein its long, stiff structure [3]. The amino acids are arranged so that glycine occupies every third residue [4]. Glycine is the smallest amino acid, so it fits inside the helix and allows the alpha chains to wrap around one another. Collagen is rich in glycine and proline residues: apart from glycine at every third position, the remaining two positions are mostly occupied by proline. Pro-collagens are the inactive precursors of collagens and are synthesized first. Mature, active collagen molecules are formed when peptidases cleave the pro-peptides at the N and C termini. Vitamin C acts as a cofactor in the conversion of pro-collagens to collagens. Pro-collagens are cleaved by proteolytic enzymes only after secretion from the cell. Pro-collagens are fibrillar molecules that are far more stable (about a thousand-fold) than collagen fibrils. Cleavage of pro-collagens to collagens inside the cell can lead to catastrophic consequences.
Collagen is the most abundant protein of the extracellular matrix (ECM). The ECM is an intricate network of macromolecules that fills the extracellular space inside tissues. Besides collagens, the ECM is rich in proteoglycans, glycoproteins and proteases [5]. In vertebrates, the main function of the ECM is to serve as a scaffold that stabilizes the physical structure of tissues, but it also has more complex functions involving cell survival, cell development, cell migration, cell-cell interaction and cell proliferation [6].
Counting both experimentally evidenced and hypothetical proteins, Bos taurus has 28 genetically distinct collagen members. Collagen plays an important role in several pathological disorders in cattle. Genetic disorders of collagen synthesis arise from mutations in the genes that encode collagen proteins. Such mutations can lead to five varieties of disease in cattle: Ehlers-Danlos syndrome, Osteogenesis imperfecta, Marfan syndrome and Epidermolysis bullosa (junctionalis and acanthylosis).
Protein structure is the key to protein function and interaction, and an alteration in protein structure leads to altered protein function, which in turn leads to the development of disease. Protein structure analysis can therefore provide insight into disorders related to complex protein functions. Wet-lab research relies on trial and error and cannot make predictions ahead of the experimental result. Computational biology can overcome this problem, so a study that combines dry-lab and wet-lab approaches helps to better understand how protein function relates to structure. This type of study has been done to characterize the human matrix metalloproteinases (MMPs): dry-lab predictions were confirmed by experimental approaches, and MMP-7 was shown to be a potential target in cardiac hypertrophy [7,8]. Collagen acts as a substrate of MMPs and is involved in many pathological conditions. Gelatin, a derivative of collagen, shows a similar relation to disease. Analysis of collagen is thus essential for understanding the process of matrix remodeling in diseases. In this study, bovine alpha-1 collagen sequences were analyzed using computational tools. An alpha-1 chain is present in all 28 bovine collagens, so the alpha-1 chain was selected for further study of collagen sequences.
In our study, secondary structural, physicochemical, phylogenetic and functional analyses of bovine alpha-1 collagen sequences were carried out. The aim of this research is to give insight into the nature of collagen proteins and to characterize this protein family. A further aim is to propose members potentially involved in disease conditions by examining the collagen protein family for abnormal characteristics in the protein molecules. This research should thereby facilitate further studies on the collagen protein family.

Analysis of functional properties
For functional analysis, the motifs of the alpha-1 protein sequences were identified using the Motif Scan tool (http://myhits.isb-sib.ch/cgi-bin/motif_scan) [11]. The input sequences were in FASTA format, and motifs were scanned against Prosite patterns.
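The idea of scanning a sequence against a Prosite-style pattern can be illustrated with a minimal sketch. This is not the Motif Scan implementation: it only translates the basic pattern elements (`x`, `[...]`, `{...}`, repeat counts and the `<`/`>` anchors) into a regular expression. The function names, the toy pattern and the toy sequence are hypothetical and chosen only for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic Prosite-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")   # '<' pins the match to the N-terminus
        anchor_end = element.endswith(">")       # '>' pins the match to the C-terminus
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat, e.g. x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                           # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # any residue except those listed
        # '[ABC]' and single letters are already valid regex fragments.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)


def scan(sequence: str, pattern: str):
    """Return (1-based start, matched text) for non-overlapping pattern hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Hypothetical toy pattern and sequence, not a real Prosite entry.
TOY_PATTERN = "G-P-{G}-G-x(2,3)-[KR]"
hits = scan("AAGPAGERKTT", TOY_PATTERN)
```

Because `finditer` reports only non-overlapping matches, a production scanner would need extra handling for overlapping hits and for profile-based (non-pattern) Prosite entries.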

Phylogenetic analysis
Phylogenetic analysis of bovine alpha-1 collagen sequences was done with two programs, ClustalX and TreeView. All the sequences were aligned using ClustalX version 2.1. A phylogenetic tree was then generated by the Neighbor-Joining (NJ) method. The resulting tree, written in Phylip format, was viewed with TreeView.
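To make the Neighbor-Joining step concrete, the following minimal sketch performs only the first join of the algorithm on a pairwise distance matrix: it computes the Q criterion for every pair, picks the pair with the smallest value and derives the two branch lengths to the new internal node. It is not the ClustalX implementation; the function name and the five-taxon toy matrix are assumptions made for illustration, and the matrix does not come from the collagen data.

```python
def nj_first_join(dist):
    """Return the first pair joined by Neighbor-Joining and its branch lengths.

    dist is a symmetric matrix (list of lists) of pairwise distances
    between n > 2 taxa, with zeros on the diagonal.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small values mark pairs that are close to each
            # other and far from everything else.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from taxon i and taxon j to the new internal node.
    len_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (i, j), (len_i, len_j)


# Hypothetical toy distance matrix for five taxa.
TOY_DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, lengths = nj_first_join(TOY_DIST)
```

A full tree is obtained by replacing the joined pair with the new node, recomputing the distances to it and repeating until only two nodes remain.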

Discussion:
For all the collagens, three criteria were analyzed: the biological processes they are involved in, the cellular components they are part of and their molecular function (Table 1, see supplementary material). Collagen 1 and 2 seem to be involved in a large number of biological processes. No data on the three selected criteria were found for 8 collagen members, which indicates that these members are as yet uncharacterized for biological process, cellular component and molecular function. The ProtParam results showed that, for all the members, the percentages of glycine and proline were higher than those of the other residues. Glycine content was higher than 12% in all members except collagen 12, 14 and 20 (Figure 1). High glycine content is necessary for collagens to maintain their triple helical structure, since larger amino acids cause steric hindrance [12]. Proline content was more than 10% in all members except collagen 12, 14 and 20 (Figure 1). Proline residues are necessary to stabilize the collagen helix, and they disrupt other secondary structural elements [13]. Proline concentration is therefore important for collagen to function as a protein molecule and carry out processes such as cell migration and cell adhesion. The total numbers of positively (Arg + Lys) and negatively (Asp + Glu) charged residues of the collagen members were also recorded (Table 2, see supplementary material). For 14 members, the total number of positively charged residues exceeded the total number of negatively charged residues, and their isoelectric points lay in the alkaline range. For the remaining members, the isoelectric point was within the acidic range. The extinction coefficient for collagen 7, 11, 12, 14, 17, 18 and 20 was higher than for the remaining members. A higher extinction coefficient reflects a higher content of tryptophan, tyrosine and cystine, the residues from which ProtParam computes this value. This observation is important for protein-protein interaction studies.
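The composition figures discussed above (glycine and proline percentages, and the counts of positively and negatively charged residues) are straightforward to reproduce from a raw sequence. The sketch below is a minimal illustration, not the ProtParam code; the function name and the ten-residue toy sequence are hypothetical.

```python
from collections import Counter


def composition_summary(sequence: str) -> dict:
    """Summarize glycine/proline content and charged-residue counts."""
    seq = sequence.upper()
    counts = Counter(seq)   # missing residues count as 0
    n = len(seq)
    return {
        "gly_percent": 100.0 * counts["G"] / n,
        "pro_percent": 100.0 * counts["P"] / n,
        "positive": counts["R"] + counts["K"],   # Arg + Lys
        "negative": counts["D"] + counts["E"],   # Asp + Glu
    }


# Hypothetical toy sequence, not a real collagen chain.
summary = composition_summary("GPPGAPGKDE")
```

With real data, each alpha-1 sequence would be read from its FASTA record and passed to the same function, so that members falling below the 12% glycine or 10% proline levels mentioned in the text can be flagged.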
Whether a protein is stable or not can be described by its instability index. The instability index for collagen 14, 17, 18 and 20 is higher than 40, which classes these proteins as unstable [14]. A high aliphatic index was observed for collagen 12, 14 and 20. A higher aliphatic index indicates a larger relative volume of the protein occupied by alanine, valine, isoleucine and leucine [14], and it is associated with higher thermostability. For bovine alpha-1 collagen, the aliphatic index ranges from 35.18 to 84.29. This is a wide range and suggests that most of the collagens may be stable. For collagen 14 and 20, the instability index and aliphatic index results contradict each other strongly: the instability index classes them as unstable, whereas the high aliphatic index points to thermostability.
The Grand Average of Hydropathy (GRAVY) was computed for all the members. A broad range of GRAVY values, from -0.955 to -0.223, was observed for bovine alpha-1 collagen. Hydrophobicity was measured with ExPASy's ProtScale tool and ranges from -0.3555 for collagen 20 (most hydrophilic) to 0.1275 for collagen 23 (most hydrophobic). Average flexibility ranges between 0.4365 and 0.459, a narrow range that indicates high glycine and proline content in the proteins (Table 3, see supplementary material).
SOPMA analysis was done for all bovine alpha-1 collagen members and showed a high value for random coil in all of them (Table 4, see supplementary material). The values for alpha helix were higher than those for extended strands in 13 collagens. The high value for random coil is significant for the study of protein tertiary structure and related functions.
The motifs found in collagen 1, 2 and 3 were described as the VWFC domain signature, and those found in two other members were described as the pancreatic trypsin inhibitor family signature. Because the VWFC domain is involved in oligomerization, it could be related to the assembly of collagen into a triple helical structure.
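Two of the indices discussed above have simple closed forms and can be sketched directly. The aliphatic index is computed here with the commonly used definition, X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue, and GRAVY as the mean Kyte-Doolittle hydropathy value over the sequence. This is a minimal sketch under those assumptions, not the ProtParam code; the function names and the toy sequence are hypothetical. The instability index is omitted because it depends on a table of dipeptide weights that is not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(sequence: str) -> float:
    """Grand Average of Hydropathy: mean Kyte-Doolittle value per residue.

    Assumes the sequence contains only the 20 standard residues.
    """
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


# Hypothetical toy sequence made only of aliphatic residues.
toy_ai = aliphatic_index("AVIL")
toy_gravy = gravy("AVIL")
```

A glycine- and proline-rich sequence contributes nothing to the aliphatic index and has negative hydropathy values for both residues, which is consistent with the low aliphatic indices and negative GRAVY values reported for most collagen members.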
Collagen 7 and 28 were shown to carry the pancreatic trypsin inhibitor (Kunitz) family signature, with strong matches in the motifs (Table 5, see supplementary material). The phylogenetic tree was constructed with the distance-based Neighbor-Joining method. A number of clusters were found, including 6 and 28, 12 and 14, 21 and 22, 1 and 2, 4 and 13, and 17 and 25, lying in close proximity to 26, 20, 9, 3, 7 and 23 respectively (Figure 2). Proteins in a close evolutionary relationship may be analyzed together for their involvement in similar biological processes. Collagen 1 has already been reported as a key player in cattle osteogenesis imperfecta [15]. The FACIT collagens (Fibril Associated Collagens with Interrupted Triple Helices; collagen 9, 12, 14 and 20) associate with collagen 1 and then form a fibrillar structure. In humans, collagen 9 has already been reported to be responsible for skeletal disorders [16]. Collagen 12, 14 and 20 might therefore be potential targets in pathological conditions. Given their similarities and their abnormal structural properties, these protein molecules merit investigation for their involvement in pathological conditions.

Conclusion:
In this research, we tried to uncover hidden information about bovine alpha-1 collagens by analyzing their structural features, e.g. amino acid content, physico-chemical properties, secondary structural features and phylogenetic classification. Various computational tools were used to ease this process. A change in protein structure can impair protein function and give rise to many pathological conditions, and disease conditions interfere with the normal biological processes in animals. Based on comparative characterization and analysis of the evolutionary relationships, it can be hypothesized that collagen 12 and 14 may be potential targets in pathological conditions. In the phylogenetic tree they closely resemble collagen 20, whose cellular and molecular functions have not yet been revealed. Hence it can be assumed that collagen 12 and 20 can also interact with the fibril surface and regulate fibrillogenesis, which is a characteristic feature of collagen 14. Moreover, all of these collagens belong to the FACIT collagen family and share similar properties and abnormal behaviors, e.g. a very low percentage of glycine, a high instability index and a high aliphatic index. To sum up, this study should provide insight for biologists working with ECM proteins who pursue research on collagen, for example to investigate cell-mediated injuries. The findings of this study need further investigation and validation by experimental research.