Effect of non-synonymous SNP on JAK1 protein structure and subsequent function

JAK1 gene plays a critical role in signalling. Malfunction of JAK1 is linked to numerous human diseases ranging from chronic inflammation to cancers and autoimmune diseases. Genetic variations in JAK1 exhibit deleterious effects on gene function leading to the deregulation of signalling pathways. A comprehensive list of nsSNPs potentially affecting the structure and function of JAK1 gene is not available. We report 3 deleterious nsSNPs (F78L, Q644R and S646F) in the coding region of JAK1 with predicted structure and function linking to diseases. However, further studies are needed to validate this preliminary observation.


Background:
The Janus kinases (JAKs) are one of ten recognized families of nonreceptor protein tyrosine kinases that constitute a novel type of signal transduction pathway activated in response to cytokines and interferons. They govern many essential cellular functions such as survival, growth, development, apoptosis and immune regulation [1]. Activation of JAK in response to ligand involves receptor dimerization and phosphorylation creating docking sites for signal transducers and activators of transcription (STATs). Thus, activation of STAT proteins plays an important role in JAK kinase function. Mammals have four members of this family: JAK1, JAK2, JAK3 and Tyrosine Kinase 2 (TYK2). Each member possesses a Cterminal protein tyrosine kinase domain, an adjacent kinase-related domain and five further domains extending towards the N terminus which have amino acid similarity between members of this family [2]. In humans, the JAK1 gene is located on chromosome 1p31.3. JAK1 plays a significant role in lymphoid cell precursor proliferation, survival and differentiation. JAK1 somatic mutations occur in individuals with acute lymphoblastic leukemia (ALL) [3]. JAK1 loss of function results in a deficit in the production of mature B lymphocytes due to a block in differentiation at the pro-B/pre-B cell transition step [4]. JAK1 gene mutations are hypothesised as initial molecular defects in human cancer and autoimmune diseases [5]. Two different heterozygous mutations in the JAK1 gene can contribute to complete loss of the protein in several different prostate cancer cell lines [6].
Understanding the human genetic variation is one of the major challenges to analyse the differences in susceptibility to diseases and designing individualized therapeutic treatments. It was estimated that 90% of human genetic variations were caused by single nucleotide polymorphisms (SNPs). SNPs are often neutral while some of them contribute to disease predisposition by modifying protein function. The non-synonymous single nucleotide polymorphisms (nsSNPs) trigger genomic disparities in the gene coding regions forming about half of all genetic changes related to human inherited diseases. Numerous efforts have been implemented to illustrate the deleterious effects of nsSNPs on the stability of proteins linking to its function. Computer aided prediction tools have been widely applied to identify the effect of deleterious SNPs in candidate genes based on the biochemical severity of the amino acid substitution towards phenotype implications [8]. The functional consequences of most of the nsSNPs in JAK1 are still unknown at the structural level. Therefore, it is of interest to investigate the nsSNPs in the JAK1 gene and the effect on the protein structure and consequent function using prediction models. The deleterious nsSNPs were further analysed for protein stability linking to altered molecular function using computer aided known data enabled prediction tools.

Methodology: Retrieval of SNPs:
Data on SNPs in human JAK1 gene was collected from the dbSNP-NCBI (http://www.ncbi.nlm.nih.gov/SNP/). The FASTA format of the protein sequence of JAK1 gene was obtained from the UniProtKB database (http://www.uniprot.org/uniprot/).

Functional consequence of nsSNPs distinguishing intolerant from tolerant (SIFT):
SIFT (https://sift.bii.a-star.edu.sg/) is a predicting tool for analysing the effects of an amino acid substitution on protein function. SIFT is used to the human variant databases to distinguish mutations (deleterious) involved in disease from neutral polymorphisms. The rsIDs of each nsSNP of JAK1 gene were submitted as a query [9]. Functional impacts of nsSNPs by screening for non-acceptable polymorphisms (SNAP2): SNAP2 (https://www.rostlab.org/services/SNAP/) is a program to predict the functional effects of nsSNPs by differentiating between effective and neutral variants. The FASTA format of protein sequences is used as an input for SNAP2 [10].

Functional effects of nsSNPs using PolyPhen version 2:
PolyPhen-2 (Polymorphism phenotyping) (http://genetics.bwh. harvard.edu/pph2/) scores the impact of amino acid substitutions on the stability and function of proteins. The prediction output is classified as probably damaging, possibly damaging, and benign with specificity and sensitivity values for a mutation [11].
Functional impacts of nsSNPs using PROVEAN: PROVEAN (Protein variation effect analyzer) (http://provean.jcvi.org/index.php) predicts the functional impact of protein sequence variations as 'deleterious' or 'neutral' by assessing the single amino acid substitutions. FASTA format data with substitutions predicted by the SIFT server is used as an input nsSNPs impact on surface and solvent accessibility of protein using NetSurfP: The FASTA sequence of JAK1 protein was submitted to NetSurfP (http://www.cbs.dtu.dk/services/NetSurfP/) to predict secondary structure, surface, and solvent accessibility of amino acids. The outcome shows buried (low accessibility), partially buried (moderate accessibility) and exposed (high accessibility) region in protein structure [17]. Structure to function assignment of nsSNPs using project HOPE: HOPE (Have (y) Our Protein Explained) (https://www3.cmbi.umcn.nl/hope) hypothesizes the effects of the mutation on the 3D structure and the corresponding function. HOPE collects information from data on the 3D coordinates of the protein from PDB, sequence annotations from the UniProt database and DAS prediction [18].

Results and Discussion:
A dataset of SNPs in human JAK1 gene (gene ID: 3716) was retrieved from the dbSNP database. A total of 52315 SNPs were found in human JAK1 gene and 1957 of them were reported in the exon region of the gene. Among them 483 (473 missense and 10 nonsense) were non-synonymous SNPs, which contribute to only 0.92% of all SNPs known in human JAK1 gene. Different

725
©Biomedical Informatics (2019) computational algorithms such as SIFT, SNAP2, Polyphen-2, PROVEAN, SNPs & GO, PhD-SNP and PANTHER were used for finding the deleterious nsSNPs. SIFT was used for the preliminary screening and identification of functionally significant nsSNPs. SIFT classifies missense variant as tolerated (equal to 0.05) or deleterious (less than 0.05). SIFT identified 30 substitutions as tolerated and 20 as deleterious from 483 nsSNPs ( Table 1). It should be noted that the rest of the rsIDs were not found by SIFT [9]. SNAP2 and PolyPhen-2 were used to support the data obtained from SIFT. SNAP2 illustrates the effects of specific mutation altering the native protein function with expected accuracy. SNAP2 gives a score (ranges from −100 strong neutral prediction to +100 strong effect prediction) [10]. The results from SNAP2 shows that out of 50 nsSNPs 15 variants (30%) was significant while the rest (70%) show no effect (Table 1) Table 1). The prediction accuracy using SIFT, SNAP2 and PolyPhen-2 scores was further validated using PROVEAN. The functional effects of protein sequence variations are illustrated using PROVEAN. The predicted values of amino acid substitutions are above -2.5 are neutral and those below or equal to -2.5 are considered as deleterious [12]. Among 50 nsSNPs, 34 amino acid substitutions (68%) were predicted to be neutral (score is above-2.5) and the remaining 16(32%) were having score below or equal -2.5 are assigned to be associated with diseases (Table 1).  Table 1). The most deleterious nsSNPs was confirmed by concordance. It should be noted that mutants were predicted as deleterious by sequence data aided by SVM models. Thus, a total of 3 variants with F78L (rs200161963), Q644R (rs374267637) and S646F (rs151047872) were found to be deleterious by the above methods.  (Table 2).
NetsurfP calculates protein solvent accessibility and assigns secondary structures [17]. The variants S646F, F78L and Q644R were assessed for solvent accessibility and stability. Among these 3 deleterious nsSNPs, 2 variants (S646Fand Q644R) and their respective wild variants were exposed to the surface whereas F78L and its respective wild variant were buried ( Table 2). It is known that polar side chains tend to be exposed to the solvent whereas hydrophobic residues tend to be buried in the interior of the protein the presence of non-polar residues on the surface may reduce stability in the S646F mutant. The leucine side chains cluster together within proteins to stabilize the protein structure through hydrophobic effect. The substitution of phenylalanine by leucine increases protein stability in F78L. The substitution of the polar uncharged residue with a basic charged residue increases the hydrophilicity of surface area in Q644R.  The 3D structure of protein plays a vital role in unveiling the molecular mechanisms leading to a disease. The evaluation of 3D structure of the mutant protein using HOPE [18] was completed. The mutants S464F (Figure 1) and Q644R (Figure 2) were larger than the wild type and they are situated on the surface of the protein. The F78L mutant was smaller in size and it is located within a domain as annotated in UniProt as FERM ( Table 3). The wild-type Q644R is NEUTRAL and the mutant is positively charged. The hydrophobicity of the mutated residue in S464F is more than the wild-type. It is noted in F78L that the mutated residue is located near a highly conserved region. The mutant Q644R shows a substitution of glutamine (polar uncharged) to arginine (basic charged). The mutation of a neutral amino acid with a positively charged residue clashes with the neighboring residues. The large size of the mutated residue on the surface of the protein affects the interaction with other molecules. The combined effect of mutations alters the protein function. Phenylalanine (aromatic) is substituted by leucine (non-polar aliphatic) in F78L. The mutated residue is within FERM and it is crucial for binding with other molecules. The variant with changed properties disrupts the domain with abolishment of function. Further the mutation also results in reduced contact due smaller residue size. The presence of the variant F78L in the highly conserved region is damaging. The substitution of serine (polar uncharged) by phenylalanine (aromatic) in S646F is interesting. This mutation also results in increased size and hydrophobicity of the residue. Residue interacts with different molecules on the surface of the protein. Therefore, its size and location along with changed hydrophilicity has a significant effect in the structural disruption of the protein leading to altered function.

Conclusion:
JAK1 plays a crucial role in regulating diverse signalling cascades. Increasing evidence has suggested that functional SNPs of JAK1 gene is linked to inflammation and cancer. Several nsSNPs of JAK1 gene are known. However, the functional consequences of the nsSNPs in JAK1 as a result of corresponding structural changes remain unknown. Moreover, discriminating deleterious nsSNPs with potential effects on disease susceptibility from tolerated variants is a major challenge. We report the identification of 3 potentially deleterious nsSNPs (F78L, Q644R and S646F) in the coding region of JAK1 gene. Further studies are needed to validate this preliminary observation.