Identification and analysis of pathogenic nsSNPs in human LSP1 gene

LSP1 (lymphocyte-specific protein 1) plays an important role in neutrophil motility, adhesion to fibrinogen matrix proteins, and trans-endothelial migration. Variation in the LSP1 gene is associated with leukemia and lymphomas in tumor cells of Hodgkin's disease and breast cancer. Despite extensive study of human LSP1, a comprehensive analysis of the single nucleotide polymorphisms (SNPs) of the gene is not available. Therefore, it is of interest to identify, collect, store and analyze the SNPs of the LSP1 gene in relation to several known diseases. Hence, the SNP data (398 rsIDs) were downloaded from the dbSNP database and mapped to the genomic coordinates of the "NM_002339.2" transcript expressed by LSP1 (P33241). There were 300 nsSNPs with missense mutations in the dataset. SIFT, PROVEAN, Condel, and PolyPhen-2 were then used to identify 29 highly deleterious or damaging non-synonymous SNPs (nsSNPs) of LSP1. These high-confidence damaging nsSNPs were further analyzed for disease association using the SNPs & GO tool. The nsSNPs C283R, G324R, Y328D and H325P showed disease association with high prevalence.


Background:
The human LSP1 (lymphocyte-specific protein 1) gene encodes an intracellular F-actin binding protein, recently renamed leukocyte-specific protein. The protein is expressed in lymphocytes, macrophages, neutrophils, and endothelium, and regulates adhesion to fibrinogen matrix proteins, neutrophil motility, and transendothelial migration. Alternative splicing produces multiple transcript variants that encode different isoforms. The gene is most highly expressed in spleen (RPKM 60.6) and appendix (RPKM 43.3), and is also expressed in other tissues [1, 2]. LSP1 is found on the internal surface of the plasma membrane and in the cytoplasm, and is thought to mediate cytoskeleton-driven responses in activated leukocytes that involve receptor capping, cell-cell interactions and cell motility [3]. LSP1 modulates leukocyte populations in resting and inflamed peritoneum [2]. The LSP1 protein is detected in leukemia and lymphomas in tumor cells of Hodgkin's disease and breast cancer [4]. The motility of melanoma cells is inhibited even at low levels of LSP1 expression [5]. Many studies have identified deleterious and disease-associated mutations by predicting pathogenic nsSNPs from their functionally and structurally damaging properties [6][7][8][9]. Computational studies provide an efficient platform for analyzing the pathological consequences of genetic mutations and for determining their underlying molecular mechanisms [10-11]. Single nucleotide polymorphisms (SNPs) are common genetic variations that contribute greatly to phenotypic variation in populations. SNPs can alter the function of proteins. In the coding region of a gene, SNPs may be synonymous, non-synonymous (nsSNPs) or nonsense. Synonymous SNPs change the nucleotide base but, owing to the degeneracy of the genetic code, do not change the amino acid residue in the protein sequence.
The nsSNPs, also called missense variants, alter an amino acid residue in the protein sequence and can thus change the function of the protein by altering its activity, solubility and structure. Nonsense SNPs introduce premature termination in the protein sequence. SNPs have emerged as genetic markers for diseases, and many SNP markers are available in public databases. With recent advances in high-throughput sequencing technology, many new SNPs have been mapped to the human LSP1 gene. However, not all SNPs are functionally important. Despite extensive studies of LSP1 proteins in humans and of the effect of their polymorphisms in diseases, no attempt has been made to comprehensively and systematically analyze the functional consequences of SNPs of the LSP1 gene. The aim of this study is to identify the high-confidence pathogenic SNPs of the LSP1 gene and determine their functional consequences using computational methods.
©Biomedical Informatics (2019)

SNPs dataset
The SNPs of LSP1 (lymphocyte-specific protein 1) were retrieved from the dbSNP database [12]. I used "LSP1" as the search term and filtered the SNPs. Furthermore, I mapped these SNPs to the genomic coordinates of the "NM_002339.2" transcript, which expresses the LSP1 protein (P33241), for computational analysis of the effect of missense variants. The protein sequence of LSP1 (P33241) was retrieved from the UniProt database [18]. I employed various computational approaches to identify the pathogenic SNPs and their structural and functional consequences for LSP1 (Figure 1).

Tools used for the prediction of SNPs effects

Predicting deleterious and damaging nsSNPs

SIFT:
The algorithm predicts whether a coding base substitution is tolerated or not tolerated based on the properties of amino acids and sequence homology [13]. The tool assumes that vital positions in the protein sequence have been conserved throughout evolution, so substitutions at conserved alignment positions are expected to be less tolerated and more likely to affect protein function than those at diverse positions. I used SIFT version 2.0 [19], which gives an amino acid substitution score from zero to one. At the default threshold, SIFT predicts a substitution as damaging if the score is <0.05 and as tolerated if the score is ≥0.05.

PROVEAN:
The online tool uses an alignment-based scoring method to predict the functional consequences of single and multiple amino acid substitutions and of in-frame deletions and insertions [14]. The tool has a default threshold score of -2.5: a protein variant scoring below it is predicted as deleterious, and one scoring above it as neutral.
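
The default SIFT and PROVEAN cutoffs described in this section can be written as two small helper functions. This is a minimal sketch, not part of either tool: the function and constant names are hypothetical, and treating a score exactly equal to the PROVEAN cutoff as neutral is an assumption, since the text only describes scores below and above it.

```python
SIFT_CUTOFF = 0.05     # SIFT scores below this are predicted damaging
PROVEAN_CUTOFF = -2.5  # PROVEAN scores below this are predicted deleterious


def sift_call(score):
    """Classify a SIFT score (0 to 1) at the default threshold."""
    return "damaging" if score < SIFT_CUTOFF else "tolerated"


def provean_call(score):
    """Classify a PROVEAN score at the default threshold.

    Assumption: a score exactly at the cutoff is reported as neutral.
    """
    return "deleterious" if score < PROVEAN_CUTOFF else "neutral"
```

For example, `sift_call(0.0)` gives "damaging" and `provean_call(-1.2)` gives "neutral".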

Condel (CONsensus DELeteriousness):
This tool evaluates the probability that a missense single nucleotide variant (SNV) is deleterious. It computes a weighted average of the scores of SIFT, PolyPhen2, Mutation Assessor and FatHMM [15].
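
The weighted-average idea can be illustrated with a minimal sketch. It is not Condel's implementation: the equal weights, the example scores, and the assumption that every input score has already been normalised to the range 0 to 1 (higher meaning more deleterious) are all hypothetical choices made for illustration.

```python
def weighted_average(scores, weights):
    """Return the weighted mean of per-tool deleteriousness scores.

    Both arguments are dicts keyed by tool name. Scores are assumed to be
    normalised to [0, 1], with higher values meaning more deleterious
    (a raw SIFT score, where low means damaging, would need inverting first).
    """
    total_weight = sum(weights[tool] for tool in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[tool] * weights[tool] for tool in scores) / total_weight


# Hypothetical normalised scores and equal weights, for illustration only.
example_scores = {"SIFT": 1.0, "PolyPhen2": 0.98, "MutationAssessor": 0.8, "FatHMM": 0.7}
example_weights = {"SIFT": 1.0, "PolyPhen2": 1.0, "MutationAssessor": 1.0, "FatHMM": 1.0}
consensus = weighted_average(example_scores, example_weights)
```

With equal weights the result is simply the arithmetic mean of the four scores; unequal weights would let a more reliable predictor contribute more to the consensus.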

PolyPhen-2:
This tool predicts the structural and functional consequences of a particular amino acid substitution in a human protein [16]. The prediction of the PolyPhen-2 server [20] is based on a number of features, including structural information and sequence comparison. The PolyPhen-2 score varies from 0.0 (benign) to 1.0 (damaging). The PolyPhen-2 output places SNPs into three basic categories: benign (score <0.2), possibly damaging (score between 0.2 and 0.96), or probably damaging (score >0.96).
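
The three PolyPhen-2 categories can be expressed as a short function. This is a sketch using the cutoffs quoted above; the function name is hypothetical, and assigning scores of exactly 0.2 and 0.96 to "possibly damaging" is an assumption about how the boundaries are handled.

```python
def polyphen2_category(score):
    """Map a PolyPhen-2 score to the category names used in the text."""
    if score < 0.2:
        return "benign"
    if score <= 0.96:
        # Assumption: both boundary values fall in the middle category.
        return "possibly damaging"
    return "probably damaging"
```

For example, `polyphen2_category(0.99)` gives "probably damaging", which is the category required by the stringent filter applied later in the study.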

Predicting disease associated nsSNPs

SNPs & GO:
A web server that predicts whether an amino acid substitution is associated with a disease or not [17]. It is an SVM (Support Vector Machine) based tool that takes as input features of the protein sequence, evolutionary information, and functional annotation according to Gene Ontology terms. Isoform 1 of the Swiss-Prot entry for LSP1 (P33241) was used, and the list of amino acid mutations was provided. The results give the probability of each LSP1 polymorphism being disease associated by three methods: (a) SNPs & GO, (b) PhD-SNP, and (c) PANTHER. A variation with a probability score >0.5 is predicted as disease associated.

Results and Discussion:
A total of 398 rsIDs of nsSNPs mapped to the human LSP1 gene were downloaded from the dbSNP database of NCBI (Table 3) after filtering for variation class SNV and function class missense. In addition, 9590 SNPs mapped to introns, 457 SNPs to the 5'UTR and 134 SNPs to the 3'UTR, out of a total of 10815 SNPs across the different variation classes (Figure 2). Some rsIDs are associated with multiple SNPs and therefore fall in different classes.

Predicting deleterious and damaging nsSNPs
In order to predict the damaging or deleterious nsSNPs, multiple consensus tools were employed. Initially, the online tool VEP was used [21]. Its advantages are that it uses the latest human genome assembly, GRCh38.p10, and can return predictions for thousands of SNPs from multiple tools, including SIFT, Condel, and PolyPhen-2, at a time. The 398 nsSNP accession numbers were uploaded to VEP and the prediction results were taken for further analysis. 300 missense SNPs were mapped to NM_002339.2 at the default scores of the consensus tools based on sequence and structure homology methods: (a) SIFT (score <0.05), (b) PROVEAN (score <-2.5) and (c) Condel (score >0.522). To obtain high-confidence nsSNPs impacting the structure and function of LSP1, I applied highly stringent scores across the different consensus tools. With SIFT (score = 0), PolyPhen (score >0.96) and Condel (score >0.9), I obtained 40 nsSNPs (Table 1). These 40 nsSNPs were further analyzed by PROVEAN, which placed 29 nsSNPs in the deleterious category at the default cutoff score of -2.5, with a predicted damaging effect on protein structure and function (Table 1).

Identifying disease associated nsSNPs
Furthermore, the 29 selected amino acid substitutions in the LSP1 protein were analyzed for disease association. The LSP1 protein ID "P33241" (isoform 1) and its amino acid mutations were submitted to the "SNPs & GO" tool [22], and the disease associations predicted by three different methods were analyzed. (a) SNPs & GO predicted 4 SNPs (C283R, G324R, Y328D and H325P) as disease associated; (b) PhD-SNP predicted 14 SNPs (R207P, I227T, Q233R, Q233K, T235I, T235P, E239K, C283R, W297S, Y328D, Y318C, K319T, G324R and H325P) as disease associated; and (c) PANTHER predicted 4 SNPs (C283R, L296H, S276C and G301R) as disease associated (Table 2).
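
The stringent filter and the comparison of the three disease-association predictors can be sketched as follows. The function name and its argument names are hypothetical, and no per-variant scores are reproduced here; the substitution lists are the ones reported in the text for each predictor.

```python
from collections import Counter


def passes_stringent_filter(sift, polyphen, condel, provean):
    """Apply the cutoffs stated in the text to one variant's scores:
    SIFT = 0, PolyPhen > 0.96, Condel > 0.9, then PROVEAN < -2.5."""
    return sift == 0 and polyphen > 0.96 and condel > 0.9 and provean < -2.5


# Disease-associated substitutions reported in the text for each predictor.
disease_calls = {
    "SNPs & GO": {"C283R", "G324R", "Y328D", "H325P"},
    "PhD-SNP": {
        "R207P", "I227T", "Q233R", "Q233K", "T235I", "T235P", "E239K",
        "C283R", "W297S", "Y328D", "Y318C", "K319T", "G324R", "H325P",
    },
    "PANTHER": {"C283R", "L296H", "S276C", "G301R"},
}

# Count how many of the three predictors flag each substitution.
support = Counter(v for calls in disease_calls.values() for v in calls)
flagged_by_all = sorted(v for v, n in support.items() if n == len(disease_calls))
flagged_by_two_or_more = sorted(v for v, n in support.items() if n >= 2)
```

On these lists, C283R is the only substitution flagged by all three predictors, while G324R, H325P and Y328D are each flagged by two (SNPs & GO and PhD-SNP).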

Conclusion
A comprehensive analysis of SNPs of the human LSP1 protein with known disease-associated mutations is reported for the first time.
The study identified 29 nsSNPs as highly damaging nsSNPs of the human LSP1 protein. These high-confidence damaging nsSNPs were further analyzed for disease association by manual data mapping. Prediction analysis shows that the SNPs C283R, G324R, H325P and Y328D have high prevalence for disease association. The data imply that the reported nsSNPs could alter the structure, and hence the function, of the LSP1 protein, resulting in pathogenicity with abnormal symptoms describing the disease states. These nsSNPs are predicted to be associated with significant pathogenicity, pending experimental verification to link them to disease prevalence.