Genomic profiling of Nipah virus using NGS driven RNA-Seq expression data

Nipah virus (NiV) is an enveloped, single-stranded RNA (ssRNA) paramyxovirus of the genus Henipavirus with a case fatality rate >70%. We analyzed NGS RNA-Seq gene expression data of NiV to detect differentially expressed genes (DEGs) using the statistical R package limma. We used the Cytoscape, Ensembl, and STRING tools to construct the gene-gene interaction tree, the phylogenetic gene tree, and protein-protein interaction networks for functional annotation. We identified 2707 DEGs (p-value <0.05) among 54359 NiV genes. Using a log2FC criterion with a threshold of 1.0, the top up-regulated DEGs were EPST1, MX1, IFIT3, RSAD2, OAS1, OASL, and CMPK2, and the top down-regulated DEGs were SLFN13 and SPAC977.17. The gene-gene interaction tree of the top 20 up-regulated genes showed no significant association between Nipah and Tularemia virus. Similarly, the top 20 down-regulated genes of neither Ebola nor Tularemia virus showed an association with Nipah virus. We therefore document the top up- and down-regulated DEGs for further consideration as biomarkers and as candidates for vaccine or drug design against Nipah virus infection.


Background:
Nipah virus (NiV) is a stage III zoonotic pathogen of the family Paramyxoviridae and belongs to the new genus Henipavirus [1]. It was first discovered during a large encephalitis outbreak in Malaysia in 1998 [2][3][4]. Outbreaks have been recognized nearly every year in Bangladesh since 2001 and occasionally in neighboring India [5][6][7][8][9]. Because of its capacity for person-to-person transmission, its high case fatality rate (>70%), and the lack of any treatment or vaccine, the World Health Organization included Nipah virus among the 7 priority diseases on its Blueprint list, and efforts to develop a Nipah vaccine are underway [10-12]. In endothelial cells, NiV infection strongly involves genes of the interferon response. The gene for the chemokine CXCL10 (interferon-induced protein 10, IP-10) was identified among the top 10 up-regulated genes. CXCL10 functions in generating the inflammatory immune response and in neurotoxicity [13]. Arankalle reported a sequence of 18,252 nucleotides from lung tissue samples [14]. Detection of DEGs is an important branch of transcriptomics research in bioinformatics. RNA sequencing (RNA-seq) is the modern Next Generation Sequencing (NGS) technology for genomic profiling of bacteria, viruses, and other pathogens. It allows DEGs or transcripts associated with a specific trait of interest to be identified from high-dimensional transcriptomic data. Previously, biological and biomedical researchers used microarray technology to discover candidate genes and differentially expressed markers between two or more groups of interest. This approach also includes the identification of disease biomarkers that may be important in diagnosing different types and subtypes of disease, with several implications for prognosis and therapy [15].
©Biomedical Informatics (2019)
This sequence-based technology has created significant scope for studying the transcriptome and has enabled a wide range of novel applications, including detection of alternative splicing isoforms [16][17][18][19] and of novel genes, gene promoters, isoforms, and allele-specific expression [20]. RNA-seq uses NGS technology to sequence cDNA derived from an RNA sample and hence generates millions of short reads [21]. One important objective of RNA-seq is to identify DEGs under different conditions. The usual input for differential expression analysis is a "count matrix", in which each row represents a gene, each column represents a sample, and each cell gives the number of reads mapped to that gene in that sample [22]. A basic research problem in many RNA-seq analyses is the discovery of DEGs between sample groups (e.g. healthy and diseased). RNA-seq has several benefits over microarrays for DE analysis, including a wide dynamic range, a lower background level, and the ability to detect and quantify previously unknown transcripts [23]. The key objectives of this study were to identify differentially expressed genes from large-scale NGS RNA-Seq data and to functionally annotate the Nipah virus.
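The count matrix layout described above can be illustrated with a minimal Python sketch. The function name `build_count_matrix`, the sample names, and the read assignments are hypothetical and chosen only for illustration; they are not data from this study, and a real pipeline would derive the assignments from an aligner and a read-counting tool.

```python
from collections import Counter

def build_count_matrix(assignments, genes, samples):
    """Return a list of rows: one row per gene, one column per sample."""
    # Tally how many reads were mapped to each (gene, sample) pair.
    counts = Counter(assignments)
    # Missing pairs default to 0 because Counter returns 0 for unseen keys.
    return [[counts[(gene, sample)] for sample in samples] for gene in genes]

# Hypothetical input: one (gene, sample) pair per mapped read.
reads = [
    ("MX1", "infected_1"), ("MX1", "infected_1"), ("MX1", "control_1"),
    ("OAS1", "infected_1"), ("IFIT3", "control_1"), ("IFIT3", "infected_1"),
]
genes = ["MX1", "OAS1", "IFIT3"]
samples = ["control_1", "infected_1"]

matrix = build_count_matrix(reads, genes, samples)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Each printed row corresponds to one gene and each position in the row to one sample, which is the shape that differential expression tools generally expect as input.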

Materials and Methods:
NGS RNA-Seq Microarray Gene Expression Dataset:
We used RNA-Seq and microarray gene expression data for the molecular investigation of NiV infection. We collected the complete genome of the selected pathogen from the National Center for Biotechnology Information (NCBI). For the analysis we considered seven datasets from the Gene Expression Omnibus (GEO) database, with accession numbers GSE32902, GSE23986, GSE93861, GSE18064, GSE12108, GSE69980, and GSE89915 [13,24-29]. Functional annotation and biological network analysis of the differentially expressed genes were performed using the STRING, Ensembl, and Cytoscape bioinformatics tools. Figure 1 shows the workflow of this study.

Up and Down Regulated DEGs Detection:
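We classified DEGs by log2 fold change (log2FC), selecting genes below the lower cutoff as down-regulated and genes above the upper cutoff as up-regulated. A log2FC threshold of 1.0 corresponds to a fold change of 0.5 (1/2) for down-regulated genes and 2 for up-regulated genes.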
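The thresholding step can be sketched in Python as follows, assuming the log2FC threshold of 1.0 reported in this study. The function name `classify_by_log2fc`, the use of group means, and the pseudocount of 1 (added to avoid dividing by zero) are assumptions made for this illustration and are not taken from the limma analysis itself.

```python
import math

def classify_by_log2fc(mean_case, mean_control, threshold=1.0, pseudocount=1.0):
    """Return (log2FC, label) for one gene from its two group means."""
    # The pseudocount is an illustrative assumption that keeps the ratio finite.
    ratio = (mean_case + pseudocount) / (mean_control + pseudocount)
    log2fc = math.log2(ratio)
    if log2fc > threshold:        # fold change above 2
        return log2fc, "up"
    if log2fc < -threshold:       # fold change below 1/2
        return log2fc, "down"
    return log2fc, "unchanged"

# Hypothetical expression means, not values from this study.
print(classify_by_log2fc(7.0, 1.0))   # ratio 8/2 = 4
print(classify_by_log2fc(1.0, 7.0))   # ratio 2/8 = 0.25
print(classify_by_log2fc(3.0, 3.0))   # ratio 1
```

The comparison here is strict, so a gene whose fold change is exactly 2 is left unclassified; whether the boundary is inclusive is a design choice that the text does not specify.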

Gene-Gene Interaction Network:
We used the Cytoscape (https://cytoscape.org/) bioinformatics tool to construct a gene-gene interaction network. Based on the calculated p-values, we selected the top 20 up- and down-regulated genes (p-value <0.000481 and log2FC>1) for each of 7 viruses (Nipah, Chikungunya, Dengue type III, Ebola, Tularemia, Rift Valley fever, Zika), with genes ranked in increasing order of p-value. The gene-gene interaction tree of the top 20 up-regulated DEGs showed that Tularemia virus had no association with the other 6 viruses, including Nipah virus, and formed a separate gene tree. The other five viruses, however, showed a significant association with Nipah virus and with the unknown gene symbol (NA; not available), and a strong network was seen among these viruses (Figure 2). In the gene-gene interaction tree of the top 20 down-regulated DEGs, Ebola and Tularemia virus showed no association with the other five viruses and formed a separate pattern. Nipah virus was strongly associated with Chikungunya, Dengue type III, Rift Valley fever, and Zika virus. The unknown gene symbol (NA) showed a strong association with Nipah, Chikungunya, Dengue type III, Rift Valley fever, and Zika viruses (Figure 3).
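One way to see why a virus can appear as a separate tree is to link viruses through the genes their top-ranked lists share and then look for connected groups. The Python sketch below does this with a simple graph traversal; it is not the Cytoscape procedure used in this study, and the function name `shared_gene_components`, the virus labels, and the gene labels are all hypothetical placeholders.

```python
from collections import defaultdict

def shared_gene_components(top_genes):
    """Group viruses that are linked, directly or indirectly, by shared genes."""
    # Map each gene to the set of viruses whose top list contains it.
    gene_to_viruses = defaultdict(set)
    for virus, genes in top_genes.items():
        for gene in genes:
            gene_to_viruses[gene].add(virus)

    # Two viruses are neighbours if they share at least one gene.
    neighbours = {virus: set() for virus in top_genes}
    for viruses in gene_to_viruses.values():
        for virus in viruses:
            neighbours[virus] |= viruses - {virus}

    # Depth-first search collects each connected group of viruses.
    seen, components = set(), []
    for start in top_genes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            virus = stack.pop()
            if virus in component:
                continue
            component.add(virus)
            stack.extend(neighbours[virus] - component)
        seen |= component
        components.append(component)
    return components

# Placeholder lists: virus_D shares no gene with the others.
toy = {
    "virus_A": {"G1", "G2"},
    "virus_B": {"G2", "G3"},
    "virus_C": {"G3"},
    "virus_D": {"G9"},
}
components = shared_gene_components(toy)
print(components)
```

In this toy input, `virus_A` and `virus_C` share no gene directly but end up in the same group through `virus_B`, while `virus_D` forms a group of its own, analogous to a virus that appears as a separate tree in the network.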

Conclusions:
We used the statistical R package limma to analyze NGS RNA-seq data and detect DEGs (biomarkers) of the Nipah virus for use in combating and managing the disease. We identified 2707 DEGs (p-value <0.05) among the 54359 genes of the virus. Of these, 834 were up-regulated and 1873 were down-regulated according to the log2FC approach at a threshold of 1.0. These data will help in the selection of biomarkers and vaccine targets against the virus.