Meta-analysis of lean and obese RNA-seq datasets to identify genes targeting obesity

Obesity is a global health crisis that leads to several metabolic disorders. Advances in next-generation sequencing, together with the open-source online platform Galaxy, which lets users share their data and workflows with little effort, have made transcriptome analysis more accessible. This study aims to identify candidate genes for obesity through differential gene expression analysis. RNA-Seq analysis was performed on six datasets retrieved from the GEO database. In total, 258 samples from obese patients and 55 samples from lean patients were analysed for differentially expressed genes (DEGs). The DEG analysis showed 1971 upregulated genes and 615 downregulated genes at log2FC ≥ 2.5 and p-value < 0.05. Gene enrichment analysis performed using the Gene Ontology resource highlighted pathways associated with obesity, such as cholesterol metabolism, fat digestion and absorption, and glycerolipid metabolism. A protein-protein interaction (PPI) network was built using the STRING database, and the network clusters were visualized using Cytoscape. The protein-protein interactions of the upregulated and downregulated genes were mapped to form a network, in which the PNLIP (Pancreatic lipase) and FTO (Fat mass and obesity associated protein) gene clusters were identified by MCODE as densely connected clusters. PNLIP and FTO, together with their associated genes, were identified as candidate genes for targeting obesity.


Background:
Obesity is increasing at an alarming rate, leading to various metabolic diseases [1]. Identifying candidate genes for obesity is highly important for treating this global epidemic [2]. Advances in technology have produced novel genome sequencing methods in which large DNA fragments are detected using next-generation sequencing (NGS) [3]. In recent years, RNA sequencing has been widely used to continuously monitor changes in the cellular transcriptome [4]. The objective of RNA-Seq analysis is to profile gene expression by identifying genes or their corresponding molecular pathways and to characterize the genes that are differentially expressed between two or more biological conditions, here using the Galaxy platform. The obesity datasets were imported from public databases to identify differentially expressed genes (DEGs) involved in obesity.

Materials and Methods:
Next-generation RNA sequencing samples were retrieved from the NCBI GEO database from six studies, namely GSE152991 [5], GSE132831 [6], GSE86430 [7], GSE148892 [8], GSE161042 [9] and GSE137631 [10] (Table 1). Of a total of 313 samples, 258 were from obese patients and 55 were from lean patients. The datasets were imported into the Galaxy server (https://usegalaxy.org.au/) using the tool Faster Download and Extract Reads in FASTQ format from NCBI SRA (Galaxy Version 2.11.0 + galaxy0). Read quality was checked using the tool FastQC Read Quality reports (Galaxy Version 0.73 + galaxy0). Trimmomatic, a flexible read trimming tool for Illumina NGS data (Galaxy Version 0.36.6) [11], was run with default parameters and Phred quality scores. The consolidated report was generated using MultiQC, which aggregates results from bioinformatics analyses into a single report (Galaxy Version 1.11 + galaxy0) [12]. Sequence mapping and alignment were performed using HISAT2, a fast and sensitive alignment program (Galaxy Version 2.2.1 + galaxy1) [13]. The RNA sequence reads were mapped to the human reference genome, version hg38. featureCounts (Galaxy Version 2.0.1 + galaxy2) was used to measure gene expression in the RNA-Seq experiments from the BAM files. Annotations for gene regions were provided in GTF format. Differential gene expression analysis was performed on a single contrast, obese vs lean patients, with limma-voom (Galaxy Version 3.50.1+galaxy0) [14].
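As an illustration of the transformation that limma-voom applies to the featureCounts output before linear modelling, the following minimal sketch computes voom-style log2 counts per million, log2((count + 0.5) / (library size + 1) × 10^6). It is not the Galaxy tool's code: the gene names and counts are hypothetical, and the library sizes are assumed to be plain column sums, whereas the actual run may use normalized library sizes.

```python
import math

def voom_log_cpm(counts):
    """Return voom-style log2-CPM values.

    counts: dict mapping gene name -> list of raw read counts (one per sample).
    Library sizes are assumed to be raw column sums (no TMM scaling).
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        gene: [
            math.log2((row[i] + 0.5) / (lib_sizes[i] + 1.0) * 1e6)
            for i in range(n_samples)
        ]
        for gene, row in counts.items()
    }

# Hypothetical toy count matrix: two genes, two samples.
toy_counts = {"GENE_A": [10, 0], "GENE_B": [90, 100]}
log_cpm = voom_log_cpm(toy_counts)
for gene, values in log_cpm.items():
    print(gene, [round(v, 3) for v in values])
```

The 0.5 offset keeps zero counts finite on the log scale, which is why a gene with no reads in one sample still receives a value.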
The upregulated and downregulated DEGs were submitted to the online tool g:GOSt for functional enrichment analysis. The DEGs were analysed against the Gene Ontology (GO) resource (http://www.geneontology.org/), which maps genes to known sources of functional information and detects statistically significantly enriched terms. GO analysis was completed for terms in three categories: cellular component (CC), molecular function (MF) and biological process (BP). The cutoff for a significant GO term or pathway was set to p-value < 0.05 and log2FC ≥ 2.5. To further analyse the potential pathways of the overlapping DEGs, the pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [15] that are integrated into the Gene Ontology resources were used to perform pathway enrichment analysis.
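Over-representation tools of this kind typically score a term with a one-sided hypergeometric test: given a background of N genes of which K carry the term, how likely is it that a query list of n genes contains at least k annotated genes by chance? The sketch below shows that calculation only. The function name and the example numbers are hypothetical, and it omits the multiple-testing correction that g:GOSt applies.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """One-sided enrichment p-value P(X >= k).

    N: number of background genes
    K: background genes annotated to the term
    n: genes in the query list (e.g. the DEGs)
    k: query genes annotated to the term
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical numbers, for illustration only.
p_small = hypergeom_enrichment_p(N=10, K=3, n=4, k=2)
p_example = hypergeom_enrichment_p(N=20000, K=100, n=500, k=10)
print(p_small, p_example)
```

A term would be reported as enriched when its p-value, after correction, falls below the chosen cutoff.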

Results:
To gain a preliminary understanding of the mechanisms contributing to obesity, 313 patients (258 obese patients and 55 lean patients) were selected for analysis. The genes differentially expressed between obese and lean samples, as reported by limma-voom, were examined using a volcano plot. The volcano plot shows the fold change in expression of each gene in obese vs lean samples plotted against the statistical significance of its differential expression (Figure 1).
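The coordinates and colouring of a volcano plot can be sketched as follows: the x-axis is log2FC, the y-axis is -log10(p-value), and each gene is labelled using the thresholds stated in the Methods. The gene names and values are hypothetical, and treating log2FC ≤ -2.5 as the downregulated cutoff is an assumption, since the text states only the ≥ 2.5 threshold.

```python
import math

LOG2FC_CUTOFF = 2.5   # threshold from the text
P_CUTOFF = 0.05       # threshold from the text

def volcano_point(log2fc, p_value):
    """Return (x, y, label) for one gene; p_value must be > 0."""
    y = -math.log10(p_value)
    if p_value < P_CUTOFF and log2fc >= LOG2FC_CUTOFF:
        label = "up"
    elif p_value < P_CUTOFF and log2fc <= -LOG2FC_CUTOFF:
        # Assumption: symmetric cutoff for downregulated genes.
        label = "down"
    else:
        label = "not significant"
    return log2fc, y, label

# Hypothetical rows of a differential expression table: (gene, log2FC, p-value).
toy_table = [
    ("GENE_A", 3.0, 0.01),
    ("GENE_B", -4.2, 0.001),
    ("GENE_C", 5.0, 0.20),
    ("GENE_D", 0.8, 0.0001),
]
points = {gene: volcano_point(fc, p) for gene, fc, p in toy_table}
for gene, point in points.items():
    print(gene, point)
```

Genes with a large fold change but a weak p-value, or a strong p-value but a small fold change, both fall outside the selected set.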

Discussion:
The differentially expressed gene set contained both upregulated and downregulated genes. Within the upregulated gene set, PNLIP (Pancreatic lipase) was the most significantly altered gene (log2FC = 11.08), followed by FTO (Fat mass and obesity associated protein; log2FC = 11.06). High expression of PNLIP and FTO has been reported in connection with fat digestion and absorption and with cholesterol metabolism.

Conclusion:
This meta-analysis covered 313 RNA-Seq samples (258 obese and 55 lean) retrieved from the GEO database from six studies, namely GSE152991, GSE132831, GSE86430, GSE148892, GSE161042 and GSE137631. The differential gene expression analysis showed that 1970 genes were upregulated and 615 genes were downregulated at thresholds of log2FC greater than or equal to 2.5 and p-value less than 0.05. The protein-protein interaction network analysis identified the PNLIP and FTO genes as candidate genes for targeting obesity.

Figure 2: The fold enrichment pathway for upregulated genes

Figure 4a: Cluster 1 (PNLIP) derived from the protein-protein interaction (PPI) network using MCODE, with a score of 9.75. Figure 4b: Cluster 2 (FTO) derived from the PPI network using MCODE, with a score of 8.04. Ellipses and lines represent the nodes and edges, respectively.

After STRING analysis, Cytoscape was used to visualize and identify the PPI network. The MCODE plugin (version 2.0.0) was used to identify the hub genes, and the parameters for DEG clustering and scoring were as follows. For Cluster 1: MCODE score = 9.750, Degree Cut-off = 2, Node Score Cut-off = 0.2, k-score = 2, and Max. Depth = 100. It consists of 89 nodes and 429 edges. For Cluster2,

Table 1. Next-generation RNA sequencing samples from the GEO database
A protein-protein interaction (PPI) network of the obesity-associated genes was constructed with the help of the online Search Tool for the Retrieval of Interacting Genes (STRING) database [16]. Cytoscape (version 3.9.1) [17] was used to visualize the PPI network by importing the TSV file exported from the STRING database. Cytoscape organizes the imported network as a graph of nodes and edges, where each node represents the protein product of a single gene and each edge represents a protein-protein association. The Molecular Complex Detection (MCODE) [18] plugin of Cytoscape was used to identify the densely connected regions (clusters) in the PPI network. The top-ranked gene clusters of the interaction network were extracted according to their scores.
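To make the graph representation concrete, the sketch below builds an undirected network from a list of protein pairs, looks up the first neighbours of a node, and prunes the network to its 2-core (the nodes that keep at least two connections after low-degree nodes are repeatedly removed). This is a simplified stand-in, not the MCODE algorithm, which additionally weights nodes by local density before growing clusters. The protein names and edges are hypothetical and are not taken from the STRING export.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj

def first_neighbours(adj, node):
    """Return the proteins directly connected to the given node."""
    return set(adj.get(node, ()))

def k_core(adj, k):
    """Repeatedly drop nodes with fewer than k neighbours; return what remains."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            for other in core.pop(node):
                core[other].discard(node)
            changed = True
    return core

# Hypothetical edge list: a triangle (P1, P2, P3) with a two-node tail (P4, P5).
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P4", "P5")]
graph = build_graph(toy_edges)
neighbours_p3 = first_neighbours(graph, "P3")
core_nodes = set(k_core(graph, 2))
print(sorted(neighbours_p3), sorted(core_nodes))
```

In this toy network the tail nodes are pruned and only the triangle survives, which mirrors how a densely connected cluster stands out from sparsely connected proteins.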

Table 2 .
The first neighbours of PNLIP (Pancreatic lipase) from cluster 1 and FTO (Fat mass and obesity associated protein) from cluster 2 were obtained and are shown in Figures 4a and 4b, respectively.