Serum and Plasma Metabolomic Biomarkers for Lung Cancer

Metabolomic biomarker detection is important for both drug discovery and early prediction of lung cancer. Mortality can be reduced if the cancer is detected at an early stage, but current diagnostic techniques for lung cancer are not prognostic. If we know which metabolites show considerably different intensity levels between cancer and control subjects, both early diagnosis and drug discovery become easier. In this paper we therefore identify the influential metabolites in plasma and serum blood samples for lung cancer, and from them the biomarkers that may help early disease prediction and drug discovery. To identify the influential metabolites, we used one parametric test (Student's t-test) and one non-parametric test (Kruskal-Wallis test). We also categorized the metabolites as up-regulated or down-regulated using a heatmap plot, and identified the biomarkers using a support vector machine (SVM) classifier and pathway analysis. We found 27 influential (p-value<0.05) metabolites in the plasma samples and 13 influential (p-value<0.05) metabolites in the serum samples. Based on the SVM importance plot, pathway analysis and correlation network analysis, we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers.


Background:
Lung cancer is the leading cause of cancer mortality in the United States and worldwide [1]. In 2012, 1.8 million people were affected by lung cancer and 1.6 million died of it worldwide [2]. It is the most common cause of cancer-related death in men and the second most common in women, after breast cancer [3]. Early diagnosis of lung cancer can increase the survival rate to 85% [4]. To date, there are no FDA-approved diagnostic tests for detecting lung cancer [5]. However, owing to developments in molecular biology, early diagnosis of cancer is becoming possible through metabolomics data analysis.
Metabolomics is a powerful high-throughput technology based on the entire set of metabolites. It provides potentially useful information because it measures and quantifies the end products of cellular metabolism, so any disorder of a cellular process shows up as a change in metabolite levels. Identifying the differentially expressed (DE) metabolites between normal subjects and cancer patients is therefore very important for early diagnosis and for metabolomic biomarker discovery. There are several parametric and non-parametric approaches for DE metabolite identification. In this paper, we also classified the metabolites as up-regulated or down-regulated using a cluster heatmap plot [12]. Knowing which metabolites are up-regulated and which are down-regulated in lung cancer is important for early diagnosis, drug discovery and biomarker discovery. Finally, we identified the biomarker metabolites on the basis of the heatmap plot, the importance plot (the importance score is calculated using an SVM classifier with a radial basis kernel function), pathway analysis and the correlation network plot.
In this paper, we used plasma and serum blood samples to identify the significant metabolites and to discover biomarkers. We found 27 significant (p-value<0.05) metabolites in the plasma samples and 13 significant (p-value<0.05) metabolites in the serum samples for lung cancer. Based on the importance plot, pathway analysis and metabolomic correlation network analysis, we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers for lung cancer. Among these metabolites, taurine, aspartic acid and pyruvic acid are up-regulated and glutamine and inosine are down-regulated in cancer patients.

Methodology:
In this paper, we identified the significant metabolites in plasma and serum samples using Student's t-test and the Kruskal-Wallis test. These tests were carried out in R using the functions t.test and kruskal.test. The heatmap plot and the SVM importance plot were also produced in R, using the gplots and caret packages respectively. Pathway analysis was performed with the online software MetaboAnalyst 3.0 [13]. The analyzed dataset and the two methods used to identify significant metabolites, Student's t-test and the Kruskal-Wallis test, are described in detail below.
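Student's t-test:
If X is a metabolomics data matrix that contains two types of samples (e.g., cancer vs. control), then for the i-th metabolite, $x_{ij}$, $j = 1, 2, \ldots, n_1$, is the sample for type 1 (e.g., cancer) with sample size $n_1$, and $x_{ik}$, $k = 1, 2, \ldots, n_2$, is the sample for type 2 (e.g., control) with sample size $n_2$. The null hypothesis is $H_0$: "the i-th metabolite is not differentially expressed between the cancer and control groups".
Let $\bar{x}_{i1}, \bar{x}_{i2}$ and $s_1^2, s_2^2$ denote the sample means and sample variances of the i-th metabolite in the two groups. If $\sigma_1^2 = \sigma_2^2$, the test statistic is
$$t = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
If $\sigma_1^2 \neq \sigma_2^2$, the test statistic is
$$t = \frac{\bar{x}_{i1} - \bar{x}_{i2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.$$
In both cases, $H_0$ is rejected at the α% level of significance if the absolute value of the calculated t exceeds the tabulated t value with $n_1 + n_2 - 2$ degrees of freedom at that level (in the unequal-variance case, the degrees of freedom are commonly replaced by the Welch-Satterthwaite approximation). Usually $H_0$ is rejected if the p-value < 0.05.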
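As an illustration of the two-sample t statistics for a single metabolite, here is a minimal Python sketch. It is not the paper's code (the analysis used R's t.test); the function names pooled_t and welch_t and the intensity values are hypothetical, and only the statistic and its degrees of freedom are computed, not the p-value.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Equal-variance (Student) t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x, y):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up intensities of one metabolite in the two groups.
cancer = [2.1, 2.5, 2.8, 3.0, 2.6]
control = [1.9, 2.0, 2.2, 1.8, 2.1]
t_pooled, df_pooled = pooled_t(cancer, control)
t_welch, df_welch = welch_t(cancer, control)
```

With equal group sizes, as in this dataset (41 vs. 41), the two statistics coincide and only the degrees of freedom differ.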

Kruskal-Wallis:
This non-parametric test was proposed by Kruskal and Wallis [14]; it is used when the data do not satisfy the normality assumption or contain outliers. The Kruskal-Wallis test statistic for k groups, the i-th of size $n_i$, is defined by
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1),$$
where N is the total number of observations and $R_i$ is the sum of the ranks for the i-th sample. The null hypothesis $H_0$ is that all k distribution functions are equal.
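The rank-based statistic can be sketched in a few lines of Python. This is an illustrative version only: the helper names are hypothetical, tied values receive their average rank, and no tie correction is applied to H (R's kruskal.test, which the paper used, does apply one). Under the null hypothesis, H is compared against a chi-square distribution with k - 1 degrees of freedom.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), no tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = rank_with_ties(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum R_i of this group
        total += r_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Two fully separated toy groups: rank sums are 6 and 15, so H = 27/7.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
```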

Dataset Description:
The dataset used in this paper was produced by gas chromatography time-of-flight mass spectrometry (GC-TOF-MS) from the blood samples of 82 subjects (20 males and 62 females). Of the 82 subjects, 41 were patients with lung cancer and the other 41 were individuals without cancer. The blood samples were acquired from the bio-repositories of two institutes: the Fred Hutchinson Cancer Research Center (FHCRC) and the University of California at Davis Medical Center (UCDMC). All samples were collected with the individuals' consent, for research purposes only, following protocols approved by each institution's Institutional Review Board (IRB). Blood was collected in EDTA tubes, and the serum and plasma samples were prepared using approved protocols and stored at −80 °C. The raw GC-TOF-MS data were processed with ChromaTOF software (v. 2.32) for peak finding and mass spectral deconvolution. Result files were exported and filtered for consistency using the UC Davis Metabolomics BinBase database. Finally, 158 metabolites were identified as known metabolites. The dataset thus contains 82 subjects (41 cancer and 41 control) and 158 metabolites. It was produced by Oliver Fiehn under study ID ST000392 [5]. We normalized the dataset using log2 transformation and auto-scaling.
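The normalization step can be illustrated as follows. This sketch assumes that auto-scaling has its usual metabolomics meaning (each metabolite is mean-centered and divided by its sample standard deviation) and that all intensities are strictly positive; the function name and the values are made up for illustration.

```python
import math
from statistics import mean, stdev

def log2_autoscale(column):
    """Log2-transform one metabolite's intensities, then auto-scale them
    (subtract the mean and divide by the sample standard deviation)."""
    logged = [math.log2(v) for v in column]
    m, s = mean(logged), stdev(logged)
    return [(v - m) / s for v in logged]

# Made-up raw intensities of one metabolite across four subjects.
intensities = [1024.0, 2048.0, 4096.0, 512.0]
scaled = log2_autoscale(intensities)
```

After this step every metabolite has mean 0 and unit variance, so metabolites measured on very different intensity scales become comparable.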

Results and Discussion:
To identify the significant metabolites in the plasma and serum blood samples for lung cancer, we used Student's t-test and the Kruskal-Wallis test, with p-values adjusted using the Benjamini-Hochberg (BH) procedure. The significant metabolites of the plasma and serum samples are listed in Table 1. Table 1 shows that 27 significant metabolites were identified in the plasma samples; of these, the t-test identified 24 and the Kruskal-Wallis test identified 25 (BH adjusted p-value<0.05). In the serum samples, 13 significant metabolites were identified; of these, the t-test identified 12 and the Kruskal-Wallis test identified 11 (BH adjusted p-value<0.05). Using the significant metabolites, we drew heatmap plots to classify the up-regulated and down-regulated metabolites in the plasma and serum samples, and we ranked the metabolites by importance (the importance score is calculated using an SVM classifier with a radial basis kernel function); these results are shown in Figure 1 and Figure 2 respectively. Furthermore, we carried out pathway analysis and drew correlation network plots to identify the plasma and serum biomarkers for lung cancer; these are shown in Figure 3 and Figure 4. From Figure 3(a), we found three important metabolic pathways for the plasma biomarkers: (i) the alanine, aspartate and glutamate metabolism pathway, (ii) the taurine and hypotaurine metabolism pathway and (iii) the pyruvate metabolism pathway. From Figure 4(a), we found two important metabolic pathways for the serum biomarkers: (i) the alanine, aspartate and glutamate metabolism pathway and (ii) the taurine and hypotaurine metabolism pathway.
Based on the importance plots (Figure 1 and Figure 2), pathway analysis and metabolomic correlation network analysis (Figure 3 and Figure 4), we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers for lung cancer. Among these metabolites, taurine, aspartic acid and pyruvic acid are up-regulated and glutamine and inosine are down-regulated in cancer patients. These are results of a dry-laboratory, untargeted metabolomics analysis. To obtain final and more accurate results, a further step would be a wet-laboratory experiment for targeted metabolomics analysis.
Table 1: List of significant metabolites of plasma and serum samples for lung cancer.
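For readers unfamiliar with the BH procedure, the adjustment of a list of raw p-values can be sketched as below. This is an illustrative stand-alone Python version, not the paper's code; the function name bh_adjust and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        # Scale by m / rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up raw p-values for four metabolites.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
# A metabolite is called significant when its adjusted p-value is below 0.05.
significant = [a < 0.05 for a in adj]
```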

Conclusion:
We analysed GC-TOF-MS based untargeted metabolomics data from plasma and serum blood samples. The samples were collected from 41 lung cancer cases and 41 control subjects, with the aim of identifying the significant metabolites and discovering plasma and serum biomarkers for lung cancer. We found 27 significant metabolites (BH adjusted p-value<0.05) in the plasma samples and 13 significant metabolites (BH adjusted p-value<0.05) in the serum samples. We also found 3 important pathways in the plasma samples: (i) the alanine, aspartate and glutamate metabolism pathway, (ii) the taurine and hypotaurine metabolism pathway and (iii) the pyruvate metabolism pathway; and 2 important pathways in the serum samples: (i) the alanine, aspartate and glutamate metabolism pathway and (ii) the taurine and hypotaurine metabolism pathway. On the basis of the importance plot, pathway analysis and metabolomic correlation network analysis, we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers for lung cancer. Among these metabolites, taurine, aspartic acid and pyruvic acid are up-regulated and glutamine and inosine are down-regulated in cancer patients. We think this analysis could be helpful for targeted metabolomics researchers, who may validate the results by wet-laboratory experiments.