

Identification of marker genes in Alzheimer's disease using a machine-learning model



Inamul Hasan Madar1,*, Ghazala Sultan2, $, Iftikhar Aslam Tayubi3, Atif Noorul Hasan4, Bandana Pahi5, Anjali Rai6, Pravitha Kasu Sivanandan7, Tamizhini Loganathan8, Mahamuda Begum9, Sneha Rai10



1Department of Biotechnology, School of Biotechnology and Genetic Engineering, Bharathidasan University, Tiruchirappalli - 620024, Tamil Nadu, India; 2Department of Computer Science, Faculty of Science, Aligarh Muslim University, Aligarh - 202002, Uttar Pradesh, India; 3Faculty of Computing and Information Technology, Rabigh, King Abdulaziz University, Jeddah - 21589, Kingdom of Saudi Arabia; 4Department of Computer Science, Jamia Millia Islamia (Central University), Jamia Nagar - 110025, New Delhi, India; 5Department of Bioinformatics, Sambalpur University, Jyoti Vihar, Burla, Sambalpur - 768019, Odisha, India; 6Department of Biotechnology and Bioinformatics, Mahila Maha Vidyalaya, Banaras Hindu University, Varanasi - 221005, Uttar Pradesh, India; 7Department of Bioinformatics, School of Biosciences, Sri Krishna Arts and Science College, Coimbatore - 641008, Tamil Nadu, India; 8Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras and Initiative for Biological Systems Engineering (IBSE), Chennai - 600036, Tamil Nadu, India; 9PG and Research Department of Biotechnology, Marudhar Kesari Jain College for Women, Vaniyambadi - 635751, Tamil Nadu, India; 10Department of Biological Sciences and Engineering, Netaji Subhas Institute of Technology, Dwarka - 110078, New Delhi, India



*Corresponding author e-mail id: inambioinfo@gmail.com


Article Type

Research Article



Received December 20, 2020; Revised February 24, 2021; Accepted February 27, 2021; Published February 28, 2021



Alzheimer's disease (AD) is one of the most common causes of dementia and mostly affects the elderly population. Currently, no reliable diagnostic tool or method is available for the detection of AD. The present study used two distinct datasets of AD genes to identify potential diagnostic biomarkers. The differentially expressed genes (DEGs) curated from both datasets were used for machine learning classification, tissue expression annotation and co-expression analysis. CNPY3, GPR84, HIST1H2AB, HIST1H2AE, IFNAR1, LMO3, MYO18A, N4BP2L1, PML, SLC4A4, ST8SIA4 and TLE1 were identified as highly significant DEGs and exhibited co-expression with other query genes. Moreover, the tissue expression analysis showed that these genes are also expressed in brain tissue. Building on earlier studies of marker gene identification, we considered a different set of machine learning classifiers to improve classification accuracy. Among the six classification algorithms, J48 emerged as the best classifier and could be used to differentiate healthy from diseased samples, followed by SMO/SVM and Logit Boost in classification accuracy.



Alzheimer's Disease, Biomarkers, In-silico Analysis, Machine Learning, Cross-validation, Classifiers, Bayes Net, Naïve Bayes, Decision Table, J48, SMO/SVM, Logit Boost.



Madar et al. Bioinformation 17(2): 348-355 (2021)


Edited by

P Kangueane






Biomedical Informatics



This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.