Title

Handling class imbalance problem in miRNA dataset associated with cancer

 

Authors

Ram Kothandan

 

 

Affiliation

Department of Biological Sciences, BITS PILANI K K Birla Goa Campus, Zuarinagar, Vasco Da Gama, India

 

 

Email

mailram1986@gmail.com; *Corresponding author

 

Article Type

Hypothesis

 

Date

Received January 09, 2015; Accepted January 23, 2015; Published January 30, 2015

 

Abstract

MicroRNAs (miRNAs) are small (~22 nt) non-coding RNAs that bind to complementary target sites in the 3' untranslated region (UTR) of mRNAs, although binding is not restricted to this region and also occurs in other mRNA regions, viz., the 5' UTR and the coding sequence (CDS). Complementary binding of a miRNA to its mRNA target sites may result in complete degradation of the mRNA, and through such regulation the miRNA may act as an oncogene or as a tumor suppressor. However, the exact mechanism by which a miRNA becomes associated with cancer is still unclear. Further, as the number of miRNA sequences recorded in miRBase grows every year, the gap between known and functionally annotated miRNAs keeps widening, mainly because the experimental procedures required for functional annotation are laborious and costly. Motivated by this, we constructed a two-step support vector machine (SVM)-based predictive model comprising miRSEQ and miRINT. However, a major pitfall in constructing the model was class imbalance. To overcome this problem, the present study empirically compares the effectiveness of two methods, viz., the Synthetic Minority Oversampling Technique (SMOTE) and cost-sensitive learning. Performance was evaluated in terms of precision and recall. Our results show that, for a miRNA dataset with high class imbalance used to predict association with cancer, the cost-sensitive method outperformed the oversampling method.
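To make the two strategies compared in the abstract concrete, the following is a minimal pure-Python sketch under stated assumptions; it is not the miRSEQ/miRINT implementation. It assumes a linear SVM trained by stochastic subgradient descent on the hinge loss (a stand-in for whatever SVM formulation the study used), treats the minority class as the positive class, sets the cost-sensitive weight to the majority/minority ratio (an assumed heuristic), and runs on synthetic 2-D toy data rather than miRNA features. The helper names `smote`, `train_linear_svm`, `predict`, `precision_recall` and `toy_data` are hypothetical. SMOTE here creates each synthetic minority point on the segment between a minority sample and one of its k nearest minority neighbours; precision is TP/(TP+FP) and recall is TP/(TP+FN).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (x itself excluded by index)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def train_linear_svm(X, y, cost_pos=1.0, cost_neg=1.0, lam=0.01, lr=0.01,
                     epochs=200, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on a class-weighted
    hinge loss. Labels must be +1 (positive/minority) or -1.
    cost_pos > cost_neg makes errors on the positive class more expensive."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            cost = cost_pos if yi == 1 else cost_neg
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:
                # hinge-loss subgradient step, scaled by the misclassification cost
                w = [wj + lr * cost * yi * xj for wj, xj in zip(w, xi)]
                b += lr * cost * yi
    return w, b


def predict(w, b, X):
    """Return +1/-1 according to the sign of the linear decision function."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN); 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def toy_data(n_neg, n_pos, seed):
    """Synthetic imbalanced 2-D data (illustration only, not miRNA features)."""
    rng = random.Random(seed)
    neg = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    pos = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_pos)]
    return neg + pos, [-1] * n_neg + [1] * n_pos


if __name__ == "__main__":
    X, y = toy_data(200, 10, seed=1)
    X_test, y_test = toy_data(200, 10, seed=2)
    minority = [x for x, t in zip(X, y) if t == 1]
    n_neg = y.count(-1)

    # (a) no correction for imbalance
    plain = train_linear_svm(X, y)
    # (b) SMOTE: oversample the minority class up to the majority size
    synth = smote(minority, n_neg - len(minority))
    smote_model = train_linear_svm(X + synth, y + [1] * len(synth))
    # (c) cost-sensitive: weight positive-class errors by the imbalance ratio
    cost_model = train_linear_svm(X, y, cost_pos=n_neg / len(minority))

    for name, (w, b) in [("plain", plain), ("SMOTE", smote_model),
                         ("cost-sensitive", cost_model)]:
        p, r = precision_recall(y_test, predict(w, b, X_test))
        print(f"{name:15s} precision={p:.2f} recall={r:.2f}")
```

The numbers this prints depend only on the toy data and carry no information about the study's miRNA results. The design point is that SMOTE changes the training data while leaving the loss unchanged, whereas the cost-sensitive variant leaves the data unchanged and reweights the loss. In practice one would more likely use a maintained implementation, for example imbalanced-learn's `SMOTE(...).fit_resample(X, y)` for oversampling and scikit-learn's `SVC(class_weight=...)` for cost-sensitive training.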

 

Keywords

Cost-sensitive, SMOTE, miRNA-mRNA interaction, Support Vector Machines.

 

Citation

Kothandan, Bioinformation 11(1): 006-010 (2015)
 

Edited by

P Kangueane

 

ISSN

0973-2063

 

Publisher

Biomedical Informatics

 

License

This is an Open Access article which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. This is distributed under the terms of the Creative Commons Attribution License.