
Improving classification of mature microRNA by solving class imbalance problem
Author(s) -
Ying Wang,
Xiaoye Li,
Bairui Tao
Publication year - 2016
Publication title -
Scientific Reports
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.24
H-Index - 213
ISSN - 2045-2322
DOI - 10.1038/srep25941
Subject(s) - support vector machine , computer science , classifier , artificial intelligence , adaboost , machine learning , microrna , multiclass classification , pattern recognition , random subspace method , ensemble learning , data mining , gene , biology , genetics
MicroRNAs (miRNAs) are non-coding RNAs of ~20–25 nucleotides that regulate gene expression at the post-transcriptional level. The accuracy of identifying the start site of a mature miRNA within a given pre-miRNA remains low. Notably, mature miRNA prediction is a class-imbalanced problem, which also contributes to the unsatisfactory performance of existing methods. We improved the prediction accuracy of the classifier by using balanced datasets and present MatFind, a method for identifying 5′ mature miRNA candidates from their pre-miRNAs based on an ensemble of SVM classifiers built on the idea of AdaBoost. First, balanced datasets were extracted using the K-nearest neighbor algorithm. Second, multiple SVM classifiers were trained sequentially on the balanced datasets, based on the feature representation. Finally, all SVM classifiers were combined to form the ensemble classifier. Our results on an independent test dataset show that the proposed method is more effective than one that does not address the class-imbalance problem. Moreover, MatFind achieves much higher classification accuracy than three other approaches. The ensemble SVM classifiers and balanced datasets can solve the class-imbalance problem and improve classifier performance for mature miRNA identification. MatFind is an accurate and fast method for 5′ mature miRNA identification.
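The abstract says only that the balanced datasets were extracted "using the K-nearest neighbor algorithm". The Python sketch below shows one plausible scheme, which is an assumption and not confirmed as MatFind's procedure: treating true start sites as the minority (positive) class, each negative is scored by its mean distance to its k nearest positives (a NearMiss-1-style rule), negatives are ranked by that score, and the ranking is cut into chunks the size of the positive class, each chunk paired with all positives. The function name `knn_balanced_subsets` and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_balanced_subsets(
    positives: List[Vector], negatives: List[Vector], k: int = 3
) -> List[Tuple[List[Vector], List[int]]]:
    """Split the majority (negative) class into balanced training subsets.

    Hypothetical scheme (assumption, not from the paper): score each negative
    by its mean distance to its k nearest positives, sort negatives by that
    score, and cut the ranking into chunks of len(positives). Each chunk is
    paired with all positives, labelled +1 (positive) / -1 (negative).
    Leftover negatives that do not fill a whole chunk are dropped.
    """
    k = min(k, len(positives))
    n_pos = len(positives)

    def score(neg: Vector) -> float:
        # Mean distance from this negative to its k nearest positives.
        dists = sorted(euclidean(neg, p) for p in positives)
        return sum(dists[:k]) / k

    ranked = sorted(negatives, key=score)  # hardest (closest) negatives first
    subsets = []
    for start in range(0, len(ranked) - n_pos + 1, n_pos):
        chunk = ranked[start:start + n_pos]
        X = list(positives) + chunk
        y = [1] * n_pos + [-1] * n_pos
        subsets.append((X, y))
    return subsets
```

For example, with two positives and five negatives, `knn_balanced_subsets(positives, negatives, k=1)` returns two subsets of four samples each: the first pairs the positives with the two negatives closest to them, the second with the next two, and the farthest negative is discarded. Ordering negatives from closest to farthest means the first SVMs see the hardest decoys, which is one reasonable design choice among several.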
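The abstract states that multiple SVMs are trained in order on balanced datasets "with the idea of AdaBoost" and then combined, but not how the boosting weights interact with the separate datasets. The Python sketch below is one hypothetical reading, not MatFind's implementation: AdaBoost sample weights are kept over a pooled training set, round t trains a weighted linear SVM on the t-th balanced subset (given as indices into the pool), its weighted error on the whole pool sets its vote `alpha`, and misclassified pool samples are up-weighted for the next round. A small subgradient-descent linear SVM stands in for a real SVM solver, since the abstract does not give MatFind's kernel or solver. All names (`train_linear_svm`, `train_boosted_svms`, `predict`) and parameters (`lam`, `lr`, `epochs`) are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = List[Tuple[float, List[float], float]]  # (alpha, weights, bias) per SVM


def _dot(w: Sequence[float], x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def _sign(v: float) -> int:
    return 1 if v >= 0 else -1


def train_linear_svm(X, y, c, lam=0.01, lr=0.1, epochs=200):
    """Weighted linear SVM (stand-in solver, an assumption).

    Minimises lam/2 * ||w||^2 + sum_i c_i * max(0, 1 - y_i (w.x_i + b))
    by full-batch subgradient descent with a fixed step size.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]
        gb = 0.0
        for xi, yi, ci in zip(X, y, c):
            if yi * (_dot(w, xi) + b) < 1:  # inside the margin or misclassified
                for j, xij in enumerate(xi):
                    gw[j] -= ci * yi * xij
                gb -= ci * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_boosted_svms(X, y, subsets, eps=1e-10) -> Model:
    """AdaBoost-style ensemble: one SVM per balanced subset, trained in order.

    Hypothetical scheme: boosting weights D live on the pooled set (X, y);
    each subset's SVM is trained with D restricted to (and renormalised over)
    that subset; its weighted error on the full pool sets its vote alpha.
    """
    n = len(X)
    D = [1.0 / n] * n
    model: Model = []
    for idx in subsets:
        total = sum(D[i] for i in idx)
        c = [D[i] / total for i in idx]
        w, b = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], c)
        preds = [_sign(_dot(w, x) + b) for x in X]
        err = sum(D[i] for i in range(n) if preds[i] != y[i])
        if err >= 0.5:
            continue  # no better than chance on the weighted pool: discard
        err = max(err, eps)  # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified samples, down-weight correct ones.
        D = [D[i] * math.exp(-alpha * y[i] * preds[i]) for i in range(n)]
        z = sum(D)
        D = [d / z for d in D]
        model.append((alpha, w, b))
    return model


def predict(model: Model, x: Vector) -> int:
    """Weighted majority vote of the SVMs; +1 on ties or an empty model."""
    score = sum(alpha * _sign(_dot(w, x) + b) for alpha, w, b in model)
    return _sign(score)
```

Restricting each round's SVM to one balanced subset keeps every base learner's training data class-balanced, while the pooled boosting weights let later rounds concentrate on samples that earlier SVMs got wrong. Replacing `train_linear_svm` with a kernel SVM that accepts per-sample weights would leave the boosting loop unchanged.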