Overcome Support Vector Machine Diagnosis Overfitting
Author(s) - Henry Han, Xiaoqian Jiang
Publication year - 2014
Publication title - Cancer Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.606
H-Index - 31
ISSN - 1176-9351
DOI - 10.4137/cin.s13875
Subject(s) - overfitting, support vector machine, computer science, artificial intelligence, machine learning, biomarker discovery, pattern recognition (psychology), data mining, artificial neural network, proteomics, biology, biochemistry, gene
Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, no previous research has analyzed their overfitting in disease diagnosis based on high-dimensional omics data, an analysis that is essential to avoid deceptive diagnostic results and to enhance clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We found that disease diagnosis with an SVM classifier inevitably encounters overfitting under a Gaussian kernel because of the large data variations generated by high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad hoc parameter-tuning approaches, it not only robustly overcomes the overfitting problem but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm, Gene-Switch-Marker (GSM), to capture meaningful biomarkers by taking advantage of SVM overfitting on single genes.
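The abstract's claim about Gaussian-kernel overfitting can be illustrated with a small numerical sketch. The snippet below is not taken from the paper; it assumes synthetic standard-normal data as a stand-in for omics profiles, and the sample count, feature count, and bandwidth `sigma=1.0` are arbitrary choices for illustration. It shows one commonly cited mechanism: when pairwise distances between high-dimensional samples are large relative to the bandwidth, the Gaussian kernel matrix collapses toward the identity, so each sample looks similar only to itself and a classifier built on that matrix has little to generalize from.

```python
import math
import random


def gaussian_kernel_matrix(samples, sigma=1.0):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


def max_off_diagonal(K):
    """Largest kernel value between two distinct samples."""
    n = len(K)
    return max(K[i][j] for i in range(n) for j in range(n) if i != j)


# Hypothetical setup: few samples, many features, as is typical of omics data.
rng = random.Random(0)
n_samples, n_features = 20, 2000
samples = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
           for _ in range(n_samples)]

K = gaussian_kernel_matrix(samples, sigma=1.0)
largest = max_off_diagonal(K)

# Diagonal entries are exactly 1; off-diagonal entries are vanishingly small,
# so K is numerically indistinguishable from the identity matrix.
print("diagonal entry:", K[0][0])
print("largest off-diagonal entry:", largest)
```

Increasing `sigma` toward the scale of the typical pairwise distance restores non-trivial off-diagonal values, which is the kind of parametric tuning the abstract contrasts with its sparse-coding kernel approach.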