Selecting Genes for Cancer Classification Using SVM: An Adaptive Multiple Features Scheme
Author(s) - Hsu WenChin, Liu ChanCheng, Chang Fu, Chen SuShing
Publication year - 2013
Publication title - International Journal of Intelligent Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.21625
Subject(s) - support vector machine, scheme (mathematics), classification scheme, artificial intelligence, computer science, pattern recognition (psychology), machine learning, mathematics, mathematical analysis
Selecting important genes from microarray data is a considerably challenging problem, as shown in Guyon's 2002 paper in this journal. We have developed an alternative feature ranking and selection methodology to tackle this problem. On several cancer data sets, AMFES (adaptive multiple features selection) outperforms Guyon's RFE (recursive feature elimination). In this paper, we present a comprehensive and systematic comparison of three methods, AMFES, RFE, and CORR (correlation coefficient), on five data sets (leukemia, colon, lymphoma, prostate, and potentially others). The leukemia, colon, and lymphoma data sets are adapted from Guyon's paper for convenience, and the prostate cancer data set is from a public database, NCBI GEO (Gene Expression Omnibus). The three methods are compared in terms of test accuracy, number of selected features, computational time (total and training), statistical significance (t test, p values, and ROC (receiver operating characteristic)/AUC (area under curve)), and the discovery rate of informative features. AMFES obtains better results in computational time and number of selected features while maintaining higher or comparable test accuracy, statistical significance, and discovery rate of informative features. In addition, AMFES can serve as a general methodology for other similar problems such as sampling and data mining.
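To make the CORR baseline named in the abstract concrete, the following is a minimal sketch of correlation-coefficient feature ranking. It assumes CORR refers to the per-gene score (mean of the positive class minus mean of the negative class, divided by the sum of the two class standard deviations) used as a baseline in Guyon's RFE paper; the abstract itself does not spell out the formula. The function names, the toy data, and the choice of sample standard deviation are illustrative assumptions, and the sketch does not implement AMFES or RFE.

```python
from statistics import mean, stdev


def corr_scores(X, y):
    """Score each feature by (mu_pos - mu_neg) / (sigma_pos + sigma_neg).

    X is a list of samples (each a list of feature values) and y holds
    labels in {+1, -1}. Assumption: sample standard deviation is used,
    so each class needs at least two samples.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        denom = stdev(pos) + stdev(neg)
        # A feature that is constant within both classes gets score 0.
        scores.append((mean(pos) - mean(neg)) / denom if denom > 0 else 0.0)
    return scores


def rank_features(X, y, k):
    """Return the indices of the k features with the largest |score|."""
    scores = corr_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return order[:k]


# Hypothetical toy data: 4 samples, 3 "genes"; only gene 0 separates the classes.
X = [
    [5.0, 1.0, 0.3],
    [6.0, 1.2, 0.1],
    [1.0, 1.1, 0.2],
    [2.0, 0.9, 0.4],
]
y = [1, 1, -1, -1]

scores = corr_scores(X, y)
top = rank_features(X, y, k=1)
print(top)
```

A ranking like this scores each gene independently, which is the usual contrast with wrapper methods such as RFE that rank genes through a trained SVM.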