Robust variable selection based on bagging classification tree for support vector machine in metabonomic data analysis
Author(s) - Chen ShuFang, Gu Hui, Tu MengYing, Zhou YanPing, Cui YanFang
Publication year - 2018
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2921
Subject(s) - support vector machine, artificial intelligence, feature selection, machine learning, computer science, classifier (uml), robustness (evolution), pattern recognition (psychology), chemometrics, biomarker discovery, decision tree, data mining, biology, biochemistry, proteomics, gene
In metabonomics, highly complex metabolic profiles pose considerable challenges to existing chemometric methods. Variable selection (i.e., biomarker discovery) and pattern recognition (i.e., classification) are two important chemometric tasks in metabonomics; biomarker discovery matters especially because the selected variables can potentially be used for disease diagnosis and pathology discovery. Typically, the informative variables are elicited from a single classifier, which is often unreliable in practice. To rectify this, the current study combined bagging and the classification tree (CT) into a general framework (i.e., BAGCT) for robustly selecting informative variables. The framework draws on two properties: CT automatically carries out variable selection and measures variable importance, and bagging improves the reliability and robustness of a single model. In BAGCT, a set of parallel CT models was established following the idea of bagging, with each CT providing information such as its splitting variables and their corresponding importance values. The informative variables can then be identified by inspecting the variable importance values over all CTs in BAGCT. Taking the promising properties of the support vector machine (SVM) into account, we used the informative variables identified by BAGCT as the inputs of SVM, forming a new classification tool abbreviated as BAGCT‐SVM. A metabonomic dataset acquired by hydrogen‐1 nuclear magnetic resonance from patients with lung cancer and healthy controls was used to validate BAGCT‐SVM, with CT and SVM as comparisons. Results showed that BAGCT‐SVM, using fewer variables, gave better predictive ability than CT and SVM.
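The variable selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the tree settings, the importance measure, or the rule for choosing the final variables, so the Gini-based importance, the depth limit, the per-tree normalisation, the top-k rule, and all function names below (gini, best_split, grow, bagct_importance, select_top) are assumptions made for illustration. The sketch uses only the Python standard library and stops at the ranking step; the selected columns would then be passed to an SVM implementation of choice.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (gain, variable, threshold) of the best axis-aligned split, or None."""
    n, parent = len(y), gini(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between neighbouring values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(X, y, depth, n_total, importance):
    """Grow one tree recursively, adding each split's weighted impurity
    decrease to the importance of its splitting variable."""
    if depth == 0 or len(set(y)) < 2:
        return
    split = best_split(X, y)
    if split is None or split[0] <= 1e-12:
        return
    gain, j, t = split
    importance[j] += gain * len(y) / n_total
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    for idx in (left, right):
        grow([X[i] for i in idx], [y[i] for i in idx], depth - 1, n_total, importance)


def bagct_importance(X, y, n_trees=25, max_depth=2, seed=0):
    """Average the normalised variable importance over trees grown on
    bootstrap samples (assumed aggregation rule)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    total = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        imp = [0.0] * p
        grow([X[i] for i in idx], [y[i] for i in idx], max_depth, n, imp)
        s = sum(imp)
        if s > 0:
            for j in range(p):
                total[j] += imp[j] / s / n_trees
    return total


def select_top(importance, k):
    """Indices of the k variables with the highest aggregated importance."""
    return sorted(range(len(importance)), key=lambda j: -importance[j])[:k]


if __name__ == "__main__":
    # Synthetic stand-in for a spectral data matrix: variables 0 and 1 carry
    # class information, the remaining six are pure noise.
    rng = random.Random(1)
    y = [i % 2 for i in range(60)]
    X = [[2.0 * c + rng.gauss(0, 1), -2.0 * c + rng.gauss(0, 1)]
         + [rng.gauss(0, 1) for _ in range(6)] for c in y]
    ranking = select_top(bagct_importance(X, y), 2)
    print("selected variables:", ranking)
```

Averaging over bootstrap trees is what gives the ranking its stability: a variable that a single tree picks by chance on one resample contributes little to the aggregate, whereas a variable chosen repeatedly across resamples accumulates importance.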
