Feature importance sampling‐based adaptive random forest as a useful tool to screen underlying lead compounds
Author(s) - Cao DongSheng, Liang YiZeng, Xu QingSong, Zhang LiangXiao, Hu QianNan, Li HongDong
Publication year - 2011
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1375
Subject(s) - random forest, feature (linguistics), simple random sample, sampling (signal processing), computer science, adaptive sampling, pattern recognition (psychology), data mining, stratified sampling, artificial intelligence, simple (philosophy), machine learning, mathematics, statistics, population, philosophy, linguistics, demography, filter (signal processing), epistemology, sociology, monte carlo method, computer vision
Ensemble approaches generally perform well when their base classifiers are both diverse and accurate. In the present study, feature importance sampling‐based adaptive random forest (fisaRF) is proposed to obtain classification performance superior to that of the primal one‐step random forest (RF). fisaRF uses a convenient yet very effective procedure, feature importance sampling (FIS), to select a more eligible feature subset at each splitting node in place of simple random sampling, thereby strengthening the accuracy of individual trees without sacrificing the diversity between them. Additionally, iteratively reusing the feature importance obtained in the previous step adaptively captures the most significant features in the data and deals effectively with multiple classification problems, which are not easily handled by other feature importance indexes. The proposed fisaRF was applied to classify three structure–activity relationship (SAR) data sets proposed by Xue et al. [1], together with disinfection by‐products (DBPs) data, and was compared with the primal one‐step RF induced by simple random sampling. The comparison revealed that fisaRF effectively improves the classification accuracy and the prediction confidence for each sample, and it was therefore considered a very useful tool for screening underlying lead compounds. Copyright © 2011 John Wiley & Sons, Ltd.
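
The two ideas in the abstract, importance-weighted feature selection at a split and iterative reuse of the importances, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the draw without replacement, the small weight floor, the uniform starting weights, and the `fit_forest` callback are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random


def sample_features_by_importance(importances, k, rng=None):
    """Draw up to k distinct feature indices, each draw weighted by importance.

    Hypothetical helper: the abstract does not state the exact sampling scheme.
    """
    rng = rng or random.Random()
    # Assumption: clip negatives and add a tiny floor so the weights never sum to zero.
    weights = [max(w, 0.0) + 1e-12 for w in importances]
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(min(k, len(candidates))):
        # Pick a position among the remaining candidates, proportional to weight.
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return chosen


def adaptive_importance_rounds(fit_forest, n_features, n_rounds=3):
    """Feed each round's importances back in as the next round's sampling weights.

    `fit_forest` is a hypothetical callback: it takes the current weights,
    trains a forest that samples split candidates with them, and returns
    the new per-feature importances.
    """
    # Assumption: uniform weights in round one, equivalent to simple random sampling.
    weights = [1.0 / n_features] * n_features
    for _ in range(n_rounds):
        weights = fit_forest(weights)
    return weights
```

With uniform weights the first helper reduces to the simple random sampling used by a standard RF; skewed weights make informative features more likely to be candidates at a split while leaving every feature selectable.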
