Open Access
Variable importance‐weighted Random Forests
Author(s) - Liu Yiyi, Zhao Hongyu
Publication year - 2017
Publication title - Quantitative Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.707
H-Index - 15
eISSN - 2095-4697
pISSN - 2095-4689
DOI - 10.1007/s40484-017-0121-6
Subject(s) - random forest , feature selection , feature (linguistics) , variable (mathematics) , computer science , regression , random variable , pattern recognition (psychology) , artificial intelligence , data mining , sampling (signal processing) , statistics , machine learning , mathematics , mathematical analysis , philosophy , linguistics , filter (signal processing) , computer vision
Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which builds the forest using only the features with the largest variable importance scores. Yet the performance of this method is still not satisfactory, possibly because of its rigid feature selection and the increased correlation between trees in the forest.

Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node when growing trees, samples features according to their variable importance scores and then selects the best split from the randomly chosen features.

Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared with standard Random Forests and feature elimination Random Forests, our proposed method achieves improved performance in most cases.

Conclusions: By incorporating variable importance scores into the random feature selection step, our method makes better use of more informative features without completely ignoring less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package "viRandomForests" based on the original R package "randomForest"; it can be freely downloaded from http://zhaocenter.org/software.
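
The key change described in the abstract is the candidate-feature draw at each node: features are sampled with probability proportional to their variable importance scores rather than uniformly. The R sketch below illustrates that idea only; it assumes importance weights are first obtained from a standard randomForest fit and then reused as sampling probabilities, which is not the authors' viRandomForests implementation (there, the weighted draw happens inside the tree-growing code itself).

## Illustrative sketch of importance-weighted candidate-feature sampling
## (assumption: a two-step workflow, not the viRandomForests package).
library(randomForest)

data(iris)
x <- iris[, 1:4]
y <- iris$Species

## Step 1: variable importance scores from a standard Random Forest
rf0 <- randomForest(x, y, importance = TRUE)
imp <- importance(rf0, type = 1)        # permutation (mean decrease in accuracy)
w   <- pmax(imp[, 1], 1e-6)             # keep weights strictly positive
w   <- w / sum(w)                       # normalize to sampling probabilities

## Step 2 (conceptual): at a given node, draw mtry candidate features with
## probability proportional to their importance, instead of uniformly
mtry <- floor(sqrt(ncol(x)))
candidates <- sample(colnames(x), size = mtry, prob = w)
candidates  # the best split would then be searched among these features only

In the full method this weighted draw is repeated at every node of every tree, so informative features are offered as split candidates more often while less informative ones are never completely excluded.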
