A new approach for interpreting Random Forest models and its application to the biology of ageing
Author(s) - Fábio Fabris, Aoife Doherty, Daniel H. Palmer, João Pedro de Magalhães, Alex A. Freitas
Publication year - 2018
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/bty087
Subject(s) - random forest , feature (linguistics) , computer science , ranking (information retrieval) , measure (data warehouse) , feature selection , data mining , feature model , artificial intelligence , machine learning , pattern recognition (psychology) , software , philosophy , linguistics , programming language
This work uses the Random Forest (RF) classification algorithm to predict whether a gene is over-expressed, under-expressed or unchanged in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole, across all of its values. We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and for the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model.
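The contrast between scoring a feature as a whole and scoring its individual values can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not describe. It is a toy example in which information gain stands in for a whole-feature importance measure, and the data, the variable names (`go_term`, `expression`) and the helper functions are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def group_by_value(values, labels):
    """Collect the class labels observed for each feature value."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


def information_gain(values, labels):
    """Whole-feature score: entropy reduction from splitting on all values."""
    groups = group_by_value(values, labels)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def per_value_summary(values, labels):
    """Per-value view: coverage, majority class and its share for each value."""
    summary = {}
    for v, g in group_by_value(values, labels).items():
        majority, count = Counter(g).most_common(1)[0]
        summary[v] = {"n": len(g), "majority": majority, "share": count / len(g)}
    return summary


# Hypothetical toy data: 1 = gene annotated with some GO term, 0 = not annotated.
go_term = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
expression = ["over", "over", "over", "over",
              "over", "under", "none", "under", "none", "over", "under", "none"]

gain = information_gain(go_term, expression)
summary = per_value_summary(go_term, expression)
print(f"whole-feature information gain: {gain:.3f} bits")
for value, stats in sorted(summary.items()):
    print(value, stats)
```

In this toy data the single gain figure hides an asymmetry: every gene with value 1 (annotated) is "over", whereas the genes with value 0 are spread across all three classes. That is the kind of situation the abstract describes for Gene Ontology-based features, where only one value of a feature carries most of the information.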