Making complex prediction rules applicable for readers: Current practice in random forest literature and recommendations
Author(s) -
Boulesteix Anne-Laure,
Janitza Silke,
Hornung Roman,
Probst Philipp,
Busen Hannah,
Hapfelmeier Alexander
Publication year - 2019
Publication title -
Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201700243
Subject(s) - computer science, random forest, set (abstract data type), field (mathematics), data mining, predictive modelling, data science, machine learning, logistic regression, artificial intelligence, mathematics, pure mathematics, programming language
Ideally, prediction rules should be published in such a way that readers can apply them, for example, to make predictions for their own data. While this is straightforward for simple prediction rules, such as those based on the logistic regression model, it is much more difficult for complex prediction rules derived by machine learning tools. We conducted a survey of articles reporting prediction rules that were constructed using the random forest algorithm and published in PLOS ONE in 2014–2015 in the field “medical and health sciences”, with the aim of identifying issues related to their applicability. Making a prediction rule reproducible is one possible way to ensure that it is applicable; reproducibility is therefore also examined in our survey. The presented prediction rules were applicable in only 2 of the 30 identified papers, while for a further eight prediction rules it was possible to obtain the necessary information by contacting the authors. Various problems, such as nonresponse of the authors, hampered the applicability of the prediction rules in the remaining cases. Based on our experience from this illustrative survey, we formulate a set of recommendations for authors who aim to make complex prediction rules applicable for readers. All data, including the description of the considered studies and the analysis code, are available as supplementary materials.
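To illustrate why a logistic regression rule is considered straightforward to apply, the following minimal sketch shows how a reader could compute a predicted probability from nothing more than a published intercept and coefficients. The predictor names (`age`, `biomarker`), the coefficient values, and the function `predict_probability` are hypothetical and chosen purely for illustration; they are not taken from the article or from any of the surveyed papers.

```python
import math

# Hypothetical published model: an intercept and one coefficient per predictor.
# These values are illustrative assumptions, not results from any study.
INTERCEPT = -1.5
COEFFICIENTS = {"age": 0.03, "biomarker": 0.8}


def predict_probability(record):
    """Apply the logistic regression rule to one record.

    `record` is a dict mapping each predictor name to its numeric value.
    """
    linear_predictor = INTERCEPT + sum(
        coef * record[name] for name, coef in COEFFICIENTS.items()
    )
    # Inverse logit turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A reader's own observation (also hypothetical).
new_patient = {"age": 50, "biomarker": 0.0}
probability = predict_probability(new_patient)
print(f"Predicted probability: {probability:.3f}")
```

A random forest has no comparably compact representation: the rule consists of many fitted trees, so a reader would typically need the fitted model object itself, or the data and code required to refit it, which is where the applicability problems described above arise.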