Nonparametric variable importance assessment using machine learning techniques

Williamson Brian D.; Gilbert Peter B.; Carone Marco; Simon Noah

Publication year - 2021

Publication title - Biometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.298

H-Index - 130

eISSN - 1541-0420

pISSN - 0006-341X

DOI - 10.1111/biom.13392

Subject(s) - nonparametric statistics , computer science , machine learning , variable (mathematics) , artificial intelligence , statistics , mathematics , mathematical analysis

In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often resort to only one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have different interpretations, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique and whose interpretation is agnostic to the technique used. This measure is a property of the true data-generating mechanism. Specifically, we consider a generalization of the analysis of variance variable importance measure and explain how it facilitates the use of machine learning techniques to flexibly estimate the importance of a single feature or a group of features. Using this measure, the importance of each feature or group of features in the data can then be described individually. We describe how to construct an efficient estimator of this measure, as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
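
To make the estimation idea concrete, here is a minimal Python sketch under stated assumptions. We assume the ANOVA-style measure for a feature group s takes the form psi_s = E[{mu(X) - mu_s(X)}^2] / var(Y), where mu(x) = E(Y | X = x) uses all features and mu_s is the conditional mean of Y given every feature except those in s. Because mu is the conditional mean, the numerator also equals E{Y - mu_s(X)}^2 - E{Y - mu(X)}^2, the drop in mean squared error from adding the group back, so any regression fit can be plugged in for mu and mu_s. The sketch uses k-nearest neighbours only because it needs no dependencies, predicts out of fold with K-fold cross-fitting, and forms a Wald-type interval from the estimated influence function phi = [{Y - mu_s(X)}^2 - {Y - mu(X)}^2 - psi_s {Y - E(Y)}^2] / var(Y). The helper names knn_fit, drop_columns and anova_vim, the cross-fitting scheme and the tuning values (k, n_folds) are hypothetical illustrations, not the authors' procedure or the API of their software.

```python
import math
import random
from statistics import NormalDist


def knn_fit(X, y, k):
    """Fit a k-nearest-neighbour regressor (hypothetical stand-in for any learner)."""
    if not X[0]:
        # No covariates left: the best constant predictor is the sample mean.
        mean = sum(y) / len(y)
        return lambda x: mean

    def predict(x):
        # Pair each training target with its squared Euclidean distance to x.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, x)), target)
            for row, target in zip(X, y)
        )
        nearest = dists[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


def drop_columns(X, s):
    """Drop the feature indices in s, giving the reduced covariates X_{-s}."""
    return [[v for j, v in enumerate(row) if j not in s] for row in X]


def anova_vim(X, y, s, k=10, n_folds=5, alpha=0.05, seed=0):
    """Cross-fitted estimate of psi_s with a Wald-type (1 - alpha) interval (a sketch)."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    X_red = drop_columns(X, set(s))
    mu_full, mu_red = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        y_train = [y[i] for i in train]
        f_full = knn_fit([X[i] for i in train], y_train, k)
        f_red = knn_fit([X_red[i] for i in train], y_train, k)
        # Predict each held-out point with models that never saw it.
        for i in held_out:
            mu_full[i] = f_full(X[i])
            mu_red[i] = f_red(X_red[i])
    y_bar = sum(y) / n
    var_y = sum((yi - y_bar) ** 2 for yi in y) / n
    # Per-observation drop in squared error from adding back the features in s.
    gain = [(y[i] - mu_red[i]) ** 2 - (y[i] - mu_full[i]) ** 2 for i in range(n)]
    psi = sum(gain) / (n * var_y)
    # Estimated influence-function values; they average to zero by construction.
    phi = [(gain[i] - psi * (y[i] - y_bar) ** 2) / var_y for i in range(n)]
    # Standard error = sqrt(mean(phi^2) / n).
    se = math.sqrt(sum(p * p for p in phi) / n) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return psi, (psi - z * se, psi + z * se)


if __name__ == "__main__":
    # Simulated data: only the first feature affects the response.
    rng = random.Random(1)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [2 * row[0] + rng.gauss(0, 0.1) for row in X]
    print("feature 0:", anova_vim(X, y, s=[0]))
    print("feature 1:", anova_vim(X, y, s=[1]))
```

In this simulated example the first feature should receive a large estimated importance and the second an estimate near zero. Note that when a group truly has zero importance, mu and mu_s coincide and the influence function above is identically zero, so a Wald interval of this kind need not have nominal coverage at that boundary; small negative estimates for an irrelevant feature simply reflect estimation noise.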
