Testing prediction algorithms as null hypotheses: Application to assessing the performance of deep neural networks
Author(s) - Bickel, David R.
Publication year - 2020
Publication title - Stat
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.61
H-Index - 18
ISSN - 2049-1573
DOI - 10.1002/sta4.270
Subject(s) - artificial neural network, computer science, algorithm, machine learning, bayesian probability, artificial intelligence, posterior predictive distribution, predictive value, set (abstract data type), bayesian linear regression, bayesian inference, medicine, programming language
Bayesian models use posterior predictive distributions to quantify the uncertainty of their predictions. Similarly, the point predictions of neural networks and other machine learning algorithms may be converted to predictive distributions by various bootstrap methods. The predictive performance of each algorithm can then be assessed by quantifying the performance of its predictive distribution. Previous methods for assessing such performance are relative, indicating whether certain algorithms perform better than others. This paper proposes performance measures that are absolute in the sense that they indicate whether or not an algorithm performs adequately without requiring comparisons with other algorithms. The first proposed performance measure is a predictive p value that generalizes a prior predictive p value with the prior distribution equal to the posterior distribution of previous data. The other proposed performance measures use the generalized predictive p value for each prediction to estimate the proportion of target values that are compatible with the predictive distribution. The new performance measures are illustrated by using them to evaluate the predictive performance of deep neural networks when applied to the analysis of a large housing price data set that is used as a standard in machine learning.
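The general workflow described in the abstract can be illustrated with a small sketch. It is not the paper's method, and several substitutions are assumed. A least-squares linear model stands in for a deep neural network. Its predictive distribution is approximated by refitting on bootstrap resamples and adding a resampled residual. The p value is a generic two-sided Monte Carlo tail probability, not the generalized predictive p value defined in the paper. The proportion of compatible targets is estimated with a Storey-type estimator (threshold `lam`), which is swapped in here for illustration. The helper names `bootstrap_predictive_draws`, `predictive_p_value`, and `compatible_proportion` are hypothetical.

```python
import numpy as np

def bootstrap_predictive_draws(X_train, y_train, X_test, n_boot=200, seed=0):
    """Hypothetical helper: approximate a predictive distribution for each test
    point by refitting a least-squares model (stand-in for a neural network) on
    bootstrap resamples and adding a randomly resampled training residual."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    A_train = np.column_stack([np.ones(n), X_train])         # add intercept column
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    draws = np.empty((n_boot, len(X_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                      # bootstrap resample
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        resid = y_train[idx] - A_train[idx] @ coef            # in-sample residuals
        noise = rng.choice(resid, size=len(X_test))           # predictive noise
        draws[b] = A_test @ coef + noise
    return draws                                              # shape (n_boot, n_test)

def predictive_p_value(draws, y):
    """Generic two-sided Monte Carlo p value of an observed target y against
    samples from its predictive distribution (an assumption, not the paper's)."""
    draws = np.asarray(draws, dtype=float)
    lower = np.mean(draws <= y)
    upper = np.mean(draws >= y)
    return min(1.0, 2.0 * min(lower, upper))

def compatible_proportion(p_values, lam=0.5):
    """Storey-type estimate of the fraction of targets whose p values behave
    as uniform, i.e. look compatible with their predictive distributions."""
    p = np.asarray(p_values, dtype=float)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

# Usage on simulated data (not the housing data set used in the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
draws = bootstrap_predictive_draws(X[:400], y[:400], X[400:])
p = np.array([predictive_p_value(draws[:, j], y[400 + j]) for j in range(100)])
print(compatible_proportion(p))
```

Because the measure is absolute, a value near 1 suggests that most targets are plausible draws from their predictive distributions, without reference to any competing algorithm. A value well below 1 would flag inadequate predictive distributions.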