Premium
Model Uncertainty, Data Mining and Statistical Inference
Author(s) -
Chatfield Chris
Publication year - 1995
Publication title -
journal of the royal statistical society: series a (statistics in society)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.103
H-Index - 84
eISSN - 1467-985X
pISSN - 0964-1998
DOI - 10.2307/2983440
Subject(s) - statistical inference , inference , computer science , data mining , statistics , econometrics , mathematics , artificial intelligence
SUMMARY This paper takes a broad, pragmatic view of statistical inference to include all aspects of model formulation . The estimation of model parameters traditionally assumes that a model has a prespecified known form and takes no account of possible uncertainty regarding the model structure. This implicitly assumes the existence of a ‘true’ model, which many would regard as a fiction. In practice model uncertainty is a fact of life and likely to be more serious than other sources of uncertainty which have received far more attention from statisticians. This is true whether the model is specified on subject‐matter grounds or, as is increasingly the case, when a model is formulated, fitted and checked on the same data set in an iterative, interactive way. Modern computing power allows a large number of models to be considered and data‐dependent specification searches have become the norm in many areas of statistics. The term data mining may be used in this context when the analyst goes to great lengths to obtain a good fit. This paper reviews the effects of model uncertainty, such as too narrow prediction intervals, and the non‐trivial biases in parameter estimates which can follow data‐based modelling. Ways of assessing and overcoming the effects of model uncertainty are discussed, including the use of simulation and resampling methods, a Bayesian model averaging approach and collecting additional data wherever possible. Perhaps the main aim of the paper is to ensure that statisticians are aware of the problems and start addressing the issues even if there is no simple, general theoretical fix.