Deterministic fallacies and model validation
Author(s) - Hawkins Douglas M., Kraker Jessica
Publication year - 2010
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1311
Subject(s) - computer science, bootstrapping (finance), cross validation, computation, model validation, data validation, machine learning, econometrics, data mining, artificial intelligence, data science, algorithm, mathematics, database
Stochastic settings differ from deterministic ones in many subtle ways, making it easy to slip into errors by applying deterministic thinking inappropriately. We suspect this is the cause of much of the disagreement about model validation. A further technical issue is a common misapplication of cross‐validation, in which it is applied only partially, leading to incorrect results. Statistical theory and empirical investigation verify the efficacy of cross‐validation when it is applied correctly. In settings where data are relatively scarce, cross‐validation is attractive in that it makes the maximum possible use of all available information, at the cost of potentially substantial computation. The bootstrap is another method that makes full use of all available data for both model fitting and model validation, at a cost of substantially increased computation, and it shares much of the broad philosophical background of cross‐validation. Increasingly, the computational cost of these methods is not a major concern, leading to the recommendation, in most circumstances, to use cross‐validation or bootstrapping rather than the earlier standard method of splitting the available data into a learning and a testing portion. Copyright © 2010 John Wiley & Sons, Ltd.
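The abstract's point about partially applied cross‐validation can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes one common form of the mistake, in which a predictor is selected using all of the data and only the final model fit is cross‐validated. The data are pure noise, and the function names (`select_feature`, `cv_mse`, `simulate`), the sample sizes, and the single-predictor linear model are all hypothetical choices made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def select_feature(X, y, rows):
    """Index of the column most correlated (in absolute value) with y on `rows`."""
    y_sub = [y[i] for i in rows]
    return max(
        range(len(X[0])),
        key=lambda j: abs(pearson_r([X[i][j] for i in rows], y_sub)),
    )


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def cv_mse(X, y, folds, reselect):
    """K-fold mean squared prediction error.

    reselect=False: feature chosen once on ALL rows (partial cross-validation).
    reselect=True:  feature chosen again inside every fold (full cross-validation).
    """
    all_rows = list(range(len(y)))
    fixed = select_feature(X, y, all_rows)  # has seen the held-out rows
    sq_errors = []
    for test_rows in folds:
        held_out = set(test_rows)
        train = [i for i in all_rows if i not in held_out]
        j = select_feature(X, y, train) if reselect else fixed
        a, b = fit_line([X[i][j] for i in train], [y[i] for i in train])
        sq_errors.extend((y[i] - (a + b * X[i][j])) ** 2 for i in test_rows)
    return mean(sq_errors)


def simulate(n=40, p=200, k=5, reps=20, seed=0):
    """Average partial and full CV error over `reps` pure-noise data sets."""
    rng = random.Random(seed)
    partial, full = [], []
    for _ in range(reps):
        # Assumption: predictors and response are independent standard normals,
        # so no predictor carries any real information about y.
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        folds = [list(range(f, n, k)) for f in range(k)]
        partial.append(cv_mse(X, y, folds, reselect=False))
        full.append(cv_mse(X, y, folds, reselect=True))
    return mean(partial), mean(full)


if __name__ == "__main__":
    partial_mse, full_mse = simulate()
    print(f"partial CV estimate of MSE: {partial_mse:.3f}")
    print(f"full CV estimate of MSE:    {full_mse:.3f}")
```

Because the response here has unit variance and is unrelated to every predictor, no model can be expected to predict it with a mean squared error below about 1. In this sketch the partial procedure would be expected to report a noticeably smaller error than the full procedure, since the selection step has already seen the held-out cases; repeating the selection inside each fold removes that leak.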