Model selection for incomplete and design‐based samples
Author(s) - Hens N., Aerts M., Molenberghs G.
Publication year - 2006
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.2559
Subject(s) - akaike information criterion, model selection, selection (genetic algorithm), statistics, bayesian information criterion, sample (material), sample size determination, information criteria, mathematics, set (abstract data type), computer science, econometrics, artificial intelligence, chemistry, chromatography, programming language
Abstract - The Akaike information criterion (AIC) is one of the most frequently used methods for selecting one or a few good regression models from a set of candidate models. When the sample is incomplete, naive use of this criterion on the so‐called complete cases can lead to the selection of poor or inappropriate models. A similar problem occurs when a sample drawn under a design with unequal selection probabilities is treated as a simple random sample. In this paper, we consider a modification of AIC based on reweighting the sample, in analogy with the weighted Horvitz–Thompson estimator. It is shown that this weighted AIC provides better model choices for both incomplete and design‐based samples. The use of the weighted AIC is illustrated on data from the Belgian Health Interview Survey, which motivated this research. Simulations show its performance in a variety of settings. Copyright © 2006 John Wiley & Sons, Ltd.
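The reweighting idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact criterion, so the form used here is an assumption: each observed case's log-likelihood contribution is multiplied by an inverse-probability weight (as in a Horvitz–Thompson estimator), and the usual penalty of twice the number of parameters is added. The function name `weighted_aic`, the Gaussian linear model, and the simulated response mechanism are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_aic(X, y, w):
    """Weighted AIC for a Gaussian linear model (illustrative sketch only).

    Assumed form: -2 * sum_i w_i * loglik_i + 2 * k, where w_i is the
    inverse of the probability that case i was selected or observed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Weighted least squares: solve (X' W X) beta = X' W y.
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)

    # Weighted maximum-likelihood estimate of the error variance.
    resid = y - X @ beta
    sigma2 = np.sum(w * resid**2) / np.sum(w)

    # Weighted Gaussian log-likelihood evaluated at the weighted estimates.
    loglik = -0.5 * np.sum(w) * (np.log(2.0 * np.pi * sigma2) + 1.0)

    k = X.shape[1] + 1  # regression coefficients plus the variance
    return -2.0 * loglik + 2.0 * k


# Simulated data with a truly quadratic mean (assumed for illustration).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.1, n)

# Hypothetical response mechanism: cases with larger x are observed more often.
p_obs = 0.3 + 0.6 * (x + 1.0) / 2.0
observed = rng.uniform(size=n) < p_obs
w = 1.0 / p_obs[observed]  # inverse-probability weights for the complete cases

X_lin = np.column_stack([np.ones(observed.sum()), x[observed]])
X_quad = np.column_stack([X_lin, x[observed] ** 2])

aic_lin = weighted_aic(X_lin, y[observed], w)
aic_quad = weighted_aic(X_quad, y[observed], w)
print("linear:", aic_lin, "quadratic:", aic_quad)
```

With all weights equal to one, the function reduces to the ordinary Gaussian AIC on the complete cases. In practice the observation or selection probabilities are rarely known and would have to be estimated or taken from the survey design, which this sketch does not address.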