Relative efficiencies of two‐stage sampling schemes for mean estimation in multilevel populations when cluster size is informative
Author(s) -
Innocenti Francesco,
Candel Math J.J.M.,
Tan Frans E.S.,
Breukelen Gerard J.P.
Publication year - 2018
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.8070
Subject(s) - cluster sampling, statistics, sampling (signal processing), simple random sample, sample size determination, mathematics, cluster (spacecraft), poisson sampling, sampling design, population, sampling distribution, population size, efficiency, inference, slice sampling, importance sampling, computer science, monte carlo method, estimator, demography, filter (signal processing), computer vision, programming language, artificial intelligence, sociology
In multilevel populations, there are two types of population means of an outcome variable, i.e., the average of all individual outcomes ignoring cluster membership and the average of cluster‐specific means. To estimate the first mean, individuals can be sampled directly with simple random sampling or with two‐stage sampling (TSS), that is, sampling clusters first and then individuals within the sampled clusters. When cluster size varies in the population, three TSS schemes can be considered, i.e., sampling clusters with probability proportional to cluster size and then sampling the same number of individuals per cluster; sampling clusters with equal probability and then sampling the same percentage of individuals per cluster; and sampling clusters with equal probability and then sampling the same number of individuals per cluster. Unbiased estimation of the average of all individual outcomes is discussed under each sampling scheme, assuming cluster size to be informative. Furthermore, the three TSS schemes are compared in terms of efficiency with each other and with simple random sampling under the constraint of a fixed total sample size. The relative efficiency of the sampling schemes is shown to vary across different cluster size distributions. However, sampling clusters with probability proportional to size is the most efficient TSS scheme for many cluster size distributions. Model‐based and design‐based inference are compared and are shown to give similar results. The results are applied to the distribution of high school size in Italy and the distribution of patient list size for general practices in England.
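The three TSS schemes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the synthetic population, the function names, and the choice of estimators (an unweighted mean of cluster means under sampling proportional to size with replacement, and Horvitz-Thompson-type expansion estimators under equal-probability sampling, with all cluster sizes assumed known) are assumptions made here for illustration and may differ from the estimators studied in the article.

```python
import numpy as np

def make_population(rng, n_clusters=200):
    # Hypothetical population: cluster sizes vary, and the cluster mean
    # increases with cluster size, so cluster size is informative.
    sizes = rng.integers(20, 200, size=n_clusters)
    return [rng.normal(loc=0.01 * n, scale=1.0, size=n) for n in sizes]

def pps_equal_n(pop, k, m, rng):
    # Scheme 1: draw k clusters with probability proportional to size
    # (with replacement, an assumption of this sketch), then m individuals
    # per cluster. The plain mean of the cluster sample means is unbiased.
    sizes = np.array([len(c) for c in pop])
    idx = rng.choice(len(pop), size=k, replace=True, p=sizes / sizes.sum())
    means = [rng.choice(pop[j], size=m, replace=False).mean() for j in idx]
    return float(np.mean(means))

def equal_prob_same_fraction(pop, k, f, rng):
    # Scheme 2: draw k clusters with equal probability, then a fraction f
    # of the individuals in each sampled cluster.
    sizes = np.array([len(c) for c in pop])
    idx = rng.choice(len(pop), size=k, replace=False)
    total = 0.0
    for j in idx:
        n_j = max(1, int(round(f * sizes[j])))
        # sizes[j] * (sample mean) estimates the cluster total.
        total += sizes[j] * rng.choice(pop[j], size=n_j, replace=False).mean()
    # Expand the sampled cluster totals to the population, divide by N.
    return float(len(pop) / k * total / sizes.sum())

def equal_prob_equal_n(pop, k, m, rng):
    # Scheme 3: draw k clusters with equal probability, then m individuals
    # per cluster; weighting by cluster size is needed for unbiasedness.
    sizes = np.array([len(c) for c in pop])
    idx = rng.choice(len(pop), size=k, replace=False)
    total = sum(
        sizes[j] * rng.choice(pop[j], size=m, replace=False).mean() for j in idx
    )
    return float(len(pop) / k * total / sizes.sum())

def empirical_variances(pop, k, m, reps, rng):
    # Monte Carlo variance of each estimator. The fraction f is chosen so
    # that scheme 2 has roughly the same expected total sample size k * m.
    mean_size = np.mean([len(c) for c in pop])
    f = m / mean_size
    draws = {
        "pps_equal_n": [pps_equal_n(pop, k, m, rng) for _ in range(reps)],
        "equal_prob_same_fraction": [
            equal_prob_same_fraction(pop, k, f, rng) for _ in range(reps)
        ],
        "equal_prob_equal_n": [equal_prob_equal_n(pop, k, m, rng) for _ in range(reps)],
    }
    return {name: float(np.var(vals)) for name, vals in draws.items()}
```

Calling `empirical_variances(make_population(rng), k=20, m=10, reps=2000, rng=rng)` with `rng = np.random.default_rng(1)` gives a rough empirical comparison of the schemes for this one synthetic population; as the abstract notes, the ranking depends on the cluster size distribution, so the outcome should not be read as general.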