Non‐parametric inference for clustered binary and count data when only summary information is available
Author(s) - Hall Peter, Maiti Tapabrata
Publication year - 2008
Publication title - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.523
H-Index - 137
eISSN - 1467-9868
pISSN - 1369-7412
DOI - 10.1111/j.1467-9868.2008.00658.x
Subject(s) - parametric statistics, poisson distribution, inference, count data, binary data, parametric model, random effects model, mathematics, computer science, negative binomial distribution, binomial distribution, statistics, binary number, artificial intelligence, medicine, meta analysis, arithmetic
Summary. Data in the form of pairs (X, Y), where the response Y is a count, arise in many applications, including problems involving stratified or two‐stage sampling. Such data are often analysed by using random‐effects models, where the distribution of Y, conditional on X and on an unobserved random parameter Θ, is taken to be either binomial or Poisson, and the distribution of Θ is connected through a link function to a random effect. The latter is sometimes supposed to be normally distributed, but that assumption can lead to serious biases if it is not satisfied. This difficulty has motivated an extensive literature on non‐parametric techniques for cases where the random‐effect distribution is unknown, but that methodology requires detailed cluster level data. No non‐parametric techniques are available for instances where only summary data are accessible. We show that the random‐effect distribution is actually non‐parametrically identifiable from crude summary data, and we suggest a sieve method for inference in this case. Non‐parametric approaches to both parameter estimation and prediction are introduced, and empirical techniques are developed for choosing tuning parameters. These methods are shown to outperform their parametric counterparts when the model is misspecified, and to perform almost as well as parametric methods when the model is correct.
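The data structure described in the summary can be illustrated with a small simulation. The sketch below is not the authors' sieve method; it only generates summary pairs (X, Y) from a Poisson random‐effects model and discards the cluster‐level parameter Θ, so that the output resembles the "summary data only" setting. The log link, the rate X·Θ, the range of cluster sizes, the intercept `beta`, and all function names are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_summary_data(n_clusters, beta, draw_effect, rng):
    """Return a list of (X, Y) pairs; Theta is drawn but never returned."""
    pairs = []
    for _ in range(n_clusters):
        x = rng.randint(1, 10)                      # hypothetical cluster size
        theta = math.exp(beta + draw_effect(rng))   # log link (an assumption)
        y = sample_poisson(x * theta, rng)          # Y | X, Theta ~ Poisson(X * Theta)
        pairs.append((x, y))
    return pairs


def normal_effect(rng):
    """Normally distributed random effect, the common parametric assumption."""
    return rng.gauss(0.0, 0.5)


def two_point_effect(rng):
    """A non-normal alternative: mass 1/2 at -0.5 and 1/2 at +0.5."""
    return rng.choice([-0.5, 0.5])


rng = random.Random(0)
data_normal = simulate_summary_data(500, 0.0, normal_effect, rng)
data_two_point = simulate_summary_data(500, 0.0, two_point_effect, rng)
```

Both data sets have the same form, a list of (X, Y) pairs, even though the random effects behind them have different distributions. An analyst who only sees such pairs cannot inspect Θ directly, and that is the setting in which the summary claims the random‐effect distribution is still non‐parametrically identifiable.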