On the analysis of hybrid designs that combine group‐ and individual‐level data
Author(s) - Smoot E., Haneuse S.
Publication year - 2015
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.12220
Subject(s) - group (periodic table), computer science, statistics, mathematics, data mining, chemistry, organic chemistry
Summary - Ecological studies that use data on groups of individuals, rather than on the individuals themselves, are subject to numerous biases that cannot be resolved without some individual‐level data. In the context of a rare outcome, the hybrid design for ecological inference efficiently combines group‐level data with individual‐level case–control data. Unfortunately, except in relatively simple settings, use of the design in practice is limited because evaluating the hybrid likelihood is computationally prohibitive. In this article, we first propose and develop an alternative representation of the hybrid likelihood. Second, based on this new representation, we propose a series of approximations that drastically reduce the computational burden. A comprehensive simulation study shows that, in a broad range of scenarios, estimators based on the approximate hybrid likelihood exhibit the same operating characteristics as those based on the exact hybrid likelihood, with no penalty in terms of increased bias or reduced efficiency. Third, for settings where the approximations may not hold, we develop a pragmatic estimation and inference strategy that uses the approximate form for some likelihood contributions and the exact form for others. This strategy lets researchers balance computational tractability against accuracy in their own settings. Finally, as a by‐product of the development, we provide the first explicit characterization of the hybrid aggregate data design, which combines data from an aggregate data study (Prentice and Sheppard, 1995, Biometrika 82, 113–125) with case–control samples. The methods are illustrated using data on births in North Carolina between 2007 and 2009.