Novel two‐phase sampling designs for studying binary outcomes
Author(s) - Wang Le, Williams Matthew L., Chen Yong, Chen Jinbo
Publication year - 2020
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.13140
Subject(s) - covariate, statistics, outcome (game theory), matching (statistics), sampling (signal processing), mathematics, sampling design, econometrics, set (abstract data type), sample size determination, computer science, medicine, population, environmental health, mathematical economics, filter (signal processing), computer vision, programming language
In biomedical cohort studies that assess the association between an outcome variable and a set of covariates, some covariates can usually be measured only on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to increase statistical efficiency. When the outcome is binary, one may adopt a case‐control sampling design or a balanced case‐control design in which cases and controls are further matched on a small number of complete discrete covariates. While the latter succeeds in estimating odds ratio (OR) parameters for the matching covariates, similar two‐phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, assuming that an external model is available to relate the outcome and the complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness‐of‐fit under the external model and further matches them on complete covariates, similarly to the balanced design. We develop a pseudolikelihood method for estimating OR parameters. Through simulation studies and explorations in a real cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates, and that the reduction for the matching covariates is comparable to that of the balanced design.
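The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' design: the abstract does not state how goodness-of-fit is measured or what sampling probabilities are used. The sketch assumes that fit is judged by the absolute residual |y - p| from the external model, that subjects are grouped into strata by outcome, one discrete complete covariate, and a poorly-fit flag, and that an equal number is drawn from each stratum. The function name `two_phase_sample`, its parameters `per_stratum` and `cutoff`, and the record fields `id`, `y`, `z`, `p` are all hypothetical.

```python
import random
from collections import defaultdict


def two_phase_sample(cohort, per_stratum, cutoff=0.5, seed=0):
    """Pick a phase-two subsample from a phase-one cohort (illustrative only).

    cohort: list of dicts with keys
        'id' : subject identifier
        'y'  : binary outcome (1 = case, 0 = control)
        'z'  : discrete complete covariate used for matching
        'p'  : outcome probability predicted by the external model
    A subject is flagged as poorly fit when |y - p| > cutoff (an assumption
    of this sketch). Strata are (y, z, poorly_fit), and up to `per_stratum`
    subjects are drawn at random, without replacement, from each stratum.

    Returns (selected_subjects, sampling_fraction_by_stratum).
    """
    rng = random.Random(seed)

    # Group subjects by outcome, matching covariate, and fit flag.
    strata = defaultdict(list)
    for subj in cohort:
        poorly_fit = abs(subj["y"] - subj["p"]) > cutoff
        strata[(subj["y"], subj["z"], poorly_fit)].append(subj)

    selected = []
    fractions = {}
    for key in sorted(strata):
        members = strata[key]
        k = min(per_stratum, len(members))
        selected.extend(rng.sample(members, k))
        # Known sampling fraction; its inverse could serve as a design weight.
        fractions[key] = k / len(members)
    return selected, fractions
```

Because poorly-fit subjects are typically the smaller stratum, equal allocation gives them a larger sampling fraction, which is one simple way to oversample them while keeping cases and controls balanced on `z`. The returned fractions are the kind of known selection probabilities a weighted or pseudolikelihood analysis would need; the paper's estimator itself is not reproduced here.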
