Modeling and analyzing respondent‐driven sampling as a counting process
Author(s) - Berchenko Yakir, Rosenblatt Jonathan D., Frost Simon D. W.
Publication year - 2017
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.12678
Subject(s) - respondent, statistics, sampling (signal processing), process (computing), counting process, computer science, econometrics, mathematics, political science, programming language, filter (signal processing), computer vision, law
Summary - Respondent‐driven sampling (RDS) is an approach to sampling design and analysis which utilizes the networks of social relationships that connect members of the target population, using chain‐referral. RDS will typically oversample participants with many acquaintances. Naïve estimators, such as the sample average, will thus be biased towards the state of the most highly connected individuals. Current methodology cannot estimate population size from RDS, and promotes inverse probability weighted estimators for population parameters such as HIV prevalence. We propose to use the timing of recruitment, which is typically collected and then discarded, to estimate the population size via a counting process model. Once population size and degree frequencies are available, prevalence can be debiased in a post‐stratified framework. We adapt methods developed for inference in epidemiology and software reliability to estimate the population size, degree counts and frequencies. A fundamental advantage of our approach is that it makes the assumptions of the sampling design explicit. This enables verification of the assumptions, maximum likelihood estimation, extension with covariates, and model selection. We develop large‐sample theory, proving consistency and asymptotic normality. We further compare our estimators to other estimators in the RDS literature, through simulation and real‐world data. In both cases, we find that our estimators outperform current methods. The likelihood problem in the model we present is separable, and thus efficiently solvable. We implement these estimators in an accompanying R package, chords, available on CRAN.
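The post‐stratification step mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the chords package. It assumes that degree counts for the population have already been estimated by some other means (here they are made-up numbers), and it simply reweights the sample prevalence within each degree stratum by those counts. The function name `post_stratified_prevalence` and the toy data are hypothetical.

```python
from collections import defaultdict


def post_stratified_prevalence(sample, degree_counts):
    """Reweight per-degree sample prevalence by estimated population degree counts.

    sample: iterable of (degree, outcome) pairs, outcome being 0 or 1.
    degree_counts: dict mapping degree -> estimated number of population
        members with that degree (assumed to be supplied by an earlier
        estimation step; hypothetical input here).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for degree, outcome in sample:
        totals[degree] += 1
        positives[degree] += outcome

    weighted_sum = 0.0
    covered_population = 0.0
    for degree, n_k in degree_counts.items():
        if totals[degree] == 0:
            # No sampled individual has this degree, so the stratum
            # prevalence is unknown; this sketch leaves the stratum out.
            continue
        weighted_sum += n_k * positives[degree] / totals[degree]
        covered_population += n_k

    if covered_population == 0:
        raise ValueError("no degree stratum is covered by the sample")
    return weighted_sum / covered_population


# Toy data: high-degree individuals are oversampled and more often positive.
sample = [(1, 0), (1, 0), (5, 1), (5, 1), (5, 1), (5, 0)]
degree_counts = {1: 300, 5: 100}  # made-up estimated degree counts

naive = sum(outcome for _, outcome in sample) / len(sample)
estimate = post_stratified_prevalence(sample, degree_counts)
print(naive, estimate)
```

In this toy example the sample average is pulled towards the oversampled high-degree stratum, while the reweighted estimate gives each stratum a weight proportional to its assumed size in the population.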