Estimating population size and duplication rates when records cannot be linked
Author(s) -
Laska Eugene M.,
Meisner Morris,
Wanderling Joseph,
Siegel Carole
Publication year - 2003
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.1640
Subject(s) - statistics, variance (accounting), population, estimator, sample size determination, binomial distribution, set (abstract data type), population size, negative binomial distribution, mathematics, binomial (polynomial), demography, computer science, accounting, sociology, business, poisson distribution, programming language
The capture‐recapture approach to estimating the size of a population is a well‐studied area of statistics. The number of distinct individuals, N_A and N_B, on each of two lists, A and B, and the number common to both lists, N_AB, are used to form an estimate of the binomial probability of being on one of the lists, which then allows an estimate to be made of the size of the population. Critical to the method is an accurate count of N_AB. We consider situations in which this count is not available. Such problems arise in a variety of behavioural health contexts in which the need for protection of privacy may prevent sharing identifying information, so it is not possible to specifically match an individual who appears on one list with an individual on the other. Suppose that the birth dates and/or other demographics of individuals on each list are known. We introduce two methods for estimating the duplication rates and the size of the population. In the first, conditioning on N_A, N_B and the set β of birth dates of those on list B, the maximum likelihood estimators (MLEs) and their variances are derived. The MLEs are based on the proportion of individuals on list A whose birth dates fall in β. This approach is particularly useful if list B itself contains duplicates. The second model utilizes the full sample distribution of the birth dates. We generalize this approach to accommodate multiple demographic characteristics. The approaches are applied to the problem of estimating duplication rates and the population size of veterans who have mental illness in Kings County, NY. The data are lists of those receiving service from the Veterans Administration system and from providers funded or certified by the New York State Office of Mental Health. Copyright © 2003 John Wiley & Sons, Ltd.
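To make the two ideas in the abstract concrete, the following Python sketch shows (1) the classical two-list estimate N_A * N_B / N_AB, and (2) one simple way an unobserved overlap might be approximated from birth dates alone. The second function is an illustrative moment-style calculation, not the authors' derived MLE: it assumes that list A contains no internal duplicates and that the birth date of a person who is not on list B is uniform over a hypothetical number of possible dates (`n_possible_dates`). All function and parameter names are invented for this example.

```python
from datetime import date


def lincoln_petersen(n_a: int, n_b: int, n_ab: int) -> float:
    """Classical two-list estimate of population size: N_A * N_B / N_AB."""
    if n_ab <= 0:
        raise ValueError("n_ab must be positive")
    return n_a * n_b / n_ab


def overlap_from_birth_dates(dates_a, dates_b, n_possible_dates):
    """Approximate the duplication rate of list A against list B.

    ASSUMPTIONS (not taken from the paper): list A has no internal
    duplicates, and a person on A who is NOT on B has a birth date drawn
    uniformly from n_possible_dates candidate dates.
    Returns (estimated duplication rate, estimated N_AB).
    """
    if not dates_a:
        raise ValueError("dates_a must not be empty")
    beta = set(dates_b)                     # distinct birth dates seen on list B
    q = len(beta) / n_possible_dates        # chance match probability
    if q >= 1:
        raise ValueError("beta covers every possible date; rate not identifiable")
    n_a = len(dates_a)
    hits = sum(1 for d in dates_a if d in beta)
    pi_hat = hits / n_a                     # observed share of A falling in beta
    # E[pi_hat] = d + (1 - d) * q, so solve for the duplication rate d.
    dup_rate = (pi_hat - q) / (1 - q)
    dup_rate = min(1.0, max(0.0, dup_rate)) # clamp to a valid proportion
    return dup_rate, dup_rate * n_a


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    list_a = [date(1950, 1, 1), date(1950, 1, 2), date(1951, 3, 4), date(1960, 5, 6)]
    list_b = [date(1950, 1, 1), date(1950, 1, 2), date(1970, 7, 8)]
    rate, n_ab_hat = overlap_from_birth_dates(list_a, list_b, n_possible_dates=1000)
    print("estimated duplication rate:", rate)
    if n_ab_hat > 0:
        print("estimated population size:",
              lincoln_petersen(len(list_a), len(set(list_b)), n_ab_hat))
```

Because the chance-match correction uses only the set of distinct dates on list B, repeated entries on B do not change the result, which is in the spirit of the abstract's remark about duplicates on list B. Real birth dates are not uniformly distributed, so an applied analysis would replace the uniform assumption with an empirical date distribution.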