Proximity and gravity: modeling heaped self‐reports
Author(s) - Allen Chelsea McCarty, Griffith Sandra D., Shiffman Saul, Heitjan Daniel F.
Publication year - 2017
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.7327
Subject(s) - count data, mathematics, statistics, set (abstract data type), econometrics, computer science, poisson distribution, programming language
Self‐reported daily cigarette counts typically exhibit a preponderance of round numbers, a phenomenon known as heaping or digit preference. Heaping can be a substantial nuisance, because scientific interest lies in the distribution of the underlying true values rather than that of the heaped data. In principle, we can estimate parameters of the underlying distribution from heaped data if we know the conditional distribution of the heaped count given the true count, denoted the heaping mechanism (analogous to the missingness mechanism for missing data). In general, it is not possible to estimate the heaping mechanism robustly from heaped data alone. A doubly‐coded smoking cessation trial data set that records daily cigarette count both as a conventional heaped retrospective recall measurement and as a precise instantaneous measurement offers a rare opportunity to estimate the heaping mechanism directly. We propose a novel model that describes the conditional probability of the self‐reported count as a function of its proximity to the truth and its intrinsic attractiveness, denoted its gravity. We apply variations of the model to the cigarette count data, illuminating the cognitive processes that influence self‐reporting behaviors. The principal application of the model will be to enable the correct analysis of heaped‐only data sets. Copyright © 2017 John Wiley & Sons, Ltd.
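The idea of a heaping mechanism built from proximity and gravity can be illustrated with a small sketch. The abstract does not give the model's functional form, so everything below is an assumption made for illustration only: the log-linear form, the `decay` parameter, the `gravity` weights for multiples of 5, 10, and 20, and the function names are all hypothetical and are not the authors' specification.

```python
import math


def gravity(count, weights=None):
    """Hypothetical attractiveness of a reported count (rounder is larger).

    The weights are illustrative values, not estimates from any data set.
    """
    if weights is None:
        weights = {20: 3.0, 10: 2.0, 5: 1.0}
    if count == 0:
        return 0.0
    # Use the weight of the largest base that divides the count evenly.
    for base in sorted(weights, reverse=True):
        if count % base == 0:
            return weights[base]
    return 0.0


def heaping_pmf(true_count, max_count=60, decay=0.5, weights=None):
    """Return P(reported = r | true = true_count) for r = 0..max_count.

    Assumed form: log P is gravity(r) - decay * |r - true_count|, up to a
    normalizing constant. Proximity pulls the report toward the truth;
    gravity pulls it toward round numbers.
    """
    scores = [
        gravity(r, weights) - decay * abs(r - true_count)
        for r in range(max_count + 1)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - peak) for s in scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# A smoker whose true count is 18 cigarettes.
pmf = heaping_pmf(true_count=18)
most_likely = max(range(len(pmf)), key=pmf.__getitem__)
```

Under these illustrative settings, the most probable report for a true count of 18 is the round number 20 rather than 18 itself, which is the kind of behavior a heaping mechanism is meant to capture. With doubly-coded data, parameters such as `decay` and the gravity weights could in principle be estimated rather than fixed by hand.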