Proxy expenditure weights for Consumer Price Index: Audit sampling inference for big‐data statistics
Author(s) - Zhang LiChun
Publication year - 2021
Publication title - Journal of the Royal Statistical Society: Series A (Statistics in Society)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.103
H-Index - 84
eISSN - 1467-985X
pISSN - 0964-1998
DOI - 10.1111/rssa.12632
Subject(s) - proxy (statistics), big data, audit, econometrics, statistics, sampling (signal processing), variance (accounting), inference, survey sampling, computer science, sampling bias, survey data collection, sampling design, statistical inference, selection bias, sample size determination, data mining, mathematics, economics, accounting, population, demography, filter (signal processing), artificial intelligence, sociology, computer vision
Purchase data from retail chains can provide proxy measures of private household expenditure on items that are the most troublesome to collect in the traditional expenditure survey. Due to the inevitable coverage and selection errors, bias must exist in these proxy measures. Moreover, given the sheer amount of data, the bias completely dominates the variance. To investigate the potential of replacing costly and burdensome surveys by non‐survey big‐data sources, we propose an audit sampling inference approach, which does not require linking the audit sample and the big‐data source at the individual level. It turns out that one is unable to reject a null hypothesis of unbiased big‐data estimation at the chosen size, because the audit sampling variance is too large compared to the bias of the big‐data estimate. For the same reason, audit sampling fails to yield a meaningful mean squared error estimate. We propose a novel accuracy measure that is generally applicable in such situations. This can provide a necessary part of the statistical argument for the uptake of non‐survey big‐data sources, in replacement of traditional survey sampling. An application to disaggregated food price indices is used to demonstrate the proposed approach.
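The reasoning in the abstract can be illustrated with the standard mean squared error decomposition. What follows is a generic sketch, not the paper's own notation or derivation. The symbols are assumptions introduced here: $\theta$ is the target parameter, $\hat\theta$ the big-data estimate, $\tilde\theta$ the audit sample estimate (assumed unbiased), $B$ the bias of $\hat\theta$, and $v$ the audit sampling variance with estimate $\hat v$. The sketch also assumes that $\hat\theta$ and $\tilde\theta$ are independent and that a normal approximation applies.

```latex
% Generic illustration under the assumptions stated above; requires amsmath.
\begin{align*}
% Bias dominates variance for the big-data estimate
\mathrm{MSE}(\hat\theta) &= \mathrm{Var}(\hat\theta) + B^2 \approx B^2,
  & B &= E(\hat\theta) - \theta, \\
% Bias estimated from the audit sample, without unit-level linkage
\hat B &= \hat\theta - \tilde\theta,
  & \mathrm{Var}(\hat B) &= \mathrm{Var}(\hat\theta) + v \approx v, \\
% Test of H_0: B = 0 at size alpha
\text{do not reject } H_0 &\text{ if } |\hat B| \le z_{1-\alpha/2}\sqrt{\hat v}, \\
% Moment-based MSE estimate, from E(\hat B^2) = B^2 + Var(\hat B)
\widehat{\mathrm{MSE}} &= \hat B^2 - \hat v .
\end{align*}
```

Under these assumptions, when $v$ is large relative to $B^2$ the test has little power to detect a non-zero bias, and $\hat B^2 - \hat v$ can easily be negative. This is one way to read the abstract's statement that audit sampling fails to yield a meaningful mean squared error estimate.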