Correlates of record linkage and estimating risks of non‐linkage biases in business data sets
Author(s) - Moore Jamie C., Smith Peter W. F., Durrant Gabriele B.
Publication year - 2018
Publication title - Journal of the Royal Statistical Society: Series A (Statistics in Society)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.103
H-Index - 84
eISSN - 1467-985X
pISSN - 0964-1998
DOI - 10.1111/rssa.12342
Subject(s) - linkage (software) , representativeness heuristic , record linkage , computer science , sample (material) , linked data , data set , set (abstract data type) , identifier , similarity (geometry) , survey data collection , data mining , data science , information retrieval , econometrics , statistics , mathematics , artificial intelligence , medicine , environmental health , chemistry , semantic web , chromatography , image (mathematics) , gene , programming language , population , biochemistry
Summary Researchers often use data sets that link information from multiple sources, but non‐linkage biases, caused by differences between linked and non‐linked subjects, are little understood, especially in business data sets. We address these knowledge gaps by studying biases in linkable 2010 UK Small Business Survey data sets. We identify correlates of business linkage propensity and, for the first time, of its components: consent to linkage and register identifier appendability. In addition, we take a novel approach to evaluating non‐linkage bias risks by computing data set representativeness indicators (comparable, decomposable measures of sample subset similarity). We find that the main impacts on linkage propensities and bias risks are due to consenter–non‐consenter differences, which are explicable given business survey response processes, and to differences between subjects with and without identifiers, caused by register undercoverage of very small businesses. We then discuss consequences for the analysis of linked business data sets, and implications of the evaluation methods we introduce for linked data set producers and users.
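The summary does not give the indicator formulas. The following is a minimal sketch, assuming the widely used R-indicator form R = 1 − 2·S(ρ̂), where ρ̂ are the estimated linkage propensities of sampled businesses and S is their standard deviation (R = 1 means every business is equally likely to be linked). It also shows one way such an indicator decomposes: an unconditional partial indicator, taken here as the between-category standard deviation of the propensities for one categorical covariate. The function names, the cell-mean propensity model and the toy data are hypothetical illustrations, not the authors' code, model or results.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def cell_propensities(linked, category):
    """Hypothetical propensity model: each unit's linkage propensity is the
    observed linkage rate within its category (one categorical covariate)."""
    groups = defaultdict(list)
    for y, c in zip(linked, category):
        groups[c].append(y)
    rate = {c: mean(ys) for c, ys in groups.items()}
    return [rate[c] for c in category]


def r_indicator(propensities):
    """Assumed R-indicator form: R = 1 - 2 * S(rho), using the
    population standard deviation of the estimated propensities."""
    return 1.0 - 2.0 * pstdev(propensities)


def partial_r_unconditional(propensities, category):
    """Between-category standard deviation of the propensities: the share of
    propensity variation attributable to this covariate's categories."""
    overall = mean(propensities)
    groups = defaultdict(list)
    for p, c in zip(propensities, category):
        groups[c].append(p)
    n = len(propensities)
    between = sum(len(ps) * (mean(ps) - overall) ** 2 for ps in groups.values()) / n
    return sqrt(between)


# Toy data (invented): 1 = linked, 0 = not linked, by hypothetical size band.
linked = [0, 0, 1, 0, 1, 1, 1, 0]
size = ["micro"] * 4 + ["small"] * 4
rho = cell_propensities(linked, size)
print(r_indicator(rho))                    # 0.5
print(partial_r_unconditional(rho, size))  # 0.25
```

Because the propensities here are estimated from size alone, the partial indicator for size equals S(ρ̂); with richer propensity models it would capture only the between-size share of the variation.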
