Linkage‐data linear regression

Author(s) - Zhang LiChun, Tuoto Tiziana

Publication year - 2021

Publication title - Journal of the Royal Statistical Society: Series A (Statistics in Society)

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.103

H-Index - 84

eISSN - 1467-985X

pISSN - 0964-1998

DOI - 10.1111/rssa.12630

Subject(s) - linkage (software), record linkage, computer science, data mining, inference, regression, identification (biology), statistical hypothesis testing, regression analysis, key (lock), linear regression, statistics, artificial intelligence, machine learning, mathematics, sociology, biology, gene, population, biochemistry, chemistry, demography, botany, computer security

Data linkage is increasingly used to combine data from different sources, with the aim of identifying and bringing together records from separate files that correspond to the same entities. Data linkage is usually not a trivial procedure, and linkage errors (false links and missed links) are unavoidable. In these cases, standard statistical techniques may produce misleading inference. In this paper, we propose a method for secondary linear regression analysis, where the linked data have to be prepared by someone else and neither the match‐key variables nor the unlinked records are available to the analyst. We also develop a diagnostic test for the assumption of non‐informative linkage errors, which is required by all existing secondary analysis adjustment methods. Our approach offers two important advantages. First, it relies on the realistic assumption that the probabilities of correct linkage vary across records, but it does not assume that one is able to estimate the probability of correct linkage for each individual record. Second, it accommodates in a simple manner the general situation where the files are of different sizes and none of them is a subset of another. The proposed methodology of adjustment and testing is studied by simulation and applied to real data.
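
To illustrate why false links can make standard regression misleading, here is a minimal simulation sketch. It is not the adjustment method proposed in the paper. Every name and value in it (`simulate_false_links`, the sample size, the true slope of 2.0, the error scale) is a hypothetical choice for illustration, and it makes two simplifying assumptions the paper does not: a single constant probability of correct linkage of 0.8, and false links assigned at random, independently of the data.

```python
import numpy as np


def simulate_false_links(n=5000, beta=2.0, p_correct=0.8, seed=0):
    """Compare OLS slopes on correctly linked and partly mislinked data.

    Illustrative assumptions: one constant probability of correct linkage,
    and false links that are independent of x and y.
    """
    rng = np.random.default_rng(seed)

    # File A holds x, file B holds y; the true links pair x[i] with y[i].
    x = rng.normal(size=n)
    y = 1.0 + beta * x + rng.normal(scale=0.5, size=n)

    # Mark roughly (1 - p_correct) of the records as falsely linked.
    wrong_idx = np.flatnonzero(rng.random(n) > p_correct)

    # Shuffle the y values among the falsely linked records only.
    y_linked = y.copy()
    y_linked[wrong_idx] = y[rng.permutation(wrong_idx)]

    # np.polyfit with degree 1 returns [slope, intercept].
    slope_true = np.polyfit(x, y, 1)[0]
    slope_linked = np.polyfit(x, y_linked, 1)[0]
    return slope_true, slope_linked


slope_true, slope_linked = simulate_false_links()
print(f"slope with correct links:     {slope_true:.3f}")
print(f"slope with 20% links shuffled: {slope_linked:.3f}")
```

Under these assumptions the naive slope is pulled toward zero by roughly the share of correct links, because a falsely linked y carries no information about its paired x. The setting described in the abstract is harder: linkage probabilities vary across records and the analyst cannot estimate them record by record.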
