
Record Linkage Reconciliation of Arlington Department of Human Services Administrative Data Using Potts Models
Author(s) -
Ian Crandell,
Aaron Schroeder,
Dave Higdon,
Michael-dharma Irwin
Publication year - 2018
Publication title -
International Journal of Population Data Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.602
H-Index - 7
ISSN - 2399-4908
DOI - 10.23889/ijpds.v3i5.1061
Subject(s) - computer science, record linkage, linkage (software), pairwise comparison, probabilistic logic, service (business), data mining, artificial intelligence, business, population, biochemistry, chemistry, demography, marketing, sociology, gene
Situated at the nexus of federal, state, and local governments, the Arlington Department of Human Services (DHS) receives service utilization data from many different sources. Because of its “no wrong door” policy, customers can sign up for any DHS service through any DHS department. As a practical consequence, a single person can appear as multiple records across multiple databases, with no unique key linking those records. Merging them therefore requires a probabilistic linkage approach. Classical approaches to record linkage, such as the method of Fellegi and Sunter, consider each possible pair of records between databases and assign a link probability to each one. A drawback of considering pairwise links alone is that the resulting links can violate transitivity: records A and B may be linked, and records B and C linked, while A and C are judged to belong to different people. To better resolve such conflicts, we propose a Bayesian linkage method that considers a large set of possible pairs jointly. At the heart of this approach is a Potts model representation that tracks which records are assigned to the same individual. This allows us to assign probabilities to the alternative ways of reconciling inconsistent link assignments.
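The contrast between pairwise linkage and a joint Potts representation can be illustrated with a small Python sketch. It computes a Fellegi-Sunter log-likelihood ratio for each record pair, adds an assumed log prior odds, and uses the result as the coupling J_ij of a Potts model over person labels, P(z) proportional to exp(sum over i<j of J_ij [z_i = z_j]), which is then explored with a plain Gibbs sampler. The m/u probabilities, the prior log odds of -3, the three toy records, the fixed number of labels, and the use of Gibbs sampling are all hypothetical choices for illustration; this is not the authors' model, prior, or data.

```python
import math
import random
from itertools import combinations

# Hypothetical per-field probabilities (illustrative assumptions, not from the study):
# m = P(field agrees | same person), u = P(field agrees | different people).
M_PROBS = {"name": 0.95, "dob": 0.90, "zip": 0.85}
U_PROBS = {"name": 0.01, "dob": 0.005, "zip": 0.10}
# Assumed log prior odds that an arbitrary pair is a match (most pairs are not).
LOG_PRIOR_ODDS = -3.0


def fs_weight(rec_a, rec_b):
    """Fellegi-Sunter log-likelihood ratio for one pair, assuming independent fields."""
    w = 0.0
    for field, m in M_PROBS.items():
        u = U_PROBS[field]
        if rec_a[field] == rec_b[field]:
            w += math.log(m / u)
        else:
            w += math.log((1 - m) / (1 - u))
    return w


def gibbs_coassignment(couplings, n, n_labels, n_sweeps=2000, burn_in=500, seed=0):
    """Gibbs-sample a Potts model P(z) proportional to exp(sum_{i<j} J_ij [z_i == z_j])
    and return, for each pair, the fraction of kept samples in which it shares a label."""
    rng = random.Random(seed)
    z = list(range(n))  # start with every record as its own person (needs n_labels >= n)
    hits = {pair: 0 for pair in combinations(range(n), 2)}
    kept = 0
    for sweep in range(n_sweeps):
        for i in range(n):
            # Conditional log-weight of label k: sum of couplings to records currently holding k.
            logits = [0.0] * n_labels
            for j in range(n):
                if j != i:
                    logits[z[j]] += couplings[min(i, j), max(i, j)]
            top = max(logits)  # subtract the max for numerical stability
            probs = [math.exp(v - top) for v in logits]
            z[i] = rng.choices(range(n_labels), weights=probs)[0]
        if sweep >= burn_in:
            kept += 1
            for i, j in hits:
                hits[i, j] += z[i] == z[j]
    return {pair: count / kept for pair, count in hits.items()}


# Three hypothetical records that might describe one client in different databases.
records = [
    {"name": "J SMITH", "dob": "1980-01-02", "zip": "22204"},  # A
    {"name": "J SMITH", "dob": "1980-01-02", "zip": "22201"},  # B
    {"name": "J SMITH", "dob": "1980-02-01", "zip": "22201"},  # C
]
n = len(records)
couplings = {
    (i, j): fs_weight(records[i], records[j]) + LOG_PRIOR_ODDS
    for i, j in combinations(range(n), 2)
}
# Pairwise rule: link a pair whenever its posterior log odds are positive.
pairwise_links = {pair: c > 0 for pair, c in couplings.items()}
# Joint rule: at most one person per record, so n labels suffice.
posterior = gibbs_coassignment(couplings, n, n_labels=n)

for pair, c in couplings.items():
    decision = "link" if pairwise_links[pair] else "no link"
    print(pair, f"log odds {c:+.2f}", decision, f"P(same person) ~ {posterior[pair]:.2f}")
```

With these assumed numbers the pairwise rule links A-B and B-C but not A-C, an intransitive set of decisions. The sampled co-assignment probabilities, by contrast, are tallied from whole label vectors (that is, whole partitions of the records into people), so they can never be intransitive; here the strong A-B evidence is kept while the weaker B-C link is largely outweighed by the A-C disagreement.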