Open Access
Using synthetic data to replace linkage derived elements: a case study
Author(s) - Dean Resnick, Christine Cox, Lisa B. Mirel
Publication year - 2021
Publication title - Health Services and Outcomes Research Methodology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.716
H-Index - 28
eISSN - 1572-9400
pISSN - 1387-3741
DOI - 10.1007/s10742-021-00241-z
Subject(s) - record linkage, microdata (statistics), linkage (software), data mining, computer science, actuarial science, statistics, medicine, data science, mathematics, environmental health, business, population, biochemistry, chemistry, census, gene
While record linkage can expand the range of analyses possible with survey microdata, it also increases the risk of disclosures that compromise privacy. One way to mitigate this risk is to replace some of the information added through linkage with synthetic data elements. This paper describes a case study using the National Hospital Care Survey (NHCS), which collects patient records from a sample of U.S. hospitals for statistical analysis under a pledge to protect patient privacy. The NHCS data were linked to the National Death Index (NDI) to enhance the survey with mortality information. The added information from NDI linkage enables survival analyses related to hospitalization, but because the death information includes dates of death and detailed causes of death, joining it to the patient records increases the risk of patient re-identification (albeit only for deceased persons). For this reason, an approach was tested for developing synthetic data that uses survival analysis models to replace vital status and actual dates of death with synthetic values, and uses classification tree analysis to replace actual causes of death with synthesized causes of death. The degree to which analyses of the synthetic data replicate results from analyses of the actual data is measured by comparing survival analysis parameter estimates from the two data files. Because synthetic data are valuable only to the degree that they yield statistical estimates similar to those based on the actual data, this evaluation is an essential first step in assessing the potential utility of synthetic mortality data.
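The synthesize-then-compare loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes a constant-hazard (exponential) survival model fitted within strata of a single hypothetical covariate and a common administrative follow-up horizon for every patient. It also swaps the classification tree for per-stratum cause-of-death frequencies, which is what sampling within the leaves of a tree that splits only on that covariate would amount to. The names `Record`, `fit_exponential_hazards`, `synthesize`, `hazard_ratios` and `simulate_toy_file` are illustrative, and the toy input is simulated, not NHCS or NDI data.

```python
import math
import random
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    stratum: str                 # hypothetical covariate, e.g. an age group
    followup_days: float         # days to death, or to the end of follow-up
    died: bool                   # vital status taken from the linked death file
    cause: Optional[str] = None  # cause-of-death code when died is True


def fit_exponential_hazards(records: List[Record]) -> Dict[str, float]:
    """Maximum-likelihood constant hazard per stratum: deaths / person-days."""
    deaths: Dict[str, int] = defaultdict(int)
    exposure: Dict[str, float] = defaultdict(float)
    for r in records:
        deaths[r.stratum] += int(r.died)
        exposure[r.stratum] += r.followup_days
    return {s: deaths[s] / t for s, t in exposure.items() if t > 0}


def fit_cause_leaves(records: List[Record]) -> Dict[str, Counter]:
    """Observed cause frequencies per stratum (a stand-in for tree leaves)."""
    leaves: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        if r.died:
            leaves[r.stratum][r.cause] += 1
    return leaves


def synthesize(records: List[Record], horizon_days: float,
               rng: random.Random) -> List[Record]:
    """Replace vital status, time of death and cause with model-based draws."""
    hazards = fit_exponential_hazards(records)
    leaves = fit_cause_leaves(records)
    out: List[Record] = []
    for r in records:
        rate = hazards.get(r.stratum, 0.0)
        t = rng.expovariate(rate) if rate > 0 else math.inf
        if t <= horizon_days:
            leaf = leaves[r.stratum]
            cause = rng.choices(list(leaf), weights=list(leaf.values()))[0]
            out.append(Record(r.stratum, t, True, cause))
        else:  # survives to the (assumed common) end of follow-up
            out.append(Record(r.stratum, horizon_days, False))
    return out


def hazard_ratios(actual: List[Record],
                  synthetic: List[Record]) -> Dict[str, float]:
    """Utility check: synthetic-file estimate divided by actual-file estimate."""
    a = fit_exponential_hazards(actual)
    s = fit_exponential_hazards(synthetic)
    return {k: s.get(k, 0.0) / a[k] for k in a if a[k] > 0}


def simulate_toy_file(n: int, rates: Dict[str, float], horizon_days: float,
                      causes: List[str], rng: random.Random) -> List[Record]:
    """Hypothetical toy 'actual' file; simulated, not real linked data."""
    recs: List[Record] = []
    for stratum, rate in rates.items():
        for _ in range(n):
            t = rng.expovariate(rate)
            if t <= horizon_days:
                recs.append(Record(stratum, t, True, rng.choice(causes)))
            else:
                recs.append(Record(stratum, horizon_days, False))
    return recs


if __name__ == "__main__":
    rng = random.Random(2021)
    actual = simulate_toy_file(20000, {"<65": 0.0005, "65+": 0.002}, 365.0,
                               ["heart", "cancer", "other"], rng)
    synthetic = synthesize(actual, 365.0, rng)
    print(hazard_ratios(actual, synthetic))  # ratios should be close to 1.0
```

Ratios near 1 suggest that, for this deliberately simple model, the synthetic file reproduces the estimates from the actual file. A real utility evaluation would instead compare the parameters, and ideally the confidence intervals, of whatever survival models the intended analyses actually use.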