Spot the difference: comparing results of analyses from real patient data and synthetic derivatives

Open Access

Author(s) - Randi E. Foraker, Sean Yu, Aditi Gupta, Andrew P. Michelson, José A. Soto, Ryan Colvin, Francis Loh, Marin H. Kollef, Thomas M. Maddox, Bradley Evanoff, Hovav Dror, Noa Zamstein, Albert M. Lai, Philip Payne

Publication year - 2020

Publication title - JAMIA Open

Language(s) - English

Resource type - Journals

ISSN - 2574-2531

DOI - 10.1093/jamiaopen/ooaa060

Subject(s) - synthetic data, computer science, confidentiality, data sharing, robustness (evolution), data science, data mining, big data, artificial intelligence, medicine, biochemistry, chemistry, alternative medicine, computer security, pathology, gene

Background: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.

Objectives: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.

Methods: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses on synthetic derivatives with the results of the same analyses on the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed the use cases to cover analyses at the observation level (Use Case 1), the patient-cohort level (Use Case 2), and the population level (Use Case 3).

Results: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions.

Discussion and conclusion: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
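The abstract does not name the specific tests the authors ran, so the following is only an illustrative sketch of one way to compare a single numeric variable between an original dataset and its synthetic derivative. It uses a two-sided permutation test on the difference in means, which is a swapped-in technique chosen for self-containment and is not taken from the article. The function name `permutation_p_value`, its parameters, and the sample values are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(real, synthetic, n_permutations=999, seed=0):
    """Two-sided permutation test for a difference in means.

    Hypothetical helper, not from the article: returns the share of
    random relabelings whose absolute mean difference is at least as
    large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(real) - mean(synthetic))
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_real]) - mean(pooled[n_real:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up illustrative values, not data from the study.
real_values = [61, 64, 58, 70, 66, 59, 73, 68]
synthetic_values = [60, 65, 57, 71, 67, 58, 72, 69]
p = permutation_p_value(real_values, synthetic_values)
print(f"P = {p:.3f}; same conclusion at the 0.05 level: {p > 0.05}")
```

A P value above 0.05 here only means the test found no detectable difference; it is not by itself proof that the two datasets are equivalent, so a check like this is best read alongside effect sizes and the downstream analysis results.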
