Comprehensive Exploration of Synthetic Data Generation: A Survey
Open Access
Author(s)
André Bauer,
Simon Trapp,
Michael Stenger,
Robert Leppich,
Samuel Kounev,
Mark Leznik,
Kyle Chard,
Ian Foster
Publication year: 2024
Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic data emerges as a solution, but the abundance of released models and limited overview literature pose challenges for decision-making. This work surveys 417 Synthetic Data Generation (SDG) models over the last decade, providing a comprehensive overview of model types, functionality, and improvements. Common attributes are identified, leading to a classification and trend analysis. The findings reveal increased model performance and complexity, with neural network-based approaches prevailing, except for privacy-preserving data generation. Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete. Implications from our performance evaluation highlight the scarcity of common metrics and datasets, making comparisons challenging. Additionally, the neglect of training and computational costs in literature necessitates attention in future research. This work serves as a guide for SDG model selection and identifies crucial areas for future exploration.
Language(s): English
