Open Access

Tabular Data Augmentation Using Artificial Intelligence: A Systematic Review and Taxonomic Framework

Author(s) - Mauro Henrique Lima De Boni, Iwens Gervasio Sene, Ronaldo Martins Da Costa

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3593449

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

Context: Tabular data predominate in machine learning applications; however, data scarcity, class imbalance, and privacy-related constraints often impair model performance. AI-centric data-synthesis techniques have therefore been adopted to mitigate these challenges. Objective: To systematically map the state of the art in AI-driven tabular data augmentation, identifying trends, methodological gaps, and best practices. Method: The review followed Kitchenham’s guidelines and covered the ACM Digital Library, Compendex, IEEE Xplore, ScienceDirect, and Scopus for the period 2020–2024. After deduplication and application of the inclusion and exclusion criteria, 55 primary studies were selected and analyzed with respect to eight research questions. Results: From the 55 studies, 210 quantitative results were extracted, of which 70.95% employed utility metrics and only 7.14% assessed privacy. To organize this heterogeneous landscape, we propose a taxonomic framework (solution type × evaluation × metric) that highlights gaps and redundancies. Conclusions: Although research has progressed from conventional synthesizers to hybrid and novel architectures, metric standardization and systematic privacy assessment remain limited. Future work should address these gaps and apply fidelity and privacy metrics to underrepresented tasks such as regression problems and ultra-rare-class datasets.
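The abstract reports percentages but not the underlying counts. As a rough check, the short Python sketch below converts the two reported shares into the whole-number counts they imply, assuming both percentages are taken over the 210 extracted results rather than the 55 studies. The function name `implied_count` is hypothetical, and the resulting counts are inferred here, not stated in the abstract.

```python
# Assumption: both percentages are shares of the 210 extracted results.
TOTAL_RESULTS = 210


def implied_count(percent, total=TOTAL_RESULTS):
    """Return the nearest whole number of results matching a reported percentage."""
    return round(total * percent / 100)


utility_count = implied_count(70.95)  # results that used utility metrics
privacy_count = implied_count(7.14)   # results that assessed privacy

# Recompute each share from the inferred count to confirm it rounds back
# to the percentage given in the abstract.
for label, count in [("utility", utility_count), ("privacy", privacy_count)]:
    share = 100 * count / TOTAL_RESULTS
    print(f"{label}: {count} of {TOTAL_RESULTS} results = {share:.2f}%")
```

Under that assumption the figures correspond to 149 and 15 results, respectively. Both percentages round-trip cleanly against 210, whereas neither yields a whole number against 55, which supports reading them as shares of results rather than of studies.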
