Generative Private Synthetic Student Data for Learning Analytics: An Empirical Study.
Author(s) -
Divine Iloh,
Oluwakemi Temitope Olayinka,
Faith Iyere,
Sanath Chilakala
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3619091
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Learning analytics has the potential to significantly enhance educational outcomes by personalizing student support. However, this field relies on sensitive student data, creating substantial privacy risks and hindering open research. Synthetic data generation offers a promising solution by creating artificial datasets that mimic the statistical properties of real data without exposing personal information. This paper presents a comprehensive framework for generating, evaluating, and utilizing synthetic student data based on the widely used Open University Learning Analytics Dataset (OULAD). We employ two state-of-the-art synthesizers, the Gaussian Copula (GC) and Conditional Tabular GAN (CTGAN), to create synthetic versions of the dataset. The synthetic data are rigorously evaluated against the original data along three critical axes: (1) machine learning utility for predicting student dropout and final grades, (2) statistical quality using the SDMetrics library, and (3) privacy risk via a membership inference attack. Our results show that CTGAN-generated data retains 93.1% of the predictive utility for dropout classification, a statistically significant finding, while also offering strong privacy protection. Conversely, the GC synthesizer produces data with higher statistical fidelity but lower machine learning utility. This work demonstrates that modern synthesizers can produce high-utility, privacy-preserving datasets, providing a viable path for researchers to share and explore rich educational data safely.
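The abstract does not spell out how the two headline quantities are computed, so the following is only an illustrative sketch. It assumes that "retained predictive utility" is the ratio of a model's score when trained on synthetic data to its score when trained on real data (both tested on held-out real data), and that the membership inference attack is a simple distance-to-nearest-synthetic-record attack. The function names, the threshold, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def utility_retention(real_score, synthetic_score):
    """Fraction of real-data performance kept when training on synthetic data.

    Assumed definition (not confirmed by the paper): synthetic_score / real_score,
    where both scores are measured on the same held-out real test set.
    """
    if real_score <= 0:
        raise ValueError("real_score must be positive")
    return synthetic_score / real_score


def nearest_distance(record, synthetic_rows):
    """Euclidean distance from one record to its closest synthetic row."""
    return min(math.dist(record, row) for row in synthetic_rows)


def membership_attack_accuracy(members, non_members, synthetic_rows, threshold):
    """Accuracy of a toy distance-based membership inference attack.

    The attacker guesses "was in the training set" whenever a record lies
    within `threshold` of some synthetic row. With balanced members and
    non-members, accuracy close to 0.5 means the attacker does no better
    than chance.
    """
    hits = sum(nearest_distance(r, synthetic_rows) <= threshold for r in members)
    hits += sum(nearest_distance(r, synthetic_rows) > threshold for r in non_members)
    return hits / (len(members) + len(non_members))


# Hypothetical scores for illustration only (NOT results from the paper).
retention = utility_retention(real_score=0.80, synthetic_score=0.72)
print(f"utility retained: {retention:.3f}")

# Toy numeric records (already scaled); purely illustrative.
synthetic = [(0.0, 0.0), (1.0, 1.0)]
train_members = [(0.1, 0.0), (0.9, 1.0)]
holdout_non_members = [(5.0, 5.0), (0.05, 0.0)]
accuracy = membership_attack_accuracy(
    train_members, holdout_non_members, synthetic, threshold=0.2
)
print(f"attack accuracy: {accuracy:.2f}")
```

In a real study the records would be the preprocessed OULAD features, and the scores would come from classifiers trained on the real and synthetic tables; the sketch only shows how the two summary numbers could be derived once those pieces exist.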