FraudX SimS: A Synthetic Dataset for Anomaly Detection in Payment-Card Transactions
Author(s) -
Nazerke Baisholan,
J. Eric Dietz,
Sergiy Gnatyuk,
Mussa Turdalyuly,
Eric T. Matson,
Karlygash Baisholanova
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3637828
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Progress in payment-fraud detection is hampered by the limited variety of publicly available datasets. Relying on one or two datasets makes fair comparison difficult, masks sensitivity to data changes, and limits in-depth evaluation of explainable methods. This article introduces FraudX SimS, a scenario-labeled synthetic dataset designed to expand the set of benchmarks for anomaly detection in payment transactions, particularly for fraud detection. The dataset preserves the class imbalance between legitimate and fraudulent activity and includes openly specified spatial, temporal, and behavioral features, allowing direct application of explainable artificial intelligence techniques. We establish baselines with standard machine learning models and report accuracy, precision, recall, F1-score, confusion-matrix results, the area under the receiver operating characteristic curve (AUC-ROC), and the area under the precision–recall curve (AUC-PR), with primary emphasis on recall given the cost of missed fraud. We further employ Shapley additive explanations to quantify feature contributions, enabling transparent error analysis and model refinement. Although synthetic, the dataset is constructed to support reproducible experimentation and cross-study comparisons, advancing the development of reliable and interpretable fraud-detection methods.
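The metrics named in the abstract can be illustrated with a minimal sketch, which also shows why recall matters more than accuracy when fraud is rare. The labels and scores below are hypothetical toy values and are not drawn from FraudX SimS. The helper names (`confusion`, `threshold_metrics`, `roc_auc`, `average_precision`) are assumptions introduced here for illustration, and the sketch uses only the Python standard library rather than the authors' actual tooling.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), with fraud encoded as 1 and legitimate as 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random fraud case outscores a
    random legitimate one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise estimate of AUC-PR: mean precision at the rank of each fraud case.
    Tied scores are ordered arbitrarily, so this assumes distinct scores."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)


# Hypothetical toy data: 10 transactions, 2 of them fraudulent (label 1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.25, 0.35, 0.40, 0.90]

# A trivial "always legitimate" baseline: every score falls below the threshold.
always_legit = threshold_metrics(y_true, [0.0] * len(y_true))
at_050 = threshold_metrics(y_true, y_score, threshold=0.5)
at_038 = threshold_metrics(y_true, y_score, threshold=0.38)

print("always legitimate:", always_legit)
print("threshold 0.50:   ", at_050)
print("threshold 0.38:   ", at_038)
print("AUC-ROC:", roc_auc(y_true, y_score))
print("AUC-PR (average precision):", average_precision(y_true, y_score))
```

On this toy data the "always legitimate" baseline and the 0.5-threshold classifier reach the same accuracy, yet the baseline catches no fraud at all, which is the reason a recall-first reading of the results is sensible under class imbalance. Lowering the threshold trades some precision for higher recall, while AUC-ROC and AUC-PR summarize ranking quality independently of any single threshold.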