
A Bias Injection Technique to Assess the Resilience of Causal Discovery Methods
Author(s) -
Martina Cinquini,
Karima Makhlouf,
Sami Zhioua,
Catuscia Palamidessi,
Riccardo Guidotti
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3573201
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Abstract - Causal discovery (CD) algorithms are increasingly applied in socially and ethically sensitive domains. However, evaluating them under realistic conditions remains challenging because real-world datasets annotated with ground-truth causal structures are scarce. While synthetic data generators support controlled benchmarking, they often overlook forms of bias, such as dependencies involving sensitive attributes, that can significantly distort the observed distribution and compromise the trustworthiness of downstream analysis. This paper introduces a novel synthetic data generation framework that enables controlled bias injection while preserving the causal relationships specified in a ground-truth causal graph. The framework aims to assess the reliability of CD methods by examining the impact of varying bias levels and outcome binarization thresholds. Experimental results show that even moderate bias levels can cause CD approaches to fail to correctly infer causal links, particularly those connecting sensitive attributes to decision outcomes. These findings underscore the need for expert validation and highlight the limitations of current CD methods in fairness-critical applications. Our proposal thus provides an essential tool for benchmarking and improving CD algorithms in biased, real-world data settings.
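To make the idea concrete, the following is a minimal sketch (not the paper's actual generator) of what controlled bias injection into a ground-truth causal graph could look like: a small structural causal model in which a tunable `bias_level` parameter scales an edge from a sensitive attribute into the outcome, and a configurable threshold binarizes the continuous outcome into a decision label. All variable names, functional forms, and parameter values here are illustrative assumptions.

```python
import numpy as np

def generate_biased_dataset(n=1000, bias_level=0.5, threshold=0.0, seed=0):
    """Illustrative sketch of bias injection in a tiny structural causal
    model with graph A -> Y and X -> Y.

    A hypothetical sensitive attribute A influences the outcome Y with
    strength `bias_level`; setting bias_level=0 removes the biased edge
    while the rest of the ground-truth structure is unchanged.
    """
    rng = np.random.default_rng(seed)
    A = rng.binomial(1, 0.5, n)      # sensitive attribute (e.g., group membership)
    X = rng.normal(0.0, 1.0, n)      # legitimate covariate
    noise = rng.normal(0.0, 0.1, n)
    # Continuous outcome: the edge X -> Y is fixed; the injected
    # bias edge A -> Y is controlled by bias_level.
    y_cont = 0.8 * X + bias_level * A + noise
    # Binarize at a configurable threshold to obtain a decision label,
    # mirroring the outcome binarization the abstract mentions.
    Y = (y_cont > threshold).astype(int)
    return A, X, Y
```

A benchmark along these lines would feed the resulting (A, X, Y) samples to a CD algorithm at several `bias_level` and `threshold` settings and check whether the recovered graph still contains the A -> Y edge present in the ground truth.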