SHAP-based Feature Selection for Enhanced Unsupervised Labeling
Author(s) - Mary Anne Walauskis, Taghi M. Khoshgoftaar
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3591554
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Manual dataset labeling is expensive, time-consuming, and prone to noise and inaccuracy; it often requires significant financial investment and still risks inconsistencies among human annotators. These challenges are compounded in domains such as fraud detection, where manual annotation raises privacy concerns and severe class imbalance degrades machine learning models. Our approach integrates SHapley Additive exPlanations (SHAP) for feature selection with our novel unsupervised labeling method, which combines an unsupervised ensemble method with a percentile-based threshold technique, and we apply it to the widely used Kaggle Credit Card Fraud Detection dataset. Using unsupervised SHAP-based feature selection to determine the most impactful features, we create subsets with three and five features and also use the full-featured dataset. To evaluate, we compare the newly generated binary class labels to the actual labels, which are used only for evaluation, and calculate the Matthews Correlation Coefficient (MCC), Jaccard Index (JI), and Precision. We also compare our method to an unsupervised baseline and show significant improvements. Our empirical results demonstrate that unsupervised SHAP-based feature selection consistently improves the quality of our labels relative to the baseline unsupervised method. It also improves label quality for the feature subsets relative to the full-feature dataset while reducing computational complexity. Our work provides an unsupervised framework for labeling highly imbalanced, unlabeled data while preserving data privacy, given the unsupervised nature of both our labeling methodology and the SHAP-based feature selection.
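The evaluation step described in the abstract can be illustrated with a minimal sketch. It assumes that some unsupervised model has already produced one anomaly score per instance, that higher scores mean "more likely fraud", and that a nearest-rank percentile cutoff is used; the abstract does not specify the ensemble, the score definition, or the percentile, so the function names and the cutoff rule here are hypothetical rather than the authors' implementation. The metric formulas (MCC, Jaccard Index, Precision) are the standard confusion-matrix definitions.

```python
import math


def percentile_threshold_labels(scores, percentile):
    """Assign label 1 to scores strictly above the given percentile cutoff.

    Assumption: nearest-rank percentile; the paper's exact rule is not stated.
    """
    ranked = sorted(scores)
    # Nearest-rank method: position of the smallest value that has at least
    # `percentile` percent of the data at or below it.
    rank = max(1, math.ceil(percentile * len(ranked) / 100))
    cutoff = ranked[rank - 1]
    return [1 if s > cutoff else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def label_quality(y_true, y_pred):
    """Compute MCC, Jaccard Index, and Precision for generated labels.

    Each metric falls back to 0.0 when its denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"mcc": mcc, "jaccard": jaccard, "precision": precision}
```

In this sketch the true labels enter only through `label_quality`, mirroring the abstract's statement that the actual labels are used only for evaluation and never to generate the new labels.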
