Open Access
Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data
Author(s) - Youngro Lee, Giacomo Baruzzo, Jeonghwan Kim, Jongmo Seo, Barbara Di Camillo
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3618851
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
In tabular data analysis within biomedical research, high model accuracy is often considered a prerequisite for discussing feature importance, because medical practitioners expect the validity of feature importance to correlate with model performance. In this work, we challenge this prevailing belief by demonstrating that even low-performing models can provide reliable feature importance on biomedical datasets. We conduct experiments to observe how feature importance rankings change as model performance progressively degrades. Using three synthetic datasets and four real-world biomedical datasets, we compare feature rankings from the full datasets with those obtained after reducing either the number of samples (sample removal) or the number of features (feature removal), using several feature stability indices. Our results reveal that, in both synthetic and real datasets, feature rankings remain stable when performance degradation is caused by feature removal. In contrast, sample removal introduces greater discrepancies in feature importance rankings as performance deteriorates more severely. By analyzing the distribution of feature importance values and theoretically examining the probability that the model fails to distinguish the importance of different features, we show that models can still reliably identify feature importance despite performance degradation due to feature removal. We conclude that the validity of feature importance can be preserved even at suboptimal model performance levels, as long as the degradation stems from insufficient features rather than insufficient samples. This has a considerable impact on biomedical research, where feature importance analysis plays a pivotal role in clinical decision support and translational bioinformatics.
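The comparison described in the abstract, in which feature rankings from the full dataset are set against rankings obtained after removing features or samples, can be illustrated with a minimal Python sketch. The abstract does not name the stability indices used, so Spearman rank correlation and top-k Jaccard overlap are assumed here as two common choices. The function names and all importance values below are hypothetical toy inputs, not results from the paper.

```python
from typing import Dict, List


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Return feature names sorted from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def spearman_on_shared(full: Dict[str, float], reduced: Dict[str, float]) -> float:
    """Spearman rank correlation over the features present in both models.

    Assumes no tied importance values and at least two shared features.
    """
    shared = [f for f in full if f in reduced]
    n = len(shared)
    # Re-rank within the shared subset so removed features do not shift positions.
    full_rank = {f: r for r, f in enumerate(rank_features({f: full[f] for f in shared}))}
    red_rank = {f: r for r, f in enumerate(rank_features({f: reduced[f] for f in shared}))}
    d2 = sum((full_rank[f] - red_rank[f]) ** 2 for f in shared)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def top_k_jaccard(full: Dict[str, float], reduced: Dict[str, float], k: int) -> float:
    """Jaccard overlap of the top-k shared features in the two rankings."""
    shared = [f for f in full if f in reduced]
    top_full = set(rank_features({f: full[f] for f in shared})[:k])
    top_red = set(rank_features({f: reduced[f] for f in shared})[:k])
    return len(top_full & top_red) / len(top_full | top_red)


# Hypothetical importance scores (toy values chosen only for illustration).
full_model = {"age": 0.40, "bmi": 0.25, "glucose": 0.20, "hdl": 0.10, "noise": 0.05}
# Toy case for feature removal: "bmi" is dropped, the remaining order is unchanged.
after_feature_removal = {"age": 0.50, "glucose": 0.30, "hdl": 0.15, "noise": 0.05}
# Toy case for sample removal: all features kept, but the order is perturbed.
after_sample_removal = {"age": 0.30, "bmi": 0.10, "glucose": 0.35, "hdl": 0.05, "noise": 0.20}

if __name__ == "__main__":
    for label, reduced in [("feature removal", after_feature_removal),
                           ("sample removal", after_sample_removal)]:
        rho = spearman_on_shared(full_model, reduced)
        jac = top_k_jaccard(full_model, reduced, k=2)
        print(f"{label}: spearman={rho:.2f}, top-2 jaccard={jac:.2f}")
```

Both indices are computed only over features that survive in the reduced model, since a removed feature has no rank to compare. In practice the importance dictionaries would come from a fitted model's importance scores on each dataset variant rather than from hand-written values.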
