NoTAC: A Noise-Tolerance Automatic Cleaning Framework for Bone Marrow Karyotyping Data
Author(s) - Rihan Huang, Siyuan Chen, Yafei Li, Chunling Zhang, Yilan Zhang, Changchun Yang, Na Li, Jingdong Hu, Ao Xu, Junkai Su, Xin Gao, Huidan Li, Jiatao Lou
Publication year - 2025
Publication title - IEEE Journal of Biomedical and Health Informatics
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.293
H-Index - 125
eISSN - 2168-2208
pISSN - 2168-2194
DOI - 10.1109/jbhi.2025.3594369
Subject(s) - bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , signal processing and analysis
Deep neural networks (DNNs) have advanced chromosome classification, a critical step in karyotyping for disease diagnosis. However, training an effective DNN requires clean and reliable data, whereas real-world clinical chromosome data often contain label errors and outliers, which degrade DNN performance and limit clinical applicability. In this work, we propose a Noise-Tolerance Automatic Cleaning framework, named NoTAC, which addresses potential labeling errors and outliers to improve chromosome classification performance. The framework consists of two branches: KaryoCleanse for label noise detection and KaryoDrift for outlier identification. The first branch identifies potential label errors by leveraging the DNN's self-confidence, estimating the latent label distribution, and ranking probabilities to prune mislabeled data. The second branch scores out-of-distribution samples by their average K-nearest neighbor distance, enabling the identification and removal of outlier data. We conducted comprehensive comparative experiments against state-of-the-art noise-handling methods on a real-world R-band bone marrow chromosome dataset. NoTAC achieves an accuracy of 93.99%, which represents a 6.25% relative improvement over the baseline and outperforms the best competing method by 0.92%. Furthermore, our qualitative analysis showed that NoTAC reliably uncovers data issues in the real-world R-band bone marrow chromosome dataset, offering insight into how these issues impair DNN predictions. These findings demonstrate NoTAC's potential to enhance both the performance and reliability of DNNs on practical medical datasets. The proposed method has also been applied to assist clinical karyotype diagnosis.
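
The two cleaning steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact procedure, so the per-class confidence thresholds (in the style of confident learning), the Euclidean distance metric, and the function names `find_label_issues` and `knn_outlier_scores` are all assumptions made for illustration. The sketch also assumes every class has at least one labeled sample and that `k` does not exceed the number of reference samples.

```python
import numpy as np


def find_label_issues(pred_probs, labels):
    """Return indices of likely label errors, least self-confident first.

    pred_probs: (n, c) array of predicted class probabilities.
    labels:     (n,) array of given integer labels in [0, c).
    Hypothetical helper; thresholds follow a confident-learning style rule.
    """
    n, c = pred_probs.shape
    # Per-class threshold: mean probability of class j over samples labeled j.
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(c)])
    # Classes whose predicted probability reaches that class's threshold.
    above = pred_probs >= thresholds
    # Estimated latent label: the most probable class among confident ones.
    masked = np.where(above, pred_probs, -np.inf)
    guess = masked.argmax(axis=1)
    has_guess = above.any(axis=1)
    # A sample is suspect when its confident guess disagrees with its label.
    suspect = np.flatnonzero(has_guess & (guess != labels))
    # Rank suspects by self-confidence (probability of the given label).
    self_conf = pred_probs[np.arange(n), labels]
    return suspect[np.argsort(self_conf[suspect])]


def knn_outlier_scores(queries, reference, k=3):
    """Score each query by its mean distance to its k nearest reference points.

    queries:   (m, d) feature vectors to score.
    reference: (n, d) feature vectors assumed to be in-distribution.
    Larger scores suggest out-of-distribution samples. Euclidean distance
    is an assumption; the pairwise matrix suits small datasets only.
    """
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)          # (m, n) pairwise distances
    nearest = np.sort(dists, axis=1)[:, :k]        # k smallest per query
    return nearest.mean(axis=1)
```

In use, `pred_probs` would typically come from held-out (for example cross-validated) predictions of the classifier, and `queries` and `reference` from its feature embeddings. Samples returned by `find_label_issues`, or scoring above a chosen cutoff in `knn_outlier_scores`, would then be pruned before retraining. The cutoff itself is a design choice the abstract does not specify.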
