Preventing dataset shift from breaking machine-learning biomarkers

Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline

Open Access

Publication year - 2021

Publication title - GigaScience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.947

H-Index - 54

ISSN - 2047-217X

DOI - 10.1093/gigascience/giab055

Subject(s) - biomarker, machine learning, artificial intelligence, computer science, biomarker discovery, population, medicine, biology, proteomics, biochemistry, environmental health, gene

Machine learning holds the promise of discovering new biomarkers in cohorts with rich biomedical measurements. A good biomarker is one that reliably detects the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, for example because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques are not sufficient to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as strategies to detect and correct them.
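The abstract names detection and correction strategies without detailing them. The sketch below illustrates one widely used pair of techniques. It is not necessarily the exact procedure described in the article, and it assumes a single numeric feature. The first step is detection with a "domain classifier": a logistic regression trained to tell cohort (source) samples from target-population samples. A held-out AUC well above 0.5 suggests a shift. The second step is correction by importance weighting, where each source sample is reweighted by w(x) = p(target | x) / p(source | x) · n_source / n_target, an estimate of p_target(x) / p_source(x). All function names are hypothetical, and the plain gradient-descent fit stands in for a proper library implementation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_domain_classifier(source, target, lr=1.0, epochs=200):
    """Fit p(target | x) = sigmoid(a * x + b) on 1-D features by batch gradient descent.

    Label 0 = source cohort (where the biomarker is learned), label 1 = target population.
    Hypothetical helper for illustration only.
    """
    xs = list(source) + list(target)
    ys = [0.0] * len(source) + [1.0] * len(target)
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y  # derivative of the log-loss w.r.t. the logit
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def auc(scores_source, scores_target):
    """Probability that a random target score exceeds a random source score (ties count 1/2)."""
    wins = 0.0
    for st in scores_target:
        for ss in scores_source:
            wins += 1.0 if st > ss else 0.5 if st == ss else 0.0
    return wins / (len(scores_target) * len(scores_source))


def detect_shift(source, target):
    """Fit on even-indexed samples, return the AUC on the odd-indexed (held-out) samples.

    An AUC close to 0.5 means the classifier cannot tell the two populations apart.
    """
    a, b = fit_domain_classifier(source[::2], target[::2])
    held_src = [a * x + b for x in source[1::2]]
    held_tgt = [a * x + b for x in target[1::2]]
    return auc(held_src, held_tgt)


def importance_weights(source, target):
    """Estimate p_target(x) / p_source(x) for each source sample via the domain classifier."""
    a, b = fit_domain_classifier(source, target)
    ratio = len(source) / len(target)
    # p / (1 - p) equals exp(logit), which avoids dividing by 1 - p when p is near 1
    return [math.exp(a * x + b) * ratio for x in source]


if __name__ == "__main__":
    rng = random.Random(0)
    source = [rng.gauss(0.0, 1.0) for _ in range(500)]  # e.g. a biased recruitment cohort
    target = [rng.gauss(1.0, 1.0) for _ in range(500)]  # the population of interest
    print("held-out AUC:", round(detect_shift(source, target), 3))
    w = importance_weights(source, target)
    print("source mean:", round(sum(source) / len(source), 3))
    print("reweighted source mean:", round(sum(wi * x for wi, x in zip(w, source)) / sum(w), 3))
```

In this toy setting, the reweighted source mean moves toward the target mean. In practice, the weights would be passed to the biomarker model's training loss or to its validation metrics. Heavy-tailed weights reduce the effective sample size, so clipping them or checking their spread is a common precaution.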
