Open Access
Preventing dataset shift from breaking machine-learning biomarkers
Author(s) - Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline
Publication year - 2021
Publication title - GigaScience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.947
H-Index - 54
ISSN - 2047-217X
DOI - 10.1093/gigascience/giab055
Subject(s) - biomarker, machine learning, artificial intelligence, computer science, biomarker discovery, population, medicine, biology, proteomics, biochemistry, environmental health, gene
Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.
