A machine‐learning‐driven data labeling pipeline for scientific analysis in MLExchange
Author(s) - Chavez Tanny, Zhao Zhuowen, Jiang Runbo, Koepp Wiebke, McReynolds Dylan, Zwart Petrus H., Allan Daniel B., Gann Eliot H., Schwarz Nicholas, Ushizima Daniela, Barnard Edward S., Mehta Apurva, Sankaranarayanan Subramanian, Hexemer Alexander
Publication year - 2025
Publication title - Journal of Applied Crystallography
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.429
H-Index - 162
ISSN - 1600-5767
DOI - 10.1107/s1600576725002328
This study introduces a novel labeling pipeline that accelerates the labeling of scientific data sets through artificial intelligence (AI)‐guided tagging techniques. The pipeline consists of a set of interconnected web‐based graphical user interfaces (GUIs): Data Clinic and MLCoach enable the preparation of machine learning (ML) models for data reduction and classification, respectively, while Label Maker is used for label assignment. Throughout the pipeline, data can be accessed either through a direct connection to a file system or over Hypertext Transfer Protocol (HTTP) through Tiled. Our experimental results present three use cases in which this labeling pipeline has been instrumental: pattern recognition in large X‐ray scattering data sets, the remote analysis of resonant soft X‐ray scattering data and the fine‐tuning of foundation models. These use cases highlight the labeling capabilities of the pipeline, including the ability to label large data sets in a short period of time, to perform remote data analysis while minimizing data movement and to enhance the fine‐tuning of complex ML models with human involvement.