A Data-Centric Approach to Improve Machine Learning Model’s Performance in Production
Author(s) - Pritom Bhowmik, Arabinda Saha Partha
Publication year - 2021
Publication title - International Journal of Engineering and Advanced Technology
Language(s) - English
Resource type - Journals
ISSN - 2249-8958
DOI - 10.35940/ijeat.a3201.1011121
Subject(s) - computer science , machine learning , artificial intelligence , consistency (knowledge bases) , code (set theory) , audit , production (economics) , process (computing) , baseline (sea) , raw data , data mining , big data , oceanography , management , set (abstract data type) , economics , macroeconomics , programming language , geology , operating system
Machine learning teaches computers to think in a way similar to humans. An ML model works by exploring data and identifying patterns with minimal human intervention. A supervised ML model learns to map an input to an output from labeled examples of input-output (X, y) pairs, whereas an unsupervised ML model discovers previously undetected patterns and information in unlabeled data. Because an ML project is a highly iterative process, the ML code/model and the datasets always need to change. When an ML model reaches 70-75% accuracy, the code or algorithm is most probably working correctly. In many cases, however, such as medical or spam detection models, 75% accuracy is too low to deploy in production. A medical model used for sensitive tasks such as detecting certain diseases must reach an accuracy level of 98-99%, which is a big challenge to achieve. In that scenario we may already have a well-working model, so a model-centric approach may not help much in reaching the desired accuracy threshold. Improving the dataset, however, will improve the overall performance of the model. Improving the dataset does not always require adding more and more data. Improving the quality of the data through establishing a reasonable baseline level of performance, labeler consistency, error analysis, and performance auditing will substantially improve the model's accuracy. This review paper focuses on the data-centric approach to improving the performance of a production machine learning model.
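
As an illustration of what checking labeler consistency can look like in practice, the following is a minimal sketch that scores agreement between two labelers on the same examples. The choice of Cohen's kappa, the function names `cohen_kappa` and `disagreements`, and the toy spam/ham labels are assumptions made for this example; the abstract does not say which consistency measure the paper uses.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who labeled the same examples.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of examples on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's class frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both labelers used a single identical class everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(labels_a, labels_b):
    """Indices of examples the two labelers labeled differently."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical toy labels, not data from the paper.
labeler_a = ["spam", "spam", "ham", "ham", "spam", "ham"]
labeler_b = ["spam", "ham", "ham", "ham", "spam", "spam"]

print(round(cohen_kappa(labeler_a, labeler_b), 3))
print(disagreements(labeler_a, labeler_b))
```

The indices returned by `disagreements` point to the examples worth re-reviewing or relabeling, which is one way to improve data quality without collecting more data.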
