Calibrating F1 Scores for Fair Performance Comparison of Binary Classification Models with Application to Student Dropout Prediction
Author(s) - Hyeon Gyu Kim, Yoohyun Park
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3594735
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
The F1 score has been widely used to measure the performance of machine learning models. However, it varies with the ratio of the positive class in the training data, π. Depending on the value of π, the F1 score can be underestimated or overestimated, making it difficult to compare the performance of models fairly. In this study, we discuss how to calibrate the F1 score for fair performance comparison of binary classification models trained on data with different positive class ratios. We first demonstrated that, as π changes, the F1 score is inversely proportional to accuracy. Based on this relationship, the calibrated F1 score was defined as the arithmetic mean of the two measures, which we named the F1* score. Since many prior studies presented only the F1 score or only the accuracy for model performance, but not both, we provided additional equations to estimate the expected F1 score or accuracy when only one of the two measures is available. The accuracy of the presented equations was examined through experiments with a real dataset for student dropout prediction. The results showed that the mean absolute difference between the derived and actual values was less than 0.01, indicating that the proposed F1* score can calibrate a given F1 score with a high level of accuracy. We also conducted an example analysis comparing the performance of existing models using the F1* score to highlight its efficacy.
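
As a rough illustration of the definition above, the following Python sketch computes the F1 score, the accuracy, and their arithmetic mean from binary labels. It assumes only what the abstract states, namely that F1* is the arithmetic mean of F1 and accuracy. The function names and the sample labels are hypothetical, and the paper's equations for estimating a missing F1 score or accuracy are not reproduced here because the abstract does not give them.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels encoded as 0/1 (1 = positive)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            # Remaining case under the 0/1 assumption: true negative.
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_star(f1, acc):
    """Arithmetic mean of F1 and accuracy, following the abstract's description."""
    return (f1 + acc) / 2


# Hypothetical example with a positive class ratio of 0.3.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
acc = accuracy(tp, fp, fn, tn)
print(f"F1={f1:.4f}  accuracy={acc:.4f}  F1*={f1_star(f1, acc):.4f}")
```

In this made-up example the positive class is the minority, so the F1 score is lower than the accuracy and the mean falls between the two.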
