
AutoEval: Are Labels Always Necessary for Classifier Accuracy Evaluation?
Author(s) - Weijian Deng, Liang Zheng
Publication year - 2021
Publication title - IEEE Transactions on Pattern Analysis and Machine Intelligence
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.811
H-Index - 372
eISSN - 1939-3539
pISSN - 0162-8828
DOI - 10.1109/tpami.2021.3136244
Subject(s) - computing and processing, bioengineering
Understanding model decisions under novel test scenarios is central to the community. A common practice is to evaluate models on labeled test sets. However, many real-world scenarios involve unlabeled test data, rendering the common supervised evaluation protocols infeasible. In this paper, we investigate this important but under-explored problem, named Automatic model Evaluation (AutoEval). Specifically, given a trained classifier, we aim to estimate its accuracy on various unlabeled test datasets. We construct a meta-dataset: a dataset comprised of datasets (sample sets) created from original images via various transformations such as rotation and background substitution. Correlation studies on the meta-dataset show that classifier accuracy exhibits a strong negative linear relationship with distribution shift (Pearson's correlation $r < -0.88$). This new finding inspires us to formulate AutoEval as a dataset-level regression problem. Specifically, we learn regression models (e.g., a regression neural network) to estimate classifier accuracy from overall feature statistics of a test set. In the experiments, we show that the meta-dataset contains sufficient and diverse sample sets, allowing us to train robust regression models and report reasonable and promising predictions of classifier accuracy on various test sets. We also provide insights into the application scope, limitations, and potential future directions of AutoEval.
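The sketch below illustrates the dataset-level regression idea described in the abstract, under assumptions not stated there: each sample set is summarized by the mean and covariance of its features, distribution shift is measured by a Fréchet-style distance to the original dataset, and a simple linear regressor (rather than the paper's regression network) maps that shift to accuracy. The helper `extract_features` implied by `meta_sets` is hypothetical and stands in for any fixed feature backbone.

```python
# Minimal sketch of AutoEval as dataset-level regression (linear variant).
# Assumed: mean/covariance feature statistics, Frechet-style shift measure,
# and a meta-dataset of (features, true_accuracy) pairs per transformed set.
import numpy as np
from scipy import linalg
from sklearn.linear_model import LinearRegression


def feature_stats(features: np.ndarray):
    """Summarize a sample set by the mean and covariance of its features."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians fitted to two feature sets."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerics
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)


def fit_autoeval_regressor(meta_sets, mu_src, sigma_src):
    """Fit a regressor from distribution shift to classifier accuracy.

    meta_sets: iterable of (features, true_accuracy) pairs, one per
    transformed sample set in the meta-dataset; (mu_src, sigma_src) are
    the feature statistics of the original (source) dataset.
    """
    shifts, accuracies = [], []
    for features, acc in meta_sets:
        mu, sigma = feature_stats(features)
        shifts.append(frechet_distance(mu_src, sigma_src, mu, sigma))
        accuracies.append(acc)
    reg = LinearRegression()
    reg.fit(np.array(shifts).reshape(-1, 1), np.array(accuracies))
    return reg


def predict_accuracy(reg, features, mu_src, sigma_src):
    """Estimate classifier accuracy on an unlabeled test set."""
    mu, sigma = feature_stats(features)
    shift = frechet_distance(mu_src, sigma_src, mu, sigma)
    return float(reg.predict([[shift]])[0])
```

Given the reported strong negative linear correlation between accuracy and distribution shift, even this linear stand-in captures the core mechanism; the paper's stronger variant would replace `LinearRegression` with a regression neural network fed richer feature statistics.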