Open Access
On the Accuracy of Cross-Validation in the Classification Problem
Author(s) - V. M. Nedel’ko
Publication year - 2021
Publication title - Izvestiâ Irkutskogo Gosudarstvennogo Universiteta. Seriâ "Matematika" (The Bulletin of Irkutsk State University. Series Mathematics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.411
H-Index - 3
eISSN - 2541-8785
pISSN - 1997-7670
DOI - 10.26516/1997-7670.2021.38.84
Subject(s) - cross-validation, computer science, statistics, sample size determination, statistical hypothesis testing, mathematics, algorithm, data mining
This work studies the accuracy of cross-validation estimates for decision functions. The main idea is a statistical modeling scheme that uses real data to obtain statistical estimates that are usually available only for model (synthetic) distributions. The studies confirm the well-known empirical recommendation to use 5 or more folds. Using more than 10 folds does not significantly increase accuracy, and repeated cross-validation does not yield a substantial gain in precision either. The experimental results support the empirical finding that cross-validation estimates are about as accurate as estimates obtained from an independent test sample half the size of the sample used for cross-validation. This is readily explained: the objects of a test sample are independent, whereas the estimates that cross-validation builds on different subsamples (folds) are not.
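The kind of comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the author's real-data modeling scheme. It assumes a synthetic model (two equiprobable classes with x ~ N(-DELTA, 1) for class 0 and x ~ N(+DELTA, 1) for class 1), a hypothetical nearest-mean classifier, and balanced samples, so that the true error of any decision rule can be computed exactly. For each estimator it measures the root-mean-square gap between the estimate and the true error of the rule fitted on the whole sample: k-fold cross-validation, repeated cross-validation, and independent test samples of size n/2 and n. All names and parameter values (DELTA, N_PER_CLASS, TRIALS, the helper functions) are illustrative assumptions.

```python
import random
import statistics
from statistics import NormalDist

# Illustrative assumptions, not taken from the paper.
DELTA = 1.0        # class means are -DELTA (class 0) and +DELTA (class 1)
N_PER_CLASS = 20   # sample size n = 2 * N_PER_CLASS = 40
TRIALS = 1000      # number of simulated samples per estimator
PHI = NormalDist().cdf  # standard normal CDF


def sample(n_per_class, rng):
    """Balanced sample: x ~ N(-DELTA, 1) for class 0, x ~ N(+DELTA, 1) for class 1."""
    data = [(rng.gauss(-DELTA, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(DELTA, 1.0), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def fit(train):
    """Hypothetical nearest-mean rule: threshold t at the midpoint of the class means."""
    m0 = statistics.fmean(x for x, y in train if y == 0)
    m1 = statistics.fmean(x for x, y in train if y == 1)
    return (m0 + m1) / 2, (1 if m1 > m0 else -1)  # (threshold, orientation)


def predict(rule, x):
    t, s = rule
    return 1 if s * (x - t) > 0 else 0


def error_rate(rule, data):
    """Fraction of objects in data that the rule misclassifies."""
    return sum(predict(rule, x) != y for x, y in data) / len(data)


def true_error(rule):
    """Exact misclassification probability of the rule under the assumed model."""
    t, s = rule
    if s > 0:  # predicts class 1 for x > t
        return 0.5 * (1 - PHI(t + DELTA)) + 0.5 * PHI(t - DELTA)
    return 0.5 * PHI(t + DELTA) + 0.5 * (1 - PHI(t - DELTA))


def cv_error(data, k):
    """k-fold cross-validation: each object is tested by the rule fitted without its fold."""
    folds = [data[i::k] for i in range(k)]  # data is assumed to be shuffled already
    mistakes = 0
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        rule = fit(train)
        mistakes += sum(predict(rule, x) != y for x, y in folds[i])
    return mistakes / len(data)


def repeated_cv_error(data, k, repeats, rng):
    """Average of k-fold CV estimates over several random re-partitions of the same sample."""
    estimates = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        estimates.append(cv_error(shuffled, k))
    return statistics.fmean(estimates)


def rms_deviation(estimator, n_per_class, trials, rng):
    """RMS gap between an estimate and the true error of the rule fitted on the whole sample."""
    total = 0.0
    for _ in range(trials):
        data = sample(n_per_class, rng)
        target = true_error(fit(data))
        total += (estimator(data, rng) - target) ** 2
    return (total / trials) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    estimators = {
        "5-fold CV": lambda d, r: cv_error(d, 5),
        "10-fold CV": lambda d, r: cv_error(d, 10),
        "5-fold CV x 10 repeats": lambda d, r: repeated_cv_error(d, 5, 10, r),
        # independent (balanced, by assumption) test samples of size n/2 and n
        "test sample, n/2": lambda d, r: error_rate(fit(d), sample(N_PER_CLASS // 2, r)),
        "test sample, n": lambda d, r: error_rate(fit(d), sample(N_PER_CLASS, r)),
    }
    for name, est in estimators.items():
        rms = rms_deviation(est, N_PER_CLASS, TRIALS, rng)
        print(f"{name:>24}: RMS deviation = {rms:.4f}")
```

Because the model is known here, the true error is computed exactly rather than estimated; the paper's contribution is a scheme for obtaining such comparisons from real data, which this toy setup does not attempt. The printed values depend on DELTA, N_PER_CLASS, and the chosen classifier, so they should be read as an illustration of the experimental design, not as a reproduction of the reported results.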
