Model selection and error estimation without the agonizing pain
Author(s) - Luca Oneto
Publication year - 2018
Publication title - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.1252
Subject(s) - generalization , computer science , model selection , machine learning , artificial intelligence , estimation , algorithm , theoretical computer science , mathematics
How can we select the best performing data‐driven model? How can we rigorously estimate its generalization error? Statistical learning theory (SLT) answers these questions by deriving nonasymptotic bounds on the generalization error of a model or, in other words, by delivering upper bounds on the true error of the learned model based only on quantities computed from the available data. However, for a long time, SLT was considered merely an abstract theoretical framework, useful for inspiring new learning approaches but of limited applicability to practical problems. The purpose of this review is to give an intelligible overview of the problems of model selection (MS) and error estimation (EE), focusing on the ideas behind the different SLT‐based approaches and simplifying most of the technical aspects so as to make them more accessible and usable in practice. We start from the seminal works of the 1980s, proceed to the most recent results, then discuss open problems, and finally outline future directions of this field of research. This article is categorized under: Algorithmic Development > Statistics
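To give a concrete taste of what such a nonasymptotic bound looks like in the simplest EE setting, here is a minimal sketch, not the method of the paper: for a single model chosen independently of a held-out sample, with a [0, 1]-bounded loss evaluated on n i.i.d. points, Hoeffding's inequality upper-bounds the true error using only the empirical error. The function name and the simulated losses are hypothetical, for illustration only.

```python
import numpy as np

def hoeffding_upper_bound(losses, delta=0.05):
    """Upper-bound the true error of a fixed model from held-out losses.

    Illustrative sketch: for a [0, 1]-bounded loss and n i.i.d. test
    points, Hoeffding's inequality gives, with probability >= 1 - delta:
        true_error <= empirical_error + sqrt(ln(1/delta) / (2 * n))
    """
    n = len(losses)
    return np.mean(losses) + np.sqrt(np.log(1.0 / delta) / (2.0 * n))

# Hypothetical 0/1 losses of a trained classifier on 1,000 held-out points.
rng = np.random.default_rng(0)
losses = rng.binomial(1, 0.12, size=1000)  # simulated misclassifications
print(f"empirical error: {losses.mean():.3f}")
print(f"95% upper bound: {hoeffding_upper_bound(losses):.3f}")
```

Note that this simple bound is valid only for a model fixed before seeing the held-out data; once the same sample is used to choose among k candidate models (the MS problem), a correction such as a union bound (replacing delta by delta/k) becomes necessary, and obtaining sharper corrections of this kind is exactly the concern of the SLT-based approaches surveyed in the paper.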