Preface
Author(s) - Christopher Neale
Publication year - 2016
Publication title - International Journal of Laboratory Hematology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.705
H-Index - 55
eISSN - 1751-553X
pISSN - 1751-5521
DOI - 10.1111/ijlh.12525
Subject(s) - Medicine
How does one decide among competing explanations of data given limited observations? This is the problem of model selection. A central concern in model selection is the danger of overfitting: the selection of an overly complex model that, while fitting the observed data very well, predicts future data very badly. Overfitting is one of the most important issues in inductive and statistical inference: besides model selection, it also pervades applications such as prediction, pattern classification, and parameter estimation. The minimum description length (MDL) principle is a relatively recent method for inductive inference that provides a generic solution to the model selection problem and, more generally, to the overfitting problem. MDL is based on the following insight: any regularity in the data can be used to compress the data, i.e., to describe it using fewer symbols than the number of symbols needed to describe the data literally. The more regularities there are, the more the data can be compressed. Equating "learning" with "finding regularity," we can therefore say that the more we are able to compress the data, the more we have learned about the data. Formalizing this idea leads to a general theory of inductive inference with several attractive properties.
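To make the compression idea concrete, here is a minimal sketch of two-part MDL model selection on a toy problem: choosing a polynomial degree. It assumes a crude BIC-style code length (the Gaussian code length of the residuals for L(D|H), plus roughly half a log of the sample size in bits per real-valued parameter for L(H)); the function name `description_length` and the polynomial setup are illustrative choices, not anything specified in the article.

```python
import numpy as np

def description_length(x, y, degree):
    """Two-part MDL score in bits: L(H) + L(D|H), BIC-style approximation."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)          # fit the candidate model
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    # L(D|H): bits to encode residuals under a Gaussian code for this model
    data_bits = 0.5 * n * np.log2(max(rss / n, 1e-12))
    # L(H): ~ (1/2) log2(n) bits per real-valued parameter (degree+1 of them)
    model_bits = 0.5 * (degree + 1) * np.log2(n)
    return data_bits + model_bits

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
# Data generated by a degree-2 polynomial plus noise
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(scale=0.1, size=x.size)

scores = {k: description_length(x, y, k) for k in range(8)}
print(min(scores, key=scores.get))  # expected: 2
```

High-degree polynomials drive the residual code length down but pay for it in parameter bits, so the total description length is minimized at the true degree; this is the compression-versus-complexity trade-off the abstract describes. Refined MDL replaces this crude two-part code with one-part codes such as normalized maximum likelihood, but the trade-off is the same.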