Decision Trees Do Not Generalize to New Variations
Author(s) - Yoshua Bengio, Olivier Delalleau, Clarence Simard
Publication year - 2010
Publication title - Computational Intelligence
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.353
H-Index - 52
eISSN - 1467-8640
pISSN - 0824-7935
DOI - 10.1111/j.1467-8640.2010.00366.x
Subject(s) - decision tree , decision tree learning , curse of dimensionality , machine learning , artificial intelligence , computer science , partition , representation , set (abstract data type) , mathematics , theoretical computer science , combinatorics
The family of decision tree learning algorithms is among the most widespread and studied. Motivated by the desire to develop learning algorithms that can generalize when learning highly varying functions such as those presumably needed to achieve artificial intelligence, we study some theoretical limitations of decision trees. We demonstrate formally that they can be seriously hurt by the curse of dimensionality in a sense that is a bit different from other nonparametric statistical methods, but most importantly, that they cannot generalize to variations not seen in the training set. This is because a decision tree creates a partition of the input space and needs at least one example in each of the regions associated with a leaf to make a sensible prediction in that region. A better understanding of the fundamental reasons for this limitation suggests that one should use forests or even deeper architectures instead of trees, which provide a form of distributed representation and can generalize to variations not encountered in the training data.
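The partition argument in the abstract can be made concrete with a toy experiment. The sketch below is not from the paper; it is a minimal illustration assuming NumPy and scikit-learn's DecisionTreeRegressor, with an arbitrary target function sin(3x) standing in for a "highly varying" function. A tree is trained only on the left half of the input domain; since its prediction is constant within each leaf of its learned partition, every query in the uncovered right half falls into one boundary leaf and receives a single constant value, missing all of the unseen oscillations.

```python
# Minimal sketch (not from the paper): a decision tree is piecewise-constant
# over its learned partition, so in any input region containing no training
# example it can only repeat the value of the nearest leaf and cannot
# reproduce variations of the target it has never seen.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3.0 * x)  # stand-in for a "highly varying" function

# Training examples cover only the left half of the input domain [0, 10].
x_train = rng.uniform(0.0, 5.0, size=500).reshape(-1, 1)
y_train = target(x_train.ravel())

tree = DecisionTreeRegressor().fit(x_train, y_train)

# Inside the covered region the tree interpolates well ...
x_seen = np.linspace(0.0, 5.0, 200).reshape(-1, 1)
# ... but every query in the uncovered right half routes to a single
# boundary leaf, so the prediction there is one constant.
x_unseen = np.linspace(5.0, 10.0, 200).reshape(-1, 1)

err_seen = np.mean((tree.predict(x_seen) - target(x_seen.ravel())) ** 2)
err_unseen = np.mean((tree.predict(x_unseen) - target(x_unseen.ravel())) ** 2)
print(f"MSE on covered region:   {err_seen:.4f}")    # near zero
print(f"MSE on uncovered region: {err_unseen:.4f}")  # roughly the target's variance
print("distinct predictions on uncovered region:",
      np.unique(tree.predict(x_unseen)).size)        # 1: a single leaf value
```

On the covered half the error is near zero, while on the uncovered half the tree's output collapses to one leaf value: exactly the "at least one example in each leaf region" limitation the abstract describes.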