The role of margins in boosting and ensemble performance
Author(s) - Waldyn Martinez, J. Brian Gray
Publication year - 2014
Publication title - Wiley Interdisciplinary Reviews: Computational Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.693
H-Index - 38
eISSN - 1939-0068
pISSN - 1939-5108
DOI - 10.1002/wics.1292
Subject(s) - boosting (machine learning), overfitting, ensemble learning, random forest, machine learning, gradient boosting, artificial intelligence, computer science, resampling, generalization error, AdaBoost, support vector machine, artificial neural network
Boosting refers to methods that create a sequence of classifiers, each performing at least slightly better than random (weak learners), and combine them into a highly accurate ensemble model (a strong learner) through weighted voting. There is ample empirical evidence that the performance of boosting methods is superior to that of individual classifiers. In the bias–variance decomposition framework, it has been demonstrated that boosting algorithms typically reduce bias for learning problems, and in some instances also reduce variance. In addition, even when combining a large number of weak learners, boosting algorithms can be very robust to overfitting, in most instances achieving lower generalization error than competing ensemble methodologies, such as bagging and random forests. To explain the successful performance of boosting methods, Schapire et al. (Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 1998, 26:1651–1686) developed a bound based on the margins of the training data, from which they concluded that larger margins should lead to lower generalization error. In this article, we examine the role of margins in boosting and ensemble method performance. WIREs Comput Stat 2014, 6:124–131. doi: 10.1002/wics.1292

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Bootstrap and Resampling
Statistical Learning and Exploratory Methods of the Data Sciences > Classification and Regression Trees (CART)
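To make the margin quantity concrete, the standard formulation (following Schapire et al. 1998; the notation below is supplied here and is not spelled out in the abstract itself) is as follows. For a voting classifier built from base learners h_t with nonnegative weights \alpha_t, the normalized vote is

f(x) = \frac{\sum_t \alpha_t h_t(x)}{\sum_t \alpha_t}, \qquad h_t(x) \in \{-1, +1\},

and the margin of a training example (x_i, y_i), with y_i \in \{-1, +1\}, is y_i f(x_i) \in [-1, 1]. The margin is positive exactly when the weighted vote classifies the example correctly, and its magnitude measures the confidence of the vote. Stated informally, the Schapire et al. bound then says that for any \theta > 0, with probability at least 1 - \delta,

P_{D}\big[\, y f(x) \le 0 \,\big] \;\le\; P_{S}\big[\, y f(x) \le \theta \,\big] + O\!\left(\sqrt{\frac{d \log^2(m/d)}{m\,\theta^2} + \frac{\log(1/\delta)}{m}}\right),

where m is the training-set size, d is the VC dimension of the base-learner class, P_D is the true error, and P_S is the empirical margin distribution over the training sample S. Pushing the training margins above a larger threshold \theta therefore tightens the bound, which is the basis for the conclusion that larger margins should lead to lower generalization error.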
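As a rough illustration of how training-set margins can be computed in practice, the sketch below fits scikit-learn's AdaBoostClassifier (an assumption of this note; the article does not prescribe any particular implementation) and recombines its fitted base learners as a normalized weighted vote. The dataset, sizes, and parameters are arbitrary, and depending on the scikit-learn version and algorithm variant this vote may differ from the classifier's own decision function; it is used here only to illustrate the margin definition above.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy binary problem; relabel classes from {0, 1} to {-1, +1} for the margin formula.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_pm = 2 * y - 1

clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

# Recombine the fitted base learners as a normalized weighted vote,
# f(x) = sum_t alpha_t h_t(x) / sum_t alpha_t, with h_t(x) in {-1, +1}.
votes = np.array([2 * h.predict(X) - 1 for h in clf.estimators_])  # shape (T, n)
alphas = clf.estimator_weights_[: len(clf.estimators_)]            # shape (T,)
f = alphas @ votes / alphas.sum()                                  # shape (n,)

margins = y_pm * f  # one margin per training example, each in [-1, 1]
print(f"min margin: {margins.min():.3f}  mean margin: {margins.mean():.3f}")
print(f"fraction of nonpositive margins (vote's training errors): {(margins <= 0).mean():.3f}")

Plotting the empirical cumulative distribution of these margins reproduces the kind of margin-distribution picture used by Schapire et al. to compare ensembles: as boosting rounds accumulate, the distribution typically shifts to the right even after the training error reaches zero.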