On the Use of Information Criteria for Model Selection in Phylogenetics
Author(s) -
Edward Susko,
Andrew J. Roger
Publication year - 2019
Publication title -
molecular biology and evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.637
H-Index - 218
eISSN - 1537-1719
pISSN - 0737-4038
DOI - 10.1093/molbev/msz228
Subject(s) - akaike information criterion , bayesian information criterion , model selection , information criteria , bayes factor , divergence (linguistics) , bayes' theorem , selection (genetic algorithm) , statistics , bayesian probability , biology , mathematics , computer science , econometrics , artificial intelligence , linguistics , philosophy
The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection in phylogenetics, however, their theoretical justification and performance have not been carefully examined in this setting. Here, we investigate these methods under simple and complex phylogenetic models. We show that AIC can give a biased estimate of its intended target, the expected predictive log likelihood (EPLnL) or, equivalently, expected Kullback–Leibler divergence between the estimated model and the true distribution for the data. Reasons for bias include commonly occurring issues such as small edge-lengths or, in mixture models, small weights. The use of partitioned models is another issue that can cause problems with information criteria. We show that for partitioned models, a different BIC correction is required for it to be a valid approximation to a Bayes factor. The commonly used AICc correction is not clearly defined in partitioned models and can actually create a substantial bias when the number of parameters gets large as is the case with larger trees and partitioned models. Bias-corrected cross-validation corrections are shown to provide better approximations to EPLnL than AIC. We also illustrate how EPLnL, the estimation target of AIC, can sometimes favor an incorrect model and give reasons for why selection of incorrectly under-partitioned models might be desirable in partitioned model settings.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom