How Well Does Your Phylogenetic Model Fit Your Data?
Author(s) -
Daisy A. Shepherd,
Steffen Klaere
Publication year - 2018
Publication title -
systematic biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 7.128
H-Index - 182
eISSN - 1076-836X
pISSN - 1063-5157
DOI - 10.1093/sysbio/syy066
Subject(s) - goodness of fit , model selection , statistic , bayesian probability , econometrics , statistics , test statistic , statistical hypothesis testing , class (philosophy) , phylogenetic tree , statistical model , computer science , bayesian inference , mathematics , artificial intelligence , biology , biochemistry , gene
The test for model-to-data fitness is a fundamental principle within the statistical sciences. The purpose of such a test is to assess whether the selected best-fitting model adequately describes the behavior in the data. Despite their broad application across many areas of statistics, goodness of fit tests for phylogenetic models have received much less attention than model selection methods in the last decade. At present a number of approaches have been suggested. However, these are often flawed, with problems ranging from the presence of systematic error in the models themselves to the difficulties presented by the nature of phylogenetic data. Ultimately these problems lead to an inadequate choice of statistic. This is one of the main reasons why goodness of fit assessment is often a neglected step within phylogenetic analysis. We argue not only for the necessity of these goodness of fit measures to test how well the model reflects the data, but additionally for the need for "useful" tests that explain why the model-to-data fit may be inadequate. Such tests are a critical part of the model building process, allowing the model to be adapted to provide a better model-to-data fit or to reject a model class outright due to such an inadequate fit that the intended use of the class may be compromised. Proposed and existing methods in both the maximum likelihood and Bayesian framework will be discussed here, whilst highlighting their strengths and limitations for assessing goodness of fit. The final section discusses some critical open statistical problems in goodness of fit assessment for this field, with the hope of encouraging more research into such a fundamental yet underdeveloped area of phylogenetic inference. [Bayesian phylogenetics; Goodness of fit; maximum likelihood; molecular phylogenetics; outlier detection; residual diagnostics.].
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom