Missing Data in Phylogenetic Analysis: Reconciling Results from Simulations and Empirical Data
Author(s) - John J. Wiens, Matthew C. Morrill
Publication year - 2011
Publication title - Systematic Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 7.128
H-Index - 182
eISSN - 1076-836X
pISSN - 1063-5157
DOI - 10.1093/sysbio/syr025
Subject(s) - phylogenetic tree, biology, missing data, evolutionary biology, phylogenetics, econometrics, statistics, mathematics, genetics, gene
existing theoretical framework (Wiens 2003b). Furthermore, many contradictory studies suggesting that missing data are not generally problematic for Bayesian and likelihood analyses (given some assumptions) were not addressed by LEA. Second, the sweeping negative conclusions of LEA are not necessarily supported by their results. LEA find missing data to be problematic primarily when using sets of invariant or saturated characters and/or when obvious rate heterogeneity is ignored. Their results do not support the idea that missing data generally lead to incorrect inferences about topology when informative data are analyzed with appropriate methods. We conduct new simulations under more realistic conditions, and these results show no evidence that missing data generally lead to inaccurate Bayesian estimates of phylogeny. In fact, we show that the practice of excluding characters simply because they contain missing data cells may itself reduce accuracy. We reanalyze the “manipulated” empirical example from LEA and find that, without these artificial “manipulations” of the data, their conclusions are not supported. We also analyze eight empirical data sets, each containing many taxa with extensive missing data. We show that these incomplete taxa are consistently placed into the expected higher taxa, often with very strong support. Overall, our results confirm previous simulation and empirical studies showing that taxa with extensive missing data can be accurately placed in phylogenetic analyses and that adding characters with missing data can be beneficial (at least under some conditions). We conclude by pointing out important areas for future research on the topic of missing data and phylogenetic analysis.