How Many Taxa Must Be Sampled to Identify the Root Node of a Large Clade?
Author(s) - Michael J. Sanderson
Publication year - 1996
Publication title - Systematic Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 7.128
H-Index - 182
eISSN - 1076-836X
pISSN - 1063-5157
DOI - 10.1093/sysbio/45.2.168
Subject(s) - biology , clade , taxon , root (linguistics) , evolutionary biology , botany , phylogenetic tree , genetics , gene , linguistics , philosophy
The importance of choice of taxa in phylogenetic analysis has been explored mainly with reference to its effect on the accuracy of tree estimation. Taxon sampling can also introduce other kinds of errors. Even if the sampled topology agrees with the true topology, it may not include the true root node of a clade, a node that is of interest for many reasons. Using a simple Yule model for the diversification process, the probability of identifying this node is derived under random sampling of taxa. For large clades, the minimum sample size needed to be 95% confident of identifying the root node is approximately 40 and is independent of the size of the clade. If rates of diversification differ in the two sister groups descended from the root node, the minimum sample size needed increases markedly. If these two sister groups are so different in diversity that a Yule model would be rejected by conventional diversification tests, then the necessary sample size is an order of magnitude greater than when diversification is homogeneous.
(Diversification; phylogeny; branching; speciation; Yule model; taxon sampling.)
The recent publication of a very large phylogenetic analysis of seed plants based on chloroplast rbcL data (Chase et al., 1993) has raised a number of interesting questions about phylogenetic analyses of large clades. Among these questions are computational issues related to reconstructing optimal trees using heuristic algorithms (Rice et al., 1995) and the choice of taxon sampling scheme for groups that are either large or poorly understood phylogenetically. The rbcL analysis included nearly 500 sequences, a remarkable and possibly record-setting number but one that samples barely 0.2% of seed plant diversity. Other similarly large clades probably will remain sparsely sampled by systematists for the foreseeable future. How much sampling is enough in groups that are exceptionally species rich?
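The abstract's figure of roughly 40 taxa can be illustrated with a short calculation. The sketch below is a reconstruction under stated assumptions, not necessarily the derivation used in the paper. It assumes that under a Yule model the number of taxa k in one of the two sister groups at the root is uniformly distributed on 1..N-1, and that n taxa are sampled at random without replacement from the N taxa in the clade. Under those assumptions the root node is recovered exactly when the sample contains at least one taxon from each sister group. The function names are hypothetical, and the closed form is derived here from the same assumptions and checked against the direct summation.

```python
from fractions import Fraction
from math import comb


def prob_root_given_split(N, k, n):
    """P(sample of n taxa spans the root | sister groups hold k and N-k taxa)."""
    # Samples that fall wholly inside one sister group miss the root node.
    miss = comb(k, n) + comb(N - k, n)
    return 1 - Fraction(miss, comb(N, n))


def prob_root_yule(N, n):
    """Average over root splits, ASSUMED uniform on k = 1..N-1 (Yule model)."""
    total = sum(prob_root_given_split(N, k, n) for k in range(1, N))
    return total / (N - 1)


def prob_root_closed_form(N, n):
    """Closed form of the same average: 1 - 2(N-n) / ((n+1)(N-1))."""
    return 1 - Fraction(2 * (N - n), (n + 1) * (N - 1))


def min_sample_size(N, confidence=Fraction(95, 100)):
    """Smallest n whose probability of spanning the root reaches `confidence`."""
    for n in range(2, N + 1):
        if prob_root_closed_form(N, n) >= confidence:
            return n
    return N


print(min_sample_size(10**6))  # → 39
```

As N grows the closed form tends to 1 - 2/(n+1), which no longer depends on N and reaches 0.95 at n = 39, consistent with the abstract's "approximately 40". For a fixed, highly unequal split, `prob_root_given_split` shows why heterogeneous diversification raises the required sample size: most samples fall entirely within the larger sister group.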