TESTING METHODS OF EVOLUTIONARY TREE CONSTRUCTION
Author(s) - David Penny, M. D. Hendy
Publication year - 1985
Publication title - Cladistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.323
H-Index - 92
eISSN - 1096-0031
pISSN - 0748-3007
DOI - 10.1111/j.1096-0031.1985.tb00427.x
Subject(s) - weighting, tree (set theory), mathematics, disjoint sets, similarity (geometry), tree rearrangement, set (abstract data type), statistics, interval tree, binary tree, search tree, algorithm, combinatorics, phylogenetic tree, tree structure, computer science, artificial intelligence, biology, search algorithm, medicine, biochemistry, gene, image (mathematics), radiology, programming language
The reliability of methods for reconstructing evolutionary trees is discussed under four headings: evaluating criteria for an optimal tree, finding the optimal tree for the criterion selected, detecting reliable and unreliable data, and estimating the error range for the final tree. Five data sets (protein sequences) show that, in general, the minimal tree is a better estimate of phylogeny than a longer tree. However, for each data set, the minimal tree was no longer the shortest when the sequences were combined. An objective weighting of columns (characters) can lead to an improved tree by giving less weight to columns that are closer to a random order. The weight of each character is derived from the ratio of the observed to the expected number of incompatibilities for its column. Several forms of weighting give better trees, as measured both by an increase in the correlation between the lengths of trees built from different subsets of data and by an increase in the similarity between minimal trees found with disjoint subsets of data. Increasing the size of randomly selected subsets, and measuring the increased similarity of the results, can lead to an estimate of the minimum number of trees that need to be considered as possibly the correct historical tree. A measure of the ‘treeness’ of the data is described that estimates the extent to which a binary tree is a good description of the data.
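
The weighting idea in the abstract, down-weighting columns whose observed number of incompatibilities approaches the number expected for a randomly ordered column, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: characters are simplified to binary states, pairwise incompatibility is judged with the standard four-gamete test, the expected count is estimated by shuffling a column across taxa, and the weight is taken as 1 - observed/expected, clipped at zero. The function names are hypothetical.

```python
import random
from itertools import combinations


def incompatible(col_a, col_b):
    """Four-gamete test for two binary characters.

    Two binary columns cannot both fit one tree without homoplasy
    if all four state pairs (0,0), (0,1), (1,0), (1,1) occur.
    """
    return len(set(zip(col_a, col_b))) == 4


def observed_incompat(columns):
    """Count, for each column, how many other columns it conflicts with."""
    counts = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if incompatible(columns[i], columns[j]):
            counts[i] += 1
            counts[j] += 1
    return counts


def expected_incompat(columns, k, trials=200, rng=None):
    """Monte Carlo estimate (an assumption, not the paper's formula):
    shuffle column k across taxa and average its conflicts with the rest."""
    rng = rng or random.Random(0)
    others = [c for idx, c in enumerate(columns) if idx != k]
    col = list(columns[k])
    total = 0
    for _ in range(trials):
        rng.shuffle(col)
        total += sum(incompatible(col, other) for other in others)
    return total / trials


def column_weights(columns, trials=200, seed=0):
    """Hypothetical weighting: 1 - observed/expected, clipped to [0, 1].

    A column that conflicts as often as a random one gets weight near 0;
    a column with no conflicts gets weight 1.
    """
    rng = random.Random(seed)
    obs = observed_incompat(columns)
    weights = []
    for k in range(len(columns)):
        exp = expected_incompat(columns, k, trials, rng)
        ratio = obs[k] / exp if exp > 0 else 0.0
        weights.append(max(0.0, 1.0 - ratio))
    return weights


# Three binary characters scored on six taxa (made-up data).
columns = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
weights = column_weights(columns)
```

In this toy matrix the first two columns are compatible with each other and each conflicts only with the third, so the third column carries the highest observed count. Protein sequence data, as used in the paper, are multistate, so a real analysis would need a multistate compatibility test in place of the four-gamete check.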