Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies
Author(s) -
Archer Frederick I.,
Martien Karen K.,
Taylor Barbara L.
Publication year - 2017
Publication title -
Marine Mammal Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.723
H-Index - 78
eISSN - 1748-7692
pISSN - 0824-0469
DOI - 10.1111/mms.12414
Subject(s) - subspecies , taxon , range (aeronautics) , random forest , biology , population , evolutionary biology , zoology , ecology , artificial intelligence , computer science , demography , materials science , sociology , composite material
We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few studies have explored how to estimate it from DNA sequences. Using simulated and empirical data sets, we demonstrate that Random Forests produces classification models that perform well for diagnosing subspecies and species. Populations with strong social structure and relatively low abundances (e.g., killer whales, Orcinus orca) were found to be as diagnosable as species. Conversely, comparisons involving subspecies that are abundant (e.g., spinner and spotted dolphins, Stenella longirostris and S. attenuata), are only as diagnosable as many population comparisons. Estimates of diagnosability reported in subspecies and species descriptions should include confidence intervals, which are influenced by the sample sizes of the training data. We also stress the importance of reporting the certainty with which individuals in the training data are classified in order to communicate the strength of the classification model and diagnosability estimate. Guidance as to ideal minimum diagnosability thresholds for subspecies will improve with more comprehensive analyses; however, values in the range of 80%–90% are considered appropriate.
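The link between training sample size and the confidence interval around a diagnosability estimate can be illustrated with a minimal sketch. It assumes that diagnosability for a taxon is estimated as the proportion of its specimens that a classifier assigns correctly, and it uses a Wilson score interval for that proportion. The abstract does not state how the authors compute their intervals, so the interval method, the function names `wilson_interval` and `diagnosability`, and the toy labels are all illustrative assumptions rather than the paper's procedure.

```python
import math
from typing import Sequence, Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%).

    Assumption: used here as a generic interval; not necessarily the
    method used in the paper.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnosability(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    taxon: str,
) -> Tuple[float, float, float]:
    """Return (estimate, lower, upper) for one taxon.

    The estimate is the fraction of specimens truly belonging to `taxon`
    that were also predicted as `taxon`. The predictions are assumed to
    come from held-out or out-of-bag classification, not from refitting
    on the same specimens.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    predictions = [p for t, p in zip(true_labels, predicted_labels) if t == taxon]
    if not predictions:
        raise ValueError(f"no specimens of taxon {taxon!r}")
    correct = sum(1 for p in predictions if p == taxon)
    lower, upper = wilson_interval(correct, len(predictions))
    return correct / len(predictions), lower, upper


if __name__ == "__main__":
    # Hypothetical toy data: 10 specimens of taxon "A", 9 classified correctly.
    true = ["A"] * 10 + ["B"] * 10
    pred = ["A"] * 9 + ["B"] + ["B"] * 10
    print(diagnosability(true, pred, "A"))
    # Same 90% estimate from ten times as many specimens gives a narrower interval.
    print(wilson_interval(9, 10), wilson_interval(90, 100))
```

With the same 90% point estimate, the interval from 10 specimens is much wider than the one from 100, which is why a diagnosability value near the 80%–90% range is hard to interpret without its interval and sample size.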