Quantifying uncertainty of taxonomic placement in DNA barcoding and metabarcoding

Somervuo Panu; Yu Douglas W.; Xu Charles C.Y.; Ji Yinqiu; Hultman Jenni; Wirta Helena; Ovaskainen Otso

Publication year - 2017

Publication title - Methods in Ecology and Evolution

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.425

H-Index - 105

ISSN - 2041-210X

DOI - 10.1111/2041-210x.12721

Subject(s) - taxonomic rank, DNA barcoding, biology, biodiversity, identification (biology), taxonomy (biology), biological classification, evolutionary biology, taxon, ecology

Summary

A crucial step in the use of DNA markers for biodiversity surveys is the assignment of Linnaean taxonomies (species, genus, etc.) to sequence reads, because this makes available all the information associated with the taxonomic names. Taxonomic placement of DNA barcoding sequences is inherently probabilistic: DNA sequences contain errors, there is natural variation among sequences within a species, and reference databases are incomplete and can contain false annotations. However, most existing bioinformatics methods for taxonomic placement either ignore uncertainty or quantify it with metrics other than probability. In this paper, we evaluate the performance of the recently proposed probabilistic taxonomic placement method PROTAX by applying it both to annotated reference sequence data and to unknown environmental data. Our four case studies include contrasting taxonomic groups (fungi, bacteria, mammals and insects), variation in the length and quality of the barcoding sequences (from individually Sanger‐sequenced sequences to short Illumina reads), variation in the structures and sizes of the taxonomies (800–130 000 species) and variation in the completeness of the reference databases (representing 15–100% of known species). Our results demonstrate that PROTAX yields essentially unbiased probabilities of taxonomic placement, which means that its quantification of species identification uncertainty is reliable. As expected, the accuracy of taxonomic placement increases with increasing coverage of the taxonomic and reference sequence databases, and with an increasing ratio of genetic variation among taxonomic levels to genetic variation within taxonomic levels. We conclude that reliable species‐level identification from environmental samples is still challenging and that neglecting identification uncertainty can lead to spurious inference. A key aim for future research is to complete taxonomic and reference sequence databases and to make these two types of data compatible.
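Saying that a method yields "essentially unbiased" placement probabilities is a calibration claim: among all placements assigned a probability near p, roughly a fraction p should turn out to be correct. The Python sketch below shows one generic way to check this. It assumes you already have pairs of (predicted placement probability, whether the placement was correct), for example from annotated reference sequences whose true names are known. The function name `calibration_table`, the equal-width binning and the simulated data are illustrative assumptions, not the evaluation procedure used in the paper.

```python
import random
from typing import List, Tuple

# (bin_lower, bin_upper, mean_predicted, fraction_correct, count)
Row = Tuple[float, float, float, float, int]


def calibration_table(
    predictions: List[Tuple[float, bool]], n_bins: int = 10
) -> List[Row]:
    """Bin (predicted probability, placement_was_correct) pairs.

    Returns one row per non-empty bin so that the mean predicted
    probability can be compared with the observed fraction correct.
    """
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, correct in predictions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the top bin
        bins[idx].append((p, correct))

    rows: List[Row] = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_p = sum(p for p, _ in members) / len(members)
        frac = sum(1 for _, c in members if c) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, mean_p, frac, len(members)))
    return rows


if __name__ == "__main__":
    # Simulated, perfectly calibrated predictor (hypothetical, not data from
    # the paper): each placement is correct with exactly its stated probability.
    rng = random.Random(1)
    sim = []
    for _ in range(20_000):
        p = rng.random()
        sim.append((p, rng.random() < p))
    for lo, hi, mean_p, frac, n in calibration_table(sim):
        print(f"[{lo:.1f}, {hi:.1f}] predicted={mean_p:.3f} observed={frac:.3f} n={n}")
```

For a well-calibrated method the predicted and observed columns track each other; observed values systematically below predicted ones indicate overconfidence, and values above indicate underconfidence. Equal-width bins are the simplest choice, but sparsely populated bins are noisy, so the count column is worth reporting alongside.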
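The abstract links placement accuracy to the ratio of genetic variation among taxonomic levels to genetic variation within them. One simple way to express such a ratio at the species level is the mean pairwise distance between sequences of different species divided by the mean pairwise distance between sequences of the same species. The sketch below assumes pre-aligned, equal-length sequences and uses the uncorrected p-distance; the function names and the toy sequences are hypothetical, and the abstract does not specify how the paper itself measures this ratio.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of aligned positions that differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def among_within_ratio(seqs_by_species: Dict[str, List[str]]) -> float:
    """Mean among-species distance divided by mean within-species distance."""
    # All pairs of sequences that belong to the same species.
    within = [
        p_distance(a, b)
        for seqs in seqs_by_species.values()
        for a, b in combinations(seqs, 2)
    ]
    # All pairs of sequences drawn from two different species.
    among = [
        p_distance(a, b)
        for sp1, sp2 in combinations(seqs_by_species, 2)
        for a in seqs_by_species[sp1]
        for b in seqs_by_species[sp2]
    ]
    if not within or not among:
        raise ValueError("need >= 2 species and >= 1 species with >= 2 sequences")
    mean_within = mean(within)
    if mean_within == 0:
        return float("inf")  # no within-species variation at all
    return mean(among) / mean_within


if __name__ == "__main__":
    # Hypothetical toy alignment: two species, two sequences each.
    toy = {
        "species_A": ["AAAAAAAAAA", "AAAAAAAAAT"],
        "species_B": ["CCCCCAAAAA", "CCCCCAAAAT"],
    }
    print(among_within_ratio(toy))
```

In the toy alignment the mean within-species p-distance is 0.1 and the mean among-species p-distance is 0.55, giving a ratio of 5.5. Larger values mean that species are better separated relative to their internal variation, which is the situation in which, according to the abstract, placement tends to be more accurate. The same calculation can be repeated one level up (genera instead of species) to compare ratios across taxonomic levels.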