Limits to robustness and reproducibility in the demarcation of operational taxonomic units




Author(s) - Schmidt Thomas S. B., Matias Rodrigues João F., Mering Christian

Publication year - 2015

Publication title - Environmental Microbiology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.954

H-Index - 188

eISSN - 1462-2920

pISSN - 1462-2912

DOI - 10.1111/1462-2920.12610

Subject(s) - biology , cluster analysis , robustness (evolution) , context (archaeology) , comparability , workflow , set (abstract data type) , data set , data mining , computer science , artificial intelligence , gene , mathematics , genetics , paleontology , combinatorics , database , programming language

Summary

The demarcation of operational taxonomic units (OTUs) from complex sequence data sets is a key step in contemporary studies of microbial ecology. However, as biologically motivated ‘optimal’ OTU-binning algorithms remain elusive, many conceptually distinct approaches continue to be used. Using a global data set of 887,870 bacterial 16S rRNA gene sequences, we objectively quantified biases introduced by several widely employed sequence clustering algorithms. We found that OTU-binning methods often provided surprisingly non-equivalent partitions of identical data sets, notably when clustering to the same nominal similarity thresholds; and we quantified the resulting impact on ecological data description for a well-defined human skin microbiome data set. We observed that some methods were very robust to varying clustering thresholds, while others were found to be highly susceptible even to slight threshold variations. Moreover, we comprehensively quantified the impact of the choice of 16S rRNA gene subregion, as well as of data set scope and context, on algorithm performance. Our findings may contribute to an enhanced comparability of results across sequence-processing pipelines, and we arrive at recommendations towards higher levels of standardization in established workflows.
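To make the notion of "non-equivalent partitions of identical data sets" concrete, the following is a minimal sketch of how agreement between two OTU assignments of the same sequences could be scored. It uses the adjusted Rand index purely as an illustrative choice; the summary does not state which agreement measures the authors used, and the function name, OTU labels, and toy data below are all hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two partitions of the same items.

    1.0 means the partitions are identical up to relabeling;
    values near 0 mean agreement is no better than chance.
    Illustrative metric only, not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same sequences")
    n = len(labels_a)
    if n < 2:
        return 1.0  # fewer than two items: nothing to disagree about

    # Count sequence pairs co-clustered in both, in A only, and in B only.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / comb(n, 2)  # chance-level agreement
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (both - expected) / (maximum - expected)


# Hypothetical OTU assignments of six sequences by two clustering methods,
# both run at the same nominal similarity threshold.
method_a = ["otu1", "otu1", "otu1", "otu2", "otu2", "otu3"]
method_b = ["x", "x", "y", "y", "y", "z"]

score = adjusted_rand_index(method_a, method_b)
print(f"adjusted Rand index: {score:.3f}")
```

In this toy case the two methods produce the same number of OTUs yet assign one sequence differently, and the score falls well below 1.0, which is the kind of disagreement the summary describes at identical nominal thresholds.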
