A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data
Author(s) - Alexandre Bazin, Didier Debroas, Engelbert Mephu Nguifo
Publication year - 2018
Publication title - Journal of Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.585
H-Index - 95
eISSN - 1557-8666
pISSN - 1066-5277
DOI - 10.1089/cmb.2018.0170
Subject(s) - cluster analysis, computer science, amplicon, categorization, data mining, sequence (biology), task (project management), artificial intelligence, biology, gene, genetics, polymerase chain reaction, management, economics
When analyzing microbial communities, an active computational challenge concerns the categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs). Established clustering tools use a one-pass algorithm to handle large numbers of gene sequences and produce OTUs in reasonable time. However, all of the current tools are based on a crisp clustering approach, in which each gene sequence is assigned to exactly one cluster. The weak quality of the output compared with more complex clustering algorithms forces the user to postprocess the obtained OTUs. Providing a membership degree when assigning a gene sequence to an OTU would help the user during this postprocessing task. Moreover, this membership degree can be used to automatically evaluate the quality of the obtained OTUs. The goal of this study is therefore to propose a new clustering approach that takes uncertainty into account when producing OTUs and improves both the quality and the presentation of the OTU results.
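
To illustrate the difference between a crisp assignment and a membership degree, here is a minimal sketch in Python. It is not the method proposed in the article, whose details the abstract does not give. As a stand-in it uses the standard fuzzy c-means membership formula, and the function names, the OTU labels, the distance values, and the `fuzziness` parameter are all hypothetical.

```python
from typing import Dict


def membership_degrees(distances: Dict[str, float],
                       fuzziness: float = 2.0) -> Dict[str, float]:
    """Convert distances from one sequence to OTU representatives into
    membership degrees that sum to 1.

    Assumption: the fuzzy c-means membership formula is used here purely
    for illustration; it is not taken from the article.
    """
    if fuzziness <= 1.0:
        raise ValueError("fuzziness must be greater than 1")

    # A sequence identical to one or more representatives belongs
    # entirely to those OTUs (shared equally if there are several).
    exact = [otu for otu, d in distances.items() if d == 0.0]
    if exact:
        return {otu: (1.0 / len(exact) if otu in exact else 0.0)
                for otu in distances}

    # Closer representatives receive larger weights.
    exponent = 2.0 / (fuzziness - 1.0)
    weights = {otu: d ** (-exponent) for otu, d in distances.items()}
    total = sum(weights.values())
    return {otu: w / total for otu, w in weights.items()}


def crisp_assignment(memberships: Dict[str, float]) -> str:
    """Crisp clustering keeps only the OTU with the highest membership."""
    return max(memberships, key=memberships.get)


# Hypothetical distances from one read to two OTU representatives.
example = membership_degrees({"OTU_1": 0.01, "OTU_2": 0.03})
best = crisp_assignment(example)
```

With these hypothetical distances, a crisp approach would report only `OTU_1`, whereas the membership degrees also record how much weight the second OTU received, which is the kind of information the abstract suggests could guide postprocessing.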