Are similarity‐ or phylogeny‐based methods more appropriate for classifying internal transcribed spacer (ITS) metagenomic amplicons?
Author(s) - Teresita M. Porter, G. Brian Golding
Publication year - 2011
Publication title - New Phytologist
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.742
H-Index - 244
eISSN - 1469-8137
pISSN - 0028-646X
DOI - 10.1111/j.1469-8137.2011.03838.x
Subject(s) - internal transcribed spacer, amplicon, metagenomics, phylogenetic tree, biology, similarity (geometry), phylogenetics, computational biology, amplicon sequencing, set (abstract data type), computer science, data mining, artificial intelligence, genetics, 16s ribosomal rna, gene, polymerase chain reaction, image (mathematics), programming language
Summary
• The internal transcribed spacer (ITS) of the nuclear ribosomal DNA region is a widely used species marker for plants and fungi. Recent metagenomic studies using next‐generation sequencing, however, generate only partial ITS sequences. Here we compare the performance of partial and full‐length ITS sequences with several classification methods.
• We compiled a full‐length ITS data set and created short fragments to simulate the read lengths commonly recovered from current next‐generation sequencing platforms. We compared recovery, erroneous recovery, and coverage for the following methods: best BLAST hit classification, MEGAN classification, and automated phylogenetic assignment using the Statistical Assignment Program (SAP).
• We found that summarizing results at more inclusive taxonomic ranks increased recovery and reduced erroneous recovery. The similarity‐based methods, BLAST and MEGAN, performed consistently across most fragment lengths. The phylogeny‐based method, SAP, worked best with queries of 400 bp or longer. Overall, BLAST had the highest recovery rates and MEGAN had the lowest erroneous recovery rates.
• A high‐throughput ITS classification method should be selected with three factors in mind: read length, an acceptable tradeoff between maximizing the total number of classifications and minimizing the number of erroneous classifications, and the computational speed of the assignment method.
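The fragment simulation and the three evaluation measures named in the summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the sliding-window scheme, and the metric definitions are assumptions made here for illustration: coverage is taken as the fraction of queries that receive any assignment, recovery as the fraction assigned correctly, and erroneous recovery as the fraction assigned incorrectly. The summary does not define these terms, so the paper's own definitions may differ.

```python
from typing import Dict, List, Optional


def make_fragments(seq: str, length: int, step: int) -> List[str]:
    """Cut fixed-length windows from a full-length sequence.

    Assumption: a simple sliding window; the source only says that short
    fragments were created to simulate sequencing read lengths.
    """
    if length <= 0 or step <= 0 or length > len(seq):
        return []
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def score(truth: Dict[str, str],
          predicted: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Score assignments at one taxonomic rank (hypothetical definitions).

    truth maps query id -> known taxon; predicted maps query id -> assigned
    taxon, or None when the classifier made no assignment.
    """
    total = len(truth)
    if total == 0:
        return {"coverage": 0.0, "recovery": 0.0, "erroneous_recovery": 0.0}
    assigned = 0
    correct = 0
    for query, taxon in truth.items():
        call = predicted.get(query)
        if call is None:
            continue  # unclassified queries count against coverage only
        assigned += 1
        if call == taxon:
            correct += 1
    return {
        "coverage": assigned / total,
        "recovery": correct / total,
        "erroneous_recovery": (assigned - correct) / total,
    }


# Toy data, invented for this example only.
fragments = make_fragments("ACGTACGTAC", length=4, step=3)
truth = {"q1": "Amanita", "q2": "Boletus", "q3": "Russula", "q4": "Amanita"}
predicted = {"q1": "Amanita", "q2": "Russula", "q3": None, "q4": "Amanita"}
metrics = score(truth, predicted)
```

Scoring the same predictions after mapping both dictionaries to a more inclusive rank (for example, genus to family) is one way to examine the rank effect the summary reports.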
