SOAPBarcode: revealing arthropod biodiversity through assembly of Illumina shotgun sequences of PCR amplicons
Author(s) -
Liu Shanlin,
Li Yiyuan,
Lu Jianliang,
Su Xu,
Tang Min,
Zhang Rui,
Zhou Lili,
Zhou Chengran,
Yang Qing,
Ji Yinqiu,
Yu Douglas W.,
Zhou Xin
Publication year - 2013
Publication title -
Methods in Ecology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.425
H-Index - 105
ISSN - 2041-210X
DOI - 10.1111/2041-210x.12120
Subject(s) - barcode, biology, amplicon, shotgun sequencing, illumina dye sequencing, sequence assembly, pyrosequencing, metagenomics, computational biology, operational taxonomic unit, amplicon sequencing, shotgun, dna sequencing, k-mer, genome, genetics, polymerase chain reaction, gene, computer science, 16s ribosomal rna, gene expression, transcriptome, operating system
Summary
Metabarcoding of mixed arthropod samples for biodiversity assessment has mostly been carried out on the 454 GS FLX sequencer (Roche, Branford, Connecticut, USA), because its long reads (≥400 bp) are believed to allow higher taxonomic resolution. The Illumina sequencing platforms, with their much higher throughput, could reduce sequencing costs and improve sequence quality, but their shorter reads (typically <150 bp) have deterred their use in next‐generation‐sequencing (NGS)‐based analyses of eukaryotic biodiversity, which often rely on standard barcode markers (e.g. COI, rbcL, matK, ITS) that are hundreds of nucleotides long.

We present a new Illumina‐based pipeline to recover full‐length COI barcodes from mixed arthropod samples. Our new assembly program, SOAPBarcode, a variant of the genome assembly program SOAPdenovo, uses paired‐end reads of the standard COI barcode region as anchors to extract the correct pathways (sequences) from otherwise chaotic de Bruijn graphs, which arise from the presence of large numbers of highly similar COI homologs.

Two bulk insect samples of known species composition, previously analysed in a recently published 454 metabarcoding study (Yu et al. 2012), were re‐analysed with our pipeline. Compared with the Roche 454 results (c. 400‐bp reads), our pipeline recovered full‐length COI barcodes (658 bp) and 17–31% more species‐level operational taxonomic units (OTUs) from the bulk insect samples, with fewer untraceable (novel) OTUs. On the other hand, our PCR‐based pipeline also revealed higher rates of contamination across samples, owing to Illumina's greater sequencing depth. On balance, the assembled full‐length barcodes and higher OTU recovery rates resulted in better‐resolved taxonomic assignments and more accurate estimates of beta diversity.
The HiSeq 2000 and the SOAPBarcode pipeline together can achieve more accurate biodiversity assessment at a much lower sequencing cost in metabarcoding analyses. However, greater precautions are needed to prevent cross‐sample contamination during field preparation and laboratory work, because of the greater ability to detect non‐target DNA amplicons present in low copy numbers.
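The anchoring idea in the Summary (paired‐end reads used to pick the correct path out of a tangled de Bruijn graph) can be illustrated with a toy example. This is a minimal Python sketch, not SOAPBarcode's implementation. It assumes error‐free reads, a tiny k of 4, and that a read pair supplies a start (k‐1)‐mer, an end (k‐1)‐mer and the exact amplicon length. The function names `build_de_bruijn` and `anchored_paths` and the example reads are hypothetical. Two homologs that share an internal TTGC repeat produce a graph with a branch and a cycle. Restricting the walk to run from the start anchor to the end anchor at the expected length leaves a single path.

```python
from collections import defaultdict

# Toy illustration only: NOT the SOAPBarcode implementation.
# Assumes error-free reads and that paired-end anchors give a start
# (k-1)-mer, an end (k-1)-mer and the exact amplicon length (hypothetical inputs).


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def anchored_paths(graph, start, end, length):
    """Return every sequence of exactly `length` bp that starts at the
    (k-1)-mer `start`, ends at the (k-1)-mer `end`, and uses only graph edges."""
    if length < len(start):
        return []
    results = []

    def walk(node, seq, remaining):
        if remaining == 0:
            if node == end:
                results.append(seq)
            return
        # .get() avoids inserting empty entries into the defaultdict
        for nxt in sorted(graph.get(node, ())):
            walk(nxt, seq + nxt[-1], remaining - 1)

    # each edge traversed adds one base, so length - len(start) edges are needed
    walk(start, start, length - len(start))
    return results


# Hypothetical short reads from two homologs that share a TTGC repeat,
# which creates a branch and a cycle in the graph.
reads = ["AAGCTTGC", "CTTGCATCC", "TTGCTTGCAT", "GCATGG"]
graph = build_de_bruijn(reads, k=4)
print(anchored_paths(graph, start="AAG", end="TCC", length=12))
# → ['AAGCTTGCATCC']
```

With `length=16` the same call returns `['AAGCTTGCTTGCATCC']`, a path that loops once more through the shared repeat, so the anchors only resolve the graph when the expected length is right. The exhaustive walk can grow exponentially on large, highly branched graphs, so it suits toy inputs like this one only.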
