
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS
Author(s) -
Xiaoli Jiao,
Xin Zheng,
Liang Ma,
Geetha Kutty,
Emile Gogineni,
Qiang Sun,
Brad T. Sherman,
Xiaojun Hu,
Kristine Jones,
Castle Raley,
Bao Tran,
David J. Munroe,
Robert M. Stephens,
Dun Liang,
Tomozumi Imamichi,
J. Kovács,
Richard A. Lempicki,
Da Wei Huang
Publication year - 2013
Publication title -
Journal of Data Mining in Genomics and Proteomics
Language(s) - English
Resource type - Journals
ISSN - 2153-0602
DOI - 10.4172/2153-0602.1000136
Subject(s) - computer science , amplicon sequencing , amplicon , dna sequencing , benchmark (surveying) , data mining , hybrid genome assembly , word error rate , sequence (biology) , computational biology , shotgun sequencing , biology , artificial intelligence , dna , genetics , polymerase chain reaction , gene , 16s ribosomal rna , geodesy , geography
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. Because the platform is new, it is important to assess its sequencing error rate, as well as the quality control (QC) parameters associated with PacBio sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced on the PacBio RS platform. After aligning the Circular Consensus Sequence (CCS) reads from this experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC and improved to 1.3% with an SVM-based multi-parameter QC method. In addition, de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that, even though CCS reads are already error-corrected, it is still necessary to perform appropriate QC on them in order to obtain successful downstream bioinformatics results.
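
The median error rate reported above can be illustrated with a minimal sketch. It assumes a per-read error rate defined as (mismatches + insertions + deletions) divided by alignment length; the abstract does not state the exact definition used in the study, so this formula, the `AlignedRead` type, and the function names are hypothetical. The counts in the usage lines are made-up illustrative values, not data from the paper.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class AlignedRead:
    """Alignment summary for one CCS read (hypothetical structure)."""
    mismatches: int
    insertions: int
    deletions: int
    alignment_length: int  # number of alignment columns, gaps included


def error_rate(read: AlignedRead) -> float:
    """Per-read error rate under the ASSUMED definition:
    (mismatches + insertions + deletions) / alignment length."""
    if read.alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    errors = read.mismatches + read.insertions + read.deletions
    return errors / read.alignment_length


def median_error_rate(reads) -> float:
    """Median of the per-read error rates across all aligned reads."""
    return median(error_rate(r) for r in reads)


# Illustrative counts only (not results from the study).
example_reads = [
    AlignedRead(mismatches=2, insertions=1, deletions=1, alignment_length=200),
    AlignedRead(mismatches=0, insertions=0, deletions=0, alignment_length=100),
    AlignedRead(mismatches=1, insertions=0, deletions=0, alignment_length=100),
]
example_median = median_error_rate(example_reads)
```

Comparing this statistic before and after a read filter is one simple way to quantify the effect of a QC step, in the spirit of the 2.5% versus 1.3% comparison described in the abstract.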