Systematic bias in high-throughput sequencing data and its correction by BEADS
Author(s) - Ming-Sin Cheung, Thomas A. Down, Isabel Latorre, Julie Ahringer
Publication year - 2011
Publication title - Nucleic Acids Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.008
H-Index - 537
eISSN - 1362-4954
pISSN - 0305-1048
DOI - 10.1093/nar/gkr425
Subject(s) - biology , dna sequencing , computational biology , deep sequencing , genetics , chip sequencing , normalization (sociology) , genome , illumina dye sequencing , dna nanoball sequencing , genomics , chromatin immunoprecipitation , dna , genomic library , gene , promoter , chromatin , base sequence , gene expression , nucleosome , sociology , anthropology
RELEASE NOTE
rBEADS is in a pre-release (alpha) stage and is provided for testing purposes. Please report problems, bugs, unexpected behavior and missing features.

The BEADS algorithm requires deep input samples (high read coverage) to work properly. This means more than 50 million reads for worm and fly experiments, and a proportionally higher number for mammalian experiments. Pooling multiple input experiments with the sumBAMinputs function from the rBEADS package is suggested.

INTRODUCTION
BEADS is a normalization scheme that corrects nucleotide composition bias, mappability variations and differential local DNA structural effects in deep sequencing data.

In high-throughput sequencing data, the recovery of sequenced DNA fragments is not uniform along the genome. In particular, GC-rich sequences are often over-represented and AT-rich sequences under-represented.

The read mapping procedure also generates regional bias. Sequence reads that can be mapped to multiple sites in the genome are usually discarded, so genomic regions with high degeneracy show lower mapped read coverage than unique portions of the genome. Because mappability varies along the genome, it creates a systematic bias.

Furthermore, local DNA or chromatin structural effects can lead to coverage inhomogeneity in sequencing data.
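To make the GC-content effect described above concrete, the following is a minimal Python sketch of a generic GC-bin rescaling. It is not the BEADS algorithm or the rBEADS API; the function names `gc_fraction` and `gc_correct`, the fixed-width windows, and the choice of 10 GC bins are all assumptions made for illustration. The idea is simply that windows with similar GC content are grouped, and their read counts are rescaled so that each group has the same mean coverage as the genome overall.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C among the A, C, G, T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # window made only of N or other ambiguous bases
    return (seq.count("G") + seq.count("C")) / acgt


def gc_correct(windows, counts, n_bins=10):
    """Rescale per-window read counts so every GC bin has the global mean.

    windows: list of window sequences (hypothetical input format)
    counts:  list of read counts, one per window
    n_bins:  number of equal-width GC bins (an illustrative choice)
    """
    if len(windows) != len(counts):
        raise ValueError("windows and counts must have the same length")
    if not counts:
        return []

    global_mean = sum(counts) / len(counts)

    # Assign each window to a GC bin; a GC fraction of 1.0 goes to the last bin.
    bins = [min(int(gc_fraction(w) * n_bins), n_bins - 1) for w in windows]

    # Mean read count observed in each GC bin.
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for b, c in zip(bins, counts):
        totals[b] += c
        sizes[b] += 1

    corrected = []
    for b, c in zip(bins, counts):
        bin_mean = totals[b] / sizes[b]
        # Over-represented bins are scaled down, under-represented bins up.
        corrected.append(c * global_mean / bin_mean if bin_mean > 0 else 0.0)
    return corrected


# Toy example: two GC-rich windows with inflated counts, two AT-rich windows.
example = gc_correct(["GGCC", "GCGC", "ATAT", "TTAA"], [30, 50, 10, 10])
print(example)  # [18.75, 31.25, 25.0, 25.0]
```

In the toy example the GC-rich windows are scaled down and the AT-rich windows scaled up, while the total read count is preserved. A real correction would also need to account for mappability and for the input-sample signal, which this sketch deliberately leaves out.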