Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data
Author(s) - Matthew Flickinger, Goo Jun, Gonçalo R. Abecasis, Michael Boehnke, Hyun Min Kang
Publication year - 2015
Publication title - The American Journal of Human Genetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.661
H-Index - 302
eISSN - 1537-6605
pISSN - 0002-9297
DOI - 10.1016/j.ajhg.2015.07.002
Subject(s) - contamination, genotype, sample (material), sequence (biology), genetics, biology, DNA, computational biology, chemistry, gene, chromatography, ecology
DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%-20%), contamination-adjusted calls eliminate 48%-77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.
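The abstract does not spell out the likelihood model, so the following is only a minimal sketch of one way contamination could be folded into genotype calling at a single biallelic site. It assumes a two-source mixture: each read comes from the target sample with probability 1 - alpha and from a single contaminating individual with probability alpha, where the contaminant's genotype is unknown and is averaged over a Hardy-Weinberg prior at population alternate-allele frequency `alt_freq`. The function names, the `(is_alt, err)` read representation, and the Hardy-Weinberg assumption are all illustrative choices, not details taken from the paper.

```python
def hwe_prior(alt_freq):
    """Hardy-Weinberg genotype prior for 0, 1, 2 copies of the alternate allele."""
    f = alt_freq
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]


def read_prob(is_alt, err, alt_fraction):
    """Probability of one observed base, given the expected fraction of
    alternate-allele reads and a per-base error probability."""
    p_alt = alt_fraction * (1 - err) + (1 - alt_fraction) * err
    return p_alt if is_alt else 1 - p_alt


def adjusted_likelihoods(reads, alpha, alt_freq):
    """Likelihood of the reads for each sample genotype g1 in {0, 1, 2},
    averaging over the unknown contaminant genotype g2 (assumed model).

    reads: list of (is_alt, err) tuples, one per read covering the site.
    alpha: assumed contamination fraction (0 means no contamination).
    """
    prior = hwe_prior(alt_freq)
    likelihoods = []
    for g1 in range(3):
        total = 0.0
        for g2 in range(3):
            # Expected alt-read fraction in the mixture of sample and contaminant.
            frac = (1 - alpha) * g1 / 2 + alpha * g2 / 2
            lik = 1.0
            for is_alt, err in reads:
                lik *= read_prob(is_alt, err, frac)
            total += prior[g2] * lik
        likelihoods.append(total)
    return likelihoods


def call_genotype(reads, alpha, alt_freq):
    """Return the genotype (0, 1, or 2 alt copies) with the highest posterior."""
    prior = hwe_prior(alt_freq)
    lik = adjusted_likelihoods(reads, alpha, alt_freq)
    posterior = [l * p for l, p in zip(lik, prior)]
    return max(range(3), key=lambda g: posterior[g])


# Hypothetical site: 30 reads, 4 showing the alternate allele, error rate 0.001.
reads = [(False, 0.001)] * 26 + [(True, 0.001)] * 4
unadjusted = call_genotype(reads, alpha=0.0, alt_freq=0.2)
adjusted = call_genotype(reads, alpha=0.1, alt_freq=0.2)
```

With these made-up inputs, ignoring contamination (`alpha=0.0`) favors a heterozygous call, while assuming 10% contamination attributes the few alternate reads to the contaminant and favors homozygous reference. In practice the product over reads would be computed in log space to avoid underflow at high depth.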