Open Access
Identifying and Reducing Systematic Errors in Chromosome Conformation Capture Data
Author(s) -
Seungsoo Hahn,
Dongsup Kim
Publication year - 2015
Publication title -
plos one
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0146007
Subject(s) - normalization (sociology) , systematic error , correlation , chromosome conformation capture , restriction enzyme , computational biology , biology , mathematics , dna , genetics , biological system , statistics , gene expression , geometry , enhancer , sociology , anthropology , gene
Chromosome conformation capture (3C)-based techniques have recently been used to uncover the mystic genomic architecture in the nucleus. These techniques yield indirect data on the distances between genomic loci in the form of contact frequencies that must be normalized to remove various errors. This normalization process determines the quality of data analysis. In this study, we describe two systematic errors that result from the heterogeneous local density of restriction sites and different local chromatin states, methods to identify and remove those artifacts, and three previously described sources of systematic errors in 3C-based data: fragment length, mappability, and local DNA composition. To explain the effect of systematic errors on the results, we used three different published data sets to show the dependence of the results on restriction enzymes and experimental methods. Comparison of the results from different restriction enzymes shows a higher correlation after removing systematic errors. In contrast, using different methods with the same restriction enzymes shows a lower correlation after removing systematic errors. Notably, the improved correlation of the latter case caused by systematic errors indicates that a higher correlation between results does not ensure the validity of the normalization methods. Finally, we suggest a method to analyze random error and provide guidance for the maximum reproducibility of contact frequency maps.