z-logo
open-access-imgOpen Access
qc3C: Reference-free quality control for Hi-C sequencing data
Author(s) -
Matthew Z. DeMaere,
Aaron E. Darling
Publication year - 2021
Publication title -
plos computational biology/plos computational biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1008839
Subject(s) - computer science , software , reference genome , data mining , protocol (science) , quality (philosophy) , sample (material) , dna sequencing , computational biology , dna , biology , chemistry , genetics , medicine , alternative medicine , epistemology , pathology , chromatography , programming language , philosophy
Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and more recently the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while key post-sequencing quality indicators used have—thus far—relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k -mers that would not naturally occur in the sample. Our software tool (qc3C) is to our knowledge the first to implement a reference-free Hi-C QC tool, and also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here