Open Access
Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era
Author(s) - Rizzi Raffaella, Beretta Stefano, Patterson Murray, Pirola Yuri, Previtali Marco, Della Vedova Gianluca, Bonizzoni Paola
Publication year - 2019
Publication title - Quantitative Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.707
H-Index - 15
eISSN - 2095-4697
pISSN - 2095-4689
DOI - 10.1007/s40484-019-0181-x
Subject(s) - de Bruijn sequence, computer science, Bloom filter, de Bruijn graph, sequence assembly, genome, combinatorics, theoretical computer science, biology, algorithm, mathematics, genetics, gene, gene expression, transcriptome
Background: De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn graphs have become the dominant technical device in the last decade. Those two kinds of graphs are collectively called assembly graphs.

Results: In this review, we discuss the most recent advances in the problem of constructing, representing and navigating assembly graphs, focusing on very large datasets. We will also explore some computational techniques, such as the Bloom filter, to compactly store graphs while keeping all functionalities intact.

Conclusions: We complete our analysis with a discussion on the algorithmic issues of assembling from long reads (e.g., PacBio and Oxford Nanopore). Finally, we present some of the most relevant open problems in this field.
