DSK: k-mer counting with very low memory usage

Guillaume Rizk, Dominique Lavenier, Rayan Chikhi

Open Access

Publication year - 2013

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btt020

Subject(s) - k-mer, computer science, hash table, hash function, substring, auxiliary memory, parallel computing, bloom filter, data structure, algorithm, operating system, dna sequencing, dna, biology, genetics, computer security

Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require a large data structure to reside in memory, and that structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which requires only a fixed, user-defined amount of memory and disk space. The approach trades off memory, time and disk usage. The multiset of all k-mers present in the reads is partitioned, and the partitions are saved to disk. Each partition is then loaded separately into memory in a temporary hash table, and the k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 h. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers.
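
The partition-then-count idea described in the abstract can be illustrated with a minimal Python sketch. This is not DSK's actual code: the function names, the plain-text partition files, the CRC32 hash and the `n_partitions` and `min_abundance` parameters are all assumptions made for illustration, and details a real tool would need (compact k-mer encoding, reverse complements, sizing partitions to a memory budget) are left out.

```python
import os
import tempfile
import zlib
from collections import Counter


def iter_kmers(read, k):
    """Yield every substring of length k from one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers_on_disk(reads, k, n_partitions=4, min_abundance=1):
    """Yield (k-mer, count) pairs, holding only one partition in memory at a time.

    Hypothetical helper: k-mers are first spilled to n_partitions files,
    then each file is counted separately in a temporary hash table.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{p}.txt") for p in range(n_partitions)]
        files = [open(path, "w") for path in paths]
        try:
            # Pass 1: stream the reads; a hash of each k-mer picks its partition,
            # so all copies of the same k-mer land in the same file.
            for read in reads:
                for kmer in iter_kmers(read, k):
                    p = zlib.crc32(kmer.encode("ascii")) % n_partitions
                    files[p].write(kmer + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: load each partition into its own temporary hash table.
        for path in paths:
            table = Counter()
            with open(path) as f:
                for line in f:
                    table[line.rstrip("\n")] += 1
            # Traverse the table and optionally drop low-abundance k-mers.
            for kmer, count in table.items():
                if count >= min_abundance:
                    yield kmer, count
```

Calling `dict(count_kmers_on_disk(["ACGTACGT", "CGTA"], 3, min_abundance=2))` would keep only the 3-mers seen at least twice. Because every occurrence of a given k-mer is routed to the same partition, each per-partition count is already final, which is why the partitions never need to be merged.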
