Weighted minimizer sampling improves long read mapping

Open Access

Author(s) - Chirag Jain, Arang Rhie, Haowen Zhang, Claudia Chu, Brian P. Walenz, Sergey Koren, Adam M. Phillippy

Publication year - 2020

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btaa435

Subject(s) - computer science, sampling (signal processing), statistics, algorithm, mathematics, computer vision, filter (signal processing)

In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, minimizer-based tools (e.g. Minimap2, Mashmap) opt to discard the most frequently occurring minimizers from the genome to avoid excessive false positives. By doing so, the underlying guarantee is lost and accuracy is reduced in repetitive genomic regions.
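To make the guarantee concrete, the following is a minimal Python sketch of plain (unweighted) window-based minimizer sampling. It is not the authors' implementation: the function names `kmer_order`, `minimizers` and `drop_frequent`, the hash-based k-mer ordering, the example sequences, and the parameters `k`, `w` and `max_occ` are all assumptions chosen for illustration. It assumes each window of `w` consecutive k-mers contributes its smallest k-mer, so any substring of length `w + k - 1` shared by two sequences forms an identical window in both and yields the same minimizer.

```python
import hashlib
from collections import Counter


def kmer_order(kmer):
    """Hypothetical k-mer ordering: a stable 64-bit hash of the k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w):
    """Return the set of (k-mer, position) pairs sampled from seq.

    Every window of w consecutive k-mers (w + k - 1 bases) contributes
    its smallest k-mer under kmer_order; ties go to the leftmost position.
    """
    picked = set()
    for start in range(len(seq) - (w + k - 1) + 1):
        window = [(kmer_order(seq[i:i + k]), i) for i in range(start, start + w)]
        _, pos = min(window)
        picked.add((seq[pos:pos + k], pos))
    return picked


def drop_frequent(mins, max_occ):
    """Illustrative filter: discard minimizers sampled more than max_occ times.

    This mimics the masking of high-frequency minimizers described above;
    once applied, the shared-minimizer guarantee no longer holds.
    """
    counts = Counter(kmer for kmer, _ in mins)
    return {(kmer, pos) for kmer, pos in mins if counts[kmer] <= max_occ}


k, w = 5, 4
shared = "GATTACAG"  # length w + k - 1 = 8, i.e. exactly one full window
ref = "TTTTCCCC" + shared + "AAAAGGGG"
read = "CGCGCGCG" + shared + "TATATATA"

ref_kmers = {kmer for kmer, _ in minimizers(ref, k, w)}
read_kmers = {kmer for kmer, _ in minimizers(read, k, w)}
common = ref_kmers & read_kmers  # non-empty because of the shared window
```

Here `common` contains at least the minimizer of the shared window. Passing the sampled set through `drop_frequent` with a low `max_occ` can remove exactly such minimizers in repetitive sequence, which is the loss of guarantee the abstract describes.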
