An improved encoding of genetic variation in a Burrows–Wheeler transform

Open Access

Author(s) - Thomas Büchler, Enno Ohlebusch

Publication year - 2019

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btz782

Subject(s) - encode, substring, genome, single nucleotide polymorphism, variation (astronomy), genetics, structural variation, reference genome, computer science, computational biology, alphabet, dbsnp, biology, gene, genotype, data structure, programming language, linguistics, philosophy, physics, astrophysics

In resequencing experiments, a high-throughput sequencer produces DNA fragments (called reads), and each read is then mapped to the locus in a reference genome at which it fits best. The currently dominant read mappers are based on the Burrows–Wheeler transform (BWT). A read can be mapped correctly if it is similar enough to a substring of the reference genome. However, since the reference genome does not represent all known variations, read mapping tends to be biased towards the reference, and mapping errors may thus occur. To cope with this problem, Huang et al. encoded single nucleotide polymorphisms (SNPs) in a BWT using the International Union of Pure and Applied Chemistry (IUPAC) nucleotide code. In a different approach, Maciuca et al. provided a 'natural encoding' of SNPs and other genetic variations in a BWT. However, their encoding significantly increases the alphabet size: the modified alphabet can have millions of new symbols, which usually implies a loss of efficiency. Moreover, neither approach handles all known kinds of variation.
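To make the IUPAC idea mentioned above concrete, here is a minimal sketch in Python. It is a toy illustration under stated assumptions, not the implementation of Huang et al. and not the encoding proposed in this paper. The function names `encode_snps` and `bwt` are hypothetical, SNPs are assumed to be biallelic and given as 0-based positions, and the BWT is built naively from sorted rotations, which only works for very short strings.

```python
# Standard IUPAC codes for a position that may hold one of two bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def encode_snps(reference, snps):
    """Return the reference with each SNP site replaced by an IUPAC code.

    `snps` maps a 0-based position to the alternative base (assumption:
    exactly one alternative allele per site, different from the reference).
    """
    seq = list(reference)
    for pos, alt in snps.items():
        seq[pos] = IUPAC_TWO_BASE[frozenset((seq[pos], alt))]
    return "".join(seq)


def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Quadratic space; real read mappers build the BWT from a suffix array.
    """
    text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    reference = "ACGTACGT"
    encoded = encode_snps(reference, {2: "A"})  # G/A at position 2 becomes R
    print(encoded)       # ACRTACGT
    print(bwt(encoded))  # BWT over the alphabet extended by IUPAC symbols
```

In this sketch the alphabet grows by at most a handful of IUPAC symbols, which contrasts with the much larger alphabet that the abstract attributes to the 'natural encoding'. The price is that a single ambiguity code can only describe substitutions at one position, so insertions, deletions and larger variants are not covered.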
