
Hardware acceleration of the novel two dimensional Burrows‐Wheeler Aligner algorithm with maximal exact matches seed extension kernel
Author(s) -
Taheri Mahdi,
Ansari Mohammad Saeed,
Magierowski Sebastian,
Mahani Ali
Publication year - 2021
Publication title -
iet circuits, devices and systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.251
H-Index - 49
eISSN - 1751-8598
pISSN - 1751-858X
DOI - 10.1049/cds2.12005
Subject(s) - algorithm , acceleration , computer science , kernel (algebra) , extension (predicate logic) , matrix (chemical analysis) , diagonal , similarity (geometry) , row , speedup , parallel algorithm , parallel computing , mathematics , discrete mathematics , artificial intelligence , physics , classical mechanics , programming language , materials science , geometry , image (mathematics) , database , composite material
Next‐generation sequencing techniques have dramatically increased the amount of genomic data being sequenced, which calls for the acceleration of the alignment algorithms. This article proposes an FPGA‐based accelerated implementation of the seed extension kernel of the Burrows–Wheeler alignment genomic mapping algorithm. The well‐known Smith–Waterman algorithm is used during the seed extension to find the optimum alignment between the sequences. The state‐of‐the‐art architectures in the literature use one‐dimensional (1‐D) systolic arrays to fill a similarity matrix, based on the best score out of all match combinations, mismatches and gaps are computed. The cells on the same anti‐diagonal are computed in parallel in these architectures. We propose a novel 2‐dimensional architecture in which all the cells on the same rows and the same columns are computed in parallel and, thereby, significantly accelerated the process. The similarity matrix cells are computed in two phases: (1) the calculation phase and (2) error compensation phase. The calculation phase roughly approximate the cell values and the approximation error is fixed up during the error compensation phase. Our simulation results show that the proposed architecture can be up to 718x and 1.7x faster than the software execution and the 1‐D systolic arrays, respectively.