MASA‐OpenCL: Parallel pruned comparison of long DNA sequences with OpenCL
Author(s) -
Figueiredo Marco Antonio C.,
Oliveira Sandes Edans F.,
Rodrigues Genai.,
Teodoro George L. M.,
Melo Alba Cristina M. A.
Publication year - 2018
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5039
Subject(s) - computer science, cuda, parallel computing, graphics, pruning, smith–waterman algorithm, pairwise comparison, sequence (biology), simd, similarity (geometry), graphics processing unit, algorithm, artificial intelligence, sequence alignment, image (mathematics), computer graphics (images), biochemistry, chemistry, genetics, gene, agronomy, peptide sequence, biology
Summary - Biological sequence comparison is often used as an auxiliary task in the analysis of genetic material. Pairwise comparison algorithms such as Smith‐Waterman evaluate two strings representing sequences of proteins, DNA, or RNA to obtain the optimal alignment between them. Many applications have been proposed to address the sequence comparison problem, prioritizing the use of graphics cards and proprietary languages such as CUDA. In this paper, we propose and evaluate MASA‐OpenCL, an OpenCL solution for comparing long DNA sequences that is based on the MASA sequence alignment framework and whose pruning capability is proportional to the similarity of the sequences compared. We compared MASA‐OpenCL to its CUDA counterpart (MASA‐CUDAlign) and, in most cases, MASA‐OpenCL achieved better performance. To better understand the behavior of MASA‐OpenCL, we performed a statistical analysis considering 11 comparisons of sequences with high, medium, and low similarity on 4 GPUs. As a result, we obtained a multiple linear regression model that considers (a) the sizes of the sequences, (b) the similarity between them, (c) the computational power of the GPU, and (d) the GPU memory bandwidth. We used this model to predict the performance on two other GPUs, with low error rates.
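For readers unfamiliar with the Smith‐Waterman recurrence mentioned in the summary, the following is a minimal sequential sketch in Python that computes only the best local-alignment score. It is not the MASA‐OpenCL implementation: the function name, the linear gap penalty, and the scoring values (match, mismatch, gap) are illustrative assumptions chosen for brevity, and the parallelism and pruning described in the paper are left out.

```python
def smith_waterman_score(a, b, match=1, mismatch=-3, gap=-2):
    """Return the best local-alignment score of strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    A linear gap penalty is used to keep the recurrence short.
    """
    # Only the previous row is kept, so memory is linear in len(b).
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + substitution
            # Vertical and horizontal moves: insert a gap in one sequence.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

The score matrix has one cell per pair of positions, so the work grows with the product of the two sequence lengths; this is why long DNA comparisons benefit from GPU parallelism and from skipping cells that cannot improve the best score.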