A multithreading and hashing technique for indexing Target‐Decoy peptides databases
Author(s) -
Maabreh Majdi,
Irshid Hafez,
Gupta Ajay,
Alasmadi Izzat
Publication year - 2017
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4371
Subject(s) - computer science, search engine indexing, hash function, multithreading, process (computing), decoy, parallel computing, database, data mining, information retrieval, operating system, biochemistry, chemistry, receptor, computer security, thread (computing)
Summary - The target‐decoy database approach is currently the method of choice for assessing the quality of protein search engines. Decoy versions of real peptides are generated and injected into the same database as the real ones, under different labels. The quality of a search engine's results is then assessed by the number of decoys retrieved as hits. In the Crux‐Tide search engine, one of the fastest search engines currently available, indexing and decoy generation are computationally expensive. In this paper, we analyze the serial algorithm in detail, identify opportunities for improvement, and then describe a parallel shared‐memory solution using OpenMP. To completely break the dependencies in the serial algorithm, a hashing technique is used to localize the processing. Together, the parallel solution and the hashing technique reduce the computation cost by approximately 70‐80% using only a few threads. Besides the parallelization, we redesign part of the serial code to make memory consumption more efficient: the parallel version can index the same files using around two‐thirds of the memory that the serial version consumes. This solution could support future distributed development of the Crux‐Tide search phase, in which each parallel unit could rank the observed spectra independently.
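
The summary does not spell out the hashing scheme, so the following is only a minimal sketch of one way hashing can localize target‐decoy indexing; it is not the authors' implementation and not Crux‐Tide code. It assumes peptides are routed to buckets by a deterministic hash, so that duplicates of a peptide always meet in the same bucket and each worker can deduplicate and generate decoys without consulting any other bucket. The names `bucket_of`, `make_decoy`, `index_bucket`, and `build_index` are hypothetical, the decoy rule (reverse the interior residues, keep both termini) is an assumption, and Python threads stand in for the OpenMP threads mentioned above purely to show the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def bucket_of(peptide: str, n_buckets: int) -> int:
    # Deterministic hash: equal peptides always land in the same bucket.
    return zlib.crc32(peptide.encode("ascii")) % n_buckets


def make_decoy(peptide: str) -> str:
    # Assumed decoy rule: reverse the interior residues, keep both termini fixed.
    if len(peptide) < 3:
        return peptide
    return peptide[0] + peptide[-2:0:-1] + peptide[-1]


def index_bucket(peptides):
    # Works on one bucket only: deduplicate the targets, then emit each
    # unique target followed by its labeled decoy.
    entries = []
    for peptide in sorted(set(peptides)):
        entries.append((peptide, "target"))
        entries.append((make_decoy(peptide), "decoy"))
    return entries


def build_index(peptides, n_buckets=4, n_threads=4):
    # Serial pass: route every peptide to its bucket by hash.
    buckets = [[] for _ in range(n_buckets)]
    for peptide in peptides:
        buckets[bucket_of(peptide, n_buckets)].append(peptide)
    # Parallel pass: buckets share no data, so they can be indexed independently.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        per_bucket = list(pool.map(index_bucket, buckets))
    # Concatenate the per-bucket results in bucket order.
    return [entry for bucket in per_bucket for entry in bucket]
```

Calling `build_index(["PEPTIDEK", "PEPTIDEK", "ACDEFGHK"])` returns a list of `(sequence, label)` pairs in which the duplicated peptide appears once as a target and once as a decoy. The design point is that the only global step is the hash routing; everything after it is local to a bucket.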