Parallel solution of the subset‐sum problem: an empirical study
Author(s) -
Bokhari Saniyah S.
Publication year - 2012
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2800
Subject(s) - parallel computing , computer science , multithreading , scalability , massively parallel , graphics processing unit , ibm , cuda , word (group theory) , thread (computing) , mathematics , materials science , database , nanotechnology , operating system , geometry
SUMMARY The subset‐sum problem is a well‐known NP‐complete combinatorial problem that is solvable in pseudo‐polynomial time, that is, time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. We show how this problem can be parallelized on three contemporary architectures: a 128‐processor Cray Extreme Multithreading (XMT) massively multithreaded machine, a 16‐processor IBM x3755 shared memory machine, and a 240‐core NVIDIA FX 5800 graphics processing unit (GPU). We show that it is straightforward to parallelize this algorithm on the Cray XMT, primarily because of the word‐level locking that is available on this architecture. For the other two machines, we present an alternating word algorithm that can implement an efficient solution. Our results show that the GPU performs well for problems whose tables fit within the device memory. Because GPUs typically have memories on the order of 10 GB, such architectures are best for small problem sizes that have tables of size approximately 10^10. The IBM x3755 performs very well on medium‐sized problems that fit within its 64‐GB memory but has poor scalability as the number of processors increases and is unable to sustain performance as the problem size increases. This machine tends to saturate for problem sizes of 10^11 bits. The Cray XMT shows very good scaling for large problems and demonstrates sustained performance as the problem size increases. However, this machine has poor scaling for small problem sizes; it performs best for problem sizes of 10^12 bits or more. The results in this paper illustrate that the subset‐sum problem can be parallelized well on all three architectures, albeit for different ranges of problem sizes. The performance of these three machines under varying problem sizes shows the strengths and weaknesses of the three architectures. Copyright © 2012 John Wiley & Sons, Ltd.
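
To make the pseudo-polynomial dynamic program concrete, the following is a minimal sequential sketch in Python. It is not the paper's alternating word algorithm or any of its parallel implementations. The function names `subset_sum_reachable` and `table_bits` are hypothetical, and the sketch assumes positive integer sizes and keeps only one table row at a time, packed into a Python integer used as a bit vector.

```python
def table_bits(sizes):
    """Size in bits of the full dynamic programming table described in the
    summary: number of objects multiplied by the sum of their sizes."""
    return len(sizes) * sum(sizes)


def subset_sum_reachable(sizes, target):
    """Return True if some subset of `sizes` sums exactly to `target`.

    Bit j of `row` is 1 when a subset of the objects processed so far
    sums to j. Only the current row is kept (an assumption of this
    sketch); a full table would be needed to recover the subset itself.
    """
    if target < 0:
        return False
    mask = (1 << (target + 1)) - 1   # discard sums larger than the target
    row = 1                          # bit 0 set: the empty subset sums to 0
    for s in sizes:
        # New row = old row (object skipped) OR old row shifted by s
        # (object taken). Each bit depends only on the previous row.
        row |= (row << s) & mask
    return bool((row >> target) & 1)


if __name__ == "__main__":
    sizes = [3, 34, 4, 12, 5, 2]
    print(subset_sum_reachable(sizes, 9))
    print(subset_sum_reachable(sizes, 30))
    print(table_bits(sizes))
```

Because every bit of a row depends only on the previous row, the bits within one row can in principle be computed independently, which is the property a parallel version would exploit. When table bits are packed into machine words, concurrent updates to the same word need coordination; the summary attributes the ease of the Cray XMT implementation to that machine's word-level locking.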
