Fast synchronization‐free algorithms for parallel sparse triangular solves with multiple right‐hand sides
Author(s) - Weifeng Liu, Ang Li, Jonathan D. Hogg, Iain S. Duff, Brian Vinter
Publication year - 2017
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4244
Subject(s) - speedup, computer science, parallel computing, preprocessor, synchronization (alternating current), partition (number theory), algorithm, exploit, set (abstract data type), mathematics, artificial intelligence, channel (broadcasting), computer network, computer security, combinatorics, programming language
Summary - The sparse triangular solve kernels, SpTRSV and SpTRSM, are important building blocks for a number of numerical linear algebra routines. Parallelizing SpTRSV and SpTRSM on today's manycore platforms, such as GPUs, is not an easy task since computing a component of the solution may depend on previously computed components, enforcing a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage to partition the components into a group of level-sets or colour-sets so that components within a set are independent and can be processed simultaneously during the subsequent solution stage. However, this class of methods requires a long preprocessing time as well as significant runtime synchronization overheads between the sets. To address this, we propose in this paper novel approaches for SpTRSV and SpTRSM in which the ordering between components is naturally enforced within the solution stage. In this way, the cost for preprocessing can be greatly reduced, and the synchronizations between sets are completely eliminated. To further exploit the data-parallelism, we also develop an adaptive scheme for efficiently processing multiple right-hand sides in SpTRSM. A comparison with a state-of-the-art library supplied by the GPU vendor, using 20 sparse matrices on the latest GPU device, shows that the proposed approach obtains an average speedup of over two for SpTRSV and up to an order of magnitude speedup for SpTRSM. In addition, our method is up to two orders of magnitude faster for the preprocessing stage than existing SpTRSV and SpTRSM methods.
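
To make the "ordering enforced within the solution stage" idea concrete, the sketch below shows a minimal CUDA example of the general busy-waiting pattern that synchronization-free triangular solves rely on; it is an illustration only, not the authors' kernel. Each component spins on an atomic dependency counter and proceeds as soon as its predecessors have scattered their contributions, so no level-set construction or inter-set barrier is needed. All names (spts_syncfree, left_sum, done_count, in_degree) and the storage assumptions (lower triangular L in CSC with the diagonal stored first in each column, one single-thread block per component, compute capability 6.0 or newer for double-precision atomics) are illustrative choices, not details taken from the paper.

```cuda
// Illustrative sketch only -- NOT the authors' kernel. It shows the general
// "synchronization-free" pattern: instead of level-set barriers, each component
// busy-waits on an atomic dependency counter and is released as soon as its
// predecessors have published their contributions.
// Assumptions: L is lower triangular, stored in CSC with the diagonal entry
// first in each column; compile with e.g. `nvcc -arch=sm_60 spts_sketch.cu`.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spts_syncfree(const int *col_ptr, const int *row_idx,
                              const double *val, const double *b,
                              double *x, double *left_sum,
                              int *done_count, const int *in_degree, int n)
{
    int j = blockIdx.x;            // one (single-thread) block per component
    if (j >= n) return;

    // Spin until all components that x[j] depends on have signalled completion.
    // atomicAdd(ptr, 0) acts as a coherent read that bypasses stale caches.
    while (atomicAdd(&done_count[j], 0) != in_degree[j]) { /* busy wait */ }

    // left_sum[j] now holds sum_{k<j} L(j,k) * x[k]; solve for x[j].
    double xj = (b[j] - atomicAdd(&left_sum[j], 0.0)) / val[col_ptr[j]];
    x[j] = xj;

    // Push this component's contribution to later components, then signal them.
    for (int p = col_ptr[j] + 1; p < col_ptr[j + 1]; ++p) {
        int i = row_idx[p];
        atomicAdd(&left_sum[i], val[p] * xj);
        __threadfence();           // make the contribution visible before the signal
        atomicAdd(&done_count[i], 1);
    }
}

int main()
{
    // Toy system: L = [[1,0,0],[2,1,0],[0,3,1]], b = [1,3,4]; solution x = [1,1,1].
    const int n = 3;
    int    h_col_ptr[]   = {0, 2, 4, 5};
    int    h_row_idx[]   = {0, 1, 1, 2, 2};
    double h_val[]       = {1, 2, 1, 3, 1};
    double h_b[]         = {1, 3, 4};
    int    h_in_degree[] = {0, 1, 1};   // off-diagonal nonzeros per row of L

    int *col_ptr, *row_idx, *done_count, *in_degree;
    double *val, *b, *x, *left_sum;
    cudaMalloc((void **)&col_ptr, sizeof(h_col_ptr));
    cudaMalloc((void **)&row_idx, sizeof(h_row_idx));
    cudaMalloc((void **)&val, sizeof(h_val));
    cudaMalloc((void **)&b, sizeof(h_b));
    cudaMalloc((void **)&x, n * sizeof(double));
    cudaMalloc((void **)&left_sum, n * sizeof(double));
    cudaMalloc((void **)&done_count, n * sizeof(int));
    cudaMalloc((void **)&in_degree, sizeof(h_in_degree));

    cudaMemcpy(col_ptr, h_col_ptr, sizeof(h_col_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(row_idx, h_row_idx, sizeof(h_row_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(val, h_val, sizeof(h_val), cudaMemcpyHostToDevice);
    cudaMemcpy(b, h_b, sizeof(h_b), cudaMemcpyHostToDevice);
    cudaMemcpy(in_degree, h_in_degree, sizeof(h_in_degree), cudaMemcpyHostToDevice);
    cudaMemset(left_sum, 0, n * sizeof(double));
    cudaMemset(done_count, 0, n * sizeof(int));

    // Inter-block busy waiting assumes the blocks of dependent components become
    // resident together; that holds trivially for this toy size.
    spts_syncfree<<<n, 1>>>(col_ptr, row_idx, val, b, x, left_sum,
                            done_count, in_degree, n);
    cudaDeviceSynchronize();

    double h_x[3];
    cudaMemcpy(h_x, x, sizeof(h_x), cudaMemcpyDeviceToHost);
    printf("x = [%g, %g, %g]   (expected [1, 1, 1])\n", h_x[0], h_x[1], h_x[2]);
    return 0;
}
```

A production solver would additionally assign more than one thread per component, vectorize over the multiple right-hand sides handled by SpTRSM, and take more care over block residency and load balance; those aspects, including the paper's adaptive multiple right-hand-side scheme, are beyond this sketch.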
