Using analysis information in the synchronization‐free GPU solution of sparse triangular systems
Author(s) -
Dufrechou Ernesto,
Ezzatti Pablo
Publication year - 2019
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5499
Subject(s) - computer science , solver , massively parallel , synchronization , parallel computing , kernel , sparse matrix , general purpose computing on graphics processing units , computational science , graphics , programming language
Summary The solution of sparse triangular linear systems is one of the most important building blocks for a large number of science and engineering problems. For this reason, it has been studied steadily for several decades, principally in order to take advantage of emerging parallel platforms. In the context of massively parallel platforms such as GPUs, the standard parallel solution strategy is based on performing a level‐set analysis of the sparse matrix, and the kernel included in the NVIDIA cuSparse library is the most prominent example of this approach. However, weak spots of this implementation are the costly analysis phase and the constant synchronizations with the CPU during the solution stage. In previous work, we presented a self‐scheduled and synchronization‐free GPU algorithm that avoided the analysis phase and the synchronizations of the standard approach. Here, we extend this proposal and show how the level‐set information can be leveraged to improve its performance. In particular, we present new GPU solution routines that attack some of the weak spots of the self‐scheduled solver, such as the under‐utilization of GPU resources in the case of highly sparse matrices. The experimental evaluation reveals a noticeable runtime reduction over cuSparse and the state‐of‐the‐art synchronization‐free method.
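To illustrate the level‐set analysis the summary refers to, the following sketch (not the authors' GPU implementation, and with hypothetical function names) computes the level sets of a sparse lower‐triangular matrix stored in CSR form and then performs a level‐scheduled forward substitution. Rows in the same level have no mutual dependencies, so a GPU solver such as cuSparse's triangular‐solve routine can process them concurrently; here that parallelism is only emulated with sequential Python loops.

```python
def level_sets(indptr, indices, n):
    """Level of row i = 1 + max level among the rows it depends on.

    A row with no strictly-lower nonzeros sits at level 0 and can be
    solved immediately; all rows in one level are mutually independent.
    """
    level = [0] * n
    for i in range(n):
        lv = 0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:                      # strictly-lower entry: a dependency
                lv = max(lv, level[j] + 1)
        level[i] = lv
    nlev = (max(level) + 1) if n else 0
    sets = [[] for _ in range(nlev)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_lower(indptr, indices, data, b, sets):
    """Forward substitution Lx = b, scheduled level by level."""
    x = list(b)
    for rows in sets:                      # levels must run in order...
        for i in rows:                     # ...rows inside a level are independent
            s = x[i]
            diag = 1.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    s -= data[k] * x[j]
                elif j == i:
                    diag = data[k]
            x[i] = s / diag
    return x


# Example: L = [[2, 0, 0], [1, 3, 0], [0, 1, 4]] in CSR form.
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 1.0, 4.0]
sets = level_sets(indptr, indices, 3)      # -> [[0], [1], [2]]: a fully sequential chain
x = solve_lower(indptr, indices, data, [2.0, 5.0, 9.0], sets)
```

For this example matrix every level contains a single row, which is exactly the under‐utilization scenario the paper targets: highly sparse matrices with long dependency chains leave most GPU resources idle under a pure level‐scheduled scheme.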