Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs




Author(s) - Carrijo Nasciutti Thiago, Panetta Jairo, Pais Lopes Pedro

Publication year - 2018

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.4929

Subject(s) - stencil, parallel computing, computer science, grid, computation, computational science, supercomputer, algorithm, mathematics, geometry

Summary - This work compares the performance of optimizations that transform replicated global memory accesses into local memory accesses in 3D stencil computations on the NVIDIA Tesla K80 GPGPU. These optimizations reduce the global memory contention caused by the set of multiprocessors. The optimizations evaluated are grid tiling, inserting spatial and temporal loops into kernels, register reuse, and some of their combinations. A standardized experiment evaluates how performance varies with grid size and stencil size for each optimization. Experimental data show that codes using these optimizations are up to 3.3 times faster than the classical stencil formulation. The data also show that the most profitable optimization varies with grid and stencil sizes.
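The summary does not describe how each optimization is implemented, so the following is only an illustrative sketch of the general idea behind register reuse, not the authors' code. It is a plain Python model rather than a GPU kernel: `CountingGrid`, `stencil_classic`, `stencil_register_reuse`, and `compare` are hypothetical names, the 7-point (radius 1) stencil is an assumed example, and the read counter stands in for global memory loads. In a CUDA kernel, the local variables `behind`, `center`, and `front` would correspond to registers held by a thread that loops along z.

```python
class CountingGrid:
    """Flat 3D array that counts element reads (a stand-in for global memory loads)."""

    def __init__(self, nx, ny, nz, values):
        self.nx, self.ny, self.nz = nx, ny, nz
        self.values = list(values)
        self.reads = 0

    def load(self, x, y, z):
        self.reads += 1
        return self.values[(z * self.ny + y) * self.nx + x]


def stencil_classic(grid):
    """Classical formulation: every output point loads all 7 inputs from the grid."""
    out = {}
    for y in range(1, grid.ny - 1):
        for x in range(1, grid.nx - 1):
            for z in range(1, grid.nz - 1):
                out[(x, y, z)] = (
                    grid.load(x, y, z)
                    + grid.load(x - 1, y, z) + grid.load(x + 1, y, z)
                    + grid.load(x, y - 1, z) + grid.load(x, y + 1, z)
                    + grid.load(x, y, z - 1) + grid.load(x, y, z + 1)
                )
    return out


def stencil_register_reuse(grid):
    """Walk each (x, y) column along z, keeping the z-neighbors in local variables."""
    out = {}
    for y in range(1, grid.ny - 1):
        for x in range(1, grid.nx - 1):
            behind = grid.load(x, y, 0)   # value at z - 1
            center = grid.load(x, y, 1)   # value at z
            for z in range(1, grid.nz - 1):
                front = grid.load(x, y, z + 1)  # the only new z-value per step
                out[(x, y, z)] = (
                    center
                    + grid.load(x - 1, y, z) + grid.load(x + 1, y, z)
                    + grid.load(x, y - 1, z) + grid.load(x, y + 1, z)
                    + behind + front
                )
                behind, center = center, front  # slide the window along z
    return out


def compare(n):
    """Run both variants on an n x n x n grid; return (same_result, classic_reads, reuse_reads)."""
    values = [(i * 7) % 11 for i in range(n ** 3)]
    a = CountingGrid(n, n, n, values)
    b = CountingGrid(n, n, n, values)
    same = stencil_classic(a) == stencil_register_reuse(b)
    return same, a.reads, b.reads


if __name__ == "__main__":
    print(compare(6))  # (True, 448, 352)
```

In this model, each interior point costs 7 reads in the classical version and 5 in the sliding-window version, plus 2 start-up reads per column. The remaining x and y neighbor reads are the kind of replicated access that the summary's grid tiling would target; that part is not modeled here.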
