Experiences in autotuning matrix multiplication for energy minimization on GPUs
Author(s) - Anzt Hartwig, Haugen Blake, Kurzak Jakub, Luszczek Piotr, Dongarra Jack
Publication year - 2015
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3516
Subject(s) - kernel (algebra) , matrix multiplication , metric (unit) , computer science , parallel computing , throughput , multiplication (music) , graphics , matrix (chemical analysis) , energy (signal processing) , efficient energy use , energy minimization , contrast (vision) , minification , energy consumption , algorithm , computational science , artificial intelligence , computer graphics (images) , mathematics , operating system , engineering , statistics , operations management , materials science , chemistry , composite material , quantum , quantum mechanics , wireless , programming language , physics , computational chemistry , combinatorics , electrical engineering
Summary - In this paper, we report extensive results and analysis of autotuning the computationally intensive graphics processing unit (GPU) kernel for dense matrix–matrix multiplication in double precision. In contrast to traditional autotuning and/or optimization for runtime performance only, we also take the energy efficiency into account. For kernels achieving equal performance, we show significant differences in their energy balance. We also identify the memory throughput as the most influential metric that trades off performance and energy efficiency. As a result, the performance-optimal case ends up not being the most efficient kernel in overall resource use. Copyright © 2015 John Wiley & Sons, Ltd.
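The distinction the summary draws between the fastest kernel and the most energy-efficient one can be illustrated with a minimal sketch. It is not the authors' autotuner: the `KernelRun` type, the variant names, and every performance and power figure below are hypothetical values chosen only to show the arithmetic (energy = average power × runtime, with 2n³ floating-point operations assumed for an n × n matrix product).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelRun:
    """One measured kernel variant (all values here are hypothetical)."""
    name: str
    gflops: float  # sustained double-precision performance, in GFLOP/s
    watts: float   # average power draw during the run, in W

    @property
    def gflops_per_watt(self) -> float:
        # Energy efficiency: work done per unit of energy.
        return self.gflops / self.watts

    def energy_joules(self, n: int) -> float:
        # An n x n matrix product is assumed to cost 2 * n^3 operations.
        flops = 2.0 * n**3
        seconds = flops / (self.gflops * 1e9)
        return self.watts * seconds


def fastest(runs):
    """Variant a runtime-only autotuner would select."""
    return max(runs, key=lambda r: r.gflops)


def most_efficient(runs):
    """Variant an energy-aware autotuner would select."""
    return max(runs, key=lambda r: r.gflops_per_watt)


# Made-up measurements, not results from the paper.
candidates = [
    KernelRun("variant_a", gflops=1000.0, watts=200.0),
    KernelRun("variant_b", gflops=1000.0, watts=180.0),
    KernelRun("variant_c", gflops=950.0, watts=160.0),
]

if __name__ == "__main__":
    for run in candidates:
        print(run.name, round(run.gflops_per_watt, 3), run.energy_joules(1000))
    print("fastest:", fastest(candidates).name)
    print("most efficient:", most_efficient(candidates).name)
```

In this toy data set, `variant_a` and `variant_b` run equally fast but consume different amounts of energy, and the slightly slower `variant_c` has the best performance per watt, which mirrors the kind of trade-off the summary describes.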
