Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS
Author(s) - Grützmacher Thomas, Anzt Hartwig, Quintana‐Ortí Enrique S.
Publication year - 2023
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.3041
Subject(s) - computer science, parallel computing, memory bandwidth, rounding, auxiliary memory, computer hardware, memory management, arithmetic, semiconductor memory, mathematics, operating system
Abstract - The roofline model not only provides a powerful tool to relate an application's performance with the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to compressing/decompressing the data in registers before/after invoking store/load memory operations.) In practice, we demonstrate that a “memory accessor” that hides the data compression behind the memory access can virtually push the bandwidth‐induced roofline, yielding higher performance for memory‐bound applications that use high precision arithmetic and can handle the numerical effects associated with lossy compression. We also demonstrate that memory‐bound applications operating on low precision data can increase their accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, we demonstrate that memory‐bound BLAS operations (including the sparse matrix‐vector product) can be re‐engineered with the memory accessor and that the resulting accessor‐enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low precision BLAS.
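The decoupling of storage format and arithmetic format described in the abstract can be illustrated with a minimal sketch. This is not Ginkgo's accessor API: the class `Float32Accessor` and the functions `dot_low_precision`, `dot_accessor`, and `reference_dot` are hypothetical names introduced here for illustration only. The sketch assumes single precision as the storage format and double precision as the arithmetic format, and it emulates single precision arithmetic in pure Python by rounding after every operation.

```python
import math
import struct
from array import array


def round_to_f32(value):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


class Float32Accessor:
    """Hypothetical accessor: data lives in memory as float32, but every
    element is widened to double on load and rounded to float32 on store."""

    def __init__(self, values):
        # array('f') keeps the data in single precision storage.
        self._storage = array("f", values)

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, index):
        # Load: the float32 element is returned as a Python float (double).
        return self._storage[index]

    def __setitem__(self, index, value):
        # Store: the double value is rounded to float32.
        self._storage[index] = value


def dot_low_precision(x, y):
    """Dot product with float32 storage AND emulated float32 arithmetic."""
    acc = 0.0
    for i in range(len(x)):
        product = round_to_f32(x[i] * y[i])
        acc = round_to_f32(acc + product)
    return acc


def dot_accessor(x, y):
    """Dot product with float32 storage but double precision arithmetic."""
    acc = 0.0
    for i in range(len(x)):
        acc += x[i] * y[i]
    return acc


def reference_dot(x, y):
    """Accurately summed reference over the stored float32 values."""
    return math.fsum(x[i] * y[i] for i in range(len(x)))
```

Both dot products read the same single precision data, so they move the same number of bytes through memory; only the precision of the arithmetic between load and store differs. Comparing each result against `reference_dot` for a long vector (for example, 10,000 copies of 0.1 multiplied by 1.0) gives a rough picture of the accuracy gain the abstract refers to.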