Open Access
An Improvement of the Matrix-Matrix Multiplication Speed using 2D-Tiling and AVX512 Intrinsics for Multi-Core Architectures
Author(s) -
Nwe Zin Oo,
Panyayot Chaikan
Publication year - 2021
Publication title -
ASEAN Journal of Scientific and Technological Reports
Language(s) - English
Resource type - Journals
ISSN - 2773-8752
DOI - 10.55164/ajstr.v24i2.242021
Subject(s) - computer science , parallel computing , matrix multiplication , AVX-512 intrinsics , speedup , CPU cache , algorithm , computational science , mathematics
Matrix-matrix multiplication is a time-consuming operation in scientific and engineering applications. When the matrices are large, the computation time grows substantially, producing slow software that is unacceptable in real-time applications. In this paper, 2D-tiling, loop unrolling, data padding, OpenMP directives, and AVX512 intrinsics are combined to increase the speed of matrix-matrix multiplication on multi-core architectures. Our algorithm, tested on a Core i9-7900X machine, is more than twice as fast as the corresponding routines in the OpenBLAS and Eigen libraries for single- and double-precision floating-point matrices. We also propose a parameter-tuning equation that allows our algorithm to be adapted to matrices of any size on CPUs with different cache organizations.
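The abstract does not reproduce the authors' kernel, but a minimal sketch of the combination it names, 2D-tiling of the output matrix, AVX-512 fused multiply-add intrinsics, and an OpenMP parallel loop, might look like the C code below. The tile sizes TI and TJ, the row-major layout, and the assumption that N is a multiple of the tile sizes and of 16 are illustrative choices for this sketch, not the tuned parameters or padding scheme from the paper.

/*
 * Sketch only: 2D-tiling + AVX-512 + OpenMP for C += A * B,
 * row-major N x N single-precision matrices.
 * Assumes N is a multiple of TI, TJ, and 16 (the paper instead uses
 * data padding to handle arbitrary sizes).
 * Compile with, e.g., gcc -O3 -mavx512f -fopenmp.
 */
#include <immintrin.h>

#define TI 64   /* tile height in rows of C (assumed value)   */
#define TJ 64   /* tile width in columns of C (assumed value) */

void matmul_tiled_avx512(const float *A, const float *B, float *C, int N)
{
    /* Each thread works on whole TI x TJ tiles of C. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < N; ii += TI) {
        for (int jj = 0; jj < N; jj += TJ) {
            for (int i = ii; i < ii + TI; ++i) {
                for (int j = jj; j < jj + TJ; j += 16) {
                    /* Accumulate one 16-float strip of row i of C. */
                    __m512 c = _mm512_loadu_ps(&C[i * N + j]);
                    for (int k = 0; k < N; ++k) {
                        __m512 a = _mm512_set1_ps(A[i * N + k]);   /* broadcast A(i,k)        */
                        __m512 b = _mm512_loadu_ps(&B[k * N + j]); /* 16 floats of B(k, j..)  */
                        c = _mm512_fmadd_ps(a, b, c);              /* c += a * b              */
                    }
                    _mm512_storeu_ps(&C[i * N + j], c);
                }
            }
        }
    }
}

Tiling the i and j loops keeps the working set of B and C strips small enough to stay in cache while each strip of C is accumulated; the paper's parameter-tuning equation presumably selects tile sizes for a given cache organization, whereas the values above are placeholders.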
