
An Improvement of the Matrix-Matrix Multiplication Speed using 2D-Tiling and AVX512 Intrinsics for Multi-Core Architectures
Author(s) - Nwe Zin Oo, Panyayot Chaikan
Publication year - 2021
Publication title - ASEAN Journal of Scientific and Technological Reports
Language(s) - English
Resource type - Journals
ISSN - 2773-8752
DOI - 10.55164/ajstr.v24i2.242021
Subject(s) - intrinsics, computer science, parallel computing, matrix multiplication, matrix (chemical analysis), speedup, multiplication (music), cache oblivious algorithm, sparse matrix, algorithm, computational science, cache, mathematics, cpu cache, combinatorics, physics, materials science, cache coloring, quantum mechanics, composite material, quantum, gaussian
Matrix-matrix multiplication is a time-consuming operation in scientific and engineering applications. For large matrices the computation time grows quickly, making software too slow for real-time use. In this paper, 2D-tiling, loop unrolling, data padding, OpenMP directives, and AVX512 intrinsics are combined to speed up matrix-matrix multiplication on multi-core architectures. Tested on a Core i9-7900X machine, our algorithm is more than twice as fast as the corresponding routines in the OpenBLAS and Eigen libraries for both single- and double-precision floating-point matrices. We also propose a parameter-tuning equation that lets the algorithm be adapted to matrices of any size and to CPUs with different cache organizations.
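
To make the combination of techniques named in the abstract more concrete, the following is a minimal sketch of a 2D-tiled single-precision multiply with an OpenMP work-sharing directive and an optional AVX512 inner loop. It is not the authors' kernel: the function names `matmul_tiled` and `matmul_naive`, the tile sizes `TILE_I` and `TILE_J`, the loop order, and the row-major square layout are all assumptions made for illustration, and the paper's tuning equation, loop unrolling, and data padding are not reproduced. The AVX512 path is compiled only when the compiler defines `__AVX512F__`; otherwise the scalar loop handles every element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical tile sizes, chosen for illustration only. The paper
   derives its parameters from a tuning equation not shown here. */
enum { TILE_I = 32, TILE_J = 64 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: C = A * B for row-major n x n matrices. */
void matmul_naive(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* 2D-tiled variant: C is partitioned into TILE_I x TILE_J blocks. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    assert(A != NULL && B != NULL && C != NULL);
    memset(C, 0, n * n * sizeof(float));

    /* Each thread owns whole row tiles of C, so no two threads write
       the same element. Without OpenMP the pragma is ignored. */
    #pragma omp parallel for schedule(static)
    for (long ii = 0; ii < (long)n; ii += TILE_I) {
        size_t i_end = min_sz((size_t)ii + TILE_I, n);
        for (size_t jj = 0; jj < n; jj += TILE_J) {
            size_t j_end = min_sz(jj + TILE_J, n);
            for (size_t i = (size_t)ii; i < i_end; ++i) {
                for (size_t k = 0; k < n; ++k) {
                    float a = A[i * n + k];
                    size_t j = jj;
#ifdef __AVX512F__
                    /* Broadcast one element of A and update 16 floats
                       of the C row per fused multiply-add. */
                    __m512 va = _mm512_set1_ps(a);
                    for (; j + 16 <= j_end; j += 16) {
                        __m512 vb = _mm512_loadu_ps(&B[k * n + j]);
                        __m512 vc = _mm512_loadu_ps(&C[i * n + j]);
                        vc = _mm512_fmadd_ps(va, vb, vc);
                        _mm512_storeu_ps(&C[i * n + j], vc);
                    }
#endif
                    /* Scalar loop: tile remainder, or the whole tile
                       when AVX512 is not enabled at compile time. */
                    for (; j < j_end; ++j)
                        C[i * n + j] += a * B[k * n + j];
                }
            }
        }
    }
}
```

With GCC or Clang, a build along the lines of `-O2 -fopenmp -mavx512f` would enable both the threading and the vector path, but only on a CPU that supports AVX512F. The unaligned load and store intrinsics are used here so that the sketch works for any `n` without padding; an aligned, padded layout such as the one the abstract mentions would allow the aligned variants instead.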