Guided installation of basic linear algebra routines in a cluster with manycore components
Author(s) - Cuenca J., García L. P., Giménez D., Herrera F. J.
Publication year - 2017
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4112
Subject(s) - computer science, coprocessor, multi core processor, matrix multiplication, parallel computing, linear algebra, graphics, computational science, mathematics
Summary - Computational systems today are built from basic computational components in which multicore processors share the node with coprocessors of different types, typically several graphics processing units (GPUs) or Many Integrated Core (MIC) coprocessors. These components are in turn combined into heterogeneous clusters of nodes with different characteristics, including coprocessors of different types, with varying numbers of nodes running at different speeds. Software previously developed and optimized for simpler systems must be redesigned and re-optimized for these new, more complex systems. This work analyzes how autotuning techniques for basic linear algebra routines can be adapted to hybrid multicore + multiGPU and multicore + multiMIC systems. It studies the matrix-matrix multiplication kernel, which is optimized for the different components of the computational system through guided experimentation. The routine is installed on each node of the cluster, and the information generated by these individual installations can be used for a hierarchical installation in the cluster. The basic matrix-matrix multiplication can, in turn, be used inside higher-level routines, which delegate their efficient execution to the optimization of this lower-level routine. Experimental results are satisfactory on different multicore + multiGPU and multicore + multiMIC systems. The guided search of execution configurations that yield satisfactory execution times therefore proves to be a useful tool for heterogeneous systems, whose complexity makes it difficult to use highly efficient routines and libraries correctly.
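The idea of a guided search of execution configurations, followed by reuse of per-node results at cluster level, can be illustrated with a small sketch. It is not the authors' procedure: the component speeds and overheads are hypothetical, a simple cost model stands in for real timed executions of the multiplication, and the search is a plain greedy local search over how many rows of C each component of a node computes. The per-node results are then reused, in the spirit of a hierarchical installation, to split a larger multiplication across nodes in proportion to each node's installed speed. All names (`Component`, `simulated_time`, `guided_search`, `distribute_across_nodes`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One computational component of a node (CPU, GPU or MIC).
    Speeds and overheads are hypothetical stand-ins for timed runs."""
    name: str
    gflops: float    # assumed sustained speed of the multiplication kernel
    overhead: float  # assumed fixed cost in seconds (e.g. data transfers)


def simulated_time(components, shares, n):
    """Model of the time for C = A*B with n x n matrices when component i
    computes shares[i] rows of C (2*rows*n*n flops). Components work in
    parallel, so the slowest one determines the execution time."""
    times = [c.overhead + 2.0 * rows * n * n / (c.gflops * 1e9)
             for c, rows in zip(components, shares) if rows > 0]
    return max(times)


def guided_search(components, n, step):
    """Greedy local search over the row distribution: move `step` rows
    between two components while the modelled time improves, then halve
    the step. Returns the best distribution found and its time."""
    k = len(components)
    shares = [n // k] * k
    shares[0] += n - sum(shares)  # leftover rows go to the first component
    best = simulated_time(components, shares, n)
    while step >= 1:
        improved = False
        for src in range(k):
            for dst in range(k):
                if src == dst or shares[src] < step:
                    continue
                cand = shares.copy()
                cand[src] -= step
                cand[dst] += step
                t = simulated_time(components, cand, n)
                if t < best:  # accept only strict improvements
                    shares, best, improved = cand, t, True
        if not improved:
            step //= 2
    return shares, best


def distribute_across_nodes(node_gflops, n):
    """Cluster level: split n rows among nodes in proportion to the speed
    each node reached in its own installation."""
    total = sum(node_gflops)
    rows = [int(n * s / total) for s in node_gflops]
    leftover = n - sum(rows)
    fastest_first = sorted(range(len(node_gflops)), key=lambda i: -node_gflops[i])
    for i in fastest_first[:leftover]:  # give remaining rows to faster nodes
        rows[i] += 1
    return rows


n = 4096
node_a = [Component("cpu", 200.0, 0.0),
          Component("gpu0", 1000.0, 0.05), Component("gpu1", 1000.0, 0.05)]
node_b = [Component("cpu", 150.0, 0.0), Component("mic0", 600.0, 0.08)]

# Node-level installation: best row split and the speed it reaches.
installed = {}
for name, node in (("node_a", node_a), ("node_b", node_b)):
    shares, t = guided_search(node, n, step=256)
    installed[name] = (shares, 2.0 * n**3 / t / 1e9)  # achieved GFlops
    print(name, shares, f"{t:.4f} s")

# Cluster-level reuse of the installation data for a larger problem.
cluster_rows = distribute_across_nodes([g for _, g in installed.values()], 8192)
print("rows per node:", cluster_rows)
```

In a real installation, `simulated_time` would be replaced by timing the multiplication kernel on each component for every candidate configuration, which is why the search tries to keep the number of evaluated configurations small.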
