Improving performance of optimized kernels through fast instantiations of templates
Author(s) - Khan Minhaj Ahmad, Charles H.P., Barthou D.
Publication year - 2009
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.1333
Subject(s) - computer science, compiler, compile time, template, parallel computing, overhead (engineering), exploit, computation, just in time compilation, programming language, parallelism (grammar), computer security
To fully exploit the instruction‐level parallelism offered by modern processors, compilers need information that becomes available only during the execution of the program. This argues for iterative or dynamic compilation. Unfortunately, dynamic compilation is suitable only for applications where the cost of compilation can be amortized over multiple invocations of the same code. Similarly, the cost of iterative compilation makes it impractical for widespread use in performance improvement. In this article, we suggest a novel approach for improving the performance of mathematical kernels through fast instantiations of templates. Optimized templates are generated at static compile time with a limited number of compilations. The initial instantiations of these templates are performed at static compile time, and the runtime instantiations are performed with a very small overhead through specialized data, requiring no computations at runtime. The approach is an effective solution because it reduces the overhead incurred at both static compile time and dynamic compile time. The experiments have been performed on an Itanium‐II architecture using highly optimized kernels of ATLAS and FFTW with the icc and gcc compilers. Copyright © 2008 John Wiley & Sons, Ltd.
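The idea of instantiating a template at runtime from specialized data, with no computation at that point, can be illustrated with a minimal C sketch. This is not the authors' implementation: the abstract does not describe the template format, so every name here (`kernel_template`, `patch`, `spec_stride1`, `spec_stride2`, `instantiate`, `run_kernel`) is hypothetical, and a plain struct stands in for the holes that a real system would presumably leave inside optimized binary code. The sketch assumes the set of parameter values is known ahead of time, so the values for each hole can be prepared at static compile time and only copied at runtime.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy "template": a kernel description with two holes.
   A struct stands in for holes inside generated code (assumption). */
typedef struct {
    size_t step_bytes;  /* hole 0: byte step between visited elements */
    size_t end_bytes;   /* hole 1: byte offset at which the loop stops */
} kernel_template;

/* One precomputed patch: where to write and the ready-made value. */
typedef struct {
    size_t offset;      /* byte offset of the hole inside the template */
    size_t value;       /* value prepared at static compile time */
} patch;

#define N_HOLES 2

/* Specialized data for two parameter choices (stride 1 and stride 2
   over 8 doubles). All arithmetic here is constant-folded by the
   compiler, so nothing is computed when the template is instantiated. */
static const patch spec_stride1[N_HOLES] = {
    { offsetof(kernel_template, step_bytes), 1 * sizeof(double) },
    { offsetof(kernel_template, end_bytes),  8 * sizeof(double) },
};
static const patch spec_stride2[N_HOLES] = {
    { offsetof(kernel_template, step_bytes), 2 * sizeof(double) },
    { offsetof(kernel_template, end_bytes),  8 * sizeof(double) },
};

/* Runtime instantiation: copy each prepared value into its hole. */
static void instantiate(kernel_template *t, const patch *spec)
{
    for (size_t i = 0; i < N_HOLES; ++i)
        memcpy((char *)t + spec[i].offset, &spec[i].value,
               sizeof spec[i].value);
}

/* The kernel reads its parameters only from the instantiated template. */
static double run_kernel(const kernel_template *t, const double *x)
{
    double sum = 0.0;
    for (size_t off = 0; off < t->end_bytes; off += t->step_bytes)
        sum += *(const double *)((const char *)x + off);
    return sum;
}
```

Calling `instantiate(&t, spec_stride2)` and then `run_kernel(&t, x)` sums every second element of an 8-element array. Switching to `spec_stride1` costs only two small copies, which is the property the sketch is meant to show: the per-instantiation work is a data copy, not a recompilation or a recomputation.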
