A fast and accurate method for determining a lower bound on execution time
Author(s) - Fursin G., O'Boyle M. F. P., Temam O., Watts G.
Publication year - 2004
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.774
Subject(s) - computer science, benchmark (surveying), suite, spec#, cas latency, process (computing), latency (audio), overhead (engineering), compiler, range (aeronautics), execution time, parallel computing, computer engineering, memory controller, operating system, programming language, telecommunications, semiconductor memory, materials science, archaeology, geodesy, composite material, history, geography
In performance‐critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler‐based optimizations to improve memory performance are limited, and programmers frequently resort to manual optimization techniques. However, this process is tedious and time‐consuming. Furthermore, as the potential benefit from optimization is unknown, there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e. when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information, but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with a simulator that assumes ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters. Copyright © 2004 John Wiley & Sons, Ltd.
