On the communication complexity of 3D FFTs and its implications for Exascale
Author(s) -
Kenneth Czechowski,
Casey Battaglino,
Chris McClanahan,
Kartik P. Iyer,
P. K. Yeung,
Richard Vuduc
Publication year - 2012
Publication title -
CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2304576.2304604
Subject(s) - computer science, bottleneck, bandwidth (computing), implementation, fast fourier transform, memory bandwidth, node (physics), parallel computing, memory hierarchy, communication complexity, key (lock), distributed computing, computer architecture, theoretical computer science, computer network, embedded system, algorithm, operating system, structural engineering, engineering, programming language, cache
This paper revisits the communication complexity of large-scale 3D fast Fourier transforms (FFTs) and asks what impact trends in current architectures will have on FFT performance at exascale. We analyze both memory hierarchy traffic and network communication to derive suitable analytical models, which we calibrate against current software implementations; we then evaluate these models to predict potential scaling outcomes at exascale by extrapolating current technology trends. Of particular interest is the performance impact of choosing high-density processors, typified today by graphics co-processors (GPUs), as the base processor for an exascale system. Among various observations, a key prediction is that although inter-node all-to-all communication is expected to be the bottleneck of distributed FFTs, intra-node communication, expressed precisely in terms of the relative balance among compute capacity, memory bandwidth, and network bandwidth, will play a critical role.