Measuring the overhead of Intel C++ Concurrent Collections over Threading Building Blocks for Gauss–Jordan elimination
Author(s) - Tang Peiyi
Publication year - 2012
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2811
Subject(s) - parallel computing, computer science, overhead (engineering), task (project management), data flow diagram, threading (protein sequence), data structure, macro, operating system, programming language, engineering, physics, systems engineering, nuclear magnetic resonance, database, protein structure
SUMMARY The most efficient way to parallelize a computation is to build and evaluate a task graph constrained only by the data dependencies between the tasks. Both Intel's C++ Concurrent Collections (CnC) and Threading Building Blocks (TBB) libraries allow such task-based parallel programming. CnC also adopts the macro data flow model by providing only single-assignment data objects in its global data space. Although CnC makes parallel programming easier, because data flow dependencies are specified only through single-assignment data objects, its macro data flow model incurs overhead. Because Intel's C++ CnC library is implemented on top of its C++ TBB library, we can measure the overhead of CnC by comparing its performance with that of TBB. In this paper, we analyze, for the first time, all three types of data dependencies in the tiled in-place Gauss–Jordan elimination algorithm. We implement the task-based parallel tiled Gauss–Jordan algorithm in TBB using the analyzed data dependencies and compare its performance with that of the CnC implementation. We find that, with the optimal tile size, the overhead of CnC over TBB is only 12%–15% of the TBB time, and CnC can deliver as much as 87%–89% of the TBB performance for Gauss–Jordan elimination. Copyright © 2012 John Wiley & Sons, Ltd.
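The two reported ranges agree with each other if performance is taken to be the reciprocal of execution time. The following is a brief worked check under that assumption; the symbols T_TBB, T_CnC, and o are introduced here for illustration and do not come from the paper.

```latex
% Assumption: performance is proportional to 1 / execution time.
% T_TBB, T_CnC: execution times of the TBB and CnC implementations (illustrative symbols).
% o: overhead of CnC expressed as a fraction of the TBB time.
\begin{align*}
o &= \frac{T_{\mathrm{CnC}} - T_{\mathrm{TBB}}}{T_{\mathrm{TBB}}},
  \qquad
  \frac{T_{\mathrm{TBB}}}{T_{\mathrm{CnC}}} = \frac{1}{1 + o} \\
o = 0.12 &\;\Rightarrow\; \frac{1}{1.12} \approx 0.893 \\
o = 0.15 &\;\Rightarrow\; \frac{1}{1.15} \approx 0.870
\end{align*}
```

Under this reading, an overhead of 12%–15% of the TBB time corresponds to roughly 87%–89% of the TBB performance, which matches the figures quoted in the summary.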