Evaluating high level parallel programming support for irregular applications in ICC++
Author(s) - Chien Andrew A., Dolby Julian, Ganguly Bishwaroop, Karamcheti Vijay, Zhang Xingbin
Publication year - 1998
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/(sici)1097-024x(199809)28:11<1213::aid-spe201>3.0.co;2-m
Subject(s) - computer science , concurrency , runtime system , programming paradigm , concurrent object oriented programming , programmer , distributed computing , parallel computing , programming language , reactive programming , inductive programming
Object‐oriented techniques have been proffered as aids for managing complexity, enhancing reuse, and improving readability of irregular parallel applications. However, as performance is the major reason for employing parallelism, programmability and high performance must be delivered together. Using a suite of seven challenging irregular applications, the mature Illinois Concert system (a high‐level concurrent object‐oriented programming model), and an aggressive implementation (whole‐program compilation plus microsecond threading and communication primitives in the runtime), we evaluate what programming effort is required to achieve high performance. For all seven applications, we achieve performance comparable to the best achievable via low‐level programming means on large‐scale parallel systems. In general, a high‐level concurrent object‐oriented programming model supported by aggressive implementation techniques can eliminate programmer management of many concerns – procedure and computation granularity, namespace management, and low‐level concurrency management. Our study indicates that these concerns are fully automated for these applications. Decoupling these concerns makes managing the remaining fundamental concerns – data locality and load balance – much easier. In several cases, data locality and load balance for the complex algorithms and pointer data structures are automatically managed by the compiler and runtime, but in general programmer intervention was required. In a few cases, more detailed control is required, specifically explicit task priority, data consistency, and task placement. Our system integrates the expression of such information cleanly into the programming interface. Finally, only small changes to the sequential code were required to express concurrency and performance optimizations: less than 5 per cent of the source code lines were changed in all cases. This bodes well for supporting both sequential and parallel performance in a single code base. © 1998 John Wiley & Sons, Ltd.
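To illustrate the kind of small, localized change to sequential code that the abstract describes, the following is a minimal sketch in standard C++17 rather than ICC++ syntax; the Node type and the sum_seq/sum_par functions are hypothetical, and granularity control is left to std::async here, whereas the Illinois Concert compiler and runtime would manage it automatically.

```cpp
// Sketch only: expressing concurrency over an irregular pointer structure
// with a change confined to one traversal routine. Not ICC++ code.
#include <future>
#include <memory>
#include <vector>

struct Node {
    int value = 0;
    std::vector<std::unique_ptr<Node>> children;  // irregular fan-out
};

// Sequential version: sum a value over the whole tree.
long sum_seq(const Node& n) {
    long total = n.value;
    for (const auto& c : n.children) total += sum_seq(*c);
    return total;
}

// Concurrent version: the only change is launching subtree sums
// asynchronously; the surrounding program is untouched.
long sum_par(const Node& n) {
    long total = n.value;
    std::vector<std::future<long>> pending;
    for (const auto& c : n.children)
        pending.push_back(std::async(std::launch::async, sum_par, std::cref(*c)));
    for (auto& f : pending) total += f.get();
    return total;
}

int main() {
    Node root;
    root.value = 1;
    for (int i = 0; i < 4; ++i) {
        auto child = std::make_unique<Node>();
        child->value = i;
        root.children.push_back(std::move(child));
    }
    return sum_seq(root) == sum_par(root) ? 0 : 1;
}
```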