Premium
OpenMP‐oriented applications for distributed shared memory architectures
Author(s) -
Marowka Ami,
Liu Zhenying,
Chapman Barbara
Publication year - 2004
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.752
Subject(s) - computer science , parallel computing , programmer , shared memory , spmd , distributed memory , programming style , programming paradigm , cache coherence , distributed shared memory , cache , memory hierarchy , locality , parallelism (grammar) , task parallelism , uniform memory access , operating system , cpu cache , overlay , programming language , memory management , cache algorithms , linguistics , philosophy
The rapid rise of OpenMP as the preferred parallel programming paradigm for small‐to‐medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model‐of‐choice for large scale high‐performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc‐NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy‐inside–copy‐back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automat and implicit approaches. Moreover, the CC approach improves on the OpenMP‐SPMD style of programming that makes the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse‐grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd.