The performance of parallel matrix algorithms on a broadcast‐based architecture
Author(s) - Katsinis Constantine, Hecht Diana, Zhu Ming, Narravula Harsha
Publication year - 2006
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.920
Subject(s) - instruction prefetch, computer science, cache, parallel computing, multiprocessing, cache invalidation, block (permutation group theory), cache algorithms, cache pollution, cache coherence, interconnection, smart cache, cpu cache, directory, page cache, computer network, operating system, geometry, mathematics
Due to advances in fiber‐optics and very large scale integration (VLSI) technology, interconnection networks which allow multiple simultaneous broadcasts are becoming feasible. This paper summarizes one such multiprocessor architecture called the Simultaneous Optical Multiprocessor Exchange Bus (SOME‐Bus). It also presents enhancements to the network interface and the cache and directory controllers which support cache block combining, capture and prefetch, and allow complete overlap of processing time with the communication time due to compulsory misses. The paper uses two fundamental matrix algorithms to characterize the impact of each enhancement on performance. Cache miss analysis and results from the execution of these programs on a SOME‐Bus simulator show that block capture and prefetch combined with an effective block replacement policy succeed in significantly reducing the miss rate due to compulsory misses as the cache size increases, while a similar increase of cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected. Copyright © 2005 John Wiley & Sons, Ltd.
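The abstract's closing claim about traditional architectures can be checked with a small experiment. The sketch below is not the SOME-Bus simulator and does not model block capture or prefetch. It assumes a single processor with a fully associative LRU cache and a naive row-major matrix multiply. The names `classify_misses` and `matmul_trace` and the parameters `cache_blocks` and `words_per_block` are hypothetical and introduced only for this illustration. It counts a miss as compulsory when the block has never been referenced before, and shows that this count stays fixed as the cache grows while the remaining misses shrink.

```python
from collections import OrderedDict


def classify_misses(trace, cache_blocks):
    """Run a block-address trace through a fully associative LRU cache.

    Returns (compulsory, other). A miss is compulsory when the block has
    never been referenced before; all remaining misses are counted as
    'other' (capacity misses, since the cache is fully associative).
    """
    cache = OrderedDict()  # keys are resident blocks, least recently used first
    seen = set()           # every block referenced so far
    compulsory = 0
    other = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        if block in seen:
            other += 1
        else:
            compulsory += 1
            seen.add(block)
        cache[block] = None
        if len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block
    return compulsory, other


def matmul_trace(n, words_per_block):
    """Block addresses touched by a naive n x n multiply C = A * B.

    Assumes row-major storage with A, B and C laid out back to back.
    """
    def block(base, i, j):
        return (base + i * n + j) // words_per_block

    a_base, b_base, c_base = 0, n * n, 2 * n * n
    trace = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                trace.append(block(a_base, i, k))
                trace.append(block(b_base, k, j))
            trace.append(block(c_base, i, j))
    return trace


if __name__ == "__main__":
    trace = matmul_trace(8, 4)
    for size in (4, 16, 64):
        compulsory, other = classify_misses(trace, size)
        print(size, compulsory, other)
```

In this model the compulsory count always equals the number of distinct blocks in the trace, so enlarging the cache removes only the other misses. Reducing compulsory misses requires bringing a block into the cache before its first reference, which is the role the abstract assigns to block capture and prefetch.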