Instruction scheduling for a clustered VLIW processor with a word‐interleaved cache

Author(s) - Gibert Enric, Sánchez Jesús, González Antonio

Publication year - 2006

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.1013

Subject(s) - computer science, very long instruction word, parallel computing, loop unrolling, cache, cas latency, scheduling (production processes), operating system, compiler, memory controller, operations management, semiconductor memory, economics

Clustering is a common technique for overcoming the wire delay problem that comes with the evolution of technology. Fully distributed architectures, in which the register file, the functional units and the data cache are all partitioned, deal with these constraints particularly well and are also very scalable. This paper presents effective instruction scheduling techniques for a clustered VLIW processor with a word‐interleaved cache. The techniques rely on (i) loop unrolling and variable alignment to increase the fraction of local accesses, (ii) a latency assignment process to schedule memory instructions with an appropriate latency, and (iii) different heuristics to assign memory instructions to clusters. Memory consistency is guaranteed by constraining the assignment of memory instructions to clusters. The paper also introduces Attraction Buffers, a hardware mechanism that allows some data replication in order to increase the number of local accesses and thereby reduce stall time. Performance results for the Mediabench benchmark suite demonstrate the effectiveness of the presented techniques and mechanisms: the scheduling techniques increase the number of local accesses by more than 25%, while Attraction Buffers reduce stall time by more than 30%. Finally, IPC for this architecture is 10% and 5% better, depending on the scheduling heuristic, than that of a clustered VLIW processor with a centralized/unified data cache. Copyright © 2006 John Wiley & Sons, Ltd.
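
To make the link between loop unrolling and local accesses concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the scheduler described in the paper: the mapping `home_cluster`, the 4-byte word size, the 4-cluster configuration, and the helper `clusters_touched` are all hypothetical. It assumes consecutive words are assigned to consecutive clusters, and shows that a stride-1 load visits every cluster when the loop is not unrolled, whereas after unrolling by the number of clusters each static load always touches a single cluster.

```python
from typing import List, Set

WORD_BYTES = 4    # assumed word size (not stated in the abstract)
NUM_CLUSTERS = 4  # assumed number of clusters (not stated in the abstract)


def home_cluster(addr: int) -> int:
    """Cluster assumed to hold the word at byte address `addr`.

    Hypothetical word-interleaved mapping: consecutive words go to
    consecutive clusters, wrapping around.
    """
    return (addr // WORD_BYTES) % NUM_CLUSTERS


def clusters_touched(base: int, n_iters: int, unroll: int) -> List[Set[int]]:
    """Simulate a stride-1 word loop unrolled `unroll` times.

    Returns, for each static load in the unrolled loop body, the set of
    clusters that load accesses over the whole run.
    """
    touched: List[Set[int]] = [set() for _ in range(unroll)]
    for i in range(0, n_iters, unroll):
        for k in range(unroll):
            if i + k >= n_iters:
                break  # leftover iterations when n_iters % unroll != 0
            addr = base + (i + k) * WORD_BYTES
            touched[k].add(home_cluster(addr))
    return touched


if __name__ == "__main__":
    # Not unrolled: the single load ends up visiting every cluster.
    print(clusters_touched(base=0, n_iters=16, unroll=1))
    # Unrolled by NUM_CLUSTERS: each static load stays on one cluster.
    print(clusters_touched(base=0, n_iters=16, unroll=NUM_CLUSTERS))
```

Under the same assumptions, a base address that is not a multiple of `NUM_CLUSTERS * WORD_BYTES` still gives one cluster per static load, but which cluster depends on the base. Variable alignment can be read as making that base known at compile time, so that each load can be assigned to the cluster that holds its data.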
