Multi-dimensional Kernel Generation for Loop Nest Software Pipelining

Alban Douillet, Hongbo Rong, Guang R. Gao


Open Access


Author(s) - Alban Douillet, Hongbo Rong, Guang R. Gao

Publication year - 2006

Publication title - Lecture Notes in Computer Science

Language(s) - English

Resource type - Book series

SCImago Journal Rank - 0.249

H-Index - 400

eISSN - 1611-3349

pISSN - 0302-9743

ISBN - 3-540-37783-2

DOI - 10.1007/11823285_32

Subject(s) - software pipelining, computer science, parallel computing, modulo, scheduling (production processes), loop tiling, loop fission, instruction scheduling, compiler, high level synthesis, software, two level scheduling, dynamic priority scheduling, schedule, embedded system, mathematical optimization, operating system, field programmable gate array, mathematics, combinatorics

Single-dimension Software Pipelining (SSP) has been proposed as an effective software pipelining technique for multi-dimensional loops [16]. This paper introduces, for the first time, the scheduling methods that actually produce the kernel code. Because the problem is multi-dimensional, scheduling is more complex and challenging than with traditional modulo scheduling. The scheduler must handle multiple subkernels and initiation rates under specific scheduling constraints, while producing a solution that minimizes the execution time of the final schedule. Three approaches are proposed: the level-by-level method, which schedules operations in loop level order, starting from the innermost, and does not let other operations interfere with the already scheduled levels; the flat method, which schedules operations from different loop levels with the same priority; and the hybrid method, which uses the level-by-level mechanism for the innermost level and the flat solution for the other levels. The methods subsume Huff's modulo scheduling [8] for single loops as a special case. We also break a scheduling constraint introduced in earlier publications, which allows for a more compact kernel. The proposed approaches were implemented in the Open64/ORC compiler and evaluated on loop nests from the Livermore, SPEC2000 and NAS benchmarks.
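
The difference between the three approaches can be pictured as a difference in the order in which operations are handed to the scheduler. The Python sketch below is only an illustration of that ordering, not the implementation from the paper: the `Op` record, its `priority` field (standing in for whatever per-operation rank a real scheduler would use) and the convention that a larger `level` means a deeper loop are all assumptions made here. It does not model resources, subkernels, initiation rates, or the rule that later operations may not disturb already scheduled levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    level: int     # loop level of the operation; larger = deeper (assumed convention)
    priority: int  # hypothetical rank; smaller = scheduled earlier


def level_by_level_order(ops):
    """Innermost level first; priority only orders operations within a level."""
    return sorted(ops, key=lambda op: (-op.level, op.priority))


def flat_order(ops):
    """Operations from all levels compete with the same priority."""
    return sorted(ops, key=lambda op: op.priority)


def hybrid_order(ops):
    """Innermost level first, then all remaining levels treated as flat."""
    innermost = max(op.level for op in ops)
    return sorted(ops, key=lambda op: (op.level != innermost, op.priority))


# A made-up three-level loop nest with two operations per level.
ops = [
    Op("a", 1, 0), Op("b", 2, 3), Op("c", 3, 5),
    Op("d", 3, 1), Op("e", 1, 2), Op("f", 2, 4),
]

for order in (level_by_level_order, flat_order, hybrid_order):
    print(order.__name__, [op.name for op in order(ops)])
```

On this made-up input, the level-by-level order visits `d, c` (level 3), then `b, f` (level 2), then `a, e` (level 1); the flat order ignores levels entirely; the hybrid order keeps `d, c` first and then interleaves the two outer levels by priority.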
