Dataflow management, dynamic load balancing, and concurrent processing for real‐time embedded vision applications using Quasar
Author(s) - Goossens Bart
Publication year - 2018
Publication title - International Journal of Circuit Theory and Applications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.364
H-Index - 52
eISSN - 1097-007X
pISSN - 0098-9886
DOI - 10.1002/cta.2494
Subject(s) - computer science , parallel computing , compiler , concurrency , CUDA , dataflow , kernel (computing) , load balancing (computing) , runtime system , shared memory , programming paradigm , distributed computing , programming language
Summary - Programming modern embedded vision systems brings various challenges, owing to the steep learning curve for programmers and the differing characteristics of the devices. Quasar, a new high-level programming language and development environment, considerably simplifies this development. Quasar has a compiler that detects and optimizes parallel programming patterns and a heterogeneous runtime that distributes the computational load over the available compute devices (CPUs and graphics processing units [GPUs]). In this paper, we focus on the runtime aspects of Quasar. We show that, to a good approximation, the execution time of a GPU kernel function can be factorized into a compile-time-specific component and a runtime-specific component. We show that this approximation leads to a computationally simple runtime load-balancing rule. Moreover, the load-balancing rule permits efficient implicit concurrency of kernel functions and automatic scaling to multiple compute devices (e.g., multi-CPU/GPU systems). Based on an appropriate mathematical scheduling model, we investigate the command-queue-size trade-off between memory usage and device utilization. The result is a programming environment for embedded vision systems in which automatic parallelization and implicit concurrency detection allow scaling a program efficiently to multi-CPU/GPU systems. Finally, benchmark results are provided to demonstrate the performance of our approach compared with OpenACC and CUDA (Compute Unified Device Architecture).
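The abstract does not spell out the load-balancing rule itself. As a rough illustration only, the following Python sketch assumes the factorization takes a multiplicative form, T(kernel, device) ≈ c_kernel · r_device, where c_kernel is a compile-time, kernel-specific cost factor and r_device a runtime, device-specific rate; under that assumption a greedy rule can dispatch each kernel to the device with the earliest estimated finish time. All identifiers here (Device, dispatch, rate, queued) are hypothetical and are not Quasar's actual API.

```python
# Illustrative sketch, not Quasar's implementation: greedy dispatch under
# the assumed factorized cost model T(kernel, device) ~ c_kernel * r_device.

from dataclasses import dataclass

@dataclass
class Device:
    name: str
    rate: float          # assumed runtime-specific factor r_device (seconds per cost unit)
    queued: float = 0.0  # estimated time until this device's command queue drains

def dispatch(kernel_cost: float, devices: list[Device]) -> Device:
    """Pick the device that would finish this kernel soonest:
    current backlog plus the factorized execution-time estimate."""
    best = min(devices, key=lambda d: d.queued + kernel_cost * d.rate)
    best.queued += kernel_cost * best.rate  # account for the newly queued work
    return best

if __name__ == "__main__":
    devices = [Device("CPU", rate=1.0), Device("GPU", rate=0.05)]
    # Hypothetical compile-time cost factors for a stream of kernels.
    for cost in [10.0, 200.0, 5.0, 300.0, 50.0]:
        d = dispatch(cost, devices)
        print(f"cost={cost:6.1f} -> {d.name} (backlog now {d.queued:.2f}s)")
```

In this toy model, capping the `queued` backlog per device would correspond to bounding the command queue size, which is the memory-usage versus device-utilization trade-off the paper analyzes with its scheduling model.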