Dataflow management, dynamic load balancing, and concurrent processing for real‐time embedded vision applications using Quasar
Author(s) - Goossens Bart
Publication year - 2018
Publication title - International Journal of Circuit Theory and Applications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.364
H-Index - 52
eISSN - 1097-007X
pISSN - 0098-9886
DOI - 10.1002/cta.2494
Subject(s) - computer science, parallel computing, compiler, concurrency, cuda, dataflow, kernel (algebra), load balancing (electrical power), runtime system, shared memory, programming paradigm, distributed computing, programming language, geometry, mathematics, combinatorics, grid
Summary - Programming modern embedded vision systems is challenging because of the steep learning curve for programmers and the differing characteristics of the target devices. Quasar, a new high‐level programming language and development environment, considerably simplifies development. Quasar has a compiler that detects and optimizes parallel programming patterns and a heterogeneous runtime that distributes the computational load over the available compute devices (CPUs and graphics processing units [GPUs]). In this paper, we focus on the runtime aspects of Quasar. We show that, to a good approximation, the execution time of a GPU kernel function can be factorized into a compile‐time‐specific component and a runtime‐specific component. We show that this approximation leads to a computationally simple runtime load balancing rule. Moreover, the load balancing rule permits efficient implicit concurrency of kernel functions and automatic scaling to multiple compute devices (e.g., multi‐CPU/GPU systems). Based on an appropriate mathematical scheduling model, we investigate how the command queue size trades off memory usage against device utilization. The result is a programming environment for embedded vision systems in which automatic parallelization and implicit concurrency detection allow programs to scale efficiently to multi‐CPU/GPU systems. Finally, benchmark results are provided to demonstrate the performance of our approach compared with OpenACC and CUDA (Compute Unified Device Architecture).
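
The factorization idea in the summary can be illustrated with a small Python sketch. This is not Quasar code and not the paper's actual rule: the multiplicative form of the estimate, the greedy earliest-finish assignment, and every name used here (`Device`, `runtime_factor`, `estimate_time`, `schedule`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical compute device (all fields are illustrative assumptions)."""
    name: str
    runtime_factor: float    # assumed runtime-specific component (seconds per cost unit)
    busy_until: float = 0.0  # estimated time at which this device's queue drains


def estimate_time(compile_cost: float, device: Device) -> float:
    # Assumed factorized model: compile-time component times runtime component.
    return compile_cost * device.runtime_factor


def schedule(kernels, devices):
    """Greedily assign each kernel to the device with the earliest estimated finish."""
    assignment = []
    for kernel_name, compile_cost in kernels:
        best = min(
            devices,
            key=lambda d: d.busy_until + estimate_time(compile_cost, d),
        )
        best.busy_until += estimate_time(compile_cost, best)
        assignment.append((kernel_name, best.name))
    return assignment


if __name__ == "__main__":
    devices = [Device("cpu", runtime_factor=3.5), Device("gpu", runtime_factor=1.0)]
    kernels = [(f"k{i}", 1.0) for i in range(5)]  # five kernels of equal assumed cost
    for kernel_name, device_name in schedule(kernels, devices):
        print(kernel_name, "->", device_name)
```

Under this assumed model, each scheduling decision costs one multiplication and one comparison per device, which is the sense in which a factorized estimate keeps a runtime rule computationally simple. Work spills over to the slower device only once the faster device's queue is long enough.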
