Exploring the benefits of heterogeneous computing to accelerate face detection deep learning inference

Open Access

Author(s) -

G C Gutiérrez

Publication year - 2017

Language(s) - English

Resource type - Dissertations/theses

DOI - 10.17760/d20260296

Subject(s) - deep learning , computer science , inference , artificial intelligence , machine learning , face (sociological concept) , face detection , throughput , facial recognition system , class (philosophy) , pattern recognition (psychology) , telecommunications , social science , sociology , wireless

Abstract of the Thesis: Exploring the Benefits of Heterogeneous Computing to Accelerate Face Detection Deep Learning Inference, by Julian Gutierrez

Master of Science in Electrical and Computer Engineering, Northeastern University, September 2017

Advisor: David Kaeli, Ph.D.

Significant improvements in face detection accuracy have been achieved by an emerging class of deep learning algorithms. Despite their high accuracy, deep learning approaches can be computationally prohibitive. As a result, we must trade off accuracy against processing throughput, which means that robust, real-time face detection on full HD video streams is not possible today. To overcome this challenge, we propose a parallel pipelined framework that makes efficient use of our heterogeneous platform. We implement this framework using a state-of-the-art algorithm, exploiting the available CPU and GPU resources through C++ libraries, including pthreads and OpenCV, and we use the Caffe and cuDNN libraries to implement our deep learning models.

Our framework can handle full HD video workloads in real time, assuming typical video scenarios. We achieve a 2.4x higher frame rate than a sequential, GPU-enabled implementation. We also achieve up to 110 FPS on standard-definition video while retaining the high accuracy of the original algorithm. The resulting pipelined framework is highly flexible, enabling us to consider a range of deep learning algorithms as we map deep neural networks onto a powerful CPU-GPU platform.
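Why pipelining raises the frame rate can be seen with an idealized model. This is an illustrative assumption, not the thesis's own analysis: suppose each frame passes through k stages with per-frame times t_1, ..., t_k. A sequential implementation pays the sum of the stage times for every frame, whereas a pipeline that runs each stage concurrently (on its own CPU thread or on the GPU) is limited in steady state only by its slowest stage, ignoring synchronization and CPU-GPU transfer costs:

```latex
\[
\mathrm{FPS}_{\text{seq}} = \frac{1}{\sum_{i=1}^{k} t_i},
\qquad
\mathrm{FPS}_{\text{pipe}} \approx \frac{1}{\max_{i} t_i},
\qquad
S = \frac{\mathrm{FPS}_{\text{pipe}}}{\mathrm{FPS}_{\text{seq}}}
  \approx \frac{\sum_{i=1}^{k} t_i}{\max_{i} t_i} \le k .
\]
```

Under this model, a throughput of 110 FPS corresponds to a bottleneck stage of roughly 1/110 s, or about 9.1 ms per frame. Per-frame latency, however, is still at least the sum of the stage times, because pipelining overlaps different frames rather than shortening any single one.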
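A minimal sketch of a parallel pipeline of the general kind the abstract describes, assuming a simple three-stage split (decode, detect, collect) connected by bounded blocking queues. The abstract does not give the thesis's actual stage decomposition, so everything here is illustrative: it uses `std::thread` (which typically wraps pthreads on Linux) rather than raw pthreads, and `Frame`, `BoundedQueue`, `fake_detect`, and `run_pipeline` are hypothetical names. `fake_detect` stands in for CNN inference that a real system might run on the GPU through Caffe/cuDNN.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame type: a real system would carry pixel data
// (e.g. an OpenCV image); here it holds only an index and a result.
struct Frame {
    int index = 0;
    int detections = 0;
};

// Bounded blocking queue linking two pipeline stages.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);  // a zero-capacity queue would deadlock
    }

    // Blocks while full, applying back-pressure to the producing stage.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [&] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks while empty, then removes and returns the oldest item.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [&] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

using Slot = std::optional<Frame>;  // std::nullopt marks end of stream

// Hypothetical stand-in for the detector: a deterministic fake count.
int fake_detect(const Frame& f) { return f.index % 3; }

// Runs decode -> detect -> collect, each stage on its own thread, so
// different frames are in flight in different stages at the same time.
std::vector<Frame> run_pipeline(int n_frames, std::size_t queue_capacity = 4) {
    BoundedQueue<Slot> decoded(queue_capacity);
    BoundedQueue<Slot> detected(queue_capacity);
    std::vector<Frame> results;

    // Stage 1: "decode" frames (here, just create them in order).
    std::thread decoder([&] {
        for (int i = 0; i < n_frames; ++i) decoded.push(Frame{i, 0});
        decoded.push(std::nullopt);
    });

    // Stage 2: run the (fake) detector on each frame.
    std::thread detector([&] {
        while (Slot s = decoded.pop()) {
            s->detections = fake_detect(*s);
            detected.push(s);
        }
        detected.push(std::nullopt);
    });

    // Stage 3: collect results; FIFO queues preserve frame order.
    std::thread collector([&] {
        while (Slot s = detected.pop()) results.push_back(*s);
    });

    decoder.join();
    detector.join();
    collector.join();
    return results;
}
```

For example, `run_pipeline(10)` returns ten frames in their original order, each tagged by the stand-in detector. Bounded queues are one reasonable design choice here: they keep a fast stage (such as decoding) from running arbitrarily far ahead of a slower one (such as inference) and exhausting memory. Frame order is preserved only because each stage uses a single thread; replicating a stage across several threads would require reordering at the collector.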
