Open Access
A digital signal processor‐efficient accelerator for depthwise separable convolution
Author(s) - Li Xueming, Huang Hongmin, Liu Yuan, Hu Xianghong, Xiong Xiaoming
Publication year - 2022
Publication title - Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
eISSN - 1350-911X
pISSN - 0013-5194
DOI - 10.1049/ell2.12435
Subject(s) - computer science, digital signal processing, computation, convolution (computer science), computer hardware, digital signal processor, hardware acceleration, field programmable gate array, acceleration, kernel (algebra), gate array, embedded system, parallel computing, computational science, artificial neural network, algorithm, artificial intelligence, physics, mathematics, classical mechanics, combinatorics
Recent research on deep convolutional neural networks has proposed compact networks such as MobileNet, but their main computation, depthwise separable convolution (DWC), reduces the amount of reusable data and raises the demands on data loading efficiency. Although DWC effectively reduces the amount of network computation, it needs a dedicated accelerator to improve inference speed. This paper proposes a high‐performance accelerator for DWC on the field‐programmable gate array (FPGA), a commonly used acceleration platform. The proposed accelerator supports the computation of both standard convolution (SC) and DWC, as well as two activation functions. In addition, two data storage formats are used to maintain data loading efficiency under high parallelism, since SC and DWC have different input requirements. Furthermore, a processing unit that can execute two 8 × 8‐bit multiplications inside one digital signal processor (DSP) is designed to make the best use of the DSP hardware resources. Finally, the accelerator is implemented on a ZYNQ ZC706 at 200 MHz. Consuming only 392 DSPs, the accelerator achieves 134.5 giga operations per second (GOPS) and 209.4 frames per second (FPS) on MobileNet V1, and 96.4 GOPS and 250.4 FPS on MobileNet V2. Experimental results show that this design achieves better DSP efficiency than previous work.
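
The abstract does not say how the two 8 × 8‐bit multiplications are fitted into one DSP. As an illustration only, the sketch below shows one common way to do this in general: pack two signed 8‐bit operands into a single wide word, multiply once by a shared third operand, and split the result. The function name `packed_dual_multiply`, the 16‐bit field offset, and the shared‐operand arrangement are assumptions made for this example and may differ from the authors' processing unit.

```python
from typing import Tuple


def packed_dual_multiply(a: int, b: int, c: int, shift: int = 16) -> Tuple[int, int]:
    """Return (a*c, b*c) using a single wide multiplication.

    Hypothetical sketch: a, b and c are signed 8-bit values, and `shift`
    is an assumed field offset wide enough to hold one signed 16-bit product.
    """
    for value in (a, b, c):
        if not -128 <= value <= 127:
            raise ValueError("operands must fit in signed 8 bits")

    # Place a in the high field and b in the low field of one operand.
    packed = (a << shift) + b

    # One multiplication yields (a*c << shift) + b*c.
    wide = packed * c

    # Recover the low product and reinterpret it as a signed value.
    low = wide & ((1 << shift) - 1)
    if low >= 1 << (shift - 1):
        low -= 1 << shift

    # Subtract the low product so its sign does not disturb the high field.
    high = (wide - low) >> shift
    return high, low


if __name__ == "__main__":
    print(packed_dual_multiply(3, -5, 7))
```

The subtraction before the final shift is the correction step: when the low product is negative it borrows from the high field, so the high product cannot simply be read from the upper bits.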
