Research on Dynamic Reconfiguration Technology of Neural Network Accelerator Based on Zynq

Hao Lv, Shengbing Zhang, Xiaojian Liu, Shuo Liu, Yongqiang Liu, Wei Han, Shiyi Xu

Open Access

Publication year - 2020

Publication title - Journal of Physics: Conference Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.21

H-Index - 85

eISSN - 1742-6596

pISSN - 1742-6588

DOI - 10.1088/1742-6596/1650/3/032093

Subject(s) - reconfigurability, field programmable gate array, computer science, control reconfiguration, convolutional neural network, embedded system, hardware acceleration, computer architecture, multiplexing, reconfigurable computing, computer hardware, artificial intelligence, operating system, telecommunications

Target detection based on convolutional neural networks (CNNs) is a research hotspot in computer vision. Conventional CNN accelerators use time-division multiplexing: every network layer runs on the same accelerator, so adaptability and resource utilization are low. An open question is how to exploit the dynamic reconfigurability of FPGAs so that each layer's computation is matched to a suitable accelerator architecture, at the cost of some configuration delay, and thereby improve the utilization of computing resources. This article takes the YOLOv2 target detection algorithm, widely used in industry, as an example and uses Xilinx's Zynq as the platform to describe the process of mapping a CNN model onto the FPGA. Using the FPGA's dynamic reconfigurability, each layer's computation is matched to a reconfigured accelerator architecture, and the reconfiguration delay is amortized by running a batch of data through each architecture before reconfiguring, which effectively improves the utilization of computing resources. To further improve accelerator performance, each convolutional layer is merged with the max-pooling layer cascaded after it, which reduces memory access delay. Experiments and evaluations of the dynamically reconfigurable accelerator architecture achieved a performance of 30.35 GOP/s on the Zynq platform. The work provides a reference for applying and optimizing CNNs on embedded platforms.
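The trade-off between configuration delay and batch reuse described in the abstract can be illustrated with a small cost model. The sketch below is not taken from the paper: the function names, the assumption of one reconfiguration per layer per batch, and all timing values are hypothetical and chosen only to show how the per-image share of the reconfiguration delay shrinks as the batch grows.

```python
import math


def per_image_latency_reconfig(tuned_times_ms, reconfig_ms, batch_size):
    """Per-image latency when every layer gets its own accelerator configuration.

    Assumed model (hypothetical): the FPGA is reconfigured once per layer,
    then the whole batch is pushed through that layer before moving on.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total_ms = sum(reconfig_ms + batch_size * t for t in tuned_times_ms)
    return total_ms / batch_size


def per_image_latency_fixed(fixed_times_ms):
    """Per-image latency on a single time-multiplexed accelerator (no reconfiguration)."""
    return sum(fixed_times_ms)


def break_even_batch(fixed_times_ms, tuned_times_ms, reconfig_ms):
    """Smallest batch size at which reconfiguration is strictly faster per image.

    Returns None when the layer-specific architectures save no compute time.
    """
    saved_ms = sum(fixed_times_ms) - sum(tuned_times_ms)
    if saved_ms <= 0:
        return None
    overhead_ms = reconfig_ms * len(tuned_times_ms)
    return math.floor(overhead_ms / saved_ms) + 1


# Hypothetical per-layer compute times in milliseconds (illustrative only).
fixed = [4.0, 6.0, 5.0]   # shared, time-multiplexed accelerator
tuned = [3.0, 4.0, 3.0]   # layer-specific reconfigured architectures
reconfig = 10.0           # assumed delay of one reconfiguration

batch = break_even_batch(fixed, tuned, reconfig)
print("break-even batch size:", batch)
print("fixed per-image latency (ms):", per_image_latency_fixed(fixed))
print("reconfigured per-image latency (ms):",
      per_image_latency_reconfig(tuned, reconfig, batch))
```

Under this model the reconfiguration overhead per image is the total reconfiguration time divided by the batch size, so larger batches favor the reconfigured design, while a batch of one pays the full delay for every layer.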
