
An Improving Method for Performance of DNN using 2-bit Quantization
Author(s) - Sang Hyeok Kim et al.
Publication year - 2021
Publication title - Türk Bilgisayar ve Matematik Eğitimi Dergisi (Turkish Journal of Computer and Mathematics Education)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.218
H-Index - 3
ISSN - 1309-4653
DOI - 10.17762/turcomat.v12i6.2003
Subject(s) - MNIST database, quantization (signal processing), computer science, algorithm, field-programmable gate array, deep learning, artificial intelligence, computer hardware
Background/Objectives: Interest in AI (artificial intelligence) has grown recently, and many studies aim to make AI usable in embedded and mobile environments. Quantization is one method for reducing model size, but quantization below 8 bits generally cannot be implemented without additional hardware such as an FPGA. With this in mind, this paper proposes two new algorithms that implement 2-bit quantization purely in software.
Methods/Statistical analysis: We propose a packing operation that quantizes weights consisting of 32-bit real values down to 2 bits and stores four 2-bit quantized weights in a single 8-bit memory cell, together with a Masking Matrix Multiplication function that computes the product of the packed weights and the input values. Both functions run in parallel in GPU memory; a sketch of the packing idea is given below.
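
The abstract does not include code, so the following is only a minimal NumPy sketch of the general idea: pack four 2-bit weight indices per byte, then recover them with bit masks before a matrix multiply. The four quantization levels (LEVELS), the bit layout within each byte, and all function names are assumptions for illustration, not the authors' implementation, which runs the equivalent steps in parallel on the GPU.

    import numpy as np

    # Assumed codebook of four 2-bit levels; the paper's actual levels are not given.
    LEVELS = np.array([-1.0, -0.33, 0.33, 1.0], dtype=np.float32)

    def quantize_2bit(w):
        """Map each 32-bit float weight to the index (0..3) of its nearest level."""
        return np.abs(w[..., None] - LEVELS).argmin(axis=-1).astype(np.uint8)

    def pack_2bit(idx):
        """Pack four 2-bit indices into one uint8 (length must be a multiple of 4)."""
        idx = idx.reshape(-1, 4)
        return (idx[:, 0] | (idx[:, 1] << 2) | (idx[:, 2] << 4) | (idx[:, 3] << 6)).astype(np.uint8)

    def unpack_2bit(packed, n):
        """Recover n 2-bit indices from the packed bytes using shifts and masks."""
        out = np.empty(packed.size * 4, dtype=np.uint8)
        for k in range(4):
            out[k::4] = (packed >> (2 * k)) & 0b11
        return out[:n]

    def masked_matmul(packed_w, shape, x):
        """Unpack 2-bit weights of logical shape `shape` and multiply with input x."""
        w = LEVELS[unpack_2bit(packed_w, shape[0] * shape[1])].reshape(shape)
        return w @ x

    # Example: a 8x4 float32 weight matrix (32 weights) packs into 8 bytes.
    w = np.random.randn(8, 4).astype(np.float32)
    packed = pack_2bit(quantize_2bit(w.ravel()))
    y = masked_matmul(packed, w.shape, np.ones(4, dtype=np.float32))

The storage arithmetic follows directly: four 2-bit indices occupy one byte instead of four 32-bit floats (16 bytes), which is the 16x memory reduction reported in the findings.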
Findings: Compared with the existing 32-bit model, the quantized model used about 16 times less memory (each 32-bit weight is reduced to 2 bits) and computed about 4 times faster. Despite this, the DNN model showed an error of only around 1% when trained on MNIST and HandWritten data, and the CNN model showed an error of around 1% when trained on EEG (electroencephalography) data.
Improvements/Applications: The functions used in this study target the DNN domain; although they extend to CNNs, quantization could be applied only to the FC (fully connected) layers. Applying them to convolutional layers requires an additional function, and future work should verify that the loss in accuracy remains small on more complex data sets.