
An Improving Method for Performance of DNN using 2-bit Quantization
Author(s) - Sang Hyeok Kim et al.
Publication year - 2021
Publication title - Türk Bilgisayar ve Matematik Eğitimi Dergisi (Turkish Journal of Computer and Mathematics Education)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.218
H-Index - 3
ISSN - 1309-4653
DOI - 10.17762/turcomat.v12i6.2003
Subject(s) - MNIST database, quantization (signal processing), computer science, algorithm, field-programmable gate array, deep learning, artificial intelligence, computer hardware
Background/Objectives: Interest in AI (artificial intelligence) has grown recently, and many studies aim to make AI usable in embedded and mobile environments. Quantization is one method for reducing model size, but quantization below 8 bits generally cannot be implemented without additional hardware such as an FPGA. With this in mind, this paper proposes two new algorithms that implement 2-bit quantization purely in software.
Methods/Statistical analysis: We propose a packing operation that quantizes weights consisting of 32-bit real values down to 2 bits and stores four 2-bit quantized weights in a single 8-bit memory cell, together with a Masking Matrix Multiplication function that computes the product of the packed weights and the input values. Both functions run in parallel in GPU memory; a sketch of the packing idea is given below.
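
The abstract does not include code, so the following is only a minimal NumPy sketch of the general idea: pack four 2-bit weight indices per byte, then recover them with bit masks before a matrix multiply. The four quantization levels (LEVELS), the bit layout within each byte, and all function names are assumptions for illustration, not the authors' implementation, which runs the equivalent steps in parallel on the GPU.

    import numpy as np

    # Assumed codebook of four 2-bit levels; the paper's actual levels are not given.
    LEVELS = np.array([-1.0, -0.33, 0.33, 1.0], dtype=np.float32)

    def quantize_2bit(w):
        """Map each 32-bit float weight to the index (0..3) of its nearest level."""
        return np.abs(w[..., None] - LEVELS).argmin(axis=-1).astype(np.uint8)

    def pack_2bit(idx):
        """Pack four 2-bit indices into one uint8 (length must be a multiple of 4)."""
        idx = idx.reshape(-1, 4)
        return (idx[:, 0] | (idx[:, 1] << 2) | (idx[:, 2] << 4) | (idx[:, 3] << 6)).astype(np.uint8)

    def unpack_2bit(packed, n):
        """Recover n 2-bit indices from the packed bytes using shifts and masks."""
        out = np.empty(packed.size * 4, dtype=np.uint8)
        for k in range(4):
            out[k::4] = (packed >> (2 * k)) & 0b11
        return out[:n]

    def masked_matmul(packed_w, shape, x):
        """Unpack 2-bit weights of logical shape `shape` and multiply with input x."""
        w = LEVELS[unpack_2bit(packed_w, shape[0] * shape[1])].reshape(shape)
        return w @ x

    # Example: a 8x4 float32 weight matrix (32 weights) packs into 8 bytes.
    w = np.random.randn(8, 4).astype(np.float32)
    packed = pack_2bit(quantize_2bit(w.ravel()))
    y = masked_matmul(packed, w.shape, np.ones(4, dtype=np.float32))

The storage arithmetic follows directly: four 2-bit indices occupy one byte instead of four 32-bit floats (16 bytes), which is the 16x memory reduction reported in the findings.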
Findings: Compared with the existing 32-bit model, the quantized model used about 16 times less memory (each 32-bit weight is reduced to 2 bits) and computed about 4 times faster. Despite this, the DNN model showed an error of only around 1% when trained on MNIST and HandWritten data, and the CNN model showed an error of around 1% when trained on EEG (electroencephalography) data.
Improvements/Applications: The functions used in this study target the DNN domain; although they extend to CNNs, quantization could be applied only to the FC (fully connected) layers. Applying them to convolutional layers requires an additional function, and future work should verify that the loss in accuracy remains small on more complex data sets.