
Hybrid method of convolutional neural network training
Author(s) -
Андрей Николаевич Голубинский,
Андрей Андреевич Толстых
Publication year - 2021
Publication title -
informatika i avtomatizaciâ/informatika i avtomatizaciâ (print)
Language(s) - English
Resource type - Journals
eISSN - 2713-3206
pISSN - 2713-3192
DOI - 10.15622/ia.2021.20.2.8
Subject(s) - computer science , initialization , convolutional neural network , artificial neural network , artificial intelligence , backpropagation , time delay neural network , deep learning , computation , convolution (computer science) , stability (learning theory) , convergence (economics) , computational complexity theory , machine learning , algorithm , economics , programming language , economic growth
The paper proposes a hybrid method for training convolutional neural networks. The method combines second-order and first-order optimization methods, applied to different elements of the convolutional neural network architecture. The hybrid training method achieves significantly better convergence than Adam while requiring fewer computational operations to implement. With the proposed method, it is possible to train networks on which learning paralysis occurs under first-order methods. Moreover, the method can adjust its computational complexity to the hardware on which the computation is performed, and it supports mini-batch training.
An analysis of the ratio of computations between convolutional neural networks and fully connected artificial neural networks is presented. The mathematical apparatus of error optimization for artificial neural networks is considered, including error backpropagation and the Levenberg-Marquardt algorithm. The main limitations of these methods that arise when training a convolutional neural network are analyzed.
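For orientation, a single Levenberg-Marquardt update can be sketched as below. This is a minimal NumPy illustration of the textbook algorithm on a toy line-fitting problem, not the paper's implementation. The names `lm_step`, `residual_fn`, `jacobian_fn`, and the damping value `mu` are assumptions introduced for the example.

```python
import numpy as np

def lm_step(w, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update for the loss 0.5 * ||e(w)||^2."""
    e = residual_fn(w)                 # residual vector, shape (m,)
    J = jacobian_fn(w)                 # Jacobian de/dw, shape (m, n)
    # Damped Gauss-Newton approximation of the Hessian, shape (n, n).
    H = J.T @ J + mu * np.eye(w.size)
    # Solve H * delta = J^T e rather than forming an explicit inverse.
    return w - np.linalg.solve(H, J.T @ e)

# Toy data (hypothetical): points on the line y = 2x + 1.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

def residual_fn(w):
    # Model output minus target for parameters w = (slope, intercept).
    return w[0] * x + w[1] - y

def jacobian_fn(w):
    # Derivatives of each residual with respect to slope and intercept.
    return np.stack([x, np.ones_like(x)], axis=1)

w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, residual_fn, jacobian_fn, mu=1e-3)
```

The update stores an m-by-n Jacobian and solves an n-by-n linear system, where n is the number of parameters and m the number of residuals. That cost grows quickly with network size, which is one commonly cited limitation of second-order training for large networks.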
The stability of the proposed method under changes to the initialization parameters is analyzed. Results on the applicability of the method to various problems are presented.