Open Access
A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks
Author(s) -
Cuauhtémoc Daniel Suárez-Ramírez,
Miguel González-Mendoza,
Leonardo Chang-Fernandez,
Gilberto Ochoa-Ruiz,
Mario Alberto Duran-Vega
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.52591/lxai202106255
Subject(s) - computer science , artificial intelligence , machine learning , artificial neural network , binary number , sign (mathematics) , hyperparameter , algorithm , mathematics
The optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations. Current weight-updating techniques use the same approaches as traditional Neural Networks (NNs), with the extra requirement of an approximation to the derivative of the sign function for back-propagation, since the true derivative is the Dirac delta function; thus, efforts have focused on adapting full-precision techniques to work on BNNs. In the literature, only one previous effort has tackled the problem of directly training BNNs with bit-flips: Bop, which uses the first raw moment estimate of the gradients and compares it against a threshold to decide when to flip a weight. In this paper, we take an approach parallel to Adam, which also uses the second raw moment estimate to normalize the first raw moment before the comparison with the threshold; we call this method Bop2ndOrder. We present two versions of the proposed optimizer: a biased one and a bias-corrected one, each with its own applications. We also present a complete ablation study of the hyperparameter space, as well as the effect of using schedulers on each hyperparameter. For these studies, we tested the optimizer on CIFAR-10 using the BinaryNet architecture. We also evaluated accuracy on ImageNet 2012 with the XnorNet and BiRealNet architectures. On both datasets our approach converged faster, was robust to hyperparameter changes, and achieved better accuracy values.
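The update rule described above can be sketched in a few lines of NumPy. This is a minimal illustration based only on the abstract's description of Bop2ndOrder (a Bop-style flip test applied to an Adam-style normalized first moment); the hyperparameter names (`gamma`, `sigma`, `threshold`, `eps`) and their default values are illustrative assumptions, not the paper's.

```python
import numpy as np

def bop2ndorder_step(w, grad, m, v,
                     gamma=1e-4, sigma=1e-4, threshold=1e-8, eps=1e-10):
    """One Bop2ndOrder-style update for a tensor of binary weights (+1/-1).

    Like Bop, a weight is flipped when a gradient moment estimate exceeds a
    threshold and agrees in sign with the weight; unlike Bop, the first raw
    moment is first normalized by the square root of the second raw moment,
    as in Adam. Sketch only; hyperparameters are assumptions.
    """
    m = (1 - gamma) * m + gamma * grad       # EMA of gradients (first raw moment)
    v = (1 - sigma) * v + sigma * grad ** 2  # EMA of squared gradients (second raw moment)
    m_hat = m / (np.sqrt(v) + eps)           # normalized first moment

    # Flip when the normalized moment is large enough and points in the
    # same direction as the current weight (i.e. the loss decreases by flipping).
    flip = (np.abs(m_hat) > threshold) & (np.sign(m_hat) == np.sign(w))
    w = np.where(flip, -w, w)
    return w, m, v
```

With zero-initialized moments and these defaults, a single step flips exactly those weights whose sign matches the gradient's sign, matching Bop's flip criterion while the second-moment normalization makes the effective threshold insensitive to the raw gradient scale.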
