AdaCN: An Adaptive Cubic Newton Method for Nonconvex Stochastic Optimization
Author(s) -
Yan Liu,
Maojun Zhang,
Zhiwei Zhong,
Xiangrong Zeng
Publication year - 2021
Publication title -
computational intelligence and neuroscience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.605
H-Index - 52
eISSN - 1687-5273
pISSN - 1687-5265
DOI - 10.1155/2021/5790608
Subject(s) - mathematical optimization , computer science , newton's method , mathematics , algorithm , physics , nonlinear system , quantum mechanics
In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic optimization. AdaCN dynamically captures the curvature of the loss landscape by diagonally approximated Hessian plus the norm of difference between previous two estimates. It only requires at most first order gradients and updates with linear complexity for both time and memory. In order to reduce the variance introduced by the stochastic nature of the problem, AdaCN hires the first and second moment to implement and exponential moving average on iteratively updated stochastic gradients and approximated stochastic Hessians, respectively. We validate AdaCN in extensive experiments, showing that it outperforms other stochastic first order methods (including SGD, Adam, and AdaBound) and stochastic quasi-Newton method (i.e., Apollo), in terms of both convergence speed and generalization performance.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom