Open Access

A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning

Author(s) - Martin Möller

Publication year - 1990

Publication title - DAIMI PB

Language(s) - English

Resource type - Journals

eISSN - 2245-9316

pISSN - 0105-8517

DOI - 10.7146/dpb.v19i339.6570

Subject(s) - broyden–fletcher–goldfarb–shanno algorithm, conjugate gradient method, backpropagation, algorithm, artificial neural network, nonlinear conjugate gradient method, convergence (economics), gradient method, mathematics, computer science, gradient descent, mathematical optimization, artificial intelligence, telecommunications, asynchronous communication, economics, economic growth

A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with a superlinear convergence rate is introduced. The algorithm is based on a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second-order information from the neural network but requires only O(N) memory, where N is the number of weights in the network. The performance of SCG is benchmarked against that of the standard backpropagation algorithm (BP), conjugate gradient backpropagation (CGB), and the one-step Broyden-Fletcher-Goldfarb-Shanno memoryless quasi-Newton algorithm (BFGS). SCG yields a speed-up of at least an order of magnitude relative to BP. The speed-up depends on the convergence criterion: the greater the required reduction in error, the greater the speed-up. SCG is fully automated, with no user-dependent parameters, and avoids the time-consuming line search that CGB and BFGS perform in each iteration to determine an appropriate step size. Incorporating problem-dependent structural information in the architecture of a neural network often lowers the overall complexity. The lower the complexity of the neural network relative to the problem domain, the more likely the weight space is to contain long ravines characterized by sharp curvature. While BP is inefficient in these ravines, it is shown that SCG handles them effectively.
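The abstract's main ideas (second-order information at O(N) memory, and a step size computed without a line search) can be illustrated with a minimal sketch. This is not the paper's own listing: the function and parameter names (`scg_minimize`, `sigma0`, `lam0`), the thresholds 0.75 and 0.25, and the factor-of-4 updates of the scale parameter are assumptions chosen for illustration. The curvature along the search direction is estimated from one extra gradient evaluation, so only a few vectors of length N are stored, and a scale parameter `lam` is raised or lowered depending on how well the local quadratic model predicted the actual error reduction.

```python
import numpy as np

def scg_minimize(error, error_grad, w0, max_iter=500, tol=1e-6,
                 sigma0=1e-4, lam0=1e-6):
    """Simplified scaled-conjugate-gradient loop (illustrative sketch)."""
    w = np.asarray(w0, dtype=float).copy()
    n = w.size
    e_w = error(w)
    r = -error_grad(w)           # negative gradient
    p = r.copy()                 # current search direction
    lam = lam0                   # scale parameter added to the curvature
    success = True
    n_iter = 0
    for k in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        n_iter += 1
        p2 = p @ p
        if success:
            # One extra gradient gives s ~ H p without forming the Hessian.
            sigma = sigma0 / np.sqrt(p2)
            s = (error_grad(w + sigma * p) + r) / sigma
            delta_raw = p @ s
        delta = delta_raw + lam * p2
        if delta <= 0.0:
            # Raise lam until the scaled curvature along p is positive.
            lam = 2.0 * (lam - delta / p2)
            delta = delta_raw + lam * p2
        mu = p @ r
        alpha = mu / delta       # step size from the local model, no line search
        w_new = w + alpha * p
        e_new = error(w_new)
        # Actual error reduction divided by the reduction the model predicted.
        ratio = 2.0 * delta * (e_w - e_new) / mu**2
        if ratio >= 0.0:
            r_new = -error_grad(w_new)
            if (k + 1) % n == 0:
                p_new = r_new.copy()             # periodic restart
            else:
                beta = (r_new @ r_new - r_new @ r) / mu
                p_new = r_new + beta * p
            if p_new @ r_new <= 0.0:
                p_new = r_new.copy()             # fall back to a descent direction
            w, e_w, r, p = w_new, e_new, r_new, p_new
            success = True
            if ratio >= 0.75:
                lam *= 0.25                      # model was good: trust it more
        else:
            success = False                      # reject step, reuse s with larger lam
        if ratio < 0.25:
            lam *= 4.0                           # model was poor: damp the step
    return w, n_iter

# Hypothetical test problem: a quadratic "ravine" with curvatures 1 and 100.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])

def error(w):
    return 0.5 * w @ A @ w - b @ w

def error_grad(w):
    return A @ w - b

w0 = np.array([-1.5, 1.0])
w_star, n_iter = scg_minimize(error, error_grad, w0)
print(w_star, n_iter)
```

The quadratic above stands in for the long, sharply curved ravines the abstract mentions; its exact minimizer solves `A w = b`. The scale-parameter updates here are deliberately simple, and a faithful implementation should follow the update rules given in the paper itself.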
