Gradient‐based parameter optimization for systems containing discrete‐valued functions
Author(s) - Wilson Edward, Rock Stephen M.
Publication year - 2002
Publication title - International Journal of Robust and Nonlinear Control
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.361
H-Index - 106
eISSN - 1099-1239
pISSN - 1049-8923
DOI - 10.1002/rnc.729
Subject(s) - backpropagation, differentiable function, artificial neural network, Heaviside step function, discrete optimization, optimization problem, discrete system, computer science, mathematical optimization, automatic differentiation, noise (video), mathematics, algorithm, artificial intelligence, mathematical analysis, statistics, image (mathematics), computation
Gradient‐based parameter optimization is commonly used for training neural networks and for optimizing the performance of other complex systems that contain only continuously differentiable functions. However, there is a large class of important parameter optimization problems involving systems that contain discrete‐valued functions, which do not permit the direct use of gradient‐based methods. Examples include optimization of control systems containing discrete‐level actuators such as on/off devices, systems with discrete‐valued inputs and outputs, discrete‐decision‐making systems (accept/reject), and neural networks built with signums (also known as hard‐limiters or Heaviside step functions) rather than sigmoids. Even if most of the system is continuously differentiable, the presence of one or more discrete‐valued functions prevents the direct use of gradient‐based optimization. A new algorithm, ‘noisy backpropagation’, developed here as an extension of backpropagation, solves this problem and extends gradient‐based parameter optimization to systems containing discrete‐valued functions. The modification to backpropagation is small, requiring only (1) replacement of the discrete‐valued functions with continuously differentiable approximations, and (2) injection of noise into the smooth approximating functions on the forward sweep during training. Noise injection is the key to reducing the round‐off error that arises when the smooth approximations are replaced by the original discrete‐valued functions after training. This generic approach applies whenever gradient‐based parameter optimization is used with systems containing discrete‐valued functions; it is not limited to training neural networks. The examples in this paper demonstrate the use of noisy backpropagation in training two different multi‐layer signum networks and in training a neural network for a control problem involving on‐off actuators. The final example includes implementation on a laboratory model of a ‘free‐flying space robot’ to validate the realizability and practical utility of the method. Copyright © 2002 John Wiley & Sons, Ltd.
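To make the two‐step modification concrete, the following minimal Python sketch (an illustration, not the paper's code or experiments) trains a tiny 2‐2‐1 signum network on an XOR‐style task. The signums are replaced with tanh approximations, zero‐mean Gaussian noise is injected at each smooth unit's input on the forward sweep, ordinary backpropagation is used on the backward sweep, and the hard signums are restored after training. The gain, noise level, learning rate, and task are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
GAIN = 4.0  # steepness of the smooth signum approximation (assumed value)

def smooth_sign(x):
    """Continuously differentiable stand-in for the signum."""
    return np.tanh(GAIN * x)

def d_smooth_sign(x):
    """Derivative of the smooth stand-in, used on the backward sweep."""
    return GAIN * (1.0 - np.tanh(GAIN * x) ** 2)

# XOR-style task with +/-1 targets (illustrative, not the paper's examples)
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)   # hidden layer: 2 signum units
W2 = rng.normal(size=2);      b2 = 0.0           # output layer: 1 signum unit
lr, noise_std = 0.1, 0.3  # assumed learning rate and injected-noise level

for epoch in range(5000):
    for xi, ti in zip(X, y):
        # Forward sweep: inject zero-mean noise at each smooth unit's input
        a1 = xi @ W1 + b1 + rng.normal(0.0, noise_std, size=2)
        h = smooth_sign(a1)
        a2 = h @ W2 + b2 + rng.normal(0.0, noise_std)
        out = smooth_sign(a2)
        # Backward sweep: standard backpropagation through the smooth stand-ins
        err = out - ti
        g2 = err * d_smooth_sign(a2)
        g1 = g2 * W2 * d_smooth_sign(a1)
        W2 -= lr * g2 * h;           b2 -= lr * g2
        W1 -= lr * np.outer(xi, g1); b1 -= lr * g1

# After training, restore the hard signums. The injected noise should have
# pushed unit inputs away from the switching threshold, so the swap should
# introduce little error (convergence may vary with the random seed).
hard = np.sign
preds = np.array([hard(hard(x @ W1 + b1) @ W2 + b2) for x in X])
print(preds, y)
```

Note the design intent of the noise term: parameter settings that leave a unit's input near the switching threshold produce noisy, high‐error outputs during training, so the optimization is driven toward solutions that remain correct once the hard discrete functions are reinstated.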