Output-feedback H∞ quadratic tracking control of linear systems using reinforcement learning
Author(s) -
Rohollah Moghadam,
Frank L. Lewis
Publication year - 2019
Publication title -
International Journal of Adaptive Control and Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.73
H-Index - 66
eISSN - 1099-1115
pISSN - 0890-6327
DOI - 10.1002/acs.2830
Subject(s) - algebraic Riccati equation, reinforcement learning, Riccati equation, control theory, observer, computer science, convergence, linear quadratic regulator, controller, tracking, optimal control, Nash equilibrium, bounded function, mathematical optimization, algebraic equation, mathematics, control, artificial intelligence, nonlinear system, differential equation, mathematical analysis
SUMMARY This paper presents an online learning algorithm based on integral reinforcement learning (IRL) to design an output-feedback (OPFB) H∞ tracking controller for partially unknown linear continuous-time systems. Although reinforcement learning techniques have been successfully applied to find optimal state-feedback controllers, in most control applications it is not practical to measure the full system state, so OPFB controllers are desired. To this end, a general bounded L2-gain tracking problem with a discounted performance function is used for the OPFB H∞ tracking. A tracking game algebraic Riccati equation is then developed that gives a Nash equilibrium solution to the associated min-max optimization problem. An IRL algorithm is developed to solve this game algebraic Riccati equation online without requiring complete knowledge of the system dynamics. The proposed IRL-based algorithm solves an IRL Bellman equation in each iteration online in real time to evaluate an OPFB policy and updates the OPFB gain using the information given by the evaluated policy. An adaptive observer provides the full state needed by the IRL Bellman equation during learning; the observer is no longer needed once learning is finished. A simulation example is provided to verify the convergence of the proposed algorithm to a suboptimal OPFB solution and to demonstrate its performance.
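
For context, the following is a minimal sketch of the discounted bounded L2-gain (zero-sum game) tracking formulation that typically underlies such H∞ tracking designs. The symbols below (augmented state X, augmented matrices A1, B1, D1, weight Q1, discount factor alpha, attenuation level gamma, reference yd) are illustrative assumptions drawn from related literature, not the paper's exact notation, and the output-feedback parameterization used in the paper is not reproduced here.

% Discounted L2-gain tracking performance (illustrative standard form, not the
% paper's exact definition)
\[
J(u,d) \;=\; \int_{t}^{\infty} e^{-\alpha(\tau - t)}
\Big[ (y - y_d)^{\top} Q\,(y - y_d) \;+\; u^{\top} R\,u \;-\; \gamma^{2}\, d^{\top} d \Big]\, d\tau ,
\]
% where d is the disturbance, alpha > 0 the discount factor, and gamma the
% prescribed L2-gain bound. With an augmented state X = [x; x_d] and a quadratic
% value function V(X) = X^T P X, the min-max (Nash) solution is characterized by
% a tracking game algebraic Riccati equation of the form
\[
A_{1}^{\top} P + P A_{1} - \alpha P + Q_{1}
\;-\; P B_{1} R^{-1} B_{1}^{\top} P \;+\; \tfrac{1}{\gamma^{2}}\, P D_{1} D_{1}^{\top} P \;=\; 0 ,
\]
% with the associated saddle-point policies
\[
u^{*} = -R^{-1} B_{1}^{\top} P X , \qquad d^{*} = \tfrac{1}{\gamma^{2}}\, D_{1}^{\top} P X .
\]

As described in the summary, the paper's IRL algorithm evaluates an OPFB policy by solving a data-based Bellman equation over measured trajectory intervals rather than solving such a Riccati equation from a known model.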
