Output-feedback H∞ quadratic tracking control of linear systems using reinforcement learning
Author(s) -
Rohollah Moghadam,
Frank L. Lewis
Publication year - 2019
Publication title -
International Journal of Adaptive Control and Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.73
H-Index - 66
eISSN - 1099-1115
pISSN - 0890-6327
DOI - 10.1002/acs.2830
Subject(s) - algebraic Riccati equation, reinforcement learning, Riccati equation, control theory, observer, computer science, convergence, linear quadratic regulator, controller, tracking, optimal control, Nash equilibrium, bounded function, mathematical optimization, algebraic equation, mathematics, control, artificial intelligence, nonlinear system, differential equation, mathematical analysis
SUMMARY This paper presents an online learning algorithm based on integral reinforcement learning (IRL) to design an output-feedback (OPFB) H∞ tracking controller for partially unknown linear continuous-time systems. Although reinforcement learning techniques have been successfully applied to find optimal state-feedback controllers, in most control applications it is not practical to measure the full system state, so OPFB controllers are desired. To this end, a general bounded L2-gain tracking problem with a discounted performance function is used for the OPFB H∞ tracking. A tracking game algebraic Riccati equation is then developed that gives a Nash equilibrium solution to the associated min-max optimization problem. An IRL algorithm is developed to solve this game algebraic Riccati equation online without requiring complete knowledge of the system dynamics. The proposed IRL-based algorithm solves an IRL Bellman equation in each iteration online in real time to evaluate an OPFB policy and updates the OPFB gain using the information given by the evaluated policy. An adaptive observer provides the full state needed by the IRL Bellman equation during learning; the observer is no longer needed once learning is finished. A simulation example is provided to verify the convergence of the proposed algorithm to a suboptimal OPFB solution and to demonstrate its performance.
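
For context, the following is a minimal sketch of the discounted bounded L2-gain (zero-sum game) tracking formulation that typically underlies such H∞ tracking designs. The symbols below (augmented state X, augmented matrices A1, B1, D1, weight Q1, discount factor alpha, attenuation level gamma, reference yd) are illustrative assumptions drawn from related literature, not the paper's exact notation, and the output-feedback parameterization used in the paper is not reproduced here.

% Discounted L2-gain tracking performance (illustrative standard form, not the
% paper's exact definition)
\[
J(u,d) \;=\; \int_{t}^{\infty} e^{-\alpha(\tau - t)}
\Big[ (y - y_d)^{\top} Q\,(y - y_d) \;+\; u^{\top} R\,u \;-\; \gamma^{2}\, d^{\top} d \Big]\, d\tau ,
\]
% where d is the disturbance, alpha > 0 the discount factor, and gamma the
% prescribed L2-gain bound. With an augmented state X = [x; x_d] and a quadratic
% value function V(X) = X^T P X, the min-max (Nash) solution is characterized by
% a tracking game algebraic Riccati equation of the form
\[
A_{1}^{\top} P + P A_{1} - \alpha P + Q_{1}
\;-\; P B_{1} R^{-1} B_{1}^{\top} P \;+\; \tfrac{1}{\gamma^{2}}\, P D_{1} D_{1}^{\top} P \;=\; 0 ,
\]
% with the associated saddle-point policies
\[
u^{*} = -R^{-1} B_{1}^{\top} P X , \qquad d^{*} = \tfrac{1}{\gamma^{2}}\, D_{1}^{\top} P X .
\]

As described in the summary, the paper's IRL algorithm evaluates an OPFB policy by solving a data-based Bellman equation over measured trajectory intervals rather than solving such a Riccati equation from a known model.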
