
Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete-time multi-agent systems
Author(s) -
Rizvi, Syed Ali Asad
Lin, Zongli
Publication year - 2019
Publication title -
IET Control Theory & Applications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.059
H-Index - 108
eISSN - 1751-8652
pISSN - 1751-8644
DOI - 10.1049/iet-cta.2018.6266
Subject(s) - control theory (sociology) , feed forward , reinforcement learning , algebraic riccati equation , computer science , riccati equation , observer (physics) , state observer , state (computer science) , scheme (mathematics) , controller (irrigation) , optimal control , multi agent system , control (management) , mathematical optimization , mathematics , control engineering , algorithm , artificial intelligence , engineering , nonlinear system , differential equation , mathematical analysis , agronomy , physics , quantum mechanics , biology
This study proposes a model-free distributed output feedback control scheme that achieves synchronisation of the outputs of the heterogeneous follower agents with that of the leader agent in a directed network. A distributed two-degree-of-freedom approach is presented that separates the learning of the optimal output feedback term and the feedforward term of the local control law for each agent. The local feedback parameters are learned using the proposed off-policy Q-learning algorithm, whereas a gradient adaptive law is presented to learn the local feedforward control parameters to achieve asymptotic tracking by each agent. This learning scheme and the resulting distributed control laws neither require access to the local internal state of the agents nor need an additional distributed leader state observer. The proposed approach has an advantage over previous state augmentation approaches in that it circumvents the need to introduce a discounting factor in the local performance functions. It is shown that the proposed algorithm converges to the optimal solution of the algebraic Riccati equation and the output regulator equations without explicitly solving them, as long as the leader agent is directly or indirectly reachable from every follower agent. Simulation results validate the proposed scheme.
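The abstract's central claim, that an off-policy Q-learning scheme can recover the solution of the discrete-time algebraic Riccati equation from input/output data without solving the equation itself, can be illustrated with a minimal single-agent sketch. This is not the paper's distributed output feedback algorithm: the plant matrices, behaviour policy, sample count, and iteration limits below are illustrative assumptions, and the sketch uses full state measurements for simplicity. It shows the off-policy property the abstract relies on: one batch of exploratory data is reused to evaluate and improve every successive target policy.

```python
import numpy as np

# Illustrative discrete-time LTI plant (assumed values, not from the paper)
A = np.array([[0.98, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.array([[1.0]])

def dare_solution(A, B, Q, R, iters=500):
    """Model-based baseline: fixed-point iteration on the discrete-time ARE."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        P = Q + A.T @ P @ A \
            - A.T @ P @ B @ np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A
    return P

def quad_features(z):
    """Upper-triangular quadratic monomials of z, parametrising z' H z."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def H_from_theta(theta, n):
    """Rebuild the symmetric Q-function kernel H from its parameter vector."""
    H, k = np.zeros((n, n)), 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

# One batch of data under an exploratory behaviour policy (collected once,
# then reused off-policy for every policy-evaluation step below)
rng = np.random.default_rng(0)
xs, us, rs, xns = [], [], [], []
x = np.array([1.0, -1.0])
for _ in range(300):
    u = -0.1 * x[1:2] + rng.normal(scale=1.0, size=1)  # behaviour policy + noise
    xn = A @ x + B @ u
    xs.append(x); us.append(u)
    rs.append(x @ Qc @ x + u @ Rc @ u); xns.append(xn)
    x = xn

# Q-function policy iteration: evaluate Q^K from the Bellman equation by
# least squares, then improve the target policy greedily from the H blocks.
K = np.zeros((1, 2))  # initial (stabilising) target policy
for _ in range(20):
    Phi, y = [], []
    for x, u, r, xn in zip(xs, us, rs, xns):
        z  = np.concatenate([x, u])
        zn = np.concatenate([xn, -K @ xn])  # target-policy action at next state
        Phi.append(quad_features(z) - quad_features(zn))  # Bellman residual basis
        y.append(r)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = H_from_theta(theta, 3)
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])  # greedy improvement: Huu^-1 Hux

# Learned gain vs. the gain computed from the model-based ARE solution
P_star = dare_solution(A, B, Qc, Rc)
K_star = np.linalg.solve(Rc + B.T @ P_star @ B, B.T @ P_star @ A)
print(np.round(K, 4), np.round(K_star, 4))
```

Because the dynamics here are deterministic and the exploration noise makes the regressor full rank, the least-squares policy evaluation is exact and the iteration reproduces Hewer's algorithm, so the learned gain matches the ARE-based gain; no discounting factor is needed, in line with the abstract's remark.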