Q‐learning for continuous‐time graphical games on large networks with completely unknown linear system dynamics
Author(s) - Vamvoudakis Kyriakos G.
Publication year - 2016
Publication title - International Journal of Robust and Nonlinear Control
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.361
H-Index - 106
eISSN - 1099-1239
pISSN - 1049-8923
DOI - 10.1002/rnc.3719
Subject(s) - reinforcement learning , convergence (economics) , computer science , nash equilibrium , mathematical optimization , function (biology) , gradient descent , bellman equation , stability (learning theory) , stability theory , mathematical proof , synchronization (alternating current) , artificial neural network , control (management) , mathematics , control theory (sociology) , nonlinear system , artificial intelligence , computer network , channel (broadcasting) , physics , geometry , quantum mechanics , evolutionary biology , machine learning , economics , biology , economic growth
Summary In this paper, we consider the problem of leader synchronization in systems of interacting agents on large networks while simultaneously satisfying energy-related, user-defined distributed optimization criteria. Because modeling such large networks is difficult, we derive a model-free formulation based on a separate distributed Q-learning function for every agent. Each Q-function is parametrized by the agent's own control, the controls of its neighbors, and the neighborhood tracking error. None of the agents has any information about where the leader is connected or from where it spreads the desired information. The proposed algorithm uses an integral reinforcement learning approach with a separate distributed actor/critic network for each agent: a critic approximator to approximate each value function and an actor approximator to approximate each optimal control law. The tuning laws for the actor and critic approximators are designed using gradient descent. We provide rigorous stability and convergence proofs showing that the closed-loop system has an asymptotically stable equilibrium point and that the control policies form a graphical Nash equilibrium. We demonstrate the effectiveness of the proposed method on a network of 10 agents. Copyright © 2016 John Wiley & Sons, Ltd.
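To make the actor/critic tuning idea in the abstract concrete, the following is a minimal sketch of integral-reinforcement Q-learning with gradient-descent critic and actor updates for a single agent with scalar, unknown tracking-error dynamics. The dynamics, quadratic basis, gains, probing noise, and single-agent setting are all illustrative assumptions and not the paper's multi-agent construction or its exact tuning laws.

```python
# Sketch: gradient-descent actor/critic tuning with integral reinforcement learning.
# Everything below (scalar dynamics, quadratic Q-basis, gains) is an assumption for
# illustration only; it is not the paper's distributed graphical-game algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar neighborhood tracking-error dynamics, unknown to the learner:
#   e_dot = a*e + b*u, integrated with step dt over a reinforcement window T.
a_true, b_true = 0.5, 1.0
dt, T = 0.01, 0.05
gamma_c, gamma_a = 5.0, 1.0       # assumed critic / actor gradient-descent gains
Q, R = 1.0, 1.0                   # stage-cost weights: r = Q*e^2 + R*u^2

def phi(e, u):
    """Quadratic basis for the Q-function approximation Q_hat(e, u) = w @ phi(e, u)."""
    return np.array([e * e, e * u, u * u])

w = rng.normal(scale=0.1, size=3)   # critic weights
k = 0.0                             # actor parameter: control law u = -k * e

e = 1.0                             # initial neighborhood tracking error
for step in range(4000):
    # Apply the current policy plus probing noise and accumulate the integral
    # reinforcement rho = int_t^{t+T} r dt by rolling the (unknown) dynamics forward.
    u = -k * e + 0.1 * np.sin(0.3 * step)
    rho, e_next = 0.0, e
    for _ in range(int(T / dt)):
        rho += (Q * e_next**2 + R * u**2) * dt
        e_next += (a_true * e_next + b_true * u) * dt

    u_next = -k * e_next
    # Integral Bellman (temporal-difference) residual for the Q-function:
    #   delta = Q_hat(e, u) - [ rho + Q_hat(e_next, u_next) ]
    delta = w @ phi(e, u) - (rho + w @ phi(e_next, u_next))

    # Critic: normalized gradient descent on 0.5 * delta^2.
    grad_w = delta * (phi(e, u) - phi(e_next, u_next))
    w -= gamma_c * dt * grad_w / (1.0 + grad_w @ grad_w)

    # Actor: gradient step moving the gain toward the minimizer of w @ phi(e, u)
    # in u, which for this quadratic basis is u* = -(w_eu / (2 * w_uu)) * e.
    w_eu, w_uu = w[1], max(w[2], 1e-3)
    k += gamma_a * dt * (w_eu / (2.0 * w_uu) - k)

    e = e_next

print("learned feedback gain k =", k)
print("critic weights w =", w)
```

In the paper's setting, one such critic/actor pair would run per agent, with the basis depending on the agent's own control, its neighbors' controls, and the neighborhood tracking error; the scalar example above only illustrates the gradient-descent and integral-reinforcement structure of the updates.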