Q‐learning for continuous‐time graphical games on large networks with completely unknown linear system dynamics
Author(s) - Vamvoudakis Kyriakos G.
Publication year - 2016
Publication title - International Journal of Robust and Nonlinear Control
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.361
H-Index - 106
eISSN - 1099-1239
pISSN - 1049-8923
DOI - 10.1002/rnc.3719
Subject(s) - reinforcement learning , convergence (economics) , computer science , nash equilibrium , mathematical optimization , function (biology) , gradient descent , bellman equation , stability (learning theory) , stability theory , mathematical proof , synchronization (alternating current) , artificial neural network , control (management) , mathematics , control theory (sociology) , nonlinear system , artificial intelligence , computer network , channel (broadcasting) , physics , geometry , quantum mechanics , evolutionary biology , machine learning , economics , biology , economic growth
Summary In this paper, we consider the problem of leader synchronization in systems of interacting agents on large networks while simultaneously satisfying energy-related, user-defined distributed optimization criteria. Because modeling such large networks is difficult, we derive a model-free formulation based on a separate distributed Q-learning function for every agent. Each Q-function is parametrized by the agent's own control, the controls of its neighbors, and the neighborhood tracking error. None of the agents has any information about where the leader is connected or from where it spreads the desired information. The proposed algorithm uses an integral reinforcement learning approach with a separate distributed actor/critic network for each agent: a critic approximator to approximate each value function and an actor approximator to approximate each optimal control law. The tuning laws for the actor and critic approximators are designed using gradient descent. We provide rigorous stability and convergence proofs showing that the closed-loop system has an asymptotically stable equilibrium point and that the control policies form a graphical Nash equilibrium. We demonstrate the effectiveness of the proposed method on a network of 10 agents. Copyright © 2016 John Wiley & Sons, Ltd.
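To make the actor/critic tuning idea in the abstract concrete, the following is a minimal sketch of integral-reinforcement Q-learning with gradient-descent critic and actor updates for a single agent with scalar, unknown tracking-error dynamics. The dynamics, quadratic basis, gains, probing noise, and single-agent setting are all illustrative assumptions and not the paper's multi-agent construction or its exact tuning laws.

```python
# Sketch: gradient-descent actor/critic tuning with integral reinforcement learning.
# Everything below (scalar dynamics, quadratic Q-basis, gains) is an assumption for
# illustration only; it is not the paper's distributed graphical-game algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar neighborhood tracking-error dynamics, unknown to the learner:
#   e_dot = a*e + b*u, integrated with step dt over a reinforcement window T.
a_true, b_true = 0.5, 1.0
dt, T = 0.01, 0.05
gamma_c, gamma_a = 5.0, 1.0       # assumed critic / actor gradient-descent gains
Q, R = 1.0, 1.0                   # stage-cost weights: r = Q*e^2 + R*u^2

def phi(e, u):
    """Quadratic basis for the Q-function approximation Q_hat(e, u) = w @ phi(e, u)."""
    return np.array([e * e, e * u, u * u])

w = rng.normal(scale=0.1, size=3)   # critic weights
k = 0.0                             # actor parameter: control law u = -k * e

e = 1.0                             # initial neighborhood tracking error
for step in range(4000):
    # Apply the current policy plus probing noise and accumulate the integral
    # reinforcement rho = int_t^{t+T} r dt by rolling the (unknown) dynamics forward.
    u = -k * e + 0.1 * np.sin(0.3 * step)
    rho, e_next = 0.0, e
    for _ in range(int(T / dt)):
        rho += (Q * e_next**2 + R * u**2) * dt
        e_next += (a_true * e_next + b_true * u) * dt

    u_next = -k * e_next
    # Integral Bellman (temporal-difference) residual for the Q-function:
    #   delta = Q_hat(e, u) - [ rho + Q_hat(e_next, u_next) ]
    delta = w @ phi(e, u) - (rho + w @ phi(e_next, u_next))

    # Critic: normalized gradient descent on 0.5 * delta^2.
    grad_w = delta * (phi(e, u) - phi(e_next, u_next))
    w -= gamma_c * dt * grad_w / (1.0 + grad_w @ grad_w)

    # Actor: gradient step moving the gain toward the minimizer of w @ phi(e, u)
    # in u, which for this quadratic basis is u* = -(w_eu / (2 * w_uu)) * e.
    w_eu, w_uu = w[1], max(w[2], 1e-3)
    k += gamma_a * dt * (w_eu / (2.0 * w_uu) - k)

    e = e_next

print("learned feedback gain k =", k)
print("critic weights w =", w)
```

In the paper's setting, one such critic/actor pair would run per agent, with the basis depending on the agent's own control, its neighbors' controls, and the neighborhood tracking error; the scalar example above only illustrates the gradient-descent and integral-reinforcement structure of the updates.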