Multi-Agent Q-Learning via Best Choice Dynamics
Author(s) - Scott Addams, Jorge Cortes
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3612969
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Motivated by multi-agent Q-learning scenarios, this paper introduces a distributed action selection algorithm in which individual agents interact with local neighbors to learn a joint action. The algorithm, termed Best Choice Dynamics, has each agent communicate its currently planned action to its neighbors, who in turn use this information to update their own actions. We characterize the convergence of the algorithm and its robustness against message losses, showing that it converges to locally optimal joint actions in finite time. We also discuss its advantages over message-passing algorithms and best response dynamics in terms of convergence guarantees, absence of oscillations, and communication complexity. We illustrate the algorithm's performance in various simulation scenarios, including both online training and offline training with distributed online rollout.
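This page reproduces only the abstract, so the code below is a minimal illustrative sketch rather than the authors' algorithm. It assumes each agent keeps a local Q-table indexed by its own action and its neighbors' planned actions, that agents are swept in a fixed order, and that an agent switches its plan only on strict improvement; these choices, along with all names, are hypothetical.

```python
# Hypothetical sketch of a best-choice-style distributed action update.
# Nothing here is taken from the paper: the sequential sweep order, the
# strict-improvement rule, and the Q-table layout are all assumptions.
from typing import Dict, List, Tuple

Action = int
# Each agent's local Q-values, keyed by (own action, *neighbor actions).
QTable = Dict[Tuple[Action, ...], float]

def best_choice_sweep(
    actions: Dict[int, Action],       # each agent's currently planned action
    neighbors: Dict[int, List[int]],  # local communication graph
    q: Dict[int, QTable],             # each agent's learned local Q-values
    action_set: List[Action],
) -> bool:
    """One sweep: agents take turns reading their neighbors' communicated
    plans and switch only when some alternative action is strictly better.
    Returns True if any agent changed its plan."""
    changed = False
    for i in sorted(neighbors):  # assumed fixed update order
        context = tuple(actions[j] for j in neighbors[i])
        best_a = max(action_set, key=lambda a: q[i][(a, *context)])
        # Strict improvement only, so ties never trigger a switch.
        if q[i][(best_a, *context)] > q[i][(actions[i], *context)]:
            actions[i] = best_a
            changed = True
    return changed

def run_until_stable(actions, neighbors, q, action_set, max_sweeps=1000):
    """Iterate sweeps until no agent changes its plan (or the sweep budget
    runs out). The stopping point stands in for the finite-time convergence
    to locally optimal joint actions that the paper proves for its own
    dynamics, under its own conditions."""
    for _ in range(max_sweeps):
        if not best_choice_sweep(actions, neighbors, q, action_set):
            break
    return actions
```

Sweeping agents one at a time under a strict-improvement rule is one simple way to avoid the simultaneous-switch oscillations that synchronous best response dynamics can exhibit, which is the kind of advantage the abstract attributes to Best Choice Dynamics; the paper's actual coordination mechanism may differ.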
