
Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems.
Author(s) -
Laëtitia Matignon,
Guillaume J. Laurent,
Nadine Le Fort-Piat
Publication year - 2012
Publication title -
HAL (Centre pour la Communication Scientifique Directe)
Language(s) - English
DOI - 10.1017/s0269888912000057
Subject(s) - Markov chain, coordination game, reinforcement, mathematics education, computer science, psychology, mathematics, mathematical economics, social psychology, machine learning
Abstract - In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive FMQ and WoLF-PHC. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
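Among the Q-learning variants listed above, hysteretic Q-learning illustrates the general idea behind several of them: damping the penalties an independent learner receives when a teammate explores. The sketch below is a minimal, hypothetical rendering of that dual-learning-rate update (the function name, signature and default rates are illustrative assumptions, not the paper's exact formulation): positive temporal-difference errors are applied with a larger rate `alpha`, negative ones with a smaller rate `beta`.

```python
def hysteretic_q_update(q, state, action, reward, next_q_max,
                        alpha=0.1, beta=0.01, gamma=0.9):
    """One hysteretic Q-learning update (illustrative sketch).

    Optimistic independent learners apply a large rate `alpha` to
    positive TD errors and a smaller rate `beta` to negative ones,
    so bad outcomes caused by teammates' exploration erode the
    Q-value more slowly than good outcomes reinforce it.
    """
    delta = reward + gamma * next_q_max - q[(state, action)]
    rate = alpha if delta >= 0 else beta   # asymmetry = "hysteresis"
    q[(state, action)] += rate * delta
    return q[(state, action)]

# Toy usage: a single state-action pair, one reward then one penalty.
q = {("s", "a"): 0.0}
hysteretic_q_update(q, "s", "a", reward=1.0, next_q_max=0.0)   # rises by alpha
hysteretic_q_update(q, "s", "a", reward=-1.0, next_q_max=0.0)  # falls only by beta
```

With `beta < alpha`, occasional negative rewards (e.g. from an exploring partner) barely lower the estimate, which is the optimism the survey credits with mitigating alter-exploration and shadowed equilibria.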