Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer

Open Access

Author(s) - Yujing Hu, Yang Gao, Bo An

Publication year - 2015

Publication title - IEEE Transactions on Cybernetics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.109

H-Index - 124

eISSN - 2168-2275

pISSN - 2168-2267

DOI - 10.1109/tcyb.2014.2349152

Subject(s) - signal processing and analysis, communication, networking and broadcast technologies, robotics and control systems, general topics for engineers, components, circuits, devices and systems, computing and processing, power, energy and industry applications

An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
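
The key idea stated in the abstract, reusing a previously computed equilibrium while each agent has only a small incentive to deviate, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the formal definitions of transfer loss or the transfer condition, so the sketch assumes a two-agent one-shot game, measures each agent's incentive as its best gain from a unilateral deviation, and compares it with a hypothetical threshold `epsilon`. The function names `deviation_gains` and `can_transfer` are illustrative only.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff of a 2-agent matrix game under mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * y[j] * payoff[i][j]
               for i in range(len(x)) for j in range(len(y)))


def deviation_gains(payoffs, x, y):
    """Largest gain each agent could obtain by unilaterally switching to a pure action.

    payoffs is a pair (A, B): A[i][j] is the row agent's payoff and B[i][j] the
    column agent's payoff when the row agent plays i and the column agent plays j.
    """
    A, B = payoffs
    u_row = expected_payoff(A, x, y)
    u_col = expected_payoff(B, x, y)
    # Best payoff reachable by each agent while the other keeps its strategy fixed.
    best_row = max(sum(y[j] * A[i][j] for j in range(len(y))) for i in range(len(x)))
    best_col = max(sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y)))
    return best_row - u_row, best_col - u_col


def can_transfer(payoffs, x, y, epsilon):
    """Assumed transfer condition: reuse (x, y) if no agent can gain more than epsilon."""
    return max(deviation_gains(payoffs, x, y)) <= epsilon


# Equilibrium computed on an earlier visit to a state: both agents play action 0.
old_x, old_y = [1.0, 0.0], [1.0, 0.0]

# Game at the next visit, after the payoff estimates have changed slightly.
similar_game = ([[1.9, 0.1], [1.95, 1.0]],
                [[2.0, 0.0], [0.1, 1.1]])

# Game at a later visit, where the row agent now clearly prefers action 1.
changed_game = ([[1.0, 0.0], [1.5, 1.0]],
                [[2.0, 0.0], [0.0, 1.0]])

reuse_similar = can_transfer(similar_game, old_x, old_y, epsilon=0.1)
reuse_changed = can_transfer(changed_game, old_x, old_y, epsilon=0.1)
```

In the first game the row agent could gain only about 0.05 by deviating, so under this assumed condition the old equilibrium would be reused and the expensive equilibrium computation skipped. In the second game the gain is 0.5, so a new equilibrium would have to be computed.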
