
Prioritized Experience Replay in Multi-Actor-Attention-Critic for Reinforcement Learning
Author(s) - Sheng Fan, Guanghua Song, Bowei Yang, Xiaohong Jiang
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1631/1/012040
Subject(s) - reinforcement learning, computer science, reuse, convergence (economics), scalability, metric (unit), selection (genetic algorithm), artificial intelligence, operations management, database, economics, economic growth, ecology, biology
Experience replay is an important technique in off-policy reinforcement learning (RL): it lets an agent reuse past experience and reduces the correlation between training samples. Multi-Actor-Attention-Critic (MAAC) is a successful off-policy multi-agent reinforcement learning algorithm with good scalability. To accelerate convergence, we use prioritized experience replay (PER) to optimize experience selection in MAAC and propose the PER-MAAC algorithm. In PER-MAAC, the sampling priority is based on the temporal-difference (TD) error observed during training. The algorithm is evaluated in the Multi-UAV Cooperative Navigation and Rover-Tower scenarios. The experimental results show that PER-MAAC improves the speed of convergence.
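As background for the abstract, the sketch below shows how TD-error-based prioritized replay is typically implemented, following the standard proportional PER scheme (Schaul et al., 2016): each transition's priority is its absolute TD error, sampling probability is P(i) = p_i^alpha / sum_k p_k^alpha, and importance-sampling weights correct the resulting bias. This is a minimal illustration under those standard assumptions; the class name, hyperparameter values, and buffer layout are illustrative and not taken from the PER-MAAC paper itself.

```python
# Minimal sketch of proportional prioritized experience replay.
# Assumption: standard PER (Schaul et al., 2016), not the authors' exact code;
# alpha, beta, and eps are common default-style values chosen for illustration.
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities shape sampling (0 = uniform)
        self.beta = beta        # strength of the importance-sampling correction
        self.eps = eps          # keeps every priority strictly positive
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.storage)
        prios = self.priorities[:n]
        probs = prios ** self.alpha
        probs /= probs.sum()                      # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(n, batch_size, p=probs)
        # Importance-sampling weights w_i = (N * P(i))^(-beta), normalized by the max.
        weights = (n * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # After a critic update, the priority of each replayed transition
        # becomes its new absolute TD error.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a PER-style training loop, one would sample a batch, scale each sample's critic loss by its importance weight, and then call update_priorities with the freshly computed TD errors so that poorly predicted transitions are replayed more often.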