Implementing action mask in proximal policy optimization (PPO) algorithm
Author(s) -
Cheng-Yen Tang,
Chien-Hung Liu,
Woei-Kae Chen,
Shingchern D. You
Publication year - 2020
Publication title -
ICT Express
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.733
H-Index - 22
ISSN - 2405-9595
DOI - 10.1016/j.icte.2020.05.003
Subject(s) - reinforcement learning , algorithm , computer science , mathematical optimization , artificial intelligence
The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose adding an action mask to the PPO algorithm. The mask indicates, for each state, whether an action is valid or invalid. Simulation results show that, compared with the original version, the proposed algorithm yields a much higher return with a moderate number of training steps. Therefore, incorporating such a mask, where applicable, is useful and valuable.
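The core idea of an action mask can be sketched as follows. This is a minimal, hypothetical illustration in NumPy, not the authors' exact implementation: invalid actions have their logits pushed to a large negative value before the softmax, so the policy assigns them (numerically) zero probability.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Convert policy logits to action probabilities, zeroing invalid actions.

    logits: array of raw policy network outputs, one per action.
    mask:   boolean array, True where the action is valid in the current state.
    """
    # Replace invalid-action logits with a large negative value so that
    # exp(logit) underflows to ~0 after the softmax.
    masked = np.where(mask, logits, -1e9)
    z = masked - masked.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: 4 actions, but action 1 is invalid in this state.
logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([True, False, True, True])
probs = masked_softmax(logits, mask)
```

In a full PPO implementation, the same mask would also be applied when computing the log-probabilities used in the clipped surrogate objective, so that sampling and the policy-gradient update stay consistent.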