Implementing action mask in proximal policy optimization (PPO) algorithm
Author(s) -
Cheng-Yen Tang,
Chien-Hung Liu,
Woei-Kae Chen,
Shingchern D. You
Publication year - 2020
Publication title -
ICT Express
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.733
H-Index - 22
ISSN - 2405-9595
DOI - 10.1016/j.icte.2020.05.003
Subject(s) - reinforcement learning , algorithm , computer science , mathematical optimization , artificial intelligence
The proximal policy optimization (PPO) algorithm is a promising algorithm in reinforcement learning. In this paper, we propose adding an action mask to the PPO algorithm. The mask indicates, for each state, whether an action is valid or invalid. Simulation results show that, compared with the original version, the proposed algorithm yields a much higher return with a moderate number of training steps. Therefore, incorporating such a mask, where applicable, is useful and valuable.
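The core idea of an action mask can be sketched as follows. This is a minimal, hypothetical illustration in NumPy, not the authors' exact implementation: invalid actions have their logits pushed to a large negative value before the softmax, so the policy assigns them (numerically) zero probability.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Convert policy logits to action probabilities, zeroing invalid actions.

    logits: array of raw policy network outputs, one per action.
    mask:   boolean array, True where the action is valid in the current state.
    """
    # Replace invalid-action logits with a large negative value so that
    # exp(logit) underflows to ~0 after the softmax.
    masked = np.where(mask, logits, -1e9)
    z = masked - masked.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: 4 actions, but action 1 is invalid in this state.
logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([True, False, True, True])
probs = masked_softmax(logits, mask)
```

In a full PPO implementation, the same mask would also be applied when computing the log-probabilities used in the clipped surrogate objective, so that sampling and the policy-gradient update stay consistent.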