Open Access
Take full advantage of demonstration data in deep reinforcement learning
Author(s) - Yong-Xu Zhang
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2010/1/012063
Subject(s) - reinforcement learning, computer science, artificial intelligence, machine learning, trajectory, reinforcement, cloning (programming), deep learning, engineering, physics, structural engineering, astronomy, programming language
Deep reinforcement learning (DRL) algorithms have achieved great breakthroughs in many tasks, yet they still suffer from problems such as random exploration and sparse rewards. Recently, some reinforcement learning from demonstration (RLfD) methods have shown promise in overcoming these problems. However, they usually require a considerable number of demonstrations, and the demonstration data may not be optimal, which implies a lack of exploitation. To address this, we propose a novel algorithm in which a behavior cloning (BC) network learns from the demonstration data and then guides the reinforcement learning agent; when the agent learns and explores a better trajectory, the BC network in turn learns from that trajectory to improve itself, so that the BC network and the agent improve together.
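The loop the abstract describes can be sketched roughly as follows. This is a minimal illustration under assumed details, not the paper's implementation: the toy environment, the network size, the epsilon-greedy guidance, and the rule of refreshing the BC dataset whenever a rollout beats the best return seen so far are all assumptions filled in for the example.

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEnv:
    """1-D chain: action 1 moves right, action 0 moves left; +1 reward at the
    goal, -0.01 per step, so shorter successful trajectories score higher."""
    def __init__(self, length=10):
        self.length, self.pos = length, 0
    def reset(self):
        self.pos = 0
        return torch.tensor([float(self.pos)])
    def step(self, action):
        self.pos = max(0, min(self.length, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length
        return torch.tensor([float(self.pos)]), (1.0 if done else -0.01), done

def run_episode(env, act_fn, max_steps=100):
    """Roll out one episode; return the (state, action) trajectory and return."""
    traj, ret, s, done = [], 0.0, env.reset(), False
    for _ in range(max_steps):
        a = act_fn(s)
        traj.append((s, a))
        s, r, done = env.step(a)
        ret += r
        if done:
            break
    return traj, ret

def train_bc(bc, dataset, epochs=100, lr=1e-2):
    """Standard behavior cloning: supervised learning on (state, action) pairs."""
    opt = torch.optim.Adam(bc.parameters(), lr=lr)
    states = torch.stack([s for s, _ in dataset])
    actions = torch.tensor([a for _, a in dataset])
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(bc(states), actions).backward()
        opt.step()

env = ToyEnv()
bc = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))

# Collect imperfect demonstrations (the demonstrator sometimes moves the
# wrong way, so its trajectories are suboptimal).
demos, best_return = [], -float("inf")
for _ in range(5):
    traj, ret = run_episode(env, lambda s: 1 if random.random() > 0.2 else 0)
    demos.extend(traj)
    best_return = max(best_return, ret)

train_bc(bc, demos)  # the BC network first learns the demonstrations

def guided_action(s, epsilon=0.2):
    """BC-guided exploration: follow the BC policy with epsilon-random actions."""
    with torch.no_grad():
        return random.randrange(2) if random.random() < epsilon else int(bc(s).argmax())

# Joint improvement loop: the BC policy guides the agent's exploration, and
# any trajectory better than the best seen so far is fed back into BC training,
# so the BC network and the agent improve together.
for it in range(20):
    traj, ret = run_episode(env, guided_action)
    if ret > best_return:          # the agent found a better trajectory...
        best_return = ret
        demos.extend(traj)         # ...so the BC network learns from it too
        train_bc(bc, demos)

A real implementation would replace the toy pieces with the paper's actual environment and a proper DRL agent (the sketch reuses the BC policy itself for acting); the point here is only the feedback structure between the BC network and the exploring agent.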
