Prioritised experience replay based on sample optimisation
Author(s) - Wang Xuesong, Xiang Haopeng, Cheng Yuhu, Yu Qiang
Publication year - 2020
Publication title - The Journal of Engineering
Language(s) - English
Resource type - Journals
ISSN - 2051-3305
DOI - 10.1049/joe.2019.1204
Subject(s) - reinforcement learning, computer science, machine learning, artificial intelligence, sampling (signal processing), process (computing), value (mathematics)
The sample-based prioritised experience replay proposed in this study addresses how samples are selected for the experience replay buffer, improving training speed and increasing the reward return. A traditional deep Q-network (DQN) places samples into the experience replay at random, yet each sample contributes differently to the agent's training, so a better sampling method makes training more effective. Therefore, when selecting samples for the experience replay, the authors first let the agent explore randomly through the sample-optimisation network and average the value returned after each run; this mean is then used as a threshold for admitting samples into the experience replay. Second, building on sample optimisation, the authors add a priority update and apply the idea of reward shaping to give additional reward values to the returns of certain samples, which further speeds up agent training. Using OpenAI Gym as the evaluation platform, the proposed method improves the agent's learning efficiency compared with the traditional DQN and the prioritised-experience-replay DQN.
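The abstract describes two mechanisms: admitting transitions to the replay buffer only when the associated return clears a threshold set to the mean return of an initial random warm-up phase, and a reward-shaping bonus applied on top of the priority update. The paper gives no code here, so the following Python sketch is illustrative only; the class name ThresholdReplayBuffer, the shaping_bonus parameter, and the per-episode-return admission rule are assumptions rather than the authors' implementation.

```python
import random
from collections import deque

import numpy as np


class ThresholdReplayBuffer:
    """Illustrative sketch of threshold-based sample admission with reward shaping.

    Transitions are admitted only if the return of their episode meets a
    threshold derived from the mean return of a random warm-up phase.
    """

    def __init__(self, capacity=50_000, shaping_bonus=0.5):
        self.buffer = deque(maxlen=capacity)
        self.threshold = None           # set once after the warm-up phase
        self.shaping_bonus = shaping_bonus

    def set_threshold(self, warmup_returns):
        # Mean return of random-policy episodes acts as the admission threshold.
        self.threshold = float(np.mean(warmup_returns))

    def add(self, state, action, reward, next_state, done, episode_return):
        if self.threshold is not None:
            # Sample optimisation: discard transitions from low-return episodes.
            if episode_return < self.threshold:
                return
            # Reward shaping: small bonus for transitions from high-return episodes.
            reward = reward + self.shaping_bonus
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling over the already-filtered buffer.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
        return states, actions, rewards, next_states, dones
```

In this sketch the filtering happens at insertion time, so the downstream DQN training loop can keep its usual uniform (or prioritised) sampling logic unchanged.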
