Open Access

Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference

Author(s) - Nobuhiko Yamaguchi

Publication year - 2020

Publication title - Journal of Advanced Computational Intelligence and Intelligent Informatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.172

H-Index - 20

eISSN - 1883-8014

pISSN - 1343-0130

DOI - 10.20965/jaciii.2020.p0711

Subject(s) - overfitting, reinforcement learning, computer science, inference, Bayesian inference, artificial intelligence, machine learning, Bayesian probability, maximization, mathematical optimization, mathematics, artificial neural network

Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. Peters et al. proposed reward-weighted regression (RWR) as a direct policy search method. The RWR algorithm estimates the policy parameters with the expectation-maximization (EM) algorithm and is therefore prone to overfitting. In this study, we use variational Bayesian inference to avoid overfitting and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments involving a mountain car task and a ball batting task. These experiments demonstrate that VBRL yields a higher average return than RWR.
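As a rough illustration of the kind of update RWR performs, the following is a minimal sketch of a reward-weighted M-step. It assumes a linear-Gaussian policy with scalar actions, which the abstract does not specify. The function name `rwr_m_step`, its arguments, and the `prior_precision` ridge term are all hypothetical. The ridge term corresponds to a fixed Gaussian prior on the parameters and is only a crude stand-in for regularization; it is not the variational Bayesian procedure proposed in the paper.

```python
import numpy as np


def rwr_m_step(features, actions, weights, prior_precision=0.0):
    """Reward-weighted least-squares fit of a linear-Gaussian policy (sketch).

    features: (N, D) array of state features, one row per sample
    actions:  (N,) array of scalar actions taken
    weights:  (N,) array of non-negative weights derived from returns
    prior_precision: hypothetical ridge term; 0.0 gives the plain
        weighted maximum-likelihood fit, which can overfit small samples
    """
    features = np.asarray(features, dtype=float)
    actions = np.asarray(actions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")

    # Weighted normal equations: (Phi^T W Phi + alpha I) theta = Phi^T W a
    weighted = features * weights[:, None]
    gram = features.T @ weighted + prior_precision * np.eye(features.shape[1])
    theta = np.linalg.solve(gram, weighted.T @ actions)

    # Reward-weighted estimate of the policy's action noise variance
    residuals = actions - features @ theta
    variance = float(weights @ residuals**2 / weights.sum())
    return theta, variance
```

With `prior_precision=0.0` the parameters are a pure point estimate that fits the weighted samples as closely as possible. A positive value shrinks the parameters toward zero, which hints at why placing a prior over the policy parameters can reduce overfitting.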
