A kernel based true online Sarsa(λ) for continuous space control problems

Open Access

Author(s) - Fei Zhu, Haijun Zhu, Yuchen Fu, Donghuo Chen, Xiaoke Zhou

Publication year - 2017

Publication title - Computer Science and Information Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.244

H-Index - 24

eISSN - 2406-1018

pISSN - 1820-0214

DOI - 10.2298/csis170107029z

Subject(s) - computer science, reinforcement learning, convergence (economics), mathematical optimization, kernel (algebra), bellman equation, cluster analysis, heuristic, algorithm, artificial intelligence, mathematics, combinatorics, economics, economic growth

Reinforcement learning is an efficient learning method for control problems: the agent obtains an optimal policy by interacting with the environment. However, it faces challenges such as low convergence accuracy and slow convergence. Moreover, conventional reinforcement learning algorithms can hardly solve continuous control problems. Kernel-based methods can accelerate convergence and improve convergence accuracy, and the policy gradient method is a good way to deal with continuous space problems. We proposed a Sarsa(λ) version of the true online temporal difference algorithm, named True Online Sarsa(λ) (TOSarsa(λ)), on the basis of a clustering-based sample specification method and a selective kernel-based value function. The TOSarsa(λ) algorithm gives results consistent with both the forward view and the backward view, which ensures that an optimal policy is obtained in less time. We also combined TOSarsa(λ) with heuristic dynamic programming. The experiments showed that the proposed algorithm worked well on continuous control problems.
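The abstract does not give the update rule, so the following is only a minimal sketch of one step of the standard true online Sarsa(λ) update with linear function approximation, not the authors' TOSarsa(λ) implementation. The function names, the Gaussian kernel over fixed centers, and the `width` parameter are assumptions made for illustration. The fixed centers stand in for whatever the paper's clustering-based sample specification method would select.

```python
import numpy as np

def rbf_features(state, action, centers, n_actions, width=0.5):
    """Gaussian kernel activations of `state` against fixed centers,
    placed in the block of the feature vector that belongs to `action`.
    (Hypothetical feature map; the paper's selective kernel may differ.)"""
    state = np.atleast_1d(np.asarray(state, dtype=float))
    sq_dist = np.sum((centers - state) ** 2, axis=1)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    x = np.zeros(n_actions * len(centers))
    x[action * len(centers):(action + 1) * len(centers)] = phi
    return x

def true_online_sarsa_step(w, z, q_old, x, x_next, reward,
                           alpha, gamma, lam, terminal=False):
    """One true online Sarsa(lambda) update with a dutch eligibility trace.

    w: weight vector, z: eligibility trace, q_old: the next-state value
    estimate carried over from the previous step (0 at episode start).
    Returns the updated (w, z, q_old)."""
    q = w @ x
    q_next = 0.0 if terminal else w @ x_next
    delta = reward + gamma * q_next - q                  # TD error
    # Dutch trace: decay, then add x scaled by a correction term.
    z = gamma * lam * z + (1.0 - alpha * gamma * lam * (z @ x)) * x
    # The (q - q_old) terms are what distinguish the true online update
    # from conventional Sarsa(lambda) with accumulating traces.
    w = w + alpha * (delta + q - q_old) * z - alpha * (q - q_old) * x
    return w, z, q_next
```

In an episode loop, `w`, `z`, and `q_old` would be threaded through successive calls, with `z` and `q_old` reset to zero at the start of each episode and `x_next` built from the next state and the action chosen for it (for example, ε-greedily).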
