Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation | Zendy

Seiya Kuroda | Zendy; Kazuteru Miyazaki | Zendy; Hiroaki Kobayashi | Zendy

Open Access

Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation

Author(s) -

Seiya Kuroda,

Kazuteru Miyazaki,

Hiroaki Kobayashi

Publication year - 2012

Publication title -

journal of advanced computational intelligence and intelligent informatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.172

H-Index - 20

eISSN - 1343-0130

pISSN - 1883-8014

DOI - 10.20965/jaciii.2012.p0758

Subject(s) - reinforcement learning , computer science , trajectory , task (project management) , robot , probabilistic logic , mode (computer interface) , artificial intelligence , q learning , control theory (sociology) , control (management) , human–computer interaction , engineering , physics , astronomy , systems engineering

During a long-term reinforcement learning task, the efficiency of learning is heavily degraded because the probabilistic actions of an agent often cause the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, a fixed mode state is proposed in this paper. If the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In this mode, the agent selects an action using a greedy strategy, i.e., it selects the highest weight action deterministically. First, this paper combines Online Profit Sharing reinforcement learning with the Penalty Avoiding Rational Policy Making algorithm, then introduces fixed mode states in it. The target task is then formulated, i.e., learning the modified waist trajectory of dynamically stable walking task based on the static stable walking of a biped robot. Finally, we present our simulation results and discuss the effectiveness of the proposed method.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Accelerating Research