Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm

Open Access

Author(s) - Kazuteru Miyazaki

Publication year - 2012

Publication title - Journal of Advanced Computational Intelligence and Intelligent Informatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.172

H-Index - 20

pISSN - 1343-0130

eISSN - 1883-8014

DOI - 10.20965/jaciii.2012.p0183

Subject(s) - computer science, reinforcement learning, process (computing), algorithm, mathematical optimization, action (physics), artificial intelligence, mathematics, physics, quantum mechanics, operating system

Applying reinforcement learning to real-world problems sometimes requires handling continuous-valued input and output. We previously proposed Exploitation-oriented Learning (XoL), an approach that strongly reinforces successful experience and thereby reduces the number of trial-and-error searches. A method based on Penalty-Avoiding Rational Policy Making (PARP) has been proposed as an XoL method that handles continuous-valued input, but it does not handle actions with continuous-valued output. We study how to treat continuous-valued output in an XoL method when the environment includes both a reward and a penalty, and we extend PARP from continuous-valued input to continuous-valued output. We apply our proposal to the pole-cart balancing problem and a biped LEGO robot, and confirm its effectiveness.
