Adaptive Reinforcement Learning and Its Application to Robot Compliance Learning
Author(s) - Boo-Ho Yang, Haruhiko Asada
Publication year - 1995
Publication title - Journal of Robotics and Mechatronics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.257
H-Index - 19
eISSN - 1883-8049
pISSN - 0915-3942
DOI - 10.20965/jrm.1995.p0250
Subject(s) - reinforcement learning, grasp, robot, computer science, artificial intelligence, adaptive control, robot learning, robustness (evolution), control theory (sociology), control (management), mobile robot, biochemistry, chemistry, programming language, gene
A new learning algorithm for connectionist networks that solves a class of optimal control problems is presented. The algorithm, called the Adaptive Reinforcement Learning Algorithm, employs a second network to model the immediate reinforcement provided by the task environment and adaptively identifies it through repeated experience. Output perturbation and correlation techniques are used to translate mere critic signals into useful learning signals for the connectionist controller. Compared with direct approaches to reinforcement learning, this algorithm shows faster and guaranteed improvement in control performance. Robustness against inaccuracy of the model is also discussed. Simulation demonstrates that the adaptive reinforcement learning method is efficient and useful in learning a compliance control law for a class of robotic assembly tasks. A simple box palletizing task is used as an example, in which a robot is required to move a rectangular part to the corner of a box. In the simulation, the robot is initially provided with only a predetermined velocity command to follow the nominal trajectory. At each attempt, the box is randomly located and the part is randomly oriented within the grasp of the end-effector. Compliant motion control is therefore necessary to guide the part to the corner of the box while avoiding excessive reaction forces caused by collision with a wall. After repeatedly failing at the task, the robot successfully learns force feedback gains that modify its nominal motion. Our results show that the new learning method can be used to learn a compliance control law effectively.
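The abstract gives no equations, so the following is only a toy sketch of the general idea it describes, not the authors' algorithm. It assumes a scalar feedback gain `k` as the "controller", a hypothetical quadratic reward `reinforcement` with an unknown ideal gain `k_star`, and a small linear-in-features model of the immediate reinforcement fitted by a least-mean-squares rule. The controller output is perturbed with Gaussian noise, and the perturbation is correlated with the difference between the observed reinforcement and the model's prediction at the unperturbed output to obtain a learning signal. All names, features, and step sizes are assumptions chosen for illustration.

```python
import random


def features(x, u):
    # Quadratic features for the (hypothetical) reinforcement model.
    return [1.0, x * x, x * u, u * u]


def predict(w, x, u):
    # Model's estimate of the immediate reinforcement for input x, output u.
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def reinforcement(x, u, k_star=-0.8):
    # Toy environment (assumption): penalise deviation from an ideal gain
    # k_star that the learner never observes directly.
    return -(u - k_star * x) ** 2


def train(steps=4000, warmup=500, sigma=0.2,
          lr_model=0.05, lr_ctrl=0.01, seed=0):
    rng = random.Random(seed)
    k = 0.0            # controller gain, u = k * x
    w = [0.0] * 4      # weights of the reinforcement model
    for t in range(steps):
        x = rng.uniform(-1.0, 1.0)       # sensed input (e.g. a force reading)
        u0 = k * x                       # nominal controller output
        delta = rng.gauss(0.0, sigma)    # output perturbation
        u = u0 + delta
        r = reinforcement(x, u)          # scalar critic signal

        # Model prediction at the unperturbed output, used as a baseline.
        baseline = predict(w, x, u0)

        # Identify the reinforcement model from experience (LMS update).
        err = r - predict(w, x, u)
        w = [wi + lr_model * err * fi for wi, fi in zip(w, features(x, u))]

        # After a warm-up, correlate the perturbation with the reinforcement
        # change to estimate dr/du, then apply the chain rule du/dk = x.
        if t >= warmup:
            grad_u = (r - baseline) * delta / sigma ** 2
            k += lr_ctrl * grad_u * x
    return k, w


if __name__ == "__main__":
    k, w = train()
    print("learned gain:", k)
```

In this sketch the baseline affects only the variance of the gradient estimate, because the perturbation has zero mean and is independent of the baseline. An imperfect reinforcement model therefore slows learning rather than biasing it, which is one simple way to picture the robustness to model inaccuracy that the abstract mentions.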