Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots

Heriberto Cuayáhuitl; Ivana Kruijff-Korbayová; Nina Dethlefs

Open Access

Publication year - 2014

Publication title - ACM Transactions on Interactive Intelligent Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.381

H-Index - 34

eISSN - 2160-6463

pISSN - 2160-6455

DOI - 10.1145/2659003

Subject(s) - reinforcement learning, computer science, hierarchy, artificial intelligence, robot, Bellman equation, state space, scalability, decomposition, temporal difference learning, function approximation, dynamic programming, machine learning, artificial neural network, mathematical optimization, engineering, mathematics, statistics, systems engineering, algorithm

Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that approximate the true value function of a policy or by hierarchically decomposing a learning task into subtasks. We present a novel approach to dialogue policy optimization that combines the benefits of hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks, giving human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space that allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation to generalize decision making to situations unseen in training. Our proposed approach is evaluated on an interactive conversational robot that learns to play quiz games. Experimental results with simulated and real users provide evidence that the proposed approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
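The abstract combines two ingredients: subtask policies represented with linear function approximation, and a subtask transition function that lets control move between subdialogues. The block below is a minimal sketch of both under stated assumptions and is not the authors' implementation. It assumes a linear action-value function Q(s, a) = w_a · phi(s) trained with a standard Q-learning temporal-difference update (the abstract does not name the learning algorithm). It also assumes a lookup-table transition function keyed on (current subtask, user event). The subtask names (`robot_asks`, `user_asks`), user events, dialogue actions, and feature vectors are hypothetical illustrations, and the dynamic state space mentioned in the abstract is not modeled.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


class LinearQAgent:
    """One subtask agent (sketch): Q(s, a) = w_a . phi(s), learned with a Q-learning TD update."""

    def __init__(self, actions: List[str], n_features: int,
                 alpha: float = 0.1, gamma: float = 0.95, epsilon: float = 0.1):
        self.actions = actions
        # One weight vector per action, initialised to zero.
        self.weights: Dict[str, List[float]] = {a: [0.0] * n_features for a in actions}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q(self, phi: Sequence[float], action: str) -> float:
        return dot(self.weights[action], phi)

    def act(self, phi: Sequence[float], rng: random.Random) -> str:
        # Epsilon-greedy: explore with probability epsilon, otherwise pick argmax_a Q(s, a).
        if rng.random() < self.epsilon:
            return rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q(phi, a))

    def update(self, phi: Sequence[float], action: str, reward: float,
               phi_next: Optional[Sequence[float]]) -> float:
        # TD target r + gamma * max_a' Q(s', a'); no bootstrapping when s' is terminal (None).
        target = reward
        if phi_next is not None:
            target += self.gamma * max(self.q(phi_next, a) for a in self.actions)
        td_error = target - self.q(phi, action)
        w = self.weights[action]
        for i, x in enumerate(phi):
            w[i] += self.alpha * td_error * x  # the gradient of a linear Q w.r.t. w_a is phi(s)
        return td_error


# Hypothetical subtask transition function: (current subtask, user event) -> next subtask.
# Under strict hierarchical control a subtask hands control back only to its parent when it
# terminates; here a listed user event moves control directly to a sibling subdialogue.
SUBTASK_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("robot_asks", "user_wants_to_ask"): "user_asks",
    ("user_asks", "user_wants_robot_to_ask"): "robot_asks",
}


def next_subtask(current: str, user_event: str) -> str:
    """Switch subdialogue if the (subtask, event) pair is listed; otherwise stay in the current one."""
    return SUBTASK_TRANSITIONS.get((current, user_event), current)


if __name__ == "__main__":
    agent = LinearQAgent(["ask_question", "give_feedback"], n_features=2, alpha=0.5, gamma=0.9)
    agent.update([1.0, 0.0], "ask_question", reward=1.0, phi_next=[0.0, 1.0])
    print(agent.weights["ask_question"])  # [0.5, 0.0]
    print(next_subtask("robot_asks", "user_wants_to_ask"))  # user_asks
```

Because Q is linear in the features, the single update above already assigns a nonzero value to a feature vector the agent never saw, such as `[0.5, 0.5]`. This is the kind of generalization to unseen situations that the abstract attributes to linear function approximation. A table lookup is the simplest possible transition function and is chosen here only for readability.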
