Open Access
User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning
Author(s) - Stefan Ultes, Wolfgang Maier
Publication year - 2021
Publication title - Dialogue and Discourse
Language(s) - English
Resource type - Journals
ISSN - 2152-9620
DOI - 10.5210/dad.2021.203
Subject(s) - reinforcement learning , estimator , computer science , task (project management) , artificial intelligence , machine learning , focus (optics) , estimation , domain (mathematical analysis) , signal (programming language) , statistics , mathematics , engineering , mathematical analysis , physics , systems engineering , optics , programming language
Learning suitable, well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning models the reward signal with an objective measure such as task success, we propose a reward signal based on user satisfaction. We present a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. In simulated experiments, we show that a live user satisfaction estimation model can be applied, resulting in higher estimated satisfaction while achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain can be applied in many other domains that cover a similar task. We verify our findings by applying the model in one of the domains to learn a policy from real users, and we compare its performance with that of policies that use user satisfaction and task success, acquired directly from the users, as reward.
