User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning



Open Access


Author(s) - Stefan Ultes, Wolfgang Maier

Publication year - 2021

Publication title - Dialogue & Discourse

Language(s) - English

Resource type - Journals

ISSN - 2152-9620

DOI - 10.5210/dad.2021.203

Subject(s) - reinforcement learning , estimator , computer science , task (project management) , artificial intelligence , machine learning , focus (optics) , estimation , domain (mathematical analysis) , signal (programming language) , statistics , mathematics , engineering , mathematical analysis , physics , systems engineering , optics , programming language

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning models the reward signal with an objective measure such as task success, we propose a reward signal based on user satisfaction. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. We show in simulated experiments that a live user satisfaction estimation model can be applied, resulting in higher estimated satisfaction while achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain can be applied in many other domains that cover a similar task. We verify our findings by applying the model to one of the domains to learn a policy from real users, and we compare its performance to policies that use user satisfaction and task success acquired directly from the users as reward.
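The contrast the abstract draws between a task-success reward and a satisfaction-based reward can be illustrated with a minimal sketch. This is not the authors' implementation: the `Turn` type, the per-turn penalty, the bonus of 20, and the 1 to 5 satisfaction scale are all assumptions chosen only for illustration, and the estimator is a hypothetical callable standing in for a trained satisfaction model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Turn:
    """One system-user exchange; `features` is a hypothetical feature vector."""
    features: Sequence[float]


def task_success_reward(num_turns: int, success: bool, bonus: float = 20.0) -> float:
    """Objective reward: a bonus on success minus one point per turn (assumed shape)."""
    return (bonus if success else 0.0) - num_turns


def satisfaction_reward(
    turns: Sequence[Turn],
    estimator: Callable[[Sequence[Turn]], float],
    bonus: float = 20.0,
    min_score: float = 1.0,
    max_score: float = 5.0,
) -> float:
    """Satisfaction-based reward: the estimated score, rescaled to [0, 1],
    replaces the binary success flag. The score range is an assumption."""
    score = estimator(turns)
    normalised = (score - min_score) / (max_score - min_score)
    return bonus * normalised - len(turns)


# Toy stand-ins for a trained estimator (hypothetical, fixed outputs).
dialogue = [Turn([0.1]), Turn([0.4]), Turn([0.9])]
very_satisfied = lambda turns: 5.0
very_dissatisfied = lambda turns: 1.0

reward_success = task_success_reward(len(dialogue), success=True)
reward_high = satisfaction_reward(dialogue, very_satisfied)
reward_low = satisfaction_reward(dialogue, very_dissatisfied)
```

Because the estimator only sees interaction features and not a domain-specific goal, the same callable could in principle be reused when the policy is trained in a different domain, which is the kind of transfer the abstract describes.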
