Open Access
Conversational Intelligence Challenge: Accelerating Research with Crowd Science and Open Source
Author(s) - Burtsev Mikhail, Logacheva Varvara
Publication year - 2020
Publication title - AI Magazine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.597
H-Index - 79
eISSN - 2371-9621
pISSN - 0738-4602
DOI - 10.1609/aimag.v41i3.5324
Subject(s) - computer science, open domain, quality (philosophy), domain (mathematical analysis), data science, publication, open source, scale (ratio), human-computer interaction, world wide web, artificial intelligence, software, question answering, mathematical analysis, philosophy, physics, mathematics, epistemology, quantum mechanics, advertising, business, programming language
Development of conversational systems is one of the most challenging tasks in natural language processing, and it is especially hard in the case of open-domain dialogue. The main factors that hinder progress in this area are a lack of training data and the difficulty of automatic evaluation. Thus, to reliably evaluate the quality of such models, one needs to resort to time-consuming and expensive human evaluation. We tackle these problems by organizing the Conversational Intelligence Challenge (ConvAI), an open competition of dialogue systems. Our goals are threefold: to work out a good design for human evaluation of open-domain dialogue, to grow an open-source code base for conversational systems, and to harvest and publish new datasets. Over the course of the ConvAI1 and ConvAI2 competitions, we developed a framework for evaluating chatbots in messaging platforms and used it to evaluate over 30 dialogue systems on two conversational tasks: discussion of short text snippets from Wikipedia and personalized small talk. These large-scale evaluation experiments were performed by recruiting volunteers as well as paid workers. As a result, we collected a dataset of around 5,000 long, meaningful human-to-bot dialogues and gained many insights into the organization of human evaluation. This dataset can be used to train an automatic evaluation model or to improve the quality of dialogue systems. Our analysis of the ConvAI1 and ConvAI2 competitions shows that future work in this area should center on more active participation of volunteers in the assessment of dialogue systems. To achieve that, we plan to make the evaluation setup more engaging.
