Open Access
The Success of Conversational AI and the AI Evaluation Challenge It Reveals
Author(s) - Ian Beaver
Publication year - 2022
Publication title - AI Magazine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.597
H-Index - 79
eISSN - 2371-9621
pISSN - 0738-4602
DOI - 10.1609/aimag.v43i1.18601
Subject(s) - creativity , computer science , artificial intelligence , data science , human–computer interaction , knowledge management , psychology , social psychology
Research interest in conversational artificial intelligence (ConvAI) has grown massively over the last few years, and several recent advances have enabled systems to produce rich and varied conversational turns that resemble those of humans. However, this apparent creativity also creates a real challenge for the objective evaluation of such systems: authors are becoming reliant on crowd-worker opinions as the primary measure of success, yet so far few papers report everything necessary for others to compare against in their own crowd experiments. This challenge is not unique to ConvAI; it demonstrates that as AI systems mature on more “human” tasks involving creativity and variation, evaluation strategies need to mature with them.
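The abstract does not prescribe a specific metric, but one common, reproducible way to report crowd-worker evaluations is an inter-rater agreement statistic such as Fleiss' kappa. A minimal sketch (the function name and input layout are illustrative assumptions, not from the article):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a rating table where counts[i][j] is the
    number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Mean observed per-item agreement.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Expected chance agreement from marginal category proportions.
    total = n_items * n_raters
    n_cats = len(counts[0])
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)
```

Reporting a statistic like this alongside the rating scale, number of raters per item, and worker instructions is the kind of detail that would let other authors compare against a published crowd experiment.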