Efficient Corpus Creation Method for NLU Using Interview with Probing Questions
Author(s) - Kazuaki Shima, Takeshi Homma, Masataka Motohashi, Rintaro Ikeshita, Hiroaki Kokubo, Yasunari Obuchi, Jinhua She
Publication year - 2019
Publication title - Journal of Advanced Computational Intelligence and Intelligent Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.172
H-Index - 20
eISSN - 1883-8014
pISSN - 1343-0130
DOI - 10.20965/jaciii.2019.p0947
Subject(s) - utterance, computer science, task (project management), subject (documents), natural language processing, artificial intelligence, natural language understanding, baseline (sea), lexical diversity, speech recognition, natural language, linguistics, world wide web, vocabulary, geology, philosophy, oceanography, economics, management
This paper presents an efficient method for building a corpus to train natural language understanding (NLU) modules. Conventional corpus creation methods share a common cycle: a subject is given a specific situation in which they operate a device by voice, and the subject then speaks one utterance to execute the task. Because each cycle yields only one utterance, building a large-scale corpus requires many subjects, which increases lead time and financial cost. To solve this problem, we propose incorporating a “probing question” into the cycle. Specifically, after a subject speaks one utterance, the subject is asked to think of alternative utterances that execute the same task. In this way, many utterances are obtained from a small number of subjects. An evaluation of the proposed method applied to interview-based corpus creation shows that it reduces the number of subjects by 41% while maintaining the morphological diversity of the corpus and its morphological coverage of user utterances spoken to commercial devices. The evaluation also shows that the proposed method reduces the total time for interviewing subjects by 36% compared with the conventional method. We conclude that the proposed method can be used to build a useful corpus while reducing lead time and financial cost.
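The two evaluation criteria named in the abstract, morphological diversity and morphological coverage, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption made for illustration: diversity is taken as the number of distinct tokens in a corpus, coverage as the fraction of distinct tokens in a reference set of device utterances that also occur in the corpus, and a lowercase whitespace split stands in for a real morphological analyzer. The function names and the sample utterances are hypothetical.

```python
from typing import Iterable, List, Set


def tokenize(utterance: str) -> List[str]:
    """Stand-in for a morphological analyzer (assumption): lowercase whitespace split."""
    return utterance.lower().split()


def vocabulary(utterances: Iterable[str]) -> Set[str]:
    """Collect the set of distinct tokens over all utterances."""
    vocab: Set[str] = set()
    for utterance in utterances:
        vocab.update(tokenize(utterance))
    return vocab


def diversity(corpus: Iterable[str]) -> int:
    """Assumed diversity measure: number of distinct tokens in the corpus."""
    return len(vocabulary(corpus))


def coverage(corpus: Iterable[str], reference: Iterable[str]) -> float:
    """Assumed coverage measure: share of reference tokens that appear in the corpus."""
    ref_vocab = vocabulary(reference)
    if not ref_vocab:
        return 0.0
    return len(ref_vocab & vocabulary(corpus)) / len(ref_vocab)


# Hypothetical data: first utterances only vs. first utterances plus
# alternatives elicited by a probing question.
first_only = ["turn on the radio", "play the radio"]
with_probing = first_only + ["switch the radio on", "i want to hear the radio"]
# Hypothetical utterances spoken to a commercial device.
device_logs = ["switch on the radio", "radio on please"]

if __name__ == "__main__":
    for name, corpus in [("first only", first_only), ("with probing", with_probing)]:
        print(name, diversity(corpus), round(coverage(corpus, device_logs), 2))
```

In this toy data the same two subjects contribute more distinct tokens once their alternative utterances are included, which is the effect the probing question is meant to produce. The paper's actual measurements are the ones reported in the abstract, not these.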