
Topic unit detection in spontaneous speech
Author(s) - Frederico Amorim Cavalcante, Tommaso Raso, Giulia Bossaglia, Maryualê Malvessi Mittmann, Bruna Maia Rocha
Publication year - 2020
Publication title - Chimera
Language(s) - English
Resource type - Journals
ISSN - 2386-2629
DOI - 10.15366/chimera2020.7.004
This paper presents an inter-annotator agreement test on the identification of the information unit of Topic, as defined within the framework of the Language into Act Theory (L-AcT). Fleiss’s kappa statistic was used to measure agreement among the four annotators who took part in the test. The data were sampled from C-ORAL-BRASIL II, a spontaneous speech corpus of Brazilian Portuguese. The paper begins by outlining the theoretical underpinnings of L-AcT, with special attention to aspects directly related to the notion of Topic. Section 2 presents the pilot test and discusses the methodological and theoretical issues that shaped the design of the protocol eventually used in the actual test. Sections 3 and 4 describe the test, its protocol and its results: the kappa coefficient for overall agreement was 0.79, which by usual standards represents substantial agreement. Section 5 first briefly reviews a few studies conducted within other frameworks that have dealt with inter-rater agreement on the annotation of information structure categories. Finally, the errors observed in the test are analyzed qualitatively.
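For readers unfamiliar with the statistic, the standard Fleiss' kappa computation can be sketched as below. This is a minimal illustration of the textbook formula, not the authors' code. The function names `fleiss_kappa` and `landis_koch` are hypothetical, as is the toy table (four annotators labelling six units as Topic / not Topic). The assumption that "usual standards" refers to the Landis and Koch interpretation bands is also ours to make, not stated in the abstract.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Every row must sum to the same number of annotators n (n >= 2)."""
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    k = len(counts[0])
    if n < 2:
        raise ValueError("need at least two annotators per item")
    for row in counts:
        if len(row) != k or sum(row) != n:
            raise ValueError("every item needs the same annotators and categories")

    # Observed agreement per item: share of annotator pairs that agree.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / N

    # Expected (chance) agreement from the overall category proportions.
    p_cats = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")

    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy data: columns are [Topic, not Topic], 4 annotators per unit.
toy = [[4, 0], [0, 4], [3, 1], [4, 0], [0, 4], [1, 3]]
k = fleiss_kappa(toy)
print(round(k, 3), landis_koch(k))  # 0.667 substantial
```

Under these bands, the reported coefficient of 0.79 falls in the "substantial" range (0.61 to 0.80), consistent with the abstract's reading.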