Detection of Annotation Errors in Corpora
Author(s) - Markus Dickinson
Publication year - 2015
Publication title - Language and Linguistics Compass
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 44
ISSN - 1749-818X
DOI - 10.1111/lnc3.12129
Subject(s) - annotation, treebank, computer science, natural language processing, artificial intelligence, grammar, error detection and correction, information retrieval, linguistics, algorithm, philosophy
This paper surveys methods for annotation error detection and correction. Methods can broadly be characterized by whether they detect inconsistencies with respect to a statistical model based only on the corpus data or with respect to a grammatical model or, more generally, some external information source. Two extended examples are presented, illustrating these different techniques: (1) the variation n-gram method, which searches for inconsistencies in annotation for identical strings; and (2) a method of ad hoc rule detection for syntactic annotation, which compares treebank rules to a grammar to determine which are anomalous. Methods for detecting annotation errors have developed considerably over the last decade, so corpus practitioners can benefit greatly from them, while NLP researchers can learn more about the nuances of the annotation they use and see how error correction methods intersect with NLP techniques.
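To make the first technique concrete, the following is a minimal sketch of the idea behind a variation n-gram search: identical word strings whose annotation differs are flagged as candidate errors. It is an illustration under stated assumptions, not the paper's implementation. The function name `variation_ngrams`, the toy corpus, and the Penn Treebank-style tags are all invented for this example, and the sketch skips refinements a real system would need, such as heuristics to separate genuine ambiguity from error.

```python
from collections import Counter, defaultdict


def variation_ngrams(corpus, n):
    """Find n-grams of identical words whose focus word is tagged inconsistently.

    corpus: list of sentences, each a list of (word, tag) pairs.
    Returns {(words, focus_index): Counter of tags} restricted to entries
    where the focus word received more than one distinct tag.
    (Hypothetical helper written for this illustration.)
    """
    tags_seen = defaultdict(Counter)
    for sentence in corpus:
        words = tuple(word for word, _ in sentence)
        for start in range(len(sentence) - n + 1):
            window = words[start:start + n]
            # Record the tag of every position in this window, keyed by the
            # full word string plus the position being examined.
            for offset in range(n):
                tag = sentence[start + offset][1]
                tags_seen[(window, offset)][tag] += 1
    # Keep only the entries that show variation in annotation.
    return {key: tags for key, tags in tags_seen.items() if len(tags) > 1}


# Invented toy data: "off" is tagged RP in one sentence and IN in another,
# even though the surrounding words are identical.
corpus = [
    [("to", "TO"), ("ward", "VB"), ("off", "RP"), ("a", "DT"), ("takeover", "NN")],
    [("to", "TO"), ("ward", "VB"), ("off", "IN"), ("a", "DT"), ("bid", "NN")],
    [("turned", "VBD"), ("off", "RP"), ("the", "DT"), ("light", "NN")],
]

for (words, focus), tags in variation_ngrams(corpus, 3).items():
    print(" ".join(words), "| focus:", words[focus], "| tags:", dict(tags))
```

With `n = 1` the sketch only reports that "off" is tagged in more than one way across the corpus, which may reflect genuine ambiguity. With larger `n` the surrounding words must also match, so the remaining variation is more likely to indicate an annotation error than a legitimate difference in use.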