Spelling Error Patterns in Brazilian Portuguese
Author(s) -
Priscila A. Gimenes,
Norton Trevisan Roman,
Ariadne Maria Brito Rizzoni Carvalho
Publication year - 2015
Publication title -
computational linguistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.314
H-Index - 98
eISSN - 1530-9312
pISSN - 0891-2017
DOI - 10.1162/coli_a_00216
Subject(s) - spelling , computer science , portuguese , natural language processing , linguistics , rendering (computer graphics) , set (abstract data type) , artificial intelligence , character (mathematics) , range (aeronautics) , mathematics , programming language , philosophy , materials science , geometry , composite material
Fifty years after Damerau set up his statistics for the distribution of errors in typed texts, his findings are still used in a range of different languages. Because these statistics were derived from texts in English, the question of whether they actually apply to other languages has been raised. We address this issue through the analysis of a set of typed texts in Brazilian Portuguese, deriving statistics tailored to this language. Results show that diacritical marks play a major role, as indicated by the frequency of mistakes involving them, thereby rendering Damerau's original findings mostly unfit for spelling correction systems, although still holding them useful, should one set aside such marks. Furthermore, a comparison between these results and those published for Spanish show no statistically significant differences between both languages—an indication that the distribution of spelling errors depends on the adopted character set rather than the language itself.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom