Developing an Automatic Part-of-Speech Tagger for Scottish Gaelic
Author(s) - William Lamb, Samuel Danso
Publication year - 2014
Language(s) - English
Resource type - Conference proceedings
DOI - 10.3115/v1/w14-4601
Subject(s) - computer science, security token, bigram, natural language processing, artificial intelligence, speech recognition, sample (material), part of speech tagging, part of speech, chemistry, computer security, trigram, chromatography
This paper describes an ongoing project that seeks to develop the first automatic PoS tagger for Scottish Gaelic. Adapting the PAROLE tagset for Irish, we manually re-tagged a pre-existing 86k-token corpus of Scottish Gaelic. A double-verified subset of 13.5k tokens was used to train eight statistical taggers and to verify their accuracy on a randomly assigned hold-out sample. The best accuracy, 76.6%, was achieved using a Brill bigram tagger. We provide an overview of the project’s methodology, interim results and future directions.
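The hold-out evaluation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: it uses a plain bigram tagger with unigram and default-tag backoff as a simplified stand-in for the Brill bigram tagger, and the function names, the toy corpus and its single-letter tags are all assumptions made for illustration (the paper's tagset is adapted from PAROLE and is not reproduced here).

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus: lists of (word, tag) pairs with simplified,
# illustrative tags. These are NOT the PAROLE-derived tags from the paper.
TOY_CORPUS = [
    [("tha", "V"), ("an", "T"), ("cat", "N"), ("beag", "A")],
    [("tha", "V"), ("an", "T"), ("taigh", "N"), ("beag", "A")],
    [("chunnaic", "V"), ("mi", "P"), ("an", "T"), ("cat", "N")],
    [("chunnaic", "V"), ("mi", "P"), ("an", "T"), ("taigh", "N")],
]


def holdout_split(sentences, test_fraction=0.1, seed=0):
    """Randomly assign whole sentences to a training set and a hold-out set."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_bigram_tagger(train):
    """Return a tagging function built from tag counts in `train`.

    The tag for a word is chosen from (previous tag, word) counts, backing
    off to word-only counts, then to the most frequent tag overall.
    """
    unigram = defaultdict(Counter)  # word -> tag counts
    bigram = defaultdict(Counter)   # (previous tag, word) -> tag counts
    all_tags = Counter()
    for sentence in train:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            unigram[word][tag] += 1
            bigram[(prev, word)][tag] += 1
            all_tags[tag] += 1
            prev = tag
    default_tag = all_tags.most_common(1)[0][0]

    def tag_words(words):
        tags, prev = [], "<s>"
        for word in words:
            if (prev, word) in bigram:
                chosen = bigram[(prev, word)].most_common(1)[0][0]
            elif word in unigram:
                chosen = unigram[word].most_common(1)[0][0]
            else:
                chosen = default_tag
            tags.append(chosen)
            prev = chosen
        return tags

    return tag_words


def accuracy(tagger, sentences):
    """Token-level accuracy: correctly tagged tokens / all tokens."""
    correct = total = 0
    for sentence in sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = tagger(words)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total


if __name__ == "__main__":
    train, test = holdout_split(TOY_CORPUS, test_fraction=0.25, seed=1)
    tagger = train_bigram_tagger(train)
    print(f"hold-out accuracy: {accuracy(tagger, test):.3f}")
```

On a corpus this small the hold-out score is not meaningful; the sketch only shows the shape of the procedure, in which the tagger never sees the hold-out sentences during training and accuracy is counted per token.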
