Open Access

POS-tagging a bilingual parallel corpus: methods and challenges

Author(s) - Irene Doval

Publication year - 2017

Publication title - Research in Corpus Linguistics

Language(s) - English

Resource type - Journals

ISSN - 2243-4712

DOI - 10.32714/ricl.05.03

Subject(s) - computer science, annotation, natural language processing, German, part of speech tagging, corpus linguistics, artificial intelligence, process (computing), part of speech, linguistics, information retrieval, programming language, philosophy

This paper reviews the author’s experience of tokenizing and POS-tagging a bilingual parallel corpus, the PaGeS Corpus, which consists mostly of German and Spanish fictional texts. The work is part of an ongoing effort to annotate the corpus for part-of-speech information, and the study discusses the specific problems encountered so far. On the one hand, tagging performance degrades significantly on fictional data; on the other, the pre-existing annotation schemes are all language-specific. To improve accuracy during post-editing, the author has developed a common tagset and identified the major error patterns.
