Developing free morphological data for Polish
Author(s) - Adam Radziszewski, Marek Maziarz
Publication year - 2015
Publication title - Cognitive Studies | Études cognitives
Language(s) - English
Resource type - Journals
eISSN - 2392-2397
pISSN - 2080-7147
DOI - 10.11649/cs.2011.012
Subject(s) - natural language processing, computer science, lemmatisation, artificial intelligence, annotation, word (group theory), limiting, rigour, linguistics, mathematics, engineering, mechanical engineering, philosophy, geometry
A limiting factor in the construction of Natural Language Processing (NLP) systems is often the availability of morphological resources. This is the case for Polish: the freely available corpus with manual morpho-syntactic annotation (part of the IPI PAN Corpus) is not coupled with any free morphological analyser. A very large morphological dictionary of Polish, Morfologik, is available under a free licence. Unfortunately, its tagset differs significantly from that of the corpus and, what is more, its morphological description lacks the desired rigour. We remedy this situation by performing a large-scale conversion of the dictionary into a tagset compliant with the corpus. The conversion results in a free dictionary containing entries for almost 3.5 million different word forms. In this article we report on our methodology, discuss some morphological and syntactic issues related to both tagsets, and present the characteristics of the resulting dictionary.
