Open Access

Compiling the Uralic Dataset for NorthEuraLex, a Lexicostatistical Database of Northern Eurasia

Author(s) - Johannes Dellert

Publication year - 2015

Publication title - Septentrio Conference Series

Language(s) - English

Resource type - Journals

ISSN - 2387-3086

DOI - 10.7557/5.3466

Subject(s) - computer science, database, lexical database, computational linguistics, linguistics, data science, information retrieval, natural language processing, philosophy, wordnet

This paper presents a large comparative lexical database which covers about a thousand concepts across twenty Uralic languages. The dataset will be released as the first part of NorthEuraLex, a lexicostatistical database of Northern Eurasia which is being compiled within the EVOLAEMP project. The chief purpose of the lexical database is to serve as a basis of benchmarks for different tasks within computational historical linguistics, but it might also be valuable to researchers who work on the application of computational methods to open research questions within the language family. The paper describes and motivates the decisions taken concerning data collection methodology, also discussing some of the problems involved in compiling and unifying data from lexical resources in six different gloss languages. The dataset is already publicly available in various PDF formats for inspection and review, and is scheduled for release in machine-readable form in early 2015.
