Compiling the Uralic Dataset for NorthEuraLex, a Lexicostatistical Database of Northern Eurasia
Author(s) - Johannes Dellert
Publication year - 2015
Publication title - Septentrio Conference Series
Language(s) - English
Resource type - Journals
ISSN - 2387-3086
DOI - 10.7557/5.3466
Subject(s) - computer science , database , lexical database , computational linguistics , linguistics , data science , information retrieval , natural language processing , philosophy , wordnet
This paper presents a large comparative lexical database which covers about a thousand concepts across twenty Uralic languages. The dataset will be released as the first part of NorthEuraLex, a lexicostatistical database of Northern Eurasia which is being compiled within the EVOLAEMP project. The chief purpose of the lexical database is to serve as a basis of benchmarks for different tasks within computational historical linguistics, but it might also be valuable to researchers who work on the application of computational methods to open research questions within the language family. The paper describes and motivates the decisions taken concerning data collection methodology, also discussing some of the problems involved in compiling and unifying data from lexical resources in six different gloss languages. The dataset is already publicly available in various PDF formats for inspection and review, and is scheduled for release in machine-readable form in early 2015.
