
Enabling Access to Corpora of Slovene Web Texts in Light of Legal Constraints (Omogočanje dostopa do korpusov slovenskih spletnih besedil v luči pravnih omejitev)
Author(s) - Tomaž Erjavec, Jaka Čibej, Darja Fišer
Publication year - 2016
Publication title - Slovenščina 2.0
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.165
H-Index - 1
ISSN - 2335-2736
DOI - 10.4312/slo2.0.2016.2.189-219
Subject(s) - physics, humanities, theology, art, philosophy
Web texts are becoming an increasingly relevant source of information, and web corpora are useful both for corpus-linguistic studies and for the development of language technologies. Even though web texts are directly accessible, which substantially simplifies data collection, compiling a web corpus is still complex, time-consuming and expensive. It is therefore crucial to avoid duplicating such efforts, which is why the resulting corpora should be made easily and widely accessible both to researchers and to the wider public. While this is logistically and technically straightforward, legal constraints such as copyright, privacy and terms of use severely hinder the dissemination of web corpora. This paper discusses the legal framework in this area, gives an overview of current practices and, using the Janes corpus of Slovene user-generated content as an example, proposes a range of mitigation measures to ensure the free and open dissemination of Slovene web corpora.