Influence of Unique Words on the Performance of Corpus-Based Keyword Detection Methods
Author(s) -
О. С. Кушнір,
V. Yaremkiv,
I. Dovhan,
А. І. Кашуба
Publication year - 2018
Language(s) - English
DOI - 10.30970/elit2018.a22
Subject(s) - natural language processing, computer science, artificial intelligence, information retrieval
We study the performance of corpus-based keyword detection methods, including TF-IDF, in the particular case when the text under investigation contains unique words, i.e., words that are absent or rare in the other texts of the corpus. Our attention focuses on two points: the quality of the keyword list together with the appropriateness of the corresponding keyness scores, and the criticality of the methods, that is, their sensitivity to small perturbations of the corpus. We conclude that a number of heuristically introduced TF-IDF-like measures compete quite successfully with TF-IDF in performance but cannot cope with the criticality of the scores that is inherent to unique words.
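The criticality described in the abstract can be illustrated with a minimal sketch. It assumes the common textbook form of TF-IDF, score = tf * ln(N / df), which may differ from the exact variants compared in the paper. The function name `tf_idf` and the toy texts are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_idf(target, corpus):
    """Score every word of `target` against `corpus`.

    `target` is a list of tokens; `corpus` is a list of token lists
    that includes `target`. Assumed formula: tf * ln(N / df).
    """
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(target).items():
        tf = count / len(target)                       # relative frequency in the target text
        df = sum(1 for doc in corpus if word in doc)   # number of texts containing the word
        scores[word] = tf * math.log(n_docs / df)
    return scores


# Toy corpus (hypothetical): "quark" is unique to the target text.
target = "the quark model explains the quark spectrum".split()
others = [
    "the cat sat on the mat".split(),
    "the model of the economy".split(),
    "the dog chased the cat".split(),
]
corpus = [target] + others

base = tf_idf(target, corpus)

# Small perturbation: one short extra text that happens to mention "quark".
perturbed = tf_idf(target, corpus + ["a quark".split()])

print("quark before:", round(base["quark"], 3))
print("quark after: ", round(perturbed["quark"], 3))
print("relative change:", round(perturbed["quark"] / base["quark"] - 1, 3))
```

In this toy setting the document frequency of the unique word doubles from 1 to 2 after a single text is added, so its score changes sharply, whereas a word already present in many texts would see its document frequency change by a much smaller relative amount.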