Open Access
Automatic Linking of Terms from Scientific Texts with Knowledge Base Entities
Author(s) - A. A. Mezentseva, Elena Bruches, Tatiana Batura
Publication year - 2021
Publication title - vestnik novosibirskogo gosudarstvennogo universiteta. seriâ: informacionnye tehnologii/vestnik novosibirskogo gosudarstvennogo universiteta. seriâ: informacionnye tehnologii v obrazovanii
Language(s) - English
Resource type - Journals
eISSN - 2410-0420
pISSN - 1818-7900
DOI - 10.25205/1818-7900-2021-19-2-65-75
Subject(s) - computer science, knowledge base, information retrieval, entity linking, term (time), set (abstract data type), task (project management), ranking (information retrieval), context (archaeology), quality (philosophy), natural language processing, base (topology), string (physics), rank (graph theory), matching (statistics), artificial intelligence, paleontology, mathematical analysis, philosophy, statistics, physics, mathematics, management, epistemology, quantum mechanics, combinatorics, biology, economics, programming language
As the number of scientific publications grows, tasks related to processing scientific articles are becoming increasingly relevant. Such texts have a distinctive structure and lexical and semantic content that must be taken into account during processing. Using information from knowledge bases can significantly improve the quality of text processing systems. This paper addresses the entity linking task for scientific articles in Russian, where scientific terms are treated as entities. We annotated a corpus of scientific texts in which each term is linked to an entity from a knowledge base. We also implemented an entity linking algorithm and evaluated it on this corpus. The algorithm consists of two stages: generating candidates for an input term, and ranking those candidates to choose the best match. Candidates are generated by string matching between the input term and the entities in the knowledge base. To rank the candidates and choose the most relevant entity for a term, we use the number of links the entity has to other entities within the knowledge base and to other sites. We analyzed the results and proposed possible ways to improve the quality of the algorithm, for example by using information about the context and the structure of the knowledge base. The annotated corpus is publicly available and may be useful to other researchers.
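
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Entity` record, its field names, the exact-match comparison after lowercasing and whitespace normalization, and the use of a simple sum of the two link counts as the ranking score are all assumptions made for illustration; the abstract says only that string matching and link counts are used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical knowledge base entry; all field names are illustrative."""
    entity_id: str
    label: str
    aliases: Tuple[str, ...] = ()
    internal_links: int = 0  # links to other entities within the knowledge base
    external_links: int = 0  # links to other sites


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def generate_candidates(term: str, entities: List[Entity]) -> List[Entity]:
    """Stage 1 (assumed form): keep entities whose label or alias equals the term."""
    key = normalize(term)
    return [
        e for e in entities
        if key in {normalize(name) for name in (e.label, *e.aliases)}
    ]


def rank_candidates(candidates: List[Entity]) -> List[Entity]:
    """Stage 2 (assumed score): order by total link count, highest first."""
    return sorted(
        candidates,
        key=lambda e: e.internal_links + e.external_links,
        reverse=True,
    )


def link_term(term: str, entities: List[Entity]) -> Optional[Entity]:
    """Return the top-ranked candidate, or None if no candidate matches."""
    ranked = rank_candidates(generate_candidates(term, entities))
    return ranked[0] if ranked else None
```

In this sketch, a term that matches several entries is linked to the one with the most links, and a term with no matching entry yields `None`. Ranking by link count alone ignores the surrounding text, which is consistent with the abstract's suggestion that context and knowledge base structure could improve quality.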
