
Method for Automatic Term Extraction from Scientific Articles Based on Weak Supervision
Author(s) - Elena Bruches, Tatiana Batura
Publication year - 2021
Publication title - vestnik novosibirskogo gosudarstvennogo universiteta. seriâ: informacionnye tehnologii/vestnik novosibirskogo gosudarstvennogo universiteta. seriâ: informacionnye tehnologii v obrazovanii
Language(s) - English
Resource type - Journals
eISSN - 2410-0420
pISSN - 1818-7900
DOI - 10.25205/1818-7900-2021-19-2-5-16
Subject(s) - computer science, term (time), set (abstract data type), extraction (chemistry), information retrieval, quality (philosophy), natural language processing, artificial intelligence, data mining, philosophy, chemistry, physics, epistemology, chromatography, quantum mechanics, programming language
We propose a method for extracting scientific terms from Russian-language texts based on weakly supervised learning. This approach does not require a large amount of hand-labeled data. To implement the method, we collected a list of terms semi-automatically and used it to annotate the texts of scientific articles. We trained a first model on these annotated texts and then used its predictions on another part of the text collection to extend the training set. A second model was trained on both parts of the collection: one annotated with the dictionary and the other annotated with model predictions. The results show that additional data, even when annotated automatically, improves the quality of scientific term extraction.
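
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the BIO tag names, the function names `annotate_with_dictionary` and `weakly_supervised_training`, and especially `ToyTagger`, a trivial most-frequent-tag lookup that stands in for whatever sequence model the authors actually trained. It shows only the data flow (dictionary annotation, first model, automatic annotation of a second part, second model), not the published implementation.

```python
from collections import Counter, defaultdict


def annotate_with_dictionary(tokens, terms):
    """BIO-tag tokens by longest-match lookup in a set of lowercase term tuples."""
    tags = ["O"] * len(tokens)
    max_len = max((len(term) for term in terms), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multiword terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tok.lower() for tok in tokens[i:i + n]) in terms:
                matched = n
                break
        if matched:
            tags[i] = "B-TERM"
            for j in range(i + 1, i + matched):
                tags[j] = "I-TERM"
            i += matched
        else:
            i += 1
    return tags


class ToyTagger:
    """Hypothetical stand-in for a real model: most frequent tag per token."""

    def fit(self, sentences, tag_sequences):
        counts = defaultdict(Counter)
        for tokens, tags in zip(sentences, tag_sequences):
            for tok, tag in zip(tokens, tags):
                counts[tok.lower()][tag] += 1
        self.best = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
        return self

    def predict(self, tokens):
        # Tokens never seen in training are tagged as outside any term.
        return [self.best.get(tok.lower(), "O") for tok in tokens]


def weakly_supervised_training(part_a, part_b, terms):
    """Train on dictionary labels, label part_b with the model, then retrain."""
    silver_a = [annotate_with_dictionary(s, terms) for s in part_a]
    first_model = ToyTagger().fit(part_a, silver_a)
    silver_b = [first_model.predict(s) for s in part_b]
    second_model = ToyTagger().fit(part_a + part_b, silver_a + silver_b)
    return first_model, second_model


if __name__ == "__main__":
    terms = {("neural", "network"), ("corpus",)}
    part_a = [["We", "train", "a", "neural", "network"],
              ["The", "corpus", "is", "small"]]
    part_b = [["A", "neural", "network", "on", "a", "corpus"]]
    first, second = weakly_supervised_training(part_a, part_b, terms)
    print(second.predict(["A", "neural", "network"]))
```

With a real learned model in place of `ToyTagger`, the second training round would see contexts and term candidates that the dictionary alone never labeled, which is the effect the abstract reports as improving extraction quality.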