Open Access
Method for Automatic Term Extraction from Scientific Articles Based on Weak Supervision
Author(s) - Elena Bruches, Tatiana Batura
Publication year - 2021
Publication title - vestnik novosibirskogo gosudarstvennogo universiteta. seriâ: informacionnye tehnologii/vestnik novosibirskogo gosudarstvennogo universiteta. seriâ: informacionnye tehnologii v obrazovanii
Language(s) - English
Resource type - Journals
eISSN - 2410-0420
pISSN - 1818-7900
DOI - 10.25205/1818-7900-2021-19-2-5-16
Subject(s) - computer science, term (time), set (abstract data type), extraction (chemistry), information retrieval, quality (philosophy), natural language processing, artificial intelligence, data mining, philosophy, chemistry, physics, epistemology, chromatography, quantum mechanics, programming language
We propose a method for extracting scientific terms from Russian-language texts based on weakly supervised learning. This approach does not require a large amount of hand-labeled data. To implement the method, we collected a list of terms semi-automatically and then annotated the texts of scientific articles with these terms. We used these texts to train a first model. We then applied this model's predictions to another part of the text collection to extend the training set. A second model was trained on both text collections: the one annotated with the dictionary and the one annotated by the first model. The results show that additional data, even when annotated automatically, improves the quality of scientific term extraction.
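
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tagging scheme, the matching strategy, or the model, so the BIO labels, the greedy longest-match lookup, and the names `annotate_with_dictionary`, `weakly_supervised_training`, and the `train` callback are all hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple

Tokens = List[str]
Tags = List[str]
Example = Tuple[Tokens, Tags]
Tagger = Callable[[Sequence[str]], Tags]


def annotate_with_dictionary(tokens: Sequence[str],
                             terms: Set[Tuple[str, ...]]) -> Tags:
    """Assign BIO tags by greedy longest-match lookup in a term list.

    Assumption: `terms` holds lowercased token tuples; the paper's real
    matching procedure is not described in the abstract.
    """
    max_len = max((len(t) for t in terms), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multiword terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                matched = n
                break
        if matched:
            tags[i] = "B-TERM"
            for j in range(i + 1, i + matched):
                tags[j] = "I-TERM"
            i += matched
        else:
            i += 1
    return tags


def weakly_supervised_training(part_a: List[Tokens],
                               part_b: List[Tokens],
                               terms: Set[Tuple[str, ...]],
                               train: Callable[[List[Example]], Tagger]) -> Tagger:
    """Train a first model on dictionary labels, then a second on the extended set.

    `train` is a hypothetical callback that turns labeled examples into a tagger.
    """
    # Stage 1: label the first part of the collection with the term dictionary.
    seed = [(list(s), annotate_with_dictionary(s, terms)) for s in part_a]
    first_model = train(seed)
    # Stage 2: label the second part with the first model's predictions.
    pseudo = [(list(s), first_model(s)) for s in part_b]
    # The second model sees both the dictionary-labeled and model-labeled data.
    return train(seed + pseudo)
```

Any sequence labeler could be plugged in as `train`; the sketch only fixes the data flow between the dictionary annotation, the first model, and the second model.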
