Open Access
Training set augmentation in training neural- network language model for ontology population
Author(s) - Pavel Lomov, Marina Malozemova
Publication year - 2021
Publication title -
Trudy Kolʹskogo Naučnogo Centra RAN (Transactions of the Kola Science Centre of RAS)
Language(s) - English
Resource type - Journals
ISSN - 2307-5252
DOI - 10.37614/2307-5252.2021.5.12.002
Subject(s) - computer science, ontology, artificial neural network, artificial intelligence, natural language processing, training set, machine learning
This paper continues research on solving the problem of ontology population by training a neural-network language model on an automatically generated training set and then using the model to analyze texts in order to discover new concepts to add to the ontology. The article is devoted to text data augmentation, i.e. increasing the size of the training set by modifying its samples. In addition, a solution is considered to the problem of clarifying the concepts found during the automatic formation of the training set, i.e. adjusting their boundaries in sentences. A brief overview is given of existing approaches to text data augmentation, as well as of approaches to extracting so-called nested named entities (nested NER). A procedure is proposed for clarifying the boundaries of the discovered concepts in the training set and for augmenting the set for subsequent training of a neural-network language model that identifies new ontology concepts in domain texts. The results of an experimental evaluation of the trained model and the main directions of further research are presented.
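The abstract does not specify which augmentation operations the authors apply, but a minimal sketch of one common text-augmentation operation (random deletion of non-entity tokens) illustrates the general idea: a BIO-labelled training sample is modified while the concept spans that the model must learn are kept intact. The function name `augment_sentence` and the tag scheme below are illustrative assumptions, not the paper's actual procedure.

```python
import random

def augment_sentence(tokens, tags, p_delete=0.1, seed=None):
    """Create a modified copy of a BIO-labelled training sample by
    randomly deleting tokens tagged "O" (outside any concept span).
    Tokens inside an entity span (B-*/I-* tags) are always kept, so
    the concept boundaries in the augmented sample stay valid."""
    rng = random.Random(seed)
    new_tokens, new_tags = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "O" and rng.random() < p_delete:
            continue  # drop a non-entity token with probability p_delete
        new_tokens.append(tok)
        new_tags.append(tag)
    return new_tokens, new_tags
```

Applying such an operation several times per sample yields additional, slightly different training examples at no annotation cost; in practice it is usually combined with other operations (synonym replacement, token swaps) and the deletion probability is kept small so sentences remain grammatical.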
