Open Access
Unsupervised Named Entity Recognition for Hi-Tech Domain
Author(s) - Abinaya Govindan, Gyan Ranjan, Amit Verma
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5121/csit.2021.111917
Subject(s) - computer science , named entity recognition , entity linking , inference , boosting (machine learning) , artificial intelligence , ambiguity , natural language processing , annotation , named entity , task (project management) , domain (mathematical analysis) , context (archaeology) , machine learning , information retrieval , data mining , knowledge base , mathematical analysis , paleontology , mathematics , management , programming language , economics , biology
This paper presents named entity recognition (NER) as a multi-answer question-answering (QA) task combined with contextual natural-language-inference (NLI) based noise reduction. This method allows us to use models that have been pre-trained for certain downstream tasks to generate training data without supervision, reducing the need for manual annotation of tokens with named entity tags. For each entity, we provide a unique context, such as entity types, definitions, questions and a few empirical rules, along with the target text to train a named entity model for the domain of our interest. This formulation (a) allows the system to jointly learn NER-specific features from the datasets provided, (b) can extract multiple NER-specific features, thereby boosting the performance of existing NER models, and (c) provides business-contextualized definitions to reduce ambiguity among similar entities. We conducted numerous tests to determine the quality of the created data, and we find that this method of data generation allows us to obtain clean, noise-free data with minimal effort and time. This approach has been demonstrated to be successful in extracting named entities, which are then used in subsequent components.
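
The pipeline outlined in the abstract (ask a question per entity type, collect every answer span, then keep only the spans that an inference model accepts) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `EntitySpec`, `weak_label`, `to_bio`, the hypothesis template, and the 0.5 threshold are all assumptions made for this example. `toy_qa` and `toy_nli` are hard-coded placeholders that stand in for pre-trained QA and NLI models so that the sketch runs without any downloads.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # character offsets, end exclusive


@dataclass
class EntitySpec:
    label: str       # entity type, e.g. "PRODUCT"
    definition: str  # domain-specific definition used as NLI context
    question: str    # question posed to the QA model


def weak_label(text: str,
               specs: List[EntitySpec],
               qa_fn: Callable[[str, str], List[Span]],
               nli_fn: Callable[[str, str], float],
               threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans that pass the NLI filter."""
    kept = []
    for spec in specs:
        # Multi-answer QA: one question may yield several candidate spans.
        for start, end in qa_fn(spec.question, text):
            # Hypothetical template: candidate + entity definition.
            hypothesis = f"{text[start:end]} is {spec.definition}."
            # Noise reduction: keep the span only if the text entails it.
            if nli_fn(text, hypothesis) >= threshold:
                kept.append((start, end, spec.label))
    return sorted(kept)


def to_bio(text: str, spans: List[Tuple[int, int, str]]):
    """Convert character spans to BIO tags over whitespace tokens."""
    tokens, tags = [], []
    for m in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if m.start() < end and m.end() > start:  # token overlaps span
                tag = ("B-" if m.start() <= start else "I-") + label
                break
        tokens.append(m.group())
        tags.append(tag)
    return tokens, tags


# --- Placeholders standing in for real pre-trained models (assumptions) ---
def toy_qa(question: str, text: str) -> List[Span]:
    """Pretend QA model: proposes two fixed candidate phrases if present."""
    found = []
    for phrase in ("Acme Router X200", "Friday"):
        i = text.find(phrase)
        if i >= 0:
            found.append((i, i + len(phrase)))
    return found


def toy_nli(premise: str, hypothesis: str) -> float:
    """Pretend NLI model: high entailment score only for router mentions."""
    return 0.9 if "Router" in hypothesis else 0.1


specs = [EntitySpec("PRODUCT", "a hardware or software product",
                    "Which products are mentioned?")]
text = "The Acme Router X200 shipped on Friday"
spans = weak_label(text, specs, toy_qa, toy_nli)
tokens, tags = to_bio(text, spans)
print(list(zip(tokens, tags)))
```

In this toy run the QA placeholder proposes both "Acme Router X200" and "Friday", and the NLI placeholder rejects "Friday" as a product, which is the kind of filtering the noise-reduction step is meant to perform. In practice `qa_fn` and `nli_fn` would wrap real pre-trained models, and the resulting token-level tags would serve as training data for a domain NER model.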