
Extraction of Space Domain Entity and Relation via Word Vector Representation and Clustering Method
Author(s) -
Zhanji Wei,
Gang Wan,
Ling Huang,
Yao Mu,
Yunxia Yin
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1944/1/012022
Subject(s) - relationship extraction, computer science, cluster analysis, vector space model, vector space, artificial intelligence, natural language processing, word (group theory), domain (mathematical analysis), graph, relation (database), space (punctuation), domain knowledge, information extraction, theoretical computer science, data mining, mathematics, mathematical analysis, geometry, operating system
Knowledge graphs have shown great value in recent years in search engines, natural language question answering, recommendation systems, and other applications. The basic elements of a knowledge graph are entities and the relations between them, so automatically extracting entities and relations from natural language text is a key problem in knowledge graph construction. In this paper, we propose an unsupervised method for extracting space domain entities and relations, with the goal of building a space knowledge graph. First, a neural network model extracts implicit semantic features of domain words, represented as dense vectors, from a raw space domain corpus; new entities are then discovered by clustering in the vector space, guided by a small amount of labeled data. Concatenating space domain-specific word vectors with general domain word vectors yields universal vector representations of entities that capture both general and domain features. On this basis, semantic vectors of the relations between entities are calculated, and these relation vectors are used to extract further entity relations from the corpus. Compared with supervised methods, the proposed entity and relation extraction method needs only a small amount of labeled data, and is therefore well suited to constructing knowledge graphs in the space domain, where labeled data is scarce and expensive.
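The steps named in the abstract (concatenating domain and general word vectors, discovering new entities from a few labeled seeds, and computing relation vectors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the toy vectors and seed labels are invented and are not data from the paper, nearest-centroid assignment stands in for the clustering step, and the relation vector is taken to be a mean head-to-tail offset. The abstract does not specify any of these details.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def concat_embeddings(domain_vecs, general_vecs):
    """Concatenate the normalized domain and general vectors of each shared word."""
    return {
        w: normalize(domain_vecs[w]) + normalize(general_vecs[w])
        for w in domain_vecs
        if w in general_vecs
    }


def discover_entities(vectors, seeds, threshold=0.8):
    """Label each unlabeled word with its most similar seed-class centroid.

    ASSUMPTION: nearest-centroid assignment with a cosine threshold stands in
    for the clustering step, which the abstract does not describe in detail.
    """
    centroids = {
        label: normalize(mean_vector(vectors[w] for w in words))
        for label, words in seeds.items()
    }
    labeled = {w for words in seeds.values() for w in words}
    found = {}
    for word, vec in vectors.items():
        if word in labeled:
            continue
        unit = normalize(vec)
        label, sim = max(
            ((lab, dot(unit, c)) for lab, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if sim >= threshold:
            found[word] = label
    return found


def relation_vector(vectors, pairs):
    """ASSUMPTION: a relation is the mean (tail - head) offset over seed pairs."""
    offsets = [
        [t - h for h, t in zip(vectors[head], vectors[tail])]
        for head, tail in pairs
    ]
    return mean_vector(offsets)


# Toy 2-d embeddings, invented for illustration only.
domain = {
    "satellite": [1.0, 0.0], "probe": [0.9, 0.1],
    "rocket": [0.0, 1.0], "launcher": [0.1, 0.9],
    "orbiter": [0.95, 0.05], "booster": [0.05, 0.95],
}
general = {
    "satellite": [1.0, 0.2], "probe": [0.8, 0.1],
    "rocket": [0.1, 1.0], "launcher": [0.2, 0.9],
    "orbiter": [0.9, 0.2], "booster": [0.1, 0.8],
}
seeds = {
    "spacecraft": ["satellite", "probe"],
    "launch_vehicle": ["rocket", "launcher"],
}

vectors = concat_embeddings(domain, general)
found = discover_entities(vectors, seeds)
rel = relation_vector(vectors, [("rocket", "satellite"), ("launcher", "probe")])
print(found)
print(len(rel))
```

On this toy data the two unlabeled words, "orbiter" and "booster", are assigned to the seed class whose members they most resemble. In practice the input vectors would come from trained embedding models rather than hand-written lists, and the threshold would need tuning on held-out labeled examples.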