Sentence Vector Model Based on Implicit Word Vector Expression
Author(s) -
Xinzhi Wang,
Hui Zhang,
Yi Liu
Publication year - 2018
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/ACCESS.2018.2817839
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Word vectors and topic models can help retrieve information semantically. However, there are still several problems: 1) antonyms share high similarity when clustered through word vectors; 2) vectors for named entities cannot be fully trained, as named entities may appear only a limited number of times in a specific corpus; and 3) words, sentences, and paragraphs that share the same meaning but have no overlapping words are hard to recognize. To overcome these problems, this paper proposes a new vector computation model for text named s2v. Words, sentences, and paragraphs are represented in a unified way in the model. Sentence vectors and paragraph vectors are trained along with word vectors. Based on the unified representation, word and sentence (of different lengths) retrieval are experimentally studied. The results show that information with similar meaning can be retrieved even if the information is expressed with different words.
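The retrieval idea the abstract describes can be illustrated with a minimal sketch. Note this is not the paper's s2v model (which trains sentence and paragraph vectors jointly with word vectors); it uses a common baseline instead, averaging hypothetical word vectors into a sentence vector and ranking candidates by cosine similarity, so that a query and a candidate with no overlapping words can still match.

```python
import math

# Toy word vectors with made-up values, for illustration only.
word_vectors = {
    "cheap":       [0.9, 0.1, 0.0],
    "inexpensive": [0.8, 0.2, 0.1],
    "hotel":       [0.1, 0.9, 0.2],
    "room":        [0.2, 0.8, 0.3],
    "rocket":      [0.0, 0.1, 0.9],
}

def sentence_vector(sentence):
    """Average the vectors of known words (a simple baseline, not s2v)."""
    vecs = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "inexpensive room" shares no words with "cheap hotel", yet their
# averaged vectors are close, so the semantically similar candidate wins.
query = sentence_vector("inexpensive room")
candidates = ["cheap hotel", "rocket"]
best = max(candidates, key=lambda s: cosine(query, sentence_vector(s)))
```

With these toy vectors, `best` is `"cheap hotel"`, showing retrieval across different surface words, which is the effect the abstract reports.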