Open Access
Research on Semantic Similarity of Short Text Based on Bert and Time Warping Distance
Author(s) - Shijie Qiu, Yan Niu, Jun Li, Xing Li
Publication year - 2021
Publication title - Journal of Web Engineering / Journal of Web Engineering On Line
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.151
H-Index - 13
eISSN - 1544-5976
pISSN - 1540-9589
DOI - 10.13052/jwe1540-9589.20814
Subject(s) - semantic similarity, computer science, similarity (geometry), artificial intelligence, feature (linguistics), feature vector, dynamic time warping, natural language processing, vector space model, point (geometry), ambiguity, sequence (biology), pattern recognition (psychology), information retrieval, mathematics, image (mathematics), linguistics, philosophy, geometry, biology, genetics, programming language
Semantic similarity of short texts plays an important role in machine translation, emotion analysis, information retrieval, and other AI applications. However, existing short text similarity methods have difficulty analyzing the characteristics of ambiguous vocabulary effectively, and their handling of problems caused by word order needs further optimization. To address vocabulary ambiguity, this paper proposes a short text semantic similarity calculation method that combines BERT with a time warping distance algorithm. The model first uses the pre-trained BERT model to extract the semantic features of the short text as a whole, obtaining a 768-dimensional short text feature vector. It then transforms the extracted feature vector into a point sequence in space and uses the CTW algorithm to calculate the time warping distance between the curves formed by connecting the point sequences. Finally, it calculates the similarity between short texts using a weight function designed through analysis, based on the principle that the smaller the time warping distance, the higher the similarity. The experimental results show that this model can mine the feature information of ambiguous words and effectively calculate the similarity of short texts with lexical ambiguity. Compared with other models, it distinguishes the semantic features of ambiguous words more accurately.
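
The pipeline described in the abstract (feature vector, point sequence, warping distance, weight function) can be illustrated with a rough sketch. Everything specific here is an assumption rather than the authors' method: classic dynamic time warping (DTW) stands in for the CTW algorithm, each vector component is mapped to a 2-D point `(index, value)`, the weight function is taken to be `1 / (1 + d)`, and short toy vectors replace the 768-dimensional BERT features. The function names are hypothetical.

```python
import math


def to_point_sequence(vec):
    """Map a feature vector to 2-D points (index, value).

    Assumed mapping for illustration; the abstract does not specify one.
    """
    return [(float(i), float(v)) for i, v in enumerate(vec)]


def dtw_distance(a, b):
    """Classic DTW distance between two point sequences.

    Used here as a stand-in for the CTW algorithm named in the abstract.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def similarity(vec_a, vec_b):
    """Assumed weight function: smaller distance gives higher similarity."""
    d = dtw_distance(to_point_sequence(vec_a), to_point_sequence(vec_b))
    return 1.0 / (1.0 + d)


# Toy vectors standing in for BERT sentence features.
text_a = [0.1, 0.5, 0.9, 0.3]
text_b = [0.1, 0.5, 0.9, 0.3]
text_c = [0.9, 0.1, 0.2, 0.8]

sim_same = similarity(text_a, text_b)
sim_diff = similarity(text_a, text_c)
```

With this weight function, identical vectors score exactly 1, and the score decreases toward 0 as the warping distance grows. In practice the toy vectors would be replaced by sentence features from a pre-trained BERT model.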