
A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases
Author(s) - Pang Shanchen, Yao Jiamin, Liu Ting, Zhao Hua, Chen Hongqi
Publication year - 2020
Publication title - Chinese Journal of Electronics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.267
H-Index - 25
eISSN - 2075-5597
pISSN - 1022-4653
DOI - 10.1049/cje.2019.12.011
Subject(s) - computer science, artificial intelligence, similarity (geometry), natural language processing, semantic similarity, phrase, feature (linguistics), fingerprint (computing), pattern recognition (psychology), recall rate, word (group theory), semantics (computer science), precision and recall, mathematics, linguistics, image (mathematics), philosophy, geometry, programming language
Text similarity measurement is the basis for determining the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints offer high detection speed, but they are suitable only for exact (verbatim) matching. We propose a Chinese text similarity measurement method based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to pre-process the text, keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model, and these keywords are then screened to select the feature words. The exact meaning of each word and the semantic similarities between words are obtained from the HowNet semantic dictionary. Concepts are then substituted to obtain the feature phrases, from which a semantic fingerprint is generated and the similarity is calculated. The experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy rate, recall rate, and F-value.
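
The pipeline outlined in the abstract (TF-IDF keyword extraction, concept substitution, fingerprint generation, similarity calculation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the authors' implementation: the `CONCEPTS` table is a toy stand-in for a HowNet lookup, whitespace tokenization of English text replaces Chinese word segmentation, the smoothed IDF formula and `top_k` cut-off stand in for the paper's screening step, and a SimHash-style fingerprint compared by Hamming distance is swapped in because the abstract does not specify the fingerprinting scheme.

```python
import hashlib
import math
from collections import Counter

# Hypothetical stand-in for a HowNet lookup: word -> canonical concept.
CONCEPTS = {"automobile": "car", "vehicle": "car", "quick": "fast", "rapid": "fast"}


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_keywords(doc_tokens, corpus, top_k=5):
    """Return the top_k (word, weight) pairs of one document by TF-IDF."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumption)
        scores[word] = (count / len(doc_tokens)) * idf
    # Sort by descending weight, then alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def to_concepts(weighted_words):
    """Replace each word by its concept and merge the weights."""
    merged = Counter()
    for word, weight in weighted_words:
        merged[CONCEPTS.get(word, word)] += weight
    return merged


def simhash(weighted_features, bits=64):
    """SimHash-style fingerprint: weighted bitwise vote over feature hashes."""
    totals = [0.0] * bits
    for feature, weight in sorted(weighted_features.items()):
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(bits):
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def similarity(fp_a, fp_b, bits=64):
    """Similarity in [0, 1] derived from the Hamming distance."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


def semantic_fingerprint(text, corpus, top_k=5):
    """Full pipeline for one text: keywords -> concepts -> fingerprint."""
    keywords = tfidf_keywords(tokenize(text), corpus, top_k)
    return simhash(to_concepts(keywords))


if __name__ == "__main__":
    texts = ["the quick automobile", "the rapid vehicle", "a slow boat sails"]
    corpus = [tokenize(t) for t in texts]
    fps = [semantic_fingerprint(t, corpus) for t in texts]
    print(similarity(fps[0], fps[1]))
    print(similarity(fps[0], fps[2]))
```

In this sketch, two texts that differ only by words mapped to the same concept receive identical fingerprints, which is the behaviour a purely lexical digital fingerprint cannot provide. A real system would replace `CONCEPTS` with sense disambiguation and similarity scores drawn from HowNet.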