Open Access
An empirical comparison of distance/similarity measures for Natural Language Processing
Author(s) - Dimmy Magalhães, Aurora Pozo, Roberto Santana
Publication year - 2019
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/eniac.2019.9328
Subject(s) - computer science , artificial intelligence , word2vec , graph , similarity (geometry) , euclidean distance , pattern recognition (psychology) , semantic similarity , natural language processing , mathematics , embedding , theoretical computer science , image (mathematics)
Text classification is one of the tasks of Natural Language Processing (NLP). In this area, Graph Convolutional Networks (GCN) have achieved higher scores than CNNs and other related models. In GCN, term frequency–inverse document frequency (TF-IDF), which defines the correlation between words in a vector space, plays a central role: it determines the weight of the edge between two words (each represented by a node in the graph). In this study, we empirically investigated the impact of thirteen other distance/similarity measures in GCN. A representation was built for each document using word embeddings from a word2vec model. In addition, for each measure analyzed, a graph-based representation of each of five datasets was created, where each word is a node in the graph and each edge is weighted by the distance/similarity between the two words it connects. Finally, each model was run in a simple graph neural network. The results show that, for text classification, there is no statistically significant difference between the analyzed metrics and the Graph Convolutional Network. Even with the incorporation of external words or external knowledge, the results were similar to those of the methods that did not incorporate them. However, the results indicate that some distance metrics capture context better than others, with Euclidean distance reaching the best values or being statistically indistinguishable from the best.
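The pipeline in the abstract builds one word graph per measure, with edges weighted by the distance/similarity between word vectors, and then runs a simple graph neural network on each graph. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses toy 3-dimensional vectors in place of word2vec embeddings (`EMBEDDINGS` is hypothetical). It covers only three measures (Euclidean, Manhattan, cosine). It maps each distance to an edge weight with 1 / (1 + d) (an assumed mapping) and clips negative cosine similarities to 0 (also assumed). It ends with a single untrained graph convolution that uses the symmetric normalization D^{-1/2}(A + I)D^{-1/2} of Kipf and Welling.

```python
import math

# Hypothetical toy vectors standing in for word2vec embeddings.
EMBEDDINGS = {
    "graph": [0.9, 0.1, 0.3],
    "network": [0.8, 0.2, 0.4],
    "text": [0.1, 0.9, 0.5],
    "word": [0.2, 0.8, 0.6],
}


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Assumption: distances become weights via 1 / (1 + d), so closer words get
# heavier edges; cosine is clipped at 0 so edge weights stay non-negative.
MEASURES = {
    "euclidean": lambda u, v: 1.0 / (1.0 + euclidean(u, v)),
    "manhattan": lambda u, v: 1.0 / (1.0 + manhattan(u, v)),
    "cosine": lambda u, v: max(0.0, cosine_similarity(u, v)),
}


def build_adjacency(words, vectors, weight_fn):
    """One node per word; edge (i, j) weighted by weight_fn, no self-edges yet."""
    n = len(words)
    return [[weight_fn(vectors[words[i]], vectors[words[j]]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def normalize(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    h = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, value) for value in row] for row in h]


if __name__ == "__main__":
    words = list(EMBEDDINGS)
    features = [EMBEDDINGS[w] for w in words]  # node features = word vectors
    # Hypothetical fixed 3x2 weight matrix; in training it would be learned.
    weights = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3]]
    for name, fn in MEASURES.items():
        a_norm = normalize(build_adjacency(words, EMBEDDINGS, fn))
        out = gcn_layer(a_norm, features, weights)
        print(name, [round(v, 3) for v in out[0]])
```

Only the `MEASURES` entry changes between runs. The graph-building code and the network stay fixed, so any difference in output comes from the edge-weighting measure alone, which is the variable the study compares.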
