Open Access
Measuring Brazilian Portuguese Product Titles Similarity using Embeddings
Author(s) - Alan da Silva Romualdo, Livy Real, Helena de Medeiros Caseli
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/stil.2021.17791
Subject(s) - word2vec, similarity (geometry), closeness, natural language processing, word (group theory), computer science, cosine similarity, artificial intelligence, portuguese, semantic similarity, domain (mathematical analysis), product (mathematics), mathematics, pattern recognition (psychology), embedding, linguistics, mathematical analysis, philosophy, geometry, image (mathematics)
Textual similarity deals with determining how similar two pieces of text are, considering their lexical (surface form) or semantic (meaning) closeness. In this paper, we applied word embeddings to measure the similarity of e-commerce product titles in Brazilian Portuguese. We generated domain-specific word embeddings (using Word2Vec, FastText and GloVe) and compared them with general-domain models (word embeddings and BERT models). We concluded that cosine similarity computed with the domain-specific word embeddings was a good approach for distinguishing between similar and non-similar products, but the multilingual pre-trained BERT model performed best.
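
As an illustration of how cosine similarity over word embeddings can score a pair of product titles, the following is a minimal sketch and not the authors' implementation. It assumes that a title is represented by the average of its word vectors, which the abstract does not state. The `TOY_VECTORS` table, its three-dimensional values, and the function names are hypothetical; in practice the vectors would come from a trained Word2Vec, FastText or GloVe model.

```python
import math

# Hypothetical toy embeddings (3 dimensions) used only for illustration.
# Real embeddings would be loaded from a trained model and have far more dimensions.
TOY_VECTORS = {
    "celular": [0.9, 0.1, 0.0],
    "smartphone": [0.8, 0.2, 0.1],
    "samsung": [0.7, 0.3, 0.0],
    "geladeira": [0.0, 0.2, 0.9],
    "brastemp": [0.1, 0.3, 0.8],
}


def title_vector(title, vectors):
    """Average the vectors of the in-vocabulary tokens of a title.

    Averaging is an assumption of this sketch. Returns None when no
    token of the title is in the vocabulary.
    """
    tokens = [t for t in title.lower().split() if t in vectors]
    if not tokens:
        return None
    dim = len(vectors[tokens[0]])
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def title_similarity(title_a, title_b, vectors=TOY_VECTORS):
    """Cosine similarity between two averaged title vectors.

    Returns 0.0 when either title has no in-vocabulary token
    (a design choice of this sketch).
    """
    vec_a = title_vector(title_a, vectors)
    vec_b = title_vector(title_b, vectors)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    print(title_similarity("Celular Samsung", "Smartphone Samsung"))
    print(title_similarity("Celular Samsung", "Geladeira Brastemp"))
```

With these toy vectors, the two phone titles score much higher than the phone and refrigerator pair. A threshold on the score could then separate similar from non-similar products, although the threshold value would have to be tuned on labelled data.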