Measuring Brazilian Portuguese Product Titles Similarity using Embeddings

Alan da Silva Romualdo; Livy Real; Helena de Medeiros Caseli

Open Access

Publication year - 2021

Language(s) - English

Resource type - Conference proceedings

DOI - 10.5753/stil.2021.17791

Subject(s) - word2vec, closeness, natural language processing, computer science, cosine similarity, artificial intelligence, Portuguese, semantic similarity, embedding, linguistics

Textual similarity is the task of determining how similar two pieces of text are, considering either their lexical (surface-form) or their semantic (meaning) closeness. In this paper, we applied word embeddings to measure the similarity of e-commerce product titles in Brazilian Portuguese. We generated domain-specific word embeddings (using Word2Vec, FastText, and GloVe) and compared them with general-domain models (word embeddings and BERT models). We concluded that the cosine similarity computed with the domain-specific word embeddings was a good approach for distinguishing between similar and non-similar products, but that the pre-trained multilingual BERT model performed best.
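
The abstract describes scoring a pair of product titles by the cosine similarity of their embeddings and using that score to separate similar from non-similar products. The abstract does not say how word vectors are combined into a single title vector, so the sketch below assumes the common choice of averaging the vectors of in-vocabulary tokens. The toy 3-dimensional vectors in `TOY_EMBEDDINGS`, the helper names, and the 0.8 decision threshold are all hypothetical stand-ins for embeddings actually trained with Word2Vec, FastText or GloVe on product titles.

```python
import math
import re
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical toy embeddings standing in for vectors trained on product titles
# (e.g. with Word2Vec, FastText or GloVe). Real models use far more dimensions.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "celular": [0.9, 0.1, 0.0],
    "smartphone": [0.85, 0.15, 0.05],
    "samsung": [0.3, 0.8, 0.1],
    "galaxy": [0.35, 0.75, 0.1],
    "geladeira": [0.0, 0.1, 0.95],
    "frost": [0.05, 0.05, 0.9],
    "free": [0.1, 0.1, 0.8],
}


def tokenize(title: str) -> List[str]:
    """Lowercase a title and split it into word tokens."""
    # The word-character class matches Unicode letters, so accented
    # Portuguese words such as "ímã" stay whole.
    return re.findall(r"\w+", title.lower())


def title_vector(title: str, embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of in-vocabulary tokens (assumed aggregation).

    Returns None when no token of the title is in the vocabulary.
    """
    vectors = [embeddings[tok] for tok in tokenize(title) if tok in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity dot(u, v) / (|u| * |v|); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def title_similarity(a: str, b: str, embeddings: Dict[str, Vector]) -> float:
    """Cosine similarity between two titles; 0.0 if either has no known token."""
    va = title_vector(a, embeddings)
    vb = title_vector(b, embeddings)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)


def is_similar(a: str, b: str, embeddings: Dict[str, Vector],
               threshold: float = 0.8) -> bool:
    """Label a pair as similar when its score reaches a hypothetical threshold."""
    return title_similarity(a, b, embeddings) >= threshold
```

With these toy vectors, "Celular Samsung Galaxy" and "Smartphone Samsung Galaxy" score about 0.999, while "Celular Samsung Galaxy" and "Geladeira Frost Free" score about 0.19, so only the first pair passes the hypothetical threshold. The same `cosine` function applies unchanged to title vectors taken from a BERT model; how BERT token outputs are pooled into one vector is a separate choice not covered by this sketch.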
