Open Access
Comparative study of word embedding methods in topic segmentation
Author(s) - Marwa Naili, Anja Habacha Chaïbi, Henda Hajjami Ben Ghézala
Publication year - 2017
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.08.009
Subject(s) - word2vec, computer science, natural language processing, word embedding, artificial intelligence, word (group theory), context (archaeology), text segmentation, representation (politics), field (mathematics), embedding, segmentation, linguistics, paleontology, philosophy, mathematics, politics, political science, pure mathematics, law, biology
Vector representations of words are useful in many natural language processing tasks because they capture the semantic meaning of words. Three well-known methods for building them are LSA, Word2Vec and GloVe. In this paper, these methods are investigated in the field of topic segmentation for both Arabic and English. Moreover, Word2Vec is studied in depth using different models and approximation algorithms. The results show that the performance of LSA, Word2Vec and GloVe depends on the language used. Nevertheless, Word2Vec yields the best word vector representation, although this depends on the choice of model.
