A Filtering Process to Enhance Topic Detection and Labelling
Author(s) - Amal Tarifa, Aroua Hedhili, Wided Lejouad Chaari
Publication year - 2020
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2020.09.042
Subject(s) - computer science, latent dirichlet allocation, topic model, process (computing), word2vec, dirichlet process, personalization, information retrieval, digitization, dependency (uml), metric (unit), aggregate (composite), word embedding, word (group theory), artificial intelligence, social media, machine learning, data mining, bayesian probability, world wide web, embedding, linguistics, operations management, materials science, philosophy, economics, composite material, computer vision, operating system
In the digitization era, it is important to detect and analyze the topics of discussions on social media, and to label visited web pages or documents. This information can support personalization and improve user satisfaction. Many methods process large volumes of data to provide insights into user behavior. In this paper, we propose a filtering process that enhances topic detection and labelling. This process aims to compact the results delivered by inferential algorithms such as Latent Dirichlet Allocation and the Dirichlet Mixture Model. It relies on the dependency between words in each context of use to deliver highly correlated labels. Specifically, we use Word2vec and N-grams to eliminate non-significant words in each topic, and the Hellinger distance to aggregate redundant words into the appropriate topic. We also eliminate unreliable topics according to a metric. We combine this proposal with different topic-modeling algorithms. Experiments demonstrate the effectiveness of combining an inferential model with our filtering process compared to the state of the art. We validate the proposal on different textual datasets.
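
The abstract names the Hellinger distance but does not say exactly how it is applied. As a minimal sketch, assuming each topic is represented as a probability distribution over a shared vocabulary, the distance H(P, Q) = (1/√2)·‖√P − √Q‖₂ can be computed as below. The helper `near_duplicate_pairs` and its `threshold` value are hypothetical illustrations of how overlapping topics might be flagged, not the authors' procedure.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Both inputs are sequences of non-negative weights over the same
    vocabulary, each summing to 1. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def near_duplicate_pairs(topics, threshold=0.3):
    """Return index pairs of topics closer than `threshold`.

    Hypothetical helper: the threshold is an assumed value chosen for
    illustration only.
    """
    pairs = []
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            if hellinger(topics[i], topics[j]) < threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Toy topic-word distributions over a three-word vocabulary.
    toy_topics = [
        [0.5, 0.5, 0.0],
        [0.5, 0.5, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print(near_duplicate_pairs(toy_topics))
```

A distance of 0 means the two distributions are identical, and 1 means they share no vocabulary mass, so a small threshold flags only topics with nearly the same word weights.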
