
Analysis of Topic Modeling with Unpooled and Pooled Tweets and Exploration of Trends during Covid
Author(s) - Jaishree Ranganathan, Tsega Tsahai
Publication year - 2021
Publication title - International Journal of Computer Science, Engineering and Applications
Language(s) - English
Resource type - Journals
eISSN - 2231-0088
pISSN - 2230-9616
DOI - 10.5121/ijcsea.2021.11601
Subject(s) - latent dirichlet allocation, topic model, social media, trigram, computer science, jargon, gibbs sampling, inference, pooling, perplexity, sentiment analysis, data science, artificial intelligence, microblogging, information retrieval, machine learning, world wide web, bayesian probability, language model, linguistics, philosophy
In this digital era, social media is an important tool for information dissemination, and Twitter is one of the most popular platforms. Social media analytics helps decision makers act on people's needs and opinions. When properly interpreted, this information provides valuable insights in domains such as public policymaking, marketing, sales, and healthcare. Topic modeling is an unsupervised technique for discovering hidden patterns in text documents. In this study, we explore the Latent Dirichlet Allocation (LDA) topic model. We collected tweets with hashtags related to coronavirus discussions. The study compares regular LDA with LDA based on collapsed Gibbs sampling (LDAMallet). The experiments vary the data processing steps: with and without trigrams, and with and without hashtags. This study provides a comprehensive analysis of LDA for short text messages using unpooled and pooled tweets. The results suggest that a pooling scheme using hashtags improves topic inference, yielding a better coherence score.
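
The hashtag pooling idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pool_by_hashtag`, the sample tweets, and the handling of tweets without hashtags (kept as separate single-tweet documents) are all assumptions made for illustration. The sketch merges tweets that share a hashtag into one longer pseudo-document, which is the kind of input a topic model such as LDA could then be trained on.

```python
import re
from collections import defaultdict

# Matches a hashtag and captures the tag text without the leading '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Group tweets into one pseudo-document per hashtag.

    Assumptions of this sketch: a tweet with several hashtags joins every
    matching pool, and a tweet with no hashtag is returned separately as
    an unpooled document.
    """
    pools = defaultdict(list)
    unpooled = []
    for text in tweets:
        # Lowercase the tags so '#Covid19' and '#covid19' share a pool.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if not tags:
            unpooled.append(text)
            continue
        for tag in tags:
            pools[tag].append(text)
    # Concatenate each pool into a single longer document.
    pooled_docs = {tag: " ".join(texts) for tag, texts in pools.items()}
    return pooled_docs, unpooled


# Hypothetical sample data, not taken from the study's dataset.
tweets = [
    "Stay home and stay safe #covid19",
    "Vaccine rollout begins next week #covid19 #vaccine",
    "Working from home again today",
]
pooled_docs, unpooled = pool_by_hashtag(tweets)
```

In this sketch the two tweets tagged `#covid19` become one document, so the model sees more word co-occurrences per document than it would with individual short tweets.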