
Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation
Author(s) - Kartika Rizqi Nastiti, Ahmad Fathan Hidayatullah, Ahmad R. Pratama
Publication year - 2021
Publication title - JOIN (Jurnal Online Informatika)
Language(s) - English
Resource type - Journals
eISSN - 2528-1682
pISSN - 2527-9165
DOI - 10.15575/join.v6i1.636
Subject(s) - latent dirichlet allocation, topic model, computer science, relevance (law), data science, field (mathematics), tag cloud, information retrieval, range (aeronautics), coherence (philosophical gambling strategy), word (group theory), interpretation (philosophy), filter (signal processing), visualization, data mining, statistics, mathematics, engineering, programming language, geometry, political science, pure mathematics, law, computer vision, aerospace engineering
Before starting a research project, researchers need to identify the trends and the state of the art in their field. This is not always easy, partly because few tools can filter the required information by time range. This study addresses that problem by applying topic modeling to data scraped from Google Scholar for the years 2010 to 2019. We used Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build the topic models and used coherence scores to determine the number of topics in each year's data. We also visualized the topic interpretation, the word distribution of each topic, and its relevance using word clouds and pyLDAvis. In future work, we expect to add features that show the relevance of and interconnections between topics, making the tool even easier for researchers to use in their projects.
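The pipeline described in the abstract (TF-IDF, LDA, then choosing the number of topics by coherence) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the toy titles are invented, TF-IDF is used only to drop terms with zero weight, LDA is fitted with a small collapsed Gibbs sampler, and coherence is measured with the UMass formula (the abstract does not say which coherence measure was used). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: TF-IDF weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
         for t, c in Counter(doc).items()}
        for doc in docs
    ]


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one term Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({t for doc in docs for t in doc})
    doc_topic = [[0] * k for _ in docs]
    topic_term = [Counter() for _ in range(k)]
    topic_total = [0] * k

    def update(d, t, z, delta):
        doc_topic[d][z] += delta
        topic_term[z][t] += delta
        topic_total[z] += delta

    # Start from a random topic assignment for every token.
    assign = [[rng.randrange(k) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for t, z in zip(doc, assign[d]):
            update(d, t, z, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                update(d, t, assign[d][i], -1)  # remove the token's current topic
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_term[j][t] + beta)
                    / (topic_total[j] + beta * vocab_size)
                    for j in range(k)
                ]
                assign[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, t, assign[d][i], +1)
    return topic_term


def umass_coherence(top_words, docs):
    """Mean UMass coherence over ordered pairs of a topic's top words."""
    doc_sets = [set(doc) for doc in docs]
    score, pairs = 0.0, 0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = sum(top_words[l] in s for s in doc_sets)
            d_ml = sum(top_words[m] in s and top_words[l] in s for s in doc_sets)
            score += math.log((d_ml + 1) / d_l)
            pairs += 1
    return score / pairs if pairs else 0.0


def pick_num_topics(docs, candidates, top_n=5):
    """Return (k, mean coherence) for the candidate k with the highest score."""
    best = None
    for k in candidates:
        scores = []
        for term_counts in lda_gibbs(docs, k):
            words = [w for w, c in term_counts.most_common(top_n) if c > 0]
            if len(words) >= 2:  # coherence needs at least one word pair
                scores.append(umass_coherence(words, docs))
        mean = sum(scores) / len(scores) if scores else float("-inf")
        if best is None or mean > best[1]:
            best = (k, mean)
    return best


# Invented toy "paper titles" for one year (illustrative data only).
raw = [
    "deep learning image classification using".split(),
    "neural network image recognition using".split(),
    "deep neural network learning using".split(),
    "wireless sensor network routing using".split(),
    "sensor network energy routing using".split(),
    "wireless network energy protocol using".split(),
]
# Assumption: TF-IDF acts as a filter that drops terms with zero weight.
docs = [[t for t in doc if w[t] > 0] for doc, w in zip(raw, tfidf(raw))]
best_k, best_score = pick_num_topics(docs, candidates=(2, 3, 4))
print(best_k, round(best_score, 3))
```

Running the selection once per publication year would give a per-year topic count, in the spirit of the per-year analysis the abstract describes. On real data, an established library implementation of LDA and coherence scoring would be a more practical choice than this hand-written sampler.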