Topic Tracking Based on Identifying Proper Number of the Latent Topics in Documents
Author(s) - Midori Serizawa, Ichiro Kobayashi
Publication year - 2012
Publication title - Journal of Advanced Computational Intelligence and Intelligent Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.172
H-Index - 20
eISSN - 1883-8014
pISSN - 1343-0130
DOI - 10.20965/jaciii.2012.p0611
Subject(s) - latent Dirichlet allocation, perplexity, computer science, topic model, latent semantic analysis, similarity (geometry), semantics (computer science), metric (unit), artificial intelligence, probabilistic latent semantic analysis, information retrieval, data mining, image (mathematics), language model, operations management, economics, programming language
In this paper, we propose a method for detecting and tracking topics in newspaper articles based on the latent semantics of the documents. We use Latent Dirichlet Allocation (LDA) to extract latent topics. LDA requires the number of latent topics in the target documents to be specified in advance, and perplexity is widely used as a metric for estimating that number. Alternatively, Hierarchical Dirichlet Process LDA (HDP-LDA) estimates the number of latent topics without any prior information. We propose a method that estimates the number of latent topics in the target documents by calculating the similarity among extracted topics, and we conduct an experiment with three data sets to compare it with the two representative methods above, i.e., HDP-LDA and LDA using perplexity. The experimental results confirm that our method provides results similar to those of HDP-LDA. We also detect and track topics with the proposed method and confirm that it is useful.
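The abstract does not say which similarity measure or selection rule the authors use, so the following is only an illustrative sketch of the general idea of choosing a topic count from the similarity among extracted topics. It assumes cosine similarity between topic-word distributions and picks the candidate K whose topics are, on average, least similar to one another. The function names, the selection rule, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(topics):
    """Average cosine similarity over all pairs of topic-word distributions.

    Requires at least two topics, otherwise there are no pairs to average.
    """
    pairs = list(combinations(topics, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def pick_num_topics(models):
    """Return the candidate K whose topics are least similar to one another.

    `models` maps a candidate K to its list of topic-word distributions.
    This input format is an assumption; in practice the distributions would
    come from LDA models fitted separately for each K.
    """
    return min(models, key=lambda k: mean_pairwise_similarity(models[k]))


# Toy distributions over a 4-word vocabulary (made-up numbers for illustration).
models = {
    2: [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]],
    3: [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7], [0.6, 0.2, 0.1, 0.1]],
}
best_k = pick_num_topics(models)
```

In the toy data, the third topic for K = 3 nearly duplicates the first, which raises the average similarity and makes K = 2 the preferred choice under this assumed rule. Other measures, such as Jensen-Shannon divergence between topics, could be substituted without changing the overall structure.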