Discovering hierarchical topic evolution in time‐stamped documents
Author(s) -
Song Jun,
Huang Yu,
Qi Xiang,
Li Yuheng,
Li Feng,
Fu Kun,
Huang Tinglei
Publication year - 2016
Publication title -
Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23439
Subject(s) - timestamp, hierarchy, computer science, process (computing), data mining, measure (data warehouse), hierarchical database model, baseline (sea), real time computing, economics, market economy, operating system, oceanography, geology
This paper proposes a hierarchical topic evolution model (HTEM) that organizes time‐varying topics in a hierarchy and discovers their evolution at multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and evolve over longer timescales than those near the leaves. To achieve this, the distance‐dependent Chinese restaurant process (ddCRP) is extended to a new nested process that simultaneously models the dependencies among data and the relationships between clusters. The HTEM is built on this new process for time‐stamped documents, with the timestamps used to measure the dependencies among documents. An efficient Gibbs sampler is also developed for the proposed HTEM. Experimental results on two popular real‐world data sets show that the proposed HTEM captures coherent topics and discovers their hierarchical evolution. It also outperforms the baseline model in terms of likelihood on held‐out data.
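
To make the role of timestamps concrete, the following is a minimal sketch of a draw from the plain (non-nested) ddCRP prior, not the nested process or the HTEM described in the abstract. It assumes a sequential setting in which a document may link only to itself or to documents that are not later in time, and an exponential decay of link weight with the time gap. The function names and the parameters `alpha` and `decay_scale` are illustrative choices, not taken from the paper.

```python
import math
import random


def sample_ddcrp_links(timestamps, alpha=1.0, decay_scale=1.0, rng=None):
    """Draw one link per document from a sequential ddCRP prior.

    Document i links to itself with weight alpha, or to a document j that
    is not later in time with weight exp(-(t_i - t_j) / decay_scale).
    The exponential decay is an assumed choice for this sketch.
    """
    rng = rng or random.Random()
    n = len(timestamps)
    links = []
    for i, t_i in enumerate(timestamps):
        weights = []
        for j, t_j in enumerate(timestamps):
            if j == i:
                weights.append(alpha)  # self-link: starts a new cluster
            elif t_j <= t_i:
                weights.append(math.exp(-(t_i - t_j) / decay_scale))
            else:
                weights.append(0.0)  # later documents are not reachable
        links.append(rng.choices(range(n), weights=weights)[0])
    return links


def links_to_clusters(links):
    """Label the connected components of the link graph as clusters."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(links))]


if __name__ == "__main__":
    stamps = [0.0, 0.1, 5.0, 5.1]
    drawn = sample_ddcrp_links(stamps, rng=random.Random(0))
    print(drawn, links_to_clusters(drawn))
```

In this sketch, documents that are close in time tend to link to each other and so fall into the same cluster, while a larger `decay_scale` lets links span longer time gaps.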