Topic detection model in a single‐domain corpus inspired by the human memory cognitive process
Author(s) -
Zhao Taotao,
Luo Xiangfeng,
Qin Wei,
Huang Subin,
Xie Shaorong
Publication year - 2018
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4642
Subject(s) - computer science , latent dirichlet allocation , artificial intelligence , topic model , natural language processing , text corpus , semantics (computer science) , process (computing) , probabilistic logic , domain (mathematical analysis) , machine learning , mathematical analysis , mathematics , programming language , operating system
Summary A corpus (e.g., patents or news texts) is an important knowledge resource containing various topics, such as specific technologies or social events. Corpus topic detection models, e.g., Latent Dirichlet Allocation (LDA) and KeyGraph, provide an important basis for exploring the status quo and trends in science, technology, or social events. However, these models suffer from low retrieval performance because they consider only a text's own explicit semantics in a single-domain corpus. In addition, many incremental models, such as online-LDA, depend on time slices. In this paper, a new topic detection model, inspired by the human memory cognitive process (THC), is proposed to improve topic detection performance on a single-domain corpus. First, to improve accuracy, distributions over words and inter-word relations across the corpus are used as background knowledge, a type of implicit semantics, which helps locate the more semantically sensitive parts of texts. Second, to realize online topic detection without time slices, we introduce a probability gain-based dynamic probabilistic model that detects latent topics by learning from the dynamic human memory cognitive process. These two steps constitute the framework of our model. Experimental results on four public datasets (Reuters-R8, Reuters-R52, WebKB, and Cade12) show that our model scores approximately ten percentage points higher than baselines (e.g., KeyGraph and LDA) on the Adjusted Rand Index (ARI).
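To make the evaluation setup concrete, here is a minimal sketch of the LDA baseline the paper compares against, scored with the Adjusted Rand Index. This is not the proposed THC model; the toy corpus, topic count, and labels below are illustrative assumptions using scikit-learn.

```python
# Sketch: an LDA topic-detection baseline evaluated with ARI,
# the metric reported in the paper. Toy data is an assumption.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import adjusted_rand_score

docs = [
    "patent claims cover a battery electrode coating process",
    "the battery patent describes electrode materials",
    "election news reports the candidate winning votes",
    "news coverage of the election and voter turnout",
]
true_labels = [0, 0, 1, 1]  # ground-truth topic per document

# Bag-of-words counts, then fit a 2-topic LDA
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # per-document topic distribution

# Hard-assign each document to its most probable topic
pred_labels = doc_topic.argmax(axis=1)

# ARI compares the predicted clustering to the ground truth
# (1.0 = identical partitions, ~0 = chance agreement)
ari = adjusted_rand_score(true_labels, pred_labels)
print(ari)
```

ARI is the clustering-agreement metric on which the paper reports its roughly ten-point gain over such baselines.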
