Improving Topic Models with Latent Feature Word Representations
Author(s) -
Dat Quoc Nguyen,
Richard Billingsley,
Lan Du,
Mark Johnson
Publication year - 2015
Publication title -
Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00140
Subject(s) - computer science , latent dirichlet allocation , topic model , artificial intelligence , natural language processing , probabilistic latent semantic analysis , cluster analysis , feature vector , probabilistic logic , document clustering , feature engineering , topic coherence , information retrieval , linguistics , deep learning
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
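The abstract describes mixing a Dirichlet multinomial topic-word distribution with a latent-feature component built from pre-trained word vectors. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the variable names (`lf_weight`, `topic_vec`) and the toy counts are assumptions made for the example.

```python
import numpy as np

# Sketch of the mixture word distribution described in the abstract:
# a Dirichlet-multinomial component learnt from the small corpus is
# combined with a latent-feature component derived from word vectors
# trained on a large external corpus.

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4

word_vecs = rng.normal(size=(vocab_size, dim))  # pre-trained embeddings (assumed given)
topic_vec = rng.normal(size=dim)                # per-topic latent feature vector

# Dirichlet-multinomial component: smoothed topic-word counts
counts = np.array([3.0, 1.0, 0.0, 2.0, 0.0])    # toy topic-word counts
beta = 0.01                                     # symmetric Dirichlet prior
mult_component = (counts + beta) / (counts.sum() + vocab_size * beta)

# Latent-feature component: softmax over topic-word dot products
scores = word_vecs @ topic_vec
lf_component = np.exp(scores - scores.max())
lf_component /= lf_component.sum()

# Mix the two distributions; lf_weight plays the role of the paper's
# mixture weight between the feature and multinomial components.
lf_weight = 0.6
word_given_topic = lf_weight * lf_component + (1 - lf_weight) * mult_component

print(word_given_topic)
```

Both components are proper distributions over the vocabulary, so their convex combination is as well; the embedding term lets topics assign probability to words that are rare in the small training corpus but close in vector space to the topic's frequent words.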