Combining Background Knowledge and Learned Topics
Author(s) -
Steyvers Mark,
Smyth Padhraic,
Chemudugunta Chaitanya
Publication year - 2011
Publication title -
Topics in Cognitive Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.191
H-Index - 56
eISSN - 1756-8765
pISSN - 1756-8757
DOI - 10.1111/j.1756-8765.2010.01097.x
Subject(s) - interpretability, computer science, hierarchy, generalization, set (abstract data type), probabilistic logic, topic model, data science, artificial intelligence, selection (genetic algorithm), information retrieval, natural language processing, mathematical analysis, mathematics, economics, market economy, programming language
Statistical topic models provide a general data‐driven framework for automated discovery of high‐level knowledge from large collections of text documents. Although topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human‐defined concepts, by contrast, tend to be semantically richer due to careful selection of the words that define them, but they may not span the themes in a data set exhaustively. In this study, we review a new probabilistic framework that combines a hierarchy of human‐defined semantic concepts with a statistical topic model to seek the best of both worlds. Results indicate that this combination leads to systematic improvements in generalization performance and enables new techniques for inferring and visualizing the content of a document.
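The core idea of combining fixed human-defined concepts with learned topics can be illustrated with a minimal sketch. The toy model below is an assumption-laden simplification, not the paper's actual model or inference procedure: each document mixes over a set of components, some of which are frozen "concept" word distributions built from curated word lists (the hypothetical concepts here are invented for illustration), while the remaining "free" topics are fit to the data with a simple EM-style update.

```python
import numpy as np

# Toy vocabulary (illustrative only)
vocab = ["gene", "dna", "protein", "market", "stock", "trade", "model", "data"]
V = len(vocab)
rng = np.random.default_rng(0)

# Human-defined concepts: fixed word distributions from curated word sets.
# These components are never updated during learning.
concept_words = {
    "genetics": ["gene", "dna", "protein"],
    "finance": ["market", "stock", "trade"],
}
concepts = np.zeros((len(concept_words), V))
for i, words in enumerate(concept_words.values()):
    for w in words:
        concepts[i, vocab.index(w)] = 1.0
concepts /= concepts.sum(axis=1, keepdims=True)

# Free, data-driven topics: randomly initialized, updated from the corpus.
K_free = 2
free = rng.dirichlet(np.ones(V), size=K_free)

# Toy corpus as word-count vectors (documents x vocabulary)
docs = np.array([
    [3, 2, 1, 0, 0, 0, 1, 1],   # genetics-flavored document
    [0, 0, 0, 3, 2, 2, 1, 1],   # finance-flavored document
])

K = len(concepts) + K_free
# Per-document mixing weights over all components (concepts + free topics)
theta = np.full((docs.shape[0], K), 1.0 / K)

for _ in range(50):
    phi = np.vstack([concepts, free])          # concept rows stay fixed
    # E-step: responsibilities r[d, k, w] proportional to theta[d, k] * phi[k, w]
    r = theta[:, :, None] * phi[None, :, :]
    r /= r.sum(axis=1, keepdims=True) + 1e-12
    counts = r * docs[:, None, :]
    # M-step: update document mixing weights over all components
    theta = counts.sum(axis=2)
    theta /= theta.sum(axis=1, keepdims=True)
    # M-step: update ONLY the free topics' word distributions
    free = counts[:, len(concepts):, :].sum(axis=0)
    free = (free + 1e-6) / (free + 1e-6).sum(axis=1, keepdims=True)

print(np.round(theta, 2))
```

Because the concept distributions assign zero probability outside their word sets, documents dominated by concept vocabulary load heavily on those fixed components, while residual words ("model", "data") are absorbed by the free topics; this mirrors, in miniature, how fixed concepts supply interpretability while learned topics cover themes the concept hierarchy misses.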