Open Access
Comparing Apples to Apple: The Effects of Stemmers on Topic Models
Author(s) - Alexandra Schofield, David Mimno
Publication year - 2016
Publication title - Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00099
Subject(s) - computer science, coherence, artificial intelligence, natural language processing, principle of maximum entropy, variety, machine learning, maximum likelihood, stability (learning theory), statistics, mathematics
Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling. In this work, we train and evaluate topic models on a variety of corpora using several different stemming algorithms. We examine several different quantitative measures of the resulting models, including likelihood, coherence, model stability, and entropy. Despite their frequent use in topic modeling, we find that stemmers produce no meaningful improvement in likelihood and coherence and in fact can degrade topic stability.
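
To make the preprocessing step concrete, the following is a minimal sketch of how a rule-based stemmer conflates word forms before a topic model sees them. The suffix rules and the name `toy_stem` are hypothetical and chosen only for illustration; this is not the Porter stemmer and does not reproduce the paper's experiments.

```python
# Hypothetical toy suffix rules for illustration only; NOT the Porter stemmer.
SUFFIXES = ("ing", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# A tiny made-up token stream standing in for a corpus.
tokens = "apples apple topics topic modeling models model".split()
stemmed = [toy_stem(t) for t in tokens]

# Stemming merges distinct surface forms into shared word types,
# so the vocabulary the topic model is trained on shrinks.
vocab_before = set(tokens)
vocab_after = set(stemmed)

if __name__ == "__main__":
    print(sorted(vocab_before))
    print(sorted(vocab_after))
```

In this sketch, seven surface forms collapse into three word types. Whether that kind of conflation helps or hurts a topic model is the question the paper evaluates.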
