Model-Based Estimation of Word Saliency in Text
Author(s) - Xin Wang, Ata Kabán
Publication year - 2006
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-46491-3
DOI - 10.1007/11893318_28
Subject(s) - artificial intelligence , computer science , classifier (uml) , word (group theory) , naive bayes classifier , generative model , latent variable , generative grammar , pattern recognition (psychology) , dirichlet distribution , natural language processing , latent variable model , latent dirichlet allocation , topic model , mathematics , support vector machine , mathematical analysis , geometry , boundary value problem
We investigate a generative latent variable model that estimates word saliency for text modelling and classification. The estimation algorithm we derive infers the saliency of each word with respect to the mixture modelling objective. Our experiments show that common stop-words, as well as other corpus-specific common words, are automatically down-weighted, which enhances our ability to capture the essential structure in the data while ignoring irrelevant details. As a classifier, our approach improves on the class prediction accuracy of the Naive Bayes classifier in all our experiments. Compared with a recent state-of-the-art text classification method (the Dirichlet Compound Multinomial model), we obtained improved results on two of the three benchmark text collections tested and comparable results on the third.
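To make the effect of down-weighting concrete, the sketch below shows a plain multinomial Naive Bayes classifier (the baseline named in the abstract) whose per-word log-likelihood terms can be scaled by a saliency weight in [0, 1]. This is only an illustration under stated assumptions: the weights here are set by hand, the toy corpus is invented, and the names `train_nb`, `predict`, and `saliency` are hypothetical. It is not the model-based estimator of the paper, which infers saliency from the data.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_theta = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_prior[c] = math.log(len(class_docs) / len(docs))
        log_theta[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_theta

def predict(doc, log_prior, log_theta, saliency=None):
    """Return the highest-scoring class for a tokenised document.

    saliency maps a word to a weight in [0, 1] (default 1.0) that scales
    the word's log-likelihood term. The weights are hand-set assumptions
    here, not estimates produced by the paper's algorithm.
    """
    saliency = saliency or {}
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in doc:
            if w in log_theta[c]:  # words unseen in training are skipped
                score += saliency.get(w, 1.0) * log_theta[c][w]
        scores[c] = score
    return max(scores, key=scores.get)

# Invented toy corpus: "the" is frequent in one class for no topical reason.
train_docs = [
    ["the", "the", "the", "match"],
    ["goal", "the"],
    ["chip", "code"],
    ["code", "chip"],
]
train_labels = ["sport", "sport", "tech", "tech"]
log_prior, log_theta = train_nb(train_docs, train_labels)

test_doc = ["the", "the", "the", "chip"]
plain = predict(test_doc, log_prior, log_theta)
weighted = predict(test_doc, log_prior, log_theta, saliency={"the": 0.1})
print(plain, weighted)
```

On this toy corpus the unweighted classifier is swayed by the repeated stop-word, while the down-weighted version follows the topical word. With every weight left at 1.0 the function reduces to standard Naive Bayes.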
