Open Access
Relevance popularity: A term event model based feature selection scheme for text classification
Author(s) - Guozhong Feng, Baiguo An, Fengqin Yang, Han Wang, Libiao Zhang
Publication year - 2017
Publication title - PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0174341
Subject(s) - feature selection, computer science, term (time), artificial intelligence, naive bayes classifier, feature (linguistics), pattern recognition (psychology), benchmark (surveying), support vector machine, linear discriminant analysis, machine learning, data mining, linguistics, philosophy, physics, geodesy, quantum mechanics, geography
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. Traditional feature selection methods such as information gain and chi-square often rely on the number of documents that contain a particular term (i.e., the document frequency). However, the frequency with which a given term appears within each document has not been fully investigated, even though it is a promising signal for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. Under the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized into per-term components. We then derive a feature selection measurement for each term by replacing the model's inner parameters with their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments with two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms representative feature selection methods.
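The abstract does not give the derived measurement itself, so the following is only a minimal sketch of the general idea, not the paper's relevance popularity formula. It assumes a two-class problem, estimates multinomial naive Bayes term probabilities from within-document term frequencies with Laplace smoothing, and scores each term by its contribution to the symmetric log-probability-ratio (Jeffreys divergence) between the classes. The function names `term_scores` and `select_top_k`, the smoothing parameter `alpha`, and the scoring rule are all illustrative assumptions.

```python
import math
from collections import Counter


def term_scores(docs, labels, alpha=1.0):
    """Score terms for a binary task from term frequencies (not document frequencies).

    docs   : list of token lists
    labels : list of 0/1 class labels, one per document
    alpha  : Laplace smoothing constant (assumed; not taken from the paper)
    """
    vocab = sorted({t for tokens in docs for t in tokens})
    counts = {0: Counter(), 1: Counter()}
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)  # every occurrence of a term counts
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    v = len(vocab)

    scores = {}
    for t in vocab:
        # Smoothed multinomial estimates of P(term | class).
        p1 = (counts[1][t] + alpha) / (totals[1] + alpha * v)
        p0 = (counts[0][t] + alpha) / (totals[0] + alpha * v)
        # Illustrative per-term score: the term's share of the Jeffreys
        # divergence between the two class-conditional distributions.
        scores[t] = (p1 - p0) * math.log(p1 / p0)
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [
        ["ball", "ball", "game"],
        ["game", "ball", "team"],
        ["vote", "vote", "law"],
        ["law", "vote", "team"],
    ]
    labels = [1, 1, 0, 0]
    print(select_top_k(term_scores(docs, labels), 2))
```

In this toy corpus, "team" occurs equally often in both classes and so receives a score of zero, while terms concentrated in one class rank highest. The selected terms would then be passed as the feature subset to a classifier such as naive Bayes or a support vector machine.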
