
Relevance popularity: A term event model based feature selection scheme for text classification
Author(s) -
Guozhong Feng,
Baiguo An,
Fengqin Yang,
Han Wang,
Libiao Zhang
Publication year - 2017
Publication title -
PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0174341
Subject(s) - feature selection , computer science , term (time) , artificial intelligence , naive bayes classifier , feature (linguistics) , pattern recognition (psychology) , benchmark (surveying) , support vector machine , linear discriminant analysis , machine learning , data mining , linguistics , philosophy , physics , geodesy , quantum mechanics , geography
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subset that is input to classifiers. Traditional feature selection methods such as information gain and chi-square often rely on the number of documents that contain a particular term (i.e., the document frequency). However, the frequency with which a given term appears within each document has not been fully investigated, even though it is promising information for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measure for each term after replacing the inner parameters with their estimators. Numerical experiments on a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), using two widely used text classifiers (naive Bayes and support vector machine), demonstrate that our method outperforms representative feature selection methods.
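The abstract does not give the closed form of the proposed "relevance popularity" score, so the sketch below is only an illustrative, assumed variant of the general idea: fit a smoothed multinomial (term event) naive Bayes model, which uses within-document term frequencies rather than document frequency alone, and rank each term by the spread of its class-conditional log-probabilities, a prediction-probability-ratio style criterion. The function names `mnb_term_scores` and `select_top_terms` and the toy data are hypothetical.

```python
import numpy as np

def mnb_term_scores(X, y, alpha=1.0):
    """Score terms from a multinomial (term event) naive Bayes model.

    X : (n_docs, n_terms) array of raw term counts (not just presence/absence).
    y : (n_docs,) array of class labels.
    Returns one score per term: the spread of the smoothed class-conditional
    log-probabilities, so terms whose usage frequency differs most across
    classes receive the highest scores.
    """
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    cond_prob = np.empty((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        counts = X[y == c].sum(axis=0) + alpha   # Laplace-smoothed term counts
        cond_prob[i] = counts / counts.sum()     # P(term | class) estimate
    log_p = np.log(cond_prob)
    # Max log-ratio of class-conditional probabilities across classes.
    return log_p.max(axis=0) - log_p.min(axis=0)

def select_top_terms(X, y, k):
    """Return the indices of the k highest-scoring terms."""
    scores = mnb_term_scores(X, y)
    return np.argsort(scores)[::-1][:k]

# Toy usage: 4 documents, 5 terms, 2 classes.
X = np.array([[3, 0, 1, 0, 2],
              [4, 1, 0, 0, 1],
              [0, 2, 0, 3, 1],
              [1, 3, 0, 4, 0]])
y = np.array([0, 0, 1, 1])
print(select_top_terms(X, y, k=3))
```

In practice the selected term indices would be used to restrict the document-term matrix before training a downstream classifier such as naive Bayes or a support vector machine, as in the experiments described above.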