A Novel Feature Selection Technique for Text Classification Using Naïve Bayes
Author(s) - Subhajit Dey Sarkar, Saptarsi Goswami, Aman Agarwal, Javed Aktar
Publication year - 2014
Publication title - International Scholarly Research Notices
Language(s) - English
Resource type - Journals
ISSN - 2356-7872
DOI - 10.1155/2014/717092
Subject(s) - naive bayes classifier, feature selection, computer science, artificial intelligence, univariate, classifier (uml), bayes' theorem, machine learning, pattern recognition (psychology), cluster analysis, categorization, feature (linguistics), data mining, bayes error rate, bayes classifier, bayesian probability, support vector machine, multivariate statistics, linguistics, philosophy
With the proliferation of unstructured data, text classification (also called text categorization) has found applications in topic classification, sentiment analysis, authorship identification, spam detection, and other areas. Many classification algorithms are available, and naïve Bayes remains one of the oldest and most popular. It is simple to implement and requires relatively little training data. The literature, however, reports that naïve Bayes performs poorly compared to other classifiers in text classification, which makes the classifier unusable in practice despite the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection step first reduces the search space, and a feature clustering step then selects relatively independent feature sets. We demonstrate the effectiveness of the method through a thorough evaluation and comparison over 13 datasets. The resulting performance improvement makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is also shown to outperform traditional methods such as greedy-search-based wrappers and correlation-based feature selection (CFS).
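The two-step idea can be illustrated with a minimal sketch. Naïve Bayes assumes that features are conditionally independent given the class, so dropping features that are strongly correlated with an already selected one brings the data closer to that assumption. The abstract does not name the univariate score or the clustering algorithm, so everything specific here is an assumption made for illustration: the chi-squared statistic as the univariate score, a greedy grouping by absolute Pearson correlation in place of a proper clustering algorithm, the hypothetical function names `chi2_score`, `pearson`, and `two_step_select`, the parameters `top_k` and `corr_threshold`, and the toy data. It is not the authors' implementation.

```python
import math

def chi2_score(column, labels):
    """Chi-squared statistic between a binary term-presence column and the class labels."""
    n = len(labels)
    score = 0.0
    for present in (0, 1):
        row_total = sum(1 for v in column if v == present)
        for c in sorted(set(labels)):
            col_total = sum(1 for y in labels if y == c)
            observed = sum(1 for v, y in zip(column, labels) if v == present and y == c)
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score

def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def two_step_select(X, y, top_k, corr_threshold):
    """Return indices of features kept after univariate ranking and redundancy grouping."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [chi2_score(col, y) for col in columns]

    # Step 1 (assumed score: chi-squared): keep the top_k features to shrink the search space.
    ranked = sorted(range(n_features), key=lambda j: -scores[j])[:top_k]

    # Step 2 (assumed stand-in for clustering): visit features from best to worst score;
    # a feature joins an existing group if it is strongly correlated with that group's
    # representative, otherwise it becomes the representative of a new group.
    representatives = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[r])) < corr_threshold for r in representatives):
            representatives.append(j)
    return representatives

# Toy document-term matrix: rows are documents, columns are binary term indicators.
X = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]

selected = two_step_select(X, y, top_k=3, corr_threshold=0.8)
print(selected)  # -> [0, 2]
```

In this toy example, feature 3 occurs in every document and is removed by the univariate step, while feature 1 duplicates feature 0 and is absorbed into its group, leaving features 0 and 2 as the reduced input for a naïve Bayes classifier.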
