The Significance Of Low Frequent Terms in Text Classification
Author(s) -
AlTahrawi Mayy M.
Publication year - 2014
Publication title -
international journal of intelligent systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.21643
Subject(s) - naive bayes classifier , computer science , support vector machine , benchmark (surveying) , artificial intelligence , set (abstract data type) , function (biology) , machine learning , logistic regression , precision and recall , pattern recognition (psychology) , data mining , natural language processing , geodesy , evolutionary biology , biology , programming language , geography
The significance of low frequent terms in text classification (TC) has always been debatable. These terms are often accused of adding noise to the TC process. Nevertheless, some recent studies have shown that they are very helpful in improving the performance of text classifiers. This paper demonstrates the significance of low frequent terms in enhancing the performance of English TC, in terms of precision, recall, F‐measure, and accuracy. Six well‐known TC algorithms are tested on the benchmark Reuters data set, once keeping low frequent terms and once discarding them. These algorithms are support vector machines, logistic regression, k‐nearest neighbor, naive Bayes, radial basis function networks, and polynomial networks. All the experiments in this research show superior TC performance when the low frequent terms are kept in classification.
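To illustrate the preprocessing choice the paper studies — keeping versus discarding low frequent terms — here is a minimal Python sketch of document-frequency filtering. The corpus and the frequency threshold are hypothetical stand-ins; the paper itself works with the Reuters data set:

```python
from collections import Counter

# Tiny illustrative corpus (hypothetical; the paper uses the Reuters data set).
docs = [
    "oil prices rise sharply",
    "oil exports fall",
    "grain prices rise",
    "rare cocoa shipment arrives",
]

# Document frequency: in how many documents does each term occur?
df = Counter(term for doc in docs for term in set(doc.split()))

# Full vocabulary vs. vocabulary after discarding terms that occur
# in fewer than 2 documents (an assumed cut-off for the sketch).
vocab_all = set(df)
vocab_kept = {term for term, count in df.items() if count >= 2}

# The low frequent terms a filtering step would throw away.
low_freq = sorted(vocab_all - vocab_kept)
print(low_freq)
```

Every term removed this way is information a classifier can no longer use; the paper's experiments suggest that, for the six classifiers tested, retaining these terms improves precision, recall, F‐measure, and accuracy.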
