Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques

Open Access

Author(s) - Zulfany Erlisa Rasjid, Reina Setiawan

Publication year - 2017

Publication title - Procedia Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.334

H-Index - 76

ISSN - 1877-0509

DOI - 10.1016/j.procs.2017.10.017

Subject(s) - computer science, naive bayes classifier, information retrieval, focus (optics), xml, cluster analysis, k means clustering, value (mathematics), artificial intelligence, natural language processing, machine learning, world wide web, support vector machine, physics, optics

Information today is available in many formats, including text, image, video, and audio. A corpus is a large collection of documents. Information Retrieval (IR) makes it possible to retrieve unstructured information and to perform automatic summarization, classification, and clustering. This research focuses on data classification using two of the six approaches to data classification: k-NN (k-Nearest Neighbors) and Naive Bayes. The text documents used are in XML format. The corpus was downloaded from the TREC Legal Track and contains more than three thousand text documents in over twenty classes. Of these, the six classes with the largest number of text documents were chosen. The data were processed using RapidMiner, and the results show that the optimum value of k for k-NN is k=13. With this value of k, the average accuracy reached 55.17 percent, which is better than the 39.01 percent obtained with Naive Bayes.
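The comparison described in the abstract can be illustrated with a minimal sketch. The study itself used RapidMiner, so this is not the authors' pipeline. It assumes bag-of-words features, cosine similarity for k-NN, and a multinomial Naive Bayes with add-one smoothing. The function names and the tiny `TRAIN` and `TEST` sets are hypothetical and invented purely for illustration; they are not drawn from the TREC Legal Track corpus.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would also parse the XML.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, text, k):
    # Rank training documents by similarity to the query, then take a
    # majority vote among the k nearest (ties go to the label seen first).
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda item: cosine(Counter(tokenize(item[0])), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def nb_train(train):
    # Collect per-class document counts, per-class word counts, and the vocabulary.
    class_docs, word_counts, vocab = Counter(), {}, set()
    for text, label in train:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def nb_predict(model, text):
    # Multinomial Naive Bayes in log space with add-one (Laplace) smoothing.
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predict, labelled):
    # Fraction of labelled documents for which predict() returns the true label.
    return sum(predict(text) == label for text, label in labelled) / len(labelled)


def sweep_k(train, validation, candidates):
    # Evaluate k-NN for each candidate k; the best k is the one with top accuracy.
    return {
        k: accuracy(lambda t: knn_predict(train, t, k), validation)
        for k in candidates
    }


# Invented toy data, for illustration only.
TRAIN = [
    ("contract breach damages payment", "contract"),
    ("contract agreement signed parties", "contract"),
    ("breach of agreement payment due", "contract"),
    ("patent infringement invention claim", "patent"),
    ("patent application invention filed", "patent"),
    ("invention claim prior art", "patent"),
]
TEST = [
    ("payment under the contract agreement", "contract"),
    ("invention patent claim filed", "patent"),
]

knn_scores = sweep_k(TRAIN, TEST, [1, 3, 5])
nb_model = nb_train(TRAIN)
nb_score = accuracy(lambda t: nb_predict(nb_model, t), TEST)
```

On a realistic corpus the sweep would cover a wider range of k and use held-out or cross-validated data; the abstract's reported optimum of k=13 is the result of such a search, not something this toy example reproduces.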
