Open Access
Improving Arabic Text Categorization using Normalization and Stemming Techniques
Author(s) - M. Rouhia, Mohamed Hamdy, Mahmoud F. Hussein
Publication year - 2016
Publication title - International Journal of Computer Applications
Language(s) - English
Resource type - Journals
ISSN - 0975-8887
DOI - 10.5120/ijca2016908328
Subject(s) - computer science , normalization (sociology) , arabic , categorization , natural language processing , text categorization , artificial intelligence , information retrieval , linguistics , philosophy , sociology , anthropology
Categorization is a technique for assigning documents to one or more predefined categories based on their contents. Achieving the highest possible categorization accuracy remains a major challenge, and categorization is also time-consuming. We propose an approach to tackle these challenges. The proposed approach uses the Frequency Ratio Accumulation Method (FRAM) as its classifier. Features are represented using the bag-of-words (BOW) technique, and an improved Term Frequency (TF) technique is used for feature selection. The approach is tested on known datasets, with experiments run without normalization or stemming, with either one of them, and with both. The results of the proposed approach generally improve on those of existing techniques. The performance of the proposed Arabic text categorization approach was evaluated using accuracy, recall, precision, and F-measure (F1); with normalization, the average results are 97.50%, 97.50%, 97.51%, and 97.49%, respectively.

Keywords: text categorization, Frequency Ratio Accumulation Method (FRAM), Bag-Of-Words (BOW), feature selection, term and document frequency.
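The pipeline summarized in the abstract (normalization, optional stemming, bag-of-words features, FRAM classification) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The normalization rules (removing diacritics and tatweel, unifying alef variants, mapping ta marbuta to ha and alef maqsura to ya) are common choices assumed here. `light_stem` is a hypothetical prefix-stripping stemmer. The improved TF feature selection is omitted because the abstract does not specify it. FRAM follows the commonly cited formulation: each term's frequency within a category is normalized by that category's total term count, then divided by the sum of those normalized frequencies across all categories. A document is assigned to the category with the largest accumulated ratio, summing each distinct term once (also an assumption).

```python
import re
from collections import Counter, defaultdict

# Assumed normalization: Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Tokens: word characters plus diacritics/tatweel, so unnormalized words stay whole.
_TOKEN = re.compile(r"[\w\u064B-\u0652\u0640]+")
# Hypothetical light stemmer prefixes: "wal", "bal", "al" (definite article forms).
_PREFIXES = ("\u0648\u0627\u0644", "\u0628\u0627\u0644", "\u0627\u0644")


def normalize(text: str) -> str:
    """Common Arabic orthographic normalization (assumed, not the paper's exact rules)."""
    text = _DIACRITICS.sub("", text)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef with madda/hamza -> bare alef
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    return text


def light_stem(word: str) -> str:
    """Hypothetical stemmer: strip one definite-article prefix if >= 2 letters remain."""
    for p in _PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            return word[len(p):]
    return word


def bag_of_words(text: str, use_norm: bool = True, use_stem: bool = True) -> Counter:
    """BOW term counts; the flags mirror the with/without normalization and stemming runs."""
    if use_norm:
        text = normalize(text)
    tokens = _TOKEN.findall(text)
    if use_stem:
        tokens = [light_stem(t) for t in tokens]
    return Counter(tokens)


class FRAM:
    """Frequency Ratio Accumulation Method classifier (sketch of the common formulation)."""

    def fit(self, docs, labels):
        counts = defaultdict(Counter)  # category -> accumulated term counts
        for bow, c in zip(docs, labels):
            counts[c].update(bow)
        # Normalized frequency of each term within its category.
        fn = {c: {t: n / sum(cnt.values()) for t, n in cnt.items()}
              for c, cnt in counts.items()}
        self.categories = sorted(fn)
        vocab = set().union(*(fn[c] for c in self.categories))
        # Frequency ratio: share of a term's normalized frequency held by each category.
        self.fr = {}
        for t in vocab:
            total = sum(fn[c].get(t, 0.0) for c in self.categories)
            self.fr[t] = {c: fn[c].get(t, 0.0) / total for c in self.categories}
        return self

    def predict(self, bow) -> str:
        scores = {c: 0.0 for c in self.categories}
        for t in bow:  # each distinct term once; unseen terms contribute nothing
            for c, r in self.fr.get(t, {}).items():
                scores[c] += r
        return max(self.categories, key=scores.__getitem__)


if __name__ == "__main__":
    # Toy documents for illustration only (not the paper's datasets).
    docs = ["الكرة والملعب", "المباراة الكرة", "الاقتصاد والسوق", "السوق الأسهم"]
    labels = ["sports", "sports", "economy", "economy"]
    model = FRAM().fit([bag_of_words(d) for d in docs], labels)
    print(model.predict(bag_of_words("كرة القدم")))
```

The `use_norm` and `use_stem` flags correspond to the three experimental settings described in the abstract (neither, one of them, or both). Because normalization maps spelling variants such as ة/ه and أ/ا onto a single form, more test-document terms match the training vocabulary, which is one plausible reason normalization helps a frequency-ratio classifier like FRAM.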