Open Access
On the Impact of Dataset Characteristics on Arabic Document Classification
Author(s) - Diab Abuaiadah, Jihad El Sana, Walid Abusalah
Publication year - 2014
Publication title - International Journal of Computer Applications
Language(s) - English
Resource type - Journals
ISSN - 0975-8887
DOI - 10.5120/17701-8680
Subject(s) - Arabic, computer science, natural language processing, artificial intelligence, information retrieval, linguistics, philosophy
This paper describes the impact of dataset characteristics on the results of Arabic document classification algorithms that use TF-IDF representations. The experiments compared different stemmers, different categories, and different training set sizes, and found that results varied widely with dataset characteristics, in one case attaining a remarkable 99% recall (accuracy). The use of a standard dataset would eliminate this variability and enable researchers to gain comparable knowledge from the published results.
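
For readers unfamiliar with the TF-IDF representation mentioned above, the following is a minimal sketch of one common weighting variant (relative term frequency multiplied by the log of inverse document frequency). It is not the authors' implementation: the abstract does not say which variant or which stemmer output was used, and the function name `tf_idf` and the transliterated toy tokens are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    Other variants (smoothing, sublinear tf, normalization) are common.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus: transliterated tokens standing in for
# stemmed Arabic words. Real input would come from a stemmer.
docs = [
    ["kitab", "ilm", "kitab"],
    ["ilm", "riyada"],
    ["riyada", "kura"],
]
vectors = tf_idf(docs)
```

Because the weights depend on which terms survive stemming and on how many documents contain each term, changing the stemmer, the category set, or the training set size changes the vectors a classifier sees, which is consistent with the variability the abstract reports.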
