Arabic Text Categorization Using Logistic Regression
Author(s) - Mayy M. Al-Tahrawi
Publication year - 2015
Publication title - International Journal of Intelligent Systems and Applications
Language(s) - English
Resource type - Journals
eISSN - 2074-9058
pISSN - 2074-904X
DOI - 10.5815/ijisa.2015.06.08
Subject(s) - arabic , computer science , categorization , natural language processing , artificial intelligence , preprocessor , classifier (uml) , text categorization , logistic regression , machine learning , linguistics , philosophy
Several Text Categorization (TC) techniques and algorithms have been investigated in the limited research literature on Arabic TC. This research investigates Logistic Regression (LR) for Arabic TC. To the best of our knowledge, LR has not been used for Arabic TC before. Experiments are conducted on the Aljazeera Arabic News (Alj-News) dataset, which is preprocessed to handle the special nature of Arabic text. The experimental results show that the LR classifier is competitive with state-of-the-art Arabic TC algorithms: it recorded a precision of 96.5% on one category and above 90% on three of the five categories of the Alj-News dataset. Overall, LR recorded a macroaverage precision of 87%, recall of 86.33%, and F-measure of 86.5%.
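The macroaveraged figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition, in which precision, recall, and F-measure are computed per category and then averaged with equal weight per category; the abstract does not spell out the exact formula the paper uses, so this is an assumption. The function name `macro_scores` and the category labels in the usage example are hypothetical and are not taken from the paper or the Alj-News dataset.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F-measure).

    Assumption: each score is computed per category and then averaged
    with equal weight per category (the common macroaverage definition).
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives and false negatives per category.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    precisions, recalls, f_measures = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


if __name__ == "__main__":
    # Hypothetical toy labels, for illustration only.
    truth = ["sport", "sport", "art", "art"]
    predicted = ["sport", "art", "art", "art"]
    p, r, f = macro_scores(truth, predicted)
    print(f"macro precision={p:.4f} recall={r:.4f} F-measure={f:.4f}")
```

Because every category contributes equally to a macroaverage, a weak category lowers the overall score regardless of how many documents it contains, which is why the per-category and overall figures in the abstract can differ noticeably.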