Arabic Text Categorization Using Logistic Regression


Mayy M. Al-Tahrawi 1,*

1. Computer Science Department, Faculty of Information Technology, Al-Ahliyya Amman University, Amman, Jordan

* Corresponding author.


Received: 11 Jul. 2014 / Revised: 9 Dec. 2014 / Accepted: 22 Feb. 2015 / Published: 8 May 2015

Index Terms

Logistic Regression, Arabic Text Categorization, Arabic Document Classification


Several Text Categorization (TC) techniques and algorithms have been investigated in the limited research literature on Arabic TC. This research investigates Logistic Regression (LR) for Arabic TC. To the best of our knowledge, LR has not previously been used for Arabic TC. Experiments are conducted on the Aljazeera Arabic News (Alj-News) dataset, which is preprocessed to handle the special nature of Arabic text. The experimental results show that the LR classifier is competitive with state-of-the-art Arabic TC algorithms: it recorded a precision of 96.5% on one category and above 90% on three of the five categories of the Alj-News dataset. Overall, LR recorded a macroaverage precision of 87%, recall of 86.33%, and F-measure of 86.5%.
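
The macroaveraged figures quoted above can be illustrated with a minimal sketch. It assumes the usual definition of macroaveraging, in which precision, recall and F-measure are computed per category and then averaged with equal weight per category; the abstract does not spell out the exact procedure. The function names and the counts in `example_counts` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def per_class_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one category.

    tp, fp, fn are the true-positive, false-positive and false-negative
    document counts for that category.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def macro_average(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Average the per-category scores, giving every category equal weight."""
    scores = [per_class_scores(*c) for c in counts.values()]
    n = len(scores)
    macro_p = sum(s[0] for s in scores) / n
    macro_r = sum(s[1] for s in scores) / n
    macro_f = sum(s[2] for s in scores) / n
    return macro_p, macro_r, macro_f


# Hypothetical (tp, fp, fn) counts for two made-up categories;
# these are illustrative only and do NOT come from the Alj-News experiments.
example_counts = {
    "category_a": (90, 10, 10),
    "category_b": (80, 20, 40),
}

macro_p, macro_r, macro_f = macro_average(example_counts)
print(f"macro precision={macro_p:.4f} recall={macro_r:.4f} F={macro_f:.4f}")
```

Under this definition the macroaveraged F-measure is the mean of the per-category F-measures, so it need not equal the harmonic mean of the macroaveraged precision and recall.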

Cite This Paper

Mayy M. Al-Tahrawi, "Arabic Text Categorization Using Logistic Regression", International Journal of Intelligent Systems and Applications (IJISA), vol. 7, no. 6, pp. 71-78, 2015. DOI:10.5815/ijisa.2015.06.08


