Muhammad Pervez Akhtar

Work place: Department of Computer Science & IT, University of Sargodha, Sargodha 40100, Pakistan

E-mail: pervezbcs@gmail.com

Research Interests: Text processing, information retrieval, information extraction

Biography

Muhammad Pervez Akhtar did his MS in text processing at the University of Sargodha, Pakistan. His research interests include text processing, information retrieval, and information extraction.

Author Articles
Evaluation of Feature Selection Approaches for Urdu Text Categorization

By Tehseen Zia, Qaiser Abbas, Muhammad Pervez Akhtar

DOI: https://doi.org/10.5815/ijisa.2015.06.03, Pub. Date: 8 May 2015

Efficient feature selection is an important phase in designing an effective text categorization system. Various feature selection methods have been proposed for selecting different feature sets. It is often essential to evaluate which method is more effective for a given task and what size of feature set is an effective model selection choice. The aim of this paper is to answer these questions for the design of an Urdu text categorization system. Five widely used feature selection methods were examined using six well-known classification algorithms: naive Bayes (NB), k-nearest neighbor (KNN), support vector machines (SVM) with linear, polynomial, and radial basis kernels, and a decision tree (J48). The study was conducted over two test collections: the EMILLE collection and a naive collection. We observed that three feature selection methods, namely information gain, the chi-square statistic, and symmetric uncertainty, performed uniformly in most, if not all, cases. Moreover, we found that no single feature selection method is best for all classifiers. While gain ratio outperformed the others for naive Bayes and J48, information gain showed the top performance for KNN and for SVM with polynomial and radial basis kernels. Overall, linear SVM combined with any of the feature selection methods including information gain, the chi-square statistic, or symmetric uncertainty turned out to be the first choice among the combinations of classifiers and feature selection methods on the moderately sized naive collection. On the other hand, naive Bayes with any of the feature selection methods showed its advantage on the small EMILLE corpus.
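The sketch below is not the authors' code; it only illustrates the kind of evaluation pipeline the abstract describes: score features, keep the top-k, and compare classifiers under cross-validation. The corpus (a 20 Newsgroups stand-in rather than an Urdu collection), the feature-set size k=500, and the use of scikit-learn in place of the original toolchain are all illustrative assumptions.

```python
# Hedged sketch of a feature-selection vs. classifier comparison for text categorization.
# Assumptions: scikit-learn, a 20 Newsgroups stand-in corpus, k=500 selected features.
from sklearn.datasets import fetch_20newsgroups  # stand-in for an Urdu test collection
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Stand-in corpus; the paper used the EMILLE collection and a custom Urdu collection.
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])

selectors = {
    "chi2": chi2,                        # chi-square statistic
    "info_gain": mutual_info_classif,    # information-gain-style score
}
classifiers = {
    "NB": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "linear SVM": LinearSVC(),
    "decision tree": DecisionTreeClassifier(),  # stands in for J48
}

for s_name, score_fn in selectors.items():
    for c_name, clf in classifiers.items():
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer()),
            ("select", SelectKBest(score_fn, k=500)),  # feature-set size is a tunable choice
            ("clf", clf),
        ])
        acc = cross_val_score(pipe, data.data, data.target, cv=5).mean()
        print(f"{s_name:>9} + {c_name:<13}: {acc:.3f}")
```

Running the grid this way keeps the comparison fair: every selector/classifier pair sees the same folds, and feature scoring happens inside each training fold rather than on the full corpus.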
