An Efficient Feature Selection based on Bayes Theorem, Self Information and Sequential Forward Selection

Author(s)

K. Mani 1, P. Kalpana 1,*

1. Department of Computer Science, Nehru Memorial College, Puthanampatti, 621 007, Tiruchirappalli (DT), India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2016.06.06

Received: 12 Aug. 2016 / Revised: 1 Sep. 2016 / Accepted: 2 Oct. 2016 / Published: 8 Nov. 2016

Index Terms

Feature Selection, Irrelevant and Redundant Attributes, Feature Relevance, Feature Weighting, Bayes Theorem, Self Information, Sequential Forward Selection and Naive Bayesian Classifier

Abstract

Feature selection is an indispensable pre-processing technique that retains the most relevant features and removes redundant attributes. Identifying the features most relevant to the target is essential for improving the predictive accuracy of learning algorithms, because irrelevant features in the original feature space cause more classification errors and lengthen learning time. Many methods have been proposed for feature relevance analysis, but no work has been done using Bayes Theorem and Self Information. This paper therefore introduces a novel integrated approach that weights features using Bayes Theorem and Self Information and then picks the highly weighted attributes as the more relevant features using Sequential Forward Selection. The main objective of this approach is to enhance the predictive accuracy of the Naive Bayesian Classifier.
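
The abstract does not state the weighting formula, so the following Python sketch is only an illustration of how the three ingredients could fit together, not the authors' method. It assumes categorical features, scores each feature by the expected self-information -log2 P(c|v) of the class posterior obtained through Bayes Theorem (a lower score is treated here as a higher relevance weight), and runs a simple Sequential Forward Selection that keeps a feature only when it raises the training-set accuracy of a small Naive Bayesian classifier. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def posterior_self_information(column, labels):
    """Expected self-information -log2 P(c|v); P(c|v) comes from Bayes Theorem.
    Assumed scoring rule for illustration only (lower = more relevant)."""
    n = len(labels)
    class_counts = Counter(labels)
    value_counts = Counter(column)
    joint_counts = Counter(zip(column, labels))
    total = 0.0
    for (v, c), n_vc in joint_counts.items():
        likelihood = n_vc / class_counts[c]         # P(v | c)
        prior = class_counts[c] / n                 # P(c)
        evidence = value_counts[v] / n              # P(v)
        posterior = likelihood * prior / evidence   # Bayes Theorem: P(c | v)
        total += (n_vc / n) * -math.log2(posterior)
    return total

def nb_accuracy(rows, labels, features):
    """Training-set accuracy of a categorical Naive Bayes with Laplace smoothing."""
    n = len(labels)
    class_counts = Counter(labels)
    counts = {j: Counter((row[j], c) for row, c in zip(rows, labels)) for j in features}
    n_values = {j: len({row[j] for row in rows}) for j in features}
    correct = 0
    for row, truth in zip(rows, labels):
        def log_score(c):
            s = math.log(class_counts[c] / n)
            for j in features:
                s += math.log((counts[j][(row[j], c)] + 1) / (class_counts[c] + n_values[j]))
            return s
        correct += max(class_counts, key=log_score) == truth
    return correct / n

def sequential_forward_selection(rows, labels):
    """Visit features from best to worst score; keep one only if accuracy improves."""
    n_features = len(rows[0])
    scores = {j: posterior_self_information([r[j] for r in rows], labels)
              for j in range(n_features)}
    ranked = sorted(scores, key=scores.get)         # least surprising first
    selected, best = [], nb_accuracy(rows, labels, [])
    for j in ranked:
        acc = nb_accuracy(rows, labels, selected + [j])
        if acc > best:
            selected, best = selected + [j], acc
    return selected, best, scores

# Toy data: feature 0 determines the class, feature 1 is noise, feature 2 copies feature 0.
rows = [("s", "a", "s"), ("s", "b", "s"), ("s", "a", "s"), ("s", "b", "s"),
        ("r", "a", "r"), ("r", "b", "r"), ("r", "a", "r"), ("r", "b", "r")]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
selected, best, scores = sequential_forward_selection(rows, labels)
print(selected, best, scores)
```

On this toy data the noisy feature and the redundant copy are both rejected because neither improves accuracy once feature 0 is selected. A real evaluation would measure accuracy on held-out data rather than on the training set.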

Cite This Paper

K. Mani, P. Kalpana, "An Efficient Feature Selection based on Bayes Theorem, Self Information and Sequential Forward Selection", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 8, No. 6, pp. 46-54, 2016. DOI: 10.5815/ijieeb.2016.06.06
