Efficient Classification using Average Weighted Pattern Score with Attribute Rank based Feature Selection


Author(s)

S. Sathya Bama 1,*, A. Saravanan 2

1. Independent Researcher, Lawley Road, Coimbatore, India

2. Department of MCA, Sree Saraswathi Thyagaraja College, Tamil Nadu, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.07.04

Received: 9 Jan. 2019 / Revised: 20 Feb. 2019 / Accepted: 19 Mar. 2019 / Published: 8 Jul. 2019

Index Terms

Classification, outlier detection, pattern matching, feature selection, attribute rank score

Abstract

Classification is an important field of research for many applications such as medical diagnosis, credit risk and fraud analysis, customer segmentation, and business modeling. The main goal of classification is to accurately predict the class labels of unlabeled test samples using a labeled training set. Several classification algorithms exist for classifying test samples based on training samples. However, many are unsuitable for real-world applications, since even a small degradation in classification performance can lead to substantial losses and serious consequences. In this paper, a simple classification method using an average weighted pattern score with attribute rank based feature selection is proposed. Feature selection is carried out by ranking attributes according to an attribute score, and classification is performed by computing an average weighted pattern score. Experiments have been performed on 40 standard datasets, and the results are compared with those of other classifiers. The analysis shows that the proposed method performs well, achieving higher classification accuracy.
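The abstract outlines a two-stage pipeline: rank the attributes by a score and keep the best ones, then assign a test sample to the class whose training patterns give the highest average weighted match. The sketch below is only one plausible reading of that pipeline, not the authors' exact formulation: the class-separability ratio used for ranking, the rank-based attribute weights, and the absolute-difference similarity are all illustrative assumptions.

```python
import numpy as np

def rank_features(X, y, k):
    """Score each attribute by a simple class-separability ratio
    (between-class over within-class variance) and keep the top-k
    ranked attributes. The paper's actual attribute score may
    differ; this ratio is an illustrative stand-in."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    scores = between / (within + 1e-12)
    return np.argsort(scores)[::-1][:k]  # indices, best first

def predict(X_train, y_train, x_test, feature_idx):
    """Average weighted pattern score: compare the test sample with
    every training pattern of a class on the selected attributes,
    weight the per-attribute differences by rank, average the
    resulting pattern scores per class, and pick the best class."""
    x = x_test[feature_idx]
    # Assumption: higher-ranked attributes receive larger weights.
    weights = np.linspace(1.0, 0.5, num=len(feature_idx))
    best_class, best_score = None, -np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c][:, feature_idx]
        # Per-pattern similarity: weighted negative absolute difference.
        sims = -(np.abs(Xc - x) * weights).sum(axis=1)
        avg = sims.mean()  # average weighted pattern score for class c
        if avg > best_score:
            best_class, best_score = c, avg
    return best_class

# Tiny demo on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(2, 1, (50, 10))])
y = np.array([0] * 50 + [1] * 50)
idx = rank_features(X, y, k=4)
print(predict(X, y, X[0], idx))  # expected: 0
```

Any per-attribute scoring function (information gain, chi-square, and so on) could replace the separability ratio here; the two-stage structure of rank-then-match is what the sketch is meant to convey.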

Cite This Paper

S. Sathya Bama, A. Saravanan, "Efficient Classification using Average Weighted Pattern Score with Attribute Rank based Feature Selection", International Journal of Intelligent Systems and Applications (IJISA), Vol. 11, No. 7, pp. 29-42, 2019. DOI: 10.5815/ijisa.2019.07.04
