Outlier Detection Algorithm Based on Fuzzy C-Means and Self-organizing Maps Clustering Methods

pp. 21-29


Author(s)

Mesut Polatgil 1,*

1. Şarkışla School of Applied Sciences, Computer Science, Sivas, 58050, Turkey

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2022.03.02

Received: 13 Mar. 2022 / Revised: 15 Apr. 2022 / Accepted: 25 May 2022 / Published: 8 Aug. 2022

Index Terms

Outlier detection, Fuzzy C-Means, Self-Organizing Maps, Silhouette, Calinski-Harabasz, Davies-Bouldin.

Abstract

Data mining and machine learning are important areas in which research has grown in recent years. Data is critical to these fields, which focus on drawing meaningful conclusions from collected data, so careful data preparation is essential for the studies to be carried out and the algorithms to be applied. One of the most critical steps in data preparation is outlier detection, because observations whose characteristics differ from those of the rest of the data affect the results of the applied algorithms and may lead to erroneous conclusions. New outlier detection methods have been developed, and they have helped machine learning and data mining algorithms achieve successful results. Fuzzy C-Means (FCM) and Self-Organizing Maps (SOM) are among the algorithms that have given successful results for outlier detection. However, no outlier detection method has used these two powerful clustering methods together. This study proposes a new outlier detection algorithm, FUSOMOUT, that combines the SOM and FCM clustering methods, with the aim of improving the performance of both clustering and classification algorithms. The proposed algorithm was applied to four datasets with different characteristics (Wisconsin breast cancer dataset (WDBC), Wine, Diabetes, and Kddcup99). It was shown to significantly increase classification accuracy and to improve clustering quality as measured by the Silhouette, Calinski-Harabasz, and Davies-Bouldin indexes.
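The abstract does not specify how FUSOMOUT combines the two methods, so the following is only a minimal sketch under stated assumptions, not the author's algorithm. It implements standard FCM (Bezdek's membership update with fuzzifier m = 2), a small online Kohonen SOM, and a hypothetical combination rule: a sample is flagged only when FCM gives it no dominant cluster (low maximum membership) and the SOM represents it poorly (high quantization error). The function names, the 95th-percentile thresholds, the 3x3 grid, the mean-based initialization, and the training schedule are all illustrative assumptions. The sketch then compares the mean Silhouette coefficient before and after removing the flagged samples, using a NumPy implementation rather than a library call.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Standard Fuzzy C-Means: returns centers (c, d) and memberships U (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                  # each row of U sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                          # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
    return centers, U

def train_som(X, rows=3, cols=3, iters=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal online Kohonen SOM on a rows x cols grid; returns unit weights."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, k) for r in range(rows) for k in range(cols)], dtype=float)
    # Assumption: start all units near the data mean with small random jitter.
    W = X.mean(axis=0) + 0.01 * rng.standard_normal((rows * cols, X.shape[1]))
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                        # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5            # shrinking neighborhood radius
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
        h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                 # pull BMU and neighbors toward x
    return W

def som_quantization_error(X, W):
    """Distance from each sample to its best-matching SOM unit."""
    return np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2).min(axis=1)

def flag_outliers(X, c=2, q=95.0, seed=0):
    """HYPOTHETICAL rule, not the published FUSOMOUT: flag a sample only if it lies
    above the q-th percentile of both the FCM score and the SOM quantization error."""
    _, U = fcm(X, c, seed=seed)
    fcm_score = 1.0 - U.max(axis=1)      # high when no cluster clearly owns the sample
    qe = som_quantization_error(X, train_som(X, seed=seed))
    both = (fcm_score > np.percentile(fcm_score, q)) & (qe > np.percentile(qe, q))
    return np.flatnonzero(both), U.argmax(axis=1)

def silhouette(X, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                               # singleton cluster: s(i) = 0
        a = D[i, same].sum() / (same.sum() - 1)    # mean distance, self excluded
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Usage on synthetic data: two Gaussian blobs plus two injected far-away points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2)),
               [[20.0, -20.0], [-15.0, 15.0]]])     # rows 100 and 101
flagged, labels = flag_outliers(X)
keep = np.setdiff1d(np.arange(len(X)), flagged)
sil_before = silhouette(X, labels)
sil_after = silhouette(X[keep], labels[keep])
print("flagged rows:", flagged)
print(f"silhouette before: {sil_before:.3f}, after: {sil_after:.3f}")
```

Requiring agreement between the two criteria is a conservative design choice: FCM alone also assigns low maximum membership to points lying between two clusters, whereas a SOM unit will often sit near such in-between points and keep their quantization error low. The Calinski-Harabasz and Davies-Bouldin indexes could be compared in the same way; scikit-learn provides `calinski_harabasz_score` and `davies_bouldin_score` in `sklearn.metrics`.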

Cite This Paper

Mesut Polatgil, "Outlier Detection Algorithm Based on Fuzzy C-Means and Self-organizing Maps Clustering Methods", International Journal of Mathematical Sciences and Computing (IJMSC), Vol. 8, No. 3, pp. 21-29, 2022. DOI:10.5815/ijmsc.2022.03.02
