An Anomaly Detection Based on Optimization

Full Text (PDF, 532KB), pp. 87-96


Author(s)

Rasim M. Alguliyev 1,*, Ramiz M. Aliguliyev 1, Yadigar N. Imamverdiyev 1, Lyudmila V. Sukhostat 1

1. Institute of Information Technology, Azerbaijan National Academy of Sciences, Baku, Azerbaijan

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.12.08

Received: 4 Apr. 2017 / Revised: 5 Jun. 2017 / Accepted: 7 Jul. 2017 / Published: 8 Dec. 2017

Index Terms

Optimization, anomaly detection, Big data, clustering, regularization parameter

Abstract

Anomaly detection is currently an important problem in many fields. The rapid growth of data volumes requires tools capable of processing and analyzing a wide variety of data types. Anomaly detection methods are designed to detect an object's deviation from normal behavior. However, due to increasing computational complexity and the varied nature of the data, it is difficult to select a single tool for all types of anomalies. In this paper, an improved optimization approach for a previously known number of clusters is proposed, in which a weight is assigned to each data point. The aim of this article is to show that weighting each data point improves the clustering solution. Experimental results on three datasets show that the proposed algorithm detects anomalies more accurately than the k-means algorithm, against which it was compared. The quality of the clustering result was estimated using clustering evaluation metrics. This research shows that the proposed method outperforms k-means on the Australian credit card applications dataset according to the Purity, Mirkin and F-measure metrics, and on the heart disease dataset according to the F-measure and variation of information metrics.
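The abstract's central idea, assigning a weight to each data point so that outlying points influence the cluster centers less, can be sketched as a weighted variant of k-means. The sketch below is an illustrative reconstruction, not the paper's exact formulation: the weighting function 1/(1+d), the fixed initialization, and the anomaly threshold are all assumptions introduced for the example.

```python
import numpy as np

def weighted_kmeans(X, k, n_iter=50):
    """Sketch of weighted k-means: each point's weight shrinks with its
    distance from the nearest centroid, so outliers pull centroids less.
    The 1/(1+d) weighting is an assumption, not the paper's formula."""
    # deterministic initialization from two rows of X (illustrative choice)
    centroids = X[[0, 50]].copy()
    weights = np.ones(len(X))
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        nearest = d[np.arange(len(X)), labels]
        # re-weight: distant points receive smaller weights
        weights = 1.0 / (1.0 + nearest)
        # weighted centroid update
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = np.average(X[mask], axis=0,
                                          weights=weights[mask])
    return labels, centroids, weights

# two tight synthetic clusters plus one obvious outlier
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (50, 2)),
               np.random.default_rng(2).normal(5, 0.3, (50, 2)),
               [[20.0, 20.0]]])
labels, centroids, weights = weighted_kmeans(X, k=2)
# points whose weight is far below average can be flagged as anomalies
anomalies = np.where(weights < 0.5 * weights.mean())[0]
```

Here the outlier at (20, 20) ends up with a much smaller weight than the cluster members, so a simple threshold on the weights separates it; the paper instead evaluates the resulting clustering against k-means with external quality metrics.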

Cite This Paper

Rasim M. Alguliyev, Ramiz M. Aliguliyev, Yadigar N. Imamverdiyev, Lyudmila V. Sukhostat, "An Anomaly Detection Based on Optimization", International Journal of Intelligent Systems and Applications (IJISA), Vol. 9, No. 12, pp. 87-96, 2017. DOI: 10.5815/ijisa.2017.12.08
