An Efficient Algorithm for Density Based Subspace Clustering with Dynamic Parameter Setting

Full Text (PDF, 644KB), PP.27-33

Views: 0 Downloads: 0

Author(s)

B.Jaya Lakshmi 1,* K.B.Madhuri 1 M.Shashi 2

1. Department of IT, GVP College of Engineering (A), Andhra Pradesh, 530048, India

2. Department of CS&SE, AU College of Engineering (A), Andhra Pradesh, 530003, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2017.06.04

Received: 16 Jun. 2016 / Revised: 21 Oct. 2016 / Accepted: 12 Feb. 2017 / Published: 8 Jun. 2017

Index Terms

Density divergence, multi-density behavior, data spread, dynamic epsilon, subspace clusters, density based subspace clustering

Abstract

Density based Subspace Clustering algorithms have gained their importance owing to their ability to identify arbitrary shaped subspace clusters. Density-connected SUBspace CLUstering(SUBCLU) uses two input parameters namely epsilon and minpts whose values are same in all subspaces which leads to a significant loss to cluster quality. There are two important issues to be handled. Firstly, cluster densities vary in subspaces which refers to the phenomenon of density divergence. Secondly, the density of clusters within a subspace may vary due to the data characteristics which refers to the phenomenon of multi-density behavior. To handle these two issues of density divergence and multi-density behavior, the authors propose an efficient algorithm for generating subspace clusters by appropriately fixing the input parameter epsilon. The version1 of the proposed algorithm computes epsilon dynamically for each subspace based on the maximum spread of the data. To handle data that exhibits multi-density behavior, the algorithm is further refined and presented in version2. The initial value of epsilon is set to half of the value resulted in the version1 for a subspace and a small step value 'delta' is used for finalizing the epsilon separately for each cluster through step-wise refinement to form multiple higher dimensional subspace clusters. The proposed algorithm is implemented and tested on various bench-mark and synthetic datasets. It outperforms SUBCLU in terms of cluster quality and execution time.

Cite This Paper

B.Jaya Lakshmi, K.B.Madhuri, M.Shashi, "An Efficient Algorithm for Density Based Subspace Clustering with Dynamic Parameter Setting", International Journal of Information Technology and Computer Science(IJITCS), Vol.9, No.6, pp.27-33, 2017. DOI:10.5815/ijitcs.2017.06.04

Reference

[1]Vishal M. Patel, Hien Van Nguyen, René Vidal, "Latent Space Sparse and Low-Rank Subspace Clustering," IEEE Journal of Selected Topics in Signal Processing, vol. 9, No. 4, pp. 691 - 701, 2015.

[2]Shaker K. Ali, Zainab Naser Azeez, Ahmed Abdul-Hussein Ouda, "A New Clustering Algorithm for Face Classification", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.6, pp.1- 8, 2016. DOI: 10.5815/ijitcs.2016.06.01.

[3]Preeti Jain, Bala Buksh, "Accelerated K-means Clustering Algorithm", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.10, pp.39-46, 2016.  DOI: 10.5815/ijitcs.2016.10.05.

[4]Sirisha GNVG, Shashi M, “Mining Closed Interesting Subspaces to Discover Conducive Living Environment of Migratory Animals,” Proceedings of 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications, AISC, Springer, Vol.404 pp.153-166, 2015.

[5]Ester M, Kriegel H, Sander J, and Xu X, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Proc. 2ndInternational Conference on Knowledge Discovery and Data Mining, Portland, pp. 169-194, 1996.

[6]Kailing K, Kriegel H, and Kroger P, "Density Connected Subspace Clustering for High - Dimensional Data," Proc.4th SIAM International Conference on Data Mining, Lake Buena Vista, FL, pp.246-257, 2004.

[7]Chu Y, Huang J, Chuang K, Yang D, Chen M, "Density conscious Subspace clustering for high-dimensional data,"  IEEE Transactions on  Knowledge Data Engineering, vol. 22, no.1,  pp. 16–30, 2010.

[8]Zhang Huirong, Tang Yan, He Ying, Mou Chunqian, Xu Pingan, Shi Jiaokai,"A novel subspace clustering method based on data cohesion," in  ModelOptik - International Journal for Light and Electron Optic, vol. 127, issue 20, pp. 8513-8519, 2016.

[9]Ghanbarpour and Minaei B,"EXDBSCAN: An Extension of DBSCAN to Detect Clusters in Multi-Density Datasets," In Iranian Conference on Intelligent Systems (ICIS), pp. 1-5, 2014.

[10]Agrawal R, Gehrke J, Gunopulos D, and  Raghavan P, "Automatic subspace clustering of high dimensional data for data mining applications," Proc.1st international conference on Management of data, New York, U.S.A, pp. 94-105, 1998.

[11]Cheng C H, Fu A W, and Zhang Y," Entropy-based subspace clustering for mining numerical data," Proc.  5th ACM SIGKDD International conference on Knowledge discovery and data mining, New York, U.S.A, pp. 84-93, 1999.

[12]Goil S, Nagesh H, and Choudhary A, "Mafia: Efficient and scalable subspace clustering for very large data sets," Technical Report, Northwestern University, Evanston, 1999.

[13]Kelvin Sim, Vivekanand Gopalkrishnan, Arthur Zimek, Gao Cong," A survey on enhanced subspace clusterin," Data Mining and Knowledge Discovery, vol. 26, no.2 pp. 332-397, 2013.

[14]Kriegel H P, Kroger P, Renz M, and Wurst S,"A generic framework for efficient subspace clustering of high dimensional data," In proceedings International Conference on Data Mining, pp. 250–257, 2005.

[15]Assent I, Krieger R, M¨uller E, and Seidl T,"DUSC: Dimensionality unbiased subspace clusterin," In proceedings International Conference on Data Mining, pp. 409–414, 2007.

[16]Müller E, Assent I, Günnemann S, Seidl T, "Scalable density-based subspace clustering," In: Proceedings of the ACM conference on information and knowledge management (CIKM), pp. 1077–1086, 2011.

[17]Stephan Günnemann , Brigitte Boden , Thomas Seidl, "DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors," Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases, September 05-09, Athens, Greece, 2011.

[18]Datasets from UCI Machine Learning Repository, [online]. available at: http://archive.ics.uci.edu/ml

[19]Pang-Ning Tan, Michael Steinbach, Vipin Kumar, "Introduction to Data Mining," Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2005.