Mining Maximal Subspace Clusters to deal with Inter-Subspace Density Divergence

Full Text (PDF, 1029KB), PP.37-48

Author(s)

B. Jaya Lakshmi 1,*, K. B. Madhuri 1

1. Department of IT, GVP College of Engineering (A), Andhra Pradesh, 530048, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2019.03.04

Received: 6 Mar. 2019 / Revised: 26 Apr. 2019 / Accepted: 13 Jun. 2019 / Published: 8 Aug. 2019

Index Terms

Subspace Clustering, Maximal Subspace Clusters, Inter-Subspace Density Divergence, Dynamic Epsilon, Density Notion

Abstract

Subspace clustering algorithms generally identify an enormously large number of subspace clusters, which may include redundant clusters. This paper presents the Dynamic Epsilon based Maximal Subspace Clustering Algorithm (DEMSC), which handles both redundancy and inter-subspace density divergence, a phenomenon in density-based subspace clustering. The proposed algorithm aims to mine maximal, non-redundant subspace clusters. A maximal subspace cluster is a group of similar data objects that share a maximal number of attributes. The DEMSC algorithm consists of four steps. In the first step, data points are assigned random unique positive integers called labels. In the second step, dense units are identified based on the density notion, using the proposed dynamically computed epsilon-radius, which is specific to each subspace, and the user-specified input parameter minimum points, τ. In the third step, the labels of the data objects forming each dense unit are summed to compute its signature, which is hashed into a hash table. Finally, if a dense unit of one subspace collides in the hash table with a dense unit of another subspace, then both dense units exist with high probability in the subspace formed by combining the colliding subspaces. With this approach, maximal and non-redundant subspace clusters are identified efficiently; in experiments on different benchmark datasets, the algorithm outperforms existing algorithms in terms of cluster quality and the number of resulting subspace clusters.
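
The four steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formula for the dynamically computed epsilon, so the per-attribute radii `eps_per_attr` are simply supplied by the caller here, which is an assumption. The function names, the neighbourhood-based definition of a dense unit, and the restriction to single-attribute starting subspaces are likewise hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def assign_labels(n_points, seed=0):
    # Step 1: give every data point a random, unique, positive integer label.
    rng = random.Random(seed)
    return rng.sample(range(1, 10**12), n_points)


def dense_units(column, eps, tau):
    # Step 2 (simplified assumption): the eps-neighbourhood of a point in one
    # attribute counts as a dense unit when it holds at least tau points.
    units = set()
    for x in column:
        members = frozenset(i for i, y in enumerate(column) if abs(x - y) <= eps)
        if len(members) >= tau:
            units.add(members)
    return units


def maximal_candidates(data, eps_per_attr, tau, seed=0):
    labels = assign_labels(len(data), seed)
    # Hash table: signature -> attributes in which a unit with that signature
    # was found, plus the point ids of that unit.
    table = defaultdict(lambda: {"attrs": set(), "points": None})
    for attr, eps in enumerate(eps_per_attr):
        column = [row[attr] for row in data]
        for unit in dense_units(column, eps, tau):
            # Step 3: the signature is the sum of the labels of the unit's points.
            signature = sum(labels[i] for i in unit)
            table[signature]["attrs"].add(attr)
            table[signature]["points"] = unit
    # Step 4: a signature seen in several attributes (a collision) suggests,
    # with high probability, a dense unit in the combined subspace.
    return {
        (tuple(sorted(entry["attrs"])), tuple(sorted(entry["points"])))
        for entry in table.values()
        if len(entry["attrs"]) > 1
    }


if __name__ == "__main__":
    # Toy data: points 0, 1 and 2 lie close together in both attributes.
    data = [(1.0, 5.0), (1.1, 5.1), (1.2, 5.2), (9.0, 0.0), (4.0, 9.0)]
    print(maximal_candidates(data, eps_per_attr=[0.3, 0.3], tau=3))
    # prints {((0, 1), (0, 1, 2))}
```

Because a signature is only a sum of labels, two different dense units could in principle share one; drawing the labels from a large range keeps such accidental collisions unlikely, which is consistent with the "high probability" wording of the abstract.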

Cite This Paper

B. Jaya Lakshmi, K. B. Madhuri, "Mining Maximal Subspace Clusters to deal with Inter-Subspace Density Divergence", International Journal of Mathematical Sciences and Computing (IJMSC), Vol.5, No.3, pp.37-48, 2019. DOI: 10.5815/ijmsc.2019.03.04
