An Initialization Method for Subspace Clustering Algorithm

Full Text (PDF, 234 KB), pp. 54-61


Author(s)

Qingshan Jiang 1,*, Yanping Zhang 2, Lifei Chen 3

1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China

2. Software School, Xiamen University, Xiamen, Fujian, 361005, China

3. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, 350108, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2011.03.08

Received: 7 Jun. 2010 / Revised: 6 Oct. 2010 / Accepted: 17 Jan. 2011 / Published: 8 May 2011

Index Terms

K-means type clustering, subspace clustering, subspace difference, initialization method

Abstract

Soft subspace clustering is an important and actively studied topic in clustering research. Clustering in high-dimensional space is especially difficult because of the sparse distribution of the data and the curse of dimensionality. After analyzing the limitations of existing algorithms, this paper proposes the concept of subspace difference and an improved initialization method. Based on these, a new objective function is defined that accounts for both the compactness of the subspace clusters and the subspace difference between clusters, and a k-means-based subspace clustering algorithm is presented. Theoretical analysis and experimental results demonstrate that the proposed algorithm significantly improves clustering accuracy.
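The abstract describes a k-means-style framework in which each cluster is weighted over the feature dimensions, so that compact dimensions dominate the distance computation. The sketch below illustrates that generic soft subspace k-means framework only; the function name, the entropy-style weight update with its gamma parameter, and the random-initialization placeholder are assumptions for illustration and do not reproduce the paper's objective function or its proposed initialization method.

```python
import numpy as np

def soft_subspace_kmeans(X, k, gamma=1.0, n_iter=50, seed=0):
    """Generic soft subspace k-means sketch (not the paper's exact method).

    Each cluster keeps its own feature-weight vector; weights grow on the
    dimensions where the cluster is most compact, so each cluster is
    effectively located in its own soft subspace.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Placeholder random initialization; the paper replaces this step
    # with its improved initialization method.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)            # uniform subspace weights

    for _ in range(n_iter):
        # Assignment step: weighted squared distance to every center.
        sq_diff = (X[:, None, :] - centers[None, :, :]) ** 2   # (n, k, d)
        dist = np.einsum('kd,nkd->nk', weights, sq_diff)       # (n, k)
        labels = dist.argmin(axis=1)

        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue                           # keep empty cluster as-is
            centers[j] = members.mean(axis=0)      # update center
            disp = ((members - centers[j]) ** 2).sum(axis=0)
            # Entropy-style weight update (an assumption here): the smaller
            # the dispersion on a dimension, the larger its weight.
            w = np.exp(-(disp - disp.min()) / gamma)
            weights[j] = w / w.sum()

    return labels, centers, weights
```

A call such as `soft_subspace_kmeans(X, k=3)` returns hard cluster labels together with the per-cluster weight vectors; inspecting `weights` shows which dimensions each cluster concentrates on, which is the behavior the subspace-difference idea in the paper builds upon.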

Cite This Paper

Qingshan Jiang, Yanping Zhang, Lifei Chen, "An Initialization Method for Subspace Clustering Algorithm", International Journal of Intelligent Systems and Applications (IJISA), vol. 3, no. 3, pp. 54-61, 2011. DOI: 10.5815/ijisa.2011.03.08
