An Improved Text Clustering Method based on Hybrid Model

Full Text (PDF, 231KB), PP.35-44


Author(s)

Jinzhu Hu 1,*, Chunxiu Xiong 1, Jiangbo Shu 1, Xing Zhou 1, Jun Zhu 1

1. Department of Computer Science of HuaZhong Normal University, Wuhan, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2009.01.05

Received: 15 May 2009 / Revised: 16 Jul. 2009 / Accepted: 15 Aug. 2009 / Published: 8 Oct. 2009

Index Terms

Tree-structured growing self-organizing maps, Fuzzy K-Means, text clustering, text clustering flow model

Abstract

Textual documents are stored as high-dimensional, sparse feature vectors, and the clustering methods and hybrid methods studied to date still suffer from a number of defects. To address these problems, an improved text clustering method based on a hybrid model is proposed: a text clustering approach, abbreviated TGSOM-FS-FKM, that combines tree-structured growing self-organizing maps (TGSOM) with Fuzzy K-Means (FKM). The method refines the clustering result through three clustering passes. It first preprocesses the texts and filters out the majority of noisy words with an unsupervised feature selection method. TGSOM then performs the first clustering pass to obtain a rough partition of the texts, yielding the initial number of clusters and each text's category. Next, latent semantic analysis (LSA) is introduced to improve clustering precision and reduce the dimensionality of the feature vectors. TGSOM then performs a second clustering pass to obtain a more precise result, and a supervised feature selection method is used to select the feature items. Finally, FKM clusters the resulting set. In the experiments the number of feature items was held constant, and the results indicate that TGSOM-FS-FKM outperforms other clustering methods such as DSOM-FS-FCM, and that its precision is higher than that of the DSOM-FCM, DFKCN, and FDMFC clustering methods.
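
To make the processing pipeline concrete, the following is a minimal Python sketch of the TGSOM-FS-FKM flow described above, not the authors' implementation. No standard TGSOM library is assumed, so the two TGSOM passes are stood in for by scikit-learn's KMeans; the supervised feature selection step is reduced to warm-starting the fuzzy memberships from the second pass; and all parameter values (min_df, number of LSA components, rough cluster count, fuzzifier m) are illustrative rather than taken from the paper.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

def fuzzy_k_means(X, k, m=2.0, n_iter=100, tol=1e-5, u0=None, seed=0):
    """Plain fuzzy k-means (fuzzy c-means) on the rows of X."""
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], k)) if u0 is None else u0.copy()
    u /= u.sum(axis=1, keepdims=True)              # memberships: each row sums to 1
    for _ in range(n_iter):
        w = u ** m                                  # fuzzified memberships
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        new_u = dist ** (-2.0 / (m - 1.0))          # standard FCM membership update
        new_u /= new_u.sum(axis=1, keepdims=True)
        if np.abs(new_u - u).max() < tol:
            return centers, new_u
        u = new_u
    return centers, u

def tgsom_fs_fkm(raw_texts, rough_k=8):
    # 1. Preprocessing + unsupervised feature selection: document-frequency
    #    thresholds filter out most noisy words.
    tfidf = TfidfVectorizer(min_df=2, max_df=0.8)
    X = tfidf.fit_transform(raw_texts)

    # 2. First (rough) clustering pass -- TGSOM in the paper, KMeans here --
    #    to obtain the initial cluster count and each text's category.
    rough_labels = KMeans(n_clusters=rough_k, n_init=10, random_state=0).fit_predict(X)
    k = len(set(rough_labels))

    # 3. LSA to reduce the dimensionality of the feature vectors.
    lsa = TruncatedSVD(n_components=min(100, X.shape[1] - 1), random_state=0)
    X_lsa = lsa.fit_transform(X)

    # 4. Second clustering pass (again TGSOM in the paper) on the reduced vectors;
    #    in the full method its labels would also drive supervised feature selection.
    refined_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_lsa)

    # 5. Final pass: fuzzy k-means, warm-started from the second pass's hard labels.
    u0 = np.eye(k)[refined_labels] * 0.9 + 0.1 / k
    centers, memberships = fuzzy_k_means(X_lsa, k, u0=u0)
    return memberships.argmax(axis=1)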

Cite This Paper

Jinzhu Hu, Chunxiu Xiong, Jiangbo Shu, Xing Zhou, Jun Zhu, "An Improved Text Clustering Method based on Hybrid Model", International Journal of Modern Education and Computer Science (IJMECS), vol. 1, no. 1, pp. 35-44, 2009. DOI: 10.5815/ijmecs.2009.01.05

References

[1] YE Ping, Fuzzy K-means algorithms based on membership function improvement [J]. Changchun Institute of Technology (Natural Sciences Edition), 2007, (01).
[2] HE Zhongshi, XU Zhejun, A New Method of Unsupervised Feature Selection for Text Mining [J]. Chongqing University (Natural Science Edition), 2007, (06).
[3] XIONG Zhongyang, ZHANG Pengzhao, ZHANG Yufang, Improved approach to CHI in feature extraction [J]. Computer Applications, 2008, (02).
[4] GENG Xinqing, WANG Zhengou, DFKCN: A Dynamic Fuzzy Kohonen Neural Network and Its Application [J]. Computer Engineering, 2003, (03).
[5] WANG Li, WANG Zhengou, TGSOM: A New Dynamic Self-Organizing Map for Data Clustering [J]. Electronics and Information Technology, 2003, (03).
[6] GUO Yan-fen, LI Tai, Design of fuzzy K-means-based fuzzy classifier [J]. Technical Acoustics, 2007, (04).
[7] GONG Jing, LI Ying-jie, Analysis and Comparison on Text Clustering Algorithm [J]. Hunan Environment-Biological Polytechnic, 2006, (03).
[8] R. Siegwart, I. R. Nourbakhsh, "Introduction to Autonomous Mobile Robots," Prentice Hall of India (Pvt.) Ltd., 2005.
[9] G. Hamerly and C. Elkan, "Alternatives to the K-Means Algorithm That Find Better Clusterings," Proc. 11th Int'l Conf. Information and Knowledge Management (CIKM '02), pp. 600-607, 2002.
[10] P. Arabshahi, J. J. Choi, R. J. Marks II, and T. P. Caudell, "Fuzzy control of backpropagation," in Proc. IEEE Int. Conf. Fuzzy Systems (FUZZ-IEEE '92), San Diego, CA, Mar. 1992.
[11] Z. Huang, M. Ng, H. Rong, and Z. Li, "Automated Variable Weighting in k-Means Type Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 657-668, May 2005.
[12] M. Laszlo and S. Mukherjee, "A Genetic Algorithm Using Hyper-Quadtrees for Low-Dimensional K-Means Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 533-543, Apr. 2006.
[13] D. Arthur and S. Vassilvitskii, "K-Means++: The Advantages of Careful Seeding," Proc. 18th Ann. ACM-SIAM Symp. Discrete Algorithms (SODA '07), pp. 1027-1035, 2007.
[14] Y. Cheung, "Maximum Weighted Likelihood via Rival Penalized EM for Density Mixture Clustering with Automatic Model Selection," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, pp. 750-761, June 2005.
[15] D. Lieb, A. Lookingbill, and S. Thrun, "Adaptive Road Following using Self-Supervised Learning and Reverse Optical Flow," Stanford Artificial Intelligence Laboratory, Stanford University, 2005.
[16] I. Gath and A. B. Geva, "Unsupervised Optimal Fuzzy Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 773-781, July 1989.
[17] P. Smyth, "Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood," Statistics and Computing, vol. 10, pp. 63-72, 2000.