Exploration of Various Clustering Algorithms for Text Mining

Pages: 10-18

Author(s)

Neha Garg 1,*, R. K. Gupta 2

1. Department of CS&A, ITM University, Gwalior, India

2. Department of CSE & IT, Madhav Institute of Technology and Science, Gwalior, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2018.04.02

Received: 15 Dec. 2017 / Revised: 15 Feb. 2018 / Accepted: 20 Mar. 2018 / Published: 8 Jul. 2018

Index Terms

Text Mining, Document Clustering, Partitioning Algorithms, Hierarchical Algorithms

Abstract

Owing to recent advances in technology and a sharp decline in storage costs, huge volumes of documents are being stored in repositories for future reference. At the same time, retrieving the documents a user is interested in from these large collections is both time consuming and costly. Document search can be made more efficient and effective if documents are clustered on the basis of their content. This article presents a comprehensive discussion of the various clustering algorithms used in text mining, along with their merits, demerits, and comparisons. Further, the authors examine the key challenges that clustering algorithms face in clustering documents effectively.

Cite This Paper

Neha Garg, R.K. Gupta, "Exploration of Various Clustering Algorithms for Text Mining", International Journal of Education and Management Engineering (IJEME), Vol.8, No.4, pp.10-18, 2018. DOI: 10.5815/ijeme.2018.04.02
