Efficient Intelligent Framework for Selection of Initial Cluster Centers

Full Text (PDF, 650 KB), pp. 44-55


Author(s)

Bikram K. Mishra 1,*, Amiya K. Rath 1, Santosh K. Nanda 2, Ritik R. Baidyanath 3

1. Veer Surendra Sai University of Technology, Burla

2. Flytxt Mobile Solutions Pvt. Ltd., 7th Floor, Leela Infopark, Technopark Rd, Technopark Campus, Thiruvananthapuram, Kerala 695581

3. Silicon Institute of Technology, Bhubaneswar

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.08.05

Received: 26 Feb. 2019 / Revised: 27 Mar. 2019 / Accepted: 11 Apr. 2019 / Published: 8 Aug. 2019

Index Terms

Optimal Cluster Center, Performance Index, Clustering Accuracy, K-Means, Modified Center K-Means, Far Efficient K-Means

Abstract

Much current research in cluster analysis focuses on retrieving information from data that describes objects and the associations among them. For good cluster formation, the selection of an optimal cluster core, or center, is the essential criterion, because an inefficient center may produce unpredictable outcomes. Hence, a sincere attempt has been made here to offer a few suggestions for discovering near-optimal cluster centers. We have examined several versatile data clustering approaches (K-Means, TLBOC, FEKM, FECA and MCKM), which differ in their initial center selection procedures. They were implemented on diverse data sets, and their inter- and intra-cluster formation efficiency was tested using different validity indices. Clustering accuracy was also evaluated using the Rand index criterion. The computational complexity of all the algorithms was analyzed, and their computation times were recorded. As expected, FECA, and to some extent FEKM and MCKM, deliver better clustering results than K-Means and TLBOC, as the former manage to obtain near-optimal cluster centers. More specifically, the accuracy percentage of FECA is higher than that of the other techniques; however, its computational complexity and running time are moderately higher.
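To make the two ideas in the abstract concrete, the sketch below shows (a) a farthest-first heuristic for choosing initial cluster centers, which is in the spirit of the far-efficient family of methods discussed here but is not the authors' exact algorithm, and (b) the Rand index used to score clustering accuracy. All function names are illustrative assumptions.

```python
# Illustrative sketch only: a generic farthest-first seeding heuristic
# (not the exact FEKM/FECA/MCKM procedure) and the standard Rand index.
from math import dist  # Euclidean distance, Python 3.8+

def farthest_first_centers(points, k):
    """Pick k initial centers: start with the point farthest from the
    data mean, then repeatedly add the point whose distance to its
    nearest already-chosen center is largest."""
    mean = [sum(coord) / len(points) for coord in zip(*points)]
    centers = [max(points, key=lambda p: dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree about
    being in the same cluster or in different clusters."""
    n = len(labels_a)
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)
```

For two well-separated groups such as `[(0, 0), (0, 1), (10, 10), (10, 11)]` with `k = 2`, the heuristic picks one seed from each group, which is exactly the property that makes such seeding less sensitive to random initialization than plain K-Means.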

Cite This Paper

Bikram K. Mishra, Amiya K. Rath, Santosh K. Nanda, Ritik R. Baidyanath, "Efficient Intelligent Framework for Selection of Initial Cluster Centers", International Journal of Intelligent Systems and Applications (IJISA), Vol. 11, No. 8, pp. 44-55, 2019. DOI: 10.5815/ijisa.2019.08.05
