An Efficient Clustering Algorithm for Spatial Datasets with Noise



Author(s)

Akash Nag 1,*, Sunil Karforma 2

1. Department of Computer Science, M.U.C. Women’s College, Burdwan, WB, India

2. Department of Computer Science, The University of Burdwan, Burdwan, WB, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2018.07.03

Received: 16 Jun. 2017 / Revised: 26 Jun. 2017 / Accepted: 5 Jul. 2017 / Published: 8 Jul. 2018

Index Terms

Clustering, data mining, spatial datasets, noisy data

Abstract

Clustering is the technique of finding useful patterns in a dataset by grouping similar data items. It is an active research area with many algorithms available, but in practice most of them do not handle noise efficiently. Most real-world data contain noise arising from many factors, and most algorithms, even those that claim to deal with noise, detect only large deviations as noise. In this paper, we present a data-clustering method named SIDNAC, which can efficiently detect clusters of arbitrary shape and is almost immune to noise, a much-desired feature in clustering applications. Another important feature of the algorithm is that it does not require a priori knowledge of the number of clusters, which is seldom available.
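
To make the properties named in the abstract concrete, the sketch below shows a generic density-based clustering pass in the style of DBSCAN. It is not SIDNAC, whose details are not given on this page. It only illustrates how a method can label noise explicitly and discover the number of clusters without being told it in advance. The function name `density_cluster` and the parameters `eps` and `min_pts` are assumptions chosen for this illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def density_cluster(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Brute-force neighbourhood query; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels

# Two dense groups and one isolated point (illustrative data only).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = density_cluster(points, eps=1.5, min_pts=3)
print(labels)
```

In this toy run the two dense groups receive separate cluster ids and the isolated point at (5, 5) is labelled as noise. The number of clusters is never passed in. The result does depend on the choice of `eps` and `min_pts`.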

Cite This Paper

Akash Nag, Sunil Karforma, "An Efficient Clustering Algorithm for Spatial Datasets with Noise", International Journal of Modern Education and Computer Science (IJMECS), Vol.10, No.7, pp. 29-36, 2018. DOI:10.5815/ijmecs.2018.07.03
