Anand Khandare

Workplace: Department of CSE, SGB Amravati University, Amravati, India

E-mail: anand.khandare1983@gmail.com

Research Interests: Computer systems and computational processes, Computational Learning Theory, Embedded System, Data Structures and Algorithms

Biography

Anand Khandare graduated from Sant Gadge Baba (SGB) Amravati University, Amravati, in Computer Science and Engineering in 2005. He completed his Master's degree at Mumbai University in the academic year 2010-11 and is pursuing a Ph.D. at Sant Gadge Baba Amravati University. He currently works as an Assistant Professor at Thakur College of Engineering and Technology, Mumbai University, and has 11 years of teaching experience at the institute. He has published more than 10 papers in international journals and conferences, as well as books on the C and C++ programming languages. His areas of interest are machine learning and intelligent systems, along with web and mobile application development. He is a lifetime member of the ISTE professional body.

Author Articles
Efficient Clustering Algorithm with Enhanced Cohesive Quality Clusters

By Anand Khandare, Abrar Alvi

DOI: https://doi.org/10.5815/ijisa.2018.07.05, Pub. Date: 8 Jul. 2018

Analyzing data is challenging because the size of the data affects the results of the analysis, and nearly every application now generates massive amounts of data. Clustering is a key technique for analyzing such data: it is a simple way to group similar data into clusters. Key examples of clustering algorithms are k-means, k-medoids, c-means, hierarchical clustering, and DBSCAN. Although k-means and DBSCAN are scalable, they still need improvement because massive data degrades their cluster quality and efficiency. These algorithms also require the user to supply appropriate parameters as input. For these reasons, this paper presents a modified and efficient clustering algorithm. It enhances cluster quality and makes clusters more cohesive using domain knowledge, spectral analysis, and split-merge-refine techniques. The algorithm also takes care to minimize empty clusters. So far, no single algorithm has integrated all of these requirements as the proposed algorithm does. It also automatically predicts the value of k and the initial centroids, keeping user intervention to a minimum. The performance of this algorithm is compared with standard clustering algorithms on data sets ranging from small to large. The comparison is made with respect to the number of records and dimensions of the data sets, using clustering accuracy, running time, and various cluster validity measures. The results show that the proposed algorithm improves on the existing algorithms in both efficiency and quality.

Optimized Time Efficient Data Cluster Validity Measures

By Anand Khandare, A. S. Alvi

DOI: https://doi.org/10.5815/ijitcs.2018.04.05, Pub. Date: 8 Apr. 2018

The main task of any clustering algorithm is to produce compact and well-separated clusters. In practice, however, such clusters cannot be achieved. Different types of clustering validation are therefore used to evaluate the quality of the clusters an algorithm generates, and these measures are important to the success of clustering. Different kinds of clustering require different validity measures. For example, unsupervised algorithms require different evaluation measures than supervised algorithms. Clustering validity measures fall into two categories: external and internal validation. The main difference is that external measures use information from outside the dataset, while internal measures use only information contained in the dataset itself. A well-known example of an external validation measure is entropy, which measures the purity of the clusters against given class labels. Internal measures validate the quality of the clustering without any external information. External measures require the correct number of clusters to be known in advance. Therefore, they are used mainly for selecting the optimal clustering algorithm for a specific type of dataset. Internal validation measures are used not only to select the best clustering algorithm but also to select the optimal number of clusters. External validity measures depend on predefined class labels, which are often unavailable in many applications. For these reasons, internal validation measures are the only option in applications where no external information is available.
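To illustrate how an external measure such as entropy scores cluster purity against class labels, here is a minimal Python sketch. The function name `clustering_entropy` and its inputs are hypothetical and do not come from the paper; the formula is the common size-weighted average of per-cluster label entropy, which may differ in detail from the variant the authors use.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted average entropy of class labels inside each cluster.

    0.0 means every cluster contains a single class (perfectly pure);
    larger values mean the classes are more mixed within clusters.
    Assumption: this is the textbook definition, not the paper's code.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")

    # Collect the class labels that fall into each cluster.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    n = len(class_labels)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        counts = Counter(labels)
        # Shannon entropy (base 2) of the label distribution in this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


pure = clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

In this usage, `pure` scores a clustering that matches the labels exactly, while `mixed` scores one in which each cluster holds both classes in equal parts. The need for `class_labels` is exactly the external information that internal measures do without.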

All of the clustering validity measures currently in use are time-consuming, since they require additional computation. None of them can be used while the clustering process is still running.

This paper surveys existing and improved cluster validity measures. It then proposes time-efficient, optimized cluster validity measures based on cluster representatives and random sampling. The work proposes optimized measures for cluster compactness, separation, and cluster validity. These three measures are simpler and more time-efficient than the existing cluster validity measures, and they can be used to monitor clustering algorithms on large data while the clustering process is running.
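The abstract does not give the formulas for the proposed measures, so the following Python sketch only illustrates the general idea of combining cluster representatives with random sampling. It assumes the representative is the cluster centroid, that compactness is the mean distance from a random sample of points to their representative, and that separation is the smallest distance between representatives. All function names are hypothetical.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean, used here as the cluster representative."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def sampled_compactness(clusters, sample_size, seed=0):
    """Mean distance from sampled points to their cluster representative.

    Only up to `sample_size` points per cluster are measured, so the cost
    of the distance computations does not grow with cluster size.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for points in clusters:
        rep = centroid(points)
        sample = rng.sample(points, min(sample_size, len(points)))
        total += sum(math.dist(p, rep) for p in sample)
        count += len(sample)
    return total / count


def separation(clusters):
    """Smallest distance between any two representatives (needs >= 2 clusters)."""
    reps = [centroid(points) for points in clusters]
    return min(
        math.dist(a, b) for i, a in enumerate(reps) for b in reps[i + 1:]
    )


clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
compactness_value = sampled_compactness(clusters, sample_size=2)
separation_value = separation(clusters)
```

In an algorithm such as k-means the centroids are already maintained at every iteration, so a measure of this shape could in principle be evaluated while clustering is still in progress. Lower compactness and higher separation indicate better clusters.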

Performance Analysis of Improved Clustering Algorithm on Real and Synthetic Data

By Anand Khandare, A. S. Alvi

DOI: https://doi.org/10.5815/ijcnis.2017.10.07, Pub. Date: 8 Oct. 2017

Clustering is an important data mining technique that partitions data objects into clusters, that is, groups generated from the data objects. Various clustering methods are discussed in the literature; some are efficient for large data and others are not. Examples include k-means, Partitioning Around Medoids (PAM, or k-medoids), hierarchical clustering, and DBSCAN. The k-means algorithm, which partitions data into k clusters, is more popular than the other algorithms. However, k must be provided explicitly, and the initial means are chosen randomly, which may produce poor-quality clusters. This paper studies and implements an improved clustering algorithm that automatically predicts the value of k and uses a new technique to choose the initial means. It presents a performance analysis of the improved algorithm and other algorithms on real and synthetic datasets. Performance is measured using running time and various cluster validity measures: sum of squared errors, silhouette score, compactness, separation, Dunn index, and Davies-Bouldin (DB) index. The k predicted by the improved algorithm is also compared with the optimal k suggested by the elbow method, and the two values are found to be nearly the same. Most of the validity measure values for the improved algorithm are found to be optimal.
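Two of the validity measures named above can be sketched in a few lines of Python. This is an illustration using the standard textbook definitions of the sum of squared errors and the Dunn index, not the authors' implementation. The function names are hypothetical, and `dunn_index` assumes at least two clusters, each containing at least two distinct points.

```python
import math
from itertools import combinations


def sum_squared_error(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    sse = 0.0
    for points in clusters:
        dims = len(points[0])
        mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
        sse += sum(math.dist(p, mean) ** 2 for p in points)
    return sse


def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Higher values indicate compact, well-separated clusters.
    """
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q) for c in clusters for p, q in combinations(c, 2)
    )
    return min_between / max_diameter


clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
sse_value = sum_squared_error(clusters)
dunn_value = dunn_index(clusters)
```

The elbow method mentioned in the abstract uses the first of these: the sum of squared errors is computed for a range of k values, and the k at which further increases stop reducing the error substantially is taken as the suggested number of clusters.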
