Outlier Reduction using Hybrid Approach in Data Mining

Full Text (PDF, 504KB), PP.43-49

Author(s)

Nancy Lekhi 1,*, Manish Mahajan 1

1. Department of Information Technology, Chandigarh Engineering College Landran, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2015.05.06

Received: 12 Feb. 2015 / Revised: 10 Mar. 2015 / Accepted: 12 Apr. 2015 / Published: 8 May 2015

Index Terms

Data Mining, Clustering, Weighted K-means, Neural Network, Outlier, and SOF

Abstract

Outlier detection is a very active area of research in data mining, where an outlier is a data point that does not match the rest of the data in a dataset. Existing approaches perform outlier detection only on numeric datasets. Clustering-based methods mainly treat as outliers those elements that lie outside the clusters, but some anomalous elements may, for whatever reason, become part of a cluster, so these must be considered as well. The proposed method uses a hybrid approach to reduce the number of outliers, on the premise that the number of outliers can be reduced only by improving the cluster formulation method. It uses two data mining techniques for cluster formulation: weighted k-means and a neural network. Weighted k-means is a clustering technique that can be applied to text and date datasets as well as numeric datasets, and it assigns a weight to each element in the dataset. The output of weighted k-means becomes the input to the neural network, which serves as a classification and clustering technique. The neural network is trained and then tests the clusters formed by weighted k-means to ensure that the elements are grouped appropriately. Many outlier detection methods exist in data mining; the proposed method uses Integrating Semantic Knowledge (SOF) for outlier detection. This method detects semantic outliers, where a semantic outlier is a data point that behaves differently from the other data points in the same class or cluster. The main aim of this research is to reduce the number of outliers by improving cluster formulation, so that the outlier rate falls, the mean square error decreases, and accuracy improves. The simulation results show that the proposed method significantly reduces the number of outliers.
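
As a rough illustration of the clustering stage described in the abstract, the Python sketch below runs a Lloyd-style k-means in which every element carries a weight. It is not the authors' implementation: the abstract does not give the weighting scheme, so per-element weights that scale each point's pull on its centroid are an assumption, and all function names, data values, and the `factor` threshold are hypothetical. The neural-network validation step and the SOF computation are not reproduced; a simple distance-based flag stands in for outlier detection so that the effect of the weights on the outlier count and the mean square error can be observed.

```python
import math


def weighted_kmeans(points, weights, centroids, iterations=20):
    """Lloyd-style k-means where each point carries a weight (assumed scheme)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the weighted mean of its members.
        for k in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            total = sum(weights[i] for i in members)
            if total == 0:
                continue  # leave the centroid of an empty cluster unchanged
            centroids[k] = [
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(len(centroids[k]))
            ]
    return labels, centroids


def flag_outliers(points, labels, centroids, factor=2.0):
    """Stand-in for SOF: flag points farther than factor * mean cluster distance."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for k in range(len(centroids)):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if not members:
            continue
        mean_d = sum(dists[i] for i in members) / len(members)
        flagged += [i for i in members if mean_d > 0 and dists[i] > factor * mean_d]
    return sorted(flagged)


def mean_squared_error(points, labels, centroids):
    """Average squared distance between each point and its assigned centroid."""
    return sum(
        math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)
    ) / len(points)


# Hypothetical data: two tight groups plus one stray point at index 5.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.1, 1.0), (0.9, 1.0),
    (3.0, 1.0),
    (5.0, 5.0), (5.2, 4.8), (4.8, 5.2), (5.1, 5.0), (4.9, 5.0),
]
# The stray point gets a low weight so it barely shifts its cluster's centroid.
weights = [1.0] * 5 + [0.1] + [1.0] * 5

labels, centroids = weighted_kmeans(points, weights, [(1.0, 1.0), (5.0, 5.0)])
flagged = flag_outliers(points, labels, centroids)
mse = mean_squared_error(points, labels, centroids)
print("labels:", labels)
print("flagged indices:", flagged)
print("mean square error:", round(mse, 4))
```

In this toy setup the low weight keeps the stray point from dragging the first centroid toward it, which is one way a weighting scheme could tighten clusters before a later validation stage examines them.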

Cite This Paper

Nancy Lekhi, Manish Mahajan, "Outlier Reduction using Hybrid Approach in Data Mining", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.5, pp.43-49, 2015. DOI:10.5815/ijmecs.2015.05.06
