A New Hybrid Classification Method for Condensing of Large Datasets: A Case Study in the Field of Intrusion Detection

Full Text (PDF, 869KB), PP.32-41

Views: 0 Downloads: 0


Saeed Khazaee 1,* ALI Bozorgmehr 2

1. Engineering Department, Islamic Azad University, Chalous Branch, Iran

2. Electrical Engineering Department, Imam Hosssein University, Tehran, Iran

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2015.04.04

Received: 6 Dec. 2014 / Revised: 2 Jan. 2015 / Accepted: 26 Feb. 2015 / Published: 8 Apr. 2015

Index Terms

Intrusion Detection, Artificial Neural Network, Decision Tree, Sampling


In large data sets data pre-processing always has been the most essential data processing stages. Sampling and using small volumes of data has been an integrated part of data pre-processing to decrease training errors and increase speed of learning. In this study, instead of sampling from all data and using small parts of them, a method has been proposed to not only benefit from sampling but all data be used during training process. In this way, outliers would be detected and even used in completely different way. Using artificial neural networks, new features for instances will be built and the problem of intrusion detection will be mapped as a 10- feature problem. In fact, such a classification is for feature creation and as features in new problem only have discrete values, in final classification decision tree will be used. The results of proposed method on KDDCUP’99 datasets and Cambridge datasets show that this has improved classification in many classes dramatically.

Cite This Paper

SAEED Khazaee, ALI Bozorgmehr, "A New Hybrid Classification Method for Condensing of Large Datasets: A Case Study in the Field of Intrusion Detection", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.4, pp.32-41, 2015. DOI:10.5815/ijmecs.2015.04.04


[1]Yuksel Ozbay, Gulay Tezel. A new method for classification of ECG arrhythmias using neural network with adaptive activation function. Digital Signal Processing 2010; 20(4):1040–1049.
[2]Yuksel Ozbay, Rahime Ceylan, Bekir Karlik. A fuzzy clustering neural network architecture for classification of ECG arrhythmias. Computers in Biology and Medicine 2006; 36: 376–388.
[3]Saeed Khazaee, Maryam Sharifi Rad. Using fuzzy c-means algorithm for improving intrusion detection performance. In: 2013 13th Iranian Conference on Fuzzy Systems, 27-29 Aug 2013, doi: 10.1109/IFSC.2013.6675669, IEEE Computer society.
[4]Saeed Khazaee, Karim Faez,"A Novel Classification Method Using Hybridization of Fuzzy Clustering and Neural Networks for Intrusion Detection", IJMECS, vol.6, no.11, pp.11-24, 2014.DOI: 10.5815/ijmecs.2014.11.02.
[5]Jose´ M. Jerez, Ignacio Molina, Pedro J. Garcıa-Laencina, Emilio Alba, Nuria Ribelles, Miguel Martın , Leonardo Franco. Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artificial Intelligence in Medicine 2010; 50(2): 105-115.
[6]Shih-Wei Lin, Kuo-Ching Ying, Chou-Yuan Lee, Zne-Jung Lee. An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection. Applied Soft Computing 2012; 12(10) 3285–3290.
[7]S.J. Han, S.B. Cho. Detecting intrusion with rule-based integration of multiple models. Computer & Security, 2003; 22(7): 613-623.
[8]Subhash Chandra Pandey, Gora Chand Nandi. TSD based framework for mining the induction. Journal of Computational Science 2013; available online on Elsevier.
[9]S. Kummar. Classification and detection of computer intrusions. Ph.D Thesis, Purdue University, 1995.
[10]Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Boston: Pearson Addison Wesley, 2005.
[11]D. Fisch, A. Hofmann and B. Sick. On the versatility of radial basis function neural networks: A case study in the field of intrusion detection. Information Sciences 2010; 180(12): 2421-2439.
[12]F. Keller. Clustering. Computer University Saarlandes, Tutorial Slides.
[13]J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[14]J.C. Bezdek, R. Ehrlich, W. Full. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences 1984; 10: 191-203.
[15]1999 KDD Cup Competition (Available on http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html).
[16]Gang Wang, Jinxing Hao. A new approach to intrusion detection using Artificial Neural Networks and fuzzy clustering. Expert Systems with Applications 2010; 37: 6225–6232.
[17]M. Sheikhan and M. Sharifi Rad. Misuse detection based on feature selection by fuzzy association rule mining. World Applied Sciences Journal (Special Issue of Computer & Electrical Engineering) 2010; 10: 32-40.
[18]H.T. Nguyen, K. Franke and S. Petrovi'c. Towards a generic feature-selection measure for intrusion detection. In: 20th International Conference on Pattern Recognition; 23-26 Aug 2010, pp. 1529-1532, 2010.
[19]A. Zainal, M.A. Maarof and S.M. Shamsuddin. Feature selection using Rough-DPSO in anomaly intrusion detection. Lecture Notes in Computer Science, Computational Science and its Applications 2007; 4705: 512–524.
[20]Z. Farzanyar, M. Kangavari and S. Hashemi, "Effect of similar behaving attributes in mining of fuzzy association rules in the large databases", Lecture Notes in Computer Science, Computational Science and its Applications, Volume 3980, pp. 1100 – 1109, 2006.
[21]F. Martínez-álvarez, A. Troncoso, J.C. Riquelme, J.S. Aguilar–Ruiz. Discovery of motifs to forecast outlier occurrence in time series. Pattern Recognition Letters 2011; 32(12): 1652–1665.
[22]Oral Alan, Cagatay Catal. Thresholds based outlier detection approach for mining class outliers: An empirical case study on software measurement datasets. Expert Systems with Applications 2011; 38(4): 3440–3445.
[23]Zhenxia Xue, Youlin Shang, Aifen Feng. Semi-supervised outlier detection based on fuzzy rough C-means clustering. Mathematics and Computers in Simulation 2010; 80(9): 1911-1921.
[24]Phurivit Sangkatsanee, Naruemon Wattanapongsakorn, Chalermpol Charnsripinyo. Practical real-time intrusion detection using machine learning approaches. Computer Communications, 2011; 34(18): 2227–2235.
[25]Mansour Sheikhan, Maryam Sharifi Rad. Gravitational search algorithm–optimized neural misuse detector with selected features by fuzzy grids–based association rules mining. Neural Computing and Applications 2013; 23(7): 2451-2463.
[26]A. N. Toosi, M. A. Kahani. New approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers. Computer Communications 2007; 30: 2201–2212.
[27]S.J. Horng, M.Y. Su, Y.H. Chen, T.W. Kao, R.J. Chen, J.L. Lai and C.D. Perkasa. A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert Systems with Applications 2011; 38(1): 306-313.
[28]Hongli Zhang, Gang Lu, Mahmoud T. Qassrawi, Yu Zhang, Xiangzhan Yu. Feature selection for optimizing traffic classification. Computer Communications 2012; 35(12): 1457–1471.
[29]Denis Zuev, Andrew W. Moore. Traffic Classification using a Statistical Approach. Technical report, Intel Research, Cambridge, 2005.
[30]Andrew W. Moore, Denis Zuevy. Internet Traffic Classification Using Bayesian Analysis Techniques. Technical report, Intel Research, Cambridge, 2005.
[31]Witten, I. H., & Frank, E. Data mining: Practical machine learning tools and techniques. Boston: Morgan Kaufmann Publishers, 2005.