Shadman Latif; Faria Farzana Dola; MD. Mahir Afsar; Ishrat Jahan Esha; Dip Nandi

Investigation of Machine Learning Algorithms for Network Intrusion Detection



Author(s)

Shadman Latif¹,*, Faria Farzana Dola¹, MD. Mahir Afsar¹, Ishrat Jahan Esha¹, Dip Nandi²

1. Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh

2. Faculty of Science and Technology, American International University-Bangladesh, Dhaka, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2022.02.01

Received: 2 Jan. 2022 / Revised: 16 Feb. 2022 / Accepted: 5 Mar. 2022 / Published: 8 Apr. 2022

Index Terms

Machine Learning, Network intrusion, Intrusion Detection System, sampling, data preprocessing, NSL-KDD, KNN.

Abstract

Network intrusion is an increasingly serious concern as technology rapidly advances, and detecting it requires Intrusion Detection Systems. Among the wide range of intrusion detection technologies, machine learning methods are the most appropriate. In this paper, we investigate different machine learning techniques on the NSL-KDD dataset, following a stepwise model-building process. We use six machine learning algorithms: Decision Tree, Support Vector Machine, Random Forest, Naïve Bayes, Neural Network, and AdaBoost. In step one, one-hot encoding is applied to convert categorical features to numeric ones. In step two, feature scaling techniques, including normalization and standardization, are applied to the encoded dataset for each of the six algorithms, and for each algorithm the scaling technique with the better outcome is kept for comparison in the next step. Of the resulting six scaling-machine learning pairs, one (Naïve Bayes) is dropped for inferior performance, leaving five pairs. In step three, feature reduction techniques, including a low variance filter, a high correlation filter, Random Forest, and Incremental PCA, are applied to these five pairs, and for each pair the better-performing feature reduction technique is kept, yielding five feature-reduced scaling-machine learning pairs. In step four, the oversampling techniques SMOTE, Borderline-SMOTE, and ADASYN are applied to these five pairs, yielding five oversampled, feature-reduced scaling-machine learning pairs. These are finally compared to find the best pairs for use in an intrusion detection system.
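The four preprocessing steps described in the abstract can be illustrated with a minimal, NumPy-only sketch. It is not the authors' code: the helper names (`one_hot`, `min_max_normalize`, `standardize`, `low_variance_filter`, `high_correlation_filter`, `smote`) are hypothetical, the thresholds (variance 1e-3, correlation 0.95) are illustrative assumptions rather than values from the paper, and the data are synthetic rather than NSL-KDD. Model training, the per-model choice of the better variant, Random Forest and Incremental PCA reduction, Borderline-SMOTE and ADASYN are not shown; in practice scikit-learn's `OneHotEncoder`, `MinMaxScaler`, `StandardScaler`, `VarianceThreshold` and `IncrementalPCA`, and imbalanced-learn's `SMOTE`, `BorderlineSMOTE` and `ADASYN` cover these steps.

```python
import numpy as np


def one_hot(values):
    """Step 1: one-hot encode a 1-D sequence of categorical values."""
    categories = sorted(set(values))
    column_of = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, column_of[value]] = 1.0
    return encoded, categories


def min_max_normalize(X):
    """Step 2 (normalization): rescale each feature to [0, 1].
    Constant columns are mapped to 0 instead of dividing by zero."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)


def standardize(X):
    """Step 2 (standardization): zero mean and unit variance per feature.
    Constant columns are mapped to 0 instead of dividing by zero."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)


def low_variance_filter(X, threshold=1e-3):
    """Step 3: indices of features whose variance exceeds `threshold`."""
    return np.flatnonzero(X.var(axis=0) > threshold)


def high_correlation_filter(X, threshold=0.95):
    """Step 3: greedily keep a feature only if its absolute Pearson correlation
    with every already-kept feature is <= `threshold`.
    Run after the low-variance filter: constant columns give NaN correlations."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return np.array(keep, dtype=int)


def smote(X_minority, n_new, k=5, seed=None):
    """Step 4: plain SMOTE. Each synthetic row lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    diff = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    k = min(k, len(X_minority) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_minority), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[partner] - X_minority[base])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Synthetic stand-in for NSL-KDD-style records (NOT the real dataset).
    protocol = rng.choice(["tcp", "udp", "icmp"], size=n)
    duration = rng.exponential(10.0, size=n)
    src_bytes = rng.exponential(500.0, size=n)
    constant = np.ones(n)                     # should be removed in step 3
    duplicate = 2.0 * src_bytes               # perfectly correlated with src_bytes
    y = (rng.random(n) < 0.1).astype(int)     # roughly 10% "attack" rows: imbalanced

    encoded, categories = one_hot(protocol)
    X = np.column_stack([encoded, duration, src_bytes, constant, duplicate])
    X = min_max_normalize(X)                  # or standardize(X), chosen per model

    X = X[:, low_variance_filter(X)]
    X = X[:, high_correlation_filter(X)]

    X_attack = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    X_balanced = np.vstack([X, smote(X_attack, n_new, seed=1)])
    y_balanced = np.concatenate([y, np.ones(n_new, dtype=int)])
    print(categories, X.shape, X_balanced.shape, np.bincount(y_balanced))
```

For brevity the scaler and filters are fitted on the whole toy array; in a real evaluation they would be fitted on the training split only, and oversampling applied only to training data, so that synthetic rows do not leak into the test set.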

Cite This Paper

Shadman Latif, Faria Farzana Dola, MD. Mahir Afsar, Ishrat Jahan Esha, Dip Nandi, "Investigation of Machine Learning Algorithms for Network Intrusion Detection", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.14, No.2, pp. 1-22, 2022. DOI:10.5815/ijieeb.2022.02.01
