An Intelligent System for Detecting Fake Materials on the Internet

Full Text (PDF, 1215KB), PP.42-59

Author(s)

Aya S. Noah 1,*, Naglaa E. Ghannam 1, Gaber A. Elsharawy 1, Abeer S. Desuky 1

1. Al-Azhar University, Faculty of Science (Girl’s branch), Mathematics Department, Cairo, 11754, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2023.05.04

Received: 20 Feb. 2023 / Revised: 27 Mar. 2023 / Accepted: 9 Jun. 2023 / Published: 8 Oct. 2023

Index Terms

Deep Neural Networks, MLP Classifier, Fake Websites, Fake materials, Feature Selection

Abstract

Internet usage has risen significantly in recent years, and with it data theft and the variety of counterfeit materials. This has resulted in the proliferation of cybercrime and the theft of personal data via social media, e-mail, and phishing websites that imitate commonly used websites in order to capture user details such as credit card numbers or login IDs. Phishing, a prevalent form of cybercrime, endangers online security through the theft of personal information. The emergence of COVID-19 drew people and organizations towards the Internet and forced many people and companies to work remotely, which increased the existing phishing threats. Hackers took advantage of the situation to infiltrate the devices of people and companies in numerous ways, causing huge financial losses and damage to organizations. Based on previous results and research, Machine Learning (ML) has been selected by researchers as an efficient method for distinguishing malicious web pages from legitimate ones. This paper presents 30 characteristics of websites, which are analyzed using a correlation matrix to determine the relationships between variables. Feature selection is performed through a wrapper method and the Extra Trees Classifier (ETC) to identify the top-ranked characteristics (features) for website classification. To evaluate web pages, several machine learning techniques are used: Random Forest (RF), Multilayer Perceptron (MLP), Decision Tree (DT), and Support Vector Machine (SVM). The results indicate that MLP, a deep neural network, outperforms all the other techniques.
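As a rough illustration of the correlation-matrix step only, the sketch below computes Pearson correlations between a few website features and the class label, then ranks the features by absolute correlation with the label. It is not the authors' implementation and does not cover the wrapper or Extra Trees steps. The feature names, the six toy rows, and the -1/0/1 value coding are all hypothetical and chosen only to keep the example small.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data: three made-up features plus the class label.
# The -1/0/1 coding is an assumption for this sketch.
data = {
    "has_ip_address": [1, 1, -1, -1, 1, -1],
    "url_length": [1, 0, -1, -1, 1, 0],
    "uses_https": [-1, -1, 1, 1, -1, 1],
    "label": [1, 1, -1, -1, 1, -1],
}

matrix = correlation_matrix(data)

# Rank features by how strongly (in absolute value) they correlate with the label.
ranking = sorted(
    (name for name in data if name != "label"),
    key=lambda name: abs(matrix[name]["label"]),
    reverse=True,
)

for name in ranking:
    print(f"{name}: {matrix[name]['label']:+.3f}")
```

A ranking of this kind is a filter-style heuristic. A wrapper method, by contrast, would retrain a classifier on candidate feature subsets and keep the subset that scores best.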

Cite This Paper

Aya S. Noah, Naglaa E. Ghannam, Gaber A. Elsharawy, Abeer S. Desuky, "An Intelligent System for Detecting Fake Materials on the Internet", International Journal of Modern Education and Computer Science (IJMECS), Vol.15, No.5, pp. 42-59, 2023. DOI:10.5815/ijmecs.2023.05.04
