Evaluation of GAN-based Models for Phishing URL Classifiers

Full Text (PDF, 685KB), PP.1-14

Author(s)

Thi Thanh Thuy Pham 1, Tuan Dung Pham 2, Viet Cuong Ta 2,*

1. Academy of People Security, 125 Tran Phu Street, Ha Dong District, 12100, Ha Noi, Vietnam

2. VNU University of Engineering and Technology, E3 Building, 144 Xuan Thuy Street, Cau Giay District, 11310, Ha Noi, Vietnam

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2023.02.01

Received: 20 Oct. 2022 / Revised: 1 Nov. 2022 / Accepted: 1 Dec. 2022 / Published: 8 Apr. 2023

Index Terms

Attacker, Defender, Discriminator Network, Fake URLs, Phishing URL Detection

Abstract

Phishing attacks that use malicious URLs or web links are now common. User data such as login credentials and credit card numbers can be stolen when users carelessly click on these links. Such clicks can also lead to malware being installed on the target systems to freeze their activities, carry out ransomware attacks, or reveal sensitive information. Recently, GAN-based models have attracted attention for anti-phishing URL detection. The general idea is to use a Generator network (G) to generate fake URL strings and a Discriminator network (D) to distinguish real URL samples from fake ones. G and D are trained adversarially, so that the URL samples synthesized by G become increasingly similar to the real ones. From the perspective of cybersecurity defense, the GAN framework can be exploited to obtain D as a phishing URL detector or classifier: after training a GAN on both malicious and benign URL strings, a strong classifier/detector D can be obtained. From the perspective of cyberattack, attackers want to create fake URLs that are as close to the real ones as possible, which makes it easier to fool users and detectors. In related work, GAN-based models are mainly exploited for anti-phishing URL detection. There has been no evaluation specific to GAN-generated fake URLs, which attackers can use for phishing attacks. In this work, we propose to use TLD (Top-level Domain) and SSIM (Structural Similarity Index Score) scores to evaluate GAN-synthesized URL strings in terms of their structural similarity to the real ones. The more similar the structure of the GAN-generated URLs is to that of the real ones, the more likely they are to fool the classifiers. Different GAN models, from the basic GAN to the GAN extensions DCGAN, WGAN, and SeqGAN, are explored in this work. Our extensive experiments show that the D classifiers of the basic GAN and DCGAN outperform those of WGAN and SeqGAN. The fake URL patterns generated by SeqGAN are the most effective among the GAN models, both in structural similarity and in their ability to deceive the LSTM (Long Short Term Memory) and RF (Random Forest) phishing URL classifiers.
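
The abstract does not say how the TLD score is computed. As a rough illustration only, the sketch below assumes it is the fraction of generated URL strings whose top-level domain appears in a list of known TLDs. The names `KNOWN_TLDS`, `extract_tld`, and `tld_score` are hypothetical, and the small allow-list stands in for a full TLD registry; none of this is taken from the paper.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical allow-list for illustration; a real evaluation would
# load a complete list of registered top-level domains.
KNOWN_TLDS = {"com", "org", "net", "edu", "gov", "vn", "io"}


def extract_tld(url: str) -> Optional[str]:
    """Return the last label of the URL's host, or None if there is no dotted host."""
    # urlsplit only recognises the host when the string has a scheme
    # or starts with "//", so bare strings like "example.com/a" are prefixed.
    if "://" not in url:
        url = "//" + url
    try:
        host = urlsplit(url).hostname  # already lower-cased by urlsplit
    except ValueError:
        # Raised for malformed hosts such as unbalanced IPv6 brackets.
        return None
    if not host or "." not in host:
        return None
    tld = host.rsplit(".", 1)[1]
    return tld or None  # a trailing dot leaves an empty label


def tld_score(urls: Iterable[str], known_tlds=KNOWN_TLDS) -> float:
    """Fraction of URL strings whose TLD is in known_tlds (assumed definition)."""
    urls = list(urls)
    if not urls:
        return 0.0
    valid = sum(1 for u in urls if extract_tld(u) in known_tlds)
    return valid / len(urls)


if __name__ == "__main__":
    generated = [
        "http://login.bank.vn/a",  # TLD "vn" is in the list
        "example.com/path",        # no scheme, TLD "com" is in the list
        "http://abc/xyz",          # host has no dot, so no TLD
        "https://foo.zzzz/",       # TLD "zzzz" is not in the list
    ]
    print(tld_score(generated))  # 0.5
```

Under this assumed definition, a higher score means more of the generated strings end in a plausible top-level domain, which is one simple way to check structural similarity to real URLs.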

Cite This Paper

Thi Thanh Thuy Pham, Tuan Dung Pham, Viet Cuong Ta, "Evaluation of GAN-based Models for Phishing URL Classifiers", International Journal of Computer Network and Information Security (IJCNIS), Vol.15, No.2, pp.1-14, 2023. DOI: 10.5815/ijcnis.2023.02.01
