Arvind Kumar Gautam; Abhishek Bansal

Automatic Cyberstalking Detection on Twitter in Real-Time using Hybrid Approach

Author(s)

Arvind Kumar Gautam ¹,*, Abhishek Bansal ¹

1. Department of Computer Science, Indira Gandhi National Tribal University, Amarkantak, MP, 484886, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2023.01.05

Received: 3 Jan. 2022 / Revised: 19 Mar. 2022 / Accepted: 19 Jun. 2022 / Published: 8 Feb. 2023

Index Terms

Cyberstalking Detection, Cyberbullying, Machine Learning, Lexicon, TF-IDF, Support Vector Machine, Naive Bayes, Sentiment Analysis, Feature Extraction, Twitter

Abstract

Twitter is widely used for expressing opinions and sharing information in real time. Its popularity also makes it a common channel for cyberstalking: cyberstalkers target victims through sexism, racism, offensive language, hate speech, trolling, and fake accounts. This paper proposes a framework for automatic, real-time cyberstalking detection on Twitter using a hybrid approach. Initial experiments were performed on recent unlabeled tweets collected through the Twitter API using three methods: lexicon-based, machine learning, and hybrid. TF-IDF feature extraction was used with all three methods to obtain feature vectors from the tweets. On these tweets, the lexicon-based method reached a maximum accuracy of 91.1% and the machine learning approach reached 92.4%, while the hybrid approach achieved the highest accuracy, 95.8%. The hybrid method, applied with a different procedure, was then used to classify and label live tweets collected in real time through Twitter Streaming, where it achieved an accuracy of 94.2%, recall of 94.1%, precision of 94.6%, F-score of 94.1%, and AUC of 98%. The performance of the machine learning classifiers was measured on each dataset labeled by the three methods. The experimental results show that the proposed hybrid approach outperformed the other implemented approaches on both recent and live tweet classification, and that SVM performed better than the other machine learning algorithms under all applied approaches.
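To make the lexicon-based step and the reported metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical lexicon (ABUSIVE_LEXICON), a hypothetical labeling rule (a tweet is flagged when lexicon terms carry at least a threshold share of its TF-IDF weight), and a smoothed IDF variant; none of these details are specified in the abstract. The helper names tokenize, tfidf_vectors, lexicon_labels, and binary_metrics are likewise illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only (not the paper's lexicon).
ABUSIVE_LEXICON = {"idiot", "stupid", "loser", "ugly", "hate"}


def tokenize(text):
    """Lowercase a tweet and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Length-normalized term frequency times a smoothed IDF (assumed variant).
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def lexicon_labels(tweets, lexicon=ABUSIVE_LEXICON, threshold=0.3):
    """Label a tweet 1 when lexicon terms hold >= threshold of its TF-IDF mass."""
    vectors = tfidf_vectors([tokenize(t) for t in tweets])
    labels = []
    for vec in vectors:
        total = sum(vec.values())
        hit = sum(w for term, w in vec.items() if term in lexicon)
        labels.append(1 if total > 0 and hit / total >= threshold else 0)
    return labels


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Made-up example tweets; real input would come from the Twitter API.
tweets = ["you are a stupid idiot", "have a nice day", "what a lovely morning"]
predicted = lexicon_labels(tweets)
scores = binary_metrics([1, 0, 0], predicted)
```

In a hybrid setup, labels produced this way could plausibly serve as training targets for a classifier such as SVM or Naive Bayes over the same TF-IDF vectors; how the paper actually combines the two is not described in the abstract, so that step is left out of the sketch.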

Cite This Paper

Arvind Kumar Gautam, Abhishek Bansal, "Automatic Cyberstalking Detection on Twitter in Real-Time using Hybrid Approach", International Journal of Modern Education and Computer Science (IJMECS), Vol. 15, No. 1, pp. 58-72, 2023. DOI: 10.5815/ijmecs.2023.01.05

International Journal of Modern Education and Computer Science (IJMECS)