A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on FormSpring in Textual Modality

Author(s)

Sahana V. 1,*, Anil Kumar K. M. 2, Abdulbasit A. Darem 3

1. JSS Academy of Technical Education Bengaluru/ Department of Information Science and Engineering, Bengaluru, 560060, India

2. JSS Science and Technology University/ Department of Computer Science and Engineering, Mysuru, 570006, India

3. Northern Border University/ Department of Computer Science, Arar, 9280, Saudi Arabia

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2023.04.04

Received: 23 Feb. 2022 / Revised: 17 Jun. 2022 / Accepted: 20 Sep. 2022 / Published: 8 Aug. 2023

Index Terms

Cyberbullying Detection, Machine Learning, Classification, Natural Language Processing, Social Media

Abstract

Social media usage has grown tremendously with the rise of the internet, and social media has evolved into the most powerful networking platform of the twenty-first century. However, increased use of social networking is associated with a number of undesirable phenomena, such as cyberbullying (CB), cybercrime, online abuse and online trolling. Cyberbullying can have severe psychological and physical effects, especially on children and women, and can even lead to self-harm or suicide. Because of this significant detrimental social impact, the detection of CB text or messages on social media has attracted growing research attention. To mitigate CB, we propose an automated cyberbullying detection model that classifies content as either bullying or non-bullying (a binary classification model), with the aim of creating a more secure social media experience. The proposed model uses Natural Language Processing (NLP) techniques and Machine Learning (ML) approaches to assess cyberbullying content. Our main goal is to assess the performance of different machine learning algorithms in cyberbullying detection on a labelled dataset from Formspring [1]. Nine popular machine learning classifiers are considered: Bootstrap Aggregation (Bagging), Stochastic Gradient Descent (SGD), Random Forest (RF), Decision Tree (DT), Linear Support Vector Classifier (Linear SVC), Logistic Regression (LR), Adaptive Boosting (AdaBoost), Multinomial Naive Bayes (MNB) and K-Nearest Neighbour (KNN). In addition, we experiment with the CountVectorizer feature extraction method to obtain features that aid classification. The results show that the AdaBoost classifier achieves a classification accuracy of 86.52%, higher than all other machine learning algorithms used in this study. The proposed work demonstrates the effectiveness of machine learning algorithms for automatic cyberbullying detection compared with more intensive and time-consuming approaches to the same problem, thereby facilitating the incorporation of an effective approach into tools across different platforms and enabling people to use social media safely.
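
The pipeline described in the abstract (count-based text features fed to a binary bullying / non-bullying classifier, scored by accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it re-implements token counting (the idea behind CountVectorizer) and a Multinomial Naive Bayes classifier, one of the nine listed, in plain Python. The `tokenize` helper, the `MultinomialNB` class, the smoothing value `alpha=1.0` and the toy messages are all assumptions made for illustration; the messages are invented and are not drawn from the Formspring dataset, so the resulting accuracy says nothing about the 86.52% reported above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (assumed scheme)."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


class MultinomialNB:
    """Illustrative Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def _log_score(self, tokens, label):
        counts = self.token_counts[label]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        # log prior + sum of smoothed log likelihoods
        score = math.log(self.class_docs[label] / sum(self.class_docs.values()))
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / total)
        return score

    def predict(self, texts):
        preds = []
        for text in texts:
            tokens = tokenize(text)
            preds.append(max(self.class_docs,
                             key=lambda label: self._log_score(tokens, label)))
        return preds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy messages, for illustration only.
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "shut up you idiot",
    "what is your favourite movie",
    "do you like pizza or pasta",
    "hope you have a great day",
]
train_labels = ["bullying"] * 3 + ["non-bullying"] * 3

test_texts = ["you are an ugly loser", "what movie do you like"]
test_labels = ["bullying", "non-bullying"]

model = MultinomialNB().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
test_accuracy = accuracy(test_labels, predictions)
print(predictions, test_accuracy)
```

In a comparative study of this kind, the same feature matrix would typically be passed to each classifier in turn and the held-out accuracies compared; the sketch shows only one classifier to keep the example short.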

Cite This Paper

Sahana V., Anil Kumar K. M., Abdulbasit A. Darem, "A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on FormSpring in Textual Modality", International Journal of Computer Network and Information Security(IJCNIS), Vol.15, No.4, pp.36-47, 2023. DOI:10.5815/ijcnis.2023.04.04

References

[1]Reynolds, K., Kontostathis, A., & Edwards, L. (2011, December). Using machine learning to detect cyberbullying. In 2011 10th International Conference on Machine learning and applications and workshops (Vol. 2, pp. 241-244). IEEE. DOI: 10.1109/ICMLA.2011.152
[2]Olweus, D. (1994). Bullying at school. In Aggressive behavior (pp. 97-130). Springer, Boston, MA. DOI: 10.1007/978-1-4757-9116-7_5
[3]Slonje, R., & Smith, P. K. (2008). Cyberbullying: Another main type of bullying? Scandinavian journal of psychology, 49(2), 147-154. DOI: 10.1111/j.1467-9450.2007.00611.x
[4]Al-Garadi, M. A., Hussain, M. R., Khan, N., Murtaza, G., Nweke, H. F., Ali, I., ... & Gani, A. (2019). Predicting cyberbullying on social media in the big data era using machine learning algorithms: review of literature and open challenges. IEEE Access, 7, 70701-70718.
[5]O'Keeffe, G. S., & Clarke-Pearson, K. (2011). The impact of social media on children, adolescents, and families. Pediatrics, 127(4), 800-804. DOI: 10.1542/peds.2011-0054
[6]Xu, J. M., Jun, K. S., Zhu, X., & Bellmore, A. (2012, June). Learning from bullying traces in social media. In Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 656-666).
[7]S. Nadali, M.A.A. Murad, N.M. Sharef, A. Mustapha, S. Shojaee, “A review of cyberbullying detection: An overview,” International Conference on Intelligent Systems Design and Applications, ISDA, 325–330, 2014, DOI:10.1109/ISDA.2013.6920758
[8]Willard, “Parent Guide to Cyberbullying and Cyberthreats,” 1–14, 2014.
[9]Ogi Djuraskovic, 2022, Cyberbullying Statistics, Facts, and Trends with Charts. https://firstsiteguide.com/cyberbullying-stats/
[10]Mc Guckin, C., & Corcoran, L. (Eds.). (2017). Cyberbullying: where are We Now?: A Cross-national Understanding. MDPI.
[11]Vaillancourt, T., Faris, R., & Mishna, F. (2017). Cyberbullying in children and youth: Implications for health and clinical practice. The Canadian journal of psychiatry, 62(6), 368-373. DOI: 10.1177/0706743716684791
[12]Görzig, A., & Ólafsson, K. (2013). What makes a bully a cyberbully? Unravelling the characteristics of cyberbullies across twenty-five European countries. Journal of Children and Media, 7(1), 9-27.
[13]Raisi, E., & Huang, B. (2017, July). Cyberbullying detection with weakly supervised machine learning. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017 (pp. 409-416).
[14]Yin, D., Xue, Z., Hong, L., Davison, B. D., Kontostathis, A., & Edwards, L. (2009). Detection of harassment on web 2.0. Proceedings of the Content Analysis in the WEB, 2, 1-7.
[15]Singh, A., & Kaur, M. (2019). Content-based cybercrime detection: A concise review. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8(8), 1193-1207.
[16]Yao, M., Chelmis, C., & Zois, D. S. (2019, May). Cyberbullying ends here: Towards robust detection of cyberbullying in social media. In The World Wide Web Conference (pp. 3427-3433). DOI: 10.1145/3308558.3313462
[17]Hosseinmardi, H., Rafiq, R. I., Han, R., Lv, Q., & Mishra, S. (2016, August). Prediction of cyberbullying incidents in a media-based social network. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 186-192). IEEE. DOI: 10.1109/ASONAM.2016.7752233
[18]Huang, Q., Singh, V. K., & Atrey, P. K. (2014, November). Cyber bullying detection using social and textual analysis. In Proceedings of the 3rd International Workshop on Socially-aware Multimedia (pp. 3-6). DOI: 10.1145/2661126.2661133
[19]Agrawal, S., & Awekar, A. (2018, March). Deep learning for detecting cyberbullying across multiple social media platforms. In European conference on information retrieval (pp. 141-153). Springer, Cham.
[20]Dinakar, K., Jones, B., Havasi, C., Lieberman, H., & Picard, R. (2012). Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(3), 1-30. DOI: 10.1145/2362394.2362400
[21]Ptaszynski, M., Dybala, P., Matsuba, T., Masui, F., Rzepka, R., & Araki, K. (2010). Machine learning and affect analysis against cyber-bullying. the 36th AISB, 7-16.
[22]Balakrishnan, V., Khan, S., & Arabnia, H. R. (2020). Improving cyberbullying detection using Twitter users’ psychological features and machine learning. Computers & Security, 90, 101710. DOI: 10.1016/j.cose.2019.101710
[23]Rosa, H., Pereira, N., Ribeiro, R., Ferreira, P. C., Carvalho, J. P., Oliveira, S., ... & Trancoso, I. (2019). Automatic cyberbullying detection: A systematic review. Computers in Human Behavior, 93, 333-345. DOI: 10.1016/j.chb.2018.12.021
[24]Thun, L. J., Teh, P. L., & Cheng, C. B. (2022). CyberAid: Are your children safe from cyberbullying? Journal of King Saud University-Computer and Information Sciences, 34(7), 4099-4108. DOI: 10.1016/j.jksuci.2021.03.001
[25]Chatzakou, D., Kourtellis, N., Blackburn, J., De Cristofaro, E., Stringhini, G., & Vakali, A. (2017). Mean birds: Detecting aggression and bullying on Twitter. WebSci 2017 - Proceedings of the 2017 ACM Web Science Conference, 13–22. DOI: 10.1145/3091478.3091487
[26]Ribeiro, M. H., Calais, P. H., Santos, Y. A., Almeida, V. A. F., & Meira, W. (2017). “Like Sheep Among Wolves”: Characterizing Hateful Users on Twitter. Available at: http://arxiv.org/abs/1801.00317.
[27]Fosler-Lussier, E., Riloff, E., & Bangalore, S. (2012, June). Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[28]Dinakar, K., Reichart, R., & Lieberman, H. (2011, July). Modeling the detection of textual cyberbullying. In fifth international AAAI conference on weblogs and social media.
[29]Chen, Y., Zhou, Y., Zhu, S., & Xu, H. (2012, September). Detecting offensive language in social media to protect adolescent online safety. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing (pp. 71-80). IEEE. DOI: 10.1109/SocialCom-PASSAT.2012.55
[30]Aziz S., M. U. Khan, Z. Ahmad Choudhry, A. Aymin and A. Usman, “ECG-based Biometric Authentication using Empirical Mode Decomposition and Support Vector Machines,” 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 2019, pp. 0906–0912. DOI: 10.1109/IEMCON.2019.8936174.
[31]Dey, D. (2018). ML | Bagging Classifier.
[32]Pal, M. (2005). Random forest classifier for remote sensing classification. International journal of remote sensing, 26(1), 217-222. DOI: 10.1080/01431160412331269698
[33]Bayzick, J., Kontostathis, A., & Edwards, L. (2011). Detecting the presence of cyberbullying using computer software.
[34]Al-Garadi, M. A., Varathan, K. D., & Ravana, S. D. (2016). Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network. Computers in Human Behavior, 63, 433-443. DOI: 10.1016/j.chb.2016.05.051
[35]Bisaso, K. R., Karungi, S. A., Kiragga, A., Mukonzo, J. K., and Castelnuovo, B. (2018). A comparative study of logistic regression-based machine learning techniques for prediction of early virological suppression in antiretroviral initiating HIV patients. BMC medical informatics and decision making, 18(1), 77. DOI: 10.1186/s12911-018-0659-x.
[36]Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), 119-139.
[37]Chen, T., & Guestrin, C. (2016, August). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp. 785-794). DOI: 10.1145/2939672.2939785
[38]James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112, p. 18). New York: springer. DOI: 10.1007/978-1-4614-7138-7
[39]Safavian, S. R., & Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics, 21(3), 660-674. DOI: 10.1109/21.97458
[40]Su, J., Shirab, J. S., & Matwin, S. (2011, January). Large scale text classification using semisupervised multinomial naive bayes. In ICML.
[41]Guo, G., Wang, H., Bell, D., Bi, Y., & Greer, K. (2003, November). KNN model-based approach in classification. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 986-996). Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-540-39964-3_62
[42]Chavan, V. S., & Shylaja, S. S. (2015, August). Machine learning approach for detection of cyber-aggressive comments by peers on social media network. In 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 2354-2358). IEEE. DOI: 10.1109/ICACCI.2015.7275970