Bangla News Headline Categorization

Full Text (PDF, 785KB), pp. 39-48

Author(s)

Amran Hossain 1,*, Niraj Chaudhary 1, Zahid Hasan Rifad 1, B M Mainul Hossain 1

1. Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2021.06.05

Received: 15 May 2021 / Revised: 7 Jul. 2021 / Accepted: 10 Aug. 2021 / Published: 8 Dec. 2021

Index Terms

Categorization, Bangla news headlines, neural networks, LSTM, GRU.

Abstract

Categorizing news from different newspapers is important because readers want to browse news by category, and they face difficulty when news from different categories is presented without any order. This study aims to determine the category of news from online Bangla newspapers. To this end, Bangla news headlines, along with their categories, were collected from various online newspapers through web scraping. Eight news categories are considered in this work, and only the headlines are used for categorization. The input data is modeled with LSTM and GRU neural networks, and the predicted category is compared with the actual category. The LSTM model achieves an accuracy of 82.74% and the GRU model achieves an accuracy of 87.48%, so the GRU model is more accurate than the LSTM model. The GRU model also trains faster than the LSTM model: this research uses 64 units in the GRU and 128 units in the LSTM. Taken together, these results suggest that the GRU model performs better than the LSTM model on this task.
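The difference in training cost between the two configurations can be illustrated by counting the weights of each recurrent layer. The sketch below is not the authors' code. It uses the textbook formulation, in which an LSTM layer has four weight blocks (three gates plus the candidate cell state) and a GRU layer has three (two gates plus the candidate state). The unit counts (128 for LSTM, 64 for GRU) come from the abstract; the embedding dimension of 100 and all function names are assumptions made only for illustration.

```python
def gated_rnn_params(input_dim, units, num_blocks):
    """Count trainable weights in one gated recurrent layer.

    Each block has an input kernel (input_dim x units), a recurrent
    kernel (units x units), and a bias vector (units).
    """
    return num_blocks * (units * (input_dim + units) + units)


def lstm_params(input_dim, units):
    # LSTM: input, forget, and output gates plus the candidate cell state.
    return gated_rnn_params(input_dim, units, num_blocks=4)


def gru_params(input_dim, units):
    # GRU: update and reset gates plus the candidate hidden state.
    return gated_rnn_params(input_dim, units, num_blocks=3)


# Assumption: the abstract does not state the embedding size; 100 is illustrative.
EMBEDDING_DIM = 100

lstm_count = lstm_params(EMBEDDING_DIM, 128)  # 128 LSTM units, per the abstract
gru_count = gru_params(EMBEDDING_DIM, 64)     # 64 GRU units, per the abstract

print(f"LSTM (128 units): {lstm_count} weights")  # 117248
print(f"GRU  (64 units):  {gru_count} weights")   # 31680
```

Under these assumptions the GRU layer has roughly a quarter of the recurrent weights of the LSTM layer, which is consistent with faster training. Some library implementations add a second bias vector per GRU block, so reported counts can differ slightly. A smaller layer does not by itself explain the higher accuracy.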

Cite This Paper

Amran Hossain, Niraj Chaudhary, Zahid Hasan Rifad, B M Mainul Hossain, "Bangla News Headline Categorization", International Journal of Education and Management Engineering (IJEME), Vol. 11, No. 6, pp. 39-48, 2021. DOI: 10.5815/ijeme.2021.06.05
