How do Machine Learning Algorithms Effectively Classify Toxic Comments? An Empirical Analysis

By Md. Abdur Rahman, Abu Nayem, Mahfida Amjad, Md. Saeed Siddik

DOI: https://doi.org/10.5815/ijisa.2023.04.01, Pub. Date: 8 Aug. 2023

Toxic comments on social media platforms, news portals, and online forums are impolite, insulting, or unreasonable remarks that usually make other users leave a conversation. Because of the sheer number of comments, moderating them manually is impractical. Online service providers therefore detect toxicity automatically using Machine Learning (ML) algorithms. However, a model's toxicity identification performance depends on finding the best combination of classifier and feature extraction technique. In this empirical study, we set up a comparison environment for toxic comment classification using 15 frequently used supervised ML classifiers with the four most prominent feature extraction schemes. We used the publicly available Jigsaw dataset of toxic comments written by human users. We tested, analyzed, and compared every pair of investigated classifiers and report our conclusions. Accuracy and area under the ROC curve served as the evaluation metrics. We found that Logistic Regression and AdaBoost are the best toxic comment classifiers. The average accuracy of Logistic Regression and AdaBoost is 0.895 and 0.893, respectively, and both achieved the same area under the ROC curve (0.828). The primary takeaway of this study is therefore that Logistic Regression and AdaBoost, combined with BoW, TF-IDF, or Hashing features, perform adequately for toxic comment classification.
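The two evaluation metrics named in this abstract measure different things, which is why a classifier can score 0.895 on one and 0.828 on the other. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The area under the ROC curve is computed here through its rank interpretation, the probability that a randomly chosen toxic comment receives a higher score than a randomly chosen non-toxic one.

```python
def accuracy(y_true, y_pred):
    """Fraction of comments whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Counts, over all (toxic, non-toxic) pairs, how often the toxic comment
    gets the higher score; a tie counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = toxic, 0 = non-toxic.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]                # classifier scores
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(labels, preds))   # 0.5
print(roc_auc(labels, scores))   # 0.75
```

Accuracy depends on the chosen threshold, while the area under the ROC curve depends only on how the scores rank the comments, so reporting both gives a fuller picture of a classifier.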

An Adaptive Hybrid Outdoor Propagation Loss Prediction Modelling for Effective Cellular Systems Network Planning and Optimization

By Ikechi Risi, Clement Ogbonda, Friday Barikpe Sigalo, Isabona Joseph

DOI: https://doi.org/10.5815/ijisa.2023.04.02, Pub. Date: 8 Aug. 2023

Some mobile phone users in certain deadlock areas of Nigeria frequently experience poor network service, a problem that researchers have attributed to the wrong positioning and planning of evolved NodeB (eNodeB) transmitters based on existing propagation loss models. To address this issue, an adaptive hybrid propagation loss model based on wavelet transform and genetic algorithm methods has been developed for cellular network planning and optimization. First, signal strengths were measured at four selected eNodeB cell sites in a long term evolution (LTE) network at 2600 MHz using the drive-test method. Second, the measured data were denoised using wavelet tools. Third, the COST231 model was optimized and expressed as a generic model with parameters. Fourth, a genetic optimization algorithm automatically developed the propagation loss models for the denoised signal data (designated the wavelet-GA model) and for the unprocessed signal data (designated the GA model). The hybrid wavelet-GA, GA, and COST231 propagation loss models were compared using three error metrics: root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R). The hybrid wavelet-GA model gave the lowest RMSEs (2.8813 dB, 3.9381 dB, 4.7643 dB, and 6.9366 dB), whereas the COST231 model gave the highest RMSE. The hybrid wavelet-GA model also gave the lowest MAEs of the three models (2.2016 dB, 2.8672 dB, 3.4766 dB, and 5.8235 dB). Its correlation coefficients for the four cell sites were 90.04%, 78.61%, 92.21%, and 91.23%.
The hybrid wavelet-GA model was also validated by checking its correlation coefficient against measured signal data from eNodeB cell sites other than the ones used to develop it. The model was found to be 97.41% valid. The existing standard COST231 model cannot predict propagation loss with a high level of accuracy and is therefore not well suited for use in parts of Port Harcourt, Nigeria. The proposed hybrid wavelet-GA model achieves a high level of performance and is suitable for cellular network planning and optimization. In future work, more regions and locations should be considered in order to develop more robust propagation loss models.
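The three error metrics used to compare the propagation loss models can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names and the sample path-loss values are hypothetical, and R is taken to be the Pearson correlation coefficient, which the abstract does not state explicitly. The abstract reports R as a percentage, so a value of 0.9004 from this sketch would correspond to 90.04%.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted loss (dB)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Mean absolute error between measured and predicted loss (dB)."""
    n = len(measured)
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / n


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed meaning of R here)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical path-loss samples in dB, for illustration only.
measured = [100.0, 110.0, 120.0]
predicted = [102.0, 108.0, 123.0]

print(rmse(measured, predicted))
print(mae(measured, predicted))
print(pearson_r(measured, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, while the correlation coefficient shows how well the predicted curve follows the shape of the measurements regardless of a constant offset, so the three metrics complement one another.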

Interpretable Fuzzy System for Early Detection Autism Spectrum Disorder

By Rajan Prasad, Praveen Kumar Shukla

DOI: https://doi.org/10.5815/ijisa.2023.04.03, Pub. Date: 8 Aug. 2023

Autism spectrum disorder (ASD) is a chronic developmental condition that impairs a person's ability to communicate and connect with others. In people with ASD, social contact and reciprocal communication are continually compromised. People with ASD may require varying degrees of psychological support in order to gain greater independence, or they may require ongoing supervision and care. Early detection of ASD leaves more time for individual rehabilitation. In this study, we propose a fuzzy classifier for ASD classification and test its interpretability with the fuzzy index and Nauck's index to ensure its reliability. The rule base is created with the GUAJE tool, and the fuzzy rules are then applied to a fuzzy neural network to predict autism. The proposed model is built on the Mamdani rule set and optimized using the backpropagation algorithm. It uses a heuristic function and pattern evolution to classify the dataset. The model is evaluated using the benchmark metrics accuracy and F-measure, while Nauck's index and the fuzzy index quantify its interpretability. The proposed model detects ASD with an average accuracy of 91%, outperforming the other classifiers it was compared against.

Design of Automatic Number Plate Recognition System for Yemeni Vehicles with Support Vector Machine

By Farhan M. Nashwan, Khaled A. M. Al Soufy, Nagi H. Al-Ashwal, Majed A. Al-Badany

DOI: https://doi.org/10.5815/ijisa.2023.04.04, Pub. Date: 8 Aug. 2023

Automatic Number Plate Recognition (ANPR) is an important tool in the Intelligent Transport System (ITS). Plate features can be used to identify any vehicle, which helps ensure effective law enforcement and security. However, this is a challenging problem because of the diversity of plate formats, the different scales and rotations, and the non-uniform illumination and other conditions encountered during image acquisition. This work aims to design and implement an ANPR system tailored to Yemeni vehicle plates. The proposed system involves several steps to detect, segment, and recognize Yemeni vehicle plate numbers. First, a dataset of images is collected manually. Then, the collected images undergo preprocessing, followed by plate extraction, digit segmentation, and feature extraction. Finally, the plate numbers are identified using a Support Vector Machine (SVM). All possible conditions that could affect the efficiency of the system were considered during its design. The experimental results showed that the proposed system achieved training and testing success rates of 96.98% and 99.19%, respectively.

MediBERT: A Medical Chatbot Built Using KeyBERT, BioBERT and GPT-2

By Sabbir Hossain, Rahman Sharar, Md. Ibrahim Bahadur, Abu Sufian, Rashidul Hasan Nabil

DOI: https://doi.org/10.5815/ijisa.2023.04.05, Pub. Date: 8 Aug. 2023

The emergence of chatbots over the last 50 years has been the primary consequence of the need for a virtual aid. Unlike their human counterparts, chatbots can present themselves instantaneously at the user's need and convenience. From something as benign as wanting a friend to talk to, to a more dire case such as medical assistance, chatbots are unequivocally ubiquitous in their utility. This paper aims to develop one such chatbot that is capable not only of analyzing human text (and, in the near future, speech) but also of refining its ability to assist users medically by accumulating data from relevant datasets. Although Recurrent Neural Networks (RNNs) are often used to develop chatbots, the persistent vanishing gradient problem brought about by backpropagation, coupled with the cumbersome process of parsing each word sequentially, has led to the increased use of Transformer Neural Networks (TNNs) instead. TNNs parse entire sentences at once while giving them context via embeddings, which leads to increased parallelization. Two variants of the TNN-based Bidirectional Encoder Representations from Transformers (BERT) model, namely KeyBERT and BioBERT, are used for tagging the keywords in each sentence and for contextual vectorization into Q/A pairs for matrix multiplication, respectively. A final layer of GPT-2 (Generative Pre-trained Transformer) is applied to fine-tune the results from BioBERT into a human-readable form. The outcome of such an attempt could potentially lessen the need for trips to the nearest physician, along with the time and money those trips require.
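The abstract does not spell out how the vectorized Q/A pairs are combined by matrix multiplication. One plausible reading is that a question embedding is multiplied against a matrix of stored answer embeddings and the highest-scoring row is selected. The sketch below illustrates only that reading, in plain Python: the function names are hypothetical, the three-dimensional vectors are toy stand-ins rather than real BioBERT outputs, and cosine similarity is an assumed scoring choice.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine scores."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def best_match(query_vec, answer_matrix):
    """Multiply an answer-embedding matrix by a query embedding.

    Returns the index of the best-scoring answer and all cosine scores.
    Each row of answer_matrix is one stored answer embedding.
    """
    q = normalize(query_vec)
    scores = [
        sum(a * b for a, b in zip(normalize(row), q))
        for row in answer_matrix
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy embeddings (hypothetical stand-ins for model output).
answers = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
question = [0.5, 0.9, 0.0]

index, scores = best_match(question, answers)
print(index)  # 2
```

In a pipeline like the one described, the selected answer would then be passed to the generative model for rewording; that step is outside the scope of this sketch.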

International Journal of Intelligent Systems and Applications (IJISA)

MECS Press Journal
