Towhidul Islam Robin

Work place: Department of Computer Science and Engineering, Stamford University Bangladesh, Dhaka, Bangladesh

E-mail: towhid.austcse@gmail.com


Research Interests: Natural Language Processing, Data Mining, Data Structures and Algorithms

Biography

Md. Towhidul Islam Robin is currently serving as a Senior Lecturer in the Department of Computer Science and Engineering at Stamford University Bangladesh (SUB). He received his Bachelor of Science degree from Ahsanullah University of Science and Technology (AUST) in 2015. He is currently pursuing a Master of Science degree in Computer Science and Engineering (CSE) at United International University (UIU), Dhaka, Bangladesh. His research interests lie in Machine Learning, Data Mining, Natural Language Processing (NLP), and the Internet of Things.

Author Articles
Automatic Environmental Sound Recognition (AESR) Using Convolutional Neural Network

By Md. Rayhan Ahmed, Towhidul Islam Robin, Ashfaq Ali Shafin

DOI: https://doi.org/10.5815/ijmecs.2020.05.04, Pub. Date: 8 Oct. 2020

Automatic Environmental Sound Recognition (AESR) is an essential topic in modern pattern recognition research. A short audio recording of a sound event can be converted into a spectrogram image and fed to a Convolutional Neural Network (CNN); features generated from that image are then used to classify environmental sound events such as sea waves, fire crackling, dog barking, lightning, rain, and many more. We used the log-mel spectrogram auditory feature to train our six-layer stacked CNN model. We evaluated the model's accuracy in classifying environmental sounds on three publicly available datasets, achieving 92.9% on UrbanSound8K, 91.7% on ESC-10, and 65.8% on ESC-50. These results show a remarkable improvement in environmental sound recognition using only a stacked CNN compared with multiple previous works, and also demonstrate the efficiency of the log-mel spectrogram feature relative to Mel Frequency Cepstral Coefficients (MFCC), wavelet transformation, and the raw waveform. We also experimented with the recently published Rectified Adam (RAdam) optimizer, and our study presents a comparative analysis of the Adam and RAdam optimizers used to train the model to correctly classify environmental sounds with an image-recognition architecture.
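The log-mel spectrogram front end described above can be sketched as follows. This is a minimal illustrative example using only NumPy, not the paper's implementation; the frame size, hop length, and number of mel bands are assumed values chosen for demonstration.

```python
# Sketch of a log-mel spectrogram pipeline: frame the signal, window it,
# take the power spectrum, project onto a mel filterbank, log-compress.
# All parameter values here are illustrative assumptions.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    # Short-time Fourier transform via framing + Hann window + real FFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (frames, freq bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (frames, mel bands)
    return np.log(mel + 1e-10)                          # log compression

# Example: one second of a 440 Hz tone at a 22.05 kHz sample rate.
sr = 22050
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
print(spec.shape)  # a (frames x mel bands) "image" to feed to the CNN
```

The resulting 2-D array is what gets treated as an image by the CNN; in practice a library such as librosa is typically used for this step.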

Other Articles