A System to Predict Emotion from Bengali Speech

Author(s)

Prashengit Dhar 1,*, Sunanda Guha 2

1. Cox’s Bazar City College, Bangladesh

2. Missouri State University, USA

* Corresponding author.

DOI: https://doi.org/10.5815/ijmsc.2021.01.04

Received: 6 Dec. 2020 / Revised: 30 Dec. 2020 / Accepted: 20 Jan. 2021 / Published: 8 Feb. 2021

Index Terms

Speech recognition, Bengali speech, MFCC, LPC, XGBoost

Abstract

Predicting human emotion from speech is now an important research topic, as a person's mental state can be inferred from their emotion. The proposed work recognizes emotion from human speech. Such a system plays a significant role in understanding how someone feels while talking and has clear uses in a smart-home environment, where one can gauge the emotion of a person at home or elsewhere; universities, service centers, and hospitals can likewise adopt it as a decision support system. MFCC (Mel-Frequency Cepstral Coefficient) and LPC (Linear Predictive Coding) features are extracted from the audio signal. The audio samples were collected by recording speech, and the resulting self-collected dataset is named ABEG; an additional test combines ABEG with the popular RAVDESS dataset. The MFCC and LPC features are used to train and test models that predict three emotion classes: angry, happy, and neutral. Several machine learning algorithms are applied and their results compared, with logistic regression performing well relative to the other algorithms.
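To make the feature pipeline concrete, the sketch below extracts time-averaged MFCCs and LPC coefficients from each recording and compares a logistic regression classifier against XGBoost, mirroring the comparison described above. This is a minimal illustration rather than the paper's implementation: the library choices (librosa, scikit-learn, xgboost), the feature settings (13 MFCCs, LPC order 12), the time-averaging step, and the placeholder file names are all assumptions.

import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

def extract_features(path, n_mfcc=13, lpc_order=12):
    # Load a recording and build a fixed-length vector: MFCCs averaged
    # over time frames, concatenated with the LPC filter coefficients.
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    lpc = librosa.lpc(signal, order=lpc_order)
    return np.concatenate([mfcc.mean(axis=1), lpc])

def compare_classifiers(wav_paths, labels):
    # labels holds one integer class per file, e.g. 0=angry, 1=happy, 2=neutral.
    X = np.stack([extract_features(p) for p in wav_paths])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    for model in (LogisticRegression(max_iter=1000), XGBClassifier()):
        model.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        print(f"{type(model).__name__}: {acc:.3f}")

# Hypothetical usage with recordings from a dataset such as ABEG:
# compare_classifiers(["angry_01.wav", "happy_01.wav", ...], [0, 1, ...])

Time-averaging the MFCC matrix is one simple way to obtain a fixed-length feature vector per clip; the abstract does not state how frame-level features are aggregated, so any aggregation yielding fixed-length vectors could be substituted.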

Cite This Paper

Prashengit Dhar, Sunanda Guha, "A System to Predict Emotion from Bengali Speech", International Journal of Mathematical Sciences and Computing (IJMSC), Vol.7, No.1, pp. 26-35, 2021. DOI: 10.5815/ijmsc.2021.01.04
