Thimmaraja Yadava G

Work place: Department of Electronics and Communication Engineering, Siddaganga Institute of Technology, Tumakuru, Karnataka, India

E-mail: thimrajyadav@gmail.com

Research Interests: Computer systems and computational processes, Speech Recognition, Speech Synthesis

Biography

Thimmaraja Yadava G received his BE from Kuvempu University in 2011 and his M.Tech from VTU in 2014. He is pursuing a Ph.D under VTU at Siddaganga Institute of Technology, Tumkur, Karnataka, India. He has published a number of papers in national and international journals and conferences. He currently works as an Assistant Professor in the Department of Electronics and Communication Engineering, Sapthagiri College of Engineering, Bangalore, Karnataka, India. His research interests are speech processing and speech recognition.

Author Articles
Creation and Comparison of Language and Acoustic Models Using Kaldi for Noisy and Enhanced Speech Data

By Thimmaraja Yadava G, H S Jayanna

DOI: https://doi.org/10.5815/ijisa.2018.03.03, Pub. Date: 8 Mar. 2018

In this work, Language Models (LMs) and Acoustic Models (AMs) are developed with the Kaldi speech recognition toolkit for noisy and enhanced speech data, in order to build an Automatic Speech Recognition (ASR) system for the Kannada language. The speech data used to develop the ASR models was collected in an uncontrolled environment from farmers in different dialect regions of Karnataka state. The collected data is preprocessed with a proposed noise elimination method for degraded speech, which combines Spectral Subtraction with Voice Activity Detection (SS-VAD) and a Minimum Mean Square Error-Spectrum Power Estimator based on Zero Crossing (MMSE-SPZC). Word-level transcription and validation of the speech data are done with the Indic language transliteration tool (IT3 to UTF-8). The Indian Language Speech Label (ILSL12) set is used to develop the Kannada phoneme set and lexicon. Of the transcribed and validated speech data, 75% is used for system training and 25% for testing. The LMs are generated from Kannada language resources, and the AMs are developed using Gaussian Mixture Models (GMM) and Subspace Gaussian Mixture Models (SGMM). The proposed method is studied in detail and used to enhance the degraded speech data. The Word Error Rates (WERs) of the ASR models for noisy and enhanced speech data are reported and discussed. The developed ASR models can be used in a spoken query system to access real-time agricultural commodity prices and weather information in Kannada.
