Single Channel Speech Separation Using an Efficient Model-based Method


Author(s)

Sonay Kammi 1,*, Mohammad Reza Karami 1

1. Faculty of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2015.03.06

Received: 27 May 2014 / Revised: 3 Oct. 2014 / Accepted: 25 Dec. 2014 / Published: 8 Feb. 2015

Index Terms

Single Channel Speech Separation, Vector Quantization, MIXMAX Approximation, Gain Estimation, Source Estimation

Abstract

The problem of extracting multiple speech signals from a single mixed recording, referred to as single channel speech separation, has received considerable attention in recent years, and many model-based techniques have been proposed. A major limitation of most of these systems is their inability to handle signals mixed at different energy levels: they assume that the data used in the test and training phases have equal energy levels, an assumption that rarely holds in practice. Our proposed method, based on the MIXMAX approximation and sub-section vector quantization (VQ), is an attempt to overcome this limitation. The proposed technique is compared with a technique in which a gain adapted minimum mean square error estimator is derived to estimate the separated signals. Through experiments we show that our proposed method outperforms this technique in terms of SNR while also reducing computational complexity.
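For readers unfamiliar with the MIXMAX (log-max) approximation the abstract relies on, the sketch below illustrates the core idea: in the log-magnitude spectral domain, a mixture of two sources is well approximated by their elementwise maximum, so separation reduces to searching per-speaker VQ codebooks for the codeword pair (and, here, a candidate gain) that best explains each mixture frame. This is a minimal illustration under stated assumptions; the function name, codebook shapes, and brute-force gain grid are inventions of this sketch, not the paper's sub-section VQ algorithm, which is designed precisely to avoid such an exhaustive search.

```python
import numpy as np

def mixmax_vq_separate(y, cb_a, cb_b, gains_db=np.arange(-12.0, 12.1, 3.0)):
    """Jointly pick a codeword pair and a relative gain under MIXMAX.

    y          : (F,) log-magnitude spectrum of one mixture frame.
    cb_a, cb_b : (Na, F), (Nb, F) VQ codebooks of per-speaker log spectra
                 (assumed pre-trained; names and shapes are illustrative).
    gains_db   : candidate gains of speaker B relative to speaker A,
                 a crude stand-in for the gain estimation the paper performs.

    MIXMAX: in the log-magnitude domain the mixture of two sources is
    approximated by their elementwise maximum, and scaling a source's
    magnitude by g shifts its log spectrum by log(g).
    """
    best = (np.inf, None)
    for g_db in gains_db:
        shift = g_db * np.log(10) / 20.0             # dB -> natural-log shift
        pair_max = np.maximum(cb_a[:, None, :], cb_b[None, :, :] + shift)
        err = ((y - pair_max) ** 2).sum(axis=-1)     # (Na, Nb) distortions
        i, j = np.unravel_index(np.argmin(err), err.shape)
        if err[i, j] < best[0]:
            best = (err[i, j], (cb_a[i], cb_b[j] + shift, g_db))
    return best[1]                                   # estimated log spectra, gain

# Toy check: build a mixture from two known codewords at a -6 dB relative gain.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(16, 64)), rng.normal(size=(16, 64))
y = np.maximum(A[3], B[7] + (-6.0) * np.log(10) / 20.0)
x1_hat, x2_hat, g_hat = mixmax_vq_separate(y, A, B)
print(g_hat, np.allclose(x1_hat, A[3]))              # -6.0 True
```

In a full system, the codebooks would be trained on log-magnitude spectra of each speaker, the selected codewords would drive a mask applied to the mixture spectrum, and the frames would be resynthesized with the mixture phase. Per the abstract, the paper's contribution lies in handling unequal source energies and structuring the VQ search so that its computational complexity stays low; the exhaustive pair-and-gain search above is shown only to make the underlying model concrete.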

Cite This Paper

Sonay Kammi, Mohammad Reza Karami, "Single Channel Speech Separation Using an Efficient Model-based Method", International Journal of Information Technology and Computer Science (IJITCS), vol. 7, no. 3, pp. 42-47, 2015. DOI: 10.5815/ijitcs.2015.03.06
