Robustness of Speech Recognition Using Non-Linear Asymmetric Filter and Delta Spectral Characteristics
Subject Areas: electrical and computer engineering
H. Farsi 1 *, S. Kuhimoghadam 2
Abstract:
In this paper, we propose a new feature extraction algorithm that is robust against noise. The proposed algorithm uses a non-linear filter with temporal masking for speech feature extraction and replaces delta cepstral characteristics with delta spectral characteristics, which improves speech recognition accuracy. Almost all current Automatic Speech Recognition (ASR) systems use cepstral delta and delta-delta characteristics for speech feature extraction. The aim of this paper is to obtain robust speech features that provide more accurate speech recognition under different noisy conditions. This is achieved by focusing on key speech features (especially non-stationary ones) that differ strongly from noise signals. The experimental results show that speech recognition accuracy improves in comparison with traditional methods such as PLP and MFCC.