Quality Enhancement of Noisy Narrowband Speech by Combining Vector Taylor Series and Bandwidth Extension Algorithms
Subject area: Electrical and Computer Engineering
Sara Pourmohammadi 1*, Mansour Vali 2, Mohsen Ghadyani 3
1 - Shahed University
2 - Electrical Engineering
3 - Shahed University
Keywords: vector Taylor series, bandwidth extension, noisy narrowband speech, Gaussian mixture model
Abstract:
This paper presents a new approach to enhancing the quality of noise-degraded narrowband speech signals by combining two techniques: vector Taylor series (VTS) and artificial bandwidth extension (BWE). First, the MFCC representation parameters extracted from the noisy narrowband speech are compensated using the VTS method; then, a GMM-based bandwidth extension model estimates wideband speech representation vectors from these compensated parameters. Next, two measures, PESQ and LSD, are used to assess how closely the estimated wideband spectral envelope and speech signal match the reference wideband spectral envelope and clean speech. The implementation results clearly show that the proposed approach is effective at improving the quality of the representation vectors of noise-corrupted narrowband speech and at bringing them closer to the feature vectors of the reference wideband speech signal.
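The VTS compensation step can be illustrated with a minimal first-order sketch in the log-mel domain (MFCCs are a DCT of log-mel energies, so the same idea carries over through a DCT and its inverse). It assumes additive noise only (no channel term), a diagonal-covariance clean-speech GMM, and known noise mean and variance. The function names `vts_compensate` and `log_gauss_diag` and all toy parameter values are hypothetical; this follows the general first-order VTS formulation, not necessarily the authors' exact implementation.

```python
import numpy as np


def log_gauss_diag(y, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at y."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)


def vts_compensate(y, weights, means, variances, noise_mean, noise_var):
    """First-order VTS estimate of a clean log-mel vector from a noisy one.

    Mismatch model (additive noise only): y = x + log(1 + exp(n - x)).
    weights: (K,); means, variances: (K, D) clean-speech GMM (log-mel domain).
    noise_mean, noise_var: (D,) noise statistics, assumed known here.
    """
    z = noise_mean - means                       # (K, D)
    offsets = np.logaddexp(0.0, z)               # g(z) = log(1 + e^z), computed stably
    noisy_means = means + offsets                # VTS-adapted component means
    s = np.exp(z - offsets)                      # dy/dn = sigmoid(z); dy/dx = 1 - s
    noisy_vars = (1.0 - s) ** 2 * variances + s ** 2 * noise_var

    # Component posteriors p(k | y) under the noise-adapted GMM.
    log_post = np.log(weights) + np.array(
        [log_gauss_diag(y, m, v) for m, v in zip(noisy_means, noisy_vars)]
    )
    post = np.exp(log_post - np.logaddexp.reduce(log_post))

    # Clean estimate: remove the posterior-weighted noise offset.
    return y - post @ offsets


# Toy example: a 2-component, 2-dimensional clean GMM (hypothetical values).
weights = np.array([0.5, 0.5])
means = np.array([[2.0, 1.0], [0.0, -1.0]])
variances = np.ones((2, 2))
noise_mean = np.array([0.5, 0.5])
noise_var = np.full(2, 0.1)
y = np.array([2.3, 1.4])
x_hat = vts_compensate(y, weights, means, variances, noise_mean, noise_var)
```

Because each offset log(1 + e^z) is positive, the estimate always lies below the noisy observation, and it collapses to the observation itself as the noise mean falls far below the speech means.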
In this paper, we introduce a new and efficient approach to enhancing the quality of noise-corrupted narrowband speech signals by jointly applying Vector Taylor Series (VTS) and Bandwidth Extension (BWE) algorithms. First, the feature vectors extracted from the noisy narrowband signal are compensated using the VTS technique. Then, the corresponding wideband features are estimated from the compensated parameters using two different artificial BWE methods: envelope prediction with a GMM and with a neural network. Finally, the distance between the reference wideband feature vectors and their estimates is evaluated using the Log Spectral Distortion (LSD) measure. The implementation results clearly show the advantage of the proposed approach in improving the quality of the contaminated speech. In addition, we show that artificial BWE based on neural-network envelope extension outperforms the GMM-based algorithm.
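GMM-based envelope prediction is commonly formulated as an MMSE mapping under a joint GMM of narrowband features x and wideband features y. The sketch below shows that standard mapping, assuming the joint GMM has already been trained (for example by EM on paired narrowband/wideband feature vectors). The names `gmm_bwe_mmse` and `log_gauss_full` and the toy parameters are hypothetical, and the paper does not specify its exact estimator.

```python
import numpy as np


def log_gauss_full(x, mean, cov):
    """Log-density of a full-covariance Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(cov, d))


def gmm_bwe_mmse(x, weights, means, covs, dx):
    """MMSE estimate of wideband features y from narrowband features x.

    The joint GMM models z = [x; y]: means (K, dx + dy), covs (K, dx + dy, dx + dy).
    Returns sum_k p(k | x) * (mu_y + S_yx S_xx^{-1} (x - mu_x)).
    """
    log_post, cond_means = [], []
    for w, mu, cov in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        s_xx, s_yx = cov[:dx, :dx], cov[dx:, :dx]
        # Responsibility of component k uses only the narrowband marginal.
        log_post.append(np.log(w) + log_gauss_full(x, mu_x, s_xx))
        # Conditional mean of y given x within component k.
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    log_post = np.array(log_post)
    post = np.exp(log_post - np.logaddexp.reduce(log_post))
    return post @ np.array(cond_means)


# Toy joint GMM with one narrowband and one wideband dimension (hypothetical).
weights = np.array([1.0])
means = np.array([[0.0, 1.0]])
covs = np.array([[[1.0, 0.5], [0.5, 1.0]]])
y_hat = gmm_bwe_mmse(np.array([2.0]), weights, means, covs, dx=1)  # [2.0] = 1.0 + 0.5 * 2.0
```

With several components, the estimate is a posterior-weighted blend of per-component linear regressions, which is what lets a GMM represent a nonlinear narrowband-to-wideband mapping.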
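LSD is commonly defined as the RMS difference, in dB, between reference and estimated log power spectra, averaged over frames. The paper does not state its exact formula (for example, the frequency range or whether only the highband is scored), so the following is a minimal sketch under that common definition; the function name and the `eps` floor are assumptions.

```python
import numpy as np


def log_spectral_distortion(ref_power, est_power, eps=1e-12):
    """Frame-averaged log spectral distortion (dB) between power spectra.

    ref_power, est_power: (frames, bins) non-negative power spectra or envelopes.
    eps guards against log10(0) on empty bins.
    """
    diff_db = 10.0 * (np.log10(ref_power + eps) - np.log10(est_power + eps))
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=1))   # RMS over frequency bins
    return float(np.mean(per_frame))                     # mean over frames


ref = np.ones((3, 4))
est = 10.0 * ref                           # uniformly 10 dB above the reference
lsd = log_spectral_distortion(ref, est)    # close to 10 dB
```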