Improving Speed and Accuracy in Using Genetic Programming for Speaker Verification

Subject areas: Electrical and Computer Engineering

Saeedeh Sadat Sadidpour ¹*, Mohammad Mehdi Homayounpour ², Mehdi Fasanghari ³

1 - Amirkabir University of Technology
2 - Amirkabir University of Technology
3 - Information Technology Research Institute

Received: 1394/09/07; Accepted: 1394/09/07; Published: 1389/09/30

Keywords: speaker, speaker verification, genetic programming, clustering, MFCC features, PLP features

Abstract:

In speaker verification, the system examines the identity of a person who has contacted it and determines whether that person is the claimed individual or an impostor. In this paper, genetic programming is used as a method for modeling speakers. Because training models with genetic programming takes a long time, compressing the training data was proposed as a way to reduce training time, and the time required to model speakers with genetic programming was thereby reduced by a factor of about 20. Training several genetic programming trees as the model of each speaker is another idea proposed in this paper, aimed at improving speaker verification accuracy. In this method, the training data are partitioned into a small number of clusters, and one genetic programming tree is trained for each cluster. Each speaker is thus modeled by several genetic programming trees. With the proposed method, the performance of genetic programming for speaker verification increased from 50% to about 92%. The results obtained with genetic programming were compared with the performance of other discriminative methods, such as MLP and LVQ neural networks, and of non-discriminative methods such as LBG, GMM, GMM-UBM, and VQ-MAP, and genetic programming was observed to perform better than the other methods.

English abstract:

In speaker verification, a system examines a person's claimed identity and decides whether the person is a true client or an impostor. In this paper, genetic programming (GP) is used as a method for speaker modeling. Because training GP models for speakers takes a long time, training data compression is proposed; this reduced training time by a factor of about 20. Training several GP trees as a speaker's model is another idea presented in this paper to improve speaker verification performance. In this method, the training data are partitioned into a few clusters and a GP tree is trained for each cluster, so that a speaker is modeled by several GP trees. Using the proposed method, verification performance increased from 50% to about 92%. GP performance was compared with other discriminative methods, such as the Multi-Layer Perceptron (MLP) neural network and Learning Vector Quantization (LVQ), and with generative methods such as K-Means, LBG, GMM, GMM-UBM, and VQ-MAP. Experiments show that genetic programming is more effective than the other methods.
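The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes plain k-means over feature frames (for example MFCC vectors) and a deterministic, evenly spaced initialisation chosen only for reproducibility; the function names are hypothetical and the abstract does not specify the authors' clustering or compression procedure. Using the centroids as a compressed training set is one possible reading of "training data compression", not a confirmed detail, and GP training itself is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to the given frame."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(frame, centroids[c]))


def kmeans(frames, k, iterations=20):
    """Partition frames into k clusters with plain Lloyd iterations."""
    # Assumption: evenly spaced frames as initial centroids (a simplification).
    centroids = [list(frames[i * len(frames) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each frame to its nearest centroid.
        labels = [nearest(f, centroids) for f in frames]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so labels agree with the returned centroids.
    labels = [nearest(f, centroids) for f in frames]
    return centroids, labels


def split_by_cluster(frames, labels, k):
    """Group frames per cluster; each group would train one GP tree."""
    groups = [[] for _ in range(k)]
    for f, lab in zip(frames, labels):
        groups[lab].append(f)
    return groups


# Toy two-dimensional "frames" standing in for real speech features.
toy_frames = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
toy_centroids, toy_labels = kmeans(toy_frames, k=2)
toy_groups = split_by_cluster(toy_frames, toy_labels, k=2)
print(len(toy_groups), "clusters; sizes:", [len(g) for g in toy_groups])
```

In this sketch each entry of `toy_groups` would be handed to a separate GP training run, giving several trees per speaker, while `toy_centroids` shows how a large set of frames could be reduced to a few representative vectors.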

