Improving Speed and Accuracy in Using Genetic Programming for Speaker Verification
Subject areas: Electrical and Computer Engineering
Saeedeh Sadat Sadidpour 1 *, Mohammad Mehdi Homayounpour 2, Mehdi Fasanghari 3
1 - Amirkabir University of Technology
2 - Amirkabir University of Technology
3 - Information Technology Research Institute
Keywords: speaker, speaker verification, genetic programming, clustering, MFCC features, PLP features
Abstract:
In speaker verification, a system checks the identity of a person who contacts it and decides whether that person is the claimed client or an impostor. This paper uses genetic programming (GP) as a method for speaker modeling. Because training GP models takes a long time, the paper proposes compressing the training data, which reduced the time needed to build speaker models by a factor of about 20. A second idea, intended to improve verification accuracy, is to train several GP trees as the model of each speaker: the training data are partitioned into a small number of clusters and one GP tree is trained per cluster, so that each speaker is modeled by several GP trees. With the proposed method, the speaker verification performance of GP increased from 50% to about 92%. GP was compared with other discriminative methods, namely Multi-Layer Perceptron (MLP) neural networks and Learning Vector Quantization (LVQ), and with non-discriminative (generative) methods such as K-Means, LBG, GMM, GMM-UBM, and VQ-MAP. In these experiments GP performed better than the other methods.
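The two ideas in the abstract, compressing the training data and training one sub-model per cluster, can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: compression is done by replacing the feature frames with k-means centroids, the clusters are also found with k-means, and the per-cluster scores are combined by taking the maximum. The function `train_centroid_scorer` is a hypothetical stand-in for training a GP tree, which is far too long to show here, and all function names are illustrative.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means. Returns (centroids, clusters) from the last iteration."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            clusters[nearest].append(v)
        # Update step: an empty cluster keeps its previous centroid.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def compress(frames, codebook_size, seed=0):
    """ASSUMPTION: compress training data to a small codebook of centroids."""
    centroids, _ = kmeans(frames, codebook_size, seed=seed)
    return centroids


def train_centroid_scorer(cluster):
    """HYPOTHETICAL stand-in for GP tree training: scores by closeness to the mean."""
    center = mean_vector(cluster)
    return lambda frame: -squared_distance(frame, center)


def train_speaker_model(frames, n_clusters, train_one, seed=0):
    """Split frames into a few clusters and train one sub-model per cluster."""
    _, clusters = kmeans(frames, n_clusters, seed=seed)
    return [train_one(c) for c in clusters if c]


def score_utterance(model, frames):
    """ASSUMPTION: per frame take the best sub-model score, then average."""
    return sum(max(m(f) for m in model) for f in frames) / len(frames)


# Toy 2-D "feature frames" drawn around two centers (illustrative data only).
rng = random.Random(1)
frames = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
          + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)])

codebook = compress(frames, codebook_size=4)
model = train_speaker_model(frames, n_clusters=2, train_one=train_centroid_scorer)
print(len(frames), "frames compressed to", len(codebook), "vectors")
print("speaker modeled by", len(model), "sub-models")
```

In a real system the frames would be MFCC or PLP feature vectors, and a verification decision would compare `score_utterance` against a threshold tuned on held-out data. Training on the compressed codebook instead of on every frame is what would shorten training time under the assumption made here.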