Dastgah Recognition of Iranian Traditional Music Pieces Based on Note Sequence Extraction and LSTM Networks
Subject areas: Electrical and Computer Engineering
Sina Ghazanfaripour 1, Morteza Khademi 2 *, Abbas Ebrahimi Moghadam 3
1 - Ferdowsi University of Mashhad, Faculty of Electrical and Computer Engineering
2 - Ferdowsi University of Mashhad, Faculty of Engineering
3 - Ferdowsi University of Mashhad, Faculty of Engineering
Keywords: Dastgah recognition, note sequence, hierarchical classification, deep learning, LSTM
Abstract:
Automatic classification of Iranian traditional music pieces by "Dastgah" is an interesting yet complex and challenging problem for those interested in Iranian Dastgah music. The problem matters, first, because of its many applications in areas such as composition and music education, and second, because non-expert listeners need computer assistance to identify the Dastgah of a piece. This paper presents a method for recognizing the genre (Dastgah) and subgenre (sub-Dastgah) of a piece of Iranian music based on sequential note extraction, hierarchical classification, and LSTM networks. In the proposed method, a piece is first assigned to one of three general categories. The first category contains only the "Mahour" Dastgah, the second contains "Shour" and "Nava", and the third contains "Homayoun", "Segah" and "Chahargah". Then, depending on the category, a different number of further classifiers is applied until one of the 6 Dastgahs and one of the 11 sub-Dastgahs of Iranian traditional music is identified. The method is not limited to any particular playing style or instrument, and it is not affected by the tempo or the playing techniques of the performer. The labeled pieces in the "Arg" database, which was created for this research, are solo performances, although a small number of them include percussion accompaniment (such as the Tombak) alongside the melodic instrument. The results show that the 6 main Dastgahs and the 11 sub-Dastgahs are recognized with average accuracies of 74.5% and 66.35%, respectively, which is better than the results of the few comparable studies.
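The two-stage routing described in the abstract can be illustrated with a minimal sketch. It shows only the control flow of the hierarchy: a first classifier picks one of the three general categories, and a category-specific classifier then picks the Dastgah. The names `GROUPS`, `classify_dastgah`, `stage_one`, and `stage_two` are hypothetical and not taken from the paper. The classifiers are stand-in callables where the paper uses LSTM networks, the sketch assumes a single second-stage classifier per category, which may be simpler than the authors' design, and the sub-Dastgah level is omitted because the abstract does not list the 11 sub-Dastgahs.

```python
from typing import Callable, Dict, List, Sequence

# Stage-one categories and their member Dastgahs, as listed in the abstract.
# The keys "group_1".."group_3" are illustrative labels, not the paper's names.
GROUPS: Dict[str, List[str]] = {
    "group_1": ["Mahour"],
    "group_2": ["Shour", "Nava"],
    "group_3": ["Homayoun", "Segah", "Chahargah"],
}

# A classifier maps an extracted note sequence to a label.
# In the paper this role is played by LSTM networks; here it is any callable.
Classifier = Callable[[Sequence[int]], str]


def classify_dastgah(
    notes: Sequence[int],
    stage_one: Classifier,
    stage_two: Dict[str, Classifier],
) -> str:
    """Route a note sequence through the hierarchy and return a Dastgah name."""
    group = stage_one(notes)
    if group not in GROUPS:
        raise ValueError(f"unknown stage-one category: {group!r}")

    candidates = GROUPS[group]
    # A category with a single member needs no second-stage decision.
    if len(candidates) == 1:
        return candidates[0]

    label = stage_two[group](notes)
    # Guard against a second-stage classifier answering outside its category.
    if label not in candidates:
        raise ValueError(f"{label!r} is not a member of {group}")
    return label
```

With trained models in place of the stand-ins, a call such as `classify_dastgah(notes, stage_one, stage_two)` returns one of the six Dastgah names. One consequence of this design is that an error at the first stage cannot be corrected later, so the first-stage classifier bounds the overall accuracy.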