Singing-Voice Timbre Evaluations Based on Transfer Learning
Abstract
1. Introduction
2. Related Work
2.1. Research on Timbre Analysis
2.2. Research on Automatic Evaluation System for Singing Voice
3. Materials and Data Preprocessing
3.1. Chinese Traditional Instrument Sound Database
3.2. Singing Dry Voice Evaluation Database
3.3. Data Preprocessing
4. Transfer Learning Model for Timbre Evaluation
4.1. Deep Regression for Contrast Experiment
4.2. Transfer Learning Model
4.2.1. Musical Instruments Timbre-Evaluation Network
4.2.2. Singing-Voice Timbre-Evaluation Network
5. Results
5.1. Results of Instrument Timbre-Evaluation Model
5.2. Results of Singing-Voice Timbre-Evaluation Model
5.2.1. Transfer Learning Model
5.2.2. Contrast Model
5.3. Comparison of Results
6. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Number of Files | Total Time | The 16-Dimension Evaluation Criteria |
---|---|---|
1918 | 23 h and 35 min | Slim, bright, dim, sharp, thick, thin, solid, clear, dry, plump, rough, pure, hoarse, harmonize, soft, turbid |
Activation Function | Average Absolute Error Loss of Test Set |
---|---|
ReLU + BN | |
SELU | 3.84 |
Sigmoid | 1.70 |
Softmax | 1.25 |
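The table above reports an average absolute error on the test set. As a minimal sketch, assuming this metric is the usual mean absolute error between annotated and predicted scores, it could be computed as follows. The function name and the sample scores are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |target - prediction| over all paired scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one score is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical annotated and predicted scores, for illustration only.
targets = [6.0, 4.5, 7.0, 3.0]
predictions = [5.0, 5.0, 7.5, 4.0]

mae = mean_absolute_error(targets, predictions)
print(f"mean absolute error: {mae:.2f}")
```

A lower value means the predicted scores sit closer to the annotated ones, which is how the rows of the table would be ranked under this assumption.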
Slim | Bright | Dim | Sharp | Thick | Thin | Solid | Clear |
---|---|---|---|---|---|---|---|
0.0005 | 0.0001 | 0.0003 | 0.0003 | 0.0001 | 0.00001 | 0.0001 | 0.0001 |

Dry | Plump | Rough | Pure | Hoarse | Harmonize | Soft | Turbid |
---|---|---|---|---|---|---|---|
0.0003 | 0.000001 | 0.0003 | 0.0001 | 0.0003 | 0.0001 | 0.0003 | 0.00001 |
Slim | Bright | Dim | Sharp | Thick | Thin | Solid | Clear |
---|---|---|---|---|---|---|---|
1.25 | 1.01 | 1.58 | 1.71 | 1.78 | 1.45 | 1.28 | 1.24 |

Dry | Plump | Rough | Pure | Hoarse | Harmonize | Soft | Turbid |
---|---|---|---|---|---|---|---|
1.52 | 1.54 | 1.80 | 1.18 | 1.24 | 0.94 | 0.70 | 1.77 |
Model | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Transfer learning model | 1.025 | 0.984 | 1.001 | 1.006 | 1.045 | 1.017 | 0.950 | 1.009 | 1.091 | 0.947 |
Contrast model | 1.580 | 1.354 | 1.432 | 1.585 | 1.542 | 1.746 | 1.613 | 1.574 | 1.612 | 1.521 |
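The two rows above can be summarized with a short calculation. The sketch below copies the ten values per model from the table and computes each row's mean, the relative gap between the means, and how many columns favor the transfer learning model. It assumes the values are errors (lower is better) and that the ten numbered columns are comparable trials; the excerpt does not state what the columns represent, and the variable names are illustrative.

```python
from statistics import mean

# Values copied from the comparison table (columns 1 to 10).
transfer = [1.025, 0.984, 1.001, 1.006, 1.045, 1.017, 0.950, 1.009, 1.091, 0.947]
contrast = [1.580, 1.354, 1.432, 1.585, 1.542, 1.746, 1.613, 1.574, 1.612, 1.521]

transfer_mean = mean(transfer)
contrast_mean = mean(contrast)

# Relative reduction of the mean, assuming lower values are better.
relative_reduction = (contrast_mean - transfer_mean) / contrast_mean

# Number of columns in which the transfer learning model has the lower value.
wins = sum(t < c for t, c in zip(transfer, contrast))

print(f"transfer mean: {transfer_mean:.4f}")
print(f"contrast mean: {contrast_mean:.4f}")
print(f"relative reduction: {relative_reduction:.1%}")
print(f"columns won by transfer model: {wins} of {len(transfer)}")
```

Under these assumptions the means come out at about 1.008 for the transfer learning model and about 1.556 for the contrast model, and the transfer learning model has the lower value in every column.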
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, R.; Zhang, M. Singing-Voice Timbre Evaluations Based on Transfer Learning. Appl. Sci. 2022, 12, 9931. https://doi.org/10.3390/app12199931