Framework for Groove Rating in Exercise-Enhancing Music Based on a CNN–TCN Architecture with Integrated Entropy Regularization and Pooling
Abstract
1. Introduction
- We conduct an in-depth evaluation of how entropy regularization and entropy-pooling techniques, when integrated into the CNN–TCN architecture for music groove rating, improve learning on the existing dataset.
- We present empirical evidence highlighting the significant role of these entropy-based techniques in enhancing the model’s performance, including its generalization ability and stability, despite the limitations of the dataset.
2. Related Works
2.1. Methodologies for Predicting Perceived Groove in Music
2.2. Methodologies for Recognizing Emotional Responses to Music
2.3. Entropy-Aware Learning Techniques
3. Methodology
3.1. Overview
3.2. Two-Dimensional Convolutional Layers
The output of a 2D convolutional layer is computed as

$$X^{(l+1)} = f\left(W^{(l)} * X^{(l)} + b^{(l)}\right)$$

where:
- $X^{(l)}$ represents the input feature maps at layer $l$,
- $W^{(l)}$ is the weight matrix (filters) for the convolutional layer,
- $*$ denotes the 2D convolution operation,
- $b^{(l)}$ is the bias term,
- $f(\cdot)$ represents the activation function, typically a ReLU in this context.
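As an illustrative sketch of this operation, the layer can be written in a few lines of NumPy. This is a naive valid-padding implementation; like most deep-learning frameworks it computes cross-correlation rather than a flipped-kernel convolution, and the helper name and tensor layout are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def conv2d_layer(x, w, b):
    """One 2D convolutional layer: X^(l+1) = f(W * X^(l) + b).

    x: input feature maps, shape (C_in, H, W)
    w: filters, shape (C_out, C_in, kH, kW)
    b: bias, shape (C_out,)
    Valid (no-padding) cross-correlation; f is ReLU.
    """
    c_out, c_in, kh, kw = w.shape
    _, h, width = x.shape
    out = np.zeros((c_out, h - kh + 1, width - kw + 1))
    for o in range(c_out):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # filter applied to the receptive field at (i, j), plus bias
                out[o, i, j] = np.sum(w[o] * x[:, i:i + kh, j:j + kw]) + b[o]
    return np.maximum(out, 0.0)  # ReLU activation
```

For example, a single all-ones 2×2 filter over an all-ones 1×4×4 input yields a 1×3×3 map of 4.0s.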
3.3. Temporal Convolutional Network
A dilated causal convolution is defined as

$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i}$$

where:
- $*_d$ represents the convolution operation with a dilation factor $d$,
- $s$ is the position in the sequence,
- $d = 2^{v}$, where $v$ is the network depth level,
- $k$ is the filter size, and
- $d \cdot i$ represents the temporal offset, ensuring that only past information is considered.

Each TCN residual block computes

$$o = \mathrm{Activation}\left(x + \mathcal{F}(x)\right)$$

where:
- $x$ is the input to the block,
- $\mathcal{F}(x)$ represents the sequence of transformations (dilated convolutions, ReLU, dropout) applied to $x$, and
- the activation function is typically ReLU.
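The dilated causal convolution and residual block can be sketched as follows. This is a minimal single-channel 1D NumPy illustration with zero padding before the sequence start; the function names and the simplified (dropout-free) residual path are assumptions for illustration:

```python
import numpy as np

def causal_dilated_conv(x, f, d):
    """Dilated causal convolution: F(s) = sum_{i=0..k-1} f(i) * x[s - d*i].

    x: 1D input sequence; f: filter taps (length k); d: dilation factor.
    Indices before the sequence start are treated as zero (causal padding),
    so output[s] depends only on x at positions <= s.
    """
    k = len(f)
    y = np.zeros(len(x), dtype=float)
    for s in range(len(x)):
        for i in range(k):
            t = s - d * i          # temporal offset d*i: only past samples
            if t >= 0:
                y[s] += f[i] * x[t]
    return y

def residual_block(x, f, d):
    """Simplified TCN residual block: o = ReLU(x + F(x))."""
    return np.maximum(x + causal_dilated_conv(x, f, d), 0.0)
```

With `x = [1, 2, 3, 4]`, taps `f = [1, 1]`, and `d = 2`, the convolution returns `[1, 2, 4, 6]`: each output sums the current sample and the sample two steps back.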
3.4. Entropy-Pooling Layer
The entropy-pooling layer computes the Shannon entropy of a feature map:

$$H = -\sum_{i=1}^{N} p_i \log p_i$$

where:
- $p_i$ represents the probability distribution over the feature map values, typically derived using a softmax function, and
- $N$ is the total number of elements in the feature map.
3.5. Entropy-Enhanced Loss Function
The entropy-enhanced loss combines the mean squared error with an entropy term:

$$\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2 + \lambda H$$

where:
- $\hat{y}_i$ is the predicted value,
- $y_i$ is the true value,
- $n$ is the number of samples,
- $\lambda$ is a regularization parameter controlling the weight of the entropy term, and
- $H$ is the entropy of the output feature map.
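A sketch of this loss as a plain function, assuming the entropy term is added with positive sign so that a positive λ penalizes high-entropy feature maps (the sign convention and function name are assumptions for illustration):

```python
import numpy as np

def entropy_enhanced_loss(y_pred, y_true, feature_map, lam):
    """L = (1/n) sum_i (y_hat_i - y_i)^2 + lambda * H(feature_map)."""
    mse = np.mean((y_pred - y_true) ** 2)  # regression term
    v = feature_map.ravel()
    e = np.exp(v - v.max())                # softmax over the feature map
    p = e / e.sum()
    h = -np.sum(p * np.log(p + 1e-12))     # entropy term H
    return float(mse + lam * h)
```

With perfect predictions the MSE term vanishes and the loss reduces to λ·H, so a two-element uniform map and λ = 0.5 gives 0.5·log 2.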
4. Experiments
4.1. Settings
4.1.1. Datasets
4.1.2. Baselines
4.1.3. Implementation Details
4.2. Experimental Results
4.3. Comparison with Other Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
| | MSE | R² |
|---|---|---|
| Fold 1 | 73.115 | 0.850 |
| Fold 2 | 100.659 | 0.807 |
| Fold 3 | 72.242 | 0.837 |
| Fold 4 | 96.435 | 0.773 |
| Fold 5 | 83.436 | 0.818 |
| Mean ± SD | 85.177 ± 11.681 | 0.817 ± 0.027 |
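The aggregate row can be reproduced from the per-fold values; note that the reported standard deviations correspond to the population SD (`ddof=0`):

```python
import numpy as np

# Per-fold cross-validation results from the table above.
mse = np.array([73.115, 100.659, 72.242, 96.435, 83.436])
r2 = np.array([0.850, 0.807, 0.837, 0.773, 0.818])

print(f"MSE: {mse.mean():.3f} \u00b1 {mse.std(ddof=0):.3f}")  # 85.177 ± 11.681
print(f"R2:  {r2.mean():.3f} \u00b1 {r2.std(ddof=0):.3f}")    # 0.817 ± 0.027
```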
| Dataset | Model | MSE | R² |
|---|---|---|---|
| Janata | CNN+TCN+Entropy Pooling | 89.879 | 0.804 |
| | CNN+TCN+Entropy Regularization | 99.837 | 0.783 |
| | CNN+TCN+Entropy Pooling and Regularization | 85.177 | 0.817 |
| | CNN+TCN | 103.161 | 0.766 |
| | CNN+Regression task | 128.224 | 0.731 |
| | CNN Feature Embedding+LSTM | 107.200 | 0.707 |
| | CNN Feature Embedding+SVM | 253.094 | 0.548 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, J.; Han, J.; Su, P.; Zhou, G. Framework for Groove Rating in Exercise-Enhancing Music Based on a CNN–TCN Architecture with Integrated Entropy Regularization and Pooling. Entropy 2025, 27, 317. https://doi.org/10.3390/e27030317