Predicting Hit Songs Using Audio and Visual Features
Abstract
1. Introduction
2. Literature Review
2.1. Music Popularity Prediction
2.2. Sound
2.3. Visual Elements
3. Materials and Methods
3.1. Independent Variables
3.2. Dependent Variables
4. Results
4.1. Accuracy of Models
4.2. Visual Features
- Hypothesis 1:
  - H0: The average brightness of non-popular songs is equal to the average brightness of popular songs.
  - H1: The average brightness of non-popular songs is not equal to the average brightness of popular songs.
- Hypothesis 2:
  - H0: The average motion of non-popular songs is equal to the average motion of popular songs.
  - H1: The average motion of non-popular songs is not equal to the average motion of popular songs.
- Hypothesis 3:
  - H0: The average R value of non-popular songs is equal to the average R value of popular songs.
  - H1: The average R value of non-popular songs is not equal to the average R value of popular songs.
- Hypothesis 4:
  - H0: The average G value of non-popular songs is equal to the average G value of popular songs.
  - H1: The average G value of non-popular songs is not equal to the average G value of popular songs.
- Hypothesis 5:
  - H0: The average B value of non-popular songs is equal to the average B value of popular songs.
  - H1: The average B value of non-popular songs is not equal to the average B value of popular songs.
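Each hypothesis above is a two-sample comparison of means between the non-popular and popular groups. As a minimal illustration (not the paper's code or data), such hypotheses could be tested with Welch's two-sample t-test in SciPy; the feature arrays below are synthetic placeholders:

```python
# Illustrative two-sample t-tests for visual-feature hypotheses like
# H1 (brightness) and H2 (motion). Values are synthetic, not the
# paper's dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
features = {
    # (non-popular group, popular group) samples per feature
    "brightness": (rng.normal(91, 10, 200), rng.normal(73, 10, 200)),
    "motion":     (rng.normal(5.0, 1.0, 200), rng.normal(4.5, 1.0, 200)),
}

for name, (non_popular, popular) in features.items():
    # Welch's t-test: does not assume equal variances between groups.
    t, p = stats.ttest_ind(non_popular, popular, equal_var=False)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: t={t:.2f}, p={p:.4f} -> {verdict}")
```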
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Feature | Description |
|---|---|
| Danceability | Describes how suitable a track is for dancing based on a combination of musical elements. A value of 0.0 is least danceable and 1.0 is most danceable. |
| Energy | A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity. |
| Key | The key the track is in. Integers map to pitches using standard pitch-class notation. |
| Loudness | The overall loudness of a track in decibels (dB). |
| Speechiness | The presence of spoken words in a track. The more exclusively speech-like the recording (e.g., talk show, audiobook, poetry), the closer the value is to 1.0. |
| Acousticness | A confidence measure from 0.0 to 1.0 of whether the track is acoustic; 1.0 represents high confidence that the track is acoustic. |
| Liveness | The presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. |
| Valence | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. High-valence tracks sound more positive (e.g., happy, cheerful, euphoric), while low-valence tracks sound more negative (e.g., sad, depressed, angry). |
| Tempo | The overall estimated tempo of a track in beats per minute (BPM). |
| Duration_ms | The duration of the track in milliseconds. |
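Note that these features mix scales: most lie on [0, 1] already, but loudness is in dB, tempo in BPM, and duration in milliseconds. A minimal sketch (with hypothetical example values, not the paper's data) of min-max scaling the mixed-scale columns onto a common range before modeling:

```python
# Min-max scale Spotify-style audio features onto [0, 1].
# The raw values below are hypothetical, not from the paper's dataset.
import numpy as np

tracks = np.array([
    # danceability, energy, loudness(dB), tempo(BPM), duration(ms)
    [0.72, 0.81,  -5.3, 120.0, 201_000],
    [0.41, 0.35, -12.8,  76.0, 254_000],
    [0.88, 0.93,  -3.1, 128.0, 189_000],
])

lo = tracks.min(axis=0)
hi = tracks.max(axis=0)
scaled = (tracks - lo) / (hi - lo)  # each column now spans [0, 1]
print(scaled.round(3))
```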
| Feature | Description |
|---|---|
| Average brightness | The average luminance level across all frames of a video. |
| Average motion | The average magnitude of change between consecutive frames. |
| Average R value | The mean intensity of the red channel across all pixels in a video. |
| Average G value | The mean intensity of the green channel across all pixels in a video. |
| Average B value | The mean intensity of the blue channel across all pixels in a video. |
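These five visual features can be computed frame by frame. The sketch below is one plausible implementation, not the paper's: it takes an already-decoded `(frames, height, width, 3)` RGB array (in practice this would come from a video decoder such as OpenCV's `VideoCapture`), uses a simple channel mean for brightness, and defines motion as the mean absolute pixel change between consecutive frames:

```python
# Compute average brightness, motion, and mean R/G/B from video frames.
# Frames here are a synthetic (T, H, W, 3) uint8 RGB array standing in
# for decoded video.
import numpy as np

def visual_features(frames: np.ndarray) -> dict:
    """frames: (num_frames, height, width, 3) RGB uint8 array."""
    f = frames.astype(np.float64)
    # Brightness per frame (plain channel mean; a Rec. 601 weighted
    # luminance is another common choice).
    brightness = f.mean(axis=(1, 2, 3))
    # Motion: mean absolute pixel change between consecutive frames.
    motion = np.abs(np.diff(f, axis=0)).mean(axis=(1, 2, 3))
    return {
        "avg_brightness": brightness.mean(),
        "avg_motion": motion.mean() if motion.size else 0.0,
        "avg_R": f[..., 0].mean(),
        "avg_G": f[..., 1].mean(),
        "avg_B": f[..., 2].mean(),
    }

rng = np.random.default_rng(1)
frames = rng.integers(0, 256, size=(30, 72, 128, 3), dtype=np.uint8)
feats = visual_features(frames)
print({k: round(v, 2) for k, v in feats.items()})
```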
| Algorithm | Audio Accuracy | Audio + Visual Accuracy | Audio F1-Score | Audio + Visual F1-Score |
|---|---|---|---|---|
| Support Vector Machine | 72.57% | 80.68% | 71.98% | 80.57% |
| Random Forest | 73.87% | 81.98% | 73.57% | 80.57% |
| Decision Tree | 69.26% | 76.47% | 69.06% | 76.40% |
| KNN | 63.37% | 74.77% | 62.84% | 74.29% |
| Logistic Regression | 72.57% | 80.78% | 72.10% | 80.69% |
| Average | 70.33% | 78.93% | 69.91% | 78.76% |
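The comparison in this table can be reproduced in outline with scikit-learn. The sketch below is only a schematic of that setup: it uses a synthetic binary "popular vs. non-popular" dataset in place of the real audio + visual feature matrix, and default hyperparameters rather than the paper's:

```python
# Compare the five classifiers from the results table on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred))
    print(f"{name:22s} acc={scores[name][0]:.3f} f1={scores[name][1]:.3f}")
```

Scaling is included for the SVM, KNN, and logistic regression pipelines because those models are sensitive to feature magnitudes; tree-based models are not.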
| Feature | Non-Popular Songs | Popular Songs |
|---|---|---|
| Average R value | 43, 91, 141 (#2B0000, #5B0000, #8D0000) | 42, 73, 104 (#2A0000, #490000, #680000) |
| Average G value | 35, 82, 130 (#002300, #005200, #008200) | 39, 67, 95 (#002700, #004300, #005F00) |
| Average B value | 32, 80, 128 (#000020, #000050, #000080) | 36, 65, 94 (#000024, #000041, #00005E) |
| Average (R, G, B) | (91, 82, 80) | (73, 67, 65) |
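The hex swatches in the table encode each channel value in isolation (e.g., an R value of 91 renders as #5B0000, a G value of 82 as #005200). A small sketch of that conversion, using the mean values from the table:

```python
# Render per-channel means as single-channel hex swatches, as in the
# table (e.g., R = 91 -> "#5B0000").
def channel_swatch(value: int, channel: str) -> str:
    rgb = {"R": (value, 0, 0), "G": (0, value, 0), "B": (0, 0, value)}[channel]
    return "#{:02X}{:02X}{:02X}".format(*rgb)

non_popular = {"R": 91, "G": 82, "B": 80}  # channel means from the table
popular = {"R": 73, "G": 67, "B": 65}

for ch, v in non_popular.items():
    print(ch, v, channel_swatch(v, ch))
```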
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, C.-Y.; Tu, Y.-N. Predicting Hit Songs Using Audio and Visual Features. Eng. Proc. 2025, 89, 43. https://doi.org/10.3390/engproc2025089043