1. Introduction
Today, a great deal of information about music is easily accessible. Information about music and artists can be obtained from record labels and websites, lyrics can be retrieved from databases, and commentary on music can be read on blogs and in online magazines [
1]. Throughout every period of life, music serves to express and satisfy different needs, acting as a means of entertainment, rest, education, and pleasure. It also has scientific, artistic, and cultural functions [
2]. Music is an art that is present in one way or another in all human societies. Musical preferences differ around the world and vary from person to person, even within the same geographic culture [
3]. Although tastes differ from person to person, there is also popular music that appeals to large audiences. Popular pieces often have melodies that are easy to sing along to, and they are distributed through the music industry and its platforms [
4]. Among the best-known music platforms are iTunes, Apple Music, YouTube, Spotify, Google Play Music, and Amazon Music.
A good song is easy to remember and fun to listen to. There are many reasons why we find a song good, love it, and want to listen to it. Perhaps we connect with the lyrics, the song helps us feel good, or it simply has a great melody. Perhaps we enjoy the vocalist, or the song itself is catchy. These and many other reasons affect the popularity of a song. Beyond this, a song's audio features are of great importance. The available audio features are shown in
Figure 1 [
5].
Danceability describes how suitable a track is for dancing, based on a combination of certain musical elements; a value of 0.0 is least danceable and 1.0 is most danceable. Acousticness is a measure from 0.0 to 1.0 of whether the track is acoustic, and energy is likewise a measure between 0.0 and 1.0. Typically, energetic tracks feel fast, loud, and boisterous.
Instrumentalness predicts whether a track contains no vocals; the closer the value is to 1.0, the greater the likelihood that the track contains no vocal content. Liveness detects the presence of an audience in the recording: higher values represent an increased probability that the track was performed live. Loudness is the overall loudness of a track in decibels (dB), averaged across the entire track; typical values range from −60 dB to 0 dB.
Speechiness detects the presence of spoken words in a track; the more exclusively speech-like the recording (e.g., a talk show, an audiobook, poetry), the closer the value is to 1.0. Tempo is the overall estimated tempo of a track in beats per minute (BPM); in musical terminology, tempo is the speed or pace of a piece and derives directly from the average beat duration. Lastly, valence is a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track: tracks with high valence sound more positive (e.g., happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g., sad, depressed, angry).
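To make the ranges above concrete, a track's audio features can be pictured as a simple record (a hypothetical illustration; the values are invented, not taken from any real track):

```python
# Illustrative (hypothetical) audio-feature record for a single track,
# using the value ranges described above; the numbers are invented.
track = {
    "danceability": 0.73,      # 0.0 = least, 1.0 = most danceable
    "acousticness": 0.12,      # 0.0 .. 1.0
    "energy": 0.85,            # 0.0 .. 1.0
    "instrumentalness": 0.02,  # values near 1.0 suggest no vocals
    "liveness": 0.10,          # higher values suggest a live recording
    "loudness": -5.3,          # overall loudness in dB, typically -60 .. 0
    "speechiness": 0.05,       # values near 1.0 suggest spoken content
    "tempo": 120.0,            # estimated tempo in beats per minute
    "valence": 0.64,           # 0.0 = negative, 1.0 = positive mood
}

# Sanity-check the documented ranges
unit_scaled = ["danceability", "acousticness", "energy",
               "instrumentalness", "liveness", "speechiness", "valence"]
assert all(0.0 <= track[k] <= 1.0 for k in unit_scaled)
assert -60.0 <= track["loudness"] <= 0.0
```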
Machine learning and artificial intelligence are closely related: artificial intelligence describes the ability of a machine to reproduce intelligent human behavior and performs complicated operations to solve human problems, while machine learning feeds large amounts of data through computer algorithms to analyze them and produce decisions and recommendations [
6,
7,
8,
9].
Popular music streaming platforms, particularly Spotify, host billions of song plays. Alongside widely played songs, these platforms also contain songs that are rarely played and not loved by listeners. One reason such songs remain unpopular is simply that people do not enjoy them; in addition, a song's technical and audio features matter. For a song to be listened to widely, these features need to be improved, and the audio elements that significantly influence the number of listens need to be identified.
The dataset used here contains musical attributes and information about popular songs. By examining the correlation values, the variables affecting the popularity of the music were determined. Three different machine learning models were built, with these variables as inputs and popularity as the target variable. The other variables, which have little or no relation to popularity, were also used as inputs to the same three algorithms. The algorithms with the different input sets were then compared according to their accuracy values on the training and test data.
In this work, the most significant audio characteristics of a song in the music market were determined using filter feature selection together with logistic regression, random forest, and K-nearest neighbor (KNN) classifiers. Our main contributions are summarized below:
- (1)
This article shows how effective the feature selection method is in determining the properties that most strongly influence the popularity of songs.
- (2)
This article provides a comprehensive study comparing the performance of machine learning algorithms under the feature selection method. Performance criteria were used to determine the best algorithm.
The purpose of this article is to use machine learning algorithms to determine the factors that affect the popularity of a song, whether these factors alone are sufficient to explain popularity, and how much the other variables in the dataset influence popularity. With this study, artists can determine which audio elements deserve more attention when creating a work. In this way, the popularity of a new song can be estimated before it is released.
The paper is structured as follows. A review of relevant studies is described in
Section 2.
Section 3 contains information about the dataset.
Section 4 contains the evaluation of the accuracy of the established models, along with the materials and methods.
Section 5 presents the results obtained, along with a comparison of benchmark schemes. Finally,
Section 6 contains the conclusions and recommendations.
2. Literature Survey
Various studies in the literature have examined Spotify data. In [
10], the behavior of active Spotify users is examined. The paper investigates system dynamics such as the tracks played, session lengths, and downtime. In [
11], the effect of the promotion process of music pieces on the number of playlist additions is investigated. Goldmann and Kreitz examined general network features and performance through the number of IP addresses by collecting network information from the Spotify application behind NAT devices [
12]. In [
13], the marketing strategy of the Spotify application is emphasized. Likewise, the financial impact on the music market and the application's effective growth driven by its advertising policies are researched. Kurt et al. conducted a personalized music study using the Spotify application, recommending an appeal to user pleasure through real-time resting molds [
14]. In [
15], the digital advertisements that the Spotify application delivers to users and their content are investigated. The study in [
16] examined the physiology and historical background of sound and then created content accordingly. In a study by An et al., music lyrics were used to analyze and classify Chinese music according to emotion, and four different datasets were created. They used the Naive Bayes algorithm, one of the most effective algorithms for text classification. In addition to Naive Bayes, four other classification algorithms were trained on the different datasets and their performances were reported. The performances of the trained algorithms were evaluated, and the final accuracy was determined as 68% [
17].
Guimaraes et al. used different machine learning algorithms to classify the words in the lyrics of Brazilian music. Based on the frequency of words in the lyrics, they predicted which genre of Brazilian music a song belonged to [
18]. In a study by Duru and Yüreğir, a statistical analysis was made of a database of 43,936 pieces in the Turkish Music repertoire. Determinations were made about rhythm, which is the basis of music; the “usûl” used in Turkish Music; and the prosody element in the lyrics, which is thought to be directly related to it, and its importance. The importance of data cleaning in the process was also noted. The analysis of the data revealed that the aruz meter was used more in works composed before the 20th century [
19]. In a study by Karatana and Yıldız, the music genres of songs were determined using machine learning methods. Certain features were obtained by passing the songs through a signal-processing stage, and a classification study was carried out with machine learning algorithms using these features [
20].
Sciandra and Spera show the relationship between audio features of songs obtained from the Spotify database (e.g., key and tempo) and song popularity, measured by the number of streams a song has on Spotify. Under the research question “What are the determinants of popularity?”, special attention was paid to determining the features that make a song popular; Beta regression, generalized linear mixed models (GLMM), and beta GLMM were used for this purpose. The songs of the artist Luciano Ligabue were used as a sample application. The results showed that speechiness, instrumentalness, and liveness negatively affected the popularity index, while energy, valence, and song duration had a positive effect [
21].
Trpkovska et al. focused on analyzing the audio characteristics of tracks on Spotify’s Top Songs of 2017 list. The analysis provides information about the common features of popular songs and why people prefer them. Using data visualization and data mining, the study estimated one audio feature based on the others, searched for patterns in the audio properties of songs, and examined which properties are related to each other [
22].
In the study by Pareek et al., the popularity of songs was estimated using the random forest, K-nearest neighbor, and linear support vector classifier algorithms together with song metrics available on Spotify. Which algorithm predicted popularity most effectively was determined by examining the accuracy, precision, recall, and F1-score metrics. The results showed that the random forest algorithm gave the best popularity estimates [
23].
Mora and Tierney compared and evaluated feature engineering, feature selection, and hyperparameter optimization algorithms using the Spotify Song Popularity dataset. Their study found that feature engineering has a greater effect on model efficiency than the alternative approaches [
24].
Zangerle et al. presented an approach that predicts hit songs using low- and high-level audio features. They used a deep neural network architecture for the prediction. The input set was enriched by adding release-year information to the low- and high-level audio features. The findings show that the proposed approach outperforms approaches that use only low- and high-level audio features [
25].
Nijkamp’s research aimed to answer the question “Is Spotify’s audio-based attribution approach effective in explaining streaming popularity on Spotify?”. The question was analyzed using Spotify’s audio features, taking an attribute-based approach to build a prediction model for the number of streams a song has on Spotify. The correlation results showed significant relationships whose directions were in line with the hypotheses, but the relationships were calculated to be weak. It was determined that the selected audio characteristics alone are insufficient to predict the stream count [
26].
The method most similar to ours is that of Rahardwika et al., who investigated the effect of feature selection on the accuracy of music genre classification using an SVM classifier on Spotify music data. In the feature selection stage, they combined features into different combination groups (FC1, FC2, FC3, FC4) and showed that each group yields different classification accuracy. They recommended FC2 over FC1: both give the same accuracy of 80%, but FC2 contains fewer features and therefore requires less computation time. The features included in FC2 were acousticness, instrumentalness, popularity, energy, danceability, speechiness, valence, loudness, tempo, and artist_name. In our method, we study the accuracy of feature selection with machine learning algorithms for classifying music popularity. We used the filter feature selection method to determine the features that are effective for popularity, and the success of the algorithms trained on both datasets was evaluated using the F-score [
27]. The properties that influence popularity according to our method are instrumentalness, acousticness, mode, valence, danceability, energy, and loudness. Like our work, that study found the most effective features for classifying music genre to be acousticness, instrumentalness, popularity, energy, danceability, speechiness, valence, loudness, tempo, and artist_name. Whether a song becomes popular is an important question in the music field, and beyond the studies above, we aim to help people in this area by selecting the characteristics that matter most for popularity.
In this section, machine learning models built using the features selected according to the correlation values are compared with machine learning models built without feature selection. The aim is to learn how the variables that do not affect popularity influence the performance of the models. In addition, this section compares studies that use methods similar to ours. The comparison of these studies is shown in
Table 1.
4. Methodology
This section provides a concise and precise description of the dataset and the experimental procedure. In this study, several classification models were built using a Spotify dataset obtained via the Kaggle platform [
28]. The independent variables in the dataset represent musical attributes, while the dependent variable represents popularity. The dataset used in this study contains 130,663 observations. The independent variables are continuous numerical variables, and the dependent variable is an ordered categorical (ordinal) variable. Various technological tools were used while performing the research; the dataset was analyzed through the platform described below, which was also used for research and learning during the analysis process.
As shown in
Figure 2, first, the data are collected and a dataset is created. In the data-receiving step, the data are pulled from the dataset. During the data-preparation phase, the data pass through several processes: in this study, data preprocessing steps were applied first, and then feature selection was performed by removing unimportant variables from the dataset. Models are created using the data and different machine learning algorithms, and analyses are performed with the established models. The previous steps are repeated until the algorithms give the best result. The results obtained are evaluated and the best model is selected.
Jupyter Notebook is an open-source program that provides an interactive environment for programming languages, in which explanation and code work together on the same screen. It was chosen for the analysis in this study as an interactive report.
The pandas library was used for handling and manipulating multidimensional data. Scikit-learn was used for machine learning because it is a library that supports structuring data, modeling, and explaining variable relations through the model.
In this study, the dataset is analyzed by loading it into the Jupyter Notebook code editor. Variables are changed where required, for example, by altering variable types. Unwanted data were removed from the dataset, and the missing data were then analyzed. The necessary methods were applied, the outlier data were analyzed mathematically, and the results were inspected visually. Corrections were made to the dataset based on these results, making the dataset analyzable and interpretable. Exploratory data analysis and data visualization give an impression of the structure of the dataset. In the exploratory data analysis stage, summary statistics of the dataset are generated, and the classes, class frequencies, and internal distributions of the variables are observed through visualization and tables. Statistical analysis is then performed on the preprocessed data, and meaningful information is produced from the data using machine learning algorithms.
4.1. Data Pre-Processing
One of the operations to be performed on the dataset is making the data suitable for the task at hand; this process is called data preprocessing. Data preprocessing steps are carried out just before modeling work on the data begins. The data preprocessing steps are shown in
Figure 3.
As shown in
Figure 3, data cleaning is the process of filling in missing data and fixing, repairing, or removing incorrect or irrelevant data from a dataset. Data integration is the merging of data from different sources and the presentation of the transformed data to users. Data transformation (normalization) scales input values: when data differ too much, it brings them into a single pattern, the goal being to make data from different systems comparable by moving them to a common scale. Data reduction includes volume reduction, data compression, and removal of trivial attributes. Data discretization refers to a method that makes data easier to evaluate and manage by converting a large number of data values into a smaller set of values.
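As a small illustration of the transformation (normalization) step described above, min-max scaling brings each numeric column into a common [0, 1] range (a minimal sketch; the function and column names are ours, not the paper's):

```python
import pandas as pd

def min_max_normalize(df, columns):
    """Rescale the given columns to the [0, 1] range (assumes non-constant columns)."""
    out = df.copy()
    for c in columns:
        lo, hi = out[c].min(), out[c].max()
        out[c] = (out[c] - lo) / (hi - lo)
    return out

# Toy example: tempos in BPM become comparable 0..1 values
df = pd.DataFrame({"tempo": [60.0, 120.0, 180.0]})
norm = min_max_normalize(df, ["tempo"])
```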
Detecting and Extracting Outliers from the Dataset
The first step we perform on the dataset is to remove the outliers. The outlier-finding operation on the dataset was performed with the following pseudocode.
Outlier Data Query and Deletion Algorithm
Q1 = np.percentile(data[c], 25)
Q3 = np.percentile(data[c], 75)
IQR = Q3 - Q1
outlier_step = 1.5 * IQR
outlier_list_col = data[(data[c] < Q1 - outlier_step) | (data[c] > Q3 + outlier_step)].index
An outlier is any data point that differs substantially from the rest of the observations in a dataset; in other words, it is an observation that goes beyond the general trend. The algorithm given above was used to find these values. In its first and second lines, the first and third quartiles are calculated. After this, the interquartile range (IQR) is computed: the IQR is the difference between the 75% and 25% values of the dataset, i.e., it covers the middle 50% of the data and shows how the central values are spread out. As a general rule, values more than 1.5 times the IQR below the first quartile or more than 1.5 times the IQR above the third quartile are classified as outliers. The outliers in a sample variable are shown in
Figure 4.
A total of 798 outliers were detected when this procedure was applied to the dataset. These values were removed from the dataset.
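The pseudocode above flags outliers one column at a time; collecting the flagged rows across all chosen columns and dropping them can be sketched as follows (our own minimal version of the procedure, with hypothetical sample data):

```python
import numpy as np
import pandas as pd

def drop_iqr_outliers(data, columns):
    """Drop rows lying more than 1.5 * IQR outside the quartiles in any column."""
    outlier_indices = set()
    for c in columns:
        Q1 = np.percentile(data[c], 25)
        Q3 = np.percentile(data[c], 75)
        outlier_step = 1.5 * (Q3 - Q1)
        mask = (data[c] < Q1 - outlier_step) | (data[c] > Q3 + outlier_step)
        outlier_indices.update(data[mask].index)
    return data.drop(index=list(outlier_indices))

# Toy example: -60 dB is far outside the quartile fences and is removed
df = pd.DataFrame({"loudness": [-5.0, -6.0, -4.0, -5.0, -60.0]})
clean = drop_iqr_outliers(df, ["loudness"])
```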
4.2. Categorizing the Popularity Variable
The most important variable in our data is the popularity variable, which reflects the song’s ranking on the most-listened playlists. We recoded the popularity variable in the dataset as popular and unpopular, assigning the value 1 to popular songs and 0 to unpopular songs. To decide which songs count as popular, the average of the popularity column was examined and found to be 24. According to this average, songs with a value of 24 or below were labeled as popular (i.e., 1), and songs above it were labeled as unpopular (i.e., 0). After this recoding, the dataset contained 71,709 popular songs and 58,156 unpopular songs.
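The recoding step above can be sketched with pandas (the threshold follows the paper's average of 24; the column name and sample values are our assumptions):

```python
import pandas as pd

# Toy sample; the paper's threshold is the column mean (24)
data = pd.DataFrame({"popularity": [3, 24, 25, 80]})
threshold = 24

# 1 = popular (24 or below, as defined in the paper), 0 = unpopular
data["popular"] = (data["popularity"] <= threshold).astype(int)
```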
4.3. Feature Selection
Feature selection is an important method in data mining and machine learning that reduces data size. It creates a new feature subset from all the features in the dataset [
29,
30]. Some of the most important reasons for using feature selection include ensuring faster training of the machine learning algorithm, reducing the complexity of a model to facilitate interpretation, and reducing overfitting. There are three main approaches to feature selection:
Filter methods
Wrapper method (Forward, Backward)
Embedded methods (Lasso-L1, Ridge-L2 Regression)
In this study, we used the filter method. The filter method scores each feature with some single-variable metric and then selects the highest-scoring features. The wrapper method searches the space of all possible feature subsets, learning and evaluating a classifier on each subset. Embedded methods can be considered both a mixture of the filter and wrapper methods and a distinct approach of their own. In the feature selection phase, the correlations between the independent variables and the target variable are checked first. The variables with high positive and negative correlations are shown in
Figure 5.
The graph shown in
Figure 5 presents the correlation of each feature with our target variable. Features that are highly associated with the target variable should be kept, since a high correlation means the input feature has a strong impact on predicting the target. We set the threshold at 0.02 when selecting important features with the filter method: when determining the input variables, we selected those whose correlation with the target variable is higher than 0.02. This threshold value was determined from the success of models built with other threshold values; the best results were obtained at this threshold (e.g., at threshold = 0.1, random forest = 57.83%, logistic regression = 58.53%, kNN = 55.42%). In this case, the variables used as input in the popularity classification are instrumentalness, acousticness, liveness, mode, valence, danceability, energy, and loudness, as shown in
Figure 6. The dataset set up without feature selection is shown in
Figure 7.
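A minimal sketch of the correlation-threshold filter described above (the helper name is ours; 0.02 mirrors the threshold used in the study):

```python
import pandas as pd

def filter_select(data, target, threshold=0.02):
    """Keep features whose absolute Pearson correlation with the target exceeds the threshold."""
    corr = data.corr(numeric_only=True)[target].drop(target)
    return list(corr[corr.abs() > threshold].index)

# Toy example: "energy" correlates with the target, a constant "noise" column does not
df = pd.DataFrame({
    "energy": [0.1, 0.2, 0.3, 0.4],
    "noise": [1.0, 1.0, 1.0, 1.0],
    "popularity": [0, 0, 1, 1],
})
selected = filter_select(df, "popularity")
```

For example, `filter_select(data, "popularity")` would return the names of the features whose absolute correlation with popularity exceeds 0.02.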
6. Conclusions and Recommendations
This study sheds light, for people who work in the related field, on the effect of music parameters on popularity. In the study, the dataset was first made suitable for analysis with data preprocessing. Using filter feature selection, the variables (features) associated with popularity were determined and feature selection was applied. Outliers in the variables to be used in the popularity classification were determined and removed from the dataset. In the popularity variable, songs ranking 24 or below were labeled 1, and those above were labeled 0.
This work uses the filter feature selection method. The wrapper and embedded feature selection methods were not used because models built with them produce lower results; for example, with the wrapper feature selection method, the random forest algorithm achieved 57.73% success, logistic regression 58.59%, and kNN 55.48%.
While the dataset is split into training and test data, the make_classification function is used to distribute the classes evenly. 66% of the dataset was used as training data and 33% as test data. The difference in the number of class samples between the training and test data of the models built without feature selection stems from the excess of input features.
When the F1-scores of the established algorithms on the test data are examined, the models created with feature selection achieve higher test scores than the models built without feature selection. Thus, popularity can be classified using only the popularity-related features, without the features that have low correlation values. Among all the algorithms used, the random forest algorithm has the highest accuracy; both with and without feature selection it is more successful than the other algorithms, and the random forest model with feature selection achieves the higher success rate.
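The model comparison described above can be sketched as follows (a minimal illustration on synthetic data rather than the paper's Spotify dataset; all hyperparameters are library defaults, not the paper's settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Synthetic stand-in for the features/popularity data (8 input features)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)  # 66%/33% split as in the paper

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(),
}
# F1-score on the held-out test data for each classifier
scores = {name: f1_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
```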
Future research may build on these models to create different prediction models for song popularity. The study can be extended by using data from different platforms (TikTok, YouTube Shorts). In future studies, the words that affect popularity can be determined by examining the lyrics of popular music. This type of work can be of great help to music artists and producers.