#### *2.3. Participants*

Forty participants, all from the region of Murcia, Spain, were recruited for the experiment. The volunteers were 23 women and 17 men, with mean ages of 65 (SD = 6.3) and 68 (SD = 5.1) years, respectively. All were in good health and had adequate cognitive function to perform the experiment. They completed two screening tests: the PROMIS (Patient-Reported Outcomes Measurement Information System) diagnostic test and the TYM (Test Your Memory) test for cognitive impairment. Those who scored above the cutoff point for depression or below the cutoff point for cognitive functioning did not participate in the study. Participants received no compensation. In addition, they were required to sign a consent form explaining the procedure and the risks that could arise from the test.

The experiment had previously been approved by the Ethics Committee of the Universidad de Castilla-La Mancha in accordance with the Declaration of Helsinki.

#### *2.4. Self-Assessment Manikins*

One way of quantifying the signals obtained from EDA and subsequently relating them to each of the musical stimuli is to use a self-assessment manikin (SAM) questionnaire [23,24]. This questionnaire is widely used in psychology to measure the subjectively felt intensity of emotions and to compare it with the emotional connotation of the physiological signals captured by electrophysiological devices [42–44]. The questionnaire consists of a series of manikins representing different values of valence, activation and dominance [45]. In this experiment only the manikin for activation was used.

#### *2.5. Music Stimuli*

As mentioned above, music is the key to provoking emotions in this experiment. For this reason, eight music pieces were composed specifically for this experiment by a professional musician. These compositions reflect musical styles that older people listened to when they were young (more than 30 years ago). Because the pieces are original, the participants heard each of them for the first time during the experiment. The eight pieces share the same main melody, varied according to eight musical styles. Each variation lasted 60 s. Table 1 shows the eight variations, which span four musical genres with two styles each: "rock/jazz" (*twist* and *swing*), "Cuban" (*bolero* and *habanera*), "Spanish folklore" (*pasodoble* and *Murcian jota*) and "flamenco" (*fandango* and *petenera*).

**Table 1.** Musical genres and styles used in the experiment.


The musical genres used in this experiment and their impact in the region of Murcia are briefly described below. First, flamenco, which has been widely disseminated on the radio and orally through simple songs, is a deeply rooted genre in Spain. The most cheerful, folkloric and festive flamenco styles were adopted, such as the *fandango* and the *petenera*, relegating everything related to "jondo" singing to a secondary position [46]. Secondly, Spanish folklore, mainly linked to moments of celebration, is characterized by its joyful and jovial character. Also deeply rooted in popular culture, it conveys energy and vitality through its simplicity and the repetition of melodic-rhythmic elements. It is closely linked to dancing as a couple, allowing one to enjoy the social atmosphere and to associate the music with parties and courtship.

For its part, Cuban music evokes silent listening without movement, or slow dancing in couples with direct physical contact. This musical genre has also been adopted by classical music and has spread mainly through cinema and radio owing to its sentimental character. Finally, jazz and rock'n'roll imply a new way of listening and relating to music. The orchestration of this music, which adds instruments and sounds unknown in the participants' culture, was novel to them. This music relies on simple and repetitive structures, as well as on melodic improvisation through instrumental or vocal solos. The way of dancing to this music was also new: in pairs but without physical contact, and with very rhythmic movements that are sometimes perceived as transgressive.

#### *2.6. Experimental Design*

An appropriate experimental design is fundamental to achieving relevant results. The E-Prime software was chosen to create the basic design of the experiment. This software is the most widely used in the field of psychology for setting up experimental trials. E-Prime is a robust tool for our study, since it allows us to randomize and synchronize the musical pieces played to the participants. Furthermore, it makes it possible to add the SAM questionnaire and to control and record different parameters that are used to exploit the EDA signals acquired while the music is played.

The experiment follows the scheme shown in Figure 1. As can be seen, the experiment has well-differentiated phases. In the first phase, the measuring instruments are placed on the participant. Collection of the EDA signals starts once the participant is prepared, that is, in a neutral emotional state. To reach this state, the participant remains silent, looking at a black screen, before the first piece of music is played. In the second phase, the participant listens to each of the musical pieces and, when each piece ends, completes the SAM questionnaire. This process is repeated eight times, until all the musical pieces have been played.

**Figure 1.** Flowchart of the experimental design

Throughout the experiment, the EDA signals were collected continuously, which made possible the subsequent segmentation, preprocessing and analysis of the signal.

#### *2.7. Electrodermal Activity Preprocessing*

As discussed above, EDA was measured with a non-invasive device. Specifically, the Empatica E4 wristband measures skin conductance (SC) in the form of EDA signals. These measurements are composed of two signals: a slowly varying signal, called the tonic driver or skin conductance level (SCL), and a rapidly varying signal, called the phasic driver or skin conductance response (SCR). The SCL signal establishes the base level of the signal, while the SCR is directly associated with the activity of the sudomotor system which, in turn, is directly associated with the sympathetic nervous system.

The processing of the EDA signals comprises several phases in which the signals are transformed. These phases are usually preprocessing, filtering, artefact removal and discrete deconvolution. Preprocessing establishes the segments acquired in each phase of the experiment. Next, the SC signals are filtered to eliminate the artefacts and interference recorded during acquisition. In our case, two filters were used: first, a low-pass filter with a 4 Hz cutoff frequency, and second, a Gaussian filter to smooth the signal and attenuate artefacts and noise.
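The Gaussian smoothing stage can be illustrated with a minimal sketch. It assumes a discrete Gaussian kernel applied by convolution; the kernel width (`sigma`), the truncation radius, the edge handling and the function names are illustrative assumptions rather than the settings used in this study, and the low-pass stage is not reproduced.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized discrete Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(signal, sigma=2.0):
    """Smooth a signal with a Gaussian kernel, repeating edge samples at the borders."""
    radius = int(3 * sigma)  # truncate the kernel at about three standard deviations
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp the index at the edges
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


# Toy skin-conductance trace (arbitrary values) with a single spike artefact.
raw = [1.0] * 10 + [5.0] + [1.0] * 10
clean = gaussian_smooth(raw, sigma=2.0)
```

In this toy trace the isolated spike is spread over its neighbours and strongly attenuated, while a constant signal is left unchanged because the kernel weights sum to one.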

The next step is the deconvolution process, which separates the SCR from the SCL signals. This method makes it possible to minimize the effects of race, sex and age on the SC signal. Figure 2 outlines how this process was performed. As can be seen, it is the SCR driver that can be used to detect the arousal level of the participant. For this purpose, the MATLAB library Ledalab 3.4.9 was used [47]. Mathematically, the sudomotor nerve activity can be considered a *Driver* containing a train of impulses that develop over time. This response is integrated in SC and, consequently, also in SCR and SCL. The result is represented by a convolution (\*) of the driver with the impulse-response function (IRF), which describes the course of the impulse response over time, as shown in Equation (1).

$$SC = SC_{Driver} * IRF \tag{1}$$

The *SC* signal is composed of the *SCL* and *SCR* signals, as shown in Equation (2). Writing each component as the convolution of its own driver with the IRF leads to Equation (3).

$$SC = SCL + SCR \tag{2}$$

$$SC = (SCL_{Driver} + SCR_{Driver}) * IRF \tag{3}$$

**Figure 2.** Flowchart of the deconvolution process

Thus, by deconvolution of Equation (3) (written below as a quotient), the phasic signal driver is obtained as:

$$SCR_{Driver} = \frac{SC}{IRF} - SCL_{Driver} \tag{4}$$

At this point the resulting signals can be used in the following process, which is feature extraction and analysis.
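Equations (1) to (4) can be illustrated numerically with a minimal sketch. It is not the Ledalab procedure, which estimates the tonic component and optimizes the IRF. Here the tonic driver is assumed to be known, the IRF is a hypothetical decaying exponential, and the driver values are toy data. The deconvolution is carried out by causal long division, which requires a non-zero first IRF sample.

```python
import math


def convolve(x, h):
    """Full discrete convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def deconvolve(y, h, n):
    """Recover the first n samples of x from y = x * h (requires h[0] != 0)."""
    x = []
    for k in range(n):
        acc = y[k]
        for j in range(1, min(k, len(h) - 1) + 1):
            acc -= h[j] * x[k - j]
        x.append(acc / h[0])
    return x


n = 20
# Hypothetical IRF: a decaying exponential (illustrative only).
irf = [math.exp(-k / 4.0) for k in range(n)]
# Hypothetical drivers: a flat tonic driver plus two phasic impulses.
scl_driver = [0.5] * n
scr_driver = [0.0] * n
scr_driver[5] = 1.0
scr_driver[12] = 2.0

# Equation (3): SC = (SCL_Driver + SCR_Driver) * IRF
total_driver = [a + b for a, b in zip(scl_driver, scr_driver)]
sc = convolve(total_driver, irf)[:n]

# Equation (4): SCR_Driver = SC / IRF - SCL_Driver
recovered = [d - t for d, t in zip(deconvolve(sc, irf, n), scl_driver)]
```

Under these assumptions `recovered` matches `scr_driver` up to rounding error, which shows how the impulses of the phasic driver are isolated once the IRF and the tonic driver are fixed.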

#### *2.8. Feature Extraction and Analysis*

As noted above, the *SCRDriver* is used to establish whether there are differences between the EDA signals produced while listening to the different music tracks. Figure 3 shows the feature extraction and analysis process, whose aim is to assess the features (metrics) that characterize the signals. The SCR driver, obtained through the deconvolution process described above, is decomposed into a series of temporal, morphological, statistical and frequency features. These features are stored in a feature sheet for later analysis, to investigate whether arousal differs across the musical genres on the basis of each feature.

**Figure 3.** Flowchart of the feature extraction process

Note that the human reaction to a specific stimulus is usually expressed as a peak or a burst of peaks in the *SCRDriver*, depending on the level of alertness involved. From a physiological perspective, reactions to stimuli appear in the signals as peaks proportional to the intensity, length and number of emotional events. The greater the disturbance caused, the greater the peak height produced in the SCR data. The number of peaks in the *SCRDriver* increases when the stimulus is maintained over time, which produces a series of sequential peaks.

Table 2 details the features selected to characterize the different segments of the *SCRDriver*. These features, which have been applied successfully in previous works [40,41,48], allow us to quantify each signal.


**Table 2.** Features obtained from skin conductance response (SCR)

The temporal parameters are the mean value (M), standard deviation (SD), maximum and minimum peak values (MA and MI), and dynamic range (DR), defined as the difference between the maximum and the minimum. These parameters provide global information about the average and variability of the data series. They also indicate whether the data reflect a stronger or weaker reaction, which may differ according to the nature of the stimulus. Other temporal parameters used are the first and second derivatives (D1, D2), their means (D1M, D2M) and their standard deviations (D1SD and D2SD). These parameters are used because an intense stimulus produces a steeper slope than a less intense one. It is, therefore, necessary to establish a criterion of speed and acceleration in the response. Once the slope has reached its maximum, the recovery produces a gentler gradient of opposite sign.
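The temporal features can be sketched as follows. This is a minimal illustration that assumes the derivatives are approximated by finite differences between consecutive samples, that MA and MI are the largest and smallest sample values, and that the sample (n − 1) standard deviation is used. The function name and the toy segment are hypothetical.

```python
import statistics


def temporal_features(x):
    """Compute the temporal features of one SCR-driver segment."""
    d1 = [b - a for a, b in zip(x, x[1:])]    # first difference (approximates D1)
    d2 = [b - a for a, b in zip(d1, d1[1:])]  # second difference (approximates D2)
    return {
        "M": statistics.mean(x),
        "SD": statistics.stdev(x),  # sample SD (n - 1); an assumption
        "MA": max(x),
        "MI": min(x),
        "DR": max(x) - min(x),
        "D1M": statistics.mean(d1),
        "D1SD": statistics.stdev(d1),
        "D2M": statistics.mean(d2),
        "D2SD": statistics.stdev(d2),
    }


segment = [0.0, 1.0, 3.0, 2.0, 1.0]  # toy values, not experimental data
features = temporal_features(segment)
```

For this toy segment the rise (differences of 1 and 2) is steeper than the recovery (differences of −1), which is the kind of asymmetry the derivative features are meant to capture.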

The morphological features are arc length (AL), integral area (IN), normalized mean power (AP), root mean square (RMS), perimeter and area ratio (IL), and energy and perimeter ratio (EL). These parameters respond to the need to understand the morphological differences in the shape of the *SCRDriver*. Not only the peaks are of interest, but also changes in the general morphology of the signals. The statistical features employed are skewness (SK), kurtosis (KU) and momentum (MO). These supply information about the distribution and variability of the data series. Finally, for the frequency domain, the fast Fourier transform (FFT) was computed for the bandwidths F1 (0.1, 0.2), F2 (0.2, 0.3) and F3 (0.3, 0.4). These parameters make it possible to detect any variation in the frequency domain for each of the stimuli.
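A minimal sketch of the morphological, statistical and frequency features is given next. The exact formulas are not stated in this section, so the definitions in the code are assumptions: unit sample spacing, IL taken as IN/AL, EL as signal energy over AL, population central moments for SK and KU, band limits in Hz, and a hypothetical 4 Hz sampling rate. MO is omitted because its definition cannot be inferred from the text.

```python
import math


def shape_features(x):
    """Morphological and statistical features under the assumed definitions."""
    n = len(x)
    arc_length = sum(math.sqrt(1.0 + (b - a) ** 2) for a, b in zip(x, x[1:]))
    area = sum(abs(v) for v in x)   # IN: rectangle-rule integral, unit step
    energy = sum(v * v for v in x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # central moments (population form)
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "AL": arc_length,
        "IN": area,
        "AP": energy / n,
        "RMS": math.sqrt(energy / n),
        "IL": area / arc_length,
        "EL": energy / arc_length,
        "SK": m3 / m2 ** 1.5,
        "KU": m4 / m2 ** 2,
    }


def band_power(x, fs, lo, hi):
    """Sum the DFT power of x over the bins with lo <= f < hi (f in Hz)."""
    n = len(x)
    power = 0.0
    for k in range(n // 2 + 1):
        f = k * fs / n
        if lo <= f < hi:
            re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
            im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
            power += re * re + im * im
    return power


shape = shape_features([0.0, 1.0, 0.0, 1.0])  # toy segment

FS = 4.0  # hypothetical sampling rate in Hz
tone = [math.sin(2 * math.pi * 0.25 * i / FS) for i in range(80)]  # 0.25 Hz toy tone
f1 = band_power(tone, FS, 0.1, 0.2)
f2 = band_power(tone, FS, 0.2, 0.3)
f3 = band_power(tone, FS, 0.3, 0.4)
```

With the 0.25 Hz toy tone, essentially all of the power falls in the band corresponding to F2, while F1 and F3 remain close to zero.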

#### **3. Results and Discussion**

This section presents the results obtained in the experiment, broken down into two studies. In the first study, a series of statistical tests were carried out to determine whether statistically significant differences exist, for each of the temporal, morphological, statistical and frequency features described above, in the EDA signals processed for each of the music genres. The objective was to identify the variations in arousal depending on the music genre, as well as to specify which features can confirm a statistically significant difference.

The second study analyzed whether there is a clear correspondence between the responses given by the participants in the SAM activation questionnaire and the physiological EDA signals acquired while they listened to the music fragments. To this end, objective information on each of the EDA signal segments associated with each music genre was linked to the subjective response to the SAM questionnaire. Several classifiers were used to quantify whether there are differences between low and high arousal states. Our purpose was to check whether these classifiers can distinguish the states with good accuracy.

For the statistical analysis of both studies IBM SPSS Statistics version 23 was used. Please note that in all cases only a *p*-value < 0.05 was considered to be statistically significant.

#### *3.1. Direct Arousal Detection from Electrodermal Activity*

As mentioned before, a statistical study was first carried out to determine whether there are statistically significant differences for each of the selected features. It started by verifying whether the features obtained from the SCR driver signals satisfied the hypothesis of normality. This check determines whether a parametric or a non-parametric test can be used. In our case, all the features were found to meet this criterion with a *p*-value < 0.05. Therefore, we chose Student's *t*-test to determine whether statistically significant differences existed. For each of the musical genres, the comparison was made with the values obtained at the beginning of the experiment, corresponding to each participant's neutral state (no music played). Table 3 shows the mean and the standard deviation of each feature for the different musical genres, and Table 4 provides the *p*-value of each feature for every musical genre.
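The comparison against the neutral state can be sketched with a *t* statistic. The text does not say whether a paired or an independent-samples test was used. Since every participant contributes both a neutral and a genre value, the sketch below assumes a paired design. The function name and the numbers are hypothetical, and the *p*-value (which would be read from Student's *t* distribution with n − 1 degrees of freedom) is not computed.

```python
import math
import statistics


def paired_t_statistic(baseline, condition):
    """Return (t, degrees of freedom) for a paired comparison of two samples."""
    diffs = [c - b for b, c in zip(baseline, condition)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Toy values for one feature: neutral state vs. one musical genre.
neutral = [1.0, 2.0, 3.0, 4.0]
genre = [2.0, 4.0, 4.0, 7.0]
t_value, dof = paired_t_statistic(neutral, genre)
```

The statistic would be computed once per feature and per genre, which matches a layout with one *p*-value per feature and genre.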


**Table 3.** Mean and standard deviation for the different features.

**Table 4.** *p*-value for the different features.


Moreover, Figure 4 displays the statistically significant features for each of the musical genres. From the previous tables and figure, it can be observed that the musical genres with the most statistically significant differences, according to the features employed, are flamenco and Spanish folklore. In contrast, there are far fewer statistically significant differences for the Cuban and rock/jazz genres.

**Figure 4.** Statistically significant features for each of the musical genres according to their *p*-value.

Regarding the temporal features, M, SD and D2SD show significant differences for all four musical genres. Most other features also show statistically significant differences in two or three musical genres. Only for D1M and D2M is there no statistical evidence of a difference. In the group of morphological features, only AL shows significant differences for all four musical genres. AP presents significant differences in flamenco, Cuban and Spanish folklore, followed by EL, which does so only in Cuban and Spanish folklore. For RMS and IL no remarkable differences are found. Regarding the statistical features, there are significant differences for all musical genres in SK and MO. In contrast, for KU there are differences only in flamenco. Finally, among the frequency parameters, only F2 presents significant differences.

A plausible interpretation of the fact that more statistically significant differences are found in flamenco and Spanish folklore than in the Cuban and rock/jazz genres is provided next. Especially in the south of Spain, including the region of Murcia, flamenco was widely performed in the 60s and 70s, both in social life and in learning settings. Many orally transmitted songs with a flamenco influence exist in Spanish culture and have, over decades, been sung and clapped in groups. Moreover, flamenco became a marker of purely Spanish identity [49]. On the other hand, Spanish folklore, through its choirs and dances, understood not as isolated elements of each Spanish region but as musical bases common to the whole Spanish territory, was used for decades to strengthen the idea of unity of the homeland [50]. Moreover, the *pasodoble* style and especially the *Murcian jota*, as its name indicates, are deeply established in the region of Murcia.

On the other hand, in the 60s and 70s, and even earlier, foreign music, especially American music, was identified as the antithesis of Spanish music and as contrary to Spanish values and morality [51]. This led to the discrediting of these musical genres by the radio and the press. This was also the case, although to a lesser degree, for the Cuban genre. Finally, despite the media's aversion towards foreign music, mainly music in foreign languages, the number of fans of musical genres imported from the United States increased in the two great Spanish cities, Madrid and Barcelona. In small cities more rooted in traditional culture, such as those of the region of Murcia, these cultural manifestations took longer to arrive [51].

#### *3.2. Comparison of Arousal Detection and SAM Questionnaire Responses*

The second study introduced the use of classifiers to verify that the differences between the two states (low and high arousal) mentioned above do exist. The classifiers were required to analyze possible correlations between the objective detection of the arousal level from processed physiological EDA signals and the level of arousal subjectively perceived by participants when answering the SAM questionnaire.

Several well-known classifiers were used, grouped into trees, ensembles, regression, discriminant analysis, naïve Bayes, k-nearest neighbors (KNN) and support vector machines (SVM). In addition, several standard configurations were chosen [52–56]. More specifically, we used logistic regression and a linear discriminant classifier. For naïve Bayes, we tried both Gaussian and Bayes distributions. Three configurations were used for trees, namely fine tree (Gini criterion and 4 splits), medium tree (Gini criterion and 20 splits) and coarse tree (Gini criterion and 100 splits). The kinds of ensemble classifiers were boosted, bagged, RUS boosted and subspace KNN. The KNN configurations were fine (Euclidean distance and 2 neighbors), medium (Euclidean distance and 10 neighbors), coarse (Euclidean distance and 100 neighbors), cosine (angular distance and 10 neighbors) and weighted (Manhattan distance and 10 neighbors). Lastly, for SVM the following configurations were studied: linear (polynomial kernel, degree 1), quadratic (polynomial kernel, degree 2), cubic (polynomial kernel, degree 3) and Gaussian (radial basis function kernel), all of them with 10<sup>5</sup> iterations and MSE criterion.
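As an illustration of the distance-based configurations, a minimal KNN sketch with interchangeable Euclidean and cosine distances is given below. It is a plain majority-vote classifier written for clarity, not the toolbox implementation used in the study. The feature vectors, labels and the value of `k` are toy assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / norm


def knn_predict(train_x, train_y, query, k, distance):
    """Predict the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (two features each) labelled as low or high arousal.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 1.0], [1.0, 0.8], [0.95, 0.9]]
train_y = ["low", "low", "low", "high", "high", "high"]
label = knn_predict(train_x, train_y, [0.85, 0.9], k=3, distance=euclidean)
```

Passing `distance=cosine_distance` switches the same routine to an angular criterion, which compares the direction of the feature vectors rather than their magnitude.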

The established features were used as input parameters. As output we used the answers to the SAM activation questionnaires completed during the experiment. Thirty iterations were performed for each of the classifiers, obtaining the accuracy (and its standard deviation) shown in Table 5. The dataset was randomly split into 70% for training, 15% for testing and 15% for validation.
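The evaluation protocol described above can be sketched as a repeated random split. The helper names, the seed and the `evaluate` callback (a hypothetical function that trains a classifier on the given indices and returns its accuracy) are assumptions made for illustration. Only the 70/15/15 proportions and the thirty repetitions come from the text.

```python
import random
import statistics


def split_indices(n, rng, train=0.70, test=0.15):
    """Shuffle range(n) and split it into training, test and validation indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(round(train * n))
    n_test = int(round(test * n))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


def repeated_accuracy(n, evaluate, repetitions=30, seed=0):
    """Run `evaluate` on fresh random splits; return the mean and SD of its accuracy."""
    rng = random.Random(seed)
    scores = [evaluate(*split_indices(n, rng)) for _ in range(repetitions)]
    return statistics.mean(scores), statistics.stdev(scores)


# Example split for a hypothetical dataset of 40 samples.
train_idx, test_idx, val_idx = split_indices(40, random.Random(0))
```

Reporting the mean together with the standard deviation over the repetitions shows how much the accuracy depends on the particular random split.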


**Table 5.** Accuracy (%) of arousal assessment through different classifiers.

As a result, it can be seen that, among the tree classifiers, the medium tree classifies best for the flamenco and Spanish folklore genres, with 75 and 78%, respectively. In contrast, for the rock/jazz genre, none of the trees exceeds 50%, so the classification cannot be considered good enough. With the logistic regression classifier the results are between 60 and 67% for all music genres. One could argue that this is not a good classifier for this data set. For the linear discriminant, the best result was obtained for flamenco with 57%, which is not enough to accept it as a good classifier. Thus, this method of classification can be discarded. This is because this type of classifier works better with time series, whereas our proposal works on the extracted features [57].

Naïve Bayes only works well for the Gaussian configuration, with an accuracy of 70.6%, 71.1% and 70.6% for flamenco, Cuban and Spanish folklore, respectively, and slightly worse for rock/jazz with 69.2%. The results of the above classifiers are in line with other studies carried out in recent years [58,59]. As for the ensembles, the configuration that performs the best classification is the subspace KNN. It classifies the high versus low arousal states quite well for flamenco, Cuban and Spanish folklore, with 74.5%, 71.43% and 72.1%, respectively. For rock/jazz the one that works best is the RUS boosted configuration with 68.6% accuracy. The results are similar to those found in recent studies with EDA [60,61].

Among the KNN methods, the best classifier for flamenco is cosine KNN with an accuracy of 81.4%. For the remaining musical genres, the best is the medium configuration with an accuracy of 80.2, 81.5 and 76.09% for the Cuban, Spanish folklore and rock/jazz genres, respectively. Finally, for SVM the best classifier is the one with the radial basis function kernel, with 87.4, 81.4 and 83.1% accuracy for flamenco, Cuban and Spanish folklore, respectively. For the rock/jazz genre, however, the accuracy of this classifier reaches only 67.4%, which is not enough to conclude that it discriminates well between the two states (low and high arousal) [62–64].

As is known from previous preliminary studies [40], kernel-based classifiers (SVM) perform better than the others because they can handle a larger number of features. They are followed by distance-based classifiers of the KNN type, which are the next best for classifying this type of signal, as can be seen from the results (see Table 5).
