## *3.2. Sensors*

An EEG sensor produced by Muse [36] and a Grove GSR sensor produced by Seeed [37] were used in this study (Figure 2). The upper section of Figure 2 shows the EEG sensor, which has four electrodes. According to the manufacturer's instructions, the electrodes are located at positions FP1, FP2, TP9, and TP10 on the head [36]. The mapping of head locations was defined by Jasper [38] and is commonly used in neuroscience. The EEG data were captured with all four electrodes; however, only the data from FP1 and FP2 were used, because the TP9 and TP10 electrodes did not maintain good contact with the participants' heads during data collection, which made their output unstable. The EEG sensor captured raw EEG at a 220 Hz sampling rate and provided power spectral density (PSD) values for each electrode. According to the sensor manufacturer, the PSD data types include absolute band power (ABP), relative band power, and others [36], all sampled at 10 Hz. To calculate ABP, we applied the fast Fourier transform algorithm. We calculated PSD for the following EEG frequency bands [36]:


The lower section of Figure 2 shows the Grove GSR sensor used in this study, along with its finger band electrodes. These electrodes were attached to the index and middle fingers of the participants. The GSR sensor captures microvoltages (MV) between the fingers through the attached electrodes and calculates skin resistance (SR), in ohms, from the MV input. The sensor was connected to an Arduino Uno device, and the captured data were transmitted to a computer at a 192 Hz sampling rate. The formula for calculating SR from MV is provided by the sensor manufacturer [37] and is replicated in Equation (1):

$$SR = \frac{(1024 + 2 \cdot MV) \cdot 10{,}000}{512 - MV}. \tag{1}$$
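
Equation (1) can be evaluated directly in code. The following is a minimal Python sketch rather than the firmware or analysis code used in the study; the function name `skin_resistance_ohms` and the guard that rejects `mv >= 512` (where the denominator becomes zero or negative) are assumptions added for illustration.

```python
def skin_resistance_ohms(mv: float) -> float:
    """Evaluate Equation (1): SR = ((1024 + 2 * MV) * 10,000) / (512 - MV).

    `mv` is the MV reading described in the text. The range check below is
    an assumption of this sketch, not part of the manufacturer's formula:
    at mv == 512 the denominator is zero, and above 512 the result would
    be negative, which is not a meaningful resistance.
    """
    if mv >= 512:
        raise ValueError("MV must be below 512 for Equation (1) to be defined")
    return ((1024 + 2 * mv) * 10_000) / (512 - mv)


# Hypothetical readings, chosen only to exercise the formula.
example_readings = [0, 256, 384]
example_resistances = [skin_resistance_ohms(mv) for mv in example_readings]
```

Because the denominator shrinks as MV approaches 512, the computed resistance grows quickly for higher readings, so small changes in MV near that limit produce large changes in SR.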

**Figure 2.** (**a**) EEG sensor, and (**b**) GSR sensor.

## *3.3. Protocol*

Figure 3 presents the data collection protocol. In the introduction stage, the participants were shown a page containing the consent form for the experiment. The consent form stated that the collected data would be used only anonymously and for academic purposes. Furthermore, the participants were informed that they could stop the experiment at any time if they felt uncomfortable.

In the non-boredom video stage, a cinematic trailer of Blizzard's Starcraft 2: Heart of the Swarm and a Korean comedy video clip were used to evoke non-boredom. These video clips were chosen to entertain the participants so that they would not become bored. To evoke boredom, a looping video was played in which a small circle moved slowly along the boundary of a larger circle. Furthermore, to neutralize the emotional state of the participants, a cloud image from the International Affective Picture System [39] was shown for 30 s before each video stimulus. The participants were instructed that they could stop watching a video at any time they chose by pressing a button. Thus, the data length differed between participants, with the shortest watching time being 7.13 s.

The data collection consisted of two sessions. The sessions were identical except for the non-boredom video stimulus (i.e., the game trailer or the comedy show clip). The reason for carrying out data collection in two sessions was to get a content-independent classification result. In other words, we wanted to see whether changing the non-boredom video has an effect on the classification. After watching the video stimuli, the participants answered a questionnaire to measure the strength of the boredom that they felt. The questionnaire had two questions: one for the boredom video and another for the non-boredom video. Each question was answered on a 5-point scale ranging from "None" to "Very much."

**Figure 3.** Protocol of data collection.

#### **4. Machine Learning Methods**

In this section, we describe the feature extraction procedure and the machine learning techniques used to analyze the collected data and classify boredom.

#### *4.1. Window Size*

As explained in Section 3.3, the data length differed between participants because they were instructed to stop the video stimuli playback at a time of their choosing. The shortest data length was 7.13 s; thus, only the last 7 s of each participant's data were extracted to build the models. With a window size of 7 s, the number of samples was 56. Smaller window sizes, such as 1 s and 0.5 s, were also tested; however, they risked overfitting because each recording would then yield two or more samples with the same label.
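
The window extraction described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' implementation: the helper names `last_window` and `split_window` are hypothetical, and the recording is synthetic, using the 10 Hz rate reported for the band power data and the shortest reported duration of 7.13 s.

```python
def last_window(samples, rate_hz, window_s=7.0):
    """Return the final `window_s` seconds of a recording.

    `samples` is a sequence ordered in time and `rate_hz` is its sampling
    rate. Raises ValueError if the recording is shorter than the window.
    """
    n = int(round(rate_hz * window_s))
    if len(samples) < n:
        raise ValueError("recording is shorter than the requested window")
    return samples[-n:]


def split_window(window, parts):
    """Cut one window into `parts` equal, non-overlapping sub-windows.

    Every sub-window comes from the same recording and therefore carries
    the same label, which is the situation the text warns about.
    """
    size = len(window) // parts
    return [window[i * size:(i + 1) * size] for i in range(parts)]


# Synthetic recording: 7.13 s at 10 Hz gives 71 values (hypothetical data).
recording = list(range(71))

window_7s = last_window(recording, rate_hz=10)   # one 7 s sample
sub_windows_1s = split_window(window_7s, 7)      # seven 1 s samples
```

With the 7 s window, each recording contributes exactly one sample. Splitting the same window into 1 s pieces yields seven samples that share a source and a label, so a model evaluated on some of them may have already seen closely related data during training.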
