**5. Discussion**

#### *5.1. Visualization*

Visualization is a promising approach to finding evidence for how networks make decisions. Of the various visualization tools, we elected to examine the activations of the network's neurons to identify which ECG and RESP patterns they focused on. As Figure 4 shows, the neurons were activated around the ECG QRS and T waveforms. These patterns are specific to ECG data, and the network's convolutional layers were able to consider changes in their shape and amplitude. Likewise, Figure 5 shows that the network was able to process patterns extracted around the RESP peaks and troughs.

These findings indicate that our network can extract a more comprehensive range of features than simple handcrafted approaches, which rely on specific waveform landmarks (e.g., R-peaks), frequency-domain features, or time-domain features. This is possible because the network learned meaningful stress-related features directly from the data. From this point of view, we can understand why the network performed better than the conventional machine learning models (Table 4). We can therefore conclude that this deep learning approach is more promising than the previously proposed machine learning approaches.

#### *5.2. Comparison with Previous Studies*

Three studies [5,12,15] have proposed deep learning approaches to stress recognition. The structure of Deep ECG Net [12] was optimized using domain knowledge about the ECG PQRST waveforms, enabling it to achieve a high average accuracy of 80.7% on two different datasets and to perform better than conventional machine learning models. Consequently, we used this optimized network structure as the basis for our proposed DeepER Net. Next, because good experimental protocol design is important for obtaining reliable datasets and results, we adapted the well-designed protocol of Cho et al. [15] for use in our study. They proposed a low-cost thermal imaging-based stress detection method that extracts multiple spectrum images from thermal respiration images and then augments the data using a sliding window method. The resulting CNN achieved 84.6% accuracy in classifying two stress levels (binary classification). Finally, He et al. [5] proposed a deep CNN for detecting acute cognitive stress from 10-s ECGs. They used spectrum images extracted around ECG R-peaks as input and applied data augmentation. Their CNN achieved an average error rate of 17.3%, equivalent to an average accuracy of 82.7%.
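As a rough illustration of sliding-window augmentation in general, the sketch below cuts one long recording into overlapping segments. It is not the procedure used in [15]: the function name `sliding_windows` and the `window_size` and `stride` parameters are assumptions introduced here, and the values in the usage note are arbitrary.

```python
def sliding_windows(signal, window_size, stride):
    """Split a 1-D sequence into overlapping windows of equal length.

    Hypothetical helper for illustration only; window_size and stride are
    given in samples and are not taken from any of the cited studies.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(signal) - window_size
    # Windows that would run past the end of the signal are dropped.
    return [signal[start:start + window_size]
            for start in range(0, last_start + 1, stride)]
```

For example, `sliding_windows(list(range(10)), window_size=4, stride=2)` yields four windows that each overlap their neighbor by two samples, so a smaller stride produces more (but more strongly correlated) training segments from the same recording.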

In this study, we have proposed the first end-to-end deep neural network (DeepER Net) to recognize stress using multiple signals (ECG and RESP). Because we needed to consider two different signals, we developed a unique network structure that could extract features from both signals. The network achieved an average accuracy of 83.9%, which is comparable to the results achieved by the other proposed models [5,12,15], as summarized in Table 5.

For a fair comparison, it is useful to evaluate the models on a public dataset under the same training conditions and with the same evaluation method. We therefore validated the models using DRIVERDB [10], which includes ECG, RESP, and stress label information. The dataset [10] was collected over different driving sections (e.g., rest, city, and highway), each of which indicates a different stress level: the rest, city, and highway sections indicate low, high, and medium stress levels, respectively. Of the 17 drivers in the dataset [10], we considered only the 11 whose recordings contained clear markers [25]. The preprocessing, including noise filtering and clipping, was the same as that described in the Methods section. After preprocessing, we obtained 801 labeled segments including ECG, RESP, and the Lomb periodogram spectrum [5]. Finally, we replaced the last layer of each network with a softmax layer for classifying three classes (low, medium, and high) and then trained and evaluated the three networks with five-fold cross-validation on the segments. The proposed DeepER Net showed the highest average accuracy (83.0%), Deep ECG Net [12] showed an average accuracy of 75.0%, and the network of [5] showed an average accuracy of 38.5%, which may be owing to under-fitting caused by the small capacity of the network. This result suggests that the use of multiple physiological signals improves stress recognition performance. However, we suspect that performance may have degraded on the open dataset because several important hyper-parameters of the networks were optimized on the datasets for which they were originally developed, not on the open dataset. Thus, more open and reliable datasets need to be made available. The hyper-parameters, learning rule, and structure of the networks [5,12] are shown in Tables S1 and S2.
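The evaluation protocol on the public dataset (a three-class softmax output and five-fold cross-validation over 801 segments) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the names `softmax`, `k_fold_indices`, and `cross_validate`, the shuffling seed, and the `train_and_score` callback are all hypothetical, and the sketch assumes that folds are formed by shuffling individual segments.

```python
import math
import random


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: segments are shuffled individually; the seed is arbitrary.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Average the score returned by a user-supplied callback over k folds.

    train_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a fresh model on the training indices and returns its accuracy
    on the test indices.
    """
    scores = [train_and_score(train_idx, test_idx)
              for train_idx, test_idx in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

With 801 segments and `k=5`, this split produces one test fold of 161 segments and four of 160, and every segment is used for testing exactly once. The predicted class for a segment would be the index of the largest `softmax` output.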


**Table 5.** Comparison with state-of-the-art deep learning approaches using physiological signals for recognizing stress. Abbreviations: CNN, convolutional neural network; LSTM, long short-term memory.

Through visualization, we also identified the activation patterns produced by the ECG and RESP data and analyzed their meanings. Although previous studies have analyzed ECG activation patterns [5,12], ours is the first to analyze both the ECG and RESP activation patterns related to stress recognition, which we believe distinguishes it from previous work.

#### *5.3. Possibility of Personalized Models*

Although this study did not focus on personalized models that can adapt to individual stress responses, such models could be developed based on the proposed network. Because the last layer of DeepER Net applies a sigmoid function, the probability of stress is computed within a 0–1 range, and the model then makes a decision using the default threshold (0.5). Increasing the threshold would make the model stricter when determining stress states, while lowering the threshold would make it more lenient. This suggests that we could change the threshold based on individual stress responses, and hence develop personalized models. Alternatively, personalized models could be developed by fine-tuning the network on data from a single individual. Unlike with conventional machine learning approaches, there is no need to retrain the network from scratch, so it can be trained rapidly and can help avoid over-fitting issues.
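The threshold-based personalization idea can be illustrated with a short sketch. It is not part of the study: the function names, the candidate threshold grid, the use of accuracy as the selection criterion, and the example probabilities and labels below are all assumptions made for illustration.

```python
import math


def sigmoid(logit):
    """Map a raw network output to a stress probability in the range 0-1."""
    return 1.0 / (1.0 + math.exp(-logit))


def classify_stress(probability, threshold=0.5):
    """Return True (stressed) when the probability reaches the threshold."""
    return probability >= threshold


def pick_personal_threshold(probabilities, labels, candidates=None):
    """Pick the threshold with the highest accuracy on one person's data.

    Hypothetical calibration step: `probabilities` are sigmoid outputs for
    one individual and `labels` are that individual's true states (1 = stress).
    Ties are resolved in favor of the lowest candidate threshold.
    """
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_threshold, best_accuracy = 0.5, -1.0
    for t in candidates:
        correct = sum(classify_stress(p, t) == bool(y)
                      for p, y in zip(probabilities, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


# Made-up example: a person whose non-stress outputs sit just above 0.5.
probs = [0.55, 0.62, 0.58, 0.81, 0.90]
labels = [0, 0, 0, 1, 1]
threshold, accuracy = pick_personal_threshold(probs, labels)
# threshold is 0.65 with accuracy 1.0; the default 0.5 would give 0.4 here.
```

In this made-up case the default threshold labels every segment as stress, whereas a stricter personal threshold separates the two states. In practice the threshold would need to be chosen on held-out calibration data rather than on the data used for evaluation.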

#### *5.4. Multiple Physiological Datasets*

The main reasons for using multiple physiological datasets are as follows. First, a small number of subjects can cause over-fitting problems that reduce generalization performance. Such over-fitting issues can be overcome by increasing the amount of data (e.g., by involving more subjects or augmenting the data) or by using features based on other, independent types of data. Because increasing the number of subjects is difficult, extracting independent features can help to deal with over-fitting problems. In addition, each person's stress responses may vary slightly, leading to the problem of inter-subject variability, which lowers generalization performance for new subjects. Therefore, considering multiple types of stress-related data could help to reduce this problem.

However, using too many different types of data could reduce the stress recognition system's usability, because a variety of monitoring devices would have to be worn to collect all the different physiological signals, which would be burdensome in practice. Researchers should thus consider the trade-off between usability and performance.

#### *5.5. Limitations and Future Work*

Our study has two main limitations: the experimental setting and the use of a respiration monitoring device. Although the setting was intended to simulate a real workplace, the actual experiments were conducted in a more controlled manner because recruiting working subjects is difficult and an uncontrolled experimental setting would have reduced the quality of the data. Once we have established our model's validity, we plan to perform experiments in a real workplace setting. In this study, we used a chest strap-based wearable device to measure the physiological signals, but we are aware that such devices can be hard to wear in the workplace. We therefore plan to use a patch-type ECG device and a wearable device to measure RESP in a later study.
