#### **1. Introduction**

Mental health is increasingly recognized as an important issue in the workplace [1]. If mental stress is left unmanaged, employees can experience serious physical problems, such as heart disorders, diabetes, cancer, and stomachaches [2,3]. Stress also causes mental disorders such as depression and anger, and can even lead to suicide [2,4]. Such problems can seriously reduce productivity owing to absences and work disability [1], with the medical and socioeconomic costs in the United States adding up to \$300 billion annually [5]. Detecting and relieving stress in a timely manner could thus improve overall healthcare substantially.

Stress is typically evaluated using questionnaires: individuals complete instruments such as the Perceived Stress Scale (PSS) [6] and the Pittsburgh Sleep Quality Index (PSQI) [7], and healthcare professionals derive a stress score from the answers. Because these methods rely on expert evaluation, they are not suitable for continuously monitoring stress in the workplace. This limitation makes it difficult or impossible to recognize stress rapidly and intervene appropriately to help people suffering from it. Consequently, there is a growing need for ways to monitor stress continuously and objectively.

The autonomic nervous system comprises the sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS). When an individual is mentally stressed, the PNS activity decreases and the SNS starts to dominate. These neurological changes lead to physiological changes in heart rate (HR), skin conductance, respiration (RESP), and pupil diameter [2] that can be accurately measured by conventional biomedical instruments. Unfortunately, conventional instruments for measuring physiological signals are not optimal for continuous use in the workplace owing to their bulky size and associated cables. However, the recent advancement of wearable technology has made it practical to continuously measure various physiological signals with minimal disturbances, leading to increased research interest in continuous stress detection based on physiological signals.

Developing algorithms that analyze the collected data and accurately recognize occurrences of stress is as crucial as developing the monitoring devices themselves. Several machine learning models have been proposed to recognize stress based on multiple physiological signals [8–12]. Although these models have demonstrated the feasibility of recognizing stress, they share one serious limitation: conventional machine learning approaches require us to extract well-defined, handcrafted features and to find the best way to combine them, both of which are very challenging tasks [13]. Furthermore, because such approaches depend on handcrafted features, they cannot discover new stress-related features, which can limit their maximum generalization performance. Overcoming this limitation will require a breakthrough.

Recently, deep learning approaches have made great strides in image processing and natural language processing [14]. They not only extract features from data automatically but, owing to their hierarchical structure, also learn new high-level features from low-level ones, something that simple machine learning models cannot do. In particular, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have led to great successes in numerous fields. Owing to these advantages, attempts have also been made to use deep learning to recognize stress [5,12,15]. However, these studies have only considered one type of physiological signal. Because a single signal cannot capture all possible responses to stress, this may degrade generalization performance. In contrast, using multiple physiological signals can mitigate this degradation and account for the diversity of individual physiological characteristics. It is thus essential to study the validity and feasibility of deep learning approaches based on multiple physiological signals.

Our goal in this study is to propose an end-to-end deep neural network based on combining two types of physiological signals, namely electrocardiogram (ECG) and RESP data, which have been proposed as meaningful stress-related signals [10]. In addition, we compare the proposed network with conventional machine learning models and visualize the results to see the activation patterns produced by the ECG and RESP signals.

The remainder of this paper is organized as follows. Section 2 reviews the literature on machine learning approaches using multiple physiological signals and on deep learning approaches using one type of physiological signal. Section 3 describes our experimental protocol, the machine learning approach, and the procedure for developing the networks. Section 4 provides statistical results, evaluates our proposed network, and compares it with the benchmark machine learning models. Finally, Section 5 visualizes the activation patterns in our network, compares our study with previous ones that have proposed deep learning approaches, discusses the use of multiple datasets, and concludes the paper with potential limitations and future work.

#### **2. Related Works**

#### *2.1. Machine Learning Approaches*

Numerous studies have proposed machine learning approaches for recognizing mental stress based on various types of physiological signals [8,9,11,16]. Of these signals, ECGs and photoplethysmograms (PPGs) have been used to extract handcrafted features related to heart activity, such as the HR and HR variability (HRV). In addition to these, other physiological signals have been investigated, such as RESP, electrodermal activity (EDA), galvanic skin response (GSR), pupil diameter, acceleration, electroencephalograms (EEGs), electromyograms (EMGs), and electrooculograms. Once the physiological signals have been collected, developing such machine learning models involves the following main steps: (1) preprocess and de-noise the data with a digital noise filter; (2) extract well-defined features from the multiple physiological signals and find the best feature set; (3) use these features to train a machine learning model; and (4) evaluate the model on a test dataset.
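As an illustration of step (2), the following minimal sketch computes three commonly used time-domain heart-activity features (mean HR, SDNN, and RMSSD) from a list of RR intervals. The function name `hrv_features` and the sample RR values are hypothetical and are not taken from any of the cited studies; R-peak detection and filtering (step 1) are assumed to have been done already.

```python
import math

def hrv_features(rr_ms):
    """Return mean HR (bpm), SDNN (ms), and RMSSD (ms) from RR intervals in ms."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = sum(rr_ms) / len(rr_ms)
    # Mean heart rate: 60,000 ms per minute divided by the mean RR interval.
    mean_hr = 60000.0 / mean_rr
    # SDNN: sample standard deviation of the RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (len(rr_ms) - 1))
    # RMSSD: root mean square of successive RR differences.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_hr": mean_hr, "sdnn": sdnn, "rmssd": rmssd}

# Hypothetical RR intervals (ms), for illustration only.
rr = [800, 810, 790, 805, 795]
features = hrv_features(rr)
print(features)
```

In a full pipeline, a feature vector of this kind would be computed per time window and per signal, and then passed to the classifier in step (3).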

Siramprakas et al. [8] proposed a stress evaluation model using multiple physiological signals such as ECGs and GSR. In this study, signal data were collected under simulated conditions designed to replicate workplace stress. A support vector machine (SVM) was then trained and evaluated with either individual well-defined features or combinations of them. The model was able to recognize stress with greater than 90% accuracy, leading the authors to conclude that HR, HRV, and GSR features in the time and frequency domains were sufficient to accurately detect stress.

In addition to workplace stress, recognizing stress during driving has also been studied. Here, stress is considered to be a risk factor as it can cause aggressive driving behavior and reduced concentration [16]. In [16], the authors developed two main machine learning models, namely an SVM and a k-nearest neighbors (KNN) classifier, to identify three distinct stress levels (low, medium, and high). Using the Stress Recognition in Automobile Drivers dataset (DRIVERDB) [10] from PhysioNet, they obtained multiple physiological signals, including foot GSR, hand GSR, EMGs, ECGs, and RESP, and then extracted well-defined features. By finding the feature set that minimized the error rate, the SVM achieved 99% accuracy with a 5-min time window. Their analysis found that selecting the right model, preprocessing steps, and feature set all helped to maximize generalization performance.
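To make the KNN idea concrete, the sketch below classifies a feature vector into one of three stress levels by majority vote among its nearest training examples. It is a generic illustration, not the implementation used in [16]; the function `knn_predict`, the two features (mean HR and mean GSR), and all numeric values are hypothetical. In practice, features would normally be scaled before computing distances.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (mean HR in bpm, mean GSR in microsiemens) feature vectors.
train_x = [
    (62, 2.0), (65, 2.2), (60, 1.8),   # low stress
    (75, 4.0), (78, 4.5), (74, 3.8),   # medium stress
    (92, 7.0), (95, 7.5), (90, 6.8),   # high stress
]
train_y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

prediction = knn_predict(train_x, train_y, (76, 4.1), k=3)
print(prediction)
```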

Betti et al. [11] proposed a wearable physiological sensor system for monitoring stress. They conducted Maastricht Acute Stress Tests and collected multiple physiological signals, including ECGs, EDA, and EEGs. After training, the proposed SVM achieved 86.0% accuracy. The authors also found correlations between the handcrafted features and the measured cortisol level, which is regarded as a biomarker of stress. These correlations supported the feasibility of monitoring stress with the proposed wearable sensor system.

#### *2.2. Deep Learning Approaches*

Although deep learning approaches are heavily used in the image processing and natural language processing fields, and a few studies have used them to detect or recognize stress [5,12,15], no study has yet applied this approach to analyzing multiple signals. Researchers have developed deep neural networks using the following main steps: (1) process the physiological signals with a digital noise filter; (2) design a unique deep neural network based on domain knowledge; (3) train the network; and (4) evaluate it on a test dataset.

Cho et al. [15] proposed a promising approach to recognizing stress with a cheap thermal imaging camera. The collected thermal images of people breathing were preprocessed to create spectrum sequences, and then a CNN was used to extract features from these. To increase the number of data points, a sliding window method was used to augment the data. The proposed CNN achieved greater than 80% accuracy on average for classifying the images as stressed or unstressed. Their main contribution is that they were the first to use spectrum sequences taken from thermal images as input.
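The sliding window idea can be sketched as follows: overlapping segments are cut from one recording, so a single recording yields more training examples than non-overlapping segmentation would. The function name `sliding_windows` and the window and step sizes are illustrative assumptions, not the settings used in [15].

```python
def sliding_windows(signal, window, step):
    """Cut a sequence into fixed-length segments taken every `step` samples."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        signal[start:start + window]
        for start in range(0, len(signal) - window + 1, step)
    ]

samples = list(range(10))  # stand-in for a sampled physiological signal

# A step smaller than the window gives overlapping segments (more examples).
segments = sliding_windows(samples, window=4, step=2)
print(len(segments))   # 4 overlapping segments

# A step equal to the window gives non-overlapping segments (fewer examples).
print(len(sliding_windows(samples, window=4, step=4)))   # 2 segments
```

Because overlapping segments from the same recording are highly correlated, segments from one recording would usually be kept together on one side of the train/test split.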

Hwang et al. [12] presented the Deep ECG Net for recognizing stress based on short-term ECGs (10 s). The authors proposed a 1D CNN with optimized filter size and pooling length that used domain knowledge of ECG PQRST waveforms. The proposed model achieved better performance than conventional machine learning models. Visualizing the process showed that it detected spiky patterns around ECG P waves, meaning that it was able to automatically extract ECG waveform characteristics. Their network achieved about 80% accuracy on average in classifying the data as stressed or unstressed for their two experiments. Their main contribution is showing that optimized networks based on domain knowledge can provide better performance than conventional machine learning approaches and deep neural networks that are designed without domain knowledge.
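The two design parameters mentioned above, filter size and pooling length, can be illustrated with a single convolution, activation, and pooling stage. This is a minimal sketch in plain Python and not the Deep ECG Net architecture: the kernel is hand-picked to respond to a spike, whereas a real network learns its kernels from data, and the input trace is hypothetical.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the operation used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, pool):
    """Non-overlapping max pooling with pooling length `pool`."""
    return [
        max(values[i:i + pool])
        for i in range(0, len(values) - pool + 1, pool)
    ]

# Hypothetical trace with one sharp spike at index 3.
signal = [0.0, 0.1, 0.0, 1.0, 0.0, 0.1, 0.0, 0.0]
kernel = [-1.0, 2.0, -1.0]  # filter size 3; hand-picked, not learned

feature_map = relu(conv1d(signal, kernel))
pooled = max_pool1d(feature_map, pool=2)  # pooling length 2
print(pooled)
```

The filter size determines how many samples each output value sees, and the pooling length determines how strongly the feature map is downsampled, which is why both are natural candidates for tuning against the duration of waveform components.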

He et al. [5] developed a ten-layer CNN to detect acute cognitive stress based on short-term ECGs. Here, spectrum information was used to extract consecutive ECG R-peaks for use as input instead of raw ECG data. This study also used a sliding window method to increase the number of data points. Their results showed that the proposed CNN achieved a lower error rate than conventional machine learning models. In addition, the authors found that the meaningful information related to stress lay in the 0.4–20 Hz range by visualizing the activation maps of multiple layers. This demonstrates that deep learning approaches can benefit from having data-driven features that are not used by conventional machine learning approaches.
