**1. Introduction**

Stress, an ever-growing issue in modern societies, has become an inseparable part of people's fast-paced daily lives. Continuously increasing workloads, tight deadlines, and the resulting time pressure all contribute to rising stress levels. Stress is an organism's reaction to a stressor. In a stressful state, control systems in the human body, such as the autonomic nervous system (ANS), act largely unconsciously to regulate bodily functions in response to stress. This mechanism has been refined throughout human evolution to produce prompt, adaptive reactions in life-threatening situations [1]. Stress symptoms can be measured and observed in numerous ways. The sympathetic nervous system (SNS) initiates the stress reaction, which manifests in psychological, physiological, and behavioral indications [1]. The bidirectional influence of the mind on the body, and vice versa, is among the major hallmarks of the ANS, which has evolved to play a direct role in human life and survival [2]. The ANS is divided into the SNS and the parasympathetic nervous system (PNS). Most studies investigating the effects of chronic psychological stress on the human body have concluded that, while the individual is under psychological stress, the SNS becomes overactivated and the PNS underactivated. The resulting abnormal activity of the two systems causes physical, behavioral, and affective irregularities. To regulate physiological arousal states, a balance of activity is expected between the sympathetic and parasympathetic branches of the ANS.

ANS function can be measured and evaluated through non-invasive physiological phenomena such as electrodermal activity (EDA) and heart rate variability (HRV). For example, the high-frequency (HF) component of HRV, one of its frequency-domain characteristics, is an indicator of vagus nerve and PNS activity, whereas the low-frequency (LF) component reflects SNS activity [3–5]. The goal underlying almost all stress detection research is to find ways to notify users about their stress levels and help them manage stress, thereby avoiding its social, economic, and health-related consequences.
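To make the LF and HF measures concrete, the sketch below estimates them from a list of RR intervals. This is an illustrative implementation, not the method used in this paper: the 4 Hz resampling rate, Welch parameters, and band edges (0.04–0.15 Hz for LF, 0.15–0.40 Hz for HF) follow common HRV conventions and are assumptions.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch


def lf_hf_ratio(rr_ms, fs=4.0):
    """Estimate LF power, HF power, and their ratio from RR intervals (ms).

    The irregularly sampled RR series is resampled to an evenly spaced
    grid, its power spectral density is estimated with Welch's method,
    and power is integrated over the conventional LF (0.04-0.15 Hz) and
    HF (0.15-0.40 Hz) bands.
    """
    rr_ms = np.asarray(rr_ms, dtype=float)
    # Beat timestamps (s) are the cumulative sum of the RR intervals.
    t = np.cumsum(rr_ms) / 1000.0
    # Resample onto a uniform grid so an FFT-based PSD estimate is valid.
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs)
    rr_uniform = interp1d(t, rr_ms, kind="cubic")(t_uniform)
    f, psd = welch(rr_uniform - rr_uniform.mean(), fs=fs,
                   nperseg=min(256, len(rr_uniform)))
    df = f[1] - f[0]
    # Approximate band power as the sum of PSD bins times the bin width.
    lf = psd[(f >= 0.04) & (f < 0.15)].sum() * df
    hf = psd[(f >= 0.15) & (f < 0.40)].sum() * df
    return lf, hf, lf / hf
```

A series dominated by a slow (e.g., 0.1 Hz) oscillation will yield a ratio well above one, consistent with the interpretation of LF dominance as relatively higher sympathetic activity.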

Traditional approaches to measuring stress rely either on a psychologist interviewing the subject or on asking the subject to fill in questionnaires designed explicitly for self-reporting. Interviews require the constant presence of a trained psychology specialist. Asking subjects to fill out long questionnaires and self-report diaries is the most widely adopted way to evaluate stress, and these techniques remain the current gold standard. However, they are cumbersome and entirely manual. There are further concerns regarding interviews, self-report diaries, and questionnaires that stem from how different subjects commonly behave. Because these reports are based on, or influenced by, personal feelings, tastes, and opinions, they are highly subjective. For example, some people tend to respond to questions in a manner they believe will be viewed favorably by others [6]. Some responders may feel that sharing their private psychological feelings would embarrass them, and they hide their real feelings by understating them; conversely, there are numerous cases in which responses are exaggerated [7]. Gender differences also play an essential role in how men and women express and report their stress levels and affective states when confronted with various stressors [8]. Nevertheless, for everyday life, self-reports remain the closest available labels to the ground truth.

Any practical automated stress detection framework for daily life must remove the need for intervention from a psychologist and make the whole process less burdensome. Because of the disadvantages of self-report questionnaires, research has emerged that detects psychophysiological signs of stress with machine learning (ML) algorithms, utilizing reliable and proven indicators such as the response activity of the sympathetic nervous system, skin temperature (ST), electrodermal activity (EDA), and the electrocardiogram (ECG) [9]. The purpose of applying ML methods to raw psychophysiological data is to extract meaningful emotional and affective information. In this process, raw sensor data are collected and transformed into information-bearing features. Subjective data are also recorded, in the form of the known context and/or daily questionnaires and ecological momentary assessments (EMAs), and some of these are used to assign affective state labels. Finally, machine learning algorithms are trained to recognize different behavioral and emotional states using the EMAs, questionnaires, and known context as labels, and the system is tested on the remaining data and on any data recorded in the future. Although some of these steps are not necessarily required for deep learning models, the overall process is nearly the same for almost all traditional machine learning systems [10]. In an automated stress detection scheme, analysis of these psychophysiological signals reveals the frequency and intensity of the stress experienced by the subjects.
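As an illustration of this pipeline, the sketch below windows a raw signal into fixed-length segments, reduces each window to simple statistical features, and trains a classifier against binary labels standing in for self-reported stress. The signal values, window length, and feature choices here are all hypothetical, chosen only to show the shape of the process rather than any particular study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def window_features(signal, win=120):
    """Slice a raw 1-D signal into non-overlapping windows and reduce each
    window to simple statistics (mean, std, min, max, linear slope)."""
    n = len(signal) // win
    feats = []
    for i in range(n):
        w = signal[i * win:(i + 1) * win]
        slope = np.polyfit(np.arange(win), w, 1)[0]
        feats.append([w.mean(), w.std(), w.min(), w.max(), slope])
    return np.array(feats)


# Hypothetical example: synthetic "EDA" traces for relaxed vs. stressed periods.
rng = np.random.default_rng(0)
relaxed = rng.normal(2.0, 0.1, 6000)                               # flat baseline
stressed = rng.normal(2.0, 0.1, 6000) + np.linspace(0, 1.5, 6000)  # rising level

X = np.vstack([window_features(relaxed), window_features(stressed)])
y = np.array([0] * 50 + [1] * 50)  # labels standing in for self-reports

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

In a real system, the labels would come from EMAs or questionnaires rather than construction, and evaluation would use held-out subjects or sessions rather than the training windows.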

Preliminary studies of automated stress detection were conducted in laboratory environments; research then moved into daily life when researchers realized that the stress experienced in the laboratory differs from daily life stress [11]. However, the problems encountered in this new setting were more complicated and intricate. Achieving precise annotation and identification of perceived stress in the wild is difficult because of the high diversity in human psychological evaluation and the lack of direct observation of subject activities. Furthermore, the unrestricted movements of subjects in the wild may lead to misinterpretation by introducing artifacts into the signal data. Lastly, since medical-grade devices with cables, electrodes, and boards cannot be used in daily life due to their obtrusive nature, more pervasive and comfortable devices are required. Alternatives such as smart bands and smartwatches, however, deliver lower data quality, so more advanced signal processing techniques must be applied to compensate. Because of these issues, the performance of daily life stress detection systems is lower than that of systems proposed for laboratory environments. Smartwatches and smart bands are unobtrusive, easy to use without requiring specific actions, and well suited to daily life. They are widely adopted by consumers and easily available on the market, and most are equipped with photoplethysmography (PPG) sensors [12]. Our solution is easily applicable to consumers thanks to the availability of these devices.

Given the complexity of data acquisition performed solely in the wild, and the advantages of in-lab data, the question arises whether these two approaches can be combined so that the high accuracy of the former benefits the latter. One of the most prominent open questions is the feasibility of combining these two systems to achieve even better results. In this study, we present a framework for designing a stress detection mechanism that utilizes both. Data collected in a laboratory can be used for the long-term performance evaluation of the same system in daily life scenarios.

Our work contributes to state-of-the-art in four different aspects:


These contributions will provide insights for researchers into how to improve the performance of daily life stress detection systems.

The rest of the paper is organized as follows. Section 2 presents related work on stress detection and alleviation. Our proposed unobtrusive system for stress level monitoring with smart bands, along with the data collection procedures, is described in Section 3. Experimental results and discussion are presented in Section 4. We present the conclusions of the study in Section 5.

#### **2. Related Work**

Most automatic stress detection studies in the literature were conducted either in laboratory environments or in restricted daily life settings. We group the studies in the literature into five classes. The first class develops a laboratory model with known context labels and tests it in the same environment. Since the stressor levels (i.e., the context participants are in) are known at all times in laboratory experiments, they can be used as ground truth labels for machine learning (ML) models; we call this type laboratory-to-laboratory known context (LLKC) models. The second type uses self-report labels collected in laboratory environments and tests the created model in the same environment; we call this type laboratory-to-laboratory self-report (LLSR) models. The third type uses self-report questionnaires collected in the wild and tests the model in the same environment; we name this type daily-to-daily self-report (DDSR) models. Since we cannot monitor the everyday life of participants and obtain the ground truth at all times, a known context does not exist in daily life environments. Laboratory data can, however, be used to enhance daily life stress detection models. If the known context labels from the laboratory are used for ML model development, we call this fourth type laboratory-to-daily known context (LDKC) models. If, instead, the self-reports from the laboratory are used as labels and the developed models are tested in the wild, we name this fifth type laboratory-to-daily self-report (LDSR) models. Table 1 lists studies conducted in the laboratory, in daily life, or both. In this section, we briefly outline research on stress detection conducted in laboratory and everyday life environments using the taxonomy developed above.
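The five model types can be summarized by three attributes: where the training data come from, which label source is used, and where the trained model is evaluated. The sketch below is our own illustrative encoding of the taxonomy (not code from any of the cited studies); the attribute names are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelType:
    """One entry of the stress detection model taxonomy."""
    name: str          # abbreviation used in the text
    train_env: str     # where the training data are collected
    label_source: str  # ground-truth source used for the training labels
    test_env: str      # where the trained model is evaluated


TAXONOMY = [
    ModelType("LLKC", "laboratory", "known context", "laboratory"),
    ModelType("LLSR", "laboratory", "self-report", "laboratory"),
    ModelType("DDSR", "daily life", "self-report", "daily life"),
    ModelType("LDKC", "laboratory", "known context", "daily life"),
    ModelType("LDSR", "laboratory", "self-report", "daily life"),
]
```

The encoding makes the structure of the taxonomy visible at a glance: the last two types are exactly the first two with the test environment moved from the laboratory to daily life.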



*Table 1 abbreviations.* DDSR: Daily-to-daily self-report. LDKC: Laboratory-to-daily known context. LDSR: Laboratory-to-daily self-report.

*Sensors* **2020**, *20*, 838


Early experimental practice in the laboratory provided the first building blocks of this research field. These preliminary studies gave researchers a basis for choosing the most appropriate devices, features, and machine learning algorithms for later use in everyday life settings. Facing a trade-off between unobtrusiveness and signal accuracy, researchers chose device types with different sensing technologies based on the requirements of their study. Several researchers demonstrated that wearable devices equipped with PPG sensors offered more comfort and convenience than ECG devices for HRV measurement [28–30]. Nevertheless, single-lead electrocardiogram (ECG) devices are becoming more compact, easier to wear, and more commercially available over time. In a recent study, Billeci et al. used single-lead ECG to study the autonomic nervous system response by monitoring heart rate and HRV features [31]. In [14], Zubair et al. developed a smart wristband with EDA, Bluetooth, and accelerometer sensors to detect the stress levels of test subjects. In their scheme, EDA and the accelerometer were used together to enhance detection accuracy, and their experiments yielded 91% accuracy for two-class stressed-relaxed classification. Castaldo et al. conducted a laboratory study to investigate the effect of ultra-short, two-minute HRV recordings on stress detection accuracy [15]. By applying a support vector machine (SVM) to the HRV features, they achieved 70% accuracy on their binary classification problem. Although 70% seems low compared to other studies from the same year, they speculated that this could be due to the stress induction methods used and the ultra-short-term HRV features. Tivatansakul et al. used HRV features inferred from ECG signals, together with facial expressions, to recognize negative emotions in the laboratory. Subjects were exposed to disturbing images from the International Affective Picture System (IAPS) for four minutes; applying an SVM classifier to the extracted features yielded 83% accuracy for negative emotion detection [20]. All of the in-lab cases mentioned above used only one biosignal, either EDA or ECG, in their proposed stress detection mechanism. Other studies employed multiple biosignals to achieve even higher accuracy. In a recent stress detection and alleviation study, Akmandor et al. [21] recorded the EDA, ECG, blood pressure, blood oxygen, and respiration rates of participants. Memory games and IAPS were used to induce stress, and for the alleviation part they applied various mitigation techniques such as classical music and micro-meditation. Their classification accuracy with SVM and kNN on their binary classification problem was 95.8% [21]. Researchers have used combinations of medical-grade devices to reach the highest possible accuracies in laboratory settings; for instance, electroencephalogram (EEG) signals have been used for stress monitoring [17]. Implementing such a setting in the wild is practically impossible due to the obtrusive nature of EEG devices and the lack of wearable EEG devices suitable for daily life.

In other cases, researchers chose to mitigate the trade-off between accuracy and unobtrusiveness and conducted their studies, either in the lab or in the wild, with unobtrusive wearable devices. The choice of device type (i.e., whether or not to use medical-grade obtrusive devices with cables, electrodes, and boards) is important because it determines the applicability of the system to daily life, which is the main and final goal of all stress detection studies. In some studies, the data were collected, and the models trained and tested, in the wild [22–25] (the DDSR model type in our taxonomy). Ciman et al. [22] detected stress by analyzing smartphone usage behavior. In a controlled laboratory experiment using an Android app with search and typing tasks designed to induce stress, participants' tap, swipe, scroll, and text input gestures were recorded. They obtained approximately 80% stress detection accuracy with SVM, NN, kNN, and decision tree classifiers. In the in-the-wild experiment, the user's physical activity, screen light values, and screen-related events were recorded, and classification accuracy was about 70% with the same classifiers. Vildjounaite et al. [24] used mobile phone usage data and applied maximum posterior marginal (MPM) estimation and hidden Markov models (HMMs) for stress recognition. They experimented with 28 subjects over four days and obtained a maximum of 68% accuracy on their semi-personal data. Mishra et al. [25] detected stress by analyzing heart activity signals captured with a Moto360 smartwatch; the data were collected from 23 subjects over three days. They added the activity type to the physiological features to increase the accuracy of the system, used random forest (RF) to classify stressed and relaxed sessions, and improved their F1 score from 0.50 to 0.76 by adding the activity-related contextual information. A recent daily life study was carried out with 1002 participants using unobtrusive wearables [32]. The authors extracted heart rate variability features and employed an RF classifier, but achieved only a 0.43 F1 score (with three classes); they listed the additional difficulties of working in the wild to explain the relatively low performance [32]. Overall, the performance of daily life stress monitoring systems is lower than that of studies conducted in controlled environments due to issues such as low-quality physiological signals with artifacts and the difficulty of collecting ground truth [33,34] (see Table 1).
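A common thread in these daily life studies is that adding activity-type context to physiological features improves classification. The sketch below shows one plausible way to do this: one-hot encoding a per-window activity label and concatenating it to the physiological feature vector. The feature dimensions, activity classes, and labels are synthetic and hypothetical, not taken from any cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Hypothetical per-window data: 5 physiological features plus an activity label.
phys = rng.normal(size=(300, 5))                       # e.g. HRV-derived features
activity = rng.choice(["sitting", "walking", "running"], size=300)
# Toy stress label that depends on both physiology and activity context.
y = ((phys[:, 0] + (activity == "running")) > 0.5).astype(int)

# One-hot encode the activity type and append it to the physiological features.
cats = np.array(["sitting", "walking", "running"])
onehot = (activity[:, None] == cats[None, :]).astype(float)
X_aug = np.hstack([phys, onehot])                      # 5 + 3 = 8 features/window

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_aug, y)
```

Because the label here genuinely depends on the activity, a classifier given the augmented features can exploit context that is invisible in the physiological features alone, mirroring the gains reported by Mishra et al. and Gjoreski et al.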

Data collected in laboratory experiments could be used to develop models for the wild and thereby improve the reported low accuracies. Only a limited number of studies have used laboratory data to create models with higher performance in daily life. Li et al. [27] built their model in the lab, verified its performance in the lab with different participants, and later verified it again in a field test with new subjects. They used mental arithmetic tasks to induce stress and seven-point Likert-scale self-reports as labels in the laboratory (the LDSR model type), and evaluated their system with elastic-net regression. The correlations between predicted labels and ground truth self-reports were 0.72 and 0.56 for the laboratory and daily-life settings, respectively. In another study, Gjoreski et al. proposed a continuous stress detection scheme consisting of a base detector built solely on laboratory data, an activity recognizer for identifying contextual information, and a real-life stress detection mechanism that combined the laboratory detector's output with the context information [26]. While training the model, they used only the context labels from the laboratory environment (the LDKC model type) and did not collect self-reports during these tests. The classification accuracy of their stress detection system without context information was 76% [26]; they increased it to 92% by adding the activity type. Since the methods in our taxonomy have been tried on different datasets, each created for a particular paper (see Table 1), one cannot infer the superiority of one method over another. There is therefore a clear need to compare the proposed stress detection techniques in laboratory and daily life settings with unobtrusive devices.
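The evaluation style used by Li et al., training a regressor on laboratory labels and then correlating its predictions with self-reports in each setting, can be sketched as follows. The data here are synthetic, with heavier noise injected into the "daily life" set to mimic in-the-wild conditions; the feature dimensionality and regularization strength are arbitrary assumptions, not parameters from the original study.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)

# Hypothetical HRV-like features: a clean "laboratory" set and a noisier
# "daily life" set drawn from the same underlying relationship.
w = rng.normal(size=8)                              # hidden true coefficients
X_lab = rng.normal(size=(200, 8))
y_lab = X_lab @ w + rng.normal(0.0, 0.5, 200)       # Likert-style stress score
X_daily = rng.normal(size=(100, 8))
y_daily = X_daily @ w + rng.normal(0.0, 2.0, 100)   # heavier noise in the wild

# Train on laboratory labels only, then correlate predictions with
# self-reports in each setting (the LDSR evaluation pattern).
model = ElasticNet(alpha=0.1).fit(X_lab, y_lab)
r_lab, _ = pearsonr(model.predict(X_lab), y_lab)
r_daily, _ = pearsonr(model.predict(X_daily), y_daily)
```

As in the reported results (0.72 in the lab vs. 0.56 in the field), the correlation degrades when the lab-trained model is applied to the noisier daily-life data, even though the underlying relationship between features and stress is unchanged.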
