3. Discussion
The physiological stress model reported in this paper still leaves considerable room for improvement. As detailed in Section 4, we used a simple linear model for regression and classification, and purposely did not perform feature selection. Such optimization is our next step in developing a human stress model, but it requires collecting data from a larger set of participants and gathering more information about their psychological state and the environmental factors that may influence the recorded physiological signals. From that perspective, the feature set could be optimized both by enrichment (considering other kinds of signal properties) and by reduction (performing efficient feature selection). Additionally, more complex models should be tested, as they might provide more accurate and robust estimates.
Still, our approach yielded some interesting results. For instance, the two train examples in Figure 1 illustrate the influence of ambulatory conditions on model performance: as soon as the participant stopped working on his laptop, the stress level assessed by our model decreased, along with the RMSEs with respect to the reported stress levels, and all samples were correctly labelled as no-stress data. This phenomenon could be monitored over time, as shown in Figure 1c,d. Even though the variability in physiological signals is intrinsically taken into account by our model through the feature set, notably with features based on standard deviation, real-life conditions will still decrease its accuracy. Moreover, it should be noted that the participant’s stress levels are unlikely to have remained constant throughout each condition, as assumed in this preliminary study: in our approach, we took as the ground truth the mean stress level reported by the user over a given period.
Regarding ground truth, determining which kinds of events are most suitable for training a stress detection model is not a trivial question. In an end-user setting, soliciting user intervention as little as possible is a key issue: to calibrate a reliable stress model, an efficient trade-off has to be found between user acceptance and the algorithm’s requirements regarding the collection of information on affective states and environmental factors. In our experimental design, a compromise thus had to be found between, on the one hand, full awareness of environmental factors and the participant’s feelings and, on the other hand, a coarse ground truth related to specific events. We hence designed our grey-box model on the basis of three stressful conditions and one control condition (the latter being non-stressful but still active), delineated in retrospect by the user himself.
The three self-reported periods of moderate-to-high stress occurred in different circumstances. Varying the participant’s environment and mental or physical activity across the stress conditions was a prerequisite for building a general model whose accuracy would extend to any new stressful situation. Furthermore, adding an event that was not related to the trip itself (e.g., giving a speech during a meeting) makes it possible to assess the ubiquitous nature of the psychological state being quantified, given that stress is not specific to travelling. This could be seen during the cross-validation with Trial N°3 (Table 1), where the “speech” event was used as the validation dataset: the model, trained on the other stressful episodes taken from the journey, was successfully applied to the oral presentation period, with similar RMSEs for the learning and validation steps.
However, a main limitation of this study lies in the small number of available events for the no-stress condition: this likely explains the perfect classification rate shown in Table 1. Indeed, unusual environmental factors might have strongly affected one or more peripheral measurements during the no-stress period, which would have led to overfitting. In this case, our algorithm would be detecting these influential factors rather than the user’s psychological stress. This hypothesis is supported by the higher RMSEs obtained during the test phase (Table 2) than during the validation steps (Table 1): on new no-stress events, the model proved less accurate with respect to the participant’s reported stress levels. However, our algorithm still correctly classified all three test events as non-stressful episodes.
In a nutshell, we succeeded in recovering the stressful (or non-stressful) nature of episodes that took place during a medium-length trip, using a linear combination of statistical features extracted from the raw physiological data. This model has been implemented in a smartphone application, allowing real-time estimation of the user’s stress levels with a minimally invasive setup: a smart wristband connected to the user’s own phone. Our purpose here was to present preliminary results and to highlight the potential of such a tool for the future of mass transit. In the Bonvoyage H2020 framework, for instance, the digital platform will automatically infer a user profile based on the stress assessed over time in different transport modes during previous travels or commutes. This profiling will be used to propose tailored solutions to users in order to improve their on-board experience. Real-time monitoring of passengers’ stress also aims at providing on-trip assistance to deal with excessive stress during any travel, by suggesting an alternative trip solution. On a larger scale, statistical data on passengers’ feelings about means of transportation could be gathered by crowdsourcing, which would bring service providers closer to consumers’ wellbeing. At the individual level, this can be an efficient way to raise people’s awareness of their own living standards.
4. Materials and Methods
The following paragraphs describe in further detail the materials and methods mentioned in Section 2.
4.1. Ethics Approval, Consent, Availability of Data
The experiment described in this article is part of an experimental campaign approved by the Ethics Committee in Non-Interventional Research (CERNI) related to COMUE Université Grenoble-Alpes, and in accordance with the Declaration of Helsinki. Following a standard inclusion procedure, the participant provided written consent to participate, and agreement—subject to full anonymity—for publication of results based on collected psychological and physiological data. However, the raw data acquired throughout this experiment will not be made available for public use, in accordance with a confidentiality agreement between the participant and the research team.
4.2. Physiological Recording
The Empatica™ E4 wearable wristband was used to record three physiological signals, namely blood volume pulse (BVP), skin conductance (SC) and skin temperature, together with 3-axis accelerometer data. This device can be used in two separate ways: either in storage mode (data are stored in the wristband’s local memory) or in streaming mode (transmission over a Bluetooth® connection to a suitable smartphone application). Storage mode was used during the learning phase, and streaming mode during the test phase. The BVP signal is Empatica’s proprietary version of pulse oximetry: by measuring the oxygen saturation in blood vessels at the surface of the skin, it provides an indirect measure of heart rate. In the Empatica™ E4, BVP is recorded at 64 Hz, SC at 4 Hz, 3-axis acceleration at 32 Hz and skin temperature at 4 Hz. The wristband automatically infers interbeat intervals (IBIs) when the data quality is high enough. Its battery and memory are sufficient for more than one day of continuous recording. The device is also equipped with a trigger button, which allows the participant to mark significant events and the experimenter to locate them easily in time.
4.3. Preliminary Signal Processing
The SC signal was processed in two different ways. The first is a light smoothing, performed by applying a 2nd-order low-pass Butterworth filter with a cut-off frequency of 0.2 Hz. This removes spiky artefacts from the signal, such as pressure waves or zeros caused by momentary contact losses between the electrodes and the skin, while preserving the main skin conductance responses (SCRs). The second is a heavier smoothing, also performed with a 2nd-order low-pass Butterworth filter, with a cut-off frequency of 0.05 Hz: the remaining information corresponds to the skin conductance level (SCL). SCL is the main tonic component of SC: it is related to the current physical properties of the skin and to the individual’s general arousal level. SCRs consist of transient rises in skin conductance that are potentially triggered by emotional events, followed by a progressive return to SCL; they represent the main phasic component of SC. They are the part of interest in the first processed signal, which includes both tonic and phasic information.
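The two smoothing steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a causal filter (which would suit real-time use, although the text does not say whether filtering was causal or zero-phase), and the function names `lowpass` and `process_sc` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS_SC = 4.0  # E4 skin conductance sampling rate in Hz (from the text)

def lowpass(signal, cutoff_hz, fs=FS_SC, order=2):
    """Causal low-pass Butterworth filter (assumption: causal, not zero-phase)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfilt(sos, signal)

def process_sc(sc):
    """Return the lightly smoothed signal (tonic + phasic) and the SCL estimate."""
    smoothed = lowpass(sc, 0.2)   # attenuates spiky artefacts, keeps SCRs
    scl = lowpass(sc, 0.05)       # keeps only the slow tonic level
    return smoothed, scl

# Synthetic example: constant 2 microsiemens level with one contact-loss zero
sc = np.full(480, 2.0)
sc[200] = 0.0
smoothed, scl = process_sc(sc)
```

In this synthetic case the single zero sample is strongly attenuated by the 0.2 Hz filter and almost removed by the 0.05 Hz filter. The first few seconds of output show a start-up transient because the filter state is initialised at zero.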
The IBI signal is derived from BVP by an Empatica™ proprietary algorithm that performs real-time recognition of the BVP waves. It is not regularly sampled, and the poor signal-to-noise ratio of BVP in ambulatory conditions makes wave recognition difficult, sometimes resulting in a sparse IBI signal. The HR signal is inversely proportional to the IBI.
Only the Euclidean norm was considered for the 3-axis accelerometer data. No post-processing was applied to the skin temperature signal: raw data were used.
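As a small illustration of these two derived signals, the sketch below assumes IBIs expressed in seconds and heart rate in beats per minute (the text only states inverse proportionality); the function names are hypothetical.

```python
import numpy as np

def heart_rate_bpm(ibi_s):
    """Instantaneous heart rate (beats per minute) from IBIs in seconds (assumed units)."""
    return 60.0 / np.asarray(ibi_s, dtype=float)

def acc_norm(acc_xyz):
    """Euclidean norm of each 3-axis accelerometer sample (one row per sample)."""
    return np.linalg.norm(np.asarray(acc_xyz, dtype=float), axis=1)

hr = heart_rate_bpm([1.0, 0.8, 0.75])
norm = acc_norm([[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]])
```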
4.4. Feature Computation
Following this first step, 24 statistical features, listed in Table 3, were computed from the processed signals. Each of them was calculated over successive 60-s time windows, with no overlap. They were selected on the basis of previous work on experimental databases [12,13] and for their suitability for real-time implementation.
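The windowing scheme can be sketched as below. The mean and standard deviation are used purely as example features (the full list is the one given in Table 3), and dropping an incomplete trailing window is an assumption of this sketch.

```python
import numpy as np

def window_features(signal, fs, window_s=60):
    """Mean and standard deviation over successive non-overlapping windows."""
    x = np.asarray(signal, dtype=float)
    n = int(fs * window_s)               # samples per window
    n_windows = len(x) // n              # incomplete trailing window is dropped
    windows = x[: n_windows * n].reshape(n_windows, n)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

# Example: 3.5 minutes of a slowly rising signal sampled at 4 Hz
temp = np.linspace(30.0, 33.5, int(3.5 * 60 * 4))
features = window_features(temp, fs=4)   # one row per complete 60-s window
```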
Regarding electrodermal activity, four features are computed from the local maxima of the SC curve (SCRs). The prior filtering allows minimal constraints to be set on the detection parameters: no minimum time delay between two peaks, and a prominence threshold of 10⁻³ μS. For a given peak, the prominence is defined as its height relative to the lowest point located between this peak and the nearest higher peak (for more detailed information, see [14]). The peak width is defined as the number of samples within this peak whose ordinate exceeds the half-prominence level.
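A prominence-based peak detection of this kind can be sketched with SciPy's `find_peaks`. This is an approximation of the procedure described above, not necessarily the tool the authors used: SciPy returns an interpolated width at half prominence rather than a count of samples, and the function name `scr_peaks` is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

def scr_peaks(sc_smoothed, min_prominence=1e-3):
    """Detect SCR peaks with a prominence threshold (in microsiemens)."""
    peaks, props = find_peaks(
        sc_smoothed,
        prominence=min_prominence,  # threshold from the text
        width=0,                    # request widths without constraining them
        rel_height=0.5,             # width measured at half prominence
    )
    return peaks, props["prominences"], props["widths"]

# Synthetic window: flat baseline with two Gaussian-shaped responses
t = np.arange(240) / 4.0  # 60 s at 4 Hz
sc = 2.0 + 0.3 * np.exp(-((t - 15) ** 2) / 4) + 0.1 * np.exp(-((t - 40) ** 2) / 4)
peaks, prominences, widths = scr_peaks(sc)
```

No `distance` argument is passed, which matches the absence of a minimum delay between two peaks.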
Regarding cardiac activity, given that the IBI signal is not regularly sampled, the frequency-domain features were derived from its Lomb-Scargle power spectral density estimate on each time window. When no IBI data could be found, the time window was discarded from the whole analysis. After feature extraction, the cleaned feature matrix included 84 non-stress and 137 stress examples, which represent 78.4% of the initial database.
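The sketch below shows how a Lomb-Scargle periodogram can be computed directly on unevenly sampled IBIs. The band limits (0.04 to 0.15 Hz and 0.15 to 0.40 Hz) are the conventional low- and high-frequency heart rate variability bands and are an assumption here, since the exact frequency-domain features are not specified in this section; the function name `band_powers` is hypothetical.

```python
import numpy as np
from scipy.signal import lombscargle

def band_powers(beat_times_s, ibi_s, lf=(0.04, 0.15), hf=(0.15, 0.40)):
    """Lomb-Scargle periodogram of an unevenly sampled IBI series, summed per band."""
    t = np.asarray(beat_times_s, dtype=float)
    y = np.asarray(ibi_s, dtype=float)
    freqs_hz = np.linspace(lf[0], hf[1], 200)
    # scipy expects angular frequencies; the mean is removed beforehand
    pgram = lombscargle(t, y - y.mean(), 2 * np.pi * freqs_hz)
    lf_power = pgram[(freqs_hz >= lf[0]) & (freqs_hz < lf[1])].sum()
    hf_power = pgram[(freqs_hz >= hf[0]) & (freqs_hz <= hf[1])].sum()
    return freqs_hz, pgram, lf_power, hf_power

# Synthetic 60-s window: IBIs modulated at 0.25 Hz (assumed respiratory-like rhythm)
times, ibis, t_now = [], [], 0.0
while t_now < 60.0:
    ibi = 0.8 + 0.05 * np.sin(2 * np.pi * 0.25 * t_now)
    t_now += ibi
    times.append(t_now)
    ibis.append(ibi)
freqs_hz, pgram, lf_power, hf_power = band_powers(times, ibis)
```

With this synthetic input, the periodogram peaks near the 0.25 Hz modulation, so most of the power falls in the upper band.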
Since real-time computations were intended, no artefact rejection was performed on the remaining dataset. Together with the expected poor signal-to-noise ratio in ambulatory conditions, this motivated the use of the whole feature set to fully take advantage of information content from all the recorded signals.
4.5. Linear Regression Model
The final stress model was designed as a weighted sum of the previous statistical features. The predicted stress level s over one time window t is expressed in Equation (3), where F is the number of features, each of which, f_k, is associated with a weight β_k that quantifies its linear relationship with s.
To estimate the full weight vector β, a linear regression was performed using the least-squares method on the feature matrix extracted from the learning database, i.e., the one recorded during the participant’s first trip.
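Read literally, the weighted sum described above is s(t) = β_1 f_1(t) + ... + β_F f_F(t). The sketch below fits such a model by ordinary least squares on synthetic data; it assumes no intercept term (one could be added by appending a column of ones to the feature matrix), and the data, function names and noise level are illustrative only. The matrix size (221 windows, 24 features) mirrors the figures given in Section 4.4.

```python
import numpy as np

def fit_weights(features, stress):
    """Least-squares estimate of the weight vector beta (no intercept assumed)."""
    beta, *_ = np.linalg.lstsq(features, stress, rcond=None)
    return beta

def predict(features, beta):
    """Weighted sum of the features for each time window."""
    return features @ beta

def rmse(predicted, reported):
    """Root-mean-square error between predicted and reported stress levels."""
    return float(np.sqrt(np.mean((predicted - reported) ** 2)))

# Synthetic stand-in for a learning database: 221 windows, 24 features
rng = np.random.default_rng(0)
X = rng.normal(size=(221, 24))
true_beta = rng.normal(size=24)
y = X @ true_beta + rng.normal(scale=0.1, size=221)

beta = fit_weights(X, y)
train_rmse = rmse(predict(X, beta), y)
```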
4.6. From Learning to Test Phase
The previous model is the one discussed throughout Section 3. Following the validation stage, the results of which are displayed in Table 1, the final β weight vector was estimated using the whole learning database. All the steps required for stress estimation (preliminary processing, feature computation and combination within the linear model) were then implemented in a custom Android application able to communicate with the E4 wristband in streaming mode. The overall framework (wristband, smartphone and application) was able to record the same user’s physiological signals and estimate his stress levels every minute in real time. This setup was put to the test during the journey from Grenoble to Brussels. The model outputs are those displayed in Figure 1.
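The minute-by-minute estimation loop can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration in Python (the actual application runs on Android and combines 24 features from several signals): it buffers a single 4 Hz stream, uses only two example features, and the class name and weights are made up for the example.

```python
import numpy as np

class MinuteStressEstimator:
    """Buffers streamed samples and emits one estimate per full 60-s window."""

    def __init__(self, beta, fs=4.0, window_s=60):
        self.beta = np.asarray(beta, dtype=float)
        self.n = int(fs * window_s)   # samples per window
        self.buffer = []

    def push(self, sample):
        """Add one sample; return an estimate when a window completes, else None."""
        self.buffer.append(float(sample))
        if len(self.buffer) < self.n:
            return None
        window = np.array(self.buffer)
        self.buffer = []              # non-overlapping windows
        features = np.array([window.mean(), window.std()])  # toy feature set
        return float(features @ self.beta)

# Two minutes of a constant signal with made-up weights
estimator = MinuteStressEstimator(beta=[0.5, 2.0])
outputs = [estimator.push(2.0) for _ in range(480)]
estimates = [o for o in outputs if o is not None]
```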