1. Introduction
Driving a vehicle is, in general, a complex task that requires specific skills, especially when it is not performed in a standard environment, such as a road or highway, or when the vehicle itself is not a standard one, such as a car or a motorbike. Moreover, in working environments, the difficulty increases due both to the many distracting factors and to the use of more complex vehicles, such as earth-moving vehicles (EMV). To limit the risks in these situations, specialized personnel must be carefully instructed in the use of EMVs, and the vehicles can be equipped with specific safety systems that help in hazardous situations. The International Organization for Standardization (ISO) has defined functional safety standards (e.g., ISO 19014-1) for EMV [
1], such as the Articulated Dump Truck (ADT). According to ISO 19014-1, the concept of controllability is related to the ability of the driver to avoid harm being caused to people through the timely reactions of the operator, possibly with the support of alternative controls [
1].
Because the off-road environment in which EMVs are deployed (e.g., mining environments or quarry sites) is not uniform, situational awareness (i.e., the perception and understanding of information that enables an individual to predict future courses of action needed to respond to a dynamic environment) is very important [
2]. Georg et al. define the term readiness as the driver’s ability to engage in driving starting from a non-driving-related task (NDRT), such as tuning the radio volume or having a mobile phone conversation [
3]. Many models have been proposed for monitoring driver status based on monitoring the driver’s behavioral or physiological signals [
4]. In more detail, these methods (especially those based on physiology) often rely on the use of a single technology. However, the use of a combined set of parameters, as proposed in this paper, can provide higher performance in terms of detection accuracy while maintaining low intrusiveness and ensuring usability. Moreover, while in the case of dynamic tasks it is generally more appropriate to use kinematic sensors to detect different behaviors [
5,
6], it has been widely shown that the use of physiological parameters in specific tasks can be a valuable indicator for subjects’ adaptation to new conditions [
7]. In fact, as observed in [
8], to achieve higher controllability, the smart vehicle should be equipped with sensors for monitoring the drivers’ situational awareness. In the literature, the link between alertness and mental stress has been assessed by measuring human physiological features [
2,
9]. To this aim, muscle activity, heartbeat, sweating, respiration rate, speech, and blood pressure have been exploited to estimate the user’s acute stress [
10,
11,
12].
In this paper, we analyze a specific situation in which the state of the driver may change several times during operations. In a previous work, we proposed a physiology-based human situational awareness model for road vehicles [
8]. In this work, the model is tested and validated on an ADT simulation driving environment. The tests were divided into three phases: (1) driver identification, (2) awareness definition, and (3) voice interaction.
In the first phase, the driver is identified and an enrollment procedure is performed. In this phase, the user’s physiological signals are collected under resting or non-stressful driving conditions. Data can be stored locally to take advantage of edge-based computing architectures.
Next, changes in drivers’ physiological signals, such as heart rate variability (HRV), galvanic skin response (GSR) and speech, are monitored to estimate the driver’s state of stress or drowsiness. These data can be used to inform the driver and the vehicle managing system about the current awareness condition. In the third phase, speech analysis is performed through voice interaction between the vehicle managing system and the driver, and data are used to support the situation awareness condition.
To complete the model definition, a vehicle task demand phase could be added. In this phase, the computational cost necessary for a semi-autonomous vehicle to avoid a potential danger, or the cognitive engagement overload of the driver during a hazardous situation, must be analyzed and assessed [
13].
The proposed assessment system is validated through an off-road driving environment. The validation of the model in [
8] was limited to whether the selected sensors could be used to acquire physiological signals in a potential stress situation, that is, completing the Stroop test and some arithmetic calculations.
Here, we go further by implementing the whole system in a complex scenario and validating it in an ADT simulator driving environment. The validation is based on real-time non-invasive acquisition and processing of the heart activity and galvanic skin response signals, together with speech data.
The rest of the paper is organized as follows. In
Section 2, we recall essential background information. In
Section 3, we describe the materials and the methods used in the performed experiments. In
Section 3.4, we describe the performed analysis. In
Section 4, we present and discuss the experimental analysis. Finally, in
Section 5, conclusions are drawn.
3. Materials and Methods
To collect the signals used for evaluating the driver status, a specific setup has been studied, aimed at minimizing the invasiveness of the sensors while they are used during EMV driving simulations. The adopted solutions can be easily integrated in a driving environment by including sensors in the steering wheel or in the dashboard of the vehicle. Moreover, a specific experimental protocol has been designed and implemented to validate the use of the adopted sensors and methodology.
In this section, the data acquisition sensors and the experimental protocol adopted for validating the system are described.
3.1. Data Acquisition Devices
In this study, subjective experiments were conducted in order to collect the physiological and the voice signals by using a commercial sensor kit (i.e., e-Health platform) interfaced with a shield connected to an Arduino Uno board. The e-Health sensor shield allows the acquisition of nine biomedical signals. For our purposes, only two signals have been acquired: ECG and GSR [28]. Since we are interested in identifying the QRS peaks from the ECG signal, the signal was band-limited using a low-pass anti-aliasing filter with a cut-off frequency of 100 Hz, and data were recorded at 325 Hz. To guarantee proper synchronization between the signal recordings, the same setup was used for the GSR signal. The disposable ECG sensor electrodes (
Figure 2) were placed on the body in the classical lead configuration considering the Right Arm (RA), the Left Arm (LA), and the Left Leg (LL). The skin conductance was measured at the base of two fingers (see
Figure 3) of the left hand by measuring the electrical current that flowed as a result of applying a constant voltage. A preliminary recording was made of the ECG and GSR signals under resting conditions to obtain a baseline signal used during processing as a reference for the signals recorded in the different minutes of the tests.
To collect a homogeneous voice signal among participants, drivers were asked during the tests to explain each action they performed while driving (e.g., “I want to increase the gas pedal because I have a good road to drive” or “I want to take left because this track is not comfortable”, etc.). Meanwhile, they were asked to provide dashboard information on demand every minute (such as km, engine rpm, loaded vehicle information, etc.). To record the voice signal, a sampling rate of 22 kHz has been set. This rate was selected because it helps in achieving higher accuracy on the acquired signal. To standardize all voice recordings, an external microphone was placed at a distance of 5 cm from the volunteer’s mouth. To synchronize the acquisitions from both the e-Health sensors and the microphone, data were recorded using a custom interface developed in LabVIEW (National Instruments).
3.2. Driving Simulator
To induce acute stress in the driver, in these experiments we placed the driver in challenging virtual environments subject to a time constraint. In more detail, challenging situations are those that demand the driver’s complete attention to avoid physical or mental harm. The ADTS machine works with two Degrees of Freedom (DOF), which can cause physical palpitations for the driver while the machine runs on a deep muddy slope, marshland, etc. (see
Figure 4).
In such a situation, the driver may feel as if he or she is falling out of the simulator even when the seat belt is fastened. Similarly, the driver’s prolonged presence in a virtual environment can cause mental discomfort such as vomiting, headaches, etc. The presence of acute stress can be analyzed through ANS-controlled physiological signals, i.e., ECG, GSR, and speech. The experiment was divided into three stages; each stage was designed with some challenging virtual environment situations with the aim of inducing acute stress (as shown in
Table 1). The interval between each experiment was 3 min. Our hypothesis is that HRV, GSR, and speech data, properly combined, are good indicators for estimating the subject status variation. If the HRV power and GSR levels are high, compared to the normal ones, then the subject might be under acute stress; similarly, low levels could indicate drowsiness. Additionally, the HRV power and GSR levels also pave the road to a hypothetical situation (
Figure 1), and the use of speech helps in giving a significant interpretation to the driver status, for better estimation.
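The hypothesis above can be illustrated with a minimal decision rule. This is only a sketch under stated assumptions: the function name `classify_state`, the inputs expressed as ratios to the resting baseline, and the thresholds `high` and `low` are hypothetical and are not values taken from this study.

```python
def classify_state(hrv_ratio, gsr_ratio, high=1.2, low=0.8):
    """Label the driver state from baseline-relative HRV power and GSR level.

    hrv_ratio and gsr_ratio are measured values divided by the resting
    baseline. The thresholds `high` and `low` are illustrative assumptions.
    """
    if hrv_ratio > high and gsr_ratio > high:
        return "acute stress"      # both indicators above the baseline
    if hrv_ratio < low and gsr_ratio < low:
        return "drowsiness"        # both indicators below the baseline
    if low <= hrv_ratio <= high and low <= gsr_ratio <= high:
        return "normal"            # both indicators close to the baseline
    return "undetermined"          # indicators disagree


if __name__ == "__main__":
    print(classify_state(1.5, 1.4))
    print(classify_state(1.5, 0.6))
```

The "undetermined" outcome marks the cases in which the two indicators disagree, which is where an additional cue such as speech could support the interpretation.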
3.3. Experimental Protocol
Nine male volunteers, recruited among students of Mälardalen University, Västerås, Sweden, were monitored. Each volunteer held a driving license with a minimum of two years of driving experience. During the test, volunteers demonstrated their ability to use the driving simulator to conduct EMV simulations in the virtual environment after only a short preliminary familiarization test (without data recording). An informed consent form was signed by each volunteer. The following driving instructions were provided: description of the setup, shifting procedures, and availability of information in virtual reality (e.g., vehicle speed, vehicle load, stopwatch for completing the activity, and overview of the route map). In addition, safety instructions regarding fastening the seat belt and precautions concerning the electrodes and the microphone were provided to the volunteers. An ADTS driving practice phase was allowed to enable the volunteer to become accustomed to the driving system. The experiment was conducted in a room with a temperature of about 20–22 °C.
To generate acute stress in the volunteer, three experiments were designed, each structured in three parts of different engagement and difficulty (
Table 1).
Experiment 1: In this experiment, the driver is asked to drive the unloaded ADTS on the assigned track. This activity must be completed in 5 min, which requires the user to drive fast. However, due to the size of the vehicle, it is necessary to maintain control of the vehicle to avoid accidents (e.g., collision with vehicles or obstacles, or vehicle rollover).
In the following, the peculiarities of the path are described:
During the 1st minute, the driver runs the vehicle in a smooth landscape.
During the 2nd minute, the road becomes muddy and bumpy.
At the end of the muddy road, the driver reaches a muddy deep slope (around the 3rd minute).
Driving in these road conditions requires more control of the ADTS. Reaching the end of the muddy slope, the driver must turn right 180° and drive along a flat stretch of road, approximately during the 4th minute.
Later, approximately during the 5th minute, the user drives the ADTS on a marshland road.
The interval between the 1st and the 2nd experiment is set to 5 min.
Experiment 2: In this simulation, the user drives the ADTS in the same mining field used in experiment 1. However, before proceeding, the driver must load the ADTS. The time limit to complete the simulation is 5 min.
In the 1st minute, the driver must reverse the ADTS on a narrow path to reach the position set for the loading procedure. Once the ADTS is correctly positioned, a sound is generated and the excavator proceeds to load the vehicle.
During the next few minutes, the user drives the fully loaded ADTS (40 tons) on a muddy road. The ADTS reaches the muddy road during the 3rd minute of the experiment.
Thereafter, the flat road is reached during the 4th minute, and then the deep steep muddy road during the 5th minute of the experiment.
The interval between the 2nd and the 3rd experiment is set to 3 min.
Experiment 3: The driver drives the ADTS in the same mining field but during the night. Both the duration and the execution of the simulation follow those of the previous simulation. Experiment 3 ends with the driver driving into a swamp during the 5th minute.
3.4. Data Analysis
3.4.1. Feature Extraction
The data have been analyzed with a 60 s frame window. This length allows an effective update of the drivers’ psychological condition [
29,
30]. The growth rate was used to compare the results obtained in the different minutes of the tests with the preliminary registration of ECG and GSR signals performed in resting conditions, used as baseline. The growth rate has been calculated as the ratio between the measured parameter in each minute and the baseline one.
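As an illustration, the growth rate described above can be sketched as follows. The function name `growth_rate` and the numeric values in the usage example are hypothetical and are not data from the experiments.

```python
def growth_rate(per_minute_values, baseline):
    """Ratio between the parameter measured in each minute and the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [value / baseline for value in per_minute_values]


if __name__ == "__main__":
    # Hypothetical LF/HF values for five minutes of a test and a resting baseline.
    print(growth_rate([2.0, 3.0, 1.0, 4.0, 2.5], baseline=2.0))
```

A value above 1 indicates an increase with respect to the resting condition, and a value below 1 indicates a decrease.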
HRV Analysis
The HRV analysis consists in evaluating the natural irregularity of the heart rhythm, which can be achieved by extracting a time signal derived from the ECG and evaluating its frequency features. In the time domain analysis, the RR intervals (i.e., the distance between two consecutive R peaks of the QRS complex of the cardiac cycle), the mean heart rate (mean HR), and the standard deviation of the RR intervals are extracted from the acquired ECG signals using a specific algorithm [
21]. Then, a spectral analysis of the RR series is conducted to evaluate ANS functioning, considering different frequency intervals:
Low frequency (LF), defined in the range 0.04–0.15 Hz, represents the component mainly associated with the involvement of the SNS;
High frequency (HF), defined in the range 0.15–0.4 Hz, represents the component mainly associated with the involvement of the PNS.
A useful parameter to evaluate the change from the rest to the attention status is the HRV power, which is defined as the LF/HF ratio and is used as an index of autonomic balance.
In this work, each set of data has been segmented with a 1 min time window. Salahuddin et al. [29] claimed that ultra-short-term analysis in the spectral domain is feasible for features such as HF, LF, and LF/HF with 50 s of data. In addition, Takahshi et al. [30] showed that, for the spectral analysis of the above-mentioned HF and LF components, the results obtained from ECG recordings of 30 s to 1 min are still comparable to those of a 5 min ECG recording.
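The HRV features described in this subsection can be sketched in a few lines, starting from the R-peak times of a 1 min window. This is not the algorithm of [21]: the R-peak detection is assumed to be already done, and the 4 Hz linear resampling and the plain periodogram used for the band powers are assumptions made only for illustration. All function names are hypothetical.

```python
import cmath
import math


def rr_intervals(r_peak_times):
    """RR intervals (s) from consecutive R-peak times (s)."""
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]


def time_domain(rr):
    """Mean heart rate (bpm) and standard deviation of the RR intervals (s)."""
    mean_rr = sum(rr) / len(rr)
    sd_rr = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / (len(rr) - 1))
    return 60.0 / mean_rr, sd_rr


def resample(times, values, fs):
    """Linear interpolation of (times, values) on a uniform grid at fs Hz."""
    out, j = [], 0
    n = int((times[-1] - times[0]) * fs) + 1
    for k in range(n):
        t = times[0] + k / fs
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def band_power(signal, fs, f_lo, f_hi):
    """Sum of (unnormalized) periodogram values for bins with f_lo <= f < f_hi."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC component
    power = 0.0
    for k in range(1, n // 2 + 1):
        if f_lo <= k * fs / n < f_hi:
            c = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            power += abs(c) ** 2
    return power


def lf_hf_ratio(r_peak_times, fs=4.0):
    """HRV power as LF (0.04-0.15 Hz) over HF (0.15-0.4 Hz)."""
    rr = rr_intervals(r_peak_times)
    series = resample(r_peak_times[1:], rr, fs)  # each RR is placed at its closing peak
    return band_power(series, fs, 0.04, 0.15) / band_power(series, fs, 0.15, 0.40)
```

With a 60 s window the frequency resolution is about 0.017 Hz, so the LF band contains only a few spectral bins; this is the reason why the feasibility of ultra-short-term analysis discussed above matters.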
GSR Analysis
GSR is used for measuring sympathetic activation related to acute stress. As the skin conductance varies, the GSR can detect small changes [
31]. The signal is processed according to the scheme reported in
Figure 5.
For a fast estimation of sympathetic activity, each raw GSR signal is time-windowed for 60 s. Then, the signal is filtered using a moving mean technique with a window size of 400 samples. Finally, the filtered signal is normalized in the range $[0, 1]$ as follows:

$$\hat{x}_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x = (x_1, \ldots, x_n)$ is the filtered data and $\hat{x}_i$ is the $i$-th normalized data [32,33]. The increase in the average value compared to the norm confirms the involvement of the SNS. Conversely, the decrease in value demonstrates the induction of the PNS.
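The GSR processing chain can be sketched as follows. Min-max scaling to the range [0, 1] is assumed here as the normalization, and the handling of the first samples (a shorter trailing window) is also an assumption; the function names are hypothetical.

```python
def moving_mean(signal, window=400):
    """Trailing moving mean; the first samples use a shorter window (assumption)."""
    out = []
    running = 0.0
    for i, value in enumerate(signal):
        running += value
        if i >= window:
            running -= signal[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def min_max_normalize(x):
    """Scale the filtered data to the range [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]  # constant signal: no variation to scale
    return [(value - lo) / (hi - lo) for value in x]


def process_gsr(raw_window):
    """Filter and normalize one 60 s window of raw GSR samples."""
    return min_max_normalize(moving_mean(raw_window))
```

The average of the output of `process_gsr` for each minute can then be compared with the average obtained for the resting recording.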
Speech Analysis
To obtain estimations over different time intervals, the speech signal, which was sampled at 22 kHz, is divided into 5 s time windows. Next, the silent zones are removed using an amplitude threshold of 0.01. To determine whether the signal is voiced or unvoiced, the pitch value is computed. To this aim, the resulting speech signal is further segmented into frames by using a Hamming window with a size of 660 samples (30 ms) and an overlap of 110 samples (5 ms). After windowing and computing the autocorrelation over the range of lags, the peak is located. The pitch value is computed as the sampling frequency divided by the lag of the first positive peak. The periodicity of the signal waveform and the uniform delay between the peaks of the autocorrelation function indicate whether the incoming speech signal is voiced or unvoiced. In addition, for the voiced signal, the pitch (or fundamental frequency) is searched in the range [70–450] Hz. Similarly, for the spectral slope estimation, the voiced signal is isolated for the fundamental frequency range (i.e., 70–450 Hz). The unvoiced segments were excluded from our analysis.
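A minimal sketch of the framing and of the autocorrelation-based pitch estimation is given below. As a simplification, the sketch takes the largest autocorrelation value within the lag range corresponding to 70–450 Hz rather than explicitly locating the first positive peak, and it returns `None` when no positive peak is found; these choices and the function names are assumptions.

```python
import math


def split_frames(signal, size=660, overlap=110):
    """Split the signal into frames of `size` samples sharing `overlap` samples."""
    hop = size - overlap
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def hamming(n):
    """Hamming window of n samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def estimate_pitch(frame, fs=22000, f_min=70.0, f_max=450.0):
    """Pitch (Hz) of one frame, or None if no positive autocorrelation peak exists."""
    x = [s * w for s, w in zip(frame, hamming(len(frame)))]
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(x) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag if best_lag else None


if __name__ == "__main__":
    # Synthetic 200 Hz tone, one 30 ms frame at 22 kHz.
    tone = [math.sin(2 * math.pi * 200 * i / 22000) for i in range(660)]
    print(estimate_pitch(tone))
```

Because the window tapers the frame, the autocorrelation peak can fall about one sample before the true period, so the estimate for a pure tone is close to, but not exactly, the nominal frequency.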
Air pressure generated by the lungs, which initially drives the vocal folds into vibration, causes the glottal pulsation that excites the speech production mechanism. This motivates analyzing the role of the glottal flow in the speech signal. Inverse filtering is a widely used technique for estimating glottal excitation, which still suffers from problems such as: (i) the result of inverse filtering depends on the quality of the input signal, with a long closed phase of glottal source, (ii) while the vocal tract identification is manually performed, the estimation of the glottal source depends on subjective criteria, and (iii) the results also depend on the quality of input signal [34].
To overcome these issues, in order to estimate the glottal source signal, we adopt the Iterative Adaptive Inverse Filtering (IAIF) method [34], using linear prediction for the estimation of the vocal tract response. The vocal tract effects are removed using IAIF in an iterative manner to achieve an appropriate estimation of the glottal source signal. To obtain the spectral envelope for the voiced speech and its glottal waveform, Linear Predictive Coding (LPC) estimation was performed. LPC estimation is used to construct the LPC coefficients for the inverse transfer function of the vocal tract [35]. For a better estimation of the LPC coefficients, the input signal should be stationary/voiced; this greatly reduces the error signals (i.e., of the LPC analysis) [
35]. Similarly, the size of the LPC coefficient vector depends on the order, which is computed by (
/(2*
)). The magnitude spectrum envelope is computed using the Fast Fourier Transform (FFT). Finally, to obtain the spectral envelope, linear regression is applied to the spectral tilt line. The non-parametric Probability Density Function (PDF) histogram was computed for the input time-windowed signal. The number of bins used in the histogram is 20. The same procedure was repeated for all voiced segments and their glottal waveforms. To perform a statistical comparison among the non-normally distributed data, the Wilcoxon signed-rank test was used [
36]. The glottal and voiced spectral slopes of the neutral signal are compared to the spectral slopes of the varied voiced signal and its glottal signal. The individual difference of the distributions is considered statistically significant if p < 0.10. Thus, we obtain two sets of data, i.e., the p-values for the voiced and the glottal signals.
Table 2, Table 3 and Table 4 are obtained from the average between the p-values of the voiced and glottal signals. The data population in Table 2, Table 3 and Table 4 is not uniform because, when the involvement of the driver increases, his/her speech may be limited unless the speech is integrated within the experimental protocol [8]. Since the population of voice data is not uniform, it was not possible to perform a statistical comparison similar to that of the other physiological data (e.g., HRV and GSR).
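Two steps of the speech analysis described in this subsection, the linear regression used for the spectral slope and the 20-bin histogram used as a non-parametric PDF estimate, can be sketched as follows. The inputs (frequencies and magnitudes of an already computed spectral envelope) and the function names are assumptions; the IAIF, LPC, and Wilcoxon steps are not reproduced here.

```python
def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def histogram_pdf(values, bins=20):
    """Histogram normalized so that the bar areas sum to 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / (len(values) * width) for c in counts]


if __name__ == "__main__":
    # Hypothetical envelope: magnitude (dB) falling by 2 dB per kHz.
    freqs_khz = [0.5, 1.0, 1.5, 2.0]
    mags_db = [-1.0, -2.0, -3.0, -4.0]
    print(linear_fit(freqs_khz, mags_db))
```

The slope returned by `linear_fit` plays the role of the spectral slope of one segment; the distributions of such slopes for the neutral and the varied recordings are the quantities that the statistical test compares.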
5. Conclusions and Future Work
In this paper, an experimental evaluation of a system based on physiological signals to evaluate the driver status of workers using EMV is presented. The analysis aims at the possible definition of a physiology-based situation awareness model for readiness. The results obtained from different driving situations are promising with regard to discriminating between opposite driver states while driving under normal or challenging conditions.
For the experimental validation, physiological signals such as ECG, GSR, and speech data were collected from nine participants in driving experiments with increasing levels of complexity. Experimental results show that the selected physiological signals can be used for assessing the driver status.
From the preliminary analysis performed in this study, some useful insights can be drawn. As previously mentioned, the selected physiological signals can be used to assess, as a first approximation, the state of the driver. Thus, the proposed framework can be used as input to a monitoring and alerting system. However, in order to design an effective monitoring system, characterized by the missed-detection and false-alarm probabilities required by the specific application scenario, further and more accurate testing campaigns are needed. To this aim, future investigations include the use of more complex voice interactions with the system, the use of a sensorized steering wheel for GSR data acquisition, a seat belt equipped with ECG sensors, and posture analysis modules. Aspects of psychophysiology will also be considered.