**Supporting Drivers of Partially Automated Cars through an Adaptive Digital In-Car Tutor**

**Anika Boelhouwer <sup>1,\*</sup>, Arie Paul van den Beukel <sup>2</sup>, Mascha C. van der Voort <sup>2</sup>, Willem B. Verwey <sup>3</sup> and Marieke H. Martens <sup>4,5</sup>**


Received: 28 February 2020; Accepted: 27 March 2020; Published: 30 March 2020

**Abstract:** Drivers struggle to understand how, and when, to safely use their cars' complex automated functions. Training is necessary but costly and time consuming. A Digital In-Car Tutor (DIT) is proposed to support drivers in learning about, and trying out, their car automation during regular drives. During this driving simulator study, we investigated the effects of a DIT prototype on appropriate automation use and take-over quality. The study had three sessions, each containing multiple driving scenarios. Participants needed to use the automation when they thought that it was safe, and turn it off if it was not. The control group read an information brochure before driving, while the experiment group received the DIT during the first driving session. DIT users showed more correct automation use and a better take-over quality during the first driving session. The DIT especially reduced inappropriate reliance behaviour throughout all sessions. Users of the DIT did show some under-trust during the last driving session. Overall, the concept of a DIT shows potential as a low-cost and time-saving solution for safe guided learning in partially automated cars.

**Keywords:** Adaptive HMI; automated driving; automotive user interfaces; driver behaviour

#### **1. Introduction**

Although commercial cars are increasingly equipped with combinations of automated functions such as Adaptive Cruise Control (ACC) and Lane Keeping Systems (LK), drivers appear to have a hard time getting used to them. Many drivers do not know which Advanced Driver Assistance Systems (ADAS) their car has, what they do, and how to safely use them [1,2]. Several aspects appear to contribute to the confusion about car automation among drivers. First, different car brands are introducing automated systems with similar names but different functions, or different system names for similar functions [3,4]. Second, research showed that at least a quarter of all drivers do not receive any information about ADAS from their salesperson when they buy a car equipped with such a system [5,6]. Furthermore, only a small proportion of drivers actually gets to drive with the automated functions at their sales point. This is worrisome, as drivers need multiple interactions with an automated system to properly understand it [7,8]. Third, current driver-car interfaces often fail to follow widely accepted human factors and human-machine interaction guidelines [4], leading to misinterpretations of the system's capabilities.

Co-driving (alternatively referred to as cooperative or shared control) (see, for example, [9–11]) has been suggested to reduce the need for frequent and complete control switches. Although this may take many forms, co-driving entails the shared control of the vehicle: some responsibilities are allocated to the driver, while others are allocated to the car. Still, even in co-driving, a driver needs to know how this shared control works, what the car's capabilities and limitations are, and when they are responsible for which particular driving task. All in all, a lack of understanding about ADAS may reduce traffic safety [12–15] and limit any prospected benefits of automated driving [16–20]. Drivers need to be supported in learning when it is (not) safe to use the automation in their car [21].

Several solutions have been proposed to support drivers in understanding, and safely using, the automation in their car. The first is to stimulate the use of owners' manuals. However, not only are these usually long and complicated, but studies also suggest that practice is required to fully support safe automation use [22–24]. A second solution is therefore additional practical training, for example at a driving school or in a driving simulator. Driving simulators in particular allow drivers to practise with rare but critical driving situations [25–27]. The main downside to these options is that additional training at a driving school or at a facility with a simulator requires high investments, both financially and time-wise.

#### *1.1. Digital In-Car Tutor (DIT)*

In the present study, we explore the potential of a Digital In-car Tutor (DIT) to support drivers in using in-vehicle automation. A DIT guides drivers through the different automated systems in their own cars, during regular drives. While a DIT may take various forms, we particularly studied a DIT prototype using audio and an Augmented Reality (AR) overlay on the windscreen (see Section 2.2.3). The DIT is designed to be used in real cars during regular drives. The following three steps illustrate the core functionalities of our DIT prototype. First, the DIT introduces one of the automated car systems while the driver is driving manually. New systems are only introduced when the driver is in a low-complexity situation [28], like an empty straight road on a clear day. Such an introduction covers the system's functionalities, handling, capabilities and limitations, and equipment. Second, the driver can try out the functionality while the DIT provides immediate feedback. Third, the DIT reminds drivers about specific system capabilities and limitations when a related situation is encountered. Furthermore, rare situations are addressed when driving in similar, but more frequent, situations to keep the driver's mental model up to date [7]. A new system is introduced once the driver has safely driven with the current one for a certain number of kilometres (for example, 500 km), and the cycle repeats itself.

A DIT could have many benefits over regular driving lessons, simulator training, and the use of owners' manuals. First, it is less time consuming and costly, as it is active in the driver's own car during regular drives. Second, a DIT allows for continuous and situated support over a longer period of time. Last, a DIT can be brand- and model-specific, and can be adjusted when automated functions are changed by software updates.

#### *1.2. Adaptive Communication*

To facilitate learning and avoid excessive cognitive demand, a DIT should be adaptive in various ways. First, instructions by the DIT should concern the current driving situation so that the driver is able to immediately process and apply them. Furthermore, the modality, timing, and duration of the communication need to be adjusted to the demand of the driving situation to avoid overload. Studies on the cognitive demands of feedback suggest that tutoring in highly complex driving situations should be condensed and action-based, while elaborate theory and reflection can be presented during low-complexity situations [29–31]. Last, the feedback needs to adapt to the driver's performance, to update his or her mental model. This includes both direct but short feedback, and elaborate reflection after the situation. For example, drivers may need to be informed if they turn on the automation outside of its Operational Design Domain (ODD) [32]. These tutor strategies were implemented in our DIT prototype.

Earlier, Simon [33] studied an auditory digital tutoring system for Adaptive Cruise Control (ACC). The tutor content was adapted to the traffic situation in general and to the driver's preferred maximum deceleration. However, the timing and duration did not adapt, nor was the information adjusted to the complexity of the traffic situation. These characteristics may nevertheless be required in a tutor system, as they may help to prevent driver overload. Simon [33] did find benefits of the tutor in terms of driving safety and a more efficient use of the ACC. However, with the introduction of a variety of automated systems, such research needs to be extended towards cars with multiple systems, as these drastically increase the learning difficulty for drivers.

#### *1.3. Present Study*

In the current driving simulator study, we compared the effects of a DIT prototype (DIT group) with those of an information brochure (IB group) on the use of complex car automation during three driving sessions. In all driving scenarios, participants were required to decide whether they could rely on the automation or not. In the specific scenarios that required drivers to turn off the automation, the take-over quality was analysed. During the first driving session, the DIT group was supported by the DIT prototype in learning about the various automated car systems. In contrast, the IB group familiarized itself with the automation by reading an information brochure before driving in the simulator. Two more driving sessions followed, one directly after the first and one after two weeks. During these sessions, the DIT was no longer active for the DIT group. The additional sessions were introduced to investigate whether any effects of the DIT lasted over time. Last, multiple acceptance elements (e.g., ease of use) of the DIT were assessed through a questionnaire.

Overall, we expected the DIT to provide drivers with a better understanding, and safer use, of the automation. Our first hypothesis was that using the DIT would result in more correct automation use. That is, drivers would only rely on the automation if it could deal with the situation safely, and take back control if it could not. Our second hypothesis was that DIT users would show a better take-over performance in critical situations. A better take-over performance was defined as taking over earlier, braking less intensely, and showing a more stable vehicle control.

In conclusion, we examined whether a DIT was more beneficial for supporting drivers in safely using car automation than an information brochure. DITs may provide a more time- and cost-efficient solution to driver training for partially automated cars than training in driving simulators or on the road with driving instructors. Furthermore, a DIT allows for situated and repeated learning. Lastly, any over-the-air updates of the automation can be directly integrated in the DIT, allowing for tailored instructions about the latest version of the automation. The results of this study allow us to gain insight into whether or not a DIT is an appropriate method to increase appropriate car automation use.

#### **2. Materials and Methods**

#### *2.1. Participants*

Thirty-eight participants (23 female, 15 male) took part in the driving simulator study; 19 were part of the control condition (IB group) and 19 of the experimental condition (DIT group). All participants were students or employees of the University of Twente. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the University of Twente BMS Ethics Committee (nr. 191220). The participants' average age was 27.5 years (*SD* = 13.1 years, range = 18–65 years). On average, participants had possessed their driver's license for 9.2 years (*SD* = 10.81, range: 1–47). Eight participants drove almost every day, and 15 drove multiple times a week. Eight participants drove once a week, and seven drove less than once per week. Most had experience with Cruise Control (*N* = 29). Seven participants had experience with Adaptive Cruise Control, and two with Lane Assist. The Affinity for Technology Interaction (ATI) scale [34,35] was used to determine the participants' level of general affinity with technology. On this scale of 1 (low affinity with technology) to 6 (high affinity with technology), the participants scored an average of 3.9 (*SD* = 0.77). The groups did not significantly differ on any of these characteristics. Participants had to speak and understand English fluently to be able to participate, as the experiment was conducted in English.

#### *2.2. Research Design*

#### 2.2.1. Driving Simulator & Simulated Automated Car

The experiment took place in the driving simulator of the University of Twente (Figure 1). This simulator includes a car mock-up with a steering wheel and pedals. Three projectors display the simulation on a 7.8 m by 1.95 m screen with a viewing angle of approximately 180 degrees. Rear-view and side mirrors were projected on the screen. A tablet displayed the speedometer, tachometer, and an icon that showed whether the automation was on. The simulated car was equipped with level 2 automation, which included (1) Adaptive Cruise Control (ACC), (2) Lane Keeping (LK), (3) Obstacle Detection (OD), (4) Traffic Light and Priority Sign Detection (TS), and (5) Priority Road Markings Detection (RM). These systems were designed specifically for this experiment and did not resemble a particular car model, to prevent transfer from existing cars. Participants were informed about this. The steering wheel included a blue button to turn all automation on and off. Participants could not turn the automation off by braking or steering.

**Figure 1.** The fixed-base driving simulator of the University of Twente.

#### 2.2.2. Control Condition: Information Brochure Training (IB Group)

At the start of the first driving session, participants in the IB group received a paper brochure on the five automated systems. They read this information for 10 min before driving. The brochure included the functions, handling, equipment, capabilities, and limitations of each system. It contained the same system information that the DIT group received from the DIT. However, as the information was given prior to the practice scenarios, it did not include any situation- and driver-adaptive feedback.

#### 2.2.3. Experimental Condition: Digital In-Car Tutor (DIT Group)

The DIT prototype introduced the five automated systems (ACC, LK, OD, TS, and RM) to the participants through auditory and visual information. All visual information was projected as an overlay on the windscreen (Figure 2). This reduced the need for drivers to look away from the road and allowed the information to be directly related to the driving situation. All visual information was accompanied by verbal explanations. The standard digital Google Assistant voice, female with a British accent, was used for the verbal communication and had been pre-recorded.

**Figure 2.** Examples of the Digital In-car Tutor visuals. (**A**) Visuals while the digital tutor verbally explained Lane Keeping. (**B**) Visuals when the digital tutor verbally explained that the automation cannot deal with overly complex lane markings. (**C**) Visuals when the digital tutor reminded the driver that the automation had trouble driving in bad weather conditions such as heavy rain and fog, and that the weather would be changing.

**Procedure**. The DIT followed these steps during Session 1 of the experiment. The DIT first introduced a specific automated system (e.g., Adaptive Cruise Control) at the start of each scenario, always on a straight road without traffic. The DIT would verbally explain the functions, handling, equipment, capabilities, and limitations of this system (Figure 2A,B). The verbal explanations were supported by illustrations projected on the windscreen. The DIT then told participants to use the automation if they thought that it was safe. As participants approached the situation where they needed to either turn off the automation or leave it on, the DIT would remind them of the system capabilities and limitations that applied to the specific situation (Figure 2C).

**Adaptivity**. The information from the DIT was expected to put some cognitive demand on drivers [36,37]. To avoid driver overload, the length and type of DIT messages were adapted to the complexity of the driving situation. This could be considered a 'safety filter' for our DIT, as described by Van Gent et al. [29]. The communication was longer and more detailed in low-complexity situations, while it was condensed during highly complex situations. Furthermore, discussing theory and reflecting upon situations only occurred during low-complexity situations. This included the system introductions on the simple straight road at the start of each scenario [28], and reflection after each critical situation. As an example, the ACC introduction was: "ACC keeps the car at a set speed, and automatically speeds up, and slows down the car, to keep a set distance to the car ahead. The car has several cameras which are used to detect a car ahead of you." If the driver correctly left the automation on in this scenario (ACC1), the reflection was: "Great job. The ACC detected the cars in front of you and slowed down to keep the set speed". These strategies were based upon studies that investigated the tutoring strategies of driving instructors [38,39]. In a similar way that studies have used human processing and decision-making strategies as a basis for robotics or intelligent vehicles with artificial processing and decision-making skills [40], we implemented the observed feedback strategies of human tutors in a digital tutor.
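The adaptive 'safety filter' described above can be sketched as a simple message-selection rule. The function name, complexity thresholds, and message texts below are illustrative assumptions of ours, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TutorMessage:
    short: str      # condensed, action-based phrasing for demanding situations
    elaborate: str  # full theory/reflection phrasing for calm situations

def select_message(msg: TutorMessage, complexity: float,
                   is_reflection: bool) -> Optional[str]:
    """Pick the message variant for the current situation complexity (0..1).

    Theory and reflection are delivered only in low-complexity situations;
    the thresholds are assumed values.
    """
    LOW, HIGH = 0.3, 0.7
    if is_reflection:
        # reflection is suppressed entirely outside low-complexity situations
        return msg.elaborate if complexity <= LOW else None
    if complexity >= HIGH:
        return msg.short       # condensed, action-based only
    if complexity <= LOW:
        return msg.elaborate   # full introduction allowed
    return msg.short

acc_intro = TutorMessage(
    short="ACC keeps your speed and your distance to the car ahead.",
    elaborate=("ACC keeps the car at a set speed, and automatically speeds up "
               "and slows down the car to keep a set distance to the car ahead."),
)
```

For instance, `select_message(acc_intro, complexity=0.1, is_reflection=False)` would deliver the elaborate introduction on an empty straight road, while in dense traffic only the condensed variant (or, for reflections, nothing) would be played.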

The DIT also adapted to the driving situation by reminding drivers of the system's capabilities and limitations specific to the current situation. In combination with the overlay visuals, this meant that the driver could directly perceive and process the information in its specific context. Drivers did not have to interpret information in an artificial context (e.g., a screen with a simplified visualisation of the situation) and then apply it to the current driving situation. For example, when the weather changed for the worse in a scenario, the DIT reminded the driver that the car cannot function reliably in heavy fog and rain (Figure 2C). It is important to note that the DIT never explicitly told the driver that it was safe to leave the automation on, or that the automation needed to be turned off. This was decided as it would be unrealistic in a real-world driving scenario (driving a level 2 vehicle), both for safety and reliability reasons. Similarly, the DIT is not intended to be used as a warning system. Rather, the DIT identifies some situations to provide situated tutoring and learning.

Last, the DIT adapted its feedback to the current performance of the driver. If the automation was used outside its ODD, the DIT reflected afterwards on why this was not safe. If the automation was unnecessarily turned off, the DIT would also reflect on this. The DIT would add that the driver's judgement was the most important, and that the automation should only be used if the driver thought that it could safely cope. The feedback was manually activated by the researcher.

#### 2.2.4. Set-Up and Procedure

The experiment was a between-subjects design with an experimental condition (DIT group) and a control condition (IB group). Both groups drove in three sessions (Table 1), each containing multiple scenarios. All participants were given the following task for each scenario: "You can start the scenario by driving manually. Turn on the automation whenever you think that the car can safely cope, and turn (or leave) it off if it cannot. The car can't cope with a situation if: traffic regulations have to be violated or the car will damage something or harm someone".

Participants were informed at the start of each session that they remained responsible for their safety and that of their fellow road users while using the automation. They also needed to adhere to the general traffic rules and speed limits. If the participant hit something or someone, a crash sound was played and the scenario ended. After each scenario, participants were asked by the researcher whether they thought that the car could safely cope with the previous situation and why.

At the start of Session 1, all participants received a written overview of the experiment procedure and filled out an informed consent form and a demographics questionnaire. Participants could get used to the simulator in a 10-min demo scenario. Overall, Session 1 consisted of 10 scenarios and lasted 1 h. The DIT provided information and feedback during all scenarios in Session 1 (see Section 2.2.3), while the IB group read a brochure about the automation for 10 min before driving. Participants were reminded of their task before each scenario (mentioned above). Session 2 started after a 10-min break. This session contained 8 scenarios and lasted 30 min. Again, participants were reminded of their task before each scenario. The DIT was disengaged for all participants in this session. All participants were asked to participate in Session 3, which took place after two weeks. However, as not all participants were able to come back due to work or school commitments, each group contained 11 participants during Session 3. The set-up for Session 3 was identical to that of Session 2. This last session was included to investigate how any potential effects of the DIT evolved after repeated interaction with the automation.

The order of the scenarios was randomized in Sessions 2 and 3. The scenarios in Session 1 were not randomized and followed the order depicted in Table 1. This way, the DIT could introduce the different automated systems in a realistic and logical order to the DIT group. The same order of scenarios was used for the IB group, so that differences in scenario order between the groups could not influence the results.


**Table 1.** Overview of the experiment set-up for the Digital In-Car Tutor (DIT) group and the Information Brochure (IB) group. Descriptions of all abbreviated driving scenarios are available in Tables 2 and 3.

#### 2.2.5. Scenarios

All scenarios started with a straight road without traffic so drivers could calmly start driving manually and turn on the automation if they thought that it was safe to do so. Furthermore, during Session 1, the DIT introduced a new system to the DIT group on this road as they were still driving manually. After the straight road, the specific driving scenario started. All scenarios contained an event area during which the automation should be on or off.

Session 1 contained 10 driving scenarios (Table 2) of 3 to 4 min each. Each of the five automated systems described in Section 2.2.1 had two dedicated scenarios that addressed a particular capability or limitation of that system. For each system, there was one scenario in which the automation could cope, and one in which it could not. During the first system-specific scenario, the DIT would explain the basic functionalities, capabilities, and limitations of the particular system. During the second scenario, the DIT would further elaborate on the limitations of the system. Sessions 2 and 3 both contained eight scenarios of 2 to 3 min each (Table 3). In each session, four scenarios required a take-over, and four did not. The scenarios in Session 3 were the same as those in Session 2, but with considerable changes to the environment. These changes made the scenarios look different to the participants, but still allowed for a comparison with Session 2. If a participant did not take back control in situations that the automation could not cope with, the car would crash and the scenario would end.


**Table 2.** An overview of all scenarios during Session 1. Each scenario addresses a particular automated system (e.g., Adaptive Cruise Control).


**Table 3.** An overview of all scenarios during Sessions 2 and 3.

#### 2.2.6. Variables

This study contained two independent variables: Training Method (DIT versus information brochure), and Session (Sessions 1, 2, and 3). Three dependent variables were measured during the experiment: acceptance, appropriate automation use, and take-over quality.

**Acceptance.** Participants indicated their acceptance of their training method in a questionnaire at the end of the first session. This questionnaire was a slight adaptation of the Technology Acceptance Questionnaire [41] and addressed six core aspects of technology acceptance: perceived ease of use, perceived usefulness, attitude, intention to use, self-efficacy, and social norm [42–46] (Appendix A).

**Appropriate automation use.** Each scenario contained an 'event area' during which the automation should be on or off. For events that required the automation to be off, the event area started at the latest moment the participant could turn off the automation and brake to avoid a crash. For example, when the participant was driving 100 km/h, the event area started 76 m before the point where the car would crash into something or someone (members.home.nl/johngrimbergen/remwegformule.htm). For scenarios in which the automation could be (left) on, the event area started directly after the straight road at the start of the specific scenario. Whether a scenario required the automation to be off was determined before the experiment, based on the system information used in the driver training. Four subcategories were used to specify the type of automation use during the event areas: (1) correct take-over, the automation is off when necessary; (2) correct reliance, the automation is on while it is safe; (3) incorrect take-over, the automation is off while this is not necessary; (4) incorrect reliance, the automation is on when this is not safe. It was decided not to include a knowledge test to determine the participants' explicit knowledge about the automated systems. In our previous studies [22], we found that a good score on an initial knowledge test did not predict the actual use of the automation in the driving simulator.
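The 76 m event-area boundary at 100 km/h is consistent with a standard stopping-distance calculation (reaction distance plus braking distance). The reaction time and deceleration below are assumed values that reproduce the reported figure, not parameters taken from the cited formula:

```python
def stopping_distance_m(speed_kmh: float,
                        reaction_time_s: float = 1.0,
                        deceleration_ms2: float = 8.0) -> float:
    """Reaction distance plus braking distance, in metres (assumed parameters)."""
    v = speed_kmh / 3.6                        # convert km/h to m/s
    reaction = v * reaction_time_s             # distance covered before braking starts
    braking = v ** 2 / (2 * deceleration_ms2)  # distance covered while braking
    return reaction + braking

def event_area_start(crash_point_m: float, speed_kmh: float) -> float:
    """Latest position at which a take-over still avoids the collision."""
    return crash_point_m - stopping_distance_m(speed_kmh)
```

At 100 km/h, `stopping_distance_m(100)` yields approximately 76 m, matching the reported event-area boundary.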

**Take-over quality.** In scenarios that required the automation to be (turned) off, the following three take-over quality variables were measured from the moment the driver turned off the automation until the location of a possible collision: Time To Collision (TTC) (s), deceleration rate (m/s²), and lateral acceleration (m/s²) [47,48].
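As a sketch of how such measures can be derived from simulator logs (the variable names and the constant-speed TTC assumption are ours, not taken from the paper):

```python
def time_to_collision_s(gap_m: float, ego_speed_ms: float,
                        obstacle_speed_ms: float = 0.0) -> float:
    """TTC at the moment of take-over, assuming constant speeds."""
    closing = ego_speed_ms - obstacle_speed_ms
    return gap_m / closing if closing > 0 else float("inf")

def mean_abs_rate(samples, dt_s: float) -> float:
    """Mean absolute rate of change of a logged signal, e.g. the
    deceleration rate (m/s^2) from a speed trace sampled every dt_s."""
    rates = [(b - a) / dt_s for a, b in zip(samples, samples[1:])]
    return sum(abs(r) for r in rates) / len(rates)
```

For example, a driver turning off the automation 76 m before the obstacle at about 27.8 m/s (100 km/h) would have a TTC of roughly 2.7 s.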

Appropriate automation use and take-over quality were already used as performance measures during Session 1. As the DIT is intended to be used by drivers in real cars during regular trips, Session 1 represented drivers' first on-road experience with the automation. For the DIT condition this would be when the DIT provides situated training to the driver while he or she is driving with the automation for the first time. For the IB group, this would be when the driver is driving with the automation for the first time after reading the information brochure. Careful assessment of the automation use was therefore already necessary during the first session as drivers need to be able to safely use the automation as soon as they start driving.

#### 2.2.7. Analysis

The frequency data on 'appropriate automation use' were first analysed using a Chi-Square test. Next, we investigated how 'appropriate automation use' evolved over time for each of the training methods. This was achieved through a mixed-model approach, specifically a Generalized Estimating Equations (GEE) model. A GEE model was chosen because our study was a repeated-measures design with two groups and three sessions, the dependent variable was binary, and we wanted to control for variations between scenarios [49,50]. To evaluate the specific types of (correct) automation use more closely, a multinomial logistic regression model was created [51,52], which allows categorical response variables with more than two options. The response variable was 'automation use type' (correct take-over, correct reliance, incorrect take-over, and incorrect reliance).

The average lateral acceleration and deceleration rates were determined for the scenarios that required a take-over, starting directly after the participant turned off the automation until the end of the scenario. Then, any group differences in 'vehicle control' were analysed with independent-samples t-tests. All research data are freely available in the Supplementary Materials and in the following data repository: https://osf.io/xebrw/?view_only=eb59ffbbddc04bdf8f18d811f74d65ab.

#### **3. Results**

#### *3.1. Appropriate Automation Use*

#### 3.1.1. Collisions

The total number of collisions appeared higher for the IB group in Session 1 (*NIB* = 24, *NDIT* = 20), Session 2 (*NIB* = 10, *NDIT* = 5), and Session 3 (*NIB* = 5, *NDIT* = 1). However, the Chi-Square tests did not indicate significant differences in the individual sessions (all *p* > 0.05). Two specific scenarios showed a significantly higher number of collisions for the IB group at the 0.1 level. These were OD2 (*NIB* = 5, *NDIT* = 1, χ*<sup>2</sup>* (1, *N* = 38) = 3.167, *p* = 0.075) and TS2 (*NIB* = 3, *NDIT* = 0, χ*<sup>2</sup>* (1, *N* = 38) = 3.257, *p* = 0.071).
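These per-scenario statistics can be verified directly from the reported counts. For OD2, 5 of 19 IB and 1 of 19 DIT participants collided; a 2 × 2 Chi-Square test without continuity correction reproduces the reported values:

```python
from scipy.stats import chi2_contingency

# Rows: IB group, DIT group; columns: collided, did not collide (N = 19 each).
od2 = [[5, 19 - 5],
       [1, 19 - 1]]

chi2, p, dof, expected = chi2_contingency(od2, correction=False)
print(f"chi2({dof}, N=38) = {chi2:.3f}, p = {p:.3f}")  # chi2 = 3.167, p = 0.075
```

Note that `correction=False` (no Yates continuity correction) is required to match the reported χ² of 3.167.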

#### 3.1.2. Correct Take-Over and Reliance Behaviour

During the first session, the IB group used the automation incorrectly (either incorrect reliance or incorrect take-over) more often than the DIT group (*NIB* = 65, *NDIT* = 46) (Table 4). This difference was significant overall (χ*<sup>2</sup>* (1, *N* = 379) = 4.285, *p* = 0.025), and also for the specific scenarios OD2 (χ*<sup>2</sup>* (1, *N* = 38) = 8.992, *p* = 0.003) and RM2 (χ*<sup>2</sup>* (1, *N* = 38) = 7.795, *p* = 0.006). In scenario OD2, a pedestrian crossed the street from behind a large bus that blocked the view of the car's cameras. In RM2, the lane markings were missing just before a sharp curve. No significant differences were found in Session 2 (*NIB* = 32, *NDIT* = 26) (χ*<sup>2</sup>* (1, *N* = 301) = 0.720, *p* = 0.240) or Session 3 (*NIB* = 13, *NDIT* = 17) (χ*<sup>2</sup>* (1, *N* = 176) = 0.643, *p* = 0.274). The observed power was sufficient for the Chi-Square tests per session (1 − β > 0.8, d = 0.3, α = 0.05), but insufficient for between-group comparisons in specific scenarios (1 − β < 0.6, d = 0.3, α = 0.05). Consequently, if we control for the number of scenarios through a rather conservative Bonferroni correction (αadjusted = 0.05/26 ≈ 0.002), the differences found in individual scenarios are no longer significant (all *p* > 0.002).


**Table 4.** Overview of incorrect automation use (*N*) per scenario.

<sup>1</sup> = 1 missing participant. <sup>2</sup> = Significant difference between groups at the 0.05 significance level.

Some specific scenarios appeared to show notably more incorrect automation use than the others: ACC1 and T6. ACC1 (*N* = 34) was the very first scenario that any of the participants encountered during this study. T6 contained a signalized intersection with intersecting traffic (*Nsession2* = 20, *Nsession3* = 10). The car would stop for the crossing traffic, as indicated by the traffic signs, and continue after all traffic had passed. Multiple participants indicated that they thought the buildings were too close to the intersection and might block the view of the cameras.

Next, a Generalized Estimating Equations procedure followed (Section 2.2.7). The dependent variable was correct automation use. The random effects were the participants and scenarios. The fixed effects were the groups and sessions (Table 5). The chosen working correlation matrix type was 'exchangeable', as this resulted in the lowest Quasi-likelihood under the Independence model Criterion (*QIC* = 917.230) [50]. The binary logit model showed a significant effect of sessions (χ*<sup>2</sup>* (1, *N* = 856) = 17.158, *p* < 0.001), but no overall effect of groups (χ*<sup>2</sup>* (1, *N* = 856) = 0.249, *p* = 0.618), nor an overall interaction effect (χ*<sup>2</sup>* (2, *N* = 856) = 4.186, *p* = 0.123). However, there were near-significant group effects at the 0.05 significance level in Session 1 (χ*<sup>2</sup>* (1, *N* = 379) = 3.835, *p* = 0.050) and Session 2 (χ*<sup>2</sup>* (1, *N* = 301) = 3.688, *p* = 0.055).


**Table 5.** The Generalized Estimating Equations model that was developed. The working correlation matrix was exchangeable. The random effects were the participants and scenarios, while the fixed effects were the groups and sessions.

*Note.* The DIT group and Session 3 statistics are not included as these were the baseline.

Looking at the specific types of incorrect automation use (incorrect take-over or incorrect reliance), it appeared that the IB group made more incorrect reliance decisions in Session 1 (*NIB* = 27, *NDIT* = 13), Session 2 (*NIB* = 16, *NDIT* = 12), and Session 3 (*NIB* = 6, *NDIT* = 2) (Figure 3). A Chi-Square analysis confirmed a difference between groups in incorrect reliance decisions, but only for Session 1 (χ<sup>2</sup> (1, *N* = 190) = 6.20, *p* = 0.020). The DIT group had more incorrect take-overs in Session 3 (*NIB* = 7, *NDIT* = 15) (χ<sup>2</sup> (1, *N* = 88) = 3.879, *p* = 0.049). That is, more often than the IB group, they turned off the automation in situations where it was safe to rely on it. The observed power for these Chi-Square tests was sufficient at > 0.8 (d = 0.3, α = 0.05). A multinomial logistic regression model was created next (Table 6). As in the GEE analysis, the fixed effects were group and session, and the random effects were participant and scenario. The analysis confirmed an effect of both session and group on the specific types of automation use. Participants in the IB group were more likely to show incorrect reliance behaviour (*p* = 0.030). Furthermore, participants were more likely to show incorrect reliance (*p* = 0.014) and incorrect take-overs (*p* = 0.044) during Session 1. No interaction effects of group and session were found (all *p* > 0.05).
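A fixed-effects-only sketch of such a multinomial model is shown below. statsmodels' `MNLogit` does not support random effects, so the participant and scenario terms are omitted; the data are simulated, and the coding of groups, sessions, and use types is an assumption for illustration.

```python
# Fixed-effects-only multinomial logistic regression sketch (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 800
df = pd.DataFrame({
    "group": rng.integers(0, 2, n),    # 0 = DIT (baseline), 1 = IB
    "session": rng.integers(1, 4, n),  # Session coded 1-3
    # 0 = correct reliance (baseline), 1 = correct take-over,
    # 2 = incorrect take-over, 3 = incorrect reliance
    "use_type": rng.integers(0, 4, n),
})

model = smf.mnlogit("use_type ~ C(group) + C(session)", data=df)
result = model.fit(disp=False)
print(result.summary())
```

Each non-baseline outcome category gets its own set of coefficients relative to the baseline category, mirroring the layout of Table 6.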

**Figure 3.** Overview of the different types of (in)correct automation use. Incorrect take-over means that the driver unnecessarily turned off the automation. Incorrect reliance indicates that the automation was on when it was not safe.


**Table 6.** Multinomial logistic regression model in which the response variable was 'automation use type', the fixed effects were 'group' and 'session', and the random effects were 'participant' and 'scenario'.

*Note.* The automation use type 'correct reliance', the DIT group, and Session 3 were not included as these were the baseline. \* = significant effect on a 0.05 level. The interaction effects were all non-significant (all *p* > 0.05) and were excluded from this table for readability purposes.

**Summary**. Overall, the DIT group appeared to use the automation more correctly than the IB group during Sessions 1 and 2. However, a significant difference was only confirmed for Session 1. Considering the specific types of automation use, the DIT group consistently showed less incorrect reliance behaviour than the IB group throughout all sessions. This difference was confirmed through the multinomial logistic regression. Surprisingly, however, the DIT group unnecessarily took back control (incorrect take-over) more often than the IB group in Session 3.

#### *3.2. Take-Over Quality and Vehicle Control*

During the first driving session, the DIT group showed larger Times To Collision (TTC) at take-over in three (ACC2, OD2, and RM2) out of the five scenarios that required a take-over (Figure 4). For scenario ACC2, the DIT group took back control significantly earlier (*MDIT* = 11.30, *SDDIT* = 7.54) than the IB group (*MIB* = 3.48, *SDIB* = 3.57) (*t*(20.59) = 3.80, *p* = 0.001). The DIT group also took back control significantly earlier in scenario OD2 (*t*(27) = 2.45, *p* = 0.025), with a mean TTC of 6.19 s for the DIT group (*SD* = 2.55) and 3.67 s for the IB group (*SD* = 2.92). Similarly, the DIT group took back control significantly earlier in scenario RM2 (*t*(21.63) = 2.27, *p* = 0.034). In this scenario, the mean take-over distance was even negative for the IB group, indicating that control was taken back after the collision location had already been passed (*MIB* = −0.03, *SDIB* = 2.12) (*MDIT* = 1.24, *SDDIT* = 0.93). In Sessions 2 and 3, the IB group still appeared to take back control later in most scenarios that required a take-over; however, these results were not significant.
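For reference, TTC at take-over is the remaining gap to the conflict point divided by the closing speed. A minimal illustrative helper (not the simulator's actual implementation), assuming straight-line motion toward a stationary conflict point:

```python
# Illustrative TTC helper: remaining gap (m) divided by closing speed (m/s).
def time_to_collision(gap_m: float, closing_speed_ms: float) -> float:
    """Time To Collision in seconds; infinite if the gap is not closing."""
    if closing_speed_ms <= 0:
        return float("inf")
    return gap_m / closing_speed_ms

# Example: taking over 70 m before a stopped obstacle at 20 m/s gives 3.5 s.
print(time_to_collision(70, 20))  # 3.5
```

A larger TTC at take-over leaves more time to respond, which is why it is used here as an indicator of take-over quality.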

**Figure 4.** TTC when participants took back control.

During Session 1, the deceleration rate (m/s<sup>2</sup>) was higher for the IB group in the same three scenarios in which the IB group showed later take-overs (ACC2, OD2, and RM2) (Figure 5). In scenarios ACC2 (*MIB* = 1.79, *SDIB* = 0.79, *MDIT* = 0.88, *SDDIT* = 0.44) (*t*(34) = 4.12, *p* < 0.001) and RM2 (*MIB* = 0.83, *SDIB* = 0.38, *MDIT* = 0.61, *SDDIT* = 0.23) (*t*(34) = 2.10, *p* = 0.043), the IB group decelerated significantly faster. This was also the case in scenario OD2, but only at a 0.1 significance level (*MIB* = 2.25, *SDIB* = 2.89, *MDIT* = 0.92, *SDDIT* = 0.61) (*t*(28) = 1.85, *p* = 0.075). During the second session, only scenario Test 6 showed a difference between groups in the deceleration rate at a 0.1 significance level (*MDIT* = 0.68, *SDDIT* = 1.97, *MIB* = 0.89, *SDIB* = 2.51) (*t*(36) = 1.72, *p* = 0.093). None of the scenarios in Session 3 showed significant differences in the deceleration rate between groups.
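The non-integer degrees of freedom reported above (e.g., *t*(20.59)) are characteristic of Welch's unequal-variance t-test. A sketch with SciPy, using synthetic samples drawn from the ACC2 means and SDs reported above (the samples themselves are not the study's data):

```python
# Welch's t-test sketch on synthetic deceleration data (m/s^2).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
decel_ib = rng.normal(1.79, 0.79, 19)   # synthetic IB-group sample
decel_dit = rng.normal(0.88, 0.44, 19)  # synthetic DIT-group sample

# equal_var=False selects Welch's test, which does not assume equal variances
# and yields the non-integer degrees of freedom seen in the reported results.
t_stat, p_value = ttest_ind(decel_ib, decel_dit, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```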

In Sessions 1 and 2, none of the scenarios showed a significant difference between groups on the average lateral acceleration after take-over. In Session 3, only one scenario (Test 9) showed a significant difference between groups on the average lateral acceleration after take-over (*t*(19) = −2.38, *p* = 0.028). In this particular scenario, the DIT group showed a higher average lateral acceleration (*MDIT* = 0.57, *SDDIT* = 0.18, *MIB* = 0.36, *SDIB* = 0.22).

**Summary**. Overall, the DIT group showed significantly larger TTCs and smaller deceleration rates during the first session. This indicates earlier and consequently more gentle take-overs by the DIT group. While this still appeared to be the case in Sessions 2 and 3, the differences were no longer significant. Only one scenario across all sessions showed a difference between groups in the lateral acceleration. In this case, the DIT group showed a larger lateral acceleration. The possibility of Type II errors needs to be taken into account for the take-over quality and vehicle control variables, as the power was < 0.8 for these tests (d = 0.5, α = 0.05) [53].

**Figure 5.** Deceleration rate after the participants took back control. \* = significant on a 0.05 level. \*\* = significant on a 0.1 level. The error bars represent the Standard Error.

#### *3.3. Acceptance*

At the end of the first session, participants rated their agreement to several statements about their training on a scale of 1 (Strongly disagree) to 7 (Strongly agree) (Figure 6). Overall, the participants of the DIT group agreed that the DIT was easy to use (*M* = 5.79, *SD* = 0.93, *95% CI* = 5.34–6.24) and useful (*M* = 5.72, *SD* = 1.18, *95% CI* = 5.15–6.29). Participants were positive towards the DIT (*M* = 5.74, *SD* = 1.11, *95% CI* = 5.20–6.27), and disagreed that it was annoying or frustrating (*M* = 2.63, *SD* = 1.28, *95% CI* = 2.02–3.25). Furthermore, participants showed the intent to use the DIT if it was in their partially automated car (*M* = 5.05, *SD* = 1.65, *95% CI* = 4.26–5.85), and felt that they were capable of using it (*M* = 5.87, *SD* = 0.47, *95% CI* = 5.64–6.09). Participants disagreed that people who are important to them think that they should use the DIT (*M* = 3.79, *SD* = 2.12, *95% CI* = 2.77–4.81). This seems logical as their friends and family most likely do not know about the system. The acceptance ratings could not be compared as each group only experienced one training method.

**Figure 6.** Overview of the acceptance ratings. For the IB group, the words 'training system' were replaced by 'training'. Two 'ease of use' questions did not apply to the IB group. The error bars indicate the 95% Confidence Intervals.

#### **4. Discussion**

A Digital In-car Tutor (DIT) is proposed as a situated, low-cost, and time-efficient method for drivers to learn about their partially automated car during regular driving trips. In this study, we evaluated a DIT prototype for a complex (simulated) partially automated car. It was hypothesized that the DIT prototype would support drivers in deciding when it is safe to use the automation, and consequently lead to better vehicle control when taking back control. To study this, we compared appropriate automation use and take-over quality in two groups over three driving sessions. The control group received information about the car automation through a brochure (IB group), while the experimental group received the information from the DIT prototype during the first driving session (DIT group). The DIT provided situated information about the systems' capabilities and limitations. Drivers were instructed to turn on the automation whenever they thought that the car could safely cope with the situation, and turn (or leave) it off if they thought that it could not. Each scenario contained an event in which it was either safe or unsafe to use the automation. This way, the automation use could be classified as follows: (1) correct take-over, the automation is off when this is necessary; (2) correct reliance, the automation is on while it is safe; (3) incorrect take-over, the automation is off while this is not necessary; and (4) incorrect reliance, the automation is on when this is not safe. It is important to note that the DIT is *not* a warning system that prompts all upcoming events. Rather, it identifies certain scenarios to support situated learning. Furthermore, the DIT never stated that it was safe to leave the automation on, or that it was necessary to take back control. For technical, safety, and liability reasons, this would be unrealistic to expect if the DIT were to be implemented in commercial cars.

**Correct automation use.** During the first driving session, the DIT group showed overall a more correct automation use (combined correct take-overs and correct reliance) compared to the IB group. During the second session, in which the DIT was no longer active, this still appeared to be the case, but the difference was no longer significant. During the third session, the two groups showed a similar level of correct automation use. Although a significant difference could only be confirmed for the first session, this still has implications for traffic safety. As the DIT should be used in real cars during normal trips, drivers need to be able to use the automation appropriately and safely from the start without any possible confusion. In simulator training, one could require drivers to go through multiple driving sessions to get to a desired performance level (although we did still see more inappropriate reliance behaviour in the control group after three driving sessions, which we will discuss soon). But as drivers are using the DIT during regular driving in their own car, initial appropriate automation use is critical for traffic safety. Still, although most learning is believed to occur during the initial interaction [7,8,54], it may still be necessary to increase the duration of the DIT to obtain a higher final performance level, especially since multiple studies, like those by Beggiato [7,54] and Forster [8], have shown that the learning curve stabilizes after approximately five interactions (or 3.5 h) [7,8]. Extended DIT support may also be necessary as situations that have not been experienced for a long time can fade from the driver's mental model [7]. Longer (but not necessarily continuous) DIT support provides the option to highlight rare situations in similar frequently occurring situations. This needs further investigation in a more longitudinal study.

**Incorrect reliance.** The DIT group already showed less incorrect reliance during the first session, compared to the IB group. By the third session, the number of incorrect reliances of the DIT group had further decreased to around two and a half percent of all interactions. While the IB group also showed a decrease in incorrect reliances over time, both its initial and final numbers of incorrect reliances appeared to be higher than those of the DIT group. During the third session, the brochure group still showed incorrect reliance in around seven percent of all interactions. Further analysis confirmed that the IB group was more likely to show incorrect reliance behaviour. These results follow our expectations based on both established and more recent models that describe the interaction between automation feedback and automation use. These include, amongst others, Lee and See [55], Seppelt [56,57], and Revell [58]. All these interaction models suggest that (external) information about the automation, repeated interactions, and automation feedback all affect automation use (and reliance). The results suggest that by combining all these elements, the DIT was effective in specifically decreasing inappropriate reliance behaviour. This is an important implication of the prototype, as inappropriate reliance can lead to severe safety issues.

**Incorrect take-over.** Both groups had a similar number of unnecessary (incorrect) take-overs during the first driving session. While the number of unnecessary take-overs decreased over time for the IB group, this was not the case for the DIT group. It seems that the DIT group remained more hesitant to rely on the automation throughout the driving sessions. These results are unexpected, as they are not in line with the statement that repeated interactions, feedback, and background information lead to improved mental models and consequently appropriate automation use. Similarly, they are not in line with the research on a digital tutor for ACC by Simon [33], which showed fewer unnecessary take-overs from users of the digital tutor. However, interestingly, that study also showed a slight increase in unnecessary take-overs during the third driving session in specific scenarios. One would expect that the feedback of the DIT would in this case lead to fewer unnecessary take-overs, just as the lack of feedback for the IB group should lead to over- or under-reliance depending on the experience of safe driving situations or crashes.

The number of unnecessary take-overs in the DIT group might be explained by Signal Detection Theory [59–61]. In our study, correct take-over and correct reliance correspond respectively to 'hit' and 'correct rejection', while incorrect take-over and incorrect reliance correspond to 'false alarm' and 'miss'. The information and explicit feedback of the DIT repeatedly stressed the limitations of the automation. This may have made drivers change their criterion and take a more conservative attitude when judging situations as being inside the ODD of the automation, consequently increasing the number of incorrect take-overs (false alarms) and reducing the number of incorrect reliances (misses). Another explanation is that drivers were still forming their core mental models of the automation by the third session [33]. It is important to realize that unnecessary take-overs are not necessarily dangerous and are arguably preferred in ambiguous situations. Still, unnecessary take-overs need to be limited so that the automation can be used to its full potential. If drivers constantly disengage the automation when this is unnecessary, potential benefits of the automation such as increased traffic safety and driver comfort may not be achieved.
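Under the standard equal-variance Gaussian SDT model, this criterion shift can be quantified with the sensitivity d′ and the criterion c. A sketch with hypothetical counts (not the study's data); treating 'take over' as the signal response, a negative c indicates the take-over-prone criterion discussed above:

```python
# Equal-variance Gaussian SDT measures from raw outcome counts (hypothetical).
from scipy.stats import norm

def sdt_measures(hits: int, misses: int, false_alarms: int,
                 correct_rejections: int):
    """Return (d-prime, criterion c) for the four SDT outcome counts."""
    hit_rate = hits / (hits + misses)                 # correct take-overs
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa           # sensitivity to 'take-over needed'
    criterion = -0.5 * (z_hit + z_fa)  # c < 0: bias toward taking over
    return d_prime, criterion

# Hypothetical example: many correct take-overs plus a fair number of
# unnecessary ones -> moderate sensitivity with a take-over-prone criterion.
print(sdt_measures(hits=40, misses=5, false_alarms=15, correct_rejections=30))
```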

**Challenging scenarios.** Two particular driving situations were very difficult for both groups: ACC1 and T6 (see Section 2.2.5). It was safe to leave the automation on in both situations. ACC1 was the very first scenario that all drivers encountered during the study. As discussed earlier, drivers need repeated experience and feedback to develop a calibrated level of trust [7,8,62]. While reassurance feedback may support a higher initial level of trust, a DIT should never suggest that the automation can perfectly handle a situation. Scenario T6 was a signalized intersection with crossing traffic. The automated car would detect the priority signs and stop to let the crossing cars pass. Drivers did not rely on the car as they thought that the houses were too close to the street and might block the view of the car's cameras. This suggests that the drivers were well aware of the limitations (blocked cameras) and capabilities (detecting priority signs) of the automation. However, as no specific camera ranges were provided during the training, this particular situation became ambiguous for the drivers. Taking back control was then arguably the safest decision.

**Vehicle control.** We expected to see better vehicle control for the DIT group after disengaging the automation in situations that required to take back control [63,64]. For example, Simon [33] found less intense braking behaviour for users of the digital ACC tutor. In our study, the DIT group took back control significantly earlier, and braked less hard, than the IB group during the first session. However, no significant differences were found between the groups in the second and third sessions. Still, the minimum Time To Collision at take-over was consistently larger, and the maximum deceleration was smaller, for the DIT group. While overall no differences between groups were found for the lateral acceleration after take-over, one scenario surprisingly showed a larger lateral acceleration for the DIT group. The possibility for Type II errors needs to be taken into consideration for the vehicle control variables as these tests had limited power.

**Acceptance.** Our results show that participants found the DIT easy to use. Participants also indicated that the DIT made learning about, and using, the automation easier. They felt positively about the DIT and confident in using it. Participants indicated an intent to use the DIT, but did not think that their peers and family felt that they should use it.

#### *4.1. Limitations*

Certain limitations concerning this study have to be taken into account. First, participants in the control group were asked to read the brochure carefully before entering the driving simulator. However, in real life, a large share of drivers neither reads the owner's manual nor looks up any other information about the automation in their car [1,5]. This group is therefore not representative of all drivers. A brochure was chosen for the control group as this is often used by car sellers as the main (and only) method of providing customers with information about the automation in their new car [5]. An additional study with a control group that does not receive any information about the automation before driving may be required for an improved representation of current drivers.

Second, the visual cues may have contributed to the differences between groups during Session 1 due to a priming effect. Although the visuals were a core part of the DIT prototype, as they made it possible to address the systems' limitations in the current driving situation, further research is necessary to determine how the way in which the information is presented influences learning. For example, it is unclear if a strictly auditory DIT would have similar effects.

Third, participants could only turn off the automation by pressing a button on the steering wheel. It is possible that the inability to disengage the automation through the brake has caused confusion among drivers in time-critical situations. However, participants were reminded that they had to disengage the automation through the button, and not the pedals, multiple times throughout the driving sessions.

Last, the current between-subject set-up did not allow us to compare the acceptance between the DIT and an information brochure. Additional studies with a within-subject design are required to examine the acceptance of the DIT more extensively.

#### *4.2. Future Research*

The results of this study provide multiple opportunities for further research. First, it is necessary to further investigate the specific information that needs to be included during the introduction of a new system. For example, it is unclear if it is necessary to include the technical equipment specifications.

Second, the effects of a DIT on driver distraction need to be assessed. By projecting the transparent images on the windscreen, the driver does not have to continuously shift their attention from the road to a secondary screen. However, the images are still expected to introduce glances away from the centre of the road and take up cognitive resources. They therefore need to be further refined so that they facilitate optimal learning while limiting distraction from the road. For example, the images may need to be located closer to the centre of the driver's field of view, without causing visual clutter [65,66], to adhere to the NHTSA guidelines on the number and duration of glances away from the centre of the road [67,68].

Last, while the concept prototype used the entire windscreen to project the images on, more practical implementations need to be explored. For example, the DIT may be implemented in an off-the-shelf head-up display device.

#### **5. Conclusions**

During the first driving session, in which the DIT was active for the experimental group, users of the DIT showed more correct automation use (correct reliance and correct take-overs) and higher-quality take-overs. This first driving session represented the initial on-road contact with both the automation and the DIT. However, the differences in correct automation use were reduced over time and had disappeared by the last driving session, which took place two weeks after the first session. The IB group appeared to catch up with the DIT group and came to a similar level of correct automation use. Still, as the DIT is used in drivers' cars during regular drives, safe automation use is extremely important directly from the start. The DIT specifically led to less incorrect reliance behaviour throughout the driving sessions, behaviour that would otherwise lead to immediate safety issues. While the IB and DIT groups both showed a decrease in incorrect reliance over the course of the driving sessions, the overall incorrect reliance was significantly lower in the DIT group throughout the sessions. That means that drivers relied less on the automation in situations that were outside of its Operational Design Domain. Still, further research is necessary on the precise required content of a DIT, and on how the way of presenting the DIT information influences learning. The results further indicated a possible under-trust of the automation among users of the DIT. While under-trust may be less dangerous, it may hinder the adoption (and proposed benefits) of automated driving. It is therefore necessary to investigate how to address under-trust without the risk of creating overreliance. Finally, drivers found the DIT easy to use and useful, and felt confident in using it. Overall, this study provides an initial insight into the effects of a Digital In-Car Tutor on the appropriate use of complex car automation. The concept of a DIT shows some potential as a low-cost, time-efficient, situated, and long-term method for learning about partially automated cars, with additional benefits for instructing drivers after overnight software updates. Therefore, additional research is advised to further explore DIT content and form.

**Supplementary Materials:** The data collected during the study are freely available at www.mdpi.com/xxx/s1.

**Author Contributions:** Conceptualization, A.B. and A.P.v.d.B.; Data curation, A.B.; Formal analysis, A.B.; Investigation, A.B.; Methodology, A.B., A.P.v.d.B., M.C.v.d.V., W.B.V. and M.H.M.; Project administration, A.B.; Supervision, A.P.v.d.B., M.C.v.d.V., W.B.V. and M.H.M.; Visualization, A.B.; Writing – original draft, A.B. and A.P.v.d.B.; Writing – review & editing, A.B., A.P.v.d.B., M.C.v.d.V., W.B.V. and M.H.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by the Dutch Domain Applied and Engineering Sciences, which is part of the Netherlands Organisation for Scientific Research (NWO), and which is partly funded by the Ministry of Economic Affairs (grant number 14896).

**Conflicts of Interest:** The authors declared no potential conflict of interest with respect to the research, authorship, and/or publication of this article.

#### **Appendix A — Acceptance Questionnaire**

The following acceptance questionnaire was completed by participants of the DIT group after the first session.

The following questions are specifically about the training system you experienced!

#### **Perceived ease of use.**

Please indicate for each statement to what extent you (dis)agree. (1- Strongly agree, 7- Strongly disagree)


#### **Perceived usefulness.**

Please indicate for each statement to what extent you (dis)agree. (1- Strongly agree, 7- Strongly disagree)


#### **Attitude.**

Please indicate for each statement to what extent you (dis)agree. (1- Strongly agree, 7- Strongly disagree)


#### **Intention to use.**

Imagine that you own the partially automated car that you experienced today.

Please indicate for each statement to what extent you (dis)agree. (1- Strongly agree, 7- Strongly disagree)

11. I would actively use the training system in my partially automated car

#### **Self-efficacy.**

Please indicate for each statement to what extent you (dis)agree. (1- Strongly agree, 7- Strongly disagree)


#### **Social norm.**

Imagine that you own the partially automated car that you experienced today.

Please indicate for each statement to what extent you (dis)agree. (1- Strongly agree, 7- Strongly disagree)

14. People who are important to me think I should use the training system

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **The Impact of Situational Complexity and Familiarity on Takeover Quality in Uncritical Highly Automated Driving Scenarios**

#### **Marlene Susanne Lisa Scharfe 1,\*,†, Kathrin Zeeb <sup>1</sup> and Nele Russwinkel <sup>2</sup>**


Received: 30 January 2020; Accepted: 17 February 2020; Published: 20 February 2020

**Abstract:** In the development of highly automated driving systems (L3 and 4), much research has been done on the subject of driver takeover, with a strong focus on takeover quality. Previous research has shown that one of the main influencing factors is the complexity of the traffic situation, which has so far not been sufficiently addressed, as different approaches towards complexity exist. This paper differentiates between the objective complexity and the subjectively perceived complexity. In addition, the familiarity with a takeover situation is examined. Gold et al. show that repetition of takeover scenarios strongly influences the takeover performance. Yet, complexity and familiarity have not been considered at the same time. Therefore, the aim of the present study is to examine the impact of objective complexity and familiarity on the subjectively perceived complexity and the resulting takeover quality. In a driving simulator study, participants are requested to take over vehicle control in an uncritical situation. Familiarity and objective complexity are varied by the number of surrounding vehicles and scenario repetitions. Subjective complexity is measured using the NASA-TLX; the takeover quality is gathered using the take-over controllability rating (TOC-Rating). The statistical evaluation results show that these parameters significantly influence the takeover quality. This is an important finding for the design of cognitive assistance systems for future highly automated and intelligent vehicles.

**Keywords:** highly automated driving; HAD; takeover; conditional automation; intelligent vehicles; objective complexity; subjective complexity; familiarity; cognitive assistance; takeover quality

#### **1. Introduction**

Within recent years, human factors have become an important research topic in automated driving [1]. Approaching Level 3 of automation [2], the driver may shift attention to a non-driving related task (NDRT) during the automated drive. Still, the driver remains the fallback if the automation requests a takeover (TOR; [2]). Most takeover requests in Level 3 highly automated driving [2] will be non-critical [3], giving the driver a sufficiently comfortable transition time [4]. The focus of this study lies on non-critical takeover situations in different scenarios and the resulting takeover quality. In contrast to critical takeover situations, in which drivers abbreviate the takeover process, the driver has enough time to properly perceive the driving environment before performing a maneuver. During the automated mode, the driver can engage in a non-driving related task. The takeover is a complex task. As soon as a TOR is triggered, the driver has to shift attention back to the driving environment, perceive the surrounding traffic, and take over the driving task. Hands and feet have to be relocated, situation awareness regained, and the driving task executed [5,6]. In addition, the in-vehicle environment has to be perceived and filtered for relevant information. All these processes happen after the driver has been out of the loop. In a dynamic environment, these several cognitive and motor processes have to happen in a very small amount of time. It is thus important to investigate aspects that affect a safe and comfortable takeover. In this paper, four relevant factors that influence the takeover quality are examined. In the following, the four factors are described separately and distinguished; still, they are not independent from each other. First, the takeover process is influenced by the complexity of the surrounding traffic environment, which can be defined as objective complexity.
The objective complexity mainly varies with the number of relevant objects in the surrounding environment. However, other factors, such as weather conditions, road structure, and relative speed, can also add to the objective complexity. Especially when taking over the driving task, the objective complexity can impact the quality of the takeover. Different studies [7,8] found that high traffic density leads to a reduced takeover quality when a lane change is required. A reason for this is that choosing a lane change is more complex than simply braking, as vehicles on the other lanes have to be perceived and time gaps and relative speeds estimated. Second, besides the objective complexity, individual differences have to be taken into account [8]. Not only the traffic situation but also the current state of the individual driver (e.g., stress level, vigilance, workload of the non-driving related task) may differ in every takeover situation. This is called subjective complexity. The subjective complexity is task- and resource-dependent and describes an individual's subjective perception of complexity in a certain traffic situation [9]. Depending on the current attentional state of the driver, the perception of complex situations can vary. While one driver might be familiar, and thus very comfortable, with high traffic density and rate the complexity of the situation as low, another driver might perceive the same situation as more complex. Third, such an individual perception of complexity is influenced by familiarity. Due to the common driving routes of individual drivers, the familiarity with roads, and therefore with traffic situations (traffic jams, urban roads, villages, etc.), varies. Reference [10] shows that the overall response time is significantly lower for drivers who are familiar with the system. In unfamiliar situations, drivers thus have higher response times.
This is highly important when dealing with safety aspects of takeover situations, as the takeover quality can be enhanced when lower reaction times are needed in familiar situations. Fourth, stable driver variables, such as driving style, driving frequency, driving routes, and driving duration, have an impact on the takeover quality. To improve the takeover quality, cognitive assistance systems can support the driver during a takeover. By integrating information about the surrounding traffic environment (objective complexity), the current state of the driver (subjective complexity), the customary traffic situations of individual drivers (familiarity), and stable driver variables, such as the driving style, the HMI as well as the vehicle dynamics can be adapted. In a situation with high objective complexity and an unfamiliar driver who perceives the situation as very complex, only relevant and supportive information would be presented to the driver (e.g., a projection of the best maneuver trajectory), and the automation would hand over the driving task gradually (e.g., handing over the steering but keeping adaptive cruise control activated). In highly familiar situations with low complexity, additional information, such as a radio channel, the driver's favorite song, or the time schedule of the next appointment, could be presented to keep vigilance low. As it has already been shown that the driver's familiarity with a situation and the objective complexity of the current traffic situation influence the subjective complexity [9,11,12], this study investigates the impact of the situational variables familiarity, objective complexity, and subjective complexity on the takeover quality. Furthermore, stable individual variables are integrated. All variables are related to each other in different ways. Figure 1 represents the relationships that are investigated in the present study.
Based on this, cognitive assistance systems can be developed to support individual drivers accordingly. The following hypotheses are examined in this study:

**Figure 1.** Hypothesised relationships between situational variables, stable variables and the takeover quality. The impact on subjective complexity as shown in [9].

**Hypothesis 1.** *Higher familiarity with the situation is related to increased quality of a takeover.*

**Hypothesis 2.** *Higher objective complexity is related to a decreased quality of the takeover.*

**Hypothesis 3.** *Higher subjective complexity is related to a decreased quality of the takeover.*

**Hypothesis 4.** *Situational and stable driver variables (driving style, driving frequency, driving routes and driving duration) together can best explain variance in takeover quality.*

#### **2. Methods**

To rate the takeover quality, this study evaluates videos of a driving simulator study. The driving simulator consists of six monitors that create a 360° surround view and a moveable driving unit that makes the simulation more realistic. Six different traffic scenarios were built using the driving simulation software SILAB [13]. Participants are tested in a controlled environment to enable measurements under identical traffic conditions. A ten-minute learning session prior to the study lets participants get acquainted with the simulator dynamics, the notifications and the takeover itself. The study was approved by the ethics committee of the TU Berlin in April 2019 and by Robert Bosch GmbH.

#### *2.1. Study Design*

The study includes six scenarios with different numbers of relevant vehicles in the surrounding traffic environment (Section 2.3.1). The scenarios are repeated in three blocks, once per block in randomized order, so that participants took over the driving task 18 times after an automated drive. Depending on the participant, the total study duration lies between 90 and 120 min. After completing the mandatory documents, participants receive a theoretical introduction to the study (20–30 min). This is followed by a test drive in which participants get used to the simulator (5 min). Their main task is to drive onto the highway (starting from a parking lot) and onto the center lane, where they turn on the automation as soon as it is available. During the automated drive, they are instructed to play a quiz on a tablet mounted next to the center console until a takeover request is triggered. Each automated drive lasts around 2 min. As soon as the takeover request is triggered, participants are instructed to immediately stop the quiz and take over. The takeover request is always triggered when the ego vehicle is driving on the center lane at a speed of 120 km/h. Participants are instructed to take over the driving task using the levers and to keep the speed at approximately 120 km/h. Each scenario triggers a certain maneuver that is the best solution in the given situation. Depending on the traffic situation (speed and position of the relevant vehicles), participants should observe the obligation to drive on the right and avoid braking or accelerating strongly. Consequently, exactly one maneuver is most appropriate in each scenario: right when the right lane is free; following when the right lane is occupied and the leading vehicle is faster or at the same speed; left when the right lane is occupied and the leading vehicle is clearly slower than the ego vehicle. As soon as participants decide on a maneuver, they have to announce the decision aloud.
After each takeover, participants drive onto a parking lot to fill in a rating sheet on subjective complexity (NASA-TLX; Section 2.3.3). The next scenario starts from the parking lot as soon as participants finish the rating sheet. Depending on how long participants take to answer it, each scenario lasts three to five minutes.

#### *2.2. Participants*

The simulator study took place in April and May 2019 after successful pre-testing. Statistical evaluations are based on *N* = 20 participants (13 male, 7 female) with a mean age of *M* = 26.2 years (*SD* = 2.69). Most participants drive around 30 min on a daily basis, mostly on highways, and indicate a moderate driving style (Figure 2).

**Figure 2.** Distributions of driving statistics of the participants (*N =* 20).

#### *2.3. Variables and Measurements*

The study is designed to measure four main variables that are important for the takeover in highly automated driving. The connection between those variables is depicted in Figure 1. Variables and measurement methods are described in detail below.

#### 2.3.1. Objective Complexity

The objective complexity is an independent variable (Figure 1) and is based on the number of relevant vehicles in the traffic environment. A vehicle is defined as relevant when it has a direct impact on the ego vehicle: it necessitates a reaction, it is the reason for a maneuver, or it is a safety-critical vehicle that has to be regarded during a maneuver (e.g., overtaking vehicles during a lane change to the left). Three different maneuver options are set up in the traffic simulation. The takeover is always triggered when the ego vehicle is in the highly automated mode on the center lane. The maneuver options are thus a lane change to the left, a lane change to the right or car following. Based on the obligation to drive on the right, the traffic environment is set up to trigger all three maneuvers. For every maneuver, a complex and an easy traffic scenario exist. This results in six different scenarios that vary in their complexity according to the number of maneuver-relevant vehicles (0, 1, 2, 3 or 6; Figure 3). Two of the scenarios each have two relevant vehicles in the surrounding traffic environment; these are treated identically in the statistical analysis.

**Figure 3.** Traffic scenarios during the takeover request. Blue squares mark relevant vehicles in the given scenario situation, the red star marks the ego vehicle.


#### 2.3.2. Familiarity

The second independent variable is the familiarity with a certain traffic situation (Figure 1). It is implemented by repeating the scenarios: each scenario is presented three times to each participant in randomized order. The habituation to the general traffic situations therefore rises with repeated exposure.

#### 2.3.3. Subjective Complexity

Subjective complexity is not a directly manipulated independent variable, as it is not varied throughout the experiment. It indicates how complex participants perceive the scenario (individual perception of complexity in the sense of "has this been a complex environment **for you**"). It is influenced by the objective complexity and the familiarity (Figure 1; [9]). To assess the subjective complexity, the multidimensional rating sheet NASA Task Load Index (NASA-TLX; [14]) is used after each takeover. Originally, the NASA-TLX is a rating scale in which information about the magnitude and sources of six workload-related factors is combined to derive an estimate of workload. Due to its six sub-scales, the questionnaire is well suited to measuring subjective complexity in takeover situations. Six sub-scales are rated on a 20-point Likert scale: mental demand, physical demand, temporal demand, performance, effort and frustration. The weighting of the items as in [14] has been criticized in the past [15]. Reference [16] states that without the weighting of the scales, better differentiation and higher reliability can be achieved. Furthermore, it is stated that the weighting provides little informative value [17]. Another shortcoming of the weighting is the additional time it requires. For these reasons, the weighting is not used in this study. Participants are instructed to rate the complexity of the situation using the NASA-TLX after every trial, resulting in 18 ratings overall (six scenarios, three times each).
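The unweighted ("raw TLX") scoring described above reduces to averaging the six sub-scale ratings. The following minimal sketch is illustrative only; the function, its name and the enforced 20-point bounds are our assumptions, not the study's analysis code:

```python
# Unweighted ("raw") NASA-TLX score: the mean of the six sub-scale ratings,
# here on the study's 20-point scale. Hypothetical helper for illustration.

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings: dict[str, float]) -> float:
    """Return the unweighted NASA-TLX score (mean of the six sub-scales)."""
    missing = set(SUBSCALES) - ratings.keys()
    if missing:
        raise ValueError(f"missing sub-scales: {sorted(missing)}")
    for name in SUBSCALES:
        if not 1 <= ratings[name] <= 20:
            raise ValueError(f"{name} rating must lie on the 20-point scale")
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)

# One hypothetical post-takeover rating:
example = {"mental": 12, "physical": 4, "temporal": 14,
           "performance": 6, "effort": 10, "frustration": 8}
print(raw_tlx(example))  # 9.0
```

Dropping the pairwise weighting step, as the study does, makes the score a plain arithmetic mean, which is also why it is faster to administer after each of the 18 trials.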

#### 2.3.4. Takeover Quality

The takeover quality is the dependent variable (Figure 1). Both complexities and the familiarity are assumed to influence the takeover quality. The quality of the takeover is rated using the take-over controllability rating (TOC; [18]). The TOC is a procedure for assessing control transitions from automated to manual driving. It provides a standardized rating scheme on a scale from one to ten and allows different aspects of driving performance during control transitions to be integrated into a global measure when evaluating video material of a driving situation. The sub-scales of the TOC include braking response, longitudinal vehicle control, lateral vehicle control, lane change/lane choice, securing/communication, vehicle/system operation and the facial expression of the driver. The last sub-scale (facial expression of the driver) is not rated in this study, as the video material does not include the driver's face [18]. The sub-scales are rated on a 10-point scale. A perfect takeover is rated with a one. Values of two or three indicate imprecision, including jerky steering movements or imprecise lane keeping (lateral vehicle control), unnecessary/wrong use of the indicator (securing/communication), imprecision (vehicle/system operation) and visible emotions (facial expression of the driver). Driving errors are rated between four and six, depending on the severity of the error. The following items indicate errors: too strong, too weak, too late, missing (braking response); safety distance too low, inadequate speed (longitudinal vehicle control); safety distance too low, strong oscillation, crossing lane markings (lateral vehicle control); hesitant/interrupted, too late, missing, wrong lane (lane change/lane choice); missing/too late use of the indicator, missing/too late control glance (securing/communication); and problems (vehicle/system operation).
Endangerment is rated between seven and nine, covering endangerment of others and self-endangerment across all sub-scales. In the case of non-controllable events, the takeover is rated with a ten, including collision, lane departure/leaving the road or loss of vehicle control across all sub-scales [18]. Low values thus indicate a faultless takeover (1 = faultless) and high quality, whereas high values indicate a poor takeover (10 = uncontrolled).
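The TOC bands described above map directly onto verbal categories. A small helper makes the scheme explicit; this is a hypothetical illustration of the rating bands from [18], not part of any official TOC tooling:

```python
def toc_category(rating: int) -> str:
    """Map a TOC rating (1-10) to the verbal category described in [18]."""
    if not 1 <= rating <= 10:
        raise ValueError("TOC ratings range from 1 to 10")
    if rating == 1:
        return "faultless"          # perfect takeover
    if rating <= 3:
        return "imprecision"        # e.g., jerky steering, imprecise lane keeping
    if rating <= 6:
        return "driving error"      # e.g., safety distance too low, missing glance
    if rating <= 9:
        return "endangerment"       # endangerment of self or others
    return "not controllable"       # collision, leaving road, loss of control
```

For example, `toc_category(4)` yields the mildest driving-error band, which is the rating most frequently assigned in this study (Section 3.3).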

#### **3. Results**

Regression analysis is used to examine the influence of the independent variables on the dependent variable, takeover quality. Residuals-vs-fitted, normal Q-Q, scale-location and residuals-vs-leverage plots are used to check the model fit, normal distribution, homoscedasticity and outliers. The variance inflation factor is used to test for multicollinearity. Mediation and moderation effects were also tested, but no significant effects were found.
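The workflow named here (OLS fit, residual checks, variance inflation factors) can be sketched with plain NumPy. The data below are synthetic stand-ins invented for illustration; they only loosely echo the study's reported effect sizes and are not the study's data or code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 313  # the reported t(311) implies n - 2 = 311 in the simple regressions

# Synthetic stand-in data (invented for illustration)
familiarity = rng.integers(1, 4, n).astype(float)          # 1st-3rd exposure
complexity = rng.choice([0, 1, 2, 3, 6], n).astype(float)  # relevant vehicles
toc = 4 + 0.17 * complexity - 0.24 * familiarity + rng.normal(0, 1.5, n)

def ols(X, y):
    """OLS fit: coefficients, fitted values and residuals (the inputs to the
    residuals-vs-fitted and Q-Q diagnostics named in the text)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return beta, fitted, y - fitted

def vif(X):
    """Variance inflation factor per predictor column (no intercept column):
    VIF_j = 1 / (1 - R^2_j), from regressing column j on the other columns."""
    factors = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        _, _, resid = ols(others, X[:, j])
        r2 = 1.0 - resid.var() / X[:, j].var()
        factors.append(1.0 / (1.0 - r2))
    return factors

X = np.column_stack([np.ones(n), familiarity, complexity])
beta, fitted, resid = ols(X, toc)
print([round(v, 2) for v in vif(np.column_stack([familiarity, complexity]))])
```

With independent predictors, as in this randomized design, the VIFs stay close to one, consistent with the multicollinearity check passing.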

#### *3.1. The Impact of Familiarity on Takeover Quality (H1)*

To evaluate the impact that familiarity with a traffic scenario has on the takeover quality, regression analysis is used. Results show that with a rise in familiarity, the quality of the takeover significantly improves (*β* = −0.24, *R*<sup>2</sup> = 0.01, *t*(311) = −2, *p* < 0.05; Figure 4, right). The regression slope of −0.24 is not very steep, and only one percent of the variance in takeover quality can be explained by familiarity. Familiarity thus has a significant but small impact on the takeover quality. It has to be noted, though, that all participants are regular highway drivers; their familiarity may therefore have been high from the start.

**Figure 4.** The relation between takeover quality and the objective complexity as relevant vehicles in the surrounding traffic environment (**left**) and between takeover quality and situation familiarity (**right**). Red lines indicate the regression line (significance codes: 0 '\*\*\*' 0.001 '\*\*' 0.01 '\*' 0.05).

#### *3.2. The Impact of Objective Complexity on Takeover Quality (H2)*

Additionally, the takeover quality is significantly influenced by the objective traffic complexity (Figure 4, left). With more relevant vehicles in the surrounding traffic environment, the takeover quality becomes worse. In scenarios with a bad TOC rating, drivers did not keep a sufficient safety distance, changed lanes hesitantly and with interruptions, did not use the indicator or omitted the control glance. In cases where the low safety distance could have led to a collision in real traffic, endangerment of self and others was rated. Wrong decisions did not lower the takeover quality when the maneuver itself was executed perfectly. Results show that the slope of the regression is 0.17; three percent of the variance can be explained by the number of relevant vehicles in the surrounding traffic environment (*β* = 0.17, *R*<sup>2</sup> = 0.03, *t*(311) = 3.44, *p* < 0.001). The small amount of explained variance may again be due to the participants' driving history: as all drivers are used to highway situations, in which the objective complexity is usually high, the impact might be reduced by their increased familiarity. Furthermore, other aspects that contribute to objective complexity (e.g., traffic signs) may also play an important role.

#### *3.3. The Impact of Subjective Complexity on Takeover Quality (H3)*

The subjective complexity measures how complex each individual perceives the situation to be. It is significantly influenced by the objective complexity of the environment (*β* = 0.55, *p* < 0.001) and the familiarity with the situation (*β* = −0.83, *p* < 0.001; Figure 1; [9]). In addition, the aggregated subjective complexity has a significant impact on the takeover quality (*β* = 0.07, *p* < 0.05): a driver who perceives a situation as highly complex shows a worse takeover quality (Figure 5). Although the impact is significant, only one percent of the variance can be explained by the aggregated subjective complexity (*R*<sup>2</sup> = 0.01, *t*(311) = 2.33, *p* < 0.05). Subjective complexity consists of the six sub-scales mental demand, physical demand, temporal demand, performance, effort and frustration. Mental demand and physical demand do not influence the takeover quality significantly. However, with a rise in temporal demand, the takeover quality decreases significantly (*β* = 0.1, *t*(306) = 2.26, *p* < 0.05). The takeover quality also decreases with a rise in frustration (*β* = 0.11, *t*(306) = 2.86, *p* < 0.01). Surprisingly, with a rise in perceived performance, the actual takeover quality also decreases (*β* = 0.08, *t*(306) = 2.54, *p* < 0.05). Effort, in contrast, has a positive effect on the takeover quality (*β* = −0.15, *t*(306) = −4.36, *p* < 0.001). A multiple linear regression of the sub-scales explains ten percent of the variance in takeover quality (*R*<sup>2</sup> = 0.1; Figure 5). Figure 5 shows that many scores lie on the fourth marker: in the TOC rating, driving errors are rated between four and six, and after taking over in this study, many drivers make driving errors. These errors are mostly an insufficient safety distance, braking too strongly, a missing use of the indicator or a missing control glance. As these errors are not severe in these cases (e.g., a low safety distance but no cutting in on other vehicles), the lowest driving-error rating is chosen.

**Figure 5.** The relation between subjective complexity (left), its sub-scales (right) and the takeover quality. Red lines indicate the regression line (significance codes: 0 '\*\*\*' 0.001 '\*\*' 0.01 '\*' 0.05).

#### *3.4. Multiple Regression Analysis on Takeover Quality Including Stable Driver Variables (H4)*

Separately, the variables show significant relationships, but the amount of variance in takeover quality they explain is not high. To estimate the impact of the combination of the variables, multiple regression analysis is used (Figure 6). Results show that a combination of stable (e.g., driving style) and situational variables (e.g., objective complexity) increases the amount of explained variance in takeover quality to 58 percent. The stable variables that significantly influence takeover quality are the indicated driving style, average driving frequency, most used driving routes and average driving duration. The takeover quality decreases with a more defensive driving style (*β* = 0.77, *t*(183) = 5.85, *p* < 0.001), lower driving frequency (*β* = −0.41, *t*(183) = −8.03, *p* < 0.001) and longer average driving duration (*β* = 0.44, *t*(183) = 4.77, *p* < 0.01). More frequent highway usage is related to a better takeover quality (*β* = 1.17, *t*(183) = 8.4, *p* < 0.001). Situation familiarity is no longer significant in the multiple linear regression. Similarly, the objective complexity is only significant at the .1 level (*β* = 0.09, *t*(183) = 1.95, *p* < 0.1). The sub-scales temporal demand, effort and frustration from subjective complexity contribute to the multiple linear regression. The higher the temporal demand (*β* = 0.09, *t*(183) = 3.16, *p* < 0.01) and frustration (*β* = 0.1, *t*(183) = 3.19, *p* < 0.01), the lower the resulting takeover quality. The more effort is spent during a takeover, on the other hand, the better the resulting quality (*β* = −0.14, *t*(183) = −4.67, *p* < 0.001). In contrast to the simple linear regressions, the multiple linear regression shows that the combination of the above-mentioned variables gives a better understanding of how they influence takeover quality (Figure 6). The regression results can be used to compute predictions of takeover quality, depending on the input data that are available.
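In principle, the reported coefficients can be plugged into a linear predictor as the closing sentence suggests. The sketch below is hypothetical: the intercept and the exact coding of each predictor are not reported in the paper, so both are placeholders, and only the β values are taken from the text:

```python
# Illustrative linear predictor of TOC ratings (lower = better takeover)
# built from the reported multiple-regression coefficients. The intercept
# and predictor codings are placeholders, not values from the paper.

BETA = {
    "defensive_driving_style": 0.77,  # more defensive -> worse takeover
    "driving_frequency": -0.41,       # lower frequency -> worse takeover
    "driving_duration": 0.44,
    "highway_usage": 1.17,            # coefficient copied verbatim; coding unknown
    "objective_complexity": 0.09,     # significant only at the .1 level
    "temporal_demand": 0.09,
    "frustration": 0.10,
    "effort": -0.14,                  # more effort -> better takeover
}
INTERCEPT = 0.0  # placeholder: not reported in the paper

def predict_toc(features: dict[str, float]) -> float:
    """Linear prediction: intercept + sum of beta_i * x_i for given features."""
    return INTERCEPT + sum(BETA[name] * value for name, value in features.items())
```

Such a predictor is what a user profile plus in-vehicle sensing would feed at runtime; with real codings and the fitted intercept, it would reproduce the model behind Figure 6.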

**Figure 6.** Multiple linear regression results for stable and situational variables on takeover quality. *β* coefficients indicate the slope of the relationship in the multiple regression (significance codes: 0 '\*\*\*' 0.001 '\*\*' 0.01 '\*' 0.05).

#### **4. Discussion**

Results show that a combination of stable and situational variables can explain 58 percent of the variance in takeover quality. This new finding is important for the development of highly automated driving. Depending on the variables that can be assessed, a prediction of the takeover quality can now be made, and cognitive assistance systems for highly automated driving can be adapted accordingly. In a user profile, for example, the stable driver characteristics can be stored and used for predictions. Based on previous rides, the profile can adapt and store information about the driver's familiarity with certain situations. In combination with that, the sensors of highly automated vehicles can provide information about the objective complexity of the current traffic environment. In contrast to these variables, measuring subjective complexity is more challenging. To integrate subjective complexity measurements into such a system, a faster and more easily manageable measurement method than the NASA-TLX rating sheet is needed. Subjective complexity could be measured via eye-tracking (e.g., saccade distance [19], fixation times [20]) or physiological data (e.g., heart rate [21], skin conductance [22]). However, this requires either that eye-tracking is supported in the corresponding vehicle or that the driver wears a smart-watch featuring health tracking; considering current trends, both are likely to become common. By integrating eye-tracking or physiological data, information about the current subjective complexity can be collected. In combination with measurements of the other situational and stable variables, good predictions about the current situation and the driver's state can be made. Based on this, cognitive assistance can support the driver during a takeover situation: vehicle dynamics and the HMI can be adapted to increase the takeover quality. The results of the study provide a solid basis of variables that are relevant for the takeover quality.
For future research, it is important to consider further potentially relevant variables, such as distracting non-vehicle objects in the environment (e.g., traffic signs, the roadside environment), the current stress level and vigilance. Furthermore, investigating eye-tracking and physiological measurement methods for capturing subjective complexity is important. If these methods can measure subjective complexity validly, a next step is made towards cognitive assistance systems that can be adapted to the needs of the individual at hand.

#### **5. Conclusions**

In sum, 58 percent of the variance in takeover quality can already be explained by the variables observed in this study: the stable variables driving style, driving frequency, driving routes and driving duration, as well as aspects of the situational variable subjective complexity. Objective complexity and familiarity did not become significant in the multiple regression analysis but show a significant impact when considered separately; future research should therefore still consider these variables. Stable variables can easily be stored in a user profile. Situational variables, on the other hand, have to be updated and integrated continuously. Different measurement methods have to be used and their outputs combined to validly capture the situational variables. One such combination could be high traffic density (high objective complexity), a high heart rate or skin conductance level and a low saccade distance (high subjective complexity), low familiarity and a defensive driving style. Based on this combination, cognitive assistance would support the driver with relevant information (e.g., a projection of the optimal driving trajectory) but suppress irrelevant information (e.g., radio or weather information). In addition, the automation would adapt vehicle parameters, such as decelerating while handing over, or hand over step by step (e.g., first the lateral dynamics, i.e., steering, then the longitudinal dynamics, i.e., acceleration and deceleration). This process has to be very fast, as takeover times are short and cognitive assistance has to be given as soon as possible. This paper provides an important selection of relevant variables that influence takeover quality. Building on this, valid and fast measurement methods for the situational variables should be considered, and further variables that influence the takeover quality identified. Then, cognitive assistance can be developed, individualized and adapted instantaneously.
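The adaptation logic sketched in the conclusion can be expressed as a simple decision rule. Everything below (thresholds, field names, the staging policy) is invented for illustration and is not the authors' system:

```python
# Hypothetical decision rule for the cognitive-assistance adaptation described
# in the conclusion: suppress non-driving information and stage the handover
# when the combined indicators point to a difficult takeover.

from dataclasses import dataclass

@dataclass
class TakeoverContext:
    relevant_vehicles: int      # objective complexity (count from vehicle sensors)
    heart_rate_elevated: bool   # physiological proxy for subjective complexity
    saccade_distance_low: bool  # eye-tracking proxy for subjective complexity
    familiarity: int            # prior exposures to this kind of situation
    defensive_style: bool       # stable variable from the user profile

def assistance_plan(ctx: TakeoverContext) -> dict[str, bool]:
    """Return which adaptations to activate; thresholds are illustrative."""
    demanding = (
        ctx.relevant_vehicles >= 3                               # high objective complexity
        and (ctx.heart_rate_elevated or ctx.saccade_distance_low)  # high subjective complexity
        and ctx.familiarity < 2                                  # low familiarity
    )
    return {
        "show_trajectory": demanding,          # project the optimal maneuver trajectory
        "suppress_infotainment": demanding or ctx.defensive_style,  # no radio/weather info
        "staged_handover": demanding,          # hand over steering first, then pedals
    }
```

The point of the sketch is the speed requirement stated above: a rule over already-available profile and sensor values can fire within the short takeover window, whereas questionnaire-based measures cannot.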

**Author Contributions:** The individual contributions have been distributed as listed in the following (M.S.L.S., K.Z., N.R.): conceptualization, K.Z., N.R. and M.S.L.S.; data curation, M.S.L.S.; formal analysis, M.S.L.S.; investigation, M.S.L.S.; methodology, K.Z., N.R. and M.S.L.S.; project administration, M.S.L.S.; software, M.S.L.S.; supervision, K.Z. and N.R.; visualization, M.S.L.S.; writing—original draft preparation, M.S.L.S.; writing—review and editing, N.R. and M.S.L.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** I wish to acknowledge the help provided by my supervisor Michael Schulz at Robert Bosch GmbH for the support with the simulator, my department at TU Berlin and my second supervisor Klaus Bengler from the TU Munich. This work is part of the public promoted project PAKoS in which the Robert Bosch GmbH participated.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Repeated Usage of an L3 Motorway Chauffeur: Change of Evaluation and Usage**

#### **Barbara Metz \*, Johanna Wörle, Michael Hanig, Marcus Schmitt and Aaron Lutz**

WIVW GmbH, 97202 Veitshöchheim, Germany; woerle@wivw.de (J.W.); hanig@wivw.de (M.H.); schmitt@wivw.de (M.S.); aaronlutz@gmx.de (A.L.)

**\*** Correspondence: metz@wivw.de

Received: 8 January 2020; Accepted: 14 February 2020; Published: 18 February 2020

**Abstract:** Most studies on users' perception of highly automated driving functions focus on first contact/single usage. Nevertheless, it is expected that this perception, as well as the acceptance and usage of automated driving functions, changes with repeated usage (behavioural adaptation). Changes can occur in drivers' evaluation, in function usage and in drivers' reactions to take-over situations. In a driving simulator study, N = 30 drivers used a level 3 (L3) automated driving function for motorways during six experimental sessions. They were free to activate/deactivate the system as they liked and to spend the driving time on self-chosen side tasks. Results show an increase in experienced trust and safety, together with an increase in time spent on side tasks, between the first and fourth sessions. Furthermore, the attention directed to the road decreases with growing experience with the system. The results are discussed with regard to the theory of behavioural adaptation. They indicate that the adaptation of acceptance and usage of the highly automated driving function occurs rather quickly. At the same time, no behavioural adaptation in the reaction to take-over situations could be found.

**Keywords:** behavioural adaptation; SAE L3 motorway chauffeur; system usage; acceptance; attention; secondary task

#### **1. Introduction**

As discussed in the media, vehicle manufacturers plan to introduce (partly-) self-driving cars in the (near) future. According to the classification of the Society of Automobile Engineers (SAE) [1], drivers will be allowed to use the time while the system is active for non-driving related activities (NDRAs) starting from level 3 automation onwards (L3, conditional automation). By definition, in L3, all aspects of the driving task are executed by the automated driving system (ADS). Consequently, with L3 ADS, the role of the driver fundamentally changes when compared to manual driving. Even though the driver remains the fall-back option in the event of system limits, there is no need to monitor the driving environment or the system's performance while the ADS is driving. The driver is allowed to engage in NDRAs, such as browsing the internet or watching movies. However, in the event of a take-over request (TOR) by the system, the driver has to be able to retake control of the vehicle within a certain time frame. Therefore, from the driver's perspective, L3 is the first level of automation where vehicle automation can be experienced as completely self-driving within system boundaries with all the expected benefits.

The H2020 EU-funded project L3Pilot deals with L3/L4 vehicle automation (https://www.l3pilot.eu/). The overall objective of L3Pilot is to test and study the viability of automated driving as a safe and efficient means of transportation and to explore and promote new service concepts to provide inclusive mobility. Besides testing and evaluating current prototype versions of L3/L4 functions in on-road tests, one part of the project deals with the change of drivers' acceptance and usage of L3/L4 systems with repeated usage. Because the real-world systems are still at a prototype stage, this cannot be done in on-road tests, e.g., for safety reasons. Instead, a study in a driving simulator is conducted in which drivers have the possibility to use an L3 motorway ADS during several drives. The main goals of this study are to gain insight into the change of user attitudes and trust with repeated experience of an L3 function, to identify user and situational factors that affect driver behaviour and acceptance, and to investigate changes in driver behaviour in terms of engagement in non-driving related activities, take-over performance and mode awareness.

#### *1.1. Behavioural Adaptation*

With increasing vehicle automation, the role of the driver changes fundamentally. On a technological level vehicle automation is progressing rapidly. Still, there are challenges regarding the interaction between human drivers and automated systems. This includes the impact of automated systems on the driver's mental workload, situation awareness and acceptance of automated driving, as well as trust and reliance issues [2]. Another important aspect that has to be considered is that drivers may change their behaviour due to automation. This phenomenon is referred to as "behavioural adaptation", which is defined as "behaviours which may occur following the introduction of changes to the road–vehicle–user system and which were not intended by the initiators of the change" [3]. It is known that in the past, many road safety measures did not have the expected safety benefit in terms of a decrease in accidents. It is argued that drivers react to a lower safety risk, for instance, by increasing their traffic intensity and increasing their travel speed [4]. The first theory explaining changes in driver behaviour following changes in the vehicle or road infrastructure was the theory of risk homeostasis [5]. The theory assumes that individuals have a target level of accepted risk in driving situations and that they adjust their behaviour such that the perceived level of risk matches their target level of risk. After the introduction of safety-promoting vehicle technology, drivers might increase risky driving behaviour to adjust their perceived level of risk to their original target level. More recent theories shift away from the concept of 'risk perception' as an underlying mechanism. Two more recent models of behavioural adaptation might explain drivers' behaviour when using advanced driver assistance systems (ADAS) or ADSs: The driver-in-control model [6] considers the driver and the vehicle as a unit, the joint driver–vehicle system. 
The model describes a cycle of intentions, actions and outcomes. Probably the most applicable model in the context of automated driving is the qualitative model of behavioural adaptation [7], which is specially designed to explain behavioural adaptation to in-vehicle driver assistance systems. One main factor in this model is trust in the system. Trust is affected by the driver's personality (especially the variables locus of control and sensation seeking). The model seems highly applicable since there is evidence suggesting that trust is one of the main factors in driver behaviour when using ADSs [8,9].

The manifestation of behavioural adaptation highly depends on the system functionality. For instance, it was found that when using adaptive cruise control (ACC), drivers reacted more slowly to a hazardous situation and deviated more from their lane position than when driving without ACC [7], and that drivers increased their maximum speed when provided with a congestion tail warning. In addition, drivers' engagement in a secondary task increased when driving with a congestion tail warning system [10].

Compared to the investigation of behavioural adaptation for ADAS, a slightly different approach needs to be chosen for studying behavioural adaptation for ADSs. During automated driving, driving parameters such as speed or steering behaviour do not depend on the driver but rather on the system implementation. Hence, simply comparing these parameters between driving with the system and manual driving is not applicable. Therefore, other indicators need to be defined to measure behavioural changes with highly automated driving systems, e.g., in terms of lane choice and secondary task engagement [11]. It is also reported that drivers visually focus less on the centre of the road when driving in the automated mode.

For ADAS, six high-level categories of changes in drivers' behaviour have been proposed to study behavioural adaptation [12]. These categories also seem relevant when investigating behavioural adaptation to ADSs. Behavioural changes are defined in terms of:


It can be hypothesized that behavioural changes on different levels are interconnected. An increase in trust in the ADS (attitudinal change—acceptance), for instance, may lead to a higher willingness to engage in secondary activities (cognitive change—prioritizing), which could then lead to a decreased perception of the environment (driver state change—attentiveness) or a decreased performance in case of a take-over request (TOR) (performance change—driving). Such links must be considered in the assessment of behavioural adaptation to ADSs.

#### *1.2. Usage and Evaluation of ADS*

One of the major preconditions of acceptance and usage of ADSs is the driver's trust in the system. If drivers do not trust the automation, they will not use it (*disuse*). If, on the other hand, drivers over-rely on the automated system, this might lead to decision errors, for example in terms of not responding appropriately to takeover requests (TORs) [13]. Increased acceptance of ADSs can already be found after the first drive. Drivers who have experienced crashes or safety-critical situations report lower trust levels [9]. Trust is therefore closely tied to the perceived reliability of an automated system: if the perceived reliability increases, trust is likely to increase as well.

The acceptance of an ADS is also highly related to its perceived usefulness, which might increase along with the increasing automation level. When drivers are not required to monitor the system's performance and are allowed to engage in other activities, they will perceive the system as more useful. Several surveys have been conducted on the non-driving related activities (NDRAs) which drivers want to engage in while driving in the automated mode. The perceived usefulness of the ADS depends on the extent to which drivers are able to perform these activities [14]. NDRAs that drivers would like to engage in include eating, interacting with passengers, phoning, observing the scenery, emailing, etc. [15].

Another relevant aspect arises from the rather passive role of the driver while driving with an ADS: fatigue or sleepiness. Due to the monotony of being driven by the car, drivers experience fatigue much earlier than in manual driving, and at much higher levels [11,16]. The fatigue that builds up during highly automated driving might, in extreme cases, even cause the driver to fall asleep while driving in automated driving (AD) mode. In a simulator study on the assessment of trust in automation, two participants fell asleep while driving in AD mode [9].

#### *1.3. Change with Repeated Usage*

Studies investigating user experiences of ADAS and ADSs mostly assess drivers' behaviour and attitudes when they first encounter the new technology. In most studies, for practical reasons, only the first 1–2 h of using a new technology are investigated. However, it is likely that after some time of using the system and experiencing its behaviour in various use cases, drivers will adapt their behaviour accordingly. Changes with repeated usage are nevertheless assessed very rarely, since such research is rather complex and expensive.

Theories on behavioural adaptation distinguish different phases. The learning process is crucial for drivers to gain an appropriate understanding of the system's functionality as well as its limits, and it helps to build an appropriate level of trust. This process takes some time and requires experiencing the system in different situations and environments. Two phases in the learning process are suggested: in the "learning phase", the driver learns how to operate the system, identifies system limits and internalizes the system functionality. This phase heavily depends on the way the system is introduced to the driver. In the second stage, the "integration phase", the driver integrates the system into the management of the overall driving task through increasing experience in different situations [17].

When ADAS were tested in the AIDE project [17], the focus was on directly observable behavioural changes due to the ADAS, mainly in terms of changes in driving parameters. When assessing L3 ADSs, however, the approach must be adapted. Since the vehicle is controlled by the ADS most of the time, changes in human driving behaviour can only be assessed to a limited extent. Attitudes towards automation, on the other hand, can change dramatically over time, for instance when the system is experienced in different traffic situations.

The term 'behavioural adaptation' is said to have an inherent association with time, because it suggests that changes in behaviour result from being exposed to, e.g., a certain ADAS/ADS and experiencing it in different situations [18]. From a methodological point of view, it is therefore crucial to consider not only a single usage of a system but a sufficiently long exposure. The question is: how long is long enough to capture behavioural adaptation? For the investigation of ADAS (like ACC or lane departure warnings), a few hours to a few weeks are considered short-term usage, whereas long-term usage is meant to last at least 6 months [18].

In another approach, five phases of behavioural adaptation to ADAS are distinguished with defined durations [12]:


The *First encounter phase* depends greatly on how intuitive and self-explanatory the human–machine interface (HMI) is. The *Learning phase* still depends highly on the HMI, especially in terms of the required system input. The temporal dimension of the learning phases is empirically supported by studies on, e.g., electronic speed control [7]. The *Trust phase* is mainly characterized by a shift in the locus of control [19] from the driver to the vehicle; related problems might be overreliance, passivity and drowsiness. In the *Adjustment* and *Readjustment* phases, drivers adjust their adapted behaviour depending on their experience of (critical) situations and system limitations. It can be expected that trust plays an important role in behavioural adaptation to ADSs and, indeed, in the overall acceptance of the system. According to Muir [20], trust depends on the degree of experience with automation and can thus be expected to change over time.

The durations given in the literature for the different phases of behavioural adaptation relate to the time period during which an equipped vehicle is available to the driver and the system can be used. The required period of actual system usage can be expected to be much shorter. From the literature, it is not known how many hours of driving with an active system, or how many occurrences of a certain system intervention or warning, are needed to study behavioural adaptation. For our research, it also has to be considered that the phases defined in Martens and Jenssen [12] refer to behavioural adaptation to ADAS, not to ADSs. The learning phase for a system that only intervenes very occasionally can be expected to be much longer than for a system whose behaviour can be experienced continuously by the driver. For ADSs, the learning process can therefore be expected to be much faster. It seems likely that the phases of behavioural adaptation defined by Martens and Jenssen [12] only give a rough estimate and do not directly apply to behavioural adaptation to high automation.

One study is known that investigated secondary task engagement during highly automated driving from a repeated usage perspective. Six drivers undertook five 30-min journeys with a highly automated system in a driving simulator. They were encouraged to use the system just as they would in a real automated vehicle, and were asked to bring any objects or devices that they would be willing to engage with during the drives. The most common activities during the drives were reading articles or magazines, using mobile devices for social networking, web browsing and watching programmes or films on a laptop. Although the study was set up with a focus on repeated usage, no findings on changes in behaviour over time were reported [21].

#### *1.4. Objective*

In summary, there is literature that discusses the concept of behavioural adaptation, especially with a focus on the usage of ADAS. But even for ADAS, studies investigating behavioural adaptation are rare, probably because such research is time-consuming and expensive. For L3/L4 ADSs, it seems reasonable to assume that behavioural adaptation will have a relevant impact, e.g., on function usage, as soon as such functions are on the market. Nevertheless, experimental results on this topic are still lacking. The aim of our research is to study behavioural adaptation to an L3 motorway ADS with repeated usage. The focus is on


Due to practical limitations, it is not possible to study the full process of behavioural adaptation, where changes are still expected to occur even after several months. Instead, the focus is on the beginning of this process, including the first encounter, the learning phase and perhaps the beginning of the trust phase. During the learning phase, some change in behaviour is still expected; for the trust phase, a more constant level is expected.

#### **2. Materials and Methods**

The study was conducted in the high-fidelity moving-base driving simulator of WIVW GmbH (see Figure 1). The mock-up consisted of a production-type BMW 520i. The motion system had six degrees of freedom and could display linear accelerations of up to 5 m/s². All vehicle dynamics and noises were displayed realistically. The simulation software was SILAB® Version 6.0 (WIVW GmbH, Veitshöchheim).

**Figure 1.** The high-fidelity driving simulator at WIVW.

Drivers were invited to participate in a study on the long-term effects of an L3 motorway ADS (L3ADS). The study consisted of six drives on a motorway during which the L3ADS could be used. The drives took place on six different days. In all drives, the drivers were free to use the L3ADS as they liked, meaning they could activate and deactivate it and attend to NDRAs as they wished. Drivers were instructed that while in the automated mode, they were not required to pay attention to the driving task and were allowed to engage in other activities. However, when the system issued a TOR, they had to retake control of the vehicle and were responsible for the driving task. For the description of the system and the responsibility of the driver, the actual wording of §1b of the German Road Transport Law [22], defining the responsibility of the driver when driving with an ADS, was used.

#### *2.1. System Implementation*

The study focused on acceptance, evaluation and usage of an L3ADS by ordinary, non-professional drivers. Therefore, participants tested a simulated L3ADS that worked realistically in the motorway scenarios included in the six test drives. The system was implemented based on the descriptions of L3 motorway systems to be tested in the on-road tests of L3Pilot [23]. It was designed to work in the driving scenarios tested in the study by using controllers already available in SILAB®.

The L3ADS had an operational design domain (ODD) that is similar to the ODD of highway ADSs tested in the on-road experiments in L3Pilot [23]:


#### *2.2. Test Scenarios*

Four of the six experimental drives had a duration of 30–35 min (drive 1 to drive 4 in Table 1). In those drives, care was taken that the driving environment was not too monotonous and that traffic density and driving situations changed within and between the drives. As can be seen in Table 1, all four drives contained sections with low traffic density and changing speed limits (in three of them, also unlimited sections), and three of the four drives contained traffic-jam sections lasting between five and ten minutes. The number of TORs varied between two and five per drive. Reasons for TORs were missing lane markings, approaches to construction sites, highway intersections and, at the end of every drive, the approach to the exit. Table 2 gives more details on the takeover scenarios. All scenarios represented common, non-critical driving situations. Very critical, unusual or rare scenarios were avoided because the focus of the study was on simulating the potential everyday usage of the L3ADS.


**Table 1.** Content of the six experimental drives. Results from the drives in bold are included in this paper. The two other sessions are excluded to avoid confusion of the effects of repeated usage and driver state.

**Table 2.** Description of the analysed takeover scenarios.


The two other drives were longer (90 min) and more monotonous, one of them taking place at 6 a.m. Those two drives were included to study specific hypotheses on driver state, which will be presented elsewhere (in preparation). The order of the drives was varied between participants to avoid sequence effects. The two monotonous drives always took place in the third and the fifth session.

#### *2.3. Data Logging*

Most of the methods used were defined in the common methodological approach of L3Pilot, which will also be used for the project's on-road tests [24]. This specifically relates to the questionnaire developed within L3Pilot, which assesses aspects like acceptance, perceived safety, trust and workload. The questionnaire was designed for on-road tests where drivers have the opportunity to test an L3/L4 system once. It consisted of a pre-drive questionnaire in which demographic information, as well as prior experience with in-vehicle systems, was collected. The post-drive questionnaire assessed the evaluation of the tested system through a mixture of standardized items (e.g., [25]) and items specifically tailored to the project questions of L3Pilot. The specifically developed questionnaire items mostly consisted of a statement with which the participants could agree or disagree on a 5-point scale (for an example, see Figure 2a). In the present study, the pre-drive questionnaire was administered once at the beginning of the first session. The full post-drive questionnaire was filled in after the 1st and the 6th session; a shortened version was used after sessions two to five. Directly after every TOR, drivers rated the criticality of the previous driving situation on a ten-point scale ranging from harmless to uncontrollable, with intermediate steps of unpleasant and dangerous (based on [26], see Figure 2b). The rating related to the TOR itself and to the following driving scenario (e.g., driving through a construction site).


**Figure 2.** Example of questionnaire items used to assess concepts like acceptance and trust (**a**) and the scale used to assess the experienced criticality of takeover scenarios (**b**).

Furthermore, during all sessions a variety of objective parameters was logged:


#### *2.4. Procedure*

In the introductory session, drivers were informed about the schedule of their test drives. Before each session, they knew the length of the upcoming trip and were informed that they were free to prepare for the drive as they liked. This meant, for instance, that they could bring something to read, something to eat, or prepare other potential side tasks to fill the time of the automated drive. Besides being free to attend to side tasks while in automated mode, drivers were also free to use the system as they liked. This meant that they were allowed to override the system or deactivate it in situations where they preferred to drive manually. Table 3 gives an overview of the six experimental sessions.


**Table 3.** Overview of the content of the six sessions of the experiment. Results from the sessions in bold are included in this paper. The two other sessions are excluded to avoid a confusion of the effects of repeated usage and driver state.

#### *2.5. Sample*

The study was conducted with N = 31 drivers (mean age = 37, sd = 11.75); 58% of the sample were male. Nearly 70% of the sample had held their driving license for at least 10 years. In the pre-questionnaire, participants stated on average that driving on a highway is neither difficult nor stressful for them, but also that they do not enjoy driving on motorways; 42% of the sample stated that they drive on a highway at least 1–2 times per week, and 10% that they are stuck in highway traffic jams with the same frequency of 1–2 times per week. All participants had completed an extensive driving simulator training before participating in the study, in order to avoid learning effects and simulator sickness.

#### *2.6. Data Analysis*

To investigate drivers' performance in takeover scenarios, two approaches were chosen:


Reaction times and TOC-rating were only analysed for situations where drivers took control back after a takeover request was issued by the L3ADS.


**Table 4.** Scale for take-over controllability (TOC)-rating which was used to evaluate takeover performance.

For all questionnaire items, general agreement or disagreement was evaluated with one-sample t-tests against zero (the neutral midpoint of the scale). Results are reported for the evaluation of the system after the first and after the sixth session.
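Such a one-sample t-test against the neutral scale midpoint can be sketched as follows. This is a minimal illustration only; the -2 to +2 coding of the 5-point agreement scale and the function name are assumptions, not the study's actual implementation:

```python
import math
from statistics import mean, stdev

def one_sample_t(ratings, mu=0.0):
    """One-sample t-test of the mean agreement rating against a reference.

    Assumes (illustratively) that the 5-point agreement scale is coded
    -2 (fully disagree) to +2 (fully agree), so mu=0 is the neutral
    midpoint. Returns the t-value and degrees of freedom; the p-value
    can then be looked up in a t-table or a statistics package.
    """
    n = len(ratings)
    t = (mean(ratings) - mu) / (stdev(ratings) / math.sqrt(n))
    return t, n - 1
```

For example, ratings of `[1, 2, 1, 0, 2, 1]` yield a clearly positive t-value with df = 5, i.e., agreement with the statement.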

To investigate the behavioural adaptations with repeated usage, the changes over experimental sessions were analysed for the following parameters:


For statistical testing of the effect of repeated usage, repeated measures ANOVAs were calculated with time (session) as a within-subject factor. To avoid mixing the effects of repeated usage with effects of driver state (which was experimentally manipulated in the two monotonous drives), only the four shorter drives are included in the analysis of behavioural adaptation. These drives always took place in the first, second, fourth and sixth experimental sessions. In the results section, graphs show means and 95% confidence intervals.
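A one-way repeated measures ANOVA with session as the within-subject factor can be sketched in pure Python as follows. This is an illustrative sketch of the computation, not the study's actual analysis pipeline (which presumably used a statistics package):

```python
from statistics import mean

def rm_anova(data):
    """One-way repeated-measures ANOVA (within-subject factor: session).

    data[i][j] is the measure for driver i in session j (illustrative
    layout). Partitions total variability into subject, condition and
    error components; returns (F, df_effect, df_error).
    """
    n = len(data)      # number of drivers
    k = len(data[0])   # number of sessions
    grand = mean(v for row in data for v in row)
    ss_subj = k * sum((mean(row) - grand) ** 2 for row in data)
    ss_cond = n * sum((mean(row[j] for row in data) - grand) ** 2
                      for j in range(k))
    ss_tot = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_tot - ss_subj - ss_cond   # residual after removing subjects
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    f_value = (ss_cond / df_cond) / (ss_err / df_err)
    return f_value, df_cond, df_err
```

The resulting F-value is then evaluated against the F-distribution with (df_effect, df_error) degrees of freedom, matching the F(3, …) statistics reported below for four sessions.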

#### **3. Results**

#### *3.1. Evaluation of the L3 Motorway ADS*

For the more general statements about the L3ADS, there is either a clear general agreement or disagreement (see Figure 3 and Table 5). Drivers state that they would use the system, recommend it and trust it. Furthermore, driving with the L3ADS was rated as comfortable and fun; drivers did not evaluate it as demanding, stressful or difficult, and they felt safe while driving with the system. Overall, drivers evaluate the L3ADS positively.

A significant change of drivers' evaluation with repeated usage occurs for the statements "driving was stressful" (F(3, 84) = 2.99, *p* = 0.03536), "I felt safe driving with the system" (F(3, 84) = 5.54, *p* = 0.00161), "I trust the system" (F(3, 81) = 3.87, *p* = 0.01221), "I would use this system" (F(3, 84) = 3.16, *p* = 0.02882) and "using the system was fun" (F(3, 84) = 3.06, *p* = 0.03260). Post-hoc tests show that with repeated usage, there is an increase in trust and perceived safety, together with a decrease in subjective stress, which is most pronounced during the fourth drive. Afterwards, there is again a decrease in expressed trust. Experienced fun is most pronounced during the second drive.

**Figure 3.** Drivers' agreement with general statements about the L3-motorway automated driving system (ADS) (L3ADS). \* marks statements with a significant effect of the session.

**Table 5.** Results of t-tests evaluation agreement or disagreement with the questionnaire items for evaluation of the ADS. The table gives mean (m), standard deviation (sd), number of included participants (N), t-value, degrees of freedom (df) and *p*-value. Significant t-tests are marked in bold.


#### *3.2. Usage of L3 Motorway ADS*

The overall positive evaluation of the system is reflected in system usage: the L3ADS is actually activated during 90% of the time it is available (see Figure 4a). There is no change of system activation with repeated usage (F(3, 90) = 1.03, *p* = 0.38470). Instead, the increase of trust is reflected in a significant increase of engagement in NDRAs (F(3, 90) = 5.87, *p* = 0.00104), from 68% during the first session to about 80% in the following sessions. The significant increase in manual NDRAs (F(3, 90) = 7.95, *p* = 0.00009) is even more pronounced: from 32% of driving time in session one, over 40% in session two, up to 59% in session four and 63% in session six.

**Figure 4.** The proportion of time L3ADS was activated, drivers attended to non-driving related activities (NDRAs), drivers attended to NDRAs involving manual distraction (**a**) and drivers spent looking on the road (percentage road centre, PRC) during all time with L3ADS active and during time L3ADS was overtaking other vehicles (**b**).

With the increase of manual NDRAs while driving with the L3ADS activated, the proportion of glances directed to the road decreases (F(3, 90) = 5.79, *p* = 0.00115, see Figure 4a). There is a decrease between sessions one and two and a further decrease during session four; afterwards, the percentage road centre (PRC) stays at a constant level. PRC decreases from 30% of the time with the system active in session one to 20% in sessions four and six. The decrease is similar for situations where the L3ADS overtakes other vehicles (including lane changes) and for situations where the L3ADS follows its own lane. However, during overtaking manoeuvres, drivers' gaze is directed to the road during, on average, 5% more of the driving time compared to lane following (F(1, 30) = 12.073, *p* = 0.00158, see Figure 4b). Therefore, with repeated usage of the L3ADS, drivers' willingness to engage in other activities and to draw attention away from the driving environment increases, but the situational differences remain unchanged.
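A gaze measure like the percentage road centre (PRC) can be computed from logged gaze samples along the following lines. This is a minimal sketch: the 8° radius around the road-centre point and the gaze-angle coordinates are assumptions for illustration, not the study's actual operationalization:

```python
import math

def percent_road_centre(gaze, centre=(0.0, 0.0), radius_deg=8.0):
    """Percentage road centre (PRC): share of gaze samples falling
    within a circular area around the road-centre point.

    gaze: list of (x, y) gaze angles in degrees (assumed coordinate
    system); the 8-degree radius is a common but here assumed choice.
    """
    cx, cy = centre
    hits = sum(1 for x, y in gaze
               if math.hypot(x - cx, y - cy) <= radius_deg)
    return 100.0 * hits / len(gaze)
```

Averaging this per driver and session would yield the per-session PRC values compared above.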

#### *3.3. Driver State with L3 Motorway ADS*

The measurable behavioural changes are reflected in the subjective evaluation as well (see Figure 5): over the sessions, drivers agreed significantly more strongly with the statement "I use the time to do other activities" (F(3, 78) = 6.38, *p* = 0.00063) and significantly less with the statement "I monitored the environment more than in manual driving" (F(3, 84) = 8.40, *p* = 0.00006). For both statements, the change is most pronounced after the first session.

Drivers agree significantly with the statement "driving with the system would make me tired" (see Table 5). This subjective impression is supported by the comparison of fatigue ratings on the Karolinska Sleepiness Scale (KSS) taken directly before and after the drives. There is a significant increase in fatigue (F(1, 26) = 17.71, *p* = 0.00027) of about 0.6 scale points on average for the four drives.

**Figure 5.** Drivers' agreement with statements about the effect of L3ADS on drivers' state. \* marks statements with a significant effect of the session.

#### *3.4. Take-Over Situations*

Drivers agree significantly with the statements "during take-overs I felt safe", "it was obvious to me why take-over requests occurred", "take-overs were warned appropriately" and "take-overs were with sufficient time" (see Table 5). For none of the statements on takeover situations is there a significant change in the evaluation with repeated usage.

Within the four drives, the frequency and reasons of TORs varied. Overall, the majority of take-over situations are experienced as harmless or unpleasant (see Figure 6a). N = 7 out of 433 situations are rated as dangerous, but in four of these situations, drivers took control back even before a TOR was issued by the system. In those cases, the rating mostly relates to the following driving situation, which in the two most critical situations was a highway intersection with traffic.

**Figure 6.** Experienced criticality of take-over situations (**a**) and criticality and proportion of take-over before take-over request (TOR) split by situation type (**b**).

As can be seen in Figure 6b, there are situations in which control is taken back quite frequently before a TOR actually occurs (exit and highway intersection), because these system limits are announced by the navigation system before a TOR. These situations are rated as less critical than situations without a pre-announcement, such as TORs before a construction site, before roadworks or because of missing lane markings (F(4, 108) = 8.12, *p* = 0.00001).

To analyse behavioural adaptation to TORs, take-over situations are averaged per driver and driving session, separately for situations where drivers take control back before or after a TOR. For subjective criticality, there is a significant interaction between the type of take-over situation and session (F(3, 100) = 3.20, *p* = 0.02671, see Figure 7). During the first session, experienced criticality is similar for situations where drivers take control back before and after a TOR. After the first session, situations are rated as less critical when the driver takes control back before the system issues a TOR. There is no change in the evaluation of situations where control is taken back after a TOR.

**Figure 7.** Experienced criticality in a situation where drivers took control back before and after a TOR.

For situations where drivers react after the TOR, the TOC-rating and reaction times are analysed (see Figure 8). The time it takes until drivers look at the road (eyes on road) is shorter than one second in all sessions and does not change with repeated usage (F(3, 72) = 0.26, *p* = 0.85355, Figure 8b, lowest parameter). It takes between two and three seconds until drivers put their hands on the wheel (Figure 8b, middle parameter) and between three and four seconds until the L3ADS is deactivated and the driver starts driving manually (Figure 8b, uppermost parameter). For the time until drivers put their hands on the wheel there is a tendency (F(3, 87) = 2.51, *p* = 0.06424), and for the time until control is actually taken back there is a significant (F(3, 87) = 4.51, *p* = 0.00547) change over time. For both parameters, the effect is based on an increase of reaction times during the second session. This pattern resembles the results for the TOC-rating: descriptively, there is an increase in the average TOC-rating in session two, which means a worsened takeover performance. Nevertheless, this change is not significant (F(3, 87) = 1.3382, *p* = 0.26723). In all sessions, between 31% and 42% of all takeover reactions are rated either as perfect or good (1–3 on the scale), with the highest proportion during the first session and the lowest during the second. Between 56% and 69% of takeover reactions are evaluated as containing errors (4–6 on the scale), now vice versa, with session one having the lowest and session two the highest proportion. Overall, there is only one takeover scenario rated as critical, which occurred during session one.

**Figure 8.** Take-over controllability (TOC)-rating for situations where drivers took control back after a TOR (**a**) and reaction times until eyes were on the road, hands were on the steering wheel and control was taken back (**b**).
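The three reaction-time measures (eyes on road, hands on wheel, control taken back) can be derived from logged event timestamps roughly as follows. The event names and the dictionary layout are illustrative assumptions, not the actual SILAB® logging format:

```python
def takeover_times(events, tor_time):
    """Reaction times after a takeover request (TOR), in seconds.

    events: mapping from an (assumed, illustrative) event name to the
    timestamp in seconds of its first occurrence after the TOR.
    Returns each reaction time relative to the moment the TOR was issued.
    """
    return {
        "eyes_on_road": events["first_road_glance"] - tor_time,
        "hands_on_wheel": events["hands_on_wheel"] - tor_time,
        "control_taken": events["ads_deactivated"] - tor_time,
    }
```

Averaging these deltas per driver and session gives the per-session reaction times shown in Figure 8b.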

The pattern of errors occurring in the takeover scenarios remains similar with repeated usage (see Figure 9). Most of the rated errors/imprecisions relate to imprecise lateral control, like jerky steering, too small a lateral safety distance and crossing of lane markings. Furthermore, drivers frequently forget to use the indicator or use it too late. Errors in longitudinal control (like braking too strongly or too late) and errors indicating problems on the decision level (e.g., missing, hesitant or wrong lane changes) are rare.

**Figure 9.** The proportion of takeover scenarios with the different types of errors/imprecision rated in the TOC-rating.

#### **4. Discussion**

In summary, several of the investigated measures change with repeated usage of the L3ADS:

• With repeated usage, drivers trust the function more and feel safer and less stressed.


In the course of the drives, there is no change in the proportion of time that the system is activated. This can be explained by the fact that during the first drive the usage is already very high, with the system being activated more than 80% of the time it is available. This level remains rather stable over the six drives. Therefore, the growing trust in the system is reflected not in an increase in system usage but rather in an increased willingness to engage in NDRAs and to leave the system unsupervised. For most measures, the main increase can be observed between the first and the fourth drive, with the second drive somewhere in between. For the proportion of time the gaze is directed to the road, there is a continuous decrease from session one to session four; during sessions four and six, the level remains stable. Subjective as well as objective measures therefore indicate an increase in trust over the first four drives, after which no further behavioural change can be observed. The results are in line with the model of Martens and Jenssen [12], which describes that after the first encounter, in which the driver first explores the system, a phase of learning starts. In this phase, the driver experiences the system behaviour in different situations or scenarios. Even though the durations stated in the model (1–6 h for the first encounter and 3–4 weeks for the learning phase) do not apply to the results of our study, the phases themselves seem applicable.

Performance changes over time, as assumed by the model of behavioural adaptation [12], were expected in terms of better reactions to TORs. In summary, drivers were able to handle TORs safely and easily within the available timeframe of 15 s. There is a small effect of repeated usage on the reaction time to a TOR, based on increased reaction times in the second drive, which is reflected at least on a descriptive level in the TOC-rating. This pattern does not support the assumption of a learning effect in terms of a constant improvement of take-over performance. However, it has to be considered that the applied take-over situations were easy to handle, which is also reflected in the overall very low subjective criticality. In particular, situations in which drivers received a cue that a take-over situation would occur soon (e.g., the information from the navigation system given before the TOR was issued) were rated as not critical. During the first session, drivers learn to use this pre-announcement to react without time pressure before reaching the system limit, and to take control back even before a TOR is issued by the L3ADS.

For those take-over situations where drivers react after a TOR was issued, there is no change of experienced criticality over time. This is probably due to the time pressure (a TOR announces the oncoming end of the ODD 15 s before it is actually reached) and probably also to the variability and changing complexity of the oncoming driving situations. It might either be that the number of actual TORs experienced in the experimental drives was too low for such an adaptation to take place, or that there is no room for adaptation because the appropriate reaction and its timing are largely pre-defined by the situation itself. For reactions after a TOR, the reaction times for the later parts of the reaction (hands on the steering wheel and control taken back) are delayed during the second session. Whether this indicates a relevant but short change with regard to the concept of behavioural adaptation is questionable.

#### **5. Conclusions**

Investigating behavioural adaptation to ADSs poses high requirements for the study design. Simply comparing driving parameters when using the system with driving without the system, as applied in studies on ADAS (see e.g., [7]), is not applicable for automated driving systems from SAE Level 3 onwards, as this would mean comparing manual driving behaviour to a driving behaviour defined by the automation technology. An alternative approach is to investigate the drivers' behaviour from a temporal perspective when interacting with the system. As described by Martens and Jenssen [12], drivers' behaviour when using an ADAS changes over time. Especially the phase of building trust in the system seems highly critical for explaining changes in the drivers' behaviour. Self-reported trust in the L3ADS in our study increased in the course of the driving sessions: between the first and the fourth drive, an increase in trust in the system was evident. Along with increasing trust, a decrease in monitoring behaviour (decrease in PRC) and an increasing engagement in NDRAs were observed. Even though the causal relation of this development is unclear, it can be assumed that drivers change their monitoring behaviour as well as their engagement in NDRAs due to their growing trust in the L3ADS. Furthermore, the observed changes are in line with the predictions of the theory of risk homeostasis [5]. The increase in subjective trust went along with an increase in perceived safety. The increased trust explains why drivers directed their attention away from the driving environment and engaged in other activities. It can therefore be argued that the drivers kept the overall subjective risk constant.

It seems likely that the progress of behavioural adaptation varies for different aspects of using and handling an L3 system. Since driving with the activated system, seeing the system work and experiencing its advantages accounted for the largest proportion of the total 8 h of driving time, the six sessions seem to be sufficient to investigate changes in drivers' attitudes and also in their decisions regarding handling the activated system and using the driving time. Compared to that, actual TORs are rare and short situations. Furthermore, they often lead to situations that require a situationally adapted reaction from the driver, with little room for behavioural variations. It is likely that TORs were not frequent enough to study behavioural adaptation, especially because they were experienced as being harmless and manageable.

Regarding the different dimensions of behavioural adaptation discussed in the literature, a clear differentiation between cognitive changes and performance changes turned out to be difficult to capture for driving with L3 automation. This is mainly because the driving task is performed by the automation most of the time, and therefore the performance of the driver cannot be measured. What can be measured are the decision to activate the system and how the time with the active system is used; to our understanding, these measures mirror the cognitive decisions of the driver. Also, for driving with ADAS, these two dimensions are probably the ones that interact most, because with mostly manual driving a decision (e.g., to attend to an NDRA) often directly impacts the measured driving performance (e.g., lane-keeping performance). With L3 ADSs, drivers' performance is only measurable in take-over situations where control is handed back to the driver. For situations with a pre-announcement of a system limit (e.g., via the navigation system), experienced criticality decreased in parallel to other measures during the second session. To gain further insight into potential behavioural adaptation in take-over scenarios, more research is needed. It needs to be investigated whether no behavioural adaptation to TORs occurs, e.g., due to the nature of take-over situations (time pressure, varying situational demands), or whether the number of TORs in our study was too low to observe behavioural adaptation.

The approach of operationalizing behavioural adaptation by comparing the driver's attitudes and behaviours at different points in time seems applicable to ADSs: the driver's behaviour when using the system for the first time can be compared to the behaviour when using the system at a later point in time. The open questions are: When do changes in behaviour occur? What is a reasonable period of usage to observe a change? The time span of the five phases of behavioural adaptation to ADAS by Martens and Jenssen [12] is 1–2 years, which was clearly not covered in the presented study. However, the results suggest that for highly automated driving systems this process might be faster. Between the first and the fourth drive, an increase in subjectively reported trust, perceived safety and the willingness to use the system was evident (*attitudinal changes*). The engagement in NDRAs also increased in parallel with reported trust (*cognitive changes*). It seems that especially the learning phase (3–4 weeks) passes much faster, since besides the system handling (activation/deactivation) it was mainly the system limits that had to be learned. Drivers experienced various system limits during the experimental drives, which might have been sufficient for "learning" the system. Obviously, if drivers use the system for an extended period of time and experience it in more diverse situations, they might adjust their behaviour at a later point in time (see *adjustment phase*, [12]). Nevertheless, it can be argued that behavioural adaptation to ADSs seems to occur faster than for ADAS. A longer-term user study on ADSs, preferably in a real driving environment, could yield more insights into further behavioural changes due to ADSs, and would also help to replicate the findings from the presented study. However, such a study requires that L3/L4 ADSs are on the market, or at least available in a market-ready version.

**Author Contributions:** conceptualization, B.M. and J.W.; methodology, B.M. and J.W.; software, M.H.; analysis, B.M.; study conduction, J.W., A.L. and M.S.; writing—original draft preparation, B.M.; writing—review and editing, J.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study is part of the research project L3Pilot (https://www.l3pilot.eu/), which receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 723051.

**Acknowledgments:** The research leading to these results has received funding from the European Commission Horizon 2020 program under the project L3Pilot, grant agreement number 723051. Responsibility for the information and views set out in this publication lies entirely with the authors. The authors would like to thank all partners within L3Pilot for their cooperation and valuable contribution.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Measuring Drivers' Physiological Response to Different Vehicle Controllers in Highly Automated Driving (HAD): Opportunities for Establishing Real-Time Values of Driver Discomfort**

**Vishnu Radhakrishnan 1,\*, Natasha Merat 1, Tyron Louw 1, Michael G. Lenné 2, Richard Romano 1, Evangelos Paschalidis 1, Foroogh Hajiseyedjavadi 1, Chongfeng Wei <sup>1</sup> and Erwin R. Boer <sup>3</sup>**


Received: 2 July 2020; Accepted: 5 August 2020; Published: 8 August 2020

**Abstract:** This study investigated how driver discomfort was influenced by different types of automated vehicle (AV) controllers, compared to manual driving, and whether this response changed in different road environments, using heart-rate variability (HRV) and electrodermal activity (EDA). A total of 24 drivers were subjected to manual driving and four AV controllers: two modelled to depict "human-like" driving behaviour, one conventional lane-keeping assist controller, and a replay of their own manual drive. Each drive lasted for ~15 min and consisted of rural and urban environments, which differed in terms of average speed, road geometry and road-based furniture. Drivers showed higher skin conductance response (SCR) and lower HRV during manual driving, compared to the automated drives. There were no significant differences in discomfort between the AV controllers. SCRs and subjective discomfort ratings showed significantly higher discomfort in the faster rural environments, when compared to the urban environments. Our results suggest that SCR values are more sensitive than HRV-based measures to continuously evolving situations that induce discomfort. Further research may be warranted in investigating the value of this metric in assessing real-time driver discomfort levels, which may help improve acceptance of AV controllers.

**Keywords:** driver state; discomfort; psychophysiology; heart-rate variability (HRV); skin conductance response (SCR); highly automated driving (HAD)

#### **1. Introduction**

In the recent past, there has been increasing interest in implementing vehicles with a range of advanced driver assistance systems (ADAS), fuelled by manufacturers' desire to introduce higher levels of vehicle automation capability [1]. The primary motivation for these implementations is their hypothesised provision of increased road safety, and enhanced mobility, accessibility, efficiency and comfort [2]. According to Carsten and Mertens [3], manufacturers have been using comfort as one of the main selling points for ADAS. Additionally, the comfort of the driver is considered to be a determining factor for the broader acceptance of the automated system [4]. Therefore, it can be argued that, if an automated system can measure driver comfort in real-time, it can adapt its driving style/behaviour to match the drivers' expectations accordingly, and thereby potentially increase acceptance. This could have the additional benefit of reducing unnecessary driver-initiated takeovers, which can otherwise jeopardise the safety of the vehicle and its occupants [5]. This study, conducted as part of the HumanDrive project, considered the effect of a number of road and vehicle-based factors on driver comfort, investigating whether physiological metrics can be used to provide an objective measure of comfort, to help inform the design process when investigating the acceptance of future automated vehicles.

Currently, there is no unanimously agreed-upon definition of comfort. In a general context, Slater [6] (p. 158) described comfort as "a pleasant state of physiological, psychological and physical harmony between human being and the environment". In the context of driving, and especially highly automated driving (HAD), Beggiato et al. [7] (p. 446), defined comfort as "a subjective, pleasant state of relaxation resulting from confidence in safe vehicle operation which is achieved by the absence of uneasiness and distress". Beggiato et al. [7] further suggested this is still a rather broad definition of comfort, and is associated with other concepts, such as stress, mental workload, fear, motion sickness or anger, with stress and mental workload having the closest link to discomfort (i.e., lack of comfort). Siebert et al. [4] argued that it is easier to measure discomfort rather than comfort, since signs of discomfort tend to be more well-defined and pronounced, compared to the un-aroused relaxed state of comfort. Summala [8] proposed four factors that need to be maintained above a certain threshold to keep drivers within their "comfort zone" during manual driving. These are safety margins (to road edges, obstacles or other vehicles), vehicle-road system (accelerations, road geometry), rule-following (obeying traffic laws, maintaining speed limits) and good progress of the trip (meeting one's expectations for the pace or progress of the travel). However, assuming 100% performance of the automated system, Siebert et al. [4] noted that the rule-following factor for comfort is redundant in HAD, as the automated vehicle (AV) will almost certainly follow the rules. They further noted that good progress of the trip depends on traffic conditions rather than on the automation state itself, assuming the route selected by the AV is similar to that in manual driving, where the navigation system decides/recommends the optimal route to be followed.
Therefore, in this paper, we focus specifically on how factors that affect the safety margins, and vehicle-road system, affect driver discomfort, for manual and automated driving.

Summala [8] suggested that sufficient safety margins from potential hazards are required for a driver to feel safe and comfortable. Factors influencing these safety margins, and likely to increase driver discomfort, include situations which increase drivers' stress levels, such as navigating in crowded cities, interactions with other road users, or passing another car/obstacle [9,10].

Comfort is affected by jerk and acceleration forces of the vehicle, with higher accelerations and jerks (in terms of both magnitude and frequency) associated with an increase in discomfort [11–13], and an increase in motion sickness [14]. Drivers tend to keep their lateral and longitudinal acceleration under 2 m/s2 for a comfortable driving experience [15–17]. However, it should be noted that drivers' comfort threshold for lateral acceleration varies with respect to their velocity, with an increase in velocity resulting in lower threshold values for lateral acceleration [17,18]. Within the public transport domain, especially in railway systems, standard acceleration values are limited to under 1.47 m/s2, and jerk values are kept under 0.6 m/s3, to ensure passenger comfort [13,16,19]. However, the acceleration and jerk thresholds used in public transport systems consider both seated and standing passengers. Therefore, it may be permissible to have slightly higher thresholds in HAD, where passengers are typically seated. For instance, Eriksson and Svensson [20] suggested an acceleration and jerk threshold of under 2 m/s2 and 0.9 m/s3 respectively, to ensure a comfortable ride in HAD.
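As a rough illustration of the thresholds above, the sketch below flags samples of a hypothetical acceleration trace that exceed the HAD comfort limits suggested by Eriksson and Svensson [20] (2 m/s² for acceleration, 0.9 m/s³ for jerk). The trace, sampling rate and function names are illustrative assumptions, not taken from the study.

```python
import math

# Comfort thresholds for HAD suggested by Eriksson and Svensson [20].
ACC_LIMIT = 2.0   # resultant acceleration, m/s^2
JERK_LIMIT = 0.9  # jerk, m/s^3

def resultant(ax, ay):
    """Combine longitudinal and lateral acceleration into one magnitude."""
    return math.hypot(ax, ay)

def comfort_violations(acc, dt):
    """Return sample indices where acceleration or jerk exceed the limits.

    acc: list of resultant accelerations (m/s^2), sampled every dt seconds.
    """
    # Jerk approximated as the finite difference of acceleration.
    jerk = [(a2 - a1) / dt for a1, a2 in zip(acc, acc[1:])]
    over_acc = [i for i, a in enumerate(acc) if abs(a) > ACC_LIMIT]
    over_jerk = [i + 1 for i, j in enumerate(jerk) if abs(j) > JERK_LIMIT]
    return over_acc, over_jerk

# Hypothetical 1 Hz trace entering and leaving a rural curve.
trace = [0.5, 1.2, 2.3, 1.8, 0.6]
print(comfort_violations(trace, dt=1.0))  # flags sample 2 (acc) and samples 2, 4 (jerk)
```

Such a per-sample check mirrors how the 95th-percentile acceleration and jerk values reported later in this paper can be compared against the comfort thresholds.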

Because AVs are still in the prototype and testing phase, most individuals have not had a real-world experience of HAD. Therefore, our expectations of what constitutes a 'comfortable' experience during HAD can only be based on our current understanding of users' comfort in either manual driving, or in other surface transport modes. However, there are considerable differences between these modes, in terms of Summala's [8] proposed four factors, described above, making them difficult to compare to HAD. Thus, to assist with the development of more acceptable AVs, and to ensure user uptake of these systems in the future, it is of value to understand what particular features of an AV's manoeuvres are likely to enhance or diminish user discomfort. For example, humans try to minimise the jerk during manual driving, whereas most current ADAS features tend to have a relatively higher jerk, due to their preference to stay closer to the lane centre and unwillingness to cut corners, unlike human drivers. Thus, it is important to know if users would prefer, and feel more comfortable with, a more "human-like" AV controller, which favours manoeuvres that result in lower acceleration and jerk, over a more conventional AV controller, with very strict margins for optimal and accurate lane-keeping and vehicle velocities.

Studies on comfort in manual driving have used subjective measures, such as comfort questionnaires [21] and comfort scales [22]. Since comfort is highly subjective, it can be challenging to measure it accurately and reliably on a moment-to-moment basis. In a real-world HAD scenario, the driver may become annoyed if they are asked to rate their comfort levels time and again during the drive, especially when they have the option to engage in more appealing non-driving related activities. Thus, in HAD, there is a need for a non-intrusive, objective, discomfort detection system, which can ultimately be used to adapt the automated system's driving style, to ensure the driver is relaxed and at ease [7]. Physiological techniques are one example of such objective methods, which have been used in the past to assess driver state both in HAD [7] and manual driving [23,24]. Recent technological advancements have led to the development of non-intrusive physiological devices that measure heart rate variability (HRV) and electrodermal activity (EDA), such as wearable smart-band sensors like Empatica E4 [25] or Microsoft band 2 [7], and non-contact methods, such as those listed in [26]. Previously, studies have shown strong correlations between stress and workload, and users' HRV, and EDA. A general finding is that heart rate (HR) increases, and HRV (including the time-domain based metric of root mean square of successive differences in R-R intervals (RMSSD)) decreases, during periods of high stress or workload [10,27–29].

An EDA signal consists of the slow-changing tonic component called skin conductance level (SCL) and the rapidly changing phasic component, known as skin conductance response (SCR) [30]. SCRs are generally used to understand short-term fluctuations in the EDA signal, due to a short-term stimulus (for example, being startled or passing an obstacle), whereas SCL is used to understand the overall change in a person's skin conductance when the stimulus is spread over a longer period (for example, fatigue induced by driving for a long time). SCRs have a much shorter decay time than SCLs, and, hence, can more accurately capture differences in manipulations, without the need for recovery/resting periods in between [30,31]. In the context of driving, both SCL and SCRs have been shown to increase with an increase in stress and workload for a driver [10,23,32], and, thus, are associated with increases in discomfort [7]. Based on these findings, we analysed RMSSD, HR and SCR responses per minute (nSCR/min) in this study, as the objective physiological metrics of drivers' comfort.
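The time-domain metrics used in this study can be stated compactly. The sketch below is a minimal illustration, not the Kubios/Ledalab pipeline actually used: RMSSD and mean HR from R-R intervals, and a naive SCR count from a phasic EDA trace. The 0.05 µS rise criterion and the trough-to-peak detection are illustrative assumptions.

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences of R-R intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def mean_hr(rr_ms):
    """Mean heart rate (beats per minute) from R-R intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))

def nscr_per_min(phasic, dt, threshold=0.05):
    """Count skin conductance responses (SCRs) per minute.

    Naive criterion: a local maximum of the phasic EDA signal whose
    rise above the preceding trough exceeds `threshold` microsiemens.
    phasic: conductance samples (uS), sampled every dt seconds.
    """
    count, trough = 0, phasic[0]
    for prev, cur, nxt in zip(phasic, phasic[1:], phasic[2:]):
        trough = min(trough, cur)
        if prev < cur > nxt and cur - trough >= threshold:
            count += 1
            trough = cur  # start searching for the next trough
    minutes = len(phasic) * dt / 60.0
    return count / minutes
```

In practice, lower RMSSD and higher mean HR and nSCR/min computed this way would be read as signs of increased discomfort, following the direction of effects reported in [7,32].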

#### *Current Study*

This study was undertaken as part of a 10-member consortium of the HumanDrive project, part-funded by the UK's Centre for Connected and Autonomous Vehicles (CCAV), via Innovate UK. The main aim of the project was to develop an advanced vehicle controller, which allowed the vehicle to perform a 'natural', human-like, driving style, using artificial intelligence (AI), and deep learning techniques. As outlined above, developing a human-like controller could potentially help with the broader acceptance of AVs, driven by a more natural driving style, which is familiar to the driver. Using manual driving data collected from 44 drivers in an earlier HumanDrive study, an aggregated model for human-like controllers, focusing on both vehicle safety and comfort, was developed for the present study (see also [33], for more details of the controllers). An environment-specific risk model was developed to guide the design of the experiments. The simulated drives were constructed to include risk elements present in the drive, based on road width and curvature, as well as on the presence of road-based furniture and obstacles, such as hedges of different heights, grass/asphalt verges, pedestrian refuges and parked-cars or roadworks (see [34] for more details). The development of this risk model was based on satisficing risk corridors, proposed by Boer [35], where a set of vehicle states are within acceptable bounds. The vehicle state includes velocity and lateral offset. The trajectory of the vehicle is always within this risk corridor and adopts a comfortable smoothness for the ride. The model holds that drivers' perceived risk level is based on minimum time to lane crossing, wherein the lateral position for the vehicle stays within the road boundaries [35]. Based on this model, two human-like AV controllers (SLOW and FAST, with the FAST controller having higher velocities than the SLOW controller) were developed, and compared to a conventional controller (LKAS), and drivers' replay of their own drive (see Section 2.3, for more details). To understand how the different physical characteristics of a drive can affect drivers' discomfort, our study exposed participants to a range of accelerations, induced by the four different AV controllers and manual driving. Participants experienced these controllers in two different road environments (rural and urban), which included a variety of road geometries, such as roads of different curvatures/width/speed limit, containing a range of road furniture/obstacles (parked cars, roadworks and pedestrian refuges). Previous studies on driver discomfort during HAD, such as Beggiato et al. [7], have focused on discrete situations causing discomfort, such as negotiating an intersection, exit ramp or an obstacle. In our study, we considered the effects of longer, repeated exposure to different road environments, human-like AV controllers and interactions with road furniture and obstacles, on drivers' discomfort. Drivers' HR and EDA data were compared to drivers' self-reported level of perceived discomfort for each road environment, which was measured in real-time, using a button pressing technique (see Section 3.2 for more details). We addressed the following research questions:


#### **2. Materials and Methods**

#### *2.1. Participants*

In total, 24 participants (10 Female), each with a valid UK driving licence, took part in this driving simulator-based study. Their mean age was 43 ± 17 years, with a mean driving experience of 23 ± 18 years. All participants gave consent to take part in the study, in accordance with the rules and regulations of the University of Leeds ethics committee (LTTRAN-086) and were compensated with £50 for taking part in the study. Participants were pre-screened for physiological data collection and those with pre-existing heart conditions were not included in the study (as per [30,36]). In addition, participants were requested to avoid consuming food and beverages that had cardiac stimulants such as caffeine or alcohol for 24 h before they took part in the study.

#### *2.2. Apparatus*

The experiment was conducted in the full motion-based University of Leeds Driving Simulator (UoLDS), which consists of a Jaguar S-type cab housed in a 4 m diameter spherical projection dome with a 300-degree field-of-view projection system. The simulator also incorporates an 8 degree-of-freedom electrical motion system. This consists of a 500 mm stroke-length hexapod motion platform, carrying the 2.5 T payload of the dome and vehicle cab combination, and allowing movement in all six orthogonal degrees-of-freedom of the Cartesian inertial frame. Additionally, the platform is mounted on a railed gantry that allows a further 5 m of effective travel in surge and sway. Drivers' physiological data were collected using a Biopac MP35 data acquisition system at 1000 Hz, which consisted of ECG electrodes and an EDA sensor.

#### *2.3. Study Design*

The study used a within-participant design and included a short familiarisation drive for ~10 min. Each participant experienced five drives: a MANUAL drive, two with human-like AV controllers (SLOW and FAST), a replay of their manual drive (REPLAY) and one conventional lane-keeping assist-based AV controller (LKAS) which did not adapt its behaviour to road furniture, such as kerbs or hedges. Each drive consisted of two different road environments (rural and urban). The design of the drives and the road environments are discussed below.

#### 2.3.1. Road Design

Each drive was 15.8 km long and incorporated several situations that demanded greater attention and a shift in lateral position and speed, and which could be deemed uncomfortable by the driver depending on how they were negotiated. These situations were presented across two different road environments (rural and urban, see Figure 1). The speed limits, geometries and obstacle locations for each road are listed in Table 1 and Figure 2. The road design was similar across all drives except for LKAS, which did not include any obstacles that were partly within the lane, such as roadworks or parked cars.

**Figure 1.** (**a**) Rural environment with roadworks; (**b**) urban environment.



Roads in the rural environments were narrower than those in the urban environments, except in the first segment, which was wider than the other two rural segments (see Table 1). We did this to assess whether a decrease in road-width increased discomfort within the same road environment. Overall, rural environments were designed to have narrower roads, tighter curves, and higher speed limits (and therefore, higher resultant acceleration), along with the presence of obstacles (parked-cars and roadworks, see Figure 1). These factors were designed to increase the attentional demand of the driver at varying degrees, which could possibly induce discomfort depending on how they were negotiated by the controllers, or drivers' individual manual driving style. There were more obstacles (parked-cars, roadworks, or pedestrian refuge, see Figure 2) in the urban environments (10), when compared to the rural environments (4), to investigate whether participants' discomfort increased with the number of obstacles.

**Figure 2.** Resultant acceleration of the different controllers and manual driving, along with the location of obstacles across all drives, except LKAS.

#### 2.3.2. Experimental Design

The five drives were counterbalanced, with the exception of the MANUAL drive, which was always the first drive for every participant, so that data could be collected for their REPLAY drive, although participants were not explicitly informed about this. As discussed in the Introduction, the SLOW and FAST controllers were modelled, based on data collected during manual driving across similar road segments in a previous HumanDrive study (see [34]). They were designed to mimic human-like driving, based on a risk model, which defined a range of acceptable vehicle states, such as velocity and lateral offset, depending on drivers' perceived risk levels in response to different road furniture and features present in the drive, such as parked-cars or sharp curves. The FAST controller had higher velocities, compared to the SLOW controller, with a maximum difference of 4 m/s, and a minimum difference of 0.15 m/s. The driving data used to create the models (see [33]) showed that when driving at higher velocities, drivers' time to lane crossing (TLC) decreased, and, in order to maintain their preferred safety boundary, they moved further away from the road edge. Taking this knowledge into account, we increased the lateral offset of the FAST controller from the left edge of the road, at a rate of 5 cm for every 1 m/s increase in relative speed, compared to the SLOW controller. The LKAS controller was a simple lane-keeping assist controller, which had a constant velocity for most parts of the drive (at the speed limit for that section), except for when the vehicle had to negotiate a curve, or when it moved from an urban to rural environment (or vice-versa). The LKAS controller mostly kept to the lane centre (even when on curves).
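The FAST controller's lateral-offset rule described above (5 cm of additional offset per 1 m/s of relative speed over the SLOW controller) can be expressed directly. The sketch below is only an illustration of that stated relationship; the function name and the SLOW baseline offset are hypothetical.

```python
OFFSET_RATE = 0.05  # m of extra lateral offset per 1 m/s of relative speed

def fast_lateral_offset(slow_offset_m, v_fast, v_slow):
    """Lateral offset of the FAST controller from the left road edge.

    The FAST controller drives 5 cm further from the edge for every
    1 m/s it travels faster than the SLOW controller (relative speed
    clamped at zero, since FAST is never slower than SLOW).
    """
    relative_speed = max(0.0, v_fast - v_slow)
    return slow_offset_m + OFFSET_RATE * relative_speed

# At the reported maximum speed difference of 4 m/s, the FAST controller
# sits 0.20 m further from the left edge than the SLOW controller.
print(fast_lateral_offset(1.0, v_fast=20.0, v_slow=16.0))
```

This offset rule reflects the TLC finding from [33]: at higher speed, drivers compensate for the shrinking time to lane crossing by moving away from the road edge.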
The objective of the design of the different drives with these controllers was to understand how discomfort was affected by factors such as manual and automated driving, the behaviour of the human-like AV controllers, a conventional lane-keeping controller and the controller based on one's own driving style. The different drives and their properties are shown in Figure 2, Tables 2 and 3, which show that the LKAS controller had the highest resultant acceleration (combined lateral and longitudinal accelerations) in rural environments, whereas the SLOW controller had the lowest resultant acceleration in rural environments. The 95th percentile of resultant acceleration and lateral jerk values across all the drives in rural environments was higher than the suggested comfort threshold value for acceleration and jerk (2 m/s2 and 0.9 m/s3, respectively, according to [20]), whereas it was well below this threshold across all drives in the urban environments. The resultant acceleration values were mainly governed by the lateral accelerations, as the longitudinal accelerations were minimal, and within the suggested comfort threshold for longitudinal acceleration, across both environments, for all controllers.

**Table 2.** The 95th percentile of resultant acceleration (in m/s2) for different drives across different road environments.


**Table 3.** The 95th percentile of absolute values of lateral jerk (m/s3) for different drives across different road environments.


#### *2.4. Subjective Discomfort Rating (Button Presses)*

For each of the automated drives, the participants heard 41 auditory beep triggers. These beeps were played immediately after the participants were exposed to any obstacles, changes in road furniture, changes in road curvature or changes in road environment. In response to these triggers, they were required to press one of two buttons on an Xbox handset, to state: *"Yes, I found the behaviour to be safe*/*natural*/*comfortable"* (right button) or *"No, I did not find the behaviour to be safe*/*natural*/*comfortable"* (left button). This response explicitly pertained to the behaviour of the car within a couple of seconds around the moment of the beep's occurrence. Additionally, participants were encouraged to give this binary input whenever they felt necessary, across each drive.

#### *2.5. Procedure*

Upon arrival, the participants were briefed on the study, after which they were invited to sign a consent form, with an opportunity to ask questions. Three ECG electrodes were then attached to the participant's chest, and two EDA electrode bands were attached to the index and middle finger of their non-dominant hand. They then performed a manual familiarisation drive, where they could become accustomed to the simulator environment and vehicle controls. Participants were instructed to adhere to the posted speed limit and to obey the normal rules of the road. After each drive, the participants were given a 10-min break, during which they were asked to complete a set of subjective questionnaires relating to that drive and the controllers. The results of the subjective questionnaires are not within the scope of this paper and will not be reported here.

#### *2.6. Data Analysis Tools*

The ECG data were processed in Kubios HRV Premium software [37]. EDA signals were pre-processed, and artefacts were removed, using custom MATLAB R2016a algorithms based on recommendations in [30] and [38]. The data were then analysed using Ledalab v3.9 [39], a MATLAB-based software package.

#### *2.7. Statistical Analysis*

Statistical analysis was conducted in IBM SPSS Statistics 26. A Shapiro-Wilk test showed that not all estimates across the independent variables were normally distributed; however, the majority of the estimates (>75%) were normally distributed for each of the dependent variables used. We judged the repeated-measures ANOVA to be sufficiently robust to these deviations, with only a small effect on the Type I error rate [40]. For statistical significance, an α-value of 0.05 was used, and partial eta-squared was computed as an effect size statistic. Degrees of freedom were Greenhouse-Geisser corrected when Mauchly's test showed a violation of sphericity. Pairwise comparisons with Bonferroni corrections were used to determine the differences between drives and road segments. Pearson's correlation coefficient was used for any correlation analyses. Data from participants 24 and 14 were classified as outliers, and the data recorded from participants 10 and 15 were of poor quality; hence, these were discarded from the RMSSD and HR analysis. Participant 12 did not respond to the instructions given for button presses, and participant 13 had an abnormally high rate of button presses; therefore, these participants were not considered in the subjective button-press analysis.
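For readers unfamiliar with the variance partitioning behind a repeated-measures ANOVA, a minimal one-way sketch (subjects × conditions) is shown below on hypothetical data. This is only an illustration of the F-statistic computation; the actual analysis used SPSS with two within-subject factors and Greenhouse-Geisser correction, neither of which is implemented here.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA F statistic.

    data: list of per-subject lists, one value per condition.
    Partitions total variability into condition, subject and error
    sums of squares; F = MS_condition / MS_error.
    """
    n = len(data)          # subjects
    k = len(data[0])       # conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj   # residual after removing subject effects

    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error

# Hypothetical: 3 participants each measured in 2 drives.
print(rm_anova_oneway([[1, 3], [2, 4], [3, 6]]))
```

Removing the subject sum of squares from the error term is what makes the within-subject design more sensitive than an independent-groups ANOVA on the same values.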

#### **3. Results**

Initially, the data were analysed for five separate segments (three in rural and two in urban environments) for each of the five drives, but results for physiological metrics, and the button presses, were not statistically different between the different segments within the same environment. Therefore, the physiological and button press data across the three rural and two urban segments were aggregated for analysis, with the two independent variables being drive (MANUAL, SLOW, LKAS, FAST, REPLAY) and environment (rural and urban). The dependent variables were RMSSD, mean HR and nSCR/min.

#### *3.1. Physiological Metrics*

To understand how the behaviour of the AV controllers and manual driving affected drivers' physiological response and discomfort across the different road environments, we conducted a 5 (Drive: SLOW, LKAS, FAST, MANUAL, REPLAY) × 2 (Environment: rural, urban) repeated-measures ANOVA on all three physiological metrics (RMSSD, mean HR, nSCR/min). As discussed in the Introduction, previous research has shown that RMSSD values tend to decrease with increasing discomfort, whereas mean HR and nSCR/min values tend to increase [7,32].

There was a main effect of drive on RMSSD values, *F*(2.4, 45.2) = 5.27, *p* = 0.006, η<sub>p</sub><sup>2</sup> = 0.22 (Figure 3), with post-hoc tests showing significantly lower RMSSD values in the MANUAL drive compared to the LKAS (*p* = 0.007) and FAST (*p* = 0.008) drives. No other significant differences were found between the drives. There was no effect of environment on RMSSD, nor any interaction between drive and environment.

There was a main effect of drive on drivers' mean HR, *F*(4, 76) = 6.81, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.23 (Figure 3), with post-hoc tests showing that drivers had significantly higher mean HR values in the MANUAL drive compared with the FAST drive (*p* = 0.001). There were no significant differences between the other drives. There was no main effect of environment and no interaction between drive and environment.

**Figure 3.** (**a**) Root mean square of successive differences (RMSSD) and (**b**) heart rate (HR) plots for drive. \*\* *p* ≤ 0.01, \*\*\* *p* ≤ 0.001. Error bars denote s.e.

There was a main effect of drive on nSCR/min, *F*(4, 92) = 4.70, *p* = 0.002, η<sub>p</sub><sup>2</sup> = 0.17 (Figure 4a), with post-hoc tests showing significantly higher nSCR/min in the MANUAL drive compared to the SLOW (*p* = 0.006) and REPLAY (*p* = 0.005) drives. There were no other significant differences. There was also a main effect of environment on drivers' nSCR/min, *F*(1, 23) = 40.54, *p* < 0.001, η<sub>p</sub><sup>2</sup> = 0.64 (Figure 4b), with higher values in the rural environments than the urban environments (*p* < 0.001). An interaction between drive and environment, *F*(4, 92) = 3.37, *p* = 0.013, η<sub>p</sub><sup>2</sup> = 0.13 (Figure 4c), was also observed. Pairwise comparisons with Bonferroni corrections (α = 0.002) revealed that, in the MANUAL drive, drivers had a significantly higher nSCR/min while driving in rural environments compared to urban environments (*p* < 0.001). Additionally, within the rural environments, drivers showed significantly higher nSCR/min values in the MANUAL drive compared to the SLOW (*p* < 0.001), FAST (*p* < 0.001) and REPLAY (*p* = 0.001) drives. Amongst the AV controllers, LKAS showed the largest reduction in nSCR/min values between rural and urban environments (a 20.3% reduction in mean nSCR/min from rural to urban).

**Figure 4.** Number of skin conductance responses (SCRs) per minute (nSCR/min) for: (**a**) each drive; (**b**) across different environments; (**c**) and interaction effects. \*\* *p* ≤ 0.01, \*\*\* *p* ≤ 0.001. Error bars denote s.e.

#### *3.2. Subjective Discomfort Ratings (Button Presses)*

In the previous section, we reported a comparison of drivers' physiological state during each drive. However, physiological signals are sensitive to a wide range of stimuli and are prone to individual differences, so care must be taken when interpreting a psychological construct such as discomfort from physiological measures alone [7]. Hence, we used data from the button presses (see Section 2.4) to establish whether the changes in physiological state correlated with participants' overall subjective discomfort rating. Correlation analysis showed that button presses and nSCR/min were significantly positively correlated, *r*(20) = 0.46, *p* = 0.04.

To normalise the button press data across all participants, the percentage of NO presses was calculated relative to the total number of presses, for each road environment in each drive. A 4 × 2 repeated measures ANOVA was performed on the percentage of NO presses to assess discomfort, comparing values across the four drives (SLOW, LKAS, FAST, REPLAY) in the two road environments (rural, urban).
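This normalisation step is straightforward; a minimal sketch with made-up press counts (the column names and values are illustrative, not the study's coding scheme) might look like:

```python
import pandas as pd

# Hypothetical raw press counts per drive x environment cell
presses = pd.DataFrame({
    "participant":   [1, 1, 1, 1],
    "drive":         ["LKAS", "LKAS", "REPLAY", "REPLAY"],
    "environment":   ["rural", "urban", "rural", "urban"],
    "no_presses":    [6, 2, 8, 3],
    "total_presses": [10, 10, 12, 12],
})

# Percentage of NO (uncomfortable) presses, normalised per cell,
# so that participants with different press rates are comparable
presses["pct_no"] = 100 * presses["no_presses"] / presses["total_presses"]
print(presses[["drive", "environment", "pct_no"]])
```

Dividing by each participant's own total is what makes the percentages comparable across participants with different overall press frequencies.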

ANOVA results showed no main effect of drive on participants' button presses, but there was a main effect of environment: drivers reported a significantly higher percentage of discomfort ratings in the rural environment compared to the urban environment, *F*(1, 21) = 9.83, *p* = 0.005, η<sub>p</sub><sup>2</sup> = 0.32 (Figure 5a). This pattern is similar to that observed for drivers' nSCR/min values, above.

**Figure 5.** Percentage of NO presses: (**a**) across the two environments; (**b**) the interaction between drive and environment. \*\* *p* ≤ 0.01. Error bars denote s.e.

There was also an interaction effect, *F*(3, 63) = 3.16, *p* = 0.031, η<sub>p</sub><sup>2</sup> = 0.13 (Figure 5b). Pair-wise comparisons with Bonferroni corrections (α = 0.003125) did not show any significant differences between the drives in either environment. Discomfort ratings were similar across all drives in the rural environment. However, there was a 43.8% and 52.3% reduction in mean discomfort ratings for the LKAS and REPLAY drives, respectively, in the urban environment, compared to their respective values in the rural environment.

#### **4. Discussion and Conclusions**

This study investigated driver discomfort from a physiological perspective and sought to establish whether drivers' physiological state changes in line with the behaviour of different automated vehicle controllers. Drivers' responses during manual driving were compared to four automated drives, each navigating through a range of road geometries and speeds associated with urban and rural road environments.

Physiological signals can be highly subjective, and individuals may respond somewhat differently to a particular stimulus. Care must therefore be taken when attributing a physiological change to a psychological construct, as a range of constructs can elicit similar physiological responses [7]. In this study, participants were pre-screened for any physiological anomalies that could arise from the use of cardiac stimulants, exercise, or medication. Furthermore, for the EDA analysis, we used nSCR/min instead of the amplitude sum of SCRs, as the former is less susceptible to individual differences, such as skin thickness, because each event-related SCR is generally initiated as a response to a particular stimulus. Given this, and the fact that our study used a within-subject design, no additional standardisation techniques were applied when processing the RMSSD, mean HR and nSCR/min metrics.

Results showed lower RMSSD values, and higher mean HR and nSCR/min values, in the MANUAL drive compared to at least one of the AV controllers. However, since drivers were not required to evaluate their own driving via button presses in the MANUAL drive, it is not possible to conclude whether this difference in physiological metrics between the MANUAL and automated drives reflects driver discomfort only, an increased physical and mental demand associated with the manual driving task, or both.

There were no significant main effects, in either the physiological metrics or the button press data, between the four automated drives. This may be because, overall, the drives had similar resultant acceleration profiles across the whole drive (see Figure 2). We analysed the physiological metrics and subjective button press data for each segment/environment, each of which was at least 2 min long. Hence, some of the instantaneous variations in controller behaviour may have produced opposing effects, which cancelled each other out when averaged across a larger time window. These findings are in agreement with [7], where the authors did not find any significant differences in physiological responses between their three automated drives (defensive, aggressive and replay of manual drive). Those authors attributed the lack of difference to wide confidence interval bands in their analysis, where missing or opposite effects would have increased the confidence bands dramatically.

In contrast, there were some observable differences, both in the physiological metrics (nSCR/min) and the subjective button presses, between the two road environments, with the rural roads being significantly more uncomfortable than the urban environments. This increase in discomfort is likely attributable to the significantly higher resultant acceleration and jerk experienced in the rural environments, for all drives, which often crossed the 2 m/s<sup>2</sup> acceleration and 0.9 m/s<sup>3</sup> jerk thresholds for a comfortable driving experience suggested by [20]. In other words, the higher speed limits, narrower roads and tighter curves associated with the rural environments seem to be the main cause of increased driver discomfort in this environment. Although more obstacles were present in the urban sections (10 vs. 4), the way these were negotiated by the vehicle in the rural sections (i.e., passed at a much higher velocity and on narrower roads) was a significant source of driver discomfort in the rural environment. These findings are in line with those of [41], where the authors found higher levels of simulator sickness in high-velocity rural environments compared to city environments. These results also suggest that those developing automated vehicle controllers should focus on improving comfort, and thereby minimising jerk, when the vehicle is negotiating higher-speed, higher-acceleration road geometries.
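The threshold check implied here, flagging moments where acceleration or jerk exceed the comfort limits from [20], can be sketched with NumPy. The function name, sampling setup and trace below are illustrative assumptions, not part of the study's pipeline:

```python
import numpy as np

def comfort_violations(accel, dt, a_max=2.0, j_max=0.9):
    """Flag samples where resultant acceleration (m/s^2) or jerk
    (m/s^3) exceed the comfort thresholds cited in the text
    (2 m/s^2 and 0.9 m/s^3, following [20]).

    `accel` is a 1-D array of resultant acceleration samples
    recorded every `dt` seconds.
    """
    a = np.asarray(accel, dtype=float)
    jerk = np.gradient(a, dt)            # numerical derivative of acceleration
    return (np.abs(a) > a_max) | (np.abs(jerk) > j_max)

# Example: a brief acceleration trace sampled once per second
trace = [0.5, 0.8, 1.5, 2.3, 2.1, 1.0, 0.4]
flags = comfort_violations(trace, dt=1.0)
print(flags)   # True where the 2 m/s^2 acceleration limit is crossed
```

On this trace, only the two samples above 2 m/s<sup>2</sup> are flagged; with a finer sampling interval, the jerk criterion would dominate.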

While the mean discomfort ratings and nSCR/min were quite similar across all AV controllers in the rural environments, these were particularly low for the urban sections of the LKAS (as seen in both the discomfort ratings and nSCR/min) and REPLAY (as seen in the discomfort ratings) drives. For LKAS, this is likely due to the absence of any obstacles in that drive, resulting in very little variation in velocity and lateral offset (and thus resultant acceleration). With respect to the REPLAY drive, it is likely that participants recognised their own driving style and preferred this familiar behaviour in the lower-speed urban environment, where their comfort threshold for acceleration forces was not breached; this was reflected in their subjective ratings. Such recognition was indeed noted by some participants after their REPLAY drive, although not formally recorded. There does, however, seem to be an incongruence between participants' physiological indicators of discomfort and their perceived level of discomfort during the REPLAY drive in urban environments, indicating a bias when rating one's own driving behaviour. These findings suggest that when the resultant acceleration and jerk experienced by the driver remain well below the comfort threshold, other factors that affect discomfort, such as familiarity of the drive or the presence of obstacles, become more prominent and noticeable. In contrast, when the resultant acceleration and jerk values move above the comfort threshold, they seemingly overshadow the other determinants of driver discomfort. This warrants further research into drivers' comfort thresholds in terms of jerk and acceleration forces, and their impact on the other factors that induce driver discomfort.

This study was conducted on a dynamic driving simulator (see Section 2.2 for more details), so the acceleration and jerk forces experienced by the participants should be similar to those in a real-world scenario. Since acceleration and jerk were the two main factors affecting discomfort, we believe a driver's feeling of discomfort due to these forces is quite similar in simulator and real-world environments. Johnson et al. [42] compared physiological responses during fixed-base simulator and real-world driving and concluded that, while the level of immersion was acceptable for eliciting presence, and the trends in physiological data during simulated driving were quite similar to those in real-world driving, the absolute physiological responses in virtual and real-world environments differed significantly. There is also the possibility of drivers behaving differently in a simulator compared to a real-world driving situation [43]. This study used conventional techniques and sensors to measure drivers' physiological data, which were intrusive in nature. However, recent technological advancements have led to non-intrusive [7,25] and even non-contact physiological sensor technologies [26], which need to be validated in on-road studies.

To conclude, there is a need to measure discomfort objectively, and in real time, so that future AVs can adapt their driving behaviour and provide a more comfortable and pleasant experience for human occupants. The novelty of this study lies in understanding and measuring the long-term effects of discomfort, across various road environments and a range of AV controllers, using physiological measures. This study suggests that, compared to HR variability measures, EDA-based SCR values are more sensitive to continuous changes in discomfort-inducing stimuli, such as those experienced when a vehicle navigates through different geometric and speed-based scenarios. We observed a moderately positive correlation between participants' nSCR/min and their subjective ratings of discomfort. Further research may therefore be warranted to investigate the value of this metric for assessing real-time driver discomfort, which may be useful when developing more acceptable controllers for future automated vehicles.

**Author Contributions:** Conceptualisation, N.M., V.R., T.L., E.R.B., R.R.; data curation, V.R.; formal analysis, V.R.; funding acquisition, N.M., R.R., E.R.B.; investigation, V.R., E.P., F.H.; methodology, V.R., N.M., R.R., E.R.B., C.W., F.H., E.P.; project administration, N.M., E.R.B., R.R.; software, V.R.; supervision, N.M., T.L., M.G.L.; validation, V.R., N.M., T.L.; visualisation, V.R.; writing—original draft preparation, V.R.; writing—review and editing, V.R., N.M., T.L., M.G.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work described in this paper was undertaken as part of the HumanDrive project, which is co-funded by the Centre for Connected and Automated Vehicles (CCAV) and Innovate UK, the UK's innovation agency. The lead author's Ph.D. is funded by EPSRC CASE studentship in partnership with Seeing Machines Ltd.

**Acknowledgments:** This paper is published with kind permission from the HumanDrive consortium: Nissan, Hitachi, Horiba MIRA, Atkins Ltd., Aimsun Ltd., SBD Automotive, University of Leeds, Highways England, Cranfield University, and the Connected Places Catapult. The data collection for this paper was feasible due to the help and technical support provided by the University of Leeds Driving Simulator (UoLDS) team.

**Conflicts of Interest:** The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
