**Test and Evaluation Methods for Human-Machine Interfaces of Automated Vehicles**

Editors

**Frederik Naujoks, Sebastian Hergeth, Andreas Keinath, Nadja Schömig, Katharina Wiedemann**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Frederik Naujoks, BMW Group, Germany

Sebastian Hergeth, BMW Group, Germany

Andreas Keinath, BMW Group, Germany

Nadja Schömig, Wuerzburg Institute for Traffic Sciences, Germany

Katharina Wiedemann, Wuerzburg Institute for Traffic Sciences, Germany

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Information* (ISSN 2078-2489) (available at: https://www.mdpi.com/journal/information/special_issues/Automated_Vehicles).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03943-198-4 (Hbk) ISBN 978-3-03943-199-1 (PDF)**

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.


### **About the Editors**

**Frederik Naujoks** graduated from the University of Wuerzburg in 2010 with a Diploma in Psychology and obtained his PhD in 2015. Between 2011 and 2017, he worked at the Center for Traffic Sciences (IZVW) at the University of Wuerzburg and at the Wuerzburg Institute for Traffic Sciences (WIVW). He has worked at BMW since 2017. His research focuses on applied psychology topics, such as driver distraction, usability and human-centered design and the evaluation of assisted and automated driving.

**Sebastian Hergeth** was born in 1986 in Munich, Germany, and obtained his Bachelor of Science in Psychology from Paris Lodron University of Salzburg in 2011, Master of Science in Economic and Organizational Psychology from Ludwig-Maximilians-Universität München in 2013, and PhD in Psychology from Chemnitz University of Technology on the topic of trust in automation in 2016. Since 2016, he has been an employee of the BMW Group in Munich. His main research areas include human factors of assisted and automated driving, HMI design and evaluation, method development, trust in automation and driver distraction, as well as exterior human–machine interfaces.

**Andreas Keinath** is Head of Concept Quality and Usability of the HMI department at BMW Group. He received his PhD in Psychology from Chemnitz University of Technology in 2003. His research focusses on cognitive and applied psychology, as well as automotive systems engineering.

**Nadja Schömig** finished her studies in psychology at the University of Würzburg in 2003. From 2003 to 2007, she worked at the Centre for Traffic Sciences at the University of Würzburg. In 2008, she started working as a senior researcher at the Würzburg Institute for Traffic Sciences (WIVW). In 2009, she received her PhD in psychology on the topic of driver situation awareness and its measurement. Her main research areas are human factors-related topics in assisted and automated driving, such as HMI design, HMI evaluation methodologies and driver state assessment methodologies (fatigue, distraction).

**Katharina Wiedemann** has been working at the Würzburg Institute for Traffic Sciences (WIVW) since finishing her studies in Psychology at the University of Würzburg in 2014. Her research focuses on human factors of assisted and automated driving, HMI design and evaluation, the development of test methods and driver distraction. She is currently finalizing her PhD in psychology on the design of automated vehicle HMIs.

### **Preface to "Test and Evaluation Methods for Human-Machine Interfaces of Automated Vehicles"**

The human-machine interface (HMI) of automated driving systems (ADS) will play a crucial role in their safe, comfortable and efficient use. For example, the ADS HMI should be capable of efficiently informing the user about the current automated driving mode and the user's responsibilities (e.g., whether the ADS is functioning properly or requesting a transition of control from the ADS to the user). While ADS might allow new and more comfortable seating positions and engagement in nondriving-related tasks that are not allowed in manual driving, these might lower the user's availability for a transfer of control or generate motion sickness. Furthermore, when interacting with other vehicles, ADS might behave differently than manually driven vehicles, which might generate a need for external HMIs or standardized motion patterns for an adequate interaction with non-automated traffic participants. This is only a small proportion of the new challenges for test and evaluation methods of HMIs that arise from the introduction of ADS. Thus, human factors experts need to explore, advance and establish test methods that are able to account for these new challenges in the design of future vehicles.

The articles of this Special Issue analyze developments and new challenges by introducing literature reviews, and analytical as well as experimental approaches to the topics outlined above. The contributions all stem from well-known research institutes and leading practitioners in the field of ADS research. The papers deal with a broad selection of relevant topics, which can be broadly categorized in four clusters:

• Assessing the relationship of automated vehicles and surrounding non-automated traffic: ADS will very likely be introduced into a mixed traffic environment, which means that some road users will be automated, while others will drive manually. Papers [1–4] focus on the impact of automated vehicles on surrounding, non-automated traffic such as pedestrians or cyclists.

• Designing and evaluating external human–machine interfaces (eHMIs): Automated cars may be equipped with eHMIs for communication with other unequipped road users such as pedestrians. Their potential benefits and drawbacks are discussed in the technical and scientific community, but there are currently no available standards for their implementation. Thus, papers [5–9] present empirical studies as well as test protocols for this focus area.

• Evaluating interior HMIs of automated vehicles: As long as vehicles can be driven manually or require manual intervention by their users, the interior HMI will still play a crucial part in their safe and efficient usage. However, guidelines and test methods are only slowly being adapted from those of manual and assisted driving. The next three papers [10–12] investigate methods regarding the assessments of interior HMIs of automated vehicles.

• Evaluating the influence of driver state, driver availability and situational factors on control transitions and the comfort of automated driving: A crucial human factor in the use of automated driving functions is the driver's state, such as the readiness to take over manual driving, mode awareness, fatigue or motion sickness. The driver's state can have an impact both on the safety of control transitions as well as the perceived comfort and acceptance of automated driving. The following papers [13–21] provide empirical studies, as well as theoretical analyses and test protocols on this issue.

This Special Issue brings together research from well-known human factors experts in the field of automated driving. The impressive number of published papers covering a wide range of research topics on test and evaluation methods for automated vehicle HMIs shows the high relevance of this Special Issue. The Special Issue has thus contributed to the promotion and dissemination of these methods within the scientific community, and will hopefully stimulate further research on these topics.


**Frederik Naujoks, Sebastian Hergeth, Andreas Keinath, Nadja Schömig, Katharina Wiedemann** *Editors*

### *Editorial* **Editorial for Special Issue: Test and Evaluation Methods for Human-Machine Interfaces of Automated Vehicles**

#### **Frederik Naujoks <sup>1,</sup>\*, Sebastian Hergeth <sup>1</sup>, Andreas Keinath <sup>1</sup>, Nadja Schömig <sup>2</sup> and Katharina Wiedemann <sup>2</sup>**


Received: 18 August 2020; Accepted: 19 August 2020; Published: 20 August 2020

**Abstract:** Today, OEMs and suppliers can rely on commonly agreed and standardized test and evaluation methods for in-vehicle human–machine interfaces (HMIs). These have traditionally focused on the context of manually driven vehicles and put minimizing distraction effects and enhancing usability at their core (e.g., AAM guidelines or NHTSA visual-manual distraction guidelines). However, advances in automated driving systems (ADS) have already begun to change the driver's role from actively driving the vehicle to monitoring the driving situation and being ready to intervene in partially automated driving (SAE L2). Higher levels of vehicle automation will likely only require the driver to act as a fallback-ready user in case of system limits and malfunctions (SAE L3) or could even operate without any fallback within their operational design domain (SAE L4). During the same trip, different levels of automation might be available to the driver (e.g., L2 in urban environments, L3 on highways). These developments require new test and evaluation methods for ADS, as available test methods cannot be easily transferred and adapted. The shift towards higher levels of vehicle automation has also moved the discussion towards the interaction between automated and non-automated road users using exterior HMIs. This Special Issue includes theoretical papers as well as empirical studies that deal with these new challenges by proposing new and innovative test methods for the evaluation of ADS HMIs in different areas.

**Keywords:** automated driving; human–machine interface; test methods; user studies; evaluation

#### **1. Introduction**

The human–machine interface (HMI) will play a crucial role in the safe, comfortable and efficient use of automated vehicles. For example, the automated driving system (ADS) HMI should be capable of informing the user about the current mode and minimizing confusion about the status of the ADS and the user's current responsibilities (e.g., whether the ADS is functioning properly, ready for use, unavailable for use or requesting a transition of control from the ADS to the user). While ADS might allow new and more comfortable seating positions and engagement in nondriving-related tasks that were not allowed in manual driving, these might lower the user's availability for a transfer of control or generate motion sickness. As the driving task is no longer actively fulfilled by the driver, distraction by nondriving-related tasks might turn into controlled engagement in activating activities that prevent fatigue, generating the need to advance assessment methods for nondriving-related tasks. Furthermore, when interacting with other vehicles, ADS might behave differently than manually driven vehicles, which might generate a need for external HMIs or standardized motion patterns for an adequate interaction with non-automated traffic participants. This is only a small proportion of the new challenges for test and evaluation methods of HMIs that arise from the introduction of ADS.

The articles of this Special Issue analyze the developments and new challenges by introducing new test methods for the topics outlined above. Among the submissions received, all of which went through a rigorous peer-review process, 21 papers have been selected for publication. The contributions all stem from well-known research institutes and leading practitioners in the field of ADS research. The papers, which are described below, deal with a broad selection of relevant topics such as the evaluation of the relationship of automated vehicles and surrounding non-automated traffic, external as well as interior human–machine interfaces of automated vehicles and the influence of driver state, driver availability and situational factors on control transitions and comfort of automated driving.

#### **Assessing the relationship of automated vehicles and surrounding non-automated traffic**

ADS will very likely be introduced into a mixed traffic environment, which means that some vehicles will be automated while others will be driven manually. The following papers focus on the impact of automated vehicles on surrounding, non-automated traffic such as pedestrians or cyclists. The first paper "Comparison of Methods to Evaluate the Influence of an Automated Vehicle's Driving Behavior on Pedestrians: Wizard of Oz, Virtual Reality, and Video" by Fuest, Schmidt and Bengler [1] compares four different methods for studying the communication between automated vehicles and pedestrians. To this end, the same study design was replicated in four different settings: two video setups, one virtual reality setup, and one Wizard of Oz setup. An automated vehicle approached from the left, using different driving profiles characterized by changing speed to communicate its intention to let the pedestrians cross the road. Participants were asked to recognize the intention of the automated vehicle and to press a button as soon as they realized its intention.

The second paper "Effects of Marking Automated Vehicles on Human Drivers on Highways" by Fuest, Feierle, Schmidt and Bengler [2] presents a simulation study with different highway scenarios, each with and without a marked automated vehicle. Common to all scenarios was that the automated vehicles strictly adhered to German highway regulations, and therefore moved in road traffic somewhat differently from human drivers. After each trial, the participants were asked to rate how appropriate and disturbing the automated vehicle's driving behavior was. In addition, objective data, such as the time of a lane change and the time headway, were measured.

The third paper "Multi-Vehicle Simulation in Urban Automated Driving: Technical Implementation and Added Benefit" by Feierle, Rettenmaier, Zeitlmeir and Bengler [3] investigates the simultaneous interaction between an automated vehicle (AV) and its passenger, and between the same AV and a human driver of another vehicle. For this purpose, a multi-vehicle simulation consisting of two driving simulators, one for the AV and one for the manual vehicle, was implemented. This paper analyzes the effect of an automation failure, where the AV first communicates that it will yield the right of way and then changes its strategy and passes through the bottleneck first, despite oncoming traffic. The study aims to answer two research questions: which methods should be used to implement multi-vehicle simulations with one AV, and whether such a multi-vehicle simulation offers an added benefit compared to single-driver simulator studies.

The next paper focuses on the communication of surrounding traffic conditions to users of automated vehicles. The paper "Feeling Uncertain—Effects of a Vibrotactile Belt that Communicates Vehicle Sensor Uncertainty" by Krüger, Driessen, Wiebel-Herboth, de Winter and Wersing [4] deals with the design and evaluation of a vibrotactile interface that communicates spatiotemporal information about surrounding vehicles and encodes a representation of spatial uncertainty in a novel way. Subjective understanding and benefit were measured using a questionnaire, ratings and scores. For the objective benefit, the minimum time-to-contact was computed as a measure of safety, and gaze distributions were computed as an indicator of attention guidance.

#### **Designing and evaluating external human–machine interfaces (eHMIs)**

Automated cars may be equipped with eHMIs for communication with other unequipped road users such as pedestrians. Their potential benefits and drawbacks are discussed in the technical and scientific community, but there are currently no available standards for their implementation. Therefore, the first paper "Standardized Test Procedure for External Human-Machine Interfaces of Automated Vehicles" by Kaß, Schoch, Naujoks, Hergeth, Keinath and Neukum [5] presents a standardized test procedure that enables the effective usability evaluation of eHMIs from the perspective of multiple road users. The paper includes a methodological approach to deduce relevant use cases as well as specific usability requirements that should be fulfilled by an eHMI to be effective, efficient, and satisfying. To test whether an eHMI meets these requirements, a test protocol for the empirical evaluation of an eHMI with a participant study is demonstrated.

To be effective, any message displayed by an automated vehicle to other road users must satisfy legibility requirements based on the dynamics of the road traffic and the time required by the human to process the respective message. Therefore the second paper "How Much Space Is Required? Effect of Distance, Content, and Color on External Human–Machine Interface Size" by Rettenmaier, Schulze and Bengler [6] examines the size requirements of displayed text or symbols regarding eHMIs for ensuring the legibility of a message. Based on a developed eHMI prototype, the influence of content type on content size to ensure legibility from a constant distance, as well as the influence of content type and content color on the human detection range, was investigated.

The third paper "How Do eHMIs Affect Pedestrians' Crossing Behavior? A Study Using a Head-Mounted Display Combined with a Motion Suit" by Kooijman, Happee and de Winter [7] investigates the effects of eHMIs on participants' crossing behavior. For this purpose, the participants were immersed in a virtual urban environment using a head-mounted display coupled to a motion-tracking suit. The approaching vehicles' behavior (yielding or nonyielding) and eHMI type (None, Text or Front Brake Lights) were manipulated, and the participants could cross the road whenever they felt safe enough to do so. The study shows that the motion suit allows investigating pedestrian behaviors related to bodily attention and hesitation in the context of interacting with automated vehicles.

The fourth paper "External Human–Machine Interfaces: The Effect of Display Location on Crossing Intentions and Eye Movements" by Eisma, van Bergen, Brake, Hensen, Tempelaar and de Winter [8] addresses the effects of the position of the eHMI on the feeling of safety to cross the street. The eHMI showed "Waiting" combined with a walking symbol 1.2 s before the car started to slow down, or "Driving" while the car continued driving. Participants had to press and hold the spacebar when they felt it was safe to cross. After that, the percentages of spacebar presses and the eye-tracking analyses were evaluated.

The last paper regarding the concept of eHMIs, "Efficient Paradigm to Measure Street-Crossing Onset Time of Pedestrians in Video-Based Interactions with Vehicles" by Faas, Mattes, Kao and Baumann [9], introduces a methodology to compare eHMI concepts from a pedestrian's viewpoint. To this end, a quantifiable concept that allows participants to naturally step off a sidewalk to cross the street was developed. Hidden force-sensitive resistor sensors recorded the participants' crossing onset time (COT) in response to real-life videos of approaching vehicles in an immersive crosswalk simulation environment.

#### **Evaluating interior HMIs of automated vehicles**

As long as vehicles can be driven manually or require manual intervention by their users, the interior HMI will still play a crucial part in their safe and efficient usage. However, guidelines and test methods are only slowly being adapted from those of manual and assisted driving. The next three papers investigate methods regarding the assessments of interior HMIs of automated vehicles. The first one "Usability Evaluation—Advances in Experimental Design in the Context of Automated Driving Human–Machine Interfaces" by Albers, Radlmayr, Löw, Hergeth, Naujoks, Keinath and Bengler [10] aggregates common research methods and findings based on an extensive literature review. These methods and findings are discussed critically, taking into consideration requirements for usability assessments of HMIs in the context of conditional automated driving. The paper concludes with a derivation of recommended study characteristics framing best practice advice for the design of experiments.

The second paper "Checklist for Expert Evaluation of HMIs of Automated Vehicles—Discussions on Its Value and Adaptions of the Method within an Expert Workshop" by Schömig, Wiedemann, Hergeth, Forster, Muttart, Eriksson, Mitropulos-Rundus, Grove, Krems, Keinath, Neukum and Naujoks [11] summarizes the results of a workshop about a checklist method for the evaluation of automated vehicles' HMIs. Within this workshop, members of the human factors community were brought together to discuss the method and to further promote the development of HMI guidelines and assessment methods for the design of HMIs of automated driving systems (ADS). The results will be used to further improve the checklist method and make the process available to the scientific community.

The paper "Human–Vehicle Integration in the Code of Practice for Automated Driving" by Wolter, Dominioni, Hergeth, Tango, Whitehouse and Naujoks [12] deals with a new Code of Practice for automated driving (CoP-AD) as part of the publicly funded European project L3Pilot. It provides developers with a comprehensive guideline on how to design and test automated driving functions, with a focus on highway driving and parking. This paper focuses on the human factors aspects addressed in the CoP-AD, which includes, inter alia, general human factors-related guidelines, mode awareness, trust, and misuse, driver monitoring together with the topic of controllability and the execution of customer clinics, as well as the training and variability of users.

#### **Evaluating the influence of driver state, driver availability and situational factors on control transitions and comfort of automated driving**

A crucial human factor in the use of automated driving functions is the driver's state, such as the readiness to take over manual driving, mode awareness, fatigue or motion sickness. The driver's state can have an impact both on the safety of control transitions as well as the perceived comfort and acceptance of automated driving. The following papers provide empirical studies as well as theoretical analyses and test protocols on this issue.

The first one "Sleep Inertia Countermeasures in Automated Driving: A Concept of Cognitive Stimulation" by Wörle, Kenntner-Mabiala, Metz, Fritzsch, Purucker, Befelein and Prill [13] presents the concept and evaluation of a reactive countermeasure against sleep inertia, which could be useful with regard to dual-mode vehicles that allow both manual and automated driving. The so-called "sleep inertia counter-procedure for drivers" (SICD) has been developed with the aim of activating and motivating the driver as well as measuring the driver's alertness level. The SICD is evaluated in a study with drivers in a driving simulator.

The second paper "Methodological Approach towards Evaluating the Effects of Non-Driving Related Tasks during Partially Automated Driving" by Hollander, Rauh, Naujoks, Hergeth, Krems and Keinath [14] describes the development of a test protocol for systematically evaluating the effects of non-driving-related tasks (NDRTs) during partially automated driving (PAD). Two generic take-over situations addressing system limits of a given PAD regarding longitudinal and lateral control were implemented to evaluate drivers' supervisory and take-over capabilities while engaging in different NDRTs (e.g., manual radio tuning task). The test protocol was evaluated and refined across three studies (two in a driving simulator and one on a test track).

The third paper "Mode Awareness and Automated Driving—What Is It and How Can It Be Measured?" by Kurpiers, Biebl, Mejia Hernandez and Raisch [15] introduces a measurement method to assess mode awareness when using automated vehicles. The background of this study is the different responsibility allocation in different automation modes that requires the driver to always be aware of the currently active system and its limits to ensure a safe drive. For that reason, current research focuses on identifying factors that might promote mode awareness. In the method presented by the authors, the behavior aspect is represented by the relational attention ratio in manual, Level 2 and Level 3 driving as well as the controllability of a system limit in Level 2. The knowledge aspect of mode awareness is operationalized by a questionnaire on the mental model for the automation systems after an initial instruction as well as an extensive enquiry following the driving sequence.

The fourth paper "Engagement in Non-Driving Related Tasks as a Non-Intrusive Measure for Mode Awareness: A Simulator Study" by Forster, Geisel, Hergeth, Naujoks and Keinath [16] describes a driving simulator study, based on the expectation that HMI design and practice with different levels of driving automation influence NDRT engagement. Therefore the participants completed several transitions of control and could engage in an NDRT if they felt safe and comfortable to do so. The NDRT was the Surrogate Reference Task (SuRT) as a representative of a wide range of visual-manual NDRTs. Engagement (i.e., number of inputs on the NDRT interface) was assessed at the onset of a respective episode of automated driving (i.e., after transition) and during ongoing automation (i.e., before subsequent transition).

The fifth paper "Methodological Considerations Concerning Motion Sickness Investigations during Automated Driving" by Mühlbacher, Tomzig, Reinmüller and Rittger [17] discusses methodological aspects for investigating motion sickness in the context of automated driving including measurement tools, test environments, sample, and ethical restrictions. Additionally, methodological considerations guided by different underlying research questions and hypotheses are provided. Selected results from the authors' own studies concerning motion sickness during automated driving which were conducted in a motion-based driving simulation and a real vehicle are used to support the discussion.

The sixth paper "Supporting Drivers of Partially Automated Cars through an Adaptive Digital In-Car Tutor" by Boelhouwer, van den Beukel, van der Voort, Verwey and Martens [18] investigates the effects of a Digital In-Car Tutor (DIT) prototype on appropriate automation use and take-over quality during a driving simulator study. A DIT is proposed to support drivers in learning about, and trying out, their car automation during regular drives. Participants needed to use the automation when they thought that it was safe, and turn it off if they did not. The control group read an information brochure before driving, while the experiment group received the DIT during the first driving session.

The seventh paper "The Impact of Situational Complexity and Familiarity on Takeover Quality in Uncritical Highly Automated Driving Scenarios" by Scharfe, Zeeb and Russwinkel [19] differentiates between the objective complexity and the subjectively perceived complexity of a traffic situation. The aim of the present study was to examine the impact of objective complexity and familiarity on the subjectively perceived complexity and the resulting takeover quality. In a driving simulator study, participants were requested to take over vehicle control in an uncritical situation. Familiarity and objective complexity were varied by the number of surrounding vehicles and scenario repetitions. Subjective complexity was measured using the NASA-TLX; the takeover quality was gathered using the take-over controllability rating (TOC-Rating).

The eighth paper "Repeated Usage of an L3 Motorway Chauffeur: Change of Evaluation and Usage" by Metz, Wörle, Hanig, Schmitt and Lutz [20] investigates changes in drivers' evaluation, in function usage and in drivers' reactions to take-over situations with repeated usage of automated driving functions. To this end, drivers used a level 3 (L3) automated driving function for motorways during six experimental sessions in a driving simulator study. They were free to activate/deactivate the system as they liked and to spend driving time on self-chosen side tasks. Experienced trust and safety, the time spent on side tasks, attention directed to the road and behavioral adaptation were then analyzed.

The last paper "Measuring Drivers' Physiological Response to Different Vehicle Controllers in Highly Automated Driving (HAD): Opportunities for Establishing Real-Time Values of Driver Discomfort" by Radhakrishnan, Merat, Louw, Lenné, Romano, Paschalidis, Hajiseyedjavadi, Wei and Boer [21] investigates how driver discomfort was influenced by different types of automated vehicle (AV) controllers, compared to manual driving, and whether this response changed in different road environments, using heart-rate variability and electrodermal activity. The drivers were subjected to manual driving and four AV controllers: two modelled to depict "human-like" driving behavior, one conventional lane-keeping assist controller, and a replay of their own manual drive.

#### **2. Conclusions**

This Special Issue brings together research from well-known human factors experts in the field of automated driving. The impressive number of published papers covering a wide range of research topics on test and evaluation methods for automated vehicle HMIs shows the high relevance of this Special Issue. The Special Issue has thus contributed to the promotion and dissemination of these methods within the scientific community and will hopefully stimulate further research on these topics.

**Acknowledgments:** The guest editors would like to thank the authors for their valuable submissions and the reviewers for their constructive comments. We also thank Helena Opower for proofreading and her input for this editorial.

**Conflicts of Interest:** The authors declare no conflict of interest.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **Comparison of Methods to Evaluate the Influence of an Automated Vehicle's Driving Behavior on Pedestrians: Wizard of Oz, Virtual Reality, and Video**

#### **Tanja Fuest <sup>1,</sup>\*, Elisabeth Schmidt <sup>2</sup> and Klaus Bengler <sup>1</sup>**


Received: 5 May 2020; Accepted: 26 May 2020; Published: 29 May 2020

**Abstract:** Integrating automated vehicles into mixed traffic entails several challenges. Their driving behavior must be designed such that it is understandable for all human road users, and that it ensures an efficient and safe traffic system. Previous studies investigated these issues, especially regarding the communication between automated vehicles and pedestrians. These studies used different methods, e.g., videos, virtual reality, or Wizard of Oz vehicles. However, the extent of transferability between these studies is still unknown. Therefore, we replicated the same study design in four different settings: two video, one virtual reality, and one Wizard of Oz setup. In the first video setup, videos from the virtual reality setup were used, while in the second setup, we filmed the Wizard of Oz vehicle. In all studies, participants stood at the roadside in a shared space. An automated vehicle approached from the left, using different driving profiles characterized by changing speed to communicate its intention to let the pedestrians cross the road. Participants were asked to recognize the intention of the automated vehicle and to press a button as soon as they realized this intention. Results revealed differences in the intention recognition time between the four study setups, as well as in the correct intention rate. The results from vehicle–pedestrian interaction studies published in recent years that used different study settings can therefore only be compared to each other to a limited extent.

**Keywords:** (automated) vehicle–pedestrian interaction; implicit communication; mixed traffic; virtual reality; Wizard of Oz; video; setup comparison/method comparison

#### **1. Introduction**

An increasing number of automated functions are being integrated into vehicles, and it is only a question of time before the first conditionally automated vehicles (AVs) [1] are driving on public highways. In the long term, AVs will also travel in urban spaces that are characterized by an increased complexity compared to driving on highways [2]. In both scenarios, in addition to AVs, human road users (HRUs) will continue to participate in the traffic system. For this reason, AVs must not only be able to detect HRUs, but they must also communicate with them to ensure safe and efficient interaction. Explicit and implicit communication already takes place in road traffic today. For example, in terms of explicit communication, human drivers flash their headlights or deploy the horn to communicate their intentions [3]. For AVs, besides the existing communication forms, it is also possible to extend the explicit communication by using external human–machine interfaces (eHMIs) (e.g., [4–11]), such as light strips [6,12] or displays [4,7,13].

However, it is still unknown whether AVs require eHMIs. In addition, it has not yet been fully investigated which driving profile AVs should follow, and whether these trajectories should differ from situation to situation. The driving profile and eHMI might influence traffic safety, as well as the communication between AVs and other HRUs.

Several studies have already been carried out to investigate the influence of AV markings, eHMIs and driving profiles. Most studies focused on the interaction between AVs and pedestrians, using different methods, e.g., images, videos, virtual reality (VR), or Wizard of Oz (WoZ) vehicles. More recently, driving simulator studies have investigated the interaction between AVs and human drivers. However, the extent of transferability of results between these studies is still unknown.

#### *1.1. Images*

One method that researchers have suggested for the development process of human–machine interaction is the use of images. Images were used, for example, to compare 30 early-stage design concepts of eHMIs within a short space of time [14]. Participants had to rate their understanding of the different concepts. The results gave no clear recommendation regarding the concepts, but the paper concluded that the method is suitable for evaluating design elements at an early stage [14]. The method of presenting photos to participants to evaluate the AV's communication strategies was also used in a preliminary study by [15]. Photos of an approaching vehicle were shown to the participants, who were then asked what they would focus on when crossing the street [15]. The authors found that pedestrians pay particular attention to the AV's braking behavior before crossing the road [15]. Most participants would even wait for a complete standstill, especially when they did not see a driver in the AV [15]. Reference [16] used images of different vehicles to evaluate which vehicle type is most suitable for a subsequent video-based survey.

To sum up, these references suggest that the image setup can be useful for gleaning initial impressions for subsequent studies and for evaluating early stage design concepts.

#### *1.2. Videos*

The subsequent video experiment of [16] was used to evaluate the crossing behavior of participants at an unmarked road, depending on the vehicle's driving behavior and automation state [16]. Again, it was shown that the braking behavior plays an important role in the pedestrians' decision to cross the road, independent of the vehicle's automation status or the presence of a driver [16].

Additional eHMIs have a positive impact on the imagined crossing behavior of pedestrians [13]. During the braking process, eHMIs have influenced the subjective feeling of participants that it is safe to cross [17]. The eHMIs should be installed on the roof, windscreen, or grille; however, projections and eHMIs on wheels should be avoided [17].

The video studies presented were used to identify possible differences between different implicit and explicit AV communication forms.

#### *1.3. Virtual Reality*

Whereas the participants in the video studies sat in front of a monitor, for VR, participants usually saw the environment, including the AV, through a head-mounted display.

The results from a VR study show that pedestrians react with confusion and mistrust to atypical trajectories compared to conventional trajectories [18]. This gives a first hint that VR is a good tool for evaluating pedestrian–vehicle interaction [18]. Other results illustrate that pedestrians understand the AV's driving behavior and recommend early deceleration when yielding [15]. A hard initial braking with a pitch reduced the time pedestrians need to realize an AV's yielding intention [19]. Moreover, defensive driving strategies led to pedestrians starting to cross at an earlier point in time [19].

In addition, eHMIs enhance the interaction between pedestrians and AVs [4] and improve the perceived safety and comfort of participants introduced to the eHMI, when encountering an AV [20]. The vehicle size also has a small effect on the perceived safety [4]: larger vehicles reduce the perceived safety of participants [4]. The authors of [21] integrated a display into their AV mimicking eyes looking at the pedestrians. These "eyes" help pedestrians to feel safer crossing the street and make their decision to cross quicker [21]. However, eHMIs do not necessarily have the same advantages in all countries: using an eHMI when yielding helps pedestrians in Germany and the United States to realize the AV's intention; however, this effect is not apparent for those in China [6]. In addition, the results have shown that, across Germany, the United States, and China, eHMIs deteriorate the pedestrians' recognition of the AV's passing intention [6]. Moreover, the implemented test environment had an influence, especially the sound. The study by [22] showed that spatial audio enhanced task performance compared to unimodal muting.

In summary, many questions concerning explicit and implicit communication of AVs have been investigated in VR. VR setups were especially advantageous due to the cost-effective implementation of a study design that can be replicated in different countries. In addition, the VR setup is more immersive than video or image setups.

#### *1.4. Wizard of Oz*

To investigate the interaction of a user with a computer system that is not yet fully developed, a WoZ approach can be used [23]. In this approach, an investigator—who is hidden from the user—simulates the system [23]. In most WoZ studies that examine the interaction between AVs and pedestrians, seat covers are used to hide the driver from the pedestrians' view, so as to simulate an AV [24–28]. The results of WoZ studies demonstrated that being able to see the driver is not very important for pedestrians [12,25,28]. In the study by [12], only half of the sample recognized the driver; however, when asked directly, they expressed that they felt safer when a driver is present. This result stands in contrast to the results of [28], where the perceived safety was not influenced by being able to see the driver. As a reason for their increased feeling of safety in the study of [12], some participants did not mention the eHMI, but instead mentioned the driving strategy of the AV [12]. This is in line with the results of [8], who stated that pedestrians rely on proven methods, and therefore focus on the driving behavior of vehicles rather than on additional eHMIs. The results also demonstrate that not every eHMI is suitable for communication with pedestrians [12]. The pedestrians did not relate the cyan light bar consisting of 12 LEDs on the roof used in the study to themselves, and could not understand the vehicle's intention as communicated by the eHMI [12].

In recent years, the number of WoZ studies has increased. With the WoZ setup, similar questions were investigated as with the VR setup, but the WoZ method is closer to reality. However, the use of a vehicle, a trained driver, a test track, the objective data measurement, and the safety protocol in WoZ studies are complex and cost-intensive.

#### *1.5. Driving Simulator*

While the design of AV communication initially focused on pedestrian–vehicle interaction, current studies also deal with human driver–AV interaction. In order to evaluate the influence of different driving strategies and eHMIs on other drivers, simulator studies have been conducted. Reference [7] examined the potential of eHMIs in bottlenecks and recommends the use of eHMIs due to a reduced passing time compared to a condition without an interface. However, labeling an AV did not have an influence on drivers in a simulation setup [9,11].

Investigating driver–AV interaction in a driving simulator has the benefit of a risk-free setup, compared to WoZ setups.

#### *1.6. Objectives*

With regard to the different results, the question arises as to the method by which the communication of AVs should be investigated to obtain valid results. Furthermore, it is unclear whether the obtained results can be compared with each other and whether recommendations should be derived from the different studies.

To answer the question of comparability, we replicated the same study design in four different setups: two videos, one VR, and one WoZ approach. The video setup was divided in two parts: In the first part, we used videos from the VR setup, and in the second part, we filmed the WoZ vehicle. To the knowledge of the authors, such a thorough method validation has not been contributed to the state of the art yet.

Based on the previous results, we focused on the comparison of AVs' driving behavior without the use of eHMIs. In particular, the results for the video, VR and driving simulator studies revealed a positive impact of eHMIs on pedestrians' intention-recognition and the imagined crossing behavior. This contrasts with the results of the WoZ studies, in which hardly any effects were found for eHMIs, and in which driving behavior likely plays a greater role in the pedestrians' decision to cross the road. Across all methods, it can be seen that the driving behavior has an influence on the crossing behavior of pedestrians.

Images were excluded as a method variant in this study because they do not illustrate vehicle dynamics. The focus was on pedestrian–AV interaction, as this is the focus of most published studies. For this reason, driving simulator studies are not included in the comparison, as they investigate human driver–AV interaction.

#### **2. Materials and Methods**

#### *2.1. Procedure*

The same study plan was implemented in three different setups, namely WoZ, VR, and videos of both setups. The studies were conducted in Germany, which implies that motorized traffic drove on the right-hand side of the road. In all setups, participants stood at the roadside, in a shared space. An AV approached from the left, using different driving profiles, characterized by changing speed, to communicate whether the HRU (in this case, a pedestrian) was allowed to go first or should wait. Participants were asked to recognize the AV's intention and to press a button when they thought they realized the intention (intention recognition time, IRT).

In the WoZ study, we used the button of a light barrier system. The vehicle activated the sensors after driving over a determined point and a light flashed, when participants pressed the button. This light was visible to the driver, so that he could accelerate to the original speed. Therefore, the rest of the driving profile did not influence pedestrians [24].

In the VR study, participants were asked to press a button on a remote control, and the simulation stopped simultaneously. Additionally, we tracked the walking movement. In this variant, we replicated all driving profiles and asked participants not to press the button, but to cross the virtual street. However, for safety reasons, we did not ask participants in the WoZ setup to cross the street.

In the video setup, participants saw all trials on a monitor. They were asked to press a key on the keyboard, at the moment they realized the intention, upon which the video disappeared.

After each trial, participants had to answer a small number of questionnaire items in each study setup.

#### *2.2. Apparatus*

#### 2.2.1. Wizard of Oz Setup

The WoZ vehicle was a BMW 2 series (F46, 220d xDrive) with automatic transmission and equipped with a speed limiter (Figures 1 and 2). The vehicle was marked as an "automated test vehicle" with two magnetic signs. A non-professional driver drove the vehicle and was hidden from the pedestrians' view by a seat cover (Figure 1). The driver practiced the trajectories, so that there was little deviation with each repetition [24]. We implemented the light barrier system SmartSpeed Pro of the company Fusion Sport, connected to a remote control with one button via Bluetooth, and recorded at a sampling rate of 1000 Hz.

**Figure 1.** Wizard of Oz vehicle: (**a**) seat cover used to hide the driver; (**b**) driver hidden under the seat cover [24].

**Figure 2.** Wizard of Oz setup.

#### 2.2.2. Virtual Reality Setup

An HTC Vive Pro VR setup with a head-mounted display, two infrared sensors, two trackers, and one remote controller was used for the VR study setup. All participants held the remote control in their hand, and a tracker was attached to each foot. The simulation software is based on Unity 3D, and a simulated BMW 3 series (F30) was used (Figure 3). The vehicle had no driver, but also no additional markings. The investigator could manipulate the driving behavior by adding a trajectory path and maneuver points. Driving data and the triggering of the button were recorded at 5 Hz. However, no sound was utilized, due to technical reasons.

**Figure 3.** Virtual reality setup.

#### 2.2.3. Video Setups

We filmed the WoZ vehicle, using a SONY FDR-AX53 with a 26.8 mm wide-angle lens (Figure 2). The camera was mounted on a tripod at a height of 1.61 m, at the same position the participants were standing in the WoZ study. The videos from the VR setup were recorded, using the open-source software OBS (Open Broadcaster Software) studio, and the viewing height was also 1.61 m (Figure 3). The videos were incorporated via HTML, and the survey was accessible from a website. However, since the videos were too large for internet connections with low bandwidth, most participants watched the videos on the premises of the Chair of Ergonomics (Technical University of Munich) or BMW. The invited participants saw the videos on a 24" monitor. For all videos, no sound was recorded.

#### *2.3. Study Design and Variables*

For all four study setups, almost the same study design was implemented. However, there were some small differences between the study setups:


We randomized two AV intentions: either the *AV goes first*, or to *Let the HRU go first*. For both intentions, an unambiguous and ambiguous driving profile was presented to the participants. To communicate the intentions, altered driving strategies were used that differed in the longitudinal dynamics.

#### 2.3.1. Independent Variables

A within-subject design with two AV intentions (*Let the HRU go first* and *AV goes first*) and—for each of these intentions—an unambiguous and an ambiguous driving profile was implemented. Previous studies showed that the IRT is not sensitive enough to evince differences in driving profiles that are rated very well by humans; thus, we chose highly opposite profiles to apply the IRT [24,25]. All profiles were extracted from human trajectories: In a previous study, participants drove three times, in an unambiguous and ambiguous way, to communicate both intentions to a pedestrian. After each trial, participants rated how satisfied they were with the respective driving profile. We extracted the best rated profiles and defined the specified target trajectories. For the factor "Unambiguity of Driving Profiles" the driver drove either in an understandable or misleading way, to communicate both intentions.

For both intentions, the vehicle accelerated to 28.5 km/h on a 100 m test track. All indicated distances refer to the vehicle's front bumper. If the *AV goes first*, it had a speed of at least 20 km/h when passing the pedestrian. For the second intention, to *Let the HRU go first* the AV decelerated and came to a full stop.

The driving profile *AV goes first, unambiguous* is defined by a constant speed of 28.5 km/h. In contrast, for the profile *AV goes first, ambiguous* the vehicle accelerated to 28.5 km/h and decelerated to 13 km/h after 60 m. After another 32.6 m (7.4 m distance from the pedestrian's position), the vehicle accelerated again (Figure 4).

**Figure 4.** Unambiguous (solid lines) and ambiguous (dashed lines) target trajectory for the intentions *Let the HRU go first* (gray lines) and *AV goes first* (black lines). The vertical dashed line represents the position of the beginning of the time measurement.

For the intention *Let the HRU go first* the vehicle decelerated in two different ways. For the driving profile *Let the HRU go first, unambiguous* the vehicle decelerated by at most 1.5 m/s<sup>2</sup>, starting at a distance of 60 m from the start position. Thus, it started decelerating at the same point as in the driving profile *AV goes first, ambiguous*. The vehicle stopped completely 7.4 m away from the pedestrian, at the same point at which the vehicle accelerated in the *AV goes first, ambiguous* profile. In contrast to the smooth deceleration (at max. 1.5 m/s<sup>2</sup>) for the unambiguous profile, the vehicle decelerated by at most 4.1 m/s<sup>2</sup> for the driving profile *Let the HRU go first, ambiguous*. The braking process started at 85.2 m from the starting position; hence, the vehicle began to slow down 25.2 m after the braking point of the unambiguous driving profile. The vehicle stopped completely after a driving distance of 95.7 m, 4.3 m away from the pedestrian's position.
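The stated braking points and stopping points can be checked against the stated maximum decelerations with a short calculation. The following is a minimal sketch, not part of the study's analysis: it assumes that the vehicle is travelling at the full 28.5 km/h when braking starts and that the deceleration is constant over the braking distance (so the result is a mean value, which should stay below the reported maxima). The function name `mean_deceleration` is a hypothetical helper introduced for illustration.

```python
def mean_deceleration(speed_kmh: float, braking_distance_m: float) -> float:
    """Constant deceleration (m/s^2) that brings a vehicle from speed_kmh
    to rest over braking_distance_m, using v^2 = 2 * a * d."""
    v = speed_kmh / 3.6  # convert km/h to m/s
    return v ** 2 / (2.0 * braking_distance_m)


# Distances are measured from the start position (front bumper), as in the text.
# Unambiguous profile: braking starts at 60 m, standstill at 60 m + 32.6 m = 92.6 m.
unambiguous = mean_deceleration(28.5, 92.6 - 60.0)
# Ambiguous profile: braking starts at 85.2 m, standstill at 95.7 m.
ambiguous = mean_deceleration(28.5, 95.7 - 85.2)

print(f"Let the HRU go first, unambiguous: {unambiguous:.2f} m/s^2")
print(f"Let the HRU go first, ambiguous:   {ambiguous:.2f} m/s^2")
```

Under these assumptions, the mean decelerations come out at roughly 0.96 m/s<sup>2</sup> and 2.98 m/s<sup>2</sup>, both below the reported maxima of 1.5 m/s<sup>2</sup> and 4.1 m/s<sup>2</sup>, which is what one would expect for profiles whose deceleration varies over the braking phase.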

#### 2.3.2. Dependent Variables

As mentioned, participants pressed a button when they thought they had recognized the vehicle's intention [15,24,25]. We measured the time lapse between the vehicle being at a 40 m distance from the pedestrian's position and the moment at which the participants pressed the button. This time lapse, the IRT, was measured for each trial.
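As an illustration of this definition, the IRT can be derived from a logged vehicle trajectory and the time of the button press. The sketch below is an assumption-laden example rather than the authors' processing pipeline: the log format (pairs of time and front-bumper position), the function names, and the example button press at 9.5 s are hypothetical, and the pedestrian position of 100 m from the start is inferred from the stated distances (standstill at 95.7 m, 4.3 m away from the pedestrian).

```python
from typing import Sequence, Tuple


def trigger_time(samples: Sequence[Tuple[float, float]], trigger_pos_m: float) -> float:
    """Time at which the front bumper first reaches trigger_pos_m, linearly
    interpolated between logged (time_s, position_m) samples."""
    for (t0, x0), (t1, x1) in zip(samples, samples[1:]):
        if x0 <= trigger_pos_m <= x1:
            if x1 == x0:  # vehicle standing exactly on the trigger position
                return t0
            return t0 + (trigger_pos_m - x0) / (x1 - x0) * (t1 - t0)
    raise ValueError("vehicle never reached the trigger position")


def intention_recognition_time(
    samples: Sequence[Tuple[float, float]],
    t_button_s: float,
    pedestrian_pos_m: float = 100.0,   # assumed: inferred from 95.7 m + 4.3 m
    start_distance_m: float = 40.0,    # time measurement starts 40 m before the pedestrian
) -> float:
    """IRT = time of button press minus time at which the vehicle was
    start_distance_m away from the pedestrian."""
    t_start = trigger_time(samples, pedestrian_pos_m - start_distance_m)
    return t_button_s - t_start


# Hypothetical log: constant 28.5 km/h, sampled at 5 Hz for about 12 s.
v = 28.5 / 3.6
samples = [(i * 0.2, v * i * 0.2) for i in range(60)]
irt = intention_recognition_time(samples, t_button_s=9.5)
print(f"IRT: {irt:.2f} s")
```

Linear interpolation is used here only because a 5 Hz log rarely contains a sample exactly at the 40 m mark; a light barrier, as in the WoZ setup, would provide the start time directly.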

After each trial, participants filled out a five-item questionnaire. This questionnaire was already published in [24] and based on previous studies [15,25]. Based on the IRT, the participants were asked about the vehicle's assumed intention (*Let the HRU go first* or *AV goes first*) and whether they would cross the street at the moment they recognized the intention. Then, pedestrians evaluated their certainty about the vehicle's intention (very uncertain to very certain), the vehicle's driving behavior (very poor to very good), and the perceived criticality of the situation (very critical to very uncritical) on a five-point Likert scale [24]. In the video study, participants were also asked if the video activity had run smoothly from a technical point of view. This item was used to exclude data from the evaluation if videos had frozen during playback.

In order to track the walking movement in the VR setup, we asked participants not to press the button, but to cross the virtual street. The trackers on each foot detected when the participant walked over a virtual line. This line was located about one meter from the participants' starting position. In order to be able to compare the time at the beginning of road crossing with the IRT, the times were synchronized: In both cases, the time measurement started at a 40 m distance from the pedestrian's position. However, the IRT was always independent from the walking movement (Figure 5).


**Figure 5.** Comparison of the different setups.

#### *2.4. Sample*

For the VR setup, 37 participants (23 male and 14 female) with a mean age of *M* = 27.32 years (*SD* = 9.93 years) and for the WoZ experiment 34 participants (24 male and 10 female) with a mean age of *M* = 40.94 years (*SD* = 21.39 years) were recruited via BMW and postings at the Technical University of Munich (Table 1). In the video setup, from altogether 46 participants (20 male and 26 female) with a mean age of *M* = 30.50 years (*SD* = 11.55 years), 28 participants were recruited via emailing lists among BMW employees and postings at the Technical University of Munich, and the remaining participants participated online. All participants received compensation; however, in the video setup, participants either received monetary compensation or—the participants who participated online—were entered into a lottery for vouchers for an electronic commerce company.

**Table 1.** Samples for all study setups.


On average, participants travel as pedestrians in traffic *M* = 7.06 h (*SD* = 6.33 h) per week in the WoZ setup, *M* = 8.03 h (*SD* = 6.01 h) per week in the VR setup, and *M* = 6.57 h (*SD* = 5.76 h) per week in the video setup.

#### *2.5. Analysis*

The different study setups (WoZ, VR, and video) were compared with a between-subject design. However, for the video setups, we had two kinds of videos (video WoZ and video VR) and dependent samples. The samples of the WoZ, the VR, and the two video setups are independent of each other. We were only interested in the comparison between WoZ and VR; WoZ and video WoZ setup; and VR and the video VR study (Figure 5). Therefore, all outcomes are related to these comparisons. Moreover, as a result of the different nature of the samples (the samples of the two video setups are dependent and the other samples independent), a statistical analysis was not useful for all results and most data were compared descriptively.

For the WoZ setup, we had to exclude three participants, because they did not understand the task. In the VR setup, seven participants did not press the button. Therefore, the IRT was evaluated for only 30 participants; however, subjective data are still described for all 37 participants. For the video setups, we asked participants to answer if the video ran smoothly from a technical point of view. All trials in which participants indicated technical problems were excluded from the evaluation.

Due to the different setups, we had dissimilar maximum values for the IRT: in the WoZ setup, the driving behavior varied from trial to trial because of the human driver [24]. Accordingly, the videos of the WoZ setup are also dependent on the driver. Both videos were cut at the moment the AV came to a complete stop or had passed the pedestrian. The time may vary due to human error, so the lengths of the routes were calculated to specify the maximum IRTs. Therefore, it is not possible to compare the absolute values of the IRT between the different setups. However, we had the participants' answers about the vehicle's assumed intention and if they would cross the street. Both dependent variables are related to the IRT, but can still be evaluated.

#### **3. Results**

This section is divided into five subsections. In the first three subsections, the setups are compared with each other with regard to the frequency of misinterpretations of intentions (Section 3.1), the mentioned crossing behavior (Section 3.2), and the time of decision (Section 3.3). In Section 3.4, we analyze for each setup, separately, whether the unambiguity of driving profiles led to different IRTs and evaluations of driving behavior. In Section 3.5, the IRT is compared to the start of crossing behavior for the VR setup.

#### *3.1. Misinterpretations of Intentions*

Table 2 presents the misinterpretation rate for the intention *Let the HRU go first*, while Table 3 illustrates the misinterpretation rate for the intention *AV goes first* for all setups. The results of the misinterpretations of intentions for the WoZ study were already published in [24].

For the intention *Let the HRU go first*, we found correct interpretation rates of 100% (WoZ), 97% (VR), 96% (video VR), and 89% (video WoZ) for the unambiguous driving profile. In contrast, the interpretation for the ambiguous driving profile was only correct in 23% (WoZ), 36% (video VR), 39% (video WoZ), and 70% (VR) of all trials.

For the intention *AV goes first*, the results showed a similar outcome. For the unambiguous driving profile, we found correct interpretation rates of 93% (video WoZ), 97% (WoZ and VR), and 98% (video VR). In contrast, the interpretation for the ambiguous driving profile was only correct in 29% (WoZ), 60% (VR), 68% (video WoZ), and 72% (video VR) of all trials.

To sum up, for all methods, the ambiguous driving profiles led to higher misinterpretation rates than the unambiguous profiles. This effect is especially pronounced for the WoZ setup, whereas it is more moderate for the VR setup. The video setups, however, showed a different pattern for the ambiguous driving profiles. For the intention *Let the HRU go first*, the misinterpretation rates of both video setups lie between the rates for the WoZ and VR setups. In contrast, for the intention *AV goes first*, the misinterpretation rates of both video setups are lower than those for the WoZ and VR setups.
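The size of this effect per setup can be made explicit by converting the reported correct-interpretation rates into misinterpretation rates and taking the difference between the ambiguous and the unambiguous profile. The following sketch uses only the percentages reported above; the data structure and the function name `misinterpretation_increase` are illustrative choices, and the differences are simple descriptive values in percentage points, not a statistical test.

```python
# Correct-interpretation rates in percent (unambiguous, ambiguous),
# taken from the text of Section 3.1.
correct = {
    "Let the HRU go first": {
        "WoZ": (100, 23),
        "VR": (97, 70),
        "video VR": (96, 36),
        "video WoZ": (89, 39),
    },
    "AV goes first": {
        "WoZ": (97, 29),
        "VR": (97, 60),
        "video VR": (98, 72),
        "video WoZ": (93, 68),
    },
}


def misinterpretation_increase(rates):
    """Increase in misinterpretation rate (percentage points) from the
    unambiguous to the ambiguous profile, per setup."""
    return {
        setup: (100 - ambiguous) - (100 - unambiguous)
        for setup, (unambiguous, ambiguous) in rates.items()
    }


increase = {intention: misinterpretation_increase(r) for intention, r in correct.items()}

for intention, per_setup in increase.items():
    for setup, delta in per_setup.items():
        print(f"{intention:22s} {setup:10s} +{delta} percentage points")
```

For both intentions, the WoZ setup shows the largest increase (77 and 68 percentage points), while the VR setup shows a considerably smaller one (27 and 37 percentage points).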


**Table 2.** Misinterpretations of the intention *Let the HRU go first*.

**Table 3.** Misinterpretations of the intention *AV goes first*.


#### *3.2. Mentioned Crossing Behavior*

Besides the vehicle's assumed intention, we asked participants if they would cross the street. Tables 4–7 present the mentioned crossing behavior for all four intentions and setups. The tables are subdivided into the correctly or incorrectly recognized intention and the respective mentioned crossing behavior.

In total, 52% of all participants correctly realized the intention for *Let the HRU go first, unambiguous* and would have crossed the road in the WoZ study. This value is higher for the other study setups: 62% for the VR setup, 67% for the video WoZ setup, and 85% for the video VR setup (Table 4). Compared to the ambiguous driving profile, more participants would have crossed the road (Table 5). The tendency for the WoZ and the VR setup is the same: More participants would have crossed the road in the VR setup, as compared to the WoZ setup (Figure 6). Nevertheless, for the unambiguous driving profile, the highest number of participants crossed the road for both video setups, whereas for the ambiguous driving profile, the fewest participants crossed the road for the video setups (Figure 6).

**Figure 6.** Mentioned crossing behavior for the intention *Let the HRU go first*, for the participants who understood the intention correctly.

For the intention *AV goes first*, it poses a safety risk if participants misunderstand the intention and would still cross the road. That risk is higher for the ambiguous driving profile for all study setups than for the unambiguous driving profile (Figure 7). Especially for the ambiguous driving profile, fewer participants would have crossed the road by mistake in the VR setup (16%), as compared to the WoZ setup (23%). The result for the video WoZ setup had the same tendency as the WoZ setup (WoZ: 23%, video WoZ: 22%; Table 7); in addition, the video VR setup had the same tendency as the VR setup (VR: 16%; video VR: 16%; Table 7). However, for the unambiguous driving profile, the collision risk was comparatively low for all four study setups (Table 6).

**Figure 7.** Mentioned crossing behavior for the intention *AV goes first*, for the participants who misunderstood the intention.



**Table 5.** Mentioned crossing behavior for the intention *Let the HRU go first, ambiguous*.


**Table 6.** Mentioned crossing behavior for the intention *AV goes first, unambiguous*.



**Table 7.** Mentioned crossing behavior for the intention *AV goes first, ambiguous*.

#### *3.3. Time of Decision*

For the time of decision, we evaluated how often the participants waited to press the button until the AV came to a complete standstill or passed by for each setup. To analyze this, only correct answers were included. Therefore, *n* varies for the different driving strategies and settings.

The results showed that, for the WoZ setup, only one participant waited until the AV passed by. However, in the other three setups, more participants waited for a complete standstill when faced with the ambiguous driving profile, compared to the unambiguous driving profile (Table 8). For the intention *AV goes first*, more participants waited in the VR and video VR setup for the AV to pass by with the ambiguous driving profile, as compared to the unambiguous profile. Only for the video WoZ setup did more participants wait for a complete standstill when faced with the unambiguous driving profile (Table 8).

**Table 8.** Percentage and number of participants who waited to press the button until the AV came to a complete standstill or passed by, for each setup.


#### *3.4. Unambiguity of Driving Profiles: Subjective Data and Intention Recognition Time*

To evaluate the subjective data and the IRT, we only used correct answers. As we focused only on the comparison between WoZ and VR, WoZ and video WoZ, VR and video VR, and video WoZ and video VR (Figure 5), we calculated planned contrasts between those setups and compared the *p*-values with a Bonferroni-corrected alpha of 0.0125. For the comparison between the independent samples, Mann–Whitney U-tests were calculated, and for the comparison between the two video setups (in which the samples are dependent), Wilcoxon tests were calculated (Figure 8).
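The corrected alpha follows from dividing the conventional level of 0.05 by the four planned comparisons. As a minimal sketch of how this threshold is applied, the example below uses the four *p*-values reported in Section 3.4.2 for the comparison of the WoZ and VR setups; the function name `bonferroni_alpha` and the dictionary layout are assumptions made for illustration, and the test statistics themselves are not recomputed here.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Four planned contrasts: WoZ vs. VR, WoZ vs. video WoZ,
# VR vs. video VR, and video WoZ vs. video VR.
alpha_corrected = bonferroni_alpha(0.05, 4)

# p-values reported in Section 3.4.2 for the subjective decision-making
# reliability, WoZ versus VR (Mann-Whitney U-tests).
p_values = {
    "Let the HRU go first, unambiguous": 0.167,
    "Let the HRU go first, ambiguous": 0.892,
    "AV goes first, unambiguous": 0.019,
    "AV goes first, ambiguous": 0.004,
}

significant = {profile: p < alpha_corrected for profile, p in p_values.items()}

print(f"corrected alpha: {alpha_corrected:.4f}")
for profile, is_significant in significant.items():
    print(f"{profile:36s} p = {p_values[profile]:.3f} significant: {is_significant}")
```

Note that *p* = 0.019 would count as significant at an uncorrected alpha of 0.05 but not at the corrected level, which is why only the profile *AV goes first, ambiguous* is reported as significant for this comparison.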

**Figure 8.** Comparison for the subjective data.

As already published in [24] for the WoZ setup, we also tested whether the driving profiles led to different IRTs and evaluations of driving behavior. Therefore, we used the mean of the repeated measurements for every dependent variable for each driving profile for the results of the WoZ setup. Hence, two non-parametric Wilcoxon tests were calculated for all dependent variables (one for each intention), and we compared the *p*-values with an alpha of 0.05.

#### 3.4.1. Intention Recognition Time

The Wilcoxon tests only revealed significant differences for the intention *Let the HRU go first* for the two video setups. Moreover, the IRT was higher for the ambiguous driving profile for the WoZ, VR and video WoZ setups, whereas for the video VR setup, the IRT was higher for the unambiguous driving profile (Table 9).

However, for the intention *AV goes first*, significant differences for all four setups comparing the unambiguous and ambiguous driving profile were found (Table 9). For all four setups, participants needed more time to correctly interpret the ambiguous driving profile.


**Table 9.** Median (*Mdn*) of the IRT (measured in seconds), segregated by setup.

#### 3.4.2. Subjective Decision-Making Reliability

For the intention *Let the HRU go first, unambiguous* (*z* = −1.38, *p* = 0.167), the intention *Let the HRU go first, ambiguous* (*z* = −0.14, *p* = 0.892), and the intention *AV goes first, unambiguous* (*z* = −2.35, *p* = 0.019), we did not find significant differences between the WoZ and VR setups after the Bonferroni correction. However, for the intention *AV goes first, ambiguous*, there was a significantly higher subjective decision-making reliability (*z* = −2.84, *p* = 0.004, *r* = 0.45) for the VR setup (*Mdn* = 5.0), as compared to the WoZ setup (*Mdn* = 3.0; Figure 9).

**Figure 9.** Boxplots for the subjective decision-making reliability (1 = very uncertain; 5 = very certain), segregated by setup (\* = *p* < 0.0125).

The comparison between VR and video VR revealed no significant differences for any of the four intentions (*Let the HRU go first, unambiguous*: *z* = −0.58, *p* = 0.561; *Let the HRU go first, ambiguous*: *z* = −0.08, *p* = 0.934; *AV goes first, unambiguous*: *z* = −0.05, *p* = 0.963; *AV goes first, ambiguous*: *z* = −1.43, *p* = 0.154).

In addition, the results for the subjective decision-making reliability of the WoZ and the video WoZ setups revealed no significant differences (*Let the HRU go first, unambiguous*: *z* = −1.61, *p* = 0.107; *Let the HRU go first, ambiguous*: *z* = −0.324, *p* = 0.746; *AV goes first, unambiguous*: *z* = −1.39, *p* = 0.163; *AV goes first, ambiguous*: *z* = −2.01, *p* = 0.045).

We also found no significant differences for the video VR and the video WoZ setups (*Let the HRU go first, unambiguous*: *z* = −2.39, *p* = 0.017; *Let the HRU go first, ambiguous*: *z* = −0.33, *p* = 0.740; *AV goes first, unambiguous*: *z* = −0.43, *p* = 0.668; *AV goes first, ambiguous*: *z* = −0.53, *p* = 0.595).

The boxplots (Figure 9) illustrate that the inter-quartile ranges (IQRs) for the WoZ setup for the intention *Let the HRU go first* are both comparatively small. In contrast, for the intention *AV goes first* the boxplots differ in their IQRs with regard to the unambiguous and the ambiguous driving profile: The range for the ambiguous driving profile is greater than the range for the unambiguous driving profile. The boxplots for the VR setup reveal a different result: the IQRs for the intention *AV goes first* are both small. For the intention *Let the HRU go first* the range is greater for the ambiguous driving profile than for the unambiguous profile. As presented in Section 3.3, more participants in the VR setup waited for a complete standstill or for the vehicle to pass before answering the questions. For both driving strategies, the participants who waited for the complete driving strategy were very confident in their decision (first quartile, median, and third quartile: 5.0). For the other participants, the ratings are spread much more widely (first quartile: 2.8, median: 4.0, and third quartile: 4.3).

The IQRs for the video setups are relatively small for the unambiguous driving profiles, but comparatively large for the *AV goes first, ambiguous* driving profile. This is comparable with the boxplots from the WoZ setup. However, for the intention *Let the HRU go first, ambiguous*, the IQR for the video WoZ setup is much greater than for the video VR setup and the WoZ setup. For both video setups, the number of participants who waited for the complete driving profile is relatively high (Table 8).

For the intention *Let the HRU go first*, none of the setups showed a significant difference in terms of decision-making reliability between the ambiguous and the unambiguous driving profile. For the WoZ setup, the subjective decision-making reliability revealed a significant difference for the driving profile *AV goes first* between the unambiguous and the ambiguous driving profile (Table 10; the median in Table 10 for the WoZ setup differs from the median in Figure 9, since we used the mean of the repeated measurements for comparison within the setup). The participants were more confident with their decision when the driving profile was unambiguous. This is comparable with the results from both video setups, even if these were not significant.


**Table 10.** Median (*Mdn*) of the subjective decision-making reliability (1 = very uncertain, 5 = very certain), segregated by setup.

#### 3.4.3. Evaluation of Driving Behavior

Just as for the subjective decision-making reliability, the evaluation of driving behavior showed no significant differences between the WoZ and VR setups for the intention *Let the HRU go first, unambiguous* (*z* = −0.38, *p* = 0.702) and the intention *Let the HRU go first, ambiguous* (*z* = −1.42, *p* = 0.156). We also found no significant difference for the intention *AV goes first, ambiguous* (*z* = −0.34, *p* = 0.734). However, the participants rated the driving behavior significantly better in the WoZ setup (*Mdn* = 4.0) than in the VR setup (*Mdn* = 4.0) (*z* = −4.59, *p* ≤ 0.001, *r* = 0.47) for the intention *AV goes first, unambiguous* (Figure 10).

The comparison between the WoZ and the video WoZ setup showed a significant difference for the intention *Let the HRU go first, unambiguous* (*z* = −3.12, *p* = 0.002, *r* = 0.33). The rating is better for the WoZ setup (*Mdn* = 4.0) than for the video WoZ setup (*Mdn* = 4.0). Moreover, the intention *AV goes first, unambiguous* revealed a significantly better rating for the WoZ setup (*Mdn* = 4.0) than for the video WoZ setup (*Mdn* = 4.0; *z* = −4.20, *p* ≤ 0.001, *r* = 0.42). For the intention *Let the HRU go first, ambiguous* (*z* = −2.04, *p* = 0.041), and *AV goes first, ambiguous* (*z* = −0.42, *p* = 0.678) no significant differences were found.

No significant differences were found for any of the intentions when comparing the VR and video VR setups (*Let the HRU go first, unambiguous*: *z* = −1.63, *p* = 0.103; *Let the HRU go first, ambiguous*: *z* = −1.00, *p* = 0.319; *AV goes first, unambiguous*: *z* = −0.08, *p* = 0.936; *AV goes first, ambiguous*: *z* = −0.11, *p* = 0.909), or the video WoZ and video VR setups (*Let the HRU go first, unambiguous*: *z* = −0.28, *p* = 0.776; *Let the HRU go first, ambiguous*: *z* = −0.14, *p* = 0.890; *AV goes first, unambiguous*: *z* = −1.08, *p* = 0.279; *AV goes first, ambiguous*: *z* = −1.04, *p* = 0.299).

**Figure 10.** Boxplots for evaluation of driving behavior (1 = very poor, 5 = very good), segregated by setup (\* = *p* < 0.0125).

The IQRs for all boxplots for the WoZ setups are comparatively small. However, with the exception of the intention *AV goes first, unambiguous*, the IQRs for the VR setup are rather large. For the mentioned intention, very few participants (6%) waited until the vehicle had passed by (Table 8). The large IQRs for both ambiguous driving profiles might have occurred due to those participants who waited to see the entirety of the driving profiles (Figure 11). However, this does not explain the larger IQR for the intention *Let the HRU go first, unambiguous*, because only four participants waited for the complete standstill of the AV (Table 8). In addition, the boxplots for both video setups revealed different IQRs that cannot be explained by the fact that some participants waited. However, all boxplots illustrate that the unambiguous driving profiles tend to be rated better than the ambiguous driving profiles (Figure 10).

**Figure 11.** Boxplots for evaluation of driving behavior (1 = very poor, 5 = very good) for the VR setup, segregated by time of decision (before the AV reached standstill or after the AV reached standstill, and before the AV passed by or waited until the AV passed by).

The evaluation of the driving behavior showed significant differences between the unambiguous and ambiguous driving profiles for all four setups and both driving strategies (*Let the HRU go first* and *AV goes first*). The participants rated the unambiguous driving profiles better than the ambiguous driving profiles in all four setups (Table 11). Here, the median listed in the table deviates from the median shown in the boxplots for the WoZ setup because the mean of the repeated measurements was used for the table.


**Table 11.** Median (*Mdn*) of the evaluation of driving behavior (1 = very poor, 5 = very good), segregated by setup.

#### 3.4.4. Perceived Criticality

In terms of perceived criticality, no significant differences were revealed between the WoZ and VR setups (*Let the HRU go first, unambiguous*: *z* = −0.32, *p* = 0.749; *Let the HRU go first, ambiguous*: *z* = −0.46, *p* = 0.645; *AV goes first, unambiguous*: *z* = −1.44, *p* = 0.151; *AV goes first, ambiguous*: *z* = −0.25, *p* = 0.801). However, we found significant differences for the intention *Let the HRU go first, ambiguous* (*z* = −2.56, *p* = 0.011, *r* = 0.26) and the intention *AV goes first, unambiguous* (*z* = −2.79, *p* = 0.005, *r* = 0.28) between the WoZ and the video WoZ setups (Figure 12). For both intentions, the perceived criticality is higher for the WoZ setup (for both intentions: *Mdn* = 4.0), as compared to the video WoZ setup (*Let the HRU go first, ambiguous*: *Mdn* = 3.0; *AV goes first, unambiguous*: *Mdn* = 4.0). For the intention *Let the HRU go first, unambiguous* (*z* = −1.44, *p* = 0.151) and for the intention *AV goes first, ambiguous* (*z* = −0.10, *p* = 0.917), no significant differences were found.

**Figure 12.** Boxplots for perceived criticality (1 = very critical, 5 = very uncritical), segregated by setups (\* = *p* < 0.0125).

Moreover, the comparison of the VR and video VR setups (*Let the HRU go first, unambiguous*: *z* = −1.50, *p* = 0.134; *Let the HRU go first, ambiguous*: *z* = −1.03, *p* = 0.305; *AV goes first, unambiguous*: *z* = −1.14, *p* = 0.255; *AV goes first, ambiguous*: *z* = −0.43, *p* = 0.595) revealed no significant differences.

Furthermore, no differences were found for the perceived criticality between the video VR and video WoZ setup (*Let the HRU go first, unambiguous*: *z* = 0.00, *p* ≥ 0.999; *Let the HRU go first, ambiguous*: *z* = −0.82, *p* = 0.412; *AV goes first, unambiguous*: *z* = −1.89, *p* = 0.059; *AV goes first, ambiguous*: *z* = −0.86, *p* = 0.388).

All boxplots illustrate that the ambiguous driving profiles tend to be rated more critically than the unambiguous driving profiles (Figure 12). The boxplots for both ambiguous driving profiles showed larger IQRs for all setups compared to the unambiguous driving profiles. The only exception is the boxplot for the intention *AV goes first, ambiguous* for the video VR setup: The IQRs are not larger for the ambiguous driving profile than for the unambiguous driving profile. This is independent of whether the participants waited to see the entirety of the driving profile (IQRs for both groups: first quartile: 2.0, median: 2.0, third quartile: 3.0).

We also evaluated the extent to which the unambiguity influences the perceived criticality for all setups. In all four setups, participants rated the situation to be significantly less critical if the driving profile was unambiguous for both intentions (Table 12). As before, the median in the boxplot differs from the median listed in the table for the WoZ setup, because the mean of the repeated measurements for the comparison was used for the table (Table 12).


**Table 12.** Median (*Mdn*) of the perceived criticality (1 = very critical, 5 = very uncritical), segregated by setup.

#### *3.5. VR Study: IRT vs. Start of Road Crossing*

As mentioned in Section 2.3.2, we asked participants in the VR setup to cross the street instead of pressing a button. Reaction times such as IRTs and the crossing time were not normally distributed. Therefore, two Wilcoxon tests were calculated to evaluate possible differences between the IRT and the crossing time for the intention *Let the HRU go first*.

The results revealed that participants made their decision for the intention *Let the HRU go first*, *unambiguous* earlier (IRT, *Mdn* = 4.5 s) and waited significantly longer to cross the street (*Mdn* = 7.2 s; *z* = −5.09, *p* ≤ 0.001, *r* = 0.87). A comparable result was found for the intention *Let the HRU go first*, *ambiguous* (*z* = −3.90, *p* ≤ 0.001, *r* = 0.76). Participants made their decision first (IRT, *Mdn* = 4.8 s) and crossed the street later (*Mdn* = 6.9 s). This leads to lower misinterpretation rates for all intentions (Table 13).



Just as with the IRT, there are no significant differences between the unambiguous and the ambiguous driving profiles for the start of road crossing (*z* = −0.77, *p* = 0.442).

#### **4. Discussion**

The aim of the study was to compare different study setups that can be used to evaluate the driving behavior of AVs, in order to be able to give indications as to whether already-conducted studies can be compared with each other. Therefore, we replicated the same study design in four different settings: WoZ, VR, video WoZ, and video VR. In all studies, participants stood at the roadside in a shared space. An AV approached from the left, using different driving profiles, characterized by changing speed as a way of communicating its intention to let the pedestrian cross the road. Participants were asked to recognize the intention of the AV and to press a button as soon as they had realized this intention.

Since the WoZ setup is the closest to reality, the authors assume that the values measured in this setup are the most realistic ones. The results of the other setups were therefore compared against those of the WoZ setup.

The misinterpretation rates for the ambiguous driving profiles were underestimated in VR, video WoZ, and video VR, as compared to the WoZ setup: The misinterpretation rate is lower in those setups. However, differences between unambiguous and ambiguous driving strategies were revealed in all setups, since the misinterpretation rate was higher for ambiguous driving profiles compared to the unambiguous profiles. This coincides with the results of previous studies, employing video, VR, and WoZ setups, where pedestrians refer to differences in driving strategies when crossing the road (e.g., [8,12,15,16,19]).

For the intention *Let the HRU go first*, it was preferable that participants recognize the intention correctly and cross the road before the AV had to come to a standstill. The results for the crossing behavior showed that the proportion of those pedestrians is overestimated in VR, video WoZ, and video VR, as compared to the WoZ setup for the unambiguous and the ambiguous driving profile. While the results for both video setups for the intention *Let the HRU go first, ambiguous* are approximately the same (Δ 1%), there is a rather high discrepancy for the intention *Let the HRU go first, unambiguous* (Δ 18%). This result suggests that the crossing behavior is dependent on the type of video.

As mentioned in the results, it poses a safety risk if participants misunderstand the intention and cross the road for the intention *AV goes first*. As for the misinterpretation rate, all setups detect this risk especially for the ambiguous driving profile. While the risk for the unambiguous driving strategy is assessed almost equally by all setups, the risk was underestimated in the VR setup for the ambiguous driving profile compared to the WoZ setup (WoZ vs. VR: *AV goes first, unambiguous* Δ 1%, *AV goes first, ambiguous* Δ 6%). Just like the results for the intention *Let the HRU go first, unambiguous*, the results for the intention *AV goes first, ambiguous* are also dependent on the choice of video: The video WoZ setup can reproduce the critical crossing rate from the WoZ setup (Δ <1%), and the video VR can reproduce the results from the VR setup (Δ <1%).

The comparison also showed that, in the WoZ setup, only one participant waited to see the whole driving profile; all others had made their decision before this point. In the VR setup, a total of 20% of all participants who correctly realized the intention waited to make their decision until the end of the driving profile. That rate is higher for the ambiguous driving profile (39%) than for the unambiguous profile (9%). Therefore, it seems that the perception of the driving profiles is more difficult for participants in a VR setup. However, understanding intentions from the driving profiles appears to be even more difficult when only seeing videos. The share of participants who waited until the end of the driving profile was highest in the video WoZ setup (46%); in the video VR setup, many participants also waited to see the whole driving profile (32%). It is possible to differentiate between unambiguous and ambiguous driving profiles with just the results of a VR or a video study, but the results are not transferable to reality, because the pedestrians made their decisions in the WoZ setup at an earlier stage.

The results for the subjective decision-making reliability do not permit a clear conclusion on the basis of the significance tests. The different IQRs result from participants who waited until the vehicle had come to a complete standstill or had passed by, depending on the study setup. However, the results for the WoZ setup revealed the greatest IQR for the intention *AV goes first, ambiguous*. In addition, the comparison between the *AV goes first, unambiguous* and *AV goes first, ambiguous* driving profile in the WoZ setup showed the only significant difference across all setups. The results indicate that the *AV goes first, ambiguous* profile leads to the most uncertainties. In contrast, the *AV goes first, unambiguous* profile revealed the smallest IQRs across all setups. A reason could be that, in this driving strategy, the AV does not change its speed.

This can also be seen for the evaluation of the driving profile: in all four setups, the IQRs for the intention *AV goes first, unambiguous* were small. The driving strategy led to clear trends in the evaluations. With one exception, the intention *AV goes first, ambiguous*, the driving strategies were rated better in the WoZ setup. The intention *AV goes first, ambiguous* is rated equally poorly in all setups. When looking at the boxplots and the significance tests, it becomes clear that the item can be used to distinguish between unambiguous and ambiguous driving strategies in all settings. This effect can especially be seen for the WoZ setup, because the effect size is greatest for this setup, compared to the other three setups. However, the IQRs for the VR setup (to some extent) and especially for the video setups cannot be explained by the results. This could be due to perception and/or decision artefacts.

The perceived criticality is higher in the WoZ setup for some intentions, as compared to the video WoZ setup. However, there is no clear tendency for the perceived criticality to be systematically underestimated in the video setups or the VR setup. It is possible in all setups to differentiate between the unambiguous and ambiguous driving profiles. However, the effect size is greatest for the WoZ setup.

In addition to the setup comparison, the VR setup was used to check how the IRT metrics differ in terms of the start of road crossing. Results revealed that participants made their decision regarding the AV's intention significantly earlier than they would cross the road. A motor process must be performed for both metrics; however, more time is needed to walk one meter than to press a button. Nevertheless, this does not explain the time difference of 2.7 s between IRT and the start of road crossing for the unambiguous and 2.1 s for the ambiguous driving profile. However, it can be assumed that pedestrians assess the AV's driving behavior at an early stage, but wait until they are certain in their decision before crossing the road. Due to the longer time period, participants saw more of the whole driving profile and made more correct decisions, compared to the IRT metric. However, for the intention *AV goes first, ambiguous*, two persons still crossed the road by mistake. In real-life traffic situations, but also in the WoZ setup, this behavior would probably have led to an accident.

#### **5. Limitations**

Even though we tried to replicate the setups as closely as possible, there were small differences: In the VR and video setups, for example, no engine sound was presented to the participants. In light of the results from [22], this might have impaired task performance. Furthermore, the environment differed between the WoZ setup (rather rural) and the VR setup (rather urban).

In addition, in the WoZ setup the driver accelerated to the original speed at the moment the participants pressed the button, so that they were not influenced by the remaining driving profile. In the VR setup and both video setups, the video was frozen the moment participants pressed the button. These limitations might have led to differences between the setups.

Although all vehicles were BMWs, a BMW 2 series was used in the WoZ setup, and a BMW 3 series was used in the VR setup. As mentioned, Ref. [4] found a significant effect for different vehicle sizes. However, the authors compared a Smart Fortwo, a BMW Z4, and a Ford F150; therefore, the size differences between those vehicles were large compared with the size difference between our vehicles. In addition, the differences found had only a small effect [4].

Furthermore, there are also weaknesses in the analysis: Equivalence tests should have been carried out instead of significance tests for differences. Unfortunately, the prerequisites were not met, due to the ordinal-scaled data and small sample sizes. For this reason, the authors have limited themselves to reporting descriptive data for most results.

Methodologically, it was not possible to compare the IRT between the studies, because the different time measurements used to calculate the IRT were not synchronized. We implemented the driving profiles for the VR setup as a replication of the specification. However, due to the low sampling rate of 5 Hz, differences of up to 200 ms may occur. For the video VR setup, the videos were screened on the monitor, and for the video WoZ setup, a drive-through of the vehicle was recorded. Due to the cutting of the video sequences, the driving data can no longer be clearly assigned to the respective video. This makes it impossible to use the absolute IRT values for the setup comparison. However, the comparison within the setup is possible, even if the driving profiles themselves are of different lengths.
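The 200 ms figure follows directly from the sampling rate, under the assumption that the maximum timing offset equals one sampling interval (the symbols $f_s$ and $\Delta t_{\max}$ are introduced here for illustration only):

$$\Delta t_{\max} = \frac{1}{f_s} = \frac{1}{5\,\mathrm{Hz}} = 0.2\,\mathrm{s} = 200\,\mathrm{ms}$$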

Furthermore, it would have been useful to add a setup in which a programmable vehicle runs the given profiles, since the driving strategies in the WoZ differ for each trial, because a human driver is not able to precisely replicate a given driving profile [24].

#### **6. Conclusions**

To sum up, it can be stated that the WoZ setup is a useful approach to evaluate large differences between trajectories. However, small changes in driving behavior cannot be assessed, as a human driver is not able to replicate these [24]. Using the misinterpretation and crossing rate, it is possible to differentiate between unambiguous and ambiguous driving profiles in VR setups. Nevertheless, the collision risk would be underestimated in the VR setup compared to the WoZ setup, because fewer participants would have crossed the road by mistake in the VR setup. Conclusions as to absolute values are not possible in the VR setup. It is possible to detect a potentially ambiguous driving profile when using a video setup. However, the type of video influences, among other things, the collision risk. Additionally, it is possible that perception and decision artefacts will emerge in a video study.

**Author Contributions:** Conceptualization, T.F., E.S., and K.B.; methodology, T.F.; formal analysis, T.F.; investigation, T.F.; resources, T.F., E.S., and K.B.; data curation, T.F.; writing—original draft preparation, T.F.; writing—review and editing, T.F., E.S., and K.B.; visualization, T.F. and E.S.; supervision, T.F. and K.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors would like to thank Lars Michalowski for support in conducting the Wizard of Oz and the VR study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Effects of Marking Automated Vehicles on Human Drivers on Highways**

#### **Tanja Fuest <sup>1,</sup>\*, Alexander Feierle <sup>1</sup>, Elisabeth Schmidt <sup>2</sup> and Klaus Bengler <sup>1</sup>**


Received: 4 May 2020; Accepted: 25 May 2020; Published: 28 May 2020

**Abstract:** Due to the short range of the sensor technology used in automated vehicles, we assume that the implemented driving strategies may initially differ from those of human drivers. Nevertheless, automated vehicles must be able to move safely through manual road traffic. Initially, they will behave as carefully as human learners do. In the same way that driving-school vehicles tend to be marked in Germany, markings for automated vehicles could also prove advantageous. To this end, a simulation study with 40 participants was conducted. All participants experienced three different highway scenarios, each with and without a marked automated vehicle. One scenario was based around some roadworks, the next scenario was a traffic jam, and the last scenario involved a lane change. Common to all scenarios was that the automated vehicles strictly adhered to German highway regulations, and therefore moved in road traffic somewhat differently to human drivers. After each trial, we asked participants to rate how appropriate and disturbing the automated vehicle's driving behavior was. We also measured objective data, such as the time of a lane change and the time headway. The results show no differences for the subjective and objective data regarding the marking of an automated vehicle. Reasons for this might be that the driving behavior itself is sufficiently informative for humans to recognize an automated vehicle. In addition, participants experienced the automated vehicle's driving behavior for the first time, and it is reasonable to assume that an adjustment of the humans' driving behavior would take place in the event of repeated encounters.

**Keywords:** marking automated vehicles; automated vehicles—human drivers interaction; mixed traffic; explicit communication; external human-machine interface

#### **1. Introduction**

BMW has announced that the first highly automated vehicles (AVs) will be integrated into road traffic by 2021 [1]. It can be assumed that, initially, level 3 functions [2] will be available on highways. At the beginning, there will be several situations where the implemented driving strategy of an AV differs from that of a human driver. These include, in particular, situations where anticipatory driving is required, such as waiting for large gaps or reacting to missing traffic signs (e.g., changes in the speed limit). These atypical driving strategies could lead to confusion and distrust by other human road users (HRUs) [3]. One way of counteracting the confusion of HRUs is the clear identification of AVs, e.g., through special marking or additional light signals.

One argument for marking AVs, besides the positive marketing effect, is an increased understanding of larger gap sizes or ambiguous driving strategies [4]. One argument against marking is that the compliant behavior of AVs could lead to unwanted external interference [4]. For example, pedestrians could step onto the road, as they could be sure that the AV will brake [4].

Similar markings already exist for several types of vehicle, e.g., driving-school vehicles [5]. Those vehicles can be marked when they are being used for lessons to draw the attention of the surrounding traffic to the presence of the learner driver [5]. For example, in Austria and New Zealand, a clearly visible sign must be attached to the vehicle when the driver is 17 years old [6,7]. This allows other drivers to adjust their driving behavior to the learner driver and, if necessary, maintain a greater than usual distance from the vehicle, or overtake quickly.

In a study to evaluate the influence of marked AVs on human drivers, drivers encountered an AV that was either marked, not marked, or wrongly marked, in different highway scenarios [8]. The authors asked participants to rate the perceived safety, risk, and how pleasant it was to encounter the AV. Objective driving data were recorded during the simulator study. The results show that human drivers evaluate encounters with AVs independent of the marking [8].

Moreover, the critical gap acceptance and the perceived safety of participants crossing a road in front of an AV is not affected by the vehicle's driving mode (manual vs. automated) [9]. A comparable result was found in the study by Rodríguez Palmeiro [10]: even if participants noticed that the vehicle had an automated-driving sign, and they were subjectively influenced by feeling less safe and more doubtful, the objective behavior of participants did not change [10]. In addition, Faas, Mathis and Baumann [11] recommended providing—as a minimum—information for the pedestrians on the vehicle's status, so as to increase trust, perceived safety and to improve the road user experience.

Although few studies exist that have investigated the marking of AVs as such, there is currently increased research into visual external human-machine interfaces (eHMIs) for AVs, used to communicate explicitly with other HRUs [12]. Even though the focus of eHMIs is on other communication content, they result in additional marking of the AV. Light strips (e.g., [11,13,14]), displays (e.g., [9,15–17]) and projections (e.g., [18]) have primarily been used to communicate intentions to pedestrians (e.g., [13–15]) or human drivers (e.g., [16,19]). Cyan is recommended for eHMIs as it is a highly visible color and has no specific association in road traffic contexts [11,18,20,21]. Therefore, it seems to be well-suited to represent AVs [20].

The current results indicate that eHMIs improve the interaction between pedestrians and AVs [15,17] and increase the perceived safety and comfort of participants [22,23]. However, with regard to pedestrian–AV interaction, projections and eHMIs on wheels should be avoided, whereas eHMIs on roofs, windscreens or grilles work quite well [23]. In addition, it was found that eHMIs are useful for human driver–AV interaction, whereby displays are recommended rather than projections [16].

However, there are also results which indicate that the interpretation of eHMIs by pedestrians is sometimes ambiguous [14] and suggest that pedestrians make their decision to cross the road depending on the AV's driving behavior [24–27].

#### **2. Objectives**

When integrating AVs into traffic, communication might differ from situation to situation depending on the communication partner, such as pedestrians or human drivers [28]. As mentioned, there will be situations where the AV's driving strategy differs from that of human drivers. It can be assumed that these driving strategies can only be adapted with improved technology and algorithms. For as long as better technology cannot be implemented, consideration should be given to marking AVs. Such markings can be used by drivers to identify AVs and adapt their driving behavior if necessary. The aim of the study is to investigate whether marking vehicles as AVs with a cyan LED strip in the upper part of the rear window (Figure 1) results in differences in the drivers' behavior and subjective evaluation in situations where it can be expected that an AV's driving strategy will deviate from that of a human.

**Figure 1.** Marked automated vehicle with a cyan LED strip in the upper part of the rear window in the driving simulation.

#### **3. Method**

#### *3.1. Preliminary Study: Interview with Driving Instructors*

In order to obtain an initial impression of the effects of vehicle marking, we posted a question in two Facebook online groups for German driving instructors. We asked about their experience of marking their driving schools' vehicles. We received 53 responses sharing different impressions. Of the 53 responses, it was possible to analyze 40 answers, as the others did not discuss the topic of marking driving-school vehicles. Altogether, 20% of the driving instructors mentioned that they do not experience differences in the behavior of surrounding traffic while driving a marked driving-school vehicle, compared to driving a vehicle without markings. In total, 27.5% are in favor of marking and 52.5% prefer not being identified as a driving-school vehicle. Reasons mentioned for preferring markings are the greater consideration demonstrated by other road users (10%), less honking (12%), and more acceptance from others (5%) (Figure 2). However, other driving instructors perceive less consideration from other road users when these see such markings, along with riskier behavior on their part (for example, not adhering to appropriate distances when cutting in and out during overtaking; 27%). In their opinion, others honk more (3%) if they recognize a driving-school vehicle. Therefore, from their perspective, it is more relaxing (23%) to drive without markings.

**Figure 2.** Attitude towards the marking of driving school vehicles.

To evaluate whether marking AVs also leads to differing opinions, we conducted a driving simulation study. The ethics committee of the Technical University of Munich approved this study. The corresponding code is 448/19 S.

#### *3.2. Procedure*

After being welcomed, the participants signed a declaration of consent. They were then asked to answer demographic questions on a tablet and to take a seat in the driving simulator in order to adjust the driver's seat and mirrors. Participants were introduced to the simulator and experienced the driving simulation during a familiarization drive. All participants experienced six trials in random order. Each trial consisted of one of three different highway scenarios in which the driver encountered an AV (see Section 3.4.1). After each trial, participants were asked about the surrounding traffic (see Section 3.5.1). At the end of the study, we asked about the attitude towards marking AVs. With the exception of the demographic information, the experimenter gathered the information via oral questions and responses.

#### *3.3. Apparatus*

The basis of the static driving simulator (Figure 3) was a BMW 6 series mockup. A 6-channel projection system provided a realistic driving environment, with a refresh rate of 60 Hz. Three projectors were used for the 180° front view, and three projectors for the rear view (side and rear mirrors). We used the driving simulation software SILAB 6.5 of the Würzburg Institute for Traffic Sciences GmbH [29] and logged the driving data at 240 Hz. A 6-channel noise simulation completed the driving simulation. A freely programmable instrument cluster was used as the human-machine interface. A tachometer and a speedometer were implemented for displaying driving-relevant information in this study. No additional advanced driver-assistance systems were used.

**Figure 3.** Driving simulator of the Chair of Ergonomics at the Technical University of Munich [30].

#### *3.4. Independent Variables*

We implemented a 3×2 within-subject design with three different scenarios on a three-lane highway (Figure 4), each with and without a marked AV. In all trials, participants started from a highway rest area and drove manually on a highway at a maximum speed of 130 km/h. The participants were instructed to adhere to the German highway regulations, in particular driving in the right lane, except when overtaking. To keep participants in the right lane, we implemented a high traffic density with a speed of 144 km/h in the middle lane at the beginning of all scenarios.

After a short time, an AV appeared in front of the participants in the right lane. The AV was either marked as such or looked like a manual vehicle. In all scenarios, the AV adhered strictly to the highway regulations and stayed in the right lane in front of participants. The appearance of the AV indicated the beginning of one of three different scenarios (Figure 4).
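The 3×2 design with randomized trial order can be illustrated with a minimal sketch. The function name `trial_order` and the per-participant seed are assumptions made for illustration; the source does not describe how the random order was generated.

```python
import itertools
import random

SCENARIOS = ("Roadworks", "Traffic Jam", "Lane Change")
MARKINGS = ("marked", "unmarked")


def trial_order(seed):
    """Return the six scenario-by-marking conditions in a random order.

    Seeding per participant is an assumption; it only makes the
    order reproducible in this sketch.
    """
    conditions = list(itertools.product(SCENARIOS, MARKINGS))
    rng = random.Random(seed)
    rng.shuffle(conditions)
    return conditions


if __name__ == "__main__":
    for number, (scenario, marking) in enumerate(trial_order(seed=1), start=1):
        print(number, scenario, marking)
```

Each participant thus sees every scenario once with and once without a marked AV, which is what makes the later paired comparisons possible.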

#### 3.4.1. Scenarios

#### Roadworks

Participants drove through roadworks where a speed limit of 60 km/h was applicable. The scenario started at the end of the roadworks. There was no sign to inform drivers that the 60 km/h limit no longer applied. Therefore, the AV remained at 60 km/h, whilst all vehicles in the other lanes accelerated to 100 km/h (Figure 4a).

#### Traffic Jam

During the second scenario, a traffic jam occurred on the highway. The vehicles drove at a speed of 30 km/h in the left lane and in the middle lane. The vehicles in the middle lane used the large gaps to cut in front of the AV. The AV had a speed range of 15 to 40 km/h while maintaining a minimum gap of 5 seconds to the vehicles cutting in. Since the ego vehicle drove behind the AV, the participant had to brake in accordance with the AV (Figure 4b).

#### Lane Change

The AV used the indicator to signal to change lanes from the right to the middle lane in the third scenario. The vehicles on the middle lane were traveling at a speed of 130 km/h, and at 140 km/h in the left lane. However, the gaps between the vehicles on the target track were too small for the AV's algorithm to conduct a lane change and the AV stayed in the right lane. As a result, the AV drove at a varying speed of between 110 and 120 km/h (Figure 4c).

**Figure 4.** Scenarios implemented in the driving simulation: (**a**) Roadworks, (**b**) Traffic Jam, (**c**) Lane Change.

#### 3.4.2. Marking the AV

In every scenario, a different vehicle type was used, so that participants could not recognize the AV immediately (Figure 5). The vehicle size was kept as constant as possible and the vehicle colors were kept unobtrusive (Figure 5). The participants experienced all scenarios with and without a marked AV. We marked the AV with a cyan LED strip in the upper part of the rear window that was visible to the participant when they followed the AV (Figures 1 and 5) [11,18,20,21]. We used an LED strip since it is less costly and easier to integrate into common commercial vehicles, compared to display or projection systems.

**Figure 5.** Vehicle types and colors for the AV, with and without a marking.

#### *3.5. Dependent Variables*

#### 3.5.1. Subjective Data

The questionnaire was divided into three parts. In the first part, we surveyed demographic information such as age, sex, kilometers driven per year and the attitude towards the development of AVs on a five-point Likert scale (1 = *very positive* to 5 = *very negative*).

The second part of the questionnaire comprised five questions and was repeated after each trial. The first question asked about the surrounding traffic (*Did you notice anything particularly positive or negative about the surrounding traffic?*). With this question, we aimed to find out whether participants recognized any different driving behavior in scenarios where the AV is not marked. The next questions enquired about conformity to the participants' expectations (*Did the vehicle in front behave as you would have expected?* and *How should the vehicle have behaved to meet your expectations?*). In addition, two further items were rated on a five-point Likert scale to investigate the driving behavior of the vehicle in front. With the first item, we measured, with regard to rationality, the perceived appropriateness of the driving behavior (*How appropriate was the driving behavior of the vehicle in front?;* 1 = *inappropriate* to 5 = *appropriate*) [31]. With the second item, we measured, with regard to emotionality, the perceived disturbance caused by the vehicle in front (*How disturbing was the driving behavior of the vehicle in front?;* 1 = *disturbing* to 5 = *not disturbing*) [31].

After all trials—in order to compare the objective driving data with the subjective perception—we asked the participants how they reacted when the vehicle in front was marked as an AV. In addition, we wanted to find out how people would react in real traffic situations. Therefore, we asked participants how they would behave in real traffic if they were to encounter an AV (which behaved as experienced in the simulation). Finally, we evaluated whether and for what reasons human drivers would like AVs to be marked as such.

#### 3.5.2. Objective Data

We counted the number of lane changes conducted by the participants in overtaking the AV. The time between the start of the scenario and the completion of the lane change was calculated, to assess whether the AV's marking led to earlier overtaking. The lane change was considered to be completed when the vehicle's center of gravity crossed the lane marking.
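The lane-change criterion can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the sample format `(time_s, lateral_y_m)`, the sign convention (lateral position increasing towards the left), and the linear interpolation between samples are all assumptions.

```python
def time_to_lane_change(samples, marking_y, scenario_start=0.0):
    """Return seconds from scenario start until the centre of gravity first
    crosses the lane marking, or None if no lane change was completed.

    samples: chronological iterable of (time_s, lateral_y_m) pairs
             (hypothetical log format).
    marking_y: lateral position of the lane marking in metres; lateral
               position is assumed to increase towards the left.
    """
    previous = None
    for t, y in samples:
        if previous is not None and previous[0] >= scenario_start:
            t0, y0 = previous
            if y0 < marking_y <= y:
                # Interpolate linearly between the two samples that
                # bracket the crossing.
                fraction = (marking_y - y0) / (y - y0)
                return t0 + fraction * (t - t0) - scenario_start
        previous = (t, y)
    return None
```

Trials in which the function returns `None` would count as "no lane change" when tallying the number of overtaking maneuvers.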

To evaluate whether participants kept a greater safety gap to the AV, the minimum time headway (THW) of each participant was calculated for the period that the participant followed the AV in the same lane. THW is calculated using the distance of the AV to the human driver ($x_{\text{AV-EGO}}$) and the speed of the driver ($v_{\text{EGO}}$) according to [32], see Equation (1).

$$\text{THW} = \frac{x_{\text{AV-EGO}}}{v_{\text{EGO}}} \tag{1}$$
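As a worked illustration of Equation (1), the following sketch computes the minimum THW over a following period. The sample format, the helper `kmh_to_mps`, and the `min_speed_mps` guard against division by near-zero speeds are assumptions, not part of the original analysis.

```python
def kmh_to_mps(speed_kmh):
    """Convert a speed from km/h to m/s."""
    return speed_kmh / 3.6


def minimum_thw(samples, min_speed_mps=0.1):
    """Return the minimum time headway in seconds, or None if undefined.

    samples: iterable of (gap_m, ego_speed_mps) pairs logged while the
             participant follows the AV in the same lane (hypothetical format).
    min_speed_mps: samples at or below this ego speed are skipped, because
                   THW is undefined for a stationary vehicle (assumption).
    """
    values = [gap / speed for gap, speed in samples if speed > min_speed_mps]
    return min(values) if values else None


if __name__ == "__main__":
    # A 50 m gap at 90 km/h gives 2.0 s; a 36 m gap at 108 km/h gives 1.2 s.
    following = [(50.0, kmh_to_mps(90.0)), (36.0, kmh_to_mps(108.0))]
    print(minimum_thw(following))
```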

#### *3.6. Participants*

Altogether, 40 participants were recruited via postings at the Technical University of Munich and received compensation. This sample did not consist of the driving instructors from the preliminary study. Due to simulation sickness, we had to exclude two participants. In total, we analyzed 38 participants (24 male, 14 female) with a mean age of 29.63 years (*SD* = 9.58 years). The participants had had their driver's license for an average of 12.13 years (*SD* = 9.37 years) and drove on average 7997.37 km per year (*SD* = 7535.95 km per year). Their attitude towards AVs was rather positive (*Mdn* = 2). This attitude was based, among other things, on the expectation of increasing road safety, improved traffic flow, and more comfort, but also on personal enthusiasm for the topic (Figure 6).

**Figure 6.** Attitude towards automated vehicles.

#### *3.7. Analysis*

We had some lags in the simulation, especially in the *Traffic Jam* scenario. Due to the technical problems, we had to exclude 20 trials from the subjective data and 25 trials from the objective data. In total, 208 trials were analyzable. Data were analyzed using Matlab, SPSS, and Excel. The Bonferroni correction was used for all statistical tests and the p-values were compared with a corrected alpha of 0.017.
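The corrected alpha of 0.017 is consistent with dividing a family-wise alpha of 0.05 by three comparisons, presumably one per scenario; the source does not state this split explicitly, so it is treated as an assumption in the sketch below. The p-values are those reported for the appropriateness item in Section 4.1.

```python
def bonferroni_alpha(alpha, n_tests):
    """Return the per-test significance level under Bonferroni correction."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Map each test name to True if its p-value is below the corrected alpha."""
    corrected = bonferroni_alpha(alpha, len(p_values))
    return {name: p < corrected for name, p in p_values.items()}


if __name__ == "__main__":
    # Assumption: three tests per dependent variable, one per scenario.
    appropriateness = {"Roadworks": 0.79, "Traffic Jam": 0.32, "Lane Change": 0.45}
    print(round(bonferroni_alpha(0.05, 3), 3))
    print(significant(appropriateness))
```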

The subjective data were ordinal scaled variables. Hence, two non-parametric Wilcoxon tests were calculated for both dependent questionnaire items.

The time elapsed until participants changed lanes in the *Roadworks* scenario is not normally distributed (marking: *W*(27) = 0.91, *p* = 0.02, no marking: *W*(24) = 0.88, *p* ≤ 0.01). However, for the *Traffic Jam* scenario, the time elapsed until participants changed lanes is normally distributed (marking: *W*(14) = 0.88, *p* = 0.06, no marking: *W*(15) = 0.90, *p* = 0.08). In addition, the Shapiro–Wilk test showed no significant departure from normality for the time elapsed until participants changed lanes in the *Lane Change* scenario (marking: *W*(16) = 0.95, *p* = 0.47, no marking: *W*(15) = 0.90, *p* = 0.09). As a result, we calculated one Wilcoxon test for the *Roadworks* scenario, and two t-tests for the other two scenarios.

The Shapiro–Wilk test showed a significant departure from normality for the THW in the *Roadworks* scenario (marking: *W*(37) = 0.78, *p* ≤ 0.001, no marking: *W*(38) = 0.61, *p* ≤ 0.001). The THW for the *Traffic Jam* scenario (marking: *W*(29) = 0.94, *p* = 0.12, no marking: *W*(27) = 0.92, *p* = 0.05), and the THW for the *Lane Change* scenario are normally distributed (marking: *W*(36) = 0.97, *p* = 0.31, no marking: *W*(36) = 0.96, *p* = 0.13). Therefore, we calculated one Wilcoxon test for the *Roadworks* scenario, and two t-tests for the other two scenarios.
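The selection rule implied by the normality checks can be sketched as a small decision function applied to the reported Shapiro–Wilk p-values. The function name, the uncorrected threshold of 0.05, and the strict inequality (so that *p* = 0.05 counts as no departure from normality) are assumptions; values reported as upper bounds (*p* ≤ 0.01, *p* ≤ 0.001) are entered at the bound. With these assumptions the rule reproduces the test statistics reported in Section 4.2.

```python
def choose_test(p_marking, p_no_marking, alpha=0.05):
    """Pick a test from the Shapiro-Wilk p-values of both conditions.

    If either condition departs significantly from normality, fall back to
    the non-parametric Wilcoxon test; otherwise use a t-test.
    """
    if p_marking < alpha or p_no_marking < alpha:
        return "Wilcoxon test"
    return "t-test"


# Shapiro-Wilk p-values as reported in the text (marking, no marking).
REPORTED = {
    ("time to lane change", "Roadworks"): (0.02, 0.01),
    ("time to lane change", "Traffic Jam"): (0.06, 0.08),
    ("time to lane change", "Lane Change"): (0.47, 0.09),
    ("THW", "Roadworks"): (0.001, 0.001),
    ("THW", "Traffic Jam"): (0.12, 0.05),
    ("THW", "Lane Change"): (0.31, 0.13),
}

if __name__ == "__main__":
    for (measure, scenario), p_values in REPORTED.items():
        print(measure, scenario, choose_test(*p_values))
```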

The open questionnaire items were evaluated descriptively.

#### **4. Results**

#### *4.1. Subjective Data*

We wanted to find out whether marking an AV influences drivers. However, we found no significant differences for the item *How appropriate was the driving behavior of the vehicle in front?* (*Roadworks*: *z* = −0.26, *p* = 0.79, n = 37; *Traffic Jam*: *z* = −1.00, *p* = 0.32, n = 25; *Lane Change*: *z* = −0.76, *p* = 0.45, n = 38; Figure 7), and for the item *How disturbing was the driving behavior of the vehicle in front?* (*Roadworks*: *z* = −0.94, *p* = 0.35, n = 37; *Traffic Jam*: *z* = −1.36, *p* = 0.17, n = 25; *Lane Change*: *z* = −0.34, *p* = 0.73, n = 38; Figure 8).

**Figure 7.** Boxplot for the item *How appropriate was the driving behavior of the vehicle in front?*, segregated by situation and marking.

**Figure 8.** Boxplot for the item *How disturbing was the driving behavior of the vehicle in front?,* segregated by situation and marking.

In addition, the open questions illustrated that marking the vehicle does not affect the perception of the surrounding vehicles. For the *Roadworks* scenario, participants expressed incomprehension that the vehicle in front drove at only 60 km/h even after the roadworks. The vehicle in the *Traffic Jam* scenario was criticized for letting other vehicles merge in front of it. For the last scenario, *Lane Change*, participants mentioned that the vehicle flashed but did not change lanes and that the vehicle lost speed trying to change lanes. For all scenarios, these aspects were named both with and without a marking of the vehicle in front.

Moreover, no descriptive differences were found for the expected driving behavior between the scenarios where the AV is marked or not (Table 1). For the *Roadworks* scenario, participants expected the perceived AV's driving behavior in 48% (marking: 46%, no marking: 50%) of all cases. Over 60% of all participants wished that the AV would have accelerated again after the *Roadworks*, regardless of the marking. One participant in the *Roadworks* scenario with the marked vehicle and two participants in the analogous scenario with the unmarked vehicle mentioned that the AV drove as expected because of the lack of the appropriate road sign. For the *Traffic Jam* scenario, nearly 76% of all participants (marking: 81%, no marking: 70%) expected the driving behavior. Those who had expected other driving behavior wished for a smoother driving style without letting as many vehicles merge. For the *Lane Change* scenario, only 26% (marking: 32%, no marking: 21%) of all participants expected the observed AV's driving behavior. Regardless of the marking of the vehicle, 80% wanted the AV to change lane or to switch off the indicators (27%) and accelerate once again (18%).



However, even though the marking had no influence on participants' ratings, 66% would prefer AVs to be marked (Figure 9). The other 34% do not want the vehicle to be directly identified as automated (Figure 9). Participants preferring AVs to be marked argued that it is easier to assess the AV's driving behavior (41%) and to adapt their own behavior to the new road user (e.g., greater gaps, increased attention; 22%). Another 15% would like to have marking in order to increase acceptance, and 15% mentioned that the AV is a role model, because it complies with the German highway regulations. In addition, 7% would like markings only as additional information. Reasons mentioned against marking the AV include that it is a normal road user (20%) and should not attract attention (33%). Another 27% mentioned that the potential for abuse is too high due to the marking and 20% were afraid that the uncertainty in road traffic will become too great (Figure 9).


**Figure 9.** Reasons for or against AV marking.

At the end of the study, we asked participants how they reacted when seeing an AV in the simulation. Altogether, 50% of all participants mentioned that they behaved as usual, whereas others raised their attention levels (10%), drove more carefully (8%), and/or kept a greater distance (5%). Another 8% had higher confidence in the vehicle, because it adheres to German highway regulations (Figure 10).

**Figure 10.** Participants' mentioned behavior in the driving simulation scenarios and assumed behavior in real-world scenarios.

Nevertheless, in real traffic, only 18% of all participants said they would behave "normally". Another 24% would drive more carefully, 16% would raise their attention level, and 14% would keep a greater distance and drive more defensively. Another 14% said they would follow the vehicle and orientate themselves to the driving behavior of the AV, because it adheres to German highway regulations. However, another 8% would overtake the AV quickly and 6% would behave in a more risky manner than usual, because the AV drives in an error-free way (Figure 10).

#### *4.2. Objective Data*

Altogether, participants changed lanes in 55% of all trials. The most lane changes happened in the scenario *Roadworks* (marking: 73%, no marking: 63%), followed by *Traffic Jam* (marking: 48%, no marking: 56%), and *Lane Change* (marking: 44%, no marking: 42%; Table 2). However, we found no tendency that marking the AV influences the frequency of lane changes on a descriptive level (Table 2). We also found no significant differences in the time elapsed until the lane change was conducted (*Roadworks*: *z* = −1.48, *p* = 0.14, n = 21; *Traffic Jam*: *t*(10) = −1.26, *p* = 0.24; *Lane Change*: *t*(8) = −0.21, *p* = 0.84; Figure 11). In addition, the presence of markings had no significant influence on the THW in any of the three scenarios (*Roadworks: z* = −0.52, *p* = 0.60, n = 37; *Traffic Jam*: *t*(24) = −0.16, *p* = 0.88; *Lane Change: t*(34) = −0.54, *p* = 0.59; Figure 12).



**Figure 11.** Boxplot for the time until lane change, segregated by situation and marking.

**Figure 12.** Boxplot for the minimum time headway, segregated by situation and marking.

#### **5. Discussion**

The aim of the study was to determine the influence of marking AVs on human drivers in three scenarios, in order to deduce whether markings should be implemented for AVs.

The results illustrate that marking AVs does not influence the driving behavior of human drivers and their subjective rating. This confirms the results of Kühn, Stange and Vollrath [8]. It is possible that the driving behavior itself is sufficiently informative in order to be able to recognize an AV. This is also consistent with the statements of Kühn et al. [8], who mentioned that drivers have a fairly accurate idea of how AVs will behave in situations on highways where interaction with other HRUs is required. Another aspect could be that drivers can deal with ambiguous driving strategies of other drivers and have already learned to compensate for such behavior by, for example, increasing the gap to the vehicle in front or by overtaking. Rodríguez Palmeiro [10] already stated that the driving behavior of the AV is more important than external signs in pedestrians' deciding whether or not to cross the road.

Although no significant differences were found in participants' ratings, the majority of participants preferred the AV to be marked. Due to the marking, they could assess the AV's driving behavior and adapt their own driving behavior accordingly. In addition, most participants mentioned that they behaved normally in the simulation, but when encountering an AV in real traffic, they would behave more carefully with increased attention levels. Therefore, it can be assumed that a study involving encounters with AVs in real traffic might lead to different results.

However, this study only examined the encounter with a single AV in three selected scenarios. The drivers experienced the AV's driving behavior in every scenario for the first time. Therefore, it might be difficult to adapt their driving behavior to the AV without knowing what the AV is going to do next. It is reasonable to assume that an adjustment of human driving behavior would take place in the event of their repeated encounters with AVs. This also explains the participants' preference for marked AVs, as it enables drivers to recognize the AV at an early stage, and adapt their driving behavior accordingly. Therefore, it might be useful to investigate long-term effects in a further study. In addition, the effects of age and gender should be evaluated.

Besides the result that the marking had no influence, we found descriptive differences—dependent on the given scenario—for the number of lane changes and the time participants needed until they changed lanes. The *Roadworks* scenario showed the highest number of lane changes in the shortest time passed. This may be due to the large speed difference of the AV compared to the vehicles in the middle lane (60 to 100 km/h) in relation to the other scenarios (*Traffic Jam*: 15–40 km/h to 30 km/h; *Lane Change*: 110–120 km/h to 130 km/h). Based on these results, more scenarios should be investigated in future studies.

Due to technical issues in the *Traffic Jam* scenario, the simulation did not run smoothly, so participants' driving behavior might have been influenced. As a result, absolute values can only be interpreted with caution. Nevertheless, the comparison between the scenarios with and without a marking is still possible.

#### **6. Conclusions**

As a general conclusion, it can be stated that the marking of an AV made no differences to human drivers in terms of their driving behavior and their subjective ratings. It seems that drivers can compensate for AVs' driving behavior, whereby they do not require the AV to be identified as such. Nevertheless, the participants indicate that they prefer to be able to distinguish AVs from other vehicles. However, this study did not address the long-term effects, which may affect the results, and should be investigated in future studies.

**Author Contributions:** Conceptualization, T.F., A.F. and E.S.; methodology, T.F. and A.F.; software, A.F.; validation, T.F. and A.F.; formal analysis, T.F. and A.F.; investigation, T.F. and A.F.; resources, T.F., A.F. and K.B.; data curation, T.F.; writing—original draft preparation, T.F. and A.F.; writing—review and editing, T.F., A.F., E.S. and K.B.; visualization, T.F. and A.F.; supervision, K.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors thank Franz Daisenberger for conducting the study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

1. BMW Group. BMW Group's Driver Assistance and Autonomous Driving Development Department under New Leadership. Alejandro Vukotich Takes Over at the Helm, Elmar Frickenstein to Retire after Handover Phase. Available online: https://www.press.bmwgroup.com/africa-dom-easteurope/article/ detail/T0288264EN/bmw-group%E2%80%99s-driver-assistance-and-autonomous-driving-developmentdepartment-under-new-leadership-alejandro-vukotich-takes-over-at-the-helm-elmar-frickenstein-toretire-after-handover-phase?language=en (accessed on 3 May 2019).


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Multi-Vehicle Simulation in Urban Automated Driving: Technical Implementation and Added Benefit**

#### **Alexander Feierle \*,**†**, Michael Rettenmaier \*,**†**, Florian Zeitlmeir and Klaus Bengler**

Chair of Ergonomics, Technical University of Munich, 85748 Garching, Germany; florian.zeitlmeir@tum.de (F.Z.); bengler@tum.de (K.B.)


Received: 11 April 2020; Accepted: 16 May 2020; Published: 19 May 2020

**Abstract:** This article investigates the simultaneous interaction between an automated vehicle (AV) and its passenger, and between the same AV and a human driver of another vehicle. For this purpose, we have implemented a multi-vehicle simulation consisting of two driving simulators, one for the AV and one for the manual vehicle. The considered scenario is a road bottleneck with a double-parked vehicle either on one side of the road or on both sides of the road where an AV and a simultaneously oncoming human driver negotiate the right of way. The AV communicates to its passenger via the internal automation human–machine interface (HMI) and it concurrently displays the right of way to the human driver via an external HMI. In addition to the regular encounters, this paper analyzes the effect of an automation failure, where the AV first communicates to yield the right of way and then changes its strategy and passes through the bottleneck first despite oncoming traffic. The research questions the study aims to answer are what methods should be used for the implementation of multi-vehicle simulations with one AV, and if there is an added benefit of this multi-vehicle simulation compared to single-driver simulator studies. The results show an acceptable synchronicity for using traffic lights as basic synchronization and a distance control as the detail synchronization method. The participants had similar passing times in the multi-vehicle simulation compared to a previously conducted single-driver simulation. Moreover, there was a lower crash rate in the multi-vehicle simulation during the automation failure. Concluding the results, the proposed method seems to be an appropriate solution to implement multi-vehicle simulation with one AV. Additionally, multi-vehicle simulation offers a benefit if more than one human affects the interaction within a scenario.

**Keywords:** multi-vehicle simulation; mixed traffic; human–machine interface; automated driving

#### **1. Introduction**

A current research focus in the context of automated driving is human–machine interface (HMI) design. In urban areas, which are characterized by a high number of objects [1], a high number of vulnerable road users [2], and high information density [3], the automated vehicle (AV) must be able to clearly communicate with the passenger and the surrounding human road user [4]. The only way to investigate the simultaneous communication via the automation HMI (aHMI) and the external HMI (eHMI) [4] is by conducting a multi-vehicle simulation. This requires a human road user, such as a human driver, who perceives the eHMI and a passenger in the AV who perceives information from the aHMI.

A scenario of particular interest is the bottleneck scenario in urban areas [5] where communicating via eHMIs has the potential to enhance traffic efficiency and safety [6]. Partially automated driving systems (ADS) are already state of the art. Nevertheless, the current operational design domain (ODD) in partially automated driving is limited to highways, since these are characterized by a lower complexity compared to urban areas. As the driver must still monitor the ADS and must be able to take over vehicle guidance at any time without a request to intervene [7], it could be assumed that such systems will be realized sooner than systems with a higher level of driving automation in urban areas. Therefore, this study addresses the interaction between a human driver and a partially automated vehicle and its passenger in bottleneck scenarios in urban areas.

Compared to investigations with fixed programmed road users, multi-vehicle simulations should generate a more realistic driving behavior [8]. With regard to partially automated driving, there may be an added benefit, especially when the passenger of the AV has to take over vehicle guidance again. For this purpose, a controlled interaction scenario must be achieved, which is a special challenge of multi-vehicle simulation [9]. Therefore, this publication aims at the realization and evaluation of the technical implementation of such a multi-vehicle study. Additionally, a multi-vehicle experiment has been conducted to compare the results with a single-driver simulation to identify added benefits using multi-vehicle simulation.

#### **2. State of Research**

#### *2.1. Previous Studies on Multi-Agent Simulation*

Multi-agent simulation is a useful tool for analyzing the interaction of various road users in the same environment. It permits the measurement of the parameters of each individual participant as well as the objectification of the behavior within a group of several drivers, e.g., in platoons [10]. Additionally, the multi-agent simulation retains the single-agent simulation's benefits of being controllable and accurate and enriches the experiments with a more realistic traffic flow environment [11,12]. Thus, the multi-vehicle simulation increases the ability of both driving and traffic simulation [11]. It enables the investigation of social interaction [13] and the analysis of advanced driver assistance systems affecting several drivers [14]. A classification of previous research can be made according to the characteristics of the road users involved.

Lehsing, Benz, and Bengler [15] investigated the interaction between a human driver and a pedestrian in a pedestrian crossing scenario. In half of the encounters a confederate controlled the pedestrian, resulting in a more human-like behavior since he was able to react to the participants' driving behavior. In the other half of the encounters the pedestrian's behavior was programmed. The authors state that the approach of physically linking both simulators is a meaningful method in traffic research since it raises the validity of investigations in human–human interaction [15].

In contrast to driver-pedestrian interaction, other studies have examined the interaction between several human drivers. These can be clustered into experiments investigating safety-critical situations and experiments researching the interaction and cooperation between several road users. Hancock and de Ridder [16] used the multi-vehicle simulation to investigate the participants' avoidance responses at the brink of a collision. The authors emphasize the value of multi-vehicle simulation because it analyzes critical situations in a safe and efficient manner. Moreover, the method provided similar avoidance responses compared to real-world investigations [16]. Yasar, Berbers, and Preuveneers [17] also used the multi-vehicle simulation to investigate safety-critical situations at intersections by analyzing the incident rate and the participants' driving behavior affected by a voice-based command system and the presence of traffic lights. Will [18] found a decrease in the criticality of encounters between a human driver and a motorcyclist due to a system supporting the interaction at intersections.

Aside from conducting multi-vehicle simulations to investigate safety-critical situations, the method was also used to analyze the interaction or cooperation of different human drivers. The method was used to realize the presence of multiple participants in a platoon of four vehicles to identify parameters describing the behavior of different drivers within the platoon as well as the behavior of the platoon as a whole [19,20]. Moreover, Heesen, Baumann, Kelsch, Nause, and Friedrich [21] conducted a multi-vehicle study to examine the effect of a cooperative lane change assistant on possible conflicts on motorways. Results of the experiment show that drivers consider the other driver's possible actions when requesting to cooperate. In addition, the capability to anticipate affects the willingness to cooperate [21]. Sun, Ma, Li, and Niu [11] confirm the positive effect of multi-vehicle simulation on the behavior in lane change maneuvers to be consistent with the data of field observations. Further research including multi-vehicle simulation was applied, e.g., the evaluation of dynamic speed guidance strategies [22] or the analysis of the "rubbernecking" phenomenon, consisting of a driver slowing down due to an accident on the opposite side of the road [23].

Furthermore, multi-vehicle simulations are used to analyze the subjective feeling of human drivers. Rittger, Mühlbacher, Maag, and Kiesel [24] found that the usage of a traffic light assistant could raise the feeling of bothering other road users and it induces anger in participants without an assistant. Additionally, the participants' knowledge of the presence of another real human in the same simulation influences the participants' sensation [25] and the willingness to cooperate [21].

The implementation of AVs and the associated investigation of the interaction between AVs and other road users enlarge the application of multi-agent simulation. Bazilinskyy, Kooijman, Dodou, and de Winter [26] analyzed the interaction between an AV communicating via an eHMI, a human driver, and a pedestrian at a T-intersection with a zebra-crossing. The authors concluded that the multi-agent simulation is a promising tool to research interaction in traffic in the future.

#### *2.2. Implementation of Multi-Agent Simulation*

One challenge in conducting a multi-agent simulation is to induce the interaction in a controlled manner [9]. If the interaction does not occur in the simulation, multi-agent simulation offers no added benefit [8]. The following approaches have been used to coordinate the participants and to avoid insufficiently synchronized encounters.

Schindler and Köster [27] used the implementation of detours, dynamically modified speed adjustments, and the manipulation of the participants' speedometer to synchronize the participants' encounter. Another possibility is the dynamic change of the route length [16,27] or to have one interaction partner as a confederate [15]. The confederate knows about the experimental condition and is able to react to the driving behavior of the other participant. Moreover, the instruction of participants could be used to enable a synchronized interaction [24,28]. In the simulation, the implementation of road sections where the participants have to follow programmed traffic and the control of implemented traffic lights are methods to enable coordinated interaction in a multi-agent simulation [27,29].

#### **3. Objectives**

One challenge of multi-vehicle simulation is that the participants have to approach the investigated scenario at the same time in order to ensure controlled interaction. Various publications have already taken up this challenge. However, all these studies investigated scenarios without automated road users. Since the present work investigates the interaction between an AV and a human driver at bottlenecks, new opportunities arise to achieve the synchronous arrival of both road users via the ADS and its implemented longitudinal control. This creates new challenges in terms of reproducibility and comprehensibility. Therefore, this publication aims at the technical implementation and evaluation of such a method with an automated road user in a multi-vehicle simulation. Hence, a multi-vehicle simulation was conducted. The results are compared with the results of a single-driver study on eHMI design to identify the relevant use cases where multi-vehicle simulation offers an added benefit. The objectives of this study lead to the following research questions (RQ):


#### **4. Technical Implementation**

#### *4.1. Basic Synchronization*

After analyzing synchronization methods used in research for multi-vehicle simulation studies (see Section 2.2), we decided to synchronize the AV and the human driver via a traffic light control. The basic synchronization with traffic lights enables the compensation of large time differences and has a low space requirement in the simulation environment. Figure 1 shows the basic synchronization we implemented in the simulation. For the manual vehicle, a speed limit of 30 km/h was applied directly after the traffic lights. For the AV, the speed limit of 30 km/h was set at the beginning of the interaction phase. When approaching the bottleneck the traffic light in front of the human driver shows red and the human driver has to wait at the stop line. The AV arrives at the other traffic light with a delay (Δt) due to course design. During the approach the AV passes a trigger point at the course which causes the traffic lights in front of the AV to switch from green to red so that the AV decelerates to a standstill in front of its stop line. Subsequently, both traffic lights switch from red to green. Since both traffic lights have the same distance to the road bottleneck and due to the simultaneous change of the traffic light's state, the AV and the human driver are basically synchronized when entering the scenario.
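The traffic light logic described above can be sketched as a small state machine. This is only an illustration and not the SILAB implementation: the class name, the method names, and the release rule (both lights switch to green once both vehicles stand at their stop lines) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BasicSync:
    """Hypothetical traffic light controller for the basic synchronization."""

    mv_light: str = "red"    # light in front of the manual vehicle (MV)
    av_light: str = "green"  # light in front of the AV

    def av_passed_trigger(self) -> None:
        # The AV crosses the trigger point on the course:
        # its light switches from green to red while the MV is still held.
        if self.av_light == "green" and self.mv_light == "red":
            self.av_light = "red"

    def update(self, mv_stopped: bool, av_stopped: bool) -> None:
        # Assumed release rule: both lights are red and both vehicles
        # have come to a standstill, so both lights switch to green at once.
        both_red = self.mv_light == "red" and self.av_light == "red"
        if both_red and mv_stopped and av_stopped:
            self.mv_light = "green"
            self.av_light = "green"


sync = BasicSync()
sync.av_passed_trigger()                           # AV reaches the trigger point
sync.update(mv_stopped=True, av_stopped=True)      # both wait at their stop lines
```

Because both lights are released in the same update step and are equally far from the bottleneck, any remaining offset comes only from how the two vehicles stop and accelerate.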

**Figure 1.** Basic synchronization of the AV (red vehicle in the lower part) and human driver (black vehicle in the upper part) via the traffic light control. The route does not correspond to the real course in the simulation and is shown schematically.

#### *4.2. Detail Synchronization*

After the basic synchronization has compensated for large time differences, both vehicles start from a standstill once the traffic lights have turned green. A distance difference may already arise while waiting at the traffic lights if the human driver comes to a standstill at a different distance from the traffic light than the AV. Following Rettenmaier, Albers, and Bengler [30], the interaction phase was defined as a radius of 50 m around the road bottleneck (see Figure 1). Without a detail synchronization in which the AV adapts to the behavior of the human driver, further distance differences (Δd) would arise after the vehicles pass the green traffic lights because of their different speed profiles. In the case of large distance differences, there would be no interaction because the passing of the bottleneck would be regulated by the earlier arrival of one of the vehicles [31]. In order to achieve a high degree of synchronicity when the interaction phase is reached, the automated longitudinal control of the AV is used to adapt to the behavior of the human driver.

The automated driving system is realized by using simulation state data. The longitudinal control of the automation during free driving without a front vehicle or traffic light consists of a PID control, which receives speed settings as input. An acceleration is generated as output of the PID control, which is transferred to the internal vehicle dynamics using a single-track model of the driving simulation software SILAB. Here, several implementation opportunities to adapt to the behavior of the manual vehicle exist (Figure 2):


**Figure 2.** Block diagrams of the proposed methods to adapt the AV to the behavior of the manual vehicle (MV).

In order to analyze which of these methods is most appropriate, speed profiles for a simulation of the manual vehicle are required. To exclude influences of lateral steering on the longitudinal dynamics, the scenario (Figure 1) was implemented on a straight track instead of a u-shaped one. Subsequently, three different speed profiles were implemented using a cruise control (Figure 3). The different speed profiles are intended to represent different human driver types (offensive, neutral, defensive). Nevertheless, these synthetic profiles cannot represent a human driver exactly, so they are only suitable for a first pre-test.

**Figure 3.** Three different implemented speed profiles (offensive, neutral, defensive) using a cruise control to simulate the manual vehicle during the pre-test.

Negative values for the distance differences of both vehicles to the road bottleneck mean that the manual vehicle reached the interaction phase first. Method 1 using the speed as input resulted in a mean (*M*) difference of −8.26 m with a standard deviation (*SD*) of 7.61 m. Using the distance difference as an input in Method 2 led to *M* = −0.79 m (*SD* = 1.12 m) difference. Method 3 using the acceleration of the manual vehicle did not lead to any interaction scenarios due to implementation issues. Method 4 using the pedals' positions as input led to the smallest average difference of *M* = −0.51 m (*SD* = 0.63 m). However, since the same pedals in terms of hardware and software were not installed in both simulators, a factor was required to convert the pedal values. This factor was also dependent on the lateral dynamics, so that it was not possible to configure this factor for the u-shaped track and we had to reject this method. Due to the smaller resulting differences for Method 2 compared to Method 1, we used the distance difference as an input for a separate PID controller to do the detail synchronization. In order to enable the PID control of the AV to compensate for the distance difference, the AV's speed limit of 50 km/h should be maintained until the start of the interaction phase.
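To make the chosen approach more concrete, the following is a minimal sketch of a PID controller that takes the distance difference as its input. It is not the controller used in the study: the gains, the reference speed `v_ref`, the update interval, and the choice to output a clamped speed setpoint are all assumptions made for illustration.

```python
class DistancePID:
    """Discrete PID controller acting on the distance difference (sketch)."""

    def __init__(self, kp: float, ki: float, kd: float,
                 v_ref: float, v_max: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.v_ref = v_ref        # nominal AV speed in m/s (assumed)
        self.v_max = v_max        # upper speed bound in m/s
        self.integral = 0.0
        self.prev_error = None

    def step(self, d_av: float, d_mv: float, dt: float) -> float:
        """Return a speed setpoint (m/s) for the AV.

        d_av, d_mv: distances of AV and manual vehicle to the bottleneck (m).
        """
        # Positive error: the AV is farther from the bottleneck than the MV
        # and therefore has to drive faster to catch up.
        error = d_av - d_mv
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        correction = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        # Clamp the setpoint to the permitted speed range [0, v_max].
        return min(max(self.v_ref + correction, 0.0), self.v_max)


# Hypothetical gains; 30 km/h reference and 50 km/h limit as in the text.
pid = DistancePID(kp=0.5, ki=0.0, kd=0.0, v_ref=30 / 3.6, v_max=50 / 3.6)
v_set = pid.step(d_av=200.0, d_mv=195.0, dt=1 / 60)
```

Clamping at 50 km/h mirrors the requirement that the AV keeps its higher speed limit until the interaction phase so that the controller has headroom to close a gap.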

For standardized conditions of the interactions, the speed profiles of the AV should be as identical as possible during all encounters within the interaction phase. For this purpose, a further pre-test was carried out in which the detail synchronization was switched off before the interaction phase so that the automated longitudinal guidance could be adjusted to 30 km/h. Again, the three synthetic speed profiles (offensive, neutral, defensive) were used on a straight course, while the distance of the switch-off to the road bottleneck was varied. The longitudinal control needs about 40 m to compensate for a speed difference of 5 km/h to the target speed of 30 km/h. Therefore, the distance of the switch-off was varied in 10 m steps between 80 m and 120 m to the road bottleneck. The start of the interaction phase (distance of 50 m) was used as reference. The mean distance differences with the standard deviation between the AV and implemented manual vehicle to the road bottleneck, respectively, are shown in Table 1.


**Table 1.** Mean and standard deviation of the distance differences of the AV and the implemented manual vehicle using each speed profile (offensive, neutral, defensive) once (*n* = 3). The distance to the road bottleneck when the detail synchronization was switched off was varied.

The earlier the switch-off is performed, the more the mean distance difference, and especially its standard deviation, increases. Thus, an earlier switch-off point leads to a reduction in synchronicity. In contrast, an early switch-off of the detail synchronization leads to a constant speed profile in the interaction phase and thus to a corresponding reproducibility of the AV's speed profile. At speeds of less than 30 km/h of the AV during detail synchronization, the AV would subsequently accelerate to 30 km/h after switching off the detail synchronization in front of the bottleneck. This could lead to a lack of comprehensibility by the passenger, which in turn could result in passenger intervention. Thus, synchronicity, reproducibility, and comprehensibility must be taken into account when designing the detail synchronization (Figure 4). It is not possible to guarantee the desired interaction scenarios with the human driver by simultaneously fulfilling these three attributes. Therefore, one of the criteria had to be neglected in the design and either a limited synchronicity, a limited reproducibility, or a limited comprehensibility had to be accepted (Figure 4).

**Figure 4.** Effect of the switch-off point of the distance controller on the attributes synchronicity, reproducibility, and comprehensibility for detail synchronization. The switch-off point results in a trade-off between these attributes in a way that a simultaneous fulfilling of all attributes cannot be guaranteed.

For the investigation of the interaction at bottlenecks and possible automation failures, we considered the highest possible synchronicity and reproducibility as most important, so that the vehicles arrive at the bottleneck simultaneously and the AV has the target speed of 30 km/h at the beginning of the interaction phase. Low synchronicity or reproducibility could limit the validity of the experimental setting and may lead to many excluded datasets. Therefore, we decided to accept a limited comprehensibility. Switching off the detail synchronization 80 m in front of the bottleneck represents the best compromise between synchronicity and reproducibility (Table 1). A final pre-test with two participants and three runs each showed a distance difference of *M* = −2.6 m (*SD* = 8.2 m). We considered reaching 30 km/h before the start of the interaction phase and a mean distance difference of less than one vehicle length to be sufficient reproducibility and synchronicity to use this approach in our experimental setting.

#### *4.3. Course Design*

Figure 5 presents a bird's-eye view of the two course modules (Module I and Module II) we used in our study, including the navigation details that guided both participants along the intended route through the respective module. Each participant drives through an individually designed urban route consisting of different streets and intersections. Since the participants are separated by a row of houses during entry into and exit from the module, they encounter each other only once per module at the road bottleneck. The size of the modules results in an average transit time of five minutes per module. The basic and detail synchronization occur in the area around the bottleneck. The straight section on which the interaction takes place is 300 m long. Each traffic light of the basic synchronization is 250 m away from the bottleneck. The access to the interaction section consists of a slight bend so that the participants are not able to see each other while waiting at the respective traffic lights. For the manual vehicle, the speed limit on the interaction section was set to 30 km/h directly after the corresponding traffic light. The 30 km/h speed limit of the AV was set 50 m in front of the bottleneck. On the remaining course, the speed limit was 50 km/h.

**Figure 5.** Course design consisting of two modules (Module I and Module II) the participants passed through during the experiment. Additionally, the navigation through the modules of the AV and the manual vehicle (MV) is presented.

#### **5. Multi-Vehicle Study**

#### *5.1. Sample*

Twenty-six participants took part in this study, resulting in 13 participant pairs. The sample comprised 31% women and 69% men. The mean age of the participants was *M* = 27.50 years with a standard deviation of *SD* = 8.99 years. They had held their driver's license for *M* = 10.08 years (*SD* = 8.93 years) and rated their previous knowledge of automated driving on a 5-point Likert scale from "very low" to "very high" with a median of 4 (= high). A statistical evaluation showed no differences between the automated and manual vehicle groups. The requirement for participation in this experiment was a valid driver's license.

#### *5.2. Experimental Design*

The multi-vehicle study consisted of a 2 (message) × 2 (bottleneck type) repeated measures design. The first factor, message (within-subject), represented the AV's intention. It contained the factor levels AV yields the right of way and AV insists on the right of way. The second factor, bottleneck type (within-subject), consisted of the levels bottleneck narrowed on both sides and bottleneck narrowed on one side. Additionally, we implemented an automation failure in which the AV first communicated that it would yield the right of way at a bottleneck narrowed only on the AV's side. Thirty meters in front of the bottleneck, the AV failed to detect the oncoming human driver. Therefore, it stopped communicating by switching off the eHMI and started to pass through the bottleneck despite the oncoming human driver. Each participant pair experienced Use Cases 1–4 once in a permuted order, followed by Use Case 5 with the automation failure at the end of the experimental drive (Table 2).
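The ordering constraint (Use Cases 1–4 permuted, Use Case 5 always last) can be expressed in a few lines. The text does not state how the permutation was chosen, so the random shuffle and the per-pair seed below are assumptions, and `use_case_order` is a hypothetical helper name.

```python
import random


def use_case_order(pair_id: int) -> list[int]:
    """Return the Use Case sequence for one participant pair (sketch)."""
    # Seeding with the pair id makes the order reproducible (assumption).
    rng = random.Random(pair_id)
    order = [1, 2, 3, 4]
    rng.shuffle(order)          # Use Cases 1-4 in permuted order
    return order + [5]          # automation failure always comes last


for pair in range(1, 4):
    print(pair, use_case_order(pair))
```

A counterbalanced scheme, such as a Latin square, would satisfy the same constraint and may be preferable with only 13 pairs.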

**Table 2.** Five different Use Cases the participants passed through.


#### *5.3. Driving Simulators*

The study took place in the two modular driving simulators at the Chair of Ergonomics of the Technical University of Munich (Figure 6). Both simulators offer a 120° horizontal field of view on three 55-inch screens with Ultra-HD resolution. While the rearview mirror is integrated in the view of the middle screen, two additional displays visualize the side mirrors. An additional display behind the steering wheel serves as a freely programmable instrument cluster (IC). In the AV setup, an LED-strip was positioned where the bottom of the windshield would be. In addition, the AV setup was equipped with a motion platform. Four D-BOX actuators generated pitch and roll movements, which provided participants with improved feedback about the behavior of the AV. Sound systems in both simulators generated engine and environmental sounds. We used the driving simulation software SILAB 6.0 from the Würzburg Institute of Traffic Sciences [32]. A data collection rate of 240 Hz and a refresh rate of 60 Hz were used. The partially automated driving system of the AV had to be activated by a button on the steering wheel. The automated driving system could be deactivated at any time using the same button or by braking, accelerating, or steering. The simulators are located in different rooms and were networked via LAN cable.

**Figure 6.** Modular driving simulators. (**a**) Manual vehicle setup; (**b**) Automated vehicle setup with blue LED-strip.


#### *5.4. HMI Design*

#### 5.4.1. Human–Machine Interface of the Manual Vehicle

We used an instrument cluster (IC) and head-up display (HUD) for the HMI of the manual vehicle. Both HMI elements presented navigation and speed information. No other information, such as from driver assistance systems, was implemented in the manual vehicle's HMI.

#### 5.4.2. Automation Human–Machine Interface

The aHMI [4] consisted of an instrument cluster (IC), a head-up display (HUD), and an LED strip. The aHMI should provide the passenger with information about current and planned maneuvers in addition to the system status when monitoring a partially automated driving system [33–35]. The LED-strip was mounted at the bottom of the windshield since this is an often used position in the context of automated driving [35–39]. When the ADS was available, the LED-strip illuminated white, and after activation, it illuminated blue [40]. The IC and HUD were used to display the current and planned maneuvers. The IC display (Figure 7) was modified from the adaptive concept of Feierle, Bücherl, Hecht, and Bengler [41]. The current speed is displayed on the left part of the IC, while the system status is displayed on the right and at the bottom as part of an automation scale. Central to the display is the indication of the planned and current maneuvers of the vehicle as well as the traffic sign recognition. Above this, as an extension of the road, is the navigation display. The visualizations of the maneuvers in the investigated bottleneck scenarios, depending on the oncoming traffic, are shown in Figure 8.

**Figure 7.** Visualization of the instrument cluster, modified from Feierle et al. (2020) [41].

**Figure 8.** Visualization of the maneuver in the IC during the bottleneck scenarios: (**a**) bottleneck narrowed on both sides, AV insists on the right of way; (**b**) bottleneck narrowed on both sides, no oncoming traffic, AV passes; (**c**) bottleneck narrowed on both sides, AV yields the right of way; (**d**) bottleneck narrowed on one side, no oncoming traffic, AV passes; (**e**) bottleneck narrowed on one side, AV yields the right of way.

The HUD is based on the concept of Feierle, Beller, and Bengler [42]. The display (Figure 9) is divided into three sections. Speed information is located in the left section, system status and driving maneuvers are shown in the middle section, and the right section shows the navigation information.

**Figure 9.** Head-up display showing speed, maneuver, and navigation information when the AV insists on the right of way in the road bottleneck scenario narrowed on both sides. The black background is transparent in the driving simulation.

#### 5.4.3. External Human–Machine Interface

The eHMI [4] consisted of a display mounted at the front of the vehicle, since its message is then visible to the human driver, especially over long distances as in the road bottleneck scenario [6]. The design of the eHMI (Figure 10) was developed by Rettenmaier et al. [30]. The eHMI uses an arrow to indicate which negotiation partner can pass through the bottleneck first. With the green arrow, the AV communicates that it yields the right of way to the human driver. The orange arrow indicates that the AV insists on the right of way. Both arrows are animated with a frequency of 1 Hz, building up in the direction in which the bottleneck may be passed through first. In addition to the arrows, the eHMI design includes the contour of the road, represented by two gray lines [30].

**Figure 10.** External HMI used in the study. In the upper part of the picture the AV indicates to yield the right of way to the human driver. In the lower part the AV communicates to insist on the right of way. The illustrated scenario is the road bottleneck narrowed on both sides of the road [30].

#### *5.5. Experimental Track and Bottleneck Scenarios*

The experimental track consisted of a route network in an urban area with several intersections and connecting roads. The scenario examined in the study is the road bottleneck scenario, in which a human driver and an AV approaching from the opposite direction encounter each other simultaneously. The scenario varies the bottleneck type and the right of way. Figure 11 presents the five resulting Use Cases the participants passed through during the experimental drive. The scenario is subdivided into the approaching phase, in which both participants approach the bottleneck, and the interaction phase, which starts 50 m in front of the bottleneck. In the interaction phase, the AV switches on its eHMI and starts communicating that it yields the right of way or insists on it. If the AV yielded the right of way, it stopped (S) 13 m in front of the obstacle. In Use Case 1 and Use Case 3, the bottleneck was constricted on both sides of the road by two double-parked vehicles. In Use Case 2 and Use Case 4, there was only one obstacle, either on the human driver's side of the road or in the AV's lane. Use Case 5 represents the implemented automation failure, in which the AV first communicates that it yields the right of way at the bottleneck narrowed on one side. The AV then changes its strategy 30 m in front of the bottleneck and insists on the right of way. The 30 m results from adding the travel distance within a one-second reaction time (*x* = 8.33 m), the braking distance with a deceleration of −2 m/s² (*y* = 17.35 m), and the stopping distance to the middle of the bottleneck (*z* = 4 m). After initially detecting the oncoming traffic, the AV loses the detection of the oncoming human driver during the passage and changes its communication strategy by switching off the eHMI. The speed limit in the interaction phase was set to 30 km/h for both participants.
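The 30 m figure can be reproduced with a short calculation. The sketch below assumes that the braking distance follows the standard constant-deceleration relation y = v²/(2a) and that the initial speed equals the 30 km/h limit of the interaction phase; the variable names are chosen for this example.

```python
v = 30 / 3.6              # speed limit in the interaction phase, in m/s
t_reaction = 1.0          # reaction time in s
decel = 2.0               # magnitude of the deceleration in m/s^2
z = 4.0                   # stopping distance to the middle of the bottleneck in m

x = v * t_reaction        # distance travelled during the reaction time
y = v ** 2 / (2 * decel)  # braking distance from v to standstill
total = x + y + z

print(f"x = {x:.2f} m, y = {y:.2f} m, total = {total:.2f} m")
# prints: x = 8.33 m, y = 17.36 m, total = 29.69 m
```

With the unrounded speed, y evaluates to about 17.36 m. The 17.35 m given in the text follows when the speed is first rounded to 8.33 m/s. In both cases the sum rounds to 30 m.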

**Figure 11.** Different bottleneck scenarios the participants passed through during the experimental drive. The scenarios are located in the interaction phase with a speed limit of 30 km/h. In the interaction phase the AV communicates to the driver of the manual vehicle (MV) via the eHMI either to yield the right of way or to insist on it [30].

#### *5.6. Procedure*

During the experiment there were two experimenters, one for each participant. Welcoming and introducing the participants was conducted separately by the experimenters to avoid the influence of gender effects, sympathy/antipathy, or social similarity between the participants. After reading the safety instructions and the participant information the participants consented to the experiment. Subsequently, the participants filled in a demographic questionnaire including the age, gender, experience with automated driving, and the possession of their driver's license. Afterwards the participants received the instruction. The participants acting as the human drivers in the simulation were instructed about manual driving with navigation instructions and were informed that there would be interactions with an AV. Moreover, the human drivers were also made aware of the presence of another human in the AV in the simulation. The AV's passengers were instructed about partially automated driving, its capacities, and about the obligation of monitoring the driving scene. Additionally, the AV's passengers also received information about the presence of a human driver in the same simulation, since this awareness could positively affect the willingness to cooperate [21].

Subsequently, both participants completed an introductory drive (duration: 10 min) in the multi-vehicle simulation. The human drivers had the opportunity to familiarize themselves with the simulator's driving behavior and the navigation information. The AV's passengers became acquainted with the driving automation, including how to override it. Afterwards, the experimental drive (duration: 25 min) was conducted, consisting of Use Cases 1–4 in a permuted order followed by the automation failure in Use Case 5. The experiment concluded with both participants filling out a questionnaire and taking part in an oral interview about the automation failure they had experienced.

#### *5.7. Measures and Analysis*

We used the differences in distance and in time to arrival (TTA) of the two simulated vehicles to the bottleneck to assess the synchronicity and the driving profiles resulting from the methodology. Both metrics were calculated at the moment the first of the two vehicles reached the interaction phase. For this analysis, six of the 65 possible encounters had to be discarded because participants in the AV intervened before reaching the interaction phase.
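As an illustration, both synchronicity metrics can be computed from the vehicle states at the moment the first vehicle enters the interaction phase. The exact TTA definition is not given here, so the sketch assumes TTA is the remaining distance divided by the current speed. The function names and the sign convention (manual vehicle minus AV, so that negative values mean the manual vehicle is ahead) are likewise assumptions chosen to match the reporting in this paper.

```python
def tta(distance_m: float, speed_mps: float) -> float:
    """Time to arrival at the bottleneck, assuming constant speed."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive to compute a TTA")
    return distance_m / speed_mps


def sync_metrics(d_mv: float, v_mv: float,
                 d_av: float, v_av: float) -> tuple[float, float]:
    """Return (distance difference in m, TTA difference in s).

    Negative values mean the manual vehicle (MV) is closer to the
    bottleneck, or would arrive earlier, than the AV.
    """
    delta_d = d_mv - d_av
    delta_tta = tta(d_mv, v_mv) - tta(d_av, v_av)
    return delta_d, delta_tta


# Hypothetical sample: MV enters the interaction phase (50 m) at 10 m/s,
# while the AV is still 55 m away and drives at 30 km/h.
delta_d, delta_tta = sync_metrics(d_mv=50.0, v_mv=10.0,
                                  d_av=55.0, v_av=30 / 3.6)
```

Reporting both metrics is useful because two vehicles at the same distance but different speeds are synchronized in space and not in time.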

To determine traffic efficiency and safety, we excluded the data of three participant pairs due to technical issues within the interaction phase. The traffic efficiency was operationalized by means of participants' passing times. This metric was defined as the time that elapsed from the manual driver's entrance to the interaction phase (50 m in front of the bottleneck) until passing the AV 15 m behind the bottleneck. The crash rate was used to assess the controllability of the automation failure. Additionally, as a further metric the time to collision (TTC) was calculated when the passenger of the AV took over control of the vehicle guidance. Based on the small sample size in multi-vehicle simulation and the large difference in sample size compared to single driver simulations, we refrained from a statistical evaluation and we descriptively analyzed the data.

#### **6. Results**

#### *6.1. Technical Implementation*

Figure 12 shows the distances of the manual vehicles and AVs to the bottleneck (blue line) as a result of the distance control. The angle bisector (orange line) represents the distances for an ideal synchronization, if the implemented control does not result in any delay. It can be seen that both vehicles start at different distances from the bottleneck after the traffic light turns green. At the beginning the manual vehicles approach the bottleneck faster than the AVs, resulting in a vertical rise in the curves. Therefore, the distance control results in an offset as the initial accelerations of the human drivers cannot be compensated for quickly enough. The maximum deviations occur between 250 m and 200 m. From 200 m to the bottleneck, the control is more successful in compensating for the difference in distance, which brings the curves closer to ideal synchronicity again, whereby in some cases an offset remains until 80 m before the bottleneck. The deviation increases again directly before the interaction phase. This may be due to the switch of the synchronization mode to the longitudinal control independent of the human drivers' behavior. In most cases, the manual vehicle reaches the interaction phase first since the human drivers show higher speed than 30 km/h in most cases.

The differences in distance (Δ*d*) (Figure 13) result in *M* = −5.70 m (*SD* = 4.06 m), which corresponds to a difference in TTA of *M* = −0.34 s (*SD* = 1.10 s). A negative difference in distance and TTA means an earlier arrival of the manual vehicle at the interaction phase.

**Figure 12.** Distances of the manual vehicles plotted over the distances of the AVs to the bottleneck during the detail synchronization phase. The angle bisector visualizes the distances for an ideal synchronization.

**Figure 13.** Distance differences between the manual vehicles and AVs, and their relative frequencies, at the moment one of them has reached the interaction phase. Negative values mean that the manual vehicle arrived at the road bottleneck first.

#### *6.2. Multi-Vehicle Study*

#### 6.2.1. Human Driving Behavior

Figure 14 shows the participants' passing times in the case that the AV yielded the right of way, separated into the data of the single-driver simulation [30] (21 data sets) and the data of the multi-vehicle simulation (10 data sets). One data set was removed from the multi-vehicle simulation due to an intervening participant in the AV (leaving *n* = 9). Table 3 contains the descriptive data. At the bottleneck narrowed on one side, the passing time is similar in both studies with an average difference of 158 ms. At the bottleneck narrowed on both sides, the participants in the single-driver study needed on average 465 ms more than in the multi-vehicle simulation.

**Figure 14.** Participants' average passing time in the case that the AV yields the right of way to the oncoming human driver, separated by bottleneck type. The data of the single-driver simulation are derived from Rettenmaier et al. [30].

**Table 3.** Descriptive data of the participants' passing time. The data of the single-driver simulation are derived from Rettenmaier et al. [30].


#### 6.2.2. Effect of Automation Failure

In the multi-vehicle simulation, four crashes occurred in ten trials during the automation failure, in which the human driver collided with the AV and its passenger. These encounters are characterized by a late intervention of the AV's passenger (TTC: 0.37 s, 0.65 s, 0.90 s, 0.94 s). In all four cases, the AV's passenger did not detect the change in the aHMI. Moreover, only one manually driving participant detected that the AV's eHMI had been switched off. The other three participants did not detect that the eHMI was deactivated. In contrast to the 40% crash rate of the multi-vehicle simulation, the single-driver simulation showed a crash rate of 95% [30].

In six trials no crash occurred. These encounters include faster interventions of the AV's passenger braking to standstill (TTC: 1.31 s, 1.83 s, 2.06 s, 2.32 s, 2.44 s, 2.73 s). The AV's passengers stated that no oncoming traffic was detected permanently (once), that they had noticed the change in the aHMI (three times), or that they could not give any information about the aHMI during the automation failure (two times). None of the six human drivers that did not crash noticed that the eHMI was switched off. They stated that the eHMI continuously communicated to yield the right of way.

#### **7. Discussion**

#### *7.1. Technical Implementation*

The results of the detail synchronization show that, at the beginning, the manual vehicle's distance to the road bottleneck decreases faster than the AV's. According to the driving data, this is due to the AV's slower acceleration: the manual vehicle accelerates strongly at the start and quickly reaches the maximum permitted speed. This can be attributed to the accelerator pedal in the manual driving simulator setup, which has a lower resistance than the Sensowheel pedals in the automated driving simulator setup. The distance control of the detail synchronization compensates for the resulting distance difference as the distance traveled increases. This can only be achieved by increasing the speed of the AV relative to the manual vehicle. Nevertheless, the 50 km/h allowed on the AV side during detail synchronization was rarely reached before the interaction phase. None of the participants reported that the speed was below the maximum speed, so different speed regulations seem to be a good way to compensate for differences in distance. Since synchronicity increases with the distance traveled, extending the distance covered by the detail synchronization could provide an improvement. Further tuning of the PID controller could additionally improve synchronicity and lower the deviation. A modification of the control loop, e.g., a two-stage cascade control, would also be conceivable. However, the inconsistent setpoint changes caused by the human driver's driving behavior make it difficult to minimize the control deviation with these possible improvements. In particular, switching off the detail synchronization 80 m before the bottleneck increases the asynchrony directly before the interaction phase. The absolute value of the resulting mean distance difference (*M* = 5.7 m) only moderately exceeds the AV's length (4.68 m), which we consider a tolerable deviation.

Previous multi-agent studies with two manual road users failed to induce the intended interactions in a controlled manner in half of the recorded interactions in Will [18] and in 30% to 43% of them in Hancock and de Ridder [16]. In contrast, the synchronization of the AV and the manual driver in this paper succeeded in all cases in which the AV's passenger did not intervene. The proposed method therefore appears to be valid for implementing a multi-vehicle simulation with one AV.
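To make the distance-control idea discussed above more concrete, the following minimal Python sketch shows how a PID controller could close the distance difference between the AV and the manual vehicle. It is an illustration only and not the implementation used in the study: the gains, the feed-forward term, the toy kinematics, and all names (`PID`, `av_speed_command`, `simulate`) are assumptions. Only the 50 km/h cap on the AV side and the switch-off 80 m before the bottleneck are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

SPEED_CAP_MS = 50.0 / 3.6   # 50 km/h allowed on the AV side (from the text)
SWITCH_OFF_M = 80.0         # detail synchronization ends 80 m before the bottleneck


@dataclass
class PID:
    """Textbook discrete PID controller; all gains are hypothetical."""
    kp: float
    ki: float = 0.0
    kd: float = 0.0
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        # No derivative contribution on the first sample (avoids a start-up kick).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def av_speed_command(pid: PID, d_av: float, d_manual: float,
                     feedforward_speed: float, dt: float) -> float:
    """Return an AV speed command in m/s.

    d_av, d_manual: remaining distances to the bottleneck in m.
    feedforward_speed: assumed baseline, e.g., the manual vehicle's current speed.
    """
    if d_av <= SWITCH_OFF_M:
        # Controller switched off: no further correction of the gap.
        return min(feedforward_speed, SPEED_CAP_MS)
    error = d_av - d_manual  # positive: the AV is farther away and must speed up
    command = feedforward_speed + pid.step(error, dt)
    return max(0.0, min(command, SPEED_CAP_MS))  # clamp to [0, 50 km/h]


def simulate(initial_gap_m: float = 10.0, manual_speed: float = 10.0,
             start_m: float = 400.0, dt: float = 0.1) -> List[float]:
    """Toy run: the manual vehicle drives at constant speed, the AV starts behind."""
    pid = PID(kp=0.5, ki=0.05, kd=0.1)  # hypothetical gains
    d_manual, d_av = start_m, start_m + initial_gap_m
    gaps = []
    while d_av > SWITCH_OFF_M:
        v_av = av_speed_command(pid, d_av, d_manual, manual_speed, dt)
        d_av -= v_av * dt
        d_manual -= manual_speed * dt
        gaps.append(d_av - d_manual)
    return gaps
```

In this toy run, the gap shrinks as the distance traveled increases, which mirrors the observation that synchronicity improves over a longer synchronization distance. Once the controller is switched off at 80 m, any remaining or newly arising gap is no longer corrected. A real human driver would additionally change the setpoint unpredictably, which this sketch does not model.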

#### *7.2. Multi-Vehicle Study*

#### 7.2.1. Human Driving Behavior

The AV helps the human driver pass through the bottleneck scenarios efficiently by communicating that it will yield the right of way. The enhancement in traffic efficiency is reflected in the human drivers' short passing times. Compared with the passing times of the single-driver simulation [30], those of the multi-vehicle study are similar or even slightly shorter. This could be attributable to the fact that the AV arrived at the bottleneck a little later, which is an indication of yielding the right of way in real-world traffic [31]. However, as there are no clear tendencies, we conclude that the synchronization of both participants was implemented with sufficient accuracy and that the variance of the synchronization has no major influence. Thus, apart from its complex implementation, the multi-vehicle simulation has no disadvantage compared with the single-driver simulation when investigating the interaction of an AV with a manually driving participant.

#### 7.2.2. Effect of Automation Failure

The multi-vehicle simulation resulted in a lower crash rate than the single-driver study [30]. However, the automation failure in this paper resulted in four crashes between the AV and the human driver, which means that the implemented scenario was too critical to be resolved by the participants. Only one participant noticed that the AV switched off its eHMI. Switching off the eHMI is therefore insufficient to communicate that the AV is changing its strategy and will pass through the bottleneck. As already shown in the single-driver simulation [30], the AV has to communicate the changed driving strategy more saliently by displaying at least a message about the AV's actual status. The stronger stimulus would result in faster reaction times by the participants [43] and could lower the crash rate. Additionally, only 30% of the AVs' passengers noticed the change in the IC or HUD. Here, a salient presentation of the planned maneuver by an augmented reality HUD and the resulting shift of visual attention to the relevant driving environment could offer added value for future investigations [42].

In summary, participants were used to a perfectly working automated system due to the previous encounters. During the automation failure, participants were not attentive enough, since it is hardly possible for humans to monitor for unlikely abnormalities [44]. We therefore conclude that the AV's internal and external communication must be reliable and that the AV must not change its strategy.

#### 7.2.3. Is Multi-Vehicle Simulation Beneficial?

If a study deals with the interaction of a perfectly working AV with its passenger or surrounding road users, multi-vehicle simulation offers no benefit over single-driver simulation because the results show no clear descriptive tendencies. It makes no difference to the human drivers' driving behavior whether there is a real passenger in the AV or whether the AV is implemented within a single-driver simulation because in both cases the AV is programmed and the passenger has no influence on the AV's behavior. We conclude that in scenarios where only one human negotiation partner affects the interaction, single-driver simulation is sufficient, which avoids the additional effort of the multi-vehicle simulation.

If research deals with the interaction of two human negotiation partners, such as after the AV's passenger takes over during the automation failure, multi-vehicle simulation is beneficial. The results show that the AV's passenger lowered the crash rate by intervening in the multi-vehicle simulation. The take-over by the AV's passenger, including its timing and the braking behavior, is hardly possible to implement in the single-driver simulation.

#### *7.3. Limitations*

A statistical comparison of the data from the multi-vehicle simulation and the single-driver simulation is not reasonable since the sample size in the present study was too small. Nevertheless, a descriptive analysis of the data shows similar driving behavior in the multi-vehicle and the single-driver simulation. Moreover, the sample was young and included an above-average share of male participants. Future experiments should be conducted with an age- and gender-balanced sample.

Since the human drivers' driving behavior differed, the synchronization, and thus the arrival at the bottleneck, was not completely simultaneous in each trial, such that the human driver reached the interaction phase first. This could have affected the participants' passing times. The variance in manual driving behavior had the additional effect that the AV sometimes exhibited driving behavior that was hard to comprehend while compensating for the difference in distance. However, this problem did not disturb any participant.

#### **8. Conclusions and Future Work**

Based on the successful synchronization of the AV and the manual vehicle in this study, we recommend a traffic light control for basic synchronization and a distance control for detail synchronization in future investigations using multi-vehicle simulation. Compared with a single-driver simulation, the multi-vehicle simulation revealed an added benefit for the automation failure scenario by realizing a more human-like interaction between two participants who can both act and react.

Single-driver studies seem to be appropriate for a worst-case consideration without an intervening AV passenger, for example, in automation failure scenarios. To investigate more realistic regular interactions between several road users, further multi-vehicle simulation studies should be conducted. We suggest conducting a large-scale study addressing several scenarios (e.g., bottlenecks, intersections, roundabouts) to allow a deeper comparison with single-driver studies and a simultaneous investigation of the AV's internal and external communication. Furthermore, future multi-agent simulation studies should not be limited to motorized road users but should also address vulnerable road users such as cyclists and pedestrians.

**Author Contributions:** Conceptualization, A.F., M.R., and F.Z.; methodology, A.F., M.R., and F.Z.; software, A.F., M.R., and F.Z.; validation, A.F., M.R., and F.Z.; formal analysis, A.F., M.R., and F.Z.; investigation, A.F., M.R., and F.Z.; resources, A.F. and M.R.; data curation, A.F., M.R., and F.Z.; writing—original draft preparation, A.F. and M.R.; writing—review and editing, A.F. and M.R.; visualization, A.F. and M.R.; supervision, K.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the German Federal Ministry of Economics and Energy within the project @CITY: Automated Cars and Intelligent Traffic in the City, grant number 19A17015B. The authors are solely responsible for the content.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
