*Article* **Lane-Merging Strategy for a Self-Driving Car in Dense Traffic Using the Stackelberg Game Approach**

**Kyoungtae Ji <sup>1</sup>, Matko Orsag <sup>2</sup> and Kyoungseok Han <sup>1,</sup>\***


**Abstract:** This paper presents a lane-merging strategy for self-driving cars in dense traffic using the Stackelberg game approach. To create sufficient space to merge into the next lane, a self-driving car must interact with the vehicles in that lane. In heavy traffic, where the possible actions of a vehicle are quite limited, the driving intentions of vehicles can be inferred from their behavior. For example, by observing the speed changes of a human driver in the next lane, the self-driving car can estimate that driver's intention in real time, much as a human driver would. We use the principle of Stackelberg competition to make the optimal decision for the self-driving car based on the predicted reaction of the interacting vehicles in the next lane. In this way, the self-driving car can decide whether to merge according to the traffic circumstances. In addition, by limiting the number of interacting vehicles, the computational burden is kept manageable enough for implementation in production vehicles. We verify the efficiency of the proposed method through case studies for different test scenarios, and the test results show that our approach is closer to a human-like decision-making strategy than the conventional rule-based method.

**Keywords:** self-driving car; game theory; decision-making; Stackelberg game; lane-merging; intention estimation

#### **1. Introduction**

The recent development of self-driving cars has moved partially autonomous driving from the purely imaginary to the real. However, to achieve fully autonomous driving (i.e., Level-5 [1]), developers must still overcome many technical difficulties. One of the most challenging tasks is to describe the interaction between a self-driving car and human-driven vehicles [2,3]. In city driving, the vehicle often faces complex traffic situations that should be appropriately addressed. For instance, in congested traffic, human drivers constantly interact with other vehicles to create flexibility [4] by guessing the driving intentions of other drivers. Thus, autonomous vehicles should also act similarly to human drivers when facing complex traffic situations, rather than relying on conservative motions. Otherwise, to ensure safety, very conservative decisions, such as waiting for traffic to ease, are most likely to be made, and these are not efficient [5]. Therefore, it is important to reflect the interactions between the autonomous vehicle (AV) and interacting vehicles in the decision-making logic. In this way, human-like decision-making can be realized, which is essential when human-driven vehicles and self-driving cars share roads in the near future.

To resolve the technical problems mentioned above, we propose a game theoretic decision-making strategy that enables the self-driving car to consider the interactions with the surrounding vehicles. In particular, we model human thinking processes using game theory, which is a good candidate for handling heavy traffic conditions in which vehicles affect each other [6]. In this approach, the game participants are assumed to be rational players that make decisions maximizing their own utility [7]. The latter includes setting the player's particular objective. Thus, appropriate decisions for self-driving cars can be made provided that they play the game with the surrounding vehicles. These vehicles are considered rational decision makers maximizing their utility [8,9].

**Citation:** Ji, K.; Orsag, M.; Han, K. Lane-Merging Strategy for a Self-Driving Car in Dense Traffic Using the Stackelberg Game Approach. *Electronics* **2021**, *10*, 894. https://doi.org/10.3390/electronics10080894

Academic Editors: Calin Iclodean, Bogdan Ovidiu Varga and Felix Pfister

Received: 6 March 2021 Accepted: 5 April 2021 Published: 8 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The efficiency of game theory in modeling a vehicle's decision-making process has been verified in previous studies. The most common approaches include the level-K reasoning framework and the Stackelberg game approach.

The level-K framework (also referred to as hierarchical reasoning game theory) models the interaction between players using a hierarchical depth of thought [10,11]. The key idea is that each player has a depth of strategic thought from level-0 to a specific number K, and the level-K player makes its decision assuming that the other players choose their actions based on a level-(K-1) depth of thought [12]. More specifically, the players regard themselves as the most advanced ones, who can think one level ahead of the others. In [13,14], level-K decision making is proposed for unsignalized intersections, where many interactions between vehicles occur. Other researchers have also considered the lane-changing problem on highways using the level-K framework [15]. Although the level-K framework is a promising technique for describing the interactions between multiple agents, it must model the depth of thought of all agents in the strategic game. Therefore, if a self-driving car faces multiple vehicles, a heavy computational burden is required to model the depth of thought of every vehicle [16].

In contrast, the Stackelberg game (also referred to as the leader-follower game) is a hierarchical game in which each player is assigned the role of either leader or follower. By modeling the utility function that each game participant seeks to maximize, the interactions between vehicles can be modeled effectively. Compared to the level-K framework, the Stackelberg game does not need to model the depth of thought of all game players hierarchically, so the computation is less complex. The lane-changing scenario is modeled using Stackelberg game theory in [17], and the surrounding drivers' intentions, such as "aggressive driver" and "cautious driver," are estimated in real time in [18,19]. In addition, the modeling of multi-vehicle interactions at uncontrolled intersections is considered in [20]. However, these approaches do not consider active interaction, which is essential for lane-merging in dense traffic conditions.

In this paper, we develop a decision-making strategy for a vehicle merging into another lane in dense traffic where all vehicles interact with each other. In such dense traffic, lane-merging is not possible unless there is a concession between interacting vehicles. We therefore consider active interaction that changes the behavior of the interacting vehicle. To keep the computation manageable, we exploit the Stackelberg game approach. In real-world driving, human drivers consider only the interacting vehicles, not all vehicles on the road. Similarly, in this paper, the self-driving car considers interactions with only a single interacting vehicle in the next lane. In this way, the computational load of the proposed method is manageable enough to be implemented in hardware.

It is also worth noting that the estimation of surrounding vehicle intentions is essential for making the appropriate decision in real time [21]. In our problem (i.e., dense traffic conditions), we assume that the intentions of the drivers in the next lane are limited to "yield" or "ignore" [22]. By assigning a certain quantity (i.e., politeness) to the surrounding vehicles, the AV that needs to merge estimates the politeness of the interacting vehicle in real time and decides whether to merge [23]. However, human drivers use a strategy that is not exactly known to self-driving cars. To reflect this aspect, in our verification environments, the interacting vehicles in the next lane behave according to a car-following model that the self-driving car does not know. To verify the effectiveness of our game theoretic decision-making strategy, its performance in various test scenarios is compared to that of the rule-based approach [24].

The characteristic features of the proposed strategy are summarized as follows.


The remainder of the paper is organized as follows. In Section 2, we provide a problem definition for the lane-merging scenario in heavy traffic. In Section 3, we introduce the vehicle model, action space, and the driving strategy of the surrounding vehicles. The key result of this paper, the game theoretic decision-making strategy, is presented in Section 4. The efficiency of our approach is verified through the case studies in Section 5. Finally, we draw conclusions and provide a future outlook in Section 6.

#### **2. Problem Statement**

Here we describe the interactions between two agents in a strategic game. In Figure 1, the AV in the side lane is called the "ego-car," which is the controlled host vehicle, and the surrounding vehicles are human-driven vehicles modeled by the car-following model described in Section 3.3. As illustrated in Figure 1, the interactions between the vehicles in dense traffic are modeled using game theory. More specifically, we propose a lane-merging decision-making strategy for an autonomous vehicle in dense traffic, where the AV essentially does not have enough space to merge into the next lane from the side lane. In such an environment, lane-merging is not encouraged due to the risk of collision. However, in extreme cases where traffic does not ease for a long time, the driver has to wait indefinitely unless aggressive behavior is considered. In addition, in emergency situations such as hospital transport, aggressive behavior is required to a certain degree. Therefore, a lane-merging method for congested traffic can be an option for self-driving cars, especially where traffic congestion happens frequently.

**Figure 1.** (**a**) Lane-merging scenario in the heavy traffic (photo is from Unsplash) and (**b**) reproducing the real-world driving scenario.

Under such heavy traffic conditions, the AV should interact with vehicles in the next lane to change lanes efficiently [25]. For example, if the AV in the side lane waits until enough space becomes available, a decision and control algorithm that is generally unwilling to take a risk (i.e., lane-changing with an insufficient distance) may not merge into the next lane unless safety is ensured. However, in reality, human drivers in the side lane often attempt to influence interacting vehicles to get an opportunity to merge [26]. Notable ways in which drivers interact with others include hand gestures (motions indicating that they are about to change lanes), among others. Obviously, in general traffic, not only the surrounding vehicles but also the AV influence the behaviors of interacting vehicles. Therefore, such interaction modeling is essential to adequately describe real-world traffic.

From the perspective of an AV, it can secure sufficient distance to merge by affecting the behaviors of surrounding vehicles, rather than waiting for heavy traffic to ease. To achieve this goal, the AV should predict the response of other vehicles to its actions, which is very uncertain in reality. From the perspective of the surrounding vehicles in the next lane, they should decide whether to yield to the AV while obeying traffic regulations. For instance, even if the vehicle in the next lane is willing to create a safety gap by decelerating, traffic conditions may not allow it to decelerate, given the relative distance and velocity to the vehicle behind. This is a very common situation in heavy traffic. For instance, the perspective of Car 3 is shown in Figure 1. Car 3 makes its decision based on its relationship to the AV and its relative distance and velocity to Car 4.

In general, the decision-making of drivers is determined by their dispositions, e.g., aggressive or cautious. However, for the limited traffic condition considered here (lane-merging in dense traffic), the decision of a vehicle is mainly governed by the traffic conditions rather than its driving disposition. Generally, the resulting decisions of the vehicle appear in the form of driving intentions. In the merging scenario, these intentions are "yield" or "ignore" (from the perspective of the vehicle in the next lane). For instance, when the AV expresses a lane-merging intention by turning on the lane-changing signal, Car 3 can react by decelerating to express "yield" or by maintaining its speed to express "ignore" (Figure 1). To build reliable and verifiable scenarios, and because human drivers usually consider only the adjacent vehicles, the AV considers only one interacting vehicle.

#### **3. Vehicle Model and Action Space**

In this section, we introduce the model that represents the vehicle dynamics and the decision-making process for all interacting vehicles.

#### *3.1. Vehicle Dynamics*

For simplicity, we consider a point-mass vehicle model with continuous time:

$$\begin{aligned} \dot{x} &= v\_x, \\ \dot{v}\_x &= a\_x, \\ \dot{y} &= v\_y, \end{aligned} \tag{1}$$

and discretize it using the Euler forward method [27]:

$$\begin{aligned} x(t+1) &= x(t) + v\_x(t) \triangle t, \\ v\_x(t+1) &= v\_x(t) + a\_x(t) \triangle t, \\ y(t+1) &= y(t) + v\_y \triangle t, \end{aligned} \tag{2}$$

where the state **x** = [*x*, *v<sub>x</sub>*, *y*] consists of the longitudinal position of the center of gravity, the longitudinal velocity, and the lateral position of the center of gravity. The control **u** = [*a<sub>x</sub>*, *v<sub>y</sub>*] consists of the longitudinal acceleration and the lateral velocity. Finally, Δ*t* and *t* denote the time step size and the discrete time instant, respectively.
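
As a minimal sketch of the discretized model in (2), the update can be written as a single function. The class and function names and the step size used in the example call are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleState:
    x: float    # longitudinal position of the center of gravity [m]
    v_x: float  # longitudinal velocity [m/s]
    y: float    # lateral position of the center of gravity [m]


def euler_step(state: VehicleState, a_x: float, v_y: float, dt: float) -> VehicleState:
    """One forward-Euler update of the point-mass model, following Eq. (2)."""
    return VehicleState(
        x=state.x + state.v_x * dt,   # position advances with the current velocity
        v_x=state.v_x + a_x * dt,     # velocity advances with the commanded acceleration
        y=state.y + v_y * dt,         # lateral position advances with the lateral velocity
    )


# Example call; dt = 0.1 s is an assumed step size.
next_state = euler_step(VehicleState(x=0.0, v_x=2.0, y=0.0), a_x=1.0, v_y=0.5, dt=0.1)
```

Because the forward Euler method uses the velocity at time *t* to advance the position, a change in acceleration affects the position only one step later.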

#### *3.2. Action Set*

According to the Stackelberg game approach [28], the game players are assumed to choose a discrete action and execute it for the entire control cycle. To focus on vehicle interactions rather than dynamics itself, the finite discrete actions are assumed for all game participants (interacting vehicles) as follows:

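1. "Maintain": Maintain the speed.
2. "Accelerate": Accelerate with the constant acceleration.
3. "Decelerate": Decelerate with the constant deceleration.
4. "Lane-merge": Merge into the next lane at the lane-merge velocity.
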
Here, we set the constant acceleration, maximum velocity, and lane-merge velocity to *a<sub>x</sub>* = 0.97 m/s<sup>2</sup>, *v*<sub>max</sub> = 2.5 m/s, and *v<sub>y</sub>* = 2 m/s, respectively. The latter is only available to the ego-car.

Based on the action space, the strategy space for the leader and follower, i.e., the AV and interacting vehicle, is defined as,

$$S = \Gamma\_l \times \Gamma\_f \tag{3}$$

$$\Gamma\_l = \begin{cases} \{L, M\}, & \text{Lateral motions} \\ \{A, M, D\}, & \text{Longitudinal motions} \end{cases} \qquad \Gamma\_f = \{A, M, D\}, \quad \text{Longitudinal motions}$$

where *S* is the strategy space, Γ<sub>*l*</sub> and Γ<sub>*f*</sub> are the action sets of the leader and follower, L and M denote "Lane-merge" and "Maintain," and A, M, and D denote "Accelerate," "Maintain," and "Decelerate" in the longitudinal direction.

Obviously, the lateral motions are only available to the ego-car, and the surrounding vehicles are assumed to move forward in a longitudinal direction. In other words, for each time step, the leader decides whether to change lanes or not, while the follower decides its longitudinal motion based on the circumstances, such as traffic conditions.
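
To make the size of the strategy space in (3) concrete, the following sketch enumerates it as a Cartesian product. Treating the leader's lateral and longitudinal options as one flat set of four labels is an assumption made here for illustration; the variable names are likewise hypothetical.

```python
from itertools import product

# Labels follow Section 3.2: L = "Lane-merge", M = "Maintain",
# A = "Accelerate", D = "Decelerate".
# Assumption: the leader's lateral and longitudinal options are merged into one flat set.
LEADER_ACTIONS = ("L", "A", "M", "D")   # ego-car: lane-merge plus longitudinal motions
FOLLOWER_ACTIONS = ("A", "M", "D")      # interacting vehicle: longitudinal motions only

# S = Gamma_l x Gamma_f, i.e., every (leader action, follower action) pair.
STRATEGY_SPACE = list(product(LEADER_ACTIONS, FOLLOWER_ACTIONS))
```

Under this flat encoding the game has only 12 action pairs per time step, which is consistent with the paper's point that restricting the interaction to a single follower keeps the computation small.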

#### *3.3. Intelligent Driver Model*

As mentioned earlier, in real-world driving, the ego-car reacts to the behaviors of the surrounding vehicles and vice versa. Therefore, to establish reliable and verifiable lane-merging scenarios for the formulated problem, the modeling of these interactions between the vehicles is important, which distinguishes our test environment itself from others where the vehicle motion is not interactive [29].

In game theoretic interaction modeling, it is assumed that all vehicles choose their actions based on the game theory so that all actions are limited by defined strategy space *S*. However, the surrounding vehicles, i.e., vehicles in the next lane, choose their actions based on their own strategies that the ego-car does not know exactly. The latter is reasonable because, generally, the drivers do not know the future behaviors and/or trajectories of others. Instead, they can predict the behavior (velocity and acceleration) of other vehicles from their observations.

To reflect this human-like decision-making process for interacting vehicles, a widely used longitudinal car-following model referred to as the intelligent driver model (IDM) is introduced [30]:

$$\dot{v} = \frac{dv}{dt} = a\_{\text{max}} \left\{ 1 - \left(\frac{v}{v\_0}\right)^{\delta} - \left(\frac{s^\*(v, \triangle v)}{s}\right)^2 \right\},\tag{4}$$

where *v*<sub>0</sub>, Δ*v*, *a*<sub>max</sub>, *δ*, *s*, and *s*<sup>∗</sup> stand for the target speed, velocity difference (approach rate), maximum acceleration, constant acceleration exponent, gap, and desired gap to the front vehicle, respectively, and *v* is the current velocity.

The desired gap *s*<sup>∗</sup> is a function of *v* and Δ*v* and is given by:

$$s^\*(v, \triangle v) = s\_0 + vT + \frac{v \triangle v}{2\sqrt{a\_{\text{max}}b}} \tag{5}$$

where *s*<sub>0</sub> is the minimum gap between the ego and front vehicles, *T* is the safe time headway, and *b* is the desired deceleration that makes a driver feel comfortable.

If there is no car ahead, *s*<sup>∗</sup> is ignored, i.e., *s*<sup>∗</sup> = 0, and thus the IDM becomes a function of *v* and *v*<sub>0</sub> only. All parameter values for the IDM are summarized in Table 1. The preferred time headway in dense traffic is defined based on [31], and the established models aim to describe car-following in heavy traffic.
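
The two IDM equations can be sketched in a few lines. The parameter values below are illustrative placeholders chosen for this example, not the Table 1 values, and the function and class names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IDMParams:
    # Placeholder values for illustration only; they are NOT the Table 1 values.
    v0: float = 2.5      # target speed [m/s]
    a_max: float = 1.0   # maximum acceleration [m/s^2]
    b: float = 1.5       # comfortable deceleration [m/s^2]
    delta: float = 4.0   # acceleration exponent
    s0: float = 2.0      # minimum gap [m]
    T: float = 1.0       # safe time headway [s]


def desired_gap(v: float, dv: float, p: IDMParams) -> float:
    """Desired gap s*(v, dv) of Eq. (5); dv is the approach rate to the front car."""
    return p.s0 + v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b))


def idm_acceleration(v: float, p: IDMParams, gap: Optional[float] = None,
                     dv: float = 0.0) -> float:
    """Acceleration of Eq. (4). With no car ahead (gap is None), s* is set to zero."""
    free_term = 1.0 - (v / p.v0) ** p.delta
    if gap is None:
        return p.a_max * free_term
    return p.a_max * (free_term - (desired_gap(v, dv, p) / gap) ** 2)
```

With these placeholder values, a vehicle already at its target speed on an empty road neither accelerates nor brakes, and a vehicle whose actual gap equals its desired gap brakes gently because the interaction term equals one.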

**Table 1.** Parameter values for the intelligent driver model (IDM).


The conventional IDM is a mathematical model based on physical properties such as the relative distance and speed between vehicles. Thus, the intention of the drivers cannot be described by (4) and (5). To tackle this problem, we impose politeness on the conventional model, and the IDM is modified to react adequately to the surrounding circumstances. In particular, when the ego-car sends a lane-changing signal to an interacting vehicle, the latter chooses its action depending on its specified politeness *p<sub>i</sub>* ∈ [0, 1]. If the *p<sub>i</sub>* of the *i*th interacting vehicle is close to 1, the interacting vehicle is likely to allow the ego-car to change lanes by reducing its speed. Once the ego-car merges into the next lane successfully, the interacting vehicle follows the ego-car based on (4) and (5). Otherwise (*p<sub>i</sub>* is close to 0), the interacting driver ignores the signal from the ego-car and follows the car ahead.

The speed control procedure of the modified IDM is described in Algorithm 1. Here *S<sub>t</sub>* = [{*x*<sup>1</sup>(*t*), *v<sub>x</sub>*<sup>1</sup>(*t*), *y*<sup>1</sup>(*t*)}, ···, {*x<sup>n</sup>*(*t*), *v<sub>x</sub><sup>n</sup>*(*t*), *y<sup>n</sup>*(*t*)}] is the state tuple of the interacting vehicles at time step *t*, and *S*′<sub>*t*</sub> = {*S<sub>t</sub>*, *s*<sub>e</sub>} is the state tuple including the state of the ego-car *s*<sub>e</sub> = (*x*<sup>e</sup>(*t*), *v<sub>x</sub>*<sup>e</sup>(*t*), *y*<sup>e</sup>(*t*)). Moreover, *M* = {*i* | *i* ∈ {1, ···, *n*}}, where *i* is the index of the interacting vehicle and *n* is the number of interacting vehicles, and *P* is the set of the politeness values assigned to the interacting vehicles (*P* = {*p<sub>i</sub>* | *i* ∈ *M*}). Additionally, *S*<sub>flag</sub> ∈ {0, 1} is a flag of the lane-changing signal that the ego-car sends to the interacting vehicle. For example, when the ego-car turns on the lane-changing signal, *S*<sub>flag</sub> = 1; otherwise, *S*<sub>flag</sub> = 0. Finally, *f*<sub>IDM</sub> represents the conventional IDM in (4) and (5).

It is assumed that only one interacting vehicle can see the lane-changing signal from the ego-car. Thus, if *Observe*(*S*flag) is true (i.e., ego-car turns on the lane-changing signal and only one interacting vehicle observes it), the behavior of the interacting vehicle is determined by the assigned politeness. The process of decision making for the interacting vehicle behavior is as follows.

To include the stochastic component of the driver's behavior, we first generate a random number *p*<sub>rand</sub> ∈ [0, 1] and compare it with the assigned politeness of the *i*th vehicle, *p<sub>i</sub>* ∈ [0, 1]. If *p<sub>i</sub>* is larger than *p*<sub>rand</sub>, it is assumed that the interacting vehicle now considers the ego-car as its leader car (Line 6) and takes an action based on *f*<sub>IDM</sub> (Line 7). More specifically, the interacting vehicle is willing to allow the lane-merging of the ego-car. For example, if the interacting vehicle recognizes that the ego-car is too close, it decelerates to keep a desired distance from the ego-car. If *p<sub>i</sub>* is smaller than *p*<sub>rand</sub>, the interacting vehicle ignores the ego-car's lane-merging intention and follows the original front vehicle in the same lane (Line 9). In addition, when *Observe*(*S*<sub>flag</sub>) = 0 (a noninteracting vehicle that cannot see the lane-merging signal from the ego-car), the vehicle speed is controlled based on the conventional IDM (Line 12). This procedure is repeated at every time step to create a reasonable test environment. Following Algorithm 1, politeness is imposed on the conventional car-following model.

**Algorithm 1:** IDM with Politeness

```
 1  Input: S_t, M, P, and s_e
 2  for i ∈ M do
 3      if Observe(S_flag) then
 4          p_rand = rand[0, 1];
 5          if p_i > p_rand then
 6              S'_t ← S_t ∪ s_e;
 7              s^i_{t+1} = f_IDM(s^i_t | s ∈ S'_t);
 8          else
 9              s^i_{t+1} = f_IDM(s^i_t | s ∈ S_t);
10          end
11      else
12          s^i_{t+1} = f_IDM(s^i_t | s ∈ S_t);
13      end
14  end
15  S_{t+1} = {∀ s^i_{t+1} ∈ S_{t+1} | i ∈ M};
16  Output: S_{t+1}
```
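
A runnable sketch of Algorithm 1 is given below under simplifying assumptions: each vehicle is reduced to a (position, speed) pair, the car-following law is passed in as a callable `f_idm`, and `Observe(S_flag)` is supplied as one Boolean per vehicle. All of these names and representations are hypothetical choices for this illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

Vehicle = Tuple[float, float]  # (longitudinal position x, longitudinal speed v_x)


def nearest_ahead(me: Vehicle, others: List[Vehicle]) -> Optional[Vehicle]:
    """Closest vehicle strictly ahead of `me`, or None if the road ahead is free."""
    ahead = [o for o in others if o[0] > me[0]]
    return min(ahead, key=lambda o: o[0]) if ahead else None


def polite_idm_step(
    lane: List[Vehicle],            # S_t: vehicles in the next lane
    ego: Vehicle,                   # s_e: the ego-car
    politeness: List[float],        # P: one p_i per vehicle in `lane`
    observes_signal: List[bool],    # Observe(S_flag), evaluated per vehicle
    f_idm: Callable[[Vehicle, Optional[Vehicle]], Vehicle],
    rng: random.Random,
) -> List[Vehicle]:
    next_lane = []
    for i, me in enumerate(lane):
        candidates = [o for j, o in enumerate(lane) if j != i]
        # Lines 3-6: a vehicle that sees the signal yields when p_i > p_rand,
        # in which case the ego-car joins the set of possible front vehicles.
        if observes_signal[i] and politeness[i] > rng.random():
            candidates.append(ego)
        # Lines 7, 9, 12: conventional IDM against whichever front vehicle applies.
        next_lane.append(f_idm(me, nearest_ahead(me, candidates)))
    return next_lane
```

Passing the random generator explicitly makes a simulation reproducible, and the extreme values `p_i = 1.0` and `p_i = 0.0` give a vehicle that always yields or never yields, respectively.
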
#### **4. Game Theoretic Lane-Merging Strategy**

#### *4.1. Utility Function*

In game theory, the participants are considered as rational decision-makers whose goal is to maximize their utility function (achieve a certain numerical design value). Here we define an appropriate utility function, and the ego-car assumes that the interacting vehicle's behavior aims to maximize the utility.

The objective of the ego-car is to merge into the next lane while maintaining safety. At the same time, the interacting vehicles also try to adjust their speeds to avoid a collision. These objectives for all game participants can be described by the utility function *U* ≤ 0 [32]:

$$
U = w\_1 C + w\_2 V + w\_3 H, \tag{6}
$$

where *w*1, *w*2, and *w*<sup>3</sup> are the non-negative weights for each term depending on its importance, *C*, *V* and *H* denote "Collision," "Velocity," and "Headway" functions defined below.

The collision detection function *C* ∈ {−1, 0} is equal to −1 when the collision occurs. Otherwise, it is set to 0. Additionally, we set the follower's weight, *w*1, as a varying parameter depending on the politeness:

$$w\_1 = \begin{cases} w\_c \times p\_i, & \text{for the follower} \\ w\_c, & \text{for the leader} \end{cases} \tag{7}$$

where *wc* is a constant collision penalty.

Once we introduce *p<sub>i</sub>*, there is room for the follower to choose a less conservative action, even if a collision is expected due to the action of the ego-car. For example, when *p<sub>i</sub>* is close to zero, the follower's behavior is dominated by the functions *V* and *H*. This means that an impolite driver usually prevents the ego-car from merging by choosing aggressive actions, such as "Accelerate" and "Maintain."

The function *V* is the normalized difference between the current and the target speeds:

$$V = -\left\| \frac{v - v\_0}{v\_0} \right\|\_2. \tag{8}$$

If *v* is different from *v*<sub>0</sub>, *V* is always negative.

The function *H* is defined based on the headway:

$$H = \begin{cases} -1, & \text{if headway} \in \text{"close"} \\ 0, & \text{if headway} \in \text{"sufficient"} \end{cases} \tag{9}$$

Comparing the current distance headway, *s*, with the desired distance headway, *s*<sub>0</sub>, allows the vehicle to determine whether the headway is "close" or not. When the headway is "close," the vehicle is likely to decelerate and keep a sufficient distance to the front car.

The utility defined in (6) can be used for all game participants because the common goal of all vehicles is to drive safely by taking the appropriate action at each moment. Once the follower recognizes the leader's lane-merging intention, it tries to maximize its utility based on the specified politeness. For instance, although the leader car takes action (*L*) first, a follower with low politeness prevents the leader from merging by choosing a non-conservative action. For the leader car, the follower's behavior can be predicted by estimating the follower's driving intention in real time, as described in the next subsection. Other parameters such as "comfort" are not considered due to the limitations of the scenario: dense traffic and slow driving [33].
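
The utility terms can be sketched as small functions. The weight values in the usage note and the `close_below` threshold that separates a "close" headway from a "sufficient" one are assumptions for illustration; the text specifies only that the headway is classified by comparing distances.

```python
def collision_weight(w_c: float, role: str, politeness: float = 1.0) -> float:
    """Eq. (7): the follower scales the collision penalty w_c by its politeness p_i."""
    return w_c * politeness if role == "follower" else w_c


def velocity_term(v: float, v0: float) -> float:
    """Eq. (8) for a scalar speed: negative normalized deviation from the target speed."""
    return -abs((v - v0) / v0)


def headway_term(gap: float, close_below: float) -> float:
    """Eq. (9): -1 when the headway is "close", 0 when it is "sufficient"."""
    return -1.0 if gap < close_below else 0.0


def utility(collides: bool, v: float, v0: float, gap: float, close_below: float,
            w1: float, w2: float, w3: float) -> float:
    """Eq. (6): U = w1*C + w2*V + w3*H; every term is at most zero."""
    c = -1.0 if collides else 0.0
    return w1 * c + w2 * velocity_term(v, v0) + w3 * headway_term(gap, close_below)
```

For example, with an assumed penalty `w_c = 10.0`, a follower with politeness 0.1 weighs a collision at only 1.0, so the velocity and headway terms can outweigh it, which is the mechanism behind the "impolite driver" behavior described above.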

#### *4.2. Stackelberg Game*

In the Stackelberg game approach, the game participants are assigned as the leader and the follower. A leader takes action first, and a follower makes a decision after observing the leader's behavior ("first-mover advantage" [34]). We assign the leader role to the ego-car and the follower role to the vehicle in the next lane.

To successfully merge into the next lane, the leader vehicle should predict the reaction of the follower according to the leader's action. We assume that the behavior of the interacting vehicle is based on the basic principle of game theory: *"All game participants make their decisions in such a way as to maximize their own utility U"*. Thus, the future behavior of the follower can be predicted to a certain degree.

For example, from the perspective of the follower, the optimal action *γ*<sup>*f*,∗</sup> should maximize its utility *U<sup>f</sup>*(*γ<sup>l</sup>*, *γ<sup>f</sup>*). Note that the utility is influenced by the leader's action *γ<sup>l</sup>* as well as the follower's action *γ<sup>f</sup>*, as is the case in reality. From the perspective of the leader, it can predict the follower's behavior based on *U<sup>f</sup>*(*γ<sup>l</sup>*, *γ<sup>f</sup>*) while maximizing its own utility *U<sup>l</sup>*(*γ<sup>l</sup>*, *γ<sup>f</sup>*). Since the follower's optimal action *γ*<sup>*f*,∗</sup> may not be unique in some cases, the leader assumes the worst-case scenario, i.e., that the follower takes the action that is worst for the maximization of *U<sup>l</sup>*(*γ<sup>l</sup>*, *γ<sup>f</sup>*).

The optimal action for the leader, *γ*<sup>*l*,∗</sup>, referred to as the Stackelberg equilibrium [6], is given by:

$$\gamma^{l,\*} \in \operatorname\*{argmax}\_{\gamma^l \in \Gamma\_l} \left( \min\_{\gamma^f \in S^f(\gamma^l)} U^l(\gamma^l, \gamma^f) \right),\tag{10}$$

$$S^f(\gamma^l) \stackrel{\text{def}}{=} \left\{ \zeta \in \Gamma\_f : U^f(\gamma^l, \zeta) \ge U^f(\gamma^l, \gamma^f), \forall \gamma^f \in \Gamma\_f \right\},\tag{11}$$

where *ζ* ∈ Γ<sub>*f*</sub> is an optimal action of the follower that maximizes *U<sup>f</sup>* after observing the leader's action *γ<sup>l</sup>*, *S<sup>f</sup>*(*γ<sup>l</sup>*) is the strategy space for the given leader's action and the follower's optimal action *ζ*, and *γ*<sup>*l*,∗</sup> is the optimal action of the leader based on the maximin strategy.

For a better understanding of the above-described algorithm, we give a simple example of the Stackelberg game in Figure 2. In the overtaking case, the leader car in the left lane wants to go back to the right lane, and the follower car in the right lane wants to drive faster. The possible decisions of the leader and follower are limited to the discrete actions in Section 3.2 and are denoted by their initial letters, as in (3). To simplify, Γ<sub>*l*</sub> and Γ<sub>*f*</sub> are given as {*A*, *L*, *D*} and {*A*, *M*, *D*}, respectively, so that the tree diagram has nine leaves. The numbers at the leaves are the payoffs of the leader and follower for each action combination in *S*. The leader can predict the decision of the follower based on the game theory approach.

If the leader chooses action *L*, the follower maximizes its own utility with *S<sup>f</sup>*(*γ<sup>l</sup>*) = {*M*}. Therefore, the leader can predict that *U<sup>l</sup>* will be 0.7 when choosing action *L*. Likewise, the utilities of actions *A* and *D* can be predicted once the leader predicts the follower's action based on *S<sup>f</sup>*(*γ<sup>l</sup>*). However, when the leader chooses action *D*, the decision that maximizes the follower's utility is not unique, i.e., *S<sup>f</sup>*(*γ<sup>l</sup>*) = {*A*, *M*}. To avoid the risk of the worst case, the leader assumes that the follower chooses action *A*, which is the worst for the leader's utility. The expected utility of the leader's decision is 0.6 with action *A*, 0.7 with action *L*, and 0.8 with action *D*. In this manner, we find that the optimal action *γ*<sup>*l*,∗</sup> that maximizes the utility is action *D*. This means that the optimal action of the leader is to decelerate so as to allow the follower to go ahead.
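
The reasoning in this example can be reproduced with a short sketch of the maximin rule and the best-response set in (11). The payoff table below is hypothetical: only the three worst-case leader utilities (0.6, 0.7, and 0.8) and the best-response sets {*M*} and {*A*, *M*} come from the text, and the remaining numbers are made up so that the table is consistent with them.

```python
from typing import Dict, List, Tuple

# (leader action, follower action) -> (leader utility, follower utility)
Payoffs = Dict[Tuple[str, str], Tuple[float, float]]

# Hypothetical payoffs, chosen only to match the outcomes stated in the text.
PAYOFFS: Payoffs = {
    ("A", "A"): (0.2, 0.3), ("A", "M"): (0.6, 0.5), ("A", "D"): (0.9, 0.1),
    ("L", "A"): (0.1, 0.2), ("L", "M"): (0.7, 0.6), ("L", "D"): (1.0, 0.4),
    ("D", "A"): (0.8, 0.9), ("D", "M"): (0.9, 0.9), ("D", "D"): (0.5, 0.3),
}


def best_responses(payoffs: Payoffs, leader_action: str,
                   follower_actions: List[str]) -> List[str]:
    """S^f(gamma^l) of Eq. (11): all follower actions attaining the maximum of U^f."""
    best = max(payoffs[(leader_action, f)][1] for f in follower_actions)
    # Exact equality is fine here because ties are entered as identical literals.
    return [f for f in follower_actions if payoffs[(leader_action, f)][1] == best]


def stackelberg_action(payoffs: Payoffs, leader_actions: List[str],
                       follower_actions: List[str]) -> Tuple[str, float]:
    """Maximin rule: maximize the leader's worst-case utility over S^f(gamma^l)."""
    def worst_case(l: str) -> float:
        return min(payoffs[(l, f)][0] for f in best_responses(payoffs, l, follower_actions))
    best = max(leader_actions, key=worst_case)
    return best, worst_case(best)


action, value = stackelberg_action(PAYOFFS, ["A", "L", "D"], ["A", "M", "D"])
```

With this table, `action` is `"D"` and `value` is `0.8`, which matches the conclusion of the example. With real utilities computed from floating-point arithmetic, the tie test in `best_responses` would need a tolerance.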

**Figure 2.** Simple example of the Stackelberg game in an overtaking scenario.

#### *4.3. Real Time Politeness Estimation*

As defined earlier, politeness is the numerical value representing the intention of the interacting vehicle to yield to the ego-car. When the politeness is low, the interacting vehicle is likely to ignore the lane-changing signal from the leader car and follow the front car based on the IDM. If the follower intends to yield to the ego-car, i.e., its politeness is high, the follower considers the ego-car as its target vehicle to follow. Therefore, for safe lane-changing, the leader car should estimate the interacting vehicle's driving intention (i.e., politeness).

The politeness can be estimated by observing the current interacting vehicle's acceleration and is given by:

$$P(t+1) \leftarrow \frac{P(t) + \alpha}{1 + \beta} \tag{12}$$

where *P*(*t*) ∈ [0, 1] is the estimated politeness at time step *t*, *α* ∈ [0, *β*] and *β* ∈ (0, ∞) are tunable weighting parameters that determine the update rate, and the initial politeness *P*(0) is set to a relatively low value.

For example, when acceleration of the interacting vehicle is observed (i.e., *a<sup>f</sup>*(*t*) ≥ 0 and *v<sup>f</sup>*(*t*) ≠ 0), the politeness should be decreased. That is, if the interacting vehicle accelerates in dense traffic, where there is no sufficient gap between the vehicles, we assume that its driver is aggressive. In this case, we set *α* = 0. In contrast, if deceleration of the interacting vehicle (i.e., *a<sup>f</sup>*(*t*) < 0 or *v<sup>f</sup>*(*t*) = 0) is observed, the politeness increases until it reaches 1 by setting *α* equal to *β*. The ego-car estimates the follower's intention in real time by comparing *P*(*t*) with the thresholds *P*<sub>th</sub> = [*P<sup>l</sup>*<sub>th</sub>, *P<sup>u</sup>*<sub>th</sub>], where *P<sup>l</sup>*<sub>th</sub> = 0.2 and *P<sup>u</sup>*<sub>th</sub> = 0.8. For instance, if *P*(*t*) is less than *P<sup>l</sup>*<sub>th</sub>, the ego-car determines the intention of the follower to be "ignore," and if *P*(*t*) exceeds *P<sup>u</sup>*<sub>th</sub>, it determines the intention to be "yield."
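
The update in (12) and the threshold test can be sketched as follows. The value of *β* used in the usage note and the label "undecided" for estimates between the two thresholds are assumptions introduced for this example.

```python
def update_politeness(p: float, a_f: float, v_f: float, beta: float) -> float:
    """One step of Eq. (12), with alpha chosen from the observed follower motion."""
    # Deceleration or standstill is read as yielding (alpha = beta);
    # any other motion is read as ignoring (alpha = 0).
    alpha = beta if (a_f < 0.0 or v_f == 0.0) else 0.0
    return (p + alpha) / (1.0 + beta)


def classify_intention(p: float, p_low: float = 0.2, p_high: float = 0.8) -> str:
    """Compare the estimate with the lower and upper thresholds (0.2 and 0.8)."""
    if p < p_low:
        return "ignore"
    if p > p_high:
        return "yield"
    return "undecided"  # hypothetical label for the band between the thresholds
```

With *α* = *β* the value *P* = 1 is a fixed point of the update, and with *α* = 0 the estimate shrinks by the factor 1/(1 + *β*) at every step, so a larger *β* makes the estimate react faster in both directions. For instance, starting from an assumed *P*(0) = 0.3 with *β* = 0.5, four consecutive deceleration observations raise the estimate above 0.8.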

#### *4.4. Game Process*

The process of the merging strategy is shown in Figure 3. As the game starts, the leader and the follower are assigned based on their states. Specifically, the ego-car is always the leader, and the vehicle behind the ego-car in the next lane is considered the follower (the Stackelberg game settings). We assume that the ego-car has already reached the end of the side lane, so that lane merging is needed as soon as possible. After the target vehicle (follower) that the ego-car should interact with is selected, the ego-car estimates the politeness of the target vehicle by observing its acceleration and follows the optimal action based on the Stackelberg equilibrium at every time step.

**Figure 3.** Game process.

If the estimated politeness of the target vehicle is high enough (*P*(*t*) > *P*<sup>*u*</sup><sub>th</sub>), it is considered as a "Yield" intention and the optimal action of the ego-car (*γ*<sup>*l*,∗</sup>) is "Lane Change." In this case, based on the Stackelberg game approach, the ego-car tries to change lanes. By contrast, even though *P*(*t*) < *P*<sup>*u*</sup><sub>th</sub> is given, if *γ*<sup>*l*,∗</sup> is determined as "Maintain," the ego-car waits for the next time step and repeats the same procedure from the "Politeness Estimation" step (Figure 3). In addition, when the target vehicle ignores the signal from the ego-car for some reason, the ego-car ends the interaction with that target vehicle and starts the game again with another target vehicle. If the ego-car fails to merge after interacting with all the vehicles in the next lane, it can merge only once the traffic in the next lane eases, rather than through the game-theoretic approach.
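The game process described above can be sketched as a loop over candidate followers. This is an illustrative outline under stated assumptions, not the authors' code: `run_merge_game`, `observe`, and `leader_action` are hypothetical names, `leader_action` stands in for the Stackelberg equilibrium computation (which is not reproduced here), and restarting the estimate at `p0` for each new target as well as the values of `p0` and `beta` are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

Observation = Tuple[float, float]  # (acceleration, speed) of the follower


def run_merge_game(
    targets: Sequence[str],
    observe: Callable[[str, int], Observation],
    leader_action: Callable[[str, float], str],
    p0: float = 0.5,       # assumed initial politeness for every target
    beta: float = 0.1,     # assumed update-rate parameter
    p_low: float = 0.2,
    p_high: float = 0.8,
    max_steps: int = 100,
) -> Optional[str]:
    """Return the follower the ego-car merges in front of, or None."""
    for target in targets:                  # one game per candidate follower
        p = p0                              # restart the estimate (assumption)
        for step in range(max_steps):
            accel, speed = observe(target, step)
            alpha = beta if (accel < 0 or speed == 0) else 0.0
            p = (p + alpha) / (1.0 + beta)  # politeness update, Eq. (12)
            if p < p_low:
                break                       # "ignore": switch to the next target
            if p > p_high and leader_action(target, p) == "lane_change":
                return target               # "yield" and the leader chooses to merge
            # otherwise "maintain": wait for the next time step
    return None                             # every game failed; wait for traffic to ease


if __name__ == "__main__":
    # Hypothetical followers: car3 keeps accelerating, car4 keeps decelerating.
    obs = lambda car, step: (0.3, 2.0) if car == "car3" else (-0.3, 2.0)
    print(run_merge_game(["car3", "car4"], obs, lambda car, p: "lane_change"))
```

Passing the leader's decision in as a callback keeps the target-switching logic separate from the utility model, so the same loop can be exercised with any stand-in for the equilibrium computation.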

#### **5. Case Studies**

To verify the effectiveness of the proposed approach, we conduct case studies for various test scenarios and compare our approach with conventional rule-based decision making. The simulations are performed in MATLAB R2020a on a desktop computer (Intel i5-9500 CPU, 16 GB RAM, Windows 10), and no computational difficulty is found. Figure 4 and Table 2 give the initial vehicle states. There are four vehicles in the next lane. Car 3 and Car 4 are the interacting vehicles, and all the vehicles move slowly due to the dense traffic. Therefore, the space for the ego-car to merge into the next lane is not sufficient. In addition, we assume that all vehicles can observe the state of other vehicles, i.e., the state tuple *S*<sup>*t*</sup> is available to all vehicles. Although perception is one of the major components of autonomous driving technology, we exclude it from the scope of our study and focus on the decision and control parts.

**Figure 4.** Initial condition for the case studies.


**Table 2.** Initial conditions for simulations.

#### *5.1. Test Environment Setup*

We consider three scenarios by assigning different politeness values to the four vehicles in the next lane. For a fair comparison, the same initial conditions are used for all scenarios. The differently assigned politeness values mean that the interacting vehicles can interpret the same traffic condition differently. For example, even under the same condition, the interacting vehicles react differently to the action of the ego-car, as is the case in real traffic.

It is worth noting that instead of setting the extreme values (0 and 1), we consider the politeness of 0.1 or 0.9 for all vehicles. It gives room to act against unexpected situations like jaywalking. For example, even though the politeness of the interacting vehicle is 0.1, the vehicle has a 10% chance to behave cautiously in an emergency situation.

#### *5.2. Rule-Based Lane Merging*

In this subsection, the rule-based lane-merging approach is introduced, and the decision-making performance is compared to that of our approach in the next subsection. Rule-based lane-merging is a quite conservative decision-making strategy since it prefers to obey the traffic rules rather than interacting with the vehicles. For example, only the physical properties such as the relative distance and velocity between the vehicles are considered when making the decision [16]. From the perspective of the ego-car using the rule-based lane-merging approach, the surrounding vehicles in the next lane are considered as the moving obstacles, and the gap between the obstacles seems too tight to attempt a cut-in.

#### *5.3. Case Studies*

The results of the case studies for different scenarios are shown in Figures 5–7, where the snapshots are visualized for every 5 s during the entire simulation time (15 s). In Scenario 1, since the politeness of Car 3 is relatively high, we reckon that Car 3 is likely to allow the lane-changing of the ego-car as shown in Figure 5. As expected, starting from the initial traffic state in Figure 4, the ego-car successfully merges into the next lane around *t* = 5 s. In contrast, Car 3 in Scenario 2 (Figure 6) ignores the ego-car's lane-merging signal due to its low politeness. Instead, the ego-car interacts with Car 4, whose politeness is high, and the ego-car can change lanes around *t* = 10 s as illustrated in Figure 6.

**Figure 5.** Scenario 1: *p*<sub>*i*</sub> = {0.9, 0.1, 0.9, 0.9}.

**Figure 6.** Scenario 2: *p*<sub>*i*</sub> = {0.1, 0.9, 0.1, 0.9}.

In Scenario 3 (Figure 7), the ego-car fails to merge into the next lane because all interacting vehicles (Car 3 and Car 4) have aggressive drivers. In this case, lane merging is only possible once the traffic condition is relaxed (*t* = 15 s). A production vehicle may not want to take a risk when it faces such conflicts. Thus, a production AV in the side lane is most likely to behave like the ego-car in Figure 7, which is not very efficient.

Next, we estimate politeness values for all interacting vehicles using (12). The corresponding plots are shown in Figure 8. We make a neutral guess by assigning the initial politeness *P*(0) = 0.5, which is tunable. As can be seen in Figure 8a, the ego-car changes lanes successfully when the estimated politeness is larger than the upper threshold, i.e., *P*(*t*) > *P*<sup>*u*</sup><sub>th</sub>, and the optimal action *γ*<sup>*l*,∗</sup> is calculated as "Lane change." Car 4's politeness is not estimated, since the ego-car does not interact with it after completing the mission (i.e., lane merging).
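As a rough guide to how quickly the estimate from (12) crosses a threshold, suppose the same value of *α* is applied at every step. This is an idealization, since in general *α* can switch between steps. Under that assumption, the recursion has a closed form:

$$P(t) = \frac{P(0)}{(1+\beta)^{t}} \quad (\alpha = 0), \qquad 1 - P(t) = \frac{1 - P(0)}{(1+\beta)^{t}} \quad (\alpha = \beta)$$

With *P*(0) = 0.5 and the thresholds 0.2 and 0.8, both crossings require (1 + *β*)<sup>*t*</sup> > 2.5, i.e., *t* > ln 2.5 / ln(1 + *β*). For an illustrative *β* = 0.1 (an assumed value, not one reported here), this gives *t* ≥ 10 steps, so a neutral initial guess makes the "yield" and "ignore" decisions equally fast.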

**Figure 7.** Scenario 3: *p*<sub>*i*</sub> = {0.9, 0.1, 0.1, 0.1}.

**Figure 8.** Estimated politeness of target vehicles for each scenario.

Figure 8b shows the estimated politeness for Car 3 and Car 4. For the first few steps, the ego-car interacts with Car 3, and the estimated politeness of Car 3 decreases as the steps progress. Once the *P*(*t*) of Car 3 reaches the lower bound *P*<sup>*l*</sup><sub>th</sub>, the ego-car gives up lane-merging attempts and changes the target vehicle. Around 8 s, the target vehicle is changed from Car 3 to Car 4, and the ego-car repeats the same procedure. Similar to Figure 8a, the estimated politeness of Car 4 increases, and the ego-car attempts to change lanes when the predicted action of the follower is "Yield," i.e., *P*(*t*) > *P*<sup>*u*</sup><sub>th</sub>. The vertical dotted line indicates the moment when the target vehicle is changed.

The estimated politeness for Scenario 3 is illustrated in Figure 8c. Similar to the interaction with Car 3 in Figure 8b, both estimated politeness values decrease as the ego-car interacts with Car 3 and Car 4. Therefore, the ego-car fails to change lanes (Figure 7). Thus, we confirm that the estimated politeness values correspond to those from Table 3, even though the ego-car does not know the decision-making strategy of the interacting vehicle exactly, which is in fact based on the modified IDM.

For a fair comparison, the rule-based lane-merging strategy that is only based on the relative distance is implemented in our scenarios with the same initial condition. In this case, the vehicles in the next lane always ignore the lane-changing signals from the ego-car and move based on the conventional IDM. The snapshot of this case is omitted since it does not differ from that of Scenario 3 in Figure 7 regardless of the interacting vehicle's politeness. Instead, as shown in Figure 9, we illustrate the relative distance between the ego-car and vehicle in the next lane.


**Table 3.** Assigned politeness.

Using the rule-based approach, the ego-car calculates the distance to each interacting car and compares it with the threshold (dashed line). In Figure 9, *x*<sup>ego</sup><sub>*t*</sub> − *x*<sup>*i*+1</sup><sub>*t*+1</sub> is the distance between the ego-car and the vehicle behind the ego-car in the next lane (i.e., the (*i* + 1)th vehicle). As the (*i* + 1)th vehicle approaches the ego-car, the relative distance decreases until the (*i* + 1)th vehicle passes the ego-car. Thus, the ego-car calculates the predicted relative distance one step ahead. In contrast, the relative distance between the ego-car and the vehicle in front of the ego-car in the next lane (the *i*th vehicle), described by *x*<sup>*i*</sup><sub>*t*</sub> − *x*<sup>ego</sup><sub>*t*</sub>, increases since the vehicle ahead is moving away.

**Figure 9.** Calculated distance for rule-based approach.

Both *x*<sup>ego</sup><sub>*t*</sub> − *x*<sup>*i*+1</sup><sub>*t*+1</sub> and *x*<sup>*i*</sup><sub>*t*</sub> − *x*<sup>ego</sup><sub>*t*</sub> change abruptly at the 8th time step. This corresponds to the (*i* + 1)th vehicle approaching and passing the ego-car at this moment. After this step, the (*i* + 1)th vehicle becomes the *i*th vehicle, since the ego-car fails to change lanes. At the same time, the ego-car takes the (*i* + 2)th vehicle as its new interacting vehicle (the new (*i* + 1)th vehicle).

We assume a safety threshold of 7 m: the vehicle length is 5 m, so the safe gap between the *i*th vehicle and the (*i* + 1)th vehicle is set to 2 m. The ego-car initiates lane merging when both relative distances exceed the safety threshold. In this dense traffic, unlike with the game-theoretic decision-making strategy, there is no chance to merge into the next lane. That is, the ego-car is tuned to behave cautiously so that risky decision making is avoided, which is common in production vehicles.
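The rule-based check described above can be sketched as follows. This is an illustrative version only: the function name, the constant-velocity one-step prediction of the rear vehicle, and the step size `dt` are assumptions introduced here, while the 7 m threshold is the value stated above.

```python
def rule_based_merge_ok(
    x_ego: float,
    x_front: float,
    x_rear: float,
    v_rear: float,
    dt: float = 1.0,          # assumed prediction step in seconds
    threshold: float = 7.0,   # safety threshold in metres
) -> bool:
    """Return True when both relative distances exceed the safety threshold.

    Positions are longitudinal coordinates in metres; x_front and x_rear
    belong to the vehicles ahead of and behind the ego-car in the next lane.
    """
    # Predict the rear vehicle one step ahead (constant-velocity assumption).
    x_rear_pred = x_rear + v_rear * dt
    gap_front = x_front - x_ego       # distance to the i-th vehicle
    gap_rear = x_ego - x_rear_pred    # distance to the (i+1)-th vehicle
    return gap_front > threshold and gap_rear > threshold


if __name__ == "__main__":
    # Rear car 8 m behind and closing at 2 m/s: predicted gap is only 6 m.
    print(rule_based_merge_ok(x_ego=0.0, x_front=10.0, x_rear=-8.0, v_rear=2.0))
```

Because the check uses only distances, it returns the same result whatever the politeness of the surrounding drivers, which is why the rule-based ego-car keeps waiting in dense traffic.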

Based on the results of the case studies, we confirm that the Stackelberg game approach in Figures 5 and 6 is much closer to human decision making in dense traffic than the rule-based approach (see Figure 7). Since an AV that is too cautious in its decision making may not be preferred by its occupants, human-like decision making should be considered. AVs will share the road with human-driven vehicles until the penetration rate of AVs reaches 100%. Therefore, including vehicle interactions in the decision-making algorithm is quite promising. Moreover, the approach can be extended to other driving situations with minor modifications.

#### **6. Conclusions**

This paper presented a lane-merging strategy for a self-driving car in dense traffic using the Stackelberg game approach, which takes into account the driving intention of the surrounding vehicles. By monitoring the speed variations of the interacting vehicle, the self-driving car can estimate its politeness, which represents its driving intention. Based on Stackelberg game theory, the decision of the self-driving car is made so as to maximize a utility function that is affected by both the self-driving car and the interacting vehicle. Furthermore, to describe the reasonable behavior of the human driver, we presented a modified car-following model that responds to the self-driving car's action. The proposed method was verified through case studies in various driving conditions. Compared to the rule-based lane-merging strategy, the decision made by our approach is much closer to that of a human driver in real-world driving. To extend the proposed method to different driving scenarios, future work will include the generalization of the proposed logic to other situations where the self-driving car frequently interacts with vehicles (e.g., intersection, take over, cut-in, etc.). In addition, rather than discrete actions, continuous actions will be considered to capture vehicle dynamics more accurately.

**Author Contributions:** K.J., K.H., M.O.; writing—original draft, K.J., K.H. and M.O; writing—Review and editing, K.J., K.H. and M.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2021R1C1C1003464 and NRF-2020K1A3A1A39112277).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

