1. Introduction
More than a century before John Nash formalized the concept of equilibrium in game theory [1,2,3], Antoine Cournot [4] had already introduced a similar idea through his duopoly model, which became a cornerstone in the study of industrial organization [5]. In economics, an oligopoly refers to a market structure in which a small number of firms ($n$) supply a particular product. A duopoly, the specific case where $n = 2$, is the scenario to which Cournot's model applies. In this model, two firms simultaneously produce and sell a homogeneous product. Cournot identified an equilibrium quantity for each firm, where the optimal strategy for each participant is to follow a specific rule if the other firm adheres to it. This idea of equilibrium in a duopoly anticipated Nash's more general concept of equilibrium points in non-cooperative games.
In 1934, Heinrich von Stackelberg [6,7] introduced a dynamic extension to Cournot's model by allowing for sequential moves rather than simultaneous ones. In the Stackelberg model, one firm, the leader, moves first, while the second, the follower, reacts accordingly. A well-known example of such strategic behavior is General Motors' leadership in the early U.S. automobile industry, with Ford and Chrysler often acting as followers.
The Stackelberg equilibrium, derived through backward induction, represents the optimal outcome in these sequential-move games. This equilibrium is often considered more robust than Nash equilibrium (NE) in such settings, as sequential games can feature multiple NEs, but only one corresponds to the backward-induction outcome [1,2,3].
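To make the contrast concrete, consider the standard linear-demand example (a textbook illustration, not drawn from Cournot's or Stackelberg's original treatments): two firms produce quantities $q_1$ and $q_2$ at constant marginal cost $c$, facing the inverse demand $p = a - b(q_1 + q_2)$ with $a > c$. Under simultaneous (Cournot) play, each firm's best response is $q_i = (a - c - b q_{-i})/(2b)$, giving the equilibrium quantities

$q_1^* = q_2^* = \dfrac{a - c}{3b}.$

In the sequential (Stackelberg) version, the leader substitutes the follower's reaction $q_2(q_1) = (a - c - b q_1)/(2b)$ into its own profit $(p - c)q_1$ and maximizes, obtaining

$q_1^* = \dfrac{a - c}{2b}, \qquad q_2^* = \dfrac{a - c}{4b},$

the backward-induction outcome: the leader commits to a larger quantity than in the Cournot case and earns a higher profit.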
2. Related Work
Stackelberg games have been significantly influential in security and military research applications [8,9,10,11,12,13,14,15,16,17,18,19]. These games, based on the Stackelberg competition model, have been successfully applied in a wide range of real-world scenarios. They are particularly notable for their deployment in contexts where security decisions are critical, such as in protecting infrastructure and managing military operations.
The sequential setup of Stackelberg games is particularly relevant in military contexts where strategic decisions often involve anticipating and responding to an adversary’s actions. The applications of such games in military settings are diverse, ranging from optimizing resource allocation for defence to strategizing offensive maneuvers.
This paper considers the strategic interplay between drones and tanks through the lens of the Stackelberg equilibrium and the principles of backwards induction. In a military operation setting, we consider two types of agents, namely the attacker (Red team) and the defender (Blue team). The attacker may deploy mobile threats to destroy or reduce the number of static, immovable assets belonging to the defender. In response, the defender employs countermeasures to reduce the number of enemy attackers. This complex pattern of strategic moves and countermoves is explored as a sequential game, drawing on the concept of Stackelberg competition to illuminate the dynamics at play.
While focusing on developing a game-theoretical analysis, we present a hypothetical strategic scenario involving tanks and drones to illustrate our point. Naturally, this scenario may only loosely reflect the realities of such encounters, which evolve rapidly and are subject to constant change.
This paper's contribution consists of obtaining an analytical solution to a Stackelberg competition in a military setting. To obtain such a solution, we limit the number of available strategic moves to a small number, yet one still sufficient to demonstrate the dynamics of a sequential strategic military operation.
3. The Game Definition
We consider a scenario in which two teams, Blue (B) and Red (R), are engaged in a military operation. The B team comprises ground units, specifically tanks, while the R team operates aerial units, namely drones. The R team's strategy involves its drones targeting the B team's tanks. Meanwhile, the B team not only has the capability to shoot down these drones but also provides defensive cover for its tanks, creating a complex interplay of offensive and defensive maneuvers in this combat scenario.
We assume that the B team consists of $n$ tanks and represent the set of tanks as $T = \{t_1, t_2, \dots, t_n\}$. Let $S$ be a set of resources that are at the disposal of the B team to protect the tanks. It is assumed that the R team's pure strategy is to attack one of the tanks from the set $T$. The R team's mixed strategy is then defined as a vector $x = (x_1, x_2, \dots, x_n)$, where $x_j$ is the probability of attacking the tank $t_j$ and $\sum_{j=1}^{n} x_j = 1$. The B team's mixed strategy is also a vector $c = (c_1, c_2, \dots, c_n)$, where $c_j$ is the marginal probability of protecting the tank $t_j$. Note that a marginal probability is obtained by summing (or integrating) over the distribution of the variables that are being disregarded, and these disregarded variables are said to have been marginalized out.
3.1. Marginal Probability of Protection for Tanks
We consider the case when there are five resources from the set $S = \{s_1, s_2, s_3, s_4, s_5\}$ available to protect the four tanks from $T = \{t_1, t_2, t_3, t_4\}$, whereas one or more of the resources can be used to protect a tank from the set $T$. For this case, the marginal probabilities $c_j$ to protect the tank $t_j$ are determined by marginalizing over the resource assignments,
where the assignment probabilities are denoted $c_{ij}$. For instance, $c_{ij}$ is the probability that the resource $s_i$ is used to give protection to the tank $t_j$. This means $0 \leq c_{ij} \leq 1$ and, for each resource $s_i$, $\sum_{j=1}^{4} c_{ij} = 1$.
In case the set of resources consists of $m$ elements, i.e., $S = \{s_1, s_2, \dots, s_m\}$, and the number of tanks to be protected is $n$, we then have the marginal probabilities $c_j$ to protect the $j$-th tank obtained in the same way, with the marginalization running over all $m$ resources.
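Since the marginalization step is easy to get wrong, here is a minimal numerical sketch. The assignment model (each resource independently guards exactly one tank, with at-least-one-guard defining protection) is our illustrative assumption, not the paper's specification:

```python
import numpy as np

# Minimal sketch, assuming each of the m resources independently guards
# exactly one of the n tanks; q[i, j] = probability that resource s_i
# guards tank t_j, so each row of q sums to 1. This assignment model is
# an illustrative assumption, not the paper's own specification.

def marginal_protection(q: np.ndarray) -> np.ndarray:
    """Marginal probability, per tank, that at least one resource guards it."""
    # Under independence, P(tank t_j unguarded) = prod_i (1 - q[i, j]).
    return 1.0 - np.prod(1.0 - q, axis=0)

# Example: m = 5 resources, n = 4 tanks, each resource uniform over tanks.
q = np.full((5, 4), 0.25)
print(marginal_protection(q))  # symmetric marginals, one value per tank
```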
3.2. Defining the Reward Functions
Let $B_j$ be the reward to the B team if the attacked tank $t_j$ is protected using resources from the set $S$, $\bar{B}_j$ be the cost to the B team if the attacked tank $t_j$ is unprotected, $R_j$ be the reward to the R team if the attacked tank $t_j$ is unprotected, and $\bar{R}_j$ be the cost to the R team if the attacked tank $t_j$ is protected. Note that $c_j$ is the marginal probability of protecting the tank $t_j$ using the resources from the set $S$.
The quantity $c_j B_j - (1 - c_j)\bar{B}_j$ then describes the payoff to the B team when the tank $t_j$ is attacked. Similarly, the quantity $(1 - c_j)R_j - c_j\bar{R}_j$ describes the payoff to the R team when the tank $t_j$ is attacked. However, the probability that the tank $t_j$ is attacked is $x_j$, and we can take this into consideration to define the quantities $x_j\{c_j B_j - (1 - c_j)\bar{B}_j\}$ and $x_j\{(1 - c_j)R_j - c_j\bar{R}_j\}$. These are the contributions to the payoffs of the B and R teams, respectively, when the tank $t_j$ is attacked with the probability $x_j$.
As the vector $x = (x_1, x_2, \dots, x_n)$ describes the R team's (mixed) attacking strategy whereas the vector $c = (c_1, c_2, \dots, c_n)$ describes the B team's (mixed) protection strategy, the players' strategy profiles are given as $(c, x)$. For a set of tanks $T$, the expected payoffs [16,17] to the B and R teams, respectively, can then be written as

$P_B(c, x) = \sum_{j=1}^{n} x_j \{c_j B_j - (1 - c_j)\bar{B}_j\}, \qquad P_R(c, x) = \sum_{j=1}^{n} x_j \{(1 - c_j)R_j - c_j\bar{R}_j\}.$
We note from these payoffs that if the attack probability for a tank is zero, the rewards to both the B and R teams for that tank are also zero; the payoff functions for either team depend only on the attacked tanks; and if the B and R teams move simultaneously, the solution is a Nash equilibrium.
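A small numerical sketch of these expected payoffs follows, using the payoff form written above; the reward and cost values are illustrative placeholders, not the paper's table:

```python
import numpy as np

# Expected payoffs P_B and P_R for given mixed strategies, following the
# payoff form above. All numeric values are illustrative placeholders.

def payoffs(x, c, B, B_bar, R, R_bar):
    """x: attack probabilities (sum to 1); c: marginal protection probabilities."""
    x, c = np.asarray(x), np.asarray(c)
    P_B = float(np.sum(x * (c * B - (1 - c) * B_bar)))
    P_R = float(np.sum(x * ((1 - c) * R - c * R_bar)))
    return P_B, P_R

B     = np.array([4.0, 3.0, 2.0, 1.0])  # reward to B if attacked tank is protected
B_bar = np.array([2.0, 2.0, 2.0, 2.0])  # cost to B if attacked tank is unprotected
R     = np.array([5.0, 4.0, 3.0, 2.0])  # reward to R if attacked tank is unprotected
R_bar = np.array([1.0, 1.0, 1.0, 1.0])  # cost to R if attacked tank is protected

print(payoffs([0.25, 0.25, 0.25, 0.25], [0.3, 0.3, 0.2, 0.2], B, B_bar, R, R_bar))
```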
Note that, with reference to the reward functions defined in Equation (3), for the imagined strategy profile (not an equilibrium) where the defender protects the first tank by all elements from the set $S$, i.e., $c_{i1} = 1$ for all $s_i \in S$, so that $c_1 = 1$ and $c_2 = c_3 = c_4 = 0$, we obtain

$P_B = x_1 B_1 - \sum_{j \neq 1} x_j \bar{B}_j \quad \text{and} \quad P_R = -x_1 \bar{R}_1 + \sum_{j \neq 1} x_j R_j.$

If the attacker decides to attack the first tank, i.e., $x_1 = 1$, we obtain $P_B = B_1$ and $P_R = -\bar{R}_1$.
4. Leader-Follower Interaction and Stackelberg Equilibrium
We consider a three-step strategic game between the B and R teams, also called the leader-follower interaction. As the leader, the B team chooses an action consisting of a protection strategy $c$. The R team observes $c$ and then chooses an action consisting of its attack strategy given by the vector $x$. Knowing the rational response of the R team, the B team takes this into account and, as the leader, optimizes its own action. The payoffs to the two teams are $P_B(c, x)$ and $P_R(c, x)$.
This game is an example of the dynamic games of complete and perfect information [2]. Key features of this game are (a) the moves occur in sequence, (b) all previous moves are known before the next move is chosen, and (c) the players' payoffs are common knowledge. This framework allows for strategic decision-making based on the actions and expected reactions of the other players, typical of Stackelberg competition scenarios. In many real-world scenarios, especially in complex military environments, the assumption that players' payoffs are common knowledge does not hold, and complete information about the payoffs of other players is rarely available.
Given the action $c$ previously chosen by the B team, at the second stage of the game, when the R team gets the move, it faces the problem:

$\max_{x} P_R(c, x).$

Assume that for each $c$, the R team's optimization problem (4) has a unique solution $x^*(c)$, which is known as the best response of the R team. Now the B team can also solve the R team's optimization problem by anticipating the R team's response to each action $c$ that the B team might take, so that the B team faces the problem:

$\max_{c} P_B(c, x^*(c)).$
Suppose this optimization problem also has a unique solution for the B team, denoted by $c^*$. The solution $(c^*, x^*(c^*))$ is the backwards-induction outcome of this game.
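A brute-force sketch of this backward induction follows: a grid search under our reconstructed payoff form, with an assumed resource budget $\sum_j c_j \leq 1$ to make the leader's trade-off non-trivial. All numbers are illustrative:

```python
import numpy as np
from itertools import product

# Backward-induction sketch: for each candidate protection vector c, compute
# the follower's best response x*(c), then keep the c with the best
# anticipated leader payoff. The payoff values and the budget sum(c) <= 1
# are illustrative assumptions, not the paper's specification.

B, B_bar = np.array([4., 3., 2., 1.]), np.full(4, 2.0)
R, R_bar = np.array([5., 4., 3., 2.]), np.full(4, 1.0)

def P_B(c, x): return float(np.sum(x * (c * B - (1 - c) * B_bar)))
def P_R(c, x): return float(np.sum(x * ((1 - c) * R - c * R_bar)))

# Since P_R is linear in x for fixed c, a pure attack on a single tank is
# always among the follower's best responses.
attacks = [np.eye(4)[j] for j in range(4)]
def best_response(c): return max(attacks, key=lambda x: P_R(c, x))

# Leader's coarse grid over protection vectors within the assumed budget.
grid = [np.array(g) for g in product(np.linspace(0, 1, 6), repeat=4)
        if sum(g) <= 1.0]

c_star = max(grid, key=lambda c: P_B(c, best_response(c)))
x_star = best_response(c_star)
print("c* =", c_star, " x*(c*) =", x_star, " P_B =", P_B(c_star, x_star))
```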
To address this, we consider the above simplified case, i.e., when $n = 4$. Expanding Equation (3) we obtain:

$P_B(c, x) = \sum_{j=1}^{4} x_j \{c_j B_j - (1 - c_j)\bar{B}_j\}, \qquad P_R(c, x) = \sum_{j=1}^{4} x_j \{(1 - c_j)R_j - c_j\bar{R}_j\}.$

Now, as $\sum_{j=1}^{4} x_j = 1$, we take as an arbitrary choice $x_1 = 1 - x_2 - x_3 - x_4$ in Equation (6) to obtain

$P_R = (1 - x_2 - x_3 - x_4)\{(1 - c_1)R_1 - c_1\bar{R}_1\} + \sum_{j=2}^{4} x_j \{(1 - c_j)R_j - c_j\bar{R}_j\},$

and this expresses the R team's reward function in terms of only three variables $x_2$, $x_3$, and $x_4$, defining its attack strategy $x$. When expanded, the above equation becomes

$P_R = (1 - c_1)R_1 - c_1\bar{R}_1 + \sum_{j=2}^{4} x_j \left[\{(1 - c_j)R_j - c_j\bar{R}_j\} - \{(1 - c_1)R_1 - c_1\bar{R}_1\}\right].$
As a rational player, the B team knows that the R team would maximize its reward function with respect to its strategic variables, and this is expressed as

$\dfrac{\partial P_R}{\partial x_k} = 0, \quad k = 2, 3, 4,$

where $P_R$ is given by Equation (10) and the derivatives are taken with respect to the free variables $x_2$, $x_3$, $x_4$. This results in obtaining

$(1 - c_k)R_k - c_k\bar{R}_k = (1 - c_1)R_1 - c_1\bar{R}_1, \quad k = 2, 3, 4,$

and this leads us to denote the sum of the reward and the cost to the B and R teams for protecting or attacking the tank $t_j$, respectively, by the new symbols

$b_j = B_j + \bar{B}_j, \qquad r_j = R_j + \bar{R}_j.$

As $\sum_{j=1}^{4} x_j = 1$ and $x_1 = 1 - x_2 - x_3 - x_4$, we substitute these in Equation (10) along with the substitutions (11) to obtain Equations (12)–(14), i.e., $R_k - c_k r_k = R_1 - c_1 r_1$ for $k = 2, 3, 4$.
Using Equations (12)–(14), we now express the marginals $c_2$ and $c_3$ in terms of $c_4$. For this, we subtract Equation (13) from Equation (12), which gives $c_2$ as an affine function of $c_4$ (Equation (15)). Similarly, subtracting Equation (14) from (12) results in $c_3$ as an affine function of $c_4$ (Equation (16)). Using Equations (15) and (16), the marginal $c_1$ can then also be expressed in terms of the marginal $c_4$ (Equation (17)).
Equations (15)–(17) represent the rational behaviour of the R team, which the B team can now exploit to optimize its defence strategy $c$.
From Equation (3), the payoff function of the B team can be expressed as a sum over the four tanks (Equation (18)), and with the substitution $x_1 = 1 - x_2 - x_3 - x_4$ this results in Equation (19). Now, substituting from Equations (15) and (16) into Equation (19), along with the substitutions (11), we obtain

$P_B = \lambda c_4 + \mu,$

where $\lambda$ and $\mu$ (Equations (21) and (22)) appear as the new parameters of the considered sequential strategic interaction. This completes the backwards-induction process of obtaining the optimal response of the B team in view of its encounter with the rational behaviour of the R team.
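The elimination of $x_1$ and the indifference conditions can be checked symbolically; the sketch below (using sympy, with our reconstructed $P_R$ and invented symbol names) solves the first-order conditions for $c_1$, $c_2$, $c_3$ in terms of $c_4$:

```python
import sympy as sp

# Symbolic check of the backward-induction algebra, under the reconstructed
# payoff form; the symbol names are ours, not the paper's.
x = sp.symbols('x1:5')
c = sp.symbols('c1:5')
Rw = sp.symbols('R1:5', positive=True)     # rewards to the R team
Rc = sp.symbols('Rbar1:5', positive=True)  # costs to the R team

P_R = sum(x[j] * ((1 - c[j]) * Rw[j] - c[j] * Rc[j]) for j in range(4))

# Eliminate x1 via the normalization x1 = 1 - x2 - x3 - x4, then impose
# the indifference (first-order) conditions dP_R/dx_k = 0, k = 2, 3, 4.
P_R1 = P_R.subs(x[0], 1 - x[1] - x[2] - x[3])
focs = [sp.Eq(sp.diff(P_R1, xk), 0) for xk in (x[1], x[2], x[3])]

# The conditions are linear in the marginals, so c1, c2, c3 come out as
# affine functions of c4, mirroring Equations (15)-(17).
sol = sp.solve(focs, [c[0], c[1], c[2]], dict=True)[0]
for key, val in sol.items():
    print(key, '=', sp.simplify(val))
```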
5. Optimal Response of the B Team
From Equations (21) and (22) we note that $\lambda$ and $\mu$ depend on the values assigned to the two teams' rewards and costs variables, i.e., $B_j$, $\bar{B}_j$, $R_j$, $\bar{R}_j$, as well as on the R team's attack probabilities $x_j$. Three cases, therefore, emerge in view of Equation (20), and these are described below.
5.1. Case $\lambda > 0$
After observing the attack probabilities $x_j$, the B team obtains $\lambda$ using Equation (21) along with the rewards and costs variables, i.e., $B_j$, $\bar{B}_j$, $R_j$, $\bar{R}_j$, and Equation (11). If the B team finds that $\lambda > 0$, then its payoff $P_B$ is maximized at the maximum value of $c_4$, irrespective of the value of $\mu$. Note that at this maximum value of $c_4$, the corresponding values of $c_1$, $c_2$, $c_3$ (as expressed in terms of $c_4$ and given by Equations (15)–(17)) must remain non-negative, and that the maximum value obtained for $c_4$ can still be less than $c_1$ or $c_2$ or $c_3$.
5.2. Case $\lambda < 0$
As $c_4 \geq 0$ and $P_B$ in Equation (20) decreases in $c_4$ when $\lambda < 0$, in view of the attack probabilities $x_j$, if the B team finds that $\lambda < 0$, then the reward is maximized to the value of $\mu$ with $c_4 = 0$; the remaining marginals $c_1$, $c_2$, $c_3$ then follow from Equations (15)–(17) evaluated at $c_4 = 0$ (Equation (23)).
5.3. Case $\lambda = 0$
If the B team finds that $\lambda = 0$, then its reward becomes $\mu$, as defined by Equation (22), and is independent of the value assigned to $c_4$ and, via Equations (15)–(17), also independent of $c_1$, $c_2$, $c_3$.
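The three cases reduce to a sign test on $\lambda$; a compact sketch follows (the function name and the feasibility bound are illustrative):

```python
# Sign test implementing the three cases above for the affine payoff
# P_B = lam * c4 + mu (Equations (20)-(22)); c4_max is the largest c4
# keeping c1, c2, c3 of Equations (15)-(17) non-negative.

def optimal_c4(lam: float, c4_max: float) -> float | None:
    if lam > 0:
        return c4_max   # Case 5.1: push c4 to its feasible maximum
    if lam < 0:
        return 0.0      # Case 5.2: P_B maximized at c4 = 0, giving P_B = mu
    return None         # Case 5.3: P_B = mu regardless of c4 (indifferent)
```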
6. Example Instantiation
As an example, we consider a set of arbitrarily assigned values for the two teams' rewards and costs, collected in table (24). From these values we obtain, via Equation (11), the sums $b_j$ and $r_j$, and, using Equations (21) and (22), the expressions for $\lambda$ and $\mu$ in terms of the attack probabilities (Equations (26) and (27)).
6.1. Case $\lambda > 0$
Now, assume that while knowing the attack probabilities $x_j$, the B team uses Equation (26) to find that $\lambda > 0$. As discussed above, its payoff is then maximized at the maximum value of $c_4$, irrespective of the value of $\mu$. Using Equations (15) and (16), along with the entries in table (24), the B team now determines the maximum value of $c_4$ at which $c_1$, $c_2$, $c_3$, obtained from Equations (15)–(17), respectively, all have non-negative values. Table (24) gives the required sums, and a table of values of $c_1$, $c_2$, $c_3$ against $c_4$ is then obtained, from which $c_4 = 0.0636$ emerges as the maximum value at which $c_1$, $c_2$, $c_3$ remain non-negative. The plots of $c_1$, $c_2$, and $c_3$ vs. $c_4$ (range: 0.06 to 0.0636) appear in Figure 1.
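Finding the cut-off value of $c_4$ amounts to intersecting affine constraints; a sketch with placeholder coefficients follows (the true intercepts and slopes come from table (24) via Equations (15)–(17)):

```python
import numpy as np

# Largest c4 for which the affine marginals c_j(c4) = a_j + b_j * c4
# (j = 1, 2, 3) stay non-negative. The coefficients below are placeholders,
# not the values implied by table (24).
a = np.array([0.30, 0.25, 0.20])   # intercepts (illustrative)
b = np.array([-4.0, -3.5, -3.0])   # slopes (illustrative, negative)

# With negative slopes, each constraint a_j + b_j * c4 >= 0 caps c4 at
# -a_j / b_j; the binding cap is the smallest one.
c4_max = np.min(-a / b)
print("max feasible c4:", c4_max)   # ~0.0667 for these placeholder values
```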
The B team's protection strategy is therefore obtained from Equation (28) as the protection vector of Equation (30), and from Equations (20), (26) and (27), along with table (24), the B team's payoff then becomes the expression of Equation (31), which, in view of the fact that $x_1 = 1 - x_2 - x_3 - x_4$, can also be expressed as Equation (32).
The plot of the payoff $P_B$ for the values of $x_2$, $x_3$, $x_4$ that satisfy the constraints $x_2 \geq 0$, $x_3 \geq 0$, $x_4 \geq 0$ and $x_2 + x_3 + x_4 \leq 1$ appears in Figure 2.
From Equation (32), the payoff $P_B$ is maximized at a particular choice of the attack probabilities $x_2$, $x_3$, $x_4$. The payoff to the R team is then obtained from Equation (8), where the protection marginals are those of Equation (30), and using table (24) we obtain its numerical value.
Now we consider the reaction of the R team after the B team has determined its protection strategy $c$ while following the backwards induction in the case above. For the case when the attack probabilities are such that $\lambda > 0$ in Equation (26), we re-express the R team's payoff given by Equation (8) by substituting the B team's protection strategy described by Equation (30). Using $x_1 = 1 - x_2 - x_3 - x_4$, the R team's payoff is then expressed in terms of the attack probabilities $x_2$, $x_3$, $x_4$ as Equation (34).
Now, in Figure 3 below, a plot compares the B and the R teams' payoffs given by Equations (32) and (34), respectively, when these are considered implicit functions of $x_2$, $x_3$, $x_4$ with the constraints that $x_2 \geq 0$, $x_3 \geq 0$, $x_4 \geq 0$ and $x_2 + x_3 + x_4 \leq 1$. For most of the allowed values of the attack probabilities, as represented by the blue shade, and for $\lambda > 0$, the B team remains significantly better off than the R team.
In view of the reward table (24), the R team's payoffs attain the maximum value of 7 only at particular attack profiles. However, when this is the case, the corresponding payoff to the B team follows from Equation (32).
6.2. Case $\lambda < 0$
Consider the case when, by using Equation (26), the B team finds that $\lambda < 0$. As $\lambda$ in Equation (26) depends on the attack probabilities, the condition $\lambda < 0$ can be realized for some values of the attack probabilities.
The plot of the payoff $P_B$ for the values of $x_2$, $x_3$, $x_4$ that satisfy the constraints $x_2 \geq 0$, $x_3 \geq 0$, $x_4 \geq 0$ and $x_2 + x_3 + x_4 \leq 1$ appears in Figure 4.
Now, in view of Equation (20), the B team's reward is maximized to the value of $\mu$ when $c_4 = 0$. In this case, using Equation (23) and table (24), the B team's protection strategy is obtained accordingly and, as before, using Equations (20), (26) and (27), along with table (24), the B team's payoff then follows, which, in view of the fact that $x_1 = 1 - x_2 - x_3 - x_4$, can also be expressed in terms of $x_2$, $x_3$, $x_4$ alone; the payoff to the R team follows similarly.
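The comparisons in Figures 3 and 5 can be reproduced in spirit by sampling the attack simplex and classifying the sign of $\lambda$; a sketch with a placeholder expression standing in for Equation (26):

```python
import numpy as np

# Sample attack profiles (x1, x2, x3, x4) from the simplex and classify the
# sign of lam. The linear expression below is a placeholder for Equation (26),
# whose actual coefficients come from table (24).

rng = np.random.default_rng(0)
x = rng.dirichlet(np.ones(4), size=100_000)

lam = 0.5 * x[:, 1] + 0.3 * x[:, 2] - 0.2 * x[:, 3] - 0.1   # placeholder
print("share of simplex with lam > 0:", np.mean(lam > 0))
print("share of simplex with lam < 0:", np.mean(lam < 0))
```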
7. Discussion
We consider the case where the B team moves first and commits to a protection strategy $c$. The R team observes the protection strategy and decides its attack strategy, given by the vector $x$. The B team knows that the R team is a rational decision maker and knows how it will react to its protection strategy $c$. The leader-follower interaction, resulting in the consideration of Stackelberg equilibrium, looks into finding the B team's best protection strategy while knowing that the R team is going to act rationally in view of a protection strategy committed to by the B team. The R team's mixed strategy is given by the vector $x$ of the attack probabilities on the four tanks.
The vector $c$ describing the B team's allocation of its resources depends crucially on the parameter $\lambda$, as defined in Equation (21), and is obtained from the values assigned in table (24) for the rewards and the costs to the two teams. If $\lambda > 0$, then the reward to the B team is maximized at the maximum value of $c_4$ for which $c_1$, $c_2$, $c_3$ (as expressed in terms of $c_4$ by Equations (15)–(17)) remain non-negative. That is, the maximum value obtained for $c_4$ can still be less than $c_1$ or $c_2$ or $c_3$.
We note that, for the case $\lambda > 0$ and for most situations encountered by the two teams (represented by the area covered by the blue shade in Figure 3), the reward for the B team remains confined to a band of values, whereas the reward for the R team ranges up to 7.
However, in the case $\lambda < 0$ and for most situations encountered by the two teams (represented now by the area covered by the blue shade in Figure 5), the reward for the B team again remains confined to a band of values, whereas the reward for the R team ranges only up to 4.
That is, for most of the allowed values of the attack probabilities, the R team can receive a higher reward when $\lambda > 0$ relative to the case when $\lambda < 0$. However, for most of the allowed values of the attack probabilities, the B team can receive less reward when $\lambda > 0$ relative to the case when $\lambda < 0$. Therefore, the situation $\lambda > 0$ is more favourable to the R team than it is to the B team. Similarly, the situation $\lambda < 0$ turns out to be more favourable to the B team than it is to the R team. Note that these results are specific to the particular values assigned in the considered example to the parameters $B_j$, $\bar{B}_j$, $R_j$, $\bar{R}_j$ for the four tanks.
8. Conclusions
The Stackelberg equilibrium in this scenario is reached when the drones have optimized their attack patterns in view of the protections provided to the tanks, and the tanks have subsequently optimized their protections in light of the drones' best responses. The dynamic interplay of strategic decision-making, under the principles of Stackelberg equilibrium and backwards induction, highlights the intricate nature of modern warfare involving drones and tanks, where brains and brawn are equally pivotal. A natural extension of this work is the case when the set of resources consists of $m$ elements, i.e., $S = \{s_1, s_2, \dots, s_m\}$, and the set of tanks is given as $T = \{t_1, t_2, \dots, t_n\}$.
Author Contributions
Conceptualization, A.I.; Methodology, A.I.; Software, I.H.; Validation, E.T. and R.B.; Formal analysis, A.I.; Investigation, E.T., A.P. and R.B.; Writing—original draft, A.I.; Writing—review & editing, A.I. and C.S.; Visualization, A.P. and G.P.; Supervision, C.S.; Project administration, C.S. All authors have read and agreed to the published version of the manuscript.
Funding
The work in this paper was carried out under a Research Agreement between the Defence Science and Technology Group, Department of Defence, Australia, and the University of Adelaide, Contract No. UA216424-S27.
Data Availability Statement
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Binmore, K. Game Theory: A Very Short Introduction; Oxford University Press: Oxford, UK, 2007.
- Rasmusen, E. Games and Information: An Introduction to Game Theory, 3rd ed.; Blackwell Publishers Ltd.: Oxford, UK, 2001.
- Osborne, M.J. An Introduction to Game Theory; Oxford University Press: Oxford, UK, 2003.
- Cournot, A. Researches into the Mathematical Principles of the Theory of Wealth; Bacon, N., Ed.; Macmillan: New York, NY, USA, 1897.
- Tirole, J. The Theory of Industrial Organization; MIT Press: Cambridge, MA, USA, 1988.
- von Stackelberg, H. Marktform und Gleichgewicht; Julius Springer: Vienna, Austria, 1934.
- Gibbons, R. Game Theory for Applied Economists; Princeton University Press: Princeton, NJ, USA, 1992.
- Korzhyk, D.; Yin, Z.; Kiekintveld, C.; Conitzer, V.; Tambe, M. Stackelberg vs. Nash in Security Games: An Extended Investigation of Interchangeability, Equivalence, and Uniqueness. J. Artif. Intell. Res. 2011, 41, 297–327.
- Bustamante-Faúndez, P.; Bucarey, L.V.; Labbé, M.; Marianov, V.; Ordoñez, F. Playing Stackelberg Security Games in perfect formulations. Omega 2024, 126, 103068.
- Hunt, K.; Zhuang, J. A review of attacker-defender games: Current state and paths forward. Eur. J. Oper. Res. 2024, 313, 401–417.
- Chen, X.; Xiao, L.; Feng, W.; Ge, N.; Wang, X. DDoS Defense for IoT: A Stackelberg Game Model-Enabled Collaborative Framework. IEEE Internet Things J. 2022, 9, 9659–9674.
- Bansal, G.; Sikdar, B. Security Service Pricing Model for UAV Swarms: A Stackelberg Game Approach. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 10–13 May 2021; pp. 1–6.
- Li, H.; Zheng, Z. Optimal Timing of Moving Target Defense: A Stackelberg Game Model. In Proceedings of the MILCOM 2019—2019 IEEE Military Communications Conference (MILCOM), Norfolk, VA, USA, 12–14 November 2019; pp. 1–6.
- Feng, Z.; Ren, G.; Chen, J.; Zhang, X.; Luo, Y.; Wang, M.; Xu, Y. Power Control in Relay-Assisted Anti-Jamming Systems: A Bayesian Three-Layer Stackelberg Game Approach. IEEE Access 2019, 7, 14623–14636.
- Kar, D.; Nguyen, T.H.; Fang, F.; Brown, M.; Sinha, A.; Tambe, M.; Jiang, A.X. Trends and Applications in Stackelberg Security Games. In Handbook of Dynamic Game Theory; Basar, T., Zaccour, G., Eds.; Springer: Cham, Switzerland, 2016.
- Tambe, M. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned; Cambridge University Press: Cambridge, UK, 2011.
- Paruchuri, P.; Pearce, J.; Marecki, J.; Tambe, M.; Ordonez, F.; Kraus, S. Playing Games for Security: An Efficient Exact Algorithm for Solving Bayesian Stackelberg Games. In Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Estoril, Portugal, 12–16 May 2008; pp. 895–902.
- Hohzaki, R.; Nagashima, S. A Stackelberg equilibrium for a missile procurement problem. Eur. J. Oper. Res. 2009, 193, 238–249.
- Sinha, A.; Fang, F.; An, B.; Kiekintveld, C.; Tambe, M. Stackelberg Security Games: Looking Beyond a Decade of Success. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 5494–5501.