Article

Manipulation Game Considering No-Regret Strategies

by
Julio B. Clempner
Instituto Politécnico Nacional, Mexico City 07320, Mexico
Mathematics 2025, 13(2), 184; https://doi.org/10.3390/math13020184
Submission received: 10 October 2024 / Revised: 19 December 2024 / Accepted: 3 January 2025 / Published: 8 January 2025
(This article belongs to the Special Issue Game and Decision Theory Applied to Business, Economy and Finance)

Abstract:
This paper examines manipulation games through the lens of Machiavellianism, a psychological theory. It analyzes manipulation dynamics using principles like hierarchical perspectives, exploitation tactics, and the absence of conventional morals to interpret interpersonal interactions. Manipulators intersperse unethical behavior within their typical conduct, deploying deceptive tactics before resuming a baseline demeanor. The proposed solution leverages Lyapunov theory to establish and maintain Stackelberg equilibria. A Lyapunov-like function supports each asymptotically stable equilibrium, ensuring convergence to a Nash/Lyapunov equilibrium if it exists, inherently favoring no-regret strategies. The existence of an optimal solution is demonstrated via the Weierstrass theorem. The game is modeled as a three-level Stackelberg framework based on Markov chains. At the highest level, manipulators devise strategies that may not sway middle-level manipulated players, who counter with best-reply strategies mirroring the manipulators’ moves. Lower-level manipulators adjust their strategies in response to the manipulated players to sustain the manipulation process. This integration of stability analysis and strategic decision-making provides a robust framework for understanding and addressing manipulation in interpersonal contexts. A numerical example focusing on the oil market and its regulations highlights the findings of this work.
MSC:
91A10; 91A27; 91A50; 91A65; 91A80; 90C40

1. Introduction

1.1. Brief Review

The inclusion of believable behavioral assumptions into traditional economic models has sparked extensive debate. Manipulation emerges as a prominent concern, with game-theoretic studies revealing significant individual disparities in scenarios allowing such behavior. This work focuses on Machiavellianism—a personality trait associated with manipulation, callousness, and moral indifference—as studied in personality psychology. While enriching economic models with believable behavioral assumptions enhances their descriptive accuracy, it also introduces complexities and challenges in predicting human behavior and its economic outcomes. Thus, understanding the implications of incorporating these assumptions is crucial for advancing economic theory and policy [1].
Christie and Geis [2] utilized adapted Machiavellian remarks to investigate variations in human behavior, drawing from Machiavelli’s dichotomy of individuals as either manipulators or the manipulated. This classification reflects a worldview steeped in power dynamics and strategic interactions. Machiavellian individuals are characterized by a propensity to exploit others, harbor a pessimistic view of human nature, and demonstrate a lack of moral accountability. The concept of Machiavellianism underscores the absence of coordination in self-serving, conflict-laden situations, particularly in games where interests diverge.
Christie and Geis further delineated Machiavellian traits into three key components:
  • Worldview shaped by manipulators and manipulated (views component), lacking sympathy; manipulated individuals also engage in manipulation.
  • Lack of regard for conventional morality (morality component), indifferent to actions like lying and cheating.
  • Emphasis on pragmatic problem-solving over ideological commitments, utilizing power strategies (tactics component) for personal goals rather than ideological ones.
Understanding Machiavellianism offers valuable insights into the intricacies of social dynamics, decision-making, and interpersonal relationships. This psychological framework helps unravel how individuals navigate power structures, negotiate conflicting interests, and strategize to achieve personal goals. By dissecting Machiavellian traits—such as manipulation, pragmatic problem-solving, and moral flexibility—researchers can better understand the complexities of human behavior. It sheds light on how individuals employ strategic thinking in diverse contexts, from interpersonal interactions to organizational settings, revealing patterns of influence, persuasion, and control. Such analysis is crucial for comprehending motivations, fostering cooperation, and addressing conflicts in various social and professional environments [3,4].

1.2. Related Work

Allen and Gorton [5] delved into the concept of manipulation equilibrium, emphasizing the impact of stock price dynamics and their implications for welfare. Their work highlights how manipulation strategies can disrupt market equilibrium, leading to unintended consequences for various stakeholders. Building on this, Bagnoli and Lipman [6] demonstrated that efforts to manipulate stock prices often result in a decline in pre-bid stock prices, which can inflate takeover bids. This inflation can create barriers to successful acquisitions, particularly in times when the market experiences limited takeover activity, effectively creating an environment where genuine value is obscured by artificial price alterations.
Furthering the discussion on manipulation, Clempner [7] introduced a comprehensive game-theoretical framework designed to simulate such manipulation, integrating reinforcement learning techniques to explore moral considerations surrounding these strategies. This innovative approach sheds light on the ethical dimensions of decision-making in financial markets, illustrating how self-interest can clash with broader market integrity.
In a broader empirical study, Cumming et al. [8] examined a dataset comprising suspected instances of stock price manipulation across nine countries over an eight-year period. Their findings revealed significant adverse effects of market manipulation on innovation, suggesting that such practices stifle creativity and investment in new ideas. Interestingly, their research pointed out that the detrimental impacts are particularly severe in jurisdictions characterized by weak intellectual property rights coupled with strong shareholder protections, indicating that the regulatory environment plays a critical role in shaping market behavior. Clempner’s contributions continued with the development of a manipulation model tailored for repeated Stackelberg security games [9], which explores strategic interactions over time among players with hierarchical roles. Additionally, Clempner [10] introduced a Machiavellian game model founded on ergodic Bayesian Markov processes, further enriching the discourse on strategic manipulation in competitive environments. This work addresses how players with Machiavellian traits might exploit informational asymmetries to gain an advantage. Finally, significant advancements in the study of Machiavellianism games were presented by [11], who explored the nuances of strategic deception and manipulation within competitive settings, emphasizing the psychological and behavioral aspects that drive such strategies. Collectively, these studies provide a comprehensive view of how manipulation in financial markets not only affects stock prices but also has far-reaching implications for overall market efficiency and ethical standards.
In this theoretical framework, the sender, armed with private information, formulates a signaling strategy designed to influence the decision-making process of the receiver. This strategic communication is crucial because it allows the sender to steer the receiver’s actions in a desired direction. To explore extensions of this model that incorporate multiple senders, one can refer to the research conducted by Milgrom [12] and Krishna [13], who delve into the dynamics and interactions that arise when several senders attempt to convey information to a common receiver.
The idea of private signal-based persuasion has been investigated in diverse scenarios, ranging from two-player, two-action games [14] to unanimity voting systems [15]. These contexts provide valuable insights into how information asymmetries can shape competitive interactions and collective decision-making. Focusing specifically on the interaction between a single sender and a receiver, Kamenica and Gentzkow [16] developed a Bayesian persuasion framework. This model outlines the conditions under which the signals sent by the sender can be beneficial and delineates optimal strategies for signaling that enhance the sender’s objectives. Bergemann and Morris [17] further explored the role of a mechanism designer within the Bayesian persuasion framework. Their research examines how such a designer selects an appropriate information structure to fulfill specific objectives, shedding light on the strategic considerations involved in information dissemination and persuasion. Expanding on their previous findings, Gentzkow and Kamenica [18] introduced a model with multiple senders, analyzing scenarios where a vast array of potential signals exists. They employed a lattice structure to facilitate intuitive comparisons and combinations of these signals, enriching the understanding of how senders can effectively convey their information in a competitive landscape.
Additionally, Brocas et al. [19] proposed a novel model where two competing agents allocate resources to acquire public information about an uncertain state of the world. Their analysis focuses on the equilibrium strategies for information acquisition, revealing how these strategies evolve when one agent faces greater costs in gathering information. This research highlights the complexities involved in information competition and the tactical decisions that arise from resource allocation. In the realm of political communication, Gul and Pesendorfer [20] developed a model in which two senders with opposing interests share asymmetric information regarding a binary state of the world. This study illustrates how differing objectives can influence the nature of information conveyed and the strategic behavior of the senders involved. Moreover, several authors, including [21,22,23,24], have explored frameworks in which the information revealed by the sender is exclusively accessible to the recipients. These works emphasize the significance of control over information flow and the implications it has for decision-making processes in various contexts.
Collectively, these studies contribute to a deeper understanding of the mechanisms of persuasion and information asymmetry, highlighting the strategic complexities involved in both competitive and cooperative environments.

1.3. Main Results

This paper concerns manipulation games based on the psychological theory of Machiavellianism, which analyzes the world from the Machiavellian perspective using three guiding principles: view, tactics, and immoral behavior. View relates to the existence of manipulating and manipulated individuals, corresponding to a hierarchical scheme represented by a Stackelberg game in which all participants engage in some degree of manipulation. Tactics concern the use of interpersonal exploitation strategies as a method of control. Immoral behavior refers to conduct unconstrained by the conventional morals that would otherwise prohibit such actions; this component is modeled using a Bayesian–Markov game-theoretic approach. A manipulator is someone who may temporarily stray from steady conduct by committing an unethical act and then reverting to their usual behavior.
We propose a solution to the manipulation game problem involving Lyapunov theory, which supports the existence and stability of Nash equilibria: each asymptotically stable equilibrium point admits the so-called Lyapunov-like function that converges to a Nash/Lyapunov equilibrium point if it exists. The asymptotic behavior of a Lyapunov-like function admits by definition no-regret strategies.
The idea behind the representation of the game is that it is a three-level Stackelberg game, where the manipulating players choose a manipulation strategy at the highest level that might not convince the manipulated players in the middle level. A best-reply strategy that equals the manipulating strategy can be selected by the player who is being manipulated. In order to ensure the manipulation process, the manipulating player at the lower level then chooses a strategy once more based on the strategy of the manipulated players.

2. Markov Games

In a discrete-time, finite-horizon setting, we define an environment characterized by private and independent valuations [25]. Machiavellian players, denoted by $\iota \in \mathcal{N}$, where $\mathcal{N} = \{1, 2, \ldots, n\}$, incur a cost at each period $t \in T$, with $T \subseteq \mathbb{N}$. This cost is influenced by the current physical allocation $y_t^\iota \in Y_t^\iota$ and the player's type $\theta_t^\iota \in \Theta_t^\iota$. The cost function $\varphi^\iota$ for Machiavellian player $\iota$ is represented as $\varphi^\iota(y_t^\iota, \theta_t^\iota) \geq 0$, where $\varphi^\iota$ maps $Y_t^\iota \times \Theta_t^\iota$ to $\mathbb{R}_+$.
At time period $t$, the type vector is represented as $\theta_t = (\theta_t^1, \ldots, \theta_t^n) \in \Theta_t$, where $\Theta_t$ is defined as the product $\times_{\iota \in \mathcal{N}} \Theta_t^\iota$. The set of permissible allocations in period $t$ may depend on the vector of prior allocations $y_t = (y_t^1, \ldots, y_t^n) \in Y_t$, where $Y_t = \times_{\iota \in \mathcal{N}} Y_t^\iota$. We denote by $\Theta_t^{-l}$ the product $\times_{\iota \in \mathcal{N}, \iota \neq l} \Theta_t^\iota$, and by $\Delta(Y_t)$ the set of all probability distributions over $Y_t$. It is assumed that $Y_t^\iota$, $\Theta_t^\iota$, and $T$ are finite sets.
For each Machiavellian player $\iota$, there exists a shared prior distribution $\mu^\iota(\theta_0^\iota)$. The player's current type, $\theta_t^\iota$, along with their current action, $y_t^\iota$, determines the probability distribution for the next period's type $\theta_{t+1}^\iota$ on $\Theta_{t+1}^\iota$. We assume that this probability distribution is described by a transition function (or stochastic kernel) $F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$.
We assume that the type $\theta_t^\iota$ of Machiavellian player $\iota$ follows a Markov process over the state space $\Theta_t^\iota$, and the dynamics adhere to the Markov condition, expressed as follows:
$$F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota, \theta_{t-1}^\iota, y_{t-1}^\iota, \ldots, \theta_0^\iota, y_0^\iota) = F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota).$$
The sequence of states and actions for each player $\iota$ is represented by $(\Theta_t, Y_t)_{t \in T}$. In this case, the process described by a Markov chain is fully characterized by the transition function $F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$ and the common prior distribution $\mu^\iota(\theta_0^\iota)$, where $\mu^\iota(\theta_t^\iota)$ belongs to $\Delta(\Theta_t^\iota)$, the set of probability distributions over $\Theta_t^\iota$. It is further assumed that each chain $(\mu^\iota, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota))$ is ergodic [9,25].
The cost functions $\varphi^\iota(y_t^\iota, \theta_t^\iota)$ and the transition functions $F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$ are considered common knowledge among all agents $\iota$. The shared prior distribution $\mu^\iota(\theta_0^\iota)$ and the transition function $F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$ are assumed to be independent for each agent. At the start of each period $t$, every agent $\iota$ privately observes their type $\theta_t^\iota$. By the end of the period, after choosing an action $y_t^\iota \in Y_t^\iota$, the costs for that period are realized. Asymmetric information arises from each player's private observation of $\theta_t^\iota$ at every time $t$. Due to the independence of the prior distributions $\mu^\iota(\theta_0^\iota)$ and the transition functions $F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$ for each player $\iota$, the information available to player $\iota$, determined by $\theta_{t+1}^\iota$, does not depend on the type $\theta_t^l$ of any other player $l \neq \iota$. After this, each participant simultaneously sends a message $m_t^\iota \in M_t^\iota$, and the entire message profile is made public. Additionally, we assume that $M_t^\iota = \Theta_t^\iota$ from this point onward.
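The controlled type process described above can be sketched as a small simulation. This is a minimal illustration only: the two-type, two-action kernel and the prior below are assumed numbers, not values from the paper, and the action choice is arbitrary (uniform) rather than strategic. For an ergodic chain, the empirical type frequencies stabilize, which is the property the model relies on.

```python
import random

# Toy controlled Markov chain for one player (all numbers assumed).
# F[theta][y] is the distribution of the next type given the current
# type theta and action y, i.e. F^i(theta' | theta, y).
F = {
    0: {0: [0.9, 0.1], 1: [0.4, 0.6]},
    1: {0: [0.3, 0.7], 1: [0.5, 0.5]},
}
mu0 = [0.5, 0.5]  # shared prior over the initial type theta_0

def step(theta, y, rng):
    """Sample theta_{t+1} from the kernel F(. | theta, y)."""
    return 0 if rng.random() < F[theta][y][0] else 1

rng = random.Random(0)
theta = 0 if rng.random() < mu0[0] else 1
visits = [0, 0]
for t in range(10_000):
    y = rng.randrange(2)          # an arbitrary (uniform) action choice
    theta = step(theta, y, rng)
    visits[theta] += 1

# For an ergodic chain the empirical type frequencies stabilise.
freq = [v / sum(visits) for v in visits]
print(freq)
```

Under the uniform action choice, the induced kernel here has a unique stationary distribution (roughly $(0.53, 0.47)$ for these assumed numbers), so the long-run frequencies do not depend on the initial type.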
The relationship $\varrho^\iota(a_t^\iota \mid y_t^\iota)$ for the Machiavellian players is given by $\varrho^\iota : Y^\iota \to \Delta(Y^\iota)$. The set of admissible actions is defined by
$$Y_{adm}^\iota = \Big\{ \varrho^\iota(a_t^\iota \mid y_t^\iota) \;\Big|\; \sum_{a_t^\iota \in Y^\iota} \varrho^\iota(a_t^\iota \mid y_t^\iota) = 1, \; y_t^\iota \in Y^\iota \Big\}.$$
The relationship $\varrho^\iota(a_t^\iota \mid y_t^\iota)$ represents the likelihood with which Machiavellian player $\iota$ believes that $a_t^\iota$ is the realized action $y_t^\iota$.
A (behavioral) strategy $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$ for player $\iota$ is a function $\sigma^\iota : \Theta_t^\iota \to \Delta(M_t^\iota)$, which indicates the probability that player $\iota$ associates a message $m_t^\iota$ with type $\theta_t^\iota$. The set of (behavioral) strategies is defined as follows:
$$\Sigma_{adm}^\iota = \Big\{ \sigma^\iota(m_t^\iota \mid \theta_t^\iota) \geq 0 \;\Big|\; \sum_{m_t^\iota \in M_t^\iota} \sigma^\iota(m_t^\iota \mid \theta_t^\iota) = 1, \; \theta_t^\iota \in \Theta_t^\iota \Big\}.$$
A policy refers to a sequence $\pi^\iota(y_t^\iota \mid m_t^\iota)$ where, for each time period $t$, $\pi^\iota(y_t^\iota \mid m_t^\iota)$ represents a stochastic function on $Y^\iota$. The collection of all possible policies is represented by the symbol $\Pi$. We have that
$$\Pi_{adm}^\iota = \Big\{ \pi^\iota(y_t^\iota \mid m_t^\iota) \geq 0 \;\Big|\; \sum_{y_t^\iota \in Y_t^\iota} \pi^\iota(y_t^\iota \mid m_t^\iota) = 1, \; m_t^\iota \in M_t^\iota \Big\}.$$
Considering the Markovian nature of the type processes, the cost at time $t$ starts at the type vector $\theta_t^\iota$, accompanied by the (behavioral) strategy $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$, the policy $\pi^\iota(y_t^\iota \mid m_t^\iota)$, and the ex ante probability $\mu^\iota(\theta_t^\iota)$. These elements can be expressed as
$$\Phi^\iota(\pi, \sigma) = \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota) \prod_{\iota \in \mathcal{N}} \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota)$$
$$= \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} r^\iota(y_t^\iota, \theta_t^\iota) \prod_{\iota \in \mathcal{N}} \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota),$$
where $r^\iota(y_t^\iota, \theta_t^\iota) = \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$.
We consider that players are aware of their respective costs. A policy $\pi$ along with the strategy $\sigma$ work together to minimize the cost function $\Phi^\iota(\pi, \sigma)$, according to the following rule:
$$(\pi^*, \sigma^*) := \operatorname{Arg\,min}_{\pi \in \Pi_{adm}} \min_{\sigma \in \Sigma_{adm}} \sum_{\iota \in \mathcal{N}} \Phi^\iota(\pi, \sigma).$$
The policy $\pi^*$ and strategy $\sigma^*$ satisfy the conditions of a Bayesian–Nash equilibrium, holding true for all $\pi$ and $\sigma$, such that
$$\Phi^\iota(\pi^*, \sigma^*) \leq \Phi^\iota(\pi^\iota, \pi^{*-\iota}, \sigma^\iota, \sigma^{*-\iota}),$$
where the policy $\pi^* = (\pi^{*1}, \ldots, \pi^{*n})$ and the strategy $\sigma^* = (\sigma^{*1}, \ldots, \sigma^{*n})$ are called a Bayesian–Nash equilibrium, and where $\pi^{-\iota} = (\pi^1, \ldots, \pi^{\iota-1}, \pi^{\iota+1}, \ldots, \pi^n)$ and $\sigma^{-\iota} = (\sigma^1, \ldots, \sigma^{\iota-1}, \sigma^{\iota+1}, \ldots, \sigma^n)$ [26].
Here is an explanation in words. This mathematical expression describes a game-theoretic framework where a policy π and a strategy σ are optimized to minimize a cost function Φ ι ( π , σ ) . The goal for cost minimization is to find the optimal policy π and strategy σ that minimize the overall cost function Φ ( π , σ ) across a set of entities N . For the Bayesian–Nash equilibrium, the optimal policy π and strategy σ must satisfy the conditions of a Bayesian–Nash equilibrium. This means that no single entity ι can reduce its individual cost Φ ι ( π , σ ) by unilaterally deviating from the optimal policy or strategy, while others maintain their optimal choices. This framework ensures that each entity ι adopts its optimal policy π ι and strategy σ ι , given that all others are also acting optimally ( π ι , σ ι ). The result is a state where no entity has an incentive to change its actions unilaterally, which aligns with the concept of equilibrium in game theory. In summary, the expression defines a systematic approach to optimizing policies and strategies in a multi-agent system, ensuring equilibrium while minimizing the overall and individual costs.
Let us introduce the $\xi$-variable as follows:
$$\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) := \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota),$$
such that
$$\Xi_{adm}^\iota := \Big\{ \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) \geq 0 \;\Big|\; \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in \Theta_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) = 1;$$
$$\sum_{m_t^\iota \in \Theta_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) = \mu^\iota(\theta_t^\iota) > 0;$$
$$\sum_{m_t^\iota \in \Theta_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \sum_{\theta_t^\iota \in \Theta_t^\iota} \big[ \delta_{\theta_t^\iota \theta_{t+1}^\iota} - F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota) \big]\, \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) = 0;$$
$$\sum_{m_t^\iota \in \Theta_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \sum_{\theta_t^\iota \in \Theta_t^\iota} P^\iota(m_t^\iota \mid \vartheta_t^\iota)\, \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) \geq 0, \; \vartheta_t^\iota \in \Theta_t^\iota \Big\},$$
where $P^\iota(m_t^\iota \mid \theta_t^\iota) = \sigma^\iota(m_t^\iota \mid \theta_t^\iota)^{-1}$ is the inverse, assumed to exist. Then, the policy $\pi^\iota(y_t^\iota \mid m_t^\iota)$ and the strategy $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$ form a Bayesian–Nash equilibrium. In this equilibrium, each player minimizes their expected cost for every $\iota \in \mathcal{N}$:
$$\Phi^\iota(\pi^{*\iota}(y_t^\iota \mid m_t^\iota), \sigma^*(m_t^\iota \mid \theta_t^\iota)) \leq \Phi^\iota(\pi^\iota(y_t^\iota \mid m_t^\iota), \sigma(m_t^\iota \mid \theta_t^\iota)).$$
The variable $\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)$ defined in Equation (3) satisfies the following individual nonlinear optimization problem if and only if
$$\Phi(\pi, \sigma) = \sum_{\iota \in \mathcal{N}} \tilde{\Phi}^\iota(\xi) \to \min_{\xi \in \Xi_{adm}},$$
where
$$\tilde{\Phi}^\iota(\xi) = \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in \Theta_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota) \prod_{\iota \in \mathcal{N}} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)$$
and $\Xi_{adm} = \times_{\iota \in \mathcal{N}} \Xi_{adm}^\iota$.
Remark 1.
We have that
$$\sum_{y_t^\iota \in Y_t^\iota} \pi^\iota(y_t^\iota \mid m_t^\iota) = 1, \quad \sum_{m_t^\iota \in \Theta_t} \sigma(m_t^\iota \mid \theta_t^\iota) = 1, \quad \sum_{\theta_t^\iota \in \Theta_t^\iota} \mu^\iota(\theta_t^\iota) = 1.$$
The solution of the problem (4) is denoted by $\xi^*$.
The variables $\pi^\iota(y_t^\iota \mid m_t^\iota)$ can be derived from $\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)$ in the following way:
$$\pi^\iota(y_t^\iota \mid m_t^\iota) = \frac{1}{|\Theta_t^\iota|} \sum_{\theta_t^\iota \in \Theta_t^\iota} \frac{\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)}{\sum_{y_t^\iota \in Y_t^\iota} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)}.$$
To retrieve $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$, for each player $\iota \in \mathcal{N}$, the relevant quantity is defined as follows:
$$\sigma^\iota(m_t^\iota \mid \theta_t^\iota) = \frac{\sum_{y_t^\iota \in Y_t^\iota} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)}{\sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)}.$$
Likewise, for the distribution $\mu^\iota(\theta_t^\iota)$, we have
$$\mu^\iota(\theta_t^\iota) = \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) > 0.$$
The formulas that minimize Equation (4) with respect to the variables $\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)$ have been determined, along with those needed to recover the strategy $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$, the policy $\pi^\iota(y_t^\iota \mid m_t^\iota)$, and the distribution $\mu^\iota(\theta_t^\iota)$. When players use a reporting strategy $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$, they are minimizing the expected cost described by Equation (4). The calculated strategy profile $\sigma^\iota(m_t^\iota \mid \theta_t^\iota)$ and policy $\pi^\iota(y_t^\iota \mid m_t^\iota)$ constitute a Bayesian–Nash equilibrium.
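The recovery of the policy, strategy, and prior from the joint $\xi$-variable amounts to simple marginalization, and can be checked on a toy instance. The 2×2×2 numerical values below are assumptions chosen for illustration; the point is that building $\xi$ from known factors and then applying the recovery formulas returns those factors exactly.

```python
# Sketch of the recovery formulas on a hypothetical 2x2x2 example: given
# xi(theta, m, y) = pi(y|m) * sigma(m|theta) * mu(theta), recover the
# policy, strategy, and prior by marginalisation. All numbers are assumed.
pi    = {0: [0.7, 0.3], 1: [0.2, 0.8]}   # pi(y|m)
sigma = {0: [0.6, 0.4], 1: [0.1, 0.9]}   # sigma(m|theta)
mu    = [0.5, 0.5]                        # mu(theta)

xi = {t: {m: [pi[m][y] * sigma[t][m] * mu[t] for y in (0, 1)]
          for m in (0, 1)} for t in (0, 1)}

# mu(theta) = sum_m sum_y xi(theta, m, y)
mu_rec = [sum(xi[t][m][y] for m in (0, 1) for y in (0, 1)) for t in (0, 1)]

# sigma(m|theta) = sum_y xi(theta, m, y) / sum_{m,y} xi(theta, m, y)
sigma_rec = {t: [sum(xi[t][m]) / mu_rec[t] for m in (0, 1)] for t in (0, 1)}

# pi(y|m) = (1/|Theta|) sum_theta xi(theta, m, y) / sum_y xi(theta, m, y)
pi_rec = {m: [sum(xi[t][m][y] / sum(xi[t][m]) for t in (0, 1)) / 2
              for y in (0, 1)] for m in (0, 1)}

print(mu_rec, sigma_rec, pi_rec)
```

Because each inner ratio $\xi/\sum_y \xi$ equals $\pi^\iota(y \mid m)$ independently of $\theta$, the averaging over $\Theta$ in the policy formula is exact here, not just an approximation.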

3. No-Regret Strategy

To tackle this issue, we propose modeling the state-value function $\Phi^\iota$ using a linear framework, defined in terms of $\pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)$. The aim is to develop a strategy that strictly minimizes the trajectory value. Accordingly, $\Phi^\iota$ is formulated in a recursive matrix structure, facilitating efficient computations and in-depth analysis across various states and control parameters. This formulation optimizes the policy by enhancing both performance and decision-making, providing a systematic approach for achieving optimal control in dynamic systems [25].

3.1. The Cost Function

The cost function $\Phi^\iota$ for any fixed strategies $\pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)$ is defined over all possible state–action combinations. It represents the expected value of taking action $y_t^\iota$ and sending a message $m_t^\iota$ considering state $\theta_t^\iota$, and subsequently following $\pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)$. This function enables the evaluation of the long-term cost associated with a particular strategy. The $\Phi^\iota$-values for all the states in open format can be expressed as
$$\Phi^\iota(\pi, \sigma) = \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota)$$
$$= \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} r^\iota(y_t^\iota, \theta_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota),$$
where $r^\iota(y_t^\iota, \theta_t^\iota) = \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)$. The expected value satisfies
$$\mathbb{E}(\Phi_{t+1}^\iota(\pi, \sigma)) = \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} r^\iota(y_t^\iota, \theta_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota),$$
such that $\varphi^\iota(y_t^\iota, \theta_t^\iota)$ is a loss value at type $\theta_t^\iota$ when the action $y_t^\iota$ is applied (without loss of generality it can be assumed to be positive), and $\mu^\iota(\theta_t^\iota)$, for any given $\mu^\iota(\theta_0^\iota)$, is defined as follows:
$$\mu^\iota(\theta_{t+1}^\iota) = \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota)$$
or, in matrix format,
$$\mu_{t+1}^\iota = (F_t^\iota)^\top \mu_t^\iota, \qquad F_t^\iota := \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota).$$
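The matrix-format update of the type distribution can be sketched directly. The induced kernel below is an assumed 2×2 example (not taken from the paper): iterating the push-forward drives $\mu_t$ toward the stationary distribution of the induced chain, which is what the ergodicity assumption guarantees.

```python
# Minimal sketch (numbers assumed) of the state-distribution update
# mu_{t+1} = F_t^T mu_t, where F_t[i][j] is the probability of moving
# from type i to type j under the induced policy pi(y|m) sigma(m|theta).
F_t = [[0.65, 0.35],
       [0.40, 0.60]]
mu = [0.5, 0.5]

def push_forward(F, mu):
    """One step of mu_{t+1}[j] = sum_i F[i][j] * mu[i] (i.e. F^T mu)."""
    n = len(mu)
    return [sum(F[i][j] * mu[i] for i in range(n)) for j in range(n)]

for _ in range(100):
    mu = push_forward(F_t, mu)
print(mu)  # approaches the stationary distribution of the induced chain
```

For this assumed kernel the stationary distribution solves $p_0 = 0.65\,p_0 + 0.40\,(1 - p_0)$, giving $p_0 = 0.4/0.75 \approx 0.533$.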
Remark 2.
We will suppose that
$$\varphi^\iota(y_t^\iota, \theta_t^\iota) > 0, \quad \text{for all } \iota \in \mathcal{N}.$$
Considering
$$\mathbb{E}(\Phi_{t+1}^\iota(\pi, \sigma)) = \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} r^\iota(y_t^\iota, \theta_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota)$$
$$= \sum_{t \in T} \sum_{\theta_t^\iota \in \Theta_t^\iota} \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \big( \varphi^\iota(y_t^\iota, \theta_t^\iota) + c \big)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)\, \mu^\iota(\theta_t^\iota) - c,$$
the minimization of the state-value function $\mathbb{E}(\Phi_{t+1}^\iota(\pi, \sigma))$ is equivalent to the minimization of the function $\mathbb{E}(\tilde{\Phi}_{t+1}^\iota(\pi, \sigma))$, where
$$\tilde{\Phi}_{t+1}^\iota(\pi, \sigma) = \Phi_{t+1}^\iota(\pi, \sigma) + c,$$
which is strictly positive if we take
$$c > \max_{y_t^\iota \in Y_t^\iota,\, \theta_t^\iota \in \Theta_t^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota).$$
In a vector format, Formula (8) can be expressed as
$$\mathbb{E}(\Phi_{t+1}^\iota(\pi, \sigma)) = \langle R_t^\iota, \mu_t^\iota \rangle,$$
where
$$R_t^\iota := \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} r^\iota(y_t^\iota, \theta_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota),$$
$$r^\iota(y_t^\iota, \theta_t^\iota) := \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota),$$
$$U_t^\iota := \sum_{m_t^\iota \in M_t^\iota} \sum_{y_t^\iota \in Y_t^\iota} F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)\, \pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota).$$
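The cost-shift argument used above (adding a constant $c$ to every one-step cost) can be checked on a toy instance: shifting all costs by the same constant leaves the minimizing action unchanged while making the costs strictly positive. The action labels and cost values below are illustrative assumptions; the shift here uses $c = \max|\varphi| + 1$ so that positivity holds even if some toy cost is negative.

```python
# Sketch of the cost-shift argument: adding a constant c to every
# one-step cost preserves the argmin while making all costs strictly
# positive. Numbers below are illustrative assumptions.
costs = {"a": -2.0, "b": 1.5, "c": 0.3}   # phi(y, theta) for a fixed theta

c = max(abs(v) for v in costs.values()) + 1.0  # guarantees phi + c > 0
shifted = {y: v + c for y, v in costs.items()}

best_original = min(costs, key=costs.get)
best_shifted  = min(shifted, key=shifted.get)

print(best_original, best_shifted, all(v > 0 for v in shifted.values()))
```

Since $\sum_y \kappa(y) = 1$ for any randomized choice, the expected shifted cost is the expected original cost plus $c$, so the two minimization problems share the same solutions.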
Remark 3.
Lyapunov theory plays a critical role in ensuring the stability of dynamic systems by providing conditions under which a system remains in equilibrium. Beyond stability, Lyapunov theory also offers a pathway for these systems to converge toward a Nash equilibrium, where all participants optimize their strategies. This dual function—guaranteeing both stability and convergence to optimal outcomes—enhances the practical applicability of the framework. It ensures that not only will the system remain stable over time, but it will also naturally evolve toward a state of optimal strategic interactions. This makes the framework robust and suitable for real-world applications requiring both stability and optimality.

3.2. Optimal Stationary Strategy

Let us first introduce the following lemma.
Lemma 1.
Any nonstationary strategy $\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)$ cannot improve the average cost obtained by employing an optimal stationary strategy $\xi^\iota(\theta^\iota, m^\iota, y^\iota)$ defined on a compact set. An optimal solution exists.
Proof. 
By the Weierstrass theorem, any nonstationary bounded strategy $\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota)$, defined on a compact set, necessarily contains a convergent subsequence $\{\xi^\iota(\theta_{t_k}^\iota, m_{t_k}^\iota, y_{t_k}^\iota)\}_k$ that satisfies the following relationships:
$$\liminf_{t \to \infty} \frac{1}{t} \sum_{n=1}^{t} \tilde{\Phi}^\iota(\xi^\iota(\theta_n^\iota, m_n^\iota, y_n^\iota)) \geq \liminf_{t \to \infty} \frac{1}{t} \sum_{n=1}^{t} \min_{k=1,\ldots,n} \tilde{\Phi}^\iota(\xi^\iota(\theta_k^\iota, m_k^\iota, y_k^\iota)) \geq \liminf_{t \to \infty} \frac{1}{t} \sum_{n=1}^{t} \liminf_{k \to \infty} \tilde{\Phi}^\iota(\xi^\iota(\theta_{t_k}^\iota, m_{t_k}^\iota, y_{t_k}^\iota)) = \tilde{\Phi}^\iota(\xi^\iota(\theta^{*\iota}, m^{*\iota}, y^{*\iota})),$$
where $\tilde{\Phi}^\iota(\xi^\iota(\theta^\iota, m^\iota, y^\iota))$ is assumed to be a monotonically decreasing functional with respect to each component $\xi^\iota(\theta^\iota, m^\iota, y^\iota)$ when the others are held constant. Additionally,
$$\liminf_{k \to \infty} \tilde{\Phi}^\iota(\xi^\iota(\theta_{t_k}^\iota, m_{t_k}^\iota, y_{t_k}^\iota)) := \tilde{\Phi}^\iota(\xi^\iota(\theta^{*\iota}, m^{*\iota}, y^{*\iota})).$$
This lower bound is achieved by setting
$$\xi^\iota(\theta_t^\iota, m_t^\iota, y_t^\iota) = \xi^\iota(\theta^{*\iota}, m^{*\iota}, y^{*\iota}),$$
because
$$\liminf_{n \to \infty} \frac{1}{n} \sum_{h=1}^{n} \tilde{\Phi}^\iota(\xi^\iota(\theta^{*\iota}, m^{*\iota}, y^{*\iota})) = \tilde{\Phi}^\iota(\xi^\iota(\theta^{*\iota}, m^{*\iota}, y^{*\iota})). \qquad \square$$
Remark 4.
It is essential to highlight that this property holds exclusively for average cost functionals defined on compact sets of strategies.
Proposition 1.
Let $S$ be the unit simplex in $\mathbb{R}^{|Y_t^\iota|}$, that is,
$$S = \Big\{ \kappa \in \mathbb{R}^{|Y_t^\iota|} \;\Big|\; \sum_{y_t^\iota \in Y_t^\iota} \kappa(y_t^\iota) = 1, \; \kappa(y_t^\iota) \geq 0 \Big\}.$$
Then,
$$\min_{\kappa \in S} \sum_{y_t^\iota \in Y_t^\iota} g(y_t^\iota)\, \kappa(y_t^\iota) = \min_{y_t^\iota \in Y_t^\iota} g(y_t^\iota) = g(e),$$
such that the minimum is achieved at least for $\kappa = (0, 0, \ldots, 0, \underset{e}{1}, 0, \ldots, 0)$.
Proof. 
We have that
$$\sum_{y_t^\iota \in Y_t^\iota} g(y_t^\iota)\, \kappa(y_t^\iota) \geq \sum_{y_t^\iota \in Y_t^\iota} \Big( \min_{y_t^\iota \in Y_t^\iota} g(y_t^\iota) \Big) \kappa(y_t^\iota) = \min_{y_t^\iota \in Y_t^\iota} g(y_t^\iota) \sum_{y_t^\iota \in Y_t^\iota} \kappa(y_t^\iota) = \min_{y_t^\iota \in Y_t^\iota} g(y_t^\iota) = g(e),$$
and the equality is achieved at least for $\kappa = (0, 0, \ldots, 0, \underset{e}{1}, 0, \ldots, 0)$. $\square$
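Proposition 1 can be checked numerically: no randomized mixture over actions beats the vertex that puts all mass on the minimizing index. The cost vector $g$ below is an assumed example, and random simplex points stand in for arbitrary mixtures.

```python
import random

# Sketch of Proposition 1: over the probability simplex, the expected
# cost sum_y g(y) * kappa(y) is minimised by putting all mass on the
# minimising index e, so the minimum equals min_y g(y). Numbers assumed.
g = [3.0, 1.2, 4.7, 2.5]
e = min(range(len(g)), key=g.__getitem__)   # vertex achieving min g

rng = random.Random(1)
def random_simplex_point(n, rng):
    """A random point of the unit simplex (normalised positive weights)."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# No random mixture does better than the vertex kappa = (0,...,1_e,...,0).
worst_gap = min(
    sum(gi * ki for gi, ki in zip(g, random_simplex_point(len(g), rng)))
    for _ in range(1000)
) - g[e]
print(e, worst_gap)  # worst_gap stays nonnegative
```

This is exactly why the local-optimal policy in the next subsection can be taken pure: a linear functional over the simplex always attains its minimum at a vertex.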
As a result, it holds that
$$\Psi_{t+1}^\iota = \langle R_t^\iota, \mu_t^\iota \rangle = \sum_{\theta_t^\iota \in \Theta_t^\iota} R_t^\iota\, \mu_t^\iota \geq \sum_{\theta_t^\iota \in \Theta_t^\iota} \min_{\pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)} R_t^\iota\, \mu_t^\iota = \sum_{\theta_t^\iota \in \Theta_t^\iota} \mu_t^\iota \min_{y_t^\iota \in Y_t^\iota} \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota).$$
At this point, let us introduce the following general definition of a Lyapunov-like function. Given a fixed history of the process $\mu_0^\iota, \pi^\iota(y_0^\iota \mid m_0^\iota), \sigma^\iota(m_0^\iota \mid \theta_0^\iota), \ldots, \pi^\iota(y_t^\iota \mid m_t^\iota), \sigma^\iota(m_t^\iota \mid \theta_t^\iota)$, let
$$\min_{\pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota)} V_{t+1}^\iota = \sum_{\theta_t^\iota \in \Theta_t^\iota} \mu_0^\iota(\theta_t^\iota) \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^{*\iota}, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^{*\iota}).$$
The identity in (10) is achieved for the pure and stationary local-optimal policy
$$\pi^\iota(y_t^\iota \mid m_t^\iota)\, \sigma^\iota(m_t^\iota \mid \theta_t^\iota) = \kappa\big( y_t^{*\iota}(\theta_t^\iota), \theta_t^\iota \big), \quad t = 0, 1, \ldots,$$
where $\kappa\big( y_t^{*\iota}(m_t^\iota), m_t^\iota \big)$ is the Kronecker symbol and $y_t^{*\iota}(m_t^\iota)$ is an index for which
$$\sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^{*\iota}, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^{*\iota}) \leq \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota) := r^\iota(y_t^\iota, \theta_t^\iota), \quad y_t^\iota \in Y_t^\iota.$$
As a result, we can state the following lemma.
Lemma 2.
Given a fixed local-optimal policy, the $V$-values for all state–action pairs from (8) in the recursive matrix format become
$$V_{t+1}^\iota = \langle R_t^{*\iota}, \mu_t^\iota \rangle,$$
where
$$R_t^{*\iota} := \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^{*\iota}, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^{*\iota}) = \min_{y_t^\iota \in Y_t^\iota} r^\iota(y_t^\iota, \theta_t^\iota),$$
$$r^\iota(y_t^\iota, \theta_t^\iota) = \sum_{\theta_{t+1}^\iota \in \Theta_{t+1}^\iota} \varphi^\iota(y_t^\iota, \theta_t^\iota)\, F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota).$$
Remark 5.
Under the local-optimal strategy (11), the probability state vector $\mu_t^\iota$ satisfies the following relation:
$$\mu_{t+1}^\iota = (F_t^{*\iota})^\top \mu_t^\iota = \big( (F^{*\iota})^\top \big)^{t+1} \mu_0^\iota,$$
where
$$F^{*\iota} := \sum_{y_t^\iota \in Y_t^\iota} F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^\iota)\, \kappa\big( y_t^{*\iota}(\theta_t^\iota), \theta_t^\iota \big) = F^\iota(\theta_{t+1}^\iota \mid \theta_t^\iota, y_t^{*\iota}).$$
Lemma 3.
The function V t ι is a Lyapunov-like function (monotonically decreasing in time) that reaches an equilibrium point.
Proof. 
The function V t ι satisfies the following properties:
  • According to Remark 2, V t ι > 0 .
  • According to Equations (10) and (11), Δ V t ι = V t + 1 ι V t ι < 0 .
  • V t ι = 0 at the Lyapunov equilibrium point. It results straightforwardly from the following:
    ι N min π ι ( y t ι | m t ι ) σ ι ( m t ι | θ t ι ) V t + 1 ι = ι N θ t ι Θ t ι μ 0 ι θ t ι θ t + 1 ι Θ t + 1 ι φ ι ( y t ι , θ t ι ) F ι ( θ t + 1 ι | θ t ι , y t ι ) = ι N t T θ t ι Θ t ι m t ι M t ι y t ι Y t ι r ι ( y t ι , θ t ι ) π ι ( y t ι | m t ι ) σ ι ( m t ι | θ t ι ) μ ι ( θ t ι ) × i ι N π i ( x t i | m t i ) σ i ( m t i | θ t i ) μ i ( θ t i ) = ι N t T min π ι ( y t ι | m t ι ) σ ι ( m t ι | θ t ι ) m t ι M t ι y t ι Y t ι π ι ( y t ι | m t ι ) θ t ι Θ t ι r ι ( y t ι , θ t ι ) σ ι ( m t ι | θ t ι ) μ ι ( θ t ι ) × i ι N π i ( x t i | m t i ) σ i ( m t i | θ t i ) μ i ( θ t i ) < ι N t T m t ι M t ι y t ι Y t ι π ι ( y t ι | m t ι ) θ t ι Θ t ι r ι ( y t ι , θ t ι ) σ ι ( m t ι | θ t ι ) μ ι ( θ t ι ) × i ι N π i ( x t i | m t i ) σ i ( m t i | θ t i ) μ i ( θ t i ) = ι N Φ t ι ( π ι , π ι , σ ι , σ ι )
    Hence,
$$\sum_{\iota\in N}\left[\Phi_t^{\iota}\left(\pi^{*},\sigma^{*}\right) - \Phi_t^{\iota}\left(\pi^{\iota},\pi^{-\iota},\sigma^{\iota},\sigma^{-\iota}\right)\right] < 0.$$
Given that the inequality in Equation (14) is valid for all admissible strategies $\pi$ and $\sigma$, it remains valid when $\pi^{i} = \pi^{*,i}$ and $\sigma^{i} = \sigma^{*,i}$ for $i \neq \iota$, yielding
$$\Phi_t^{\iota}\left(\pi^{*},\sigma^{*}\right) - \Phi_t^{\iota}\left(\pi^{\iota},\pi^{*,-\iota},\sigma^{\iota},\sigma^{*,-\iota}\right) < 0.$$
Then, Equation (1) holds at $(\pi^{*},\sigma^{*})$, and thus $V_t^{\iota} = 0$. □

4. The Three-Level Manipulation Model

The manipulation model is a game in which the manipulating players move first and then the manipulated players move sequentially [27].

4.1. The Stackelberg Game

To simplify the descriptions below, let us introduce the new variables
$$\alpha := \operatorname{col}\left(\xi^{l}\right), \quad A_{adm} := \Xi^{l}, \quad l = \overline{1,n}; \qquad \beta := \operatorname{col}\left(\xi^{f}\right), \quad B_{adm} := \Xi^{f}, \quad f = \overline{1,m},$$
where $\operatorname{col}(\cdot)$ transforms a matrix into vector format.
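A minimal sketch of the $\operatorname{col}(\cdot)$ operator, assuming the usual column-stacking (vec) convention; the matrix values are arbitrary illustrations.

```python
# col(.) stacks the columns of a matrix into a single vector
# (the standard vec convention is assumed here).
def col(M):
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

col([[1, 2],
     [3, 4]])    # column-stacking gives [1, 3, 2, 4]
```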
Consider a game involving manipulating players, where each manipulator's strategy is denoted by $\alpha^{l}\in A_{adm}^{l}$ and $\alpha^{-l}\in A_{adm}^{-l}$ represents the strategies of the other manipulating players, $\alpha = (\alpha^{l},\alpha^{-l})$. Similarly, manipulated players have strategies $\beta^{f}\in B_{adm}^{f}$, and $\beta^{-f}\in B_{adm}^{-f}$ denotes the strategies of the other manipulated players, $\beta = (\beta^{f},\beta^{-f})$. The manipulating players anticipate the reactions of the manipulated players, choosing their strategies $\alpha\in A_{adm}$ before the manipulated players respond with their best-reply strategies $\beta\in B_{adm}$.
-
The individual aim of the manipulating players is to find $\alpha^{*,l}\in A_{adm}^{l}$ for fixed $\beta(\alpha^{l},\alpha^{-l})\in B_{adm}$ such that
$$v^{l}\left(\alpha^{l},\alpha^{-l}\mid\beta(\alpha^{l},\alpha^{-l})\right) := \tilde{\Phi}^{l}\left(\alpha^{*,l},\alpha^{-l}\mid\beta(\alpha^{*,l},\alpha^{-l})\right) - \tilde{\Phi}^{l}\left(\alpha^{l},\alpha^{-l}\mid\beta(\alpha^{l},\alpha^{-l})\right) \le \varepsilon, \quad \varepsilon \ge 0,$$
valid for all $\alpha^{l}\in A_{adm}^{l}$, $\alpha^{-l}\in A_{adm}^{-l}$, and all $l=\overline{1,n}$. Below, $\alpha^{*,l}$ will be associated with the $\varepsilon$-Nash equilibrium in a Stackelberg–Nash game.
The individual objectives of the manipulating players can be expressed in a joint format by utilizing the regularized Tanaka function, as demonstrated below:
$$v_{\delta}\left(\alpha^{l},\alpha^{-l}\mid\lambda,\beta(\alpha^{l},\alpha^{-l})\right) := \sum_{l=1}^{n} v_{\delta}^{l}\left(\alpha^{l},\alpha^{-l}\mid\lambda_{l,i},\lambda_{f,j},\beta(\alpha^{l},\alpha^{-l})\right) \le \varepsilon$$
for all $\alpha\in A_{adm}$, $\lambda = [\lambda_{l,i},\lambda_{f,j}]$, $l=\overline{1,n}$, $f=\overline{1,m}$, $i=\overline{1,N}$, $j=\overline{1,N}$, and
$$v_{\delta}^{l}\left(\alpha^{l},\alpha^{-l}\mid\lambda_{l,i},\lambda_{f,j},\beta(\alpha^{l},\alpha^{-l})\right) = L_{\delta}^{l}\left(\alpha^{*}\mid\lambda_{l,i},\lambda_{f,j},\beta(\alpha^{*,l},\alpha^{-l})\right) - L_{\delta}^{l}\left(\alpha\mid\lambda_{l,i},\lambda_{f,j},\beta(\alpha^{l},\alpha^{-l})\right)$$
where $L_{\delta}^{l}\left(\alpha\mid\lambda_{l,i},\lambda_{f,j},\beta(\alpha^{l},\alpha^{-l})\right)$ denotes the Lagrange function.
The function v δ ( α l , α l | λ , β ( α l , α l ) ) is known as the regularized Tanaka function within the context of Stackelberg–Nash games. When both δ = 0 and λ = 0 , it corresponds to the original Tanaka function as outlined in [28].
-
The individual aim of the manipulated players is to find $\beta^{*,f}\in B_{adm}^{f}$ for a fixed $\alpha(\beta^{f},\beta^{-f})\in A_{adm}$ such that
$$u^{f}\left(\beta^{f},\beta^{-f}\mid\alpha(\beta^{f},\beta^{-f})\right) := U^{f}\left(\beta^{*,f},\beta^{-f}\mid\alpha(\beta^{*,f},\beta^{-f})\right) - U^{f}\left(\beta^{f},\beta^{-f}\mid\alpha(\beta^{f},\beta^{-f})\right) \le \varepsilon, \quad \varepsilon\ge 0,$$
valid for all $\beta^{f}\in B_{adm}^{f}$, $\beta^{-f}\in B_{adm}^{-f}$, and all $f=\overline{1,m}$. Below, $\beta^{*,f}$ will be associated with the $\varepsilon$-Nash equilibrium in a Stackelberg–Nash game.
The individual objectives of the manipulated players can be expressed jointly through the regularized Tanaka function, as outlined below:
$$u_{\delta}\left(\beta^{f},\beta^{-f}\mid\lambda,\alpha(\beta^{f},\beta^{-f})\right) := \sum_{f=1}^{m} u_{\delta}^{f}\left(\beta^{f},\beta^{-f}\mid\lambda_{l,i},\lambda_{f,j},\alpha(\beta^{f},\beta^{-f})\right) \le \varepsilon$$
for all $\beta\in B_{adm}$, $\lambda=[\lambda_{l,i},\lambda_{f,j}]$, $l=\overline{1,n}$, $f=\overline{1,m}$, $i\in\Theta_t$, $j\in\Theta_t$, and
$$u_{\delta}^{f}\left(\beta^{f},\beta^{-f}\mid\lambda_{l,i},\lambda_{f,j},\alpha(\beta^{f},\beta^{-f})\right) = L_{\delta}^{f}\left(\beta^{*,f}\mid\lambda_{l,i},\lambda_{f,j},\alpha(\beta^{*,f},\beta^{-f})\right) - L_{\delta}^{f}\left(\beta^{f}\mid\lambda_{l,i},\lambda_{f,j},\alpha(\beta^{f},\beta^{-f})\right)$$
In this context, the equilibrium concept for games is defined as Nash equilibrium for simultaneous play scenarios and Stackelberg equilibrium for games with a hierarchical structure.
Definition 1.
(Stackelberg Equilibrium) In a game featuring $n$ leaders, a strategy $\alpha^{*}\in A_{adm}$ is considered a Stackelberg–Nash equilibrium strategy for the leaders if
$$\max_{\beta\in\zeta(\alpha^{*})} v_{\delta}\left(\alpha^{*}\mid\beta\right) \le \max_{\beta\in\zeta(\alpha)} v_{\delta}\left(\alpha\mid\beta\right)$$
such that
$$\zeta(\alpha) = \left\{\beta\in B_{adm} \mid u_{\delta}\left(\beta\mid\alpha\right) \le u_{\delta}\left(\tilde{\beta}\mid\alpha\right)\ \forall\,\tilde{\beta}\in B_{adm}\right\}$$
is the best-reply strategy set of the followers.
The previously defined Stackelberg equilibrium can be reformulated for the followers by replacing the set $\zeta(\alpha)$ with the set of Nash equilibria. In this context, if the leaders adopt the strategy $\alpha^{*}$, the optimal response of the followers is a Nash equilibrium: given $\alpha^{*}$, the followers have optimized their own strategies and cannot improve their outcomes by unilaterally changing their decisions, as doing so would lead to a less favorable result. This adjustment transforms the dynamics of the game, since the followers must consider not only their immediate responses but also the broader implications of their strategies within the Nash equilibrium. To expand upon this, we can envision a multi-level decision-making environment in which the leaders, acting as primary decision-makers, set the stage for the followers' reactions. By defining their strategy $\alpha^{*}$, the leaders create a framework within which the followers must operate; recognizing the leaders' strategic choices, the followers analyze their own positions and responses. In essence, their responses are not isolated actions but part of a larger strategic equilibrium that includes the leaders' moves.
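For intuition, the leader-side condition of Definition 1 can be instantiated on a tiny finite game and solved by enumeration. The two cost tables below are hypothetical and not from the paper; the leader minimizes the worst case over the follower's best-reply set $\zeta(\alpha)$.

```python
# Hypothetical leader cost v[a][b] and follower cost u[a][b]
# for two leader strategies a and two follower strategies b.
v = [[3, 1],
     [2, 4]]
u = [[1, 2],
     [2, 1]]

def zeta(a):
    # follower's best-reply set for a fixed leader strategy a
    m = min(u[a])
    return [b for b in range(2) if u[a][b] == m]

# leader picks the strategy minimizing the worst case over zeta(a)
scores = [max(v[a][b] for b in zeta(a)) for a in range(2)]
a_star = min(range(2), key=lambda a: scores[a])
```

With these numbers the follower uniquely best-replies b = 0 against a = 0 and b = 1 against a = 1, so the leader's pessimistic scores are (3, 4) and the equilibrium leader strategy is a = 0.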
Definition 2.
A strategy $\alpha^{*}\in A_{adm}$ of the manipulating players together with the collection $\beta^{*}\in B_{adm}$ of the manipulated players is said to be a Stackelberg–Nash equilibrium ($\varepsilon = 0$) if
$$\left(\alpha^{*},\beta^{*}\right) \in \operatorname{Arg}\max_{\lambda\ge 0}\ \min_{\beta^{f},\beta^{-f}\in B_{adm}}\ \min_{\alpha\in A_{adm}}\left\{ v_{\delta}\left(\alpha\mid\beta\right) \,\middle|\, u_{\delta}\left(\beta^{f},\beta^{-f}\mid\lambda,\alpha(\beta^{f},\beta^{-f})\right) \le 0 \right\}.$$
Using the Lagrange multiplier approach, we can represent (23) as follows:
$$L_{\delta}\left(\alpha,\beta^{f},\beta^{-f},\mu,\lambda\right) \to \max_{\mu,\lambda\ge 0}\ \min_{\beta^{f},\beta^{-f}\in B_{adm}}\ \min_{\alpha\in A_{adm}},$$
where
$$L_{\delta}\left(\alpha,\beta^{f},\beta^{-f},\mu,\lambda\right) = v_{\delta}\left(\alpha\mid\beta\right) + \mu\, u_{\delta}\left(\beta^{f},\beta^{-f}\mid\lambda,\alpha(\beta^{f},\beta^{-f})\right).$$
With $\delta > 0$, the considered functions become strictly convex, which guarantees uniqueness of the solution of the conditional optimization problem (24). Notice also that the Lagrange function in (25) satisfies the saddle-point condition; namely, for all $\alpha\in A_{adm}$ and $\lambda\ge 0$ we have
$$L_{\delta}\left(\alpha^{*},\beta^{*,f},\beta^{*,-f},\mu,\lambda\right) \le L_{\delta}\left(\alpha^{*},\beta^{*,f},\beta^{*,-f},\mu^{*},\lambda^{*}\right) \le L_{\delta}\left(\alpha,\beta^{f},\beta^{-f},\mu^{*},\lambda^{*}\right).$$
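The saddle-point condition can be visualized with a toy primal-dual iteration. The Lagrangian below, $L(x,\lambda)=x^{2}+\lambda(1-x)$, is a hypothetical stand-in, not the paper's $L_{\delta}$: gradient descent in the primal variable and projected gradient ascent in the dual converge to the saddle point.

```python
# Toy Lagrangian L(x, lam) = x**2 + lam*(1 - x): minimize x**2 subject
# to x >= 1.  KKT gives the saddle point (x, lam) = (1, 2).
x, lam, step = 0.0, 0.0, 0.1
for _ in range(3000):
    gx = 2 * x - lam                      # dL/dx   (descend)
    glam = 1 - x                          # dL/dlam (ascend)
    x -= step * gx
    lam = max(0.0, lam + step * glam)     # project lam onto lam >= 0
# the iterates approach the saddle point (1, 2)
```

The linearized update map has spectral radius 0.9 here, so a few thousand iterations reach the saddle point to high accuracy; this mirrors, in miniature, the max-min structure of (24).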

4.2. Tri-Level Manipulation Game

The costs associated with the different hierarchical roles are assumed to be distributed as follows: the outermost manipulators incur a cost $v_{\delta}$, the players being manipulated bear a cost $u_{\delta}$, and the innermost manipulators are assigned a cost $w_{\delta}$. These costs are modeled as Lyapunov-like functions.
Tri-level game problems have the following general form
$$
\begin{aligned}
&\min_{\alpha\in A_{adm}} v_{\delta}\left(\alpha\mid\beta,\gamma\right) \quad \text{s.t.}\ \phi(\alpha,\beta,\gamma)\le 0,\ \ \beta,\gamma\in\zeta(\alpha)\\
&\min_{\beta\in B_{adm}} u_{\delta}\left(\beta\mid\alpha,\gamma\right) \quad \text{s.t.}\ \phi(\alpha,\beta,\gamma)\le 0,\ \ \alpha,\gamma\in\zeta(\beta)\\
&\min_{\gamma\in A_{adm}} w_{\delta}\left(\gamma\mid\alpha,\beta\right) \quad \text{s.t.}\ \phi(\alpha,\beta,\gamma)\le 0,\ \ \alpha,\beta\in\zeta(\gamma)
\end{aligned}
$$
where $\zeta(\alpha)$, $\zeta(\beta)$, and $\zeta(\gamma)$ are the solution sets of the corresponding problems. The complementary variables are held fixed; for instance, when computing $\min_{\alpha\in A_{adm}} v_{\delta}(\alpha\mid\beta,\gamma)$, the variables $\beta$ and $\gamma$ are fixed.
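The general tri-level form can be sketched as cyclic best responses over small finite strategy sets. All three cost functions below are hypothetical, chosen only to exhibit convergence to a fixed point; the paper's actual scheme works on the regularized Lagrangian formulation instead.

```python
# Cyclic best responses across three levels over binary strategy sets.
# v, u, w are assumed toy costs, not the paper's v_delta, u_delta, w_delta.
A = B = C = [0, 1]

def v(a, b, c): return (a - b) ** 2      # outer manipulators' cost
def u(a, b, c): return (b - c) ** 2      # manipulated players' cost
def w(a, b, c): return c                 # inner manipulators' cost

a, b, c = 1, 1, 1
for _ in range(20):                      # iterate level-by-level
    a = min(A, key=lambda x: v(x, b, c))
    b = min(B, key=lambda x: u(a, x, c))
    c = min(C, key=lambda x: w(a, b, x))
# at a fixed point, no level can improve given the other two
```

Starting from (1, 1, 1), the inner level first moves c to 0, the middle level follows, and the outer level matches, so the iteration settles at (0, 0, 0) after three sweeps, where no level can improve unilaterally.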
At the initial stage of this framework, the manipulating players choose a manipulation strategy $\alpha\in A_{adm}$. In response, at the second stage, the manipulated players react by selecting a best-reply strategy $\beta\in B_{adm}$. In the final stage, the innermost manipulating players, after considering both the response strategy $\beta$ and the manipulation strategy $\alpha$, steer the system by selecting a strategy $\gamma\in A_{adm}$ that persuades the manipulated players.
At the highest level, the strategy α A a d m represents the approach taken by the manipulating players who initiate the game by choosing a manipulation tactic α . This strategy may involve actively influencing the manipulated players. By controlling α A a d m , the manipulating players aim to allocate resources in a way that builds an infrastructure system as resilient to manipulation as possible. The set A a d m often encompasses one or more critical resource limitations that govern these actions.
The strategy β B a d m represents the actions of the manipulated players at the middle level of the hierarchical structure. These players observe the manipulative tactics α A a d m and γ A a d m . Their goal is to resist the manipulation from the manipulating players to the greatest extent possible. Best-reply strategies at the middle level function as a defensive mechanism, effectively countering manipulation attempts from higher or lower levels. This counterbalancing dynamic enhances the system’s resilience by neutralizing external influences. As a result, it improves strategic robustness, ensuring stability and adaptability in complex, multi-level decision-making environments.
The strategy of the manipulating players, acting as operators, is reflected at the deepest level through the minimization of w δ ( γ | α , β ) , where they select γ A a d m to reduce the system’s operational costs. It is important to note that the strategy γ can be influenced by both the manipulation strategy α and the response strategy β , which are determined according to u δ ( β | α , γ ) . While the manipulating players may represent different real entities, they all pursue the same manipulation objectives.
Continuous refinement by lower-level players ensures their strategies remain aligned with those of higher-level players. This adaptability is vital for maintaining effectiveness, as it allows the system to respond to changing conditions and evolving strategies within the hierarchy, ensuring that decisions remain optimal and responsive to dynamic environments.
Decision-making at each level generates feedback loops that significantly influence the entire game’s dynamics. These loops create a continual exchange of information and adjustments between levels, ensuring that the system evolves toward stable outcomes. This interconnected process enhances the system’s predictability, allowing for more reliable forecasts and better strategic alignment over time.
The main point of this solution lies in its asymptotic stability, which means that even if players start with different strategies, as the game evolves, they will naturally gravitate toward a stable equilibrium, without needing to regret any past decisions. No matter how the manipulative players attempt to influence the middle level, or how the lower level responds, the structure of the game ensures that, in the long run, stability will prevail.
This concept of stability also ensures that players can adopt no-regret strategies, meaning they will not have to second-guess their moves, because over time, their actions will converge to a point that best suits the entire dynamic.

5. Numerical Example: Oil Market and Regulatory Influence

This example considers controllable Bayesian–Markov chains with Lyapunov-like cost functions to model manipulation in a hierarchical oil market system subject to regulatory influence. The decision-making framework involves players that manipulate each other strategically in a three-level Stackelberg framework.
Consider an oil market system as follows. There are three types of players: (a) outermost manipulating players: a lobbying group representing oil companies, aiming to influence regulatory priorities; (b) middle-level manipulated players: national regulators, balancing energy production, environmental sustainability, and economic growth; (c) innermost manipulating players: oil producers, who adjust production and investment strategies based on regulatory policies so as to persuade the manipulated players.
Consider a market that evolves over discrete types $\{\theta_1,\theta_2,\theta_3,\theta_4\}\subset\Theta_{adm}$, representing oil and environmental conditions: $\theta_1$, high demand, low environmental degradation; $\theta_2$, moderate demand, moderate degradation; $\theta_3$, steady demand, unaltered degradation; and $\theta_4$, low demand, high degradation.
Each player can take three possible actions $\{y_1,y_2,y_3\}\subset Y_{adm}$. The transition probabilities $F^{\iota}(\theta_{t+1}\mid\theta_t, y_t)$ depend on the current type $\theta_t$ and the actions $y_t$ taken by players at all levels: $y_1$, lobbying actions (e.g., media campaigns and policy framing); $y_2$, regulatory decisions (e.g., subsidies and emissions limits); and $y_3$, oil producers' production and investment strategies.
The outermost manipulators, the lobbying group, incur a cost $v_{\delta}$ representing the cost of lobbying activities; the manipulated players, the regulators, bear a cost $u_{\delta}$, optimizing policies based on their perception and penalizing deviations from the environmental and energy-demand equilibrium; and the innermost manipulators, the energy producers, are assigned a cost $w_{\delta}$, adjusting their strategies to minimize profit fluctuations and operational costs. These costs are modeled as Lyapunov-like functions
$$\min_{\pi^{\iota}(y_t^{\iota}\mid m_t^{\iota})\,\sigma^{\iota}(m_t^{\iota}\mid\theta_t^{\iota})} V_{t+1}^{\iota} = \sum_{\theta_t^{\iota}\in\Theta_t^{\iota}} \mu_0^{\iota}\left(\theta_t^{\iota}\right) \sum_{\theta_{t+1}^{\iota}\in\Theta_{t+1}^{\iota}} \varphi^{\iota}\left(y_t^{\iota},\theta_t^{\iota}\right) F^{\iota}\left(\theta_{t+1}^{\iota}\mid\theta_t^{\iota},y_t^{\iota}\right).$$
Each level minimizes a cost function that encourages stability.
The strategic manipulation proceeds as follows:
  • The outermost manipulating players distort the transition probabilities $F^{l}(\theta_{t+1}^{l}\mid\theta_t^{l},y_t^{l})$ into $\hat{F}^{l}(\theta_{t+1}^{l}\mid\theta_t^{l},y_t^{l})$ by (i) sponsoring biased reports exaggerating oil demand and (ii) framing fossil fuels as critical for economic growth. Their goal is to influence the manipulated players' (regulators') cost function $u_{\delta}$.
  • The middle-level manipulated players best reply to the manipulated transition probabilities $\hat{F}^{l}(\theta_{t+1}^{l}\mid\theta_t^{l},y_t^{l})$ by applying $\min_{\pi^{f}(y_t^{f}\mid m_t^{f})\,\sigma^{f}(m_t^{f}\mid\theta_t^{f})}$ to $F^{f}(\theta_{t+1}^{f}\mid\theta_t^{f},y_t^{f})$, implementing policies such as (i) subsidies for fossil fuel production and (ii) relaxed penalties for environmental violations. Their objective is to influence the manipulating players' cost functions $v_{\delta}$ and $w_{\delta}$.
  • The innermost manipulating players again improve their strategies $\hat{F}^{l}(\theta_{t+1}^{l}\mid\theta_t^{l},y_t^{l})$ based on $F^{f}(\theta_{t+1}^{f}\mid\theta_t^{f},y_t^{f})$, leading to (i) increased production exploiting the subsidies and (ii) reduced investment in renewable energy.
Considering
$$F^{\iota}(\cdot\mid\cdot,1) = \begin{bmatrix} 0.8181 & 0.6596 & 0.8003 & 0.0835\\ 0.8175 & 0.5186 & 0.4538 & 0.1332\\ 0.7224 & 0.9730 & 0.4324 & 0.1734\\ 0.1499 & 0.6490 & 0.8253 & 0.3909 \end{bmatrix} \qquad F^{\iota}(\cdot\mid\cdot,2) = \begin{bmatrix} 0.8314 & 0.5269 & 0.2920 & 0.1672\\ 0.8034 & 0.4168 & 0.4317 & 0.1062\\ 0.0605 & 0.6569 & 0.0155 & 0.3724\\ 0.3993 & 0.6280 & 0.9841 & 0.1981 \end{bmatrix}$$
$$F^{\iota}(\cdot\mid\cdot,3) = \begin{bmatrix} 0.4897 & 0.0527 & 0.5479 & 0.3015\\ 0.3395 & 0.7379 & 0.9427 & 0.7011\\ 0.9516 & 0.2691 & 0.4177 & 0.6663\\ 0.9203 & 0.4228 & 0.9831 & 0.5391 \end{bmatrix}$$
and random cost matrices, the resulting optimal strategy of the oil market Stackelberg game is
$$\pi^{\iota}(y_t^{\iota}\mid m_t^{\iota}) = \begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix},$$
whose convergence is shown in Figure 1.
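Note that the rows of the matrices listed above are raw random draws and do not sum to one. The sketch below assumes, as a preprocessing step, that each row is normalized into a proper probability distribution before the kernels are used; this normalization is an assumption, not a step stated in the paper.

```python
# F(:|:,1) as printed in the example; rows are raw values, not yet
# probability distributions over next types.
raw_F1 = [
    [0.8181, 0.6596, 0.8003, 0.0835],
    [0.8175, 0.5186, 0.4538, 0.1332],
    [0.7224, 0.9730, 0.4324, 0.1734],
    [0.1499, 0.6490, 0.8253, 0.3909],
]

def normalize_rows(M):
    # divide each entry by its row sum so every row becomes a distribution
    return [[x / sum(row) for x in row] for row in M]

F1 = normalize_rows(raw_F1)
# every row of F1 now sums to one
```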
The manipulation occurs by transforming $F^{\iota}(\theta_{t+1}^{\iota}\mid\theta_t^{\iota},y_t^{\iota})$ into $\hat{F}^{\iota}(\theta_{t+1}^{\iota}\mid\theta_t^{\iota},y_t^{\iota})$, misleading regulators and energy producers into suboptimal but predictable decisions:
  • Outermost manipulating players: $\hat{F}^{l}(\theta_{t+1}^{l}\mid\theta_t^{l},y_t^{l})$ maximizes influence and profit by ensuring that system inertia favors fossil fuels.
  • Middle-level manipulated players: $F^{f}(\theta_{t+1}^{f}\mid\theta_t^{f},y_t^{f})$ implements suboptimal policies based on the manipulated dynamics.
  • Innermost manipulating players: $\hat{F}^{l}(\theta_{t+1}^{l}\mid\theta_t^{l},y_t^{l})$ exploits short-term gains, worsening long-term environmental and economic sustainability.

6. Conclusions

We proposed a solution to the manipulation game problem by applying Lyapunov theory, demonstrating both the existence and stability of Nash equilibria within the system. Each equilibrium that is asymptotically stable is associated with a Lyapunov-like function, which converges towards a Nash/Lyapunov equilibrium when such an equilibrium is present. This convergence guarantees the system’s robustness, as the asymptotic properties of the function promote no-regret strategies, ensuring players have little incentive to deviate from their chosen course of action in the long run.
Adaptation is crucial for maintaining a strategic advantage in a dynamic environment. The ability to adjust based on observed responses allows the system to stay aligned with its evolving objectives. This ongoing adjustment process ensures that strategies remain relevant, effective, and responsive to changing conditions, fostering long-term success and competitiveness.
The manipulation game is modeled as a hierarchical, three-level Stackelberg framework. At the top level, the manipulating players select a strategy aimed at influencing the middle-level participants. However, the strategy chosen by the manipulators may not always succeed in persuading those on the middle tier. In response, the middle-level players, who are being manipulated, adopt a best-reply strategy that can effectively neutralize or match the manipulation, creating a counterbalancing dynamic. This interaction embodies the core of strategic manipulation, where each player's decisions are interdependent and reactions must be carefully calibrated.
To ensure the success of the manipulation, the player operating at the lowest level of the hierarchy must refine and adjust their strategy in response to the behavior of the manipulated players above them. This bottom-up adjustment is crucial to aligning with the overall manipulation scheme, as it allows the lower-level manipulator to account for and anticipate the reactions of the middle-tier participants. In essence, the success of the manipulation depends not just on the initial strategy but also on continuous adaptation based on observed responses at higher levels of the hierarchy.
This structured approach underscores the complexity of manipulation in multi-tiered strategic games. Each layer of decision-making, whether at the top, middle, or lower level, creates feedback loops that influence the broader dynamics of the game. By using Lyapunov theory to map out stable equilibria, we provide a framework that ensures the system gravitates towards a stable and predictable outcome, even in the face of manipulation. The incorporation of no-regret strategies further reinforces the stability and long-term viability of these equilibria, suggesting that the players' strategic choices will converge to an optimal solution over time. This comprehensive solution offers valuable insights into both the theoretical and practical applications of manipulation in game theory, with implications for economics, cybersecurity, and political strategy.

Funding

This research received no external funding.

Data Availability Statement

All data required for this article are included within this article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Jones, D.N.; Paulhus, D.L. Handbook of Individual Differences in Social Behavior; Chapter Machiavellianism; The Guilford Press: New York, NY, USA, 2009; pp. 93–108. [Google Scholar]
  2. Christie, R.; Geis, F. Studies in Machiavellianism; Academic Press: Cambridge, MA, USA, 2013. [Google Scholar]
  3. Spielberger, C.D.; Butcher, J. Advances in Personality Assessment; Routledge: London, UK, 2013. [Google Scholar]
  4. Geis, F. Dimensions of Personality; Wiley: New York, NY, USA, 1978; Chapter Machiavellianism; pp. 305–363. [Google Scholar]
  5. Allen, F.; Gorton, G. Stock price manipulation, market microstructure and asymmetric information. Eur. Econ. Rev. 1992, 36, 624–630. [Google Scholar] [CrossRef]
  6. Bagnoli, M.; Lipman, B.L. Stock Price Manipulation through Takeover Bids. RAND J. Econ. 1996, 27, 124–147. [Google Scholar] [CrossRef]
  7. Clempner, J.B. A game theory model for manipulation based on Machiavellianism: Moral and ethical behavior. J. Artif. Soc. Soc. Simul. 2017, 20, 12. [Google Scholar] [CrossRef]
  8. Cumming, D.; Ji, S.; Peter, R.; Tarsalewska, M. Market manipulation and innovation. J. Bank. Financ. 2020, 120, 105957. [Google Scholar] [CrossRef]
  9. Clempner, J.B. Learning machiavellian strategies for manipulation in Stackelberg security games. Ann. Math. Artif. Intell. 2022, 90, 373–395. [Google Scholar] [CrossRef]
  10. Clempner, J.B. A Manipulation Game Based on Machiavellian Strategies. Int. Game Theory Rev. 2022, 24, 2150015. [Google Scholar] [CrossRef]
  11. Sanchez-Rabaza, J.; Rocha-Martinez, J.M.; Clempner, J.B. Characterizing Manipulation via Machiavellianism. Mathematics 2023, 11, 4143. [Google Scholar] [CrossRef]
  12. Milgrom, P.; Roberts, J. Relying on the information of interested parties. RAND J. Econ. 1986, 17, 18–32. [Google Scholar] [CrossRef]
  13. Krishna, V.; Morgan, J. A model of expertise. Q. J. Econ. 2001, 116, 747–775. [Google Scholar] [CrossRef]
  14. Taneva, I. Information Design. Am. Econ. J. Microecon. 2019, 11, 151–185. [Google Scholar] [CrossRef]
  15. Bardhi, A.; Guo, Y. Modes of persuasion toward unanimous consent. Theor. Econ. 2018, 13, 1111–1149. [Google Scholar] [CrossRef]
  16. Kamenica, E.; Gentzkow, M. Bayesian Persuasion. Am. Econ. Rev. 2011, 101, 2590–2615. [Google Scholar] [CrossRef]
  17. Bergemann, D.; Morris, S. Information design, Bayesian persuasion, and Bayes correlated equilibrium. Am. Econ. Rev. 2016. [Google Scholar] [CrossRef]
  18. Gentzkow, M.; Kamenica, E. Bayesian persuasion with multiple senders and rich signal spaces. Games Econ. Behav. 2017, 104, 411–429. [Google Scholar] [CrossRef]
  19. Brocas, I.; Carrillo, J.D.; Palfrey, T.R. Information gatekeepers: Theory and experimental evidence. Econ. Theory 2012, 51, 649–676. [Google Scholar] [CrossRef]
  20. Gul, F.; Pesendorfer, W. The war of information. Rev. Econ. Stud. 2012, 79, 707–734. [Google Scholar] [CrossRef]
  21. Eső, P.; Szentes, B. Optimal Information Disclosure in Auctions and the Handicap Auction. Rev. Econ. Stud. 2007, 74, 705–731. [Google Scholar] [CrossRef]
  22. Bergemann, D.; Pesendorfer, M. Information structures in optimal auctions. J. Econ. Theory 2007, 137, 580–609. [Google Scholar] [CrossRef]
  23. Rayo, L.; Segal, I. Optimal information disclosure. J. Political Econ. 2010, 118, 949–987. [Google Scholar] [CrossRef]
  24. Li, H.; Shi, X. Discriminatory Information Disclosure. Am. Econ. Rev. 2017, 107, 3363–3385. [Google Scholar] [CrossRef]
  25. Clempner, J.B.; Poznyak, A.S. Optimization and Games for Controllable Markov Chains: Numerical Methods with Application to Finance and Engineering; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
  26. Harsanyi, J.C.; Selten, R. A generalized Nash solution for two-person bargaining games with incomplete information. Manag. Sci. 1972, 18, 80–106. [Google Scholar] [CrossRef]
  27. Clempner, J.B. A tri-level approach for computing Stackelberg Markov game equilibrium: Computational analysis. J. Comput. Sci. 2023, 68, 101995. [Google Scholar] [CrossRef]
  28. Tanaka, K.; Yokoyama, K. On ϵ-equilibrium point in a noncooperative n-person game. J. Math. Anal. Appl. 1991, 160, 413–423. [Google Scholar] [CrossRef]
Figure 1. Convergence of the policy.
