Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment

Fernando, Xavier; Gupta, Abhishek

doi:10.3390/drones8060238

Open AccessArticle

Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment

by

Xavier Fernando

^†

and

Abhishek Gupta

^*,†

Department of Electrical, Computer and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Drones 2024, 8(6), 238; https://doi.org/10.3390/drones8060238

Submission received: 17 April 2024 / Revised: 23 May 2024 / Accepted: 30 May 2024 / Published: 2 June 2024

(This article belongs to the Special Issue Artificial Intelligence (AI) and Machine Learning (ML) in UAV Technology)

Download

Browse Figures

Versions Notes

Abstract

The paper studies a game theory model to ensure fairness and improve the communication efficiency in an unmanned aerial vehicle (UAV)-assisted cellular vehicle-to-everything (C-V2X) communication network using Markovian game theory in a federated learning (FL) environment. The UAV and each vehicle in a cluster utilized a strategy-based mechanism to maximize their model completion and transmission probability. We modeled a two-stage zero sum Markovian game with incomplete information to jointly study the utility maximization of the participating vehicles and the UAV in the FL environment. We modeled the aggregating process at the UAV as a mixed strategy game between the UAV and each vehicle. By employing Nash equilibrium, the UAV determined the probability of sufficient updates received from each vehicle. We analyzed and proposed decision-making strategies for several representative interactions involving gross data offloading and federated learning. When multiple vehicles enter a parameter transmission conflict, various strategy combinations are evaluated to decide which vehicles transmit their data to the UAV. The optimal payoff in a transmission window is derived using the Karush–Khun–Tucker (KKT) optimality conditions. We also studied the variation in optimal model parameter transmission probability, average packet delay, UAV transmit power, and the UAV–Vehicle optimal communication probabilities under different conditions.

Keywords:

game theory; vehicle selection; federated reinforcement learning; noncooperative game; Nash equilibrium; Markovian game; UAV; C-V2X; drones in vehicular communication networks (D-VCNs)

1. Introduction

In the realm of autonomous driving and intelligent transportation systems (ITSs), the sixth generation (6G) cellular vehicle-to-everything (C-V2X) networks face exponential growth in data transmission requirements [1]. Consequently, unmanned aerial vehicle (UAV)-assisted C-V2X networks are being investigated to meet high throughput, ultra reliability, and low latency requirements [2]. The high mobility and line of sight (LoS) channel characteristics of UAVs offer much-needed performance improvement to C-V2X networks [3]. Therefore, UAV-assisted C-V2X communications are being explored to achieve enhanced coverage, improved reliability, traffic management, emergency response support, and road safety. However, challenges such as UAV battery power and energy consumption, optimal deployment altitude, and integrating intelligent approaches to predict UAV trajectories to cover multiple vehicles need to be further investigated [4].

In the advanced 6G C-V2X communications, hundreds of embedded sensors in the vehicles will collect data pertaining to the vehicular environment, traffic status, weather conditions that impact safe driving, and on-road security incidents. This sensor data are often offloaded to a UAV for processing so that the UAV can facilitate coordination and optimal driving decisions among vehicles. Therefore, the UAV must be capable of collecting, storing, mining, analyzing, and processing data with minimum delay [5]. Furthermore, in UAV-assisted C-V2X communication, vehicle speed and density change continuously over time and location. Moreover, while infotainment applications can tolerate relatively high latency, emergency, mission-critical, and real-time cooperative control messages have stringent delay constraints [6]. Reducing latency is a major research issue, as it involves multiple aspects of the network. A packet’s transmission and processing contribute to the total delay, which impacts the quality of service (QoS) [7].

The total delay (

D

) varies with the type of packet, packet size, transmission time interval (TTI), transmission window characteristics, and the underlying network infrastructure [8]. Additionally, the limited battery power of a drone must be optimally utilized so that the drone can serve a maximum number of vehicles, process maximum sensor data, and restrict its trajectory such that it can maintain LoS communication with a maximum number of vehicles. This is an analytically complex problem, and consequently, drone assisted vehicular communication networks (D-VCNs) are being trained with novel machine learning algorithms for resource allocation and energy management [9]. As more and more novel applications of D-VCNs emerge, it is critical to evaluate the communication performance of these D-VCNs. Therefore, the recent third-generation partnership project (3GPP) and the fifth-generation automotive association (5GAA) standards emphasize the exploitation of machine learning for performance optimization and QoS enhancement in wireless communications [10].

However, since conventional machine learning requires a large amount of training data and may not protect the privacy of users, federated learning (FL), an advanced form of machine learning, is recently gaining momentum. FL enables data collected at the local vehicles to be processed locally and allows for the converged model parameters to be transmitted to UAVs [11]. This model parameter transmission to the UAV instead of gross data transmission has multiple benefits, including protecting the user privacy and saving the channel bandwidth. However, in a heavy traffic urban driving scenario, the UAV needs sufficient data to control the deceleration and acceleration of vehicles in real time.

Since, in the FL environment, the local model parameters from a few randomly selected vehicles are transmitted to the UAV, it may lead to data imbalance and possible collision if both vehicles lack coordinated driving decisions [12]. For the UAV to make an informed decision, all the vehicles must be able to transmit their local model parameters to the UAV, which is too data intensive. A potential solution to achieve this balance is to use game theory that selects the best samples among all the vehicles. Game theory provides a strategy-based approach where multiple vehicles play a game as they compete for bandwidth and UAV computing resources [13].

Moreover, for V2X connectivity in cooperative intelligent transport systems (C-ITS), the 3GPP has elaborated on the new radio vehicle-to-everything (NR-V2X) standard in Release 16 and Release 17. These releases are being progressively enhanced by embedding UAVs in C-V2X communications. The open issues that are expected to be addressed in Release 18 encompass the integration of low complexity machine learning algorithms for performance enhancement of UAV-assisted C-V2X communications [14]. These advances aim to enhance the current state-of-the-art in C-V2X communication standards based on long-term evolution (LTE) and new radio (NR) technology [14]. The authors in [15] have proposed a novel architecture for resource allocation, packet transmission, and vehicle density in V2X communications. The work concluded that the current LTE based networks do not meet the stringent latency requirements in UAV-assisted C-V2X communications, thus leading to severe performance bottlenecks [15]. Variations in the number of vehicles, communication range of a UAV, and packet transmission frequency further impact the average end-to-end (E2E) latency and packet delivery ratio (PDR) [16]. A recent work has demonstrated that the parameters of the communication channel and the data traffic influence the LTE-V2X’s performance in a realistic multiapplication environment [17].

1.1. Example Scenario of a Multiagent UAV-Assisted C-V2X Communication

In order to minimize transmission latency, as well as minimize UAV energy consumption while maximizing the spectral efficiency and UAV’s coverage range, one approach is to optimize the UAV trajectory. The UAV-assisted C-V2X communications can be viewed as a multiagent communication system in which various agents are responsible for making decisions. The agents are positively rewarded for making decisions that maximize their own benefits, while they are punished for bad decisions. This is done through frequent model updates, optimal resource utilization, and low latency. An example scenario of such a multiagent UAV-assisted C-V2X communication is illustrated in Figure 1. Here, each vehicle has a different objective and competes for communication bandwidth and the UAV’s computing resources.

Considering the scenario depicted in Figure 1, an incomplete information game can model the interaction between multiple vehicles and the UAV. Here, vehicle

V_{1}

is involved in a high-priority smart home communication,

V_{2}

is requesting infotainment services,

V_{3}

needs voice services,

V_{5}

is requesting cloud based services, and

V_{6}

needs resources to respond to an urgent email. Vehicles

V_{4}

and

V_{7}

have no priority or urgency but need to be aware of the current traffic status. Some vehicles can communicate directly with the UAV, whereas at other times, they need to communicate through a roadside unit (RSU). The UAV needs to allocate resources fairly to all the vehicles. The UAV also needs to monitor the priority of vehicular communications and allocate resources based on the priority. In this game, each vehicle and the UAV is a player that takes a set of actions based on communication strategy. The player’s payoffs are determined by the system performance, which depends on the strategies of all the players [18]. For instance, if vehicle

V_{1}

decides to overtake

V_{3}

, then the following scenarios arise.

Note that the designated velocity is determined by the UAV, while an alternate velocity is chosen by the individual vehicle. A vehicle’s current decision incurs a positive or negative payoff based on the information it receives from the UAV:

Let both $V_{1}$ and $V_{3}$ follow the designated velocity assigned by the UAV. In this case, both $V_{1}$ and $V_{3}$ accumulate a low positive payoff. The UAV’s payoff also gradually increases for each correct decision communicated to the vehicles.
Vehicle $V_{1}$ decides to take an alternate velocity to overtake $V_{3}$ and later needs to decelerate to avoid collision with $V_{3}$ . The $d e c e l e r a t e$ action leads to a low negative payoff.
Vehicle $V_{1}$ takes an alternate velocity, and $V_{3}$ follows the designated velocity. If there is a need to decelerate to avoid a collision, then $V_{1}$ receives a low negative payoff, while $V_{3}$ receives a zero payoff.
If both $V_{1}$ and $V_{3}$ follow the designated velocity, there is no collision; both players receive a high positive payoff.
If both $V_{1}$ and $V_{3}$ take an alternate velocity, and still there is no collision, each player receives a low positive payoff.
Here, the UAV allocates communication and computing resources to both $V_{1}$ and $V_{3}$ , which results in safe driving and no collision. Thus, the UAV incurs a high positive payoff. A challenge in the scenario is simultaneously monitoring the status of multiple vehicles and deciding resource allocation strategies in a short time.

In Figure 1, each player aims to maximize their payoff by selecting a strategy that leads to an optimal system performance. The system performance depends on the actions of all the players, which leads to the problem of finding the Nash equilibrium [19]. The Nash equilibrium implies a set of strategies where no player can improve their payoff by unilaterally changing their strategy if all other players keep their strategies unchanged. In UAV-assisted C-V2X communications, an incomplete information game can be used to analyze the performance and optimize the communication strategies of individual vehicles to improve the overall system performance [20].

1.2. Contributions

This paper proposes a Markovian game theoretic approach using incomplete information to solve a multiplayer zero sum stochastic game. Each player competes to find the Nash equilibrium to enable multiple vehicles to communicate with the UAV in a transmission window. The zero sum property implies that at a time (t) in a game iteration (

G

), there is only one winning player. Moreover, we apply Federate Reinforcement Learning (FRL) to process sensor data at the vehicles, and the vehicles transmit the processed local model parameters to the UAV [11]. Depending on the payoff associated with a strategy, agents can decide whether to transmit their model parameters or gross sensor data. The global model at the UAV can be improved by incorporating context information, driving location, sensor capabilities, driving environment, and sensor data preferences. To maximize its payoff, the UAV needs to decide which vehicles to accept in a TTI and how to allocate available resources to each participating vehicle. We evaluate a payoff matrix to identify solutions that converge to an optimal mixed Nash equilibrium. Even when there is no saddle point, a pure strategy may exist between multiple players that maximizes the payoffs of the interacting agents. Unlike linear incentive–penalty schemes, our proposed FRL based game theoretic model achieves vehicle–UAV communication with less parameter tuning, lower computational complexity, and lower end-to-end delay compared to the existing works in [10,13,21].

The main contributions of this paper are as follows:

We calculate the FRL model parameter transmission probability of a vehicle in each TTI in scenarios where each vehicle is aware of the transmission probability of other vehicles in the cluster. Then, each vehicle makes a time-bound decision whether to transmit the update or not and allow other vehicles to transmit their local model parameters to the UAV, without incurring negative incentives.
We evaluate the proposed Markovian game theoretic approach by studying the variation in UAV average energy consumption (Joules/s) with the number of vehicles (V) in a single subframe in C-V2X mode 4 [22].
We study the variation in the average packet delay in federated learning scenario. Here, we vary the number of vehicles (V) and the vehicle velocity under different road lengths ( $R_{L}$ ). This is an extension of our previous work in [11], where we demonstrated the behavior of model convergence time for federated averaging. Note that in our previous work in [11], the vehicles in a C-V2X cluster communicated their local model parameters to a static parameter server (PS). In this work, the PS is embedded in a mobile UAV, which introduces challenges pertaining to the mobility of the UAV.
We plot the variation in the optimal model parameter transmission probability $p_{o p t}$ values of the $i^{t h}$ vehicles. Here, we vary the number of vehicles in an iteration of the game ( $G$ ).

1.3. Organization

The rest of the article is structured as follows: Section 2 discusses some recent literature that applied game theoretic approaches to vehicular communications. This section also briefly discusses some recent applications of game theoretic approach in UAV–vehicle communications. Section 3 presents our system model and the UAV–vehicle communication architecture. Section 4 presents our problem formulation, where we formulate the problem of UAV’s resource allocation to vehicles in an orthogonal time frequency space (OTFS)-based channel. Section 5 outlines our proposed solution approach. Section 6 discusses the findings of this work. Section 7 concludes the paper and discusses some avenues for future research.

2. Related Work

2.1. Game Theoretic Approaches in UAV–Vehicle Communications

A few applications and use cases of UAVs in D-VCNs are depicted in Figure 2 and are briefly mentioned below. A detailed list of applications and use cases of UAVs in vehicular communications can be found in [23]:

UAVs can be equipped with a variety of sensors and cameras that will enable image processing to manage parking spaces and roads. It can help reduce traffic congestion, parking shortages, and transportation costs while also reducing air pollution. UAV-mounted vision systems can monitor lane occupation and analyze parking spaces to automate parking space management. Consequently, UAVs can make urban space more manageable with the use of 6G communication technologies [25].
By deploying UAVs to monitor highways and vehicle platoons, fuel consumption can be reduced, traffic flow can be improved, and safety can be improved. Multiple drones can communicate with vehicles in real time to ensure collision avoidance and to adhere to vehicle velocity and mobility restrictions. Machine learning and game theoretic approaches can enhance communication and real-time decision making [26].
The authors in [27] have presented a novel technique to optimize the interactions between vehicles using game theory. In this study, behavioral decision-making was based on noncooperative game theory with incomplete information and complete information for cooperative vehicle platoons. The payoff functions for a noncooperative game take into account the economy, comfort, safety, and autonomous driving of the platoon. To calculate the action probability for different types of vehicles with incomplete information, a belief pool is constructed, which is updated with a Bayesian probability formula based on the driving intention identification. For the potentially conflicting entities, stable strategies are developed, thus ensuring that neither has a motivation to change their driving behavior. The authors demonstrated that platoons can formulate cooperative decision-making approaches to resolve the conflicts among vehicles [27].
Using UAVs to communicate with vehicles, especially in non-line of sight (NLoS) scenarios, requires commercial communication networks, which may experience a service outage in some scenarios. In addition to sharing their locations with ground entities, UAVs must also communicate with each other [28].
The authors in [29] have presented an efficient task forwarding mechanism in search and rescue operations. As part of the task forwarding process of a multiagent system, the authors introduced a reputation mechanism derived from an evolutionary game to improve cooperation rates between agents. This model combines reputation mechanisms with strategy updates in a multiagent system. The model is based on evolutionary game theory, and key factors such as reputation thresholds and the percentage of agents who choose to forward a task are assessed.
In search and rescue operations, UAVs can be utilized to deliver food and medication to passengers in autonomous vehicles stuck in remote or disaster-affected areas. For this, the communication channels must be free of interference, and outages and the weather must be good [30]. Antennas and transceivers can be integrated into UAVs to enhance wireless network coverage. By using C-V2X communication technologies, a communication link can be established between UAVs and vehicles. In a typical communication scenario, a stuck vehicle transmits its location information to a drone, which arrives at the location, captures images, and transmits them to first responders [31]. Based on game theory, the latency and packet drop percentage could be improved to achieve satisfactory system performance [32].
As electric vehicles continue to replace traditional fuel vehicles, UAVs are expected to play a major role in further reducing greenhouse gas emissions and air pollution. As a result of improved charging stations and battery swapping facilities, UAVs can become a major means of goods delivery by 2040. Electric vehicles and wireless charging technologies have been proposed to provide electricity to UAVs in urgent need. Each UAV has its own flight path and needs a charging plan that aligns with load balancing requirements at the charging station [33].
In a multi-UAV network, because the charging stations have limited capacity, strategic charging is dependent on the actions of other UAVs. As a result, the UAV battery charging problem becomes a generalized Nash equilibrium problem. A UAV’s objective in the bidding strategy is to minimize the cost of purchased energy and maximize the priority of the task using a stochastic optimization model. The authors in [34] have presented a strategy for bidding on the load distribution of several plug-in electric vehicles sharing the same charging station. The authors in [35] have proposed a game theoretic solution to a scenario where the deployment of charging stations to meet vehicle’s electricity demands caused load imbalance on the power grid. A game theoretic approach was used to minimize load imbalance at the grid, inefficient resource utilization, and nonoptimal power transaction costs. The solution was designed for optimum charging prices, efficient utilization, and energy conservation at charging stations. Furthermore, a discrete time event simulator was developed to test the proposed scheme on parameters such as arrival rates, queue length, and reactive power.
The authors in [36] have presented an electric vehicle energy management strategy to coordinate efficient utilization of multiple power sources. It uses game theoretic approaches to improve fuel economy and transmission efficiency, as well as a Markov chain-based driver model for predicting vehicle speed. Using a noncooperative game model and Nash equilibrium as the solution, the simulation results show that the proposed strategy improves fuel economy [36].
In advanced futuristic applications, UAVs are increasingly being integrated into smart city initiatives, where security and privacy concerns are important in determining communication strategies. UAV capabilities and their deployment in smart city environments can be enhanced through the use of machine learning and game theory [37].

Most of these 6G applications require autonomous UAV navigation and precise localization, an accurate map of the environment, and the UAV’s location at a given time [38]. Other challenges include complex and dynamic channel models, frequent cell associations, UAV energy constraints, the formation of spectrum-agile ad hoc UAV networks, and legislative requirements. Time-varying channels and the Doppler effect complicate the UAV-to-vehicle channel estimation. The complexity of multiagent decision making increases with the deployment of more UAVs and vehicles. This introduces challenges associated with trajectory planning, uninterrupted user association, and the allocation of resources among multiple agents.

2.2. Application of Game Theory to Assist Vehicles to Make a Coordinated Driving Decision

In a normal form UAV–vehicle communication game, the players, i.e., the UAV and vehicles, make simultaneous decisions. The players have complete knowledge about the other players’ strategies and possible actions. The overall decisions of each player are determined by the combination of strategies of all players [39]. However, a Markovian game deals with sequential decision making under uncertainty where the state of the game evolves over time, and players make decisions based on the current state [40]. The state transition is modeled using a Markov process, where the future state depends only on the current state and the actions of each player but not on the past actions. The players interact to maximize a long-term objective [40].

When applying the existing game theoretic approaches such as outage probability analysis to delay minimization and vehicle selection problem, an agent’s transmission probabilities are updated in an iterative manner based on periodic feedback [18]. Some other methods operate with limited information and usually have a high computational complexity [6]. The payoff-matching algorithm requires frequently updating the probability of each action to find the cumulative payoff for an action. A strategic action is played based on its corresponding relative probability, which is an outcome of limited contextual information [41].

In one approach, the delay minimization problem is solved using a flow allocation vector that achieves Nash equilibrium. The flow allocation vector is updated frequently to satisfy Nash equilibrium for all packets p

(p = p_{1}, p_{2}, \dots, p_{D})

arriving at a queue [42]. The reliability–latency model specifies that when the Nash flow is positive for any packet or all packets that have positive flow at Nash equilibrium, they will have specified reliability and latency [42]. Flows at Nash equilibrium have a property that all packets within one window experience the same delay for a reliability–latency function [43]. The authors in [13] have demonstrated that the reliability–latency model of sojourn time to reach traffic equilibrium can be calculated using Nash equilibrium. Nash equilibrium is reached when all the packets in a link have equal costs. As the packets spend more time in a queue, they incur a higher cost [13]. However, the Nash equilibrium approach to model latency is a computationally intensive approach. Also, the reliability–latency model identifies the set of packets that have low latency, but it does not specify any means to further reduce latency.

Furthermore, the relation between optimal flow allocation vector and the Nash flow allocation vector requires the partial derivatives of many parameters with respect to the number of servers in a given transmission time interval (TTI) [42]. In other approaches, scheduling algorithms to vary the TTI have been explored to establish trade-offs among the constraints that impact delay. By assigning a distinct packet transmission window in each TTI, server allocation criteria have been designed based on the traffic arrival pattern [44]. A limitation of this approach is that although it minimizes the sojourn time, it does not address network performance degradation due to unbalanced load distribution at the servers [45]. The load balancing algorithms assign similar traffic and numbers of packets per transmission window. However, a drawback of this approach is that the RTT increases considerably. Moreover, in situations where excessive load accumulates on the backhaul, manually controlling the TTI for delay-sensitive applications leads to longer processing time [46]. Utilizing game theory can improve both the load balancing of the network and lessen the delay constraint violation percentage.

Some Nash equilibrium approaches converge to a unique mixed strategy equilibrium but need to store the past incentives and penalties to evaluate the next action [40]. Frequent communication between agents leads to cooperative solutions but also increases the computational complexity of the solution. Some solutions converge to a repeated single action or to other mixed Nash equilibrium solutions that do not converge. Some solutions converge to a mixed Nash equilibrium solution that is in an absorbing state, thus implying that once the system reaches that state, it remains there indefinitely, particularly when the players lack an incentive to change their strategy [21]. In weighted average stochastic games, a vehicle maximizes the expectation of a fixed weighted average of incentives. Players calculate their own payoff function and observe the past choices of other players using fictitious play, thus utilizing best response dynamics and gradient-based learning approaches [47]. To reduce computational complexity and to realize quick decision making in UAV-assisted vehicular networks, cooperation among agents needs to be explored further. Table 1 compares the proposed approaches and objectives of some recent publications with our proposed approach.

2.3. Application of Federated Learning in UAV–Vehicle Communications

Cooperative game theory provides strategy-based solutions for system parameters such as throughput, delay, and low latency by establishing federated edge resources with network context information [48]. Optimal bandwidth utilization and queuing policies are determined by game theoretic models to reduce latency [49]. The UAV-assisted C-V2X communication environment is dynamic in nature and changes often. Thus, relying on individual local learning leads to higher error variance, particularly when the UAV computing resources are shared among a large number of vehicles [50]. Jointly optimizing C-V2X network and UAV computing resources is a challenging issue, as the number of parameters transmitted from vehicles to the UAV need to be as few as possible while communicating maximum information [9].

Federated reinforcement learning (FRL) is a distributed machine learning technique where vehicles collaborate by interacting across multiple edge devices without sharing actual data [51]. This enables individual vehicles to build a shared global learning model where the FRL model infers distributed packet transmission patterns [52]. Due to vehicle and UAV mobility, data sharing based on conventional cloud computing can miss real-time dynamic updates. To address this challenge, FRL empowered mobile edge computing (MEC) has emerged as a promising technique to intelligently support UAV-assisted C-V2X communications [53]. FRL enables collaborative data sharing in vehicular edge servers with the deployment of MEC servers [54]. Furthermore, as FRL is a distributed model for collaborative data sharing, it ensures efficient and fast data sharing to and from the edge servers. Using FRL, the data island problem among individual vehicle clusters that arises due to network congestion or inadequate buffer size can also be alleviated [55].

Federated learning approaches such as federated averaging have been shown to minimize the number of communication rounds and the resulting computational cost [11]. This also depends on the size of sensor data used by each vehicle to train the local models. A vehicle with a smaller dataset completes the local updates faster compared to other vehicles [56]. In 3GPP-based C-V2X mode 4, a semipersistent scheduling-based sensing algorithm was used to select server resources. Mode 4 was proposed in C-V2X to realize high reliability and better availability with low latency, but the performance of C-V2X mode 4 degraded with the increase in vehicle density [1]. Delay minimization through minimum hop routing utilizes maximum link capacity, where the objective function is to reduce latency. Modified deficit round-robin queuing addresses delay minimization for traffic such as voice data packets [15]. The modified deficit round-robin scheme addresses delay minimization by assigning scheduling priority for different queues [57].

The delay variation of a packet is the difference between the delay experienced by a packet and the delay of a selected reference packet, which is a high-priority packet. As the delay varies, delay variation indicates the distribution of delays at various TTIs for packets at varying arrival rates [58]. Different packets have different queuing delays at the same queue and different processing delays at the same UAV. The packets also travel via different network paths and accumulate different queuing delays and propagation delays. In UAV-assisted C-V2X communication, delay variation determines the consistency of the UAV servers’ responsiveness [45]. The recent advances in a UAV-assisted real-time channel sounder for air-to-ground scenarios and the analysis of fading characteristics of the UAV-to-vehicle communication channel can be found in [59,60].

3. System Model

Figure 3 illustrates the vehicular communication architecture, where the vehicles are grouped in a cluster (

C

) that comprises (

V

) vehicles denoted by

{v_{1}

, …,

v_{n}}

. The sensor data are locally processed at the vehicles, and the processed local model parameters are transmitted to the UAV. In each transmission window, only a subset of vehicles must be selected by the UAV. The packets either arrive at the UAV or are stored in a

M / M / k

queue until the UAV processor becomes available. To account for the high mobility of vehicles, orthogonal time frequency space (OTFS) base stations (BSs) are deployed to provide wireless coverage to vehicles. To reduce the problem complexity, we assume a linear multicell OTFS channel with Rician fading, where the interference is primarily from adjacent BSs.

To model the interactions between the UAV and vehicles, cumulative prospect theory is applied to capture the underlying rationality of the players as

V

vehicles cooperate to transmit their local model parameters to the UAV [61]. We assume LoS UAV-to-vehicle communication, and the channel gain between the

k^{t h}

vehicle and the UAV in the

i^{t h}

transmission window is denoted by

g_{i, k}

in Equation (1) that follows the free space path loss model.

g_{i, k} = \frac{β_{i}}{H^{2} + {(x_{u_{i}} - x_{v_{k}})}^{2} + {(y_{u_{i}} - y_{v_{k}})}^{2}}

(1)

where

β_{i}

is the channel power gain at an initial distance between a vehicle and the UAV. The term (

x_{u_{i}}

,

y_{u_{i}}

) indicates the UAV coordinates during the

i^{t h}

transmission window, and (

x_{v_{k}}

,

y_{v_{k}}

) indicates the coordinates of the

k^{t h}

vehicle during the

i^{t h}

transmission window. We assume that the UAV is flying at a height (

H

) in meters (m). In the above scenario, when the

k^{t h}

vehicle transmits the local model parameters to the UAV in the

i^{t h}

transmission window, the data transmission rate (

r_{i, k}

) is given by Equation (2) as

r_{i, k} = {log}_{2} (1 + \frac{β_{i} p_{u i}}{[H^{2} + {(x_{u_{i}} - x_{v_{k}})}^{2} + {(y_{u_{i}} - y_{v_{k}})}^{2}] + σ^{2}})

(2)

where

σ^{2}

represents the power spectral density of Gaussian noise, and

p_{u i}

is the transmit power of the UAV in the

i^{t h}

transmission window, which keeps diminishing during the subsequent transmission windows. The data rate does not change within a transmission window (

L_{w}

) but varies from one

L_{w}

to another due to vehicle mobility and uncertainty in the vehicular environment. Hence, the channel gain changes with the signal strength and the available UAV energy in different time slots. The packet arrival at the UAV follows a Poisson process, and the number of queued packets also follow a Poisson process. Different types of packets from different vehicles follow a uniform distribution in the queue. The probability of the number of packets in queue (

p_{a} (k)

) is

p_{a} (k) = e^{- μ} \frac{μ^{k}}{k!}

(3)

where

μ^{k}

is the mean of the uniform distribution of packets in the queue transmitted from k vehicles. As the number of vehicles in the coalition increases, the communication between the UAV and vehicles experiences delay. Note that the UAV tracks a series of discrete locations sequentially to cover the maximum number of vehicles using OTFS modulation [62]. The agents can randomly play the game in one of the following modes:

The vehicles can complete their local models in a TTI and then adopt a cooperate and leave strategy. Here, a vehicle cooperates with other vehicles and the UAV and then leaves the coalition for a random period to avoid a negative incentive. This is referred to as cooperate and leave strategy.
When the vehicles that complete their model update and are selected by the UAV in a TTI, the vehicle having previously followed the cooperate and leave strategy immediately returns to the coalition in the next TTI. This is referred to as the leave and return strategy. Here, we consider a repeated game model, and the past interactions between agents are taken into account using a discrete time Markov chain (DTMC).

The analysis of the interactions employs a cooperative strategy for the UAV and the vehicles. The cooperate and leave strategy is assigned to a subset of vehicles, and the leave and return strategy is assigned to the remaining subset of the vehicles. The cooperate and leave strategy allows the vehicles to leave the game for a few iterations. The UAV does not incur a negative resource sharing incentive if it discontinues interaction with the agent that left the coalition. Since the UAV can accept or reject an interaction, the vehicle receives a payoff to return to the coalition later. However, a vehicle incurs a low negative payoff for the duration that it left the coalition. The game profiles in this paper involve permutations and combinations of the strategies available to all the agents. In the iterative interaction game, the vehicles know the actions the other vehicles after each TTI, since the vehicles’ decisions are simultaneous. Using a DTMC, the agent interaction model and associated payoffs are stored to learn an agent’s strategy and action history. A continuous cooperation between a UAV and a vehicle leads to a trust behavior and assigns a lower negative payoff for a specific vehicle. A continuous noncooperation strategy represents a higher negative payoff value. An agent also incurs a positive payoff to maintain trust over the future iterations of the repeated game. The probability of maintaining trust is calculated dynamically from the past agent strategies. The symbols and parameters used in this paper are briefly described in Table 2.

4. Problem Formulation

In UAV–vehicle communication, interference poses a critical limit on system performance. This impacts both the vehicles’ and the UAV’s data rates and throughput. Furthermore, the high mobility of vehicles and UAVs causes rapid time variation of the Rician fading channel [63]. This increases the complexity of channel estimation methods. Hence, we formulate a joint UAV transmission power and vehicle selection problem. The objective is to maximize the data rate of vehicle to UAV communications for both gross data offloading and the FRL scenario. Consequently, a UAV aims to minimize its average cost (

Ψ_{a_{i}}

) over flight time

τ

, where

a_{i}

is the action set of UAV and

(s_{i}; s_{- i})

are the mixed and opposing strategies of the UAV. The term

G (a_{i})

indicates the selected actions during a game

G

. The cost (

Ψ_{a_{i}}

) is calculated in Equation (4) as the expectation of all optimal actions during the UAV flight time

τ

. Note that for different trajectories, the flight time varies, and hence,

Ψ_{a_{i}}

is different for various iterations of the game.

Ψ_{a_{i}} (s_{i}; s_{- i}) = E \int_{t}^{τ} G (a_{i} (τ) d τ

(4)

Furthermore, the signal-to-interference-plus-noise ratio (SINR) at the UAV’s receiver

u_{r}

during the UAV flight time

τ

is a function of the bandwidth

b_{u_{r}}^{(τ)}

and power

p_{u_{r}}^{(τ)}

:

γ_{u_{r}}^{(τ)} (b_{u_{r}}^{(τ)}, p_{u_{r}}^{(τ)}) = \frac{{\bar{g}}_{b_{u_{r}}^{(τ)}}^{(τ)} p_{u_{r}}^{(τ)}}{\sum_{V} {\bar{g}}_{b_{u_{r}}^{(τ)}} p_{u_{r}}^{(τ)} + σ^{2}}

(5)

where

σ^{2}

is the additive white Gaussian noise power spectral density, which is assumed to be same at all vehicles. The term

{\bar{g}}_{b_{i}^{(τ)}}

is the instantaneous channel gain experienced by vehicle i. Then, the downlink spectral efficiency of vehicle i is

C_{i}^{(τ)} = log (1 + γ_{u_{r}}^{(τ)} (b_{u_{r}}^{(τ)}, p_{u_{r}}^{(τ)}))

(6)

Hence, the problem (

P 1

) of energy-efficient computing resource allocation in UAV is formulated as follows in Equation (7). The problem aims to minimize the UAV cost funtion, minimize the UAV’s power consumption by the

i^{t h}

vehicle, and to maximize the data rate. The objective function and constraints are nonconvex, thus leading to a mixed-integer nonconvex optimization problem.

\begin{matrix} P 1 : & min_{Ψ^{(t)}, p_{u i}} max_{r_{i, k}} \sum_{i = 1}^{V} \sum_{t = 1}^{τ} {Ψ_{a_{i}}} \\ subject to \end{matrix}

(7)

\begin{matrix} C 1 : & | | q_{i} (t + 1) - q_{i} (t) | | \leq v_{m a x} (τ) τ_{i, i + 1} \end{matrix}

(8)

\begin{matrix} C 2 : & | | q_{i} (t + 1) - q_{i} (t) | | \geq d_{m i n} \end{matrix}

(9)

\begin{matrix} C 3 : & q_{(x, y)} = q_{(x_{0}, y_{0})} \cdot e^{- α (\frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}})} + H \end{matrix}

(10)

\begin{matrix} C 4 : & Equation (4) \end{matrix}

(11)

\begin{matrix} C 5 : & Equation (5) \end{matrix}

(12)

\begin{matrix} C 6 : & Equation (6) \end{matrix}

(13)

\begin{matrix} C 7 : & {\hat{g}}_{j}^{i} = \sqrt{\frac{K}{K + 1}} g_{j}^{i} + \sqrt{\frac{1}{K + 1}} g_{j}^{i} \end{matrix}

(14)

where

P 1

is a multiobjective optimization problem. The constraint

C 1

implies that for a UAV flight time (

τ

), the UAV velocity is restricted to an upper bound denoted by

v_{m a x}

. In constraint

C 2

, the UAV traverses a minimum distance

d_{m i n}

in each TTI. The constraint

C 3

restricts the UAV trajectory to an elliptical path. We assume this restriction to ensure that the UAV does not drift away from vehicle’s coverage area. The constraint

C 3

also describes the stochastic trajectory of the UAV and describes the initial coordinates of the trajectory. In constraints

C 4

,

C 5

, and

C 6

, the UAV learns the strategies and identifies which vehicle to select for uplink communication. Then, the UAV target coordinates are selected such that the target vehicle receives the maximum UAV transmit power. The stochastic trajectory is determined such that the remaining distance to the destination coordinates are successively minimized. The UAV flight time

τ

also depends on its battery capacity, and these constraints imply the total transmission power limitation of the UAV. The constraints

C 5

and

C 6

ensure a minimum required power to successfully receive the vehicle’s transmitted packets. These constraints also ensure that the UAV’s transmitted power consumed by a vehicle is less than a predefined value and also imply that the resource allocation parameters are integer variables. The constraint

C 7

indicates the OTFS constraint for vehicles. Here, the Rician K factor impacts the maximum transmit power and path loss exponents, where

{\hat{g}}_{j}^{i}

is the channel gain experienced by the

i^{t h}

vehicle in

j^{t h}

transmission window.

The problem is a mixed-integer nonlinear programming and

N P

-hard problem [64]. Therefore, a solution based on a Markovian game is proposed to obtain the optimal solutions for UAV resource allocation. To solve the UAV’s vehicle selection problem, we approximate the nonlinear problem to a linear problem using successive convex approximation. We also introduce a control parameter (

ϕ

), which determines the fairness of the vehicle selection probability. We also introduce a discount factor (

δ

), where a larger

δ

indicates that an agent is more inclined to explore the game. A smaller

δ

indicates that the agent is more cautious to learn.

5. Game Theory-Based Solution Approach

This paper proposes game theory to analyze UAV and vehicle selection in a federated learning (FL) environment to find an equilibrium solution for multiobjective situations. An aggregation algorithm utilizes federated averaging, and then based on game theory, a vehicle selection model is developed. This makes it possible to model the aggregation process as a mixed-strategy game between the UAV and each vehicle. Using Nash equilibrium, the UAV determines the probability that sufficient updates will be received from all vehicles based on their actions of sending updates. A shared global model is trained with the cooperation of a central parameter server that aggregates the information sent by the vehicles to update the shared global model. The vehicles train the model using their own local data and send the model to the UAV. The UAV intermittently shares the updated global model to the vehicles, which is updated in tune with the local models, and the process is repeated. The proposed solution approach is represented as a block diagram in Figure 4. We model strategic situations based on the agents’ actions in our game model. It is possible for the players to be in total conflict, and the game model employed is noncooperative, in which the players have partial information about the actions of other players. A rational player knows how to maximize its payoff and calculates its strategy based on that knowledge to arrive at a mixed strategy, where each agent has a probability assigned to each strategy.

In this paper, the game’s participants are the vehicles and the UAV. Each vehicle aims to communicate frequently with the UAV, thus leading to competition among vehicles. A vehicle’s objective is to establish communication with the UAV in each TTI while preventing other vehicles from accessing the UAV’s resources. Conversely, the UAV’s role is to impartially select a subset of vehicles in each TTI. Additionally, the UAV aims to identify vehicles that have not completed their local model updates to avoid squandering resources. The proposed game model’s cost functions include channel bandwidth and UAV battery power availability. The UAV’s utility is measured by the proportion of battery power used to serve vehicles, and any wastage of battery power adversely affects this utility.

5.1. Nash Equilibrium

Nash equilibrium in noncooperative games implies that a player can not increase its own incentive by deviating from its strategy while other players maintain their strategy [65]. The incentive is related to the complexity and priority of a task and the task execution probability of the UAV. The cost is the energy consumption as the UAV selects different vehicles and updates the global model. This enables a vehicle to learn the strategies of other vehicles to optimize its own strategy using Nash equilibrium. In this paper, we model an imperfect information game, where vehicles have partial or incomplete information about other players’ strategies, current state, actions, and payoffs, and they analyze various coalitions formed between the vehicles. An absence of coalition indicates that a vehicle acts independently without collaboration or communication with others. Each vehicle is initially unaware of both the mixed strategy and the selected action of the other vehicles but is aware of whether its previous action resulted in a positive or negative incentive. The vehicles and the UAV periodically update their strategies based on the feedback received from the environment, and the game is played repeatedly.

In the model proposed in this paper, the players, i.e., the UAV and the vehicles, follow a mixed strategy in which the actions are randomly selected according to a probability distribution determined using Nash equilibrium. Nash equilibrium describes the strategies in which the players should not deviate from their initial decision to maximize their utility function. The selected vehicles send back the new updated local model to the UAV. We assume that only less than half of the vehicles complete the local training data. The vehicles have no information about the aggregation algorithm used by the UAV to average the updates and generate the new global model. The vehicles do not control the updates sent by other vehicles. We apply Nash equilibrium to compute the probability of receiving updates from each vehicle. The empirical evaluation of the proposed method shows that our method provides both higher vehicle selection accuracy and faster model convergence. Our work emphasizes the averaging algorithm in federated learning and is a significant extension of our previous work in [11]. We formulate a robust aggregation algorithm based on a game theoretic approach combined with federated learning aggregate local models into a joint global model.

The data distributed among the vehicles is nonindependent and identically distributed (non-i.i.d.), and the UAV plays the game independently with each vehicle. The overall goal is to let the global model to converge with minimum delay. The iterative procedure assigns equal selection probability to all vehicles with an initial value of 1. Consequently, the initial aggregated model at the first iteration is calculated using simple averaging. Based on the variation of the selection probability of vehicles, the iterative algorithm considers a stopping criterion K. Due to the similarity of the updates provided by the vehicles, some of them may be redundant. Therefore, the UAV considers a probability p in the model aggregation algorithm. For each vehicle, the probability to provide updates to the model can be computed by considering a game between the UAV and each vehicle, as well as a game between multiple vehicles that compete to transmit their local models to the UAV. The valid actions of the vehicles are to complete and transmit the updates, while the UAV can accept or ignore these updates as its valid actions.

5.2. Action Set and the Players’ Strategies

The action set of a vehicle comprises the following actions. In a federated learning scenario, a vehicle captures sensor data and processes the data locally. This generates a set of hyperparameters that is pertinent to its local model. However, since there are a multitude of vehicles, each vehicle requires information from other vehicles to compute the Global model. Note that most ground vehicles may not be able to complete this action, but the UAV can communicate with all the ground vehicles. Therefore, it can generate the Global model.This can be done in two different ways: either using a gross data offloading scheme or using the federated learning scheme. In gross data offloading, the vehicles transmit all the sensor data to the UAV, and the UAV generates the Global model. In each TTI, the UAV selects a subset of vehicles for gross data offloading to optimally utilize channel bandwidth and the available battery power. On the other hand, in the federated learning scheme, the selected vehicles will only transmit the local model parameters to the UAV, which in turn will compute the Global model. Hence, the FL approach saves bandwidth.

Let

A_{i} = {a_{1}, a_{2}, \dots, a_{j}}

denote the action set of the

i^{t h}

vehicle. Let

p_{i} (a_{j})

be the probability that the

i^{t h}

vehicle plays action (

a_{j}

) in the mixed strategy. The expected utility (

u_{i}

) is given by Equation (15a), where (

s_{- i}

) is the set of opposing strategies. In the above game, the set (

S_{i}

) is the set of all mixed strategies (

s_{i}

) such that for some (

u_{i}

), the n-tuple

(u_{i}, s_{i})

is an equilibrium point, as quantified in Equation (15b,c). Furthermore, as (

s_{i}

) is strictly dominant, as per Equation (15d), and (

s_{i}

) must strictly dominate all pure strategies of vehicle i. To transmit local models to the UAV, vehicles (

V_{i}

) and (

V_{m}

) identify an optimum strategy to choose an action from the set

{1, \dots, A}

. Each vehicle has

A

actions to choose from;

u_{1} (a, a) = 1, a \in {1, \dots, A}

and

u_{1} (a, \bar{a}) = 0, a \neq \bar{a}

, where (

\bar{a}

) indicates the set of actions that do not contribute to the Nash equilibrium strategy, and the unique mixed strategy equilibrium is

(\frac{1}{A}, \dots, \frac{1}{A})

. The solution of a solvable game is a set of equilibrium points.

\begin{matrix} u_{i} (s_{i}, s_{- i}) & = \sum_{j = 1}^{| A_{i} |} u_{i} (a_{j}, s_{- i}) p_{i} (a_{j}) & \forall s_{- i} \in S_{- i} \end{matrix}

(15a)

s . t . p_{i} (a_{j}) \geq 0, 1 \leq j \leq | A_{i} |

(15b)

s . t . \sum_{j = 1}^{| A_{i} |} p_{i} (a_{j}) = 1

(15c)

\forall s_{i} \in S_{i} and \forall s_{- i} \in S_{- i} : u_{i} ({\bar{s}}_{i}, s_{- i}) > u_{i} (s_{i}, s_{- i})

(15d)

Note that

u_{i} ({\bar{s}}_{i}, s_{- i})

is a convex combination of the utilities of the

i^{t h}

vehicles’ pure strategies (

π_{i α}

) with weights indicated by

{\bar{s}}_{i} (π_{i α})

. Let

π_{i α}^{*} = \arg {max}_{π_{i α_{j}}} (u_{i} (π_{i α}, s_{- i}))

be the pure strategy, which implies the largest utility for the

i^{t h}

vehicle when the other vehicles play

s_{- i}

. Then, the strategies (

π_{i α}^{*}

) for a utility imply an upper bound for

{\bar{s}}_{i}

’s expected utility, as per Equation (16).

\begin{matrix} u_{i} (π_{i α}^{*}, s_{- i}) & \geq u_{i} ({\bar{s}}_{i}, s_{- i}) \end{matrix}

(16)

However, in a specific case when (

π_{i α} = π_{i α}^{*}

) and (

s_{- i} = s_{- i}

), the best and opposing strategies lead to a contradiction, as per Equation (17).

\begin{matrix} u_{i} (π_{i α}^{*}, s_{- i}) & \geq u_{i} ({\bar{s}}_{i}, s_{- i}) > u_{i} (π_{i α}^{*}, s_{- i}) \end{matrix}

(17)

We ignore this condition in our solution to reduce computational complexity of the solution and to prevent vehicles and the UAV from being locked in playing a contradicting strategy indefinitely. The term (

π_{i α}

) indicates the

i^{t h}

vehicle’s

α^{t h}

pure strategy.

5.3. Vehicles’ Local Model Parameter Transmission Strategies

When the

i^{t h}

vehicle sends updates to the UAV, if the UAV accepts these updates, it earns a payoff by updating the

i^{t h}

vehicle’s local model to the previous global model. If the UAV rejects these updates, the incentive reduces. The Markovian game played between the UAV and the

i^{t h}

vehicle updates the strategy as per Equation (15a), and both of them experience a negative incentive if vehicles does not complete a local model. If the UAV rejects an update, no player earns an incentive. When the UAV plays a mixed strategy, it has equal incentive for accepting and rejecting the model updates from a vehicle. The probability of accepting the

i^{t h}

vehicle’s updates by the UAV when it plays Nash equilibrium is updated as the UAV aggregates the received updates. This game has two pure strategy Nash equilibrium sets, which are denoted as (select, not_select) and (transmit, not_transmit). The UAV places the vehicles in one of the selected or not selected categories as the UAV considers the probability of receiving updates from each vehicle. The valid actions of the vehicles are to complete and transmit updates. In the proposed scenario,

V

vehicles periodically communicate with the UAV to learn a global model. Furthermore, the

V

vehicles can send updates, while the UAV can accept or reject these updates.

If two vehicles generate identical sensor data in a TTI, they have two discrete amounts of data: high or low. Data generation at a high rate results in a lot of packets to queue, while a low generation rate leads to less queuing. The UAV availability and queuing time depends on the amount of data, which depends on the sensors of both vehicles. In the case where both vehicles generate data at a low rate, the data are rare and fetch useful information in the driving scenario. In contrast, when they generate data at a high rate, the redundancy may lead to less contribution toward driving decisions, especially when cooperating. In the case where only one vehicle generates data at a high rate, the increase in wait times in processed information is compensated by higher redundancy, thus increasing the queuing delay of the vehicle packets that generate data at high rates. The proposed method uses an iterative averaging algorithm to balance the local updates generated by the vehicles. The iterative algorithm places the vehicle in one of the selected or not selected categories. The UAV considers the probability of receiving updates from each vehicle. These probabilities are computed by considering a mixed-strategy game between the UAV and each vehicle in the set. The valid actions of the vehicles are to complete and transmit the updates. In our simulated scenario, 1–100 vehicles periodically communicate with the UAV to learn a global model. As the vehicles send the updates, the UAV can accept or ignore these updates. By employing Nash equilibrium, the UAV determines the probability to receive updates from vehicles. As a novel contribution, we propose an iterative federated averaging algorithm for the UAV to obtain the selection probability of each update and a robust estimate of the final model.

5.4. UAV’s Global Model Update and Parameter Transmission Strategies

The Karush–Kuhn–Tucker (KKT) conditions provide a set of necessary and sufficient conditions for optimization problems constrained by multiple equality and inequality constraints [66]. The UAV maximizes its utility for the number of participating vehicles is a convex optimization problem whose optimal primal and dual variables are characterized using the KKT conditions. Using the KKT conditions and assuming that the vehicle selection strategy is known, the UAV finds the optimal number of vehicles that maximize the incentive. Here, (

V_{i}

S_{1}

) and (

V_{i}

S_{2}

) are both pure strategies of vehicle (

V_{i}

). A weakly dominant strategy dominates the other strategies for that vehicle. In this scenario, we assume that (

V_{i}

S_{1}

) weakly dominates (

V_{i}

S_{2}

). The set of strategy profiles of remaining vehicles is denoted by (

V_{m}

S_{i}

), and

u_{1} (S_{i}, S_{i + 1})

is the utility function of the

i^{t h}

vehicle.

Note that in a TTI, each vehicle has an equal probability to transmit the local updates to the UAV. To maximize the incentive, the same local updates cannot be transmitted in the next TTI, and new updates will have to be transmitted. If the two vehicles (

V_{i}

) and (

V_{m}

) complete their local updates with the same probability, then for half of the TTI, vehicle (

V_{i}

) transmits the local updates, and for the other half of the TTI, (

V_{m}

) transmits the local updates to the UAV. Each vehicle achieves an incentive for transmitting local updates. Moreover, depending on the time of the day or other driving factors, the UAV incurs different penalties for different TTIs. A sample set of actions for vehicles and the UAV in a TTI can be local model completion, noncompletion, model transmission, and nontransmission to the UAV. The UAV may select or not select a local update.

For instance, (

N C_{i} C

) is an action when the

i^{t h}

vehicle is in the state not_complete and does not transmit, and it transmits when the local models are complete, thus assuming it is selected by the UAV. Two vehicles complete the local update and generate an incentive in a single TTI. After each local update, the vehicles’ strategies simultaneously decide whether to transmit the updates to the UAV or not. If one or both of the vehicles does or do not transmit after the local update, the UAV maintains its payoff by continuing with previous global update, and only a small incentive is accumulated. In the next TTI, selecting a vehicle becomes critical, as a nonselection leads to a higher penalty. After each local update, if both vehicles do not transmit, the penalty is divided evenly. If only a single vehicle withdraws from the coalition, it continues with the previous global update and incurs a negative incentive. If none of the vehicle withdraws their weights and local updates are transmitted to the UAV, both receive a higher positive incentive. For local incentives, each vehicle prefers to complete local updates, as well as get selected by the UAV.

5.5. Vehicle Selection Payoff

At each TTI, the UAV decides whether or not to select a vehicle. The vehicle also decides whether or not to transmit the local updates to the UAV. In an iteration, the objective of the UAV and the vehicles is to optimize their own models. In the case where the UAV selects a vehicle and the vehicle is not ready with the updates, the UAV wastes an effort. In the case where vehicle is ready with the local updates but the UAV does not select, the vehicle wastes the computation, as a local update was not required. Furthermore, either the UAV is optimized or the vehicle is left without an optimized updated model. The UAV receives more incentive to select a vehicle with completed updates than a vehicle that does not have model updates. Correct vehicle selection adds to the UAV incentive, and incorrect vehicle selection reduces incentive. The game assumes that the vehicle wastes local updates if it is not selected by the UAV, thus resulting in a negative incentive for the vehicle.

As per Equation (18), a convex combination of utility values

u_{1}, u_{2} \dots u_{n}

is a linear combination of

\sum_{i = 1}^{V} w (u_{i}) u_{i}

, where

\sum_{i = 1}^{V} w (u_{i}) = 1

and

w (u_{i}) \geq 0

. A combination of weights

w (u_{i})

does not lead to a value greater than the largest value

u_{m a x}

, and the assignment of weights

w (u_{i})

does not lead to a value lower than the smallest value

u_{m i n}

.

\sum_{i = 1}^{V} w (u_{i}) u_{i} \leq \sum_{i = 1}^{V} w (u_{i}) u_{m a x} = u_{m a x} \sum_{i = 1}^{V} w (u_{i}) = u_{m a x}

(18)

The utility functions in Equation (19a–d) are used to measure the success of a strategy, and they quantify the preferences between different outcomes of various actions. This leads to the UAV’s actions, which impact the UAV’s cost function (

Ψ_{a_{i}} (s_{i}; s_{- i})

) in Equation (4). The UAV’s cost function (

Ψ_{a_{i}} (s_{i}; s_{- i})

) is thus based on the utility functions of the agents and the probability of the relevant actions occurring. For example, a game can comprise the following utilities and actions:

G = ({p_{1}, p_{2}}, {a_{1}, a_{2}}, {u_{1}, u_{2}})

. Let

i \in G

and the max–min value be

w_{i}

and min–max value be

v_{i}

so that

v_{i} \geq w_{i}

. Moreover,

v_{i} \geq w_{i}

uses the definitions of the max–min and min–max. The definition of the max–min

w_{i}

is

w_{i} = max_{s_{i}} min_{s_{- i}} u_{i} (s_{i}, s_{- i}) = u_{i} (s_{i}^{*}, s_{- i}^{*})

(19a)

where, ‘

- i

’ indicates the strategies of an opposing vehicle.

v_{i} = min_{s_{- i}} max_{s_{i}} u_{i} (s_{i}, s_{- i})

(19b)

Following from the definition of

max s_{i}

,

max_{s_{i}} u_{i} (s_{i}, s_{- i}) \geq u_{i} (s_{i}^{*}, s_{- i})

(19c)

This also holds for

max_{s_{i}} u_{i} (s_{i}, s_{- i}) \geq u_{i} (s_{i}^{*}, s_{- i}^{*}) = w_{i}

(19d)

Let

A_{i} = {a_{1}, a_{2}, \dots a_{j}}

be the action set of vehicle i or the set of pure strategies. Let

{\bar{s}}_{i} (a_{j})

be the probability that vehicle i plays action

a_{j}

in the mixed strategy. The expected utility of

{\bar{s}}_{i}

has been given earlier in Equation (15a). Furthermore, as

{\bar{s}}_{i}

is strictly dominant, then

{\bar{s}}_{i}

must strictly dominate all pure strategies of vehicle i. Let

g \to G

comprise a set of d-dimensional probability instances of

p = (p_{1}, \dots, p_{d}) \in P

. We assume that

g (\cdot)

is a model for which we evaluate the game utility

u = g (p)

for any given probability instance

p

, where the probability space

P

is the unit hypercube, i.e.,

P = {[0, 1]}^{d}

. Each game

g \in G

is a model that comprises

g (p)

for all

p \in P

, where

G

comprises a set of model parameter transmission probabilities from vehicles to the UAV:

\begin{matrix} g^{*} = arg min_{g \in G} ℓ (g, G), ℓ (g, G) = {∥ G - g ∥}^{2} \end{matrix}

(20)

where

ℓ (\cdot)

is the model loss calculated as the mean squared error (MSE) between

G

and g. Here, we propose

g (p) = G (p; θ), θ \in Θ

, where

Θ = R^{M}

is a parameter space that specifies the FRL local model parameters transmitted from vehicles to the UAV. The model parameter transmission probabilities for each game are as follows:

G (p) = g (p_{1}, \dots, p_{n}) = \sum_{i = 0}^{V} g_{i}^{u} (\sum_{j = 1}^{V} g_{i j}^{s} (p_{j}))

(21)

where

g_{i}^{s}

and

g_{i j}^{u}

are the game functions for different strategy and utility combinations, and

V \in N

is the number of vehicles in the coalition. Every combination of a set of actions leads to a different corresponding model transmission probability. Using Equation (20) and Equation (22), the overall model transmission probability of

(V)

vehicles playing a game

G (p; θ)

is

\begin{matrix} G (p; θ) = \sum_{i = 0}^{V} G_{p, a}^{u, s} (θ_{i}^{v} | \sum_{j = 1}^{V} G_{p, a}^{u, s} (θ_{i j}^{v} | u_{j})) \end{matrix}

(22)

where

G_{p, a}^{u, s}

is the set of game

G

for utility (u), strategy (s), and probability of action (p,a). The term

θ_{i j}^{v}

is the vehicle (v)’s model transmission hyperparameter weights, thus assuming that there are (i) vehicles, and the UAV needs to select a vehicle. In a given TTI, two vehicles

V_{1}

and

V_{2}

have completed their local updates. The two vehicles can be located in the same or different clusters. Out of the (i) vehicles in a TTI, each vehicle has equal probability to transmit the local updates to UAV. The same local updates cannot be transmitted in the next TTI; new updates will have to be transmitted.

If the two vehicles

V_{1}

and

V_{2}

complete their local updates with the same probability, then for half of the TTI, vehicle

V_{1}

may transmit the local updates, and for the other half of the TTI,

V_{2}

may transmit the local updates to the UAV. Each vehicle achieves a reward for transmitting local updates. Some strategies survive the iterated removal of strictly dominated strategies. If it is not possible for the UAV to select either vehicle in a TTI, then the UAV has missed global updates in a TTI and accumulates a negative incentive. Moreover, depending on the time of the day or other driving factors, the UAV may face different penalties for different TTIs. Moreover,

V_{2}

may not accumulate rewards, but it can try to minimize the rewards of

V_{1}

, i.e., it plays a min–max strategy, and

V_{1}

can maximize its minimum reward, i.e., it plays max–min strategy. Note that the UAV adapts a transmission window size to improve the number of participants for maximizing its utility. The framework suffers from slower convergence due to less participation. Thus, the UAV avoids deliberately dropping the vehicles to achieve a faster consensus. With the measured responses from the participating vehicles, the UAV updates an incentive to improve the global model while minimizing the maximum delay in the updated global model.

6. Simulation Results and Discussion

We considered a Manhattan mobility model and assumed an average vehicle speed of 100 km/h and average intervehicle distance of 50 m. Here, the BSM and CPM were utilized for communication, and their reference packet formats were specified by the SAE. We considered a BSM packet interarrival time between 100 ms and 1 s. The interarrival time of the event triggered CPM was modeled as a Poisson process, and the average repetition frequency of the CPM packets was varied from 100 to 500 ms, with an average number of packet retransmissions between 1 and 5. The packet arrival rate (

λ

) was varied from 1000 to 2000 packets/s. The vehicles and the UAV were trained using a random 10% slice of the V2X-Sim dataset. We used the nashpy library to calculate the Nash equilibrium. The V2X-Sim dataset was processed using the Amazon elastic compute-2 (EC-2) instance. The FRL collaborators were created using the Python programming language, the TensorFlow, and the TensorFlow Federated frameworks. The Table 3 lists the main parameters used in the simulations.

6.1. Variation in Delay Profile and UAV Transmit Power

We considered 100 participating vehicles with different preferences, payoffs, and strategies. We observed that a vehicle seeks more incentive to maintain a higher probability of completing local updates and transmitting them to the UAV. Furthermore, we considered the trade-off between the communication cost and the computation cost, as these costs are complementary for each vehicle. The higher incentive for a vehicle emphasizes increased communication with the UAV to improve the global model convergence. The increase in communication time for a fixed incentive influences participating vehicles to partly improve the local model convergence and partly to rely upon selection by the UAV to minimize the total cost. The increase in communication cost with the increase in communication time indicates that the vehicles will complete more local updates.

However, the trend was significantly affected by more participating vehicles, as more vehicles completed local models and competed to maximize the incentive over a sequence of states and actions. The experimental results show that our proposed game-based aggregation algorithm is robust to faulty and noisy local updates. Our algorithm converged after a maximum of 50 iterations and detected 100% of the updates from vehicles for the investigated scenarios. The delay profile for federated learning and the average delay experienced by a packet are illustrated in Figure 5. In Figure 5, we varied the number of vehicles (V) from 1 to 100, and we varied the vehicle velocity as 40 km/h, 60 km/h, and 80 km/h. The road length (

R_{L}

) was set as 1 km and 2 km.

The variation in the UAV transmit power (dBm) in gross data offloading with the UAV altitude for a varying number of vehicles (V) is illustrated in Figure 6. The variation in the UAV transmit power (dBm) in federated learning with the UAV altitude for a varying number of vehicles (V) is illustrated in Figure 7. It is evident from Figure 6 and Figure 7 that the UAV power consumption was higher (30 dBm) in the gross data offloading scenario, since all the data needed to be processed promptly by the UAV processor. Conversely, in the federated learning scenario, the UAV power consumption was marginally lower (25 dBm), as the processing occurred at the vehicle level. Moreover, Figure 6 and Figure 7 indicate that the UAV power consumption progressively increased with the number of vehicles and UAV altitude. This trend is inferred from the positive slope of the graphs, which shows a continual increase rather than a plateau. Considering this delay profile and the UAV power consumption, we will next discuss and illustrate the variation in optimal vehicle selection probabilities for the UAV and the model completion probabilities for vehicles.

6.2. Optimal Payoff and Incentive Probabilities

We introduced a discount factor (

δ

) for future incentives based on the set of possible states of the game. The immediate payoff or incentive for taking a specific action in a particular state and the discount factor model the preference for immediate incentives over future incentives. For example, in a two-vehicle game, the UAV selects from two vehicles with very weakly dominant pure strategies. The UAV prefers to select one of these vehicles. Two weakly dominant strategies can not weakly dominate each other and cannot coexist. Table 4 lists the reward and strategy utility values for two vehicles. When we increase the number of vehicles from 2 to 100, an equilibrium point is achieved, which is an n-tuple

s_{i}

such that each vehicle’s mixed strategy maximizes its payoff when the strategies of the other vehicles are fixed, as per Equation (23). Thus, each player’s strategy is optimal against those of the others.

max_{all V ’ s} [P_{i} (s_{i}, V)] = max_{α} [P_{i} (s_{i}; π_{i α})]

(23)

P_{i} (s_{i}) = max_{α} P_{i α} (s_{i})

(24)

Here,

P_{i α} (s_{i}) = P_{i} (s_{i}; π_{i α})

for

s_{i}

to be an equilibrium point. If

s_{i} = (s_{1}, s_{2}, \dots, s_{n})

and

s_{i} = \sum_{α} c_{i α} π_{i α}

, then

P_{i} (s_{i}) = \sum_{α} c_{i α} P_{i α} (s_{i})

. Consequently, for Equation (24) to hold,

c_{i α} = 0

whenever

P_{i α} (s_{i}) < max P_{i α} (s_{i})

, i.e.,

s_{i}

does not use

π_{i α}

unless it is an optimal pure strategy for vehicle i. A payoff function

P_{i}

maps the set of all n-tuples of pure strategies into the real numbers ∈

R

. This game has a Nash equilibrium in which both vehicles can be selected or rejected in a given TTI by the UAV. Let

s (t) = [s_{1} (t) s_{2} (t)]

denote the mixed strategy of a vehicle at time instant t, where

s_{1} (t)

is for the probability of adopting strategy 1, and

s_{2} (t)

stands for the probability of adopting strategy 2. The set

s (t)

describes the distribution over the strategies of the player. The probabilities

p = [p_{i j}]

denote the incentive matrix associated with the game in Equation (25) as

\begin{matrix} P = (\begin{matrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{matrix}) \end{matrix}

(25)

where all the entries are probabilities. In zero sum games, Nash equilibrium or saddle points for the game vary, as the outcome for a given joint action is stochastic. The zero sum property implies that at any time t, there is only one winning player. The Nash equilibrium of the game is the pair

(p_{o p t 1}, p_{o p t 2})

, where the optimal mixed strategies are

p_{opt 1} = | \frac{p_{22} - p_{21}}{l o s s} |, p_{opt 2} = | \frac{p_{22} - p_{12}}{l o s s} |

where loss =

(p_{11} + p_{22}) - (p_{12} + p_{21})

.

The variation in the

p_{o p t}

values for

i^{t h}

vehicle’s model transmission and the UAV’s model acceptance in an iteration of the game (

G

) is illustrated in Figure 8. In the initial case, there might be only one best strategy for the UAV and the vehicle, so in this case,

p_{o p t 1} = 1

and

p_{o p t 2} = 0

. As the game is played repeatedly, it is possible for a vehicle to select a small

p_{m i n} \geq

0 such that p approaches

p_{o p t}

. The iterations stop when all the possible strategies are Pareto optimal, as no other strategy generates a higher incentive for one vehicle without generating a lower incentive for the other. We initialize the payoff matrix P as follows:

\begin{matrix} P = (\begin{matrix} 0.65 & 0.35 \\ 0.45 & 0.55 \end{matrix}) . \end{matrix}

This results in

p_{o p t 1}

= 0.575 and

p_{o p t 2}

= 0.750. As an initial strategy, the vehicle selection or rejection probability is 50%. The corresponding strategy profile for this equilibrium is

((0.5, 0.5), (0.5, 0.5))

. We analyzed the preferences of the vehicles over the set of possible outcomes. We varied the number of participating vehicles up to 100 with different sizes of data transmitted. With

p_{o p t 1} = 0.575

and

p_{o p t 2} = 0.750

, we set

p_{m a x} = 0.95

and varied the control parameter

ϕ

from 0.000001 to 0.001.

In the case of Nash equilibrium, there is no oscillatory behavior when a vehicle assigns a higher probability to an action, since the other player reinforces the best strategy. The loss decreases as

p_{m a x}

approaches a value of 1 for

p_{m a x}

= 0.99 and

ϕ

= 0.001. The UAV allocates incentives to the participating vehicles to achieve optimal local model convergence in consideration for improving the communication efficiency. The UAV incurs an incentive for maximizing its own benefit, i.e., an improved global model. The UAV also accumulates an incentive for accepting more vehicles’ local models while improving the performance of the global model. The UAV does not randomly select the participating vehicles and uses the incentive to select vehicles that will improve the global model when relative

p_{o p t}

is attained for the total game iterations. The Table 5 lists the

p_{o p t 1}

and

p_{o p t 2}

values for

i^{t h}

and

m^{t h}

vehicles in an iteration of the game (

G

), where

G

is the set of all games between

i^{t h}

and

m^{t h}

vehicles.

6.3. Optimal Weight Values and Model Transmission Probabilities

For an iteration of the game (

G

), the weight values from Equation (18) for different actions pairs are listed in Table 6.

Similarly, as shown in Table 7, a vehicle might be selected by a UAV when the vehicle has completed its local model updates in a TTI. In this case, both the vehicle and the UAV incur a positive incentive. If a vehicle completes the local model updates but is not selected by the UAV, then the vehicle incurs a low negative incentive. Here, we assume that the vehicle wastes its local updates if it is not selected by the UAV, thus resulting in negative incentive for the vehicle. If the vehicle did not complete its local model updates and is not selected by the UAV, then there is no accumulation or reduction of the incentive for either player.

The variation in the

p_{o p t}

values for

i^{t h}

vehicle’s model transmission with a varying number of vehicles in an iteration of the game (

G

) is illustrated in Figure 9. Initially, for

V = 5 - 100

, all the vehicles had a high

p_{o p t}

value for the first 25 iterations of the game (

G

). As the number of iterations of the game increased, the model transmission probability,

p_{o p t}

, continued to decrease gradually for all vehicles. This is because during the previous iterations of the game in a TTI, the vehicles have communicated with the UAV. To ensure fairness, the UAV must be able to select other vehicles that have not communicated in the previous TTIs. For

V = 5

and

V = 10

, after 80 and 120 iterations of the game, respectively, the

p_{o p t}

approached 0 before gradually increasing again. However, as the number of vehicles increased,

p_{o p t}

did not approach 0. At each iteration of the game, at least one vehicle had a minimum

p_{o p t}

of

0.1

, i.e., a minimum of 10% of the vehicles had a

p_{o p t}

> 0.

In Table 6, the values (1, 1) in the first row and first column indicate that Nash equilibrium consists of best responses for both vehicles. Here, we considered the Nash equilibrium to determine if any of the strategies are also Pareto optimal. From the third and fourth rows of the Table 6, for vehicle

V_{i}

,

N o t_t r a n s m i t

is the best option here, so there is no profitable deviation from restriction

N o t_t r a n s m i t

for this game as (

10 > 5

). Similarly, the UAV considers the decisions of vehicle

V_{m}

, and none of the vehicles can deviate to increase their profit. If vehicle

V_{m}

is placed in category

S e l

by the UAV, instead of restriction

N_S e l

, it will result in a zero reward, which is lower than the reward received under the restriction (

0 < 2

). The options of vehicle

V_{i}

in the left column yielded no profit form deviations. For the second column of vehicle

V_{i}

,

N o t_t r a n s m i t

instead of restriction

T r a n s m i t

is the optimal action as (

10 > 5

). In a game where vehicle i has N information sets indexed as

n = 1, \dots, N

and

M_{n}

possible actions at information set n, the number of pure strategies for vehicle i continue to increase, and the UAV adopts one strategy that maximizes its incentive; the vehicle’s incentive and builds a trust for future iterations of the game. For imperfect information games, a pure strategy is composed of the N choices made at each information set. Since information set n has

M_{n}

options, we multiply all options by each other to find the amount of all possible pure strategies with

\prod_{n = 1}^{N} M_{n}

as solution.

The variation in the

p_{o p t}

values for

i^{t h}

vehicle’s model transmission with the control parameter (

ϕ

) in an iteration of the game (

G

) is illustrated in Figure 10. When

ϕ

is high, the probability of the UAV selecting a vehicle increases, which increases the payoff. When

ϕ

is small, more numbers of vehicles aim to transmit their model parameters frequently to the UAV, and when

ϕ

is increased, the agents consider their previous transmission probabilities and lower the current transmission probability. This ensures a fair transmission opportunity to other vehicles. In future iterations of the game, the vehicles

V_{i}

and

V_{m}

decide whether or not to transmit the local updates to the UAV in a TTI. They know that there is a equal probability that the UAV will select a vehicle. Each vehicle’s reward is diminished if it does not complete local updates and the UAV selects, remains unchanged if it does not update and UAV does not select, diminishes if it updates and the UAV does not select, and increases if it updates and the UAV selects it. If the vehicle

V_{i}

learns the global update before the local update and the vehicle

V_{m}

does not, it can learn vehicle

V_{i}

’s reward accumulation before updating. If the UAV selects a vehicle that did not complete local updates, the effort in that TTI is wasted. The vehicles start a new iteration in the next TTI. Each vehicle prefers an outcome where it is selected and has completed the local updates. Also, in each TTI, the players want to maximize the reward. Also, for local rewards, each vehicle prefers to complete local updates and get selected by the UAV. The interaction is modeled as a strategic game

({V_{i}, V_{m}}, {A_{1}, A_{2}}, {u_{1}, u_{2}})

in which

A_{i} = [0, + \infty)

and

a_{i} \in A_{i}

represent how many vehicles are selected for federated averaging i. The variation in the UAV and vehicle trajectory in a random iteration of the game (

G

) with the control parameter (

ϕ

) is illustrated in Figure 11. The UAV has been confined to an elliptical path, with its trajectory determined by a random Poisson point process within the elliptical boundary. The altitude of the UAV (

H

) changes along the z axis. Meanwhile, the vehicles are depicted as moving in a straight line along the x and y axes.

7. Conclusions

This paper proposed a game theory model to improve the communication efficiency of UAV-assisted C-V2X communications in a federated learning environment. The UAV and each vehicle in a cluster utilized a strategy-based mechanism to maximize their model completion and transmission probability. We formulated multiple games between different pairs of vehicles to generate an optimal action for each vehicle at a given state. Specifically, we modeled a two-stage zero sum Markovian game with incomplete information to jointly study the utility maximization of the participating vehicles and the UAV in a federated learning environment. The UAV improved the communication efficiency by facilitating model and hyperparameter exchange from multiple participating vehicles during aggregation. The UAV utilized a game theory-based robust federated averaging algorithm to select and discard updates provided by the vehicles in each TTI. We modeled the aggregating process as a mixed-strategy game between the UAV and each vehicle. The valid actions of the vehicles are to send updates while the UAV can accept or ignore these updates as its valid actions. By employing Nash equilibrium, the UAV determined the probability of sufficient updates received from each vehicle.

The experimental results show that our proposed game-based aggregation algorithm is robust to overall sojourn time, thus leading to lower latency while the UAV ensures fairness among multiple vehicles. As future work, we aim to extend this game theoretic approach to the communication scenario where we have multiple UAVs. The problem will aim to optimize resource allocation so that only one UAV serves a specific vehicle at a given time. The game theoretic approach will be used to develop strategies so that other UAVs can either serve other vehicles or stay idle to preserve battery power. Moreover, as a future work, we also aim to incorporate different path loss models and their impact on the communications performance, especially when the UAV altitude is significantly increased and ventures beyond the visual line of sight of the vehicles.

Author Contributions

Conceptualization, A.G. and X.F.; methodology, A.G.; writing—original draft preparation, A.G.; writing—review and editing, X.F.; supervision, X.F.; funding acquisition, X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

3GPP	Third generation partnership project
5GAA	Fifth generation automotive association
6G	Sixth generation (Communication networks)
AWGN	Additive white Gaussian noise
BS	Base station
BSM	Basic safety messages
C-ITS	Cooperative intelligent transport systems
CPM	Cooperative perception messages
C-V2X	Cellular vehicle-to-everything
D-VCN	Drone-assisted vehicular communication network
DNN	Deep neural network
E2E	End-to-end
ES	Edge server
FANET	Flying ad hoc network
FL	Federated learning
FRL	Federated reinforcement learning
HAP	High altitude platform
KKT	Karush–Khun–Tucker (Optimality conditions)
LAP	Low-altitude platform
LoS	Line of sight
MAC	Medium-access control
MEC	Mobile edge computing
ML	Machine learning
MSE	Mean square error
NE	Nash equilibrium
NOMA	Nonorthogonal multiple access
NLoS	Non-line of sight
NR-V2X	New radio vehicle-to-everything
OTFS	Orthogonal time frequency space
PDR	Packet delivery ratio
QoS	Quality of service
RSU	Road side unit
RTT	Round trip time
SAE	Society of automotive engineers
SINR	Signal-to-interference-plus-noise ratio
SPS	Sensing-based semipersistent scheduling
TTI	Transmission time interval
UAV	Unmanned aerial vehicle
VEC	Vehicular edge computing

References

Hirai, T.; Murase, T. Performance Evaluation of NOMA for Sidelink Cellular-V2X Mode 4 in Driver Assistance System with Crash Warning. IEEE Access 2020, 8, 168321–168332. [Google Scholar] [CrossRef]
Wang, H.; Ding, G.; Chen, J.; Zou, Y.; Gao, F. UAV Anti-Jamming Communications with Power and Mobility Control. IEEE Trans. Wirel. Commun. 2023, 22, 4729–4744. [Google Scholar] [CrossRef]
Manogaran, G.; Hsu, C.H.; Shakeel, P.M.; Alazab, M. Non-Recurrent Classification Learning Model for Drone Assisted Vehicular Ad-Hoc Network Communication in Smart Cities. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2792–2800. [Google Scholar] [CrossRef]
Shi, W.; Zhou, H.; Li, J.; Xu, W.; Zhang, N.; Shen, X. Drone Assisted Vehicular Networks: Architecture, Challenges and Opportunities. IEEE Netw. 2018, 32, 130–137. [Google Scholar] [CrossRef]
Shen, T.; Ochiai, H. A UAV-Enabled Wireless Powered Sensor Network Based on NOMA and Cooperative Relaying with Altitude Optimization. IEEE Open J. Commun. Soc. 2021, 2, 21–34. [Google Scholar] [CrossRef]
Zhang, T.; Wang, Z.; Liu, Y.; Xu, W.; Nallanathan, A. Joint Resource, Deployment, and Caching Optimization for AR Applications in Dynamic UAV NOMA Networks. IEEE Trans. Wirel. Commun. 2022, 21, 3409–3422. [Google Scholar] [CrossRef]
Zhou, Y.; Yeoh, P.L.; Kim, K.J.; Ma, Z.; Li, Y.; Vucetic, B. Game Theoretic Physical Layer Authentication for Spoofing Detection in UAV Communications. IEEE Trans. Veh. Technol. 2022, 71, 6750–6755. [Google Scholar] [CrossRef]
Xie, J.; Chang, Z.; Guo, X.; Hamalainen, T. Energy Efficient Resource Allocation for Wireless Powered UAV Wireless Communication System with Short Packet. IEEE Trans. Green Commun. Netw. 2023, 7, 101–113. [Google Scholar] [CrossRef]
Ghamari, M.; Rangel, P.; Mehrubeoglu, M.; Tewolde, G.S.; Sherratt, R.S. Unmanned Aerial Vehicle Communications for Civil Applications: A Review. IEEE Access 2022, 10, 102492–102531. [Google Scholar] [CrossRef]
Hu, Z.; Shaloudegi, K.; Zhang, G.; Yu, Y. Federated Learning Meets Multi-Objective Optimization. IEEE Trans. Netw. Sci. Eng. 2022, 9, 2039–2051. [Google Scholar] [CrossRef]
Gupta, A.; Fernando, X. Co-operative Edge Intelligence for C-V2X Communication using Federated Reinforcement Learning. In Proceedings of the 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Toronto, ON, Canada, 5–8 September 2023; pp. 1–6. [Google Scholar]
Zhao, L.; Xu, H.; Wang, Z.; Chen, X.; Zhou, A. Joint Channel Estimation and Feedback for mm-Wave System Using Federated Learning. IEEE Commun. Lett. 2022, 26, 1819–1823. [Google Scholar] [CrossRef]
Wang, H.; Lv, T.; Lin, Z.; Zeng, J. Energy-Delay Minimization of Task Migration Based on Game Theory in MEC-Assisted Vehicular Networks. IEEE Trans. Veh. Technol. 2022, 71, 8175–8188. [Google Scholar] [CrossRef]
Saad, M.M.; Tariq, M.A.; Seo, J.; Kim, D. An Overview of 3GPP Release 17 & 18 Advancements in the Context of V2X Technology. In Proceedings of the 2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Bali, Indonesia, 20–23 February 2023; pp. 57–62. [Google Scholar]
González, E.E.; Garcia-Roger, D.; Monserrat, J.F. LTE/NR-V2X Communication Modes and Future Requirements of Intelligent Transportation Systems Based on MR-DC Architectures. Sustainability 2022, 14, 3879. [Google Scholar] [CrossRef]
Petrov, T.; Pocta, P.; Kovacikova, T. Benchmarking 4G and 5G-Based Cellular-V2X for Vehicle-to-Infrastructure Communication and Urban Scenarios in Cooperative Intelligent Transportation Systems. Appl. Sci. 2022, 12, 9677. [Google Scholar] [CrossRef]
Ghodhbane, C.; Kassab, M.; Maaloul, S.; Aniss, H.; Berbineau, M. A Study of LTE-V2X Mode 4 Performances in a Multiapplication Context. IEEE Access 2022, 10, 63579–63591. [Google Scholar] [CrossRef]
Tian, D.; Zhou, J.; Sheng, Z.; Chen, M.; Ni, Q.; Leung, V.C.M. Self-Organized Relay Selection for Cooperative Transmission in Vehicular Ad-Hoc Networks. IEEE Trans. Veh. Technol. 2017, 66, 9534–9549. [Google Scholar] [CrossRef]
Zhang, E.; Yin, S.; Ma, H. Stackelberg Game-Based Power Allocation for V2X Communications. Sensors 2019, 20, 58. [Google Scholar] [CrossRef]
Zhang, L.; Zhu, T.; Xiong, P.; Zhou, W.; Yu, P. A Robust Game-theoretical Federated Learning Framework with Joint Differential Privacy. IEEE Trans. Knowl. Data Eng. 2022, 35, 3333–3346. [Google Scholar] [CrossRef]
Lhazmir, S.; Oualhaj, O.A.; Kobbane, A.; Ben-Othman, J. Matching Game with No-Regret Learning for IoT Energy-Efficient Associations with UAV. IEEE Trans. Green Commun. Netw. 2020, 4, 973–981. [Google Scholar] [CrossRef]
Sempere-García, D.; Sepulcre, M.; Gozalvez, J. LTE-V2X Mode 3 scheduling based on adaptive spatial reuse of radio resources. Ad Hoc Netw. 2021, 113, 102351. [Google Scholar] [CrossRef]
Gupta, A.; Afrin, T.; Scully, E.; Yodo, N. Advances of UAVs toward Future Transportation: The State-of-the-Art, Challenges, and Opportunities. Future Transp. 2021, 1, 326–350. [Google Scholar] [CrossRef]
Kim, T.; Lee, S.; Kim, K.H.; Jo, Y.I. FANET Routing Protocol Analysis for Multi-UAV-Based Reconnaissance Mobility Models. Drones 2023, 7, 161. [Google Scholar] [CrossRef]
Kujawski, A.; Nürnberg, M. Analysis of the Potential Use of Unmanned Aerial Vehicles and Image Processing Methods to Support Road and Parking Space Management in Urban Transport. Sustainability 2023, 15, 3285. [Google Scholar] [CrossRef]
de Curtò, J.; de Zarzà, I.; Cano, J.C.; Manzoni, P.; Calafate, C.T. Adaptive Truck Platooning with Drones: A Decentralized Approach for Highway Monitoring. Electronics 2023, 12, 4913. [Google Scholar] [CrossRef]
Liu, Y.; Zong, C.; Dai, C.; Zheng, H.; Zhang, D. Behavioral Decision-Making Approach for Vehicle Platoon Control: Two Noncooperative Game Models. IEEE Trans. Transp. Electrif. 2023, 9, 4626–4638. [Google Scholar] [CrossRef]
Shan, L.; Miura, R.; Matsuda, T.; Koshikawa, M.; Li, H.B.; Matsumura, T. Vehicle-to-Vehicle Based Autonomous Flight Coordination Control System for Safer Operation of Unmanned Aerial Vehicles. Drones 2023, 7, 669. [Google Scholar] [CrossRef]
Wang, L.; Fan, D.; Huang, K.; Xia, C. A New Game Model of Task Forwarding for a Multiagent System Based on a Reputation Mechanism. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 1089–1093. [Google Scholar] [CrossRef]
Mushtaq, A.; Haq, I.u.; Nabi, W.u.; Khan, A.; Shafiq, O. Traffic Flow Management of Autonomous Vehicles Using Platooning and Collision Avoidance Strategies. Electronics 2021, 10, 1221. [Google Scholar] [CrossRef]
Alawad, W.; Halima, N.B.; Aziz, L. An Unmanned Aerial Vehicle (UAV) System for Disaster and Crisis Management in Smart Cities. Electronics 2023, 12, 1051. [Google Scholar] [CrossRef]
Kavas-Torris, O.; Gelbal, S.Y.; Cantas, M.R.; Aksun Guvenc, B.; Guvenc, L. V2X Communication between Connected and Automated Vehicles (CAVs) and Unmanned Aerial Vehicles (UAVs). Sensors 2022, 22, 8941. [Google Scholar] [CrossRef]
Huang, C.J.; Hu, K.W.; Cheng, H.W. An Electric Vehicle Assisted Charging Mechanism for Unmanned Aerial Vehicles. Electronics 2023, 12, 1729. [Google Scholar] [CrossRef]
Moghaddam, S.Z.; Akbari, T. Network-constrained optimal bidding strategy of a plug-in electric vehicle aggregator: A stochastic/robust game theoretic approach. Energy 2018, 151, 478–489. [Google Scholar] [CrossRef]
Chavhan, S.; Gupta, D.; Alkhayyat, A.; Alharbi, M.; Rodrigues, J.J.P.C. AI-Empowered Game Theoretic-Enabled Dynamic Electric Vehicles Charging Price Scheme in Smart City. IEEE Syst. J. 2023, 17, 5171–5182. [Google Scholar] [CrossRef]
Li, C.; Sun, X.; Zha, M.; Yang, C.; Wang, W.; Su, J. IGBT Thermal Model-Based Predictive Energy Management Strategy for Plug-In Hybrid Electric Vehicles Using Game Theory. IEEE Trans. Transp. Electrif. 2023, 9, 3268–3281. [Google Scholar] [CrossRef]
AL-Dosari, K.; Fetais, N. A New Shift in Implementing Unmanned Aerial Vehicles (UAVs) in the Safety and Security of Smart Cities: A Systematic Literature Review. Safety 2023, 9, 64. [Google Scholar] [CrossRef]
Gupta, A.; Fernando, X. Simultaneous Localization and Mapping (SLAM) and Data Fusion in Unmanned Aerial Vehicles: Recent Advances and Challenges. Drones 2022, 6, 85. [Google Scholar] [CrossRef]
Hossain, M.D.; Sultana, T.; Hossain, M.A.; Layek, M.A.; Hossain, M.I.; Sone, P.P.; Lee, G.W.; Huh, E.N. Dynamic Task Offloading for Cloud-Assisted Vehicular Edge Computing Networks: A Non-Cooperative Game Theoretic Approach. Sensors 2022, 22, 3678. [Google Scholar] [CrossRef]
Banez, R.A.; Li, L.; Yang, C.; Han, Z. Mean Field Game and Its Applications in Wireless Networks, 1st ed.; Springer International Publishing: Cham, Switzerland, 2021. [Google Scholar]
Luong, N.; Nguyen, T.T.V.; Feng, S.; Nguyen, H.T.; Niyato, T.D.; Kim, D.I. Dynamic Network Service Selection in IRS-Assisted Wireless Networks: A Game Theory Approach. IEEE Trans. Veh. Technol. 2021, 70, 5160–5165. [Google Scholar] [CrossRef]
Hichri, Y.; Dahi, S.; Fathallah, H. A non-cooperative game-theoretic approach applied to the service selection in the vehicular cloud. Int. J. Commun. Syst. 2022, 35, e5157. [Google Scholar] [CrossRef]
Osman, R.A.; Saleh, S.N.; Saleh, Y.N.M.; Elagamy, M.N. Enhancing the Reliability of Communication between Vehicle and Everything (V2X) Based on Deep Learning for Providing Efficient Road Traffic Information. Appl. Sci. 2021, 11, 11382. [Google Scholar] [CrossRef]
Gautam, C.; Priyanka, K.; Dharmaraja, S. Analysis of a model of batch arrival single server queue with random vacation policy. Commun. Stat. Theory Methods 2021, 50, 5314–5357. [Google Scholar] [CrossRef]
Xu, J.; Liu, L.; Wu, K. Analysis of a retrial queueing system with priority service and modified multiple vacations. Commun. Stat. Theory Methods 2022, 52, 6207–6231. [Google Scholar] [CrossRef]
Segawa, Y.; Tang, S.; Ueno, T.; Ogishi, T.; Obana, S. Improving Performance of C-V2X Sidelink by Interference Prediction and Multi Interval Extension. IEEE Access 2022, 10, 42518–42528. [Google Scholar] [CrossRef]
Feng, Z.; Huang, M.; Wu, D.; Wu, E.Q.; Yuen, C. Multi-Agent Reinforcement Learning with Policy Clipping and Average Evaluation for UAV-Assisted Communication Markov Game. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14281–14293. [Google Scholar] [CrossRef]
Amer, H.; Al-Kashoash, H.; Khami, M.J.; Mayfield, M.; Mihaylova, L. Non-cooperative game based congestion control for data rate optimization in vehicular ad hoc networks. Ad Hoc Netw. 2020, 107, 102181. [Google Scholar] [CrossRef]
Sagduyu, Y.E.; Shi, Y.; MacKenzie, A.B.; Hou, Y.T. Regret Minimization for Primary/Secondary Access to Satellite Resources with Cognitive Interference. IEEE Trans. Wirel. Commun. 2018, 17, 3512–3523. [Google Scholar] [CrossRef]
He, J.; Yang, K.; Chen, H.H. 6G Cellular Networks and Connected Autonomous Vehicles. IEEE Netw. 2021, 35, 255–261. [Google Scholar] [CrossRef]
Li, X.; Cheng, L.; Sun, C.; Lam, K.Y.; Wang, X.; Li, F. Federated-Learning-Empowered Collaborative Data Sharing for Vehicular Edge Networks. IEEE Netw. 2021, 35, 116–124. [Google Scholar] [CrossRef]
Nguyen, V.D.; Chatzinotas, S.; Ottersten, B.; Duong, T.Q. FedFog: Network-Aware Optimization of Federated Learning over Wireless Fog-Cloud Systems. IEEE Trans. Wirel. Commun. 2022, 21, 8581–8599. [Google Scholar] [CrossRef]
Ali, R.; Zikria, Y.B.; Garg, S.; Bashir, A.K.; Obaidat, M.S.; Kim, H.S. A Federated Reinforcement Learning Framework for Incumbent Technologies in Beyond 5G Networks. IEEE Netw. 2021, 35, 152–159. [Google Scholar] [CrossRef]
Zhan, Y.; Li, P.; Guo, S.; Qu, Z. Incentive Mechanism Design for Federated Learning: Challenges and Opportunities. IEEE Netw. 2021, 35, 310–317. [Google Scholar] [CrossRef]
Nie, L.; Wang, X.; Sun, W.; Li, Y.; Li, S.; Zhang, P. Imitation-Learning-Enabled Vehicular Edge Computing: Toward Online Task Scheduling. IEEE Netw. 2021, 35, 102–108. [Google Scholar] [CrossRef]
Zamanipour, M. Novel Information-theoretic Game-theoretical Insights to Broadcasting. IEEE Trans. Signal Inf. Process. Netw. 2022, 8, 713–725. [Google Scholar] [CrossRef]
Elahi, A.; Alfi, A.; Modares, H. H-∞ Consensus of Homogeneous Vehicular Platooning Systems with Packet Dropout and Communication Delay. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 3680–3691. [Google Scholar] [CrossRef]
Markova, E.; Satin, Y.; Kochetkova, I.; Zeifman, A.; Sinitcina, A. Queuing System with Unreliable Servers and In-homogeneous Intensities for Analyzing the Impact of Non-Stationarity to Performance Measures of Wireless Network under Licensed Shared Access. Mathematics 2020, 8, 800. [Google Scholar] [CrossRef]
Mao, K.; Zhu, Q.; Qiu, Y.; Liu, X.; Song, M.; Fan, W.; Kokkeler, A.B.J.; Miao, Y. A UAV-Aided Real-Time Channel Sounder for Highly Dynamic Nonstationary A2G Scenarios. IEEE Trans. Instrum. Meas. 2023, 72, 1–15. [Google Scholar] [CrossRef]
Lyu, Y.; Wang, W.; Sun, Y.; Rashdan, I. Measurement-based fading characteristics analysis and modeling of UAV to vehicles channel. Veh. Commun. 2024, 45, 100707. [Google Scholar] [CrossRef]
Hosseini, M.; Ghazizadeh, R. Stackelberg Game-Based Deployment Design and Radio Resource Allocation in Coordinated UAVs-Assisted Vehicular Communication Networks. IEEE Trans. Veh. Technol. 2023, 72, 1196–1210. [Google Scholar] [CrossRef]
Wang, B.; Yuan, Z.; Zheng, S.; Liu, Y. Data-Driven Intelligent Receiver for OTFS Communication in Internet of Vehicles. IEEE Trans. Veh. Technol. 2023, 73, 6968–6979. [Google Scholar] [CrossRef]
Liu, X.; Yang, Y.; Gong, J.; Xia, N.; Guo, J.; Peng, M. Amplitude Barycenter Calibration of Delay-Doppler Spectrum for OTFS Signal—An Endeavor to Integrated Sensing and Communication Waveform Design. IEEE Trans. Wirel. Commun. 2023, 23, 2622–2637. [Google Scholar] [CrossRef]
Plaisted, D. Some Polynomial and Integer Divisibility Problems are NP-Hard. SIAM J. Comput. 1978, 7, 458–464. [Google Scholar] [CrossRef]
Xu, Y.H.; Li, J.H.; Zhou, W.; Chen, C. Learning-Empowered Resource Allocation for Air Slicing in UAV-Assisted cellular-V2X Communications. IEEE Syst. J. 2023, 17, 1008–1011. [Google Scholar] [CrossRef]
Chen, S.l. The KKT optimality conditions for optimization problem with interval-valued objective function on Hadamard manifolds. Optimization 2022, 71, 613–632. [Google Scholar] [CrossRef]

Figure 1. This Figure illustrates a scenario where, using FL, the UAV selects local update parameters from different vehicles to update its global model parameters. If three vehicles

V_{2}

,

V_{4}

, and

V_{5}

complete their local updates in a transmission window, the UAV needs to select only one vehicle. To avoid the UAV from selecting the same vehicle repeatedly, we use a game theoretic approach. Note that each vehicle has a different objective with a different priority and competes for communication bandwidth and the UAV’s computing resources.

Figure 1. This Figure illustrates a scenario where, using FL, the UAV selects local update parameters from different vehicles to update its global model parameters. If three vehicles

V_{2}

,

V_{4}

, and

V_{5}

complete their local updates in a transmission window, the UAV needs to select only one vehicle. To avoid the UAV from selecting the same vehicle repeatedly, we use a game theoretic approach. Note that each vehicle has a different objective with a different priority and competes for communication bandwidth and the UAV’s computing resources.

Figure 2. Applications of C-V2X communications where UAVs are expected to play a significant role in enhancing the system performance. Drones can serve as an aerial data transmission interface between vehicles and base stations or as a wireless communications network service provider. These applications bring forth some challenges and opportunities where game theoretic approaches can provide a solution to integrate UAVs in C-V2X and related transportation systems [23]. Note that a single UAV can be deployed as a standalone high altitude platform (HAP) or a low altitude platform (LAP) agent. Multiple UAVs can also be deployed as a flying ad hoc network (FANET) [24].

Figure 3. System Model: Vehicles transmit gross data or local model parameters to UAV. From UAV, global model parameters are transmitted to the vehicles. Using game theory, vehicle selection is optimized.

Figure 4. Proposed solution approach using game theory for UAV-assisted C-V2X communications in a federated reinforcement learning environment.

Figure 5. The delay profile for federated learning and average delay experienced by a packet.

Figure 6. Variation in UAV transmit power (dBm) in gross data offloading with UAV altitude for varying number of vehicles (V).

Figure 7. Variation in UAV transmit power (dBm) in federated learning with UAV altitude for varying number of vehicles (V).

Figure 8. The

p_{o p t}

values for

i^{t h}

vehicle’s model transmission and UAV’s model acceptance in an iteration of the game (

G

).

Figure 8. The

p_{o p t}

values for

i^{t h}

vehicle’s model transmission and UAV’s model acceptance in an iteration of the game (

G

).

Figure 9. The variation in

p_{o p t}

values for

i^{t h}

vehicle’s model transmission with varying number of vehicles in an iteration of the game (

G

).

Figure 9. The variation in

p_{o p t}

values for

i^{t h}

vehicle’s model transmission with varying number of vehicles in an iteration of the game (

G

).

Figure 10. The variation in

p_{o p t}

values for

i^{t h}

vehicles’ model transmission with varying control parameter (

ϕ

) in an iteration of the game (

G

).

Figure 10. The variation in

p_{o p t}

values for

i^{t h}

vehicles’ model transmission with varying control parameter (

ϕ

) in an iteration of the game (

G

).

Figure 11. UAV and vehicle locations at random iterations of the game (

G

).

Figure 11. UAV and vehicle locations at random iterations of the game (

G

).

Table 1. Comparison of different game theoretic methods in UAV–vehicle communications.

Reference	Proposed Approach and Objectives	Our Approach
[6]	Considered resource deployment and caching optimization in UAV networks. Considered nonorthogonal multiple access (NOMA) networks. Application scenario included augmented reality and multimedia applications assisted by UAV base stations. Used a Stackelberg game-based approach.	We consider power consumption and UAV utility for FL and gross data offloading scenario. We consider OTFS network. We aim to optimize vehicle selection by the UAV to minimize queuing and processing delay. Consider a Markovian game.
[10]	Investigated FL in edge computing scenarios. Aimed to optimize computing and resource allocation.	Extend FL approach to UAV-assisted C-V2X communication scenario. Optimize UAV power consumption profile, minimize delay.
[11]	Considered federated averaging and federated stochastic gradient descent. Scenario considered static cloud server. Aimed to minimize communication rounds and number of hyperparameters transmitted.	Extend the FRL approach by introducing game theoretic vehicle selection by the UAV. Scenario considers a mobile server embedded in the UAV. When multiple vehicles enter a parameter transmission conflict, various strategy combinations are evaluated to decide which vehicles transmit their data to the UAV.
[13]	Considered energy–delay minimization in MEC-assisted vehicular networks. Modeled task migration based on game theory. Achieved Nash equilibrium and convergence using finite improvement property.	Extend the MEC-assisted scenario to UAV-assisted C-V2X scenario. Use Markovian game theory for vehicle selection by the UAV. Derive the optimal payoff in a transmission window using the KKT optimality conditions.
[18]	Considered self-organized relay selection for cooperative massage transmission in vehicular networks. Formulated an automata game based noncooperative game theoretic analysis. Modeled relay selection game as an ordinal potential game.	Consider UAV-assisted C-V2X communication scenario. Formulate energy-efficient computing resource allocation in UAV as a mixed-integer nonconvex optimization problem. Aim to minimize UAV cost funtion, minimize UAV’s power consumption by the $i^{t h}$ vehicle, and maximize the data rate.
[21]	Analyzed UAV–device association for reliable connections. Considered low communication power and load balanced the traffic using game theory.	Analyze optimal and fair vehicle selection using the flying UAV. Model the aggregating process at the UAV as a mixed-strategy game between the UAV and each vehicle.
[39]	Considered dynamic task offloading in vehicular networks. Used noncooperative game theoretic approach.	Aim to enhance the cooperation among vehicles to optimize utilization of UAV’s computation resources. Propose a low-complexity Markovian game approach for vehicle selection and low-latency communication.

Table 2. Definition of symbols and parameters used in the paper.

Symbol	Definition
$q$ = $[q_{1}, q_{2}, \dots, q_{n}]$	UAV trajectory at different time steps
$C$	Vehicle cluster
$V$	Number of vehicles in a cluster $C$
$g_{i, k}$	Channel gain between the $k^{t h}$ vehicle and the UAV in $i^{t h}$ transmission window
$β_{i}$	Channel power gain
( $x_{u_{i}}$ , $y_{u_{i}}$ )	UAV coordinates during the $i^{t h}$ transmission window
( $x_{v_{k}}$ , $y_{v_{k}}$ )	Coordinates of the $k^{t h}$ vehicle during the $i^{t h}$ transmission window
$H$	UAV flying height in meters (m)
$r_{i, k}$	Data transmission rate from $k^{t h}$ vehicle to the UAV in $i^{t h}$ transmission window
$σ^{2}$	Power spectral density of Gaussian noise
$p_{u i}$	Transmit power of the UAV in the $i^{t h}$ transmission window
$L_{w}$	Length of the transmission window
$p_{a} (k)$	Probability of the number of packets in queue
$μ^{k}$	Mean of the uniform distribution of packets in the queue
$A_{i} = {a_{1}, a_{2}, \dots, a_{j}}$	Action set of the $i^{t h}$ vehicle
$p_{i} (a_{j})$	Probability that the $i^{t h}$ vehicle plays action ( $a_{j}$ )
$u_{i}$	Expected utility
$s_{- i}$	Set of opposing strategies
$S_{i}$	Set of all mixed strategies ( $s_{i}$ )
n-tuple $(u_{i}, s_{i})$	An equilibrium point as quantified in Equation (15b,c)
$u_{i} ({\bar{s}}_{i}, s_{- i})$	Convex combination of the utilities of the $i^{t h}$ vehicles’ pure strategies
$π_{i α}$	The $i^{t h}$ vehicles’ pure strategies
${\bar{s}}_{i} (π_{i α})$	Associated weights with $i^{t h}$ vehicles’ pure strategies
$π_{i α}$	The $i^{t h}$ vehicle’s $α^{t h}$ pure strategy
$G$	A game with probabilities, utilities, and actions $({p_{1}, p_{2}}, {a_{1}, a_{2}}, {u_{1}, u_{2}})$
$g (\cdot)$	A game for which we evaluate utility $u = g (p)$ for probability instance $p$
$p = (p_{1}, \dots, p_{d}) \in P$	Set of d-dimensional probability instance
$w_{i}$	Max–min value of utility weights
$v_{i}$	Min–max value of utility weights
$G (p)$	Model parameter transmission probabilities for each game
$g_{i}^{s}$	Game function for different strategy combinations
$g_{i j}^{u}$	Game function for different utility combinations
$G_{p, a}^{u, s}$	Set of game $G$ for utility (u), strategy (s), probability of action ( $p, a$ )
$θ_{i j}^{v}$	Vehicle (v)’s model transmission parameter weights
$ℓ (\cdot)$	Model loss between $G$ and g
$Θ = R^{M}$	Parameter space that specifies parameter transmission from vehicles to UAV
$δ$	Discount factor for future incentives
$(p_{o p t 1}, p_{o p t 2})$	Nash equilibrium of the game
$ϕ$	Control parameter to tune $(p_{o p t 1}, p_{o p t 2})$
$b_{u_{r}}^{(τ)}$	Bandwidth at UAV’s receiver
$p_{u_{r}}^{(τ)}$	Power at UAV’s receiver
$γ_{u_{r}}^{(τ)}$	SINR during UAV flight time $τ$
$Ψ_{a_{i}}$	UAV’s cost function
$C_{i}^{(τ)}$	Downlink spectral efficiency of vehicle i

Table 3. Simulation parameters.

Parameter	Value
Vehicle Mobility	Manhattan Mobility
Number of vehicles (V)	1–100
Number of $PS$ in drone	1
Drone deployment altitude	100 m–2 km
Elliptical path’s major axis	200 m–500 m
Elliptical path’s minor axis	100 m–350 m
Edge server location	In-vehicle
Communication frequency	5.9 GHz
Modulation technique	16-QAM
Distance between vehicles	10–100 m
Road length	1–5 km
Vehicle speed	0–100 km/h
Payload size for BSM, CPM	1 byte–3 Megabytes
Payload size of FL models	1 byte–10 Megabytes
Dataset used	V2X-Sim
$T_{B S M}$	100 ms–1000 ms
$T_{C P M}$	100, 200, 300, 500 ms
$λ$	1000, 2000 packets/s
Mean speed of vehicles	50 km/h
OTFS base station transmit power	40 dBm (10 W)
Drone transmission power	20 dBm (100 mW)
Drone receiving threshold	−80 dBm
Vehicle transmission power	25 dBm (316.2 mW)
Noise power	−50 dBm ( $10^{- 8}$ W)
Standard deviation in speed	10 km/h

Table 4. Reward and strategy utility values for 2 vehicles.

	$V_{1}$ $S_{i}$	$V_{2}$ $S_{i}$
$V_{1}$ $S_{i}$	$(1, 0)$	$(1, 1)$
$V_{1}$ $S_{i}$	$(0, 1)$	$(1, 0)$

Table 5. The

p_{o p t 1}

and

p_{o p t 2}

values for

i^{t h}

and

m^{t h}

vehicles in an iteration of the game (

G

).

Table 5. The

p_{o p t 1}

and

p_{o p t 2}

values for

i^{t h}

and

m^{t h}

vehicles in an iteration of the game (

G

).

$V_{i}$ / $V_{m}$	1	2	3	4	5
1	(0.625, 0.625)	(0.250, 0.700)	(0.375, 0.875)	(0.500, 0.750)	(0.625, 0.625)
2	(1.000, 0.250)	(0.625, 0.625)	(0.500, 0.750)	(0.625, 0.625)	(0.750, 0.500)
3	(0.875, 0.375)	(0.750, 0.500)	(0.625, 0.500)	(0.750, 0.500)	(0.875, 0.375)
4	(0.750, 0.500)	(0.625, 0.625)	(0.500, 0.750)	(0.500, 0.625)	(1.000, 0.500)
5	(0.625, 0.625)	(0.500, 0.750)	(0.375, 0.875)	(0.250, 1.000)	(0.625, 0.625)

Table 6. Weight values from Equation (18) for different actions pairs.

$V_{i}$ / $V_{m}$	Select	Not_Select
(Not_complete, Transmit)	$(1, 2)$	(2,1)
(Not_complete, Not_transmit)	$(1, 1)$	$(2, 2)$
(Complete,Not_transmit)	$(8, 0)$	$(1, 4)$
(Complete,Transmit)	$(6, 6)$	$(2, 4)$

Table 7. Weight values from Equation (18) for different actions pairs for the vehicles and the UAV.

	Select	Not_Select
Updated	$(5, 7)$	$(- 1, 0)$
Not_Updated	$(- 5, - 8)$	$(0, 0)$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fernando, X.; Gupta, A. Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment. Drones 2024, 8, 238. https://doi.org/10.3390/drones8060238

AMA Style

Fernando X, Gupta A. Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment. Drones. 2024; 8(6):238. https://doi.org/10.3390/drones8060238

Chicago/Turabian Style

Fernando, Xavier, and Abhishek Gupta. 2024. "Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment" Drones 8, no. 6: 238. https://doi.org/10.3390/drones8060238

APA Style

Fernando, X., & Gupta, A. (2024). Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment. Drones, 8(6), 238. https://doi.org/10.3390/drones8060238

Article Menu

Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment

Abstract

1. Introduction

1.1. Example Scenario of a Multiagent UAV-Assisted C-V2X Communication

1.2. Contributions

1.3. Organization

2. Related Work

2.1. Game Theoretic Approaches in UAV–Vehicle Communications

2.2. Application of Game Theory to Assist Vehicles to Make a Coordinated Driving Decision

2.3. Application of Federated Learning in UAV–Vehicle Communications

3. System Model

4. Problem Formulation

5. Game Theory-Based Solution Approach

5.1. Nash Equilibrium

5.2. Action Set and the Players’ Strategies

5.3. Vehicles’ Local Model Parameter Transmission Strategies

5.4. UAV’s Global Model Update and Parameter Transmission Strategies

5.5. Vehicle Selection Payoff

6. Simulation Results and Discussion

6.1. Variation in Delay Profile and UAV Transmit Power

6.2. Optimal Payoff and Incentive Probabilities

6.3. Optimal Weight Values and Model Transmission Probabilities

7. Conclusions

Author Contributions

Funding

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI