D2D-Assisted Adaptive Federated Learning in Energy-Constrained Edge Computing

Zhenhua Li; Ke Zhang; Yuhan Zhang; Yanyue Liu; Yi Chen

doi:10.3390/app14124989

Abstract

The booming growth of the internet of things has brought about widespread deployment of devices and massive amounts of sensing data to be processed. Federated learning (FL)-empowered mobile edge computing, known for pushing artificial intelligence to the network edge while preserving data privacy in learning cooperation, is a promising way to unleash the potential information of the data. However, FL’s multi-server collaborative operating architecture inevitably results in communication energy consumption between edge servers, which poses great challenges to servers with constrained energy budgets, especially wireless communication servers that rely on battery power. The device-to-device (D2D) communication mode developed for FL turns high-cost and long-distance server interactions into energy-efficient proximity delivery and multi-level aggregations, effectively alleviating the server energy constraints. A few studies have been devoted to D2D-enabled FL management, but most of them have neglected to investigate server computing power for FL operation, and they have all ignored the impact of dataset characteristics on model training, thus failing to fully exploit the data processing capabilities of energy-constrained edge servers. To fill this gap, in this paper we propose a D2D-assisted FL mechanism for energy-constrained edge computing, which jointly incorporates computing power allocation and dataset correlation into FL scheduling. In view of the variable impacts of computational power on model accuracy at different training stages, we design a partite graph-based FL scheme with adaptive D2D pairing and aperiodic variable local iterations of heterogeneous edge servers. Moreover, we leverage graph learning to exploit the performance gain of the dataset correlation between the edge servers in the model aggregation process, and we propose a graph-and-deep reinforcement learning-based D2D server pairing algorithm, which effectively reduces FL model error. The numerical results demonstrate that our designed FL schemes have great advantages in improving FL training accuracy under edge servers’ energy constraints.

Keywords:

federated learning; energy; device-to-device; graph learning

1. Introduction

The explosive growth and widespread deployment of internet of things (IoT) devices has brought about unprecedented amounts of data, which contain valuable information that can be used by governments and enterprises in environmental monitoring, event trend prediction, and business decision making [1]. In the past, cloud computing and centralized artificial intelligence have been leveraged to derive hidden knowledge and actionable insight from the data. However, sending all the data to a cloud center causes bandwidth consumption and latency issues. The unprecedented scale and complexity of data generated by ubiquitously connected devices may outpace cloud service capabilities. Furthermore, these devices may belong to different owners who are unwilling to share raw data directly to the cloud center, due to privacy and security concerns.

In the context of edge intelligence, federated learning (FL)-empowered mobile edge computing (MEC) has arisen as a promising paradigm to address the excessive transmission overhead and information privacy issues [2]. MEC distributes multiple servers with powerful processing capabilities to the edges of IoT networks, allowing data analysis to occur near where the data are generated, thereby avoiding large-scale data transmission to the cloud center and reducing transmission costs and delays [3]. In practical IoT application scenarios, these edge servers can be service facilities powered by batteries or green energy sources, such as solar and wind energy, or they can be IoT devices with strong processing capabilities [4].

Serving as an appealing technology for edge computing systems, FL enables multiple edge servers to collaborate on developing shared models, but without directly sharing with one another the privacy-sensitive data that they each collect [5]. In FL, edge servers take their local data to train machine learning models locally. Then, the trained local models are periodically sent to a selected server for aggregation, and the aggregated global model is sent back to the edge servers for further training. The iteration of the collaborative training continues, until the model is fully trained. Servers participating in FL only need to upload local model parameters to the aggregation server rather than directly uploading their collected data, thus protecting user privacy [6]. However, long-term transmission of large amounts of local model data from edge servers to aggregators may bring about considerable energy overheads, especially when an edge server is far away from its aggregation server, with poor channel quality. Since some edge servers deployed in the wild are mostly powered by unstable green energy sources or batteries with limited capacity the energy efficiency of the FL process turns out to be a key issue to be investigated.

There have been extensive studies on energy-aware FL in edge computing environments. Some of these focused on client edge server selection or reducing the update frequency of local model data, thereby saving server energy consumption, while others concentrated on improving transmission energy efficiency through spectrum scheduling and power control. However, these measures either compromise the learning accuracy, due to an insufficient number of data samples, or lead to unsatisfactory performance, due to resource competition for data transmission between adjacent edge servers.

Device-to-device (D2D)-assisted FL has emerged as an appealing solution to lessen the data volume transferred between the edge servers and the aggregation servers while ensuring learning accuracy. That is, edge servers can be combined into cooperative pairs according to their energy availability and locations, where an edge server that is further away from the aggregation server and has a lower energy budget can send its local model data through other edge servers that act as relays and are closer to the aggregator. On the premise of reliable data delivery, shorter distance transmission can reduce the sending power of edge servers. Moreover, for those relay servers with powerful processing capabilities, they can also fuse the received model with their own local models and then upload them to the aggregation servers, thereby further reducing the amount of transmitted data and energy consumption.

One generally accepted benchmark to evaluate an FL solution is the difference between a learned model and its corresponding real model. However, most existing works on D2D-assisted FL only note the impact of the training set size on FL performance when designing FL schemes. They note that more training data sets yield more accurate learning models, but they usually ignore the effect of computing power on the FL results. In fact, for a given dataset, consuming more computing power for multiple learning iterations can also improve local model accuracy. Thus, neglecting or improper computing resource scheduling may seriously undermine FL efficiency.

Moreover, recently designed client pairing strategies in D2D FL only take account of the differences between clients in geographical locations and available energy, while always ignoring the possible differential characteristics of cooperating clients, in terms of training data generation. In some scenarios, such as FL clients training an IoT model based on data collected by sensors, different types of sensors may vary in data collection frequencies, generated data volumes, and information contained in a unit of data. These various characteristics may affect the accuracy of the local model trained by each client, change the collaboration efficiency between different D2D pairs, and further affect the overall performance of the federated learning system. However, the relationship between the features of training datasets and D2D collaboration patterns in FL scheduling is largely unexplored.

To fill these gaps, we propose a D2D-assisted FL mechanism for energy-constrained edge computing. Unlike the previous studies, we incorporate computing power into FL management and adjust the energy allocation between training processing and data transfer for each FL client based on the joint effect of computing power and dataset size on learning accuracy. Furthermore, a dataset-characteristic-aware FL scheme is proposed to exploit the effect of dataset correlation on improving model aggregation efficiency. Catering for complex and dynamic relations between heterogeneous clients, we leverage a graph theoretic approach to designing D2D-pairing and resource-allocation schemes, which maximizes FL accuracy under both energy and delay constraints. The main contributions of this paper are summarized as follows:

We incorporate D2D communication into FL training so as to meet the energy constraints of edge servers while minimizing the training model error. In addition to the energy consumption of the wireless transmission of FL clients, the energy cost for model training at each client as well as its training performance are also considered in the FL scheduling.
In order to jointly match FL clients in the training process, in terms of energy budget, computing power, and communication topology, we design a partite graph-based FL scheme with adaptive D2D pairing and aperiodic local iterations of the clients.
We leverage graph learning to exploit the performance gain of dataset correlation between FL clients in the model aggregation process, and we propose a graph-and-deep reinforcement learning-based D2D pairing algorithm, which effectively reduces the model error of the FL.

The rest of this paper is organized as follows. In Section 2, we review related works. In Section 3, we introduce the system model. In Section 4, we propose a D2D-assisted FL scheme for clients with aperiodic local iterations. In Section 5, we further investigate the FL with correlated training datasets. A performance evaluation is presented in Section 6. Finally, we conclude our work in Section 7.

2. Related Work

Federated learning, as a promising distributed and privacy-preserving machine learning technology, has been explored and applied in various fields. In [7], the authors used FL to exploit the parallel-processing power of multiple servers for training mobile traffic-prediction models, and they devised a server-selection-and-dataset-management scheme for FL participation. In [8], the authors applied FL in radio frequency fingerprinting, and they developed a deep federated learning network, which facilitated distributed fingerprinting learning while ensuring privacy preservation within diverse networks. In [9], FL was utilized for resource allocation and device orchestration in non-orthogonal multiple-access-enabled industrial IoT networks, where a hierarchical FL framework was presented to reduce learning overheads. In [10], the authors designed an FL-based distributed algorithm to predict file popularity for edge caching arrangement, which solved user privacy issues in centralized approaches. With the rise of the new generation of wireless communications, FL has been used for a variety of radio automating, network slicing, and process orchestration. The authors in [11] comprehensively investigated FL-empowered mobile network management from the access to the core of 5G-beyond networks. The authors in [12] applied FL to the medical field, and they developed a privacy infrastructure based on federated learning and blockchain to provide a shared model of the spread of the COVID-19 virus.

As the distributed processing pattern of FL well-matches the multi-server architecture of MEC, the integration of FL and MEC has been extensively studied recently. For example, in [13] the authors introduced a clustered FL for mobile traffic-prediction MEC systems. By clustering the MEC servers based on their geographical location and data patterns, the learning straggler impact caused by the heterogeneity of the MEC servers was alleviated. The authors in [14] focused on the slow convergence of FL caused by learning-community changes, and they proposed a meta-learning-aided FL, where new clients could learn from other ones and quickly perform learning scene migration. In order to obtain accurate machine learning models for time-sensitive applications, the authors in [15] presented an age-aware FL framework, and they reduced the age of the data in MEC networks through data selection and aggregator placement. Considering that model parameters may be lost due to unreliable communication, thereby slowing down the FL training process, the authors in [2] devised a multi-layer computing architecture with dynamic epoch adjustment in FL-enabled MEC systems. To tackle management complexity caused by the continuous growth of user equipment, the authors in [16] proposed a multiple-layer framework with clustered clients to deploy FL algorithms on large-scale MEC networks. In [17], the authors aimed to address the problem that FL may malfunction in MEC systems, due to device heterogeneity, unstable channels, and unpredictable user mobility, and they proposed a multi-core FL framework to help FL algorithms to be implemented in MEC systems. In [18], the authors focused on the incentives for clients participating in FL, and they designed an incentive mechanism that selected high-quality edge nodes to contribute to FL training through a heuristic algorithm.

In order to reduce the energy consumption caused by parameter transmission in the FL training process, the authors in [19] proposed an energy-and-distribution-aware collaborative-clustering algorithm to select multiple IoT devices with sufficient energy as aggregation cluster heads. The authors in [20] developed a weight-quantization scheme to facilitate on-device local training over heterogeneous mobile devices. A mixed-integer programming problem was formulated to jointly determine quantization and spectrum allocation to minimize overall FL energy consumption. In [21], the authors investigated wireless-powered FL networks with non-orthogonal multiple access, and they introduced a two-dimensional search algorithm to minimize total energy consumption. In [22], the authors designed an energy-efficient FL scheme over cell-free IoT networks to minimize the energy consumption of IoT devices participating in FL. A wireless-energy-transfer-enabled hierarchical FL framework for heterogeneous networks with a massive multiple-input–-multiple-output wireless backhaul was developed in [23]. The authors saved energy costs through FL device association and wireless-transmitted energy management. In [24], the authors focused on renewable-energy-driven FL systems, and they leveraged reinforcement learning to adapt devices to intermittent renewable-energy supplies and to improve communication efficiency.

Few studies have investigated D2D-assisted FL mechanisms. In [25], the authors designed a hybrid blockchain architecture to enhance the security and reliability of model parameters sharing between FL D2D pairs, and they improved FL node selection through deep reinforcement learning. In [26], the authors developed a multi-stage hybrid FL approach, which orchestrated the devices on different network layers in a collaborative D2D manner to form consensus on the learning model parameters. The authors in [27] introduced a decentralized FL scheme that took D2D communications and overlapped clustering to enable decentralized device aggregation and to save energy consumption. In [28], the authors focused on the wireless communication efficiency of D2D FL, and they presented theoretical insights into the performance of digital and analog implementations of decentralized stochastic gradient descent in FL systems. To improve the scalability of Fl, the authors in [29] proposed a server-less learning approach that leveraged the cooperation of devices to perform data operations inside the network by iterating local computations and mutual interactions. In [30], the authors investigated energy-aware deep neural network model training, and they formulated an energy-aware and D2D-assisted FL problem to minimize the global loss of the training model.

Although the aforementioned studies have provided some useful insights and promising paradigms for FL scheduling, key factors, such as the impact of computing power on model accuracy and the correlation of training datasets, as well as the latency constraints of model training, have not yet been well-investigated. Furthermore, how to adaptively formulate D2D pairs between heterogeneous FL clients with complex communication topology remains a critical challenge. To fill this gap, in this paper, we leverage graph theoretic approaches to describe the relationships between FL clients and to obtain the joint schemes of D2D pairing and resource allocation, which improves the model accuracy of FL training.

3. System Model and Problem Formulation

As shown in Figure 1, we consider D2D-assisted federated learning in an edge computing network for IoT applications. There are

N + 1

edge servers that jointly execute FL to obtain an IoT application model, such as traffic flow prediction and environmental monitoring. Servers with indexes 1 to N act as FL clients, and the server indexed with 0 acts as an aggregator. These clients are heterogeneous in terms of energy budget, local dataset generation, data processing capabilities, and geographical location. A deep neural network (DNN) model is deployed in these clients for training the IoT model [30].

Figure 1. D2D-assisted federated learning in an edge computing network.

Without loss of generality, the number of rounds for the entire FL system to run is defined as K. In each round, the client servers train their own local model parameters based on the datasets collected from IoT devices and upload them to the aggregation server. In the k-th round, the number of local training iterations taken by client server n before uploading its model parameters is given as

τ_{n, k}

. Then, based on these collected model parameters, the aggregation server generates a global model and distributes it back to the client servers.

Let

D_{n, k}

denote the local dataset of server n generated in round k, and let

(x_{n, k}, y_{n, k})

be one data point in

D_{n, k}

, where

x_{n, k} \in R^{d}

is the input features,

y_{n, k}

is the label of the data point, and

n \in N = {1, 2, . . ., N}

. The learning target of the FL running on server n is to train a DNN to minimize the error between the neural network output and label

y_{n, k}

given input

x_{n, k}

. The loss function training based on dataset

D_{n, k}

can be defined as

\begin{matrix} L (ω_{n, k} | D_{n, k}, τ_{n, k}) = \frac{\sum_{{x_{n, k}, y_{n, k}} \in D_{n, k}} l (ω_{n, k}, x_{n, k}, y_{n, k}, τ_{n, k})}{| D_{n, k} |}, \end{matrix}

(1)

where

ω_{n, k}

is a d-dimensional vector parameter of the trained DNN model, function l is a mean squared error or cross-entropy, and function

L (ω_{n, k} | D_{n, k}, τ_{n, k})

is a monotonically increasing function of dataset size

| D_{n, k} |

and local training iteration time

τ_{n, k}

[31].

Due to the constrained energy budget of the servers, energy conservation becomes a critical issue and can be addressed in two ways. The first is to reduce computing power costs—that is, each client server can selectively participate in the local model training process in each round. We adjust the energy consumption of server n participating in model training through variable

τ_{n, k}

. Another way to save energy is to reduce communication energy consumption. In each FL training round, a client can either not upload any locally trained model parameters or can directly transfer the parameters to the aggregation server or even use a relay node to submit parameters to the aggregation server in a D2D manner. The relay node is selected from the servers that act as clients in the FL training. We take

ρ_{n, m, k}^{del} = 1

to indicate that server n delivers parameters to relay m and vice versa, where

n \in N

,

m \in {0} \cup N

, and

m \neq n

.

Given the time

t_{n}

of client n to process a unit size of dataset, the training time for that client in round k is

\begin{matrix} T_{n, k}^{tra} = τ_{n, k} t_{n} \cdot | D_{n, k} | . \end{matrix}

(2)

The spectrum taken to transmit model parameters between the servers participating in the FL training is divided into Z orthogonal channels with bandwidth B, of which

Z_{1}

channels are used for the direct delivery of parameters from client or relay servers to the aggregation server, and the remaining

Z_{2}

channels are leveraged for client servers’ D2D communications. We assume that a channel can only be allocated to at most one server at the same time. Thus, at any given time and among the servers participating in FL, at most

Z_{1}

servers are allowed to directly upload to the aggregation server without violating bandwidth capacity limitations, while at most

Z_{2}

servers can conduct D2D interactions. Differently from existing studies that are limited to one-to-one D2D communication, we consider that all servers are equipped with broadband wireless receivers, which can support many-to-one D2D communication and receive model parameters from multiple D2D paired servers simultaneously. When server n obtains a channel to communicate with server m its transmission rate can be expressed as

\begin{matrix} r_{n, m} = B {log}_{2} (1 + \frac{p_{n} ξ_{0} d_{n, m}^{- κ}}{σ^{2}}), \end{matrix}

(3)

where

p_{n}

and

d_{n, m}

are the transmission power of server n and the distance between server n and m, respectively;

σ^{2}

is the white noise of the channel;

ξ_{0}

is the channel gain at a reference distance of one meter; and

κ

is a constant.

Let the the data amount of the model parameters generated by server n training be

C_{n}

. Server n may also act as a relay node in round k. In such a case, it will fuse the parameters passed to it by another client server

n^{'}

with the model parameters it has trained and then deliver the fusion result to server 0. The fused model parameter at server n is

\begin{matrix} ω_{n, k} = \frac{\sum_{n^{'} \in N, n^{'} \neq n} ρ_{n^{'}, n, k}^{del} | D_{n^{'}, k} | ω_{n^{'}, k} + 1_{τ_{n, k} \geq 1} | D_{n, k} | ω_{n, k}}{\sum_{n^{'} \in N, n^{'} \neq n} ρ_{n^{'}, n, k}^{del} | D_{n^{'}, k} | + 1_{τ_{n, k} \geq 1} | D_{n, k} |}, \end{matrix}

(4)

where

1_{\hat{z}}

is an indicator function that equals 1 if

\hat{z}

is true and is 0 otherwise. It is noteworthy that the training and transmission process of server

n^{'}

and the local training of server n can be carried out simultaneously. Thus, the time cost for server n in the k-th round of FL, which consists of local model training time, D2D model gathering time, and model parameter transmission time, can be calculated as

\begin{matrix} \begin{matrix} T_{n, k} = max_{n^{'} \in N, n^{'} \neq n} \{{ρ_{n^{'}, n, k}^{del} C_{n^{'}} / r_{n^{'}, n} + T_{n^{'}, k}^{tra}}, T_{n, k}^{tra}\} + ρ_{n, m, k}^{del} C_{n} / r_{n, m} \end{matrix} . \end{matrix}

(5)

When the FL system runs in a synchronous mode, in each round, the aggregation server needs to obtain the uploaded parameters from the latest client before it can aggregate the model. Therefore, the time consumption of the k-th round is expressed as

\begin{matrix} T_{k} = max_{n \in N} {T_{n, k}} . \end{matrix}

(6)

Next, we discuss the energy cost of these clients. Let

e_{n}

denote the energy cost of server n to process a unit size of dataset in one local training iteration. The energy consumption of server n in round k is

\begin{matrix} E_{n, k}^{tra} = τ_{n, k} e_{n} \cdot | D_{n, k} | . \end{matrix}

(7)

Moreover, in addition to the energy consumed by model training, there is also an energy cost when a client transmits model parameters to a relay server or the aggregation server. The transmission energy consumption is expressed as

\begin{matrix} E_{n, m, k}^{del} = p_{n} C_{n} / r_{n, m}, \end{matrix}

(8)

where

C_{n}

is the data size of the model parameters trained by server n, and

m \in {0 \cup N}

and

m \neq n

. Based on (7) and (8), we can obtain the total energy consumption of server n in the kth round of FL as

\begin{matrix} \begin{matrix} E_{n, k} = E_{n, k}^{tra} + \sum_{m \in {0 \cup N}, m \neq n} ρ_{n, m, k}^{del} E_{n, m, k}^{del} \end{matrix} . \end{matrix}

(9)

In round k, the uploaded local models from both client or relay servers are aggregated at server 0 as

\begin{matrix} \begin{matrix} ω_{k} = \frac{\sum_{n \in N} ρ_{n, 0, k}^{del} | D_{n, k} | ω_{n, k}}{\sum_{n \in N} ρ_{n, 0, k}^{del} | D_{n, k} |} \end{matrix} . \end{matrix}

(10)

Then, server 0 distributes the aggregated model to all the client servers for the

k + 1

round of training. Since server 0 only takes parameter aggregation its processing cost is negligible. Moreover, model distribution is carried out in the form of a broadcast with a fixed delay. Therefore, we focus on minimizing the expectation of the loss function of these client servers under given constraints by scheduling their model training and parameter transmission. The formulated optimization problem is as follows:

\begin{matrix} \begin{matrix} min_{{τ_{n, k}, ρ_{n, m, k}^{del}}} E {\sum_{n \in N} L (ω_{n, K} | D_{n})} \\ s . t . C 1 : \sum_{k \in K} E_{n, k} ⩽ E_{n}^{max}, n \in N \\ C 2 : \sum_{k \in K} T_{k} ⩽ T^{max} \\ C 3 : \sum_{n \in N} ρ_{n, 0, k}^{del} ⩽ Z_{1}, k \in K \\ C 4 : \sum_{n \in N} \sum_{n^{'} \in N} ρ_{n^{'}, n, k}^{del} ⩽ Z_{2}, k \in K \\ C 5 : \sum_{m \in {0 \cup N}} ρ_{n, m, k}^{del} ⩽ 1, n \in N, k \in K \\ C 6 : ρ_{n, m, k}^{del} = {0, 1}, τ_{n, k} = {0, 1, 2, . . .}, \\ n \in N, m \in {0 \cup N}, k \in K \end{matrix} . \end{matrix}

(11)

where

D_{n} = ⋃_{k \in K} D_{n, k}

. In (11), constraints C1 and C2 guarantee that the energy allocated for the model training and parameter transmission of each client server should not exceed its energy budget and that the time cost for K rounds of FL should under maximum delay constraint, respectively. Constraints C3 and C4 ensure that the number of client servers performing direct delivery to server 0 in parallel and the number of client servers performing D2D communication simultaneously cannot exceed their respective maximum channel capacities. Constraint 5 states that for any client server, it cannot communicate with more than one target object at the same time. Constraint C6 indicates the possible value range of

ρ_{n, m, k}^{del}

and

τ_{n, k}

.

4. Partite Graph-Based FL Scheme with Adaptive D2D Pairing and Aperiodic Local Iterations

As (11) is an integer programing problem, it is proved to be NP-complete. To address problem (11), in this section, we leverage the graph theoretic approach, and we propose a partite graph-based FL scheme, which minimizes the learning loss function through adjusting D2D pairing and agent local iterations.

Since the energy budget of edge servers is limited, how to improve learning efficiency under energy constraints has become a key issue in FL scheme design. In a traditional FL mechanism, the edge servers as learning clients always perform fixed numbers of local iterations before each aggregation to a central server. However, in practical edge computing applied to the IoT, the dataset gathered by each edge server may follow non-IID (Identical and Independently Distributed), which always cause differences in the local models trained by the edge servers. The differences vary with time and become worse as the number of iterations increases. If model aggregation is implemented in the early iteration of the edge server, excessive differences between local models may cause serious deviations in the aggregation model, thereby reducing learning accuracy. In addition, the transmission of model parameters in the aggregation process also brings unnecessary communication energy costs. Thus, the aggregation time needs to be optimally selected.

Given the maximum FL rounds K, for server n, subject to its energy limitation, the number of iterations in each round, denoted as

τ_{n, k}

, satisfies the following relationship:

\sum_{k = 1}^{K} τ_{n, k} = \frac{E_{n}^{max}}{| D_{n, k} | e_{n}} .

(12)

We consider the loss function

L (ω_{n, k} | D_{n, k}, τ_{n, k})

to be

ζ

-smooth, i.e., for any

ω_{n, k}

and

ω_{n, k}^{'}

,

\begin{matrix} \begin{matrix} L (ω_{n, k}^{'} | D_{n, k}, τ_{n, k}) \leq L (ω_{n, k} | D_{n, k}, τ_{n, k}) + ▽ L (ω_{n, k} | D_{n, k}, τ_{n, k}) (ω_{n, k}^{'} - ω_{n, k}) + \frac{ζ}{2} | | ω_{n, k}^{'} - ω_{n, k} {| |}^{2} \end{matrix} . \end{matrix}

(13)

This loss function is also

μ

-quasi-convexity; that is, for

ω_{n, k}^{*} \in arg min L (ω_{n, k} | D_{n, k}, τ_{n, k})

and any

ω_{n, k}

, we have

\begin{matrix} \begin{matrix} L (ω_{n, k}^{*} | D_{n, k}, τ_{n, k}) \geq L (ω_{n, k} | D_{n, k}, τ_{n, k}) + ▽ L (ω_{n, k} | D_{n, k}, τ_{n, k}) (ω_{n, k}^{*} - ω_{n, k}) + \frac{μ}{2} | | ω_{n, k}^{*} - ω_{n, k} {| |}^{2} \end{matrix} . \end{matrix}

(14)

Moreover, the gradient of

L (ω_{n, k} | D_{n, k}, τ_{n, k})

has an upper bound, which can be described as the existence of

ϕ_{n} > 0

and

G > 0

, such that

E | | ▽ L (ω_{n, k} | D_{n, k}^{'}, τ_{n, k}) - ▽ L (ω_{n, k} | D_{n, k}, τ_{n, k}) {| |}^{2} \leq ϕ_{n}^{2},

(15)

and

E | | ▽ L (ω_{n, k} | D_{n, k}^{'}, τ_{n, k}) {| |}^{2} \leq G^{2},

(16)

where

D_{n, k}^{'}

is a mini-batch dataset sampled from

D_{n, k}

randomly.

For client server

n \in N

, when its loss function

L (ω_{n, k} | D_{n, k}, τ_{n, k})

meets the above conditions, the server’s local iteration number in K FL rounds is an arithmetic sequence with a tolerance of

Δ_{n}

; that is,

Δ_{n} = τ_{n, k + 1} - τ_{n, k}

[32]. Therefore, we can obtain the entire iteration sequence of each client server by calculating its first iteration number and

Δ_{n}

.

We have

\frac{\sum_{k = 1}^{K} τ_{n, k}}{K} < τ_{n, 1} < \frac{(2 K - 2 + 2 H)}{(K - 1 + 2 H) \sum_{k = 1}^{K} τ_{n, k} / K},

(17)

where H is a positive constant that satisfies

H Δ < τ_{n, K}

and

Δ = \frac{2 (E_{n}^{max} / | D_{n, k} | e_{n} - K τ_{n, 1})}{K (K - 1)}

(18)

according to (12). Since the number of FL aggregation rounds affects the values of

τ_{n, 1}

and

Δ

, we use an iterative optimization approach to update it. The initial value of the aggregation round K is calculated as

K = \frac{T^{m a x} - max \{E_{n}^{m a x} / e_{n} t_{n} {| D_{n, k} |}^{2}\}}{C_{n} / r_{n, 0}} .

(19)

Considering that these client servers may have different iteration times and computing energy consumption in each round of FL aggregation, we select the servers with sufficient energy surplus as the relay nodes in the D2D pairs. Moreover, in order to aggregate as many local models trained by client servers as possible under the condition of limited communication channels, it is also necessary to optimize the organization of D2D pairs and schedule the delivery time of non-relay nodes to relay nodes to support the concurrent communication of multiple D2D pairs while improving spectrum efficiency. Thus, based on the obtained iteration numbers of these client servers, we propose a partite graph-based D2D pairing scheme, as follows.

First, the client servers are divided into

Z_{1}

clusters through a voting approach. Here,

z_{1}

is the maximum number of channels used by the client servers to directly deliver local model parameters to the aggregation server, and

Z_{1}

limits the number of relay nodes working in parallel. For a given server n in FL round k, we calculate the energy cost of delivering its local model parameters to the aggregation server with the relay help of another server m, where

n, m \in N

and

n \neq m

. The relay node with the smallest energy consumption is recorded as

m^{*}

. If

E_{n, m^{*}, k}^{del} + E_{m^{*}, 0, k}^{del} < E_{n, 0, k}^{del},

(20)

then cast one vote for relay node

m^{*}

; otherwise, no vote will be taken. After traversing all server nodes and completing the voting, we select the

z_{1}

nodes with the highest votes as relay servers and divide the client servers that voted for the same relay server into a cluster. For the other servers that did not vote or had lower votes we choose the relay server with the lowest energy consumption to join the corresponding cluster.

Next, we build a complete J-partite graph within each cluster. Let

G_{f} (V_{f}, E_{f})

denote this graph, where

V_{f}

represents the set of vertices consisting of the servers in the cluster and

E_{f}

is the set of edges connecting these nodes. Take any two vertices n and

n^{'}

from

V_{f}

, and if

min \{T_{n, k}^{tra} + \frac{C_{n}}{r_{n, m^{*}}} - T_{n^{'}, k}^{tra}, T_{n^{'}, k}^{tra} + \frac{C_{n^{'}}}{r_{n^{'}, m^{*}}} - T_{n, k}^{tra}\} \leq 0,

(21)

then connect n and

n^{'}

to form an edge. Then, divide the vertices in

G_{f} (V_{f}, E_{f})

into J disjoint subsets to form a complete J-partite graph. In this graph, no two graph vertices within the same set are adjacent but there are edges between vertices that belong to different subsets. The divided subsets satisfy a sequential relationship, in terms of time cost; that is, the sum of the maximum training time of the servers in subset

J_{i}

and the time of transmission to the aggregation server should be no more than the minimum training time of the servers in subset

J_{j}

, where

i, j \in J

and

i < j

. This relationship can be represented as

max_{n \in J_{i}} \{T_{n, k}^{tra} + \frac{C_{n}}{r_{n, m^{*}}}\} \leq min_{n \in J_{j}} \{T_{n, k}^{tra}\}

(22)

By dividing such J subsets, client servers with similar model training times belong to the same subset, while server training times in different sets have a sequential relationship. In each round of FL, an edge server can be selected from each subset to aggregate to the relay server in graph

G_{f} (V_{f}, E_{f})

. Due to the difference in training time of these edge servers, the communication times between the edge servers and the relay server have no overlap, thereby reducing co-channel communication interference and improving transmission efficiency.

The client servers in these J subsets are selected, based on priority, to train local models and aggregate parameters to relay servers. The priority of server n in FL round k is determined jointly by the server’s remaining energy and its model training efficiency, which can be calculated as

S_{n, k} = θ_{1} \frac{E_{n}^{m a x} - \sum_{i = 0}^{k} E_{n, i}}{e_{n} | D_{n, k} |} + \frac{θ_{2}}{t_{n}},

(23)

where

θ_{1}

and

θ_{2}

are positive coefficients. In order to normalize the priority, we put

{S_{1, k}, S_{2, k}, . . ., S_{N, k}}

into a softmax function to obtain the probability of each server being selected, which is depicted as

{o_{1, k}, o_{2, k}, . . ., o_{N, k}} = Softmax ({S_{1, k}, S_{2, k}, . . ., S_{N, k}}) .

(24)

The server with the highest probability will be selected first, which is denoted as

n_{k}^{*} = arg {max}_{n} {o_{1, k}, o_{2, k}, . . ., o_{N, k}}

. Let

ℵ_{n}

represent the neighbor servers of server n in graph

G_{f}

. The next server to be selected is the one with the highest probability belonging to

ℵ_{n}

. This server selection process continues until the currently selected server has no available new neighbors. Then, these selected servers

{n_{k}^{*}}

train their local models, respectively, according to their specified number of iterations

{τ_{n^{*}, k}}

, aggregate their local models on relay server

m^{*}

, and, finally, all the relay severs distributed in

Z_{1}

clusters pass their model parameters to the aggregation server for one global aggregation, thereby completing the kth round of FL.

As the remaining energy of the servers changes and the running time elapses, the total number of FL rounds K also changes. After the kth FL round is completed, the remaining FL round number is updated as

K_{k + 1} = max {0, \frac{T^{max} - \sum_{i = 1}^{k} T_{i} - max \{\frac{(E_{n}^{m a x} - \sum_{i = 1}^{k} E_{n, i})}{| D_{n, k} |^{2} e_{n} t_{n}}\}}{C_{n} / r_{n, 0}}} .

(25)

In the

(k + 1)

th round, the number of iterative training of each server is redetermined, the relay servers and client servers are selected through the constructed partite graph, and the model parameters are aggregated again. The process iterates until the updated FL round reaches 0. The main steps of the proposed partite graph-based FL scheme are summarized as Algorithm 1.

The computational complexity of Algorithm 1 primarily arises from the construction of partite graphs in all FL rounds. From step 1 to step 15, the formation of clusters takes

O (N^{2})

time, as the system consists of N vertices. The construction of partite graphs in each round also takes

O (N^{2})

time, which depends on the number of vertices. Therefore, the computation complexity of Algorithm 1 can be estimated by

O (K N^{2})

, since it takes K FL rounds.

Algorithm 1 Partite graph-based FL algorithm

Require:
: Initial $E_{n}^{max}, D_{n, k}, t_{n}, c_{n}, p_{n}, d_{n, m}, ε^{h}$
: Set $V_{n} = 0$ and $ρ_{n, m, k}^{del} = 0$
1:: for $n = 1$ to N do
2:: $α_{n} = 0$ ;
3:: for $m = 1$ to N do
4:: if $E_{n, m, k}^{del} + E_{m, 0, k}^{del} < E_{n, α_{n}, k}^{del} + E_{α_{n}, 0, k}^{del}$ then
5:: $α_{n} = m$ ;
6:: end if
7:: end for
8:: $V_{α_{n}} = V_{α_{n}} + 1$ ;
9:: end for
10:: Select $Z_{1}$ servers from set $N$ according to their $V_{n}$ values from large to small, which are denoted as $F_{n^{*}}$ , and take them to form $Z_{1}$ clusters;
11:: for $n = 1$ to N do
12:: if $α_{n} \in F_{α_{n}^{*}}$ then
13:: $F_{α_{n}^{*}} = n ⋃ F_{α_{n}^{*}}$ , $ρ_{n, α_{n}, k}^{del} = 1$ ;
14:: end if
15:: end for
16:: Set k = 0;
17:: while $T^{max} - \sum_{i = 0}^{k} T_{i} > 0$ and $E_{n}^{max} - \sum_{i = 0}^{k} E_{n, i} > 0$ do
18:: $k = k + 1$ , and $ρ_{n, k}^{tra} = 0$ ;
19:: $K_{k} = {T^{max} - \sum_{i = 0}^{k - 1} T_{i} - max {(E_{n}^{max} - \sum_{i = 0}^{k - 1} E_{n, i}) / D_{n, k - 1} e_{n} \cdot t_{n} \cdot | D_{n, k - 1} |}} r_{n, 0} / C_{n}$ ;
20:: Randomly select $τ_{n, k}$ from range $[\frac{(E_{n}^{max} - \sum_{i = 0}^{k} E_{n, i}) / D_{n, k} e_{n}}{K_{k}}, \frac{(2 K_{k} - 2 + 2 H)}{(K_{k} - 1 + 2 H) \sum_{i = 0}^{K_{k}} τ_{n, i} / K_{k}}]$ following a continuous uniform distribution;
21:: $S_{n, k} = θ_{1} \frac{E_{n}^{max} - \sum_{i = 0}^{k} E_{n, i}}{e_{n} | D_{n, k} |} + θ_{2} t_{n}$ ;
22:: $O_{k} = Softmax ([S_{1, k}, S_{2, k}, . . ., S_{N, k}])$ ;
23:: for $F_{n^{*}}$ in ${\{F_{n^{*}}\}}_{n^{*} = 1}^{Z_{1}}$ do
24:: Construct partite graph $G_{f} (V_{f}, E_{f})$ ;
25:: for $(v_{n}, v_{n^{'}}) \in V_{f}$ do
26:: if $min {T_{n, k}^{tra} + \frac{C_{n}}{r_{n}, α_{n}} - T_{n^{'}, k}^{tra}, T_{n^{'}, k}^{tra} + \frac{C_{n^{'}}}{r_{n^{'}}, α_{n^{'}}}$
: $- T_{n, k}^{tra}} > 0$ then
27:: $G_{f} = G_{f} - e_{v_{n}, v_{n^{'}}}$ ;
28:: end if
29:: Divide vertices $V_{f}$ of graph $G_{f}$ into $J_{f, k}$ subsets;
30:: end for
31:: Select client servers for local training $n^{'} = \underset{n}{arg max} \{O_{k} (n \in F_{n^{*}})\}$ ;
32:: $V_{f}^{h} = V_{f} - J_{i}, (n^{'} \in J_{i})$ ;
33:: while Randomly select $ε < ε^{h}$ and $V_{f}^{h} \neq ⌀$ do
34:: $n^{'} = \underset{n}{arg max} \{O_{k} (n \in ℵ_{n ‘, k} ⋂ V_{f}^{h})\}$ , and $ρ_{n^{'}, k}^{t r a i n} = 1$ ;
35:: $V_{f}^{h} = V_{f}^{h} - J_{i}, (n^{'} \in J_{i})$ ;
36:: end while
37:: $τ_{n, k} = ρ_{n, k}^{tra} τ_{n, k}$ ;
38:: end for
39:: end while

During the operation of the partite graph-based D2D pairing scheme, when a server fails, it will be handled in two cases. First, if the server has been selected as a relay node, the scheme needs to proceed from step 10 of Algorithm 1 to re-select the relay nodes and further reconfigure the D2D pairing relations. On the other hand, if the server is not acting as a relay node, then the failed server only needs to be removed from the partite graph and stopped from sending its local model to its pairing server. When the network is split and some paired servers lose their pairing relationship due to unreachable communication, it is necessary to perform step 23 of Algorithm 1 to re-match relay servers for these independent client servers.

5. Graph Learning-Enabled D2D Pairing in Dataset-Correlated FL

When taking FL to obtain IoT application models, the information contained in the datasets collected by each agent may seriously affect the training performance. Moreover, the correlation between the datasets used by client agents may also affect the efficiency of the FL model aggregation. For instance, for using sensors to monitor mountainous areas and constructing their environmental models, the model generated by aggregating sensor datasets with overlapping or adjacent monitoring areas is always more accurate than the model corresponding to aggregating non-correlated areas. Based on the above considerations, in this section, we will explore the gain of dataset correlation between client servers in model aggregation process, and leverage graph learning and deep reinforcement learning (DRL) to propose a D2D-assisted FL scheme.

Let

g_{n, m, k}

denote the model training gain brought about by the aggregation of client server n and relay server m due to their dataset correlation in the k-th round of the FL iteration. Since there exists a complex and implicit relationship between the datasets correlation and the model gains, it is difficult to give an explicit formula for

g_{n, m, k}

. Thus, we use a graph to present this relationship, which can be denoted as

G (V, E, X)

. Here,

V

is the set of vertices, composed of FL client servers, and

E

is the set of edges between the vertices. The edge between any two vertices only exists when their corresponding servers n and

n^{'}

satisfy

T_{n, k}^{tra} + \frac{C_{n}}{r_{n, 0}} - T_{n^{'}, k}^{tra} \geq 0 .

(26)

In graph

G (V, E, X)

,

X = {X_{n}}

is the attribute of vertex n, such as the collection place and collection frequency of the dataset used by the server corresponding to n, and the server’s remain energy budget.

We use

h_{n, k}

to denote the features embedded in vertex n in round k of the FL, which can be calculated as

h_{n, k}^{(l)} = σ (ϕ^{(l)} \sum_{n^{'} \in ℵ (n)} \frac{h_{n^{'}, k}^{(l - 1)}}{| ℵ (n) |} + B^{(l)} h_{n, k}^{(l - 1)}),

(27)

where

h_{n, k}^{(0)} = X_{n}

and

ϕ^{(l)}

and

B^{(l)}

are parameters of the l-th layer of the graph neural network and

ℵ (n)

is the neighbor vertices of vertex n. Based on the formulated graph, the model training gain

g_{n, m, k}

is represented as

\begin{matrix} \begin{matrix} g_{n, m, k} = max (0, c o s (h_{n, k}, h_{m, k})) + E_{m \sim P_{n, k}} (1 - c o s (h_{n, k}, h_{m, k})) . \end{matrix} \end{matrix}

(28)

In (28),

P_{n, k}

is the negative sampling distribution of server n, and

c o s (h_{n, k}, h_{m, k}) = \frac{h_{n, k} \cdot h_{m, k}}{∥ h_{n, k} ∥ ∥ h_{m, k} ∥} .

(29)

Here, we leverage the cosine embedding loss function to calculate the embedding similarity between server n and m, as well as the negative sample embedding dissimilarity between the server pairs [33]. In the cosine function,

∥ h_{n, k} ∥

and

∥ h_{m, k} ∥

are the norms of vectors

h_{n, k}

and

h_{m, k}

, respectively, where

∥ h_{n, k} ∥ = \sqrt{h_{1, n, k}^{2} + h_{2, n, k}^{2} + . . . + h_{d, n, k}^{2}} .

(30)

Next,

g_{n, m, k}

is normalized to

{\tilde{g}}_{n, m, k}

, which is shown as

{\tilde{g}}_{n, m, k} = \sum_{n = 1}^{N} ρ_{n, m, k}^{del} \frac{g_{n, m, k}}{\sqrt{\sum_{n = 1}^{N} g_{n, m, k}^{2}}} .

(31)

Based on (31), the loss function defined in (1) can be redefined as

\begin{matrix} \begin{matrix} L (ω_{n, k}, {\tilde{g}}_{n, m, k} | D_{n, k}, τ_{n, k}) \\ = \frac{\sum_{\{x_{n, k}, y_{n, k}\} \in D_{n, k}} {\tilde{g}}_{n, m, k} l (ω_{n, k}, x_{n, k}, y_{n, k}, τ_{n, k})}{| D_{n, k} |} \end{matrix} . \end{matrix}

(32)

Then, an optimization problem, which concerns the dataset collaboration between the FL client servers, is formulated as

\begin{matrix} \begin{matrix} min_{{τ_{n, k}, ρ_{n, m, k}^{del}}} E \{\sum_{n \in N} L (ω_{n, K}, {\tilde{g}}_{n, m, k} | D_{n})\} \\ s . t . C 1 \sim C 6 in (11) \end{matrix} . \end{matrix}

(33)

To address (33), we turn to a DRL approach and propose a graph-and-deep reinforcement learning D2D pairing scheme for FL, whose main framework is shown in Figure 2. The framework consists of four parts: environment, actor network, critic network, and replay buffer. The environment is graph

G (V, E, X)

reflected from the FL-empowered edge computing network. The actor network is constructed as a graph neutral network (GNN) inference framework, which takes the

G (V, E, X)

as input states, realizes reasoning through message forward propagation between vertices in this graph, and helps draw the D2D pairing and local training action of the FL servers. The critic network uses the same GNN structure as the actor network to evaluate the FL server action. The replay buffer stores learning experience, allowing the agents to learn from early memories to speed up learning.

Figure 2. Graph-and-deep reinforcement learning D2D pairing scheme.

Since the FL running process contains K rounds, and the states of the servers participating in the learning in each round are only related to the states of the previous round, the whole FL operation can be modeled as a Markov process. Let the state of the Markov process be denoted as

s_{k} = [X_{1, k}, X_{2, k}, \dots, X_{N, k}]

, where

X_{n, k}

is the attribute of server n in round k and can be obtained from

G (V, E, X)

.

X_{n, k}

contains two elements. One is the correlation between the datasets of server n and server

n^{'}

, and the other one is the remaining energy of server n in the k-th round, and they are denoted as

{ϕ_{n, n^{'}}}

and

e_{n, k} = E_{n}^{max} - \sum_{i = 1}^{k - 1} E_{n, i}

, respectively.

The GNN learns the embedded feature of each graph vertex in the next state by aggregating the features of its neighboring vertices. The neighborhood information aggregation of vertex n in layer l can be shown as

h_{n, k}^{(l)} = σ (W^{(l)} \cdot C_{θ}^{(l)} (h_{n, k}^{(l - 1)}, A_{θ}^{(l)} (h_{n^{'}, k}^{(l - 1)}, n^{'} \in ℵ (n))),

(34)

where

n^{'} \in ℵ (n)

is the neighborhood set of vertex n,

θ

is the parameter of the GNN,

A_{θ}

is an aggregation function,

C_{θ}

denotes the concat operation that combines the aggregated neighbor information with the features of vertex n itself in

l - 1

to obtain the new embedding

h_{n, k}^{(l)}

, and l is the number of the layers in the GNN. In order to avoid over-fitting during the GNN training process, l is set to 3. The initial embedded feature of the GNN is set as

h_{n, k}^{(0)} = X_{n, k}

.

The action taken by these servers in round k is represented as

a_{k} = [a_{1, k}, a_{2, k}, \dots, a_{n, k}]

. It is noteworthy that in order to reduce the size of the action space we define the server action as

a_{n, k} = [F_{n, k}, Ψ_{n, k}]

, where

F_{n, k}

is the selected head of the cluster where server n belongs to in the k-th round,

1 \leq F_{n, k} \leq Z_{1}

, and

F_{n, k} \in N^{+}

. Here, we take

F_{n, k}

to indicate the D2D pairing relationship of the servers, i.e., server n uses D2D transmission to aggregate its local model parameters to server

F_{n, k}

in the FL round k. Moreover, we should note that action

F_{n, k}

is equivalent to variable

ρ_{n, F_{n, k}, k}^{del}

in (33), whose value is set to 1. For element

Ψ_{n, k}

in action vector

a_{n, k}

,

Ψ_{n, k} = 1

indicates that server n participates in local model training in the k-th FL round, and

Ψ_{n, k} = 0

vice versa. Subject to the number of D2D communication channels, at most

Z_{2}

servers in each cluster can participate in training in one FL round. When

Ψ_{n, k} = 1

, the local iteration number of server n in round k, which is denoted as

τ_{n, k}

, can be calculated following the arithmetic sequence approach described in Section IV. The size of the action space of

a_{k}

is

| 2 Z_{1} N |

. The reward function of this Markov process in round k is given as

r_{k} = E \{\sum_{n \in N} L (ω_{n, k}, {\tilde{g}}_{n, m, k} | D_{n})\} .

(35)

In the deep reinforcement learning, given state s at training time t while taking action a, the Q value is

Q^{π} (s, a) = E \{\sum_{k = 1}^{K} β^{k} r_{k} | π, s, a\},

(36)

where

β \in (0, 1)

is a discount factor. To minimize

Q^{π} (s, a)

the optimal action strategy to be chosen is

π (a | s) = \underset{a}{arg min} Q^{π} (s, a) .

(37)

To obtain

π (a | s)

in the actor network, after all the vertices in the graph complete neighborhood information aggregation, we use a fully connected layer to obtain the action probabilities. The fully connected layer consists of the MLP function and the softmax function, and it can be written as

P r o b (a_{n, k}) = S o f t m a x (M L P (h_{n, k})) .

(38)

In (38),

P r o b (a_{n, k})

is the probability of taking action

a_{n, k}

, and the action with the highest probability will be considered as the optimal choice.

Based on the chosen optimal action, the state at the next step represented as

s_{k + 1}

can be obtained. Then, the action that satisfies all the constraints in (33), the current state, the next state, and the reward form a tuple

< s_{k}, a_{k}, s_{k + 1}, r_{k} >

, which will be stored in the replay buffer. Next, a batch of the experience will be sampled from the replay buffer and will be put into the critic network to evaluate

Q (s_{k}, a_{k})

of action

a_{k}

executed in state

s_{k}

. The PPO (proximal policy optimization) strategy is taken to train the actor network parameter

π_{θ}

and the critic network parameter

υ_{ϕ}

, where the loss function is expressed as

\begin{matrix} \begin{matrix} L o s s (θ, ϕ) = \sum_{k^{'} = 1}^{k} min (r_{k^{'}} {\hat{A}}_{k^{'}}, c l i p (r_{k^{'}}, 1 - ϵ, 1 + ϵ) {\hat{A}}_{k^{'}}) \\ - c_{1} \sum_{k^{'} = 0}^{k} {(υ_{ϕ} (s_{k^{'}}) - {\hat{A}}_{k^{'}})}^{2} + c_{2} \sum_{k^{'} = 1}^{k} S (π_{θ}) \end{matrix}, \end{matrix}

(39)

where S is an entropy function,

{\hat{A}}_{k^{'}}

is the evaluation of the critic network,

{\hat{A}}_{k^{'}} = \sum_{k^{'} = 1}^{k} γ_{k^{'}} r_{k^{'}} - υ_{ϕ} (s_{k^{'}})

,

γ_{k^{'}}

is the discount factor, and

γ_{k^{'}} \in (0, 1)

. The first term on the right side of (39) is the PPO loss, which is the core optimization target of the PPO strategy;

r_{k^{'}}

is a ratio that measures the improvement of the new policy over the old policy;

{\hat{A}}_{k^{'}}

is an advantage function, which indicates the superiority of action

a_{k^{'}}

relative to the average strategy, and can be calculated by the difference between the reward

r_{k^{'}}

and the value function

υ_{ϕ} (s_{k^{'}})

. The clip function modifies the surrogate objective by clipping the probability ratio, which removes the incentive for moving

r_{k^{'}}

outside of interval

[1 - ϵ, 1 + ϵ]

, where

ϵ

is a hyperparameter [34]. The second term is used to update the value function, which represents the difference between the current state

s_{k^{'}}

and the actual reward

{\hat{A}}_{k^{'}}

, calculated using squared-error loss. The third term is the entropy regularization term;

S (π_{θ})

is the entropy of the policy

π_{θ}

. The larger the value of

S (π_{θ})

, the more random and exploratory the policy is, which can help prevent the policy from converging to a suboptimal solution in the early training stage;

c_{1}

and

c_{2}

are weight coefficients. To sum up, we describe the main procedure of the graph learning-enabled D2D pairing scheme in Algorithm 2. In Algorithm 2, we first initialize the parameters of graph G, the actor network, the critic network, and the clipping ratio

ϵ

, and we clear the replay buffer, which is a data structure for temporarily saving the agents’ observations and allowing our learning procedure to update on them multiple times. Then, the algorithm enters a loop with

e p i^{max}

episodes. Each episode consists of K iterations. In each iteration, given the current state

s_{k}

, we obtain action

a_{k}

of the actor network according to (37). Then, we calculate the reward

r_{k}

and obtain the new state

s_{k + 1}

caused by executing action

a_{k}

. The tuple

< s_{k}, a_{k}, r_{k}, s_{k + 1} >

is stored in the replay buffer for future parameter updates. If the condition in step 15 of Algorithm 2 is met, it means that the training time or energy consumption exceeds the limit, and the iteration of this episode will be terminated. Otherwise, I tuple records will be randomly selected from the replay buffer as samples for PPO strategy training, and the parameters of the actor and critic networks will be updated by maximizing the value of

L o s s (θ, ϕ)

. The above process is repeated until the preset maximum number of episodes is reached.

Algorithm 2 GNN-and-DRL-based D2D pairing algorithm

Require:: Initial graph $G (V, E, X_{v})$ , actor network $π_{θ}$ , critic network $υ_{ϕ}$ , and clipping ratio $ϵ$ ;
1:: for episode=1 to $e p i^{max}$ do
2:: Reset $s_{0}$ and set reward $r = 0$ ;
3:: for $k = {1, 2, . . ., K}$ do
4:: Get action $a_{k}$ according to (37);
5:: if $F_{n, k} = m$ then
6:: $ρ_{n, m, k} = 1$ ;
7:: end if
8:: if $Ψ_{n, k} = 1$ then
9:: Randomly chosen $τ_{n, k}$ from $[\frac{(E_{n}^{m a x} - \sum_{i = 1}^{k} E_{n, i}) / D_{n, k} e_{n}}{K}, \frac{(2 K - 2 + 2 H)}{(K - 1 + 2 H) \sum_{i = 1}^{K} τ_{n, i} / K}]$ following a continuous uniform distribution;
10:: else
11:: $τ_{n, k} = 0$ , node n does not perform model training in the kth round;
12:: end if
13:: Get reward $r_{k}$ , $\hat{A_{k}} = r_{k}$ , and turn to next state $s_{k + 1}$ ;
14:: Put $< s_{k}, a_{k}, r_{k}, s_{k + 1} >$ into Replay buffer;
15:: if $T^{max} - \sum_{i = 0}^{k} T_{i} > 0$ & $E_{n}^{max} - \sum_{i = 0}^{k} E_{n, i} > 0$ then
16:: break;
17:: end if
18:: Randomly obtain I records from the replay buffer;
19:: Leverage the Adam optimizer to update parameter $θ$ of the actor network and $ϕ$ of the critic network, ${θ, ϕ} = arg max (L o s s (θ, ϕ))$ according to (39);
20:: end for
21:: end for

The computational complexity of Algorithm 2 derives from several key parts: policy evaluation (Critic), policy update (Actor), experience replay and sampling, and the FL rounds. In each time step, the Critic network computes the state value or action value through a forward pass. The Critic network is an L layer GNN, and the forward pass and backpropagation complexity is

O (2 L \cdot υ_{ϕ})

, where the vertex features have two dimensions and the network has

υ_{ϕ}

parameters. The Actor network computes the policy through a forward pass in each time step using an L layer GNN. The complexity of the forward pass and updating parameters is

O (2 L π_{θ})

, where the actor network has

π_{θ}

parameters. Storing each time step’s state, action, reward, and next state into the experience replay buffer has a complexity of

O (1)

. Sampling a mini-batch from the experience replay buffer for training has a complexity of

O (B)

, where B denotes the batch size. The algorithm runs for K FL rounds and

e p i^{max}

episodes, so the overall complexity is

O (e p i^{max} K (2 L υ_{ϕ} + 2 L π_{θ} + B))

.

It is noteworthy that Algorithm 2 can be applied to large-scale network scenarios. In a sense, this algorithm can be taken as a server clustering approach, in which the relay servers act as cluster heads while the client servers act as cluster members. The models trained by cluster members are distributively aggregated at their corresponding cluster heads and then delivered to the aggregation server. When the number of servers in the network increases, more servers will be selected as cluster heads to share the model aggregation load of the aggregation server.

6. Numerical Results

In this section, we evaluate the performance of our proposed D2D-assisted adaptive FL schemes through experimental simulations. We consider an FL-enabled edge network that consists of 30 edge servers randomly deployed in a rectangular area with a side length of 500 m, together with an aggregation server located within the center of the area. The edge network scenario is shown in Figure 3. The energy budget and the energy cost for processing a unit-sized dataset of the servers are randomly selected from

(10, 40)

and

(0.0004, 0.0010)

following continuous uniform distributions, respectively. We use the MNIST and CIFAR-10 datasets to evaluate our proposed schemes. MNIST is a large database of handwritten digits and CIFAR-10 is a collection of 60,000 32 × 32 color images, which are commonly used for training various image-processing systems and have been widely leveraged for the performance evaluation of FL schemes [35]. We use the these two datasets instead of IoT images collected through camera sensors to test our designed schemes. The learning task is to identify the objects showing in the dataset images. Moreover, to make the datasets follow non-IID distribution, we sort and divide the datasets into subsets and then randomly assign a number of these shards to each edge server, according to McMahan, B. et al. [36].

Figure 3. Deployment of servers in an edge computing network.

Figure 4 compares the training loss of dataset-uncorrelated FL schemes using the MNIST and CIFAR-10 datasets. The benchmarks selected for this performance comparison are the energy budget-oriented D2D pairing FL scheme, the transmission distance-oriented D2D pairing FL scheme, and the FedYogi FL scheme [37]. Compared with these benchmark schemes, our proposed partite graph-based FL scheme yields the lowest loss expectation when applied to both datasets. Although FedYogi can flexibly adjust the learning rate, its aggregation process does not consider the energy budget constraints and the communication topology of the edge servers participating in the FL. Thus, due to the energy exhaustion caused by intensive model training and long-distance parameter transfer, some servers may exit the FL model aggregation process prematurely and, finally, produce the highest model training loss. The energy budget-oriented FL allocates both local model training and parameter data relay tasks to the servers with the most remaining energy but ignores the impact of communication topology on the transmission energy overhead. It wastes part of the energy available for model training on unnecessary remote data delivery. Contrary to the energy budget-oriented FL, the transmission distance-oriented FL scheme only pays attention to the difference in transmission energy consumption affected by the geographical location of the servers in the designing of its operation process. Therefore, it fails to fully utilize the training capabilities of energy-constrained servers. Unlike the above two D2D-assisted FL mechanisms, our proposed partite graph-based FL jointly matches the edge servers in the training process, in terms of energy budget, computing power, and communication topology. Moreover, we exploit the differential effect of server computing power on model accuracy in different FL iterations, and we further optimize the server computing power allocation, thereby improving server-training energy efficiency and obtaining the lowest loss expectation.

Figure 4. Comparison of training loss with different FL schemes.

Figure 5 shows the training loss of FL performed in networks with different numbers of wireless channels. Given the number of channels used for direct delivery between client servers and aggregation servers, which is denoted as

Z_{1}

, the training loss decreases as the number of D2D communication channels

Z_{2}

increases, but this decreasing trend gradually weakens as

Z_{2}

increases. When Z2 increases from 2 to 4, although the D2D communication capacity between the FL servers has improved, due to the densely deployed 30 servers, only four channels still have difficulty supporting the complete spatial-division multiplexing of the spectrum resources. However, with six channels, spectrum spatial-division multiplexing in server-dense areas can be achieved, thereby supporting more concurrent D2D model aggregation between servers, thereby reducing the training loss. In contrast to the above phenomenon, when

Z_{2}

is fixed, the training loss does not show an obvious changing trend with the increase of

Z_{1}

, and the loss value fluctuates. Since the increase of

Z_{2}

can promote D2D aggregation between client servers, the results of this figure demonstrate that distributed model aggregation has better training improvement than simply adjusting the central aggregation number.

Figure 5. FL training loss in networks with different numbers of wireless channels.

Figure 6 presents the number of local training iterations for the servers with different energy budgets. Without loss of generality, we select three servers with different energy budgets to demonstrate their total number of local training iterations during the entire FL process. For the server with an energy budget value of 16, its total iteration number remains constant as channel number

Z_{1}

changes. This is because the task of the server with low energy budgets is to train local models and transfer its trained model parameters to adjacent servers with sufficient energy in a D2D manner. The growth of

Z_{1}

does not affect the server’s tasks or the local training arrangements. For the server with an energy budget of 32, the total number of iterations shows a downward trend as

Z_{1}

increases. The increase of

Z_{1}

results in more servers that can serve as relays. The servers with more energy budget but less computation capability may take on the job. Subject to their energy budget, these servers may reduce the number of local training iterations, thereby ensuring energy supply for relay transmission. As the number of relay servers increases, the servers with an energy budget value of 38 are able to deliver model parameters in D2D mode with lower energy costs, so more energy can be used for local training; consequently, the total number of iterations is increased.

Figure 6. Number of local training iterations for servers with different energy budgets.

Figure 7 shows the convergence of the proposed GNN-and-DRL-based D2D pairing algorithm in scenarios with different numbers of edge servers, respectively. The varying number may lead to changes in the edge network’s total energy budget, computing power, and dataset distribution among adjacent servers. Despite the difference of edge network size and these varying factors all the FL training loss expectations obtained by our GNN-and-DRL-based D2D pairing algorithm converge around 100 episodes.

Figure 7. Convergence of GNN-and-DRL-based D2D pairing algorithm.

Figure 8 compares the loss performance of different D2D pairing schemes training their models with correlated datasets. Our proposed GNN-and-DRL-based pairing scheme outperforms other baseline schemes with the lowest training loss. The partite graph-based pairing scheme aggregates edge servers only according to servers’ communication topology and energy budget, but ignores the correlation between the datasets used by different servers for local model training. As a result, its training performance is the worst. In FL, as typical distributed multi-agent model training, the correlation of datasets leads to the correlation of the trained models. Therefore, using dataset correlation to guide model aggregation may improve the convergence of the aggregation model, thereby reducing the training loss value. Although the DRL-based scheme considers dataset correlation in the D2D server pairing process, it only maps the correlations to hyper-parameters of an individual DRL agent, whose improvement in FL model training is limited. In our designed GNN-and-DRL-based pairing scheme, the dataset correlation between servers, and the server’s current energy surplus, computing power, and D2D communication characteristics are reflected to the attributes of the nodes and edges of the formulated graph. Benefiting from the diffusion of these attributes across servers through the GNN, our scheme can better coordinate the server pairing, thereby achieving lower training loss, especially in networks with a large number of servers.

Figure 8. Training loss comparison of different D2D pairing schemes.

Figure 9 shows the total energy cost of the FL system with different schemes. Compared with the other two schemes, the GNN-and-DRL-based D2D pairing scheme not only achieves optimal training loss performance, but also effectively reduces the total energy consumption of the FL system. The reason is that this scheme takes into account the communication topology between servers as well as the correlation between training datasets in its server aggregation management, thus significantly saving the communication energy cost in model parameter delivery while also reducing the energy consumption for iterative local training before reaching model convergence. Moreover, it can be seen from the figure that the energy costs of all schemes decrease as

Z_{1}

grows. As the number of channels

Z_{1}

increases, more edge servers can become relay nodes, aggregate the local models of surrounding servers in a D2D manner, and then report the model parameters to the aggregation server. The newly emerged relay nodes allow the edge servers to choose a closer relay server to deliver its local model, thus saving transmission energy consumption. However, this energy-saving effect gradually weakens as

Z_{1}

increases. The reason is that when

Z_{1}

is greater than 4, there are enough relay nodes to provide D2D transmission services for all the edge servers.

Figure 9. Total energy cost of FL system with different schemes.

Figure 10 depicts the number of server clusters with different schemes. It is easy to see that with the expansion of the edge computing network scale and the increase in the number of edge servers, the number of clusters formed by these three schemes gradually increases, but at different growth rates. The partite graph-based D2D pairing scheme has the slowest growth in the number of clusters, while our GNN-and-DRL-based scheme has the fastest growth. This is because the partite graph-based scheme mainly focuses on the communication topology between servers. As long as the transmission distance between the servers and the existing relay servers meets the aggregation requirements, the scheme will not trigger the generation of new clusters. In addition to considering the network communication topology, our designed GNN-and-DRL-based scheme further incorporates dataset correlation between different servers into the cluster formulation, thereby optimizing the arrangement of server clusters and improving FL training accuracy. The DRL-based scheme maps the topological relationship between servers and the correlation of datasets into hyperparameters. Some specific features of the correlation may be lost in the mapping process. Therefore, the change in the number of edge servers may not lead to a significant impact on the DRL hyperparameters, which slows down the growth of the number of server clusters.

Figure 10. Number of server clusters with different schemes.

Figure 11 shows the training loss values, the local iteration numbers, and the energy costs of the proposed two schemes alongside the configuration parameters of the wireless channel numbers and energy budget restrictions. Subgraphs (a), (b), and (c) present the performance of Algorithm 1, while subgraphs (d), (e), and (f) correspond to the performance of Algorithm 2. Through the joint comparison of sub-figures (a), (b), and (c) it can be seen that the number of channels

Z_{2}

has a stronger impact on performance than the number of channels

Z_{1}

, especially on the training loss value and the number of local training iterations. When the number of

Z_{2}

is reduced to 2, regardless of the value of energy budget

{\bar{E}}_{n}^{max}

or the value of

Z_{1}

, the training loss increases significantly compared with when

Z_{2}

is 4 or 6. This observation suggests that too few channels

Z_{2}

may limit D2D model aggregation between the servers, thus undermining the overall training performance of the FL. In subgraph (b), when both

Z_{2}

and

{\bar{E}}_{n}^{max}

are large, the number of local training iterations of the servers increases significantly compared to other scenarios. This is because a rich energy budget and large D2D transmission capacity prompt the servers to conduct sufficient local training before sharing model parameters in a D2D approach. Unlike subgraphs (a) and (b), the energy costs presented by subgraph (c) are less affected by

Z_{2}

and

{\bar{E}}_{n}^{max}

, but it shows a slow downward trend as

Z_{1}

increases. The growth number of

Z_{1}

means that more servers can directly interact with the aggregation server, reducing the energy costs of model parameters’ relay forwarding. In subfigures (d), (e), and (f), the performance of Algorithm 2 with these configuration parameters has characteristics similar to Algorithm 1.

Figure 11. Training loss, local iteration numbers and energy costs of different schemes. (a) Training loss of Algorithm 1; (b) Iteration number of Algorithm 1; (c) Energy cost of Algorithm 1; (d) Training loss of Algorithm 2; (e) Iteration number of Algorithm 2; (f) Energy cost of Algorithm 2.

7. Conclusions

In this paper, we investigated FL-enabled mobile edge computing networks, and we presented a D2D-assisted FL mechanism that improves learning accuracy for energy-constrained servers. Making use of the time-varying impact of server computing power on model accuracy, we proposed an adaptive FL scheme with energy-aware D2D server pairing and aperiodic local training iterations in a partite graph theoretic approach. Moreover, we leveraged graph learning to exploit the performance gain of dataset correlation between servers in FL training, and we designed a graph-and-deep reinforcement learning-based server pairing algorithm, which further improves model aggregation efficiency. Finally, we evaluated the performance of the proposed algorithms through experimental simulations. The numerical results demonstrate that our designed FL schemes are promising, in terms of both training accuracy improvement and server energy saving compared with the benchmark algorithms.

Author Contributions

System model construction and problem formulation, Z.L. and Y.C.; scheme design, K.Z., Y.Z., Y.C. and Y.L.; performance evaluation, Y.Z. and K.Z.; writing—original draft, K.Z. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China under Grant 2021YFB3202904.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

De Vito, S.; D’elia, G.; Ferlito, S.; Di Francia, G.; Davidović, M.D.; Kleut, D.; Stojanović, D.; Jovaševic-Stojanović, M. A global multiunit calibration as a method for large-scale IoT particulate matter monitoring systems deployments. IEEE Trans. Instrum. Meas. 2023, 73, 1–16. [Google Scholar] [CrossRef]
Xiang, T.; Bi, Y.; Chen, X.; Liu, Y.; Wang, B.; Shen, X.; Wang, X. Federated learning with dynamic epoch adjustment and collaborative training in mobile edge computing. IEEE Trans. Mob. Comput. 2024, 23, 4092–41061. [Google Scholar] [CrossRef]
Zhang, X.; Liu, J.; Hu, T.; Chang, Z.; Zhang, Y.; Min, G. Federated learning-assisted vehicular edge computing: Architecture and research directions. IEEE Veh. Technol. Mag. 2023, 18, 75–84. [Google Scholar] [CrossRef]
Zhang, X.; Chang, Z.; Hu, T.; Chen, W.; Zhang, X.; Min, G. Vehicle selection and resource allocation for federated learning-assisted vehicular network. IEEE Tran. Mob. Comput. 2024, 23, 3817–3829. [Google Scholar] [CrossRef]
Zhang, K.; Cao, J.; Zhang, Y. Adaptive digital twin and multiagent deep reinforcement learning for vehicular edge computing and networks. IEEE Trans. Ind. Inform. 2022, 18, 1405–1413. [Google Scholar] [CrossRef]
Qiang, X.; Hu, Y.; Chang, Z.; Hamalainen, T. Importance-aware data selection and resource allocation for hierarchical federated edge learning. Future Gener. Comput. Syst. 2024, 154, 35–44. [Google Scholar] [CrossRef]
Kim, D.; Shin, S.; Jeong, J.; Lee, J. Joint edge server selection and dataset management for federated learning-enabled mobile traffic prediction. IEEE Internet Things J. 2024, 11, 4971–4986. [Google Scholar] [CrossRef]
Zhang, T.; Xu, D.; Ren, P.; Yu, K.; Guizani, M. DFLNet: Deep federated learning network with privacy preserving for vehicular LoRa nodes fingerprinting. IEEE Trans. Veh. Technol. 2024, 73, 2901–2905. [Google Scholar] [CrossRef]
Zhao, T.; Li, F.; He, L. DRL-based joint resource allocation and device orchestration for hierarchical federated learning in NOMA-enabled industrial IoT. IEEE Trans. Ind. Inform. 2023, 19, 7468–7479. [Google Scholar] [CrossRef]
Lin, N.; Wang, Y.; Zhang, E.; Yu, K.; Zhao, L.; Guizani, M. Feedback delay-tolerant proactive caching scheme based on federated learning at the wireless edge. IEEE Netw. Lett. 2023, 5, 26–30. [Google Scholar] [CrossRef]
Lee, J.; Solat, F.; Kim, T.Y.; Poor, H.V. Federated learning-empowered mobile network management for 5G and beyond networks: From access to core. IEEE Commun. Surv. Tutorials 2024, accepted. [Google Scholar] [CrossRef]
Samuel, O.; Omojo, A.B.; Onuja, A.M.; Sunday, Y.; Tiwari, P.; Gupta, D.; Hafeez, G.; Yahaya, A.S.; Fatoba, O.J.; Shamshirband, S. IoMT: C COVID-19 healthcare system driven by federated learning and blockchain. IEEE J. Biomed. Health Inform. 2023, 27, 823–834. [Google Scholar] [CrossRef] [PubMed]
Solat, F.; Kim, T.Y.; Lee, J. A novel group management scheme of clustered federated learning for mobile traffic prediction in mobile edge computing systems. J. Commun. Netw. 2023, 25, 480–490. [Google Scholar] [CrossRef]
Sun, W.; Zhao, Y.; Ma, W.; Guo, B.; Xu, L.; Duong, T.Q. Accelerating convergence of federated learning in MEC with dynamic community. IEEE Trans. Mob. Comput. 2024, 23, 1769–1784. [Google Scholar] [CrossRef]
Xu, Z.; Wang, L.; Liang, W.; Xia, Q.; Xu, W.; Zhou, P.; Rana, O.F. Age-aware data selection and aggregator placement for timely federated continual learning in mobile edge computing. IEEE Trans. Comput. 2024, 73, 466–480. [Google Scholar] [CrossRef]
You, C.; Guo, K.; Yang, H.; Quek, T.Q.S. Hierarchical personalized federated learning over massive mobile edge computing networks. IEEE Trans. Wirel. Commun. 2023, 22, 8141–8157. [Google Scholar] [CrossRef]
Bai, Y.; Chen, L.; Li, J.; Wu, J.; Zhou, P.; Xu, Z.; Xu, J. Multicore federated learning for mobile-edge computing platforms. IEEE Internet Things J. 2023, 10, 5940–5952. [Google Scholar] [CrossRef]
Li, Y.; Wang, X.; Zeng, R.; Yang, M.; Li, K.; Huang, M.; Dustdar, S. VARF: An incentive mechanism of cross-silo federated learning in MEC. IEEE Internet Things J. 2023, 10, 15115–15132. [Google Scholar] [CrossRef]
Lee, J.; Ko, H. Energy and distribution-aware cooperative clustering algorithm in Internet of Things (IoT)-based federated learning. IEEE Trans. Veh. Technol. 2023, 72, 13799–13804. [Google Scholar] [CrossRef]
Chen, R.; Li, L.; Xue, K.; Zhang, C.; Pan, M.; Fang, Y. Energy efficient federated learning over heterogeneous mobile devices via joint design of weight quantization and wireless transmission. IEEE Trans. Mob. Comput. 2023, 22, 7451–7465. [Google Scholar] [CrossRef]
Alishahi, M.; Fortier, P.; Hao, W.; Li, X.; Zeng, M. Energy minimization for wireless-powered federated learning network with NOMA. IEEE Wirel. Commun. Lett. 2023, 12, 833–837. [Google Scholar] [CrossRef]
Zhao, T.; Chen, X.; Sun, Q.; Zhang, J. Energy-efficient federated learning over cell-free IoT networks: Modeling and optimization. IEEE Internet Things J. 2023, 10, 17436–17449. [Google Scholar] [CrossRef]
Hamdi, R.; Ben Said, A.; Baccour, E.; Erbad, A.; Mohamed, A.; Hamdi, M.; Guizani, M. Optimal resource management for hierarchical federated learning over HetNets with wireless energy transfer. IEEE Internet Things J. 2023, 10, 16945–16958. [Google Scholar] [CrossRef]
Cui, Y.; Cao, K.; Wei, T. Reinforcement learning-based device scheduling for renewable energy-powered federated learning. IEEE Trans. Ind. Inform. 2023, 19, 6264–6274. [Google Scholar] [CrossRef]
Lu, Y.; Huang, X.; Zhang, K.; Maharjan, S.; Zhang, Y. Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles. IEEE Trans. Veh. Technol. 2020, 69, 4298–4311. [Google Scholar] [CrossRef]
Hosseinalipour, S.; Azam, S.S.; Brinton, C.G. Multi-stage hybrid federated learning over large-scale D2D-enabled fog networks. IEEE/ACM Trans. Netw. 2022, 30, 1569–1584. [Google Scholar] [CrossRef]
Al-Abiad, M.S.; Obeed, M.; Hossain, J.; Chaaban, A. Decentralized aggregation for energy-efficient federated learning via D2D communications. IEEE Trans. Commun. 2023, 71, 3333–3351. [Google Scholar] [CrossRef]
Xing, H.; Simeone, O.; Bi, S. Federated learning over wireless device-to-device networks: Algorithms and convergence analysis. IEEE J. Sel. Areas Commun. 2021, 39, 3723–3741. [Google Scholar] [CrossRef]
Savazzi, S.; Nicoli, M.; Rampa, V. Federated learning with cooperating devices: A consensus approach for massive IoT networks. IEEE Internet Things J. 2020, 7, 4641–4654. [Google Scholar] [CrossRef]
Li, Y.; Liang, W.; Li, J.; Cheng, X.; Yu, D.; Zomaya, A.Y.; Guo, S. Energy-aware, device-to-device assisted federated learning in edge computing. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 2138–2154. [Google Scholar] [CrossRef]
Zhang, K.; Cao, J.; Maharjan, S.; Zhang, Y. Digital twin empowered content caching in social-aware vehicular edge networks. IEEE Trans. Comput. Soc. Syst. 2022, 9, 239–251. [Google Scholar] [CrossRef]
Zhang, H.; Wu, T.; Cheng, S.; Liu, J. Aperiodic local SGD: Beyond local SGD. In Proceedings of the 51st International Conference on Parallel Processing, Bordeaux, France, 29 August 2022. [Google Scholar]
Xing, P.; Lu, S.; Wu, L.; Yu, H. BiG-Fed: Bilevel optimization enhanced graph-aided federated learning. IEEE Trans. Big Data 2022, accepted. [Google Scholar] [CrossRef]
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
Nilsson, A.; Smith, S.; Ulm, G.; Gustavsson, E.; Jirstrand, M. A performance evaluation of federated learning algorithms. In Proceedings of the Second Workshop on Distributed Infrastructures for Deep Learning, Rennes, France, 10 December 2018. [Google Scholar]
McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.Y.; Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv 2023, arXiv:1602.05629. [Google Scholar]
Nabavirazavi, S.; Taheri, R.; Ghahremani, M.; Iyengar, S.S. Model poisoning attack against federated learning with adaptive aggregation. In Adversarial Multimedia Forensics; Springer Nature Switzerland: Cham, Switzerland, 2023; pp. 1–27. [Google Scholar]

Figure 1. D2D-assisted federated learning in an edge computing network.

Figure 2. Graph-and-deep reinforcement learning D2D pairing scheme.

Figure 3. Deployment of servers in an edge computing network.

Figure 4. Comparison of training loss with different FL schemes.

Figure 5. FL training loss in networks with different numbers of wireless channels.

Figure 6. Number of local training iterations for servers with different energy budgets.

Figure 7. Convergence of GNN-and-DRL-based D2D pairing algorithm.

Figure 8. Training loss comparison of different D2D pairing schemes.

Figure 9. Total energy cost of FL system with different schemes.

Figure 10. Number of server clusters with different schemes.

Figure 11. Training loss, local iteration numbers and energy costs of different schemes. (a) Training loss of Algorithm 1; (b) Iteration number of Algorithm 1; (c) Energy cost of Algorithm 1; (d) Training loss of Algorithm 2; (e) Iteration number of Algorithm 2; (f) Energy cost of Algorithm 2.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.