Article

MAB-Based Online Client Scheduling for Decentralized Federated Learning in the IoT

by
Zhenning Chen
1,*,
Xinyu Zhang
2,
Siyang Wang
3,4 and
Youren Wang
1
1
College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
2
School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210049, China
3
Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210049, China
4
Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210049, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(4), 439; https://doi.org/10.3390/e27040439
Submission received: 3 March 2025 / Revised: 1 April 2025 / Accepted: 17 April 2025 / Published: 18 April 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract
Unlike conventional federated learning (FL), which relies on a central server for model aggregation, decentralized FL (DFL) exchanges models among edge servers, thus improving robustness and scalability. When deploying DFL in the Internet of Things (IoT), limited wireless resources cannot provide simultaneous access to massive numbers of devices, so client scheduling must be performed to balance the convergence rate and model accuracy. However, the heterogeneity of computing and communication resources across client devices, combined with the time-varying nature of wireless channels, makes it challenging to accurately estimate the delay associated with client participation during the scheduling process. To address this issue, we investigate the client scheduling and resource optimization problem in DFL without prior client information. Specifically, the considered problem is reformulated as a multi-armed bandit (MAB) program, and an online learning algorithm that utilizes contextual MABs for client delay estimation and scheduling is proposed. Theoretical analysis shows that this algorithm achieves asymptotically optimal performance. The experimental results show that the algorithm makes asymptotically optimal client selection decisions and outperforms existing algorithms in reducing the cumulative delay of the system.

1. Introduction

The Internet of Things (IoT), driven by the latest advancements in information and communication technologies, connects countless devices to the Internet, enabling seamless connectivity and real-time interaction between people, machines, and objects [1]. Recently, with the development of the IoT, a large number of intelligent applications and services based on the IoT have emerged; while bringing great convenience to people’s daily work and life, they are also driving profound changes in industrial manufacturing, agricultural production, and infrastructure construction [2]. At the same time, the development and widespread use of machine learning [3] make it possible to mine enormous potential value from the vast amounts of data generated by IoT devices, providing more intelligent solutions for existing and newly developed applications.
However, conventional centralized machine learning methods require user devices to upload their raw data, which may contain sensitive information, thus increasing the risk of privacy leakage [4]. Moreover, uploading large-scale raw data to the central server consumes substantial network bandwidth and causes considerable communication delays. Recently, federated learning (FL) [5] has been introduced as a solution to these challenges. In FL, clients do not need to share their local data with the cloud or other clients; instead, they train models locally and upload the updated models for global aggregation. Therefore, the risk of sensitive data leakage, as well as the communication overhead, is significantly reduced. FL has been widely applied in fields such as natural language processing [6], intelligent manufacturing [7], intelligent transportation [8], and smart healthcare [9].
Traditional centralized FL systems are susceptible to a single point of failure; that is, when the central server breaks down, the FL training process cannot continue. Furthermore, the limited wireless resources cannot support an ever-increasing number of user devices participating in FL training simultaneously, so the scalability of centralized FL systems is constrained [10]. Thus, the decentralized FL (DFL) [11] architecture has attracted researchers’ interest: it deploys multiple servers and enables more user devices to collaborate on global model training. DFL reduces the impact of a single server failure on model training and further improves system scalability.
However, the deployment of DFL in the IoT still faces challenges [12]. Firstly, different user devices have different computing and communication resources; we refer to this issue as resource heterogeneity [13]. Secondly, the data collected by different user devices are imbalanced and non-independently and identically distributed (non-i.i.d.), which is referred to as data heterogeneity. Thirdly, because the resource availability of user devices is time-varying, the local training delay and communication delay are difficult to estimate, and the training efficiency of FL cannot be guaranteed [14]. Finally, as the number of user devices grows, multiple servers still cannot provide simultaneous access to massive numbers of devices due to the limited radio resources. To address these challenges, this work proposes a novel client scheduling strategy that selects a portion of the clients to participate in the DFL training process in each round. The main contributions of this paper can be summarized as follows:
  • This paper considers the client scheduling problem in DFL scenarios. Due to the heterogeneity of local computing and communication resources, as well as the time-varying nature of wireless channels, the total delay of each client in each round cannot be predicted. Thus, we formulate the client scheduling problem as a contextual combinatorial multi-armed bandit (CC-MAB) program [15].
  • We propose an online client scheduling algorithm that estimates the delay of clients based on their contextual information during training and continuously updates the estimator according to the actual observed delay. Through theoretical analysis and parameter design, we show that this algorithm achieves asymptotically optimal performance.
  • Finally, through extensive experiments, we show that the algorithm makes asymptotically optimal client scheduling decisions and outperforms existing algorithms in reducing the cumulative delay of the system.
The rest of this work is organized as follows. Section 2 presents the related works. Section 3 and Section 4 introduce the system model and formulate the optimization problem. Section 5 and Section 6 provide the convergence analysis and the client scheduling algorithm. Section 7 presents the simulation results, followed by the conclusions in Section 8.

2. Related Works

2.1. Client Scheduling in Centralized FL

There have been many works studying client scheduling and resource allocation in the traditional centralized FL. The authors in [16] aimed to reduce the communication load on the central server by identifying clients with irrelevant updates and excluding them from model aggregation. In [17], a communication- and computation-efficient client selection method was proposed where the clients with significant local training losses were selected to accelerate model convergence. In [18], the importance of local learning updates was measured based on the gradient differences of local learning updates, and then a client scheduling method was proposed to balance between client channel quality and update importance. A joint optimization method for client selection and wireless resource allocation based on bipartite matching was proposed in [19], which minimized the global training loss function by optimizing the transmission power of client devices and the wireless resource allocation of servers. The authors in [20] modeled the client scheduling problem in wireless FL with unknown client channel states as an MAB program [21] and proposed a solution based on the ϵ -greedy method to balance exploration and exploitation.
There are also some works focusing on optimizing the efficiency of hierarchical FL. For example, the work [22] formulated a joint problem of client scheduling and resource optimization in a hierarchical FL architecture and solved it by decomposing the original problem into two sub-problems: resource allocation and edge server cooperation. The authors in [23] simultaneously considered the uncertainty of wireless channels and the weights of client models and, through theoretical derivation, transformed the original optimization problem into a mixed-integer nonlinear programming problem for solution. The work [24] considered a multi-objective optimization problem under local computing resource and client transmit power constraints and proposed an algorithm based on deep reinforcement learning. In terms of improving system energy efficiency, the work [25] simultaneously considered the local data distribution of clients and the delay caused by model transmission; by jointly optimizing the association strategy between clients and edge servers as well as the resource allocation, the system communication energy consumption was minimized. The work [26] considered the joint optimization of client–server association and client local computing power control under long-term energy consumption constraints to simultaneously minimize global training loss and delay. The authors in [27] studied the client scheduling problem in a hierarchical FL framework and proposed a contextual combinatorial MAB-based method to learn the states of clients that successfully participate in training during the global iteration process, thereby providing appropriate client selection strategies for subsequent training.

2.2. Client Scheduling in DFL

Due to limited communication and computing resources, the key to optimizing the DFL performance lies in balancing the number of communication and computing rounds. The authors in [28] proposed a universal DFL framework to achieve a balance between system communication efficiency and global model convergence by performing a certain number of local model updates and model exchanges between nodes in each round of global iteration. Similarly, the authors in [29] considered the resource heterogeneity of different devices and analyzed the impact of local training on the global model convergence. Based on the analysis results, closed-form solutions for local training rounds of different local nodes were obtained. Furthermore, the authors in [30] incorporated the node selection strategy into a regularized multi-objective optimization problem, aiming to maximize system knowledge gain while minimizing energy consumption. In response to the limited node resources in large-scale IoT scenarios, the work [31] proposed a joint optimization method of node scheduling and bandwidth allocation in asynchronous DFL, aiming to minimize the transmission delay of FL models and improve convergence speed.
However, the above literature assumes that there are no model errors or losses during model propagation, which is unrealistic, especially in wireless environments. Although the reliability of model transmission can be improved through the transmission control protocol (TCP) [32] and other methods, this introduces additional communication overhead and reduces the scalability of the system. To address this issue, the work [33] divided the model parameters into multiple data packets and sent them via the user datagram protocol (UDP) [34]; the weighting matrix of the inter-node model mixture was then optimized based on the reliability matrix of inter-node communication.

3. System Model

We consider an area comprising S cellular cells, each served by a base station (BS) located at the center of the cell. Each BS is equipped with an edge server, and the set of these edge servers is denoted by $\mathcal{S} = \{1, 2, \ldots, S\}$. There are K single-antenna devices (clients), denoted by a set $\mathcal{K} = \{1, 2, \ldots, K\}$, that are randomly distributed in the considered area. We assume that each edge server has N orthogonal wireless channels. In other words, each server can communicate with up to N clients within the communication range of the BS at the same time (see Figure 1). The set of clients located in cell s is denoted as $\mathcal{K}_s = \{k \in \mathcal{K} \mid k \in \text{cell } s\}$ with $|\mathcal{K}_s| = K_s$. In this work, we assume that $NS \le K$.
All the clients and edge servers in the area are organized to train a shared global model through DFL. The edge servers select the participating clients at the beginning of each training round. We define the binary variable $a_k(t) \in \{0, 1\}$, $k \in \mathcal{K}$, $t \in \mathcal{T}$. Specifically, if device k participates in the training of round t, $a_k(t) = 1$; otherwise, $a_k(t) = 0$. We further define the set of clients participating in round t as $\mathcal{A}(t) = \{k \in \mathcal{K} \mid a_k(t) = 1\}$. Similarly, we denote by $\mathcal{A}_s(t) = \{k \in \mathcal{K} \mid a_k(t) = 1, k \in \text{cell } s\}$ the set of participating clients associated with edge server s. Furthermore, we collect these $a_k(t)$, $k \in \mathcal{K}$, into $\mathbf{a}(t) = \{a_1(t), a_2(t), \ldots, a_K(t)\}$ and denote by $\mathbf{a} \triangleq \{\mathbf{a}(1), \mathbf{a}(2), \ldots, \mathbf{a}(T)\}$ the client participation matrix of all the training rounds.

3.1. DFL Process

The goal of DFL is to minimize the weighted global training loss, i.e.,
$F(\mathbf{g}) = \frac{1}{K} \sum_{k \in \mathcal{K}} F_k(\mathbf{g}),$ (1)
where g are the global model parameters, and F k ( g ) is the local training loss of client k, denoted as
$F_k(\mathbf{g}) = \frac{1}{D_k} \sum_{d=1}^{D_k} f(\mathbf{g}, x_{kd}, y_{kd}),$ (2)
where D k is the size of the local dataset of client k, and f ( g , x k d , y k d ) is the local loss function of training data ( x k d , y k d ) .
Each global training round of the DFL involves three phases: (1) local model training, (2) intra-cluster model aggregation, and (3) inter-cluster model aggregation. During the local model training phase, each client updates model parameters based on their local training dataset as
$\mathbf{g}_{k,t+1} = \mathbf{g}_t - \frac{\lambda}{D_k} \sum_{d=1}^{D_k} \nabla f(\mathbf{g}_t, x_{kd}, y_{kd}), \quad k \in \mathcal{A}(t+1),$ (3)
where $\mathbf{g}_{k,t+1}$ denotes the local model of client k in round $t+1$, $\mathbf{g}_t$ denotes the global model at the end of the previous training round, $\lambda$ represents the learning rate, and $\nabla f(\mathbf{g}_t, x_{kd}, y_{kd})$ is the gradient of the loss of model $\mathbf{g}_t$ on the training sample $(x_{kd}, y_{kd})$. In the phase of intra-cluster model aggregation, the clients participating in the training process upload their latest local model parameters to the associated edge servers via cellular communication. Then, each edge server aggregates the received model parameters as follows:
$\mathbf{g}_{s,t+1}^{(S)} = \frac{\sum_{k \in \mathcal{A}_s(t+1)} D_k \, \mathbf{g}_{k,t+1}}{\sum_{k \in \mathcal{A}_s(t+1)} D_k},$ (4)
where g s , t + 1 ( S ) is the model of server s after intra-cluster model aggregation in round t + 1 .
In the phase of inter-cluster model aggregation, each server transmits the updated model to the other servers connected to it via high-speed wired links. In this work, we assume that the edge servers are fully connected. Thus, each server aggregates the received models together with its own model to obtain a global model:
$\mathbf{g}_{t+1} = \frac{\sum_{s \in \mathcal{S}} \sum_{k \in \mathcal{A}_s(t+1)} D_k \, \mathbf{g}_{s,t+1}^{(S)}}{\sum_{k \in \mathcal{A}(t+1)} D_k},$ (5)
where g t + 1 represents the global model obtained during the inter-cluster model aggregation in round t + 1 . After inter-cluster model aggregation, each server sends the global model back to the associated clients, and the clients then substitute their local models with the global model.
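The three phases above can be condensed into a short Python sketch. This is illustrative rather than the paper's implementation: it assumes a simple squared loss as a stand-in for the local loss function, and the helper names (`local_update`, `weighted_average`, `dfl_round`) are our own.

```python
import numpy as np

def local_update(g, X, y, lr):
    # Eq. (3) with a squared loss f(g, x, y) = 0.5 * (x @ g - y)^2 as a stand-in:
    # g_{k,t+1} = g_t - (lr / D_k) * sum_d grad f(g_t, x_d, y_d).
    D_k = len(y)
    grad = X.T @ (X @ g - y)          # sum of per-sample gradients
    return g - (lr / D_k) * grad

def weighted_average(models, sizes):
    # Eqs. (4) and (5): data-size-weighted model aggregation.
    sizes = np.asarray(sizes, dtype=float)
    return sum(w * m for w, m in zip(sizes, models)) / sizes.sum()

def dfl_round(g, clusters, lr=0.1):
    # clusters: one list of (X_k, y_k) local datasets per edge server.
    cluster_models, cluster_sizes = [], []
    for cluster in clusters:
        local = [local_update(g, X, y, lr) for X, y in cluster]   # local training
        sizes = [len(y) for _, y in cluster]
        cluster_models.append(weighted_average(local, sizes))     # intra-cluster, Eq. (4)
        cluster_sizes.append(sum(sizes))
    return weighted_average(cluster_models, cluster_sizes)        # inter-cluster, Eq. (5)
```

Because the aggregation weights are the dataset sizes, nesting the two averaging steps reproduces the single global weighted average in Eq. (5).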

3.2. Delay Model

Considering the sufficient computational power of the edge servers, the delay caused by model aggregation at the edge servers can be ignored. Additionally, the transmission delay of model transmission between the edge servers can also be ignored because of the high-speed wired links between these edge servers. Therefore, the total delay of client k in training round t is as follows:
$\tau_k(t) = \min\{\tau_k^{\mathrm{D}}(t) + \tau_k^{\mathrm{U}}(t) + \tau_k^{\mathrm{LU}}(t), \ \tau_{\max}\},$ (6)
where τ k D ( t ) denotes the delay of the client downloading the latest global model from the associated server, τ k U ( t ) denotes the delay of the client uploading the local model to the associated server, τ k LU ( t ) represents the delay caused by local model updates, and τ max represents the maximum delay allowed for each training round. Specifically, the delay of client k in downloading the global model can be expressed as
$\tau_k^{\mathrm{D}}(t) = \frac{m}{B R_k^{\mathrm{D}}(t)},$ (7)
with
$R_k^{\mathrm{D}}(t) = \log_2\left[1 + \frac{p_k^{\mathrm{D}} |h_k^{\mathrm{D}}(t)|^2}{\sigma^2}\right],$ (8)
where m is the data size of the model parameters, B is the downlink channel bandwidth, $R_k^{\mathrm{D}}(t)$ is the downlink transmission rate, $p_k^{\mathrm{D}}$ is the downlink transmission power, $h_k^{\mathrm{D}}(t)$ is the downlink channel gain, and $\sigma^2$ is the noise power. Similarly, the delay of the client uploading the local model can be expressed as follows:
$\tau_k^{\mathrm{U}}(t) = \frac{m}{B R_k^{\mathrm{U}}(t)},$ (9)
with
$R_k^{\mathrm{U}}(t) = \log_2\left[1 + \frac{p_k^{\mathrm{U}} |h_k^{\mathrm{U}}(t)|^2}{\sigma^2}\right],$ (10)
where R k U ( t ) is the uplink transmission rate, p k U is the uplink transmit power, and h k U ( t ) is the uplink channel gain. The local update delay can be expressed as follows:
$\tau_k^{\mathrm{LU}}(t) = \frac{s_k D_k(t)}{\varphi_k \eta_k(t)},$ (11)
where $s_k$ is the number of CPU cycles required to process a unit of data, $\varphi_k$ is the computational capability of client k, and $\eta_k(t)$ is the fraction of computational resources available at client k in round t.
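Eqs. (6)–(11) compose directly into a per-client delay computation. The following Python sketch is a minimal illustration of the stated model; the function name and argument list are our own.

```python
import math

def round_delay(m, B, p_dl, h_dl, p_ul, h_ul, sigma2,
                s_k, D_k, phi_k, eta_k, tau_max):
    r_dl = math.log2(1 + p_dl * abs(h_dl) ** 2 / sigma2)  # Eq. (8), downlink rate
    r_ul = math.log2(1 + p_ul * abs(h_ul) ** 2 / sigma2)  # Eq. (10), uplink rate
    tau_d = m / (B * r_dl)                                # Eq. (7), download delay
    tau_u = m / (B * r_ul)                                # Eq. (9), upload delay
    tau_lu = s_k * D_k / (phi_k * eta_k)                  # Eq. (11), local update delay
    return min(tau_d + tau_u + tau_lu, tau_max)           # Eq. (6), capped total
```

A client with a very poor channel has a near-zero rate and therefore hits the $\tau_{\max}$ cap, which is exactly what the scheduler tries to avoid.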

4. Problem Formulation

In this work, we consider synchronous DFL training, which means that in the intra-cluster aggregation phase of each training round, the server conducts model aggregation only after receiving the models from all the clients associated with it. Therefore, the delay of round t, denoted by $\tau_t(\mathbf{a}(t))$, is determined by the slowest client, i.e.,
$\tau_t(\mathbf{a}(t)) = \max_{k \in \mathcal{A}(t)} \tau_k(t).$ (12)
Due to the ever-changing wireless channel state and the available resources of client devices, the transmission and computational delay will vary during the training process. To shorten the training time of synchronous DFL, this paper optimizes the client participation scheme in a T-round training process to minimize the total delay, i.e.,
$\min_{\mathbf{a}} \ \sum_{t=1}^{T} \tau_t(\mathbf{a}(t)),$ (13a)
$\text{s.t.} \ \ a_k(t) \in \{0, 1\}, \ \forall k \in \mathcal{K}, \ t \in \mathcal{T},$ (13b)
$\qquad \ |\mathcal{A}_s(t)| \le N, \ \forall s \in \mathcal{S}, \ t \in \mathcal{T},$ (13c)
where (13c) is the access constraint that limits the maximum number of clients that each edge server can serve in each round. However, due to the uncertainty of wireless channels and client activities, the local processing delay and model transmission delay in (11) are hard to obtain.
Fortunately, we can infer the client delay in each round by observing contextual information. Specifically, the context of client k in round t is denoted as $x_k^t \in \mathcal{X}$, where $\mathcal{X} = [0, 1]^D$. The server estimates the delay of client k based on the experience $\mathcal{P}_k$, i.e., $\hat{\tau}_k(t) = \Theta_k(x_k^t, \mathcal{P}_k)$, where $\Theta_k(\cdot)$ is the estimator corresponding to client k. Based on the estimated delay of the clients in each round, the system selects the clients to participate in each training round. Therefore, Problem (13) is re-expressed as follows:
$\min_{\mathbf{a}} \ \sum_{t=1}^{T} \max_{k \in \mathcal{A}(t)} \hat{\tau}_k(t) \quad \text{s.t. (13b) and (13c)}.$ (14)
The key to Problem (14) is how to estimate client delay in each round according to the observed contextual information. In the following, we will introduce the contextual combinatorial multi-armed bandit (CC-MAB) programming [15], based on which a client scheduling algorithm is proposed.

5. Algorithm Design

In a CC-MAB problem, the player performs actions by pulling one or more of a set of arms. Every time an action is executed, the player receives a reward value. Before executing an action, the player first observes the contextual information of each arm, and by recording the reward obtained from executing the action, the player learns the expected reward of that action in that context. By continuously pulling different arms and recording the reward values, the player gradually learns the best strategy to maximize the expected reward. It is worth noting that the action corresponding to an un-pulled arm is not executed; therefore, no corresponding reward value is recorded.
In our work, each edge server acts as the player, with its arms being all the clients within its coverage area. The action of the server is to select a group of clients for scheduling in each training round, while the action space consists of all possible client combinations, subject to the constraint that each edge server can schedule at most N clients simultaneously. The reward obtained by the server in each round is defined as $1 - \frac{\tau_t(\mathbf{a}(t))}{\tau_{\max}}$, where $\tau_t(\mathbf{a}(t))$ represents the delay in the current round. Therefore, we propose a client scheduling method based on CC-MAB programming, enabling the system to learn the optimal strategy and minimize the cumulative delay. The client scheduling process is described as follows.
Before each training round begins, each edge server observes the context of clients within the cell. Subsequently, the servers estimate the delay of the clients in this round based on their historical experience information and contextual information. Afterward, each edge server determines client selection. After determining the clients participating in the training round, the system organizes the aforementioned clients to perform DFL model training. After the training is completed, each edge server records the actual delay of the clients participating in this training round and adds it to the client’s historical experience information along with the contextual information observed before the next round of training. Figure 2 summarizes the flow of the proposed client scheduling algorithm.

5.1. Delay Estimation Based on Contextual Information

The context of client k in round t is denoted as $x_k^t \in \mathcal{X}$, where the context space contains D dimensions, i.e., $\mathcal{X} = [0, 1]^D$. The contextual information considered in this work includes the following: (1) the client’s current device activity $I_k \in \mathcal{I}$, such as the number of running programs, reported by the client itself; (2) the size of the local dataset $D_k$ used for training. Because each dimension of the context space takes continuous values, training an estimator for every possible point in the context space would incur high computational complexity. Meanwhile, a set of similar contexts within a certain range often corresponds to similar delays. Therefore, we discretize each dimension of the context space, map each observed context to one of the partitioned grid points, and estimate the delay based on the discretized context information (see Figure 3).
Assume that each dimension of the context space $\mathcal{X}$ is uniformly divided into G parts. Then, in total, $G^D$ subspaces are obtained. Each subspace of $\mathcal{X}$ is referred to as a grid point, and the set of all grid points in $\mathcal{X}$ forms the context set $\mathcal{G}$. Therefore, a mapping relationship from $\mathcal{X}$ to $\mathcal{G}$ can be established. Because the value of G affects the size of the context set, which, in turn, affects the algorithm performance, it is necessary to set a reasonable value for the parameter G.
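Under uniform partitioning, the mapping from a context in $[0,1]^D$ to one of the $G^D$ grid points is a simple quantization. A minimal sketch (the function name is ours):

```python
def context_to_grid(x, G):
    # Quantize each dimension of x (values in [0, 1]) into one of G uniform
    # intervals; values exactly at 1.0 are clamped into the last interval.
    return tuple(min(int(v * G), G - 1) for v in x)
```

The returned tuple of per-dimension indices identifies one of the $G^D$ grid points and can be used directly as a dictionary key.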
The server observes a client’s contextual information before each training round and maps it to the corresponding grid point in the context set. Assume that, before training round t, the context $x_k^t$ of client k is mapped to the grid point $g \in \mathcal{G}$. If client k is selected to participate in this training round, the server observes the actual delay of the client after the training is completed, denoted as $\tau$. Subsequently, the pair $(g, \tau)$ is saved as experience to update the corresponding estimator for the client. We denote the set of historical experiences corresponding to client k as $\mathcal{P}_k$.
The number of times client k has participated in training before round t with its context falling on the grid point $g \in \mathcal{G}$ is recorded by a counter $C_k^t(g)$. When client k is selected to participate in training in round t and its context falls on the grid point g, the corresponding counter of client k is updated, i.e., $C_k^t(g) \leftarrow C_k^t(g) + 1$.
In this work, the maximum likelihood estimator (MLE) is leveraged to estimate client delay. Assuming that the delay τ of client k corresponding to the same grid point g follows a normal distribution, then the estimation method can be expressed as follows:
$\hat{\tau}_k^t(g) = \frac{\sum_{(g, \tau_k) \in \mathcal{P}_k(g)} \tau_k}{C_k^t(g)}.$ (15)
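The estimator in Eq. (15) is a per-grid-point running mean. A minimal Python sketch, with hypothetical class and method names:

```python
from collections import defaultdict

class DelayEstimator:
    """Sample-mean delay estimator of Eq. (15) for one client."""

    def __init__(self):
        self.count = defaultdict(int)    # C_k^t(g): observations per grid point
        self.total = defaultdict(float)  # sum of observed delays per grid point

    def update(self, g, tau):
        # Record one (grid point, observed delay) experience pair.
        self.count[g] += 1
        self.total[g] += tau

    def estimate(self, g, default=None):
        # Eq. (15): average observed delay at grid point g.
        if self.count[g] == 0:
            return default
        return self.total[g] / self.count[g]
```

Keeping running sums and counts avoids storing the full experience set $\mathcal{P}_k$ while producing the same estimate.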

5.2. Exploration and Exploitation

We define the empirical threshold function E(t), a monotonically increasing function of the training round t, representing the minimum value of $C_k^t(g)$ that a client needs to reach on any grid point to be considered well explored in round t. Therefore, the proposed algorithm selects the subset of clients to be explored in round t as follows:
$\mathcal{E}_{s,t} = \{k \in \mathcal{K}_s \mid C_k^t(g) < E(t)\},$ (16)
where $\mathcal{K}_s$ represents the set of all clients within the cell where edge server s is located, and $\mathcal{E}_{s,t}$ represents the set of clients selected to be explored. Note that if $\mathcal{E}_{s,t} \neq \emptyset$, there are still clients in the cell that need to be explored; thus, the cell enters the exploration phase in the current training round. Otherwise, the cell enters the exploitation phase.
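The membership test in Eq. (16) can be sketched as follows, assuming the form $E(t) = t^z \log(t)$ designed later in Section 6; the dictionary layout (counters keyed by client and grid point) and function names are our own illustration.

```python
import math

def threshold(t, z):
    # E(t) = t^z * log(t); Section 6 sets z = 2*alpha / (3*alpha + D).
    return (t ** z) * math.log(t) if t > 1 else 0.0

def under_explored(clients, counts, grids, t, z=0.5):
    # Eq. (16): clients whose counter at their current grid point is below E(t).
    # counts maps (client, grid point) -> C_k^t(g); grids maps client -> grid point.
    E_t = threshold(t, z)
    return [k for k in clients if counts.get((k, grids[k]), 0) < E_t]
```

An empty return value means the cell is well explored and may enter the exploitation phase.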
(a) Exploration Phase: In the exploration phase, the player needs to select as many under-explored clients as possible to participate in training in order to enrich the experience and train the estimator. Two cases are considered, i.e., $|\mathcal{E}_{s,t}| \le N$ and $|\mathcal{E}_{s,t}| > N$. In the first case, all the clients in $\mathcal{E}_{s,t}$ are selected to participate in this training round; after that, the player greedily selects the clients with the minimum estimated delay from the remaining clients until N clients are scheduled. In the second case of $|\mathcal{E}_{s,t}| > N$, N clients are randomly selected from $\mathcal{E}_{s,t}$ to participate in the current round of training.
(b) Exploitation Phase: In the exploitation phase, the player estimates the delay of each client in the current round based on the current estimator and contextual information and selects the clients participating in the training based on the estimated values to minimize the expected delay of the current round. For each well-explored cell, the optimization problem is formulated as follows:
$\min_{\mathbf{a}(t)} \ \max_{k \in \mathcal{A}_s(t)} \hat{\tau}_k(t)$ (17a)
$\text{s.t.} \ \ a_k(t) \in \{0, 1\}, \ \forall k \in \mathcal{K}_s,$ (17b)
$\qquad \ |\mathcal{A}_s(t)| = N.$ (17c)
Problem (17) can be solved by a simple greedy algorithm, which sorts the estimated delays of all the clients in the cell in ascending order and selects the top N clients to participate in this training round.
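The two-phase selection rule (the exploration cases $|\mathcal{E}_{s,t}| \le N$ and $|\mathcal{E}_{s,t}| > N$, plus the greedy exploitation step) can be combined into a single per-server decision. A sketch under our own data layout and naming:

```python
import random

def schedule(clients, est_delay, explore_set, N):
    # explore_set: under-explored clients E_{s,t}; est_delay: client -> estimated delay.
    if len(explore_set) > N:
        # Exploration, |E| > N: pick N under-explored clients at random.
        return set(random.sample(explore_set, N))
    # Exploration with |E| <= N (pure exploitation when explore_set is empty):
    chosen = set(explore_set)
    rest = sorted((k for k in clients if k not in chosen), key=lambda k: est_delay[k])
    chosen.update(rest[: N - len(chosen)])  # greedily fill with lowest estimated delay
    return chosen
```

With an empty `explore_set`, the function reduces to the greedy solution of Problem (17): sort by estimated delay and take the top N.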

6. Key Parameter Design

In this section, the key parameters G and $\{E(t)\}_{t=1}^{T}$ are designed to minimize the cumulative delay of the T-round training. Since the setting of parameter G depends on the total number of training rounds T, we re-express it as $G_T$.

6.1. Upper Bound of Regret

Denote the optimal solution to Problem (14) by $\mathbf{a}^* = \{\mathbf{a}^*(1), \mathbf{a}^*(2), \ldots, \mathbf{a}^*(T)\}$. The difference between the delay corresponding to the optimal solution in round t and the actual delay under the proposed algorithm is defined as the regret, i.e.,
$r_t = \tau(t, \mathbf{a}(t)) - \tau(t, \mathbf{a}^*(t)).$ (18)
The expected cumulative regret of round T is denoted as follows:
$\mathbb{E}[R(T)] = \sum_{t=1}^{T} \mathbb{E}[r_t] = \sum_{t=1}^{T} \left( \mathbb{E}[\tau(t, \mathbf{a}(t))] - \mathbb{E}[\tau(t, \mathbf{a}^*(t))] \right).$ (19)
We also introduce two assumptions, as follows.
Assumption 1.
For a specific estimator, as its historical experience $\mathcal{P}_k$ grows, its estimation error for the client delay decreases. Therefore, it is assumed that for any grid point in the context set $\mathcal{G}$, the estimator corresponding to client k satisfies the following probably approximately correct (PAC) property:
$\Pr\left[ \left| \Theta_k(\mathcal{P}_k^t(g)) - \bar{\tau}_k(g) \right| > \epsilon \right] \le \sigma_k(\epsilon, C_k^t(g)),$ (20)
where $\bar{\tau}_k(g)$ represents the expected delay given that the client context falls on g, and $\sigma_k(\epsilon, C_k^t(g))$ is a term, determined by the estimator, that decreases as the counter $C_k^t(g)$ increases.
Assumption 2.
Empirically, it can be inferred that when the contextual information is similar, the delays of clients are also similar. Therefore, it is assumed that for each client $k \in \mathcal{K}$, there exist $L > 0$ and $\alpha > 0$ such that for any contexts $x, x' \in \mathcal{X}$, the following Hölder condition holds:
$\left| \bar{\tau}_k(x) - \bar{\tau}_k(x') \right| \le L \, \| x - x' \|^{\alpha},$ (21)
where $\| \cdot \|$ represents the Euclidean norm in $\mathbb{R}^D$.
With the above assumptions, we have the following theorem.
Theorem 1.
Given Assumptions 1 and 2, when $2H(t) + 2LD^{\alpha/2}(G_T)^{-\alpha} \le A t^{\theta}$, the expected cumulative regret is upper bounded as follows:
$\mathbb{E}[R(T)] \le r_{\max} K_s (G_T)^D E(T) + 3TLD^{\alpha/2}(G_T)^{-\alpha} + A T^{\theta+1} + 2 r_{\max} \sum_{t=1}^{T} a_L(g_t) \, \sigma_k\left(H(t), E(t)\right).$ (22)
Proof. 
Please see the Appendix A for reference. □

6.2. Parameter Design Based on the Upper Bound

It is assumed that the historical delays $\tau_k(x_k)$, $x_k \in g_k^t$, in the experience $\mathcal{P}_k(g_k^t)$ corresponding to the grid point $g_k^t$ follow a normal distribution $\mathcal{N}(\mu_k(g_k^t), \delta_k^2(g_k^t))$. With the MLE, an unbiased estimate of $\mu_k(g_k^t)$ is given as follows:
$\hat{\tau}_k(g_k^t) = \frac{1}{C_k^t(g_k^t)} \sum_{\tau \in \mathcal{P}_k(g_k^t)} \tau.$ (23)
Note that the PAC property in Assumption 1 can be further refined as follows:
$\Pr\left[ \left| \hat{\tau}_k(g_k^t) - \bar{\tau}_k(g_k^t) \right| > \epsilon \right] \le \exp\left( -\frac{2 C_k^t(g_k^t) \, \epsilon^2}{(\tau_{\max})^2} \right).$ (24)
We further let $E(t) = t^z \log(t)$ with $0 < z < 1$ and $G_T = \lceil T^{\gamma} \rceil$ with $0 < \gamma < \frac{1}{D}$. Then, the first term on the right-hand side of (22) can be rewritten as follows:
$r_{\max} K_s \lceil T^{\gamma} \rceil^D \, T^z \log(T).$ (25)
Considering $\lceil T^{\gamma} \rceil^D \le (2T^{\gamma})^D$, we have the following:
$r_{\max} K_s \lceil T^{\gamma} \rceil^D T^z \log(T) \le r_{\max} K_s (2T^{\gamma})^D \left( T^z \log(T) + 1 \right) = 2^D r_{\max} K_s \left( T^{\gamma D} + T^{z + \gamma D} \log(T) \right).$ (26)
Let $2H(t) + 2LD^{\alpha/2}(G_T)^{-\alpha} \le A t^{\theta}$. Then, the third term on the right-hand side of (22) can be rewritten as follows:
$2 r_{\max} \sum_{t=1}^{T} a_L(g_t) \, \sigma_k(H(t), E(t)) = 2F r_{\max} \sum_{t=1}^{T} \exp\left( -\frac{2 E(t) H^2(t)}{(\tau_{\max})^2} \right) = 2F r_{\max} \sum_{t=1}^{T} \exp(-2\log(t)) = 2F r_{\max} \sum_{t=1}^{T} t^{-2} \le 2F r_{\max} \sum_{t=1}^{\infty} t^{-2} = \frac{\pi^2}{3} F r_{\max}.$ (27)
Considering $(G_T)^{-\alpha} = \lceil T^{\gamma} \rceil^{-\alpha} \le T^{-\alpha\gamma}$, we have
$\mathbb{E}[R(T)] \le 2^D r_{\max} K_s \left( T^{z+\gamma D} \log(T) + T^{\gamma D} \right) + 3 L D^{\alpha/2} T^{1-\alpha\gamma} + A T^{\theta+1} + \frac{\pi^2}{3} F r_{\max}.$ (28)
Let $z = \frac{2\alpha}{3\alpha+D}$, $\gamma = \frac{z}{2\alpha}$, and $\theta = -\frac{z}{2}$. Then, the highest-power term of T in $\mathbb{E}[R(T)]$ is $2^D r_{\max} K_s \, T^{\frac{2\alpha+D}{3\alpha+D}} \log(T)$, where $\frac{2\alpha+D}{3\alpha+D} < 1$.
Therefore, given $E(t) = t^{\frac{2\alpha}{3\alpha+D}} \log(t)$ and $G_T = \lceil T^{\frac{1}{3\alpha+D}} \rceil$, the expected cumulative regret grows sub-linearly with respect to T, and asymptotically optimal client scheduling decisions are obtained.
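The parameter choices above follow directly from $\alpha$, D, and T. A small sketch (function name ours) that reproduces the arithmetic:

```python
import math

def design_parameters(alpha, D, T):
    # Section 6.2: z = 2a/(3a + D), gamma = z/(2a) = 1/(3a + D), theta = -z/2
    # make all three T-dependent regret terms grow as T^((2a + D)/(3a + D)).
    z = 2 * alpha / (3 * alpha + D)
    gamma = z / (2 * alpha)
    theta = -z / 2
    E = lambda t: (t ** z) * math.log(t) if t > 1 else 0.0  # exploration threshold E(t)
    G_T = math.ceil(T ** gamma)                             # grid resolution per dimension
    return z, gamma, theta, E, G_T
```

For example, with $\alpha = 1$ and $D = 2$ (the two context dimensions of Section 5.1), $z = 0.4$ and $\gamma = 0.2$, so a horizon of $T = 2000$ rounds gives $G_T = 5$ partitions per dimension.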

7. Experimental Results

7.1. Simulation Setup

All of the experiments in this work are conducted on a personal computer with a 2.10 GHz Intel Core i7-12700F CPU and 32 GB of RAM, running a 64-bit Windows operating system and PyTorch 1.13.1. We assume that there are 3 edge servers and 18 client devices, and each edge server can communicate with up to 2 clients simultaneously. A total of 2000 rounds of DFL training are conducted.
The communication-related parameters are configured as follows. The channel gains of both the uplink and downlink consist of small-scale and large-scale fading. The small-scale fading follows a Rayleigh distribution with unit variance, while the large-scale fading is modeled using the path-loss equation $\mathrm{PL}\,[\mathrm{dB}] = 128.1 + 37.6 \log_{10}(d)$, where d represents the distance in kilometers. The noise power $\sigma^2$ is set to $-173$ dBm, and the uplink resource block bandwidth is 1 MHz. The transmit power of the clients and edge servers is set to 10 mW and 1 W, respectively. The data size of the model parameters is configured as $m = 5 \times 10^3$.
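As an illustration of this channel model, the following sketch draws one realization of the combined gain, assuming unit-variance Rayleigh small-scale fading (so the squared magnitude $|h|^2$ is exponentially distributed with mean 1); the function name is ours.

```python
import math
import random

def channel_gain(d_km):
    # Large-scale path loss PL[dB] = 128.1 + 37.6 * log10(d), d in km,
    # combined with one draw of Rayleigh small-scale fading power.
    pl_db = 128.1 + 37.6 * math.log10(d_km)
    large_scale = 10 ** (-pl_db / 10)     # convert dB loss to a linear gain
    small_scale = random.expovariate(1.0)  # |h|^2 ~ Exp(1) for unit-variance Rayleigh
    return large_scale * small_scale
```

Drawing a fresh gain per client per round is what makes the per-round delays in Eqs. (7)–(10) time-varying and unknown in advance.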
The local computing-related parameters of the clients are configured as follows. The computing capability s_k of each client k is uniformly distributed within [10, 30] × 10⁶. The computational resource allocation φ_k(t) is set to 2 × 10¹¹. The available computational resource fraction η_k(t) is given by 1/(1 + I_k(t)) in each round, where I_k(t), the number of active programs running on client k in round t, is uniformly distributed within [0, 10]. Furthermore, the maximum interval is set to τ_max = 5 s. Table 1 summarizes the key parameters used in this work.
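The time-varying availability model above can be sketched directly; treating [0, 10] as the integer range of active-program counts is our reading of the setup:

```python
import random

def available_fraction(I_k):
    # eta_k(t) = 1 / (1 + I_k(t)), with I_k(t) the number of active
    # programs on client k in round t.
    return 1.0 / (1.0 + I_k)

def sample_round(rng):
    # I_k(t) drawn uniformly from {0, ..., 10}, as in the setup.
    I_k = rng.randint(0, 10)
    return available_fraction(I_k)

rng = random.Random(0)
etas = [sample_round(rng) for _ in range(1000)]
# The available fraction always lies in [1/11, 1].
assert all(1 / 11 <= eta <= 1 for eta in etas)
```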
The MNIST dataset [35], comprising 70,000 grayscale images of handwritten digits (0–9), and the CIFAR-10 dataset [36], containing 60,000 color images across 10 categories, were employed for training handwritten digit recognition and image classification models, respectively. For each dataset, the data partitioning scheme was implemented as follows. After random shuffling, the dataset was uniformly distributed across all clients. In each training round, clients randomly determined the quantity of local data to utilize for model updates. For the MNIST dataset, a multi-layer perceptron (MLP) consisting of an input layer, a fully connected layer, and an output layer is chosen as the target model with a total of 101,770 trainable parameters. For the CIFAR-10 dataset, a convolutional neural network (CNN) consisting of two convolutional layers (and corresponding pooling layers), a fully connected layer, and an output layer is chosen as the target model with a total of 313,802 trainable parameters. During the training process, the batch size is set to 64. For the MNIST and CIFAR-10 datasets, the learning rates are set as 0.05 and 0.02, respectively.
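As a sanity check, the reported MNIST parameter count is consistent with a single hidden layer of width 128 (the hidden width is our inference; the text states only the layer types and the total):

```python
def dense_params(n_in, n_out):
    # Weights plus biases of one fully connected layer.
    return n_in * n_out + n_out

# A 784 -> 128 -> 10 MLP reproduces the reported 101,770 trainable
# parameters: 784*128 + 128 + 128*10 + 10 = 101,770.
mlp_total = dense_params(784, 128) + dense_params(128, 10)
assert mlp_total == 101770
```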
To verify the effectiveness of the client selection strategy proposed in this work in reducing long-term cumulative delay, the following methods are introduced for comparison:
  • Optimal client selection. In this method, the total delay of each client in each round is known a priori. When making decisions in each round, each edge server selects the N clients with the smallest total delay in its covered cell to participate in training. Note that this method serves as the performance upper bound (i.e., a lower bound on the achievable delay).
  • ϵ-greedy client selection. This method uses the parameter ϵ to trade off exploration and exploitation. In an exploration round, each edge server randomly selects N clients from its covered cell to participate in training; in an exploitation round, each edge server selects the N clients with the minimum expected delay. This method does not utilize contextual information when making selection decisions, relying solely on randomness and delay-based selection. In this work, ϵ is set to 0.3.
  • Random client selection. At the beginning of each training round, each edge server randomly selects N clients from the corresponding cells to participate in the training.
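For concreteness, the ϵ-greedy baseline above can be sketched as follows (plain Python; the delay estimates and client set are hypothetical, and this is not the paper's exact implementation):

```python
import random

def epsilon_greedy_select(delay_est, N, eps, rng):
    # One edge server's decision: with probability eps, explore by picking
    # N clients uniformly at random; otherwise exploit by picking the N
    # clients with the smallest estimated delay.
    clients = list(delay_est)
    if rng.random() < eps:
        return rng.sample(clients, N)
    return sorted(clients, key=lambda k: delay_est[k])[:N]

rng = random.Random(1)
est = {k: rng.uniform(0.5, 5.0) for k in range(6)}  # hypothetical estimates
picked = epsilon_greedy_select(est, N=2, eps=0.3, rng=rng)
assert len(picked) == 2 and set(picked) <= set(est)
```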

7.2. Performance Analysis

Figure 4 shows the cumulative delay and the corresponding performance regret during the training process using different client selection methods on the MNIST dataset. It can be observed from Figure 4 that, compared to the random and ϵ-greedy client selection methods, the proposed method significantly reduces the delay within the given number of training rounds. This observation indicates that the proposed method can effectively mitigate the effects of heterogeneous and time-varying client resources and improve training efficiency. In addition, we find that the cumulative delay gap (i.e., the regret) of the proposed method grows sub-linearly over the communication rounds.
Figure 5 and Figure 6 show the test accuracy and training loss over the training time using different client selection methods on the MNIST and CIFAR-10 datasets, respectively. We observe that, for both datasets, the proposed method achieves the best performance compared with the random and ϵ-greedy client selection methods, which suggests that the proposed client selection method can effectively accelerate the convergence of the global model and reduce the delay. Figure 7 shows the cumulative delay over different numbers of clients. As shown in Figure 7, when $NS = K$, i.e., all clients can be associated with the edge BSs simultaneously, all four methods have similar delay performance. As the number of clients increases (e.g., new clients joining the system), the cumulative delay of all methods except random selection decreases. Moreover, the cumulative delay of the proposed method is always lower than that of the ϵ-greedy and random client selection methods. The results in Figure 7 confirm that, compared to the random and ϵ-greedy client selection methods, the proposed method effectively reduces the cumulative delay of the system.

8. Conclusions

This work investigates the client scheduling problem in a DFL scenario with multiple servers, where the local computing and communication resources of the clients are heterogeneous and time-varying, and priors on these client resources are unknown. First, we model the delay generated during the DFL training process and formulate a client scheduling problem to minimize the cumulative delay. Subsequently, we propose a client scheduling algorithm based on contextual multi-armed bandits. Through theoretical analysis and algorithm parameter design, the algorithm is shown to achieve asymptotically optimal performance. The experimental results show that the algorithm makes asymptotically optimal client selection decisions and outperforms existing algorithms in reducing the cumulative delay of the system.

Author Contributions

Conceptualization, Z.C. and S.W.; methodology, Z.C.; software, Z.C.; validation, S.W. and Y.W.; formal analysis, Z.C.; investigation, Z.C.; resources, Z.C.; data curation, Z.C.; writing—original draft preparation, Z.C.; writing—review and editing, S.W. and X.Z.; visualization, Z.C.; supervision, X.Z. and Z.C.; project administration, Y.W. and X.Z.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available on request due to restrictions. The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Theorem 1

Since the cumulative delay loss $R(T)$ comprises both the exploration delay loss $R_{\text{Explore}}(T)$ and the exploitation delay loss $R_{\text{Exploit}}(T)$, we have $R(T) = R_{\text{Explore}}(T) + R_{\text{Exploit}}(T)$. Therefore, we derive the upper bounds of these two components separately to obtain an upper bound on $R(T)$.
First, we derive the upper bound of the exploration delay loss $R_{\text{Explore}}(T)$ given $G_T$ and $E(t)$. If cell $s$ enters the exploration phase in round $t$, there must exist some $k \in \mathcal{K}_s$ and a context grid $g_k^t$ such that $C_k^t(g_k^t) < E(t)$. According to the definition of $E(t)$, there are at most $E(t)$ exploration rounds in cell $s$ in which user $k$, with context $x_k^{t'} \in g_k^t$ and $t' < t$, is selected to participate in training. Additionally, it follows that there are at most $K_s G_T^D E(T)$ exploration rounds across $T$ rounds. Since the delay loss per exploration round is at most $r_{\max}$, the upper bound for the exploration delay loss $R_{\text{Explore}}(T)$ over $T$ rounds is as follows:
$\mathbb{E}\left[ R_{\text{Explore}}(T) \right] \le r_{\max} K_s G_T^{D} E(T).$
From this expression, it is evident that the order of $R_{\text{Explore}}(T)$ is determined by the grid division $G_T^D$ and the function $E(t)$.
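The sub-linear growth of this exploration bound can be illustrated numerically; a sketch in which $K_s$, $r_{\max}$, $\alpha$, and $D$ take illustrative values, while the exponent choices follow the theorem's parameter design:

```python
import math

def explore_bound(T, K_s=6, r_max=1.0, alpha=1, D=2):
    # Exploration-loss bound r_max * K_s * G_T^D * E(T), with
    # G_T = ceil(T^(1/(3a+D))) and E(T) = T^(2a/(3a+D)) * log(T)
    # as chosen in the analysis.
    gamma = 1.0 / (3 * alpha + D)
    z = 2.0 * alpha / (3 * alpha + D)
    G_T = math.ceil(T ** gamma)
    return r_max * K_s * (G_T ** D) * (T ** z) * math.log(T)

# The bound-to-horizon ratio shrinks as T grows: sub-linear in T.
ratios = [explore_bound(T) / T for T in (10**3, 10**5, 10**7)]
assert ratios[0] > ratios[1] > ratios[2]
```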
Next, we derive the upper bound for the exploitation delay loss $R_{\text{Exploit}}(T)$ given $G_T$ and $E(t)$. Let the expected delay estimate for user device $k$ with context $x$ be defined as $\mu_k(x) \triangleq \mathbb{E}[\hat{\tau}_k(x)]$. Furthermore, we denote the upper and lower bounds of the expected delay estimate for user $k \in \mathcal{K}_s$ across all contexts in grid $g$ by $\bar{\mu}_k(g) = \sup_{x \in g} \mu_k(x)$ and $\underline{\mu}_k(g) = \inf_{x \in g} \mu_k(x)$, respectively. We define the context at the geometric center of grid $g$ as $\dot{x}(g)$ and make the following definitions:
$\bar{\mu}(g_t) = \left( \bar{\mu}_1(g_1^t), \bar{\mu}_2(g_2^t), \ldots, \bar{\mu}_N(g_N^t) \right), \quad \underline{\mu}(g_t) = \left( \underline{\mu}_1(g_1^t), \underline{\mu}_2(g_2^t), \ldots, \underline{\mu}_N(g_N^t) \right), \quad \dot{\mu}(g_t) = \left( \mu_1(\dot{x}(g_1^t)), \mu_2(\dot{x}(g_2^t)), \ldots, \mu_N(\dot{x}(g_N^t)) \right).$
The user selection strategy based on the delay estimate $\dot{\mu}(g_t)$ is defined as follows:
$\dot{a}(g_t) = \arg\min_{a_s(t)} \tau_t\left( a, \dot{\mu}(g_t) \right),$
$\mathrm{s.t.} \quad a_k(t) \in \{0, 1\}, \quad \forall k \in \mathcal{K}_s,$
$\left| \mathcal{A}_s(t) \right| = N.$
We also define the following:
$L(g_t) = \left\{ a : \tau_t\left( a, \underline{\mu}(g_t) \right) - \tau_t\left( \dot{a}(g_t), \bar{\mu}(g_t) \right) \ge A t^{-\theta} \right\}.$
We term any user selection strategy $a \in L(g_t)$ a suboptimal strategy, and any strategy $a \in \mathcal{A},\ a \notin L(g_t)$ a near-optimal strategy. Therefore, the exploitation delay loss can be expressed as follows:
$\mathbb{E}\left[ R_{\text{Exploit}}(T) \right] = \mathbb{E}\left[ R_s(T) \right] + \mathbb{E}\left[ R_n(T) \right],$
where $\mathbb{E}[R_s(T)]$ and $\mathbb{E}[R_n(T)]$ represent the delay losses in the exploitation phase caused by the suboptimal and near-optimal strategies, respectively. Next, we derive the upper bounds for each of these losses.
To begin with, let $W(t) = \{\mathcal{E}_{s,t} = \emptyset\}$ represent the event that the algorithm enters the exploitation phase in round $t$, and let $V_a(t)$ denote the event that the selected strategy in round $t$ is $a$. Consequently, the following holds:
$R_s(T) = \sum_{t=1}^{T} \sum_{a \in L(g_t)} \mathbb{I}\left\{ V_a(t), W(t) \right\} \times \left[ \tau_t(a_t, \tau_t) - \tau_t(a^{*,t}, \tau_t) \right].$
Since the maximum delay loss per round is $r_{\max}$, (A1) can be simplified to the following:
$R_s(T) \le r_{\max} \sum_{t=1}^{T} \sum_{a \in L(g_t)} \mathbb{I}\left\{ V_a(t), W(t) \right\}.$
Taking expectations on both sides of the inequality yields the following:
$\mathbb{E}\left[ R_s(T) \right] \le r_{\max} \sum_{t=1}^{T} \sum_{a \in L(g_t)} \Pr\left\{ V_a(t), W(t) \right\}.$
Given that in the exploitation phase with strategy $a$, $\tau_t(a, \hat{\tau}_t) \le \tau_t(\dot{a}(g_t), \hat{\tau}_t)$, the following is true:
$\Pr\left\{ V_a(t), W(t) \right\} \le \Pr\left\{ \tau_t(a, \hat{\tau}_t) \le \tau_t(\dot{a}(g_t), \hat{\tau}_t), W(t) \right\}.$
Defining $H(t) > 0$, the event on the right-hand side of (A9) implies that at least one of the following events occurs:
$E_1 = \left\{ \tau_t(a, \hat{\tau}_t) \le \tau_t(a, \bar{\mu}(g_t)) - H(t), W(t) \right\},$
$E_2 = \left\{ \tau_t(\dot{a}(g_t), \hat{\tau}_t) \ge \tau_t(\dot{a}(g_t), \underline{\mu}(g_t)) + H(t), W(t) \right\},$
$E_3 = \left\{ \tau_t(a, \hat{\tau}_t) \le \tau_t(\dot{a}(g_t), \hat{\tau}_t),\ \tau_t(a, \hat{\tau}_t) > \tau_t(a, \bar{\mu}(g_t)) - H(t),\ \tau_t(\dot{a}(g_t), \hat{\tau}_t) < \tau_t(\dot{a}(g_t), \underline{\mu}(g_t)) + H(t),\ W(t) \right\}.$
Thus, we have $\left\{ \tau_t(a, \hat{\tau}_t) \le \tau_t(\dot{a}(g_t), \hat{\tau}_t), W(t) \right\} \subseteq E_1 \cup E_2 \cup E_3$. Next, we derive the upper bound for the probability of each of these events. Given the definition $\bar{\mu}_k(g) = \sup_{x \in g} \mu_k(x)$, we have $\mathbb{E}_{x \in g}[\mu_k(x)] = \mu_k(g) \le \sup_{x \in g} \mu_k(x) = \bar{\mu}_k(g)$; thus,
$\Pr\{E_1\} = \Pr\left\{ \tau_t(a, \hat{\tau}_t) \le \tau_t(a, \bar{\mu}(g_t)) - H(t), W(t) \right\} = \Pr\left\{ \max_{k \in \mathcal{K}_s} a_k^t \hat{\tau}_k^t \le \max_{k \in \mathcal{K}_s} a_k^t \bar{\mu}_k(g_k^t) - H(t), W(t) \right\} \le \Pr\left\{ \max_{k \in \mathcal{K}_s} a_k^t \hat{\tau}_k^t \le \max_{k \in \mathcal{K}_s} a_k^t \mu_k(g_k^t) - H(t), W(t) \right\}.$
We adopt the following assumption (a). For a specific estimator, as its historical experience $P_k$ increases, the estimation error for the user delay decreases. Thus, we assume that the estimator corresponding to user $k$ satisfies the following probably approximately correct (PAC) property for any grid in the context set $\mathcal{G}$:
$\Pr\left\{ \left| \Theta_k^{P_k^t(g)} - \tau_k(g) \right| > \epsilon \right\} \le \sigma_k\left( \epsilon, C_k^t(g) \right),$
where $\tau_k(g)$ represents the expected delay for user context $x \in g$ under prior conditions, and $\sigma_k(\epsilon, C_k^t(g))$ is an estimator-dependent term that decreases as the count $C_k^t(g)$ increases.
In combination with assumption (a), we obtain the following:
$\Pr\{E_1\} \le \sigma_{k^*}\left( H(t), C_{k^*}^t(g_{k^*}^t) \right),$
where $k^* = \arg\max_{k \in \mathcal{K}_s} a_k^t \hat{\tau}_k^t$. Similarly, we have the following:
$\Pr\{E_2\} \le \sigma_{k^*}\left( H(t), C_{k^*}^t(g_{k^*}^t) \right).$
When $H(t)$ satisfies $2H(t) + 2LD^{\alpha/2} G_T^{-\alpha} \le A t^{-\theta}$, we have $\Pr\{E_3\} = 0$. Therefore,
$\Pr\left\{ V_a(t), W(t) \right\} \le \Pr\left\{ E_1 \cup E_2 \cup E_3 \right\} \le \Pr\{E_1\} + \Pr\{E_2\} + \Pr\{E_3\} \le 2\sigma_{k^*}\left( H(t), C_{k^*}^t(g_{k^*}^t) \right) \le 2\sigma_{k^*}(H(t), E(t)).$
Combined with Equation (A8), the upper bound of $\mathbb{E}[R_s(T)]$ is the following:
$\mathbb{E}\left[ R_s(T) \right] \le r_{\max} \sum_{t=1}^{T} \sum_{a \in L(g_t)} \Pr\left\{ V_a(t), W(t) \right\} \le 2 r_{\max} \sum_{t=1}^{T} \sum_{a \in L(g_t)} \sigma_{k^*}(H(t), E(t)).$
Next, we derive the upper bound of $\mathbb{E}[R_n(T)]$. Recall that $W(t)$ indicates that the algorithm enters the exploitation phase, and let $Q(t)$ denote that the strategy $a_t$ in round $t$ is near-optimal, i.e., $a_t \in \mathcal{A},\ a_t \notin L(g_t)$. So,
$R_n(T) = \sum_{t=1}^{T} \mathbb{I}\left\{ Q(t), W(t) \right\} \times \left[ \tau_t(a_t, \tau_t) - \tau_t(a^{*,t}, \tau_t) \right].$
Taking the expectation on both sides of the above equation, we have the following:
$\mathbb{E}\left[ R_n(T) \right] = \sum_{t=1}^{T} \Pr\left\{ Q(t), W(t) \right\} \cdot \mathbb{E}\left[ \tau_t(a_t, \tau_t) - \tau_t(a^{*,t}, \tau_t) \mid Q(t), W(t) \right] \le \sum_{t=1}^{T} \mathbb{E}\left[ \tau_t(a_t, \tau_t) - \tau_t(a^{*,t}, \tau_t) \mid Q(t), W(t) \right] \le \sum_{t=1}^{T} \left[ \tau_t(a_t, \mu_t) - \tau_t(a^{*,t}, \mu_t) \right].$
Since $a_t \in \mathcal{A},\ a_t \notin L(g_t)$, we obtain $\tau_t(a_t, \underline{\mu}(g_t)) - \tau_t(\dot{a}(g_t), \bar{\mu}(g_t)) < A t^{-\theta}$, which leads to the following:
$\tau_t(a_t, \mu_t) - \tau_t(a^{*,t}, \mu_t) \le 3LD^{\alpha/2} G_T^{-\alpha} + \tau_t(a_t, \underline{\mu}(g_t)) - \tau_t(\dot{a}(g_t), \bar{\mu}(g_t)) \le 3LD^{\alpha/2} G_T^{-\alpha} + A t^{-\theta}.$
Thus, the upper bound of E R n ( T ) is as follows:
$\mathbb{E}\left[ R_n(T) \right] \le \sum_{t=1}^{T} \left( 3LD^{\alpha/2} G_T^{-\alpha} + A t^{-\theta} \right) \le 3TLD^{\alpha/2} G_T^{-\alpha} + A T^{1-\theta}.$
In summary, the upper bound of the delay loss during exploitation is as follows:
$\mathbb{E}\left[ R_{\text{Exploit}}(T) \right] \le 3TLD^{\alpha/2} G_T^{-\alpha} + A T^{1-\theta} + 2 r_{\max} \sum_{t=1}^{T} \sum_{a \in L(g_t)} \sigma_{k^*}(H(t), E(t)).$
Thus, when the condition $2H(t) + 2LD^{\alpha/2} G_T^{-\alpha} \le A t^{-\theta}$ is satisfied, the upper bound of the total delay loss can be expressed as follows:
$\mathbb{E}[R(T)] \le r_{\max} K_s G_T^{D} E(T) + 3TLD^{\alpha/2} G_T^{-\alpha} + A T^{1-\theta} + 2 r_{\max} \sum_{t=1}^{T} \sum_{a \in L(g_t)} \sigma_{k^*}(H(t), E(t)).$
The proof is complete.

References

  1. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Niyato, D.; Dobre, O.; Poor, H.V. 6G Internet of Things: A Comprehensive Survey. IEEE Internet Things J. 2021, 9, 359–383. [Google Scholar] [CrossRef]
  2. Lampropoulos, G.; Siakas, K.; Anastasiadis, T. Internet of things (IoT) in industry: Contemporary application domains, innovative technologies and intelligent manufacturing. People 2018, 6, 109–118. [Google Scholar] [CrossRef]
  3. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
  4. Majeed, I.A.; Kaushik, S.; Bardhan, A.; Tadi, V.S.K.; Min, H.K.; Kumaraguru, K.; Muni, R.D. Comparative assessment of federated and centralized machine learning. arXiv 2022, arXiv:2202.01529. [Google Scholar]
  5. Konečný, J.; Brendan McMahan, H.; Ramage, D.; Richtárik, P. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. arXiv 2016, arXiv:1610.02527. [Google Scholar] [CrossRef]
  6. Zhao, P.; Jin, Y.; Ren, X.; Li, Y. A personalized cross-domain recommendation with federated meta learning. Multim. Tools Appl. 2024, 83, 71435–71450. [Google Scholar] [CrossRef]
  7. Hao, M.; Li, H.; Luo, X.; Xu, G.; Yang, H.; Liu, S. Efficient and Privacy-Enhanced Federated Learning for Industrial Artificial Intelligence. IEEE Trans. Ind. Inform. 2020, 16, 6532–6542. [Google Scholar] [CrossRef]
  8. Samarakoon, S.; Bennis, M.; Saad, W.; Debbah, M. Federated Learning for Ultra-Reliable Low-Latency V2V Communications. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–7. [Google Scholar]
  9. Yuan, B.; Ge, S.; Xing, W. A Federated Learning Framework for Healthcare IoT devices. arXiv 2020, arXiv:2005.05083. [Google Scholar]
  10. AbdulRahman, S.; Tout, H.; Ould-Slimane, H.; Mourad, A.; Talhi, C.; Guizani, M. A survey on federated learning: The journey from centralized to distributed on-site learning and beyond. IEEE Internet Things J. 2020, 8, 5476–5497. [Google Scholar] [CrossRef]
  11. Sun, Y.; Shao, J.; Mao, Y.; Wang, J.H.; Zhang, J. Semi-Decentralized Federated Edge Learning for Fast Convergence on Non-IID Data. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 1898–1903. [Google Scholar] [CrossRef]
  12. Beltrán, E.T.M.; Pérez, M.Q.; Sánchez, P.M.S.; Bernal, S.L.; Bovet, G.; Pérez, M.G.; Pérez, G.M.; Celdrán, A.H. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges. IEEE Commun. Surv. Tutor. 2023, 25, 2983–3013. [Google Scholar] [CrossRef]
  13. Alsaffar, Q.S.; Ayed, L.B. Heterogeneous Resources in Infrastructures of the Edge Network Paradigm: A Comprehensive Review. Karbala Int. J. Mod. Sci. 2024, 10, 15. [Google Scholar] [CrossRef]
  14. Noaman, M.; Khan, M.S.; Abrar, M.F.; Ali, S.; Alvi, A.; Saleem, M.A. Challenges in integration of heterogeneous internet of things. Sci. Program. 2022, 2022, 8626882. [Google Scholar] [CrossRef]
  15. Qin, L.; Chen, S.; Zhu, X. Contextual combinatorial bandit and its application on diversified online recommendation. In Proceedings of the 2014 SIAM International Conference on Data Mining, SIAM, Philadelphia, PA, USA, 24–26 April 2014; pp. 461–469. [Google Scholar]
  16. Wang, L.; Wang, W.; Li, B. CMFL: Mitigating Communication Overhead for Federated Learning. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–9 July 2019; pp. 954–964. [Google Scholar] [CrossRef]
  17. Cho, Y.J.; Wang, J.; Joshi, G. Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies. arXiv 2020, arXiv:2010.01243. [Google Scholar]
  18. Ren, J.; He, Y.; Wen, D.; Yu, G.; Huang, K.; Guo, D. Scheduling for Cellular Federated Edge Learning With Importance and Channel Awareness. IEEE Trans. Wirel. Commun. 2020, 19, 7690–7703. [Google Scholar] [CrossRef]
  19. Chen, M.; Yang, Z.; Saad, W.; Yin, C.; Poor, H.V.; Cui, S. A Joint Learning and Communications Framework for Federated Learning Over Wireless Networks. IEEE Trans. Wirel. Commun. 2021, 20, 269–283. [Google Scholar] [CrossRef]
  20. Xu, B.; Xia, W.; Zhang, J.; Quek, T.Q.S.; Zhu, H. Online Client Scheduling for Fast Federated Learning. IEEE Wirel. Commun. Lett. 2021, 10, 1434–1438. [Google Scholar] [CrossRef]
  21. Slivkins, A. Introduction to multi-armed bandits. Found. Trends Mach. Learn. 2019, 12, 1–286. [Google Scholar] [CrossRef]
  22. Luo, S.; Chen, X.; Wu, Q.; Zhou, Z.; Yu, S. HFEL: Joint Edge Association and Resource Allocation for Cost-Efficient Hierarchical Federated Edge Learning. IEEE Trans. Wirel. Commun. 2020, 19, 6535–6548. [Google Scholar] [CrossRef]
  23. Wen, W.; Chen, Z.; Yang, H.H.; Xia, W.; Quek, T.Q.S. Joint Scheduling and Resource Allocation for Hierarchical Federated Edge Learning. IEEE Trans. Wirel. Commun. 2022, 21, 5857–5872. [Google Scholar] [CrossRef]
  24. Zhao, T.; Li, F.; He, L. DRL-Based Joint Resource Allocation and Device Orchestration for Hierarchical Federated Learning in NOMA-Enabled Industrial IoT. IEEE Trans. Ind. Inform. 2023, 19, 7468–7479. [Google Scholar] [CrossRef]
  25. Saadat, H.; Allahham, M.S.; Abdellatif, A.A.; Erbad, A.; Mohamed, A. RL-Assisted Energy-Aware User-Edge Association for IoT-based Hierarchical Federated Learning. In Proceedings of the 2022 International Wireless Communications and Mobile Computing (IWCMC), Dubrovnik, Croatia, 30 May–3 June 2022; pp. 548–553. [Google Scholar] [CrossRef]
  26. Xu, B.; Xia, W.; Zhang, J.; Sun, X.; Zhu, H. Dynamic Client Association for Energy-Aware Hierarchical Federated Learning. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021; pp. 1–6. [Google Scholar] [CrossRef]
  27. Qu, Z.; Duan, R.; Chen, L.; Xu, J.; Lu, Z.; Liu, Y. Context-Aware Online Client Selection for Hierarchical Federated Learning. IEEE Trans. Parallel Distrib. Syst. 2021, 33, 4353–4367. [Google Scholar] [CrossRef]
  28. Liu, W.; Chen, L.; Zhang, W. Decentralized Federated Learning: Balancing Communication and Computing Costs. IEEE Trans. Signal Inf. Process. Over Netw. 2022, 8, 131–143. [Google Scholar] [CrossRef]
  29. Yan, Z.H.; Li, D. Adaptive Decentralized Federated Learning in Energy and Latency Constrained Wireless Networks. arXiv 2024, arXiv:2403.20075. [Google Scholar]
  30. Masmoudi, N.; Jaafar, W. OCD-FL: A Novel Communication-Efficient Peer Selection-based Decentralized Federated Learning. arXiv 2024, arXiv:2403.04037. [Google Scholar] [CrossRef]
  31. Xie, H.; Xia, M.; Wu, P.; Wang, S.; Huang, K. Decentralized Federated Learning With Asynchronous Parameter Sharing for Large-Scale IoT Networks. IEEE Internet Things J. 2024, 11, 34123–34139. [Google Scholar] [CrossRef]
  32. Abed, Q.A. Study the Performance of Transmission Control Protocol Versions in Several Domains. J. Mach. Comput. 2023, 3, 517–522. [Google Scholar] [CrossRef]
  33. Ye, H.; Liang, L.; Li, G.Y. Decentralized Federated Learning With Unreliable Communications. IEEE J. Sel. Top. Signal Process. 2022, 16, 487–500. [Google Scholar] [CrossRef]
  34. Garcia, N.M.; Gil, F.; Matos, B.; Yahaya, C.; Pombo, N.; Goleva, R.I. Keyed user datagram protocol: Concepts and operation of an almost reliable connectionless transport protocol. IEEE Access 2019, 7, 18951–18963. [Google Scholar] [CrossRef]
  35. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE Inst. Electr. Electron. Eng. 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  36. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
Figure 1. System framework.
Figure 2. Flow of the proposed algorithm.
Figure 3. An illustration of the context.
Figure 4. The performance comparison during the training process using different client selection methods on the MNIST dataset.
Figure 5. Comparison of training on the MNIST dataset using different client selection methods.
Figure 6. Comparison of training on the CIFAR-10 dataset using different client selection methods.
Figure 7. Cumulative delay versus number of clients.
Table 1. Simulation settings.
Size of the area: 500 m × 500 m
Noise power spectral density: −173 dBm/Hz
Uplink resource block bandwidth: 1 MHz
Transmit power of clients: 10 mW
Transmit power of edge servers: 1 W
Data size of model parameters: 5 × 10³
Computing capability: [10, 30] × 10⁶
Computational resource: 2 × 10¹¹
Number of active programs per client: [0, 10]
Maximum interval τ_max: 5 s
Batch size: 64
Learning rate (MNIST/CIFAR-10): 0.05/0.02

Share and Cite

Chen, Z.; Zhang, X.; Wang, S.; Wang, Y. MAB-Based Online Client Scheduling for Decentralized Federated Learning in the IoT. Entropy 2025, 27, 439. https://doi.org/10.3390/e27040439

