Ultra-Reliable and Low-Latency Wireless Hierarchical Federated Learning: Performance Analysis

Zhang, Haonan; Xu, Peng; Dai, Bin

doi:10.3390/e26100827


Ultra-Reliable and Low-Latency Wireless Hierarchical Federated Learning: Performance Analysis^†

by Haonan Zhang ^1,2,3, Peng Xu ^2,4 and Bin Dai ^1,2,3,*

¹ School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

² Chongqing Key Laboratory of Mobile Communications Technology, Chongqing 400065, China

³ Provincial Key Lab of Information Coding and Transmission, Southwest Jiaotong University, Chengdu 611756, China

⁴ School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

^* Author to whom correspondence should be addressed.

^† An earlier version of this paper was presented in part at the IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), New York, NY, USA, 17–20 May 2023 and in part at the IEEE International Conference on Communications Workshops (ICC Workshops), Roma, Italy, 28 May–1 June 2023.

Entropy 2024, 26(10), 827; https://doi.org/10.3390/e26100827

Submission received: 25 August 2024 / Revised: 24 September 2024 / Accepted: 27 September 2024 / Published: 29 September 2024

(This article belongs to the Special Issue Advanced New Physical Layer Technologies for Next-Generation Wireless Communications)


Abstract

Wireless hierarchical federated learning (WHFL) is an implementation of wireless federated learning (WFL) on a cloud–edge–client hierarchical architecture that accelerates model training and achieves more favorable trade-offs between communication and computation. However, due to the broadcast nature of wireless communication, WHFL is susceptible to eavesdropping during the training process. In addition, ultra-reliable and low-latency communication (URLLC) has recently received much attention since it serves as a critical communication service in current 5G and upcoming 6G networks, which motivates us to study URLLC-WHFL in the presence of physical layer security (PLS) issues. In this paper, we propose a secure finite block-length (FBL) approach for multi-antenna URLLC-WHFL and characterize the relationship between privacy, utility, and PLS of the proposed scheme. Simulation results show that when the eavesdropper’s CSI is perfectly known by the edge server, the proposed FBL approach almost achieves perfect secrecy without affecting learning performance, and that the scheme is robust to imperfect CSI of the eavesdropper’s channel. This paper provides a new method for URLLC-WHFL in the presence of PLS.

Keywords:

finite block-length coding; physical layer security; privacy-utility relationship; wireless federated learning

1. Introduction

Wireless federated learning (WFL), which allows the training of machine learning (ML) models on a large corpus of decentralized data stored on mobile devices [1], has attracted significant research interest. Current work on WFL primarily aims to enhance communication efficiency [2,3,4,5,6], improve privacy and security [7,8,9], balance privacy and utility [10,11,12,13,14], manage power control for wireless devices [15,16], and design effective beamforming strategies [17]. To make better use of the processing power of edge and cloud servers, a hierarchical federated learning (HFL) system involving clients, edge servers, and cloud servers was proposed in [18]. Compared to FL systems relying on a single server, HFL reduces the computational load [19,20,21], lowers user-to-cloud server communication costs [18,19,20,21,22,23], decreases FL processing time [18], and improves privacy and security in FL [22,23]. Specifically, the convergence of the HFL system was theoretically proved in [18], and joint user scheduling and wireless resource allocation were developed in [19,20,21] to improve both communication and energy efficiency. To enhance privacy in wireless hierarchical federated learning (WHFL), ref. [22] introduced a method based on local differential privacy (LDP) that injects artificial noise into the shared model parameters at two stages. Additionally, ref. [24] considered the influence of device mobility on the learning performance of WHFL systems.

The broadcast nature of wireless communication renders WFL susceptible to eavesdropping. As a result, WFL in the presence of physical layer security (PLS) is a significant issue. Unlike the privacy requirement of FL, under which the information leakage between users and servers need not be arbitrarily small because the accuracy of data analysis must be preserved, the PLS requirement demands that the information leakage to the eavesdropper vanish [25]. Current research on WFL in the presence of PLS [26,27,28,29] primarily focuses on enhancing data security through resource allocation and artificial jamming techniques. Specifically, ref. [26] optimizes the power control of drones to enhance the secrecy rate of the WFL system under constraints such as the WFL training time and the battery capacity of the drone. In [27], secrecy in WFL is achieved via cooperative jamming, in which users cooperatively provide jamming signals to counteract eavesdropping attempts. Ref. [28] proposed using conventional wireless devices to form a non-orthogonal multiple access (NOMA) transmission group with an edge device for secrecy-enhanced mobile edge computing, where the devices provide cooperative jamming against an eavesdropper while transmitting data to a cellular base station. In [29], a power allocation algorithm is proposed for WFL, where the transmit power is divided proportionally between the transmitted signal and artificial noise to maximize the secrecy rate while satisfying the model performance requirement. Apart from this, ref. [30] proposed a PLS measure while considering the privacy-utility constraints in WFL.

Very recently, ultra-reliable and low-latency communication (URLLC) has attracted significant attention, as it serves as a critical communication service in fifth-generation (5G) and sixth-generation (6G) cellular networks. One essential technology for URLLC is short-packet communication [31], which requires the coding block length to be finite, and finite block-length (FBL) coding [32] provides an effective tool for this scenario. Currently, the study of WFL combined with URLLC includes the design of a multi-level architecture to satisfy URLLC requirements [33], and the application of WFL in vehicular networks under URLLC constraints [34]. To the best of the authors’ knowledge, a practical FBL scheme for WFL remains unknown. It is therefore natural to ask: is there a practical FBL scheme for WHFL in the presence of PLS, and if so, what is the relationship between PLS, privacy, and utility in WHFL systems under URLLC requirements?

One possible solution to the aforementioned question is a channel feedback coding scheme. The study of channel feedback schemes started from [35], where an elegant feedback coding scheme called the Schalkwijk–Kailath (SK) scheme was proposed for the additive white Gaussian noise (AWGN) channel with noiseless feedback. In this scheme, the transmitter sends the original message only in the initial transmission. In each subsequent round, the receiver sends its estimate of the original message to the transmitter via a noiseless feedback channel. The transmitter then sends an amplified version of the estimation error to the receiver, and the receiver estimates this error using the minimum mean square error (MMSE) criterion. After a predetermined number of rounds, the receiver uses the minimum distance rule to decode the message. It was shown that the SK scheme [35] is capacity-achieving and that its decoding error probability decays doubly exponentially to zero as the coding block length increases, which indicates that the SK scheme requires only an extremely short coding block length to achieve a desired decoding error probability. Furthermore, ref. [36] showed that the SK scheme achieves perfect weak secrecy by itself, i.e., the SK scheme satisfies the PLS requirement by itself. Recently, ref. [37] showed that the SK scheme [35] is almost the optimal FBL scheme for the AWGN channel with feedback, which indicates that it may be a good choice for URLLC.
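The round structure described above can be illustrated with a minimal sketch of the classical SK iteration on a real-valued AWGN channel with noiseless feedback. The function name `sk_transmit`, the message mapping onto the interval (-1/2, 1/2), and the power normalization are illustrative assumptions in the usual textbook form, not the exact construction of [35] or of the scheme proposed in this paper.

```python
import math
import random

def sk_transmit(message, M, P, sigma2, rounds, rng):
    """Simulate one SK transmission of `message` in {0, ..., M-1}.

    Assumptions (illustrative): real AWGN channel with noise variance sigma2,
    noiseless feedback, transmit power P in every round.
    Returns the decoded message and the final error variance.
    """
    # Map the message to an equally spaced point theta in (-1/2, 1/2).
    theta = (message + 0.5) / M - 0.5
    scale = math.sqrt(12.0 * P)  # theta has variance of about 1/12
    noise_std = math.sqrt(sigma2)

    # Round 1: the transmitter sends the scaled message point.
    y = scale * theta + rng.gauss(0.0, noise_std)
    theta_hat = y / scale
    alpha = sigma2 / (12.0 * P)  # variance of the current estimation error

    for _ in range(rounds - 1):
        # Noiseless feedback lets the transmitter know the receiver's error.
        err = theta_hat - theta
        x = math.sqrt(P / alpha) * err  # amplified error, power P
        y = x + rng.gauss(0.0, noise_std)
        # Linear MMSE estimate of err from y, then refine the estimate.
        err_hat = math.sqrt(P * alpha) / (P + sigma2) * y
        theta_hat -= err_hat
        alpha *= sigma2 / (P + sigma2)  # error variance shrinks geometrically

    # Minimum-distance decoding onto the M message points.
    decoded = min(max(int(math.floor((theta_hat + 0.5) * M)), 0), M - 1)
    return decoded, alpha
```

In this sketch the error variance is multiplied by sigma2 / (P + sigma2) in every round, which is the mechanism behind the very fast decay of the decoding error probability with the block length.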

However, note that the application of the SK scheme to the wireless fading channel still has a long way to go since it is based on the assumption that the feedback channel is a noiseless channel. Apart from this, in wireless communication, the channel feedback is often utilized to transmit channel state information (CSI) back to the device for each uplink transmission [17], and this allows the device to adjust its transmission parameters based on the received feedback. Then it is natural to ask: Can we utilize the channel feedback not only for CSI transmission but also for designing an FBL approach for the multi-antenna URLLC-WHFL in the presence of PLS, i.e., is it possible to extend the classical SK scheme to the multi-antenna URLLC-WHFL in the presence of PLS?

In this paper, we answer the aforementioned questions by studying the WHFL in the presence of PLS. Figure 1 illustrates the collaborative training of a learning model by users, edge servers, and cloud servers. To preserve privacy, a local differential privacy (LDP) mechanism [38] is utilized by adding Gaussian noise to each user’s gradient before all gradients are aggregated at the edge servers. Furthermore, communication between each edge server and the cloud server takes place over a quasi-static fading duplex channel, which, due to the inherent broadcast characteristics of wireless communication, is eavesdropped by an external eavesdropper. Our primary objective is to ensure that the polluted gradient data retains a certain amount of utility while minimizing privacy leakage to the cloud server and protecting the gradient data transmitted from edge servers to the cloud server from eavesdropping. A straightforward way to achieve this goal is for the edge servers to securely encode the polluted data gradients as codewords and transmit them over the wireless duplex fading channels. The cloud server can then successfully decode the polluted data gradients, while the eavesdropper obtains no information about them. In this way, the PLS and the privacy of the data can be guaranteed simultaneously, since the real data gradients are protected by the LDP mechanism.

Our key contributions in this paper are summarized as follows:

We propose an FBL approach for multi-antenna WHFL in the presence of PLS. In this approach, the feedback link is utilized not only for CSI transmission but also to send the cloud server’s MMSE estimate of the transmitted polluted data gradient back to the edge server. The key idea of the proposed scheme is to apply the modulo-lattice operation (MLO) [39] to eliminate the impact of feedback channel noise on the performance of the SK scheme [35], and to extend the SK-type scheme to a two-dimensional setting that performs well over the SISO fading channel. Then, by applying pre-coding, beamforming, and singular value decomposition (SVD) techniques to this extended SISO scheme, the FBL coding scheme for the multi-antenna WHFL is obtained.
We derive the achievable secrecy rate of our proposed scheme and characterize the relationship between PLS, privacy, and utility of our scheme. Moreover, given fixed decoding error probability and coding block length, we establish lower and upper bounds on the LDP noise variance that ensure certain privacy, utility, and secrecy levels of PLS.

To obtain a better understanding of the contribution of this paper and the related works studied in the literature, the following Table 1 summarizes the study of WFL in the presence of privacy, utility, PLS, and URLLC in the literature.

The remainder of this paper is organized as follows. In Section 2, the definitions, system model, and main results are given. The FBL approach for the MIMO case is shown in Section 3. FBL approaches for the SIMO/MISO cases are proposed in Section 4. Simulation results are shown in Section 5. Section 6 summarizes all results in this paper and discusses future work.

2. Definitions, System Model and Main Results

2.1. WHFL System

Figure 1 illustrates a system composed of $K_{tot}$ users, $L$ edge servers indexed by $\ell$, and a cloud server. The disjoint user sets are denoted by $\{\mathcal{C}_{\ell}\}_{\ell=1}^{L}$, with $K = |\mathcal{C}_{\ell}|$ representing the number of users in edge server $\ell$. The distributed datasets are represented by $\{\mathcal{S}_{\ell,k}\}_{k=1}^{|\mathcal{C}_{\ell}|}$, where $S_{\ell,k} = |\mathcal{S}_{\ell,k}|$ is the size of $\mathcal{S}_{\ell,k}$. Each dataset $\mathcal{S}_{\ell,k}$ is defined as $\{(u_{k,j}, v_{k,j})\}_{j=1}^{|\mathcal{S}_{\ell,k}|}$, where $u_{k,j}$ represents the $j$-th input sample and $v_{k,j}$ is the corresponding label. $\mathcal{S}_{\ell}$ is the aggregated dataset of edge server $\ell$, and the gradients from each user are aggregated by their corresponding edge server. The global loss function $F(m)$ is defined as follows:

$$F(m) = \frac{1}{S} \sum_{\ell=1}^{L} \sum_{k=1}^{|\mathcal{C}_{\ell}|} S_{\ell,k} F_{\ell,k}(m), \tag{1}$$

where $m \in \mathbb{R}^{q}$ is the model vector and $S = \sum_{\ell} \sum_{k} S_{\ell,k}$. The local loss function is given by the following:

$$F_{\ell,k}(m) = \frac{1}{S_{\ell,k}} \sum_{(u_{k,j}, v_{k,j}) \in \mathcal{S}_{\ell,k}} f(m; u_{k,j}, v_{k,j}), \tag{2}$$

where $f(m; u_{k,j}, v_{k,j})$ represents the sample-wise loss function. The goal of model training is to minimize the global loss function, as follows:

$$m^{\star} = \arg\min_{m} F(m). \tag{3}$$

To achieve this, we employ a distributed gradient descent iterative algorithm. Specifically, in the $t$-th ($t \in \{1, 2, \dots, T\}$) communication round, the cloud server broadcasts the current global model vector $m_{t}$ to all users, and every user has perfect knowledge of $m_{t}$. Each user $k$ then computes its local gradient $\nabla F_{\ell,k}(m_{t})$ using its dataset $\mathcal{S}_{\ell,k}$ and the current model $m_{t}$. Once edge server $\ell$ receives all the noisy local gradients from its users, which have been perturbed by Gaussian noise for LDP, it computes an estimate of the partial gradient as follows:

$$\nabla F_{\ell}(m_{t}) = \frac{1}{S_{\ell}} \sum_{k \in \mathcal{C}_{\ell}} S_{\ell,k} \nabla F_{\ell,k}(m_{t}), \tag{4}$$

where $S_{\ell} = |\mathcal{S}_{\ell}|$ denotes the size of $\mathcal{S}_{\ell}$. Then, the cloud server aggregates the partial gradient estimates from all edge servers to compute the estimate $\hat{\nabla F}(m_{t})$ of the global gradient, as follows:

$$\hat{\nabla F}(m_{t}) = \frac{1}{S} \sum_{\ell=1}^{L} S_{\ell} \nabla F_{\ell}(m_{t}), \tag{5}$$

and updates the global model $m_{t+1}$ by the following:

$$m_{t+1} = m_{t} - \mu \hat{\nabla F}(m_{t}), \tag{6}$$

where $\mu$ denotes the learning rate.
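The two-level aggregation in (4)–(6) can be sketched as follows. This is a minimal illustration that treats the gradients as given arrays; the function names are hypothetical, and the LDP perturbation and the wireless links are deliberately left out.

```python
import numpy as np

def edge_aggregate(local_grads, sizes):
    """Size-weighted average of the user gradients at one edge server, as in (4)."""
    grads = np.asarray(local_grads, dtype=float)   # shape (num_users, q)
    sizes = np.asarray(sizes, dtype=float)         # dataset sizes S_{l,k}
    return sizes @ grads / sizes.sum()

def cloud_aggregate(edge_grads, edge_sizes):
    """Size-weighted average of the edge-level gradients at the cloud, as in (5)."""
    grads = np.asarray(edge_grads, dtype=float)    # shape (L, q)
    sizes = np.asarray(edge_sizes, dtype=float)    # aggregated sizes S_l
    return sizes @ grads / sizes.sum()

def global_update(m, global_grad, mu):
    """One gradient descent step on the global model, as in (6)."""
    return np.asarray(m, dtype=float) - mu * np.asarray(global_grad, dtype=float)
```

Because both levels weight by dataset size, the hierarchical result coincides with a flat size-weighted average over all users, which is why the edge layer does not change the gradient that the cloud applies.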

2.2. Model Formulation

An information-theoretic model of the WHFL system is shown in Figure 2. Without loss of generality, we adopt the following assumptions:

Assumption 1.

The communication of any individual edge server to the cloud server is not affected by other edge servers, and the downlink transmission from the cloud server to the edge servers is reliable [16]. Furthermore, we consider that an external eavesdropper targets the information transmitted during the uplink communication from the edge servers to the cloud server. Consequently, this paper primarily focuses on the PLS of the T rounds of uplink communication from one edge server to the cloud server.

Assumption 2.

The channels are quasi-static fading.

Assumption 3.

Following similar arguments in [3,9,16,17], we assume that the perfect CSI of the feedforward and feedback channels is known by both the cloud server and the edge server. Here note that this assumption is well-justified from a practical standpoint. For the feedforward channel, the channel training for estimating CSI at the cloud server can be achieved by transmitting pilot sequences from the edge servers, and the channel estimation is perfect when the length of the pilot sequences is sufficiently large [40]. On the other hand, when the cloud server transmits the perfectly estimated CSI to the edge server through the feedback channel, only a few feedback bits are required. By using a code with a low coding rate and high error-correcting capability, the probability of feedback error can be negligible [41] and, hence, the CSI of the feedforward channel is perfectly known by the transceiver. For the feedback channel, the perfect CSI sharing between transceivers can be realized in a similar way.

2.2.1. Privacy-Utility

In Figure 2, let $W_{t,k} = \sum_{j=1}^{S_{\ell,k}} \nabla f(m_{t}; u_{k,j}, v_{k,j}) = (W_{t,k,1}, \dots, W_{t,k,q})^{T} \in \mathbb{R}^{q}$ represent the overall local gradient vector for user $k$ ($k \in \{1, 2, \dots, K\}$) during the $t$-th ($t \in \{1, 2, \dots, T\}$) communication round, where $\nabla f(m_{t}; u_{k,j}, v_{k,j}) = (\nabla f_{1}(m_{t}; u_{k,j}, v_{k,j}), \dots, \nabla f_{q}(m_{t}; u_{k,j}, v_{k,j}))^{T}$ and $W_{t,k,i} = \sum_{j=1}^{S_{\ell,k}} \nabla f_{i}(m_{t}; u_{k,j}, v_{k,j})$ ($i \in \{1, 2, \dots, q\}$). Following [42], assume that $\nabla f(m_{t}; u_{k,j}, v_{k,j})$ is independent and identically distributed (i.i.d.) and $\nabla f(m_{t}; u_{k,j}, v_{k,j}) \sim \mathcal{N}(0, \sigma_{w,t}^{2} I)$, which indicates that $W_{t,k} \sim \mathcal{N}(0, S_{\ell,k} \sigma_{w,t}^{2} I)$. The i.i.d. generated local Gaussian noise $\eta_{t,k} = (\eta_{t,k,1}, \dots, \eta_{t,k,q})^{T}$ follows the distribution $\mathcal{N}(0, \sigma^{2} I)$ and is independent of $W_{t,k}$. The edge server aggregates the corrupted local gradient, which is defined as follows:

$$W_{t,k}' = W_{t,k} + \eta_{t,k}, \tag{7}$$

where $W_{t,k}' \sim \mathcal{N}(0, (S_{\ell,k} \sigma_{w,t}^{2} + \sigma^{2}) I)$. The overall local gradients and noise for the $t$-th round are $W_{t} = (W_{t,1}, \dots, W_{t,q})^{T}$ and $\eta_{t} = (\eta_{t,1}, \dots, \eta_{t,q})^{T}$, respectively, where $W_{t,i} = \sum_{k=1}^{K} W_{t,k,i}$, $\eta_{t,i} = \sum_{k=1}^{K} \eta_{t,k,i}$ and $i \in \{1, 2, \dots, q\}$. Consequently, from (7), the overall corrupted local gradients for the $t$-th round are $W_{t}' = (W_{t,1}', \dots, W_{t,q}')^{T}$, where $W_{t,i}' = \sum_{k=1}^{K} W_{t,k,i}'$ and $i \in \{1, 2, \dots, q\}$. Due to the fact that $W_{t,k}$ and $\eta_{t,k}$ are i.i.d. and independent, the overall corrupted gradients $W_{t}'$ are i.i.d. and distributed as $\mathcal{N}(0, (S_{\ell} \sigma_{w,t}^{2} + K \sigma^{2}) I)$, where $S_{\ell} = \sum_{k=1}^{K} S_{\ell,k}$.

Definition 1 (Mutual information privacy [43]). For every $t \in \{1, \dots, T\}$, if the mutual information $\frac{1}{q} I(W_{t}; W_{t}')$ during the $t$-th round is upper bounded by $\epsilon$, namely, $\max_{t \in \{1, \dots, T\}} \frac{1}{q} I(W_{t}; W_{t}') \leq \epsilon$, the LDP mechanism is said to satisfy $\epsilon$-mutual information privacy for $\epsilon > 0$.

Definition 2 (Utility [44]). The utility of $W_{t}'$ is defined by the distortion between $W_{t}$ and $W_{t}'$, and in this paper, we consider the quadratic distortion $d(W_{t}, W_{t}') = \|W_{t}' - W_{t}\|^{2}$. If $\frac{1}{qT} \sum_{t=1}^{T} E(d(W_{t}, W_{t}')) \leq \upsilon$, the utility of $W_{t}'$ is determined by $\upsilon$, where the utility and the distortion have an inverse relationship with each other, i.e., smaller $\upsilon$ corresponds to larger utility.
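Under the Gaussian model of Section 2.2.1, both quantities in Definitions 1 and 2 have closed forms: each coordinate of $W_{t}$ has variance $S_{\ell} \sigma_{w,t}^{2}$ and each coordinate of the aggregated noise has variance $K \sigma^{2}$, so $\frac{1}{q} I(W_{t}; W_{t}') = \frac{1}{2} \log(1 + \frac{S_{\ell} \sigma_{w,t}^{2}}{K \sigma^{2}})$ and $\frac{1}{q} E(d(W_{t}, W_{t}')) = K \sigma^{2}$. The sketch below evaluates these expressions and the resulting limits on $\sigma^{2}$ for a single round. The function names are hypothetical, and the logarithm is assumed to be base 2, in line with the $2^{2\epsilon}$ factor that appears in the main results.

```python
import math

def mi_per_dim(S_l, sigma_w2, K, sigma2):
    """Per-coordinate mutual information (bits) between W_t and W_t'."""
    return 0.5 * math.log2(1.0 + S_l * sigma_w2 / (K * sigma2))

def distortion_per_dim(K, sigma2):
    """Per-coordinate expected quadratic distortion between W_t and W_t'."""
    return K * sigma2

def min_sigma2_for_privacy(S_l, sigma_w2, K, eps):
    """Smallest LDP noise variance with mi_per_dim <= eps (eps > 0)."""
    return S_l * sigma_w2 / (K * (2.0 ** (2.0 * eps) - 1.0))

def max_sigma2_for_utility(K, upsilon):
    """Largest LDP noise variance with distortion_per_dim <= upsilon."""
    return upsilon / K
```

A noise variance between the two limits satisfies both definitions for that round; if the lower limit exceeds the upper one, the chosen $\epsilon$ and $\upsilon$ cannot be met simultaneously.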

2.2.2. Gradient Compression

We employ lossy Gaussian source coding characterized by a quadratic distortion metric, defined as $d(W_{t}', \hat{W}_{t}') = \|W_{t}' - \hat{W}_{t}'\|^{2}$ (the source encoder and decoder are respectively located at the edge server and cloud server), where $\hat{W}_{t}'$ is the output of the source decoder at the cloud server. Following [45] (Chapter 3.8, pp. 64–65), the edge server’s source encoder maps $W_{t}'$ to $\{1, 2, \dots, 2^{q R_{t}(D)}\}$ and compresses $W_{t}'$ into an index $W_{t}''$ that is uniformly distributed over $\mathcal{W}_{t}'' = \{1, 2, \dots, 2^{q R_{t}(D)}\}$. The rate-distortion function $R_{t}(D)$ is defined as follows:

$$R_{t}(D) = \begin{cases} \frac{1}{2} \log \frac{K \sigma^{2} + S_{\ell} \sigma_{w,t}^{2}}{D}, & 0 \leq D < K \sigma^{2} + S_{\ell} \sigma_{w,t}^{2}, \\ 0, & D \geq K \sigma^{2} + S_{\ell} \sigma_{w,t}^{2}, \end{cases} \tag{8}$$

where $\frac{1}{qT} \sum_{t=1}^{T} E(d(W_{t}', \hat{W}_{t}')) \leq D$. For the cloud server’s source decoder, the decoding mapping transforms the indices $\{1, 2, \dots, 2^{q R_{t}(D)}\}$ into $\hat{W}_{t}'$. Here note that when $R_{t}(D) = 0$, no message is transmitted, and $\hat{W}_{t}'$ is set to 0.
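As a small numerical companion to (8), the following sketch evaluates the rate-distortion function and the size of the resulting index set. The function names are hypothetical and the logarithm is assumed to be base 2, so that $2^{q R_{t}(D)}$ counts indices.

```python
import math

def gaussian_rate_distortion(D, K, sigma2, S_l, sigma_w2):
    """Rate-distortion function R_t(D) of (8), in bits per gradient coordinate."""
    if D < 0:
        raise ValueError("distortion D must be non-negative")
    source_var = K * sigma2 + S_l * sigma_w2   # variance of each coordinate of W_t'
    if D >= source_var:
        return 0.0                             # nothing needs to be transmitted
    if D == 0:
        return math.inf                        # lossless description of a Gaussian
    return 0.5 * math.log2(source_var / D)

def index_set_size(D, K, sigma2, S_l, sigma_w2, q):
    """Number of source-coding indices 2^{q R_t(D)} for a q-dimensional gradient."""
    return 2.0 ** (q * gaussian_rate_distortion(D, K, sigma2, S_l, sigma_w2))
```

For example, a coordinate variance of 2 and a target distortion of 0.5 give one bit per coordinate, so an 8-dimensional gradient maps to 256 indices.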

2.2.3. Communication Model

At the $t$-th round, the channel input-output relationships are expressed as follows:

$$Y_{i}(t) = h X_{i}(t) + \eta_{1,i}(t), \quad 1 \leq i \leq N_{t}, \tag{9}$$

$$\tilde{Y}_{i}(t) = \tilde{h} \tilde{X}_{i}(t) + \eta_{2,i}(t), \quad 1 \leq i \leq N_{t} - 1, \tag{10}$$

$$Z_{i}(t) = g X_{i}(t) + \tilde{g} \tilde{X}_{i}(t) + \eta_{e,i}(t), \quad 1 \leq i \leq N_{t}, \tag{11}$$

where the input and output of the feedforward channel are denoted by $X_{i}(t)$ and $Y_{i}(t)$, respectively, the feedback channel’s input and output are $\tilde{X}_{i}(t)$ and $\tilde{Y}_{i}(t)$, respectively, and the eavesdropping channel’s output is $Z_{i}(t)$. Note that $X_{i}(t), \tilde{Y}_{i}(t) \in \mathbb{C}^{A \times 1}$, $\tilde{X}_{i}(t), Y_{i}(t) \in \mathbb{C}^{B \times 1}$ and $Z_{i}(t) \in \mathbb{C}^{C \times 1}$. The average power constraint for the input $X_{i}(t)$ of the edge server is $\frac{1}{N_{t}} \sum_{i=1}^{N_{t}} E[X_{i}^{H}(t) X_{i}(t)] \leq P$, and the input $\tilde{X}_{i}(t)$ of the cloud server is constrained by $\frac{1}{N_{t} - 1} \sum_{i=1}^{N_{t} - 1} E[\tilde{X}_{i}^{H}(t) \tilde{X}_{i}(t)] \leq \tilde{P}$. The matrices $h \in \mathbb{C}^{B \times A}$, $\tilde{h} \in \mathbb{C}^{A \times B}$, $g \in \mathbb{C}^{C \times A}$, and $\tilde{g} \in \mathbb{C}^{C \times B}$ represent the CSI of the feedforward, feedback, and eavesdropping channels, respectively. The elements of the channel noises $\eta_{e,i}(t) \in \mathbb{C}^{C \times 1}$, $\eta_{2,i}(t) \in \mathbb{C}^{A \times 1}$ and $\eta_{1,i}(t) \in \mathbb{C}^{B \times 1}$ are i.i.d. and distributed as $\mathcal{CN}(0, \sigma_{e}^{2})$, $\mathcal{CN}(0, \sigma_{2}^{2})$ and $\mathcal{CN}(0, \sigma_{1}^{2})$, respectively.

The input message $W_{t}''$ of the edge server is uniformly drawn from the set $\mathcal{W}_{t}''$, and it is encoded as a codeword of length $N_{t}$. Furthermore, the input of the edge server is defined as $X_{i}(t) = f_{t,i}(W_{t}'', h, \tilde{h}, \tilde{Y}_{1}^{i-1}(t))$, where $f_{t,i}(\cdot)$ is an encoding function and $\tilde{Y}_{1}^{i-1}(t) = (\tilde{Y}_{1}(t), \dots, \tilde{Y}_{i-1}(t))$. The cloud server estimates the message as $\hat{w}_{t}'' = \varphi(h, \tilde{h}, Y^{N_{t}})$ using the decoding function $\varphi$. The input of the cloud server is defined as $\tilde{X}_{i}(t) = \tilde{f}_{t,i}(h, Y_{1}^{i}(t), \tilde{h})$, where $\tilde{f}_{t,i}(\cdot)$ is an encoding function and $Y_{1}^{i}(t) = (Y_{1}(t), \dots, Y_{i}(t))$. The average decoding error probability $P_{e,t}$ is given by the following:

$$P_{e,t} = \frac{1}{|\mathcal{W}_{t}''|} \sum_{w_{t}'' \in \mathcal{W}_{t}''} \Pr\{\varphi(h, \tilde{h}, Y^{N_{t}}) \neq w_{t}'' \mid w_{t}'' \text{ sent}\}. \tag{12}$$

Definition 3.

According to [46,47], the CSIs $g$ and $\tilde{g}$ of the eavesdropping channels are defined as follows:

$$g = \hat{g} + \Delta g, \quad \|\Delta g\|_{F} \leq \omega, \qquad \tilde{g} = \hat{\tilde{g}} + \Delta \tilde{g}, \quad \|\Delta \tilde{g}\|_{F} \leq \tilde{\omega}, \tag{13}$$

where $\hat{g} \in \mathbb{C}^{C \times A}$ and $\hat{\tilde{g}} \in \mathbb{C}^{C \times B}$ are the estimated CSI of $g$ and $\tilde{g}$, respectively. $\Delta g$ and $\Delta \tilde{g}$ represent the legal parties’ estimation errors about the perfect CSI of the eavesdropper’s channel, and these errors are respectively bounded by parameters $\omega > 0$ and $\tilde{\omega} > 0$. Here note that $\Delta g = \Delta \tilde{g} = 0$ corresponds to the situation that the legal parties obtain perfect CSI of the eavesdropper’s channel.

Definition 4.

The secrecy level of PLS [48] (the normalized uncertainty of the eavesdropper) is given by

$$\Delta = \frac{H(W_{1}'', \dots, W_{T}'' \mid Z^{N_{1}}, \dots, Z^{N_{T}}, h, \tilde{h}, g, \tilde{g})}{H(W_{1}'', \dots, W_{T}'')}, \quad 0 \leq \Delta \leq 1. \tag{14}$$

A transmission rate $R$ is said to be $(\tau, N, \delta, D, \upsilon, \epsilon)$ achievable if, for given decoding error probability $\tau$, block length $N$ ($N = \sum_{t=1}^{T} N_{t}$), secrecy level $\delta$, $\frac{1}{qT} \sum_{t=1}^{T} E(d(W_{t}', \hat{W}_{t}')) \leq D$, $\max_{t \in \{1, \dots, T\}} \frac{1}{q} I(W_{t}; W_{t}') \leq \epsilon$ and $\frac{1}{qT} \sum_{t=1}^{T} E(d(W_{t}, W_{t}')) \leq \upsilon$, there exists a channel code described above such that we have the following:

$$\frac{H(W_{1}'', \dots, W_{T}'')}{N} = R, \quad \frac{1}{T} \sum_{t=1}^{T} P_{e,t} \leq \tau, \quad \Delta \geq \delta, \tag{15}$$

where $\delta \in [0, 1]$, and $\delta = 1$ represents perfect secrecy. For the WHFL in SISO/SIMO/MISO/MIMO cases, the achievable secrecy transmission rates are respectively denoted by $R_{\mathrm{siso/simo/miso/mimo}}(\tau, N, \delta, D, \upsilon, \epsilon)$, the channel gains are respectively defined by $h_{\mathrm{siso}}, \tilde{h}_{\mathrm{siso}}, g_{\mathrm{siso}}, \tilde{g}_{\mathrm{siso}}$, $h_{\mathrm{simo}}, \tilde{h}_{\mathrm{simo}}, g_{\mathrm{simo}}, \tilde{g}_{\mathrm{simo}}$, $h_{\mathrm{miso}}, \tilde{h}_{\mathrm{miso}}, g_{\mathrm{miso}}, \tilde{g}_{\mathrm{miso}}$, $h_{\mathrm{mimo}}, \tilde{h}_{\mathrm{mimo}}, g_{\mathrm{mimo}}, \tilde{g}_{\mathrm{mimo}}$, and the CSI estimation errors are defined by $\Delta g_{\mathrm{siso}}$, $\Delta \tilde{g}_{\mathrm{siso}}$, $\Delta g_{\mathrm{simo}}$, $\Delta \tilde{g}_{\mathrm{simo}}$, $\Delta g_{\mathrm{miso}}$, $\Delta \tilde{g}_{\mathrm{miso}}$, $\Delta g_{\mathrm{mimo}}$ and $\Delta \tilde{g}_{\mathrm{mimo}}$.

2.3. Main Results

Theorem 1.

For the MIMO WHFL with $K$ users and $T$ iterations, given $N$, $\tau$, $\upsilon$, $D$, $\epsilon$, $\delta$, and applying the FBL approach in Section 3, the relationship between PLS, privacy, utility, and the noise variance of LDP is characterized by the following:

$$\max\Bigg\{ \underbrace{\max_{\substack{t \in \{1, \dots, T\}, \\ \Delta g_{\mathrm{mimo}}}} \left[ \frac{D \cdot 2^{\frac{2}{q(1 - \delta)} \log\det\left(I + \frac{\hat{g}_{\mathrm{mimo}} K_{x_{1}} \hat{g}_{\mathrm{mimo}}^{H}}{\sigma_{e}^{2}}\right)} - S_{\ell} \sigma_{w,t}^{2}}{K} \right]}_{\text{Secrecy level of PLS}}, \ \underbrace{\max_{t \in \{1, \dots, T\}} \left[ \frac{S_{\ell} \sigma_{w,t}^{2}}{K(2^{2\epsilon} - 1)} \right]}_{\text{Privacy term}} \Bigg\} \leq \underbrace{\sigma^{2}}_{\text{LDP noise variance}} \leq \underbrace{\frac{\upsilon}{K}}_{\text{Utility term}}, \tag{16}$$

where $\hat{g}_{\mathrm{mimo}} = g_{\mathrm{mimo}} - \Delta g_{\mathrm{mimo}}$ and $K_{x_{1}} = E(X_{1}(t) X_{1}^{H}(t))$. In addition, an achievable transmission rate $R_{\mathrm{mimo}}(\tau, N, \delta, D, \upsilon, \epsilon)$ of our proposed FBL approach is given by the following:

$$R_{\mathrm{mimo}}(\tau, N, \delta, D, \upsilon, \epsilon) = \frac{\sum_{t=1}^{T} N_{t} R_{t}}{N}, \tag{17}$$

where

$$N = \sum_{t=1}^{T} N_{t}, \quad R_{t} = \max_{\substack{\sum_{j=1}^{J} P_{j} = P \\ \sum_{j=1}^{J} \tilde{P}_{j} = \tilde{P}}} \sum_{j=1}^{J} \frac{1}{N_{t}} \log\left( \frac{3\, \mathrm{SNR}_{j} d_{j}^{2}}{\left[Q^{-1}\left(\frac{\tau}{8J}\right)\right]^{2}} \left(1 + \frac{\mathrm{SNR}_{j} d_{j}^{2}}{\Psi_{1} \Psi_{2}}\right)^{N_{t} - 1} \right), \tag{18}$$

$$\Psi_{1} = 1 + \xi \frac{d_{j}^{2}\, \mathrm{SNR}_{j}}{\tilde{d}_{j}^{2}\, \widetilde{\mathrm{SNR}}_{j}}, \quad \Psi_{2} = \left(1 - \frac{\xi}{\tilde{d}_{j}^{2}\, \widetilde{\mathrm{SNR}}_{j}}\right)^{-1}, \quad \xi = \frac{1}{3} \left[Q^{-1}\left(\frac{\tau}{8J(N_{t} - 1)}\right)\right]^{2}, \tag{19}$$

and $\mathrm{SNR}_{j} = \frac{P_{j}}{\sigma_{1}^{2}}$, $\widetilde{\mathrm{SNR}}_{j} = \frac{\tilde{P}_{j}}{\sigma_{2}^{2}}$, while $d_{j}$, $\tilde{d}_{j}$, $P_{j}$, and $\tilde{P}_{j}$ are defined in Section 3.
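To make the two-sided constraint (16) concrete, the following sketch evaluates its lower and upper limits on $\sigma^{2}$ for one round $t$ and one estimate $\hat{g}_{\mathrm{mimo}}$; the outer maximization over $t$ and $\Delta g_{\mathrm{mimo}}$ is left to the caller. The function name and argument layout are hypothetical, and the logarithm is assumed to be base 2.

```python
import numpy as np

def ldp_noise_interval(g_hat, K_x1, sigma_e2, D, q, delta,
                       S_l, sigma_w2, K, eps, upsilon):
    """Return (lower, upper) limits on the LDP noise variance from (16).

    Evaluated for a single round and a single eavesdropper CSI estimate.
    Assumption: log in (16) is base 2; delta must be strictly below 1.
    """
    if delta >= 1.0:
        raise ValueError("the secrecy term of (16) is undefined for delta >= 1")
    g_hat = np.asarray(g_hat, dtype=complex)
    K_x1 = np.asarray(K_x1, dtype=complex)
    C = g_hat.shape[0]
    M = np.eye(C) + g_hat @ K_x1 @ g_hat.conj().T / sigma_e2
    _, logdet_nat = np.linalg.slogdet(M)        # natural log of |det M|
    logdet_bits = logdet_nat / np.log(2.0)
    secrecy = (D * 2.0 ** (2.0 * logdet_bits / (q * (1.0 - delta)))
               - S_l * sigma_w2) / K
    privacy = S_l * sigma_w2 / (K * (2.0 ** (2.0 * eps) - 1.0))
    return max(secrecy, privacy), upsilon / K
```

If the returned lower limit exceeds the upper limit, no LDP noise variance meets the requested secrecy, privacy, and utility levels at the same time for these parameters.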

Proof of Theorem 1.

Our FBL approach for the MIMO WHFL is an extension of the classical SK scheme for the AWGN channel with noiseless feedback. The key to this extension is composed of three parts:

The two-dimensional message mapping method, which maps the message to a complex codeword transmitted over the fading channels.
An SVD-based pre-coding strategy that divides the MIMO channel into several parallel SISO channels.
The two-dimensional modulo-lattice operation (MLO) that eliminates the impact of feedback channel noise on the performance of the SK scheme.

Details about the above tools and how to combine these tools to show our FBL approach for the MIMO WHFL are given in the next section, and the formal proof of Theorem 1 is in Appendix A. □
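The second ingredient, splitting a MIMO channel into parallel SISO channels via the SVD, can be checked numerically. The sketch below uses the standard construction (precode with the right singular vectors, combine with the left singular vectors); the function names are hypothetical, and the exact pre-coding used by the proposed scheme is the one given in Section 3.

```python
import numpy as np

def svd_parallel_channels(h):
    """Factor h = U diag(d) V^H and return (U, d, V) with orthonormal columns."""
    U, d, Vh = np.linalg.svd(np.asarray(h, dtype=complex), full_matrices=False)
    return U, d, Vh.conj().T

def precode(V, s):
    """Map J parallel symbols s to the A-dimensional channel input X = V s."""
    return V @ s

def combine(U, y):
    """Project the B-dimensional channel output onto the J parallel streams."""
    return U.conj().T @ y

# Noiseless demonstration with A = 3 transmit and B = 2 receive antennas.
rng = np.random.default_rng(0)
h = (rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))) / np.sqrt(2)
U, d, V = svd_parallel_channels(h)
s = rng.standard_normal(len(d)) + 1j * rng.standard_normal(len(d))
r = combine(U, h @ precode(V, s))   # each stream j sees the scalar gain d[j]
```

Since $V$ has orthonormal columns, pre-coding does not change the transmit power, and stream $j$ experiences the scalar gain $d_{j}$, which is how per-stream quantities such as $\mathrm{SNR}_{j} d_{j}^{2}$ arise in (18).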

Remark 1.

Here note that in the FBL approach for the MIMO WHFL, we apply an SVD-based pre-coding strategy to divide the MIMO channel into several parallel SISO channels, which indicates that the FBL approach for the SISO WHFL can be directly obtained since it is a special case of the approach for the MIMO WHFL. The following Corollary 1 proposes an FBL approach for the SISO WHFL and characterizes the relationship between PLS, privacy, utility, and the noise variance of LDP. Since Corollary 1 can be directly obtained from Theorem 1, we omit the detailed proof here.

Corollary 1.

For the SISO WHFL with $K$ users and $T$ iterations, given $N$, $\tau$, $\upsilon$, $D$, $\epsilon$, $\delta$, and using a similar FBL approach to that of Theorem 1, the relationship between PLS, privacy, utility, and the noise variance of LDP is characterized by the following:

$$\max\Bigg\{ \underbrace{\max_{\substack{t \in \{1, \dots, T\}, \\ \Delta g_{\mathrm{siso}}}} \left[ \frac{D \cdot 2^{\frac{2}{q(1 - \delta)} \log\left(1 + \frac{|\hat{g}_{\mathrm{siso}}|^{2} P}{\sigma_{e}^{2}}\right)} - S_{\ell} \sigma_{w,t}^{2}}{K} \right]}_{\text{Secrecy level of PLS}}, \ \underbrace{\max_{t \in \{1, \dots, T\}} \left[ \frac{S_{\ell} \sigma_{w,t}^{2}}{K(2^{2\epsilon} - 1)} \right]}_{\text{Privacy term}} \Bigg\} \leq \underbrace{\sigma^{2}}_{\text{LDP noise variance}} \leq \underbrace{\frac{\upsilon}{K}}_{\text{Utility term}}, \tag{20}$$

where $\hat{g}_{\mathrm{siso}} = g_{\mathrm{siso}} - \Delta g_{\mathrm{siso}}$. Furthermore, an achievable transmission rate $R_{\mathrm{siso}}(\tau, N, \delta, D, \upsilon, \epsilon)$ of our proposed FBL approach is given by the following:

$$R_{\mathrm{siso}}(\tau, N, \delta, D, \upsilon, \epsilon) = \frac{\sum_{t=1}^{T} N_{t} R_{t}}{N}, \quad R_{t} = \frac{1}{N_{t}} \log\left( \frac{3\, \mathrm{SNR}\, |h_{\mathrm{siso}}|^{2}}{\left[Q^{-1}\left(\frac{\tau}{8}\right)\right]^{2}} \left(1 + \frac{\mathrm{SNR}\, |h_{\mathrm{siso}}|^{2}}{\Psi_{3} \Psi_{4}}\right)^{N_{t} - 1} \right), \tag{21}$$

where $N = \sum_{t=1}^{T} N_{t}$, $\Psi_{3} = 1 + \xi^{\star} \frac{|h_{\mathrm{siso}}|^{2}\, \mathrm{SNR}}{|\tilde{h}_{\mathrm{siso}}|^{2}\, \widetilde{\mathrm{SNR}}}$, $\Psi_{4} = \left(1 - \frac{\xi^{\star}}{|\tilde{h}_{\mathrm{siso}}|^{2}\, \widetilde{\mathrm{SNR}}}\right)^{-1}$, $\xi^{\star} = \frac{1}{3} \left[Q^{-1}\left(\frac{\tau}{8(N_{t} - 1)}\right)\right]^{2}$, $\mathrm{SNR} = \frac{P}{\sigma_{1}^{2}}$, $\widetilde{\mathrm{SNR}} = \frac{\tilde{P}}{\sigma_{2}^{2}}$, and $|h_{\mathrm{siso}}|$, $|\tilde{h}_{\mathrm{siso}}|$, $|\hat{g}_{\mathrm{siso}}|$ represent the modulus of $h_{\mathrm{siso}}$, $\tilde{h}_{\mathrm{siso}}$ and $\hat{g}_{\mathrm{siso}}$, respectively.
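The per-round rate $R_{t}$ in (21) is straightforward to evaluate numerically. The sketch below does so with the Python standard library only; the function names are hypothetical, the logarithm is assumed to be base 2, and $Q^{-1}$ is taken to be the inverse of the Gaussian tail function.

```python
import math
from statistics import NormalDist

def q_inv(p):
    """Inverse Gaussian Q-function: the x with P(N(0,1) > x) = p, for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def siso_rate_per_round(P, P_fb, sigma1_2, sigma2_2, h_abs2, h_fb_abs2, tau, N_t):
    """Per-round rate R_t of (21) in bits per channel use (requires N_t >= 2).

    h_abs2 and h_fb_abs2 are the squared moduli of the feedforward and
    feedback channel gains; P_fb is the feedback power.
    """
    snr = P / sigma1_2
    snr_fb = P_fb / sigma2_2
    xi = q_inv(tau / (8.0 * (N_t - 1))) ** 2 / 3.0
    fb_gain = h_fb_abs2 * snr_fb
    if xi >= fb_gain:
        raise ValueError("feedback SNR too low: Psi_4 in (21) is not positive")
    psi3 = 1.0 + xi * h_abs2 * snr / fb_gain
    psi4 = 1.0 / (1.0 - xi / fb_gain)
    first = math.log2(3.0 * snr * h_abs2 / q_inv(tau / 8.0) ** 2)
    rest = (N_t - 1) * math.log2(1.0 + snr * h_abs2 / (psi3 * psi4))
    return (first + rest) / N_t
```

As the feedback SNR grows, $\Psi_{3}$ and $\Psi_{4}$ both tend to 1, so the penalty caused by the noisy feedback link disappears and the rate increases.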

Theorem 2.

For the SIMO WHFL with K users and T iterations, given N, τ, υ, D, ϵ, δ, and using the FBL approach in Section 4, the relationship between PLS, privacy, utility, and the noise variance of LDP is characterized by the following:

\begin{matrix} max \{\underset{Secrecy level of PLS}{\underset{︸}{max_{\begin{matrix} t \in {1, \dots, T}, \\ Δ g_{s i m o} \end{matrix}} [\frac{D \cdot 2^{\frac{2}{q (1 - δ)} log (1 + \frac{| | {\hat{g}}_{s i m o} {| |}^{2} P}{σ_{e}^{2}})} - S_{ℓ} σ_{w, t}^{2}}{K}]}}, \underset{Privacy term}{\underset{︸}{max_{t \in {1, \dots, T}} [\frac{S_{ℓ} σ_{w, t}^{2}}{K (2^{2 ϵ} - 1)}]}}\} \leq \underset{\begin{matrix} LDP noise variance \end{matrix}}{\underset{︸}{σ^{2}}} \leq \underset{Utility term}{\underset{︸}{\frac{υ}{K}}}, \end{matrix}

(22)

where

{\hat{g}}_{s i m o} = g_{s i m o} - Δ g_{s i m o}

. In addition, an achievable transmission rate

R_{s i m o} (τ, N, δ, D, υ, ϵ)

of our proposed FBL approach is given by the following:

\begin{matrix} R_{s i m o} (τ, N, δ, D, υ, ϵ) = \frac{\sum_{t = 1}^{T} N_{t} R_{t}}{N}, \quad R_{t} = \frac{1}{N_{t}} \log (\frac{3 SNR \| h_{s i m o} \|^{2}}{{[Q^{- 1} (\frac{τ}{8})]}^{2}} {(1 + \frac{SNR \| h_{s i m o} \|^{2}}{Ψ_{5} Ψ_{6}})}^{N_{t} - 1}), \end{matrix}

(23)

where

Ψ_{5} = 1 + ξ^{★} \frac{\| h_{s i m o} \|^{2} SNR}{\| {\tilde{h}}_{s i m o} \|^{2} \tilde{SNR}}

,

Ψ_{6} = {(1 - \frac{ξ^{★}}{\| {\tilde{h}}_{s i m o} \|^{2} \tilde{SNR}})}^{- 1}

, SNR,

\tilde{SNR}

, N, and

ξ^{★}

are given in Corollary 1.

Proof of Theorem 2.

The approaches in Theorems 1 and 2 differ in that, for the SIMO case, we use a beamforming strategy together with a new pre-coding strategy instead of the SVD-based pre-coding strategy used for the MIMO case. Here, the beamforming and new pre-coding strategies transform the feedforward and feedback channels, respectively, into SISO channels. Then, along the lines of the encoding-decoding procedure in Section 3.1.3, the FBL approach for the SIMO WHFL is obtained; the details of this approach are given in Section 4. Finally, since the proof of Theorem 2 is included in that of Theorem 1, we omit the formal proof here. □

Remark 2.

Note that in the SIMO WHFL, a beamforming strategy transforms the SIMO feedforward channel into a SISO feedforward channel, while a new pre-coding strategy transforms the MISO feedback channel into a SISO feedback channel. Analogously, for the MISO WHFL, we first apply the pre-coding strategy of the SIMO WHFL to transform the MISO feedforward channel into a SISO feedforward channel, and the beamforming strategy of the SIMO WHFL to transform the SIMO feedback channel into a SISO feedback channel. Then, along the lines of the encoding-decoding procedure in Theorem 2, the following Corollary 2 for the MISO WHFL is obtained. As the proof is similar to that of Theorem 2, the details are omitted here.

Corollary 2.

For the MISO WHFL with K users and T iterations, given N, τ, υ, D, ϵ and δ, and using a similar FBL approach to that of Theorem 2, the relationship between PLS, privacy, utility, and the noise variance of LDP is characterized by the following:

\begin{matrix} \max \{\underbrace{\max_{\substack{t \in \{1, \dots, T\}, \\ Δ g_{m i s o}}} [\frac{D \cdot 2^{\frac{2}{q (1 - δ)} \log (1 + \frac{\| {\hat{g}}_{m i s o} \|^{2} P}{σ_{e}^{2}})} - S_{ℓ} σ_{w, t}^{2}}{K}]}_{\text{Secrecy level of PLS}}, \underbrace{\max_{t \in \{1, \dots, T\}} [\frac{S_{ℓ} σ_{w, t}^{2}}{K (2^{2 ϵ} - 1)}]}_{\text{Privacy term}}\} \leq \underbrace{σ^{2}}_{\text{LDP noise variance}} \leq \underbrace{\frac{υ}{K}}_{\text{Utility term}}, \end{matrix}

(24)

where

{\hat{g}}_{m i s o} = g_{m i s o} - Δ g_{m i s o}

. In addition, an achievable transmission rate

R_{m i s o} (τ, N, δ, D, υ, ϵ)

of our proposed FBL approach is given by the following:

\begin{matrix} R_{m i s o} (τ, N, δ, D, υ, ϵ) = \frac{\sum_{t = 1}^{T} N_{t} R_{t}}{N}, \quad R_{t} = \frac{1}{N_{t}} \log (\frac{3 SNR \| h_{m i s o} \|^{2}}{{[Q^{- 1} (\frac{τ}{8})]}^{2}} {(1 + \frac{SNR \| h_{m i s o} \|^{2}}{Ψ_{7} Ψ_{8}})}^{N_{t} - 1}), \end{matrix}

(25)

where

Ψ_{7} = 1 + ξ^{★} \frac{\| h_{m i s o} \|^{2} SNR}{\| {\tilde{h}}_{m i s o} \|^{2} \tilde{SNR}}

,

Ψ_{8} = {(1 - \frac{ξ^{★}}{\| {\tilde{h}}_{m i s o} \|^{2} \tilde{SNR}})}^{- 1}

, SNR,

\tilde{SNR}

, N and

ξ^{★}

are given in Corollary 1.

3. An FBL Approach for the MIMO WHFL

For the WHFL in the MIMO case, (9)–(11) can be re-written as follows:

\begin{matrix} Y_{i} (t) = h_{m i m o} X_{i} (t) + η_{1, i} (t), 1 \leq i \leq N_{t}, \end{matrix}

(26)

\begin{matrix} {\tilde{Y}}_{i} (t) = {\tilde{h}}_{m i m o} {\tilde{X}}_{i} (t) + η_{2, i} (t), 1 \leq i \leq N_{t} - 1, \end{matrix}

(27)

\begin{matrix} Z_{i} (t) = g_{m i m o} X_{i} (t) + {\tilde{g}}_{m i m o} {\tilde{X}}_{i} (t) + η_{e, i} (t), 1 \leq i \leq N_{t}, \end{matrix}

(28)

where

h_{m i m o} \in C^{B \times A}

,

{\tilde{h}}_{m i m o} \in C^{A \times B}

,

g_{m i m o} \in C^{C \times A}

,

{\tilde{g}}_{m i m o} \in C^{C \times B}

,

X_{i} (t) \in C^{A \times 1}

,

{\tilde{X}}_{i} (t) \in C^{B \times 1}

, the elements of

η_{1, i} (t) \in C^{B \times 1}

,

η_{2, i} (t) \in C^{A \times 1}

and

η_{e, i} (t) \in C^{C \times 1}

are i.i.d. as

CN (0, σ_{1}^{2})

,

CN (0, σ_{2}^{2})

and

CN (0, σ_{e}^{2})

, respectively. Here note that the feedforward channel (26) and the feedback channel (27) are both MIMO channels.

In this section, an FBL approach is proposed for the MIMO WHFL system, which combines the two-dimensional message mapping method, the two-dimensional MLO, and the SVD technique (see Figure 3). To facilitate a better understanding of Figure 3, we introduce the two-dimensional message mapping method and the two-dimensional MLO below.

The two-dimensional message mapping method: We first review the message mapping in the classical SK scheme [35] (see Figure 4a). Specifically, for given codeword length n, let the message

W \in W = {1, 2, \dots, 2^{n R}}

and

| W | = 2^{n R}

, where R is the transmission rate. Partition the interval

[- \sqrt{3}, \sqrt{3}]

into

2^{n R}

equal sub-intervals, with each sub-interval’s midpoint corresponding to a message in

W

. Let

θ

denote the midpoint associated with message W, where the variance of

θ

is approximately 1. This one-dimensional mapping method is shown to be optimal for AWGN channels with real signals. To address the complexity of fading channels, we introduce a two-dimensional message mapping method, detailed as follows:

For given codeword length n, let message

W = (W_{R}, W_{I})

, where W,

W_{R}

and

W_{I}

are uniformly distributed in

W = {1, \dots, 2^{n R}}

,

W_{R} = {1, \dots, 2^{n R_{R}}}

and

W_{I} = {1, \dots, 2^{n R_{I}}}

, respectively, and

R_{R} + R_{I} = R

. Since the message W is composed of two parts, we place the points

(W_{R}, W_{I})

in a complex square grid with corners located at

(\pm \sqrt{3}, \pm j \sqrt{3})

(see Figure 4b). Divide the entire square grid into

2^{n (R_{R} + R_{I})}

equally spaced sub-grids, and the center point of each sub-grid is mapped to a pair of values in

W = (W_{R}, W_{I})

. Let

θ = θ_{R} + j θ_{I}

be the center point of the sub-grid with respect to (w.r.t.) the message

W = (W_{R}, W_{I})

, where

θ_{R}

and

θ_{I}

represent the real and imaginary components of

θ

, respectively, and the variance of

θ

approximately equals 2.
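The two-dimensional mapping described above can be sketched in a few lines of Python. The sketch assumes 1-based message indices and writes m_r = 2^{n R_R} and m_i = 2^{n R_I} for the number of cells along the real and imaginary axes; the names `grid_point` and `constellation_variance` are hypothetical. It also checks numerically that the variance of θ is close to 2 for uniformly distributed messages.

```python
import math


def grid_point(w_r: int, w_i: int, m_r: int, m_i: int) -> complex:
    """Map the message pair (w_r, w_i) to the centre of its sub-grid.

    The square has corners at (+-sqrt(3), +-j*sqrt(3)); indices are 1-based.
    """
    a = math.sqrt(3.0)
    theta_r = -a + (2 * w_r - 1) * a / m_r   # midpoint of the w_r-th real cell
    theta_i = -a + (2 * w_i - 1) * a / m_i   # midpoint of the w_i-th imaginary cell
    return complex(theta_r, theta_i)


def constellation_variance(m_r: int, m_i: int) -> float:
    """E|theta|^2 over all message pairs (the constellation mean is zero)."""
    pts = [grid_point(r, i, m_r, m_i)
           for r in range(1, m_r + 1)
           for i in range(1, m_i + 1)]
    return sum(abs(p) ** 2 for p in pts) / len(pts)
```

For m_r = m_i = M, the variance evaluates to 2(1 - 1/M²), which approaches 2 as the number of cells grows.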

The two-dimensional MLO: The two-dimensional MLO is given by the following:

\begin{matrix} M_{Λ} [x] \overset{def}{=} x - Q [x], \end{matrix}

(29)

where the two-dimensional lattice

Λ = Λ_{R} + j Λ_{I}

is a square region of the complex plane with

Λ_{R} \in [- \frac{d}{2}, \frac{d}{2})

,

Λ_{I} \in [- \frac{d}{2}, \frac{d}{2})

,

d > 0

,

j = \sqrt{- 1}

,

Q [x]

is the nearest neighbor quantization of x w.r.t.

Λ

, and x is a complex-valued number. Some basic properties of the two-dimensional MLO [39] are listed below.

Proposition 1.

(1). The distributive law

M_{Λ} [M_{Λ} [x] + y] = M_{Λ} [x + y]

.

(2). If

x + y \in Λ

,

M_{Λ} [x + y] = x + y

; otherwise, a modulo-aliasing error occurs.

(3). Let the dither signal ν be uniformly distributed on Λ, then

M_{Λ} [x + ν]

is uniformly distributed on Λ, where

Var (M_{Λ} [x + ν]) = \frac{d^{2}}{12} + \frac{d^{2}}{12} = \frac{d^{2}}{6}

.
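As a hedged illustration of the two-dimensional MLO in (29) and of properties (1) and (2), the following Python sketch folds the real and imaginary parts of a complex number into the interval [-d/2, d/2). The function name `mod_lattice` and the use of a half-open interval are assumptions made for this sketch.

```python
def mod_lattice(x: complex, d: float) -> complex:
    """Two-dimensional modulo-lattice operation M_Lambda[x] = x - Q[x].

    Each component is reduced modulo d into the half-open interval [-d/2, d/2).
    """
    def fold(v: float) -> float:
        # Shift by d/2, wrap into [0, d), then shift back.
        return (v + d / 2.0) % d - d / 2.0

    return complex(fold(x.real), fold(x.imag))
```

For example, with d = 2, x = 3.7 - 1.2j and y = 0.4 + 2.9j, both M_Λ[M_Λ[x] + y] and M_Λ[x + y] reduce to the same point (up to floating-point rounding), in line with the distributive law.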

The classical SK scheme does not work in the noisy feedback case because, in such a case, the transmitter cannot accurately obtain the estimation error of the receiver. We show that by applying the two-dimensional MLO to both the feedforward and feedback encoders, the adverse effects of feedback channel noise on the SK scheme’s performance can be mitigated, which allows the SK-type scheme to remain effective even in the presence of noisy feedback. Figure 5a,b illustrates the differences between the classical SK scheme and the modified SK-type scheme utilizing the two-dimensional MLO.

3.1. An FBL Approach for the MIMO WHFL

3.1.1. Channel Decomposition by SVD

Based on the SVD technique, matrices

h_{m i m o}

and

{\tilde{h}}_{m i m o}

can be expressed as follows:

\begin{matrix} h_{m i m o} = U Λ V^{H}, {\tilde{h}}_{m i m o} = \tilde{U} \tilde{Λ} {\tilde{V}}^{H}, \end{matrix}

(30)

where

U, {\tilde{V}}^{H} \in C^{B \times B}

and

\tilde{U}, V^{H} \in C^{A \times A}

are unitary matrices. The diagonal matrices

Λ \in C^{B \times A}

and

\tilde{Λ} \in C^{A \times B}

have non-negative real number diagonal elements (

d_{1}

,…,

d_{J}

) and (

{\tilde{d}}_{1}

,…,

{\tilde{d}}_{J}

) [49], respectively, and

J = min (A, B) .

(31)

According to (26) and (30), we have the following:

U^{H} Y_{i} (t) = Λ V^{H} X_{i} (t) + U^{H} η_{1, i} (t) ⟹ Y_{i}^{'} (t) = Λ X_{i}^{'} (t) + η_{1, i}^{'} (t),

(32)

where

Y_{i}^{'} (t) = U^{H} Y_{i} (t), η_{1, i}^{'} (t) = U^{H} η_{1, i} (t) \in C^{B \times 1}

and

X_{i}^{'} (t) = V^{H} X_{i} (t) \in C^{A \times 1}

. It is noted that

E (X_{i}^{' H} (t) X_{i}^{'} (t)) = E (X_{i}^{H} (t) X_{i} (t))

and

E (η_{1, i}^{' H} (t) η_{1, i}^{'} (t)) = E (η_{1, i}^{H} (t) η_{1, i} (t))

, ensuring that the power constraint of

X_{i} (t)

is equal to that of

X_{i}^{'} (t)

, and the distributions of

η_{1, i}^{'} (t)

and

η_{1, i} (t)

remain the same. As

Λ

is a diagonal matrix, (32) can be decomposed as follows:

Y_{j, i}^{'} (t) = d_{j} X_{j, i}^{'} (t) + η_{j, 1, i}^{'} (t), 1 \leq j \leq J, 1 \leq i \leq N_{t},

(33)

where

Y_{j, i}^{'} (t)

,

X_{j, i}^{'} (t)

and

η_{j, 1, i}^{'} (t)

denote the j-th components of

Y_{i}^{'} (t)

,

X_{i}^{'} (t)

and

η_{1, i}^{'} (t)

, respectively.

Similarly, from (27) and (30), (27) can be decomposed as follows:

{\tilde{Y}}_{j, i}^{'} (t) = {\tilde{d}}_{j} {\tilde{X}}_{j, i}^{'} (t) + η_{j, 2, i}^{'} (t), 1 \leq j \leq J, 1 \leq i \leq N_{t} - 1,

(34)

where

{\tilde{Y}}_{j, i}^{'} (t)

,

{\tilde{X}}_{j, i}^{'} (t)

and

η_{j, 2, i}^{'} (t)

denote the j-th components of

{\tilde{Y}}_{i}^{'} (t)

,

{\tilde{X}}_{i}^{'} (t)

and

η_{2, i}^{'} (t)

, respectively, and

{\tilde{Y}}_{i}^{'} (t) = {\tilde{U}}^{H} {\tilde{Y}}_{i} (t), η_{2, i}^{'} (t) = {\tilde{U}}^{H} η_{2, i} (t) \in C^{A \times 1}

and

{\tilde{X}}_{i}^{'} (t) = {\tilde{V}}^{H} {\tilde{X}}_{i} (t) \in C^{B \times 1}

. As shown in (32)–(34), applying the SVD technique, the feedforward and feedback MIMO channels can be effectively transformed into J parallel SISO sub-channels.

Power allocation: The edge server assigns power

P_{1}, \dots, P_{J}

to the J parallel sub-channels for the feedforward channel, where

\sum_{j = 1}^{J} P_{j} = P

. Similarly, the cloud server distributes power

{\tilde{P}}_{1}, \dots, {\tilde{P}}_{J}

across the J parallel sub-channels for the feedback channel, where

\sum_{j = 1}^{J} {\tilde{P}}_{j} = \tilde{P}

.

3.1.2. Message Splitting

For given

τ

,

N_{t}

,

υ

, D and

ϵ

, we define the following:

\begin{matrix} | W_{t}^{″} | = 2^{N_{t} R_{t}} = 2^{q R_{t} (D)}, R_{t} = \frac{H (W_{t}^{″})}{N_{t}} . \end{matrix}

(35)

Next, the message

W_{t}^{″}

is divided into J independent components

(W_{t, 1}^{″}, \dots, W_{t, J}^{″})

, where

W_{t, j}^{″}

is uniformly distributed over the set

W_{t, j}^{″} = {1, 2, \dots, 2^{N_{t} R_{t, j}}}

and

j = 1, \dots, J

. Then, we divide each sub-message

W_{t, j}^{″}

into

W_{t, j}^{″} = (W_{t, j, R}^{''}, W_{t, j, I}^{''})

, where

W_{t, j, R}^{″}

and

W_{t, j, I}^{″}

are uniformly distributed over the sets

W_{t, j, R}^{″} = {1, 2, \dots, 2^{N_{t} R_{t, j, R}}}

and

W_{t, j, I}^{″} = {1, 2, \dots, 2^{N_{t} R_{t, j, I}}}

, respectively. The rate for each parallel sub-channel is defined as

R_{t, j} = R_{t, j, R} + R_{t, j, I}

. Consequently, the total rate

R_{t}

for all J parallel sub-channels during the t-th communication round is as follows:

R_{t} = \sum_{j = 1}^{J} (R_{t, j, R} + R_{t, j, I}) .

(36)

3.1.3. An FBL Scheme of Each Parallel Sub-Channel

By using the two-dimensional message mapping method introduced in the last subsection, the message

W_{t, j}^{″}

is mapped to the center point

θ_{j}

of its corresponding sub-grid.

Initialization: At time instant 1, the edge server maps the messages

W_{t, j}^{″}

to

θ_{j} = θ_{R, j} + j θ_{I, j}

, and sends the following:

\begin{matrix} X_{j, 1}^{'} (t) = \sqrt{\frac{P_{j}}{2}} θ_{j}, \end{matrix}

(37)

Then, the cloud server computes the first estimation

{\hat{θ}}_{j, 1}

of

θ_{j}

by the following:

\begin{matrix} {\hat{θ}}_{j, 1} = \frac{Y_{j, 1}^{'} (t)}{d_{j} \sqrt{\frac{P_{j}}{2}}} = θ_{j} + \frac{η_{j, 1, 1}^{'} (t)}{d_{j} \sqrt{\frac{P_{j}}{2}}} = θ_{j} + ε_{1}, \end{matrix}

(38)

where

ε_{1} = ε_{R, 1} + j ε_{I, 1} = {\hat{θ}}_{j, 1} - θ_{j}

is the estimation error of the cloud server at time instant 1. Define

α_{1} = Var (ε_{1}) = \frac{2 σ_{1}^{2}}{d_{j}^{2} P_{j}}

,

α_{R, 1} = Var (ε_{R, 1}) = \frac{σ_{1}^{2}}{d_{j}^{2} P_{j}}

and

α_{I, 1} = Var (ε_{I, 1}) = \frac{σ_{1}^{2}}{d_{j}^{2} P_{j}}

.
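The initialization step can be checked numerically. The Python sketch below draws the first estimate in (38) many times for a single sub-channel and compares the empirical variance of the estimation error with α_1 = 2σ_1²/(d_j² P_j). The message point, the number of trials, the seed, and the function names are arbitrary choices made for this illustration.

```python
import math
import random


def first_estimate(theta: complex, d_j: float, p_j: float, sigma1_sq: float,
                   rng: random.Random) -> complex:
    """One draw of the first estimate in (38) for sub-channel j."""
    scale = math.sqrt(p_j / 2.0)
    x = scale * theta                                        # symbol sent in (37)
    s = math.sqrt(sigma1_sq / 2.0)                           # std per real component
    noise = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))    # CN(0, sigma1_sq)
    y = d_j * x + noise                                      # sub-channel output, (33)
    return y / (d_j * scale)


def empirical_alpha1(d_j, p_j, sigma1_sq, trials=50_000, seed=0):
    """Monte Carlo estimate of alpha_1 = Var(eps_1)."""
    rng = random.Random(seed)
    theta = complex(0.5, -0.25)                              # arbitrary message point
    acc = 0.0
    for _ in range(trials):
        eps = first_estimate(theta, d_j, p_j, sigma1_sq, rng) - theta
        acc += abs(eps) ** 2
    return acc / trials
```

With d_j = 1.5, P_j = 4 and σ_1² = 0.1, the closed form gives α_1 = 0.2/9, and the empirical value should agree up to Monte Carlo fluctuation.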

Iteration: First, we introduce a shared dither random i.i.d. sequence

ν^{N_{t} - 1} = (ν_{1}, \dots, ν_{N_{t} - 1})

, which is perfectly known by both the edge server and the cloud server, and it is uniformly distributed on

Λ

(

Λ = Λ_{R} + j Λ_{I}

is a square region of the complex plane with

Λ_{R} \in [- \frac{d}{2}, \frac{d}{2})

,

Λ_{I} \in [- \frac{d}{2}, \frac{d}{2})

), and

d = \sqrt{6 {\tilde{P}}_{j}}

. Here

ν^{N_{t} - 1}

is independent of all signals transmitted over channels. At time instant i (

2 \leq i \leq N_{t}

), using the two-dimensional MLO shown in Section 3, the cloud server sends the following:

\begin{matrix} {\tilde{X}}_{j, i - 1}^{'} (t) = M_{Λ} [γ_{i - 1} {\hat{θ}}_{j, i - 1} + ν_{i - 1}], \end{matrix}

(39)

where

γ_{i - 1}

is a modulation coefficient. From Property (3) of Proposition 1, we have

E ({\tilde{X}}_{j, i - 1}^{' H} (t) {\tilde{X}}_{j, i - 1}^{'} (t)) = {\tilde{P}}_{j}

(the dither signals guarantee that the codeword transmitted by the cloud server meets the power constraint). Then the edge server computes a noisy version of estimation error

ε_{i - 1} = {\hat{θ}}_{j, i - 1} - θ_{j}

by the following:

\begin{matrix} {\tilde{ε}}_{i - 1} = \frac{1}{γ_{i - 1}} M_{Λ} [\frac{{\tilde{Y}}_{j, i - 1}^{'} (t)}{{\tilde{d}}_{j}} - γ_{i - 1} θ_{j} - ν_{i - 1}] \overset{(a)}{=} \frac{1}{γ_{i - 1}} M_{Λ} [γ_{i - 1} ε_{i - 1} + \frac{η_{j, 2, i - 1}^{'} (t)}{{\tilde{d}}_{j}}], \end{matrix}

(40)

where (a) is due to the modulo distributive law in property (1) of Proposition 1. No modulo-aliasing error occurs at the edge server if

γ_{i - 1} ε_{i - 1} + \frac{η_{j, 2, i - 1}^{'} (t)}{{\tilde{d}}_{j}} \in Λ

. In this case, the edge server obtains

{\tilde{ε}}_{i - 1} = ε_{i - 1} + \frac{η_{j, 2, i - 1}^{'} (t)}{γ_{i - 1} {\tilde{d}}_{j}}

. Then, the edge server sends the following:

\begin{matrix} X_{j, i}^{'} (t) = λ_{i - 1} γ_{i - 1} {\tilde{ε}}_{i - 1}, \end{matrix}

(41)

where

λ_{i - 1}

is chosen to satisfy the transmitter’s power constraint

P_{j}

. Then, the cloud server updates

{\hat{θ}}_{j, i}

by computing the following:

\begin{matrix} {\hat{θ}}_{j, i} = {\hat{θ}}_{j, i - 1} - {\hat{ε}}_{i - 1} = {\hat{θ}}_{j, i - 1} - β_{i} \frac{Y_{j, i}^{'} (t)}{d_{j}}, \end{matrix}

(42)

where

{\hat{ε}}_{i - 1} = β_{i} \frac{Y_{j, i}^{'} (t)}{d_{j}}

, and the MMSE estimation coefficient

β_{i}

is given by the following:

\begin{matrix} β_{i} = \frac{E (\frac{ε_{i - 1} Y_{j, i}^{'} {(t)}^{H}}{d_{j}})}{E (\frac{Y_{j, i}^{'} (t) Y_{j, i}^{'} {(t)}^{H}}{d_{j}^{2}})}, \end{matrix}

(43)

which ensures that

ε_{i - 1}

is correctly estimated from

Y_{j, i}^{'} (t)

. Define

ε_{i} = ε_{R, i} + j ε_{I, i} = {\hat{θ}}_{j, i} - θ_{j}

. Then (42) yields the following:

\begin{matrix} ε_{i} = ε_{i - 1} - β_{i} \frac{Y_{j, i}^{'} (t)}{d_{j}} . \end{matrix}

(44)

Further define

α_{i} = Var (ε_{i})

,

α_{R, i} = Var (ε_{R, i})

,

α_{I, i} = Var (ε_{I, i})

. Since

ε_{i}

is an estimation error with a CSCG distribution, we conclude that

α_{R, i} = α_{I, i} = \frac{α_{i}}{2}

.

Decoding: At time instant

N_{t}

, the final estimation obtained by the cloud server is

{\hat{θ}}_{j, N_{t}} = θ_{j} + ε_{N_{t}}

, where

ε_{N_{t}} = ε_{R, N_{t}} + j ε_{I, N_{t}}

. The cloud server successfully decodes the message

W_{t, j}^{″}

if

{\hat{θ}}_{j, N_{t}}

is closest to the message point

θ_{j}

, i.e.,

ε_{R, N_{t}} \in [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}})

and

ε_{I, N_{t}} \in [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}})

.
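The decoding rule above amounts to a nearest-neighbor search on the square grid. The following Python sketch, under the same 1-based indexing assumption as the message sets, maps a message pair to its grid point and decodes a noisy estimate back to a pair; the helper names `encode` and `decode` and the clamping at the grid boundary are choices made for this illustration. Decoding succeeds exactly when the real and imaginary errors stay within half a cell width, i.e., √3/m_r and √3/m_i with m_r = 2^{N_t R_{t,j,R}} and m_i = 2^{N_t R_{t,j,I}}.

```python
import math


def encode(w_r: int, w_i: int, m_r: int, m_i: int) -> complex:
    """Centre of the sub-grid assigned to the message pair (w_r, w_i)."""
    a = math.sqrt(3.0)
    return complex(-a + (2 * w_r - 1) * a / m_r,
                   -a + (2 * w_i - 1) * a / m_i)


def decode(theta_hat: complex, m_r: int, m_i: int):
    """Return the message pair whose grid point is closest to theta_hat."""
    a = math.sqrt(3.0)

    def index(v: float, m: int) -> int:
        k = math.floor((v + a) / (2.0 * a / m)) + 1   # cell containing v
        return min(max(k, 1), m)                      # clamp estimates outside the square

    return index(theta_hat.real, m_r), index(theta_hat.imag, m_i)
```

An error of 0.9 half-widths in each component leaves the decoded pair unchanged, whereas an error of 1.5 half-widths in the real part moves the decision to the neighboring cell.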

The formal proof of Theorem 1 is provided in Appendix A.

4. An FBL Approach for the SIMO WHFL

For the SIMO WHFL, (9)–(11) can be re-written as follows:

\begin{matrix} Y_{i} (t) = h_{s i m o} X_{i} (t) + η_{1, i} (t), 1 \leq i \leq N_{t}, \end{matrix}

(45)

\begin{matrix} {\tilde{Y}}_{i} (t) = {\tilde{h}}_{s i m o} {\tilde{X}}_{i} (t) + η_{2, i} (t), 1 \leq i \leq N_{t} - 1, \end{matrix}

(46)

\begin{matrix} Z_{i} (t) = g_{s i m o} X_{i} (t) + {\tilde{g}}_{s i m o} {\tilde{X}}_{i} (t) + η_{e, i} (t), 1 \leq i \leq N_{t}, \end{matrix}

(47)

where

h_{s i m o} \in C^{B \times 1}

,

{\tilde{h}}_{s i m o} \in C^{1 \times B}

,

g_{s i m o} \in C^{C \times 1}

,

{\tilde{g}}_{s i m o} \in C^{C \times B}

,

X_{i} (t) \in C^{1 \times 1}

,

{\tilde{X}}_{i} (t) \in C^{B \times 1}

, the elements of

η_{1, i} (t) \in C^{B \times 1}

and

η_{e, i} (t) \in C^{C \times 1}

are i.i.d. as

CN (0, σ_{1}^{2})

and

CN (0, σ_{e}^{2})

, respectively, and

η_{2, i} (t) \in C^{1 \times 1} \sim CN (0, σ_{2}^{2})

. Here, note that the feedforward channel (45) is a SIMO channel, while the feedback channel (46) is a MISO channel. Unlike the SVD technique used for the MIMO WHFL, which decomposes the MIMO channel into several parallel SISO channels, we use a beamforming strategy to transform the feedforward SIMO channel into a SISO channel, and a new pre-coding strategy to transform the feedback MISO channel into a SISO channel (see Figure 6). Applying the approach of Section 3.1.3 to each SISO channel, the FBL approach for the SIMO WHFL is obtained; the details are given below.

Beamforming strategy: The signal received by the cloud server in (45) can be processed as follows:

\begin{matrix} h_{s i m o}^{H} Y_{i} (t) = h_{s i m o}^{H} h_{s i m o} X_{i} (t) + h_{s i m o}^{H} η_{1, i} (t) = \| h_{s i m o} \|^{2} X_{i} (t) + h_{s i m o}^{H} η_{1, i} (t), \\ ⟹ {\bar{Y}}_{i} (t) = \| h_{s i m o} \|^{2} X_{i} (t) + {\bar{η}}_{1, i} (t), \end{matrix}

(48)

where

{\bar{Y}}_{i} (t) = h_{s i m o}^{H} Y_{i} (t) \in C^{1 \times 1}

and

{\bar{η}}_{1, i} (t) = h_{s i m o}^{H} η_{1, i} (t) \in C^{1 \times 1}

. Applying (48), the feedforward SIMO channel is transformed into a SISO channel.

A new pre-coding strategy: For the feedback channel (46), let

\begin{matrix} {\tilde{X}}_{i} (t) = \frac{{\tilde{h}}_{s i m o}^{H}}{\| {\tilde{h}}_{s i m o} \|} {\tilde{X}}_{i}^{'} (t), \end{matrix}

(49)

where

{\tilde{X}}_{i}^{'} (t) \in C^{1 \times 1}

is the scalar feedback symbol and

E ({\tilde{X}}_{i}^{H} (t) {\tilde{X}}_{i} (t)) = E ({\tilde{X}}_{i}^{' H} (t) \frac{{\tilde{h}}_{s i m o}}{\| {\tilde{h}}_{s i m o} \|} \frac{{\tilde{h}}_{s i m o}^{H}}{\| {\tilde{h}}_{s i m o} \|} {\tilde{X}}_{i}^{'} (t)) = E ({\tilde{X}}_{i}^{' H} (t) {\tilde{X}}_{i}^{'} (t)) = \tilde{P}

, which indicates that the power constraint of

{\tilde{X}}_{i} (t)

is equal to that of

{\tilde{X}}_{i}^{'} (t)

. Hence, substituting (49) into (46), we have the following:

\begin{matrix} {\tilde{Y}}_{i} (t) = {\tilde{h}}_{s i m o} \frac{{\tilde{h}}_{s i m o}^{H}}{\| {\tilde{h}}_{s i m o} \|} {\tilde{X}}_{i}^{'} (t) + η_{2, i} (t) = \| {\tilde{h}}_{s i m o} \| {\tilde{X}}_{i}^{'} (t) + η_{2, i} (t), \end{matrix}

(50)

which indicates that the feedback MISO channel is transformed into the SISO channel. Hence along the lines of the encoding-decoding procedure in Section 3.1.3, the FBL approach for the SIMO WHFL is obtained.
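The two transformations of this section can be illustrated with a small noise-free Python sketch. Vectors are represented as lists of complex numbers, and the helper names (`norm_sq`, `receive_combine`, `precode`, `miso_channel`) are hypothetical. The sketch checks that receive combining with the conjugate of the feedforward channel scales the transmitted scalar by the squared channel norm, and that the pre-coder for the feedback channel preserves the power of the scalar feedback symbol while yielding an effective gain equal to the channel norm.

```python
import math


def norm_sq(v):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(z) ** 2 for z in v)


def receive_combine(h, y):
    """Compute h^H y for column vectors h and y."""
    return sum(hk.conjugate() * yk for hk, yk in zip(h, y))


def precode(h_fb, s):
    """Spread the scalar s over the antennas as h_fb^H / ||h_fb|| * s."""
    n = math.sqrt(norm_sq(h_fb))
    return [hk.conjugate() / n * s for hk in h_fb]


def miso_channel(h_fb, x_vec):
    """Noise-free MISO output h_fb x_vec for a row vector h_fb."""
    return sum(hk * xk for hk, xk in zip(h_fb, x_vec))
```

In the noisy case, the combined feedforward noise h^H η has variance ||h||² σ_1², so the effective SISO channel has signal-to-noise ratio ||h||² P/σ_1², which matches the ||h||² SNR terms appearing in (23).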

Since the proof of Theorem 2 is included in the proof of Theorem 1, we omit the detailed proof here.

5. Simulation Results

5.1. Experimental Settings

The simulation results are obtained by averaging over 2000 independent channel realizations (i.e., Monte Carlo simulations). We consider a WHFL system consisting of 10 users, an edge server, and a cloud server, with each user having the same amount of training data. We assume that the channel matrix elements are i.i.d. as

CN (0, 1)

[5,6,17]. Following [47], the maximum normalized estimation errors of the eavesdropper’s channel are defined as

Ω = \frac{ω}{\| g \|_{F}}

and

\tilde{Ω} = \frac{\tilde{ω}}{\| \tilde{g} \|_{F}}

, where

ω

and

\tilde{ω}

are defined in (13). The edge server employs Lempel–Ziv–Welch (LZW) source coding [50] to compress the quantized gradients, and the total transmitted data are M bits. The transmission latency for the edge server to upload data is

T_{c o m m} = \frac{M}{R_{e g}}

[5], where

R_{e g}

represents the edge server’s transmission rate.

To evaluate the effectiveness of the proposed FBL scheme under real-world conditions, we train a neural network using the MNIST dataset (http://yann.lecun.com/exdb/mnist/, accessed on 20 March 2024), which contains 60,000 training samples and 10,000 test samples of 10 different handwritten digits. The network architecture includes 784 input nodes, a hidden layer containing 20 nodes, and an output layer with 10 nodes. The loss function is cross-entropy, with the hidden and output layers utilizing the ReLU and softmax activation functions, respectively. The neural network contains a total of

q = 15,910

parameters, and the learning rate is set at

μ = 0.1

. In the experiments, the following three schemes are compared.

Benchmark (Perfect HFL): The perfectly aggregated HFL system can be achieved through error-free transmission, which serves as the benchmark accuracy in ideal settings.
Baseline 1 (Random binning coding scheme (RBCS)-based WHFL [26,28]): The gradient data from the edge servers is uploaded using the RBCS, which is based on traditional low-density parity-check (LDPC) codes with a target bit error rate of $10^{- 6}$ .
Baseline 2 (Frequency division multiple access (FDMA)-based WHFL with artificial noise (AN) [29]): In the FDMA-based WHFL system with AN, FDMA is employed to transmit gradient data from edge servers to the cloud server, targeting a bit error rate of $10^{- 6}$ . Additionally, AN is added to the transmitted signals to prevent eavesdroppers from obtaining the true gradient data.
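The parameter count q = 15,910 quoted for the neural network in this subsection can be verified with a short Python sketch. It assumes a fully connected network with one bias term per hidden and output node; the function name `mlp_parameter_count` is illustrative.

```python
def mlp_parameter_count(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 784 inputs, one hidden layer with 20 nodes, 10 outputs.
q = mlp_parameter_count([784, 20, 10])
```

Here the hidden layer contributes 784 · 20 + 20 = 15,700 parameters and the output layer 20 · 10 + 10 = 210.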

5.2. Experimental Results

We show the test accuracy and the cross entropy versus the communication round for the SISO/SIMO/MISO/MIMO cases in Figure 7 and Figure 8, respectively. From Figure 7 and Figure 8, we see that if the legitimate parties obtain perfect CSI of the eavesdropper’s channel, our proposed FBL scheme, Baseline 1, and Baseline 2 all have almost no effect on the learning performance of HFL. This is because all three schemes are capable of transmitting gradient data with a sufficiently low decoding error probability. On the other hand, in our proposed FBL schemes, if the legitimate parties obtain only imperfect CSI of the eavesdropper’s channel, the test accuracy of HFL decreases and the training loss of HFL increases as the maximum normalized estimation error of the eavesdropper’s channel increases. However, note that in the imperfect CSI case, our proposed FBL schemes still provide the same level of secrecy as in the perfect CSI case, which shows the robustness of our schemes against imperfect CSI of the eavesdropper’s channel. Furthermore, Figure 7 and Figure 8 demonstrate that the eavesdropper cannot obtain the real gradient data when our FBL scheme is applied, which indicates that our FBL schemes effectively ensure the PLS of the data.

As depicted in Figure 9, the transmission latency of our FBL scheme is approximately 2 to 5 times lower than that of Baseline 1 and Baseline 2, owing to the gain from introducing feedback. Additionally, the transmission latency of our scheme decreases as the number of antennas increases. Furthermore, the transmission latency of Baseline 2 is lower than that of Baseline 1, owing to the gain from introducing AN to counter eavesdropping attacks in Baseline 2. Moreover, Figure 9 shows that the transmission latency of our FBL scheme increases as the maximum normalized estimation error of the eavesdropper’s channel increases. This is because, to support the same level of performance, the worse the estimate of the eavesdropper’s CSI, the more bits need to be transmitted, which increases the transmission latency.

Table 2 shows that the achievable secrecy transmission rates of our FBL schemes increase with the number of antennas and are significantly higher than those of Baseline 1 and Baseline 2, owing to the gain introduced by feedback in our scheme. Additionally, due to the gain from introducing AN in Baseline 2, its achievable secrecy transmission rate is higher than that of Baseline 1. Furthermore, Table 2 shows that the achievable secrecy transmission rates of our proposed FBL scheme decrease as the maximum normalized estimation errors of the eavesdropper’s channel increase, which can be viewed as the price for the worse estimation. From Table 3, we conclude that the achievable secrecy transmission rates of the FBL schemes increase as the SNR of the feedback channel increases. Moreover, Figure 10 shows that the transmission latency of the FBL schemes increases as the SNR of the feedback channel decreases. Therefore, in our schemes, poorer feedback channel conditions lead to lower achievable secrecy transmission rates and increased transmission latency. However, poorer feedback channel conditions do not directly affect learning performance, as it is primarily determined by the distortion D of lossy source coding, the average decoding error probability

τ

of channel coding, and the variance of noise introduced by LDP mechanisms.

Figure 11 shows the relationship between PLS (measured by the secrecy level), privacy, utility, and the LDP noise variance of proposed FBL schemes. From Figure 11, we conclude that the secrecy level increases as the LDP noise variance increases, and a higher secrecy level leads to a more stringent relationship between privacy and utility (with a smaller

ϵ

and a larger

υ

). In addition, for a given secrecy level, increasing the maximum normalized estimation error of the eavesdropper’s channel increases the variance of the LDP noise, which can also be viewed as the price for the worse estimation.

6. Conclusions and Future Work

In this paper, a practical FBL approach, which is an extension of the classical SK scheme, is proposed for multi-antenna URLLC-WHFL systems in the presence of PLS. We characterize the relationship between PLS, privacy, and the utility of these WHFL systems, and derive achievable transmission rates of the proposed FBL approach. Simulation results demonstrate that when the edge server has perfect knowledge of the eavesdropper’s CSI, our proposed FBL approach not only almost achieves perfect secrecy but also does not affect learning performance. Additionally, simulation results demonstrate that the proposed schemes remain robust even when the edge server has only imperfect knowledge of the eavesdropper’s CSI. Finally, the transmission latency of our proposed FBL approach is shown to be significantly lower than that of the traditional RBCS.

Furthermore, this paper focuses on proposing and analyzing a theoretical scheme. The application of this approach in real-world systems still faces practical challenges, such as hardware constraints, power consumption, and synchronization issues. Future work should aim to optimize energy efficiency and address synchronization in more complex multi-antenna systems using the proposed FBL scheme. On the other hand, as the computational complexity of techniques like precoding, beamforming, and SVD increases with the number of devices and communication channels, particularly in multi-antenna systems, further research and optimization are needed to extend our proposed approach to more complex and large-scale networks. For instance, distributed or hierarchical architectures can allocate the computational load across multiple servers or devices, reducing the burden on individual components. Additionally, low-complexity approximation methods for precoding and beamforming could help lower overall system complexity. Future work will extend our approach to more complex multi-edge scenarios, exploring the impact of interference among edge servers on the WHFL in the presence of PLS.

Author Contributions

H.Z. did the theoretical work, performed the experiments, analyzed the data and drafted the work; B.D. designed the work, performed the theoretical work, interpreted the data for the work and revised the work; and P.X. interpreted the data for the work and revised the work. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported in part by the National Key R&D Program of China under grant no. 2022YFA1005000, in part by the National Natural Science Foundation of China under grant no. 62071392, and in part by Chongqing Key Laboratory of Mobile Communications Technology under grant no. cqupt-mct-202302.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. The Formal Proof of Theorem 1

Appendix A.1. Utility and Privacy Analysis

First, note that since

W_{t}

,

η_{t}

, and

W_{t}^{'}

are i.i.d. generated, from Definition 1, we conclude the following:

\begin{matrix} max_{t \in {1, \dots, T}} \frac{1}{q} I (W_{t}; W_{t}^{'}) = max_{t \in {1, \dots, T}} \frac{1}{2} log (1 + \frac{S_{ℓ} σ_{w, t}^{2}}{K σ^{2}}) \leq ϵ, \end{matrix}

(A1)

On the other hand, from Definition 2, we conclude the following:

\begin{matrix} \frac{1}{q T} \sum_{t = 1}^{T} E (d (W_{t}, W_{t}^{'})) = \frac{1}{q T} \sum_{t = 1}^{T} E (\| η_{t} \|^{2}) = K σ^{2} \leq υ . \end{matrix}

(A2)

From (A1) and (A2), the relationship between privacy, utility, and the noise variance of LDP is characterized by the following:

\underbrace{\max_{t \in \{1, \dots, T\}} \frac{S_{ℓ} σ_{w, t}^{2}}{K (2^{2 ϵ} - 1)}}_{\text{Privacy term}} \leq \underbrace{σ^{2}}_{\text{Noise variance of LDP}} \leq \underbrace{\frac{υ}{K}}_{\text{Utility term}},

(A3)

which indicates that by selecting an appropriate LDP noise to satisfy (A3), both privacy and utility can be ensured.
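To make (A3) concrete, the following Python sketch computes the admissible interval for the LDP noise variance and evaluates the per-round leakage on the left-hand side of (A1). It assumes that the logarithm in (A1) is taken to base 2, which is consistent with the factor 2^{2ϵ} in (A3); the function names and the numerical values in the usage note are hypothetical.

```python
import math


def ldp_noise_range(s_l, sigma_w_sq, k, eps, upsilon):
    """Return the (lower, upper) bounds on sigma^2 given by (A3).

    sigma_w_sq is the list of sigma_{w,t}^2 values over the T rounds.
    """
    lower = max(s_l * v / (k * (2.0 ** (2.0 * eps) - 1.0)) for v in sigma_w_sq)
    upper = upsilon / k
    return lower, upper


def per_round_leakage(s_l, sigma_w_sq_t, k, sigma_sq):
    """Left-hand side of (A1) for a single round (log base 2 assumed)."""
    return 0.5 * math.log2(1.0 + s_l * sigma_w_sq_t / (k * sigma_sq))
```

For instance, with S_ℓ = 1, σ_{w,t}² ∈ {1, 2}, K = 10, ϵ = 0.5 and υ = 5, the interval is [0.2, 0.5]. Choosing σ² at the lower end meets the privacy constraint with equality in the worst round, and the interval is non-empty only if the lower bound does not exceed υ/K.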

Appendix A.2. Decoding Error Probability and Convergence Analysis

First, we bound the decoding error probability

P_{e, t}

of

W_{t}^{″}

transmitted in all parallel sub-SISO channels as follows:

\begin{matrix} P_{e, t} \leq P_{e, t} (1) + P_{e, t} (2) + \dots + P_{e, t} (J), \end{matrix}

(A4)

where

P_{e, t} (j)

(

j = 1, \dots, J

) represents the decoding error probability of message

W_{t, j}^{″}

, and J is the number of parallel sub-SISO channels, which is defined in (31). Next, we analyze the error events of message

W_{t, j}^{″}

, which consist of the following:

(1): A modulo-aliasing error occurs in the edge server at time instant $i + 1$ ($1 \leq i \leq N_{t} - 1$), and it is defined as follows:

$\begin{matrix} E_{i} = \{γ_{i} ε_{i} + \frac{η_{j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin Λ\} = \{[γ_{i} ε_{R, i} + \frac{η_{R, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin [- \frac{d}{2}, \frac{d}{2})] \cup [γ_{i} ε_{I, i} + \frac{η_{I, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin [- \frac{d}{2}, \frac{d}{2})]\}, \end{matrix}$

(A5)

where $ε_{R, i}$ and $ε_{I, i}$ are the real and imaginary parts of $ε_{i}$, respectively, and $η_{R, j, 2, i}^{'} (t)$ and $η_{I, j, 2, i}^{'} (t)$ are the real and imaginary parts of $η_{j, 2, i}^{'} (t)$, respectively.
(2): A decoding error occurs in the cloud server at time instant $N_{t}$, and it is defined as follows:

$\begin{matrix} E_{N_{t}} = \{[ε_{R, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}})] \cup [ε_{I, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}})]\}, \end{matrix}$

(A6)

where $ε_{R, N_{t}}$ and $ε_{I, N_{t}}$ are the real and imaginary parts of $ε_{N_{t}}$, respectively.

Thus, the error probability $P_{e, t} (j)$ is bounded by the following:

\begin{aligned} P_{e, t} (j) & \leq \Pr (\bigcup_{i = 1}^{N_{t}} E_{i}) = \Pr (\bigcup_{i = 1}^{N_{t} - 1} E_{i}) + \Pr (\bigcap_{i = 1}^{N_{t} - 1} E_{i}^{c} \cap E_{N_{t}}) \\ & = \sum_{i = 1}^{N_{t} - 1} \Pr (\bigcap_{j = 1}^{i - 1} E_{j}^{c} \cap E_{i}) + \Pr (\bigcap_{i = 1}^{N_{t} - 1} E_{i}^{c} \cap E_{N_{t}}) \\ & = \sum_{i = 1}^{N_{t} - 1} \Pr ({\tilde{E}}_{i}) + \Pr ({\tilde{E}}_{N_{t}}), \end{aligned}

(A7)

where $E^{c}$ is the complement of the set $E$, and ${\tilde{E}}_{i} = \bigcap_{j = 1}^{i - 1} E_{j}^{c} \cap E_{i}$. Here, $\Pr ({\tilde{E}}_{i})$ ($i \in \{1, \dots, N_{t} - 1\}$) is the probability that a demodulation (modulo-aliasing) error occurs at time instant $i + 1$ while no error has occurred at any previous time instant, and $\Pr ({\tilde{E}}_{N_{t}})$ is the probability that the final decoding fails while no demodulation error has occurred at any time instant. We assume that $P_{e, t} \leq τ$ in every round, which guarantees that $\frac{1}{T} \sum_{t = 1}^{T} P_{e, t} \leq τ$. By (A4), it suffices to choose $P_{e, t} (j) \leq \frac{τ}{J}$, and we split this error budget equally between the two terms in (A7):

\begin{matrix} \Pr ({\tilde{E}}_{N_{t}}) = \sum_{i = 1}^{N_{t} - 1} \Pr ({\tilde{E}}_{i}) = \frac{τ}{2 J} . \end{matrix}

(A8)

For simplicity, we further define the following:

\begin{matrix} \Pr ({\tilde{E}}_{1}) = \dots = \Pr ({\tilde{E}}_{N_{t} - 1}) = p_{m} . \end{matrix}

(A9)

Substituting (A9) into (A8), we have the following:

\begin{matrix} p_{m} = \frac{τ}{2 J (N_{t} - 1)} . \end{matrix}

(A10)

For the error event ${\tilde{E}}_{i}$, since no demodulation error occurs before time instant $i + 1$, according to (43), (44), and the fact that $η_{j, 2, i}^{'} (t)$ is CSCG-distributed, we can conclude that $γ_{i} ε_{i} + \frac{η_{j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \sim \mathcal{CN} (0, γ_{i}^{2} α_{i} + \frac{σ_{2}^{2}}{{\tilde{d}}_{j}^{2}})$. Hence, we have $γ_{i} ε_{k, i} + \frac{η_{k, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \sim \mathcal{N} (0, γ_{i}^{2} \frac{α_{i}}{2} + \frac{σ_{2}^{2}}{2 {\tilde{d}}_{j}^{2}})$, where $k \in \{R, I\}$. From (A5) and $d = \sqrt{6 {\tilde{P}}_{j}}$, we have the following:

\begin{aligned} \Pr (E_{i}) & = \Pr (\{[γ_{i} ε_{R, i} + \frac{η_{R, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin [- \frac{d}{2}, \frac{d}{2})] \cup [γ_{i} ε_{I, i} + \frac{η_{I, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin [- \frac{d}{2}, \frac{d}{2})]\}) \\ & \leq \Pr (γ_{i} ε_{R, i} + \frac{η_{R, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin [- \frac{d}{2}, \frac{d}{2})) + \Pr (γ_{i} ε_{I, i} + \frac{η_{I, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}} \notin [- \frac{d}{2}, \frac{d}{2})) \\ & = 2 Q (\sqrt{\frac{\frac{3 {\tilde{P}}_{j}}{2}}{E {(γ_{i} ε_{R, i} + \frac{η_{R, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}})}^{2}}}) + 2 Q (\sqrt{\frac{\frac{3 {\tilde{P}}_{j}}{2}}{E {(γ_{i} ε_{I, i} + \frac{η_{I, j, 2, i}^{'} (t)}{{\tilde{d}}_{j}})}^{2}}}) \\ & = 4 Q (\sqrt{\frac{\frac{3 {\tilde{P}}_{j}}{2}}{γ_{i}^{2} \frac{α_{i}}{2} + \frac{σ_{2}^{2}}{2 {\tilde{d}}_{j}^{2}}}}) = p_{m} . \end{aligned}

(A11)

For simplification, let

\begin{matrix} ξ = \frac{1}{3} {[Q^{- 1} (\frac{p_{m}}{4})]}^{2} = \frac{1}{3} {[Q^{- 1} (\frac{τ}{8 J (N_{t} - 1)})]}^{2} . \end{matrix}

(A12)
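The quantities $p_{m}$ in (A10) and $ξ$ in (A12) depend only on $τ$, $J$, and $N_{t}$, so they can be computed before any transmission. The sketch below is one way to evaluate them with the Python standard library; the function names are illustrative and not part of the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero-mean, unit-variance Gaussian


def q_inv(p):
    """Inverse Gaussian Q-function: the x with Q(x) = p, for 0 < p < 1."""
    # Q(x) = Phi(-x), hence x = -Phi^{-1}(p). This avoids forming 1 - p,
    # which would lose precision for the tiny p used here.
    return -_STD_NORMAL.inv_cdf(p)


def per_step_error_budget(tau, num_subchannels, block_length):
    """p_m from (A10): tau / (2 J (N_t - 1))."""
    if block_length < 2:
        raise ValueError("block_length (N_t) must be at least 2")
    return tau / (2.0 * num_subchannels * (block_length - 1))


def xi_from_budget(tau, num_subchannels, block_length):
    """xi from (A12): (1/3) * [Q^{-1}(p_m / 4)]^2."""
    p_m = per_step_error_budget(tau, num_subchannels, block_length)
    return q_inv(p_m / 4.0) ** 2 / 3.0


if __name__ == "__main__":
    # tau = 1e-6 matches the simulation settings; J and N_t are illustrative.
    print(xi_from_budget(tau=1e-6, num_subchannels=4, block_length=11))
```

A smaller $τ$ or a longer block gives a smaller $p_{m}$ and therefore a larger $ξ$, which in turn shrinks the usable modulation scaling through (A13).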

Substituting (A12) into (A11), we have the following:

\begin{matrix} γ_{i}^{2} \frac{α_{i}}{2} + \frac{σ_{2}^{2}}{2 {\tilde{d}}_{j}^{2}} = \frac{{\tilde{P}}_{j}}{2 ξ} . \end{matrix}

(A13)

From (A13), we can conclude the following:

\begin{matrix} γ_{i} = \sqrt{\frac{1}{α_{i}} (\frac{{\tilde{P}}_{j}}{ξ} - \frac{σ_{2}^{2}}{{\tilde{d}}_{j}^{2}})}, \end{matrix}

(A14)

and note that $X_{j, i + 1}^{'} (t)$ is subject to the power constraint $P_{j}$; hence, we have the following:

\begin{matrix} E [X_{j, i + 1}^{'} {(t)}^{H} X_{j, i + 1}^{'} (t)] = λ_{i}^{2} E {(γ_{i} ε_{i} + \frac{η_{j, 2, i}^{'} (t)}{{\tilde{d}}_{j}})}^{2} = P_{j} . \end{matrix}

(A15)

According to (A13) and (A15), we conclude the following:

\begin{matrix} λ_{i} = \sqrt{ξ \cdot \frac{P_{j}}{{\tilde{P}}_{j}}} . \end{matrix}

(A16)

From (43), (A14), and (A16), we have the following:

\begin{matrix} Y_{j, i + 1}^{'} (t) = d_{j} λ_{i} (γ_{i} ε_{i} + \frac{η_{j, 2, i}^{'} (t)}{{\tilde{d}}_{j}}) + η_{j, 1, i + 1}^{'} (t), \end{matrix}

(A17)

from which we obtain the following:

\begin{matrix} β_{i + 1} & = \frac{λ_{i} γ_{i} α_{i}}{P_{j} + \frac{σ_{1}^{2}}{d_{j}^{2}}} = \frac{\sqrt{α_{i}}}{σ_{1}} \frac{\sqrt{{SNR}_{j} (1 - ξ \cdot {\tilde{SNR}}_{j}^{- 1} {\tilde{d}}_{j}^{- 2})}}{{SNR}_{j} + d_{j}^{- 2}} . \end{matrix}

(A18)

According to (44), (A14), (A16), (A17), and (A18), we have the following:

\begin{matrix} α_{i + 1} & = α_{i} {(1 + {SNR}_{j} d_{j}^{2} \frac{1 - ξ \cdot {\tilde{SNR}}_{j}^{- 1} {\tilde{d}}_{j}^{- 2}}{1 + ξ \cdot {SNR}_{j} \cdot {\tilde{SNR}}_{j}^{- 1} d_{j}^{2} {\tilde{d}}_{j}^{- 2}})}^{- 1} = 2 d_{j}^{- 2} {SNR}_{j}^{- 1} {(1 + \frac{{SNR}_{j} d_{j}^{2}}{Ψ_{1} Ψ_{2}})}^{- i}, \end{matrix}

(A19)

where $Ψ_{1}$, $Ψ_{2}$, and $ξ$ are defined in (19). From (A6)–(A8), we have the following:

\begin{matrix} \Pr ({\tilde{E}}_{N_{t}}) \leq \Pr \{ε_{R, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}})\} + \Pr \{ε_{I, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}})\} \leq \frac{τ}{2 J}, \end{matrix}

(A20)

and let

\begin{matrix} \Pr \{ε_{R, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}})\} = \Pr \{ε_{I, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, I}}})\} = \frac{τ}{4 J} . \end{matrix}

(A21)

Next, we first analyze the term $\Pr \{ε_{R, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}})\} = \frac{τ}{4 J}$, i.e.,

\begin{matrix} \Pr \{ε_{R, N_{t}} \notin [- \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}}, \frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}})\} = 2 Q (\frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}} \cdot \frac{1}{\sqrt{α_{R, N_{t}}}}) = 2 Q (\frac{\sqrt{3}}{2^{N_{t} R_{t, j, R}}} \cdot \frac{1}{\sqrt{\frac{α_{N_{t}}}{2}}}) = \frac{τ}{4 J} . \end{matrix}

(A22)

Substituting (A19) into (A22), we have the following:

\begin{matrix} R_{t, j, R} = \frac{1}{2 N_{t}} log (\frac{3 {SNR}_{j} d_{j}^{2}}{{[Q^{- 1} (\frac{τ}{8 J})]}^{2}} {(1 + \frac{{SNR}_{j} d_{j}^{2}}{Ψ_{1} Ψ_{2}})}^{N_{t} - 1}) . \end{matrix}

(A23)

Analogously, we can show that $R_{t, j, I} = R_{t, j, R}$. The transmission rate of the j-th ($j = 1, \dots, J$) parallel sub-channel in the t-th round is given by the following:

R_{t, j} = R_{t, j, R} + R_{t, j, I} = \frac{1}{N_{t}} log (\frac{3 {SNR}_{j} d_{j}^{2}}{{[Q^{- 1} (\frac{τ}{8 J})]}^{2}} {(1 + \frac{{SNR}_{j} d_{j}^{2}}{Ψ_{1} Ψ_{2}})}^{N_{t} - 1}) .

(A24)
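The recursion in (A19) and the rate in (A24) can be checked numerically. The sketch below makes two assumptions that are not stated in this appendix: logarithms are base 2 (rates in bits per channel use), and the factor $\frac{{SNR}_{j} d_{j}^{2}}{Ψ_{1} Ψ_{2}}$ equals the per-step gain read off from the first equality in (A19), since the definitions of $Ψ_{1}$ and $Ψ_{2}$ are given in (19) and not repeated here. All function and argument names are illustrative.

```python
import math
from statistics import NormalDist


def q_inv(p):
    """Inverse Gaussian Q-function, using Q(x) = Phi(-x)."""
    return -NormalDist().inv_cdf(p)


def step_gain(snr, snr_fb, d, d_fb, xi):
    """Gain g such that alpha_{i+1} = alpha_i / (1 + g), from the first equality in (A19).

    snr, d       -- forward SNR_j and channel gain d_j
    snr_fb, d_fb -- feedback SNR (tilde) and channel gain (tilde d_j)
    """
    a = xi / (snr_fb * d_fb ** 2)
    b = xi * snr * d ** 2 / (snr_fb * d_fb ** 2)
    if a >= 1.0:
        raise ValueError("feedback link too weak for this xi: gamma_i in (A14) is not real")
    return snr * d ** 2 * (1.0 - a) / (1.0 + b)


def error_variances(snr, d, gain, block_length):
    """alpha_1, ..., alpha_{N_t}, starting from alpha_1 = 2 / (d^2 SNR) as in (A19) with i = 0."""
    alphas = [2.0 / (d ** 2 * snr)]
    for _ in range(block_length - 1):
        alphas.append(alphas[-1] / (1.0 + gain))
    return alphas


def subchannel_rate(snr, d, gain, tau, num_subchannels, block_length):
    """R_{t,j} from (A24) in bits per channel use (base-2 logarithm assumed)."""
    q = q_inv(tau / (8.0 * num_subchannels))
    log_arg = math.log2(3.0 * snr * d ** 2 / q ** 2) + (block_length - 1) * math.log2(1.0 + gain)
    return log_arg / block_length


if __name__ == "__main__":
    # Illustrative numbers only: 10 dB forward SNR, 15 dB feedback SNR, unit gains.
    tau, J, N = 1e-6, 4, 10
    xi = q_inv(tau / (8.0 * J * (N - 1))) ** 2 / 3.0
    g = step_gain(snr=10.0, snr_fb=10 ** 1.5, d=1.0, d_fb=1.0, xi=xi)
    print(error_variances(10.0, 1.0, g, N)[-1], subchannel_rate(10.0, 1.0, g, tau, J, N))
```

The variance shrinks by the constant factor $1 / (1 + g)$ per channel use, which is what allows a short block length $N_{t}$ to meet a small target $τ$.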

Combining (A24), (36), and the power allocation in Section 3.1.1, (18) in Theorem 1 is obtained. Then, according to $R = \frac{H (W_{1}^{″}, \dots, W_{T}^{″})}{N}$ in (15), the transmission rate $R_{mimo} (τ, N, δ, D, υ, ϵ)$ is given by the following:

\begin{matrix} R_{mimo} (τ, N, δ, D, υ, ϵ) = \frac{H (W_{1}^{″}, \dots, W_{T}^{″})}{N} \overset{(b)}{=} \frac{\sum_{t = 1}^{T} H (W_{t}^{″})}{N} = \frac{\sum_{t = 1}^{T} N_{t} R_{t}}{N}, \end{matrix}

(A25)

where $N = \sum_{t = 1}^{T} N_{t}$, and (b) is due to the fact that $W_{t}^{'}$ is mapped into the uniformly distributed index $W_{t}^{″}$ in each communication round, which indicates that $(W_{1}^{″}, \dots, W_{T}^{″})$ are independent of each other; hence, (17) in Theorem 1 is obtained. A brief convergence analysis of our FBL approach is given below.

Convergence analysis: Following the convergence proof in [18,22], the WHFL system with our FBL scheme can be shown to converge when the decoding error probability of our FBL scheme is sufficiently small. Furthermore, from (A19), the variance of the estimation error in our FBL approach decays exponentially in the number of channel uses, and hence, from (A22), the decoding error probability decays doubly exponentially. This indicates that the final estimation ${\hat{θ}}_{j, N_{t}}$ converges to $θ_{j}$, and that the coding block length required to achieve the desired decoding error probability is short.

Appendix A.3. Security Analysis

First, note that the eavesdropper’s equivocation rate, $Δ$, can be re-written as follows:

\begin{aligned} Δ & = \frac{H (W_{1}^{″}, \dots, W_{T}^{″} | Z^{N_{1}}, \dots, Z^{N_{T}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo})}{H (W_{1}^{″}, \dots, W_{T}^{″})} \\ & = \frac{\sum_{t = 1}^{T} H (W_{t}^{″} | Z^{N_{1}}, \dots, Z^{N_{T}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo}, W_{1}^{″}, \dots, W_{t - 1}^{″})}{H (W_{1}^{″}, \dots, W_{T}^{″})} \\ & \overset{(c)}{=} \frac{\sum_{t = 1}^{T} H (W_{t}^{″} | Z^{N_{t}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo})}{H (W_{1}^{″}, \dots, W_{T}^{″})} = \frac{\sum_{t = 1}^{T} H (W_{t}^{″} | Z^{N_{t}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo})}{\sum_{t = 1}^{T} H (W_{t}^{″})}, \end{aligned}

(A26)

where (c) follows from the Markov chain $W_{t}^{″} \to (Z^{N_{t}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo}) \to (Z^{N_{1}}, \dots, Z^{N_{t - 1}}, Z^{N_{t + 1}}, \dots, Z^{N_{T}}, W_{1}^{″}, \dots, W_{t - 1}^{″})$. The term $H (W_{t}^{″} | Z^{N_{t}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo})$ in (A26) is lower-bounded as follows:

\begin{aligned} & H (W_{t}^{″} | Z^{N_{t}}, h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo}) \\ & \overset{(e)}{\geq} H (W_{t}^{″} | \underbrace{g_{mimo} X_{1} (t) + {\tilde{g}}_{mimo} {\tilde{X}}_{1} (t) + η_{e, 1} (t)}_{Z_{1} (t)}, \dots, \underbrace{g_{mimo} X_{N_{t} - 1} (t) + {\tilde{g}}_{mimo} {\tilde{X}}_{N_{t} - 1} (t) + η_{e, N_{t} - 1} (t)}_{Z_{N_{t} - 1} (t)}, \\ & \quad \underbrace{g_{mimo} X_{N_{t}} (t) + η_{e, N_{t}} (t)}_{Z_{N_{t}} (t)}, η_{1, 1} (t), \dots, η_{1, N_{t}} (t), η_{2, 1} (t), \dots, η_{2, N_{t}} (t), η_{e, 2} (t), \dots, η_{e, N_{t}} (t), {\tilde{X}}_{1}^{'} (t), \dots, {\tilde{X}}_{N_{t} - 1}^{'} (t), \\ & \quad h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo}, U, Λ, V, \tilde{U}, \tilde{Λ}, \tilde{V}) \\ & \overset{(f)}{=} H (W_{t}^{″} | g_{mimo} V X_{1}^{'} (t) + η_{e, 1} (t), η_{1, 1} (t), \dots, η_{1, N_{t}} (t), η_{2, 1} (t), \dots, η_{2, N_{t}} (t), η_{e, 2} (t), \dots, η_{e, N_{t}} (t), {\tilde{X}}_{1}^{'} (t), \dots, {\tilde{X}}_{N_{t} - 1}^{'} (t), \\ & \quad h_{mimo}, {\tilde{h}}_{mimo}, g_{mimo}, {\tilde{g}}_{mimo}, U, Λ, V, \tilde{U}, \tilde{Λ}, \tilde{V}) \\ & \overset{(g)}{=} H (W_{t}^{″} | g_{mimo} V X_{1}^{'} (t) + η_{e, 1} (t)) \overset{(h)}{=} H (W_{t}^{″}) + h (η_{e, 1} (t)) - h (g_{mimo} V X_{1}^{'} (t) + η_{e, 1} (t)) \\ & \overset{(i)}{\geq} H (W_{t}^{″}) - \underbrace{\log \det (I + \frac{g_{mimo} K_{x_{1}} g_{mimo}^{H}}{σ_{e}^{2}})}_{\text{Information leakage at time 1}}, \end{aligned}

(A27)

where

(e) follows from the fact that conditioning reduces entropy, as shown in (28),

(f) follows from $X_{i}^{'} (t) = V^{H} X_{i} (t)$, ${\tilde{X}}_{i}^{'} (t) = {\tilde{V}}^{H} {\tilde{X}}_{i} (t)$, and the fact that $X_{i}^{'} (t)$ ($i = 2, \dots, N_{t}$) is a function of $h_{mimo}, {\tilde{h}}_{mimo}, η_{1, 1} (t), \dots, η_{1, i - 1} (t), η_{2, 1} (t), \dots, η_{2, i - 1} (t)$,

(g) follows from the fact that ${\tilde{X}}_{i}^{'} (t)$ ($i = 1, \dots, N_{t} - 1$) is only related to $ν_{i}$ [39] [Chapter 4.1, pp. 61–63], and that $h_{mimo}$, ${\tilde{h}}_{mimo}$, $g_{mimo}$, ${\tilde{g}}_{mimo}$, $U$, $Λ$, $V$, $\tilde{U}$, $\tilde{Λ}$, $\tilde{V}$, $η_{1, 1} (t), \dots, η_{1, N_{t}} (t)$, $η_{2, 1} (t), \dots, η_{2, N_{t}} (t)$, $η_{e, 2} (t), \dots, η_{e, N_{t}} (t)$, and $ν_{1}, \dots, ν_{N_{t} - 1}$ are independent of $W_{t}^{″}$, $X_{1}^{'} (t)$, and $η_{e, 1} (t)$,

(h) follows from (33), $X_{j, 1}^{'} (t) = \sqrt{\frac{P_{j}}{2}} θ_{j}$, and the fact that the components of $W_{t}^{″} = (W_{t, 1}^{″}, \dots, W_{t, J}^{″})$ are mapped into $(θ_{1}, \dots, θ_{J})$, respectively,

(i) follows from the following:

\begin{aligned} h (g_{mimo} V X_{1}^{'} (t) + η_{e, 1} (t)) - h (η_{e, 1} (t)) & \leq \log \det (I + \frac{g_{mimo} V E (X_{1}^{'} (t) X_{1}^{' H} (t)) V^{H} g_{mimo}^{H}}{σ_{e}^{2}}) \\ & = \log \det (I + \frac{g_{mimo} E (X_{1} (t) X_{1}^{H} (t)) g_{mimo}^{H}}{σ_{e}^{2}}), \end{aligned}

(A28)

and $K_{x_{1}} = E (X_{1} (t) X_{1}^{H} (t))$. Substituting (A27) into (A26), we have the following:

\begin{matrix} Δ \geq \frac{\sum_{t = 1}^{T} H (W_{t}^{″}) (1 - \frac{\log \det (I + \frac{g_{mimo} K_{x_{1}} g_{mimo}^{H}}{σ_{e}^{2}})}{H (W_{t}^{″})})}{\sum_{t = 1}^{T} H (W_{t}^{″})} \geq \min_{t \in \{1, \dots, T\}} (1 - \frac{\log \det (I + \frac{g_{mimo} K_{x_{1}} g_{mimo}^{H}}{σ_{e}^{2}})}{H (W_{t}^{″})}) . \end{matrix}

(A29)

From (8), (35), and (A29), $Δ \geq δ$ in (15) is guaranteed if we have the following:

\begin{matrix} σ^{2} \geq \underbrace{\max_{t \in \{1, \dots, T\}} [\frac{D \cdot 2^{\frac{2}{q (1 - δ)} \log \det (I + \frac{g_{mimo} K_{x_{1}} g_{mimo}^{H}}{σ_{e}^{2}})} - S_{ℓ} σ_{w, t}^{2}}{K}]}_{\text{Secrecy level of PLS}} . \end{matrix}

(A30)
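The leakage term in (A27), the equivocation bound in (A29), and the noise-variance floor in (A30) can be evaluated as follows. This is a rough sketch using only the Python standard library, assuming base-2 logarithms; the helper names are hypothetical, and the channel matrix, covariance, and entropies supplied by a caller are placeholders rather than values from the paper.

```python
import math


def det(matrix):
    """Determinant of a small square (complex) matrix by Gaussian elimination with partial pivoting."""
    a = [[complex(x) for x in row] for row in matrix]
    n = len(a)
    result = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def leakage_bits(g, k_x1, sigma_e_sq):
    """log2 det(I + g K_{x_1} g^H / sigma_e^2), the leakage term in (A27)."""
    rows, cols = len(g), len(g[0])
    gk = [[sum(g[r][m] * k_x1[m][c] for m in range(cols)) for c in range(cols)] for r in range(rows)]
    m = [[sum(gk[r][c] * g[s][c].conjugate() for c in range(cols)) / sigma_e_sq for s in range(rows)]
         for r in range(rows)]
    for r in range(rows):
        m[r][r] += 1.0
    # The matrix is Hermitian positive definite, so its determinant is real and positive.
    return math.log2(det(m).real)


def equivocation_lower_bound(leak, entropies):
    """Right-hand side of (A29): min_t (1 - leak / H(W_t''))."""
    return min(1.0 - leak / h for h in entropies)


def ldp_noise_floor(leak, big_d, q, delta, s_l, sigma_w_sq, k):
    """Right-hand side of (A30), maximised over the per-round variances sigma_{w,t}^2."""
    scale = big_d * 2.0 ** (2.0 * leak / (q * (1.0 - delta)))
    return max((scale - s_l * v) / k for v in sigma_w_sq)


if __name__ == "__main__":
    g = [[1.0, 0.0], [0.0, 1.0]]     # placeholder eavesdropper channel
    k_x1 = [[3.0, 0.0], [0.0, 3.0]]  # placeholder input covariance
    leak = leakage_bits(g, k_x1, sigma_e_sq=1.0)
    print(leak, equivocation_lower_bound(leak, [8.0, 16.0]))
```

Because the exponent in (A30) scales with $\frac{1}{q (1 - δ)}$, pushing $δ$ toward 1 sharply raises the LDP noise variance required for a given leakage.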

Under the assumption that the edge server has only imperfect CSI of the eavesdropper’s channel, combining (A30) and Definition 3, (A30) can be rewritten as follows:

\begin{matrix} σ^{2} \geq \underbrace{\max_{t \in \{1, \dots, T\}, Δ g_{mimo}} [\frac{D \cdot 2^{\frac{2}{q (1 - δ)} \log \det (I + \frac{{\hat{g}}_{mimo} K_{x_{1}} {\hat{g}}_{mimo}^{H}}{σ_{e}^{2}})} - S_{ℓ} σ_{w, t}^{2}}{K}]}_{\text{Secrecy level of PLS}}, \end{matrix}

(A31)

where ${\hat{g}}_{mimo} = g_{mimo} - Δ g_{mimo}$. Then, combining (A3) and (A31), (16) in Theorem 1 is obtained.

The proof of Theorem 1 is completed.

References

1. Zhu, G.; Liu, D.; Du, Y.; You, C.; Zhang, J.; Huang, K. Toward an Intelligent Edge: Wireless Communication Meets Machine Learning. IEEE Commun. Mag. 2020, 58, 19–25.
2. Yang, Z.; Chen, M.; Saad, W.; Hong, C.S.; Shikh-Bahaei, M. Energy Efficient Federated Learning Over Wireless Communication Networks. IEEE Trans. Wireless Commun. 2021, 20, 1935–1949.
3. Amiri, M.M.; Gündüz, D. Federated Learning Over Wireless Fading Channels. IEEE Trans. Wireless Commun. 2020, 19, 3546–3557.
4. Jin, R.; He, X.; Dai, H. Communication Efficient Federated Learning with Energy Awareness Over Wireless Networks. IEEE Trans. Wireless Commun. 2022, 21, 5204–5219.
5. Zhu, G.; Wang, Y.; Huang, K. Broadband Analog Aggregation for Low-Latency Federated Edge Learning. IEEE Trans. Wireless Commun. 2020, 19, 491–506.
6. Zhu, G.; Du, Y.; Gündüz, D.; Huang, K. One-Bit Over-the-Air Aggregation for Communication-Efficient Federated Edge Learning: Design and Convergence Analysis. IEEE Trans. Wireless Commun. 2021, 20, 2120–2135.
7. Elgabli, A.; Park, J.; Issaid, C.B.; Bennis, M. Harnessing Wireless Channels for Scalable and Privacy-Preserving Federated Learning. IEEE Trans. Commun. 2021, 69, 5194–5208.
8. Wen, H.; Wu, Y.; Yang, C.; Duan, H.; Yu, S. A Unified Federated Learning Framework for Wireless Communications: Towards Privacy, Efficiency, and Security. In Proceedings of the 2020 IEEE INFOCOM Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; pp. 653–658.
9. Seif, M.; Tandon, R.; Li, M. Wireless Federated Learning with Local Differential Privacy. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 2604–2609.
10. Yuan, X.; Ni, W.; Ding, M.; Wei, K.; Li, J.; Poor, H.V. Amplitude-Varying Perturbation for Balancing Privacy and Utility in Federated Learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1884–1897.
11. Kim, M.; Günlü, O.; Schaefer, R.F. Federated Learning with Local Differential Privacy: Trade-Offs Between Privacy, Utility, and Communication. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Process (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 2650–2654.
12. Zhou, J.; Su, Z.; Ni, J.; Wang, Y.; Pan, Y.; Xing, R. Personalized Privacy-Preserving Federated Learning: Optimized Trade-off Between Utility and Privacy. In Proceedings of the 2022 IEEE Global Communications Conference (GLOBECOM), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 4872–4877.
13. Guo, S.; Su, Z.; Tian, Z.; Yu, S. Utility-Aware Privacy-Preserving Federated Learning through Information Bottleneck. In Proceedings of the 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Wuhan, China, 9–11 December 2022; pp. 680–686.
14. Wang, B.; Chen, Y.; Jiang, H.; Zhao, Z. PPeFL: Privacy-Preserving Edge Federated Learning with Local Differential Privacy. IEEE Internet Things J. 2023, 10, 15488–15500.
15. Zhang, N.; Tao, M. Gradient Statistics Aware Power Control for Over-the-Air Federated Learning. IEEE Trans. Wireless Commun. 2021, 20, 5115–5128.
16. Liu, D.; Simeone, O. Privacy for Free: Wireless Federated Learning via Uncoded Transmission with Adaptive Power Control. IEEE J. Sel. Areas Commun. 2021, 39, 170–185.
17. Yang, K.; Jiang, T.; Shi, Y.; Ding, Z. Federated Learning via Over-the-Air Computation. IEEE Trans. Wireless Commun. 2020, 19, 2022–2035.
18. Liu, L.; Zhang, J.; Song, S.H.; Letaief, K.B. Client-Edge-Cloud Hierarchical Federated Learning. In Proceedings of the 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
19. Luo, S.; Chen, X.; Wu, Q.; Zhou, Z.; Yu, S. HFEL: Joint Edge Association and Resource Allocation for Cost-Efficient Hierarchical Federated Edge Learning. IEEE Trans. Wireless Commun. 2020, 19, 6535–6548.
20. Liu, S.; Yu, G.; Chen, X.; Bennis, M. Joint User Association and Resource Allocation for Wireless Hierarchical Federated Learning with IID and Non-IID Data. IEEE Trans. Wireless Commun. 2022, 21, 7852–7866.
21. Wen, W.; Chen, Z.; Yang, H.H.; Xia, W.; Quek, T.Q.S. Joint Scheduling and Resource Allocation for Hierarchical Federated Edge Learning. IEEE Trans. Wireless Commun. 2022, 21, 5857–5872.
22. Shi, L.; Shu, J.; Zhang, W.; Liu, Y. HFL-DP: Hierarchical Federated Learning with Differential Privacy. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–7.
23. Wainakh, A.; Guinea, A.S.; Grube, T.; Mühlhäuser, M. Enhancing Privacy via Hierarchical Federated Learning. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Genoa, Italy, 7–11 September 2020; pp. 344–347.
24. Feng, C.; Yang, H.H.; Hu, D.; Zhao, Z.; Quek, T.Q.S.; Min, G. Mobility-Aware Cluster Federated Learning in Hierarchical Wireless Networks. IEEE Trans. Wireless Commun. 2022, 21, 8441–8458.
25. Wyner, A.D. The Wire-Tap Channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
26. Yao, J.; Ansari, N. Secure Federated Learning by Power Control for Internet of Drones. IEEE Trans. Cognitive Commun. Netw. 2021, 7, 1021–1031.
27. Wang, T.; Li, Y.; Wu, Y.; Quek, T.Q.S. Secrecy driven Federated Learning via Cooperative Jamming: An Approach of Latency Minimization. IEEE Trans. Emerg. Topics Comput. 2021, 10, 1687–1703.
28. Qian, L.; Wu, W.; Lu, W.; Wu, Y.; Lin, B.; Quek, T.Q.S. Secrecy-Based Energy-Efficient Mobile Edge Computing via Cooperative Non-Orthogonal Multiple Access Transmission. IEEE Trans. Commun. 2021, 69, 4659–4677.
29. Yan, Z.; Li, D.; Zhang, Z.; He, J. Accuracy-Security Tradeoff with Balanced Aggregation and Artificial Noise for Wireless Federated Learning. IEEE Internet Things J. 2023, 10, 18154–18167.
30. Zhang, H.; Yang, C.; Dai, B. When Wireless Federated Learning Meets Physical Layer Security: The Fundamental Limits. In Proceedings of the IEEE INFOCOM Computer Communications Workshops (INFOCOM WKSHPS), New York, NY, USA, 2–5 May 2022; pp. 1–6.
31. Durisi, G.; Koch, T.; Popovski, P. Toward Massive, Ultrareliable, and Low-Latency Wireless Communication with Short Packets. Proc. IEEE 2016, 104, 1711–1726.
32. Polyanskiy, Y.; Poor, H.V.; Verdu, S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
33. She, C.; Dong, R.; Gu, Z.; Hou, Z.; Li, Y.; Hardjawana, W.; Vucetic, B.; Song, L.; Yang, C. Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks. IEEE Netw. 2020, 34, 219–225.
34. Samarakoon, S.; Bennis, M.; Saad, W.; Debbah, M. Distributed Federated Learning for Ultra-Reliable Low-Latency Vehicular Communications. IEEE Trans. Commun. 2020, 68, 1146–1159.
35. Schalkwijk, J.; Kailath, T. A coding scheme for additive noise channels with feedback–I: No bandwidth constraint. IEEE Trans. Inf. Theory 1966, 12, 172–182.
36. Gunduz, D.; Brown, D.R.; Poor, H.V. Secret communication with feedback. In Proceedings of the 2008 International Symposium on Information Theory and Its Applications (ISITA), Auckland, New Zealand, 7–10 December 2008; pp. 1–6.
37. Truong, L.V.; Fong, S.L.; Tan, V.Y.F. On Gaussian Channels with Feedback Under Expected Power Constraints and with Non-Vanishing Error Probabilities. IEEE Trans. Inf. Theory 2017, 63, 1746–1765.
38. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 303–318.
39. Zamir, R. Lattice Coding for Signals and Network; Cambridge University Press: Cambridge, UK, 2014.
40. Zhi, K.; Pan, C.; Ren, H.; Wang, K.; Elkashlan, M.; Di Renzo, M.; Hanzo, L.; Schober, R.; Wang, J. Two-Timescale Design for Reconfigurable Intelligent Surface-Aided Massive MIMO Systems with Imperfect CSI. IEEE Trans. Inf. Theory 2022, 69, 3001–3033.
41. Schiessl, S.; Al-Zubaidy, H.; Skoglund, M.; Gross, J. Delay Performance of Wireless Communications with Imperfect CSI and Finite-Length Coding. IEEE Trans. Commun. 2018, 66, 6527–6541.
42. Chen, Z.J.; Hernandez, E.E.; Huang, Y.C.; Rini, S. DNN gradient lossless compression: Can GenNorm be the answer? In Proceedings of the 2022 IEEE International Conference on Communications (ICC), Seoul, Republic of Korea, 16–20 May 2022; pp. 407–412.
43. Wang, W.; Ying, L.; Zhang, J. On the Relation Between Identifiability, Differential Privacy, and Mutual-Information Privacy. IEEE Trans. Inf. Theory 2016, 62, 5018–5029.
44. Sankar, L.; Rajagopalan, S.R.; Poor, H.V. Utility-Privacy Tradeoffs in Databases: An Information-Theoretic Approach. IEEE Trans. Inf. Forensics Secur. 2013, 8, 838–852.
45. Gamal, A.A.E.; Kim, Y.-H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011.
46. Han, S.; Xu, X.; Fang, S.; Sun, Y.; Cao, Y.; Tao, X.; Zhang, P. Energy Efficient Secure Computation Offloading in NOMA-Based mMTC Networks for IoT. IEEE Internet Things J. 2019, 6, 5674–5690.
47. Ng, D.W.K.; Lo, E.S.; Schober, R. Robust Beamforming for Secure Communication in Systems with Wireless Information and Power Transfer. IEEE Trans. Wireless Commun. 2014, 13, 4599–4615.
48. Tekin, E.; Yener, A. The Gaussian Multiple Access Wire-Tap Channel. IEEE Trans. Inf. Theory 2008, 54, 5747–5755.
49. Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
50. Welch, T.A. A technique of high-performance data compression. IEEE Comput. 1984, 17, 8–19.

Figure 1. The multi-antenna WHFL in the presence of PLS.

Figure 2. An information-theoretic model of the WHFL system, where the edge server, cloud server and eavesdroppers are equipped with A, B, and C antennas, respectively ($A \geq 1, B \geq 1, C \geq 1$).

Figure 3. A schematic diagram of the FBL approach for the WHFL over the MIMO channel.

Figure 4. Comparison of the message mapping methods between the classical SK scheme and the scheme in this paper. (a) Message mapping of classical SK scheme. (b) Message mapping in this paper.

Figure 5. Comparing the mechanisms between the classical SK scheme and the two-dimensional MLO-based SK-type scheme, where ${\hat{θ}}_{i - 1}$ represents the estimation of the transmitted message $θ$ at time $i - 1$. (a) The classical SK scheme in a certain round i. (b) The two-dimensional MLO-based SK-type scheme in a certain round i.

Figure 6. A schematic diagram of the FBL approach for the SIMO WHFL.

Figure 7. Performance comparison between the different schemes on the MNIST dataset ($K = 10$, $S_{ℓ} = 60,000$, $q = 15,910$, $D = 10^{- 4}$, $\tilde{SNR} = 15$ dB, $P = 10$, $τ = 10^{- 6}$, $σ_{1}^{2} = σ_{2}^{2} = 1, σ_{e}^{2} = 2$). (a) $A = B = C = 4, δ = 0.99994$. (b) $A = 1, B = C = 4, δ = 0.99997$. (c) $A = 4, B = C = 1, δ = 0.99997$. (d) $A = B = C = 1, δ = 0.99998$.

Figure 8. Performance comparison between the different schemes on the MNIST dataset ($K = 10$, $S_{ℓ} = 60,000$, $q = 15,910$, $D = 10^{- 4}$, $\tilde{SNR} = 15$ dB, $P = 10$, $τ = 10^{- 6}$, $σ_{1}^{2} = σ_{2}^{2} = 1, σ_{e}^{2} = 2$). (a) $A = B = C = 4, δ = 0.99994$. (b) $A = 1, B = C = 4, δ = 0.99997$. (c) $A = 4, B = C = 1, δ = 0.99997$. (d) $A = B = C = 1, δ = 0.99998$.

Figure 9. Transmission latency (200 rounds) of the different schemes on the MNIST dataset ($K = 10$, $S_{ℓ} = 60,000$, $q = 15,910$, $D = 10^{- 4}$, $τ = 10^{- 6}$, $\tilde{SNR} = 15$ dB, $σ_{1}^{2} = 1, σ_{2}^{2} = 1, σ_{e}^{2} = 2$, $T = 200$). (a) $A = B = C = 4, δ = 0.99994$. (b) $A = 1, B = C = 4, δ = 0.99997$. (c) $A = 4, B = C = 1, δ = 0.99997$. (d) $A = B = C = 1, δ = 0.99998$.

Figure 10. Transmission latency (200 rounds) of our schemes under different feedback channel SNR and perfect CSI on the MNIST dataset ($K = 10$, $S_{ℓ} = 60,000$, $q = 15,910$, $P = 10$, $D = 10^{- 4}$, $τ = 10^{- 6}$, $σ_{1}^{2} = 1, σ_{2}^{2} = 1, σ_{e}^{2} = 2$, $T = 200$).

Figure 11. The relationship between the PLS (secrecy level), the privacy-utility, and LDP noise variance of proposed FBL schemes on the MNIST dataset ($K = 10$, $S_{ℓ} = 60,000$, $q = 15,910$, $D = 10^{- 4}$, $\tilde{SNR} = 15$ dB, $P = 10$, $τ = 10^{- 6}$, $σ_{1}^{2} = σ_{2}^{2} = 1, σ_{e}^{2} = 2$). (a) $A = B = C = 4$. (b) $A = B = C = 4$. (c) $A = B = C = 4$. (d) $A = 1, B = C = 4$. (e) $A = 1, B = C = 4$. (f) $A = 1, B = C = 4$. (g) $A = 4, B = C = 1$. (h) $A = 4, B = C = 1$. (i) $A = 4, B = C = 1$. (j) $A = B = C = 1$. (k) $A = B = C = 1$. (l) $A = B = C = 1$.

Table 1. Summary of existing results on WFL with respect to privacy, utility, PLS, and URLLC.

Related Work	Privacy	Utility	PLS	Relationship between PLS, Privacy, and Utility	URLLC
[7,8,9,16,23]	✓	−	−	−	−
[10,11,12,13,14,22]	✓	✓	−	Relationship between Privacy and Utility	−
[26,27,28]	−	−	✓	−	−
[29]	−	✓	✓	Relationship between PLS and Utility	−
[30]	✓	✓	✓	Relationship between PLS-Privacy-Utility	−
[33,34]	−	−	−	−	✓
This Work	✓	✓	✓	Relationship between PLS-Privacy-Utility	✓

Table 2. Achievable secrecy rates of the different schemes on the MNIST dataset ($K = 10$, $S_{ℓ} = 60,000$, $q = 15,910$, $\tilde{SNR} = 15$ dB, $P = 10$, $τ = 10^{- 6}$, $T = 200$, $D = 10^{- 4}$, $σ_{1}^{2} = σ_{2}^{2} = 1, σ_{e}^{2} = 2$).

Scheme / Number of Antennas	$A = B = C = 4$ (MIMO)	$A = 1, B = C = 4$ (SIMO)	$A = 4, B = C = 1$ (MISO)	$A = B = C = 1$ (SISO)
Our scheme (Perfect CSI)	$10.3951$ bits/symbol ($ϵ = 0.04$, $υ = 3$, $σ^{2} = 0.3$, $δ = 0.99994$)	$4.9928$ bits/symbol ($ϵ = 0.025$, $υ = 5$, $σ^{2} = 0.5$, $δ = 0.99997$)	$5.1346$ bits/symbol ($ϵ = 0.018$, $υ = 6.5$, $σ^{2} = 0.65$, $δ = 0.99997$)	$2.6718$ bits/symbol ($ϵ = 0.032$, $υ = 4$, $σ^{2} = 0.4$, $δ = 0.99998$)
Our scheme (Imperfect CSI, $Ω = 0.2$)	$10.3941$ bits/symbol ($ϵ = 0.025$, $υ = 5$, $σ^{2} = 0.5$, $δ = 0.99994$)	$4.9922$ bits/symbol ($ϵ = 0.012$, $υ = 10$, $σ^{2} = 1$, $δ = 0.99997$)	$5.1343$ bits/symbol ($ϵ = 0.013$, $υ = 9$, $σ^{2} = 0.9$, $δ = 0.99997$)	$2.6708$ bits/symbol ($ϵ = 0.01$, $υ = 12$, $σ^{2} = 1.2$, $δ = 0.99998$)
Our scheme (Imperfect CSI, $Ω = 0.4$)	$10.3898$ bits/symbol ($ϵ = 0.0025$, $υ = 50$, $σ^{2} = 5$, $δ = 0.99994$)	$4.9914$ bits/symbol ($ϵ = 0.005$, $υ = 24$, $σ^{2} = 2.4$, $δ = 0.99997$)	$5.1338$ bits/symbol ($ϵ = 0.0065$, $υ = 18.5$, $σ^{2} = 1.85$, $δ = 0.99997$)	$2.6689$ bits/symbol ($ϵ = 8.2 \times 10^{-4}$, $υ = 150$, $σ^{2} = 15$, $δ = 0.99998$)
Baseline 1 [26,28] (Perfect CSI)	$4.2827$ bits/symbol ($ϵ = 0.04$, $υ = 3$, $σ^{2} = 0.3$)	$2.1537$ bits/symbol ($ϵ = 0.025$, $υ = 5$, $σ^{2} = 0.5$)	$2.2124$ bits/symbol ($ϵ = 0.018$, $υ = 6.5$, $σ^{2} = 0.65$)	$0.8046$ bits/symbol ($ϵ = 0.032$, $υ = 4$, $σ^{2} = 0.4$)
Baseline 2 [29] (Perfect CSI)	$5.2228$ bits/symbol ($ϵ = 0.04$, $υ = 3$, $σ^{2} = 0.3$)	$2.6265$ bits/symbol ($ϵ = 0.025$, $υ = 5$, $σ^{2} = 0.5$)	$2.6981$ bits/symbol ($ϵ = 0.018$, $υ = 6.5$, $σ^{2} = 0.65$)	$0.9812$ bits/symbol ($ϵ = 0.032$, $υ = 4$, $σ^{2} = 0.4$)
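
To make the comparison in Table 2 easier to read, the following minimal sketch computes the ratio of the proposed scheme's achievable secrecy rate to that of each baseline under perfect CSI. The numbers are copied from the table; the dictionary layout and the helper name `gain_over_baselines` are illustrative assumptions and are not part of the original work.

```python
# Secrecy rates in bits/symbol, copied from the perfect-CSI rows of Table 2.
# The data layout and function name below are illustrative assumptions.
rates = {
    "MIMO": {"ours": 10.3951, "baseline1": 4.2827, "baseline2": 5.2228},
    "SIMO": {"ours": 4.9928, "baseline1": 2.1537, "baseline2": 2.6265},
    "MISO": {"ours": 5.1346, "baseline1": 2.2124, "baseline2": 2.6981},
    "SISO": {"ours": 2.6718, "baseline1": 0.8046, "baseline2": 0.9812},
}


def gain_over_baselines(row):
    """Return the ratio of the proposed scheme's rate to each baseline's rate."""
    return {
        name: row["ours"] / rate
        for name, rate in row.items()
        if name != "ours"
    }


gains = {setup: gain_over_baselines(row) for setup, row in rates.items()}

for setup, g in gains.items():
    print(
        f"{setup}: x{g['baseline1']:.2f} vs Baseline 1, "
        f"x{g['baseline2']:.2f} vs Baseline 2"
    )
```

A ratio above 1 in every antenna configuration indicates that, for the parameters listed in the caption, the proposed scheme attains a higher secrecy rate than both baselines.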

Table 3. Achievable secrecy rates of our schemes under different feedback channel SNR on the MNIST dataset (K = 10, S_{ℓ} = 60,000, q = 15,910, P = 10, τ = 10^{-6}, T = 200, D = 10^{-4}, σ_{1}^{2} = σ_{2}^{2} = 1, σ_{e}^{2} = 2).

Feedback SNR / Number of Antennas	$A = B = C = 4$ (MIMO) ($ϵ = 0.04$, $υ = 3$, $σ^{2} = 0.3$, $δ = 0.99994$)	$A = 1, B = C = 4$ (SIMO) ($ϵ = 0.025$, $υ = 5$, $σ^{2} = 0.5$, $δ = 0.99997$)	$A = 4, B = C = 1$ (MISO) ($ϵ = 0.018$, $υ = 6.5$, $σ^{2} = 0.65$, $δ = 0.99997$)	$A = B = C = 1$ (SISO) ($ϵ = 0.032$, $υ = 4$, $σ^{2} = 0.4$, $δ = 0.99998$)
$\tilde{SNR} = 10$ dB (Perfect CSI)	$8.0911$ bits/symbol	$3.6612$ bits/symbol	$3.7589$ bits/symbol	$2.2885$ bits/symbol
$\tilde{SNR} = 15$ dB (Perfect CSI)	$10.3951$ bits/symbol	$4.9928$ bits/symbol	$5.1346$ bits/symbol	$2.6718$ bits/symbol
$\tilde{SNR} = 20$ dB (Perfect CSI)	$12.5581$ bits/symbol	$6.1146$ bits/symbol	$6.3177$ bits/symbol	$3.0622$ bits/symbol
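
The trend in Table 3 can be summarized by the increase in secrecy rate for each 5 dB step of feedback channel SNR. The sketch below only rearranges the tabulated values; the variable names and the helper `increments` are assumptions introduced for illustration.

```python
# Feedback channel SNR values (dB) and secrecy rates (bits/symbol) from Table 3.
# Names and layout are illustrative assumptions, not taken from the paper.
snr_db = [10, 15, 20]
rates_by_setup = {
    "MIMO": [8.0911, 10.3951, 12.5581],
    "SIMO": [3.6612, 4.9928, 6.1146],
    "MISO": [3.7589, 5.1346, 6.3177],
    "SISO": [2.2885, 2.6718, 3.0622],
}


def increments(values):
    """Return the differences between consecutive entries of a list."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


steps = {setup: increments(r) for setup, r in rates_by_setup.items()}

for setup, deltas in steps.items():
    for (lo, hi), delta in zip(zip(snr_db, snr_db[1:]), deltas):
        print(f"{setup}: {lo} dB -> {hi} dB adds {delta:.4f} bits/symbol")
```

Every increment is positive, so in these tabulated results the achievable secrecy rate grows with the feedback channel SNR in all four antenna configurations.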

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

