**Information Theoretic Methods for Future Communication Systems**

Editors

**Onur Günlü • Rafael F. Schaefer • Holger Boche • H. Vincent Poor**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Onur Günlü, Linköping University, Sweden

H. Vincent Poor Princeton University USA

Rafael F. Schaefer, Technische Universität Dresden, Germany

Holger Boche, Technische Universität München, Germany

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Entropy* (ISSN 1099-4300) (available at: https://www.mdpi.com/journal/entropy/special_issues/future_communication).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-7364-9 (Hbk) ISBN 978-3-0365-7365-6 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**


### **About the Editors**

### **Onur Günlü**

Onur Günlü received a B.Sc. degree (Highest Distinction) in Electrical and Electronics Engineering from Bilkent University, Turkey in 2011, and M.Sc. (Highest Distinction) and Dr.-Ing. (Ph.D. equivalent) degrees in Communications Engineering, both from the Technical University of Munich (TUM), Germany, in October 2013 and November 2018, respectively. He was a Working Student in the Communication Systems division of Intel Mobile Communications (IMC) in Munich, Germany between November 2012 and March 2013. He worked as a Research and Teaching Assistant at the TUM Chair of Communications Engineering between February 2014 and May 2019. Onur has made many research visits to top universities and companies, including visits to the TU Eindhoven, Netherlands and later to the Georgia Institute of Technology, USA. He was a Research Associate and Dozent between June 2019 and September 2020 and was a Research Group Leader and Dozent between October 2020 and March 2021 at TU Berlin, and he held the same academic titles at the Chair of Communications Engineering and Security at the University of Siegen, Germany from April 2021 until September 2022. Onur has been working as an ELLIIT Assistant Professor at Linköping University, Sweden, as the head of the Information Theory and Security (ITS) group within the Information Coding Division (ICG) since October 2022. He has received the prestigious VDE Information Technology Society (ITG) 2021 Johann-Philipp-Reis Award, was selected by the IEEE Communications Society as a 2021 Exemplary Reviewer of the IEEE Transactions on Communications (TCOM), and received the 2023 ZENITH Research and Career Development Award. His research interests include information-theoretic privacy and security, coding theory, statistical signal processing for biometrics and physical unclonable functions (PUFs), private (federated) learning and function computations, and doubly exponential (secure) identification via channels.

### **Rafael F. Schaefer**

Rafael F. Schaefer is a Professor and head of the Chair of Information Theory and Machine Learning at Technische Universität Dresden. He received a Dipl.-Ing. degree in Electrical Engineering and Computer Science from the Technische Universität Berlin, Germany, in 2007, and a Dr.-Ing. degree in Electrical Engineering from the Technische Universität München, Germany, in 2012. From 2013 to 2015, he was a Post-Doctoral Research Fellow with Princeton University. From 2015 to 2020 he was an Assistant Professor with the Technische Universität Berlin, Germany, and from 2021 to 2022 a Professor with the Universität Siegen, Germany. Among his publications is the recent book *Information Theoretic Security and Privacy of Information Systems* (Cambridge University Press, 2017). He was a recipient of the VDE Johann-Philipp-Reis Award in 2013. He received the best paper award from the German Information Technology Society (ITG-Preis) in 2016. He is currently an Associate Editor of the *IEEE Transactions on Information Forensics and Security* and of the *IEEE Transactions on Communications*. He is a member of the IEEE Information Forensics and Security Technical Committee.

### **Holger Boche**

Holger Boche received a Dipl.-Ing. degree in Electrical Engineering, a graduate degree in Mathematics, and a Dr.-Ing. degree in Electrical Engineering from the Technische Universität Dresden, Germany, in 1990, 1992, and 1994, respectively. In 1998, he received the Dr. rer. nat. degree in Pure Mathematics from the Technische Universität Berlin, Germany. From 2002 to 2010, he was a Full Professor in Mobile Communication Networks with the Institute for Communications Systems, Technische Universität Berlin, Germany. In 2004, he became the Director of the Fraunhofer Institute for Telecommunications (HHI). He is currently a Full Professor at the Institute of Theoretical Information Technology, Technische Universität München, Germany, which he joined in October 2010. Since 2014, Prof. Boche has been a member and Honorary Fellow of the TUM Institute for Advanced Study, Munich, Germany, and since 2018 has been a Founding Director of the Center for Quantum Engineering, Technische Universität München, Germany. Since 2021, he has been jointly leading the BMBF Research Hub 6G-life with Frank Fitzek. He was elected a member of the German Academy of Sciences (Leopoldina) in 2008 and to the Berlin Brandenburg Academy of Sciences and Humanities in 2009. He is a recipient of the Research Award "Technische Kommunikation" from the Alcatel SEL Foundation in October 2003, the "Innovation Award" from the Vodafone Foundation in June 2006, and the Gottfried Wilhelm Leibniz Prize from the Deutsche Forschungsgemeinschaft (German Research Foundation) in 2008. He was a co-recipient of the 2006 IEEE Signal Processing Society Best Paper Award and a recipient of the 2007 IEEE Signal Processing Society Best Paper Award. He was General Chair of the Symposium on Information Theoretic Approaches to Security and Privacy at IEEE GlobalSIP 2016.

### **H. Vincent Poor**

H. Vincent Poor received his Ph.D. in EECS from Princeton University in 1977. From 1977 until 1990 he was on the faculty of the University of Illinois at Urbana-Champaign. Since 1990, he has been on the faculty at Princeton, where he is currently the Michael Henry Strater University Professor. From 2006 to 2016, he served as the Dean of Princeton's School of Engineering and Applied Science. He has also held visiting appointments at several other universities, including most recently at Berkeley and Cambridge. His research interests are in the areas of information theory, machine learning, and network science, and their applications in wireless networks, energy systems, and related fields. Among his publications in these areas is the recent book *Machine Learning and Wireless Communications* (Cambridge University Press, 2022). Dr. Poor is a member of the U.S. National Academy of Engineering and the U.S. National Academy of Sciences, and is a foreign member of the Chinese Academy of Sciences, the Royal Society, and other national and international academies. He received the IEEE Alexander Graham Bell Medal in 2017.

### **Preface to "Information Theoretic Methods for Future Communication Systems"**

Information theory provides powerful tools that can help to eliminate bottlenecks in future communication and computation systems. Eliminating such bottlenecks requires low latency operations with large amounts of data to take advantage of data-driven methods for improving services and providing reliability and other benefits. A collection of highly significant results, provided in this book, shows how information theory can provide a fundamental understanding of the limits of the reliability, robustness, secrecy, privacy, resiliency, and latency of such systems. Thus, we are happy to share these fundamental insights to contribute to the research and development of future systems.

> **Onur Günlü, Rafael F. Schaefer, Holger Boche, and H. Vincent Poor** *Editors*

### *Editorial* **Information Theoretic Methods for Future Communication Systems**

**Onur Günlü 1,\*, Rafael F. Schaefer 2,3, Holger Boche 4,5,6,7 and H. Vincent Poor <sup>8</sup>**


It is anticipated that future communication systems will involve the use of new technologies, requiring high-speed computations using large amounts of data, in order to take advantage of data-driven methods for improving services and providing reliability and other benefits. In many cases, information theory can provide a fundamental understanding of the limits to the reliability, robustness, secrecy, privacy, resiliency, and latency of such systems. The aim of this Featured Special Issue has been to develop a collection of top information and coding theoretic results that provide insight into future communication and computation systems.

The top-notch quality contributions to this Featured Special Issue consist of 11 articles, one of which is a review article. The topics touched upon include a multi-layer grant-free transmission method [1], a direct transform-coding approach that maps the delay-Doppler domain to the time domain [2], degree-of-freedom bounds for multi-antenna, multi-user, and frequency-selective interference channels with an instantaneous relay with or without coordination [3], new coded caching methods to reduce latency with user cooperation and simultaneous transmission [4], and a low-resolution downlink precoding method for multi-input single-output channels with orthogonal frequency-division multiplexing [5]. Furthermore, machine learning methods are discussed in the context of knowledge graphs for semantic communications [6] and in a review of the state-of-the-art coding methods for large-scale distributed machine learning [7]. Focusing on coding theory over rings, a new weight that extends the traditional Hamming weight used for algebraic structures is proposed and its properties are analyzed in [8]. Moreover, security aspects for future communication and computation systems are considered to analyze Gaussian wiretap channels with a jammer that overhears the transmissions [9], to propose new polynomial codes that enable straggler-tolerant secure matrix multiplication [10], and to illustrate the private-key rate regimes observed when reconstructing source sequences at another node with side information under privacy and security constraints [11]. It is expected that these contributions will have a significant impact on the applications of information and coding theory to future communication and computation systems.

**Acknowledgments:** The Guest Editors are grateful to all authors, anonymous reviewers, and the *Entropy* Editors for their great contributions to this Featured Special Issue. Our work was partially supported by the German Federal Ministry of Education and Research (BMBF) within the national initiative on 6G Communication Systems through the research hub *6G-life* under Grants 16KISK001K and 16KISK002, which motivated and greatly assisted the Guest Editors in putting together this Featured Special Issue. Moreover, this Featured Special Issue was also supported by the ZENITH Research and Career Development Fund, ELLIIT funding endowed by the Swedish government, and U.S. National Science Foundation (NSF) Grant No. CCF-1908308.

**Citation:** Günlü, O.; Schaefer, R.F.; Boche, H.; Poor, H.V. Information Theoretic Methods for Future Communication Systems. *Entropy* **2023**, *25*, 392. https://doi.org/10.3390/e25030392

Received: 13 February 2023 Accepted: 15 February 2023 Published: 21 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Low-Resolution Precoding for Multi-Antenna Downlink Channels and OFDM †**

**Andrei Stefan Nedelcu 1, Fabian Steiner <sup>2</sup> and Gerhard Kramer 2,\***


**Abstract:** Downlink precoding is considered for multi-path multi-input single-output channels where the base station uses orthogonal frequency-division multiplexing and low-resolution signaling. A quantized coordinate minimization (QCM) algorithm is proposed and its performance is compared to other precoding algorithms including squared infinity-norm relaxation (SQUID), multi-antenna greedy iterative quantization (MAGIQ), and maximum safety margin precoding. MAGIQ and QCM achieve the highest information rates and QCM has the lowest complexity measured in the number of multiplications. The information rates are computed for pilot-aided channel estimation and data-aided channel estimation. Bit error rates for a 5G low-density parity-check code confirm the information-theoretic calculations. Simulations with imperfect channel knowledge at the transmitter show that the performance of QCM and SQUID degrades similarly to that of zero-forcing precoding with high-resolution quantizers.

**Keywords:** massive MIMO; precoding; coarse quantization; coordinate descent; information rates

### **1. Introduction**

Massive multiple-input multiple-output (MIMO) base stations can serve many user equipments (UEs) with high spectral efficiency and simplified signal processing [1,2]. However, their implementation is challenging due to the cost and energy consumption of analog-to-digital and digital-to-analog converters (ADCs/DACs) and linear power amplifiers (PAs). There are several approaches to lower cost. First, hybrid beamforming places analog beamformers in the radio frequency (RF) chain of each antenna and shares the digital baseband processing among RF chains. Second, constant envelope waveforms permit using non-linear PAs. Third, all-digital approaches use low-resolution ADCs/DACs or low-resolution digitally controlled RF chains. The focus of this paper is on the all-digital approach.

### *1.1. Single-Carrier Transmission*

We study the multi-antenna downlink and UEs with one antenna each, a model referred to as multi-user multi-input single-output (MU-MISO). Most works on low-cost precoding for MU-MISO consider phase-shift keying (PSK) to lower the requirements on the PAs. For instance, the early papers [3,4] (see also [5]) use iterative coordinate-wise optimization to choose transmit symbols from a continuous PSK alphabet for flat and frequency-selective (or multipath) fading, respectively. We remark that these papers do not include an optimization parameter (called *α* below, see (8)) in their cost function, which plays an important role at high signal-to-noise ratio (SNR), see [6,7]. This parameter is related to linear minimum-mean square error (MMSE) precoding.

**Citation:** Nedelcu, A.S.; Steiner, F.; Kramer, G. Low-Resolution Precoding for Multi-Antenna Downlink Channels and OFDM. *Entropy* **2022**, *24*, 504. https://doi.org/10.3390/e24040504

Academic Editor: Syed A. Jafar

Received: 13 February 2022 Accepted: 27 March 2022 Published: 4 April 2022 Corrected: 3 March 2023

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Most works consider discrete alphabet signaling. Perhaps the simplest approach, called quantized linear precoding (QLP), applies a linear precoder followed by one low-resolution quantizer per antenna [8–15]. Our focus is on zero forcing (ZF), and we use the acronyms LP-ZF and QLP-ZF, respectively, for unquantized ZF and the QLP version of ZF.

More sophisticated approaches use optimization tools as in [3,4]. For example, the papers [16–18] use convex relaxation methods; Refs. [19–25] apply coordinate-wise optimization; Refs. [26–28] develop a symbol-wise Maximum Safety Margin (MSM) precoder; Refs. [29–32] use a branch-and-bound (BB) algorithm; Ref. [33] uses a majorization-minimization algorithm; Ref. [34] uses integer programming; and [35,36] use neural networks (NNs). These references are collected in Table 1 together with the papers listed below on orthogonal frequency-division multiplexing (OFDM). As the table shows, most papers focus on single-carrier and flat fading channels.


**Table 1.** References for quantized precoding.

### *1.2. Discrete Signaling and OFDM*

Our main interest is discrete-alphabet precoding for multipath channels with OFDM as in 5G wireless systems. Precoding for OFDM is challenging because the alphabet constraint is in the time domain after the inverse discrete Fourier transform (IDFT) rather than in the frequency domain. We further focus on using information theory to derive achievable rates. For this purpose, we consider two types of channel estimation at the receivers: pilot-aided channel estimation via pilot-aided transmission (PAT) and data-aided channel estimation.

Discrete-alphabet precoding for OFDM was treated in [37], which used QLP and low-resolution DACs. A more sophisticated approach appeared in [38], which applied a squared-infinity norm Douglas-Rachford splitting (SQUID) algorithm to minimize a quadratic cost function in the frequency domain. The performance was illustrated via bit error rate (BER) simulations with convolutional codes and QPSK or 16-quadrature amplitude modulation (QAM) by using 1–3 bits of phase quantization.

The paper [39] instead proposed an algorithm called multi-antenna greedy iterative quantization (MAGIQ) that builds on [19] and uses coordinate-wise optimization of a quadratic cost function in the time domain. MAGIQ may thus be considered an extended version of [4] for OFDM and discrete alphabets. Simulations showed that MAGIQ outperforms SQUID in terms of complexity and achievable rates. Another coordinate-wise optimization algorithm appeared in [40,41] that builds on the papers [21,22]. The algorithm is called constant envelope symbol level precoding (CESLP) and it is similar to the refinement of MAGIQ presented here. The main difference is that, as in [38], the optimization in [40,41] uses a cost function in the frequency domain rather than the time domain. We remark that processing in the time domain has advantages that are described in Section 3.1.

The MSM algorithm was extended to OFDM in [42]. MSM works well at low and intermediate rates but MAGIQ outperforms MSM at high rates both in terms of complexity and achievable rates. Finally, the recent paper [43] uses generalized approximate message passing (GAMP) for OFDM.

### *1.3. Contributions and Organization*

The contributions of this paper are as follows.


We remark that our focus is on algorithms that approximate ZF based on channel inversion, i.e., there is no attempt to optimize transmit powers across subcarriers. This approach simplifies OFDM channel estimation at the receivers because the precoder makes all subcarriers have approximately the same channel magnitude and phase. For instance, a rapid and accurate channel estimate is obtained for each OFDM symbol by averaging the channel estimates of the subcarriers, see Section 4.1. Of course, it is interesting to develop algorithms for other precoders and for subcarrier power allocation.

This paper is organized as follows. Section 2 introduces the baseband model and OFDM signaling. Section 3 describes the MAGIQ and QCM precoders. Section 4 develops theory for achievable rates, presents complexity comparisons, and reviews a model for imperfect channel state information (CSI). Section 5 compares achievable rates and BERs with 5G NR LDPC codes. Section 6 concludes the paper.

### **2. System Model**

Figure 1 shows a MU-MISO system with $N$ transmit antennas and $K$ UEs that each have a single antenna. The base station has one message per UE, and each antenna has a resolution of 1 bit for the amplitude (an on-off switch) and $b$ bits for the phase. All other hardware components are ideal: linear, infinite bandwidth, and no distortions except for additive white Gaussian noise (AWGN).

**Figure 1.** Multi-user MIMO downlink with a low resolution digitally controlled analog architecture.

### *2.1. Baseband Channel Model*

The discrete-time baseband channel is modeled as a finite impulse response filter between each pair of transmit and receive antennas. Let $x_n[t]$ be the symbol of transmit antenna $n$ at time $t$ and let $x[t] = (x_1[t], \dots, x_N[t])^T$. Similarly, let $y_k[t]$ be the received symbol of UE $k$ at time $t$ and let $y[t] = (y_1[t], \dots, y_K[t])^T$. The channel model is

$$\mathbf{y}[t] = \sum\_{\tau=0}^{L-1} H[\tau] \mathbf{x}[t-\tau] + \mathbf{z}[t] \tag{1}$$

where the noise $z[t] = (z_1[t], \dots, z_K[t])^T$ has circularly-symmetric, complex, Gaussian entries that are independent and have variance $\sigma^2$, i.e., we have $z[t] \sim \mathcal{CN}(\mathbf{0}, \sigma^2 I)$. The $H[\tau]$, $\tau = 0, \dots, L-1$, are $K \times N$ matrices representing the channel impulse response, i.e., we have

$$H[\tau] = \begin{pmatrix} h\_{11}[\tau] & h\_{12}[\tau] & \dots & h\_{1N}[\tau] \\ h\_{21}[\tau] & h\_{22}[\tau] & \dots & h\_{2N}[\tau] \\ \vdots & \vdots & \ddots & \vdots \\ h\_{K1}[\tau] & h\_{K2}[\tau] & \dots & h\_{KN}[\tau] \end{pmatrix} \tag{2}$$

where $h_{kn}[\cdot]$ is the channel impulse response from the $n$-th antenna at the base station to the $k$-th UE. For instance, a Rayleigh fading multi-path channel with a uniform power delay profile (PDP) has $h_{kn}[\tau] \sim \mathcal{CN}(0, 1/L)$ and these taps are independent and identically distributed (iid) for all $k$, $n$, $\tau$.
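For illustration, the tapped-delay-line model (1) with iid Rayleigh taps can be simulated in a few lines of NumPy. This is a sketch under assumed dimensions; `N`, `K`, `L`, `T`, and `apply_channel` are our illustrative names and values, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N = 8 transmit antennas, K = 2 UEs, L = 4 taps, T symbols.
N, K, L, T = 8, 2, 4, 16

# Rayleigh fading with uniform PDP: h_kn[tau] ~ CN(0, 1/L), iid over k, n, tau.
H = (rng.standard_normal((L, K, N)) + 1j * rng.standard_normal((L, K, N))) \
    * np.sqrt(1.0 / (2 * L))

def apply_channel(H, x, sigma2=0.0, rng=None):
    """Compute y[t] = sum_tau H[tau] x[t-tau] + z[t] as in (1); x has shape (T, N)."""
    T = x.shape[0]
    L, K, _ = H.shape
    y = np.zeros((T, K), dtype=complex)
    for t in range(T):
        for tau in range(min(L, t + 1)):   # x[t] = 0 for t < 0
            y[t] += H[tau] @ x[t - tau]
    if sigma2 > 0:                         # optional AWGN with z[t] ~ CN(0, sigma2 I)
        rng = rng or np.random.default_rng()
        y += np.sqrt(sigma2 / 2) * (rng.standard_normal((T, K))
                                    + 1j * rng.standard_normal((T, K)))
    return y

x = rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N))
y = apply_channel(H, x)   # noiseless received block, shape (T, K)
```

A single impulse on one antenna reproduces that antenna's impulse response at the UEs, which is a quick way to sanity-check the convolution.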

The vector *x*[*t*] is constrained to have entries taken from a discrete and finite alphabet

$$\mathcal{X} = \{0\} \cup \left\{ \sqrt{\frac{P}{N}} \, \mathrm{e}^{j2\pi q/2^b}; q = 0, 1, 2, \dots, 2^b - 1 \right\}. \tag{3}$$

The transmit energy clearly satisfies $\|x[t]\|^2 \le P$ and we define $\mathrm{SNR} = P/\sigma^2$. The inequality is due to the 0 symbol that permits antenna selection. Antenna selection was also used in [45] to enforce sparsity. Our intent is rather to allow antennas not to be used if they do not improve performance.
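A minimal sketch of constructing the alphabet (3); the function name `phase_alphabet` and the parameter values are our illustrative assumptions:

```python
import numpy as np

def phase_alphabet(P, N, b):
    """Alphabet (3): the off symbol 0 plus 2^b equally spaced phases of amplitude sqrt(P/N)."""
    phases = np.sqrt(P / N) * np.exp(2j * np.pi * np.arange(2 ** b) / 2 ** b)
    return np.concatenate(([0.0], phases))

# Example: b = 2 gives 0 plus four QPSK-like phase symbols per antenna.
X = phase_alphabet(P=1.0, N=8, b=2)
```

With all $N$ antennas transmitting at full amplitude, the total energy is $N \cdot P/N = P$, matching the bound below (3).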

### *2.2. OFDM Signaling*

Figure 1 shows how OFDM can be combined with the precoder. Let $T = T_F + T_c$ be the OFDM blocklength with $T_F$ symbols for the DFT and $T_c$ symbols for the cyclic prefix. We assume that $T_F \ge L$ and $T_c \ge L - 1$. For simplicity, all $T_F$ subcarriers carry data and we do not include the cyclic prefix overhead in our rate calculations below, i.e., the rates in bits per channel use (bpcu) are computed by normalizing by $T_F$.

Consider the frequency-domain modulation alphabet $\hat{\mathcal{U}}$ that has a finite number of elements, e.g., QPSK has $\hat{\mathcal{U}} = \{\hat{u} : \hat{u} = (\pm 1 \pm j)/\sqrt{2}\}$. Messages are mapped to the frequency-domain vectors $\hat{u}[m] = (\hat{u}_1[m], \dots, \hat{u}_K[m])^T$ for subcarriers $m = 0, \dots, T_F - 1$ that are converted to time-domain vectors $u[t]$ by IDFTs

$$u\_k[t] = \frac{1}{T\_F} \sum\_{m=0}^{T\_F - 1} \hat{u}\_k[m] e^{j2\pi mt/T\_F} \tag{4}$$

for times $t = 0, \dots, T_F - 1$ and UEs $k = 1, \dots, K$. For the simulations below, we generated the $\hat{u}_k[m]$ uniformly from finite constellations such as 16-QAM or 64-QAM. We assume that $\mathrm{E}[\hat{u}_k[m]] = 0$ for all $k$ and $m$. Each UE $k$ uses a DFT to convert its time-domain symbols $y_k[t]$ to the frequency-domain symbols

$$\hat{y}\_k[m] = \sum\_{t=0}^{T\_F - 1} y\_k[t] \, e^{-j2\pi mt/T\_F}. \tag{5}$$
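The normalization conventions in (4) and (5) match NumPy's `ifft`/`fft` pair (the $1/T_F$ factor sits in the IDFT), which the following sketch verifies for hypothetical QPSK data; `TF` and `K` below are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
TF, K = 64, 2   # hypothetical DFT length and number of UEs

# QPSK frequency-domain symbols u_hat_k[m] = (+/-1 +/- j)/sqrt(2), one row per subcarrier.
signs = np.array([1.0, -1.0])
qpsk = (signs[rng.integers(2, size=(TF, K))]
        + 1j * signs[rng.integers(2, size=(TF, K))]) / np.sqrt(2)

# IDFT (4): numpy's ifft applies the same 1/T_F normalization as (4).
u = np.fft.ifft(qpsk, axis=0)

# Receiver DFT (5): numpy's fft is unnormalized, matching (5).
u_hat = np.fft.fft(u, axis=0)
```

Because the $1/T_F$ factor appears only in (4), applying (5) to the output of (4) recovers the frequency-domain symbols exactly (here, up to floating-point error).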

### *2.3. Linear MMSE Precoding*

To describe the linear MMSE precoder, consider the channel from base station antenna *n* to UE *k*:

$$\mathbf{h}\_{\rm kn} = (h\_{\rm kn}[0], \dots, h\_{\rm kn}[L-1], \underbrace{0, \dots, 0}\_{(T\_{\rm F} - L) \text{ zeros}})^{\rm T} \tag{6}$$

and denote its DFT as $\hat{h}_{kn} = (\hat{h}_{kn}[0], \dots, \hat{h}_{kn}[T_F - 1])^T$. The channel of subcarrier $m$ is the $K \times N$ matrix $\hat{H}[m]$ with entries $\hat{h}_{kn}[m]$ for $k = 1, \dots, K$ and $n = 1, \dots, N$. The linear MMSE precoder (or Wiener filter) for subcarrier $m$ is

$$P[m]\hat{H}[m]^\dagger \left(P[m]\hat{H}[m]\hat{H}[m]^\dagger + \sigma^2 I\right)^{-1} \tag{7}$$

where $P[m] = \mathrm{E}[|\hat{u}_k[m]|^2]$ is the same for all $k$, $\hat{H}[m]^\dagger$ is the Hermitian of $\hat{H}[m]$, and $I$ is the $K \times K$ identity matrix. The precoder multiplies $\hat{u}[m]$ by (7) for all subcarriers $m$, and performs $N$ IDFTs to compute the resulting $x[0], \dots, x[T_F - 1]$. We remark that ZF precoding is the same as (7) but with $\sigma^2 = 0$, where $\hat{H}[m]\hat{H}[m]^\dagger$ is usually invertible if $N$ is much larger than $K$.
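A sketch of the per-subcarrier Wiener filter (7); the dimensions, noise variance, and the name `mmse_precoder` are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma2 = 8, 2, 0.1
P_m = 1.0   # P[m] = E[|u_hat_k[m]|^2], assumed equal for all k

# A random K x N subcarrier channel H_hat[m].
H_m = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

def mmse_precoder(H_m, P_m, sigma2):
    """Wiener filter (7): P[m] H^dagger (P[m] H H^dagger + sigma^2 I)^{-1}."""
    K = H_m.shape[0]
    A = P_m * H_m @ H_m.conj().T + sigma2 * np.eye(K)
    return P_m * H_m.conj().T @ np.linalg.inv(A)

W = mmse_precoder(H_m, P_m, sigma2)   # N x K precoding matrix for this subcarrier
```

Setting `sigma2 = 0` reduces (7) to ZF, so the composite channel `H_m @ W` becomes the identity whenever $\hat{H}[m]\hat{H}[m]^\dagger$ is invertible.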

### **3. Quantized Precoding**

We wish to ensure compatibility with respect to LP-ZF. In other words, each receiver $k$ should ideally see signals $u_k[t]$, $t = 0, \dots, T - 1$, that were generated from the frequency-domain signals $\hat{u}_k[m]$, $m = 0, \dots, T_F - 1$. Let $u[t] = (u_1[t], \dots, u_K[t])^T$ and define the time-domain mean square error (MSE) cost function

$$\begin{aligned} G(\mathbf{x}[0], \dots, \mathbf{x}[T-1], \alpha) &= \sum\_{t=0}^{T-1} \mathrm{E}\_{\mathbf{z}[t]} \left[ \left\| \mathbf{u}[t] - \alpha \mathbf{y}[t] \right\|^2 \right] \\ &= \sum\_{t=0}^{T-1} \left\| \mathbf{u}[t] - \alpha \sum\_{\tau=0}^{L-1} \mathbf{H}[\tau] \mathbf{x}[t-\tau] \right\|^2 + \alpha^2 T K \sigma^2 \end{aligned} \tag{8}$$

where E*z*[*t*][·] denotes the expectation with respect to the noise *z*[*t*]. The optimization problem is as follows:

$$\begin{array}{ll}\min\_{\mathbf{x}[0],\ldots,\mathbf{x}[T-1],\alpha} & G(\mathbf{x}[0],\ldots,\mathbf{x}[T-1],\alpha)\\\text{s.t.} & \mathbf{x}[t] \in \mathcal{X}^{N}, \; t = 0,\ldots,T-1\\& \alpha > 0.\end{array} \tag{9}$$

The parameter *α* in (8) and (9) can easily be optimized for fixed *x*[0], ... , *x*[*T* − 1] and the result is (see [18] Equation (26))

$$\alpha = \frac{\sum\_{t=0}^{T-1} \operatorname{Re}\left(\mathbf{u}[t]^{H} \sum\_{\tau=0}^{L-1} H[\tau] \mathbf{x}[t-\tau]\right)}{\sum\_{t=0}^{T-1} \left\| \sum\_{\tau=0}^{L-1} H[\tau] \mathbf{x}[t-\tau] \right\|^{2} + TK\sigma^{2}}. \tag{10}$$

For the MAGIQ and QCM algorithms below, we use alternating minimization to find the $\mathbf{x}[0], \dots, \mathbf{x}[T-1]$ and $\alpha$. For the linear MMSE precoder, we label the $\alpha$ in (10) as $\alpha_{\mathrm{WF}}$.

Observe that we use the same $\alpha$ for all $K$ UEs because all UEs experience the same shadowing, i.e., all $K$ UEs see the same average power. For UE-dependent shadowing, a more general approach would be to replace $\alpha$ with a diagonal matrix with $K$ parameters $\alpha_k$, $k = 1, \dots, K$, and then modify (8) appropriately.
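For concreteness, the closed-form update (10) for fixed $x[0], \dots, x[T-1]$ can be sketched as follows; helper names such as `conv_out` and `alpha_closed_form` are ours, not from the paper:

```python
import numpy as np

def conv_out(H, x):
    """v[t] = sum_tau H[tau] x[t-tau] for t = 0..T-1 (x assumed zero for t < 0)."""
    T = x.shape[0]
    L, K, _ = H.shape
    v = np.zeros((T, K), dtype=complex)
    for t in range(T):
        for tau in range(min(L, t + 1)):
            v[t] += H[tau] @ x[t - tau]
    return v

def alpha_closed_form(H, x, u, sigma2):
    """Stationary point of (8) in alpha, i.e., formula (10)."""
    v = conv_out(H, x)
    T, K = u.shape
    num = np.sum(np.real(np.sum(u.conj() * v, axis=1)))   # sum_t Re(u[t]^H v[t])
    den = np.sum(np.abs(v) ** 2) + T * K * sigma2         # sum_t ||v[t]||^2 + T K sigma^2
    return num / den
```

Since (8) is a convex quadratic in $\alpha$, the value returned above is its unique minimizer, which can be checked by perturbing $\alpha$ slightly in either direction.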

### *3.1. MAGIQ and QCM*

For multipath channels, the vector $x[t]$ influences the channel output at times $t, t+1, \dots, t+L-1$. A joint optimization over strings of length $T$ seems difficult because of this influence and because of the finite alphabet constraint for the $x_n[t]$. Instead, MAGIQ splits the optimization into sub-problems with reduced complexity by applying coordinate-wise minimization across the antennas and iterating over the OFDM symbol.

For this purpose, consider the precoding problem for time $t'$ starting at $t' = 0$ and ending at $t' = T - 1$. Observe that $x[t']$ influences at most $L$ summands in (8), namely the summands for $t = (t')_T, \dots, (t' + L - 1)_T$ where $(t)_T = \min(t, T - 1)$. To compute the new cost after updating the symbol $x_n[t']$, one may thus compute sums of the form

$$\sum\_{t=(t')\_{T},...,(t'+L-1)\_{T}} \left\| u[t] - \alpha \sum\_{\tau=0}^{L-1} H[\tau] x[t-\tau] \right\|^2 \tag{11}$$

for $t' = 0, \dots, T - 1$. In both cases, one computes a first and second sum having the old and new $x_n[t']$, respectively. One then takes the difference and adds the result to (8) to obtain the updated cost.

We remark that the time-domain cost function (8) is closely related to the frequency-domain cost functions in [38,40,41]. However, the time-domain approach is more versatile as it can include acyclic phenomena such as interference from previous OFDM blocks. The time-domain approach is also slightly simpler because updating the symbol $x_n[t']$ in (8) or (11) requires taking the norm of at most $L$ vectors of dimension $K$ for each test symbol in $\mathcal{X}$ while the frequency-domain approach in ([40] Equation (17)) takes the norm of $T_F$ vectors of dimension $K$ for each test symbol. Recall that $T_F \ge L$, and usually $T_F \ge 10L$ to avoid losing too much efficiency with the cyclic prefix that has length $T_c \ge L - 1$.

The MAGIQ algorithm is summarized in Algorithm 1. MAGIQ steps through time in a cyclic fashion for fixed *α*. At each time *t*, it initializes the antenna set S = {1, ... , *N*} and performs a greedy search for the antenna *n* and symbol *xn*[*t*] that minimize (8) (one may equivalently consider sums of *L* norms as in (11)). The resulting antenna is removed from S and a new greedy search is performed to find the antenna in the new S and the symbol that minimizes (8) while the previous symbol assignments are held fixed. This step is repeated until S is empty. MAGIQ then moves to the next time and repeats the procedure. To determine *α*, MAGIQ applies alternating minimization with respect to *α* and the precoder output {*x*[*t*] : *t* = 0, ... , *T* − 1}. For fixed *x*[.] the minimization can be solved in closed form, see (10) and line 22 of Algorithm 1.

Simulations show that MAGIQ exhibits good performance and converges quickly [39]. However, the greedy selection considerably increases the computational complexity. We thus replace the minimization over S in line 9 of Algorithm 1 with a round-robin schedule or a random permutation. We found that both approaches perform equally well. The new QCM algorithm performs as well as MAGIQ but with a simpler search and a small increase in the number of iterations.

Finally, one might expect that $\alpha$ is close to the $\alpha_{\mathrm{WF}}$ of the transmit Wiener filter [6,7] since our cost function accounts for the noise power. However, Figure 2 shows that this is true only at low SNR. The figure plots the average $\alpha$ of the QCM algorithm, called $\alpha_{\mathrm{QCM}}$, against the computed $\alpha_{\mathrm{WF}}$ for simulations with System A in Section 5. Note that $\alpha_{\mathrm{QCM}}$ is generally larger than $\alpha_{\mathrm{WF}}$.

### **Algorithm 1** MAGIQ and QCM precoding.

```
 1: procedure PRECODE(Algo, H[.], u[.])
 2:     x^(0)[.] = x_init[.]
 3:     α^(0) = α_init
 4:     for i = 1 : I do                         // iterate over OFDM block
 5:         for t = 0 : T − 1 do
 6:             S = {1, ..., N}
 7:             while S ≠ ∅ do
 8:                 if Algo = MAGIQ then
 9:                     (x*_n*, n*) = argmin_{x̃_n ∈ X, n ∈ S}
10:                         G( x^(i)[0], ..., x^(i)[t − 1], x̃,
11:                            x^(i−1)[t + 1], ..., x^(i−1)[T − 1], α^(i−1) )
12:                 else                         // Algo = QCM
13:                     n* = min S               // round-robin schedule
14:                     x*_n* = argmin_{x̃_n* ∈ X}
15:                         G( x^(i)[0], ..., x^(i)[t − 1], x̃,
16:                            x^(i−1)[t + 1], ..., x^(i−1)[T − 1], α^(i−1) )
17:                 end if
18:                 x^(i)_n*[t] = x*_n*          // update antenna n* at time t
19:                 S ← S \ {n*}
20:             end while
21:         end for
22:         α^(i) = Σ_{t=0}^{T−1} Re( u[t]^H Σ_{τ=0}^{L−1} H[τ] x^(i)[t−τ] )
                    / ( Σ_{t=0}^{T−1} ‖ Σ_{τ=0}^{L−1} H[τ] x^(i)[t−τ] ‖^2 + T K σ^2 )
23:     end for
24:     return x[.] = x^(I)[.], α = α^(I)
25: end procedure
```
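As a concrete (and deliberately unoptimized) illustration, the QCM branch of Algorithm 1 can be sketched in Python/NumPy as follows. This is our own sketch: the cost function is assumed to be the time-domain squared error plus the noise-power penalty implied by the closed-form *α* update in line 22, the full cost is recomputed at every symbol update rather than updating only the *L* affected terms, and all names and the initialization are illustrative.

```python
import numpy as np

def synth(H, x):
    """y[t] = sum_tau H[tau] @ x[t - tau], with x[t'] = 0 for t' < 0."""
    L, K, N = H.shape
    T = x.shape[0]
    y = np.zeros((T, K), dtype=complex)
    for tau in range(L):
        y[tau:] += x[:T - tau] @ H[tau].T
    return y

def cost(H, u, x, alpha, sigma2):
    """Assumed form of G: squared error plus noise-power penalty."""
    T, K = u.shape
    return float(np.sum(np.abs(u - alpha * synth(H, x)) ** 2)
                 + alpha ** 2 * T * K * sigma2)

def qcm_precode(H, u, alphabet, sigma2, n_iter=6):
    """QCM: round-robin coordinate updates plus closed-form alpha update."""
    L, K, N = H.shape
    T = u.shape[0]
    x = np.full((T, N), alphabet[0], dtype=complex)  # crude initialization
    alpha = 1.0
    for _ in range(n_iter):
        for t in range(T):
            for n in range(N):                 # round-robin over antennas
                best_s, best_c = x[t, n], np.inf
                for s in alphabet:             # exhaustive search over DAC alphabet
                    x[t, n] = s
                    c = cost(H, u, x, alpha, sigma2)
                    if c < best_c:
                        best_s, best_c = s, c
                x[t, n] = best_s
        y = synth(H, x)                        # closed-form alpha (line 22)
        alpha = float(np.sum(np.real(np.conj(u) * y))
                      / (np.sum(np.abs(y) ** 2) + T * K * sigma2))
    return x, alpha
```

Since every step is a coordinate-wise minimization of the same cost, the cost is non-increasing over the iterations.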

**Figure 2.** *α*QCM vs. *α*WF for System A of Table 2 and 64-QAM.


**Table 2.** System parameters for the simulations.

### **4. Performance Metrics**

*4.1. Achievable Rates*

We use the generalized mutual information (GMI) to compute achievable rates [46] ([47], Ex. 5.22); the GMI is a standard tool to compare coded systems. Consider a generic input distribution *P*(*x*) and a generic channel density *p*(*y*|*x*) where *x* = (*x*1, ... , *xS*)*T* and *y* = (*y*1, ... , *yS*)*T* each have *S* symbols. A lower bound to the mutual information

$$I(\mathbf{X};\mathbf{Y}) = \sum\_{\mathbf{x},\mathbf{y}} P(\mathbf{x}) p(\mathbf{y}|\mathbf{x}) \log\_2 \left( \frac{p(\mathbf{y}|\mathbf{x})}{\sum\_{\mathbf{a}} P(\mathbf{a}) \ p(\mathbf{y}|\mathbf{a})} \right) \tag{12}$$

is the GMI

$$I\_{q,s}(\mathbf{X};\mathbf{Y}) = \sum\_{\mathbf{x},\mathbf{y}} P(\mathbf{x}) p(\mathbf{y}|\mathbf{x}) \log\_2 \left( \frac{q(\mathbf{y}|\mathbf{x})^{s}}{\sum\_{\mathbf{a}} P(\mathbf{a}) \, q(\mathbf{y}|\mathbf{a})^{s}} \right) \tag{13}$$

where *q*(*y*|*x*) is any auxiliary density and *s* ≥ 0. The choices *q*(*y*|*x*) = *p*(*y*|*x*) for all *x*, *y* and *s* = 1 maximize the GMI, in which case it equals the mutual information. However, the idea is that *p*(*y*|*x*) may be unknown or difficult to compute, so one chooses a simple *q*(*y*|*x*). The reason *p*(*y*|*x*) is difficult to compute here is that we measure the GMI across the end-to-end channels from the *u*ˆ*k*[*m*] to the *y*ˆ*k*[*m*], and the quantized precoding makes these channels non-linear. The final step in evaluating the GMI is maximizing over *s* ≥ 0. Alternatively, one might simply focus on *s* = 1, e.g., see [48].

We study the GMI of two non-coherent systems: classic PAT and data-aided channel estimation. For both systems, we apply memoryless signaling with the product distribution

$$P(\mathbf{x}) = \prod\_{i=1}^{S\_p} \mathbf{1}(\mathbf{x}\_i = \mathbf{x}\_{p,i}) \cdot \prod\_{i=S\_p+1}^{S} P(\mathbf{x}\_i) \tag{14}$$

where the *xp*,*i* are pilot symbols, 1(*a* = *b*) is the indicator function that takes the value 1 if its argument is true and 0 otherwise, and *P*(*x*) is a uniform distribution. Joint data and channel estimation has *Sp* = 0 so that only the second product in (14) remains. At the receiver we use the auxiliary channel

$$q(y|\mathbf{x}) = \prod\_{i=1}^{S} q\_{\mathbf{x},y}(y\_i \mid \mathbf{x}\_i) \tag{15}$$

where the symbol channel *qx*,*y*(.) is a function of *x* and *y*. Observe that *qx*,*y*(.) is the same for all *S* symbols, and the channel can be considered to have memory since every symbol *x*ℓ or *y*ℓ, ℓ = 1, ... , *S*, influences the channel for all "times" *i* = 1, ... , *S*. The GMI rate (13) simplifies to

$$\sum\_{\mathbf{x},\mathbf{y}}P(\mathbf{x})p(\mathbf{y}|\mathbf{x})\sum\_{i=S\_p+1}^S \log\_2\left(\frac{q\_{\mathbf{x},\mathbf{y}}(y\_i\mid\mathbf{x}\_i)^s}{\sum\_a P(a)\,q\_{\mathbf{x},\mathbf{y}}(y\_i\mid a)^s}\right). \tag{16}$$

One may approximate (16) by applying the law of large numbers for stationary signals and channels. The idea is to independently generate the *B* pairs of vectors

$$\begin{aligned} \mathbf{x}^{(b)} &= (\mathbf{x}\_1^{(b)}, \dots, \mathbf{x}\_S^{(b)})^T \\ \mathbf{y}^{(b)} &= (\mathbf{y}\_1^{(b)}, \dots, \mathbf{y}\_S^{(b)})^T \end{aligned}$$

for *b* = 1, ... , *B*, and then the following average rate will approach I*q*,*s*(*X*; *Y*)/*S* bpcu as *B* grows:

$$R\_{\mathbf{a}} = \frac{1}{B} \sum\_{b=1}^{B} R\_{\mathbf{a}}^{(b)} \tag{17}$$

where

$$R\_{\mathbf{a}}^{(b)} = \frac{1}{S} \sum\_{i=S\_p+1}^{S} \log\_2 \left( \frac{q\_{\mathbf{x}^{(b)}, \mathbf{y}^{(b)}} \left( y\_i^{(b)} \mid \mathbf{x}\_i^{(b)} \right)^s}{\sum\_a P(a) \, q\_{\mathbf{x}^{(b)}, \mathbf{y}^{(b)}} \left( y\_i^{(b)} \mid a \right)^s} \right). \tag{18}$$

We choose the Gaussian auxiliary density

$$q\_{\mathbf{x},\mathbf{y}}(\mathbf{y}|\mathbf{x}) = \frac{1}{\pi \sigma\_q^2} \exp\left(-\frac{\left|\mathbf{y} - \mathbf{h} \cdot \mathbf{x}\right|^2}{\sigma\_q^2}\right) \tag{19}$$

where for pilot-aided transmission (PAT) the receiver computes joint maximum likelihood (ML) estimates with sums of *Sp* terms:

$$\begin{split} h &= \frac{\sum\_{i=1}^{S\_p} y\_i \, x\_i^\*}{\sum\_{i=1}^{S\_p} |x\_i|^2} \\ \sigma\_q^2 &= \frac{1}{S\_p} \sum\_{i=1}^{S\_p} \left| y\_i - h \, x\_i \right|^2. \end{split} \tag{20}$$

For the data-aided detector we replace *Sp* with *S* in (20). Note that for the Gaussian channel (19) the parameter *s* multiplies 1/*σq*² in (16) or (18), so optimizing *s* turns out to be the same as choosing the best parameter *σq*² with *s* = 1.
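The estimates (20) and the per-symbol rate (18) are straightforward to evaluate numerically. The sketch below is our own (with *s* = 1, a uniform input distribution, and illustrative names); it treats a scalar channel and takes one index set for the parameter estimates and one for the rate sum, which covers both PAT and the data-aided variant.

```python
import numpy as np

def gmi_rate(x, y, est_idx, data_idx, alphabet):
    """Monte-Carlo GMI rate per symbol: Eq. (18) with the Gaussian
    auxiliary channel (19), s = 1, and ML parameter estimates (20)."""
    # channel-gain and noise-variance estimates from est_idx
    # (pilots for PAT, all symbols for the data-aided detector)
    h = np.sum(y[est_idx] * np.conj(x[est_idx])) / np.sum(np.abs(x[est_idx]) ** 2)
    sq = np.mean(np.abs(y[est_idx] - h * x[est_idx]) ** 2)
    # per-symbol information of the data symbols, uniform P(a)
    num = np.exp(-np.abs(y[data_idx] - h * x[data_idx]) ** 2 / sq)
    den = np.mean(np.exp(-np.abs(y[data_idx, None] - h * alphabet[None, :]) ** 2 / sq),
                  axis=1)
    return float(np.sum(np.log2(num / den)) / len(x))
```

For PAT, `est_idx` indexes the *Sp* pilot symbols and `data_idx` the remaining data symbols; for the data-aided detector, both index all *S* symbols.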

Summarizing, we use the following steps to evaluate achievable rates. Suppose the coherence time is *S*/*TF* OFDM symbols, where *S* is a multiple of *TF*. We index the channel symbols by the pairs (ℓ, *m*), where ℓ is the OFDM symbol and *m* is the subcarrier, 1 ≤ ℓ ≤ *S*/*TF*, 0 ≤ *m* ≤ *TF* − 1. We collect the pilot index pairs in the set S*p* that has cardinality *Sp*, and we write the channel inputs and outputs of UE *k* for OFDM symbol ℓ and subcarrier *m* as *u*ˆ*k*[ℓ, *m*] and *y*ˆ*k*[ℓ, *m*], respectively.


$$\begin{split} h\_k &= \frac{\sum\_{(\ell,m)\in\mathcal{S}\_p} \hat{y}\_k[\ell,m] \, \hat{u}\_k[\ell,m]^\*}{\sum\_{(\ell,m)\in\mathcal{S}\_p} \left| \hat{u}\_k[\ell,m] \right|^2} \\ \sigma\_{q,k}^2 &= \frac{1}{S\_p} \sum\_{(\ell,m)\in\mathcal{S}\_p} \left| \hat{y}\_k[\ell,m] - h\_k \, \hat{u}\_k[\ell,m] \right|^2. \end{split} \tag{21}$$

For the data-aided detector, in (21) we replace S*p* with the set of all index pairs (ℓ, *m*), and we replace *Sp* with *S*;

4. Compute *R*a(*b*) in (18) for each UE *k* by averaging, i.e., the rate for UE *k* is

$$R\_{\mathrm{a},k}^{(b)} = \frac{1}{S} \sum\_{(\ell,m)\notin\mathcal{S}\_p} \log\_2 \left( \frac{q\_{\hat{u}\_k,\hat{y}\_k}(\hat{y}\_k[\ell,m] \mid \hat{u}\_k[\ell,m])^s}{\sum\_a P(a) \, q\_{\hat{u}\_k,\hat{y}\_k}(\hat{y}\_k[\ell,m] \mid a)^s} \right) \tag{22}$$

where *u*ˆ*k* and *y*ˆ*k* are vectors collecting the *u*ˆ*k*[ℓ, *m*] and *y*ˆ*k*[ℓ, *m*], respectively, for all pairs (ℓ, *m*). For the data-aided detector we set S*p* = ∅ in (22);


Our simulations showed that optimizing over *s* ≥ 0 gives *s* ≈ 1 if the channel parameters are chosen using (21).

### *4.2. Discussion*

We make a few remarks on the lower bound. First, the receivers do not need to know *α*. Second, the rate *R*a in (17) is achievable if one assumes stationarity and coding and decoding over many OFDM blocks. Third, as *S* grows, the channel estimate of the data-aided detector becomes more accurate and the performance approaches that of a coherent receiver. Related theory for PAT and large *S* is developed in [49]. However, the PAT rate is generally smaller than that of a data-aided detector because the PAT channel estimate is less accurate and because PAT does not use all symbols for data.

We remark that blind channel estimation can approach the performance of data-aided receivers for large *S*. Blind channel estimation algorithms can, e.g., be based on higher-order statistics and iterative channel estimation and decoding. For polar codes and low-order constellations, one may use the blind algorithms proposed in [50]. We found that the PAT rates are very close (within 0.1 bpcu) to the pilot-free rates multiplied by the rate-loss factor 1 − *Sp*/*S* for pilot fractions as small as *Sp*/*S* = 10%.

Depending on the system under consideration, we choose *TF* ∈ {32, 256, 396}, *T* ∈ {35, 270, 277, 286, 410}, *S* ∈ {256, 1584}, and *B* = 200. For most simulations we have *TF* = *S* = 256 and estimate the channel based on individual OFDM symbols, see Section 1.3. For example, for *T* = 270 and a symbol time of 30 ns (symbol rate 33.3 MHz) the coherence time needs to be at least (30 ns) · *T* = 8.1 μs. Of course, the transmitter also needs to know the channel, e.g., via time-division duplex, which requires the coherence time to be substantially larger. The main point is that channel estimation at the receiver is not a bottleneck when using ZF based on channel inversion. Finally, for the coded simulations we chose *TF* = 396 and *S* = 4*TF* = 1584 because the LDPC code occupies four OFDM symbols.

### *4.3. Algorithmic Complexity*

This section studies the algorithmic complexity in terms of the number of multiplications and iterations. The complexity of SQUID is thoroughly discussed in [38], and Table 3 shows the order estimates taken from ([38], Table I). Note the large number of iterations.


**Table 3.** Algorithmic complexity.

The complexity of MSM depends on the choice of optimization algorithm, and [42] considers a simplex algorithm. Unfortunately, the simplex algorithm requires a large number of iterations to converge because this number is proportional to the number of variables and linear inequalities, both of which grow with the system size (*N*, *K*, *T*). An interior-point algorithm converges more quickly but has a much higher complexity per iteration.

For MAGIQ and QCM, Equation (8) shows that updating *x*[.] requires updating *L* of the *T* terms, each of which requires a norm calculation. The resulting terms ‖*u*[*t*]‖² do not affect the maximization; terms such as ‖*αHx*‖² can be pre-computed and stored with a complexity of *NKL*|X|, and then reused as they do not change during the iterations. On the other hand, products of the form *αu*ᴴ*Hx* must be computed for each of the *L* terms, for each antenna update, and at each time instance, resulting in a complexity of O(*NKLT*). The initialization requires *KNT* multiplications, and one must transform the solutions to the time domain. We neglect the cost of updating *α* because the terms needed to compute it are available as a byproduct of the iterations over the time instances.

### *4.4. Sensitivity to Channel Uncertainty at the Transmitter*

In practice, the CSI is imperfect due to noise, quantization, calibration errors, etc. We do not attempt to model these effects exactly. Instead, we adopt a standard approach based on MMSE estimation and provide the precoder with channel matrices *H*˜ [*τ*] that satisfy

$$\tilde{H}[\tau] = \sqrt{1 - \varepsilon^2} \, H[\tau] + \varepsilon \, Z[\tau] \tag{23}$$

where 0 ≤ *ε* ≤ 1 and *Z*[*τ*] is a *K* × *N* matrix of independent, complex, circularly-symmetric Gaussian entries with variance *σh*² = 1/*L*. Note that *ε* = 0 corresponds to perfect CSI and *ε* = 1 corresponds to no CSI. The precoder treats *H*˜[*τ*] as the true channel realization for *τ* = 0, . . . , *L* − 1.
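A minimal sketch of drawing the noisy CSI in (23) (our own helper; the channel is assumed to be stacked as an *L* × *K* × *N* array, and the names are illustrative):

```python
import numpy as np

def noisy_csi(H, eps, rng):
    """H_tilde[tau] = sqrt(1 - eps^2) H[tau] + eps Z[tau], Eq. (23).
    Z has i.i.d. circularly-symmetric complex Gaussian entries with
    variance sigma_h^2 = 1/L; H has shape (L, K, N)."""
    L = H.shape[0]
    Z = (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)) \
        * np.sqrt(1.0 / (2 * L))
    return np.sqrt(1.0 - eps ** 2) * H + eps * Z
```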

### **5. Numerical Results**

We evaluate the GMIs of four systems. The main parameters are listed in Table 2 and we provide a few more details here.


The average GMIs for Systems A–C were computed using *S* = 256, *B* = 200, and a data-aided detector. The coded results of System D instead have *S* = 1584 symbols to fit the block structure determined by the LDPC encoder. For System D we considered both PAT and a data-aided detector. For all cases, the GMI was computed by averaging over the subcarriers, i.e., channel coding is assumed to be applied over multiple sub-carriers and OFDM symbols. The MAGIQ and QCM algorithms were both initialized with a time-domain quantized solution of the transmit matched filter (MF).

Figures 3 and 4 show the average GMIs for System A with *b* = 2 and *b* = 3, respectively. In Figure 3, MAGIQ performs four iterations for each OFDM symbol while QCM performs six iterations. Observe that MAGIQ and QCM are best at all SNRs, and they are especially good in the interesting regime of high SNR and rates. The gap to the rates over flat-fading channels (*L* = 1) is small. SQUID with 64-QAM requires 100–300 iterations for SNR > 15 dB, as well as a modified algorithm with damped updates; otherwise, SQUID diverges. In addition, we show the broadcast channel capacity with uniform power allocation and Gaussian signaling as an upper bound for the considered scenario [52,53]. Figure 4 shows that QCM with three iterations operates within ≈0.2–0.4 dB of MAGIQ with five iterations when *b* = 3, which shows that QCM performs almost as well as MAGIQ.

**Figure 3.** Average GMIs for System A and *b* = 2.

**Figure 4.** Average GMIs for System A with 64-QAM and *b* = 3.

Figure 5 compares achievable rates of QCM, SQUID, and MSM for a smaller system studied in [42]. We use PSK because the MSM algorithm was designed for PSK. The figure shows that MSM outperforms SQUID and QCM at low to intermediate SNR and rates, but QCM is best at high SNR and rates. This suggests that modifying the cost function (8) to include a safety margin will increase the QCM rate at low to intermediate SNR, and similarly modifying the MSM optimization to more closely resemble QCM will increase the MSM rate at high SNR. We tried to simulate MSM for System A but the algorithm ran into memory limitations (we used 2 AMD EPYC 7282 16-Core processors, 125 GB of system memory, and Matlab with both dual-simplex and interior-point solvers).

Consider next the Winner2 non-line-of-sight (NLOS) C2 urban model [51], which is more realistic than Rayleigh fading. The model parameters are as follows.


Figure 6 shows the average GMIs for LP-ZF and MAGIQ. At high SNR, there is a slight decrease in the slope of the MAGIQ GMI as compared to LP-ZF. This suggests that one might need a larger *N* or *b*. The performance for the Rayleigh fading model is better than for the Winner2 model but otherwise behaves similarly.

**Figure 5.** Average GMIs for System B.

**Figure 6.** Average GMIs for System C.

Figure 7 shows BERs for the LDPC code with 64-QAM. Each codeword is interleaved over 4 OFDM symbols, all 396 subcarriers, and the 6 bits of each modulation symbol by using bit-interleaved coded modulation (BICM). The interleaver was chosen randomly with a uniform distribution over all permutations of length 9504. The solid curves are for data-aided channel estimation and the dotted curves show the performance of PAT when the fraction of pilots is *Sp*/*S* = 10%. The pilots were placed uniformly at random over the four OFDM symbols and 396 subcarriers. A good blind detector algorithm that performs joint channel and data estimation should have BERs between the solid and dotted curves.

The dashed curves in Figure 7 show the SNRs required for the different algorithms based on Figure 3. In particular, the rate 5.33 bpcu requires SNRs of 9 dB, 12.9 dB, and 15.2 dB for LP-ZF, QCM, and SQUID, respectively. SQUID is run with 300 iterations and QCM with 6 iterations. Each UE computes its log-likelihoods based on the parameters (20) of the auxiliary channel. The GMI predicts the coded behavior of the system within approximately 1 dB of the code waterfall region, except for SQUID, where the gap is about 2 dB. The gaps seem to be caused mainly by the finite blocklength of the LDPC code, since a gap of approximately 1 dB is also observed for additive white Gaussian noise (AWGN) channels. The sizes of the gaps differ, and the reason may be that the slopes of the GMI at rate 5.33 bpcu differ, see Figure 3. Observe that LP-ZF exhibits the steepest slope and SQUID the flattest at *R*a = 5.33 bpcu; this suggests that SQUID's SNR performance is more sensitive to the blocklength.

**Figure 7.** BERs for System D and a 5G NR LDPC code. The dashed vertical curves show the SNRs required for long random codes, see Figure 3.

Figure 8 is for System A and shows how the GMI decreases as the CSI becomes noisier. The behavior of all systems is qualitatively similar. However, the figure shows that the QCM rate is more sensitive to the parameter *ε* than the SQUID rate when *ε* is small.

**Figure 8.** Average GMIs for System A and imperfect CSI at SNR = 12 dB.

### **6. Conclusions**

We studied downlink precoding for MU-MISO channels where the base station uses OFDM and low-resolution DACs. A QCM algorithm was introduced that is based on the MAGIQ algorithm in [39] (see also [19]) and that performs a coordinate-wise optimization in the time domain. The performance was analyzed by computing the GMI for two auxiliary channel models: one for pilot-aided channel estimation and a second for data-aided channel estimation. Simulations for several downlink channels, including a Winner2 NLOS urban scenario, showed that QCM achieves high information rates and is computationally efficient, flexible, and robust. The performance of QCM was compared to MAGIQ and other precoding algorithms, including SQUID and MSM. The QCM and MAGIQ algorithms achieve the highest information rates with the lowest complexity as measured by the number of multiplications. For example, Figure 4 shows that *b* = 3 bits of phase modulation operates within 3 dB of LP-ZF. Moreover, BER simulations for a 5G NR LDPC code show that the GMI is a good predictor of the coded performance. Finally, for noisy CSI the performance degradation of QCM and SQUID is qualitatively similar to that of LP-ZF.

**Author Contributions:** Investigation, A.S.N., F.S. and G.K.; Software, A.S.N.; Writing—original draft, F.S.; Writing—review & editing, A.S.N. and G.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Deutsche Forschungsgemeinschaft through the grant KR 3517/9-1, and by Nokia Solutions and Networks through the project "Low Cost Booster Arrays for Massive MIMO Precoding" in 2017.

**Acknowledgments:** The authors wish to thank M. Staudacher, W. Zirwas, B. Panzner, R. S. Ganesan, P. Baracca, and S. Wesemann for useful discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Reliable Semantic Communication System Enabled by Knowledge Graph**

**Shengteng Jiang, Yueling Liu \*, Yichi Zhang, Peng Luo, Kuo Cao \*, Jun Xiong, Haitao Zhao and Jibo Wei**

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China; jiangshengteng@nudt.edu.cn (S.J.); zhangyichi13@nudt.edu.cn (Y.Z.); pengluo.eric@outlook.com (P.L.); xj8765@nudt.edu.cn (J.X.); haitaozhao@nudt.edu.cn (H.Z.); wjbhw@nudt.edu.cn (J.W.)

**\*** Correspondence: liuyueling16@nudt.edu.cn (Y.L.); caokuo18@nudt.edu.cn (K.C.)

**Abstract:** Semantic communication is a promising technology used to overcome the challenges of large bandwidth and power requirements caused by the data explosion. Semantic representation is an important issue in semantic communication. The knowledge graph, powered by deep learning, can improve the accuracy of semantic representation while removing semantic ambiguity. Therefore, we propose a semantic communication system based on the knowledge graph. Specifically, in our system, the transmitted sentences are converted into triplets by using the knowledge graph. Triplets can be viewed as basic semantic symbols for semantic extraction and restoration and can be sorted based on semantic importance. Moreover, the proposed communication system adaptively adjusts the transmitted contents according to channel quality and allocates more transmission resources to important triplets to enhance communication reliability. Simulation results show that the proposed system significantly enhances the reliability of the communication in the low signal-to-noise regime compared to the traditional schemes.

**Keywords:** semantic communication; knowledge graph; semantic extraction; semantic restoration

### **1. Introduction**

In recent years, wireless communication technology has developed rapidly, bringing great convenience to human life. Fifth-generation (5G) wireless communication technology has played an important role in smart cities, autonomous driving, telemedicine, and other fields [1]. However, with the gradual increase in communication rates, the explosive growth of data has created enormous challenges for wireless communication technology [2]. According to the forecast from the International Telecommunication Union (ITU), the annual growth rate of the global mobile data stream will reach up to 55% by 2030 [3]. Moreover, the transmission rate of existing communication technologies has gradually approached the Shannon capacity [4], which cannot meet the continuously growing communication demands of the future 6G era. In the future, the 6G communication system will play an important role in remote holography [5], digital twins [6], and other application fields. Therefore, the sixth-generation wireless communication system needs to provide an ultra-high peak rate, an ultra-large user-experienced data rate, and ultra-low network latency, which will consume more of the limited available spectrum and power and bring huge challenges to communication technology. Semantic communication is one of the effective techniques for overcoming these challenges [7].

Semantic communication is a new communication paradigm that breaks with traditional communication [8]. The concept of semantic communication was first proposed by Weaver (1949) [9]. After Shannon (1948) put forward the classical information theory [4], Weaver proposed that communication should be divided into three different layers, namely the technical layer, the semantic layer, and the effectiveness layer. The technical layer represents traditional communication, focusing on "how to accurately transmit communication symbols".

**Citation:** Jiang, S.; Liu, Y.; Zhang, Y.; Luo, P.; Cao, K.; Xiong, J.; Zhao, H.; Wei, J. Reliable Semantic Communication System Enabled by Knowledge Graph. *Entropy* **2022**, *24*, 846. https://doi.org/ 10.3390/e24060846

Academic Editors: Onur Günlü, Rafael F. Schaefer, Holger Boche and H. Vincent Poor

Received: 26 April 2022 Accepted: 18 June 2022 Published: 20 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The semantic layer focuses on "how to accurately convey the meaning of communication symbols"; the effectiveness layer focuses on "how the received meaning effectively affects the receiver's behavior". Compared with traditional communication, semantic communication aims to reduce the uncertainty of message understanding between the transmitter and the receiver. Moreover, semantic communication mainly transmits semantically relevant information, which greatly reduces the amount of redundant data. Therefore, semantic communication is a suitable technology for scenarios with limited communication bandwidth and a low signal-to-noise ratio (SNR) [10,11].

However, some fundamental problems of semantic communication have not been effectively solved. One of them is semantic representation, which limits the development of semantic communication [7]. Regarding semantic representation, existing studies tend to use features of the transmitted content to represent the semantics. This representation lacks human language logic and cannot be interactively verified against human understanding [12]. To solve this problem, we considered using the knowledge graph instead of features to represent semantics. The knowledge graph can decompose text into multiple semantic units without losing semantics [13], ensuring the accuracy of semantic representation. The basic structure of the knowledge graph is a triplet in the form "entity-relation-entity" [13]. From the linguistic point of view, a single entity may carry multiple types of semantic information; the specific semantic information can be determined only after a relationship is formed between entities, so the triplet in the knowledge graph can be regarded as the smallest semantic symbol. Several studies have explored the relationship between the knowledge graph and semantics. Jaradeh et al. (2019) proposed the knowledge graph as the next-generation infrastructure for semantic scholarly knowledge [14]. Mosa (2021) proposed that the knowledge graph could help with semantic category prediction [15]. Zhou et al. (2022) combined the knowledge graph with semantic communication to improve the validity of communication [16]. Since the knowledge graph can effectively represent semantics, we investigated a semantic communication system based on the knowledge graph (SCKG) for improving communication reliability. The main contributions of this paper are summarized as follows:


The rest of this paper is organized as follows. Section 2 briefly reviews the related work. Section 3 details the proposed system and the semantic extraction and restoration methods used in the model. Experimental results are presented in Section 4 to verify the performance of the proposed model. Finally, Section 5 concludes this paper.

### **2. Related Work**

### *2.1. Semantic Communication Development*

Due to technical limitations in the early stage of communication development, researchers focused on solving engineering problems at the technical layer and postponed the study of the semantic layer. However, this does not mean that research on semantic communication was shelved. With the advancements in technology, the semantic problem has become an urgent problem that needs to be solved in the communication field [17].

In terms of theoretical research, Carnap et al. (1954) first proposed the concept of semantic information theory to supplement the classical information theory [18]. They argued that the semantic information contained in a sentence should be defined based on the logical probability of the content of the sentence. Floridi (2004) proposed a theory of strongly semantic information [19] and pointed out the problem that contradictory sentences would carry infinite information. Bao et al. (2011) put forward a general model of semantic communication, using a factual statement in propositional logic form to represent semantics [20]. Moreover, semantic entropy, semantic noise, and the semantic channel capacity were defined in [20]. Based on [20], Basu et al. (2012) provided a detailed explanation of the relationship between semantic entropy and information entropy, and they defined the concepts of semantic ambiguity and semantic redundancy [21]. In [22], Lan et al. (2021) proposed that semantic communication can be divided into human-to-human, human-to-machine, and machine-to-machine sub-areas, which broadened the scope of semantic communication.

On the other hand, the rapid development of neural networks and artificial intelligence has promoted technical research progress in semantic communication. In terms of semantic coding, the authors of [23] proposed a joint source-channel coding scheme for semantic information with a bidirectional long short-term memory (BiLSTM) model. As an extension of [23], Rao et al. (2018) presented a variable-length joint source-channel coding of semantic information [24]. In [25], Liu et al. (2022) proposed a semantic encoding strategy based on parts of speech together with context-based decoding strategies, which enhanced communication reliability at the semantic level. Based on the semantic communication framework, Xie et al. (2021) proposed a deep learning-based semantic communication model [26], which used word-embedding technology to map text to a semantic space and then performed joint source-channel encoding of the semantic information using the transformer framework [27]. Furthermore, the authors of [28] proposed a lightweight distributed semantic communication system for internet of things (IoT) scenarios, which reduced the cost of IoT devices. The authors of [29] proposed a semantic communication model based on reinforcement learning to investigate the impact of noisy environments on semantic information. For other information modalities, Weng et al. (2021) proposed a semantic communication model for speech transmission [30]. In [31], Hu et al. (2022) proposed a robust end-to-end semantic communication system to combat semantic noise in image transmission. Moreover, a semantic communication model based on multiple information modalities was developed in [32]. Regarding semantic representation, Zhou et al. (2022) used the transformer for semantic extraction and semantic restoration [33].

### *2.2. Performance Metrics*

Semantic communication, unlike traditional communication systems, does not emphasize the perfect recovery of the transmitted message but rather that the receiver correctly understands the message in the same way as the transmitter. As a result, performance metrics commonly used in traditional communication systems (e.g., bit error rate and symbol error rate) are no longer suitable for semantic communication. Hence, this paper uses the bilingual evaluation understudy (BLEU) score [34], the metric for evaluation of translation with explicit ordering (METEOR) score [35], and the semantic similarity score [36] as performance metrics.

### 2.2.1. BLEU Score

BLEU is currently the most commonly used metric in text evaluation [37]. It evaluates similarity by counting the number of matching n-grams between the transmitted and received texts, where an n-gram is a sequence of *n* consecutive words in the text. The formula can be expressed as

$$\log \text{BLEU} = \min \left( 1 - \frac{l\_{\hat{s}}}{l\_s}, 0 \right) + \sum\_{n=1}^{N} \omega\_n \log p\_n \tag{1}$$

where *s* and *s*ˆ denote the transmitted sentence and the restored sentence, respectively, and *ls* and *ls*ˆ are their lengths. *ωn* represents the weight of the n-grams, and *pn* denotes the precision of the n-grams.
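A minimal BLEU sketch following (1) (our own code; it applies the brevity penalty exactly as printed in (1), uniform weights *ωn* = 1/*N*, and clipped n-gram counts for the precisions *pn*):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(s, s_hat, N=4):
    """BLEU per Eq. (1): s is the transmitted (reference) sentence,
    s_hat the restored one, both given as token lists."""
    log_bp = min(1.0 - len(s_hat) / len(s), 0.0)  # brevity penalty as in (1)
    total = 0.0
    for n in range(1, N + 1):
        ref, hyp = ngrams(s, n), ngrams(s_hat, n)
        clipped = sum(min(c, ref[g]) for g, c in hyp.items())
        if clipped == 0:
            return 0.0          # an n-gram precision is zero
        total += (1.0 / N) * math.log(clipped / sum(hyp.values()))
    return math.exp(log_bp + total)
```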

### 2.2.2. METEOR Score

METEOR extends the synonym set by introducing external knowledge sources, such as WordNet [38]. Furthermore, it uses precision *Pm* and recall *Rm* to evaluate the similarity between transmitted and received texts. The formula is given as follows

$$F\_{\text{mean}} = \frac{P\_{\text{m}} R\_{\text{m}}}{\alpha P\_{\text{m}} + (1 - \alpha)R\_{\text{m}}} \tag{2}$$

$$\text{METEOR} = (1 - \text{Pen})F\_{\text{mean}} \tag{3}$$

where *α* is a hyperparameter balancing precision and recall, *F*mean represents the harmonic mean combining *Pm* and *Rm*, and Pen is the penalty coefficient accounting for word-order fragmentation.
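Equations (2) and (3) can be sketched directly from match statistics. The default parameter values below (*α* = 0.9, *β* = 3, *γ* = 0.5) are the commonly cited METEOR defaults and are an assumption here, not necessarily the values used in [35]:

```python
def meteor_score(matches, cand_len, ref_len, chunks, alpha=0.9, beta=3.0, gamma=0.5):
    """Equations (2)-(3): harmonic mean of precision and recall,
    scaled down by a fragmentation penalty Pen."""
    if matches == 0:
        return 0.0
    p_m = matches / cand_len  # precision P_m
    r_m = matches / ref_len   # recall R_m
    f_mean = p_m * r_m / (alpha * p_m + (1 - alpha) * r_m)
    pen = gamma * (chunks / matches) ** beta  # penalty coefficient Pen
    return (1 - pen) * f_mean
```

A perfect match in one contiguous chunk still incurs a small penalty, which is the intended behavior of the metric.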

### 2.2.3. Semantic Similarity Score

The semantic similarity score converts text into vectors by using the BERT model [39]. It evaluates the semantic similarity between sentences by comparing the degree of similarity between vectors. For the transmitted sentence's vector *v*(*s*) and the received sentence's vector *v*(*s*ˆ), the semantic similarity score can be expressed as

$$\text{sim}\_{\text{V}}(\mathbf{s}, \mathbf{\hat{s}}) = \frac{\mathbf{v}(\mathbf{s}) \cdot \left(\mathbf{v}(\mathbf{\hat{s}})\right)^{T}}{||\mathbf{v}(\mathbf{s})|| \cdot ||\mathbf{v}(\mathbf{\hat{s}})||}\tag{4}$$

All of the performance metrics introduced above take values between 0 and 1. A higher score means that the semantics of the received text are closer to those of the transmitted text: 0 means semantically irrelevant, and 1 means semantically consistent.
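Given sentence vectors produced by a BERT-style encoder (represented here by plain lists of floats), Equation (4) reduces to a cosine similarity:

```python
import math


def semantic_similarity(v_s, v_s_hat):
    """Equation (4): cosine similarity between the transmitted sentence's
    vector v(s) and the received sentence's vector v(s_hat)."""
    dot = sum(a * b for a, b in zip(v_s, v_s_hat))
    norm_s = math.sqrt(sum(a * a for a in v_s))
    norm_hat = math.sqrt(sum(b * b for b in v_s_hat))
    return dot / (norm_s * norm_hat)
```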

### **3. System Model**

As shown in Figure 1, the structure of the proposed system consists of a semantic extraction module, traditional communication architecture, and semantic restoration module. The proposed system can be divided into two levels, which are the semantic level and the technical level. The structure of the technical level is the same as that of the traditional communication system; thus, we mainly introduce the details at the semantic level. At the transmitter, the semantic extraction module can extract the knowledge graph (KG) of the transmitted sentence to represent its semantics. More importantly, the knowledge graph is sorted according to semantic importance. At the receiver, the semantic restoration module can recover the transmitted sentence according to the received knowledge graph.

Figure 2 shows examples of the proposed semantic communication system under different channel qualities. At the transmitter, the transmitted sentence is first converted into the knowledge graph through the semantic extraction module. Next, the transmitter adjusts the knowledge graph according to the channel quality. Then, the knowledge graph is transmitted through the channel. From the noisy knowledge graph received, the semantics are recovered through the semantic restoration module. In Figure 2a, when the channel quality is good, the transmitted sentence and the restored sentence convey the same semantics even though they have different sentence structures. When the channel quality is poor, not all triplets can be transmitted correctly. Therefore, the proposed semantic communication system chooses to transmit the most important triplet. When it comes to Steve Jobs, people tend to care about his relationship with Apple rather than the college he graduated from. As shown in Figure 2b, the transmitter only sends "<Steve Jobs – founder – Apple>" when the channel quality is poor.

**Figure 1.** The structure of the proposed semantic communication system based on the knowledge graph, including the semantic extraction module, traditional communication architecture, and semantic restoration module.

**Figure 2.** Examples of the proposed semantic communication system in different channel qualities. (**a**) An example of the proposed semantic communication system when the channel quality is good. (**b**) An example of the proposed semantic communication system when the channel quality is poor.

### *3.1. Semantic Extraction Method*

To represent the semantic information correctly, the semantic extraction module at the transmitter uses a deep learning network to extract the knowledge graph from the transmitted sentence. Let *S*2*G<sup>θ</sup>* (•) be the function of the proposed semantic extraction method, which takes the sentence *S* = [*w*1, *w*2, ··· , *wm*] as input and its corresponding output is the knowledge graph *G*, where *wm* is the *m*th word in the sentence. The deep learning network structure for the semantic extraction method is shown in Figure 3.

**Figure 3.** The deep learning network structure for the semantic extraction method.

In particular, we used the pipeline method to extract the knowledge graph, i.e., we first extract the entities in *S* and then predict the relations between them. Firstly, we used a well-established named entity recognition (NER) model to extract the entities [40]. This model is based on a conditional random field classifier and Gibbs sampling. The conditional random field classifier combines the characteristics of the maximum entropy model and the hidden Markov model, and it is often used for sequence labeling tasks, such as part-of-speech tagging and named entity recognition. Gibbs sampling is a method for generating Markov chains that can be used for Monte Carlo simulations. Based on the conditional random field classifier and Gibbs sampling, the NER model is trained on a large amount of manually annotated text and can recognize entities in given sentences. Therefore, the entities in the transmitted sentence can be expressed as

$$E = [en\_1, en\_2, \dots, en\_i, \dots, en\_L] = \text{NER}(S) \tag{5}$$

where *eni* represents the *i*th entity in the sentence, and *L* is the total number of entities contained in the sentence.

After extracting entities from *S*, we predict the relations between the two entities. Firstly, the embedding of each word *wj* in the entity *eni* is averaged to obtain the entity's embedding. The embedding of *wj* can be obtained by using a long short-term memory model (LSTM) [41] to encode *wj* and its context. The formula is given as follows

$$\text{emb}(w\_j) = \text{LSTM}\_{\text{enc}}(w\_j, w\_{<j}, w\_{>j}) \tag{6}$$

Therefore, the *i*th entity's embedding *ei* can be represented as

$$e\_{i} = \frac{1}{\text{Len}(en\_{i})} \sum\_{w\_{j} \in en\_{i}} \text{emb}\left(w\_{j}\right) \tag{7}$$

where Len(*eni*) is the number of words in the entity *eni*.
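Equation (7) is simply a per-dimension average of the word embeddings that make up an entity. A toy sketch, with hand-made vectors standing in for the LSTM outputs of Equation (6):

```python
def entity_embedding(word_embs):
    """Equation (7): e_i is the mean of emb(w_j) over the words w_j
    composing entity en_i. word_embs is a list of equal-length vectors."""
    length = len(word_embs)  # Len(en_i)
    dim = len(word_embs[0])
    return [sum(vec[k] for vec in word_embs) / length for k in range(dim)]
```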

Then we feed the entity embeddings into a multi-label classification layer MLCL(•) to predict the relations. The multi-label classification layer MLCL(•) can take in two entities and predict the possible relation set. To prevent these two entities from being irrelevant, the relation set includes the "no-relation" type. The relation set between the *i*th entity and the *j*th entity can be represented as

$$r\_{ij} = \text{MLCL}(e\_{i}, e\_{j}) \tag{8}$$

Since the knowledge graph is made of entities and relations, the probability of extracting a graph from a given sentence is equivalent to the product of the probability of extracting the relation set given any two entities. The formula can be expressed as

$$p(G \mid S) = \prod\_{i=0}^{L} \prod\_{j=0}^{L} p\left(r\_{ij} \mid e\_{i}, e\_{j}, S\right) \tag{9}$$

Based on the probability *p*(*G* | *S*), we can denote the loss function of the proposed semantic extraction method by using the negative log-likelihood loss, which can be formulated as

$$\begin{split} \mathcal{L}\_{S2G}(\theta) &= \mathbb{E}[-\log p(G \mid S; \theta)] \\ &= \mathbb{E}\left[ -\log \prod\_{i=0}^{L} \prod\_{j=0}^{L} p\left(r\_{ij} \mid e\_{i}, e\_{j}, S; \theta\right) \right] \end{split} \tag{10}$$

where *θ* is the network parameter set of the deep learning network, which is shown in Figure 3.
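Given the per-pair relation probabilities output by the multi-label classification layer, the loss in Equation (10) is a sum of negative log-probabilities over all entity pairs. A minimal sketch (the dictionary of probabilities stands in for the network's outputs):

```python
import math


def extraction_nll(pair_probs):
    """Equation (10): -log prod p(r_ij | e_i, e_j, S) = -sum log p(r_ij | ...).

    pair_probs maps an entity-index pair (i, j) to the model probability of
    the (possibly "no-relation") relation set between entities i and j.
    """
    return -sum(math.log(p) for p in pair_probs.values())
```

Gradient descent on this quantity with respect to *θ* is what "Train *θ* → *θ*∗" denotes in Algorithm 1.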

Utilizing the loss function L*S*2*G*, the optimal parameter set *θ*<sup>∗</sup> can be easily found using the gradient descent method. Consequently, the details of the proposed semantic extraction method can be summarized in Algorithm 1.


**Algorithm 1** The proposed semantic extraction method

**Input:** the transmitted sentence *S*

**Output:** the knowledge graph *G*

### *3.2. Semantic Restoration Method*

The proposed semantic restoration method—similar to the proposed semantic extraction method—uses deep learning to generate sentences from the received knowledge graph. The generated sentence can help the receiver understand the semantics of the transmitted sentence. Let *G*2*Sϕ*(•) be the function of the proposed semantic restoration. The input of *<sup>G</sup>*2*Sϕ*(•) is the received knowledge graph *<sup>G</sup>*<sup>ˆ</sup> and its output is the restored sentence *<sup>S</sup>*ˆ. The deep learning network structure for the semantic restoration method is shown in Figure 4.

**Figure 4.** The deep learning network structure for the semantic restoration method.

At first, we encoded the received knowledge graph *G*ˆ to convert it to the embedding, which could be processed by the deep learning network. Specifically, we used the graph attention network (GAT) [42] to calculate the embedding of the received knowledge graph *G*ˆ. GAT is a representative graph convolutional network that can encode the knowledge graph by introducing the attention mechanism into the knowledge graph. Therefore, the embedding of *G*ˆ can be represented as

$$h = \text{GAT}(\hat{G}) \tag{11}$$

After obtaining the embedding *h*, we used the recurrent neural network (RNN) and the attention mechanism to generate the sentence word by word. Each step of RNN can produce a word embedding. In the *i*th step, the embedding *bi* can be represented as

$$b\_i = \text{RNN}(b\_{i-1}, w\_{i-1}) \tag{12}$$

where *wi*−1 is the (*i* − 1)th word in the generated sentence and *bi*−1 is the embedding produced in the (*i* − 1)th step. To improve the accuracy of the generated sentence, the attention mechanism is used to obtain the embedding of the contextual information. The formula can be described as

$$c\_i = \text{ATTENTION}(b\_i, h) \tag{13}$$

where *ci* denotes the contextual information of the *i*th word. Then we fed the word embedding *bi* and the contextual information *ci* into a multilayer perceptron (MLP) to generate the *i*th word *wi*.

Consequently, the word *wi* is generated from the received knowledge graph *G*ˆ and all previously generated words *w*<*i* by predicting it through the MLP with the assistance of the word embedding *bi* and the contextual information *ci*. Thus, the probability of recovering the word *wi* can be represented as

$$p(w\_i \mid w\_{<i}, \hat{G}) \propto \exp(\text{MLP}([b\_i; c\_i])) \tag{14}$$
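The word-by-word generation of Equations (12)-(14) amounts to the loop below, where `step_fn` is a hypothetical callable standing in for the whole RNN, attention, and MLP stack (it is not the authors' actual network):

```python
def greedy_decode(step_fn, max_len):
    """Generate a sentence word by word. step_fn(history) returns the most
    probable next word w_i given all previously generated words w_<i,
    or None to stop generation."""
    words = []
    while len(words) < max_len:
        nxt = step_fn(list(words))
        if nxt is None:
            break
        words.append(nxt)
    return words
```

Each iteration corresponds to one RNN step (12), one attention read (13), and one MLP prediction (14).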

In summary, the probability of generating a sentence from the received knowledge graph *G*ˆ is equivalent to the product of the probability of generating each word. The probability can be described as

$$p(\hat{S} \mid \hat{G}) = \prod\_{i} p\left(w\_i \mid w\_{<i}, \hat{G}\right) \tag{15}$$

Similarly, we used the negative log-likelihood loss to denote the loss function of the proposed semantic restoration method according to the probability *<sup>p</sup>*(*S*ˆ|*G*ˆ). The loss function can be represented as

$$\begin{split} \mathcal{L}\_{G2S}(\phi) &= \mathbb{E}[-\log p(\hat{S} \mid \hat{G}; \phi)] \\ &= \mathbb{E}\left[-\log \prod\_{i} p\left(w\_{i} \mid w\_{<i}, \hat{G}; \phi\right)\right] \end{split} \tag{16}$$

where *ϕ* is the network parameter set of the deep learning network, which is shown in Figure 4. Finally, the gradient descent can be used to find the optimal parameter set *ϕ*∗ for minimizing the loss function L*G*2*S*(*ϕ*).

The details of the proposed semantic restoration process are summarized in Algorithm 2.


```
Algorithm 2: The proposed semantic restoration method
Input: the received knowledge graph Ĝ
Output: the restored sentence Ŝ
1: Compute the embedding of Ĝ by Equation (11)
...
7: Compute the loss function L_G2S(ϕ) according to Equation (16)
8: Train ϕ → ϕ*
```
### *3.3. System Process*

In this section, we introduce the overall process of the proposed semantic communication system. Let *S* = [*w*1, *w*2, ··· , *wm*] be the transmitted sentence, where *wm* is the *m*th word in the sentence. As shown in Figure 5, with the help of the proposed semantic extraction method *S*2*G<sup>θ</sup>* (•), the transmitter converts the transmitted sentence *S* to the knowledge graph *G*, which can be represented as *G* = *S*2*G<sup>θ</sup>* (*S*). The knowledge graph *G* consists of *n* triplets and it can be formulated as *G* = [*g*1, *g*2, ··· , *gn*].

**Figure 5.** The overall process of the proposed semantic communication system based on the knowledge graph, combining the proposed semantic extraction method, the proposed semantic restoration method, and the traditional communication architecture.

Using the proposed semantic extraction method, the transmitted sentence is converted into a series of triplets. In this process, the semantics of the transmitted sentence are extracted without loss [13]. During transmission, these triplets are independent of each other, so errors in some triplets do not affect the others. In Markov models, by contrast, a single transmission error can affect the whole transmitted sentence. Therefore, the proposed semantic communication system is more robust at a low SNR. Moreover, unlike bits or symbols, which are treated equally in traditional communication, the basic semantic symbols (triplets) differ in semantic importance and should be treated differently. The triplets with important semantics should be allocated more time slots and bandwidth resources. When the channel quality is extremely poor, instead of transmitting all triplets, which the channel cannot support reliably, it is better to ensure that the most important triplet is transmitted correctly. When the channel quality is better, the system can adjust the transmitted content according to semantic importance. Motivated by these differences in semantic importance, we sort the triplets according to their semantic similarity scores:

$$\text{sim}\_{\mathbf{v}}(s, \mathbf{g}\_i) = \frac{\mathbf{v}(\mathbf{s}) \cdot (\mathbf{v}(\mathbf{g}\_i))^T}{||\mathbf{v}(\mathbf{s})|| \cdot ||\mathbf{v}(\mathbf{g}\_i)||} \tag{17}$$

where *gi* denotes the *i*th triplet in *G*. Table 1 shows an example of semantic importance. From Table 1, "<Steve Jobs – founder – Apple>" is more important than "<Steve Jobs – graduate – Reed College>", which is in line with human perception.

**Table 1.** An example of semantic importance.


Based on the sorted triplets, we can adaptively adjust the number of transmitted triplets according to the channel quality. When the channel quality is extremely poor, we only transmit the most significant triplet and use the communication resources of triplets not transmitted to protect it. As the channel quality improves, we increase the number of transmitted triplets.
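The sort-then-truncate logic can be sketched as follows. The mapping from channel quality to a kept `fraction` is an assumption for illustration; the experiments in Section 4 use the fixed 1-triplet, 50%, and 100% strategies:

```python
import math


def _cos(u, v):
    """Cosine similarity used in Equation (17)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def select_triplets(triplets, v_sentence, v_triplets, fraction):
    """Sort triplets by sim_v(s, g_i) against the full-sentence vector and
    keep the top `fraction`, always retaining the most important triplet."""
    order = sorted(range(len(triplets)),
                   key=lambda i: _cos(v_sentence, v_triplets[i]),
                   reverse=True)
    keep = max(1, round(fraction * len(triplets)))
    return [triplets[i] for i in order[:keep]]
```

With `fraction = 0` this degenerates to the "send the most important triplet" strategy used at very low SNR.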

After the transmitted knowledge graph *G* is obtained, the transmitter first maps it into a binary bit stream *B* = *T*(*G*), and then feeds the binary bit stream into the channel encoder to cope with the effects of channel noise and distortion. Therefore, the whole process of the transmitter can be represented as

$$X = C(T(G)) \tag{18}$$

where *T*(•) and *C*(•) denote the source encoder and the channel encoder, respectively. If *X* is sent, the received signal can be represented as

$$Y = HX + N \tag{19}$$

where *H* is the channel coefficient and *N* ∼ CN(0, *σ*<sub>*n*</sub><sup>2</sup>) denotes the additive white Gaussian noise.
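Equation (19) can be simulated directly. The sketch below assumes unit transmit power, draws *H* as a complex Gaussian (so its magnitude is Rayleigh-distributed) for the fading case, and fixes *H* = 1 for the AWGN case:

```python
import math
import random


def channel(x, snr_db, fading=False, rng=random):
    """Equation (19): Y = H X + N with N ~ CN(0, sigma_n^2).

    sigma_n^2 is set from the SNR under a unit signal-power assumption.
    x is a list of complex symbols; one fading coefficient H is drawn
    per block (block fading) when fading=True.
    """
    sigma2 = 10 ** (-snr_db / 10)   # noise variance sigma_n^2
    s = math.sqrt(sigma2 / 2)       # noise std per real dimension
    if fading:
        h = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
    else:
        h = 1
    return [h * xi + complex(rng.gauss(0, s), rng.gauss(0, s)) for xi in x]
```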

After obtaining the received signal, the receiver will decode it to recover the transmitted knowledge graph. Defining *<sup>C</sup>*−1(•) and *<sup>T</sup>*−1(•) as the channel decoder and the source decoder, respectively, the received knowledge graph *G*ˆ can be represented as

$$\hat{G} = T^{-1}\left(C^{-1}(Y)\right) \tag{20}$$

Then we use the proposed semantic restoration method *G*2*Sϕ*(•) to obtain the restored sentence *S*ˆ.

$$\hat{S} = G2S\_{\phi}(\hat{G}) \tag{21}$$
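End to end, Equations (18) and (20) compose the source code, the channel code, and their inverses. A toy sketch, with a character-to-bits source code and a 3x repetition channel code standing in for the Huffman and LDPC coding used in the experiments (both stand-ins are illustrative assumptions):

```python
def transmit(graph, source_enc, channel_enc):
    """Equation (18): X = C(T(G))."""
    return channel_enc(source_enc(graph))


def receive(y, channel_dec, source_dec):
    """Equation (20): G_hat = T^{-1}(C^{-1}(Y))."""
    return source_dec(channel_dec(y))


# Toy stand-ins for T, T^{-1}, C, C^{-1}:
def to_bits(text):
    return [int(b) for ch in text for b in format(ord(ch), "08b")]


def from_bits(bits):
    chunks = ["".join(map(str, bits[i:i + 8])) for i in range(0, len(bits), 8)]
    return "".join(chr(int(c, 2)) for c in chunks)


def rep3_enc(bits):
    return [b for b in bits for _ in range(3)]


def rep3_dec(bits):
    # Majority vote over each group of three repeated bits
    return [1 if sum(bits[i:i + 3]) >= 2 else 0 for i in range(0, len(bits), 3)]
```

The repetition code corrects any single bit flip per group, mimicking (very crudely) the error-correction role of the LDPC code.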

The process of the proposed semantic communication system is shown in Algorithm 3.


```
Algorithm 3: The process of the proposed semantic communication system
Input: the transmitted sentence S
1: Transmitter:
...
10: Receiver:
11: Receive Y
12: T^(-1)(C^(-1)(Y)) → Ĝ
```


### **4. Experimental Results**

In this section, we compare the proposed SCKG with other traditional models under different channels, including the AWGN channel and the Rayleigh fading channel, to verify the effectiveness of SCKG. In Table 2, we introduce the models used in the experiments, including their general features and technical methods. It is worth noting that the traditional communication models are not limited to those mentioned in Table 2: the source coding can also use arithmetic coding, Lempel–Ziv (L–Z) coding, and other coding methods. Similarly, the channel coding can also use turbo codes, polar codes, and other coding methods.


**Table 2.** Introduction to the proposed model and other traditional models.

### *4.1. Experimental Settings*

In the simulation, the adopted dataset was the WebNLG dataset [45], which is commonly used for generating sentences from knowledge graphs. Each item in the dataset consists of multiple triplets and their corresponding sentences. After preprocessing the dataset, we obtained 12,597 training samples, 1746 validation samples, and 2493 test samples. The training and testing environment was Ubuntu 16.04 with CUDA 10.1, and the selected deep learning framework was PyTorch 1.6.0. The training settings of the semantic extraction method and the semantic restoration method are shown in Table 3.

**Table 3.** Training settings for semantic extraction and restoration method.


In the experiment, the test data of WebNLG were transmitted sentence-by-sentence to the transmitter. Then we obtained the restored sentences by using the above-mentioned methods at the receiver. After the restored sentences were obtained, the experimental results could be calculated according to the performance metrics.

For the benchmark, we adopted the traditional communication architecture with source coding and channel coding, where the source coding could use Huffman coding, arithmetic coding, L–Z coding, etc., and the channel coding could use LDPC codes, turbo codes, polar codes, etc. For simplicity, we adopted the combination of Huffman coding and LDPC coding (named "Huffman + LDPC"). Moreover, we considered two further methods as ablation experiments to validate the effectiveness of the proposed model. One used the proposed model without adaptive transmission and semantic restoration (named "Proposed model without AT and SR"), and the other used the proposed model without adaptive transmission (named "Proposed model without AT").

### *4.2. Experimental Result Analysis*

4.2.1. Performance of the Proposed Semantic Communication System

First, we investigated the effects of the number of transmitted triplets on the semantic performance under different SNRs. We considered three strategies: sending only the first triplet (named "Send the 1st triplet"), sending 50% of the triplets (named "Send 50% triplets"), and sending 100% of the triplets (named "Send 100% triplets"). Moreover, we compared these three strategies with the benchmark and with an end-to-end deep learning-based communication system proposed in [23] (named DeepNN). Figure 6 shows the semantic similarity versus the SNR in this experiment. From Figure 6, "Send the 1st triplet" has the best semantic similarity at a low SNR because it uses the most resources to protect the first triplet. As the SNR improves, "Send 50% triplets" performs better, because "Send the 1st triplet" transmits limited semantics while the accuracy of "Send 100% triplets" cannot be guaranteed due to channel distortion. The semantic similarity of "Send 100% triplets" is above the others at a high SNR, which is reasonable given the near-error-free transmission when the channel quality is good. Meanwhile, all three strategies outperformed the benchmark and DeepNN in their respective SNR regions. According to Figure 6, it is reasonable to send the most important triplet in the low SNR region, 50% of the triplets in the medium SNR region, and 100% of the triplets in the high SNR region.

**Figure 6.** Semantic similarity versus the SNR under the AWGN channel, with send the 1st triplet; send 50% triplets; send 100% triplets; Huffman + LDPC; DeepNN.

Figure 7 demonstrates the relationship between the SNR and the BLEU score under the AWGN channel. From Figure 7, the proposed model performs better at a low SNR, in terms of both the 1-gram and 2-gram BLEU scores, due to the protection of important triplets. Moreover, after converting the received triplets into sentences by using the proposed semantic restoration method, "Proposed model without AT" outperforms "Proposed model without AT and SR" in all SNR regimes. However, the performance of the proposed model is inferior to that of the traditional communication system in the high SNR region in Figure 7. This is because the proposed semantic restoration method attempts to recover the same semantics rather than the same sentence structure. For example, if the transmitted sentence is "Steve Jobs was the founder of Apple" and the restored sentence is "Steve Jobs founded Apple", the two sentences are semantically consistent, yet the BLEU score of the proposed scheme is poor.

**Figure 7.** BLEU score versus the SNR over the AWGN channel. (**a**) BLEU(1-gram) score over the AWGN channel. (**b**) BLEU(2-gram) score over the AWGN channel.

Figure 8 shows the relationship between the SNR and the BLEU score under the Rayleigh fading channel. All scores in Figure 8 are lower than those in Figure 7 because of the severe impact of Rayleigh fading. However, the proposed model significantly improves performance compared to the benchmark. From Figure 8, the proposed model outperforms the benchmark across the whole SNR range over the Rayleigh fading channel, in terms of both the 1-gram and 2-gram BLEU scores. This reflects that our proposed model is more robust in complex communication environments. Meanwhile, since "Proposed model without AT" and "Send 100% triplets" are identical in the high SNR region, the results of the proposed model and "Proposed model without AT" coincide when the SNR is higher than 2 dB.

**Figure 8.** BLEU score versus the SNR over the Rayleigh fading channel. (**a**) BLEU(1-gram) score over the Rayleigh fading channel. (**b**) BLEU(2-gram) score over the Rayleigh fading channel.

Since BLEU is an evaluation metric that calculates scores based on word matching, sentence length can affect the performance of our proposed model. To investigate this, we divided the transmitted sentences into three groups: sentence length between 0 and 15, sentence length between 15 and 30, and sentence length greater than 30. Figure 9 shows the relationship between the SNR and the 1-gram BLEU score under the AWGN channel and the Rayleigh fading channel, respectively. From Figure 9a, "Sentence Length (0, 15)" scores significantly higher than the other two groups. This is because the proposed model only transmits the most important triplet at a low SNR, so the length of the restored sentence is limited. In the low SNR region, the BLEU score decreases as the sentence length increases. As the SNR increases, the number of transmitted triplets increases, and the gaps between the different groups narrow. In Figure 9b, the gaps between the different groups are not obvious due to the effects of Rayleigh fading.

**Figure 9.** BLEU (1-gram) score versus the SNR with sentence length (0, 15). Sentence Length (15, 30); sentence length (>30). (**a**) BLEU (1-gram) score over the AWGN channel. (**b**) BLEU (1-gram) score over the Rayleigh fading channel.

Figure 10 shows the METEOR score versus the SNR over the AWGN channel and the Rayleigh fading channel. From Figure 10a, the score of the benchmark is close to 1 and higher than that of our proposed model when the SNR is above 4 dB. This is because the few errors that occur during transmission are corrected by the channel coding at a high SNR, so the benchmark can restore the transmitted sentence without distortion, whereas our proposed model discards the sentence-structure information during transmission. When the SNR is less than 4 dB, the channel coding cannot correct all transmission errors. In this situation, the METEOR score of the benchmark degrades rapidly, while the proposed model reduces the number of transmitted triplets and protects the important ones, which leads to better performance in the low SNR region. From Figure 10b, even under the Rayleigh fading channel, our model outperforms the benchmark in all SNR regions.

**Figure 10.** (**a**) METEOR score versus the SNR over the AWGN channel. (**b**) METEOR score versus the SNR over the Rayleigh fading channel.

Figure 11 shows the relationship between the SNR and the semantic similarity under the AWGN channel and the Rayleigh fading channel. From Figure 11, "Proposed model without AT and SR" outperforms the benchmark in the low SNR region under the AWGN channel, and in all SNR regions under the Rayleigh fading channel. This is because our proposed model splits the transmitted sentence into multiple independent triplets, so wrongly transmitted triplets do not affect the semantics of the other triplets. The benchmark, however, transmits the sentence as a whole, and if errors occur during transmission, the semantics of the whole sentence are affected. Therefore, when the channel quality is poor, our proposed model can preserve partially correct semantics. Meanwhile, since the semantic similarity based on the BERT model can capture semantic relationships among words, the proposed scheme obtains a higher semantic similarity score than BLEU and METEOR scores.

**Figure 11.** (**a**) Semantic similarity versus the SNR over the AWGN channel. (**b**) Semantic similarity versus the SNR over the Rayleigh fading channel.

To ensure a fair comparison of the experimental results, we computed the time complexity of each strategy. We transmitted 1000 sentences from the transmitter to the receiver using the different strategies and calculated the average execution time. All tests were run in Python on a computer with an AMD Ryzen 7 4800H CPU and an NVIDIA GeForce RTX 3060 GPU. The results are shown in Table 4. From Table 4, our proposed model increases the computational complexity but improves communication reliability.

**Table 4.** The time complexity of all strategies.


4.2.2. Comparison with Other Semantic Communication Models

To validate that our proposed model is more competitive than existing research, we compared it with the scheme from [23], which adopts an end-to-end deep learning-based communication system for text transmission (named DeepNN). Figure 12 shows the relationship between the SNR and the semantic similarity over the AWGN channel. From Figure 12, our proposed model outperforms DeepNN across the entire SNR region. The reasons are two-fold. First, by using triplets as basic semantic symbols, our proposed model can extract the semantics losslessly. Second, the important triplets are allocated more transmission resources in our proposed model, which effectively protects the important semantics. In contrast, DeepNN uses a fixed bit length to encode sentences of different lengths, resulting in a partial loss of semantics.

**Figure 12.** Semantic similarity of our proposed model and DeepNN versus the SNR over the AWGN channel.

### **5. Conclusions**

In this paper, reliable semantic communication assisted by the knowledge graph was studied, which overcomes the problem that the meaning of data represented by the features of a deep learning model is not explainable [26,28]. Specifically, we proposed a semantic extraction scheme that transforms the transmitted sentence into multiple triplets with semantic importance. Moreover, an adaptive transmission scheme was proposed, in which the important triplets are allocated more communication resources to combat channel distortion. Furthermore, a semantic restoration scheme was designed to reconstruct the sentence and recover the whole semantics at the receiver. The simulation results show that the proposed system outperforms the traditional schemes in communication reliability, especially in the low SNR regime. However, the optimal number of triplets to transmit over a specific channel remains an open question. In the future, more work is needed to analyze the relationship between the number of triplets and the channel quality.

**Author Contributions:** Conceptualization, S.J.; methodology, S.J. and Y.L.; formal analysis, Y.Z. and P.L.; investigation, K.C. and J.X.; supervision, H.Z.; writing—original draft preparation, S.J. and Y.L.; writing—review and editing, K.C., H.Z. and J.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by the National Natural Science Foundation of China under grant nos. 61931020, U19B2024, and 62001483, and in part by the science and technology innovation Program of Hunan Province under grant no. 2021JJ40690.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are available from the authors, on request.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Coded Caching for Broadcast Networks with User Cooperation †**

**Zhenhao Huang 1, Jiahui Chen 1, Xiaowen You 1, Shuai Ma <sup>2</sup> and Youlong Wu 1,\***


**Abstract:** The caching technique is a promising approach to reduce the heavy traffic load and improve the user latency experience in the Internet of Things (IoT). In this paper, by exploiting edge cache resources and communication opportunities in device-to-device (D2D) networks and broadcast networks, two novel coded caching schemes are proposed that greatly reduce transmission latency for the centralized and decentralized caching settings, respectively. In addition to the multicast gain, both schemes obtain an additional *cooperation gain* offered by user cooperation and an additional *parallel gain* offered by the parallel transmission among the server and users. With a newly established lower bound on the transmission delay, we prove that the centralized coded caching scheme is *order-optimal*, i.e., it achieves a constant multiplicative gap within the minimum transmission delay. The decentralized coded caching scheme is also order-optimal if each user's cache size is larger than a threshold that approaches zero as the total number of users tends to infinity. Moreover, theoretical analysis shows that, to reduce the transmission delay, the number of users sending signals simultaneously should be appropriately chosen according to the users' cache size, and always letting more users send information in parallel could cause high transmission delay.

**Keywords:** coded cache; cooperation; device-to-device; transmission delay

### **1. Introduction**

With the rapid development of Internet of Things (IoT) technologies, IoT data traffic, such as live streaming and on-demand video streaming, has grown dramatically over the past few years. To reduce the traffic load and improve the user latency experience, the caching technique has been viewed as a promising approach that shifts the network traffic to low congestion periods. In the seminal paper [1], Maddah-Ali and Niesen proposed a coded caching scheme based on centralized file placement and coded multicast delivery that achieves a significantly larger global multicast gain compared to the conventional uncoded caching scheme.

The coded caching scheme has attracted wide and significant interest. It was extended to a setup with decentralized file placement, where no coordination is required for the file placement [2]. For the cache-aided broadcast network, ref. [3] showed that the rate–memory tradeoff of the above caching system is within a factor of 2.00884. For the setting with uncoded file placement, where each user stores uncoded content from the library, refs. [4,5] proved that Maddah-Ali and Niesen's scheme is optimal. In [6], both the placement and delivery phases of coded caching are depicted using a placement delivery array (PDA), and an upper bound for all possible regular PDAs was established. In [7], the authors studied a cache-aided network with a heterogeneous setting where the user cache memories are unequal. More asymmetric network settings have been discussed, such as coded caching with heterogeneous user profiles [8], with distinct file sizes [9], with asymmetric cache sizes [10–12] and with distinct link qualities [13]. Settings with varying file popularities have been discussed in [14–16]. Coded caching that jointly considers various heterogeneous aspects was studied in [17]. Other works on coded caching include, e.g., the cache-aided noiseless multi-server network [18], cache-aided wireless/noisy broadcast networks [19–22], cache-aided relay networks [23–25], cache-aided interference management [26,27], coded caching with random demands [28], caching in combination networks [29], coded caching under secrecy constraints [30], coded caching with reduced subpacketization [31,32], the coded caching problem where each user requests multiple files [33], and a cache-aided broadcast network for correlated content [34].

**Citation:** Huang, Z.; Chen, J.; You, X.; Ma, S.; Wu, Y. Coded Caching for Broadcast Networks with User Cooperation. *Entropy* **2022**, *24*, 1034. https://doi.org/10.3390/e24081034

Academic Editors: H. Vincent Poor, Holger Boche, Rafael F. Schaefer and Onur Günlü

Received: 22 June 2022; Accepted: 25 July 2022; Published: 27 July 2022

A different line of work studies cache-aided networks without the presence of a server, e.g., the device-to-device (D2D) cache-aided network. In [35], the authors investigated coded caching for wireless D2D networks, where users are located in a fixed mesh topology. A D2D system with selfish users, who do not participate in delivering the missing subfiles to all users, was studied in [36]. Wang et al. applied the PDA to characterize cache-aided D2D wireless networks in [37]. In [38], the authors studied spatial D2D networks in which the user locations are modeled by a Poisson point process. For heterogeneous cache-aided D2D networks where users are equipped with cache memories of distinct sizes, ref. [39] minimized the delivery load by optimizing over the partition during the placement phase and the size and structure of the D2D groups during the delivery phase. A highly dense wireless network with device mobility was investigated in [40].

In fact, combining the cache-aided broadcast network with the cache-aided D2D network can potentially reduce the transmission latency. This hybrid network is common in many practical distributed systems such as cloud networks [41], where a central cloud server broadcasts messages to multiple users through the cellular network while the users communicate with each other through a fiber local area network (LAN). A typical scenario is that users in a moderately dense area, such as a university, want to download files, such as movies, from a data library, such as a video service provider. Note that the user demands are highly redundant, and the files are not only stored by a central server but also partially cached by the users. A user can thus obtain the desired content by communicating with both the central server and other users, so that the communication and storage resources are used efficiently. Unfortunately, there is very little research investigating the coded caching problem for this hybrid network. In this paper, we consider such a hybrid cache-aided network, where a server storing *N* ∈ Z+ files connects with *K* ∈ Z+ users through a broadcast network, while the users can exchange information via a D2D network. Unlike the settings of [35,38], in which each user can only communicate with its neighboring users via spatial multiplexing, we consider the D2D network to be either an error-free shared link or a flexible routing network [18]. In particular, in the case of the shared link, all users exchange information via a single shared link. In the flexible routing network, a routing strategy adaptively partitions all users into multiple groups, in each of which one user sends data packets error-free to the remaining users of the group.
Let *α* ∈ Z+ be the number of groups that send signals at the same time; then the following fundamental questions arise for this hybrid cache-aided network:


In this paper, we try to address these questions, and our main contributions are summarized as follows:


Please note that the decentralized scenario is much more complicated than the centralized scenario, since each subfile can be stored by *s* = 1, 2, ... , *K* users, leading to a dynamic file-splitting and communication strategy in the D2D network. Our schemes, in particular the decentralized coded caching scheme, differ greatly from the D2D coded caching scheme in [35]. Specifically, ref. [35] considered a fixed network topology where each user connects with a fixed set of users, and the total user cache size must be large enough to store all files in the library. In our schemes, however, the user group partition changes dynamically, and each user can communicate with any set of users via network routing. Moreover, in our model the server shares communication loads with the users, resulting in an allocation problem for the communication loads between the broadcast network and the D2D network. Finally, our schemes achieve a tradeoff between the cooperation gain, parallel gain and multicast gain, while the schemes in [1,2,35] only achieve the multicast gain.

The remainder of this paper is organized as follows. Section 2 presents the system model and defines the main problem studied in this paper. We summarize the obtained main results in Section 3. Following that is a detailed description of the centralized coded caching scheme with user cooperation in Section 4. Section 5 extends the techniques we developed for the centralized caching problem to the setting of decentralized random caching. Section 6 concludes this paper.

### **2. System Model and Problem Definition**

Consider a cache-aided network consisting of a single server and *K* users as depicted in Figure 1. The server has a library of *N* independent files *W*1, ... , *WN*. Each file *Wn*, *n* = 1, . . . , *N*, is uniformly distributed over

$$[2^F] \triangleq \{1, 2, \dots, 2^F\},$$

for some positive integer *F*. The server connects with the *K* users through a noise-free shared link that is rate-limited to a network speed of *C*1 bits per second (bits/s). Each user *k* ∈ [*K*] is equipped with a cache memory of size *MF* bits, for some *M* ∈ [0, *N*], and the users can communicate with each other via a D2D network.

We mainly focus on two types of D2D networks: a shared link as in [1,2] and a flexible routing network introduced in [18]. In the case of a shared link, all users connect with each other through a shared error-free link that is rate-limited to *C*2 bits/s. In the flexible routing network, the *K* users can arbitrarily form multiple groups via network routing, in each of which at most one user can send error-free data packets at a network speed of *C*2 bits/s to the remaining users within the group. To unify these two types of D2D networks, we introduce an integer *α*max ∈ {1, ⌊*K*/2⌋}, which denotes the maximum number of groups allowed to send data in parallel in the D2D network. For example, when *α*max = 1, the D2D network degenerates into a shared link, and when *α*max = ⌊*K*/2⌋, it becomes the flexible routing network.
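As a concrete illustration of this unified model, the following sketch partitions the users into sender groups; `partition_users` is a hypothetical helper of our own (the paper's routing strategy is chosen adaptively, not round-robin):

```python
def partition_users(K, alpha_max):
    """Split users 1..K into at most alpha_max groups via round-robin routing.
    Illustrates the unified D2D model only; not the paper's actual strategy."""
    num_groups = max(1, min(alpha_max, K // 2))
    groups = [[] for _ in range(num_groups)]
    for user in range(1, K + 1):
        groups[(user - 1) % num_groups].append(user)
    return groups

# alpha_max = 1: the D2D network degenerates into a single shared link.
print(partition_users(6, 1))  # [[1, 2, 3, 4, 5, 6]]
# alpha_max = K // 2: the flexible routing network with parallel pairs.
print(partition_users(6, 3))  # [[1, 4], [2, 5], [3, 6]]
```

In each group at most one user multicasts to the rest, so the number of groups equals the number of simultaneous senders.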

**Figure 1.** Caching system considered in this paper. A server connects with *K* cache-enabled users and the users can cooperate through a flexible network.

The system works in two phases: a placement phase and a delivery phase. In the placement phase, all users can access the entire library *W*1, ... , *WN* and fill their cache memories. More specifically, each user *k*, for *k* ∈ [*K*], maps *W*1, ... , *WN* to its cache content:

$$Z\_k \triangleq \phi\_k(W\_1, \dots, W\_N), \tag{1}$$

for some caching function

$$
\phi\_k: [2^F]^N \to [\lfloor 2^{MF} \rfloor]. \tag{2}
$$

In the delivery phase, each user requests one of the *N* files from the library. We denote the demand of user *k* by *dk* ∈ [*N*] and its desired file by *Wdk*. Let **d** ≜ (*d*1, ... , *dK*) denote the request vector. In this paper, we investigate the worst-case request in which each user makes a unique request.

Once the request vector **d** is informed to the server and all users, the server produces the symbol

$$X \triangleq f\_{\mathbf{d}}(\mathcal{W}\_{1}, \dots, \mathcal{W}\_{N}),\tag{3}$$

and broadcasts it to all users through the broadcast network. Meanwhile, user *k* ∈ [*K*] produces the symbol (each user *k* could produce *Xk* as a function of *Zk* and the signals received from the server, but since the server broadcasts its signal to the whole network, every user can access it, and this is equivalent to generating *Xk* as a function of *Zk* only)

$$X\_k \triangleq f\_{k, \mathbf{d}}(Z\_k), \tag{4}$$

and sends it to a set of intended users D*<sup>k</sup>* ⊆ [*K*] through the D2D network. Here, D*<sup>k</sup>* represents the set of destination users served by node *k*, *f***<sup>d</sup>** and *fk*,**<sup>d</sup>** are some encoding functions

$$f\_{\mathbf{d}} : [2^F]^N \to [\lfloor 2^{R\_1 F} \rfloor], \quad f\_{k, \mathbf{d}} : [\lfloor 2^{MF} \rfloor] \to [\lfloor 2^{R\_2 F} \rfloor],\tag{5}$$

where *R*1 and *R*2 denote the *transmission rates* sent by the server in the broadcast network and by each user in the D2D network, respectively. Here we focus on the symmetric case where all users have the same transmission rate. Due to the constraint of *α*max, at most *α*max users can send signals in parallel in each channel use. The set of *α*max users who send signals in parallel can be adaptively changed during the delivery phase.

At the end of the delivery phase, due to the error-free transmission in the broadcast and D2D networks, user *k* observes the symbols sent to it, i.e., (*Xj* : *j* ∈ [*K*], *k* ∈ D*j*), and decodes its desired message as *W*ˆ *dk* = *ψk*,**d**(*X*, (*Xj* : *j* ∈ [*K*], *k* ∈ D*j*), *Zk*), where *ψk*,**d** is a decoding function.

We define the worst-case probability of error as

$$P\_e \triangleq \max\_{\mathbf{d} \in [N]^K} \max\_{k \in [K]} \Pr\left(\hat{W}\_{d\_k} \neq W\_{d\_k}\right). \tag{6}$$

A coded caching scheme (*M*, *R*1, *R*2) consists of caching functions {*φk*}, encoding functions { *f***d**, *fk*,**d**} and decoding functions {*ψk*,**d**}. We say that the rate tuple (*M*, *R*1, *R*2) is *achievable* if for every *ε* > 0 and every large enough file size *F*, there exists a coded caching scheme such that *Pe* is less than *ε*.

Since the server and the users send signals in parallel, the total transmission delay, denoted by *T*, can be defined as

$$T \triangleq \max\{\frac{R\_1 F}{\mathcal{C}\_1}, \frac{R\_2 F}{\mathcal{C}\_2}\}. \tag{7}$$

The *optimal* transmission delay is *T*∗ ≜ inf{*T* : *T* is achievable}. For simplicity, we assume that *C*1 = *C*2 = *F*, and then from (7) we have

$$T = \max\{R\_1, R\_2\}.\tag{8}$$

When *C*1 ≠ *C*2, e.g., *C*1 : *C*2 = 1/*k*, one small adjustment allowing our scheme to continue to work is to multiply *λ* by 1/(*k*(1 − *λ*) + *λ*), where *λ* is a design parameter introduced later.

Our goal is to design a coded caching scheme to minimize the transmission delay. Finally, in this paper we assume *K* ≤ *N* and *M* ≤ *N*. Extending the results to other scenarios is straightforward, as mentioned in [1].

### **3. Main Results**

We first establish a general lower bound on the transmission delay for the system model described in Section 2, then present two upper bounds of the optimal transmission delay achieved by our centralized and decentralized coded caching schemes, respectively. Finally, we present the optimality results of these two schemes.

**Theorem 1** (Lower Bound)**.** *For memory size* 0 ≤ *M* ≤ *N, the optimal transmission delay is lower bounded by*

$$T^\* \ge \max\left\{ \frac{1}{2} \left( 1 - \frac{M}{N} \right), \max\_{s \in [K]} \left( s - \frac{KM}{\lfloor N/s \rfloor} \right), \max\_{s \in [K]} \left( s - \frac{sM}{\lfloor N/s \rfloor} \right) \frac{1}{1 + \alpha\_{\max}} \right\}.\tag{9}$$

**Proof.** See the proof in Appendix A.
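The right-hand side of (9) can be evaluated numerically; the sketch below (with a hypothetical helper name `lower_bound` of our own, assuming *K* ≤ *N* so that ⌊*N*/*s*⌋ ≥ 1 for all *s* ∈ [*K*]) computes the three terms and their maximum:

```python
from math import floor

def lower_bound(K, N, M, alpha_max):
    """Numerically evaluate the lower bound (9) on T*; assumes K <= N."""
    term1 = 0.5 * (1 - M / N)
    term2 = max(s - K * M / floor(N / s) for s in range(1, K + 1))
    term3 = max((s - s * M / floor(N / s)) / (1 + alpha_max)
                for s in range(1, K + 1))
    return max(term1, term2, term3)

# With an empty cache (M = 0) the bound reduces to K (take s = K in the
# second term), matching the K file transmissions the server must make.
print(lower_bound(20, 40, 0, 5))  # 20.0
```

At the other extreme, *M* = *N* makes every term nonpositive except the trivial value 0, as expected since each user can cache the whole library.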

### *3.1. Centralized Coded Caching*

In the following theorem, we present an upper bound on the transmission delay for the centralized caching setup.

**Theorem 2** (Upper Bound for the Centralized Scenario)**.** *Let t* ≜ *KM*/*N* ∈ Z+ *and α* ∈ Z+*. For memory size M* ∈ {0, *N*/*K*, 2*N*/*K*, ... , *N*}*, the optimal transmission delay T*∗ *is upper bounded by T*∗ ≤ *T*central*, where*

$$T\_{\text{central}} \triangleq \min\_{\alpha \le \alpha\_{\max}} K \left( 1 - \frac{M}{N} \right) \frac{1}{1 + t + \alpha \min \left\{ \lfloor \frac{K}{\alpha} \rfloor - 1, t \right\}}. \tag{10}$$

*For general* 0 ≤ *M* ≤ *N, the lower convex envelope of these points is achievable.*

**Proof.** See scheme in Section 4.
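A minimal sketch of evaluating (10), assuming *t* = *KM*/*N* is an integer; the helper name `t_central` is ours, not the paper's:

```python
def t_central(K, N, M, alpha_max):
    """Evaluate the upper bound (10) by minimizing over alpha = 1..alpha_max.
    Assumes t = K*M/N is a positive integer (centralized placement)."""
    t = K * M // N
    return min(K * (1 - M / N) / (1 + t + alpha * min(K // alpha - 1, t))
               for alpha in range(1, alpha_max + 1))

# With KM/N = K - 1 the bound evaluates to 1/(2K - 1): here K = 6, M = 5,
# N = 6 gives ~1/11.
print(t_central(6, 6, 5, 3))
```

The minimization mirrors the choice of the number of parallel groups *α*; a closed-form optimizer is given in (11) below.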

The following simple example shows that the proposed upper bound can greatly reduce the transmission delay.

**Example 1.** *Consider a network described in Section 2 with KM*/*N* = *K* − 1*. The coded caching scheme without D2D communication [1] has the server multicast an XOR message useful for all K users, achieving the transmission delay K*(1 − *M*/*N*)/(1 + *t*) = 1/*K. The D2D coded caching scheme [35] achieves the transmission delay* (*N*/*M*)(1 − *M*/*N*) = 1/(*K* − 1)*. The achievable transmission delay in Theorem 2 equals* 1/(2*K* − 1) *by letting α* = 1*, i.e., almost half the delay of the previous schemes when K is sufficiently large.*

From (10), we obtain that the optimal value of *α*, denoted by *α*∗, equals 1 if *t* ≥ *K* − 1 and equals *α*max if *t* ≤ ⌊*K*/*α*max⌋ − 1. When ignoring all integer constraints, we obtain *α*∗ = *K*/(*t* + 1). We rewrite this choice as follows:

$$\alpha^\* = \begin{cases} 1, & t \ge K - 1, \\ K/(t+1), & \lfloor K/\alpha\_{\max} \rfloor - 1 < t < K - 1, \\ \alpha\_{\max}, & t \le \lfloor K/\alpha\_{\max} \rfloor - 1. \end{cases} \tag{11}$$

**Remark 1.** *From* (11)*, we observe that when M is small enough that t* ≤ ⌊*K*/*α*max⌋ − 1*, we have α*∗ = *α*max*. As M increases, α*∗ *becomes K*/(*t* + 1)*, smaller than α*max*. When M is sufficiently large that M* ≥ (*K* − 1)*N*/*K, only one user should be allowed to send information, i.e., α*∗ = 1*. This indicates that letting more users send information in parallel can be harmful. The main reason for this phenomenon is a tradeoff between the* multicast gain*,* cooperation gain *and* parallel gain*, which will be introduced below in this section.*

Comparing *T*central with the transmission delay achieved by Maddah-Ali and Niesen's scheme for the broadcast network [1], i.e., *K*(1 − *M*/*N*)/(1 + *t*), *T*central contains an additional factor

$$G\_{\text{central},c} \triangleq \frac{1}{1 + \frac{\alpha}{1+t} \min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}},\tag{12}$$

referred to as the *centralized cooperation gain*, as it arises from user cooperation. Comparing *T*central with the transmission delay achieved by the D2D coded caching scheme [35], i.e., (*N*/*M*)(1 − *M*/*N*), *T*central contains an additional factor

$$G\_{\text{central},p} \triangleq \frac{1}{1 + \frac{1}{t} + \frac{\alpha}{t} \min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}},\tag{13}$$

referred to as *centralized parallel gain*, as it arises from parallel transmission among the server and users. Both gains depend on *K*, *M*/*N* and *α*max.
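Both gains can be evaluated directly from (12) and (13); the sketch below (hypothetical helper `gains`, assuming *t* = *KM*/*N* ≥ 1 is an integer) also checks the identity *G*central,p = *G*central,c · *t*/(1 + *t*) implied by the two definitions:

```python
def gains(K, N, M, alpha):
    """Evaluate the cooperation gain (12) and parallel gain (13) for a given
    number of parallel groups alpha; requires t = K*M/N to be an integer >= 1."""
    t = K * M // N
    m = min(K // alpha - 1, t)
    g_coop = 1 / (1 + alpha * m / (1 + t))
    g_par = 1 / (1 + 1 / t + alpha * m / t)
    return g_coop, g_par

g_c, g_p = gains(6, 6, 5, 1)  # t = 5 >= K - 1, so alpha* = 1 by (11)
# Writing both out gives G_c = (1+t)/(1+t+alpha*m), G_p = t/(1+t+alpha*m).
assert abs(g_p - g_c * 5 / 6) < 1e-12
```

In this *t* ≥ *K* − 1 regime the values reduce to (1 + *t*)/(*K* + *t*) and *t*/(*K* + *t*), matching the first cases of (14) and (15) below.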

Substituting the optimal *α*∗ into (12), we have

$$G\_{\text{central},c} = \begin{cases} \frac{1+t}{K+t}, & t \ge K-1, \\ \frac{1+t}{\frac{Kt}{t+1}+t+1}, & \lfloor \frac{K}{\alpha\_{\max}} \rfloor - 1 < t < K-1, \\ \frac{1+t}{\alpha\_{\max}t+t+1}, & t \le \lfloor \frac{K}{\alpha\_{\max}} \rfloor - 1. \end{cases} \tag{14}$$

When fixing (*K*, *N*, *α*max), *G*central,c is in general not a monotonic function of *M*. More specifically, when *M* is small enough that *t* < ⌊*K*/*α*max⌋ − 1, the function *G*central,c is monotonically decreasing, reflecting the improvement brought by introducing D2D communication. This is mainly because a relatively larger *M* allows users to share more common data with each other, providing more opportunities for user cooperation. However, when *M* grows large enough that *t* ≥ ⌊*K*/*α*max⌋ − 1, the local and global caching gains become dominant, and less improvement can be obtained from user cooperation, turning *G*central,c into a monotonically increasing function of *M*.

Similarly, substituting the optimal *α*∗ into (13), we obtain

$$G\_{\text{central},p} = \begin{cases} \frac{t}{K+t}, & t \ge K-1, \\ \frac{t}{\frac{Kt}{t+1}+t+1}, & \lfloor \frac{K}{\alpha\_{\max}} \rfloor - 1 < t < K-1, \\ \frac{t}{\alpha\_{\max}t+t+1}, & t \le \lfloor \frac{K}{\alpha\_{\max}} \rfloor - 1. \end{cases} \tag{15}$$

Equation (15) shows that *G*central,p is monotonically increasing with *t*, mainly due to the fact that when *M* increases, more content can be sent through the D2D network without the help of the central server, decreasing the improvement from parallel transmission between the server and users.

The centralized cooperation gain (12) and parallel gain (13) are plotted in Figure 2 when *N* = 40, *K* = 20 and *α*max = 5.

**Figure 2.** Centralized cooperation gain and parallel gain when *N* = 40, *K* = 20 and *α*max = 5.

**Remark 2.** *A larger α can lead to better parallel and cooperation gains (more users can concurrently multicast signals to other users), but results in a worse multicast gain (signals are multicast to fewer users in each group). The choice of α in* (11) *is in fact a tradeoff between the multicast gain, parallel gain and cooperation gain.*

The proposed scheme achieving the upper bound in Theorem 2 is order-optimal.

**Theorem 3.** *For memory size* 0 ≤ *M* ≤ *N,*

$$\frac{T\_{\text{central}}}{T^\*} \le 31.\tag{16}$$

**Proof.** See the proof in Appendix B.

The exact gap *T*central/*T*∗ could be much smaller. One could apply the method proposed in [3] to obtain a tighter lower bound and shrink the gap. In this paper, we only prove the order optimality of the proposed scheme and leave the search for a smaller gap as future work.

Figure 3 plots the lower bound (9) and the upper bounds achieved by various schemes, including the proposed scheme, the scheme *Maddah-Ali 2014* in [1], which considers the broadcast network without D2D communication, and the scheme *Ji 2016* in [35], which considers the D2D network without a server. Our scheme clearly outperforms the previous schemes and closely approaches the lower bound.

**Figure 3.** Transmission delay when *N* = 40, *K* = 20 and *α*max = 5. The upper bounds are achieved under the centralized caching scenario.

### *3.2. Decentralized Coded Caching*

We exploit the multicast gain from coded caching, D2D communication, and parallel transmission between the server and users, leading to the following upper bound.

**Theorem 4** (Upper Bound for the Decentralized Scenario)**.** *Define p* ≜ *M*/*N. For memory size* 0 ≤ *M* ≤ *N, the optimal transmission delay T*∗ *is upper bounded by*

$$T^\* \le T\_{\text{decentral}} \triangleq \max \left\{ R\_{\emptyset}, \frac{R\_{\text{s}} R\_{\text{u}}}{R\_{\text{s}} + R\_{\text{u}} - R\_{\emptyset}} \right\},\tag{17}$$

*where*

$$R\_{\emptyset} \triangleq K(1-p)^K,\tag{18}$$

$$R\_{\text{s}} \triangleq \frac{1-p}{p} \left(1 - (1-p)^K\right),\tag{19}$$

$$R\_{\text{u}} \triangleq \frac{1}{\alpha\_{\max}} \sum\_{s=2}^{\lceil \frac{K}{\alpha\_{\max}} \rceil - 1} \frac{s \binom{K}{s}}{s-1} p^{s-1} (1-p)^{K-s+1} + \sum\_{s=\lceil \frac{K}{\alpha\_{\max}} \rceil}^{K} \frac{K \binom{K-1}{s-1}}{f(K,s)} p^{s-1} (1-p)^{K-s+1}, \tag{20}$$

*with*

$$f(K,s) \triangleq \begin{cases} \lfloor \frac{K}{s} \rfloor (s-1), & (K \bmod s) < 2, \\ K - 1 - \lfloor K/s \rfloor, & (K \bmod s) \ge 2. \end{cases} \tag{21}$$

**Proof.** Here, *R*∅ represents the rate of sending contents that are not cached by any user, while *R*s and *R*u represent the transmission rate sent by the server via the broadcast network and the transmission rate sent by the users via the D2D network, respectively. Equation (17) balances the communication loads assigned to the server and the users. See the detailed proof in Section 5.

The key idea of the scheme achieving (17) is to partition the *K* users into ⌈*K*/*s*⌉ groups for each communication round *s* ∈ [*K* − 1], and to let each group perform the D2D coded caching scheme [35] to exchange information. The main challenge is that among all ⌈*K*/*s*⌉ groups, there are ⌊*K*/*s*⌋ groups of the same size *s* and an *abnormal* group of size (*K* mod *s*) if (*K* mod *s*) ≠ 0, leading to an asymmetric caching setup. One may use the scheme of [35] for the groups of size *s* and for the abnormal group of size (*K* mod *s*) ≥ 2, but how to exploit the caching resources and communication capability of all groups, while balancing the communication loads among the two types of groups to minimize the transmission delay, remains elusive and needs to be carefully designed. Moreover, this challenge poses complexities both in establishing the upper bound and in the optimality proof.
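Expressions (17)–(21) can be evaluated numerically. The sketch below, with hypothetical helper names `f_Ks`, `decentral_rates` and `t_decentral` of our own (assuming 0 < *p* < 1), also lets one check the ordering *R*∅ ≤ *T*decentral ≤ *R*s stated in Remark 4:

```python
from math import ceil, comb

def f_Ks(K, s):
    """f(K, s) from (21)."""
    if K % s < 2:
        return (K // s) * (s - 1)
    return K - 1 - K // s

def decentral_rates(K, N, M, alpha_max):
    """Evaluate R_empty (18), R_s (19) and R_u (20); assumes 0 < M < N."""
    p = M / N
    R_empty = K * (1 - p) ** K
    R_s = (1 - p) / p * (1 - (1 - p) ** K)
    lo = ceil(K / alpha_max)
    first = sum(s * comb(K, s) / (s - 1) * p ** (s - 1) * (1 - p) ** (K - s + 1)
                for s in range(2, lo))
    second = sum(K * comb(K - 1, s - 1) / f_Ks(K, s)
                 * p ** (s - 1) * (1 - p) ** (K - s + 1)
                 for s in range(lo, K + 1))
    return R_empty, R_s, first / alpha_max + second

def t_decentral(K, N, M, alpha_max):
    """Upper bound (17) on the transmission delay."""
    R_empty, R_s, R_u = decentral_rates(K, N, M, alpha_max)
    return max(R_empty, R_s * R_u / (R_s + R_u - R_empty))
```

For instance, with *K* = 10, *N* = 20, *M* = 10 and *α*max = 1 the first sum in (20) runs over *s* = 2, ... , 9 and the second collapses to the single term *s* = *K*.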

**Remark 3.** *The upper bound in Theorem 4 is achieved by setting the exact number of users that send signals in parallel as follows:*

$$\alpha\_{\text{D}} = \begin{cases} \alpha\_{\max}, & \text{case 1}, \\ \lfloor \frac{K}{s} \rfloor, & \text{case 2}, \\ \lceil \frac{K}{s} \rceil, & \text{case 3}. \end{cases} \tag{22}$$

*If* ⌊*K*/*s*⌋ < *α*max*, the number of users who send data in parallel is smaller than αmax, indicating that always letting more users send messages in parallel can cause a higher transmission delay. For example, when K* ≥ 4*, s* = *K* − 1 *and αmax* = ⌊*K*/2⌋*, we have αD* = 1 < *αmax.*

**Remark 4.** *From the definitions of T*decentral*, R*s*, R*u *and R*∅*, it is easy to obtain that R*∅ ≤ *T*decentral ≤ *R*s*,*

$$T\_{\text{decentral}} = \begin{cases} \frac{R\_{\text{s}} R\_{\text{u}}}{R\_{\text{s}} + R\_{\text{u}} - R\_{\emptyset}}, & R\_{\text{u}} \ge R\_{\emptyset}, \\ R\_{\emptyset}, & R\_{\text{u}} < R\_{\emptyset}, \end{cases} \tag{23}$$

*that T*decentral *decreases as α*max *increases, and that T*decentral *increases as R*u *increases if R*u ≥ *R*∅*.*

Due to the complex term *R*u, *T*decentral in Theorem 4 is hard to evaluate. Since *T*decentral increases as *R*u increases (see Remark 4), substituting the following upper bound on *R*u into (17) provides an efficient way to evaluate *T*decentral.

**Corollary 1.** *For memory size* 0 ≤ *p* ≤ 1*, R*u *is upper bounded as follows:*

• *α*max = 1 *(a shared link):*

$$R\_{\mathbf{u}} \le \frac{1-p}{p} \left[ 1 - \frac{5}{2} Kp \left( 1-p \right)^{K-1} - 4 \left( 1-p \right)^{K} + \frac{3 \left( 1 - (1-p)^{K+1} \right)}{(K+1)p} \right];\tag{24}$$

• *α*max = ⌊*K*/2⌋ *(a flexible network):*

$$R\_{\rm u} \le \frac{K(1-p)}{(K-1)} \left[ 1 - (1-p)^{K-1} - \frac{2/p}{K-2} (1 - (1-p)^K - Kp(1-p)^{K-1}) \right].\tag{25}$$

**Proof.** See the proof in Appendix C.

Recall that the transmission delay achieved by the decentralized scheme without D2D communication [2] equals *R*s given in (19). We define the ratio between *T*decentral and *R*s as the *decentralized cooperation gain*:

$$G\_{\text{decentral},c} \triangleq \max \left\{ \frac{R\_{\emptyset}}{R\_{\text{s}}}, \frac{R\_{\text{u}}}{R\_{\text{s}} + R\_{\text{u}} - R\_{\emptyset}} \right\},\tag{26}$$

with *G*decentral,c ∈ [0, 1] because *R*∅ ≤ *R*s. Similar to the centralized scenario, this gain arises from the coordination among users in the D2D network. Moreover, we also compare *T*decentral with the transmission delay (1 − *p*)/*p* achieved by the decentralized D2D coded caching scheme [35], and define the ratio between *T*decentral and (1 − *p*)/*p* as the *decentralized parallel gain*:

$$G\_{\text{decentral},p} \triangleq G\_{\text{decentral},c} \cdot \left(1 - (1 - p)^K\right),\tag{27}$$

where *G*decentral,p ∈ [0, 1] arises from the parallel transmission between the server and the users.
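Both gains are straightforward to compute once the rates (18)–(20) are available. The sketch below (hypothetical helper `decentral_gains` of our own, evaluated with an illustrative sample value of *R*u rather than the full sum (20)) demonstrates the [0, 1] range:

```python
def decentral_gains(R_empty, R_s, R_u, p, K):
    """Evaluate the decentralized cooperation gain (26) and parallel gain (27)
    from rate values computed via (18)-(20)."""
    g_coop = max(R_empty / R_s, R_u / (R_s + R_u - R_empty))
    g_par = g_coop * (1 - (1 - p) ** K)
    return g_coop, g_par

# Sample evaluation with p = 0.5, K = 10; R_empty and R_s follow (18), (19),
# while R_u = 0.5 is an illustrative placeholder, not the exact value of (20).
p, K = 0.5, 10
R_empty = K * (1 - p) ** K
R_s = (1 - p) / p * (1 - (1 - p) ** K)
g_c, g_p = decentral_gains(R_empty, R_s, 0.5, p, K)
assert 0 <= g_p <= g_c <= 1  # both gains lie in [0, 1]
```

The chain 0 ≤ *G*decentral,p ≤ *G*decentral,c ≤ 1 holds for any valid rates because *R*∅ ≤ *R*s and the factor 1 − (1 − *p*)^*K* is at most 1.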

We plot the decentralized cooperation gain and parallel gain for the two types of D2D networks in Figure 4 when *N* = 20 and *K* = 10. It can be seen that *G*decentral,c and *G*decentral,p are in general not monotonic functions of *M*. Here, *G*decentral,c behaves similarly to *G*central,c. When *M* is small, the function *G*decentral,c monotonically decreases from value 1 until reaching its minimum. For larger *M*, the function *G*decentral,c turns to monotonically increase with *M*. The reason for this phenomenon is that in the decentralized scenario, as *M* increases, the proportion of subfiles that are cached by no user, and must therefore be sent by the server, decreases. Thus, more subfiles can be sent in parallel via the D2D network as *M* increases. Meanwhile, the decentralized scheme in [2] offers an additional multicast gain. Therefore, we need to balance these two gains to reduce the transmission delay.

**Figure 4.** Decentralized cooperation gain and parallel gain when *N* = 20 and *K* = 10.

The function *G*decentral,p behaves differently: it monotonically increases when *M* is small. After reaching its maximal value, the function *G*decentral,p decreases monotonically until meeting a local minimum (the abnormal bend in the parallel gain when *α*max = ⌊*K*/2⌋ comes from a balancing effect between *G*decentral,c and 1 − (1 − *p*)*K* in (27)); then *G*decentral,p turns into a monotonically increasing function for large *M*. Similar to the centralized case, as *M* increases, the impact of parallel transmission among the server and users becomes smaller since more data can be transmitted by the users.

**Theorem 5.** *Define p* ≜ *M*/*N and p*th ≜ 1 − (1/(*K* + 1))^(1/(*K*−1))*, which tends to 0 as K tends to infinity. For memory size* 0 ≤ *M* ≤ *N,*

• *if α*max = 1 *(shared link), then*

$$\frac{T\_{\text{decentral}}}{T^\*} \le 24;$$

• *if α*max = ⌊*K*/2⌋*, then*

$$\frac{T\_{\text{decentral}}}{T^\*} \le \begin{cases} \max\left\{6, 2K\left(\frac{2K}{2K+1}\right)^{K-1}\right\}, & p < p\_{\text{th}}, \\ 6, & p \ge p\_{\text{th}}. \end{cases}$$

**Proof.** See the proof in Appendix D.

Figure 5 plots the lower bound in (9) and the upper bounds achieved by various decentralized coded caching schemes, including our scheme, the scheme *Maddah-Ali 2015* in [2], which considers the case without D2D communication, and the scheme *Ji 2016* in [35], which considers the case without a server.

**Figure 5.** Transmission delay when *N* = 100, *K* = 20 and *α*max = 3. The upper bounds are achieved under the decentralized random caching scenario.

### **4. Coding Scheme under Centralized Data Placement**

In this section, we describe a novel centralized coded caching scheme for arbitrary *K*, *N* and *M* such that *t* = *KM*/*N* is a positive integer. The scheme can be extended to the general case 1 ≤ *t* ≤ *K* by following the same approach as in [1].

We first use an illustrative example to show how we form D2D communication groups, split files and deliver data, and then present our generalized centralized coded caching scheme.

### *4.1. An Illustrative Example*

Consider a network consisting of *K* = 6 users with cache size *M* = 4, and a library of *N* = 6 files. Thus, *t* = *KM*/*N* = 4. Divide all six users into two groups of equal size, and choose an integer *L*1 = 2 that guarantees $K\binom{K-1}{t}L\_1/\min\{\alpha(K/\alpha-1),t\}$ to be an integer. (According to (11) and (29), one optimal choice could be (*α* = 1, *L*1 = 4, *λ* = 5/9); here we choose (*α* = 2, *L*1 = 2, *λ* = 1/3) for simplicity, and also to demonstrate that even with a suboptimal choice, our scheme still outperforms those in [1,35].) Split each file *Wn*, for *n* = 1, . . . , *N*, into $3\binom{6}{4} = 45$ subfiles:

$$W\_n = \left( W\_{n,\mathcal{T}}^l : l \in [3], \mathcal{T} \subset [6], |\mathcal{T}| = 4 \right).$$

We list the requested subfiles that are not cached by their requesting users as follows: for *l* = 1, 2, 3,

$$\begin{aligned}
&W^l\_{d\_1,\{2,3,4,5\}}, W^l\_{d\_1,\{2,3,4,6\}}, W^l\_{d\_1,\{2,3,5,6\}}, W^l\_{d\_1,\{2,4,5,6\}}, W^l\_{d\_1,\{3,4,5,6\}};\\
&W^l\_{d\_2,\{1,3,4,5\}}, W^l\_{d\_2,\{1,3,4,6\}}, W^l\_{d\_2,\{1,3,5,6\}}, W^l\_{d\_2,\{1,4,5,6\}}, W^l\_{d\_2,\{3,4,5,6\}};\\
&W^l\_{d\_3,\{1,2,4,5\}}, W^l\_{d\_3,\{1,2,4,6\}}, W^l\_{d\_3,\{1,2,5,6\}}, W^l\_{d\_3,\{1,4,5,6\}}, W^l\_{d\_3,\{2,4,5,6\}};\\
&W^l\_{d\_4,\{1,2,3,5\}}, W^l\_{d\_4,\{1,2,3,6\}}, W^l\_{d\_4,\{1,2,5,6\}}, W^l\_{d\_4,\{1,3,5,6\}}, W^l\_{d\_4,\{2,3,5,6\}};\\
&W^l\_{d\_5,\{1,2,3,4\}}, W^l\_{d\_5,\{1,2,3,6\}}, W^l\_{d\_5,\{1,2,4,6\}}, W^l\_{d\_5,\{1,3,4,6\}}, W^l\_{d\_5,\{2,3,4,6\}};\\
&W^l\_{d\_6,\{1,2,3,4\}}, W^l\_{d\_6,\{1,2,3,5\}}, W^l\_{d\_6,\{1,2,4,5\}}, W^l\_{d\_6,\{1,3,4,5\}}, W^l\_{d\_6,\{2,3,4,5\}}.
\end{aligned}$$

The users complete the transmission across different group partitions. Table 1 shows the transmissions in four different partitions over the D2D network.


**Table 1.** Subfiles sent by users in different partitions, $l = 1, 2$.

In Table 1, all users first send XOR symbols with superscript $l = 1$. Please note that the subfiles $\mathcal{W}^1\_{d\_2,\{1,3,4,5\}}$ and $\mathcal{W}^1\_{d\_5,\{1,2,4,6\}}$ are not delivered at the beginning, since $\frac{K\binom{K-1}{t}}{\alpha(K/\alpha - 1)}$ is not an integer. Similarly, for subfiles with $l = 2$, $\mathcal{W}^2\_{d\_3,\{1,2,5,6\}}$ and $\mathcal{W}^2\_{d\_4,\{2,3,5,6\}}$ remain to be sent to users 3 and 4. In the last transmission, user 1 delivers the XOR message $\mathcal{W}^2\_{d\_3,\{1,2,5,6\}} \oplus \mathcal{W}^1\_{d\_2,\{1,3,4,5\}}$ to users 2 and 3, and user 6 multicasts $\mathcal{W}^1\_{d\_5,\{1,2,4,6\}} \oplus \mathcal{W}^2\_{d\_4,\{2,3,5,6\}}$ to users 4 and 5. The transmission rate in the D2D network is $R\_2 = \frac{1}{3}$.

For the remaining subfiles with superscript $l = 3$, the server delivers them in the same way as in [1]. Specifically, it sends the symbols $\oplus\_{k\in\mathcal{S}}\mathcal{W}^3\_{d\_k,\mathcal{S}\backslash\{k\}}$, for all $\mathcal{S} \subseteq [K] : |\mathcal{S}| = 5$. Thus, the rate sent by the server is $R\_1 = \frac{2}{15}$, and the transmission delay is $T\_{\text{central}} = \max\{R\_1, R\_2\} = \frac{1}{3}$, which is less than the delays achieved by the coded caching schemes for the broadcast network [1] and for D2D communication [35], respectively.
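The numbers in this example can be checked with a short script (a sketch in Python; variable names are ours, not from the paper):

```python
from math import comb

# Parameters of the worked example in Section 4.1
K, M, N = 6, 4, 6
t = K * M // N                      # t = KM/N = 4
alpha, L1, lam = 2, 2, 1 / 3        # the deliberately suboptimal choice

# Each file splits into (L1 + 1) * C(K, t) = 3 * 15 = 45 subfiles
num_subfiles = (L1 + 1) * comb(K, t)

# Number of D2D XOR symbols, cf. condition (30): must be an integer
num_xor = K * comb(K - 1, t) * L1 / (alpha * min(K // alpha - 1, t))

# D2D rate: XOR symbols times the pico-file size (1 - lam) / (L1 * C(K, t))
R2 = num_xor * (1 - lam) / (L1 * comb(K, t))
# Server rate for the l = 3 pieces, delivered as in [1]
R1 = lam * K * (1 - M / N) / (1 + t)
T_central = max(R1, R2)
print(num_subfiles, num_xor, R1, R2, T_central)
```

Running it reproduces $45$ subfiles, $R\_1 = 2/15$, $R\_2 = 1/3$, and $T\_{\text{central}} = 1/3$.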

### *4.2. The Generalized Centralized Coded Caching Scheme*

In the placement phase, each file is first split into $\binom{K}{t}$ subfiles of equal size. More specifically, split $W\_n$ into subfiles as $\mathcal{W}\_n = (\mathcal{W}\_{n,\mathcal{T}} : \mathcal{T} \subset [K], |\mathcal{T}| = t)$. User $k$ caches all the subfiles $\mathcal{W}\_{n,\mathcal{T}}$ with $k \in \mathcal{T}$, for all $n = 1, \dots, N$, occupying a cache memory of $MF$ bits. Then split each subfile $\mathcal{W}\_{n,\mathcal{T}}$ into two mini-files as $\mathcal{W}\_{n,\mathcal{T}} = (\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}, \mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}})$, where

$$\begin{aligned} |\mathcal{W}\_{\textit{n},\mathcal{T}}^{\sf s}| &= \lambda \cdot |\mathcal{W}\_{\textit{n},\mathcal{T}}| = \lambda \cdot \frac{F}{\binom{K}{t}},\\ |\mathcal{W}\_{\textit{n},\mathcal{T}}^{\sf u}| &= (1 - \lambda) \cdot |\mathcal{W}\_{\textit{n},\mathcal{T}}| = (1 - \lambda) \cdot \frac{F}{\binom{K}{t}}, \end{aligned} \tag{28}$$

with

$$\lambda = \frac{1+t}{\alpha\min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\} + 1 + t}. \tag{29}$$

Here, the mini-files $\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}$ and $\mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}}$ will be sent by the server and the users, respectively. Split each mini-file $\mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}}$ into $L\_1$ pico-files of equal size $(1-\lambda)\cdot\frac{F}{L\_1\binom{K}{t}}$, i.e., $\mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}} = (\mathcal{W}^{\mathsf{u},1}\_{n,\mathcal{T}}, \dots, \mathcal{W}^{\mathsf{u},L\_1}\_{n,\mathcal{T}})$, where $L\_1$ satisfies

$$\frac{K \cdot \binom{K-1}{t} \cdot L\_1}{\alpha \min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}} \in \mathbb{Z}^+. \tag{30}$$

As we will see later, condition (29) ensures that communication loads can be optimally allocated between the server and the users, and (30) ensures that the number of subfiles is large enough to maximize multicast gain for the transmission in the D2D network.

In the delivery phase, each user $k$ requests file $W\_{d\_k}$. The request vector $\mathbf{d} = (d\_1, d\_2, \dots, d\_K)$ is known to the server and all users. Please note that different parts of file $W\_{d\_k}$ have already been stored in the user cache memories, and thus the uncached parts of $W\_{d\_k}$ can be sent both by the server and by the users. The subfiles

$$\left(\mathcal{W}^{\mathsf{u},1}\_{d\_k,\mathcal{T}}, \dots, \mathcal{W}^{\mathsf{u},L\_1}\_{d\_k,\mathcal{T}} : \mathcal{T} \subset [K], |\mathcal{T}| = t, k \notin \mathcal{T}\right),$$

are requested by user *k* and will be sent by the users via the D2D network. Subfiles

$$\left(\mathcal{W}^{\mathsf{s}}\_{d\_k,\mathcal{T}} : \mathcal{T} \subset [K], |\mathcal{T}| = t, k \notin \mathcal{T}\right)$$

are requested by user *k* and will be sent by the server via the broadcast network.

First consider the subfiles sent by the users. Partition the *K* users into *α* groups of equal size:

$$\mathcal{G}\_1, \dots, \mathcal{G}\_\alpha,$$

where for $i, j = 1, \dots, \alpha$, $\mathcal{G}\_i \subseteq [K]$ with $|\mathcal{G}\_i| = K/\alpha$, and $\mathcal{G}\_i \cap \mathcal{G}\_j = \emptyset$ if $i \neq j$. In each group $\mathcal{G}\_i$, one of the $K/\alpha$ users plays the role of the server and sends symbols based on its cached contents to the remaining $(K/\alpha - 1)$ users within the group.

Focus on a group G*<sup>i</sup>* and a set S ⊂ [*K*] : |S| = *t* + 1. If G*<sup>i</sup>* ⊆ S, then all nodes in G*<sup>i</sup>* share subfiles

$$(\mathcal{W}^{\mathsf{u},l}\_{n,\mathcal{T}} : l \in [L\_1], n \in [N], \mathcal{G}\_i \subseteq \mathcal{T}, |\mathcal{T}| = t).$$

In this case, user $k \in \mathcal{G}\_i$ sends XOR symbols that contain the requested subfiles useful to all remaining $K/\alpha - 1$ users in $\mathcal{G}\_i$, i.e., $\oplus\_{j\in\mathcal{G}\_i\backslash\{k\}}\mathcal{W}^{\mathsf{u},l(k,\mathcal{G}\_i,\mathcal{S})}\_{d\_j,\mathcal{S}\backslash\{j\}}$, where $l(k, \mathcal{G}\_i, \mathcal{S}) \in [L\_1]$ is a function of $(k, \mathcal{G}\_i, \mathcal{S})$ that avoids redundant transmission of any fragment.

If $\mathcal{S} \subseteq \mathcal{G}\_i$, then the nodes in $\mathcal{S}$ share the subfiles

$$(\mathcal{W}^{\mathsf{u},l}\_{n,\mathcal{T}} : l \in [L\_1], n \in [N], \mathcal{T} \subset \mathcal{S}, |\mathcal{T}| = t).$$

In this case, user $k \in \mathcal{S}$ sends an XOR symbol that contains the requested subfiles for the remaining $t$ users in $\mathcal{S}$, i.e., $\oplus\_{j\in\mathcal{S}\backslash\{k\}}\mathcal{W}^{\mathsf{u},l(k,\mathcal{G}\_i,\mathcal{S})}\_{d\_j,\mathcal{S}\backslash\{j\}}$. The other groups perform similar steps and concurrently deliver the remaining requested subfiles to other users.

By changing group partition and performing the delivery strategy described above, we can send all the requested subfiles

$$\{\mathcal{W}^{\mathbf{u},1}\_{d\_k,\mathcal{T}},\dots,\mathcal{W}^{\mathbf{u},L\_1}\_{d\_k,\mathcal{T}} : \mathcal{T} \subset [K], |\mathcal{T}| = t, k \notin \mathcal{T}\}\_{k=1}^{K} \tag{31}$$

to the users.

Since the $\alpha$ groups send signals in a parallel manner ($\alpha$ users can concurrently deliver contents), and each user in a group delivers a symbol containing $\min\{K/\alpha - 1, t\}$ non-repeating pico-files requested by other users, in order to send all requested subfiles in (31), we need to send in total

$$\frac{K \cdot \binom{K-1}{t} \cdot L\_1}{\alpha \min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}}\tag{32}$$

XOR symbols, each of size $\frac{1-\lambda}{L\_1\binom{K}{t}}F$ bits. Notice that $L\_1$ is chosen according to (30), ensuring that (32) is an integer. Thus, we obtain $R\_2$ as

$$\begin{split} R\_2 &= \frac{KL\_1\binom{K-1}{t}}{\alpha\min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}} \cdot \frac{1-\lambda}{L\_1\binom{K}{t}} \\ &= K\left(1 - \frac{M}{N}\right)\frac{1}{1 + t + \alpha\min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}}, \end{split} \tag{33}$$

where the last equality holds by (29).

Now consider the delivery of the subfiles sent by the server. Apply the delivery strategy as in [1], i.e., the server broadcasts

$$\oplus\_{k\in\mathcal{S}}\mathcal{W}^{\mathsf{s}}\_{d\_k,\mathcal{S}\backslash\{k\}},$$

to all users, for all S ⊆ [*K*] : |S| = *t* + 1. We obtain the transmission rate of the server

$$\begin{split} R\_1 &= \lambda \cdot K\left(1 - \frac{M}{N}\right)\cdot\frac{1}{1+t} \\ &= K\left(1 - \frac{M}{N}\right)\frac{1}{1 + t + \alpha\min\{\lfloor \frac{K}{\alpha} \rfloor - 1, t\}}. \end{split} \tag{34}$$

From (33) and (34), we can see that the choice of $\lambda$ in (29) guarantees equal communication loads at the server and the users. Since the server and the users transmit their signals simultaneously, the transmission delay of the whole network is the maximum of $R\_1$ and $R\_2$, i.e., $T\_{\text{central}} = \max\{R\_1, R\_2\} = \frac{K(1-M/N)}{1 + t + \alpha\min\{\lfloor K/\alpha \rfloor - 1, t\}}$, for some $\alpha \in [\alpha\_{\max}]$.
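As a quick numerical illustration (a sketch; the helper name is ours), the delay formula above can be evaluated for the parameters of the example in Section 4.1. It also shows that pushing $\alpha$ too high increases the delay, i.e., letting too many users transmit in parallel can be harmful:

```python
def T_central(K: int, M: int, N: int, alpha: int) -> float:
    """Transmission delay of the centralized scheme, with lambda chosen
    as in (29) so that the server and user loads (33)-(34) are equal."""
    t = K * M // N  # assumed integer, as in the paper
    return K * (1 - M / N) / (1 + t + alpha * min(K // alpha - 1, t))

K, M, N = 6, 4, 6
delays = {a: T_central(K, M, N, a) for a in range(1, K // 2 + 1)}
best = min(delays, key=delays.get)
print(delays, best)
```

For $K = N = 6$, $M = 4$ this gives $T = 2/9$ for $\alpha \in \{1, 2\}$ but $T = 1/4$ for $\alpha = 3$.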

### **5. Coding Scheme under Decentralized Data Placement**

In this section, we present a novel decentralized coded caching scheme for the joint broadcast and D2D network. The decentralized scenario is much more complicated than the centralized one, since each subfile can be stored by $s = 1, 2, \dots, K$ users, leading to a dynamic file-splitting and communication strategy in the D2D network. We first use an illustrative example to demonstrate how we form D2D communication groups, split data, and deliver data, and then present our generalized coded caching scheme.

### *5.1. An Illustrative Example*

Consider a joint broadcast and D2D network consisting of *K* = 7 users. When using the decentralized data placement strategy, the subfiles cached by user *k* can be written as

$$\left(\mathcal{W}\_{n,\mathcal{T}} : n \in [N], k \in \mathcal{T}, \mathcal{T} \subseteq [7]\right). \tag{35}$$

We focus on the delivery of the subfiles $\mathcal{W}\_{n,\mathcal{T}} : n \in [N], k \in \mathcal{T}, |\mathcal{T}| = s = 4$, i.e., each subfile is stored by $s = 4$ users. A similar process can be applied to deliver the other subfiles with $s \in [K]\backslash\{4\}$.

To allocate communication loads between the server and the users, we divide each subfile into two mini-files $\mathcal{W}\_{n,\mathcal{T}} = (\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}, \mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}})$, where the mini-files $\{\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}\}$ and $\{\mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}}\}$ will be sent by the server and the users, respectively. To reduce the transmission delay, the sizes of $\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}$ and $\mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}}$ need to be chosen properly such that $R\_1 = R\_2$, i.e., the transmission rates of the server and the users are equal; see (37) and (39) ahead.

Divide all the users into two non-intersecting groups $(\mathcal{G}^r\_1, \mathcal{G}^r\_2)$, for $r \in [35]$, which satisfy

$$\mathcal{G}\_1^r \subset [K],\ \mathcal{G}\_2^r \subset [K],\ |\mathcal{G}\_1^r| = 4,\ |\mathcal{G}\_2^r| = 3,\ \mathcal{G}\_1^r \cap \mathcal{G}\_2^r = \emptyset.$$

There are $\binom{7}{4} = 35$ such partitions in total, thus $r \in [35]$. Please note that for any user $k \in \mathcal{G}^r\_i$, $|\mathcal{G}^r\_i| - 1$ of its requested mini-files are already cached by the remaining users in $\mathcal{G}^r\_i$, for $i = 1, 2$.
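The partition count can be verified by enumeration (a sketch in Python; names are ours): choosing $\mathcal{G}^r\_1$ determines $\mathcal{G}^r\_2$, so the partitions are in one-to-one correspondence with the 4-subsets of $[7]$.

```python
from itertools import combinations

K = 7
users = set(range(1, K + 1))
# A partition is (G1, G2) with |G1| = 4, |G2| = 3, disjoint and covering [7];
# choosing G1 fixes G2, so there are C(7, 4) = 35 partitions.
partitions = [(set(g1), users - set(g1)) for g1 in combinations(users, 4)]
print(len(partitions))
```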

To avoid repetitive transmission of any mini-file, each mini-file in

$$(\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}} : \mathcal{T} \subseteq [7], k \in \mathcal{T})$$

is divided into the non-overlapping pico-files $\mathcal{W}^{\mathsf{u}\_1}\_{d\_k,\mathcal{T}\backslash\{k\}}$ and $\mathcal{W}^{\mathsf{u}\_2}\_{d\_k,\mathcal{T}\backslash\{k\}}$, i.e.,

$$
\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}} = (\mathcal{W}^{\mathsf{u}\_1}\_{d\_k,\mathcal{T}\backslash\{k\}}, \mathcal{W}^{\mathsf{u}\_2}\_{d\_k,\mathcal{T}\backslash\{k\}}).
$$

The sizes of $\mathcal{W}^{\mathsf{u}\_1}\_{n,\mathcal{T}}$ and $\mathcal{W}^{\mathsf{u}\_2}\_{n,\mathcal{T}}$ need to be chosen properly so that groups $\mathcal{G}^r\_1$ and $\mathcal{G}^r\_2$ have equal transmission rates; see (51) and (52) ahead.

To allocate communication loads between the two different types of groups, split each $\mathcal{W}^{\mathsf{u}\_1}\_{d\_k,\mathcal{T}\backslash\{k\}}$ and $\mathcal{W}^{\mathsf{u}\_2}\_{d\_k,\mathcal{T}\backslash\{k\}}$ into three and two equal fragments, respectively, e.g.,

$$\begin{split} \mathcal{W}^{\mathsf{u}\_1}\_{d\_2,\{1,3,4\}} &= \left( \mathcal{W}^{\mathsf{u}\_1,1}\_{d\_2,\{1,3,4\}}, \mathcal{W}^{\mathsf{u}\_1,2}\_{d\_2,\{1,3,4\}}, \mathcal{W}^{\mathsf{u}\_1,3}\_{d\_2,\{1,3,4\}} \right), \\ \mathcal{W}^{\mathsf{u}\_2}\_{d\_2,\{1,3,4\}} &= \left( \mathcal{W}^{\mathsf{u}\_2,1}\_{d\_2,\{1,3,4\}}, \mathcal{W}^{\mathsf{u}\_2,2}\_{d\_2,\{1,3,4\}} \right). \end{split}$$

During the delivery phase, in each round, one user in each group produces and multicasts an XOR symbol to all other users in the same group, as shown in Table 2.

**Table 2.** Parallel user delivery when $K = 7$, $s = 4$, $|\mathcal{G}^r\_1| = 4$ and $|\mathcal{G}^r\_2| = 3$, $r \in [35]$.


There are 35 partitions in total, while the table shows only three of them.

Please note that in this example, each group appears only once among all partitions. However, for some other values of $s$, a group could appear multiple times in different partitions. For example, when $s = 2$, group $\{1, 2\}$ appears in both partitions $\{\{1, 2\}, \{3, 4\}, \{5, 6, 7\}\}$ and $\{\{1, 2\}, \{3, 5\}, \{4, 6, 7\}\}$. To reduce the transmission delay, one should balance the communication loads between all groups, and between the server and the users as well.

### *5.2. The Generalized Decentralized Coded Caching Scheme*

In the placement phase, each user $k$ applies the caching function to map a subset of $\frac{MF}{N}$ bits of each file $W\_n$, $n = 1, \dots, N$, into its cache memory at random: $\mathcal{W}\_n = (\mathcal{W}\_{n,\mathcal{T}} : \mathcal{T} \subseteq [K])$. The subfiles cached by user $k$ can be written as $(\mathcal{W}\_{n,\mathcal{T}} : n \in [N], k \in \mathcal{T}, \mathcal{T} \subseteq [K])$. When the file size $F$ is sufficiently large, by the law of large numbers, the subfile size (normalized by $F$) with high probability can be written as

$$|\mathcal{W}\_{n,\mathcal{T}}| \approx p^{|\mathcal{T}|} (1 - p)^{K - |\mathcal{T}|}, \tag{36}$$

where $p = M/N$.
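A small Monte Carlo check of the concentration in (36) (a sketch; the parameters are ours): cache each bit independently with probability $p = M/N$ and compare the empirical size of one subfile against the prediction.

```python
import random

random.seed(0)
K, p, F = 4, 0.5, 200_000   # toy instance; p = M/N, F bits per file

# Each of the K users caches each bit independently with probability p.
cache = [[random.random() < p for _ in range(K)] for _ in range(F)]

# Empirical fraction of bits cached by exactly the users in T = {0, 1}.
T = {0, 1}
count = sum(1 for bits in cache
            if all(bits[k] for k in T)
            and not any(bits[k] for k in set(range(K)) - T))
empirical = count / F
predicted = p ** len(T) * (1 - p) ** (K - len(T))
print(empirical, predicted)
```

For these parameters the empirical fraction concentrates around $p^2(1-p)^2 = 0.0625$.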

The delivery procedure can be organized into three levels: allocating communication loads between the server and the users, inner-group coding (i.e., transmission within each group), and parallel delivery among groups.

### 5.2.1. Allocating Communication Loads between the Server and Users

To allocate communication loads between the server and the users, split each subfile $\mathcal{W}\_{n,\mathcal{T}}$, for $\mathcal{T} \subseteq [K] : \mathcal{T} \neq \emptyset$, into two non-overlapping mini-files

$$
\mathcal{W}\_{n,\mathcal{T}} = \left(\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}, \mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}}\right),
$$

where

$$\begin{aligned} |\mathcal{W}^{\mathsf{s}}\_{n,\mathcal{T}}| &= \lambda \cdot |\mathcal{W}\_{n,\mathcal{T}}|, \\ |\mathcal{W}^{\mathsf{u}}\_{n,\mathcal{T}}| &= (1 - \lambda) \cdot |\mathcal{W}\_{n,\mathcal{T}}|, \end{aligned} \tag{37}$$

and *λ* is a design parameter whose value is determined in Remark 5.

The mini-files $(\mathcal{W}^{\mathsf{s}}\_{d\_k,\mathcal{T}\backslash\{k\}} : k \in [K])$ will be sent by the server using the decentralized coded caching scheme for the broadcast network [2], leading to the transmission delay

$$
\lambda R\_{\mathsf{s}} = \lambda \frac{1 - M/N}{M/N} \left( 1 - \left( 1 - \frac{M}{N} \right)^K \right), \tag{38}
$$

where *R*s is defined in (19).

The mini-files $(\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}} : k \in [K])$ will be sent by the users using the *parallel user delivery* described in Section 5.2.3. The corresponding transmission rate is

$$R\_2 = (1 - \lambda) R\_{\mathsf{u}}, \tag{39}$$

where $R\_{\mathsf{u}}$ represents the number of bits sent by the users, normalized by $F$.

Since subfile *Wdk*,<sup>∅</sup> is not cached by any user and must be sent exclusively from the server, the corresponding transmission delay for sending (*Wdk*,<sup>∅</sup> : *k* ∈ [*K*]) is

$$R\_\emptyset = K \left( 1 - \frac{M}{N} \right)^K, \tag{40}$$

where *R*<sup>∅</sup> coincides with the definition in (18).

By (38)–(40), we have

$$R\_1 = R\_\emptyset + \lambda R\_{\mathsf{s}}, \qquad R\_2 = (1 - \lambda) R\_{\mathsf{u}}. \tag{41}$$

According to (8), we have *T*decentral = max{*R*1, *R*2}.

**Remark 5** (Choice of $\lambda$)**.** *The parameter $\lambda$ is chosen such that $T\_{\text{decentral}}$ is minimized. If $R\_{\mathsf{u}} < R\_\emptyset$, then the inequality $R\_2 \leq R\_1$ always holds, and $T\_{\text{decentral}}$ reaches the minimum $T\_{\text{decentral}} = R\_\emptyset$ with $\lambda = 0$. If $R\_{\mathsf{u}} \geq R\_\emptyset$, solving $R\_1 = R\_2$ yields $\lambda = \frac{R\_{\mathsf{u}} - R\_\emptyset}{R\_{\mathsf{s}} + R\_{\mathsf{u}}}$ and $T\_{\text{decentral}} = \frac{R\_{\mathsf{s}}R\_{\mathsf{u}}}{R\_{\mathsf{s}} + R\_{\mathsf{u}} - R\_\emptyset}$.*
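Remark 5 can be turned into a small helper (a sketch; the function name and toy numbers are ours): pick $\lambda$ as in the remark and then evaluate $T\_{\text{decentral}} = \max\{R\_1, R\_2\}$ directly from (41).

```python
def choose_lambda(R_s: float, R_u: float, R_empty: float):
    """Pick lambda per Remark 5, then evaluate T_decentral = max{R1, R2}
    with R1 = R_empty + lam * R_s and R2 = (1 - lam) * R_u, as in (41)."""
    lam = 0.0 if R_u < R_empty else (R_u - R_empty) / (R_s + R_u)
    R1 = R_empty + lam * R_s
    R2 = (1 - lam) * R_u
    return lam, max(R1, R2)

# Toy numbers (not from the paper): R_s = 2.0, R_u = 1.5, R_empty = 0.5
lam, T = choose_lambda(2.0, 1.5, 0.5)
print(lam, T)
```

With these toy values, $\lambda = 2/7$ balances the two loads, $R\_1 = R\_2 = 15/14$.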

### 5.2.2. Inner-Group Coding

Given parameters $(s, \mathcal{G}, \mathsf{i}, \gamma)$, where $s \in [K-1]$, $\mathcal{G} \subseteq [K]$, $\mathsf{i} \in \{\mathsf{u}, \mathsf{u}\_1, \mathsf{u}\_2\}$ with the indicators $\mathsf{u}, \mathsf{u}\_1, \mathsf{u}\_2$ described in (37) and (51), and $\gamma \in \mathbb{Z}^+$, we present how to successfully deliver

$$(\mathcal{W}^{\mathsf{i}}\_{d\_k,\mathcal{S}\backslash\{k\}} : \forall \mathcal{S} \subseteq [K], |\mathcal{S}| = s, \mathcal{G} \subseteq \mathcal{S})$$

to every user *k* ∈ G via D2D communication.

Split each $\mathcal{W}^{\mathsf{i}}\_{d\_k,\mathcal{S}\backslash\{k\}}$ into $(|\mathcal{G}| - 1)\gamma$ non-overlapping fragments of equal size, i.e.,

$$\mathcal{W}^{\mathsf{i}}\_{d\_k,\mathcal{S}\backslash\{k\}} = \left(\mathcal{W}^{\mathsf{i},l}\_{d\_k,\mathcal{S}\backslash\{k\}} : l \in [(|\mathcal{G}| - 1)\gamma]\right), \tag{42}$$

and each user $k \in \mathcal{G}$ takes turns to broadcast the XOR symbol

$$X^{\mathsf{i}}\_{k,\mathcal{G},s} \triangleq \oplus\_{j\in\mathcal{G}\backslash\{k\}}\mathcal{W}^{\mathsf{i},l(j,\mathcal{G},\mathcal{S})}\_{d\_j,\mathcal{S}\backslash\{j\}}, \tag{43}$$

where $l(j, \mathcal{G}, \mathcal{S}) \in [(|\mathcal{G}| - 1)\gamma]$ is a function of $(j, \mathcal{G}, \mathcal{S})$ that avoids redundant transmission of any fragment. The XOR symbol $X^{\mathsf{i}}\_{k,\mathcal{G},s}$ will be received and decoded by the remaining users in $\mathcal{G}$.

For each group $\mathcal{G}$, inner-group coding encodes in total $\binom{K-|\mathcal{G}|}{s-|\mathcal{G}|}$ of the $\mathcal{W}^{\mathsf{i}}\_{d\_k,\mathcal{S}\backslash\{k\}}$, and each XOR symbol $X^{\mathsf{i}}\_{k,\mathcal{G},s}$ in (43) contains fragments required by the $|\mathcal{G}| - 1$ other users in $\mathcal{G}$.
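The decoding step behind (43) can be illustrated with a toy byte-level sketch (the fragments and group are ours, not from the paper): each receiver XORs out the fragments it already caches and is left with the one it requested.

```python
# Toy inner-group XOR multicast: group G = {0, 1, 2}, fragment frag[j] is
# wanted by user j and cached by every other user in G (as when G is in S).
frag = {0: b"\x11\x22", 1: b"\x33\x44", 2: b"\x55\x66"}

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# User k = 0 broadcasts the XOR of the fragments wanted by the others, cf. (43).
symbol = xor(frag[1], frag[2])

# User 1 caches frag[2], cancels it, and recovers its own fragment frag[1].
recovered = xor(symbol, frag[2])
print(recovered == frag[1])
```

User 2 symmetrically recovers `frag[2]` by cancelling `frag[1]`.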

### 5.2.3. Parallel Delivery among Groups

The parallel user delivery consists of (*K* − 1) rounds characterized by *s* = 2, ... , *K*. In each round *s*, mini-files

$$(\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}} : \forall \mathcal{T} \subseteq [K], |\mathcal{T}| = s, k \in [K])$$

are recovered through D2D communication.

The key idea is to partition the $K$ users into $\lfloor \frac{K}{s} \rfloor$ groups for each communication round $s \in \{2, \dots, K\}$, and let each group perform the D2D coded caching scheme [35] to exchange information. If $(K \bmod s) \neq 0$, there will be $\lfloor \frac{K}{s} \rfloor$ groups of the same size $s$ and an *abnormal* group of size $(K \bmod s)$, leading to an asymmetric caching setup. We optimally allocate the communication loads between the two types of groups, and between the broadcast network and the D2D network as well.

Based on $K$, $s$ and $\alpha\_{\max}$, the delivery strategy in the D2D network is divided into three cases:

• Case 1: $\lfloor \frac{K}{s} \rfloor > \alpha\_{\max}$. In this case, $\alpha\_{\max}$ users are allowed to send data simultaneously. Select $s \cdot \alpha\_{\max}$ users from all users and divide them into $\alpha\_{\max}$ groups of equal size $s$. The total number of such partitions is

$$\beta\_1 \triangleq \frac{\binom{K}{s}\binom{K-s}{s}\cdots\binom{K-s(\alpha\_{\max}-1)}{s}}{\alpha\_{\max}!}. \tag{44}$$

In each partition, *α*max users, selected from *α*max groups, respectively, send data in parallel via the D2D network.

• Case 2: $\lfloor \frac{K}{s} \rfloor \leq \alpha\_{\max}$ and $(K \bmod s) < 2$. In this case, choose $(\lfloor \frac{K}{s} \rfloor - 1)s$ users from all users and partition them into $(\lfloor \frac{K}{s} \rfloor - 1)$ groups of equal size $s$. The total number of such partitions is

$$\beta\_2 \triangleq \frac{\binom{K}{s}\binom{K-s}{s}\cdots\binom{K-s(\lfloor \frac{K}{s} \rfloor - 1)}{s}}{\lfloor \frac{K}{s} \rfloor!}. \tag{45}$$

In each partition, $(\lfloor \frac{K}{s} \rfloor - 1)$ users, selected from the $(\lfloor \frac{K}{s} \rfloor - 1)$ groups of equal size $s$, respectively, together with an extra user selected from the *abnormal* group of size $K - s(\lfloor \frac{K}{s} \rfloor - 1)$, send data in parallel via the D2D network.

• Case 3: $\lfloor \frac{K}{s} \rfloor \leq \alpha\_{\max}$ and $(K \bmod s) \geq 2$. In this case, every $s$ users form a group, resulting in $\lfloor \frac{K}{s} \rfloor$ groups consisting of $s\lfloor \frac{K}{s} \rfloor$ users. The remaining $(K \bmod s)$ users form an *abnormal* group. The total number of such partitions is

$$
\beta\_3 = \beta\_2. \tag{46}
$$

In each partition, $\lfloor \frac{K}{s} \rfloor$ users, selected from the $\lfloor \frac{K}{s} \rfloor$ groups of equal size $s$, respectively, together with an extra user selected from the abnormal group of size $(K \bmod s)$, send data in parallel via the D2D network.

Thus, the exact number of users who send signals in parallel can be written as follows:

$$\alpha\_{\mathrm{D}} = \begin{cases} \alpha\_{\max}, & \text{case 1}, \\ \lfloor \frac{K}{s} \rfloor, & \text{case 2}, \\ \lceil \frac{K}{s} \rceil, & \text{case 3}. \end{cases} \tag{47}$$
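The case distinction in (47) is easy to mechanize (a sketch; the function name is ours):

```python
def alpha_D(K: int, s: int, alpha_max: int) -> int:
    """Number of users sending in parallel in round s, following (47)."""
    if K // s > alpha_max:        # Case 1
        return alpha_max
    if K % s < 2:                 # Case 2
        return K // s
    return -(-K // s)             # Case 3: ceil(K / s)

print(alpha_D(7, 2, 2), alpha_D(7, 3, 4), alpha_D(8, 3, 4))
```

For example, $(K, s, \alpha\_{\max}) = (7, 2, 2)$ falls in Case 1, $(7, 3, 4)$ in Case 2, and $(8, 3, 4)$ in Case 3.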

Please note that each group G re-appears

$$N\_{\mathcal{G}} \triangleq \frac{\binom{K-s}{s}\cdots\binom{K-s(\alpha\_{\mathrm{D}}-1)}{s}}{(\alpha\_{\mathrm{D}}-1)!} \tag{48}$$

times among the $\beta\_c$ partitions of the corresponding case $c \in \{1, 2, 3\}$.

Now we present the decentralized scheme for these three cases as follows.

*Case 1* ($\lfloor \frac{K}{s} \rfloor > \alpha\_{\max}$): Consider a partition $r \in [\beta\_1]$, denoted by

$$\mathcal{G}\_1^r, \dots, \mathcal{G}\_{\alpha\_{\mathrm{D}}}^r,$$

where $|\mathcal{G}\_i^r| = s$ and $\mathcal{G}\_i^r \cap \mathcal{G}\_j^r = \emptyset$, $\forall i, j \in [\alpha\_{\mathrm{D}}]$ with $i \neq j$.

Since each group $\mathcal{G}\_i^r$ re-appears $N\_{\mathcal{G}\_i^r}$ times among the $[\beta\_1]$ partitions, and $(|\mathcal{G}\_i^r| - 1)$ users take turns to broadcast the XOR symbols (43) in each group $\mathcal{G}\_i^r$, in order to guarantee that each group can send a unique fragment without repetition, we split each mini-file $\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{S}\backslash\{k\}}$ into $(|\mathcal{G}\_i^r| - 1)N\_{\mathcal{G}\_i^r}$ fragments of equal size.

Each group $\mathcal{G}\_i^r$, for $r \in [\beta\_1]$ and $i \in [\alpha\_{\mathrm{D}}]$, performs inner-group coding (see Section 5.2.2) with parameters

$$(s, \mathcal{G}\_i^r, \mathsf{u}, N\_{\mathcal{G}\_i^r}),$$

for all $s$ satisfying $\lfloor \frac{K}{s} \rfloor > \alpha\_{\max}$. For each partition $r$, all groups $\mathcal{G}\_1^r, \dots, \mathcal{G}\_{\alpha\_{\mathrm{D}}}^r$ send XOR symbols in parallel, each containing $|\mathcal{G}\_i^r| - 1$ fragments required by the other users of the corresponding group. By the fact that the partitioned groups traverse every set $\mathcal{T}$, i.e.,

$$\mathcal{T} \subseteq \{\mathcal{G}\_1^r \cup \dots \cup \mathcal{G}\_{\alpha\_{\mathrm{D}}}^r\}\_{r=1}^{\beta\_1},\ \forall \mathcal{T} \subseteq [K] : |\mathcal{T}| = s,$$

and since inner-group coding enables each group $\mathcal{G}\_i^r$ to recover

$$(\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{S}\backslash\{k\}} : \forall \mathcal{S} \subseteq [K], |\mathcal{S}| = s, \mathcal{G}\_i^r \subseteq \mathcal{S}, k \in [K]),$$

we can recover all required mini-files

$$(\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}} : \forall \mathcal{T} \subseteq [K], |\mathcal{T}| = s, k \in [K]).$$

The transmission delay of Case 1 in round *s* is thus

$$\begin{split} R^{\mathsf{u}}\_{\text{case1}}(s) &\triangleq \sum\_{r\in[\beta\_1]}\sum\_{k\in\mathcal{G}\_i^r}|X^{\mathsf{u}}\_{k,\mathcal{G}\_i^r,s}| \\ &\overset{(a)}{=} \frac{K\binom{K-1}{s-1}}{\alpha\_{\mathrm{D}}(s-1)}\,|\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}| \\ &= \frac{K\binom{K-1}{s-1}}{\alpha\_{\max}(s-1)}(1-\lambda)p^{s-1}(1-p)^{K-s+1}, \end{split} \tag{49}$$

where (a) follows by (44) and (48).

*Case 2* ($\lfloor \frac{K}{s} \rfloor \leq \alpha\_{\max}$ and $(K \bmod s) < 2$): We apply the same delivery procedure as in Case 1, except that $\beta\_1$ is replaced by $\beta\_2$ and $\alpha\_{\mathrm{D}} = \lfloor \frac{K}{s} \rfloor$. Thus, the transmission delay in round $s$ is

$$\begin{split} R^{\mathsf{u}}\_{\text{case2}}(s) &= \frac{K\binom{K-1}{s-1}}{\alpha\_{\mathrm{D}}(s-1)}|\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}| \\ &= \frac{K\binom{K-1}{s-1}}{\lfloor \frac{K}{s} \rfloor(s-1)}(1-\lambda)p^{s-1}(1-p)^{K-s+1}. \end{split} \tag{50}$$

*Case 3* ($\lfloor \frac{K}{s} \rfloor \leq \alpha\_{\max}$ and $(K \bmod s) \geq 2$): Consider a partition $r \in [\beta\_3]$, denoted by

$$\mathcal{G}\_1^r, \dots, \mathcal{G}\_{\alpha\_{\mathrm{D}}}^r,$$

where $\mathcal{G}\_i^r \subseteq [K]$, $\mathcal{G}\_i^r \cap \mathcal{G}\_j^r = \emptyset$, $\forall i, j \in [\alpha\_{\mathrm{D}}-1]$ with $i \neq j$, and $\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r = [K]\backslash(\cup\_{i=1}^{\alpha\_{\mathrm{D}}-1}\mathcal{G}\_i^r)$, with $|\mathcal{G}\_i^r| = s$ and $|\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r| = (K \bmod s)$.

Since the groups $\mathcal{G}\_i^r : i \in [\alpha\_{\mathrm{D}}-1]$ and $\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r$ have different group sizes, we further split each mini-file $\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}$ into two non-overlapping fragments such that

$$\begin{aligned} |\mathcal{W}^{\mathsf{u}\_1}\_{d\_k,\mathcal{T}\backslash\{k\}}| &= \lambda\_2|\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}|, \\ |\mathcal{W}^{\mathsf{u}\_2}\_{d\_k,\mathcal{T}\backslash\{k\}}| &= (1-\lambda\_2)|\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}|, \end{aligned} \tag{51}$$

where $\lambda\_2 \in [0, 1]$ is a design parameter satisfying (52).

Split each mini-file $\mathcal{W}^{\mathsf{u}\_1}\_{d\_k,\mathcal{S}\backslash\{k\}}$ and $\mathcal{W}^{\mathsf{u}\_2}\_{d\_k,\mathcal{S}\backslash\{k\}}$ into fragments of equal size:

$$\begin{aligned} \mathcal{W}^{\mathsf{u}\_1}\_{d\_k,\mathcal{S}\backslash\{k\}} &= \left(\mathcal{W}^{\mathsf{u}\_1,l}\_{d\_k,\mathcal{S}\backslash\{k\}} : l \in [(s-1)N\_{\mathcal{G}\_i^r}]\right), \\ \mathcal{W}^{\mathsf{u}\_2}\_{d\_k,\mathcal{S}\backslash\{k\}} &= \left(\mathcal{W}^{\mathsf{u}\_2,l}\_{d\_k,\mathcal{S}\backslash\{k\}} : l \in \left[\left(|\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r|-1\right)\binom{s-1}{|\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r|-1}N\_{\mathcal{G}\_i^r}\right]\right). \end{aligned}$$

Following the encoding operation in (43), the groups $\mathcal{G}\_i^r : i \in [\alpha\_{\mathrm{D}}-1]$ and the group $\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r$ send the following XOR symbols, respectively:

$$(X^{\mathsf{u}\_1}\_{k,\mathcal{G}\_i^r,s} : k \in \mathcal{G}\_i^r)\_{i=1}^{\alpha\_{\mathrm{D}}-1}, \quad (X^{\mathsf{u}\_2}\_{k,\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r,s} : k \in \mathcal{G}\_{\alpha\_{\mathrm{D}}}^r).$$

For each $s \in \{2, \dots, K\}$, the transmission delays for sending the XOR symbols above by the groups $\mathcal{G}\_i^r : i \in [\alpha\_{\mathrm{D}}-1]$ and the group $\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r$ can be written as

$$\begin{aligned} R^{\mathsf{u}\_1}\_{\text{case3}}(s) &= \frac{\lambda\_2 K\binom{K-1}{s-1}}{(\alpha\_{\mathrm{D}}-1)(s-1)}\cdot|\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}|, \\ R^{\mathsf{u}\_2}\_{\text{case3}}(s) &= \frac{(1-\lambda\_2)K\binom{K-1}{s-1}}{(K \bmod s)-1}\cdot|\mathcal{W}^{\mathsf{u}}\_{d\_k,\mathcal{T}\backslash\{k\}}|, \end{aligned}$$

respectively. Since the groups $\mathcal{G}\_i^r : i \in [\alpha\_{\mathrm{D}}-1]$ and the group $\mathcal{G}\_{\alpha\_{\mathrm{D}}}^r$ can send signals in parallel, by letting

$$R\_{\text{case3}}^{\mathfrak{u}\_1}(s) = R\_{\text{case3}}^{\mathfrak{u}\_2}(s),\tag{52}$$

we eliminate the parameter *λ*<sup>2</sup> and obtain the balanced transmission delay at users for Case 3:

$$R^{\mathsf{u}}\_{\text{case3}}(s) \triangleq \frac{K\binom{K-1}{s-1}}{K - 1 - \lfloor \frac{K}{s} \rfloor}(1-\lambda)p^{s-1}(1-p)^{K-s+1}. \tag{53}$$

**Remark 6.** *The condition $\lfloor \frac{K}{s} \rfloor > \alpha\_{\max}$ in Case 1 implies that $s \leq \lceil \frac{K}{\alpha\_{\max}} \rceil - 1$. In this regime, the transmission delay is given in* (49)*. If $s \geq \lceil \frac{K}{\alpha\_{\max}} \rceil$ and $(K \bmod s) < 2$, the scheme for Case 2 starts to work and the transmission delay is given in* (50)*; if $s \geq \lceil \frac{K}{\alpha\_{\max}} \rceil$ and $(K \bmod s) \geq 2$, the scheme for Case 3 starts to work and the transmission delay is given in* (53)*.*

In each round *s* ∈ {2, ... , *K*}, all requested mini-files can be recovered by the delivery strategies above. By Remark 6, the transmission delay in the D2D network is

$$\begin{split} R\_2 &= (1-\lambda)\frac{1}{\alpha\_{\max}}\sum\_{s=2}^{\lceil \frac{K}{\alpha\_{\max}} \rceil - 1}\left[\frac{s\binom{K}{s}}{s-1}p^{s-1}(1-p)^{K-s+1}\right] + (1-\lambda)\sum\_{s=\lceil \frac{K}{\alpha\_{\max}} \rceil}^{K}\left[\frac{K\binom{K-1}{s-1}}{f(K,s)}p^{s-1}(1-p)^{K-s+1}\right] \\ &= (1-\lambda)R\_{\mathsf{u}}, \end{split} \tag{54}$$

where *R*u is defined in (20) and

$$f(K,s) \triangleq \begin{cases} \lfloor \frac{K}{s} \rfloor (s-1), & (K \bmod s) < 2, \\ K - 1 - \lfloor K/s \rfloor, & (K \bmod s) \ge 2. \end{cases} \tag{55}$$
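The user-side load $R\_{\mathsf{u}}$ in (54), together with $f(K, s)$ from (55), can be computed numerically (a sketch; the function names are ours):

```python
from math import comb, ceil

def f(K: int, s: int) -> int:
    """Effective multicast denominator from (55)."""
    return (K // s) * (s - 1) if K % s < 2 else K - 1 - K // s

def R_u(K: int, p: float, alpha_max: int) -> float:
    """Bracketed sums in (54): the user load before the (1 - lambda) factor."""
    bound = ceil(K / alpha_max)
    total = 0.0
    for s in range(2, bound):        # rounds handled by Case 1
        total += (s * comb(K, s) / (alpha_max * (s - 1))
                  * p**(s - 1) * (1 - p)**(K - s + 1))
    for s in range(bound, K + 1):    # rounds handled by Cases 2 and 3
        total += (K * comb(K - 1, s - 1) / f(K, s)
                  * p**(s - 1) * (1 - p)**(K - s + 1))
    return total

print(round(R_u(7, 0.5, 2), 4))
```

The returned value can then be fed into the choice of $\lambda$ in Remark 5.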

### **6. Conclusions**

In this paper, we considered cache-aided communication over a joint broadcast and D2D network. Two novel coded caching schemes were proposed for the centralized and decentralized data placement settings, respectively. Both schemes achieve a parallel gain and a cooperation gain by efficiently exploiting communication opportunities in the broadcast and D2D networks and by optimally allocating communication loads between the server and the users. Furthermore, we showed that in the centralized case, letting too many users send information in parallel can be harmful. Information-theoretic converse bounds were established, with which we proved that the centralized scheme achieves the optimal transmission delay within a constant multiplicative gap in all regimes, and that the decentralized scheme is also order-optimal when the cache size of each user is larger than a small threshold, which tends to zero as the number of users tends to infinity. Our work indicates that combining the cache-aided broadcast network with the cache-aided D2D network can greatly reduce the transmission latency.

**Author Contributions:** Project administration, Y.W.; Writing—original draft, Z.H., J.C. and X.Y.; Writing—review & editing, S.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation of China grant number 61901267.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Proof of the Converse**

Let $T\_1^\*$ and $T\_2^\*$ denote the optimal rates sent by the server and by each user, respectively. We first consider an enhanced system in which every user is served by an exclusive server and an exclusive user, both of which store all files in the database; it is then easy to obtain the following lower bound:

$$T^\* \ge \frac{1}{2}(1 - \frac{M}{N}).\tag{A1}$$

Another lower bound follows an idea similar to that in [1]. However, due to the flexibility of the D2D network, the connection and partitioning status between users can change during the delivery phase, prohibiting the direct application of the proof in [1] to the hybrid network considered in this paper. Moreover, the parallel transmission of the server and many users creates an abundance of different signals in the network, making the scenario more sophisticated.

Consider the first *s* users with cache contents $Z\_1, \ldots, Z\_s$. Define $X\_{1,0}$ as the signal sent by the server, and $X\_{1,1}, \ldots, X\_{1,\alpha\_{\max}}$ as the signals sent by the $\alpha\_{\max}$ users, respectively, where $X\_{j,i} \in [2^{T\_2^\* F}]$ for $j \in [\lfloor N/s \rfloor]$ and $i \in [\alpha\_{\max}]$. Assume that $W\_1, \ldots, W\_s$ are determined by $X\_{1,0}, X\_{1,1}, \ldots, X\_{1,\alpha\_{\max}}$ and $Z\_1, \ldots, Z\_s$. Additionally, define $X\_{2,0}, X\_{2,1}, \ldots, X\_{2,\alpha\_{\max}}$ as the signals which enable the users to decode $W\_{s+1}, \ldots, W\_{2s}$. Continue the same process such that $X\_{\lfloor N/s \rfloor,0}, X\_{\lfloor N/s \rfloor,1}, \ldots, X\_{\lfloor N/s \rfloor,\alpha\_{\max}}$ are the signals which enable the users to decode $W\_{s\lfloor N/s \rfloor - s + 1}, \ldots, W\_{s\lfloor N/s \rfloor}$. We then have $Z\_1, \ldots, Z\_s$, $X\_{1,0}, \ldots, X\_{\lfloor N/s \rfloor,0}$, and

$$X\_{1,1}, \ldots, X\_{1,\alpha\_{\max}}, \ldots, X\_{\lfloor N/s \rfloor, 1}, \ldots, X\_{\lfloor N/s \rfloor, \alpha\_{\max}}$$

to determine $W\_1, \ldots, W\_{s\lfloor N/s \rfloor}$. Let

$$\mathbf{X}\_{1:\alpha\_{\max}} \triangleq (X\_{1,1}, \dots, X\_{1,\alpha\_{\max}}, \dots, X\_{\lfloor N/s \rfloor, 1}, \dots, X\_{\lfloor N/s \rfloor, \alpha\_{\max}}).$$

By the definitions of $T\_1^\*$, $T\_2^\*$ and the encoding function (5), we have

$$H(X\_{1,0}, \dots, X\_{\lfloor N/s \rfloor, 0}) \le \lfloor N/s \rfloor T\_1^\* F, \tag{A2}$$

$$H(\mathbf{X}\_{1:\alpha\_{\max}}) \le \lfloor N/s \rfloor \alpha\_{\max} T\_2^\* F, \tag{A3}$$

$$H(\mathbf{X}\_{1:\alpha\_{\max}}, Z\_1, \dots, Z\_s) \le KMF. \tag{A4}$$

Consider the cut separating $X\_{1,0}, \ldots, X\_{\lfloor N/s \rfloor,0}$, $\mathbf{X}\_{1:\alpha\_{\max}}$, and $Z\_1, \ldots, Z\_s$ from the corresponding *s* users. By the cut-set bound and (A2), we have

$$\left\lfloor\frac{N}{s}\right\rfloor sF \le \left\lfloor\frac{N}{s}\right\rfloor T\_1^\*F + KMF, \tag{A5}$$

$$\left\lfloor \frac{N}{s} \right\rfloor sF \le \left\lfloor \frac{N}{s} \right\rfloor T\_1^\* F + sMF + \left\lfloor \frac{N}{s} \right\rfloor \alpha\_{\max} T\_2^\* F. \tag{A6}$$

Since we have $T^\* \ge T\_1^\*$ and $T^\* \ge \max\{T\_1^\*, T\_2^\*\}$ from the above definitions, we obtain

$$T^\* \ge \max\_{s \in [K]} (s - \frac{KM}{\lfloor N/s \rfloor}),\tag{A7}$$

$$T^\* \ge \max\_{s \in [K]} \left(s - \frac{sM}{\lfloor N/s \rfloor}\right) \frac{1}{1 + \alpha\_{\max}}.\tag{A8}$$
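The two converse bounds (A7) and (A8) are straightforward to evaluate numerically; the sketch below (illustrative helper names, assuming parameter ranges where $\lfloor N/s \rfloor \ge 1$) takes the maximum over $s \in [K]$:

```python
def bound_A7(K: int, N: int, M: float) -> float:
    # T* >= max_s ( s - K*M / floor(N/s) ), Equation (A7).
    return max(s - K * M / (N // s) for s in range(1, K + 1) if N // s >= 1)

def bound_A8(K: int, N: int, M: float, alpha_max: int) -> float:
    # T* >= max_s ( s - s*M / floor(N/s) ) / (1 + alpha_max), Equation (A8).
    return max((s - s * M / (N // s)) / (1 + alpha_max)
               for s in range(1, K + 1) if N // s >= 1)
```

With $M = 0$ (no caches), (A7) reduces to $T^\* \ge K$, i.e., every user's demand must be delivered in full by the server.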

### **Appendix B**

We prove that *T*central is within a constant multiplicative gap of the minimum transmission delay *T*∗ for all values of *M*. To prove the result, we compare them in the following regimes.

• If 0.6393 < *t* < *K*/*α* − 1, from Theorem 1, we have

$$\begin{split} T^\* &\geq \left(s - \frac{Ms}{\lfloor N/s \rfloor}\right) \frac{1}{1 + \alpha\_{\max}} \\ &\overset{(a)}{\geq} \frac{1}{12} \cdot K \Big( 1 - \frac{M}{N} \Big) \frac{1}{1+t} \cdot \frac{1}{1 + \alpha\_{\max}}, \end{split} \tag{A9}$$

where (a) follows from [1] [Theorem 3]. Then we have

$$\begin{split} \frac{T\_{\text{central}}}{T^\*} &\le 12 \cdot \frac{(1 + \alpha\_{\max})(1+t)}{1+t+\alpha t} \\ &= 12 \cdot \frac{(1 + \alpha\_{\max})}{1+\alpha t/(1+t)} \\ &\le 12 \cdot \frac{(1 + \alpha\_{\max})}{1+\alpha \cdot 0.6393/(1+0.6393)} \\ &\le 31, \end{split} \tag{A10}$$

where the last inequality holds by setting *α* = *α*max.

• If *t* > *K*/*α* − 1, we have

$$\begin{split} \frac{T\_{\text{central}}}{T^\*} &\leq \frac{K(1-\frac{M}{N})\frac{1}{1+t+\alpha(\lfloor K/\alpha\rfloor -1)}}{\frac{1}{2}(1-\frac{M}{N})}\\ &= \frac{2K}{1+t+\alpha(\lfloor K/\alpha\rfloor -1)}\\ &\overset{(a)}{\leq} \frac{2K}{K+KM/N}\\ &\leq 2, \end{split} \tag{A11}$$

where (*a*) holds by choosing *α* = 1.

• If *t* ≤ 0.6393, setting *s* = 0.275*N*, we have

$$\begin{aligned} T^\* &\geq s - \frac{KM}{\lfloor N/s \rfloor} \\ &\overset{(a)}{\geq} s - \frac{KM}{N/s - 1} \\ &= 0.275N - t \cdot 0.3793N \\ &\geq 0.0325N > \frac{1}{31} \cdot N, \end{aligned} \tag{A12}$$

where (*a*) holds since $\lfloor x \rfloor \ge x - 1$ for any $x \ge 1$. Please note that for all values of *M*, the transmission delay satisfies

$$T\_{\text{central}} \le \min\{K, N\}. \tag{A13}$$

Combining (A12) and (A13), we have $\frac{T\_{\text{central}}}{T^\*} \le 31$.
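The constant in (A10) can be sanity-checked numerically; the sketch below (the helper name `gap_A10` is ours) evaluates the final bound in (A10) as a function of $\alpha$:

```python
def gap_A10(alpha: float) -> float:
    # Last displayed bound in (A10), with t replaced by its minimum value 0.6393.
    return 12 * (1 + alpha) / (1 + alpha * 0.6393 / (1 + 0.6393))

# The ratio is increasing in alpha and approaches 12 * 1.6393 / 0.6393, about 30.77,
# which is why the multiplicative gap can be rounded up to the constant 31.
limit = 12 * 1.6393 / 0.6393
```

Since the ratio never exceeds its limiting value, the bound of 31 holds uniformly over $\alpha \ge 1$.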

### **Appendix C**

*Appendix C.1. Case $\alpha\_{\max} = \frac{K}{2}$*

When $\alpha\_{\max} = \frac{K}{2}$, we have

$$R\_{\mathbf{u}} = R\_{\text{u-f}} \triangleq \sum\_{s=2}^{K} \frac{K\binom{K-1}{s-1}}{f(K,s)} p^{s-1} q^{K-s+1},\tag{A14}$$

where $R\_{\text{u-f}}$ denotes the user's transmission rate for a flexible D2D network with $\alpha\_{\max} = \frac{K}{2}$. In the flexible D2D network, at most $\frac{K}{2}$ users are allowed to transmit messages simultaneously, in which case each user transmission reduces to a unicast.

Please note that in each term of the summation:

$$\begin{split} \frac{K\binom{K-1}{s-1}}{f(K,s)} &\leq \frac{K\binom{K-1}{s-1}}{K-1-\frac{K}{s}}\\ &= \left(\frac{K}{K-1} + \frac{\left(\frac{K}{K-1}\right)^2}{s-\frac{K}{K-1}}\right) \cdot \binom{K-1}{s-1} \\ &\leq \frac{K\binom{K-1}{s-1}}{K-1} + \frac{2K\binom{K}{s}}{(K-1)(K-2)},\end{split} \tag{A15}$$

where the last inequality holds by $s \ge \frac{K}{K-1} + \frac{K-2}{K-1} = 2$ and

$$\begin{split} \frac{\left(\frac{K}{K-1}\right)^2}{s - \frac{K}{K-1}} \binom{K-1}{s-1} &= \frac{K^2 \binom{K-1}{s-1}}{(K-1)(K-2)} \cdot \frac{\frac{K-2}{K-1}}{s - \frac{K}{K-1}}\\ &\leq \frac{K^2 \binom{K-1}{s-1}}{(K-1)(K-2)} \cdot \frac{\frac{K-2}{K-1} + \frac{K}{K-1}}{s - \frac{K}{K-1} + \frac{K}{K-1}}\\ &= \frac{2K}{(K-1)(K-2)} \cdot \binom{K}{s}. \end{split}$$

Therefore, by (A15), $R\_{\text{u-f}}$ can be upper bounded as

$$\begin{split} R\_{\mathbf{u}\cdot\mathbf{f}} &\leq \frac{K}{K-1} \sum\_{s=2}^{K} \binom{K-1}{s-1} p^{s-1} q^{K-s+1} + \frac{2K}{(K-1)(K-2)} \sum\_{s=2}^{K} \binom{K}{s} p^{s-1} q^{K-s+1} \\ &= \frac{Kq}{K-1} \cdot \sum\_{i=1}^{K-1} \binom{K-1}{i} p^i q^{K-1-i} + \frac{2Kq/p}{(K-1)(K-2)} \cdot \sum\_{s=2}^{K} \binom{K}{s} p^s q^{K-s} \\ &= \frac{Kq}{K-1} \left( 1 - q^{K-1} \right) + \frac{2Kq/p}{(K-1)(K-2)} \cdot \left( 1 - q^K - Kp q^{K-1} \right). \end{split}$$

*Appendix C.2. Case α*max = 1

When *α*max = 1, the cooperation network degenerates into a shared link where only one user acts as the server and broadcasts messages to the remaining *K* − 1 users. A similar derivation is given in [35]. In this case, *R*u can be rewritten as

$$\begin{split} R\_{\mathbf{u}} &= \sum\_{s=2}^{K} \frac{s \binom{K}{s}}{s-1} p^{s-1} q^{K-s+1} \\ &\leq \sum\_{s=2}^{K} \left( 1 + \frac{3}{s+1} \right) \binom{K}{s} p^{s-1} q^{K-s+1} \\ &= \sum\_{s=2}^{K} \binom{K}{s} p^{s-1} q^{K-s+1} + \frac{3}{K+1} \sum\_{s=2}^{K} \binom{K+1}{s+1} p^{s-1} q^{K-s+1} \\ &= \frac{q}{p} \left( 1 - q^K - Kp q^{K-1} \right) + \frac{3q/p^2}{K+1} \left( 1 - q^{K+1} - (K+1)pq^K - \frac{K(K+1)}{2} p^2 q^{K-1} \right) \\ &= \frac{q}{p} \left( 1 - \frac{5}{2} Kp q^{K-1} - 4q^K + \frac{3(1 - q^{K+1})}{(K+1)p} \right), \end{split}$$

where the inequality holds by the fact that *s* ≥ 2.
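The elementary step $\frac{s}{s-1} \le 1 + \frac{3}{s+1}$ for $s \ge 2$, and the resulting closed-form bound on $R\_{\mathbf{u}}$, can be verified numerically; the sketch below uses our own helper names:

```python
import math

def Ru_exact(K: int, p: float) -> float:
    # R_u for alpha_max = 1: sum over group sizes s of s*C(K,s)/(s-1) * p^(s-1) * q^(K-s+1).
    q = 1.0 - p
    return sum(s * math.comb(K, s) / (s - 1) * p**(s - 1) * q**(K - s + 1)
               for s in range(2, K + 1))

def Ru_bound(K: int, p: float) -> float:
    # Closed-form upper bound from the last line of the derivation above.
    q = 1.0 - p
    return (q / p) * (1 - 2.5 * K * p * q**(K - 1) - 4 * q**K
                      + 3 * (1 - q**(K + 1)) / ((K + 1) * p))
```

The bound is tight at $s = 2$, where $\frac{s}{s-1} = 1 + \frac{3}{s+1} = 2$.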

### **Appendix D**

*Appendix D.1. When $\alpha\_{\max} = \frac{K}{2}$*

Recall that $p\_{\text{th}} \triangleq 1 - \left(\frac{1}{K+1}\right)^{\frac{1}{K-1}}$, which tends to zero as *K* goes to infinity. We first introduce the following three lemmas.

**Lemma A1.** *Given an arbitrary convex function $g\_1(p)$ and an arbitrary concave function $g\_2(p)$, if they intersect at two points $p\_1 < p\_2$, then $g\_1(p) \le g\_2(p)$ for all $p \in [p\_1, p\_2]$.*

We omit the proof of Lemma A1 as it is straightforward.

**Lemma A2.** *For memory size $0 \le p \le 1$ and $\alpha\_{\max} = \frac{K}{2}$, we have*

$$R\_{\mathbf{u}} \ge R\_{\emptyset}, \quad T\_{\text{decentral}} = \frac{R\_{\text{s}} R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}}, \text{ for all } p \in [p\_{\text{th}}, 1].$$

**Proof.** When $\alpha\_{\max} = \frac{K}{2}$, from Equation (20), we have

$$\begin{split} R\_{\mathbf{u}} &= \sum\_{s=2}^{K} \frac{K\binom{K-1}{s-1}}{f(K, s)} p^{s-1} (1-p)^{K-s+1} \\ &\geq \sum\_{x=1}^{K-1} \binom{K-1}{x} p^{x} (1-p)^{K-x} \\ &= (1-p) \left( 1 - (1-p)^{K-1} \right), \end{split} \tag{A16}$$

where the first inequality holds by letting $x = s - 1$ and noting $\frac{K}{K-1-\frac{K}{s}} > \frac{K}{K-1} \ge 1$. It is easy to show that $(1-p)\left(1-(1-p)^{K-1}\right)$ is a concave function of *p* by verifying $\frac{\partial^2 (1-p)\left(1-(1-p)^{K-1}\right)}{\partial p^2} \le 0$.

On the other hand, one can easily show that

$$R\_{\emptyset} = K(1 - p)^K$$

is a convex function of *p*, which follows by showing $\frac{\partial^2 R\_{\emptyset}(p)}{\partial p^2} \ge 0$. Since the two functions $(1-p)\left(1-(1-p)^{K-1}\right)$ and $R\_{\emptyset}$ intersect at $p\_{\text{th}} = 1 - \left(\frac{1}{K+1}\right)^{\frac{1}{K-1}}$ and $p\_2 = 1$ with $p\_{\text{th}} \le p\_2$, from Lemma A1 and (A16), we have

$$R\_{\mathbf{u}} \ge (1 - p) \left( 1 - (1 - p)^{K - 1} \right) \ge R\_{\emptyset},$$

for all $p \in [p\_{\text{th}}, 1]$. From Remark 4, we know that $T\_{\text{decentral}} = \frac{R\_{\text{s}} R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}}$ if $R\_{\mathbf{u}} \ge R\_{\emptyset}$.
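The intersection argument in the proof can be checked numerically; the sketch below (our variable names) verifies that the concave lower bound in (A16) meets $R\_{\emptyset}$ exactly at $p\_{\text{th}}$ and dominates it on $[p\_{\text{th}}, 1]$:

```python
K = 20
p_th = 1 - (1 / (K + 1)) ** (1 / (K - 1))

def R_empty(p: float) -> float:
    # R_0 = K (1-p)^K, convex in p.
    return K * (1 - p) ** K

def Ru_lower(p: float) -> float:
    # Concave lower bound (1-p)(1 - (1-p)^(K-1)) on R_u from (A16).
    return (1 - p) * (1 - (1 - p) ** (K - 1))
```

The two curves also meet at $p = 1$, where both vanish, matching the second intersection point used in Lemma A1.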

**Lemma A3.** *For memory size $0 \le p \le 1$ and $\alpha\_{\max} = \frac{K}{2}$, we have*

$$\frac{R\_{\text{s}}R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}} \le 6T^\*.$$

**Proof.** From (25) and (19), we have

$$\begin{split} R\_{\mathbf{u}} &\leq \frac{K}{K-1} \cdot \left( q - q^{K} \right) + \frac{2K}{(K-1)(K-2)} \cdot \frac{q}{p} \left( 1 - q^{K} - Kpq^{K-1} \right) \\ &\overset{(a)}{\leq} \frac{K}{K-1} \cdot \left( q - q^{K} \right) + \frac{2K}{(K-1)(K-2)} \cdot \frac{q}{p} \left( 1 - (1 - Kp) - Kpq^{K-1} \right) \\ &= \frac{K(3K-2)}{(K-1)(K-2)} \cdot \left( q - q^{K} \right), \end{split} \tag{A17}$$

$$R\_{\text{s}} = \frac{q}{p} \left( 1 - q^K \right) \stackrel{(b)}{\leq} \frac{q}{p} \left( 1 - \left( 1 - Kp \right) \right) = Kq,\tag{A18}$$

where (*a*) and (*b*) both follow from the inequality

$$\left(1 - p\right)^{K} \geq 1 - Kp. \tag{A19}$$

Then, by Remark 4, (A17), (A18) and the definition of $R\_{\emptyset}$ in (18), if $\alpha\_{\max} = \frac{K}{2}$, then

$$\frac{R\_{\text{s}}R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}} \le \frac{Kq \cdot \frac{K(3K-2)}{(K-1)(K-2)} \left(q - q^{K}\right)}{Kq + \frac{K(3K-2)}{(K-1)(K-2)} \left(q - q^{K}\right) - Kq^{K}}$$

$$= \left(3 - \frac{2}{K}\right) \cdot q. \tag{A20}$$

From Theorem 1, we have $T^\* \ge \frac{1}{2}q$. Thus, we obtain

$$\frac{R\_{\text{s}}R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}} \cdot \frac{1}{T^\*} \le \frac{\left(3 - 2/K\right) \cdot q}{q/2} = 6 - \frac{4}{K} < 6.$$

Next, we use Lemmas A2 and A3 to prove that when $\alpha\_{\max} = \frac{K}{2}$,

$$\frac{T\_{\text{decentral}}}{T^\*} \le \begin{cases} \max\left\{6, 2K\left(\frac{2K}{2K+1}\right)^{K-1}\right\}, & p < p\_{\text{th}}, \\ 6, & p \ge p\_{\text{th}}. \end{cases}$$

Appendix D.1.1. Case $\alpha\_{\max} = \frac{K}{2}$ and $p \ge p\_{\text{th}}$

In this case, from Lemma A2, we have

$$T\_{\text{decentral}} = \frac{R\_{\text{s}}R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}}.$$

Thus, from Lemma A3,

$$T\_{\text{decentral}} = \frac{R\_{\text{s}}R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}} \le 6T^\*.$$

Appendix D.1.2. Case $\alpha\_{\max} = \frac{K}{2}$ and $p \le p\_{\text{th}}$

From the definition of *T*decentral in (17), we have

$$\frac{T\_{\text{decentral}}}{T^\*} = \max \left\{ \frac{R\_{\emptyset}}{T^\*}, \frac{R\_{\text{s}} R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}} \cdot \frac{1}{T^\*} \right\}. \tag{A21}$$

From Lemma A3, we know that

$$\frac{R\_{\text{s}}R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}} \cdot \frac{1}{T^{\*}} \le 6, \tag{A22}$$

and thus we only need to focus on the upper bound of $R\_{\emptyset}/T^\*$.

According to Theorem 1, $T^\*$ has the following two lower bounds: $T^\* \ge \frac{1-p}{2}$, and

$$T^\* \ge \max\_{s \in \left[K\right]} \left( s - \frac{KM}{\lfloor N/s \rfloor} \right) \ge \max\_{s \in \left[K\right]} \left( s - \frac{KM}{N/\left(2s\right)} \right).$$

Let $R\_1^\*(p) \triangleq \frac{1}{2}(1-p)$ and $R\_2^\*(p) \triangleq K - 2K^2 p$; then we have

$$T^\* \ge \max\{R\_1^\*(p), R\_2^\*(p)\}.$$

Here, $R\_{\emptyset}/R\_1^\*(p)$ and $R\_{\emptyset}/R\_2^\*(p)$ are both monotonic functions of *p*, according to the following properties:

$$\begin{split} \frac{\partial \left( R\_{\emptyset} / R\_1^\*(p) \right)}{\partial p} &= \frac{\partial \left( 2K(1-p)^{K-1} \right)}{\partial p} \le 0, \\\frac{\partial \left( R\_{\emptyset} / R\_2^\*(p) \right)}{\partial p} &= \frac{\partial \left( q^K / (1-2Kp) \right)}{\partial p} \\&= \frac{Kq^{K-1} \left( 1 + 2(K-1)p \right)}{(1-2Kp)^2} \ge 0. \end{split}$$

Additionally, notice that if $p = 0$, then $\frac{R\_{\emptyset}}{R\_2^\*(p)} = 1 < \frac{R\_{\emptyset}}{R\_1^\*(p)}$, and if $p = 1$, then $\frac{R\_{\emptyset}}{R\_2^\*(p)} > \frac{R\_{\emptyset}}{R\_1^\*(p)}$. Therefore, the maximum value of $R\_{\emptyset}/\max\{R\_1^\*, R\_2^\*\}$ is attained at $p = \frac{1}{2K+1}$, which satisfies $R\_1^\*(\frac{1}{2K+1}) = R\_2^\*(\frac{1}{2K+1})$, implying that

$$\frac{R\_{\emptyset}}{T^\*} \le \frac{R\_{\emptyset}(\frac{1}{2K+1})}{R\_1^\*(\frac{1}{2K+1})} = 2K \left(\frac{2K}{2K+1}\right)^{K-1}.\tag{A23}$$

From (A21)–(A23), we obtain the following inequality:

$$\frac{T\_{\text{decentral}}}{T^\*} \le \max\left\{ 2K \left( \frac{2K}{2K+1} \right)^{K-1}, 6 \right\}.$$
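The resulting gap for the small-memory regime can be evaluated directly; in the sketch below, `decentralized_gap` is our own illustrative name:

```python
def decentralized_gap(K: int) -> float:
    # max{ 6, 2K (2K/(2K+1))^(K-1) } from the case p <= p_th.
    return max(6.0, 2 * K * (2 * K / (2 * K + 1)) ** (K - 1))
```

As $K$ grows, $\left(\frac{2K}{2K+1}\right)^{K-1}$ tends to $e^{-1/2}$, so the second term behaves like $2Ke^{-1/2}$ and dominates for large $K$; this is consistent with claiming order-optimality of the decentralized scheme only once $p$ exceeds the (vanishing) threshold $p\_{\text{th}}$.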

*Appendix D.2. When $\alpha\_{\max} = 1$*

From Equation (24), we obtain that

$$\begin{split} R\_{\mathbf{u}} &\leq \frac{q}{p} \left( 1 - \frac{5}{2} Kpq^{K-1} - 4q^{K} + \frac{3(1 - q^{K+1})}{(K+1)p} \right) \\ &\leq \frac{q}{p} \left( 1 - \frac{5}{2} Kpq^{K-1} - 4q^{K} + \frac{3(K+1)p}{(K+1)p} \right) \\ &= \frac{q}{p} \left( 4 \cdot (1 - q^{K}) - \frac{5}{2} Kpq^{K-1} \right) \\ &\leq \frac{q}{p} \left(4 \cdot (1 - q^{K})\right) \\ &= 4R\_{\text{s}}, \end{split} \tag{A24}$$

where the second inequality holds by (A19) and the last equality holds by the definition $R\_{\text{s}} \triangleq \frac{q}{p}\left(1 - q^K\right)$ in (19). On the other hand, we rewrite the second lower bound of $T^\*$:

$$T^\* \ge \max\_{s \in \left[K\right]} \left( s - \frac{sM}{\lfloor N/s \rfloor} \right) \frac{1}{1 + \alpha\_{\max}}.\tag{A25}$$

From the result in [2] (Appendix B), we have

$$\frac{R\_s}{\max\_{s \in [K]} \left(s - \frac{sM}{\lfloor N/s \rfloor}\right)} \le 12. \tag{A26}$$

Combining (A24)–(A26), we have

$$\frac{R\_{\rm s}}{T^\*} \le 12(1 + \alpha\_{\rm max}),\\\frac{R\_{\rm u}}{T^\*} \le 48(1 + \alpha\_{\rm max}).\tag{A27}$$

If *p* ≤ *p*th, by (A27) and since *R*<sup>∅</sup> ≤ *T*decentral ≤ *R*<sup>s</sup> (see Remark 4), we have

$$\frac{T\_{\text{decentral}}}{T^\*} \le \frac{R\_{\text{s}}}{T^\*} \le 12(1 + \alpha\_{\max}) = 24,\tag{A28}$$

where the last equality holds by the fact that $\alpha\_{\max} = 1$.

If *p* ≥ *p*th, from Lemma A2, we have *R*<sup>u</sup> ≥ *R*<sup>∅</sup> and

$$\begin{split} \frac{T\_{\text{decentral}}}{T^\*} &= \frac{\frac{R\_{\text{s}} R\_{\mathbf{u}}}{R\_{\text{s}} + R\_{\mathbf{u}} - R\_{\emptyset}}}{T^\*} \\ &\le \frac{\min\{R\_{\mathbf{u}}, R\_{\text{s}}\}}{T^\*} \\ &\le \min\{48(1 + \alpha\_{\max}), 12(1 + \alpha\_{\max})\} \\ &= 24, \end{split} \tag{A29}$$

where the second inequality holds by (A27) and the last equality is from the fact *α*max = 1 in this case.

### **References**


### *Article* **Degrees of Freedom of a** *K***-User Interference Channel in the Presence of an Instantaneous Relay**

**Ali H. Abdollahi Bafghi 1, Mahtab Mirmohseni 2,\* and Masoumeh Nasiri-Kenari <sup>1</sup>**

<sup>1</sup> Department of Electrical Engineering, Sharif University of Technology, Tehran P932+FM4, Iran

<sup>2</sup> Institute for Communication Systems (ICS), University of Surrey, Guildford GU2 7XH, UK

**\*** Correspondence: m.mirmohseni@surrey.ac.uk

**Abstract:** In this paper, we study the degrees of freedom (DoF) of a frequency-selective *K*-user interference channel in the presence of an instantaneous relay (IR) with multiple receiving and transmitting antennas. We investigate two scenarios based on the IR antennas' cooperation ability. First, we assume that the IR receiving and transmitting antennas can coordinate with each other and that the transmitted signal of each transmitting antenna can depend on the received signals of all receiving antennas, and we derive lower and upper bounds for the sum DoF of this model. In our interference alignment scheme, we divide receivers into two groups, called clean and dirty receivers. We design our scheme such that a part of the messages of the clean receivers can be de-multiplexed at the IR. Thus, the IR can use these message streams for interference cancellation at the clean receivers. Next, we consider an IR whose antennas do not coordinate with each other and where the transmitted signal of each transmitting antenna depends only on the received signal of its corresponding receiving antenna. We also derive lower and upper bounds for the sum DoF of this model of IR. We show that the achievable sum DoF decreases considerably compared with the coordinated case. In both of these models, our schemes achieve the maximum sum DoF of *K* if the number of transmitting and receiving antennas is larger than a finite threshold.

**Keywords:** frequency-selective interference channel; *K*-user interference channel; DoF; instantaneous relay

### **1. Introduction**

Spectrum sharing in wireless networks seems to be an inevitable solution to increasing bandwidth demands. How to treat interference is one of the main challenges in these scenarios. Interference alignment has proved to be a useful solution that aligns all interference signals into a smaller subspace, allowing the remaining signal space to be used for the transmission of main signals. Thereby, it can achieve the maximum degrees of freedom (DoF) of $\frac{K}{2}$ in a *K*-user interference channel [1]. An interesting question would be to find tools that can improve this maximum value for the DoF. The instantaneous relay (relay-without-delay; IR) is one of these tools [2,3].

For an IR, the transmitted signal in the *t*-th time slot, $\mathbf{X}\_{\text{IR}}(t)$, is a function of all received signals $\mathbf{Y}\_{\text{IR}}(\cdot)$ from the first time slot up to the current (*t*-th) time slot, i.e., $\mathbf{X}\_{\text{IR}}(t) = f\_{\text{IR}}(\mathbf{Y}\_{\text{IR}}(1), \ldots, \mathbf{Y}\_{\text{IR}}(t))$, while for a classic relay, the transmitted signal in the *t*-th time slot does not depend on the received signal in the *t*-th (current) time slot, i.e., $\mathbf{X}\_{\text{R}}(t) = f\_{\text{R}}(\mathbf{Y}\_{\text{R}}(1), \ldots, \mathbf{Y}\_{\text{R}}(t-1))$ (it was shown in [4] that a classic relay cannot increase the DoF of a *K*-user interference channel). Though with current technology an IR might seem impractical, there have been significant results on IRs, and the active reconfigurable intelligent surface (RIS) is a promising technology that may make it possible to realize an IR in the near future [5]. An RIS is a special case of the IR model for which the transmitted signal in the *t*-th time slot, $\mathbf{X}\_{\text{RIS}}(t)$, is a function of the received signal $\mathbf{Y}\_{\text{RIS}}(t)$ in the *t*-th time slot only, i.e., $\mathbf{X}\_{\text{RIS}}(t) = f\_{\text{RIS}}(\mathbf{Y}\_{\text{RIS}}(t))$.
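The causality distinction between an IR, a classic relay, and an RIS can be illustrated with a toy sketch, where a running sum stands in for an arbitrary causal relaying function (all names and the choice of function are our own illustration):

```python
import random

T = 5
rng = random.Random(0)
Y = [rng.gauss(0, 1) for _ in range(T)]  # stand-in received signals at the relay

# Instantaneous relay: X_IR(t) may depend on Y(1), ..., Y(t), current slot included.
X_ir = [sum(Y[:t + 1]) for t in range(T)]

# Classic relay: X_R(t) may depend only on Y(1), ..., Y(t-1); nothing at t = 1.
X_r = [sum(Y[:t]) for t in range(T)]

# RIS-like special case: X_RIS(t) depends on the current Y(t) only.
X_ris = [Y[t] for t in range(T)]
```

The only difference between the IR and the classic relay is access to the current-slot observation, yet, as discussed above, this single extra input is what allows an IR to increase the DoF.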

**Citation:** Abdollahi Bafghi, A.H.; Mirmohseni, M.; Nasiri-Kenari, M. Degrees of Freedom of a *K*-User Interference Channel in the Presence of an Instantaneous Relay. *Entropy* **2022**, *24*, 1078. https://doi.org/ 10.3390/e24081078

Academic Editors: H. Vincent Poor, Holger Boche, Rafael F. Schaefer and Onur Günlü

Received: 6 June 2022 Accepted: 26 July 2022 Published: 4 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The capacities of wireless networks in the presence of an IR were studied in [6–30]. El Gamal et al., in [6], showed that in the presence of an IR, rates higher than an existing cut-set bound for a classic relay can be achieved for a point-to-point channel. In [7], a new upper bound was derived for the capacity of a channel with an IR. The authors in [8] studied a two-user interference channel in the presence of an IR and derived an outer bound for the Gaussian case under strong and very strong interference conditions. They also introduced an achievable scheme based on instantaneous amplify-and-forward relaying. In [9], the authors studied a *K*-user interference channel in the presence of an IR in two scenarios, wherein transmitters and receivers were aware and not aware of the existence of an IR. It was shown that in both cases, an IR can enlarge the rate region and increase user fairness. In [10], the authors studied general networks in the presence of an IR and derived cut-set bounds for two cases of the IR having or not having its own message; they showed that the proposed bounds are tight in some cases. In [11], it was proven that the networks with an IR can be considered a channel with in-block memory. Then, a cut-set bound was characterized that generalizes existing cut-set bounds.

As we stated before, an RIS is a special case of the generic IR model; thus, we will review some related work on the capacities of RIS-assisted networks. In [12], the fundamental capacity limit of RIS-assisted multiple-input multiple-output (MIMO) communications systems was studied by using a joint optimization of a MIMO transmit covariance matrix and RIS phase shifts. In [13], RIS-assisted communication systems were studied wherein a transmitter could control an RIS with a finite-rate link and information-theoretic limits were derived. It was proven that the capacity is achievable if information is jointly encoded in a transmitted signal and RIS phase shifts. In [14], a downlink non-orthogonal multiple-access (NOMA) RIS-assisted communication system was studied wherein multiple users were served by only one base station (BS). The sum rate of the users was maximized by using a joint optimization of a beamforming vector at the BS and the phase shifts of the RIS, wherein a successive interference cancellation decoding rate and RIS scattering element constraints existed. In [15], the usage of an RIS was studied for a rank improvement of MIMO communication channels.

From a DoF perspective, an interference alignment signaling scheme for a MIMO *X*-channel, which outperforms the achievable DoF of previous signaling schemes, was proposed in [16]. It is well known that the DoF of the frequency- or time-selective *K*-user interference channel is $\frac{K}{2}$ [1], which is an important result of the interference alignment technique. We remark that the DoF of interference channels is an important problem, which has been studied extensively in the literature; e.g., the DoF of a multi-input multi-output (MIMO) interference channel [17], the DoF region of an interference channel [18,19], and the DoF of an interference channel with a partial network topology [20–25]. Interference alignment is an important technique, which has a vital impact on proving DoF achievability theorems for multi-user wireless networks. The results available on the interference alignment technique were surveyed in [26]. For the DoF of networks in the presence of an IR, the sum DoF of a two-user interference channel assisted by an IR, with *M* antennas at all nodes, was studied in [3], and it was proven that a DoF of $\frac{3M}{2}$ can be achieved. The DoF of an *M*-antenna three-user interference channel assisted by an IR was studied in [27], and it was shown that a DoF of 2*M* is achievable. The DoF of a two-way *K*-user IR-aided interference channel, when the IR is equipped with 2*K* antennas, was studied in [28]. It was demonstrated that a DoF of *K* can be achieved. The DoF of a two-user interference channel in the presence of an IR, when there is an arbitrary number of IR transmitting and receiving antennas, was studied in [29]. An inner and two outer bounds were obtained. For a *K*-user interference channel assisted by an IR wherein the IR can only instantaneously amplify and forward a received signal in the current channel use, with the same number of antennas at all nodes, an achievable scheme and an outer bound were proposed in [30]. Although the DoF was derived in some special cases, namely for *K* = 2 or with *K*(*K* − 1) IRs, a general achievable DoF was not obtained. For a *K*-user interference channel in the presence of active and passive RISs, inner and outer bounds on a DoF region and

lower and upper bounds on a sum DoF were derived in [31]. For both active and passive RISs, it was shown that, by employing a sufficient number of RIS elements, a sum DoF of *K* can be achieved. In [32], it was shown that when there is a line-of-sight link between an RIS and the transceivers and there is no direct link between the transceivers, the phases of the RIS elements can be adjusted such that all interference can be canceled and the maximum DoF of *K* can be achieved in a *K*-user interference channel if the number of RIS elements is larger than a finite value.

The goal of this paper was to study the sum DoF of a frequency-selective *K*-user interference channel in the presence of an IR. To the best of our knowledge, although the DoF of two- and three-user interference channels and a scenario in which there are *K*(*K* − 1) IRs have been studied, the sum DoF of a frequency-selective *K*-user interference channel (wherein symbol extensions are in the frequency domain) in the presence of a multi-input multi-output (MIMO) IR has not been characterized. Our contributions are as follows:


This paper is organized as follows. In Section 2, we present the system model. In Sections 3 and 4, we discuss our main results for the coordinated and non-coordinated IRs, respectively. In Section 5, we present some numerical results to evaluate our proposed schemes. Finally, in Section 6, we conclude the paper.

**Notations:** Bold letters denote matrices. Calligraphic uppercase letters denote sets and vector spaces. $\mathbb{R}$ is the set of real numbers. For a set $\mathcal{A}$, $|\mathcal{A}|$ indicates the cardinality of $\mathcal{A}$. $\mathbf{V}^T$ and $\mathbf{V}^H$ are the transpose and Hermitian (conjugate transpose) of the matrix $\mathbf{V}$, respectively. $\text{diag}(a\_1, \ldots, a\_m)$ denotes a diagonal matrix with the diagonal elements $a\_1, \ldots, a\_m$. The function $f(\rho)$ is $o(\log(\rho))$ if

$$\lim\_{\rho \to \infty} \frac{|f(\rho)|}{\log(\rho)} = 0.$$

Sequence *a*(*n*) goes to infinity with *O*(*g*(*n*)) if

$$0 < \lim\_{n \to \infty} \frac{|a(n)|}{|g(n)|} < \infty.$$

$\mathbb{N}$ is the set of natural numbers, and $\mathbb{W}$ is the set of non-negative integers.

### **2. System Model and Preliminaries**

### *2.1. System Model*

We consider a *K*-user interference channel with an IR in which *K* single-antenna transmitters send their messages to *K* single-antenna receivers. In this system, the *i*-th transmitter sends the message $w^{[i]} \in \mathcal{W}^{[i]} = \{1, \ldots, 2^{Tr\_i}\}$ to the *i*-th receiver, where $r\_i$ is the transmission rate corresponding to the *i*-th transmitter and *T* is the number of channel uses (in this paper, each channel use corresponds to a frequency slot, and all transmissions are in the same time cycle). We assume an IR with *Q* receiving antennas and *W* transmitting antennas. Figure 1 shows the system model.

**Figure 1.** IR-assisted *K*-user interference channel. The IR has *W* transmitting antennas and *Q* receiving antennas. Direct links are shown by solid arrows, cross-links are shown by dotted arrows, and links between the IR and transmitters or receivers are shown by dashed arrows.

We consider a frequency-selective channel. Due to the instantaneity of the IR, it can process the signals received from all frequency slots in the current time cycle and transmit signals in different frequency slots in the same time cycle, which affects the received signals at the receivers in all frequency slots. The received signal at the *j*-th receiver in the *t*-th frequency slot *ω<sup>t</sup>* is shown by *Y*[*j*] (*ωt*) and is presented as follows (note that in the general case, the IR-transmitted signal is a function of the received signal in the past time cycles in addition to the current time cycle. In the achievability proofs of this paper, the signals of past time cycles are not needed and transmissions in different frequency slots are at the same time cycle. However, for the upper bounds, the general case is considered.):

$$Y^{[j]}(\omega\_t) = \sum\_{i=1}^{K} H^{[ji]}(\omega\_t) X^{[i]}(\omega\_t) + \sum\_{u=1}^{W} H\_{\text{IR-R}}^{[ju]}(\omega\_t) X\_{\text{IR}}^{[u]}(\omega\_t) + Z^{[j]}(\omega\_t), \tag{1}$$

where $X^{[i]}(\omega\_t)$ is the signal of the *i*-th transmitter, $H^{[ji]}(\omega\_t)$ is the channel coefficient between the *i*-th transmitter and the *j*-th receiver, $X\_{\text{IR}}^{[u]}(\omega\_t)$ is the transmitted signal of the *u*-th IR transmitting antenna, $H\_{\text{IR-R}}^{[ju]}(\omega\_t)$ is the channel coefficient between the *u*-th IR transmitting antenna and the *j*-th receiver, and $Z^{[j]}(\omega\_t)$ is additive white Gaussian noise (AWGN) at the *j*-th receiver in the *t*-th frequency slot $\omega\_t$, where $t \in \{1, 2, \ldots, T\}$. We assume perfect self-interference cancellation at the IR; thus, the received signal at the *q*-th IR receiving antenna in the *t*-th frequency slot, which is denoted by $Y\_{\text{IR}}^{[q]}(\omega\_t)$, is given as follows:

$$Y\_{\rm IR}^{[q]}(\omega\_{\rm t}) = \sum\_{i=1}^{K} H\_{\rm T-IR}^{[qi]}(\omega\_{\rm t}) X^{[i]}(\omega\_{\rm t}) + Z\_{\rm IR}^{[q]}(\omega\_{\rm t}),\tag{2}$$

where *H*<sup>[*qi*]</sup><sub>T−IR</sub>(*ω<sub>t</sub>*) is the channel coefficient from the *i*-th transmitter to the *q*-th IR receiving antenna (for an NC-IR, before a transmission begins, all required channel-state information and the transmission strategy are shared among all nodes and all receiving and transmitting antennas of the NC-IR; once the transmission begins, however, the *u*-th transmitting antenna of the NC-IR has access only to the *u*-th receiving antenna, and its received signal cannot be exchanged with other transmitting antennas (the same holds for the active RIS [31])), *q* ∈ {1, ... , *Q*}, and *Z*<sup>[*q*]</sup><sub>IR</sub>(*ω<sub>t</sub>*) is the AWGN at the *q*-th IR receiving antenna in the *t*-th frequency slot. We assume that perfect channel-state information for all frequency slots is available at all nodes (this ideal assumption is widely adopted in the literature [1,33]; noisy channel-state information is an interesting subject for future work). We consider two types of IR: (1) a MIMO IR whose antennas can coordinate with each other, called a MIMO-coordinated IR (C-IR), and (2) an IR with no coordination among its antennas, in which the *u*-th transmitting antenna has access only to the *u*-th receiving antenna (*W* = *Q*), called a non-coordinated IR (NC-IR). At each time cycle, for the MIMO C-IR, we have:

$$X_{\mathrm{IR}}^{[u]}(\omega_t) = f^{[u,\omega_t]}\big(Y_{\mathrm{IR}}^{[1]}(\omega_1), \dots, Y_{\mathrm{IR}}^{[1]}(\omega_T), \dots, Y_{\mathrm{IR}}^{[Q]}(\omega_1), \dots, Y_{\mathrm{IR}}^{[Q]}(\omega_T)\big), \tag{3}$$

where *f*<sup>[*u*,*ω<sub>t</sub>*]</sup> denotes the encoding function of the IR for the *u*-th transmitting antenna at the *t*-th frequency slot *ω<sub>t</sub>*. For the NC-IR, we have:

$$X_{\mathrm{IR}}^{[u]}(\omega_t) = f^{[u,\omega_t]}\big(Y_{\mathrm{IR}}^{[u]}(\omega_1), \dots, Y_{\mathrm{IR}}^{[u]}(\omega_T)\big), \quad u \in \{1, \dots, Q\}. \tag{4}$$

We limit the functions *f*<sup>[*u*,*ω<sub>t</sub>*]</sup> to be linear. Equations (1) and (2) can be rewritten in the following vector form:

$$\mathbf{Y}^{[j]} = \sum_{i=1}^{K} \mathbf{H}^{[ji]} \mathbf{X}^{[i]} + \sum_{u=1}^{W} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{X}^{[u]}_{\mathrm{IR}} + \mathbf{Z}^{[j]}, \tag{5}$$

$$\mathbf{Y}\_{\rm IR}^{[q]} = \sum\_{i=1}^{K} \mathbf{H}\_{\rm T-IR}^{[qi]} \mathbf{X}^{[i]} + \mathbf{Z}\_{\rm IR}^{[q]},\tag{6}$$

where **X**<sup>[*i*]</sup> is a *T* × 1 column vector containing the channel inputs *X*<sup>[*i*]</sup>(*ω<sub>t</sub>*), i.e.,

$$\mathbf{X}^{[i]} = \left[ X^{[i]}(\omega_1) \;\; X^{[i]}(\omega_2) \;\; \cdots \;\; X^{[i]}(\omega_T) \right]^T.$$

**Y**<sup>[*j*]</sup>, **Y**<sup>[*q*]</sup><sub>IR</sub>, **X**<sup>[*u*]</sup><sub>IR</sub>, **Z**<sup>[*j*]</sup>, and **Z**<sup>[*q*]</sup><sub>IR</sub> are defined in a similar way. **H**<sup>[*ji*]</sup> is a diagonal matrix defined as follows:

$$\mathbf{H}^{[ji]} = \text{diag}\left(H^{[ji]}(\omega\_1), \dots, H^{[ji]}(\omega\_T)\right).$$

**H**<sup>[*ju*]</sup><sub>IR−R</sub> and **H**<sup>[*qi*]</sup><sub>T−IR</sub> are defined similarly. Since the functions *f*<sup>[*u*,*ω<sub>t</sub>*]</sup> are linear, the operation of the MIMO C-IR can be represented as follows:

$$\mathbf{X}_{\mathrm{IR}}^{[u]} = \sum_{q=1}^{Q} \mathbf{A}^{[uq]} \mathbf{Y}_{\mathrm{IR}}^{[q]}, \tag{7}$$

where **<sup>A</sup>**[*uq*] are *<sup>T</sup>* <sup>×</sup> *<sup>T</sup>* matrices. Moreover, the linear operation of the NC-IR can be represented as follows:

$$\mathbf{X}\_{\rm IR}^{[u]} = \mathbf{A}^{[u]} \mathbf{Y}\_{\rm IR}^{[u]}.\tag{8}$$
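Equations (7) and (8) differ only in which receiving antennas each transmitting antenna may combine. The following minimal NumPy sketch (toy dimensions and random placeholder matrices, not the designed relaying matrices derived later in the paper) makes the contrast concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
T, Q, W = 4, 2, 2          # frequency slots, IR receiving / transmitting antennas (toy sizes)

# Received signals at the IR antennas, stacked over the T frequency slots.
Y_IR = [rng.standard_normal(T) for _ in range(Q)]

# MIMO C-IR, Eq. (7): transmit antenna u combines ALL receiving antennas
# through full T x T matrices A[u][q], mixing across frequency slots.
A_cir = [[rng.standard_normal((T, T)) for _ in range(Q)] for _ in range(W)]
X_cir = [sum(A_cir[u][q] @ Y_IR[q] for q in range(Q)) for u in range(W)]

# NC-IR, Eq. (8): transmit antenna u sees only its own receiving antenna u
# (W = Q), still mixing across frequency slots via one T x T matrix.
A_ncir = [rng.standard_normal((T, T)) for _ in range(Q)]
X_ncir = [A_ncir[u] @ Y_IR[u] for u in range(W)]

print(len(X_cir), X_cir[0].shape)   # W transmitted vectors of length T
```

The frequency-selective structure is what allows `A` to be a full (rather than lower-triangular) matrix, as Remark 1 explains.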

Since we assume a frequency-selective *K*-user interference channel, *H*<sup>[*ji*]</sup>(*ω<sub>t</sub>*), *H*<sup>[*ju*]</sup><sub>IR−R</sub>(*ω<sub>t</sub>*), and *H*<sup>[*qi*]</sup><sub>T−IR</sub>(*ω<sub>t</sub>*) are independent random variables for different values of *i*, *j*, *u*, *q*, and *ω<sub>t</sub>*, whose cumulative distribution functions (CDFs) are continuous due to the frequency selectivity of the channel. In the case of complex channel coefficients, their real and imaginary parts are independent random variables with continuous CDFs (e.g., a complex Gaussian random variable).

**Remark 1.** *The assumption of frequency selectivity is essential for our coding scheme, and not only because it provides independent channel coefficients for each channel use. If the channel were instead time selective and channel uses occupied different time slots, then by (7) and (8), the matrices* **A**<sup>[*uq*]</sup> *for the MIMO C-IR and the matrices* **A**<sup>[*u*]</sup> *for the NC-IR would have to be lower triangular due to the definition of the IR (the transmitted signal of an IR in the t-th time slot is a function of the received signals in the time slots t*′ ∈ {1, ... , *t*}*). With a frequency-selective channel, however, the different channel uses occupy different frequency slots of the same time cycle, so the transmitted signal of the IR in each frequency slot can be a function of all received signals in all frequency slots; thus, there is no constraint on the matrices* **A**<sup>[*uq*]</sup> *and* **A**<sup>[*u*]</sup>*, and our proposed achievability schemes are realizable.*

We assume that all transmitters can send a signal with a maximum average power of *ρ*, i.e., $\frac{1}{T}\sum_{t=1}^{T} \big|X^{[i]}(\omega_t)\big|^2 \le \rho$, ∀*i* ∈ {1, ... , *K*}. We say the rate vector **r** = (*r*<sub>1</sub>, ... , *r<sub>K</sub>*) is achievable if $\lim_{T\to\infty} \Pr\big(\bigcup_i \{\hat{w}^{[i]} \neq w^{[i]}\}\big) = 0$, where $\hat{w}^{[i]}$ is the estimated message at the *i*-th receiver. In addition, C(*ρ*) denotes the closure of the set of all achievable rate vectors **r** = (*r*<sub>1</sub>, ... , *r<sub>K</sub>*).

### *2.2. Preliminaries*

In the following, we introduce some definitions that are used throughout this paper.

**Degrees of freedom (DoF)**: Similar to [1], we define the DoF region D for a *K*-user interference channel as follows:

$$\mathcal{D} = \Big\{ (d_1, \dots, d_K) \in \mathbb{R}_+^K : \forall (w_1, \dots, w_K) \in \mathbb{R}_+^K,$$

$$w_1 d_1 + \dots + w_K d_K \le \limsup_{\rho \to \infty} \Big( \frac{1}{\log(\rho)} \sup_{\mathbf{r}(\rho) \in \mathcal{C}(\rho)} (w_1 r_1 + \dots + w_K r_K) \Big) \Big\}. \tag{9}$$

**Span**: span(**V**) denotes the space spanned by the column vectors of the matrix **V**.

**Dimension**: We define the number of dimensions of span(**V**) as the dimension of **V**, denoted by *d*(**V**), which is equal to rank(**V**).

**Normalized asymptotic dimension**: We will see in our analysis that, for given *K*, *Q*, and *W*, the dimensions of the beamforming matrices and the vector spaces are of order *O*(*n<sup>l</sup>*), *l*, *n* ∈ N. For the matrix **V**, we define the normalized asymptotic dimension (*D<sub>N</sub>*) as follows:

$$D\_N(\mathbf{V}) = \lim\_{n \to \infty} \frac{d(\mathbf{V})}{n^l} \,, \tag{10}$$

where *l* is the unique integer for which the limit in (10) is finite and non-zero.

These definitions are also used for the vector space A; therefore, *d*(A) indicates the dimension of A, and *DN*(A) indicates the normalized asymptotic dimension of A.
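As a quick numerical illustration of the normalized asymptotic dimension (the values below are illustrative, not taken from the paper): for *d*(**V**) = (*n* + 1)<sup>2</sup> we have *l* = 2 and *D<sub>N</sub>*(**V**) = 1, the same *D<sub>N</sub>* as for *d*(**V**′) = *n*<sup>2</sup>, even though *d*(**V**) > *d*(**V**′) for every *n*:

```python
# d(V) = (n+1)^2 grows like n^2, so l = 2 and D_N(V) = 1,
# even though d(V) > n^2 = d(V') for every finite n.
def d_V(n):
    return (n + 1) ** 2

def d_Vp(n):
    return n ** 2

n = 10 ** 6
print(d_V(n) / n ** 2)   # ratio tends to D_N(V)  = 1
print(d_Vp(n) / n ** 2)  # ratio equals   D_N(V') = 1
```

This is exactly the effect used in Step 3, where subspaces with different dimensions share the same normalized asymptotic dimension.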

### **3.** *K***-User Interference Channel in the Presence of MIMO C-IR**

In this section, we present the lower and upper bounds for the sum DoF of the frequency-selective *K*-user interference channel with a MIMO C-IR. First, we introduce the lower bound as follows:

**Theorem 1.** *For a frequency-selective K-user interference channel with a MIMO C-IR, where* max{*W*, *Q*} ≤ *K, the following DoF is achievable:*

$$\mathrm{DoF} = \max\left\{\frac{K}{2} + \max\left\{0,\; K\,\frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\left\lceil \frac{W}{Q} \right\rceil}\right\},\; \min\{Q, W\}\right\}. \tag{11}$$

*We can see from (11) that when W*/*K* > 1/2*, the DoF is always larger than K*/2*, i.e., the DoF in the absence of an IR.*
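The bound in (11) is easy to evaluate numerically. The following sketch is a direct transcription of the formula, with illustrative parameter values chosen to show a case with *W*/*K* > 1/2 where the bound exceeds *K*/2:

```python
from math import ceil

def dof_lower_bound(K, W, Q):
    """Achievable sum DoF of Theorem 1 (requires max(W, Q) <= K)."""
    assert max(W, Q) <= K
    # Gain over K/2 is positive exactly when W/K > 1/2.
    gain = K * (W / K - 0.5) / (1 + 2 * ceil(W / Q))
    return max(K / 2 + max(0.0, gain), min(Q, W))

print(dof_lower_bound(4, 3, 2))  # W/K = 3/4 > 1/2, so the bound exceeds K/2 = 2
print(dof_lower_bound(4, 1, 1))  # W/K = 1/4 <= 1/2: the gain term vanishes
```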

**Proof.** We prove the achievability of the first term of (11), i.e., $\frac{K}{2} + \max\Big\{0,\; K\,\frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\lceil W/Q \rceil}\Big\}$, in the following. The proof of the second term, i.e., min{*Q*, *W*}, is provided in Appendix A.

We present this proof in six steps. In Step 1, we divide the transmitters and the receivers into two groups (clean and dirty). In Step 2, some message streams are designed to be de-multiplexable at the MIMO C-IR; thus, the MIMO C-IR can use them for interference cancellation at the clean receivers. After the interference cancellation, the equivalent channel coefficients are derived for the other receivers (dirty receivers). In Step 3, we introduce the interference alignment equations such that the assumption of the previous step (the de-multiplexing of some message streams) and the interference alignment for each receiver and MIMO C-IR receiving antenna are satisfied. In Step 4, we present the beamforming design for each symbol stream. In Step 5, we verify that the interference alignment equations are satisfied at each receiver and MIMO C-IR receiving antenna. Finally, in Step 6, we derive the achieved DoF, presented in the first term of (11).

### **Step 1: Partitioning the Transmitters and Receivers**

We divide the transmitters into two partitions. For the transmitters *i* ∈ {1, ... , *W*}, we provide two sets of symbol streams, **x̄**<sup>[*i*]</sup> and **x̃**<sup>[*i*]</sup> (each element of the vectors **x̄**<sup>[*i*]</sup> and **x̃**<sup>[*i*]</sup> is an extended symbol). The matrices **V̄**<sup>[*i*]</sup> and **Ṽ**<sup>[*i*]</sup> are the beamforming matrices, the columns of which are the beamforming vectors corresponding to the elements of **x̄**<sup>[*i*]</sup> and **x̃**<sup>[*i*]</sup>, respectively. We can write:

$$\mathbf{X}^{[i]} = \bar{\mathbf{V}}^{[i]}\bar{\mathbf{x}}^{[i]} + \tilde{\mathbf{V}}^{[i]}\tilde{\mathbf{x}}^{[i]}, \quad i \in \{1, \ldots, W\}. \tag{12}$$

For the transmitters *i* ∈ {*W* + 1, ... , *K*}, we provide only one set of extended symbols, **x̄**<sup>[*i*]</sup>, and **V̄**<sup>[*i*]</sup> is the beamforming matrix for the symbols **x̄**<sup>[*i*]</sup>. Thus, we have:

$$\mathbf{X}^{[i]} = \bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]}, \quad i \in \{W + 1, \dots, K\}. \tag{13}$$

Note that the matrices **V**˜ [*i*] and **V**¯ [*i*] have *T* rows because we have *T* frequency slots. The dimensions of **x**¯[*i*] and **x**˜[*i*] and the number of columns of **V**¯ [*i*] and **V**˜ [*i*] are determined in the next steps.

In the following steps, we design the beamforming vectors **V**˜ [*i*] and **V**¯ [*i*] such that the extended symbols **x**˜[*i*] can be de-multiplexed at the MIMO C-IR. By de-multiplexing, we mean that the MIMO C-IR can separate each symbol of message streams **x**˜[*i*] using zero forcing without decoding the symbol. The symbol streams **x**¯[*i*] act as interference signals, and their beamforming vectors align into a smaller subspace.

We also divide the receivers into clean and dirty sets. In the next steps, the signal transmitted by the MIMO C-IR is designed such that the interference induced by the symbols **<sup>x</sup>**˜[*i*] will be removed at the receivers *<sup>j</sup>* ∈ {1, ... , *<sup>W</sup>*}, called clean receivers, but this interference will remain at the receivers *j* ∈ {*W* + 1, ... , *K*}, called dirty receivers. The main reason for choosing these terms (clean and dirty receivers) is that in our scheme, the interference of some symbol streams is canceled at clean receivers by the MIMO C-IR (the MIMO C-IR can de-multiplex these symbols and use them for interference cancellation) and the clean receivers will observe fewer dimensions for the interference; however, all interference remains at the dirty receivers.

**Step 2: Interference Cancellation at Clean Receivers and Equivalent Channel for Dirty Receivers**

We design the beamforming vectors **Ṽ**<sup>[*i*]</sup> and **V̄**<sup>[*i*]</sup> such that the interference induced by the symbols **x̃**<sup>[*i*]</sup> is removed at the clean receivers. We denote this interference by **Ĩ**<sup>[*j*]</sup>, which is written as follows:

$$\tilde{\mathbf{I}}^{[j]} = \sum_{i \in \{1, \ldots, W\}, i \neq j} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]}, \quad j \in \{1, \ldots, W\}. \tag{14}$$

The MIMO C-IR can de-multiplex the streams corresponding to **x̃**<sup>[*i*]</sup> (this will be shown in Steps 3–5), which are contaminated only by additive noise, i.e., it separates them into the form $\hat{\tilde{\mathbf{x}}}^{[i]} = \tilde{\mathbf{x}}^{[i]} + \tilde{\mathbf{z}}^{[i]}$. Thus, for the interference cancellation, the MIMO C-IR designs its transmitted signal such that:

$$\sum_{u \in \{1, \dots, W\}} \mathbf{H}_{\mathrm{IR-R}}^{[ju]} \mathbf{X}_{\mathrm{IR}}^{[u]} = -\sum_{i \in \{1, \dots, W\}, i \neq j} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \hat{\tilde{\mathbf{x}}}^{[i]} = -\sum_{i \in \{1, \dots, W\}, i \neq j} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \big( \tilde{\mathbf{x}}^{[i]} + \tilde{\mathbf{z}}^{[i]} \big) = -\tilde{\mathbf{I}}^{[j]} + \tilde{\mathbf{Z}}^{[j]}, \tag{15}$$

where

$$\tilde{\mathbf{Z}}^{[j]} = -\sum_{i \in \{1, \dots, W\}, i \neq j} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{z}}^{[i]}.$$

The vector Equation (15) generates a linear set of equations, one for each clean receiver and frequency slot, with the elements of **X**<sup>[*u*]</sup><sub>IR</sub> as unknowns; for the *t*-th frequency slot, it can be written as:

$$\sum_{u \in \{1, \dots, W\}} H^{[ju]}_{\mathrm{IR-R}}(\omega_t)\, X^{[u]}_{\mathrm{IR}}(\omega_t) = -\tilde{I}^{[j]}(\omega_t) + \tilde{Z}^{[j]}(\omega_t), \quad \forall j \in \{1, \dots, W\}, \quad \forall t \in \{1, \dots, T\}, \tag{16}$$

which is a linear set of *W* equations in *W* variables for each *ω<sub>t</sub>*. This set of equations is almost surely solvable: the coefficients of the linear equations are drawn independently with continuous CDFs, so the determinant of the coefficient matrix is a non-zero polynomial in independent random variables and, by ([34], Lemma 1), is non-zero with probability 1. Applying (16), the interference cancellation is carried out. Thus, for each *ω<sub>t</sub>*, we have:

$$X_{\mathrm{IR}}^{[u]}(\omega_t) = \sum_{j \in \{1, \dots, W\}} H_{\mathrm{inv}}^{[ju]}(\omega_t)\big(-\tilde{I}^{[j]}(\omega_t) + \tilde{Z}^{[j]}(\omega_t)\big), \tag{17}$$

where *H*<sup>[*ju*]</sup><sub>inv</sub>(*ω<sub>t</sub>*), the coefficient of −*Ĩ*<sup>[*j*]</sup>(*ω<sub>t</sub>*) + *Z̃*<sup>[*j*]</sup>(*ω<sub>t</sub>*) in (17), is a function of *H*<sup>[*j*′*u*′]</sup><sub>IR−R</sub>(*ω<sub>t</sub>*), *u*′, *j*′ ∈ {1, ... , *W*}, obtained by solving Equation (16). We can write Equation (17) in vector form as follows:

$$\mathbf{X}_{\mathrm{IR}}^{[u]} = \sum_{j \in \{1, \dots, W\}} \mathbf{H}_{\mathrm{inv}}^{[ju]} \big(-\tilde{\mathbf{I}}^{[j]} + \tilde{\mathbf{Z}}^{[j]}\big) \tag{18}$$

$$= \sum_{j \in \{1, \dots, W\}} \sum_{i \in \{1, \dots, W\}, i \neq j} -\mathbf{H}^{[ju]}_{\mathrm{inv}} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]} + \sum_{j \in \{1, \dots, W\}} \mathbf{H}^{[ju]}_{\mathrm{inv}} \tilde{\mathbf{Z}}^{[j]}, \tag{19}$$

where **<sup>H</sup>**[*ju*] inv is a diagonal matrix as follows:

$$\mathbf{H}\_{\mathrm{inv}}^{[ju]} = \mathbf{diag}\left(H\_{\mathrm{inv}}^{[ju]}(\omega\_1), \dots, H\_{\mathrm{inv}}^{[ju]}(\omega\_T)\right).$$

We highlight two properties of **<sup>H</sup>**[*ju*] inv :

• Similar to **H**[*ji*] , diagonal elements *<sup>H</sup>*[*ju*] inv (*ωt*) are independent random variables for different *t* ∈ {1, ... , *T*} because the channel coefficients are independent random variables for each *t* ∈ {1, . . . , *T*}.

• Each diagonal element *H*<sup>[*ju*]</sup><sub>inv</sub>(*ω<sub>t</sub>*) is a fractional polynomial in the coefficients *H*<sup>[*j*′*u*′]</sup><sub>IR−R</sub>(*ω<sub>t</sub>*), *j*′, *u*′ ∈ {1, ... , *W*}. A fractional polynomial is the ratio of a polynomial *P*<sub>1</sub>(·) to a non-zero polynomial *P*<sub>2</sub>(·).
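The almost-sure solvability of (16) (continuous, independent coefficients imply an invertible coefficient matrix) can be sketched numerically. In this toy NumPy example, random placeholder gains stand in for the continuous-CDF channel coefficients of one frequency slot:

```python
import numpy as np

rng = np.random.default_rng(1)
W = 3  # clean receivers j = 1..W and IR transmitting antennas u = 1..W

# One frequency slot: H_IR_R[j, u] is the channel from IR antenna u to
# clean receiver j; b[j] stands for the target -I_tilde[j] + Z_tilde[j] in (16).
H_IR_R = rng.standard_normal((W, W))
b = rng.standard_normal(W)

# Continuous, independent coefficients => det != 0 almost surely,
# so the W equations in the W unknowns X_IR[u] have a unique solution.
X_IR = np.linalg.solve(H_IR_R, b)

# Check: the IR transmission reproduces the cancellation targets exactly.
print(np.allclose(H_IR_R @ X_IR, b))
```

The entries of the inverse map from `b` to `X_IR` are ratios of cofactors to the determinant, which is the "fractional polynomial" structure of *H*<sup>[*ju*]</sup><sub>inv</sub> noted above.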

Although we cancel the interference **Ĩ**<sup>[*j*]</sup> at the clean receivers, this interference remains at the dirty receivers with new equivalent channel coefficients. We now derive the new channel coefficients for **Ṽ**<sup>[*i*]</sup>**x̃**<sup>[*i*]</sup>, ∀*i* ∈ {1, ... , *W*}, at the dirty receivers *j* ∈ {*W* + 1, ... , *K*}. By combining (5), (12), and (13), we have:

$$\mathbf{Y}^{[j]} = \sum_{i \in \{1, \dots, K\}} \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]} + \sum_{i \in \{1, \dots, W\}} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]} + \sum_{u \in \{1, \dots, W\}} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{X}^{[u]}_{\mathrm{IR}} + \mathbf{Z}^{[j]} \tag{20}$$

$$= \sum_{i \in \{1, \dots, K\}} \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]} + \sum_{i \in \{1, \dots, W\}} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]} - \sum_{u, d, i \in \{1, \dots, W\}, i \neq d} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{H}^{[du]}_{\mathrm{inv}} \mathbf{H}^{[di]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]} + \bar{\mathbf{Z}}^{[j]}, \tag{21}$$

where (21) follows from (19) and:

$$\bar{\mathbf{Z}}^{[j]} = \sum_{u, d \in \{1, \dots, W\}} \mathbf{H}_{\mathrm{IR-R}}^{[ju]} \mathbf{H}_{\mathrm{inv}}^{[du]} \tilde{\mathbf{Z}}^{[d]} + \mathbf{Z}^{[j]}.$$

Equation (21) can be rewritten as:

$$\mathbf{Y}^{[j]} = \sum_{i \in \{1, \ldots, K\}} \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]} + \sum_{i \in \{1, \ldots, W\}} \tilde{\mathbf{H}}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]} + \bar{\mathbf{Z}}^{[j]}, \tag{22}$$

$$\tilde{\mathbf{H}}^{[ji]} = \mathbf{H}^{[ji]} - \sum_{u, d \in \{1, \dots, W\}, d \neq i} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{H}^{[du]}_{\mathrm{inv}} \mathbf{H}^{[di]}, \quad i \in \{1, \dots, W\}, \tag{23}$$

where **H̃**<sup>[*ji*]</sup> is the equivalent channel coefficient matrix from the transmitter *i* ∈ {1, ... , *W*} to the receiver *j* ∈ {*W* + 1, ... , *K*} (dirty receivers) for **Ṽ**<sup>[*i*]</sup>**x̃**<sup>[*i*]</sup>. By using (23), we can see that each diagonal element of **H̃**<sup>[*ji*]</sup> has the following form:


$$\tilde{H}^{[ji]}(\omega_t) = H^{[ji]}(\omega_t) + \sum_{u, i', j' \in \{1, \dots, W\}, i' \neq j'} H_{\mathrm{IR-R}}^{[ju]}(\omega_t)\, H^{[j'i']}(\omega_t)\, P^{[ui'j']}\big(\big\{H_{\mathrm{IR-R}}^{[me]}(\omega_t) : m, e \in \{1, \dots, W\}\big\}\big),$$

where *P*[*ui j* ] (S) indicates a fractional polynomial constructed from the variables *s* ∈ S.

### **Step 3: Interference Alignment**

In this step, we determine the interference alignment equations at the clean and dirty receivers and the MIMO C-IR receiving antennas. In our interference alignment scheme, we align the subspace of the interference of each user into a bigger subspace with an equal normalized asymptotic dimension. Note that for matrices **V** and **V**′, we can have the relations *d*(**V**) > *d*(**V**′) and *D<sub>N</sub>*(**V**) = *D<sub>N</sub>*(**V**′) simultaneously, e.g., *d*(**V**) = (*n* + 1)<sup>*l*</sup> > *d*(**V**′) = *n<sup>l</sup>* with *D<sub>N</sub>*(**V**) = *D<sub>N</sub>*(**V**′) = 1. We begin with the clean receivers.

*(1) Interference alignment at clean receivers:* Consider the clean receiver *j* ∈ {1, ... , *W*}; for each *i* ∈ {1, ... , *K*}, *i* ≠ *j*, we must have:

$$\mathrm{span}\left(\mathbf{H}^{[ji]}\bar{\mathbf{V}}^{[i]}\right) \subseteq \bar{\mathcal{A}}_j, \tag{24}$$

where Ā<sub>*j*</sub> is a subspace that encompasses all interference at the *j*-th receiver induced by **x̄**<sup>[*i*]</sup>, *i* ∈ {1, ... , *K*}, *i* ≠ *j*, for which we have:

$$\max_{i \in \{1, \ldots, K\}, i \neq j} D_N\left(\mathrm{span}\left(\mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]}\right)\right) = D_N(\bar{\mathcal{A}}_j), \tag{25}$$

which implies that the normalized asymptotic dimension of Ā<sub>*j*</sub> is equal to the maximum normalized asymptotic dimension of span(**H**<sup>[*ji*]</sup>**V̄**<sup>[*i*]</sup>) over all *i* ≠ *j*. Moreover, we define the message subspaces as:

$$\bar{\mathcal{C}}_j = \mathrm{span}\left(\mathbf{H}^{[jj]}\bar{\mathbf{V}}^{[j]}\right),$$

$$\tilde{\mathcal{C}}_j = \mathrm{span}\left(\mathbf{H}^{[jj]}\tilde{\mathbf{V}}^{[j]}\right),$$

and we require <sup>C</sup>¯ *j*, C˜ *<sup>j</sup>* and <sup>A</sup>¯ *<sup>j</sup>* to be full-rank and linearly independent; thus, we can ensure the decodability of the message streams **x**˜[*j*] and **x**¯[*j*] by using zero forcing at the *j*-th receiver.

*(2) Interference alignment at dirty receivers:* Consider the dirty receiver *j* ∈ {*W* + 1, ... , *K*}. Here, we have two interference subspaces at each receiver *j*: the interference induced by **x̄**<sup>[*i*]</sup> aligns in the subspace Ā<sub>*j*</sub>, while the interference induced by **x̃**<sup>[*i*]</sup> aligns in the subspace Ã<sub>*j*</sub>. For each *i* ∈ {1, ... , *K*}, *i* ≠ *j*, we must have:

$$\mathrm{span}\left(\mathbf{H}^{[ji]}\bar{\mathbf{V}}^{[i]}\right) \subseteq \bar{\mathcal{A}}_j, \tag{26}$$

where <sup>A</sup>¯ *<sup>j</sup>* is considered a subspace for which we have:

$$\max_{i \in \{1, \ldots, K\}, i \neq j} D_N\left(\mathrm{span}\left(\mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]}\right)\right) = D_N(\bar{\mathcal{A}}_j), \tag{27}$$

and for every *i* ∈ {1, . . . , *W*}, we must have:

$$\mathrm{span}\left(\tilde{\mathbf{H}}^{[ji]}\tilde{\mathbf{V}}^{[i]}\right) \subseteq \tilde{\mathcal{A}}_j, \tag{28}$$

where <sup>A</sup>˜ *<sup>j</sup>* is considered a subspace for which we have:

$$\max_{i \in \{1, \ldots, W\}} D_N\left(\mathrm{span}\left(\tilde{\mathbf{H}}^{[ji]} \tilde{\mathbf{V}}^{[i]}\right)\right) = D_N(\tilde{\mathcal{A}}_j). \tag{29}$$

Moreover, we define the message subspace as:

$$\bar{\mathcal{C}}_j = \mathrm{span}\left(\mathbf{H}^{[jj]}\bar{\mathbf{V}}^{[j]}\right),$$

and we want <sup>C</sup>¯ *<sup>j</sup>*, <sup>A</sup>˜ *<sup>j</sup>* and <sup>A</sup>¯ *<sup>j</sup>* to be full-rank and linearly independent; hence, we can ensure the decodability of the message stream **x**¯[*j*] by using zero forcing in the *j*-th receiver.

*(3) Interference alignment at the MIMO C-IR q-th receiving antenna:* We assume that *W* = *QZ* + *P*, 0 ≤ *P* < *Q*; we divide the transmitters *i* ∈ {1, ... , *W*} into *Q* distinct sets, where the first *P* sets include *Z* + 1 transmitters and the other *Q* − *P* sets include *Z* transmitters. We name these sets B<sub>*q*</sub>, *q* ∈ {1, ... , *Q*}. We design our interference alignment scheme such that the symbol streams **x̃**<sup>[*i*]</sup>, *i* ∈ B<sub>*q*</sub>, can be de-multiplexed at the *q*-th receiving antenna of the MIMO C-IR. To this end, all the interference induced by the symbol streams **x̄**<sup>[*i*]</sup>, *i* ∈ {1, ... , *K*}, must align into a limited subspace at each receiving antenna of the MIMO C-IR. Thus, at each receiving antenna *q* ∈ {1, ... , *Q*} and for each *i* ∈ {1, ... , *K*}, we must have:

$$\mathrm{span}\left(\mathbf{H}_{\mathrm{T-IR}}^{[qi]}\bar{\mathbf{V}}^{[i]}\right) \subseteq \bar{\mathcal{A}}_{r_q}, \tag{30}$$

where <sup>A</sup>¯*rq* is considered a subspace for which we have:

$$\max_{i \in \{1, \ldots, K\}} D_N\left(\mathrm{span}\left(\mathbf{H}_{\mathrm{T-IR}}^{[qi]} \bar{\mathbf{V}}^{[i]}\right)\right) = D_N(\bar{\mathcal{A}}_{r_q}). \tag{31}$$

In addition, at the *q*-th receiving antenna of the MIMO C-IR, the interference induced by the symbol streams **x̃**<sup>[*i*]</sup>, *i* ∈ {1, ... , *W*}, *i* ∉ B<sub>*q*</sub>, must align into a subspace named Ã<sub>*r<sub>q</sub>*</sub>. Hence, for each *i* ∈ {1, ... , *W*}, *i* ∉ B<sub>*q*</sub>, we must have:

$$\mathrm{span}\left(\mathbf{H}_{\mathrm{T-IR}}^{[qi]}\tilde{\mathbf{V}}^{[i]}\right) \subseteq \tilde{\mathcal{A}}_{r_q}, \tag{32}$$

where <sup>A</sup>˜*rq* is considered a subspace for which we have:

$$\max_{i \in \{1, \ldots, W\}, i \notin \mathcal{B}_q} D_N\left(\mathrm{span}\left(\mathbf{H}_{\mathrm{T-IR}}^{[qi]} \tilde{\mathbf{V}}^{[i]}\right)\right) = D_N(\tilde{\mathcal{A}}_{r_q}). \tag{33}$$

Furthermore, we define <sup>C</sup>˜ *<sup>i</sup>*,*rq* , *i* ∈ B*<sup>q</sup>* as the message subspaces, which can be demultiplexed at the *q*-th MIMO C-IR receiving antenna as follows:

$$\tilde{\mathcal{C}}_{i,r_q} = \mathrm{span}\left(\mathbf{H}^{[qi]}_{\mathrm{T-IR}}\tilde{\mathbf{V}}^{[i]}\right), \quad i \in \mathcal{B}_q.$$

We want C̃<sub>*i*,*r<sub>q</sub>*</sub>, ∀*i* ∈ B<sub>*q*</sub>, Ā<sub>*r<sub>q</sub>*</sub>, and Ã<sub>*r<sub>q</sub>*</sub> to be full-rank and linearly independent; thus, we can make sure that the message streams **x̃**<sup>[*i*]</sup>, *i* ∈ B<sub>*q*</sub>, can be de-multiplexed at the *q*-th MIMO C-IR receiving antenna by using zero forcing. Note that the *q*-th receiving antenna of the MIMO C-IR de-multiplexes the message streams **x̃**<sup>[*i*]</sup>, *i* ∈ B<sub>*q*</sub>, without coordinating with the other receiving antennas. After each antenna de-multiplexes its own message streams, all of these streams are passed to the MIMO C-IR transmitting antennas, so that the transmitting antennas can coordinate with each other for the interference cancellation at the clean receivers (as in Equation (19)). A simple illustration of the interference alignment scheme is shown in Figure 2 for *K* = 3 and *W* = 2. In Steps 4 and 5, we prove the existence of beamforming vectors and message and interference subspaces that satisfy the interference alignment Equations (24)–(33) for the clean and dirty receivers and the MIMO C-IR. In Step 6, we analyze the DoF achieved by these beamforming vector designs.

**Figure 2.** Interference alignment scheme for 3-user interference channel in the presence of MIMO C-IR with 2 receiving antennas. Subspaces corresponding to symbol streams in common dashed boxes align into a joint subspace at each node. We can see that the interference of the message streams **x**˜[1] and **x**˜[2] is canceled at clean receivers.

### **Step 4: Beamforming Matrix Design**

In this step, we design beamforming matrices such that the alignment Equations (24)–(33) are satisfied and all users' message streams are decodable.

*(1) Beamforming matrix design for i* ∈ {1, ... , *W*}: To introduce the beamforming matrix design, we first define some notation. We define F(A, B) as the set of all functions *g*(*x*) : A → B, i.e.,

$$\mathcal{F}(\mathcal{A}, \mathcal{B}) = \{ \mathcal{g}(\mathbf{x}) | \mathcal{g}(\mathbf{x}) : \mathcal{A} \to \mathcal{B} \}. \tag{34}$$

It is obvious that |F(A, B)| = |B|<sup>|A|</sup>. Moreover, we define the matrix **M**(*g*(*x*), **N**<sup>[*x*]</sup>, A) as follows:

$$\mathbf{M}(g(x), \mathbf{N}^{[x]}, \mathcal{A}) = \prod_{x \in \mathcal{A}} \left(\mathbf{N}^{[x]}\right)^{g(x)}. \tag{35}$$
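The cardinality |F(A, B)| = |B|<sup>|A|</sup> can be checked by brute-force enumeration (a toy illustration with small hypothetical sets; each function is represented as a dictionary mapping domain to codomain):

```python
from itertools import product

A = (1, 2, 3)          # domain
B = ('a', 'b')         # codomain

# Each function g: A -> B picks one element of B for every x in A,
# so the number of functions is |B| ** |A|.
functions = [dict(zip(A, values)) for values in product(B, repeat=len(A))]
print(len(functions), len(B) ** len(A))  # 8 8
```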

Then, consider the vector **w** = [1 1 ··· 1]<sup>*H*</sup>. We design the beamforming matrices **V̄**<sup>[*i*]</sup> and **Ṽ**<sup>[*i*]</sup> as follows:

$$\bar{\mathbf{V}}^{[i]} = \left\{ \mathbf{M}(g_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}_1)\, \mathbf{M}(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \bar{\mathcal{S}}_2)\, \mathbf{w} : g_1 \in \mathcal{F}(\bar{\mathcal{S}}_1, \{1, \dots, n\}),\; g_2 \in \mathcal{F}(\bar{\mathcal{S}}_2, \{1, \dots, sn\}) \right\}, \tag{36}$$

where

$$\bar{\mathcal{S}}_1 = \{(i, j) \,|\, i, j \in \{1, \dots, K\}, i \neq j\}, \tag{37}$$

$$\bar{\mathcal{S}}_2 = \{(i, q) \,|\, i \in \{1, \dots, K\}, q \in \{1, \dots, Q\}\}, \tag{38}$$

where *n* ∈ N is an auxiliary variable that can go to infinity, and *s* is a parameter for controlling the dimension of **V̄**<sup>[*i*]</sup>, i.e., *d*(**V̄**<sup>[*i*]</sup>). This notation means that the right-hand side of (36) is the set of column vectors that forms the beamforming matrix **V̄**<sup>[*i*]</sup>. For **Ṽ**<sup>[*i*]</sup>, we have:

$$\tilde{\mathbf{V}}^{[i]} = \left\{ \mathbf{M}(g_1(i,j), \tilde{\mathbf{H}}^{[ji]}, \bar{\mathcal{S}}_1)\, \mathbf{M}(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \tilde{\mathcal{S}}_2)\, \mathbf{M}(g_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}_3)\, \mathbf{w} : \right.$$

$$\left. g_1 \in \mathcal{F}(\bar{\mathcal{S}}_1, \{1, \dots, n\}),\; g_2 \in \mathcal{F}(\tilde{\mathcal{S}}_2, \{1, \dots, sn\}),\; g_3 \in \mathcal{F}(\tilde{\mathcal{S}}_3, \{1, \dots, \upsilon n\}) \right\}, \tag{39}$$

where S̄<sub>1</sub> is given in (37), and we have:

$$\tilde{\mathcal{S}}_2 = \{(i, q) \,|\, i \in \{1, \dots, K\}, i \notin \mathcal{B}_q, q \in \{1, \dots, Q\}\}, \tag{40}$$

$$\tilde{\mathcal{S}}_3 = \{(i, q) \,|\, i \in \mathcal{B}_q, q \in \{1, \dots, Q\}\}, \tag{41}$$

and the **T**<sup>[*qi*]</sup> are *T* × *T* diagonal random matrices for each *i* and *q*, where each diagonal element of each matrix is drawn independently with a continuous CDF.
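The constructions (36)–(41) build each beamforming column as a product of powers of diagonal matrices applied to **w**, one column per choice of exponent function *g*. The following toy NumPy sketch (two placeholder diagonal matrices standing in for the indexed channel families, exponents in {1, ... , *n*}) illustrates the mechanics:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
T, n = 6, 2
w = np.ones(T)  # the all-ones vector w

# Toy index set S with two diagonal channel matrices; the actual design
# ranges over the larger index sets S1_bar, S2_bar of (37)-(38).
S = ['H_a', 'H_b']
H = {x: np.diag(rng.standard_normal(T)) for x in S}

def M(g, H, S):
    """M(g(x), N^[x], A) = prod over x in A of (N^[x])^(g(x)), Eq. (35)."""
    out = np.eye(T)
    for x in S:
        out = out @ np.linalg.matrix_power(H[x], g[x])
    return out

# One beamforming column per exponent function g: S -> {1, ..., n};
# there are n ** |S| such functions, hence n ** |S| columns.
columns = [M(dict(zip(S, e)), H, S) @ w
           for e in product(range(1, n + 1), repeat=len(S))]
V = np.column_stack(columns)
print(V.shape)  # (T, n ** |S|) = (6, 4)
```

Because the diagonal matrices commute, multiplying a column by one more channel matrix only increments an exponent, which is what lets the designed subspaces absorb further channel multiplications during alignment.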

*(2) Beamforming matrix design for i* ∈ {*W* + 1, ... , *K*}: We consider the beamforming matrix **V**¯ [*i*] as the following:

$$\bar{\mathbf{V}}^{[i]} = \left\{ \mathbf{M}(g_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}_1)\, \mathbf{M}(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \bar{\mathcal{S}}_2)\, \mathbf{w} : g_1 \in \mathcal{F}(\bar{\mathcal{S}}_1, \{1, \dots, n\}),\; g_2 \in \mathcal{F}(\bar{\mathcal{S}}_2, \{1, \dots, tn\}) \right\}, \tag{42}$$

where S̄<sub>1</sub> and S̄<sub>2</sub> are given in (37) and (38), respectively, and *t* is a parameter for controlling the dimension of **V̄**<sup>[*i*]</sup>, i.e., *d*(**V̄**<sup>[*i*]</sup>).

We note that each of the parameters *s*, *υ*, and *t* can be approximated by rational numbers with arbitrarily small error; by choosing a sufficiently large *n*, the products *sn*, *υn*, and *tn* become integers, and our proposed scheme is realizable.
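For instance (an illustrative computation, not a value from the paper), with a rational parameter *s* = 3/2 it suffices to choose *n* as a multiple of the denominator to make *sn* an integer:

```python
from fractions import Fraction

s = Fraction(3, 2)   # rational approximation of a design parameter
n = 4                # any multiple of the denominator of s works
sn = s * n
print(sn, sn.denominator == 1)  # 6 True: sn is an integer column count
```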

**Step 5: Validity of Interference Alignment Conditions and Decodability of Message Symbols**

Now, we analyze the spaces of messages and interference.

*(1) Validity of interference alignment conditions at the clean receivers j* ∈ {1, ... , *W*}*:* For the clean receivers *j* ∈ {1, . . . , *W*}, we have the following lemma:

**Lemma 1.** *For the clean receivers j* ∈ {1, ... , *W*}*, consider* C̄<sub>*j*</sub> *as the message subspace corresponding to the symbol stream* **x̄**<sup>[*j*]</sup>*, consider* C̃<sub>*j*</sub> *as the message subspace corresponding to the symbol stream* **x̃**<sup>[*j*]</sup>*, and consider* Ā<sub>*j*</sub> *as the interference subspace induced by the symbol streams* **x̄**<sup>[*j*′]</sup>*, j*′ ≠ *j. Then,* C̄<sub>*j*</sub>*,* C̃<sub>*j*</sub>*, and* Ā<sub>*j*</sub> *are full-rank and linearly independent, i.e., all basis vectors of these subspaces are linearly independent. Thus, the message streams* **x̄**<sup>[*j*]</sup> *and* **x̃**<sup>[*j*]</sup> *are decodable by using zero forcing. In addition, we have:*

$$D\_N(\vec{\mathcal{C}\_j}) = \Gamma,\tag{43}$$

$$D\_N(\mathcal{C}\_j) = \chi\_\prime \tag{44}$$

$$D\_N(\vec{\mathcal{A}}\_{\dot{\jmath}}) = \max\{\Gamma, \zeta\}\_{\prime} \tag{45}$$

*where*

$$\Gamma = s^{QK}, \quad \chi = s^{QK-W} \upsilon^{W}, \quad \zeta = t^{QK}.$$

**Proof.** The proof is provided in Appendix B.

*(2) Validity of interference alignment conditions at the dirty receivers j* ∈ {*W* + 1, ... , *K*}*:* For the dirty receivers *j* ∈ {*W* + 1, . . . , *K*}, we have the following lemma:

**Lemma 2.** *For the dirty receivers $j \in \{W+1, \dots, K\}$, consider $\bar{\mathcal{C}}_j$ as the message subspace corresponding to the symbol stream $\bar{\mathbf{x}}^{[j]}$, consider $\tilde{\mathcal{A}}_j$ as the interference subspace corresponding to the symbol streams $\tilde{\mathbf{x}}^{[j']}$, $j' \neq j$, and consider $\bar{\mathcal{A}}_j$ as the interference subspace induced by the symbol streams $\bar{\mathbf{x}}^{[j']}$, $j' \neq j$. Then, $\bar{\mathcal{C}}_j$, $\tilde{\mathcal{A}}_j$, and $\bar{\mathcal{A}}_j$ are full-rank and linearly independent, i.e., all base vectors of these subspaces are linearly independent. Thus, the message stream $\bar{\mathbf{x}}^{[j]}$ is decodable by using zero forcing. In addition, we have:*

$$D_N(\bar{\mathcal{C}}_j) = \zeta, \tag{46}$$

$$D_N(\bar{\mathcal{A}}_j) = \max\{\Gamma, \zeta\}, \tag{47}$$

$$D_N(\tilde{\mathcal{A}}_j) = \chi. \tag{48}$$

**Proof.** The proof is provided in Appendix C.

*(3) Validity of interference alignment conditions at the q-th receiving antenna of the MIMO C-IR, q* ∈ {1, ... , *Q*}*:* For the *q*-th receiving antenna of the MIMO C-IR, *q* ∈ {1, ... , *Q*}, we have the following lemma:

**Lemma 3.** *For the q-th receiving antenna of the MIMO C-IR, $q \in \{1, \dots, Q\}$, consider $\tilde{\mathcal{C}}_{i,r_q}$ as the message subspace corresponding to the symbol streams $\tilde{\mathbf{x}}^{[i]}$, $i \in \mathcal{B}_q$, consider $\tilde{\mathcal{A}}_{r_q}$ as the interference subspace corresponding to the symbol streams $\tilde{\mathbf{x}}^{[j]}$, $j \notin \mathcal{B}_q$, and consider $\bar{\mathcal{A}}_{r_q}$ as the interference subspace induced by the symbol streams $\bar{\mathbf{x}}^{[j]}$, $\forall j$. Then, $\tilde{\mathcal{C}}_{i,r_q}$, $i \in \mathcal{B}_q$, $\bar{\mathcal{A}}_{r_q}$, and $\tilde{\mathcal{A}}_{r_q}$ are full-rank and linearly independent, i.e., all base vectors of these subspaces are linearly independent. Thus, the message streams $\tilde{\mathbf{x}}^{[i]}$, $i \in \mathcal{B}_q$, can be de-multiplexed by using zero forcing. In addition, we have:*

$$D_N(\tilde{\mathcal{C}}_{i,r_q}) = \chi, \tag{49}$$

$$\sum_{i \in \mathcal{B}_q} D_N(\tilde{\mathcal{C}}_{i,r_q}) = |\mathcal{B}_q| \chi, \tag{50}$$

$$D_N(\bar{\mathcal{A}}_{r_q}) = \max\{\Gamma, \zeta\}, \tag{51}$$

$$D_N(\tilde{\mathcal{A}}_{r_q}) = \chi. \tag{52}$$

**Proof.** The proof is provided in Appendix D.

Now, we can calculate the dimension of the whole signal space at each receiver. We define $d_{t,j}$ as the total dimension at the *j*-th receiver and $d_{t,r_q}$ as the total dimension at the *q*-th receiving antenna of the MIMO C-IR; thus, we have:

$$d_{t,j} = d(\bar{\mathcal{C}}_j) + d(\tilde{\mathcal{C}}_j) + d(\bar{\mathcal{A}}_j), \ \forall j \in \{1, \dots, W\}, \tag{53}$$

$$d_{t,j} = d(\bar{\mathcal{C}}_j) + d(\bar{\mathcal{A}}_j) + d(\tilde{\mathcal{A}}_j), \ \forall j \in \{W+1, \dots, K\}, \tag{54}$$

$$d_{t,r_q} = \sum_{i \in \mathcal{B}_q} d(\tilde{\mathcal{C}}_{i,r_q}) + d(\bar{\mathcal{A}}_{r_q}) + d(\tilde{\mathcal{A}}_{r_q}), \ \forall q \in \{1, \dots, Q\}, \tag{55}$$

where the dimensions of the message and interference subspaces are derived in (A8)–(A10), (A20)–(A22), and (A26)–(A28) in Appendices B–D. Similarly, define $D_{N,t,j}$ as the total normalized asymptotic dimension at the *j*-th receiver and $D_{N,t,r_q}$ as the total normalized asymptotic dimension at the *q*-th receiving antenna of the MIMO C-IR; thus, from (43)–(52), we have:

$$D_{N,t,j} = D_N(\bar{\mathcal{C}}_j) + D_N(\tilde{\mathcal{C}}_j) + D_N(\bar{\mathcal{A}}_j) = \Gamma + \chi + \max\{\Gamma, \zeta\}, \ \forall j \in \{1, \dots, W\}, \tag{56}$$

$$D_{N,t,j} = D_N(\bar{\mathcal{C}}_j) + D_N(\bar{\mathcal{A}}_j) + D_N(\tilde{\mathcal{A}}_j) = \zeta + \chi + \max\{\Gamma, \zeta\}, \ \forall j \in \{W+1, \dots, K\}, \tag{57}$$

$$D_{N,t,r_q} = \sum_{i \in \mathcal{B}_q} D_N(\tilde{\mathcal{C}}_{i,r_q}) + D_N(\bar{\mathcal{A}}_{r_q}) + D_N(\tilde{\mathcal{A}}_{r_q}) = |\mathcal{B}_q| \chi + \chi + \max\{\Gamma, \zeta\}, \ \forall q \in \{1, \dots, Q\}. \tag{58}$$

Now, we determine the minimum value for the parameter *T* (for which the interference alignment equations are satisfied) as follows:

$$T = \max\left\{ \max_{j \in \{1, \dots, K\}} \{d_{t,j}\},\ \max_{q \in \{1, \dots, Q\}} \{d_{t,r_q}\} \right\}, \tag{59}$$

and from (53)–(59), we have

$$\lim_{n \to \infty} \frac{T}{n^{K^2 - K + QK}} = \chi + \max\{\Gamma, \zeta\} + \max\left\{ \max_{q \in \{1, \dots, Q\}} |\mathcal{B}_q| \chi,\ \zeta,\ \Gamma \right\}. \tag{60}$$

However, we have:

$$\max\_{q \in \{1, \dots, Q\}} |\mathcal{B}\_q| = \left\lceil \frac{W}{Q} \right\rceil,$$

so we conclude that:

$$\lim\_{n \to \infty} \frac{T}{n^{K^2 - K + QK}} = \chi + \max\{\Gamma, \zeta\} + \max\left\{ \left\lceil \frac{W}{Q} \right\rceil \chi, \zeta, \Gamma \right\}.\tag{61}$$

Until now, we have allowed arbitrary real values for the parameters Γ, *χ*, and *ζ*. Now, we make two additional assumptions on these parameters, which yield an achievable DoF. First, we set the normalized asymptotic dimension of the space at the clean receivers equal to that at the dirty receivers. Hence:

$$\Gamma = \zeta. \tag{62}$$

Second, we set the maximum normalized asymptotic dimension of the space at each MIMO C-IR receiving antenna to be less than or equal to that of the dirty receivers. Therefore, we have:

$$\Gamma \ge \left\lceil \frac{W}{Q} \right\rceil \chi. \tag{63}$$

With (62) and (63), (61) takes the following form:

$$\lim_{n \to \infty} \frac{T}{n^{K^2 - K + QK}} = \chi + 2\Gamma. \tag{64}$$
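The reduction from (61) to (64) under the assumptions (62) and (63) can be sanity-checked numerically; the following is a small sketch (variable names ours):

```python
import math
import random

random.seed(0)
for _ in range(1000):
    K = random.randint(2, 8)
    Q = random.randint(1, 8)
    W = random.randint(1, K)
    chi = random.random()
    Gamma = math.ceil(W / Q) * chi + random.random()  # enforce (63): Gamma >= ceil(W/Q)*chi
    zeta = Gamma                                      # enforce (62): Gamma = zeta
    # Right-hand side of Eq. (61):
    rhs_61 = chi + max(Gamma, zeta) + max(math.ceil(W / Q) * chi, zeta, Gamma)
    assert abs(rhs_61 - (chi + 2 * Gamma)) < 1e-12    # Eq. (64)
print("under (62) and (63), Eq. (61) equals chi + 2*Gamma")
```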

### **Step 6: DoF Analysis**

Now, we characterize the total DoF. As stated before, we have *W* clean receivers, each with a normalized message dimension equal to Γ + *χ*, and *K* − *W* dirty receivers, each with a normalized message dimension equal to *ζ* (note that we set *ζ* = Γ). The total normalized transmission length is equal to *χ* + 2Γ, so the total DoF takes the following form:

$$\mathrm{DoF} = \max_{\chi \ge 0,\ \Gamma \ge \lceil W/Q \rceil \chi} \frac{W(\chi + \Gamma) + (K - W)\Gamma}{\chi + 2\Gamma}, \tag{65}$$

and by assuming Γ = *βχ*, we have:

$$\mathrm{DoF} = \max_{\beta \ge \lceil W/Q \rceil} \frac{W(1+\beta) + (K-W)\beta}{1 + 2\beta} \tag{66}$$

$$= \frac{K}{2} + \max_{\beta \ge \lceil W/Q \rceil} K \frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\beta} = \frac{K}{2} + \max\left\{ K \frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\left\lceil \frac{W}{Q} \right\rceil},\ 0 \right\}. \tag{67}$$

We remark that if $\frac{W}{K} > \frac{1}{2}$, we set $\beta = \lceil W/Q \rceil$, and if $\frac{W}{K} < \frac{1}{2}$, we let $\beta \to \infty$. This completes the proof of the achievability of the first term of (11). The proof of the second term, i.e., min{*Q*, *W*}, is provided in Appendix A.
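The optimization over β in (66)–(67) can be verified numerically: for $\frac{W}{K} > \frac{1}{2}$ the objective is decreasing in β, so the smallest feasible $\beta = \lceil W/Q \rceil$ is optimal. A sketch with illustrative values (function name ours):

```python
import math

def dof_of_beta(K: int, W: int, Q: int, beta: float) -> float:
    """Objective of Eq. (66) with Gamma = beta * chi."""
    return (W * (1 + beta) + (K - W) * beta) / (1 + 2 * beta)

K, W, Q = 6, 5, 3                       # illustrative values with W/K > 1/2
beta_min = math.ceil(W / Q)             # smallest feasible beta from (63)

# Sample the objective on a grid of feasible beta values.
best = max(dof_of_beta(K, W, Q, beta_min + 0.1 * k) for k in range(1000))
closed_form = K / 2 + K * (W / K - 0.5) / (1 + 2 * beta_min)  # Eq. (67)

print(abs(best - dof_of_beta(K, W, Q, beta_min)) < 1e-12)  # True: beta = ceil(W/Q) is optimal
print(round(closed_form, 4))  # 3.4
```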

**Remark 2.** *It is known that the DoF is an appropriate performance metric that provides a capacity approximation accurate within o*(log(*ρ*)) *[1]. Therefore, Theorem 1 indicates that the approximate sum capacity of the K-user interference channel in the presence of a MIMO C-IR is lower bounded by* $\left( \max\left\{ \frac{K}{2} + \max\left\{0,\ K \frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\left\lceil \frac{W}{Q} \right\rceil}\right\},\ \min\{Q, W\} \right\} - \epsilon \right) \log(1+\rho) + o(\log(\rho)),\ \forall \epsilon > 0$*.*

Next, we prove an improved achievable DoF for a special case of *W* and *Q*.

**Theorem 2.** *Assume W* = *QZ* + *P*, *P* = 1*. Then, the achievable DoF (11) can be improved as follows:*

$$\mathrm{DoF} = \max\left\{ \frac{K}{2} + \max\left\{0,\ K \frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\left\lfloor \frac{W}{Q} \right\rfloor}\right\},\ \min\{Q, W\} \right\}. \tag{68}$$

**Proof.** The proof is provided in Appendix E.

**Remark 3.** *Theorem 2 shows that the approximate sum capacity of the K-user interference channel with a MIMO C-IR is lower bounded by* $\left( \max\left\{ \frac{K}{2} + \max\left\{0,\ K \frac{\frac{W}{K} - \frac{1}{2}}{1 + 2\left\lfloor \frac{W}{Q} \right\rfloor}\right\},\ \min\{Q, W\} \right\} - \epsilon \right) \log(1+\rho) + o(\log(\rho)),\ \forall \epsilon > 0$*, where P* = 1 *(we have W* = *QZ* + *P,* 0 ≤ *P* < *Q). From (11) and (68), we note that this lower bound is tighter than the previous bound.*

**Remark 4.** *As expected, if we set Q* = *W* = *K, the maximum K DoF, which is the DoF in the absence of interference, is achievable with the MIMO C-IR.*

**Remark 5.** *It was shown in [4] that an ordinary relay cannot increase the DoF of a K-user interference channel. The main difference here is that the instantaneity of the relay can significantly improve the DoF.*

**Remark 6.** *Although we derived the achievable DoF for the asymptotic case, the achievability results are also valid for finite values of the auxiliary variable n, which determines the dimensions of the beamforming vectors (see Equations (36)–(42)). Thus, if all interference alignment conditions (24)–(33) are satisfied and T is sufficiently large (as in Equation (59), i.e., larger than the sum of the dimensions of the interference and message subspaces), then for each receiver $j \in \{1, \dots, K\}$ there exists a matrix $\mathbf{E}_j$ such that multiplying the vector of received signals in all frequency slots ($\mathbf{Y}^{[j]}$) by $\mathbf{E}_j$ separates the transmitted streams at each receiver, up to additive noise. Then, for the clean receivers $j \in \{1, \dots, W\}$, we have:*

$$\mathbf{E}_j \mathbf{Y}^{[j]} = \begin{bmatrix} \bar{\mathbf{x}}^{[j]} \\ \tilde{\mathbf{x}}^{[j]} \end{bmatrix} + \hat{\mathbf{n}}^{[j]}, \tag{69}$$

*where* **n**ˆ[*j*] *is additive Gaussian noise, which is not necessarily white. Moreover, for the dirty receivers j* ∈ {*W* + 1, . . . , *K*}*, we have:*

$$\mathbf{E}_j \mathbf{Y}^{[j]} = \bar{\mathbf{x}}^{[j]} + \hat{\mathbf{n}}^{[j]}. \tag{70}$$

*Thus, the proposed achievability scheme can be used for resource allocation problems, such as sum-rate optimization problems. This kind of utilization of interference alignment coding schemes for optimization problems was used in [35]. However, finding the optimal input distributions for the symbol streams $\bar{\mathbf{x}}^{[i]}$ and $\tilde{\mathbf{x}}^{[i]}$ and the optimal values of the other parameters (t, s, and υ) in order to compare the performance of the proposed scheme with that of other signaling strategies (e.g., [36,37]) from the rate region perspective remains a complicated problem that requires complex optimization algorithms, which is a direction for future research.*

Next, we introduce an upper bound for the sum DoF of the frequency-selective *K*-user interference channel assisted by the MIMO C-IR.

**Theorem 3.** *Considering the functions f* [*u*,*ωt*] *to be linear in (3), the sum DoF of the frequency-selective K-user interference channel assisted by the MIMO C-IR can be upper-bounded as follows:*

$$\sum\_{i=1}^{K} d\_i \le \min\left\{\frac{K}{2} + \frac{WQ}{2(K-1)}, K\right\}.\tag{71}$$

**Proof.** By using (5)–(7), we have:

$$\mathbf{Y}^{[j]} = \sum_{i=1}^{K} \mathbf{H}^{[ji]} \mathbf{X}^{[i]} + \sum_{u=1}^{W} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \sum_{q=1}^{Q} \mathbf{A}^{[uq]} \left( \sum_{i=1}^{K} \mathbf{H}^{[qi]}_{\mathrm{T-IR}} \mathbf{X}^{[i]} + \mathbf{Z}^{[q]}_{\mathrm{IR}} \right) + \mathbf{Z}^{[j]}$$

$$= \sum_{i=1}^{K} \left( \mathbf{H}^{[ji]} + \sum_{u=1}^{W} \sum_{q=1}^{Q} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{A}^{[uq]} \mathbf{H}^{[qi]}_{\mathrm{T-IR}} \right) \mathbf{X}^{[i]} + \hat{\mathbf{Z}}^{[j]} = \sum_{i=1}^{K} \hat{\mathbf{H}}^{[ji]} \mathbf{X}^{[i]} + \hat{\mathbf{Z}}^{[j]}, \tag{72}$$

where

$$\hat{\mathbf{H}}^{[ji]} = \mathbf{H}^{[ji]} + \sum_{u=1}^{W} \sum_{q=1}^{Q} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{A}^{[uq]} \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \tag{73}$$

$$\hat{\mathbf{Z}}^{[j]} = \sum_{u=1}^{W} \sum_{q=1}^{Q} \mathbf{H}^{[ju]}_{\mathrm{IR-R}} \mathbf{A}^{[uq]} \mathbf{Z}^{[q]}_{\mathrm{IR}} + \mathbf{Z}^{[j]}. \tag{74}$$
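The effective channel (73) is straightforward to compute numerically. The sketch below (illustrative sizes and names ours) builds $\hat{\mathbf{H}}^{[ji]}$ from random channel and IR matrices and checks that generic $\mathbf{A}^{[uq]}$ keep the direct links full rank, as required in the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
K, W, Q, T = 3, 2, 2, 4                      # illustrative sizes; T frequency slots

H      = rng.standard_normal((K, K, T, T))   # H[j, i]: direct link, transmitter i -> receiver j
H_ir_r = rng.standard_normal((K, W, T, T))   # IR transmit antenna u -> receiver j
H_t_ir = rng.standard_normal((Q, K, T, T))   # transmitter i -> IR receive antenna q
A      = rng.standard_normal((W, Q, T, T))   # linear IR operation A[u, q]

def H_hat(j: int, i: int) -> np.ndarray:
    """Effective channel of Eq. (73): direct link plus all paths through the IR."""
    out = H[j, i].copy()
    for u in range(W):
        for q in range(Q):
            out += H_ir_r[j, u] @ A[u, q] @ H_t_ir[q, i]
    return out

# For generic (here random) A, the direct links stay full rank (rank T) almost surely.
print(all(np.linalg.matrix_rank(H_hat(i, i)) == T for i in range(K)))  # True
```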

Now, consider given *i*, *j* ∈ {1, ... , *K*}, *i* ≠ *j*. The matrices $\mathbf{A}^{[uq]}$ must be chosen such that $\mathrm{rank}(\hat{\mathbf{H}}^{[ii]}) = T$, ∀*i*; otherwise, the messages of each transmitter cannot be transmitted completely, and the resulting upper bound for the sum DoF would decrease. For clarity of the proof, we eliminate the messages $w^{[k]}$, $k \neq i, j$; this can only increase the rates $r_i$ and $r_j$ because of the data processing inequality [38] (Theorem 2.8.1). Hence, we have:

$$\mathbf{Y}^{[i]} = \hat{\mathbf{H}}^{[ii]} \mathbf{X}^{[i]} + \hat{\mathbf{H}}^{[ij]} \mathbf{X}^{[j]} + \hat{\mathbf{Z}}^{[i]}, \tag{75}$$

$$\mathbf{Y}^{[j]} = \hat{\mathbf{H}}^{[ji]} \mathbf{X}^{[i]} + \hat{\mathbf{H}}^{[jj]} \mathbf{X}^{[j]} + \hat{\mathbf{Z}}^{[j]}. \tag{76}$$

Now, we define new variables as follows:

$$\mathbf{Y}^{[j]'} = \hat{\mathbf{H}}^{[ij]} \big(\hat{\mathbf{H}}^{[jj]}\big)^{-1} \mathbf{Y}^{[j]} = \hat{\mathbf{H}}^{[ij]} \big(\hat{\mathbf{H}}^{[jj]}\big)^{-1} \left( \hat{\mathbf{H}}^{[ji]} \mathbf{X}^{[i]} + \hat{\mathbf{H}}^{[jj]} \mathbf{X}^{[j]} \right) + \hat{\mathbf{H}}^{[ij]} \big(\hat{\mathbf{H}}^{[jj]}\big)^{-1} \hat{\mathbf{Z}}^{[j]}, \tag{77}$$

$$\mathbf{Y}^{[j]''} = \hat{\mathbf{H}}^{[ij]} \big(\hat{\mathbf{H}}^{[jj]}\big)^{-1} \left( \hat{\mathbf{H}}^{[ji]} \mathbf{X}^{[i]} + \hat{\mathbf{H}}^{[jj]} \mathbf{X}^{[j]} \right) + \hat{\mathbf{Z}}^{[i]}. \tag{78}$$

Then, we obtain:

$$T r_i \le I\big(w^{[i]}; \mathbf{Y}^{[i]}\big) + \epsilon, \tag{79}$$

$$\begin{split} T r_j &\le I\big(w^{[j]}; \mathbf{Y}^{[j]}\big) + \epsilon \le I\big(w^{[j]}; \mathbf{Y}^{[j]}, \mathbf{Y}^{[j]''}\big) + \epsilon = I\big(w^{[j]}; \mathbf{Y}^{[j]''}\big) + I\big(w^{[j]}; \mathbf{Y}^{[j]} \big| \mathbf{Y}^{[j]''}\big) + \epsilon \\ &\le I\big(w^{[j]}; \mathbf{Y}^{[j]''} \big| w^{[i]}\big) + I\big(w^{[j]}; \mathbf{Y}^{[j]} \big| \mathbf{Y}^{[j]''}\big) + \epsilon \\ &= I\big(w^{[j]}; \mathbf{Y}^{[i]} \big| w^{[i]}\big) + I\big(w^{[j]}; \mathbf{Y}^{[j]} \big| \mathbf{Y}^{[j]''}\big) + \epsilon. \end{split} \tag{80}$$

Thus, we have:

$$T(r_i + r_j) \le I\big(w^{[i]}, w^{[j]}; \mathbf{Y}^{[i]}\big) + I\big(w^{[j]}; \mathbf{Y}^{[j]} \big| \mathbf{Y}^{[j]''}\big) + 2\epsilon \le \big(2T - R^{[ij]}\big) \log(1 + \rho) + o(\log(\rho)), \tag{81}$$

where $R^{[ij]} = \mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big)$. By using the same argument with the roles of *i* and *j* exchanged, we obtain:

$$r_i + r_j \le \left( 2 - \frac{\max\left\{ \mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big), \mathrm{rank}\big(\hat{\mathbf{H}}^{[ji]}\big) \right\}}{T} \right) \log(1 + \rho) + o(\log(\rho)). \tag{82}$$

Therefore, we obtain:

$$\begin{split} (K-1) \sum_{i=1}^{K} r_i &= \sum_{i \neq j} (r_i + r_j) \\ &\le \sum_{i \neq j} \left( 2 - \frac{\max\left\{ \mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big), \mathrm{rank}\big(\hat{\mathbf{H}}^{[ji]}\big) \right\}}{T} \right) \log(1+\rho) + o(\log(\rho)) \\ &= \left( K(K-1) - \sum_{i \neq j} \frac{\max\left\{ \mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big), \mathrm{rank}\big(\hat{\mathbf{H}}^{[ji]}\big) \right\}}{T} \right) \log(1+\rho) + o(\log(\rho)). \end{split} \tag{83}$$

To minimize the term $\sum_{i \neq j} \max\big\{\mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big), \mathrm{rank}\big(\hat{\mathbf{H}}^{[ji]}\big)\big\}/T$, there are $WQT^2$ free variables in the matrices $\mathbf{A}^{[uq]}$. Every unit decrement of the rank of a cross-link matrix requires *T* linear dependencies (*T* independent linear equations, which follow from the form of the arrangement of the coefficients of the equations); thus, we can see that:

$$\sum_{i \neq j} \frac{\max\left\{ \mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big), \mathrm{rank}\big(\hat{\mathbf{H}}^{[ji]}\big) \right\}}{T} \ge \frac{K(K-1)}{2} - \frac{WQ}{2}. \tag{84}$$

Considering (83) and (84), the upper bound (71) is obtained. We note that $\sum_{i=1}^{K} d_i \le K$ follows directly from (79).
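The final counting step can be checked numerically: plugging the lower bound (84) on the normalized rank sum into (83) and dividing by $(K-1)\log(1+\rho)$ reproduces (71). A small sketch (helper name ours):

```python
def converse_bound(K: int, W: int, Q: int) -> float:
    """Upper bound obtained by combining Eqs. (83) and (84)."""
    rank_sum_min = K * (K - 1) / 2 - W * Q / 2          # Eq. (84)
    sum_dof = (K * (K - 1) - rank_sum_min) / (K - 1)    # divide (83) by (K-1) log(1+rho)
    return min(sum_dof, K)                              # sum_i d_i <= K follows from (79)

for (K, W, Q) in [(4, 2, 2), (6, 4, 4), (6, 6, 6), (8, 3, 5)]:
    # Must coincide with the closed form of Eq. (71).
    assert abs(converse_bound(K, W, Q) - min(K / 2 + W * Q / (2 * (K - 1)), K)) < 1e-12
print("Eqs. (83) and (84) reproduce the bound of Eq. (71)")
```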

**Remark 7.** *Theorem 3 indicates that the approximate sum capacity of the frequency-selective K-user interference channel assisted by the MIMO C-IR is upper bounded by* $\min\left\{ \frac{K}{2} + \frac{WQ}{2(K-1)},\ K \right\} \log(1+\rho) + o(\log(\rho))$*.*

### **4.** *K***-User Interference Channel in the Presence of NC-IR**

In this section, we provide the lower and upper bounds for the sum DoF of the frequency-selective *K*-user interference channel in the presence of an NC-IR as follows.

**Theorem 4.** *Consider $U, p, e, e' \in \mathbb{W}$ such that*

$$U = pe + e', \quad 0 \le e' < p, \quad \frac{K}{2} < U \le K. \tag{85}$$

*Then, with an NC-IR with W* = *Q* = *pU antennas, the following DoF is achievable:*

$$\mathrm{DoF} = \frac{K}{2} + \max\left\{ K \frac{\frac{U}{K} - \frac{1}{2}}{1 + 2\left\lceil \frac{U}{p} \right\rceil},\ 0 \right\}. \tag{86}$$

**Proof.** The proof is provided in Appendix F.

**Remark 8.** *Theorem 4 indicates that the approximate sum capacity of a frequency-selective K-user interference channel in the presence of the NC-IR is lower bounded by* $\left( \frac{K}{2} + \max\left\{ K \frac{\frac{U}{K} - \frac{1}{2}}{1 + 2\left\lceil \frac{U}{p} \right\rceil},\ 0 \right\} - \epsilon \right) \log(1+\rho) + o(\log(\rho)),\ \forall \epsilon > 0$*.*

**Remark 9.** *The active reconfigurable intelligent surface (RIS) can be modeled as a special case of an NC-IR [34]. It was proven in [34] that for an active RIS with Q* = *U*(*K* − 1) + *U*(*K* − *U*) *antennas, the following DoF is achievable:*

$$\text{DoF} = \frac{K + \mathcal{U}}{2}, 0 \le \mathcal{U} \le K. \tag{87}$$

*Therefore, we can see that for* 0 < *Q* < 2(*K* − 1)*, the achievable DoF (86) is dominant, and for Q* ≥ 2(*K* − 1)*, the maximums of (86) and (87) form the maximum achievable DoF for the NC-IR.*

**Remark 10.** *Considering Theorem 1, we can conclude that the maximum K DoF can be achieved by using Q* = *W* = *K antennas for a MIMO C-IR, whereas an NC-IR requires Q* = *K*(*K* − 1) *antennas to achieve the maximum K DoF; this number grows quadratically in K, which shows a loss of performance.*

Finally, we introduce an upper bound for the sum DoF of the frequency-selective *K*-user interference channel assisted by the NC-IR.

**Theorem 5.** *Considering the functions f* [*u*,*ωt*] *to be linear in (4), the sum DoF of the frequency-selective K-user interference channel assisted by the NC-IR can be upper-bounded as follows:*

$$\sum\_{i=1}^{K} d\_i \le \min\left\{\frac{K}{2} + \frac{Q}{2(K-1)}, K\right\} = \min\left\{\frac{K}{2} + \frac{W}{2(K-1)}, K\right\} = \min\left\{\frac{K}{2} + \frac{\sqrt{WQ}}{2(K-1)}, K\right\}.\tag{88}$$

**Proof.** This theorem can be proven by using the same argument given for Theorem 3, except for the fact that the linear operation of the NC-IR is represented as in (8). Thus, the matrices $\mathbf{A}^{[u]}$ provide $QT^2$ variables, which changes (84) as follows:

$$\sum_{i \neq j} \frac{\max\left\{ \mathrm{rank}\big(\hat{\mathbf{H}}^{[ij]}\big), \mathrm{rank}\big(\hat{\mathbf{H}}^{[ji]}\big) \right\}}{T} \ge \frac{K(K-1)}{2} - \frac{Q}{2}, \tag{89}$$

which yields (88).
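As a quick numerical comparison of the two converse results (helper names ours): with W = Q, the C-IR bound (71) grows with Q², while the NC-IR bound (88) grows only linearly in Q:

```python
def cir_upper(K: int, W: int, Q: int) -> float:
    """MIMO C-IR upper bound, Eq. (71)."""
    return min(K / 2 + W * Q / (2 * (K - 1)), K)

def ncir_upper(K: int, Q: int) -> float:
    """NC-IR upper bound, Eq. (88) (with W = Q)."""
    return min(K / 2 + Q / (2 * (K - 1)), K)

K = 4
for Q in (2, 4, 8):
    print(Q, cir_upper(K, Q, Q), ncir_upper(K, Q))
```

For K = 4 the C-IR bound already saturates at K for Q = 4, while the NC-IR bound is still strictly below K at Q = 8.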

**Remark 11.** *By considering Theorem 5, it can be seen that the approximate sum capacity of the frequency-selective K-user interference channel assisted by the NC-IR is upper bounded by* $\min\left\{ \frac{K}{2} + \frac{Q}{2(K-1)},\ K \right\} \log(1+\rho) + o(\log(\rho))$*.*

### **5. Numerical Results**

In this section, we numerically evaluate the lower and upper bounds for the sum DoF provided in the previous sections by using some examples. We note that the proposed bounds for the DoF of the MIMO C-IR and NC-IR and the existing bounds for the active RIS [34] (Theorems 1–5) do not depend on the distribution of the channel coefficients; the only required properties are that the coefficients are independent and drawn from a continuous CDF. In Figure 3, we compare the lower and upper bounds for the sum DoF of a six-user interference channel in the presence of the MIMO C-IR for different values of *Q* and *W*, as well as the case without the MIMO C-IR. We see that the achievable DoF reaches the maximum value (*K* = 6) only when *W* = *K* = 6. Additionally, we observe that the maximum achieved DoF is equal to *W* when *W* ≥ 4. Moreover, the maximum *K* DoF can be achieved when *Q* = *W*.

In Figure 4, we compare the lower and upper bounds for the sum DoF of four-user interference channels in the presence of the MIMO C-IR, NC-IR, and active RIS [34], and the case without an IR. We note that to have a fair comparison, we assume the same number of receiving and transmitting antennas for the MIMO C-IR (*W* = *Q*) as for the NC-IR and active RIS. These figures show that the maximum *K* DoF can be achieved by employing enough antennas for the MIMO C-IR, NC-IR, and active RIS. We see that the achievable DoF is considerably decreased for the NC-IR and active RIS, and this reduction is due to a lack of coordination between the antennas in the NC-IR and active RIS. Moreover, these figures show that the required number of antennas to allow the NC-IR and active RIS to achieve the maximum *K* DoF is quadratically larger than the required number of antennas for a MIMO C-IR, which shows a performance loss for the NC-IR due to a lack of coordination between the NC-IR antennas. In addition, the achievable DoF for the NC-IR is better than for the active RIS because the NC-IR can combine the received signals from different frequency slots (see Equation (4)); however, the model of the active RIS cannot conduct this operation.
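The bound curves discussed above can be regenerated numerically. This is a minimal sketch assuming the closed forms (11) and (71), with W = Q as in Figure 4 (function names ours):

```python
import math

def lower_bound(K: int, W: int, Q: int) -> float:
    """Achievable sum DoF of Theorem 1, Eq. (11)."""
    ia = K / 2 + max(0.0, K * (W / K - 0.5) / (1 + 2 * math.ceil(W / Q)))
    return max(ia, min(Q, W))

def upper_bound(K: int, W: int, Q: int) -> float:
    """Upper bound of Theorem 3, Eq. (71)."""
    return min(K / 2 + W * Q / (2 * (K - 1)), K)

K = 4
for Q in range(1, K + 1):  # W = Q, as in the fair comparison of Figure 4
    print(Q, lower_bound(K, Q, Q), upper_bound(K, Q, Q))
```

The sweep confirms that both bounds meet at the maximum *K* DoF once W = Q = K.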

In Figure 5, we compare the achievable sum DoF of a three-user interference channel in the presence of the MIMO C-IR (with *W* = *Q*), NC-IR, and active RIS, a time-selective channel without an IR [1], and a channel with constant coefficients using Improper Gaussian Signaling (IGS) [39] and Widely Linear Precoding (WLP) [40]. We can see that the proposed scheme for the MIMO C-IR has the best performance and the IGS and WLP schemes for the constant channel have the worst performance.

**Figure 3.** Comparison of lower and upper bounds for the sum DoF of the six-user interference channel in the presence of the MIMO C-IR and for the case without the MIMO C-IR.

**Figure 4.** Comparison of lower and upper bounds for the sum DoF of the four-user interference channel in the presence of MIMO C-IR (with *W* = *Q*), NC-IR, active RIS and for the case without IR.

**Figure 5.** Comparison of the achievable sum DoF of the three-user interference channel in the presence of MIMO C-IR (with *W* = *Q*), NC-IR, and active RIS, the time-selective channel without IR [1], and the channel with constant coefficients using Improper Gaussian Signaling (IGS) [39] and Widely Linear Precoding (WLP) [40].

### **6. Conclusions**

In this paper, we studied the lower and upper bounds for the sum DoF of the IR-assisted frequency-selective *K*-user interference channel and proposed novel interference alignment-based coding schemes. The main novelty of this work is a new interference alignment-based coding scheme in which the receivers are partitioned into two groups, called clean and dirty receivers. In this scheme, a part of the message streams of the transmitters corresponding to the clean receivers is de-multiplexed at the IR, and the IR uses these streams for interference cancellation at the clean receivers, which improves the DoF. This DoF improvement is achieved because, in the interference alignment scheme, the dimension of the interference subspaces decreases and the dimension of the message subspaces increases at the clean receivers. For a MIMO C-IR, whose antennas can coordinate with each other, and for an NC-IR (an IR with no coordination between the antennas), we derived achievable DoFs and observed a performance loss for the NC-IR compared with the MIMO C-IR. Moreover, we showed that with a sufficiently large (finite) number of antennas, the maximum *K* DoF is achievable for both the MIMO C-IR and the NC-IR. Our future work will include the following directions: (1) finding tight bounds for the DoF of a time-selective *K*-user interference channel in the presence of an IR; (2) extending our proposed coding scheme to more general wireless channels, e.g., an *X* network; (3) extending our coding scheme to a scenario with imperfect CSI.

**Author Contributions:** Conceptualization, A.H.A.B.; Formal analysis, A.H.A.B. and M.M.; Supervision, M.M. and M.N.-K.; Validation, M.M. and M.N.-K.; Writing—original draft, A.H.A.B.; Writing review & editing, M.M. and M.N.-K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A**

In this scheme, we use only one frequency slot, $\omega_1$. We set *L* = min{*W*, *Q*}. We assume that only the transmitters *i* ∈ {1, ... , *L*} send their messages to the receivers *j* ∈ {1, ... , *L*} via the symbols $X^{[i]}(\omega_1)$, $i \in \{1, \dots, L\}$, and the other transmitters are silent ($X^{[i]}(\omega_1) = 0$, $\forall i \in \{L+1, \dots, K\}$). Considering (2), the MIMO C-IR can de-multiplex $X^{[i]}(\omega_1)$, $\forall i \in \{1, \dots, L\}$, by using *L* linear equations at its first *L* receiving antennas almost surely: the matrix of coefficients consists of independent random variables, so its determinant is a non-zero polynomial of independent random variables with continuous cumulative probability distributions, and considering [34] (Lemma 1), it is non-zero with probability 1. Then, the MIMO C-IR designs its transmitted signal to remove the interference at each receiver *j* ∈ {1, ... , *L*} by solving the following linear equations:

$$-\sum_{i \in \{1, \dots, L\},\ i \neq j} H^{[ji]}(\omega_1) \big( X^{[i]}(\omega_1) + \tilde{Z}^{[i]}(\omega_1) \big) = \sum_{u=1}^{L} H^{[ju]}_{\mathrm{IR-R}}(\omega_1) X^{[u]}_{\mathrm{IR}}(\omega_1), \ \forall j \in \{1, \dots, L\}, \tag{A1}$$

$$\tilde{\tilde{Z}}^{[j]}(\omega_1) = -\sum_{i \in \{1, \dots, L\},\ i \neq j} H^{[ji]}(\omega_1) \tilde{Z}^{[i]}(\omega_1), \tag{A2}$$

where $\tilde{Z}^{[i]}(\omega_1)$ is the detection noise for the symbol $X^{[i]}(\omega_1)$ at the MIMO C-IR. Note that by using this procedure, the interference cancellation is conducted, but we have the additional noise $\tilde{\tilde{Z}}^{[j]}(\omega_1)$, which is negligible in the high signal-to-noise ratio (SNR) regime. Therefore, *L* symbols can be transmitted in one frequency slot, and a total of *L* DoF is achievable. Thus, the second term in (11) is achievable, which completes the proof.
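The zero-forcing relay scheme of this appendix can be sketched numerically. The snippet below is a noiseless illustration (noise terms dropped, variable names ours): the IR de-multiplexes the *L* active symbols from its first *L* receive antennas and then solves the linear system (A1) so that its transmission cancels the interference at the first *L* receivers.

```python
import numpy as np

rng = np.random.default_rng(1)
K, W, Q = 5, 3, 4
L = min(W, Q)                          # number of active transmitter/receiver pairs

x = rng.standard_normal(K)
x[L:] = 0.0                            # transmitters L+1, ..., K stay silent
H      = rng.standard_normal((K, K))   # direct channels H[j, i]
H_t_ir = rng.standard_normal((Q, K))   # transmitters -> IR receive antennas
H_ir_r = rng.standard_normal((K, W))   # IR transmit antennas -> receivers

# The IR de-multiplexes x[0..L-1] from its first L receive antennas.
x_hat = np.linalg.solve(H_t_ir[:L, :L], (H_t_ir @ x)[:L])

# The IR solves (A1): its transmit signal cancels the interference at receivers 0..L-1.
rhs = np.array([-sum(H[j, i] * x_hat[i] for i in range(L) if i != j) for j in range(L)])
x_ir = np.zeros(W)
x_ir[:L] = np.linalg.solve(H_ir_r[:L, :L], rhs)

y = H @ x + H_ir_r @ x_ir              # received signals (noise dropped in this sketch)
assert all(abs(y[j] - H[j, j] * x[j]) < 1e-9 for j in range(L))
print("interference cancelled at the first L receivers")
```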

### **Appendix B**

Using (36) and (39), we characterize the message subspaces $\bar{\mathcal{C}}_j$ and $\tilde{\mathcal{C}}_j$ as follows:

$$\bar{\mathcal{C}}_j = \mathrm{span}\big(\mathbf{H}^{[jj]} \bar{\mathbf{V}}^{[j]}\big) = \mathrm{span}\Big\{ \mathbf{H}^{[jj]} \big[ \mathbf{M}\big(g_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}_1\big) \big] \big[ \mathbf{M}\big(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \bar{\mathcal{S}}_2\big) \big] \mathbf{w} : g_1 \in \mathcal{F}\big(\bar{\mathcal{S}}_1, \{1, \dots, n\}\big),\ g_2 \in \mathcal{F}\big(\bar{\mathcal{S}}_2, \{1, \dots, sn\}\big) \Big\}, \tag{A3}$$

where $\bar{\mathcal{S}}_1$, $\bar{\mathcal{S}}_2$, $\mathcal{F}(\cdot,\cdot)$, and $\mathbf{M}(\cdot,\cdot,\cdot)$ are given by (37), (38), (34), and (35), respectively.

$$\tilde{\mathcal{C}}_j = \mathrm{span}\big(\tilde{\mathbf{H}}^{[jj]} \tilde{\mathbf{V}}^{[j]}\big) = \mathrm{span}\Big\{ \tilde{\mathbf{H}}^{[jj]} \big[ \mathbf{M}\big(g_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}_1\big) \big] \big[ \mathbf{M}\big(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \tilde{\mathcal{S}}_2\big) \big] \big[ \mathbf{M}\big(g_3(i,q), \mathbf{T}^{[q]}, \tilde{\mathcal{S}}_3\big) \big] \mathbf{w} : g_1 \in \mathcal{F}\big(\bar{\mathcal{S}}_1, \{1, \dots, n\}\big),\ g_2 \in \mathcal{F}\big(\tilde{\mathcal{S}}_2, \{1, \dots, sn\}\big),\ g_3 \in \mathcal{F}\big(\tilde{\mathcal{S}}_3, \{1, \dots, \upsilon n\}\big) \Big\}, \tag{A4}$$

where $\tilde{\mathcal{S}}_2$ and $\tilde{\mathcal{S}}_3$ are given by (40) and (41), respectively.

To satisfy the interference alignment Equation (24), the subspace $\bar{\mathcal{A}}_j$ must be chosen such that:

$$\bigcup_{i \in \{1, \dots, K\},\ i \neq j} \mathrm{span}\big( \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \big) \subseteq \bar{\mathcal{A}}_j.$$

Therefore, we characterize $\bar{\mathcal{A}}_j$ as follows:

$$\bar{\mathcal{A}}_j = \mathrm{span}\Big\{ \big[ \mathbf{M}\big(g_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}_1\big) \big] \big[ \mathbf{M}\big(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T-IR}}, \bar{\mathcal{S}}_2\big) \big] \mathbf{w} : g_1 \in \mathcal{F}\big(\bar{\mathcal{S}}_1, \{1, \dots, n+1\}\big),\ g_2 \in \mathcal{F}\big(\bar{\mathcal{S}}_2, \{1, \dots, n \max\{s, t\}\}\big) \Big\}, \tag{A5}$$

where $\bar{\mathcal{S}}_1$ and $\bar{\mathcal{S}}_2$ are given by (37) and (38), respectively. Note that to use the zero-forcing technique, the subspace of the interference must be a vector space, but the set of interference vectors, which is equal to $\bigcup_{i \in \{1, \dots, K\},\ i \neq j} \mathrm{span}\big( \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \big)$, is not a vector space; thus, we choose the subspace of interference (A5), which is easier to work with and includes $\bigcup_{i \in \{1, \dots, K\},\ i \neq j} \mathrm{span}\big( \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \big)$.

After that step, we analyze the dimension and the normalized asymptotic dimension of the message and interference subspaces. First, we assume that the parameter *T* (the number of frequency slots) is sufficiently large, and at the end of Step 5 of the proof, we choose the minimum value of *T* such that all message streams are decodable and all interference alignment equations are satisfied. Considering the structures of $\bar{\mathcal{A}}_j$ in (A5), $\bar{\mathcal{C}}_j$ in (A3), and $\tilde{\mathcal{C}}_j$ in (A4), we can see from [34] (Lemma 2) that if we choose the variables $x_k$ as $H^{[ji]}(\omega_t)$, $H^{[qi']}_{\mathrm{T-IR}}(\omega_t)$, $i, i', j \in \{1, \dots, K\}$, $q \in \{1, \dots, Q\}$, the variables $y_k$ as $H^{[ju]}_{\mathrm{IR-R}}(\omega_t)$, $j \in \{W+1, \dots, K\}$, $u \in \{1, \dots, W\}$, and the variables $z_k$ as $H^{[ju]}_{\mathrm{IR-R}}(\omega_t)$, $j \in \{1, \dots, W\}$, $u \in \{1, \dots, W\}$, then by using [34] (Lemmas 1–3), the subspaces $\bar{\mathcal{A}}_j$, $\bar{\mathcal{C}}_j$, and $\tilde{\mathcal{C}}_j$ are almost surely full-rank and linearly independent (all base vectors of these subspaces are linearly independent). In fact, if we take the constructing base vectors of $\bar{\mathcal{A}}_j$, $\bar{\mathcal{C}}_j$, and $\tilde{\mathcal{C}}_j$ and construct a square matrix by choosing some rows of the matrix, we can see by using [34] (Lemmas 2 and 3) that the determinant of this square matrix is a non-zero polynomial, and by using [34] (Lemma 1), it is non-zero with probability one; thus, all message streams are decodable at the clean receivers (by using zero forcing).

For more clarity, we will review [34] (Lemmas 1–3) as follows:

Ref. [34] (Lemma 1): Consider $k$ independent random variables $X\_1, \dots, X\_k$, each drawn from a continuous CDF. The probability that a non-zero polynomial $P\_k(X\_1, \dots, X\_k)$ of finite degree in $X\_1, \dots, X\_k$ assumes the value zero is zero, i.e., $\Pr\{P\_k(X\_1, \dots, X\_k) = 0\} = 0$.
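This lemma can be illustrated numerically. The following minimal Monte Carlo sketch uses a hypothetical example polynomial `P` and uniform draws standing in for an arbitrary continuous CDF; neither is from the paper.

```python
import random

# Monte Carlo illustration of [34] (Lemma 1): a fixed non-zero polynomial of
# finite degree, evaluated at independent draws from a continuous CDF, takes
# the value zero with probability zero.  P and the uniform distribution are
# hypothetical stand-ins chosen for this sketch.
def P(x, y, z):
    return x * y - z ** 2 + 3.0 * x  # a non-zero polynomial of degree 2

def fraction_of_zeros(trials=10_000, seed=0):
    rng = random.Random(seed)
    zeros = sum(
        1
        for _ in range(trials)
        if P(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) == 0.0
    )
    return zeros / trials
```

Running `fraction_of_zeros()` returns `0.0`: the zero set of `P` has Lebesgue measure zero, so continuous draws land on it with probability zero (floating point only approximates a continuous CDF, but an exact hit remains essentially impossible).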

Ref. [34] (Lemma 2): Consider the three sets of variables $\{x\_i, i \in \mathcal{A}\_x, |\mathcal{A}\_x| < \infty\}$, $\{y\_i, i \in \mathcal{A}\_y, |\mathcal{A}\_y| < \infty\}$, and $\{z\_i, i \in \mathcal{A}\_z, |\mathcal{A}\_z| < \infty\}$. Consider the following functions:

$$f\_j = \prod\_{i=1}^{|\mathcal{A}\_x|} \left( x\_i + \sum\_{i' \in \mathcal{C}\_j, i'' \in \mathcal{D}\_j} \left( x\_{i'} y\_{i''} P\_1^{[i'i'']}(z\_k : k \in \mathcal{A}\_z) + y\_{i'} P\_2^{[i'i'']}(z\_k : k \in \mathcal{A}\_z) \right) \right)^{a\_i^j}, \quad \text{(A6)}$$

$$(a\_1^j, \dots, a\_{|\mathcal{A}\_x|}^j) \in \mathbb{W}^{|\mathcal{A}\_x|}, \quad j \in \{1, \dots, l\},$$

where $P\_1^{[i'i'']}(\cdot)$ and $P\_2^{[i'i'']}(\cdot)$ are fractional polynomials and, for all $j$, we have $|\mathcal{C}\_j|, |\mathcal{D}\_j| < \infty$. If for all $j, j'$ with $j \neq j'$ we have $(a\_1^j, \dots, a\_{|\mathcal{A}\_x|}^j) \neq (a\_1^{j'}, \dots, a\_{|\mathcal{A}\_x|}^{j'})$, then the functions $f\_j$ are linearly independent.

Ref. [34] (Lemma 3): Consider the set of non-zero linearly independent fractional polynomials $\{P^{[j]}(\cdot), j \in \{1, \dots, J\}\}$ and consider the $J$ sets of variables $\mathcal{X}\_j = \{x\_i^j : i \in \mathcal{I}, \mathcal{I} \subseteq \mathbb{N}, |\mathcal{I}| < \infty\}$, $j \in \{1, \dots, J\}$. The determinant of the following matrix is a non-zero fractional polynomial:

$$\mathbf{A} = \begin{bmatrix} P^{[1]}(\mathcal{X}\_1) & P^{[2]}(\mathcal{X}\_1) & \cdots & P^{[J]}(\mathcal{X}\_1) \\ P^{[1]}(\mathcal{X}\_2) & P^{[2]}(\mathcal{X}\_2) & \cdots & P^{[J]}(\mathcal{X}\_2) \\ \vdots & \vdots & \ddots & \vdots \\ P^{[1]}(\mathcal{X}\_J) & P^{[2]}(\mathcal{X}\_J) & \cdots & P^{[J]}(\mathcal{X}\_J) \end{bmatrix} . \tag{A7}$$

Now, we have to make sure that interference alignment Equations (24) and (25) are satisfied by analyzing the dimensions of the message streams and the interference. The dimensions of the message subspaces $\bar{\mathcal{C}}\_j$ and $\tilde{\mathcal{C}}\_j$, which equal the numbers of their base vectors in (A3) and (A4), can be characterized as follows:

$$d(\bar{\mathcal{C}}\_j) = n^{K^2 - K} (sn)^{QK} \, , \tag{A8}$$

$$d(\tilde{\mathcal{C}}\_j) = n^{K^2 - K} (sn)^{\varphi} (\upsilon n)^{\theta},\tag{A9}$$

where

$$\varphi = \sum\_{q'=1}^{Q} \left( K - \left| \mathcal{B}\_{q'} \right| \right) = KQ - \sum\_{q'=1}^{Q} \left| \mathcal{B}\_{q'} \right| = KQ - W,$$

$$\theta = \sum\_{q'=1}^{Q} \left| \mathcal{B}\_{q'} \right| = W.$$
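As a sanity check on this bookkeeping, a short sketch (with a hypothetical value of $K$ and a hypothetical partition $\mathcal{B}\_1, \dots, \mathcal{B}\_Q$, not taken from the paper) confirms that $\varphi + \theta = KQ$ and $\theta = W$ hold for any partition:

```python
# Bookkeeping check for the exponents phi and theta in (A9): for any
# partition B_1, ..., B_Q of the W transmitter indices served by the C-IR,
# theta = W and phi = K*Q - W, so phi + theta = K*Q.  K and the partition
# below are hypothetical example values.
def phi_theta(K, partition):
    theta = sum(len(B_q) for B_q in partition)    # = W
    phi = sum(K - len(B_q) for B_q in partition)  # = K*Q - W
    return phi, theta

phi, theta = phi_theta(K=4, partition=[[1, 2], [3]])  # K=4, Q=2, W=3
```

Here $\varphi = 5 = 4 \cdot 2 - 3$ and $\theta = 3 = W$, matching the closed forms above.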

The dimension of the interference subspace $\bar{\mathcal{A}}\_j$, which equals the number of its base vectors in (A5), is:

$$d(\bar{\mathcal{A}}\_j) = (n+1)^{K^2 - K} (\max\{sn, tn\})^{QK}. \tag{A10}$$

We can see from (A8)–(A10) and (10) that $l = K^2 - K + QK$. We define the following parameters:

$$
\Gamma = s^{QK}\,,\tag{A11}
$$

$$
\chi = s^{QK-W} \upsilon^W,\tag{A12}
$$

$$
\zeta = t^{QK}.\tag{A13}
$$

Considering (A8)–(A13) and (10), the normalized asymptotic dimensions of the message and interference subspaces are:

$$D\_N(\bar{\mathcal{C}}\_j) = \Gamma,\tag{A14}$$


$$D\_N(\tilde{\mathcal{C}}\_j) = \chi, \tag{A15}$$

$$D\_N(\bar{\mathcal{A}}\_j) = \max\{\Gamma, \zeta\}.\tag{A16}$$

Interference alignment Equations (24) and (25) are satisfied because the normalized asymptotic dimension of the interference induced by $\bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]}$, $i \in \{1, \dots, W\}$, $i \neq j$ is $\Gamma$ and the normalized asymptotic dimension of the interference induced by $\bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]}$, $i \in \{W+1, \dots, K\}$ is $\zeta$.
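The limits behind (A14)–(A16) can also be checked numerically. The sketch below evaluates $d(\cdot)/n^l$ from (A8)–(A10) for a large $n$ and hypothetical parameter values (not from the paper), and compares the ratios against $\Gamma$, $\chi$, and $\max\{\Gamma, \zeta\}$:

```python
# Numerical check of the normalized asymptotic dimensions (A14)-(A16):
# D_N(.) = lim_{n -> inf} d(.)/n^l with l = K^2 - K + Q*K.  The parameter
# values (K, Q, W, s, v, t) are hypothetical, chosen only for illustration.
def normalized_dims(K, Q, W, s, v, t, n):
    l = K * K - K + Q * K
    d_C_bar = n ** (K * K - K) * (s * n) ** (Q * K)                     # (A8)
    d_C_til = n ** (K * K - K) * (s * n) ** (Q * K - W) * (v * n) ** W  # (A9)
    d_A_bar = (n + 1) ** (K * K - K) * max(s * n, t * n) ** (Q * K)     # (A10)
    return tuple(d / n ** l for d in (d_C_bar, d_C_til, d_A_bar))

K, Q, W, s, v, t = 2, 1, 1, 2, 3, 4
gamma, chi, zeta = s ** (Q * K), s ** (Q * K - W) * v ** W, t ** (Q * K)
dims = normalized_dims(K, Q, W, s, v, t, n=10 ** 6)
```

For $n = 10^6$ the three ratios are already within $10^{-4}$ of $\Gamma = 4$, $\chi = 6$, and $\max\{\Gamma, \zeta\} = 16$, and the gap vanishes as $n \to \infty$ since $(n+1)^{K^2-K} / n^{K^2-K} \to 1$.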

### **Appendix C**

Using (42), we can characterize the message subspace $\bar{\mathcal{C}}\_j$ as follows:

$$\bar{\mathcal{C}}\_j = \text{span}\left(\mathbf{H}^{[jj]} \bar{\mathbf{V}}^{[j]}\right) = \text{span}\Big\{\mathbf{H}^{[jj]} \left[\mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1)\right] \left[\mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \bar{\mathcal{S}}\_2)\right] \mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n\}), g\_2 \in \mathcal{F}(\bar{\mathcal{S}}\_2, \{1, \dots, tn\})\Big\},\tag{A17}$$

where $\bar{\mathcal{S}}\_1$, $\bar{\mathcal{S}}\_2$, $\mathcal{F}(\cdot, \cdot)$, and $\mathbf{M}(\cdot, \cdot, \cdot)$ are given by using (37), (38), (34), and (35), respectively. To satisfy interference alignment Equation (26), the subspace $\bar{\mathcal{A}}\_j$ must be chosen such that:

$$\bigcup\_{i \in \{1, \dots, K\}, i \neq j} \left\{ \text{span} \left( \mathbf{H}^{[ji]} \bar{\mathbf{V}}^{[i]} \right) \right\} \subseteq \bar{\mathcal{A}}\_j.$$

Therefore, we characterize $\bar{\mathcal{A}}\_j$ as follows:

$$\bar{\mathcal{A}}\_j = \operatorname{span}\Big\{ \left[ \mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1) \right] \left[ \mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \bar{\mathcal{S}}\_2) \right] \mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n+1\}), g\_2 \in \mathcal{F}(\bar{\mathcal{S}}\_2, \{1, \dots, n \max\{s, t\}\}) \Big\}, \tag{A18}$$

where $\bar{\mathcal{S}}\_1$ and $\bar{\mathcal{S}}\_2$ are given by (37) and (38), respectively. To satisfy interference alignment Equation (28), the subspace $\tilde{\mathcal{A}}\_j$ must be chosen such that:

$$\bigcup\_{i \in \{1, \dots, W\}} \left\{ \text{span} \left( \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \right) \right\} \subseteq \tilde{\mathcal{A}}\_j.$$

Therefore, we characterize the subspace $\tilde{\mathcal{A}}\_j$ as follows:

$$\tilde{\mathcal{A}}\_j = \text{span}\Big\{ \left[ \mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1) \right] \left[ \mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \tilde{\mathcal{S}}\_2) \right] \left[ \mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3) \right] \mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n+1\}), g\_2 \in \mathcal{F}(\tilde{\mathcal{S}}\_2, \{1, \dots, sn\}), g\_3 \in \mathcal{F}(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\}) \Big\}, \tag{A19}$$

where $\tilde{\mathcal{S}}\_2$ and $\tilde{\mathcal{S}}\_3$ are given by using (40) and (41), respectively.

By using the same argument given for the clean receivers, the subspaces $\bar{\mathcal{A}}\_j$, $\tilde{\mathcal{A}}\_j$, and $\bar{\mathcal{C}}\_j$ are full-rank and linearly independent almost surely, i.e., all base vectors of these subspaces are linearly independent. Now, we analyze the dimensions of the message and interference subspaces. By calculating the number of base vectors of the message subspace $\bar{\mathcal{C}}\_j$ in (A17), we have:

$$d(\bar{\mathcal{C}}\_j) = n^{K^2 - K} (tn)^{QK},\tag{A20}$$

$$D\_N(\bar{\mathcal{C}}\_j) = \zeta,$$

and for the interference subspaces in (A18) and (A19), we have:

$$d(\bar{\mathcal{A}}\_j) = (n+1)^{K^2 - K} (\max\{sn, tn\})^{QK},\tag{A21}$$

$$D\_N(\bar{\mathcal{A}}\_j) = \max\{\Gamma, \zeta\},$$

$$d(\tilde{\mathcal{A}}\_j) = (n+1)^{K^2 - K} (sn)^{QK - W} (\upsilon n)^W,\tag{A22}$$

$$D\_N(\tilde{\mathcal{A}}\_j) = \chi.$$

Therefore, we can see that interference alignment Equations (26)–(29) are satisfied because the normalized asymptotic dimension of the interference subspace induced by $\tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]}$, $i \in \{1, \dots, W\}$ is $\chi$, the normalized asymptotic dimension of the interference subspace induced by $\bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]}$, $i \in \{1, \dots, W\}$ is $\Gamma$, and the normalized asymptotic dimension of the interference subspace induced by $\bar{\mathbf{V}}^{[i]} \bar{\mathbf{x}}^{[i]}$, $i \in \{W+1, \dots, K\}$, $i \neq j$ is $\zeta$.

### **Appendix D**

Using (39), we can characterize the message subspaces $\tilde{\mathcal{C}}\_{i,r\_q}$, $i \in \mathcal{B}\_q$ as follows:

$$\tilde{\mathcal{C}}\_{i,r\_q} = \text{span}\left(\mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \tilde{\mathbf{V}}^{[i]}\right) = \text{span}\Big\{\mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \left[\mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1)\right] \left[\mathbf{M}(g\_2(i,q), \mathbf{H}\_{\mathrm{T-IR}}^{[qi]}, \tilde{\mathcal{S}}\_2)\right] \left[\mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3)\right] \mathbf{w} : g\_1 \in \mathcal{F}\left(\bar{\mathcal{S}}\_1, \{1, \dots, n\}\right), g\_2 \in \mathcal{F}\left(\tilde{\mathcal{S}}\_2, \{1, \dots, sn\}\right), g\_3 \in \mathcal{F}\left(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\}\right)\Big\}, \tag{A23}$$

where $\tilde{\mathcal{S}}\_2$, $\tilde{\mathcal{S}}\_3$, $\mathcal{F}(\cdot, \cdot)$, and $\mathbf{M}(\cdot, \cdot, \cdot)$ are given by using (40), (41), (34), and (35), respectively. To satisfy interference alignment Equation (30), the subspace $\bar{\mathcal{A}}\_{r\_q}$ must be chosen such that:

$$\bigcup\_{i \in \{1, \dots, K\}} \left\{ \text{span} \left( \mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \bar{\mathbf{V}}^{[i]} \right) \right\} \subseteq \bar{\mathcal{A}}\_{r\_q}.$$

Therefore, we can characterize $\bar{\mathcal{A}}\_{r\_q}$ as follows:

$$\bar{\mathcal{A}}\_{r\_q} = \operatorname{span}\Big\{ \left[ \mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1) \right] \left[ \mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \bar{\mathcal{S}}\_2) \right] \mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n\}), g\_2 \in \mathcal{F}(\bar{\mathcal{S}}\_2, \{1, \dots, n \max\{s, t\} + 1\}) \Big\},\tag{A24}$$

where $\bar{\mathcal{S}}\_1$ and $\bar{\mathcal{S}}\_2$ are given by using (37) and (38), respectively.

To satisfy interference alignment Equation (32), the subspace $\tilde{\mathcal{A}}\_{r\_q}$ must be chosen such that:

$$\bigcup\_{i \in \{1, \dots, W\}, i \notin \mathcal{B}\_q} \left\{ \text{span} \left( \mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \tilde{\mathbf{V}}^{[i]} \right) \right\} \subseteq \tilde{\mathcal{A}}\_{r\_q}.$$

Therefore, we characterize $\tilde{\mathcal{A}}\_{r\_q}$ as follows:

$$\tilde{\mathcal{A}}\_{r\_q} = \text{span}\Big\{ \left[ \mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1) \right] \left[ \mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \tilde{\mathcal{S}}\_2) \right] \left[ \mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3) \right] \mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n\}), g\_2 \in \mathcal{F}(\tilde{\mathcal{S}}\_2, \{1, \dots, sn + 1\}), g\_3 \in \mathcal{F}(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\}) \Big\}, \tag{A25}$$

where $\tilde{\mathcal{S}}\_2$ and $\tilde{\mathcal{S}}\_3$ are given by using (40) and (41), respectively.

By using the same argument given for the clean receivers, the subspaces $\bar{\mathcal{A}}\_{r\_q}$, $\tilde{\mathcal{A}}\_{r\_q}$, and $\tilde{\mathcal{C}}\_{i,r\_q}$, $i \in \mathcal{B}\_q$ are full-rank and linearly independent almost surely, i.e., all base vectors of these subspaces are linearly independent. Now, by calculating the numbers of base vectors, we can analyze the dimensions of the subspaces $\tilde{\mathcal{C}}\_{i,r\_q}$, $i \in \mathcal{B}\_q$, $\bar{\mathcal{A}}\_{r\_q}$, and $\tilde{\mathcal{A}}\_{r\_q}$:

$$d(\tilde{\mathcal{C}}\_{i,r\_q}) = n^{K^2 - K} (sn)^{QK - W} (\upsilon n)^W, \quad \forall i \in \mathcal{B}\_q, \tag{A26}$$

$$D\_N(\tilde{\mathcal{C}}\_{i,r\_q}) = \chi.$$

Thus, the normalized dimension of the total subspace of the message symbols that may be de-multiplexed ($\tilde{\mathbf{x}}^{[i]}$, $i \in \mathcal{B}\_q$) at the $q$-th receiving antenna of the MIMO C-IR is:

$$\sum\_{i \in \mathcal{B}\_q} D\_N(\tilde{\mathcal{C}}\_{i,r\_q}) = |\mathcal{B}\_q|\chi.$$

For $\bar{\mathcal{A}}\_{r\_q}$, we have:

$$d(\bar{\mathcal{A}}\_{r\_q}) = n^{K^2 - K} (\max\{sn, tn \} + 1)^{KQ},\tag{A27}$$

$$D\_N(\bar{\mathcal{A}}\_{r\_q}) = \max\{\Gamma, \zeta\},$$

and for $\tilde{\mathcal{A}}\_{r\_q}$, we have:

$$d(\tilde{\mathcal{A}}\_{r\_q}) = n^{K^2 - K} (sn + 1)^{QK - W} (\upsilon n)^W,\tag{A28}$$
 
$$D\_N(\tilde{\mathcal{A}}\_{r\_q}) = \chi.$$

Thus, we can see that interference alignment Equations (30)–(33) are satisfied.

### **Appendix E**

The second term of (68) is exactly the same as the second term of (11) in Theorem 1. The proof of the first term is similar to the proof of the first term of (11) in Theorem 1, with a difference in the MIMO C-IR de-multiplexing method. In the proof of Theorem 1, each MIMO C-IR receiving antenna $q$ de-multiplexes the message streams $\tilde{\mathbf{x}}\_i$, $i \in \mathcal{B}\_q$ separately, without coordination with the other receiving antennas. However, in the proof of this theorem, we use coordination between the MIMO C-IR receiving antennas. Without loss of generality, assume that $|\mathcal{B}\_1| = Z + 1$ and $|\mathcal{B}\_q| = Z$, $q \neq 1$. To de-multiplex the message streams $\tilde{\mathbf{x}}\_i$, $i \in \{1, \dots, W\}$ at the MIMO C-IR, first we de-multiplex the message streams $\tilde{\mathbf{x}}\_i$, $i \in \mathcal{B}\_q$, $q \neq 1$ at the $q$-th MIMO C-IR receiving antenna separately. Then, to de-multiplex the message streams $\tilde{\mathbf{x}}\_i$, $i \in \mathcal{B}\_1$, we first remove the interference induced by the message streams $\tilde{\mathbf{x}}\_i$, $i \in \{1, \dots, W\}$, $i \notin \mathcal{B}\_1$. This results in a decrement in the total normalized asymptotic dimension at the first receiving antenna of the MIMO C-IR (the amount of the decrement is $\chi$), so (58) changes into the following form for $q = 1$:

$$D\_{N,t,r\_1} = \left\lfloor \frac{W}{Q} \right\rfloor \chi + \chi + \max\{\Gamma, \zeta\}, \tag{A29}$$

and the constraint (63) changes into the following form:

$$Z \ge \left\lfloor \frac{W}{Q} \right\rfloor \chi. \tag{A30}$$

Then, we see that the DoF (68) is achievable.

### **Appendix F**

The proof of this theorem is similar to the proof of the first term of Theorem 1. Here, we use the variable *U* introduced in the statement of the theorem to denote the number of clean receivers. Note that, to avoid introducing additional notation, we reuse the notation (such as the names of sets and vector subspaces) from the proof of Theorem 1; from now on, these notations refer to this theorem. Our proof has six steps, as follows.

**Step 1: Dividing Receivers, Transmitters, and NC-IR Antennas**

Using the same method as Step 1 of the proof of the first term in Theorem 1, we divide the transmitters into two partitions. For the transmitters *i* ∈ {1, ... , *U*}, we provide two sets of symbol streams: **x**¯[*i*] and **x**˜[*i*] . The matrices **V**¯ [*i*] and **V**˜ [*i*] are beamforming matrices, the columns of which are the beamforming vectors for each element of **x**¯[*i*] and **x**˜[*i*] , respectively. For the transmitters *<sup>i</sup>* ∈ {*<sup>U</sup>* <sup>+</sup> 1, ... , *<sup>K</sup>*}, we provide only one set of the symbol stream **<sup>x</sup>**¯[*i*] , and the matrix **V**¯ [*i*] is the beamforming matrix for the symbols **x**¯[*i*] . Hence, the vectors **X**[*i*] will have the forms of (12) and (13) by using the setting *W* = *U*. The reason for this kind of partitioning is the same as in Theorem 1. The main difference here is in the interference alignment scheme used for de-multiplexing the message streams **x**˜[*i*] , *i* ∈ {1, ... , *U*} in the NC-IR receiving antennas.

Next, we divide the transmitters $i \in \{1, \dots, U\}$ into the $p$ distinct sets $\mathcal{E}\_l$, $l \in \{1, \dots, p\}$, such that for $l \in \{1, \dots, e\}$ we have $|\mathcal{E}\_l| = e + 1$, and for $l \in \{e+1, \dots, p\}$ we have $|\mathcal{E}\_l| = e$. Similarly, we divide the NC-IR antennas into the $p$ distinct sets $\mathcal{F}\_l$, $l \in \{1, \dots, p\}$, such that $|\mathcal{F}\_l| = U$, $\forall l \in \{1, \dots, p\}$. Now, we design the beamforming matrices $\bar{\mathbf{V}}^{[i]}$ and $\tilde{\mathbf{V}}^{[i]}$ such that the message streams $\tilde{\mathbf{x}}^{[i]}$, $i \in \mathcal{E}\_l$ may be de-multiplexed at each of the NC-IR antennas $u \in \mathcal{F}\_l$ for $\forall l \in \{1, \dots, p\}$.

### **Step 2: Interference Cancellation at the Clean Receivers and Equivalent Channel at the Dirty Receivers**

For the interference cancellation, we design the outputs of antennas in the set F*<sup>l</sup>* such that the interference induced by the message streams **x**˜[*i*] , *i* ∈ E*<sup>l</sup>* is removed at the clean receivers *j* ∈ {1, ... , *U*}. Thus, the NC-IR antennas' transmitted signal must be designed such that they satisfy the following:

$$-\sum\_{i\in\mathcal{E}\_l, i\neq j} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]} = \sum\_{u\in\mathcal{F}\_l} \mathbf{H}^{[ju]}\_{\mathrm{IR-R}} \mathbf{X}^{[u]}\_{\mathrm{IR}}, \quad \forall j \in \{1, \ldots, U\}, \forall l \in \{1, \ldots, p\}.\tag{A31}$$

The solution to (A31) can be derived as follows:

$$\mathbf{X}\_{\mathrm{IR}}^{[u]} = \sum\_{j \in \{1, \ldots, U\}} \sum\_{i \in \mathcal{E}\_l, i \neq j} \mathbf{H}\_{\mathrm{inv}}^{[ju]} \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \tilde{\mathbf{x}}^{[i]}, \quad \forall u \in \mathcal{F}\_l, \tag{A32}$$

where $\mathbf{H}^{[ju]}\_{\mathrm{inv}}$ is a $T \times T$ diagonal matrix whose $t$-th diagonal element is a fractional polynomial in terms of $H^{[j'u']}\_{\mathrm{IR-R}}(\omega\_t)$, $u' \in \mathcal{F}\_l$, $j' \in \{1, \dots, U\}$. This solution exists almost surely because the matrix of coefficients of the linear equations is in terms of independent random variables drawn from continuous CDFs, so its determinant is a non-zero polynomial in terms of these random variables, and by using [34] (Lemma 1), it is non-zero with probability 1. Note that each NC-IR receiving antenna de-multiplexes the symbol streams $\tilde{\mathbf{x}}^{[i]}$ with additive noise. This does not disturb the equations above: if each symbol is replaced by a symbol with additive noise, the interference cancellation still holds, but we have additional noise, which is negligible in the high-SNR regime. We can see that the received signals at the receivers have the same forms as (22), and $\mathbf{H}^{[ju]}\_{\mathrm{inv}}$ and the equivalent channel matrix $\tilde{\mathbf{H}}^{[ji]}$ have the same properties introduced in Step 2 of the proof of the first term in Theorem 1.
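The almost-sure existence argument above is the same measure-zero phenomenon as in [34] (Lemma 1), applied to the determinant of the coefficient matrix. A minimal sketch (with hypothetical $3 \times 3$ coefficient matrices whose entries are uniform draws, standing in for the actual channel coefficients) never produces a singular system:

```python
import random

# Sketch of the existence argument for (A32): a coefficient matrix whose
# entries are independent draws from a continuous CDF has a determinant that
# is a non-zero polynomial of the entries, hence non-zero with probability 1
# ([34], Lemma 1).  The 3x3 size and uniform entries are illustrative choices.
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def singular_fraction(trials=2_000, seed=1):
    rng = random.Random(seed)
    singular = 0
    for _ in range(trials):
        m = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
        if det3(m) == 0.0:
            singular += 1
    return singular / trials
```

Every sampled system turns out to be invertible, matching the claim that the solution (A32) exists almost surely.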

### **Step 3: Interference Alignment Equations**

The interference alignment equations and the message and interference subspaces for the clean and dirty receivers are the same as in Step 3 of the proof of the first term in Theorem 1 ((24)–(29)) if we replace $W$ with $U$. Consider $q \in \{1, \dots, pU\}$: we define the function $L(q) = l$ if $q \in \mathcal{F}\_l$ ($l$ is unique because the sets $\mathcal{F}\_l$ are disjoint). We design the interference alignment scheme such that the symbol streams $\tilde{\mathbf{x}}^{[i]}$, $i \in \mathcal{E}\_{L(q)}$ can be de-multiplexed at the $q$-th receiving antenna of the NC-IR. Thus, the interference alignment equations for the NC-IR change as follows.

To this end, all the interference induced by the symbol streams **x**¯[*i*] must align into a limited subspace. Therefore, at the *q*-th receiving antenna of the NC-IR and for each *i* ∈ {1, . . . , *K*}, we must have:

$$\text{span}\left(\mathbf{H}\_{\mathrm{T-IR}}^{[qi]}\bar{\mathbf{V}}^{[i]}\right) \subseteq \bar{\mathcal{A}}\_{r\_q},\tag{A33}$$

where $\bar{\mathcal{A}}\_{r\_q}$ is considered a subspace for which we have:

$$\max\_{i \in \{1, \dots, K\}} D\_N \left( \text{span} \left( \mathbf{H}^{[qi]}\_{\mathrm{T-IR}} \bar{\mathbf{V}}^{[i]} \right) \right) = D\_N(\bar{\mathcal{A}}\_{r\_q}).\tag{A34}$$

Then, for each $i \in \{1, \dots, U\}$, $i \notin \mathcal{E}\_{L(q)}$, we have:

$$\text{span}\left(\mathbf{H}\_{\mathrm{T-IR}}^{[qi]}\tilde{\mathbf{V}}^{[i]}\right) \subseteq \tilde{\mathcal{A}}\_{r\_q},\tag{A35}$$

where $\tilde{\mathcal{A}}\_{r\_q}$ is considered a subspace for which we have:

$$\max\_{i \in \{1, \dots, U\}, i \notin \mathcal{E}\_{L(q)}} D\_N \left( \text{span} \left( \mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \tilde{\mathbf{V}}^{[i]} \right) \right) = D\_N(\tilde{\mathcal{A}}\_{r\_q}). \tag{A36}$$

Moreover, we define $\tilde{\mathcal{C}}\_{i,r\_q}$, $i \in \mathcal{E}\_{L(q)}$ as the message subspaces that can be de-multiplexed at the $q$-th NC-IR antenna as follows:

$$\tilde{\mathcal{C}}\_{i,r\_q} = \text{span}\left(\mathbf{H}^{[qi]}\_{\mathrm{T-IR}} \tilde{\mathbf{V}}^{[i]}\right). \tag{A37}$$

We want $\tilde{\mathcal{C}}\_{i,r\_q}$, $\forall i \in \mathcal{E}\_{L(q)}$, $\bar{\mathcal{A}}\_{r\_q}$, and $\tilde{\mathcal{A}}\_{r\_q}$ to be full-rank and linearly independent, so that we can make sure the message streams $\tilde{\mathbf{x}}^{[i]}$, $i \in \mathcal{E}\_{L(q)}$ can be de-multiplexed at the $q$-th NC-IR antenna. In Steps 4 and 5, we prove the existence of such beamforming vectors and message and interference subspaces, which satisfy the previous interference alignment equations for the clean and dirty receivers and the MIMO C-IR. In Step 6, we analyze the achieved DoF by using the beamforming vectors' design.

### **Step 4: Beamforming Matrix Design**

The beamforming matrices **V**¯ [*i*] , ∀*i* ∈ {1, ... , *K*} are the same as (36) and (42) if we replace *W* with *U*. For **V**˜ [*i*] , we have:

$$\tilde{\mathbf{V}}^{[i]} = \Big\{ \left[ \mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1) \right] \left[ \mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \tilde{\mathcal{S}}\_2) \right] \left[ \mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3) \right] \mathbf{w} : \tag{A38}$$

$$g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n\}), g\_2 \in \mathcal{F}(\tilde{\mathcal{S}}\_2, \{1, \dots, sn\}), g\_3 \in \mathcal{F}(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\}) \Big\},\tag{A39}$$

where <sup>S</sup>¯ 1, F(·, ·) and **M**(·, ·, ·) are given by using (37), (34), and (35), respectively, and we have:

$$\tilde{\mathcal{S}}\_2 = \left\{ (i, q) \,\Big|\, i \in \{1, \dots, K\}, i \notin \mathcal{E}\_{L(q)}, q \in \{1, \dots, Q\} \right\},\tag{A40}$$

$$\tilde{\mathcal{S}}\_3 = \left\{ (i, q) \,\Big|\, i \in \mathcal{E}\_{L(q)}, q \in \{1, \dots, Q\} \right\}. \tag{A41}$$

The matrices $\mathbf{T}^{[qi]}$ are $T \times T$ diagonal random matrices for each $(i, q)$, where each diagonal element of each matrix is drawn independently from a continuous CDF.

Note that similar to the proof of Theorem 1, each value of the parameters *s*, *υ* and *t* can be approximated by using rational numbers with arbitrarily small errors, and by choosing a sufficiently large *n*, the parameters *sn*, *υn* and *tn* will be integers.

### **Step 5: Validity of Interference Alignment Conditions and Decodability of Message Symbols**

*(1) Validity of interference alignment conditions at the clean receivers $j \in \{1, \dots, U\}$:* The message subspace $\bar{\mathcal{C}}\_j$ and the interference subspace $\bar{\mathcal{A}}\_j$ will be exactly the same as (A3) and (A5). The message subspace $\tilde{\mathcal{C}}\_j$ will change as follows:

$$\tilde{\mathcal{C}}\_j = \text{span}\left(\mathbf{H}^{[jj]}\tilde{\mathbf{V}}^{[j]}\right) = \text{span}\Big\{\mathbf{H}^{[jj]}\left[\mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1)\right]\left[\mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \tilde{\mathcal{S}}\_2)\right]\left[\mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3)\right]\mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n\}), g\_2 \in \mathcal{F}(\tilde{\mathcal{S}}\_2, \{1, \dots, sn\}), g\_3 \in \mathcal{F}(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\})\Big\},\tag{A42}$$

where $\tilde{\mathcal{S}}\_2$ and $\tilde{\mathcal{S}}\_3$ are given by using (A40) and (A41).

Considering the natures of $\bar{\mathcal{A}}\_j$ in (A5), $\bar{\mathcal{C}}\_j$ in (A3), and $\tilde{\mathcal{C}}\_j$ in (A42), we can see from the statement of [34] (Lemma 2) that if we choose the variables $x\_k$ as $H^{[ji]}(\omega\_t)$, $H^{[qi']}\_{\mathrm{T-IR}}(\omega\_t)$, $i, i', j \in \{1, \dots, K\}$, $q \in \{1, \dots, Q\}$, the variables $y\_k$ as $H^{[ju]}\_{\mathrm{IR-R}}(\omega\_t)$, $j \in \{U+1, \dots, K\}$, $u \in \{1, \dots, Q\}$, and the variables $z\_k$ as $H^{[ju]}\_{\mathrm{IR-R}}(\omega\_t)$, $j \in \{1, \dots, U\}$, $u \in \{1, \dots, Q\}$, then by using [34] (Lemmas 1–3), the subspaces $\bar{\mathcal{A}}\_j$, $\bar{\mathcal{C}}\_j$, and $\tilde{\mathcal{C}}\_j$ are full-rank and linearly independent (all base vectors of these subspaces are linearly independent) almost surely. The reason is that if we take the constructing base vectors of $\bar{\mathcal{A}}\_j$, $\bar{\mathcal{C}}\_j$, and $\tilde{\mathcal{C}}\_j$ and construct a square matrix by choosing some of its rows, we can see by using [34] (Lemmas 2–3) that the determinant of this square matrix is a non-zero polynomial, which is non-zero with probability 1 by using [34] (Lemma 1). Thus, all the message streams are decodable at the clean receivers (by using zero forcing). For more clarity, [34] (Lemmas 1–3) are reviewed in Appendix B.

Similar to the proof of Theorem 1, first we assume that the parameter $T$ is sufficiently large, and at the end of this step, we determine the minimum required $T$. The dimensions of the subspaces $\bar{\mathcal{C}}\_j$ and $\bar{\mathcal{A}}\_j$ are the same as (A8) and (A10), respectively. Hence, we calculate the dimension of $\tilde{\mathcal{C}}\_j$ by counting its base vectors in (A42) as follows:

$$d(\tilde{\mathcal{C}}\_j) = n^{K^2 - K} (sn)^{\varphi} (\upsilon n)^{\theta},\tag{A43}$$

where

$$\varphi = \sum\_{q'=1}^{Q} \left( K - \left| \mathcal{E}\_{L(q')} \right| \right) = KQ - \sum\_{q'=1}^{Q} \left| \mathcal{E}\_{L(q')} \right| = KQ - U^2,$$

$$\theta = \sum\_{q'=1}^{Q} \left| \mathcal{E}\_{L(q')} \right| = U^2.$$

We can see from (10) that $l = K^2 - K + QK$. We define the following parameters:

$$
\Gamma = s^{QK},
$$

$$
\chi = s^{QK - U^2} \upsilon^{U^2},
$$

$$
\zeta = t^{QK}.
$$

Therefore, the normalized asymptotic dimensions of the message and interference subspaces are:

$$D\_N(\bar{\mathcal{C}}\_j) = \Gamma,\tag{A44}$$

$$D\_N(\tilde{\mathcal{C}}\_j) = \chi, \tag{A45}$$

$$D\_N(\bar{\mathcal{A}}\_j) = \max \{ \Gamma, \zeta \}. \tag{A46}$$

Thus, interference alignment Equations (24) and (25) are satisfied.

*(2) Validity of interference alignment conditions at the dirty receivers $j \in \{U+1, \dots, K\}$:* For the dirty receivers, the message subspace $\bar{\mathcal{C}}\_j$ and the interference subspace $\bar{\mathcal{A}}\_j$ are exactly the same as (A17) and (A18). To satisfy interference alignment Equation (28) (if $W$ is replaced with $U$), the subspace $\tilde{\mathcal{A}}\_j$ must be chosen such that:

$$\bigcup\_{i \in \{1, \dots, U\}} \left\{ \text{span} \left( \mathbf{H}^{[ji]} \tilde{\mathbf{V}}^{[i]} \right) \right\} \subseteq \tilde{\mathcal{A}}\_j.$$

Therefore, we can characterize the subspace $\tilde{\mathcal{A}}\_j$ as follows:

$$\tilde{\mathcal{A}}\_j = \text{span}\Big\{ \left[ \mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1) \right] \left[ \mathbf{M}(g\_2(i,q), \mathbf{H}^{[qi]}\_{\mathrm{T-IR}}, \tilde{\mathcal{S}}\_2) \right] \left[ \mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3) \right] \mathbf{w} : g\_1 \in \mathcal{F}\left(\bar{\mathcal{S}}\_1, \{1, \dots, n+1\}\right), g\_2 \in \mathcal{F}\left(\tilde{\mathcal{S}}\_2, \{1, \dots, sn\}\right), g\_3 \in \mathcal{F}\left(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\}\right) \Big\}, \tag{A47}$$

where $\tilde{\mathcal{S}}\_2$ and $\tilde{\mathcal{S}}\_3$ are given by using (A40) and (A41).

By using the same argument given for $\bar{\mathcal{A}}\_j$, $\bar{\mathcal{C}}\_j$, and $\tilde{\mathcal{C}}\_j$ at the clean receivers, the subspaces $\bar{\mathcal{A}}\_j$, $\tilde{\mathcal{A}}\_j$, and $\bar{\mathcal{C}}\_j$ are full-rank and linearly independent almost surely. Then, we have:

$$D\_N(\bar{\mathcal{C}}\_j) = \zeta, \tag{A48}$$

$$D\_N(\bar{\mathcal{A}}\_j) = \max\{\Gamma, \zeta\}, \tag{A49}$$

$$d(\tilde{\mathcal{A}}\_j) = (n+1)^{K^2 - K} (sn)^{QK - U^2} (\upsilon n)^{U^2}, \tag{A50}$$

$$D\_N(\tilde{\mathcal{A}}\_j) = \chi. \tag{A51}$$

Hence, we can see that interference alignment Equations (26)–(29) are satisfied.

*(3) Validity of interference alignment conditions at the $q$-th antenna of the NC-IR, $q \in \{1, \dots, Q\}$:* The interference subspace $\bar{\mathcal{A}}\_{r\_q}$ is exactly the same as (A24) if we replace $W$ with $U$. The message subspaces $\tilde{\mathcal{C}}\_{i,r\_q}$, $i \in \mathcal{E}\_{L(q)}$ and the interference subspace $\tilde{\mathcal{A}}\_{r\_q}$ will change as follows:

$$\tilde{\mathcal{C}}\_{i,r\_q} = \text{span}\left(\mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \tilde{\mathbf{V}}^{[i]}\right) = \text{span}\Big\{\mathbf{H}\_{\mathrm{T-IR}}^{[qi]} \left[\mathbf{M}(g\_1(i,j), \mathbf{H}^{[ji]}, \bar{\mathcal{S}}\_1)\right] \left[\mathbf{M}(g\_2(i,q), \mathbf{H}\_{\mathrm{T-IR}}^{[qi]}, \tilde{\mathcal{S}}\_2)\right] \left[\mathbf{M}(g\_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}\_3)\right] \mathbf{w} : g\_1 \in \mathcal{F}(\bar{\mathcal{S}}\_1, \{1, \dots, n\}), g\_2 \in \mathcal{F}(\tilde{\mathcal{S}}\_2, \{1, \dots, sn\}), g\_3 \in \mathcal{F}(\tilde{\mathcal{S}}\_3, \{1, \dots, \upsilon n\})\Big\}, \tag{A52}$$

where $\tilde{\mathcal{S}}\_2$ and $\tilde{\mathcal{S}}\_3$ are given by using (A40) and (A41), respectively.

To satisfy interference alignment Equation (A35), the subspace $\tilde{\mathcal{A}}_{r_q}$ must be chosen such that:

$$\bigcup_{i \in \{1, \dots, U\},\, i \notin \mathcal{E}_{L(q)}} \operatorname{span}\left(\mathbf{H}^{[qi]}_{\mathrm{T\text{-}IR}}\,\tilde{\mathbf{V}}^{[i]}\right) \subseteq \tilde{\mathcal{A}}_{r_q}.$$

Therefore, we can characterize $\tilde{\mathcal{A}}_{r_q}$ as follows:

$$\tilde{\mathcal{A}}_{r_q} = \operatorname{span}\left\{\left[\mathbf{M}(g_1(i,j), \mathbf{H}^{[ji]}, \tilde{\mathcal{S}}_1)\right]\left[\mathbf{M}(g_2(i,q), \mathbf{H}^{[qi]}_{\mathrm{T\text{-}IR}}, \tilde{\mathcal{S}}_2)\right]\left[\mathbf{M}(g_3(i,q), \mathbf{T}^{[qi]}, \tilde{\mathcal{S}}_3)\right]\mathbf{w} : g_1 \in \mathcal{F}(\tilde{\mathcal{S}}_1,\{1,\dots,n\}),\ g_2 \in \mathcal{F}(\tilde{\mathcal{S}}_2,\{1,\dots,sn+1\}),\ g_3 \in \mathcal{F}(\tilde{\mathcal{S}}_3,\{1,\dots,vn\})\right\}, \tag{A54}$$

where $\tilde{\mathcal{S}}_2$ and $\tilde{\mathcal{S}}_3$ are given by using (A40) and (A41), respectively.

By using the same argument given before, the subspaces $\bar{\mathcal{A}}_{r_q}$, $\tilde{\mathcal{A}}_{r_q}$, and $\tilde{\mathcal{C}}_{i,r_q}$, $i \in \mathcal{E}_{L(q)}$, are full-rank and linearly independent almost surely. We can see that:

$$d(\tilde{\mathcal{C}}_{i,r_q}) = n^{K^2 - K} (sn)^{QK - U^2} (vn)^{U^2}, \quad \forall i \in \mathcal{E}_{L(q)}, \tag{A55}$$

$$D_N(\tilde{\mathcal{C}}_{i,r_q}) = \chi, \tag{A56}$$

so the normalized dimension of the total subspaces that can be de-multiplexed at the $q$-th antenna of the NC-IR is:

$$\sum_{i \in \mathcal{E}_{L(q)}} D_N(\tilde{\mathcal{C}}_{i,r_q}) = \left|\mathcal{E}_{L(q)}\right| \chi. \tag{A57}$$

For $\bar{\mathcal{A}}_{r_q}$, as in the proof of Theorem 1, we have:

$$D_N(\bar{\mathcal{A}}_{r_q}) = \max\{\Gamma, \zeta\}. \tag{A58}$$

For $\tilde{\mathcal{A}}_{r_q}$, we have:

$$d(\tilde{\mathcal{A}}_{r_q}) = n^{K^2 - K} (sn + 1)^{QK - U^2} (vn)^{U^2}, \tag{A59}$$

$$D_N(\tilde{\mathcal{A}}_{r_q}) = \chi. \tag{A60}$$

Thus, we can see that the interference alignment conditions in Equations (30)–(33) are satisfied.

As in the proof of Scheme 1 in Theorem 1, we derive the dimension of the whole received signal space at each receiver. If we define $d_{t,j}$ as the total dimension at the $j$-th receiver and $d_{t,r_q}$ as the total dimension at the $q$-th receiving antenna of the NC-IR, then (53)–(55) are obtained if we replace $W$ and $\mathcal{B}_q$ with $U$ and $\mathcal{E}_{L(q)}$, respectively. Therefore, denoting by $D_{N,t,j}$ the total normalized asymptotic dimension at the $j$-th receiver and by $D_{N,t,r_q}$ that at the $q$-th antenna of the NC-IR, we have:

$$D_{N,t,j} = \Gamma + \chi + \max\{\Gamma, \zeta\}, \quad \forall j \in \{1, \dots, U\}, \tag{A61}$$

$$D_{N,t,j} = \zeta + \chi + \max\{\Gamma, \zeta\}, \quad \forall j \in \{U+1, \dots, K\}, \tag{A62}$$

$$D_{N,t,r_q} = \left|\mathcal{E}_{L(q)}\right| \chi + \chi + \max\{\Gamma, \zeta\}, \quad \forall q \in \{1, \dots, Q\}. \tag{A63}$$

Considering the parameter $T$ as in (59), we have:

$$\lim_{n \to \infty} \frac{T}{n^{K^2 - K + QK}} = \chi + \max\{\Gamma, \zeta\} + \max\left\{\max_{q \in \{1, \dots, Q\}} \left|\mathcal{E}_{L(q)}\right| \chi,\ \zeta,\ \Gamma\right\}. \tag{A64}$$

Moreover, we have:

$$\max_{q \in \{1, \dots, Q\}} \left|\mathcal{E}_{L(q)}\right| = \left\lceil \frac{U}{p} \right\rceil. \tag{A65}$$

Therefore, using (A64) and (A65), we can conclude that:

$$\lim_{n \to \infty} \frac{T}{n^{K^2 - K + QK}} = \chi + \max\{\Gamma, \zeta\} + \max\left\{\left\lceil \frac{U}{p} \right\rceil \chi,\ \zeta,\ \Gamma\right\}. \tag{A66}$$

Moreover, we let:

$$\Gamma = \zeta, \tag{A67}$$

$$\zeta \ge \left\lceil \frac{U}{p} \right\rceil \chi. \tag{A68}$$

By using assumptions (A67) and (A68), we can see that the total normalized length is:

$$\lim_{n \to \infty} \frac{T}{n^{K^2 - K + QK}} = \chi + 2\Gamma. \tag{A69}$$

### **Step 6: DoF Analysis**

Now, we can characterize the total DoF. As stated before, we have $U$ clean receivers, each with a normalized message dimension equal to $\Gamma + \chi$, and $K - U$ dirty receivers, each with a normalized message dimension equal to $\zeta$ (note that we assumed $\zeta = \Gamma$ in (A67)). The total normalized length of $T$ is equal to $\chi + 2\Gamma$. Thus, the total DoF has the following form:

$$\text{DoF} = \max_{\chi \ge 0,\ \Gamma \ge \left\lceil \frac{U}{p} \right\rceil \chi} \frac{U(\chi + \Gamma) + (K - U)\Gamma}{\chi + 2\Gamma}. \tag{A70}$$

By assuming that Γ = *βχ*, we have:

$$\text{DoF} = \max_{\beta \ge \left\lceil \frac{U}{p} \right\rceil} \frac{U(1+\beta) + (K-U)\beta}{1+2\beta} \tag{A71}$$

$$= \frac{K}{2} + \max_{\beta \ge \left\lceil \frac{U}{p} \right\rceil} K\,\frac{\frac{U}{K} - \frac{1}{2}}{1+2\beta} = \frac{K}{2} + \max\left\{K\,\frac{\frac{U}{K} - \frac{1}{2}}{1 + 2\left\lceil \frac{U}{p} \right\rceil},\ 0\right\}, \tag{A72}$$

where (A72) follows from the fact that if $\frac{U}{K} > \frac{1}{2}$, we set $\beta = \left\lceil \frac{U}{p} \right\rceil$, and if $\frac{U}{K} < \frac{1}{2}$, we let $\beta \to \infty$. This completes the proof.
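The closed form in (A72) can be sanity-checked numerically. The sketch below (plain Python; the values of $K$, $U$, and $p$ are illustrative, not taken from the proof) compares a brute-force search over $\beta$ against the closed-form expression:

```python
import math

def dof_objective(K, U, beta):
    # Objective of (A71): [U(1 + beta) + (K - U) beta] / (1 + 2 beta)
    return (U * (1 + beta) + (K - U) * beta) / (1 + 2 * beta)

def dof_closed_form(K, U, p):
    # Closed form of (A72)
    beta_min = math.ceil(U / p)
    return K / 2 + max(K * (U / K - 0.5) / (1 + 2 * beta_min), 0.0)

def dof_brute_force(K, U, p, beta_max=100_000):
    # Search over beta >= ceil(U/p); for U/K < 1/2 the supremum is
    # approached as beta -> infinity, so a large cutoff is used
    beta_min = math.ceil(U / p)
    return max(dof_objective(K, U, b) for b in range(beta_min, beta_max))

for K, U, p in [(6, 4, 2), (6, 2, 2), (5, 4, 3)]:
    assert abs(dof_closed_form(K, U, p) - dof_brute_force(K, U, p)) < 1e-3
```

Since the objective $(U + K\beta)/(1+2\beta)$ is monotone in $\beta$, the search over integers suffices to approach the supremum from either boundary.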

### **References**


### *Review* **Coding for Large-Scale Distributed Machine Learning**

**Ming Xiao \* and Mikael Skoglund \***

Division of Information Science and Engineering, KTH Royal Institute of Technology, Malvinas väg 10, 100 44 Stockholm, Sweden

**\*** Correspondence: mingx@kth.se (M.X.); skoglund@kth.se (M.S.)

**Abstract:** This article aims to give a comprehensive and rigorous review of the principles and recent developments of coding for large-scale distributed machine learning (DML). With increasing data volumes and the pervasive deployment of sensors and computing machines, machine learning has become more distributed, and the numbers of computing nodes and data volumes involved in learning tasks have increased significantly. For large-scale distributed learning systems, significant challenges have appeared in terms of delay, errors, efficiency, etc. To address these problems, various error-control and performance-boosting schemes have been proposed recently for different aspects, such as the duplication of computing nodes. More recently, error-control coding has been investigated for DML to improve reliability and efficiency. The benefits of coding for DML include high efficiency and low complexity. Despite the benefits and recent progress, however, there is still a lack of a comprehensive survey on this topic, especially for large-scale learning. This paper seeks to introduce the theories and algorithms of coding for DML. For primal-based DML schemes, we first discuss gradient coding with the optimal code distance. Then, we introduce random coding for gradient-based DML. For primal–dual-based DML, i.e., ADMM (alternating direction method of multipliers), we propose a separate coding method for the two steps of distributed optimization, and coding schemes for the different steps are discussed. Finally, a few potential directions for future works are given.

**Keywords:** error-control coding; gradient coding; random codes; ADMM

### **1. Background and Motivations**

With the fast development of computing and communication technologies, and emerging data-driven applications, e.g., IoT (Internet of Things), social network analysis, smart grids and vehicular networks, the volume of data for various intelligent systems with machine learning has increased explosively, along with the number of involved computing nodes [1], i.e., at a large scale. For instance, learning systems based on MapReduce [2] have been widely used and may often reach data volumes of petabytes (10<sup>15</sup> bytes), which may be produced and stored in thousands of separate nodes [3,4]. Large-scale machine learning is pervasive in our societies and industries. Meanwhile, it is inefficient (sometimes even infeasible) to transmit all data to a central node for analysis. For this reason, distributed machine learning (DML), which stores and processes all or parts of the data in different nodes, has attracted significant research interest and applications [1,3–16]. There are different methods of implementing DML, i.e., the primal method (e.g., distributed gradient descent [4,7], federated learning [5,6]) and the primal–dual method (e.g., the alternating direction method of multipliers (ADMM)) [16]. In a DML system, participating nodes (i.e., agents or workers) normally process local data and send the learning model information to other nodes for consensus. For instance, in a typical federated learning system [5,6], worker nodes run multiple rounds of gradient descent (local epochs) with local data and the received global models. Then, the updated local models are sent to the server for aggregation into new global models (normally a weighted sum). The models are normally much smaller than the raw data. Thus, federated learning saves significant communication costs, and meanwhile, transmitting models generally provides better privacy than sending raw data over networks. Actually, in addition to federated learning, other DML also has the benefits

**Citation:** Xiao, M.; Skoglund, M. Coding for Large-Scale Distributed Machine Learning. *Entropy* **2022**, *24*, 1284. https://doi.org/10.3390/ e24091284

Academic Editors: H. Vincent Poor, Onur Günlü, Rafael F. Schaefer and Holger Boche

Received: 12 August 2022 Accepted: 8 September 2022 Published: 12 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

of communication efficiency and improved privacy since model information has, in general, smaller volumes and better privacy than raw data.

Despite the various benefits, there are severe challenges in the implementation of DML, especially for large-scale DML. Ideally, DML algorithms have speedup gains that scale linearly with the number of participating learning machines (computing nodes). However, the practical speedup gain of DML is limited by various bottlenecks and is still far from the theoretical upper limits [17,18]. Among others, significant bottlenecks include communication loads, security, global convergence, synchronization, slow computing nodes, complex optimization functions, etc. For instance, due to the limitations of computing capability and communication networks, a part of the computing nodes may respond slowly and become the bottleneck of DML systems if the fast-responding nodes have to wait for them. These nodes are often referred to as straggler nodes [4], and are also called system noise [19]. To efficiently combat straggler nodes, many schemes have been proposed, such as repetition nodes [20,21], blacklisting straggler nodes [22] and error-control codes [4,8–14,23–25]. The blacklisting method detects the straggler nodes and does not schedule more tasks to them. Thus, it is a type of *after-event* approach. The repetition of computing nodes requires substantial resources and a suitable mechanism to detect straggler nodes and find corresponding repetition nodes. Moreover, it is rather expensive to repeat all computing tasks and related data. More recently, error-control coding was proposed for DML by regarding straggler nodes as erasures, which can be corrected by coded data from non-straggler nodes; this was shown to be much more efficient than the schemes based on replication. Error-control coding can correct the loss caused by straggler nodes in the current learning round and thus is a type of *current-event* approach.

In [8], more practical computing networks with hierarchical structures were studied. For such networks, hierarchical coding schemes based on multiple MDS codes were proposed to reduce computation time. In [9], each multiplication matrix was further divided into sub-matrices, and all sub-matrices were encoded by MDS codes (e.g., Reed–Solomon codes). Thus, the parts already computed in straggler nodes can be exploited, and the computing time can be further reduced. However, as the number of nodes and sub-matrices increases, the complexity of the MDS codes increases substantially. In [25], a deterministic construction of Reed–Solomon codes was proposed for gradient-based DML. The generator matrix of the codes in [25] is sparse and well balanced, and thus the waiting time for gradient computation is reduced. In [10], a new entangled polynomial coding scheme was proposed to minimize the recovery threshold of master–worker networks with generalized configurations for matrix-multiplication-based DML. In [26,27], coding schemes were considered for matrix multiplication in heterogeneous computing networks. However, the complexity of the coding in [26,27] is still very high for large-scale DML, since matrix inversion is used for decoding; moreover, the coding matrix is pre-fixed and is hard to adapt to varying networks. In [28], low-complexity decoding was proposed for matrix multiplication in DML. However, the results in [28] are preliminary and hard to use for heterogeneous networks, and the communication load is still very high. In [11], coding schemes based on the Lagrange polynomial were proposed to encode blocks among worker nodes. The proposed codes may achieve optimal tradeoffs among redundancy (against straggler nodes), security (against Byzantine modification) and privacy. However, the coding scheme in [11] is also based on MDS codes, which may not be flexible and may have high complexity for large-scale DML.
Furthermore, the existing coding schemes are mostly for matrix multiplication (for distributed gradient descent), i.e., the primal method. Another important class of large-scale DML is based on primal–dual methods, i.e., ADMM [16], for which codes have seldom been studied. Thus, coding for ADMM-based large-scale DML should be developed to combat straggler nodes, reduce communication loads and increase efficiency.

Despite the progress in coding for straggler nodes [4,8–14,24,25], the results are still preliminary, and there are various critical challenges to exploiting the advantages of DML, especially for *large-scale learning*: (1) Reliability and complexity: though coding has been proposed to address straggler nodes and improve reliability, the existing schemes are mainly for systems with a limited number of nodes or limited data. The coded DML schemes based on existing optimal error-control codes (i.e., maximum distance separable (MDS) codes) [4,24,25] have very high encoding/decoding complexity when the number of involved nodes or the data volume scales up. Moreover, MDS codes treat every coding node equally and are not optimal for heterogeneous networks (e.g., IoT or mobile networks). (2) Communication loads: with increasing numbers of nodes or data volumes, the communication loads for exchanging model updates among learning nodes quickly increase. Thus, coding schemes that are efficient in communication loads are critical for large-scale DML. (3) Limited learning functions: most of the existing coding schemes for DML are for gradient descent (the primal method), i.e., combining coding with matrix multiplication and/or data shuffling [4,8–14,24,25]. Coding for many other important distributed learning functions, e.g., the primal–dual optimization functions (possibly non-smooth or non-convex) in ADMM, has seldom been explored. Moreover, existing coding for DML often runs in a master–worker structure, which may not be efficient (or even feasible) for certain applications, e.g., those without master nodes. Thus, coding for fully decentralized DML should also be investigated. By encoding the messages to (or/and from) different destinations/sources in intermediate nodes, network coding shows the benefit of reducing the information flow in networks [29,30]. Moreover, it has been shown that network coding can improve the reliability and security of communication networks [12,31,32].
Thus, it is also valuable to discuss the applications of network coding to DML.

In what follows, we first introduce the basics of DML in Section 2. Then, we discuss how error-control coding can help with the straggler problem in Section 3, the random coding construction in Section 4, and coding for primal–dual-based DML (ADMM) in Section 5. Finally, conclusions and a discussion of future works are given in Section 6.

### **2. Introduction of Distributed Machine Learning**

In general, DML has two steps: (1) Agents learn local models from local data, possibly in combination with global models. This step may iterate over multiple rounds, i.e., local iterations, to produce a local model. (2) With the local models, agents reach consensus. These two steps may also iterate over multiple rounds, i.e., global iterations. There are different methods of implementing the two steps, for instance, the primal and primal–dual methods mentioned above. There are also different ways to achieve consensus, for instance, through a central server, i.e., the master–slave method, or fully decentralized. For the former, the implementation is relatively straightforward. For the latter, there are different approaches, as will be discussed later on. For Step (1), common local learning machines include, for example, linear (polynomial) regression, classification and neural networks. The common approach of these learning algorithms is to find the model parameters (e.g., the weights in neural networks) that minimize a cost function (such as the mean-squared error/L2 loss, hinge loss or cross-entropy loss). In general, convex cost functions should be chosen. For instance, for linear regression, we denote by *x*, *y* the input and output of the training data, respectively, and by *w* (normally a matrix or a vector) the weight to be optimized. If the mean-squared error cost function is used, then the learning machine works as

$$\min_{w} \|\mathbf{x} w - y\|^2. \tag{1}$$

To find the optimal *w*, one common approach is to use gradient descent, which is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. If the cost function is convex, then the local minimum is also the global minimum [33]. For instance, in the training process of neural networks, gradient descent is commonly used to find the optimized weights and biases iteratively. The gradient is found by taking partial derivatives of the cost function with respect to the optimizing variables (weights and biases). For instance, for node *i*, the optimizing variables can be updated by

$$w^i_{t+1} = w^i_t - \gamma \nabla F(w^i_t, D_i), \tag{2}$$

where $t$ is the iteration step index, $\gamma$ is the step size, $D_i$ is the data set (training samples) in node $i$, $F(w^i_t)$ is the cost function with the current optimizing variables, and $\nabla F(w^i_t, D_i)$ denotes the gradient for the given $(w^i_t, D_i)$ (obtained by partial derivatives). The training process is normally performed in batches of data. $D_i$ can be further divided into subsets, e.g., $N$ subsets, i.e., $D_i = \{D^1_i, D^2_i, \cdots, D^N_i\}$. If the subsets are mutually exclusive, the gradients from different subsets are independent, i.e., $\nabla F(w^i_t, D_i) = \{\nabla F(w^i_t, D^1_i), \nabla F(w^i_t, D^2_i), \cdots, \nabla F(w^i_t, D^N_i)\}$. However, in many DML systems, e.g., those based on MapReduce file systems or sensor nodes in neighboring areas, there may be overlapping data subsets, i.e., $D^k_i = D^n_j$ for certain $k$, $n$ and $i \neq j$. Therefore, there may be identical gradients in different nodes. These properties were recently exploited for coding. It is clear from (2) that, for given gradients, the steps of finding the optimal parameters are mainly linear matrix operations (matrix multiplications). Actually, in addition to neural networks, one core operation of many other learning algorithms, such as regression and power-iteration-like algorithms, is also matrix multiplication [4]. Thus, one major class of coding schemes for DML is based on the matrix multiplications of the learning process [4,8–14,24,25].
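As a concrete illustration of (1) and (2), the following numpy sketch (sizes, step size, and data are illustrative, not from any particular DML system) runs gradient descent for linear regression, assembling the full gradient from per-subset gradients over disjoint partitions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 3))            # training inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                          # noiseless targets for a clean check

def grad(w, Xs, ys):
    # Gradient of ||Xs w - ys||^2 with respect to w
    return 2 * Xs.T @ (Xs @ w - ys)

# Split the data into N = 3 disjoint subsets, mimicking D_i = {D_i^1, ..., D_i^N}
parts = np.array_split(np.arange(90), 3)

w = np.zeros(3)
gamma = 1e-3                            # step size
for _ in range(500):
    # Full gradient = sum of per-subset gradients (subsets are disjoint)
    g = sum(grad(w, X[idx], y[idx]) for idx in parts)
    w -= gamma * g

assert np.allclose(w, w_true, atol=1e-4)
```

The decomposition of the gradient over disjoint subsets is exactly the structure that the coding schemes in the following sections exploit.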
Clearly, the major coding schemes (forward error-control coding and network coding) are linear in terms of encoding and decoding operations, i.e., *C* = *M* × *W*, where *C*, *M* and *W* are the codeword (vector), coding matrix and information message, respectively. Since both the learning and coding operations are linear matrix operations, the coding matrix and learning matrix can be *jointly* optimized. On the other hand, coding can be optimized to provide efficient and reliable information pipelines for DML systems. In this way, the coding and DML matrices are *separately* optimized. Separate optimization has actually been widely studied for many years for existing systems, due to its simpler design relative to joint design. There are many works in the literature on the separate optimization of learning systems and coding schemes. We will focus on joint design in this article.

### **3. Coding for Reliable Large-Scale DML**

In this section, we first give a review of the basic principles of coding for reliable DML. Then, we discuss two optimal constructions of codes for DML.

A toy example of how coding can help to deal with stragglers is given in Figure 1 [34]. For instance, it can be a federated learning network with worker and server nodes. There is partial overlapping of data segments in different worker nodes and thus partial overlapping of gradients. As in Figure 1, we divide the data set of a node into multiple smaller sets to denote the partial overlapping of different nodes. Meanwhile, multiple sets in a node are also necessary for encoding, as shown in the figure, since one data set corresponds to one source symbol of the code. In the server node, a weighted sum of the gradients is needed. In the figure, three worker nodes have different data parts *D*1, *D*2, *D*3, which are used to compute the gradients *G*1, *G*2, *G*3, respectively. The server does not need the individual gradients but only their sum *Gs* = *G*1 + *G*2 + *G*3. We can easily see that the gradients from *any* two nodes suffice to calculate *Gs*. For instance, if worker 3 is in outage, then *Gs* = 2(*G*1/2 + *G*2) − (*G*2 − *G*3) from the two coded blocks transmitted by worker 1 and worker 2. If there is no coding, then worker 1 and worker 2 have to transmit *G*1, *G*2, *G*3 separately with three blocks after the coordination operations. Thus, coding can save transmission and coordination loads.

Though the idea of applying coding to DML is straightforward, as shown in the above toy example, the code design is rather challenging for large-scale DML, i.e., when the numbers of nodes and/or gradients per node are very large. One big challenge is how to construct the encoding and decoding matrices, especially with limited complexity. In what follows, we first give a brief introduction to MapReduce file systems, which are often used in DML. Then, we discuss coding schemes with deterministic constructions [34]. A random construction based on fountain codes, which normally has lower complexity, is given in the next section [13,14].

**Figure 1.** Coded DML with a master–worker structure can tolerate any of one straggler node.

In large DML systems, MapReduce is a commonly used distributed file storage system. As shown in Figure 2, there are three stages in MapReduce file systems: map, shuffling and reduce. In the system, data are stored in different nodes. In the map stage, stored data are sent to different computing nodes (e.g., cloud computing nodes), according to pre-defined protocols. In the shuffling stage, the computed results (e.g., gradients) are exchanged among nodes. Finally, the end users collect the computed results in the reduce stage. MapReduce can be used in federated learning, which was originally proposed for applications in mobile devices [5]. In such a scenario, data are first sent to different worker nodes in the map stage, according to certain design principles. Then, in the shuffling stage, local model parameters are aggregated in the server node. Finally, the aggregated models are obtained in the final iteration at the server. In this way, worker nodes have all the data necessary for computing local models, sent from the storage nodes. However, there may be straggling worker nodes, due to either slow computing at the node or transmission errors in the channels. In such a scenario, gradient coding [34] can be used to correct for the straggler nodes.

**Figure 2.** A common realization of DML based on MAPReduce.

To construct gradient codes, we use *A* to denote the possible straggler patterns multiplied by the corresponding decoding matrix, and *B* to denote how the different gradients (or model parameters) are combined in the worker nodes. Thus, *A* denotes the *transmission matrix multiplied by decoding* matrices in some sense (as they recover the transmitted gradients from the received coded symbols), and *B* can be regarded as an *encoding* matrix. Assuming that *k* is the number of different gradients (data partitions) in all nodes and that there are in total *n* output channels in all nodes, the dimension of *B* is *n* × *k*. Denoting $\bar{g} = [g_1, g_2, \cdots, g_k]^T$ as the vector of all gradients, worker node *i* transmits $b_i \bar{g}$, where $b_i$ is the *i*-th row of *B* and the encoding vector at node *i*. The dimension of *A* is *k* × *n*. A row of *A* corresponds to an instance of a straggling pattern, in which a 0 marks a straggler node, and specifies how the gradients are reproduced in the server. Thus, the rows of *A* cover all possible ways of straggling. Denoting *f* as the number of surviving workers (non-stragglers), there are at most *n* − *f* zeros in each row of *A*. In the example of Figure 1, we only need the sum of the gradients from the worker nodes instead of the values of the individual gradients. Thus, we have $AB = \mathbf{1}_{k \times k}$

and each entry of $AB\bar{g}$ is identically $G_1 + G_2 + G_3$, where $\mathbf{1}_{k \times k}$ denotes the all-ones matrix. For this example, we can easily see that

$$A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & -1 & 0 \end{bmatrix}, \quad \text{and} \quad B = \begin{bmatrix} 1/2 & 1 & 0 \\ 0 & 1 & -1 \\ 1/2 & 0 & 1 \end{bmatrix}. \tag{3}$$

Clearly, if we want the individual values of $\bar{g}$, we should redesign *A* and *B* such that *AB* is an identity matrix. Similarly, if we want a weighted sum of the gradients (with weights more general than 1), *A* and *B* should be redesigned accordingly. From this description, we can see that the main challenge in designing a gradient code is to find a suitable encoding matrix *B* that can correct the straggling loss defined by *A*. In [34], two different ways of finding *B* and the corresponding *A* are given, i.e., the fractional repetition and cyclic repetition schemes, as detailed in the following.
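The pair (*A*, *B*) in (3) can be checked directly. The numpy sketch below (the numeric gradient values are arbitrary stand-ins for *G*1, *G*2, *G*3) verifies that *AB* is the all-ones matrix and that each row of *A* recovers *Gs* from the two non-straggling workers:

```python
import numpy as np

A = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, -1, 0]], dtype=float)
B = np.array([[0.5, 1, 0],
              [0, 1, -1],
              [0.5, 0, 1]], dtype=float)

# Every row of AB is all ones, so every entry of A B g equals G1 + G2 + G3
assert np.allclose(A @ B, np.ones((3, 3)))

g = np.array([3.0, -1.0, 4.0])          # stand-ins for G1, G2, G3
coded = B @ g                            # block transmitted by each worker
Gs = g.sum()

# Row i of A has a 0 at the straggler position, so decoding uses only
# the two surviving coded blocks
for row in A:
    assert np.isclose(row @ coded, Gs)
```

For instance, the first row of *A* ignores worker 1 entirely, matching the toy example where *Gs* is rebuilt from the coded blocks of workers 2 and 3.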

We denote *n* and *s* as the numbers of worker nodes and straggler nodes, respectively, and assume *n* is a multiple of *s* + 1. The fractional repetition construction is then described by the following steps.


By the second step, in a group, the first worker obtains the first *s* + 1 partitions from the map stage and computes the first *s* + 1 gradients; the second worker obtains the second *s* + 1 partitions and computes the second *s* + 1 gradients, and so on. The encoding of each group of workers can be denoted by a block matrix $\bar{B}_{block}(n,s) \in \mathbb{R}^{\frac{n}{s+1} \times n}$ with

$$\bar{B}_{block}(n,s) = \begin{bmatrix} \mathbf{1}_{1\times(s+1)} & \mathbf{0}_{1\times(s+1)} & \cdots & \mathbf{0}_{1\times(s+1)} \\ \mathbf{0}_{1\times(s+1)} & \mathbf{1}_{1\times(s+1)} & \cdots & \mathbf{0}_{1\times(s+1)} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{1\times(s+1)} & \mathbf{0}_{1\times(s+1)} & \cdots & \mathbf{1}_{1\times(s+1)} \end{bmatrix}_{\frac{n}{s+1}\times n}. \tag{4}$$

Here, $\mathbf{1}_{1\times(s+1)}$ and $\mathbf{0}_{1\times(s+1)}$ denote the $1 \times (s+1)$ all-ones and all-zeros row vectors, respectively. Then, *B* is obtained by stacking *s* + 1 copies of $\bar{B}_{block}(n,s)$, i.e.,

$$B = B_{frac} = \begin{bmatrix} \bar{B}^1_{block}(n,s) \\ \bar{B}^2_{block}(n,s) \\ \vdots \\ \bar{B}^{(s+1)}_{block}(n,s) \end{bmatrix}, \tag{5}$$

where $\bar{B}^i_{block}(n,s) = \bar{B}_{block}(n,s)$ for $i \in \{1, \cdots, s+1\}$. In addition to the encoding matrix $B_{frac}$, reference [34] also gives the algorithm for constructing the corresponding *A* matrix, as follows.

It was shown in [34] that, with the fractional repetition scheme, $B = B_{frac}$ from (5) and *A* from Algorithm 1 can correct any *s* stragglers. This is stated more formally in the following theorem.

```
Algorithm 1 Algorithm to compute A for fractional repetition coding.
Input: B = Bfrac;
f = binom(n, s)
A = zeros(f, n)
for I ⊆ [n] s.t. |I| = (n − s) do
    a = zeros(1, k)
    x = ones(1, k) / B(I, :)
    a(I) = x
    A = [A; a]
Output: A s.t. AB = 1_{f×k};
```

**Theorem 1.** *Consider B* = *Bfrac from (5) for a given number of workers n and stragglers s* (< *n*)*. Then, the scheme* (*A*, *Bfrac*)*, with A from Algorithm 1, is robust to any s stragglers.*

Here, we refer interested readers to [34] for the proof. In addition to the fractional repetition construction, another way of finding the *B* matrix is the cyclic repetition scheme, which does not require *n* to be a multiple of *s* + 1. The algorithm to construct the cyclic repetition *B* matrix is given as follows.

Actually, the resultant matrix $B = B_{cyc}$ from Algorithm 2 has the following support (non-zero entries, shown schematically):

$$\operatorname{supp}(B_{cyc}) = \begin{bmatrix} * & * & \cdots & * & 0 & \cdots & 0 \\ 0 & * & * & \cdots & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ * & \cdots & * & 0 & \cdots & 0 & * \end{bmatrix},$$

where ∗ denotes a non-zero entry of $B_{cyc}$; each row of $\operatorname{supp}(B_{cyc})$ contains $(s+1)$ non-zero entries, whose positions are shifted right by one step in each successive row and wrap around cyclically until the last row. The construction of the *A* matrix follows Algorithm 1, also for $B_{cyc}$. It was shown in [34] that the cyclic repetition scheme can also correct any *s* stragglers:

```
Algorithm 2 Algorithm to construct B = Bcyc.
Input: n, s(< n);
H = randn(s, n)
H(:, n) = −sum(H(:, 1 : n − 1), 2)
B = zeros(n)
for i = 1 : n do
   j = mod(i − 1 : s + i − 1, n) + 1
   B(i, j) = [1; −H(:, j(2 : s + 1)) \ H(:, j(1))]
Output: B ∈ R^(n×n) with (s + 1) non-zeros in each row.
```
**Theorem 2.** *Consider* $B = B\_{cyc}$ *from Algorithm 2, for a given number of workers n and stragglers s* (< *n*)*. Then, the scheme* (*A*, $B\_{cyc}$)*, with A from Algorithm 1, is robust to any s stragglers.*
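A small Python rendering of Algorithm 2 (our sketch; MATLAB's `randn` replaced by NumPy's generator) also lets one check numerically that the all-ones vector lies in the span of the rows of any *n* − *s* surviving workers:

```python
import itertools
import numpy as np

def b_cyc(n, s, seed=0):
    """Cyclic repetition matrix: row i has s+1 non-zeros in positions
    i, ..., i+s (mod n), chosen so that a random H whose rows sum to
    zero annihilates B (H @ B.T = 0), as in Algorithm 2."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((s, n))
    H[:, -1] = -H[:, :-1].sum(axis=1)        # make each row of H sum to 0
    B = np.zeros((n, n))
    for i in range(n):
        j = np.arange(i, i + s + 1) % n      # cyclic support of row i
        B[i, j[0]] = 1.0                     # fix the leading coefficient
        # solve for the remaining s coefficients of row i
        B[i, j[1:]] = np.linalg.solve(-H[:, j[1:]], H[:, j[0]])
    return B

n, s = 5, 2
B = b_cyc(n, s)
# the all-ones vector is in the span of any n-s rows (Condition 1 / B-Span)
for I in itertools.combinations(range(n), n - s):
    x, *_ = np.linalg.lstsq(B[list(I)].T, np.ones(n), rcond=None)
    assert np.allclose(B[list(I)].T @ x, 1.0, atol=1e-8)
```

With probability one over the random *H*, every such survivor set admits an exact decoding vector.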

Fractional repetition and cyclic repetition schemes provide specific encoding and decoding methods for master–worker DML that tolerate any *s* stragglers. More generally, [34] also gives a condition on the matrix *B* that is necessary for tolerating any *s* stragglers, stated as follows.

Condition 1 (B-Span): Consider any scheme (*A*, *B*) robust to any *s* stragglers, given *n* (*s* < *n*) workers. Then, for every subset $I \subseteq [n]$ with $|I| = n - s$, it holds that $\mathbf{1}\_{1\times k} \in span\{b\_i \mid i \in I\}$, where $b\_i$ denotes the *i*-th row of *B* and span{·} is the linear span of a set of vectors.

If the *A* matrix is constructed by Algorithm 1, Condition 1 is also sufficient.

**Corollary 1.** *If A matrix is constructed by Algorithm 1 and B satisfies Condition 1,* (*A*, *B*) *can correct any s stragglers.*

Numerical results: In Figure 3, the average time per iteration of different schemes from [34] is compared. In the *naive scheme*, the data are divided uniformly across all workers without replication, and the master waits for all workers to send their gradients. In the *ignoring the s stragglers scheme*, the data distribution is the same as in the naive scheme; however, the master node only waits until *n* − *s* worker nodes have successfully sent their gradients (there is no need to wait for all gradients). Thus, as discussed in [34], ignoring the stragglers may lose generalization performance, since part of the datasets of the straggler nodes is ignored. The learning algorithms are based on logistic regression. The training data are from the Amazon Employee Access dataset from Kaggle. The delay is introduced by the computing latency of AWS clusters, and there is no transmission error. As shown in the figure, the naive scheme performs the worst. With an increasing number of stragglers, the coding schemes also outperform the ignoring-stragglers scheme, as expected.

**Figure 3.** Comparison of average time per iteration on the Amazon Employee Access dataset [34].

### **4. Random Coding Construction for Large-Scale DML**

The gradient coding in [34] works well for DML schemes with a master–worker structure of limited size (a finite number of nodes and limited data partitions). However, the deterministic construction of encoding and decoding matrices may be challenging when the number of nodes or data partitions (e.g., *n* or *k*) is large. The first challenge is the complexity of encoding and decoding, both of which are based on matrix multiplication and may be rather complex, especially for decoding (e.g., based on Gaussian elimination). Though DML with MDS codes is optimal in terms of code distance (i.e., the number of tolerable straggler nodes), the coding complexity becomes rather high as the number of participating nodes grows, i.e., for hundreds or even thousands of computing nodes. For instance, Reed–Solomon codes normally need to operate in non-binary fields, which is of high complexity. Another challenge is the lack of flexibility. Both fractional repetition and cyclic repetition coding schemes assume static networks (worker nodes and data). In practice, however, the participating nodes may vary, for example, among mobile nodes or sensors. In the mobile computing scenario, the number of participating nodes may be unknown, and it will be rather difficult to design deterministic coding matrices (*A* or *B*) in such a scenario. Similarly, if the data come from sensors, the amount of data may also vary. Thus, deterministic code constructions are hard to adapt to these scenarios, which are nevertheless very common in large-scale learning networks, and coding schemes that are efficient in varying networks and of low complexity are preferable for large-scale DML. In [13,14], we investigated random coding for DML (or distributed computing in general) to address these problems. Our coding scheme is based on fountain codes [35–37] and is introduced as follows.

*Encoding Phase:* As shown in Figure 4, we consider a network with multiple storage and computing/fog nodes. Let $FN\_f$ denote the *f*-th fog node and $SU\_s$ the *s*-th storage unit, with $f \in \{1, 2, \cdots, F\}$ and $s \in \{1, 2, \cdots, S\}$, respectively. Let $D\_f$ denote the dataset that node *f* needs to finish a learning task. $D\_f$ is obtained from the storage units available to node *f*. For instance, in a DML system with wireless links as in Figure 4, $D\_f$ is the union of the data of all storage units within the communication range of $FN\_f$ (i.e., within $R\_f$). Similar to federated learning, $FN\_f$ uses the current model parameters to calculate gradients, namely intermediate gradients, denoted as $g\_f = [g\_{f,1}, g\_{f,2}, \cdots, g\_{f,|D\_f|}]$, where $g\_{f,a}$ is the gradient trained on data item *a* ($a \in D\_f$) and $|D\_f|$ is the size of $D\_f$. Meanwhile, fog nodes need to calculate the intermediate model parameters (e.g., weights) $w\_f = [w\_{f,1}, w\_{f,2}, \cdots, w\_{f,|w\_f|}]$, where $|w\_f|$ is the length of the model parameters learned at $FN\_f$. The intermediate gradients and model parameters are then sent to other fog nodes (or the central server if there is one) for further processing after encoding. The coding process for $g\_f$ is as follows.

• A number $d\_g$ is selected according to the degree distribution $\Omega(x) = \sum\_{d\_g=1}^{|D\_f|} \Omega\_{d\_g} x^{d\_g}$, i.e., degree $d\_g$ is chosen with probability $\Omega\_{d\_g}$;

• Then, $d\_g$ intermediate gradients are chosen uniformly at random from $g\_f$ and linearly combined into one coded intermediate gradient, and the process is repeated for each coded symbol, as in standard fountain (LT) encoding.
Ω(*x*) can be optimized based on the probability of straggling (regarded as erasure) due to channel errors, slow computing, etc. The optimization of the degree distribution for distributed fountain codes can be found in, for example, [38]; we do not discuss it here due to space limitations. With the above coding process, the resulting coded intermediate gradients are

$$\mathbf{c}\_{f}^{g} = [\mathbf{g}\_{f,1}, \mathbf{g}\_{f,2}, \dots, \mathbf{g}\_{f,|D\_f|}] \mathbf{G}\_f^{g} = \mathbf{g}\_f \mathbf{G}\_{f}^{g}, \tag{7}$$

where $G\_f^{g}$ is the generator matrix at fog node $FN\_f$. The encoding process for $w\_f$ is the same as that of $g\_f$, with a possibly different degree distribution $\mu(x) = \sum\_{d\_w=1}^{|w\_f|} \mu\_{d\_w} x^{d\_w}$. The resulting $Q\_f^{w} = (1 + \eta\_f)|w\_f|$ coded intermediate parameters can be written as $c\_f^{w} = w\_f G\_f^{w}$, where $G\_f^{w}$ is the generator matrix at $FN\_f$ for the model parameters.
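The encoding step can be sketched in a few lines of Python (our illustration; the binary combinations and the particular degree distribution `omega` are assumptions, not the optimized distribution of [38]):

```python
import numpy as np

rng = np.random.default_rng(1)

def lt_encode(g, n_coded, omega):
    """Fountain-style (LT) encoding of |D_f| intermediate gradients.

    g: (|D_f|, p) array, one gradient per row.
    omega: degree distribution, omega[d-1] = probability of degree d.
    Returns the coded gradients c_f^g and the generator matrix G_f^g.
    """
    m, _ = g.shape
    G = np.zeros((m, n_coded))
    for c in range(n_coded):
        d = rng.choice(np.arange(1, m + 1), p=omega)   # sample a degree
        rows = rng.choice(m, size=d, replace=False)    # pick d gradients
        G[rows, c] = 1.0                                # binary combination
    return g.T @ G, G                                   # c_f^g = g_f G_f^g

# toy example: 4 gradients of dimension 3, 25% overhead -> 5 coded symbols
g = rng.standard_normal((4, 3))
omega = np.array([0.5, 0.3, 0.15, 0.05])               # assumed distribution
c, G = lt_encode(g, 5, omega)
```

Each column of `G` records which gradients were combined into the corresponding coded symbol, exactly the information a receiver needs for decoding.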

**Figure 4.** Distributed machine learning with multiple data storage and computing/fog nodes.

*Exchanging Phase:* The coded intermediate gradients $c\_f^{g}$ and model parameters $c\_f^{w}$, $f \in \{1, 2, \cdots, F\}$, are exchanged among the fog nodes. Let *M* be the total number of distinct data items over all *F* nodes, $M \le \sum\_{f=1}^{F} |D\_f|$, where equality holds only if the *F* datasets are disjoint.

*Decoding Phase:* The generator matrices for the coded intermediate gradients and model parameters received at $FN\_f$ from fog node $FN\_i$, $i \in \{1, 2, \cdots, F\} \setminus \{f\}$, are $\tilde{G}\_{i,f}^{g}$ with size $|D\_i| \times Q\_{i,f}^{g}$ and $\tilde{G}\_{i,f}^{w}$ with size $|w\_i| \times Q\_{i,f}^{w}$, respectively, where $Q\_{i,f}^{g} = (1 - \epsilon\_{i,f})Q\_i^{g}$ and $Q\_{i,f}^{w} = (1 - \epsilon\_{i,f})Q\_i^{w}$. Here, $\epsilon\_{i,f}$ denotes the straggling probability from $FN\_i$ to $FN\_f$ due to various reasons, e.g., physical-layer erasure, slow computing, and congestion. Thus, the generator matrices corresponding to the received coded intermediate gradients and model parameters at $FN\_f$ can be written as $\tilde{G}\_f^{g} = [\mathbf{1}\_1\tilde{G}\_{1,f}^{g}, \cdots, \mathbf{1}\_{f-1}\tilde{G}\_{f-1,f}^{g}, \mathbf{1}\_{f+1}\tilde{G}\_{f+1,f}^{g}, \cdots, \mathbf{1}\_{F}\tilde{G}\_{F,f}^{g}]$ and $\tilde{G}\_f^{w} = [\mathbf{1}\_1\tilde{G}\_{1,f}^{w}, \cdots, \mathbf{1}\_{f-1}\tilde{G}\_{f-1,f}^{w}, \mathbf{1}\_{f+1}\tilde{G}\_{f+1,f}^{w}, \cdots, \mathbf{1}\_{F}\tilde{G}\_{F,f}^{w}]$, respectively. Here, $\mathbf{I} = \{\mathbf{1}\_1, \cdots, \mathbf{1}\_F\}$ is an indicator parameter. Let *λ* be the probability of straggling. Then, $\mathbf{I}\_f$, $f \in \{1, 2, \cdots, F\}$, can be evaluated as

$$\mathbf{I}\_f = \begin{cases} 1, & \text{with probability} \quad 1 - \lambda, \\ 0, & \text{with probability} \quad \lambda. \end{cases} \tag{8}$$

Then, fog node $FN\_f$ decodes the received coded intermediate parameters from $\tilde{G}\_{i,f}^{g}$ and $\tilde{G}\_{i,f}^{w}$, $i \in \{1, 2, \cdots, F\} \setminus \{f\}$, and tries to decode $N - |D\_f|$ new gradients and $\Gamma\_w \sum\_{i \in \{1,2,\cdots,F\} \setminus \{f\}} |w\_i|$ model parameters, where $\Gamma\_w \in [0, 1]$ is a parameter determined by the specific learning algorithm. Owing to the properties of fountain codes (e.g., LT or Raptor codes), iterative decoding is feasible if the number of received coded gradients or model parameters is slightly larger than the number of gradients or model parameters at the transmitting fog nodes. Clearly, to optimize the code degree distribution and the task allocation, it is critical for a node to know the number of intermediate gradients and model parameters it will receive. For this purpose, we provide the following analysis.
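The iterative decoding mentioned above is the classic peeling decoder for LT-type codes; a minimal sketch (our illustration, assuming binary combinations as in the encoding example):

```python
import numpy as np

def peel_decode(G, c):
    """Peeling (iterative) decoder for LT-type coded symbols.

    G: (m, q) binary generator matrix of the received coded symbols.
    c: (p, q) received coded symbols, c = g.T @ G for an unknown source
       block g of m gradients (rows) of dimension p.
    Returns the recovered source block (rows stay NaN if decoding stalls).
    """
    G, c = G.astype(float).copy(), c.astype(float).copy()
    m, q = G.shape
    g = np.full((m, c.shape[0]), np.nan)
    progress = True
    while progress:
        progress = False
        for j in range(q):
            nz = np.flatnonzero(G[:, j])
            if len(nz) == 1:                 # degree-1 coded symbol found
                i = nz[0]
                g[i] = c[:, j]
                mask = G[i] > 0              # symbols still containing g[i]
                c[:, mask] -= np.outer(g[i], np.ones(mask.sum()))
                G[i, mask] = 0.0             # peel g[i] off those symbols
                progress = True
    return g

# toy source block: 3 gradients of dimension 2; symbol supports {0},{0,1},{1,2}
rng = np.random.default_rng(0)
g_true = rng.standard_normal((3, 2))
G = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
g_rec = peel_decode(G, g_true.T @ G)
```

Decoding succeeds whenever peeling repeatedly exposes degree-1 symbols, which is why a small reception overhead over the source block size suffices for well-designed degree distributions.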

Assume *γa*,*<sup>b</sup>* as the overlapping ratio of the dataset in *FNa* and *FNb*, then for all fog nodes, we have the overlapping ratio as follows:

$$
\gamma = \begin{bmatrix}
1 & \gamma\_{1,2} & \cdots & \gamma\_{1,F} \\
\gamma\_{2,1} & 1 & \cdots & \gamma\_{2,F} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma\_{F,1} & \gamma\_{F,2} & \cdots & 1
\end{bmatrix}.
\tag{9}
$$

If $\gamma\_{a,b} = 0$, then nodes $FN\_a$ and $FN\_b$ have disjoint datasets. At $FN\_f$, $|D\_f|$ intermediate gradients are known. Thus, $A = N - |D\_f|$ new intermediate gradients are required for updating the model parameters $w\_f$. Then, we have the following result:

**Theorem 3.** *The total number of new intermediate gradients received from the other fog nodes at* $FN\_f$ *can be calculated by* $\Delta = \sum\_{i=1}^{F-1} \mathbf{1}\_{\pi\_i} \big( (1 - \gamma\_{\pi\_i,f}) \varphi(i,f) \big) \cdot |D\_{\pi\_i}|$*, where* $\pi\_1, \dots, \pi\_{F-1}$ *index the fog nodes other than f, and* $\varphi(i,f)$ *can be written as*

$$\varphi(i,f) = \begin{cases} 1, & \text{if } \quad i = 1, \\ \prod\_{a=1}^{i-1} \left(1 - \gamma\_{\pi\_i, \pi\_a | \Theta\_{a,f}}\right), & \text{if } \quad 2 \le i \le F - 1, \end{cases} \tag{10}$$

*where* $\Theta\_{a,f}$ *is a set formed by the indices of fog nodes, and it can be evaluated by*

$$\Theta\_{a,f} = \begin{cases} \{f\}, & \text{if} \quad a = 1, \\ \{f, \pi\_1, \dots, \pi\_{a-1}\}, & \text{if} \quad a > 1. \end{cases} \tag{11}$$

If *γ* is known at each fog node (or at least obtained from the transmitting neighbors of each receiving node), then Δ can be evaluated, and the computation and communication loads can be optimized through proper task assignment and code degree optimization. Theorem 3 is stated for gradients; a similar analysis also holds for model parameters. In Figure 5, we show the coding gains in terms of communication load, defined as the ratio of the total number of data transmitted by all the fog nodes to the data required at these fog nodes. As can be seen from the figure, the coding gains increase with the number of nodes *F* or the straggler probability, as expected.
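As a numerical sanity check of Theorem 3 (our sketch; we approximate the conditional overlap ratios in *φ* by the pairwise entries of *γ*, which is exact in the disjoint-dataset case):

```python
import numpy as np

def delta_new_gradients(D_sizes, gamma, f, indicators):
    """Evaluate Theorem 3 for receiving node f.

    D_sizes: |D_i| per fog node; gamma: pairwise overlap-ratio matrix;
    indicators: 1 if node i's transmission arrived, else 0 (the 1_{pi_i}).
    Conditional overlaps are approximated by the pairwise entries.
    """
    others = [i for i in range(len(D_sizes)) if i != f]   # pi_1..pi_{F-1}
    delta = 0.0
    for idx, pi_i in enumerate(others):
        phi = 1.0
        for a in range(idx):                  # product over a = 1..i-1
            phi *= 1.0 - gamma[pi_i, others[a]]
        delta += indicators[pi_i] * (1.0 - gamma[pi_i, f]) * phi * D_sizes[pi_i]
    return delta

# disjoint datasets, no stragglers: every other node contributes |D_i|
sizes = np.array([4, 3, 5, 2])
gamma = np.eye(4)                  # off-diagonal overlap ratios are zero
d = delta_new_gradients(sizes, gamma, f=0, indicators=np.ones(4))
```

In this disjoint case the formula reduces to $\Delta = \sum\_{i \neq f} |D\_i| = 10$, while fully overlapping datasets (*γ* all ones) give Δ = 0, as expected.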

**Figure 5.** Ratio of coding gains relative to uncoded systems in communication loads.

We note that both the deterministic codes in Section 3 and the random code constructions here are actually types of network coding [29,30], which can reduce communication loads by computing at intermediate (fog) nodes [3,4]. More recently, a special type of network code, the BATS (batched sparse) code, was proposed, consisting of two code layers, as shown in Figure 6. For the outer code, we can use error-control codes such as fountain codes in the MAP phase; for the inner code, network codes such as random linear network codes can be used in the data shuffling stage. In [12], we studied BATS codes for fog computing networks. As shown in Figure 7, numerical results demonstrate that BATS codes can achieve a lower communication load than uncoded and deterministic codes (network codes) if the computing load is below certain thresholds. Here, we skip further details and refer interested readers to [12].

**Figure 6.** Large-scale distributed machine learning (DML) with BATS codes.

**Figure 7.** Communication load comparison among BATS codes, coded computing (deterministic codes) and uncoded schemes [12]. $e\_F$ denotes the channel erasure probability and corresponds to the straggling probability. The computing load is defined by the number of involved computing nodes and thus corresponds to the expansion coefficients.

### **5. Coding for ADMM**

### *5.1. Introduction and System Setup*

As a primal–dual optimization method, ADMM is known to converge at a rate of O(1/*t*) for general convex functions, where *t* is the iteration number [16], which is often faster than schemes based on purely primal methods. Meanwhile, ADMM also has the benefits of robustness to non-smooth/non-convex functions and being adaptable to fully decentralized implementation. Thus, ADMM is especially suitable for large-scale DML and has attracted substantial research interest. For DML, especially for a fully decentralized learning system without a central server, we can denote the learning network as G = (N , E), where N = {1, ... , *N*} is the set of agents (computing nodes) and E is the set of links. For ADMM, the agents aim at collaboratively solving the following consensus optimization problem:

$$\min\_{\mathbf{x}} \sum\_{i=1}^{N} f\_i(\mathbf{x}; \mathcal{D}\_i), \tag{12}$$

where $f\_i: \mathbb{R}^p \to \mathbb{R}$ is the local objective function of agent *i*, and $\mathcal{D}\_i$ is the dataset of agent *i*. All the agents share a global optimization variable $\mathbf{x} \in \mathbb{R}^p$. Datasets of different agents may overlap, i.e., $\mathcal{D}\_i \cap \mathcal{D}\_j \neq \emptyset$ for some or all $i \neq j$. This can happen, for instance, among sensors of nearby areas for weather, traffic, smart grids, etc., or, if MapReduce is used, when the same data are mapped to different agents. For ADMM, (12) is solved iteratively by a two-step process: (a) local optimization of the per-agent variables, and (b) global consensus on the shared variable.


With DML, there are also straggler-node and unreliable-link challenges for ADMM, especially for large-scale and heterogeneous networks or those with wireless links. However, with primal–dual optimization, it is very difficult (if at all possible) to express the ADMM optimization process as a linear function (e.g., a matrix multiplication, as in gradient descent). Thus, coding schemes based on linear operations (e.g., matrix multiplication in [4,8–11,24,25]) cannot be directly applied to ADMM, and there are very few results on coding for ADMM so far, to the best of our knowledge. To address this problem, one solution is to apply coding separately to the two steps of ADMM. For instance, error-control coding can be used for the local optimization if the data of an agent are stored in different locations. For the global consensus, network coding can be used to reduce the communication load and increase reliability. In [15], we preliminarily investigated how coding (MDS codes) can be used in the local optimization (step (a)). A more detailed introduction is given as follows.

As depicted in Figure 8, a distributed computing system consists of multiple agents, each of which is connected to several edge computing nodes (ECNs). Agents can communicate with each other through links. ECNs are capable of processing data collected from sensors and transferring the desired messages (e.g., model updates) back to the connected agent. Based on the agent coverage and computing resources, the ECNs connected to agent *i* (∈ N ) are denoted as $\mathcal{K}\_i = \{1, \dots, K\_i\}$. This model is common in current intelligent systems, such as smart factories or smart homes.

**Figure 8.** ADMM with multiple agents, each of which collects trained models from multiple ECNs with sensed data. Agents are connected via a Hamiltonian network.

The multi-agent system seeks to find the optimal solution $\mathbf{x}^\*$ by solving (12). $\mathcal{D}\_i$ is allocated to the dispersed ECNs $\mathcal{K}\_i$. The decentralized optimization problem can be formulated as follows. By defining $\mathbf{x} = [\mathbf{x}\_1, \dots, \mathbf{x}\_N] \in \mathbb{R}^{pN \times d}$ and introducing a global variable $\mathbf{z} \in \mathbb{R}^{p \times d}$, problem (12) can be reformulated as

$$(\text{P-1}): \min\_{\mathbf{x},\mathbf{z}} \sum\_{i=1}^{N} f\_i(\mathbf{x}\_i; \mathcal{D}\_i), \quad \text{s.t. } \mathbf{1} \otimes \mathbf{z} - \mathbf{x} = \mathbf{0}, \tag{13}$$

where $\mathbf{1} = [1, \dots, 1]^T \in \mathbb{R}^N$, and ⊗ is the Kronecker product. In the following, $f\_i(\mathbf{x}\_i; \mathcal{D}\_i)$ is denoted as $f\_i(\mathbf{x}\_i)$ to simplify the notation.

In what follows, we present communication-efficient and straggler-tolerant decentralized algorithms by which the agents can collaboratively find an optimal solution through local computations and limited information exchange among neighbors. In this scheme, local gradients are calculated at dispersed ECNs, while the variables, including the primal and dual variables and the global variable *z*, are updated at the corresponding agent. For illustration purposes, we first present stochastic I-ADMM (sI-ADMM) and then the coded version of sI-ADMM (csI-ADMM), both proposed in [15]. The standard incremental ADMM iterations for decentralized consensus optimization are reviewed first. The augmented Lagrangian function of problem (P-1) is

$$\mathcal{L}\_{\rho}(\mathbf{x}, \mathbf{y}, \mathbf{z}) = \sum\_{i=1}^{N} f\_i(\mathbf{x}\_i) + \langle \mathbf{y}, \mathbf{1} \otimes \mathbf{z} - \mathbf{x} \rangle + \frac{\rho}{2} \| \mathbf{1} \otimes \mathbf{z} - \mathbf{x} \|^2,\tag{14}$$

where $\mathbf{y} = [\mathbf{y}\_1, \dots, \mathbf{y}\_N] \in \mathbb{R}^{pN \times d}$ is the dual variable, and *ρ* > 0 is a penalty parameter. With incremental ADMM (I-ADMM) [39,40], guaranteeing $\sum\_{i=1}^{N} (\mathbf{x}\_i^1 - \frac{\mathbf{y}\_i^1}{\rho}) = \mathbf{0}$ (e.g., by initializing $\mathbf{x}\_i^1 = \mathbf{y}\_i^1 = \mathbf{0}$), the updates of *x*, *y* and *z* at the (*k* + 1)-th iteration follow:

$$\mathbf{x}\_{i}^{k+1} := \begin{cases} \arg\min\_{\mathbf{x}\_{i}} f\_{i}(\mathbf{x}\_{i}) + \frac{\rho}{2} \left\| z^{k} - \mathbf{x}\_{i} + \frac{y\_{i}^{k}}{\rho} \right\|^{2}, i = i\_{k};\\ \mathbf{x}\_{i}^{k}, \text{ otherwise;} \end{cases} \tag{15a}$$

$$y\_i^{k+1} := \begin{cases} y\_i^k + \rho \left(z^k - x\_i^{k+1}\right), \; i = i\_k; \\ y\_i^k, \; \text{otherwise}; \end{cases} \tag{15b}$$

$$z^{k+1} := z^k + \frac{1}{N} \left[ \left( \mathbf{x}\_{i\_k}^{k+1} - \mathbf{x}\_{i\_k}^k \right) - \frac{1}{\rho} \left( y\_{i\_k}^{k+1} - y\_{i\_k}^k \right) \right]. \tag{15c}$$

For ADMM, solving the augmented Lagrangian, especially for the *x*-update above, may lead to rather high computational complexity. To achieve a fast *x*-update, *first-order* approximation and *mini-batch stochastic* optimization can be adopted in (15a). Furthermore, a quadratic proximal term with parameter $\tau^k$ is proposed in [15] to stabilize the convergence behavior of the inexact augmented Lagrangian method. Ref. [15] also introduces an update step-size $\gamma^k$ for the dual update. Both parameters $\tau^k$ and $\gamma^k$ can be adjusted with the iteration *k*. Finally, the updates of *x* and *y* at the (*k* + 1)-th iteration are as follows:

$$\mathbf{x}\_{i}^{k+1} := \begin{cases} \arg\min\_{\mathbf{x}\_{i}} \left\langle \mathcal{G}\_{i}(\mathbf{x}\_{i}^{k}; \xi\_{i}^{k}), \mathbf{x}\_{i} - \mathbf{x}\_{i}^{k} \right\rangle + \left\langle \mathbf{y}\_{i}^{k}, \mathbf{z}^{k} - \mathbf{x}\_{i} \right\rangle \\ \quad + \frac{\rho}{2} \left\| \mathbf{z}^{k} - \mathbf{x}\_{i} \right\|^{2} + \frac{\tau^{k}}{2} \left\| \mathbf{x}\_{i} - \mathbf{x}\_{i}^{k} \right\|^{2}, \; i = i\_{k}; \\ \mathbf{x}\_{i}^{k}, \; \text{otherwise}; \end{cases} \tag{16a}$$

$$y\_i^{k+1} := \begin{cases} y\_i^k + \rho \gamma^k \left( z^k - x\_i^{k+1} \right), \ i = i\_k; \\ y\_i^k, \ \text{otherwise}; \end{cases} \tag{16b}$$

where $\mathcal{G}\_i(\mathbf{x}\_i^k; \xi\_i^k)$ is the mini-batch stochastic gradient, which can be obtained through $\mathcal{G}\_i(\mathbf{x}\_i^k; \xi\_i^k) = \frac{1}{M} \sum\_{l=1}^{M} \nabla F\_i(\mathbf{x}\_i^k; \xi\_{i,l}^k)$. To be more specific, *M* is the mini-batch size of the sampled data, $\xi\_i^k = \{\xi\_{i,l}^k\}\_{l=1}^{M}$ denotes a set of independent and identically distributed randomly selected samples in one batch, and $\nabla F\_i(\mathbf{x}\_i^k; \xi\_{i,l}^k)$ corresponds to the stochastic gradient of a single sample $\xi\_{i,l}^k$.

### *5.2. Mini-Batch Stochastic I-ADMM*

For the above ADMM setup, the *response time* is defined as the execution time for updating all variables in each iteration. In the updates, all steps, including the *x*-update, *y*-update and *z*-update, are assumed to be performed at the agents rather than the ECNs. In practice, the update is often computed in tandem, which leads to a long response time. With the fast development of edge/fog computing, it is feasible to further reduce the response time, since computing the local gradients can be dispersed to multiple edge nodes, as shown in Figure 8. Each ECN computes a gradient using local data and shares the result with its corresponding agent; no information is exchanged directly among ECNs. Agents are activated in a predetermined circulant pattern, e.g., according to a Hamiltonian cycle, and ECNs are activated whenever the connected agent is active, as shown in Figure 8. A Hamiltonian-cycle-based activation pattern is a cyclic pattern through the graph that visits each agent exactly once per cycle (i.e., 1 → 2 → 4 → 5 → 3 in Figure 8). Correspondingly, the mini-batch stochastic incremental ADMM (sI-ADMM) [15] is presented in Algorithm 3. At agent $i\_k$, the global variable $z^{k+1}$ is updated and passed as a token to the next agent $i\_{k+1}$ via a pre-determined traversing pattern, as shown in Figure 8. Specifically, in the *k*-th iteration with cycle index $m = \lceil k/N \rceil$, agent $i\_k$ is activated. The token $z^k$ is first received, and then the active agent broadcasts the local variable $x\_i^k$ to its attached ECNs $\mathcal{K}\_i$. According to the batch data with index $I\_{i,j}^k$, a new gradient $g\_{i,j}$ is calculated at each ECN, followed by the gradient update, *x*-update, *y*-update and *z*-update at agent $i\_k$, via steps 21–24 in Algorithm 3. At last, the global variable $z^{k+1}$ is passed as a token to the neighbor $i\_{k+1}$. In Algorithm 3, the stopping criterion is reached when $\|z^k - x\_i^k\| \le \epsilon^{pri}$ and $\|\mathcal{G}\_i(x\_i^k; \xi\_i^k) - y\_i^k\| \le \epsilon^{dual}$, $\forall i \in \mathcal{N}$, where $\epsilon^{pri}$ and $\epsilon^{dual}$ are two pre-defined feasibility tolerances.

**Algorithm 3** Mini-batch stochastic I-ADMM (sI-ADMM)

1: **initialize**: $\{z^1 = x\_i^1 = y\_i^1 = \mathbf{0} \mid i \in \mathcal{N}\}$, batch size *M*;


$$\mathcal{G}\_i(\mathbf{x}\_i^k; \xi\_i^k) = \frac{1}{K\_i} \sum\_{j=1}^{K\_i} g\_{i,j};\tag{17}$$

22: **update** $x^{k+1}$ according to (16a);
23: **update** $y^{k+1}$ according to (16b);
24: **update** $z^{k+1}$ according to (15c);
25: **send** token $z^{k+1}$ to agent $i\_{k+1}$ via link $(i\_k, i\_{k+1})$;
26: **until** the stopping criterion is satisfied.
27: **end for**
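To illustrate the incremental updates (15a)–(15c), the following Python toy (our sketch: synthetic quadratic local costs, exact *x*-updates instead of the stochastic approximation (16a), and no ECN layer) converges to the centralized least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3                                  # agents, problem dimension
A = [rng.standard_normal((8, p)) for _ in range(N)]
b = [rng.standard_normal(8) for _ in range(N)]

# centralized optimum of sum_i 0.5*||A_i x - b_i||^2, for reference
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]

rho = 1.0
x = np.zeros((N, p)); y = np.zeros((N, p)); z = np.zeros(p)
for k in range(1000):                        # one active agent per iteration
    i = k % N                                # cyclic (Hamiltonian) activation
    x_old, y_old = x[i].copy(), y[i].copy()
    # (15a): exact x-update for the quadratic local cost
    x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(p),
                           A[i].T @ b[i] + rho * z + y_old)
    # (15b): dual update with the pre-update token z
    y[i] = y_old + rho * (z - x[i])
    # (15c): incremental update of the global token z
    z = z + ((x[i] - x_old) - (y[i] - y_old) / rho) / N
```

Note that (15c) keeps the invariant $z^k = \frac{1}{N}\sum_i (x_i^k - y_i^k/\rho)$ from the zero initialization, which is what makes the cheap incremental token update consistent with the full ADMM *z*-minimization.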

### *5.3. Coding for Local Optimization for sI-ADMM*

With the less reliable and limited computing capability of ECNs, straggling nodes may be a significant performance bottleneck in the learning network. To address this problem, error-control codes can be used to mitigate the impact of straggling nodes by leveraging data redundancy. Similar to Section 3, two MDS-based coding methods over the real field R, i.e., the *fractional* repetition scheme and the *cyclic* repetition scheme, can be adopted and integrated with sI-ADMM for reducing the response time in the presence of straggling nodes. The coded sI-ADMM (csI-ADMM) approach is presented in Algorithm 4. Denote the minimum required number of ECNs by $R\_i$ and the maximum number of stragglers the system can tolerate by $S\_i$. Different from sI-ADMM, in csI-ADMM, encoding and decoding processes are used in each ECN $j \in \mathcal{K}\_i$ and its corresponding agent *i*, respectively. $\mathcal{G}\_i(\mathbf{x}\_i^k; \xi\_i^k)$ is updated via steps 15–20, where the local gradients are calculated at the ECNs $j \in \mathcal{K}\_i$ in parallel via selected $(S\_i + 1)M/K\_i$ batch samples, and the gradient summation can be recovered at the active agent $i\_k$ from the responses of any $R\_i$ out of the $K\_i$ ECNs, to combat slow links and straggler nodes. As in steps 22–26 of sI-ADMM, the activated agent $i\_k$ then updates the local variables successively. Computation redundancy is introduced, but agent *i* can tolerate any $S\_i = K\_i - R\_i$ stragglers.

**Algorithm 4** Coded sI-ADMM (csI-ADMM)

1: **initialize**: $\{z^1 = x\_i^1 = y\_i^1 = \mathbf{0} \mid i \in \mathcal{N}\}$, batch size *M*;
2: **Local Data Allocation:**
3: **for** agent $i \in \mathcal{N}$ **do**
4: **divide** the labeled data $\mathcal{D}\_i$ based on the repetition schemes in [34] and denote each partition as $\xi\_{i,j}$, $j \in \mathcal{K}\_i$;
5: **for** ECN $j \in \mathcal{K}\_i$ **do**
6: **allocate** $\xi\_{i,j}$ to ECN *j*;
7: **partition** the $\xi\_{i,j}$ examples into multiple batches, each of size $(S\_i + 1)M/K\_i$;
8: **end for**
9: **end for**
10: **Updating Process:**
11: **for** *k* = 1, 2, . . . **do**
12: **Steps of active agent** $i = i\_k = (k - 1) \bmod N + 1$**:**
13: **run** steps 13–14 of Algorithm 3;
14: **ECN** $j \in \mathcal{K}\_i$ **computes its gradient in parallel**:
15: **run** step 16 of Algorithm 3;
16: **select** batch $I\_{i,j}^k = m \bmod \left(|\xi\_{i,j}| \cdot K\_i/((S\_i + 1)M)\right)$; (18)
17: **update** $g\_{i,j}$ via the encoding function $p\_{enc}^j(\cdot)$;
18: **transmit** $g\_{i,j}$ to the connected agent;
19: **until** the $R\_i$-th fastest response is received;
20: **update** the gradient via the decoding function $q\_{dec}^i(\cdot)$;
21: **run** steps 22–26 of Algorithm 3;
22: **end for**

### *5.4. Simulations for Coded Local Optimization*

Both computer-generated and real-world datasets are used to evaluate the performance of the coded stochastic ADMM algorithms. The experimental network G consists of *N* agents and $E = \frac{N(N-1)}{2}\eta$ links, where *η* is the network connectivity ratio. To agent *i*, $K\_i = K$ ECNs with the same computing power (e.g., computing and memory) are attached. To reduce the impact of the token traversing pattern, both Hamiltonian cycle-based and non-Hamiltonian cycle-based (i.e., shortest-path cycle-based [41]) token traversing methods are evaluated for the proposed algorithms.

To demonstrate the advantages of the coding schemes, the csI-ADMM algorithms are compared with the uncoded sI-ADMM algorithms with respect to the accuracy [42], which is defined as

$$\text{accuracy} = \frac{1}{N} \sum\_{i=1}^{N} \frac{\left\| \mathbf{x}\_i^k - \mathbf{x}^\* \right\|}{\left\| \mathbf{x}\_i^1 - \mathbf{x}^\* \right\|}, \tag{19}$$

where $\mathbf{x}^\* \in \mathbb{R}^{p \times d}$ is the optimal solution of (P-1), and the test error [43], which is defined as the mean square error loss. To demonstrate the robustness against straggler nodes, distributed coding schemes, including the *cyclic* and *fractional* repetition methods and the uncoded method, are used for comparison. For a fair comparison, the algorithm parameters are tuned and kept the same across the different experiments. Moreover, unicast is considered among agents, and the communication cost per link is 1 unit. The time consumed by each communication among agents is assumed to follow a uniform distribution $\mathcal{U}(10^{-5}, 10^{-4})$ seconds. The response time of each ECN is measured by its computation time, and the overall response time of each iteration equals the execution time for updating all variables in that iteration. All experiments were performed using Python on an Intel CPU @2.3 GHz (16 GB RAM) laptop.

To show the benefit of coding, in Figure 9 we compare accuracy vs. running time for both coded and uncoded sI-ADMM. In the simulation, a maximum delay $\tau\_i$ ($i = 1, 2, 3$) for stragglers in each iteration is considered; for illustration purposes, we set different values with $\tau\_1 > \tau\_2 > \tau\_3$. To show the benefit of coding to the convergence rate, i.e., the convergence vs. straggler-nodes trade-off of csI-ADMM, the impact of the number of straggler nodes on the convergence speed is shown in Figure 10. In the simulations, 10 independent experiment runs with the same setup are performed on synthetic data, and the results are averaged. We can see that the convergence speed decreases as the number of straggler nodes increases. This is because increasing the number of straggler nodes decreases the allowable mini-batch size allocated in each iteration and therefore affects the convergence speed.

**Figure 9.** Comparison of coded and uncoded ADMM in accuracy and running time.

**Figure 10.** Impact of number of straggler nodes on the convergence rate of the proposed csI-ADMM on synthetic dataset.

### *5.5. Discussion*

Above, we discussed the application of error-control coding in the local optimization step of ADMM. In the agent consensus step, there can also be straggling or transmission errors when updating the global variables. To improve the reliability of the consensus step, we can use linear network error correction codes [31] or BATS codes [32] based on LT codes. For the latter, the global variable (vector) is divided into many smaller vectors, and the encoding process continues until certain stopping criteria are reached (e.g., feedback from other nodes or a timeout). There are quite a few papers on applying network coding for consensus; see [44,45]. Since there is no significant difference between the consensus process for the global variables of ADMM and that for other types of messages, interested readers are referred to these papers for further reading. We note that network coding can improve both the reliability and the security of the consensus, e.g., via secure network codes [46].

### **6. Conclusions and Future Work**

We discussed how coding can be used to improve reliability and reduce the communication load for both primal- and primal–dual-based DML. We discussed both deterministic (and optimal) and random constructions of error-control codes for DML. Owing to its low complexity and high flexibility, the latter may be more suitable for large-scale DML. For primal–dual-based DML (i.e., ADMM), we discussed separate coding processes for the two steps of ADMM, i.e., the local optimization and consensus processes. We introduced algorithms for using codes in the local optimization step of ADMM.

For emerging applications of increasing interest, DML will become more and more common. Another interesting area for applying coding to DML is security. Though DML has a certain privacy-preserving capability (compared to transmitting raw data), a higher security standard may be needed for sensitive applications. Secure coding has been an active topic for years; see [47]. We also have preliminary results on improving privacy via artificial noise in DML [40]. However, further study is needed to improve performance and to cover more general scenarios.

Another interesting direction for future work is further study of coding for primal–dual methods. Though separate coding for the two steps of ADMM may solve the problem in part, the coding efficiency may be low and the system complexity high. As discussed in Section 5, directly applying error-control codes to ADMM may be infeasible. Another potential approach is to simplify the optimization functions without significant performance loss, so that error-control codes can be applied.

**Author Contributions:** All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported in part by the Swedish Research Council (VR) project "Coding for large-scale distributed machine learning" (Project ID: 2021-04772).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This is an invited contribution. The authors acknowledge the effort of Guest Editors and reviewers.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Bounds for Coding Theory over Rings**

**Niklas Gassner <sup>1</sup>, Marcus Greferath <sup>2</sup>, Joachim Rosenthal <sup>1</sup> and Violetta Weger <sup>3,</sup>\***


**Abstract:** Coding theory in which the alphabet is identified with the elements of a ring or a module has become an important research topic over the last 30 years. It has been well established that, with the generalization of the algebraic structure to rings, there is a need to also generalize the underlying metric beyond the usual Hamming weight used in traditional coding theory over finite fields. This paper introduces a generalization of the weight introduced by Shi, Wu and Krotov, called the overweight. Additionally, this weight can be seen as a generalization of the Lee weight on the integers modulo 4 and as a generalization of Krotov's weight over the integers modulo 2*<sup>s</sup>* for any positive integer *s*. For this weight, we provide a number of well-known bounds, including a Singleton bound, a Plotkin bound, a sphere-packing bound and a Gilbert–Varshamov bound. In addition to the overweight, we also study a well-known metric on finite rings, namely the homogeneous metric, which also extends the Lee metric over the integers modulo 4 and is thus heavily connected to the overweight. We provide a bound that has been missing in the literature for the homogeneous metric, namely the Johnson bound. To prove this bound, we use an upper estimate on the sum of the distances between all distinct codewords that depends only on the length, the average weight and the maximum weight of a codeword. An effective such bound is not known for the overweight.

**Keywords:** rings; coding theory; Johnson bound; Plotkin bound

### **1. Introduction**

Coding-theoretic experience has shown that linear codes over finite fields often yield significant complexity advantages over their nonlinear counterparts, particularly for complex tasks such as encoding and decoding. On the other hand, it was recognized early [1,2] that the class of binary block codes contains excellent code families that are not linear (the Preparata, Kerdock, Goethals and Goethals–Delsarte codes). For a long time, it could not be explained why these families exhibit formal duality properties in terms of their distance enumerators, a phenomenon that otherwise occurs only between linear codes and their duals.

A true breakthrough in the understanding of this behavior came in the early 1990s when, after preceding work by Nechaev [3], the paper by Hammons et al. [4] discovered that these families allow a representation in terms of Z4-linear codes.

A crucial condition for this ring-theoretic representation was that Z4 be equipped with an alternative metric, the Lee weight, rather than with the traditional Hamming weight, which only distinguishes whether an element is zero or non-zero. The Lee weight is finer, assigning the element 2 a higher weight than the other non-zero elements of this ring.

The traditional setting of linear coding theory (finite fields endowed with the Hamming metric) is thus actually too narrow, which suggests expanding the theory in at least two directions: on the algebraic side, the next natural algebraic structure to serve as an alphabet for linear coding is that of finite rings (and modules); on the metric side, the appropriateness of the Lee weight for Z4-linear coding suggests that the distance function of a generalized coding theory requires generalization as well.

**Citation:** Gassner, N.; Greferath, M.; Rosenthal, J.; Weger, V. Bounds for Coding Theory over Rings. *Entropy* **2022**, *24*, 1473. https://doi.org/ 10.3390/e24101473

Academic Editors: Onur Günlü, Rafael F. Schaefer, Holger Boche and H. Vincent Poor

Received: 15 September 2022 Accepted: 12 October 2022 Published: 16 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Since these ground-breaking observations, an entire discipline arose within algebraic coding theory. A considerable community of scholars have been developing results in various directions, among them code duality, weight-enumeration, code equivalence, weight functions, homogeneous weights, existence bounds, code optimality and decoding schemes, to mention only a few.

The paper at hand aims at providing a further contribution to this discipline by introducing the overweight on a finite ring. This weight is a generalization of the Lee weight over Z4, as well as of the weight introduced by Krotov in [5] over $\mathbb{Z}_{2^s}$ for any positive integer *s*, which was further generalized to $\mathbb{Z}_{p^k}$ in [6].

We study the relations of this new weight to other well-known weights over rings and state several properties of the overweight, such as its extremal property. We also develop a number of standard existence bounds, such as a Singleton bound, a sphere-packing bound, a Plotkin bound and a version of the (assertive) Gilbert–Varshamov bound.

In the final part of this article, we derive a general Johnson bound for the homogeneous weight on a finite Frobenius ring. This result is important, as it is closely connected to list decoding capabilities.

### **2. Preliminaries**

Throughout this paper, *R* denotes a finite ring with identity, denoted by 1. We denote by *R*<sup>×</sup> its group of invertible elements, also known as units.

Let us recall some preliminaries in coding theory, where we focus on ring-linear coding theory.

For a prime power *q*, we denote by $\mathbb{F}_q$ the finite field with *q* elements, and, for a positive integer *m*, we denote by $\mathbb{Z}_m$ the ring of integers modulo *m*.

In traditional coding theory, we consider a linear code to be a subspace of a vector space over a finite field.

**Definition 1.** *Let q be a prime power, and let k* ≤ *n be non-negative integers. A linear subspace C of* $\mathbb{F}_q^n$ *of dimension k is called a linear* [*n*, *k*] *code.*

We define a weight in a general way.

**Definition 2.** *Let R be a finite ring. A real-valued function w on R is called a* weight *if it is a non-negative function that maps 0 to 0.*

It is natural to identify *w* with its additive extension to $R^n$, and so we will always write $w(\mathbf{x}) = \sum_{i=1}^{n} w(x_i)$ for all $\mathbf{x} \in R^n$. Every weight *w* on *R* induces a *distance* $d : R \times R \longrightarrow \mathbb{R}$ by $d(x, y) = w(x - y)$. Again, we will identify *d* with its natural additive extension to $R^n \times R^n$.

If the weight additionally is positive definite, symmetric and satisfies the triangular inequality, that is,

1. *w*(*x*) ≥ 0 for all *x* ∈ *R*, and *w*(*x*) = 0 if and only if *x* = 0,
2. *w*(*x*) = *w*(−*x*) for all *x* ∈ *R*,
3. *w*(*x* + *y*) ≤ *w*(*x*) + *w*(*y*) for all *x*, *y* ∈ *R*,

then the induced distance inherits these properties, i.e.,

1. *d*(*x*, *y*) ≥ 0 for all *x*, *y* ∈ *R* and *d*(*x*, *y*) = 0 if and only if *x* = *y*,
2. *d*(*x*, *y*) = *d*(*y*, *x*) for all *x*, *y* ∈ *R*,
3. *d*(*x*, *z*) ≤ *d*(*x*, *y*) + *d*(*y*, *z*) for all *x*, *y*, *z* ∈ *R*.
The most prominent and best studied weight in traditional coding theory is the Hamming weight.

**Definition 3.** *Let <sup>n</sup>* <sup>∈</sup> <sup>N</sup>*. The Hamming weight of a vector <sup>x</sup>* <sup>∈</sup> *<sup>R</sup><sup>n</sup> is defined as the size of its support*

$$w_H(\mathbf{x}) = |\{i \in \{1, \dots, n\} \mid x_i \neq 0\}|,$$

*and the Hamming distance between x and y* <sup>∈</sup> *<sup>R</sup><sup>n</sup> is given by*

$$d_H(\mathbf{x}, \mathbf{y}) = |\{i \in \{1, \dots, n\} \mid x_i \neq y_i\}| = w_H(\mathbf{x} - \mathbf{y}).$$

The minimum Hamming distance of a code is then defined as the minimum distance between two different codewords

$$d_H(C) = \min \{ d_H(\mathbf{x}, \mathbf{y}) \mid \mathbf{x}, \mathbf{y} \in C, \, \mathbf{x} \neq \mathbf{y} \}.$$

Note that the concept of minimum distance can be applied for any underlying distance *d*.

In the paper at hand, we focus on a more general setting where the ambient space is a module over a finite ring.

**Definition 4.** *Let n* ∈ N *and let R be a finite ring. A submodule C of the left R-module R<sup>n</sup> of size M* = |*C*| *is called a left R-linear* (*n*, *M*) *code.*

The most studied ambient space for ring-linear coding theory is the integers modulo 4, denoted by Z4, endowed with the Lee metric.

**Definition 5.** *For x* ∈ Z*m, its Lee weight is defined as*

$$w_L(x) = \min\{x, \, m - x\}.$$
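As a quick illustration, the Lee weight of Definition 5 is easy to compute; the following minimal Python sketch (function name ours, elements of Z*m* represented as integers in {0, ... , *m* − 1}) reproduces the values on Z4 mentioned in the Introduction:

```python
# Illustrative sketch of the Lee weight (Definition 5) on Z_m.
# Elements of Z_m are represented as integers in {0, ..., m-1}.

def lee_weight(x: int, m: int) -> int:
    """Lee weight of x in Z_m: w_L(x) = min(x, m - x)."""
    x %= m
    return min(x, m - x)

# On Z_4, the element 2 receives a strictly higher weight than the
# other non-zero elements, in contrast to the Hamming weight.
print([lee_weight(x, 4) for x in range(4)])  # [0, 1, 2, 1]
```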

One of the most prominent generalizations of the Lee weight over Z<sup>4</sup> is the homogeneous weight.

**Definition 6.** *Let R be a Frobenius ring. A weight w* : *R* −→ R *is called (left) homogeneous of average value γ* > 0*, if w*(0) = 0 *and the following conditions hold:*

*(i) For all x*, *y with Rx* = *Ry, we have that w*(*x*) = *w*(*y*)*.*

*(ii) For every non-zero ideal I* ≤ *RR, it holds that*

$$\frac{1}{|I|} \sum\_{\mathbf{x} \in I} w(\mathbf{x}) \, = \, \gamma.$$

*We will denote the homogeneous weight with wt.*

The homogeneous weight was first introduced by Constantinescu and Heise in [7] in the context of coding over integer residue rings. It was later generalized by Greferath and Schmidt [8] to arbitrary finite rings, where the ideal *I* in Definition 6 was assumed to be a principal ideal. In its original form, however, the homogeneous weight only exists on finite Frobenius rings. It can be shown that a left homogeneous weight is at the same time right homogeneous, and for this reason, we will omit the reference to any side for the sequel. In [9], Honold and Nechaev finally generalized the notion of homogeneous weight to some finite modules, called weighted modules, over a (not necessarily commutative) ring *R* with identity.

Since we will establish a Plotkin bound for a new weight, let us recall here the Plotkin bound over finite fields equipped with the Hamming metric.

**Theorem 1** (Plotkin bound)**.** *Let C be an* (*n*, *M*) *block code over* $\mathbb{F}_q$ *with minimum Hamming distance d. If* $d > \frac{q-1}{q} n$*, then*

$$M \le \frac{d}{d - \frac{q-1}{q}n}.$$

For the homogeneous weight, the following Plotkin bound was established in [10].


**Theorem 2** (Plotkin bound for homogeneous weights, [10])**.** *Let wt be a homogeneous weight of average value γ on R, and let C be an* (*n*, *M*) *block code over R with minimum homogeneous distance d. If γn* < *d, then*

$$M \le \frac{d}{d - \gamma n}.$$

### **3. Overweight**

Just as the Hamming weight over the binary field can be generalized to larger ambient spaces in different ways, resulting in different metrics (such as the Hamming weight over $\mathbb{F}_q$ or the Lee weight over $\mathbb{Z}_{p^s}$), the Lee weight over Z4 can be generalized in different ways. For example, the weight defined in [5] over $\mathbb{Z}_{2^m}$ for any positive integer *m* is a possible generalization, but the most prominent generalization is the homogeneous weight (see, for example, [10]). In this section, we introduce a new generalization, called the *overweight*. This weight has some interesting properties and relations to the homogeneous weight, and can additionally be seen as a generalization of the weight defined in [5] over $\mathbb{Z}_{2^s}$ for any positive integer *s* and of the weight defined in [6] over $\mathbb{Z}_{p^s}$.

**Definition 7.** *Let R be a finite ring. The* overweight *on R is defined as*

$$W : R \longrightarrow \mathbb{R}, \quad x \mapsto \begin{cases} 0 & \text{if } x = 0, \\ 1 & \text{if } x \in R^{\times}, \\ 2 & \text{otherwise.} \end{cases}$$

We also denote by *W* its additive extension to $R^n$, given by $W(\mathbf{x}) = \sum_{i=1}^{n} W(x_i)$.

Let us call the distance which is induced by the overweight the *overweight distance*, and denote it by *D*, i.e., *D*(*x*, *y*) = *W*(*x* − *y*).
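For concreteness, here is a minimal Python sketch of the overweight and its induced distance in the special case *R* = Z*m*, where the units are exactly the residues coprime to *m*; the function names are ours:

```python
from math import gcd

def overweight(x: int, m: int) -> int:
    """Overweight W on Z_m (Definition 7): 0 on 0, 1 on units, 2 otherwise."""
    x %= m
    if x == 0:
        return 0
    return 1 if gcd(x, m) == 1 else 2  # units of Z_m are the coprime residues

def overweight_vec(xs, m):
    """Additive extension W(x) = sum_i W(x_i) to Z_m^n."""
    return sum(overweight(x, m) for x in xs)

def overweight_dist(xs, ys, m):
    """Induced overweight distance D(x, y) = W(x - y)."""
    return overweight_vec([(a - b) % m for a, b in zip(xs, ys)], m)

# On Z_4 the overweight coincides with the Lee weight:
print([overweight(x, 4) for x in range(4)])  # [0, 1, 2, 1]
print(overweight_dist([1, 2, 3], [1, 0, 1], 4))  # W((0, 2, 2)) = 4
```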

The motivation for introducing this new weight is twofold: on the one hand, it is theoretically interesting to explore a new generalization of the Lee weight over Z4 and its connections to other known weights over rings. On the other hand, the overweight is also perfectly suitable for a channel where unit errors are more likely.

Note that the overweight is designed to satisfy the following criteria: it is positive definite, symmetric, satisfies the triangular inequality and distinguishes between units and non-zero non-units. Furthermore, it is extremal in the sense that, on a large family of rings, any increase in the weight of the non-zero non-units would violate the triangular inequality, hence the name *overweight*. We will now study this extremal property in more detail.

We can consider weights with values in {0, 1, *α*}, for some *α* > 0, without fixing the subsets of *R* where these values are attained. Thus, we are considering the generic weight function

$$f(x) = \begin{cases} 0 & \text{if } x = 0, \\ 1 & \text{if } x \in A_1, \\ \alpha & \text{if } x \in A_2, \end{cases}$$

where *A*<sup>1</sup> ⊂ *R* \ {0} and *A*<sup>2</sup> = *R* \ (*A*<sup>1</sup> ∪ {0}). Such a weight is always positive definite. In addition, the weight is symmetric if and only if *A*<sup>1</sup> and *A*<sup>2</sup> contain all additive inverses of their elements. Let us now consider the triangular inequality: if there exist *x*, *y* ∈ *A*<sup>1</sup> such that *x* + *y* ∈ *A*2, then we must have

$$\alpha = f(\mathbf{x} + \mathbf{y}) \le f(\mathbf{x}) + f(\mathbf{y}) = 2.$$

Thus, in order for *f* to be an extremal weight, one chooses *α* = 2.

The overweight is a special case of such a weight function *f* with the choice *A*<sub>1</sub> = *R*<sup>×</sup>. Elements *x*, *y* ∈ *R*<sup>×</sup> such that *x* + *y* ∈ *R* \ ({0} ∪ *R*<sup>×</sup>) exist in many rings, for example in rings with a non-trivial Jacobson radical.

### *Relations to Other Weights*

Clearly, the homogeneous weight and the overweight coincide with the Lee weight on Z4, with the Hamming metric on finite fields $\mathbb{F}_q$, and finally with the weight of [6] on $\mathbb{Z}_{p^s}$.

**Proposition 1.** *The overweight over finite chain rings gives an upper bound on the normalized homogeneous weight.*

**Proof.** Over a finite chain ring with socle *S* and residue field size *q*, we have that the normalized homogeneous weight is defined as

$$wt(\mathbf{x}) = \begin{cases} 0 & \text{if } \mathbf{x} = \mathbf{0}, \\ \frac{q}{q-1} & \text{if } \mathbf{x} \in S \backslash \{0\}, \\ 1 & \text{else}. \end{cases}$$

If *x* ∈ *S* \ {0}, then also *x* ∈ *R* \ *R*×, and

$$wt(x) = \frac{q}{q-1} \le 2 = W(x).$$

If *x* ∈ *R*×, then *wt*(*x*) = 1 = *W*(*x*) and finally, if *x* ∈ *R* \ (*S* ∪ *R*×), we have that

$$wt(\mathbf{x}) = 1 \le 2 = W(\mathbf{x}),$$

which implies the result.

In [11], Bachoc defines the following weight on an $\mathbb{F}_p$-algebra *A* with unit group *A*<sup>×</sup>:

$$w\_B(\mathbf{x}) = \begin{cases} 0 & \text{if } \mathbf{x} = \mathbf{0}, \\ 1 & \text{if } \mathbf{x} \in A^\times, \\ p & \text{else}. \end{cases}$$

This is in the same spirit as the overweight. The weight of Bachoc, however, is only assumed to be positive definite. We note that, whenever *A* is an $\mathbb{F}_2$-algebra, the two weights coincide. The overweight can thus also be seen as a generalization of Bachoc's weight to a general finite ring.

Let us illustrate this connection with some examples: we consider the ring $M_2(\mathbb{F}_p)$ of 2 × 2 matrices over $\mathbb{F}_p$ and the ring $\mathbb{F}_p[x]/(x^2)$. In both cases, the Bachoc weight coincides with the homogeneous weight and the overweight only in the case *p* = 2.

Finally, in [5], Krotov defines the following weight over $\mathbb{Z}_{2m}$, for any positive integer *m*:

$$w\_K(\mathbf{x}) = \begin{cases} 0 & \text{if } \mathbf{x} = \mathbf{0}, \\ 2 & \text{if } \mathbf{2} \mid \mathbf{x}, \mathbf{x} \neq \mathbf{0}, \\ 1 & \text{else}. \end{cases}$$

Clearly, this is a further generalization of the Lee weight over Z4 and thus coincides there with the homogeneous weight and the overweight. However, even more is true: the weight of Krotov and the overweight coincide over $\mathbb{Z}_{2^s}$ for any positive integer *s*. Thus, the overweight may be considered a generalization of Krotov's weight over $\mathbb{Z}_{2^s}$ for any positive integer *s*.
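The claimed coincidence with Krotov's weight can be checked exhaustively for small *s*; a brief Python sketch, assuming the Z*m* representation used above (names ours):

```python
from math import gcd

def overweight(x, m):
    """Overweight on Z_m: 0 on 0, 1 on units, 2 on non-zero non-units."""
    x %= m
    return 0 if x == 0 else (1 if gcd(x, m) == 1 else 2)

def krotov_weight(x, m):
    """Krotov's weight w_K over Z_m for even m, as in the display above."""
    x %= m
    if x == 0:
        return 0
    return 2 if x % 2 == 0 else 1

# Over Z_{2^s} the non-zero even residues are exactly the non-zero
# non-units, so w_K and W agree on all of Z_{2^s}.
for s in range(2, 7):
    m = 2 ** s
    assert all(krotov_weight(x, m) == overweight(x, m) for x in range(m))
print("w_K and W coincide on Z_{2^s} for s = 2, ..., 6")
```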

Let us give some examples to illustrate the differences between the above-mentioned weights.

**Example 1.** *In the following table, wH denotes the Hamming weight, wt the normalized homogeneous weight, wL denotes the Lee weight, wK denotes Krotov's weight, wB denotes Bachoc's weight and finally W denotes the overweight. Let us consider two easy but pathological cases, namely* Z<sup>6</sup> *for Table 1 and* Z<sup>2</sup> × Z<sup>2</sup> *for Table 2.*

**Table 1.** Comparison of weights in Z6.


**Table 2.** Comparison of weights in Z<sup>2</sup> × Z2.


Finally, another interesting connection to the Hamming weight arises by considering the following linear injective isometry.

**Lemma 1.** *The map*

$$\begin{aligned} \psi : (\mathbb{F}_2[x]/(x^2), W) &\to (\mathbb{F}_2^2, w_H) \\ a + bx &\mapsto (a + b, \, b) \end{aligned}$$

*is a linear isometry.*

Recall that, over $\mathbb{F}_2[x]/(x^2)$, the overweight coincides with the weight of Bachoc and with the homogeneous weight.

### **4. Bounds for the Overweight**

In this section, we develop several bounds for the overweight, such as a Singleton bound, a sphere-packing bound, a Gilbert–Varshamov bound and a Plotkin bound.

For this, let us first define the minimum overweight distance of a code.

**Definition 8.** *Let <sup>C</sup>* <sup>⊆</sup> *<sup>R</sup><sup>n</sup> be a code. The minimum overweight distance of <sup>C</sup> is then denoted by D*(*C*) *and defined as*

$$D(\mathbf{C}) = \min \{ D(\mathbf{x}, \mathbf{y}) \mid \mathbf{x}, \mathbf{y} \in \mathbf{C}, \mathbf{x} \neq \mathbf{y} \}.$$

### *4.1. A Singleton Bound*

The Singleton bound usually follows a puncturing argument, which is possible for the overweight, but gives the same result as applying the following observation:

**Remark 1.** *For all* $\mathbf{x} \in R^n$*, we have that*

$$0 \le w_H(\mathbf{x}) \le W(\mathbf{x}) \le 2 w_H(\mathbf{x}) \le 2n,$$

*where wH denotes the Hamming weight.*

Hence, using the Singleton bound for the Hamming metric directly gives a Singleton bound for the overweight.

**Proposition 2.** *Let C* <sup>⊆</sup> *<sup>R</sup><sup>n</sup> be a code of size M and minimum overweight distance d. Then,*

$$d \le 2(n - \lceil \log\_{|\mathcal{R}|}(\mathcal{M}) \rceil + 1).$$

**Example 2.** *A trivial example of a code achieving the Singleton bound in Proposition 2 is given by C* = ⟨(*p*, . . . , *p*)⟩ ⊂ $\mathbb{Z}_{p^s}^n$*, having* $\log_{p^s}(|C|) = \frac{s-1}{s}$ *and minimum overweight distance d* = 2*n.*

However, if we define the rank of a linear code *C*, denoted by *rk*(*C*), to be the minimal number of generators of *C*, then the following bound is known for principal ideal rings [12,13]

$$d\_H(\mathbb{C}) \le n - rk(\mathbb{C}) + 1.$$

Codes achieving this bound are called Maximum Distance with respect to Rank (MDR) codes, in order to differentiate them from MDS codes. This is a sharper bound than the usual Singleton bound, since for non-free codes we have $rk(C) > \log_{|R|}(M)$.

In the case of linear codes, the rank thus also leads to a sharper Singleton-like bound for the overweight.

**Proposition 3.** *Let <sup>R</sup> be a principal ideal ring. Let <sup>C</sup>* <sup>⊆</sup> *<sup>R</sup><sup>n</sup> be a linear code of rank rk*(*C*) *and minimum overweight distance d. Then,*

$$d \le 2(n - rk(\mathbb{C}) + 1).$$

**Example 3.** *As an example of such a code, we can consider C* = ⟨(3, 6, 3, 0), (6, 6, 0, 3)⟩ ⊂ $\mathbb{Z}_9^4$*, having minimum overweight distance d* = 6*.*

### *4.2. A Sphere-Packing Bound*

The sphere-packing bound as well as the Gilbert–Varshamov bound are *generic* bounds, and we are able to provide them for the overweight in a simple form involving the volume of the balls in the underlying metric space.

We begin by defining balls with respect to the overweight distance.

**Definition 9.** *For a given radius r* ≥ 0*, the* overweight ball *Br*,*D*(*x*) *of radius r centered in x is defined as*

$$B\_{r,D}(\mathfrak{x}) := \{ y \in \mathbb{R}^n \mid D(\mathfrak{x}, y) \le r \}.$$

Clearly, the volume of such a ball is invariant under translation, i.e.,

$$|B_{r,D}(\mathbf{x})| = |B_{r,D}(\mathbf{y})|$$

for all $\mathbf{x}, \mathbf{y} \in R^n$.

Moreover, setting *u* := |*R*<sup>×</sup>| and *v* := |*R*| − 1 − *u*, we have the generating function $f_W(z) = 1 + uz + vz^2$ for this weight function, so that the generating function for *W* on $R^n$ takes the form

$$\begin{aligned} f_W^n(z) &= \left(1 + uz + vz^2\right)^n \\ &= \sum_{k_0 + k_u + k_v = n} \binom{n}{k_0, k_u, k_v} 1^{k_0} (uz)^{k_u} (vz^2)^{k_v} \\ &= \sum_{k=0}^{n} \sum_{\ell=0}^{n-k} \binom{n}{k} \binom{n-k}{\ell} u^{k} v^{\ell} z^{k+2\ell}, \end{aligned}$$

where we have set *k* = *k<sub>u</sub>* and ℓ = *k<sub>v</sub>*, and where the condition *k*<sub>0</sub> + *k<sub>u</sub>* + *k<sub>v</sub>* = *n* is transformed into 0 ≤ *k* ≤ *n*, 0 ≤ ℓ ≤ *n* − *k*. Now, setting *t* = *k* + 2ℓ, we obtain the simplified expression for the generating function

$$f\_W^n(z) = \sum\_{t=0}^{2n} \sum\_{\ell=0}^{\lfloor \frac{t}{2} \rfloor} \binom{n}{t-2\ell} \binom{n-t+2\ell}{\ell} u^{t-2\ell} v^{\ell} z^t.$$

**Lemma 2.** *The foregoing implies that the ball of radius e (centered in* 0*) has volume exactly*

$$|B_{e,D}(0)| \;=\; \sum_{t=0}^{e} \sum_{\ell=0}^{\lfloor \frac{t}{2} \rfloor} \binom{n}{t-2\ell} \binom{n-t+2\ell}{\ell} u^{t-2\ell} v^{\ell}.\tag{1}$$

We have thus provided an explicit formula for the cardinality of balls in $R^n$ with respect to the overweight distance.
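The volume formula of Equation (1) can be checked against brute-force enumeration for small parameters; a Python sketch for the special case *R* = Z*m* (names ours):

```python
from math import comb, gcd
from itertools import product

def overweight(x, m):
    """Overweight on Z_m: 0 on 0, 1 on units, 2 on non-zero non-units."""
    x %= m
    return 0 if x == 0 else (1 if gcd(x, m) == 1 else 2)

def ball_volume_formula(n, m, e):
    """|B_{e,D}(0)| in Z_m^n via Equation (1)."""
    u = sum(1 for x in range(1, m) if gcd(x, m) == 1)  # number of units
    v = m - 1 - u                                      # non-zero non-units
    total = 0
    for t in range(e + 1):
        for l in range(t // 2 + 1):
            # binom(n, t-2l) * binom(n-t+2l, l) = comb(n, k) * comb(n-k, l)
            # with k = t - 2l; both vanish when k + l > n.
            k = t - 2 * l
            if k + l <= n:
                total += comb(n, k) * comb(n - k, l) * u ** k * v ** l
    return total

def ball_volume_brute(n, m, e):
    """Count the vectors of Z_m^n of overweight at most e directly."""
    return sum(1 for x in product(range(m), repeat=n)
               if sum(overweight(c, m) for c in x) <= e)

# Check on Z_6^3 for all radii (the maximal overweight is 2n = 6).
assert all(ball_volume_formula(3, 6, e) == ball_volume_brute(3, 6, e)
           for e in range(7))
print("Equation (1) matches brute-force enumeration on Z_6^3")
```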

We now obtain the sphere-packing bound for the overweight distance by combining the previous results. As before, *R* is a finite ring and *u* = |*R*×|, whereas *v* = |*R*| − 1 − *u* represents the number of non-zero non-units.

**Corollary 1** (Sphere-Packing Bound)**.** *Let <sup>C</sup>* <sup>⊆</sup> *<sup>R</sup><sup>n</sup> be a (not necessarily linear) code of length n, and minimum overweight distance d* = 2*e* + 1*. Then, we have*

$$|C| \le \frac{|R|^n}{|B_{e,D}(0)|},$$

*where the cardinality of Be*,*D*(0) *is given in Equation* (1)*.*

If the minimum distance is even and *R* is a finite local ring with maximal ideal *J*, this bound can be adapted as follows.

**Corollary 2.** *Let <sup>R</sup> be a local ring with maximal ideal J, <sup>q</sup>* <sup>=</sup> <sup>|</sup>*R*/*J*<sup>|</sup> *and <sup>C</sup>* <sup>⊆</sup> *<sup>R</sup>n*+<sup>1</sup> *be a (not necessarily linear) code of length n* + 1 *and minimum overweight distance d* = 2*e* + 2*. Then,*

$$|C| \le \frac{|R|^{n+1}}{q \, |B_{e,D}(0)|},$$

*where Be*,*D*(0) *is the overweight ball of radius e in Rn, and its volume is given in Equation* (1)*.*

**Proof.** Pick *x*1, ... , *xq* such that the cosets *x*<sup>1</sup> + *J*, ... , *xq* + *J* form a partition of *R*. For all *m* ∈ *J*, define the set

$$S\_{\mathfrak{m}} := \{ \mathfrak{x}\_1 + m, \dots, \mathfrak{x}\_{\mathfrak{q}} + m \}.$$

Notice that the sets *Sm* form a partition of *R* and that all elements of *Sm* have mutual overweight distance 1. Thus, given *r* ∈ *R*, we denote with *S*(*r*) the unique set *Sm* that contains *r*. Furthermore, let

$$
\pi: \mathbb{R}^{n+1} \to \mathbb{R}^n
$$

be the projection that removes the (*n* + 1)-th coordinate and

$$Z(\mathbf{x}) := \{ z \in R^{n+1} \mid D(\pi(z), \pi(\mathbf{x})) \le e, \; z_{n+1} \in S(x_{n+1}) \}.$$

Now, if *x* ≠ *y* ∈ $R^{n+1}$ are two codewords, then *Z*(*x*) and *Z*(*y*) are disjoint. Indeed, if *z* ∈ *Z*(*x*) ∩ *Z*(*y*), then *S*(*x*<sub>*n*+1</sub>) = *S*(*y*<sub>*n*+1</sub>), as they cannot be disjoint; hence, *D*(*x*<sub>*n*+1</sub>, *y*<sub>*n*+1</sub>) ≤ 1. Furthermore, both *D*(*π*(*x*), *π*(*z*)) and *D*(*π*(*y*), *π*(*z*)) are at most *e*, implying that *D*(*π*(*x*), *π*(*y*)) ≤ 2*e*. It follows that *D*(*x*, *y*) ≤ 2*e* + 1 < *d*, which is a contradiction. Since the sets *Z*(*x*) are pairwise disjoint and each has cardinality *q* · |*B*<sub>*e*,*D*</sub>(0)|, the bound follows.

Finding non-trivial examples of perfect codes is as notoriously hard as it is over finite fields in the Hamming metric. Clearly, in the case *R* = $\mathbb{F}_q$, there are non-trivial perfect codes, as the overweight coincides with the Hamming weight. Examples of such codes can be found in [5] (Section IV). Furthermore, in the case *R* = $\mathbb{Z}_{p^k}$, linear 1-perfect codes are classified in terms of their parity-check matrix in [6] (Theorem IV.1).

### *4.3. A Gilbert–Varshamov Bound*

With arguments similar to those for the sphere-packing bound, we can also obtain a lower bound on the maximal size of a code with a fixed minimum distance.

**Proposition 4** (Gilbert–Varshamov bound)**.** *Let R be a finite ring, n a positive integer and <sup>d</sup>* ∈ {0, ... , 2*n*}*. Then, there exists a code <sup>C</sup>* <sup>⊆</sup> *<sup>R</sup><sup>n</sup> of minimum overweight distance at least d satisfying*

$$|C| \ge \frac{|R|^n}{|B_{d-1,D}(0)|},$$

*where the volume is given in Equation* (1) *for e* = *d* − 1*, i.e.,*

$$\left| B_{d-1,D}(0) \right| \;=\; \sum_{t=0}^{d-1} \sum_{\ell=0}^{\lfloor \frac{t}{2} \rfloor} \binom{n}{t-2\ell} \binom{n-t+2\ell}{\ell} u^{t-2\ell} v^{\ell}.$$

**Proof.** Assume that $C \subseteq R^n$ is a largest code of length *n* and minimum overweight distance at least *d*. Then, the balls $B_{d-1,D}(\mathbf{x})$ centered at the codewords $\mathbf{x} \in C$ must cover the space $R^n$: if this were not the case, one would find an element $\mathbf{y} \in R^n$ not contained in the ball of radius *d* − 1 around any codeword. This word *y* would have distance at least *d* to every word of *C*, and thus *C* ∪ {*y*} would be a strictly larger code with minimum distance at least *d*, contradicting the choice of *C*.

From the covering argument, we then see that

$$|C| \ge \frac{|R|^n}{|B_{d-1,D}(0)|},$$

as desired.

Let us consider the special case where *R* is a finite chain ring. Since the overweight is an additive weight, and the conditions of [14] are easily verified, we can use [14] (Theorem 22) to obtain that random linear codes over *R<sup>n</sup>* achieve the (asymptotic) Gilbert–Varshamov bound with high probability.

**Example 4.** *As an easy example, we can consider* $R^n = \mathbb{Z}_8^2$*. The maximal minimum overweight distance is given by d* = 2*n* = 4*. The Gilbert–Varshamov bound states for this example that there exists a code C with* |*C*| > 1*, as* $|B_{3,D}(0)| = 55$*. For example, the code C* = ⟨(2, 2)⟩ *has four elements.*
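The numbers in Example 4 are small enough to verify by brute force; a Python sketch (names ours):

```python
from math import gcd
from itertools import product

def W(xs, m=8):
    """Overweight of a vector over Z_8."""
    total = 0
    for x in xs:
        x %= m
        if x != 0:
            total += 1 if gcd(x, m) == 1 else 2
    return total

# |B_{3,D}(0)| in Z_8^2:
ball = sum(1 for x in product(range(8), repeat=2) if W(x) <= 3)
print(ball)  # 55

# The code C = <(2, 2)> and its minimum overweight distance:
C = {tuple((k * c) % 8 for c in (2, 2)) for k in range(8)}
dmin = min(W([(a - b) % 8 for a, b in zip(x, y)])
           for x in C for y in C if x != y)
print(len(C), dmin)  # 4 4
```

The minimum overweight distance of ⟨(2, 2)⟩ indeed attains the maximal value 2*n* = 4 here.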

### *4.4. A Plotkin Bound*

Over a local ring, we can use methods similar to the ones used for the classical Plotkin bound to obtain an analogue of the Plotkin bound for (not necessarily linear) codes equipped with the overweight.

For the rest of this section, *R* is a finite local ring with maximal ideal *J*. The notation stems from the Jacobson radical of the ring *R*. Note that the factor ring *R*/*J* is a finite field, whose cardinality will be denoted by *q*.

Similarly to the Hamming case, for a subset *A* ⊆ *R*, we will denote by

$$
\overline{W}(A) = \frac{\sum_{a \in A} W(a)}{|A|}
$$

the average weight of the subset *A*.

**Lemma 3.** *Let I* ⊆ *R be a left or right ideal. Then,*

$$
\overline{W}(I) = \begin{cases}
\frac{|R| + |J| - 2}{|R|} & \text{if } I = R, \\
2 \left( 1 - \frac{1}{|I|} \right) & \text{if } \{0\} \subsetneq I \subsetneq R, \\
0 & \text{else.}
\end{cases}
$$

**Proof.** Note that the last case is trivial, as *I* = {0}. If {0} ⊊ *I* ⊊ *R*, then all non-zero elements of *I* are non-units and thus have weight 2, so this case follows as well.

Finally, if *I* = *R*, then there are |*R* \ *J*| = |*R*|−|*J*| elements of weight 1 and |*J*| − 1 elements of weight 2. Hence, the total weight is |*R*|−|*J*| + 2(|*J*| − 1) and dividing by |*R*| yields the claim.
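Lemma 3 can be checked directly on a small local ring, e.g. *R* = Z8 with *J* = {0, 2, 4, 6}; a Python sketch (names ours):

```python
from math import gcd

def W(x, m=8):
    """Overweight on Z_8: 0 on 0, 1 on units (odd residues), 2 otherwise."""
    x %= m
    return 0 if x == 0 else (1 if gcd(x, m) == 1 else 2)

def avg_weight(I, m=8):
    """Average overweight of a subset I of Z_m."""
    return sum(W(x, m) for x in I) / len(I)

R = range(8)
J = [0, 2, 4, 6]   # maximal ideal of Z_8
I = [0, 4]         # a smaller non-zero ideal

print(avg_weight(R))   # (|R| + |J| - 2)/|R| = (8 + 4 - 2)/8 = 1.25
print(avg_weight(J))   # 2(1 - 1/4) = 1.5
print(avg_weight(I))   # 2(1 - 1/2) = 1.0
```

The three printed values match the three cases of Lemma 3, and they also illustrate Corollary 3, since 1.5 is the largest of the averages over the proper ideals and exceeds the average over *R*.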

**Corollary 3.** *Let R be a local ring with maximal ideal J and assume that* |*J*| ≥ 2*. Then, we have that* $\overline{W}(J) \ge \overline{W}(I)$ *for all left or right ideals I* ⊆ *R.*

**Proof.** We immediately see that $\overline{W}(J) \ge \overline{W}(I)$ for all ideals *I* ⊆ *J*. Now, consider the case *I* = *R*. We have that

$$\begin{aligned} \overline{W}(R) &= \frac{|R| + |J| - 2}{|R|} = \frac{|R \setminus J|}{|R|} + 2\,\frac{|J| - 1}{|R|} \\ &= \frac{|R \setminus J|}{|R|} + 2\,\frac{|J| - 1}{|J|} \cdot \frac{|J|}{|R|} \\ &\le 2\,\frac{|J| - 1}{|J|} \cdot \frac{|R \setminus J|}{|R|} + 2\,\frac{|J| - 1}{|J|} \cdot \frac{|J|}{|R|} \\ &= 2\,\frac{|J| - 1}{|J|} = \overline{W}(J), \end{aligned}$$

where we used that $2\frac{|J|-1}{|J|} \ge 1$, since $|J| \ge 2$.

To ease the notation, let us denote by *η* the following

$$\eta = \overline{\mathcal{W}}(J) = 2\left(1 - \frac{1}{|J|}\right).$$

In what follows, we provide a Plotkin bound for the overweight over a local ring *R* with maximal ideal *J*. The case |*J*| = 1 is already well studied, since, in this case, *R* is a field and *D* is simply the Hamming distance. Hence, we will assume that |*J*| ≥ 2.

We start with a lemma for the Hamming weight. Its proof follows the idea of the classical Plotkin bound, which can be found in [15] and, for the homogeneous weight, in [10].

**Lemma 4.** *Let I* ⊆ *R be a subset and P be a probability distribution on I. Then, we have that*

$$\sum\_{\mathbf{x}\in I} \sum\_{y\in I} w\_H(\mathbf{x} - y) P(\mathbf{x}) P(y) \le 1 - \frac{1}{|I|}.$$

**Proof.** We have that

$$\sum\_{x \in I} \sum\_{y \in I} w\_{H}(x - y)P(x)P(y) = \sum\_{x \in I} P(x)(1 - P(x)) = \sum\_{x \in I} P(x) - \sum\_{x \in I} P(x)^2.$$

If we apply the Cauchy–Schwarz inequality to the latter sum, we obtain that

$$\sum\_{x\in I} P(x) - \sum\_{x\in I} P(x)^2 \le 1 - \frac{1}{|I|} \left( \sum\_{x\in I} P(x) \right)^2 = 1 - \frac{1}{|I|},$$

from which we conclude the claim.
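Lemma 4 is easy to stress-test with random distributions: since *w<sub>H</sub>*(*x* − *y*) = 1 exactly when *x* ≠ *y*, the double sum collapses to 1 − ∑<sub>*x*</sub> *P*(*x*)², which the sketch below exploits:

```python
import random

random.seed(0)

def lemma4_holds(I, trials=300):
    # Draw random probability distributions P on I and verify the bound
    # sum_{x,y} w_H(x - y) P(x) P(y) <= 1 - 1/|I|.
    n = len(I)
    for _ in range(trials):
        raw = [random.random() for _ in range(n)]
        total = sum(raw)
        P = [r / total for r in raw]
        # w_H(x - y) = 1 whenever x != y, so the double sum is 1 - sum_x P(x)^2.
        lhs = 1 - sum(p * p for p in P)
        if lhs > 1 - 1 / n + 1e-12:
            return False
    return True

print(lemma4_holds(list(range(8))))  # True
```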

We are now ready for the most important step of the Plotkin bound. As before, *R* is a local ring with non-zero maximal ideal *J* and $\eta = \overline{\mathcal{W}}(J)$.

**Proposition 5.** *Let P be a probability distribution on R. Then, it holds that*

$$\sum\_{x\in R} \sum\_{y\in R} \mathcal{W}(x - y) P(x) P(y) \le \eta.$$

**Proof.** Let *q* = |*R*/*J*| and pick *x*<sub>1</sub>, ... , *x<sub>q</sub>* such that *x<sub>i</sub>* + *J* ≠ *x<sub>j</sub>* + *J* for *i* ≠ *j*. Then, the cosets $\overline{x\_i} := x\_i + J$ form a partition of *R*. For all *k* ∈ {1, . . . , *q*}, we define

$$P\_k = \sum\_{x \in \overline{x\_k}} P(x).$$

It follows that $\sum\_{k=1}^{q} P\_k = 1$. By rewriting the initial sum as a sum over all cosets, we obtain that

$$\begin{split} &\sum\_{x\in R}\sum\_{y\in R}\mathcal{W}(x-y)P(x)P(y) \\ &=\sum\_{k=1}^{q}\sum\_{x\in\overline{x\_{k}}}\sum\_{y\in R}\mathcal{W}(x-y)P(x)P(y) \\ &=\sum\_{k=1}^{q}\sum\_{x\in\overline{x\_{k}}}\left(\sum\_{y\in\overline{x\_{k}}}2w\_{H}(x-y)P(x)P(y) + \sum\_{z\in R\setminus\overline{x\_{k}}}w\_{H}(x-z)P(x)P(z)\right) \\ &=\sum\_{k=1}^{q}\left(2\sum\_{x\in\overline{x\_{k}}}\sum\_{y\in\overline{x\_{k}}}w\_{H}(x-y)P(x)P(y) + \sum\_{x\in\overline{x\_{k}}}\sum\_{z\in R\setminus\overline{x\_{k}}}P(x)P(z)\right) \\ &=\sum\_{k=1}^{q}\left(2\sum\_{x\in\overline{x\_{k}}}\sum\_{y\in\overline{x\_{k}}}w\_{H}(x-y)P(x)P(y) + \sum\_{x\in\overline{x\_{k}}}P(x)(1-P\_{k})\right). \end{split}$$

If *P<sub>k</sub>* ≠ 0, then *P*˜(*x*) := *P*(*x*)/*P<sub>k</sub>* defines a probability distribution on $\overline{x\_k}$. In this case, we apply Lemma 4 to obtain that

$$\begin{aligned} &\sum\_{x\in\overline{x\_{k}}}\sum\_{y\in\overline{x\_{k}}}w\_{H}(x-y)P(x)P(y)\\ &=P\_{k}^{2}\left(\sum\_{x\in\overline{x\_{k}}}\sum\_{y\in\overline{x\_{k}}}w\_{H}(x-y)\frac{P(x)P(y)}{P\_{k}^{2}}\right)\\ &\leq P\_{k}^{2}\left(1-\frac{1}{|J|}\right). \end{aligned}$$

Note that the same inequality also trivially holds if *P<sub>k</sub>* = 0. Applying this and using that $\sum\_{x\in\overline{x\_k}} P(x) = P\_k$, we obtain that

$$\begin{aligned} &\sum\_{k=1}^{q} \left( 2\sum\_{x\in\overline{x\_{k}}} \sum\_{y\in\overline{x\_{k}}} w\_{H}(x-y)P(x)P(y) + \sum\_{x\in\overline{x\_{k}}} P(x)(1-P\_{k}) \right) \\ &\leq \sum\_{k=1}^{q} \left( P\_{k}^{2} \cdot 2\left(1 - \frac{1}{|J|}\right) + P\_{k}(1-P\_{k}) \right) \\ &\leq \sum\_{k=1}^{q} P\_{k} \cdot 2\left(1 - \frac{1}{|J|}\right) = 2\left(1 - \frac{1}{|J|}\right) = \eta, \end{aligned}$$

where we used that $2\left(1 - \frac{1}{|J|}\right) \ge 1$ since $|J| \ge 2$ in the last inequality.
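Proposition 5 can likewise be checked with random distributions on *R* = Z<sub>8</sub>, for which *η* = 2(1 − 1/4) = 1.5:

```python
import random
from math import gcd

random.seed(0)

def overweight(a, m=8):
    # W(0) = 0; W(units) = 1; W(non-zero non-units) = 2.
    a %= m
    if a == 0:
        return 0
    return 1 if gcd(a, m) == 1 else 2

eta = 2 * (1 - 1 / 4)  # |J| = 4 for J = 2*Z_8, so eta = 1.5

def prop5_holds(trials=300):
    # Draw random distributions P on Z_8 and verify
    # sum_{x,y} W(x - y) P(x) P(y) <= eta.
    for _ in range(trials):
        raw = [random.random() for _ in range(8)]
        total = sum(raw)
        P = [r / total for r in raw]
        val = sum(overweight(x - y) * P[x] * P[y]
                  for x in range(8) for y in range(8))
        if val > eta + 1e-12:
            return False
    return True

print(prop5_holds())  # True
```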

To complete the Plotkin bound for the overweight, we now follow the steps in [10]. Using Proposition 5, we obtain the following result:

**Proposition 6.** *Let C* ⊆ *R<sup>n</sup> be a (not necessarily linear) code of minimum overweight distance d. Then,*

$$|\mathbb{C}|(|\mathbb{C}|-1)d \le \sum\_{\mathbf{x} \in \mathbb{C}} \sum\_{\mathbf{y} \in \mathbb{C}} D(\mathbf{x}, \mathbf{y}) \le |\mathbb{C}|^2 n \,\eta.$$

**Proof.** The first inequality follows since the distance between all distinct pairs of *C* is at least *d*.

For the second inequality, let *p<sub>i</sub>* : *R<sup>n</sup>* → *R* be the projection onto the *i*th coordinate. Note that

$$P\_i(z) := \frac{|p\_i^{-1}(z) \cap \mathbb{C}|}{|\mathbb{C}|}$$

defines a probability distribution on *R* for all *i* ∈ {1, ... , *n*}. Using Proposition 5, we obtain that

$$\begin{aligned} \sum\_{\mathbf{x} \in \mathbb{C}} \sum\_{\mathbf{y} \in \mathbb{C}} D(\mathbf{x}, \mathbf{y}) &= \sum\_{i=1}^{n} \sum\_{\mathbf{x} \in \mathbb{C}} \sum\_{\mathbf{y} \in \mathbb{C}} W(\mathbf{x}\_{i} - y\_{i}) \\ &= \sum\_{i=1}^{n} \sum\_{r \in R} \sum\_{s \in R} W(r - s) P\_{i}(r) P\_{i}(s) |\mathbb{C}|^{2} \\ &\leq |\mathbb{C}|^{2} \sum\_{i=1}^{n} \eta = |\mathbb{C}|^{2} n \eta. \end{aligned}$$

Thus, we obtain the claim.

From this inequality, we obtain a Plotkin bound for the overweight distance. As before, *R* is a local ring with non-zero maximal ideal *J* and $\eta = 2\left(1 - \frac{1}{|J|}\right)$.

**Theorem 3** (Plotkin bound for the overweight distance)**.** *Let C* ⊆ *R<sup>n</sup> be a (not necessarily linear) code of minimum overweight distance d* = *D*(*C*) *and assume that d* > *nη. Then,*

$$|\mathbf{C}| \le \frac{d}{d - n\eta}.$$

**Proof.** We divide both sides of the inequality in Proposition 6 by |*C*| and rearrange to obtain that

$$|\mathcal{C}| (d - n\eta) \le d.$$

The result then follows from the assumption that *d* − *nη* > 0.

By rearranging the same inequality, we also obtain the following version of the Plotkin bound, which does not require the assumption that *d* > *nη*.

**Corollary 4.** *Let C* ⊆ *R<sup>n</sup> be a (not necessarily linear) code with* |*C*| ≥ 2 *and let d* = *D*(*C*)*. Then,*

$$d \le \frac{|\mathcal{C}| \, n\eta}{|\mathcal{C}| - 1}.$$

**Proof.** We obtain this by dividing both sides of the inequality in Proposition 6 by |*C*|(|*C*| − 1), which is non-zero by assumption.

**Remark 2.** *Note that W is a homogeneous weight on* Z4*, and thus our bound coincides with the bound from [10] for the homogeneous weight on* Z4*.*

**Example 5.** *If we consider codes over* Z<sub>9</sub> *and fix* |*C*| = 9*, n* = 3*, we obtain that d* ≤ 9/2 *and hence, by integrality, that d* ≤ 4*. The linear code*

$$\mathbf{C} = \langle (1, 1, 3) \rangle$$

*attains this bound.*
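A short computation confirms this example: the code ⟨(1, 1, 3)⟩ over Z<sub>9</sub> has 9 codewords, minimum overweight distance 4, and the bound of Corollary 4 evaluates exactly to 9/2:

```python
from fractions import Fraction
from math import gcd

def overweight(a, m=9):
    # W(0) = 0; W(units) = 1; W(non-zero elements of J = 3*Z_9) = 2.
    a %= m
    if a == 0:
        return 0
    return 1 if gcd(a, m) == 1 else 2

# The linear code C = <(1, 1, 3)> over Z_9.
code = [tuple(a * g % 9 for g in (1, 1, 3)) for a in range(9)]

# Minimum overweight distance over all distinct pairs.
d = min(sum(overweight(ci - ei) for ci, ei in zip(c, e))
        for c in code for e in code if c != e)

eta = Fraction(2) * (1 - Fraction(1, 3))       # |J| = 3, so eta = 4/3
bound = len(code) * 3 * eta / (len(code) - 1)  # Corollary 4 with n = 3
print(len(code), d, bound)  # 9 4 9/2
```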

### **5. A Johnson Bound for the Homogeneous Weight**

Another interesting bound is the Johnson bound, due to its relation with list-decodability. In its classical form, the Johnson bound gives an upper bound on the largest size *A<sub>q</sub>*(*n*, *d*, *w*) of a constant-weight-*w* code over F<sub>*q*</sub> of length *n* and minimum Hamming distance *d*. However, for the list-decodability of a code, we are interested in codes having codewords of weight *at most w*. In fact, if the largest size $\overline{A}\_q(n, d, w)$ of such a code is small, e.g., at most a constant *L*, then every ball of radius *w* contains at most *L* codewords, and hence one can decode a list of size at most *L*. In more detail, the Johnson bound for list-decodability in the Hamming metric states that, if

$$\frac{w}{n} < (1 - \frac{1}{q})\left(1 - \sqrt{1 - \frac{q}{q - 1}\delta}\right) = J\_q(\delta),$$

where *δ* denotes the relative minimum distance, then $\overline{A}\_q(n, d, w) \le n(d - 1)$.

This famous bound is still missing for the well-studied homogeneous weight, which, like the overweight, is a generalization of the Lee weight over Z<sub>4</sub>. In this section, we prove a Johnson bound for the homogeneous weight from Definition 6, which we denote by *wt*, and we let *γ* be its average weight on *R*. By abuse of notation, we also denote by *wt* the extension of *wt* to *R<sup>n</sup>*, that is,

$$wt(x) = \sum\_{i=1}^{n} wt(x\_i).$$

Note that *wt* does not necessarily satisfy the triangle inequality. In [7] (Theorem 2), it is shown that the homogeneous weight on Z*<sup>m</sup>* satisfies the triangle inequality if and only if *m* is not divisible by 6.

We define the ball of radius *r* with respect to a homogeneous weight *wt* to be the set of all elements having distance less than or equal to *r*.

**Definition 10.** *Let y* ∈ *R<sup>n</sup> and r* ∈ ℝ<sub>≥0</sub>*. The ball B<sub>r,wt</sub>*(*y*) *of radius r centered in y is defined as*

$$B\_{r, wt}(y) := \{ x \in R^n \mid wt(x - y) \le r \}.$$

Our aim is to provide a Johnson bound for the homogeneous weight over Frobenius rings. Thus, we begin by defining list-decodability.

**Definition 11.** *Let R be a finite ring. Given ρ* ∈ ℝ<sub>≥0</sub>*, a code C* ⊆ *R<sup>n</sup> is called* (*ρ*, *L*) *list-decodable (with respect to wt) if, for every y* ∈ *R<sup>n</sup>, it holds that*

$$|B\_{\rho n, wt}(y) \cap \mathcal{C}| \le L.$$

Over Frobenius rings, the following result holds, which will play an important role in the proof of the Johnson bound.

**Proposition 7** ([10] (Corollary 3.3))**.** *Let R be a Frobenius ring, C* ⊆ *R<sup>n</sup> a (not necessarily linear) code of minimum distance d, and ω* = max{*wt*(*c*) | *c* ∈ *C*}*. If ω* ≤ *γn, then*

$$2|\mathbb{C}|(|\mathbb{C}|-1)d \le \sum\_{\mathbf{x}, \mathbf{y} \in \mathbb{C}} wt(\mathbf{x} - \mathbf{y}) \le 2|\mathbb{C}|^2 \omega - \frac{|\mathbb{C}|^2 \omega^2}{\gamma n}.$$

With this, we obtain an analogue of the Johnson bound for the homogeneous weight.

**Theorem 4.** *Let R be a Frobenius ring and C* ⊆ *R<sup>n</sup> be a (not necessarily linear) code of minimum distance d. Assume that ρ* ≤ *γ. Then, it holds that C is* (*ρ*, *dγn*) *list-decodable if one of the following conditions is satisfied:*

*1.* $n\gamma(n\gamma - d) \le -1$*;*
*2.* $\rho \le \gamma - \sqrt{\left(\gamma - \frac{d}{n}\right)\gamma + \frac{1}{n^2}}$*.*


**Proof.** Assume that *e* ≤ *ρn* and let *y* ∈ *R<sup>n</sup>*. We have to show that, under the given conditions, |*B<sub>e,wt</sub>*(*y*) ∩ *C*| ≤ *dγn*.

Note first that we may assume that *y* = 0; otherwise, simply consider the translate

$$\mathcal{C}' = \{c - y \mid c \in \mathcal{C}\}.$$

Assume that *x*<sub>1</sub>, ... , *x<sub>N</sub>* are in *B<sub>e,wt</sub>*(0) ∩ *C*. We have that *wt*(*x<sub>i</sub>* − *x<sub>j</sub>*) ≥ *d* for *i* ≠ *j*; thus, using Proposition 7 and *wt*(*x* − *y*) = *wt*(*y* − *x*), we obtain that

$$N(N-1)d \le 2\sum\_{i<j} wt(x\_i - x\_j) \le 2N^2 e - \frac{N^2 e^2}{\gamma n}.$$

Hence, it follows that

$$N(d\gamma n - 2e\gamma n + e^2) \le d\gamma n.$$

It holds that

$$d\gamma n - 2e\gamma n + e^2 = (n\gamma - e)^2 - n\gamma (n\gamma - d).$$

If we assume that *nγ*(*nγ* − *d*) ≤ −1, then we clearly have

$$(n\gamma - e)^2 - n\gamma(n\gamma - d) \ge 1.$$

If this is not the case, then *nγ*(*nγ* − *d*) > −1, and hence $\sqrt{\left(\gamma - \frac{d}{n}\right)\gamma + \frac{1}{n^2}}$ is well-defined. Thus, if

$$\frac{e}{n} \le \gamma - \sqrt{\left(\gamma - \frac{d}{n}\right)\gamma + \frac{1}{n^2}},$$

then

$$n\gamma - e \ge \sqrt{(n\gamma - d)n\gamma + 1},$$

and hence

$$(n\gamma - e)^2 - n\gamma(n\gamma - d) \ge 1.$$

It follows that *N* ≤ *dγn*.

**Remark 3.** *Note that the second condition already forces ρ* ≤ *γ.*

**Example 6.** *As an easy example, we can consider the code C* = ⟨(1, 1), (4, 0)⟩ ⊂ Z<sub>8</sub><sup>2</sup> *of minimum homogeneous distance 2 and γ* = 1*. The second condition of Theorem 4 is clearly satisfied by choosing ρ* = 1/2 *since*

$$
\gamma - \sqrt{\left(\gamma - \frac{d}{n}\right)\gamma + \frac{1}{n^2}} = \frac{1}{2},
$$

*implying that the code is* (1/2, 4) *list-decodable. For example, when setting y* = (1, 2)*, we see that*

$$B\_{1,wt}(y)\cap \mathcal{C} = \{(1,1),(2,2),(1,5),(6,2)\},$$

*so the bound is attained.*
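This example can also be verified computationally. The sketch below assumes that the homogeneous weight of Definition 6 on Z<sub>8</sub> takes the values *wt*(0) = 0, *wt*(4) = 2, and *wt*(*x*) = 1 otherwise, which is consistent with *γ* = 1 as stated in the example:

```python
def hom_wt(a, m=8):
    # Assumed homogeneous weight on Z_8: wt(0) = 0, wt(4) = 2, wt(x) = 1 otherwise.
    a %= m
    if a == 0:
        return 0
    return 2 if a == m // 2 else 1

# C = <(1, 1), (4, 0)> as a subgroup of Z_8^2 (16 codewords of the form (a+4b, a)).
code = {((a + 4 * b) % 8, a) for a in range(8) for b in range(2)}

# Minimum homogeneous distance (for a group code, differences are codewords).
d = min(hom_wt(c[0]) + hom_wt(c[1]) for c in code if c != (0, 0))

# Codewords of C inside the ball of radius 1 around y = (1, 2).
y = (1, 2)
hits = sorted(c for c in code
              if hom_wt(c[0] - y[0]) + hom_wt(c[1] - y[1]) <= 1)
print(d, hits)  # 2 [(1, 1), (1, 5), (2, 2), (6, 2)]
```

Note that the list contains exactly four codewords, so the list size bound *dγn* = 4 is attained.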

### **6. Open Problems**

We conclude this paper with some interesting open questions for the newly defined overweight that we have encountered.

**Problem 1.** *Classify the codes that attain the bounds derived in this paper.*

**Problem 2.** *Give a Griesmer bound, an Elias–Bassalygo bound, and a Johnson bound for the overweight.*

Proving analogues of the Griesmer, Elias–Bassalygo, and Johnson bounds poses a difficult challenge over rings, and in particular for the overweight, due to the necessity of an effective upper bound on the sum of the distances.

**Author Contributions:** All authors contributed to the content of this article. Conceptualization, N.G., M.G., J.R. and V.W.; methodology, N.G., M.G., J.R. and V.W.; validation, N.G., M.G., J.R. and V.W.; formal analysis, N.G., M.G., J.R. and V.W.; investigation, N.G., M.G., J.R. and V.W.; data curation, N.G., M.G., J.R. and V.W.; writing—original draft preparation, N.G., M.G., J.R. and V.W.; writing—review and editing, N.G., M.G., J.R. and V.W.; visualization, N.G., M.G., J.R. and V.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** The first and third author are supported by armasuisse Science and Technology (Project Nr.: CYD C-2020010) and were supported in part by the Swiss National Science Foundation Grant No. 188430. The fourth author is supported by the Swiss National Science Foundation Grant No. 195290 and by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 899987.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Article* **Gaussian Multiuser Wiretap Channels in the Presence of a Jammer-Aided Eavesdropper †**

**Rémi A. Chou 1,\* and Aylin Yener 2,\***


**Abstract:** This paper considers secure communication in the presence of an eavesdropper and a malicious jammer. The jammer is assumed to be oblivious of the communication signals emitted by the legitimate transmitter(s) but can employ any jamming strategy subject to a given power constraint and shares her jamming signal with the eavesdropper. Four such models are considered: (i) the Gaussian point-to-point wiretap channel; (ii) the Gaussian multiple-access wiretap channel; (iii) the Gaussian broadcast wiretap channel; and (iv) the Gaussian symmetric interference wiretap channel. The use of pre-shared randomness between the legitimate users is not allowed in our models. Inner and outer bounds are derived for these four models. For (i), the secrecy capacity is obtained. For (ii) and (iv) under a degraded setup, the optimal secrecy sum-rate is characterized. Finally, for (iii), ranges of model parameter values for which the inner and outer bounds coincide are identified.

**Keywords:** Gaussian wiretap channel; Gaussian multiple-access wiretap channel; Gaussian broadcast wiretap channel; jamming; secure communication

### **1. Introduction**

Consider secure communication over wireless channels between legitimate parties in the presence of an eavesdropper and a malicious jammer. The jammer is assumed to be oblivious of the legitimate users' communication but can employ any jamming strategy subject to a given power constraint. Consequently, the main channel between the legitimate users is arbitrarily varying [1]. Unlike most works that consider arbitrarily varying channels, however, pre-shared randomness is not available to the legitimate users in our scenario. Additionally, the jammer shares her jamming signal with the eavesdropper who can thus perfectly cancel the effect of the jamming signal on her channel. In this paper, we study the fundamental limits of secure communication rates in the presence of such a jammer-aided eavesdropper over four Gaussian wiretap channel models: the Gaussian wiretap channel [2], the Gaussian multiple-access wiretap channel [3], the Gaussian broadcast wiretap channel [4], and the Gaussian symmetric interference wiretap channel.

### *1.1. Contributions*

Our contributions are summarized as follows.


**Citation:** Chou, R.A.; Yener, A. Gaussian Multiuser Wiretap Channels in the Presence of a Jammer-Aided Eavesdropper. *Entropy* **2022**, *24*, 1595. https:// doi.org/10.3390/e24111595

Academic Editor: Eduard Jorswieck

Received: 29 September 2022 Accepted: 28 October 2022 Published: 2 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Our main strategy to handle our multiuser settings is to reduce the problem to single-user coding. Previously known techniques for such a reduction, such as rate-splitting [5] and successive cancellation decoding [5] (Appendix C), which have been developed for multiple-access settings without security constraints, do not easily apply to wiretap channel models. These techniques consist of achieving the corner points of achievability regions that can be described by polymatroids whose corner points have *positive components*. However, regions described by polymatroids whose corner points have *negative components*, as in our wiretap channel models, prevent the application of these techniques. We overcome this roadblock by proposing novel time-sharing strategies coupled with appropriate secret-key exchanges between the legitimate users. As seen in the proofs of our results, eavesdropping and arbitrary jamming are not easy to decouple in the secrecy analysis. In particular, the secrecy analysis of our proposed model does not follow from a standard secrecy analysis in the absence of jamming, as we need to consider (i) codewords uniformly distributed over spheres, which we use to handle an arbitrarily varying main channel; and (ii) block-Markov coding and specific time-sharing strategies (to allow the reduction of multiuser coding to single-user coding), which create inter-dependencies between coding blocks. Note that our achievability schemes also rely on point-to-point codes developed in [1]. One of the benefits of reducing multiuser coding to point-to-point coding techniques is that, despite the fact that our setting involves multiple transmitters and an arbitrarily varying channel between the legitimate users, *pre-shared randomness among the legitimate users will not be needed in our achievability schemes*.
Our strategy for the converse consists of reducing the problem of determining a converse for our model to the problem of determining a converse for a related model in the absence of a jammer.

### *1.2. Related Works*

Related works that consider simultaneous eavesdropping and oblivious jamming threats for the point-to-point discrete memoryless wiretap channel include [6–11]. The proof techniques used in these references to obtain security, such as random binning [12,13], resolvability/soft covering [10,14,15], or typicality arguments, are challenging to apply to a Gaussian setting in the absence of shared randomness at the legitimate users. Specifically, for the Gaussian point-to-point channel in the presence of an adversary that arbitrarily jams [1], the only known coding mechanism to obtain reliability in the absence of pre-shared randomness relies on codewords uniformly drawn on a unit sphere [1], which are challenging to integrate with the above techniques to obtain security because their components are not independent and identically distributed.

Another line of work [16] considers Gaussian channel models where the eavesdropper channel can vary arbitrarily, but the main channel is not. The setting considered in the present paper, where the main channel between the legitimate users is arbitrarily varying, prevents the use of analyses similar to those in [16] for the same reasons described above.

Several other works have considered continuous channel models, including the Gaussian MIMO wiretap channel [17], the Gaussian multiple-access wiretap channel [18], where deviating users can be viewed as an active adversary, and continuous point-to-point wiretap channels [19,20], where the adversary can choose between eavesdropping and jamming. These references differ from the above-mentioned references on arbitrarily varying channels as they assume a specific signaling strategy for the jammer.

Finally, note that for point-to-point channels, stronger jamming strategies that depend on the signals of the legitimate transmitters have been studied in [21–23].

### *1.3. Organization of the Paper*

The remainder of the paper is organized as follows. We describe the models in Section 2. We present our results for the Gaussian point-to-point wiretap channel, the Gaussian multiple-access wiretap channel, the Gaussian broadcast wiretap channel, and the Gaussian symmetric interference wiretap channel in Sections 3–6, respectively. We discuss in Section 4.2 a way to avoid, at least for some channel parameters, time-sharing for the multiple-access setting. We also discuss in Section 4.3 an extension of the multiple-access setting to more than two transmitters. We detail the proofs for the multiple-access setting in Sections 7 and 8. We end the paper with concluding remarks in Section 9.

### **2. Problem Statement**

### *2.1. Notation*

For *a*, *b* ∈ ℝ, define ⟦*a*, *b*⟧ ≜ [*a*, *b*] ∩ ℕ, ]*a*, *b*[ ≜ [*a*, *b*]\{*a*, *b*}, ]*a*, *b*] ≜ [*a*, *b*]\{*a*}, and [*a*, *b*[ ≜ [*a*, *b*]\{*b*}. The components of a vector *X<sup>n</sup>* of size *n* ∈ ℕ are denoted by subscripts, i.e., *X<sup>n</sup>* ≜ (*X*<sub>1</sub>, *X*<sub>2</sub>, ... , *X<sub>n</sub>*). For *x* ∈ ℝ, define [*x*]<sup>+</sup> ≜ max(0, *x*). The notation *x* → *y* describes a function that associates *y* with *x* when the domain and the image of the function are clear from the context. The power set of a finite set S is denoted by 2<sup>S</sup>. The convex hull of a set S is denoted by Conv(S). Unless specified otherwise, capital letters designate random variables, whereas lowercase letters designate realizations of associated random variables, e.g., *x* is a realization of the random variable *X*. For *R* ∈ ℝ<sub>+</sub>, B<sub>0</sub><sup>*n*</sup>(*R*) denotes the ball of radius *R* centered at 0 in ℝ<sup>*n*</sup> under the Euclidean norm.

### *2.2. Gaussian Multiuser Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper*

Consider the Gaussian memoryless wiretap channel model with two transmitters and two legitimate receivers

$$Y\_1^n \triangleq \sqrt{g\_{11}} X\_1^n + \sqrt{g\_{12}} X\_2^n + \sqrt{g\_{13}} S^n + N\_1^n, \tag{1a}$$

$$Y\_2^n \triangleq \sqrt{g\_{21}} X\_1^n + \sqrt{g\_{22}} X\_2^n + \sqrt{g\_{23}} S^n + N\_2^n, \tag{1b}$$

$$Z^n \triangleq \sqrt{h\_1} X\_1^n + \sqrt{h\_2} X\_2^n + N\_Z^n, \tag{1c}$$

where *Y*<sub>1</sub><sup>*n*</sup>, *Y*<sub>2</sub><sup>*n*</sup> are the channel outputs observed by the legitimate receivers, and *Z<sup>n</sup>* is the channel output observed by the eavesdropper. For *l* ∈ {1, 2}, *X<sub>l</sub><sup>n</sup>* is the signal emitted by Transmitter *l* satisfying the power constraint ‖*X<sub>l</sub><sup>n</sup>*‖<sup>2</sup> ≜ ∑<sub>*i*=1</sub><sup>*n*</sup>(*X<sub>l</sub>*)<sub>*i*</sub><sup>2</sup> ≤ *n*Γ<sub>*l*</sub>, *S<sup>n</sup>* is an arbitrary jamming sequence transmitted by the jammer, which is oblivious of the communication of the legitimate users and satisfies the power constraint ‖*S<sup>n</sup>*‖<sup>2</sup> ≜ ∑<sub>*i*=1</sub><sup>*n*</sup> *S*<sub>*i*</sub><sup>2</sup> ≤ *n*Λ, and *N*<sub>1</sub><sup>*n*</sup>, *N*<sub>2</sub><sup>*n*</sup>, *N<sub>Z</sub><sup>n</sup>* are sequences of independent and identically distributed Gaussian noise with variances *σ*<sub>1</sub><sup>2</sup>, *σ*<sub>2</sub><sup>2</sup>, *σ<sub>Z</sub>*<sup>2</sup>, respectively. The channel coefficients *g*<sub>11</sub>, *g*<sub>12</sub>, *g*<sub>13</sub>, *g*<sub>21</sub>, *g*<sub>22</sub>, *g*<sub>23</sub>, *h*<sub>1</sub>, *h*<sub>2</sub> are fixed and known to all parties. Note that we assume that the jammer helps the eavesdropper by sharing her jamming sequence, which allows the eavesdropper to perfectly cancel *S<sup>n</sup>* from *Z<sup>n</sup>*. Coding schemes and achievable rates are defined as follows.

**Definition 1.** *Let n*, *k* ∈ ℕ*. A* (2<sup>*nR*<sub>1</sub></sup>, 2<sup>*nR*<sub>2</sub></sup>, *n*, *k*) *code* C<sub>*n*</sub> *consists, for each block j* ∈ ⟦1, *k*⟧*, of*


*where, for any l* ∈ {1, 2}*, R<sub>l</sub>* ≜ (1/*k*) ∑<sub>*j*=1</sub><sup>*k*</sup> *R<sub>l</sub>*<sup>(*j*)</sup>*, and operates as follows. For each block j* ∈ ⟦1, *k*⟧*, Transmitter l* ∈ {1, 2} *encodes with e<sub>l</sub>*<sup>(*j*)</sup> *a uniformly distributed message M<sub>l</sub>*<sup>(*j*)</sup> ∈ M<sub>*l*</sub><sup>(*j*)</sup> *to a codeword of length n, which is sent to the legitimate receiver over the channel described by Equations* (1a)–(1c) *with the power constraint n*Λ *for the jamming signal S<sub>i</sub><sup>n</sup>. Note that all the power constraints at the transmitters and the jammer hold for a given transmission block of length n, which is relevant when the power constraints hold within any time window corresponding to n channel uses. Then, legitimate receiver l* ∈ {1, 2} *forms an estimate M̂<sub>l</sub>*<sup>(*j*)</sup> ≜ *g<sub>l</sub>*<sup>(*j*)</sup>(*Y<sub>l</sub><sup>n</sup>*) *of the message M<sub>l</sub>*<sup>(*j*)</sup>*. We define M̂* ≜ (*M̂*<sub>1</sub><sup>(*j*)</sup>, *M̂*<sub>2</sub><sup>(*j*)</sup>)<sub>*j*∈⟦1,*k*⟧</sub>*, M* ≜ (*M*<sub>1</sub><sup>(*j*)</sup>, *M*<sub>2</sub><sup>(*j*)</sup>)<sub>*j*∈⟦1,*k*⟧</sub>*, S* ≜ (*S<sub>i</sub><sup>n</sup>*)<sub>*i*∈⟦1,*k*⟧</sub>*, and* 𝒮 ≜ {(*S<sub>i</sub><sup>n</sup>*)<sub>*i*∈⟦1,*k*⟧</sub> : ‖*S<sub>i</sub><sup>n</sup>*‖<sup>2</sup> ≤ *n*Λ, ∀*i* ∈ ⟦1, *k*⟧}*.*

**Definition 2.** *A rate pair* (*R*<sub>1</sub>, *R*<sub>2</sub>) *is achievable if there exists a sequence of* (2<sup>*nR*<sub>1</sub></sup>, 2<sup>*nR*<sub>2</sub></sup>, *n*, *k*) *codes such that*

$$\lim\_{n \to \infty} \sup\_{S \in \mathcal{S}} \mathbb{P}[\hat{M} \neq M] = 0 \text{ (reliability)},\tag{2a}$$

$$\lim\_{n \to \infty} \frac{1}{nk} H(M|Z^{kn}) \ge R\_1 + R\_2 \text{ (equivocation)}.\tag{2b}$$

*2.3. Special Case 1: The Gaussian Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper*

Assume that the two transmitters are colocated and the two receivers are colocated in Section 2.2. More specifically, as depicted in Figure 1, the channel model of Section 2.2 becomes

$$Y^n \triangleq X^n + S^n + N\_1^n, \tag{3a}$$

$$Z^n \triangleq \sqrt{h} X^n + N\_Z^n, \tag{3b}$$

where *σ*<sub>1</sub><sup>2</sup> = *σ<sub>Z</sub>*<sup>2</sup> = 1. We term this model the Gaussian Wiretap channel with Jammer-Aided eavesdropper (Gaussian WT-JA in short form). Note that this model recovers as a special case the Gaussian wiretap channel [2].

**Figure 1.** The Gaussian wiretap channel in the presence of a jammer-aided eavesdropper.

*2.4. Special Case 2: The Gaussian Multiple-Access Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper*

Assume that the two receivers are colocated in Section 2.2. More specifically, as depicted in Figure 2, the channel model of Section 2.2 becomes

$$Y^n \triangleq X\_1^n + X\_2^n + S^n + N\_1^n, \tag{4a}$$

$$Z^n \triangleq \sqrt{h\_1} X\_1^n + \sqrt{h\_2} X\_2^n + N\_Z^n, \tag{4b}$$

where *σ*<sub>1</sub><sup>2</sup> = *σ<sub>Z</sub>*<sup>2</sup> = 1. We term the model the Gaussian Multiple-Access Wiretap channel with Jammer-Aided eavesdropper (Gaussian MAC-WT-JA in short form) with the parameters (Γ<sub>1</sub>, Γ<sub>2</sub>, *h*<sub>1</sub>, *h*<sub>2</sub>, Λ, *σ*<sub>1</sub><sup>2</sup>, *σ<sub>Z</sub>*<sup>2</sup>). This model recovers as special cases the model in [24] in the absence of the security constraint (2b), and the Gaussian multiple-access wiretap channel [3]. Note that the model in [24] was introduced to study the presence of selfish transmitters via cooperative game theory, and that, similarly, the Gaussian MAC-WT-JA can be used to study the presence of selfish transmitters via coalitional game theory [25].

**Figure 2.** The Gaussian multiple-access wiretap channel in the presence of a jammer-aided eavesdropper.

*2.5. Special Case 3: The Gaussian Broadcast Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper*

Assume that the two transmitters are colocated in Section 2.2. More specifically, as depicted in Figure 3, the channel model of Section 2.2 becomes

$$Y\_1^n \triangleq X^n + \sqrt{g\_1} S^n + N\_1^n, \tag{5a}$$

$$Y\_2^n \triangleq X^n + \sqrt{g\_2} S^n + N\_2^n, \tag{5b}$$

$$Z^n \triangleq \sqrt{h} X^n + N\_Z^n, \tag{5c}$$

where *σ<sub>Z</sub>*<sup>2</sup> = 1. We term the model the Gaussian Broadcast Wiretap channel with Jammer-Aided eavesdropper (Gaussian BC-WT-JA in short form). Note that this model recovers as special cases the multi-receiver wiretap channel [26] and the model in [27] in the absence of the security constraint (2b).

**Figure 3.** The Gaussian broadcast wiretap channel in the presence of a jammer-aided eavesdropper.

*2.6. Special Case 4: The Gaussian Symmetric Interference Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper*

Consider the following special case of the channel model of Section 2.2

$$Y\_1^n \triangleq X\_1^n + X\_2^n + S^n + N\_1^n, \tag{6a}$$

$$Y\_2^n \triangleq X\_1^n + X\_2^n + S^n + N\_2^n, \tag{6b}$$

$$Z^n \triangleq \sqrt{h\_1} X\_1^n + \sqrt{h\_2} X\_2^n + N\_Z^n, \tag{6c}$$

where *σ*<sub>1</sub><sup>2</sup> = *σ*<sub>2</sub><sup>2</sup> = *σ<sub>Z</sub>*<sup>2</sup> = 1. We term the model the Gaussian Symmetric Interference Wiretap channel with Jammer-Aided eavesdropper (Gaussian SI-WT-JA in short form). In the absence of the security constraint (2b) and the jamming sequence, this model recovers a special case of the Gaussian interference channel under strong interference [28].

### **3. The Gaussian Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper**

We present a capacity result for the Gaussian WT-JA model described in Section 2.3.

### **Theorem 1.** *The secrecy capacity of the Gaussian WT-JA is*

$$C(\Lambda) \triangleq \begin{cases} \left[ \frac{1}{2} \log \left( \frac{1 + (1 + \Lambda)^{-1} \Gamma}{1 + h \Gamma} \right) \right]^+ & \text{if } \Gamma > \Lambda \\ 0 & \text{if } \Gamma \le \Lambda \end{cases} \tag{7}$$

Observe that $C(\Lambda)$ is non-zero if and only if $\Gamma > \Lambda$ and $(1+\Lambda)^{-1} > h$. When $\Gamma > \Lambda$, Theorem 1 shows that arbitrary oblivious jamming is no more harmful than Gaussian jamming, i.e., than a jamming sequence obtained from independent and identically distributed realizations of a zero-mean Gaussian random variable with variance equal to the power constraint $\Lambda$.

The proof of Theorem 1 follows as a special case of the achievability and converse bounds derived in the next section in Theorems 2 and 3, respectively, for the Gaussian MAC-WT-JA.
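The closed form in Theorem 1 is straightforward to evaluate numerically. The following minimal Python sketch is ours, not from the paper; the function name is an assumption, and we take base-2 logarithms (the paper leaves the log base implicit).

```python
import math

def secrecy_capacity(gamma, lam, h):
    """Secrecy capacity C(Lambda) of the Gaussian WT-JA, Equation (7).

    gamma: transmitter power constraint Gamma,
    lam:   jammer power constraint Lambda,
    h:     eavesdropper channel gain.
    Base-2 logarithms are an assumption; the paper leaves the base implicit.
    """
    if gamma <= lam:
        return 0.0  # second branch of Equation (7): jammer at least as powerful
    ratio = (1 + gamma / (1 + lam)) / (1 + h * gamma)
    return max(0.0, 0.5 * math.log2(ratio))  # the [.]^+ operation

# C(Lambda) > 0 exactly when Gamma > Lambda and (1 + Lambda)^{-1} > h:
assert secrecy_capacity(4.0, 1.5, 0.12) > 0.0   # h < (1 + 1.5)^{-1} = 0.4
assert secrecy_capacity(4.0, 1.5, 0.45) == 0.0  # h > 0.4
assert secrecy_capacity(1.0, 1.5, 0.12) == 0.0  # Gamma <= Lambda
```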

### **4. The Gaussian Multiple-Access Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper**

*4.1. Inner and Outer Bounds for the Gaussian MAC-WT-JA*

We derive inner and outer bounds for the Gaussian MAC-WT-JA in Theorems 2 and 3. Their proofs are provided in Sections 7 and 8, respectively.

**Theorem 2** (Achievability)**.** *We consider three cases.*

*1. When $\Gamma_1 > \Lambda$ and $\Gamma_2 \le \Lambda$,*

$$\mathcal{R}\_1^{\text{MAC}} \triangleq \left\{ (R\_1, 0) : R\_1 \le \max\_{0 \le P\_2 \le \Gamma\_2} \left[ \frac{1}{2} \log \left( \frac{1 + \Gamma\_1 (1 + \Lambda + P\_2)^{-1}}{1 + \Gamma\_1 h\_1 (1 + h\_2 P\_2)^{-1}} \right) \right]^+ \right\} \tag{8}$$

*is achievable.*

*2. When $\Gamma_2 > \Lambda$ and $\Gamma_1 \le \Lambda$,*

$$\mathcal{R}\_2^{\text{MAC}} \triangleq \left\{ (0, R\_2) : R\_2 \le \max\_{0 \le P\_1 \le \Gamma\_1} \left[ \frac{1}{2} \log \left( \frac{1 + \Gamma\_2 (1 + \Lambda + P\_1)^{-1}}{1 + \Gamma\_2 h\_2 (1 + h\_1 P\_1)^{-1}} \right) \right]^+ \right\} \tag{9}$$

*is achievable.*

*3. When $\min(\Gamma_1, \Gamma_2) > \Lambda$,*

$$\mathcal{R}^{\text{MAC}} \triangleq \text{Conv}\left(\mathcal{R}\_1^{\text{MAC}} \cup \mathcal{R}\_2^{\text{MAC}} \cup \bigcup\_{\substack{\Lambda < P\_1 \le \Gamma\_1\\ \Lambda < P\_2 \le \Gamma\_2}} \mathcal{R}\_{1,2}^{\text{MAC}}(P\_1, P\_2)\right) \tag{10}$$

*is achievable, where*

$$\mathcal{R}\_{1,2}^{\text{MAC}}(P\_1, P\_2) \triangleq \left\{ (R\_1, R\_2) : R\_1 \le \left[ \frac{1}{2} \log \left( \frac{1 + P\_1 (1 + \Lambda)^{-1}}{1 + P\_1 h\_1 (1 + h\_2 P\_2)^{-1}} \right) \right]^+,$$

$$R\_2 \le \left[ \frac{1}{2} \log \left( \frac{1 + P\_2 (1 + \Lambda)^{-1}}{1 + P\_2 h\_2 (1 + h\_1 P\_1)^{-1}} \right) \right]^+,$$

$$R\_1 + R\_2 \le \left[ \frac{1}{2} \log \left( \frac{1 + (P\_1 + P\_2)(1 + \Lambda)^{-1}}{1 + P\_1 h\_1 + P\_2 h\_2} \right) \right]^+ \}.\tag{11}$$

### **Theorem 3** (Partial Converse)**.**


Observe that in the achievability scheme for $\mathcal{R}_1^{\text{MAC}}$, choosing a transmission power smaller than $\Gamma_1$ for Transmitter 1 would result in a smaller region since, for a fixed $P_2$, the map $x \mapsto \log \frac{1 + x(1+\Lambda+P_2)^{-1}}{1 + xh_1(1+h_2P_2)^{-1}}$ is either negative when $(1+\Lambda+P_2)^{-1} \le h_1(1+h_2P_2)^{-1}$, or non-decreasing when $(1+\Lambda+P_2)^{-1} > h_1(1+h_2P_2)^{-1}$. By exchanging the roles of the transmitters, the same observation holds for $\mathcal{R}_2^{\text{MAC}}$.

### *4.2. Discussion of Rate-Splitting*

Rate-splitting [5] can be adapted to the Gaussian MAC-WT-JA to avoid time-sharing; however, the entire region in Equation (11) cannot be achieved this way, as splitting the power of one user can preclude reliable communication. Assuming that

$$I(X\_1X\_2;Y) - I(X\_1X\_2;Z) \ge \max[I(X\_1;Y|X\_2) - I(X\_1;Z), I(X\_2;Y|X\_1) - I(X\_2;Z)],\tag{12}$$

then one can split the power of Transmitter 1 into $(P_1 - \delta)$ and $\delta$, where $\delta \in [0, P_1]$, and define the following functions from $[0, P_1]$ to $\mathbb{R}$

$$R_U: \delta \mapsto \frac{1}{2} \log \frac{1 + (P_1 - \delta)(1 + \Lambda + \delta + P_2)^{-1}}{1 + h_1(P_1 - \delta)},\tag{13a}$$

$$R\_V: \delta \mapsto \frac{1}{2} \log \frac{1 + \delta (1 + \Lambda)^{-1}}{1 + h\_1 \delta (1 + h\_1 (P\_1 - \delta) + h\_2 P\_2)^{-1}} \tag{13b}$$

$$R\_2: \delta \mapsto \frac{1}{2} \log \frac{1 + P\_2(1 + \Lambda + \delta)^{-1}}{1 + h\_2 P\_2 (1 + h\_1 (P\_1 - \delta))^{-1}}.\tag{13c}$$

**Lemma 1.** *For any $\delta \in [0, P_1]$, we have $(R_U + R_V + R_2)(\delta) = I(X_1X_2;Y) - I(X_1X_2;Z)$. Moreover, for any point $(x_0, y_0)$ in*

$$\begin{aligned} \mathcal{D}(P\_1, P\_2) \\ \triangleq \left\{ (R\_1, R\_2) \in \mathcal{R}\_{1,2}^{\text{MAC}}(P\_1, P\_2) : R\_1 + R\_2 = \left[ \frac{1}{2} \log \left( \frac{1 + (P\_1 + P\_2)(1 + \Lambda)^{-1}}{1 + P\_1 h\_1 + P\_2 h\_2} \right) \right]^+ \right\}, \end{aligned} \tag{14}$$

*there exists $\delta_0 \in [0, P_1]$ such that $x_0 = (R_U + R_V)(\delta_0)$ and $y_0 = R_2(\delta_0)$.*

**Proof.** Define

$$Y \triangleq U + V + X_2 + N_Y, \tag{15a}$$

$$Z \triangleq \sqrt{h_1}(U + V) + \sqrt{h_2}X_2 + N_Z, \tag{15b}$$

where $V$, $U$, $X_2$, $N_Y$, $N_Z$ are independent zero-mean Gaussian random variables with variances $\delta \in [0, P_1]$, $P_1 - \delta$, $P_2$, $(1 + \Lambda)$, and $1$, respectively. Additionally, define

$$R_U(\delta) \triangleq I(U; Y) - I(U; Z|VX_2) = \frac{1}{2} \log \frac{1 + (P_1 - \delta)(1 + \Lambda + \delta + P_2)^{-1}}{1 + h_1(P_1 - \delta)},\tag{16a}$$

$$R_V(\delta) \triangleq I(V; Y|UX_2) - I(V; Z) = \frac{1}{2} \log \frac{1 + \delta(1 + \Lambda)^{-1}}{1 + h_1 \delta (1 + h_1(P_1 - \delta) + h_2 P_2)^{-1}}, \tag{16b}$$

$$R_2(\delta) \triangleq I(X_2; Y|U) - I(X_2; Z|V) = \frac{1}{2} \log \frac{1 + P_2(1 + \Lambda + \delta)^{-1}}{1 + h_2 P_2 (1 + h_1 (P_1 - \delta))^{-1}}.\tag{16c}$$

By the chain rule, we have, for any $\delta \in [0, P_1]$, $(R_U + R_V + R_2)(\delta) = I(X_1X_2;Y) - I(X_1X_2;Z)$. Finally, since $(R_U + R_V)(0) = I(X_1;Y) - I(X_1;Z|X_2)$ and $(R_U + R_V)(P_1) = I(X_1;Y|X_2) - I(X_1;Z)$, by continuity of $\delta \mapsto (R_U + R_V)(\delta)$, for any point $(x_0, y_0)$ in $\mathcal{D}(P_1, P_2)$ there exists $\delta_0 \in [0, P_1]$ such that $x_0 = (R_U + R_V)(\delta_0)$ and $y_0 = R_2(\delta_0)$.
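The telescoping identity in Lemma 1 can also be checked numerically. The sketch below is ours (base-2 logarithms assumed; the parameter values are those of Figure 4, used purely for illustration) and verifies that $(R_U + R_V + R_2)(\delta)$ does not depend on $\delta$.

```python
import math

def rate_split(delta, P1, P2, lam, h1, h2):
    """R_U, R_V, R_2 of Equations (13a)-(13c). Base-2 logs are an assumption."""
    rU = 0.5 * math.log2((1 + (P1 - delta) / (1 + lam + delta + P2))
                         / (1 + h1 * (P1 - delta)))
    rV = 0.5 * math.log2((1 + delta / (1 + lam))
                         / (1 + h1 * delta / (1 + h1 * (P1 - delta) + h2 * P2)))
    r2 = 0.5 * math.log2((1 + P2 / (1 + lam + delta))
                         / (1 + h2 * P2 / (1 + h1 * (P1 - delta))))
    return rU, rV, r2

P1, P2, lam, h1, h2 = 4.0, 3.3, 1.5, 0.12, 0.11
# I(X1 X2; Y) - I(X1 X2; Z), the constant value claimed by Lemma 1:
target = 0.5 * math.log2((1 + (P1 + P2) / (1 + lam)) / (1 + h1 * P1 + h2 * P2))
for delta in [0.0, 1.0, 2.5, P1]:
    assert abs(sum(rate_split(delta, P1, P2, lam, h1, h2)) - target) < 1e-12
```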

As remarked in [29], a potential issue in Lemma 1 is that $R_U(\delta_0)$ or $R_V(\delta_0)$ can be negative. We have the following achievability result.

**Proposition 1.** *Let $(x_0, y_0) \in \mathcal{D}(P_1, P_2)$ and $\delta_0$ be as in Lemma 1. Then, $(x_0, y_0)$ can be achieved without time-sharing if $R_U(\delta_0) \ge 0$, $R_V(\delta_0) \ge 0$, and $\min(\delta_0, P_1 - \delta_0) > \Lambda$. $(x_0, y_0) \in \mathcal{D}(P_1, P_2)$ can also be achieved without time-sharing if similar conditions (obtained by exchanging the roles of the two transmitters) are satisfied when splitting the power of Transmitter 2.*

**Proof idea:** Transmitter 1 is split into two *virtual* users that transmit at rate *RU*(*δ*) with power *δ* and at rate *RV*(*δ*) with power *P*<sup>1</sup> − *δ*. Encoding for User 2 and the two virtual users is similar to Case 1 in the proof of Theorem 2. The receiver adopts a minimum distance decoding rule as in Theorem 2 to first decode the message associated with the virtual user that transmits at rate *RV*, then to decode the message associated with User 2, and finally, to decode the message associated with the virtual user that transmits at rate *RU*. A similar procedure can be performed if one decides to split the power of Transmitter 2.

An illustration of Proposition 1 is depicted in Figure 4. Note that for some model parameters, the set of points achievable with Proposition 1 can be empty and, unfortunately, it does not seem easy to obtain a simple analytical characterization of the rate pairs achievable with Proposition 1.

**Figure 4.** The shaded area represents $\mathcal{R}_{1,2}^{\text{MAC}}(P_1, P_2)$, where $(P_1, P_2, \Lambda, h_1, h_2) = (4, 3.3, 1.5, 0.12, 0.11)$. The solid segments represent the rate pairs achievable with Proposition 1.

### *4.3. Extension to More Than Two Transmitters*

We extend our result for the MAC-WT-JA to the case of an arbitrary number of transmitters. The problem is more involved than the case of two transmitters and requires new time-sharing strategies that leverage extended polymatroid properties.

Consider the model of Section 2.4 with $L$ transmitters instead of two. We let $\mathcal{L} \triangleq [\![1, L]\!]$ denote the set of transmitters. More specifically, the channel model of Section 2.4 becomes

$$Y^n \triangleq \sum_{l \in \mathcal{L}} X_l^n + S^n + N_1^n, \tag{17a}$$

$$Z^n \triangleq \sum_{l \in \mathcal{L}} \sqrt{h_l} X_l^n + N_Z^n, \tag{17b}$$

where $\sigma_1^2 = \sigma_Z^2 = 1$. We term this model the Gaussian MAC-WT-JA with parameters $((\Gamma_l)_{l\in\mathcal{L}}, (h_l)_{l\in\mathcal{L}}, \Lambda, \sigma_1^2, \sigma_Z^2)$. When the channel gains $(h_l)_{l\in\mathcal{L}}$ are all equal to $h \in [0, 1[$, we refer to this model as the degraded MAC-WT-JA with parameters $((\Gamma_l)_{l\in\mathcal{L}}, h, \Lambda, \sigma_1^2, \sigma_Z^2)$. Given $\Lambda \in \mathbb{R}_+$ and $(\Gamma_l)_{l\in\mathcal{L}}$, we define $h_\Lambda \triangleq (1+\Lambda)^{-1}$, $\mathcal{L}(\Lambda) \triangleq \{l \in \mathcal{L} : \Gamma_l > \Lambda\}$, and $\mathcal{L}^c(\Lambda) \triangleq \mathcal{L} \setminus \mathcal{L}(\Lambda)$. The following achievability result is proven in Appendix B.

**Theorem 4.** *Assume that for all l* ∈ L(Λ)*, h*<sup>Λ</sup> > *hl. The following region is achievable for the Gaussian MAC-WT-JA with parameters* ((Γ*l*)*l*∈L,(*hl*)*l*∈L, <sup>Λ</sup>, 1, 1)

$$\mathcal{R} = \bigcup\_{\substack{(P\_l)\_{l \in \mathcal{L}} \\ \forall l \in \mathcal{L}(\Lambda), \Lambda < P\_l \le \Gamma\_l}} \left\{ (R\_l)\_{l \in \mathcal{L}} : \forall l \in \mathcal{L}^c(\Lambda), R\_l = 0 \text{ and } \forall \mathcal{T} \subseteq \mathcal{L}(\Lambda), \\\ R\_{\mathcal{T}} \le \left[ \frac{1}{2} \log \left( \frac{1 + h\_{\Lambda} P\_{\mathcal{T}}}{1 + (\sum\_{l \in \mathcal{T}} h\_l P\_l) (1 + \sum\_{l \in \mathcal{T}^c} h\_l P\_l)^{-1}} \right) \right]^+ \right\}, \tag{18}$$

*where for any $(P_l)_{l\in\mathcal{L}}$ and $\mathcal{T} \subseteq \mathcal{L}$, we use the notation $P_{\mathcal{T}} \triangleq \sum_{l\in\mathcal{T}} P_l$.*

We immediately obtain the following corollary.

**Corollary 1.** *The following region is achievable for the degraded Gaussian MAC-WT-JA with parameters* ((Γ*l*)*l*∈L, *<sup>h</sup>*, <sup>Λ</sup>, 1, 1)

$$\mathcal{R} = \bigcup\_{\substack{(P\_l)\_{l \in \mathcal{L}} \\ \forall l \in \mathcal{L}(\Lambda), \Lambda < P\_l \le \Gamma\_l}} \left\{ (R\_l)\_{l \in \mathcal{L}} : \forall l \in \mathcal{L}^c(\Lambda), R\_l = 0 \text{ and } \forall \mathcal{T} \subseteq \mathcal{L}(\Lambda), \\\ R\_{\mathcal{T}} \le \left[ \frac{1}{2} \log \left( \frac{1 + h\_{\Lambda} P\_{\mathcal{T}}}{1 + h P\_{\mathcal{T}} (1 + h P\_{\mathcal{T}^c})^{-1}} \right) \right]^+ \right\}.\tag{19}$$

Note that the achievability strategy used in the proof of Theorem 4 is different from the one used in the proof of Theorem 2. While Theorem 4 gains in generality by considering an arbitrary number of users, it requires the assumption $\forall l \in \mathcal{L}(\Lambda), h_\Lambda > h_l$, which is not needed in Theorem 2. We also have the following optimality result, which is proven in Appendix C.

**Theorem 5.** *The maximal secrecy sum-rate $R_{\mathcal{L}} \triangleq \sum_{l\in\mathcal{L}} R_l$ achievable for the degraded Gaussian MAC-WT-JA with parameters $((\Gamma_l)_{l\in\mathcal{L}}, h, \Lambda, 1, 1)$ is*

$$\left[\frac{1}{2}\log\left(\frac{1+h\_{\Lambda}\Gamma\_{\mathcal{L}(\Lambda)}}{1+h\Gamma\_{\mathcal{L}(\Lambda)}}\right)\right]^{+}.\tag{20}$$

Note that the optimal secrecy sum-rate is positive if and only if $h_\Lambda > h$ and $\mathcal{L}(\Lambda) \neq \emptyset$.
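The sum-rate of Theorem 5 and its positivity condition are easy to evaluate. The sketch below is ours (function name and base-2 logarithms are assumptions, not from the paper).

```python
import math

def degraded_sum_rate(gammas, h, lam):
    """Optimal secrecy sum-rate of Theorem 5, Equation (20), for the degraded
    Gaussian MAC-WT-JA. Base-2 logarithms are an assumption."""
    h_lam = 1.0 / (1.0 + lam)                      # h_Lambda = (1 + Lambda)^{-1}
    g_active = sum(g for g in gammas if g > lam)   # Gamma_{L(Lambda)}
    return max(0.0, 0.5 * math.log2((1 + h_lam * g_active) / (1 + h * g_active)))

# Positive iff h_Lambda > h and L(Lambda) is non-empty (illustrative values):
assert degraded_sum_rate([4.0, 3.0, 1.0], h=0.2, lam=1.5) > 0.0  # h < h_Lambda = 0.4
assert degraded_sum_rate([4.0, 3.0], h=0.5, lam=1.5) == 0.0      # h > h_Lambda
assert degraded_sum_rate([1.0, 0.5], h=0.2, lam=1.5) == 0.0      # L(Lambda) empty
```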

### **5. The Gaussian Broadcast Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper**

Theorems 6 and 7 provide inner and outer bounds, respectively, for the Gaussian BC-WT-JA.

**Theorem 6** (Achievability)**.** *We have the following inner bounds.*

*1. When $g_2\Lambda \ge \Gamma$ and $g_1\Lambda < \Gamma$,*

$$\mathcal{R}_1^{\rm BC} \triangleq \left\{ (R_1, 0) : R_1 \le \left[ \frac{1}{2} \log \left( \frac{1 + \frac{\Gamma}{\sigma_1^2 + g_1 \Lambda}}{1 + h\Gamma} \right) \right]^+ \right\} \tag{21}$$

*is achievable.*

*2. When $g_1\Lambda \ge \Gamma$ and $g_2\Lambda < \Gamma$,*

$$\mathcal{R}_2^{\rm BC} \triangleq \left\{ (0, R_2) : R_2 \le \left[ \frac{1}{2} \log \left( \frac{1 + \frac{\Gamma}{\sigma_2^2 + g_2 \Lambda}}{1 + h\Gamma} \right) \right]^+ \right\} \tag{22}$$

*is achievable.*

*3. When $\max(g_1\Lambda, g_2\Lambda) < \Gamma$ and, without loss of generality, $\sigma_1^2 + g_1\Lambda \le \sigma_2^2 + g_2\Lambda$ (exchange the roles of the receivers if $\sigma_1^2 + g_1\Lambda > \sigma_2^2 + g_2\Lambda$),*

$$\text{Conv}\left(\mathcal{R}_1^{\text{BC}} \cup \mathcal{R}_2^{\text{BC}} \cup \bigcup_{\alpha \in \left]\max(g_1, g_2)\Lambda\Gamma^{-1},\, 1\right]} \mathcal{R}^{\text{BC}}(\alpha)\right),\tag{23}$$

*is achievable, where we have defined, for $\alpha \in [0, 1]$,*

$$\mathcal{R}^{\rm BC}(\alpha) \triangleq \left\{ (R_1, R_2) : R_1 \le \left[ \frac{1}{2} \log \left( \frac{1 + \frac{(1 - \alpha)\Gamma}{\sigma_1^2 + g_1 \Lambda}}{1 + h(1 - \alpha)\Gamma} \right) \right]^+, \right.$$

$$\left. R_2 \le \left[ \frac{1}{2} \log \left( \frac{1 + \frac{\alpha\Gamma}{(1 - \alpha)\Gamma + \sigma_2^2 + g_2 \Lambda}}{1 + \frac{h\alpha\Gamma}{h(1 - \alpha)\Gamma + 1}} \right) \right]^+ \right\}. \tag{24}$$

Note that $\mathcal{R}^{\text{BC}}(\alpha = 0) = \mathcal{R}_1^{\text{BC}}$ and $\mathcal{R}^{\text{BC}}(\alpha = 1) = \mathcal{R}_2^{\text{BC}}$. The achievability scheme of Theorem 6 is similar to the proof of Theorem 2 and [27] [Theorem 3].
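The endpoint behavior of $\mathcal{R}^{\text{BC}}(\alpha)$ can be checked directly from Equation (24). The sketch below is ours (names, default noise variances $\sigma_1^2 = \sigma_2^2 = 1$, and base-2 logarithms are assumptions) and confirms that $\alpha = 0$ and $\alpha = 1$ recover single-user regions.

```python
import math

def bc_rates(alpha, Gamma, lam, h, g1, g2, s1=1.0, s2=1.0):
    """Rate pair of R^BC(alpha), Equation (24). Base-2 logs are an assumption;
    s1, s2 stand for the noise variances sigma_1^2, sigma_2^2."""
    r1 = max(0.0, 0.5 * math.log2((1 + (1 - alpha) * Gamma / (s1 + g1 * lam))
                                  / (1 + h * (1 - alpha) * Gamma)))
    r2 = max(0.0, 0.5 * math.log2((1 + alpha * Gamma / ((1 - alpha) * Gamma + s2 + g2 * lam))
                                  / (1 + h * alpha * Gamma / (h * (1 - alpha) * Gamma + 1))))
    return r1, r2

# alpha = 0 serves only Receiver 1; alpha = 1 serves only Receiver 2
# (illustrative parameters with Gamma > max(g1, g2) * Lambda):
a0 = bc_rates(0.0, 4.0, 1.5, 0.12, 0.3, 0.2)
a1 = bc_rates(1.0, 4.0, 1.5, 0.12, 0.3, 0.2)
assert a0[1] == 0.0 and a1[0] == 0.0
assert a0[0] > 0.0 and a1[1] > 0.0
```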

**Theorem 7** (Partial converse)**.**

*1. If* Γ ≤ min(*g*1Λ, *g*2Λ)*, then no positive rate is achievable;*


$$\bigcup_{\alpha \in [0, 1]} \mathcal{R}^{\text{BC}}(\alpha),\tag{25}$$

*where $\mathcal{R}^{\text{BC}}(\alpha)$ has been defined in Theorem 6.*

The proof of Theorem 7 is similar to the proof of Theorem 3, using [26] in place of [30]. Observe that the gap between the inner and outer bounds of Theorems 6 and 7 when $\Gamma > \max(g_1\Lambda, g_2\Lambda)$ comes from the fact that our achievability scheme is limited to $\alpha \in \,]\max(g_1, g_2)\Lambda\Gamma^{-1}, 1] \cup \{0\}$.

### **6. The Symmetric Interference Wiretap Channel in the Presence of a Jammer-Aided Eavesdropper**

By the symmetry between Equations (6a) and (6b), a code for the Gaussian MAC-WT-JA allows Receiver $i \in \{1, 2\}$ to securely recover the message of Transmitter $i$. Hence, from the achievability result for the Gaussian MAC-WT-JA, we have the following achievability result for the Gaussian SI-WT-JA.

**Theorem 8** (Achievability)**.** *We consider three cases.*


*where $\mathcal{R}_1^{\text{MAC}}$, $\mathcal{R}_2^{\text{MAC}}$, and $\mathcal{R}^{\text{MAC}}$ are defined in Theorem 2.*

Next, by the symmetry in Equations (6a) and (6b), we have that any code for the Gaussian SI-WT-JA allows Receiver *i* ∈ {1, 2} to securely recover the messages from both transmitters, meaning that an outer bound for the Gaussian SI-WT-JA can be obtained by considering an outer bound for a Gaussian MAC-WT-JA. Hence, from the partial converse for the Gaussian MAC-WT-JA, we obtain the following partial converse for the Gaussian SI-WT-JA.

**Theorem 9** (Partial converse)**.**


### **7. Proof of Theorem 2**

To prove Theorem 2, it is sufficient to prove the achievability of the dominant face

$$\begin{aligned} \mathcal{D}(P\_1, P\_2) \\ \triangleq \left\{ (R\_1, R\_2) \in \mathcal{R}\_{1,2}^{\text{MAC}}(P\_1, P\_2) : R\_1 + R\_2 = \left[ \frac{1}{2} \log \left( \frac{1 + (P\_1 + P\_2)(1 + \Lambda)^{-1}}{1 + P\_1 h\_1 + P\_2 h\_2} \right) \right]^+ \right\} \quad \text{(26)} \end{aligned}$$

of $\mathcal{R}_{1,2}^{\text{MAC}}(P_1, P_2)$ when $\min(\Gamma_1, \Gamma_2) > \Lambda$, where $(P_1, P_2) \in \,]\Lambda, \Gamma_1] \times ]\Lambda, \Gamma_2]$. The achievability of $\mathcal{R}_i^{\text{MAC}}$, $i \in \{1, 2\}$, when $\Gamma_i > \Lambda$ and $\Gamma_{3-i} \le \Lambda$ is obtained similarly by having Transmitter $\bar{i} \triangleq 3 - i$ send Gaussian noise. Observe that the rate constraints in $\mathcal{R}_{1,2}^{\text{MAC}}(P_1, P_2)$ can be expressed as

$$R\_1 \le \left[ I(X\_1; Y | X\_2) - I(X\_1; Z) \right]^+,\tag{27a}$$

$$R\_2 \le [I(X\_2; Y | X\_1) - I(X\_2; Z)]^+,\tag{27b}$$

$$R\_1 + R\_2 \le [I(X\_1 X\_2; Y) - I(X\_1 X\_2; Z)]^+,\tag{27c}$$

where

$$Y \triangleq X_1 + X_2 + N_Y, \tag{28a}$$

$$Z \triangleq \sqrt{h_1}X_1 + \sqrt{h_2}X_2 + N_Z, \tag{28b}$$

and $X_1$, $X_2$, $N_Y$, $N_Z$ are independent zero-mean Gaussian random variables with variances $P_1$, $P_2$, $(1+\Lambda)$, and $1$, respectively. As remarked in [29], the set function $\mathcal{T} \mapsto I(X_{\mathcal{T}}; Y|X_{\mathcal{T}^c}) - I(X_{\mathcal{T}}; Z)$, where $\forall \mathcal{T} \subseteq \{1, 2\}, X_{\mathcal{T}} \triangleq (X_t)_{t\in\mathcal{T}}$, is submodular but not necessarily non-decreasing. This is the main reason why achieving the corner points of $\mathcal{R}_{1,2}^{\text{MAC}}(P_1, P_2)$ by means of point-to-point codes via the successive decoding method [5] [Appendix C] does not easily translate to our setting. Before we provide our solution, we summarize our proof strategy in the three cases below. Figure 5 illustrates these cases.

**Figure 5.** Region $\mathcal{R}_{1,2}^{\text{MAC}}(P_1, P_2)$.

**Case 1**: Assume

$$I(X\_1X\_2;Y) - I(X\_1X\_2;Z) \ge \max[I(X\_1;Y|X\_2) - I(X\_1;Z), I(X\_2;Y|X\_1) - I(X\_2;Z)].\tag{29}$$

The corner points of $\mathcal{R}_{1,2}^{\text{MAC}}$ are given by

$$C_1 \triangleq \left( I(X_1; Y|X_2) - I(X_1; Z),\; I(X_2; Y) - I(X_2; Z|X_1) \right), \tag{30a}$$

$$C_2 \triangleq \left( I(X_1; Y) - I(X_1; Z|X_2),\; I(X_2; Y|X_1) - I(X_2; Z) \right). \tag{30b}$$

We will achieve each corner point with point-to-point coding techniques and perform time-sharing to achieve $\mathcal{D}(P_1, P_2)$. Specifically, to achieve $C_i$, $i \in \{1, 2\}$, the encoders will be designed such that the decoder can first estimate the codeword sent by Transmitter $\bar{i} \triangleq 3 - i$ (by considering the codewords of Transmitter $i$ as noise), which is in turn used to estimate the codeword sent by Transmitter $i$. This approach is similar to the successive decoding method [5] [Appendix C] for a multiple-access channel in the absence of a security constraint.
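The corner points in Equations (30a) and (30b) can be evaluated from the Gaussian mutual-information expressions implied by Equation (28). The sketch below is ours (names, the illustrative parameters of Figure 4, and base-2 logarithms are assumptions); as a sanity check, both corner points lie on the dominant face, i.e., their coordinates have the same sum $I(X_1X_2;Y) - I(X_1X_2;Z)$.

```python
import math

def corner_points(P1, P2, lam, h1, h2):
    """Corner points C_1, C_2 of Equations (30a)-(30b), evaluated with Gaussian
    mutual-information formulas for the channel (28). Base-2 logs assumed."""
    def mi(signal, noise):  # 0.5*log2(1 + signal/noise)
        return 0.5 * math.log2(1 + signal / noise)
    C1 = (mi(P1, 1 + lam) - mi(h1 * P1, 1 + h2 * P2),           # I(X1;Y|X2) - I(X1;Z)
          mi(P2, 1 + lam + P1) - mi(h2 * P2, 1))                # I(X2;Y)    - I(X2;Z|X1)
    C2 = (mi(P1, 1 + lam + P2) - mi(h1 * P1, 1),                # I(X1;Y)    - I(X1;Z|X2)
          mi(P2, 1 + lam) - mi(h2 * P2, 1 + h1 * P1))           # I(X2;Y|X1) - I(X2;Z)
    return C1, C2

C1, C2 = corner_points(4.0, 3.3, 1.5, 0.12, 0.11)
# Both corner points have the same coordinate sum (the dominant face of (26)):
assert abs(sum(C1) - sum(C2)) < 1e-12
assert all(c > 0 for c in C1 + C2)  # Case 1 holds for these parameters
```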

**Case 2.a**: Assume

$$I(X\_1X\_2;Y) - I(X\_1X\_2;Z) \ge I(X\_1;Y|X\_2) - I(X\_1;Z),\tag{31a}$$

$$I(X_1X_2;Y) - I(X_1X_2;Z) < I(X_2;Y|X_1) - I(X_2;Z). \tag{31b}$$

Hence,

$$\widetilde{C}_2 \triangleq \left( I(X_1; Y) - I(X_1; Z|X_2),\; I(X_2; Y|X_1) - I(X_2; Z) \right) \tag{32}$$

has a negative x-coordinate, and the method of Case 1 cannot be directly applied here. Now, the corner points of $\mathcal{R}_{1,2}^{\text{MAC}}$ are

$$C_1 \triangleq \left( I(X_1; Y|X_2) - I(X_1; Z),\; I(X_2; Y) - I(X_2; Z|X_1) \right), \tag{33a}$$

$$C_2 \triangleq \left( 0,\; I(X_1X_2; Y) - I(X_1X_2; Z) \right). \tag{33b}$$

The idea to achieve $C_1$ is, as in Case 1, a successive decoding approach that decomposes the sum rate $I(X_1X_2;Y) - I(X_1X_2;Z)$ as the sum of $I(X_2;Y) - I(X_2;Z|X_1)$, which represents the secret message rate of Transmitter 2, and $I(X_1;Y|X_2) - I(X_1;Z)$, which represents the secret message rate of Transmitter 1. However, $C_2$ cannot be decomposed in a similar manner and thus cannot be achieved with the same method. Instead, to achieve any point in $\mathcal{D}(P_1, P_2)$, we rely on a strategy over several transmission blocks. First, in an appropriate number of transmission blocks, the transmitters send secret messages at the rates of $C_1$ as in Case 1. Part of the secret messages of Transmitter 1, with a rate equal to the absolute value of the x-coordinate of the point $\widetilde{C}_2$, is dedicated to the exchange of a secret key between Transmitter 1 and the legitimate receiver. Then, in the remaining transmission blocks, Transmitter 2 transmits a secret message with rate $I(X_1X_2;Y) - I(X_1X_2;Z)$, while Transmitter 1 uses the previously generated secret key to produce a jamming signal, which can be canceled out by the legitimate receiver but not by the eavesdropper, who does not know the secret key.

**Case 2.b**: Assume

$$I(X\_1X\_2;Y) - I(X\_1X\_2;Z) \ge I(X\_2;Y|X\_1) - I(X\_2;Z),\tag{34a}$$

$$I(X_1X_2;Y) - I(X_1X_2;Z) < I(X_1;Y|X_2) - I(X_1;Z). \tag{34b}$$

This case is handled as Case 2.a by exchanging the roles of the two transmitters.

**Case 3**: Assume

$$I(X\_1X\_2;Y) - I(X\_1X\_2;Z) < \min[I(X\_1;Y|X\_2) - I(X\_1;Z), I(X\_2;Y|X\_1) - I(X\_2;Z)].\tag{35}$$

Hence,

$$\widetilde{C}_1 \triangleq \left( I(X_1; Y|X_2) - I(X_1; Z),\; I(X_2; Y) - I(X_2; Z|X_1) \right), \tag{36a}$$

$$\widetilde{C}_2 \triangleq \left( I(X_1; Y) - I(X_1; Z|X_2),\; I(X_2; Y|X_1) - I(X_2; Z) \right), \tag{36b}$$

have a negative y-coordinate and a negative x-coordinate, respectively, and the strategies of Case 1 and Case 2 cannot be directly applied here. The corner points of the region are

$$C_1 \triangleq \left( I(X_1X_2; Y) - I(X_1X_2; Z),\; 0 \right),\tag{37a}$$

$$C_2 \triangleq \left( 0,\; I(X_1X_2; Y) - I(X_1X_2; Z) \right).\tag{37b}$$

These corner points do not seem to be easily achievable using the method for Case 1. We will first show that it is possible to achieve a point $R \in \mathcal{D}(P_1, P_2)$ whose components are both strictly positive. All the other points in $\mathcal{D}(P_1, P_2)$ will then be achieved as in Case 2 by making the substitutions $C_1 \leftarrow R$ and $C_2 \leftarrow R$ in Case 2.a and Case 2.b, respectively.

Note that it is sufficient to consider the case

$$\min\left[I(X\_1;\mathcal{Y}|X\_2) - I(X\_1;Z), I(X\_2;\mathcal{Y}|X\_1) - I(X\_2;Z)\right] \ge 0. \tag{38}$$

Indeed, for $i \in \{1, 2\}$ and $\bar{i} \triangleq 3 - i$, when $I(X_i;Y|X_{\bar{i}}) - I(X_i;Z) > 0$ and $I(X_{\bar{i}};Y|X_i) - I(X_{\bar{i}};Z) \le 0$, we have $R_{\bar{i}} = 0$ and $R_i \le I(X_1X_2;Y) - I(X_1X_2;Z) \le I(X_i;Y|X_{\bar{i}}) - I(X_i;Z|X_{\bar{i}}) = \frac{1}{2}\log\frac{1 + P_i(1+\Lambda)^{-1}}{1 + P_ih_i}$. These cases correspond to Theorem 1 and can be treated as in Case 1.

### *7.1. Case 1*

We show the achievability of *C*2. The achievability of *C*<sup>1</sup> is obtained by exchanging the role of the transmitters.

**Codebook construction**: For Transmitter $i \in \{1, 2\}$, construct a codebook $\mathcal{C}_n^{(i)}$ with $2^{nR_i} 2^{n\tilde{R}_i}$ codewords drawn independently and uniformly on the sphere of radius $\sqrt{nP_i}$ in $\mathbb{R}^n$. The codewords are labeled $x_i^n(m_i, \tilde{m}_i)$, where $m_i \in [\![1, 2^{nR_i}]\!]$ and $\tilde{m}_i \in [\![1, 2^{n\tilde{R}_i}]\!]$. We define $\mathbb{C}_n \triangleq (\mathcal{C}_n^{(1)}, \mathcal{C}_n^{(2)})$ and choose, for $\delta > 0$,

$$R\_1 \triangleq I(X\_1; Y) - I(X\_1; Z | X\_2) - \delta,\tag{39a}$$

$$\tilde{\mathcal{R}}\_1 \triangleq I(X\_1; Z|X\_2) - \delta,\tag{39b}$$

$$R_2 \triangleq I(X_2; Y|X_1) - I(X_2; Z) - \delta, \tag{39c}$$

$$
\tilde{R}\_2 \stackrel{\Delta}{=} I(X\_2; Z) - \delta. \tag{39d}
$$

**Encoding at Transmitter** $i \in \{1, 2\}$: Given $(m_i, \tilde{m}_i)$, transmit $x_i^n(m_i, \tilde{m}_i)$. In the remainder of the paper, we use the term randomization sequence for $\tilde{m}_i$.

**Decoding**: The receiver performs minimum distance decoding to first estimate $(m_1, \tilde{m}_1)$ and then $(m_2, \tilde{m}_2)$, i.e., given $y^n$, it determines $(\hat{m}_1, \hat{\tilde{m}}_1) \triangleq \phi_1(y^n, 0)$ and $(\hat{m}_2, \hat{\tilde{m}}_2) \triangleq \phi_2(y^n, x_1^n(\hat{m}_1, \hat{\tilde{m}}_1))$, where for $i \in \{1, 2\}$

$$\phi_i(y^n, \mathbf{x}) \triangleq \begin{cases} (m_i, \tilde{m}_i) & \text{if } \|y^n - \mathbf{x} - \mathbf{x}_i^n(m_i, \tilde{m}_i)\|^2 < \|y^n - \mathbf{x} - \mathbf{x}_i^n(m_i', \tilde{m}_i')\|^2 \\ & \text{for all } (m_i', \tilde{m}_i') \neq (m_i, \tilde{m}_i) \\ 0 & \text{if no such } (m_i, \tilde{m}_i) \in [\![1, 2^{nR_i}]\!] \times [\![1, 2^{n\tilde{R}_i}]\!] \text{ exists} \end{cases} \tag{40}$$
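As a toy illustration of the rule in Equation (40), the following sketch (ours, not the paper's; the helper name is an assumption) returns the label of the codeword closest to the received vector. The paper's rule also subtracts a known interference vector $\mathbf{x}$, which a caller can fold into the received vector before the search, and declares an error on ties; the sketch simply keeps the first minimizer.

```python
def min_distance_decode(y, codebook):
    """Toy minimum-distance decoder in the spirit of Equation (40).

    y:        received vector (list of floats),
    codebook: dict mapping labels (m, m_tilde) to codeword vectors.
    Returns the label of the codeword with the smallest squared
    Euclidean distance to y (first minimizer on ties).
    """
    best_label, best_dist = None, float("inf")
    for label, codeword in codebook.items():
        dist = sum((yi - ci) ** 2 for yi, ci in zip(y, codeword))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Tiny two-dimensional example with three labeled codewords:
cb = {(1, 1): [1.0, 0.0], (1, 2): [0.0, 1.0], (2, 1): [-1.0, 0.0]}
assert min_distance_decode([0.9, 0.1], cb) == (1, 1)
assert min_distance_decode([-0.8, 0.2], cb) == (2, 1)
```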

Define $\varepsilon(\mathbb{C}_n, s^n) \triangleq \mathbb{P}\big[(\hat{M}_1, \hat{M}_2) \neq (M_1, M_2)|\mathbb{C}_n\big]$. We now prove that $\mathbb{E}_{\mathbb{C}_n}[\sup_{s^n} \varepsilon(\mathbb{C}_n, s^n)] + \frac{1}{n} I(M_1M_2; Z^n|\mathbb{C}_n) \xrightarrow{n\to\infty} 0$. We will thus conclude by Markov's inequality that there exists a sequence of realizations $(C_n)_{n\ge 1}$ of $(\mathbb{C}_n)_{n\ge 1}$ such that both $\sup_{s^n} \varepsilon(C_n, s^n)$ and $\frac{1}{n} I(M_1M_2; Z^n|C_n)$ can be made arbitrarily close to zero as $n \to \infty$.

**Average probability of error**: We have

$$\varepsilon(\mathbb{C}_n, s^n) \le \mathbb{P}\left[ (\hat{M}_1, \hat{M}_2) \neq (M_1, M_2) \text{ or } (\hat{\tilde{M}}_1, \hat{\tilde{M}}_2) \neq (\tilde{M}_1, \tilde{M}_2) \,\middle|\, \mathbb{C}_n \right] \tag{41a}$$

$$\le \varepsilon_1(\mathbb{C}_n, s^n, \mathbf{x}_2^n(M_2, \tilde{M}_2)) + \varepsilon_2(\mathbb{C}_n, s^n, \mathbf{0}),\tag{41b}$$

where for *i* ∈ {1, 2}

$$\varepsilon_i(\mathbb{C}_n, s^n, \mathbf{x}) \triangleq \frac{1}{\lceil 2^{nR_i} \rceil \lceil 2^{n\tilde{R}_i} \rceil} \sum_{m_i} \sum_{\tilde{m}_i} \mathbb{P}\Big[ \|\mathbf{x}_i^n(m_i, \tilde{m}_i) + s^n + \mathbf{x} + N_Y^n - \mathbf{x}_i^n(m_i', \tilde{m}_i')\|^2 \le \|s^n + \mathbf{x} + N_Y^n\|^2 \text{ for some } (m_i', \tilde{m}_i') \neq (m_i, \tilde{m}_i) \Big]. \tag{42}$$

Next, we have

$$\mathbb{E}_{\mathbb{C}_n}[\varepsilon_1(\mathbb{C}_n, s^n, \mathbf{x}_2^n(M_2, \tilde{M}_2))] \le \mathbb{E}_{\mathbb{C}_n}[\varepsilon_1(\mathbb{C}_n, s^n, \mathbf{x}_2^n(M_2, \tilde{M}_2))|\mathcal{C}_n^{(1)} \in \mathcal{C}_1^*] + \mathbb{P}[\mathcal{C}_n^{(1)} \notin \mathcal{C}_1^*] \tag{43a}$$

$$\xrightarrow{n\to\infty} 0, \tag{43b}$$

where, in Equation (43a), $\mathcal{C}_1^*$ represents all the sets of unit-norm vectors scaled by $\sqrt{nP_1}$ that satisfy the two conditions of Lemma A1 (in Appendix A). Equation (43b) holds because $\mathbb{P}[\mathcal{C}_n^{(1)} \in \mathcal{C}_1^*] \xrightarrow{n\to\infty} 1$ by Lemma A1, and $\mathbb{E}_{\mathbb{C}_n}[\varepsilon_1(\mathbb{C}_n, s^n, \mathbf{x}_2^n(M_2, \tilde{M}_2))|\mathcal{C}_n^{(1)} \in \mathcal{C}_1^*] \xrightarrow{n\to\infty} 0$ by Theorem A1 (in Appendix A), using that $R_1 + \tilde{R}_1 < I(X_1;Y) = \frac{1}{2}\log\big(1 + \frac{P_1}{1+\Lambda+P_2}\big)$ and interpreting the signal of Transmitter 2 as noise. Then,

$$\mathbb{E}\_{\mathbb{C}\_n}[\varepsilon\_2(\mathbb{C}\_n, s^n, 0)] \le \mathbb{E}\_{\mathbb{C}\_n}[\varepsilon\_2(\mathbb{C}\_n, s^n, 0) | \mathcal{C}\_n^{(2)} \in \mathcal{C}\_2^\*] + \mathbb{P}[\mathcal{C}\_n^{(2)} \notin \mathcal{C}\_2^\*] \tag{44a}$$

$$\xrightarrow{n \to \infty} 0,\tag{44b}$$

where, in Equation (44a), $\mathcal{C}_2^*$ represents all the sets of unit-norm vectors scaled by $\sqrt{nP_2}$ that satisfy the two conditions of Lemma A1. Equation (44b) holds because $\mathbb{P}[\mathcal{C}_n^{(2)} \in \mathcal{C}_2^*] \xrightarrow{n\to\infty} 1$ by Lemma A1, and $\mathbb{E}_{\mathbb{C}_n}[\varepsilon_2(\mathbb{C}_n, s^n, \mathbf{0})|\mathcal{C}_n^{(2)} \in \mathcal{C}_2^*] \xrightarrow{n\to\infty} 0$ by Theorem A1, using that $R_2 + \tilde{R}_2 < I(X_2;Y|X_1) = \frac{1}{2}\log\big(1 + \frac{P_2}{1+\Lambda}\big)$. Hence, by Equations (41b), (43b), and (44b), we have

$$\mathbb{E}_{\mathbb{C}_n}[\varepsilon(\mathbb{C}_n, s^n)] \xrightarrow{n \to \infty} 0. \tag{45}$$

**Equivocation**: We first study the average error probability of decoding $(\tilde{m}_1, \tilde{m}_2)$ given $(z^n, m_1, m_2)$ with the following procedure. Given $(z^n, m_1, m_2)$, determine $\breve{m}_2 \triangleq \psi_2(z^n, 0)$ and $\breve{m}_1 \triangleq \psi_1(z^n, \sqrt{h_2}x_2^n(m_2, \breve{m}_2))$, where

$$\psi_i(z^n, \mathbf{x}) \triangleq \begin{cases} \tilde{m}_i & \text{if } \|z^n - \mathbf{x} - \sqrt{h_i}\mathbf{x}_i^n(m_i, \tilde{m}_i)\|^2 < \|z^n - \mathbf{x} - \sqrt{h_i}\mathbf{x}_i^n(m_i, \tilde{m}_i')\|^2 \\ & \text{for all } \tilde{m}_i' \neq \tilde{m}_i \\ 0 & \text{if no such } \tilde{m}_i \in [\![1, 2^{n\tilde{R}_i}]\!] \text{ exists} \end{cases} \tag{46}$$

We define $\tilde{\varepsilon}(\mathbb{C}_n) \triangleq \mathbb{P}\big[(\breve{M}_1, \breve{M}_2) \neq (\tilde{M}_1, \tilde{M}_2)|\mathbb{C}_n\big]$ and, for $i \in \{1, 2\}$,

$$\tilde{\varepsilon}_i(\mathbb{C}_n, \mathbf{x}) \triangleq \frac{1}{\lceil 2^{n\tilde{R}_i} \rceil} \sum_{\tilde{m}_i} \mathbb{P}\Big[ \|\sqrt{h_i}\mathbf{x}_i^n(m_i, \tilde{m}_i) + \mathbf{x} + N_Z^n - \sqrt{h_i}\mathbf{x}_i^n(m_i, \tilde{m}_i')\|^2 \le \|\mathbf{x} + N_Z^n\|^2 \text{ for some } \tilde{m}_i' \neq \tilde{m}_i \Big]. \tag{47}$$

Then, with the same notation as in Equations (43) and (44), we have

$$\mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}(\mathbb{C}_n)] \le \mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}_1(\mathbb{C}_n, \mathbf{0})] + \mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}_2(\mathbb{C}_n, \sqrt{h_1}\mathbf{x}_1^n(M_1, \tilde{M}_1))] \tag{48a}$$

$$\le \mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}_1(\mathbb{C}_n, \mathbf{0})|\mathcal{C}_n^{(1)} \in \mathcal{C}_1^*] + \mathbb{P}[\mathcal{C}_n^{(1)} \notin \mathcal{C}_1^*] + \mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}_2(\mathbb{C}_n, \sqrt{h_1}\mathbf{x}_1^n(M_1, \tilde{M}_1))|\mathcal{C}_n^{(2)} \in \mathcal{C}_2^*] + \mathbb{P}[\mathcal{C}_n^{(2)} \notin \mathcal{C}_2^*] \tag{48b}$$

$$\xrightarrow{n\to\infty} 0, \tag{48c}$$

where Equation (48c) holds because $\mathbb{P}[\mathcal{C}_n^{(1)} \in \mathcal{C}_1^*] \xrightarrow{n\to\infty} 1$ and $\mathbb{P}[\mathcal{C}_n^{(2)} \in \mathcal{C}_2^*] \xrightarrow{n\to\infty} 1$ by Lemma A1, $\mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}_1(\mathbb{C}_n, \mathbf{0})|\mathcal{C}_n^{(1)} \in \mathcal{C}_1^*] \xrightarrow{n\to\infty} 0$ by Theorem A1 using that $\tilde{R}_1 < I(X_1;Z|X_2) = \frac{1}{2}\log(1 + h_1P_1)$, and $\mathbb{E}_{\mathbb{C}_n}[\tilde{\varepsilon}_2(\mathbb{C}_n, \sqrt{h_1}\mathbf{x}_1^n(M_1, \tilde{M}_1))|\mathcal{C}_n^{(2)} \in \mathcal{C}_2^*] \xrightarrow{n\to\infty} 0$ by Theorem A1 using that $\tilde{R}_2 < I(X_2;Z) = \frac{1}{2}\log\big(1 + \frac{h_2P_2}{1+h_1P_1}\big)$ and interpreting the signal of Transmitter 1 as noise.

Define $M \triangleq (M_1, M_2)$ and $\tilde{M} \triangleq (\tilde{M}_1, \tilde{M}_2)$. Let the superscript $T$ denote the transpose operation and define $\mathbf{X} \triangleq [\sqrt{h_1}(X_1^n)^T\; \sqrt{h_2}(X_2^n)^T]^T \in \mathbb{R}^{2n\times 1}$, such that

$$Z^n = G\mathbf{X} + N_{Z}^n, \tag{49}$$

with $G \triangleq [I_n, I_n] \in \mathbb{R}^{n\times 2n}$ and $I_n$ the identity matrix with dimension $n$. Let $K_{\mathbf{X}}$ denote the covariance matrix of $\mathbf{X}$. Note that, by independence between $X_1^n$ and $X_2^n$, we have $K_{\mathbf{X}} = \begin{pmatrix} K_{\sqrt{h_1}X_1^n} & 0_n \\ 0_n & K_{\sqrt{h_2}X_2^n} \end{pmatrix}$, where $0_n \triangleq 0 \times I_n$ and $K_{\sqrt{h_i}X_i^n}$ is the covariance matrix of $\sqrt{h_i}X_i^n$, $i \in \{1,2\}$. Then, for $i \in \{1,2\}$, since $X_i^n$ is chosen uniformly at random over a sphere of radius $\sqrt{nP_i}$, the off-diagonal elements of $K_{\sqrt{h_i}X_i^n}$ are all equal to 0 by symmetry, and the diagonal elements are all equal (also by symmetry) and sum to $nh_iP_i$. Hence, $K_{\sqrt{h_i}X_i^n} = h_iP_iI_n$, $i \in \{1,2\}$, and

$$K_{\mathbf{X}} = \begin{pmatrix} h_1 P_1 I_n & 0_n \\ 0_n & h_2 P_2 I_n \end{pmatrix}. \tag{50}$$
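The step from Equation (51f) to (51g) below uses that $G K_{\mathbf{X}} G^T + I_n = (1+h_1P_1+h_2P_2)I_n$, so the log-determinant equals $n\log(1+h_1P_1+h_2P_2)$. A minimal numerical sketch of this determinant identity, with an arbitrary small $n$ and illustrative parameters, assuming NumPy is available:

```python
import numpy as np

n = 4                                   # small illustrative blocklength
h1, h2, P1, P2 = 0.6, 0.8, 2.0, 3.0    # arbitrary illustrative parameters

I_n = np.eye(n)
G = np.hstack([I_n, I_n])               # G = [I_n, I_n], shape n x 2n
K_X = np.block([[h1 * P1 * I_n, np.zeros((n, n))],
                [np.zeros((n, n)), h2 * P2 * I_n]])   # Equation (50)

# G K_X G^T + I_n = (1 + h1 P1 + h2 P2) I_n, hence the closed-form log-det.
sign, logdet = np.linalg.slogdet(G @ K_X @ G.T + I_n)
assert sign == 1.0
assert abs(logdet - n * np.log(1 + h1 * P1 + h2 * P2)) < 1e-10
```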

Then, we have

$$\begin{align} I(M; Z^n | C_n) &= I(M\tilde{M}; Z^n | C_n) - I(\tilde{M}; Z^n | M C_n) \tag{51a}\\ &= I(M\tilde{M}; Z^n | C_n) - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51b}\\ &\leq I(\mathbf{X}; Z^n | C_n) - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51c}\\ &\leq I(\mathbf{X}; Z^n) - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51d}\\ &= h(Z^n) - h(N_Z^n) - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51e}\\ &\leq \frac{1}{2}\log\left|G K_{\mathbf{X}} G^T + I_n\right| - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51f}\\ &= \frac{n}{2}\log(1+h_1P_1+h_2P_2) - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51g}\\ &= nI(X_1X_2;Z) - H(\tilde{M}|C_n) + H(\tilde{M}|Z^n M C_n) \tag{51h}\\ &\leq nI(X_1X_2;Z) - n(I(X_1X_2;Z) - 2\delta) + O\big(n\mathbb{E}_{C_n}[\tilde{\varepsilon}(C_n)]\big) \tag{51i}\\ &= 2\delta n + o(n), \tag{51j} \end{align}$$

where Equation (51b) holds by independence between $M$ and $\tilde{M}$; Equation (51c) holds because $(M, \tilde{M}) - (\mathbf{X}, C_n) - Z^n$ forms a Markov chain; Equation (51d) holds because $C_n - \mathbf{X} - Z^n$ forms a Markov chain; Equation (51f) holds because $h(N_Z^n) = \frac{1}{2}\log((2\pi e)^n)$ and because $h(Z^n) \leq \frac{1}{2}\log((2\pi e)^n|GK_{\mathbf{X}}G^T + I_n|)$ by Equation (49) and the maximal differential entropy lemma (e.g., [31] [Eq. (2.6)]); Equation (51g) holds by Equation (50); in Equation (51i), we used the definition of $\tilde{R}_1 + \tilde{R}_2$ and the uniformity of $\tilde{M}$ to obtain the second term, and Fano's inequality to obtain the third term; Equation (51j) holds by Equation (48c).

Note that the idea of considering a fictitious decoder at the eavesdropper to use Fano's inequality in Equation (51i) is a standard technique that already appeared in [32].

### *7.2. Case 2*

We only consider Case 2.a; Case 2.b is handled by exchanging the roles of the transmitters. Let $R \triangleq (R_1, R_2) \in \mathcal{D}(P_1, P_2)$. There exists $\alpha \in [0,1[$ such that $R = (1-\alpha)C_1 + \alpha\tilde{C}_2$. The corner point $C_1$ is achievable by Case 1; however, recall that the first component of $\tilde{C}_2$ is negative, so $\tilde{C}_2$ cannot be achieved as in Case 1, and one cannot perform time-sharing between $C_1$ and $\tilde{C}_2$ to achieve $R$. Instead, we achieve $R$ as follows. We define $k, k' \in \mathbb{N}$ such that $k'/k = (1-\alpha)^{-1} - 1 + \epsilon$, $\epsilon > 0$; this is possible by density of $\mathbb{Q}$ in $\mathbb{R}$. We realize a first transmission $T_1$, as in Case 1, of a pair of confidential messages of length $nkC_1$. Part of these confidential messages is dedicated to exchanging a secret key of length $nk'(I(X_1;Z|X_2) - I(X_1;Y)) > 0$ between Transmitter 1 and the receiver, which is possible because $(1-\alpha)C_1 + \alpha\tilde{C}_2 = R$ has positive components. We then realize a second transmission $T_2$ of a pair of confidential messages of length $nk'(0, I(X_2;Y|X_1) - I(X_2;Z))$ assisted with the secret key shared between Transmitter 1 and the receiver. Hence, the overall transmission rate of confidential messages is $\frac{k}{k+k'}C_1 + \frac{k'}{k+k'}\tilde{C}_2$, which is arbitrarily close to $R$ by choosing a sufficiently small $\epsilon$. We now explain how transmission $T_2$ is performed. We repeat $k'$ times the following coding scheme.
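The choice of $k$ and $k'$ above can be made constructive with exact rational arithmetic. A hedged sketch, in which the value of $\alpha$ and the size of $\epsilon$ are arbitrary illustrations rather than values from the paper:

```python
from fractions import Fraction

alpha = Fraction(3, 10)            # arbitrary illustrative alpha in [0, 1[
target = 1 / (1 - alpha) - 1       # (1 - alpha)^(-1) - 1 = 3/7 here

# Pick k'/k = target + eps for a small rational eps > 0; such a rational
# always exists by density of Q in R.
eps = Fraction(1, 1000)
ratio = target + eps
k_prime, k = ratio.numerator, ratio.denominator

assert ratio > target
assert Fraction(k_prime, k) - target == eps
```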

**Codebook construction**: Perform the same codebook construction as in Case 1 for Transmitter 2. For Transmitter 1, construct a codebook with $2^{n\breve{R}_1}2^{n\mathring{R}_1}$ codewords drawn independently and uniformly on the sphere of radius $\sqrt{nP_1}$ in $\mathbb{R}^n$. The codewords are labeled $x_1^n(\breve{m}_1, \mathring{m}_1)$, where $\breve{m}_1 \in [\![1, 2^{n\breve{R}_1}]\!]$, $\mathring{m}_1 \in [\![1, 2^{n\mathring{R}_1}]\!]$. We define the rates $\breve{R}_1 \triangleq I(X_1;Y) - \delta$, $\mathring{R}_1 \triangleq I(X_1;Z|X_2) - I(X_1;Y) - \delta$, and $\tilde{R}_1 \triangleq \breve{R}_1 + \mathring{R}_1 = I(X_1;Z|X_2) - 2\delta$.

**Encoding at Transmitters**: Encoding for Transmitter 2 is as in Case 1. Given $(\breve{m}_1, \mathring{m}_1)$, Transmitter 1 forms $x_1^n(\breve{m}_1, \mathring{m}_1)$, where $\mathring{m}_1$ is seen as a secret key known at the receiver that has been shared through transmission $T_1$ described above. In the following, we define $\tilde{m}_1 \triangleq (\breve{m}_1, \mathring{m}_1)$.

**Decoding and average probability of error**: As in Case 1, using minimum distance decoding, one can show that, on average over the codebooks, the receiver can reconstruct $x_1^n(\breve{m}_1, \mathring{m}_1)$ with a vanishing average probability of error because $\mathring{m}_1$ is known at the receiver and because $\breve{R}_1 < I(X_1;Y)$. The receiver can then reconstruct $x_2^n$ as in Case 1.

**Equivocation**: The equivocation computation for transmission $T_2$ is as in Case 1 by remarking that it is possible, on average over the codebooks, to reconstruct with vanishing average probability of error first $x_2^n$ given $(z^n, m_2)$ and then $x_1^n$ given $(z^n, x_2^n)$ by using that $\tilde{R}_1 < I(X_1;Z|X_2)$.

Finally, to conclude that $R$ is achievable, we need to show that the secrecy constraint is satisfied for the joint transmissions $T_1$ and $T_2$. We use the superscript $(T_i)$ to denote random variables associated with transmission $T_i$, $i \in \{1,2\}$. Define $M^{(T_1)} \triangleq \big(M_1^{(T_1)}\backslash\mathring{M}_1^{(T_1)}, M_2^{(T_1)}\big)$, the confidential messages sent during transmission $T_1$ excluding $\mathring{M}_1^{(T_1)}$, defined as all the confidential messages sent during transmission $T_1$ and used during transmission $T_2$. We define $M^{(T_2)} \triangleq \big(\emptyset, M_2^{(T_2)}\big)$ as the confidential messages sent during transmission $T_2$. We define $\tilde{M}^{(T_i)} \triangleq \big(\tilde{M}_1^{(T_i)}, \tilde{M}_2^{(T_i)}\big)$ as the randomization sequences used by both transmitters in transmission $T_i$, $i \in \{1,2\}$. We also define $\mathbf{X}^{(T_i)}$ as all the channel inputs from both transmitters in transmission $T_i$, $i \in \{1,2\}$, and $\mathbf{Z}^{(T_i)}$ as all the channel outputs observed by the eavesdropper in transmission $T_i$, $i \in \{1,2\}$. Finally, we define $M^{(T_1,T_2)} \triangleq \big(M^{(T_1)}, M^{(T_2)}\big)$, $\tilde{M}^{(T_1,T_2)} \triangleq \big(\tilde{M}^{(T_1)}, \tilde{M}^{(T_2)}\big)$, $\mathbf{Z}^{(T_1,T_2)} \triangleq \big(\mathbf{Z}^{(T_1)}, \mathbf{Z}^{(T_2)}\big)$, $\mathbf{X}^{(T_1,T_2)} \triangleq \big(\mathbf{X}^{(T_1)}, \mathbf{X}^{(T_2)}\big)$, $C_n^{(T_1,T_2)} \triangleq \big(C_n^{(T_1)}, C_n^{(T_2)}\big)$. We have

$$\begin{align} &I(M^{(T_1,T_2)};\mathbf{Z}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) \nonumber\\ &= I(M^{(T_1,T_2)}\tilde{M}^{(T_1,T_2)};\mathbf{Z}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) - I(\tilde{M}^{(T_1,T_2)};\mathbf{Z}^{(T_1,T_2)}|M^{(T_1,T_2)}C_n^{(T_1,T_2)}) \tag{52a}\\ &= I(M^{(T_1,T_2)}\tilde{M}^{(T_1,T_2)};\mathbf{Z}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) - H(\tilde{M}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) \nonumber\\ &\quad + H(\tilde{M}^{(T_1,T_2)}|\mathbf{Z}^{(T_1,T_2)}M^{(T_1,T_2)}C_n^{(T_1,T_2)}) \tag{52b}\\ &\leq I(\mathbf{X}^{(T_1,T_2)};\mathbf{Z}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) - H(\tilde{M}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) \nonumber\\ &\quad + H(\tilde{M}^{(T_1,T_2)}|\mathbf{Z}^{(T_1,T_2)}M^{(T_1,T_2)}C_n^{(T_1,T_2)}) \tag{52c}\\ &\leq I(\mathbf{X}^{(T_1,T_2)};\mathbf{Z}^{(T_1,T_2)}) - H(\tilde{M}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) + H(\tilde{M}^{(T_1,T_2)}|\mathbf{Z}^{(T_1,T_2)}M^{(T_1,T_2)}C_n^{(T_1,T_2)}) \tag{52d}\\ &\leq n(k+k')I(X_1X_2;Z) - H(\tilde{M}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) + H(\tilde{M}^{(T_1,T_2)}|\mathbf{Z}^{(T_1,T_2)}M^{(T_1,T_2)}C_n^{(T_1,T_2)}) \tag{52e}\\ &\leq 3n\delta(k+k') + H(\tilde{M}^{(T_1,T_2)}|\mathbf{Z}^{(T_1,T_2)}M^{(T_1,T_2)}C_n^{(T_1,T_2)}) \tag{52f}\\ &\leq 3n\delta(k+k') + O\Big(n\mathbb{E}_{C_n^{(T_1,T_2)}}\big[\tilde{\varepsilon}(C_n^{(T_1,T_2)})\big]\Big), \tag{52g} \end{align}$$

where Equation (52b) holds because we defined $M^{(T_1,T_2)}$ such that $M^{(T_1,T_2)}$ is independent from $\tilde{M}^{(T_1,T_2)}$; Equation (52c) holds because $(M^{(T_1,T_2)}, \tilde{M}^{(T_1,T_2)}) - (C_n^{(T_1,T_2)}, \mathbf{X}^{(T_1,T_2)}) - \mathbf{Z}^{(T_1,T_2)}$ forms a Markov chain; Equation (52d) holds because $C_n^{(T_1,T_2)} - \mathbf{X}^{(T_1,T_2)} - \mathbf{Z}^{(T_1,T_2)}$ forms a Markov chain; Equation (52e) holds similar to Equation (51h); Equation (52f) holds because, by definition, $\tilde{R}_1 + \tilde{R}_2 \geq I(X_1X_2;Z) - 3\delta$; Equation (52g) holds by Fano's inequality with $\tilde{\varepsilon}(C_n^{(T_1,T_2)})$ defined as the probability of error in reconstructing $\tilde{M}^{(T_1,T_2)}$ given $(\mathbf{Z}^{(T_1,T_2)}, M^{(T_1,T_2)})$ using minimum distance decoding as in Case 1. Then, define $\tilde{\varepsilon}^{(1)}(C_n^{(T_1,T_2)})$ as the error probability in reconstructing $\tilde{M}^{(T_2)}$ from $(\mathbf{Z}^{(T_2)}, M^{(T_2)})$ using minimum distance decoding, and $\tilde{\varepsilon}^{(2)}(C_n^{(T_1,T_2)})$ as the error probability in reconstructing $\tilde{M}^{(T_1)}$ from $(\mathbf{Z}^{(T_1)}, M^{(T_1)}, \tilde{M}^{(T_2)})$ using minimum distance decoding. As in the analysis of Case 1 and by observing that $\mathring{M}_1^{(T_1)}$ is included in $\tilde{M}^{(T_2)}$, we have

$$\begin{align} \mathbb{E}_{C_n^{(T_1,T_2)}}\big[\tilde{\varepsilon}(C_n^{(T_1,T_2)})\big] &\leq \mathbb{E}_{C_n^{(T_1,T_2)}}\big[\tilde{\varepsilon}^{(1)}(C_n^{(T_1,T_2)})\big] + \mathbb{E}_{C_n^{(T_1,T_2)}}\big[\tilde{\varepsilon}^{(2)}(C_n^{(T_1,T_2)})\big] \tag{53a}\\ &\xrightarrow{n\to\infty} 0. \tag{53b} \end{align}$$

We conclude from Equations (52g) and (53b) that

$$I(M^{(T_1,T_2)}; \mathbf{Z}^{(T_1,T_2)}|C_n^{(T_1,T_2)}) \leq 3n\delta(k+k') + o(n). \tag{54}$$

### *7.3. Case 3*

We have $I(X_1;Z|X_2) - I(X_1;Y) > 0$ and $I(X_2;Z|X_1) - I(X_2;Y) > 0$, as depicted in Figure 5. Assume that $I(X_1X_2;Y) - I(X_1X_2;Z) > 0$; otherwise, $\mathcal{R}^{\mathrm{MAC}}_{1,2}(P_1,P_2) = \{(0,0)\}$. We will use the following lemma.

**Lemma 2.** *Define $h_\Lambda \triangleq (1+\Lambda)^{-1}$. Then, (i) $I(X_1;Z|X_2) - I(X_1;Y) \leq I(X_1;Y|X_2) - I(X_1;Z)$ or $I(X_2;Z|X_1) - I(X_2;Y) \leq I(X_2;Y|X_1) - I(X_2;Z)$; (ii) $h_1 < h_\Lambda$ or $h_2 < h_\Lambda$; (iii) assuming, without loss of generality, that $I(X_1;Z|X_2) - I(X_1;Y) \leq I(X_1;Y|X_2) - I(X_1;Z)$, there exist $m, m' \in \mathbb{N}^*$ such that*


$$m'(I(X_1;Y|X_2) - I(X_1;Z)) \geq m(I(X_1;Z|X_2) - I(X_1;Y)), \tag{55a}$$

$$m(I(X_2;Y|X_1) - I(X_2;Z)) > m'(I(X_2;Z|X_1) - I(X_2;Y)). \tag{55b}$$

**Proof.** (i) By contradiction, assume that

$$I(X_1;Z|X_2) - I(X_1;Y) > I(X_1;Y|X_2) - I(X_1;Z), \tag{56a}$$

$$I(X_2;Z|X_1) - I(X_2;Y) > I(X_2;Y|X_1) - I(X_2;Z). \tag{56b}$$

Then,

$$\begin{split} I(X_1;Z|X_2) - I(X_1;Y) + I(X_2;Z|X_1) - I(X_2;Y) \qquad\\ > I(X_1;Y|X_2) - I(X_1;Z) + I(X_2;Y|X_1) - I(X_2;Z), \end{split} \tag{57}$$

which contradicts the fact that $I(X_1;Z|X_2) - I(X_1;Y) < I(X_2;Y|X_1) - I(X_2;Z)$ and $I(X_2;Z|X_1) - I(X_2;Y) < I(X_1;Y|X_2) - I(X_1;Z)$, both of which follow from $I(X_1X_2;Y) - I(X_1X_2;Z) > 0$ and the chain rule.

(ii) By contradiction, if $h_1 \geq h_\Lambda$ and $h_2 \geq h_\Lambda$, then $I(X_1X_2;Y) - I(X_1X_2;Z) \leq 0$. (iii) Choose $m' \in \mathbb{N}^*$ such that

$$I(X_1;Z|X_2) - I(X_1;Y) \leq m'(I(X_1X_2;Y) - I(X_1X_2;Z)). \tag{58}$$

Then, there exist $m \in \mathbb{N}^*$ and $r \in [0, I(X_1;Z|X_2) - I(X_1;Y)[$ such that

$$m'(I(X_1;Y|X_2) - I(X_1;Z)) = m(I(X_1;Z|X_2) - I(X_1;Y)) + r. \tag{59}$$

Then, we have

$$\begin{align} &m(I(X_2;Y|X_1) - I(X_2;Z)) \nonumber\\ &= m(I(X_1;Z|X_2) - I(X_1;Y)) + m(I(X_1X_2;Y) - I(X_1X_2;Z)) \tag{60a}\\ &= m'(I(X_1;Y|X_2) - I(X_1;Z)) + m(I(X_1X_2;Y) - I(X_1X_2;Z)) - r \tag{60b}\\ &= m'(I(X_2;Z|X_1) - I(X_2;Y)) + (m+m')(I(X_1X_2;Y) - I(X_1X_2;Z)) - r \tag{60c}\\ &\geq m'(I(X_2;Z|X_1) - I(X_2;Y)) + m(I(X_1X_2;Y) - I(X_1X_2;Z)) \tag{60d}\\ &\geq m'(I(X_2;Z|X_1) - I(X_2;Y)), \tag{60e} \end{align}$$

where Equations (60a) and (60c) hold by the chain rule, Equation (60b) holds by Equation (59), and Equation (60d) holds because $r < I(X_1;Z|X_2) - I(X_1;Y) \leq m'(I(X_1X_2;Y) - I(X_1X_2;Z))$ by Equation (58).
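The chain-rule identity behind Equation (60a), $I(X_2;Y|X_1) - I(X_2;Z) = (I(X_1;Z|X_2) - I(X_1;Y)) + (I(X_1X_2;Y) - I(X_1X_2;Z))$, can be spot-checked numerically. A hedged sketch, outside the proof, with the Gaussian mutual information expressions of this model and arbitrary illustrative parameters (legitimate-receiver noise variance $1+\Lambda$, eavesdropper noise variance $1$):

```python
import math

h1, h2, P1, P2, Lam = 0.5, 0.7, 2.0, 3.0, 0.4   # arbitrary parameters
sY = 1 + Lam                                     # receiver noise variance

MI = lambda num, den: 0.5 * math.log(num / den)  # 1/2 log(num/den)
I_X2_Y_given_X1 = MI(sY + P2, sY)
I_X2_Z = MI(1 + h1*P1 + h2*P2, 1 + h1*P1)
I_X1_Z_given_X2 = MI(1 + h1*P1, 1)
I_X1_Y = MI(sY + P1 + P2, sY + P2)
I_X1X2_Y = MI(sY + P1 + P2, sY)
I_X1X2_Z = MI(1 + h1*P1 + h2*P2, 1)

lhs = I_X2_Y_given_X1 - I_X2_Z
rhs = (I_X1_Z_given_X2 - I_X1_Y) + (I_X1X2_Y - I_X1X2_Z)
assert abs(lhs - rhs) < 1e-12    # chain-rule identity of Equation (60a)
```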

By (i) in Lemma 2, assume without loss of generality that $I(X_1;Z|X_2) - I(X_1;Y) \leq I(X_1;Y|X_2) - I(X_1;Z)$, by exchanging the roles of the transmitters if necessary. We let $m, m'$ be as in (iii) of Lemma 2. $\mathcal{D}(P_1,P_2)$ is achieved in four steps.

**Step 1**. During a first transmission $T_0$, Transmitter 2 transmits a confidential message of length $nm'(I(X_2;Z|X_1) - I(X_2;Y))$ to the receiver. This is possible with a point-to-point wiretap code, as in Case 1, when Transmitter 1 remains silent and when $h_\Lambda > h_2$. If, on the other hand, $h_\Lambda \leq h_2$, then by (ii) in Lemma 2, $h_\Lambda > h_1$, and Transmitter 2 can transmit a confidential message of length $nm'(I(X_2;Z|X_1) - I(X_2;Y))$ as follows. Transmitter 1 transmits a confidential message of length $nk(I(X_1;Z|X_2) - I(X_1;Y))$, where $k \in \mathbb{N}^*$ is such that $nk(I(X_2;Y|X_1) - I(X_2;Z)) \geq nm'(I(X_2;Z|X_1) - I(X_2;Y))$. Using this secret key shared by Transmitter 1 and the receiver, Transmitter 2 can transmit a confidential message of length $nk(I(X_2;Y|X_1) - I(X_2;Z))$ as in Case 2. Note that Step 1 is operated in a fixed number of blocks of length $n$.

**Step 2**. As in Case 2, the transmitters achieve a transmission $T_1$ of confidential messages of length $(nm'(I(X_1;Y|X_2) - I(X_1;Z)), 0)$ by using the secret key exchanged during $T_0$ between Transmitter 2 and the receiver. Then, as in Case 2 and because $m'(I(X_1;Y|X_2) - I(X_1;Z)) - m(I(X_1;Z|X_2) - I(X_1;Y)) \geq 0$ by (iii) in Lemma 2, the transmitters achieve a transmission $T_2$ of confidential messages of length $(0, nm(I(X_2;Y|X_1) - I(X_2;Z)))$ using a secret key of length $nm(I(X_1;Z|X_2) - I(X_1;Y))$ exchanged between Transmitter 1 and the receiver during $T_1$. Hence, after $T_1$ and $T_2$, the transmitters have achieved the transmission of confidential messages of length $(nm'(I(X_1;Y|X_2) - I(X_1;Z)) - nm(I(X_1;Z|X_2) - I(X_1;Y)), nm(I(X_2;Y|X_1) - I(X_2;Z)))$.

**Step 3**. The transmitters repeat $T_1$ and $T_2$ $t$ times, where $t$ is arbitrary, since $m(I(X_2;Y|X_1) - I(X_2;Z)) - m'(I(X_2;Z|X_1) - I(X_2;Y)) > 0$ by (iii) in Lemma 2. After these $t$ repetitions, the rate pair achieved is arbitrarily close to

$$\begin{split} \underline{R} = \frac{1}{m+m'}\big(&m'(I(X_1;Y|X_2) - I(X_1;Z)) - m(I(X_1;Z|X_2) - I(X_1;Y)),\\ &m(I(X_2;Y|X_1) - I(X_2;Z)) - m'(I(X_2;Z|X_1) - I(X_2;Y))\big) \end{split} \tag{61}$$

provided that $t$ is large enough, since Step 1 only requires a fixed number of transmission blocks. Observe that $\underline{R} \in \mathcal{D}(P_1,P_2)$.
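For concreteness, a hedged numerical walk-through of (iii) in Lemma 2 and of the rate point in Equation (61), for one arbitrary Case 3 parameter choice (with $\Lambda = 0$, so both noise variances are 1); it also illustrates that $\underline{R}$ has nonnegative components and sum rate $I(X_1X_2;Y) - I(X_1X_2;Z)$:

```python
import math

h1, h2, P1, P2 = 0.2, 0.3, 10.0, 10.0   # arbitrary Case 3 parameters

MI = lambda num, den: 0.5 * math.log(num / den)
A      = MI(1 + h1*P1, 1) - MI(1 + P1 + P2, 1 + P2)        # I(X1;Z|X2)-I(X1;Y)
A_star = MI(1 + P1, 1) - MI(1 + h1*P1 + h2*P2, 1 + h2*P2)  # I(X1;Y|X2)-I(X1;Z)
B      = MI(1 + h2*P2, 1) - MI(1 + P1 + P2, 1 + P1)        # I(X2;Z|X1)-I(X2;Y)
B_star = MI(1 + P2, 1) - MI(1 + h1*P1 + h2*P2, 1 + h1*P1)  # I(X2;Y|X1)-I(X2;Z)
C_sum  = MI(1 + P1 + P2, 1) - MI(1 + h1*P1 + h2*P2, 1)     # I(X1X2;Y)-I(X1X2;Z)
assert A > 0 and B > 0 and C_sum > 0 and A <= A_star       # Case 3, WLOG of (i)

m_prime = math.ceil(A / C_sum)         # Equation (58)
m = math.floor(m_prime * A_star / A)   # Equation (59): m' A* = m A + r
r = m_prime * A_star - m * A
assert 0 <= r < A

# Rate point of Equation (61).
R1 = (m_prime * A_star - m * A) / (m + m_prime)
R2 = (m * B_star - m_prime * B) / (m + m_prime)
assert R1 >= 0 and R2 > 0
assert abs((R1 + R2) - C_sum) < 1e-12   # lies on the sum-rate boundary
```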

**Step 4**. Any point of $\mathcal{D}(P_1,P_2)$ can then be achieved as in Case 2 by making the substitutions $C_1 \leftarrow \underline{R}$ and $C_2 \leftarrow \underline{R}$ in Case 2.a and Case 2.b, respectively.

The proof that secrecy holds over the joint transmissions is similar to Case 2 and thus omitted.

### **8. Proof of Theorem 3**

We first show that determining a converse for our model reduces to determining a converse for a similar model when the jammer is inactive, i.e., when Λ = 0.

**Lemma 3.** *Let $\mathcal{O} \triangleq \{(R_1, R_2) : R_1 \leq B_1, R_2 \leq B_2, R_1 + R_2 \leq B_{1,2}\}$ be an outer bound, i.e., a set that contains all possibly achievable rate pairs, for the Gaussian MAC-WT-JA with parameters $(\Gamma_1, \Gamma_2, h_1, h_2, 0, \sigma_Y^2 + \Lambda, \sigma_Z^2)$. Then,*

$$\left\{ (R_1, R_2) : R_1 \le \begin{cases} B_1 & \text{if } \Gamma_1 > \Lambda \\ 0 & \text{if } \Gamma_1 \le \Lambda \end{cases}, \; R_2 \le \begin{cases} B_2 & \text{if } \Gamma_2 > \Lambda \\ 0 & \text{if } \Gamma_2 \le \Lambda \end{cases}, \; R_1 + R_2 \le B_{1,2} \right\}$$

*is an outer bound for the Gaussian MAC-WT-JA with parameters $(\Gamma_1, \Gamma_2, h_1, h_2, \Lambda, \sigma_Y^2, \sigma_Z^2)$.*

**Proof.** Consider any encoders and decoder for the Gaussian MAC-WT-JA with the parameters $(\Gamma_1, \Gamma_2, h_1, h_2, \Lambda, \sigma_Y^2, \sigma_Z^2)$ that achieve the rate pair $(R_1, R_2)$. Note that by [24] [Theorem 2.3], for any $l \in \{1,2\}$ such that $\Gamma_l \leq \Lambda$, we must have $R_l = 0$, since an outer bound for the model in [24] is also an outer bound for the Gaussian MAC-WT-JA, which has the additional security constraint (2b). Then, to derive an outer bound, it is sufficient to consider a specific jamming strategy and study the best achievable rates for this jamming strategy, since the boundaries of the capacity region correspond to the best (from the jammer's point of view) jamming strategies and any other jamming strategy can only enlarge the set of achievable rates. We assume that, in each transmission block, the jamming sequence is $S^n$ with components independent and identically distributed according to a zero-mean Gaussian random variable with variance $\Lambda' < \Lambda$. The average probability of error at the legitimate receiver is thus upper-bounded by $\sup_{S\in\mathcal{S}} \mathbb{P}[\hat{M} \neq M] + k\,\mathbb{P}[\|S^n\|^2 > n\Lambda] \xrightarrow{n\to\infty} 0$, where we used the notation of Definition 1 and the fact that $k\,\mathbb{P}[\|S^n\|^2 > n\Lambda] \xrightarrow{n\to\infty} 0$ since $\Lambda' < \Lambda$. Hence, since the secrecy constraint is independent of $\Lambda'$, we obtain the reliability and secrecy constraints for a Gaussian MAC-WT-JA with parameters $(\Gamma_1, \Gamma_2, h_1, h_2, 0, \sigma_Y^2 + \Lambda', \sigma_Z^2)$, meaning that $(R_1, R_2) \in \mathcal{O}'$, where $\mathcal{O}'$ is an outer bound for the Gaussian MAC-WT-JA with parameters $(\Gamma_1, \Gamma_2, h_1, h_2, 0, \sigma_Y^2 + \Lambda', \sigma_Z^2)$. Finally, we conclude the proof by choosing $\Lambda'$ arbitrarily close to $\Lambda$.
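The key probabilistic fact in the proof, $k\,\mathbb{P}[\|S^n\|^2 > n\Lambda] \xrightarrow{n\to\infty} 0$ when the jamming variance is $\Lambda' < \Lambda$, is a law-of-large-numbers effect: $\|S^n\|^2/n$ concentrates around $\Lambda'$. A hedged Monte Carlo illustration, with arbitrary values of $\Lambda$, $\Lambda'$, and sample sizes, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, lam_p = 1.0, 0.8           # power constraint Lambda, jamming variance Lambda' < Lambda

# Empirical probability that ||S^n||^2 > n * Lambda, for growing n.
viol = []
for n in (10, 100, 1000):
    S = rng.normal(0.0, np.sqrt(lam_p), size=(2000, n))
    viol.append(np.mean(np.sum(S ** 2, axis=1) > n * lam))

assert viol[0] > viol[1] > viol[2]   # violation probability decays with n
assert viol[2] < 0.01                # essentially zero at n = 1000
```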

We now obtain Theorem 3 as follows. (i) holds from Lemma 3. (ii) holds from Lemma 3 and [33] [Theorem 6] by remarking that $x \mapsto \log\frac{1+x(1+\Lambda)^{-1}}{1+xh}$ is non-decreasing when $(1+\Lambda)^{-1} > h$ and negative when $(1+\Lambda)^{-1} \leq h$.
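A numerical spot check of the monotonicity and sign claims for $x \mapsto \log\frac{1+x(1+\Lambda)^{-1}}{1+xh}$, on an arbitrary grid and with arbitrary illustrative parameters (not part of the proof):

```python
import math

def f(x, h, lam):
    # f(x) = log((1 + x (1 + lam)^-1) / (1 + x h))
    return math.log((1 + x / (1 + lam)) / (1 + x * h))

xs = [i * 0.05 for i in range(400)]     # grid on [0, 20)
lam = 0.5                               # (1 + lam)^-1 = 2/3
h_small, h_big = 0.5, 0.8               # below / above (1 + lam)^-1

vals = [f(x, h_small, lam) for x in xs]
assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))   # non-decreasing
assert all(f(x, h_big, lam) <= 1e-12 for x in xs)            # nonpositive
```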

### **9. Concluding Remarks**

In this paper, we defined Gaussian wiretap channels in the presence of an eavesdropper aided by a jammer. The jamming signal is power-constrained and assumed to be oblivious of the legitimate users' communication but is not restricted to be Gaussian. We studied several models in this framework, namely point-to-point, multiple-access, broadcast, and symmetric interference settings. We derived inner and outer bounds for these settings, and identified conditions for these bounds to coincide. We stress that no shared randomness among the legitimate users is required in our coding schemes.

Our achievability scheme for the Gaussian MAC-WT-JA relies on novel time-sharing strategies and an extension of successive decoding for multiple-access channels to multiple-access wiretap channels via secret-key exchanges. It remains an open problem to provide a scheme that avoids time-sharing. Section 4.2 provides such a scheme for some rate pairs and channel parameters; however, it might not be possible to achieve the entire region of Theorem 2 by solely relying on point-to-point codes, in which case the design of multi-transmitter codes for arbitrarily varying multiple-access channels would be necessary.

Finally, beyond proving the existence of achievability schemes for our models, finding explicit coding schemes largely remains an open problem. We note that [34] investigates this problem for short communication blocklengths over point-to-point channels via a practical approach that relies on deep learning. Another open problem is to achieve the same regions as those derived in this paper under strong and semantic security guarantees.

**Author Contributions:** The ideas in this work were formed by the discussions between R.A.C. and A.Y. Both authors collaborated on the writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by NSF grants CIF-1319338, CNS-1314719, CCF-2105872, and CCF-2047913.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Supporting Results**

**Lemma A1** ([1])**.** *Let $\epsilon > 0$, $\eta \in\, ]8\sqrt{\epsilon}, 1[$, $K > 2\epsilon$, $R \in [2\epsilon, K]$, and $N \triangleq \lceil e^{nR} \rceil$. Let $X_1^n, \ldots, X_N^n$ be independent random variables uniformly distributed on the unit sphere. With probability arbitrarily close to one as $n \to \infty$, we have*


**Theorem A1** ([1,24])**.** *Consider a channel whose output is defined as $Y^n = X^n + V^n + s^n$, where $X^n$ is the input such that $\|X^n\|^2 \leq n$, $V^n$ represents noise (to be defined next), and $s^n$ is a state unknown to the encoder and decoder such that $\|s^n\|^2 \leq n\Lambda$, $\Lambda < 1$. Let $\sigma, \delta > 0$. Consider a codebook $\mathcal{C}_n$ made of $N \triangleq \lceil e^{n(\frac{1}{2}\log(1+(\Lambda+\sigma^2)^{-1})-\delta)} \rceil$ codewords $(x_1^n, \ldots, x_N^n)$ that satisfy the two conditions of Lemma A1, and define the average probability of error $e(\mathcal{C}_n)$ of a minimum distance decoder as $e(\mathcal{C}_n) \triangleq \frac{1}{N}\sum_{i=1}^N \mathbb{P}\big[\|x_i^n + s^n + V^n - x_j^n\|^2 \leq \|s^n + V^n\|^2 \text{ for some } j \neq i\big]$.*
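The random codebooks of Lemma A1 and Theorem A1 consist of points drawn uniformly on a sphere. A standard way to sample such points, sketched here with arbitrary sizes and assuming NumPy, is to normalize i.i.d. Gaussian vectors (by rotation invariance of the Gaussian distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = 32, 0.1                          # arbitrary blocklength and rate
N = int(np.ceil(np.exp(n * R)))         # N = ceil(e^{nR}) codewords

# Each row of X is uniform on the unit sphere in R^n.
X = rng.standard_normal((N, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)

assert np.allclose(np.linalg.norm(X, axis=1), 1.0)
```

Scaling each row by $\sqrt{nP_i}$ gives the codewords on the sphere of radius $\sqrt{nP_i}$ used in the codebook constructions of this paper.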


### **Appendix B. Proof of Theorem 4**

We first recall some definitions and results on polymatroids.

**Definition A1** ([35])**.** *Let $f : 2^{\mathcal{M}} \to \mathbb{R}$. The polyhedron $\mathcal{P}(f) \triangleq \big\{(R_i)_{i\in\mathcal{M}} \in \mathbb{R}^{\mathcal{M}} : R_{\mathcal{S}} \leq f(\mathcal{S}), \forall \mathcal{S} \subseteq \mathcal{M}\big\}$ associated with the function $f$ is an* extended polymatroid *if $f$ is submodular, i.e., $\forall \mathcal{S}, \mathcal{T} \subseteq \mathcal{M}$, $f(\mathcal{S}\cup\mathcal{T}) + f(\mathcal{S}\cap\mathcal{T}) \leq f(\mathcal{S}) + f(\mathcal{T})$.*
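A hedged numerical check of this submodularity condition for the function $g$ of Property A1 below, on an arbitrary three-transmitter Gaussian example (the closed-form Gaussian mutual information expressions are standard; parameter values are illustrative):

```python
import math
from itertools import chain, combinations

P = {1: 2.0, 2: 3.0, 3: 1.5}    # arbitrary powers
h = {1: 0.3, 2: 0.5, 3: 0.4}    # arbitrary eavesdropper gains
lam = 0.7
L = frozenset(P)

def g(T):
    # g(T) = I(X_T; Y | X_{T^c}) - I(X_T; Z) for the Gaussian channels
    # Y = sum X_l + N_Y (variance 1 + lam), Z = sum sqrt(h_l) X_l + N_Z.
    T = frozenset(T)
    i_y = 0.5 * math.log(1 + sum(P[l] for l in T) / (1 + lam))
    i_z = 0.5 * math.log((1 + sum(h[l] * P[l] for l in L))
                         / (1 + sum(h[l] * P[l] for l in L - T)))
    return i_y - i_z

subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(L, k) for k in range(len(L) + 1))]
for S in subsets:
    for T in subsets:
        assert g(S | T) + g(S & T) <= g(S) + g(T) + 1e-12   # submodularity
```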

**Property A1** ([29] [Property 1])**.** *Define $g : 2^{\mathcal{L}(\Lambda)} \to \mathbb{R}$, $\mathcal{T} \mapsto I(X_{\mathcal{T}};Y|X_{\mathcal{T}^c}) - I(X_{\mathcal{T}};Z)$, where $Y \triangleq \sum_{l\in\mathcal{L}(\Lambda)} X_l + N_Y$, $Z \triangleq \sum_{l\in\mathcal{L}(\Lambda)} \sqrt{h_l}X_l + N_Z$, with $(X_l)_{l\in\mathcal{L}(\Lambda)}$, $N_Y$, $N_Z$ independent zero-mean Gaussian random variables with variances $(P_l)_{l\in\mathcal{L}(\Lambda)}$, $(1+\Lambda)$, $1$, respectively.*

$$\mathcal{C}(\Lambda) \triangleq \left\{ (R_l)_{l \in \mathcal{L}(\Lambda)} \in \mathbb{R}^{|\mathcal{L}(\Lambda)|} : \forall \mathcal{T} \subseteq \mathcal{L}(\Lambda), R_{\mathcal{T}} \le g(\mathcal{T}) \right\} \tag{A1}$$

*associated with g is an extended polymatroid.*

**Property A2** ([35])**.** *Define the dominant face $D(\Lambda)$ of $\mathcal{C}(\Lambda)$ as*

$$D(\Lambda) \triangleq \left\{ (R_l)_{l \in \mathcal{L}(\Lambda)} \in \mathcal{C}(\Lambda) : R_{\mathcal{L}(\Lambda)} = g(\mathcal{L}(\Lambda)) \right\}. \tag{A2}$$

*For $\pi \in \mathrm{Sym}(|\mathcal{L}(\Lambda)|)$, where $\mathrm{Sym}(|\mathcal{L}(\Lambda)|)$ is the symmetric group on $\mathcal{L}(\Lambda)$, for $i,j \in \mathcal{L}(\Lambda)$, define $\pi_{i:j} \triangleq (\pi(k))_{k\in[\![i,j]\!]}$. $D(\Lambda)$ is the convex hull of the vertices $\mathcal{V} \triangleq \big\{(C_{\pi(i)})_{i\in[\![1,|\mathcal{L}(\Lambda)|]\!]} : \pi \in \mathrm{Sym}(|\mathcal{L}(\Lambda)|)\big\}$, where for $\pi \in \mathrm{Sym}(|\mathcal{L}(\Lambda)|)$, for $i \in [\![1,|\mathcal{L}(\Lambda)|]\!]$, $C_{\pi(i)} = g\big(\{\pi_{i:|\mathcal{L}(\Lambda)|}\}\big) - g\big(\{\pi_{i+1:|\mathcal{L}(\Lambda)|}\}\big)$.*
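A hedged numerical illustration of these vertices on an arbitrary three-transmitter Gaussian example: for every permutation $\pi$, the coordinates $C_{\pi(i)} = g(\{\pi_{i:|\mathcal{L}(\Lambda)|}\}) - g(\{\pi_{i+1:|\mathcal{L}(\Lambda)|}\})$ telescope, so each vertex has coordinate sum $g(\mathcal{L}(\Lambda))$ and lies on the dominant face:

```python
import math
from itertools import permutations

P = {1: 2.0, 2: 3.0, 3: 1.5}    # arbitrary powers
h = {1: 0.3, 2: 0.5, 3: 0.4}    # arbitrary eavesdropper gains
lam = 0.7
L = frozenset(P)

def g(T):
    # g(T) = I(X_T; Y | X_{T^c}) - I(X_T; Z), as in Property A1.
    T = frozenset(T)
    i_y = 0.5 * math.log(1 + sum(P[l] for l in T) / (1 + lam))
    i_z = 0.5 * math.log((1 + sum(h[l] * P[l] for l in L))
                         / (1 + sum(h[l] * P[l] for l in L - T)))
    return i_y - i_z

for pi in permutations(sorted(L)):
    # Vertex (C_pi(i))_i; g of the empty suffix is 0, so the sum telescopes.
    vertex = {pi[i]: g(pi[i:]) - g(pi[i + 1:]) for i in range(len(pi))}
    assert abs(sum(vertex.values()) - g(L)) < 1e-12
```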

Define $D_+(\Lambda) \triangleq D(\Lambda) \cap \mathbb{R}_+^{|\mathcal{L}(\Lambda)|}$. By Property A2, for any $R \in D_+(\Lambda)$, for any $V = (V_l)_{l\in\mathcal{L}(\Lambda)} \in \mathcal{V}$, there exists $\alpha_V \in [0,1]$ such that $\sum_{V\in\mathcal{V}} \alpha_V = 1$ and $R = \sum_{V\in\mathcal{V}} \alpha_V V$. As remarked in [29], $g$ is, in general, not non-decreasing; hence, some $V \in \mathcal{V}$ might have negative components, and the successive decoding method [5] [Appendix C] cannot be applied to the multiple-access wiretap channel. We show in the following how to overcome this issue. For $l \in \mathcal{L}(\Lambda)$, define $R_l^* \triangleq -\sum_{V\in\mathcal{V}} \alpha_V \mathbb{1}\{V_l < 0\} V_l$, and $R^* \triangleq (R_l^*)_{l\in\mathcal{L}(\Lambda)}$. Our coding scheme operates in three steps, the idea of which is described below.

**Step 1.** For $l \in \mathcal{L}(\Lambda)$, a secret message of length $nR_l^*$ is exchanged between Transmitter $l$ and the receiver.

**Step 2.** For all $V \in \mathcal{V}$, secret messages of length $n(\alpha_V \mathbb{1}\{V_l > 0\} V_l)_{l\in\mathcal{L}(\Lambda)}$ are exchanged between the transmitters and the receiver, provided that secret sequences of length $nR^*$ are shared between the transmitters and the receiver, which is ensured by Step 1. The overall length of secret communication is $n\big(\sum_{V\in\mathcal{V}} \alpha_V \mathbb{1}\{V_l > 0\} V_l\big)_{l\in\mathcal{L}(\Lambda)}$, i.e., $n(R + R^*)$.

**Step 3.** Repeat Step 2 $t$ times. It is possible to do so because secret sequences of length at least $nR^*$ were exchanged between the transmitters and the receiver in Step 2. The overall rate of secret sequences exchanged between the transmitters and the receiver is thus $R$, provided that $t$ is large enough, since Step 1 only requires the transmission of a finite number of blocks.

The coding schemes and their analyses to realize Steps 1 and 2 are described in Appendices B.1 and B.2, respectively. In the remainder of this section, $Y$ and $Z$ are defined as in Property A1 with $(X_l)_{l\in\mathcal{L}(\Lambda)}$ zero-mean Gaussian random variables with variances $(P_l)_{l\in\mathcal{L}(\Lambda)}$.

### *Appendix B.1. Proof of Step 1*

The proof of Step 1 directly follows from the point-to-point setting, i.e., Theorem 1, applied to each $l \in \mathcal{L}(\Lambda)$, since we assumed $h_l < h_\Lambda$.

### *Appendix B.2. Proof of Step 2*

We fix $V \in \mathcal{V}$. The following procedure must be reiterated for each $V \in \mathcal{V}$ by applying a permutation $\pi \in \mathrm{Sym}(|\mathcal{L}(\Lambda)|)$ to the labeling of the transmitters. For convenience, we relabel the transmitters from 1 to $|\mathcal{L}(\Lambda)|$ and redefine $\mathcal{L}(\Lambda)$ as $[\![1, |\mathcal{L}(\Lambda)|]\!]$. We show how to exchange secret messages with rates $(\mathbb{1}\{V_l > 0\}V_l)_{l\in\mathcal{L}(\Lambda)}$ between the transmitters and the receiver when they have access to pre-shared secrets (obtained from Step 1) with rates $(-\mathbb{1}\{V_l < 0\}V_l)_{l\in\mathcal{L}(\Lambda)}$. Define $\mathcal{I} \triangleq \{l \in \mathcal{L}(\Lambda) : V_l \leq 0\}$ and $\mathcal{I}^c \triangleq \mathcal{L}(\Lambda)\backslash\mathcal{I}$. We also use the notation $X_{\mathcal{L}(\Lambda)} \triangleq (X_l)_{l\in\mathcal{L}(\Lambda)}$, $X^n_{\mathcal{L}(\Lambda)} \triangleq (X^n_l)_{l\in\mathcal{L}(\Lambda)}$, and, for $i,j \in \mathcal{L}(\Lambda)$, $X_{i:j} \triangleq (X_l)_{l\in[\![i,j]\!]}$.

**Codebook construction**: For Transmitter $i \in \mathcal{I}^c$, construct a codebook $C_n^{(i)}$ with $2^{nR_i} \cdot 2^{n\tilde{R}_i}$ codewords drawn independently and uniformly on the sphere of radius $\sqrt{nP_i}$ in $\mathbb{R}^n$. The codewords are labeled $x_i^n(m_i, \tilde{m}_i)$, where $m_i \in [1, 2^{nR_i}]$ and $\tilde{m}_i \in [1, 2^{n\tilde{R}_i}]$. We choose the rates as $R_i \triangleq I(X_i; Y | X_{1:i-1}) - I(X_i; Z | X_{i+1:|\mathcal{L}(\Lambda)|}) - \delta$ and $\tilde{R}_i \triangleq I(X_i; Z | X_{i+1:|\mathcal{L}(\Lambda)|}) - \delta$. For Transmitter $i \in \mathcal{I}$, construct a codebook $C_n^{(i)}$ with $2^{n\breve{R}_i} \cdot 2^{n\mathring{R}_i}$ codewords drawn independently and uniformly on the sphere of radius $\sqrt{nP_i}$ in $\mathbb{R}^n$. The codewords are labeled $x_i^n(\breve{m}_i, \mathring{m}_i)$, where $\breve{m}_i \in [1, 2^{n\breve{R}_i}]$ and $\mathring{m}_i \in [1, 2^{n\mathring{R}_i}]$. We define the rates $\breve{R}_i \triangleq I(X_i; Y | X_{1:i-1}) - \delta$, $\mathring{R}_i \triangleq I(X_i; Z | X_{i+1:|\mathcal{L}(\Lambda)|}) - I(X_i; Y | X_{1:i-1}) - \delta$, and $\tilde{R}_i \triangleq \breve{R}_i + \mathring{R}_i = I(X_i; Z | X_{i+1:|\mathcal{L}(\Lambda)|}) - 2\delta$. Define $C_n \triangleq (C_n^{(i)})_{i \in \mathcal{L}(\Lambda)}$.
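The rate choices above are bookkept by the chain rule: the successive-decoding terms $I(X_i; Y|X_{1:i-1})$ and $I(X_i; Z|X_{i+1:|\mathcal{L}(\Lambda)|})$ telescope to the full mutual informations of the respective Gaussian channels. A minimal numerical sketch of this bookkeeping in Python, with hypothetical powers, eavesdropper gains, and unit-variance noise (none of these values come from the text):

```python
import math

def mi(snr):
    # Gaussian mutual information in nats: 0.5 * ln(1 + SNR)
    return 0.5 * math.log(1.0 + snr)

P = [1.0, 2.0, 0.5]   # hypothetical transmit powers P_l
h = [0.3, 0.6, 0.2]   # hypothetical eavesdropper gains h_l
L = len(P)

# Receiver side: decode i = 1..L, cancel 1..i-1, treat i+1..L as noise.
I_Y = [mi(P[i] / (1.0 + sum(P[i + 1:]))) for i in range(L)]
# Eavesdropper-side terms I(X_i; Z | X_{i+1:L}): cancel i+1..L, treat 1..i-1 as noise.
I_Z = [mi(h[i] * P[i] / (1.0 + sum(h[j] * P[j] for j in range(i)))) for i in range(L)]

# Chain rule: the per-transmitter terms telescope to the full mutual information.
d_Y = abs(sum(I_Y) - mi(sum(P)))
d_Z = abs(sum(I_Z) - mi(sum(h[i] * P[i] for i in range(L))))
print(d_Y, d_Z)
```

Both printed differences are zero up to floating-point error, confirming that the per-transmitter rates add up to the sum mutual information.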

**Encoding at the transmitters**: For Transmitter $i \in \mathcal{I}^c$, given $(m_i, \tilde{m}_i)$, transmit $x_i^n(m_i, \tilde{m}_i)$. For Transmitter $i \in \mathcal{I}$, given $(\breve{m}_i, \mathring{m}_i)$, transmit $x_i^n(\breve{m}_i, \mathring{m}_i)$, where $\mathring{m}_i$ is assumed to be known at the receiver from the transmissions in Step 1. In the following, we define, for $i \in \mathcal{I}$, $\tilde{m}_i \triangleq (\breve{m}_i, \mathring{m}_i)$ and, by convention, $m_i \triangleq \emptyset$. Also define $m \triangleq (m_i)_{i \in \mathcal{L}(\Lambda)}$ and $\tilde{m} \triangleq (\tilde{m}_i)_{i \in \mathcal{L}(\Lambda)}$. In the following, we refer to $\tilde{m}$ as the randomization sequence.

**Decoding**: The receiver performs minimum distance decoding, i.e., given $y^n$, determine, from $i = 1$ to $i = |\mathcal{L}(\Lambda)|$, $(\hat{m}_i, \hat{\tilde{m}}_i) \triangleq \phi_i\big(y^n, \sum_{j=1}^{i-1} x_j^n(\hat{m}_j, \hat{\tilde{m}}_j)\big)$, where

$$\phi_i \colon (y^n, \mathbf{x}) \mapsto \begin{cases} (m_i, \tilde{m}_i) & \text{if } \|y^n - \mathbf{x} - x_i^n(m_i, \tilde{m}_i)\|^2 < \|y^n - \mathbf{x} - x_i^n(m_i', \tilde{m}_i')\|^2 \\ & \quad \text{for all } (m_i', \tilde{m}_i') \neq (m_i, \tilde{m}_i) \\ 0 & \text{if no such } (m_i, \tilde{m}_i) \in [1, 2^{nR_i}] \times [1, 2^{n\tilde{R}_i}] \text{ exists} \end{cases} \tag{A3}$$

Define $\hat{m} \triangleq (\hat{m}_i)_{i \in \mathcal{L}(\Lambda)}$ and $\hat{\tilde{m}} \triangleq (\hat{\tilde{m}}_i)_{i \in \mathcal{L}(\Lambda)}$. Let $e(C_n, s^n) \triangleq \mathbb{P}\big[\hat{M} \neq M \,\big|\, C_n\big]$. We now prove that, on average over $C_n$, we have $\mathbb{E}_{C_n}[\sup_{s^n} e(C_n, s^n)] + \frac{1}{n} I(M; Z^n | C_n) \xrightarrow{n \to \infty} 0$. We will thus conclude that there exists a sequence of realizations $(\mathcal{C}_n)$ of $(C_n)$ such that both $\sup_{s^n} e(\mathcal{C}_n, s^n)$ and $\frac{1}{n} I(M; Z^n | \mathcal{C}_n)$ can be made arbitrarily close to zero as $n \to \infty$.

**Average probability of error**: We have

$$e(C_n, s^n) \le \mathbb{P}\left[\hat{M} \neq M \text{ or } \hat{\tilde{M}} \neq \tilde{M} \,\middle|\, C_n\right] \tag{A4a}$$

$$\le \sum_{i \in \mathcal{L}(\Lambda)} e_i\left(C_n, s^n, \sum_{j=i+1}^{|\mathcal{L}(\Lambda)|} x_j^n(M_j, \tilde{M}_j)\right), \tag{A4b}$$

where for *i* ∈ L(Λ)

$$e_i(C_n, s^n, \mathbf{x}) \triangleq \frac{1}{\lceil 2^{nR_i} \rceil \lceil 2^{n\tilde{R}_i} \rceil} \sum_{m_i} \sum_{\tilde{m}_i} \mathbb{P}\Big[ \big\|x_i^n(m_i, \tilde{m}_i) + s^n + \mathbf{x} + N_Y^n - x_i^n(m_i', \tilde{m}_i')\big\|^2 \le \big\|s^n + \mathbf{x} + N_Y^n\big\|^2 \text{ for some } (m_i', \tilde{m}_i') \neq (m_i, \tilde{m}_i) \Big]. \tag{A5}$$

Assume that the receiver has reconstructed $(m_j, \tilde{m}_j)_{j \in \{1, \dots, i\}}$ for some $i \in \mathcal{L}(\Lambda)$, and assume first that $i + 1 \in \mathcal{I}^c$. Using minimum distance decoding, on average over the codebooks, we show that the receiver can reconstruct $x_{i+1}^n$. We have

$$\mathbb{E}_{C_n}\left[ e_i\left(C_n, s^n, \sum_{j=i+1}^{|\mathcal{L}(\Lambda)|} x_j^n(M_j, \tilde{M}_j)\right)\right] \le \mathbb{E}_{C_n}\left[ e_i\left(C_n, s^n, \sum_{j=i+1}^{|\mathcal{L}(\Lambda)|} x_j^n(M_j, \tilde{M}_j)\right) \,\middle|\, C_n^{(i)} \in \mathcal{C}_i^* \right] + \mathbb{P}\left[C_n^{(i)} \notin \mathcal{C}_i^*\right] \tag{A6a}$$

$$\xrightarrow{n \to \infty} 0, \tag{A6b}$$

where in Equation (A6a) $\mathcal{C}_i^*$ represents all the sets of unit-norm vectors scaled by $\sqrt{nP_i}$ that satisfy the two conditions of Lemma A1 (in Appendix A). Equation (A6b) holds because $\mathbb{P}[C_n^{(i)} \in \mathcal{C}_i^*] \xrightarrow{n \to \infty} 1$ by Lemma A1 and because $\mathbb{E}_{C_n}\big[e_i\big(C_n, s^n, \sum_{j=i+1}^{|\mathcal{L}(\Lambda)|} x_j^n(M_j, \tilde{M}_j)\big) \,\big|\, C_n^{(i)} \in \mathcal{C}_i^*\big] \xrightarrow{n \to \infty} 0$ by Theorem A1 (in Appendix A), using the definition of $R_i + \tilde{R}_i$ and interpreting the signals of the transmitters in $\{i+1, \dots, |\mathcal{L}(\Lambda)|\}$ as noise.

Similarly, when $i + 1 \in \mathcal{I}$, using minimum distance decoding, on average over the codebooks, the receiver can reconstruct $x_{i+1}^n(\breve{m}_{i+1}, \mathring{m}_{i+1})$ with a vanishing average probability of error, because $\mathring{m}_{i+1}$ is known at the receiver and by the definition of $\breve{R}_{i+1}$; hence,

$$\mathbb{E}_{C_n}[e(C_n, s^n)] \xrightarrow{n \to \infty} 0. \tag{A7}$$

**Equivocation**: We first study the average error probability of decoding $\tilde{m}$ given $(z^n, m)$ with the following procedure. From $i = |\mathcal{L}(\Lambda)|$ down to $i = 1$, given $(z^n, m)$, determine $\dot{\tilde{m}}_i \triangleq \psi_i\big(z^n, \sum_{j=i+1}^{|\mathcal{L}(\Lambda)|} \sqrt{h_j}\, x_j^n(m_j, \dot{\tilde{m}}_j)\big)$, where for $i \in \mathcal{L}(\Lambda)$

$$\psi_i \colon (z^n, \mathbf{x}) \mapsto \begin{cases} \tilde{m}_i & \text{if } \|z^n - \mathbf{x} - \sqrt{h_i}\, x_i^n(m_i, \tilde{m}_i)\|^2 < \|z^n - \mathbf{x} - \sqrt{h_i}\, x_i^n(m_i, \tilde{m}_i')\|^2 \\ & \quad \text{for all } \tilde{m}_i' \neq \tilde{m}_i \\ 0 & \text{if no such } \tilde{m}_i \in [1, 2^{n\tilde{R}_i}] \text{ exists} \end{cases} \tag{A8}$$

We define $\tilde{e}(C_n) \triangleq \mathbb{P}\big[\dot{\tilde{M}} \neq \tilde{M} \,\big|\, C_n\big]$. We have

$$\tilde{e}(C_n) \le \sum_{i \in \mathcal{L}(\Lambda)} \tilde{e}_i\left(C_n, \sum_{j=1}^{i-1} \sqrt{h_j}\, x_j^n(M_j, \tilde{M}_j)\right), \tag{A9}$$

where for *i* ∈ L(Λ)

$$\tilde{e}_i(C_n, \mathbf{x}) \triangleq \frac{1}{\lceil 2^{n\tilde{R}_i} \rceil} \sum_{\tilde{m}_i} \mathbb{P}\Big[ \big\|\sqrt{h_i}\, x_i^n(m_i, \tilde{m}_i) + \mathbf{x} + N_Z^n - \sqrt{h_i}\, x_i^n(m_i, \tilde{m}_i')\big\|^2 \le \big\|\mathbf{x} + N_Z^n\big\|^2 \text{ for some } \tilde{m}_i' \neq \tilde{m}_i \Big]. \tag{A10}$$

Similar to the justifications for obtaining Equation (A6b), $\mathbb{E}_{C_n}\big[\tilde{e}_i\big(C_n, \sum_{j=1}^{i-1} \sqrt{h_j}\, x_j^n(M_j, \tilde{M}_j)\big)\big]$ vanishes as $n \to \infty$ by interpreting the signals of the transmitters in $\{1, \dots, i-1\}$ as noise and by using the definition of $\tilde{R}_i$. We thus obtain

$$\mathbb{E}_{C_n}[\tilde{e}(C_n)] \xrightarrow{n \to \infty} 0. \tag{A11}$$

Let the superscript $T$ denote the transpose operation and define $\mathbf{X} \triangleq \big[\sqrt{h_1}(X_1^n)^T\ \sqrt{h_2}(X_2^n)^T\ \cdots\ \sqrt{h_{|\mathcal{L}(\Lambda)|}}(X_{|\mathcal{L}(\Lambda)|}^n)^T\big]^T \in \mathbb{R}^{n|\mathcal{L}(\Lambda)| \times 1}$, such that

$$Z^n = \mathbf{G}\mathbf{X} + N_Z^n, \tag{A12}$$

with $\mathbf{G} \triangleq [I_n, I_n, \dots, I_n] \in \mathbb{R}^{n \times n|\mathcal{L}(\Lambda)|}$ and $I_n$ the identity matrix of dimension $n$. Let $K_{\mathbf{X}}$ denote the covariance matrix of $\mathbf{X}$. Similar to Equation (50), we have

$$K_{\mathbf{X}} = \mathrm{diag}\big(h_1 P_1 I_n, \dots, h_{|\mathcal{L}(\Lambda)|} P_{|\mathcal{L}(\Lambda)|} I_n\big). \tag{A13}$$

Then, we have

$$I(M; Z^n | C_n) \le I(\mathbf{X}; Z^n) - H(\tilde{M} | C_n) + H(\tilde{M} | Z^n M C_n) \tag{A14a}$$

$$\le \frac{1}{2} \log |\mathbf{G} K_{\mathbf{X}} \mathbf{G}^T + I_n| - H(\tilde{M} | C_n) + H(\tilde{M} | Z^n M C_n) \tag{A14b}$$

$$= \frac{n}{2} \log\left(1 + \sum_{l \in \mathcal{L}(\Lambda)} h_l P_l\right) - H(\tilde{M} | C_n) + H(\tilde{M} | Z^n M C_n) \tag{A14c}$$

$$\le n I(X_{\mathcal{L}(\Lambda)}; Z) - n\left(I(X_{\mathcal{L}(\Lambda)}; Z) - 2|\mathcal{L}(\Lambda)|\delta\right) + O(n\, \mathbb{E}_{C_n}[\tilde{e}(C_n)]) \tag{A14d}$$

$$= 2n|\mathcal{L}(\Lambda)|\delta + o(n), \tag{A14e}$$

where Equation (A14a) holds similar to Equation (51d); Equation (A14b) holds similar to Equation (51f); Equation (A14c) holds by Equation (A13); in Equation (A14d), we used the definition of $\sum_{i \in \mathcal{L}(\Lambda)} \tilde{R}_i$ and the uniformity of $\tilde{M}$ to obtain the second term, and Fano's inequality to obtain the third term; and Equation (A14e) holds by Equation (A11).
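The step from (A14b) to (A14c) rests on the identity $\mathbf{G} K_{\mathbf{X}} \mathbf{G}^T = \big(\sum_l h_l P_l\big) I_n$, which makes the log-determinant collapse to $n \log(1 + \sum_l h_l P_l)$. A small self-contained check in Python, with hypothetical gains, powers, and a small block length (none of these values are from the text):

```python
import math

n = 4                  # block length (small, for the check)
h = [0.3, 0.6, 0.2]    # hypothetical eavesdropper gains
P = [1.0, 2.0, 0.5]    # hypothetical powers
Lc = len(h)

# G = [I_n, I_n, ..., I_n] (n x n*Lc) and K_X = diag(h_l * P_l * I_n).
G = [[1.0 if c % n == r else 0.0 for c in range(n * Lc)] for r in range(n)]
K = [[h[c // n] * P[c // n] if r == c else 0.0 for c in range(n * Lc)]
     for r in range(n * Lc)]

def matmul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

Gt = [[G[r][c] for r in range(n)] for c in range(n * Lc)]    # G transpose
M = matmul(matmul(G, K), Gt)                                  # G K_X G^T

s = sum(h[l] * P[l] for l in range(Lc))
off = max(abs(M[r][c]) for r in range(n) for c in range(n) if r != c)
logdet = sum(math.log(M[i][i] + 1.0) for i in range(n))       # log|G K_X G^T + I_n|
diff = abs(logdet - n * math.log(1.0 + s))
print(off, diff)
```

The off-diagonal entries of $\mathbf{G} K_{\mathbf{X}} \mathbf{G}^T$ are exactly zero and the log-determinant matches $n \log(1 + \sum_l h_l P_l)$.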

The proof of joint secrecy for Step 1 and the repetitions of Step 2 is similar to the proof of Theorem 2.

### **Appendix C. Proof of Theorem 5**

The proof that Equation (20) is an upper bound on the secrecy sum-rate is similar to the case *L* = 2 in Theorem 3.

Remark that from the statement of Corollary 1, it is unclear whether the sum-rate of Theorem 5 is achievable. However, by inspecting the proof of Theorem 4, observe that we achieve a point in $\mathcal{D}^+(\Lambda) \triangleq \mathcal{D}(\Lambda) \cap \mathbb{R}_+^{|\mathcal{L}(\Lambda)|}$, where $\mathcal{D}(\Lambda)$ is defined in Equation (A2). Hence, the sum-rate of Theorem 5 is indeed achievable.

### *Article* **Orthogonal Time Frequency Space Modulation Based on the Discrete Zak Transform**

**Franz Lampel \*, Hamdi Joudeh, Alex Alvarado and Frans M. J. Willems**

Information and Communication Theory Lab, Signal Processing Systems Group, Department of Electrical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands

**\*** Correspondence: f.lampel@tue.nl

**Abstract:** In orthogonal time frequency space (OTFS) modulation, information-carrying symbols reside in the delay-Doppler (DD) domain. By operating in the DD domain, an appealing property for communication arises: time-frequency (TF) dispersive channels encountered in high-mobility environments become time-invariant. OTFS outperforms orthogonal frequency division multiplexing (OFDM) in high-mobility scenarios, making it an ideal waveform candidate for 6G. Generally, OTFS is considered a pre- and postprocessing step for OFDM. However, the so-called Zak transform provides the fundamental relation between the DD and time domains. In this work, we propose an OTFS system based on the discrete Zak transform (DZT). To this end, we discuss the DZT and establish the input–output relation for time-frequency (TF) dispersive channels solely through the properties of the DZT. The presented formulation simplifies the derivation and analysis of the input–output relation of the TF dispersive channel in the DD domain. Based on the presented formulation, we show that operating in the DD domain incurs no loss in capacity.

**Keywords:** orthogonal time frequency space modulation; discrete Zak transform; delay-Doppler channel; time-frequency dispersive channel; 6G

### **1. Introduction**

Motivated by challenges encountered in wireless communication over time-variant channels, such as Doppler dispersion or equalization, a new modulation technique termed orthogonal time frequency space (OTFS) was introduced in [1]. The driving idea behind OTFS is to utilize the delay-Doppler (DD) domain to represent information-carrying symbols. The interaction of the corresponding OTFS waveform with a time-frequency (TF) dispersive channel results in a two-dimensional convolution of the symbols in the DD domain ([2], [Section III-A]). OTFS utilizes the time-invariant channel interaction in the DD domain and outperforms orthogonal frequency division multiplexing (OFDM) in high-mobility scenarios, as shown in [1–6], making it an ideal waveform candidate for 6G.

Most of the literature on OTFS considers OTFS as a pre- and postprocessing technique for OFDM systems, as described in [3,5,7]. However, the *continuous* Zak transform provides a more fundamental relationship between the DD and time domains, as pointed out in [2] and studied in [8]. In principle, OTFS describes a time domain signal by its DD representation in a similar way to OFDM, which defines a signal in the TF domain. The difference between the DD and TF domains is that the TF domain allows a continuous-time signal to be described by a discrete set of coefficients in the TF domain [9]. On the other hand, the *continuous* Zak transform maps a continuous-time signal to continuous values in the Zak domain. In [8], a discretization of the Zak representation was achieved using time and bandwidth limitations on the signal, represented by a point in the DD domain. However, depending on the domain of the signal under study, different variants of the Zak transform exist. The discrete-time version is referred to as the discrete-time Zak transform (DTZT) and the discrete (and finite) version is the discrete Zak transform (DZT) [10]. The DTZT is discrete in the delay and continuous in the frequency domain,

**Citation:** Lampel, F.; Joudeh, H.; Alvarado, A.; Willems, F.M.J. Orthogonal Time Frequency Space Modulation Based on the Discrete Zak Transform. *Entropy* **2022**, *24*, 1704. https://doi.org/10.3390/ e24121704

Academic Editors: H. Vincent Poor, Holger Boche, Rafael F. Schaefer and Onur Günlü

Received: 16 October 2022 Accepted: 17 November 2022 Published: 22 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

while the DZT is discrete in both the delay and Doppler domains. Thus, an alternative description of OTFS can be provided by the DZT, as we show in this work.

Another motivation for using the DZT can be found by considering OFDM. The fundamental concept of OFDM, that is, mapping symbols onto a set of orthogonal signals in the frequency domain, dates back to 1966 [11]. The success of OFDM is based on its efficient *digital* implementation, which computes the discrete Fourier transform (DFT) [12]. Equivalently, OTFS can be efficiently implemented using the *discrete* Zak transform (DZT). The DZT itself is based on the DFT, which allows for an efficient implementation as well. Implementations of OTFS that resemble the DZT have been studied previously, in [13], for example. However, the proposed system is based on OFDM and adds a cyclic prefix (CP) to every OFDM symbol. The CP adds additional signaling overhead and results in a different channel interaction in the DD domain.

DZT-based OTFS is closely related to radar processing in a pulse Doppler radar. A pulse radar transmits a pulse train with uniformly spaced and identical pulses. Target motion introduces a phase shift for each pulse, which is utilized at the receiver to extract the velocity information of a radar target. To this end, the sampled signal is arranged in a two-dimensional grid, and a DFT is applied along the so-called slow time to extract the velocity information of a target; see ([14], [Chapter 17]) or ([15], [Chapter 3]) for details. This variant of Doppler processing is equivalent to the DZT. Similarly, the radar transmitter of such a pulse Doppler radar can be described by the inverse DZT, as demonstrated in [16]. The close connection to radar makes OTFS an ideal waveform for joint communication and sensing, which has been explored by [6], among others.

A fundamental treatment of OTFS based on the DZT is currently absent from the literature. The aim of this work is to close this gap by providing a complete treatment of OTFS based solely on the DZT. Therefore, we discuss the DZT and its properties, then we derive the input–output relationship for TF dispersive channels in the DD domain using the DZT and its properties. Our DZT-based approach provides an intuitive understanding of OTFS and drastically simplifies its analysis. Based on our analysis, we further show that the capacity in the DD domain is equivalent to the capacity of the time-variant channel in the time domain. (Parts of this work were presented at the 2022 IEEE International Conference on Communications Workshops (ICC Workshops) [17].)

The remainder of the paper is organized as follows. In Section 2, we provide an introduction to the DZT covering all properties needed for OTFS. The signal model based on the DZT is described in Section 3. Based on the presented signal model, we establish the input–output relationship of OTFS in Section 4. In Section 5, we establish the connection between the DD and the TF domains, which allows the implementation of OTFS by an OFDM system. In Section 6, we demonstrate that operating in the DD domain incurs no loss in capacity. Finally, our conclusions are presented in Section 7.

### **2. Discrete Zak Transform**

The *continuous* Zak transform is a mapping of a continuous-time signal onto a two-dimensional function. Implicit usage of the Zak transform can be traced back to Gauss [18]; however, it was Zak who formally introduced the transform in [19], and after whom it was named. An excellent paper from a signal-theoretical point of view was provided by Janssen [20]. Later on, Bölcskei and Hlawatsch [10] provided an overview of the discrete versions of the transform, namely, the discrete-time Zak transform and the *discrete* Zak transform. This section is devoted to the DZT and its properties, which we use to describe OTFS and to establish the input–output relation of the TF dispersive channel discussed in Section 3.

### *2.1. Definition and Relations*

In the following discussion, we treat finite-length sequences of length $N$ as one period of a periodic sequence with period $N$, which we express as a product $N = KL$ with $K, L \in \mathbb{N}$. Following the notation in [10], we use $Z_x^{(L,K)} \in \mathbb{C}^{\mathbb{Z} \times \mathbb{Z}}$ to denote the DZT of a sequence $x \in \mathbb{C}^{\mathbb{Z}}$ with period $KL$. The DZT of $x$ is defined as follows ([10], Equation (30)):

$$Z_x^{(L,K)}[n,k] \triangleq \frac{1}{\sqrt{K}} \sum_{l=0}^{K-1} \underbrace{x[n+lL]}_{x^{(n,L)}[l]}\, e^{-j2\pi \frac{k}{K}l}, \quad n,k \in \mathbb{Z}. \tag{1}$$

It follows from (1) that the DZT for a given $n$ is the unitary discrete Fourier transform (DFT) of a subsampled sequence $x^{(n,L)} \triangleq \{x^{(n,L)}[l] = x[n + lL] : l \in \mathbb{Z}\}$. The variable $n$ determines the starting phase of the downsampled sequence, whereas the variable $k$ is the discrete frequency of its DFT. Thus, the variables $n$ and $k$ represent time and frequency, respectively.

The *periodic* sequence *x* can be recovered from its DZT through the following sum relation:

$$x[n] = \frac{1}{\sqrt{K}} \sum_{k=0}^{K-1} Z_x^{(L,K)}[n,k], \tag{2}$$

which follows from the definition of the DZT in (1) and the relation

$$\sum_{k=0}^{K-1} e^{-j2\pi \frac{k}{K}l} = K \sum_{m=-\infty}^{\infty} \delta[l - mK], \tag{3}$$

where *δ*[*n*] denotes the Kronecker delta. We refer to (2) as the inverse discrete Zak transform (IDZT).
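Definition (1) and the inverse relation translate directly into code. The sketch below (plain Python, no dependencies, arbitrary test sequence) computes the DZT on the fundamental rectangle and inverts it sample by sample, confirming perfect reconstruction:

```python
import cmath

def dzt(x, L, K):
    # Z[n][k] on the fundamental rectangle, Eq. (1); x has period N = K*L.
    N = K * L
    return [[sum(x[(n + l * L) % N] * cmath.exp(-2j * cmath.pi * k * l / K)
                 for l in range(K)) / K ** 0.5
             for k in range(K)] for n in range(L)]

def idzt(Z, L, K):
    # x[n + l*L] = (1/sqrt(K)) * sum_k Z[n][k] * exp(j*2*pi*k*l/K), the IDZT
    # written out for every sample of one period.
    x = [0j] * (K * L)
    for n in range(L):
        for l in range(K):
            x[n + l * L] = sum(Z[n][k] * cmath.exp(2j * cmath.pi * k * l / K)
                               for k in range(K)) / K ** 0.5
    return x

L_, K_ = 4, 6
x = [complex(i % 5, (3 * i) % 7) for i in range(K_ * L_)]   # arbitrary test sequence
err = max(abs(a - b) for a, b in zip(x, idzt(dzt(x, L_, K_), L_, K_)))
print(err)
```

The reconstruction error is zero up to floating-point precision.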

**Remark 1.** *Depending on the period N of the sequence under consideration, different choices of K and L are possible. We indicate the particular choice of L and K in the superscript of the DZT notation we use ($Z_x^{(L,K)}$). If the choice is not important for the context, we drop the superscript for brevity of notation ($Z_x$). Furthermore, the DZT is in general a complex-valued function. To illustrate the DZT, we often write the DZT in polar form, i.e.,*

$$Z_x[n,k] = |Z_x[n,k]|\, e^{j\phi_x[n,k]}, \tag{4}$$

*where $|Z_x[n,k]|$ and $\phi_x[n,k]$ represent the magnitude and the phase of $Z_x^{(L,K)}[n,k]$, respectively. We restrict the phase to the principal values, i.e., to the interval $[-\pi, \pi)$.*

**Example 1** (DZT)**.** *Consider the N-periodic sequence g with elements*

$$g[n] = \begin{cases} f[n], & 0 \le n \le L - 1, \\ 0, & L \le n \le KL - 1. \end{cases} \tag{5}$$

*The sequence is zero, except possibly for the first L samples, where it takes the value of an arbitrary sequence f . The second condition in* (5) *implies that only one nonzero addend (for l* = 0*) exists in the summation* (1)*. Thus, the elements of Zg for* 0 ≤ *n* ≤ *L* − 1 *and* 0 ≤ *k* ≤ *K* − 1 *are*

$$Z_g[n,k] = \frac{1}{\sqrt{K}} f[n]. \tag{6}$$

*An example of a sequence f and the corresponding magnitude of the DZT $Z_g$ are illustrated in Figure 1a,b, respectively.*

**Figure 1.** (**a**) Sequence $f[n] = e^{-\frac{1}{2}\left(\frac{n - L/2}{\sigma L/2}\right)^2}$ for $\sigma = 1/4$, $0 \le n \le L - 1$, and $L = 30$. The sequence $g$ has a period $KL = 900$. (**b**) Magnitude of the discrete Zak transform (DZT) $Z_g$ with parameters $K = 30$, $L = 30$ in (6), for $0 \le n \le L - 1$ and $0 \le k \le K - 1$. The phase $\phi_g[n,k]$ (not plotted) is zero for the presented values of $n$ and $k$; see (6).

We express the period of the sequence *x* as a product *KL* with *K*, *L* ∈ N. This factorization ensures that the sequence can be decomposed into *L* subsampled sequences with period *K*. In general, the product *KL* is not uniquely defined, as different choices of *K* and *L* result in the same product. Independent of the period, two choices are always possible and provide interesting insights. First, the choice *K* = 1 in (1) leads to

$$Z\_{\mathbf{x}}^{(L,1)}[n,k] = \mathbf{x}[n],\tag{7}$$

i.e., the elements of DZT for a specific *n* and any *k* are the elements of the sequence *x*. Second, the case *L* = 1 results in

$$Z_x^{(1,K)}[n,k] = \frac{1}{\sqrt{K}} \sum_{l=0}^{K-1} x[n+l]\, e^{-j2\pi \frac{k}{K}l}. \tag{8}$$

For *n* = 0, we obtain

$$Z_x^{(1,K)}[0,k] = X[k], \tag{9}$$

where $X \in \mathbb{C}^{\mathbb{Z}}$ is the unitary DFT of the sequence $x$, i.e.,

$$X[k] \stackrel{\Delta}{=} \frac{1}{\sqrt{K}} \sum\_{l=0}^{K-1} x[l] e^{-j2\pi \frac{k}{K}l}.\tag{10}$$

It follows from (8) that $Z_x^{(1,K)}[n,k]$ represents the DFT of the circularly shifted sequence $x$ with shift parameter $n$. Using the circular shift property of the DFT provided in ([21], Equation (3.168)),

$$x[n - n_0] \Leftrightarrow e^{-j2\pi \frac{k}{K} n_0} X[k], \tag{11}$$

we can express (8) equivalently as

$$Z\_{\mathbf{x}}^{(1,K)}[n,k] = e^{j2\pi \frac{k}{K}n} X[k] = e^{j2\pi \frac{k}{K}n} Z\_{\mathbf{x}}^{(1,K)}[0,k]. \tag{12}$$

Following the same approach used to obtain the DFT (9), we can obtain the inverse DFT (IDFT). Therefore, we consider (2) for the case *L* = 1, which is

$$x[n] = \frac{1}{\sqrt{K}} \sum_{k=0}^{K-1} X[k]\, e^{j2\pi \frac{k}{K} n}, \tag{13}$$

where (13) is obtained by substituting (12) in (2).
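The two special cases can be confirmed numerically. The sketch below re-implements (1) and the unitary DFT (10) in plain Python and checks (7) for $K = 1$ and (9) for $L = 1$ on an arbitrary sequence:

```python
import cmath

def dzt(x, L, K):
    # DZT on the fundamental rectangle, Eq. (1); x has period N = K*L.
    N = K * L
    return [[sum(x[(n + l * L) % N] * cmath.exp(-2j * cmath.pi * k * l / K)
                 for l in range(K)) / K ** 0.5
             for k in range(K)] for n in range(L)]

def dft(x):
    # Unitary DFT, Eq. (10).
    N = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / N) for l in range(N)) / N ** 0.5
            for k in range(N)]

x = [complex((2 * i) % 5, i % 3) for i in range(8)]   # arbitrary sequence, N = 8

# K = 1: the DZT reduces to the sequence itself, Eq. (7).
Z1 = dzt(x, 8, 1)
e7 = max(abs(Z1[n][0] - x[n]) for n in range(8))

# L = 1, n = 0: the DZT reduces to the unitary DFT, Eq. (9).
Z2 = dzt(x, 1, 8)
X = dft(x)
e9 = max(abs(Z2[0][k] - X[k]) for k in range(8))
print(e7, e9)
```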

While the DZT $Z_x$ can be computed directly from the sequence $x$, it can additionally be obtained from its DFT $X$ in (9) through

$$Z_x^{(L,K)}[n,k] = \frac{1}{\sqrt{L}} \sum_{l=0}^{L-1} X[k+lK]\, e^{j2\pi \frac{k+lK}{KL} n}. \tag{14}$$

**Proof.** See Appendix A.

Equivalently, using (1), we recognize (14) as

$$Z_x^{(L,K)}[n,k] = e^{j2\pi \frac{nk}{KL}}\, Z_X^{(K,L)}[k, -n], \tag{15}$$

where $Z_X^{(K,L)}$ is the DZT of the DFT sequence $X$.

The corresponding inverse relation is

$$X[k] = \frac{1}{\sqrt{L}} \sum_{n=0}^{L-1} Z_x^{(L,K)}[n,k]\, e^{-j2\pi \frac{k}{KL} n}. \tag{16}$$

**Proof.** See Appendix B.

Figure 2 summarizes the relations between the sequence $x$, the DZT $Z_x$, and the DFT $X$. Note that the DFT $X$ can be obtained in two ways: either directly via (10) or indirectly using (1) and (16). The latter approach resembles the Cooley–Tukey algorithm, which is a fast Fourier transform algorithm [10].
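Relation (16) is easy to verify numerically. The sketch below (plain Python, arbitrary sequence) computes the length-$N$ unitary DFT both directly and from the DZT via (16):

```python
import cmath

def dzt(x, L, K):
    # DZT on the fundamental rectangle, Eq. (1); x has period N = K*L.
    N = K * L
    return [[sum(x[(n + l * L) % N] * cmath.exp(-2j * cmath.pi * k * l / K)
                 for l in range(K)) / K ** 0.5
             for k in range(K)] for n in range(L)]

L_, K_ = 3, 4
N = L_ * K_
x = [complex(i % 4, (5 * i) % 3) for i in range(N)]

# Direct unitary DFT of one period.
X = [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / N) for l in range(N)) / N ** 0.5
     for k in range(N)]

# DFT from the DZT, Eq. (16); Z is K-periodic in k, hence the k % K_ index.
Z = dzt(x, L_, K_)
X_from_Z = [sum(Z[n][k % K_] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(L_)) / L_ ** 0.5
            for k in range(N)]
err = max(abs(a - b) for a, b in zip(X, X_from_Z))
print(err)
```

This indirect route is exactly the Cooley–Tukey decomposition mentioned above: a bank of length-$K$ DFTs (the DZT), twiddle factors, and a length-$L$ DFT.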

**Figure 2.** Different signal representations of a sequence *x* and its corresponding DZT *Zx* and DFT *X* transforms.

### *2.2. Properties of the DZT*

The DFT $X$ of a sequence $x$ with length $K$ is periodic with period $K$, i.e., $X[k] = X[k + mK]$ with $m \in \mathbb{Z}$; see (10). The DZT possesses similar properties, as the DZT is the DFT of the downsampled sequence $x^{(n,L)}$; see (1). Consequently, the DZT is periodic in the frequency variable $k$, i.e.,

$$Z\_{x}^{(L,K)}[n,k+mK] = Z\_{x}^{(L,K)}[n,k], \quad m \in \mathbb{Z}.\tag{17}$$

Using the circular shift property of the DFT in (11), we then have

$$Z_x^{(L,K)}[n+mL,k] = e^{j2\pi \frac{k}{K} m}\, Z_x^{(L,K)}[n,k], \quad m \in \mathbb{Z}, \tag{18}$$

i.e., the DZT is periodic in $n$ with a period $L$ up to a complex factor $e^{j2\pi (k/K) m}$. The DZT is therefore said to be *quasi*-periodic with *quasi*-period $L$. Due to the periodicity properties in (17) and (18), the DZT is fully determined by its values for $0 \le n \le L - 1$ and $0 \le k \le K - 1$, which is referred to as the fundamental rectangle [10].
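Both periodicity properties can be checked directly from definition (1). The sketch below (plain Python, arbitrary sequence and offsets) evaluates the DZT at points outside the fundamental rectangle:

```python
import cmath

def dzt_point(x, L, K, n, k):
    # DZT at arbitrary integers (n, k), directly from Eq. (1); x has period N = K*L.
    N = K * L
    return sum(x[(n + l * L) % N] * cmath.exp(-2j * cmath.pi * k * l / K)
               for l in range(K)) / K ** 0.5

L_, K_ = 3, 5
x = [complex(i % 4, (2 * i) % 3) for i in range(L_ * K_)]
n, k, m = 1, 2, 3

# Eq. (17): periodic in k with period K.
e17 = abs(dzt_point(x, L_, K_, n, k + m * K_) - dzt_point(x, L_, K_, n, k))

# Eq. (18): quasi-periodic in n, picking up the phase exp(j*2*pi*(k/K)*m).
e18 = abs(dzt_point(x, L_, K_, n + m * L_, k)
          - cmath.exp(2j * cmath.pi * k * m / K_) * dzt_point(x, L_, K_, n, k))
print(e17, e18)
```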

The *quasi*-periodicity in (18) can be utilized to express the IDZT in (2) as follows:

$$x[n + lL] = \frac{1}{\sqrt{K}} \sum_{k=0}^{K-1} Z_x^{(L,K)}[n,k]\, e^{j2\pi \frac{k}{K} l}. \tag{19}$$

Here, we express the index of the sequence as a sum of the form $n + lL$ with $0 \le n \le L - 1$ and $l \in \mathbb{Z}$. Because the fundamental rectangle fully determines the DZT $Z_x$, we restrict ourselves to this fundamental rectangle when plotting the DZT. In fact, this is what is done in Figure 1b.

**Example 2** (IDZT)**.** *Consider the DZT defined by a single nonzero coefficient on the fundamental rectangle of size* 4 × 6 *and provided by*

$$Z\_{\mathbf{x}}^{(4,6)}[n,k] = \delta[n]\delta[k].\tag{20}$$

*The fundamental rectangle and the DZT in* (20) *are illustrated in Figure 3a (left). One period of the sequence x obtained through* (19) *is*

$$x[n] = \frac{1}{\sqrt{6}} \sum_{l=0}^{K-1} \delta[n - 4l], \tag{21}$$

*i.e., a train of real Kronecker deltas starting at n* = 0 *with spacing L* = 4*, as shown in Figure 3a (right). Now, consider the DZT*

$$Z\_{\mathcal{Y}}^{(4,6)}[n,k] = \delta[n-3]\delta[k-5],\tag{22}$$

*which is shown in Figure 3b. One period of the corresponding sequence y is*

$$y[n] = \frac{1}{\sqrt{6}} \sum_{l=0}^{K-1} \delta[n - 3 - 4l]\, e^{j2\pi \frac{5}{6} l} \tag{23}$$

*and is shown in Figure 3b. When compared to x, the sequence y is delayed by three samples and modulated with a discrete frequency k* = 5*.*

*In fact, a single coefficient at Zx*[*n*, *k*] *maps onto a sequence*

$$v_{n,k}[n'] = \frac{1}{\sqrt{K}} \sum_{l=0}^{K-1} \delta[n' - n - lL]\, e^{j2\pi \frac{k}{K} l}. \tag{24}$$

*The set of sequences* $\{v_{n,k} : 0 \le n \le L-1,\ 0 \le k \le K-1\}$ *forms an orthonormal basis, and $Z_x[n,k]$ are the* expansion coefficients *of a sequence x with respect to this orthonormal basis. We use this fact in Section 3, where we define a sequence by its corresponding DZT in the same way as OFDM defines the symbols in the DFT domain.*
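A quick numerical check (plain Python, arbitrary small parameters) that these impulse-train sequences, with impulses at $n + lL$ modulated by $e^{j2\pi(k/K)l}$, form an orthonormal basis and that the DZT values are the corresponding expansion coefficients:

```python
import cmath

def basis(L, K, n, k):
    # One period of v_{n,k}: impulses at n + l*L, phases exp(j*2*pi*k*l/K), scaled by 1/sqrt(K).
    N = K * L
    v = [0j] * N
    for l in range(K):
        v[(n + l * L) % N] += cmath.exp(2j * cmath.pi * k * l / K) / K ** 0.5
    return v

def dzt_point(x, L, K, n, k):
    # DZT from Eq. (1).
    N = K * L
    return sum(x[(n + l * L) % N] * cmath.exp(-2j * cmath.pi * k * l / K)
               for l in range(K)) / K ** 0.5

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

L_, K_ = 3, 4
N = L_ * K_
V = [basis(L_, K_, n, k) for n in range(L_) for k in range(K_)]

# Orthonormality of the N = K*L basis sequences.
e_on = max(abs(inner(V[i], V[j]) - (1.0 if i == j else 0.0))
           for i in range(N) for j in range(N))

# Expansion coefficients: Z_x[n,k] = <x, v_{n,k}>.
x = [complex(i % 5, (2 * i) % 3) for i in range(N)]
e_coef = max(abs(inner(x, basis(L_, K_, n, k)) - dzt_point(x, L_, K_, n, k))
             for n in range(L_) for k in range(K_))
print(e_on, e_coef)
```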

**Figure 3.** Two examples of DZTs (left) defined by a single nonzero coefficient on the fundamental rectangle (indicated by a dot) and the corresponding sequences (right) for (**a**) the DZT in (20) and (**b**) the DZT in (22).

Using the *quasi*-periodicity, we can further find that the elementwise product of a DZT $Z_x$ with the complex-conjugate DZT $Z_y^*$ is periodic in $n$ and $k$. Motivated by this periodicity, we apply a two-dimensional DFT, which turns out to be [10,22]

$$\sum_{n=0}^{L-1} \sum_{k=0}^{K-1} Z_x[n,k]\, Z_y^*[n,k]\, e^{j2\pi\left(\frac{m}{K}k - \frac{l}{L}n\right)} = \langle x, y_{m,l}\rangle, \tag{25}$$

where $y_{m,l}[n] \triangleq y[n - mL]\, e^{j2\pi (l/L) n}$. Here, $\langle \cdot, \cdot \rangle$ is the inner product, defined as

$$
\langle \mathbf{x}, \mathbf{y} \rangle = \sum\_{n=0}^{N-1} \mathbf{x}[n] y^\*[n]. \tag{26}
$$

Note that the Fourier kernel $e^{j2\pi\left(\frac{m}{K}k - \frac{l}{L}n\right)}$ in (25) has opposite signs in its two dimensions. Therefore, the two-dimensional discrete Fourier transform in (25) is usually referred to as the inverse *symplectic* finite Fourier transform (ISFFT).

### **Proof.** See Appendix C.

The inverse relation is provided by

$$Z\_x[n,k]Z\_y^{\*}[n,k] = \frac{1}{KL} \sum\_{m=0}^{K-1} \sum\_{l=0}^{L-1} \langle x, y\_{m,l} \rangle e^{-j2\pi \left(\frac{k}{K}m - \frac{n}{L}l\right)},\tag{27}$$

which follows from applying the corresponding two-dimensional inverse transform on both sides of (25). The transform of the right-hand side of (27) is referred to as the symplectic finite Fourier transform (SFFT). The relations (25) and (27) provide a useful tool when considering the OTFS overlay for OFDM in Section 5.
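The identity (25) can be checked numerically as well. The following sketch (illustrative, with assumed toy parameters and a NumPy FFT-based DZT) evaluates both sides for random sequences:

```python
import numpy as np

K, L = 4, 6                          # assumed toy grid sizes
N = K * L
rng = np.random.default_rng(1)

def dzt(x):
    """DZT of (1) via an FFT over the block index, shape (L, K)."""
    return np.fft.fft(x.reshape(K, L), axis=0).T / np.sqrt(K)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Zx, Zy = dzt(x), dzt(y)

n = np.arange(L)[:, None]
k = np.arange(K)[None, :]
nn = np.arange(N)

lhs = np.empty((K, L), dtype=complex)   # ISFFT of Zx * conj(Zy), per (25)
rhs = np.empty((K, L), dtype=complex)   # inner products <x, y_{m,l}>
for m in range(K):
    for l in range(L):
        kernel = np.exp(1j * 2 * np.pi * (m * k / K - l * n / L))
        lhs[m, l] = np.sum(Zx * Zy.conj() * kernel)
        # y_{m,l}[n] = y[n - mL] e^{j2pi(l/L)n}, with a cyclic shift over the period
        yml = np.roll(y, m * L) * np.exp(1j * 2 * np.pi * l * nn / L)
        rhs[m, l] = np.sum(x * yml.conj())

assert np.allclose(lhs, rhs)
```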

### *2.3. Signal Transform Properties*

Here, we list three signal transform properties that we use later when studying OTFS. A comprehensive overview of signal transform properties can be found in ([10], Table VII). Let *x*, *y*, and *z* be sequences with the same periods and let *Zx*, *Zy*, and *Zz* be their respective DZTs. Then, the following properties hold:

1. *Shift:* Let *y* be the shifted version of *x*, i.e., *y*[*n*] = *x*[*n* − *m*]; then,

$$Z\_y[n,k] = Z\_x[n-m,k].\tag{28}$$

A shift in the sequence causes a shift in the corresponding DZT. The proof follows from the definition of the DZT (1). For shifts of multiples of *L*, i.e., *m* = *lL* with *l* ∈ Z, we further have

$$Z\_y[n,k] = e^{-j2\pi \frac{k}{K}l} Z\_x[n,k],\tag{29}$$

which follows from the *quasi*-periodicity of the DZT in (18).

2. *Modulation:* Let *z* = *x* · *y* be the elementwise product of *x* and *y*, i.e., *z*[*n*] = *x*[*n*]*y*[*n*]. Then,

$$Z\_z[n,k] = \frac{1}{\sqrt{K}} \sum\_{l=0}^{K-1} Z\_x[n,l] Z\_y[n,k-l] \,. \tag{30}$$

i.e., the DZT of the element-wise multiplication is a scaled convolution with respect to the variable *k*.

### **Proof.** See Appendix D.

3. *Circular Convolution:* Consider *z* = *x y*, i.e., the circular convolution of *x* and *y*. Then, the DZT *Zz* is

$$Z\_z[n,k] = \sqrt{K} \sum\_{m=0}^{L-1} Z\_x[m,k] Z\_y[n-m,k] \,. \tag{31}$$

i.e., the DZT of a circular convolution is a scaled convolution with respect to the variable *n*.

### **Proof.** See Appendix E.
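Both properties can be verified numerically. The sketch below (illustrative, with assumed toy sizes and a NumPy FFT-based DZT; the quasi-periodic extension of *Zy* in *n* is handled explicitly) checks (30) and (31) against direct DZT computations:

```python
import numpy as np

K, L = 4, 6                          # assumed toy grid sizes
N = K * L
rng = np.random.default_rng(2)

def dzt(x):
    return np.fft.fft(x.reshape(K, L), axis=0).T / np.sqrt(K)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Zx, Zy = dzt(x), dzt(y)

# Modulation property (30): DZT of x*y is a scaled convolution over k
Zz = dzt(x * y)
conv_k = np.zeros((L, K), dtype=complex)
for k in range(K):
    for l in range(K):
        conv_k[:, k] += Zx[:, l] * Zy[:, (k - l) % K]   # DZT is K-periodic in k
assert np.allclose(Zz, conv_k / np.sqrt(K))

# Convolution property (31): DZT of the circular convolution of x and y,
# with Z_y extended quasi-periodically in n (phase e^{-j2pi k/K} for n < 0)
z = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y))          # circular convolution
Zc = dzt(z)
conv_n = np.zeros((L, K), dtype=complex)
for k in range(K):
    for n in range(L):
        for m in range(L):
            d = n - m
            zy = Zy[d, k] if d >= 0 else Zy[d % L, k] * np.exp(-2j * np.pi * k / K)
            conv_n[n, k] += Zx[m, k] * zy
assert np.allclose(Zc, np.sqrt(K) * conv_n)
```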

The shift property in (28) together with the *quasi*-periodicity in (18) has another important implication. In OTFS, as we show in Section 3, the received signal includes a superposition of delayed sequences that, in general, are not multiples of *L*. We discuss this further in Example 3.

**Example 3** (Shifted DZT)**.** *Consider a DZT Zh with elements*

$$Z\_h[n,k] = Z\_g[n-10,k],\tag{32}$$

*which is a shifted version of the DZT Zg in Figure 1b of Example 1. To evaluate the DZT Zh within the fundamental rectangle, we first observe that any index n can be expressed as n* = *i* + *mL with m* = ⌊*n*/*L*⌋*, where* ⌊*n*/*L*⌋ *denotes the greatest integer less than or equal to n*/*L. In this example, the indices n* = 0 *to* 9 *of Zh correspond to the indices n* = −10 *to* −1 *of Zg. Expressing the latter indices in terms of i and m, we have m* = −1 *and i ranging from 20 to 29. Thus, by the* quasi*-periodicity property in* (18)*, we have Zh*[*n*, *k*] = *e*−*j*2*πk*/*KZg*[*n* + 20, *k*] *for* 0 ≤ *n* ≤ 9*. On the other hand, the indices* 10 ≤ *n* ≤ 29 *of Zh*[*n*, *k*] *correspond to the indices* 0 ≤ *n* ≤ 19 *of Zg*[*n*, *k*]*. Therefore, m* = 0 *and Zh is the shifted DZT Zg within the fundamental rectangle. Thus,*

$$Z\_h[n,k] = \begin{cases} e^{-j2\pi \frac{k}{K}} Z\_g[n+20,k], & 0 \le n \le 9, \\ Z\_g[n-10,k], & 10 \le n \le 29, \end{cases} \tag{33}$$

*or, more generally, $Z_h[n,k] = e^{j2\pi (k/K)\lfloor (n-10)/L\rfloor} Z_g[(n-10)_L, k]$, where $(\cdot)_L$ denotes reduction modulo L. The DZT Zh is depicted in Figure 4, which illustrates the different phase behaviors as well.*

**Figure 4.** The DZT *Zh*[*n*, *k*] = *Zg*[*n* − 10, *k*] in Example 3, with *Zg*[*n*, *k*] being the DZT of Figure 1. The shift of the DZT with respect to *n* causes a circular shift of the magnitude |*Zg*[*n*, *k*]| of the DZT (**top**). The phase *ϕh*[*n*, *k*] experiences an additional linear phase for indices smaller than 10 (**bottom**).
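The closed form of Example 3 can be reproduced numerically. The sketch below (illustrative only; *K* = 3 is an assumed value, and *Zg* is taken as the DZT of a random sequence rather than the one of Figure 1) checks the shifted DZT, including the quasi-periodicity phase, on the fundamental rectangle:

```python
import numpy as np

K, L = 3, 30                         # L = 30 as in the example, K assumed
N = K * L
rng = np.random.default_rng(3)

def dzt(x):
    return np.fft.fft(x.reshape(K, L), axis=0).T / np.sqrt(K)

g = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Zg = dzt(g)
shift = 10
Zh = dzt(np.roll(g, shift))          # h[n] = g[n - 10] over one period

# Closed form: quasi-periodicity phase times Zg with index reduced mod L
n = np.arange(L)[:, None]
k = np.arange(K)[None, :]
phase = np.exp(1j * 2 * np.pi * (k / K) * np.floor((n - shift) / L))
expected = phase * Zg[(n - shift) % L, k]
assert np.allclose(Zh, expected)
```

For 0 ≤ *n* ≤ 9 the floor term equals −1, which reproduces the extra phase *e*<sup>−*j*2*πk*/*K*</sup> of (33).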

### **3. System Model**

In this section, we use the IDZT/DZT to map the symbols in the DD domain directly to a time domain sequence and vice versa. We consider a pulse-amplitude modulation (PAM) system to map the discrete symbols onto continuous pulses, as schematically shown in Figure 5. This approach allows for the digital implementation of OTFS similar to the PAM implementation of OFDM presented in ([23] Chapter 6.4.2).

**Figure 5.** OTFS system model considered in this work. The IDZT maps a sequence consisting of the symbols defined in the DD domain to a discrete sequence. A CP is added by copying the last *O* samples. The resulting sequence *x* is converted to a serial stream by a parallel-to-serial converter (P/S) before being mapped onto a pulse *p*(*t*) and sent over a noisy TF-dispersive channel *h*(*τ*, *ν*). At the receiver, a sampled matched filter is applied before the serial stream is converted to a parallel stream by a serial-to-parallel (S/P) converter. Lastly, the sequence *y* is mapped to the DD domain using the DZT. The DD input–output relationship is provided by (46) and Theorem 1.

### *3.1. Transmitter*

Similar to OFDM, which defines symbols in the frequency domain, OTFS defines *K* × *L* symbols on the fundamental rectangle in the Zak domain. The symbols in the Zak domain are mapped to a sequence in the time domain using the IDZT in (19). Prior to modulation, a CP of length *O* is added by copying the last *O* samples and inserting them at the beginning of the sequence (see Figure 5). As we show later, the CP turns the linear convolution of the channel into a circular convolution, allowing us to use the circular convolution property (31) of the DZT. The elements of the sequence *x* are then mapped onto time-shifted pulses *p*(*t*) using PAM. The transmitted signal is provided as follows:

$$s(t) = \sum\_{n=0}^{N+O-1} x[n-O]p(t-nT),\tag{34}$$

where *T* is the modulation interval and *p*(*t*) is a square-root Nyquist pulse. Note that (34) is equivalent (up to the CP) to (21) of [8]. However, by considering the DZT and PAM, no discretization of the continuous Zak transform is required. Moreover, considering the class of Nyquist pulses in the modulation allows for more freedom in controlling the interference in the delay domain.
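The digital part of the transmitter (IDZT plus CP) can be sketched as follows (illustrative only; the DD symbols, grid sizes, and CP length are assumed toy values, and the analog PAM step of (34) is not simulated):

```python
import numpy as np

K, L, O = 4, 8, 3                   # assumed toy grid sizes and CP length
N = K * L
rng = np.random.default_rng(4)

def idzt(Z):
    """IDZT of (19): maps an (L, K) DD grid to a length-KL sequence."""
    L_, K_ = Z.shape
    return (np.sqrt(K_) * np.fft.ifft(Z.T, axis=0)).reshape(K_ * L_)

# QPSK symbols on the fundamental L x K rectangle in the DD domain
Zx = (rng.choice([-1.0, 1.0], (L, K)) + 1j * rng.choice([-1.0, 1.0], (L, K))) / np.sqrt(2)

x = idzt(Zx)                        # time-domain sequence of length N = KL
x_cp = np.concatenate([x[-O:], x])  # CP: copy the last O samples to the front

assert len(x_cp) == N + O
assert np.allclose(x_cp[:O], x[-O:])
```

The samples of `x_cp` would then feed the pulse-amplitude modulator of (34).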

**Remark 2.** *In Section 2.1, we discussed the implications of the choice of the parameters K and L for the DZT. Similarly, the choice of K and L influences the OTFS system under study. For the case K* = 1*, the symbols of Zx are arranged on a line along the delay axis. The IDZT does not alter the sequence and can be skipped; see* (7)*. Thus, the system is a single carrier system. On the other hand, for L* = 1*, the symbols $Z_x^{(L,K)}[n,k]$ are arranged along the Doppler axis. The IDZT is simply the IDFT (see* (13)*), and* (34) *becomes an OFDM signal as in ([23] Chapter 6.4.2).*

### *3.2. Channel Model*

We now consider TF dispersive channels and model the received signal as follows ([24] Chapter 1.3.1):

$$r(t) = \int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} h(\tau, \nu) s(t - \tau) e^{j2\pi \nu t} d\tau d\nu + \tilde{w}(t), \tag{35}$$

where *h*(*τ*, *ν*) is the so-called DD spreading function. The complex noise $\tilde{w}(t)$ is assumed to be white and Gaussian with power spectral density *N*0. We model the channel by *P* discrete scattering objects. Each scattering object is associated with a path delay *τp*, a Doppler shift *νp*, and a complex attenuation factor *αp*. Thus, the spreading function *h*(*τ*, *ν*) becomes

$$h(\tau, \nu) = \sum\_{p=0}^{P-1} \alpha\_p \delta(\tau - \tau\_p) \delta(\nu - \nu\_p). \tag{36}$$

Substituting (36) in (35) yields

$$r(t) = \sum\_{p=0}^{P-1} \alpha\_p s(t - \tau\_p) e^{j2\pi \nu\_p t} + \tilde{w}(t),\tag{37}$$

i.e., the received signal is a superposition of scaled, delayed, and Doppler-shifted replicas of the transmitted signal. The Doppler shift is provided by *ν<sup>p</sup>* = *vp f*c/*c*, where *vp*, *f*c, and *c* are the relative velocity of the *p*th scattering object, the carrier frequency, and the speed of light, respectively. The length of the CP in (34) is chosen such that *OT* is larger than or equal to the maximum delay.

**Remark 3.** *In the channel model in* (36)*, it is assumed that the individual delays are independent of the absolute time. Strictly speaking, this is not the case, as the movement of a reflector affects the delay. However,* (36) *holds as long as the signal length NT is chosen such that the delay does not change significantly.*

Substituting (34) in (37), the received signal is

$$r(t) = \sum\_{p=0}^{P-1} \alpha\_p \sum\_{n=0}^{N+O-1} x[n-O] p(t-nT-\tau\_p) e^{j2\pi \nu\_p t} + \tilde{w}(t). \tag{38}$$

### *3.3. Receiver*

At the receiver, a matched filter with impulse response $p^*(-t)$ is applied. The output of the matched filter *y*(*t*) is

$$y(t) = \sum\_{p=0}^{P-1} \alpha\_p \sum\_{n=0}^{N+O-1} x[n-O] \int\_{-\infty}^{\infty} p(\tau - nT - \tau\_p) e^{j2\pi \nu\_p \tau} p^\*(\tau - t) d\tau + w(t), \tag{39}$$

where *w*(*t*) is the filtered noise. Assuming that the pulse bandwidth is much larger than the maximum Doppler shift, we can approximate the integral in (39) as *ej*2*πνp*(*nT*+*τp*) *h*(*t* − *nT* − *τp*), where *h*(*t*) is the corresponding Nyquist pulse. The output of the matched filter is then

$$y(t) \approx \sum\_{p=0}^{P-1} \alpha\_p \sum\_{n=0}^{N+O-1} x[n-O] e^{j2\pi \nu\_p (nT + \tau\_p)} h(t - nT - \tau\_p) + w(t). \tag{40}$$

The matched filter output is sampled every *T* seconds and with an offset of *OT* to discard the CP. The sampled signal *y*[*n*] = *y*((*n* + *O*)*T*) is

$$y[n] = \sum\_{p=0}^{P-1} \alpha\_p \sum\_{m=-O}^{N-1} x[m] e^{j2\pi \frac{k\_p}{KL} m} h\_{\tau\_p}[n-m] + w[n],\tag{41}$$

where $h_{\tau_p}[n] = h(nT - \tau_p)$ is the sampled Nyquist pulse and *w*[*n*] are independent and identically distributed (i.i.d.) complex zero-mean Gaussian random variables with variance *N*0. To shorten the notation, we combine the constant phase terms $e^{j2\pi\nu_p\tau_p}$ with the channel gains *αp* in (41). Furthermore, we express *νp* as a multiple of the Doppler resolution, which we define as

$$\Delta\nu \triangleq 1/(KLT), \tag{42}$$

i.e., *ν<sup>p</sup>* = Δ*νkp*.

We can bound the interval for which *h*(*t*) is significantly different from zero (for sufficiently large *L*) to ±*LT*/2. Thus, we can express $h_{\tau_p}[n]$ as

$$h\_{\tau\_p}[n] = \begin{cases} h(nT - \tau\_p), & \text{for } -\frac{LT}{2} \le nT - \tau\_p < \frac{LT}{2}, \\ 0, & \text{else.} \end{cases} \tag{43}$$

The CP allows the linear convolution in (41) to be approximated by a circular convolution; the sample *y*[*n*] is then provided by

$$y[n] = \sum\_{p=0}^{P-1} \alpha\_p y\_p[n] + w[n],\tag{44}$$

where

$$y\_p[n] = \sum\_{m=0}^{KL-1} x[m] e^{j2\pi \frac{k\_p}{KL} m} h\_{\tau\_p}[n-m]. \tag{45}$$

Here, $h_{\tau_p}$ is periodically extended with period *KL*, i.e., $h_{\tau_p}[n] = h_{\tau_p}[n + KL]$. In a last step, the receiver computes the DZT of the sequence *y* before subsequent processing takes place.
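The circularization effect of the CP can be illustrated in isolation for a single delay profile (a simplified, time-invariant sketch with assumed toy dimensions; the mechanism for the delay taps in (41) is the same):

```python
import numpy as np

N, O = 32, 4                        # sequence length and CP length (toy values)
taps = 4                            # channel memory; requires O >= taps - 1
rng = np.random.default_rng(8)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(taps) + 1j * rng.standard_normal(taps)

x_cp = np.concatenate([x[-O:], x])  # transmitted sequence with CP
y_lin = np.convolve(x_cp, h)        # the channel applies a linear convolution
y = y_lin[O:O + N]                  # the receiver discards the CP samples

# The retained samples equal the circular convolution of x and h
y_circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, N))
assert np.allclose(y, y_circ)
```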

### **4. Delay Doppler Input–Output Relationship**

To express the input–output relationship in the DD domain for the system presented in Figure 5, we first note that the DZT is a linear transform; as such, we can write the DZT of (44) as

$$Z\_y[n,k] = \sum\_{p=0}^{P-1} \alpha\_p Z\_{y\_p}[n,k] + Z\_w[n,k],\tag{46}$$

where *Zyp* is the DZT of sequence *yp* described in (45) and *Zw*[*n*, *k*] is the DZT of the noise. The elements of *Zw*[*n*, *k*] are i.i.d. zero-mean Gaussian random variables with variance *N*0. This follows from the fact that the DZT is a unitary transform ([10], Section VI).

For the signal model of a single reflector in (45), we provide the following result for the input–output relationship in the DD domain for the OTFS system described in Section 3.

**Theorem 1.** *Considering the fundamental rectangle $Z_x \in \mathbb{C}^{L\times K}$ of complex symbols in the DD domain, the input–output relation for OTFS transmission over a time-frequency selective channel for a single reflector is*

$$Z\_{y\_p}[n,k] = \sum\_{m=0}^{L-1} \left( \sum\_{l=0}^{K-1} Z\_x[m,l] Z\_{\nu\_p}[m,k-l] \right) Z\_{\tau\_p}[n-m,k],\tag{47}$$

*where $Z_{\tau_p}$ and $Z_{\nu_p}$ are the delay and Doppler spreading functions, respectively. The delay spreading function $Z_{\tau_p}$ is the DZT of the shifted and sampled pulse $h_{\tau_p}[n]$ in* (43)*, and the Doppler spreading function is provided as follows:*

$$Z\_{\nu\_p}[n,k] = \frac{1}{\sqrt{K}} e^{j2\pi \frac{k\_p}{KL}n} e^{-j\pi \frac{K-1}{K}(k-k\_p)} \frac{\sin\left(\pi(k-k\_p)\right)}{\sin\left(\frac{\pi}{K}(k-k\_p)\right)}.\tag{48}$$

**Proof.** See Appendix F.

To illustrate the spreading of a single symbol in the DD domain, we consider the following example. Let *L* = *K* = 30 and

$$Z\_{\chi}[n,k] = \begin{cases} 1 & \text{for } n=k=L/2, \\ 0 & \text{else.} \end{cases} \tag{49}$$

The fundamental rectangle with the only nonzero element is presented in Figure 6a. Furthermore, assume that *τ* = 0.5*T* and *ν* = 0.5Δ*ν*. Note that this choice causes the maximum spread of a single symbol in the DD domain. We can visualize the spreading of the symbol defined in (49) in two steps. To this end, we define $Z_{\hat{y}}$ as the DZT resulting from the inner convolution in (47) with respect to the Doppler index *k*, using the Doppler spreading function shown in Figure 6b. The resulting spread of the nonzero symbol is visualized in Figure 6c. Finally, the symbol that has been spread in the Doppler domain is spread in the delay domain by the delay spreading function *Zτ*, which is illustrated in Figure 6d. Note that due to the limited support of *hτ* (see (43)), the magnitude of *Zτ* is independent of the index *k*. The resulting spread of the nonzero symbol in the DD domain is shown in Figure 6e.

For the particular case of *τ<sup>p</sup>* = *npT* with *np* = 0, 1, ... ,*O* − 1 and *ν<sup>p</sup>* = *kp*/(*KLT*) with *kp* ∈ Z, *Zyp* simplifies to

$$Z\_{y\_p}[n,k] = e^{j2\pi \frac{k\_p}{KL}(n-n\_p)} Z\_x[n-n\_p, k-k\_p],\tag{50}$$

i.e., in the DD domain, the received symbols $Z_{y_p}$ are displaced versions of the symbols *Zx*.
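The special case (50) can be verified end to end. The sketch below (illustrative; it assumes an ideal Nyquist pulse so that $h_{\tau_p}[n]$ reduces to a Kronecker delta, plus assumed toy grid sizes) applies an integer delay and Doppler shift to an IDZT-generated sequence and compares its DZT with the displaced, phase-rotated symbols:

```python
import numpy as np

K, L = 4, 8                          # assumed toy grid sizes
N = K * L
rng = np.random.default_rng(5)

def dzt(x):
    return np.fft.fft(x.reshape(K, L), axis=0).T / np.sqrt(K)

def idzt(Z):
    return (np.sqrt(K) * np.fft.ifft(Z.T, axis=0)).reshape(N)

Zx = rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))
x = idzt(Zx)

n_p, k_p = 3, 2                      # integer delay and Doppler indices
mm = np.arange(N)
# Noiseless channel of (45) with an ideal pulse: modulation plus circular delay
y_p = np.roll(x * np.exp(1j * 2 * np.pi * k_p * mm / N), n_p)

# Right-hand side of (50), with Z_x extended quasi-periodically in n
n = np.arange(L)[:, None]
k = np.arange(K)[None, :]
q = np.floor_divide(n - n_p, L)      # quasi-periodicity phase exponent
Zx_ext = np.exp(1j * 2 * np.pi * (k - k_p) * q / K) * Zx[(n - n_p) % L, (k - k_p) % K]
expected = np.exp(1j * 2 * np.pi * k_p * (n - n_p) / N) * Zx_ext

assert np.allclose(dzt(y_p), expected)
```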

**Figure 6.** Example of the spread of a symbol (**a**) in the DD domain due to fractional delay and Doppler shift. The spread can be first evaluated in the Doppler domain (**c**) using the Doppler spreading function in (**b**). The symbol spread in the Doppler domain is further spread in the delay domain by the delay spreading function in (**d**). The overall spread in the DD domain is shown in (**e**).

Theorem 1 shows that the channel interaction with the symbols in the DD domain is time-invariant, neglecting the additional phase terms due to the quasi-periodicity and modulation. This invariance is helpful in the detection of the symbols. Consider a TDL-C channel with a delay spread of 300 ns, a carrier frequency of 4 GHz, and a maximum velocity of 120 km/h. Furthermore, assume an OTFS system with *K* = 7, *L* = 600, and 1/*T* = 9 MHz. The channel response $Z_h[n,k] = \sum_{l=0}^{K-1} Z_{\nu_p}[n,k-l] Z_{\tau_p}[n,k]$ in the DD domain is illustrated in Figure 7a. The magnitude of this channel stays approximately constant throughout the entire transmission of an OTFS frame. Figure 7b illustrates the equivalent OFDM channel, which varies along the subcarrier index *k* as well as along the time index *n*. To keep track of the channel, additional pilots need to be used, and these cannot be used for communication.

In addition to constant channel interaction, OTFS offers the advantage of a concise and sparse channel description compared to OFDM. In an OFDM system, the channel coefficient for each subcarrier must be estimated for subsequent symbol detection. In contrast, for symbol detection in an OTFS system, knowledge of the interference introduced by each reflector is sufficient. The sparsity can be seen in Figure 7; the support of |*Zh*[*n*, *k*]| is limited to a small area, while the channel transfer function changes with each subcarrier and time index, that is, *l* and *m*, respectively.

**Figure 7.** Two different representations of the time-variant channel: (**a**) DD representation and (**b**) TF representation. The DD domain representation is nonzero only for a small part of the domain and stays constant throughout the transmission. On the other hand, the TF domain representation of the channel changes with respect to time, and therefore needs to be tracked.

**Remark 4.** *The discrete two-dimensional convolution in* (46) *can be equivalently expressed in the form*

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}, \tag{51}$$

*where* **y***,* **x***, and* **w** *are the vectorized DZTs Zy, Zx, and Zw, respectively. The vectors are all of length KL. The matrix* $\mathbf{H} \in \mathbb{C}^{KL\times KL}$ *describes the intersymbol interference in the DD domain. Because $Z_{\tau_p}$ and $Z_{\nu_p}$ have small support in the DD domain, the corresponding matrix* **H** *is sparse. The matrix-vector formulation of the input–output relationship is the basis for many works on OTFS; for example, see [5,6].*

### **5. OTFS Overlay for OFDM**

Currently, orthogonal frequency division multiplexing (OFDM) is the dominant modulation scheme in wireless communication. For example, it is used in 5G and in several 802.11 standards. This section shows that DFT-based OFDM can be used for OTFS modulation and demodulation. In this context, OTFS is considered a pre- and postprocessing step for the OFDM system.

To derive the pre- and postprocessing steps, we first derive an alternative way to compute the DZT. For this purpose, we consider (27). If we choose the sequence *y* such that its DZT is *Zy*[*n*, *k*] = 1, then we can obtain the DZT *Zx* through the right-hand side of (27). The *N*-periodic sequence *y* with DZT *Zy*[*n*, *k*] = 1 is

$$y[n] = \begin{cases} \sqrt{K}, & 0 \le n \le L - 1, \\ 0, & \text{elsewhere}. \end{cases} \tag{52}$$

With this particular choice of *y*, we recognize the inner product on the right-hand side of (27) as

$$
\langle \mathbf{x}, y\_{m,l} \rangle = \sqrt{K} \sum\_{n=0}^{L-1} x[n+mL] e^{-j2\pi \frac{l}{L}n},\tag{53}
$$

which is the scaled *L*-point DFT of the samples *x*[*n*] for *mL* ≤ *n* ≤ (*m* + 1)*L* − 1. If we define

$$a\_{m,l} \triangleq \langle \mathbf{x}, y\_{m,l} \rangle, \tag{54}$$

for 0 ≤ *m* ≤ *K* − 1 and 0 ≤ *l* ≤ *L* − 1, then the DZT of *x* is obtained through

$$Z\_{\mathbf{x}}[n,k] = \frac{1}{KL} \sum\_{m=0}^{K-1} \sum\_{l=0}^{L-1} a\_{m,l} e^{-j2\pi \left(\frac{k}{K}m - \frac{n}{L}l\right)},\tag{55}$$

i.e., by the SFFT of the coefficients *am*,*l*. Note that the set *am*,*<sup>l</sup>* represents the Gabor expansion coefficients for the choice of a rectangular analysis window (see [25], Section 4), and thus a mixed TF representation of the sequence *x*.

The coefficients *am*,*l*, on the other hand, are obtained from $Z_x^{(L,K)}[n,k]$ using (25):

$$a\_{m,l} = \sum\_{n=0}^{L-1} \sum\_{k=0}^{K-1} Z\_x[n,k] e^{j2\pi \left(\frac{k}{K}m - \frac{n}{L}l\right)}.\tag{56}$$

The samples of the sequence *x* for *mL* ≤ *n* ≤ (*m* + 1)*L* − 1 are obtained as follows:

$$x[n+mL] = \frac{1}{\sqrt{K}L} \sum\_{l=0}^{L-1} a\_{m,l} e^{j2\pi \frac{l}{L}n},\tag{57}$$

which is the *L*-point IDFT of the coefficients *am*,*<sup>l</sup>* for a fixed *m*. Thus, the DZT (IDZT) can be implemented by consecutive execution of the DFT (IDFT) and the SFFT (ISFFT).

The above-described two-step approach for the calculation of the DZT and IDZT can be used to implement OTFS using OFDM hardware, which is typically based on the IDFT/DFT (see ([26], Section 19.3), ([23], Section 6.4.2), ([27], Section 12.4.3), or ([28], Section 4.6)), by extending the transmitter and receiver with the ISFFT and SFFT, respectively. The coefficients *am*,*l* then represent the coefficients in the TF domain. The index *m* refers to the *m*th OFDM symbol in the time domain, and *l* is the corresponding subcarrier index. Note that for the DZT, the parameter *L* defines the grid size in the delay domain. For the DFT-SFFT implementation, on the other hand, *L* defines the DFT size, i.e., the number of points in the frequency domain. Thus, an *L* × *K* grid in the DD domain translates to a *K* × *L* grid in the TF domain.
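The two-step computation can be checked against the direct DZT. The sketch below (illustrative, with assumed toy sizes; the per-block DFT uses the $e^{-j2\pi}$ convention) performs the *L*-point DFTs followed by the SFFT of (55):

```python
import numpy as np

K, L = 4, 8                          # assumed toy grid sizes
N = K * L
rng = np.random.default_rng(6)

def dzt(x):
    """Direct DZT of (1), shape (L, K)."""
    return np.fft.fft(x.reshape(K, L), axis=0).T / np.sqrt(K)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Step 1: per-block L-point DFTs give the TF coefficients a_{m,l}
blocks = x.reshape(K, L)                     # row m holds x[mL : (m+1)L]
a = np.sqrt(K) * np.fft.fft(blocks, axis=1)  # a_{m,l}, shape (K, L)

# Step 2: SFFT of the TF coefficients, per (55)
m = np.arange(K)[:, None]
l = np.arange(L)[None, :]
Z = np.zeros((L, K), dtype=complex)
for n in range(L):
    for k in range(K):
        kernel = np.exp(-1j * 2 * np.pi * (k * m / K - n * l / L))
        Z[n, k] = np.sum(a * kernel) / (K * L)

assert np.allclose(Z, dzt(x))                # two-step result matches the direct DZT
```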

**Remark 5.** *In CP-OFDM, a CP is added for each OFDM symbol by copying the last O samples of an OFDM symbol and inserting them in front of the corresponding OFDM symbol with length L. This symbol-wise CP is not required in the OFDM implementation of OTFS. Instead, a single CP is added by copying the last O samples of the entire sequence and inserting them in front of the sequence.*

### **6. DD Channel Capacity**

The input–output relationship in (41) is equivalently expressed as

$$y[n] = \sum\_{m \in \mathcal{L}} h[n, m]x[n - m] + w[n],\tag{58}$$

where *h*[*n*, *m*] is the time-variant multi-tap channel response at time instance *n* and $\mathcal{L}$ is the support of *h*[*n*, *m*] in *m*. This channel response is deterministic and periodic (considering *kp* ∈ Q) with some finite period *M*, i.e., *h*[*n*, *m*] = *h*[*n* + *bM*, *m*] for any *n* ∈ {1, 2, ... , *M*} and *b* ∈ Z. Upon using the channel *N* times, the input–output relationship can be written in the following vector form:

$$\mathbf{Y}\_{N} = \mathbf{H}\_{N}\mathbf{X}\_{N} + \mathbf{W}\_{N}, \tag{59}$$

where **X***<sup>N</sup>* is the input block, **Y***<sup>N</sup>* is the corresponding output block, **W***<sup>N</sup>* is the block of noise samples (all column vectors), and **H***<sup>N</sup>* is the channel (convolution) matrix constructed from the time-varying channel response *h*[*n*, *m*].

The above channel can be shown to be *information-stable* (see Section 3.9 in [29]); hence, its capacity is provided by the following multi-letter limiting expression [30]:

$$C = \lim\_{N \to \infty} \sup\_{f\_{\mathbf{X}\_N}} \frac{1}{N} I(\mathbf{X}\_N; \mathbf{Y}\_N), \tag{60}$$

where *f***X***<sup>N</sup>* is the multi-letter input distribution for block length *N*. For each block length *N*, the corresponding mutual information term in (60) is maximized by a Gaussian input [31]; hence, the capacity is provided by

$$C = \lim\_{N \to \infty} \max\_{\mathbf{Q}\_N: \text{tr}(\mathbf{Q}\_N) \le NP} \frac{1}{N} \log \det \left( \frac{1}{\sigma^2} \mathbf{H}\_N \mathbf{Q}\_N \mathbf{H}\_N^H + \mathbf{I}\_N \right). \tag{61}$$

Let $\mathbf{H}_N = \mathbf{U}_N \mathbf{\Sigma}_N \mathbf{V}_N^H$ be the SVD of $\mathbf{H}_N$. Then, the optimal input covariance matrix is provided by $\mathbf{Q}_N = \mathbf{V}_N \mathbf{D}_N \mathbf{V}_N^H$, where $\mathbf{D}_N$ is a diagonal matrix obtained using waterfilling [31]. The capacity-achieving strategy is characterized by a sequence $\{\mathbf{Q}_N\}_{N\in\mathbb{N}}$.

In case we do not wish to use the channel response matrix in the construction of input sequences, we may add the restriction that the multi-letter input distribution must be isotropic. In this case, we simply have **Q***<sup>N</sup>* = *P***I***N*, and the capacity is provided by

$$C\_{\text{iso}} = \lim\_{N \to \infty} \frac{1}{N} \log \det \left( \frac{P}{\sigma^2} \mathbf{H}\_N \mathbf{H}\_N^H + \mathbf{I}\_N \right). \tag{62}$$

It is evident that $C_{\text{iso}}$ is achieved by any input of the form $\mathbf{X}_N = \mathbf{B}_N\mathbf{S}_N$, where $\mathbf{B}_N$ is an orthonormal basis matrix (i.e., $\mathbf{B}_N^H\mathbf{B}_N = \mathbf{B}_N\mathbf{B}_N^H = \mathbf{I}_N$) and $\mathbf{S}_N$ is a vector of zero-mean i.i.d. Gaussian symbols with covariance $\mathbb{E}[\mathbf{S}_N\mathbf{S}_N^H] = P\mathbf{I}_N$. As shown in Section 2, the set of sequences {*vn*,*<sup>k</sup>* : 0 ≤ *n* ≤ *L* − 1, 0 ≤ *k* ≤ *K* − 1} forms an orthonormal basis. Thus, the capacity of the DD channel is provided by (62).
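A finite-*N* version of (62) can be evaluated directly. The sketch below (illustrative only; the periodic channel taps, period, and SNR are assumed toy values, and the limit is approximated by a single block length) constructs the convolution matrix $\mathbf{H}_N$ of (59) and computes the isotropic-input rate:

```python
import numpy as np

# Toy deterministic channel h[n, m], periodic in n with period M (assumed values)
M, taps, N = 4, 3, 64
rng = np.random.default_rng(7)
h = rng.standard_normal((M, taps)) + 1j * rng.standard_normal((M, taps))

# Convolution matrix H_N of (59) for y[n] = sum_m h[n, m] x[n - m] (circular here)
H = np.zeros((N, N), dtype=complex)
for n in range(N):
    for m in range(taps):
        H[n, (n - m) % N] = h[n % M, m]

P, sigma2 = 1.0, 0.1
# Finite-N approximation of the isotropic-input rate in (62), in bits per use;
# slogdet avoids overflow for large determinants
C_iso = np.linalg.slogdet(np.eye(N) + (P / sigma2) * H @ H.conj().T)[1] / (N * np.log(2))
assert C_iso > 0
```

Larger *N* (ideally a multiple of the channel period *M*) tightens the approximation of the limit.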

### **7. Conclusions**

In this work, we have presented a formulation of OTFS based on the discrete Zak transform. The discrete Zak transform-based description allows for an efficient digital implementation of OTFS. Furthermore, we derived the input–output relation for the symbols in the delay-Doppler domain solely based on discrete Zak transform properties, which provides a concise description of OTFS compared to the pre- and postprocessing approaches for OFDM.

The presented discrete Zak transform approach can be used to study and evaluate OTFS from different perspectives, potentially leading to OTFS performance improvements. For example, considering Nyquist pulses *p*(*t*) with larger roll-off factors allows the interference in the delay domain to be controlled. Additionally, applying windows to the subsampled sequences of the DZT reduces the interference in the Doppler domain.

**Author Contributions:** Conceptualization, F.L.; formal analysis, F.L., H.J. and F.M.J.W.; writing—original draft preparation, F.L.; writing—review and editing, F.L., H.J., A.A. and F.M.J.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Dutch Technology Foundation TTW, which is part of the Netherlands Organisation for Scientific Research (NWO), and which is partly funded by the Ministry of Economic Affairs under the project Integrated Cooperative Automated Vehicles (i-CAVE).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Proof of Relation** (14)

Substituting *x*[*n*] in (1) by (13), we obtain

$$Z\_x^{(L,K)}[n,k] = \frac{1}{K\sqrt{L}}\sum\_{l=0}^{K-1}\sum\_{k'=0}^{KL-1}X[k']e^{j2\pi\left(\frac{k'}{KL}(n+lL)-\frac{k}{K}l\right)}.\tag{A1}$$

Note that in the derivation of (13), the case *L* = 1 was considered; thus, the sequence *x* has a period *K*. Here, on the other hand, we consider the sequence *x* to be *KL*-periodic. Therefore, (13) is adapted accordingly by substituting *K* with *KL*. Next, we rearrange terms and obtain

$$Z\_x^{(L,K)}[n,k] = \frac{1}{K\sqrt{L}}\sum\_{k'=0}^{KL-1} X[k']e^{j2\pi\frac{k'}{KL}n}\sum\_{l=0}^{K-1} e^{j2\pi\frac{k'-k}{K}l},\tag{A2}$$

where we finally replace the last sum by relation (3) which, due to the sifting property of the Kronecker delta, leads to

$$Z\_{\mathbf{x}}^{(L,K)}[n,k] = \frac{1}{\sqrt{L}} \sum\_{l=0}^{L-1} X[k+lK]e^{j2\pi \frac{k+lK}{KL}n}.\tag{A3}$$

### **Appendix B. Proof of Relation** (16)

In a first step, we rewrite the summation in (10) as a double summation, i.e.,

$$X[k] = \frac{1}{\sqrt{KL}} \sum\_{l=0}^{K-1} \sum\_{n=0}^{L-1} x[n+lL] e^{-j2\pi\frac{k}{KL}(n+lL)}.\tag{A4}$$

Next, we use relation (19) to express *x*[*n* + *lL*] through its IDZT, which leads to

$$X[k] = \frac{1}{K\sqrt{L}} \sum\_{l=0}^{K-1} \sum\_{n=0}^{L-1} \sum\_{k'=0}^{K-1} Z\_x[n,k'] e^{-j2\pi\frac{k-k'}{K}l} e^{-j2\pi\frac{k}{KL}n},\tag{A5}$$

and in a final step we use relation (3) with respect to the summation over *l*, which results in

$$X[k] = \frac{1}{\sqrt{L}} \sum\_{n=0}^{L-1} Z\_x[n,k] e^{-j2\pi \frac{k}{KL} n}.\tag{A6}$$

### **Appendix C. Proof of Relation** (25)

To prove the relation (25), we substitute the DZTs *Zx* and $Z_y^*$ by their definition in (1). After rearranging terms, we obtain

$$\frac{1}{K} \sum\_{n=0}^{L-1} \sum\_{l'=0}^{K-1} \sum\_{l''=0}^{K-1} x\left[n + l'L\right] y^\*\left[n + l''L\right] e^{-j2\pi \frac{l}{L} n} \sum\_{k=0}^{K-1} e^{-j2\pi \frac{k}{K}\left(l' - l'' - m\right)}.\tag{A7}$$

We can use relation (3) to substitute the last summation. From the sifting property of the Kronecker delta (3), we have

$$\sum\_{n=0}^{L-1} \sum\_{l'=0}^{K-1} x[n+l'L]y^\*[n+(l'-m)L]e^{-j2\pi \frac{l}{L}n}.\tag{A8}$$

Because the complex exponential sequence is periodic, with a period *L*, we can rewrite the double summation as a single summation, providing us with

$$\sum\_{n=0}^{KL-1} x[n]y^\*[n-mL]e^{-j2\pi \frac{l}{L}n}, \tag{A9}$$

which can be recognized as the inner product between *x* and *ym*,*l*.

### **Appendix D. Proof of the Modulation Property**

To prove the modulation property, we can use the definition of the sequence *z* = *x* · *y* and the definition of the DZT in (1), which is

$$Z\_z[n,k] = \frac{1}{\sqrt{K}} \sum\_{l=0}^{K-1} x[n+lL]y[n+lL]e^{-j2\pi \frac{k}{K}l}.\tag{A10}$$

Now, expressing *x*[*n* + *lL*] using (19), we have

$$Z\_z[n,k] = \frac{1}{K} \sum\_{m=0}^{K-1} Z\_x[n,m] \sum\_{l=0}^{K-1} y[n+lL] e^{-j2\pi \frac{(k-m)}{K}l}.\tag{A11}$$

Finally, using the DZT definition (1), we obtain

$$Z\_{\mathbf{z}}[n,k] = \frac{1}{\sqrt{K}} \sum\_{m=0}^{K-1} Z\_{\mathbf{x}}[n,m] Z\_{\mathbf{y}}[n,k-m]. \tag{A12}$$

### **Appendix E. Proof of the Convolution Property**

To prove relation (31), we first express the circular convolution as a multiplication in the DFT domain, i.e.,

$$Z[k] = \sqrt{KL}\,X[k]Y[k],\tag{A13}$$

where the factor $\sqrt{KL}$ is due to the unitary definition of the DFT. Using (14), we have

$$Z\_z[n,k] = \sqrt{K} \sum\_{l=0}^{L-1} X[k+lK] Y[k+lK] e^{j2\pi \frac{k+lK}{KL} n}. \tag{A14}$$

Now, using (16) to express the elements of the DFT through their DZT, we obtain

$$Z\_z[n,k] = \frac{\sqrt{K}}{L} \sum\_{n'=0}^{L-1} \sum\_{n''=0}^{L-1} Z\_x[n',k] Z\_y[n'',k] \sum\_{l=0}^{L-1} e^{-j2\pi \frac{k+lK}{KL} \left(n'+n''-n\right)}.\tag{A15}$$

Substituting the last sum by (3) and applying the sifting property of the Kronecker delta, we finally have

$$Z\_z[n,k] = \sqrt{K} \sum\_{n'=0}^{L-1} Z\_x[n',k] Z\_y[n-n',k]. \tag{A16}$$

### **Appendix F. Proof of Theorem 1**

To prove Theorem 1, we start by expressing the sequence *y* in (45) as

$$y = \left(x \cdot u\_{\nu\_p}\right) \circledast h\_{\tau\_p},\tag{A17}$$

where $u_{\nu_p}[n] = e^{j2\pi \frac{k_p}{KL} n}$. Using the modulation property (30) and the convolution property (31), we can express the DZT of *y* as

$$Z\_y[n,k] = \sum\_{m=0}^{L-1} \left( \sum\_{l=0}^{K-1} Z\_x[m,l] Z\_{\nu}[m,k-l] \right) Z\_{\tau}[n-m,k]. \tag{A18}$$

Here, *Zν* is the DZT of sequence *uν*, which is

$$\begin{split} Z\_{\nu}[n,k] &= \frac{1}{\sqrt{K}} \sum\_{l=0}^{K-1} e^{j2\pi \frac{k\_p}{KL}(n+lL)} e^{-j2\pi \frac{k}{K}l} \\ &= \frac{1}{\sqrt{K}} e^{j2\pi \frac{k\_p}{KL}n} \sum\_{l=0}^{K-1} e^{-j2\pi \frac{k-k\_p}{K}l} \\ &= \frac{1}{\sqrt{K}} e^{j2\pi \frac{k\_p}{KL}n} \frac{1 - e^{-j2\pi (k-k\_p)}}{1 - e^{-j2\pi \frac{k-k\_p}{K}}} \\ &= \frac{1}{\sqrt{K}} e^{j2\pi \frac{k\_p}{KL}n} e^{-j\pi \frac{K-1}{K}(k-k\_p)} \frac{\sin\left(\pi (k-k\_p)\right)}{\sin\left(\frac{\pi}{K}(k-k\_p)\right)}. \end{split} \tag{A19}$$

### **References**


### *Article* **Private Key and Decoder Side Information for Secure and Private Source Coding †**

**Onur Günlü 1,\*, Rafael F. Schaefer 2,3, Holger Boche 4,5,6,7 and Harold Vincent Poor <sup>8</sup>**


**Abstract:** We extend the problem of secure source coding by considering a remote source whose noisy measurements are correlated random variables used for secure source reconstruction. The main additions to the problem are as follows: (1) all terminals noncausally observe a noisy measurement of the remote source; (2) a private key is available to all legitimate terminals; (3) the public communication link between the encoder and decoder is rate-limited; and (4) the secrecy leakage to the eavesdropper is measured with respect to the encoder input, whereas the privacy leakage is measured with respect to the remote source. Exact rate regions are characterized for a lossy source coding problem with a private key, remote source, and decoder side information under security, privacy, communication, and distortion constraints. By replacing the distortion constraint with a reliability constraint, we obtain the exact rate region for the lossless case as well. Furthermore, the lossy rate region for scalar discrete-time Gaussian sources and measurement channels is established. An achievable lossy rate region that can be numerically computed is also provided for binary-input multiple additive discrete-time Gaussian noise measurement channels.

**Keywords:** information theoretic security; secure source coding; remote source; private key; side information

### **1. Introduction**

Consider multiple terminals that observe correlated random sequences and wish to reconstruct these sequences at another terminal, called a decoder, by sending messages through noiseless communication links, i.e., the distributed source coding problem [1]. A sensor network where each node observes a correlated random sequence that needs to be reconstructed at a distant node is a classic example of this problem [2] (p. 258). Similarly, function computation problems in which a fusion center observes messages sent by other nodes to compute a function are closely related problems that can be used to model various recent applications [3,4]. Since messages sent over communication links can be public, security constraints are imposed on these messages against an eavesdropper in the same network [5]. If all sent messages are available to the eavesdropper, it is necessary to provide an advantage to the decoder over the eavesdropper to enable secure source coding. Providing side information that is correlated with the sequences

**Citation:** Günlü, O.; Schaefer, R.F.; Boche, H.; Poor, H.V. Private Key and Decoder Side Information for Secure and Private Source Coding. *Entropy* **2022**, *24*, 1716. https://doi.org/ 10.3390/e24121716

Academic Editor: T. Aaron Gulliver

Received: 18 October 2022 Accepted: 18 November 2022 Published: 24 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

that should be reconstructed to the decoder can provide such an advantage over the eavesdropper that can also have side information, as in [6–8]. Allowing for the eavesdropper to access only a strict subset of all messages is also a method to enable secure distributed source coding, which was considered in [9–11]; see also [12], in which a similar method was applied to enable secure remote source reconstruction. Similarly, a private key that is shared by legitimate terminals and hidden from the eavesdropper can also provide such an advantage, as in [13,14].

Source coding models in the literature commonly assume that dependent multiletter random variables are available and should be compressed. For secret-key agreement [15,16] and secure function computation problems [17,18], which are instances of the source coding with the side information problem [19] (Section IV-B), the correlation between these multiletter random variables was posited in [20,21] to stem from an underlying ground truth that is a remote source, such that its noisy measurements are these dependent random variables. Such a remote source allows one to model the cause of correlation in a network, so we also posit that there is a remote source whose noisy measurements are used in the source coding problems discussed below, which is similar to the models in [22] (p. 78) and [23] (Figure 9). Furthermore, in the chief executive officer (CEO) problem [24], there is a remote source whose noisy measurements are encoded, such that a decoder can reconstruct the remote source by using encoder outputs. Our model is different from the model in the CEO problem, since in our model, the decoder aims to recover encoder observations rather than the remote source that is considered mainly to describe the cause of correlation between encoder observations. Thus, we define the *secrecy leakage* as the amount of information leaked to an eavesdropper about encoder observations. Since the remote source is common for all observations in the same network, we impose a *privacy leakage* constraint on the remote source because each encoder output observed by an eavesdropper leaks information about unused encoder observations, which might later cause secrecy leakage when the unused encoder observations are employed [25–27]; see [28–30] for joint secrecy and joint privacy constraints imposed due to multiple uses of the same source.

### *1.1. Summary of Contributions*

We extend the lossless and lossy source coding rate region analyses by considering a remote source that should be kept private, decoder and eavesdropper side information, and a private key shared by the encoder and decoder. Since the single-encoder case provides insights rich enough to extend the results to multiple encoders [31], in this work we consider one encoder. A summary of the main contributions is as follows.


### *1.2. Organization*

This paper is organized as follows. In Section 2, we introduce the lossless and lossy secure and private source coding problems with decoder and eavesdropper side information and a private key under storage, secrecy, privacy, and reliability or distortion constraints. In Section 3, we characterize the rate regions for the introduced problems, which include three parts that correspond to different private key rate regimes. In Section 4, we evaluate

the lossy rate region for Gaussian sources and channels with squared error distortion. In Section 5, we consider a binary modulated remote source measured through additive Gaussian noise channels and provide an inner bound for the lossy rate region with Hamming distortion. In Section 6, we provide the proof for the lossy secure and private source coding region.

### *1.3. Notation*

Uppercase $X$ represents a random variable and lowercase $x$ its realization from a set $\mathcal{X}$, denoted by calligraphic letters. A discrete random variable $X$ has probability distribution $P\_X$, and a continuous random variable $X$ has probability density function (pdf) $p\_X$. A subscript $i$ denotes the position of a variable in a length-$n$ sequence $X^n = X\_1, X\_2, \ldots, X\_i, \ldots, X\_n$. Boldface uppercase $\mathbf{X} = [X\_1, X\_2, \ldots]^T$ represents a vector random variable, where $T$ denotes the transpose. $[1:m]$ denotes the set $\{1, 2, \ldots, m\}$ for an integer $m \ge 1$. Define $[a]^- = \min\{a, 0\}$ for $a \in \mathbb{R}$. The function $H\_b(x) = -x \log x - (1-x)\log(1-x)$ is the binary entropy function, where logarithms are to base 2. A binary symmetric channel (BSC) with crossover probability $\alpha$ is denoted by BSC($\alpha$). $X \sim \text{Bern}(\beta)$ with $\mathcal{X} = \{0, 1\}$ is a binary random variable with $\Pr[X = 1] = \beta$. The $*$ operator represents $p * q = (1 - 2q)p + q$. The function $Q(\cdot)$ denotes the complementary cumulative distribution function of the standard Gaussian distribution, and $\text{sgn}(\cdot)$ represents the signum function.
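For the numerical evaluations in later sections, these conventions can be collected into small helper functions. This is an illustrative sketch; the function names are ours, not from the paper.

```python
import math

def Hb(x: float) -> float:
    """Binary entropy function H_b(x) in bits, with H_b(0) = H_b(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def star(p: float, q: float) -> float:
    """Binary convolution operator: p * q = (1 - 2q)p + q."""
    return (1 - 2 * q) * p + q

def Q(x: float) -> float:
    """Complementary CDF of the standard Gaussian distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))
```

Note that $p * q$ is symmetric in its arguments, as expected for the crossover probability of two concatenated BSCs.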

### **2. System Model**

We consider the lossy source coding model with one encoder, one decoder, and an eavesdropper (Eve), depicted in Figure 1. The encoder Enc(·, ·) observes a noisy measurement $\widetilde{X}^n$ of an i.i.d. remote source $X^n \sim P\_X^n$ through a memoryless channel $P\_{\widetilde{X}|X}$, in addition to a private key $K \in [1 : 2^{nR\_0}]$. The encoder output is an index $W$ that is sent over a link with limited communication rate. The decoder Dec(·, ·, ·) observes index $W$, private key $K$, and another noisy measurement $Y^n$ of the same remote source $X^n$ through another memoryless channel $P\_{YZ|X}$ in order to estimate $\widetilde{X}^n$ as $\widehat{\widetilde{X}}{}^n$. The other noisy output $Z^n$ of $P\_{YZ|X}$ is observed by Eve in addition to index $W$. Assume $K$ is uniformly distributed, hidden from Eve, and independent of the source output and its noisy measurements. The source and measurement alphabets are finite sets.

**Figure 1.** Source coding with noisy measurements $(\widetilde{X}^n, Y^n)$ of a remote source $X^n$ and with a uniform private key $K$ under privacy, secrecy, communication, and distortion constraints.

We next define the rate region for the lossy secure and private source coding problem defined above.

**Definition 1.** A *lossy* tuple $(R\_w, R\_s, R\_\ell, D) \in \mathbb{R}\_{\ge 0}^4$ is achievable given a private key with rate $R\_0 \ge 0$ if, for any $\delta > 0$, there exist $n \ge 1$, an encoder, and a decoder such that

$$\frac{1}{n}\log\left|\mathcal{W}\right| \le R\_w + \delta \qquad \text{(storage)} \tag{1}$$

$$\frac{1}{n}I(\widetilde{X}^n; W|Z^n) \le R\_s + \delta \qquad \text{(secrecy)} \tag{2}$$

$$\frac{1}{n}I(X^n; W|Z^n) \le R\_\ell + \delta \qquad \text{(privacy)} \tag{3}$$

$$\mathbb{E}\left[d\left(\widetilde{X}^n, \widehat{\widetilde{X}}{}^n(Y^n, W, K)\right)\right] \le D + \delta \qquad \text{(distortion)} \tag{4}$$

where *<sup>d</sup>*(*x*H*n*, <sup>G</sup>*x*H*n*) = <sup>1</sup> *<sup>n</sup>* <sup>∑</sup>*<sup>n</sup> <sup>i</sup>*=<sup>1</sup> *<sup>d</sup>*(*x*H*i*, *<sup>x</sup>*HF *<sup>i</sup>*) is a per-letter bounded distortion metric. The *lossy* secure and private source coding region R<sup>D</sup> is the closure of the set of all achievable lossy tuples. ♦

In (2) and (3), we consider conditional mutual information terms to take account of the unavoidable secrecy and privacy leakages due to Eve's side information, i.e., $I(\widetilde{X}^n; Z^n)$ and $I(X^n; Z^n)$, respectively; see also [21,32]. Furthermore, we consider conditional mutual information terms rather than the corresponding conditional entropy terms used in [6,14,33–35] to characterize the secrecy and privacy leakages, which simplifies our analysis.

We next define the rate region for the lossless secure and private source coding problem.

**Definition 2.** A *lossless* tuple $(R\_w, R\_s, R\_\ell) \in \mathbb{R}\_{\ge 0}^3$ is achievable given a private key with rate $R\_0 \ge 0$ if, for any $\delta > 0$, there exist $n \ge 1$, an encoder, and a decoder such that we have (1)–(3) and

$$\Pr\left[\widetilde{X}^n \neq \widehat{\widetilde{X}}{}^n(Y^n, W, K)\right] \le \delta \qquad \text{(reliability)} \tag{5}$$

The *lossless* secure and private source coding region R is the closure of the set of all achievable lossless tuples. ♦

### **3. Secure and Private Source Coding Regions**

*3.1. Lossy Source Coding*

The lossy secure and private source coding region $\mathcal{R}\_D$ is characterized below; see Section 6 for its proof.


**Theorem 1.** *For given $P\_X$, $P\_{\widetilde{X}|X}$, $P\_{YZ|X}$, and $R\_0$, the region $\mathcal{R}\_D$ is the set of all rate tuples $(R\_w, R\_s, R\_\ell, D)$ satisfying*

$$R\_w \ge I(U; \widetilde{X}|Y) \tag{6}$$

*and if $R\_0 < I(U; \widetilde{X}|Y, V)$, then*

$$R\_s \ge I(U; \widetilde{X}|Z) + R' - R\_0 \tag{7}$$

$$R\_\ell \ge I(U; X|Z) + R' - R\_0 \tag{8}$$

*where we have*

$$R' = \left[I(U; Z|V, Q) - I(U; Y|V, Q)\right]^{-} \tag{9}$$

*and if $I(U; \widetilde{X}|Y, V) \le R\_0 < I(U; \widetilde{X}|Y)$, then*

$$R\_s \ge I(V; \widetilde{X}|Z) \tag{10}$$

$$R\_{\ell} \ge I(V; X|Z) \tag{11}$$

*and if $R\_0 \ge I(U; \widetilde{X}|Y)$, then*

$$R\_s \ge 0 \tag{12}$$

$$R\_\ell \ge 0 \tag{13}$$

*for some*

$$P\_{QVU\widetilde{X}XYZ} = P\_{Q|V} P\_{V|U} P\_{U|\widetilde{X}} P\_{\widetilde{X}|X} P\_X P\_{YZ|X} \tag{14}$$

*such that* $\mathbb{E}\left[d\left(\widetilde{X}, \widehat{\widetilde{X}}(U, Y)\right)\right] \le D$ *for some reconstruction function* $\widehat{\widetilde{X}}(U, Y)$. *The region $\mathcal{R}\_D$ is convexified by using the time-sharing random variable $Q$, required due to the $[\cdot]^-$ operation. One can limit the cardinalities to $|\mathcal{Q}| \le 2$, $|\mathcal{V}| \le |\widetilde{\mathcal{X}}| + 3$, and $|\mathcal{U}| \le (|\widetilde{\mathcal{X}}| + 3)^2$.*

We remark that (12) and (13) show that one can simultaneously achieve *strong secrecy* and *strong privacy*, i.e., the conditional mutual information terms in (2) and (3), respectively, are negligible, by using a large private key *K*, which is a result missing in some recent works on secure source coding with a private key.
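The three private-key regimes of Theorem 1 can be organized as a small case analysis. The sketch below is only illustrative: the mutual information quantities are supplied as precomputed numbers (they depend on the chosen auxiliary random variables), and the function returns the corresponding lower bounds on $(R\_s, R\_\ell)$; the function and argument names are ours.

```python
def leakage_lower_bounds(R0, I_UXt_YV, I_UXt_Y, I_UXt_Z, I_UX_Z,
                         R_prime, I_VXt_Z, I_VX_Z):
    """Case analysis of Theorem 1 for a private key of rate R0.

    Arguments are numeric stand-ins for, e.g., I(U; Xt|Y,V) -> I_UXt_YV,
    where Xt denotes the noisy measurement X-tilde; R_prime is the
    nonpositive term R' defined in (9). Returns lower bounds (R_s, R_l);
    negative values are clipped to 0 since rates are nonnegative.
    """
    if R0 < I_UXt_YV:            # small key: leakage reduced by R0
        return (max(I_UXt_Z + R_prime - R0, 0.0),
                max(I_UX_Z + R_prime - R0, 0.0))
    if R0 < I_UXt_Y:             # intermediate key
        return (I_VXt_Z, I_VX_Z)
    return (0.0, 0.0)            # large key: strong secrecy and privacy
```

For $R\_0 \ge I(U; \widetilde{X}|Y)$ the function returns $(0, 0)$, matching the strong secrecy and strong privacy observation in (12) and (13).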

### *3.2. Lossless Source Coding*

The lossless secure and private source coding region $\mathcal{R}$ is characterized next; see below for a proof sketch.

**Proposition 1.** *For given $P\_X$, $P\_{\widetilde{X}|X}$, $P\_{YZ|X}$, and $R\_0$, the region $\mathcal{R}$ is the set of all rate tuples $(R\_w, R\_s, R\_\ell)$ satisfying*

$$R\_w \ge H(\widetilde{X}|Y) \tag{15}$$

*and if $R\_0 < H(\widetilde{X}|Y, V)$, then*

$$R\_s \ge H(\widetilde{X}|Z) + R'' - R\_0 \tag{16}$$

$$R\_\ell \ge I(\widetilde{X}; X|Z) + R'' - R\_0 \tag{17}$$

*where we have*

$$R'' = \left[I(\widetilde{X}; Z|V, Q) - I(\widetilde{X}; Y|V, Q)\right]^{-} \tag{18}$$

*and if $H(\widetilde{X}|Y, V) \le R\_0 < H(\widetilde{X}|Y)$, then*

$$R\_s \ge I(V; \widetilde{X}|Z) \tag{19}$$

$$R\_\ell \ge I(V; X|Z) \tag{20}$$

*and if $R\_0 \ge H(\widetilde{X}|Y)$, then*

$$R\_s \ge 0\tag{21}$$

$$R\_{\ell} \ge 0 \tag{22}$$

*for some*

$$P\_{QV\widetilde{X}XYZ} = P\_{Q|V} P\_{V|\widetilde{X}} P\_{\widetilde{X}|X} P\_X P\_{YZ|X}. \tag{23}$$

*One can limit the cardinalities to $|\mathcal{Q}| \le 2$ and $|\mathcal{V}| \le |\widetilde{\mathcal{X}}| + 2$.*

**Proof Sketch.** The proof for the lossless region $\mathcal{R}$ follows from the proof for the lossy region $\mathcal{R}\_D$, given in Theorem 1 above, by choosing $U = \widetilde{X}$, such that we have the reconstruction function $\widehat{\widetilde{X}}(\widetilde{X}, Y) = \widetilde{X}$, so we achieve $D = 0$. Thus, the reliability constraint in (5) is satisfied because $d(\cdot, \cdot)$ is a distortion metric.

### **4. Gaussian Sources and Additive Gaussian Noise Channels**

We evaluate the lossy rate region for a Gaussian example with squared error distortion by finding the optimal auxiliary random variable in the corresponding rate region. Consider a special lossy source coding case in which (*i*) there is no private key and (*ii*) the eavesdropper's channel observation $Z^n$ is less noisy than the decoder's channel observation $Y^n$, such that we obtain a lossy source coding region with a single auxiliary random variable that should be optimized.

We next define less noisy channels, considering $P\_{YZ|X}$.

**Definition 3** ([36])**.** *Z* (or eavesdropper) is *less noisy* than *Y* (or decoder) if

$$I(L; Z) \ge I(L; Y) \tag{24}$$

holds for any random variable *L*, such that *L* − *X* − (*Y*, *Z*) form a Markov chain. ♦

**Corollary 1.** *For given $P\_X$, $P\_{\widetilde{X}|X}$, $P\_{YZ|X}$, and $R\_0 = 0$, the region $\mathcal{R}\_D$ when the eavesdropper is less noisy than the decoder is the set of all rate tuples $(R\_w, R\_s, R\_\ell, D)$ satisfying*

$$R\_w \ge I(U; \widetilde{X}|Y) = I(U; \widetilde{X}) - I(U; Y) \tag{25}$$

$$R\_s \ge I(U; \widetilde{X}|Z) = I(U; \widetilde{X}) - I(U; Z) \tag{26}$$

$$R\_\ell \ge I(U; X|Z) = I(U; X) - I(U; Z) \tag{27}$$

*for some*

$$P\_{U\widetilde{X}XYZ} = P\_{U|\widetilde{X}} P\_{\widetilde{X}|X} P\_X P\_{YZ|X} \tag{28}$$

*such that* $\mathbb{E}\left[d\left(\widetilde{X}, \widehat{\widetilde{X}}(U, Y)\right)\right] \le D$ *for some reconstruction function* $\widehat{\widetilde{X}}(U, Y)$. *One can limit the cardinality to $|\mathcal{U}| \le |\widetilde{\mathcal{X}}| + 3$.*

**Proof Sketch.** The proof for Corollary 1 follows from the proof for Theorem 1 by considering the bounds in (6)–(8), since $R\_0 = 0$. Furthermore, $R'$ defined in (9) is 0 under the less noisy condition considered, which follows because $(Q, V) - U - \widetilde{X} - X - (Y, Z)$ form a Markov chain.

Suppose the following scalar discrete-time Gaussian source and channel model for the lossy source coding problem depicted in Figure 1:

$$X = \rho\_x \widetilde{X} + N\_x \tag{29}$$

$$Y = \rho\_y X + N\_y \tag{30}$$

$$Z = \rho\_z X + N\_z \tag{31}$$

where we have remote source $X \sim \mathcal{N}(0, 1)$ and noisy measurement $\widetilde{X} \sim \mathcal{N}(0, 1)$, fixed correlation coefficients $\rho\_x, \rho\_y, \rho\_z \in (-1, 1)$, and additive Gaussian noise random variables

$$N\_{\mathbf{x}} \sim \mathcal{N}\left(0, 1 - \rho\_{\mathbf{x}}^2\right) \tag{32}$$

$$N\_y \sim \mathcal{N}(0, 1 - \rho\_y^2) \tag{33}$$

$$N\_z \sim \mathcal{N}(0, 1 - \rho\_z^2) \tag{34}$$

such that $(\widetilde{X}, N\_x, N\_y, N\_z)$ are mutually independent, and we consider the squared error distortion, i.e., $d(\tilde{x}, \hat{\tilde{x}}) = (\tilde{x} - \hat{\tilde{x}})^2$. Note that (29) is an inverse measurement channel $P\_{X|\widetilde{X}}$ that is a weighted sum of two independent Gaussian random variables, imposed to be able to apply the conditional entropy power inequality (EPI) [37] (Lemma II); see [20] (Theorem 3) and [38] (Section V) for binary symmetric inverse channel assumptions imposed to apply Mrs. Gerber's lemma [39]. Suppose $|\rho\_z| > |\rho\_y|$, such that $Y$ is stochastically degraded with respect to $Z$, since then there exists a random variable $\widetilde{Y}$ such that $P\_{\widetilde{Y}|X} = P\_{Y|X}$ and $P\_{\widetilde{Y}Z|X} = P\_{Z|X} P\_{\widetilde{Y}|Z}$ [40] (Lemma 6), so $Z$ is also less noisy than $Y$, since less noisy channels constitute a strict superset of the set of stochastically degraded channels and both channel sets consider only the conditional marginal probability distributions [2] (p. 121).
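As a sanity check on the model (29)–(31), the joint law can be simulated: all variables are (approximately) standard Gaussian marginally, and the empirical correlation $\mathbb{E}[XZ]$ should be close to $\rho\_z$. This simulation is only illustrative and not part of the paper's derivation; the parameter values below are arbitrary choices with $|\rho\_z| > |\rho\_y|$.

```python
import math
import random

random.seed(42)
rho_x, rho_y, rho_z = 0.9, 0.6, 0.8   # arbitrary example values, |rho_z| > |rho_y|
n = 200_000

def sample():
    # Inverse measurement channel (29): Xt ~ N(0,1), X = rho_x*Xt + Nx,
    # then the measurement channels (30) and (31) act on X.
    xt = random.gauss(0.0, 1.0)
    x = rho_x * xt + random.gauss(0.0, math.sqrt(1 - rho_x**2))
    y = rho_y * x + random.gauss(0.0, math.sqrt(1 - rho_y**2))
    z = rho_z * x + random.gauss(0.0, math.sqrt(1 - rho_z**2))
    return xt, x, y, z

data = [sample() for _ in range(n)]
var_x = sum(x * x for _, x, _, _ in data) / n      # should be close to 1
var_z = sum(z * z for _, _, _, z in data) / n      # should be close to 1
corr_xz = sum(x * z for _, x, _, z in data) / n    # should be close to rho_z
```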

We next take the liberty to use the lossy rate region in Corollary 1, characterized for discrete memoryless channels, for the model in (29)–(31). This is common in the literature since there is a discretization procedure to extend the achievability proof to well-behaved continuous-alphabet random variables and the converse proof applies to arbitrary random variables; see [2] (Remark 3.8). For Gaussian sources and channels, we use differential entropy and eliminate the cardinality bound on the auxiliary random variable. The lossy source coding region for the model in (29)–(31) without a private key is given below.

**Proposition 2.** *For the model in (29)–(31), such that $|\rho\_z| > |\rho\_y|$ and $R\_0 = 0$, the region $\mathcal{R}\_D$ with squared error distortion is the set of all rate tuples $(R\_w, R\_s, R\_\ell, D)$ satisfying, for $\alpha \in (0, 1]$,*

$$R\_w \ge \frac{1}{2} \log \left( \frac{1 - \rho\_x^2 \rho\_y^2 (1 - \alpha)}{\alpha} \right) \tag{35}$$

$$R\_s \ge \frac{1}{2} \log \left( \frac{1 - \rho\_x^2 \rho\_z^2 (1 - \alpha)}{\alpha} \right) \tag{36}$$

$$R\_\ell \ge \frac{1}{2} \log \left( \frac{1 - \rho\_x^2 \rho\_z^2 (1 - \alpha)}{1 - \rho\_x^2 (1 - \alpha)} \right) \tag{37}$$

$$D \ge \frac{\alpha \left(1 - \rho\_x^2 \rho\_y^2\right)}{1 - \rho\_x^2 \rho\_y^2 \left(1 - \alpha\right)}.\tag{38}$$
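The bounds (35)–(38) are closed-form in $\alpha$, so the boundary of the region can be traced numerically by sweeping $\alpha$. A minimal sketch (logarithms to base 2, consistent with the notation section; the function name is ours):

```python
import math

def prop2_bounds(rho_x, rho_y, rho_z, alpha):
    """Right-hand sides of (35)-(38) for a given alpha in (0, 1]."""
    a, rx2 = alpha, rho_x**2
    Rw = 0.5 * math.log2((1 - rx2 * rho_y**2 * (1 - a)) / a)
    Rs = 0.5 * math.log2((1 - rx2 * rho_z**2 * (1 - a)) / a)
    Rl = 0.5 * math.log2((1 - rx2 * rho_z**2 * (1 - a)) / (1 - rx2 * (1 - a)))
    D = a * (1 - rx2 * rho_y**2) / (1 - rx2 * rho_y**2 * (1 - a))
    return Rw, Rs, Rl, D
```

Sweeping $\alpha$ trades distortion against the rates: at $\alpha = 1$ the three rate bounds vanish and the distortion bound reaches its maximum $1 - \rho\_x^2\rho\_y^2$, while $|\rho\_z| > |\rho\_y|$ makes the secrecy bound never exceed the storage bound.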

**Proof Sketch.** For the achievability proof, let $U \sim \mathcal{N}(0, 1-\alpha)$ and $\Theta \sim \mathcal{N}(0, \alpha)$, as in [41] (Equation (32)) and [42] (Appendix B), be independent random variables for some $\alpha \in (0, 1]$ such that $\widetilde{X} = U + \Theta$ and $U - \widetilde{X} - X - (Y, Z)$ form a Markov chain. Choose the reconstruction function $\widehat{\widetilde{X}}(U, Y)$ as the minimum mean square error (MMSE) estimator, and given any fixed $D > 0$, the auxiliary random variables are chosen such that the distortion constraint is satisfied. We then have, for the squared error distortion,

$$D = \mathbb{E}\left[\left(\widetilde{X} - \widehat{\widetilde{X}}(U, Y)\right)^2\right] \stackrel{(a)}{=} \frac{1}{2\pi e} e^{2h(\widetilde{X}|U, Y)} \tag{39}$$

where equality in (*a*) is achieved because $\widetilde{X}$ is Gaussian and the reconstruction function is the MMSE estimator [43] (Theorem 8.6.6). Define the covariance matrices of the vector random variables $[\widetilde{X}, U, Y]$ and $[U, Y]$ as $\mathbf{K}\_{\widetilde{X}UY}$ and $\mathbf{K}\_{UY}$, respectively. We then have

$$h(\widetilde{X}|U, Y) = h(\widetilde{X}, U, Y) - h(U, Y) = \frac{1}{2} \log \left( 2\pi e \frac{\det(\mathbf{K}\_{\widetilde{X}UY})}{\det(\mathbf{K}\_{UY})} \right) \tag{40}$$

where det(·) is the determinant of a matrix; see also [12] (Section F). Combining (39) and (40), and calculating the determinants, we obtain

$$D = \frac{\alpha \left(1 - \rho\_x^2 \rho\_y^2\right)}{1 - \rho\_x^2 \rho\_y^2 (1 - \alpha)}. \tag{41}$$

One can also show that

$$I(U; \widetilde{X}) = h(\widetilde{X}) - h(\widetilde{X}|U) = \frac{1}{2} \log \left(\frac{1}{\alpha}\right) \tag{42}$$

$$I(U; X) = h(X) - h(X|U) = \frac{1}{2} \log \left( \frac{1}{1 - \rho\_x^2 (1 - \alpha)} \right) \tag{43}$$

$$I(U; Y) = h(Y) - h(Y|U) = \frac{1}{2} \log \left(\frac{1}{1 - \rho\_x^2 \rho\_y^2 (1 - \alpha)}\right) \tag{44}$$

$$I(U; Z) = h(Z) - h(Z|U) = \frac{1}{2} \log \left( \frac{1}{1 - \rho\_x^2 \rho\_z^2 (1 - \alpha)} \right). \tag{45}$$

Thus, by calculating (25)–(27), the achievability proof follows.

For the converse proof, one can first show that

$$I(U; \widetilde{X}) - I(U; Y) = h(Y|U) - h(\widetilde{X}|U) \tag{46}$$

$$I(U; \widetilde{X}) - I(U; Z) = h(Z|U) - h(\widetilde{X}|U) \tag{47}$$

$$I(U; X) - I(U; Z) = h(Z|U) - h(X|U) \tag{48}$$

which follow since $h(\widetilde{X}) = h(X) = h(Y) = h(Z)$. Suppose

$$h(\widetilde{X}|U) = \frac{1}{2}\log(2\pi e \alpha) \tag{49}$$

for any *α* ∈ (0, 1] that represents the unique variance of a Gaussian random variable; see [20] (Lemma 2) for a similar result applied to binary random variables. Thus, by applying the conditional EPI, we obtain

$$\begin{split} e^{2h(Y|U)} & \stackrel{(a)}{=} e^{2h(\rho\_x\rho\_y \widetilde{X}|U)} + e^{2h(\rho\_y N\_x + N\_y)} \\ &= 2\pi e \left(\rho\_x^2 \rho\_y^2 \alpha + \rho\_y^2 (1 - \rho\_x^2) + 1 - \rho\_y^2 \right) \\ &= 2\pi e \left(1 - \rho\_x^2 \rho\_y^2 (1 - \alpha) \right) \end{split} \tag{50}$$

where (*a*) follows because $U - \widetilde{X} - (N\_x, N\_y)$ form a Markov chain and $(N\_x, N\_y)$ are independent of $\widetilde{X}$, so $(N\_x, N\_y)$ are independent of $U$, and equality is satisfied since, given $U$, $\rho\_x\rho\_y\widetilde{X}$ and $(\rho\_y N\_x + N\_y)$ are conditionally independent and they are Gaussian random variables, as imposed in (49) above; see [20] (Lemma 1 and Equation (28)) for a similar result applied to binary random variables by extending Mrs. Gerber's lemma. Similarly, we have

$$e^{2h(Z|U)} = 2\pi e \left(1 - \rho\_x^2 \rho\_z^2 (1 - \alpha)\right) \tag{51}$$

which follows by replacing $(Y, \rho\_y, N\_y)$ with $(Z, \rho\_z, N\_z)$ in (50), respectively, because the channel $P\_{Y|U}$ can be mapped to $P\_{Z|U}$ with these changes due to (29)–(31) and the Markov chain relation $U - \widetilde{X} - X - (Y, Z)$. Furthermore, we have

$$\begin{split} e^{2h(X|U)} & \stackrel{(a)}{=} e^{2h(\rho\_x \widetilde{X}|U)} + e^{2h(N\_x)} \\ &= 2\pi e \left(\rho\_x^2 \alpha + 1 - \rho\_x^2\right) \\ &= 2\pi e \left(1 - \rho\_x^2 (1 - \alpha)\right) \end{split} \tag{52}$$

where (*a*) follows because $N\_x$ is independent of $U$, and equality is achieved since, given $U$, $\rho\_x\widetilde{X}$ and $N\_x$ are conditionally independent and are Gaussian random variables. Therefore, by applying (46)–(52) to (25)–(27), the converse proof for (35)–(37) follows.

Next, consider

$$\begin{split} h(\widetilde{X}|U, Y) &= -I(U; \widetilde{X}|Y) + h(\widetilde{X}|Y) \\ & \stackrel{(a)}{=} -h(Y|U) + h(\widetilde{X}|U) + h(Y|\widetilde{X}) \\ & \stackrel{(b)}{=} \frac{1}{2} \log\left(\frac{\alpha}{1-\rho\_x^2 \rho\_y^2 (1-\alpha)}\right) + h(\rho\_x \rho\_y \widetilde{X} + \rho\_y N\_x + N\_y|\widetilde{X}) \\ & \stackrel{(c)}{=} \frac{1}{2} \log\left(\frac{\alpha}{1-\rho\_x^2 \rho\_y^2 (1-\alpha)}\right) + h(\rho\_y N\_x + N\_y) \\ &= \frac{1}{2} \log\left(2\pi e \frac{\alpha\left(\rho\_y^2 (1-\rho\_x^2) + (1-\rho\_y^2)\right)}{1-\rho\_x^2 \rho\_y^2 (1-\alpha)}\right) \\ &= \frac{1}{2} \log\left(2\pi e \frac{\alpha\left(1-\rho\_x^2 \rho\_y^2\right)}{1-\rho\_x^2 \rho\_y^2 (1-\alpha)}\right) \end{split} \tag{53}$$

where (*a*) follows by (25) and (46) and since $h(Y) = h(\widetilde{X})$, (*b*) follows by (49) and (50), and (*c*) follows because $(N\_x, N\_y)$ are independent of $\widetilde{X}$. Furthermore, for any random variable $\widetilde{X}$ and reconstruction function $\widehat{\widetilde{X}}(U, Y)$, we have [43] (Theorem 8.6.6)

$$\mathbb{E}\left[\left(\widetilde{X} - \widehat{\widetilde{X}}(U, Y)\right)^{2}\right] \geq \frac{1}{2\pi e} e^{2h(\widetilde{X}|U, Y)}. \tag{54}$$

Combining the distortion constraint given in Corollary 1 with (53) and (54), the converse proof for (38) follows.

### **5. Multiple Binary-input Additive Gaussian Noise Channels**

Consider next a binary remote source $X \in \{-1, 1\}$ and its binary noisy measurement $\widetilde{X} \in \{-1, 1\}$ observed by the encoder, which represents a practical setting with binary quantizations. For instance, a static random-access memory (SRAM) start-up output at a nominal temperature is a binary value obtained by quantizing sums of Gaussian random variables [28,44]. Suppose the noisy channel $P\_{YZ|X}$ outputs consist of a single discrete-time additive Gaussian noise channel output $Y$ observed by the decoder and two independent discrete-time additive Gaussian noise channel outputs $\mathbf{Z} = [Z\_1, Z\_2]^T$ observed by the eavesdropper, in which the eavesdropper obtains more information by measuring the remote source twice. Furthermore, assume that $X$ is uniformly distributed, the binary channel $P\_{\widetilde{X}|X}$ is symmetric such that $\Pr[\widetilde{X} \neq X] = p$ for $p \in [0, 1]$, and we also have

$$Y = \rho\_y X + N\_y \tag{55}$$

$$\mathbf{Z} = \begin{bmatrix} Z\_1 \\ Z\_2 \end{bmatrix} = \rho\_z X \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} N\_{z\_1} \\ N\_{z\_2} \end{bmatrix} \tag{56}$$

where we have fixed correlation coefficients *ρy*, *ρ<sup>z</sup>* ∈ (−1, 1) and additive Gaussian noise random variables

$$N\_y \sim \mathcal{N}(0, 1 - \rho\_y^2) \tag{57}$$

$$N\_{z\_1} \sim \mathcal{N}(0, 1 - \rho\_z^2) \tag{58}$$

$$N\_{z\_2} \sim \mathcal{N}(0, 1 - \rho\_z^2) \tag{59}$$

such that $(X, N\_y, N\_{z\_1}, N\_{z\_2})$ are mutually independent. Consider the Hamming distortion, i.e., $d(\tilde{x}, \hat{\tilde{x}}) = \mathbb{1}\{\tilde{x} \neq \hat{\tilde{x}}\}$. Impose the condition $|\rho\_z| > |\rho\_y|$ such that $Z\_1$ and $Z\_2$ are less noisy than $Y$, so $\mathbf{Z}$ is also less noisy than $Y$, which follows by applying steps similar to those in Section 4. Thus, for $R\_0 = 0$, the region $\mathcal{R}\_D$ characterized in Corollary 1 is also valid for such binary-input additive Gaussian noise channels when one replaces $Z$ with $\mathbf{Z}$. A computable achievable lossy secure and private source coding region for such channels is given next.

**Proposition 3.** *For the setting with multiple binary-input additive Gaussian noise channels defined above, such that $|\rho\_z| > |\rho\_y|$ and $R\_0 = 0$, the region $\mathcal{R}\_D$ with Hamming distortion includes the set of all rate tuples $(R\_w, R\_s, R\_\ell, D)$ satisfying, for an independent random variable $C \sim \text{Bern}(p * q)$ with any $q \in [0, 0.5]$ and for any $\lambda \in [0, 1]$,*

$$R\_w \ge \lambda \left(1 - H\_b(q) - h(\rho\_y X + N\_y) + h(\rho\_y (1 - 2C) + N\_y)\right) \tag{60}$$

$$R\_s \ge \lambda \left( 1 - H\_b(q) - h \left( \begin{bmatrix} \rho\_z X + N\_{z\_1} \\ \rho\_z X + N\_{z\_2} \end{bmatrix} \right) + h \left( \begin{bmatrix} \rho\_z (1 - 2\mathcal{C}) + N\_{z\_1} \\ \rho\_z (1 - 2\mathcal{C}) + N\_{z\_2} \end{bmatrix} \right) \right) \tag{61}$$

$$R\_{\ell} \ge \lambda \left( 1 - H\_b(p \ast q) - h \left( \begin{bmatrix} \rho\_z X + N\_{z\_1} \\ \rho\_z X + N\_{z\_2} \end{bmatrix} \right) + h \left( \begin{bmatrix} \rho\_z (1 - 2\mathcal{C}) + N\_{z\_1} \\ \rho\_z (1 - 2\mathcal{C}) + N\_{z\_2} \end{bmatrix} \right) \right) \tag{62}$$


$$D \ge \lambda q + (1 - \lambda) \left( p \ast Q \left( \frac{\rho\_y}{\sqrt{1 - \rho\_y^2}} \right) \right) \tag{63}$$

*where the random variable $Y = \rho\_y X + N\_y$ has pdf*

$$\frac{1}{2} \frac{\left( e^{-\frac{(y + \rho\_y)^2}{2(1 - \rho\_y^2)}} + e^{-\frac{(y - \rho\_y)^2}{2(1 - \rho\_y^2)}} \right)}{\sqrt{2\pi(1 - \rho\_y^2)}} \tag{64}$$

*the random variable $\bar{Y} = \rho\_y(1 - 2C) + N\_y$ has pdf*

$$(p\*q)\frac{e^{-\frac{(\bar{y}+\rho\_y)^2}{2(1-\rho\_y^2)}}}{\sqrt{2\pi(1-\rho\_y^2)}}+(1-(p\*q))\frac{e^{-\frac{(\bar{y}-\rho\_y)^2}{2(1-\rho\_y^2)}}}{\sqrt{2\pi(1-\rho\_y^2)}}\tag{65}$$

*the vector random variable $\mathbf{Z} = [Z\_1, Z\_2]^T = [\rho\_z X + N\_{z\_1}, \rho\_z X + N\_{z\_2}]^T$ has joint pdf*

$$\frac{1}{2} \frac{e^{-\frac{(z\_1 + \rho\_z)^2 + (z\_2 + \rho\_z)^2}{2(1 - \rho\_z^2)}} + e^{-\frac{(z\_1 - \rho\_z)^2 + (z\_2 - \rho\_z)^2}{2(1 - \rho\_z^2)}}}{2\pi (1 - \rho\_z^2)} \tag{66}$$

*and the vector random variable $\bar{\mathbf{Z}} = [\bar{Z}\_1, \bar{Z}\_2]^T = [\rho\_z(1 - 2C) + N\_{z\_1}, \rho\_z(1 - 2C) + N\_{z\_2}]^T$ has joint pdf*

$$(p\*q)\frac{e^{-\frac{(\bar{z}\_1+\rho\_z)^2+(\bar{z}\_2+\rho\_z)^2}{2(1-\rho\_z^2)}}}{2\pi(1-\rho\_z^2)} + (1-(p\*q))\frac{e^{-\frac{(\bar{z}\_1-\rho\_z)^2+(\bar{z}\_2-\rho\_z)^2}{2(1-\rho\_z^2)}}}{2\pi(1-\rho\_z^2)}. \tag{67}$$
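The distortion bound (63) mixes the quantization noise $q$ with a hard-decision error term $p * Q\big(\rho\_y/\sqrt{1-\rho\_y^2}\big)$, and it can be evaluated directly. A sketch under the paper's definitions of the $*$ operator and $Q(\cdot)$; the function names are ours:

```python
import math

def gauss_Q(x):
    """Standard Gaussian complementary CDF Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def bconv(p, q):
    """Binary convolution p * q = (1 - 2q)p + q."""
    return (1 - 2 * q) * p + q

def distortion_bound(p, q, lam, rho_y):
    """Right-hand side of (63) for time-sharing parameter lam in [0, 1]."""
    hard_decision_error = bconv(p, gauss_Q(rho_y / math.sqrt(1 - rho_y**2)))
    return lam * q + (1 - lam) * hard_decision_error
```

At $\lambda = 1$ the bound reduces to $q$, and at $\lambda = 0$ it reduces to the term $p * Q\big(\rho\_y/\sqrt{1-\rho\_y^2}\big)$.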

**Proof.** We first evaluate (25)–(27) by choosing a binary uniformly distributed $U$ and a channel $P\_{\widetilde{X}|U}$ such that $\Pr[\widetilde{X} \neq U] = q$ for any $q \in [0, 0.5]$. We have

$$I(U; \tilde{X}) = H(\tilde{X}) - H(\tilde{X}|U) \stackrel{(a)}{=} 1 - H_b(q) \tag{68}$$

$$I(U; X) = H(X) - H(X|U) \stackrel{(b)}{=} 1 - H_b(p\*q) \tag{69}$$

where $(a)$ and $(b)$ follow by relabeling the input and output symbols to represent the channels $P_{\tilde{X}|U}$ and $P_{X|\tilde{X}}$ as BSC$(q)$ and BSC$(p)$, respectively, which follows since entropy is preserved under a bijective mapping for discrete random variables. For relabeled symbols, the channel $P_{X|U}$ is a BSC$(p\*q)$ since it is a concatenation of two BSCs, so denote the independent random noise component in this channel as $C \sim \mathrm{Bern}(p\*q)$. Then, we obtain

$$h(Y|U) = h(\rho_y X + N_y|U) \stackrel{(a)}{=} h(\rho_y(1 - 2C) + N_y) = h(\bar{Y}) \tag{70}$$

where $(a)$ follows since symbols $\{-1, 1\}$ correspond to the antipodal modulation of binary symbols, and since $(C, N_y, U)$ are mutually independent. One can compute (70) numerically by using the pdf

$$p_{\bar{Y}}(\bar{y}) = \sum_{c=0}^{1} P_C(c)\, p_{\bar{Y}|C}(\bar{y}|c) = (p\*q) \frac{e^{-\frac{(\bar{y}+\rho_y)^2}{2(1-\rho_y^2)}}}{\sqrt{2\pi(1-\rho_y^2)}} + (1-(p\*q)) \frac{e^{-\frac{(\bar{y}-\rho_y)^2}{2(1-\rho_y^2)}}}{\sqrt{2\pi(1-\rho_y^2)}}.\tag{71}$$
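The discrete terms (68)–(69) and the differential entropy in (70) can be evaluated numerically. Below is a minimal sketch (the helper names `star`, `hb`, `h_bar_y` and all parameter values are ours, not from the text): it computes the binary convolution $p\*q$, the rates in (68)–(69), and $h(\bar{Y})$ by integrating the mixture pdf (71) on a fine grid.

```python
import math
import numpy as np

def star(p, q):
    """Binary convolution p*q = p(1-q) + q(1-p): crossover of two concatenated BSCs."""
    return p * (1 - q) + q * (1 - p)

def hb(p):
    """Binary entropy H_b(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def h_bar_y(pq, rho, grid=200001, lim=12.0):
    """Differential entropy (nats) of the Gaussian-mixture pdf (71), by grid integration."""
    y = np.linspace(-lim, lim, grid)
    var = 1.0 - rho ** 2
    g = lambda m: np.exp(-(y - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    pdf = pq * g(-rho) + (1 - pq) * g(rho)   # weight pq on the component centred at -rho
    integrand = np.where(pdf > 0, -pdf * np.log(pdf), 0.0)
    return integrand.sum() * (y[1] - y[0])

p, q = 0.05, 0.1                     # illustrative values
print(1 - hb(q))                     # I(U; X~) as in (68)
print(1 - hb(star(p, q)))            # I(U; X) as in (69)
print(h_bar_y(star(p, q), 0.6))      # h(Y-bar) in (70), via the pdf (71)
```

As a sanity check, with mixture weight $0$ the pdf collapses to a single Gaussian of variance $1-\rho_y^2$, whose differential entropy $\frac{1}{2}\ln(2\pi e(1-\rho_y^2))$ the grid integration reproduces.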

Similarly, we can compute

$$h(Y) = h(\rho_y X + N_y) \tag{72}$$

numerically by using the pdf

$$p_Y(y) = \sum_{x \in \{-1, 1\}} P_X(x)\, p_{Y|X}(y|x) = \frac{1}{2} \frac{\left( e^{-\frac{(y+\rho_y)^2}{2(1-\rho_y^2)}} + e^{-\frac{(y-\rho_y)^2}{2(1-\rho_y^2)}} \right)}{\sqrt{2\pi(1-\rho_y^2)}}. \tag{73}$$

Next, consider

$$h(\mathbf{Z}|U) = h\left(\left(\rho_z X \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} N_{z_1} \\ N_{z_2} \end{bmatrix}\right) \Big| U\right) \stackrel{(a)}{=} h\left(\begin{bmatrix} \rho_z (1 - 2C) + N_{z_1} \\ \rho_z (1 - 2C) + N_{z_2} \end{bmatrix}\right) = h\left(\begin{bmatrix} \bar{Z}_1 \\ \bar{Z}_2 \end{bmatrix}\right) \tag{74}$$

where $(a)$ follows since $(C, N_{z_1}, N_{z_2}, U)$ are mutually independent. Denote

$$\bar{\mathbf{Z}} = [\bar{Z}_1, \bar{Z}_2]^T. \tag{75}$$

We can compute (74) numerically by using the joint pdf

$$p_{\bar{\mathbf{Z}}}(\bar{\mathbf{z}}) = p_{\bar{Z}_1\bar{Z}_2}(\bar{z}_1, \bar{z}_2) = \sum_{c=0}^{1} P_C(c)\, p_{\bar{Z}_1\bar{Z}_2|C}(\bar{z}_1, \bar{z}_2|c)$$

$$= (p\*q) \frac{e^{-\frac{(\bar{z}_1+\rho_z)^2+(\bar{z}_2+\rho_z)^2}{2(1-\rho_z^2)}}}{2\pi(1-\rho_z^2)} + (1-(p\*q)) \frac{e^{-\frac{(\bar{z}_1-\rho_z)^2+(\bar{z}_2-\rho_z)^2}{2(1-\rho_z^2)}}}{2\pi(1-\rho_z^2)} \tag{76}$$

which follows since $\bar{\mathbf{Z}}|C$ is a jointly Gaussian vector random variable with independent components $\bar{Z}_1|C$ and $\bar{Z}_2|C$, since every scalar linear combination of the components is

Gaussian; see [45] (Theorem 1). Similarly, we can compute

$$h(\mathbf{Z}) = h\left( \begin{bmatrix} \rho\_z X + N\_{z\_1} \\ \rho\_z X + N\_{z\_2} \end{bmatrix} \right) \tag{77}$$

numerically by using the joint pdf

$$p_{\mathbf{Z}}(\mathbf{z}) = p_{Z_1 Z_2}(z_1, z_2) = \sum_{x \in \{-1, 1\}} P_X(x)\, p_{Z_1 Z_2|X}(z_1, z_2|x)$$

$$= \frac{1}{2} \frac{\left( e^{-\frac{(z_1+\rho_z)^2+(z_2+\rho_z)^2}{2(1-\rho_z^2)}} + e^{-\frac{(z_1-\rho_z)^2+(z_2-\rho_z)^2}{2(1-\rho_z^2)}} \right)}{2\pi(1-\rho_z^2)}. \tag{78}$$
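The two-dimensional entropies in (74) and (77) can be evaluated the same way, now with a 2-D grid over the joint pdfs (76) and (78). A sketch (function name and parameters are illustrative); the `weight` argument is $1/2$ for (78) and $p\*q$ for (76):

```python
import math
import numpy as np

def h_z(weight, rho, grid=1201, lim=8.0):
    """Differential entropy (nats) of the two-component Gaussian mixture in (76)/(78),
    computed by a 2-D Riemann sum; `weight` is the mass of the component at (-rho, -rho)."""
    t = np.linspace(-lim, lim, grid)
    z1, z2 = np.meshgrid(t, t)
    var = 1.0 - rho ** 2
    g = lambda s: np.exp(-((z1 + s) ** 2 + (z2 + s) ** 2) / (2 * var)) / (2 * math.pi * var)
    pdf = weight * g(rho) + (1 - weight) * g(-rho)
    d = t[1] - t[0]
    integrand = np.where(pdf > 0, -pdf * np.log(pdf), 0.0)
    return integrand.sum() * d * d

# Sanity check: for rho = 0 both components coincide with N(0, I_2),
# whose differential entropy is ln(2*pi*e).
print(h_z(0.5, 0.0))
print(math.log(2 * math.pi * math.e))
```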

Now, we consider the expected distortion. First, choose the reconstruction function

$$\hat{\tilde{X}}_1(U, Y) = U \tag{79}$$

for the binary uniformly distributed $U$ and the channel $P_{\tilde{X}|U}$ such that $\Pr[\tilde{X} \neq U] = q$ for any $q \in [0, 0.5]$, as considered above. For this reconstruction function and choices of $U$ and $P_{\tilde{X}|U}$, we obtain the expected distortion

$$\mathbb{E}\left[d\left(\tilde{X}, \hat{\tilde{X}}_1(U, Y)\right)\right] = q. \tag{80}$$

Second, choose the reconstruction function

$$\hat{\tilde{X}}_2(U, Y) = \mathrm{sgn}(Y) \tag{81}$$

and consider a constant $U$. We then obtain

$$\mathbb{E}\left[d\left(\tilde{X}, \hat{\tilde{X}}_2(U, Y)\right)\right] = p \* Q\left(\frac{\rho_y}{\sqrt{1 - \rho_y^2}}\right) \tag{82}$$

which follows since the channel $P_{\mathrm{sgn}(Y)|\tilde{X}}$ can be considered as a concatenation of two BSCs with crossover probabilities $p$ and $Q\left(\rho_y/\sqrt{1 - \rho_y^2}\right)$, where the former follows since $\Pr[\tilde{X} \neq X] = p$ and the latter because $X \in \{-1, 1\}$ and

$$\Pr[X \neq \mathrm{sgn}(Y)] = \Pr[X \neq \mathrm{sgn}(\rho_y X + N_y)] = \Pr[N_y > \rho_y].\tag{83}$$

Therefore, the proof for the achievable lossy secure and private source coding region follows by combining (68)–(70), (72), (74), (77), (80), and (82) and by applying time sharing, with time-sharing parameter $\lambda \in [0, 1]$, between the two reconstruction functions in (79) and (81) with corresponding $U$ and $P_{\tilde{X}|U}$, since for constant $U$ the terms in (25)–(27) are zero.
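The two distortion endpoints (80) and (82) and their time-sharing combination can be sketched numerically as follows (helper names and parameter values are illustrative):

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def star(p, q):
    """Binary convolution of two crossover probabilities."""
    return p * (1 - q) + q * (1 - p)

def distortion(lam, p, q, rho_y):
    """Time-shared expected distortion: endpoint (80) gives q, endpoint (82)
    gives p * Q(rho_y / sqrt(1 - rho_y^2)); lam is the time-sharing parameter."""
    d2 = star(p, Q(rho_y / math.sqrt(1 - rho_y ** 2)))
    return lam * q + (1 - lam) * d2

for lam in (0.0, 0.5, 1.0):
    print(lam, distortion(lam, 0.05, 0.1, 0.6))
```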

**Remark 1.** *The proof of Proposition 3 follows similar steps as those in [46] (Section II), and it seems that the achievable lossy secure and private source coding region given in Proposition 3 is optimal. Considering $(R_\mathrm{w}, R_\mathrm{s}, R_\ell)$, one can apply Mrs. Gerber's lemma to show that the choice of $U$ such that $P_{\tilde{X}|U}$ is a BSC$(q)$ after relabeling the input and output symbols is optimal, since Mrs. Gerber's lemma is valid for all binary-input symmetric memoryless channels with discrete or continuous outputs [47]. This result follows because convexity is preserved; see also [48] (Appendix B) for an alternative proof of convexity preservation for independent BSC measurements. However, it is not entirely clear how to prove that the sign operation used for estimation suffices for the rate region.*

### **6. Proof for Theorem 1**

*6.1. Achievability Proof for Theorem 1*

**Proof Sketch.** We leverage the output statistics of random binning (OSRB) method [16,49,50] for the achievability proof by following the steps described in [51] (Section 1.6).

Let $(V^n, U^n, \tilde{X}^n, X^n, Y^n, Z^n)$ be i.i.d. according to $P_{VU\tilde{X}XYZ}$ that can be obtained from (14) by fixing $P_{U|\tilde{X}}$ and $P_{V|U}$, such that $\mathbb{E}[d(\tilde{X}, \hat{\tilde{X}})] \leq (D + \epsilon)$ for any $\epsilon > 0$. To each $v^n$ assign two random bin indices $F_\mathrm{v} \in [1 : 2^{n\tilde{R}_\mathrm{v}}]$ and $W_\mathrm{v} \in [1 : 2^{nR_\mathrm{v}}]$. Furthermore, to each $u^n$ assign three random bin indices $F_\mathrm{u} \in [1 : 2^{n\tilde{R}_\mathrm{u}}]$, $W_\mathrm{u} \in [1 : 2^{nR_\mathrm{u}}]$, and $K_\mathrm{u} \in [1 : 2^{nR_0}]$, where $R_0$ is the private key rate defined in Section 2. Public indices $F = (F_\mathrm{v}, F_\mathrm{u})$ represent the choice of a source encoder and decoder pair. Furthermore, we impose that the messages sent by the source encoder $\mathrm{Enc}(\cdot, \cdot)$ to the source decoder $\mathrm{Dec}(\cdot, \cdot, \cdot)$ are

$$\mathcal{W} = (\mathcal{W}\_{\text{V}}, \mathcal{W}\_{\text{u}}, \mathcal{K} + \mathcal{K}\_{\text{u}}) \tag{84}$$

where the summation with the private key is in modulo-$2^{nR_0}$, i.e., one-time padding.
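The one-time padding step in (84) can be illustrated with integer indices modulo $M = 2^{nR_0}$; since the private key $K$ is uniform on the index set and independent of $K_\mathrm{u}$, the padded value $(K_\mathrm{u} + K) \bmod M$ is itself uniform and reveals nothing about $K_\mathrm{u}$ without $K$. A toy sketch (all names are ours):

```python
import random

M = 2 ** 10                     # stands in for 2^{n R_0} with a toy choice n*R_0 = 10
k = random.randrange(M)         # private key K shared by encoder and decoder
k_u = random.randrange(M)       # bin index K_u to be protected
padded = (k_u + k) % M          # third component of W in (84)
recovered = (padded - k) % M    # decoder removes the pad using K
print(recovered == k_u)
```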

The public index $F_\mathrm{v}$ is almost independent of $(\tilde{X}^n, X^n, Y^n, Z^n)$ if we have [49] (Theorem 1)

$$\tilde{R}_\mathrm{v} < H(V|\tilde{X}, X, Y, Z) \stackrel{(a)}{=} H(V|\tilde{X}) \tag{85}$$

where $(a)$ follows since $(X, Y, Z) - \tilde{X} - V$ form a Markov chain. The constraint in (85) suggests that the expected value, taken over the random bin assignments, of the variational distance between the joint probability distributions $\mathrm{Unif}[1 : 2^{n\tilde{R}_\mathrm{v}}] \cdot P_{\tilde{X}^n}$ and $P_{F_\mathrm{v}\tilde{X}^n}$ vanishes when $n \rightarrow \infty$. Moreover, the public index $F_\mathrm{u}$ is almost independent of $(V^n, \tilde{X}^n, X^n, Y^n, Z^n)$ if we have

$$\tilde{R}_\mathrm{u} < H(U|V, \tilde{X}, X, Y, Z) \stackrel{(a)}{=} H(U|V, \tilde{X}) \tag{86}$$

where $(a)$ follows from the Markov chain relation $(X, Y, Z) - \tilde{X} - (U, V)$.

Using a Slepian–Wolf (SW) [1] decoder that observes (*Yn*, *F*v, *W*v), one can reliably estimate *V<sup>n</sup>* if we have [49] (Lemma 1)

$$\tilde{R}_\mathrm{v} + R_\mathrm{v} > H(V|Y) \tag{87}$$

since then the expected error probability, taken over the random bin assignments, vanishes when $n \rightarrow \infty$. Furthermore, one can reliably estimate $U^n$ by using an SW decoder that observes $(K, V^n, Y^n, F_\mathrm{u}, W_\mathrm{u}, K + K_\mathrm{u})$ if we have

$$R_0 + \tilde{R}_\mathrm{u} + R_\mathrm{u} > H(U|V, Y). \tag{88}$$

To satisfy (85)–(88), for any $\epsilon > 0$ we fix

$$\tilde{R}_\mathrm{v} = H(V|\tilde{X}) - \epsilon \tag{89}$$

$$R_\mathrm{v} = I(V; \tilde{X}) - I(V; Y) + 2\epsilon \tag{90}$$

$$\tilde{R}_\mathrm{u} = H(U|V, \tilde{X}) - \epsilon \tag{91}$$

$$R_0 + R_\mathrm{u} = I(U; \tilde{X}|V) - I(U; Y|V) + 2\epsilon. \tag{92}$$

Since all tuples $(v^n, u^n, \tilde{x}^n, x^n, y^n, z^n)$ are in the jointly typical set with high probability, by the typical average lemma [2] (p. 26), the distortion constraint (4) is satisfied.
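As a sanity check, the rate choices (89)–(92) satisfy the decodability constraints (87)–(88) and add up to the storage rate in (93). A numerical sketch for a toy binary chain $V - U - \tilde{X} - Y$ built from BSCs (the parameters `a, b, c` are illustrative and `Xt` stands for $\tilde{X}$; this discrete surrogate is ours, not the Gaussian model of the text):

```python
import itertools, math

a, b, c = 0.12, 0.08, 0.2   # illustrative BSC parameters for V - U - Xt - Y

def bsc(x, y, f):
    """Transition probability of a BSC(f)."""
    return f if x != y else 1 - f

# Joint pmf of (V, U, Xt, Y) with uniform V.
pmf = {}
for v, u, xt, y in itertools.product((0, 1), repeat=4):
    pmf[(v, u, xt, y)] = 0.5 * bsc(v, u, a) * bsc(u, xt, b) * bsc(xt, y, c)

def H(idxs):
    """Joint entropy (bits) of the variables at positions idxs (0=V, 1=U, 2=Xt, 3=Y)."""
    m = {}
    for k, p in pmf.items():
        kk = tuple(k[i] for i in idxs)
        m[kk] = m.get(kk, 0.0) + p
    return -sum(p * math.log2(p) for p in m.values() if p > 0)

def Hc(A, B):
    """Conditional entropy H(A|B)."""
    return H(list(A) + list(B)) - H(list(B))

def I(A, B, C=()):
    """Conditional mutual information I(A; B | C)."""
    return Hc(A, C) - Hc(A, list(B) + list(C))

eps = 0.01
Rt_v = Hc([0], [2]) - eps                               # (89)
R_v = I([0], [2]) - I([0], [3]) + 2 * eps               # (90)
Rt_u = Hc([1], [0, 2]) - eps                            # (91)
R0_Ru = I([1], [2], [0]) - I([1], [3], [0]) + 2 * eps   # (92): R_0 + R_u

print(Rt_v + R_v - Hc([0], [3]))        # positive, so (87) holds
print(R0_Ru + Rt_u - Hc([1], [0, 3]))   # positive, so (88) holds
print(R_v + R0_Ru - 4 * eps)            # equals I(U; Xt | Y), matching (93)
```

The slacks are exactly $\epsilon$ by the chain rule, which is why the choices (89)–(92) satisfy (87)–(88) for every positive $\epsilon$.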

**Communication Rate**: (90) and (92) result in a communication (storage) rate of

$$R_\mathrm{w} = R_0 + R_\mathrm{v} + R_\mathrm{u} \stackrel{(a)}{=} I(U; \tilde{X}|Y) + 4\epsilon \tag{93}$$

where (*a*) follows since *<sup>V</sup>* <sup>−</sup> *<sup>U</sup>* <sup>−</sup> *<sup>X</sup>*<sup>H</sup> <sup>−</sup> *<sup>Y</sup>* form a Markov chain.


**Privacy Leakage Rate**: Since the private key $K$ is uniformly distributed and is independent of the source and channel random variables, we can consider the following virtual scenario to calculate the leakage. We first assume for the virtual scenario that there is no private key, such that the encoder output for the virtual scenario is

$$\overline{W} = (W_\mathrm{v}, W_\mathrm{u}, K_\mathrm{u}). \tag{94}$$

We calculate the leakage for the virtual scenario. Then, given the mentioned properties of the private key and due to the one-time padding step in (84), we can subtract $H(K) = nR_0$ from the leakage calculated for the virtual scenario to obtain the leakage for the original problem, which follows from the sum of (91) and (92) if $\epsilon \rightarrow 0$ when $n \rightarrow \infty$. Thus, we have the privacy leakage

$$\begin{aligned} I(X^n; W, F|Z^n) &= I(X^n; \overline{W}, F|Z^n) - nR\_0 \\ \overset{(a)}{=} H(\overline{W}, F|Z^n) - H(\overline{W}, F|X^n) - nR\_0 \\ \overset{(b)}{=} H(\overline{W}, F|Z^n) - H(\mathcal{U}^n, V^n|X^n) + H(V^n|\overline{W}, F, X^n) + H(\mathcal{U}^n|V^n, \overline{W}, F, X^n) - nR\_0 \\ \overset{(c)}{\leq} H(\overline{W}, F|Z^n) - nH(\mathcal{U}, V|X) + 2n\epsilon\_n - nR\_0 \end{aligned} \tag{95}$$

where $(a)$ follows because $(\overline{W}, F) - X^n - Z^n$ form a Markov chain, $(b)$ follows since $(U^n, V^n)$ determine $(F_\mathrm{u}, W_\mathrm{u}, K_\mathrm{u}, F_\mathrm{v}, W_\mathrm{v})$, and $(c)$ follows since $(U^n, V^n, X^n)$ is i.i.d. and for some $\epsilon_n > 0$ such that $\epsilon_n \rightarrow 0$ when $n \rightarrow \infty$ because $(F_\mathrm{v}, W_\mathrm{v}, X^n)$ can reliably recover $V^n$ by (87) because of the Markov chain relation $V^n - X^n - Y^n$ and, similarly, $(F_\mathrm{u}, W_\mathrm{u}, K_\mathrm{u}, V^n, X^n)$ can reliably recover $U^n$ by (88) because of $H(U|V, Y) \geq H(U|V, X)$ that is proved in [21] (Equation (55)) for the Markov chain relation $(V, U) - X - Y$.

Next, we consider the term $H(\overline{W}, F|Z^n)$ in (95) and provide single-letter bounds on it by applying the six different decodability results given in [21] (Section V-A), which are applied to an entirely similar conditional entropy term in [21] (Equation (54)) that measures the uncertainty in the indices conditioned on an i.i.d. multi-letter random variable. Thus, combining the six decodability results in [21] (Section V-A) with (95), we obtain

$$I(X^n; W, F|Z^n) \leq n\left(\left[I(U; Z|V) - I(U; Y|V) + \epsilon\right]^- + I(U; X|Z) + 3\epsilon_n - R_0\right). \tag{96}$$

The equation (92) implicitly assumes that the private key rate $R_0$ is less than $(I(U; \tilde{X}|V) - I(U; Y|V) + 2\epsilon) = (I(U; \tilde{X}|Y, V) + 2\epsilon)$, where the equality follows from the Markov chain relation $(V, U) - \tilde{X} - Y$. The communication rate results are not affected by this assumption, since $\tilde{X}^n$ should be reconstructed by the decoder. However, if the private key rate $R_0$ is greater than or equal to $(I(U; \tilde{X}|Y, V) + 2\epsilon)$, then we can remove the bin index $K_\mathrm{u}$ from the

code construction above and apply one-time padding to the bin index *W*u, such that we have the encoder output


$$\overline{W} = (W_\mathrm{v}, W_\mathrm{u} + K) \tag{97}$$

where the summation with the private key is in modulo-$2^{nR_\mathrm{u}} = 2^{n(I(U;\tilde{X}|Y,V)+2\epsilon)}$. Thus, one then does not leak any information about $W_\mathrm{u}$ to the eavesdropper because of the one-time padding step in (97). We then have the privacy leakage

$$\begin{aligned} I(X^n; \overline{W}, \mathcal{F}|Z^n) &= I(X^n; \mathcal{W}\_\mathbf{v}, \mathcal{F}|Z^n) \\ \stackrel{(a)}{\leq} H(X^n|Z^n) - H(X^n|Z^n, \mathcal{W}\_\mathbf{v}, \mathcal{F}\_\mathbf{v}) + \epsilon'\_n \\ \stackrel{(b)}{\leq} H(X^n|Z^n) - H(X^n|Z^n, V^n) + \epsilon'\_n \\ \stackrel{(c)}{=} nI(V; X|Z) + \epsilon'\_n \end{aligned} \tag{98}$$

where $(a)$ follows for some $\epsilon_n'$ such that $\epsilon_n' \rightarrow 0$ when $n \rightarrow \infty$ since, by (86), $F_\mathrm{u}$ is almost independent of $(V^n, X^n, Z^n)$; see also [52] (Theorem 1), $(b)$ follows since $V^n$ determines $(F_\mathrm{v}, W_\mathrm{v})$, and $(c)$ follows because $(X^n, Z^n, V^n)$ are i.i.d.

Note that we can reduce the privacy leakage given in (98) if $R_0 \geq (I(U; \tilde{X}) - I(U; Y) + 4\epsilon) = (I(U; \tilde{X}|Y) + 4\epsilon)$, where the equality follows from the Markov chain relation $U - \tilde{X} - Y$, since then we can apply one-time padding to both bin indices $W_\mathrm{v}$ and $W_\mathrm{u}$ with the sum rate

$$R\_\mathbf{v} + R\_\mathbf{u} \stackrel{(a)}{=} I(V; \tilde{X}) - I(V; Y) + 2\varepsilon + I(\mathcal{U}; \tilde{X} | V) - I(\mathcal{U}; Y | V) + 2\varepsilon$$

$$\stackrel{(b)}{=} I(\mathcal{U}; \tilde{X}) - I(\mathcal{U}; Y) + 4\varepsilon \tag{99}$$

where $(a)$ follows by (90) and (92), and $(b)$ follows from the Markov chain relation $V - U - \tilde{X} - Y$. Thus, one then does not leak any information about $(W_\mathrm{v}, W_\mathrm{u})$ to the eavesdropper because of the one-time padding step, so we then obtain the privacy leakage of

$$I(X^n; F|Z^n) = I(X^n; F_\mathrm{v}|Z^n) + I(X^n; F_\mathrm{u}|Z^n, F_\mathrm{v}) \stackrel{(a)}{\leq} 2\epsilon_n' \tag{100}$$

where $(a)$ follows since, by (85), $F_\mathrm{v}$ is almost independent of $(X^n, Z^n)$ and, by (86), $F_\mathrm{u}$ is almost independent of $(V^n, X^n, Z^n)$.

**Secrecy Leakage Rate**: Similar to the privacy leakage analysis above, we first consider the virtual scenario with the encoder output given in (94), and then calculate the leakage for the original problem by subtracting $H(K) = nR_0$ from the leakage calculated for the virtual scenario. Thus, we obtain

$$\begin{aligned} &I(\tilde{X}^n; W, F|Z^n) = I(\tilde{X}^n; \overline{W}, F|Z^n) - nR_0 \\ &\stackrel{(a)}{=} H(\overline{W}, F|Z^n) - H(\overline{W}, F|\tilde{X}^n) - nR_0 \\ &\stackrel{(b)}{=} H(\overline{W}, F|Z^n) - H(U^n, V^n|\tilde{X}^n) + H(V^n|\overline{W}, F, \tilde{X}^n) + H(U^n|V^n, \overline{W}, F, \tilde{X}^n) - nR_0 \\ &\stackrel{(c)}{\leq} H(\overline{W}, F|Z^n) - nH(U, V|\tilde{X}) + 2n\epsilon_n' - nR_0 \\ &\stackrel{(d)}{\leq} n\left(\left[I(U; Z|V) - I(U; Y|V) + \epsilon\right]^- + I(U; \tilde{X}|Z) + 3\epsilon_n' - R_0\right) \end{aligned} \tag{101}$$

where $(a)$ follows from the Markov chain relation $(\overline{W}, F) - \tilde{X}^n - Z^n$, $(b)$ follows since $(U^n, V^n)$ determine $(\overline{W}, F)$, $(c)$ follows because $(V^n, U^n, \tilde{X}^n)$ are i.i.d. and because $(F_\mathrm{v}, W_\mathrm{v}, \tilde{X}^n)$ can reliably recover $V^n$ by (87) due to the Markov chain relation $V^n - \tilde{X}^n - Y^n$ and, similarly, $(F_\mathrm{u}, W_\mathrm{u}, K_\mathrm{u}, V^n, \tilde{X}^n)$ can reliably recover $U^n$ by (88) due to $H(U|V, Y) \geq H(U|V, \tilde{X})$ that can


be proved as in [21] (Equation (55)) for the Markov chain relation $(V, U) - \tilde{X} - Y$, and $(d)$ follows by applying the six decodability results in [21] (Section V-A) that are applied to (95) with the final result in (96) by replacing $X$ with $\tilde{X}$.


Similar to the privacy leakage analysis above, if we have $R_0 \geq (I(U; \tilde{X}|Y, V) + 2\epsilon)$, then we can eliminate $K_\mathrm{u}$ and apply one-time padding as in (97), such that no information about $W_\mathrm{u}$ is leaked to the eavesdropper, and we have

$$\begin{aligned} I(\tilde{X}^n; \overline{W}, F|Z^n) &= I(\tilde{X}^n; W_\mathrm{v}, F|Z^n) \\ &\stackrel{(a)}{\leq} H(\tilde{X}^n|Z^n) - H(\tilde{X}^n|Z^n, W_\mathrm{v}, F_\mathrm{v}) + \epsilon_n' \\ &\stackrel{(b)}{\leq} H(\tilde{X}^n|Z^n) - H(\tilde{X}^n|Z^n, V^n) + \epsilon_n' \\ &\stackrel{(c)}{=} nI(V; \tilde{X}|Z) + \epsilon_n' \end{aligned} \tag{102}$$

where $(a)$ follows because, by (86), $F_\mathrm{u}$ is almost independent of $(V^n, \tilde{X}^n, Z^n)$, $(b)$ follows since $V^n$ determines $(F_\mathrm{v}, W_\mathrm{v})$, and $(c)$ follows because $(\tilde{X}^n, Z^n, V^n)$ are i.i.d.

If $R_0 \geq (I(U; \tilde{X}|Y) + 4\epsilon)$, we can apply one-time padding to hide $(W_\mathrm{v}, W_\mathrm{u})$, as in the privacy leakage analysis above. We then have the secrecy leakage of

$$I(\tilde{X}^n; F|Z^n) = I(\tilde{X}^n; F\_\mathbf{v}|Z^n) + I(\tilde{X}^n; F\_\mathbf{u}|Z^n, F\_\mathbf{v}) \stackrel{(a)}{\leq} 2\epsilon'\_n \tag{103}$$

where $(a)$ follows since, by (85), $F_\mathrm{v}$ is almost independent of $(\tilde{X}^n, Z^n)$ and, by (86), $F_\mathrm{u}$ is almost independent of $(V^n, \tilde{X}^n, Z^n)$.

Suppose that the public indices $F$ are generated uniformly at random, and the encoder generates $(V^n, U^n)$ according to $P_{V^nU^n|\tilde{X}^nF_\mathrm{v}F_\mathrm{u}}$ that can be obtained from the proposed binning scheme above to compute the bins $W_\mathrm{v}$ from $V^n$ and $W_\mathrm{u}$ from $U^n$, respectively. Such a procedure results in a joint probability distribution almost equal to $P_{VU\tilde{X}XYZ}$ fixed above [51] (Section 1.6). The privacy and secrecy leakage metrics above are expectations over all possible public index realizations $F = f$. Therefore, using a time-sharing random variable $Q$ for convexification and applying the selection lemma [53] (Lemma 2.2) to each decodability case separately, the achievability for Theorem 1 follows by choosing an $\epsilon > 0$ such that $\epsilon \rightarrow 0$ when $n \rightarrow \infty$.

### *6.2. Converse Proof for Theorem 1*

**Proof Sketch.** Assume that for some $\delta_n > 0$ and $n \geq 1$, there exist an encoder and a decoder such that (1)–(4) are satisfied for some tuple $(R_\mathrm{w}, R_\mathrm{s}, R_\ell, D)$ given a private key with rate $R_0$.

Define $V_i \triangleq (W, Y_{i+1}^n, Z^{i-1})$ and $U_i \triangleq (W, Y_{i+1}^n, Z^{i-1}, X^{i-1}, K)$ that satisfy the Markov chain relation $V_i - U_i - \tilde{X}_i - X_i - (Y_i, Z_i)$ by definition of the source statistics. We have

$$D + \delta_n \stackrel{(a)}{\geq} \mathbb{E}\left[d\left(\tilde{X}^n, \hat{\tilde{X}}^n(Y^n, W, K)\right)\right]$$

$$\stackrel{(b)}{\geq} \mathbb{E}\left[d\left(\tilde{X}^n, \hat{\tilde{X}}^n(Y^n, W, K, X^{i-1}, Z^{i-1})\right)\right]$$

$$\stackrel{(c)}{=} \mathbb{E}\left[d\left(\tilde{X}^n, \hat{\tilde{X}}^n(Y_i^n, W, K, X^{i-1}, Z^{i-1})\right)\right]$$

$$\stackrel{(d)}{=} \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[d\left(\tilde{X}_i, \hat{\tilde{X}}_i(U_i, Y_i)\right)\right] \tag{104}$$

where (*a*) follows by (4), (*b*) follows since providing more information to the reconstruction function does not increase expected distortion, (*c*) follows from the Markov chain relation

$$Y^{i-1} - (Y_i^n, X^{i-1}, Z^{i-1}, W, K) - \tilde{X}^n \tag{105}$$

and (*d*) follows from the definition of *Ui*.

**Communication Rate**: For any *R*<sup>0</sup> ≥ 0, we have

$$n(R_\mathrm{w} + \delta_n) \stackrel{(a)}{\geq} \log|\mathcal{W}| \geq H(W|Y^n, K) - H(W|Y^n, K, \tilde{X}^n) \tag{106}$$

$$\stackrel{(b)}{=} \sum_{i=1}^{n} I(W; \tilde{X}_i|\tilde{X}^{i-1}, Y_{i+1}^n, Z^{i-1}, K, Y_i) \tag{107}$$

$$\begin{aligned} &\stackrel{(c)}{=} \sum_{i=1}^{n} I(\tilde{X}^{i-1}, Y_{i+1}^n, Z^{i-1}, K, W; \tilde{X}_i|Y_i) \\ &\stackrel{(d)}{\geq} \sum_{i=1}^{n} I(X^{i-1}, Y_{i+1}^n, Z^{i-1}, K, W; \tilde{X}_i|Y_i) \\ &\stackrel{(e)}{=} \sum_{i=1}^{n} I(U_i; \tilde{X}_i|Y_i) \end{aligned} \tag{108}$$

where (*a*) follows by (1), (*b*) follows from the Markov chain relation

$$(Y^{i-1}, X^{i-1}, Z^{i-1}) - (\tilde{X}^{i-1}, Y_i^n, K) - (\tilde{X}_i, W) \tag{109}$$

$(c)$ follows because $(\tilde{X}_i, Y_i)$ are independent of $(\tilde{X}^{i-1}, Y_{i+1}^n, Z^{i-1}, K)$, $(d)$ follows by applying the data processing inequality to the Markov chain relation in (109), and $(e)$ follows from the definition of $U_i$.

**Privacy Leakage Rate**: We obtain

$$\begin{aligned} &n(R_\ell + \delta_n) \stackrel{(a)}{\geq} [I(W; Y^n) - I(W; Z^n)] + [I(W; X^n) - I(W; Y^n)] \\ &\stackrel{(b)}{=} [I(W; Y^n) - I(W; Z^n)] + I(W; X^n|K) - I(K; X^n|W) - I(W; Y^n|K) + I(K; Y^n|W) \\ &\stackrel{(c)}{=} [I(W; Y^n) - I(W; Z^n)] + [I(W; X^n|K) - I(W; Y^n|K)] - I(K; X^n|W, Y^n) \\ &\geq \sum_{i=1}^{n} \left[I(W; Y_i|Y_{i+1}^n) - I(W; Z_i|Z^{i-1})\right] + \sum_{i=1}^{n} \left[I(W; X_i|X^{i-1}, K) - I(W; Y_i|Y_{i+1}^n, K)\right] - H(K) \\ &\stackrel{(d)}{=} \sum_{i=1}^{n} \left[I(W; Y_i|Y_{i+1}^n, Z^{i-1}) - I(W; Z_i|Z^{i-1}, Y_{i+1}^n)\right] - nR_0 + \sum_{i=1}^{n} \left[I(W; X_i|X^{i-1}, Y_{i+1}^n, K) - I(W; Y_i|Y_{i+1}^n, X^{i-1}, K)\right] \\ &\stackrel{(e)}{=} \sum_{i=1}^{n} \left[I(W; Y_i|Y_{i+1}^n, Z^{i-1}) - I(W; Z_i|Z^{i-1}, Y_{i+1}^n)\right] - nR_0 + \sum_{i=1}^{n} \left[I(W; X_i|X^{i-1}, Y_{i+1}^n, Z^{i-1}, K) - I(W; Y_i|Y_{i+1}^n, X^{i-1}, Z^{i-1}, K)\right] \\ &\stackrel{(f)}{=} \sum_{i=1}^{n} \left[I(W, Y_{i+1}^n, Z^{i-1}; Y_i) - I(W, Z^{i-1}, Y_{i+1}^n; Z_i)\right] - nR_0 + \sum_{i=1}^{n} \left[I(W, X^{i-1}, Y_{i+1}^n, Z^{i-1}, K; X_i) - I(W, Y_{i+1}^n, X^{i-1}, Z^{i-1}, K; Y_i)\right] \\ &\stackrel{(g)}{=} \sum_{i=1}^{n} \left[I(V_i; Y_i) - I(V_i; Z_i) - R_0 + I(U_i, V_i; X_i) - I(U_i, V_i; Y_i)\right] \\ &= \sum_{i=1}^{n} \left[-I(U_i, V_i; Z_i) - R_0 + I(U_i, V_i; X_i) + \left(I(U_i; Z_i|V_i) - I(U_i; Y_i|V_i)\right)\right] \\ &\stackrel{(h)}{\geq} \sum_{i=1}^{n} \left[I(U_i; X_i|Z_i) - R_0 + \left[I(U_i; Z_i|V_i) - I(U_i; Y_i|V_i)\right]^-\right] \end{aligned} \tag{110}$$

where $(a)$ follows by (3) and from the Markov chain relation $W - X^n - Z^n$, $(b)$ follows since $K$ is independent of $(X^n, Y^n)$, $(c)$ follows from the Markov chain relation $(W, K) - X^n - Y^n$, $(d)$ follows because $H(K) = nR_0$ and from Csiszár's sum identity [54], and $(e)$ follows from the Markov chain relations

$$Z^{i-1} - (X^{i-1}, Y_{i+1}^n, K) - (X_i, W) \tag{111}$$

$$Z^{i-1} - (X^{i-1}, Y_{i+1}^n, K) - (Y_i, W) \tag{112}$$

$(f)$ follows because $(X^n, Y^n, Z^n)$ are i.i.d. and $K$ is independent of $(X^n, Y^n, Z^n)$, $(g)$ follows from the definitions of $V_i$ and $U_i$, and $(h)$ follows from the Markov chain relation $V_i - U_i - X_i - Z_i$.
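Csiszár's sum identity, used in step $(d)$ of (110), states that $\sum_{i=1}^n I(Y_{i+1}^n; Z_i|Z^{i-1}) = \sum_{i=1}^n I(Z^{i-1}; Y_i|Y_{i+1}^n)$ for any joint distribution. A quick numerical check on a random joint pmf (all helper names are ours):

```python
import itertools, math, random

def marg(p, idxs):
    """Marginal pmf over the coordinates in idxs."""
    d = {}
    for x, px in p.items():
        k = tuple(x[i] for i in idxs)
        d[k] = d.get(k, 0.0) + px
    return d

def cmi(p, A, B, C):
    """I(X_A; X_B | X_C) in nats for a strictly positive joint pmf p over bit tuples."""
    pabc, pac, pbc, pc = marg(p, A + B + C), marg(p, A + C), marg(p, B + C), marg(p, C)
    s = 0.0
    for x, px in p.items():
        a = tuple(x[i] for i in A)
        b = tuple(x[i] for i in B)
        c = tuple(x[i] for i in C)
        s += px * math.log(pabc[a + b + c] * pc[c] / (pac[a + c] * pbc[b + c]))
    return s

# Random strictly positive joint pmf over (Y1, Y2, Y3, Z1, Z2, Z3).
random.seed(0)
raw = {x: random.random() + 0.1 for x in itertools.product((0, 1), repeat=6)}
tot = sum(raw.values())
p = {x: v / tot for x, v in raw.items()}

n, Y, Z = 3, [0, 1, 2], [3, 4, 5]
lhs = sum(cmi(p, Y[i + 1:], [Z[i]], Z[:i]) for i in range(n))
rhs = sum(cmi(p, Z[:i], [Y[i]], Y[i + 1:]) for i in range(n))
print(lhs, rhs)   # the two sums agree up to floating-point error
```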

Next, we provide the matching converse for the privacy leakage rate in (98), which is achieved when $R_0 \geq I(U; \tilde{X}|Y, V)$. We have

$$\begin{aligned} &n(\mathcal{R}\_{\ell} + \delta\_{n}) \stackrel{(a)}{\geq} H(X^{n}|Z^{n}) - H(X^{n}|Z^{n}, W) \\ &\stackrel{(b)}{=} H(X^{n}|Z^{n}) - \sum\_{i=1}^{n} H(X\_{i}|Z\_{i}, Z^{i-1}, X\_{i+1}^{n}, W, Y\_{i+1}^{n}) \\ &\stackrel{(c)}{=} H(X^{n}|Z^{n}) - \sum\_{i=1}^{n} H(X\_{i}|Z\_{i}, V\_{i}, X\_{i+1}^{n}) \\ &\stackrel{(d)}{\geq} \sum\_{i=1}^{n} [H(X\_{i}|Z\_{i}) - H(X\_{i}|Z\_{i}, V\_{i})] \\ &= \sum\_{i=1}^{n} I(V\_{i}; X\_{i}|Z\_{i}) \end{aligned} \tag{113}$$

where (*a*) follows by (3), (*b*) follows from the Markov chain relation

$$(Z_{i+1}^n, Y_{i+1}^n) - (X_{i+1}^n, W, Z^i) - X_i \tag{114}$$

(*c*) follows from the definition of *Vi*, and (*d*) follows because (*Xn*, *Zn*) are i.i.d.

The matching converse for the privacy leakage rate in (100), achieved when $R_0 \geq I(U; \tilde{X}|Y)$, follows from the fact that conditional mutual information is non-negative.

**Secrecy Leakage Rate**: We have

$$\begin{aligned} &n(R_\mathrm{s} + \delta_n) \\ &\stackrel{(a)}{\geq} [I(W; Y^n) - I(W; Z^n)] + [I(W; \tilde{X}^n) - I(W; Y^n)] \\ &\stackrel{(b)}{=} [I(W; Y^n) - I(W; Z^n)] + I(W; \tilde{X}^n|K) - I(K; \tilde{X}^n|W) - I(W; Y^n|K) + I(K; Y^n|W) \\ &\stackrel{(c)}{=} [I(W; Y^n) - I(W; Z^n)] + [I(W; \tilde{X}^n|K) - I(W; Y^n|K)] - I(K; \tilde{X}^n|W, Y^n) \\ &\stackrel{(d)}{\geq} \sum_{i=1}^{n} \left[I(W; Y_i|Y_{i+1}^n) - I(W; Z_i|Z^{i-1})\right] + I(W; \tilde{X}^n|Y^n, K) - H(K) \\ &\stackrel{(e)}{=} \sum_{i=1}^{n} \left[I(W; Y_i|Y_{i+1}^n, Z^{i-1}) - I(W; Z_i|Z^{i-1}, Y_{i+1}^n)\right] - nR_0 + nH(\tilde{X}|Y) - \sum_{i=1}^{n} H(\tilde{X}_i|Y_i, Y_{i+1}^n, W, K, \tilde{X}^{i-1}) \\ &\stackrel{(f)}{\geq} \sum_{i=1}^{n} \left[I(W, Y_{i+1}^n, Z^{i-1}; Y_i) - I(W, Z^{i-1}, Y_{i+1}^n; Z_i)\right] - nR_0 + nH(\tilde{X}|Y) - \sum_{i=1}^{n} H(\tilde{X}_i|Y_i, Y_{i+1}^n, W, K, X^{i-1}, Z^{i-1}) \\ &\stackrel{(g)}{=} \sum_{i=1}^{n} \left[I(V_i; Y_i) - I(V_i; Z_i)\right] - nR_0 + nH(\tilde{X}|Y) - \sum_{i=1}^{n} H(\tilde{X}_i|Y_i, U_i, V_i) \\ &\stackrel{(h)}{=} \sum_{i=1}^{n} \left[I(V_i; Y_i) - I(V_i; Z_i) - R_0 + I(U_i, V_i; \tilde{X}_i) - I(U_i, V_i; Y_i)\right] \\ &= \sum_{i=1}^{n} \left[-I(U_i, V_i; Z_i) - R_0 + I(U_i, V_i; \tilde{X}_i) + \left(I(U_i; Z_i|V_i) - I(U_i; Y_i|V_i)\right)\right] \\ &\stackrel{(i)}{\geq} \sum_{i=1}^{n} \left[I(U_i; \tilde{X}_i|Z_i) - R_0 + \left[I(U_i; Z_i|V_i) - I(U_i; Y_i|V_i)\right]^-\right] \end{aligned} \tag{115}$$

where (*a*) follows by (2) and from the Markov chain relation $W - \widetilde{X}^n - Z^n$, (*b*) follows because $K$ is independent of $(\widetilde{X}^n, Y^n)$, (*c*) and (*d*) follow from the Markov chain relation $(W, K) - \widetilde{X}^n - Y^n$, (*e*) follows because $H(K) = nR_0$ and $(\widetilde{X}^n, Y^n)$ are i.i.d. and independent of $K$, and from Csiszár's sum identity and the Markov chain relation

$$Y^{i-1} - \left(\widetilde{X}^{i-1}, W, K, Y\_{i+1}^{n}, Y\_{i}\right) - \widetilde{X}\_{i} \tag{116}$$

(*f*) follows since $(Y^n, Z^n)$ are i.i.d. and from the data processing inequality applied to the Markov chain relation

$$\left(X^{i-1}, Z^{i-1}\right) - \left(\widetilde{X}^{i-1}, W, K, Y\_{i+1}^{n}, Y\_{i}\right) - \widetilde{X}\_{i} \tag{117}$$

(*g*) follows from the definitions of $V_i$ and $U_i$, (*h*) follows from the Markov chain relation $(V_i, U_i) - \widetilde{X}_i - Y_i$, and (*i*) follows from the Markov chain relation $V_i - U_i - \widetilde{X}_i - Z_i$.
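For reference, the Csiszár sum identity invoked in step (e) is the standard telescoping identity, stated here for arbitrary $(W, Y^n, Z^n)$ (a well-known fact, not an additional assumption of the paper):

```latex
% Csiszár sum identity:
\sum_{i=1}^{n} I\big(Y_{i+1}^{n}; Z_i \,\big|\, Z^{i-1}, W\big)
  = \sum_{i=1}^{n} I\big(Z^{i-1}; Y_i \,\big|\, Y_{i+1}^{n}, W\big)
```

Applying it to the first bracketed difference in step (d) yields the conditioned mutual information terms appearing in step (e).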

Next, the matching converse for the secrecy leakage rate in (102), achieved when $R_0 \geq I(U; \widetilde{X}|Y, V)$, is provided.

$$\begin{aligned} &n(R\_{\delta} + \delta\_{n}) \stackrel{(a)}{\geq} H(\widetilde{X}^{n}|Z^{n}) - H(\widetilde{X}^{n}|Z^{n}, W) \\ &\stackrel{(b)}{\geq} H(\widetilde{X}^{n}|Z^{n}) - \sum\_{i=1}^{n} H(\widetilde{X}\_{i}|Z\_{i}, Z^{i-1}, \widetilde{X}\_{i+1}^{n}, W, Y\_{i+1}^{n}) \\ &\stackrel{(c)}{=} H(\widetilde{X}^{n}|Z^{n}) - \sum\_{i=1}^{n} H(\widetilde{X}\_{i}|Z\_{i}, V\_{i}, \widetilde{X}\_{i+1}^{n}) \\ &\stackrel{(d)}{\geq} \sum\_{i=1}^{n} \left[H(\widetilde{X}\_{i}|Z\_{i}) - H(\widetilde{X}\_{i}|Z\_{i}, V\_{i})\right] = \sum\_{i=1}^{n} I(V\_{i}; \widetilde{X}\_{i}|Z\_{i}) \end{aligned} \tag{118}$$

where (*a*) follows by (2), (*b*) follows from the Markov chain relation

$$\left(Z\_{i+1}^{n}, Y\_{i+1}^{n}\right) - \left(\widetilde{X}\_{i+1}^{n}, W, Z^{i}\right) - \widetilde{X}\_{i} \tag{119}$$

(*c*) follows from the definition of $V_i$, and (*d*) follows because $(\widetilde{X}^n, Z^n)$ are i.i.d.

Similar to the privacy leakage analysis above, the matching converse for the secrecy leakage rate in (103), achieved when $R_0 \geq I(U; \widetilde{X}|Y)$, follows from the fact that conditional mutual information is non-negative.

Introduce a uniformly distributed time-sharing random variable $Q \sim \mathrm{Unif}[1:n]$ that is independent of all other random variables, and define $X = X_Q$, $\widetilde{X} = \widetilde{X}_Q$, $Y = Y_Q$, $Z = Z_Q$, $V = V_Q$, and $U = (U_Q, Q)$, so that

$$(Q, V) - U - \widetilde{X} - X - (Y, Z) \tag{120}$$

forms a Markov chain. The converse proof follows by letting $\delta_n \to 0$.

**Cardinality Bounds**: We use the support lemma [54] (Lemma 15.4) for the cardinality bound proofs, which is a standard step, so we omit the proof.

**Author Contributions:** Conceptualization, O.G., R.F.S., H.B. and H.V.P.; Methodology, O.G. and H.V.P.; Software, H.B.; Validation, R.F.S.; Formal analysis, O.G., R.F.S., H.B. and H.V.P.; Resources, H.B.; Data curation, O.G. and R.F.S.; Writing—original draft, O.G.; Writing—review & editing, R.F.S., H.B. and H.V.P.; Project administration, R.F.S. and H.V.P.; Funding acquisition, R.F.S. and H.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** O. Günlü was supported by the ZENITH Research and Career Development Fund and the ELLIIT funding endowed by the Swedish government. R. F. Schaefer was supported in part by the German Federal Ministry of Education and Research (BMBF) within the national initiative for Post-Shannon Communication (NewCom) under grant no. 16KIS1004 and the National Initiative for 6G Communication Systems through the Research Hub 6G-life under grant no. 16KISK001K. H. Boche was supported in part by the BMBF within the National Initiative for 6G Communication Systems through the Research Hub 6G-life under grant no. 16KISK002 and within the national initiative for Information Theory for Post Quantum Crypto "Quantum Token Theory and Applications—QTOK" under grant no. 16KISQ037K, which has received additional funding from the German Research Foundation (DFG) within Germany's Excellence Strategy EXC-2092 CASA-390781972. H. V. Poor was supported in part by the U.S. National Science Foundation (NSF) under grant no. CCF-1908308.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

### **References**


### *Article* **Broadcast Approach to Uplink NOMA: Queuing Delay Analysis**

**Maha Zohdy 1,†,‡, Ali Tajer 1,\*,† and Shlomo Shamai (Shitz) 2,†**


**Abstract:** Emerging wireless technologies are envisioned to support a variety of applications that require simultaneously maintaining low latency and high reliability. Non-orthogonal multiple access techniques constitute one candidate for grant-free transmission alleviating the signaling requirements for uplink transmissions. In open-loop transmissions over fading channels, in which the transmitters do not have access to the channel state information, the existing approaches are prone to facing frequent outage events. Such outage events lead to repeated re-transmissions of the duplicate information packets, penalizing the latency. This paper proposes a multi-access broadcast approach in which each user splits its information stream into several information layers, each adapted to one possible channel state. This approach facilitates preventing outage events and improves the overall transmission latency. Based on the proposed approach, the average queuing delay of each user is analyzed for different arrival processes at each transmitter. First, for deterministic arrivals, closed-form lower and upper bounds on the average delay are characterized analytically. Secondly, for Poisson arrivals, a closed-form expression for the average delay is delineated using the Pollaczek-Khinchin formula. Based on the established bounds, the proposed approach achieves less average delay than single-layer outage approaches. Under optimal power allocation among the encoded layers, numerical evaluations demonstrate that the proposed approach significantly minimizes average sum delays compared to traditional outage approaches, especially under high arrival rates.

**Keywords:** broadcast approach; channel state information; latency; multiple access

### **1. Introduction**

There is a growing need for maintaining low latency and high reliability in a wide range of wireless communication systems [1]. Among the recently proposed techniques for attaining the latency-reliability requirements is the power domain non-orthogonal multiple access (NOMA) [2–6]. Uplink power domain NOMA [5] facilitates simultaneous multi-user channel access, alleviating the traditional signaling period at the beginning of the transmission. Furthermore, by leveraging power control and adaptive decoding order among users, NOMA techniques enhance user fairness by taking into consideration the dissimilarities in the channel state of each user [7,8].

A fundamental challenge that NOMA faces in wireless networks is that its power control critically relies on the availability of full channel state information at each transmitter (CSIT). This assumption is generally infeasible under the anticipated network scale growth. In the absence of CSIT, traditional NOMA occasionally suffers from outage events, which necessitate repeated re-transmissions and negatively affect the overall latency. To address this issue, we propose a non-orthogonal multi-access technique in which each transmitter splits its stream of information into multiple encoded layers, each adapted to a specific combination of all the network's channel states. Each user then transmits

**Citation:** Zohdy, M.; Tajer, A.; Shamai, S. Broadcast Approach to Uplink NOMA: Queuing Delay Analysis. *Entropy* **2022**, *24*, 1757. https://doi.org/10.3390/e24121757

Academic Editors: Onur Günlü and Holger Boche

Received: 16 September 2022 Accepted: 15 November 2022 Published: 30 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the superposition of all its encoded layers to the receiver. In particular, we approach the problem of minimizing the overall communication latency from a cross-layer resource allocation perspective by focusing on the dominant delay factor, i.e., the queuing delay [9]. The goal of the proposed approach is to minimize the average sum-queuing delay among users by optimally allocating power among the encoded layers at each transmitter in the physical layer.

Outage avoidance via multi-layer superposition coding was first proposed in [10,11] for the slowly fading single-user channels. This is generally referred to as the broadcast approach [12]. Furthermore, the studies in [13] extended the broadcast approach to the energy harvesting settings, those in [14–20] to random and multi-access channel models, and those in [21,22] to the multiuser interference channel. Aside from analyzing the achievable rate regions of multi-layer superposition coding [17,23], the average delay performance has only been studied for the single-user fading channel in [24]. However, under CSIT uncertainties, the advantages of adaptive multi-layer superposition coding for controlling the average queuing delay in multiple access channels are yet to be explored. Finally, we note that the broadcast approach is related to the studies on the "rate-splitting", the foundations of which rely on superposition coding of the layered information messages [25].

In this paper, we consider an *N*-user block fading multiple access channel (MAC) in which all transmitters are oblivious to their instantaneous channel state. Each user possesses an infinite capacity queue, occasionally holding the arriving information packets to be transmitted. A novel multi-layer superposition coding scheme is then employed, in which each transmitter adapts its message to the combined network state. Based on the proposed scheme, closed-form lower and upper bounds on the average delay are characterized analytically for deterministic arrivals. Furthermore, a closed-form expression for the average queuing delay is delineated for Poisson arrivals. Based on the derived bounds on average delay, the proposed approach is shown to outperform the single-layer outage approach. Finally, under optimal power allocation among the encoded layers, numerical evaluations demonstrate that the broadcast approach significantly reduces the average sum delays compared to traditional outage approaches under symmetric/asymmetric arrival rates and channel statistics among users.

A rich literature exists on minimizing the average delay through cross-layer resource allocation in MAC with full CSIT. Relevant studies include [26] in which the authors provide an optimal solution for minimizing average delays of two-user MAC channels by controlling the departure probability of each user's queue. In [27], an information-theoretic rate allocation policy is proposed to achieve a lower bound on the average delay of multi-access coding schemes. Dynamic power and rate control to minimize the average delay are studied for multi-access channels in [28]. The study in [28] provides a one-step value iteration policy for optimal scheduling in MAC fading channels. A lower bound on the LTE-A average delay is derived in [29] for random access channels under different arrival processes. The random access scheduling problem is addressed in [30] using a distributed virtual queue model facilitating a self-organizing policy. The study in [31] proposes a joint superposition coding and scheduling policy for the uplink NOMA by relying on user-pairing to reduce the complexity of analysis [32,33]. The accuracy of ranking users in NOMA techniques using distance-based measures versus instantaneous signal-to-noise ratio (SNR) is addressed in [34]. Joint scheduling and superposition coding in fading channels is studied in [35]. The effect of unsaturated traffic in uplink NOMA is studied in [36] using tools from queuing theory. The interaction between power control and queuing service rates in interference-limited channels is studied in [37]. Delay analysis of multi-point to multi-point networks is provided in [38] for spatial-temporal random arrival traffic. The problem of power control in delay-bounded applications is considered in [39], especially under the assumption of imperfect successive interference cancellation in uplink NOMA. The effective capacity of two-user uplink NOMA is characterized in [40] under quality-of-service delay constraints.

Energy-efficient transmission in uplink NOMA is studied in [41] under statistical delay constraints, where probabilistic upper bounds on queuing delays of NOMA are characterized. Resorting to the concept of effective capacity, the study in [42] proposes an optimized hybrid approach between non-orthogonal multiple access and orthogonal multiple access with different user pairing techniques in order to maximize the effective capacity under stringent delay constraints. Contention-based modified NOMA for uplink access is studied in [43], showing that exploiting collisions in the power domain can greatly reduce access delay. The throughput, access delay, and energy efficiency of a NOMA uplink random access system are studied in [44]. Joint power control and user scheduling is considered in [45] to investigate the access delay minimization problem through an efficient sub-optimal iterative algorithm. Optimal power level partitioning to accommodate non-critical and high-priority messages is studied in [46]. A joint dynamic power control and user pairing algorithm is proposed in [47] to minimize long-term time average transmit power and queuing delay. Recent studies further include [48], in which an adaptive rate NOMA with full CSIT is shown to provide better ergodic capacities for mobile users than OMA while satisfying strict local delay constraints for the Internet of Things (IoT) devices in cellular IoT networks. Opportunistic NOMA schemes are proposed in [49] for short message delivery with a delay constraint, based on which an upper bound on session error probability is derived, showing the impact of NOMA on session error under Rayleigh fading. A queuing delay analysis is presented in [50] for uplink NOMA with full CSIT, and the impact of channel estimation imperfections for finite-length channel coding is studied.
Dynamic power allocation schemes with statistical delay quality-of-service (QoS) guarantees are shown in [51] to significantly improve the sum effective capacity and effective energy efficiency for an uplink NOMA system with paired users.

The rest of this paper is organized as follows. Section 2 presents the $N$-user multi-access channel model. The proposed multi-layer multi-access approach is outlined in Section 3 for the special case of the 2-state channel. The average delay achievable by the proposed approach is shown to outperform that of the single-layer outage approach in Section 4 for deterministic and stochastic arrival processes. The proposed multi-access approach is generalized to the case of the finite arbitrary $\ell$-state channel in Section 5. Finally, numerical evaluations are provided in Section 6, and the paper is concluded in Section 7.

### **2. Channel Model**

Consider an *N*-user block fading MAC channel consisting of *N* transmitters and one receiver. The channel state is assumed to remain unchanged during the period of one transmission block of *n* channel uses and varies independently among consecutive blocks. We assume that the block length *n* is large enough to give rise to the notion of reliable communications but much shorter than the dynamics of the fading process [24]. Each transmitter is assumed to know the statistics of the channel state information (CSI) of its own link to the receiver but is oblivious to its instantaneous value. Complete CSI of all links is assumed to be available at the receiver. The input-output relationship of this channel is given by

$$Y = \sum\_{i=1}^{N} h\_{i} X\_{i} + W \,, \tag{1}$$

where $X_i$ denotes the transmitted signal from user $i$ and $W$ is the additive white Gaussian noise with zero mean and unit variance. Finally, $h_i$ denotes the state of the fading channel between transmitter $i$ and the receiver. The transmitted signal $X_i$ is subject to an average power constraint $P$ for all $i \in \{1, \dots, N\}$, i.e., $\mathbb{E}[|X_i|^2] \leq P$. We consider a quantized model for the fading channel according to which $h_i^2$ takes one of two possible states, referred to as {*weak*, *strong*} and denoted by $\{\alpha_1, \alpha_2\}$, respectively. Without loss of generality, we assume $0 < \alpha_1 < \alpha_2 < +\infty$. User $i$ experiences the *strong* or *weak* channel state with probabilities $p_i = \mathbb{P}(h_i^2 = \alpha_2)$ and $\bar{p}_i = 1 - p_i$, respectively.
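To make the quantized fading model concrete, the following sketch samples each user's squared gain $h_i^2 \in \{\alpha_1, \alpha_2\}$ and forms one channel output according to (1). All function names and numeric values are our own illustrative choices, not part of the paper.

```python
import random

def sample_states(p, alpha1, alpha2, rng):
    # User i is "strong" (h_i^2 = alpha2) with probability p[i], else "weak" (alpha1).
    return [alpha2 if rng.random() < pi else alpha1 for pi in p]

def channel_output(h_sq, x, rng):
    # Received signal per (1): y = sum_i h_i * x_i + w, with unit-variance AWGN w.
    w = rng.gauss(0.0, 1.0)
    return sum(hs ** 0.5 * xi for hs, xi in zip(h_sq, x)) + w

rng = random.Random(0)
p = [0.7, 0.3]                                   # hypothetical strong-state probabilities
states = sample_states(p, alpha1=0.5, alpha2=2.0, rng=rng)
y = channel_output(states, x=[1.0, -1.0], rng=rng)
```

Note that the receiver is assumed to observe the realized states, while the transmitters only know $p_i$.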

Each transmitter is assumed to possess an infinite-capacity queue. The queue at transmitter $i$ receives random packets with an average arrival rate $\lambda_i$ (bits/channel use). The size of the data queued at transmitter $i$ at the beginning of any transmission block $t$ is denoted by $\tilde{Q}_i(t)$, $\forall i \in \{1, \dots, N\}$. We define $A_i(t)$ as the total number of bits arriving in the queue at transmitter $i$ during transmission block $t$. Finally, $r_i(t)$ (bits/channel use) denotes the service rate of the queue at transmitter $i$. Hence, the queue size at transmitter $i$ at the end of any transmission block can be expressed using the recursive relationship

$$
\tilde{Q}\_i(t+1) = \begin{cases}
\tilde{Q}\_i(t) + nA\_i(t) - nr\_i(t), & \tilde{Q}\_i(t) + nA\_i(t) - nr\_i(t) \ge 0 \\
0, & \text{otherwise}
\end{cases}.\tag{2}
$$

Accordingly, we define $Q_i(t)$ as the queue size normalized by the blocklength $n$, i.e.,

$$Q\_i(t+1) \stackrel{\triangle}{=} \begin{cases} Q\_i(t) + Z\_i(t), & Q\_i(t) + Z\_i(t) \ge 0 \\ 0, & \text{o.w.} \end{cases} \tag{3}$$

where the random variable $Z_i(t)$ is defined as $Z_i(t) = A_i(t) - r_i(t)$, and it captures the change in the queue size at transmitter $i$ at the end of transmission block $t$. We remark that the number of bit arrivals $A_i(t)$ is random and does not necessarily fit the exact size of the transmitted packet in a given transmission block. Therefore, if the backlogged data at any queue is less than a packet length, the data bits are zero-padded to form a complete packet for the encoder at each transmitter. Throughout the rest of the paper, we assume that the processing delay (i.e., the encoding and decoding processes) as well as the transmission delay are fixed and negligible with respect to the queuing delay. We use the concise notation $C(x, y) = \frac{1}{2}\log_2\!\left(1 + \frac{x}{\frac{1}{P} + y}\right)$ and $\{x^i_j\}_{j=1}^{k} = \{x^i_1, x^i_2, \dots, x^i_k\}$. Finally, we denote the set of all users in the network by $\mathcal{N} = \{1, \dots, N\}$.
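The shorthand $C(x, y)$ and the normalized queue recursion (3) translate directly into code; the following is a minimal sketch with hypothetical names and parameters:

```python
import math

def C(x, y, P):
    # Shorthand from Section 2: C(x, y) = (1/2) * log2(1 + x / (1/P + y)),
    # the Gaussian rate of a layer with power x seen over interference y.
    return 0.5 * math.log2(1.0 + x / (1.0 / P + y))

def queue_update(q, arrival, service):
    # Normalized Lindley-type recursion (3): Q(t+1) = max(Q(t) + A(t) - r(t), 0).
    return max(q + arrival - service, 0.0)

# Trace a few blocks with constant (deterministic) arrivals and a fixed service rate.
q = 0.0
for _ in range(3):
    q = queue_update(q, arrival=0.4, service=0.25)
```

With arrivals of 0.4 against a service rate of 0.25, the backlog grows by 0.15 per block, illustrating why a stability condition of the form $\lambda_i < \mathbb{E}[r_i]$ (cf. (14)) must be imposed.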

### **3. 2-State Channel Multi-Access**

In this section, we present a non-orthogonal multiple-access approach based on multi-layer encoding at each transmitter and successive interference cancellation (SIC) at the receiver. The underlying layering approach hinges on adapting the number of encoded layers at each transmitter to the combined fading state of the network, i.e., the fading states of all transmitters to the receiver. Owing to the interference arising in non-orthogonal multi-access channels with no CSIT, the channel state of each user directly affects the decoding success probabilities of all the other users. Motivated by this, the recent work in [17] proposed a multi-layer coding approach for the two-user multiple access channel with no CSIT, specially adapted to the combined network state, resulting in an enlarged average achievable rate region compared to the existing multi-layer coding approaches. In this section, we extend the layering approach in [17] to the general case of an arbitrary number of users $N$. As shown in this paper, the proposed multi-access approach enjoys considerable advantages in reducing the queuing delay.

### *3.1. Layering Approach*

At the beginning of each transmission block, user $i$ aims to transmit all the data bits accumulated in its queue if the channel state allows it. Otherwise, it encodes a part of its data with the maximum allowable encoding rate. Towards this goal, user $i$ encodes its data (fully or partially) using $2N$ independent messages generated from $2N$ Gaussian codebooks. These messages are denoted by $U^i_{jk}$, $\forall i \in \mathcal{N}$, $j \in \{1, 2\}$, $k \in \{0\} \cup \mathcal{N}$. Based on this decomposition,

$$X\_i = \sum\_{j=1}^{2} \sum\_{k=0}^{N} \mathcal{U}\_{jk}^{i} \,. \tag{4}$$

We consider an ordering of the network states based on the number of users with *strong* channel states, denoted by $k$. We define $\mathcal{S}_k$ as the set of indices of the $k$ users that experience *strong* channel states. Accordingly, $\mathcal{E}_k$ denotes the event that exactly $k$ users are experiencing a *strong* channel.

The notation $U^i_{jk}$ can be interpreted as follows. The superscript $i$ denotes the user index $i \in \mathcal{N}$; the subscript $j \in \{1, 2\}$ refers to user $i$'s channel state, where $j = 1$ if $h_i^2 = \alpha_1$ and $j = 2$ otherwise. Finally, $k \in \{0\} \cup \mathcal{N}$ represents the number of users in the network with a *strong* channel state, possibly including user $i$. Therefore, for every value of $k$, user $i$ adapts the rates of two codewords, $\{U^i_{jk}\}_{j=1}^{2}$, based on its own channel state, resulting in a total of $2N$ layers. The correspondence between each channel state and the adapted layer is shown in Table 1 and summarized below:

- $U^i_{1k}$ is adapted to the states in $\mathcal{E}_k$ with $i \notin \mathcal{S}_k$, i.e., user $i$'s channel is *weak*.
- $U^i_{2k}$ is adapted to the states in $\mathcal{E}_k$ with $i \in \mathcal{S}_k$, i.e., user $i$'s channel is *strong*.

The rate of codeword $U^i_{jk}$ is denoted by $R^i_{jk}$. Finally, we define $\beta^i_{jk}$ as the fraction of the total power $P$ allocated to codeword $U^i_{jk}$, such that

$$\sum\_{j=1}^{2} \sum\_{k=0}^{N} \beta\_{jk}^{i} = 1$$

For user $i$, the rate of each codebook is governed by the power allocation parameters $\beta^i_{jk}$ such that at least one layer is successfully decoded in every possible network state.


**Table 1.** Layering and codebook assignments by user *i*.

*3.2. Decoding Approach*

Corresponding to the layering approach in Section 3.1, we propose a decoding algorithm with $4k+1$ SIC stages for each combined channel state with $k$ *strong* channels. The layers' decoding order is adapted to the combined channel state such that all the layers adapted to channel states with fewer than $k$ strong users, $\{U^i_{j\ell} : \forall j \in \{1, 2\},\ \ell < k\}$, are decoded first and subtracted from the received signal. Afterwards, the layers adapted to the channel state with exactly $k$ strong users, $\{U^i_{jk} : \forall j \in \{1, 2\}\}$, are decoded.

When $|\mathcal{S}| = k$, the receiver employs $4k + 1$ decoding stages. For each $j \in \{1, 2\}$ and $\ell \in \{0, \dots, k\}$, the set of codebooks $\{U^i_{j\ell} : i \in \mathcal{N}\}$ is partitioned into the two sets

$$\mathcal{P}\_{j\ell} \triangleq \{ \mathcal{U}\_{j\ell}^{i} : i \in \mathcal{S} \} \qquad \text{and} \qquad \mathcal{Q}\_{j\ell} \triangleq \{ \mathcal{U}\_{j\ell}^{i} : i \notin \mathcal{S} \} \,, \tag{5}$$

rendering a total of $4k + 1$ partitions for different $j \in \{1, 2\}$ and $\ell \in \{0, \dots, k\}$. The decoding strategy decodes exactly one codebook from each of these partitions, except for the partition $\{U^i_{2k} : i \in \mathcal{S}\}$, which is handled separately in the final stage. The decoding strategy works as follows. We create the following two sequences of sets:

$$\mathbb{P} \triangleq \left\{ \mathcal{P}\_{10}, \mathcal{P}\_{11}, \mathcal{P}\_{21}, \dots, \mathcal{P}\_{2(k-1)}, \mathcal{P}\_{1k} \right\}, \tag{6}$$

$$\mathbb{Q} \triangleq \left\{ \mathcal{Q}\_{1k}, \mathcal{Q}\_{2(k-1)}, \mathcal{Q}\_{1(k-1)}, \dots, \mathcal{Q}\_{11}, \mathcal{Q}\_{10} \right\}. \tag{7}$$

The decoding strategy selects codebooks by alternating between $\mathbb{P}$ and $\mathbb{Q}$ in ascending order and decodes exactly one codebook from each. Specifically, the codebook sets are selected in the following order: $\{\mathcal{P}_{10}, \mathcal{Q}_{1k}, \mathcal{P}_{11}, \mathcal{Q}_{2(k-1)}, \mathcal{P}_{21}, \dots, \mathcal{P}_{1k}, \mathcal{Q}_{10}\}$. This results in $4k$ decoding stages. Finally, the codebooks in $\{U^i_{2k} : i \in \mathcal{S}\}$ are decoded in the last stage, i.e., stage $4k + 1$. Next, we describe the decoding stages and the set of codebooks decoded in each.
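The alternating selection between the sequences in (6) and (7) can be sketched programmatically; the tuples below are the layer indices $(j, \ell)$, and the helper is our own illustration of the stated order:

```python
def decoding_order(k):
    # Build the 4k-stage alternating schedule from (6)-(7); the final stage
    # 4k+1 decodes the strong-user layers {U^i_{2k} : i in S} and is appended last.
    P = [(1, 0)] + [(j, l) for l in range(1, k) for j in (1, 2)] + [(1, k)]
    Q = [(1, k)] + [(j, l) for l in range(k - 1, 0, -1) for j in (2, 1)] + [(1, 0)]
    order = []
    for p, q in zip(P, Q):          # alternate: one set from P, then one from Q
        order.append(('P', p))
        order.append(('Q', q))
    order.append(('P', (2, k)))     # stage 4k+1
    return order
```

For $k = 2$ this yields the nine-stage schedule $\mathcal{P}_{10}, \mathcal{Q}_{12}, \mathcal{P}_{11}, \mathcal{Q}_{21}, \mathcal{P}_{21}, \mathcal{Q}_{11}, \mathcal{P}_{12}, \mathcal{Q}_{10}$, followed by the final strong-layer stage.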



The proposed decoding approach results in decoding more layers for a channel state with $k$ strong users than for a state with $k-1$ strong users. In particular, the receiver decodes one extra layer for user $i$ in channel state $\mathcal{E}_k$ as compared to state $\mathcal{E}_{k-1}$ when user $i$ experiences a *weak* channel in both states. On the other hand, the receiver decodes two extra layers for user $i$ in channel state $\mathcal{E}_k$ as compared to state $\mathcal{E}_{k-1}$ when user $i$ experiences a *strong* channel in both states. Our intuition behind such a strategy hinges on two factors. First, decoding and removing additional interfering users with strong channel states is expected to increase the achievable rate of user $i$. Secondly, when user $i$ experiences a stronger channel, the receiver can possibly decode an additional layer from its message. The decoded layers for channel state $\mathcal{E}_k$ are shown in Table 2 for illustration.

**Table 2.** Decoded layers for channel state $\mathcal{E}_k$, where $h_i^2 = \alpha_j$.


Finally, the detailed steps of the proposed successive decoding algorithm are presented in Algorithm 1. We remark that the effect of the precedence of users with similar channel states within each decoding stage on the average achievable delay will be analyzed in the subsequent sections.


Based on the multi-access approach outlined throughout this section, the service rate of the queue at transmitter $i$ is determined by the total rate of the successfully decoded layers during each network state. Therefore, the service rate $r_i(t)$ during transmission block $t$ varies randomly and is jointly determined by the states of all users as well as the power allocation among the different layers at each transmitter, i.e., $\beta^i_{jk}$. The achievable rates for all the encoded layers are formally stated in Theorem 1.

**Theorem 1.** *For the $N$-user MAC without CSIT, when exactly $k \in \mathcal{N} \cup \{0\}$ users have strong channels, the achievable rates of the layering approach in Section 3.1 and the decoding policy in Algorithm 1 are characterized by the set of rates $\{R^i_{j\ell} : \forall j \in \{1, 2\},\ i \in \mathcal{N},\ \ell \in \{0\} \cup \mathcal{N}\}$ that satisfy*

$$R^i\_{j\ell} \le \min\_{\mathcal{S}: |\mathcal{S}| = k} d^i\_{j\ell}(\mathcal{S})\,. \tag{9}$$

*where the constants $d^i_{j\ell}(\mathcal{S})$, $\forall \ell \in \{0\} \cup \mathcal{N}$, $j \in \{1, 2\}$, are defined in Appendix A.*

### **Proof.** See Appendix B.

We remark that characterizing the achievable rate region of the proposed approach in the form of rate bounds on individual codebook rates, rather than as an average achievable rate region, will be instrumental to the average delay analysis throughout the next section.

### **4. Average Queuing Delay**

In this section, we investigate the average queuing delay achieved by the multi-access approach in Section 3 compared to the conventional single-layer (outage) multi-access approach. First, in Section 4.1, we focus on the case of a deterministic arrival process at each queue, for which we delineate lower and upper bounds on the average queuing delay. Furthermore, the case of stochastic arrivals is examined in Section 4.2, in which a closed-form expression for the average delay achievable by the proposed approach is characterized and compared to that of the single-layer transmission approach. To proceed, we define $\mathcal{E}(\mathcal{S}^i_k)$ as the event in which exactly $k$ channels are strong and they include the channel of user $i$, and $\mathcal{E}(\bar{\mathcal{S}}^i_k)$ as the event in which exactly $k$ channels are strong and user $i$'s channel is *weak*. We begin by computing the probabilities of the events $\mathcal{E}(\mathcal{S}^i_k)$ and $\mathcal{E}(\bar{\mathcal{S}}^i_k)$ as follows.

$$\mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{k}^{i}\right)\right] = \sum\_{\substack{\mathcal{T}\subseteq\mathcal{N}, |\mathcal{T}|=k \\ i \in \mathcal{T}}} \prod\_{j\in\mathcal{T}} p\_{j} \prod\_{\ell\notin\mathcal{T}} \bar{p}\_{\ell} \qquad \text{and} \qquad \mathbb{P}\left[\mathcal{E}\left(\bar{\mathcal{S}}\_{k}^{i}\right)\right] = \sum\_{\substack{\mathcal{T}\subseteq\mathcal{N}, |\mathcal{T}|=k \\ i \notin \mathcal{T}}} \prod\_{j\in\mathcal{T}} p\_{j} \prod\_{\ell\notin\mathcal{T}} \bar{p}\_{\ell} \,, \tag{10}$$

where $\mathcal{T}$ denotes a subset of user indices.
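The probabilities in (10) can be evaluated by enumerating subsets directly. The sketch below is our own, and it assumes the reading in which the first sum runs over subsets $\mathcal{T}$ containing $i$ and the second over subsets excluding $i$:

```python
from itertools import combinations

def prob_event(p, i, k, include_i=True):
    # include_i=True : P[E(S_k^i)]    -- exactly k strong users, user i strong.
    # include_i=False: P[E(Sbar_k^i)] -- exactly k strong users, user i weak.
    total = 0.0
    for T in combinations(range(len(p)), k):     # T = candidate set of strong users
        if (i in T) != include_i:
            continue
        prob = 1.0
        for j in range(len(p)):
            prob *= p[j] if j in T else (1.0 - p[j])
        total += prob
    return total
```

A quick sanity check: summing $\mathbb{P}[\mathcal{E}(\mathcal{S}^i_k)]$ over all $k$ recovers $p_i$, the marginal probability that user $i$ is strong.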

### *4.1. Deterministic Arrivals*

Throughout this subsection, we assume that the data arrival process at each queue is deterministic with average arrival rate $\lambda_i$, i.e., $A_i(t) = \lambda_i$, $\forall i \in \mathcal{N}$. Note that, as a result of the zero-padding applied by the encoder whenever the available data bits are fewer than a transmission packet, a G/G/1 queuing model arises at each transmitter. A closed-form expression characterizing the average delay of the G/G/1 queuing model is, in general, unknown. Therefore, we resort to characterizing upper and lower bounds on the average queuing delay. These bounds are formally presented in Theorem 2. Before stating Theorem 2, we provide an outline of the main steps pertinent to deriving the characterized bounds; the detailed proof can be found in Appendix C.

Establishing the desired bounds hinges on characterizing the average queue size at each transmitter $i$ using the Laplace transform of the probability density function (PDF) of the queue size $Q_i$ (i.e., the moment generating function). Let the PDF of $Q_i$ be denoted by $dF_i(q)$ and its associated Laplace transform by $L_i(s)$. Therefore, the average queue size at transmitter $i$ is given by

$$\mathbb{E}[Q\_i] = \lim\_{s \to 0} -\frac{dL\_i(s)}{ds} \,. \tag{11}$$
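Relation (11) can be verified numerically for a simple test case: if $Q_i$ were exponentially distributed with rate $\mu$, then $L(s) = \mu/(\mu+s)$ and $-L'(0) = 1/\mu$. The finite-difference helper below is our own illustration, not part of the paper's derivation:

```python
def mean_from_laplace(L, h=1e-6):
    # E[Q] = -dL/ds at s = 0, per (11), approximated by a central finite difference.
    return -(L(h) - L(-h)) / (2.0 * h)

# Exponential law with mu = 2: L(s) = 2 / (2 + s), so E[Q] should be 1/2.
mean_q = mean_from_laplace(lambda s: 2.0 / (2.0 + s))
```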

Recalling the recursive expression for $Q_i$ in terms of the variable $Z_i$ in (3), a recursive form of $F_i(q)$ can be expressed as follows [52,53]:

$$F\_i(q) = \begin{cases} \int\_{-\infty}^{q} F\_i(q-\tau)\, dF\_{Z\_i}(\tau) \,, & q \ge 0 \\ 0 \,, & q < 0 \end{cases} \tag{12}$$

where *dFZi*(*z*) denotes the PDF of *Zi*, the change in the queue size at user *i*. At the end of every transmission block, the change in the queue size at user *i*, *Zi*, is primarily determined by the difference between the data arrival *λi* and the total rate of all the layers successfully decoded by the receiver from user *i*'s message stream, which in turn is determined by the combined network state. Consequently, *dFZi*(*z*) can be expressed as

$$\begin{split}dF\_{Z\_{i}}(z) &= \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{0}^{i}\right)\right]\delta\left(z-\lambda\_{i}+R\_{10}^{i}\right) + \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{N}^{i}\right)\right]\delta\left(z-\lambda\_{i}+\sum\_{j=1}^{2}\sum\_{k=1}^{N}R\_{jk}^{i}\right) \\ &+ \sum\_{\ell=1}^{N-1}\mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{\ell}^{i}\right)\right]\delta\left(z-\lambda\_{i}+\sum\_{j=1}^{2}\sum\_{k=0}^{\ell-1}R\_{jk}^{i}+R\_{2\ell}^{i}\right) \\ &+ \sum\_{\ell=1}^{N-1}\mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{\ell}^{i}\right)\right]\delta\left(z-\lambda\_{i}+\sum\_{j=1}^{2}\sum\_{k=0}^{\ell-1}R\_{jk}^{i}+R\_{1\ell}^{i}\right). \end{split} \tag{13}$$

We remark that in order to guarantee the stability of the data queue at each transmitter, we assume that the arrival rate *λ<sup>i</sup>* is less than the average achievable rate (service rate of the queue), i.e.,

$$
\lambda\_i < \mathbb{E}[r\_i] \; , \qquad \forall i \in \mathcal{N} \; , \tag{14}
$$

where the average service rate at queue *i* is given by

$$\begin{split} \mathbb{E}[r\_{i}] &= \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{0}^{i}\right)\right]R\_{10}^{i} + \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{N}^{i}\right)\right] \cdot \sum\_{j=1}^{2} \sum\_{k=1}^{N} R\_{jk}^{i} \\ &+ \sum\_{\ell=1}^{N-1} \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{\ell}^{i}\right)\right] \cdot \left(\sum\_{j=1}^{2} \sum\_{k=0}^{\ell-1} R\_{jk}^{i} + R\_{2\ell}^{i}\right) + \sum\_{\ell=1}^{N-1} \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{\ell}^{i}\right)\right] \cdot \left(\sum\_{j=1}^{2} \sum\_{k=0}^{\ell-1} R\_{jk}^{i} + R\_{1\ell}^{i}\right) . \end{split} \tag{15}$$

An explicit expression for *Fi*(*q*), ∀*i* ∈ N , directly follows by combining (12) and (13)

$$F\_{i}(q) = \begin{cases} 0, & \forall q \in \mathcal{R}\_1 \\ \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_0^i\right)\right]F\_{i}(q - \lambda\_{i} + \sum\_{j=1}^2 \sum\_{k=0}^N R\_{jk}^i), & \forall q \in \mathcal{R}\_2 \\ \vdots \\ \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_0^i\right)\right]F\_{i}(q - \lambda\_{i} + R\_{10}^i), & \forall q \in \mathcal{R}\_{2N-1} \end{cases} \tag{16}$$

where the intervals R*i*, ∀*i* ∈ {1, . . . , 2*N* − 1}, are given by

$$\begin{aligned} \mathcal{R}\_1 & \stackrel{\triangle}{=} (-\infty, 0) \\ \mathcal{R}\_2 & \stackrel{\triangle}{=} \left[ 0, \lambda\_i - \sum\_{j=1}^2 \sum\_{k=0}^N R\_{jk}^i + R\_{1(N-1)}^i \right] \\ & \vdots \\ \mathcal{R}\_{2N-1} & \stackrel{\triangle}{=} \left[ \lambda\_i - R\_{10}^i, \infty \right) . \end{aligned}$$

Finally, the Laplace transform of the queue size PDF is computed using (16), which in turn facilitates obtaining the average queue size at user *i*. Note that although *Fi*(*q*) is expressed in (16), it is still in a recursive form. Therefore, the obtained expression for the average queue size contains the unknown term *Fi*(*q*), which is why a closed form cannot be obtained. Subsequently, an upper and a lower bound on the average queue size of user *i* ∈ N are formally characterized in the next theorem.

**Theorem 2.** *The average queue size of transmitter i under the multi-access policy in Section 3 is bounded by*

$$\frac{1}{2} \sum\_{j=1}^{2} \sum\_{k=0}^{N} R\_{jk}^{i} - \frac{\lambda\_{i}}{2} - \frac{N\_{i}}{D\_{i}} \le \mathbb{E}[Q\_{i}] \le \sum\_{j=1}^{2} \sum\_{k=0}^{N} R\_{jk}^{i} - \lambda\_{i} - \frac{N\_{i}}{D\_{i}} \tag{17}$$

*where we have defined Di* = 2(E[*ri*] − *λi*) *and*

$$\begin{split} N\_{i} & \stackrel{\triangle}{=} - \left( \sum\_{j=1}^{2} \sum\_{k=0}^{N} R\_{jk}^{i} - \lambda\_{i} \right)^{2} + \mathbb{P} \left[ \mathcal{E} \left( \mathcal{S}\_{0}^{i} \right) \right] \left( \sum\_{j=1}^{2} \sum\_{k=1}^{N} R\_{jk}^{i} \right)^{2} \\ & + \sum\_{\ell=1}^{N-1} \mathbb{P} \left[ \mathcal{E} \left( \mathcal{S}\_{\ell}^{i} \right) \right] \cdot \left( \sum\_{j=1}^{2} \sum\_{k=\ell+1}^{N} R\_{jk}^{i} + R\_{1\ell}^{i} \right)^{2} + \sum\_{\ell=1}^{N-1} \mathbb{P} \left[ \mathcal{E} \left( \bar{\mathcal{S}}\_{\ell}^{i} \right) \right] \cdot \left( \sum\_{j=1}^{2} \sum\_{k=\ell+1}^{N} R\_{jk}^{i} + R\_{2\ell}^{i} \right)^{2} . \end{split} \tag{18}$$

**Proof.** See Appendix C.
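The bounds in (17)–(18) can be evaluated numerically once the decoded-rate distribution is fixed. The sketch below is ours and rests on one interpretive assumption: each squared term in (18) is read as the gap between the full layered rate and the realized service rate; all function names and the example numbers are illustrative:

```python
def broadcast_queue_bounds(service_rates, probs, lam):
    # Evaluate the bounds of (17), reading the squared terms of (18) as
    # (R_tot - r)^2 for each realization r of the decoded rate, where
    # R_tot is the full layered rate (all layers decoded).
    Er = sum(r * p for r, p in zip(service_rates, probs))
    assert Er > lam, "stability (14) requires lam < E[r]"
    R_tot = max(service_rates)
    N = -(R_tot - lam) ** 2 + sum(
        p * (R_tot - r) ** 2 for r, p in zip(service_rates, probs))
    D = 2.0 * (Er - lam)
    lower = 0.5 * R_tot - 0.5 * lam - N / D
    upper = R_tot - lam - N / D
    return lower, upper
```

Note that, independently of the distribution, the gap between the two bounds equals (ΣR − *λi*)/2, which is positive under the stability condition (14).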

Using Little's law, upper and lower bounds on the average queuing delay at transmitter *i* under deterministic arrivals are directly obtained by normalizing the bounds on E[*Qi*] characterized in Theorem 2 by *λi*.

In order to assess the performance of the proposed multi-layer superposition coding access approach, we compare the achievable average queuing delay to that of the conventional single-layer access (outage) approach. To this end, we first summarize the single-layer approach, and afterward, bounds on the average queuing delay achieved by the single-layer approach are characterized in Lemma 1. Finally, we compare the rate of increase of the average delay achieved by each policy with respect to the data arrival rate. As the arrival rate increases, the rate of increase of the average delay with respect to *λi* resulting from the proposed approach is lower than that resulting from the single-layer (outage) approach.

According to the single-layer (outage) transmission approach, each transmitter encodes the available data in its queue into one layer of a fixed rate, irrespective of the unknown network state. For *i* ∈ N , let *R*s*i* denote the rate of the single encoded layer

transmitted by user *i* in the outage approach. In any given transmission block, if the rate *R*s*i* lies in the achievable rate region of the actual network state, it will be successfully decoded by the receiver. Otherwise, an outage occurs in which the receiver fails to decode the message of user *i*, and the transmitter attempts to re-transmit the same message in the subsequent transmission block using the same encoding rate *R*s*i*. We define *r*s*i*(*t*) as the service rate of the queue at user *i* under the single-layer transmission, i.e., the encoding rate of the codeword transmitted by user *i* in transmission block *t*, successfully decoded by the receiver, and hence removed from user *i*'s queue. Furthermore, we denote by *p*s*i* the probability of successfully decoding a message of rate *R*s*i* from user *i*. Accordingly, the service rate of the queue at transmitter *i* under the outage approach is given by

$$r\_i^s(t) = \begin{cases} R\_i^s, & \text{with probability} \quad p\_i^s \\ 0, & \text{with probability} \quad 1 - p\_i^s \,. \end{cases} \tag{19}$$

Finally, we define *Q*s*i* as the queue size at transmitter *i* under the single-layer transmission approach summarized above. In Lemma 1, we characterize lower and upper bounds on the average E[*Q*s*i*] using an approach similar to that used to characterize the bounds in Theorem 2.

**Lemma 1.** *The average queue size of transmitter i under the single-layer (outage) approach is lower and upper bounded according to:*

$$\frac{1}{2}R\_i^s - \frac{\lambda\_i}{2} - \frac{\left(R\_i^s - \lambda\_i\right)^2 - R\_i^s \left(1 - p\_i^s\right)}{2\left(p\_i^s R\_i^s - \lambda\_i\right)} \le \mathbb{E}[Q\_i^s] \le R\_i^s - \lambda\_i - \frac{\left(R\_i^s - \lambda\_i\right)^2 - R\_i^s \left(1 - p\_i^s\right)}{2\left(p\_i^s R\_i^s - \lambda\_i\right)}.\tag{20}$$

**Proof.** Follows the same argument as that in Appendix C.
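To make (20) concrete, the following sketch evaluates both bounds exactly as written, together with the corresponding delay bounds via Little's law (the function names and example numbers are ours):

```python
def outage_queue_bounds(R, p, lam):
    # Lower/upper bounds on E[Q^s_i] per (20); stability needs lam < p*R.
    assert p * R > lam, "stability requires lam < p*R (mean service rate)"
    corr = ((R - lam) ** 2 - R * (1.0 - p)) / (2.0 * (p * R - lam))
    lower = 0.5 * R - 0.5 * lam - corr
    upper = R - lam - corr
    return lower, upper

def outage_delay_bounds(R, p, lam):
    # Little's law: average-delay bounds are the queue-size bounds over lam.
    lo, up = outage_queue_bounds(R, p, lam)
    return lo / lam, up / lam
```

For instance, with *R*s = 1, *p*s = 0.8, and *λ* = 0.4, the correction term equals 0.2, giving queue-size bounds [0.1, 0.4].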

In Theorem 2 and Lemma 1, we remark that the characterized bounds on the average queuing delay at each transmitter depend only on the arrival rate at the same node. Therefore, the effect of the average arrival rate on the delay bounds in (17) or (20) can be analyzed for each node *i* independently. In Theorem 3, while fixing the average achievable rates at each user to be equal across both approaches, we show that as the arrival rate *λi* at each user increases, the proposed multi-access approach yields a lower rate of increase in the average queuing delay than that achieved by the single-layer approach.

**Theorem 3.** *For the N-user multiple access channel, given that*

$$\mathbb{E}[r\_i] = \mathbb{E}[r\_i^s] \, , \tag{21}$$

*the rate of increase of the average delay with respect to the arrival rate under the approach in Section 3 is lower than that achieved by the single-layer outage approach, i.e., for every i* ∈ N

$$\frac{\partial \mathbb{E}[Q\_i]}{\partial \lambda\_i} \le \frac{\partial \mathbb{E}[Q\_i^s]}{\partial \lambda\_i}.\tag{22}$$

**Proof.** See Appendix D.

### *4.2. Stochastic Arrivals*

In this section, we consider the proposed multi-layer superposition coding policy presented in Section 3 under Poisson distributed random arrivals *Ai* ∼ *Pois*(*λi*). We adopt the same queuing model in which each transmitter applies zero-padding in case the available bits in its queue are fewer than the size of a transmitted packet. Therefore, under Poisson distributed arrivals, the considered model constitutes an *M*/*G*/1 queuing model with an average arrival rate *λi* and service rate *ri* specified in (15). Furthermore, we denote the queue utilization at transmitter *i* by *ρi* = *λi*/E[*ri*]. The average queue length for an *M*/*G*/1 queue can be characterized in closed form by directly applying the Pollaczek–Khinchin formula. Theorem 4 formally states the average queue size under the proposed layering and decoding approach.

**Theorem 4.** *According to the multi-access approach outlined in Section 3, the average queue length at user i with Poisson distributed arrivals with the average rate λ<sup>i</sup> is given by*

$$\mathbb{E}[Q\_i] = \rho\_i + \frac{\rho\_i^2 + \lambda\_i \mathbb{V}[r\_i]}{2(1 - \rho\_i)},\tag{23}$$

*where the average service rate* E[*ri*] *is given by* (15) *and the variance of the service rate* V[*ri*] *is*

$$\begin{split} \mathbb{V}[r\_{i}] &= -\left(\mathbb{E}[r\_{i}]\right)^2 + \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{0}^{i}\right)\right](R\_{10}^{i})^{2} + \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{N}^{i}\right)\right] \cdot \left(\sum\_{j=1}^{2} \sum\_{k=1}^{N} R\_{jk}^{i}\right)^{2} \\ &+ \sum\_{\ell=1}^{N-1} \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{\ell}^{i}\right)\right] \cdot \left(\sum\_{j=1}^{2} \sum\_{k=0}^{\ell-1} R\_{jk}^{i} + R\_{2\ell}^{i}\right)^{2} + \sum\_{\ell=1}^{N-1} \mathbb{P}\left[\mathcal{E}\left(\mathcal{S}\_{\ell}^{i}\right)\right] \cdot \left(\sum\_{j=1}^{2} \sum\_{k=0}^{\ell-1} R\_{jk}^{i} + R\_{1\ell}^{i}\right)^{2} . \end{split} \tag{24}$$

**Proof.** Follows by applying the Pollaczek–Khinchin formula for the *M*/*G*/1 average queue size [54], where the service rate of queue *i* is given by *ri*.
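As a sanity check on (23), the Pollaczek–Khinchin expression can be evaluated directly from any finite service-rate distribution, such as the one induced by (13); the sketch below uses illustrative numbers of our own:

```python
def pk_mean_queue(lam, rates, probs):
    # E[Q] per (23): rho + (rho^2 + lam * Var[r]) / (2 * (1 - rho)),
    # where the service rate r takes value rates[m] w.p. probs[m].
    Er = sum(r * p for r, p in zip(rates, probs))
    Vr = sum(p * (r - Er) ** 2 for r, p in zip(rates, probs))
    rho = lam / Er
    assert rho < 1.0, "stability requires rho < 1"
    return rho + (rho ** 2 + lam * Vr) / (2.0 * (1.0 - rho))
```

With a deterministic service rate (zero variance), the expression reduces to ρ + ρ²/(2(1 − ρ)); increasing V[*ri*] at a fixed mean strictly increases the average queue size, which is the mechanism behind the comparison discussed next.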

We remark that the proof of Theorem 3 implies that the proposed approach outperforms the single-layer outage approach in the case of Poisson arrivals as well, under equal average achievable rates. This result can be readily verified given that the proof in Appendix D essentially boils down to showing that the variance of the service rate (transmission rate) at each queue, V[*ri*], is higher in the case of the single-layer outage approach than under the proposed multi-layer approach.

### **5.** *L***-State Channel Multi-Access**

In this section, we generalize the multi-access encoding and decoding approach outlined in Section 3 from the special case of the 2-state channel, {*weak*, *strong*}, to a channel with an arbitrary number of states *L*. We denote the channel states by {*α*1, ... , *αL*}. Without loss of generality, we assume that 0 < *α*1 < ··· < *αL* < +∞. Similarly to Section 2, we consider a slowly fading non-orthogonal multiple access channel model with *N* transmitters and one receiver. The channel power gain of each user *i* can randomly take one of the *L* states, i.e., *h*2*i* ∈ {*α*1, ... , *αL*}.

In the layering approach in Section 3.1, we ordered the network state according to the number of users experiencing a *strong* channel state. Subsequently, each user splits its message into 2*N* layers, and the receiver decodes the layers adapted to the actual network state. Similarly, for the *L*-state channel, we order the combined network state according to the number of users in the network sharing a particular state *αj* as well as the value of that state. In particular, a combined network state is degraded with respect to another state if it has a strictly smaller sum-rate capacity. We define the column vector *h* = [*h*2 1, ... , *h*2 *N*]*T* as the combined network state and consider a network state *h* to be degraded with respect to network state *h*˜ if and only if

$$\|h\|\_{1} \leqslant \|\tilde{h}\|\_{1} \,. \tag{25}$$

The motivation for such an ordering stems from the fact that the condition in (25) indicates that the state *h*˜ allows a higher sum-rate capacity in an *N*-user MAC with full CSIT. In order to overcome the absence of full CSIT at each user, a transmitter splits its message into a finite number of layers, each adapted to a combined network state, to avoid complete outages. Similarly to Section 3.1, user *i* encodes an available message using (*L* − 1)*N* + 1 independent random Gaussian codebooks. The codewords of these codebooks are denoted by *Ui jk*. For layer *Ui jk*, *j* ∈ {1, ... , *L*} denotes the channel state of user *i*, that is, *h*2*i* = *αj*, while *k* ∈ {0, ... , *N* − 1} denotes the number of users in the network with a stronger channel state, i.e., *k* = ∑*N m*=1 *I*(*h*2*m* > *αj*), where *I*(*x*) is the indicator function.
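As an illustration of this indexing (the function and variable names below are ours), the pair (*j*, *k*) attached to each user's targeted layer can be computed directly from a realization of the gains:

```python
def layer_indices(h2, alphas):
    # For user i with gain h2[i] = alphas[j-1], return (j, k): j is the
    # channel-state index and k counts users with strictly stronger states.
    out = []
    for hi in h2:
        j = alphas.index(hi) + 1
        k = sum(1 for h in h2 if h > hi)
        out.append((j, k))
    return out

def num_codebooks(L, N):
    # Each user encodes its message into (L - 1) * N + 1 layers.
    return (L - 1) * N + 1
```

For example, with two states {0.5, 1.0} and gains (0.5, 1.0, 1.0), the weak user targets layer (*j*, *k*) = (1, 2) and each strong user targets (2, 0).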

According to the layering approach outlined above, the receiver attempts to successively decode up to *N*((*L* − 1)*N* + 1) layers, depending on the exact combined network state *h*. In particular, when the actual network state is *h*, the receiver decodes for each user *i* the layer *Ui jk* adapted to network state *h*, in addition to all the layers adapted to all degraded network states *h*˜ for which (25) is satisfied. The number of layers decoded for user *i* at the receiver increases from network state *h* to network state *h*ˆ either if its own channel state becomes stronger or if the number of users experiencing channels strictly stronger than *h*2*i* increases.

Given a network state *h*, the receiver employs up to *M* stages of successive decoding, where *M* denotes the index of the strongest channel gain present in the network, i.e., *αM* = ‖*h*‖∞. In stage *n* ∈ {1, ... , *M*}, the receiver successively decodes up to one layer for each user according to a descending order of the channel states among users. The details of the proposed decoding order for the *L*-state channel are outlined in Algorithm 2.


We remark that, according to the proposed layering approach for the *L*-state channel and the decoding approach in Algorithm 2, the total number of layers decoded by the receiver from each user *i* possibly differs across network states. One possible generalization of the layering policy in Section 3 is for each user to adapt a different encoding layer to each possible combined channel state, which in turn requires each user to encode its message into *L*<sup>N</sup> layers. However, the computational complexity of the decoding process, in addition to that of determining the optimal power allocation among the layers, becomes considerable as the number of users *N* grows. Therefore, we adopt the outlined layering approach, in which each user splits its message into (*L* − 1)*N* + 1 layers instead of *L*<sup>N</sup> layers.

### **6. Numerical Evaluations**

In this section, we evaluate the average achievable queuing delay for each user in the MAC using the multi-access broadcast approach outlined in Section 3. In particular, we adopt a Monte-Carlo simulation to optimally allocate the transmission power among the encoded layers at each user such that the average queuing delay is minimized. We divide the comparison settings into two main parts according to the arrival process at each queue, where we set the arrival process to be the same for both users in each setting. The first considers deterministic arrivals with value *λ*. The second considers the Poisson arrival process. Furthermore, we also consider symmetric and asymmetric channel distributions among users. Throughout this section, we set the channel gains to *α*1 = 0.5 for the weak channel and *α*2 = 1 for the strong channel. In the symmetric case, we set the channel probability distribution for each user as *p*1 = *p*2 = 0.5, and in the asymmetric case, we set the probabilities to *p*1 = 0.5 and *p*2 = 0.1. In the asymmetric model, user 2 encounters a weak channel with a high probability, i.e., *p*¯2 = 0.9. We set the objective of this numerical simulation to minimizing the sum of the average delays of users 1 and 2 under the broadcast approach. Subsequently, based on the obtained optimal power distribution among the layers at each user, we evaluate the resulting average delay for the outage approach such that the average rates for each user are equal across both approaches.
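The qualitative comparison can be reproduced in outline with the discrete-time queue recursion *Q*(*t*+1) = max(*Q*(*t*) + *λ* − *r*(*t*), 0); the service-rate samplers and parameters below are illustrative stand-ins of ours, not the optimized power allocations used for the figures:

```python
import random

def avg_queue(sample_rate, lam, T=100000, seed=1):
    # Simulate Q_{t+1} = max(Q_t + lam - r_t, 0) under deterministic
    # arrivals lam and return the time-averaged queue size.
    rng = random.Random(seed)
    q, acc = 0.0, 0.0
    for _ in range(T):
        q = max(q + lam - sample_rate(rng), 0.0)
        acc += q
    return acc / T

# Single-layer (outage) service: rate 1 w.p. 0.5, else 0 (mean 0.5).
outage = lambda rng: 1.0 if rng.random() < 0.5 else 0.0
# A layered-style service with the same mean but lower variance.
layered = lambda rng: 0.3 if rng.random() < 0.5 else 0.7
```

With *λ* = 0.4, both queues are stable (mean service 0.5), yet the lower-variance layered service yields a markedly smaller average queue, consistent with the analysis in Section 4.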

Figures 1 and 2 focus on deterministic arrivals in the symmetric and asymmetric channel settings. In these figures, we compare average delay versus varying arrival rate *λ* in the proposed broadcast approach (denoted by "Bc") and in the outage approach (denoted by "outage"). In these evaluations, we have set the SNR to *P* = 10 dB. Furthermore, in these figures, we provide upper bounds that we have characterized for the broadcast approach (denoted by "BcUB") and the outage approach (denoted by "OutageUB"). Figures 3 and 4 depict the counterparts of these results for Poisson arrival processes. Finally, it is observed that introducing asymmetry in the models (i.e., unequal probabilities for encountering strong channels) slightly improves the average latency of the broadcast approach, whereas it does not have a notable effect in the outage approach.

**Figure 1.** Deterministic: Symmetric.

**Figure 2.** Deterministic: Asymmetric.

**Figure 3.** Poisson: Symmetric.

**Figure 4.** Poisson: Asymmetric.

The numerical evaluations support the analysis, demonstrating that the proposed broadcast approach significantly enhances the average delays of both users in the moderate and high SNR regimes for moderate and high arrival rates.

### **7. Concluding Remarks**

In this paper, a non-orthogonal multi-access broadcast approach is employed, in which each user splits its information stream into a finite number of encoded layers, each adapted to one possible network state, serving as an outage-free low-latency transmission scheme. In particular, the average queuing delay of each user under the proposed multi-access approach is analyzed for different arrival processes at each transmitter. First, for deterministic arrivals, closed-form lower and upper bounds on the average delay are derived analytically. Secondly, for Poisson arrivals, the average queuing delay is characterized in closed form. The latency advantage of the proposed approach compared to single-layer transmission is shown analytically. Finally, we note that in this paper, our focus has been on discrete channel models, since they provide a setting in which the key ideas (specifically, information layering and the decoding strategy) can be described clearly and in detail. To gain insight into the behavior for continuous channel models, the number of channel states can be increased; in the limit of infinitely many states, the models converge to a continuous model, and the codebook assignments and decoding strategy converge to their counterparts for continuous channels (a larger number of codebooks with low rates).

**Author Contributions:** All authors have contributed equally to the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work of A. Tajer has been supported in part by the U.S. National Science Foundation grant ECCS-1933107. The work of S. Shamai (Shitz) has been supported by the European Union's Horizon 2020 Research And Innovation Programme, grant agreement no. 694630.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Constants of Theorem 1**

∀*i* ∈ N :

$$d\_{10}^i(\boldsymbol{\phi}) \stackrel{\triangle}{=} \mathbb{C}\left(a\_1 \beta\_{10}^i,\; N a\_1 - \sum\_{j=1}^i a\_1 \beta\_{10}^j\right). \tag{A1}$$

∀*m* ∈ S*<sup>k</sup>* :

$$d\_{10}^{m}(\mathcal{S}\_k) \stackrel{\triangle}{=} \mathbb{C}\left(a\_2 \beta\_{10}^{m},\; (N-k)a\_1 + ka\_2 - \sum\_{j \in \mathcal{S}\_k,\, j \le k(m)} a\_2 (1 - \beta\_{10}^j)\right),\tag{A2}$$

$$d\_{11}^{m}(\mathcal{S}\_{k}) \stackrel{\triangle}{=} \mathbb{C} \left( a\_{2} \beta\_{11}^{m},\; (N-k)a\_{1} + ka\_{2} - \sum\_{j \in \mathcal{S}\_{k}} a\_{2} \beta\_{10}^{j} - \sum\_{j \notin \mathcal{S}\_{k}} a\_{1} \beta\_{1k}^{j} - \sum\_{j \in \mathcal{S}\_{k},\, j \le k(m)} a\_{2} \beta\_{11}^{j} \right), \tag{A3}$$

$$d\_{21}^{m}(\mathcal{S}\_k) \stackrel{\triangle}{=} \mathbb{C}\left(a\_2 \beta\_{21}^{m},\; (N-k)a\_1 + ka\_2 - \sum\_{j \in \mathcal{S}\_k} a\_2 (\beta\_{10}^j + \beta\_{11}^j) - \sum\_{j \notin \mathcal{S}\_k} a\_1 (\beta\_{1k}^j + \beta\_{2(k-1)}^j) - \sum\_{j \in \mathcal{S}\_k,\, j \le k(m)} a\_2 \beta\_{21}^j\right), \tag{A4}$$

∀*m* ∈ S*k* and ℓ ∈ {1, . . . , *k*}:

$$d\_{1\ell}^{m}(\mathcal{S}\_{k}) \stackrel{\triangle}{=} \mathbb{C} \left( a\_{2} \beta\_{1\ell}^{m},\; (N-k) a\_{1} + k a\_{2} - \sum\_{j \in \mathcal{S}\_{k}} a\_{2} \beta\_{10}^{j} - \sum\_{j \in \mathcal{S}\_{k}} \sum\_{i=1}^{\ell-1} a\_{2} (\beta\_{1i}^{j} + \beta\_{2i}^{j}) - \sum\_{j \notin \mathcal{S}\_{k}} a\_{1} \sum\_{i=1}^{\ell-1} (\beta\_{1(k-i+1)}^{j} + \beta\_{2(k-i+1)}^{j}) - \sum\_{j \in \mathcal{S}\_{k},\, j \le k(m)} a\_{2} \beta\_{1\ell}^{j} \right), \tag{A5}$$


$$d\_{2\ell}^{m}(\mathcal{S}\_{k}) \stackrel{\triangle}{=} \mathbb{C} \left( a\_{2} \beta\_{2\ell}^{m},\; (N - k) a\_{1} + k a\_{2} - \sum\_{j \in \mathcal{S}\_{k}} a\_{2} \beta\_{10}^{j} - \sum\_{j \in \mathcal{S}\_{k}} \sum\_{i=1}^{\ell} a\_{2} \beta\_{1i}^{j} - \sum\_{j \in \mathcal{S}\_{k}} \sum\_{i=1}^{\ell-1} a\_{2} \beta\_{2i}^{j} - \sum\_{j \notin \mathcal{S}\_{k}} a\_{1} \sum\_{i=1}^{\ell} \left( \beta\_{1(k-i+1)}^{j} + \beta\_{2(k-i+1)}^{j} \right) - \sum\_{j \in \mathcal{S}\_{k},\, j \le k(m)} a\_{2} \beta\_{2\ell}^{j} \right) . \tag{A6}$$

∀*n* ∉ S*k* and ℓ ∈ {1, . . . , *k*}:

$$\begin{split} d\_{1(k-\ell+1)}^{n}(\mathcal{S}\_{k}) & \stackrel{\triangle}{=} \mathbb{C} \left( \alpha\_1 \beta\_{1(k-\ell+1)}^{n},\; (N-k)\alpha\_1 + k\alpha\_2 - \sum\_{j \in \mathcal{S}\_k} \alpha\_2 \beta\_{10}^{j} - \sum\_{j \in \mathcal{S}\_k} \sum\_{i=1}^{\ell-1} \alpha\_2 (\beta\_{1i}^{j} + \beta\_{2i}^{j}) \right. \\ & \quad \left. - \sum\_{j \notin \mathcal{S}\_k} \alpha\_1 \sum\_{i=1}^{\ell-1} (\beta\_{1(k-i+1)}^{j} + \beta\_{2(k-i+1)}^{j}) - \sum\_{j \notin \mathcal{S}\_k,\, j \le \bar{k}(n)} \alpha\_1 \beta\_{1(k-\ell+1)}^{j} \right) , \end{split} \tag{A7}$$

$$\begin{split} d\_{2(k-\ell)}^{n}(\mathcal{S}\_{k}) & \stackrel{\triangle}{=} \mathbb{C} \left( \alpha\_1 \beta\_{2(k-\ell)}^{n},\; (N-k)\alpha\_1 + k\alpha\_2 - \sum\_{j \in \mathcal{S}\_k} \alpha\_2 \beta\_{10}^{j} - \sum\_{j \in \mathcal{S}\_k} \sum\_{i=1}^{\ell} \alpha\_2 \beta\_{1i}^{j} - \sum\_{j \in \mathcal{S}\_k} \sum\_{i=1}^{\ell-1} \alpha\_2 \beta\_{2i}^{j} \right. \\ & \quad \left. - \sum\_{j \notin \mathcal{S}\_k} \sum\_{i=1}^{\ell} \alpha\_1 \beta\_{1(k-i+1)}^{j} - \sum\_{j \notin \mathcal{S}\_k} \sum\_{i=1}^{\ell-1} \alpha\_1 \beta\_{2(k-i+1)}^{j} - \sum\_{j \notin \mathcal{S}\_k,\, j \le \bar{k}(n)} \alpha\_1 \beta\_{2(k-\ell)}^{j} \right) . \end{split} \tag{A8}$$

### **Appendix B. Proof of Theorem 1**

The rate region characterized in Theorem 1 is achievable by employing the layering scheme in Section 3.1 at each transmitter combined with the successive decoding strategy in Algorithm 1. Recall that the maximum rate of codeword *Ui jk*, for each user *i* ∈ N , channel state *j* ∈ {1, 2}, and *k* ∈ {0} ∪ N , is bounded by the minimum achievable rate for that codebook over all combined network states during which it is decoded.

We define S as the set of indices of the users that are experiencing a *strong* state. This set is known to the receiver. Accordingly, we define S*k* as a realization of S that contains exactly *k* users, i.e., *k* users have strong channels and *N* − *k* users have weak channels. Next, we discuss S0 and S*k* for *k* ∈ N separately.

### |S| = 0**:** All channels are weak

In the event of a network state with all channels in the weak state, *h*2*i* = *α*1, ∀*i* ∈ N , the receiver decodes only one layer per user. Specifically, it decodes {*Ui* 10 : *i* ∈ N}. It performs successive decoding, starting from user 1 and continuing in the ascending order of the users' indices. In order to successfully decode the layers {*Ui* 10 : *i* ∈ N}, the rate of each layer *i* ∈ N should satisfy:

$$\forall i \in \mathcal{N}: \qquad R\_{10}^{i} \le \mathbb{C} \left( a\_1 \beta\_{10}^{i},\; N a\_1 - \sum\_{j=1}^{i} a\_1 \beta\_{10}^{j} \right) \stackrel{\triangle}{=} d\_{10}^{i}(\boldsymbol{\phi}). \tag{A9}$$

Note that the second argument of *C*(*x*, *y*) represents the power of the undecoded layers that are treated as interference for layer *Ui* 10. Based on the successive decoding procedure, when the receiver decodes *Ui* 10, the layers *Uj* 10 of users *j* ∈ {1, ... , *i* − 1} have already been decoded. Thus, their interference is subtracted from the total transmitted signal, accounted for by the terms *a*1(1 − *βj* 10). On the other hand, none of the layers transmitted by users *j* ∈ {*i* + 1, ... , *N*} has been decoded yet, which is accounted for by the term (*N* − *i*)*a*1.

Next, we will characterize upper bounds on the achievable rates of all the layers decoded when there are exactly *k* users with strong channels, i.e., |S| = *k*.

### |S| = *k***:** *k* channels are strong

As discussed earlier, when |S| = *k*, the receiver employs 4*k* + 1 decoding stages. For this purpose, the set of codebooks {*Ui j*ℓ : *i* ∈ N} is partitioned into the two sets

$$\mathcal{P}\_{j\ell} \triangleq \{ \mathcal{U}\_{j\ell}^{i} : i \in \mathcal{S} \} \qquad \text{and} \qquad \mathcal{Q}\_{j\ell} \triangleq \{ \mathcal{U}\_{j\ell}^{i} : i \notin \mathcal{S} \} \,. \tag{A10}$$

rendering a total of 4*k* + 1 partitions for different *j* ∈ {1, 2} and ℓ ∈ {0, ... , *k*}. The decoding strategy decodes one message from each of these, except for the partition {*Ui* 2*k* : *i* ∉ S}. The decoding strategy works as follows. We create the following two sequences of sets:

$$\mathcal{P} \stackrel{\triangle}{=} \{ \mathcal{P}\_{10}, \mathcal{P}\_{11}, \mathcal{P}\_{21}, \dots, \mathcal{P}\_{2(k-1)}, \mathcal{P}\_{1k} \} \,, \tag{A11}$$

$$\mathcal{Q} \stackrel{\triangle}{=} \left\{ \mathcal{Q}\_{1k}, \mathcal{Q}\_{2(k-1)}, \mathcal{Q}\_{1(k-1)}, \dots, \mathcal{Q}\_{11}, \mathcal{Q}\_{10} \right\}.\tag{A12}$$

The decoding strategy selects codebooks by alternating between P and Q in an ascending order and decodes exactly one codebook from each. This results in 4*k* decoding stages. Finally, the codebooks in {*Ui* 2*k* : *i* ∈ S} are decoded in the last stage, i.e., stage 4*k* + 1.

### • **Decoding stage** 1**:**

We start by decoding the layers in P10 ≜ {*Ui* 10 : *i* ∈ S}. Recall that S*k* was defined as an ordered set of these users. The codebooks are decoded sequentially in this order. When *m* ∈ S*k*, we denote the position of *m* in S*k* by *k*(*m*). Hence, ∀*m* ∈ S*k*

$$R\_{10}^m \le \mathbb{C} \left( a\_2 \beta\_{10}^m, (N - k)a\_1 + ka\_2 - \sum\_{j \in \mathcal{S}\_k, j \le k(m)} a\_2 (1 - \beta\_{10}^j) \right) \stackrel{\triangle}{=} d\_{10}^m (\mathcal{S}\_k) \,. \tag{A13}$$

### • **Decoding stage** 2**:**

Next, we sequentially decode the layers in Q1*k* = {*Ui* 1*k* : *i* ∉ S}, which involves the layers *Ui* 1*k* of users with weak channels. When *n* ∉ S*k*, we denote the position of *n* in the ordered set N \ S*k* by *k*¯(*n*). Hence, ∀*n* ∉ S*k*

$$R\_{1k}^{n} \le \mathbb{C} \left( a\_1 \beta\_{1k}^{n},\; (N - k) a\_1 + k a\_2 - \sum\_{j \in \mathcal{S}\_k} a\_2 \beta\_{10}^j - \sum\_{j \notin \mathcal{S}\_k,\, j < \bar{k}(n)} a\_1 \beta\_{1k}^j \right) \stackrel{\triangle}{=} d\_{1k}^{n} \left( \mathcal{S}\_k \right). \tag{A14}$$

### • **Decoding stage** 3**:**

In the third stage, the codebooks in P10 and Q1*k* have already been decoded. We continue by sequentially decoding the set of codebooks in P11 ≜ {*Ui* 11 : *i* ∈ S}. Hence, ∀*m* ∈ S*k*

$$R\_{11}^{m} \leq \mathbb{C} \left( a\_2 \beta\_{11}^m,\; (N - k)a\_1 + ka\_2 - \sum\_{j \in \mathcal{S}\_k} a\_2 \beta\_{10}^j - \sum\_{j \notin \mathcal{S}\_k} a\_1 \beta\_{1k}^j - \sum\_{j \in \mathcal{S}\_k,\, j \le k(m)} a\_2 \beta\_{11}^j \right) \stackrel{\triangle}{=} d\_{11}^m (\mathcal{S}\_k) \,. \tag{A15}$$

### • **Decoding stage** 4**:**

The decoding process continues by sequentially decoding the codebooks in Q2(*k*−1) = {*Ui* 2(*k*−1) : *i* ∉ S}, while the codebooks of P10, Q1*k*, and P11 have already been decoded. Hence, for *n* ∉ S*k*

$$R\_{2(k-1)}^{n} \le \mathbb{C} \left( a\_1 \beta\_{2(k-1)}^{n},\; (N - k) a\_1 + k a\_2 - \sum\_{j \in \mathcal{S}\_k} a\_2 (\beta\_{10}^j + \beta\_{11}^j) - \sum\_{j \notin \mathcal{S}\_k} a\_1 \beta\_{1k}^j - \sum\_{j \notin \mathcal{S}\_k,\, j \le \bar{k}(n)} a\_1 \beta\_{2(k-1)}^j \right) \stackrel{\triangle}{=} d\_{2(k-1)}^{n} (\mathcal{S}\_k) \,. \tag{A16}$$

### • **Decoding stage** 5**:**

This stage sequentially decodes the codebooks in P21. Hence, ∀*m* ∈ S*k* we have

$$R\_{21}^{m} \le \mathbb{C} \left( a\_2 \beta\_{21}^{m},\; (N - k)a\_1 + ka\_2 - \sum\_{j \in \mathcal{S}\_k} a\_2 (\beta\_{10}^j + \beta\_{11}^j) - \sum\_{j \notin \mathcal{S}\_k} a\_1 (\beta\_{1k}^j + \beta\_{2(k-1)}^j) - \sum\_{j \in \mathcal{S}\_k,\, j \le k(m)} a\_2 \beta\_{21}^j \right) \stackrel{\triangle}{=} d\_{21}^{m} (\mathcal{S}\_k) \,. \tag{A17}$$

• **Decoding stage** 6**:**

This stage sequentially decodes the codebooks in $\mathcal{Q}_{1(k-1)}$. Hence, $\forall n \notin \mathcal{S}_k$ we have

$$R_{1(k-1)}^n \le \mathcal{C}\left( a_1 \beta_{1(k-1)}^n,\ (N-k)a_1 + ka_2 - \sum_{j \in \mathcal{S}_k} a_2(\beta_{10}^j + \beta_{11}^j + \beta_{21}^j) - \sum_{j \notin \mathcal{S}_k} a_1(\beta_{1k}^j + \beta_{2(k-1)}^j) - \sum_{j \notin \mathcal{S}_k,\, j \le k(n)} a_1 \beta_{1(k-1)}^j \right) \stackrel{\triangle}{=} d_{1(k-1)}^n(\mathcal{S}_k). \tag{A18}$$

• **Decoding stages** $\{2, \ldots, 4k+1\}$**:** Following the pattern of the previous decoding stages, in stages $\{2, \ldots, 4k+1\}$ we decode the codebooks according to the following schedule, for $\ell \in \{1, \ldots, k\}$:

$$\begin{array}{llll} \text{codebook in } \mathcal{Q}\_{1(k-\ell+1)} & \text{stage} & 4\ell-2\\ \text{codebook in } \mathcal{P}\_{1\ell} & \text{stage} & 4\ell-1\\ \text{codebook in } \mathcal{Q}\_{2(k-\ell)} & \text{stage} & 4\ell \\ \text{codebook in } \mathcal{P}\_{2\ell} & \text{stage} & 4\ell+1 \end{array} \tag{A19}$$
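The schedule in (A19) is mechanical enough to enumerate programmatically. The following sketch is purely illustrative (the string labels are hypothetical shorthand for the codebook families) and lists which family is decoded at each stage for a given $k$:

```python
def decoding_schedule(k):
    """Map stages 2..4k+1 to the codebook family decoded there, following (A19)."""
    schedule = {}
    for ell in range(1, k + 1):
        schedule[4 * ell - 2] = f"Q_1,{k - ell + 1}"   # codebooks in Q_{1(k-ell+1)}
        schedule[4 * ell - 1] = f"P_1,{ell}"           # codebooks in P_{1,ell}
        schedule[4 * ell]     = f"Q_2,{k - ell}"       # codebooks in Q_{2(k-ell)}
        schedule[4 * ell + 1] = f"P_2,{ell}"           # codebooks in P_{2,ell}
    return schedule

s = decoding_schedule(2)   # k = 2: stages 2..9
assert s[2] == "Q_1,2" and s[4] == "Q_2,1" and s[9] == "P_2,2"
```

For $k = 2$ this reproduces the hand-listed stages 2 through 6 above and continues the pattern up to stage $4k + 1 = 9$.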

Accordingly, we obtain the following rate constraints.

• **Decoding stage** $4\ell - 2$**:**

By sequentially decoding the messages in $\mathcal{Q}_{1(k-\ell+1)}$, $\forall n \notin \mathcal{S}_k$ we have

$$R_{1(k-\ell+1)}^n \le \mathcal{C}\left( a_1 \beta_{1(k-\ell+1)}^n,\ (N-k)a_1 + ka_2 - \sum_{j \in \mathcal{S}_k} a_2 \beta_{10}^j - \sum_{j \in \mathcal{S}_k} \sum_{i=1}^{\ell-1} a_2(\beta_{1i}^j + \beta_{2i}^j) - \sum_{j \notin \mathcal{S}_k} \sum_{i=1}^{\ell-1} a_1(\beta_{1(k-i+1)}^j + \beta_{2(k-i+1)}^j) - \sum_{j \notin \mathcal{S}_k,\, j \le k(n)} a_1 \beta_{1(k-\ell+1)}^j \right) \stackrel{\triangle}{=} d_{1(k-\ell+1)}^n(\mathcal{S}_k). \tag{A20}$$

• **Decoding stage** $4\ell - 1$**:**

By sequentially decoding the messages in $\mathcal{P}_{1\ell}$, $\forall m \in \mathcal{S}_k$ we have

$$R_{1\ell}^{m} \le \mathcal{C}\left( a_2 \beta_{1\ell}^{m},\ (N-k)a_1 + ka_2 - \sum_{j \in \mathcal{S}_k} a_2 \beta_{10}^j - \sum_{j \in \mathcal{S}_k} \sum_{i=1}^{\ell-1} a_2(\beta_{1i}^j + \beta_{2i}^j) - \sum_{j \notin \mathcal{S}_k} a_1 \sum_{i=1}^{\ell-1} (\beta_{1(k-i+1)}^j + \beta_{2(k-i+1)}^j) - \sum_{j \in \mathcal{S}_k,\, j \le k(m)} a_2 \beta_{1\ell}^j \right) \stackrel{\triangle}{=} d_{1\ell}^m(\mathcal{S}_k). \tag{A21}$$

• **Decoding stage** $4\ell$**:**

By sequentially decoding the messages in $\mathcal{Q}_{2(k-\ell)}$, $\forall n \notin \mathcal{S}_k$ we have

$$R_{2(k-\ell)}^n \le \mathcal{C}\left( a_1 \beta_{2(k-\ell)}^n,\ (N-k)a_1 + ka_2 - \sum_{j \in \mathcal{S}_k} a_2 \beta_{10}^j - \sum_{j \in \mathcal{S}_k} \sum_{i=1}^{\ell} a_2 \beta_{1i}^j - \sum_{j \in \mathcal{S}_k} \sum_{i=1}^{\ell-1} a_2 \beta_{2i}^j - \sum_{j \notin \mathcal{S}_k} \sum_{i=1}^{\ell} a_1 \beta_{1(k-i+1)}^j - \sum_{j \notin \mathcal{S}_k} \sum_{i=1}^{\ell-1} a_1 \beta_{2(k-i+1)}^j - \sum_{j \notin \mathcal{S}_k,\, j \le k(n)} a_1 \beta_{2(k-\ell)}^j \right) \stackrel{\triangle}{=} d_{2(k-\ell)}^n(\mathcal{S}_k). \tag{A22}$$

• **Decoding stage** $4\ell + 1$**:** By sequentially decoding the messages in $\mathcal{P}_{2\ell}$, $\forall m \in \mathcal{S}_k$ we have

$$R_{2\ell}^{m} \le \mathcal{C}\left( a_2 \beta_{2\ell}^{m},\ (N-k)a_1 + ka_2 - \sum_{j \in \mathcal{S}_k} a_2 \beta_{10}^j - \sum_{j \in \mathcal{S}_k} \sum_{i=1}^{\ell} a_2 \beta_{1i}^j - \sum_{j \in \mathcal{S}_k} \sum_{i=1}^{\ell-1} a_2 \beta_{2i}^j - \sum_{j \notin \mathcal{S}_k} a_1 \sum_{i=1}^{\ell} (\beta_{1(k-i+1)}^j + \beta_{2(k-i+1)}^j) - \sum_{j \in \mathcal{S}_k,\, j \le k(m)} a_2 \beta_{2\ell}^j \right) \stackrel{\triangle}{=} d_{2\ell}^{m}(\mathcal{S}_k). \tag{A23}$$

Given the upper bounds on the individual achievable rates of $U^i_{jk}$, $\forall i \in \mathcal{N}$, $j \in \{1, 2\}$, $k \in \{0\} \cup \mathcal{N}$, the maximum achievable rate of $U^i_{jk}$ is bounded by the minimum upper bound among all the network states within which it is decoded.

### **Appendix C. Proof of Theorem 2**

By applying a change of variable to each term and taking the integral $\int_0^{\infty} e^{-sq}\, dF_1(q)$ as a common factor, $L_1(s)$ can be expressed as

$$L_1(s) = \frac{F_1(0) - \int_{0^+}^{\sum_{ij} R_{ij}^1 - \lambda_1} e^{-s\left(q + \lambda_1 - \sum_{ij} R_{ij}^1\right)}\, dF_1(q)}{1 - \left[\bar{p}_1 \bar{p}_2 e^{-s\left(\lambda_1 - \sum_{ij} R_{ij}^1\right)} + \bar{p}_1 p_2 e^{-s\left(\lambda_1 - R_{11}^1 - R_{21}^1\right)} + p_1 \bar{p}_2 e^{-s\left(\lambda_1 - R_{11}^1 - R_{12}^1\right)} + p_1 p_2 e^{-s\left(\lambda_1 - R_{11}^1\right)}\right]}. \tag{A24}$$

Further, by using the definition $F_1(0) = \bar{p}_1 \bar{p}_2 F_1\big(\sum_{ij} R^1_{ij} - \lambda_1\big)$ and multiplying the numerator and denominator of (A24) by the common factor $e^{-s(\sum_{ij} R^1_{ij} - \lambda_1)}$, we have

$$L_1(s) = \frac{\bar{p}_1 \bar{p}_2 \int_{0^+}^{\sum_{ij} R_{ij}^1 - \lambda_1} \left[ e^{-s\left(\sum_{ij} R_{ij}^1 - \lambda_1\right)} - e^{-sq} \right] dF_1(q)}{e^{-s\left(\sum_{ij} R_{ij}^1 - \lambda_1\right)} - \left[\bar{p}_1 \bar{p}_2 + \bar{p}_1 p_2 e^{-s\left(R_{12}^1 + R_{22}^1\right)} + p_1 \bar{p}_2 e^{-s\left(R_{21}^1 + R_{22}^1\right)} + p_1 p_2 e^{-s\left(R_{12}^1 + R_{21}^1 + R_{22}^1\right)}\right]}. \tag{A25}$$

It can be readily noticed from (A25) that $\lim_{s \to 0} D_{Q_1}(s) = \lim_{s \to 0} N_{Q_1}(s) = 0$, where $N_{Q_1}(s)$ and $D_{Q_1}(s)$ denote the numerator and denominator of (A25), respectively; therefore, we apply L'Hôpital's rule to (A25) to arrive at

$$\mathbb{E}[Q_1] = \lim_{s \to 0} \frac{D_{Q_1}''(s) - N_{Q_1}''(s)}{2 D_{Q_1}'(s)}. \tag{A26}$$

Finally, we evaluate the terms $D_{Q_1}'(s)$, $D_{Q_1}''(s)$, and $N_{Q_1}''(s)$, where we have

$$\lim_{s \to 0} D_{Q_1}'(s) = -\Big(\sum_{ij} R_{ij}^1 - \lambda_1\Big) + \bar{p}_1 p_2 (R_{12}^1 + R_{22}^1) + p_1 \bar{p}_2 (R_{21}^1 + R_{22}^1) + p_1 p_2 (R_{12}^1 + R_{21}^1 + R_{22}^1), \tag{A27}$$

$$\lim_{s \to 0} D_{Q_1}''(s) = \Big(\sum_{ij} R_{ij}^1 - \lambda_1\Big)^2 - \bar{p}_1 p_2 (R_{12}^1 + R_{22}^1)^2 - p_1 \bar{p}_2 (R_{21}^1 + R_{22}^1)^2 - p_1 p_2 (R_{12}^1 + R_{21}^1 + R_{22}^1)^2, \tag{A28}$$

and

$$\lim_{s \to 0} N_{Q_1}''(s) = \bar{p}_1 \bar{p}_2 \int_0^{\sum_{ij} R_{ij}^1 - \lambda_1} \Big[\Big(\sum_{ij} R_{ij}^1 - \lambda_1\Big)^2 - q^2\Big]\, dF_1(q). \tag{A29}$$

Finally, by using $\lim_{s \to 0} D_{Q_1}'(s) = \lim_{s \to 0} N_{Q_1}'(s)$, the second derivative of the numerator term can be upper bounded by replacing $\big(\sum_{ij} R^1_{ij} - \lambda_1 + q\big)$ with $2\big(\sum_{ij} R^1_{ij} - \lambda_1\big)$, arriving at

$$\lim_{s \to 0} N_{Q_1}''(s) \le 2\Big(\sum_{ij} R^1_{ij} - \lambda_1\Big)\Big( \sum_{ij} R^1_{ij} - \lambda_1 - \bar{p}_1 p_2 (R^1_{12} + R^1_{22}) - p_1 \bar{p}_2 (R^1_{21} + R^1_{22}) - p_1 p_2 (R^1_{12} + R^1_{21} + R^1_{22}) \Big). \tag{A30}$$

Next, we leverage (A26), reaching

$$\mathbb{E}[Q_i] \ge \frac{1}{2} \sum_{j=1}^{2} \sum_{k=0}^{N} R_{jk}^i - \frac{\lambda_i}{2} - \frac{N_i}{D_i}, \qquad \mathbb{E}[Q_i] \le \sum_{j=1}^{2} \sum_{k=0}^{N} R_{jk}^i - \lambda_i - \frac{N_i}{D_i}, \tag{A31}$$

where

$$\begin{split} N_i &\stackrel{\triangle}{=} -\Big(\sum_{j=1}^{2} \sum_{k=0}^{N} R_{jk}^{i} - \lambda_i\Big)^2 + \mathbb{P}\big[\mathcal{E}\big(\mathcal{S}_0^i\big)\big] \Big(\sum_{j=1}^{2} \sum_{k=1}^{N} R_{jk}^{i}\Big)^2 \\ &\quad + \sum_{\ell=1}^{N-1} \mathbb{P}\big[\mathcal{E}\big(\mathcal{S}_\ell^i\big)\big] \cdot \Big(\sum_{j=1}^{2} \sum_{k=\ell+1}^{N} R_{jk}^{i} + R_{1\ell}^{i}\Big)^2 + \sum_{\ell=1}^{N-1} \mathbb{P}\big[\mathcal{E}\big(\mathcal{S}_\ell^i\big)\big] \cdot \Big(\sum_{j=1}^{2} \sum_{k=\ell+1}^{N} R_{jk}^{i} + R_{2\ell}^{i}\Big)^2, \\ D_i &\stackrel{\triangle}{=} 2\big(\mathbb{E}[r_i] - \lambda_i\big). \end{split} \tag{A32}$$

### **Appendix D. Proof of Theorem 3**

In this appendix, the proof of Theorem 3 proceeds in two main steps. First, we characterize a lower bound on the average achievable rate of each user $i$ using a single layer per user (the outage approach). Second, we compare the rate of increase of the average delay with respect to the average arrival rate $\lambda_i$ (the first-order derivative) of the delay upper bound of the multi-layer approach with that of the delay lower bound of the outage approach. Finally, for a fixed average achievable rate in both approaches, we show that the proposed approach outperforms the single-layer outage approach.

Recalling the recursive expression for $Q_i$ in terms of the variable $Z_i$ in (3), a recursive form of $F_i(q)$ can be expressed as follows [52,53]:

$$F\_i(q) = \begin{cases} 0, & q < 0\\ \int\_{-\infty}^q F\_i(q-\tau) dF\_{\mathbb{Z}\_i}(\tau), & q \ge 0 \end{cases} \tag{A33}$$

where $dF_{Z_i}(z)$ denotes the pdf of $Z_i$.

At the end of every transmission block, the change in the size of queue $i$, $Z_i$, is primarily determined by the difference between the data arrival rate $\lambda_i$ and the fixed rate successfully decoded at the receiver, which in turn is determined by the combined network state. Consequently, $dF_{Z_i}(z)$ can be expressed as

$$dF\_{Z\_i}(z) = P\_{\text{out}} \delta(z - \lambda\_i + R\_{\text{F}}) \,. \tag{A34}$$

We remark that, in order to guarantee the stability of every queue $i$, we assume that the arrival rate $\lambda_i$ is less than the average achievable rate (the service rate of the queue), i.e.,

$$
\lambda\_i < P\_{\text{out}} R\_{\text{F}} \; \forall i \in \mathcal{N} \; . \tag{A35}
$$

Combining (12) and (13), an explicit expression for *Fi*(*q*), ∀*i* ∈ N is given by

$$F_i(q) = \begin{cases} 0, & \forall q < 0\\ P_{\text{out}} F_i(q - \lambda_i + R_{\text{F}}), & \forall q \ge 0 \end{cases}. \tag{A36}$$
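As a sanity check on this queueing model, one can simulate the single-layer recursion $Q_{t+1} = \max(Q_t + Z_t, 0)$ directly. The sketch below is a minimal Monte Carlo estimate; the parameter `p_succ` stands in for the successful-decoding probability that multiplies $R_{\text{F}}$ in the stability condition (A35), and all parameter values are illustrative:

```python
import random

def avg_queue(lam, R_F, p_succ, num_blocks=200_000, seed=0):
    """Monte Carlo estimate of E[Q] for the single-layer (outage) queue:
    Q_{t+1} = max(Q_t + lam - r_t, 0), with r_t = R_F w.p. p_succ and 0 otherwise."""
    random.seed(seed)
    q, total = 0.0, 0.0
    for _ in range(num_blocks):
        r = R_F if random.random() < p_succ else 0.0
        q = max(q + lam - r, 0.0)   # queue recursion driven by Z_t = lam - r_t
        total += q
    return total / num_blocks

# Stability requires lam < p_succ * R_F, cf. (A35); this choice satisfies it.
est = avg_queue(lam=0.4, R_F=1.0, p_succ=0.6)
assert 0.0 < est < 10.0
```

Violating (A35) (e.g., `lam=0.7` with the same service parameters) makes the estimate grow with the simulation length, as the queue is then unstable.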

Finally, we evaluate the terms $D_{Q_1}''(s)$ and $N_{Q_1}''(s)$, where we have

$$\lim_{s \to 0} D_{Q_1}''(s) = (R_{\text{F}} - \lambda_i)^2 - (1 - P_{\text{out}}) R_{\text{F}}^2, \tag{A37}$$

and

$$\lim_{s \to 0} N_{Q_1}''(s) = P_{\text{out}} \int_0^{R_{\text{F}} - \lambda_1} \big[(R_{\text{F}} - \lambda_1)^2 - q^2\big]\, dF_1(q). \tag{A38}$$

Finally, by using $\lim_{s \to 0} D_{Q_1}'(s) = \lim_{s \to 0} N_{Q_1}'(s)$, the second derivative of the numerator term can be lower bounded by replacing $(R_{\text{F}} - \lambda_1 + q)$ with $(R_{\text{F}} - \lambda_1)$, arriving at

$$\lim_{s \to 0} N_{Q_1}''(s) \ge (R_{\text{F}} - \lambda_1)(R_{\text{F}} - \lambda_1 - P_{\text{out}} R_{\text{F}}), \tag{A39}$$

and substitute into (A26), reaching

$$\mathbb{E}[Q_i] \ge \frac{1}{2} R_{\text{F}} - \frac{\lambda_i}{2} - \frac{N_i}{D_i}, \tag{A40}$$

where

$$N_i \stackrel{\triangle}{=} -(R_{\text{F}} - \lambda_i)^2 + (1 - P_{\text{out}}) R_{\text{F}}^2, \qquad D_i \stackrel{\triangle}{=} P_{\text{out}} R_{\text{F}} - \lambda_i. \tag{A41}$$

By taking the derivative of the upper and lower bounds derived above, we reach

$$\frac{\partial \mathcal{U}\_B}{\partial \lambda\_i} = -1 - \frac{\sum\_{j=1}^2 \sum\_{k=0}^N R\_{jk}^i - \lambda\_i}{\mathbb{E}[r\_i] - \lambda\_i} - 2\frac{N\_i}{D\_i^2},\tag{A42}$$

$$\frac{\partial L_B}{\partial \lambda_i} = -1 - \frac{R_{\text{F}} - \lambda_i}{P_{\text{out}} R_{\text{F}} - \lambda_i} - 2\, \frac{-(R_{\text{F}} - \lambda_i)^2 + (1 - P_{\text{out}}) R_{\text{F}}^2}{\left(P_{\text{out}} R_{\text{F}} - \lambda_i\right)^2}. \tag{A43}$$

### **References**


### *Article* **Straggler- and Adversary-Tolerant Secure Distributed Matrix Multiplication Using Polynomial Codes**

**Eimear Byrne <sup>1</sup>, Oliver W. Gnilke <sup>2</sup> and Jörg Kliewer <sup>3,</sup>\***


**Abstract:** Large matrix multiplications commonly take place in large-scale machine-learning applications. Often, the sheer size of these matrices prevents carrying out the multiplication at a single server. Therefore, these operations are typically offloaded to a distributed computing platform with a master server and a large number of workers in the cloud, operating in parallel. For such distributed platforms, it has recently been shown that coding over the input data matrices can reduce the computational delay by introducing a tolerance against straggling workers, i.e., workers whose execution time significantly lags behind the average. In addition to exact recovery, we impose a security constraint on both matrices to be multiplied. Specifically, we assume that workers can collude and eavesdrop on the content of these matrices. For this problem, we introduce a new class of polynomial codes with fewer non-zero coefficients than the degree plus one. We provide closed-form expressions for the recovery threshold and show that our construction improves on the recovery threshold of existing schemes in the literature, in particular for larger matrix dimensions and a moderate to large number of colluding workers. In the absence of any security constraints, we show that our construction is optimal in terms of the recovery threshold.

**Keywords:** distributed computation; matrix multiplication; distributed learning; information theoretic security; polynomial codes

### **1. Introduction**

Recently, tensor operations have emerged as an important ingredient of many signal processing and machine learning applications [1]. These operations are typically complex due to the large size of the associated tensors. Therefore, in the interest of a low execution time, such computations are often performed in a distributed fashion and outsourced to a cloud of multiple workers that operate in parallel over the distributed data set. These workers in many cases consist of commercial off-the-shelf servers that are characterized by failures and varying execution times. Such straggling servers are handled by state-of-the-art cloud computation platforms via a repetition of the computation task at hand. However, recent work has shown that encoding the input data may help alleviate the straggler problem and thus reduce the computation latency, which mainly depends on the number of stragglers present in the cloud computing environment; see [2,3]. More generally, it has been shown that coding can control the trade-off between the computational delay and the communication load between the workers and the master server [3–6]. In addition, the workers in the cloud may not be trustworthy, so the input and output of the partial computations need to be protected against unauthorized access. To this end, it has been shown that stochastic coding can help keep both input and output data secure from eavesdropping and colluding workers (see, for example, [7–14]).

In this work, we focus on the canonical problem of distributing the multiplication of two matrices *A* and *B*, i.e., *C* = *AB*, whose content should be kept secret from a prescribed

**Citation:** Byrne, E.; Gnilke, O.W.; Kliewer, J. Straggler- and Adversary-Tolerant Secure Distributed Matrix Multiplication Using Polynomial Codes. *Entropy* **2023**, *25*, 266. https://doi.org/ 10.3390/e25020266

Academic Editor: Syed A. Jafar

Received: 1 November 2022 Revised: 16 January 2023 Accepted: 20 January 2023 Published: 31 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

number of colluding workers in the cloud. Our goal is to minimize the number of workers from which the partial result must be downloaded, the so-called *recovery threshold*, to recover the correct matrix product *C*.

Coded matrix computation was first addressed in the non-secure case by applying separate MDS codes to encode the two matrices [3]. In [5], polynomial codes have been introduced, which improve on the recovery threshold of [3]. The recovery threshold was further improved by the so-called MatDot and PolyDot codes [15,16] at the expense of a larger download rate. In particular, PolyDot codes allow a flexible trade-off between the recovery threshold and the download rate, depending on the application at hand.

In [17,18] two different schemes are presented: an explicit scheme that improves on the recovery threshold of PolyDot codes, and a construction based on the tensor rank of matrix multiplication, which is optimal up to a factor of 2. In [19] a new construction for private and secure matrix multiplication is proposed based on entangled polynomial codes, which allows for a flexible trade-off between the upload rate and the download rate (equivalently, the recovery threshold). For small numbers of stragglers, [20] constructs schemes that outperform the entangled polynomial scheme. Recently, several attempts have been made to design coding schemes to further reduce upload and download rates, the recovery threshold, and the computational complexity for both workers and server (see, for example, [21–27]). For example, in [21], bivariate polynomial codes were used to reduce the recovery threshold in specific cases. In [22], the authors considered new schemes for the private and secure case which outperform [19] for specific parameter regions. The work in [23] considered distributed storage repair codes, so-called field-trace polynomial codes, to reduce the download rate for specific partitions of the matrices *A* and *B*. Very recently, the authors in [24] proposed a black-box coding scheme based on star products, which subsumes several existing works as special cases. In [25], a discrete Fourier transform-based scheme with low upload rates and encoding complexity is proposed. The work in [26] focused on selecting the evaluation points for the polynomial codes, providing a better upload rate than [9], but worse than [25].

In the following, we propose a new scheme for secure matrix multiplication, which provides explicit evaluation points for the polynomial codes, but unlike the work in [26], is also able to tolerate stragglers. Specifically, we exploit gaps in the underlying polynomial code. This is motivated by the observation that the recovery threshold can be improved by selecting the number of evaluation points to be equal to the number of only the *non-zero* coefficients in the polynomial [9,19]. In addition, selecting dedicated evaluation points has the advantage that the condition for security against colluding workers is automatically satisfied (see, for example, condition C2 in [27]). As such, our approach is able to provide a constructive scheme with provable security guarantees. Further, our coding scheme provides an advantage in terms of download rate in some cases, and is both straggler-tolerant and robust against Byzantine attacks on the workers.

This paper is organized as follows. In Section 2, the problem statement and the background are presented. Section 3 discusses the design and properties of our proposed scheme and provides performance guarantees with respect to the number of helper nodes needed for recovery, security, straggler tolerance, and robustness under Byzantine attacks. Section 4 extends the scheme of Section 3 by introducing gaps into the code polynomials and by studying its properties. Finally, Section 5 presents numerical results and comparisons with state-of-the-art schemes from the literature.

### **2. Problem Statement and Background**

Let $A$ and $B$ be a pair of matrices over the finite field $\mathbb{F}_q$ whose product is well defined. We consider the problem of computing the product $C = AB$. The computation will be distributed among a number of helper nodes, each of which will execute a portion of the total calculation. We also assume that the user wishes to hide the data contained in the matrices $A$ and $B$ and that up to $T$ honest-but-curious helper nodes may collude to deduce information about the contents of $A$ and $B$. To divide the work among the helper nodes, the matrices $A$ and $B$ are divided into $KM$ and $ML$ blocks, respectively, of compatible dimensions, say $a \times r$ and $r \times b$. The matrices are also assumed to have independent and identically distributed uniform entries from a sufficiently large field of cardinality $q > N$, where $N$ denotes the number of servers to be employed (in fact, we will require $q$ to exceed the degree of the polynomial $P(x)Q(x)$, central to this scheme). Hence, for a given matrix partition of $A$ and $B$ according to

$$A = \begin{bmatrix} A\_{1,1} & \cdots & A\_{1,M} \\ \vdots & \ddots & \vdots \\ A\_{K,1} & \cdots & A\_{K,M} \end{bmatrix}, \quad B = \begin{bmatrix} B\_{1,1} & \cdots & B\_{1,L} \\ \vdots & \ddots & \vdots \\ B\_{M,1} & \cdots & B\_{M,L} \end{bmatrix}.$$

we obtain

$$C = AB = \begin{bmatrix} C_{1,1} & \cdots & C_{1,L} \\ \vdots & \ddots & \vdots \\ C_{K,1} & \cdots & C_{K,L} \end{bmatrix}, \quad \text{where } C_{i,j} = \sum_{m=1}^{M} A_{i,m} B_{m,j}.$$
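The block decomposition above is easy to verify numerically. A minimal NumPy sketch with integer matrices and toy dimensions (all parameter values illustrative):

```python
import numpy as np

K, M, L = 2, 3, 2          # block grid sizes
a, r, b = 2, 2, 2          # block dimensions

rng = np.random.default_rng(0)
A = rng.integers(0, 5, (K * a, M * r))
B = rng.integers(0, 5, (M * r, L * b))

def blk(X, i, j, h, w):
    """(i, j) block of X, with blocks of size h x w (0-indexed)."""
    return X[i * h:(i + 1) * h, j * w:(j + 1) * w]

# Each block of C = AB is the sum of products of the compatible blocks of A and B.
for i in range(K):
    for j in range(L):
        C_ij = sum(blk(A, i, m, a, r) @ blk(B, m, j, r, b) for m in range(M))
        assert np.array_equal(C_ij, blk(A @ B, i, j, a, b))
```

This is exactly the decomposition that the scheme distributes: each worker contributes evaluations from which the individual products $A_{i,m}B_{m,j}$ (and hence the $C_{i,j}$) are recovered.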

The system model is displayed in Figure 1. We consider a distributed computing system with a master server and $N$ helper nodes or workers. The master server is interested in computing the product $C = AB$. As shown in Figure 1, the server holds the matrices $A$ and $B$ together with $T$ independent and uniformly distributed random matrices $R_t \in \mathbb{F}_q^{a \times r}$ and $S_t \in \mathbb{F}_q^{r \times b}$ for $t \in [T]$. To keep the data secure and to leverage possible computational redundancy at the workers, the server sends encoded versions of the input matrices to the workers. This security constraint imposes the mutual information condition

$$I(A\_{\mathcal{T}}, B\_{\mathcal{T}}; A, B) = 0 \tag{1}$$

between the pair $(A, B)$ and their encodings $(A_{\mathcal{T}}, B_{\mathcal{T}})$ for all subsets $\mathcal{T} \subset [N]$ of maximum cardinality $T$. The server generates a polynomial representation of $A$ and the $R_t$ by constructing a polynomial $P(x) \in \mathbb{F}_q^{a \times r}[x]$. Likewise, a polynomial representation of $B$ and the $S_t$ results in a polynomial $Q(x) \in \mathbb{F}_q^{r \times b}[x]$. The polynomial encodings that the $p$-th worker receives comprise the two polynomial evaluations $P(\alpha_p)$ and $Q(\alpha_p)$, for distinct evaluation points $\alpha_p \in \mathbb{F}_q$ with $p \in [N]$. It then computes the matrix product $P(\alpha_p)Q(\alpha_p)$ and sends it back to the server. The server collects a subset of $N_R \le N$ outputs from the workers, as defined by the evaluation points in the subset $\{P(\alpha_p)Q(\alpha_p)\}_{p \in \mathcal{N}_R}$ with $|\mathcal{N}_R| = N_R$. The size of the smallest possible subset $\mathcal{N}_R$ for which perfect recovery is obtained, i.e.,

$$H\big(AB \mid \{P(\alpha_p)Q(\alpha_p) : p \in \mathcal{N}_R\}\big) = 0, \tag{2}$$

where $H$ denotes the entropy function, is defined as the *recovery threshold*. The server then interpolates the underlying polynomial such that the correct product $C = AB$ can be assembled from a combination of the interpolated polynomial coefficients $C_{i,j}$ (see Section 3 for details).

We further define the *upload rate* $R_u$ per worker as the sum of the dimensions of $P(\alpha_p)$ and $Q(\alpha_p)$, i.e., $R_u = (a + b)r$ field elements of $\mathbb{F}_q$. Likewise, the *download rate* or *communication load* $R_d$ is defined as the total number of field elements to be downloaded from the workers such that (2) is satisfied, i.e., $R_d = abN_R$.
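To build intuition for the security condition (1), consider the scalar case with $T = 1$: a worker's share of a block is the block plus a uniform mask, and is therefore uniformly distributed whatever the secret is, a one-time-pad argument. A small empirical sketch (toy field size, purely illustrative):

```python
import random
from collections import Counter

q = 7                       # toy field size
random.seed(0)

def share(a):
    """A single worker's view of a scalar block a, masked by one uniform element (T = 1)."""
    return (a + random.randrange(q)) % q

# The empirical distribution of the share is (close to) uniform for any secret a,
# so observing it reveals nothing about a.
for a in (0, 3):
    counts = Counter(share(a) for _ in range(70_000))
    assert all(abs(c / 70_000 - 1 / q) < 0.01 for c in counts.values())
```

For general $T$, the scheme needs $T$ masks per matrix and distinct evaluation points so that any $T$ colluding workers see $T$ jointly uniform combinations; the scalar case above only illustrates the basic masking idea.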

**Figure 1.** System model for secure matrix multiplication.

**Notation.** For the remainder, we fix $A$, $B$, $C$ to be matrices over $\mathbb{F}_q$ such that $C = AB$, and we fix $K, M, L, a, b, r$ to be the integers defined above. We define $[n] := \{1, \ldots, n\}$ for any positive integer $n$. For each $k \in [K]$, $\ell \in [L]$, and $m \in [M]$, we write $A_{k,m}$, $B_{m,\ell}$, and $C_{k,\ell}$ to denote the $(k, m)$, $(m, \ell)$, and $(k, \ell)$ blocks of $A$, $B$, and $C$, respectively. The transpose of a matrix $Z$ is denoted by $Z^t$.

### **3. Proposed Scheme**

The scheme we propose uses a similar approach to the schemes in [9,19,27]. We will begin with the choices for exponents in *P*(*x*) and *Q*(*x*) and show that the desired blocks of *C* appear as coefficients of the product *PQ*. We discuss the maximum possible degree of *PQ* since it gives us an upper bound on the necessary evaluations, and hence workers, needed to interpolate *PQ*. In Section 3.3, we give explicit criteria for choices of evaluation points and prove that the scheme protects against collusion of up to *T* servers. Section 3.4 discusses the option to query additional servers to provide resilience against stragglers and Byzantine servers.

Section 4 uses ideas from the GASP scheme [9] to reduce the recovery threshold by examining how many coefficients in the product are already known to be zero.

### *3.1. Choice of Exponents and Maximal Degree*

We propose the following scheme to outsource the computation among the worker servers. The model will incorporate methods to secure the privacy of the data held by the matrices *A*, *B*, and *C*.

Let *D* := *M* + 2. For the given *A* and *B*, we define the polynomials:

$$\bar{P}(\mathbf{x}) := \sum\_{k=1}^{K} \mathbf{x}^{D(k-1)} \sum\_{m=1}^{M} \mathbf{x}^{m} A\_{k,m} \quad \text{and} \quad \bar{Q}(\mathbf{x}) := \sum\_{\ell=1}^{L} \mathbf{x}^{DK(\ell-1)} \sum\_{m=1}^{M} \mathbf{x}^{M+1-m} B\_{m,\ell}.$$

We now define polynomials

$$P(x) := \bar{P}(x) + R(x) \quad \text{and} \quad Q(x) := \bar{Q}(x) + S(x),$$

where $R(x)$ and $S(x)$ are a pair of matrix polynomials:

$$R(x) := \sum_{t=1}^{T} x^{D(t-1)} R_t \quad \text{and} \quad S(x) := \sum_{t=1}^{T} x^{D(t-1)} S_t,$$

whose coefficients are *a* × *r* and *r* × *b* matrices over F*q*, respectively, chosen uniformly at random.

In the next theorem, we show that the desired matrices $C_{k,\ell}$ appear as coefficients of the product $PQ$ and can hence be retrieved by inspection of this product.

**Theorem 1.** *For each pair $(k, \ell) \in [K] \times [L]$, the block $C_{k,\ell}$ arising in the product $C = AB$ appears as the coefficient of $x^{D((k-1)+K(\ell-1))+M+1}$ in the product $PQ$.*

**Proof.** We calculate the product

$$\begin{split} PQ &= \bar{P}\bar{Q} + \bar{P}S + R\bar{Q} + RS\\ &= \sum_{k=1}^{K} \sum_{\ell=1}^{L} x^{D((k-1)+K(\ell-1))} \sum_{m=1}^{M} \sum_{m'=1}^{M} A_{k,m} B_{m',\ell}\, x^{M+1+m-m'}\\ &\quad + \sum_{k=1}^{K} \sum_{t=1}^{T} x^{D(k+t-2)} \sum_{m=1}^{M} A_{k,m} S_t\, x^{m} \\ &\quad + \sum_{\ell=1}^{L} \sum_{t=1}^{T} x^{D(K(\ell-1)+(t-1))} \sum_{m'=1}^{M} R_t B_{m',\ell}\, x^{M+1-m'}\\ &\quad + \sum_{t=1}^{T} \sum_{t'=1}^{T} R_t S_{t'}\, x^{D(t+t'-2)}. \end{split}$$

Consider the exponents modulo $D$. The first term in the sum above is the product $\bar{P}\bar{Q}$. An exponent of $x$ in this term is equal to $D - 1 \equiv M + 1 \bmod D$ if and only if $m = m'$, in which case its corresponding coefficient is $C_{k,\ell}$. In particular, the matrix block $C_{k,\ell}$ appears in the product $\bar{P}\bar{Q}$ as the coefficient of $x^{D((k-1)+K(\ell-1))+M+1}$.

We claim that no other exponent of $x$ in $PQ - \bar{P}\bar{Q}$ is equal to $M + 1 \bmod D$, from which the result will follow. Observe that the exponents in the second and third terms of the product (i.e., those of $\bar{P}S + R\bar{Q}$) are all between $1$ and $M$ modulo $D$, while every exponent of $x$ in the fourth term, $RS$, is a multiple of $D$.
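Theorem 1 can be checked numerically for small parameters. The sketch below treats the blocks as scalars over a toy prime field (so $a = r = b = 1$; the parameter choices are illustrative) and verifies that each $C_{k,\ell}$ shows up at exponent $D((k-1)+K(\ell-1))+M+1$ of $PQ$:

```python
import random

p = 101                      # toy prime field size
K, L, M, T = 2, 2, 2, 1
D = M + 2

random.seed(0)
rnd = lambda: random.randrange(p)

# Scalar "blocks": A is K x M, B is M x L over GF(p); R, S are the uniform masks.
A = [[rnd() for _ in range(M)] for _ in range(K)]
B = [[rnd() for _ in range(L)] for _ in range(M)]
R = [rnd() for _ in range(T)]
S = [rnd() for _ in range(T)]

def add(poly, e, c):
    poly[e] = (poly.get(e, 0) + c) % p

P, Q = {}, {}                # sparse polynomials: exponent -> coefficient
for k in range(1, K + 1):
    for m in range(1, M + 1):
        add(P, D * (k - 1) + m, A[k - 1][m - 1])          # Pbar
for l in range(1, L + 1):
    for m in range(1, M + 1):
        add(Q, D * K * (l - 1) + M + 1 - m, B[m - 1][l - 1])  # Qbar
for t in range(1, T + 1):    # P = Pbar + R(x), Q = Qbar + S(x)
    add(P, D * (t - 1), R[t - 1])
    add(Q, D * (t - 1), S[t - 1])

PQ = {}
for e1, c1 in P.items():
    for e2, c2 in Q.items():
        add(PQ, e1 + e2, c1 * c2)

# Theorem 1: C_{k,l} sits at exponent D((k-1) + K(l-1)) + M + 1.
for k in range(1, K + 1):
    for l in range(1, L + 1):
        C_kl = sum(A[k - 1][m] * B[m][l - 1] for m in range(M)) % p
        assert PQ.get(D * ((k - 1) + K * (l - 1)) + M + 1, 0) == C_kl
print("all blocks recovered")
```

The same check with matrix-valued blocks only replaces the scalar additions and multiplications by block operations; the exponent bookkeeping is identical.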

In order to retrieve the polynomial $PQ$, we may evaluate $P$ and $Q$ at a number of distinct values $\alpha_1, \ldots, \alpha_{N+1}$ in $\mathbb{F}_q^{\times}$. The values $P(\alpha_i)$ and $Q(\alpha_i)$ are found at a cost of zero non-scalar operations. Define

$$V(\alpha_1, \ldots, \alpha_{N+1}) := \begin{pmatrix} 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^N \\ 1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^N \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha_N & \alpha_N^2 & \cdots & \alpha_N^N \\ 1 & \alpha_{N+1} & \alpha_{N+1}^2 & \cdots & \alpha_{N+1}^N \end{pmatrix}.$$

The $(i, j)$-entries of the coefficients of $PQ \in \mathbb{F}_q^{a \times b}[x]$ can be retrieved by computing the product

$$V(\alpha_1, \ldots, \alpha_{N+1})^{-1}\big((P(\alpha_1)Q(\alpha_1))_{i,j}, \ldots, (P(\alpha_{N+1})Q(\alpha_{N+1}))_{i,j}\big)^t,$$

if the degree of $PQ$ is at most $N$. Since this computation involves only $\mathbb{F}_q$-linear operations, the total non-scalar cost is that of performing the $N + 1$ matrix products $P(\alpha_i)Q(\alpha_i)$. In the distributed computation scheme shown in Figure 1, the server uploads each pair of evaluations $P(\alpha_i), Q(\alpha_i)$ to the $i$-th worker node, which then computes the product $P(\alpha_i)Q(\alpha_i)$ and returns it to the server.
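The interpolation step can be illustrated with a scalar polynomial over the rationals; the actual scheme works over $\mathbb{F}_q$, but the linear algebra is identical. A minimal NumPy sketch recovering a degree-$N$ polynomial's coefficients from $N + 1$ worker evaluations:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5                                        # degree of PQ in this toy example
coeffs = rng.integers(0, 10, N + 1)          # coefficients c_0, ..., c_N
alphas = np.arange(1, N + 2, dtype=float)    # N + 1 distinct evaluation points

# "Worker outputs": evaluations of the polynomial at alpha_1, ..., alpha_{N+1}.
evals = np.array([sum(c * a**j for j, c in enumerate(coeffs)) for a in alphas])

# Rows of V are (1, a, a^2, ..., a^N); solving V x = evals inverts the evaluation map.
V = np.vander(alphas, N + 1, increasing=True)
recovered = np.linalg.solve(V, evals)
assert np.allclose(recovered, coeffs)
```

Over $\mathbb{F}_q$ one would replace `np.linalg.solve` by a modular linear solve (the Vandermonde matrix is invertible whenever the $\alpha_i$ are distinct), but the structure of the computation is the same.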

In this approach to reconstructing *PQ*, we require the participation of *N* + 1 worker nodes, where *N* is the degree of *PQ*. For this reason, we study this degree. Since

$$\deg(PQ) \le \max\big(\deg(\bar{P}\bar{Q}),\, \deg(\bar{P}S),\, \deg(R\bar{Q}),\, \deg(RS)\big),$$

we have the following result, wherein each of the values $N_1(K, L, M; T)$ to $N_4(K, L, M; T)$ corresponds to the maximum possible degree of $\bar{P}\bar{Q}$, $\bar{P}S$, $R\bar{Q}$, and $RS$, respectively. We write $N(A, B; K, L, M; T)$ to denote the maximum possible degree of the polynomial $PQ$, as $A$, $B$, $R$, $S$ range over all possible matrices of the stated sizes.

**Proposition 1.** *The degree of PQ is upper bounded by N*(*A*, *B*; *K*, *L*, *M*; *T*)*, where*

$$N(A, B; K, L, M; T) = \max \begin{cases} N_1(K, L, M; T) := D(KL - 1) + 2M & (3)\\ N_2(K, L, M; T) := D(K + T - 2) + M & (4)\\ N_3(K, L, M; T) := D(K(L - 1) + T - 1) + M & (5)\\ N_4(K, L, M; T) := 2D(T - 1) & (6) \end{cases}$$
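The case analysis of Propositions 2 and 3 below (summarized in Table 1) can be verified exhaustively for small parameters. A small sketch, with the four degree bounds written out as functions:

```python
def D(M): return M + 2

def N1(K, L, M, T): return D(M) * (K * L - 1) + 2 * M      # deg(Pbar Qbar)
def N2(K, L, M, T): return D(M) * (K + T - 2) + M          # deg(Pbar S)
def N3(K, L, M, T): return D(M) * (K * (L - 1) + T - 1) + M  # deg(R Qbar)
def N4(K, L, M, T): return 2 * D(M) * (T - 1)              # deg(R S)

for K in range(1, 6):
    for L in range(1, 6):
        for M in range(1, 6):
            for T in range(1, 12):
                args = (K, L, M, T)
                # Proposition 2: T > K  <=>  N3 > N1  <=>  N4 > N2
                assert (T > K) == (N3(*args) > N1(*args)) == (N4(*args) > N2(*args))
                # Proposition 3: T > K(L-1)+1  <=>  N4 > N3  <=>  N2 > N1
                assert (T > K * (L - 1) + 1) == (N4(*args) > N3(*args)) == (N2(*args) > N1(*args))
print("propositions verified on the grid")
```

This only checks a finite grid, of course; the propositions themselves hold for all parameters, as the algebraic proofs below show (both comparisons reduce to the sign of $D(T - K) - M$ and $D(T - K(L-1) - 1) - M$, respectively).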

**Proposition 2.** *The following are equivalent.*

*1. T* > *K,*
*2. N*3(*K*, *L*, *M*; *T*) > *N*1(*K*, *L*, *M*; *T*)*,*
*3. N*4(*K*, *L*, *M*; *T*) > *N*2(*K*, *L*, *M*; *T*)*.*
**Proof.** First note that $T > K \Leftrightarrow T - K \ge 1$ and that $1 > \frac{M}{D}$. Since $T - K$ is an integer, we thus have that the following inequalities are equivalent to $T > K$:

$$\begin{aligned} T - K &> \frac{M}{D}, \\ D(T - K) &> M, \\ D(K(L - 1) + T - 1) + M &> D(KL - 1) + 2M. \end{aligned}$$

This shows that *N*3(*K*, *L*, *M*; *T*) > *N*1(*K*, *L*, *M*; *T*) if and only if *T* > *K*. Similarly, using the 2nd and 3rd inequalities just above, we have

$$\begin{aligned} T > K &\Leftrightarrow DT > DK + M,\\ &\Leftrightarrow 2D(T-1) > D(T+K-2) + M, \end{aligned}$$

from which we see that *N*4(*K*, *L*, *M*; *T*) > *N*2(*K*, *L*, *M*; *T*) if and only if *T* > *K*.

**Proposition 3.** *The following are equivalent.*

*1. T* > *K*(*L* − 1) + 1*, 2. N*4(*K*, *L*, *M*; *T*) > *N*3(*K*, *L*, *M*; *T*)*, 3. N*2(*K*, *L*, *M*; *T*) > *N*1(*K*, *L*, *M*; *T*)*.*

**Proof.** We have the following inequalities:

$$\begin{aligned} T > K(L-1) + 1 &\Leftrightarrow T - K(L-1) - 1 \geq 1 > \frac{M}{D}, \\ &\Leftrightarrow D(T - K(L-1) - 1) > M, \\ &\Leftrightarrow D(2T - 2) > D(K(L-1) + T - 1) + M, \end{aligned}$$

from which we deduce that *N*4(*K*, *L*, *M*; *T*) > *N*3(*K*, *L*, *M*; *T*). We now show that *N*2(*K*, *L*, *M*; *T*) > *N*1(*K*, *L*, *M*; *T*). We have:

$$\begin{aligned} T > K(L-1) + 1 &\Leftrightarrow D(T - K(L-1) - 1) > M, \\ &\Leftrightarrow D(K + T - 2) + M > D(KL - 1) + 2M. \end{aligned}$$

We tabulate (see Table 1) the value of *N*(*K*, *L*, *M*; *T*) based on the observations of Propositions 2 and 3.

**Table 1.** Summary table of the maximal degree of *PQ*, following Propositions 2 and 3.

| Condition | *N*(*A*, *B*; *K*, *L*, *M*; *T*) |
|---|---|
| *T* ≤ *K* and *T* ≤ *K*(*L* − 1) + 1 | *N*1(*K*, *L*, *M*; *T*) |
| *T* > *K* and *T* ≤ *K*(*L* − 1) + 1 | *N*3(*K*, *L*, *M*; *T*) |
| *T* ≤ *K* and *T* > *K*(*L* − 1) + 1 | *N*2(*K*, *L*, *M*; *T*) |
| *T* > *K* and *T* > *K*(*L* − 1) + 1 | *N*4(*K*, *L*, *M*; *T*) |


### *3.2. AB versus B<sup>t</sup>A<sup>t</sup>*

We compare the recovery threshold cost of calculating $B^t A^t$ rather than *AB*. It can be shown that calculating *AB* is never worse whenever *K* ≥ *L*. That is, we show that $N(A, B; K, L, M; T) \le N(B^t, A^t; L, K, M; T)$ for *K* ≥ *L*. We consider all possible cases for the maximal degree in the following two theorems and remarks.
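This claim can be spot-checked exhaustively on a small parameter grid; the following sketch (our own illustration) re-uses the degree bounds of Proposition 1:

```python
# Hedged spot-check of Section 3.2: for K >= L, the maximal degree
# N(A,B;K,L,M;T) never exceeds N(B^t,A^t;L,K,M;T). Our own illustration.

def recovery_degree(K, L, M, T):
    D = M + 2  # as in Theorem 6
    return max(D * (K * L - 1) + 2 * M,        # N1
               D * (K + T - 2) + M,            # N2
               D * (K * (L - 1) + T - 1) + M,  # N3
               2 * D * (T - 1))                # N4

def transpose_never_better(bound=8):
    """Check that K >= L implies N(K, L) <= N(L, K) on a small grid."""
    return all(recovery_degree(K, L, M, T) <= recovery_degree(L, K, M, T)
               for K in range(1, bound)
               for L in range(1, K + 1)
               for M in range(1, bound)
               for T in range(1, bound))
```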

**Theorem 2.** *1. Let T* > *K*, *L. Suppose that T* < *K*(*L* − 1) + 1 *and T* < *L*(*K* − 1) + 1*. We have that*

$$N(A, B; K, L, M; T) = N\_3(K, L, M; T) < N\_3(L, K, M; T) = N(B^t, A^t; L, K, M; T),$$

*if and only if L* < *K.*

*2. Let K* ≥ *T* > *L. Suppose that T* < *K*(*L* − 1) + 1 *and T* < *L*(*K* − 1) + 1*. We have that*

$$N(A, B; K, L, M; T) = N_1(K, L, M; T) < N_3(L, K, M; T) = N(B^t, A^t; L, K, M; T).$$

*3. Let T* > *L*, *K and suppose that L*(*K* − 1) + 1 ≥ *T* > *K*(*L* − 1) + 1*. We have that*

$$N(A, B; K, L, M; T) = N_4(K, L, M; T) < N_3(L, K, M; T) = N(B^t, A^t; L, K, M; T).$$

*4. Let T* > *K* ≥ *L and suppose that T* > *L*(*K* − 1) + 1*. We have that*

$$N(A, B; K, L, M; T) = N_4(K, L, M; T) = N_4(L, K, M; T) = N(B^t, A^t; L, K, M; T).$$

*5. Let T* ≤ *L* ≤ *K and suppose that T* ≤ *K*(*L* − 1) + 1*. We have that*

$$N(A, B; K, L, M; T) = N\_1(K, L, M; T) = N\_1(L, K, M; T) = N(B^t, A^t; L, K, M; T).$$

**Proof.** 1. Since *T* > *K* and *T* < *K*(*L* − 1) + 1, by Propositions 2 and 3 we have that

*N*3(*K*, *L*, *M*; *T*) > *N*4(*K*, *L*, *M*; *T*) > *N*2(*K*, *L*, *M*; *T*), *N*1(*K*, *L*, *M*; *T*)

and so *N*(*A*, *B*; *K*, *L*, *M*; *T*) = *N*3(*K*, *L*, *M*; *T*). Similarly, since *<sup>T</sup>* > *<sup>L</sup>*, and *<sup>T</sup>* < *<sup>L</sup>*(*<sup>K</sup>* − <sup>1</sup>) + 1, we have that *<sup>N</sup>*(*B<sup>t</sup>* , *A<sup>t</sup>* ; *L*, *K*, *M*; *T*) = *N*3(*L*, *K*, *M*; *T*). Clearly, *L* < *K* if and only if:

$$\begin{aligned} N\_3(K, L, M; T) &= D(K(L - 1) + T - 1) + M \\ &< D(L(K - 1) + T - 1) + M = N\_3(L, K, M; T) .\end{aligned}$$

2. By Propositions 2 and 3, the assumptions *K* ≥ *T* and *T* < *K*(*L* − 1) + 1 imply that *N*(*A*, *B*; *K*, *L*, *M*; *T*) = *N*1(*K*, *L*, *M*; *T*), while the assumptions *T* > *L* and *T* < *L*(*K* − 1) + 1 yield that $N(B^t, A^t; L, K, M; T) = N_3(L, K, M; T)$. Clearly, since *T* > *L*, we have *M* < *D*(*T* − *L*) and

$$N\_1(K, L, M; T) = D(KL - 1) + 2M < D(L(K - 1) + T - 1) + M = N\_3(L, K, M; T).$$

3. From the given assumptions, by Propositions 2 and 3, we have *N*(*A*, *B*; *K*, *L*, *M*; *T*) = *N*4(*K*, *L*, *M*; *T*) and *N*(*B<sup>t</sup>* , *A<sup>t</sup>* ; *L*, *K*, *M*; *T*) = *N*3(*L*, *K*, *M*; *T*). Since *L*(*K* − 1) + 1 ≥ *T*, as in the proof of Proposition 3, we have

$$N_4(K, L, M; T) = 2D(T - 1) = N_4(L, K, M; T) \le N_3(L, K, M; T).$$


**Remark 1.** *Clearly, if T* ≤ *K and T* > *K*(*L* − 1) + 1*, then K*(*L* − 1) < *T* − 1 ≤ *K* − 1*, which forces L* = 1*. In this case, from Propositions 2 and 3, we have that N*(*A*, *B*; *K*, 1, *M*; *T*) = *N*2(*K*, 1, *M*; *T*)*.*

**Theorem 3.** *Let T* ≤ *K and T* > *K*(*L* − 1) + 1*.*


**Remark 2.** *The remaining two cases lead to a contradiction and can hence never occur. Let T* ≤ *K and T* > *K*(*L* − 1) + 1 *and T* > *L*(*K* − 1) + 1*. By Remark 1, we have that L* = 1 *and we obtain the contradiction T* ≤ *K* < *T.*

### *3.3. T-Collusion*

Each query is masked with a polynomial of the form $\sum_{i=0}^{T-1} x^{iD} R_i$, where each $R_i$ is chosen uniformly at random. A query is private in the case of *T* servers colluding if and only if the matrix

$$M(x_1, \dots, x_T) := \begin{pmatrix} 1 & \cdots & 1 \\ x_1^D & \cdots & x_T^D \\ \vdots & \ddots & \vdots \\ x_1^{D(T-1)} & \cdots & x_T^{D(T-1)} \end{pmatrix}$$

has full rank for any subset of *T* evaluation points. This is the same as condition C2 in [27]. Because of the very specific set of exponents used, we can give a more explicit condition for the invertibility of this matrix.

**Proposition 4.** *The matrix* $M(x_1, \dots, x_T)$ *is invertible if and only if the elements* $x_1^D, \dots, x_T^D$ *are distinct.*

**Proof.** $M(x_1, \dots, x_T)$ is a Vandermonde matrix in the values $x_1^D, \dots, x_T^D$.

**Proposition 5.** *A set of elements of* $\mathbb{F}_q$ *whose Dth powers are pairwise different has size at most* $N = \frac{q-1}{\gcd(q-1, D)} + 1$*.*

**Proof.** Fix a generator $\gamma$ of $\mathbb{F}_q^*$. Then the image of the map $x \mapsto x^D$ from $\mathbb{F}_q$ to $\mathbb{F}_q$ consists of 0 together with the powers $\gamma^{Di}$ for $0 \le i < q - 1$, of which exactly $\frac{q-1}{\gcd(q-1, D)}$ are distinct.

**Corollary 1.** *Let T* < *q. If* gcd(*q* − 1, *D*) = 1*, then the scheme in Section 3 is secure against T-collusion for any choice of evaluation points.*
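Propositions 4 and 5 can be illustrated numerically. The sketch below is our own, restricted to *prime* *q* so that field arithmetic is plain modular arithmetic (Example 1 instead uses the extension field $\mathbb{F}_{64}$):

```python
# Hedged sketch of Propositions 4-5 / Corollary 1 over a prime field F_q:
# the image of x -> x^D has size (q-1)/gcd(q-1, D) + 1, so gcd(q-1, D) = 1
# makes x -> x^D a bijection and any distinct evaluation points have
# distinct D-th powers (hence the masking matrix is invertible).
from math import gcd

def image_size(q, D):
    """Size of {x^D : x in F_q} for a prime q."""
    return len({pow(x, D, q) for x in range(q)})

for q, D in [(67, 5), (13, 3), (31, 6)]:
    assert image_size(q, D) == (q - 1) // gcd(q - 1, D) + 1
```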

### *3.4. Stragglers and Byzantine Servers*

Considering the scheme as described in the previous section, we see that the responses are the coordinates of a codeword of a Reed–Solomon code. The polynomial that needs to be interpolated has degree at most *N* = *N*(*K*, *L*, *M*; *T*), and hence *N* + 1 evaluation points suffice for reconstruction. Any *N* + 1 evaluation points are admissible and hence we have the following theorem.

**Theorem 4.** *The scheme in Section 3 is straggler resistant against S stragglers if N* + 1 + *S helper nodes are used.*

**Proof.** The responses can be considered as a codeword in an [*N* + 1 + *S*, *N* + 1, *S* + 1] RS code, with *S* erasures. Since *S* is smaller than the minimum distance of the code, the full codeword and hence the interpolating polynomial can be recovered.
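The erasure argument of Theorem 4 can be made concrete with Lagrange interpolation over a small prime field; the field size, degree, and polynomial below are our own toy choices:

```python
# Hedged toy illustration of Theorem 4: a degree-N response polynomial is
# recovered from any N+1 of the N+1+S responses, so S stragglers are
# tolerated. Field size, degree and polynomial are our own choices.
q = 97  # a prime, so inverses are pow(d, q-2, q)

def interp_eval(points, x0, q):
    """Evaluate the Lagrange interpolant through `points` at x0 (mod q)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, q - 2, q)) % q
    return total

coeffs = [5, 1, 0, 7]                          # p(x) = 5 + x + 7x^3, N = 3
p = lambda x: sum(c * pow(x, k, q) for k, c in enumerate(coeffs)) % q
responses = [(x, p(x)) for x in range(1, 7)]   # N+1+S = 6 nodes, S = 2
survivors = responses[:4]                      # any N+1 = 4 responses suffice
assert all(interp_eval(survivors, x, q) == y for x, y in responses)
```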

Similarly, we can use additional helper nodes to account for possible Byzantine servers whose responses are incorrect.

**Theorem 5.** *The scheme in Section 3 is resistant against Byzantine attacks of up to B helper nodes if N* + 1 + 2*B helper nodes are used.*

**Proof.** The responses can be considered as a codeword in an [*N* + 1 + 2*B*, *N* + 1, 2*B* + 1] RS code, with *B* errors. Since 2*B* is smaller than the minimum distance of the code, the full codeword and hence the interpolating polynomial can be recovered.
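The error-correction argument can likewise be illustrated with a brute-force decoder. A production system would use an efficient Reed–Solomon decoder (e.g., Berlekamp–Welch), but the toy search below (our own illustration) shows why *N* + 1 + 2*B* responses suffice:

```python
# Hedged brute-force illustration of Theorem 5 (NOT an efficient decoder).
# Any degree-N polynomial agreeing with >= N+1+B of the N+1+2B responses
# must be the correct one, since two such polynomials would agree on at
# least N+1 points and hence coincide.
from itertools import combinations
q = 97

def interp_eval(points, x0, q):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, q - 2, q)) % q
    return total

def decode(responses, N, B, q):
    """Try every (N+1)-subset; accept one consistent with >= N+1+B nodes."""
    for subset in combinations(responses, N + 1):
        if sum(interp_eval(subset, x, q) == y for x, y in responses) >= N + 1 + B:
            return {x: interp_eval(subset, x, q) for x, _ in responses}
    return None

coeffs = [3, 0, 2]                             # p(x) = 3 + 2x^2, N = 2
p = lambda x: sum(c * pow(x, k, q) for k, c in enumerate(coeffs)) % q
resp = [(x, p(x)) for x in range(1, 6)]        # N+1+2B = 5 nodes, B = 1
resp[0] = (1, (resp[0][1] + 17) % q)           # one Byzantine response
decoded = decode(resp, 2, 1, q)
assert decoded == {x: p(x) for x in range(1, 6)}
```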

Combining both theorems gives us the following corollary.

**Corollary 2.** *The scheme in Section 3 is resistant against S stragglers and B Byzantine helper nodes if N* + 1 + *S* + 2*B helper nodes are used.*

### **4. Gaps in the Polynomial**

The upper bound on the recovery threshold given by the maximum degree of the product *PQ* can actually be improved if we use instead the fact that we only need as many servers as there are non-zero coefficients. Similar to considerations in [9], as a basic observation of linear algebra, we note that only as many evaluation points as there are possibly non-zero coefficients are required to retrieve the required matrix coefficients of *PQ*. Let *PQ* have degree *r* − 1 and suppose that *q* ≥ *r* + 1. Let $\alpha_1, \dots, \alpha_r$ be distinct elements of $\mathbb{F}_q^\times$. Suppose that the zero coefficients of *PQ* are indexed by $\mathcal{I}$ and let $i = r - |\mathcal{I}|$. There exist $j_1, \dots, j_i \in \{1, \dots, r\}$ such that the $i \times i$ matrix $V$, found by deleting the columns of $V(\alpha_{j_1}, \dots, \alpha_{j_i})$ indexed by $\mathcal{I}$, is invertible. Then, each (*s*, *t*)-entry of the unknown coefficients of the polynomial $PQ \in \mathbb{F}_q^{a \times b}[x]$ can be retrieved by computing the product

$$V^{-1}\big((P(\alpha_j)Q(\alpha_j))_{s,t} : j \in \{j_1, \dots, j_i\}\big)^t.$$
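The retrieval step can be sketched concretely over a small prime field; the support set, coefficients, and evaluation points below are our own toy choices:

```python
# Hedged sketch of Section 4: if the support (possibly non-zero exponents)
# of a polynomial is known, |support| evaluations suffice. We solve the
# reduced Vandermonde system over a prime field by Gauss-Jordan elimination.
q = 101

def solve_mod(A, b, q):
    """Solve A x = b over F_q (q prime) by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % q)  # pivot row
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], q - 2, q)
        M[col] = [v * inv % q for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] % q:
                f = M[r][col]
                M[r] = [(vr - f * vc) % q for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Toy polynomial with known gaps: support {0, 3, 4}, i.e. c0 + c3 x^3 + c4 x^4.
support, coeffs = [0, 3, 4], [7, 2, 9]
f = lambda x: sum(c * pow(x, e, q) for e, c in zip(support, coeffs)) % q
alphas = [2, 3, 5]                          # only |support| = 3 points needed
V = [[pow(a, e, q) for e in support] for a in alphas]
assert solve_mod(V, [f(a) for a in alphas], q) == coeffs
```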

**Theorem 6.** *Let M* ≥ 2*, D* = *M* + 2*. Let*

$$\begin{aligned} \bar{P}(x) &:= \sum_{k=1}^{K} x^{D(k-1)} \sum_{m=1}^{M} x^{m} A_{k,m}, & R(x) &:= \sum_{t=1}^{T} x^{D(t-1)} R_{t}, \\ \bar{Q}(x) &:= \sum_{\ell=1}^{L} x^{DK(\ell-1)} \sum_{m=1}^{M} x^{M-m+1} B_{m,\ell}, & S(x) &:= \sum_{t=1}^{T} x^{D(t-1)} S_{t}. \end{aligned}$$

*The number N of non-zero terms in the product PQ satisfies*

$$N \quad \le \begin{cases} N\_1(K, L, M; T) + 1 & \text{if } M > 2, T \le K, L \ge 2 \text{ or } L = 1, T = 1; \\ 3LK + K - T + LT + 1 & \text{if } M = 2, T \le K, L \ge 2; \\ ((L - 1)K + T)M + 2LK + 1 & \text{if } K + 1 \le T \le \lfloor LK/2 \rfloor + 1, L \ge 2; \\ ((L - 1)K + T)M + LK + 2T - 1 & \text{if } T > \lfloor LK/2 \rfloor + 1, L \ge 2; \\ (K + T - 1)M + 2K + 1 & \text{if } 2 \le T \le \lfloor K/2 \rfloor + 1, L = 1; \\ (K + T - 1)M + K + 2T - 1 & \text{if } T > \lfloor K/2 \rfloor + 1, L = 1. \end{cases}$$

**Proof.** We have *P*(*x*) = *P*¯(*x*) + *R*(*x*) and *Q*(*x*) = *Q*¯(*x*) + *S*(*x*). Recall that *P*¯(*x*) and *R*(*x*) have disjoint support, as do *Q*¯(*x*) and *S*(*x*). From Theorem 1, for each $k \in [K]$, $\ell \in [L]$, the matrix

$$\mathcal{C}\_{k\ell} = A\_{k,1}B\_{1,\ell} + \dots + A\_{k,M}B\_{M,\ell}$$

is the coefficient of *x<sup>h</sup>* in *P*¯*Q*¯ for

$$h = (k - 1)D + (\ell - 1)KD + M + 1 = (k + (\ell - 1)K)D - 1.$$

Clearly, each such exponent satisfies $h \equiv M + 1 \pmod{D}$. The degrees of terms arising in the product *PQ* are given by

$$(i + zK)D + j + y + 2,\tag{7}$$

$$(i+t)D + j + 1,\tag{8}$$

$$(u + zK)D + y + 1,\tag{9}$$

$$(u + t)D,\tag{10}$$

for $i \in \{0, \dots, K-1\}$, $z \in \{0, \dots, L-1\}$, $j, y \in \{0, \dots, M-1\}$ and $u, t \in \{0, \dots, T-1\}$. The sequence (7) corresponds to terms that appear in the product $\bar{P}\bar{Q}$. By inspection, we see that no element $\theta$ in any of the sequences (8)–(10) satisfies $\theta \equiv -1 \pmod{D}$: in (8) this would require $j = M$ and in (9) this would require $y = M$, contradicting our choices of $j$, $y$. The total number of distinct terms to be computed is the number of distinct integers appearing in the union $\mathcal{T}$ of the elements of the sequences (7)–(10). Let $\mathcal{U}_0$ denote the set of integers appearing in (7). Observe that $\mathcal{U}_0 = \{2, \dots, (LK+1)D - 4\}$, unless $M = 2$, in which case $\mathcal{U}_0 = \{j : 2 \le j \le 4LK,\ j \not\equiv 1 \pmod 4\}$. Consider the set

$$\mathcal{U} := \{0, 1, 2, \dots, (LK + 1)D - 4\}.$$

We make the following observations with respect to U.


Consider the following sets.

$$\begin{aligned} \mathcal{U}_1 &:= \{aD + i : 0 \le a \le K + T - 2,\ 1 \le i \le M\}, & |\mathcal{U}_1| &= (K + T - 1)M; \\ \mathcal{U}_2 &:= \{\beta D + j : 0 \le \beta \le (L - 1)K + T - 1,\ 1 \le j \le M\}, & |\mathcal{U}_2| &= ((L - 1)K + T)M; \\ \mathcal{U}_3 &:= \{\gamma D : 0 \le \gamma \le 2T - 2\}, & |\mathcal{U}_3| &= 2T - 1. \end{aligned}$$
Clearly, U<sup>1</sup> comprises the elements of the sequence (8) and the members of U<sup>3</sup> are exactly those of the sequence (10). For *T* ≥ *K* + 1, we have

$$\{u + zK : 0 \le u \le T - 1,\ 0 \le z \le L - 1\} = \{\beta : 0 \le \beta \le T - 1 + (L - 1)K\},$$

in which case U<sup>2</sup> is exactly the set of elements of (9). It follows that U<sup>1</sup> ∪ U<sup>2</sup> ∪ U<sup>3</sup> ⊆ U if and only if *T* ≤ min{(*L* − 1)*K* + 1, *K*, *LK*/2 + 1}. This minimum is *K* if *L* ≥ 2 and is 1 if *L* = 1. Furthermore, U<sup>3</sup> is disjoint from U<sup>1</sup> and from U2. If *L* ≥ 2 or if *L* = *K* = 1, then U<sup>1</sup> ⊂ U2, while if *L* = 1, then U<sup>2</sup> ⊂ U1.

Suppose first that *M* > 2. We thus have that U = T if *L* ≥ 2 and *T* ≤ *K*, or if *L* = *T* = 1; in either of these cases, *PQ* has at most

$$|\mathcal{T}| = |\mathcal{U}| = (L\mathcal{K} + 1)D - 3 = (L\mathcal{K} - 1)D + 2M + 1 = N\_1(\mathcal{K}, L, M; T) + 1$$

non-zero terms. We summarize these observations as follows.

$$\mathcal{T} = \begin{cases} \mathcal{U} & \text{if } L \ge 2 \text{ and } T \le K, \text{ or if } L = T = 1; \\ \mathcal{U} \cup \mathcal{U}_1 \cup \mathcal{U}_3 & \text{if } L = 1 \text{ and } T \ge 2; \\ \mathcal{U} \cup \mathcal{U}_2 \cup \mathcal{U}_3 & \text{if } L \ge 2 \text{ and } T > K. \end{cases}$$

Furthermore,

$$\begin{aligned} \mathcal{U} \cap \mathcal{U}_3 &= \{\gamma D : 0 \le \gamma \le \min\{2T - 2, LK\}\}, \\ \mathcal{U} \cap \mathcal{U}_2 &= \{\beta D + j : 0 \le \beta \le \min\{LK, T - 1 + (L - 1)K\},\ 1 \le j \le M\} \setminus \{LKD + M - 1, LKD + M\}, \\ \mathcal{U} \cap \mathcal{U}_1 &= \{aD + i : 0 \le a \le \min\{LK, T + K - 2\},\ 1 \le i \le M\} \setminus \{LKD + M - 1, LKD + M\}. \end{aligned}$$

Hence |U ∩ U3| = min{2*T* − 1, *LK* + 1}. If *T* ≥ *K* + 1 then |U ∩ U2| = *M*(*LK* + 1) − 2 and so, applying inclusion–exclusion, we see that, if *L* ≥ 2, then

$$|\mathcal{T}| = \begin{cases} |\mathcal{U}| = (L\mathcal{K} + 1)D - 3 = (L\mathcal{K} + 1)(M + 2) - 3 & \text{if } K \ge T; \\ |\mathcal{U} \cup \mathcal{U}\_2| = ((L - 1)\mathcal{K} + T)M + 2L\mathcal{K} + 1 & \text{if } K + 1 \le T \le \lfloor L\mathcal{K}/2 \rfloor + 1; \\ |\mathcal{U} \cup \mathcal{U}\_2 \cup \mathcal{U}\_3| = ((L - 1)\mathcal{K} + T)M + L\mathcal{K} + 2T - 1 & \text{otherwise}. \end{cases}$$

In the case *L* = 1, we have U<sup>2</sup> ⊆ U1, while if *T* ≤ *K* then the elements of (9) are contained in U. Therefore, T = U∪U<sup>1</sup> ∪ U<sup>3</sup> and so for *T* ≥ 2 we have

$$|\mathcal{T}| \quad = \begin{cases} (K+T-1)M+2K+1 & \text{if } T \le \lfloor K/2 \rfloor + 1; \\ (K+T-1)M+K+2T-1 & \text{otherwise}. \end{cases}$$

Finally, suppose that *M* = 2. If *L* = 1 then, since *U*<sup>2</sup> ⊂ U<sup>1</sup> we have T = U<sup>0</sup> ∪ U<sup>1</sup> ∪ U3. Similar to previous computations, we see |T | takes the same values as in the case for *M* > 2. If *L* ≥ 2 and *T* ≥ *K* + 1 then T = U<sup>0</sup> ∪ U<sup>2</sup> ∪ U3. Again using similar computations as before, we see in this case that |T | takes the same values as in the case for *M* > 2. Suppose that *L* ≥ 2 and *T* ≤ *K*. In this case, the integers appearing in (9) comprise the set

$$\mathcal{U}_2' := \{4(u + zK) + j : 0 \le u \le T - 1,\ 0 \le z \le L - 1,\ 1 \le j \le 2\}, \qquad |\mathcal{U}_2'| = 2TL.$$

We have |U0| = 3*KL* and moreover,

$$\begin{aligned} \mathcal{U}_0 \cap \mathcal{U}_2' &= \{4(u + zK) + 2 : 0 \le u \le T - 1,\ 0 \le z \le L - 1\}, & |\mathcal{U}_0 \cap \mathcal{U}_2'| &= TL; \\ \mathcal{U}_0 \cap \mathcal{U}_1 &= \{4a + 2 : 0 \le a \le K + T - 2\}, & |\mathcal{U}_0 \cap \mathcal{U}_1| &= K + T - 1; \\ \mathcal{U}_0 \cap \mathcal{U}_3 &= \{4(a + 1) : 0 \le a \le 2T - 3\}, & |\mathcal{U}_0 \cap \mathcal{U}_3| &= 2T - 2; \\ \mathcal{U}_1 \cap \mathcal{U}_2' &= \{4(u + zK) + j : u + zK \le K + T - 2,\ 0 \le z \le 1,\ 1 \le j \le 2\}, & |\mathcal{U}_1 \cap \mathcal{U}_2'| &= 4T - 2; \\ \mathcal{U}_0 \cap \mathcal{U}_1 \cap \mathcal{U}_2' &= \{4(u + zK) + 2 : u + zK \le K + T - 2,\ 0 \le z \le 1\}, & |\mathcal{U}_0 \cap \mathcal{U}_1 \cap \mathcal{U}_2'| &= 2T - 1. \end{aligned}$$

Therefore, by inclusion–exclusion, $|\mathcal{T}| = 3LK + K - T + TL + 1$.


**Example 1.** *Let M* = 3, *K* = 3, *L* = 2*, that is:*

$$A = \begin{bmatrix} A\_{1,1} & A\_{1,2} & A\_{1,3} \\ A\_{2,1} & A\_{2,2} & A\_{2,3} \\ A\_{3,1} & A\_{3,2} & A\_{3,3} \end{bmatrix}, \quad B = \begin{bmatrix} B\_{1,1} & B\_{1,2} \\ B\_{2,1} & B\_{2,2} \\ B\_{3,1} & B\_{3,2} \end{bmatrix}.$$

*We will compute the product AB using 32 helper nodes, assuming that T* = 3 *servers may collude. Choose a pair of polynomials*

$$R(x) = R_1 + R_2 x^5 + R_3 x^{10} \quad \text{and} \quad S(x) = S_1 + S_2 x^5 + S_3 x^{10},$$

*whose non-zero matrix coefficients are chosen uniformly at random over* F*q. We have*

$$\begin{aligned} \bar{P}(\mathbf{x}) &= \mathbf{x}(A\_{1,1} + A\_{1,2}\mathbf{x} + A\_{1,3}\mathbf{x}^2) + \mathbf{x}^6(A\_{2,1} + A\_{2,2}\mathbf{x} + A\_{2,3}\mathbf{x}^2) + \mathbf{x}^{11}(A\_{3,1} + A\_{3,2}\mathbf{x} + A\_{3,3}\mathbf{x}^2), \\ \bar{Q}(\mathbf{x}) &= \mathbf{x}(B\_{3,1} + B\_{2,1}\mathbf{x} + B\_{1,1}\mathbf{x}^2) + \mathbf{x}^{16}(B\_{3,2} + B\_{2,2}\mathbf{x} + B\_{1,2}\mathbf{x}^2). \end{aligned}$$

*Define P*(*x*) := *P*¯(*x*) + *R*(*x*) *and Q*(*x*) := *Q*¯(*x*) + *S*(*x*)*. In Table 2, we show the exponents that arise in the product P*(*x*)*Q*(*x*)*. The monomials corresponding to the computed data are* 4, 9, 14, 19, 24, 29*, shown in blue. The coefficients of x*4, *x*9, *x*14, *x*19, *x*<sup>24</sup> *and x*<sup>29</sup> *are, respectively, given by*

$$\begin{array}{rcl} \mathbb{C}\_{1,1} &=& A\_{1,1}B\_{1,1} + A\_{1,2}B\_{2,1} + A\_{1,3}B\_{3,1}, \\ \mathbb{C}\_{1,2} &=& A\_{1,1}B\_{1,2} + A\_{1,2}B\_{2,2} + A\_{1,3}B\_{3,2}, \\ \mathbb{C}\_{2,1} &=& A\_{2,1}B\_{1,1} + A\_{2,2}B\_{2,1} + A\_{2,3}B\_{3,1}, \\ \mathbb{C}\_{2,2} &=& A\_{2,1}B\_{1,2} + A\_{2,2}B\_{2,2} + A\_{2,3}B\_{3,2}, \\ \mathbb{C}\_{3,1} &=& A\_{3,1}B\_{1,1} + A\_{3,2}B\_{2,1} + A\_{3,3}B\_{3,1}, \\ \mathbb{C}\_{3,2} &=& A\_{3,1}B\_{1,2} + A\_{3,2}B\_{2,2} + A\_{3,3}B\_{3,2}. \end{array}$$

*Note that the total number of non-zero terms in PQ is LKD* + *M* − 1 = 32*, as predicted by Theorem 6. This also corresponds to the case for which PQ has degree N*1(*K*, *L*, *M*; *T*) = *N*1(3, 2, 3; 3) = 31*, which is consistent with Theorem 2. Therefore,* 32 *helper nodes are required to retrieve PQ and hence the coefficients* $C_{k,\ell}$*. If the matrices have entries over* $\mathbb{F}_q$ *with q* = 64*, then since* gcd(*q* − 1, *D*) = gcd(63, 5) = 1*, the user can retrieve the data securely in the presence of 3 colluding workers.*

*Suppose now that we have T* = 6 *colluding servers. In this case, we have T* = 6 > 4 = *LK*/2 + 1 *and L* > 1*, and so from Theorem 6, we expect the polynomial PQ to have at most* (*LK* + *T*)*D* − *K*(*M* + *L*) − 1 = 44 *non-zero coefficients. These exponents are shown in the corresponding degree table for our scheme (see Table 3). In this case, to protect against collusion by 6 workers, we require a total of 44 helpers. While the degree of PQ in this case is* 50 *(see Table 1), the coefficients corresponding to the exponents E* = {34, 39, 44, 46, 47, 48, 49} *are zero, and hence known a priori to the user. Let α be a root of* $x^6 + x^4 + x^3 + x + 1 \in \mathbb{F}_2[x]$*, so that α generates* $\mathbb{F}_{64}^{\times}$*. Let V be the* 44 × 44 *matrix obtained from V*(*α<sup>i</sup>* : *i* ∈ [63]) *by deleting the columns and rows indexed by E* ∪ {51, ... , 62}*. It is readily checked (e.g., as here, using MAGMA [28]) that the determinant of V is α*<sup>11</sup> *and in particular is non-zero. Therefore, we can solve the system to find the unknown coefficients of PQ via the computation* $V^{-1}\big(P(\alpha^j)Q(\alpha^j) : j \in [63] \setminus (E \cup \{51, \dots, 62\})\big)^t$*.*
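The support claims of Example 1 can be spot-checked numerically by treating the matrix blocks as scalars, since the support of *PQ* depends only on the exponent pattern. This sketch (our own illustration, with random positive integers in place of the blocks) uses the *T* = 3 setting:

```python
# Hedged scalar-block check of Example 1: K=3, L=2, M=3, T=3, D=5.
# Random positive integers stand in for the matrix blocks.
import random
random.seed(1)
D, K, L, M, T = 5, 3, 2, 3, 3
A = [[random.randint(1, 50) for _ in range(M)] for _ in range(K)]
B = [[random.randint(1, 50) for _ in range(L)] for _ in range(M)]
R = [random.randint(1, 50) for _ in range(T)]
S = [random.randint(1, 50) for _ in range(T)]

deg = D * (K * L - 1) + 2 * M          # N1 = 31, the degree of PQ here
P = [0] * (deg + 1)
Q = [0] * (deg + 1)
for k in range(K):                     # Pbar: exponents D*k + m
    for m in range(1, M + 1):
        P[D * k + m] += A[k][m - 1]
for l in range(L):                     # Qbar: exponents D*K*l + (M - m + 1)
    for m in range(1, M + 1):
        Q[D * K * l + (M - m + 1)] += B[m - 1][l]
for t in range(T):                     # masks R, S at exponents D*t
    P[D * t] += R[t]
    Q[D * t] += S[t]

PQ = [0] * (2 * deg + 1)               # schoolbook polynomial product
for i, pi in enumerate(P):
    for j, qj in enumerate(Q):
        PQ[i + j] += pi * qj

nonzero = [e for e, c in enumerate(PQ) if c]
assert len(nonzero) == L * K * D + M - 1                  # 32 non-zero terms
assert PQ[4] == sum(A[0][m] * B[m][0] for m in range(M))  # C_{1,1} at x^4
```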

**Table 2.** Exponents of *P*(*x*)*Q*(*x*) for *K* = 3, *L* = 2, *M* = 3, *T* = 3. The monomial exponents which correspond to the computed data are shown in blue. The grey background marks noise exponents.


**Table 3.** Exponents of *P*(*x*)*Q*(*x*) for *K* = 3, *L* = 2, *M* = 3, *T* = 6. The monomial exponents which correspond to the computed data are shown in blue. The grey background marks noise exponents.


We remark that for the case of no collusion, Theorem 6 does not yield an optimal scheme. The proposition below outlines a modified scheme with a lower recovery threshold if secrecy is not a consideration.

**Proposition 6.** *Define the polynomials:*

$$\begin{aligned} \tilde{P}(x) &:= \sum_{k=1}^{K} x^{(k-1)M} \sum_{m=1}^{M} x^{m} A_{k,m}, \\ \tilde{Q}(x) &:= \sum_{\ell=1}^{L} x^{K(\ell-1)M} \sum_{m=1}^{M} x^{M+1-m} B_{m,\ell}. \end{aligned}$$

*The following hold:*


$$N \le KLM + M - 1.$$


**Proof.** For each (*i*, *j*) ∈ [*K*] × [*L*], define the following:

$$c_{ij} := M\big(K(j - 1) + i\big) + 1, \qquad B_M(c) := \{c - (M - 1), \dots, c + (M - 1)\}.$$

We have

$$\tilde{P}\tilde{Q} = \sum_{k=1}^{K} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \sum_{m'=1}^{M} x^{M(K(\ell-1)+k)+1+m-m'} A_{k,m} B_{m',\ell}.$$

The distinct monomials arising in the product $\tilde{P}\tilde{Q}$ are those indexed by the distinct elements of $\bigcup_{(i,j) \in [K] \times [L]} B_M(c_{ij})$. It is straightforward to check that for each $(i, j) \in [K] \times [L]$, the integer $c_{ij}$ is not contained in $B_M(c_{ut})$ for any $(u, t) \neq (i, j)$, and hence the required coefficients $C_{ij}$ that appear in the product $\tilde{P}\tilde{Q}$, which are indexed by the $c_{ij}$, can be uniquely retrieved. We compute the number of workers required by this scheme. We have

$$\begin{aligned} N &:= \Big| \bigcup_{(i,j) \in [K] \times [L]} B_M(c_{ij}) \Big| \\ &= KL(2M - 1) - \sum_{(i,j) \neq (u,t)} \big| B_M(c_{ij}) \cap B_M(c_{ut}) \big| \\ &= KL(2M - 1) - (KL - 1)(M - 1) = KLM + M - 1. \end{aligned}$$

The recovery threshold of this scheme takes the same value as the recovery threshold of the poly-entangled scheme of ([18], Theorem 1).
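The count *KLM* + *M* − 1 can be spot-checked numerically with scalar stand-ins for the blocks (our own illustration, using the exponent $K(\ell-1)M$ in $\tilde{Q}$ as in the product formula of the proof):

```python
# Hedged scalar-block check of Proposition 6: the product P~Q~ has
# KLM + M - 1 possibly non-zero terms. Parameters are our own toy choices.
import random
random.seed(2)
K, L, M = 3, 2, 4
Pt, Qt = {}, {}                        # exponent -> scalar coefficient
for k in range(1, K + 1):              # P~: exponent (k-1)M + m
    for m in range(1, M + 1):
        Pt[(k - 1) * M + m] = random.randint(1, 50)
for l in range(1, L + 1):              # Q~: exponent K(l-1)M + (M + 1 - m)
    for m in range(1, M + 1):
        Qt[K * (l - 1) * M + M + 1 - m] = random.randint(1, 50)

prod = {}
for e1, c1 in Pt.items():
    for e2, c2 in Qt.items():
        prod[e1 + e2] = prod.get(e1 + e2, 0) + c1 * c2
assert len(prod) == K * L * M + M - 1  # 27 possibly non-zero terms
```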

### **5. Results and Comparison with the State-of-the-Art**

We provide some comparison plots that highlight parameter regions of interest. In Figure 2, we compare the two variants of our own scheme. The recovery threshold when considering the maximal degree of the resulting product polynomial is shown alongside the count of possibly non-zero coefficients. We see that significant gains can be achieved, especially in the higher collusion number region.

**Figure 2.** Comparison of the maximal degree with the number of non-zero coefficients.

In Figure 3, we compare our (non-zero coefficient) scheme with the SGPD scheme presented in [19]. For *K* > 1, we see that, except for very low values of *T*, our new scheme outperforms the SGPD scheme. This comparison of the recovery threshold for the two schemes is well justified since they use the same division of the matrices and will have identical upload and download costs per server.

**Figure 3.** Comparison with [19].

The comparison in Figure 4 with the entangled codes scheme [17] and a newer scheme using roots of unity [26] shows that our new codes have a lower recovery threshold for low numbers of colluding servers. Calculating the actual number of servers needed for the entangled scheme requires knowledge of the tensor rank of matrix multiplication. These ranks, or their best known upper bounds, are taken from [29,30]. It should be noted that the scheme in [26] requires that either ((*L* + 1)(*K* + *T*) − 1) | *q* or (*KML* + *LT* + *KM* + *T*) | *q*, where *q* is the field size. The requirements for our scheme outlined in Proposition 5 and Corollary 1 (i.e., that gcd(*q* − 1, *D*) = 1 and *q* > *N*) are much less restrictive.

**Figure 4.** Comparison with [17,26] for the cases *M* = 4, *L* = 3 and *M* = 5, *L* = 2.

The comparison with the GASP scheme is less straightforward since the partitioning in GASP has a fixed value of *M* = 1. The plot in Figure 5 shows the recovery thresholds for the GASP scheme with partitioning *K* = *L* = 3*M*, as well as the recovery thresholds of our scheme for *K* = *L* = 3 and *M* varying from 1 to 5. We compare here with the maximal degree of our scheme, not the non-zero coefficients, to show that the variant of our scheme that is able to mitigate stragglers and Byzantine servers achieves much lower recovery thresholds. Fixing *K* and *L* to the same value across this comparison means that the download cost per server is the same for all our schemes and the *K* = *L* = 3 GASP scheme. Note that in the *M* = 1 case, we have an identical partition and hence upload cost per server as the *K* = *L* = 3 GASP scheme, while for *M* = 2, we have identical upload cost with the *K* = *L* = 6 GASP scheme, and *M* = 5 corresponds to the *K* = *L* = 15 GASP scheme. We can see that the grid partitioning allows for a much lower recovery threshold when the upload cost is fixed. The outer partitioning of the GASP scheme allows for a low download cost per server that makes up for the higher recovery threshold. Explicitly, the outer partition into *KM* and *LM* blocks allows for a download rate of $N_{\mathrm{GASP}} \cdot \frac{ab}{M^2}$, where $N_{\mathrm{GASP}}$ is the recovery threshold for the GASP scheme. In contrast, the scheme presented in this paper will have a download rate of *Nab* if we partition into *K* × *M* and *M* × *L* blocks.

**Figure 5.** Comparison of the maximal degree with the *GASPr* scheme from [10].

It should be noted, though, that our construction allows explicit control of the field size needed. In contrast, the GASP scheme might have to choose its evaluation points from an extension field ([9], Theorem 1) if the base field is fixed by the entries of the matrices *A* and *B*, or it simply requires a very large base field. This would greatly increase the computational cost and the rates at all steps of the scheme. For example, for *K* = 3, *L* = 3, *T* = 3, GASP*<sup>r</sup>* uses *N* = 22 servers and the exponents for the randomness in one of the polynomials are 9, 10, 12. Then, there are no suitable evaluation points for *q* = 23, 25, 27, 29, 31, 32, 37, 41, 43, and so for these values of *q*, an extension field is required.

Furthermore, the scheme presented in this paper can be used in situations where stragglers or Byzantine servers are expected as described in Corollary 2.

### *Complexity*

We summarize the cost of F*q*-arithmetic operations and transmission of F*<sup>q</sup>* elements associated with this scheme, using *N* servers. We refer the reader to ([25], Table 1) and ([26], Table 1) to view the complexity of other schemes in the literature (note that the costs defined in [25] are normalized). There are various trade-offs in costs depending on the partitioning chosen (the proposed scheme is completely flexible in this respect), ability to handle stragglers and Byzantine servers, and constraints on the field size *q*.

We remark that additions are in general much less costly than $\mathbb{F}_q$-multiplications in terms of space and time: for example, if $q = 2^{\ell}$, then an addition has space complexity (number of AND and XOR gates) $O(\ell)$ and costs one clock cycle in time, while a multiplication has space complexity $O(\ell^2)$ and time complexity $O(\log_2 \ell)$ [31,32].

The encoding complexity of our scheme comes at the cost of evaluating the pair of polynomials *P*(*x*) and *Q*(*x*), each at *N* distinct elements of $\mathbb{F}_q$. This is equivalent to performing *Nr*(*a* + *b*) (scalar) polynomial evaluations in $\mathbb{F}_q$. Given $\alpha \in \mathbb{F}_q$, each entry of *P*(*α*) is an evaluation of an $\mathbb{F}_q$-polynomial with *KM* + *T* coefficients, while each entry of *Q*(*α*) is an evaluation of an $\mathbb{F}_q$-polynomial with *LM* + *T* coefficients. The decoding complexity is the cost of interpolating the polynomial $PQ \in \mathbb{F}_q^{a \times b}[x]$ using *N* evaluation points, when *PQ* has at most *N* unknown coefficients.

The cost of either polynomial evaluation at *N* points or interpolation of a polynomial of degree at most *N* − 1 is $O(N \log^2 N \log\log N)$. Therefore, we have the following statement.

### **Proposition 7.**


### **6. Conclusions**

In this work, we addressed the problem of secure distributed matrix multiplication for *C* = *AB* in terms of designing polynomial codes for this setting. In particular, we assumed that *A* and *B* contain confidential data, which must be kept secure from colluding workers. Similar to some previous work also employing polynomial codes for distributed matrix multiplication, we proposed to deliberately leave gaps in the polynomial coefficients for certain degrees and provided a new code construction which is able to exploit these gaps to lower the recovery threshold. For this construction, we also presented new closed-form expressions for the recovery threshold as a function of the number of colluding workers and the specific number of submatrices that the matrices *A* and *B* are partitioned into during encoding. Further, in the absence of any security constraints, we showed that our construction is optimal in terms of recovery threshold. Our proposed scheme improves on the recovery threshold of existing schemes from the literature in particular for large dimensions of *A* and a larger number of colluding workers, in some cases, even by a large margin.

**Author Contributions:** Writing—original draft, E.B., O.W.G., and J.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by U.S. National Science Foundation grants 1815322, 1908756, 2107370 in addition to the UCD Seed Funding- *Horizon Scanning* scheme (grant no. 54584).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Entropy* Editorial Office E-mail: entropy@mdpi.com www.mdpi.com/journal/entropy
