Energy-Limited Joint Source–Channel Coding of Gaussian Sources over Gaussian Channels with Unknown Noise Level

Lev, Omri; Khina, Anatoly

doi:10.3390/e25111522

Open AccessArticle

Energy-Limited Joint Source–Channel Coding of Gaussian Sources over Gaussian Channels with Unknown Noise Level

by

Omri Lev

^1,*,†

and

Anatoly Khina

²

¹

Signals, Information, and Algorithms Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA

²

School of Electrical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel

^*

Author to whom correspondence should be addressed.

^†

O. Lev was with the School of Electrical Engineering, Tel Aviv University.

Entropy 2023, 25(11), 1522; https://doi.org/10.3390/e25111522

Submission received: 22 August 2023 / Revised: 22 October 2023 / Accepted: 26 October 2023 / Published: 6 November 2023

(This article belongs to the Special Issue Advances in Information and Coding Theory II)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

We consider the problem of transmitting a Gaussian source with minimum mean square error distortion over an infinite-bandwidth additive white Gaussian noise channel with an unknown noise level and under an input energy constraint. We construct a universal joint source–channel coding scheme with respect to the noise level, that uses modulo-lattice modulation with multiple layers. For each layer, we employ either analog linear modulation or analog pulse-position modulation (PPM). We show that the designed scheme with linear layers requires less energy compared to existing solutions to achieve the same quadratically increasing distortion profile with the noise level; replacing the linear layers with PPM layers offers an additional improvement.

Keywords:

joint source–channel coding; Gaussian channel; infinite bandwidth; energy constraint

1. Introduction

Due to the recent technological advancements in sensing technology and the internet of things, there is a growing demand for low-energy communications solutions. Indeed, since many of the sensors have only limited batteries due to environmental (in the case of energy harvesting) or replenishment limitations, these solutions need to be economical in terms of the utilized energy. Moreover, since each sensor may serve several parties, with each experiencing different conditions, these solutions need to be robust with respect to the noise level.

This problem may be conveniently modeled as the classical setup of conveying k independent and identically distributed (i.i.d.) Gaussian source samples with minimum mean square error (MMSE) distortion over a continuous-time additive white Gaussian noise (AWGN) channel under a channel input total energy constraint

k E

, where E is the allowed transmit energy per source sample and unconstrained transmit bandwidth; see Figure 1.

For the encapsulated source–coding problem for a large k, the optimal tradeoff between the compression rate R and the (per-sample) MMSE distortion D [1] (Chapter 13.2) for a memoryless Gaussian source with variance

σ_{x}^{2}

is dictated by the rate–distortion function [1] (Chapter 13.3):

\begin{matrix} R (D) = \{\begin{matrix} \frac{1}{2} log \frac{σ_{x}^{2}}{D}, & σ_{x}^{2} > D, \\ 0, & σ_{x}^{2} \leq D \end{matrix} = \{\begin{matrix} \frac{1}{2} log SDR, & SDR > 1, \\ 0, & SDR \leq 1, \end{matrix} \end{matrix}

(1)

where

SDR ≜ σ_{x}^{2} / D

is the signal-to-distortion ratio (SDR).

For the encapsulated channel-coding problem, since the bandwidth is unconstrained (i.e., grows to infinity), and the allowed energy of the channel input is constrained by E, the maximal achievable total reliable rate (in nats) of the entire transmission—the total capacity—is given by [1] (Chapter 9.3)

\begin{matrix} C = \frac{E}{N} = ENR, \end{matrix}

(2)

when the power spectral density of the noise (the noise level) N is known to the transmitter (and the receiver), and where

ENR ≜ E / N

is the energy-to-noise ratio. We note that, in our setting, the transmit energy E is fixed regardless of the transmission duration and bandwidth, in contrast to the power-limited setting, in which the energy

E = P T

grows linearly with the transmission time for a fixed power P. To emphasize this, following [2,3] and others, we make use of

ENR = E / N

to distinguish it from the more common signal-to-noise ratio (SNR), which is defined in the fixed-power scenario as

SNR ≜ P / N

.

Returning to the overall problem of conveying k i.i.d. source samples of a Gaussian source over a continuous-time AWGN channel subject to an energy constraint (and unconstrained bandwidth), in the limit of a large-source blocklength k, the optimal achievable mean square error distortion per source sample is dictated by the celebrated source–channel separation principle [1] (Th. 10.4.1), [4] (Chapter 3.9):

R (D) \leq C

, which upon substituting (1) and (2), amounts to

\begin{matrix} D = σ_{x}^{2} \cdot e^{- 2 ENR}, \end{matrix}

(3)

For non-Gaussian continuous memoryless sources, the optimal distortion is bounded as [1] (Prob. 10.8, Th. 10.4.1), [4] (Prob. 3.18, Chapter 3.9)

\begin{matrix} \frac{e^{2 h (x)}}{2 π e} \cdot e^{- 2 ENR} \leq D \leq σ_{x}^{2} \cdot e^{- 2 ENR}, \end{matrix}

(4)

where the lower bound stems from Shannon’s lower bound [5], the upper bound holds since a Gaussian source is the “least compressable” source with a given variance under a quadratic distortion measure, and

h (x)

denotes the differential entropy of a sample of the i.i.d. source x [1] (Chapter 8), [4] (Chapter 2.2).

While the optimal performance is known when the transmitter (and the receiver) knows the noise level and

k \to \infty

, determining it becomes much more challenging when the noise level is unknown at the transmitter. Indeed, when the transmitter is oblivious of the true noise level, achieving (3) for all noise levels simultaneously is impossible [6,7]. Instead, one wishes to achieve graceful degradation of the distortion with the noise level, namely, a scheme that would work well for a continuum of all possible noise levels without knowing the true noise level at the transmitter. Since the distortion improves exponentially with the ENR (3) when the noise level is known, in the absence of knowledge of the noise level at the transmitter, one might hope to attain an exponential distortion decay profile with the ENR of the form

\begin{matrix} D & \leq a e^{- b ENR} & \forall ENR & > 0 \end{matrix}

(5a)

for some

a, b > 0

, or, equivalently,

\begin{matrix} D & \leq a e^{- c / N} & \forall N & > 0 \end{matrix}

(5b)

for some

a, c > 0

and some finite per-sample energy E. Köken and Tuncel [7] proved that, unfortunately, this is impossible, namely, no

a, b > 0

(equivalently

a, c > 0

and

E < \infty

) exist for which (5a,b) is achievable simultaneously for all

ENR > 0

(equivalently, for all

N > 0

). Consequently, distortion profiles that deteriorate faster with the noise level need to be sought.

For the case of finite bandwidth expansion/compression B (and finite power), by superimposing digital successive refinements [8] with a geometric power allocation, Santhi and Vardy [9,10] and Bhattad and Narayanan [11] showed that, in our terms, the distortion improves like

{ENR}^{- (B - ϵ)}

for an arbitrarily small

ϵ > 0

, for large ENR values. This suggests that by taking the bandwidth to be large enough, a polynomial decay with an

ENR

of any finite degree, however large, is achievable, starting from a large enough ENR. In our setting of interest, in which the bandwidth is unconstrained, this means, in turn, that there exists a finite energy E for which a polynomially decaying distortion profile in N

\begin{matrix} D & = \frac{σ_{x}^{2}}{1 + {(\frac{\tilde{E}}{N})}^{L}} & \forall N > 0 \end{matrix}

(6a)

is attainable for any predetermined power

1 \leq L < \infty

, however large yet finite, and for any predetermined constant

0 < \tilde{E} < \infty

of our choice, with large enough finite per-sample energy

E > 0

; for the particular choice of

\tilde{E} = a^{1 / L} E

for any constant

a > 0

of our choice, this is equivalent to

\begin{matrix} D & = \frac{σ_{x}^{2}}{1 + a {ENR}^{L}} & \forall ENR > 0 . \end{matrix}

(6b)

Mittal and Phamdo [12] constructed a different scheme, that works above a certain minimum (not necessarily large) design ENR by sending the digital successive refinements incrementally over non-overlapping frequency bands, and sending the quantization error r of the last digital refinement over the last frequency band.

The scheme of Mittal and Phamdo was subsequently improved by Reznic et al. [6] (see also [13,14], [15] (Chapter 11.1)) by replacing the successive refinement layers with lattice-based Wyner–Ziv coding [16,17], [4] (Chapter 11.3) which, in contrast to the digital layers of the scheme of Mittal and Phamdo, enjoys an improvement in each of the layers with the ENR.

Kokën and Tuncel [7] adopted the scheme of Mittal and Phamdo in the infinite-bandwidth (and infinite-blocklength) setting. Baniasadi and Tuncel [18] (see also [19]) further improved this scheme by allowing the sending of the resulting analog errors of all the digital successive refinements. For the case of a distortion profile that improves quadratically with the ENR (

L = 2

in (6a,b)), upper and lower bounds were established by Köken and Tuncel [7] and Baniasadi and Tuncel [18] (see also [19]) for the minimum required energy to attain such a profile for all ENR values. For a predetermined value of our choice

\tilde{E} > 0

and a Gaussian source, a quadratic distortion profile (6a) with a predefined constant

\tilde{E}

(and

L = 2

) is achievable with a minimal per-sample transmit energy E that is bounded as

\begin{matrix} 0.906 \tilde{E} \leq E \leq 2.32 \tilde{E} . \end{matrix}

(7)

A staircase profile was treated by Baniasadi [20] (see also [19]).

However, albeit much progress has been made in determining the minimal required energy to attain polynomially decaying distortion profiles (6a), with particular emphasis put on the quadratically decaying distortion profile corresponding to the choice

L = 2

, the upper and lower bounds in (7) remain far apart. Moreover, no (low-delay) schemes for a single-source sample

k = 1

with graceful degradation of the distortion with the ENR have been proposed.

In this work, we adapt the modulo-lattice modulation (MLM) scheme of Reznic et al. [6] with multiple layers to the infinite-bandwidth setting, and interpret previously decoded layers that are designed for lower ENRs as side information that is known to the receiver but not to the transmitter, which allows, in turn, to apply Wyner–Ziv coding techniques [13], [15] (Chapter 11). By utilizing linear modulation for all the layers, we show that this scheme improves the upper (achievability) bound in (7). We then replace the analog modulation in (some of) the layers with analog pulse-position modulation (PPM) which was shown to work well for known ENR in [21]. We show that this scheme requires less energy to attain the same quadratic distortion profile compared to the linear-layer-only MLM scheme. Finally, we demonstrate numerically that a low-delay variant of the scheme, which encodes a single-source sample

k = 1

and uses simple one-dimensional lattices, attains good universal performance with respect to the noise level.

We note that our analytic results rely on the well-established existence of good multi-dimensional lattice codes [15] (to be precisely defined in Section 3), which are used as a building block, along with their known theoretical guarantees. Therefore, our proposed schemes should be understood with this point in mind. That said, for a suboptimal lattice with poorer analytical guarantees, one can similarly calculate the (suboptimal) achievable performance of the scheme. Since lattices work well even in one dimension, we demonstrate the strength of the proposed technique explicitly for this practical scenario using a simple one-dimensional lattice, which amounts to a uniform grid.

The rest of the paper is organized as follows. We introduce the notation that is used in this work in Section 1.1, and formulate the problem setup in Section 2. We provide the necessary background of MLM and analog PPM—the two major building blocks that are used in this work—in Section 3 and Section 4, respectively. We then construct universal schemes with respect to the noise level in Section 5; simulation results of our analysis for good multi-dimensional lattices and of the empirical performance of single-dimensional lattices are provided in Section 6. Finally, we conclude the paper with Section 7 and Section 8 by discussing future research directions and possible improvements.

1.1. Notation

N

,

Z

,

R

, and

R_{+}

denote the sets of the natural, integer, real, and non-negative real numbers, respectively. With some abuse of notation, we denote tuples (column vectors) by

a^{k} ≜ {(a_{1}, \dots, a_{k})}^{†}

for

k \in N

, and their Euclidean norms by

∥a^{k}∥ ≜ \sqrt{\sum_{i = 1}^{k} a_{i}^{2}}

, where

{(\cdot)}^{†}

denotes the transpose operation; distinguishing the former notation from the power operation applied to a scalar value will be clear from the context. The i-th element of the vector

a^{k}

is denoted by

a_{i}

or by

a [i]

, where we will use both terms throughout the paper. All logarithms are to the natural base, and all rates are measured in nats. The differential entropy of a continuous random variable with probability density function f is defined by

h (x) ≜ - \int_{- \infty}^{\infty} f (x) log f (x) d x

and is measured in nats. The expectation of a random variable (RV) x is denoted by

E [x]

. We denote by

{[a]}_{L}

the modulo-L operation for

a, L \in N

, and by

{[\cdot]}_{Λ}

the modulo-

Λ

operation [15] (Chapter 2.3) for a lattice

Λ

[15] (Chapter 2).

⌊ \cdot ⌋

denotes the floor operation. We denote by

I_{k}

the k-dimensional identity matrix. We denote sets of vectors by capital italic letters, where

A_{b, c}

stands for a set of c vectors, each of length b. All the logarithms in this work are to the natural base and all the rates are measured in nats.

2. Problem Statement

In this section, we formulate the JSCC setting that will be treated in this work, depicted in Figure 1.

Source. The source sequence to be conveyed,

x^{k} \in R^{k}

, comprises k i.i.d. samples of a standard Gaussian source, namely, it has mean zero and variance

σ_{x}^{2} = 1

.

Transmitter. Maps the source sequence

x^{k} ≜ {(x_{1}, x_{2}, \dots, x_{k})}^{†}

to a continuous input waveform

{s_{x^{k}} (t) | | t | \leq k T / 2}

that is subject to an energy constraint. (The introduction of negative time instants yields a non-causal scheme. This scheme can be made causal by introducing a delay of size

k T / 2

. We use a symmetric transmission time around zero for convenience):

\begin{matrix} \int_{- \frac{k T}{2}}^{\frac{k T}{2}} {|s (t)|}^{2} d t & \leq k E & \forall x^{k} \in R^{k}, \end{matrix}

(8)

where E denotes the per-symbol transmit-energy.

E = P T

where P is the transmit-power and T is the transmission duration.

Channel.

s_{x^{k}}

is transmitted over a continuous-time additive white Gaussian noise (AWGN) channel:

\begin{matrix} r (t) & = s (t) + n (t), & t \in [- \frac{k T}{2}, \frac{k T}{2}], \end{matrix}

(9)

where n is a continuous-time AWGN with two-sided spectral density

N / 2

, and r is the channel output signal; N is referred to as the noise level.

Receiver. Receives the channel output signal r, and constructs an estimate

{\hat{x}}^{k}

of

x^{k}

.

Distortion. The average quadratic distortion between

x^{k}

and

{\hat{x}}^{k}

is defined as

\begin{matrix} D ≜ \frac{1}{k} E [{∥x^{k} - {\hat{x}}^{k}∥}^{2}], \end{matrix}

(10)

where

∥\cdot∥

denotes the Euclidean norm, and the corresponding signal-to-distortion ratio (SDR) by

\begin{matrix} SDR ≜ \frac{σ_{x}^{2}}{D} = \frac{1}{D}, \end{matrix}

(11)

since we assumed

σ_{x}^{2} = 1

. For non-i.i.d. samples, the variance

σ_{x}^{2}

should be replaced by the effective variance

\begin{matrix} σ_{x}^{2} ≜ \frac{1}{k} E [{∥x^{k}∥}^{2}], \end{matrix}

(12)

which clearly reduces to the (regular) variance in the case of i.i.d. zero-mean samples.

Regime. We concentrate on the energy-limited regime, viz. the channel input is not subject to a bandwidth constraint, but rather to an energy constraint E per source symbol (8). As explained in the Introduction, the per-source-symbol capacity of the channel (9) is equal to [1] (Chapter 9.3)

\begin{matrix} C = ENR, \end{matrix}

(13)

where

ENR ≜ E / N

, and the capacity is measured in nats; note that the available bandwidth is unconstrained (i.e., infinite).

Since the available bandwidth is unlimited, the receiver can learn the white noise level within any accuracy. Hence, we may assume that the receiver has exact knowledge of the channel conditions. The transmitter is oblivious of the noise level, and needs to accommodate for a continuum of noise levels. Specifically, we will require the distortion to satisfy (6a,b). Throughout most of this work we will concentrate on the setting of infinite blocklength (

k \to \infty

). We will also conduct a simulation study for the scalar-source setting (

k = 1

) in Section 6.

3. Background: Modulo-Lattice Modulation

The overall scheme, to be introduced and analyzed in Section 5, comprises two major components (in addition to components in the form of interleaving and “Gaussiaization” that are needed for analysis purposes) as depicted in Figure 3:

A component that assumes an additive noise vector channel of the same dimension as the source input k with unknown noise level, and constructs a layered hybrid digital–analog universal solution with respect to this noise level, where each layer accommodates a different noise level, where an estimator constructed from all the layers that were designed for larger noise levels acts as SI that is known at the receiver;
A component that modulates a single analog-source sample over a continuous-time AWGN channel which transforms the channel effectively into a one-dimensional additive channel (one channel use of a discrete-time channel), is designed for a certain noise level but attains graceful improvement if the noise level happens to be better.

Therefore, in this section, we provide the necessary background about the first component. This is a succinct background about lattices and modulo-lattice modulation which is needed to understand the machinery that is used in the proposed solutions in Section 5, along with known performance guarantees that are relevant to this work and are needed for the analysis of the performance guarantees that are claimed in this work. Readers who are less familiar with lattices, lattice coding, and MLM are referred to the well-regarded book of Zamir on this subject [15]. Background about the second component is provided in Section 4.

A k-dimensional lattice is a discrete regular array in the Euclidean space

R^{k}

that is closed under reflection and real addition.

Definition 1

(Lattice [15] (Def. 2.1.1)). A non-degenerate k-dimensional lattice Λ is defined by a set of k linearly independent basis (column) vectors

g_{1}^{k}, g_{2}^{k}, \dots, g_{n}^{k} \in R^{k}

, and define the

k \times k

generator matrix G:

\begin{matrix} G = [\begin{matrix} g_{1}^{k} & g_{2}^{k} & \dots & g_{k}^{k} \end{matrix}] . \end{matrix}

(14)

The lattice Λ is composed of all integral combinations of the basis vectors:

\begin{matrix} Λ & ≜ \{\sum_{j = 1}^{k} i_{j} g_{j}^{k} | i_{j} \in Z\} \end{matrix}

(15a)

\begin{matrix} = \{G i^{k} | i^{k} \in Z^{k}\} . \end{matrix}

(15b)

In particular, the origin belongs to the lattice:

0 \in Λ

.

Figure 2 provides examples of one- and two-dimensional lattices. A lattice induces a quantization and a partition of the space into cells, with each cell comprising all points that are closest to a specific lattice (quantization) point. These cells are referred to as Voronoi cells.

Definition 2

(Nearest-neighbor quantizer and Voronoi cell [15] (Chapter 2.2)). The nearest-neighbor quantizer

Q_{Λ} (\cdot)

induced by a k-dimensional lattice Λ is defined as

\begin{matrix} Q_{Λ} (x^{k}) & ≜ \underset{λ \in Λ}{argmin} ∥x^{k} - λ∥ & \forall x^{k} \in R^{k} . \end{matrix}

(16)

The Voronoi cell

V_{λ}

is the set of all points that are quantized to

λ \in Λ

:

\begin{matrix} V_{λ} ≜ \{x^{k} \in R^{k} | Q_{Λ} (x^{k}) = λ\} . \end{matrix}

(17)

V_{0}

is referred to as the fundamental Voronoi cell. The breaking of ties in (16) is carried out in a systematic manner so that the induced Voronoi cells

\{V_{λ} | λ \in Λ\}

are congruent. In particular,

$V_{λ} = V_{0} + λ \equiv \{x^{k} + λ | x^{k} \in V_{0}\}$ for $λ \in Λ$ , where the first sum is the Minkowski sum of $V_{0}$ and the singleton $\{λ\}$ ;
$V_{λ_{1}} \cap V_{λ_{2}} = \emptyset$ for $λ_{1} \neq λ_{2}$ , where $λ_{1}, λ_{2} \in Λ$ ;
$⋃_{λ \in Λ} V_{λ} = R^{k}$ .

We next define the modulo-lattice operation with respect to the fundamental Voronoi cell.

Definition 3

(Modulo-lattice [15] (Chapter 2.3)). For a k-dimensional lattice Λ with a fundamental Voronoi cell

V_{0}

, the modulo-lattice operation (with respect to

V_{0}

), applied to

x^{k} \in R^{k}

, is defined as

\begin{matrix} {[x^{k}]}_{Λ} & ≜ x^{k} - Q_{Λ} (x^{k}), \end{matrix}

(18)

namely, the outcome equals the (unique) point that satisfies

x^{k} + λ \in V_{0}

for some

λ \in Λ

.

We now define the volume, the second moment, and the normalized second moment of a lattice.

Definition 4

(Volume and second moment [15] (Chapter 2 and 3)). The volume

V (Λ)

of a k-dimensional lattice Λ with fundamental Voronoi cell

V_{0}

is defined as the volume of

V_{0}

:

\begin{matrix} V (Λ) ≜ Vol (V_{0}) = \int_{V_{0}} d (x^{k}) . \end{matrix}

(19)

The second moment

σ^{2} (Λ)

of Λ is defined as the second moment per dimension of a random variable

d^{k}

that is uniformly distributed over

V_{0}

:

\begin{matrix} σ^{2} (Λ) & ≜ \frac{1}{k} E [{∥d^{k}∥}^{2}] = \frac{1}{k} \int_{V_{0}} \frac{{∥x^{k}∥}^{2}}{V (Λ)} d (x^{k}) . \end{matrix}

(20)

The normalized second moment

G (Λ)

of Λ is defined as

\begin{matrix} G (Λ) ≜ \frac{σ^{2} (Λ)}{V^{2 / k} (Λ)} . \end{matrix}

(21)

To attain a good MMSE using lattice quantization,

G (Λ)

should be as close as possible to the normalized second moment of a k-dimensional ball which, in the limit of

k \to \infty

, converges to

\frac{1}{2 π e}

.

Since the effective source in intermediate layers (this will become clear in the following section) that we would like to transmit and the effective channel noise which is induced by the analog modulations over the continuous-time channel are not Gaussian in general (even after “Gaussianization” which would make them only approximately so), we would need to consider more general source and channel noise vectors that satisfy the following definition of semi-norm ergodicity (SNE).

Definition 5

(SNE [22] (Def. 2)). A sequence in k of random vectors

z^{k} \in R^{k}

of length k with a limit norm

σ_{z} < \infty

. (The original definition of [22] (Def. 2) requires

σ_{z}^{(k)} = σ_{z}

for all

k \in N

. We use here a more relaxed definition which will prove more convenient in the following section):

\begin{matrix} σ_{z}^{(k)} & ≜ \sqrt{\frac{1}{k} E [{∥z^{k}∥}^{2}]}, & lim_{k \to \infty} σ_{z}^{(k)} = σ_{z}, \end{matrix}

(22)

is SNE if for any

ϵ, δ > 0

, however small, there exists a large enough

k_{0} \in Z

, such that for all

k > k_{0}

\begin{matrix} Pr (\frac{1}{k} E [{∥z^{k}∥}^{2}] > (1 + δ) σ_{z}^{2}) \leq ϵ . \end{matrix}

(23)

We are now ready to describe the k-dimensional JSCC setting and the MLM technique with side information (SI) for this setting. In the overall solution of Section 5, the analog modulations over the continuous-time channel that will be described in Section 4 will translate the channel into an effective k-dimensional additive SNE noise channel (compare also the subfigures of Figure 4 in Section 5. Over this effective channel, MLM with SI will be employed, where we will treat previous source estimators as effective side information (SI) known to the receiver but not to the transmitter [13] and [15] (Chapter 11).

Source. Consider a source sequence (equivalently, vector)

x^{k}

of length k,

\begin{matrix} x^{k} = q^{k} + j^{k}, \end{matrix}

(24)

where

j^{k}

is an SI sequence which is known to the receiver but not to the transmitter, and

q^{k}

is the “unknown part” (at the receiver) with per-element variance

\begin{matrix} σ_{q}^{2} ≜ \frac{1}{k} E [{∥q^{k}∥}^{2}] \end{matrix}

(25)

and is SNE (as a sequence in k).

Transmitter. Maps

x^{k}

to a channel input,

m^{k}

, that is subject to a power constraint

\begin{matrix} \frac{1}{k} E [{∥m^{k}∥}^{2}] \leq P . \end{matrix}

(26)

Channel. The channel is an additive noise channel:

\begin{matrix} y^{k} = m^{k} + z^{k} \end{matrix}

(27)

where

z^{k}

is an SNE noise vector that is uncorrelated with

x^{k}

and has effective variance

\begin{matrix} σ_{z}^{2} ≜ \frac{1}{k} E [{∥z^{k}∥}^{2}] . \end{matrix}

(28)

The SNR is defined as

SNR ≜ P / σ_{z}^{2}

; we use here the more common SNR notion in lieu of the ENR notion to emphasize that the channel and the source vectors (equivalently, sequences) in this section are of the same dimension k, in contrast to the continuous-time channel of Section 2.

Receiver. Receives

y^{k}

, in addition to the SI

j^{k}

, and generates an estimate

{\hat{x}}^{k} (y^{k}, j^{k})

of the source

x^{k}

.

The following MLM-based scheme will be employed in the following.

Scheme 1. [MLM-based JSCC with SI [13], [15] (Chapter 11)]

Transmitter: Transmits the signal

\begin{matrix} m^{k} = {[η x^{k} + d^{k}]}_{Λ} \end{matrix}

(29)

where

Λ

is a lattice with a fundamental Voronoi cell

V_{0}

and a second moment P,

η

is a scalar scale factor, and

d^{k}

is a dither vector which is uniformly distributed over

V_{0}

and is independent of the source vector

x^{k}

; consequently,

m^{k}

is independent of

x^{k}

by the so-called crypto lemma [15] (Chapter 4.1).

Receiver:

Receives the signal $y^{k}$ (27) and generates the signal

$\begin{matrix} \begin{matrix} {\tilde{y}}^{k} = {[α_{c} y^{k} - η j^{k} - d^{k}]}_{Λ} ≜ {[η q^{k} + z_{eff}^{k}]}_{Λ} \end{matrix} \end{matrix}$

(30)

where $z_{eff}^{k} ≜ - (1 - α_{c}) m^{k} + α_{c} z^{k}$ is the equivalent channel noise, and $α_{c}$ is a channel scale factor.
Generates an estimate ${\hat{x}}^{k}$ :

$\begin{matrix} {\hat{x}}^{k} = \frac{α_{s}}{η} {\tilde{y}}^{k} + j^{k}, \end{matrix}$

(31)

where $α_{s}$ is a source scale factor.

When

η q^{k} + z_{eff}^{k}

in (30) falls within

V_{0}

, the modulo operation does not come into play, resulting in an effective additive noise channel from

q^{k}

to

{\tilde{y}}^{k}

. Thus, we want the probability of this “correct lattice decoding” event to be bounded from below by

1 - P_{e}

for some small

P_{e}

. On the other hand, conditioned on the correct lattice decoding event, we want the quantization noise, which is governed by

z_{eff}^{k}

, and consequently by the shape of

V_{0}

, to have a small normalized second moment to be good for MMSE estimation. The following theorem provides guarantees for the achievable distortion using this scheme and is aggregated from [13], [15] (Chs. 11.3, 6.4, 9.3), and [22] (see also the exposition about correlation-unbiased estimators (CUBEs) in [23]).

Theorem 1.

The distortion (10) of Scheme 1 is bounded from above by

\begin{matrix} D & \leq L (Λ, P_{e}, α_{c}) \cdot \tilde{D} + P_{e} \cdot D^{err}, \end{matrix}

(32)

for

α_{c} \in (0, 1], α_{s} \in (0, 1]

, and

η > 0

that satisfy

\begin{matrix} \frac{η^{2} σ_{q}^{2}}{P} + \frac{α_{c}^{2}}{SNR} + {(1 - α_{c})}^{2} \leq 1, \end{matrix}

(33)

where

\begin{matrix} \tilde{D} ≜ {(1 - α_{s})}^{2} σ_{q}^{2} + α_{s}^{2} (\frac{α_{c}^{2}}{SNR} + {(1 - α_{c})}^{2}) \frac{P}{η^{2}}, \end{matrix}

(34)

D^{err}

is the distortion given a lattice-decoding-error event [13] (Equation (24)) and is bounded from above by

\begin{matrix} D^{err} \leq 4 σ_{q}^{2} (1 + \frac{\tilde{L} (Λ)}{\tilde{α}}), \end{matrix}

(35)

and the lattice parameters

L (\cdot, \cdot, \cdot)

and

\tilde{L} (\cdot)

are defined as

\begin{matrix} L (Λ, P_{e}, α_{c}) & ≜ \min \{ℓ : Pr (\frac{z_{eff}^{k}}{\sqrt{ℓ}} \notin V_{0}) \leq P_{e}\} > 1, \end{matrix}

(36)

\begin{matrix} \tilde{L} (Λ) & ≜ \frac{{max}_{a^{k} \in V_{0}} {∥a^{k}∥}^{2}}{k P} > 1 . \end{matrix}

(37)

Moreover, for any

P_{e} > 0

, however small, and any

α_{c} \in (0, 1]

, there exists a sequence of lattices,

{Λ_{k} | k \in N}

, that are good for both channel coding [22] (Def. 4) and mean squared error (MSE) quantization [22] (Def. 5), viz.,

\begin{matrix} \begin{matrix} lim_{k \to \infty} L (Λ_{k}, P_{e}, α_{c}) & = 1 \\ lim_{k \to \infty} \tilde{L} (Λ_{k}) & = 1, \end{matrix} \end{matrix}

(38)

respectively, and, therefore, this sequence of lattices achieves a distortion that approaches

\tilde{D}

.

Remark 1.

By our definition of SNE sequences, for each finite k the actual variance of the unknown part

σ_{q}^{(k)}

and the noise variance

σ_{z}^{(k)}

may be higher than for every

k < \infty

higher than their asymptotic quantities. Consequently, also the second moment of

Λ_{k}

for every

k < \infty

would be taken to be higher than its asymptotic value.

That said, as k grows to infinity, these slacks become negligible and the performance converges to that of (32), (38).

The following choice of parameters is optimal in the limit of infinite blocklength,

k \to \infty

, in the Gaussian case (

q^{k}

comprises i.i.d. Gaussian samples,

z^{k}

comprises i.i.d. Gaussian samples) [4] (Chapter 11.3) when the SNR is known.

Corollary 1

(Optimal parameters [13], [15] (Chapter 11.3)). The choice

α_{c} = α_{c} (SNR)

,

L = L (Λ, P_{e}, α_{c})

,

\tilde{α} = \tilde{α} (α_{c}, L)

,

α_{s} (SNR, \tilde{α}, α_{c})

,

η = η (\tilde{α}, σ_{q}^{2})

yields a distortion D that is bounded from above as in (32) with

\begin{matrix} \tilde{D} & = \frac{σ_{q}^{2}}{1 + \tilde{α} \cdot (1 + SNR)}, \end{matrix}

(39)

where

\begin{matrix} α_{c} (SNR) & ≜ \frac{SNR}{1 + SNR}, \end{matrix}

(40)

\begin{matrix} \tilde{α} (α_{c}, L) & ≜ max (α_{c} - \frac{L - 1}{L}, 0), \end{matrix}

(41)

\begin{matrix} η (\tilde{α}, σ_{q}^{2}) & ≜ \sqrt{\tilde{α} \frac{P}{σ_{q}^{2}}}, \end{matrix}

(42)

\begin{matrix} α_{s} (SNR, \tilde{α}, α_{c}) & ≜ \frac{SNR \cdot \tilde{α}}{SNR \cdot \tilde{α} + α_{c}} . \end{matrix}

(43)

Moreover, for any

P_{e} > 0

, however small, there exists a sequence of lattices

{Λ_{k} | k \in N}

that attains (38) and, therefore, in the limit

k \to \infty

,

\tilde{α}

and

α_{s}

above converge to

α_{c}

and the distortion D approaches

\tilde{D}

, which converges, in turn, to

\begin{matrix} \tilde{D} = \frac{σ_{q}^{2}}{1 + SNR} . \end{matrix}

(44)

Consider now the setting of an SNR that is unknown at the transmitter but is known at the receiver. In this case, although the receiver knows the SNR and can, therefore, optimize

α_{c}

and

α_{s}

accordingly, the transmitter, being oblivious of the SNR, cannot optimize

η

for the true value of the SNR. Instead, by setting

η

in accordance with Corollary 1 for a preset minimal-allowable-design SNR,

{SNR}_{0}

, Scheme 1 achieves (44) for

SNR = {SNR}_{0}

and improves, albeit sublinearly, with the SNR for

SNR \geq {SNR}_{0}

. This is detailed in the next corollary.

Corollary 2

(SNR universality). Assume that

SNR \geq {SNR}_{0}

for some predefined

{SNR}_{0} > 0

. Then, the choice

L (Λ, P_{e}, α_{c} ({SNR}_{0}))

,

\tilde{α} = \tilde{α} (α_{c} ({SNR}_{0}), L)

, and

η = η (\tilde{α}, σ_{q}^{2})

with respect to

{SNR}_{0}

(as it cannot depend on the true SNR), and

α_{c} = α_{c} (SNR)

and

α_{s} = α_{s} (SNR, \tilde{α}, α_{c})

(may depend on the true SNR) yields a distortion D that is bounded from above, as in (32) for

\tilde{D}

that is given in (39) with

\tilde{α} = \tilde{α} (α_{c} ({SNR}_{0}), L)

. Moreover, for any

P_{e} > 0

, however small, there exists a sequence of lattices

{Λ_{k} | k \in N}

that satisfies (38); therefore, in the limit

k \to \infty

,

\tilde{α}

converges to

α_{c} ({SNR}_{0})

,

α_{s}

converges to

\frac{{SNR}_{0} (1 + SNR)}{{SNR}_{0} (1 + SNR) + 1 + {SNR}_{0}}

, and the distortion D approaches

\tilde{D}

, which converges, in turn, to

\begin{matrix} \tilde{D} & = \frac{σ_{q}^{2}}{1 + SNR} \frac{1}{\frac{1}{1 + SNR} + \frac{{SNR}_{0}}{1 + {SNR}_{0}}} . \end{matrix}

(45)

Corollary 3

(Source power uncertainty). Assume now additionally that the transmitter is oblivious of the exact power of

q^{k}

,

σ_{q}^{2}

, but knows that it is bounded from above by

{\tilde{σ}}_{q}^{2}

:

σ_{q}^{2} \leq {\tilde{σ}}_{q}^{2}

. Then, the distortion is bounded according to (32), with

\tilde{D} = \frac{{\tilde{σ}}_{q}^{2}}{\frac{{\tilde{σ}}_{q}^{2}}{σ_{q}^{2}} + \tilde{α} \cdot (1 + SNR)}

(46)

for the parameters

\begin{matrix} \begin{matrix} α_{c} & = \frac{SNR}{1 + SNR}, \tilde{α} & = \tilde{α} (α_{c} ({SNR}_{0}), L), η & = η (\tilde{α}, {\tilde{σ}}_{q}^{2}), α_{s} & = \frac{\tilde{α} (1 + SNR)}{\frac{{\tilde{σ}}_{q}^{2}}{σ_{q}^{2}} + \tilde{α} (1 + SNR)}, \end{matrix} \end{matrix}

(47)

Moreover, for any

P_{e} > 0

, however small, there exists a sequence of lattices

{Λ_{k} | k \in N}

that attains (38) and, therefore, in the limit of

k \to \infty

,

\tilde{α}

converges to

α_{c} ({SNR}_{0})

,

α_{s}

converges to

\frac{1 + SNR}{(1 + SNR) + \frac{{\tilde{σ}}_{q}^{2}}{σ_{q}^{2}} \frac{1 + {SNR}_{0}}{{SNR}_{0}}}

, and the distortion D is bounded from above in this limit by

\tilde{D}

:

\begin{matrix} D & \leq \tilde{D} + ϵ \end{matrix}

(48a)

\begin{matrix} = \frac{{\tilde{σ}}_{q}^{2}}{1 + SNR} \cdot \frac{1}{\frac{{\tilde{σ}}_{q}^{2}}{σ_{q}^{2}} \cdot \frac{1}{1 + SNR} + \frac{{SNR}_{0}}{1 + {SNR}_{0}}} + ϵ \end{matrix}

(48b)

\begin{matrix} \leq min \{\frac{σ_{q}^{2}}{1 + {SNR}_{0}}, \frac{{\tilde{σ}}_{q}^{2}}{1 + SNR} \frac{1 + {SNR}_{0}}{{SNR}_{0}}\} + ϵ, \end{matrix}

(48c)

where ϵ decays to zero with

P_{e}

. For

SNR \geq {SNR}_{0} ≫ 1

, the bound (48c) approaches

\frac{{\tilde{σ}}_{q}^{2}}{1 + SNR}

.

The following result is a simple consequence of Theorem 1 and avoids exact computation of the optimal parameters.

Corollary 4

(Suboptimal parameters). Assume the setting of Corollary 3 but with

z^{k}

not necessarily uncorrelated with

m^{k}

, and denote

SDR = P / σ_{z}^{2}

. Then, the distortion is bounded according to (32) with

\tilde{D} = \frac{{\tilde{σ}}_{q}^{2}}{SDR}

(49)

for the parameters

\tilde{α} = α_{c} = α_{s} = 1

,

η = η (1, {\tilde{σ}}_{q}^{2})

.

We refer to

P / σ_{z}^{2}

by

SDR

since now

z^{k}

may depend on

m^{k}

.

The following property will prove useful in Section 5 when treating non-Gaussian noise through “Gaussianization”.

Lemma 1

([24] (Lemmas 6 and 11)). Let

{Λ_{k} | k \in N}

be a sequence of lattices that satisfies the results in this section, and let

d^{k}

be a dither that is uniformly distributed over the fundamental Voronoi cell of

Λ_{k}

. Then, the probability density function (p.d.f.) of

d^{k}

is bounded from above as

\begin{matrix} f_{d^{k}} (a^{k}) & \leq f_{G^{k}} (a^{k}) e^{ϵ_{k} k} & \forall a^{k} \in R^{k}, \end{matrix}

(50)

where

f_{G^{k}}

is the p.d.f. of a vector with i.i.d. Gaussian entries with zero mean and the same second moment P as

Λ_{k}

, and

ϵ_{k} > 0

decays to zero with k.

4. Background: Analog Modulations in the Known-ENR Regime

Following the exposition at the beginning of Section 3 and Figure 3, we concentrate now on the second major component that is used in this work, that of analog modulations for conveying a scalar zero-mean Gaussian source (

k = 1

) over a channel with infinite bandwidth, where both the receiver and the transmitter know the channel noise level, or equivalently,

ENR = E / N

. To that end, we will review next the analog linear modulation and the analog PPM and will supplement the known results for the latter with a new robustness result for a source distribution that deviates from Gaussianity in Corollary 6.

Consider first analog linear modulation, in which the source sample x is linearly transmitted with energy E, (under linear transmission, the energy constraint holds only on average, and the transmitted energy is equal to the square of the specific realization of x) using some unit-energy waveform

\begin{matrix} s_{x} (t) = \sqrt{E} \frac{x}{σ_{x}} φ (t) . \end{matrix}

(51)

Note that linear modulation is the same (“universal”) regardless of the true noise level. Signal space theory [25] (Chapter 8.1), [26] (Chapter 2) suggests that a sufficient statistic of the transmission of (51) over the channel (9) is the one-dimensional projection y of r onto

φ

:

\begin{matrix} \begin{matrix} y & = \int_{- \frac{T}{2}}^{\frac{T}{2}} φ (t) r (t) d t = \sqrt{E} \frac{x}{σ_{x}} + \sqrt{\frac{N}{2}} z, \end{matrix} \end{matrix}

(52)

where z is a standard Gaussian noise variable. The MMSE estimator of x from y is linear and its distortion is equal to

\begin{matrix} D & = \frac{σ_{x}^{2}}{1 + 2 ENR}, \end{matrix}

(53)

and improves only linearly with the

ENR

.

Consider now analog PPM, in which the source sample is modulated by the shift of a given pulse rather than by its amplitude (which is the case for analog linear modulation):

\begin{matrix} s_{x} (t) = \sqrt{E} ϕ (t - x Δ) \end{matrix}

(54)

where

ϕ

is a predefined pulse with unit energy and

Δ

is a scaling parameter. In particular, the square pulse (Clearly, the bandwidth of this pulse is infinite. By taking a large enough bandwidth W, one may approximate this pulse to an arbitrarily high precision and attain its performance within an arbitrarily small gap) is known to achieve good performance. This pulse is given by

\begin{matrix} ϕ (t) & = \{\begin{matrix} \sqrt{\frac{β}{Δ}}, & |t| \leq \frac{Δ}{2 β}, \\ 0, & otherwise, \end{matrix} \end{matrix}

(55)

for a parameter

β > 1

, which is sometimes referred to as effective dimensionality. Clearly,

T = Δ + Δ / β

.

The optimal receiver is the MMSE estimator

\hat{x}

of x given the entire output signal:

\begin{matrix} {\hat{x}}^{MMSE} = E [x | r] . \end{matrix}

(56)

The following theorem provides an upper bound on the achievable distortion of this scheme using (suboptimal) maximum a posteriori (MAP) decoding, which is given by

\begin{matrix} {\hat{x}}^{MAP} & = \underset{a \in R}{argmax} \{R_{r, ϕ} (a Δ) - \frac{N}{4 \sqrt{E}} a^{2}\}, \end{matrix}

(57)

where

\begin{matrix} \begin{matrix} R_{r, ϕ} (\hat{x} Δ) & ≜ \int_{- \infty}^{\infty} r (t) ϕ (t - \hat{x} Δ) d t = \sqrt{E} R_{ϕ} ((x - \hat{x}) Δ) + \sqrt{\frac{β}{Δ}} \int_{\hat{x} Δ - \frac{Δ}{2 β}}^{\hat{x} Δ + \frac{Δ}{2 β}} n (t) d t, \end{matrix} \end{matrix}

(58a)

is the (empirical) cross-correlation function between r and

ϕ

with lag (displacement)

\hat{x} Δ

, and

\begin{matrix} \begin{matrix} R_{ϕ} (τ) & = \int_{- \infty}^{\infty} ϕ (t) ϕ (t - τ) d t = \{\begin{matrix} 1 - \frac{| τ |}{\frac{Δ}{β}}, & | τ | \leq \frac{Δ}{β} \\ 0, & otherwise \end{matrix} \end{matrix} \end{matrix}

(58b)

is the autocorrelation function of

ϕ

with lag

τ

.

Remark 2.

Since a Gaussian source has infinite support, the required overall transmission time T is infinite. Of course, this is not possible in practice. Instead, one may limit the transmission time T to a very large—yet finite—value. This will incur a loss compared to the the bound that will be stated next; this loss can be made arbitrarily small by taking T to be large enough.

Theorem 2

([21] (Prop. 2)). The distortion of the MAP decoder (57) of a standard Gaussian scalar source transmitted using analog PPM with a rectangular pulse is bounded from above by

\begin{matrix} D \leq D_{S} + D_{L} \end{matrix}

(59)

with

\begin{matrix} D_{L} ≜ 2 β \sqrt{ENR} e^{- \frac{ENR}{2}} (1 + 3 \sqrt{\frac{2 π}{ENR}} + \frac{12 e^{- 1}}{β \sqrt{ENR}} + \frac{8 e^{- 1}}{\sqrt{8 π} β} \\ + \sqrt{\frac{8}{π ENR}} + \frac{12^{\frac{3}{2}} e^{- \frac{3}{2}}}{β \sqrt{32 π ENR}}) + β \sqrt{8 π} e^{- ENR} (1 + \frac{4 e^{- 1}}{β \sqrt{2 π}}), \\ D_{S} ≜ \frac{\frac{13}{8} + \sqrt{\frac{2}{β}} (\sqrt{2 β ENR} - 1) \cdot e^{- {(\sqrt{ENR} - \frac{1}{\sqrt{2 β}})}^{2}}}{{(\sqrt{β ENR} - \frac{1}{\sqrt{2}})}^{4}} + \frac{e^{- β ENR}}{β^{2}}, \end{matrix}

bounding the small- and large-error distortions, assuming

β ENR > 1 / 2

. In particular, in the limit of large

ENR

, and β that increases monotonically with

ENR

,

\begin{matrix} D \leq ({\tilde{D}}_{S} + {\tilde{D}}_{L}) {1 + o (1)} \end{matrix}

(60)

where

\begin{matrix} {\tilde{D}}_{S} & ≜ \frac{13 / 8}{{(β ENR)}^{2}}, \end{matrix}

(61)

\begin{matrix} {\tilde{D}}_{L} & ≜ 2 β \sqrt{ENR} \cdot e^{- \frac{ENR}{2}}, \end{matrix}

(62)

and

o (1) \to 0

in the limit of

ENR \to \infty

.

Remark 3.

For a fixed β, the distortion improves quadratically with the

ENR

. This behavior will prove useful in the next section, where we construct schemes for the unknown-ENR regime.

Setting

β = {(\frac{13}{8})}^{\frac{1}{3}} {(ENR)}^{- \frac{5}{6}} e^{\frac{ENR}{6}}

in (60) of Theorem 2 yields the following asymptotic performance.

Corollary 5

([21] (Th. 2)). The achievable distortion of a standard Gaussian scalar source transmitted over an energy-limited channel with a known ENR is bounded from above as

\begin{matrix} D & \leq 3 \cdot {(\frac{13}{8})}^{\frac{1}{3}} e^{- \frac{ENR}{3}} \cdot {(ENR)}^{- \frac{1}{3}} \cdot \{1 + o (1)\}, \end{matrix}

(63)

where

o (1) \to 0

as

ENR \to \infty

.

The following corollary, whose proof is available in Appendix A, states that the (bound on the) distortion is continuous in the source p.d.f. around a Gaussian p.d.f. Such continuity results of the MMSE estimator in the source p.d.f. are known [27]. Next, we prove the required continuity directly for our case of interest with an additional technical requirement on the deviation from a Gaussian p.d.f.; this result will be used in conjunction with a non-uniform variant of the Berry–Esseen theorem in Section 5.

Corollary 6.

Consider the setting of Theorem 2 for a source p.d.f. that satisfies

\begin{matrix} |f_{x} (a) - f_{G} (a)| & \leq ϵ δ_{f} (a), & \forall a & \in R, \end{matrix}

(64)

where

ϵ > 0

,

f_{G}

is the standard Gaussian p.d.f., and

δ_{f}

is a symmetric absolutely continuous non-negative bounded function with unit integral

\int_{\infty}^{\infty} δ_{f} (a) d a = 1

, that is monotonically decreasing for

x > 0

(and for

x < 0

, by symmetry) and satisfies

δ_{f} (x) \in o (x^{- 4})

; thus, there exists

H < \infty

such that

\begin{matrix} δ_{f} (x) \leq \frac{H}{{(1 + x)}^{4}}, & \forall x \in R . \end{matrix}

(65)

Then, the distortion of the decoder that applies the decoding rule (57) is bounded from above by

\begin{matrix} D \leq D_{G} + ϵ C, \end{matrix}

(66)

where

D_{G} = D_{S} + P_{L} D_{L}

denotes the bound on the distortion for a standard Gaussian source of Theorem 2, and

C < \infty

is a non-negative constant that depends on

δ_{f}

. (This is no longer the MAP decoding rule since

f_{x}

is no longer a Gaussian p.d.f.).

5. Main Results

In this section, we construct JSCC solutions for the unknown-ENR communication problem. As already explained at the beginning of Section 3, the proposed solution, which is depicted in Figure 3 (cf. Figure 1), is composed of two major components:

Aa layered MLM-based component that works well for a continuum of possible noise levels over k-dimensional additive SNE noise channels, where each layer accommodates a different noise level, with layers of lower noise levels acting as SI in the decoding of subsequent layers;
An analog modulation component that is designed for a particular ENR of the continuous-time channel but improves for high ENRs and induces a k-dimensional additive SNE noise channel for the first component.

Following the exposition in the introduction, since an exponential improvement with the ENR cannot be attained in this setting for an infinite number of noise levels let alone a continuum thereof [7], following [7,18], we consider polynomially decaying profiles (6a,b).

We first show, in Section 5.1, that replacing the successive refinement coding of [7,18] with MLM (Wyner–Ziv coding) with linear layers results in better performance in the infinite-bandwidth setting (paralleling the results of the bandwidth-limited setting [6]).

In Section 5.2, we replace the last layer with an analog PPM one, which improves quadratically with the ENR (

L = 2

in (6b)) above the design ENR (recall Remark 3).

In principle, despite analog PPM attaining a gracious quadratic decay with the ENR (recall Remark 3) only above a predefined design ENR, since the distortion is bounded from above by the (finite) variance of the source, it attains a quadratic decay with the ENR for all

ENR \in R_{+}

, or equivalently, for all

N \in R_{+}

and

L = 2

in (6a,b).

That said, the performance of analog PPM deteriorates rapidly when the ENR is below the design ENR of the scheme, meaning that the minimum energy required to obtain (6a) with

L = 2

and a given

\tilde{E}

is large. To alleviate this, we use the above-mentioned layered MLM scheme. Furthermore, to achieve higher-order improvement with the ENR (

L > 2

in (6a,b)), multiple layers in the MLM scheme need to be employed.

We now present a simplified variant of the general scheme that is considered throughout this section. This variant is also depicted in Figure 4a. The full scheme, which incorporates interleaving for analytical purposes, is available in Appendix B and depicted in Figure A1.

Scheme 2 (MLM-based).

M-Layer Transmitter:

First layer (

i = 1

):

Transmits each of the entries of the vector $x^{k}$ over the channel (9) linearly (51):

$\begin{matrix} s_{1; ℓ} (t) & ≜ s (t + (ℓ - 1) T) = \sqrt{\frac{E_{1}}{T}} \frac{x_{ℓ}}{σ_{x}} φ (t), & t \in [0, T), \end{matrix}$

for $ℓ \in {1, \dots, k}$ , where $φ$ is a continuous unit-norm (i.e., unit-energy) waveform that is zero outside the interval $[0, T]$ , say $ϕ$ of (55), $E_{1} \in [0, E]$ is the allocated energy for layer 1, and E is the total available energy of the scheme.

Other layers: For each

i \in {2, \dots, M}

:

Calculates the k-dimensional tuple

$\begin{matrix} m_{i}^{k} & = {[η_{i} x^{k} + d_{i}^{k}]}_{Λ_{i}}, \end{matrix}$

(67)

where $m_{i}^{k} = {[\begin{matrix} m_{i; 1} & m_{i; 2} & \dots & m_{i; k} \end{matrix}]}^{†}$ , and $m_{i; ℓ}$ denotes the $ℓ th$ entry of $m_{i}^{k}$ ; $η_{i}$ , $d_{i}^{k}$ , and $Λ_{i}$ take the roles of the $η, d^{k}$ , and $Λ$ of Scheme 1, and are tailored for each layer i; $Λ_{i}$ is chosen to have unit second moment.
For each $ℓ \in {1, \dots, k}$ , views $m_{i; ℓ}$ as a scalar-source sample, and generates a corresponding channel input,

$\begin{matrix} s_{i; ℓ} (t) & ≜ s (t + (ℓ - 1) T + (i - 1) k T), & t \in [0, T), \end{matrix}$

(68)

using a scalar JSCC scheme with a predefined energy $E_{i} \geq 0$ that is designed for a predetermined ${ENR}_{i}$ , or equivalently, $N_{i} = E_{i} / {ENR}_{i}$ , such that $\sum_{i = 1}^{M} E_{i} = E$ and $N_{2} > N_{3} > \dots > N_{M} > 0$ .

Receiver: Receives the channel output signal r (9), and recovers the different layers as follows.

First layer (

i = 1

): For each

ℓ \in {1, \dots, k}

:

Recovers the MMSE estimate ${\hat{x}}_{1; ℓ}$ of $x_{ℓ}$ given ${r_{1; ℓ} (t) | t \in [0, T)}$ , where $r_{1; ℓ} (t) ≜ r (t + (ℓ - 1) T)$ .
If the true noise level N satisfies $N > N_{2}$ , sets the final estimate ${\hat{x}}_{ℓ}$ of $x_{ℓ}$ to ${\hat{x}}_{1; ℓ}$ and stops. Otherwise, determines the maximal layer index $𝚥 \in {2, \dots, M}$ for which $N \leq N_{𝚥}$ and continues to process the other layers.

Other layers: For each

i \in {2, \dots, j}

in ascending order:

For each $ℓ \in {1, \dots, k}$ , uses the receiver of the scalar JSCC scheme to generate an estimate ${\hat{\tilde{m}}}_{i; ℓ}$ of ${\tilde{m}}_{i; ℓ}$ from $\{r_{i; ℓ} (t) | t \in [0, T)\}$ , where

$\begin{matrix} r_{i; ℓ} (t) ≜ r (t + (ℓ - 1) T + (i - 1) k T) . \end{matrix}$

(69)
Using the effective channel output ${\hat{m}}_{i}^{k}$ (that takes the role of $y^{k}$ in Scheme 1) with SI ${\hat{x}}_{i - 1}^{k}$ , generates the signal

$\begin{matrix} {\tilde{y}}_{i}^{k} & = {[α_{c}^{(i)} {\hat{m}}_{i}^{k} - η_{i} {\hat{x}}_{i - 1}^{k} - d_{i}^{k}]}_{Λ_{i}}, \end{matrix}$

(70)

as in (30) of Scheme 1, where $α_{c}^{(i)}$ is a channel scale factor.
Constructs an estimate ${\hat{x}}_{i}^{k}$ of $x^{k}$ :

$\begin{matrix} {\hat{x}}_{i}^{k} & = \frac{α_{s}^{(i)}}{η_{i}} {\tilde{y}}_{i}^{k} + {\hat{x}}_{i - 1}^{k}, \end{matrix}$

(71)

as in (31) of Scheme 1, where $α_{s}^{(i)}$ is a source scale factor. The final estimate if ${\hat{x}}^{k} = {\hat{x}}_{𝚥}^{k}$ .

Remark 4

(Interleaving). To guarantee independence between all the noise entries

ℓ \in {1, \dots, k}

, we use interleaving in the full scheme, which is described in Appendix B in (A8) and (A11). We note that this operation is used to simplify the proof that the resulting noise vector is SNE (recall Definition 5).

Remark 5

(Gaussianization). To use the analysis of Section 4 of analog PPM for a Gaussian source, we multiply the vectors

m_{i}^{k}

by orthogonal matrices

H_{i}

that effectively “Gaussianize” its entries, as shown in the full description of the scheme in Appendix B, in (A8) and (A11). In particular, this is achieved by a Walsh–Hadamard matrix

H_{i}

by appealing to the central limit theorem; a similar choice was previously proposed by Feder and Ingber [28], and by Hadad and Erez [29], where in the latter, the columns of the Walsh–Hadamard matrix were further multiplied by i.i.d. Rademacher RVs to achieve near-independence between multiple descriptions of the same source vector (see [29,30,31] for other ensembles of orthogonal matrices that achieve a similar result). Interestingly, the multiplication by the orthogonal matrices

H_{i}^{- 1} = H_{i}^{†}

(since Walsh–Hadamard matrices are symmetric, they further satisfy

H_{i}^{†} = H_{i}

) Gaussianizes the effective noise incurred at the outputs of the analog PPM JSCC receivers.

Remark 6

(JSCC-induced channel). The continuous-time JSCC transmitter and receiver over the infinite-bandwidth AWGN channel induce an effective additive-noise channel of better effective SNR and source bandwidth. Over this induced channel, the MLM transmitter and receiver are then employed. This interpretation is depicted in Figure 4b, with

{\tilde{n}}_{i}^{k}

representing the effective additive noise vectors.

We next provide analytic guarantees for this scheme for linear and analog PPM layers in Section 5.1 and Section 5.2, respectively, in the infinite-blocklength regime. In Section 6, we compare the analytic and empirical performance of these schemes in the infinite-blocklength regime, as well as comparing the empirical performance of these schemes for a single-source sample. The treatment of the infinite-blocklength regime pertains to the full scheme as presented in Appendix B. The comparison for a single-source sample, uses the simplified variant of Scheme 2.

5.1. Infinite-Blocklength Setting with Linear Layers

We start with analyzing the performance of the scheme where all the M layers are transmitted linearly and M is large; we concentrate on the setting of an infinite-source blocklength (

k \to \infty

) and derive an achievability bound on the minimum energy that achieves a polynomial distortion profile (6a,b). A constructive proof of the next theorem is available in Appendix C. In particular, this proof specifies all the scheme parameters, such as the energy allocated to each layer and the minimal noise level it is designed for.

Theorem 3.

Choose a decaying order

L > 1

, a design parameter

\tilde{E} > 0

, and a minimal noise level

N_{min} > 0

, however small. Then, a distortion profile (6a) with L and

\tilde{E}

is achievable for all noise levels

N > N_{min}

for any transmit energy E that satisfies

\begin{matrix} E > δ_{lin} (L) \tilde{E}, \end{matrix}

(72)

for a large-enough-source blocklength k, where

\begin{matrix} δ_{lin} (L) & ≜ \frac{1}{2} \cdot min_{(α, x) \in R_{+}^{2}} \{{(\frac{e^{α}}{x})}^{L - 1} + \frac{x}{2} (e^{α L} - 1) (1 + \sqrt{1 + \frac{4 e^{α (L + 1)}}{{(1 - e^{α L})}^{2}}}) \frac{e^{- 2 α}}{1 - e^{- α}}\} . \end{matrix}

(73)

In particular, the choice

(x, α) = (0.898, 0.666)

achieves a quadratic decay (

L = 2

) for any transmit energy E that satisfies

\begin{matrix} E > 2.167 \tilde{E}, \end{matrix}

(74)

for a large-enough-source blocklength k.

We note that already this variant of the scheme offers an improvement compared to the hitherto best-known upper (achievability) bound of (7).

The choice of the minimal noise level

N_{min}

dictates the number of layers M that need to be employed: the lower

N_{min}

is, the more layers M need to be employed.

Remark 7.

In the proof in Appendix C, we use an exponentially decaying noise level series

N_{i} = Δ e^{- α (i - 1)}

, which facilitates the analysis. Nevertheless, any other assignment that satisfies the profile requirement and energy constraint is valid and may lead to better performance; for further discussion, see Section 7.

5.2. Infinite-Blocklength Setting with Analog PPM Layers

In this section, we concentrate on the setting of an infinite-source blocklength (

k \to \infty

) and a quadratically decaying profile (

L = 2

in (6a,b)) using analog PPM.

To that end, we use a sequence of

M - 1

linear JSCC layers as in Section 5.1, with only the last layer replaced by an analog PPM one; since analog PPM improves quadratically with the ENR (recall Remark 3), M need not go to infinity to attain a quadratically decaying profile.

Theorem 4.

Choose a design parameter

\tilde{E} > 0

, and a minimal noise level

N_{min} > 0

, however small. Then, a quadratic profile (

L = 2

) (6a) with

\tilde{E}

is achievable for all noise levels

N > N_{min}

for any transmit energy E that satisfies

\begin{matrix} E > 1.961 \tilde{E}, \end{matrix}

(75)

for a large-enough-source blocklength k.

This theorem, whose proof is available in Appendix D, offers a further improvement over the upper bounds in (7) and Theorem 3 for a quadratic profile. Again, the proof of Theorem 4 in Appendix D is constructive and details the scheme parameters, such as the energy allocated to each layer and the minimal noise level, it is designed for.

Remark 8.

Replacing all layers but the first layer with analog PPM ones should yield better performance, but complicates the analysis. Moreover, a similar analysis to that of Theorem 3 for

L \neq 2

may be devised, but for

L > 2

would require multiple layers as the distortion of analog PPM decays only quadratically. Both of these analyses are left for future research.

6. Simulations

In Section 6.1, we first compare the analytical results of Theorems 3 and 4 to the prior art in the infinite-blocklength regime (

k \to \infty

). We further optimize the parameters in Theorem 4 empirically and show a further improvement, which suggests, in turn, a slack in our analysis. In Section 6.2, we evaluate the performance of Scheme 2 empirically, using a Monte Carlo simulation, for a single-source sample (

k = 1

) of a uniform source, and compare the performance of a scheme with all linear layers to those of a scheme that incorporates an analog PPM layer.

6.1. Analytical Performance Comparison in the Infinite-Blocklength Regime

We first consider the infinite-blocklength regime (

k \to \infty

) for a Gaussian source and a quadratic profile (

L = 2

in (6a,b)), for which we have derived analytical guarantees in Section 5.1 and Section 5.2. Figure 5 depicts the accumulated energy of the employed layers at the receiver of Section 5 and the achievable distortion as functions of

\tilde{E} / N

, along with the desired quadratic distortion profile (6a) (with

L = 2

) for

N_{min} \to 0

for linear layers with the energy allocation to the different layers, as per the proof of Theorem 3 in Appendix C; and

M - 1

linear layers with a final analog PPM layer (Theorem 4) for both the energy allocation for

M = 7

layers, which is available in the proof of Theorem 4 in Appendix D and relies on the bound on the analog PPM performance, and for

M = 2

layers, with an empirically evaluated performance of analog PPM allocation.

This figure clearly demonstrates the gain due to introducing an analog PPM layer. Interestingly, the empirically evaluated analog PPM curve shows that only two layers are needed when the second layer is an analog PPM one, meaning that the seven layers needed in the proof of Theorem 4 are an artifact of the slack in our analytic bounds.

To derive the performance of the scheme with linear layers, we evaluated the energy allocation in the proof of Theorem 3 in Appendix C in (16) directly for the optimized energy allocation

E_{i} = Δ e^{- α i}

with

Δ = 0.975

and

α = 0.65

. To derive the analytical performance of Theorem 4, we used the energy allocation from its proof in Appendix D, while for the empirical performance, optimizing over the energy allocation yielded

E_{1} = 0.975 \tilde{E}, E_{2} = 0.5904 \tilde{E}

. The Matlab code package and the specific script that was used for generating Figure 5, along with all the scheme parameters and analog PPM empirical evaluation, are available in [32].

6.2. Empirical Performance Comparison for a Single-Source Sample

We move now to the uniform scalar-source setting (

k = 1

) and a quadratic profile. The analysis of Section 5 in the scalar setting is difficult. We, therefore, evaluate its performance empirically for both variants of the scheme: with linear layers, and with one linear layer and one analog PPM layer (two layers suffice in this setting as well). In Figure 6, we depict again the accumulated energy of the employed layers at the receiver of Section 5 and the achievable distortion as functions of

\tilde{E} / N

for both variants of the scheme, along with the desired quadratic distortion profile (6a) (with

L = 2

) for

N_{min} \to 0

.

For the variant with linear layers only, an energy allocation of

E_{i} / \tilde{E} = Δ e^{- α i}

with

Δ = 0.9

and

α = 0.64

was used. For the variant with an analog PPM layer, an energy of

E_{1} = 0.9 \tilde{E}

was allocated to the (first) linear layer, and an energy of

E_{2} = 0.346 \tilde{E}

was allocated to the (second) analog PPM layer. The lattice inflation factor

η

was chosen as the minimizer of a variant of (32) under the assumption that the noise is Gaussian, namely,

\begin{matrix} η^{*} = \underset{η}{argmin} \{P_{e} \cdot {(2 Δ)}^{2} + (1 - P_{e}) \cdot \frac{1}{η^{2} N}\}, \end{matrix}

(76)

where

P_{e} = 2 Q (\frac{Δ}{\sqrt{η^{2} D (ENR) + N}})

, N is the noise level,

Δ

is the modulo size that was chosen to be

\sqrt{12}

, and

D (ENR)

is the average distortion that corresponds to the last transmitted layer.

As in the infinite-blocklength regime, here too utilizing analog PPM provides better performance compared to a linear-only scheme. Again, the Matlab code package and the specific script that was used for generating Figure 6, along with all the scheme parameters and empirical evaluations, are available in [32].

7. Summary and Discussion

In this work, we studied the problem of JSCC over an energy-limited channel with unlimited bandwidth and/or transmission time when the noise level is unknown at the transmitter. We showed that MLM-based schemes outperform the existing schemes thanks to the improvement in the performance of all layers (including the preceding layers that act as SI) with the ENR. By replacing (some of the) linear layers with analog PPM ones, further improvement was achieved. We further demonstrated numerically that the MLM-layered scheme works well in the scalar-source regime.

We also note that a substantial gap remains between the lower bound in (7) and the upper bound of Theorem 4 for the energy required to achieve a quadratic profile ((6a,b) with

L = 2

). In Section 8, several ways to close this gap are described.

We note that, although we assumed that both the bandwidth and the time are unlimited, the scheme and analysis presented in this work carry over to the setting where one of the two is bounded as long as the other one is unlimited, with little adjustment.

8. Future Research

Consider first the remaining gap between the lower and upper bounds. As demonstrated in Section 6, the upper (achievability) bound on the performance of analog PPM is not tight and calls for further improvement thereof. This step is currently under intense investigation, along with improvement via companding of the presented analog PPM variant in this work as well as via other choices of energy allocation (see Remark 7). Furthermore, the optimization was performed numerically and for a particular form of noise levels of an exponential form (recall Remark 7). We believe that a systematic optimization procedure could shed light on the weaknesses of our scheme and provide further improvements in the overall performance. On the other hand, the outer bounds of [18] are based on specific choices of sequences of noise levels. Therefore, further improvement might be achieved by other choices and calls for further research.

We have also shown that the MLM scheme performs well in the scalar-source regime; it would be interesting to derive analytical performance guarantees for this regime.

Finally, since MLM utilizes well source SI at the receiver and channel SI at the transmitter [13,14], [15] (Chs. 10–12), the proposed scheme can be extended to limited-energy settings, such as universal transmission with respect to the noise level and the SI quality at the receiver [33] and the dual problem of the one considered in this work of universal transmission with respect to the noise level with near-zero bandwidth [34].

Author Contributions

Writing—original draft, O.L. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Israel Science Foundation (grant No. 2077/20). The work of O. Lev was further supported by the Yitzhak and Chaya Weinstein Research Institute for Signal Processing. The work of A. Khina was further supported by the WIN Consortium through the Israel Ministry of Economy and Industry, and by a grant from the Tel Aviv University Center for AI and Data Science (TAD).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Corollary 6

To prove Corollary 6, we repeat the steps of the proof of Theorem 2 in [21] (Prop. 2); we next detail the contributions to the small-distortion [21] (Equation (25)) and the large-distortion [21] (Equation (27)) terms due to the deviation (64) from the source p.d.f. from Gaussianity, which are denoted by

d_{S}

and

d_{L}

, respectively.

We start by bounding the contribution to the small-distortion term. To that end, note that [21] (Equations (24b) and (25b)) remain unaltered since the decoder remains the same. The contribution to the small-distortion term is bounded from above as follows.

\begin{matrix} \frac{d_{S}}{ϵ} & \leq \frac{2}{β^{2}} \cdot \int_{\sqrt{2 β ENR}}^{\infty} δ_{f} (a) d a \end{matrix}

(A1a)

\begin{matrix} \leq \frac{2}{β^{2}} \end{matrix}

(A1b)

where (A1a) follows from [21] (Equations (24b) and (25b)), and (A1b) follows from

δ_{f}

being non-negative with unit integral.

We next bound the contribution

d_{L}

to the large distortion term. To that end, note that [21] (Equations (27) and (28)) remain unaltered since the decoder remains the same. We define by

a_{i}

the deviation in

Pr (A_{i})

in [21] (Equation (30)). Then,

\begin{matrix} \frac{a_{i}}{ϵ} & \leq \int_{- (\frac{2 ENR β}{i} + \frac{i}{2 β})}^{\infty} δ_{f} (a) \{\frac{\sqrt{3}}{4 π} e^{- \frac{ℓ^{2} (a)}{3}} + (\frac{1}{\sqrt{8}} + \frac{ℓ (a)}{4 \sqrt{π}}) e^{- \frac{ℓ^{2} (a)}{4}} + e^{- \frac{ℓ^{2} (a)}{2}}\} d a \\ + \int_{- \infty}^{- (\frac{2 ENR β}{i} + \frac{i}{2 β})} δ_{f} (a) d a \end{matrix}

(A2a)

\begin{matrix} \leq \frac{\sqrt{2 ENR} β}{i} \int_{0}^{\infty} δ_{f} (\frac{\sqrt{2 ENR} β}{i} u - \frac{2 ENR β}{i} - \frac{i}{2 β}) \{\frac{\sqrt{3}}{4 π} e^{- \frac{u^{2}}{3}} + e^{- \frac{u^{2}}{2}} + (\frac{1}{\sqrt{8}} + \frac{u}{4 \sqrt{π}}) e^{- \frac{u^{2}}{4}}\} d u \\ + \frac{H}{{(1 + \frac{2 ENR β}{i} + \frac{i}{2 β})}^{4}} \end{matrix}

(A2b)

\begin{matrix} \leq \frac{\tilde{H}}{{(1 + \frac{2 ENR β}{i} + \frac{i}{2 β})}^{4}}, \end{matrix}

(A2c)

where (A2a) follows from [21] (Equations (28) and (30)), (A2b) follows from integration by substitution and (65), and (A2c) follows from (65) for some

\tilde{H} > 0

.

By substituting the bound of (A2a–c) in [21] (Equation (31)), we may bound

d_{L}

from above by

\begin{matrix} \frac{d_{L}}{ϵ} & \leq 2 \sum_{i = 2}^{\infty} {(\frac{i}{β})}^{2} a_{i} \leq \sum_{i = 2}^{\infty} {(\frac{i}{β})}^{2} \frac{\tilde{H}}{{(1 + \frac{2 ENR β}{i} + \frac{i}{2 β})}^{4}} \leq \tilde{C} \end{matrix}

(A3)

for some

\tilde{C} < \infty

.

Therefore, by (A1a,b) and (A3), the overall contribution d to the distortion due to the deviation (64) is bounded from above by

\begin{matrix} d & = d_{L} + d_{S} \leq ϵ (\frac{2}{β^{2}} + \tilde{C}) \end{matrix}

(A4)

choosing

C = \frac{2}{β^{2}} + \tilde{C} < \infty

concludes the proof.

Appendix B. Full Version of Scheme 2

We now present the full multi-layer transmission scheme (cf. Scheme 2), which includes interleaving and Gaussianization steps, as discussed in Remarks 4 and 5, respectively. Block diagrams of the overall scheme and the new ingredients are provided in Figure A1 and Figure A2, respectively. The new components in Scheme 3 compared to those in Scheme 2 (and Figure 4a) are highlighted in green in Figure A1.

Figure A1. Block diagram of Scheme 3.

Figure A2. Block diagram for the i-th MLM layer transmitter and receiver of Scheme 3. We denote the interleaving and de-interleaving operations by

Π

and

Π^{- 1}

, respectively.

Figure A2. Block diagram for the i-th MLM layer transmitter and receiver of Scheme 3. We denote the interleaving and de-interleaving operations by

Π

and

Π^{- 1}

, respectively.

Scheme 3 (Full MLM-based).
M-Layer Transmitter:

First layer (

i = 1

):

For $B \geq k$ , $B \in N$ , accumulates $B^{k}$ source (column) vectors $x^{k} (1), x^{k} (2), \dots, x^{k} (B^{k})$ . Denote by $X$ the matrix whose columns are the source vectors:

$\begin{matrix} X ≜ [\begin{matrix} x^{k} (1) & x^{k} (2) & \dots & x^{k} (B^{k}) \end{matrix}] . \end{matrix}$

(A5)
For each $b \in {1, 2, \dots, B^{k}}$ , transmits each of the entries of the vector $x^{k} (b)$ over the channel (9) linearly (51):

$\begin{matrix} s_{1; ℓ, b} (t) & ≜ s (t + (ℓ - 1) T + (b - 1) k T) = \sqrt{\frac{E_{1}}{T}} \frac{x_{ℓ} (b)}{σ_{x}} φ (t), t \in [0, T), \end{matrix}$

(A6)

for $ℓ = {1, 2, \dots, k}$ , where $φ$ is a continuous unit-norm (i.e., unit-energy) waveform that is zero outside the interval $[0, T]$ , say $ϕ$ of (55), $E_{1} \in [0, E]$ is the allocated energy for layer 1, and E is the total available energy of the scheme.

Other layers: For each

i \in {2, \dots, M}

:

For each $b \in {1, 2, \dots, B^{k}}$ , calculates the k-dimensional tuple

$\begin{matrix} m_{i}^{k} (b) & = {[η_{i} (b) x^{k} (b) + d_{i}^{k} (b)]}_{Λ}, \end{matrix}$

(A7)

where $m_{i}^{k} (b) = {(m_{i; 1} (b), m_{i; 2} (b), \dots, m_{i; k} (b))}^{†}$ , and $m_{i; ℓ} (b)$ denotes the $ℓ th$ entry of $m_{i}^{k} (b)$ for $ℓ \in {1, \dots k}$ ; $η_{i} (b)$ , $d_{i}^{k} (b)$ , and $Λ$ take the roles of $η, d^{k}$ , and $Λ$ of Scheme 1, and are tailored for each layer i; $Λ$ is chosen to have unit second moment.
For each $ℓ \in {1, \dots, k}$ , interleaves the entries $m_{i; ℓ} (1), \dots, m_{i; ℓ} (B^{k})$ , stacks them into vectors of size B, and applies to each of them a B-dimensional orthogonal matrix $H_{i}$ , as follows:

$\begin{matrix} {\tilde{m}}_{i; (ℓ, j)}^{B} = H_{i} (\begin{matrix} m_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + 1) \\ m_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + B^{ℓ} + 1) \\ m_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + 2 B^{ℓ} + 1) \\ ⋮ \\ m_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + B^{ℓ} (B - 1) + 1) \end{matrix}) ≜ H_{i} {\overset{`}{m}}_{i; (ℓ, j)}^{B} \end{matrix}$

(A8)

for $j \in {1, 2, \dots, B^{k - 1}}$ , where ${\overset{`}{m}}_{i; (ℓ, j)}^{B}$ is the vector after interleaving; ${\tilde{m}}_{i; (ℓ, j)}^{B}$ is the vector after interleaving and matrix multiplication and its $ξ th$ entry is ${\tilde{m}}_{i; ξ, (ℓ, j)}$ for $ξ \in {1, \dots, B}$ ; the length of the vectors ${\overset{`}{m}}_{i; (ℓ, j)}^{B}$ and ${\tilde{m}}_{i; (ℓ, j)}^{B}$ is B. Note that the interleaving operation creates doubly indexed vectors, where a set of $B^{k}$ vectors of length k is transformed into $k \times B^{k - 1}$ vectors of length B, which are indexed by $ℓ \in {1, \dots, k}$ and $j \in \{1, \dots, B^{k - 1}\}$ .
For each ℓ, j, and $ξ$ , views ${\tilde{m}}_{i; ξ, (ℓ, j)}$ as a scalar-source sample, and generates a corresponding channel input $\{s_{i; ξ, (ℓ, j)} (t) | t \in [0, T)\}$ , where

$\begin{matrix} s_{i; ξ, (ℓ, j)} (t) ≜ s (t + ((ℓ - 1) + (ξ - 1) k + (j - 1) B k + (i - 1) k B^{k}) T) \end{matrix}$

using a scalar JSCC scheme with a predefined energy $E_{i} \geq 0$ that is designed for a predetermined ${ENR}_{i}$ , or equivalently, $N_{i} = E_{i} / {ENR}_{i}$ , such that $\sum_{i = 1}^{M} E_{i} = E$ and $N_{2} > N_{3} > \dots > N_{M} > 0$ .

Receiver: Receives the channel output signal r (9) and recovers the different layers as follows.

First layer (

i = 1

): For each

ℓ \in {1, \dots, k}

,

b \in {1, \dots, B^{k}}

:

Recovers the MMSE estimate ${\hat{x}}_{1; ℓ} (b)$ of $x_{ℓ} (b)$ given $\{r_{1; ℓ, b} (t) | t \in [0, T)\}$ , where

$\begin{matrix} r_{1; ℓ, b} (t) ≜ r (t + (ℓ - 1) T + (b - 1) k T) . \end{matrix}$

(A9)

Denotes the matrix whose columns comprise these estimates by

$\begin{matrix} {\hat{X}}_{1} ≜ [\begin{matrix} {\hat{x}}_{1}^{k} (1), & {\hat{x}}_{1}^{k} (2), & \dots & {\hat{x}}_{1}^{k} (B^{k}) \end{matrix}] . \end{matrix}$

(A10)
If the true noise level N satisfies $N > N_{2}$ , sets the final estimate $\hat{X}$ of $X$ to ${\hat{X}}_{1}$ and stops. Otherwise, determines the maximal layer index 𝚥 for which $N \leq N_{𝚥}$ and continues to process the other layers.

Other layers: For each

i \in {2, \dots, 𝚥}

in ascending order:

For each $ℓ \in {1, \dots, k}, j \in {1, \dots, B^{k - 1}}$ and $ξ \in {1, \dots, B}$ , uses the receiver of the scalar JSCC scheme to generate an estimate ${\hat{\tilde{m}}}_{i; (ℓ, j)}^{B}$ of ${\tilde{m}}_{i; (ℓ, j)}^{B}$ from $\{r_{i; ξ, (ℓ, j)} (t) | t \in [0, T)\}$ , where

$\begin{matrix} r_{i; ξ, (ℓ, j)} (t) ≜ r (t + ((ℓ - 1) + (ξ - 1) k + (j - 1) B k + (i - 1) k B^{k}) T) . \end{matrix}$
For each $ℓ \in {1, \dots, k}$ , stacks the entries of ${\hat{\tilde{m}}}_{i; (ℓ, 1)}^{B}, \dots, {\hat{\tilde{m}}}_{i; (ℓ, B^{k - 1})}^{B}$ into vectors of length B, ${\hat{\tilde{m}}}_{i; (ℓ, j)}^{B}$ , applies the orthogonal matrix $H_{i}^{- 1} = H_{i}^{†}$ to each vector ${\hat{\tilde{m}}}_{i; (ℓ, j)}^{B}$ , and de-interleaves the outcomes, to attain ${\hat{\overset{`}{m}}}_{i; (ℓ, j)}^{B}$ , as follows:

${\hat{\overset{`}{m}}}_{i; (ℓ, j)}^{B} = (\begin{matrix} {\hat{m}}_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}}) \\ {\hat{m}}_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + B^{ℓ}) \\ {\hat{m}}_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + 2 B^{ℓ}) \\ ⋮ \\ {\hat{m}}_{i; ℓ} (⌊\frac{j - 1}{B^{ℓ}}⌋ \cdot B^{ℓ + 1} + {[j - 1]}_{B^{ℓ}} + B^{ℓ} (B - 1)) \end{matrix}) = H_{i}^{†} {\hat{\tilde{m}}}_{i; (ℓ, j)}^{B}$

(A11)

for $j \in \{1, 2, \dots, B^{k - 1}\}$ .
For each $b \in \{1, \dots, B^{k}\}$ , using the effective channel output ${\hat{m}}_{i}^{k} (b)$ (that takes the role of $y^{k}$ in Scheme 1) with SI ${\hat{x}}_{i - 1}^{k} (b)$ , generates the signal

$\begin{matrix} {\tilde{y}}_{i}^{k} (b) & = {[α_{c}^{(i)} {\hat{m}}_{i}^{k} (b) - η_{i} {\hat{x}}_{i - 1}^{k} (b) - d_{i}^{k} (b)]}_{Λ}, \end{matrix}$

(A12)

as in (30) of Scheme 1, where $α_{c}^{(i)}$ is a channel scale factor.
For each $b \in \{1, \dots, B^{k}\}$ , constructs an estimate ${\hat{x}}^{k} (b)$ of $x^{k} (b)$ :

$\begin{matrix} {\hat{x}}_{i}^{k} (b) & = \frac{α_{s}^{(i)}}{η_{i}} {\tilde{y}}_{i}^{k} (b) + {\hat{x}}_{i - 1}^{k} (b), \end{matrix}$

(A13)

as in (31) of Scheme 1, where $α_{s}^{(i)}$ is a source scale factor. Denote the matrix whose columns comprise these estimates by ${\hat{X}}_{i} ≜ [\begin{matrix} {\hat{x}}_{i}^{k} (1) & \dots & {\hat{x}}_{i}^{k} (B^{k}) \end{matrix}]$ . The final estimate is $\hat{X} = {\hat{X}}_{𝚥}$ .

Appendix C. Proof of Theorem 3

To prove Theorem 3, we will make use of the following lemma about the validity of the MLM results from Section 3 for the multi-layer MLM scenario, where the mid-stage noise vectors are linear combinations of dithers and Gaussian noises. The proof of this lemma is given in Appendix E.

Lemma A1.

Let

q^{k}

be a sequence in k of vectors, such that the vector

q^{k}

equals with probability

1 - P_{k}

to a linear combination of a Gaussian vector and dithers, all of which are mutually independent, where

{lim}_{k \to \infty} P_{k} = 0

. Then, the sequence in k of error signals

x^{k} - {\hat{x}}^{k}

is SNE for a sequence of lattices that is good for both channel coding and MSE quantization; moreover, for each k, the error signal is equal, with probability

1 - Q_{k}

, to a linear combination of a Gaussian vector and dithers, all of which are mutually independent, where

{lim}_{k \to \infty} Q_{k} = 0

.

We now prove Theorem 3. We will construct a scheme with a large enough (yet finite) M that achieves (6a) with the predefined L and

\tilde{E}

for all

N > N_{min}

for a given

N_{min} > 0

. For any

N_{min} > 0

, however small, we will choose

M \in N

large enough and

{N_{i} | i = 1, \dots, M}

such that

N_{min} \in (N_{M}, N_{M - 1}]

.

Consider the first layer (

i = 1

). The distortion

D_{1}

of

{\hat{x}}_{1}^{k}

for a noise level N is bounded from above by

\begin{matrix} D_{1} (N) & = \frac{σ_{x}^{2}}{1 + \frac{2 E_{1}}{N}} \end{matrix}

(A14a)

\begin{matrix} \leq σ_{x}^{2} \cdot F (N) \end{matrix}

(A14b)

\begin{matrix} = \frac{σ_{x}^{2}}{1 + {(\frac{\tilde{E}}{N})}^{L}} \end{matrix}

(A14c)

where (A14a) follows from (53), and (A14b) and (A14c) follow from the distortion profile requirement (6a) for

N > N_{2}

.

To guarantee the requirement (A14b) for all

N > N_{2}

, it suffices to guarantee it for the extreme value

N = N_{2}

, which holds, in turn, for

\begin{matrix} E_{1} & = {(\frac{\tilde{E}}{N_{2}})}^{L} \frac{N_{2}}{2} . \end{matrix}

(A15)

For

i \in {2, \dots, 𝚥}

, the distortion

D_{i}

of

{\hat{x}}_{i}^{k}

for a noise level N is bounded from above by

\begin{matrix} D_{i} (N) & \leq \frac{D_{i - 1} (N_{i})}{1 + \frac{2 E_{i}}{N}} \cdot \frac{1 + \frac{2 E_{i}}{N_{i}}}{\frac{2 E_{i}}{N_{i}}} + ϵ_{i} \end{matrix}

(A16a)

\begin{matrix} \leq \frac{σ_{x}^{2} \cdot F (N_{i})}{1 + \frac{2 E_{i}}{N}} \cdot \frac{1 + \frac{2 E_{i}}{N_{i}}}{\frac{2 E_{i}}{N_{i}}} + ϵ_{i} \end{matrix}

(A16b)

\begin{matrix} = \frac{σ_{x}^{2}}{1 + {(\frac{\tilde{E}}{N_{i}})}^{L}} \cdot \frac{1}{1 + \frac{2 E_{i}}{N}} \cdot \frac{1 + \frac{2 E_{i}}{N_{i}}}{\frac{2 E_{i}}{N_{i}}} + ϵ_{i} \end{matrix}

(A16c)

\begin{matrix} \leq σ_{x}^{2} \cdot F (N) \end{matrix}

(A16d)

\begin{matrix} = \frac{σ_{x}^{2}}{1 + {(\frac{\tilde{E}}{N})}^{L}}, \end{matrix}

(A16e)

where (A16a) follows from Corollary 2 by treating

{\hat{x}}_{i - 1}^{k}

as SI, and the error

x - {\hat{x}}_{i - 1}^{k}

takes the role of the “unknown part” at the receiver with power

D_{i - 1}

, with

ϵ_{i}

going to zero with k, and by invoking Lemma A1 recursively, which guarantees that the sequence in k of the error vectors

x - {\hat{x}}_{i - 1}^{k}

is SNE; (A16b) holds by the distortion profile requirement (6a) (the requirement

D_{i - 1} (N) \leq σ_{x}^{2} \cdot F (N)

is satisfied for

N = N_{i} - ϵ

for any

ϵ > 0

, however small, and therefore, holds also for

N = N_{i}

, by continuity. Alternatively, one may view it as a requirement of the scheme given

i - 1

layers, for all

i \in {2, 3, \dots, 𝚥}

); (A16c) follows from (6a) with

\tilde{E}

and L; (A16d) follows from the distortion profile requirement (6a) for

N \in (N_{i + 1}, N_{i}]

; and (A16e) follows from (6a) with

\tilde{E}

and L.

To guarantee the requirement (A16d) for all

N \in (N_{i + 1}, N_{i}]

we need only to satisfy it for the extreme value

N = N_{i + 1}

, which holds, in turn, for

\begin{matrix} 1 + \frac{2 E_{i}}{N_{i + 1}} & \geq \frac{1 + \frac{2 E_{i}}{N_{i}}}{\frac{2 E_{i}}{N_{i}}} \cdot \frac{1 + {(\frac{\tilde{E}}{N_{i + 1}})}^{L}}{1 + {(\frac{\tilde{E}}{N_{i}})}^{L}} + {\tilde{ϵ}}_{i} \end{matrix}

(A17a)

\begin{matrix} \geq (1 + \frac{N_{i}}{2 E_{i}}) {(\frac{N_{i}}{N_{i + 1}})}^{L} + {\tilde{ϵ}}_{i}, \end{matrix}

(A17b)

where (A17b) holds since

N_{i + 1} < N_{i}

, and

{\tilde{ϵ}}_{i}

decays to zero with k; the set of inequalities (A17a,b) holds for

\begin{matrix} E_{i} & = \frac{N_{i + 1}}{4} ({(\frac{N_{i}}{N_{i + 1}})}^{L} - 1) (1 + \sqrt{1 + \frac{4 {(\frac{N_{i}}{N_{i + 1}})}^{L + 1}}{{(1 - {(\frac{N_{i}}{N_{i + 1}})}^{L})}^{2}}}) + ε_{i}, \end{matrix}

(A18)

where again

ε_{i}

decays to zero with k.

We are now ready to bound the total energy E.

\begin{matrix} \frac{E}{\tilde{E}} & = \frac{1}{\tilde{E}} \sum_{i = 1}^{𝚥} E_{i} \end{matrix}

(A19a)

\begin{matrix} \leq \sum_{i = 2}^{\infty} \frac{N_{i + 1}}{4 \tilde{E}} ({(\frac{N_{i}}{N_{i + 1}})}^{L} - 1) (1 + \sqrt{1 + \frac{4 {(\frac{N_{i}}{N_{i + 1}})}^{L + 1}}{{(1 - {(\frac{N_{i}}{N_{i + 1}})}^{L})}^{2}}}) + \frac{1}{2} {(\frac{\tilde{E}}{N_{2}})}^{L - 1} + \sum_{i = 2}^{𝚥} \frac{ε_{i}}{\tilde{E}} \end{matrix}

(A19b)

\begin{matrix} = \frac{Δ}{4 \tilde{E}} (e^{α L} - 1) (1 + \sqrt{1 + \frac{4 e^{α (L + 1)}}{{(1 - e^{α L})}^{2}}}) \sum_{i = 2}^{\infty} e^{- α i} + \frac{1}{2} {(\frac{\tilde{E}}{Δ e^{- α}})}^{L - 1} + \sum_{i = 2}^{𝚥} \frac{ε_{i}}{\tilde{E}} \end{matrix}

(A19c)

\begin{matrix} = \frac{x}{4} (e^{α L} - 1) (1 + \sqrt{1 + \frac{4 e^{α (L + 1)}}{{(1 - e^{α L})}^{2}}}) \frac{e^{- 2 α}}{1 - e^{- α}} + \frac{1}{2} {(\frac{e^{α}}{x})}^{L - 1} + \sum_{i = 2}^{𝚥} \frac{ε_{i}}{\tilde{E}}, \end{matrix}

(A19d)

where (A19b) follows from (A15) and (A18), in (A19c) we use the choice

N_{i} = Δ e^{- α (i - 1)}

for the noise levels for some positive parameters

α

and

Δ

, and (A19d) holds by defining

x ≜ Δ / \tilde{E}

.

Finally, by optimizing over the parameters

α

and x, taking a large enough M, and taking k to infinity, we arrive at the desired result.

For the particular case of a quadratically decaying profile (

L = 2

), numerically optimizing (A19d) over

α

and x yields (74).

Appendix D. Proof of Theorem 4

To prove Theorem 4, we will make use of the following non-uniform variant of the Berry–Esseen theorem, which is a weakened (yet more compact) form of a result due to Petrov.

Theorem A1

([35] and [36] (Chapter VII, Thm. 17)). Let

{x_{i} | i \in N}

be an i.i.d. sequence of RVs with zero mean and unit variance, and denote

s_{n} ≜ \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} x_{i}

. Assume that

E [| x_{1} |^{ν}] < \infty

for some

ν > 2

, and that

x_{1}

has a bounded p.d.f. Then, the p.d.f. of

s_{n}

, denoted by

f_{n}

, satisfies

\begin{matrix} |f_{n} (a) - f_{G} (a)| & < \frac{A_{ν}}{\sqrt{n} \cdot (1 + {| a |}^{ν})}, & \forall a & \in R, \end{matrix}

(A20)

for some

A_{ν} < \infty

, where

f_{G}

is the standard Gaussian p.d.f.

Furthermore, as discussed in Remarks 4–6, in each layer (and specifically in the PPM layer) we effectively have an additive noisy channel whose noise distribution approaches a Gaussian distribution (due to the Gaussianization and the interleaving). For all the linear layers, Lemma A1 allows us to use the MLM results of Section 3. However, for the PPM layer, instead of a combination of Gaussian vectors and dithers (as treated in Lemma A1) the actual noise also contains terms that are induced by the PPM scheme. We will use the following lemma, which is proved in Appendix F, to claim that the effect of these terms on the performance of the MLM scheme can be made arbitrarily small by taking the dimension k to be large enough.

Lemma A2.

Let

x^{k}

be a sequence in k of SNE vectors with second moment

r_{k}

such that

{lim}_{k \to \infty} r_{k} = r

, and let

{\hat{x}}^{k}

be a corresponding sequence in k of vectors with identically distributed entries such that:

The distance between the p.d.f. of $x_{1}$ and that of ${\hat{x}}_{1}$ is bounded from above by

$\begin{matrix} |f_{{\hat{x}}_{1}} (t) - f_{x_{1}} (t)| & \leq \frac{C (t)}{\sqrt{k}} & \forall t > 0, \end{matrix}$

(A21)

where $C (t) = o (1 / t^{2})$ with $f (t) = o (g (t))$ meaning that ${lim}_{t \to \infty} \frac{f (t)}{g (t)} = 0$ , and where $x_{1}$ and ${\hat{x}}_{1}$ denote the first entries of the vectors $x^{k}$ and ${\hat{x}}^{k}$ , respectively.
The correlation between any two squared entries within ${\hat{x}}^{k}$ decays to zero with k, viz..,

$\begin{matrix} lim_{k \to \infty} cov ({\hat{x}}_{i}^{2}, {\hat{x}}_{j}^{2}) = 0, \end{matrix}$

(A22)

where $cov (A, B) ≜ E [A B] - E [A] E [B]$ denotes the covariance of A and B.

Then, for all

ϵ, δ > 0

, there exists

k_{0} \in N

such that

\begin{matrix} P (\frac{1}{k} {∥{\hat{x}}^{k}∥}^{2} > r + δ) \leq ϵ \end{matrix}

(A23)

for all

k > k_{0}

, namely, the sequence

\{{\hat{x}}^{k} | k \in N\}

is a sequence of SNE vectors.

We will now prove Theorem 4. We note that the following analysis is based on the interleaving and Gaussianization blocks as they appear in the full description of the scheme in Appendix B.

Proof of Theorem 4.

We will now derive the parameters that achieve a quadratic profile (

L = 2

) and

\tilde{E}

in (6a) for all

N > N_{min}

for a given

N_{min} > 0

.

We choose

H_{i} = I_{B}

for the linear layers—layers

i = {1, 2, \dots, M - 1}

. Consequently, the analysis for the first

M - 1

layers of the proof of Theorem 3 carries over to this scheme as well.

Consider now the last layer—layer M. Following Feder and Ingber [28], and Hadad and Erez [29], we use a B-dimensional Walsh–Hadamard matrix

H_{M}

.

Now, if

N \in (N_{M - 1}, N_{M}]

, the receiver uses the last layer to improve the source estimates while viewing the estimates resulting from the previous layer,

{{\hat{x}}_{M - 1}^{k} (1), \dots, {\hat{x}}_{M - 1}^{k} (B)}

, as SI with mean power

D_{M - 1} (N_{M})

(A16a–e).

By Lemma 1, all the moments of all the entries of

m_{M}^{k} (b)

exist and are finite for all b. Thus, by Theorem A1, and since

m_{M}^{k} (1), m_{M}^{k} (2), \dots, m_{M}^{k} (B^{k})

are i.i.d., the p.d.f.

f_{ℓ}

of

{\tilde{m}}_{M; (ℓ, j)} (b)

(it is the same for all b and j for a given ℓ) satisfies

\begin{matrix} | f_{ℓ} (a) - f_{G_{ℓ}} (a) | < \frac{A_{ν}}{\sqrt{B} (1 + {| a |}^{ν})} \end{matrix}

(A24)

for all

ℓ \in {1, \dots, k}

,

b \in {1, \dots, B}

, and

j \in {1, 2, \dots, B^{k - 1}}

, for all

ν > 2

for some

A_{ν} < \infty

, where

f_{G_{ℓ}}

is the p.d.f. of a zero-mean Gaussian RV with the same variance as

{\tilde{m}}_{M; (ℓ, j)} (b)

.

By choosing some

ν > 4

and applying Corollary 6 to

{\tilde{m}}_{M; ℓ} (b)

with

h (a) = A_{ν} / (1 + {| a |}^{ν})

and

ϵ = 1 / \sqrt{B}

, the distortion bound of Theorem 2 is attained up to a loss

C / \sqrt{B}

for some constant

C < \infty

, where this loss can be made arbitrarily small by choosing a large enough B.

We note that the interleaving makes the PPM transmitters operate over elements that are related to lattices of different sources. Thus, after de-interleaving, the correlation between different vector elements, as well as the correlation between their squares, vanishes as

k \to \infty

. Furthermore, the per-element variance is bounded from above by a quantity that approaches (as

k \to \infty

) the PPM performance bound of Theorem 2. Thus, by Lemma A2, the resulting effective noise vector

z_{eff}^{k} = {\hat{m}}_{M}^{k} (b) - m_{M}^{k} (b)

is SNE (recall Definition 5).

We note that

z_{eff}^{k} (b)

is correlated with

m_{M}^{k} (b)

; nevertheless, by Corollary 4 with parameters

\tilde{α} = α_{c} = α_{s} = 1

the distortion of

{\hat{x}}_{M}^{k}

is bounded from above by

\begin{matrix} D_{M} (N) \leq \frac{D_{M - 1} (N_{M})}{{SDR}_{M} (N)} + ϵ_{M}, \end{matrix}

(A25)

where

ϵ_{M}

subsumes the aforementioned losses that all go to zero with k, and

{SDR}_{M} (N)

is the SDR of the analog PPM scheme for a noise power N of Theorem 2.

The energy

E_{M}

of the last layer is chosen to comply with the profile for

N < N_{M}

:

\begin{matrix} D_{M} (N) & \leq F (N) & \forall N < N_{M} . \end{matrix}

(A26)

Combining (A25) and the contribution of the first

M - 1

layers, given by (A19c) with summation from 1 to

M - 1

, by numerically optimizing the resulting term over the number of layers M, the PPM pulse width

β

and the energy layers

{\{E_{i}\}}_{i = 1}^{M}

we obtain that

M = 7

,

β = 0.9

, and the layer energies

E_{1} \approx 0.8480 \tilde{E}, E_{2} \approx 0.4893 \tilde{E}, E_{3} \approx 0.2823 \tilde{E}, E_{4} \approx 0.1629 \tilde{E}

,

E_{5} \approx 0.094 \tilde{E}

,

E_{6} \approx 0.0542 \tilde{E}

,

E_{7} \approx 0.0313 \tilde{E}

yields (75). □

Appendix E. Proof of Lemma A1

Since

Λ^{(k)}

is assumed to be a sequence that is good for channel coding,

\begin{matrix} lim_{k \to \infty} Pr (x^{k} - {\hat{x}}^{k} = e^{k}) = 1, \end{matrix}

(A27)

where

e^{k} ≜ (1 - α_{s}) q^{k} - \frac{α_{s} α_{c}}{β} z^{k} + \frac{α_{s} (1 - α_{c})}{β} m^{k}

; equivalently, for any

ϵ_{1} > 0

, however small, there exists

k_{1} \in N

, such that for all

k > k_{1}

,

\begin{matrix} Pr (x^{k} - {\hat{x}}^{k} \neq e^{k}) < ϵ_{1} . \end{matrix}

(A28)

Note now that

e^{k}

equals a linear combination of independent Gaussian vectors—which amounts to a Gaussian vector—and dither vectors. Hence, by [22] (Th. 3), the sequence in k of vectors

e^{k}

is SNE, namely, for any

δ, ϵ_{2} > 0

, however small, there exists

k_{2} \in N

, such that for all

k > k_{1}

,

\begin{matrix} Pr (\frac{1}{k} E [{∥e^{k}∥}^{2}] > (1 + δ) σ_{e}^{2}) \leq ϵ_{2} . \end{matrix}

(A29)

Now, let

ϵ > 0

, however small and choose

ϵ_{1} = ϵ_{2} = ϵ / 2

, and

k_{0} = max {k_{1}, k_{2}}

. Then, by the union bound, for all

k > k_{0}

,

\begin{matrix} Pr (\frac{1}{k} E [{∥x^{k} - {\hat{x}}^{k}∥}^{2}] > (1 + δ) σ_{x - \hat{x}}^{2}) \leq ϵ . \end{matrix}

(A30)

Appendix F. Proof of Lemma A2

By (A21),

\begin{matrix} |E [{\hat{x}}_{1}^{2}] - E [x_{1}^{2}]| = |\int_{t = \infty}^{\infty} t^{2} [f_{{\hat{x}}_{1}} (t) - f_{x_{1}} (t)] d t| \leq \frac{G}{\sqrt{k}} \end{matrix}

(A31)

where G is a finite constant, since

C (t) = o (t)

, that depends on

C (t)

, and

E [x_{1}^{2}] = r_{k}

. Using second-moment ergodicity of the entries of

x^{k}

, which holds in the limit of

k \to \infty

[37] (Thm. 12.1) by (A22) and recalling that

{lim}_{k \to \infty} r_{k} = r

concludes the proof.

References

Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006. [Google Scholar]
Köken, E.; Gündüz, D.; Tuncel, E. Energy–Distortion Exponents in Lossy Transmission of Gaussian Sources Over Gaussian Channels. IEEE Trans. Inf. Theory 2016, 63, 1227–1236. [Google Scholar] [CrossRef]
Sevinç, C.; Tuncel, E. On Asymptotic Analysis of Energy–Distortion Tradeoff for Low-Delay Transmission over Gaussian Channels. IEEE Trans. Commun. 2021, 69, 4448–4460. [Google Scholar] [CrossRef]
El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Shannon, C.E. Coding Theorems for a Discrete Source With a Fidelity Criterion of the Institute of Radio Engineers, International Convention Record; Wiley: New York, NY, USA, 1959; Volune 7, pp. 142–163. [Google Scholar]
Reznic, Z.; Feder, M.; Zamir, R. Distortion bounds for broadcasting with bandwidth expansion. IEEE Trans. Inf. Theory 2006, 52, 3778–3788. [Google Scholar] [CrossRef]
Köken, E.; Tuncel, E. On minimum energy for robust Gaussian joint source–channel coding with a distortion–noise profile. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 1668–1672. [Google Scholar]
Equitz, W.H.R.; Cover, T.M. Successive Refinement of Information. IEEE Trans. Inf. Theory 1991, 37, 851–857. [Google Scholar] [CrossRef]
Santhi, N.; Vardy, A. Analog codes on graphs. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Yokohama, Japan, 29 June–5 July 2003; p. 13. [Google Scholar]
Santhi, N.; Vardy, A. Analog codes on graphs. arXiv 2006, arXiv:cs/0608086. [Google Scholar]
Bhattad, K.; Narayanan, K.R. A note on the rate of decay of mean-squared error with SNR for the AWGN channel. IEEE Trans. Inf. Theory 2010, 56, 332–335. [Google Scholar] [CrossRef]
Mittal, U.; Phamdo, N. Hybrid digital-analog (HDA) joint source–channel codes for broadcasting and robust communications. IEEE Trans. Inf. Theory May 2002, 48, 1082–1102. [Google Scholar] [CrossRef]
Kochman, Y.; Zamir, R. Joint Wyner–Ziv/Dirty-Paper Coding by Modulo-Lattice Modulation. IEEE Trans. Inf. Theory 2009, 55, 4878–4899. [Google Scholar] [CrossRef]
Kochman, Y.; Zamir, R. Analog matching of colored sources to colored channels. IEEE Trans. Inf. Theory 2011, 57, 3180–3195. [Google Scholar] [CrossRef]
Zamir, R. Lattice Coding for Signals and Networks; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
Wyner, A.D.; Ziv, J. The Rate–Distortion Function for Source Coding with Side Information at the Decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10. [Google Scholar] [CrossRef]
Wyner, A.D. The Rate–Distortion Function for Source Coding with Side Information at the Decoder—II: General Sources. Inf. Control 1978, 38, 60–80. [Google Scholar] [CrossRef]
Baniasadi, M.; Tuncel, E. Minimum Energy Analysis for Robust Gaussian Joint Source–Channel Coding with a Square-Law Profile. In Proceedings of the IEEE International Symposium on Information Theory and Its Applications (ISITA), Kapolei, HI, USA, 24–27 October 2020; pp. 51–55. [Google Scholar]
Baniasadi, M.; Köken, E.; Tuncel, E. Minimum Energy Analysis for Robust Gaussian Joint Source–Channel Coding with a Distortion–Noise Profile. IEEE Trans. Inf. Theory 2022, 68, 7702–7713. [Google Scholar] [CrossRef]
Baniasadi, M. Robust Gaussian Joint Source–Channel Coding with a Staircase Distortion–Noise Profile. arXiv 2020, arXiv:2001.09370. [Google Scholar]
Lev, O.; Khina, A. Energy-limited Joint Source–Channel Coding via Analog Pulse Position Modulation. IEEE Trans. Commun. 2022, 70, 5140–5150. [Google Scholar] [CrossRef]
Ordentlich, O.; Erez, U. A Simple Proof for the Existence of “Good” Pairs of Nested Lattices. IEEE Trans. Inf. Theory 2016, 62, 4439–4453. [Google Scholar] [CrossRef]
Kochman, Y.; Khina, A.; Erez, U.; Zamir, R. Rematch-and-Forward: Joint Source–Channel Coding for Parallel Relaying With Spectral Mismatch. IEEE Trans. Inf. Theory 2014, 60, 605–622. [Google Scholar] [CrossRef]
Erez, U.; Zamir, R. Achieving 12log(1+SNR) on the AWGN Channel with Lattice Encoding and Decoding. IEEE Trans. Inf. Theory 2004, 50, 2293–2314. [Google Scholar] [CrossRef]
Wozencraft, J.M.; Jacobs, I.M. Principles of Communication Engineering; John Wiley & Sons: New York, NY, USA, 1965. [Google Scholar]
Viterbi, A.J.; Omura, J.K. Principles of Digital Communication and Coding; McGraw-Hill: New York, NY, USA, 1979. [Google Scholar]
Wu, Y.; Verdú, S. Functional properties of minimum mean-square error and mutual information. IEEE Trans. Inf. Theory 2011, 58, 1289–1301. [Google Scholar] [CrossRef]
Feder, M.; Ingber, A. Method, device and system of reduced peak-to-average-ratio communication. U.S. Patent 8,116,695 B2, 14 February 2014. [Google Scholar]
Hadad, R.; Erez, U. Dithered quantization via orthogonal transformations. IEEE Trans. Signal Process. 2016, 64, 5887–5900. [Google Scholar] [CrossRef]
Asnani, H.; Shomorony, I.; Avestimehr, A.S.; Weissman, T. Network compression: Worst case analysis. IEEE Trans. Inf. Theory 2015, 61, 3980–3995. [Google Scholar] [CrossRef]
No, A.; Weissman, T. Rateless lossy compression via the extremes. IEEE Trans. Inf. Theory 2016, 62, 5484–5495. [Google Scholar] [CrossRef]
Lev, O. Energy-Limited Source–Channel Coding: Matlab Script. [Online]. 2022. Available online: https://github.com/omrilev1/ThesisCode/tree/main/PPM_JSCC (accessed on 20 August 2020).
Baniasadi, M.; Tuncel, E. Robust Gaussian JSCC Under the Near-Infinity Bandwidth Regime with Side Information at the Receiver. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 2423–2428. [Google Scholar]
Baniasadi, M.; Tuncel, E. Robust Gaussian Joint Source–Channel Coding Under the Near-Zero Bandwidth Regime. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 2474–2479. [Google Scholar]
Petrov, V.V. On local limit theorems for sums of independent random variables. Theory Probab. Its Appl. 1964, 9, 312–320. [Google Scholar] [CrossRef]
Petrov, V.V. Sums of Independent Random Variables; Springer: New York, NY, USA, 1975. [Google Scholar]
Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; Tata McGraw-Hill Education: New York, NY, USA, 2002. [Google Scholar]

Figure 1. JSCC of k samples of a Gaussian source,

x^{k}

, over a continuous-time bandwidth-unlimited AWGN channel subject to an energy constraint (8). The noise level of the noise process n is assumed to be known only to the receiver but not to the transmitter. The transmitter maps the k source samples

x^{k} \in R^{k}

into a continuous-time channel input

\{s_{x^{k}} (t) | | t | \leq k T / 2\}

(and arbitrarily large bandwidth). The receiver constructs an estimate

{\hat{x}}^{k}

of

x^{k}

from the continuous-time channel output

\{r (t) | | t | \leq k T / 2\}

.

Figure 1. JSCC of k samples of a Gaussian source,

x^{k}

, over a continuous-time bandwidth-unlimited AWGN channel subject to an energy constraint (8). The noise level of the noise process n is assumed to be known only to the receiver but not to the transmitter. The transmitter maps the k source samples

x^{k} \in R^{k}

into a continuous-time channel input

\{s_{x^{k}} (t) | | t | \leq k T / 2\}

(and arbitrarily large bandwidth). The receiver constructs an estimate

{\hat{x}}^{k}

of

x^{k}

from the continuous-time channel output

\{r (t) | | t | \leq k T / 2\}

.

Figure 2. Examples of one- and two-dimensional lattices with generator matrices

G = [1]

and

G = [\begin{matrix} \frac{\sqrt{3}}{2} & 0 \\ \frac{1}{2} & 1 \end{matrix}]

, respectively. The lattice points are marked by black crosses, whereas the Voronoi cell partitions are marked by blue lines.

Figure 2. Examples of one- and two-dimensional lattices with generator matrices

G = [1]

and

G = [\begin{matrix} \frac{\sqrt{3}}{2} & 0 \\ \frac{1}{2} & 1 \end{matrix}]

, respectively. The lattice points are marked by black crosses, whereas the Voronoi cell partitions are marked by blue lines.

Figure 3. High-level description of the JSCC scheme (Scheme 2). Our construction consists of a concatenation of an MLM encoder and an analog modulation that maps the encoded signals into a continuous-time waveform. On the receiver, we apply the inverse operations by first demodulating the analog signal and then use an MLM decoder.

Figure 4. Block diagrams of Scheme 2 and of this scheme with the effective additive noise channels of Remark 6.

Figure 5. Distortion and accumulated energy of the layers utilized by the receiver at a given

\tilde{E} / N

for a Gaussian source in the infinite-blocklength regime for a quadratic profile: Scheme 2 with linear layers with energy allocation

E_{i} = Δ e^{- α i}

for

Δ = 0.975

and

α = 0.65

. Empirical performance of the scheme with a linear layer with energy

E_{1} = 0.85

and an analog PPM layer with energy

E_{2} = 0.75

, and analytic performance of the scheme of Theorem 4 with the parameters from its proof, and analytic performance of the scheme of Baniasadi and Tuncel [18].

Figure 5. Distortion and accumulated energy of the layers utilized by the receiver at a given

\tilde{E} / N

for a Gaussian source in the infinite-blocklength regime for a quadratic profile: Scheme 2 with linear layers with energy allocation

E_{i} = Δ e^{- α i}

for

Δ = 0.975

and

α = 0.65

. Empirical performance of the scheme with a linear layer with energy

E_{1} = 0.85

and an analog PPM layer with energy

E_{2} = 0.75

, and analytic performance of the scheme of Theorem 4 with the parameters from its proof, and analytic performance of the scheme of Baniasadi and Tuncel [18].

Figure 6. Distortion and accumulated energy of the layers utilized by the receiver at a given

\tilde{E} / N

for a uniform scalar source for a quadratic profile: Scheme 2 with linear layers with energy allocation

E_{i} / \tilde{E} = Δ e^{- α i}

for

Δ = 0.9

and

α = 0.64

, and with a linear layer with energy

E_{1} = 0.9 \tilde{E}

and an analog PPM layer with energy

E_{2} = 0.346 \tilde{E}

. The

η

value was optimized according to (76).

Figure 6. Distortion and accumulated energy of the layers utilized by the receiver at a given

\tilde{E} / N

for a uniform scalar source for a quadratic profile: Scheme 2 with linear layers with energy allocation

E_{i} / \tilde{E} = Δ e^{- α i}

for

Δ = 0.9

and

α = 0.64

, and with a linear layer with energy

E_{1} = 0.9 \tilde{E}

and an analog PPM layer with energy

E_{2} = 0.346 \tilde{E}

. The

η

value was optimized according to (76).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lev, O.; Khina, A. Energy-Limited Joint Source–Channel Coding of Gaussian Sources over Gaussian Channels with Unknown Noise Level. Entropy 2023, 25, 1522. https://doi.org/10.3390/e25111522

AMA Style

Lev O, Khina A. Energy-Limited Joint Source–Channel Coding of Gaussian Sources over Gaussian Channels with Unknown Noise Level. Entropy. 2023; 25(11):1522. https://doi.org/10.3390/e25111522

Chicago/Turabian Style

Lev, Omri, and Anatoly Khina. 2023. "Energy-Limited Joint Source–Channel Coding of Gaussian Sources over Gaussian Channels with Unknown Noise Level" Entropy 25, no. 11: 1522. https://doi.org/10.3390/e25111522

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Energy-Limited Joint Source–Channel Coding of Gaussian Sources over Gaussian Channels with Unknown Noise Level

Abstract

1. Introduction

1.1. Notation

2. Problem Statement

3. Background: Modulo-Lattice Modulation

4. Background: Analog Modulations in the Known-ENR Regime

5. Main Results

5.1. Infinite-Blocklength Setting with Linear Layers

5.2. Infinite-Blocklength Setting with Analog PPM Layers

6. Simulations

6.1. Analytical Performance Comparison in the Infinite-Blocklength Regime

6.2. Empirical Performance Comparison for a Single-Source Sample

7. Summary and Discussion

8. Future Research

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Proof of Corollary 6

Appendix B. Full Version of Scheme 2

Appendix C. Proof of Theorem 3

Appendix D. Proof of Theorem 4

Appendix E. Proof of Lemma A1

Appendix F. Proof of Lemma A2

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI