Article

On Continuous-Time Gaussian Channels †

1 School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
2 Department of Mathematics, The University of Hong Kong, Hong Kong, China
3 HKU Shenzhen Institute of Research and Innovation, Shenzhen 518057, China
* Author to whom correspondence should be addressed.
† Results in this paper have been partially presented in the Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014.
Entropy 2019, 21(1), 67; https://doi.org/10.3390/e21010067
Submission received: 8 December 2018 / Revised: 1 January 2019 / Accepted: 12 January 2019 / Published: 14 January 2019
(This article belongs to the Special Issue Multiuser Information Theory II)

Abstract:
A continuous-time white Gaussian channel can be formulated using white Gaussian noise, and a conventional way to examine such a channel is the sampling approach based on the Shannon–Nyquist sampling theorem, where the original continuous-time channel is converted to an equivalent discrete-time channel, to which a great variety of established tools and methodologies can be applied. However, one of the key issues of this scheme is that continuous-time feedback and memory cannot be incorporated into the channel model. It turns out that this issue can be circumvented by considering the Brownian motion formulation of a continuous-time white Gaussian channel. Nevertheless, as opposed to the white Gaussian noise formulation, a link that establishes the information-theoretic connection between a continuous-time channel under the Brownian motion formulation and its discrete-time counterparts has long been missing. This paper fills this gap by establishing causality-preserving connections between continuous-time Gaussian feedback/memory channels and their associated discrete-time versions in the forms of sampling and approximation theorems, which we believe will play important roles in the long run for further developing continuous-time information theory. As an immediate application of the approximation theorem, we propose the so-called approximation approach to examine continuous-time white Gaussian channels in the point-to-point or multi-user setting. It turns out that the approximation approach, complemented by relevant tools from stochastic calculus, can enhance our understanding of continuous-time Gaussian channels in terms of giving alternative and strengthened interpretations to some long-held folklore, recovering “long-known” results from new perspectives, and rigorously establishing new results predicted by the intuition that the approximation approach carries. More specifically, using the approximation approach complemented by relevant tools from stochastic calculus, we first derive the capacity regions of continuous-time white Gaussian multiple access channels and broadcast channels, and we then analyze how feedback affects their capacity regions: feedback will increase the capacity regions of some continuous-time white Gaussian broadcast channels and interference channels, while it will not increase the capacity regions of continuous-time white Gaussian multiple access channels.

1. Introduction

Continuous-time Gaussian channels were considered at the very inception of information theory. In his celebrated paper [1] birthing information theory, Shannon studied the following point-to-point continuous-time white Gaussian channels:
$Y(t) = X(t) + Z(t), \quad t \in \mathbb{R},$
where $X(t)$ is the channel input with average power limit $P$, $Z(t)$ is the white Gaussian noise with flat power spectral density $1$, and $Y(t)$ is the channel output. Shannon actually only considered the case where the channel has bandwidth limit $\omega$; namely, the channel input $X$ and the noise $Z$, and therefore the output $Y$, all have bandwidth limit $\omega$ (alternatively, as in (9.54) of [2], this can be interpreted as the original channel (1) concatenated with an ideal bandpass filter with bandwidth limit $\omega$). Using the celebrated Shannon–Nyquist sampling theorem [3,4], the continuous-time channel (1) can be equivalently represented by a parallel Gaussian channel:
$Y_n(\omega) = X_n(\omega) + Z_n(\omega), \quad n \in \mathbb{Z},$
where the noise process { Z n ( ω ) } is i.i.d. with variance 1 [2]. Regarding the “space” index n as time, the above parallel channel can be interpreted as a discrete-time Gaussian channel associated with the continuous-time channel (1). It is well known from the theory of discrete-time Gaussian channels that the capacity of the channel (2) can be computed as
$C(\omega) = \omega \log\left(1 + \frac{P}{2\omega}\right).$
Then, the capacity C of the channel (1) can be computed by taking the limit of the above expression as ω tends to infinity:
$C = \lim_{\omega \to \infty} C(\omega) = P/2.$
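As a quick numerical sanity check of this limit (our own illustration, with an arbitrarily chosen power value that is not from the paper), the following sketch evaluates $C(\omega)$ in nats for increasing bandwidths and compares it with $P/2$:

```python
import numpy as np

P = 4.0  # assumed average power limit (illustrative value only)

def capacity_bandlimited(omega, power=P):
    """C(omega) = omega * log(1 + P / (2 * omega)), in nats."""
    return omega * np.log(1.0 + power / (2.0 * omega))

for omega in [1, 10, 100, 1000, 10000]:
    print(f"omega = {omega:6d}:  C(omega) = {capacity_bandlimited(omega):.6f}")
print(f"P/2 = {P / 2:.6f}")
```

The per-bandwidth terms shrink while their number grows, and the printed values approach $P/2$ from below, mirroring the limit above.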
The sampling approach consisting of (1)–(4) as above, which serves as a link between the continuous-time channel (1) and the discrete-time channel (2), typifies a conventional way to examine continuous-time Gaussian channels: convert them into associated discrete-time Gaussian channels, for which we have ample ammunition at hand. Note that when $P$ tends to $0$, using the fact that when $P$ is “close” to $0$,
$\omega \log\left(1 + \frac{P}{2\omega}\right) \quad \text{is close to} \quad \frac{P}{2},$
one also reaches (4), which roughly explains the following long-held folklore within the information theory community:
a continuous-time infinite-bandwidth Gaussian channel without feedback or memory is “equivalent” to a discrete-time Gaussian channel without feedback or memory at low signal-to-noise ratio (SNR).
(A)
A moment of reflection, however, reveals that the sampling approach for the channel capacity (with bandwidth limit or not) is heuristic in nature: For one thing, a bandwidth-limited signal cannot be time-limited, which renders it infeasible to define the data transmission rate if assuming a channel has bandwidth limit. In this regard, rigorous treatments coping with this issue and other technicalities can be found in [5,6]; see also [7] for a relevant in-depth discussion. Another issue is that, even disregarding the above technical nuisance arising from the bandwidth limit assumption, the sampling approach only gives a lower bound for the capacity of (1): it shows that $P/2$ is achievable via a class of special coding schemes, but it is not clear why a transmission rate higher than $P/2$ cannot be achieved by other coding schemes. The capacity of (1) was rigorously studied in [8,9], and a complete proof establishing $P/2$ as its de facto capacity can be found in [10,11].
Alternatively, the continuous-time white Gaussian channel (1) can be examined [12] under the Brownian motion formulation:
$Y(t) = \int_0^t X(s)\, ds + B(t),$
where, slightly abusing the notation, we still use Y ( t ) to denote the output corresponding to the input X ( s ) , and B ( t ) denotes the standard Brownian motion ( Z ( t ) can be viewed as a generalized derivative of B ( t ) ); equivalently, the channel (5) can be seen as the original channel (1) concatenated with an integrator circuit. As opposed to white Gaussian noises, which only exist as generalized functions [13], Brownian motions are well-defined stochastic processes and have been extensively studied in probability theory. Here we remark that, via a routine orthonormal decomposition argument, both channels are equivalent to a parallel channel consisting of infinitely many Gaussian sub-channels [14].
An immediate and convenient consequence of such a formulation is that many notions in discrete time, including mutual information and typical sets, carry over to the continuous-time setting, which will rid us of the nuisances arising from the bandwidth limit assumption. Indeed, such a framework yields a fundamental formula for the mutual information of the channel (5) [15,16] and a clean and direct proof [16] that the capacity of (5) is $P/2$; moreover, as evidenced by numerous results collected in [12] and some recent representative work [17,18] on point-to-point Gaussian channels, the use of Brownian motions elevates the level of rigor of our treatment and equips us with a wide range of established techniques and tools from stochastic calculus. Here we remark that Girsanov’s theorem, one of the most important theorems in stochastic calculus, lays the foundation of our rigorous treatment; for those who are interested in the technical details of our proofs, we refer to [12,19], where Girsanov’s theorem (and its numerous variants) and its wide range of applications in information theory are discussed in great detail.
Furthermore, as elaborated in Remark 4, the Brownian motion formulation is also versatile enough to accommodate feedback and memory; in particular, the point-to-point continuous-time white Gaussian memory/feedback channel can be characterized by the following stochastic differential equation:
$Y(t) = \int_0^t g(s, W_0^s, Y_0^s)\, ds + B(t), \quad t \in [0, T],$
where $g$ is a function from $[0,T] \times C[0,T] \times C[0,T]$ to $\mathbb{R}$. Note that (6) can be interpreted
(1)
either as a feedback channel, where $W_0^s \triangleq \{W(r) : 0 \le r \le s\}$ can be rewritten as $M$, interpreted as the message to be transmitted through the channel, and $g(s)$ can be rewritten as $X(s)$, interpreted as the channel input, which depends on $M$ and $Y_0^s$, the channel output up to time $s$ that is fed back to the sender,
(2)
or as a memory channel, where $W_0^s$ can be rewritten as $X_0^s$, interpreted as the channel input, $g$ is “part” of the channel, and $Y(t)$, the channel output at time $t$, depends on $X_0^t$ and $Y_0^t$, the channel input and output up to time $t$ that are present in the channel as memory, respectively.
Note that, strictly speaking, the third parameter of $g$ in (6) should be $Y_0^{s-}$, which, however, can be equivalently replaced by $Y_0^s$ due to the continuity of sample paths of $\{Y(t)\}$. Note that, with the presence of feedback/memory, the existence and uniqueness of $Y$ is in fact a tricky mathematical problem; however, we will in this paper simply assume that the input $X$ is appropriately chosen such that $Y$ uniquely exists. For more detailed discussion about the Brownian motion formulation of continuous-time Gaussian channels and preliminaries and known results thereof, we refer the reader to [12].
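To make the channel model (6) concrete, here is a minimal simulation sketch (our own illustration; the drift $g$ and all parameter values are assumptions chosen for demonstration, not taken from the paper). It generates one sample path of the output $Y$ for the simple feedback-type drift $g(s, W_0^s, Y_0^s) = W(s) - Y(s)$ on a fine time grid:

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 1.0, 10_000                 # horizon and number of grid points (illustrative)
dt = T / n

# W: here an independent standard Brownian motion standing in for the message/input path
W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n))))
dB = np.sqrt(dt) * rng.standard_normal(n)   # increments of the channel noise B

Y = np.zeros(n + 1)
for i in range(n):
    drift = W[i] - Y[i]            # assumed illustrative drift g(s, W_0^s, Y_0^s) = W(s) - Y(s)
    Y[i + 1] = Y[i] + drift * dt + dB[i]

print("Y(T) =", Y[-1])
```

Reading $W$ as an input path instead of a message gives the memory-channel interpretation with the same code.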
As opposed to the white Gaussian noise formulation, under the Brownian motion formulation, memory and feedback can be naturally translated to the discrete-time setting: the pathwise continuity of a Brownian motion allows the inheritance of temporal causality when the channel is sampled (see Section 2) or approximated (see Section 3). On the other hand, the white Gaussian noise formulation faces an inherent difficulty as far as inheriting temporal causality is concerned: in converting (1) to (2), while $X_n(\omega)$ are obtained as “time” samples of $X(t)$, $Z_n(\omega)$ are in fact “space” samples of $Z(t)$, as they are merely the coefficients of the (extended) Karhunen–Loève decomposition of $Z(t)$ [20,21,22]; see also [23] for an in-depth discussion on this.
On the other hand, though, as opposed to the white Gaussian noise formulation, a link that establishes the information-theoretic connection between the continuous-time channel (6) and its discrete-time counterparts has long been missing, which may explain why discrete-time and continuous-time information theory (under the Brownian motion formulation) have largely gone separate ways with little interaction for the past several decades. In this paper, we will fill this gap by establishing causality-preserving connections between the channel (5) and its associated discrete-time versions in the forms of sampling and approximation theorems, which we believe will serve as the above-mentioned missing links and play important roles in the long run for further developing continuous-time information theory, particularly for the communication scenarios when feedback/memory is present.
As an immediate application of the approximation theorem, we propose the approximation approach to examine continuous-time Gaussian feedback channels with the average power constraint and infinite bandwidth (again, by comparison, the conventional sampling approach cannot handle feedback). It turns out that this approach, when complemented by relevant tools from stochastic calculus, can greatly enhance our understanding of continuous-time Gaussian channels in terms of giving alternative and strengthened interpretations to the low SNR equivalence in (A), recovering “long known” results (Theorems 7 (for the non-feedback case), 8 and 10) from new and rigorous perspectives, and deriving new results (Theorems 7 (for the feedback case), 9, 11 and 12) inspired by the intuition that the approximation approach carries.
Below, we summarize the contributions of this paper in greater detail.
In Section 2, we prove Theorems 1 and 2, sampling theorems for a continuous-time Gaussian feedback/memory channel, which naturally connect such a channel with its sampled discrete-time versions. In addition, in Section 3, we prove Theorems 3 and 4, the so-called approximation theorems, which connect a continuous-time Gaussian feedback/memory channel with its approximated discrete-time versions (in the sense of the Euler-Maruyama approximation [24]). Roughly speaking, a sampling theorem says that a time-sampled channel is “close” to the original channel if the sampling is fine enough, and an approximation theorem says that an approximated channel is “close” to the original channel if the approximation is fine enough, both in an information-theoretic sense. Note that, as elaborated in Remark 3, a certain version of the approximation theorem boils down to the sampling theorem when there is no memory or feedback in the channel.
Apparently a sampling theorem, whose spirit is in line with the Shannon–Nyquist sampling theorem, is of practical and theoretical value due to the fact that it deals with the “real” values of the channel output; and, as will be elaborated later, approximation theorems seem to be surprisingly useful in a number of respects despite the fact that they only deal with the “approximated” values of the channel output: they can certainly provide alternative rigorous tools for translating results from discrete time to continuous time; more importantly, as elaborated in Section 4, they lay the foundation for the approximation approach, which gives us intuition in the point-to-point continuous-time setting, which will further help us to deliver rigorous treatments of multi-user continuous-time Gaussian channels in Section 5.
More specifically, in Section 5, we derive the capacity regions of a continuous-time white Gaussian multiple access channel (Theorem 7), a continuous-time white Gaussian interference channel (Theorem 8), and a continuous-time white Gaussian broadcast channel (Theorem 10). Here, we note that when there is no feedback, as discussed in Remark 5, the results above are “long known” in the sense that they are roughly suggested by the conventional sampling approach, or alternatively, the low SNR equivalence in (A). However, to the best of our knowledge, explicit formulations and statements of such results are missing in the literature and their rigorous proofs are non-trivial (for instance, when establishing Theorem 10, we have to resort to the continuous-time I-minimum mean square error (MMSE) relationship [25], which has been established only recently). By comparison, the presence of feedback necessitates the use of the approximation approach, which helps us to connect relevant results and proofs in discrete time to analyze how feedback affects the capacity regions of families of continuous-time multi-user one-hop Gaussian channels: feedback will increase the capacity regions of some continuous-time Gaussian broadcast channels (Theorem 12) and interference channels (Theorem 9), while it will not increase the capacity regions of a continuous-time physically degraded Gaussian broadcast channel (Theorem 11) or a continuous-time Gaussian multiple access channel (Theorem 7).

2. Sampling Theorems

A very natural question is whether, similarly to the white Gaussian noise formulation, sampling theorems hold for continuous-time white Gaussian channels under the Brownian motion formulation. In this section, we will establish sampling theorems for the channel (6), which naturally connect such channels with their discrete-time versions obtained by sampling.
Consider the following regularity conditions for channel (6):
(a)
The solution { Y ( t ) } to the stochastic differential Equation (6) uniquely exists;
(b)
$P\!\left(\int_0^T g^2(t, W_0^t, Y_0^t)\, dt < \infty\right) = P\!\left(\int_0^T g^2(t, W_0^t, B_0^t)\, dt < \infty\right) = 1;$
(c)
$\int_0^T E[|g(t, W_0^t, Y_0^t)|]\, dt < \infty.$
Note that all three conditions above are rather weak: Condition (a) is necessary for the channel to be meaningful, and Conditions (b) and (c) are very mild integrability assumptions.
Now, for any $n \in \mathbb{N}$, choose time points $t_{n,0}, t_{n,1}, \ldots, t_{n,n} \in \mathbb{R}$ such that
$0 = t_{n,0} < t_{n,1} < \cdots < t_{n,n-1} < t_{n,n} = T,$
and let $\Delta_n \triangleq \{t_{n,0}, t_{n,1}, \ldots, t_{n,n}\}$. Sampling the channel (6) over the time interval $[0,T]$ with respect to $\Delta_n$, we obtain its sampled discrete-time version as follows:
$Y(t_{n,i}) = \int_0^{t_{n,i}} g(s, W_0^s, Y_0^s)\, ds + B(t_{n,i}), \quad i = 0, 1, \ldots, n.$
For any time-point sequence $\Delta_n$, we will use $\delta_{\Delta_n}$ to denote its maximal stepsize, namely,
$\delta_{\Delta_n} \triangleq \max_{i = 1, 2, \ldots, n} (t_{n,i} - t_{n,i-1}).$
$\Delta_n$ is said to be evenly spaced if $t_{n,i} - t_{n,i-1} = T/n$ for all feasible $i$, and we will use the shorthand notation $\delta_n$ to denote its stepsize, i.e., $\delta_n \triangleq t_{n,1} - t_{n,0} = T/n$. Apparently, evenly spaced time-point sequences are natural candidates with respect to which a continuous-time Gaussian channel can be sampled.
We are primarily concerned with the mutual information for the channel (6), whose standard definition (see, e.g., [12,26]) is given below:
$I(W_0^T; Y_0^T) = \begin{cases} E\!\left[\log \dfrac{d\mu_{WY}}{d(\mu_W \times \mu_Y)}(W_0^T, Y_0^T)\right], & \text{if } \dfrac{d\mu_{WY}}{d(\mu_W \times \mu_Y)} \text{ exists},\\[2mm] \infty, & \text{otherwise}, \end{cases}$
where the subscripted $\mu$ denotes the measure induced on $C[0,T]$ or $C[0,T] \times C[0,T]$ by the corresponding stochastic process and $d\mu_{WY}/d(\mu_W \times \mu_Y)$ denotes the Radon-Nikodym derivative of $\mu_{WY}$ with respect to $\mu_W \times \mu_Y$. Here we note that the same definition with appropriately modified sample spaces applies to the degenerate case when $W_0^T$ and/or $Y_0^T$ are random variables.
Roughly speaking, the following sampling theorem states that for any sequence of “increasingly refined” samplings, the mutual information of the sampled discrete-time channel (7) will converge to that of the original channel (6).
Theorem 1.
Assume Conditions (a)–(c). Suppose that $\Delta_n \subseteq \Delta_{n+1}$ for all $n$ and that $\delta_{\Delta_n} \to 0$ as $n$ tends to infinity. Then, we have
$\lim_{n \to \infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T),$
where $Y(\Delta_n) \triangleq \{Y(t_{n,0}), Y(t_{n,1}), \ldots, Y(t_{n,n})\}$.
Proof. 
The proof is rather technical and thereby postponed to Appendix A. ☐
Regarding the assumptions of Theorem 1, as mentioned before, Conditions (a)–(c) are very weak, but the condition that “$\Delta_n \subseteq \Delta_{n+1}$ for all $n$” is somewhat restrictive, which, in particular, is not satisfied by the set $\{\Delta_n\}$ of all evenly spaced time-point sequences. We next show that this condition can be replaced by some extra regularity conditions: The same theorem holds as long as the stepsize of the sampling tends to $0$, which, in particular, is satisfied by the set of all evenly spaced sampling sequences.
Below and hereafter, defining the distance $\|U_0^s - V_0^t\|$ between $U_0^s$ and $V_0^t$ with $0 \le s \le t$ as
$\|U_0^s - V_0^t\| \triangleq \sup_{r \in [0,s]} |U(r) - V(r)| + \sup_{r \in [s,t]} |U(s) - V(r)|,$
we may assume the following three regularity conditions for the channel (6):
(d)
Uniform Lipschitz condition: There exists a constant $L > 0$ such that for any $0 \le s_1, s_2, s_3, t_1, t_2, t_3 \le T$, any $U_0^T$, $V_0^T$, $Y_0^T$ and $Z_0^T$,
$|g(s_1, U_0^{s_2}, Y_0^{s_3}) - g(t_1, V_0^{t_2}, Z_0^{t_3})| \le L\left(|s_1 - t_1| + \|U_0^{s_2} - V_0^{t_2}\| + \|Y_0^{s_3} - Z_0^{t_3}\|\right);$
(e)
Uniform linear growth condition: There exists a constant $L > 0$ such that for any $W_0^T$ and any $Y_0^T$,
$|g(t, W_0^t, Y_0^t)| \le L\left(1 + \|W_0^t\| + \|Y_0^t\|\right),$
where
$\|W_0^t\| = \sup_{r \in [0,t]} |W(r)|, \qquad \|Y_0^t\| = \sup_{r \in [0,t]} |Y(r)|;$
(f)
Regularity conditions on $W$: There exists $\varepsilon > 0$ such that
$E\!\left[e^{\varepsilon \|W_0^T\|^2}\right] < \infty,$
and for any $K > 0$, there exists $\varepsilon > 0$ such that
$E\!\left[e^{K \sup_{|s-t| \le \varepsilon} (W(s) - W(t))^2}\right] < \infty,$
and there exists a constant $L > 0$ such that for any $\varepsilon > 0$,
$E\!\left[\sup_{|s-t| \le \varepsilon} (W(s) - W(t))^4\right] \le L \varepsilon^2.$
The following lemma, whose proof is postponed to Appendix B, says that Conditions (d)–(f) are stronger than Conditions (a)–(c). We however remark that Conditions (d)–(f) are still rather mild assumptions: The uniform Lipschitz condition, the uniform linear growth condition and their numerous variants are typical assumptions that can guarantee the existence and uniqueness of the solution to a given stochastic differential equation. In theory, these two conditions are considered mild in the sense that there are examples where the corresponding stochastic differential equation may not have solutions at all if these two conditions are not satisfied (see, e.g., [27]). Note that the third condition is a mild integrability condition; as a matter of fact, for a feedback channel where $W$ is interpreted as the message, this condition is trivially satisfied. All three conditions above will be taken for granted in most practical communication situations: as might be expected, the signals employed in practice will be much better behaved.
Lemma 1.
Assume Conditions (d)–(f). Then, there exists a unique strong solution of (6) with initial value $Y(0) = 0$. Moreover, there exists $\varepsilon > 0$ such that
$E\!\left[e^{\varepsilon \|Y_0^T\|^2}\right] < \infty,$
which immediately implies Conditions (b) and (c).
Roughly speaking, the following sampling theorem states that if the stepsizes of the samplings tend to $0$, the mutual information of the channel (7) will converge to that of the channel (6). Note that in this theorem, we do not need the assumption that “$\Delta_n \subseteq \Delta_{n+1}$ for all $n$”, which is required in Theorem 1.
Theorem 2.
Assume Conditions (d)–(f). For any sequence $\{\Delta_n\}$ with $\delta_{\Delta_n} \to 0$ as $n$ tends to infinity, we have
$\lim_{n \to \infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T).$
Proof. 
The proof is rather technical and lengthy, and thereby postponed to Appendix C. We note that, as detailed in Remark A1, the arguments in the proof can be adapted to yield a sampling theorem in estimation theory. ☐

3. Approximation Theorems

In this section, we will establish approximation theorems for the channel (6), which naturally connect such channels with their discrete-time versions obtained by approximation. As elaborated in later sections, the approximation theorem will underpin the approximation approach that will be introduced in Section 4.
An application of the Euler-Maruyama approximation [24] with respect to $\Delta_n$ to (6) will yield a discrete-time sequence $\{Y^{(n)}(t_{n,i}) : i = 0, 1, \ldots, n\}$ and a continuous-time process $\{Y^{(n)}(t) : t \in [0,T]\}$, a linear interpolation of $\{Y^{(n)}(t_{n,i})\}$, as follows: Initializing with $Y^{(n)}(0) = 0$, we recursively compute, for each $i = 0, 1, \ldots, n-1$,
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} g(s, W_0^{t_{n,i}}, Y_0^{(n), t_{n,i}})\, ds + B(t_{n,i+1}) - B(t_{n,i}),$
$Y^{(n)}(t) = Y^{(n)}(t_{n,i}) + \frac{t - t_{n,i}}{t_{n,i+1} - t_{n,i}}\left(Y^{(n)}(t_{n,i+1}) - Y^{(n)}(t_{n,i})\right), \quad t_{n,i} \le t \le t_{n,i+1}.$
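The recursion above is straightforward to implement. The sketch below (our own illustration; the drift $g$ and all numerical values are assumptions, not taken from the paper) computes the Euler-Maruyama values $Y^{(n)}(t_{n,i})$ on an evenly spaced $\Delta_n$ together with the piecewise-linear interpolation $Y^{(n)}(t)$:

```python
import numpy as np

rng = np.random.default_rng(1)

T, n = 1.0, 50                        # evenly spaced Delta_n with stepsize T/n (illustrative)
t = np.linspace(0.0, T, n + 1)        # t_{n,0}, ..., t_{n,n}
dt = T / n

# Assumed illustrative drift, frozen on each subinterval:
# g(s, W_0^{t_{n,i}}, Y_0^{(n),t_{n,i}}) = W(t_{n,i}) - Y^(n)(t_{n,i}), independent of s
W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n))))
dB = np.sqrt(dt) * rng.standard_normal(n)

Y_em = np.zeros(n + 1)
for i in range(n):
    drift = W[i] - Y_em[i]
    Y_em[i + 1] = Y_em[i] + drift * dt + dB[i]   # integral of the frozen drift equals drift * dt

def Y_interp(s):
    """Piecewise-linear interpolation Y^(n)(t) between the grid values."""
    return np.interp(s, t, Y_em)

print("Y^(n)(T) =", Y_em[-1], "   Y^(n)(T/3) ≈", Y_interp(T / 3))
```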
We are now ready to prove the following theorem:
Theorem 3.
Assume Conditions (d)–(f). Then, we have
$\lim_{n \to \infty} I(W_0^T; Y^{(n)}(\Delta_n)) = I(W_0^T; Y_0^T),$
where $Y^{(n)}(\Delta_n) \triangleq \{Y^{(n)}(t_{n,0}), Y^{(n)}(t_{n,1}), \ldots, Y^{(n)}(t_{n,n})\}$.
Proof. 
The proof is rather technical and lengthy, and thereby postponed to Appendix D. We note that, as detailed in Remark A2, the arguments in the proof can be adapted to yield an approximation theorem in estimation theory. ☐
For any $\{\Delta_n\}$, let $W^{(n)}(t)$ denote the piecewise linear version of $W_0^T$ with respect to $\Delta_n$; more precisely, for any $i = 0, 1, \ldots, n$, $W^{(n)}(t_{n,i}) = W(t_{n,i})$, and for any $t_{n,i-1} < s < t_{n,i}$ with $s = \lambda t_{n,i-1} + (1-\lambda) t_{n,i}$ where $0 < \lambda < 1$, $W^{(n)}(s) = \lambda W(t_{n,i-1}) + (1-\lambda) W(t_{n,i})$. The following modified Euler-Maruyama approximation with respect to $\Delta_n$ applied to the channel (6) yields a discrete-time sequence $\{Y^{(n)}(t_{n,i}) : i = 0, 1, \ldots, n\}$ and a continuous-time process $\{Y^{(n)}(t) : t \in [0,T]\}$ as follows: Initializing with $Y^{(n)}(0) = 0$, we recursively compute, for each $i = 0, 1, \ldots, n-1$,
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} g(s, W_0^{(n), t_{n,i}}, Y_0^{(n), t_{n,i}})\, ds + B(t_{n,i+1}) - B(t_{n,i}),$
$Y^{(n)}(t) = Y^{(n)}(t_{n,i}) + \frac{t - t_{n,i}}{t_{n,i+1} - t_{n,i}}\left(Y^{(n)}(t_{n,i+1}) - Y^{(n)}(t_{n,i})\right), \quad t_{n,i} \le t \le t_{n,i+1}.$
Now, using an argument parallel to the proof of Theorem 3, we have the following approximation theorem.
Theorem 4.
Assume Conditions (d)–(f). Then, we have
$\lim_{n \to \infty} I(W^{(n)}(\Delta_n); Y^{(n)}(\Delta_n)) = I(W_0^T; Y_0^T).$
Remark 1.
When the channel (6) is interpreted as a feedback channel, both W ( n ) and W are precisely M. When the channel (6) is interpreted as a memory channel, Theorem 4 states that the mutual information between its input and output is the limit of that of its approximated input and output (in the sense of the above-mentioned modified Euler-Maruyama approximation).
Other variants of the Euler-Maruyama approximation can also be applied to the channel to yield variants of the approximation theorem. For instance, under Conditions (d)–(f), for the following variant of the Euler-Maruyama approximation,
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} g(t_{n,i}, W_0^{t_{n,i}}, Y_0^{(n), t_{n,i}})\, ds + B(t_{n,i+1}) - B(t_{n,i}),$
a parallel argument as in the proof of Theorem 3 will give the following variant of Theorem 3:
Theorem 5.
Assume Conditions (d)–(f). Then, we have
$\lim_{n \to \infty} I(W_0^T; Y^{(n)}(\Delta_n)) = I(W_0^T; Y_0^T).$
Moreover, for
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} g(t_{n,i}, W_0^{(n), t_{n,i}}, Y_0^{(n), t_{n,i}})\, ds + B(t_{n,i+1}) - B(t_{n,i}),$
we have the following variant of Theorem 4:
Theorem 6.
Assume Conditions (d)–(f). Then, we have
$\lim_{n \to \infty} I(W^{(n)}(\Delta_n); Y^{(n)}(\Delta_n)) = I(W_0^T; Y_0^T).$
Regarding the approximation theorem and its variants, we make the following remarks.
Remark 2.
Continuous-time directed information has been defined in [18] for continuous-time white Gaussian channels with positively delayed feedback. In this remark, we show that our approximation theorem can be used to give an alternative definition of continuous-time directed information, even for the case where the feedback is instantaneous.
Consider the following continuous-time Gaussian feedback channel:
$Y(t) = \int_0^t X(s, M, Y_0^s)\, ds + B(t), \quad t \in [0, T].$
For any $\Delta_n$, we define $\tilde{X}^{(n)}(\cdot)$ as follows: for any $t$ with $t_{n,i} \le t < t_{n,i+1}$,
$\tilde{X}^{(n)}(t) = \sum_{j=0}^{i-1} \int_{t_{n,j}}^{t_{n,j+1}} X(s, M, Y_0^{(n), t_{n,j}})\, ds + \int_{t_{n,i}}^{t} X(s, M, Y_0^{(n), t_{n,i}})\, ds.$
Writing $\tilde{X}(t, M, Y_0^{(n), t})$ as $\tilde{X}^{(n)}(t)$ for simplicity, (11) can be rewritten as
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \tilde{X}^{(n)}(t_{n,i+1}) - \tilde{X}^{(n)}(t_{n,i}) + B(t_{n,i+1}) - B(t_{n,i}),$
for which it can be readily checked that
$I(\tilde{X}^{(n)}(\Delta_n) \to Y^{(n)}(\Delta_n)) = I(M; Y^{(n)}(\Delta_n)).$
Theorem 3 and the above observation can be used to define continuous-time directed mutual information. To be more precise, the continuous-time directed information from X 0 T to Y 0 T of the channel (19) can be defined as
$I(X_0^T \to Y_0^T) \triangleq \lim_{n \to \infty} I(\tilde{X}^{(n)}(\Delta_n) \to Y^{(n)}(\Delta_n)).$
Consider the following continuous-time Gaussian channel with possibly delayed feedback:
$Y(t) = \int_0^t X(s, M, Y_0^{s-D})\, ds + B(t), \quad t \in [0, T],$
where $D \ge 0$ denotes the delay of the feedback. In [18], the notion of continuous-time directed information from $X_0^T$ to $Y_0^T$ is defined as follows:
$I_D(X_0^T \to Y_0^T) = \inf_{\Delta_n} \sum_{i=1}^{n} I\!\left(X_{t_{n,0}}^{t_{n,i}}; Y_{t_{n,i-1}}^{t_{n,i}} \,\middle|\, Y_{t_{n,0}}^{t_{n,i-1}}\right).$
It is proven that for the case D > 0 , using this notion, a connection between information theory and estimation theory can be established as follows:
$I_D(X_0^T \to Y_0^T) = \frac{1}{2} \int_0^T E\!\left[(X(t) - E[X(t) \mid Y_0^t])^2\right] dt.$
On the other hand though, it is easy to see that for the case $D = 0$, i.e., when there is no delay in the feedback as in (19), the definition in (23) and the equality in (24) may run into some problems: Consider the extreme scenario and choose $X(t) = Y(t)$ for any feasible $t$; then clearly the right-hand side of (24) should be equal to $0$. On the other hand, though, for the left-hand side, each small interval in (23) will yield
$I_{D=0}\!\left(X_{t_{n,0}}^{t_{n,i+1}}; Y_{t_{n,i}}^{t_{n,i+1}} \,\middle|\, Y_{t_{n,0}}^{t_{n,i}}\right) = I_{D=0}\!\left(Y_{t_{n,i}}^{t_{n,i+1}}; Y_{t_{n,i}}^{t_{n,i+1}} \,\middle|\, Y_{t_{n,0}}^{t_{n,i}}\right) = I_{D=0}\!\left(Y_{t_{n,i}}^{t_{n,i+1}}; Y_{t_{n,i}}^{t_{n,i+1}} \,\middle|\, Y(t_{n,i})\right),$
where for the last equality, we have used the fact that under the assumption that $X(t) = Y(t)$, $\{Y(t)\}$ is an Ornstein–Uhlenbeck process, which is a Gaussian Markov process. Noting that given $Y(t_i) = y(t_i)$, the Radon-Nikodym derivative $d\mu_{Y_{t_i}^{t_{i+1}} Y_{t_i}^{t_{i+1}}} / d\!\left(\mu_{Y_{t_i}^{t_{i+1}}} \times \mu_{Y_{t_i}^{t_{i+1}}}\right)$ does not exist, we conclude, by referring to the definition in (8), that
$I_{D=0}\!\left(Y_{t_{n,i}}^{t_{n,i+1}}; Y_{t_{n,i}}^{t_{n,i+1}} \,\middle|\, Y(t_{n,i})\right) = \infty, \quad \text{and thereby,} \quad I_{D=0}\!\left(X_{t_{n,0}}^{t_{n,i+1}}; Y_{t_{n,i}}^{t_{n,i+1}} \,\middle|\, Y_{t_{n,0}}^{t_{n,i}}\right) = \infty,$
which further implies that $I_{D=0}(X_0^T \to Y_0^T)$, the left-hand side of (24) at $D = 0$, is infinite, a contradiction. On the other hand, be it the case $D > 0$ or $D = 0$, with the definition in (21), Theorem 3 however promises:
$I(X_0^T \to Y_0^T) = I(M; Y_0^T) = \frac{1}{2} \int_0^T E\!\left[(X(t) - E[X(t) \mid Y_0^t])^2\right] dt.$
Remark 3.
When there is no feedback or memory, Theorem 3 boils down to Theorem 2: obviously we will have for any feasible i
$Y^{(n)}(t_{n,i}) = Y(t_{n,i}),$
which means that Theorem 3 actually states
$\lim_{n \to \infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T),$
which is precisely the conclusion of Theorem 2. Moreover, by Remark 2, we also have
$\lim_{n \to \infty} I(\tilde{X}^{(n)}(\Delta_n); Y(\Delta_n)) = \lim_{n \to \infty} I(\tilde{X}^{(n)}(\Delta_n) \to Y(\Delta_n)) = I(W_0^T; Y_0^T).$
Remark 4.
In this remark, we briefly discuss the possible applications of our sampling and approximation theorems, both of which we believe will play important roles in the long run for further developing continuous-time information theory, particularly for scenarios where feedback and memory are present.
Taking advantage of the pathwise continuity of a Brownian motion, our sampling theorems, Theorems 1 and 2, naturally connect continuous-time Gaussian memory/feedback channels with their discrete-time counterparts, whose outputs are precisely sampled outputs of the original continuous-time Gaussian channel. In discrete time, the Shannon–McMillan–Breiman theorem provides an effective way to approximate the entropy rate of a stationary ergodic process, and numerical computation and optimization of the mutual information of discrete-time channels using the Shannon–McMillan–Breiman theorem and its extensions have been extensively studied (see, e.g., [28,29] and references therein), which suggests that our sampling theorems may well serve as a bridge to capitalize on relevant results in discrete time to numerically compute and optimize the mutual information of continuous-time Gaussian channels. In short, despite numerous technical barriers that one needs to overcome, we believe that in the long run the sampling theorems can help us in terms of numerically computing the mutual information and capacity of continuous-time Gaussian channels.
By comparison, our approximation theorems, Theorems 3 and 4, are somewhat “artificial” in the sense that the outputs of the associated discrete-time channels are only approximated outputs of the original continuous-time channels. Nonetheless, as the Euler-Maruyama approximation of a continuous-time channel yields the form that a discrete-time channel typically takes, our approximation theorems allow a smooth translation of results and ideas from the discrete-time setting to the continuous-time setting. As a result, the approximation theorems underpin the so-called approximation approach (to be introduced in Section 4) and readily yield results for continuous-time Gaussian channels in the multi-user setting, which will be elaborated in the following sections.

4. The Approximation Approach

Consider the following continuous-time white Gaussian channel with feedback
$Y(t) = \int_0^t X(s, M, Y_0^s)\, ds + B(t), \quad t \ge 0,$
satisfying the power constraint: there exists P > 0 such that for any T, with probability 1
$\frac{1}{T} \int_0^T X^2(s, M, Y_0^s)\, ds \le P.$
As mentioned in Section 1, it is well known that the capacity of the above channel is P / 2 (The same result can be established under alternative power constraints; see, e.g., [12]).
As elaborated in Section 1, when there is no feedback, the channel (25) is actually equivalent to (1), and one can “derive” the non-feedback capacity heuristically using the conventional sampling approach as in (1)–(4). However, this approach is unable to tackle feedback, since an application of the Shannon–Nyquist sampling theorem will destroy the temporal causality.
In this section, we use our approximation theorems to give an alternative way to “derive” the capacity of (25), which will be referred to as the approximation approach in the remainder of the paper. Compared to the sampling approach, the approximation approach can handle feedback due to the fact that the Euler-Maruyama approximation preserves temporal causality. Below we briefly explain this new approach, which will be further developed and used, either heuristically or rigorously, in Section 5, where multiple users may be involved in a communication system.
For fixed T > 0 , consider the evenly spaced sequence Δ n with stepsize δ n = T / n . Applying the Euler-Maruyama approximation (15) to the channel (25) over the time window [ 0 , T ] , we obtain
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} X(t_{n,i}, M, Y_0^{(n), t_{n,i}})\, ds + B(t_{n,i+1}) - B(t_{n,i}).$
By Theorem 5, we have
$I(M; Y_0^T) = \lim_{n \to \infty} I(M; Y^{(n)}(\Delta_n)).$
Our strategy is to “establish” the capacity for the discrete-time channel (27) first, and then the capacity for the continuous-time channel (25) using the “closeness” between the two channels, as claimed by approximation theorems.
For the converse part, we first note that
$I(M; Y^{(n)}(\Delta_n)) = \sum_{i=1}^{n} h\!\left(Y^{(n)}(t_{n,i}) - Y^{(n)}(t_{n,i-1}) \,\middle|\, Y_{t_{n,0}}^{(n), t_{n,i-1}}\right) - \sum_{i=1}^{n} h\!\left(B(t_{n,i}) - B(t_{n,i-1})\right) \le \sum_{i=1}^{n} h\!\left(Y^{(n)}(t_{n,i}) - Y^{(n)}(t_{n,i-1})\right) - \sum_{i=1}^{n} h\!\left(B(t_{n,i}) - B(t_{n,i-1})\right).$
It then follows from the fact
$\mathrm{Var}\!\left(Y^{(n)}(t_{n,i}) - Y^{(n)}(t_{n,i-1})\right) = E\!\left[\left(Y^{(n)}(t_{n,i}) - Y^{(n)}(t_{n,i-1})\right)^2\right] = E\!\left[\delta_n^2 \left(X^{(n)}(t_{n,i-1})\right)^2\right] + E\!\left[\left(B(t_{n,i}) - B(t_{n,i-1})\right)^2\right] = E\!\left[\delta_n^2 \left(X^{(n)}(t_{n,i-1})\right)^2\right] + \delta_n,$
that
$I(M; Y^{(n)}(\Delta_n)) \le \frac{1}{2} \sum_{i=0}^{n} \log\!\left(1 + \delta_n E\!\left[\left(X^{(n)}(t_{n,i})\right)^2\right]\right)$
$\le \frac{1}{2} \sum_{i=0}^{n} \delta_n E\!\left[\left(X^{(n)}(t_{n,i})\right)^2\right],$
which, by (28), immediately yields
$I(M; Y_0^T) \le \frac{1}{2} \int_0^T E[X^2(s)]\, ds \le \frac{PT}{2},$
which establishes the converse part.
For the achievability part, note that if we assume all $X^{(n)}(t_{n,i-1})$ are independent of the Brownian motion $B$ with $E[(X^{(n)}(t_{n,i-1}))^2] = P$, then the inequalities in (29) and (30) will become equalities. Achievability then follows from a usual random coding argument with codes generated by the distribution of $X^{(n)}$ (or more precisely, a linear interpolation of $X^{(n)}$). It is clear that as $n$ tends to infinity, the process $X^{(n)}$ behaves increasingly like a white Gaussian process. This observation echoes Theorem 6.4.1 in [12], whose proof rigorously shows that an Ornstein–Uhlenbeck process that oscillates “extremely” fast will achieve the capacity of (25).
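The converse chain above can also be checked numerically. The following sketch (our own illustration with arbitrarily chosen values of $P$ and $T$, not from the paper) sums the per-step terms $\frac{1}{2}\log(1 + \delta_n P)$ that arise when $E[(X^{(n)}(t_{n,i}))^2] = P$, and shows that the sum approaches $PT/2$ as the stepsize shrinks:

```python
import numpy as np

P, T = 2.0, 5.0                      # assumed power limit and horizon (illustrative values)

for n in [10, 100, 1000, 10000]:
    delta = T / n
    bound = 0.5 * n * np.log(1.0 + delta * P)   # sum_i (1/2) log(1 + delta_n * E[X^2])
    print(f"n = {n:6d}:  per-step sum = {bound:.6f}")

print(f"P*T/2 = {P * T / 2:.6f}")
```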
Roughly speaking, similar to the conventional sampling approach, the above approximation approach establishes a continuous-time Gaussian feedback channel as the limit of the associated discrete-time channels as the SNR for each channel use shrinks to zero proportionately (note that in the above arguments, the SNR for each channel use is of order $\delta_n$). In other words, we have strengthened the low SNR equivalence in (A) as follows:
a continuous-time infinite-bandwidth Gaussian channel with feedback is “equivalent” to a discrete-time Gaussian channel with feedback at low SNR.
(B)
We remark, however, that for the purpose of deriving the capacity of (25), the approximation approach, like the conventional sampling approach, is heuristic in nature: Theorem 3 does require Conditions (d)–(f), which are much stronger than the power constraint (26). Nevertheless, this approach is of fundamental importance to our treatment of continuous-time Gaussian channels: as elaborated in Section 5, not only can it channel the ideas and techniques in discrete time to rigorously establish new results in continuous time, more importantly, it can also provide insights and intuition for our rigorous treatments, where we will employ established tools and develop new tools in stochastic calculus.

5. Continuous-Time Multi-User Gaussian Channels

Extending Shannon’s fundamental theorems on point-to-point communication channels to general networks with multiple sources and destinations, network information theory aims to establish the fundamental limits on information flows in networks and the optimal coding schemes that achieve these limits. The vast majority of research on network information theory to date has focused on networks in discrete time. In a way, this phenomenon can find its source in Shannon’s original treatment of continuous-time point-to-point channels, where such channels were examined through their associated discrete-time versions. This insightful viewpoint has exerted major influence on the bulk of the related literature on continuous-time Gaussian channels, oftentimes prompting a model shift from the continuous-time setting to the discrete-time one right from the beginning of a research attempt.
The primary focus of this section is to illustrate the possible applications of the approximation approach: (1) Guided by this approach, we will rigorously derive the capacity regions of families of continuous-time multi-user one-hop white Gaussian channels, including continuous-time multi-user white Gaussian multiple access channels (MACs) and broadcast channels (BCs). To deliver the rigorous proofs of our results, we will directly work within the continuous-time setting, employing established tools and developing new tools (see Theorems A3 and A5) in stochastic calculus to complement the approximation approach; (2) We can also rigorously apply this approach to examine how feedback affects the capacity regions of the above-mentioned channels via translations of results and techniques in discrete time. It turns out that some results can be translated from the discrete-time setting to the continuous-time setting, such as that feedback increases the capacity region of Gaussian BCs, and that feedback does not increase the capacity region of physically degraded BCs. Nevertheless, there is a seeming “exception”: as opposed to discrete-time Gaussian MACs, feedback does not increase the capacity regions of some continuous-time Gaussian MACs, which, somewhat surprisingly, can also be explained by the approximation approach.
Below, we summarize the results in this section. To put our results into a relevant context, we will first list some related results in discrete time, and for obvious reasons, we can only list those that are most relevant to ours.
Gaussian MACs. When there is no feedback, the capacity region of a discrete-time memoryless MAC is relatively better understood: a single-letter characterization has been established by Ahlswede [30] and the capacity region of a Gaussian MAC was explicitly derived in Wyner [31] and Cover [32]. On the other hand, the capacity region of MACs with feedback still demands more complete understanding, despite several decades of great effort by many authors: Cover and Leung [33] derived an achievable region for a memoryless MAC with feedback. In [34], Willems showed that Cover and Leung’s region is optimal for a class of memoryless MACs with feedback where one of the inputs is a deterministic function of the output and the other input. More recently, Bross and Lapidoth [35] improved Cover and Leung’s region, and Wu et al. [36] extended Cover and Leung’s region for the case where non-causal state information is available at both senders. An interesting result has been obtained by Ozarow [37], who derived the capacity region of a memoryless Gaussian MAC with two users via a modification of the Schalkwijk-Kailath scheme [38]; moreover, Ozarow’s result showed that in general, the capacity region for a discrete memoryless MAC is increased by feedback.
In Section 5.1, guided by the approximation approach, we first establish Lemma A3, a key lemma which roughly says that “other users can simply be treated as noise”, and we then employ established tools from stochastic calculus to derive the capacity region of a continuous-time white Gaussian MAC with $m$ senders and with/without feedback. It turns out that for such a channel, feedback does not increase the capacity region, which, at first sight, may seem at odds with the aforementioned Ozarow’s result and the conclusion of our approximation theorems. This however can be roughly explained by the well-known fact that “$>$” may become “$=$” when taking the limit (indeed, $a_n > b_n$ does not necessarily imply $\lim_n a_n > \lim_n b_n$); see Remark 6 for a more detailed explanation.
Gaussian ICs. The capacity regions of discrete-time Gaussian ICs are largely unknown except for certain special scenarios: The capacity region of Gaussian ICs with strong interference has been established in Sato [39], Han and Kobayashi [40]. The sum-capacity of Gaussian ICs with weak interference has been simultaneously derived in [41,42,43]. The half-bit theorem on the tightness of the Han-Kobayashi bound [40] was proven in [44]. The approximation of the Gaussian IC by the q-ary expansion deterministic channel was first proposed by Avestimehr, Diggavi, and Tse [45]. Outer and inner bounds on the feedback capacity region of Gaussian interference channels are established by Suh and Tse [46]. Note that all the above-mentioned work deal with ICs with two pairs of senders and receivers. For more than two user pairs, special classes of Gaussian ICs have been examined using the scheme of interference alignment; see an extensive list of references in [47].
In Section 5.2, using a similar approach to the one we developed for continuous-time Gaussian MACs, we derive the capacity region of a continuous-time white Gaussian IC with $m$ pairs of senders and receivers and without feedback. In addition, we also use a translated version of the argument in [46] and the approximation approach to show that feedback does increase the capacity region of certain continuous-time white Gaussian ICs.
Gaussian BCs. The capacity regions of discrete-time Gaussian BCs without feedback are well known [48,49]. In addition, it has been shown by El Gamal [50] that feedback cannot increase the capacity region of a physically degraded Gaussian BC. On the other hand, it was shown by Ozarow and Leung [51] that feedback can increase the capacity of stochastically degraded Gaussian BCs, whose capacity regions are far less understood.
In Section 5.3, we first establish a continuous-time version of the entropy power inequality (Theorem A5) and then derive the capacity region of a continuous-time Gaussian non-feedback BC with $m$ receivers. Employing the approximation approach, we use a modified argument of [50] to show that feedback does not increase the capacity region of a physically degraded continuous-time Gaussian BC, and, on the other hand, a translated version of the argument in [51] to show that feedback does increase the capacity region of certain continuous-time Gaussian BCs.
Here we remark that the above-mentioned capacity results for the non-feedback case (Theorems 7 (for the non-feedback case) and 10) are “long known” in the sense that they are “predicted” by the conventional sampling approach and their proofs follow from the usual framework. On the other hand, though, explicit formulations and statements of these results and their rigorous and complete proofs, to the best of our knowledge, do not exist in the literature. The reason, we believe, is that there are a number of technical difficulties that one has to overcome to prove such results: Lemmas A3 and A5 (which is based on the I-MMSE relationship that has only been established in [25]) are newly developed in this work and their proofs are non-trivial.
In contrast, the approximation approach can be applied to continuous-time Gaussian feedback channels, either heuristically or rigorously. More specifically, it can be heuristically applied to “explain” Theorems 7 (on feedback capacity) and 10 and give us intuition (as elaborated in Remark 6, it helps “predict” the optimal channel input distribution, which has been made rigorous in Lemma A3), and it can also be applied to establish Theorems 11 and 12.

5.1. Gaussian MACs

Consider a continuous-time white Gaussian MAC with m users, which can be characterized by
$Y(t) = \int_0^t X_1(s, M_1, Y_0^s)\, ds + \int_0^t X_2(s, M_2, Y_0^s)\, ds + \cdots + \int_0^t X_m(s, M_m, Y_0^s)\, ds + B(t), \quad t \ge 0,$
where $X_i$ is the channel input from sender $i$, which depends on $M_i$, the message sent by sender $i$ (independent of the messages from all other senders), and possibly on the feedback $Y_0^s$, the channel output up to time $s$.
For $T, R_1, \ldots, R_m, P_1, \ldots, P_m > 0$, a $(T, (e^{TR_1}, \ldots, e^{TR_m}), (P_1, \ldots, P_m))$-code for the MAC (32) consists of $m$ sets of integers $\mathcal{M}_i = \{1, 2, \ldots, e^{TR_i}\}$, the message alphabet for user $i$, $i = 1, 2, \ldots, m$, and $m$ encoding functions, $X_i: \mathcal{M}_i \to C[0,T]$, which satisfy the following power constraint: for any $i = 1, 2, \ldots, m$, with probability 1,
$\frac{1}{T} \int_0^T X_i^2(s, M_i, Y_0^s)\, ds \le P_i,$
and a decoding function,
$g: C[0,T] \to \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m.$
The average probability of error for the above code is defined as
$P_e^{(T)} = \frac{1}{e^{T(\sum_{i=1}^m R_i)}} \sum_{(M_1, M_2, \ldots, M_m) \in \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m} P\{g(Y_0^T) \ne (M_1, M_2, \ldots, M_m) \mid (M_1, M_2, \ldots, M_m) \text{ sent}\}.$
A rate tuple $(R_1, R_2, \ldots, R_m)$ is said to be achievable for the MAC if there exists a sequence of $(T, (e^{TR_1}, \ldots, e^{TR_m}), (P_1, \ldots, P_m))$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$. The capacity region of the MAC is the closure of the set of all achievable $(R_1, R_2, \ldots, R_m)$ rate tuples.
The following theorem, whose proof is postponed to Appendix E, gives an explicit characterization of the capacity region of (32).
Theorem 7.
Whether there is feedback or not, the capacity region of the continuous-time white Gaussian MAC (32) is
$\{(R_1, R_2, \ldots, R_m) \in \mathbb{R}_+^m : R_i \le P_i/2, \; i = 1, 2, \ldots, m\}.$
Remark 5.
When there is no feedback, Theorem 7 can be heuristically explained using the sampling approach as in (2)–(4) (this heuristic approach in this example should be well known; see, e.g., Exercise 15.26 in [2]).
For simplicity only, we consider the following continuous-time white Gaussian multiple access channel with two senders:
$Y(t) = X_1(t) + X_2(t) + Z(t), \quad t \in \mathbb{R},$
where $X_i$, $i = 1, 2$, is the input from the $i$-th user with average power limit $P_i$. Just as before, consider its associated discrete-time version corresponding to bandwidth limit $\omega$:
$Y_n = X_{1,n}(\omega) + X_{2,n}(\omega) + Z_n(\omega), \quad n \in \mathbb{Z}.$
Then, it is well known [47] that the outer bound on the capacity region can be computed as
$\left\{(R_1, R_2) \in \mathbb{R}_+^2 : R_1 \le \omega \log\left(1 + \frac{P_1}{2\omega}\right), \; R_2 \le \omega \log\left(1 + \frac{P_2}{2\omega}\right)\right\},$
and the inner bound as
$\left\{(R_1, R_2) \in \mathbb{R}_+^2 : R_1 \le \omega \log\left(1 + \frac{P_1}{2\omega}\right), \; R_2 \le \omega \log\left(1 + \frac{P_2}{2\omega}\right), \; R_1 + R_2 \le \omega \log\left(1 + \frac{P_1 + P_2}{2\omega}\right)\right\}.$
(Here, it is known [31,32] that the outer bound can be tightened to coincide with the inner bound, which, however, is not needed for this example.) It is easy to verify that the two bounds also collapse into the same region as ω tends to infinity:
$\left\{(R_1, R_2) \in \mathbb{R}_+^2 : R_1 \le P_1/2, \; R_2 \le P_2/2\right\},$
which is “expected” to be the capacity region of (34); or alternatively, one can apply the low SNR equivalence in (A) and take the limit as P tends to 0, reaching the same conclusion. Note that similar arguments hold for more than two senders as well through a parallel extension.
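For completeness, here is the one-line computation (our own elaboration of the step just described) showing why the sum-rate constraint in the inner bound becomes non-binding in the limit:
$\lim_{\omega \to \infty} \omega \log\left(1 + \frac{P_1 + P_2}{2\omega}\right) = \frac{P_1 + P_2}{2} = \frac{P_1}{2} + \frac{P_2}{2},$
so the rectangle $\{R_1 \le P_1/2, \; R_2 \le P_2/2\}$ already satisfies the limiting sum-rate constraint, and both bounds collapse to it.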
Remark 6.
When the feedback is present in the channel, the approximation approach, rather than the conventional sampling approach, is necessary to explain Theorem 7.
Again, for simplicity only, we consider the following continuous-time Gaussian MAC with two senders:
$Y(t) = \int_0^t X_1(s, M_1, Y_0^s)\, ds + \int_0^t X_2(s, M_2, Y_0^s)\, ds + B(t), \quad t \ge 0,$
with the power constraints: there exist P 1 , P 2 > 0 such that for all T,
$\int_0^T X_1^2(s, M_1, Y_0^s)\, ds \le P_1 T, \qquad \int_0^T X_2^2(s, M_2, Y_0^s)\, ds \le P_2 T.$
Applying the Euler-Maruyama approximation to the above channel over the time window [ 0 , T ] with respect to the evenly spaced Δ n with δ n = T / n , we obtain
$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} X_1(s, M_1, Y_0^{(n), t_{n,i}})\, ds + \int_{t_{n,i}}^{t_{n,i+1}} X_2(s, M_2, Y_0^{(n), t_{n,i}})\, ds + B(t_{n,i+1}) - B(t_{n,i}).$
Now, straightforward computations and a usual concavity argument yield that for large $n$,
$I(M_1; Y^{(n)}(\Delta_n) \mid M_2) = h(Y^{(n)}(\Delta_n) \mid M_2) - h(Y^{(n)}(\Delta_n) \mid M_1, M_2) = \sum_{i=1}^{n} h\!\left(Y^{(n)}(t_{n,i}) \,\middle|\, Y_{t_{n,0}}^{(n), t_{n,i-1}}, M_2\right) - \sum_{i=1}^{n} h\!\left(Y^{(n)}(t_{n,i}) \,\middle|\, Y_{t_{n,0}}^{(n), t_{n,i-1}}, M_1, M_2\right) \le \sum_{i=1}^{n} \left[\frac{1}{2} \log\!\left(E\!\left[\left(\int_{t_{n,i-1}}^{t_{n,i}} X_1(s, M_1, Y_0^{(n), t_{n,i-1}})\, ds\right)^{2}\right] + \delta_n\right) - \frac{1}{2}\log(\delta_n)\right] \le \sum_{i=1}^{n} \left[\frac{1}{2} \log\!\left(\delta_n \int_{t_{n,i-1}}^{t_{n,i}} E\!\left[X_1(s, M_1, Y_0^{(n), t_{n,i-1}})^{2}\right] ds + \delta_n\right) - \frac{1}{2}\log(\delta_n)\right] = \sum_{i=1}^{n} \frac{1}{2} \log\!\left(\int_{t_{n,i-1}}^{t_{n,i}} E\!\left[X_1(s, M_1, Y_0^{(n), t_{n,i-1}})^{2}\right] ds + 1\right) \le \sum_{i=1}^{n} \frac{1}{2} \int_{t_{n,i-1}}^{t_{n,i}} E\!\left[X_1(s, M_1, Y_0^{(n), t_{n,i-1}})^{2}\right] ds \approx \sum_{i=1}^{n} \frac{1}{2} \int_{t_{n,i-1}}^{t_{n,i}} E\!\left[X_1(s, M_1, Y_0^{s})^{2}\right] ds \le \frac{P_1 T}{2}.$
A completely parallel argument will yield that
$I(M_2; Y^{(n)}(\Delta_n) \mid M_1) \le \frac{P_2 T}{2}.$
It then follows from Theorem 3 that the region below gives an outer bound on the capacity region:
$\{(R_1, R_2) : 0 \le R_1 \le P_1/2, \; 0 \le R_2 \le P_2/2\}.$
To see that this outer bound can be achieved, set $X_1(s), X_2(s)$, $t_{n,i} \le s \le t_{n,i+1}$, in (35) to be independent Gaussian random variables with variances $P_1, P_2$, respectively. Then, one verifies that for large $n$,
$I(M_1; Y(\Delta_n)) = h(Y(\Delta_n)) - h(Y(\Delta_n) \mid M_1) = \sum_{i=1}^{n} h(Y(t_{n,i})) - \sum_{i=1}^{n} h(Y(t_{n,i}) \mid M_1) = \sum_{i=1}^{n} \left[\frac{1}{2} \log\!\left(P_1 \delta_n^2 + P_2 \delta_n^2 + \delta_n\right) - \frac{1}{2} \log\!\left(P_2 \delta_n^2 + \delta_n\right)\right]$
$= \sum_{i=1}^{n} \frac{1}{2} \log\!\left(1 + \frac{P_1 \delta_n^2}{P_2 \delta_n^2 + \delta_n}\right)$
$\approx P_1 T / 2,$
where we have used the fact that $\delta_n$ is “close” to $0$ for large enough $n$ in (40), which, in parallel with Remark 5, can alternatively be explained by taking $P_1$ to $0$ and then applying the low SNR equivalence in (B). With a similar argument, one can prove that
$I(M_2; Y(\Delta_n)) \approx P_2 T / 2.$
It then follows that the outer bound in (37) can be achieved.
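As a quick numerical check of the achievability computation above (our own illustration; the power limits and horizon below are arbitrary assumed values), the sum in the second-to-last display indeed approaches $P_1 T / 2$ as the stepsize shrinks:

```python
import numpy as np

P1, P2, T = 1.0, 3.0, 4.0            # assumed power limits and horizon (illustrative values)

for n in [10, 100, 1000, 100000]:
    d = T / n
    rate1 = 0.5 * n * np.log(1.0 + P1 * d**2 / (P2 * d**2 + d))   # per-step terms summed
    print(f"n = {n:7d}:  I(M1; Y(Delta_n)) ≈ {rate1:.6f}")

print(f"P1*T/2 = {P1 * T / 2:.6f}")
```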
Here we remark that, similarly as in Section 4, for $n$ large enough, the constructed processes $X_1$ and $X_2$ behave like “fast-oscillating” Ornstein–Uhlenbeck processes, and moreover, from (38) and (39), one can tell that for one user to achieve the maximum transmission rate, the other user can simply be ignored. Predicting the optimal channel input, these facts echo Remark A3 and give another explanation of Lemma A3, a key lemma in our rigorous proof of Theorem 7 in Appendix E.

5.2. Gaussian ICs

Consider the following continuous-time white Gaussian interference channel having no feedback and with m pairs of senders and receivers: for i = 1 , 2 , , m ,
$Y_i(t) = a_{i1} \int_0^t X_1(s, M_1)\, ds + a_{i2} \int_0^t X_2(s, M_2)\, ds + \cdots + a_{im} \int_0^t X_m(s, M_m)\, ds + B_i(t), \quad t \ge 0,$
where $X_i$ is the channel input from sender $i$, which depends on $M_i$, the message sent by sender $i$ (independent of the messages from all other senders), $a_{ij} \in \mathbb{R}$, $i, j = 1, 2, \ldots, m$, is the channel gain from sender $j$ to receiver $i$, and all $B_i(t)$ are (possibly correlated) standard Brownian motions.
For $T, R_1, \ldots, R_m, P_1, \ldots, P_m > 0$, a $(T, (e^{TR_1}, \ldots, e^{TR_m}), (P_1, \ldots, P_m))$-code for the IC (41) consists of $m$ sets of integers $\mathcal{M}_i = \{1, 2, \ldots, e^{TR_i}\}$, the message alphabet for user $i$, $i = 1, 2, \ldots, m$, and $m$ encoding functions, $X_i: \mathcal{M}_i \to C[0,T]$, satisfying the following power constraint: for any $i = 1, 2, \ldots, m$, with probability 1,
$\frac{1}{T} \int_0^T X_i^2(s, M_i)\, ds \le P_i,$
and $m$ decoding functions, $g_i: C[0,T] \to \mathcal{M}_i$, $i = 1, 2, \ldots, m$.
The average probability of error for the $(T, (e^{TR_1}, \ldots, e^{TR_m}), (P_1, \ldots, P_m))$-code is defined as
$P_e^{(T)} = \frac{1}{e^{T(\sum_{i=1}^m R_i)}} \sum_{(M_1, M_2, \ldots, M_m) \in \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m} P\{g_i(Y_{i,0}^T) \ne M_i \text{ for some } i = 1, 2, \ldots, m \mid (M_1, M_2, \ldots, M_m) \text{ sent}\}.$
A rate tuple $(R_1, R_2, \ldots, R_m)$ is said to be achievable for the IC if there exists a sequence of $(T, (e^{TR_1}, \ldots, e^{TR_m}), (P_1, \ldots, P_m))$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$. The capacity region of the IC is the closure of the set of all achievable $(R_1, R_2, \ldots, R_m)$ rate tuples.
The following theorem explicitly characterizes the capacity region of the above IC, whose proof has been postponed to Appendix F.
Theorem 8.
The capacity region of the continuous-time white Gaussian IC (41) is
$\{(R_1, R_2, \ldots, R_m) \in \mathbb{R}_+^m : R_i \le a_{ii}^2 P_i / 2, \; i = 1, 2, \ldots, m\}.$
Remark 7.
Theorem 8 can be heuristically derived using a similar argument employing the approximation approach as in Remark 6.
With the explicit non-feedback capacity region stated in Theorem 8, we are now ready to use the approximation approach to analyze the effects of feedback on continuous-time Gaussian ICs.
The following theorem says that feedback does help continuous-time Gaussian ICs, whose proof uses a translated version of the argument in [46] coupled with the approximation approach as in Section 4, and so we only provide a sketch of the proof.
Theorem 9.
Feedback strictly increases the capacity region of certain continuous-time Gaussian interference channels.
Proof. 
Consider the following symmetric continuous-time Gaussian interference channel with two pairs of senders and receivers:
$Y_1(t) = \sqrt{\mathit{snr}} \int_0^t X_1(s)\, ds + \sqrt{\mathit{inr}} \int_0^t X_2(s)\, ds + B_1(t),$
$Y_2(t) = \sqrt{\mathit{inr}} \int_0^t X_1(s)\, ds + \sqrt{\mathit{snr}} \int_0^t X_2(s)\, ds + B_2(t),$
where $\mathit{snr}$ and $\mathit{inr}$ denote the signal-to-noise and interference-to-noise ratios, respectively, $B_1(t), B_2(t)$ are independent standard Brownian motions, and the average powers of $X_1, X_2$ are assumed to be $1$.
Following [46], we consider the following coding scheme over two stages, each of length $T_0$. In the first stage, transmitters 1 and 2 send codewords $X_{1,0}^{T_0}$ and $X_{2,0}^{T_0}$ with rates $R_1$ and $R_2$, respectively. In the second stage, using feedback, transmitters 1 and 2 decode $X_{2,0}^{T_0}$ and $X_{1,0}^{T_0}$, respectively. This decoding succeeds if
$R_1, R_2 \le \frac{\mathit{inr}}{2}.$
Then, transmitters 1 and 2 send $X_{1,T_0}^{2T_0}$ and $X_{2,T_0}^{2T_0}$, respectively, such that for any $0 \le t \le T_0$,
$X_1(T_0 + t) = X_2(t), \qquad X_2(T_0 + t) = -X_1(t).$
Then during the two stages, receiver 1 receives
$Y_1(t) = \sqrt{\mathit{snr}} \int_0^t X_1(s)\, ds + \sqrt{\mathit{inr}} \int_0^t X_2(s)\, ds + B_1(t), \quad 0 \le t \le T_0,$
and
$Y_1(T_0 + t) = \sqrt{\mathit{snr}} \int_0^{T_0 + t} X_1(s)\, ds + \sqrt{\mathit{inr}} \int_0^{T_0 + t} X_2(s)\, ds + B_1(T_0 + t), \quad 0 \le t \le T_0,$
which immediately gives rise to
$Y_1(T_0 + t) - Y_1(T_0) = \sqrt{\mathit{snr}} \int_{T_0}^{T_0 + t} X_1(s)\, ds + \sqrt{\mathit{inr}} \int_{T_0}^{T_0 + t} X_2(s)\, ds + B_1(T_0 + t) - B_1(T_0) = \sqrt{\mathit{snr}} \int_0^t X_2(s)\, ds - \sqrt{\mathit{inr}} \int_0^t X_1(s)\, ds + B_1(T_0 + t) - B_1(T_0).$
We then have that for any 0 t T 0 ,
$$\sqrt{snr}\,Y_1(t) - \sqrt{inr}\,\big(Y_1(T_0+t) - Y_1(T_0)\big) = (snr+inr)\int_0^t X_1(s)\,ds + \sqrt{snr}\,B_1(t) - \sqrt{inr}\,\big(B_1(T_0+t) - B_1(T_0)\big),$$
which means the codeword X 1 , 0 T 0 can be decoded at the second stage if
$$R_1 \le \frac{snr+inr}{2}.$$
A completely parallel argument yields that the codeword $X_{2,0}^{T_0}$ can be decoded after the second stage if
$$R_2 \le \frac{snr+inr}{2}.$$
All in all, after the two stages, the two codewords X 1 , 0 T 0 and X 2 , 0 T 0 can be decoded as long as
$$R_1, R_2 \le \frac{inr}{2};$$
in other words, the rate pair $\big(\frac{inr}{2}, \frac{inr}{2}\big)$ is achievable, which, assuming $inr > snr$, implies that feedback strictly increases the capacity region. ☐
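The interference-cancelling combination at the heart of the above sketch can be checked numerically. The following minimal Python sketch (our own illustration; the channel gains $\sqrt{snr}$, $\sqrt{inr}$ and the sign flip in the second stage are as in the reconstructed displays above, and all parameter values are arbitrary) discretizes the two stages and verifies that the combination $\sqrt{snr}\,Y_1(t) - \sqrt{inr}\,(Y_1(T_0+t) - Y_1(T_0))$ contains $X_1$ with gain $snr+inr$ and no residual interference from $X_2$:

```python
# Two-stage feedback scheme for the symmetric Gaussian IC: numerical check that
# receiver 1's combination of the two stages cancels the interference X2.
import numpy as np

rng = np.random.default_rng(0)
snr, inr, T0, dt = 1.0, 4.0, 1.0, 1e-3
n = int(T0 / dt)

X1 = rng.standard_normal(n)            # first-stage codeword of transmitter 1
X2 = rng.standard_normal(n)            # first-stage codeword of transmitter 2
X1_full = np.concatenate([X1, X2])     # second stage: transmitter 1 resends X2
X2_full = np.concatenate([X2, -X1])    # second stage: transmitter 2 resends -X1

B1 = np.cumsum(np.sqrt(dt) * rng.standard_normal(2 * n))   # Brownian motion on [0, 2*T0]
Y1 = np.sqrt(snr) * np.cumsum(X1_full) * dt + np.sqrt(inr) * np.cumsum(X2_full) * dt + B1

combined = np.sqrt(snr) * Y1[:n] - np.sqrt(inr) * (Y1[n:] - Y1[n - 1])
residual = combined - (snr + inr) * np.cumsum(X1) * dt     # should be pure noise
print(np.corrcoef(residual, np.cumsum(X2) * dt)[0, 1])     # ~0: X2 is cancelled
```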

5.3. Gaussian BCs

In this section, we consider a continuous-time white Gaussian BC with m receivers, which is characterized by: for i = 1 , 2 , , m ,
$$Y_i(t) = \sqrt{snr_i}\int_0^t X(s, M_1, M_2, \dots, M_m)\,ds + B_i(t), \qquad t \ge 0,$$
where $X$ is the channel input, which depends on $M_i$, the message intended for receiver $i$, which is uniformly distributed over a finite alphabet $\mathcal{M}_i$ and independent of the messages for all other receivers; $snr_i$ is the signal-to-noise ratio of the channel for user $i$; and the $B_i(t)$ are (possibly correlated) standard Brownian motions.
For $T, R_1, R_2, \dots, R_m, P > 0$, a $(T, (e^{TR_1}, \dots, e^{TR_m}), P)$-code for the BC (43) consists of $m$ sets of integers $\mathcal{M}_i = \{1, 2, \dots, e^{TR_i}\}$, the message set for receiver $i$, $i = 1, 2, \dots, m$, and an encoding function $X : \mathcal{M}_1 \times \mathcal{M}_2 \times \dots \times \mathcal{M}_m \to C[0,T]$ satisfying the following power constraint: with probability 1,
$$\frac{1}{T}\int_0^T X^2(s, M_1, M_2, \dots, M_m)\,ds \le P,$$
and $m$ decoding functions $g_i : C[0,T] \to \mathcal{M}_i$, $i = 1, 2, \dots, m$.
The average probability of error for the $(T, (e^{TR_1}, e^{TR_2}, \dots, e^{TR_m}), P)$-code is defined as
$$P_e^{(T)} = \frac{1}{e^{T\sum_{i=1}^m R_i}}\sum_{(M_1, M_2, \dots, M_m) \in \mathcal{M}_1 \times \mathcal{M}_2 \times \dots \times \mathcal{M}_m} P\big\{g_i(Y_{i,0}^T) \neq M_i \text{ for some } i = 1, 2, \dots, m \,\big|\, (M_1, M_2, \dots, M_m) \text{ sent}\big\}.$$
A rate tuple $(R_1, R_2, \dots, R_m)$ is said to be achievable for the BC if there exists a sequence of $(T, (e^{TR_1}, e^{TR_2}, \dots, e^{TR_m}), P)$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$. The capacity region of the BC is the closure of the set of all achievable rate tuples $(R_1, R_2, \dots, R_m)$.
The following theorem explicitly characterizes the capacity region of the above BC, whose proof is postponed to Appendix G.
Theorem 10.
The capacity region of the continuous-time white Gaussian BC (43) is
$$\left\{(R_1, R_2, \dots, R_m) \in \mathbb{R}_+^m : \frac{R_1}{snr_1} + \frac{R_2}{snr_2} + \dots + \frac{R_m}{snr_m} \le \frac{P}{2}\right\}.$$
Remark 8.
Theorem 10 can be heuristically derived using a similar argument employing the approximation approach as in Remark 6.
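As an informal consistency check of Theorem 10 (this is our own back-of-the-envelope calculation, not the proof in Appendix G), note that simple time division with power reallocation already attains any boundary point of the region, assuming the single-user infinite-bandwidth capacity $snr_i P'/2$ for transmit power $P'$: if user $i$ is served during a fraction $\alpha_i$ of the time with power $P_i'$, where $\sum_i \alpha_i = 1$ and $\sum_i \alpha_i P_i' \le P$, then
$$R_i = \alpha_i\,\frac{snr_i P_i'}{2} \quad\Longrightarrow\quad \sum_{i=1}^m \frac{R_i}{snr_i} = \sum_{i=1}^m \frac{\alpha_i P_i'}{2} \le \frac{P}{2},$$
with equality when the full power budget is used.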
We are now ready to use the approximation approach to analyze the effects of feedback on continuous-time Gaussian BCs.
The following theorem says that feedback does not help physically degraded Gaussian BCs, whose proof is inspired by the ideas in Section 4 and parallels the argument in [50].
Theorem 11.
Consider the following continuous-time physically degraded Gaussian broadcast channel with one sender and two receivers:
$$Y_1(t) = \int_0^t X(s, M_1, M_2, Y_{1,0}^s, Y_{2,0}^s)\,ds + \sqrt{N_1}\,B_1(t),$$
$$Y_2(t) = \int_0^t X(s, M_1, M_2, Y_{1,0}^s, Y_{2,0}^s)\,ds + \sqrt{N_1}\,B_1(t) + \sqrt{N_2 - N_1}\,B_2(t),$$
where $N_2 > N_1 > 0$, $B_1, B_2$ are independent standard Brownian motions, and the channel input $X(s)$ is assumed to satisfy Conditions (d)–(f). Then, feedback does not increase the capacity region of the above channel.
Proof. 
Let $X$ be a $(T, (e^{TR_1}, e^{TR_2}), P)$-code. By the code construction, for $i = 1, 2$, it is possible to estimate the message $M_i$ from the channel output $Y_{i,0}^T$ with an arbitrarily low probability of error. Hence, by Fano's inequality, for $i = 1, 2$,
$$H(M_i \mid Y_{i,0}^T) \le T R_i P_e^{(T)} + H(P_e^{(T)}) = T\varepsilon_{i,T},$$
where ε i , T 0 as T . It then follows that
$$T R_1 = H(M_1) = H(M_1 \mid M_2) \le I(M_1; Y_{1,0}^T \mid M_2) + T\varepsilon_{1,T},$$
$$T R_2 = H(M_2) \le I(M_2; Y_{2,0}^T) + T\varepsilon_{2,T}.$$
Now, the Euler–Maruyama approximation with respect to the evenly spaced partition $\Delta_n$ of stepsize $\delta_n = T/n$, applied to the continuous-time physically degraded Gaussian BC, yields:
$$Y_1^{(n)}(t_{n,i}) - Y_1^{(n)}(t_{n,i-1}) = \int_{t_{n,i-1}}^{t_{n,i}} X\big(s, M, Y_{1,t_{n,0}}^{(n),t_{n,i-1}}, Y_{2,t_{n,0}}^{(n),t_{n,i-1}}\big)\,ds + \sqrt{N_1}\,\big(B_1(t_{n,i}) - B_1(t_{n,i-1})\big),$$
$$Y_2^{(n)}(t_{n,i}) - Y_2^{(n)}(t_{n,i-1}) = \int_{t_{n,i-1}}^{t_{n,i}} X\big(s, M, Y_{1,t_{n,0}}^{(n),t_{n,i-1}}, Y_{2,t_{n,0}}^{(n),t_{n,i-1}}\big)\,ds + \sqrt{N_1}\,\big(B_1(t_{n,i}) - B_1(t_{n,i-1})\big) + \sqrt{N_2-N_1}\,\big(B_2(t_{n,i}) - B_2(t_{n,i-1})\big).$$
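For readers who wish to experiment numerically with this kind of discretization, the following Python sketch illustrates the Euler–Maruyama recursion for a generic scalar feedback channel $Y(t) = \int_0^t g(s, W, Y_0^s)\,ds + B(t)$; the drift function used here is a made-up example and is not the coding scheme of this proof:

```python
# Euler-Maruyama discretization of a continuous-time feedback channel
# dY = g(t, W, Y_history) dt + dB, as used in the approximation theorems.
import numpy as np

def euler_maruyama(g, W, T=1.0, n=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    Y = np.zeros(n + 1)
    for i in range(1, n + 1):
        drift = g(t[i - 1], W, Y[:i])               # may depend on the whole sampled past
        Y[i] = Y[i - 1] + drift * dt + np.sqrt(dt) * rng.standard_normal()
    return t, Y

# Hypothetical drift with feedback: send the message W, damped by the last output.
g = lambda t, W, Y_past: W - 0.5 * Y_past[-1]
t, Y = euler_maruyama(g, W=1.0)
```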
Then, by Theorem 3, we have
$$I(M_2; Y_{2,0}^T) = \lim_{n\to\infty} I\big(M_2; Y_2^{(n)}(\Delta_n)\big) = \lim_{n\to\infty} I\big(M_2; \Delta Y_2^{(n)}(\Delta_n)\big) = \lim_{n\to\infty}\Big(h\big(\Delta Y_2^{(n)}(\Delta_n)\big) - h\big(\Delta Y_2^{(n)}(\Delta_n) \,\big|\, M_2\big)\Big),$$
where $\Delta Y_2^{(n)}(\Delta_n) \triangleq \{Y_2^{(n)}(t_{n,i}) - Y_2^{(n)}(t_{n,i-1}) : i = 1, 2, \dots, n\}$. Note that
$$h\big(\Delta Y_2^{(n)}(\Delta_n)\big) \le \sum_{i=1}^n \frac{1}{2}\log\big(2\pi e\,(P\delta_n^2 + N_2\delta_n)\big),$$
and
$$h\big(\Delta Y_2^{(n)}(\Delta_n) \,\big|\, M_2\big) = \sum_{i=1}^n h\big(Y_2^{(n)}(t_{n,i}) - Y_2^{(n)}(t_{n,i-1}) \,\big|\, Y_{2,t_{n,0}}^{(n),t_{n,i-1}}, M_2\big) \ge \sum_{i=1}^n h\Big(\sqrt{N_1}\big(B_1(t_{n,i}) - B_1(t_{n,i-1})\big) + \sqrt{N_2-N_1}\big(B_2(t_{n,i}) - B_2(t_{n,i-1})\big)\Big) = \sum_{i=1}^n \frac{1}{2}\log(2\pi e\,N_2\delta_n),$$
which implies that there exists an α [ 0 , 1 ] such that
$$h\big(\Delta Y_2^{(n)}(\Delta_n) \,\big|\, M_2\big) = \sum_{i=1}^n \frac{1}{2}\log\big(2\pi e\,(\alpha P\delta_n^2 + N_2\delta_n)\big).$$
It then follows from Theorem 3 that
$$I(M_2; Y_{2,0}^T) \le \frac{1}{2}\lim_{n\to\infty}\sum_{i=1}^n \log\frac{P\delta_n^2 + N_2\delta_n}{\alpha P\delta_n^2 + N_2\delta_n} = \frac{(1-\alpha)PT}{2N_2}.$$
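The value of the limit in the last display can be verified by a routine expansion (spelled out here for convenience; it is not part of the original argument): since $n\delta_n = T$ and $\log(1+x) = x + O(x^2)$,
$$\frac{1}{2}\sum_{i=1}^n \log\frac{P\delta_n^2 + N_2\delta_n}{\alpha P\delta_n^2 + N_2\delta_n} = \frac{1}{2}\sum_{i=1}^n \log\Big(1 + \frac{(1-\alpha)P\delta_n}{\alpha P\delta_n + N_2}\Big) = \frac{n}{2}\cdot\frac{(1-\alpha)P\delta_n}{N_2} + O(n\delta_n^2) \longrightarrow \frac{(1-\alpha)PT}{2N_2}.$$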
Next we consider
$$\begin{aligned} I\big(M_1; Y_1^{(n)}(\Delta_n) \,\big|\, M_2\big) &= h\big(Y_1^{(n)}(\Delta_n) \,\big|\, M_2\big) - h\big(Y_1^{(n)}(\Delta_n) \,\big|\, M_1, M_2\big)\\ &= h\big(Y_1^{(n)}(\Delta_n) \,\big|\, M_2\big) - \sum_{i=1}^n h\big(Y_1^{(n)}(t_{n,i}) \,\big|\, M_1, M_2, Y_{1,t_{n,0}}^{(n),t_{n,i-1}}\big)\\ &\le h\big(Y_1^{(n)}(\Delta_n) \,\big|\, M_2\big) - \sum_{i=1}^n h\big(Y_1^{(n)}(t_{n,i}) \,\big|\, M_1, M_2, Y_{1,t_{n,0}}^{(n),t_{n,i-1}}, Y_{2,t_{n,0}}^{(n),t_{n,i-1}}\big)\\ &= h\big(Y_1^{(n)}(\Delta_n) \,\big|\, M_2\big) - \frac{1}{2}\sum_{i=1}^n \log(2\pi e\,N_1\delta_n). \end{aligned}$$
Now, using Lemma 1 in [50] (an extension of the entropy power inequality), we obtain
$$h\big(Y_2^{(n)}(\Delta_n) \,\big|\, M_2\big) \ge \frac{n}{2}\log\Big(2^{2h(Y_1^{(n)}(\Delta_n)\,|\,M_2)/n} + 2\pi e\,(N_2 - N_1)\delta_n\Big),$$
which immediately implies that
$$h\big(Y_1^{(n)}(\Delta_n) \,\big|\, M_2\big) \le \frac{1}{2}\sum_{i=1}^n \log\big(2\pi e\,(\alpha P\delta_n^2 + N_1\delta_n)\big)$$
and furthermore, by Theorem 3,
$$I(M_1; Y_{1,0}^T \mid M_2) \le \lim_{n\to\infty}\bigg(\frac{1}{2}\sum_{i=1}^n \log\big(2\pi e\,(\alpha P\delta_n^2 + N_1\delta_n)\big) - \frac{1}{2}\sum_{i=1}^n \log(2\pi e\,N_1\delta_n)\bigg) = \lim_{n\to\infty}\frac{1}{2}\sum_{i=1}^n \log\Big(1 + \frac{\alpha P\delta_n}{N_1}\Big) = \frac{\alpha PT}{2N_1}.$$
Combining the above two bounds with the Fano-type inequalities shows that every achievable rate pair satisfies $R_1 \le \alpha P/(2N_1)$ and $R_2 \le (1-\alpha)P/(2N_2)$ for some $\alpha \in [0,1]$, i.e., $N_1 R_1 + N_2 R_2 \le P/2$. By Theorem 10 (applied after normalizing the noise variances), this is exactly the non-feedback capacity region; in other words, feedback does not increase the capacity region of a physically degraded continuous-time Gaussian BC. ☐
The following theorem says that feedback does help some stochastically degraded Gaussian BCs. Its proof, instead of directly employing the approximation theorem, uses the connections between continuous-time and discrete-time Gaussian channels and the notion of continuous-time directed information in Remark 2, both of which trace back to the approximation theorem. We provide only a sketch of the proof, since it is largely a translated version of the argument in [51].
Theorem 12.
Feedback increases the capacity region of certain continuous-time stochastically degraded Gaussian broadcast channels.
Proof. 
Consider the following symmetric continuous-time Gaussian broadcast channel:
$$Y_1(t) = \int_0^t X(s)\,ds + B_1(t),$$
$$Y_2(t) = \int_0^t X(s)\,ds + B_2(t),$$
where B 1 , B 2 are independent standard Brownian motions, and X satisfies the average power constraint P. By Theorem 10, without feedback, the capacity region is the set of rate pairs ( R 1 , R 2 ) such that
$$R_1 + R_2 \le \frac{P}{2}.$$
With feedback, one can use the following variation [51] of the Schalkwijk–Kailath coding scheme [38] over $[0, T]$ at the discrete time points $\{t_{n,i}\}$ that form an evenly spaced partition $\Delta_n$ of stepsize $\delta_n$. For the channel input, after a proper initialization, at time $t \in [t_{n,i}, t_{n,i+1})$ we send $X^{(n)}(t) = X_1^{(n)}(t) + X_2^{(n)}(t)$, where
$$X_1^{(n)}(t) = \gamma_i\Big(X_1^{(n)}(t_{n,i-1}) - E\big[X_1^{(n)}(t_{n,i-1}) \,\big|\, Y_1^{(n)}(t_{n,i-1})\big]\Big),$$
$$X_2^{(n)}(t) = \gamma_i\Big(X_2^{(n)}(t_{n,i-1}) - E\big[X_2^{(n)}(t_{n,i-1}) \,\big|\, Y_2^{(n)}(t_{n,i-1})\big]\Big),$$
where γ i is chosen so that E [ X i 2 ( t ) ] = P for each i; and for the channel outputs, we have, for any t [ t n , i , t n , i + 1 ] ,
$$Y_1^{(n)}(t) = Y_1^{(n)}(t_{n,i}) + \int_{t_{n,i}}^t X^{(n)}(t_{n,i})\,ds + B_1(t) - B_1(t_{n,i}),$$
and
$$Y_2^{(n)}(t) = Y_2^{(n)}(t_{n,i}) + \int_{t_{n,i}}^t X^{(n)}(t_{n,i})\,ds + B_2(t) - B_2(t_{n,i}).$$
Going through a completely parallel argument as in [51] and capitalizing on the fact that the SNR in the channels (46) and (47) tends to 0 as $n$ tends to infinity, we derive that
$$\lim_{n\to\infty} \frac{1}{T}\,I\big(X^{(n)}(\Delta_n) \to Y_1^{(n)}(\Delta_n)\big) = \lim_{n\to\infty} \frac{1}{2T}\sum_{i=1}^n \log\bigg(1 + \frac{P\delta_n(1+\rho^*)/2}{1 + P\delta_n(1-\rho^*)/2}\bigg) = \frac{P(1+\rho^*)}{4},$$
and parallelly,
$$\lim_{n\to\infty} \frac{1}{T}\,I\big(X^{(n)}(\Delta_n) \to Y_2^{(n)}(\Delta_n)\big) = \frac{P(1+\rho^*)}{4},$$
where ρ * > 0 satisfies the condition
$$\rho^*\Big(1 + (P+1)\big(1 + P(1-\rho^*)/2\big)\Big) = \frac{P(P+2)(1-\rho^*)}{2}.$$
Note that, by Remark 2, we have
$$I\big(X_0^{(n),T} \to Y_{1,0}^{(n),T}\big) \ge I\big(X^{(n)}(\Delta_n) \to Y_1^{(n)}(\Delta_n)\big), \qquad I\big(X_0^{(n),T} \to Y_{2,0}^{(n),T}\big) \ge I\big(X^{(n)}(\Delta_n) \to Y_2^{(n)}(\Delta_n)\big),$$
which immediately implies that
$$R_1 = R_2 = \frac{P(1+\rho^*)}{4}$$
are achievable. The claim that feedback strictly increases the capacity region then follows from (45), (48) and the fact that $\rho^* > 0$. ☐

6. Conclusions and Future Work

For a continuous-time white Gaussian channel without feedback, the classical Shannon–Nyquist sampling theorem can convert it to a discrete-time Gaussian channel; however such a link has long been missing when feedback/memory is present in the channel. In this paper, we establish sampling and approximation theorems as the missing links, which we believe will play important roles in the long run for further developing continuous-time information theory, particularly for the communication scenarios where feedback/memory is present.
As an immediate application of our approximation theorem, we propose the approximation approach, an analog of the conventional sampling approach, for Gaussian feedback channels. It turns out that, like its non-feedback counterpart, the approximation approach can bring insights and intuition to investigation of continuous-time Gaussian channels with possible feedback, and moreover, when complemented with relevant tools from stochastic calculus, can deliver rigorous treatments in the point-to-point or multi-user setting.
On the other hand, though, there are many questions that remain unanswered and a number of directions that need to be further explored. Below we list a number of research directions that look promising in the near future.
(1) The first direction is to strengthen and generalize our sampling and approximation theorems.
Note that both Theorems 2 and 3 require Conditions (d)–(f), which are stronger than the typical average power constraint. While Conditions (d)–(f) are rather mild for practical considerations, the stronger assumptions in our theorems will narrow their reach in some theoretical situations. For instance, despite the fact that our approximation theorem gives intuitive explanations of the rigorous treatment of continuous-time multi-user Gaussian channels in Section 5, it fails to rigorously establish Theorems 7 and 10. The stochastic calculus approach employed in Section 5 requires only the power constraints, which can be loosely explained by the fact that Girsanov's theorem (or, more precisely, several of its variants) requires only such weak conditions. It is certainly worthwhile to explore whether the assumptions in our sampling and approximation theorems can be relaxed, either in general or for some special settings.
Another topic in this direction is the rate of convergence in the sampling and approximation theorems. While the current versions of our theorems have merely established some limits, the rate of convergence will certainly yield a more quantitative description of how fast those limits will be approached.
One can also consider generalizing these two theorems to general Gaussian channels [52,53]. For this topic, note that there exist in-depth studies [54,55,56,57,58,59,60,61] on continuous-time point-to-point general Gaussian channels with possible feedback, for which information-theoretic connections with the discrete-time setting are somewhat lacking. A first step in this direction can be establishing sampling or approximation theorems for stationary Gaussian processes. Obviously, such theorems for stationary Gaussian processes can connect continuous-time stationary Gaussian channels to their discrete-time counterparts, for which the variational formulation of discrete-time stationary Gaussian feedback capacity in [62] proves to be rather effective.
(2) The second direction is to further explore the possible applications of our sampling and approximation theorems in the following respects.
We have shown that feedback may increase the capacity region of some continuous-time Gaussian BC, but the capacity regions of such channels remain unknown in general. An immediate problem is to explicitly find the exact capacity regions of continuous-time Gaussian BCs using the approach employed in this work, as we have done for continuous-time Gaussian MACs. Of course, further topics also include exploring whether the ideas and techniques in this paper can be applied to other families of continuous-time multi-user Gaussian channels with possible feedback.
So far we have implicitly assumed infinite bandwidth and average power constraints, but our theorems can certainly go beyond these assumptions. For instance, one can consider examining continuous-time Gaussian channels with both bandwidth limit and peak power constraint, which are more reasonable assumptions for many practical communication scenarios as they give a more accurate description of the limitations of the communication system. Little is known about the capacity of continuous-time Gaussian channels with such constraints except some upper and lower bounds established in [63,64]. In stark contrast, discrete-time peak power constrained channels (including, but not limited to Gaussian channels) have been better investigated: there has been a series of work on their capacity, such as [65,66,67,68,69,70,71,72], which feature relatively thorough discussions about different aspects of channel capacity including capacity achieving distribution, bounds and asymptotics of capacity, and numerical computation of capacity. An immediate question is to explore whether the approximation approach can translate the aforementioned existing results in discrete time, or more probably, help channel the ideas and techniques therein to the continuous-time setting. A next question is to explore whether there exists any randomized algorithm for computation of the capacity of such a channel, for which, as discussed in Remark 4, we believe our sampling theorems can be particularly helpful in terms of numerically computing and optimizing the mutual information of a continuous-time Gaussian channel with bandwidth limit and peak power constraint.

Author Contributions

Both authors have contributed significantly to all aspects of this paper, including conceptualization, methodology, validation, formal analysis, investigation and writing.

Funding

This research was funded by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project 17301017 and by the National Natural Science Foundation of China, under Project 61871343.

Acknowledgments

We would like to thank Ronit Bustin, Jun Chen, Young-Han Kim, Haim Permuter, Shlomo Shamai, Tsachy Weissman and Wenyi Zhang for insightful suggestions and comments, and for pointing out relevant references.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

First of all, an application of Theorem 7.14 of [19] with Conditions (b) and (c) yields that
$$P\left\{\int_0^T E^2\big[g(t, W_0^t, Y_0^t) \,\big|\, Y_0^t\big]\,dt < \infty\right\} = 1.$$
Then one verifies that the assumptions of Lemma 7.7 of [19] are all satisfied (this lemma is stated under very general assumptions, which are exactly Conditions (b), (c) and (A1) when restricted to our settings), which implies that for any w,
$$\mu_Y \sim \mu_{Y|W=w} \sim \mu_B,$$
where “∼” is the standard notation for two measures being equivalent (i.e., one is absolutely continuous with respect to the other and vice versa), and moreover, with probability 1,
$$\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T) = \frac{1}{E\big[e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds} \,\big|\, Y_0^T, W_0^T\big]}, \qquad \frac{d\mu_Y}{d\mu_B}(Y_0^T) = \frac{1}{E\big[e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds} \,\big|\, Y_0^T\big]},$$
where we have rewritten $g(s, W_0^s, Y_0^s)$ as $g(s)$ for notational simplicity. Here we remark that $E\big[e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds} \,\big|\, Y_0^T, W_0^T\big]$ is in fact equal to $e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds}$, but we keep it in the above form for an easy comparison.
Note that it follows from E [ d μ B / d μ Y ( Y 0 T ) ] = 1 that
$$E\big[e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds}\big] = 1,$$
which is equivalent to
$$E\big[e^{-\int_0^T g(s)\,dB(s) - \frac{1}{2}\int_0^T g(s)^2\,ds}\big] = 1.$$
Then, a parallel argument as in the proof of Theorem 7.1 of [19] further implies that for any Δ n , with probability 1,
d μ Y | W d μ B ( Y ( Δ n ) ) = 1 E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] , d μ Y d μ B ( Y ( Δ n ) ) = 1 E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) ] ,
where we have defined
B ( Δ n ) { B ( t n , 0 ) , B ( t n , 1 ) , , B ( t n , n ) } ,
and moreover,
d μ Y | W d μ B ( Y ( Δ n ) ) d μ Y ( Δ n ) | W d μ B ( Δ n ) ( Y ( Δ n ) ) , d μ Y d μ B ( Y ( Δ n ) ) d μ Y ( Δ n ) d μ B ( Δ n ) ( Y ( Δ n ) ) .
Then, by definition, we have
$$I(W_0^T; Y(\Delta_n)) = E\left[\log\frac{d\mu_{Y|W}}{d\mu_B}(Y(\Delta_n))\right] - E\left[\log\frac{d\mu_Y}{d\mu_B}(Y(\Delta_n))\right].$$
Notice that it can be easily checked that $e^{-\int_0^T g(s)\,dY(s) + \frac{1}{2}\int_0^T g(s)^2\,ds}$ is integrable, which, together with the fact that $\Delta_n \subseteq \Delta_{n+1}$ for all $n$, further implies that
E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] , E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) ]
are both martingales, and therefore, by Doob’s martingale convergence theorem [73],
$$\frac{d\mu_{Y|W}}{d\mu_B}(Y(\Delta_n)) \to \frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T), \qquad \frac{d\mu_Y}{d\mu_B}(Y(\Delta_n)) \to \frac{d\mu_Y}{d\mu_B}(Y_0^T), \qquad a.s.$$
Now, by Jensen’s inequality, we have
E 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s Y ( Δ n ) , W 0 T log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ]
and, by the fact that log x x for any x > 0 , we have
log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] .
It then follows from (A4) and (A5) that
log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] E 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s Y ( Δ n ) , W 0 T + E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] .
Applying the general Lebesgue dominated convergence theorem (see, e.g., Theorem 19 on Page 89 of [74]), we then have
lim n E log d μ Y | W d μ B ( Y ( Δ n ) ) = E [ log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y 0 T , W 0 T ] ] = E log d μ Y | W d μ B ( Y 0 T ) .
A completely parallel argument yields that
lim n E log d μ Y d μ B ( Y ( Δ n ) ) = E [ log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y 0 T ] ] = E log d μ Y d μ B ( Y 0 T ) .
So, with the definition
$$I(W_0^T; Y_0^T) = E\left[\log\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T)\right] - E\left[\log\frac{d\mu_Y}{d\mu_B}(Y_0^T)\right],$$
we conclude that
$$\lim_{n\to\infty} I(W_0^T; Y(\Delta_n)) = E\left[\log\frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T)\right] - E\left[\log\frac{d\mu_Y}{d\mu_B}(Y_0^T)\right] = I(W_0^T; Y_0^T).$$

Appendix B. Proof of Lemma 1

With Conditions (d)–(f), the proof of the existence and uniqueness of the solution to (6) is somewhat standard; see, e.g., Section 5.4 in [27]. So, in the following, we will only prove (10).
For the stochastic differential equation (6), applying Condition (e), we deduce that there exists L 1 > 0 such that
$$\|Y_0^T\| \le \int_0^T L_1\big(1 + \|W_0^t\| + \|Y_0^t\|\big)\,dt + \|B_0^T\| \le L_1 T + L_1 T\|W_0^T\| + \|B_0^T\| + \int_0^T L_1\|Y_0^t\|\,dt.$$
Then, applying the Gronwall inequality followed by a straightforward bounding analysis, we deduce that there exists L 2 > 0 such that
$$\|Y_0^T\| \le \big(L_1 T + L_1 T\|W_0^T\| + \|B_0^T\|\big)\,e^{\int_0^T L_1\,dt} = e^{L_1 T}\big(L_1 T + L_1 T\|W_0^T\| + \|B_0^T\|\big) \le L_2 + L_2\|W_0^T\| + L_2\|B_0^T\|.$$
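For the reader's convenience, a standard form of the Gronwall inequality sufficient for the step just taken (with $f(T) = \|Y_0^T\|$, nondecreasing $\alpha(T) = L_1 T + L_1 T\|W_0^T\| + \|B_0^T\|$ and $\beta = L_1$) is
$$f(T) \le \alpha(T) + \int_0^T \beta f(t)\,dt \;\Longrightarrow\; f(T) \le \alpha(T)\,e^{\beta T}.$$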
Now, for any ε > 0 , applying Doob’s submartingale inequality, we have
$$E\big[e^{\varepsilon\|Y_0^T\|^2}\big] \le E\big[e^{\varepsilon(L_2 + L_2\|W_0^T\| + L_2\|B_0^T\|)^2}\big] \le E\big[e^{3\varepsilon(L_2^2 + L_2^2\|W_0^T\|^2 + L_2^2\|B_0^T\|^2)}\big] = e^{3\varepsilon L_2^2}\,E\big[e^{3\varepsilon L_2^2\|W_0^T\|^2}\big]\,E\big[e^{3\varepsilon L_2^2\|B_0^T\|^2}\big] = e^{3\varepsilon L_2^2}\,E\big[e^{3\varepsilon L_2^2\|W_0^T\|^2}\big]\,E\Big[\sup_{0\le t\le T} e^{3\varepsilon L_2^2 B(t)^2}\Big] \le 4\,e^{3\varepsilon L_2^2}\,E\big[e^{3\varepsilon L_2^2\|W_0^T\|^2}\big]\,E\big[e^{3\varepsilon L_2^2 B(T)^2}\big],$$
which, by Condition (f), is finite provided that ε is small enough.

Appendix C. Proof of Theorem 2

We proceed in the following steps.
Step 1 . In this step, we establish the theorem assuming that there exists C > 0 such that for all w 0 T C [ 0 , T ] and all y 0 T C [ 0 , T ] ,
$$\int_0^T g^2(s, w_0^s, y_0^s)\,ds < C.$$
By the definition of mutual information, (A2) and (A3), we have
I ( W 0 T ; Y ( Δ n ) ) = E log d μ Y | W d μ B ( Y ( Δ n ) ) E log d μ Y d μ B ( Y ( Δ n ) ) = E [ log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) , W 0 T ] ] + E [ log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y ( Δ n ) ] ] = E [ F n ] + E [ G n ] ,
where, for notational simplicity, we have rewritten g ( s , W 0 s , Y 0 s ) as g ( s ) .
Step 1.1. In this step, we prove that as n tends to infinity,
$$F_n \to \int_0^T g(s)\,dY(s) - \frac{1}{2}\int_0^T g(s)^2\,ds,$$
in probability.
Let Y ¯ Δ n , 0 T denote the piecewise linear version of Y 0 T with respect to Δ n ; more precisely, for any i = 0 , 1 , , n , Y ¯ Δ n ( t n , i ) = Y ( t n , i ) , and for any t n , i 1 < s < t n , i with s = λ t n , i 1 + ( 1 λ ) t n , i for some 0 < λ < 1 , Y ¯ Δ n ( s ) = λ Y ( t n , i 1 ) + ( 1 λ ) Y ( t n , i ) . Let g ¯ Δ n ( s , W 0 s , Y ¯ Δ n , 0 s ) denote the piecewise “flat” version of g ( s , W 0 s , Y ¯ Δ n , 0 s ) with respect to Δ n ; more precisely, for any t n , i 1 s < t n , i , g ¯ Δ n ( s , W 0 s , Y ¯ Δ n , 0 s ) = g ( t n , i 1 , W 0 t n , i 1 , Y ¯ Δ n , 0 t n , i 1 ) .
Rewriting g ¯ Δ n ( s , W 0 s , Y ¯ Δ n , 0 s ) as g ¯ Δ n ( s ) , we have
F n = log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y ( Δ n ) , W 0 T ] = log E [ e 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s 0 T ( g ( s ) g ¯ Δ n ( s ) ) d Y ( s ) + 1 2 0 T ( g 2 ( s ) g ¯ Δ n 2 ( s ) ) d s | Y ( Δ n ) , W 0 T ] = log e 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s E [ e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 ( s ) d s | Y ( Δ n ) , W 0 T ] = 0 T g ¯ Δ n ( s ) d Y ( s ) 1 2 0 T g ¯ Δ n 2 ( s ) d s log E [ e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | Y ( Δ n ) , W 0 T ]
where we have used the fact that
E [ e 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s | Y ( Δ n ) , W 0 T ] = e 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s ,
since g ¯ Δ n ( s ) is a function depending only on W 0 T and Y ( Δ n ) .
We now prove the following convergence:
E 0 T g ¯ Δ n ( s ) d Y ( s ) 1 2 0 T g ¯ Δ n 2 ( s ) d s 0 T g ( s ) d Y ( s ) 1 2 0 T g 2 ( s ) d s 2 0 ,
which will imply that
0 T g ¯ Δ n ( s ) d Y ( s ) 1 2 0 T g ¯ Δ n 2 ( s ) d s 0 T g ( s ) d Y ( s ) 1 2 0 T g 2 ( s ) d s
in probability. Apparently, to prove (A8), we only need to prove that
E 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s 2 0 .
To establish (A9), notice that, by the Itô isometry [75], we have
E 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 2 = E 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ,
which means we only need to prove that as n ,
E 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s 2 0 .
To see this, we note that, by Conditions (d) and (e), there exists L 1 > 0 such that for any s [ 0 , T ] with t n , i 1 s < t n , i ,
| g ( s , W 0 s , Y ¯ Δ n , 0 s ) g ¯ Δ n ( s , W 0 s , Y ¯ Δ n , 0 s ) | = | g ( s , W 0 s , Y ¯ Δ n , 0 s ) g ( t n , i 1 , W 0 t n , i 1 , Y ¯ Δ n , 0 t n , i 1 ) | L 1 ( | s t n , i 1 | + W 0 s W 0 t n , i 1 + Y ¯ Δ n , 0 s Y ¯ Δ n , 0 t n , i 1 ) L 1 ( | s t n , i 1 | + W 0 s W 0 t n , i 1 + | Y ( t n , i ) Y ( t n , i 1 ) | )
L 1 δ Δ n + L 1 sup r [ t n , i 1 , t n , i ] | W ( r ) W ( t n , i 1 ) | + L 1 δ Δ n + L 1 δ Δ n W 0 T + L 1 δ Δ n Y 0 T + | B ( t n , i ) B ( t n , i 1 ) | .
Moreover, by Lemma 1 and Condition (f), both Y 0 T 4 and W 0 T 4 are integrable. Furthermore, by Condition (f), we deduce that for any t n , i 1 s < t n , i ,
E [ sup r [ t n , i 1 , t n , i ] ( W ( r ) W ( t n , i 1 ) ) 4 ] L 2 δ Δ n 2 ,
for some L 2 > 0 , and one easily verifies that
E [ ( B ( t i ( n ) ) B ( t i 1 ( n ) ) ) 4 ] = 3 ( t i ( n ) t i 1 ( n ) ) 2 3 δ Δ n 2 .
It can be readily checked that (A12), (A13) and (A14) imply (A10), which in turn implies (A8), as desired.
We now prove that as n tends to infinity,
E [ | E [ e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | Y ( Δ n ) , W 0 T ] 1 | ] 0 ,
which will imply that
log E [ e 0 T ( g ( s ) g ¯ ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ ( s ) ) 2 d s | Y ( Δ n ) , W 0 T ] 0
in probability and furthermore (A7). To establish (A15), we first note that
E [ | E [ e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | Y ( Δ n ) , W 0 T ] 1 | ] E [ E [ | e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s 1 | | Y ( Δ n ) , W 0 T ] ] = E [ | e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s 1 | ] E 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s e | 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | E 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s 2 E e 2 | 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | .
By (A8), we have that as n tends to infinity,
E 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s 2 0 .
It then follows that, to prove (A15), we only need to prove that if δ Δ n is small enough,
E e 2 | 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | < .
Since
E e 2 | 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s |
E e 2 ( 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ) + E e 2 ( 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) + 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ) ,
we only have to prove that the two terms in the above upper bound are both finite provided that δ Δ n is small enough. Note that for the first term, applying the Cauchy–Schwarz inequality, we have
E [ e 2 ( 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ) ] = E [ e 0 T 2 ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 0 T 4 ( g ( s ) g ¯ Δ n ( s ) ) 2 d s + 3 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ] E [ e 0 T 4 ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 0 T 8 ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ] E [ e 6 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ] .
It is well known that an application of Fatou’s lemma yields that
E [ e 0 T 4 ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 0 T 8 ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ] 1 ,
and by (A12), we deduce that there exists $L_3 > 0$ such that
E [ e 6 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ] e L 3 δ Δ n 2 E [ e L 3 B 0 δ Δ n 2 ] E [ e L 3 δ Δ n 2 Y 0 T 2 ] E [ e L 3 δ Δ n 2 W 0 T 2 ] E [ e L 3 sup | s t | δ Δ n | W ( s ) W ( t ) | 2 ] .
Note that it follows from Doob’s submartingale inequality that if δ Δ n is small enough,
E [ e L 3 B 0 δ Δ n 2 ] < ,
and by Lemma 1, we also deduce that if δ Δ n is small enough,
E [ e L 3 δ Δ n 2 Y 0 T 2 ] < ,
which, together with Condition (f), yields that for the first term in (A17)
E [ e 2 ( 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ) ] < .
A completely parallel argument will yield that for the second term in (A17)
E e ( 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) + 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s ) < ,
which, together with (A19), immediately implies (A16), which in turn implies (A15), as desired.
Step 1.2. In this step, we prove that as n tends to infinity,
G n log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y 0 T ] ,
in probability.
First, note that by Theorem 7.23 of [19], we have,
d μ Y d μ B ( Y 0 T ) = d μ Y | W = w d μ B ( Y 0 T ) d μ W ( w ) ,
where
d μ Y | W = w d μ B ( Y 0 T ) = e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ,
where we have rewritten g ( s , w 0 s , Y 0 s ) as g ( w 0 s ) for notational simplicity. It then follows from (A3) that
log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y 0 T ] = log d μ Y | W = w d μ B ( Y 0 T ) d μ W ( w ) = log e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s d μ W ( w ) .
Similarly, we have
d μ Y d μ B ( Y ( Δ n ) ) = d μ Y | W = w d μ B ( Y ( Δ n ) ) d μ W ( w ) = 1 E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y ( Δ n ) , W ] | W = w d μ W ( w ) .
It then again follows from (A3) that
G n = log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y ( Δ n ) ] = log 1 E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y ( Δ n ) , W ] | W = w d μ W ( w ) .
Now, we consider the following difference:
e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s d μ W ( w ) 1 E e 0 T g ( w 0 s ) d Y ( s ) + 1 2 0 T g 2 ( w 0 s ) d s Y ( Δ n ) , W W = w d μ W ( w ) = e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s e 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g ¯ Δ n ( w 0 s ) 2 d s d μ W ( w ) + e 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g ¯ Δ n ( w 0 s ) 2 d s × E [ e ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) + ( 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g ¯ Δ n 2 ( w 0 s ) d s ) | Y ( Δ n ) , W ] | W = w 1 E [ e ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) + ( 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g ¯ Δ n 2 ( w 0 s ) d s ) | Y ( Δ n ) , W ] | W = w d μ W ( w ) = I n + J n .
Applying the inequality that for any x , y R ,
| e x e y | = | e y ( e x y 1 ) | e y ( | x y | e x y + | x y | e y x ) = | x y | ( e x + e 2 y x ) ,
we have
E [ | I n | ] E e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s e 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g ¯ Δ n 2 ( w 0 s ) d s d μ W ( w ) E 0 T g ( w 0 s ) g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) g ¯ Δ n 2 ( w 0 s ) d s × e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s + e ( 2 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 0 T g ¯ Δ n 2 ( w 0 s ) d s ) ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) d μ W ( w ) E 0 T ( g ( w 0 s ) g ¯ Δ n ( w 0 s ) ) d B ( s ) + 0 T ( g ( w 0 s ) g ¯ Δ n ( w 0 s ) ) ( g ( s ) 1 2 g ( w 0 s ) 1 2 g ¯ Δ n ( w 0 s ) ) d s × e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s + e ( 2 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 0 T g ¯ Δ n 2 ( w 0 s ) d s ) ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) d μ W ( w ) E 0 T ( g ( w 0 s ) g ¯ Δ n ( w 0 s ) ) d B ( s ) + ( L δ Δ n + L sup | s t | δ Δ n | w ( s ) w ( t ) | + L δ Δ n + L δ Δ n w 0 T + L δ Δ n Y 0 T + sup | s t | δ Δ n | B ( s ) B ( t ) | ) 0 T g ( s ) 1 2 g ( w 0 s ) 1 2 g ¯ Δ n ( w 0 s ) d s × e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s + e ( 2 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 0 T g ¯ Δ n 2 ( w 0 s ) d s ) ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) d μ W ( w ) .
Now, using (A12), Condition (f) and the Itô isometry, we deduce that as n ,
E 0 T g ( w 0 s ) g ¯ Δ n ( w 0 s ) d B ( s ) 2 d μ W ( w ) 0 ,
and as n tends to infinity,
E [ ( L δ Δ n + L sup | s t | δ Δ n | w ( s ) w ( t ) | + L δ Δ n
+ L δ Δ n w 0 T + L δ Δ n Y 0 T + sup | s t | δ Δ n | B ( s ) B ( t ) | ) 2 ] d μ W ( w ) 0 .
Now, using a similar argument as above with (A6) and Lemma 1, we can show that for any constant K,
E [ e 0 T K g ¯ Δ n 2 ( s ) d s ] = E [ e 0 T K ( g ¯ Δ n ( s ) g ( s ) + g ( s ) ) 2 d s ] = E [ e 0 T K ( 2 ( g ¯ Δ n ( s ) g ( s ) ) 2 + 2 g 2 ( s ) ) d s ] < ,
provided that n is large enough, which, coupled with a similar argument as in the derivation of (A19), proves that for n large enough,
E e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s + e ( 2 0 T g ¯ ( w 0 s ) d Y ( s ) 0 T g ¯ 2 ( w 0 s ) d s ) ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) 2 d μ W ( w ) < ,
and furthermore
0 T g ( s ) 1 2 g ( w 0 s ) 1 2 g ¯ Δ n ( w 0 s ) d s 2
× e 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s + e ( 2 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 0 T g ¯ Δ n 2 ( w 0 s ) d s ) ( 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g 2 ( w 0 s ) d s ) 2 d μ W ( w ) < ,
which further implies that as n tends to infinity,
E [ | I n | ] 0 .
Now, using the shorthand notations A n , A for 0 T g ¯ Δ n ( w 0 s ) d Y ( s ) 1 2 0 T g ¯ Δ n ( w 0 s ) 2 d s , 0 T g ( w 0 s ) d Y ( s ) 1 2 0 T g ( w 0 s ) 2 d s respectively, we have
E [ | J n | ] = E e A n E [ e A + A n | Y ( Δ n ) , W ] | W = w 1 E [ e A + A n | Y ( Δ n ) , W ] | W = w d μ W ( w ) = E E [ e A + A n 1 | Y ( Δ n ) , W ] | W = w E [ e A | Y ( Δ n ) , W ] | W = w d μ W ( w ) E E [ | e A + A n 1 | | Y ( Δ n ) , W ] | W = w E [ e A | Y ( Δ n ) , W ] | W = w d μ W ( w ) E E [ | A A n | e | A A n | | Y ( Δ n ) , W ] | W = w E [ e A | Y ( Δ n ) , W ] | W = w d μ W ( w ) = E E [ | A A n | e | A A n | | Y ( Δ n ) , W ] E [ e A | Y ( Δ n ) , W ] d μ Y d μ B ( Y 0 T ) / d μ Y | W d μ B ( Y 0 T ) = E | A A n | e | A A n | E [ e A | Y ( Δ n ) , W ] E d μ Y d μ B ( Y 0 T ) / d μ Y | W d μ B ( Y 0 T ) | Y ( Δ n ) , W .
Now, a similar argument as in (A22)–(A26), together with the well-known fact (see, e.g., Theorem 6.2.2 in [12]) that
$$\frac{d\mu_Y}{d\mu_B}(Y_0^T) = e^{\int_0^T \hat g(s)\,dY(s) - \frac{1}{2}\int_0^T \hat g^2(s)\,ds}, \qquad \frac{d\mu_{Y|W}}{d\mu_B}(Y_0^T) = e^{\int_0^T g(s)\,dY(s) - \frac{1}{2}\int_0^T g^2(s)\,ds},$$
where $\hat g(s) = E[g(s) \mid Y_0^s]$, yields that
E [ | J n | ] 0 .
Now, we are ready to conclude that as n tends to infinity,
E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y ( Δ n ) ] E [ e 0 T g ( s ) d Y + 1 2 0 T g 2 ( s ) d s | Y 0 T ]
in probability and furthermore (A20), as desired.
Step 1.3. In this step, we show the convergence of { E [ F n ] } and { E [ G n ] } and further establish the theorem under the condition (A6).
Now, using the concavity of the log function and the fact that log x x , we can obtain the upper bounds and lower bounds of F n and G n as follows:
F n E 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n , W 0 T + E e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n , W 0 T ,
F n E 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n , W 0 T E e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n , W 0 T ,
and
G n E 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n + E e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n ,
G n E 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n E e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y Δ n .
And furthermore, using a similar argument as in Step 1.1, we can show that as n tends to infinity,
E 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s Y ( Δ n ) , W 0 T = 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s × E 0 T ( g ( s ) g ¯ Δ n ) ( s ) d Y ( s ) + 1 2 0 T ( g 2 ( s ) g ¯ Δ n 2 ( s ) ) d s Y ( Δ n ) , W 0 T 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s
and
E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s | Y ( Δ n ) , W 0 T ] = E [ e 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s 0 T ( g ( s ) g ¯ Δ n ( s ) ) d Y ( s ) + 1 2 0 T ( g 2 ( s ) g ¯ Δ n 2 ( s ) ) d s | Y ( Δ n ) , W 0 T ] = e 0 T g ¯ Δ n ( s ) d Y ( s ) + 1 2 0 T g ¯ Δ n 2 ( s ) d s E [ e 0 T ( g ( s ) g ¯ Δ n ( s ) ) d B ( s ) 1 2 0 T ( g ( s ) g ¯ Δ n ( s ) ) 2 d s | Y ( Δ n ) , W 0 T ] e 0 T g ( s ) d Y ( s ) + 1 2 0 T g 2 ( s ) d s .
It then follows from the general Lebesgue dominated convergence theorem that
lim n E [ F n ] E 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s .
A parallel argument can be used to show that
lim n E [ G n ] = E [ log E [ e 0 T g ( s ) d Y ( s ) + 1 2 0 T g ( s ) 2 d s | Y 0 T ] ] .
So, under the condition (A6), we have shown that
$$\lim_{n\to\infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T).$$
Step 2 . In this step, we will use the convergence in Step 1 and establish the theorem without the condition (A6).
Following Page 264 of [19], we define, for any k,
$$\tau_k = \begin{cases} \inf\Big\{t \le T : \int_0^t g^2(s, W_0^s, Y_0^s)\,ds \ge k\Big\}, & \text{if } \int_0^T g^2(s, W_0^s, Y_0^s)\,ds \ge k,\\[4pt] T, & \text{if } \int_0^T g^2(s, W_0^s, Y_0^s)\,ds < k. \end{cases}$$
Then, we again follow [19] and define a truncated version of g as follows:
$$g^{(k)}(t, \gamma_0^t, \phi_0^t) = g(t, \gamma_0^t, \phi_0^t)\,\mathbf{1}_{\big\{\int_0^t g^2(s,\,\gamma_0^s,\,\phi_0^s)\,ds \,<\, k\big\}}.$$
Now, define a truncated version of Y as follows:
Y ( k ) ( t ) = ρ 0 t g ( k ) ( s , W 0 s , Y 0 s ) d s + B ( t ) , t [ 0 , T ] ,
which, as elaborated on Page 265 in [19], can be rewritten as
Y ( k ) ( t ) = ρ 0 t g ( k ) ( s , W 0 s , Y ( k ) , 0 s ) d s + B ( t ) , t [ 0 , τ k T ] .
Note that for fixed k, the system in (A29) satisfies the condition (A6), and so the theorem holds true. To be more precise, note that
I ( W 0 T ; Y 0 τ k ) = E log d μ τ k , Y | W d μ τ k , B ( Y 0 τ k ) E log d μ τ k , Y d μ τ k , B ( Y 0 τ k ) ,
where μ τ k , Y and μ τ k , B respectively denote the truncated versions of μ Y and μ B (from time 0 to time τ n ). Applying Theorem 7.10 in [19], we obtain
d μ τ k , Y | W d μ τ k , B ( Y 0 τ n ) = e 0 τ k g ( s ) d Y ( s ) 1 2 0 τ k g 2 ( s ) d s ,
and
d μ τ k , Y d μ τ k , B ( Y 0 τ k ) = e 0 τ k g ^ ( s ) d Y ( s ) 1 2 0 τ k g ^ 2 ( s ) d s ,
where
$$\hat g(s) = E\big[g(s, W_0^s, Y_0^s) \,\big|\, Y_0^s\big].$$
It then follows that
$$I(W_0^T; Y_0^{\tau_k}) = \frac{1}{2}E\left[\int_0^{\tau_k}\big(g(s) - \hat g(s)\big)^2\,ds\right].$$
Notice that it can be easily verified that τ k T as k tends to infinity, which, together with the monotone convergence theorem, further yields that monotone increasingly,
I ( W 0 T ; Y 0 τ k ) = 1 2 E 0 τ k ( g ( s ) g ^ ( s ) ) 2 d s I ( W 0 T ; Y 0 T ) = 1 2 E 0 T ( g ( s ) g ^ ( s ) ) 2 d s ,
as k tends to infinity. By Step 1 , for any fixed k i ,
lim n I ( W 0 T ; Y ( Δ n [ 0 , τ k i ] ) ) = I ( W 0 T ; Y 0 τ k i ) ,
which means that there exists a sequence { n i } such that, as i tends to infinity, we have, monotone increasingly,
I ( W 0 T ; Y ( Δ n i [ 0 , τ k i ] ) ) I ( W 0 T ; Y 0 T ) .
Since, by the fact that Y 0 τ k coincides with Y 0 T on the interval [ 0 , τ k T ] , we have
I ( W 0 T ; Y ( Δ n i ) ) I ( W 0 T ; Y ( Δ n i [ 0 , τ k i ] ) ) .
Now, using the fact that
I ( W 0 T ; Y ( Δ n i ) ) I ( W 0 T ; Y 0 T ) ,
we conclude that as i tends to infinity,
lim i I ( W 0 T ; Y ( Δ n i ) ) = I ( W 0 T ; Y 0 T ) .
A similar argument can be readily applied to any subsequence of { I ( W 0 T ; Y ( Δ n ) ) } , which will establish the existence of its further subsubsequence that converges to I ( W 0 T ; Y 0 T ) , which implies that
$$\lim_{n\to\infty} I(W_0^T; Y(\Delta_n)) = I(W_0^T; Y_0^T).$$
The proof of the theorem is then complete.
Remark A1.
The arguments in the proof of Theorem 2 can be adapted to yield a sampling theorem for continuous-time minimum mean square error (MMSE), a quantity of central importance in estimation theory.
More precisely, consider the following continuous-time Gaussian feedback channel under the assumptions of Theorem 2:
$$Y(t) = \int_0^t X(s, M, Y_0^s)\,ds + B(t), \qquad t \in [0, T].$$
The MMSE is the limit of the MMSE based on the samples with respect to Δ n , namely,
$$\int_0^T E\big[\big(X(s) - E[X(s) \mid Y_0^T]\big)^2\big]\,ds = \lim_{n\to\infty}\int_0^T E\big[\big(X(s) - E[X(s) \mid Y(\Delta_n)]\big)^2\big]\,ds.$$
To see this, note that the above-mentioned convergence follows from the fact that
E [ E 2 [ X ( s ) | Y 0 T ] ] = E X ( s , m 0 s , Y 0 s ) d μ Y | M = m ( Y 0 T ) / d μ B d μ M ( m ) d μ Y ( Y 0 T ) / d μ B 2 ,
and
E [ E 2 [ X ( s ) | Y ( Δ n ) ] ] = E X ( s , m 0 s , Y 0 s ) d μ Y | M = m ( Y ( Δ n ) ) / d μ B d μ M ( m ) d μ Y ( Y ( Δ n ) ) / d μ B 2 ,
and the proven fact that $d\mu_Y(Y(\Delta_n))/d\mu_B$ and $d\mu_{Y|M}(Y(\Delta_n))/d\mu_B$ converge, respectively, to $d\mu_Y(Y_0^T)/d\mu_B$ and $d\mu_{Y|M}(Y_0^T)/d\mu_B$, together with a parallel argument as in establishing the convergence of $\{E[F_n]\}$ and $\{E[G_n]\}$ in the proof of Theorem 2.
Similarly, we can also conclude that under the assumptions of Theorem 2, the causal MMSE is the limit of the sampled causal MMSE, namely,
$$\int_0^T E\big[\big(X(s) - E[X(s) \mid Y_0^s]\big)^2\big]\,ds = \lim_{n\to\infty}\int_0^T E\big[\big(X(s) - E[X(s) \mid Y(\Delta_n \cap [0, s])]\big)^2\big]\,ds.$$
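As a quick sanity check of this last display (our own illustration, not part of the original remark), take the simplest non-feedback case $X(s, M) \equiv M$ with $M \sim N(0,1)$, so that $Y(t) = Mt + B(t)$. A direct Gaussian computation gives $E[M \mid Y(\Delta_n \cap [0,s])] = Y(t_k)/(1+t_k)$ and a pointwise sampled causal MMSE of $1/(1+t_k)$, where $t_k$ is the largest sample time not exceeding $s$, so the sampled causal MMSE converges to the continuous one $\int_0^T (1+s)^{-1}\,ds = \log(1+T)$:

```python
# Sampled causal MMSE for X(s, M) = M, M ~ N(0,1), Y(t) = M*t + B(t):
# given the samples up to time s, the conditional variance of M is 1/(1 + t_k),
# with t_k the last sample time <= s, so no Monte Carlo is needed here.
import numpy as np

T = 2.0
exact = np.log(1.0 + T)                      # continuous-time causal MMSE
for n in (4, 16, 64, 256):
    t = np.linspace(0.0, T, n + 1)           # evenly spaced partition Delta_n
    dt = T / n
    sampled = np.sum(dt / (1.0 + t[:-1]))    # 1/(1 + t_k) held constant on [t_k, t_{k+1})
    print(n, sampled, exact)                 # sampled -> exact as n grows
```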

Appendix D. Proof of Theorem 3

In this section, we give the detailed proof of Theorem 3.
We will first need the following lemma, which is parallel to Lemma 1.
Lemma A1.
Assume Conditions (d)–(f). Then, there exists ε > 0 and a constant C > 0 such that for all n,
$$E\big[e^{\varepsilon\|Y_0^{(n),T}\|^2}\big] < C.$$
Proof. 
A discrete-time version of the proof of Lemma 1 implies that there exists ε > 0 and a constant C > 0 such that for all n
E [ e ε sup i { 0 , 1 , , n } ( Y ( n ) ( t n , i ) ) 2 ] < C ,
which, together with (12), immediately implies (A30). ☐
We also need the following lemma, which is parallel to Theorem 10.2.2 in [24].
Lemma A2.
Assume Conditions (d)–(f). Then, there exists a constant C > 0 such that for all n,
$$E\big[\|Y_0^{(n),T} - Y_0^T\|^2\big] \le C\,\delta_{\Delta_n}.$$
Proof. 
Note that for any n, we have
$$Y(t_{n,i+1}) = Y(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} g(s, W_0^s, Y_0^s)\,ds + B(t_{n,i+1}) - B(t_{n,i}),$$
and
$$Y^{(n)}(t_{n,i+1}) = Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} g\big(s, W_0^{t_{n,i}}, Y_0^{(n),t_{n,i}}\big)\,ds + B(t_{n,i+1}) - B(t_{n,i}).$$
It then follows that
$$Y(t_{n,i+1}) - Y^{(n)}(t_{n,i+1}) = Y(t_{n,i}) - Y^{(n)}(t_{n,i}) + \int_{t_{n,i}}^{t_{n,i+1}} \Big(g(s, W_0^s, Y_0^s) - g\big(s, W_0^{t_{n,i}}, Y_0^{(n),t_{n,i}}\big)\Big)\,ds.$$
Now, for any t, choose n 0 such that t n , n 0 t < t n , n 0 + 1 . Now, a recursive application of (A31), coupled with Conditions (d) and (e), yields that for some L > 0 ,
Y ( t ) Y ( n ) ( t ) = i = 0 n 0 t n , i t n , i + 1 ( g ( s , W 0 s , Y 0 s ) g ( t n , i , W 0 t n , i , Y 0 ( n ) , t n , i ) ) d s + t n , n 0 + 1 t ( g ( s , W 0 s , Y 0 s ) g ( t n , i , W 0 t n , n 0 + 1 , Y 0 ( n ) , t n , n 0 + 1 ) ) d s i = 0 n 0 t n , i t n , i + 1 L | s t n , i | + L W 0 s W 0 t n , i + L Y 0 s Y 0 ( n ) , s + L Y 0 ( n ) , s Y 0 ( n ) , t n , i d s + t n , n 0 + 1 t L | s t n , n 0 + 1 | + L W 0 s W 0 t n , n 0 + 1 + L Y 0 s Y 0 ( n ) , s + L Y 0 ( n ) , s Y 0 ( n ) , t n , n 0 + 1 d s .
Noticing that for any s with t n , i s < t n , i + 1 , we have
Y 0 ( n ) , s Y 0 ( n ) , t n , i 2 | Y ( n ) ( t n , i + 1 ) Y ( n ) ( t n , i ) | 2 2 t n , i t n , i + 1 g ( s , W 0 t n , i , Y 0 ( n ) , t n , i ) d s 2 + 2 | B ( t n , i + 1 ) B ( t n , i ) | 2 ,
which, together with Condition (e) and the fact that for all n and i,
$$E\big[|B(t_{n,i+1}) - B(t_{n,i})|^2\big] = O(\delta_{\Delta_n}),$$
implies that
$$E\big[\|Y_0^{(n),s} - Y_0^{(n),t_{n,i}}\|^2\big] = O(\delta_{\Delta_n}).$$
Noting that the constants in the two terms O ( δ Δ n ) in (A32) and (A33) can be chosen uniform over all n, a usual argument with the Gronwall inequality and Condition (f) applied to E [ Y 0 t Y 0 ( n ) , t 2 ] completes the proof of the theorem. ☐
We are now ready for the proof of Theorem 3.
Proof of Theorem 3.
We proceed in two steps.
Step 1 . In this step, we establish the theorem assuming that there exists a constant C > 0 such that for all w 0 T C [ 0 , T ] and all y 0 T C [ 0 , T ] ,
$$\int_0^T g^2(s, w_0^s, y_0^s)\,ds < C.$$
We first note that straightforward computations yield
f Y ( n ) ( Δ n ) | W ( y ( n ) ( Δ n ) | w 0 T ) = i = 1 n f ( y t n , i ( n ) | y t n , 0 ( n ) , t n , i 1 , w 0 t n , i 1 ) = i = 1 n 1 2 π ( t n , i t n , i 1 ) exp ( y t n , i ( n ) y t n , i 1 ( n ) t n , i 1 t n , i g ( s , w 0 t n , i 1 , y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) ,
(here we have used the shorter notations y t n , i ( n ) , y t n , i 1 ( n ) for y ( n ) ( t n , i ) , y ( n ) ( t n , i 1 ) , respectively) and
f Y ( n ) ( Δ n ) ( y ( n ) ( Δ n ) ) = i = 1 n 1 2 π ( t n , i t n , i 1 ) exp ( y t n , i ( n ) y t n , i 1 ( n ) t n , i 1 t n , i g ( s , w 0 t n , i 1 , y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) d μ W ( w ) ,
which further lead to
f Y ( n ) ( Δ n ) | W ( Y ( n ) ( Δ n ) | W 0 T ) = i = 1 n 1 2 π ( t n , i t n , i 1 ) exp ( Y t n , i ( n ) Y t n , i 1 ( n ) t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) ,
and
f Y ( n ) ( Δ n ) ( Y ( n ) ( Δ n ) ) = i = 1 n 1 2 π ( t n , i t n , i 1 ) exp ( Y t n , i ( n ) Y t n , i 1 ( n ) t n , i 1 t n , i g ( t n , i 1 , w 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) d μ W ( w ) .
With (A35) and (A36), we have
I ( W 0 T ; Y ( n ) ( Δ n ) ) = E [ log f Y ( n ) ( Δ n ) | W ( Y ( n ) ( Δ n ) | W 0 T ) ] E [ log f Y ( n ) ( Δ n ) ( Y ( n ) ( Δ n ) ) ] = E log i = 1 n exp 2 t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) + ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) E log i = 1 n exp 2 t n , i 1 t n , i g ( s , w 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) + ( t n , i 1 t n , i g ( s , w 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) d μ W ( w ) = E i = 1 n 2 t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) + ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) E log exp i = 1 n 2 t n , i 1 t n , i g ( s , w 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) + ( t n , i 1 t n , i g ( s , w 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) d μ W ( w ) .
On the other hand, it is well known (see, e.g., [12]) that
I ( W ; Y 0 T ) = E log d μ Y | W d μ B ( Y 0 T ) E log d μ Y d μ B ( Y 0 T ) = E log exp 0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) 1 2 0 T g 2 ( s , W 0 s , Y 0 s ) d s E log exp 0 T g ( s , w 0 s , Y 0 s ) d Y ( s ) 1 2 0 T g 2 ( s , w 0 s , Y 0 s ) d s d μ W ( w ) = E 0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) 1 2 0 T g 2 ( s , W 0 s , Y 0 s ) d s E log exp 0 T g ( s , w 0 s , Y 0 s ) d Y ( s ) 1 2 0 T g 2 ( s , w 0 s , Y 0 s ) d s d μ W ( w ) .
Now, we compute
0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) i = 1 n t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) t n , i t n , i 1 = 0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ( Y t n , i ( n ) Y t n , i 1 ( n ) ) i = 1 n t n , i 1 t n , i ( g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) t n , i t n , i 1 .
It can be easily checked that the second term of the right-hand side of the above equality converges to 0 in mean. For the first term, we have
0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ( Y t n , i ( n ) Y t n , i 1 ( n ) ) = i = 1 n t n , i 1 t n , i g ( s , W 0 s , Y 0 s ) d Y ( s ) i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ( Y t n , i Y t n , i 1 ) + i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ( ( Y t n , i Y t n , i 1 ) ( Y t n , i ( n ) Y t n , i 1 ( n ) ) ) = i = 1 n t n , i 1 t n , i g ( s , W 0 s , Y 0 s ) d Y ( s ) i = 1 n i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d Y ( s ) + i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ( ( Y t n , i Y t n , i 1 ) ( Y t n , i ( n ) Y t n , i 1 ( n ) ) ) = i = 1 n t n , i 1 t n , i ( g ( s , W 0 s , Y 0 s ) g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ) d Y ( s ) + i = 1 n g ( t n , i 1 , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) ( ( Y t n , i Y t n , i 1 ) ( Y t n , i ( n ) Y t n , i 1 ( n ) ) ) .
It then follows from Conditions (d) and (e), Lemmas 1, A1 and A2 that
E i = 1 n t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) t n , i t n , i 1 0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) = O ( δ Δ n 1 2 ) .
And using a similar argument as above, we deduce that
E 1 2 i = 1 n ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 t n , i t n , i 1 1 2 0 T g ( s , W 0 s , Y 0 s ) 2 d s = O ( δ Δ n 1 2 ) .
It then follows from (A37) and (A38) that as n tends to infinity,
E i = 1 n 2 t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) + ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 )
0 T g ( s , W 0 s , Y 0 s ) d Y ( s ) + 1 2 0 T g ( s , W 0 s , Y 0 s ) 2 d s = O ( δ Δ n 1 2 ) .
We now establish the following convergence:
E log exp A ( n ) ( w ) d μ W ( w ) E log exp A ( w ) d μ W ( w ) .
where
$$A^{(n)}(w) = \sum_{i=1}^n \frac{2\int_{t_{n,i-1}}^{t_{n,i}} g\big(s, w_0^{t_{n,i-1}}, Y_0^{(n),t_{n,i-1}}\big)\,ds\,\big(Y^{(n)}_{t_{n,i}} - Y^{(n)}_{t_{n,i-1}}\big) - \Big(\int_{t_{n,i-1}}^{t_{n,i}} g\big(s, w_0^{t_{n,i-1}}, Y_0^{(n),t_{n,i-1}}\big)\,ds\Big)^2}{2(t_{n,i} - t_{n,i-1})}.$$
and let
$$A(w) = \int_0^T g(s, w_0^s, Y_0^s)\,dY(s) - \frac{1}{2}\int_0^T g^2(s, w_0^s, Y_0^s)\,ds.$$
Note that using a parallel argument as the derivation of (A39), we can establish
E A ( n ) ( w ) A ( w ) d μ W ( w ) 0 ,
as n tends to infinity; and similarly as in the derivation of (A27), from Conditions (d), (e) and (f), Lemmas 1, A1 and A2, we deduce that
E exp A ( n ) ( w ) exp A ( w ) d μ W ( w ) 0
as n tends to infinity. In addition, note that we always have
log exp A ( n ) ( w ) d μ W ( w ) exp A ( n ) ( w ) d μ W ( w ) + A ( n ) ( w ) d μ W ( w ) .
So, by the general Lebesgue dominated convergence theorem with (A41), (A42) and (A43), we have
E log exp A ( n ) ( w ) d μ W ( w ) E log exp A ( w ) d μ W ( w ) .
So, under the condition (A34), we have established the theorem.
Step 2 . In this step, we will use the convergence in Step 1 and establish the theorem without the condition (A34).
Defining the stopping time $\tau_k$, $g^{(k)}$ and $Y^{(k)}$ as in the proof of Theorem 2, we again have:
$$Y^{(k)}(t) = \rho\int_0^t g^{(k)}\big(s, W_0^s, Y_{(k),0}^s\big)\,ds + B(t), \qquad t \in [0, \tau_k \wedge T].$$
For any fixed k, applying the Euler-Maruyama approximation as in (11) and (12) to the above channel with respect to Δ n , we obtain the process Y ( k ) ( n ) ( · ) .
Now, by the fact that
I ( W 0 T ; Y ( n ) ( Δ n ) ) = E [ log f Y ( n ) ( Δ n ) | W ( Y ( n ) ( Δ n ) | W 0 T ) ] E [ log f Y ( n ) ( Δ n ) ( Y ( n ) ( Δ n ) ) ] = E [ A ( n ) ( W ) ] E log exp A ( n ) ( w ) d μ W ( w ) 0 ,
we deduce that
E [ log f Y ( n ) ( Δ n ) ( Y ( n ) ( Δ n ) ) ] E [ log f Y ( n ) ( Δ n ) | W ( Y ( n ) ( Δ n ) | W 0 T ) ] = E i = 1 n 2 t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ( Y t n , i ( n ) Y t n , i 1 ( n ) ) + ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) = E i = 1 n ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) d s ) 2 2 ( t n , i t n , i 1 ) E i = 1 n ( t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y ( k ) , 0 ( n ) , t n , i 1 ) 2 d s ) t n , i 1 t n , i d s 2 ( t n , i t n , i 1 ) = E i = 1 n t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y ( k ) , 0 ( n ) , t n , i 1 ) 2 d s = E 0 T g ( s , W 0 s Δ n , Y 0 ( n ) , s Δ n ) 2 d s ,
where $s_{\Delta_n}$ denotes $t_{n,n_0}$, with $n_0$ the unique index such that $t_{n,n_0} \le s < t_{n,n_0+1}$. Now, using the easily verifiable fact that
1 exp A ( n ) ( w ) d μ W ( w ) = E [ exp ( A ( n ) ( W ) ) | Y 0 T ] ,
and Jensen’s inequality, we deduce that
E log 1 exp A ( n ) ( w ) d μ W ( w ) = E log E [ exp ( A ( n ) ( W ) ) | Y 0 T ] log E [ exp ( A ( n ) ( W ) ) ] 0 ,
where for the last inequality, we have applied Fatou’s lemma as in deriving (A18). It then follows that
0 E [ log f Y ( n ) ( Δ n ) ( Y ( n ) ( Δ n ) ) ] E i = 1 n t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) 2 d s ,
which further implies that
I ( W 0 T ; Y ( n ) ( Δ n ) ) E i = 1 n t n , i 1 t n , i g ( s , W 0 t n , i 1 , Y 0 ( n ) , t n , i 1 ) 2 d s .
Now, using the fact that Y ( n ) and Y ( k ) ( n ) coincide over [ 0 , τ k T ] , one verifies that for any ε > 0 ,
I ( W 0 T ; Y ( n ) ( Δ n ) ) I ( W 0 T ; Y ( k ) , Δ n ( n ) ) E τ k T g ( s , W 0 s Δ n , Y 0 ( n ) , s Δ n ) 2 d s E τ k T g ( s , W 0 s Δ n , Y 0 ( n ) , s Δ n ) 2 d s ; T τ k ε + E τ k T g ( s , W 0 s Δ n , Y 0 ( n ) , s Δ n ) 2 d s ; T τ k > ε T ε T E g ( s , W 0 s Δ n , Y 0 ( n ) , s Δ n ) 2 d s + E τ k T g ( s , W 0 s Δ n , Y 0 ( n ) , s Δ n ) 2 d s ; T τ k > ε .
Using the easily verifiable fact that { τ k } converges to T in probability uniformly over all n and the fact that ε can be arbitrarily small, we conclude that as k tends to infinity, uniformly over all n,
I ( W 0 T ; Y ( k ) ( n ) ( Δ n ) ) I ( W 0 T ; Y ( n ) ( Δ n ) ) .
Next, an application of the monotone convergence theorem, together with the fact that τ k T as k tends to infinity, yields that monotone increasingly
$$I(W_0^T; Y_0^{\tau_k}) = \frac{1}{2}E\left[\int_0^{\tau_k}\big(g(s) - \hat g(s)\big)^2\,ds\right] \uparrow I(W_0^T; Y_0^T) = \frac{1}{2}E\left[\int_0^T\big(g(s) - \hat g(s)\big)^2\,ds\right]$$
as $k$ tends to infinity. By Step 1, for any fixed $k_i$,
lim n I ( W 0 T ; Y ( k i ) ( n ) ( Δ n ) ) = I ( W 0 T ; Y 0 τ k i ) ,
which means that there exists a sequence { n i } such that, as i tends to infinity,
I ( W ; Y ( k i ) ( n i ) ( Δ n ) ) I ( W ; Y 0 T ) .
Moreover, by (A44),
lim i I ( W 0 T ; Y ( k i ) ( n i ) ( Δ n i ) ) = lim i I ( W 0 T ; Y ( n i ) ( Δ n ) ) ,
which further implies that
$$\lim_{i\to\infty} I(W_0^T; Y^{(n_i)}(\Delta_{n_i})) = I(W_0^T; Y_0^T).$$
The theorem then follows from a usual subsequence argument as in the proof of Theorem 2. ☐
Remark A2.
Parallel to Remark A1, the arguments in the proof of Theorem 3 can be adapted to yield an approximation theorem in estimation theory.
More precisely, consider the following continuous-time Gaussian feedback channel under the assumptions in Theorem 3:
$$Y(t) = \int_0^t X(s, M, Y_0^s)\,ds + B(t), \qquad t \in [0, T].$$
The MMSE is the limit of the approximated MMSE, namely,
$$\int_0^T E\big[\big(X(s) - E[X(s) \mid Y_0^T]\big)^2\big]\,ds = \lim_{n\to\infty}\int_0^T E\big[\big(X^{(n)}(s) - E[X^{(n)}(s) \mid Y^{(n)}(\Delta_n)]\big)^2\big]\,ds.$$
In more detail, the above-mentioned convergence follows from the fact that
0 T E [ ( X ( n ) ( s ) ) 2 ] d s 0 T E [ ( X ( s ) ) 2 ] d s
and the fact that
E [ E 2 [ X ( s ) | Y 0 T ] ] = E X ( s , m 0 s , Y 0 s ) exp ( A ( m ) ) d μ M ( m ) exp ( A ( m ) ) d μ M ( m ) 2 ,
and the fact that
E [ E 2 [ X ( n ) ( s ) | Y ( n ) ( Δ n ) ] ] = E X ( n ) ( s , m 0 s , Y 0 s ) exp ( A ( n ) ( m ) ) d μ M ( m ) exp ( A ( n ) ( m ) ) d μ M ( m ) 2 .
Then, using a similar argument as in the proof of Theorem 3, we can show
lim n E [ E 2 [ X ( n ) ( s ) | Y ( n ) ( Δ n ) ] ] = E [ E 2 [ X ( s ) | Y 0 T ] ] ,
which implies the claimed convergence.
Similarly, we can also conclude that with the assumptions in Theorem 3, the causal MMSE is the limit of the approximated causal MMSE, namely,
$$\int_0^T E\big[\big(X(s) - E[X(s) \mid Y_0^s]\big)^2\big]\,ds = \lim_{n\to\infty}\int_0^T E\big[\big(X^{(n)}(s) - E[X^{(n)}(s) \mid Y^{(n)}(\Delta_n \cap [0, s])]\big)^2\big]\,ds.$$

Appendix E. Proof of Theorem 7

In this section, we give the detailed proof of Theorem 7. For notational convenience only, we will assume $m = 2$, the case with a generic $m$ being completely parallel. We will first need the following lemma, which is a key component in our treatment of continuous-time Gaussian MACs.
Lemma A3.
For any ϵ > 0 , there exist two independent Ornstein–Uhlenbeck processes { X i ( s ) : s 0 } , i = 1 , 2 , satisfying the following power constraint:
for $i = 1, 2$, there exists $P_i > 0$ such that for all $t > 0$, $\frac{1}{t}\int_0^t E[X_i^2(s)]\,ds = P_i$,
such that for all T,
$$\big|I_T(X_1, X_2; Y)/T - (P_1 + P_2)/2\big| \le \epsilon,$$
and
$$\big|I_T(X_1; Y \mid X_2)/T - P_1/2\big| \le \epsilon, \qquad \big|I_T(X_2; Y \mid X_1)/T - P_2/2\big| \le \epsilon,$$
moreover,
$$\big|I_T(X_1; Y)/T - P_1/2\big| \le \epsilon, \qquad \big|I_T(X_2; Y)/T - P_2/2\big| \le \epsilon,$$
where
$$Y(t) = \int_0^t X_1(s)\,ds + \int_0^t X_2(s)\,ds + B(t), \qquad t \ge 0.$$
Here (and often in the remainder of the paper) the subscript T means that the (conditional) mutual information is computed over the time period [ 0 , T ] .
Proof. 
For a > 0 , consider the following two independent Ornstein–Uhlenbeck processes X i ( t ) , i = 1 , 2 , given by
$$X_i(t) = \sqrt{2aP_i}\int_{-\infty}^t e^{-a(t-s)}\,dB_i(s),$$
where B i , i = 1 , 2 , are independent standard Brownian motions. Obviously, for X i defined as above, (A45) is satisfied. A parallel version of the proof of Theorem 6.2.1 of [12] yields that
$$I_T(X_1, X_2; Y) = I_T(X_1 + X_2; Y) = \frac{1}{2}\int_0^T E\big[\big(X_1(t) + X_2(t) - E[X_1(t) + X_2(t) \mid Y_0^t]\big)^2\big]\,dt.$$
It then follows from Theorem 6.4.1 in [12] (applied to the Ornstein–Uhlenbeck process $X_1(t) + X_2(t)$) that, as $a \to \infty$,
I T ( X 1 , X 2 ; Y ) / T = I T ( X 1 + X 2 ; Y ) / T ( P 1 + P 2 ) / 2 ,
uniformly in T, which establishes (A46).
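As a quick sanity check on the power constraint (A45), the following sketch (our own illustration, with arbitrarily chosen parameter values) simulates the stationary Ornstein–Uhlenbeck process by Euler–Maruyama and verifies that its time-averaged power stays close to P:

```python
import numpy as np

# Euler-Maruyama simulation of dX = -a X dt + sqrt(2 a P) dW started in its
# stationary law N(0, P); the time-averaged power (1/t) int_0^t E[X^2(s)] ds
# should stay close to P for every t, which is the constraint (A45)
# (up to the small bias of the Euler scheme).
rng = np.random.default_rng(1)
a, P, T, n, paths = 5.0, 2.0, 10.0, 2000, 5000
dt = T / n
X = rng.normal(0.0, np.sqrt(P), size=paths)     # stationary initial condition
running_power, t = 0.0, 0.0
for _ in range(n):
    running_power += np.mean(X ** 2) * dt
    t += dt
    X += -a * X * dt + np.sqrt(2 * a * P) * rng.normal(0.0, np.sqrt(dt), size=paths)
print("time-averaged power over [0, T]:", round(running_power / t, 3), "  target P:", P)
```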
For i = 1 , 2 , define
\[\tilde{Y}_i(t) = \int_0^t X_i(s)\, ds + B(t), \quad t > 0.\]
As in the proof of Theorem 6.4.1 in [12], we deduce that, for $i = 1, 2$, $I_T(X_i; \tilde{Y}_i)/T$ tends to $P_i/2$ uniformly in T. Now, since $X_1$ and $X_2$ are independent, we have for any fixed T,
\[I_T(X_1; Y \mid X_2) = I_T(X_1; \tilde{Y}_1 \mid X_2) = I_T(X_1; \tilde{Y}_1),\]
and
\[I_T(X_2; Y \mid X_1) = I_T(X_2; \tilde{Y}_2 \mid X_1) = I_T(X_2; \tilde{Y}_2),\]
which immediately implies (A47).
Now, by the chain rule of mutual information,
\[I_T(X_1, X_2; Y) = I_T(X_1; Y) + I_T(X_2; Y \mid X_1) = I_T(X_2; Y) + I_T(X_1; Y \mid X_2),\]
which, together with (A46) and (A47), implies (A48).
Remark A3.
With $X_i$, $i = 1, 2$, regarded as channel inputs, (A49) can be reinterpreted as a white Gaussian MAC. For $i \ne j$, $I(X_i; Y)$, the reliable transmission rate of $X_i$ when $X_j$ is not known, can be arbitrarily close to $I(X_i; Y \mid X_j)$, the reliable transmission rate of $X_i$ when $X_j$ is known. In other words, for white Gaussian MACs, knowledge of the other user's input does not help to achieve a faster transmission rate, and therefore the other user's input can simply be treated as noise. A more intuitive explanation of this result is as follows: for the Ornstein–Uhlenbeck process $X_i$ as specified in the proof, its power spectral density can be computed as
\[f_i(\lambda) = \frac{2 a P_i}{2\pi(\lambda^2 + a^2)},\]
which is “negligible” compared to that of the white Gaussian noise (which is the constant 1) as a tends to infinity. Lemma A3 is a key ingredient for deriving the capacity regions of white Gaussian MACs.
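This "negligibility" is easy to see numerically; the short sketch below (ours, with an arbitrary band edge L) evaluates the peak of $f_i(\lambda)$ and the input power falling inside a fixed band $[-L, L]$ as a grows:

```python
import numpy as np

# PSD of the OU input: f(lambda) = 2 a P / (2 pi (lambda^2 + a^2)). Its total
# power is P for every a, but as a grows both the peak density and the power
# falling in any fixed band vanish, so the input looks negligible next to the
# flat spectrum of the white Gaussian noise.
P, L = 1.0, 20.0                                     # L: arbitrary band edge
for a in [1.0, 10.0, 100.0, 1000.0]:
    peak = 2 * a * P / (2 * np.pi * a ** 2)          # f(0) = P / (pi a)
    in_band = (2 * P / np.pi) * np.arctan(L / a)     # integral of f over [-L, L]
    print(f"a = {a:7.1f}   f(0) = {peak:.5f}   power in [-L, L] = {in_band:.5f}")
```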
We also need some result on the information stability of continuous-time Gaussian processes. Let $(U, V) = \{(U(t), V(t)) : t \ge 0\}$ be a continuous Gaussian system (which means that $U(t)$, $V(t)$ are pairwise Gaussian stochastic processes). Define
\[\varphi^{(T)}(u, v) = \frac{d\mu_{UV}^{(T)}}{d\big(\mu_U^{(T)} \times \mu_V^{(T)}\big)}(u, v), \qquad (u, v) \in C[0, T] \times C[0, T],\]
where $\mu_U^{(T)}$, $\mu_V^{(T)}$ and $\mu_{UV}^{(T)}$ denote the probability distributions of $U_0^T$, $V_0^T$ and their joint distribution, respectively. For any $\varepsilon > 0$, we denote by $T_\varepsilon^{(T)}$ the ε-typical set:
\[T_\varepsilon^{(T)} = \left\{(u, v) \in C[0, T] \times C[0, T] : \frac{1}{T}\big|\log \varphi^{(T)}(u, v) - I_T(U; V)\big| \le \varepsilon\right\}.\]
The pair $(U, V)$ is said to be information stable [26] if for any $\varepsilon > 0$,
\[\lim_{T \to \infty} \mu_{UV}^{(T)}\big(T_\varepsilon^{(T)}\big) = 1.\]
The following lemma is a rephrased version of Theorem 6.6.2 in [12].
Lemma A4.
The Gaussian system ( U , V ) is information stable provided that
\[\lim_{T \to \infty} \frac{I_T(U; V)}{T^2} = 0.\]
Lemma A4 will be used in the proof of Theorem 7 to establish, roughly speaking, that almost all sequences are jointly typical.
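A discrete-time analogue may help make the ε-typical set and the role of Lemma A4 tangible. The following sketch (our own illustration, with parameters of our choosing) shows the normalized log density ratio concentrating around the mutual information for i.i.d. jointly Gaussian pairs, which is precisely the property that information stability formalizes:

```python
import numpy as np

# For i.i.d. pairs (U_k, V_k) with correlation rho, the normalized log of
# d mu_{UV} / d(mu_U x mu_V) evaluated on a sample concentrates around
# I(U; V) = -0.5 log(1 - rho^2): the defining property of information stability.
rng = np.random.default_rng(2)
rho = 0.8
target = -0.5 * np.log(1 - rho ** 2)
for n in [10, 100, 1_000, 10_000, 100_000]:
    U = rng.standard_normal(n)
    V = rho * U + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    # per-pair log density ratio of the joint law to the product of marginals
    log_ratio = (-0.5 * (U ** 2 - 2 * rho * U * V + V ** 2) / (1 - rho ** 2)
                 - 0.5 * np.log(1 - rho ** 2)
                 + 0.5 * (U ** 2 + V ** 2))
    print(f"n = {n:6d}   (1/n) log density ratio = {log_ratio.mean():.4f}   I = {target:.4f}")
```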
With Lemmas A3 and A4, Theorem 7 largely follows from a lengthy yet almost routine argument, which is included below due to a number of technical challenges in the proof.
Proof of Theorem 7.
The converse part. In this part, we will show that for any sequence of $(T, (e^{T R_1}, e^{T R_2}), (P_1, P_2))$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$, the rate pair $(R_1, R_2)$ will have to satisfy
\[R_1 \le P_1/2, \qquad R_2 \le P_2/2.\]
Fix T and consider the above-mentioned $(T, (e^{T R_1}, e^{T R_2}), (P_1, P_2))$-code. By the code construction, it is possible to estimate the messages $(M_1, M_2)$ from the channel output $Y_0^T$ with a low probability of error. Hence, the conditional entropy of $(M_1, M_2)$ given $Y_0^T$ must be small; more precisely, by Fano's inequality,
\[H(M_1, M_2 \mid Y_0^T) \le T(R_1 + R_2) P_e^{(T)} + H(P_e^{(T)}) = T\varepsilon_T,\]
where $\varepsilon_T \to 0$ as $T \to \infty$. Then, we have
\[H(M_1 \mid Y_0^T) \le H(M_1, M_2 \mid Y_0^T) \le T\varepsilon_T, \qquad H(M_2 \mid Y_0^T) \le H(M_1, M_2 \mid Y_0^T) \le T\varepsilon_T.\]
Now, we can bound the rate $R_1$ as follows:
\begin{align*}
T R_1 = H(M_1) &= I(M_1; Y_0^T) + H(M_1 \mid Y_0^T) \\
&\le I(M_1; Y_0^T) + T\varepsilon_T \\
&= H(M_1) - H(M_1 \mid Y_0^T) + T\varepsilon_T \\
&\le H(M_1 \mid M_2) - H(M_1 \mid Y_0^T, M_2) + T\varepsilon_T \\
&= I(M_1; Y_0^T \mid M_2) + T\varepsilon_T.
\end{align*}
Conditioning on $M_2$ and applying Theorem 6.2.1 in [12], we have
\[I(M_1; Y_0^T \mid M_2) = \frac{1}{2} E\left[\int_0^T E\big[(X_1(t) + X_2(t) - \hat{X}_1(t) - \hat{X}_2(t))^2 \,\big|\, M_2\big]\, dt\right] = \frac{1}{2}\int_0^T E\big[(X_1(t) + X_2(t) - \hat{X}_1(t) - \hat{X}_2(t))^2\big]\, dt,\]
where $\hat{X}_i(t) = E[X_i(t) \mid Y_0^t, M_2]$, $i = 1, 2$. Noticing that $X_2 = \hat{X}_2$, we then have
\[I(M_1; Y_0^T \mid M_2) = \frac{1}{2}\int_0^T E\big[(X_1(t) - \hat{X}_1(t))^2\big]\, dt,\]
which, since $E[(X_1(t) - \hat{X}_1(t))^2] \le E[X_1^2(t)]$, together with (33) implies that $R_1 \le P_1/2$. A completely parallel argument yields that $R_2 \le P_2/2$.
The achievability part. In this part, we will show that for any $(R_1, R_2)$ satisfying
\[0 \le R_1 < P_1/2, \qquad 0 \le R_2 < P_2/2, \tag{A50}\]
we can find a sequence of $(T, (e^{T R_1}, e^{T R_2}), (P_1, P_2))$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$. The argument consists of several steps as follows.
Codebook generation: For a fixed $T > 0$ and $\varepsilon > 0$, assume that $X_1$ and $X_2$ are independent Ornstein–Uhlenbeck processes over $[0, T]$ with respective variances $P_1 - \varepsilon$ and $P_2 - \varepsilon$, and that $(R_1, R_2)$ satisfies (A50). Generate $e^{T R_1}$ independent codewords $X_{1,i}$, $i \in \{1, 2, \ldots, e^{T R_1}\}$, of length T, according to the distribution of $X_1$. Similarly, generate $e^{T R_2}$ independent codewords $X_{2,j}$, $j \in \{1, 2, \ldots, e^{T R_2}\}$, of length T, according to the distribution of $X_2$. These codewords (which may not satisfy the power constraint in (33)) form the codebook, which is revealed to the senders and the receiver.
Encoding: To send message $i \in \mathcal{M}_1$, sender 1 sends the codeword $X_{1,i}$. Similarly, to send $j \in \mathcal{M}_2$, sender 2 sends $X_{2,j}$.
Decoding: For any fixed $\varepsilon > 0$, let $T_\varepsilon^{(T)}$ denote the set of jointly typical $(x_1, x_2, y)$ sequences, which is defined as follows:
\begin{align*}
T_\varepsilon^{(T)} = \big\{(x_1, x_2, y) \in C[0, T] \times C[0, T] \times C[0, T] :\ & |\log \varphi_1(x_1, x_2, y) - I_T(X_1, X_2; Y)| \le T\varepsilon, \\
& |\log \varphi_2(x_1, x_2, y) - I_T(X_1; X_2, Y)| \le T\varepsilon, \\
& |\log \varphi_3(x_1, x_2, y) - I_T(X_2; X_1, Y)| \le T\varepsilon \big\},
\end{align*}
where
\begin{align*}
\varphi_1(x_1, x_2, y) &= \frac{d\mu_{X_1 X_2 Y}}{d\big(\mu_{X_1 X_2} \times \mu_Y\big)}(x_1, x_2, y), \\
\varphi_2(x_1, x_2, y) &= \frac{d\mu_{X_1 X_2 Y}}{d\big(\mu_{X_1} \times \mu_{X_2 Y}\big)}(x_1, x_2, y), \\
\varphi_3(x_1, x_2, y) &= \frac{d\mu_{X_1 X_2 Y}}{d\big(\mu_{X_2} \times \mu_{X_1 Y}\big)}(x_1, x_2, y).
\end{align*}
Here we remark that it is easy to check that the above Radon–Nikodym derivatives are all well-defined; see, e.g., Theorem 7.7 of [19] for sufficient conditions for their existence. Based on the received output $y \in C[0, T]$, the receiver chooses the pair $(i, j)$ such that
\[(x_{1,i}, x_{2,j}, y) \in T_\varepsilon^{(T)},\]
if such a pair ( i , j ) exists and is unique; otherwise, an error is declared. Moreover, an error will be declared if the chosen codeword does not satisfy the power constraint in (33).
Analysis of the probability of error: Now, for fixed T , ε > 0 , define
\[E_{ij} = \big\{(X_{1,i}, X_{2,j}, Y) \in T_\varepsilon^{(T)}\big\}.\]
By symmetry, we assume, without loss of generality, that (1, 1) was sent. Define $\pi(T)$ to be the event that
\[\int_0^T (X_{1,1}(t))^2\, dt > P_1 T \quad \text{or} \quad \int_0^T (X_{2,1}(t))^2\, dt > P_2 T.\]
Then, $\hat{P}_e^{(T)}$, the error probability for the above coding scheme (where codewords violating the power constraint are allowed), can be upper bounded as follows:
\begin{align*}
\hat{P}_e^{(T)} &= P\Big(\pi(T) \cup E_{11}^c \cup \bigcup_{(i,j) \ne (1,1)} E_{ij}\Big) \\
&\le P(\pi(T)) + P(E_{11}^c) + \sum_{i \ne 1,\, j = 1} P(E_{i1}) + \sum_{i = 1,\, j \ne 1} P(E_{1j}) + \sum_{i \ne 1,\, j \ne 1} P(E_{ij}).
\end{align*}
So, for any $i, j \ne 1$, we have
\[\hat{P}_e^{(T)} \le P(\pi(T)) + P(E_{11}^c) + e^{T R_1} P(E_{i1}) + e^{T R_2} P(E_{1j}) + e^{T R_1 + T R_2} P(E_{ij}).\]
Using the well-known fact that an Ornstein–Uhlenbeck process is ergodic [76,77], we deduce that $P(\pi(T)) \to 0$ as $T \to \infty$. In addition, by Lemma A4 and Theorem 6.2.1 in [12], we have
\[\lim_{T \to \infty} P\big((X_{1,1}, X_{2,1}, Y) \in T_\varepsilon^{(T)}\big) = 1 \quad \text{and thus} \quad \lim_{T \to \infty} P(E_{11}^c) = 0.\]
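For intuition about why $P(\pi(T))$ vanishes, the following simulation (our own illustration with arbitrary parameters) generates OU codewords with stationary variance $P - \varepsilon$ and estimates how often the empirical power exceeds P as T grows:

```python
import numpy as np

# By ergodicity, (1/T) int_0^T X^2(t) dt -> P - eps almost surely for an OU
# codeword with stationary variance P - eps, so the power-violation event
# pi(T) becomes rare as T grows.
rng = np.random.default_rng(3)
a, P, eps, paths, dt = 1.0, 1.0, 0.1, 2000, 0.01
for T in [5.0, 20.0, 80.0]:
    n = int(T / dt)
    X = rng.normal(0.0, np.sqrt(P - eps), size=paths)
    power = np.zeros(paths)
    for _ in range(n):
        power += X ** 2 * dt
        X += -a * X * dt + np.sqrt(2 * a * (P - eps)) * rng.normal(0.0, np.sqrt(dt), size=paths)
    print(f"T = {T:5.1f}   estimated P(empirical power > P) = {np.mean(power / T > P):.3f}")
```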
Now, we have for any $i \ne 1$,
\begin{align*}
P(E_{i1}) &= P\big((X_{1,i}, X_{2,1}, Y) \in T_\varepsilon^{(T)}\big) = \int_{(x_1, x_2, y) \in T_\varepsilon^{(T)}} d\mu_{X_1}(x_1)\, d\mu_{X_2 Y}(x_2, y) \\
&= \int_{T_\varepsilon^{(T)}} \frac{1}{\varphi_2(x_1, x_2, y)}\, d\mu_{X_1 X_2 Y}(x_1, x_2, y) \le \int_{T_\varepsilon^{(T)}} e^{-I_T(X_1; X_2, Y) + \varepsilon T}\, d\mu_{X_1 X_2 Y}(x_1, x_2, y) \\
&\le e^{-I_T(X_1; X_2, Y) + \varepsilon T} = e^{-I_T(X_1; Y \mid X_2) + \varepsilon T},
\end{align*}
where we have used the independence of $X_1$ and $X_2$, and the consequent fact that
\[I_T(X_1; X_2, Y) = I_T(X_1; X_2) + I_T(X_1; Y \mid X_2) = I_T(X_1; Y \mid X_2).\]
Similarly, we have, for $j \ne 1$,
\[P(E_{1j}) \le e^{-I_T(X_2; Y \mid X_1) + \varepsilon T},\]
and for $i, j \ne 1$,
\[P(E_{ij}) \le e^{-I_T(X_1, X_2; Y) + \varepsilon T}.\]
It then follows that
\[\hat{P}_e^{(T)} \le P(\pi(T)) + P(E_{11}^c) + e^{T R_1 + \varepsilon T - I_T(X_1; Y \mid X_2)} + e^{T R_2 + \varepsilon T - I_T(X_2; Y \mid X_1)} + e^{T R_1 + T R_2 + \varepsilon T - I_T(X_1, X_2; Y)}.\]
By Lemma A3, one can choose independent OU processes $X_1, X_2$ such that $I_T(X_1; Y \mid X_2)/T \ge (P_1 - \epsilon)/2$, $I_T(X_2; Y \mid X_1)/T \ge (P_2 - \epsilon)/2$ and $I_T(X_1, X_2; Y)/T \ge (P_1 + P_2 - 2\epsilon)/2$ uniformly in T. This implies that, with $\epsilon$ chosen sufficiently small, we have $\hat{P}_e^{(T)} \to 0$ as $T \to \infty$. In other words, there exists a sequence of good codes (which may not satisfy the power constraint) with low average error probability. Now, from each of the above codes, we delete the worst half of the codewords (any codeword violating the power constraint will be deleted, since it must have error probability 1). Then, with only slightly decreased transmission rate, the remaining codewords will satisfy the power constraint and will have small maximum error probability (and thus small average error probability $P_e^{(T)}$), which implies that the rate pair $(R_1, R_2)$ is achievable. ☐
Remark A4.
The achievability part can be proven in an alternative way, roughly described as follows: for arbitrarily small $\epsilon > 0$, by Lemma A3, one can choose independent Ornstein–Uhlenbeck processes $X_i$ with respective variances $P_i - \epsilon$, $i = 1, 2$, such that $I_T(X_i; Y)/T$ approaches $(P_i - \epsilon)/2$. Then, a parallel random coding argument, with $X_j$, $j \ne i$, treated as noise when decoding the message of sender i, shows that the rate pair $((P_1 - \epsilon)/2, (P_2 - \epsilon)/2)$ can be approached, which yields the achievability part.

Appendix F. Proof of Theorem 8

For notational convenience, we only prove the case when $n = 2$; the case of a generic n is similar.
The converse part. In this part, we will show that for any sequence of $(T, (e^{T R_1}, e^{T R_2}), (P_1, P_2))$-codes with $P_e^{(T)} \to 0$, the rate pair $(R_1, R_2)$ will have to satisfy
\[R_1 \le a_{11}^2 P_1/2, \qquad R_2 \le a_{22}^2 P_2/2.\]
Fix T and consider the above-mentioned $(T, (e^{T R_1}, e^{T R_2}), (P_1, P_2))$-code. By the code construction, for $i = 1, 2$, it is possible to estimate the message $M_i$ from the channel output $Y_{i,0}^T$ with an arbitrarily low probability of error. Hence, by Fano's inequality, for $i = 1, 2$,
\[H(M_i \mid Y_{i,0}^T) \le T\varepsilon_{i,T},\]
where $\varepsilon_{i,T} \to 0$ as $T \to \infty$. We then have
\[T R_1 = H(M_1) = H(M_1 \mid M_2) = I(M_1; Y_{1,0}^T \mid M_2) + H(M_1 \mid M_2, Y_{1,0}^T) \le I(M_1; Y_{1,0}^T \mid M_2) + T\varepsilon_{1,T}.\]
As in the proof of Theorem 7, we have
\[I(M_1; Y_{1,0}^T \mid M_2) = \frac{a_{11}^2}{2}\int_0^T E\big[(X_1(s) - E[X_1(s) \mid M_2, Y_{1,0}^s])^2\big]\, ds.\]
It then follows that
\[T R_1 \le \frac{a_{11}^2}{2}\int_0^T E\big[(X_1(s) - E[X_1(s) \mid M_2, Y_{1,0}^s])^2\big]\, ds + T\varepsilon_{1,T},\]
which implies that $R_1 \le a_{11}^2 P_1/2$. With a parallel argument, one can derive that $R_2 \le a_{22}^2 P_2/2$. The proof for the converse part is then complete.
The achievability part. We only sketch the proof of this part. For arbitrarily small $\epsilon > 0$, by Lemma A3, one can choose independent Ornstein–Uhlenbeck processes $X_i$ with respective variances $P_i - \epsilon$, $i = 1, 2$, such that $I_T(X_i; Y_i)/T$ approaches $a_{ii}^2 (P_i - \epsilon)/2$. Then, a parallel random coding argument as in the proof of Theorem 7, with $X_j$, $j \ne i$, treated as noise at receiver i, shows that the rate pair $(a_{11}^2 (P_1 - \epsilon)/2, a_{22}^2 (P_2 - \epsilon)/2)$ can be approached, which yields the achievability part.

Appendix G. Proof of Theorem 10

One of the important tools that play a key role in discrete-time network information theory is the entropy power inequality [2,47], which can be applied to compare information-theoretic quantities involving different users. The following lemma, despite its strikingly different form, serves the typical function of a discrete-time entropy power inequality.
Lemma A5.
Consider a continuous-time white Gaussian channel characterized by the following equation
\[Y(t) = \sqrt{snr} \int_0^t X(s)\, ds + B(t), \quad t \ge 0,\]
where $snr \ge 0$ denotes the signal-to-noise ratio in the channel and M is the message to be transmitted through the channel. Then, for any fixed T, $I_T(M; Y)/snr$ is a monotone decreasing function of snr.
Proof. 
For notational convenience, in this proof, we write $I_T(M; Y)$ as $I_T(snr)$. By Theorem 6.2.1 in [12], we have
\[I_T(snr) = \frac{snr}{2}\int_0^T E\big[(X(s) - E[X(s) \mid Y_0^s])^2\big]\, ds,\]
and, by Theorem 6 in [25], we have (the derivative is with respect to snr)
\[I_T'(snr) = \frac{1}{2}\int_0^T E\big[(X(s) - E[X(s) \mid Y_0^T])^2\big]\, ds.\]
It then follows that
\[\left(\frac{I_T(snr)}{snr}\right)' = \frac{1}{snr}\left(I_T'(snr) - \frac{I_T(snr)}{snr}\right) = \frac{1}{2\, snr}\left(\int_0^T E\big[(X(s) - E[X(s) \mid Y_0^T])^2\big]\, ds - \int_0^T E\big[(X(s) - E[X(s) \mid Y_0^s])^2\big]\, ds\right) \le 0,\]
which immediately implies the lemma. ☐
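A scalar discrete-time analogue (not the lemma itself, just a quick sanity check under an assumed Gaussian input) exhibits the same monotonicity: for $Y = \sqrt{snr}\, X + Z$ with $X \sim N(0, P)$, $I(snr) = \tfrac{1}{2}\log(1 + snr\, P)$, and $I(snr)/snr$ decreases in snr.

```python
import numpy as np

# Scalar analogue: Y = sqrt(snr) X + Z with X ~ N(0, P) gives
# I(snr) = 0.5 * log(1 + snr * P); the ratio I(snr)/snr decreases in snr,
# mirroring the monotonicity in Lemma A5.
P = 3.0
snr = np.linspace(0.05, 50.0, 1000)
ratio = 0.5 * np.log1p(snr * P) / snr
print("I(snr)/snr monotone decreasing on the grid:", bool(np.all(np.diff(ratio) < 0)))
```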
We are now ready for the proof of Theorem 10.
Proof of Theorem 10.
For notational convenience only, we prove the case when n = 2 , the case when n is generic being parallel.
The converse part. Without loss of generality, we assume that
\[snr_1 \ge snr_2.\]
We will show that for any sequence of $(T, (e^{T R_1}, e^{T R_2}), P)$-codes with $P_e^{(T)} \to 0$ as $T \to \infty$, the rate pair $(R_1, R_2)$ will have to satisfy
\[\frac{R_1}{snr_1} + \frac{R_2}{snr_2} \le \frac{P}{2}.\]
Fix T and consider the above-mentioned $(T, (e^{T R_1}, e^{T R_2}), P)$-code. By the code construction, for $i = 1, 2$, it is possible to estimate the message $M_i$ from the channel output $Y_{i,0}^T$ with an arbitrarily low probability of error. Hence, by Fano's inequality, for $i = 1, 2$,
\[H(M_i \mid Y_{i,0}^T) \le T R_i P_e^{(T)} + H(P_e^{(T)}) = T\varepsilon_{i,T},\]
where $\varepsilon_{i,T} \to 0$ as $T \to \infty$. It then follows that
\[T R_1 = H(M_1) = H(M_1 \mid M_2) \le I(M_1; Y_{1,0}^T \mid M_2) + T\varepsilon_{1,T},\]
\[T R_2 = H(M_2) \le I(M_2; Y_{2,0}^T) + T\varepsilon_{2,T}.\]
By the chain rule of mutual information, we have
\[I(M_1, M_2; Y_{2,0}^T) = I(M_2; Y_{2,0}^T) + I(M_1; Y_{2,0}^T \mid M_2) \ge I(M_2; Y_{2,0}^T) + \frac{snr_2}{snr_1}\, I(M_1; Y_{1,0}^T \mid M_2),\]
where, for the inequality above, we have applied Lemma A5. Now, by Theorem 6.2.1 in [12], we have
\[I(M_1, M_2; Y_{2,0}^T) = \frac{snr_2}{2}\int_0^T E\big[(X(s) - E[X(s) \mid Y_{2,0}^s])^2\big]\, ds \le \frac{snr_2}{2}\int_0^T E[X^2(s)]\, ds,\]
which, together with (A53), (A54), (A55) and (44), immediately implies the converse part.
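For readability, one way to spell out this combination (our reconstruction of the routine step, taking (44) to be the average power constraint $\int_0^T E[X^2(s)]\, ds \le PT$) is:
\[
\frac{snr_2}{snr_1}\, T R_1 + T R_2 \;\le\; \frac{snr_2}{snr_1}\, I(M_1; Y_{1,0}^T \mid M_2) + I(M_2; Y_{2,0}^T) + T\delta_T \;\le\; I(M_1, M_2; Y_{2,0}^T) + T\delta_T \;\le\; \frac{snr_2\, P\, T}{2} + T\delta_T,
\]
where $\delta_T = \tfrac{snr_2}{snr_1}\varepsilon_{1,T} + \varepsilon_{2,T} \to 0$; dividing by $T\, snr_2$ and letting $T \to \infty$ gives $R_1/snr_1 + R_2/snr_2 \le P/2$.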
The achievability part. We only sketch the proof of this part. For an arbitrarily small $\epsilon > 0$, by Theorem 6.4.1 in [12], one can choose an Ornstein–Uhlenbeck process $\tilde{X}$ with variance $P - \epsilon$, such that $I_T(\tilde{X}; Y_i)/T$ approaches $snr_i (P - \epsilon)/2$. For any $0 \le \lambda \le 1$, let
\[X(t) = \sqrt{\lambda}\, X_1(t) + \sqrt{1 - \lambda}\, X_2(t), \quad t \ge 0,\]
where $X_1$ and $X_2$ are independent copies of $\tilde{X}$. Then, by a similar argument as in the proof of Lemma A3, we deduce that $I_T(X_1; Y_1)/T$ and $I_T(X_2; Y_2)/T$ approach $snr_1 \lambda (P - \epsilon)/2$ and $snr_2 (1 - \lambda)(P - \epsilon)/2$, respectively. Then, a parallel random coding argument as in the proof of Theorem 7 such that
  • when encoding, $X_i$ only carries the message meant for receiver i;
  • when decoding, receiver i treats $X_j$, $j \ne i$, as noise,
shows that the rate pair $(snr_1 \lambda (P - \epsilon)/2, snr_2 (1 - \lambda)(P - \epsilon)/2)$ can be approached, which immediately establishes the achievability part. ☐
Remark A5.
For the achievability part, instead of using the power-sharing scheme as in the proof, one can also employ the following time-sharing scheme: set X to be $X_1$ for a $\lambda$ fraction of the time, and $X_2$ for the remaining $1 - \lambda$ fraction of the time. Then, it is straightforward to check that this scheme also achieves the rate pair $(snr_1 \lambda (P - \epsilon)/2, snr_2 (1 - \lambda)(P - \epsilon)/2)$. This, from a different perspective, echoes the observation in [78] that time-sharing achieves the capacity region of a white Gaussian BC as the bandwidth limit tends to infinity.
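A trivial numerical check (ours, with illustrative numbers and the ϵ slack ignored) confirms that both schemes sweep the same boundary and that every such pair meets the converse bound $R_1/snr_1 + R_2/snr_2 \le P/2$ with equality:

```python
import numpy as np

# Both the power-sharing and the time-sharing scheme yield the pair
# (snr1*lam*P/2, snr2*(1-lam)*P/2); every such pair lies on the boundary
# R1/snr1 + R2/snr2 = P/2 of the converse region (the eps slack is ignored).
snr1, snr2, P = 2.0, 1.0, 1.0
for lam in np.linspace(0.0, 1.0, 6):
    R1, R2 = snr1 * lam * P / 2, snr2 * (1 - lam) * P / 2
    print(f"lam = {lam:.1f}   (R1, R2) = ({R1:.3f}, {R2:.3f})   R1/snr1 + R2/snr2 = {R1/snr1 + R2/snr2:.3f}")
```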

References

1. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
2. Cover, T.; Thomas, J. Elements of Information Theory, 2nd ed.; Wiley Interscience: New York, NY, USA, 2006.
3. Nyquist, H. Certain factors affecting telegraph speed. Bell Syst. Tech. J. 1924, 3, 324–346.
4. Shannon, C. Communication in the presence of noise. Proc. IRE 1949, 37, 10–21.
5. Gallager, R. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968.
6. Wyner, A. The capacity of the band-limited Gaussian channel. Bell Syst. Tech. J. 1966, 45, 359–395, Reprinted in Key Papers in the Development of Information Theory; Slepian, D., Ed.; IEEE Press: New York, NY, USA, 1974; pp. 190–193.
7. Slepian, D. On Bandwidth. Proc. IEEE 1976, 64, 292–300.
8. Bethoux, P. Test et estimations concernant certaines functions aleatoires en particulier Laplaciennes. Ann. Inst. Henri Poincare 1962, 27, 255–322. (In French)
9. Fortet, R. Hypothesis testing and Estimation for Laplacian Functions. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 289–305.
10. Ash, R. Capacity and error bounds for a time-continuous Gaussian channel. Inf. Control 1963, 6, 14–27.
11. Ash, R. Further discussion of a time-continuous Gaussian channel. Inf. Control 1964, 7, 78–83.
12. Ihara, S. Information Theory for Continuous Systems; World Scientific: Singapore, 1993.
13. Poor, H. An Introduction to Signal Detection and Estimation; Springer: New York, NY, USA, 1994.
14. Ash, R. Information Theory; Wiley Interscience: New York, NY, USA, 1965.
15. Duncan, T. On the calculation of mutual information. SIAM J. Appl. Math. 1970, 19, 215–220.
16. Kadota, T.; Zakai, M.; Ziv, J. Mutual information of the white Gaussian channel with and without feedback. IEEE Trans. Inf. Theory 1971, 17, 368–371.
17. Weissman, T. The relationship between causal and non-causal mismatched estimation in continuous-time AWGN channels. IEEE Trans. Inf. Theory 2010, 56, 4256–4273.
18. Weissman, T.; Kim, Y.; Permuter, H. Directed information, causal estimation, and communication in continuous time. IEEE Trans. Inf. Theory 2013, 59, 1271–1287.
19. Liptser, R.; Shiryaev, A. Statistics of Random Processes (I): General Theory, 2nd ed.; Springer: Berlin, Germany, 2001.
20. Gelfand, A.; Yaglom, I. Calculation of the amount of information about a random function contained in another such function. Uspekhi Mat. Nauk 1957, 12, 3–52.
21. Huang, R.; Johnson, R. Information capacity of time-continuous channels. IEEE Trans. Inf. Theory 1962, 8, 191–198.
22. Huang, R.; Johnson, R. Information transmission with time-continuous random processes. IEEE Trans. Inf. Theory 1963, 9, 84–94.
23. Kim, Y. Gaussian Feedback Capacity. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2006.
24. Kloeden, P.; Platen, E. Numerical Solution of Stochastic Differential Equations; Stochastic Modelling and Applied Probability; Springer: Berlin/Heidelberg, Germany, 1992; Volume 23.
25. Guo, D.; Shamai, S.; Verdu, S. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inf. Theory 2005, 51, 1261–1282.
26. Pinsker, M. Information and Information Stability of Random Variables and Processes; Holden-Day: San Francisco, CA, USA, 1964.
27. Mao, X. Stochastic Differential Equations and Applications; Horwood: Bristol, UK, 1997.
28. Han, G. Limit theorems in hidden Markov models. IEEE Trans. Inf. Theory 2013, 59, 1311–1328.
29. Han, G. A randomized algorithm for the capacity of finite-state channels. IEEE Trans. Inf. Theory 2015, 61, 3651–3669.
30. Ahlswede, R. Multi-way communication channels. In Proceedings of the IEEE Second International Symposium on Information Theory, Tsahkadsor, Armenia, 2–8 September 1971.
31. Wyner, A. Recent results in the Shannon theory. IEEE Trans. Inf. Theory 1974, 20, 2–10.
32. Cover, T. Some advances in broadcast channels. In Advances in Communication Systems; Academic Press: San Francisco, CA, USA, 1975; Volume 4, pp. 229–260.
33. Cover, T.; Leung, C. An achievable rate region for the multiple-access channel with feedback. IEEE Trans. Inf. Theory 1981, 27, 292–298.
34. Willems, F. The feedback capacity region of a class of discrete memoryless multiple access channels. IEEE Trans. Inf. Theory 1982, 28, 93–95.
35. Bross, S.; Lapidoth, A. An improved achievable region for the discrete memoryless two-user multiple-access channel with noiseless feedback. IEEE Trans. Inf. Theory 2005, 51, 811–833.
36. Wu, W.; Vishwanath, S.; Arapostathis, A. On the capacity of multiple access channels with state information and feedback. IEEE Trans. Inf. Theory. submitted.
37. Ozarow, L. The capacity of the white Gaussian multiple access channel with feedback. IEEE Trans. Inf. Theory 1984, 30, 623–628.
38. Schalkwijk, J.; Kailath, T. Coding scheme for additive noise channels with feedback I: No bandwidth constraint. IEEE Trans. Inf. Theory 1966, 12, 172–182.
39. Sato, H. On the capacity region of a discrete two-user channel for strong interference. IEEE Trans. Inf. Theory 1978, 24, 377–379.
40. Han, T.; Kobayashi, K. A new achievable rate region for the interference channel. IEEE Trans. Inf. Theory 1981, 27, 49–60.
41. Annapureddy, V.; Veeravalli, V. Gaussian interference networks: Sum capacity in the low interference regime and new outer bounds on the capacity region. IEEE Trans. Inf. Theory 2009, 55, 3032–3050.
42. Motahari, A.; Khandani, A. Capacity bounds for the Gaussian interference channel. IEEE Trans. Inf. Theory 2009, 55, 620–643.
43. Shang, X.; Kramer, G.; Chen, B. A new outer bound and the noisy-interference sum-rate capacity for Gaussian interference channels. IEEE Trans. Inf. Theory 2009, 55, 689–699.
44. Etkin, R.; Tse, D.; Wang, H. Gaussian interference channel capacity to within one bit. IEEE Trans. Inf. Theory 2008, 54, 5534–5562.
45. Avestimehr, A.; Diggavi, S.; Tse, D. Wireless network information flow: A deterministic approach. IEEE Trans. Inf. Theory 2011, 57, 1872–1905.
46. Suh, C.; Tse, D. Feedback capacity of the Gaussian interference channel to within 2 Bits. IEEE Trans. Inf. Theory 2011, 57, 2667–2685.
47. El Gamal, A.; Kim, Y. Network Information Theory; Cambridge University Press: New York, NY, USA, 2011.
48. Bergmans, P. Random coding theorem for broadcast channels with degraded components. IEEE Trans. Inf. Theory 1973, 19, 197–207.
49. Cover, T. Broadcast channels. IEEE Trans. Inf. Theory 1972, 18, 2–14.
50. El Gamal, A. The capacity of the physically degraded Gaussian broadcast channel with feedback. IEEE Trans. Inf. Theory 1981, 27, 508–511.
51. Ozarow, L.; Leung, S. An achievable region and outer bound for the Gaussian broadcast channel with feedback. IEEE Trans. Inf. Theory 1984, 30, 667–671.
52. Hida, T.; Hitsuda, M. Gaussian Processes; American Mathematical Society: Providence, RI, USA, 1993; Volume 120.
53. Ibragimov, I.; Rozanov, Y. Gaussian Random Processes; Springer: New York, NY, USA, 1978.
54. Hitsuda, M. Mutual information in Gaussian channels. J. Multivar. Anal. 1974, 4, 66–73.
55. Hitsuda, M.; Ihara, S. Gaussian channels and the optimal coding. J. Multivar. Anal. 1975, 5, 106–118.
56. Ihara, S. On the capacity of the continuous time Gaussian channel with feedback. J. Multivar. Anal. 1980, 10, 319–331.
57. Ihara, S. Capacity of mismatched Gaussian channels with and without feedback. Probab. Theory Rel. Fields 1990, 84, 453–471.
58. Ihara, S. Coding theorems for a continuous-time Gaussian channel with feedback. IEEE Trans. Inf. Theory 1994, 40, 2014–2045.
59. Ihara, S. Mutual information in stationary channels with additive noise. IEEE Trans. Inf. Theory 1985, 31, 602–606.
60. Jacob, B.; Zakai, M.; Ziv, J. On the ε-entropy and the rate-distortion function of certain non-Gaussian processes. IEEE Trans. Inf. Theory 1974, 20, 517–524.
61. Baker, R.; Ihara, S. Information capacity of the stationary Gaussian channel. IEEE Trans. Inf. Theory 1991, 37, 1314–1326.
62. Kim, Y. Feedback capacity of stationary Gaussian channels. IEEE Trans. Inf. Theory 2010, 56, 57–85.
63. Ozarow, L.; Wyner, A.; Ziv, J. Achievable rates for constrained Gaussian channel. IEEE Trans. Inf. Theory 1988, 34, 365–370.
64. Shamai, S.; David, I.B. Upper bounds on capacity for a constrained Gaussian channel. IEEE Trans. Inf. Theory 1989, 35, 1079–1084.
65. Abou-Faycal, I.; Trott, M.; Shamai, S. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inf. Theory 2001, 47, 1290–1301.
66. Chan, T.; Hranilovic, S.; Kschischang, F. Capacity-Achieving probability measure for conditionally Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2005, 51, 2073–2088.
67. Dytso, A.; Goldenbaum, M.; Shamai, S.; Poor, V. Upper and Lower Bounds on the Capacity of Amplitude-Constrained MIMO Channels. Available online: https://arxiv.org/abs/1708.09517 (accessed on 1 January 2018).
68. Fahs, J.; Abou-Faycal, I. Using Hermite bases in studying capacity-achieving distributions over AWGN channels. IEEE Trans. Inf. Theory 2012, 58, 5302–5322.
69. Raginsky, M. On the information capacity of Gaussian channels under small peak power constraints. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, USA, 23–26 September 2008; pp. 286–293.
70. Shamai, S.; Bar-David, I. The capacity of average and peak-power-limited quadrature Gaussian channels. IEEE Trans. Inf. Theory 1995, 41, 1060–1071.
71. Sharma, N.; Shamai, S. Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels. Probl. Inf. Transm. 2010, 46, 283–299.
72. Smith, J. The information capacity of amplitude- and variance-constrained scalar Gaussian channels. Inf. Control 1971, 18, 203–219.
73. Durrett, R. Probability: Theory and Examples, 4th ed.; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2010.
74. Royden, H. Real Analysis, 4th ed.; Prentice Hall: Boston, MA, USA, 2010.
75. Oksendal, B. Stochastic Differential Equations: An Introduction with Applications; Springer: Berlin, Germany, 1995.
76. Kutoyants, Y. Statistical Inference for Ergodic Diffusion Processes; Springer: London, UK, 2004.
77. Leon-Garcia, A. Probability, Statistics, and Random Processes for Electrical Engineering; Pearson/Prentice Hall: Upper Saddle River, NJ, USA, 2008.
78. Lapidoth, A.; Telatar, E.; Urbanke, R. On wide-band broadcast channels. IEEE Trans. Inf. Theory 2003, 49, 3250–3258.
