1. Introduction
We consider the setting of real-time decision systems based on remotely sensed observations. In this setting, the decision maker needs to track the remote observations with high precision and in a timely manner. These are competing requirements: high-precision tracking requires a larger number of bits to be communicated, resulting in a larger transmission delay and increased staleness of information. Towards this larger goal, we study the following problem.
Consider a discrete time first-order auto-regressive (AR[1]) process. A sensor draws a sample from this process periodically, once every s time-slots. In each of these time-slots, the sensor can send R bits to a center. The center seeks to form an estimate of the current state at time t, with small mean square error (MSE). Specifically, we are interested in minimizing the time-averaged error to enable timely and accurate tracking of the process.
We propose and study a successive update scheme where the encoder computes the error in the estimate of the latest sample at the decoder and sends its quantized value to the decoder. The decoder adds this value to its previous estimate to update the estimate of the latest sample, and uses it to estimate the current value using a linear predictor. We instantiate this scheme with a general gain-shape quantizer for error-quantization.
Note that we can send this update several times between two sampling instances. In particular, our interest is in comparing a fast but loose scheme, where an update is sent every slot, against a slower scheme that sends an update only once every p communication slots. The latter allows the encoder to use more bits for the update, but the decoder will need to wait longer. We consider a class of discrete time AR[1] processes generated by an independent and identically distributed (i.i.d.) random innovation sequence such that the fourth moment of the process is bounded. Within this class, we show that the fast but loose successive update scheme, used with an appropriately selected quantizer, is universally optimal for all possible distributions of the innovation sequence when the number of dimensions grows asymptotically large.
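As a minimal illustration of the timing structure just described (not of the paper's scheme itself), the following sketch simulates an AR[1] process with periodic sampling and a per-slot bit budget; the AR coefficient, the Gaussian innovation, and all numerical values are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 64        # dimension of each observation
alpha = 0.9   # AR[1] coefficient (assumed)
sigma = 1.0   # innovation scale (assumed)
s = 4         # sampling period: one new sample every s slots
R = 8         # bits available per communication slot (assumed)
T = 20        # horizon in slots

# Simulate the process one slot at a time.
X = np.zeros((T, n))
for t in range(1, T):
    X[t] = alpha * X[t - 1] + sigma * rng.standard_normal(n)

# The sensor only sees X at sampling instants 0, s, 2s, ...;
# between two samples it can spend a total budget of s*R bits.
sampling_instants = list(range(0, T, s))
budget_per_sample = s * R
print("sampling instants:", sampling_instants)
print("bit budget per sample:", budget_per_sample)
```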
To show this optimality, we use a random construction for the quantizer, based on the spherical code given in [1,2]. Roughly speaking, this ideal quantizer Q yields an expected squared quantization error that is a fixed, rate-dependent multiplicative factor of the squared norm of the input, uniformly over every n-dimensional vector y in a bounded domain. However, in practice, at finite n, such quantizers need not exist. Most practical vector quantizers have an extra additive error, and the error bound then takes the form of a multiplicative term plus an additive term, with quantizer parameters that vary with the choice of quantizer. We present our analysis for such general quantizers. Interestingly, for such a quantizer (which is all we have at a finite n), the optimal choice of p can differ from 1. Our analysis provides a theoretically sound guideline for choosing the frequency of updates for practical quantizers.
Our work relates to a large body of literature ranging from real-time compression to control and estimation over networks. The structure of real-time encoders for source coding has been studied in [3,4,5,6,7,8,9]. The general structure of real-time encoders for Markov sources is studied for communication over error-free channels in [3] and over noisy channels in [4,5]. The authors in [3] consider communication delays similar to our setting. However, the delayed distortion criterion considered in [3] is different from the instantaneous distortion in our work. From these works, we see that the optimal encoder output for a kth order Markov source at any instant depends only on the k latest symbols and the present state of the decoder memory. A similar structural result for optimal encoders and decoders that are restricted to be causal is given in [6]. Furthermore, structural results in the context of optimal zero-delay coding of correlated sources are available in [7,8,9]. Some of these results can be extended to the case of finite-delay decoding. However, as we need to track the process in real time, we have to produce an estimate for the latest sample at every instant even though the current information available at the decoder is stale and corresponds to past samples. This is quite different from the case of delayed decoding. Hence, the setup in all these works is different from the problem we consider and the results do not extend to our problem. Another related area studied in the literature is that of remote source coding of noisy sources [10,11,12], with an optimal encoder-decoder structure discussed in [11]. Further, [13] examines the optimal sampling strategy for remote reconstruction of a bandlimited signal from noisy versions of the source. Studies on remote reconstruction of a stationary process from its noisy/noiseless samples can be found in [14,15,16,17] as well. However, in our setting, the sampling is fixed and, moreover, each transmission in our system incurs a delay that depends on the encoding rate. Hence, our problem does not directly fit into any of these frameworks.
The problems of remote estimation under communication constraints of various kinds have been studied in [18,19,20,21,22,23]. This line of work proposes several Kalman-like recursive estimation algorithms and evaluates their performance. In a related thread, [24,25,26] study remote estimation under communication and other related constraints using tools from dynamic programming and stochastic control. However, in all these works the role of channel delay is slightly different from that in our setting. Furthermore, the specific problem of the choice of quantizer that we consider has not been examined. More recently, [27] studied remote estimation of a Wiener process over channels with random delays and proved that the optimal sampling policy is a threshold-based policy. This work, like some of the works cited above, assumes real-valued transmissions and does not take quantization effects into account.
In more information theoretic settings, sequential coding for individual sequences under delay constraints was studied in [28,29,30,31]. Closer to our work, the causal (nonanticipatory) rate-distortion function for a stochastic process goes back to the early works [32,33]. Recent works [34,35,36] consider the specific case of auto-regressive and Gauss–Markov processes and use the general formula in these early works to establish asymptotic optimality of a simpler information structure for the encoders (the optimal decoder structure is straightforward). Further, the system model in these works differs slightly from ours, as information transmission in our setting suffers a delay due to the channel rate constraint. We note that related formulations have been studied for simple settings of two or three iterations in [37,38], where interesting encoder structures that reuse previous communication for the next sample emerge as well. Although some of these works propose specific optimal schemes, the key results in this line of research provide an expression for the rate-distortion function as an optimization problem, solving which would provide guidelines for a concrete scheme. In contrast, motivated by problems of estimation over an erasure channel, ref. [39] provides an asymptotically optimal scheme that roughly uses Gaussian codebooks to quantize the innovation errors between the encoder's observation and the decoder's estimate. Our work is closest to [39], and our encoding scheme shares similarities with the predictive Differential Pulse Code Modulation (DPCM) based scheme employed in [39]. However, our work differs in an important aspect from all the works mentioned in this research thread: we take transmission delays into account. In particular, in our formulation the estimation at the decoder happens in real time in spite of the communication delays. Note that the rate constraint of the channel causes a delay in the reception of information at the decoder. Nevertheless, the decoder must provide an estimate of the current state of the process at every time instant, and a longer codeword results in a longer delay for the decoder to receive complete information.
Nonetheless, our converse bound is derived using methods similar to those of [39]. Even the achievability part of our proof draws from [39], but with a technical caveat. Note that after the first round of quantization, the error vector need not be Gaussian, and the analysis in [39] can only be applied after showing closeness of the error vector distribution to a Gaussian in the Wasserstein distance of order 2. While the original proof in [39] overlooks this technical point, the gap can be filled using a recent result from [40] if spherical codes are used. However, we follow an alternative approach and give a direct analysis using vector quantizers.
In addition, there is a large body of work on control problems over rate-limited communication channels (cf. [41,42,43,44,45,46,47,48,49,50]). This line of work implicitly requires handling communication delays in the construction of estimators. However, the simple formulation we consider seems to be missing, and the results in this long line of work do not resolve the questions we raise.
Our main contributions in this paper are as follows: we present an encoder structure which we show to be asymptotically optimal in the dimension of the observation. Specifically, we propose to send successive updates that refine the estimate of the latest sample at the decoder. It is important to note that we quantize the estimation error at the decoder, instead of quantizing the innovation sequence formed at the encoder. Although the optimal MMSE decoder involves taking a conditional expectation, we use a simple decoder with a linear structure. Yet, we show that this decoder is asymptotically optimal. We then instantiate this general scheme with spherical codes for the quantizers to obtain a universal scheme. In particular, we consider general gain-shape quantizers and develop a framework to analyze their performance. One interesting result we present shows that the tradeoff between the accuracy and the frequency of updates must be carefully balanced based on the “bias” (additive error) of the quantizer used.
We present our problem formulation in the next section. Section 3 presents a discussion of our achievability scheme, followed by the main results in the subsequent section. Section 5 provides a detailed analysis of our scheme, which we further build on in Section 6 to obtain our asymptotic achievability results. We prove our converse bound in Section 7 and conclude with a discussion on extensions of our result in the final section.
3. The Successive Update Scheme
In this section, we present our main contribution in this paper, namely the Successive Update tracking code. Before we describe the scheme completely, we present its different components. In every communication slot, the transmitter gets an opportunity to send a fixed number of bits. The transmitter may use it to send any information about a previously seen sample. There are various options for the encoder. For instance, it may use the current slot to send some information about a sample it observed earlier, or it may use all the slots between two sampling instants to send a quantized version of the latest sample. The information available to the decoder is limited by the structure of information transmission adopted at the encoder. As the process we consider is Markov, we choose to utilize all the transmission instants between two sampling instants to send information about the latest sample.
3.1. Encoder Structure: Refining the Error Successively
As mentioned earlier, the encoder and decoder that we employ in this work are similar to those in the DPCM scheme [39,51,52]. However, recall that we have multiple transmission opportunities between consecutive sampling instants and that the transmissions are delayed. This calls for certain modifications, which we explain in the following.
At each time instant t, the receiver maintains an estimate of the current state and an estimate of the latest process sample. Our encoder computes the error in the receiver's estimate of the last process sample at each time instant t, quantizes this error, and sends the quantized value as its communication. At any time instant, the receiver estimates the current state from its estimate of the latest sample using a linear predictor. Simply speaking, our encoder computes and quantizes the error in the decoder's current estimate of the last sample, and sends it to the decoder to enable refinement of the estimate in the next time slot. While we have not been able to establish optimality of this encoder structure, our results will show its asymptotic optimality when the number of dimensions n goes to infinity.
Even within this structural simplification, a very interesting question remains. Since the process is sampled once every s time slots, we potentially have all the bits from s communication slots to encode the latest sample. At any time instant, the receiver has access to the previously received codewords and the partial codewords for the latest sample. A simple approach for the encoder is to use the complete codeword to express the latest sample, with the decoder ignoring the partial codewords; this approach results in slow but very accurate updates of the sample estimates. An alternative, fast but loose, approach sends quantizer codewords to refine the estimates in every communication slot. Should we prefer fast but loose estimates or slow but accurate ones? Our results will shed light on this conundrum.
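To make this tradeoff concrete, the following sketch enumerates the possible update periods p for purely illustrative values of the sampling period and the per-slot bit budget, assuming for this illustration that p divides s; the variable names s, R, and p match the text, while the numbers are assumptions.

```python
# Hypothetical numbers: s slots per sample, R bits per slot.
s, R = 12, 6
total_budget = s * R  # bits available to describe one sample

# An update is sent once every p slots; when p divides s, the s slots
# split into s/p updates of p*R bits each.
for p in [d for d in range(1, s + 1) if s % d == 0]:
    updates_per_sample = s // p
    bits_per_update = p * R
    print(f"p={p:2d}: {updates_per_sample:2d} updates of {bits_per_update:3d} bits "
          f"(first update arrives after {p} slots)")
```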
3.2. The Choice of Quantizers
In our description of the encoder structure above, we did not specify a key design choice, namely the choice of the quantizer. We will restrict ourselves to using the same quantizer to quantize the error in each round of communication. The precision of this quantizer will depend on whether we choose a fast but loose paradigm or a slow but accurate one; the overall structure, however, remains the same. Roughly speaking, we allow any gain-shape [53] quantizer, which separately sends the quantized values of the gain (the norm of the input) and the shape (the normalized input) for an input vector y. Formally, we use the following abstraction.
Definition 2 (Quantizer family). Fix a rate, a dynamic range M, and a bias. A quantizer Q with dynamic range M, specified by a mapping from input vectors to a finite set of reconstruction points, constitutes a quantizer at this rate and bias if, for every input vector lying within the dynamic range, the expected squared quantization error is bounded by a multiplicative term plus the additive bias. Further, for a multiplicative factor that is a decreasing function of the rate R, a family of quantizers indexed by R constitutes a quantizer family if for every R the corresponding quantizer satisfies this guarantee. The expectation in the previous definition is taken with respect to the randomness in the quantizer, which is assumed to be shared between the encoder and the decoder for simplicity. For instance, in Lemma 5 we study a random codebook based construction for such a quantizer and we assume that once a random codebook is picked by the encoder, it is made known to the decoder. The parameter M, termed the dynamic range of the quantizer, specifies the domain of the quantizer. When the input y does not lie in this domain, the quantizer simply declares a failure, which we denote by ⊥. Our tracking code may use any such quantizer family. It is typical in any construction of a gain-shape quantizer to have a finite M and a nonzero bias. Our analysis for finite n will apply to any such quantizer family and, in particular, will bring out the role of the “bias”. However, when establishing our optimality result, we instantiate it using a random spherical code to get the desired performance.
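The following minimal sketch illustrates only the interface implied by this abstraction: a quantizer with a dynamic range M that either returns a reconstruction or the failure symbol. The per-coordinate uniform quantizer used as a concrete instance is a placeholder chosen for brevity (it is not a gain-shape construction), and the convention that the dynamic range bounds the per-dimension squared norm is an assumption of this sketch.

```python
import numpy as np

class Quantizer:
    """Interface assumed by the tracking code: quantize() returns a reconstruction
    of the input, or FAIL (standing in for ⊥) when the input is outside the
    dynamic range M (taken here as a bound on the per-dimension squared norm)."""
    FAIL = None

    def __init__(self, n, M):
        self.n, self.M = n, M

    def quantize(self, y):
        raise NotImplementedError

class UniformQuantizer(Quantizer):
    """Placeholder instance: each coordinate is quantized uniformly on [-L, L]
    with L = sqrt(n*M), so every in-range input is covered (wasteful, but enough
    to illustrate the contract and the failure symbol)."""
    def __init__(self, n, M, bits_per_dim=6):
        super().__init__(n, M)
        self.levels = 2 ** bits_per_dim

    def quantize(self, y):
        if np.sum(y ** 2) > self.n * self.M:
            return self.FAIL                      # outside the dynamic range
        L = np.sqrt(self.n * self.M)
        step = 2 * L / self.levels
        idx = np.clip(np.floor((y + L) / step), 0, self.levels - 1)
        return -L + (idx + 0.5) * step            # midpoint of each cell

# Usage: the quantization error should be small relative to the input norm.
q = UniformQuantizer(n=8, M=4.0)
y = np.random.default_rng(0).standard_normal(8)
print(np.sum((y - q.quantize(y)) ** 2), np.sum(y ** 2))
```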
3.3. Description of the Successive Update Scheme
All the conceptual components of our scheme are ready. Note that we focus on updating the estimate of the latest observed sample at the decoder. Our encoder successively updates this estimate by quantizing and sending the error in the decoder's current estimate of the latest sample.
As discussed earlier, we must decide whether we prefer a fast but loose or a slow but accurate approach for sending the error estimates. To carefully examine this tradeoff, we opt for a more general scheme in which the bits available between two samples are divided into m = s/p subfragments of equal length, one for every p communication slots. We use a quantizer operating at this subfragment length to refine the error estimate of the latest sample (obtained at the most recent sampling instant) once every p slots and send the resulting quantizer codewords as partial tracking codewords. Specifically, the kth codeword transmission interval is divided into m subfragments of p communication slots each, and the jth partial tracking codeword is transmitted over the slots of the jth subfragment.
At the end of each subfragment, the decoder receives the corresponding jth subfragment of bits and uses it to refine its estimate of the latest source sample. Note that the fast but loose and the slow but accurate regimes described above correspond to p = 1 and p = s, respectively. In the middle of a subfragment, the decoder ignores the partially received quantizer codeword and retains the estimate of the latest sample formed at the end of the previous subfragment. It forms an estimate of the current state by simply scaling this estimate by the appropriate power of the process coefficient.
Finally, we impose one additional simplification on the decoder structure. We simply update the estimate by adding to it the quantized value of the error. Thus, the decoder has a simple linear structure.
We can use any bit quantizer of the appropriate rate for the n-dimensional error vector (with a slight abuse of notation, we use the same symbol for quantizers of different rates), whereby this scheme can be easily implemented in practice if the quantizer itself can be implemented. For instance, we can use any standard gain-shape quantizer. The performance of most quantizers can be analyzed explicitly to render them a quantizer family in the sense of Definition 2, for an appropriate dynamic range M and rate-dependent multiplicative factor. Later, when analyzing the scheme, we will consider a quantizer coming from such a family and present a theoretically sound guideline for choosing p.
Recall that the decoder maintains an estimate of the latest sample at every time instant. We start with the initialization described below and then proceed using the encoder and the decoder algorithms outlined above. Note that our quantizer may declare the failure symbol ⊥, in which case the decoder must still yield a nominal estimate; we simply declare the estimate to be 0 once a failure happens. (In the analysis, we account for all these events as errors; only the probability of failure determines the contribution of this part to the MSE, since the process is mean-square bounded.)
We give a formal description of our encoder and decoder algorithms below.
The encoder.
- 1 Initialize the sample index k, the subfragment index j, and the decoder estimate.
- 2 At each quantization instant, use the decoder algorithm (to be described below) to form the current estimate of the latest sample, and compute the error between this estimate and the latest sample available at that time.
- 3 Quantize this error using the chosen bit quantizer.
- 4 If a quantizer failure occurs, send ⊥ to the receiver and terminate the encoder.
- 5 Else, send a binary representation of the quantized error as the communication to the receiver over the next p communication slots. (For simplicity, we do not account for the extra message symbol needed for sending ⊥.)
- 6 If the current sampling interval is not over, increase j by 1; else reset j and increase k by 1. Go to Step 2.
The decoder.
- 1 Initialize the sample index k, the subfragment index j, and the estimate of the latest sample.
- 2 At the end of each subfragment, if an encoding failure has not occurred until time t, add the received quantized error to the current estimate of the latest sample and output the estimate of the current state obtained by scaling it appropriately.
- 3 Else, if an encoding failure has occurred and the ⊥ symbol is received, declare the estimate to be 0 for all subsequent time instants.
- 4 At time instants in the middle of a subfragment, output the estimate obtained by scaling the current estimate of the latest sample. (We ignore the partial quantizer codewords received till time t.)
- 5 If the current sampling interval is not over, increase j by 1; else reset j and increase k by 1. Go to Step 2.
Note that the decoder has a simple structure and the principal component of the encoder is the quantizer. Therefore, the complexity of the proposed scheme is dominated by the complexity of the quantization operation and varies with the quantizer chosen.
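To make the description above concrete, the following toy end-to-end sketch simulates the successive update loop. The AR coefficient, the noise law, the per-coordinate quantizer standing in for Q, and the decision to ignore the p-slot transmission delay of each subfragment are all simplifying assumptions made for illustration; this is not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, s, p, T = 64, 0.9, 4, 2, 40
assert s % p == 0

def toy_quantizer(e, levels=32, dyn_range=100.0):
    """Per-coordinate uniform quantizer standing in for Q.
    Returns None (playing the role of ⊥) if the input is outside the dynamic range."""
    if np.linalg.norm(e) ** 2 > n * dyn_range:
        return None
    step = 2 * np.sqrt(dyn_range) / levels
    return np.clip(np.round(e / step), -levels // 2, levels // 2 - 1) * step + step / 2

X = np.zeros(n)                 # true state
sample = X.copy()               # latest sample seen by the encoder
est_sample = np.zeros(n)        # decoder's running estimate of the latest sample
failed = False
errors = []

for t in range(1, T + 1):
    X = alpha * X + rng.standard_normal(n)          # process evolves every slot
    if t % s == 0:
        sample = X.copy()                           # new sample at the sensor
        est_sample = alpha ** s * est_sample        # decoder predicts the new sample
    if t % p == 0 and not failed:
        q = toy_quantizer(sample - est_sample)      # encoder quantizes the decoder's error
        if q is None:
            failed = True
            est_sample = np.zeros(n)                # nominal estimate after failure
        else:
            est_sample = est_sample + q             # linear additive refinement
    # decoder's real-time estimate of the current state: scaled latest-sample estimate
    est_now = alpha ** (t % s) * est_sample
    errors.append(np.mean((X - est_now) ** 2))

print("time-averaged MSE per dimension:", np.mean(errors))
```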
6. Asymptotic Achievability Using Random Quantizer
With Theorem 3 at our disposal, the proof of achievability can be completed by fixing the parameters and showing the existence of an appropriate quantizer. However, we need to handle the failure event, and we address this first. The next result shows that the failure probability depends on the quantizer only through M.
Lemma 4. For fixed T and n, consider the p-SU scheme used with a bit quantizer Q (in the sense of Definition 2) with dynamic range M. Then, for every prescribed tolerance, there exists a threshold independent of n such that, for all M exceeding this threshold, the probability of encoder failure until time T is below the tolerance.
Proof. The event of encoder failure not happening until time T for the successive update scheme occurs when the error quantized at each quantization instant up to time T lies within the dynamic range of the quantizer. For brevity, denote by e the error at a generic quantization instant. By Markov's inequality, the probability that e falls outside the dynamic range is at most its expected squared norm divided by the threshold defining the dynamic range, and this expected squared norm is bounded on the event that no failure has occurred so far; combining these two observations bounds the failure probability at each quantization instant. We saw earlier in the proof of Lemma 1 that the expected squared norm of the error depends only on the probability that failure does not occur until the previous quantization instant. Proceeding as in that proof, we obtain a bound whose constants do not depend on n. Therefore, there exists a threshold independent of n such that, for all M exceeding it, the failure probability at each quantization instant is sufficiently small, which completes the proof upon summing over the instants up to time T. □
The bound above is rather loose, but it suffices for our purpose. In particular, it says that we can choose M sufficiently large to make the probability of failure until time T smaller than any prescribed tolerance, whereby Theorem 3 can be applied by designing a quantizer for this M. Indeed, we can use the quantizer of the unit sphere from [1,2], along with a uniform quantizer for the gain (which lies in a bounded interval determined by the dynamic range), to get the following performance. In fact, we will show that a deterministic quantizer with the desired performance exists. Note that we already considered such a quantizer in Example 3. However, the analysis there was slightly loose and it assumed the existence of an ideal shape quantizer.
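As a purely numerical illustration of the Markov-inequality step used in the proof of Lemma 4, the following sketch compares the bound with an empirical failure frequency; the Gaussian vector standing in for the error and all numerical values are assumptions, since the actual error distribution in the scheme is different.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 128, 20000
per_dim_var = 3.0            # assumed second-moment level of the error

for M in [5.0, 10.0, 20.0, 40.0]:
    e = np.sqrt(per_dim_var) * rng.standard_normal((trials, n))
    out_of_range = np.mean(np.sum(e ** 2, axis=1) > n * M)
    markov_bound = per_dim_var / M        # E||e||^2 / (n*M)
    print(f"M={M:5.1f}  empirical={out_of_range:.4f}  Markov bound={markov_bound:.4f}")
```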
Lemma 5. For every choice of the remaining parameters, there exists a bit quantizer with dynamic range M satisfying the guarantee of Definition 2, for all n sufficiently large.
Proof. We first borrow a classic construction from [1,2], which gives us our desired shape quantizer. Denote by S the (n-1)-dimensional unit sphere in n-dimensional Euclidean space. For every covering accuracy and n sufficiently large, it was shown in [1,2] that there exists a finite collection of vectors on S such that every point of S lies within the prescribed distance of some vector in the collection. Consider the shape quantizer from [2] that maps each unit vector to the nearest vector in this collection; note that we also shrink the length of the output by a small factor, which will be seen to yield the gain over the analysis in Example 3. We append to this shape quantizer a uniform gain quantizer, which quantizes the interval of admissible gains uniformly into subintervals of equal length and represents the selected subinterval by the binary representation of its index. For every input y within the dynamic range, we consider the quantizer that combines the quantized gain with the quantized shape. For this quantizer, the expected squared quantization error satisfies the required bound, where the first inequality in its derivation uses the covering property of the spherical code. Therefore, Q constitutes a bit quantizer, in the sense of Definition 2, with dynamic range M, for all n sufficiently large. Note that this quantizer is a deterministic one. □
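The sketch below mimics the spirit of this construction with a randomly drawn shape codebook and a uniform gain quantizer. It does not certify the covering guarantee of [1,2], and the bit split, the shrinking factor, and the dynamic-range convention ||y||^2 <= n*M are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_shape_codebook(n, bits, rng):
    """Random unit vectors standing in for the spherical code of [1,2]; the covering
    property of that construction is NOT guaranteed here."""
    C = rng.standard_normal((2 ** bits, n))
    return C / np.linalg.norm(C, axis=1, keepdims=True)

def gain_shape_quantize(y, shape_codebook, M, gain_bits, shrink=0.95):
    """Quantize the gain ||y|| uniformly on [0, sqrt(n*M)] and the shape y/||y|| by the
    nearest codeword; 'shrink' mimics the slight length reduction used in the proof."""
    n = y.size
    if np.linalg.norm(y) ** 2 > n * M:
        return None                              # failure symbol ⊥
    gain = np.linalg.norm(y)
    levels = 2 ** gain_bits
    step = np.sqrt(n * M) / levels
    q_gain = min(np.floor(gain / step), levels - 1) * step + step / 2
    shape = y / gain if gain > 0 else shape_codebook[0]
    idx = int(np.argmax(shape_codebook @ shape)) # nearest unit vector by correlation
    return shrink * q_gain * shape_codebook[idx]

# Tiny usage example with assumed parameters.
n, M = 16, 4.0
codebook = make_shape_codebook(n, bits=10, rng=rng)
y = rng.standard_normal(n)
y_hat = gain_shape_quantize(y, codebook, M, gain_bits=6)
print("relative squared error:", np.sum((y - y_hat) ** 2) / np.sum(y ** 2))
```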
Proof of Theorem 1. For any fixed horizon T, we can make the probability of failure until time T as small as desired by choosing M sufficiently large. Further, for any such fixed M, by Lemma 5, we can choose n sufficiently large to obtain a bit quantizer with the required guarantee for all vectors y within the dynamic range. Therefore, applying Theorem 3 with this quantizer, we obtain the claimed bound on the tracking error. The proof is completed upon taking the limits as the residual parameters go to 0. □
7. Converse Bound: Proof of Theorem 2
The proof is similar to the converse proof in [55], but now we need to handle the delay per transmission. We rely on the properties of the entropy power of a random variable. Recall that for a continuous random variable X taking values in $\mathbb{R}^n$, the entropy power of X is given by
$$N(X) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X)},$$
where $h(X)$ is the differential entropy of X.
Consider a tracking code of rate R and sampling period s, and a process from the class under consideration. We begin by noting that the state at time t is related to the state at an earlier time through the AR[1] recursion, where the accumulated noise is independent of that earlier state (and of the past states). In particular, for time instants between two sampling instants, the mean squared tracking error decomposes into a scaled version of the error in estimating the latest sample plus the variance of the noise accumulated since the sampling instant; the first identity uses the orthogonality of the noise added in each round to the previous states and noise, and the second equality follows directly by substituting the variance of the components of the process and simplifying the summation. Since the Gaussian distribution has the maximum differential entropy among all continuous random variables with a given variance, and the entropy power of a Gaussian random variable equals its variance, each variance term is lower bounded by the corresponding entropy power. Therefore, the previous bound for the tracking error yields a lower bound in terms of entropy powers, where we use the assumption that the innovations are identically distributed for all t. Taking the average of these terms over the horizon, we obtain a lower bound on the time-averaged error.
Note that these estimates act as estimates of the sampled process and depend on the communication received by the decoder until the corresponding time instant. Denoting the communication received at time t accordingly, each estimate depends only on the communication received up to the time it is formed. In particular, the communication sent at any time is a function of the samples seen by the encoder up to that time.
From here on, we proceed by invoking “entropy power bounds” for the MSE terms. For random variables X and Y such that X given Y has a conditional density, the conditional entropy power is given by $N(X \mid Y) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X \mid Y)}$, where the conditional differential entropy is $h(X \mid Y) = \int h(X \mid Y = y)\, \mathrm{d}P_Y(y)$. Bounding MSE terms by entropy power is a standard step that allows us to track the reduction in error due to a fixed amount of communication.
We begin by using the following standard bound (see [56], Chapter 10), which follows by noting that the Gaussian maximizes differential entropy among all random variables with a given second moment: for a continuous random variable X and a discrete random variable Y taking finitely many values, and for any estimate of X that is a function of Y, the mean squared estimation error is lower bounded in terms of the conditional entropy power of X given Y and, in turn, in terms of the entropy power of X and the number of values taken by Y.
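For concreteness, the following display sketches the standard chain behind such bounds; the per-dimension normalization matches the definition of entropy power above, and the alphabet size $2^{nR}$ reflects our reading of the rate constraint (an assumption for this illustration).

```latex
\frac{1}{n}\,\mathbb{E}\big\|X-\hat{X}(Y)\big\|^2
  \;\ge\; \frac{1}{2\pi e}\, e^{\frac{2}{n} h\left(X-\hat{X}(Y)\,\middle|\,Y\right)}
  \;=\; N(X\mid Y)
  \;\ge\; N(X)\, 2^{-2R},
  \qquad \text{when } Y \text{ takes at most } 2^{nR} \text{ values.}
```

Here the first inequality holds since the Gaussian maximizes differential entropy for a given second moment (together with Jensen's inequality applied to the expectation over Y), the equality uses the translation invariance of differential entropy given Y, and the last inequality uses $h(X \mid Y) \ge h(X) - H(Y) \ge h(X) - nR\ln 2$.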
We apply this result with the latest sample, conditioned on the communication received so far, in the role of X and the newly received communication in the role of Y. The previous bound and Jensen's inequality then yield a corresponding lower bound on the estimation error of the sample.
Next, we recall the entropy power inequality (cf. [56]): for independent X and Z, $N(X+Z) \ge N(X) + N(Z)$. Noting that each new sample is obtained from the previous one by scaling and adding an i.i.d. zero-mean innovation independent of it, and that the decoder's estimate is a function of the received communication, we obtain a lower bound on the conditional entropy power of the new sample, where the identities use the scaling property of differential entropy. Upon combining the bounds given above and simplifying, we arrive at a recursive lower bound on the estimation error of the sampled process.
Finally, note that the remaining terms are exactly the same as those considered in [55] (eqn. 11e), since they correspond to recovering a sample using communication that can depend on it. Therefore, a similar expression holds here for the sampled process. Using the recursive bound for the tracking error in (6) and (7), we adapt the results of [55] (eqn. 11) to our case and obtain a lower bound in which the key quantity is defined by a recursion with a fixed initial condition.
The bound obtained above holds for any given process in our class. To obtain the best possible bound, we take the innovation to be Gaussian, since that maximizes the entropy power. Specifically, we set the innovation to be a zero-mean Gaussian random variable with the prescribed variance, so that its entropy power equals its variance. Thus, taking the supremum over all distributions on both sides of (8), we obtain a bound expressed through a sequence defined by the corresponding recursion. For this sequence, the limiting value can be evaluated explicitly (cf. [55] (Corollary 1)). Therefore, we have obtained the desired lower bound on the time-averaged error and, as it holds for all tracking codes, the claim of Theorem 2 follows.
8. Discussion
We restricted our treatment to an AR[1] process with uncorrelated components. This restriction is for clarity of presentation, and some of the results can be extended to AR[1] processes with correlated components. In this case, the decoder will be replaced by a Kalman-like filter in the manner of [35]. A natural extension of this work is the study of an optimal transmission strategy for an AR[n] process in the given setting. In an AR[n] process, the strategy of refining the latest sample is clearly not sufficient, as the value of the process at any time instant depends on the past n samples. If the sampling is periodic, even the encoder does not have access to all these n samples unless we take a sample at every instant. A viable alternative is to take n consecutive samples at every sampling instant. However, even with this structure on the sampling policy, it is not clear how the information must be transmitted. A systematic analysis of this problem is an interesting area for future research.
Another setting not discussed in the current work is that of transmissions with nonuniform rates. Throughout our work, we have assumed periodic sampling and transmissions at a fixed rate. For the scheme presented in this paper, it is easy to see from our analysis that only the total number of bits transmitted in each sampling interval matters when the dimension is sufficiently large. That is, for our scheme, even framing each packet (sent in each communication slot) with an unequal number of bits gives the same performance as equal packet sizes, provided the overall bit-budget per sampling period is fixed. A similar phenomenon was observed in [39], which allowed the extension of some of their analysis to erasure channels with feedback. We remark that a similar extension is possible for some of our results, too. This behavior stems from the use of successive batches of bits to successively refine the estimate of a single sample within a sampling interval, whereby at the end of the interval the error corresponds roughly to that of a quantizer using the total number of bits sent during the interval. In general, a study of nonuniform rates for describing each sample, while keeping the bits per time-slot fixed, will require us to move beyond uniform sampling. This, too, is an interesting research direction to pursue.
Finally, we remark that the encoder structure we have imposed, wherein the error in the estimate of the latest sample is refined at each instant, is optimal only asymptotically and is justified only heuristically for fixed dimensions. Even for one-dimensional observations, it is not clear whether this structure is optimal. We believe that this is a question of fundamental interest which remains open.