Article

Tracking an Auto-Regressive Process with Limited Communication per Unit Time †

1 Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science, Bangalore 560012, India
2 Department of ECE, Indian Institute of Science, Bangalore 560012, India
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in the Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), June 2020, titled "Tracking an Auto-Regressive Process with Limited Communication".
Entropy 2021, 23(3), 347; https://doi.org/10.3390/e23030347
Submission received: 1 February 2021 / Revised: 9 March 2021 / Accepted: 12 March 2021 / Published: 15 March 2021
(This article belongs to the Special Issue Data Compression and Complexity)

Abstract

Samples from a high-dimensional first-order auto-regressive process generated by an independently and identically distributed random innovation sequence are observed by a sender which can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real-time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to refine estimates of the latest sample and study the following question: Is it better to collect communication of multiple slots to send better refined estimates, making the receiver wait longer for every refinement, or to be fast but loose and send new information in every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically for a large dimension. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality, and they typically have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.

1. Introduction

We consider the setting of real-time decision systems based on remotely sensed observations. In this setting, the decision maker needs to track the remote observations with high precision and in a timely manner. These are competing requirements, since high-precision tracking requires a larger number of bits to be communicated, resulting in a larger transmission delay and increased staleness of information. Towards this larger goal, we study the following problem.
Consider a discrete time first-order auto-regressive (AR[1]) process $X_t \in \mathbb{R}^n$, $t \geq 0$. A sensor draws a sample from this process, periodically once every $s$ time-slots. In each of these time-slots, the sensor can send $nR$ bits to a center. The center seeks to form an estimate $\hat{X}_t$ of $X_t$ at time $t$, with small mean square error (MSE). Specifically, we are interested in minimizing the time-averaged error $\sum_{t=1}^{T} \mathbb{E}\big[\|X_t - \hat{X}_t\|_2^2\big]/T$ to enable timely and accurate tracking of $X_t$.
We propose and study a successive update scheme where the encoder computes the error in the estimate of the latest sample at the decoder and sends its quantized value to the decoder. The decoder adds this value to its previous estimate to update the estimate of the latest sample, and uses it to estimate the current value using a linear predictor. We instantiate this scheme with a general gain-shape quantizer for error-quantization.
Note that we can send this update several times between two sampling instants. In particular, our interest is in comparing a fast but loose scheme where an update is sent every slot against a scheme with slower updates sent once every $p$ communication slots. The latter allows the encoder to use more bits for the update, but the decoder will need to wait longer. We consider a class $\mathcal{X}_n$ of discrete time AR[1] processes generated by an independently and identically distributed (i.i.d.) random innovation sequence such that the fourth moment of the process is bounded. Within this class, we show that the fast but loose successive update scheme, used with an appropriately selected quantizer, is universally optimal for all possible distributions of the innovation sequence when the number of dimensions grows asymptotically large.
To show this optimality, we use a random construction for the quantizer, based on the spherical code given in [1,2]. Roughly speaking, this ideal quantizer $Q$ yields
$$\mathbb{E}\big[\|y - Q(y)\|_2^2\big] \leq \|y\|_2^2 \, 2^{-2R},$$
for every uniformly bounded $n$-dimensional vector $y$. However, in practice, at finite $n$, such quantizers need not exist. Most practical vector quantizers have an extra additive error, i.e., the error bound takes the form
$$\mathbb{E}\big[\|y - Q(y)\|_2^2\big] \leq \|y\|_2^2 \, \theta + n\varepsilon^2,$$
where $\theta$ and $\varepsilon$ are quantizer parameters that vary with the choice of quantizer. We present our analysis for such general quantizers. Interestingly, for such a quantizer (which is all we have at a finite $n$), the optimal choice of $p$ can differ from 1. Our analysis provides a theoretically sound guideline for choosing the frequency of updates $1/p$ for practical quantizers.
Our work relates to a large body of literature ranging from real-time compression to control and estimation over networks. The structure of real-time encoders for source coding has been studied in [3,4,5,6,7,8,9]. The general structure of real-time encoders for Markov sources is studied for communication over error-free channels in [3] and over noisy channels in [4,5]. The authors in [3] consider the communication delays similar to our setting. However, the delayed distortion criterion considered in [3] is different from the instantaneous distortion in our work. From these works, we see that optimal encoder output of a kth order Markov source at any instant would depend only on the k latest symbols and the present state of decoder memory. A similar structural result for the optimal encoders and decoders which are restricted to be causal is given in [6]. Furthermore, structural results in the context of optimal zero-delay coding of correlated sources are available in [7,8,9]. Some of these results can be extended to the case of finite delay decoding. However, as we need to track the process in real time, we have to produce an estimate for the latest sample at every instant even though the current information available at the decoder is stale and corresponds to past samples. This is quite different from the case of delayed decoding. Hence, the setup in all these works is different from the problem we consider and the results do not extend to our problem. Another related area which has been studied in literature is that of remote source coding of noisy sources [10,11,12] with an optimal encoder-decoder structure discussed in [11]. Further, [13] examines the optimal sampling strategy for remote reconstruction of a bandlimited signal from noisy versions of the source. Studies about remote reconstruction of a stationary process from its noisy/noiseless samples can be found in [14,15,16,17] as well. However, in our setting, the sampling is fixed and moreover each transmission in our system incurs a delay that depends on the encoding rate. Hence, our problem does not directly fit into any of these frameworks.
The problems of remote estimation under communication constraints of various kinds have been studied in [18,19,20,21,22,23]. This line of work proposes several Kalman-like recursive estimation algorithms and evaluates their performances. In a related thread, [24,25,26] study remote estimation under communication constraints and other related constraints using tools from dynamic programming and stochastic control. However, in all these works the role of channel delay is slightly different from that in our setting. Furthermore, the specific problem of choice of quantizer we consider has not been looked at. More recently, [27] studied remote estimation of a Wiener process over channels with random delays and proved that the optimal sampling policy is a threshold-based policy. This work, like some of the works cited above, assumes real-valued transmissions and does not take into account any quantization effects.
In more information theoretic settings, sequential coding for individual sequences under delay constraints was studied in [28,29,30,31]. Closer to our work, the causal (nonanticipatory) rate-distortion function for a stochastic process goes back to early works [32,33]. Recent works [34,35,36] consider the specific case of auto-regressive and Gauss–Markov processes and use the general formula in these early works to establish asymptotic optimality of simpler information structure for the encoders (optimal decoder structure is straightforward). Further, the system model in these works slightly differs from ours as the information transmission in our setting suffers a delay due to the channel rate constraint. We note that related formulations have been studied for simple settings of two or three iterations in [37,38] where interesting encoder structures for using previous communication for next sample as well emerge. Although some of these works propose specific optimal schemes as well, the key results in this line of research provide an expression for the rate-distortion function as an optimization problem, solving which will provide guidelines for a concrete scheme. In contrast, motivated by problems of estimation over an erasure channel, ref. [39] provides an asymptotically optimal scheme that roughly uses Gaussian codebooks to quantize the innovation errors between the encoder’s observation and the decoder’s estimation. Our work is closest to [39] and our encoding scheme shares similarities with the predictive Differential Pulse Code Modulation (DPCM) based scheme employed in [39]. However, our work differs in an important aspect from all the works mentioned in this research thread: we take transmission delays into account. In particular, in our formulation the estimation at the decoder happens in real time in spite of the communication delays. Note that the rate constraint in the channel causes a delay in the reception of information at the decoder. Nevertheless, the decoder must provide an estimate of the current state of the process at every time instant, and a longer codeword will result in a longer delay for the decoder to get complete information.
Nonetheless, our converse bound is derived using methods similar to those in [39]. Even the achievability part of our proof draws from [39], but there is a technical caveat. Note that after the first round of quantization, the error vector need not be Gaussian, and the analysis in [39] can only be applied after showing closeness of the error vector distribution to a Gaussian in the Wasserstein distance of order 2. While the original proof [39] overlooks this technical point, this gap can be filled using a recent result from [40] if spherical codes are used. However, we follow an alternative approach and show a direct analysis using vector quantizers.
In addition, there is a large body of work on control problems over rate-limited communication channels (cf. [41,42,43,44,45,46,47,48,49,50]). This line of work implicitly requires handling of communication delays in construction of estimators. However, the simple formulation we have seems to be missing and the results in this long line of work do not resolve the questions we raise.
Our main contributions in this paper are as follows: we present an encoder structure which we show to be optimal asymptotically in the dimension of the observation. Specifically, we propose to send successive updates that refine the estimate of the latest sample at the decoder. It is important to note that we quantize the estimation error at the decoder, instead of quantizing the innovation sequence formed at the encoder. Although the optimal MMSE decoder involves taking a conditional expectation, we use a decoder with a simple linear structure. Yet, we show that this decoder is asymptotically optimal. Then, we instantiate this general scheme with spherical codes for quantizers to obtain a universal scheme. In particular, we consider general gain-shape quantizers and develop a framework to analyze their performance. One interesting result we present shows that the tradeoff between the accuracy and frequency of the updates must be carefully balanced based on the "bias" (additive error) in the quantizer used.
We present our problem formulation in the next section. Section 3 presents a discussion of our achievability scheme, followed by the main results in the subsequent section. Section 5 provides a detailed analysis of our scheme, which we further build on in Section 6 to get our asymptotic achievability results. We prove our converse bound in Section 7 and conclude with a discussion on extensions of our results in the final section.

2. Problem Formulation

We begin by providing a formal description of our problem. Figure 1 illustrates the communication model adopted in this work. Different components of the model are presented in separate sections. Throughout the remainder of this paper, the set of real numbers is denoted by $\mathbb{R}$, the set of positive reals by $\mathbb{R}_+$, the $n$-dimensional Euclidean space by $\mathbb{R}^n$ and the associated Euclidean norm by $\|\cdot\|_2$, the set of positive integers by $\mathbb{N}$, the set of non-negative integers by $\mathbb{Z}_+$, the set of consecutive positive integers up to $m$ by $[m] \triangleq \{1, \ldots, m\}$, and an identity matrix of size $n \times n$ by $I_n$.

2.1. Observed Random Process and Its Sampling

For $\alpha \in (0,1)$, we consider a discrete time auto-regressive process of order 1 (AR[1] process) in $\mathbb{R}^n$,
$$X_t = \alpha X_{t-1} + \xi_t, \quad t \geq 0, \qquad (1)$$
where $(\xi_t \in \mathbb{R}^n, t \geq 1)$ is an i.i.d. random sequence with zero mean and covariance matrix $\sigma^2(1-\alpha^2) I_n$. For simplicity, we assume that $X_0 \in \mathbb{R}^n$ is a zero mean random variable with covariance matrix $\sigma^2 I_n$. This choice of covariance matrices for $\xi_t$ and $X_0$ ensures that the covariance matrix of $X_t \in \mathbb{R}^n$ is $\sigma^2 I_n$ for all $t \geq 0$. Although this assumption makes the analysis more tractable, the dependence of the covariance matrix of $\xi_t$ on the process parameter $\alpha$ is not crucial for the analysis.
In addition, we assume that $\|X_t\|_2$ has a bounded fourth moment at all times $t \geq 0$. Specifically, let $\kappa > 0$ satisfy
$$\sup_{k \in \mathbb{Z}_+} \frac{1}{n} \sqrt{\mathbb{E}\big[\|X_k\|_2^4\big]} \leq \kappa.$$
It is clear that $X = (X_t \in \mathbb{R}^n, t \geq 0)$ is a Markov process. We denote the set of processes $X$ satisfying the assumptions above by $\mathcal{X}_n$ and the class of all such processes for different choices of dimension $n$ as $\mathcal{X}$.
This discrete time process is subsampled periodically at sampling frequency $1/s$, for some $s \in \mathbb{N}$, to obtain samples $(X_{ks} \in \mathbb{R}^n, k \geq 0)$.
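To fix ideas, the following minimal sketch simulates this observation model and its subsampling, assuming Gaussian innovations (any zero-mean innovation distribution with the stated covariance and bounded fourth moment fits the class $\mathcal{X}_n$); the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, sigma, s, T = 64, 0.9, 1.0, 4, 1000

X = sigma * rng.standard_normal(n)        # X_0 with covariance sigma^2 I_n
trajectory = [X]
for t in range(1, T):
    xi = np.sqrt(1 - alpha**2) * sigma * rng.standard_normal(n)
    X = alpha * X + xi                    # X_t = alpha X_{t-1} + xi_t
    trajectory.append(X)

samples = trajectory[::s]                 # subsampled process (X_{ks}, k >= 0)
print(np.var(np.stack(trajectory)))       # stays close to sigma^2 for every t
```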

2.2. Encoder Description

The sampled process $(X_{ks}, k \geq 0)$ is passed to an encoder which converts it to a bit stream. The encoder operates in real-time and sends $nRs$ bits between any two sampling instants. Specifically, the encoder is given by a sequence of mappings $(\phi_t)_{t \geq 0}$, where the mapping at any discrete time $t = ks$ is denoted by
$$\phi_t : \mathbb{R}^{n(k+1)} \to \{0,1\}^{nRs}.$$
The encoder output at time $t = ks$ is denoted by the codeword $C_{ks} \triangleq \phi_{ks}(X_0, X_s, \ldots, X_{ks})$. We denote this codeword by an $s$-length sequence of binary strings $C_{ks} = (C_{ks,0}, \ldots, C_{ks,s-1})$, where each term $C_{ks,i}$ takes values in $\{0,1\}^{nR}$. For time $ks$ and $0 \leq i \leq s-1$, we can view the binary string $C_{ks,i}$ as the communication sent at time $ks+i$. We elaborate on the communication channel next.

2.3. Communication Channel

The output bit-stream of the encoder is sent to the receiver via an error-free communication channel. Specifically, we assume slotted transmission with synchronization, where in each slot the transmitter sends $nR$ bits of communication error-free. That is, we are allowed to send $R$ bits per dimension, per time slot. Note that there is a delay of 1 time-unit (corresponding to one slot) in the transmission of each $nR$ bits. Therefore, the vector $C_{ks,i}$ of $nR$ bits transmitted at time $ks+i$ is received at time instant $ks+i+1$, for $0 \leq i \leq s-1$. Throughout, we use the notation $\mathcal{I}_k \triangleq \{ks, \ldots, (k+1)s-1\}$ and $\tilde{\mathcal{I}}_k = \mathcal{I}_k + 1 = \{ks+1, \ldots, (k+1)s\}$, respectively, for the sets of transmit and receive times for the strings $C_{ks,i}$, $0 \leq i \leq s-1$.

2.4. Decoder Description

We describe the operation of the receiver at time $t \in \mathcal{I}_k$, for some $k \in \mathbb{N}$, such that $i = t - ks \in \{0, \ldots, s-1\}$. Upon receiving the codewords $C_s, C_{2s}, \ldots, C_{(k-1)s}$ and the partial codeword $(C_{ks,0}, \ldots, C_{ks,i-1})$ at time $t = ks+i$, the decoder estimates the current state $X_t$ of the process using the estimator mapping
$$\psi_t : \{0,1\}^{nRt} \to \mathbb{R}^n.$$
We denote the overall communication received by the decoder until time instant $t$ by $C^{t-1}$. The time index $t-1$ in $C^{t-1}$ corresponds to the transmission time of the codewords, whereby the communication received till time $t$ is denoted by $C^{t-1}$. Further, we denote by $\hat{X}_{t|t}$ the real-time causal estimate $\psi_t(C^{t-1})$ of $X_t$ formed at the decoder at time $t$. Thus, the overall real-time causal estimation scheme is described by the mappings $(\phi_t, \psi_t)_{t \geq 0}$. It is important to note that the communication available to the decoder at time $t \in \mathcal{I}_k$ can only depend on samples $X$ up to time $ks$. As a convention, we assume that $\hat{X}_{0|0} = 0$.

2.5. Performance Metrics

We call the encoder-decoder mapping sequence $(\phi, \psi) = (\phi_t, \psi_t)_{t \geq 0}$ a tracking code of rate $R$ and sampling period $s$. The tracking error of our tracking code at time $t$ for process $X$ is measured by the mean squared error (MSE) per dimension given by
$$D_t(\phi, \psi, X) \triangleq \frac{1}{n} \mathbb{E}\big[\|X_t - \hat{X}_{t|t}\|_2^2\big].$$
Our goal is to design $(\phi, \psi)$ with low average tracking error $\bar{D}_T(\phi, \psi, X)$ given by
$$\bar{D}_T(\phi, \psi, X) \triangleq \frac{1}{T} \sum_{t=0}^{T-1} D_t(\phi, \psi, X).$$
For technical reasons, we restrict to a finite time horizon setting. For the most part, the time horizon $T$ will remain fixed and will be omitted from the notation. Instead of working with the mean-square error, a more convenient but equivalent parameterization for us will be that of accuracy, given by
$$\delta_T(\phi, \psi, X) = 1 - \frac{\bar{D}_T(\phi, \psi, X)}{\sigma^2}.$$
Definition 1
(Maxmin tracking accuracy). The worst-case tracking accuracy for $\mathcal{X}_n$ attained by a tracking code $(\phi, \psi)$ is given by
$$\delta_T(\phi, \psi, \mathcal{X}_n) = \inf_{X \in \mathcal{X}_n} \delta_T(\phi, \psi, X).$$
The maxmin tracking accuracy for $\mathcal{X}_n$ at rate $R$ and sampling period $s$ is given by
$$\delta_{n,T}(R, s, \mathcal{X}_n) = \sup_{(\phi,\psi)} \delta_T(\phi, \psi, \mathcal{X}_n),$$
where the supremum is over all tracking codes $(\phi, \psi)$.
The maxmin tracking accuracy $\delta_{n,T}(R, s, \mathcal{X}_n)$ is the fundamental quantity of interest for us. Note that it is possible to provide a similar characterization of the performance in terms of MSE instead of accuracy and obtain the exact same result. Recall that $n$ denotes the dimension of the observations $X_t$ for $X \in \mathcal{X}_n$ and $T$ the time horizon. However, we will only characterize $\delta_{n,T}(R, s, \mathcal{X}_n)$ asymptotically in $n$ and $T$. Specifically, we define the asymptotic maxmin tracking accuracy as
$$\delta^*(R, s, \mathcal{X}) = \limsup_{T \to \infty} \limsup_{n \to \infty} \delta_{n,T}(R, s, \mathcal{X}_n).$$
We will provide a characterization of $\delta^*(R, s, \mathcal{X})$ and present a sequence of tracking codes that attains it. In fact, the tracking code we use is an instantiation of our successive update scheme, which we describe in the next section. It is important to note that our results may not hold if we switch the order of limits above: we need very large codeword lengths depending on a fixed finite time horizon $T$.

3. The Successive Update Scheme

In this section, we present our main contribution in this paper, namely the Successive Update tracking code. Before we describe the scheme completely, we present its different components. In every communication slot, the transmitter gets an opportunity to send $nR$ bits. The transmitter may use it to send any information about a previously seen sample. There are various options for the encoder. For instance, it may use the current slot to send some information about a sample it had seen earlier. Or it may use all the slots between two sampling instants to send a quantized version of the latest sample. The information utilized by the decoder will be limited by the choice of structure of information transmission adopted at the encoder. As the process we consider is Markov in nature, we choose to utilize all the transmission instants between two sampling instants to send information about the latest sample.

3.1. Encoder Structure: Refining the Error Successively

As mentioned earlier, the encoder and decoder that we employ in this work are similar to those in the DPCM scheme [39,51,52]. However, recall that we have multiple transmission opportunities between each sampling instant and the transmissions are delayed. This calls for certain modifications, which we explain in the following.
Let $\hat{X}_{ks|t}$ and $\hat{X}_{t|t}$ respectively denote the estimates of $X_{ks}$ and $X_t$ formed at the receiver at time $t$. Our encoder computes the error in the receiver estimate of the last process sample at each time instant $t$. Denoting the error at time $t \in \mathcal{I}_k$ by $Y_t \triangleq X_{ks} - \hat{X}_{ks|t}$, the encoder quantizes this error $Y_t$ and sends it as the communication $C_{ks,i}$. At any time instant $t = ks+i$, we estimate $\hat{X}_{t|t}$ from $\hat{X}_{ks|t}$. In particular, we set $\hat{X}_{t|t} = \alpha^i \hat{X}_{ks|t}$. Simply speaking, our encoder computes and quantizes the error in the current estimate of the last sample at the decoder, and sends it to the decoder to enable the refinement of the estimate in the next time slot. While we have not been able to establish the optimality of this encoder structure, our results will show its asymptotic optimality when the number of dimensions $n$ goes to infinity.
Even within this structural simplification, a very interesting question remains. Since the process is sampled once in $s$ time slots, we have, potentially, $nRs$ bits to encode the latest sample. At any time $t \in \tilde{\mathcal{I}}_k$, the receiver has access to $(C_0, \ldots, C_{(k-1)s})$ and the partial codewords $(C_{ks,0}, \ldots, C_{ks,i-1})$ for $i = t - ks$. A simple approach for the encoder is to use the complete codeword to express the latest sample, and the decoder can ignore the partial codewords. This approach will result in slow but very accurate updates of the sample estimates. An alternative fast but loose approach will send $nR$-bit quantizer codewords to refine estimates in every communication slot. Should we prefer fast but loose estimates or slow but accurate ones? Our results will shed light on this conundrum.

3.2. The Choice of Quantizers

In our description of the encoder structure above, we did not specify a key design choice, namely the choice of the quantizer. We will restrict ourselves to using the same quantizer to quantize the error in each round of communication. The precision of this quantizer will depend on whether we choose a fast but loose paradigm or a slow but accurate one. However, the overall structure will remain the same. Roughly speaking, we allow any gain-shape [53] quantizer which separately sends the quantized values of the gain $\|y\|_2$ and the shape $y/\|y\|_2$ for input $y$. Formally, we use the following abstraction.
Definition 2
($(\theta, \varepsilon)$-quantizer family). Fix $0 < M < \infty$. For $0 \leq \theta \leq 1$ and $\varepsilon > 0$, a quantizer $Q$ with dynamic range $M$, specified by a mapping $Q : \mathbb{R}^n \to \{0,1\}^{nR}$, constitutes an $nR$ bit $(\theta, \varepsilon)$-quantizer if for every vector $y \in \mathbb{R}^n$ such that $\|y\|_2^2 \leq nM^2$, we have
$$\mathbb{E}\big[\|y - Q(y)\|_2^2\big] \leq \|y\|_2^2 \, \theta + n\varepsilon^2.$$
Further, for a mapping $\theta : \mathbb{R}_+ \to [0,1]$, which is a decreasing function of the rate $R$, a family of quantizers $\mathcal{Q} = \{Q_R : R > 0\}$ constitutes a $(\theta, \varepsilon)$-quantizer family if for every $R$ the quantizer $Q_R$ constitutes an $nR$ bit $(\theta(R), \varepsilon)$-quantizer.
The expectation in the previous definition is taken with respect to the randomness in the quantizer, which is assumed to be shared between the encoder and the decoder for simplicity. For instance, in Lemma 5 we study a random codebook based construction for a $(\theta, \varepsilon)$-quantizer and we assume that once a random codebook is picked by the encoder, it is made known to the decoder. The parameter $M$, termed the dynamic range of the quantizer, specifies the domain of the quantizer. When the input $y$ does not satisfy $\|y\|_2 \leq \sqrt{n} M$, the quantizer simply declares a failure, which we denote by ⊥. Our tracking code may use any such $(\theta, \varepsilon)$-quantizer family. It is typical in any construction of a gain-shape quantizer to have a finite $M$ and $\varepsilon > 0$. Our analysis for finite $n$ will apply to any such $(\theta, \varepsilon)$-quantizer family and, in particular, will bring out the role of the "bias" $\varepsilon$. However, when establishing our optimality result, we instantiate it using a random spherical code to get the desired performance.
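As a concrete illustration of Definition 2, the following sketch empirically checks the quantizer inequality for a coordinate-wise uniform quantizer (the setting of Example 2 below), for which $\theta = 0$ and the additive term $n\varepsilon^2$ with $\varepsilon^2 = nM^2 2^{-2R}$ dominates; the helper names and parameter values are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R, M = 64, 6, 1.0

def uniform_quantizer(y):
    """R bits per coordinate over [-sqrt(n)M, sqrt(n)M]: a (0, eps)-quantizer."""
    lo = -np.sqrt(n) * M
    step = 2 * np.sqrt(n) * M / 2**R
    return lo + step * (np.floor((y - lo) / step) + 0.5)

# Empirical check of E||y - Q(y)||^2 <= theta ||y||^2 + n eps^2 with theta = 0.
errors = []
for _ in range(1000):
    y = rng.uniform(-M, M, size=n)            # ensures ||y||_2^2 <= n M^2
    errors.append(np.sum((y - uniform_quantizer(y)) ** 2))
print(np.mean(errors), "<=", n * (n * M**2 * 2.0 ** (-2 * R)))  # vs. n eps^2
```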

3.3. Description of the Successive Update Scheme

All the conceptual components of our scheme are ready. Note that we focus on updating the estimate of the latest observed sample $X_{ks}$ at the decoder. Our encoder successively updates the estimate of the latest sample at the decoder by quantizing and sending estimates of the errors $Y_t$.
As discussed earlier, we must decide if we prefer a fast but loose approach or a slow but accurate approach for sending error estimates. To carefully examine this tradeoff, we opt for a more general scheme where the $nRs$ bits available between two samples are divided into $m = s/p$ subfragments of length $nRp$ bits each. We use an $nRp$ bit quantizer to refine error estimates for the latest sample $X_{ks}$ (obtained at time $t = ks$) every $p$ slots and send the resulting quantizer codewords as partial tracking codewords $(C_{ks,jp}, \ldots, C_{ks,(j+1)p-1})$, $0 \leq j \leq m-1$. Specifically, the $k$th codeword transmission interval $\mathcal{I}_k$ is divided into $m$ subfragments $\mathcal{I}_{k,j}$, given by
$$\mathcal{I}_{k,j} \triangleq \{ks+jp, \ldots, ks+(j+1)p-1\}, \quad 0 \leq j \leq m-1,$$
and $(C_{ks,jp}, \ldots, C_{ks,(j+1)p-1})$ is transmitted in the communication slots in $\mathcal{I}_{k,j}$.
Starting at time instant $t = ks+jp+1$, the decoder receives the $j$th subfragment $(C_{ks,t'-ks}, t' \in \mathcal{I}_{k,j})$ of $nRp$ bits and uses it to refine the estimate of the latest source sample $X_{ks}$. Note that the fast but loose and the slow but accurate regimes described above correspond to $p = 1$ and $p = s$, respectively. In the middle of the interval $\mathcal{I}_{k,j}$, the decoder ignores the partially received quantizer codeword and retains the estimate $\hat{X}_{ks}$ of $X_{ks}$ formed at time $ks+(j-1)p+1$. It forms an estimate of the current state $X_{ks+i}$ by simply scaling $\hat{X}_{ks}$ by a factor of $\alpha^i$.
Finally, we impose one additional simplification on the decoder structure. We simply update the estimate by adding to it the quantized value of the error. Thus, the decoder has a simple linear structure.
We can use any $nRp$ bit quantizer $Q_p$ (with an abuse of notation, we will use $Q_p$ instead of $Q_{Rp}$ to denote an $nRp$ bit quantizer) for the $n$-dimensional error vector, whereby this scheme can be easily implemented in practice if $Q_p$ can be implemented. For instance, we can use any standard gain-shape quantizer. The performance of most quantizers can be analyzed explicitly to render them a $(\theta, \varepsilon)$-quantizer family for an appropriate $M$ and function $\theta$. Later, when analyzing the scheme, we will consider a $Q_p$ coming from a $(\theta, \varepsilon)$-quantizer family and present a theoretically sound guideline for choosing $p$.
Recall that we denote the estimate of $X_u$ formed at the decoder at time $t \geq u$ by $\hat{X}_{u|t}$. We start by initializing $\hat{X}_{0|0} = 0$ and then proceed using the encoder and the decoder algorithms outlined above. Note that our quantizer $Q_p$ may declare the failure symbol ⊥, in which case the decoder must still yield a nominal estimate. We will simply declare the estimate as 0 once a failure happens. (In the analysis, we account for all these events as error. Only the probability of failure will determine the contribution of this part to the MSE, since the process is mean-square bounded.)
We give a formal description of our encoder and decoder algorithms below.
The encoder.
1. Initialize $k = 0$, $j = 0$, $\hat{X}_{0|0} = 0$.
2. At time $t = ks+jp$, use the decoder algorithm (described below) to form the estimate $\hat{X}_{ks|t}$ and compute the error
$$Y_{k,j} \triangleq X_{ks} - \hat{X}_{ks|t}, \qquad (2)$$
where we use the latest sample $X_{ks}$ available at time $t = ks+jp$.
3. Quantize $Y_{k,j}$ to $nRp$ bits as $Q_p(Y_{k,j})$.
4. If quantizer failure occurs and $Q_p(Y_{k,j}) = \perp$, send ⊥ to the receiver and terminate the encoder.
5. Else, send a binary representation of $Q_p(Y_{k,j})$ as the communication $(C_{ks,0}, \ldots, C_{ks,p-1})$ to the receiver over the next $p$ communication slots. (For simplicity, we do not account for the extra message symbol needed for sending ⊥.)
6. If $j < m-1$, increase $j$ by 1; else set $j = 0$ and increase $k$ by 1. Go to Step 2.
The decoder.
1. Initialize $k = 0$, $j = 0$, $\hat{X}_{0|0} = 0$.
2. At time $t = ks+jp$, if encoding failure has not occurred until time $t$, compute
$$\hat{X}_{ks|ks+jp} = \hat{X}_{ks|ks+(j-1)p} + Q_p(Y_{k,j-1}), \qquad (3)$$
and output $\hat{X}_{t|t} = \alpha^{t-ks} \hat{X}_{ks|t}$.
3. Else, if encoding failure has occurred and the ⊥ symbol is received, declare $\hat{X}_{s|s} = 0$ for all subsequent time instants $s \geq t$.
4. At time $t = ks+jp+i$, for $i \in [p-1]$, output $\hat{X}_{t|t} = \alpha^{t-ks} \hat{X}_{ks|ks+jp}$. (We ignore the partial quantizer codewords received as $(C_{ks,jp+1}, C_{ks,jp+2}, \ldots, C_{ks,jp+i-1})$ till time $t$.)
5. If $j < m-1$, increase $j$ by 1; else set $j = 0$ and increase $k$ by 1. Go to Step 2.
Note that the decoder has a simple structure and the principal component of the encoder is the quantizer. Therefore, the complexity of the proposed scheme is dominated by the complexity of the quantization operation and varies with respect to the quantizer chosen.
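To make the scheme concrete, here is a minimal end-to-end simulation sketch of the p-SU scheme. The toy gain-shape quantizer, the cap on the codebook size, and all parameter values are our own illustrative choices rather than the paper's construction, and the one-slot reception delay is ignored for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, R, s, p = 64, 2, 4, 1             # dimension, bits/dim/slot, sampling and update periods
alpha, sigma, T = 0.9, 1.0, 4000     # AR coefficient, stationary std, time horizon

# Shared random shape codebook standing in for an nRp-bit quantizer
# (capped in size so the sketch stays computable).
K = min(2 ** (n * R * p), 4096)
C = rng.standard_normal((K, n))
C /= np.linalg.norm(C, axis=1, keepdims=True)

def Qp(y, eps=0.05):
    """Toy gain-shape quantizer standing in for Q_p."""
    g = np.linalg.norm(y)
    if g == 0.0:
        return np.zeros(n)
    return eps * np.floor(g / eps) * C[np.argmax(C @ (y / g))]

X = sigma * rng.standard_normal(n)   # X_0
sample, est = X.copy(), np.zeros(n)  # latest sample X_{ks} and decoder estimate of it
err = []
for t in range(T):
    if t % s == 0 and t > 0:
        # a new sample arrives; the decoder's prior is the scaled old estimate
        sample, est = X.copy(), alpha ** s * est
    if t % p == 0:
        est = est + Qp(sample - est)           # successive update every p slots
    X_hat = alpha ** (t % s) * est             # real-time estimate of X_t
    err.append(np.sum((X - X_hat) ** 2) / n)
    X = alpha * X + np.sqrt(1 - alpha ** 2) * sigma * rng.standard_normal(n)

print("empirical accuracy:", 1 - np.mean(err) / sigma ** 2)
```

Varying $p$ in this sketch gives an empirical counterpart of the accuracy-speed tradeoff analyzed in Section 4.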

4. Main Results

We present results in two categories. First, we provide an explicit formula for the asymptotic maxmin tracking accuracy $\delta^*(R, s, \mathcal{X})$. Next, we present a theoretically founded guideline for selecting a good $p$ for the successive update scheme with a $(\theta, \varepsilon)$-quantizer family. Interestingly, the optimal choice may differ from the asymptotically optimal choice of $p = 1$.

4.1. Characterization of the Maxmin Tracking Accuracy

To describe our result, we define the functions $\delta_0 : \mathbb{R}_+ \to [0,1]$ and $g : \mathbb{R}_+ \to [0,1]$ as
$$\delta_0(R) \triangleq \frac{\alpha^2 (1 - 2^{-2R})}{1 - \alpha^2 2^{-2R}}, \ \text{for all } R > 0; \qquad g(s) \triangleq \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)}, \ \text{for all } s > 0.$$
Note that $g(s)$ is a decreasing function of $s$ with $g(1) = 1$. The result below shows that, for an appropriate choice of the quantizer, our successive update scheme with $p = 1$ (the fast but loose version) achieves an accuracy of $\delta_0(R) g(s)$ asymptotically, universally for all processes in $\mathcal{X}$.
Theorem 1
(Lower bound for maxmin tracking accuracy: The achievability). For $R > 0$ and $s \in \mathbb{N}$, the asymptotic maxmin tracking accuracy is bounded below as
$$\delta^*(R, s, \mathcal{X}) \geq \delta_0(R) g(s).$$
Furthermore, this bound can be obtained by a successive update scheme with $p = 1$ and an appropriately chosen quantizer $Q_p$.
We provide a proof in Section 6. Note that while we assume that the per dimension fourth moment of the processes in X is bounded, the asymptotic result above does not depend on that bound. Interestingly, the performance characterized above is the best possible.
Theorem 2
(Upper bound for maxmin tracking accuracy: The converse). For $R > 0$ and $s \in \mathbb{N}$, the asymptotic maxmin tracking accuracy is bounded above as
$$\delta^*(R, s, \mathcal{X}) \leq \delta_0(R) g(s).$$
Furthermore, the upper bound is obtained by considering a Gauss–Markov process.
We provide a proof in Section 7. Thus, $\delta^*(R, s, \mathcal{X}) = \delta_0(R) g(s)$, with the fast but loose successive update scheme being universally (asymptotically) optimal and the Gauss–Markov process being the most difficult process to track. Clearly, the best possible choice of sampling period is $s = 1$ and the highest possible accuracy at rate $R$ is $\delta_0(R)$, whereby we cannot hope for an accuracy exceeding $\delta_0(R)$.
To provide an alternative view of the result, suppose that we fix the achievable tracking accuracy to be $\delta \leq \delta_0(R)$ at rate $R$. Then, the above result says that $g(s) \geq \delta/\delta_0(R)$, where $g(s)$ is a decreasing function of $s$. Therefore, this result can be interpreted as saying that we cannot subsample at a frequency less than $1/g^{-1}(\delta/\delta_0(R))$ for attaining a tracking accuracy $\delta \leq \delta_0(R)$.
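The characterization is easy to evaluate numerically. A small sketch (with illustrative parameter values and a target accuracy $\delta \leq \delta_0(R)$; $g$ is inverted by simple search since it is decreasing):

```python
import numpy as np

alpha = 0.9

def delta0(R):
    return alpha**2 * (1 - 2.0 ** (-2 * R)) / (1 - alpha**2 * 2.0 ** (-2 * R))

def g(s):
    return (1 - alpha ** (2 * s)) / (s * (1 - alpha**2))

R, target = 2.0, 0.7                   # rate and desired accuracy delta
print("best accuracy at s = 1:", delta0(R) * g(1))
s = 1
while delta0(R) * g(s + 1) >= target:  # largest s with delta0(R) g(s) >= target
    s += 1
print("largest admissible sampling period:", s)
```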

4.2. Guidelines for Choosing a Good p

The proof of Theorem 1 entails the analysis of the successive update scheme for $p = 1$. In fact, we can analyze this scheme for any $p \in \mathbb{N}$ and for any $(\theta, \varepsilon)$-quantizer family; we term this tracking code the $p$-successive update ($p$-SU) scheme. This analysis can provide a simple guideline for the optimal choice of $p$ depending on the performance of the quantizer.
However, there are some technical caveats. A quantizer family will operate only as long as the input $y$ satisfies $\|y\|_2 \leq \sqrt{n} M$. If a $y$ outside this range is observed, the quantizer will declare ⊥ and the tracking code encoder, in turn, will declare a failure. We denote by $\tau$ the stopping time at which encoder failure occurs for the first time, i.e.,
$$\tau \triangleq \min\{ks+jp : Q_p(Y_{k,j}) = \perp, \ k \geq 0, \ 0 \leq j \leq m-1\}.$$
Further, we denote by $A_t$ the event that failure does not occur until time $t$, i.e.,
$$A_t \triangleq \{\tau > t\}.$$
We characterize the performance of a p-SU in terms of the probability of encoder failure in a finite time horizon T.
Theorem 3
(Performance of $p$-SU). For fixed $\theta$, $\varepsilon$, and $\beta \in [0,1]$, consider the $p$-SU scheme with an $nRp$ bit $(\theta, \varepsilon)$-quantizer $Q_p$, and denote the corresponding tracking code by $(\phi_p, \psi_p)$. Suppose that for a time horizon $T \in \mathbb{N}$, the tracking code $(\phi_p, \psi_p)$ satisfies $P(\tau \leq T) \leq \beta^2$. Then,
$$\sup_{X \in \mathcal{X}_n} \bar{D}_T(\phi_p, \psi_p, X) \leq B_T(\theta, \varepsilon, \beta),$$
where $B_T(\theta, \varepsilon, \beta)$ satisfies
$$\limsup_{T \to \infty} B_T(\theta, \varepsilon, \beta) \leq \sigma^2 \left[1 - g(s)\, \frac{\alpha^{2p}}{1 - \alpha^{2p}\theta} \left(1 - \frac{\varepsilon^2}{\sigma^2} - \theta\right)\right] + \kappa\beta\, \frac{g(s)}{1 - \alpha^{2s}} \left[1 - \frac{\alpha^{2(s+p)}(1 - \theta)}{1 - \alpha^{2p}\theta}\right].$$
We provide the proof of this theorem later in Section 5. We remark that $\beta$ can be made small by choosing $M$ to be large for a quantizer family. Furthermore, the inequality in the upper bound for the MSE in the previous result (barring the dependence on $\beta$) comes from the inequality in the definition of a $(\theta, \varepsilon)$-quantizer, rendering it a good proxy for the performance of the quantizer. The interesting regime is that of very small $\beta$, where the encoder failure does not occur during the time horizon of operation. If we ignore the dependence on $\beta$, the accuracy of the $p$-SU scheme does not depend on either $s$ or the bound $\kappa$ for the fourth moment. Motivated by these insights, we define the accuracy-speed curve of a quantizer family as follows.
Definition 3
(The accuracy-speed curve). For $\alpha \in [0,1]$, $\sigma^2$, and $R > 0$, the accuracy-speed curve for a $(\theta, \varepsilon)$-quantizer family $\mathcal{Q}$ is given by
$$\Gamma_{\mathcal{Q}}(p) = \frac{\alpha^{2p}}{1 - \alpha^{2p}\theta(Rp)} \left(1 - \frac{\varepsilon^2}{\sigma^2} - \theta(Rp)\right), \quad p > 0.$$
By Theorem 3, it is easy to see that the accuracy (precisely, the upper bound on the accuracy) of a $p$-SU scheme is better when $\Gamma_{\mathcal{Q}}(p)$ is larger. Thus, a good choice of $p$ for a given quantizer family $\mathcal{Q}$ is the one that maximizes $\Gamma_{\mathcal{Q}}(p)$ for $1 \leq p \leq s$.
We conclude by providing accuracy-speed curves for some illustrative examples. To build some heuristics, note that a uniform quantization of $[-M, M]$ has $\theta(R) = 0$ and $\varepsilon = M 2^{-R}$. For a gain-shape quantizer, we express a vector $y = \|y\|_2 \, y_s$, where the shape vector $y_s$ has $\|y_s\|_2 = 1$. An ideal shape quantizer (which can only be shown to exist asymptotically) using $R$ bits per dimension will satisfy $\mathbb{E}\big[\|\hat{y}_s - y_s\|_2^2\big] \leq 2^{-2R}$, similar to the scalar uniform quantizer. In one of the examples below, we consider gain-shape quantizers with such an ideal shape quantizer.
Example 1.
We begin by considering an ideal quantizer family with $\theta(R) = 2^{-2R}$ and $\varepsilon = 0$. In our asymptotic analysis, we will show, roughly, that such a quantizer with very small $\varepsilon$ exists. For this ideal case, for $R > 0$, the accuracy-speed curve is given by
$$\Gamma_{\mathcal{Q}}(p) = \frac{\alpha^{2p} - \alpha^{2p}\theta(Rp)}{1 - \alpha^{2p}\theta(Rp)} = 1 - \frac{1 - \alpha^{2p}}{1 - \alpha^{2p} 2^{-2Rp}}.$$
It can be seen that $\Gamma_{\mathcal{Q}}(p)$ is decreasing in $p$, whereby the optimal choice of $p$ that maximizes $\Gamma_{\mathcal{Q}}(p)$ over $p \in [s]$ is $p = 1$. Heuristically, this justifies why the fast but loose successive update scheme is asymptotically optimal.
Example 2
(Uniform scalar quantization). In this example, we consider a coordinate-wise uniform quantizer. Since we seek quantizers for inputs $y \in \mathbb{R}^n$ such that $\|y\|_2 \leq M\sqrt{n}$, we can only use a uniform quantizer of $[-M\sqrt{n}, M\sqrt{n}]$ for each coordinate. For this quantizer, we have $\theta = 0$ and $\varepsilon^2 = nM^2 2^{-2R}$, whereby the accuracy-speed curve is given by $\Gamma_{\mathcal{Q}}(p) = \alpha^{2p}(1 - nM^2 2^{-2R}/\sigma^2)$. Thus, once again, the optimal choice of $p$ that maximizes accuracy is $p = 1$.
Example 3
(Gain-shape quantizer). Consider the quantization of a vector $y = a\, y_s$, where $a = \|y\|_2$. The vector $y$ is quantized by a gain-shape quantizer which quantizes the norm and the shape of the vector separately to give $Q(y) = \hat{a}\, \hat{y}_s$. We use a uniform quantizer of the fixed range $[0, M\sqrt{n}]$ to quantize the norm $a$ to $\hat{a}$, while an ideal shape quantizer is employed in quantizing the shape vector $y_s$. Namely, we assume $\mathbb{E}\big[\|y_s - \hat{y}_s\|_2^2\big] \leq 2^{-2R}$ and $\|\hat{y}_s\|_2 \leq 1$. Suppose that we allot $\ell$ bits out of the total budget of $nR$ bits for norm quantization and the rest for shape quantization. Then, we see that
$$\mathbb{E}\big[\|y - Q(y)\|_2^2\big] \leq 2 a^2 2^{-2(R - \ell/n)} + n M^2 2^{-2(\ell-1)},$$
whereby $\theta(R) = 2^{-2(R - \ell/n)+1}$ and $\varepsilon^2 = M^2 2^{-2(\ell-1)}$. Thus, the accuracy-speed curve is given by
$$\Gamma_{\mathcal{Q}}(p) = \frac{\alpha^{2p}}{1 - 2\alpha^{2p} 2^{-2(Rp - \ell/n)}} \left(1 - \frac{M^2 2^{-2(\ell-1)}}{\sigma^2} - 2^{-2(Rp - \ell/n)+1}\right).$$
Note that the optimal choice of p in this case depends on the choice of M.
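The following sketch evaluates the accuracy-speed curves of the three examples and reports the maximizing $p$; the parameter values, including the norm-bit budget $\ell$ and the ranges $M$, are our own illustrative choices (with a large $\alpha$ and a coarse rate, the gain-shape curve indeed peaks at $p > 1$).

```python
import numpy as np

alpha, sigma, R, s, n = 0.99, 1.0, 1.0, 8, 16

def gamma(p, theta, eps2):
    return alpha ** (2 * p) / (1 - alpha ** (2 * p) * theta) * (1 - eps2 / sigma**2 - theta)

ps = np.arange(1, s + 1)
ideal = [gamma(p, 2.0 ** (-2 * R * p), 0.0) for p in ps]             # Example 1
scalar = [gamma(p, 0.0, n * 0.2**2 * 2.0 ** (-2 * R)) for p in ps]   # Example 2, M = 0.2
M, ell = 1.0, 8
gain_shape = [gamma(p, 2.0 ** (-2 * (R * p - ell / n) + 1),
                    M**2 * 2.0 ** (-2 * (ell - 1))) for p in ps]     # Example 3
for name, curve in (("ideal", ideal), ("scalar", scalar), ("gain-shape", gain_shape)):
    print(name, "optimal p =", int(ps[np.argmax(curve)]))
```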
We illustrated the application of our analysis for idealized quantizers, but it can be used to analyze even very practical quantizers, such as the recently proposed almost-optimal quantizer in [54].

5. Analysis of the Successive Update Scheme

From the discussion in Section 3, we observe that the successive update scheme is designed to refine the estimate of $X_{ks}$ in each interval $\tilde{\mathcal{I}}_k$. This fact helps us in establishing a recursive relation for $D_t(\phi_p, \psi_p, X)$, $t \in \tilde{\mathcal{I}}_k$, in terms of $D_{ks}(\phi_p, \psi_p, X)$, which is provided next.
Lemma 1.
For a time instant $t = ks+jp+i$, with $0 \leq j \leq m-1$, $0 \leq i \leq p-1$, and $k \geq 0$, let $(\phi_p, \psi_p)$ denote the tracking code of a $p$-SU scheme employing an $nRp$ bit $(\theta, \varepsilon)$-quantizer. Assume that $P(A_t^c) \leq \beta^2$. Then, we have
$$D_t(\phi_p, \psi_p, X) \leq \alpha^{2(t-ks)} \theta^j D_{ks}(\phi_p, \psi_p, X) + \sigma^2 (1 - \alpha^{2(t-ks)}) + \alpha^{2(t-ks)} \frac{(1 - \theta^j)\varepsilon^2}{1 - \theta} + \alpha^{2(t-ks)} \kappa\beta.$$
Proof. 
From the evolution of the AR[1] process defined in (1), we see that $X_t = \alpha^{t-ks} X_{ks} + \sum_{u=ks+1}^{t} \alpha^{t-u} \xi_u$. Further, for the $p$-SU scheme, we know that $\hat{X}_{t|t} = \alpha^{t-ks} \hat{X}_{ks|ks+jp}$ at each instant $t = ks+jp+i$. Therefore, we have
$$X_t - \hat{X}_{t|t} = \alpha^{t-ks} (X_{ks} - \hat{X}_{ks|ks+jp}) + \sum_{u=ks+1}^{t} \alpha^{t-u} \xi_u.$$
Since the estimate $\hat{X}_{ks|ks+jp}$ is a function of the samples $(X_0, \ldots, X_{ks})$, and the sequence $(\xi_u, u > ks)$ is independent of the past, we obtain the per dimension MSE as
$$D_t(\phi_p, \psi_p, X) = \frac{\alpha^{2(t-ks)}}{n} \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2\big] + \sigma^2 (1 - \alpha^{2(t-ks)}).$$
Further, we divide the error into two terms based on the occurrence of the failure event as follows:
$$D_t(\phi_p, \psi_p, X) = \frac{\alpha^{2(t-ks)}}{n} \Big(\mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t}\big] + \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t^c}\big]\Big) + \sigma^2 (1 - \alpha^{2(t-ks)}). \qquad (4)$$
Recall that at each instant $t = ks+jp$, we refine the estimate $\hat{X}_{ks|ks+(j-1)p}$ of $X_{ks}$ to $\hat{X}_{ks|ks+jp} = (\hat{X}_{ks|ks+(j-1)p} + Q_p(Y_{k,j-1})) \mathbb{1}_{A_t}$. Upon substituting this expression for $\hat{X}_{ks|ks+jp}$, we obtain
$$\mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t}\big] = \mathbb{E}\big[\|Y_{k,j-1} - Q_p(Y_{k,j-1})\|_2^2 \mathbb{1}_{A_t}\big] \leq \theta\, \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+(j-1)p}\|_2^2 \mathbb{1}_{A_t}\big] + n\varepsilon^2,$$
where the identity uses the definition of the error $Y_{k,j-1}$ given in (2) and the inequality holds since $Q_p$ is a $(\theta, \varepsilon)$-quantizer. Repeating the previous step recursively, we get
$$\frac{1}{n} \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t}\big] \leq \theta^j \cdot \frac{1}{n} \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks}\|_2^2 \mathbb{1}_{A_t}\big] + \frac{1 - \theta^j}{1 - \theta} \cdot \varepsilon^2 \leq \theta^j \cdot \frac{1}{n} \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks}\|_2^2\big] + \frac{1 - \theta^j}{1 - \theta} \cdot \varepsilon^2,$$
which is the same as
$$\frac{1}{n} \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t}\big] \leq \theta^j \cdot D_{ks}(\phi_p, \psi_p, X) + \frac{1 - \theta^j}{1 - \theta} \cdot \varepsilon^2.$$
Moving to the error term $\mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t^c}\big]$ when encoder failure occurs, recall that the decoder sets the estimate to 0 in the event of an encoder failure. Thus, using the Cauchy–Schwarz inequality, we get
$$\frac{1}{n} \mathbb{E}\big[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t^c}\big] = \frac{1}{n} \mathbb{E}\big[\|X_{ks}\|_2^2 \mathbb{1}_{A_t^c}\big] \leq \frac{1}{n} \sqrt{\mathbb{E}\big[\|X_{ks}\|_2^4\big]\, P(A_t^c)} \leq \kappa\beta.$$
Substituting the two bounds above in (4), we get the result. □
The following recursive bound can be obtained using almost the same proof as that of Lemma 1; we omit the details.
Lemma 2.
Let $(\phi_p, \psi_p)$ denote the tracking code of a $p$-SU scheme employing an $nRp$ bit $(\theta, \varepsilon)$-quantizer. Assume that $P(A_t^c) \leq \beta^2$. Then, we have
$$D_{ks}(\phi_p, \psi_p, X) \leq \alpha^{2s} \theta^m D_{(k-1)s}(\phi_p, \psi_p, X) + \sigma^2 (1 - \alpha^{2s}) + \alpha^{2s} \frac{\varepsilon^2 (1 - \theta^m)}{1 - \theta} + \alpha^{2s} \kappa\beta.$$
We also need the following technical observation.
Lemma 3.
For a sequence $(X_k \in \mathbb{R} : k \in \mathbb{Z}_+)$ that satisfies the sequence of upper bounds
$$X_k \leq a X_{k-1} + b, \quad k \in \mathbb{N},$$
with constants $a, b \in \mathbb{R}$ such that $b$ is finite and $a \in (-1, 1)$, we have
$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} X_k \leq \frac{b}{1-a}.$$
Proof. 
From the sequence of upper bounds, we can inductively show that
$$X_k \leq a^k X_0 + b\, \frac{1 - a^k}{1 - a}, \quad k \in \mathbb{Z}_+.$$
Averaging $X_k$ over the horizon $\{0, \ldots, K-1\}$, we get
$$\frac{1}{K} \sum_{k=0}^{K-1} X_k \leq \frac{1 - a^K}{K(1-a)} \left(X_0 - \frac{b}{1-a}\right) + \frac{b}{1-a}.$$
From the finiteness of $X_0$ and $b$ and the fact that $|a| < 1$, the result follows by letting $K$ grow arbitrarily large on both sides. □
We are now in a position to prove Theorem 3.
Proof of Theorem 3. 
We begin by noting that, without any loss of generality, we can restrict to $T = Ks$. This holds since the contributions of the error terms within the fixed interval $\mathcal{I}_K$ are bounded. For $T = Ks$, the time duration $\{0, \ldots, T\}$ can be partitioned into the intervals $(\mathcal{I}_k, k+1 \in [K])$. Therefore, we can write the average MSE per dimension for the $p$-SU scheme for time-horizon $T = Ks$ as
$$\bar{D}_T(\phi_p, \psi_p, X) = \frac{1}{Ks} \sum_{k=0}^{K-1} \sum_{j=0}^{m-1} \sum_{i=0}^{p-1} D_{ks+jp+i}(\phi_p, \psi_p, X).$$
From the upper bound for the per dimension MSE given in Lemma 1, we get
$$\sum_{i=0}^{p-1} D_{ks+jp+i}(\phi_p, \psi_p, X) \leq \sum_{i=0}^{p-1} \Big[\alpha^{2(jp+i)} \theta^j D_{ks}(\phi_p, \psi_p, X) + \sigma^2 (1 - \alpha^{2(jp+i)}) + \alpha^{2(jp+i)} \frac{(1-\theta^j)\varepsilon^2}{1-\theta} + \alpha^{2(jp+i)} \kappa\beta\Big]$$
$$= \alpha^{2jp}\, \frac{1 - \alpha^{2p}}{1 - \alpha^2} \left[\theta^j D_{ks}(\phi_p, \psi_p, X) + \frac{(1-\theta^j)\varepsilon^2}{1-\theta} + \kappa\beta - \sigma^2\right] + p\sigma^2.$$
Summing the expression above over $j \in \{0, \ldots, m-1\}$ and $k \in \{0, \ldots, K-1\}$, and dividing by $T$, we get
$$\bar{D}_T(\phi_p, \psi_p, X) \leq \sigma^2 + \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)} \left(\kappa\beta - \sigma^2 + \frac{\varepsilon^2}{1-\theta}\right) + \frac{(1 - \alpha^{2p})(1 - \alpha^{2s}\theta^m)}{s(1 - \alpha^2)(1 - \alpha^{2p}\theta)} \left(\frac{1}{K} \sum_{k=0}^{K-1} D_{ks}(\phi_p, \psi_p, X) - \frac{\varepsilon^2}{1-\theta}\right).$$
It follows by Lemma 2 that
$$\bar{D}_T(\phi_p, \psi_p, X) \leq \sigma^2 + \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)} \left(\kappa\beta - \sigma^2 + \frac{\varepsilon^2}{1-\theta}\right) + \frac{(1 - \alpha^{2p})(1 - \alpha^{2s}\theta^m)}{s(1 - \alpha^2)(1 - \alpha^{2p}\theta)} \left(\sup_{\{a_k\}_{k \geq 0} \in \mathcal{A}} \frac{1}{K} \sum_{k=0}^{K-1} a_k - \frac{\varepsilon^2}{1-\theta}\right), \qquad (5)$$
where $\mathcal{A}$ denotes the set of sequences $\{a_k\}_{k \geq 0}$ satisfying
$$a_k \leq \alpha^{2s} \theta^m a_{k-1} + \sigma^2 (1 - \alpha^{2s}) + \alpha^{2s} \kappa\beta + \alpha^{2s} \frac{\varepsilon^2 (1 - \theta^m)}{1 - \theta}.$$
We denote the right-side of (5) by $B_T(\theta, \varepsilon, \beta)$. Noting that by Lemma 3 any sequence $\{a_k\}_{k \geq 0} \in \mathcal{A}$ satisfies
$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} a_k \leq \frac{1}{1 - \alpha^{2s}\theta^m} \left[\sigma^2 (1 - \alpha^{2s}) + \alpha^{2s} \kappa\beta + \alpha^{2s} \frac{\varepsilon^2 (1 - \theta^m)}{1 - \theta}\right],$$
we get that
$$\limsup_{T \to \infty} B_T(\theta, \varepsilon, \beta) \leq \sigma^2 \left[1 - \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)} \cdot \frac{\alpha^{2p}(1-\theta)}{1 - \alpha^{2p}\theta}\right] + \varepsilon^2\, \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)} \cdot \frac{\alpha^{2p}}{1 - \alpha^{2p}\theta} + \kappa\beta\, \frac{1}{s(1 - \alpha^2)} \left[1 - \frac{\alpha^{2(s+p)}(1-\theta)}{1 - \alpha^{2p}\theta}\right],$$
which completes the proof. □

6. Asymptotic Achievability Using Random Quantizer

With Theorem 3 at our disposal, the proof of achievability can be completed by fixing $p = 1$ and showing the existence of an appropriate quantizer. However, we need to handle the failure event, and we address this first. The next result shows that the failure probability depends on the quantizer only through $M$.
Lemma 4.
For fixed $T$ and $n$, consider the $p$-SU scheme with $p = 1$ and an $nR$ bit $(\theta, \varepsilon)$-quantizer $Q$ with dynamic range $M$. Then, for every $\eta > 0$, there exists an $M_0$ independent of $n$ such that for all $M \geq M_0$, we get
$$P(A_T^c) \leq \eta.$$
Proof. 
The event $A_T$ (of encoder failure not happening until time $T$ for the successive update scheme) occurs when the errors $Y_{k,j}$ satisfy $\|Y_{k,j}\|_2^2 \leq nM^2$, for every $k \geq 0$ and $0 \leq j \leq s-1$ such that $t = ks+j \leq T$. For brevity, we denote by $Y_t$ the error random variable $Y_{k,j}$ and let $Y^{t-1} = (Y_1, \ldots, Y_{t-1})$. We note that
$$P(A_T^c) = P(A_{T-1}^c) + P(A_{T-1} \cap A_T^c) = P(A_{T-1}^c) + P(A_{T-1} \cap \{\|Y_T\|_2^2 > nM^2\}) = P(A_{T-1}^c) + \mathbb{E}\big[\mathbb{1}_{A_{T-1}} P(\|Y_T\|_2^2 > nM^2 \mid Y^{T-1})\big] \leq P(A_{T-1}^c) + \mathbb{E}\left[\mathbb{1}_{A_{T-1}} \frac{\mathbb{E}[\|Y_T\|_2^2 \mid Y^{T-1}]}{nM^2}\right] = P(A_{T-1}^c) + \frac{\mathbb{E}[\|Y_T\|_2^2 \mathbb{1}_{A_{T-1}}]}{nM^2}.$$
Note that the inequality follows from Markov's inequality. Further, we have $Y_T = Y_{T-1} - Q(Y_{T-1})$ under $A_{T-1}$, whereby
$$\mathbb{E}\big[\|Y_T\|_2^2 \mathbb{1}_{A_{T-1}}\big] \leq \theta \cdot \mathbb{E}\big[\|Y_{T-1}\|_2^2 \mathbb{1}_{A_{T-1}}\big] + n\varepsilon^2.$$
Denoting by $\beta_T^2$ the probability $P(A_T^c)$, the previous two inequalities imply
$$\beta_T^2 \leq \beta_{T-1}^2 + \frac{\theta}{nM^2} \mathbb{E}\big[\|Y_{T-1}\|_2^2\big] + \frac{\varepsilon^2}{M^2}.$$
We saw earlier, in the proof of Lemma 1, that $\mathbb{E}[\|Y_{T-1}\|_2^2]/n$ depends only on the probability $\beta_{T-1}^2$ of failure until time $T-1$. Proceeding as in that proof, we get
$$\beta_T^2 \leq \beta_{T-1}^2 + \frac{1}{M^2} (c_1 \beta_{T-1} + c_2),$$
where $c_1$ and $c_2$ do not depend on $n$. Therefore, there exists $M_0$ independent of $n$ such that for all $M$ exceeding $M_0$ we have
$$\beta_T^2 \leq \beta_{T-1}^2 + \eta,$$
which completes the proof by summing over T. □
The bound above is rather loose, but it suffices for our purpose. In particular, it says that we can choose $M$ sufficiently large to make the probability of failure until time $T$ less than any $\beta^2$, whereby Theorem 3 can be applied by designing a quantizer for this $M$. Indeed, we can use the quantizer of the unit sphere from [1,2], along with a uniform quantizer for the gain (which lies in $[0, \sqrt{n} M]$) to get the following performance. In fact, we will show that a deterministic quantizer with the desired performance exists. Note that we already considered such a quantizer in Example 3. However, the analysis there was slightly loose and it assumed the existence of an ideal shape quantizer.
Lemma 5.
For every $R, \varepsilon, \gamma, M > 0$, there exists an $nR$ bit $(2^{-2(R-\gamma)}, \varepsilon)$-quantizer with dynamic range $M$, for all $n$ sufficiently large.
Proof. 
We first borrow a classic construction from [1,2], which gives us our desired shape quantizer. Denote by $\mathcal{S}_n$ the $(n-1)$-dimensional unit sphere $\{y \in \mathbb{R}^n : \|y\|_2 = 1\}$. For every $\gamma > 0$ and $n$ sufficiently large, it was shown in [1,2] that there exists a set $\mathcal{C}$ of $2^{nR}$ vectors in $\mathcal{S}_n$ such that for every $y \in \mathcal{S}_n$ we can find $y' \in \mathcal{C}$ satisfying
$$\langle y, y' \rangle \geq \sqrt{1 - 2^{-2(R-\gamma)}}.$$
Denoting $\cos\theta = \sqrt{1 - 2^{-2(R-\gamma)}}$, consider the shape quantizer $Q_R(y)$ from [2] given by
$$Q_R(y) \triangleq \cos\theta \cdot \arg\min_{y' \in \mathcal{C}} \|y - y'\|_2^2 = \cos\theta \cdot \arg\max_{y' \in \mathcal{C}} \langle y, y' \rangle, \quad y \in \mathcal{S}_n.$$
Note that we shrink the length of $y$ by a factor of $\cos\theta$, which will be seen to yield the gain over the analysis in Example 3.
We append to this shape quantizer the uniform gain quantizer $q_M : [0, M] \to [0, M]$, which quantizes the interval $[0, M]$ uniformly into subintervals of length $\varepsilon$. Specifically, $q_M(a) = \varepsilon \lfloor a/\varepsilon \rfloor$ and the corresponding index is given by $\lfloor a/\varepsilon \rfloor$. We represent this index using its $\lceil \log(M/\varepsilon) \rceil$ bit binary representation and denote this mapping by $\phi_M : [0, M] \to \{0,1\}^{\lceil \log(M/\varepsilon) \rceil}$.
For every $y \in \mathbb{R}^n$ such that $\|y\|_2^2 \leq nM^2$, we consider the quantizer
$$Q(y) = \sqrt{n} \cdot q_M\left(\frac{\|y\|_2}{\sqrt{n}}\right) \cdot Q_R\left(\frac{y}{\|y\|_2}\right).$$
For this quantizer, for every $y \in \mathbb{R}^n$ with $\|y\|_2^2 = nB^2$ such that $B \leq M$, writing $\tilde{y} = y/\|y\|_2$, $\hat{B} = q_M(B)$, and $y' = \arg\max_{c \in \mathcal{C}} \langle \tilde{y}, c \rangle$, we have
$$\|y - Q(y)\|_2^2 = \|y\|_2^2 + \|Q(y)\|_2^2 - 2\langle y, Q(y) \rangle = nB^2 + n\hat{B}^2 \cos^2\theta - 2nB\hat{B}\cos\theta\, \langle \tilde{y}, y' \rangle \leq nB^2 + n\hat{B}^2 \cos^2\theta - 2nB\hat{B}\cos^2\theta = nB^2 \sin^2\theta + n(B - \hat{B})^2 \cos^2\theta \leq nB^2 \sin^2\theta + n\varepsilon^2 \cos^2\theta \leq nB^2 2^{-2(R-\gamma)} + n\varepsilon^2,$$
where the first inequality uses the covering property of $\mathcal{C}$. Therefore, $Q$ constitutes an $(nR + \lceil \log(M/\varepsilon) \rceil)$ bit $(2^{-2(R-\gamma)}, \varepsilon)$-quantizer with dynamic range $M$, for all $n$ sufficiently large. Note that this quantizer is a deterministic one. □
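A toy instantiation of this construction at a small $n$ is sketched below, with a random codebook standing in for the covering code $\mathcal{C}$ of [1,2]; at finite $n$ a random codebook only approximates the covering guarantee, and $\gamma = 0$ is taken purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, R, M, eps = 16, 0.5, 2.0, 0.05
K = int(2 ** (n * R))                          # 2^{nR} shape codewords
C = rng.standard_normal((K, n))
C /= np.linalg.norm(C, axis=1, keepdims=True)
cos_theta = np.sqrt(1 - 2.0 ** (-2 * R))       # shrinkage factor of the shape quantizer

def Q(y):
    g = np.linalg.norm(y) / np.sqrt(n)
    if g > M:
        return None                            # the failure symbol
    g_hat = eps * np.floor(g / eps)            # uniform gain quantizer q_M
    shape = C[np.argmax(C @ (y / np.linalg.norm(y)))]
    return np.sqrt(n) * g_hat * cos_theta * shape

y = rng.standard_normal(n)
y *= 1.5 * np.sqrt(n) / np.linalg.norm(y)      # an input with ||y||_2 <= sqrt(n) M
print(np.sum((y - Q(y)) ** 2) / np.sum(y ** 2))  # relative quantization error
```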
Proof of Theorem 1. 
For any fixed $\beta$ and $\varepsilon$, we can make the probability of failure until time $T$ less than $\beta^2$ by choosing $M$ sufficiently large. Further, for any fixed $R, \gamma > 0$, by Lemma 5, we can choose $n$ sufficiently large to get an $nR$ bit $(2^{-2(R-\gamma)}, \varepsilon)$-quantizer for vectors $y$ with $\|y\|_2^2 \leq nM^2$. Therefore, by Theorem 3 applied for $p = 1$, we get that
$$\delta^*(R, s, \mathcal{X}) \geq g(s)\, \frac{\alpha^2}{1 - \alpha^2 2^{-2(R-\gamma)}} \left(1 - \frac{\varepsilon^2}{\sigma^2} - 2^{-2(R-\gamma)}\right) - \kappa\beta\, \frac{g(s)}{\sigma^2 (1 - \alpha^{2s})} \left[1 - \frac{\alpha^{2(s+1)}(1 - \theta)}{1 - \alpha^2 \theta}\right].$$
The proof is completed upon taking the limits as $\varepsilon$, $\gamma$, and $\beta$ go to 0. □

7. Converse Bound: Proof of Theorem 2

The proof is similar to the converse proof in [55], but now we need to handle the delay per transmission. We rely on the properties of the entropy power of a random variable. Recall that for a continuous random variable $X$ taking values in $\mathbb{R}^n$, the entropy power of $X$ is given by
$$N(X) = \frac{1}{2\pi e} \cdot 2^{(2/n) h(X)},$$
where $h(X)$ is the differential entropy of $X$.
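For instance, for a Gaussian vector the entropy power recovers the per dimension variance (with $h$ measured in bits), a fact used repeatedly below:
$$X \sim \mathcal{N}(0, \sigma^2 I_n): \quad h(X) = \frac{n}{2} \log(2\pi e \sigma^2), \quad \text{so} \quad N(X) = \frac{1}{2\pi e} \cdot 2^{\frac{2}{n} \cdot \frac{n}{2} \log(2\pi e \sigma^2)} = \sigma^2.$$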
Consider a tracking code $(\phi, \psi)$ of rate $R$ and sampling period $s$, and a process $X \in \mathcal{X}_n$. We begin by noting that the state at time $t$ is related to the state at time $t+i$ as
$$X_{t+i} = \alpha^i X_t + \sum_{j=0}^{i-1} \alpha^j \xi_{t+i-j},$$
where the noise $\sum_{j=0}^{i-1} \alpha^j \xi_{t+i-j}$ is independent of $X_t$ (and the past states). In particular, for $t = ks+i$, $1 \leq i < s$, we get
$$\mathbb{E}\big[\|X_t - \hat{X}_{t|t}\|_2^2\big] = \mathbb{E}\big[\|\alpha^i X_{ks} - \hat{X}_{t|t}\|_2^2\big] + \sum_{j=0}^{i-1} \alpha^{2j} \mathbb{E}\big[\|\xi_{ks+i-j}\|_2^2\big] = \alpha^{2i} \mathbb{E}\big[\|X_{ks} - \tilde{X}_t\|_2^2\big] + n(1 - \alpha^{2i})\sigma^2,$$
where we define $\tilde{X}_t := \alpha^{-i} \hat{X}_{t|t}$ and the first identity uses the orthogonality of the noise added in each round from the previous states and noise. The second equality follows directly by substituting the variance of the components of the process $\xi_{ks+i-j}$ and simplifying the summation. Since the Gaussian distribution has the maximum differential entropy among all continuous random variables with a given variance, and the entropy power of a Gaussian random variable equals its variance, we get that
$$\sigma^2 (1 - \alpha^2) \geq N(\xi_{t+i}).$$
Therefore, the previous bound for the tracking error yields
$$D_{ks+i}(\phi, \psi, X) \geq \alpha^{2i} \frac{1}{n} \mathbb{E}\big[\|X_{ks} - \tilde{X}_{ks+i}\|_2^2\big] + \frac{(1 - \alpha^{2i})}{(1 - \alpha^2)} N(\xi_{ks+i}) = \alpha^{2i} \frac{1}{n} \mathbb{E}\big[\|X_{ks} - \tilde{X}_{ks+i}\|_2^2\big] + \frac{(1 - \alpha^{2i})}{(1 - \alpha^2)} N(\xi_1),$$
where the identity uses the assumption that the $\xi_t$ are identically distributed for all $t$. Taking the average of these terms for $t = 0, \ldots, T$, we get
$$\bar{D}_T(\phi, \psi, X) = \frac{1}{nKs} \sum_{k=0}^{K-1} \sum_{i=ks}^{(k+1)s-1} \mathbb{E}\big[\|X_i - \hat{X}_{i|i}\|_2^2\big] \geq \frac{1}{nKs} \sum_{k=0}^{K-1} \sum_{i=0}^{s-1} \alpha^{2i} \mathbb{E}\big[\|X_{ks} - \tilde{X}_{ks+i}\|_2^2\big] + \frac{N(\xi_1)}{(1 - \alpha^2)} \left(1 - \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)}\right).$$
Note that the $\tilde{X}_{ks+i}$ act as estimates of $X_{ks}$ which depend on the communication received by the decoder until time $ks+i$. We denote the communication received at time $t$ by $C_{t-1}$, whereby $\tilde{X}_{ks+i}$ depends only on $C_1, \ldots, C_{ks+i-1}$. In particular, the communication $C_{ks}, \ldots, C_{ks+i-1}$ was sent as a function of $X_{ks}$, the sample seen at time $t = ks$.
From here on, we proceed by invoking the "entropy power bounds" for the MSE terms. For random variables $X$ and $Y$ such that $P_{X|Y}$ has a conditional density, the conditional entropy power is given by $N(X|Y) = \frac{1}{2\pi e} 2^{2h(X|Y)/n}$. (The conditional differential entropy $h(X|Y)$ is given by $\mathbb{E}\big[h(P_{X|Y})\big]$.) Bounding MSE terms by the entropy power is a standard step that allows us to track the reduction in error due to a fixed amount of communication.
We begin by using the following standard bound (see [56], Chapter 10), which follows simply by noting that the Gaussian maximizes differential entropy among all random variables with a given second moment and that $h(X) - h(X|Y) \leq H(Y) = nR$: For a continuous random variable $X$ and a discrete random variable $Y$ taking values in $\{0,1\}^{nR}$, let $\hat{X}$ be any function of $Y$. Then, it holds that
$$\frac{1}{n} \mathbb{E}\big[\|X - \hat{X}\|_2^2\big] \geq 2^{-2R} N(X).$$
We apply this result with $X_{ks}$ given $C^{ks-1}$ in the role of $X$ and the communication $C_{ks}, \ldots, C_{ks+i-1}$ in the role of $Y$. The previous bound and Jensen's inequality yield
$$\frac{1}{n} \mathbb{E}\big[\|X_{ks} - \tilde{X}_{ks+i}\|_2^2\big] \geq 2^{-2Ri}\, \mathbb{E}\big[N(X_{ks} \mid C^{ks-1})\big]. \qquad (6)$$
Next, we recall the entropy power inequality (cf. [56]): for independent $X_1$ and $X_2$, $N(X_1 + X_2) \geq N(X_1) + N(X_2)$. Noting that $X_{ks} = \alpha^s X_{(k-1)s} + \sum_{j=0}^{s-1} \alpha^j \xi_{ks-j}$, where the $\xi_i$ are i.i.d. zero-mean random variables independent of $X_{(k-1)s}$, and that $C^{ks-1}$ is a function of $(X_0, X_s, \ldots, X_{(k-1)s})$, we get
$$N(X_{ks} \mid C^{ks-1}) \geq N(\alpha^s X_{(k-1)s} \mid C^{ks-1}) + N(\xi_{ks}) \frac{(1 - \alpha^{2s})}{(1 - \alpha^2)} = \alpha^{2s} N(X_{(k-1)s} \mid C^{ks-1}) + N(\xi_1) \frac{(1 - \alpha^{2s})}{(1 - \alpha^2)}, \qquad (7)$$
where the previous identities utilize the scaling property of differential entropy. Upon combining the bounds given above and simplifying, we get
$$\bar{D}_T(\phi, \psi, X) \geq \frac{\alpha^{2s}(1 - \alpha^{2s} 2^{-2Rs})}{s(1 - \alpha^2 2^{-2R})} \cdot \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{E}\big[N(X_{(k-1)s} \mid C^{ks-1})\big] + \frac{N(\xi_1)}{(1 - \alpha^2)} \left[1 + \frac{(1 - \alpha^{2s})(1 - \alpha^{2s} 2^{-2Rs})}{s(1 - \alpha^2 2^{-2R})} - \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)}\right]. \qquad (8)$$
Finally, note that the terms $N(X_{(k-1)s} \mid C^{ks-1})$ are exactly the same as those considered in [55] (eqn. 11e), since they correspond to recovering $X_{(k-1)s}$ using communication that can depend on it. Therefore, a similar expression holds here for the sampled process $\{X_{ks} : k \in \mathbb{N}\}$. Using the recursive bounds for the tracking error in (6) and (7), we adapt the results of [55] (eqn. 11) for our case to obtain
$$\mathbb{E}\big[N(X_{(k-1)s} \mid C^{ks-1})\big] \leq d_{k-1}^*,$$
where the quantity $d_k^*$ is given by the recursion
$$d_k^* = 2^{-2Rs} \left(\alpha^{2s} d_{k-1}^* + N(\xi_1) \frac{(1 - \alpha^{2s})}{(1 - \alpha^2)}\right),$$
with $d_0^* = 0$.
The bound obtained above holds for any given process $X \in \mathcal{X}_n$. To obtain the best possible bound, we substitute for $\xi_1$ a Gaussian random variable, since that maximizes $N(\xi_1)$. Specifically, we set $\{\xi_k\}$ to be Gaussian random variables with zero mean and covariance matrix $\sigma^2 (1 - \alpha^2) I_n$ to get $N(\xi_1) = \sigma^2 (1 - \alpha^2)$. Thus, taking the supremum over all distributions on both sides of (8), we have
$$\sup_{X \in \mathcal{X}_n} \bar{D}_T(\phi, \psi, X) \geq \frac{\alpha^{2s}(1 - \alpha^{2s} 2^{-2Rs})}{s(1 - \alpha^2 2^{-2R})} \cdot \frac{1}{K} \sum_{k=0}^{K-1} d_{k-1}^* + \sigma^2 \left[1 + \frac{(1 - \alpha^{2s})(1 - \alpha^{2s} 2^{-2Rs})}{s(1 - \alpha^2 2^{-2R})} - \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)}\right],$$
where
$$d_k^* = 2^{-2Rs} \left(\alpha^{2s} d_{k-1}^* + \sigma^2 (1 - \alpha^{2s})\right),$$
with $d_0^* = 0$. For this sequence $d_k^*$, we can see that (cf. [55] (Corollary 1))
$$\limsup_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} d_{k-1}^* = \lim_{k \to \infty} d_k^* = \frac{\sigma^2 (1 - \alpha^{2s}) 2^{-2Rs}}{1 - \alpha^{2s} 2^{-2Rs}}.$$
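Indeed, the limit can be checked directly as the fixed point of the affine recursion:
$$d^* = 2^{-2Rs} \big(\alpha^{2s} d^* + \sigma^2 (1 - \alpha^{2s})\big) \;\Longrightarrow\; d^* \big(1 - \alpha^{2s} 2^{-2Rs}\big) = \sigma^2 (1 - \alpha^{2s}) 2^{-2Rs} \;\Longrightarrow\; d^* = \frac{\sigma^2 (1 - \alpha^{2s}) 2^{-2Rs}}{1 - \alpha^{2s} 2^{-2Rs}}.$$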
Therefore, we have obtained
$$\limsup_{T \to \infty} \sup_{X \in \mathcal{X}_n} \bar{D}_T(\phi, \psi, X) \geq \sigma^2 \left[\frac{(1 - \alpha^{2s}) \alpha^{2s} 2^{-2Rs}}{s(1 - \alpha^2 2^{-2R})} + 1 - \frac{1 - \alpha^{2s}}{s(1 - \alpha^2)} + \frac{(1 - \alpha^{2s})(1 - \alpha^{2s} 2^{-2Rs})}{s(1 - \alpha^2 2^{-2R})}\right] = \sigma^2 \big(1 - g(s)\, \delta_0(R)\big).$$
As the bound obtained above holds for all tracking codes $(\phi, \psi)$, it follows that $\delta^*(R, s, \mathcal{X}) \leq g(s)\, \delta_0(R)$.

8. Discussion

We restricted our treatment to an AR[1] process with uncorrelated components. This restriction is for clarity of presentation, and some of the results can be extended to AR[1] processes with correlated components. In this case, the decoder will be replaced by a Kalman-like filter in the manner of [35]. A natural extension of this work is the study of an optimum transmission strategy for an AR[n] process in the given setting. In an AR[n] process, the strategy of refining the latest sample is clearly not sufficient as the value of the process at any time instant is dependent on the past n samples. If the sampling is periodic, even the encoder does not have access to all these n samples unless we take a sample at every instant. A viable alternative is to take n consecutive samples at every sampling instant. However, even with this structure on the sampling policy, it is not clear how the information must be transmitted. A systematic analysis of this problem is an interesting area of future research.
Another setting not discussed in the current work is one where the transmissions are of nonuniform rates. Throughout our work, we have assumed periodic sampling and transmissions at a fixed rate. For the scheme presented in this paper, it is easy to see from our analysis that only the total number of bits transmitted in each sampling interval matters, when the dimension is sufficiently large. That is, for our scheme, even framing each packet (sent in each communication slot) using an unequal number of bits will give the same performance as that for equal packet sizes, if the overall bit-budget per sampling period is fixed. A similar phenomenon was observed in [39], which allowed the extension of some of their analysis to erasure channels with feedback. We remark that a similar extension is possible for some of our results, too. This behavior stems from the use of successive batches of bits to successively refine the estimate of a single sample within any sampling interval, whereby at the end of the sampling interval the error corresponds to roughly that for a quantizer using the total number of bits sent during the interval. In general, a study of nonuniform rates for describing each sample, while keeping bits per time-slot fixed, will require us to move beyond uniform sampling. This, too, is an interesting research direction to pursue.
Finally, we remark that the encoder structure we have imposed, wherein the error in the estimate of the latest sample is refined at each instant, is optimal only asymptotically and is justified only heuristically for fixed dimensions. Even for one-dimensional observations, it is not clear whether this structure is optimal. We believe this is a question of fundamental interest that remains open.

Author Contributions

Conceptualization, P.P. and H.T.; methodology, H.T.; formal analysis and validation, R.J., P.P. and H.T.; writing—original draft preparation, R.J.; writing—review, editing and supervision, P.P. and H.T.; project administration and funding acquisition, P.P. All authors have read and agreed to the published version of the manuscript.

Funding

The work of the first author is supported by fellowships from the Centre for Networked Intelligence (a Cisco CSR initiative), Indian Institute of Science, and the Robert Bosch Centre for Cyber-Physical Systems (RBCCPS), Indian Institute of Science, Bangalore. The work of the second author is supported in part by the Department of Telecommunications, Government of India, under Grant DOTC-0001, the Centre for Networked Intelligence (a Cisco CSR initiative), and the RBCCPS, Indian Institute of Science. The work of the third author is supported by a grant from the RBCCPS, Indian Institute of Science, and by grant EMR/2016/002569 from the Department of Science and Technology (DST), India.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank Shun Watanabe for pointing to the reference [2].

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AR[1]: Auto-Regressive process of order 1

References

1. Wyner, A.D. Random packings and coverings of the unit n-sphere. Bell Syst. Tech. J. 1967, 46, 2111–2118.
2. Lapidoth, A. On the role of mismatch in rate distortion theory. IEEE Trans. Inf. Theory 1997, 43, 38–47.
3. Witsenhausen, H.S. On the Structure of Real-Time Source Coders. Bell Syst. Tech. J. 1979, 58, 1437–1451.
4. Teneketzis, D. On the Structure of Optimal Real-Time Encoders and Decoders in Noisy Communication. IEEE Trans. Inf. Theory 2006, 52, 4017–4035.
5. Mahajan, A.; Teneketzis, D. On Real-Time Communication Systems with Noisy Feedback. In Proceedings of the 2007 IEEE Information Theory Workshop, Tahoe City, CA, USA, 2–6 September 2007; pp. 283–288.
6. Walrand, J.C.; Varaiya, P. Optimal causal coding-decoding problems. IEEE Trans. Inf. Theory 1983, 29, 814–819.
7. Yuksel, S. On optimal causal coding of partially observed Markov sources in single and multiterminal settings. IEEE Trans. Inf. Theory 2012, 59, 424–437.
8. Linder, T.; Yüksel, S. On optimal zero-delay coding of vector Markov sources. IEEE Trans. Inf. Theory 2014, 60, 5975–5991.
9. Wood, R.G.; Linder, T.; Yüksel, S. Optimal zero delay coding of Markov sources: Stationary and finite memory codes. IEEE Trans. Inf. Theory 2017, 63, 5968–5980.
10. Dobrushin, R.; Tsybakov, B. Information transmission with additional noise. IRE Trans. Inf. Theory 1962, 8, 293–304.
11. Wolf, J.; Ziv, J. Transmission of noisy information to a noisy receiver with minimum distortion. IEEE Trans. Inf. Theory 1970, 16, 406–411.
12. Witsenhausen, H. Indirect rate distortion problems. IEEE Trans. Inf. Theory 1980, 26, 518–521.
13. Mohammadi, E.; Fallah, A.; Marvasti, F. Sampling and Distortion Tradeoffs for Indirect Source Retrieval. IEEE Trans. Inf. Theory 2017, 63, 6833–6848.
14. Zamir, R.; Feder, M. Rate-distortion performance in coding bandlimited sources by sampling and dithered quantization. IEEE Trans. Inf. Theory 1995, 41, 141–154.
15. Mashiach, A.; Zamir, R. Entropy-coded quantization of periodic nonuniform samples. In Proceedings of the 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 14–17 November 2012; pp. 1–5.
16. Mashiach, A.; Zamir, R. Noise-shaped quantization for nonuniform sampling. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 1187–1191.
17. Kipnis, A.; Goldsmith, A.J.; Eldar, Y.C.; Weissman, T. Distortion rate function of sub-Nyquist sampled Gaussian sources. IEEE Trans. Inf. Theory 2015, 62, 401–429.
18. Wong, W.S.; Brockett, R.W. Systems with finite communication bandwidth constraints. I. State estimation problems. IEEE Trans. Autom. Control 1997, 42, 1294–1299.
19. Nair, G.N.; Evans, R.J. State estimation via a capacity-limited communication channel. In Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, CA, USA, 12 December 1997; Volume 1, pp. 866–871.
20. Nair, G.N.; Evans, R.J. State estimation under bit-rate constraints. In Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171), Tampa, FL, USA, 18 December 1998; Volume 1, pp. 251–256.
21. Dokuchaev, N.G.; Savkin, A.V. Recursive state estimation via limited capacity communication channels. In Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix, AZ, USA, 7–10 December 1999; Volume 5, pp. 4929–4932.
22. Smith, S.C.; Seiler, P. Estimation with lossy measurements: Jump estimators for jump systems. IEEE Trans. Autom. Control 2003, 48, 2163–2171.
23. Matveev, A.S.; Savkin, A.V. The problem of state estimation via asynchronous communication channels with irregular transmission times. IEEE Trans. Autom. Control 2003, 48, 670–676.
24. Lipsa, G.M.; Martins, N.C. Remote state estimation with communication costs for first-order LTI systems. IEEE Trans. Autom. Control 2011, 56, 2013–2025.
25. Chakravorty, J.; Mahajan, A. Fundamental Limits of Remote Estimation of Autoregressive Markov Processes Under Communication Constraints. IEEE Trans. Autom. Control 2017, 62, 1109–1123.
26. Nayyar, A.; Basar, T.; Teneketzis, D.; Veeravalli, V.V. Optimal Strategies for Communication and Remote Estimation With an Energy Harvesting Sensor. IEEE Trans. Autom. Control 2013, 58, 2246–2259.
27. Sun, Y.; Polyanskiy, Y.; Uysal-Biyikoglu, E. Remote estimation of the Wiener process over a channel with random delay. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 321–325.
28. Linder, T.; Lugosi, G. A zero-delay sequential scheme for lossy coding of individual sequences. IEEE Trans. Inf. Theory 2001, 47, 2533–2538.
29. Weissman, T.; Merhav, N. On Limited-Delay Lossy Coding and Filtering of Individual Sequences. IEEE Trans. Inf. Theory 2002, 48, 721–732.
30. Weissman, T.; Merhav, N. Universal prediction of individual binary sequences in the presence of noise. IEEE Trans. Inf. Theory 2001, 47, 2151–2173.
31. Matloub, S.; Weissman, T. Universal Zero-Delay Joint Source-Channel Coding. IEEE Trans. Inf. Theory 2006, 52, 5240–5249.
32. Gorbunov, A.K.; Pinsker, M.S. Nonanticipatory and Prognostic Epsilon Entropies and Message Generation Rates. Probl. Inform. Transm. 1973, 9, 184–191.
33. Gorbunov, A.K.; Pinsker, M.S. Prognostic Epsilon Entropy of a Gaussian Message and a Gaussian Source. Probl. Inform. Transm. 1974, 10, 93–109.
34. Stavrou, P.; Kourtellaris, C.K.; Charalambous, C.D. Information Nonanticipative Rate Distortion Function and Its Applications. CoRR 2014.
35. Stavrou, P.A.; Østergaard, J.; Charalambous, C.D. Zero-Delay Rate Distortion via Filtering for Vector-Valued Gaussian Sources. IEEE J. Sel. Top. Signal Process. 2018, 12, 841–856.
36. Stavrou, P.A.; Charalambous, T.; Charalambous, C.D.; Loyka, S. Optimal Estimation via Nonanticipative Rate Distortion Function and Applications to Time-Varying Gauss–Markov Processes. SIAM J. Control Optim. 2018, 56, 3731–3765.
37. Viswanathan, H.; Berger, T. Sequential coding of correlated sources. IEEE Trans. Inf. Theory 2000, 46, 236–246.
38. Ma, N.; Ishwar, P. On Delayed Sequential Coding of Correlated Sources. IEEE Trans. Inf. Theory 2011, 57, 3763–3782.
39. Khina, A.; Kostina, V.; Khisti, A.; Hassibi, B. Tracking and Control of Gauss–Markov Processes over Packet-Drop Channels with Acknowledgments. IEEE Trans. Control. Netw. Syst. 2019, 6, 549–560.
40. Kipnis, A.; Reeves, G. Gaussian Approximation of Quantization Error for Estimation from Compressed Data. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 2029–2033.
41. Delchamps, D.F. Extracting state information from a quantized output record. Syst. Control. Lett. 1989, 13, 365–372.
42. Borkar, V.S.; Mitter, S.K. LQG Control with Communication Constraints. In Communications, Computation, Control, and Signal Processing: A Tribute to Thomas Kailath; Springer: Boston, MA, USA, 1997; pp. 365–373.
43. Wong, W.S.; Brockett, R.W. Systems with finite communication bandwidth constraints. II. Stabilization with limited information feedback. IEEE Trans. Autom. Control 1999, 44, 1049–1053.
44. Nair, G.N.; Evans, R.J. Communication-limited stabilization of linear systems. In Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No. 00CH37187), Sydney, Australia, 12–15 December 2000; Volume 1, pp. 1005–1010.
45. Liberzon, D. On stabilization of linear systems with limited information. IEEE Trans. Autom. Control 2003, 48, 304–307.
46. You, K.; Xie, L. Minimum data rate for mean square stabilization of discrete LTI systems over lossy channels. IEEE Trans. Autom. Control 2010, 55, 2373–2378.
47. Yuksel, S. Stochastic stabilization of noisy linear systems with fixed-rate limited feedback. IEEE Trans. Autom. Control 2010, 55, 2847–2853.
48. Yuksel, S.; Meyn, S.P. Random-time, state-dependent stochastic drift for Markov chains and application to stochastic stabilization over erasure channels. IEEE Trans. Autom. Control 2012, 58, 47–59.
49. Yuksel, S. Characterization of information channels for asymptotic mean stationarity and stochastic stability of nonstationary/unstable linear systems. IEEE Trans. Inf. Theory 2012, 58, 6332–6354.
50. Yuksel, S. Stationary and ergodic properties of stochastic nonlinear systems controlled over communication channels. SIAM J. Control Optim. 2016, 54, 2844–2871.
51. Arnstein, D. Quantization error in predictive coders. IEEE Trans. Commun. 1975, 23, 423–429.
52. Farvardin, N.; Modestino, J. Rate-distortion performance of DPCM schemes for autoregressive sources. IEEE Trans. Inf. Theory 1985, 31, 402–418.
53. Gersho, A.; Gray, R.M. Vector Quantization and Signal Compression; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 159.
54. Mayekar, P.; Tyagi, H. RATQ: A Universal Fixed-Length Quantizer for Stochastic Optimization. arXiv 2019, arXiv:1908.08200.
55. Khina, A.; Khisti, A.; Kostina, V.; Hassibi, B. Sequential coding of Gauss–Markov sources with packet erasures and feedback. In Proceedings of the 2017 IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan, 6–10 November 2017; pp. 529–530.
56. Cover, T.; Thomas, J. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
Figure 1. Communication model.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
