
Estimation of Autoregressive Parameters from Noisy Observations Using Iterated Covariance Updates

Electrical and Computer Engineering Department, Utah State University, Logan, UT 84332, USA
* Author to whom correspondence should be addressed.
Entropy 2020, 22(5), 572; https://doi.org/10.3390/e22050572
Submission received: 16 April 2020 / Revised: 11 May 2020 / Accepted: 16 May 2020 / Published: 19 May 2020
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Estimating the parameters of an autoregressive (AR) random process is a problem that has been well studied. In many applications, only noisy measurements of the AR process are available. The effect of the additive noise is that the system can be modeled as an AR model with colored noise, even when the measurement noise is white, where the correlation matrix depends on the AR parameters. Because of the correlation, it is expedient to compute using multiple stacked observations. Performing a weighted least-squares estimation of the AR parameters using an inverse covariance weighting can provide significantly better parameter estimates, with the improvement increasing with the stack depth. The estimation algorithm is essentially a vector RLS adaptive filter with a time-varying covariance matrix. Different ways of estimating the unknown covariance are presented, as well as a method to estimate the variances of the AR and observation noise. The notation is extended to vector autoregressive (VAR) processes. Simulation results demonstrate performance improvements in coefficient error and in spectrum estimation.

1. Introduction

The problem of estimating the parameters $\{b_i, i = 1, \ldots, p\}$ of an autoregressive (AR) process
$$\xi(m) = \sum_{i=1}^{p} b_i \xi(m-i) + \eta(m), \tag{1}$$
where the input $\eta(m)$ is a white noise process, is important in many aspects of signal processing. It plays a role in a variety of applications, such as speech coding and analysis (see References [1,2,3,4,5,6,7]). Autoregressive modeling is instrumental in many spectrum estimation algorithms, and algorithms for noise-free measurements have been developed from that perspective [8,9,10,11,12,13,14]; see also Reference [5] for a survey and perspective. Vector autoregressive modeling is also important in econometric modeling [15]. Among many other applications, AR models have also been used in biomedical signal processing (see, e.g., Reference [16]); communication (see, e.g., Reference [17]); and financial modeling (see, e.g., Reference [15]).
Because of its importance, the AR parameter estimation problem has been well studied for the case of a realization of the noise-free process $\{\xi(m)\}$. When the autocorrelations $r_k = E[\xi(m)\xi^*(m-k)]$ are known (or estimated), the Yule-Walker equations describe the solutions [18,19]. This problem may also be considered as the problem of finding optimal predictor coefficients, and solutions invoking adaptive filters are well known [20]. The parameters can also be estimated using a Kalman filter applied to a state-space formulation ([21], Section 3.3), in which the unknown parameters are taken to be the state, and the observation matrix is formed from the measurements. The Kalman filter method bears resemblance to the approach presented here but, as will be shown, there are differences between the two. First, in the case of noisy observations, the previous AR values $\xi(m-i)$, $i = 1, \ldots, p$, are not known, so the elements of the observation matrix are only approximately known in the Kalman filter approach. No such approximation is used in the method here. Second, the approach presented here does not require introducing a system noise, with the attendant question of what its variance should be. Third, instead of a scalar observation equation, vector observations are used, allowing the covariance structure to be exploited with a time-varying covariance.
It is well known ([5], Section 6.7), [22,23] that additive noise broadens spectral peaks in autoregressive spectral estimation and can result in loss of resolution of closely spaced sinusoids. Dealing with noise added to AR processes has been addressed many times in the literature, as we summarize below. None of these previous methods use the covariance structure employed here. Some of these techniques are related to finding rank-$p$ representations of a data matrix (using, for example, an SVD or total least squares) [24,25,26,27,28,29,30]. From another perspective, it is known that white noise added to an AR($p$) process produces an ARMA($p,p$) process. This has resulted in algorithms for dealing with the noisy AR process using an ARMA estimator. For example, modified Yule-Walker (YW) equations [31] can be used to find the ARMA model. However, this does not take advantage of the relationship between the AR and the MA parameters in this noisy AR case. Another approach is to estimate the ARMA process using a large-order AR model. Yet another approach to handling noisy AR observations is total least squares (TLS) [32,33,34,35]. None of these approaches is equivalent to the approach described here. Noisy AR estimation has been studied over many years by Zheng [36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55], who estimates the variance of the observation noise to reduce bias. An approach that is somewhat similar to ours is bias-corrected RLS (BCRLS) [56,57,58,59,60,61], but BCRLS does not employ the covariance matrix-weighted least squares used here. Other approaches to noisy AR parameter estimation have been investigated in References [4,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92].
The approach described here uses the fact that the observation noise produces an AR-like process with correlated noise, where the noise correlation depends on the AR coefficients. The correlation is exploited by dealing with vectors of stacked observations, rather than scalar observations. A maximum likelihood approach results in a weighted least-squares problem, in which previous estimates are used to update the covariance matrix in an iterative fashion. We refer to this general algorithm as iterative covariance-weighted autoregressive estimation (ICWARE). While maximum likelihood has been used for AR parameter estimation [18] ([93], Chapter 5), [94], in previous estimators the noise is assumed white, so the autocorrelation structure derived here is not exploited. Iterative schemes have also been used for system parameter estimation. An early iterative method of fitting a given impulse response to a transfer function, without an explicit noise model, is in Reference [95]. Iterative quadratic ML (IQML) [96] is another iterative technique. IQML is similar to the algorithm of Reference [97], which requires knowledge of the input signal. IQML is also similar to the method of Reference [98]. An IQML approach [99] has been used for noisy AR estimation. None of these make use of the covariance weighting of ICWARE.
There is some similarity of this approach to the measurement error models studied in the statistical literature [100], in which parameters of regression models such as $Y_t = \beta_0 + \beta_1 x_t + e_t$ are estimated when only noisy observations of the regressors, $X_t = x_t + u_t$, are available. These noisy observations induce changes to conventional minimum mean-square error (MMSE) estimators which are somewhat related to errors-in-variables approaches to AR estimation [101,102,103].
Another approach to this problem, making use of coupled Kalman filters [104] or cross-coupled $H_\infty$ filters [105,106], has been developed. In essence these move beyond the Kalman filter approach referenced above ([21], Section 3.3) and operate as follows. One Kalman filter estimates the true autoregressive value $\hat{\xi}(m)$, assuming knowledge of the AR parameters $\{b_i\}$, while the other Kalman filter estimates the $\{b_i\}$ using the estimates $\hat{\xi}(m)$ as if they were the true values. These two filters operate in conjunction to jointly converge to the AR parameter estimate. To evaluate our method against the dual Kalman filter method, the algorithm here is compared to models also seen in Reference [104].
Recently, Monte Carlo methods, such as particle filtering, have been used for AR and ARMA estimation [107,108,109,110]. Particle filtering methods are quite general and can track dynamical variables and jointly estimate static parameters. These methods offer the possibility of convergence to good estimates in situations when the noise is not Gaussian (which has been assumed in this paper), but carry the usual drawback of potentially high computational complexity if many particles are used. They represent an alternative to the methods of this paper, and a thorough comparison, while valuable, lies beyond the scope of this paper.
This topic is relevant to its journal of publication (Entropy), since spectral analysis under a maximum entropy criterion has (famously) been shown to be equivalent to AR modeling [11]. As this work shows, however, when there are noisy observations, particular attention must be devoted to the information present in the autocorrelation matrix of the observations, beyond the first-order equations that result from the historical maximum entropy approach.
We present the method first for the scalar AR model. For generality, the noise is assumed to be circular complex Gaussian noise; modifications for real signals are straightforward. In addition to the coefficient estimation problem, we also address the problem of variance estimation. Following the development for scalar AR processes, the notation is developed for vector AR (VAR) processes. Several simulation results demonstrate that the ICWARE method provides significant improvements over classical YW-type methods.

2. Scalar Parameter Estimation in Noise

The noisy observation equation, represented by the system in Figure 1, is
$$y(m) = \xi(m) + \nu(m) = \sum_{i=1}^{p} b_i \xi(m-i) + \eta(m) + \nu(m), \tag{2}$$
where $\xi(m) \in \mathbb{C}$ and each $b_i \in \mathbb{C}$, and where $\nu(m)$ is a white noise process. The process $\{\xi(m)\}$ is said to be the noise-free AR process, and the process $\{y(m)\}$ is said to be the noisy AR process.
The input and observation noises are assumed to be complex circular Gaussian, so that
$$\eta(m) = \sigma_\eta\,(N(0,1) + jN(0,1)) \qquad\text{and}\qquad \nu(m) = \sigma_\nu\,(N(0,1) + jN(0,1)),$$
where the real and imaginary parts are uncorrelated, and draws at different times are independent. Furthermore, these noises are assumed to be uncorrelated with each other, so
$$E[\nu(m)\nu^*(m-l)] = 2\sigma_\nu^2\,\delta_l, \qquad E[\eta(m)\eta^*(m-l)] = 2\sigma_\eta^2\,\delta_l, \qquad E[\nu(m)\eta^*(n)] = 0.$$
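To make the noise model concrete, the following is a minimal sketch (Python/NumPy; the helper names `cgauss` and `noisy_ar` are ours, not the paper's) of generating a realization of the noisy AR process (1) and (2):

```python
import numpy as np

rng = np.random.default_rng(0)

def cgauss(sigma, n):
    # circular complex Gaussian draws sigma*(N(0,1) + j N(0,1)), so E[|x|^2] = 2 sigma^2
    return sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

def noisy_ar(b, sigma_eta, sigma_nu, n, burn=500):
    """Generate y(m) = xi(m) + nu(m), with xi(m) the AR(p) process of (1) driven by eta."""
    p = len(b)
    eta = cgauss(sigma_eta, n + burn)
    nu = cgauss(sigma_nu, n + burn)
    xi = np.zeros(n + burn, dtype=complex)
    for m in range(n + burn):
        for i in range(1, min(p, m) + 1):
            xi[m] += b[i - 1] * xi[m - i]
        xi[m] += eta[m]
    return (xi + nu)[burn:]  # drop the transient so the realization is near-stationary
```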
Let $\mathbf{b} = \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}^T$ and $\mathbf{y}(i) = \begin{bmatrix} y(i) & y(i-1) & \cdots & y(i-p+1) \end{bmatrix}^T$ (this vector is re-defined below to have a different length). The conventional least-squares approach to estimating $\mathbf{b}$ (in the forward prediction error sense) would be to compute
$$\hat{\mathbf{b}} = \arg\min_{\mathbf{b}} \sum_i \left| y(i) - \mathbf{b}^T \mathbf{y}(i-1) \right|^2,$$
which results in the conventional estimate
$$\hat{\mathbf{b}} = \Big( \sum_i \mathbf{y}(i-1)\,\mathbf{y}(i-1)^H \Big)^{-1} \sum_i y(i)\,\mathbf{y}(i-1). \tag{3}$$
This is essentially the covariance method [19]. The estimate (3) can be computed using a recursive formulation, resulting in a recursive least-squares (RLS) adaptive filter. This approach essentially neglects the noise $\nu(m)$. It is known that the noise will bias the estimate [22].
Note: The error could also be computed in a backward prediction error sense, or in a combined forward-backward sense, as is done in some spectrum estimation algorithms such as the modified covariance method [14], ([5], Section 7.5). The ICWARE method we describe below can be extended to these generalizations as well, but for brevity only the forward prediction error is used.
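For reference, a minimal sketch of the conventional estimate (3) (the name `ls_ar_estimate` is hypothetical); run on noisy data it exhibits the bias discussed above, and driving $\sigma_\nu^2$ to zero recovers the noise-free behavior:

```python
import numpy as np

def ls_ar_estimate(y, p):
    """Covariance-method least-squares estimate of b, cf. (3); biased when sigma_nu^2 > 0."""
    n = len(y)
    Y = np.array([y[i - 1::-1][:p] for i in range(p, n)])  # row for i: [y(i-1), ..., y(i-p)]
    bhat, *_ = np.linalg.lstsq(Y, y[p:n], rcond=None)
    return bhat
```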
Substituting the AR signal in terms of the observation, $\xi(m) = y(m) - \nu(m)$, into the AR model (1) and using the observation model (2), we obtain
$$y(m) = \sum_{i=1}^{p} b_i y(m-i) + \Big( \eta(m) + \nu(m) - \sum_{i=1}^{p} b_i \nu(m-i) \Big) \tag{4}$$
$$\hphantom{y(m)} = \sum_{i=1}^{p} b_i y(m-i) + w(m), \tag{5}$$
where
$$w(m) = \eta(m) + \nu(m) - \sum_{i=1}^{p} b_i \nu(m-i)$$
denotes the noise. The expression (4) is an ARMA($p,p$) process and is the basis for some ARMA-based approaches to AR estimation in noise. The expression (5) has the appearance of an AR($p$) process, except that the noise $w(m)$ is not white, having correlation
$$
\begin{aligned}
E[w(m)w^*(m)] &= E\Big[\Big(\eta(m)+\nu(m)-\sum_{i=1}^{p} b_i \nu(m-i)\Big)\Big(\eta(m)+\nu(m)-\sum_{i=1}^{p} b_i \nu(m-i)\Big)^*\Big] \\
&= 2\Big(\sigma_\eta^2 + \sigma_\nu^2 + \sum_{i=1}^{p} |b_i|^2 \sigma_\nu^2\Big) = r_w(0), \\
E[w(m)w^*(m+\ell)] &= E\Big[\Big(\eta(m)+\nu(m)-\sum_{i=1}^{p} b_i \nu(m-i)\Big)\Big(\eta(m+\ell)+\nu(m+\ell)-\sum_{i=1}^{p} b_i \nu(m+\ell-i)\Big)^*\Big] \\
&= 2\sigma_\nu^2\Big(\sum_{i=1}^{p} b_i b_{i+\ell}^* - b_\ell^*\Big) = r_w(\ell), \tag{6}
\end{aligned}
$$
where $b_k = 0$ for $k > p$, so that $r_w(\ell) = 0$ for $|\ell| > p$.
If the noise $w(m)$ were uncorrelated, the solution (3) would be least-squares optimal. However, since $w(m)$ is correlated, the sample-by-sample approach of (3) is suboptimal, since it does not take the correlation into account. As we now show, there is information in the covariance structure of the signal that can be used to improve the estimate. To take the correlation into account, stack the observations of (5) into vectors of length $d$ (that is, the depth), as
$$
\begin{bmatrix} y(m) \\ y(m-1) \\ \vdots \\ y(m-d+1) \end{bmatrix}
=
\begin{bmatrix}
y(m-1) & \cdots & y(m-p) \\
y(m-2) & \cdots & y(m-1-p) \\
\vdots & & \vdots \\
y(m-d) & \cdots & y(m-d-p+1)
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}
+
\begin{bmatrix} w(m) \\ w(m-1) \\ \vdots \\ w(m-d+1) \end{bmatrix}. \tag{7}
$$
Write this as
$$\mathbf{y}(m) = Y(m)\,\mathbf{b} + \mathbf{w}(m).$$
The correlation matrix of the vector noise is
$$E[\mathbf{w}(m)\mathbf{w}(m)^H] = \begin{bmatrix}
r_w(0) & r_w(-1) & \cdots & r_w(-(d-1)) \\
r_w(1) & r_w(0) & \cdots & r_w(-(d-2)) \\
\vdots & \vdots & \ddots & \vdots \\
r_w(d-1) & r_w(d-2) & \cdots & r_w(0)
\end{bmatrix} = R_w(\mathbf{b}). \tag{8}$$
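A sketch of computing $r_w(\ell)$ from (6) and assembling the Hermitian Toeplitz matrix $R_w(\mathbf{b})$ of (8); the function names are ours, and the convention $b_k = 0$ for $k > p$ is assumed:

```python
import numpy as np

def r_w(b, s_eta2, s_nu2, ell):
    """Noise autocorrelation r_w(ell) of (6); r_w(-ell) = conj(r_w(ell))."""
    b = np.asarray(b)
    p = len(b)
    if ell == 0:
        return 2.0 * (s_eta2 + s_nu2 * (1.0 + np.sum(np.abs(b) ** 2)))
    l = abs(ell)
    if l > p:
        return 0.0 + 0.0j
    val = 2.0 * s_nu2 * (np.sum(b[: p - l] * np.conj(b[l:])) - np.conj(b[l - 1]))
    return val if ell > 0 else np.conj(val)

def build_Rw(b, s_eta2, s_nu2, d):
    """d x d matrix R_w(b) of (8), with entry (i, j) = r_w(i - j)."""
    return np.array([[r_w(b, s_eta2, s_nu2, i - j) for j in range(d)] for i in range(d)])
```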
In moving through a sequence of data, the data can be advanced by a skip $s$ to form a sequence of vectors $\mathbf{y}(m), \mathbf{y}(m+s), \mathbf{y}(m+2s), \ldots, \mathbf{y}(m+s(k-1))$ and matrices $Y(m), Y(m+s), Y(m+2s), \ldots, Y(m+s(k-1))$, which we write for convenience as $\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{k-1}$ and $Y_0, Y_1, \ldots, Y_{k-1}$, respectively. The index $m$ is chosen to make it possible to use only causal data, $m \geq d + p - 1$.
Assuming independence (such as when $s \geq d$) and assuming that the correlation matrix $R_w(\mathbf{b})$ is known, the likelihood function can be written as a circular complex Gaussian:
$$f(\mathbf{y}_0, \ldots, \mathbf{y}_{k-1} \mid \mathbf{b}, R_w(\mathbf{b})) \propto \frac{1}{|\det(R_w(\mathbf{b}))|^{k}} \exp\Big[ -\sum_{i=0}^{k-1} (\mathbf{y}_i - Y_i \mathbf{b})^H R_w(\mathbf{b})^{-1} (\mathbf{y}_i - Y_i \mathbf{b}) \Big]. \tag{9}$$
Finding the true maximum likelihood solution by directly maximizing (9) with respect to $\mathbf{b}$ is difficult, since $\mathbf{b}$ enters nonlinearly in $R_w(\mathbf{b})$. Instead, an iterative approach is employed. An estimated value $R_w(\mathbf{b})_i$ is used at the $i$th step. Assuming still that each $R_w(\mathbf{b})_i$ is known, a maximum likelihood solution may be obtained from
$$\hat{\mathbf{b}}_k = \arg\min_{\mathbf{b}} \sum_{i=1}^{k} \lambda^{k-i} \left\| \mathbf{y}_i - Y_i \mathbf{b} \right\|^2_{R_w(\mathbf{b})_i^{-1}}. \tag{10}$$
Here, a forgetting factor $\lambda^{k-i}$, with $\lambda < 1$, has been introduced to allow for tracking a time-varying $\mathbf{b}$. Significantly, the norm here is weighted by the inverse covariance $R_w(\mathbf{b})_i^{-1}$. If we take $d = 1$, we obtain the regular least-squares problem, and the additional correlation structure does not appear. Taking the gradient of the cost functional (10) results in the normal equation
$$\Big( \sum_{i=1}^{k} \lambda^{k-i}\, Y_i^H R_w(\mathbf{b})_i^{-1} Y_i \Big) \hat{\mathbf{b}}_k = \sum_{i=1}^{k} \lambda^{k-i}\, Y_i^H R_w(\mathbf{b})_i^{-1}\, \mathbf{y}_i. \tag{11}$$
Write this as $\Phi_k^{-1} \hat{\mathbf{b}}_k = \boldsymbol{\phi}_k$. This can be recursively updated using RLS recursions (see, e.g., References ([111], Section 8.7), [112]), starting from an initial $\Phi_{-1} = \frac{1}{\delta} I$ for a small scalar $\delta$ and propagating $\Phi_k$.
Computing (11) requires knowledge of $R_w(\mathbf{b})_i$, which is not available, since $\mathbf{b}$ is to be found. In combination with a recursively updated solution to (11), $R_w(\mathbf{b})_i$ is estimated using previously computed values of $\hat{\mathbf{b}}$ and used as a time-varying covariance matrix. That is, we write $R_w(\mathbf{b})_i = R_w(\hat{\mathbf{b}}_{i-1})_i$. The result is the framework shown in Algorithm 1, similar to a vector RLS adaptive filter.
Algorithm 1: AR parameter estimation with noisy data.
Input: $\mathbf{y}_k$, $Y_k$, $R_w(\hat{\mathbf{b}}_{k-1})_k$
Previous conditions: $\Phi_{k-1}$, $\hat{\mathbf{b}}_{k-1}$
Compute
$$
\begin{aligned}
K_k &= \lambda^{-1} \Phi_{k-1} Y_k^H \big( \lambda^{-1} Y_k \Phi_{k-1} Y_k^H + R_w(\hat{\mathbf{b}}_{k-1})_k \big)^{-1} \\
\hat{\mathbf{b}}_k &= \hat{\mathbf{b}}_{k-1} + K_k \big( \mathbf{y}_k - Y_k \hat{\mathbf{b}}_{k-1} \big) \\
\Phi_k &= \lambda^{-1} \big( I - K_k Y_k \big) \Phi_{k-1}.
\end{aligned}
$$
Update the covariance to obtain $R_w(\hat{\mathbf{b}}_k)_{k+1}$ and compute $R_w(\hat{\mathbf{b}}_k)_{k+1}^{-1}$.
Return $\hat{\mathbf{b}}_k$, $\Phi_k$, $R_w(\hat{\mathbf{b}}_k)_{k+1}$.
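A sketch of one pass of the recursions in Algorithm 1, written as a covariance-weighted vector RLS update (`icware_step` is a hypothetical name; shapes are noted in the comments):

```python
import numpy as np

def icware_step(bhat, Phi, yk, Yk, Rw, lam=1.0):
    """One update of Algorithm 1; yk is length d, Yk is d x p, Phi is p x p."""
    G = Phi @ Yk.conj().T / lam                     # lambda^{-1} Phi_{k-1} Y_k^H
    K = G @ np.linalg.inv(Yk @ G + Rw)              # gain K_k
    bhat = bhat + K @ (yk - Yk @ bhat)              # coefficient update
    Phi = (np.eye(len(bhat)) - K @ Yk) @ Phi / lam  # propagate Phi_k
    return bhat, Phi
```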
The “Update the covariance” step, detailed below, is how the structure of $R_w(\mathbf{b})$ can be used to improve the parameter estimates.
We have explored several different approaches to computing and using $R_w(\hat{\mathbf{b}})$:
  • Ignore $R_w(\mathbf{b})$: Neglect the correlation structure and simply assume that $R_w(\hat{\mathbf{b}}_k)_i = I$. This gives the equivalent of taking a scalar measurement and is used as a sort of worst-case basis for comparison among the different algorithms.
  • Use the correct value of $R_w(\mathbf{b})$: That is, assume that $\mathbf{b}$, $\sigma_\nu^2$, and $\sigma_\eta^2$ are known and compute $R_w(\mathbf{b})_k$ according to (6). This provides a limit on best-case performance against which other methods can be compared.
  • Use the estimate of $\mathbf{b}$: Using the correct values of $\sigma_\nu^2$ and $\sigma_\eta^2$, compute the autocorrelation matrix using $\hat{\mathbf{b}}_k$ in (6).
  • Estimate $\hat{\mathbf{b}}$, fix $\sigma_\nu^2$ and $\sigma_\eta^2$: With assumed values of $\sigma_\nu^2$ and $\sigma_\eta^2$, compute the autocorrelation matrix using $\hat{\mathbf{b}}_k$ in (6).
  • Estimate everything: Estimate the values of $\sigma_\nu^2$ and $\sigma_\eta^2$, then use them with $\hat{\mathbf{b}}_k$ in (6).
In the early stages, while $\hat{\mathbf{b}}_k$ is poorly converged, it is best to use option 1 (assuming an identity covariance matrix) until $\hat{\mathbf{b}}$ has settled near its final value, then to switch to option 4 using the estimated $\hat{\mathbf{b}}_k$ until the moments in $\Sigma(\mathbf{b})$ have converged sufficiently well that reliable estimates of the variances can be obtained and option 5 can be used. As described in Section 3, the information necessary to estimate the variances can be accumulated without yet having a decent estimate of $\hat{\mathbf{b}}_k$.
The covariance update/inverse does not necessarily need to be done at every step. Particularly as $\hat{\mathbf{b}}_k$ settles towards its final value, there is little to be gained by updating $R_w(\hat{\mathbf{b}})$ at every step.
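Putting the pieces together, a minimal end-to-end sketch using the hypothetical helpers from the earlier sketches (`noisy_ar`, `build_Rw`, `icware_step`), with Case 1-style parameters from Section 5; the switch-over iteration and refresh period are illustrative choices, not settings prescribed here:

```python
import numpy as np

p, d, s = 3, 7, 3
poles = 0.95 * np.exp(1j * np.array([0.65, 0.70, 0.75]))
b_true = -np.poly(poles)[1:]               # AR coefficients from the pole locations
y = noisy_ar(b_true, sigma_eta=1.0, sigma_nu=1.0, n=4000)

bhat = np.zeros(p, dtype=complex)
Phi = 1e3 * np.eye(p, dtype=complex)       # Phi_{-1} = (1/delta) I for small delta
Rw = np.eye(d, dtype=complex)              # option 1: ignore R_w(b) while bhat settles
m, k = d + p - 1, 0
while m < len(y):
    yk = y[m : m - d : -1]                 # [y(m), ..., y(m-d+1)]
    Yk = np.array([y[m - 1 - r::-1][:p] for r in range(d)])
    bhat, Phi = icware_step(bhat, Phi, yk, Yk, Rw)
    k += 1
    if k >= 30 and k % 100 == 0:           # occasional covariance refresh (see text)
        Rw = build_Rw(bhat, 1.0, 1.0, d)   # estimated bhat with the correct variances
    m += s
print(np.linalg.norm(b_true - bhat) ** 2)  # squared coefficient error
```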

3. Estimating the Variances

As the results below will indicate, fixed values of $\sigma_\nu^2$ and $\sigma_\eta^2$ can be used in the estimate of $R_w(\mathbf{b})$ instead of estimated values, so for the purposes of estimating $\mathbf{b}$, it is not strictly necessary to estimate these variances. But for other purposes it may be necessary to have an estimate of $\sigma_\nu^2$ and $\sigma_\eta^2$. A maximum likelihood estimation approach is thus described in this section.
For a given estimate of the coefficients $\hat{\mathbf{b}}$, write $R_w(\hat{\mathbf{b}})$ as
$$R_w(\hat{\mathbf{b}}) = 2\sigma_\eta^2 I + 2\sigma_\nu^2 B(\hat{\mathbf{b}}).$$
For a sequence of $k$ vectors $\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{k-1}$, assumed to be nonoverlapping, encompassing $N$ samples, the joint log likelihood function is a complex (circularly symmetric) Gaussian:
$$\log f(\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{k-1} \mid \mathbf{b}, R_w(\mathbf{b}), \sigma_\nu^2, \sigma_\eta^2) = -N\log\pi - k\log\det R_w(\mathbf{b}) - \sum_{i=0}^{k-1} (\mathbf{y}_i - Y_i\mathbf{b})^H R_w(\mathbf{b})^{-1} (\mathbf{y}_i - Y_i\mathbf{b}). \tag{12}$$
Let
$$\Sigma(\mathbf{b}) = \frac{1}{k}\sum_{i=0}^{k-1} (\mathbf{y}_i - Y_i\mathbf{b})(\mathbf{y}_i - Y_i\mathbf{b})^H \tag{13}$$
denote the sample covariance. It can be shown that (see Appendix A)
$$\frac{\partial}{\partial\sigma_\eta^2}\log f(\mathbf{y}_0, \ldots, \mathbf{y}_{k-1} \mid \mathbf{b}, \sigma_\nu^2, \sigma_\eta^2) = -2k\,\mathrm{tr}\big(R_w(\mathbf{b})^{-1}\big) + 2k\,\mathrm{tr}\big(R_w(\mathbf{b})^{-1}\,\Sigma(\mathbf{b})\,R_w(\mathbf{b})^{-1}\big) \tag{14}$$
and that
$$\frac{\partial}{\partial\sigma_\nu^2}\log f(\mathbf{y}_0, \ldots, \mathbf{y}_{k-1} \mid \mathbf{b}, \sigma_\nu^2, \sigma_\eta^2) = -2k\sum_{i,j} \big(R_w(\mathbf{b})^{-1}\big)_{i,j} \big(B(\mathbf{b})\big)_{i,j} + 2k\sum_{i,j} \big(R_w(\mathbf{b})^{-1}\,\Sigma(\mathbf{b})\,R_w(\mathbf{b})^{-1}\big)_{i,j} \big(B(\mathbf{b})\big)_{i,j}. \tag{15}$$
The gradients (14) and (15) are equal to zero when
$$R_w(\mathbf{b}) = \Sigma(\mathbf{b}). \tag{16}$$
The variance estimates are chosen to satisfy the condition (16) using an estimate of $\mathbf{b}$ from previous estimates, that is,
$$2\sigma_\eta^2 I + 2\sigma_\nu^2 B(\hat{\mathbf{b}}) = \Sigma(\hat{\mathbf{b}}). \tag{17}$$
We define the offset trace as
$$\mathrm{tr}(B, \ell) = \sum_i B_{i,\,i+\ell},$$
where the usual trace is obtained when $\ell = 0$, and for $\ell > 0$ the sum is taken on the $\ell$th superdiagonal. Let $\beta_\ell = \mathrm{tr}(B(\hat{\mathbf{b}}), \ell)$ for $\ell = 0, 1, \ldots, d-1$. Taking the offset trace of (17) gives
$$
\begin{aligned}
2d\,\sigma_\eta^2 + 2\sigma_\nu^2\,\beta_0 &= \mathrm{tr}(\Sigma(\hat{\mathbf{b}}), 0) \\
2\sigma_\nu^2\,\beta_\ell &= \mathrm{tr}(\Sigma(\hat{\mathbf{b}}), \ell), \qquad \ell = 1, 2, \ldots, d-1.
\end{aligned}
$$
The ML estimates of the variances are the solutions to
$$
\begin{bmatrix}
2d & 2\beta_0 \\
0 & 2\beta_1 \\
\vdots & \vdots \\
0 & 2\beta_{d-1}
\end{bmatrix}
\begin{bmatrix} \hat{\sigma}_\eta^2 \\ \hat{\sigma}_\nu^2 \end{bmatrix}
=
\begin{bmatrix}
\mathrm{tr}(\Sigma(\hat{\mathbf{b}}), 0) \\
\mathrm{tr}(\Sigma(\hat{\mathbf{b}}), 1) \\
\vdots \\
\mathrm{tr}(\Sigma(\hat{\mathbf{b}}), d-1)
\end{bmatrix}. \tag{18}
$$
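A sketch of solving (18) in the least-squares sense, reusing `build_Rw` from the earlier sketch; calling it with $\sigma_\eta^2 = 0$ and $\sigma_\nu^2 = 1/2$ returns exactly $B(\hat{\mathbf{b}})$, since $R_w = 2\sigma_\eta^2 I + 2\sigma_\nu^2 B$:

```python
import numpy as np

def estimate_variances(Sigma, bhat, d):
    """Least-squares solution of (18) for (sigma_eta^2, sigma_nu^2)."""
    B = build_Rw(bhat, 0.0, 0.5, d)                           # equals B(bhat), see text
    beta = np.array([np.trace(B, offset=l) for l in range(d)])
    A = np.zeros((d, 2), dtype=complex)
    A[0, 0] = 2.0 * d
    A[:, 1] = 2.0 * beta
    rhs = np.array([np.trace(Sigma, offset=l) for l in range(d)])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return sol.real         # unconstrained; may be negative (see the discussion below)
```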
A significant number of terms of $\Sigma(\hat{\mathbf{b}})$ must be accumulated in order to estimate the variances well. Initially, before reasonably accurate estimates of $\mathbf{b}$ are available, using inaccurate estimates $\hat{\mathbf{b}}$ can result in a highly inaccurate $\Sigma(\hat{\mathbf{b}})$. As a result, it is desirable to accumulate moments of $\mathbf{y}_i$ and $Y_i$ without using $\hat{\mathbf{b}}$. The offset trace of the sample covariance is
$$\mathrm{tr}(\Sigma(\mathbf{b}), \ell) = \frac{1}{k}\sum_{i=1}^{k} \Big[ \mathrm{tr}(\mathbf{y}_i\mathbf{y}_i^H, \ell) - \mathrm{tr}(\mathbf{y}_i\mathbf{b}^H Y_i^H, \ell) - \mathrm{tr}(Y_i\mathbf{b}\,\mathbf{y}_i^H, \ell) + \mathrm{tr}(Y_i\mathbf{b}\mathbf{b}^H Y_i^H, \ell) \Big].$$
The different terms can be written as
$$\sum_i \mathrm{tr}(\mathbf{y}_i\mathbf{b}^H Y_i^H, \ell) = \sum_j \sum_k \sum_i y_{i,j}\, b_k^*\, Y_{i,j+\ell,k}^* = \sum_k b_k^* \sum_i \big[ Y_{i,\,\ell+1:\mathrm{end},\,k} \big]^H\, \mathbf{y}_{i,\,1:\mathrm{end}-\ell}$$
$$\sum_i \mathrm{tr}(Y_i\mathbf{b}\,\mathbf{y}_i^H, \ell) = \sum_i \sum_j \sum_k Y_{i,k,j}\, b_j\, y_{i,k+\ell}^* = \sum_j b_j \sum_i \big[ \mathbf{y}_{i,\,1+\ell:\mathrm{end}} \big]^H\, Y_{i,\,1:\mathrm{end}-\ell,\,j}.$$
(It is noted that, except when $\ell = 0$, these two terms are not conjugates of each other, so both of these terms must be retained.)
$$\sum_i \mathrm{tr}(Y_i\mathbf{b}\mathbf{b}^H Y_i^H, \ell) = \sum_j \sum_l \sum_k \sum_i Y_{i,k,j}\, (\mathbf{b}\mathbf{b}^H)_{j,l}\, [Y_i^H]_{l,k+\ell} = \sum_j \sum_l (\mathbf{b}\mathbf{b}^H)_{j,l} \Big[ \sum_i \big[ Y_{i,\,\ell+1:\mathrm{end},\,:} \big]^H\, Y_{i,\,1:\mathrm{end}-\ell,\,:} \Big]_{l,j}$$
These terms can be computed by recursively propagating the following quantities, for $k = 1, 2, \ldots$ and $\ell = 0, 1, \ldots, d-1$:
$$
\begin{aligned}
a_0(\ell, k) &= \frac{1}{k}\sum_{i=0}^{k-1} \mathbf{y}_{i,\,1+\ell:\mathrm{end}}^H\, \mathbf{y}_{i,\,1:\mathrm{end}-\ell} = \frac{1}{k}\Big( \mathbf{y}_{k,\,1+\ell:\mathrm{end}}^H\, \mathbf{y}_{k,\,1:\mathrm{end}-\ell} + (k-1)\,a_0(\ell, k-1) \Big) \\
a_1(\ell, k) &= \frac{1}{k}\sum_{i=0}^{k-1} \big[ Y_{i,\,\ell+1:\mathrm{end},\,:} \big]^H\, \mathbf{y}_{i,\,1:\mathrm{end}-\ell} = \frac{1}{k}\Big( \big[ Y_{k,\,\ell+1:\mathrm{end},\,:} \big]^H\, \mathbf{y}_{k,\,1:\mathrm{end}-\ell} + (k-1)\,a_1(\ell, k-1) \Big) \\
a_2(\ell, k) &= \frac{1}{k}\sum_{i=0}^{k-1} \Big( \mathbf{y}_{i,\,\ell+1:\mathrm{end}}^H\, Y_{i,\,1:\mathrm{end}-\ell,\,:} \Big)^T = \frac{1}{k}\Big( \big( \mathbf{y}_{k,\,\ell+1:\mathrm{end}}^H\, Y_{k,\,1:\mathrm{end}-\ell,\,:} \big)^T + (k-1)\,a_2(\ell, k-1) \Big) \\
A(\ell, k) &= \frac{1}{k}\sum_{i=0}^{k-1} Y_{i,\,\ell+1:\mathrm{end},\,:}^H\, Y_{i,\,1:\mathrm{end}-\ell,\,:} = \frac{1}{k}\Big( Y_{k,\,\ell+1:\mathrm{end},\,:}^H\, Y_{k,\,1:\mathrm{end}-\ell,\,:} + (k-1)\,A(\ell, k-1) \Big).
\end{aligned}
$$
Then
$$\mathrm{tr}(\Sigma(\hat{\mathbf{b}}), \ell) = a_0(\ell, k) - \hat{\mathbf{b}}^H a_1(\ell, k) - \hat{\mathbf{b}}^T a_2(\ell, k) + \mathrm{tr}\big( (\hat{\mathbf{b}}\hat{\mathbf{b}}^H)\, A(\ell, k) \big).$$
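A compact sketch of this recursive accumulation (the class name is hypothetical): the moments are folded in block by block, and $\mathrm{tr}(\Sigma(\hat{\mathbf{b}}), \ell)$ can then be evaluated for any later $\hat{\mathbf{b}}$ without revisiting the raw data:

```python
import numpy as np

class MomentAccumulator:
    """Running averages a0, a1, a2, A of Section 3, accumulated without any bhat."""
    def __init__(self, d, p):
        self.k = 0
        self.a0 = np.zeros(d, dtype=complex)
        self.a1 = np.zeros((d, p), dtype=complex)
        self.a2 = np.zeros((d, p), dtype=complex)
        self.A = np.zeros((d, p, p), dtype=complex)

    def update(self, yk, Yk):
        """Fold in one stacked block (yk of length d, Yk of shape d x p)."""
        self.k += 1
        d = len(yk)
        for l in range(d):
            n = d - l
            new = (np.vdot(yk[l:], yk[:n]),         # y^H y with offset l
                   Yk[l:, :].conj().T @ yk[:n],     # columns of Y against y
                   Yk[:n, :].T @ yk[l:].conj(),     # y^H against columns of Y
                   Yk[l:, :].conj().T @ Yk[:n, :])  # Y^H Y with offset l
            for attr, val in zip(("a0", "a1", "a2", "A"), new):
                arr = getattr(self, attr)
                arr[l] = (val + (self.k - 1) * arr[l]) / self.k

    def trace_sigma(self, bhat, l):
        """tr(Sigma(bhat), l) from the accumulated moments."""
        bbH = np.outer(bhat, bhat.conj())
        return (self.a0[l] - bhat.conj() @ self.a1[l]
                - bhat @ self.a2[l] + np.trace(bbH @ self.A[l]))
```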
Figure 2 shows the result of this estimation in an example with $p = 3$ and $d = 5$. The mean and standard deviation over 50 independent runs are shown, for up to 2000 samples averaged to produce $\Sigma(\mathbf{b})$. The true value of $\mathbf{b}$ is used. The least-squares solution to (18) can produce variance estimates for $\sigma_\eta^2$ which are negative, which is nonphysical. Solutions with the variances constrained to the range $[0.1, 5]$ were also computed using CVX [113]. From these results, about 500 samples are needed before the variance estimate converges to a value somewhat close to the true value.

4. Vector Autoregressive Formulation

In this section we extend the results of the previous section to vector autoregressive random processes in noise, where at each time a vector of length $K$ is produced, and the coefficients are $K \times K$ matrices. In the interest of generality, a constant offset $\mathbf{v}$ is also included. The noisy observations can be written as
$$\mathbf{y}(m) = \boldsymbol{\xi}(m) + \boldsymbol{\nu}(m) = \mathbf{v} + \sum_{i=1}^{p} B_i \boldsymbol{\xi}(m-i) + \boldsymbol{\eta}(m) + \boldsymbol{\nu}(m).$$
As in the scalar case, the noisy observations can be written as
$$\mathbf{y}(m) = \mathbf{v} + \sum_{i=1}^{p} B_i \mathbf{y}(m-i) + \mathbf{w}(m),$$
where
$$\mathbf{w}(m) = \boldsymbol{\eta}(m) + \boldsymbol{\nu}(m) - \sum_{i=1}^{p} B_i \boldsymbol{\nu}(m-i).$$
Following the usual practice [15], this is vectorized to obtain a vector of unknown parameters as follows:
$$\mathbf{y}(m) = \begin{bmatrix} \mathbf{v} & B_1 & \cdots & B_p \end{bmatrix} \begin{bmatrix} 1 \\ \mathbf{y}(m-1) \\ \vdots \\ \mathbf{y}(m-p) \end{bmatrix} + \mathbf{w}(m).$$
Applying the vec operator ([114], Chapter 9), we obtain
$$\mathbf{y}(m) = \Big( \begin{bmatrix} 1 & \mathbf{y}(m-1)^T & \cdots & \mathbf{y}(m-p)^T \end{bmatrix} \otimes I_K \Big)\, \mathrm{vec}\big( \begin{bmatrix} \mathbf{v} & B_1 & \cdots & B_p \end{bmatrix} \big) + \mathbf{w}(m).$$
Let
$$\mathbf{b} = \mathrm{vec}\big( \begin{bmatrix} \mathbf{v} & B_1 & \cdots & B_p \end{bmatrix} \big) \in \mathbb{C}^{(K^2 p + K) \times 1}$$
and
$$Y(m) = \Big( \begin{bmatrix} 1 & \mathbf{y}(m-1)^T & \cdots & \mathbf{y}(m-p)^T \end{bmatrix} \otimes I_K \Big) \in \mathbb{C}^{K \times (K^2 p + K)}.$$
Then
$$\mathbf{y}(m) = Y(m)\,\mathbf{b} + \mathbf{w}(m).$$
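As a quick numerical check of the vectorization identity behind $Y(m)$, the following sketch verifies $M\mathbf{z} = (\mathbf{z}^T \otimes I_K)\,\mathrm{vec}(M)$ (real-valued stand-ins for brevity; vec is column-major flattening):

```python
import numpy as np

rng = np.random.default_rng(0)
K, p = 2, 2
M = rng.standard_normal((K, K * p + 1))  # [v, B_1, ..., B_p] as one K x (Kp+1) block
z = rng.standard_normal(K * p + 1)       # stand-in for [1, y(m-1)^T, ..., y(m-p)^T]^T
lhs = M @ z
rhs = np.kron(z, np.eye(K)) @ M.flatten(order="F")  # (z^T kron I_K) vec(M)
assert np.allclose(lhs, rhs)
```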
The noise structure for a single vector can be written as
$$\mathbf{w}(m) = \begin{bmatrix} I & I & -B_1 & -B_2 & \cdots & -B_p \end{bmatrix} \begin{bmatrix} \boldsymbol{\eta}(m) \\ \boldsymbol{\nu}(m) \\ \boldsymbol{\nu}(m-1) \\ \boldsymbol{\nu}(m-2) \\ \vdots \\ \boldsymbol{\nu}(m-p) \end{bmatrix}.$$
Then
$$E[\mathbf{w}(m)\mathbf{w}^H(m)] = 2\Big( \sigma_\eta^2 I + \sigma_\nu^2 \Big( I + \sum_{i=1}^{p} B_i B_i^H \Big) \Big) = R_w(0).$$
We also find
$$E[\mathbf{w}(m)\mathbf{w}^H(m+\ell)] = 2\sigma_\nu^2 \Big( -B_\ell^H + \sum_{i=1}^{p} B_i B_{i+\ell}^H \Big) = R_w(\ell),$$
where, as before, $B_k = 0$ for $k > p$.
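A sketch of the VAR noise-correlation blocks $R_w(0)$ and $R_w(\ell)$, with $B_k = 0$ for $k > p$ assumed as in the scalar case; the stacked matrix $R(\mathbf{b})$ below is then assembled from these blocks in block-Toeplitz fashion:

```python
import numpy as np

def var_Rw_block(Bs, s_eta2, s_nu2, ell):
    """K x K block R_w(ell); Bs is the list [B_1, ..., B_p] of K x K matrices."""
    K = Bs[0].shape[0]
    p = len(Bs)
    if ell == 0:
        return 2.0 * (s_eta2 * np.eye(K)
                      + s_nu2 * (np.eye(K) + sum(B @ B.conj().T for B in Bs)))
    l = abs(ell)
    M = -Bs[l - 1].conj().T if l <= p else np.zeros((K, K), dtype=complex)
    M = M + sum(Bs[i] @ Bs[i + l].conj().T for i in range(max(p - l, 0)))
    R = 2.0 * s_nu2 * M
    return R if ell > 0 else R.conj().T   # R_w(-ell) = R_w(ell)^H
```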
As in the scalar case, stack up multiple observations so that the correlation structure may be exploited:
$$\begin{bmatrix} \mathbf{y}(m) \\ \mathbf{y}(m-1) \\ \vdots \\ \mathbf{y}(m-d+1) \end{bmatrix} = \begin{bmatrix} Y(m) \\ Y(m-1) \\ \vdots \\ Y(m-d+1) \end{bmatrix} \mathbf{b} + \begin{bmatrix} \mathbf{w}(m) \\ \mathbf{w}(m-1) \\ \vdots \\ \mathbf{w}(m-d+1) \end{bmatrix}.$$
Write this as
$$\mathbf{y}(m) = Y(m)\,\mathbf{b} + \mathbf{w}(m).$$
The correlation structure of the stacked noise vector is
$$E[\mathbf{w}(m)\mathbf{w}(m)^H] = \begin{bmatrix}
R_w(0) & R_w(-1) & \cdots & R_w(-d+1) \\
R_w(1) & R_w(0) & \cdots & R_w(-d+2) \\
\vdots & \vdots & \ddots & \vdots \\
R_w(d-1) & R_w(d-2) & \cdots & R_w(0)
\end{bmatrix} = R(\mathbf{b}).$$
Let $\mathbf{y}(m), \mathbf{y}(m+s), \ldots, \mathbf{y}(m+sk)$ and $Y(m), Y(m+s), \ldots, Y(m+sk)$ be a sequence of vectors and matrices, where the vector samples are skipped by $s$ samples at each step, and for convenience denote these as $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_k$ and $Y_1, Y_2, \ldots, Y_k$. Following the method of Section 2, the estimate
$$\hat{\mathbf{b}}_k = \arg\min_{\mathbf{b}} \sum_{i=1}^{k} \lambda^{k-i} \left\| \mathbf{y}_i - Y_i \mathbf{b} \right\|^2_{R(\mathbf{b})^{-1}}$$
is determined by the solution to the normal equations
$$\Big( \sum_{i=1}^{k} \lambda^{k-i}\, Y_i^H R(\hat{\mathbf{b}})^{-1} Y_i \Big) \mathbf{b} = \sum_{i=1}^{k} \lambda^{k-i}\, Y_i^H R(\hat{\mathbf{b}})^{-1}\, \mathbf{y}_i.$$
With the appropriate use of vectorized data, Algorithm 1 can be used for the VAR as well.

5. Some Results

Several test cases were examined to determine the performance of the ICWARE approach; these are summarized in Table 1, where the pole locations are given in polar form $\rho e^{j\theta}$.
The first example, designated Case 1, is a system with $p = 3$ having poles at $\rho e^{j\theta_1}, \rho e^{j\theta_2}, \rho e^{j\theta_3}$, with $\rho = 0.95$ and $(\theta_1, \theta_2, \theta_3) = (0.65, 0.7, 0.75)$, resulting in a fairly narrowband complex AR signal. The noise variances are $\sigma_\eta^2 = 1$ and $\sigma_\nu^2 = 1$. This case is used to explore some of the variations in performance as parameters of the algorithm are varied. Figure 3 shows $\|\mathbf{b} - \hat{\mathbf{b}}\|_2^2$ as a function of iteration using various forms of $R_w(\mathbf{b})$ and values of $d$ (the stack height) from 1 to 7. The “skip” is set to $s = 3$. The results are obtained by averaging the results of 50 independent runs. In these plots, “Iteration number” refers to the number of blocks of data used in an online scheme, each iteration of the algorithm corresponding to one data block. The results are compared with $\|\mathbf{b} - \hat{\mathbf{b}}\|_2^2$ for YW with noisy measurements and noise-free measurements and with scalar AR estimation. The autocorrelation values for the YW method are computed using 30,000 points. The Burg method was also used, but with this many points there is little difference between Burg and conventional YW. The scalar AR estimation (black dotted line) converges to the YW performance, as does the ICWARE with $d = 1$, as expected.
In Figure 3, three different ways of determining $R_w(\mathbf{b})$ are used. The solid lines (subplot (a)) use the true $R_w(\mathbf{b})$. This is, of course, unavailable in practice, but these plots serve as a basis for comparison against real algorithms. The dashed lines (subplot (b)) use an identity matrix for $R_w(\mathbf{b})$ for the first 30 iterations (to establish an estimate of $\mathbf{b}$), following which $R_w(\hat{\mathbf{b}})$ is computed using the estimated $\mathbf{b}$ and the correct variances $\sigma_\nu^2$ and $\sigma_\eta^2$. The dotted lines (subplot (c)) are similar, except that fixed (but wrong) values of $\sigma_\nu^2 = 5$ and $\sigma_\eta^2 = 5$ are used in the computation of $R_w(\hat{\mathbf{b}})$. It is clear that as $d$ increases, the performance significantly improves, especially for the first few values of $d$. Interestingly, for $d = 7$, the error performance is on the order of the error of the noise-free YW.
The improvements of the new method after 1000 iterations (denoted by $\|\mathbf{b} - \hat{\mathbf{b}}_k^d\|_2^2$) compared with the YW result $\|\mathbf{b} - \hat{\mathbf{b}}_{YW}\|_2^2$ are tabulated in Table 2 for $s = 2$, 3, and 4. There is little difference between $s = 2$ and $s = 3$, but when $s = 4$, that is, $s > p$, the performance declines.
Figure 4 shows the influence of the skip $s$ on the performance, with the same methods of estimating $R_w(\mathbf{b})$ as shown in Figure 3. For a fixed value of $d = 7$, the error curves for different values of $s$ are shown. Larger $s$, for $s \leq p$, does improve the performance, but only slightly.
(a) Solid: true $R_w(\mathbf{b})$; (b) dashed: $R_w(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and correct variances; (c) dotted: $R_w(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and fixed incorrect variances.
Figure 5 shows estimates of the variances $\sigma_\nu^2$ and $\sigma_\eta^2$ for different values of $d$. Obviously we need $d \geq 2$ (in order to have two equations). However, when $d = 2$ there is a very large variance. By $d = 4$, the variance estimates are very close to the true variances.
Figure 6 shows the magnitude frequency response for the true spectrum, the YW spectrum, and the spectrum computed from $\hat{\mathbf{b}}$ obtained using $d = 7$ and $d = 3$ after convergence. (The spectrum is not an average, but only a single outcome.) The ICWARE estimated spectrum is very close to the true spectrum, while the YW spectrum has considerable bias, exhibiting strong peaks not present in the true spectrum.
As a point of comparison, TLS is used to estimate the AR parameters. In TLS, the equation
$$\begin{bmatrix}
y(m) & y(m-1) & \cdots & y(m-p+1) \\
y(m+1) & y(m) & \cdots & y(m-p+2) \\
\vdots & & & \vdots \\
y(m+M) & y(m+M-1) & \cdots & y(m+M-p+1)
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}
=
\begin{bmatrix} y(m+1) \\ y(m+2) \\ \vdots \\ y(m+M+1) \end{bmatrix}
+
\begin{bmatrix} \nu(m+1) \\ \nu(m+2) \\ \vdots \\ \nu(m+M+1) \end{bmatrix}$$
(for some $m$ and $M$) is perturbed in both the matrix on the left and the vector of observations on the right. TLS can be computed using the singular value decomposition (SVD), making it computationally complex. It also makes it more difficult to create algorithms which track changing parameters, and, as shown below, it gives parameter estimates inferior to the method presented here. Figure 7 shows the error for $\hat{\mathbf{b}}$ computed using TLS. The parameter $M$ is the height of the linear system that is solved using the TLS method described in Reference [114]. The values $M \in \{20, 60, 100, 140\}$ are examined. At each iteration, a different set of points is used in the TLS equations, each of which is solved using a (relatively complicated) singular value decomposition. Even with 20 rows in the TLS equations, performance only roughly comparable to the YW equations is obtained. For 60 or more equations, the improvement in performance quickly saturates, so that 60 and 140 perform rather comparably. The computational complexity is rather high: for each solution (at each iteration), the SVD of an $M \times 3$ matrix is computed. Despite the computational complexity of TLS, the ICWARE method performs better than TLS for $d \geq 4$. (The method of estimating $R_w(\mathbf{b})$ is the same as that used for Figure 3b.)
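For orientation, a sketch of the TLS solve via the SVD (smallest right singular vector of the augmented data matrix); `tls_ar` is a hypothetical name and $m \geq p$ is assumed so all rows are causal:

```python
import numpy as np

def tls_ar(y, p, m, M):
    """TLS AR estimate from rows m..m+M of the data equation above (assumes m >= p)."""
    A = np.array([y[i : i - p : -1] for i in range(m, m + M + 1)])  # [y(i),...,y(i-p+1)]
    rhs = y[m + 1 : m + M + 2]
    C = np.hstack([A, rhs[:, None]])
    _, _, Vh = np.linalg.svd(C)
    v = Vh[-1].conj()        # right singular vector of the smallest singular value
    return -v[:p] / v[p]     # scale so the trailing entry is -1: [b; -1]
```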
The results presented to this point use 1000 blocks of data. A consideration is whether it is possible to iteratively re-use data, so that less data is required while still obtaining the reduced error of the ICWARE algorithm. In Figure 8, $k = 100$ blocks of data are employed, and the processing is repeated 10 times on each block. Figure 9 shows the error as a result of each iteration. Each line in the plot corresponds to one pass through the data. The top plot uses $R_w(\hat{\mathbf{b}})$ estimated using $\hat{\mathbf{b}}$ and the true variances; the bottom plot uses $R_w(\hat{\mathbf{b}})$ estimated using $\hat{\mathbf{b}}$ and fixed variances. These figures show that iterating over the data provides essentially the same performance as longer data.
The next example, designated Case 2, is also a third-order system with poles at the same angles as Case 1, but with $\rho = 0.9$. The following figures show examples of the same sort of results shown for Case 1. In Figure 10 the various methods are compared. In this case, however, the minimum error does not reach as low as the noise-free YW case. Figure 11 shows the resulting spectrum, again showing that the ICWARE spectrum is closer than the YW spectrum.
Case 3 involves a system having poles at $\rho = 1$ (that is, pure sinusoidal signals) and angles $(\theta_1, \theta_2, \theta_3) = (0.65, 0.655, 0.66)$. The estimated spectra are shown in Figure 12. The spectral peaks are clearly evident in the ICWARE estimate.
As another comparison, ICWARE spectral estimates are compared to the spectra from dual Kalman filters using the examples from Reference [104], shown as Cases 4, 5, and 6 in Table 1. Figure 13 shows the results. Spectra for twenty realizations are plotted, along with the true spectrum and the mean. The values $d = p + 3$ and $s = 3$ were selected, since the simulations for Case 1 (above) suggest these are reasonable values. Comparison of these results with Figures 3, 4, and 6 of Reference [104] shows fairly comparable performance. When the poles are not near the unit circle (Case 4), the ICWARE estimates tend to have smoother spectra (estimated poles with smaller magnitude). When the poles are nearer the unit circle (Cases 5 and 6), ICWARE does a comparable job of capturing the peaks, and does somewhat better at representing the high-frequency dropoff of the spectrum. In all of these simulations, an estimated $R_w(\mathbf{b})$ was employed.

6. Summary and Conclusions

In this paper, we have shown that accounting for observation noise added to a pure AR($p$) process results in noise which is correlated across lags. This makes it expedient to employ stacked observation vectors and to estimate the parameters in a vector sense. The result is an algorithm that is essentially a vector RLS adaptive filter. Several different methods were described to obtain information about the correlation matrix $R_w(\mathbf{b})$. A method for estimating the variances of the AR and observation noise was also described.
It was shown by simulation that the method can be applied repetitively to the same block of data, providing accurate results with data of moderate length.
The improvement of the technique compared with Yule-Walker is a function of the depth $d$ to which the observations are stacked. Values of $d$ even greater than $p$ continue to yield improvement. The improvement relative to YW is also system dependent. Based on simulations, a depth $d = p + 3$, where $p$ is the order of the system, seems a reasonable choice.
Comparisons were made against a dual Kalman filter approach, with comparable or slightly superior results.

Author Contributions

Visualization, J.H.G.; Writing—original draft, T.K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Maximizing The Log Likelihood Function for Estimating Variances

In this section we present details on the computation of the derivative of the likelihood (12) with respect to $\sigma_\eta^2$; the derivative with respect to $\sigma_\nu^2$ is computed similarly. We note that
$$\frac{\partial}{\partial\sigma_\eta^2} R_w(\mathbf{b}) = 2I, \qquad \frac{\partial}{\partial\sigma_\nu^2} R_w(\mathbf{b}) = 2B(\mathbf{b}).$$
The derivative of the second term of (12) is
$$\frac{\partial}{\partial\sigma_\eta^2} \log\det R_w(\mathbf{b}) = \sum_{i,j} \frac{\partial \log\det R_w(\mathbf{b})}{\partial R_w(\mathbf{b})_{i,j}} \frac{\partial R_w(\mathbf{b})_{i,j}}{\partial\sigma_\eta^2} = \sum_{i,j} \big( R_w(\mathbf{b})^{-1} \big)_{i,j} (2I)_{i,j} = 2\,\mathrm{tr}\big( R_w(\mathbf{b})^{-1} \big). \tag{A1}$$
The summation term of (12) can be written as
$$\sum_{i=1}^{k} (\mathbf{y}_i - Y_i\mathbf{b})^H R_w(\mathbf{b})^{-1} (\mathbf{y}_i - Y_i\mathbf{b}) = \mathrm{tr}\Big[ \sum_{i=1}^{k} (\mathbf{y}_i - Y_i\mathbf{b})(\mathbf{y}_i - Y_i\mathbf{b})^H R_w(\mathbf{b})^{-1} \Big] = k\,\mathrm{tr}\big( R_w(\mathbf{b})^{-1} \Sigma(\mathbf{b}) \big),$$
so that the derivative is
$$\frac{\partial}{\partial\sigma_\eta^2} \sum_{i=1}^{k} (\mathbf{y}_i - Y_i\mathbf{b})^H R_w(\mathbf{b})^{-1} (\mathbf{y}_i - Y_i\mathbf{b}) = k \sum_{i,j} \frac{\partial\, \mathrm{tr}\big( R_w(\mathbf{b})^{-1}\Sigma(\mathbf{b}) \big)}{\partial R_w(\mathbf{b})_{i,j}} \frac{\partial R_w(\mathbf{b})_{i,j}}{\partial\sigma_\eta^2} = -k \sum_{i,j} \big( R_w(\mathbf{b})^{-1}\Sigma(\mathbf{b})R_w(\mathbf{b})^{-1} \big)^H_{i,j} (2I)_{i,j} = -2k\,\mathrm{tr}\big( R_w(\mathbf{b})^{-1}\Sigma(\mathbf{b})R_w(\mathbf{b})^{-1} \big). \tag{A2}$$
Combining (A1) and (A2) gives (14).

References

  1. Makhoul, J. Spectral analysis of speech by linear prediction. IEEE Trans. Audio Electroacoust. 1973, 21, 140–148.
  2. Lim, J.; Oppenheim, A. All-pole modeling of degraded speech. IEEE Trans. Signal Process. 1978, 26, 197–210.
  3. Lim, J.; Oppenheim, A. Enhancement and bandwidth compression of noisy speech. Proc. IEEE Inst. Electr. Electron. Eng. 1979, 67, 1586–1604.
  4. Wrench, A.; Cowan, C. A new approach to noise-robust LPC. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’87, New York, NY, USA, 6–9 April 1987; Volume 12, pp. 305–307.
  5. Kay, S. Modern Spectral Estimation: Theory and Application; Prentice-Hall: Englewood Cliffs, NJ, USA, 1988.
  6. Deller, J.R.; Proakis, J.G.; Hansen, J.H.L. Discrete–Time Processing of Speech Signals; Macmillan: New York, NY, USA, 1993.
  7. Ramamurthy, K.N.; Spanias, A.S. Matlab Software for The Code Excited Linear Prediction Algorithm: The Federal Standard 1016; Morgan & Claypool: San Rafael, CA, USA, 2009.
  8. de Prony, B.R. Essai Expérimental et Analytique: Sur les Lois de la Dilatabilité de Fluides Élastiques et sur Celles de la Force Expansive de la Vapeur de l’eau et de la Vapeur de l’alcool, à Différentes Températures. J. l’École Polytech. 1795, 1, 24–76.
  9. Ulrych, T.; Clayton, R. Time series modeling and maximum entropy. Phys. Earth Planetary Interiors 1976, 12, 188–200.
  10. Capon, J. High-Resolution Frequency-Wavenumber Spectrum Analysis. Proc. IEEE 1969, 57, 1408–1418.
  11. Burg, J. Maximum entropy spectral analysis. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1975.
  12. Marple, L. High Resolution Autoregressive Spectrum Analysis Using Noise Power Cancellation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’78, Tulsa, OK, USA, 10–12 April 1978; Volume 3, pp. 345–348.
  13. Marple, L. A New Autoregressive Spectrum Analysis Algorithm. IEEE Trans. ASSP 1980, 28, 441–454.
  14. Nuttall, A. Spectral Analysis of a Univariate Process with Bad Data Points, Via Maximum Entropy and Linear Predictive Techniques; Technical Report NUSC Tech. Rep. 5303; Naval Underwater Systems Center: New London, CT, USA, 1976.
  15. Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer: Berlin/Heidelberg, Germany, 2005.
  16. Guler, I.; Kiymik, M.; Akin, M.; Alkan, A. AR spectral analysis of EEG signals by using maximum likelihood estimation. Comput. Biol. Med. 2001, 31, 441–450.
  17. Baddour, K.; Beaulieu, N. Autoregressive modeling for fading channel simulation. IEEE Trans. Wireless Comm. 2005, 4, 1650–1662.
  18. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1978.
  19. Makhoul, J. Linear Prediction: A Tutorial Review. Proc. IEEE 1975, 63, 561–580.
  20. Sayed, A.H. Fundamentals of Adaptive Filtering; Wiley Interscience: Hoboken, NJ, USA, 2003.
  21. Anderson, B.D.O.; Moore, J.B. Optimal Filtering; Prentice-Hall: Englewood Cliffs, NJ, USA, 1979.
  22. Kay, S. The Effects of Noise on the Autoregressive Spectral Estimator. IEEE Trans. Acoust. Speech Signal Process. 1979, 27, 478–485.
  23. Lacoss, R.T. Data adaptive spectral analysis methods. Geophysics 1971, 36, 661–675.
  24. Tufts, D.; Kumaresan, R. Improved spectral resolution. Proc. IEEE 1980, 68, 419–420.
  25. Tufts, D.; Kumaresan, R. Improved spectral resolution II. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’80, Denver, CO, USA, 9–11 April 1980; Volume 5, pp. 592–597.
  26. Kumaresan, R.; Tufts, D. Improved spectral resolution III: Efficient realization. Proc. IEEE 1980, 68, 1354–1355.
  27. Kumaresan, R.; Tufts, D. Accurate parameter estimation of noisy speech-like signals. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’82, Paris, France, 3–5 May 1982; Volume 7, pp. 1357–1361.
  28. Kumaresan, R.; Tufts, D. Estimating the parameters of exponentially damped sinusoids and pole-zero modeling in noise. IEEE Trans. Acoust. Speech Signal Process. 1982, 30, 833–840.
  29. Tufts, D.; Kumaresan, R. Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood. Proc. IEEE 1982, 70, 975–989.
  30. Kumaresan, R.; Tufts, D.; Scharf, L. A Prony method for noisy data: Choosing the signal components and selecting the order in exponential signal models. Proc. IEEE 1984, 72, 230–233.
  31. Cadzow, J. High Performance Spectral Estimation: A New ARMA Method. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 524–529.
  32. Zheng, W.X. On TLS estimation of autoregressive signals with noisy measurements. In Proceedings of the Seventh International Symposium on Signal Processing and Its Applications, Paris, France, 4 July 2003; Volume 2, pp. 287–290.
  33. Huffel, S.V.; Vandewalle, J. The Use of Total Linear Least Squares Technique for Identification and Parameter Estimation. In Proceedings of the IFAC/IFORS Symposium on Identification and Parameter Estimation, York, UK, 3–7 July 1985; pp. 1167–1172.
  34. van Huffel, S.J.; Vandewalle, S.J.; Haegemans, A. An Efficient and Reliable Algorithm for Computing the Singular Subspace of a Matrix, Associated with Its Smallest Singular Values. J. Comput. Appl. Math. 1987, 21, 313–320.
  35. Huffel, S.V.; Vandewalle, J. The Partial Total Least Squares Algorithm. J. Comput. Appl. Math. 1988, 21, 333–341.
  36. Zheng, W. Identification of Autoregressive Signals Observed in Noise. In Proceedings of the American Control Conference, San Francisco, CA, USA, 2–4 June 1993; pp. 1229–1230.
  37. Zheng, W.X. An efficient algorithm for parameter estimation of noisy AR processes. In Proceedings of the 1997 IEEE International Symposium on Circuits and Systems, ISCAS ’97, Hong Kong, China, 12 June 1997; Volume 4, pp. 2509–2512.
  38. Zheng, W. Estimation of autoregressive signals from noisy measurements. IEEE Proc. Vis. Image Signal Process. 1997, 144, 39–45.
  39. Zheng, W.X. Unbiased identification of autoregressive signals observed in colored noise. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, WA, USA, 15 May 1998; Volume 4, pp. 2329–2332.
  40. Zheng, W.X. Adaptive parameter estimation of autoregressive signals from noisy observations. In Proceedings of the ICSP ’98, 1998 Fourth International Conference on Signal Processing, Beijing, China, 12–16 October 1998; Volume 1, pp. 449–452.
  41. Zheng, W.X. On implementation of a least-squares based algorithm for noisy autoregressive signals. In Proceedings of the ISCAS ’98, 1998 IEEE International Symposium on Circuits and Systems, Monterey, CA, USA, 31 May–3 June 1998; Volume 5, pp. 21–24.
  42. Zheng, W.X. A least-squares based method for autoregressive signals in the presence of noise. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 1999, 46, 81–85.
  43. Zheng, W.X. Adaptive linear prediction of autoregressive models in the presence of noise. In Proceedings of the WCCC-ICSP 2000, 5th International Conference on Signal Processing, Beijing, China, 21–25 August 2000; Volume 1, pp. 555–558.
  44. Zheng, W.X. Autoregressive parameter estimation from noisy data. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 2000, 47, 71–75.
  45. Zheng, W.X. Estimation of the parameters of autoregressive signals from colored noise-corrupted measurements. IEEE Signal Process. Lett. 2000, 7, 201–204.
  46. Zheng, W.X. A fast convergent algorithm for identification of noisy autoregressive signals. In Proceedings of the ISCAS 2000 Geneva, The 2000 IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, 28–31 May 2000; Volume 4, pp. 497–500.
  47. Zheng, W.X. A new estimation algorithm for AR signals measured in noise. In Proceedings of the 2002 6th International Conference on Signal Processing, Beijing, China, 26–30 August 2002; Volume 1, pp. 186–189.
  48. Zheng, W.X. An alternative method for noisy autoregressive signal estimation. In Proceedings of the ISCAS 2002, IEEE International Symposium on Circuits and Systems, Phoenix-Scottsdale, AZ, USA, 26–29 May 2002; Volume 5, pp. V-349–V-352.
  49. Zheng, W.X. On unbiased parameter estimation of autoregressive signals observed in noise. In Proceedings of the 2003 International Symposium on Circuits and Systems, ISCAS ’03, Bangkok, Thailand, 25–28 May 2003; Volume 4, pp. IV-261–IV-264.
  50. Zheng, W.X. Fast adaptive identification of autoregressive signals subject to noise. In Proceedings of the 2004 International Symposium on Circuits and Systems, ISCAS ’04, Vancouver, BC, Canada, 23–26 May 2004; Volume 3, pp. III-313–III-316.
  51. Zheng, W.X. An efficient method for estimation of autoregressive signals in noise. In Proceedings of the 2005 IEEE International Symposium on Circuits and Systems, ISCAS 2005, Kobe, Japan, 23–26 May 2005; Volume 2, pp. 1433–1436.
  52. Zheng, W.X. A new look at parameter estimation of autoregressive signals from noisy observations. In Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, ISCAS 2006, Island of Kos, Greece, 21–24 May 2006; p. 4.
  53. Zheng, W.X. On Estimation of Autoregressive Signals in the Presence of Noise. IEEE Trans. Circuits Syst. II Express Briefs 2006, 53, 1471–1475.
  54. Zheng, W.X. An Efficient Method for Estimation of Autoregressive Signals Subject to Colored Noise. In Proceedings of the 2007 IEEE International Symposium on Circuits and Systems, ISCAS 2007, New Orleans, LA, USA, 27–30 May 2007; pp. 2291–2294.
  55. Xia, Y.; Zheng, W.X. On unbiased identification of autoregressive signals with noisy measurements. In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 2157–2160.
  56. Arablouei, R.; Dogancay, K.; Adali, T. Unbiased RLS identification of errors-in-variables models in the presence of correlated noise. In Proceedings of the 2014 22nd European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, 1–5 September 2014; pp. 261–265.
  57. Ai-guo, W.; Fan, Y.; Yang-Yang, Q. Bias compensation based recursive least squares identification for equation error models with colored noises. In Proceedings of the 2014 33rd Chinese Control Conference (CCC), Nanjing, China, 28–30 July 2014; pp. 6715–6720.
  58. Arablouei, R.; Dogancay, K.; Adali, T. Unbiased Recursive Least-Squares Estimation Utilizing Dichotomous Coordinate-Descent Iterations. IEEE Trans. Signal Process. 2014, 62, 2973–2983.
  59. Yong, Z.; Hai, T.; Zhaojing, Z. A modified bias compensation method for output error systems with colored noises. In Proceedings of the 2011 30th Chinese Control Conference (CCC), Yantai, China, 1–30 April 2011; pp. 1565–1569.
  60. Wu, A.G.; Qian, Y.Y.; Wu, W.J. Bias compensation-based recursive least-squares estimation with forgetting factors for output error moving average systems. IET Signal Process. 2014, 8, 483–494.
  61. Arablouei, R.; Dogancay, K.; Werner, S. Recursive Total Least-Squares Algorithm Based on Inverse Power Method and Dichotomous Coordinate-Descent Iterations. IEEE Trans. Signal Process. 2015, 63, 1941–1949.
  62. Tomcik, J.; Melsa, J. Least squares estimation of predictor coefficients from noisy observations. In Proceedings of the 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications, New Orleans, LA, USA, 7–9 December 1977; pp. 3–6.
  63. Lee, T.S. Identification and spectral estimation of noisy multivariate autoregressive processes. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’81, Atlanta, GA, USA, 30 March–1 April 1981; Volume 6, pp. 503–507.
  64. Cadzow, J.A. Autoregressive Moving Average Spectral Estimation: A Model Equation Error Procedure. IEEE Trans. Geosci. Remote Sens. 1981, GE-19, 24–28.
  65. Gingras, D. Estimation of the autoregressive parameters from observations of a noise corrupted autoregressive time series. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’82, Paris, France, 3–5 May 1982; Volume 7, pp. 228–231.
  66. Ahmed, M. Estimating the parameters of a noisy AR-process by using a bootstrap estimator. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’82, Paris, France, 3–5 May 1982; Volume 7, pp. 152–155.
  67. Tugnait, J. Recursive parameter estimation for noisy autoregressive signals. IEEE Trans. Inform. Theory 1986, 32, 426–430.
  68. Paliwal, K. A noise-compensated long correlation matching method for AR spectral estimation of noisy signals. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP ’86, Tokyo, Japan, 7–11 April 1986; Volume 11, pp. 1369–1372.
  69. Cernuschi-Frias, B.; Rogers, J. On the exact maximum likelihood estimation of Gaussian autoregressive processes. IEEE Trans. Acoust. Speech Signal Process. 1988, 36, 922–924.
  70. Gingras, D.; Masry, E. Autoregressive spectral estimation in additive noise. IEEE Trans. Acoust. Speech Signal Process. 1988, 36, 490–501.
  71. Masry, E. Almost sure convergence analysis of autoregressive spectral estimation in additive noise. IEEE Trans. Inform. Theory 1991, 37, 36–42.
  72. Deriche, M. AR parameter estimation from noisy data using the EM algorithm. In Proceedings of the 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-94, South Australia, Australia, 19–22 April 1994; Volume iv, pp. IV/69–IV/72.
  73. Lee, L.M.; Wang, H.C. An extended Levinson-Durbin algorithm for the analysis of noisy autoregressive process. IEEE Signal Process. Lett. 1996, 3, 13–15.
  74. Hasan, T.; Yahagi, T. Accurate noise compensation technique for the identification of multichannel AR processes with noise. In Proceedings of the 1996 IEEE International Symposium on Circuits and Systems Connecting the World, ISCAS ’96, Atlanta, GA, USA, 15 May 1996; Volume 2, pp. 401–404.
  75. Doblinger, G. An adaptive Kalman filter for the enhancement of noisy AR signals. In Proceedings of the 1998 IEEE International Symposium on Circuits and Systems, ISCAS ’98, Monterey, CA, USA, 31 May–3 June 1998; Volume 5, pp. 305–308.
  76. Hasan, T.; Hossain, J. Multichannel autoregressive spectral estimation from noisy observations. In Proceedings of the TENCON 2000, Kuala Lumpur, Malaysia, 24–27 September 2000; Volume 1, pp. 327–332.
  77. Hasan, T.; Ahmed, K. A joint technique for autoregressive spectral estimation from noisy observations. In Proceedings of the TENCON 2000, Kuala Lumpur, Malaysia, 24–27 September 2000; Volume 2, pp. 120–125.
  78. Davila, C. On the noise-compensated Yule-Walker equations. IEEE Trans. Signal Process. 2001, 49, 1119–1121.
  79. So, H. LMS algorithm for unbiased parameter estimation of noisy autoregressive signals. Electron. Lett. 2001, 37, 536–537.
  80. Jin, C.Z.; Jia, L.J.; Yang, Z.J.; Wada, K. On convergence of a BCLS algorithm for noisy autoregressive process estimation. In Proceedings of the 41st IEEE Conference on Decision and Control, Las Vegas, NV, USA, 10–13 December 2002; Volume 4, pp. 4252–4257.
  81. Hasan, M.; Rahim Chowdhury, A.; Adnan, R.; Rahman Bhuiyan, M.; Khan, M. A new method for parameter estimation of autoregressive signals in colored noise. In Proceedings of the 2002 11th European Signal Processing Conference, Toulouse, France, 3–6 September 2002; pp. 1–4.
  82. Hasan, T.; Fattah, S.; Khan, M. Identification of noisy AR systems using damped sinusoidal model of autocorrelation function. IEEE Signal Process. Lett. 2003, 10, 157–160.
  83. Hasan, M.; Chowdhury, A.; Khan, M. Identification of autoregressive signals in colored noise using damped sinusoidal model. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2003, 50, 966–969.
  84. Diversi, R.; Guidorzi, R.; Soverini, U. Identification of ARX models with noisy input and output. In Proceedings of the 2007 European Control Conference (ECC), Kos, Greece, 2–5 July 2007; pp. 4073–4078.
  85. Fattah, S.; Zhu, W.; Ahmad, M. Identification of autoregressive moving average systems from noise-corrupted observations. In Proceedings of the 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference, NEWCAS-TAISA 2008, Montreal, QC, Canada, 22–25 June 2008; pp. 69–72.
  86. Fattah, S.; Zhu, W.; Ahmad, M. A correlation domain algorithm for autoregressive system identification from noisy observations. In Proceedings of the 51st Midwest Symposium on Circuits and Systems, MWSCAS 2008, Knoxville, TN, USA, 10–13 August 2008; pp. 934–937.
  87. Babu, P.; Stoica, P.; Marzetta, T. An IQML type algorithm for AR parameter estimation from noisy covariance sequences. In Proceedings of the 2009 17th European Signal Processing Conference, Glasgow, UK, 24–28 August 2009; pp. 1022–1026.
  88. Qu, X.; Zhou, J.; Luo, Y. An Advanced Parameter Estimator of Multichannel Autoregressive Signals from Noisy Observations. In Proceedings of the 2009 International Conference on Information Engineering and Computer Science, ICIECS 2009, Wuhan, China, 19–20 December 2009; pp. 1–4.
  89. Weruaga, L.; Al-Ahmad, H. Frequency-selective autoregressive estimation in noise. In Proceedings of the 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Dallas, TX, USA, 15–19 March 2010; pp. 3854–3857.
  90. Mahmoudi, A.; Karimi, M. Parameter estimation of noisy autoregressive signals. In Proceedings of the 2010 18th Iranian Conference on Electrical Engineering (ICEE), Isfahan, Iran, 11–13 May 2010; pp. 145–149.
  91. Weruaga, L. Two methods for autoregressive estimation in noise. In Proceedings of the 2011 IEEE GCC Conference and Exhibition (GCC), Dubai, United Arab Emirates, 19–22 February 2011; pp. 501–504.
  92. Youcef, A.; Diversi, R.; Grivel, E. Errors-in-variables identification of noisy moving average models. In Proceedings of the 2015 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 968–972.
  93. Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994.
  94. Kay, S. Recursive Maximum Likelihood Estimation of Autoregressive Processes. IEEE Trans. Acoust. Speech Signal Process. 1983, 31, 292–303.
  95. Evans, A.; Fischl, R. Optimal least squares time-domain synthesis of recursive digital filters. IEEE Trans. Audio Electroacoust. 1973.
  96. Bresler, Y.; Macovski, A. Exact maximum likelihood parameter estimation of superimposed exponentials in noise. IEEE Trans. Acoust. Speech Signal Process. 1986, 34, 1081–1089.
  97. Steiglitz, K.; McBride, L. A technique for the identification of linear systems. IEEE Trans. Autom. Control 1965, 10, 461–464.
  98. Kumaresan, R.; Scharf, L.; Shaw, A. An Algorithm for pole-zero modeling and spectral analysis. IEEE Trans. Acoust. Speech Signal Process. 1986, 34, 637–640.
  99. Anderson, J.; Giannakis, G. Noisy input/output system identification using cumulants and the Steiglitz-McBride algorithm. In Proceedings of the 1991 Conference Record of the Twenty-Fifth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 4–6 November 1991; Volume 1, pp. 608–612.
  100. Fuller, W.A. Measurement Error Models; Wiley: Hoboken, NJ, USA, 1987.
  101. Petitjean, J.; Grivel, E.; Bobillet, W.; Roussilhe, P. Multichannel AR parameter estimation from noisy observations as an errors-in-variables issue. In Proceedings of the 2008 16th European Signal Processing Conference, Lausanne, Switzerland, 25–29 August 2008; pp. 1–5.
  102. Petitjean, J.; Diversi, R.; Grivel, E.; Guidorzi, R.; Roussilhe, P. Recursive errors-in-variables approach for AR parameter estimation from noisy observations. Application to radar sea clutter rejection. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2009, Taipei, Taiwan, 19–24 April 2009; pp. 3401–3404.
  103. Petitjean, J.; Grivel, E.; Diversi, R.; Guidorzi, R. A recursive errors-in-variables method for tracking time varying autoregressive parameters from noisy observations. In Proceedings of the 2010 18th European Signal Processing Conference, Aalborg, Denmark, 23–27 August 2010; pp. 840–844.
  104. Labarre, D.; Grivel, E.; Berthoumieu, Y.; Todini, E.; Najim, M. Consistent estimation of autoregressive parameters from noisy observations based on two interacting Kalman filters. Signal Process. 2006, 86, 2863–2876.
  105. Jamoos, A.; Grivel, E.; Christov, N.; Najim, M. Estimation of autoregressive fading channels based on two cross-coupled H∞ filters. Signal Image Video Process. 2009, 3, 209–216.
  106. Jamoos, A.; Grivel, E.; Shakarneh, N.; Abdel-Nour, H. Dual optimal filters for parameter estimation of a multivariate autoregressive process from noisy observations. IET Signal Process. 2011, 5, 471–479.
  107. Bugallo, M.F.; Martino, L.; Corander, J. Adaptive importance sampling in signal processing. Digit. Signal Process. 2015, 47, 36–49.
  108. Urteaga, I.; Djuric, P.M. Sequential estimation of Hidden ARMA Processes by Particle Filtering—Part I. IEEE Trans. Signal Process. 2017, 65, 482–493.
  109. Urteaga, I.; Djuric, P.M. Sequential estimation of Hidden ARMA Processes by Particle Filtering—Part II. IEEE Trans. Signal Process. 2017, 65, 494–504.
  110. Martino, L.; Elvira, V.; Camps-Valls, G. Distributed Particle Metropolis-Hastings Schemes. In IEEE Statistical Signal Processing Workshop; IEEE: Piscataway, NJ, USA, 2018; pp. 553–557.
  111. Kay, S.M. Fundamentals of Statistical Signal Processing: Estimation Theory; Prentice-Hall: Upper Saddle River, NJ, USA, 1993.
  112. Farahmand, S.; Giannakis, G.B. Robust RLS in the Presence of Correlated Noise Using Outlier Sparsity. IEEE Trans. Signal Process. 2012, 60, 3308–3313.
  113. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, Version 1.21. 2011. Available online: http://cvxr.com/cvx (accessed on 19 May 2020).
  114. Moon, T.K.; Stirling, W.C. Mathematical Methods and Algorithms for Signal Processing; Prentice-Hall: Upper Saddle River, NJ, USA, 2000.
Figure 1. Noisy Autoregressive Model.
Figure 2. Example of estimating the variances $\sigma_\nu^2$ (top plot) and $\sigma_\eta^2$ (bottom plot), using the true $\mathbf{b}$; $k$ indicates the number of points used in the estimates. Shading indicates the standard deviation of the estimates over 50 iterations. The bounded solution is computed using CVX to avoid negative estimates.
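As a rough illustration of the bounded estimate mentioned in the caption, the sketch below solves a nonnegativity-constrained least-squares problem with CVXPY, the Python analog of the CVX package cited in Reference [112]. The matrix G and vector c are placeholders for whatever sample statistics relate the stacked observations to the two variances; they are not the paper's actual equations.

import numpy as np
import cvxpy as cp

# Placeholder linear model c ≈ G @ [sigma_eta^2, sigma_nu^2]; G and c
# stand in for the sample statistics the estimator actually forms.
rng = np.random.default_rng(0)
G = rng.standard_normal((20, 2))
c = G @ np.array([1.0, 0.25]) + 0.05 * rng.standard_normal(20)

x = cp.Variable(2, nonneg=True)  # nonnegativity bound avoids negative variance estimates
cp.Problem(cp.Minimize(cp.sum_squares(G @ x - c))).solve()
sigma_eta2_hat, sigma_nu2_hat = x.value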
Figure 3. Case 1: Error performance for different values of $d$ with $s = 3$. (a) Solid: true $R_b(\mathbf{b})$; (b) dashed: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and the correct variances; (c) dotted: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and fixed incorrect variances. Comparison: black dotted: conventional least squares; solid black: Yule-Walker (YW) with noisy observations; dashed black: YW with noise-free observations.
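For reference, the Yule-Walker (YW) baseline that appears in the comparison figures can be computed with the standard sample-autocorrelation recipe; a minimal sketch follows (function and variable names are illustrative, not taken from the paper):

import numpy as np
from scipy.linalg import toeplitz

def yule_walker(x, p):
    # Classical Yule-Walker AR(p) fit from a data record x.
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    # Biased sample autocorrelations r_0, ..., r_p.
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])
    R = toeplitz(r[:p])                   # p x p autocorrelation matrix
    b = np.linalg.solve(R, r[1:])         # solve R b = [r_1, ..., r_p]^T
    sigma_eta2 = r[0] - np.dot(b, r[1:])  # driving-noise variance estimate
    return b, sigma_eta2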
Figure 4. Case 1: Error performance for different values of $s$ with $d = 7$. (a) Solid: true $R_b(\mathbf{b})$; (b) dashed: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and the correct variances; (c) dotted: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and fixed incorrect variances. Comparison: black dotted: conventional least squares; solid black: Yule-Walker (YW) with noisy observations; dashed black: YW with noise-free observations.
Figure 5. Case 1: Estimated variances for different values of $d$. Top: estimated $\sigma_\eta^2$; bottom: estimated $\sigma_\nu^2$.
Figure 6. Case 1: Comparison of the ICWARE estimated spectrum with the true spectrum and the YW estimated spectrum.
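The spectra compared in Figures 6, 11 and 12 are ordinary AR spectra. Assuming the sign convention of Equation (1), the spectrum is $S(\omega) = \sigma_\eta^2 / |1 - \sum_{i=1}^p b_i e^{-j\omega i}|^2$, which the following sketch evaluates on a frequency grid (names are illustrative):

import numpy as np

def ar_spectrum(b, sigma_eta2, n_freq=512):
    # S(w) = sigma_eta^2 / |1 - sum_i b_i e^{-j w i}|^2, matching the
    # sign convention xi(m) = sum_i b_i xi(m-i) + eta(m) of Equation (1).
    w = np.linspace(0.0, np.pi, n_freq)
    p = len(b)
    A = 1.0 - np.exp(-1j * np.outer(w, np.arange(1, p + 1))) @ np.asarray(b)
    return w, sigma_eta2 / np.abs(A) ** 2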
Figure 7. Case 1: Comparison of the new method with total least squares (TLS) solutions for different TLS sizes. Dashed: ICWARE for different values of $d$; solid: TLS for different matrix sizes.
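The TLS curves in Figure 7 refer to the total-least-squares estimate, which accounts for errors in both the data matrix and the observation vector. As a reminder of how such a solution is formed (the classic SVD construction, not necessarily the exact variant run in these experiments):

import numpy as np

def tls(A, y):
    # Classic SVD-based total-least-squares solution of A x ≈ y; assumes
    # the last component of the singular vector below is nonzero.
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.column_stack([A, y]))
    v = Vt[-1]            # right singular vector of the smallest singular value
    return -v[:n] / v[n]  # TLS estimate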
Figure 8. Case 1: Error performance of 10 repetitions on blocks of length 100. (a) Solid: true $R_b(\mathbf{b})$; (b) dashed: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and the correct variances; (c) dotted: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and fixed incorrect variances. Comparison: black dotted: conventional least squares; solid black: Yule-Walker (YW) with noisy observations; dashed black: YW with noise-free observations.
Figure 9. Performance over 10 repetitions on blocks of length 100, with the results folded for each iteration. Top: estimated $R_b(\hat{\mathbf{b}})$ using $\hat{\mathbf{b}}$ and the true variances; bottom: estimated $R_b(\hat{\mathbf{b}})$ using $\hat{\mathbf{b}}$ and fixed variances.
Figure 10. Case 2: Error performance for different values of $d$ with $s = 3$. (a) Solid: true $R_b(\mathbf{b})$; (b) dashed: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and the correct variances; (c) dotted: $R_b(\hat{\mathbf{b}})$ estimated from $\hat{\mathbf{b}}$ and fixed incorrect variances. Comparison: black dotted: conventional least squares; solid black: Yule-Walker (YW) with noisy observations; dashed black: YW with noise-free observations.
Figure 11. Case 2: Comparison of the ICWARE estimated spectrum with the true spectrum and the YW estimated spectrum.
Figure 12. Case 3: Comparison of the ICWARE estimated spectrum with the true spectrum and the YW estimated spectrum.
Figure 13. Spectral estimation results for examples from Reference [104]. (a) Case 4; (b) Case 5; (c) Case 6. $\sigma_\eta^2 = 1$.
Table 1. Parameters for test cases.

Case  Order  Pole Locations
1     3      $0.95e^{j2\pi(0.65)}$, $0.95e^{j2\pi(0.7)}$, $0.95e^{j2\pi(0.75)}$
2     3      $0.9e^{j2\pi(0.65)}$, $0.9e^{j2\pi(0.7)}$, $0.9e^{j2\pi(0.75)}$
3     3      $e^{j2\pi(0.65)}$, $e^{j2\pi(0.7)}$, $e^{j2\pi(0.75)}$
4     6      $0.75e^{\pm j2\pi(0.1)}$, $0.8e^{\pm j2\pi(0.2)}$, $0.85e^{\pm j2\pi(0.35)}$
5     6      $0.98e^{\pm j2\pi(0.05)}$, $0.97e^{\pm j2\pi(0.15)}$, $0.8e^{\pm j2\pi(0.35)}$
6     6      $0.98e^{\pm j2\pi(0.1)}$, $0.97e^{\pm j2\pi(0.1)}$, $0.98e^{\pm j2\pi(0.15)}$
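For readers reproducing the test cases, the AR coefficients follow from the tabulated poles by polynomial expansion: with poles $p_k$, $1 - \sum_i b_i z^{-i} = \prod_k (1 - p_k z^{-1})$. A minimal numpy sketch for Case 1 (note that Cases 1–3 list poles without conjugate pairs, so the expanded coefficients are complex in general):

import numpy as np

# Case 1 poles from Table 1.
poles = 0.95 * np.exp(1j * 2 * np.pi * np.array([0.65, 0.70, 0.75]))
a = np.poly(poles)  # coefficients of prod_k (z - p_k): [1, a_1, ..., a_p]
b = -a[1:]          # b_i in xi(m) = sum_i b_i xi(m - i) + eta(m)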
Table 2. Comparison of $\|\mathbf{b} - \hat{\mathbf{b}}_k\|_2^2$ computed for $d = 1, \dots, 7$ with $\|\mathbf{b} - \hat{\mathbf{b}}_{\mathrm{YW}}\|_2^2$ for YW. Entries are $10\log_{10}\big(\|\mathbf{b} - \hat{\mathbf{b}}_k\|_2^2 / \|\mathbf{b} - \hat{\mathbf{b}}_{\mathrm{YW}}\|_2^2\big)$ in dB.

d    s = 2    s = 3    s = 4
2    −5.9     −5.8     11.1
3    −12.4    −12.5    4.5
4    −17.5    −17.5    −0.6
5    −20.6    −20.6    −3.9
6    −22.6    −22.7    −5.9
7    −23.9    −24.0    −7.2
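Each entry of Table 2 is a squared coefficient error in dB relative to the YW estimate, so negative values indicate improvement over YW. A one-line check of how the entries are formed (names illustrative):

import numpy as np

def coeff_error_db(b_true, b_hat, b_yw):
    # Table 2 entry: squared coefficient error of an estimate, in dB
    # relative to the Yule-Walker estimate.
    num = np.sum(np.abs(b_true - b_hat) ** 2)
    den = np.sum(np.abs(b_true - b_yw) ** 2)
    return 10.0 * np.log10(num / den)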
