Article

On Fast Converging Data-Selective Adaptive Filtering

by Marcele O. K. Mendonça 1, Jonathas O. Ferreira 1, Christos G. Tsinos 2, Paulo S R Diniz 1,* and Tadeu N. Ferreira 3
1 Signals, Multimedia, and Telecommunications Lab., Universidade Federal do Rio de Janeiro, DEL/Poli & PEE/COPPE/UFRJ, P.O. Box 68504, Rio de Janeiro RJ 21941-972, Brazil
2 SnT-Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, 4365 Luxembourg City, Luxembourg
3 Fluminense Federal University, Engineering School, R. Passo da Patria, 156, Room E-406, Niteroi RJ 24210-240, Brazil
* Author to whom correspondence should be addressed.
Algorithms 2019, 12(1), 4; https://doi.org/10.3390/a12010004
Submission received: 30 November 2018 / Revised: 17 December 2018 / Accepted: 18 December 2018 / Published: 21 December 2018
(This article belongs to the Special Issue Adaptive Filtering Algorithms)

Abstract: The amount of information generated worldwide has been increasing exponentially, raising the question of whether all acquired data are relevant for the learning process. If a subset of the data does not bring enough innovation, data-selection strategies can be employed to reduce the computational cost and, in many cases, improve the estimation accuracy. In this paper, we explore adaptive filtering algorithms whose characteristic features are fast convergence and data selection. These algorithms incorporate a prescribed data-selection strategy and are compared in distinct application environments. The simulation results include both synthetic and real data.

1. Introduction

In many practical applications, the number of data-acquisition devices is growing at an exponential rate, such as in distributed networks, massive multiple-input multiple-output (MIMO) antenna systems, and social networks. This trend calls for the parsimonious use of the acquired data, given the overwhelming resources required, such as storage capacity, device-to-device communication, and power consumption. In the era of Big Data, we face the challenge of efficiently utilizing a large amount of data to extract the critical information. In the context of adaptive filtering, the recently proposed strategy of performing data selection while approximating a prescribed update rate appears promising [1].
This data-selection method prescribes a probability of update by utilizing a threshold based on the mean squared error (MSE), which determines whether the acquired data sample contains enough information to justify a change in the parameter estimate. Utilizing a statistical model for the MSE, it is possible to prescribe the probability of update inherent to the learning algorithm. In addition, a second threshold value can be utilized to verify whether the data represent an outlier, i.e., abnormal data.
Previous work [1] addressed the classical Least Mean Square (LMS), Affine Projection (AP), and Recursive Least Squares (RLS) algorithms. Among these, the RLS algorithm has the fastest convergence in stationary environments but also the highest computational complexity, and it faces numerical stability issues. In this work, we consider fast converging adaptive filtering algorithms with data selection and apply them to real data in order to verify their effectiveness in practical problems. The considered algorithms are the LMS Newton (LMSN) [2], the LMS Quasi-Newton (LMSQN) [3], and the online conjugate gradient (CG) [4,5,6,7,8,9]. Like the RLS algorithm, the LMSN converges fast even when the input signal is highly correlated. By employing an estimate of the Hessian matrix, the LMSQN methods [10,11,12,13] have a computational complexity similar to that of the RLS and LMSN algorithms without sacrificing performance. In fact, one version of the LMSQN algorithm exhibits improved robustness to quantization errors [3]. A low-complexity alternative originates from the Conjugate Gradient (CG) method [4], which, unlike the LMSN, LMSQN, and RLS algorithms, does not require the computation of the Hessian matrix inverse. The CG algorithm is attractive since it updates the adaptive filter coefficients along conjugate directions, leading to faster convergence than gradient-based algorithms such as the LMS.
We explore the data-selective versions of the LMSN, LMSQN, and CG algorithms, presenting their cost functions in applications, such as equalization and signal enhancement, that were not previously considered in [1,14,15]. For each configuration, we describe how to estimate the MSE so that the update rate can be prescribed.
The performance of the data-selective algorithms is evaluated via simulations utilizing synthetic and real data in different adaptive filtering applications. The results indicate that the data-selection strategy can indeed be applied to any type of adaptive filtering application.
This paper is organized as follows. In Section 2, the data-selection strategy is discussed along with the calculation of the threshold parameter required to achieve a target probability of update. Section 3 describes the data-selective LMSN (DS-LMSN), LMSQN (DS-LMSQN), and CG (DS-CG) algorithms. Section 4 compares the algorithms. Some concluding remarks are provided in Section 5.

2. Problem Description

This section describes how to apply the data selection approach for distinct application set-ups. Regardless of the application, the filter output can be formulated as
$$ y(k) = \mathbf{w}^T(k)\,\mathbf{x}(k) \qquad (1) $$
where $\mathbf{x}(k) = [x_0(k)\; x_1(k)\, \cdots\, x_{N-1}(k)]^T$ is the input vector applied to the adaptive filter and $\mathbf{w}(k) = [w_0(k)\; w_1(k)\, \cdots\, w_{N-1}(k)]^T$ represents the adaptive filter coefficients. We can compute the a priori error signal $e(k)$ as
$$ e(k) = d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k) \qquad (2) $$
where d ( k ) is the desired signal.
Basically, the error signal $e(k)$ is used by the adaptive algorithm to determine whether to update the filter coefficients. Given the error distribution, it is possible to infer the degree of innovation the current data carry; an illustration is provided in Figure 1. Observe that the central interval (in light blue) corresponds to lower error values, whereas the edge intervals (in red) correspond to higher error values, indicating the presence of possible outliers that damage the estimation. The data-selection strategy relies on updating the filter coefficients only when the current data are informative, i.e., when they generate an error that does not belong to any of those intervals. Hence, the overall computational cost is reduced, since the coefficients are no longer updated 100% of the time.
The adaptive algorithms require an objective function of the error in order to perform the coefficient updating. A common objective function considered in adaptive filtering theory is the instantaneous squared error,
$$ J(\mathbf{w}(k)) = \tfrac{1}{2}\,|e(k)|^2 \qquad (3) $$
where | · | denotes the absolute value.
If the error distribution is assumed to be Gaussian, then
$$ e \sim \mathcal{N}(0, \sigma_e^2) \qquad (4) $$
where $\sigma_e^2$ is the error variance. By normalizing the error distribution, we obtain
$$ \frac{e}{\sigma_e} \sim \mathcal{N}(0, 1). \qquad (5) $$
The adaptive filter coefficients are updated if the normalized error $|e(k)|/\sigma_e$ is greater than a given threshold $\tau(k)$. However, if $|e(k)|/\sigma_e$ is greater than another threshold $\tau_{\max}$, an outlier is identified and no update should be performed. These conditions can be incorporated into the minimization of the function
$$ J(\mathbf{w}(k)) = \begin{cases} \tfrac{1}{2}\,|e(k)|^2, & \text{if } \tau(k) \le \dfrac{|e(k)|}{\sigma_e} < \tau_{\max} \\ 0, & \text{otherwise.} \end{cases} \qquad (6) $$
Hence, the coefficient updating follows the rule,
$$ \mathbf{w}(k+1) = \begin{cases} \mathbf{w}(k) + \mathbf{u}(k), & \tau(k) \le \dfrac{|e(k)|}{\sigma_e} < \tau_{\max} \\ \mathbf{w}(k), & \text{otherwise,} \end{cases} \qquad (7) $$
where the term u ( k ) depends on the adaptive algorithm employed. The desired probability of coefficient update P up ( k ) represents how often the first statement in Equation (7) is performed and is modeled as
$$ P_{\mathrm{up}}(k) = P\!\left[\frac{|e(k)|}{\sigma_e} > \tau(k)\right] - P\!\left[\frac{|e(k)|}{\sigma_e} > \tau_{\max}\right]. \qquad (8) $$
By considering the distribution in (5), Equation (8) in steady-state becomes,
$$ P_{\mathrm{up}} = 2\,Q_e(\tau) - 2\,Q_e(\tau_{\max}), \qquad (9) $$
where $Q_e(\cdot)$ is the complementary Gaussian cumulative distribution function, given by $Q_e(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} \exp(-t^2/2)\,\mathrm{d}t$ [16]. Even when outliers are present in the dataset, the probability $P\!\left[|e(k)|/\sigma_e > \tau_{\max}\right]$ tends to be very small. Therefore, the parameter $\tau$ can be obtained from Equation (9) as
$$ \tau = Q_e^{-1}\!\left(\frac{P_{\mathrm{up}}}{2}\right), \qquad (10) $$
where $Q_e^{-1}(\cdot)$ is the inverse of the $Q_e(\cdot)$ function. Basically, to apply the threshold $\tau$ in the coefficient updating (7), we need to calculate $\sigma_e^2$. At this point, it should be mentioned that in the system identification application with sufficient order, the minimum MSE in steady-state is $\sigma_n^2$, the variance of the measurement noise $n(k)$. Hence, it is convenient to express $\sigma_e^2$ as a function of the noise variance
$$ \sigma_e^2 = (1 + \rho)\,\sigma_n^2 \qquad (11) $$
in which the excess MSE is written as $\rho\,\sigma_n^2$. The expression for $\rho$ depends on the adaptive algorithm and is key to establishing the prescribed probability of update $P_{\mathrm{up}}$.
As a result, the coefficient updating is performed based on a scaled noise power, $\tau(k)\,\sigma_n^2$ [1,14,15], so that an expression equivalent to Equation (8) can be written as
$$ P_{\mathrm{up}}(k) = P\!\left[\frac{|e(k)|}{\sigma_n} > \tau(k)\right] - P\!\left[\frac{|e(k)|}{\sigma_n} > \tau_{\max}\right], \qquad (12) $$
resulting in the following modifications in Equations (9) and (10)
$$ P_{\mathrm{up}} = 2\,Q_e\!\left(\frac{\sigma_n \tau}{\sigma_e}\right) - 2\,Q_e\!\left(\frac{\sigma_n \tau_{\max}}{\sigma_e}\right) \quad \text{and} \quad \tau = \sqrt{1+\rho}\;Q_e^{-1}\!\left(\frac{P_{\mathrm{up}}}{2}\right) \text{ for } \tau_{\max} \to \infty. \qquad (13) $$
If the presence of outliers is expected, a possible strategy to eliminate them consists of employing the first 20% of the data without taking the threshold $\tau_{\max}$ into consideration, hence obtaining an estimate of the error behavior. For the remaining iterations, $\tau_{\max}$ is calculated by
$$ \tau_{\max} = \mathrm{E}\!\left[e(k)/\sigma_e\right] + 3\sqrt{\mathrm{Var}\!\left[e(k)/\sigma_e\right]}. \qquad (14) $$
Since the expression (5) represents a Gaussian distribution, we can use the empirical rule given by Equation (14) to identify the values that exceed the threshold as outliers.
Under the considered assumptions regarding the error distribution, E { e ( k ) } = 0 and thus,
$$ \sigma_e^2 = \mathrm{E}[e^2(k)] = \xi(k) \qquad (15) $$
where $\xi(k)$, for $k \to \infty$, is the steady-state MSE obtained by the employed algorithm. The expression for $\xi(k)$ depends on the filter application and on the algorithm employed. In the following subsections, we compute the steady-state MSE for some adaptive filter applications.
Alternatively, the error variance can be estimated by
$$ \sigma_e^2(k) = (1-b)\,e^2(k) + b\,\sigma_e^2(k-1), \qquad (16) $$
where b is a forgetting factor.
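To make the data-selection machinery concrete, the following minimal sketch (our own illustration, not the authors' code; all function and variable names are ours) combines the threshold prescription of Equations (10) and (13), the recursive error-variance estimate of Equation (16), the empirical outlier bound of Equation (14), and the update test of Equation (7):

import numpy as np
from scipy.stats import norm  # norm.isf(p) is the inverse of the Gaussian Q-function

def threshold_from_pup(p_up, rho=0.0):
    """Equations (10)/(13): tau = sqrt(1 + rho) * Q^{-1}(P_up / 2)."""
    return np.sqrt(1.0 + rho) * norm.isf(p_up / 2.0)

def tau_max_from_errors(normalized_errors):
    """Empirical three-sigma rule of Equation (14), applied to an initial data segment."""
    normalized_errors = np.asarray(normalized_errors)
    return normalized_errors.mean() + 3.0 * normalized_errors.std()

class DataSelector:
    """Tracks sigma_e^2 via Equation (16) and applies the test of Equation (7)."""
    def __init__(self, p_up, rho=0.0, b=0.9999, tau_max=np.inf):
        self.tau = threshold_from_pup(p_up, rho)
        self.tau_max = tau_max
        self.b = b
        self.sigma2_e = 1.0  # initial guess for the error variance

    def should_update(self, e):
        self.sigma2_e = (1.0 - self.b) * e ** 2 + self.b * self.sigma2_e  # Equation (16)
        ratio = abs(e) / np.sqrt(self.sigma2_e)
        # Update only for innovative data: tau <= |e|/sigma_e < tau_max
        return self.tau <= ratio < self.tau_max

In an adaptive filtering loop, should_update(e) would simply gate the coefficient update of whichever algorithm is employed.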
Although not discussed here, for cases in which the error distribution is not Gaussian, we can determine the threshold from measured data through the evaluation of tail probabilities; see [17]. It is also worth mentioning that in particular applications, such as medical data like the ECG, the outlier threshold might affect the main feature to be observed, since that feature resembles an outlier.

2.1. Equalization

In the equalization application, the desired signal is a delayed version of the transmitted signal,
$$ d(k) = s(k-l) \qquad (17) $$
where l represents the delay. The adaptive filter output is written as
$$ y(k) = \mathbf{w}^T(k)\,\mathbf{x}(k) = \mathbf{w}^T(k)\left(\mathbf{H}\,\mathbf{s}(k) + \mathbf{n}(k)\right) \qquad (18) $$
where $\mathbf{H} \in \mathbb{R}^{N \times L}$ is the finite impulse response (FIR) channel convolution matrix, $\mathbf{s}(k) = [s_0(k)\; s_1(k)\, \cdots\, s_{L-1}(k)]^T \in \mathbb{R}^{L}$ is the transmitted signal vector, and the channel noise $\mathbf{n}(k) = [n_0(k)\; n_1(k)\, \cdots\, n_{N-1}(k)]^T \in \mathbb{R}^{N}$ is drawn from an independent Gaussian distribution with zero mean and variance $\sigma_n^2$. Therefore, we can express the MSE as
$$
\begin{aligned}
\xi(k) &= \mathrm{E}[e^2(k)] = \mathrm{E}\!\left[\left(s(k-l) - y(k)\right)^2\right] \\
&= \sigma_s^2 - 2\,\mathrm{E}\!\left[s(k-l)\,\mathbf{w}^T(k)\left(\mathbf{H}\mathbf{s}(k) + \mathbf{n}(k)\right)\right] + \mathrm{E}\!\left[\left(\mathbf{w}^T(k)\left(\mathbf{H}\mathbf{s}(k) + \mathbf{n}(k)\right)\right)^2\right] \\
&= \sigma_s^2 - 2\,\mathbf{w}^T(k)\mathbf{H}\,\mathrm{E}[s(k-l)\,\mathbf{s}(k)] + \mathbf{w}^T(k)\mathbf{H}\,\mathrm{E}[\mathbf{s}(k)\mathbf{s}^T(k)]\,\mathbf{H}^T\mathbf{w}(k) + \mathbf{w}^T(k)\,\mathrm{E}[\mathbf{n}(k)\mathbf{n}^T(k)]\,\mathbf{w}(k) \\
&= \sigma_s^2 - 2\,\mathbf{w}^T(k)\mathbf{H}\,\mathbf{r}_l + \mathbf{w}^T(k)\left(\mathbf{H}\mathbf{R}\mathbf{H}^T + \sigma_n^2\,\mathbf{I}_N\right)\mathbf{w}(k) \\
&= \sigma_s^2\left(1 - 2\,\mathbf{w}^T(k)\mathbf{h}_l + \mathbf{w}^T(k)\mathbf{H}\mathbf{H}^T\mathbf{w}(k)\right) + \sigma_n^2\,\mathbf{w}^T(k)\mathbf{w}(k)
\end{aligned}
\qquad (19)
$$
where $\mathbf{R}$ is the autocorrelation matrix of the transmitted signal, $\mathbf{r}_l$ is its $l$-th column, and $\mathbf{h}_l = \mathbf{H}\,\mathbf{r}_l$. We are assuming that the transmitted signal and the additive noise are uncorrelated.
Assuming the channel model is unknown, the practical way to compute the data-selection threshold is to estimate the output error variance through (16).
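When the channel model is available instead (e.g., in controlled simulations), the fourth line of Equation (19) can be evaluated directly. The short sketch below is our own illustration of that evaluation, with names of our choosing:

import numpy as np

def equalization_mse(w, H, R, r_l, sigma_s2, sigma_n2):
    """Equation (19): xi = sigma_s^2 - 2 w^T H r_l + w^T (H R H^T + sigma_n^2 I) w."""
    N = len(w)
    return (sigma_s2
            - 2.0 * w @ (H @ r_l)
            + w @ (H @ R @ H.T + sigma_n2 * np.eye(N)) @ w)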

2.2. Signal Enhancement

In the signal enhancement case, the desired signal is a signal of interest corrupted by noise,
$$ d(k) = s(k) + n_1(k). \qquad (20) $$
By using, as the adaptive filter input, another noise signal correlated with the noise that impairs $s(k)$,
$$ \mathbf{x}(k) = \mathbf{n}_2(k), \qquad (21) $$
the conventional error signal $e(k)$ becomes an enhanced version of $d(k)$, and the adaptive filter output $y(k)$ plays the role of the actual error. For this reason, in the signal enhancement case, the MSE is calculated based on the variance of $y(k)$ instead of $e(k)$. Hence, the MSE expression for signal enhancement is obtained as
$$ \xi(k) = \sigma_y^2 = \mathrm{E}[y^2(k)] = \mathrm{E}\!\left[\left(\mathbf{w}^T(k)\,\mathbf{n}_2(k)\right)^2\right] = \sigma_{n_2}^2(k)\,\|\mathbf{w}(k)\|_2^2. \qquad (22) $$

2.3. Signal Prediction

In the signal prediction case, the desired signal is the sample $x(k+L)$, i.e., the input signal $x(k)$ advanced by $L$ samples (equivalently, the filter operates on a delayed version of the desired signal). Therefore, the error signal is
$$ e(k) = x(k+L) - \mathbf{w}^T(k)\,\mathbf{x}(k), \qquad (23) $$
and the MSE expression
$$ \xi(k) = \mathrm{E}[e^2(k)] = \mathrm{E}\!\left[\left(x(k+L) - \mathbf{w}^T(k)\,\mathbf{x}(k)\right)^2\right] \qquad (24) $$
gives rise to an expression for the minimum MSE:
$$ \xi_{\min}(k) = r(0) - \mathbf{w}_o^T \left[\, r(L)\;\; r(L+1)\;\; \cdots\;\; r(L+N) \,\right]^T \qquad (25) $$
where $\mathbf{w}_o$ contains the optimal coefficients of the predictor and $r(l) = \mathrm{E}[x(k)x(k-l)]$. Since in the prediction case $\xi(k) = \sigma_e^2 \approx \xi_{\min}$ in steady state, Equation (25) can be used to obtain an estimate of $\sigma_e^2$ at iteration $k$ by replacing $\mathbf{w}_o^T$ with $\mathbf{w}^T(k)$, the coefficients of the adaptive filter at iteration $k$. We can estimate $r(l)$ recursively through $\hat{r}_k(l) = \zeta\,\hat{r}_{k-1}(l) + (1-\zeta)\,x(k)\,x(k-l)$, in which $\zeta$ is a forgetting factor.
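A small sketch (our own illustration, with illustrative names) of this procedure is given below: the lags $r(l)$ are tracked recursively and plugged into Equation (25) with the current coefficients $\mathbf{w}(k)$ in place of the optimal predictor:

import numpy as np

class PredictionErrorVariance:
    """Tracks the lags r(l) recursively and evaluates Equation (25) with w(k)."""
    def __init__(self, num_coeffs, L, zeta=0.99):
        self.L = L                            # prediction horizon
        self.zeta = zeta                      # forgetting factor
        self.max_lag = L + num_coeffs - 1     # Equation (25) needs lags L, ..., L + N
        self.r = np.zeros(self.max_lag + 1)

    def update_lags(self, x_buffer):
        """x_buffer holds [x(k), x(k-1), ..., x(k - max_lag)], newest sample first."""
        for l in range(self.max_lag + 1):
            self.r[l] = self.zeta * self.r[l] + (1.0 - self.zeta) * x_buffer[0] * x_buffer[l]

    def sigma2_e(self, w):
        p = self.r[self.L:self.L + len(w)]    # [r(L), r(L+1), ..., r(L+N)]
        return self.r[0] - w @ p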

2.4. System Identification

In system identification, the desired signal can be formulated as
$$ d(k) = \mathbf{w}_o^T\,\mathbf{x}(k) + n(k) \qquad (26) $$
where $\mathbf{w}_o$ is the optimal coefficient vector, $\mathbf{x}(k)$ is the input vector, and $n(k)$ is additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_n^2$. Therefore, the MSE can be expressed as
$$ \xi(k) = \mathrm{E}[e^2(k)] = \mathrm{E}[n^2(k)] - 2\,\mathrm{E}\!\left[n(k)\,\Delta\mathbf{w}^T(k)\,\mathbf{x}(k)\right] + \mathrm{E}\!\left[\Delta\mathbf{w}^T(k)\,\mathbf{x}(k)\mathbf{x}^T(k)\,\Delta\mathbf{w}(k)\right] \qquad (27) $$
where we define $\Delta\mathbf{w}(k) = \mathbf{w}(k) - \mathbf{w}_o$. Assuming that the noise and the coefficient error are uncorrelated, the second term in (27) is zero and we obtain
$$ \xi(k) = \sigma_n^2 + \xi_{\mathrm{exc}}(k) \qquad (28) $$
where $\xi_{\mathrm{exc}}(k)$ is the excess MSE and $\mathrm{E}[n^2(k)] = \sigma_n^2$. In steady state, the excess MSE tends to $\rho\,\sigma_n^2$, so the MSE expression for system identification, previously given in (11), is rewritten as
$$ \xi(k) = \sigma_e^2 = (1 + \rho)\,\sigma_n^2. \qquad (29) $$

3. Data-Selective Adaptive Filtering Algorithms

We consider the Newton-based methods LMSN and LMSQN, as well as the online CG algorithm, to minimize the objective function
$$ \xi = \tfrac{1}{2}\,\mathrm{E}[e^2(k)]. \qquad (30) $$
The Newton-based methods follow a second-order approximation of the objective function and hence perform the coefficient updating as
$$ \mathbf{w}(k+1) = \mathbf{w}(k) - \mu(k)\,\hat{\mathbf{R}}^{-1}\,\hat{\mathbf{g}}_{\mathbf{w}}(k), \qquad (31) $$
where $\hat{\mathbf{R}}$ and $\hat{\mathbf{g}}_{\mathbf{w}}(k)$ are estimates of the Hessian and the gradient of the objective function, respectively. In fact, the LMSN and LMSQN algorithms minimize the objective function in (3).
The CG method, on the other hand, falls between the steepest descent and Newton methods. In the CG algorithm, the search is performed along conjugate directions, which generally produces faster convergence than steepest descent methods. The coefficient updating is performed as
$$ \mathbf{w}(k+1) = \mathbf{w}(k) - \alpha(k)\,\mathbf{c}(k), \qquad (32) $$
where the conjugate directions $\mathbf{c}(k)$ will be explained in more detail in Section 3.2.

3.1. LMSN and LMSQN

Considering the same estimate of the gradient, $\hat{\mathbf{g}}_{\mathbf{w}}(k) = -e(k)\,\mathbf{x}(k)$, used in the LMS algorithm, together with a variable step size, we end up with the following recursive coefficient updating formula [2]
$$ \mathbf{w}(k+1) = \mathbf{w}(k) + \frac{\nu}{\mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k)\,\mathbf{x}(k)}\,\hat{\mathbf{R}}^{-1}(k)\,\mathbf{x}(k)\,e(k), \qquad (33) $$
where $\mu(k) = \nu / \left(\mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k)\,\mathbf{x}(k)\right)$ is a step-size parameter in which $\nu$ is a positive constant and $e(k) = d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)$ is the a priori error.
The only difference between the LMSN and LMSQN algorithms is the way matrix $\hat{\mathbf{R}}^{-1}(k)$ is estimated. In the LMSN method, matrix $\hat{\mathbf{R}}(k)$ is estimated via a Robbins–Monro procedure, resulting in the following update of its inverse [2]
$$ \hat{\mathbf{R}}^{-1}(k) = \frac{1}{1-\theta}\left[\hat{\mathbf{R}}^{-1}(k-1) - \frac{\hat{\mathbf{R}}^{-1}(k-1)\,\mathbf{x}(k)\,\mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k-1)}{\frac{1-\theta}{\theta} + \mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k-1)\,\mathbf{x}(k)}\right], \qquad (34) $$
where $\theta$ is a weight factor. The LMSQN algorithm updates matrix $\hat{\mathbf{R}}^{-1}(k)$ by using the approach in [3], which ensures that $\hat{\mathbf{R}}^{-1}(k)$ remains positive definite and bounded for a bounded input signal. As a result, the estimate $\hat{\mathbf{R}}^{-1}(k)$ is obtained as
$$ \hat{\mathbf{R}}^{-1}(k) = \hat{\mathbf{R}}^{-1}(k-1) + \left(\frac{\nu}{2\,\mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k-1)\,\mathbf{x}(k)} - 1\right)\nu\;\frac{\hat{\mathbf{R}}^{-1}(k-1)\,\mathbf{x}(k)\,\mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k-1)}{\mathbf{x}^T(k)\,\hat{\mathbf{R}}^{-1}(k-1)\,\mathbf{x}(k)}. \qquad (35) $$
These algorithms have been analyzed in depth through theory and simulations in [2,3]. Although the estimate obtained by the LMSN is accurate enough, it is not free of possible unstable behavior. On the other hand, the LMSQN guarantees stability but can lead to poorer estimates of $\mathbf{R}^{-1}$.
Since the LMSN and LMSQN algorithms update the filter coefficients in the same manner, both utilizing estimates of $\mathbf{R}^{-1}$, the excess MSE is also the same [3]. Hence, as proposed in [14], the excess MSE at steady-state for both the DS-LMSN and DS-LMSQN algorithms can be written as the following approximation
$$ \xi_{\mathrm{exc}}(k) \approx \frac{\nu\,P_{\mathrm{up}}}{2 - \nu\,P_{\mathrm{up}}}\,\sigma_n^2, \quad k \to \infty. \qquad (36) $$
This result follows a procedure similar to the one described in [18], in which the coefficient updates of the DS-LMSN and DS-LMSQN algorithms are equivalent in terms of their expected values. Hence, using this update and following the theoretical analysis in [2], we obtain an approximation of the excess MSE at steady-state for both algorithms. Thus, the value of $\rho$ can be obtained from expression (36).
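For instance, under the assumption that Equation (36) holds, the value of $\rho$ used to set the threshold in the system identification scenario can be computed as in the short sketch below (our own illustration):

def rho_lmsn(nu, p_up):
    """rho = nu * P_up / (2 - nu * P_up), so that sigma_e^2 = (1 + rho) * sigma_n^2 in Eq. (11)."""
    return nu * p_up / (2.0 - nu * p_up)

# Example: nu = 0.1 and P_up = 0.4 give rho ≈ 0.0204.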
The steps of both DS-LMSN and DS-LMSQN algorithms are summarized in Algorithm 1, where the quantities t ( k ) and ψ ( k ) are included to simplify some steps in the computations.
Algorithm 1 Data-Selective LMSN and LMSQN algorithms
DS-LMSN and DS-LMSQN algorithms
 Initialize
0 < ν ≤ 1, 0 < θ ≤ 1 (for LMSN), γ = small positive constant,
w(0) = random or zero vector and R̂^{-1}(0) = γ I_{N+1}
 Prescribe P_up and choose τ_max
τ = sqrt(1 + ρ) Q^{-1}(P_up / 2)
 For prediction and equalizer use ρ = 0
 For system identification use ρ = ν P_up / (2 - ν P_up)
 Do for k > 0
  acquire x(k) and d(k)
   e(k) = d(k) - w^T(k) x(k)
   δ(k) = 0, if -τ ≤ e(k)/σ_e ≤ τ;  0, if |e(k)|/σ_e ≥ τ_max;  1, otherwise
  if δ(k) = 0
    w(k+1) = w(k)
   if |e(k)|/σ_e ≥ τ_max
     e(k) = 0
     d(k) = 0
    end if
  else
    t(k) = R̂^{-1}(k-1) x(k)
    ψ(k) = x^T(k) t(k)
    w(k+1) = w(k) + ν e(k) t(k) / ψ(k)
    R̂^{-1}(k) = [ R̂^{-1}(k-1) - t(k) t^T(k) / ((1-θ)/θ + ψ(k)) ] / (1-θ),  for LMSN
    R̂^{-1}(k) = R̂^{-1}(k-1) + [ ν (ν/(2ψ(k)) - 1) / ψ(k) ] t(k) t^T(k),  for LMSQN
  end if
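As a complement to the listing above, the following minimal NumPy sketch (our own illustration, not the authors' code) implements one iteration of the DS-LMSN/DS-LMSQN update; the running estimate of σ_e (Equation (16)) is assumed to be maintained outside the function:

import numpy as np

def ds_lms_newton_step(w, R_inv, x, d, sigma_e, tau, tau_max,
                       nu=0.05, theta=9e-4, quasi_newton=False):
    """Returns updated (w, R_inv, e); skips the update for non-innovative data or outliers."""
    e = d - w @ x                          # a priori error
    ratio = abs(e) / sigma_e
    if ratio < tau or ratio >= tau_max:    # delta(k) = 0: no coefficient update
        return w, R_inv, (0.0 if ratio >= tau_max else e)
    t = R_inv @ x                          # t(k) = R^{-1}(k-1) x(k)
    psi = x @ t                            # psi(k) = x^T(k) t(k)
    w = w + nu * e * t / psi               # Equation (33)
    if quasi_newton:                       # LMSQN inverse-Hessian update, Equation (35)
        R_inv = R_inv + nu * (nu / (2.0 * psi) - 1.0) / psi * np.outer(t, t)
    else:                                  # LMSN inverse-Hessian update, Equation (34)
        R_inv = (R_inv - np.outer(t, t) / ((1.0 - theta) / theta + psi)) / (1.0 - theta)
    return w, R_inv, e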

3.2. Online Conjugate Gradient

Minimizing the objective function in (30) is equivalent to
$$ \min_{\mathbf{w}}\; \tfrac{1}{2}\,\mathbf{w}^T(k+1)\,\mathbf{R}\,\mathbf{w}(k+1) - \mathbf{p}^T\,\mathbf{w}(k+1) \qquad (37) $$
in which $\mathbf{R} = \mathrm{E}[\mathbf{x}(k)\mathbf{x}^T(k)]$ is the $N \times N$ autocorrelation matrix of the input signal and $\mathbf{p} = \mathrm{E}[d(k)\,\mathbf{x}(k)]$ is the cross-correlation vector between the input and reference signals. Equivalently, our goal is to solve the linear equation
$$ \mathbf{R}\,\mathbf{w}(k+1) = \mathbf{p}. \qquad (38) $$
The CG method can solve this problem by expressing the solution
$$ \mathbf{w}_o = \sum_{i=0}^{N-1} \alpha(i)\,\mathbf{c}(i) \qquad (39) $$
in a basis formed by a set of vectors $\mathbf{c}(i)$, $i \in \{0, \dots, N-1\}$, that present $\mathbf{R}$-conjugacy, that is, $\mathbf{c}^T(i)\,\mathbf{R}\,\mathbf{c}(j) = 0$ for all $i \neq j$. By premultiplying Equation (39) by $\mathbf{c}^T(k)\,\mathbf{R}$ and using the conjugacy definition:
$$ \mathbf{c}^T(k)\,\mathbf{R}\,\mathbf{w}_o = \mathbf{c}^T(k)\,\mathbf{R}\sum_{i=0}^{N-1}\alpha(i)\,\mathbf{c}(i) = \sum_{i=0}^{N-1}\alpha(i)\,\mathbf{c}^T(k)\,\mathbf{R}\,\mathbf{c}(i) = \alpha(k)\,\mathbf{c}^T(k)\,\mathbf{R}\,\mathbf{c}(k). \qquad (40) $$
By replacing $\mathbf{R}\,\mathbf{w}_o = \mathbf{p}$ in (40), we obtain an expression for the constant $\alpha$ at the $k$th iteration:
$$ \alpha(k) = \frac{\mathbf{c}^T(k)\,\mathbf{p}}{\mathbf{c}^T(k)\,\mathbf{R}\,\mathbf{c}(k)}. \qquad (41) $$
Equation (39) can be evaluated as an iterative process in which a portion α ( k ) c ( k ) is added at the kth step:
$$ \mathbf{w}(k+1) = \mathbf{w}(k) + \alpha(k)\,\mathbf{c}(k). \qquad (42) $$
As observed in [19], the estimates of the matrix $\mathbf{R}$ and the vector $\mathbf{p}$ can both be computed using an exponentially decaying window, giving rise to Equations (43) and (44), respectively:
$$ \mathbf{R}(k) = \lambda\,\mathbf{R}(k-1) + \mathbf{x}(k)\,\mathbf{x}^T(k) \qquad (43) $$
$$ \mathbf{p}(k) = \lambda\,\mathbf{p}(k-1) + d(k)\,\mathbf{x}(k) \qquad (44) $$
Both estimates are also employed in the RLS algorithm, where $\lambda$ represents a forgetting factor.
By applying the line search method as in [19], another expression for $\alpha(k)$ can be obtained:
$$ \alpha(k) = \eta\,\frac{\mathbf{c}^T(k)\,\mathbf{g}(k-1)}{\mathbf{c}^T(k)\,\mathbf{R}\,\mathbf{c}(k)} \qquad (45) $$
with $(\lambda - 0.5) \le \eta \le \lambda$ to assure convergence. From Equations (42)–(44), we can obtain another expression for the negative gradient $\mathbf{g}(k)$:
$$
\begin{aligned}
\mathbf{g}(k) &= \mathbf{p}(k) - \mathbf{R}(k)\,\mathbf{w}(k+1) \\
&= \lambda\,\mathbf{p}(k-1) + d(k)\,\mathbf{x}(k) - \left[\lambda\,\mathbf{R}(k-1) + \mathbf{x}(k)\,\mathbf{x}^T(k)\right]\left[\mathbf{w}(k) + \alpha(k)\,\mathbf{c}(k)\right] \\
&= \lambda\,\mathbf{g}(k-1) - \alpha(k)\,\mathbf{R}(k)\,\mathbf{c}(k) + \mathbf{x}(k)\,e(k)
\end{aligned}
\qquad (46)
$$
where $e(k) = d(k) - \mathbf{x}^T(k)\,\mathbf{w}(k)$.
The next conjugate direction c ( k + 1 ) can be obtained as the current negative gradient g ( k ) corrected by a term comprising a linear combination of the previous direction vectors:
$$ \mathbf{c}(k+1) = \mathbf{g}(k) + \beta(k)\,\mathbf{c}(k) \qquad (47) $$
in which
$$ \beta(k) = \frac{\left(\mathbf{g}(k) - \mathbf{g}(k-1)\right)^T\mathbf{g}(k)}{\mathbf{g}^T(k-1)\,\mathbf{g}(k-1)} \qquad (48) $$
is a constant calculated to guarantee $\mathbf{R}$-conjugacy and improve performance as well.
As analyzed in [15], the CG and RLS algorithms are equivalent in steady-state and hence the excess MSE is also equivalent,
$$ \xi_{\mathrm{exc}}(k) \approx \frac{(N+1)\,P_{\mathrm{up}}\,(1-\lambda)}{2 - P_{\mathrm{up}}\,(1-\lambda)}\,\sigma_n^2, \quad k \to \infty, \qquad (49) $$
whose derivation is detailed in [1]. Thus, the value of $\rho$ for the DS-CG algorithm can be obtained from Equation (49), analogously to Equation (36). As a result, we obtain the DS-CG algorithm summarized in Algorithm 2.
Algorithm 2 Data-Selective Conjugate Gradient algorithm
DS-CG algorithm
 Initialize
λ, η with (λ - 0.5) ≤ η ≤ λ, w(0) = random or zero vector
R(0) = I, g(0) = c(1) = zeros(N+1, 1), γ = small constant for regularization
 Prescribe P_up and choose τ_max
τ = sqrt(1 + ρ) Q^{-1}(P_up / 2)
 For prediction and equalizer use ρ = 0
 For system identification use ρ = (N + 1) P_up (1 - λ) / (2 - P_up (1 - λ))
 Do for k > 0
  acquire x(k) and d(k)
   e(k) = d(k) - w^T(k) x(k)
   δ(k) = 0, if -τ ≤ e(k)/σ_e ≤ τ;  0, if |e(k)|/σ_e ≥ τ_max;  1, otherwise
  if δ(k) = 0
    w(k+1) = w(k)
   if |e(k)|/σ_e ≥ τ_max
     e(k) = 0
     d(k) = 0
   end if
  else
    R(k) = λ R(k-1) + x(k) x^T(k)
    α(k) = η c^T(k) g(k-1) / [ c^T(k) R(k) c(k) + γ ]
    w(k+1) = w(k) + α(k) c(k)
    g(k) = λ g(k-1) - α(k) R(k) c(k) + x(k) e(k)
    β(k) = [ g(k) - g(k-1) ]^T g(k) / [ g^T(k-1) g(k-1) + γ ]
    c(k+1) = g(k) + β(k) c(k)
  end if
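The following minimal NumPy sketch (our own illustration, not the authors' code) implements one DS-CG iteration following Algorithm 2; the default values of λ and η follow the simulations reported below, and the σ_e estimate is assumed to be maintained externally:

import numpy as np

def ds_cg_step(w, R, g_prev, c, x, d, sigma_e, tau, tau_max,
               lam=0.98, eta=0.48, gamma=1e-8):
    """Returns updated (w, R, g, c); skips the update for non-innovative data or outliers."""
    e = d - w @ x
    ratio = abs(e) / sigma_e
    if ratio < tau or ratio >= tau_max:          # delta(k) = 0: keep the coefficients
        return w, R, g_prev, c
    R = lam * R + np.outer(x, x)                 # Equation (43)
    alpha = eta * (c @ g_prev) / (c @ R @ c + gamma)   # Equation (45), regularized
    w = w + alpha * c                            # Equation (42)
    g = lam * g_prev - alpha * (R @ c) + x * e   # Equation (46)
    beta = ((g - g_prev) @ g) / (g_prev @ g_prev + gamma)  # Equation (48)
    c = g + beta * c                             # Equation (47)
    return w, R, g, c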

4. Simulation Results

In this section, we present simulations utilizing both synthetic and real-world data for the algorithms explained in the previous section in order to verify the impact on the performance when the data selection method is applied. Moreover, the desired probability of updating P up is varied between 0% and 100%, and it is compared to the measured probability of update P ^ up .

4.1. Simulation 1: Equalizer

In this subsection, the channel we want to equalize is one of the FIR channel impulse responses provided by the Signal Processing Information Base repository [20]. The complex channel taps were obtained from digital microwave radio measurements, and the FIR model frequency response is illustrated in black in Figure 2a. The transmitted signal $s(k)$, modeled as realizations of a Gaussian random variable with $\sigma_s^2 = 1$, traverses the channel and is corrupted by additive Gaussian noise with $\sigma_n^2 = 10^{-3}$. The adaptive filter performs the equalization, and its output is an equalized version of $s(k)$. The complex version of each data-selective algorithm is applied, and the resulting frequency responses attempt to invert the channel behavior, as illustrated in Figure 2a for $P_{\mathrm{up}} = 1$ and $P_{\mathrm{up}} = 0.45$. We used $\theta = 9 \times 10^{-4}$, $\gamma = 1$, and $\nu = 0.05$ for both the LMSN and LMSQN; for the CG, we used $\lambda = 0.9995$ and $\eta = 0.48$. The filter order is $N = 100$. The error variance was estimated as in Equation (16) with $b = 0.9999$. Since the channel coefficients are complex-valued, the threshold is computed as
$$ \tau = \sqrt{2\,(1+\rho)\,\ln\!\left(\frac{1}{P_{\mathrm{up}}}\right)} \qquad (50) $$
where $\rho = 0$ [1]. The probability of update obtained by each algorithm is quite close to the prescribed $P_{\mathrm{up}}$, as depicted in Figure 2b. As can be seen in Figure 3, it is possible to recover the transmitted signal using only 45% of the input data with almost the same accuracy obtained when 100% of the input data are used.
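A one-line sketch (ours) of this complex-case threshold, assuming Equation (50) as reconstructed above:

import numpy as np

def complex_threshold(p_up, rho=0.0):
    """Equation (50): tau = sqrt(2 (1 + rho) ln(1 / P_up)), valid for 0 < P_up <= 1."""
    return np.sqrt(2.0 * (1.0 + rho) * np.log(1.0 / p_up))

# Example: P_up = 0.45 and rho = 0 give tau ≈ 1.26.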

4.2. Simulation 2: Prediction

The dataset used in this subsection is taken from anemometer readings provided by Google's RE<C Initiative [21]. The data consist of wind speeds recorded by five sensors on 25 May 2011. The dataset is split into 40 sets of size 8192 in order to use the Monte Carlo method.
The MSE performance is verified in Figure 4a using $P_{\mathrm{up}} = 0.4$. The parameters for the DS-CG algorithm are $\lambda = 0.98$ and $\eta = 0.48$; both the DS-LMSN and DS-LMSQN algorithms employ $\nu = 0.1$, and the DS-LMSN utilizes $\theta = 0.1$. All algorithms obtain similar convergence over 40 independent runs, but the DS-LMSQN algorithm achieves better performance due to its faster convergence to steady state. The employed adaptive filter order is $N = 7$.
In Figure 4b, the prescribed and observed probabilities of update are compared. We can observe that all DS algorithms obtained an observed probability of update close to the prescribed one.
The output of the prediction is illustrated in Figure 5a for $P_{\mathrm{up}} = 0.4$ and in Figure 5b for $P_{\mathrm{up}} = 0.7$, between iterations 8000 and 8150. In both cases, acceptable prediction performance was observed, leading us to conclude that, even with a reduced number of updates, the data-selection algorithms achieve accurate predictions.

4.3. Simulation 3: System Identification

In this simulation, our problem is to identify an unknown channel impulse response, described as:
$$ \mathbf{h} = \left[\,0.1010\;\;\;0.3030\;\;\;0\;\;\;0.2020\;\;\;0.4040\;\;\;0.7071\;\;\;0.4040\;\;\;0.2020\,\right]^T. $$
The unknown system output is written as $d(k) = \mathbf{h}^T\,\mathbf{x}(k) + n(k)$, where $n(k)$ is Gaussian noise with zero mean and variance $\sigma_n^2 = 10^{-3}$. We consider two cases of input signal: a first-order and a fourth-order AR process, given by
$$
\begin{aligned}
x(k) &= 0.88\,x(k-1) + n_1(k), \\
x(k) &= -0.55\,x(k-1) - 1.221\,x(k-2) - 0.49955\,x(k-3) - 0.4536\,x(k-4) + n_2(k),
\end{aligned}
$$
where $n_1(k)$ and $n_2(k)$ are samples of Gaussian noise uncorrelated with the measurement noise $n(k)$. The variances $\sigma_{n_1}^2$ and $\sigma_{n_2}^2$ are set such that the input signal has unit variance. The parameters employed in the system identification problem are $\lambda = 0.98$ and $\eta = 0.48$ for the conjugate gradient, $\nu = 0.1$ for both the DS-LMSN and DS-LMSQN, and $\theta = 0.1$ for the DS-LMSN. The filter order is $N = 7$, matching the length of the unknown channel, so that the filter coefficients can converge to the optimal coefficients. The generation of the AR input signals is sketched below.
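A small sketch (ours, not the authors' code) of the AR input generation, using the coefficient signs as reconstructed above and normalizing the output to unit variance:

import numpy as np
from scipy.signal import lfilter

def ar_input(ar_coeffs, num_samples, seed=0):
    """Generates x(k) = a1 x(k-1) + ... + ap x(k-p) + n(k), then rescales to unit variance."""
    rng = np.random.default_rng(seed)
    driving_noise = rng.standard_normal(num_samples)
    denominator = np.concatenate(([1.0], -np.asarray(ar_coeffs, dtype=float)))
    x = lfilter([1.0], denominator, driving_noise)
    return x / np.std(x)   # equivalent, up to scaling, to choosing the driving-noise variance

x_first_order = ar_input([0.88], 8192)
x_fourth_order = ar_input([-0.55, -1.221, -0.49955, -0.4536], 8192)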
The learning curves of the algorithms are compared in Figure 6a for a prescribed probability of update $P_{\mathrm{up}} = 0.4$ and the first-order AR input signal. It can be noted that all algorithms achieve good performance even with a reduced number of updates. The DS-CG algorithm attains better performance than the DS-LMSN and DS-LMSQN since it converges faster to the steady state. The observed $\hat{P}_{\mathrm{up}}$ and the prescribed $P_{\mathrm{up}}$ probabilities of update are depicted in Figure 6b. In all cases, these values are close, except at low values of $P_{\mathrm{up}}$, where we obtained slightly more updates than prescribed. By using the fourth-order AR process as the input signal in Figure 7, the results are similar to those of the first-order AR case, confirming the expected robustness with respect to the statistical properties of the input signal, as long as the rank of its autocorrelation matrix does not become too small.
In another example utilizing the fourth-order AR input signal, we included an outlier signal affecting the reference signal 1% of the time with an amplitude equal to five. The desired $P_{\mathrm{up}} = 0.3$ was set, and we measured the misalignment of the adaptive filter coefficients, defined as $\|\mathbf{w}(k) - \mathbf{w}_o\|/\|\mathbf{w}_o\|$, where $\mathbf{w}_o$ represents the optimal vector of coefficients, for the algorithms discussed in the paper. As observed in Table 1, the misalignment is higher when the outliers are ignored ($\tau_{\max}$ not employed), and the misalignment achieved with $P_{\mathrm{up}} = 0.1$ while addressing the outliers is close to the one achieved with $P_{\mathrm{up}} = 0.3$ under the same condition. It is also possible to verify that the data-selective solutions approach the solution obtained when the algorithms are updated all the time.

5. Conclusions

In this work, the data-selective versions of the LMSN, LMSQN and CG algorithms were explored in different applications. The key idea is to provide a systematic way to prescribe the probability of update through a simple statistical model, in which the environment data are classified as innovative, non-innovative, or outliers. Data selection can not only reduce the computational complexity but also enhance the estimation accuracy when outliers are present. Simulation results on both real and synthetic data show that the data-selection strategy works for all considered types of adaptive filtering applications. Future work will address the extension of the data-selection approach to a broader class of learning algorithms, as well as its effectiveness in distributed adaptive networks.

Author Contributions

The authors worked jointly on the concepts, validation, writing, and simulations.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001 and CNPq Universal 431381/2016-0. This work was also supported by the research council FAPERJ (Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro). This work was supported in part by FNR, Luxembourg under the project CORE ECLECTIC.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Diniz, P.S.R. On Data-Selective Adaptive Filtering. IEEE Trans. Signal Process. 2018, 66, 4239–4252. [Google Scholar] [CrossRef]
  2. Diniz, P.S.R.; de Campos, M.L.R.; Antoniou, A. Analysis of LMS-Newton adaptive filtering algorithms with variable convergence factor. IEEE Trans. Signal Process. 1995, 43, 617–627. [Google Scholar] [CrossRef]
  3. De Campos, M.L.R.; Antoniou, A. A new quasi-Newton adaptive filtering algorithm. IEEE Trans. Circuits Syst. II Analog. Digit. Signal Process. 1997, 44, 924–934. [Google Scholar] [CrossRef]
  4. Antoniou, A.; Lu, W.S. Practical Optimization—Algorithms and Engineering Applications; Springer: New York, NY, USA, 2007. [Google Scholar]
  5. Fletcher, R. Practical Methods of Optimization, 2nd ed.; John Wiley & Sons: Cornwall, UK, 2013. [Google Scholar]
  6. Apolinário, J.A.; de Campos, M.L.R.; Bernal O, C.P. The constrained conjugate gradient algorithm. IEEE Signal Process. Lett. 2000, 7, 351–354. [Google Scholar] [CrossRef]
  7. Hull, A.W.; Jenkins, W.K. Preconditioned conjugate gradient methods for adaptive filtering. In Proceedings of the IEEE International Sympoisum on Circuits and Systems, Singapore, 11–14 June 1991; pp. 540–543. [Google Scholar]
  8. Chen, Z.; Li, H.; Rangaswamy, M. Conjugate gradient adaptive matched filter. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 178–191. [Google Scholar] [CrossRef]
  9. Zhang, M.; Zhang, A.; Yang, Q. Robust Adaptive Beamforming Based on Conjugate Gradient Algorithms. IEEE Trans. Signal Process. 2016, 4, 6046–6057. [Google Scholar] [CrossRef]
  10. Marshall, D.F.; Jenkins, W.K. A fast quasi-Newton adaptive filtering algorithm. IEEE Trans. Signal Process. 1992, 40, 1652–1662. [Google Scholar] [CrossRef]
  11. Glentis, G.; Berberidis, K.; Theodoridis, S. Efficient least squares adaptive algorithms for FIR transversal filtering. IEEE Signal Process. Mag. 1999, 16, 13–41. [Google Scholar] [CrossRef]
  12. Farhang-Boroujeny, B. Fast LMS/Newton algorithms based on autoregressive modeling and their application to acoustic echo cancellation. IEEE Trans. Signal Process. 1997, 45, 1987–2000. [Google Scholar] [CrossRef]
  13. Albu, F.; Paleologu, C. The Variable Step-Size Gauss-Seidel Pseudo Affine Projection Algorithm. Int. J. Math. Comput. Phys. Electr. Comput. Eng. 2009, 3, 27–30. [Google Scholar]
  14. Tsinos, C.G.; Diniz, P.S.R. Data-Selective Lms-Newton And Lms-Quasi-Newton Algorithms. Unpublished work. 2019; under review. [Google Scholar]
  15. Diniz, P.S.R.; Mendonça, M.O.K.; Ferreira, J.O.; Ferreira, T.N. Data-Selective Conjugate Gradient Algorithm. In Proceedings of the Eusipco: European Signal Processing Conference, Rome, Italy, 3–7 September 2018. [Google Scholar]
  16. Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes; McGraw-Hill Education: New York, NY, USA, 2002. [Google Scholar]
  17. Miller, S.; Childers, D. Probability, and Random Processes, 2nd ed.; Academic Press: Oxford, UK, 2012. [Google Scholar]
  18. Lima, M.; Diniz, P. Steady-state MSE performance of the set-membership affine projection algorithm. Circuits Syst. Signal Process. 2013, 32, 1811–1837. [Google Scholar] [CrossRef]
  19. Chang, P.S.; Willson, A.N., Jr. Analysis of conjugate gradient algorithms for adaptive filtering. IEEE Trans. Signal Process. 2000, 48, 409–418. [Google Scholar] [CrossRef]
  20. SPIB. Signal Processing Information Base. Available online: http://spib.linse.ufsc.br/microwave.html (accessed on 29 November 2018).
  21. Google. RE < C: Surface Level Wind Data Collection, Google Code. Available online: http://code.google.com/p/google-rec-csp/ (accessed on 29 November 2018).
Figure 1. Data selection strategy.
Figure 2. Simulation 1: (a) Frequency response of the channel and the data-selective filters (b) Comparison between the desired P up and achieved P ^ up L M S N , P ^ up L M S Q N and P ^ up C G by the data-selection algorithms.
Figure 3. Simulation 1: Comparison between the transmitted and the recovered signal by the DS-CG, DS-LMSQN and DS-LMSN algorithms for (a) P up = 0.45 and (b) P up = 1 .
Figure 4. Simulation 2: (a) Learning curves for the data selection and (b) Comparison between the desired P up and achieved P ^ up L M S N , P ^ up L M S Q N and P ^ up C G by the data-selection algorithms.
Figure 5. Simulation 2: Comparison between the desired signal and the predicted by the DS-CG, DS-LMSQN and DS-LMSN algorithms for (a) P up = 0.4 and (b) P up = 0.7 .
Figure 6. Simulation 3 for first-order AR input signal: (a) Learning curves for the data selection and (b) Comparison between the desired P up and achieved P ^ up L M S N , P ^ up L M S Q N and P ^ up C G by the data-selection algorithms.
Figure 7. Simulation 3 for fourth-order AR input signal: (a) Learning curves for the data selection and (b) Comparison between the desired P up and achieved P ^ up L M S N , P ^ up L M S Q N and P ^ up C G by the data-selection algorithms.
Table 1. Misalignment with outliers, in dB.

Outlier                           Yes       Yes       Yes       No
τ_max on                          yes       no        yes       no
P_up                              0.3       0.3       0.1       1
Average misalignment (dB):
  DS-CG                           −33.29    −15.25    −30.47    −33.37
  DS-LMSQN                        −32.75    −15.42    −32.45    −32.80
  DS-LMSN                         −31.17    −13.81    −30.39    −31.91
