Article

Low-Complexity Constrained Recursive Kernel Risk-Sensitive Loss Algorithm

School of Computer Science and Engineering, Chongqing Three Gorges University, Wanzhou, Chongqing 404120, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(5), 877; https://doi.org/10.3390/sym14050877
Submission received: 30 March 2022 / Revised: 16 April 2022 / Accepted: 19 April 2022 / Published: 25 April 2022
(This article belongs to the Special Issue Adaptive Filtering and Machine Learning)

Abstract

The constrained recursive maximum correntropy criterion (CRMCC) combats non-Gaussian noise effectively. However, the performance surface of the maximum correntropy criterion (MCC) is highly non-convex, resulting in limited accuracy. Inspired by the smooth kernel risk-sensitive loss (KRSL), a novel constrained recursive KRSL (CRKRSL) algorithm is proposed, which shows higher filtering accuracy and lower computational complexity than CRMCC. Meanwhile, a modified update strategy is developed to avoid the instability of CRKRSL in the early iterations. By using Isserlis's theorem to separate the complex symmetric matrix with fourth-moment variables, the mean square stability condition of CRKRSL is derived, and the simulation results validate its advantages.

1. Introduction

Constrained adaptive filters (CAFs) [1], in which the weight vector is subject to linear constraints, have been widely studied in the field of adaptive signal processing. The original research on CAFs arose from antenna array processing, where the linearly-constrained minimum-variance (LCMV) criterion was employed to estimate the direction of the antenna array [2]. CAFs have since been successfully applied to adaptive beamforming [3], system identification [4], channel equalization [5], and blind multiuser detection [6].
The simplest linearly-constrained adaptive filter, the constrained least mean-square (CLMS) algorithm [2], is developed from the LCMV criterion, and its mean square performance is analyzed based on a decomposable symmetric matrix in [7]. Owing to stochastic gradient optimization, CLMS has a simple structure with low computational complexity. However, its performance is strongly influenced by the step size and by correlated inputs. To improve the convergence speed, the constrained fast least-squares (CFLS) algorithm [8], the linear-equality-constrained recursive least-squares (CRLS) algorithm [9] and its relaxed version have been proposed at the expense of high computational complexity. Furthermore, the reduced-complexity constrained recursive least-squares algorithm based on dichotomous coordinate descent (CRLS-DCD) iterations [10] and the low-complexity constrained affine-projection (CAP) algorithm [11] with a data-selection method have been proposed to reduce the computational complexity effectively. Other types of improved constrained filters [12,13,14] have also been widely used, making a trade-off between computational complexity and filtering performance, i.e., convergence speed and filtering accuracy. All the constrained algorithms mentioned above are developed from the mean square error (MSE) criterion [15] and perform well under Gaussian assumptions. However, in non-Gaussian cases, their filtering accuracy declines sharply.
Therefore, the maximum correntropy criterion (MCC) [16,17], generalized MCC (GMCC) [18] and minimum error entropy (MEE) [19] criteria from information theoretic learning (ITL) [20] have become alternative criteria, showing strong robustness to non-Gaussian signals. By adding linear constraints on the weights under MCC and GMCC, the constrained MCC (CMCC) [21] and constrained GMCC (CGMCC) [22] algorithms have been developed using stochastic gradient optimization. CMCC and CGMCC display good filtering performance in the presence of single-peak heavy-tailed noise. However, when coping with multi-peak noise, their performance declines. Due to the symmetry of the errors, MEE counteracts the influence of multi-peak noise effectively. By adding linear constraints to MEE, a gradient-based constrained MEE (CMEE) algorithm [23] with a sliding window has been proposed, which has higher complexity but better accuracy than CMCC and CGMCC in the presence of multi-peak noise. Besides the ITL-based constrained filters, other criteria [24,25] also show good performance in non-Gaussian environments. A constrained least mean M-estimation (CLMM) algorithm based on an improved M-estimation loss function has been proposed in [24]. Inspired by the boundedness of the gradient of the lncosh function, a constrained least lncosh adaptive filtering (CLLAF) algorithm has been developed in [25]. These constrained algorithms show good performance under different types of non-Gaussian noise. However, the gradient-based constrained adaptive filters need to make a trade-off between accuracy and convergence speed through an adjustable step size. Hence, the constrained recursive MCC (CRMCC) algorithm [26] was developed, which not only improves the accuracy but also accelerates the convergence.
Recently, an advanced ITL-based criterion, named the kernel risk-sensitive loss (KRSL) [27], has been proposed by introducing a beneficial risk-sensitive parameter to regulate the shape of its performance surface. Compared with MCC, the KRSL is more "convex", which leads to better accuracy and faster convergence. Based on the KRSL, a gradient-based constrained mixture KRSL algorithm [28] has been proposed, which achieves higher filtering accuracy than CMCC.
In this paper, thanks to the advantages of the KRSL criterion, a novel constrained recursive KRSL (CRKRSL) algorithm is proposed by using an average approximation method [29]. With this approximation, the proposed CRKRSL algorithm is converted into a variable step-size gradient-based algorithm, which has lower complexity than the traditional constrained recursive algorithms. Meanwhile, due to the instability of CRKRSL at the initial update stage, we use a gradient algorithm with a fixed step size to replace its initial update. Moreover, the mean square stability condition with respect to the number of iterations is derived by decomposing a symmetric matrix with fourth-order variables. Simulation results indicate the advantages of CRKRSL in terms of filtering accuracy and computational complexity.
The rest of the paper is organized as follows. The constrained KRSL loss and the CRKRSL algorithm are presented in Section 2. Stability analysis of CRKRSL is given in Section 3. Simulation results and discussion of CRKRSL are shown in Section 4. Finally, the conclusion is given in Section 5.

2. CRKRSL Algorithm

2.1. Notations

Throughout this paper, $\mathbb{R}$ denotes the real field, $\mathbb{R}^m$ denotes the $m$-dimensional real-valued vector space, and $\mathbb{R}^{n \times m}$ denotes the set of $n \times m$ matrices whose entries belong to $\mathbb{R}$; $(\cdot)^T$ represents the transpose operation; $E[\cdot]$ is the expectation operation; $\|\cdot\|$ denotes the Euclidean norm; and $O(\cdot)$ represents the computational complexity of an algorithm.

2.2. KRSL Loss

As a nonlinear similarity measure between variables $X, Y \in \mathbb{R}$, the KRSL [27] is defined as
$$S(X,Y) = \frac{1}{\gamma} E\!\left[\exp\!\left(\frac{\gamma}{2}\,\|\varphi(X)-\varphi(Y)\|^2\right)\right] = \frac{1}{\gamma} E\!\left[\exp\!\left(\gamma\left(1-\kappa(X-Y)\right)\right)\right], \qquad (1)$$
where $\gamma \in [0, +\infty)$ is a risk-sensitive parameter. $\varphi(\cdot)$ is the corresponding mapping into the reproducing kernel Hilbert space (RKHS) [30], satisfying $\varphi^T(X)\varphi(Y) = \kappa(X-Y)$ with the Gaussian kernel
$$\kappa(X-Y) = \exp\!\left(-\frac{(X-Y)^2}{2\sigma^2}\right). \qquad (2)$$
Generally, since the joint distribution of $X$ and $Y$ is unknown, we adopt a finite set of sampled data pairs $\{\tilde{x}_l, \tilde{y}_l\}_{l=1}^{N}$ to approximate the expected loss, which is given by
$$\hat{S}(X,Y) = \frac{1}{\gamma N}\sum_{l=1}^{N}\exp\!\left(\gamma\left(1-\kappa(\tilde{x}_l-\tilde{y}_l)\right)\right). \qquad (3)$$
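For illustration, the empirical loss in Equation (3) can be evaluated with a few lines of code; the following Python sketch is ours (the function names and test data are arbitrary), not part of the original KRSL formulation.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel kappa(e) = exp(-e^2 / (2 sigma^2)), Equation (2)."""
    return np.exp(-e**2 / (2.0 * sigma**2))

def empirical_krsl(x, y, gamma, sigma):
    """Empirical KRSL of Equation (3): (1/(gamma*N)) * sum_l exp(gamma*(1 - kappa(x_l - y_l)))."""
    e = np.asarray(x) - np.asarray(y)
    return np.mean(np.exp(gamma * (1.0 - gaussian_kernel(e, sigma)))) / gamma

# Example: the loss grows with the mismatch but saturates for gross outliers.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(empirical_krsl(x, x + 0.1 * rng.standard_normal(1000), gamma=2.5, sigma=2.0))
print(empirical_krsl(x, x + 10.0 * rng.standard_normal(1000), gamma=2.5, sigma=2.0))
```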

2.3. Constrained KRSL Loss

When applied to constrained adaptive filtering, the optimization problem with the KRSL loss becomes
$$\min_{\mathbf{w}} \frac{1}{\gamma}\sum_{l=1}^{n}\exp\!\left(\gamma\left(1-\kappa(d_l-\mathbf{w}^T\mathbf{u}_l)\right)\right) \quad \mathrm{s.t.} \quad \mathbf{C}^T\mathbf{w}=\mathbf{f}, \qquad (4)$$
where $\{\mathbf{u}_l, d_l\}_{l=1}^{N} \in \mathbb{R}^m \times \mathbb{R}$ are the input–output training data pairs; $\mathbf{w} \in \mathbb{R}^m$ is the weight vector and $e_l = d_l - \mathbf{w}^T\mathbf{u}_l$ is the corresponding error; $\mathbf{C} \in \mathbb{R}^{m \times q}$ and $\mathbf{f} \in \mathbb{R}^{q}$ are the constraint matrix and vector, respectively. By constructing the Lagrange function, the constrained problem is transformed into minimizing the following constrained KRSL loss:
$$J_c(n) = \frac{\sigma^2}{\gamma}\sum_{l=1}^{n}\exp\!\left(\gamma\left(1-\kappa(d_l-\mathbf{w}^T\mathbf{u}_l)\right)\right) + \boldsymbol{\theta}_n^T\left(\mathbf{f}-\mathbf{C}^T\mathbf{w}\right) \qquad (5)$$
with $\boldsymbol{\theta}_n \in \mathbb{R}^{q}$ being the Lagrange multiplier.

2.4. Proposed CRKRSL Algorithm

Setting the gradient of $J_c(n)$ with respect to $\mathbf{w}_n$ to zero at instant $n$, one has
$$\frac{\partial J_c(n)}{\partial \mathbf{w}_n} = \sum_{l=1}^{n}\phi(e_l)\left(d_l-\mathbf{w}_n^T\mathbf{u}_l\right)\mathbf{u}_l + \mathbf{C}\boldsymbol{\theta}_n = 0 \qquad (6)$$
with $\phi(e_l) = \exp\!\left(\gamma\left(1-\kappa(e_l)\right)\right)\kappa(e_l)$.
Define
$$\mathbf{U}_n^{-1} = \sum_{l=1}^{n-1}\phi(e_l)\mathbf{u}_l\mathbf{u}_l^T + \phi(e_n)\mathbf{u}_n\mathbf{u}_n^T \qquad (7)$$
$$\mathbf{d}_n = \sum_{l=1}^{n-1}\phi(e_l)\mathbf{u}_l d_l + \phi(e_n)\mathbf{u}_n d_n. \qquad (8)$$
Then, the constrained solution is derived as
$$\mathbf{w}_n = \mathbf{U}_n\mathbf{d}_n + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n \qquad (9)$$
with $\boldsymbol{\theta}_n = (\mathbf{C}^T\mathbf{U}_n\mathbf{C})^{-1}(\mathbf{f}-\mathbf{C}^T\mathbf{U}_n\mathbf{d}_n)$. By the matrix inversion lemma [31], Equation (7) is further rewritten as
$$\mathbf{U}_n = \mathbf{U}_{n-1} - \mathbf{g}_n\mathbf{u}_n^T\mathbf{U}_{n-1}, \qquad (10)$$
where the gain is given by
$$\mathbf{g}_n = \frac{\mathbf{U}_{n-1}\mathbf{u}_n}{\phi^{-1}(e_n) + \mathbf{u}_n^T\mathbf{U}_{n-1}\mathbf{u}_n}. \qquad (11)$$
Reorganizing Equation (11), we get another form of $\mathbf{g}_n$:
$$\mathbf{g}_n = \phi(e_n)\left(\mathbf{U}_{n-1}\mathbf{u}_n - \mathbf{g}_n\mathbf{u}_n^T\mathbf{U}_{n-1}\mathbf{u}_n\right) = \phi(e_n)\mathbf{U}_n\mathbf{u}_n. \qquad (12)$$
To obtain a recursive solution, we expand Equation (9) as follows:
$$\begin{aligned} \mathbf{w}_n &= \mathbf{U}_n\left(\mathbf{d}_{n-1} + \phi(e_n)\mathbf{u}_n d_n\right) + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n \\ &= \left(\mathbf{U}_{n-1} - \mathbf{g}_n\mathbf{u}_n^T\mathbf{U}_{n-1}\right)\mathbf{d}_{n-1} + \phi(e_n)\mathbf{U}_n\mathbf{u}_n d_n + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n \\ &= \mathbf{w}_{n-1} + \phi(e_n)\mathbf{U}_n\mathbf{u}_n d_n - \mathbf{g}_n\mathbf{u}_n^T\mathbf{w}_{n-1} + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n \\ &= \mathbf{w}_{n-1} + \phi(e_n)e_n\mathbf{U}_n\mathbf{u}_n + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n, \end{aligned} \qquad (13)$$
where the corrected error $e_n$, computed with the a priori weight $\mathbf{w}_{n-1}$, is
$$e_n = d_n - \mathbf{u}_n^T\mathbf{w}_{n-1}. \qquad (14)$$
Substituting Equation (12) into Equation (13), we get
$$\mathbf{w}_n = \mathbf{w}_{n-1} + \mathbf{g}_n e_n + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n \qquad (15)$$
with
$$\boldsymbol{\theta}_n = \left(\mathbf{C}^T\mathbf{U}_n\mathbf{C}\right)^{-1}\left(\mathbf{f}-\mathbf{C}^T\left(\mathbf{w}_{n-1}+\mathbf{g}_n e_n\right)\right). \qquad (16)$$
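As a concrete sketch (our own code and variable names, not the authors'), one iteration of the exact recursion in Equations (10), (11), (14) and (15) can be written as follows; it makes explicit the $q \times q$ system that must be solved anew at every iteration.

```python
import numpy as np

def crkrsl_exact_step(w_prev, U_prev, u_n, d_n, C, f, gamma, sigma):
    """One iteration of the exact recursive CRKRSL update, Equations (10), (11), (14), (15)."""
    e_n = d_n - u_n @ w_prev                              # a priori error, Eq. (14)
    kappa = np.exp(-e_n**2 / (2.0 * sigma**2))
    phi = np.exp(gamma * (1.0 - kappa)) * kappa           # error weighting phi(e_n)
    Uu = U_prev @ u_n
    g_n = Uu / (1.0 / phi + u_n @ Uu)                     # gain, Eq. (11)
    U_n = U_prev - np.outer(g_n, u_n @ U_prev)            # matrix inversion lemma, Eq. (10)
    w_uc = w_prev + g_n * e_n                             # unconstrained part of Eq. (15)
    # Lagrange multiplier: a q x q system is solved at every iteration, costing O(q^3)
    theta = np.linalg.solve(C.T @ U_n @ C, f - C.T @ w_uc)
    w_n = w_uc + U_n @ C @ theta                          # constrained update, Eq. (15)
    return w_n, U_n
```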
Therefore, combining Equations (10), (11), (14) and (15), we obtain the constrained recursive CRKRSL algorithm. The main drawback of Equation (15) is that the inverse matrix $(\mathbf{C}^T\mathbf{U}_n\mathbf{C})^{-1}$ needs to be updated iteratively with complexity $O(q^3)$. To reduce the computational complexity of the CRKRSL algorithm, we consider the following linear model and make some assumptions. The linear model is described by
$$d_n = \mathbf{w}_*^T\mathbf{u}_n + v_n, \qquad (17)$$
where $\mathbf{w}_*$ is the model parameter and $v_n$ is the noise at instant $n$. The assumptions are given as follows:
  • A1: $\{\mathbf{u}_n\}$ is independent and identically distributed (i.i.d.), generated from a multivariate Gaussian distribution with covariance matrix $\mathbf{R} = E[\mathbf{u}_n\mathbf{u}_n^T]$;
  • A2: $\{v_n\}$ is zero-mean, i.i.d., and independent of $\{\mathbf{u}_n\}$, satisfying $\delta^2 = E[v_n^2]$;
  • A3: the error $e_n$ is uncorrelated with $\mathbf{u}_n\mathbf{u}_n^T$.
Inspired by the average approximation [29] and based on A1–A3, the correlation matrix $\mathbf{U}_n$ is approximated by
$$\mathbf{U}_n = \frac{1}{n}\left(\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l\mathbf{u}_l^T\right)^{-1} \approx \frac{1}{n}\,E\left[\phi(e_n)\right]^{-1}E\left[\mathbf{u}_n\mathbf{u}_n^T\right]^{-1} = \eta_n\mathbf{Z}, \qquad (18)$$
where $\mathbf{Z} = \mathbf{R}^{-1} = E[\mathbf{u}_n\mathbf{u}_n^T]^{-1}$ and $\eta_n = (nE[\phi(v_n)])^{-1}$ is approximated by a first-order Taylor expansion around the noise.
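The accuracy of the approximation in Equation (18) can be probed numerically. The sketch below is a simple illustration under assumptions A1–A2 (the covariance, noise variance and kernel parameters are arbitrary choices of ours), with the errors replaced by the noise samples.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, sigma, gamma = 7, 5000, 2.0, 2.5
R = np.eye(m)                                     # assumed input covariance (A1)
Z = np.linalg.inv(R)

u = rng.multivariate_normal(np.zeros(m), R, size=n)
v = np.sqrt(0.1) * rng.standard_normal(n)         # errors approximated by the noise, e_l ~ v_l

kappa = np.exp(-v**2 / (2 * sigma**2))
phi = np.exp(gamma * (1 - kappa)) * kappa

U_exact = (1.0 / n) * np.linalg.inv((phi[:, None] * u).T @ u / n)   # left-hand side of Eq. (18)
eta_n = 1.0 / (n * phi.mean())                                      # eta_n = (n E[phi(v_n)])^{-1}
rel_err = np.linalg.norm(U_exact - eta_n * Z) / np.linalg.norm(U_exact)
print(rel_err)                                                      # small for large n
```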
Furthermore, Equation (13) can be simplified as
$$\mathbf{w}_n = \mathbf{w}_{n-1} + \phi(e_n)e_n\mathbf{U}_n\mathbf{u}_n + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n = \mathbf{w}_{n-1} + \eta_n\phi(e_n)e_n\mathbf{Z}\mathbf{u}_n + \mathbf{Z}\mathbf{C}\hat{\boldsymbol{\theta}}, \qquad (19)$$
where $\hat{\boldsymbol{\theta}} = \boldsymbol{\Theta}\left(\mathbf{f}-\mathbf{C}^T\left(\mathbf{w}_{n-1}+\eta_n\phi(e_n)e_n\mathbf{Z}\mathbf{u}_n\right)\right)$ and the constrained inverse matrix is defined as $\boldsymbol{\Theta} = (\mathbf{C}^T\mathbf{Z}\mathbf{C})^{-1}$.
Therefore, $\mathbf{w}_n$ is further expressed as
$$\mathbf{w}_n = \mathbf{Q}\left(\mathbf{w}_{n-1} + \eta_n\phi(e_n)e_n\mathbf{Z}\mathbf{u}_n\right) + \mathbf{p} \qquad (20)$$
with $\mathbf{Q} = \mathbf{I} - \mathbf{Z}\mathbf{C}\boldsymbol{\Theta}\mathbf{C}^T$ and $\mathbf{p} = \mathbf{Z}\mathbf{C}\boldsymbol{\Theta}\mathbf{f}$.
Based on this approximation, the recursive CRKRSL algorithm is converted into a gradient-type algorithm with variable step size $\eta_n$ and transformed input $\mathbf{Z}\mathbf{u}_n$.
Remark 1.
The term $\phi(e_n)$ in Equation (20) has a significant impact on the stability of the CRKRSL algorithm under non-Gaussian noise, since CRKRSL can suppress large outliers $e_n$ with a small $\phi(e_n)$. Figure 1 shows the relation between $\phi(e_n)$ and the error $e_n$. It is clear that $\phi(e_n)$ in CRKRSL is larger than $\kappa(e_n)$ in CRMCC when the error is small (note that $\phi(e_n) = \kappa(e_n)$ if $\gamma = 0$). Moreover, when $\gamma = 0$, CRKRSL degenerates to an efficient CRMCC [26] algorithm and $\phi(e_n)$ reaches its maximum at $e_n = 0$. When $\gamma > 0$, $\phi(e_n)$ reaches its maximum at local points around $e_n = 0$ with a larger increment, resulting in a faster convergence speed and higher accuracy than CRMCC.
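The behavior described in Remark 1 and shown in Figure 1 is easy to reproduce; the following sketch (our own, with arbitrary parameter values) plots $\phi(e_n)$ for $\gamma = 0$, which is the CRMCC weight $\kappa(e_n)$, and for $\gamma > 0$.

```python
import numpy as np
import matplotlib.pyplot as plt

def phi(e, gamma, sigma=2.0):
    """phi(e) = exp(gamma*(1 - kappa(e))) * kappa(e); gamma = 0 recovers the CRMCC weight kappa(e)."""
    kappa = np.exp(-e**2 / (2 * sigma**2))
    return np.exp(gamma * (1 - kappa)) * kappa

e = np.linspace(-10, 10, 1001)
for g in (0.0, 2.5, 5.0):
    plt.plot(e, phi(e, g), label=f"gamma = {g}")
plt.xlabel("error e_n")
plt.ylabel("phi(e_n)")
plt.legend()
plt.show()
```

For $\gamma \geq 1$ the weight peaks where $\kappa(e_n) = 1/\gamma$, i.e., at small nonzero errors, while still decaying to zero for large outliers, in agreement with Remark 1.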
However, Equation (20) is not stable in the initial update phase, since the variable step size $\eta_n$ is large when the instant $n$ is small. In particular, when the kernel width of KRSL is small, the step size $\eta_n$ may even exceed the convergence range, leading to degraded filtering performance. To overcome this unfavorable factor, we introduce a gradient strategy with a fixed step size $\mu$ to replace the update of Equation (20) in the initial $L$ iterations, which is described by
$$\mathbf{w}_n = \hat{\mathbf{Q}}\left(\mathbf{w}_{n-1} + \mu\phi(e_n)e_n\mathbf{u}_n\right) + \hat{\mathbf{p}} \qquad (21)$$
with $\hat{\mathbf{Q}} = \mathbf{I} - \mathbf{C}(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T$ and $\hat{\mathbf{p}} = \mathbf{C}(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{f}$.
Finally, the CRKRSL is summarized in Algorithm 1.
Algorithm 1: The CRKRSL Algorithm.
        Input: data pairs $\{\mathbf{u}_n, d_n\} \in \mathbb{R}^m \times \mathbb{R}$, $n = 1, 2, \ldots$
        Initialization: choose step size $\mu$; kernel width $\sigma$; risk-sensitive parameter $\gamma$; initial iterative length $L$; training size $N_{tr}$; initial weight $\mathbf{w}_0 = \mathbf{0}$.
            for $n = 1 : L$
                $e_n = d_n - \mathbf{w}_{n-1}^T\mathbf{u}_n$
                $\mathbf{w}_n = \hat{\mathbf{Q}}(\mathbf{w}_{n-1} + \mu\phi(e_n)e_n\mathbf{u}_n) + \hat{\mathbf{p}}$
            end
            for $n = (L+1) : N_{tr}$
                $e_n = d_n - \mathbf{w}_{n-1}^T\mathbf{u}_n$
                $\mathbf{w}_n = \mathbf{Q}(\mathbf{w}_{n-1} + \eta_n\phi(e_n)e_n\mathbf{Z}\mathbf{u}_n) + \mathbf{p}$
            end
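For completeness, a compact Python sketch of Algorithm 1 is given below. It is our own illustration rather than the authors' code: the input covariance $\mathbf{R}$ is assumed known so that $\mathbf{Z} = \mathbf{R}^{-1}$ can be formed, and $E[\phi(v_n)]$ in $\eta_n$ is replaced by the running sample mean of $\phi(e_l)$, one possible realization of the average approximation.

```python
import numpy as np

def crkrsl(u, d, C, f, R, sigma=2.0, gamma=2.5, mu=0.01, L=None):
    """Sketch of Algorithm 1 (low-complexity CRKRSL).

    u: (N, m) inputs, d: (N,) desired outputs, C: (m, q) and f: (q,) constraints,
    R: (m, m) input covariance. Parameter defaults are illustrative only."""
    N, m = u.shape
    L = m if L is None else L
    Z = np.linalg.inv(R)
    Theta = np.linalg.inv(C.T @ Z @ C)                      # computed only once (Remark 2)
    Q = np.eye(m) - Z @ C @ Theta @ C.T                     # Eq. (20)
    p = Z @ C @ Theta @ f
    Q_hat = np.eye(m) - C @ np.linalg.solve(C.T @ C, C.T)   # Eq. (21)
    p_hat = C @ np.linalg.solve(C.T @ C, f)

    w = np.zeros(m)
    phi_sum = 0.0
    for n in range(1, N + 1):
        u_n, d_n = u[n - 1], d[n - 1]
        e_n = d_n - u_n @ w
        kappa = np.exp(-e_n**2 / (2 * sigma**2))
        phi = np.exp(gamma * (1 - kappa)) * kappa
        phi_sum += phi
        if n <= L:                                          # fixed-step phase, Eq. (21)
            w = Q_hat @ (w + mu * phi * e_n * u_n) + p_hat
        else:                                               # variable-step phase, Eq. (20)
            eta_n = 1.0 / phi_sum                           # ~ (n E[phi(v_n)])^{-1}
            w = Q @ (w + eta_n * phi * e_n * (Z @ u_n)) + p
    return w
```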
Remark 2.
In Equation (20), the constant inverse matrix $\boldsymbol{\Theta} = (\mathbf{C}^T\mathbf{Z}\mathbf{C})^{-1}$ needs to be calculated only once before the update. On the contrary, $(\mathbf{C}^T\mathbf{U}_n\mathbf{C})^{-1}$ in Equation (15) needs to be updated iteratively. Meanwhile, the update of the matrix $\mathbf{U}_n$ by Equation (10) is avoided when Equation (20) is used. Therefore, the proposed CRKRSL with Equation (20) has a lower computational complexity than the one based on Equation (15).

3. Stability Analysis

To obtain the mean square stability condition on the weight error of CRKRSL, we define the weight error and model error as
$$\tilde{\mathbf{w}}_n = \mathbf{w}_n - \mathbf{w}_o \qquad (22)$$
$$\mathbf{w}_\Delta = \mathbf{w}_* - \mathbf{w}_o \qquad (23)$$
with the optimal weight
$$\begin{aligned} \mathbf{w}_o &= \lim_{n\to\infty}\left(\mathbf{U}_n\mathbf{d}_n + \mathbf{U}_n\mathbf{C}\boldsymbol{\theta}_n\right) \\ &= \lim_{n\to\infty}\mathbf{U}_n\mathbf{d}_n + \lim_{n\to\infty}\mathbf{U}_n\mathbf{C}\left(\mathbf{C}^T\mathbf{U}_n\mathbf{C}\right)^{-1}\left(\mathbf{f}-\mathbf{C}^T\mathbf{U}_n\mathbf{d}_n\right) \\ &= \lim_{n\to\infty}\left(\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l\mathbf{u}_l^T\right)^{-1}\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l d_l \\ &\quad + \lim_{n\to\infty}\left(\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l\mathbf{u}_l^T\right)^{-1}\mathbf{C}\left(\mathbf{C}^T\left(\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l\mathbf{u}_l^T\right)^{-1}\mathbf{C}\right)^{-1}\left(\mathbf{f}-\mathbf{C}^T\mathbf{U}_n\mathbf{d}_n\right) \\ &= \mathbf{U}\mathbf{d} + \mathbf{U}\mathbf{C}\left(\mathbf{C}^T\mathbf{U}\mathbf{C}\right)^{-1}\left(\mathbf{f}-\mathbf{C}^T\mathbf{U}\mathbf{d}\right), \end{aligned} \qquad (24)$$
where the robust correlation matrix and vector are defined as $\mathbf{U} = \lim_{n\to\infty}\left[\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l\mathbf{u}_l^T\right]^{-1} = E\left[\phi(e_n)\mathbf{u}_n\mathbf{u}_n^T\right]^{-1}$ and $\mathbf{d} = \lim_{n\to\infty}\frac{1}{n}\sum_{l=1}^{n}\phi(e_l)\mathbf{u}_l d_l = E\left[\phi(e_n)\mathbf{u}_n d_n\right]$.
Subtracting $\mathbf{w}_o$ from both sides of Equation (20), we obtain
$$\begin{aligned} \tilde{\mathbf{w}}_n &= \mathbf{Q}\left(\mathbf{w}_{n-1} + \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n d_n - \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\mathbf{w}_{n-1}\right) + \mathbf{p} - \mathbf{w}_o \\ &= \mathbf{Q}\left(\mathbf{w}_{n-1} + \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\left(\mathbf{u}_n^T\mathbf{w}_* + v_n\right) - \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\mathbf{w}_{n-1}\right) + \mathbf{p} - \mathbf{w}_o \\ &= \mathbf{Q}\left(\mathbf{w}_{n-1} + \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\left(\mathbf{u}_n^T(\mathbf{w}_\Delta + \mathbf{w}_o) + v_n\right) - \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\mathbf{w}_{n-1}\right) + \mathbf{p} - \mathbf{w}_o \\ &= \mathbf{Q}\left(\mathbf{w}_{n-1} - \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\tilde{\mathbf{w}}_{n-1}\right) + \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\left(\mathbf{u}_n^T\mathbf{w}_\Delta + v_n\right) + \mathbf{p} - \mathbf{w}_o \\ &= \mathbf{Q}\left(\mathbf{I} - \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right)\tilde{\mathbf{w}}_{n-1} + \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\left(\mathbf{u}_n^T\mathbf{w}_\Delta + v_n\right) + \mathbf{Q}\mathbf{w}_o - \mathbf{w}_o + \mathbf{p}. \end{aligned} \qquad (25)$$
Since $\mathbf{Q}$ is an idempotent matrix, we have $\mathbf{Q}\mathbf{w}_o - \mathbf{w}_o + \mathbf{p} = \mathbf{0}$ and $\mathbf{Q}\tilde{\mathbf{w}}_n = \tilde{\mathbf{w}}_n$. Then, we get
$$\begin{aligned} \tilde{\mathbf{w}}_n &= \mathbf{Q}\left(\mathbf{I} - \eta_n\phi(e_n)\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right)\tilde{\mathbf{w}}_{n-1} + \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\left(\mathbf{u}_n^T\mathbf{w}_\Delta + v_n\right) \\ &= \left(\mathbf{I} - \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right)\tilde{\mathbf{w}}_{n-1} + \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\left(\mathbf{u}_n^T\mathbf{w}_\Delta + v_n\right). \end{aligned} \qquad (26)$$
We take the expectation of the squared norm on both sides. Since the noise $v_n$ is independent of $\mathbf{u}_n$ and the input sequence $\{\mathbf{u}_n\}$ is i.i.d. under assumptions A1–A2, and since the a priori weight error $\tilde{\mathbf{w}}_{n-1}$ is independent of $\mathbf{u}_n$ and $v_n$ under the independence assumptions [32], the cross terms are equal to zero. Then, we obtain
$$E\left[\|\tilde{\mathbf{w}}_n\|^2\right] = E\left[\|\tilde{\mathbf{w}}_{n-1}\|_{\mathbf{F}_n}^2\right] + \eta_n^2\delta^2 E\left[\phi^2(e_n)\right]E\left[\mathbf{u}_n^T\mathbf{Z}\mathbf{Q}\mathbf{Z}\mathbf{u}_n\right] + \eta_n^2 E\left[\phi^2(e_n)\right]\mathbf{w}_\Delta^T E\left[\mathbf{u}_n\mathbf{u}_n^T\mathbf{Z}\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right]\mathbf{w}_\Delta \qquad (27)$$
with
$$\begin{aligned} \mathbf{F}_n &= E\left[\left(\mathbf{I} - \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right)^T\left(\mathbf{I} - \eta_n\phi(e_n)\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right)\right] \\ &= \mathbf{I} - 2\eta_n E\left[\phi(e_n)\right]E\left[\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right] + \eta_n^2 E\left[\phi^2(e_n)\right]E\left[\mathbf{u}_n\mathbf{u}_n^T\mathbf{Z}\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right]. \end{aligned} \qquad (28)$$
According to Isserlis's theorem [33], the symmetric matrix with fourth-moment Gaussian variables can be separated as
$$\begin{aligned} \mathbf{V} &= E\left[\mathbf{u}_n\mathbf{u}_n^T\mathbf{Z}\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right] \\ &= E\left[\mathbf{u}_n\mathbf{u}_n^T\mathbf{Z}\right]E\left[\mathbf{Q}\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right] + E\left[\mathbf{u}_n\mathbf{u}_n^T\mathbf{Z}\mathbf{Q}\right]E\left[\mathbf{Z}\mathbf{u}_n\mathbf{u}_n^T\right] + E\left[\mathbf{u}_n\mathbf{u}_n^T\right]E\left[\mathbf{u}_n^T\mathbf{Z}\mathbf{Q}\mathbf{Z}\mathbf{u}_n\right] \\ &= 2\mathbf{Q} + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}\mathbf{R}, \end{aligned} \qquad (29)$$
where $\mathrm{tr}\{\cdot\}$ denotes the trace operator and $\mathbf{R} = \mathbf{Z}^{-1} = E[\mathbf{u}_n\mathbf{u}_n^T]$ is a positive definite correlation matrix.
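The decomposition $\mathbf{V} = 2\mathbf{Q} + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}\mathbf{R}$ can be checked by Monte Carlo sampling. The sketch below is our own verification; it uses a whitened input ($\mathbf{R} = \mathbf{I}$, so that $\mathbf{Q}$ is symmetric) and a randomly drawn constraint matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
m, q, n_mc = 7, 3, 200_000
R = np.eye(m)                                    # whitened input assumed, so Q is symmetric
Z = np.linalg.inv(R)
C = rng.standard_normal((m, q))
Theta = np.linalg.inv(C.T @ Z @ C)
Q = np.eye(m) - Z @ C @ Theta @ C.T

u = rng.multivariate_normal(np.zeros(m), R, size=n_mc)
A = Z @ Q @ Z
s = np.einsum('ij,jk,ik->i', u, A, u)            # u_i^T Z Q Z u_i for each sample
V_mc = (u.T * s) @ u / n_mc                      # Monte Carlo estimate of E[u u^T Z Q Z u u^T]
V_th = 2 * Q + np.trace(Q @ Z) * R               # Eq. (29)
print(np.max(np.abs(V_mc - V_th)))               # small (Monte Carlo error only)
```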
Therefore, Equation (27) can be simplified as
$$\begin{aligned} E\left[\|\tilde{\mathbf{w}}_n\|^2\right] &= E\left[\|\tilde{\mathbf{w}}_{n-1}\|_{\mathbf{F}_n}^2\right] + \eta_n^2\delta^2 E\left[\phi^2(e_n)\right]\mathrm{tr}\{\mathbf{Q}\mathbf{Z}\} + \eta_n^2 E\left[\phi^2(e_n)\right]\mathbf{w}_\Delta^T\left(2\mathbf{Q} + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}\mathbf{R}\right)\mathbf{w}_\Delta \\ &= E\left[\|\tilde{\mathbf{w}}_{n-1}\|_{\mathbf{F}_n}^2\right] + \eta_n^2 E\left[\phi^2(e_n)\right]\mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}\left(\mathbf{w}_\Delta^T\mathbf{R}\mathbf{w}_\Delta + \delta^2\right) \end{aligned} \qquad (30)$$
with a simplified
$$\mathbf{F}_n = \mathbf{I} - 2\eta_n E\left[\phi(e_n)\right]\mathbf{Q} + \eta_n^2 E\left[\phi^2(e_n)\right]\mathbf{V} = \mathbf{I} - 2\eta_n E\left[\phi(e_n)\right]\mathbf{Q} + \eta_n^2 E\left[\phi^2(e_n)\right]\left(2\mathbf{Q} + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}\mathbf{R}\right). \qquad (31)$$
Let $q_k$ and $r_k$, $k \in \{1, 2, \ldots, m\}$, denote the $k$th eigenvalues of the matrices $\mathbf{Q}$ and $\mathbf{R}$, respectively. To ensure the mean square stability, the eigenvalues of $\mathbf{F}_n$ should satisfy the following condition:
$$\left|1 - 2\eta_n E\left[\phi(e_n)\right]q_k + \eta_n^2 E\left[\phi^2(e_n)\right]\left(2q_k + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}r_k\right)\right| < 1, \quad \text{for } k = 1, 2, \ldots, m. \qquad (32)$$
Then, the convergence condition on the step size can be expressed as
$$\eta_n < \min_{k}\left(\frac{2E\left[\phi(e_n)\right]q_k}{E\left[\phi^2(e_n)\right]\left(2q_k + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}r_k\right)}\right). \qquad (33)$$
Since the step size $\eta_n = (nE[\phi(v_n)])^{-1}$ is related to the iteration $n$, we finally obtain the mean square stability condition on the iteration $n$:
$$n > \max_{k}\left(\frac{E\left[\phi^2(e_n)\right]\left(2q_k + \mathrm{tr}\{\mathbf{Q}\mathbf{Z}\}r_k\right)}{2E\left[\phi(v_n)\right]E\left[\phi(e_n)\right]q_k}\right), \qquad (34)$$
where the nonlinear terms $E[\phi(e_n)]$ and $E[\phi^2(e_n)]$ can be approximated by a Taylor expansion.
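Inequality (34) can also be evaluated numerically once the moments of $\phi$ are available. The helper below is a rough sketch under our own assumptions: the moments are estimated from noise samples rather than by the Taylor expansion, and the eigenvalues $q_k$ and $r_k$ are simply paired in descending order.

```python
import numpy as np

def iteration_bound(Q, R, v_samples, gamma, sigma):
    """Right-hand side of Inequality (34): an estimate of the smallest iteration index n
    for which the variable-step update of Eq. (20) is mean-square stable."""
    Z = np.linalg.inv(R)
    kappa = np.exp(-v_samples**2 / (2 * sigma**2))
    phi = np.exp(gamma * (1 - kappa)) * kappa
    E_phi, E_phi2 = phi.mean(), (phi**2).mean()            # E[phi(e_n)] approximated by E[phi(v_n)]
    q = np.sort(np.linalg.eigvals(Q).real)[::-1]           # eigenvalues of the idempotent Q (0 or 1)
    r = np.sort(np.linalg.eigvalsh(R))[::-1]
    tr_QZ = np.trace(Q @ Z)
    mask = q > 1e-12                                       # skip the zero eigenvalues of Q
    ratio = E_phi2 * (2 * q[mask] + tr_QZ * r[mask]) / (2 * E_phi * E_phi * q[mask])
    return float(np.max(ratio))
```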
Remark 3.
Inequality (34) implies that the iteration $n$ should be sufficiently large to guarantee convergence. When $n$ is small, CRKRSL with Equation (20) cannot satisfy (34), resulting in fluctuations at the initial update stage. Therefore, it is reasonable to use Equation (21) in place of Equation (20) during the initial iterations to improve the convergence speed and filtering accuracy.

4. Results and Discussion

In this section, we show the advantages of the CRKRSL algorithm in terms of filtering accuracy and computational complexity for both low-dimensional and high-dimensional inputs. The noise model, data selection and algorithm comparison are described as follows.
Noise model: We first consider pure Gaussian noise to test the filtering accuracy, i.e., Gaussian noise with $v_n \sim N(0, \delta^2)$, where $N(\bar{\mu}, \bar{\delta}^2)$ denotes the Gaussian distribution with mean $\bar{\mu}$ and variance $\bar{\delta}^2$. Then, a mixed noise is considered to test the robustness of CRKRSL. The mixed noise model, driven by a probability process, is denoted as
$$v_n = b(n)v_1(n) + \left(1 - b(n)\right)v_2(n), \qquad (35)$$
where $b(n) \in \{0, 1\}$ follows a binary distribution with probabilities $P\{b(n) = 0\} = 0.1$ and $P\{b(n) = 1\} = 0.9$. $v_1(n)$, occurring with high probability, generates the ordinary noise from a Gaussian distribution $N(0, \delta_1^2)$, and $v_2(n)$, occurring with low probability, generates occasional impulsive noise. Two types of $v_2(n)$ are considered: (a) Gaussian noise with large variance, i.e., $v_2(n) \sim N(0, 100)$; (b) $\alpha$-stable noise [34] with parameter function $F(0.8, 0, 0.1, 0)$.
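The mixed noise of Equation (35) can be generated as in the sketch below (our own code; scipy's levy_stable is assumed to use a compatible parameterization of the $\alpha$-stable distribution and should be checked before use).

```python
import numpy as np
from scipy.stats import levy_stable

def mixed_noise(N, var1=0.1, model="a", rng=None):
    """Mixed noise of Eq. (35): v_n = b(n) v1(n) + (1 - b(n)) v2(n), with P{b(n)=1} = 0.9."""
    rng = np.random.default_rng() if rng is None else rng
    b = rng.random(N) < 0.9                            # binary switching process
    v1 = np.sqrt(var1) * rng.standard_normal(N)        # ordinary Gaussian noise N(0, var1)
    if model == "a":
        v2 = 10.0 * rng.standard_normal(N)             # impulsive Gaussian noise N(0, 100)
    else:
        v2 = levy_stable.rvs(0.8, 0.0, loc=0.0, scale=0.1, size=N, random_state=rng)
    return np.where(b, v1, v2)
```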
Data selection: The training inputs are sampled from a Gaussian distribution with zero mean and covariance matrix $\mathbf{R}$, and 5000 samples are chosen for the simulation. The parameters $\mathbf{C}$, $\mathbf{f}$, $\mathbf{R}$ are configured the same as in [7]. Note that there exist two sets of data with different input dimensions in [7]; the underlying dimension is either $m = 7$ or $m = 31$. The simulated mean square deviation (MSD) is defined as $\mathrm{MSD(dB)} = 10\log_{10}(\|\tilde{\mathbf{w}}_n\|^2)$, and the steady-state MSD is defined as the mean of the last 1000 samples. The obtained results are averaged over 500 Monte Carlo trials. The simulations were run in MATLAB R2020b on a Windows 10 operating system, configured with an Intel(R) Core(TM) i7-8700 CPU at 3.20 GHz and 16 GB of RAM.
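The two reported metrics follow directly from these definitions; a short helper (our own naming) is shown below.

```python
import numpy as np

def msd_db(w_history, w_ref):
    """MSD(dB) = 10 log10(||w_n - w_ref||^2) for each stored weight vector."""
    err = np.asarray(w_history) - np.asarray(w_ref)
    return 10.0 * np.log10(np.sum(err**2, axis=-1))

def steady_state_msd(msd_curve, tail=1000):
    """Steady-state MSD: mean of the last `tail` samples of the MSD curve."""
    return float(np.mean(msd_curve[-tail:]))
```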
Compared algorithms: The constrained algorithms, including CLMS [7], CMCC [21], CLLAF [25], CRLS [9], and CRMCC [26], are chosen for comparison with the proposed CRKRSL. For a fair comparison, the kernel widths $\sigma$ of CMCC, CRMCC, and CRKRSL are set to the same value; the regularization terms of CRLS and CRMCC are set to 0.001; the initial iterative length of CRKRSL is set equal to the input dimension, i.e., $L = m$.

4.1. Low-Dimensional Input

In this part, the input dimension $m$ and constrained dimension $q$ are set as $m = 7$ and $q = 3$, respectively. The noise considered here satisfies the following conditions, i.e., Gaussian noise $v(n) \sim N(0, 0.1)$ and mixed noise with $v_1(n) \sim N(0, 0.1)$. To reflect the influence of the risk-sensitive parameter $\gamma$ on the MSD, the relations between $\gamma$ and the steady-state MSD are shown in Figure 2 and Figure 3 under different noise models. From Figure 2, one can see that CRKRSL has a stable steady-state MSD under Gaussian noise; therefore, $\gamma$ has little influence on the performance of CRKRSL in this case. From Figure 3, it is observed that CRKRSL is sensitive to $\gamma$ under mixed noise with model (a) and achieves the lowest steady-state MSD around $\gamma = 2.5$. Therefore, we choose $\gamma = 2.5$ for the algorithm comparison in Figure 4 and Figure 5. Note that $\rho$ denotes the shape parameter in the CLLAF algorithm. All necessary parameters are given in the figures.
From Figure 4, one can see that CRKRSL, CRMCC and CRLS coincide and have almost the same MSDs under Gaussian noise. This implies that CRKRSL deals with Gaussian noise well when a large kernel width is chosen. From Figure 5, it is observed that CRKRSL has the best performance among all constrained algorithms, since the risk-sensitive parameter $\gamma$ can avoid the fluctuations caused by a small kernel width. Moreover, CRKRSL is more stable than CRMCC at the initial stage, potentially leading to better filtering performance. In Table 1, we further compare the consumed time at each iteration and the steady-state MSDs of each algorithm under mixed noise with model (a). The consumed time of CRKRSL is far less than that of CRMCC and CRLS.
To show the advantage of CRKRSL in computational complexity, Table 2 lists the per-iteration complexity of all mentioned algorithms. One can see that CRKRSL has a lower computational complexity than CRLS and CRMCC by avoiding the calculation of the inverse matrix. Although the inverse matrix $\mathbf{U}_n$ of CRMCC does not need to be calculated, the inverse matrix $(\mathbf{C}^T\mathbf{U}_n\mathbf{C})^{-1}$ still needs to be calculated at each iteration, resulting in a high computational complexity for large $q$.
To test the performance of CRKRSL under mixed noise with model (b), Figure 6 gives the MSD results of all mentioned algorithms. One can see from Figure 6 that CRKRSL, CRMCC, CMCC and CLLAF show strong robustness to outliers, and CRKRSL has the lowest MSD, whereas CLMS and CRLS are not stable because they are sensitive to $\alpha$-stable noise.

4.2. High-Dimensional Input

In this part, the input dimension $m$ and constrained dimension $q$ are set as $m = 31$ and $q = 1$, respectively. We only consider the mixed noise, since Gaussian noise has little influence on the performance of CRKRSL when a large kernel width is selected. The mixed noise satisfies $v_1(n) \sim N(0, 1)$.
Figure 7 and Figure 8 show the MSDs of the different algorithms under mixed noise with model (a) and model (b), respectively. It is clear that CRKRSL shows the best performance among all the compared algorithms under both noise models. The initial iterative length $L$ influences the convergence speed significantly; therefore, the initial iterative length should not be smaller than the input dimension. In Table 3, we further compare the consumed time at each iteration and the steady-state MSDs of each algorithm under mixed noise with model (a). One can see that the consumed time of CRKRSL is far less than that of CRMCC. Moreover, CRKRSL has the lowest steady-state MSD value.

5. Conclusions

By introducing linear constraints into the kernel risk-sensitive loss (KRSL), a low-complexity constrained recursive KRSL (CRKRSL) algorithm is presented with the help of an average approximation. Since the risk-sensitive parameter is able to control the smoothness of the performance surface, CRKRSL achieves higher accuracy than some existing constrained recursive algorithms. Due to the inaccuracy of the average approximation when only a few inputs are available, a fixed step-size gradient method is adopted to avoid the instability of CRKRSL at the initial update stage. Moreover, the mean square analysis indicates that the number of iterations influences the stability of CRKRSL significantly, and the simulation results confirm the advantages of CRKRSL. The effectiveness of CRKRSL relies heavily on the average approximation method, which has limits when coping with nonstationary signals. In the future, we will focus on finding a novel approximation method to process both stationary and nonstationary signals and to further improve the computational efficiency and accuracy of constrained recursive algorithms.

Author Contributions

Conceptualization, S.X. and C.Z.; methodology, S.X.; software, S.X. and Z.G.; validation, S.X.; formal analysis, S.X.; investigation, S.X. and D.Y.; resources, C.Z.; data curation, S.X. and D.Y.; writing—original draft preparation, S.X.; writing—review and editing, C.Z. and Z.G.; visualization, S.X.; supervision, C.Z.; project administration, C.Z. and D.Y.; funding acquisition, C.Z. and Z.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Chongqing Social Science Planning Project (2021BS038) and Natural Science Foundation of Chongqing, China (cstc2021jcyj-bshX0035, cstc2018jcyjA2453).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Campos, M.L.R.; Werner, S.; Apolinário, J.A. Constrained adaptive filters. In Adaptive Antenna Arrays: Trends and Applications; Springer: New York, NY, USA, 2004; pp. 46–64.
  2. Frost, O. An algorithm for linearly constrained adaptive array processing. Proc. IEEE 1972, 60, 926–935.
  3. Li, J.; Stoica, P. Robust Adaptive Beamforming; John Wiley & Sons: New York, NY, USA, 2005.
  4. Diniz, P.S. Adaptive Filtering: Algorithms and Practical Implementation; Springer: Cham, Switzerland, 1997.
  5. Wu, Q.; Li, Y.; Xue, W. A kernel recursive maximum versoria-like criterion algorithm for nonlinear channel equalization. Symmetry 2019, 11, 1067.
  6. Verdu, S. Multiuser Detection; Cambridge University Press: Cambridge, UK, 1998.
  7. Arablouei, R.; Doğançay, K.; Werner, S. On the mean-square performance of the constrained LMS algorithm. Signal Process. 2015, 117, 192–197.
  8. Resende, L.S.; Romano, J.M.T.; Bellanger, M.G. A fast least-squares algorithm for linearly constrained adaptive filtering. IEEE Trans. Signal Process. 1996, 44, 1168–1174.
  9. Arablouei, R.; Doğançay, K. Performance analysis of linear-equality-constrained least-squares estimation. IEEE Trans. Signal Process. 2015, 63, 3762–3769.
  10. Arablouei, R.; Doğançay, K. Reduced-complexity constrained recursive least-squares adaptive filtering algorithm. IEEE Trans. Signal Process. 2012, 60, 6687–6692.
  11. Werner, S.; Apolinário, J.A.; de Campos, M.L.R.; Diniz, P. Low-complexity constrained affine-projection algorithms. IEEE Trans. Signal Process. 2005, 53, 4545–4555.
  12. Apolinário, J.A., Jr.; de Campos, M.L.R.; Bernal, O.C.P. The constrained conjugate gradient algorithm. IEEE Signal Process. Lett. 2000, 12, 351–354.
  13. De Campos, M.L.R.; Werner, S.; Apolinário, J.A. Constrained adaptation algorithms employing Householder transformation. IEEE Trans. Signal Process. 2002, 9, 2187–2195.
  14. Arablouei, R.; Doğançay, K. Linearly-constrained recursive total least-squares algorithm. IEEE Signal Process. Lett. 2012, 12, 821–824.
  15. Haykin, S. Adaptive Filter Theory; Prentice-Hall: Upper Saddle River, NJ, USA, 2002.
  16. Liu, W.; Pokharel, P.P.; Príncipe, J.C. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 2007, 55, 5286–5298.
  17. Li, Y.; Wang, Y.; Sun, L. A proportionate normalized maximum correntropy criterion algorithm with correntropy induced metric constraint for identifying sparse systems. Symmetry 2018, 10, 683.
  18. Chen, B.; Xing, L.; Zhao, H.; Zheng, N.; Príncipe, J.C. Generalized correntropy for robust adaptive filtering. IEEE Trans. Signal Process. 2016, 64, 3376–3387.
  19. Erdogmus, D.; Príncipe, J.C. An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Process. 2002, 50, 1780–1786.
  20. Príncipe, J.C. Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives; Springer: New York, NY, USA, 2010.
  21. Peng, S.; Chen, B.; Sun, L.; Ser, W.; Lin, Z. Constrained maximum correntropy adaptive filtering. Signal Process. 2017, 140, 116–126.
  22. Bhattacharjee, S.S.; Shaikh, M.A.; Kumar, K.; George, N.V. Robust constrained generalized correntropy and maximum versoria criterion adaptive filters. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 3002–3006.
  23. Peng, S.; Ser, W.; Chen, B.; Sun, L.; Lin, Z. Robust constrained adaptive filtering under minimum error entropy criterion. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 1119–1123.
  24. Wang, Z.; Zhao, H.; Zeng, X. Constrained least mean M-estimation adaptive filtering algorithm. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 1507–1511.
  25. Liang, T.; Li, Y.; Zakharov, Y.V.; Xue, W.; Qi, J. Constrained least lncosh adaptive filtering algorithm. Signal Process. 2021, 183, 108044.
  26. Qian, G.; Ning, X.; Wang, S. Recursive constrained maximum correntropy criterion algorithm for adaptive filtering. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 2229–2233.
  27. Chen, B.; Xing, L.; Xu, B.; Zhao, H.; Zheng, N.; Príncipe, J.C. Kernel risk-sensitive loss: Definition, properties and application to robust adaptive filtering. IEEE Trans. Signal Process. 2017, 65, 2888–2901.
  28. Qian, G.; Dong, F.; Wang, S. Robust constrained minimum mixture kernel risk-sensitive loss algorithm for adaptive filtering. Digit. Signal Process. 2020, 107, 102859.
  29. Qian, G.; Wang, S.; Wang, L.; Duan, S. Convergence analysis of a fixed point algorithm under maximum complex correntropy criterion. IEEE Signal Process. Lett. 2018, 25, 1830–1834.
  30. Liu, W.; Príncipe, J.C.; Haykin, S. Kernel Adaptive Filtering: A Comprehensive Introduction; Wiley: Hoboken, NJ, USA, 2011.
  31. Zhang, X.-D. Matrix Analysis and Applications; Cambridge University Press: Cambridge, UK, 2017.
  32. Sayed, A.H. Adaptive Filters; John Wiley & Sons: Hoboken, NJ, USA, 2008.
  33. Isserlis, L. On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika 1918, 12, 134–139.
  34. Zhang, Q.; Feng, W.; Iu, H.H.C.; Wang, S. Adaptive filters with robust augmented space linear model: A weighted k-NN method. IEEE Trans. Signal Process. 2021, 69, 6448–6461.
Figure 1. The relation between $\phi(e_n)$ and the error $e_n$ ($\sigma = 2$).
Figure 2. Steady-state MSD versus risk-sensitive parameter $\gamma$ under Gaussian noise ($m = 7$, $q = 3$).
Figure 3. Steady-state MSD versus risk-sensitive parameter $\gamma$ under mixed noise with model (a) ($m = 7$, $q = 3$).
Figure 4. MSDs of compared algorithms under Gaussian noise ($m = 7$, $q = 3$).
Figure 5. MSDs of compared algorithms under mixed noise with model (a) ($m = 7$, $q = 3$).
Figure 6. MSDs of compared algorithms under mixed noise with model (b) ($m = 7$, $q = 3$).
Figure 7. MSDs of compared algorithms under mixed noise with model (a) ($m = 31$, $q = 1$).
Figure 8. MSDs of compared algorithms under mixed noise with model (b) ($m = 31$, $q = 1$).
Table 1. Consumed time and steady-state MSD of CLMS, CMCC, CLLAF, CRLS, CRMCC and CRKRSL under mixed noise with model (a) based on low-dimensional input.
Algorithm | Consumed Time (s) | Steady-State MSD (dB)
CLMS | 0.0024 | −6.70
CMCC | 0.0028 | −10.86
CLLAF | 0.0025 | −18.58
CRLS | 0.0324 | −21.23
CRMCC | 0.0327 | −27.75
CRKRSL | 0.0038 | −34.34
Table 2. Computational complexity of CLMS, CMCC, CLLAF, CRLS, CRMCC and CRKRSL at each iteration.
Algorithm | Computational Complexity
CLMS | $O(m^2)$
CMCC | $O(m^2)$
CLLAF | $O(m^2)$
CRLS | $O(m^3 + q^3)$
CRMCC | $O(m^2 + q^3)$
CRKRSL | $O(m^2)$
Table 3. Consumed time and steady-state MSD of CLMS, CMCC, CLLAF, CRLS, CRMCC and CRKRSL under mixed noise with model (a) based on high-dimensional input.
Algorithm | Consumed Time (s) | Steady-State MSD (dB)
CLMS | 0.0035 | −2.98
CMCC | 0.0034 | −2.21
CLLAF | 0.0034 | −3.13
CRLS | 0.0112 | −11.02
CRMCC | 0.0114 | −15.45
CRKRSL | 0.0063 | −19.72
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
