Article

Complex Correntropy with Variable Center: Definition, Properties, and Application to Adaptive Filtering

College of Electronic and Information Engineering, Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing 400715, China
* Author to whom correspondence should be addressed.
Entropy 2020, 22(1), 70; https://doi.org/10.3390/e22010070
Submission received: 14 November 2019 / Revised: 23 December 2019 / Accepted: 4 January 2020 / Published: 6 January 2020
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Complex correntropy has been successfully applied to complex-domain adaptive filtering, and the corresponding maximum complex correntropy criterion (MCCC) algorithm has proved robust to non-Gaussian noise. However, the kernel function of the complex correntropy is usually limited to a Gaussian function centered at zero. To improve the performance of MCCC in non-zero-mean noise environments, we first define a complex correntropy with variable center and provide its probabilistic interpretation. We then propose the maximum complex correntropy criterion with variable center (MCCC-VC), apply it to complex-domain adaptive filtering, and use the gradient descent approach to search for the minimum of the cost function. We also propose a feasible method to optimize the center and the kernel width of MCCC-VC. Importantly, we further provide a bound for the learning rate and derive the theoretical value of the steady-state excess mean square error (EMSE). Finally, simulations demonstrate the validity of the theoretical steady-state EMSE and the improved performance of MCCC-VC.

1. Introduction

Choosing an appropriate cost function (usually a statistical measure of the error signal) is the key problem in adaptive filtering theory and application [1,2,3]. In the presence of Gaussian noise, it is best to use the minimum mean square error (MMSE) criterion, and a series of MMSE-based algorithms [4,5,6,7] have accordingly emerged over the past decades. The MMSE-based algorithms use the mean square value of the error between the desired signal and the output signal as the cost function, which has many attractive features, such as convexity and smoothness. In addition, MMSE has low computational complexity, since it only needs the second-order statistics of the signals. However, in many non-Gaussian cases, the MMSE-based algorithms are not robust. To address this shortcoming, many algorithms based on non-MMSE criteria have been developed [8,9,10,11,12,13,14,15,16]. Since signals are often expressed in complex form in many practical scenarios [17,18], adaptive filtering in the complex domain is of great significance. Over the past few years, several information-criterion-based algorithms have been proposed for complex-domain adaptive filtering [19,20,21,22]. In particular, Guimarães et al. recently defined a new similarity measure between two complex variables, the complex correntropy [19,20], and proposed the maximum complex correntropy criterion (MCCC) algorithm. MCCC uses a complex Gaussian function as the kernel function and derives the weight update based on Wirtinger calculus. The complex Gaussian kernel is desirable due to its smoothness and strict positive-definiteness. The MCCC algorithm outperforms classic MMSE-based algorithms and is robust to non-Gaussian noise. Moreover, MCCC has been widely applied in machine learning and signal processing [23,24].
In the MCCC framework, given two complex variables $C_1 = A_1 + jB_1$ and $C_2 = A_2 + jB_2$, the complex correntropy is defined by [19,20]
$$V_\sigma^C(C_1, C_2) = E\left[\kappa(C_1 - C_2)\right] \tag{1}$$
where $A_1$, $B_1$, $A_2$, $B_2$ are real variables, $E[\cdot]$ denotes the expectation, and $\kappa(C_1 - C_2)$ denotes the kernel function with
$$\kappa(C_1 - C_2) = G_\sigma^C(C_1 - C_2) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{(C_1 - C_2)(C_1 - C_2)^*}{2\sigma^2}\right) \tag{2}$$
and $\sigma > 0$ is the kernel width.
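For concreteness, the definition above can be estimated from samples in a few lines. The following Python/NumPy sketch is our own illustration, not code from the paper; the function names and test data are assumptions. It estimates $V_\sigma^C(C_1, C_2)$ by replacing the expectation with a sample mean.

```python
import numpy as np

def complex_gaussian_kernel(z, sigma):
    """G_sigma^C(z) = exp(-|z|^2 / (2 sigma^2)) / (2 pi sigma^2), Equation (2)."""
    return np.exp(-np.abs(z) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def complex_correntropy(c1, c2, sigma):
    """Sample estimate of V_sigma^C(C1, C2) = E[G_sigma^C(C1 - C2)], Equation (1)."""
    return np.mean(complex_gaussian_kernel(c1 - c2, sigma))

# Toy check: correntropy between a complex signal and a lightly perturbed copy.
rng = np.random.default_rng(0)
c1 = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
c2 = c1 + 0.1 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
print(complex_correntropy(c1, c2, sigma=1.0))
```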
The purpose of adaptive filtering is to estimate the target variable $T$ in some sense by designing a model $M$ that constructs an output $Y$ from the input $X$. Under MCCC, we find this model by maximizing the complex correntropy between $T$ and $Y$:
$$M^* = \arg\max_{M \in \mathcal{M}} V_\sigma^C(T, Y) = \arg\max_{M \in \mathcal{M}} E\left[G_\sigma^C(T - Y)\right] \tag{3}$$
where $\mathcal{M}$ is the model assumption space containing the possible models that construct the output $Y$ from the input $X$, and $M^*$ is the optimal model.
However, the center of the complex correntropy is always at zero, which is not the best option in the case of non-zero-mean noise. Although the maximum correntropy criterion with variable center in [25] and [26] accommodates a variable center, it cannot be used for complex-domain adaptive filtering. To overcome this limitation, this paper proposes the maximum complex correntropy criterion with variable center (MCCC-VC).
The main contributions of this research are as follows: (1) we define the complex correntropy with variable center and give its probabilistic interpretation; (2) based on MCCC-VC, we propose a novel adaptive filtering algorithm in the complex domain by utilizing the gradient descent approach; (3) we give effective and feasible methods to estimate the kernel center and update the kernel width adaptively; (4) we derive the bound for the learning rate and the theoretical steady-state excess mean square error (EMSE) of the MCCC-VC algorithm, and verify the theoretical analysis by simulations.
The organization of this paper is as follows: Section 2 defines the complex correntropy with variable center and studies its properties. Section 3 proposes the MCCC-VC algorithm, provides a method for optimizing its parameters, studies the convergence of the algorithm, and derives the theoretical steady-state EMSE. Section 4 verifies the correctness of the theoretical conclusions and the superior performance of the MCCC-VC algorithm. Finally, Section 5 summarizes the conclusions of this paper.

2. Complex Correntropy with Variable Center

For two complex variables, the target variable $T$ and the output $Y$, the complex correntropy with variable center is defined as:
$$V_{\sigma,c}^C(T, Y) = E\left[G_\sigma^C(T - Y - c)\right] = E\left[\frac{1}{2\pi\sigma^2}\exp\left(-\frac{(T - Y - c)(T - Y - c)^*}{2\sigma^2}\right)\right] \tag{4}$$
where $c$ represents the center of the kernel function. When $c = 0$, (4) reduces to the original complex correntropy.
The complex correntropy with variable center $c$ contains all the even-order moments of $T - Y$ about the center $c$:
$$V_{\sigma,c}^C(T, Y) = \frac{1}{2\pi\sigma^2}\sum_{n=0}^{\infty}\frac{(-1)^n}{2^n n!} E\left[\frac{|e - c|^{2n}}{\sigma^{2n}}\right] \tag{5}$$
where $e = T - Y$ is the complex-valued error variable. As $\sigma$ increases, the higher-order moments around the center $c$ attenuate quickly, so the second-order moment becomes the dominant term. In particular, when $c = E[e]$ and $\sigma \to \infty$, maximizing the complex correntropy with center $c$ is equivalent to minimizing the variance of the error.
Moreover, when $\sigma \to 0$, we obtain
$$\lim_{\sigma \to 0} V_{\sigma,c}^C(T, Y) = \lim_{\sigma \to 0}\iiiint G_\sigma^C(t_R - y_R - c_R,\, t_I - y_I - c_I)\, p_{TY}(t_R, t_I, y_R, y_I)\, dt_R\, dt_I\, dy_R\, dy_I = \iiiint \delta(t_R - y_R - c_R,\, t_I - y_I - c_I)\, p_{TY}(t_R, t_I, y_R, y_I)\, dt_R\, dt_I\, dy_R\, dy_I = \iint p_{TY}(t_R, t_I, t_R - c_R, t_I - c_I)\, dt_R\, dt_I \tag{6}$$
where $\delta(x, y)$ is the two-dimensional Dirac function satisfying $\iint \delta(x, y)\, dx\, dy = 1$ and $\delta(x, y) = 0$ for $x^2 + y^2 \neq 0$. The second equality follows from the fact that $\lim_{\sigma \to 0} G_\sigma^C(x, y) = \lim_{\sigma \to 0}\frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ has the same property as $\delta(x, y)$; $t_R$, $y_R$, and $c_R$ are the real parts of $t$, $y$, and $c$; $t_I$, $y_I$, and $c_I$ are the imaginary parts of $t$, $y$, and $c$; and $p_{TY}(t_R, t_I, y_R, y_I)$ denotes the joint probability density function (PDF) of $(T, Y)$. Furthermore, we derive the following result:
$$\lim_{\sigma \to 0} V_{\sigma,c}^C(T, Y) = \lim_{\sigma \to 0}\iint G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I)\, p_e(\varepsilon_R, \varepsilon_I)\, d\varepsilon_R\, d\varepsilon_I = \iint \delta(\varepsilon_R - c_R, \varepsilon_I - c_I)\, p_e(\varepsilon_R, \varepsilon_I)\, d\varepsilon_R\, d\varepsilon_I = p_e(c_R, c_I) \tag{7}$$
where $p_e(\varepsilon_R, \varepsilon_I)$ is the joint PDF of the error. Thus, when $\sigma \to 0$, the complex correntropy with variable center $c$ approaches the error PDF evaluated at $(c_R, c_I)$.
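This limiting behavior is easy to check numerically. Below is a minimal Python/NumPy sketch (our own illustration, not code from the paper) that compares a sample estimate of $E[G_\sigma^C(e - c)]$ for a small $\sigma$ against the analytic error PDF at the center; the sample size, noise model, and probe point are assumptions.

```python
import numpy as np

# Numerical check of Equation (7): for small sigma, E[G_sigma^C(e - c)]
# approaches the joint error PDF evaluated at the center.
rng = np.random.default_rng(1)
n = 2_000_000
e = np.sqrt(0.5) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # circular Gaussian error

c = 0.5 + 0.5j           # probe point (c_R, c_I)
sigma = 0.05             # small kernel width

estimate = np.mean(np.exp(-np.abs(e - c) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2))
analytic = np.exp(-(c.real ** 2 + c.imag ** 2)) / np.pi   # joint PDF of (e_R, e_I) at (c_R, c_I)
print(estimate, analytic)  # the two values should agree closely
```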

3. MCCC-VC Algorithm

In this part, we derive a novel adaptive filtering algorithm based on the maximum complex correntropy criterion with variable center (MCCC-VC), i.e., minimization of the complex correntropy loss.

3.1. Cost Function

We apply MCCC-VC to adaptive filtering and derive the cost function as follows:
$$J_{VCloss}^C = G_\sigma^C(0) - E\left[G_\sigma^C(e(k) - c(k))\right] = \frac{1}{2\pi\sigma^2}\left\{1 - E\left[\exp\left(-\frac{(e(k) - c(k))(e(k) - c(k))^*}{2\sigma^2}\right)\right]\right\} \tag{8}$$
where
$$e(k) = d(k) - \mathbf{w}^H\mathbf{x}(k) \tag{9}$$
is the error at time instant $k$, $\mathbf{w} = [w_1\ w_2\ \cdots\ w_m]^T$ is the filter weight vector, $d(k)$ is the desired signal at time instant $k$, $\mathbf{x}(k) = [x(k)\ x(k-1)\ \cdots\ x(k-m+1)]^T$ is the input vector at time instant $k$, and $c(k)$ is the center of the kernel at time instant $k$.
The essential idea behind the cost function (8) is that, even when the error distribution is non-zero-mean, the proposed MCCC-VC can perform well, because its center can be matched to the error distribution.
Figure 1 compares the cost surfaces of the proposed MCCC-VC and MCCC, where the noise is non-zero-mean complex Gaussian noise with unit variance. For visualization, we chose $m = 1$ and set the system parameter and the mean of the noise to $w_0 = 5 + 5i$ and $c = 6 + 6i$, respectively. One can see that the cost function of MCCC-VC is minimized at $w_0$, whereas the cost function of MCCC is minimized elsewhere.
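This comparison is straightforward to reproduce. The following Python/NumPy sketch (our own illustration; the sample size, grid range, and kernel width are assumptions) evaluates the MCCC-VC cost (8) on a grid of scalar weights for the setting above and confirms that the minimizer lies near $w_0$ when the kernel center equals the noise mean.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
w0 = 5 + 5j                  # true system parameter (m = 1), as in Figure 1
c = 6 + 6j                   # noise mean, also used as the kernel center
sigma = 1.0                  # kernel width (assumed)

x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
v = c + np.sqrt(0.5) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # unit-variance noise
d = np.conj(w0) * x + v      # desired signal d(k) = w0^H x(k) + v(k)

def mccc_vc_cost(w):
    """Sample version of the MCCC-VC cost (8) for a scalar weight w."""
    e = d - np.conj(w) * x
    return (1 - np.mean(np.exp(-np.abs(e - c) ** 2 / (2 * sigma ** 2)))) / (2 * np.pi * sigma ** 2)

# Evaluate the cost on a grid around w0; the minimizer should land near w0.
grid = np.linspace(3, 7, 81)
costs = [[mccc_vc_cost(wr + 1j * wi) for wr in grid] for wi in grid]
i, j = divmod(int(np.argmin(costs)), len(grid))
print("estimated minimizer:", grid[j] + 1j * grid[i])   # approximately 5 + 5j
```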

3.2. Gradient Descent Algorithm Based on MCCC-VC

Since the stochastic gradient descent approach has low computational complexity, we adopt it to search for the minimum of the cost function. Utilizing Wirtinger calculus [27,28], we obtain the weight update as follows:
$$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\frac{\partial}{\partial\mathbf{w}^*(k)}\left\{1 - \exp\left[-\frac{(e(k) - c(k))(e(k) - c(k))^*}{2\sigma^2}\right]\right\} = \mathbf{w}(k) + \frac{\mu}{2\sigma^2}\exp\left[-\frac{|e(k) - c(k)|^2}{2\sigma^2}\right](e(k) - c(k))^*\mathbf{x}(k) = \mathbf{w}(k) + \eta_w\exp\left[-\frac{|e(k) - c(k)|^2}{2\sigma^2}\right](e(k) - c(k))^*\mathbf{x}(k) \tag{10}$$
where $\eta_w = \frac{\mu}{2\sigma^2}$ is the learning rate for the weight.
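As a concrete reference, here is a minimal Python/NumPy sketch of the update (10); the function name and interface are our own, and the kernel center is passed in as a given argument rather than estimated (its online estimation is discussed in Section 3.3).

```python
import numpy as np

def mccc_vc_step(w, x_k, d_k, c_k, sigma, eta_w):
    """One stochastic-gradient MCCC-VC weight update, Equation (10)."""
    e_k = d_k - np.vdot(w, x_k)       # e(k) = d(k) - w^H x(k); vdot conjugates its first argument
    u = e_k - c_k                     # center-shifted error e(k) - c(k)
    g = np.exp(-np.abs(u) ** 2 / (2 * sigma ** 2))   # Gaussian factor that suppresses outliers
    return w + eta_w * g * np.conj(u) * x_k, e_k
```

Note that the exponential factor shrinks toward zero for large $|e(k) - c(k)|$, which is what suppresses the influence of outliers on the update.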

3.3. Optimization of the Parameters in MCCC-VC

3.3.1. Optimization Problem in MCCC-VC

The center location $c$ and the kernel width $\sigma$ play a pivotal role in the performance of MCCC-VC. Thus, it is extremely important to optimize them to further improve the robustness and convergence performance in non-zero-mean noise.
The optimal model according to MCCC-VC is as follows:
$$M^* = \arg\max_{M \in \mathcal{M},\, \sigma \in \Omega,\, c \in \mathcal{C}} V_{\sigma,c}^C(T, Y) = \arg\max_{M \in \mathcal{M},\, \sigma \in \Omega,\, c \in \mathcal{C}} E\left[G_\sigma^C(e - c)\right] \tag{11}$$
In addition, the complex correntropy with variable center can be decomposed into three parts:
$$V_{\sigma,c}^C(T, Y) = \iint G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I)\, p_e(\varepsilon_R, \varepsilon_I)\, d\varepsilon_R\, d\varepsilon_I = \frac{1}{2}\iint \left[G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I)\right]^2 d\varepsilon_R\, d\varepsilon_I + \frac{1}{2}\iint \left[p_e(\varepsilon_R, \varepsilon_I)\right]^2 d\varepsilon_R\, d\varepsilon_I - \frac{1}{2}\iint \left[G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I) - p_e(\varepsilon_R, \varepsilon_I)\right]^2 d\varepsilon_R\, d\varepsilon_I \tag{12}$$
Since the first term is independent of the model, we can derive
$$M^* = \arg\max_{M \in \mathcal{M},\, \sigma \in \Omega,\, c \in \mathcal{C}} V_{\sigma,c}^C(T, Y) = \arg\max_{M \in \mathcal{M},\, \sigma \in \Omega,\, c \in \mathcal{C}} U_{\sigma,c}^C(T, Y) \tag{13}$$
where
$$U_{\sigma,c}^C(T, Y) = \iint \left[p_e(\varepsilon_R, \varepsilon_I)\right]^2 d\varepsilon_R\, d\varepsilon_I - \iint \left[G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I) - p_e(\varepsilon_R, \varepsilon_I)\right]^2 d\varepsilon_R\, d\varepsilon_I \tag{14}$$
and
$$\iint G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I)\, p_e(\varepsilon_R, \varepsilon_I)\, d\varepsilon_R\, d\varepsilon_I = E\left[G_\sigma^C(e_R - c_R, e_I - c_I)\right] \tag{15}$$
The parameters can be optimized by
$$(M^*, \sigma^*, c^*) = \arg\max_{M \in \mathcal{M},\, \sigma \in \Omega,\, c \in \mathcal{C}} U_{\sigma,c}^C(T, Y) \tag{16}$$
where $\Omega$ and $\mathcal{C}$ represent the allowed sets of the parameters $\sigma$ and $c$.
Remark 1.
It can be seen that as long as the function $U_{\sigma,c}^C(T, Y)$ is maximized, $M$, $\sigma$, and $c$ can be optimized simultaneously. However, it is computationally demanding to compute and compare the values of $U_{\sigma,c}^C(T, Y)$ under all possible parameters in the allowed sets. Moreover, it may be difficult to obtain the allowed sets of parameters.

3.3.2. Stochastic Gradient Descent Approach

To further simplify the optimization problem, we propose a stochastic gradient descent based online approach.
(1) When the model $M$ is fixed, $\iint [p_e(\varepsilon_R, \varepsilon_I)]^2\, d\varepsilon_R\, d\varepsilon_I$ is independent of the kernel width $\sigma$ and the center position $c$. In this case, $\sigma$ and $c$ can be optimized according to the following formula:
$$(\sigma^*, c^*) = \arg\min_{\sigma \in \Omega,\, c \in \mathcal{C}} \iint \left[G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I) - p_e(\varepsilon_R, \varepsilon_I)\right]^2 d\varepsilon_R\, d\varepsilon_I = \arg\min_{\sigma \in \Omega,\, c \in \mathcal{C}} \left\{\iint \left[G_\sigma^C(\varepsilon_R - c_R, \varepsilon_I - c_I)\right]^2 d\varepsilon_R\, d\varepsilon_I - 2E\left[G_\sigma^C(e - c)\right]\right\} = \arg\min_{\sigma \in \Omega,\, c \in \mathcal{C}} \left\{-2E\left[G_\sigma^C(e - c)\right] + \frac{1}{4\pi\sigma^2}\right\} \tag{17}$$
Given $N$ error samples $\{e(k)\}_{k=1}^N$, we can use $E[G_\sigma^C(e - c)] \approx \frac{1}{N}\sum_{k=1}^N G_\sigma^C(e(k) - c(k))$. Therefore, we have the following formula:
$$(\sigma^*, c^*) = \arg\min_{\sigma \in \Omega,\, c \in \mathcal{C}} \left\{-\frac{2}{N}\sum_{k=1}^N G_\sigma^C(e(k) - c(k)) + \frac{1}{4\pi\sigma^2}\right\} \tag{18}$$
Furthermore, to simplify the optimization problem, we can set $c(k)$ to the median or mean of the error samples, so that only $\sigma$ needs to be optimized. We take $1/\sigma^2$ as a new variable $\tilde{\sigma}$ and update $\tilde{\sigma}$ and $\sigma^2$ using the stochastic gradient descent approach as follows:
$$\tilde{\sigma}(k+1) = \tilde{\sigma}(k) - \eta_\sigma\left.\frac{\partial}{\partial\tilde{\sigma}}\left[-\frac{2}{N}\sum_{l=k-T+1}^{k} G_\sigma^C(e(l) - c(k)) + \frac{1}{4\pi\sigma^2}\right]\right|_{\tilde{\sigma} = \tilde{\sigma}(k),\, c = c(k)} = \tilde{\sigma}(k) - \eta_\sigma\left\{-\frac{1}{\pi N}\sum_{l=k-T+1}^{k}\exp\left(-\frac{|e(l) - c(k)|^2\tilde{\sigma}(k)}{2}\right)\left(1 - \frac{|e(l) - c(k)|^2\tilde{\sigma}(k)}{2}\right) + \frac{1}{4\pi}\right\} \tag{19}$$
and
$$\sigma^2(k+1) = \frac{1}{\tilde{\sigma}(k+1)} \tag{20}$$
where $c(k)$ is estimated online as $c(k) = \frac{1}{T}\sum_{l=k-T+1}^{k} e(l)$, $T$ is the smoothing length (so that $N = T$ samples enter the sums above), and $\eta_\sigma$ is the learning rate for $\tilde{\sigma}$.
(2) When the kernel width $\sigma(k)$ and the center position $c(k)$ are fixed, the model $M$ is optimized by MCCC-VC using (10).
Remark 2.
For the proposed MCCC-VC algorithm, the weight and the parameters are updated alternately at each time instant k using (10), (19) and (20), respectively.
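Putting the pieces together, the following Python/NumPy sketch runs the full online MCCC-VC loop, alternating the weight update (10) with the center estimate and the kernel-width updates (19) and (20). It is our own illustration: the function name, interface, window length, learning rates, and the lower clip on $1/\sigma^2$ are assumptions, not values from the paper.

```python
import numpy as np
from collections import deque

def mccc_vc_filter(x_stream, d_stream, m, eta_w, eta_sigma, sigma0, T=50):
    """Online MCCC-VC sketch: alternate the weight update (10) with the center
    estimate and the kernel-width updates (19)-(20)."""
    w = np.zeros(m, dtype=complex)
    sig2 = sigma0 ** 2
    recent = deque(maxlen=T)                    # sliding window of recent errors
    for k in range(m - 1, len(x_stream)):
        x_k = x_stream[k - m + 1:k + 1][::-1]   # tap vector [x(k), ..., x(k-m+1)]
        e_k = d_stream[k] - np.vdot(w, x_k)     # e(k) = d(k) - w^H x(k)
        recent.append(e_k)
        c_k = np.mean(recent)                   # center c(k): sliding mean of errors
        u = e_k - c_k
        w = w + eta_w * np.exp(-np.abs(u) ** 2 / (2 * sig2)) * np.conj(u) * x_k
        # Width update (19)-(20) on the variable s = 1 / sigma^2.
        s = 1.0 / sig2
        r2 = np.abs(np.asarray(recent) - c_k) ** 2
        grad = -np.mean(np.exp(-r2 * s / 2) * (1 - r2 * s / 2)) / np.pi + 1 / (4 * np.pi)
        s = max(s - eta_sigma * grad, 1e-6)     # clip to keep sigma^2 positive and finite
        sig2 = 1.0 / s
    return w, np.sqrt(sig2)
```

The lower clip on $1/\sigma^2$ simply guards against a non-positive kernel width during the first few noisy updates; it is a practical safeguard rather than part of the derivation.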

3.4. Performance Analysis

3.4.1. Convergence Analysis

The MCCC-VC algorithm can be written in the form of a nonlinear function of the error:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \eta_w f(e(k))\mathbf{x}(k) \tag{21}$$
with $f(e(k)) = \exp\left[-\frac{|e(k) - c(k)|^2}{2\sigma^2}\right](e(k) - c(k))^*$ being a scalar function of the error $e(k)$.
Taking into consideration that
$$d(k) = \mathbf{w}_0^H\mathbf{x}(k) + v(k) \tag{22}$$
the error can be written as
$$e(k) = \tilde{\mathbf{w}}^H(k)\mathbf{x}(k) + v(k) = e_a(k) + v(k) \tag{23}$$
where $\tilde{\mathbf{w}}(k) = \mathbf{w}_0 - \mathbf{w}(k)$ is the weight error vector at time instant $k$, $\mathbf{w}_0$ is the system parameter vector, $e_a(k) = \tilde{\mathbf{w}}^H(k)\mathbf{x}(k)$ is the a priori error, and $v(k)$ is the additive noise at time instant $k$.
Therefore, we get the following formula:
$$\tilde{\mathbf{w}}(k+1) = \tilde{\mathbf{w}}(k) - \eta_w f(e(k))\mathbf{x}(k) \tag{24}$$
Taking the squared 2-norm of both sides and then the expectation, we further get:
$$E\left\{\|\tilde{\mathbf{w}}(k+1)\|^2\right\} = E\left\{\|\tilde{\mathbf{w}}(k)\|^2\right\} - 2\eta_w E\left\{\mathrm{Re}\left[e_a(k)f(e(k))\right]\right\} + \eta_w^2 E\left\{\|\mathbf{x}(k)\|^2|f(e(k))|^2\right\} \tag{25}$$
To guarantee the convergence of MCCC-VC, the weight error power should decrease gradually. Thus, we obtain the bound for the learning rate as follows:
$$0 < \eta_w \leq \frac{2E\left\{\mathrm{Re}\left[e_a(k)f(e(k))\right]\right\}}{E\left\{\|\mathbf{x}(k)\|^2|f(e(k))|^2\right\}} \tag{26}$$
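The bound (26) depends on the unknown weight error, but in a simulation where $\mathbf{w}_0$ is known it can be estimated by Monte Carlo. The sketch below is our own illustration; the weight-error level, noise model, kernel parameters, and sample size are all assumptions.

```python
import numpy as np

# Monte Carlo estimate of the step-size bound (26) at a fixed weight-error level.
rng = np.random.default_rng(3)
n, m = 100_000, 10
x = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
w_tilde = 0.1 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))  # fixed weight error
e_a = x @ np.conj(w_tilde)                     # a priori error e_a = w~^H x
v = (3 + 3j) + np.sqrt(0.5) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
e = e_a + v
c, sigma = 3 + 3j, 2.0                         # kernel center and width (assumed)
f = np.exp(-np.abs(e - c) ** 2 / (2 * sigma ** 2)) * np.conj(e - c)
num = 2 * np.mean(np.real(e_a * f))            # numerator of (26)
den = np.mean(np.sum(np.abs(x) ** 2, axis=1) * np.abs(f) ** 2)   # denominator of (26)
print("upper bound on eta_w:", num / den)
```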

3.4.2. Steady-State Mean Square Performance

If MCCC-VC reaches the steady state, we have
$$\lim_{k \to \infty} E\left\{\|\tilde{\mathbf{w}}(k+1)\|^2\right\} = \lim_{k \to \infty} E\left\{\|\tilde{\mathbf{w}}(k)\|^2\right\} \tag{27}$$
Then, when $k \to \infty$, we can get
$$2E\left\{\mathrm{Re}\left[e_a(k)f(e(k))\right]\right\} = \eta_w E\left\{\|\mathbf{x}(k)\|^2|f(e(k))|^2\right\} \tag{28}$$
According to the definition of the steady-state excess mean square error (EMSE), we have
$$S = \lim_{k \to \infty} E\left[|e_a(k)|^2\right] = E\left[|e_a|^2\right] \tag{29}$$
To obtain the theoretical steady-state EMSE, we present the following two assumptions [21,22,29]:
(1)
$v(k)$ is zero-mean and independent of $\mathbf{x}(k)$, and $\mathbf{x}(k)$ is circular.
(2)
e a ( k ) is zero-mean and independent of v ( k ) .
Since the distributions of $\mathbf{x}(k)$, $v(k)$, $e_a(k)$, and $e(k)$ do not depend on the time index $k$ at steady state, the time index is omitted in the following derivation.
The left side of (28) can be written as
$$L = E\left\{e_a\exp\left[-\frac{|e - c|^2}{2\sigma^2}\right](e - c)^* + e_a^*\exp\left[-\frac{|e - c|^2}{2\sigma^2}\right](e - c)\right\} = E\left\{\exp\left[-\frac{|e - c|^2}{2\sigma^2}\right]\left(e_a(e - c)^* + e_a^*(e - c)\right)\right\} = E\left\{g_1(e)\left(2|e_a|^2 + e_a(v - c)^* + e_a^*(v - c)\right)\right\} \tag{30}$$
where
$$g_1(e) = \exp\left[-\frac{|e - c|^2}{2\sigma^2}\right] \tag{31}$$
We use a Taylor expansion to approximate $g_1(e)$ around $e = v$:
$$g_1(e) \approx g_1(v) + 2\mathrm{Re}\left\{\left.\frac{\partial g_1}{\partial e}\right|_{e=v} e_a\right\} + \mathrm{Re}\left\{\left.\frac{\partial^2 g_1}{\partial e^*\partial e^*}\right|_{e=v}(e_a^*)^2 + \left.\frac{\partial^2 g_1}{\partial e^*\partial e}\right|_{e=v}|e_a|^2\right\} \tag{32}$$
where
$$\frac{\partial g_1}{\partial e} = -\exp\left[-\frac{|e - c|^2}{2\sigma^2}\right]\frac{(e - c)^*}{2\sigma^2} \tag{33}$$
$$\frac{\partial g_1}{\partial e^*} = -\exp\left[-\frac{|e - c|^2}{2\sigma^2}\right]\frac{(e - c)}{2\sigma^2} \tag{34}$$
$$\frac{\partial^2 g_1}{\partial e^*\partial e} = \exp\left[-\frac{|e - c|^2}{2\sigma^2}\right]\frac{1}{2\sigma^2}\left(\frac{|e - c|^2}{2\sigma^2} - 1\right) \tag{35}$$
$$\frac{\partial^2 g_1}{\partial e^*\partial e^*} = \exp\left[-\frac{|e - c|^2}{2\sigma^2}\right]\frac{(e - c)^2}{(2\sigma^2)^2} \tag{36}$$
Since $\mathbf{x}$ is circular, we can get the values of the following two terms:
$$E\left[(e_a^*)^2\right] = 0 \tag{37}$$
$$E\left[e_a^2\right] = \tilde{\mathbf{w}}^H E\left[\mathbf{x}\mathbf{x}^T\right]\tilde{\mathbf{w}}^* = 0 \tag{38}$$
Based on the above derivation, if the higher-order terms are small enough, we can rewrite the left side of (28) as follows:
$$L \approx 2S\, E\left\{\exp\left[-\frac{|v - c|^2}{2\sigma^2}\right]\left(1 - \frac{|v - c|^2}{2\sigma^2}\right)\right\} \tag{39}$$
The right side of (28) can be written as
$$R = \eta_w\mathrm{Tr}(\mathbf{R}_x) E\left\{|f(e(k))|^2\right\} = \eta_w\mathrm{Tr}(\mathbf{R}_x) E\left\{\exp\left[-\frac{|e - c|^2}{\sigma^2}\right]|e - c|^2\right\} = \eta_w\mathrm{Tr}(\mathbf{R}_x) E\left\{g_2(e)\right\} \tag{40}$$
where $\mathbf{R}_x = E[\mathbf{x}(k)\mathbf{x}^H(k)]$ is the input autocorrelation matrix and
$$g_2(e) = \exp\left[-\frac{|e - c|^2}{\sigma^2}\right]|e - c|^2 \tag{41}$$
In a similar way, we use a Taylor expansion to approximate $g_2(e)$ around $e = v$:
$$g_2(e) \approx g_2(v) + \mathrm{Re}\left\{\left.\frac{\partial^2 g_2}{\partial e^*\partial e}\right|_{e=v}|e_a|^2 + \left.\frac{\partial^2 g_2}{\partial e^*\partial e^*}\right|_{e=v}(e_a^*)^2\right\} + 2\mathrm{Re}\left\{\left.\frac{\partial g_2}{\partial e}\right|_{e=v} e_a\right\} \tag{42}$$
where
$$\frac{\partial g_2}{\partial e} = \exp\left[-\frac{|e - c|^2}{\sigma^2}\right](e - c)^*\left(1 - \frac{|e - c|^2}{\sigma^2}\right) \tag{43}$$
$$\frac{\partial g_2}{\partial e^*} = \exp\left[-\frac{|e - c|^2}{\sigma^2}\right](e - c)\left(1 - \frac{|e - c|^2}{\sigma^2}\right) \tag{44}$$
$$\frac{\partial^2 g_2}{\partial e^*\partial e} = \exp\left[-\frac{|e - c|^2}{\sigma^2}\right]\left(\frac{|e - c|^4}{\sigma^4} - \frac{3|e - c|^2}{\sigma^2} + 1\right) \tag{45}$$
$$\frac{\partial^2 g_2}{\partial e^*\partial e^*} = \exp\left[-\frac{|e - c|^2}{\sigma^2}\right](e - c)^2\left(\frac{|e - c|^2}{\sigma^4} - \frac{2}{\sigma^2}\right) \tag{46}$$
If the higher-order terms are small enough, we can rewrite the right side of (28) as follows:
$$R \approx \eta_w\mathrm{Tr}(\mathbf{R}_x) E\left\{\exp\left[-\frac{|v - c|^2}{\sigma^2}\right]|v - c|^2\right\} + \eta_w\mathrm{Tr}(\mathbf{R}_x)\, S R_1 \tag{47}$$
where
$$R_1 = E\left\{\exp\left[-\frac{|v - c|^2}{\sigma^2}\right]\left(\frac{|v - c|^4}{\sigma^4} - \frac{3|v - c|^2}{\sigma^2} + 1\right)\right\} \tag{48}$$
Finally, we get the theoretical steady-state EMSE as follows:
$$S = \frac{\eta_w\mathrm{Tr}(\mathbf{R}_x) E\left\{\exp\left[-\frac{|v - c|^2}{\sigma^2}\right]|v - c|^2\right\}}{2E\left\{\exp\left[-\frac{|v - c|^2}{2\sigma^2}\right]\left(1 - \frac{|v - c|^2}{2\sigma^2}\right)\right\} - \eta_w\mathrm{Tr}(\mathbf{R}_x) R_1} \tag{49}$$
Furthermore, when $\eta_w$ is small enough, (49) is further simplified as
$$S = \frac{\eta_w\mathrm{Tr}(\mathbf{R}_x) E\left\{\exp\left[-\frac{|v - c|^2}{\sigma^2}\right]|v - c|^2\right\}}{2E\left\{\exp\left[-\frac{|v - c|^2}{2\sigma^2}\right]\left(1 - \frac{|v - c|^2}{2\sigma^2}\right)\right\}} \tag{50}$$
Moreover, we derive the theoretical value of $\sigma^2$ by setting $\frac{\partial}{\partial\sigma^2}\left\{-2E\left[G_\sigma^C(e - c)\right] + \frac{1}{4\pi\sigma^2}\right\} = 0$. In this way, we have $\frac{1}{\pi\sigma^4}E\left\{\exp\left(-\frac{|e - c|^2}{2\sigma^2}\right)\right\} - \frac{1}{\pi\sigma^6}E\left\{\exp\left(-\frac{|e - c|^2}{2\sigma^2}\right)\frac{|e - c|^2}{2}\right\} - \frac{1}{4\pi\sigma^4} = 0$. Since $e \approx v$ at the steady state, we can further obtain the theoretical value of $\sigma^2$ from:
$$\sigma^2 = \frac{E\left\{\frac{|v - c|^2}{2}\exp\left[-\frac{|v - c|^2}{2\sigma^2}\right]\right\}}{E\left\{\exp\left[-\frac{|v - c|^2}{2\sigma^2}\right]\right\} - \frac{1}{4}} \tag{51}$$
Since the right side of (51) depends on $\sigma^2$, this is a fixed-point equation whose solution gives the theoretical $\sigma^2$.
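Both (51) and (50) are easy to evaluate numerically. A minimal Python/NumPy sketch follows (our own illustration): it solves (51) by fixed-point iteration over noise samples and then plugs the result into (50); the noise model, step size, and the value used for $\mathrm{Tr}(\mathbf{R}_x)$ are all assumptions.

```python
import numpy as np

# Solve the fixed-point equation (51) for the theoretical sigma^2, then
# evaluate the simplified steady-state EMSE (50).
rng = np.random.default_rng(4)
n = 10**6
v = (3 + 3j) + np.sqrt(0.5) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
c = np.mean(v)                   # center matched to the noise mean
r2 = np.abs(v - c) ** 2          # samples of |v - c|^2

sig2 = 1.0                       # initial guess for sigma^2
for _ in range(100):             # fixed-point iteration of (51)
    g = np.exp(-r2 / (2 * sig2))
    sig2 = np.mean(r2 / 2 * g) / (np.mean(g) - 0.25)   # denominator assumed positive

eta_w, tr_Rx = 3.8e-4, 10.0      # step size and Tr(R_x) (assumed values)
num = eta_w * tr_Rx * np.mean(np.exp(-r2 / sig2) * r2)
den = 2 * np.mean(np.exp(-r2 / (2 * sig2)) * (1 - r2 / (2 * sig2)))
print("theoretical sigma^2:", sig2, "steady-state EMSE:", num / den)
```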
Remark 3.
The theoretical steady-state EMSE in (50) is accurate only when $e_a$ is small enough, since only then are the higher-order terms negligible. If the noise power or the step size is too large, or the center of the kernel function deviates from the mean of the noise, there will be a large deviation between the theoretical and simulated values of the steady-state EMSE.

4. Simulation

In this section, we present some simulations to show the validity of the theoretical results and the superiority of MCCC-VC. All simulation results are obtained by averaging over 300 Monte Carlo trials.

4.1. Steady-State Performance

In this part, the filter weight vector $\mathbf{w}_0 = [w_1\ w_2\ \cdots\ w_{10}]^T$ is randomly generated, where $w_k = w_{Rk} + jw_{Ik}$ with $w_{Rk}, w_{Ik} \sim N(0, 0.1)$; here $w_{Rk}$ and $w_{Ik}$ represent the real and imaginary components of $w_k$, and $N(\mu, \hat{\sigma}^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\hat{\sigma}^2$. We randomly generate the input signal $x = x_R + jx_I$. To show the robustness of MCCC-VC, additive complex noise $v = v_R + jv_I$ is added in the simulation, whose real and imaginary parts are denoted by $v_R$ and $v_I$, respectively. All algorithms initialize $\mathbf{w}$ with a zero vector.
First, we illustrate the correctness of the theoretical steady-state EMSEs. For each simulation, 30,000 iterations are carried out to make sure MCCC-VC reaches the steady state, and the last 1000 iterations are used to obtain the simulated steady-state EMSEs. The theoretical kernel width and steady-state EMSEs are calculated according to (51) and (50), respectively. Figure 2 and Figure 3 show the simulated and theoretical steady-state EMSEs of MCCC-VC under various noise variances and learning rates, where $v$ is Gaussian distributed with mean $3 + 3j$. It can be seen from both figures that the theoretical results closely match the simulated results.
Then, we change the noise to binary noise, also with mean $3 + 3j$; the simulated and theoretical steady-state EMSEs are obtained in the same way as before. Figure 4 and Figure 5 show the simulated and theoretical steady-state results of MCCC-VC under various noise variances and learning rates. Again, there is good agreement between the theoretical and simulated results.

4.2. Performance Comparison

In this part, we compare the performance of the proposed MCCC-VC algorithm with MCCC and the minimum complex kernel risk-sensitive loss (MCKRSL) [22]. For a fair comparison, all three algorithms use the gradient descent method to search for the optimal solution. We measure the performance of all the algorithms by the weight error power.
In this simulation, the noise $v(k)$ is composed of two independent components [16], i.e., $v(k) = (1 - a(k))A(k) + a(k)B(k)$, where $P(a(k) = 0) = 1 - c$ and $P(a(k) = 1) = c$ ($0 \leq c \leq 1$). $A(k)$ is the ordinary noise with small variance $\sigma_v^2 = 1$, whose real and imaginary parts are denoted by $A_R(k)$ and $A_I(k)$, and $B(k)$ represents the outliers with large variance, whose real and imaginary parts are denoted by $B_R(k)$ and $B_I(k)$.
We set $c = 0.05$ and $B_R, B_I \sim N(0, 100)$. In addition, we consider the following four cases for $A(k)$ (a sketch of generating such noise follows the list):
(1)
$A_R(k), A_I(k) \sim N(3, \sigma_v^2/2)$;
(2)
$P(A_R(k) = 3 + \sigma_v/2) = P(A_R(k) = 3 - \sigma_v/2) = P(A_I(k) = 3 + \sigma_v/2) = P(A_I(k) = 3 - \sigma_v/2) = 0.5$;
(3)
$A_R(k), A_I(k) \sim U(3 - \sigma_v/2,\, 3 + \sigma_v/2)$, with $U(\alpha, \beta)$ denoting the uniform distribution over $[\alpha, \beta]$;
(4)
$A_R(k) = 3 + \sigma_v\sin(\theta_{1k})/2$, $A_I(k) = 3 + \sigma_v\sin(\theta_{2k})/2$, where $\theta_{1k}, \theta_{2k} \sim U[0, 2\pi]$.
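The following Python/NumPy sketch generates this impulsive mixture noise for the four cases; it is our own reading of the setup, and the function name, interface, and exact placement of the scale factors are assumptions.

```python
import numpy as np

def impulsive_noise(n, case, c_prob=0.05, sigma_v=1.0, rng=None):
    """Generate v(k) = (1 - a(k)) A(k) + a(k) B(k) for the four cases above."""
    rng = rng or np.random.default_rng()
    a = rng.random(n) < c_prob                   # outlier indicator, P(a = 1) = c
    B = 10 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # B_R, B_I ~ N(0, 100)
    if case == 1:    # Gaussian components, N(3, sigma_v^2 / 2) each
        A = (3 + 3j) + np.sqrt(sigma_v**2 / 2) * (rng.standard_normal(n)
                                                  + 1j * rng.standard_normal(n))
    elif case == 2:  # binary components at 3 +- sigma_v / 2
        A = (3 + rng.choice([-1, 1], n) * sigma_v / 2) \
            + 1j * (3 + rng.choice([-1, 1], n) * sigma_v / 2)
    elif case == 3:  # uniform components on [3 - sigma_v/2, 3 + sigma_v/2]
        A = rng.uniform(3 - sigma_v / 2, 3 + sigma_v / 2, n) \
            + 1j * rng.uniform(3 - sigma_v / 2, 3 + sigma_v / 2, n)
    else:            # sinusoidal components with independent random phases
        A = (3 + sigma_v * np.sin(rng.uniform(0, 2 * np.pi, n)) / 2) \
            + 1j * (3 + sigma_v * np.sin(rng.uniform(0, 2 * np.pi, n)) / 2)
    return np.where(a, B, A)
```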
Figure 6, Figure 7, Figure 8 and Figure 9 show the convergence behavior of the various algorithms in terms of the weight error power $\|\mathbf{w}(k) - \mathbf{w}_0\|^2$ under the different noises, where the parameter settings of the algorithms are summarized in Table 1. It can be seen clearly that the convergence performance of MCCC-VC is better than that of the other two algorithms in all cases.

5. Conclusions

The complex correntropy usually employs a Gaussian kernel whose center is at zero, which is not the best choice in many situations. To overcome this defect, this paper proposes the maximum complex correntropy criterion with variable center (MCCC-VC), extending the complex correntropy to the case where the center can be located anywhere. Furthermore, this paper proposes an effective method to optimize the center position and the kernel width. More significantly, we analyze the convergence and steady-state performance of MCCC-VC theoretically. The simulation results in Section 4 support the reliability of the theoretical analysis and show the excellent performance of MCCC-VC.

Author Contributions

Conceptualization, F.D., G.Q., and S.W.; methodology, F.D., G.Q., and S.W.; software, F.D., G.Q., and S.W.; validation, G.Q.; formal analysis, G.Q.; investigation, F.D., G.Q., and S.W.; resources, G.Q.; data curation, F.D. and G.Q.; writing—original draft preparation, F.D.; writing—review and editing, G.Q. and S.W.; visualization, F.D. and G.Q.; supervision, F.D., G.Q., and S.W.; project administration, G.Q.; funding acquisition, S.W. and G.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China under grants 61671389 and 61701419, and Fundamental Research Funds for the Central Universities under grant XDJK2019B011.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Haykin, S. Adaptive Filter Theory, 3rd ed.; Prentice Hall: New York, NY, USA, 1996.
2. Sayed, A.H. Fundamentals of adaptive filtering. IEEE Control Syst. 2003, 25, 77–79.
3. Chen, B.; Zhu, Y.; Hu, J.; Principe, J.C. System Parameter Identification: Information Criteria and Algorithms; Newnes: Oxford, UK, 2013.
4. Widrow, B.; McCool, J.M.; Larimore, M.G.; Johnson, C.R. Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proc. IEEE 1976, 64, 1151–1162.
5. Kwong, R.H.; Johnston, E.W. A variable step size LMS algorithm. IEEE Trans. Signal Process. 1992, 40, 1633–1642.
6. Benesty, J.; Duhamel, P. A fast exact least mean square adaptive algorithm. IEEE Trans. Signal Process. 1992, 40, 2904–2920.
7. Diniz, P.S.R. Adaptive Filtering: Algorithms and Practical Implementation, 4th ed.; Springer: New York, NY, USA, 2013.
8. Pei, S.C.; Tseng, C.C. Least mean p-power error criterion for adaptive FIR filter. IEEE J. Sel. Areas Commun. 1994, 12, 1540–1547.
9. Al-Naffouri, T.Y.; Sayed, A.H. Adaptive filters with error nonlinearities: Mean-square analysis and optimum design. EURASIP J. Appl. Signal Process. 2001, 1, 192–205.
10. Erdogmus, D.; Principe, J.C. Generalized information potential criterion for adaptive system training. IEEE Trans. Neural Netw. 2002, 13, 1035–1044.
11. Principe, J.C. Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives; Springer: New York, NY, USA, 2010.
12. Sayin, M.O.; Vanli, N.D.; Kozat, S.S. A novel family of adaptive filtering algorithms based on the logarithmic cost. IEEE Trans. Signal Process. 2014, 62, 4411–4424.
13. Liu, W.; Pokharel, P.P.; Príncipe, J.C. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 2007, 55, 5286–5298.
14. Chen, B.; Xing, L.; Zhao, H.; Zheng, N.; Príncipe, J.C. Generalized correntropy for robust adaptive filtering. IEEE Trans. Signal Process. 2016, 64, 3376–3387.
15. Ma, W.; Qu, H.; Gui, G.; Xu, L.; Zhao, J.; Chen, B. Maximum correntropy criterion based sparse adaptive filtering algorithms for robust channel estimation under non-Gaussian environments. J. Franklin Inst. 2015, 352, 2708–2727.
16. Chen, B.; Xing, L.; Xu, B.; Zhao, H.; Zheng, N.; Príncipe, J.C. Kernel risk-sensitive loss: Definition, properties and application to robust adaptive filtering. IEEE Trans. Signal Process. 2017, 65, 2888–2901.
17. Mandic, D.; Goh, V. Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models (ser. Adaptive and Cognitive Dynamic Systems: Signal Processing, Learning, Communications and Control); Wiley: New York, NY, USA, 2009.
18. Shi, L.; Zhao, H.; Zakharov, Y. Performance analysis of shrinkage linear complex-valued LMS algorithm. IEEE Signal Process. Lett. 2019, 26, 1202–1206.
19. Guimarães, J.P.F.; Fontes, A.I.R.; Rego, J.B.A.; Martins, A.M.; Principe, J.C. Complex correntropy: Probabilistic interpretation and application to complex-valued data. IEEE Signal Process. Lett. 2017, 24, 42–45.
20. Guimarães, J.P.F.; Fontes, A.I.R.; Rego, J.B.A.; Martins, A.M.; Principe, J.C. Complex correntropy function: Properties, and application to a channel equalization problem. Expert Syst. Appl. 2018, 107, 173–181.
21. Qian, G.; Wang, S. Generalized complex correntropy: Application to adaptive filtering of complex data. IEEE Access 2018, 6, 19113–19120.
22. Qian, G.; Wang, S. Complex kernel risk-sensitive loss: Application to robust adaptive filtering in complex domain. IEEE Access 2018, 6, 60329–60338.
23. Guimarães, J.P.F.; Fontes, A.I.R.; da Silva, F.B. Complex correntropy induced metric applied to compressive sensing with complex-valued data. In Proceedings of the 2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI), Las Vegas, NV, USA, 8–10 April 2018; pp. 21–24.
24. Qian, G.; Luo, D.; Wang, S. A robust adaptive filter for a complex Hammerstein system. Entropy 2019, 21, 162.
25. Chen, B.; Wang, X.; Li, Y.; Principe, J.C. Maximum correntropy criterion with variable center. IEEE Signal Process. Lett. 2019, 26, 1212–1216.
26. Zhu, L.; Song, C.; Pan, L.; Li, J. Adaptive filtering under the maximum correntropy criterion with variable center. IEEE Access 2019, 7, 105902–105908.
27. Wirtinger, W. Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen. Math. Ann. 1927, 97, 357–375.
28. Bouboulis, P.; Theodoridis, S. Extension of Wirtinger's calculus to reproducing kernel Hilbert spaces and the complex kernel LMS. IEEE Trans. Signal Process. 2011, 59, 964–978.
29. Picinbono, B. On circularity. IEEE Trans. Signal Process. 1994, 42, 3473–3482.
Figure 1. Surfaces of maximum complex correntropy criterion with variable center (MCCC-VC) and MCCC.
Figure 2. Steady-state excess mean square errors (EMSEs) under various $\sigma_v^2$ (Gaussian distributed noise, $\eta_w = 3.8 \times 10^{-4}$, $\eta_\sigma = 4 \times 10^{-3}$).
Figure 3. Steady-state EMSEs under various $\eta_w$ (Gaussian distributed noise, $\sigma_v^2 = 1$, $\eta_\sigma = 4 \times 10^{-3}$).
Figure 4. Steady-state EMSEs under various $\sigma_v^2$ (Binary distributed noise, $\eta_w = 3.8 \times 10^{-4}$, $\eta_\sigma = 4 \times 10^{-3}$).
Figure 5. Steady-state EMSEs under various $\eta_w$ (Binary distributed noise, $\sigma_v^2 = 1$, $\eta_\sigma = 4 \times 10^{-3}$).
Figure 6. Convergence behavior of various algorithms (case 1).
Figure 7. Convergence behavior of various algorithms (case 2).
Figure 8. Convergence behavior of various algorithms (case 3).
Figure 9. Convergence behavior of various algorithms (case 4).
Table 1. Parameter settings of the different algorithms.

Algorithm    Parameters
MCCC         $\eta = 1 \times 10^{-3}$, $\sigma = 5$
MCKRSL       $\eta = 1.8 \times 10^{-4}$, $\sigma = 5$, $\lambda = 3$
MCCC-VC      $\eta_w = 4.8 \times 10^{-4}$, $\eta_\sigma = 4 \times 10^{-4}$, $\sigma(0) = 5$

Notes: $\eta$ and $\sigma$ denote the learning rate and kernel width for MCCC and MCKRSL, and $\lambda$ denotes the risk-sensitive parameter for MCKRSL. Moreover, $\eta_w$ and $\eta_\sigma$ denote the learning rates for the weight and kernel width of MCCC-VC, and $\sigma(0)$ denotes the initial kernel width of MCCC-VC.
