1. Introduction
Adaptive signal processing is carried out by minimizing or maximizing an appropriate performance criterion for adjusting weights of algorithms designed based on that criterion [
1]. The mean squared error (MSE) criterion that measures the average of the squares of the error signal is widely employed in the Gaussian noise environment. However in non-Gaussian noise like impulsive noise, the averaging process of squared error samples that may mitigate the effects of the Gaussian noise is defeated because a single large, impulse can dominate these sums. As recent signal processing methods, the information-theoretic learning (ITL) is based on the information potential concept that data samples can be treated as physical particles in an information potential field where they interact with each other by information forces [
2]. The ITL method usually exploits probability distribution functions constructed by the kernel density estimation method with the Gaussian kernel.
Among the ITL criteria, Euclidian distance (ED) between two distributions has been known to be effective in signal processing fields demanding similarity measure functions [
3,
4,
5]. For training of adaptive systems for medical diagnosis, the ED criterion has been successfully applied to distinguish biomedical datasets [
6]. For finite impulse response (FIR) adaptive filter structures in impulsive noise environments, ED between the output distribution and a set of Dirac delta functions has been used as an efficient performance criterion taking advantage of the outlier-cutting effect of Gaussian kernel for output pairs and symbol-output pairs [
7]. In this approach with output distribution and delta functions, minimization of the ED (MED) leads to adaptive algorithms that adjust weights so as for the output distribution to be formed into the shape of delta functions located at each symbol point, that is, output samples concentrate on symbol points. Though the blind MED algorithm shows superior performance of robustness against impulsive noise and channel distortions, a drawback of heavy computational burden lies in it. The computational complexity is due in large part to the double summation operations at each iteration time for its gradient estimation. A follow-up study [
8], however, shows that the drawback can be reduced significantly by employing a recursive gradient estimation method.
The gradient in ED minimization process of the MED algorithm has two components; one for kernel function of output pairs and the other for kernel function of symbol-output pairs. The roles of these two components have not been investigated or analyzed in scientific literature. In this paper, we analyze the roles of the two components and prove the analysis through controlling each component individually by normalizing each component with component-related input power. Through simulation in multipath channel equalization under impulsive noise, their roles of managing sample pairs are verified, and it is shown that the proposed method of controlling each component through power normalization increases convergence speed and lowers steady state MSE significantly in multipath and impulsive noise environment.
2. MSE Criterion and Related Algorithms
Employing the tapped delay line (TDL) structure, the output
yk becomes
at time
k with the input vector
and weight
. Given the desired signal
dk chosen randomly among the
M symbol points (
A1,
A2, …,
AM), the system error is calculated as
ek =
dk −
yk. In blind equalization, the constant modulus error
where
is mostly used [
9].
The MSE criterion, one of the most widely used criteria, is the statistical average
E[·] of error power
in supervised equalization and of CME power
in a blind one. For practical implementation we can use the instant squared error
as a cost function in supervised equalization. With the gradient
and a step size
μLMS, minimization of
leads to the least mean square (LMS) algorithm [
1]:
As an extension of the LMS algorithm, the normalized LMS (NLMS) algorithm has been introduced where the gradient is normalized as proportional to the inverse of the dot product of the input vector with itself
as a result of minimizing weight perturbation
of the LMS algorithm [
1]. Then the NLMS algorithm becomes:
The NLMS algorithm is known to be more stable with unknown signals and effective in real time adaptive systems [
10,
11]. We can see under impulsive noise environments that a single large error sample induced by impulsive noise can generate large weight perturbations. The perturbation becomes zero only when the error
ek is zero. So we can predict that the weight update process (1) may be unstable so that it requires a very small step size in impulsive noise environment. Also the LMS and NLMS algorithms utilizing instant error power
may cause instability in an impulsive noise environment.
3. ED Criterion and Entropy
Unlike the MSE based on error power, probability distribution functions can be used in constructing performance criterion. As one of the criteria utilizing distributions, the ED between the distribution of transmitted symbol
fD(
d) and the equalizer output distribution
fY(
y) is defined as (3) [
3,
6].
Assuming that modulation schemes are known to receivers beforehand and all the
M symbol points (
A1,
A2, …,
AM) are equally likely, the distribution of the transmitted symbols can be expressed as:
The output distribution can be estimated based on kernel density estimation method
with a set of available
N output samples {
y1,
y2, …,
yN} [
6].
Then the ED can be expressed as:
The first term 1/
M in (5) is a constant which is not adjustable, so the ED can be reduced to the following performance criterion
CED [
7]:
In ITL methods, data samples are treated as physical particles interacting with each other. If we place physical particles in the locations of
yi and
yj, the Gaussian kernel
produces an exponentially decaying positive value as the distance between the two particles increases. This leads us consider the Gaussian kernel
as a potential field-inducing interaction among particles. Then
corresponds to the sum of interactions on the
i-th particle and
is the averaged sum of all pairs of interactions. This summed potential energy is referred to as information potential in ITL methods [
2]. Therefore, the term
in (6) is the information potential between symbol points and output samples, and
in (6) indicates the information potential among output samples themselves.
On the other hand, the information potential can be interpreted in the concept of entropy that can be described in terms of “energy dispersal” or the “spreading of energy” [
11]. As one of the convenient entropy definitions, Reny’s entropy of order 2,
HReny(
y) is defined in (7) as logarithm of the sum of the power of probability which is much easier to estimate [
2]:
When the Reny’s entropy is used along with the kernel density estimation method
, we obtain a much simpler form of Reny’s quadratic entropy as:
Therefore the cost function
CED becomes:
Equations (9) and (11) indicate that the entropy of output samples increases as the distance (yj − yi) between the two information particles yj and yi increases. Therefore, (yj − yi) can be referred to as entropy-governing output and we can notice that (9) controls the spreading of output samples. Likewise, the term in (6), that is, in (11) governs dispreading or recombining the sample pairs of symbol points and output samples.
4. Entropy-Governing Variables and Recursive Algorithms
When defining
yj,i = (
yj −
yi) and
em,i = (
Am –
yi) and
Xj.i = (
Xj –
Xi) for convenience’s sake,
yj,i,
em,i and
Xj,i can be referred to as entropy-governing output, entropy-governing error and entropy-governing input, respectively. Using these entropy-governing variables and the on-line density estimation method
instead of
fY(
y), the cost function at time
k,
CED,k can be written as:
where:
Minimization of CED,k indicates that Uk forces spreading of output samples and −Vk compels output samples to move close to symbol points. Considering that initial-stage output samples which may have clustered about wrong places due to channel distortion, Uk is associated with the role of getting the output samples to move out in search of each destination, that is, accelerating initial convergence speed. On the other hand, Vk is related with compelling output samples near a symbol point to come close lowering the minimum MSE.
On the other hand, the double summation operations for
Uk and
Vk impose a heavy computational burden. In the work [
8] it has been revealed that each component
Uk+1 and
Vk+1 of
CED,k+1 =
Uk+1 −
Vk+1 can be recursively calculated so that the computational complexity of (12) is significantly reduced as in the following equations (15) and (16):
Similarly,
Vk+1 can be divided into the terms with
yk+1 and the terms with
yk−N+1:
The gradients
and
are calculated recursively by using Equations (15) and (16) as:
Similarly,
is calculated recursively as described below:
Since the argument
in (17) is a function of the entropy-governing output
yk,i, we can define
as the modified entropy-output
, which becomes a significantly mitigated value through the Gaussian kernel when the entropy-governing output
yk,i is a large value.
Similarly, we see that the argument
in (18) is a function of entropy-governing error
em,k, so that we have the modified entropy-error
as:
The modified entropy-error
also becomes a significantly reduced value through the Gaussian kernel when the entropy-governing error
em,k is large. Then (18) becomes:
Through minimization of
CED,k =
Uk −
Vk with the gradients
and
obtained by (20) and (22), the following recursive MED (RMED) algorithm can be derived [
7]:
Comparing the gradients of RMED to the gradient of the LMS algorithm in (1) which is composed of error and input, we may find that the gradients and in (20) and (22) have similar terms (modified entropy-output multiplied by entropy-input) and (modified entropy-error multiplied by input), respectively. Considering that impulsive noise may induce large entropy-governing output yk,i or entropy-governing error em,k, modified entropy-output and modified entropy-error which are significantly mitigated by the Gaussian kernel can be viewed as playing a crucial role in obtaining stable gradients under strong impulsive noise. Therefore we can anticipate that the RMED algorithm (23) can have a low weight perturbation in impulsive noise environments.
5. Input Power Estimation for Normalized Gradient
For the purpose of minimizing the weight perturbation
of the LMS algorithm in (1), the
NLMS algorithm has been introduced where the gradient is normalized by the averaged power of the current input samples
[
1].
Applying this approach to RMED we propose in this section to normalize the gradients in some ways. Since the role of
Uk (spreading output samples) is different from that of
Vk (moving output samples close to symbol points), the gradients of (23) can be normalized separately as:
where
PU(
k) is the average power of
Xi,k and
PV(
k) is the average power of
Xk as:
Since defeating the impulsive noise contained in the input by way of the average operation
is considered to be ineffective, the denominators of (26) and (27) are likely to be fluctuating under impulsive noise. This may cause the algorithm to be sensitive to impulsive noise. Also the summation operators make the algorithm demand computationally burdensome. To avoid these drawbacks, we can track the average power
PU(
k) and
PV(
k) recursively with the balance parameter
β (0 <
β <1) as:
With the recursive power estimation (28) and (29), we may summarize the proposed algorithm in a more formal one as in the
Table 1. In the following section, we will investigate the new RMED algorithm (25) with separate normalization by
PU(
k) in (28) and
PV(
k) in (29) in the aspect of convergence speed and steady state MSE.
6. Results and Discussion
A base-band communication system with multipath fading channel and impulsive noise used in the experiment is depicted in
Figure 1. The symbol set in the transmitter is composed of equally probable four symbols (−3, −1, 1, 3). The transmitted symbol is to be distorted by the multipath channel
H(
z) = 0.26 + 0.93
z−1 + 0.26
z−2 [
12]. The channel output is added by impulsive noise
nk. The distribution function of
nk,
f(
nk) is expressed in
Table 2 where
is the variance of impulses which are generated according to Poisson process (occurrence rate
ε) and
is that of the background Gaussian noise [
13]. The simulation setup and parameter values are described in the
Figure 1 and the
Table 2.
An example of impulsive noise being used in this simulation is depicted in
Figure 2.
It has in
Section 4 been analyzed that
Uk is associated with the role of spreading output samples which are clustered to wrong positions due to distorted channel characteristics and
Vk is related with moving output samples close to symbol points. This process can be explained through initial-stage investigation of what happens in the error distribution and observing how the distribution of output samples changes in the experimental environment.
Figure 3 shows the error distribution in the initial stage with 200 error samples and ensemble average of 500 runs. Considering the four symbol points are (−3, −1, 1, 3), error values greater than 1.0 are associated with output samples which can be decided as wrong symbols. The cumulative probability of initial output samples placed in the wrong regions in this respect is calculated to be 0.35 from the
Figure 3 (35% output samples are not in place). The peaks or ridges in the error distribution are about 6 on each side. This observation may indicate that output samples are clustered or grouped in some regions (two groups are within the correct range but 4 groups are in the incorrect positions on each side of the distribution). This result coincides clearly with the initial output distribution in
Figure 4. The output distribution showing about 12 peaks indicates that the initial output samples are clustered into 12 groups mostly located out of place, that is, not around −3, −1, 1, 3.
On the 35% output samples clustered in the wrong symbol regions, the spreading force has a positive effect in order for them in blind search to move out for finding their correct symbol positions. This process is observed in the graph of
k = 700 in
Figure 4. The output distribution at time
k = 700 has an evenly spread shape, indicating that the clustered output samples have moved out and mingled with one another. At the sample time
k = 1800 the output samples start to position at their correct symbol areas. From this phase, the force moving output samples close to the symbol points is in effect on lowering steady state MSE.
These results imply that
Uk is related with convergence speed and
Vk with steady state MSE. To verify this analysis we experiment the proposed algorithm in the following three modes with respect to convergence speed and steady state MSE (we assume that steady state MSE is close to minimum MSE):
Mode 1 of RMED-SN algorithm in (30) is for observing changes in initial convergence speed by normalizing only by the average power PU(k) of entropy-input Xi,k compared to the not-normalized RMED. Mode 2 is to observe whether the normalization of by PV(k) of input Xk without managing Uk lowers the steady state MSE of RMED. Finally we see if Mode 3 employing normalization of and simultaneously yields both of the two performance enhancements; faster convergence and lowered steady state MSE.
Figure 5 shows the MSE learning performance for CMA, LMS, RMED and Mode 1 of the proposed algorithm. As discussed in
Section 2, the learning curves of the MSE-based algorithms, CMA and LMS do not fall down below −6 dB being defeated by the impulsive noise. On the other hand, the RMED and proposed algorithm show a rapid and stable convergence. The difference of convergence speed between RMED and Mode 1 is clearly observed. While the RMED converges in about 4000 samples, the Mode 1 does in about 2000 samples. Therefore, Mode 1 shows faster convergence than the RMED algorithm by 2 times verifying the analysis of the role of
since only
is normalized but
is not, and we see little difference (about 1 dB) in the steady state MSE.
In
Figure 6 RMED and Mode 2 are compared. Both algorithms have similar convergence speed with difference of only 500 samples. But after convergence the Mode 2 yields much lower steady state MSE than the original RMED by over 2 dB. These findings indicate that the role of
Vk is definitely related with lowering minimum MSE. This is in accordance with the analysis that
Uk plays the role of pulling error samples close together.
Furthermore, Mode 3 employing normalization of
and
simultaneously proves to yield the two merits of performance enhancement revealing increased speed and lowered steady state MSE as depicted in
Figure 7. While the RMED converges in about 4000 samples and leaves its steady state MSE at about 25 dB, the Mode 3 converges in about 2000 samples and has about 27 dB of steady state MSE. By employing Mode 3, we obtained faster convergence by about 2 times and lower steady state MSE by over 2 dB.
In Mode 3, it is still not clear whether the normalization to
Uk for speeding up the initial convergence may have a negative influence in later iterations, so we try to reduce the
Uk normalization gradually after convergence (
k ≥ 3000) by using
in place of
PU(
k) as:
where
k ≥ 3000 and a constant
c is 0 ≤
c ≤ 1.
The results for
c = 0.8, 0.9, 0.99, 1.0 are shown in
Figure 8 in terms of error distribution since the learning curves for the various constant values are not clearly distinguishable.
The value of
c in (33) may be related with the degree of gradual reduction in the normalization to
Uk, that is,
c = 1 indicates no reduction (Mode 3 as it is) and
c = 0.8 means comparatively rapid reduction. From the
Figure 8, we observe that the error performance becomes better and then worse as the degree of reduction decreases from 0.8 to 1.0. This implies that the gradual reduction of the normalization to
Uk is effective but not much. We may conclude that the normalization to
Uk for speeding up the initial convergence has a slight negative influence in later iterations and this can be overcome by employing the gradual reduction of the
Uk normalization.