Article

Function Analysis of the Euclidean Distance between Probability Distributions

Namyong Kim
Division of Electronic & Information Communication, Kangwon National University, Samcheok 25913, Korea
Entropy 2018, 20(1), 48; https://doi.org/10.3390/e20010048
Submission received: 20 November 2017 / Revised: 31 December 2017 / Accepted: 8 January 2018 / Published: 11 January 2018
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Minimization of the Euclidean distance between the output distribution and a set of Dirac delta functions as a performance criterion is known to match the distribution of the system output with the delta functions. In the analysis of the algorithm developed from that criterion and recursive gradient estimation, it is revealed in this paper that the minimization process of the cost function has two gradients with different functions: one that forces spreading of output samples and another that compels output samples to move close to the symbol points. To investigate these two roles, each gradient is controlled separately by normalizing it with the power of its related input. From the analysis and experimental results, it is verified that one gradient is associated with accelerating the initial convergence speed by spreading output samples, and that the other is related to lowering the minimum mean squared error (MSE) by pulling error samples close together.

1. Introduction

Adaptive signal processing adjusts the weights of an algorithm by minimizing or maximizing an appropriate performance criterion [1]. The mean squared error (MSE) criterion, which measures the average of the squared error signal, is widely employed in Gaussian noise environments. In non-Gaussian noise such as impulsive noise, however, the averaging of squared error samples that mitigates the effects of Gaussian noise is defeated, because a single large impulse can dominate the sum. Among more recent signal processing methods, information-theoretic learning (ITL) is based on the information potential concept, in which data samples are treated as physical particles in an information potential field where they interact with each other through information forces [2]. ITL methods usually exploit probability distribution functions constructed by kernel density estimation with a Gaussian kernel.
Among the ITL criteria, the Euclidean distance (ED) between two distributions is known to be effective in signal processing fields demanding similarity measures [3,4,5]. For the training of adaptive systems for medical diagnosis, the ED criterion has been successfully applied to distinguish biomedical datasets [6]. For finite impulse response (FIR) adaptive filter structures in impulsive noise environments, the ED between the output distribution and a set of Dirac delta functions has been used as an efficient performance criterion, taking advantage of the outlier-cutting effect of the Gaussian kernel on output pairs and symbol-output pairs [7]. In this approach, minimization of the ED (MED) leads to adaptive algorithms that adjust the weights so that the output distribution is formed into the shape of delta functions located at each symbol point; that is, the output samples concentrate on the symbol points. Although the blind MED algorithm is robust against impulsive noise and channel distortion, it suffers from a heavy computational burden. The computational complexity is due in large part to the double summation operations required at each iteration for its gradient estimation. A follow-up study [8], however, shows that this drawback can be reduced significantly by employing a recursive gradient estimation method.
The gradient used in the ED minimization process of the MED algorithm has two components: one for the kernel function of output pairs and the other for the kernel function of symbol-output pairs. The roles of these two components have not been investigated or analyzed in the literature. In this paper, we analyze the roles of the two components and verify the analysis by controlling each component individually, normalizing it with its related input power. Through simulations of multipath channel equalization under impulsive noise, their roles in managing sample pairs are verified, and it is shown that the proposed method of controlling each component through power normalization significantly increases the convergence speed and lowers the steady state MSE.

2. MSE Criterion and Related Algorithms

Employing the tapped delay line (TDL) structure, the output at time $k$ becomes $y_k = W_k^T X_k$ with the input vector $X_k = [x_k, x_{k-1}, \ldots, x_{k-L+1}]^T$ and the weight vector $W_k = [w_{0,k}, w_{1,k}, \ldots, w_{L-1,k}]^T$. Given the desired signal $d_k$ chosen randomly among the $M$ symbol points $(A_1, A_2, \ldots, A_M)$, the system error is calculated as $e_k = d_k - y_k$. In blind equalization, the constant modulus error $e_{\mathrm{CME},k} = |y_k|^2 - R_2$, where $R_2 = E[|d_k|^4]/E[|d_k|^2]$, is mostly used [9].
The MSE criterion, one of the most widely used criteria, is the statistical average $E[\cdot]$ of the error power $e_k^2$ in supervised equalization and of the CME power $(|y_k|^2 - R_2)^2$ in blind equalization. For practical implementation we can use the instantaneous squared error $e_k^2$ as a cost function in supervised equalization. With the gradient $\partial e_k^2 / \partial W = -2 e_k X_k$ and a step size $\mu_{\mathrm{LMS}}$, minimization of $e_k^2$ leads to the least mean square (LMS) algorithm [1]:
$$W_{k+1} = W_k - \mu_{\mathrm{LMS}} \frac{\partial e_k^2}{\partial W} = W_k + 2\mu_{\mathrm{LMS}} e_k X_k \quad (1)$$
As an extension of the LMS algorithm, the normalized LMS (NLMS) algorithm has been introduced, where the gradient is normalized in proportion to the inverse of the dot product of the input vector with itself, $\|X_k\|^2 = X_k^T X_k = \sum_{m=0}^{L-1} x_{k-m}^2$, as a result of minimizing the weight perturbation $\|W_{k+1} - W_k\|^2$ of the LMS algorithm [1]. Then the NLMS algorithm becomes:
$$W_{k+1} = W_k + \mu_{\mathrm{NLMS}} \frac{e_k X_k}{\sum_{m=0}^{L-1} x_{k-m}^2} \quad (2)$$
The NLMS algorithm is known to be more stable with unknown signals and effective in real-time adaptive systems [10,11]. Under impulsive noise, a single large error sample induced by an impulse can generate a large weight perturbation, and the perturbation becomes zero only when the error $e_k$ is zero. We can therefore predict that the weight update process (1) may be unstable and require a very small step size in impulsive noise environments. Likewise, the LMS and NLMS algorithms, which utilize the instantaneous error power $e_k^2$, may become unstable in an impulsive noise environment.
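As a concrete illustration of the updates in (1) and (2), the following Python sketch performs one LMS step and one NLMS step for a TDL filter. The function names, the example values, and the small regularization constant eps are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def lms_update(W, X, d, mu):
    """One LMS step, Equation (1): W <- W + 2*mu*e*X with e = d - W^T X."""
    e = d - W @ X
    return W + 2 * mu * e * X

def nlms_update(W, X, d, mu, eps=1e-8):
    """One NLMS step, Equation (2): the gradient is normalized by the input power ||X||^2."""
    e = d - W @ X
    return W + mu * e * X / (X @ X + eps)

# Example: an 11-tap TDL filter driven by a random input vector.
rng = np.random.default_rng(0)
W = np.zeros(11)
X = rng.standard_normal(11)
d = 1.0                       # desired symbol
W = lms_update(W, X, d, mu=0.0002)
W = nlms_update(W, X, d, mu=0.1)
```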

3. ED Criterion and Entropy

Unlike the MSE criterion based on error power, probability distribution functions can be used to construct a performance criterion. As one of the criteria utilizing distributions, the ED between the distribution of the transmitted symbols $f_D(d)$ and the equalizer output distribution $f_Y(y)$ is defined as (3) [3,6]:
$$\mathrm{ED} = \int \left[ f_D(\alpha) - f_Y(\alpha) \right]^2 d\alpha \quad (3)$$
Assuming that the modulation scheme is known to the receiver beforehand and all $M$ symbol points $(A_1, A_2, \ldots, A_M)$ are equally likely, the distribution of the transmitted symbols can be expressed as:
$$f_D(\alpha) = \frac{1}{M}\left[\delta(\alpha - A_1) + \delta(\alpha - A_2) + \cdots + \delta(\alpha - A_m) + \cdots + \delta(\alpha - A_M)\right] \quad (4)$$
The output distribution can be estimated by the kernel density estimation method, $f_Y(y) = \frac{1}{N}\sum_{i=1}^{N} G_\sigma(y - y_i)$, with a set of $N$ available output samples $\{y_1, y_2, \ldots, y_N\}$ [6].
Then the ED can be expressed as:
$$\mathrm{ED} = \frac{1}{M} + \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i) - \frac{2}{MN}\sum_{m=1}^{M}\sum_{i=1}^{N} G_\sigma(A_m - y_i) \quad (5)$$
The first term $1/M$ in (5) is a constant that is not adjustable, so the ED can be reduced to the following performance criterion $C_{\mathrm{ED}}$ [7]:
$$C_{\mathrm{ED}} = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i) - \frac{2}{MN}\sum_{m=1}^{M}\sum_{i=1}^{N} G_\sigma(A_m - y_i) \quad (6)$$
In ITL methods, data samples are treated as physical particles interacting with each other. If we place physical particles at the locations $y_i$ and $y_j$, the Gaussian kernel $G_{\sigma\sqrt{2}}(y_j - y_i)$ produces an exponentially decaying positive value as the distance between the two particles increases. This leads us to consider the Gaussian kernel $G_{\sigma\sqrt{2}}(y_j - y_i)$ as a potential field inducing interaction among particles. Then $\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i)$ corresponds to the sum of interactions on the $i$-th particle, and $\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i)$ is the averaged sum of all pairs of interactions. This summed potential energy is referred to as the information potential in ITL methods [2]. Therefore, the term $\frac{1}{MN}\sum_{m=1}^{M}\sum_{i=1}^{N} G_\sigma(A_m - y_i)$ in (6) is the information potential between the symbol points and the output samples, and $\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i)$ in (6) indicates the information potential among the output samples themselves.
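The following Python sketch evaluates the cost in (6) directly from a block of output samples, using the Gaussian kernels defined above; the function names and example values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """Gaussian kernel G_sigma(u) = exp(-u^2 / (2*sigma^2)) / (sigma*sqrt(2*pi))."""
    return np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def ed_cost(y, symbols, sigma):
    """C_ED in (6): output-pair information potential minus twice the
    symbol-output information potential."""
    y = np.asarray(y, dtype=float)
    A = np.asarray(symbols, dtype=float)
    N, M = len(y), len(A)
    # Output-pair potential uses kernel size sigma*sqrt(2).
    ip_outputs = gaussian_kernel(y[:, None] - y[None, :], sigma * np.sqrt(2)).sum() / N**2
    # Symbol-output potential uses kernel size sigma.
    ip_symbols = gaussian_kernel(A[:, None] - y[None, :], sigma).sum() / (M * N)
    return ip_outputs - 2 * ip_symbols

# Outputs clustered near the 4-PAM symbol points give a lower (better) cost.
symbols = [-3, -1, 1, 3]
print(ed_cost([-2.9, -1.1, 0.9, 3.1], symbols, sigma=0.6))
print(ed_cost([0.2, 0.1, -0.1, -0.2], symbols, sigma=0.6))
```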
On the other hand, the information potential can be interpreted through the concept of entropy, which can be described in terms of "energy dispersal" or the "spreading of energy" [11]. As one of the convenient entropy definitions, Rényi's entropy of order 2, $H_{\mathrm{Rényi}}(y)$, is defined in (7) as the negative logarithm of the sum of squared probabilities, which is much easier to estimate [2]:
$$H_{\mathrm{Rényi}}(y) = -\log\left(\sum_{i=1}^{N} p_i^2\right) \quad (7)$$
When Rényi's entropy is used along with the kernel density estimation method $f_Y(y) = \frac{1}{N}\sum_{i=1}^{N} G_\sigma(y - y_i)$, we obtain a much simpler form of Rényi's quadratic entropy as:
$$H_{\mathrm{Rényi}}(y) = -\log\left(\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i)\right) \quad (8)$$
This leads to:
$$\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(y_j - y_i) = 2^{-H_{\mathrm{Rényi}}(y)} \quad (9)$$
Likewise:
$$\frac{1}{MN}\sum_{m=1}^{M}\sum_{i=1}^{N} G_\sigma(A_m - y_i) = \frac{N}{M}\, 2^{-H_{\mathrm{Rényi}}(A_m, x)} \quad (10)$$
Therefore, the cost function $C_{\mathrm{ED}}$ becomes:
$$C_{\mathrm{ED}} = 2^{-H_{\mathrm{Rényi}}(y)} - \frac{2N}{M}\, 2^{-H_{\mathrm{Rényi}}(A_m, x)} \quad (11)$$
Equations (9) and (11) indicate that the entropy of the output samples increases as the distance $(y_j - y_i)$ between the two information particles $y_j$ and $y_i$ increases. Therefore, $(y_j - y_i)$ can be referred to as the entropy-governing output, and we can see that (9) controls the spreading of the output samples. Likewise, the term $-\frac{2}{MN}\sum_{m=1}^{M}\sum_{i=1}^{N} G_\sigma(A_m - y_i)$ in (6), that is, $-\frac{2N}{M}\, 2^{-H_{\mathrm{Rényi}}(A_m, x)}$ in (11), governs the de-spreading, or recombining, of the sample pairs of symbol points and output samples.
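As a numerical check of the entropy interpretation in (8) and (9), a minimal sketch of Rényi's quadratic entropy estimated from samples is given below. A base-2 logarithm is assumed so that the information potential equals $2^{-H}$ as in (9); the function name and example values are illustrative.

```python
import numpy as np

def renyi_quadratic_entropy(y, sigma):
    """Estimate Renyi's quadratic entropy (8): the negative log of the
    pairwise information potential with kernel size sigma*sqrt(2)."""
    y = np.asarray(y, dtype=float)
    d = y[:, None] - y[None, :]
    ip = np.mean(np.exp(-d**2 / (4 * sigma**2)) / (2 * sigma * np.sqrt(np.pi)))
    return -np.log2(ip)   # base-2 log, so the potential equals 2**(-H) as in (9)

# Spread-out samples have higher entropy than tightly clustered ones.
print(renyi_quadratic_entropy([-3, -1, 1, 3], sigma=0.6))
print(renyi_quadratic_entropy([0.0, 0.1, -0.1, 0.05], sigma=0.6))
```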

4. Entropy-Governing Variables and Recursive Algorithms

Defining $y_{j,i} = (y_j - y_i)$, $e_{m,i} = (A_m - y_i)$, and $X_{j,i} = (X_j - X_i)$ for convenience, $y_{j,i}$, $e_{m,i}$, and $X_{j,i}$ can be referred to as the entropy-governing output, entropy-governing error, and entropy-governing input, respectively. Using these entropy-governing variables and the on-line density estimation method $f_{Y,k}(y) = \frac{1}{N}\sum_{i=k-N+1}^{k} G_\sigma(y - y_i)$ in place of $f_Y(y)$, the cost function at time $k$, $C_{\mathrm{ED},k}$, can be written as:
$$C_{\mathrm{ED},k} = U_k - V_k \quad (12)$$
where:
$$U_k = 2^{-H_{\mathrm{Rényi}}(y)} = \frac{1}{N^2}\sum_{i=k-N+1}^{k}\sum_{j=k-N+1}^{k} G_{\sigma\sqrt{2}}(y_{j,i}) \quad (13)$$
$$V_k = \frac{2N}{M}\, 2^{-H_{\mathrm{Rényi}}(A_m, x)} = \frac{2}{MN}\sum_{m=1}^{M}\sum_{i=k-N+1}^{k} G_\sigma(e_{m,i}) \quad (14)$$
Minimization of $C_{\mathrm{ED},k}$ indicates that $U_k$ forces spreading of the output samples while $-V_k$ compels the output samples to move close to the symbol points. Considering that initial-stage output samples may have clustered around wrong places due to channel distortion, $U_k$ is associated with getting the output samples to move out in search of their destinations, that is, with accelerating the initial convergence speed. On the other hand, $V_k$ is related to compelling the output samples near a symbol point to come close together, lowering the minimum MSE.
On the other hand, the double summation operations for $U_k$ and $V_k$ impose a heavy computational burden. In [8] it was revealed that each component $U_{k+1}$ and $V_{k+1}$ of $C_{\mathrm{ED},k+1} = U_{k+1} - V_{k+1}$ can be calculated recursively, so that the computational complexity of (12) is significantly reduced, as in the following Equations (15) and (16):
$$U_{k+1} = U_k + \frac{2}{N^2}\sum_{i=k-N+1}^{k} G_{\sigma\sqrt{2}}(y_{i,k+1}) - \frac{2}{N^2}\sum_{i=k-N+1}^{k} \frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{i,k-N+1})^2}{4\sigma^2}\right] - \frac{2}{N^2}\cdot\frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{k+1,k-N+1})^2}{4\sigma^2}\right] + \frac{2}{N^2}\cdot\frac{1}{2\sigma\sqrt{\pi}} \quad (15)$$
Similarly, $V_{k+1}$ can be divided into the terms with $y_{k+1}$ and the terms with $y_{k-N+1}$:
$$V_{k+1} = V_k + \frac{2}{NM}\sum_{m=1}^{M}\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(e_{m,k+1})^2}{2\sigma^2}\right] - \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(e_{m,k-N+1})^2}{2\sigma^2}\right]\right] \quad (16)$$
The gradients $\frac{\partial U_k}{\partial W}$ and $\frac{\partial V_k}{\partial W}$ are calculated recursively using Equations (15) and (16) as:
$$\frac{\partial U_k}{\partial W} = \frac{\partial U_{k-1}}{\partial W} + \frac{1}{N^2\sigma^2}\sum_{i=k-N}^{k-1}(y_{k,i})\cdot\frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{k,i})^2}{4\sigma^2}\right]\cdot X_{i,k} - \frac{1}{N^2\sigma^2}\sum_{i=k-N}^{k-1}(y_{k-N,i})\cdot\frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{k-N,i})^2}{4\sigma^2}\right]\cdot X_{i,k-N} - \frac{1}{N^2\sigma^2}(y_{k-N,k})\cdot\frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{k-N,k})^2}{4\sigma^2}\right]\cdot X_{k,k-N} \quad (17)$$
Similarly, $\frac{\partial V_k}{\partial W}$ is calculated recursively as described below:
$$\frac{\partial V_k}{\partial W} = \frac{\partial V_{k-1}}{\partial W} + \frac{2}{NM\sigma^2}\sum_{m=1}^{M}\left[(e_{m,k})\cdot\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(e_{m,k})^2}{2\sigma^2}\right]\cdot X_k - (e_{m,k-N})\cdot\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(e_{m,k-N})^2}{2\sigma^2}\right]\cdot X_{k-N}\right] \quad (18)$$
Since the factor $y_{k,i}\cdot\frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{k,i})^2}{4\sigma^2}\right]$ in (17) is a function of the entropy-governing output $y_{k,i}$, we can define it as the modified entropy-output $\hat{y}_{k,i}$, which becomes a significantly mitigated value through the Gaussian kernel when the entropy-governing output $y_{k,i}$ is large:
$$\hat{y}_{k,i} = y_{k,i}\cdot\frac{1}{2\sigma\sqrt{\pi}}\exp\left[-\frac{(y_{k,i})^2}{4\sigma^2}\right] \quad (19)$$
Then (17) becomes
$$\frac{\partial U_k}{\partial W} = \frac{\partial U_{k-1}}{\partial W} + \frac{1}{N^2\sigma^2}\sum_{i=k-N}^{k-1}\hat{y}_{k,i}\cdot X_{i,k} - \frac{1}{N^2\sigma^2}\sum_{i=k-N}^{k-1}\hat{y}_{k-N,i}\cdot X_{i,k-N} - \frac{1}{N^2\sigma^2}\hat{y}_{k-N,k}\cdot X_{k,k-N} \quad (20)$$
Similarly, the factor $e_{m,k}\cdot\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(e_{m,k})^2}{2\sigma^2}\right]$ in (18) is a function of the entropy-governing error $e_{m,k}$, so we have the modified entropy-error $\hat{e}_{m,k}$ as:
$$\hat{e}_{m,k} = e_{m,k}\cdot\frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(e_{m,k})^2}{2\sigma^2}\right] \quad (21)$$
The modified entropy-error $\hat{e}_{m,k}$ also becomes a significantly reduced value through the Gaussian kernel when the entropy-governing error $e_{m,k}$ is large. Then (18) becomes:
$$\frac{\partial V_k}{\partial W} = \frac{\partial V_{k-1}}{\partial W} + \frac{2}{NM\sigma^2}\sum_{m=1}^{M}\left[\hat{e}_{m,k}\cdot X_k - \hat{e}_{m,k-N}\cdot X_{k-N}\right] \quad (22)$$
Through minimization of $C_{\mathrm{ED},k} = U_k - V_k$ with the gradients $\frac{\partial U_k}{\partial W}$ and $\frac{\partial V_k}{\partial W}$ obtained by (20) and (22), the following recursive MED (RMED) algorithm can be derived [7]:
$$W_{k+1} = W_k - \mu_{\mathrm{RMED}}\frac{\partial(U_k - V_k)}{\partial W} = W_k - \mu_{\mathrm{RMED}}\left(\frac{\partial U_k}{\partial W} - \frac{\partial V_k}{\partial W}\right) \quad (23)$$
Comparing the gradients of RMED with the gradient $\frac{\partial e_k^2}{\partial W} = -2e_k X_k$ of the LMS algorithm in (1), which is composed of error and input, we find that the gradients $\frac{\partial U_k}{\partial W}$ and $\frac{\partial V_k}{\partial W}$ in (20) and (22) have the similar terms $\hat{y}_{k,i}\cdot X_{i,k}$ (modified entropy-output multiplied by entropy-governing input) and $\hat{e}_{m,k}\cdot X_k$ (modified entropy-error multiplied by input), respectively. Considering that impulsive noise may induce a large entropy-governing output $y_{k,i}$ or entropy-governing error $e_{m,k}$, the modified entropy-output $\hat{y}_{k,i}$ and modified entropy-error $\hat{e}_{m,k}$, which are significantly mitigated by the Gaussian kernel, can be viewed as playing a crucial role in obtaining stable gradients under strong impulsive noise. Therefore we can anticipate that the RMED algorithm (23) has a low weight perturbation in impulsive noise environments.
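To make the recursion concrete, the following Python sketch implements the recursive gradients (20) and (22) and the RMED update (23) over a sliding window of N + 1 samples. The class structure, buffer handling, and parameter defaults (borrowed from Tables 1 and 2) are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np
from collections import deque

def g_hat_y(d, sigma):
    """Modified entropy-output (19): d * G_{sigma*sqrt(2)}(d)."""
    return d * np.exp(-d**2 / (4 * sigma**2)) / (2 * sigma * np.sqrt(np.pi))

def g_hat_e(d, sigma):
    """Modified entropy-error (21): d * G_sigma(d)."""
    return d * np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

class RMEDEqualizer:
    """A minimal sketch of the RMED recursion (20), (22) and update (23)."""
    def __init__(self, L=11, N=6, symbols=(-3, -1, 1, 3), sigma=0.6, mu=0.005):
        self.W = np.zeros(L); self.W[L // 2] = 1.0      # center-tap initialization (Table 1)
        self.N, self.A, self.sigma, self.mu = N, np.asarray(symbols, float), sigma, mu
        self.grad_U = np.zeros(L); self.grad_V = np.zeros(L)
        self.ybuf = deque(maxlen=N + 1)                 # y_{k-N}, ..., y_k
        self.Xbuf = deque(maxlen=N + 1)                 # X_{k-N}, ..., X_k

    def step(self, Xk):
        yk = self.W @ Xk
        self.ybuf.append(yk); self.Xbuf.append(Xk)
        if len(self.ybuf) < self.N + 1:                 # wait until the window is full
            return yk
        y, X = list(self.ybuf), list(self.Xbuf)         # index 0 is time k-N, index N is k
        N, M, s2 = self.N, len(self.A), self.sigma**2
        c = 1.0 / (N**2 * s2)
        # Recursive update of dU/dW, Equation (20): add pairs with y_k, drop pairs with y_{k-N}.
        for i in range(N):                              # i runs over times k-N, ..., k-1
            self.grad_U += c * g_hat_y(y[N] - y[i], self.sigma) * (X[i] - X[N])
            self.grad_U -= c * g_hat_y(y[0] - y[i], self.sigma) * (X[i] - X[0])
        self.grad_U -= c * g_hat_y(y[0] - y[N], self.sigma) * (X[N] - X[0])
        # Recursive update of dV/dW, Equation (22).
        for Am in self.A:
            self.grad_V += (2.0 / (N * M * s2)) * (
                g_hat_e(Am - y[N], self.sigma) * X[N] - g_hat_e(Am - y[0], self.sigma) * X[0])
        # Weight update, Equation (23).
        self.W = self.W - self.mu * (self.grad_U - self.grad_V)
        return yk
```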

5. Input Power Estimation for Normalized Gradient

For the purpose of minimizing the weight perturbation $\|W_{k+1} - W_k\|^2$ of the LMS algorithm in (1), the NLMS algorithm has been introduced, where the gradient is normalized by the averaged power of the current input samples, $\|X_k\|^2 = X_k^T X_k = \sum_{m=0}^{L-1} x_{k-m}^2$ [1]:
$$W_{k+1} = W_k + \mu_{\mathrm{NLMS}}\frac{e_k X_k}{\|X_k\|^2} \quad (24)$$
Applying this approach to RMED, we propose in this section to normalize its gradients separately. Since the role of $U_k$ (spreading output samples) is different from that of $V_k$ (moving output samples close to symbol points), the gradients in (23) can be normalized separately as:
$$W_{k+1} = W_k - \mu_{\mathrm{RMED}}\frac{\partial U_k}{\partial W}\frac{1}{P_U(k)} + \mu_{\mathrm{RMED}}\frac{\partial V_k}{\partial W}\frac{1}{P_V(k)} \quad (25)$$
where $P_U(k)$ is the average power of $X_{i,k}$ and $P_V(k)$ is the average power of $X_k$:
$$P_U(k) = \frac{1}{N}\sum_{i=k-N+1}^{k}\sum_{j=k-N+1}^{k} |x_{i,j}|^2 \quad (26)$$
$$P_V(k) = \frac{1}{N}\sum_{i=k-N+1}^{k} |x_i|^2 \quad (27)$$
Since defeating the impulsive noise contained in the input by way of the averaging operation $\frac{1}{N}\sum_{i=k-N+1}^{k}$ is considered ineffective, the denominators in (26) and (27) are likely to fluctuate under impulsive noise. This may cause the algorithm to be sensitive to impulsive noise. Also, the summation operators make the algorithm computationally burdensome. To avoid these drawbacks, we can track the average powers $P_U(k)$ and $P_V(k)$ recursively with a balance parameter $\beta$ ($0 < \beta < 1$) as:
$$P_U(k) = \beta P_U(k-1) + (1-\beta)\sum_{j=k-N+1}^{k} |x_{k,j}|^2 \quad (28)$$
$$P_V(k) = \beta P_V(k-1) + (1-\beta)|x_k|^2 \quad (29)$$
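A minimal sketch of the recursive power estimates (28) and (29) and the separately normalized update (25) is given below, treating the inputs as scalar samples as in (27); the value of beta, the eps guard against division by zero, and the function names are illustrative assumptions.

```python
import numpy as np

def update_powers(P_U, P_V, x_window, beta=0.9, eps=1e-12):
    """Track P_U(k) and P_V(k) recursively, Equations (28) and (29).
    x_window holds the last N+1 scalar input samples; x_window[-1] is x_k."""
    xk = x_window[-1]
    new_U = sum((xk - xj)**2 for xj in x_window[:-1])   # powers of x_{k,j} = x_k - x_j
    P_U = beta * P_U + (1 - beta) * new_U
    P_V = beta * P_V + (1 - beta) * xk**2
    return max(P_U, eps), max(P_V, eps)

def rmed_sn_update(W, grad_U, grad_V, P_U, P_V, mu=0.005):
    """Separately normalized RMED weight update, Equation (25)."""
    return W - mu * grad_U / P_U + mu * grad_V / P_V
```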
With the recursive power estimation (28) and (29), the proposed algorithm can be summarized more formally as in Table 1. In the following section, we investigate the new RMED algorithm (25), with separate normalization by $P_U(k)$ in (28) and $P_V(k)$ in (29), in terms of convergence speed and steady state MSE.

6. Results and Discussion

The base-band communication system with a multipath fading channel and impulsive noise used in the experiment is depicted in Figure 1. The symbol set in the transmitter is composed of four equally probable symbols (−3, −1, 1, 3). The transmitted symbol is distorted by the multipath channel $H(z) = 0.26 + 0.93z^{-1} + 0.26z^{-2}$ [12]. Impulsive noise $n_k$ is added to the channel output. The distribution function of $n_k$, $f(n_k)$, is given in Table 2, where $\sigma_{IN}^2$ is the variance of the impulses, which are generated according to a Poisson process with occurrence rate $\varepsilon$, and $\sigma_{GN}^2$ is the variance of the background Gaussian noise [13]. The simulation setup and parameter values are described in Figure 1 and Table 2.
An example of the impulsive noise used in this simulation is depicted in Figure 2.
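For reproducibility, the following sketch shows one way the two-term Gaussian mixture noise of Table 2 might be generated; the sample count, seed, and function name are arbitrary choices.

```python
import numpy as np

def impulsive_noise(n_samples, eps=0.03, var_gn=0.001, var_total=50.001, seed=0):
    """Samples from the mixture f(n_k) in Table 2: background Gaussian noise of
    variance var_gn, plus impulses occurring with probability eps whose total
    variance is var_total = var_gn + var_in."""
    rng = np.random.default_rng(seed)
    impulse = rng.random(n_samples) < eps                      # Bernoulli impulse occurrences
    std = np.where(impulse, np.sqrt(var_total), np.sqrt(var_gn))
    return rng.standard_normal(n_samples) * std

noise = impulsive_noise(5000)   # e.g., for a learning curve of 5000 samples
```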
As analyzed in Section 4, $U_k$ is associated with spreading output samples that are clustered at wrong positions due to the distorted channel characteristics, and $V_k$ is related to moving output samples close to the symbol points. This process can be explained by investigating the error distribution in the initial stage and observing how the distribution of the output samples changes in the experimental environment.
Figure 3 shows the error distribution in the initial stage with 200 error samples and an ensemble average of 500 runs. Considering that the four symbol points are (−3, −1, 1, 3), error values greater than 1.0 are associated with output samples that would be decided as wrong symbols. The cumulative probability of initial output samples placed in the wrong regions in this respect is calculated to be 0.35 from Figure 3 (35% of the output samples are not in place). There are about six peaks or ridges on each side of the error distribution. This observation indicates that the output samples are clustered into groups (two groups are within the correct range, but four groups are in incorrect positions on each side of the distribution). This result coincides with the initial output distribution in Figure 4. The output distribution, showing about 12 peaks, indicates that the initial output samples are clustered into 12 groups mostly located out of place, that is, not around −3, −1, 1, 3.
For the 35% of output samples clustered in the wrong symbol regions, the spreading force has a positive effect, allowing them to move out in blind search of their correct symbol positions. This process is observed in the graph for k = 700 in Figure 4. The output distribution at time k = 700 has an evenly spread shape, indicating that the clustered output samples have moved out and mingled with one another. At sample time k = 1800 the output samples start to settle in their correct symbol areas. From this phase, the force moving output samples close to the symbol points takes effect, lowering the steady state MSE.
These results imply that $U_k$ is related to convergence speed and $V_k$ to steady state MSE. To verify this analysis, we test the proposed algorithm in the following three modes with respect to convergence speed and steady state MSE (we assume that the steady state MSE is close to the minimum MSE):
Mode 1: $$W_{k+1} = W_k - \mu_{\mathrm{RMED}}\frac{\partial U_k}{\partial W}\frac{1}{P_U(k)} + \mu_{\mathrm{RMED}}\frac{\partial V_k}{\partial W} \quad (30)$$
Mode 2: $$W_{k+1} = W_k - \mu_{\mathrm{RMED}}\frac{\partial U_k}{\partial W} + \mu_{\mathrm{RMED}}\frac{\partial V_k}{\partial W}\frac{1}{P_V(k)} \quad (31)$$
Mode 3: $$W_{k+1} = W_k - \mu_{\mathrm{RMED}}\frac{\partial U_k}{\partial W}\frac{1}{P_U(k)} + \mu_{\mathrm{RMED}}\frac{\partial V_k}{\partial W}\frac{1}{P_V(k)} \quad (32)$$
Mode 1 of the RMED-SN (separately normalized RMED) algorithm in (30) is for observing changes in initial convergence speed by normalizing only $\frac{\partial U_k}{\partial W}$ by the average power $P_U(k)$ of the entropy-governing input $X_{i,k}$, compared with the non-normalized RMED. Mode 2 is for observing whether normalizing $\frac{\partial V_k}{\partial W}$ by $P_V(k)$ of the input $X_k$, without managing $U_k$, lowers the steady state MSE of RMED. Finally, Mode 3 shows whether normalizing $\frac{\partial U_k}{\partial W}$ and $\frac{\partial V_k}{\partial W}$ simultaneously yields both performance enhancements: faster convergence and a lower steady state MSE.
Figure 5 shows the MSE learning performance for CMA, LMS, RMED, and Mode 1 of the proposed algorithm. As discussed in Section 2, the learning curves of the MSE-based algorithms, CMA and LMS, do not fall below −6 dB, being defeated by the impulsive noise. On the other hand, RMED and the proposed algorithm show rapid and stable convergence. The difference in convergence speed between RMED and Mode 1 is clearly observed: while RMED converges in about 4000 samples, Mode 1 converges in about 2000 samples. Therefore, Mode 1 converges about twice as fast as the RMED algorithm, verifying the analysis of the role of $U_k$, since only $\frac{\partial U_k}{\partial W}$ is normalized and $\frac{\partial V_k}{\partial W}$ is not, and there is little difference (about 1 dB) in the steady state MSE.
In Figure 6, RMED and Mode 2 are compared. Both algorithms have similar convergence speeds, with a difference of only 500 samples. After convergence, however, Mode 2 yields a steady state MSE that is lower than that of the original RMED by over 2 dB. These findings indicate that the role of $V_k$ is definitely related to lowering the minimum MSE. This is in accordance with the analysis that $V_k$ plays the role of pulling error samples close together.
Furthermore, Mode 3, which normalizes $\frac{\partial U_k}{\partial W}$ and $\frac{\partial V_k}{\partial W}$ simultaneously, proves to yield both performance enhancements, showing increased speed and a lower steady state MSE, as depicted in Figure 7. While RMED converges in about 4000 samples and leaves its steady state MSE at about −25 dB, Mode 3 converges in about 2000 samples and reaches a steady state MSE of about −27 dB. By employing Mode 3, we obtain roughly twice faster convergence and a steady state MSE that is lower by over 2 dB.
In Mode 3, it is still not clear whether the normalization of $U_k$ for speeding up the initial convergence has a negative influence in later iterations, so we reduce the $U_k$ normalization gradually after convergence (k ≥ 3000) by using $P_U^{\prime}(k)$ in place of $P_U(k)$ as:
$$P_U^{\prime}(k) = P_U(k)\cdot c^{k-3000} + \left(1 - c^{k-3000}\right) \quad (33)$$
where k ≥ 3000 and c is a constant with 0 ≤ c ≤ 1.
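A small sketch of the gradual relaxation in (33): the normalization factor decays from $P_U(k)$ toward 1 (no normalization) as k grows beyond 3000; the function name and the default value of c are illustrative.

```python
def p_u_prime(P_U_k, k, c=0.9, k0=3000):
    """Gradually relax the U_k normalization after convergence, Equation (33):
    the weight c**(k - k0) decays, so P_U'(k) drifts from P_U(k) toward 1."""
    if k < k0:
        return P_U_k
    w = c ** (k - k0)
    return P_U_k * w + (1.0 - w)
```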
The results for c = 0.8, 0.9, 0.99, and 1.0 are shown in Figure 8 in terms of the error distribution, since the learning curves for the various constant values are not clearly distinguishable.
The value of c in (33) is related to the degree of gradual reduction of the $U_k$ normalization; that is, c = 1 indicates no reduction (Mode 3 as it is) and c = 0.8 means a comparatively rapid reduction. From Figure 8, we observe that the error performance first improves and then worsens as c increases from 0.8 to 1.0 (i.e., as the degree of reduction decreases). This implies that the gradual reduction of the $U_k$ normalization is effective, though only slightly. We may conclude that the normalization of $U_k$ for speeding up the initial convergence has a slight negative influence in later iterations, and that this can be overcome by gradually reducing the $U_k$ normalization.

7. Conclusions

Minimization of the Euclidean distance between the output distribution and a set of Dirac delta functions as a performance criterion is known to force the distribution of the system output toward delta functions located at each symbol point. In the analysis of the RMED algorithm, developed from that criterion and recursive gradient estimation, it has been revealed in this paper that the minimization of the cost function uses two gradients with different functions: one for $U_k$, which forces the spreading of output samples, and the other for $V_k$, which compels output samples to move close to the symbol points. In order to verify the roles of $U_k$ and $V_k$ explained in the analysis by controlling them separately, we proposed normalizing $\frac{\partial U_k}{\partial W}$ with the averaged power of the entropy-governing input and normalizing $\frac{\partial V_k}{\partial W}$ with that of the input. From simulation results for this separate normalization of the RMED gradients in multipath channel equalization under impulsive noise, about twice faster convergence through normalization of $\frac{\partial U_k}{\partial W}$ and a steady state MSE lower by over 2 dB through normalization of $\frac{\partial V_k}{\partial W}$ were observed. From the analysis and experimental results, we can conclude that $U_k$ is associated with accelerating the initial convergence speed by spreading output samples that may have clustered around wrong places in the initial stage due to channel distortion, and that $V_k$ is related to lowering the minimum MSE by pulling error samples close together through the minimization of $C_{\mathrm{ED},k}$. It can also be concluded that applying normalization to the two factors $\frac{\partial U_k}{\partial W}$ and $\frac{\partial V_k}{\partial W}$ separately, each with its related input power, achieves a significant performance enhancement.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Haykin, S. Adaptive Filter Theory, 4th ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2001. [Google Scholar]
  2. Principe, J.; Xu, D.; Fisher, J. Information theoretic learning. In Unsupervised Adaptive Filtering; Haykin, S., Ed.; Wiley: New York, NY, USA, 2000. [Google Scholar]
  3. Erdogmus, D.; Rao, Y.; Principe, J. Supervised training of adaptive systems with partially labeled data. In Proceedings of the International Conference on ASSP, Marrakech, Morocco, 9–15 April 2005; pp. 321–324. [Google Scholar]
  4. Soleimani, H.; Tomasin, S.; Alizadeh, T.; Shojafar, M. Cluster-head based feedback for simplified time reversal prefiltering in ultra-wideband systems. Phys. Commun. 2017, 25, 100–109. [Google Scholar] [CrossRef]
  5. Ahmadi, A.; Shojafar, M.; Hajeforosh, S.F.; Dehghan, M.; Singhal, M. An efficient routing algorithm to preserve k-coverage in wireless sensor networks. J. Supercomput. 2014, 68, 599–623. [Google Scholar] [CrossRef]
  6. Jeong, K.; Xu, J.W.; Erdogmus, D.; Principe, J.C. A new classifier based on information theoretic learning with unlabeled data. Neural Netw. 2005, 18, 719–726. [Google Scholar] [CrossRef] [PubMed]
  7. Kim, N.; Kang, M. Blind signal processing algorithms based on recursive gradient estimation. Int. J. Electr. Comput. Eng. 2015, 5, 548–561. [Google Scholar]
  8. Treichler, R.; Agee, B. A new approach to multipath correction of constant modulus signals. IEEE Trans. 1983, 31, 349–372. [Google Scholar] [CrossRef]
  9. Bharani, L.; Radhika, P. FPGA implementation of optimal step size NLMS algorithm and its performance analysis. Int. J. Res. Eng. Technol. 2013, 2, 885–890. [Google Scholar]
  10. Chinaboina, R.; Ramkiran, D.; Khan, H.; Usha, M.; Madhav, B.; Srinivas, K.; Ganesh, G. Adaptive algorithms for acoustic echo cancellation in speech processing. Int. J. Res. Rev. Appl. Sci. 2011, 7, 38–42. [Google Scholar]
  11. Leff, H.S. Thermodynamic entropy: The spreading and sharing of energy. Am. J. Phys. 1996, 64, 1261–1271. [Google Scholar] [CrossRef]
  12. Proakis, J. Digital Communications, 2nd ed.; McGraw-Hill: New York, NY, USA, 1989. [Google Scholar]
  13. Santamaria, I.; Pokharel, P.; Principe, J. Generalized correlation function: Definition, properties, and application to blind equalization. IEEE Trans. Signal Process. 2006, 54, 2187–2197. [Google Scholar] [CrossRef]
Figure 1. Base-band communication system for simulation.
Figure 2. An example of impulsive noise.
Figure 3. The error distribution at time k = 200 with 200 error samples.
Figure 4. Output distributions in an initial stage.
Figure 5. MSE convergence performance for Uk normalization.
Figure 6. MSE convergence performance for normalization of Vk.
Figure 7. MSE convergence performance for normalization of both Uk and Vk.
Figure 8. Error distribution with respect to the values of c for normalization of Uk.
Table 1. A summary of the proposed algorithm.

Initialization: $\frac{\partial U_0}{\partial W} = 0$, $\frac{\partial V_0}{\partial W} = 0$, $P_U(0) = 1$, $P_V(0) = 1$, $W_0 = [0, \ldots, 0, w_{L/2,0} = 1, 0, \ldots, 0]^T$
Update of gradient $\frac{\partial U_k}{\partial W}$: $\frac{\partial U_k}{\partial W} = \frac{\partial U_{k-1}}{\partial W} + \frac{1}{N^2\sigma^2}\sum_{i=k-N}^{k-1}\hat{y}_{k,i}\cdot X_{i,k} - \frac{1}{N^2\sigma^2}\sum_{i=k-N}^{k-1}\hat{y}_{k-N,i}\cdot X_{i,k-N} - \frac{1}{N^2\sigma^2}\hat{y}_{k-N,k}\cdot X_{k,k-N}$, as in (20)
Update of gradient $\frac{\partial V_k}{\partial W}$: $\frac{\partial V_k}{\partial W} = \frac{\partial V_{k-1}}{\partial W} + \frac{2}{NM\sigma^2}\sum_{m=1}^{M}\left[\hat{e}_{m,k}\cdot X_k - \hat{e}_{m,k-N}\cdot X_{k-N}\right]$, as in (22)
Update of $P_U(k)$: $P_U(k) = \beta P_U(k-1) + (1-\beta)\sum_{j=k-N+1}^{k}|x_{k,j}|^2$, as in (28)
Update of $P_V(k)$: $P_V(k) = \beta P_V(k-1) + (1-\beta)|x_k|^2$, as in (29)
Update of $W_k$: $W_{k+1} = W_k - \mu_{\mathrm{RMED}}\frac{\partial U_k}{\partial W}\frac{1}{P_U(k)} + \mu_{\mathrm{RMED}}\frac{\partial V_k}{\partial W}\frac{1}{P_V(k)}$, as in (32)
Table 2. Simulation setup and parameter values.

The symbol points in the transmitter: $(A_1, A_2, A_3, A_4) = (-3, -1, +1, +3)$
The channel transfer function: $H(z) = 0.26 + 0.93z^{-1} + 0.26z^{-2}$
The noise distribution function: $f(n_k) = \frac{1-\varepsilon}{\sigma_{GN}\sqrt{2\pi}}\exp\left[-\frac{n_k^2}{2\sigma_{GN}^2}\right] + \frac{\varepsilon}{\sqrt{2\pi(\sigma_{GN}^2 + \sigma_{IN}^2)}}\exp\left[-\frac{n_k^2}{2(\sigma_{GN}^2 + \sigma_{IN}^2)}\right]$, with $\varepsilon = 0.03$, $\sigma_{GN}^2 = 0.001$, $\sigma_{GN}^2 + \sigma_{IN}^2 = 50.001$
Number of weights: 11
Step sizes: $\mu_{\mathrm{CMA}} = 0.000001$, $\mu_{\mathrm{LMS}} = 0.0002$, $\mu_{\mathrm{RMED}} = 0.005$
Sample size $N$: 6
Kernel size $\sigma$: 0.6
