Article

Fast Minimum Error Entropy for Linear Regression

1 State Grid Information & Telecommunication Group Co., Ltd., Beijing 102209, China
2 School of Electrical Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Algorithms 2024, 17(8), 341; https://doi.org/10.3390/a17080341
Submission received: 23 June 2024 / Revised: 1 August 2024 / Accepted: 3 August 2024 / Published: 6 August 2024

Abstract

The minimum error entropy (MEE) criterion finds extensive utility across diverse applications, particularly in contexts characterized by non-Gaussian noise. However, its computational demands are notable, and are primarily attributable to the double summation operation involved in calculating the probability density function (PDF) of the error. To address this, our study introduces a novel approach, termed the fast minimum error entropy (FMEE) algorithm, aimed at mitigating computational complexity through the utilization of polynomial expansions of the error PDF. Initially, the PDF approximation of a random variable is derived via the Gram–Charlier expansion. Subsequently, we proceed to ascertain and streamline the entropy of the random variable. Following this, the error entropy inherent to the linear regression model is delineated and expressed as a function of the regression coefficient vector. Lastly, leveraging the gradient descent algorithm, we compute the regression coefficient vector corresponding to the minimum error entropy. Theoretical scrutiny reveals that the time complexity of FMEE stands at O(n), in stark contrast to the O(n^2) complexity associated with MEE. Experimentally, our findings underscore the remarkable efficiency gains afforded by FMEE, with time consumption registering less than 1‰ of that observed with MEE. Encouragingly, this efficiency leap is achieved without compromising accuracy, as evidenced by negligible differentials observed between the accuracies of FMEE and MEE. Furthermore, comprehensive regression experiments on real-world electric datasets in northwest China demonstrate that our FMEE outperforms baseline methods by a clear margin.

1. Introduction

Entropy serves as a pivotal metric for quantifying the inherent uncertainty within a system, finding ubiquitous application across a multitude of domains including dimension reduction [1,2], parameter estimation [3,4], identification [5,6], feedback control [7,8], and abnormal detection [9,10,11]. The minimum error entropy (MEE) criterion stands out as a notable framework aimed at minimizing the entropy associated with estimation errors, thereby mitigating uncertainty within the estimation model. Particularly pertinent to linear regression estimation quandaries, MEE diverges from conventional least square methodologies by not only considering the variance of prediction errors, but also incorporating higher-order cumulants, rendering it adept at handling non-Gaussian noise distributions [12,13]. Given the prevalence of non-Gaussian noise in real-world scenarios, the efficacy of MEE has been duly validated across various applications, encompassing adaptive filtering [14,15,16], face recognition [17,18], sparse system identification [19,20,21], stochastic control systems [22,23], and visible light communication [24].
Furthermore, the convergence properties of MEE have been meticulously examined in the prior literature [25,26,27]. In-depth analyses delved into the convergence dynamics of fixed-point MEE methodologies [28], while investigations into the interplay between MEE and Minimum Mean-Squared Error (MMSE) [29], as well as its relationship with maximum correntropy [30], have been extensively explored. Noteworthy extensions to the conventional MEE framework include the proposal of kernel MEE variants [31], elucidations on regularization strategies for MEE implementations [32], and the development of semi-supervised and distributed adaptations [33]. Moreover, within the errors-in-variables (EIV) modeling realm, robustness analyses of MEE methodologies have been expounded upon [34], alongside the proposition of novel methodologies such as the minimum total error entropy approach [35].
The computational demands associated with the MEE criterion stem from its time complexity, which scales quadratically as O(n^2). Central to MEE’s computational load is the necessity for a double summation operation during the calculation of the gradient pertaining to the error probability density function (PDF), particularly when employing Parzen windowing [36] and certain types of kernel functions. Moreover, the selection of an inappropriate kernel function or its parameters can detrimentally impact the accuracy of the resulting PDF. To address this problem, efforts have been made to optimize the gradient descent MEE approach, as evidenced by the normalization of the step size in [37], leveraging the power of the input entropy. Additionally, innovative methodologies such as the utilization of a quantization operator, as proposed in [38], have been instrumental in mitigating computational complexity. By mapping the error onto a series of real-valued code words, this approach effectively transforms the double summation operation into a singular form, consequently reducing the algorithm’s computational overhead.
This paper presents a novel approach, the fast minimum error entropy estimation (FMEE) algorithm, tailored specifically for linear regression tasks, leveraging a polynomial expansion of the error PDF. In contrast to the traditional minimum error entropy (MEE) methodology employing Parzen windowing for error PDF derivation, FMEE adopts the Gram–Charlier expansion method to approximate the error PDF, resulting in a notable reduction in time complexity from O(n^2) to O(n). Notably, FMEE obviates the need for kernel functions or their associated parameters. The proposed algorithm entails several key steps: firstly, the derivation of the PDF approximation for a random variable via the Gram–Charlier expansion, followed by the simplification of the entropy of the random variable facilitated by the orthogonality of the Hermite polynomials. Subsequently, the error of the linear regression model is ascertained, with the error entropy expressed as a function of the regression coefficient vector. Ultimately, the gradient descent technique is employed to deduce the regression coefficient vector corresponding to the minimum error entropy. Experimental validation underscores the efficacy of FMEE, revealing a remarkable reduction in time consumption, amounting to less than 1‰ of that of the MEE approach for identical problem settings. Importantly, minimal discrepancies are observed between the accuracy levels of the FMEE and MEE methodologies.
The rest of the paper is organized as follows: Section 2 introduces the related algorithms, FMEE is proposed in detail in Section 3, Section 4 contains the experiments, and the conclusions are in Section 5.

2. The Related Algorithms

2.1. MEE

MEE minimizes the information contained in the error and thus maximizes the information captured by the estimated model. Usually, the second-order Rényi entropy [39] is used to measure the entropy contained in the error of the model, as in Equation (1):
H(E) = -\log \mathbb{E}_p\!\left[p_E(E)\right] = -\log \int p_E^2(e)\, de,   (1)
where E is the error random variable of MEE, H(E) is the entropy of E, and p_E(e) denotes the PDF of E. In this paper, \mathbb{E}_p(\cdot) denotes the expectation operator, to avoid confusion with the error E. Then, the PDF of E can be estimated by Parzen windowing, as in Equation (2):
\hat{p}_E(e) = \frac{1}{n}\sum_{i=1}^{n} K_h(e - e_i),   (2)
where n is the number of observed data points, K is the kernel function, and h is the bandwidth. Usually, the Gaussian kernel is chosen, K_h(z) = \frac{1}{\sqrt{2\pi}\,h}\exp\!\left(-\frac{z^2}{2h^2}\right). Then, the empirical error entropy \hat{H}(e) can be calculated as Equation (3):
\hat{H}(e) = -\log \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_h(e_i - e_j).   (3)
As the logarithmic function is monotone increasing, it can be removed without any influence on the minimizing process. Then, the transformed empirical error information can be obtained as Equation (4):
R(e) = -\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_h(e_i - e_j).   (4)
For the linear regression model y = w^T x + e, the target is to estimate w from the limited samples. As e_i = y_i - w^T x_i, the corresponding empirical error information is calculated as Equation (5):
R(w) = -\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} K_h\!\big((y_i - w^T x_i) - (y_j - w^T x_j)\big).   (5)
Then, the MEE estimator \hat{w} can be obtained by minimizing R(w) with gradient descent. The time complexity of MEE is O(n^2) due to the double summation operation in Equation (5). Because this cost is quadratic in the number of observed data points, the running time of MEE increases dramatically as the dataset grows.
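To make the cost structure concrete, the following is a minimal sketch (our illustration, not the authors' implementation; the function name and default bandwidth are assumptions) of evaluating the empirical MEE objective of Equation (5) with a Gaussian kernel. Building the matrix of pairwise residual differences is what makes the cost quadratic in n.

```python
import numpy as np

def mee_objective(w, X, y, h=1.0):
    """Empirical MEE objective R(w) of Equation (5): the (negated) information
    potential of the residuals, using a Gaussian kernel with bandwidth h.
    Forming the n-by-n matrix of pairwise residual differences costs O(n^2)."""
    e = y - X @ w                                   # residuals e_i = y_i - w^T x_i
    diff = e[:, None] - e[None, :]                  # all pairwise differences e_i - e_j
    K = np.exp(-diff**2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return -K.mean()                                # R(w) = -(1/n^2) * sum_ij K_h(e_i - e_j)
```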

2.2. The Hermite Polynomials

The Hermite polynomials [40] are a classical orthogonal polynomial sequence. They are defined by the derivatives of the standardized Gaussian PDF \varphi(\xi), as in Equation (6):
\frac{\partial^i \varphi(\xi)}{\partial \xi^i} = (-1)^i H_i(\xi)\,\varphi(\xi),   (6)
where H_i denotes the ith-order Hermite polynomial.
The Hermite polynomials form an orthogonal system, as in Equation (7):
\int \varphi(\xi) H_i(\xi) H_j(\xi)\, d\xi = \begin{cases} i!, & \text{if } i = j \\ 0, & \text{if } i \neq j. \end{cases}   (7)
This paper uses the first five Hermite polynomials, shown in Equation (8):
H_0(x) = 1,\quad H_1(x) = x,\quad H_2(x) = x^2 - 1,\quad H_3(x) = x^3 - 3x,\quad H_4(x) = x^4 - 6x^2 + 3.   (8)
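These are the probabilists' Hermite polynomials, which NumPy exposes as the `hermite_e` series; the short, illustrative check below verifies the H_2–H_4 expressions of Equation (8) numerically.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite(i, x):
    """Evaluate the i-th probabilists' Hermite polynomial H_i at x."""
    coeffs = np.zeros(i + 1)
    coeffs[i] = 1.0                     # select the single basis polynomial H_i
    return He.hermeval(x, coeffs)

x = np.linspace(-2.0, 2.0, 9)
assert np.allclose(hermite(2, x), x**2 - 1)
assert np.allclose(hermite(3, x), x**3 - 3*x)
assert np.allclose(hermite(4, x), x**4 - 6*x**2 + 3)
```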

2.3. The Gram–Charlier Expansion

The Gram–Charlier expansion yields an approximation of the PDF of a random variable; as demonstrated in [41], the expansion incurs no significant loss of accuracy. Under the assumption that the random variable x is close to the normal distribution with the same mean and variance as itself, the Gram–Charlier expansion of the PDF of x can be expressed as Equation (9):
p_x(\xi) \approx \hat{p}_x(\xi) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\xi-\mu)^2}{2\sigma^2}\right)\left[1 + \frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right],   (9)
where \mu and \sigma are the mean and standard deviation of x, \xi is the Gaussian random variable whose mean and variance are the same as those of x, \kappa_3(x) and \kappa_4(x) are the third-order and fourth-order cumulants of x (also referred to as skewness and kurtosis), and H_3 and H_4 are the third-order and fourth-order Hermite polynomials.
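The sketch below (an illustrative helper of our own, not from the paper) assembles the Gram–Charlier approximation of Equation (9) from the sample mean, standard deviation, and third- and fourth-order sample cumulants.

```python
import numpy as np

def gram_charlier_pdf(xi, samples):
    """Approximate PDF of Equation (9), evaluated at the points xi,
    with mu, sigma, kappa_3, kappa_4 estimated from `samples`."""
    mu = samples.mean()
    sigma = samples.std()
    c = samples - mu
    k3 = np.mean(c**3)                      # third-order cumulant of the samples
    k4 = np.mean(c**4) - 3 * sigma**4       # fourth-order cumulant of the samples
    t = (xi - mu) / sigma
    gauss = np.exp(-t**2 / 2) / (np.sqrt(2 * np.pi) * sigma)
    h3 = t**3 - 3*t                         # H_3 from Equation (8)
    h4 = t**4 - 6*t**2 + 3                  # H_4 from Equation (8)
    return gauss * (1 + k3 / (6 * sigma**3) * h3 + k4 / (24 * sigma**4) * h4)
```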

3. Methodology

This section derives the relation between the differential entropy and the Gram–Charlier expansion of the error PDF for linear regression. Throughout, it is assumed that the PDF of the regression error is close to the Gaussian distribution \varphi(\xi) that has the same mean and variance as the error.
The differential entropy [42] of a random variable X is shown in Equation (10):
H(x) = -\int p(x)\log p(x)\, dx.   (10)
Substituting Equation (9) into Equation (10) yields Equation (11):
H(x) \approx -\int \hat{p}_x(\xi)\log \hat{p}_x(\xi)\, d\xi = -\int \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\xi-\mu)^2}{2\sigma^2}\right)\left[1 + \frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] \log\!\left\{\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\xi-\mu)^2}{2\sigma^2}\right)\left[1 + \frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right]\right\} d\xi.   (11)
This integral is rather difficult to compute. However, notice that if the PDF of x is near the normal density, as assumed, it can be inferred that \kappa_3(x) and \kappa_4(x) are very small. Then, the approximation in Equation (12) can be used:
\log(1+\epsilon) \approx \epsilon - \frac{\epsilon^2}{2},   (12)
where \epsilon denotes a small quantity.
Then, Equation (11) can be transformed into Equation (13):
H(x) = -\int \varphi(\xi)\left[1 + \frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right]\left[\log\varphi(\xi) + \frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right) - \frac{1}{2}\left(\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right)^{\!2}\right] d\xi.   (13)
As the H_i form an orthogonal system, Equation (13) can be simplified as Equation (14):
H(x) = \frac{1}{2}\log\!\left(2\pi e \sigma^2\right) - \frac{\kappa_3^2(x)}{2\cdot 3!\,\sigma^6} - \frac{\kappa_4^2(x)}{2\cdot 4!\,\sigma^8}.   (14)
The details of the simplification leading to Equation (14) are as follows. First, it is necessary to show that
\int \varphi(\xi)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) d\xi = 0   (15)
and
\int \varphi(\xi)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2} H_4\!\left(\frac{\xi-\mu}{\sigma}\right) d\xi = 0.   (16)
For Equation (15), suppose that t = (\xi - \mu)/\sigma; then
\int \varphi(\xi)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) d\xi = \int \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\xi-\mu)^2}{2\sigma^2}\right)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2}\left[\left(\frac{\xi-\mu}{\sigma}\right)^{\!3} - 3\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi = \int \frac{1}{\sqrt{2\pi}}\, t^2 \exp\!\left(-\frac{t^2}{2}\right)(t^3 - 3t)\, dt.   (17)
As \exp(-t^2/2) is an even function and t^5 - 3t^3 is an odd function, then
\int \frac{1}{\sqrt{2\pi}}\, t^2 \exp\!\left(-\frac{t^2}{2}\right)(t^3 - 3t)\, dt = 0,   (18)
and Equation (15) holds.
For Equation (16), suppose that t = (\xi - \mu)/\sigma; then
\int \varphi(\xi)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2} H_4\!\left(\frac{\xi-\mu}{\sigma}\right) d\xi = \int \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\xi-\mu)^2}{2\sigma^2}\right)\left[\left(\frac{\xi-\mu}{\sigma}\right)^{\!6} - 6\left(\frac{\xi-\mu}{\sigma}\right)^{\!4} + 3\left(\frac{\xi-\mu}{\sigma}\right)^{\!2}\right] d\xi = \int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right)(t^6 - 6t^4 + 3t^2)\, dt = 3\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^2\, dt - 6\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^4\, dt + \int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^6\, dt.   (19)
The first term in Equation (19) can be calculated as Equation (20),
\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^2\, dt = \int \frac{1}{\sqrt{2\pi}}\,(-t)\, d\!\left[\exp\!\left(-\frac{t^2}{2}\right)\right] = \frac{1}{\sqrt{2\pi}}\,(-t)\exp\!\left(-\frac{t^2}{2}\right)\Big|_{-\infty}^{+\infty} + \int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) dt = 1.   (20)
The second term in Equation (19) can be calculated as Equation (21),
\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^4\, dt = \int \frac{1}{\sqrt{2\pi}}\,(-t^3)\, d\!\left[\exp\!\left(-\frac{t^2}{2}\right)\right] = \frac{1}{\sqrt{2\pi}}\,(-t^3)\exp\!\left(-\frac{t^2}{2}\right)\Big|_{-\infty}^{+\infty} + 3\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^2\, dt = 3.   (21)
The last term in Equation (19) can be calculated as Equation (22),
\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^6\, dt = \int \frac{1}{\sqrt{2\pi}}\,(-t^5)\, d\!\left[\exp\!\left(-\frac{t^2}{2}\right)\right] = \frac{1}{\sqrt{2\pi}}\,(-t^5)\exp\!\left(-\frac{t^2}{2}\right)\Big|_{-\infty}^{+\infty} + 5\int \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{t^2}{2}\right) t^4\, dt = 15.   (22)
According to Equations (20)–(22),
\int \varphi(\xi)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2} H_4\!\left(\frac{\xi-\mu}{\sigma}\right) d\xi = 15 - 6\cdot 3 + 3 = 0,   (23)
and Equation (16) holds.
Then, for Equation (13),
H(x) = -\int \varphi(\xi)\log\varphi(\xi)\, d\xi - \int \varphi(\xi)\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi + \frac{1}{2}\int \varphi(\xi)\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right]^2 d\xi - \int \varphi(\xi)\log\varphi(\xi)\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi - \int \varphi(\xi)\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right]^2 d\xi + \frac{1}{2}\int \varphi(\xi)\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right]^3 d\xi.   (24)
Notice that H_0 = 1; hence, by the orthogonality in Equation (7), the second term of Equation (24) is 0. As it is assumed that x is close to a normal random variable, \kappa_3(x) and \kappa_4(x) are very small, so the last term of Equation (24) can be neglected: it involves third-order products of \kappa_3(x) and \kappa_4(x), which are negligible compared with the terms containing only second-order products. Then, for Equation (24), taking Equations (15) and (16) into consideration and noticing again that H_0 = 1,
H(x) = H(\xi) - \frac{1}{2}\int \varphi(\xi)\left[\left(\frac{\kappa_3(x)}{3!\,\sigma^3}\right)^{\!2} H_3^2\!\left(\frac{\xi-\mu}{\sigma}\right) + \left(\frac{\kappa_4(x)}{4!\,\sigma^4}\right)^{\!2} H_4^2\!\left(\frac{\xi-\mu}{\sigma}\right) + 2\,\frac{\kappa_3(x)}{3!\,\sigma^3}\,\frac{\kappa_4(x)}{4!\,\sigma^4}\, H_3\!\left(\frac{\xi-\mu}{\sigma}\right) H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi - \int \varphi(\xi)\left[\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\ln 2}\left(\frac{\xi-\mu}{\sigma}\right)^{\!2}\right]\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi = H(\xi) - \frac{\kappa_3^2(x)}{2\cdot 3!\,\sigma^6} - \frac{\kappa_4^2(x)}{2\cdot 4!\,\sigma^8} - \log\frac{1}{\sqrt{2\pi}\,\sigma}\int \varphi(\xi)\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi + \frac{1}{2\ln 2}\int \varphi(\xi)\left(\frac{\xi-\mu}{\sigma}\right)^{\!2}\left[\frac{\kappa_3(x)}{3!\,\sigma^3} H_3\!\left(\frac{\xi-\mu}{\sigma}\right) + \frac{\kappa_4(x)}{4!\,\sigma^4} H_4\!\left(\frac{\xi-\mu}{\sigma}\right)\right] d\xi = \frac{1}{2}\log\!\left(2\pi e \sigma^2\right) - \frac{\kappa_3^2(x)}{2\cdot 3!\,\sigma^6} - \frac{\kappa_4^2(x)}{2\cdot 4!\,\sigma^8}.   (25)
So, Equation (14) holds.
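As a quick numerical check of Equation (14), the following sketch (our own, computed in nats and using the standard fourth cumulant E[c^4] - 3 var^2) evaluates the approximation from samples; for Gaussian data the correction terms vanish and the result matches the exact Gaussian entropy 0.5 log(2πeσ^2).

```python
import numpy as np

def entropy_gc(samples):
    """Differential entropy approximation of Equation (14), in nats."""
    var = samples.var()
    c = samples - samples.mean()
    k3 = np.mean(c**3)
    k4 = np.mean(c**4) - 3 * var**2
    return (0.5 * np.log(2 * np.pi * np.e * var)
            - k3**2 / (2 * 6 * var**3)      # kappa_3^2 / (2 * 3! * sigma^6)
            - k4**2 / (2 * 24 * var**4))    # kappa_4^2 / (2 * 4! * sigma^8)

rng = np.random.default_rng(0)
z = rng.normal(size=200_000)
print(entropy_gc(z))                        # close to 0.5*log(2*pi*e) ≈ 1.4189
```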
For linear regression, e_i = y_i - w^T x_i, and the variance of the error e_w is given by Equation (26):
\sigma^2(e_w) = \mathbb{E}_p(E_w^2) - \left[\mathbb{E}_p(E_w)\right]^2 = \mathbb{E}_p(E_w^2) - (\bar{y} - w^T\bar{x})^2,   (26)
where \bar{y} is the mean of y and \bar{x} denotes the mean of x.
To simplify the calculation of \kappa_4(e_w), let e_w be centered so that it has zero mean:
e_{w_i} = y_i - w^T x_i - \bar{y} + w^T\bar{x}.   (27)
Notice that the transformation in Equation (27) changes neither the variance nor the entropy of e_w, since \sigma^2(e+c) = \sigma^2(e) and H(e+c) = H(e) for any constant c. In the rest of the paper, for clarity, \sigma^2(\cdot) is written as \mathrm{var}(\cdot).
Then, for linear regression, the entropy of the error can be expressed as Equation (28):
H(e_w) = \frac{1}{2}\log(2\pi e) + \frac{1}{2}\log \mathrm{var}(e_w) - \frac{\kappa_3^2(e_w)}{2\cdot 3!\,\mathrm{var}^3(e_w)} - \frac{\kappa_4^2(e_w)}{2\cdot 4!\,\mathrm{var}^4(e_w)},   (28)
where
\kappa_3(e_w) = \mathbb{E}_p\!\left[(y - w^T x - \bar{y} + w^T\bar{x})^3\right],   (29)
\kappa_4(e_w) = \mathbb{E}_p\!\left[(y - w^T x - \bar{y} + w^T\bar{x})^4\right] - 3.   (30)
To minimize H(e_w), the derivative of H(e_w) with respect to w is calculated in Equation (31):
\frac{\partial H(e_w)}{\partial w} = \frac{1}{2\,\mathrm{var}(e_w)\ln 2}\,\frac{\partial \mathrm{var}(e_w)}{\partial w} - \frac{1}{2\cdot 3!\,\mathrm{var}^6(e_w)}\left[2\,\mathrm{var}^3(e_w)\,\kappa_3(e_w)\,\frac{\partial \kappa_3(e_w)}{\partial w} - 3\,\kappa_3^2(e_w)\,\mathrm{var}^2(e_w)\,\frac{\partial \mathrm{var}(e_w)}{\partial w}\right] - \frac{1}{2\cdot 4!\,\mathrm{var}^8(e_w)}\left[2\,\mathrm{var}^4(e_w)\,\kappa_4(e_w)\,\frac{\partial \kappa_4(e_w)}{\partial w} - 4\,\kappa_4^2(e_w)\,\mathrm{var}^3(e_w)\,\frac{\partial \mathrm{var}(e_w)}{\partial w}\right],   (31)
where
\frac{\partial \mathrm{var}(e_w)}{\partial w} = -2\,\mathbb{E}_p\!\left[(y - w^T x)\,x\right] + 2\,(\bar{y} - w^T\bar{x})\,\bar{x},   (32)
\frac{\partial \kappa_3(e_w)}{\partial w} = 3\,\mathbb{E}_p\!\left[(y - w^T x - \bar{y} + w^T\bar{x})^2\,(\bar{x} - x)\right],   (33)
\frac{\partial \kappa_4(e_w)}{\partial w} = 4\,\mathbb{E}_p\!\left[(y - w^T x - \bar{y} + w^T\bar{x})^3\,(\bar{x} - x)\right].   (34)
Then, the optimal w for the minimum of H(e_w) can be obtained by the iteration scheme in Equation (35) with gradient descent:
\hat{w}_{k+1} = \hat{w}_k - \alpha\,\frac{\partial H(e_{w_k})}{\partial w_k},   (35)
where \alpha is the step size determined by the Armijo condition [43].
To compute \partial H(e_{w_k})/\partial w_k for FMEE in every iteration, \mathrm{var}(e_{w_k}), \kappa_3(e_{w_k}), \kappa_4(e_{w_k}) and their gradients \partial \mathrm{var}(e_{w_k})/\partial w_k, \partial \kappa_3(e_{w_k})/\partial w_k, and \partial \kappa_4(e_{w_k})/\partial w_k are needed. From Equations (26) and (29)–(34), each of these quantities can be computed in O(n) time, so the overall computational complexity of FMEE is O(n). Since the time complexity of MEE is O(n^2) due to the double summation operation, FMEE runs faster than MEE.
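To tie Equations (26)–(35) together, the following sketch is one possible implementation of the FMEE iteration. It is illustrative rather than the authors' code and makes a few explicit assumptions: a fixed step size in place of the Armijo line search, the natural logarithm, and the fourth cumulant written as E[c^4] - 3 var^2 (so a corresponding term also appears in the gradient of \kappa_4). Every quantity is a single sum over the samples, so one iteration costs O(n).

```python
import numpy as np

def fmee_fit(X, y, lr=0.05, iters=200):
    """Gradient-descent FMEE estimator for y = w^T x + e (illustrative sketch)."""
    d = X.shape[1]
    w = np.zeros(d)
    xbar = X.mean(axis=0)
    for _ in range(iters):
        e = y - X @ w
        c = e - e.mean()                           # centered errors, Equation (27)
        var = np.mean(c**2)                        # Equation (26)
        k3 = np.mean(c**3)                         # Equation (29)
        k4 = np.mean(c**4) - 3 * var**2            # fourth cumulant (assumption, see text)

        dC = xbar - X                              # d c_i / d w = xbar - x_i
        dvar = 2 * np.mean(c[:, None] * dC, axis=0)            # Equation (32)
        dk3 = 3 * np.mean((c**2)[:, None] * dC, axis=0)        # Equation (33)
        dk4 = (4 * np.mean((c**3)[:, None] * dC, axis=0)       # Equation (34)
               - 6 * var * dvar)                   # extra term from the -3*var^2 part of k4

        # Gradient of H(e_w), Equation (31), with natural logarithm
        grad = (dvar / (2 * var)
                - (2 * var**3 * k3 * dk3 - 3 * k3**2 * var**2 * dvar) / (12 * var**6)
                - (2 * var**4 * k4 * dk4 - 4 * k4**2 * var**3 * dvar) / (48 * var**8))
        w = w - lr * grad                          # Equation (35), fixed step size
    return w
```

A call such as w_hat = fmee_fit(X, y) then plays the role of the iteration in Equation (35); replacing the fixed step with an Armijo backtracking search recovers the update described above.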

4. Experiments

In this section, a comprehensive array of experiments is undertaken, encompassing both numerical simulations and real-world scenarios. The numerical simulations serve to validate the efficacy of the FMEE approach and to scrutinize the time efficiency of both FMEE and MEE methodologies. Concurrently, practical experiments are conducted aimed at forecasting power outages within a city located in northwest China.
Regarding the numerical simulations, we consider a scenario where x \in \mathbb{R}^{10}, with the model defined as y = w^T x + e, where w = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]^T and x follows a normal distribution N(0, I_{10}). Two distinct types of noise are examined in our experiments. Firstly, Gaussian noise e \sim N(0, 1) is employed, while the second type comprises generalized Gaussian noise characterized by a probability density function f(e) \propto \exp(-\alpha |e|^{0.3}). This heavy-tailed distribution, utilized herein, aligns with previous works, such as references [27,35]. Although FMEE is derived under the assumption of near-Gaussian noise, it is anticipated that FMEE remains effective in the presence of non-Gaussian, heavy-tailed noises. Here, \alpha denotes a constant adjusted to maintain a variance of 1 for e. For MEE, the step size is set to 0.005\pi, and the scale parameter for the Gaussian kernel is 10, mirroring the settings adopted in Reference [27]. Across the experiments, sample sizes range from 100 to 500, with each combination of sample size and noise type repeated 100 times. The ensuing experimental results are detailed below.
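The paper does not spell out how the heavy-tailed noise is sampled, so the following is one possible routine (an assumption on our part) that draws unit-variance noise from f(e) \propto \exp(-|e/s|^{0.3}) via the Gamma representation of the generalized Gaussian distribution.

```python
import numpy as np
from scipy.special import gamma

def gen_gaussian_noise(n, beta=0.3, rng=None):
    """Draw n samples from f(e) ∝ exp(-|e/s|**beta), with s chosen so Var(e) = 1."""
    rng = rng if rng is not None else np.random.default_rng()
    s = np.sqrt(gamma(1 / beta) / gamma(3 / beta))        # scale giving unit variance
    g = rng.gamma(shape=1 / beta, scale=1.0, size=n)      # |e/s|**beta ~ Gamma(1/beta, 1)
    sign = rng.choice([-1.0, 1.0], size=n)
    return sign * s * g ** (1 / beta)
```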
Table 1 provides a comparative analysis of the time consumption between MEE and FMEE in the presence of Gaussian noise. FMEE demonstrates a notable acceleration in computational efficiency compared to MEE. Specifically, while the computational time of MEE exhibits quadratic growth relative to the sample size, FMEE’s computational overhead scales linearly with the number of samples. Notably, FMEE achieves remarkable speed, processing 500 objects in approximately 0.1 s, thereby positioning it as a highly promising candidate for time-sensitive applications.
Table 2 compares the mean squared error between MEE and FMEE in the context of Gaussian noise. Overall, MEE exhibits superior performance compared to FMEE in terms of MSE. Particularly noteworthy is MEE’s substantial advantage over FMEE, especially evident when the sample size is 100. However, as the sample size increases, the discrepancy in MSE between the two algorithms diminishes. By the time the sample size reaches 500, the difference dwindles to a mere 2 % . This observation suggests that while FMEE may lag slightly behind MEE, particularly in larger-scale datasets, such disparity remains acceptable in practical applications, given FMEE’s significantly reduced computational time compared to MEE.
Table 3 analyses the iteration counts required for convergence between MEE and FMEE in the context of Gaussian noise. Notably, FMEE achieves convergence with significantly fewer iterations compared to MEE. Furthermore, it is observed that the iteration count of MEE gradually decreases as the sample size increases. Conversely, the iteration count of FMEE remains relatively constant, irrespective of variations in sample size.
Table 4 provides a comparative assessment of the time consumption between MEE and FMEE in the context of non-Gaussian noise. Across all five experimental groups, FMEE consistently demonstrates a remarkable reduction in time consumption, consistently amounting to less than 1‰ of MEE’s time expenditure. Moreover, in accordance with theoretical predictions, the computational cost of MEE exhibits quadratic growth relative to the sample size, whereas FMEE’s time overhead scales linearly with the sample size.
Table 5 conducts a comparative evaluation of the mean squared error (MSE) between MEE and FMEE in the context of non-Gaussian noise. Generally, the disparity between the MSE values of FMEE and MEE is relatively minor. Notably, FMEE outperforms MEE when the sample size is 100 or 400, while MEE exhibits superior performance for sample sizes of 200, 300, and 500. Additionally, FMEE attains the optimal MSE across all sample sizes, except for n = 100 . Furthermore, it is observed that the MSE values for non-Gaussian noise, as depicted in Table 5, span a broader range compared to those for Gaussian noise, indicating greater variability in results for both MEE and FMEE.
Table 6 compares the number of iterations required for convergence between MEE and FMEE in the context of non-Gaussian noise. FMEE consistently demonstrates faster convergence compared to MEE. Furthermore, it is observed that the iteration count of MEE decreases as the sample size increases. In contrast, the iteration count of FMEE remains relatively stable despite variations in sample size.
Based on the findings presented in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6, it is evident that FMEE consistently yields comparable outcomes to MEE, albeit with significantly reduced time consumption and iteration counts. Notably, FMEE stands out for its remarkable efficiency, being approximately 1000 times faster than MEE and achieving convergence within approximately 0.1 s for 500 instances. Given its linear time complexity of O(n), FMEE holds considerable promise for deployment in real-time scenarios characterized by non-Gaussian noise.
Subsequently, we embark on practical experimentation utilizing transformer characteristic data to forecast distribution network failures. The transformers are categorized into two groups based on their geographical location within distinct levels of urban activity within a specific city in northwest China. Initially, our attention is directed towards a subset of 1265 transformers situated within a smaller, less densely populated area of the aforementioned city. We undertake a random partitioning of this dataset into two segments: a training set comprising 80 % of the data, totaling 1012 instances, and a verification set encompassing the remaining 20 % , comprising 253 instances. To be more specific, the transformer dataset comprises various characteristic variables, involving the standardized heavy overload duration, maximum active load ratio, average active load ratio, mean three-phase unbalance, standardized heavy three-phase unbalance duration and so forth.
We conduct a comparative analysis, pitting our proposed FMEE against three established baseline methodologies: logistic regression, neural network, and support vector machine. The dataset of characteristic variables for the 1265 heavily loaded transformers undergoes 30 rounds of random partitioning, and each randomly divided dataset is then subjected to the four algorithms. The predictive efficacy of these algorithms on the verification set is assessed based on F-measure and error rate metrics.
Figure 1 depicts the F-measure evaluations stemming from 30 prediction instances conducted by four distinct algorithms. Notably, each evaluation involves consistent partitioning of the dataset across the four algorithms. A discernible trend emerges wherein all algorithms yield F-measure values surpassing 0.8. Notably, our proposed FMEE exhibits the most promising performance, boasting an average F-measure of 0.904, thereby outshining the other three baseline methodologies by a significant margin.
To facilitate a comprehensive comparison among the four algorithms, Figure 2 depicts the error rate and F-measure of fault outage prediction results in the form of line and box charts. Notably, FMEE emerges as a frontrunner, showcasing a substantial advantage over the sophisticated support vector machine in terms of error rate. Specifically, our FMEE achieves an average error rate of 9.1 % , surpassing the neural network, logistic regression, and support vector machine by margins of 0.3 % , 4.2 % , and 4.3 % , respectively. These findings underscore the clear superiority of our proposed FMEE methodology.
Furthermore, we extend our experimentation to a more densely populated urban region within the northwest Chinese city. Figure 3 illustrates the F-measure evaluations derived from 30 independent tests conducted across the four comparative algorithms. On a comprehensive scale, our proposed FMEE attains an F-measure of 0.891, surpassing baseline methodologies such as neural networks, logistic regression, and support vector machines by margins of 0.013, 0.023, and 0.028, respectively.
Finally, we report the error rate of the fault outage prediction results derived from the four comparative algorithms, represented through line and box charts. As shown in Figure 4, our proposed FMEE achieves an average error rate of 11.2%, showcasing a notable superiority over the three baseline methodologies. These findings robustly underscore the efficacy of our algorithm in addressing regression problems.

5. Conclusions

This study introduces a rapid minimum error entropy algorithm tailored for linear regression. Several notable conclusions emerge from our investigation. Firstly, the proposed FMEE exhibits significantly faster convergence, requiring fewer iterations compared to MEE. Secondly, while FMEE does not surpass traditional MEE in performance for Gaussian noise and yields comparable results for non-Gaussian noise as measured by mean squared error, its computational efficiency is strikingly superior, consuming less than 1‰ of the time required by MEE. Thirdly, despite being derived under the assumption of noise proximity to Gaussian distribution, FMEE demonstrates efficacy in handling non-Gaussian, heavy-tailed noise distributions. Fourthly, owing to its exceptional time efficiency, FMEE holds promise for real-time applications in scenarios featuring non-Gaussian noise. Moving forward, further analysis of FMEE’s convergence conditions and exploration of noise approximation methods for arbitrary distributions represent avenues for future research.

Author Contributions

Conceptualization, Q.L. and X.L.; methodology, Q.L.; software, W.C.; validation, Q.L., X.L. and W.C.; formal analysis, Y.W.; investigation, X.L.; resources, Q.L.; data curation, Y.W.; writing—original draft preparation, Q.G.; writing—review and editing, H.C.; visualization, Q.G.; supervision, H.C.; project administration, W.C.; funding acquisition, Q.L. and H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the State Grid Information and Telecommunication Group scientific and technological innovation projects “Research on Power Digital Space Technology System and Key Technologies” (Grant no: SGIT0000XMJS2310456).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available because the electric power system data used in the experiments involve privacy-sensitive information and cannot be fully disclosed.

Conflicts of Interest

Authors (Qiang Li, Xiao Liao, Wei Cui, and Ying Wang) were employed by the State Grid Information & Telecommunication Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Baggenstoss, P.M. Maximum Entropy PDF Design Using Feature Density Constraints: Applications in Signal Processing. IEEE Trans. Signal Process. 2015, 11, 2815–2825. [Google Scholar] [CrossRef]
  2. Li, Y.; Bai, X.; Yuan, D.; Yu, C.; San, X.; Guo, Y.; Zhang, L.; Ye, J. Cu-based high-entropy two-dimensional oxide as stable and active photothermal catalyst. Nat. Commun. 2023, 14, 3171. [Google Scholar] [CrossRef] [PubMed]
  3. Bisikalo, O.; Kharchenko, V.; Kovtun, V.; Krak, I.; Pavlov, S. Parameterization of the stochastic model for evaluating variable small data in the Shannon entropy basis. Entropy 2023, 25, 184. [Google Scholar] [CrossRef] [PubMed]
  4. Chen, Q.; Zhang, F.; Su, L.; Lin, B.; Chen, S.; Zhang, Y. State Parameter Fusion Estimation for Intelligent Vehicles Based on IMM-MCCKF. Appl. Sci. 2024, 14, 4495. [Google Scholar] [CrossRef]
  5. Carli, F.P.; Chen, T.; Ljung, L. Maximum entropy kernels for system identification. IEEE Trans. Autom. Control 2016, 62, 1471–1477. [Google Scholar] [CrossRef]
  6. Fan, X.; Chen, L.; Huang, D.; Tian, Y.; Zhang, X.; Jiao, M.; Zhou, Z. From Single Metals to High-Entropy Alloys: How Machine Learning Accelerates the Development of Metal Electrocatalysts. Adv. Funct. Mater. 2024, 1, 2401887. [Google Scholar] [CrossRef]
  7. Han, T.T.; Ge, S.S.; Lee, T.H. Persistent Dwell-Time Switched Nonlinear Systems: Variation Paradigm and Gauge Design. IEEE Trans. Autom. Control 2010, 55, 321–337. [Google Scholar]
  8. Chesi, G. Stabilization and Entropy Reduction via SDP-Based Design of Fixed-Order Output Feedback Controllers and Tuning Parameters. IEEE Trans. Autom. Control 2017, 62, 1094–1108. [Google Scholar] [CrossRef]
  9. Wei, P.; Li, H.X. Spatiotemporal entropy for abnormality detection and localization of Li-ion battery packs. IEEE Trans. Ind. Electron. 2023, 70, 12851–12859. [Google Scholar] [CrossRef]
  10. Wang, Z.; Li, G.; Yao, L.; Cai, Y.; Lin, T.; Zhang, J.; Dong, H. Intelligent fault detection scheme for constant-speed wind turbines based on improved multiscale fuzzy entropy and adaptive chaotic Aquila optimization-based support vector machine. ISA Trans. 2023, 138, 582–602. [Google Scholar] [CrossRef] [PubMed]
  11. Daneshpazhouh, A.; Sami, A. Entropy-based outlier detection using semi-supervised approach with few positive examples. Pattern Recogn. Lett. 2014, 49, 77–84. [Google Scholar] [CrossRef]
  12. Qu, B.; Wang, Z.; Shen, B.; Dong, H. Decentralized dynamic state estimation for multi-machine power systems with non-Gaussian noises: Outlier detection and localization. Automatica 2023, 153, 111010. [Google Scholar] [CrossRef]
  13. Feng, Z.; Wang, G.; Peng, B.; He, J.; Zhang, K. Novel robust minimum error entropy wasserstein distribution kalman filter under model uncertainty and non-gaussian noise. Signal Process. 2023, 203, 108806. [Google Scholar] [CrossRef]
  14. Sun, M.; Davies, M.E.; Proudler, I.K.; Hopgood, J.R. Adaptive kernel Kalman filter. IEEE Trans. Signal Process. 2023, 71, 713–726. [Google Scholar] [CrossRef]
  15. Wu, L.; Lin, H.; Hu, B.; Tan, C.; Gao, Z.; Liu, Z.; Li, S.Z. Beyond homophily and homogeneity assumption: Relation-based frequency adaptive graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 8497–8509. [Google Scholar] [CrossRef] [PubMed]
  16. Guo, L.; Wang, H. Minimum entropy filtering for multivariate stochastic systems with non-Gaussian noises. IEEE Trans. Autom. Control 2006, 51, 695–700. [Google Scholar]
  17. Wang, Y.; Tang, Y.Y.; Li, L. Minimum error entropy based sparse representation for robust subspace clustering. IEEE Trans. Signal Process. 2015, 63, 4010–4021. [Google Scholar] [CrossRef]
  18. Wang, M.; Deng, W. Deep face recognition: A survey. Neurocomputing 2021, 429, 215–244. [Google Scholar] [CrossRef]
  19. Lin, M.; Cheng, C.; Peng, Z.; Dong, X.; Qu, Y.; Meng, G. Nonlinear dynamical system identification using the sparse regression and separable least squares methods. J. Sound Vib. 2021, 505, 116141. [Google Scholar] [CrossRef]
  20. Wu, Z.; Peng, S.; Ma, W.; Chen, B.; Principe, J.C. Minimum error entropy algorithms with sparsity penalty constraints. Entropy 2015, 17, 3419–3437. [Google Scholar] [CrossRef]
  21. Huang, S.; Tran, T.D. Sparse signal recovery via generalized entropy functions minimization. IEEE Trans. Signal Process. 2018, 67, 1322–1337. [Google Scholar] [CrossRef]
  22. Yue, H.; Wang, H. Minimum entropy control of closed-loop tracking errors for dynamic stochastic systems. IEEE Trans. Autom. Control 2003, 48, 118–122. [Google Scholar]
  23. Wang, H. Minimum entropy control of non-Gaussian dynamic stochastic systems. IEEE Trans. Autom. Control 2002, 47, 398–403. [Google Scholar] [CrossRef]
  24. Mitra, R.; Bhatia, V. Minimum error entropy criterion based channel estimation for massive-MIMO in VLC. IEEE Trans. Veh. Technol. 2018, 68, 1014–1018. [Google Scholar] [CrossRef]
  25. Erdogmus, D.; Principe, J.C. Convergence properties and data efficiency of the minimum error entropy criterion in adaline training. IEEE Trans. Signal Process. 2003, 51, 1966–1978. [Google Scholar]
  26. Chen, B.; Yuan, Z.; Zheng, N.; Principe, J.C. Kernel minimum error entropy algorithm. Neurocomputing 2013, 121, 160–169. [Google Scholar] [CrossRef]
  27. Hu, T.; Wu, Q.; Zhou, D.X. Convergence of gradient descent for minimum error entropy principle in linear regression. IEEE Trans. Signal Process. 2016, 64, 6571–6579. [Google Scholar] [CrossRef]
  28. Zhang, Y.; Chen, B.; Liu, X.; Yuan, Z.; Principe, J.C. Convergence of a fixed-point minimum error entropy algorithm. Entropy 2015, 17, 5549–5560. [Google Scholar] [CrossRef]
  29. Chen, B.; Zhu, Y.; Hu, J.; Zhang, M. On optimal estimations with minimum error entropy criterion. J. Frankl. Inst. 2010, 347, 545–558. [Google Scholar]
  30. Heravi, A.R.; Hodtani, G.A. A new information theoretic relation between minimum error entropy and maximum correntropy. IEEE Signal Process. Lett. 2018, 25, 921–925. [Google Scholar]
  31. Yang, S.; Tan, J.; Chen, B. Robust spike-based continual meta-learning improved by restricted minimum error entropy criterion. Entropy 2022, 24, 455. [Google Scholar] [CrossRef] [PubMed]
  32. Dang, L.; Chen, B.; Wang, S.; Ma, W.; Ren, P. Robust power system state estimation with minimum error entropy unscented Kalman filter. IEEE Trans. Instrum. Meas. 2020, 69, 8797–8808. [Google Scholar] [CrossRef]
  33. Wang, B.; Hu, T. Semi-Supervised Minimum Error Entropy Principle with Distributed Method. Entropy 2018, 20, 968. [Google Scholar] [CrossRef] [PubMed]
  34. Chen, B.; Xing, L.; Xu, B.; Zhao, H.; Principe, J.C. Insights into the robustness of minimum error entropy estimation. IEEE Trans. Neural Netw. Learn. Syst. 2016, 29, 731–737. [Google Scholar] [CrossRef] [PubMed]
  35. Shen, P.; Li, C. Minimum total error entropy method for parameter estimation. IEEE Trans. Signal Process. 2015, 63, 4079–4090. [Google Scholar] [CrossRef]
  36. Kothari, P.; Kreiss, S.; Alahi, A. Human trajectory forecasting in crowds: A deep learning perspective. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7386–7400. [Google Scholar] [CrossRef]
  37. Kim, N.; Kwon, K. Normalized Minimum Error Entropy Algorithm with Recursive Power Estimation. Entropy 2016, 18, 239. [Google Scholar] [CrossRef]
  38. Chen, B.; Ma, R.; Yu, S.; Du, S.; Qin, J. Granger causality analysis based on quantized minimum error entropy criterion. IEEE Signal Process. Lett. 2019, 26, 347–351. [Google Scholar] [CrossRef]
  39. Niknam, M.; Santos, L.F.; Cory, D.G. Experimental detection of the correlation Rényi entropy in the central spin model. Phys. Rev. Lett. 2021, 127, 080401. [Google Scholar] [CrossRef] [PubMed]
  40. Kim, T.; San Kim, D.; Jang, L.C.; Lee, H.; Kim, H. Representations of degenerate Hermite polynomials. Adv. Appl. Math. Mech. 2022, 139, 102359. [Google Scholar] [CrossRef]
  41. Lin, W.; Zhang, J.E. The valid regions of Gram–Charlier densities with high-order cumulants. J. Comput. Appl. Math. 2022, 407, 113945. [Google Scholar] [CrossRef]
  42. Matsushita, R.; Brandão, H.; Nobre, I.; Da Silva, S. Differential entropy estimation with a Paretian kernel: Tail heaviness and smoothing. Phys. A Stat. Mech. Appl. 2024, 1, 129850. [Google Scholar] [CrossRef]
  43. Canales, C.; Galarce, C.; Rubio, F.; Pineda, F.; Anguita, J.; Barros, R.; Parragué, M.; Daille, L.K.; Aguirre, J.; Armijo, F.; et al. Testing the test: A comparative study of marine microbial corrosion under laboratory and field conditions. ACS Omega 2021, 6, 13496–13507. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The boxplot of F-measure among four algorithms for failure outage prediction results.
Figure 2. The boxplot of error rate among four algorithms for failure outage prediction results.
Figure 3. The boxplot of F-measure of four algorithms for failure outage prediction results.
Figure 4. The boxplot of error rate of four algorithms for failure outage prediction results.
Table 1. The running time (in seconds) of MEE and FMEE for Gaussian noise.

Sample Size | Average (MEE) | Average (FMEE) | Optimal (MEE) | Optimal (FMEE)
100 | 103.89 | 0.0401 | 77.77 | 0.0156
200 | 384.31 | 0.0580 | 310.05 | 0.0312
300 | 801.52 | 0.0769 | 654.21 | 0.0468
400 | 1430.61 | 0.1000 | 1254.12 | 0.0468
500 | 2306.24 | 0.1104 | 2072.50 | 0.0624
Table 2. The mean squared error of MEE and FMEE for Gaussian noise.

Sample Size | Average MSE (MEE) | Average MSE (FMEE) | Optimal MSE (MEE) | Optimal MSE (FMEE)
100 | 0.9108 | 1.4027 | 0.5697 | 0.6542
200 | 0.9311 | 1.0528 | 0.6295 | 0.7258
300 | 0.9573 | 1.0190 | 0.7600 | 0.7462
400 | 0.9696 | 0.9769 | 0.8024 | 0.7588
500 | 0.9604 | 0.9938 | 0.8024 | 0.8302
Table 3. The number of iterations of MEE and FMEE for Gaussian noise.

Sample Size | Average Iterations (MEE) | Average Iterations (FMEE) | Optimal Iterations (MEE) | Optimal Iterations (FMEE)
100 | 797.01 | 38.21 | 592 | 6
200 | 697.09 | 38.29 | 592 | 7
300 | 656.61 | 36.88 | 542 | 7
400 | 650.03 | 40.67 | 571 | 9
500 | 637.03 | 36.17 | 586 | 7
Table 4. The running time (in seconds) of MEE and FMEE for non-Gaussian noise.

Sample Size | Average (MEE) | Average (FMEE) | Optimal (MEE) | Optimal (FMEE)
100 | 107.96 | 0.0413 | 85.24 | 0.0156
200 | 348.66 | 0.0635 | 270.26 | 0.0312
300 | 800.50 | 0.0819 | 727.45 | 0.0468
400 | 1431.34 | 0.1027 | 1274.87 | 0.0624
500 | 2179.33 | 0.1178 | 1952.27 | 0.0624
Table 5. The mean squared error of MEE and FMEE for non-Gaussian noise.

Sample Size | Average MSE (MEE) | Average MSE (FMEE) | Optimal MSE (MEE) | Optimal MSE (FMEE)
100 | 1.0844 | 0.9604 | 0.1584 | 0.1630
200 | 0.8485 | 1.1554 | 0.2654 | 0.2607
300 | 0.9110 | 1.2314 | 0.3140 | 0.2659
400 | 1.1037 | 1.0550 | 0.3672 | 0.2891
500 | 0.9536 | 1.0491 | 0.4995 | 0.4668
Table 6. The number of iterations of MEE and FMEE for non-Gaussian noise.

Sample Size | Average Iterations (MEE) | Average Iterations (FMEE) | Optimal Iterations (MEE) | Optimal Iterations (FMEE)
100 | 782.88 | 34.33 | 618 | 8
200 | 689.67 | 35.72 | 563 | 9
300 | 644.95 | 36.72 | 584 | 10
400 | 648.70 | 37.55 | 580 | 12
500 | 633.25 | 35.35 | 571 | 12