Article

L1-Norm Robust Regularized Extreme Learning Machine with Asymmetric C-Loss for Regression

1 School of Automation, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2 Xi’an Key Laboratory of Advanced Control and Intelligent Process, Xi’an 710121, China
3 School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
* Author to whom correspondence should be addressed.
Axioms 2023, 12(2), 204; https://doi.org/10.3390/axioms12020204
Submission received: 10 January 2023 / Revised: 13 February 2023 / Accepted: 14 February 2023 / Published: 15 February 2023
(This article belongs to the Special Issue Fractional-Order Equations and Optimization Models in Engineering)

Abstract

Extreme learning machines (ELMs) have recently attracted significant attention due to their fast training speed and good prediction performance. However, ELMs ignore the inherent distribution of the original samples and are prone to overfitting, which prevents them from achieving good generalization performance. In this paper, based on the expectile penalty and correntropy, an asymmetric C-loss function (called AC-loss) is proposed, which is non-convex, bounded, and relatively insensitive to noise. Further, a novel extreme learning machine called the L1-norm robust regularized extreme learning machine with asymmetric C-loss (L1-ACELM) is presented to handle the overfitting problem. The proposed algorithm benefits from the L1 norm and replaces the square loss function with the AC-loss function. L1-ACELM can generate a more compact network with fewer hidden nodes and reduce the impact of noise. To evaluate the effectiveness of the proposed algorithm on noisy datasets, different levels of noise are added in numerical experiments. The results for different types of artificial and benchmark datasets demonstrate that L1-ACELM achieves better generalization performance compared to other state-of-the-art algorithms, especially when noise exists in the datasets.

1. Introduction

The single hidden-layer feedforward neural network (SLFN) is one of the most important learning algorithms in data mining and machine learning. An SLFN has only one hidden layer connecting the input and output layers. Generally, gradient-based algorithms such as back-propagation [1] are used to train SLFNs, which often leads to slow convergence, overfitting, and local minima. To overcome these problems, Huang et al. [2,3] proposed a widely used method based on the SLFN structure called the extreme learning machine (ELM). Compared to the traditional SLFN, the input weights and thresholds of the hidden layer nodes in ELM are randomly generated and do not need to be adjusted repeatedly via iterations. ELM identifies the output weight vector with the smallest norm by calculating the Moore-Penrose inverse. Therefore, the training speed of ELM is much higher than that of a traditional SLFN. Moreover, ELM minimizes both the training error and the norm of the output weights, which facilitates good generalization performance. Owing to its high learning speed and good generalization performance, ELM has been successfully applied in many fields [4,5,6]. However, ELM still has several shortcomings. For example, ELM is based on empirical risk minimization (ERM) [7], which often leads to overfitting.
To address this issue, many scholars have proposed various ELM-based algorithms to improve generalization performance. In [8], Deng et al. introduced the weight factor γ into ELM for the first time and proposed the regularized extreme learning machine (RELM). By adjusting the weight factor γ, the proportions of empirical risk and structural risk in the actual prediction risk can be balanced, thereby avoiding model overfitting. However, RELM uses the L2 norm, which is sensitive to outliers. To reduce the influence of outliers, Rong et al. proposed the pruned extreme learning machine (P-ELM) [9], which can remove irrelevant hidden nodes; however, P-ELM is only used for classification problems. To further address the regression problem, the optimally pruned extreme learning machine (OP-ELM) [10] was proposed. In OP-ELM, the L1 norm is used to remove irrelevant output nodes and select the corresponding hidden nodes, and the weights of the selected hidden nodes are then calculated using the least squares method. Given that the L1 norm is robust to outliers, it has been used in various algorithms to improve generalization performance [11,12]. Balasundaram et al. [13] proposed the L1-norm extreme learning machine, which produces sparse models such that the decision function can be determined using fewer hidden layer nodes. Generally speaking, the RELM objective is composed of empirical risk and structural risk. The structural risk term effectively prevents overfitting, while the empirical risk is determined by the loss function. Traditional RELMs use the squared loss function, which is symmetric and unbounded. The symmetry prevents the model from taking into account the distribution characteristics within the training samples, while the unboundedness makes the model sensitive to noise and outliers. In practice, data distributions are often unbalanced, and noise is generally introduced during data collection. Therefore, it is particularly important to choose an appropriate loss function when constructing the model.
Quantiles can completely reflect the distribution of a random variable without missing any information, so quantile regression can more accurately describe the distribution characteristics of random variables for comprehensive analysis. Therefore, quantile regression is more robust and has been successfully applied to statistical prediction [14,15]. The quantile loss can be thought of as a pinball penalty. The expectile loss is an asymmetric least squares loss, namely the square of the quantile (pinball) loss, and it is often used in regression problems with imbalanced data [16]. However, the unboundedness of the expectile loss leads to a lack of robustness.
As shown in [17], bounded loss functions are less sensitive to noise and outliers than unbounded loss functions, whereas convex loss functions are usually unbounded. To further improve the robustness of ELM, researchers have proposed various non-convex loss functions to replace convex ones [18,19,20]. Common convex loss functions include the square loss, hinge loss, and Huber loss, which admit global optimal solutions and are easy to optimize. However, the unboundedness of convex loss functions implies that they are not well suited to handling outliers. Compared to convex loss functions, non-convex loss functions are more robust to outliers. Recently, Singh et al. [21] proposed a correntropy-based loss function called the C-loss. Based on information theory and the kernel method, correntropy [22,23] is considered to be a generalized local similarity measure between two random variables. As a non-convex, bounded loss function, the C-loss has been widely used in machine learning to improve robustness. In 2019, Zhao et al. [24] applied the C-loss function to ELM for the first time, proposed the C-loss-based ELM (CELM), and experimentally demonstrated that its generalization performance is better than that of other algorithms.
In real life, the distribution of a dataset tends to be asymmetric, and the training samples are easily contaminated by noise. To better account for the distribution characteristics of the data and improve the generalization ability of the algorithm, a non-convex robust loss function, called the asymmetric C-loss (AC-loss), is proposed. A robust extreme learning machine based on the asymmetric C-loss and the L1 norm (called L1-ACELM) is then developed. The main contributions of this paper are as follows:
(1) Based on the expectile penalty and the correntropy loss function, a new loss function (AC-loss) is developed. The AC-loss retains important properties of the C-loss, such as non-convexity and boundedness. In addition, the AC-loss is asymmetric and can handle unbalanced noise.
(2) A novel approach called the L1-norm robust regularized extreme learning machine with asymmetric C-loss (L1-ACELM) is proposed by applying the proposed AC-loss function and the L1 norm in the objective function of ELM to enhance robustness to outliers.
(3) The non-convexity of the AC-loss function makes L1-ACELM difficult to solve. The half-quadratic optimization algorithm [25,26,27] is used to address this problem. Moreover, the convergence of the proposed algorithm is analyzed.
The remainder of this paper is structured as follows. Section 2 briefly reviews ELM, RELM, the C-loss function, and the half-quadratic optimization algorithm. In Section 3, we propose the asymmetric C-loss function and the L1-ACELM model, apply the half-quadratic optimization algorithm to solve L1-ACELM, and analyze the convergence of the algorithm. The experimental results for the artificial and benchmark datasets are presented in Section 4. Section 5 summarizes the main conclusions and discusses future work.

2. Related Work

2.1. Extreme Learning Machine (ELM)

ELM is a single hidden-layer feedforward neural network that was first proposed by Huang et al. [2]. Unlike the traditional SLFN, the input weights and thresholds of the hidden layer in ELM are randomly generated, and the output weights can be determined using the least squares method. Hence, ELM is much faster to train than the traditional SLFN. In addition, ELM has good generalization ability.
Given $N$ arbitrary distinct samples $(X, Y) = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T \in \mathbb{R}^m$ and $y_i = (y_{i1}, y_{i2}, \ldots, y_{in})^T \in \mathbb{R}^n$ are the input samples and the corresponding output vectors, respectively, the output of a standard SLFN with $L$ hidden nodes can be expressed as follows:
$f(x_i) = \sum_{j=1}^{L} \beta_j h(\alpha_j, b_j, x_i), \quad i = 1, \ldots, N$   (1)
where $\alpha_j = (\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jm})^T \in \mathbb{R}^m$ is the input weight vector that connects the input nodes to the $j$-th hidden node, $b_j \in \mathbb{R}$ is the bias of the $j$-th hidden node, $\beta_j = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jn})^T \in \mathbb{R}^n$ is the output weight vector that connects the $j$-th hidden node to the output nodes, and $h(\alpha_j, b_j, x_i)$ is the output of the $j$-th hidden node with respect to the input $x_i$. $f$ denotes the actual output vector of the SLFN.
For ELM, the input weight vector and the bias that connects the input node to the hidden layer node are randomly assigned instead of being updated. Therefore, it can be converted to a linear model:
$F = H\beta$   (2)
where
$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} h(\alpha_1, b_1, x_1) & \cdots & h(\alpha_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ h(\alpha_1, b_1, x_N) & \cdots & h(\alpha_L, b_L, x_N) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times n} \quad \text{and} \quad F = \begin{bmatrix} f(x_1)^T \\ \vdots \\ f(x_N)^T \end{bmatrix}_{N \times n}$
Here, H is the output matrix of the hidden layer. Thus, the output weight vector that connects the hidden layer node to the output node can be determined by solving the following equation:
$\min_{\beta} \; \|H\beta - Y\|^2$   (3)
ELM can approximate the training samples with zero error. Therefore, Equation (3) can be written as:
$H\beta = Y$   (4)
The output weight β is the least squares solution of Equation (4), which can be obtained as follows:
$\beta = H^{+} Y$   (5)
where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$.
To avoid overfitting of the model, the regularized ELM (RELM) was proposed, which facilitates better generalization performance by minimizing the sum of the training error and the norm of the output weights [28]. RELM can be expressed as follows:
$\min_{\beta} \; \|H\beta - Y\|_2^2 + \frac{\gamma}{2}\|\beta\|_2^2$   (6)
The optimal solution to RELM is computed as follows:
$\beta = \begin{cases} \left(H^T H + \gamma I\right)^{-1} H^T Y, & \text{if } N \ge L \\ H^T \left(H H^T + \gamma I\right)^{-1} Y, & \text{if } N < L \end{cases}$   (7)
where I is an identity matrix.
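As a concrete illustration of Equations (5) and (7), the following Python/NumPy sketch trains a basic ELM and RELM with a sigmoid hidden layer. It is a minimal sketch under our own naming conventions, not the authors' MATLAB implementation; `elm_fit` and `elm_predict` are hypothetical helper names.

```python
import numpy as np

def hidden_output(X, alpha, b):
    """Hidden-layer output matrix H of Equation (2) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ alpha + b)))

def elm_fit(X, Y, L, gamma=None, seed=0):
    """Train ELM (gamma=None, Equation (5)) or RELM (gamma>0, Equation (7))."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    alpha = rng.uniform(-1.0, 1.0, size=(m, L))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=(1, L))       # random hidden biases
    H = hidden_output(X, alpha, b)                # N x L hidden output matrix
    if gamma is None:
        beta = np.linalg.pinv(H) @ Y              # Moore-Penrose solution, Eq. (5)
    elif N >= L:
        beta = np.linalg.solve(H.T @ H + gamma * np.eye(L), H.T @ Y)   # Eq. (7), N >= L
    else:
        beta = H.T @ np.linalg.solve(H @ H.T + gamma * np.eye(N), Y)   # Eq. (7), N < L
    return alpha, b, beta

def elm_predict(X, alpha, b, beta):
    """Network output F = H(X) beta."""
    return hidden_output(X, alpha, b) @ beta
```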

2.2. Correntropy-Induced Loss (C-Loss)

Correntropy is a generalized similarity measure between two random variables in a small neighborhood defined by the kernel width $\sigma$. For a regression problem, the loss function should be chosen so that the similarity between the actual output and the target value is maximized, which is equivalent to maximizing correntropy. To this end, Singh et al. proposed the C-loss function [21], which is defined as:
$L_C\left(y_i, f(x_i)\right) = 1 - \exp\left(-\dfrac{\left(y_i - f(x_i)\right)^2}{2\sigma^2}\right)$   (8)
As a bounded, non-convex loss function, the C-loss is more robust to outliers than the traditional squared loss function.

2.3. Half-Quadratic Optimization

The half-quadratic optimization algorithm, based on conjugate function theory [29], is commonly used for both convex and non-convex optimization problems. It transforms the original non-convex objective function into a half-quadratic objective function by introducing auxiliary variables. The resulting objective function cannot be solved directly, so a two-step alternating minimization method is used: first, with the original variables fixed, the auxiliary variables are optimized; then, with the auxiliary variables fixed, the original variables are optimized.
The minimization problem is as follows:
$\min_{v} \; \phi(v) + F(v)$   (9)
where $v = (v_1, v_2, \ldots, v_N)^T \in \mathbb{R}^N$, $\phi$ is a potential loss function with $\phi(v) = \sum_{i=1}^{N} \phi(v_i)$, and $F$ is a convex penalty function.
Considering the half-quadratic optimization algorithm, we introduce an auxiliary variable $p = (p_1, p_2, \ldots, p_N)^T \in \mathbb{R}^N$ into $\phi$, which can then be expressed as:
$\phi(v_i) = \min_{p_i} \; Q(v_i, p_i) + \varphi(p_i)$   (10)
where $Q(v_i, p_i)$ is a half-quadratic function, which can be represented in the additive form $Q_A(v_i, p_i) = \frac{1}{2}\left(c\,v_i - p_i/c\right)^2$ or the multiplicative form $Q_M(v_i, p_i) = \frac{1}{2} p_i v_i^2$.
Substituting Equation (10) into Equation (9), we obtain the following optimization problem:
$\min_{v} \; \phi(v) + F(v) = \min_{v, p} \; Q(v, p) + \varphi(p) + F(v)$   (11)
where $p_i$ is determined by a function $g$, which is the conjugate function of $\phi$. Equation (11) can then be optimized alternately as follows:
$p^{t+1} = g(v^t)$   (12)
$v^{t+1} = \arg\min_{v} \; Q(v, p^{t+1}) + F(v)$   (13)
where t represents the t-th iteration.
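To make the two-step alternation of Equations (12) and (13) concrete, the toy sketch below (our own illustration, not taken from the paper) estimates a location parameter under a correntropy-type (Welsch) loss: the auxiliary weights play the role of the first update, and the weighted least-squares step plays the role of the second.

```python
import numpy as np

def robust_mean_hq(y, sigma=1.0, n_iter=20):
    """Minimize sum_i (1 - exp(-(y_i - mu)^2 / (2 sigma^2))) over mu
    with multiplicative half-quadratic alternation."""
    mu = np.median(y)                                   # robust initialization
    for _ in range(n_iter):
        # Step 1 (analogue of Eq. (12)): auxiliary weights for fixed mu
        p = np.exp(-(y - mu) ** 2 / (2.0 * sigma ** 2))
        # Step 2 (analogue of Eq. (13)): weighted least squares for fixed weights
        mu = np.sum(p * y) / np.sum(p)
    return mu

y = np.concatenate([np.random.normal(0.0, 0.1, 100), [8.0, 9.0, 10.0]])  # three outliers
print(robust_mean_hq(y, sigma=0.5))   # close to 0, largely unaffected by the outliers
```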

3. Main Contributions

3.1. Asymmetric C-Loss Function (AC-Loss)

As a measure of risk, the expectile is an extension of the quantile, which represents the distributional information of a random variable. The expectile loss is essentially a squared pinball loss, which can also be considered as an asymmetric squared loss. The asymmetric least square loss function can be expressed as:
$L_{\tau}\left(y_i, f(x_i)\right) = \begin{cases} \tau\left(y_i - f(x_i)\right)^2, & \text{if } y_i - f(x_i) \ge 0 \\ (1-\tau)\left(y_i - f(x_i)\right)^2, & \text{if } y_i - f(x_i) < 0 \end{cases}$   (14)
However, given that the asymmetric least square loss is an unbounded loss function, it is more sensitive to outliers. Therefore, we construct an asymmetric C-loss (AC-loss) function, based on the C-loss function and the expectile loss function, which is a non-convex, asymmetric, and bounded function for dealing with outliers and noise. The AC-loss function is defined as follows:
$L_{C}^{als}\left(y_i, f(x_i)\right) = \begin{cases} 1 - \exp\left(-\dfrac{\tau\left(y_i - f(x_i)\right)^2}{2\sigma^2}\right), & \text{if } y_i - f(x_i) \ge 0 \\ 1 - \exp\left(-\dfrac{(1-\tau)\left(y_i - f(x_i)\right)^2}{2\sigma^2}\right), & \text{if } y_i - f(x_i) < 0 \end{cases}$   (15)
The plot of the AC-loss function is shown in Figure 1.
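Equation (15) translates directly into NumPy; the sketch below uses our own function name, and setting tau = 0.5 recovers the symmetric C-loss of Equation (8) up to a rescaling of the window width sigma.

```python
import numpy as np

def ac_loss(y, f, sigma=1.0, tau=0.5):
    """Asymmetric C-loss of Equation (15): residuals e = y - f are weighted by
    tau when e >= 0 and by (1 - tau) when e < 0; the loss is bounded in [0, 1)."""
    e = np.asarray(y, dtype=float) - np.asarray(f, dtype=float)
    w = np.where(e >= 0, tau, 1.0 - tau)
    return 1.0 - np.exp(-w * e ** 2 / (2.0 * sigma ** 2))

# Positive and negative residuals of equal magnitude receive different losses:
print(ac_loss(1.0, 0.0, sigma=1.0, tau=0.8))   # residual +1, weight 0.8
print(ac_loss(0.0, 1.0, sigma=1.0, tau=0.8))   # residual -1, weight 0.2
```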

3.2. L1-ACELM

To improve the generalization performance of RELM, the proposed loss function is introduced to replace the squared loss function. To further enhance robustness to outliers, the L2 norm of structural risk in RELM is replaced with the L1 norm. Therefore, we propose a new robust ELM (called L1-ACELM):
$\min_{\beta} \; J(\beta) = \sum_{i=1}^{N} L_{C}^{als}\left(y_i, h(x_i)\beta\right) + \gamma\|\beta\|_1$   (16)
where $\gamma > 0$ is a regularization parameter.
Since the AC-loss is a non-convex loss function, it is difficult to optimize the objective function directly. The half-quadratic optimization algorithm is usually applied to such non-convex problems; therefore, we adopt it to find the optimal solution of the objective function.
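For reference, the objective of Equation (16) can be evaluated directly; the short sketch below (our own helper, with H denoting the hidden-layer output matrix) computes the quantity that the convergence check of Algorithm 1 later monitors.

```python
import numpy as np

def l1_acelm_objective(H, y, beta, gamma, sigma, tau):
    """J(beta) of Equation (16): summed AC-loss plus gamma * ||beta||_1."""
    e = y - H @ beta                                   # residuals e_i = y_i - h(x_i) beta
    w = np.where(e >= 0, tau, 1.0 - tau)               # asymmetric weights
    ac = 1.0 - np.exp(-w * e ** 2 / (2.0 * sigma ** 2))
    return ac.sum() + gamma * np.abs(beta).sum()
```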

3.3. Solving Method

For the function $f(u) = \exp(-u)$, there exists a convex function $g(v)$, which is expressed as follows:
$g(v) = -v\log(-v) + v$   (17)
where $v < 0$, and the conjugate function $g^{*}(u)$ of the function $g(v)$ is defined as:
$g^{*}(u) = \sup_{v < 0} \left\{ uv + v\log(-v) - v \right\}$   (18)
where
$v = -\exp(-u) < 0$   (19)
By substituting Equation (19) into Equation (18), we have
$g^{*}(u) = \exp(-u)$   (20)
Now, let $u = \begin{cases} \dfrac{\tau e_i^2}{2\sigma^2}, & \text{if } e_i \ge 0 \\ \dfrac{(1-\tau) e_i^2}{2\sigma^2}, & \text{if } e_i < 0 \end{cases}$ and $e_i = y_i - h(x_i)\beta$; then Equation (18) can be expressed as:
$g^{*}(u) = \begin{cases} \sup_{v}\left\{\dfrac{\tau e_i^2}{2\sigma^2}v + v\log(-v) - v\right\} = \exp\left(-\dfrac{\tau e_i^2}{2\sigma^2}\right), & \text{if } e_i \ge 0 \\ \sup_{v}\left\{\dfrac{(1-\tau)e_i^2}{2\sigma^2}v + v\log(-v) - v\right\} = \exp\left(-\dfrac{(1-\tau)e_i^2}{2\sigma^2}\right), & \text{if } e_i < 0 \end{cases}$   (21)
where
$v_i = \begin{cases} -\exp\left(-\dfrac{\tau e_i^2}{2\sigma^2}\right), & \text{if } e_i \ge 0 \\ -\exp\left(-\dfrac{(1-\tau)e_i^2}{2\sigma^2}\right), & \text{if } e_i < 0 \end{cases}$   (22)
By combining Equations (21) and (16), we have
$\min_{\beta, v} \; J(\beta, v) = \begin{cases} \displaystyle\sum_{i=1}^{N}\left(1 - \sup_{v_i}\left\{\dfrac{\tau e_i^2}{2\sigma^2}v_i + v_i\log(-v_i) - v_i\right\}\right) + \gamma\|\beta\|_1, & \text{if } e_i \ge 0 \\ \displaystyle\sum_{i=1}^{N}\left(1 - \sup_{v_i}\left\{\dfrac{(1-\tau)e_i^2}{2\sigma^2}v_i + v_i\log(-v_i) - v_i\right\}\right) + \gamma\|\beta\|_1, & \text{if } e_i < 0 \end{cases}$
$\text{s.t. } h(x_i)\beta = y_i - e_i, \quad i = 1, 2, \ldots, N$   (23)
where $v = (v_1, v_2, \ldots, v_N)^T$. Equation (23) can be simplified as:
$\min_{\beta, v} \; J(\beta, v) = \begin{cases} \displaystyle\sum_{i=1}^{N}\left(-\dfrac{\tau e_i^2}{2\sigma^2}v_i - v_i\log(-v_i) + v_i\right) + \gamma\|\beta\|_1, & \text{if } e_i \ge 0 \\ \displaystyle\sum_{i=1}^{N}\left(-\dfrac{(1-\tau)e_i^2}{2\sigma^2}v_i - v_i\log(-v_i) + v_i\right) + \gamma\|\beta\|_1, & \text{if } e_i < 0 \end{cases}$
$\text{s.t. } h(x_i)\beta = y_i - e_i, \quad i = 1, 2, \ldots, N$   (24)
The optimal solution β can be obtained by solving Equation (24) using the alternating optimization method.
Firstly, given the original variables $\beta^t$, we can obtain the optimal solution for the auxiliary variables $v^{t+1}$. When $\beta^t$ is fixed, the minimization problem is as follows:
$\min_{v} \; J(v) = \begin{cases} \displaystyle\sum_{i=1}^{N}\left(-\dfrac{\tau\left(y_i - f(x_i)\right)^2}{2\sigma^2}v_i - v_i\log(-v_i) + v_i\right), & \text{if } e_i \ge 0 \\ \displaystyle\sum_{i=1}^{N}\left(-\dfrac{(1-\tau)\left(y_i - f(x_i)\right)^2}{2\sigma^2}v_i - v_i\log(-v_i) + v_i\right), & \text{if } e_i < 0 \end{cases}$   (25)
According to the half-quadratic optimization algorithm, the auxiliary variables $v^{t+1}$ can be obtained by solving Equation (25). Thus, we have:
$v_i^{t+1} = \begin{cases} -\exp\left(-\dfrac{\tau\left(y_i - f^t(x_i)\right)^2}{2\sigma^2}\right), & \text{if } e_i \ge 0 \\ -\exp\left(-\dfrac{(1-\tau)\left(y_i - f^t(x_i)\right)^2}{2\sigma^2}\right), & \text{if } e_i < 0 \end{cases}, \quad i = 1, 2, \ldots, N$   (26)
Secondly, the auxiliary variables $v^{t+1}$ are fixed, and the optimal solution of the original variable $\beta^{t+1}$ can be obtained by solving the following minimization problem:
$\min_{\beta^{t+1}} \; J(\beta^{t+1}) = \begin{cases} \displaystyle\sum_{i=1}^{N} -\dfrac{\tau v_i^{t+1}}{2\sigma^2} e_i^2 + \gamma\|\beta^{t+1}\|_1, & \text{if } e_i \ge 0 \\ \displaystyle\sum_{i=1}^{N} -\dfrac{(1-\tau) v_i^{t+1}}{2\sigma^2} e_i^2 + \gamma\|\beta^{t+1}\|_1, & \text{if } e_i < 0 \end{cases}$
$\text{s.t. } h(x_i)\beta^{t+1} = y_i - e_i, \quad i = 1, 2, \ldots, N$   (27)
Equation (27) is equivalent to
$\min_{\beta^{t+1}} \; J(\beta^{t+1}) = \begin{cases} \displaystyle\sum_{i=1}^{N} -\dfrac{\tau v_i^{t+1}}{2\sigma^2}\left(y_i - h(x_i)\beta^{t+1}\right)^2 + \gamma\|\beta^{t+1}\|_1, & \text{if } y_i \ge h(x_i)\beta^{t+1} \\ \displaystyle\sum_{i=1}^{N} -\dfrac{(1-\tau) v_i^{t+1}}{2\sigma^2}\left(y_i - h(x_i)\beta^{t+1}\right)^2 + \gamma\|\beta^{t+1}\|_1, & \text{if } y_i < h(x_i)\beta^{t+1} \end{cases}$   (28)
Since the L1 norm appears in the objective function, the proximal gradient descent (PGD) algorithm is applied to solve the optimization problem in Equation (28). The objective function $J(\beta^{t+1})$ can be written as
$J(\beta^{t+1}) = S(\beta^{t+1}) + \gamma\|\beta^{t+1}\|_1$   (29)
where
$S(\beta^{t+1}) = \begin{cases} \displaystyle\sum_{i=1}^{N} -\dfrac{\tau v_i^{t+1}}{2\sigma^2}\left(y_i - h(x_i)\beta^{t+1}\right)^2, & \text{if } y_i \ge h(x_i)\beta^{t+1} \\ \displaystyle\sum_{i=1}^{N} -\dfrac{(1-\tau) v_i^{t+1}}{2\sigma^2}\left(y_i - h(x_i)\beta^{t+1}\right)^2, & \text{if } y_i < h(x_i)\beta^{t+1} \end{cases}$   (30)
$S(\beta^{t+1})$ is differentiable, and its gradient is as follows:
$\nabla S(\beta^{t+1}) = \begin{cases} \displaystyle\sum_{i=1}^{N} \dfrac{\tau v_i^{t+1}}{\sigma^2} h^T(x_i)\left(y_i - h(x_i)\beta^{t+1}\right), & \text{if } y_i \ge h(x_i)\beta^{t+1} \\ \displaystyle\sum_{i=1}^{N} \dfrac{(1-\tau) v_i^{t+1}}{\sigma^2} h^T(x_i)\left(y_i - h(x_i)\beta^{t+1}\right), & \text{if } y_i < h(x_i)\beta^{t+1} \end{cases}$   (31)
Since $\nabla S(\beta^{t+1})$ satisfies the L-Lipschitz continuity condition, there is a constant $\eta > 0$ such that
$\left\|\nabla S(\beta) - \nabla S(\beta^{t+1})\right\|_2^2 \le \eta\left\|\beta - \beta^{t+1}\right\|_2^2, \quad \forall \beta, \beta^{t+1}$   (32)
The second-order Taylor expansion of the function $S(\beta)$ around $\beta^{t+1}$ can be expressed as
$\hat{S}(\beta; \beta^{t+1}) \simeq S(\beta^{t+1}) + \nabla S(\beta^{t+1})^T\left(\beta - \beta^{t+1}\right) + \dfrac{\eta}{2}\left\|\beta - \beta^{t+1}\right\|^2 = \dfrac{\eta}{2}\left\|\beta - \left(\beta^{t+1} - \dfrac{1}{\eta}\nabla S(\beta^{t+1})\right)\right\|_2^2 + \delta(\beta^{t+1})$   (33)
where $\delta(\beta^{t+1})$ is a constant that does not depend on $\beta$.
Introducing the L1-norm term $\gamma\|\beta\|_1$ into the objective function, the iterative equation of the proximal gradient descent can be expressed as
$\beta^{t+1} = \arg\min_{\beta}\left\{\dfrac{\eta}{2}\left\|\beta - \left(\beta^{t} - \dfrac{1}{\eta}\nabla S(\beta^{t})\right)\right\|_2^2 + \gamma\|\beta\|_1\right\}$   (34)
Let $z = \beta^{t} - \frac{1}{\eta}\nabla S(\beta^{t})$. Then, the closed-form solution of Equation (34) can be written as:
$\beta_i^{t+1} = \begin{cases} z_i - \gamma/\eta, & \text{if } z_i > \gamma/\eta \\ 0, & \text{if } |z_i| \le \gamma/\eta \\ z_i + \gamma/\eta, & \text{if } z_i < -\gamma/\eta \end{cases}, \quad i = 1, 2, \ldots, L$   (35)
where $\beta_i^{t+1}$ and $z_i$ represent the $i$-th components of $\beta^{t+1}$ and $z$, respectively. We use half-quadratic optimization to solve the proposed model; the pseudocode is presented in Algorithm 1.
Algorithm 1. Half-quadratic optimization for L1-ACELM
Input: the training dataset $T = \{(x_i, y_i)\}_{i=1}^{N}$, the number of hidden layer nodes $L$, the activation function $h(x)$, the regularization parameter $\gamma$, the maximum number of iterations $t_{\max}$, the window width $\sigma$, a small tolerance $\rho$, and the parameter $\tau$.
Output: the output weight vector $\beta$.
Step 1. Randomly generate the input weights $\alpha_i$ and hidden layer biases $b_i$ for the $L$ hidden nodes.
Step 2. Calculate the hidden-layer output matrix $H$.
Step 3. Compute $\beta$ by Equation (7).
Step 4. Let $\beta^0 = \beta$ and $\beta^1 = \beta$, and set $t = 1$.
Step 5. While $|J(\beta^t) - J(\beta^{t-1})| \ge \rho$ and $t < t_{\max}$ do
    calculate $v_i^{t+1}$ by Equation (26);
    update $\beta^{t+1}$ using Equation (35);
    compute $J(\beta^{t+1})$ by Equation (29);
    update $t := t + 1$.
End while
Step 6. Output the result $\beta = \beta^{t-1}$.
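The sketch below is our own Python reading of Algorithm 1 for a single-output target, not the authors' MATLAB code. It initializes beta with the RELM solution of Equation (7), updates the auxiliary variables by Equation (26), and applies the proximal gradient (soft-thresholding) update of Equations (34) and (35); the step size 1/eta is taken from a computed Lipschitz constant, and the number of proximal steps per outer iteration is a tunable assumption.

```python
import numpy as np

def sigmoid_hidden(X, alpha, b):
    return 1.0 / (1.0 + np.exp(-(X @ alpha + b)))

def soft_threshold(z, thr):
    """Closed-form soft-thresholding of Equation (35)."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def l1_acelm_fit(X, y, L=30, gamma=1e-3, sigma=1.0, tau=0.5,
                 t_max=100, rho=1e-6, inner_steps=1, seed=0):
    """Half-quadratic training of L1-ACELM (our reading of Algorithm 1); y is 1-D."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    alpha = rng.uniform(-1.0, 1.0, size=(m, L))        # Step 1: random input weights
    b = rng.uniform(-1.0, 1.0, size=(1, L))            # Step 1: random hidden biases
    H = sigmoid_hidden(X, alpha, b)                    # Step 2: hidden output matrix

    # Step 3: initialize beta with the RELM solution of Equation (7)
    if N >= L:
        beta = np.linalg.solve(H.T @ H + gamma * np.eye(L), H.T @ y)
    else:
        beta = H.T @ np.linalg.solve(H @ H.T + gamma * np.eye(N), y)

    J_prev = np.inf
    for _ in range(t_max):                             # Step 5: outer half-quadratic loop
        e = y - H @ beta
        w = np.where(e >= 0, tau, 1.0 - tau)
        v = -np.exp(-w * e ** 2 / (2.0 * sigma ** 2))  # auxiliary variables, Eq. (26)
        q = -w * v / sigma ** 2                        # positive per-sample weights

        # Lipschitz constant of the smooth part S(beta) gives the PGD step size 1/eta
        eta = np.linalg.eigvalsh(H.T @ (q[:, None] * H)).max() + 1e-12
        for _ in range(inner_steps):                   # Eqs. (34)-(35)
            grad = -H.T @ (q * (y - H @ beta))         # gradient of S, cf. Eq. (31)
            beta = soft_threshold(beta - grad / eta, gamma / eta)

        # convergence check on the objective of Equations (16)/(29)
        e = y - H @ beta
        w = np.where(e >= 0, tau, 1.0 - tau)
        J = np.sum(1.0 - np.exp(-w * e ** 2 / (2.0 * sigma ** 2))) + gamma * np.abs(beta).sum()
        if abs(J_prev - J) < rho:
            break
        J_prev = J
    return alpha, b, beta
```

For a multi-output target, the same update would be applied column-wise; that case is not covered by this sketch.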

3.4. Convergence Analysis

Proposition 1.
The sequence $\{J(\beta^t, v^t)\},\ t = 1, 2, \ldots$, generated by Algorithm 1 is convergent.
Proof. 
Let $\beta^t$ and $v^t$ be the optimal solutions of the objective function (23) after $t$ iterations. In the half-quadratic optimization problem, the conjugate function $g$ satisfies $Q(\beta_i, g(\beta_i)) + \varphi(g(\beta_i)) \le Q(\beta_i, v_i) + \varphi(v_i)$. When $\beta^t$ is fixed, we obtain the optimal solution $v^{t+1}$ of $v$ at the $(t+1)$-th iteration from Equation (26); then we have:
$J(\beta^t, v^{t+1}) \le J(\beta^t, v^t)$   (36)
Next, when $v^{t+1}$ is fixed, we optimize (28) to obtain the solution $\beta^{t+1}$ of $\beta$ at the $(t+1)$-th iteration. Then we have:
$J(\beta^{t+1}, v^{t+1}) \le J(\beta^t, v^{t+1})$   (37)
Combining inequality (36) with inequality (37), we have:
$J(\beta^{t+1}, v^{t+1}) \le J(\beta^t, v^{t+1}) \le J(\beta^t, v^t)$   (38)
Hence, the objective $J(\beta, v)$ is non-increasing and bounded below, so the sequence $\{J(\beta^t, v^t)\},\ t = 1, 2, \ldots$, is convergent. □

4. Experiments

4.1. Experimental Setup

To evaluate the performance of the proposed L1-ACELM algorithm, we performed numerical simulations on two artificial datasets and ten standard benchmark datasets, comparing it with traditional algorithms including the extreme learning machine (ELM), regularized ELM (RELM), and C-loss-based ELM (CELM). All experiments were implemented in MATLAB 2016a on a PC with an Intel(R) Core(TM) i5-7200U processor (2.70 GHz) and 4 GB of RAM.
To evaluate the prediction performance of the L1-ACELM algorithm, the regression evaluation metrics are defined as follows:
(1) The root mean square error (RMSE):
$RMSE = \sqrt{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$   (39)
(2) The mean absolute error (MAE):
$MAE = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$   (40)
(3) The ratio of the sum of squared errors (SSE) to the sum of squared deviations of the samples (SST), SSE/SST:
$SSE/SST = \dfrac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$   (41)
(4) The ratio of the interpretable sum of squared deviations (SSR) to SST, SSR/SST:
$SSR/SST = \dfrac{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$   (42)
where $N$ is the number of samples, and $y_i$ and $\hat{y}_i$ denote the target values and the corresponding predicted values, respectively. $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ is the average of $y_1, y_2, \ldots, y_N$. The sigmoid function is chosen as the activation function for ELM, RELM, CELM, and L1-ACELM, and can be expressed as:
$h(x) = \dfrac{1}{1 + \exp\left(-(a_i^T x + b_i)\right)}$   (43)
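For reference, the four evaluation metrics of Equations (39)-(42) and the activation of Equation (43) translate directly into NumPy; the function names below are our own.

```python
import numpy as np

def regression_metrics(y, y_hat):
    """RMSE, MAE, SSE/SST and SSR/SST of Equations (39)-(42)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_bar = y.mean()
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    mae = np.mean(np.abs(y - y_hat))
    sst = np.sum((y - y_bar) ** 2)
    sse_sst = np.sum((y_hat - y) ** 2) / sst
    ssr_sst = np.sum((y_hat - y_bar) ** 2) / sst
    return rmse, mae, sse_sst, ssr_sst

def sigmoid_activation(x, a_i, b_i):
    """Hidden-node activation of Equation (43)."""
    return 1.0 / (1.0 + np.exp(-(a_i @ x + b_i)))
```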
Since both the baseline algorithms and the proposed algorithm involve several parameters, ten-fold cross-validation is used to determine the optimal parameters to ensure the best performance. In ELM and RELM, the number of hidden layer nodes is fixed at L = 30. For RELM, CELM, and L1-ACELM, the optimal value of the regularization parameter γ is selected from the set {2^−50, 2^−49, …, 2^49, 2^50}. For CELM and L1-ACELM, the window width σ is selected from the set {2^−2, 2^−1, 2^0, 2^1, 2^2}. For L1-ACELM, the parameter τ is selected from the set {0.1, 0.2, …, 0.9}.
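A hedged sketch of the ten-fold cross-validated grid search described above is given below. The grids follow the ranges stated in the text; `train_and_rmse` is a placeholder callable (an assumption of ours) that fits one of the compared models on the training fold and returns its validation RMSE, and in practice a coarser sweep over the very large γ grid may be preferred.

```python
import itertools
import numpy as np

gammas = [2.0 ** p for p in range(-50, 51)]         # {2^-50, ..., 2^50}
sigmas = [2.0 ** p for p in range(-2, 3)]           # {2^-2, ..., 2^2}
taus = [round(0.1 * k, 1) for k in range(1, 10)]    # {0.1, ..., 0.9}

def ten_fold_score(X, y, params, train_and_rmse, k=10, seed=0):
    """Average validation RMSE of one parameter setting over k folds (X, y are arrays)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_rmse(X[trn], y[trn], X[val], y[val], params))
    return float(np.mean(scores))

def grid_search(X, y, train_and_rmse):
    """Return the (gamma, sigma, tau) triple with the lowest cross-validated RMSE."""
    grid = itertools.product(gammas, sigmas, taus)
    return min(grid, key=lambda p: ten_fold_score(X, y, p, train_and_rmse))
```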

4.2. Performance on Artificial Datasets

To verify the robustness of the proposed L1-ACELM, two artificial datasets were generated with six different types of noise, each consisting of 2000 data points. Table 1 shows the specific forms of the two artificial datasets and the different types of noise. $\lambda_i \sim N(0, s^2)$ indicates that $\lambda_i$ follows a normal distribution with mean zero and variance $s^2$, $\lambda_i \sim U[a, b]$ means that $\lambda_i$ follows a uniform distribution on the interval $[a, b]$, and $\lambda_i \sim T(c)$ indicates that $\lambda_i$ follows a t-distribution with $c$ degrees of freedom.
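The two artificial datasets of Table 1 can be generated as follows (our own sketch; the noise settings follow Table 1, and np.sinc is the normalized sinc, so np.sinc(0.3 * x) equals sinc(0.3πx) in the paper's notation).

```python
import numpy as np

def make_artificial(name="sinc", noise="A", n=2000, seed=0):
    """Generate the sinc or self-defining dataset of Table 1 with noise type A-F."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, n)
    if name == "sinc":
        y = np.sin(2.0 * x) / (2.0 * x)         # sinc(2x); x = 0 occurs with probability zero
    else:                                        # self-defining function
        y = np.exp(-x ** 2) * np.sinc(0.3 * x)   # np.sinc(t) = sin(pi t) / (pi t)
    noise_gen = {
        "A": lambda: rng.normal(0.0, 0.15, n),
        "B": lambda: rng.normal(0.0, 0.5, n),
        "C": lambda: rng.uniform(-0.15, 0.15, n),
        "D": lambda: rng.uniform(-0.5, 0.5, n),
        "E": lambda: rng.standard_t(5, n),
        "F": lambda: rng.standard_t(10, n),
    }
    return x.reshape(-1, 1), y + noise_gen[noise]()
```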
Figure 2 shows different types of noise graphs, the graphs of the sinc function, and the graphs of the sinc function with different noises.
Figure 3 shows different types of noise graphs, the graphs of the self-defining function, and the graphs of the self-defining function with different noises.
In our experiments, we randomly selected 1600 samples as the training dataset and the remaining 400 samples as the testing dataset. To evaluate the effectiveness of the proposed algorithm, we compared its performance to that of ELM, RELM, and CELM. Table 2 shows the optimal RMSE, MAE, SSE/SST, and SSR/SST of the four algorithms that were obtained based on the optimal parameters selected using the ten-fold cross-validation method. Table 2 also lists the optimal parameters for each algorithm. The regression fitting results of ELM, RELM, CELM, and L1-ACELM on two artificial datasets with noise are shown in Figure 4 and Figure 5.
Figure 4 and Figure 5 demonstrate the fitting effect of the four algorithms on the two artificial datasets. Based on these figures, it is observed that the fitting curve of L1-ACELM is the closest to the real function curve compared to the other three algorithms. In Table 2, the best test results are shown in bold.
The data in Table 2 demonstrate that L1-ACELM performs better than the other three algorithms in most cases for the two artificial datasets with different noises. It is evident that L1-ACELM has smaller RMSE, MAE, and SSE/SST values and larger SSR/SST values, which indicates that L1-ACELM is more robust to noise. For example, for the sinc function, the performance of the proposed algorithm is superior to that of the other algorithms for all noise types except type F. Moreover, L1-ACELM has better generalization performance in the case of unbalanced noisy data. In conclusion, L1-ACELM is more stable in a noisy environment.

4.3. Performance on Benchmark Datasets

To further test the robustness of L1-ACELM, experiments were performed on ten UCI datasets [30] with different levels of noise: noise-free datasets, datasets with 5% noise, and datasets with 10% noise. Noise was added only to the target output values of the training datasets. A dataset with 5% noise means that the noisy samples make up 5% of the training dataset. The noise values are randomly drawn from the interval $[0, d]$, where $d$ is the average of the target output values of the training dataset.
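One possible implementation of this noise-injection protocol is sketched below. It reflects our reading of the description: values drawn uniformly from [0, d] are added to a randomly chosen fraction of the training targets, with d the mean training target.

```python
import numpy as np

def add_target_noise(y_train, fraction=0.05, seed=0):
    """Perturb a fraction of the training targets with values drawn from [0, d]."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y_train, dtype=float).copy()
    d = y_noisy.mean()
    lo, hi = sorted((0.0, float(d)))                  # handles a negative mean as well
    n_noisy = int(round(fraction * len(y_noisy)))
    idx = rng.choice(len(y_noisy), size=n_noisy, replace=False)
    y_noisy[idx] += rng.uniform(lo, hi, size=n_noisy)
    return y_noisy
```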
In the experiment, we randomly selected 80% of the data as the training dataset and the remaining 20% as the testing dataset for each benchmark dataset. The specific description is shown in Table 3.
To better reflect the performance of the proposed L1-ACELM algorithm, its RMSE, MAE, SSE/SST, and SSR/SST values were compared with those of ELM, RELM, and CELM. The evaluation indicators and the rank of each algorithm in the different noise environments are listed in Table 4, Table 5 and Table 6, and the best test results are shown in bold. From Table 4 to Table 6, it is observed that the performance of each algorithm decreases as the noise level increases. However, compared to the other algorithms, the performance of L1-ACELM is still the best in most cases. From Table 4, it can be concluded that L1-ACELM performs best on nine of the ten datasets in terms of the RMSE and SSR/SST values. Similarly, for the MAE and SSE/SST values, L1-ACELM exhibits the best performance on all the datasets. Table 5 shows that after adding 5% noise the performance of each algorithm decreases, and according to the RMSE values, the proposed algorithm performs best on eight of the ten datasets; for the MAE, SSE/SST, and SSR/SST values, L1-ACELM performs best on nine datasets. Under the 10% noise environment in Table 6, L1-ACELM exhibits superior performance in nine cases for the RMSE, MAE, and SSR/SST values, and it performs better on all ten datasets for the SSE/SST values.
To further illustrate the difference between the proposed algorithm and traditional algorithms, we conducted statistical analysis on the experimental results. Friedman’s test [31] is a well-known test for comparing the performance of various algorithms on datasets. Table 7, Table 8 and Table 9 list the average ranks of four algorithms on four performance measures under a noise-free environment and noisy environment.
The Friedman statistic variable can be expressed as follows:
$\chi_F^2 = \dfrac{12N}{k(k+1)}\left[\displaystyle\sum_{j} R_j^2 - \dfrac{k(k+1)^2}{4}\right]$   (44)
which is distributed according to $\chi_F^2$ with $k - 1$ degrees of freedom, where $R_j$ is the average rank of the $j$-th algorithm, as listed in Table 7, Table 8 and Table 9, and $N = 10$ and $k = 4$ are the numbers of datasets and algorithms, respectively. The Friedman statistic follows an F-distribution:
$F_F = \dfrac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}$   (45)
with $k - 1$ and $(k-1)(N-1)$ degrees of freedom. Table 10 shows the results of the Friedman test on the datasets without noise, with 5% noise, and with 10% noise. For $\alpha = 0.05$, the critical value of $F_\alpha(3, 27)$ is 2.960. Comparing the results in Table 10 shows that $F_F > F_\alpha$ for all four performance measures and the four algorithms ELM, RELM, CELM, and L1-ACELM. Therefore, the hypothesis that all the algorithms perform equally is rejected. To further contrast the differences between pairs of algorithms, the Nemenyi test [32] is used as a post hoc test.
The critical difference can be expressed as:
$CD = q_\alpha\sqrt{\dfrac{k(k+1)}{6N}} = 2.569 \times \sqrt{\dfrac{4 \times (4+1)}{6 \times 10}} = 1.4832$   (46)
where the critical value $q_{0.05}$ is 2.569. We can compare the average rank difference between the proposed algorithm and the other algorithms with the CD value: if the difference is greater than the CD value, the proposed algorithm is significantly better than the other algorithm; otherwise, there is no significant difference between the two algorithms. Therefore, we can analyze the difference between the proposed algorithm and the other algorithms in the following three cases:
(1) Noise-free environment: for the RMSE and SSR/SST indices, the performance of L1-ACELM is better than that of ELM ($4 - 1.1 = 2.9 > 1.4832$). For the MAE index, the performance of L1-ACELM is better than that of ELM ($4 - 1.0 = 3.0 > 1.4832$) and RELM ($2.6 - 1.0 = 1.6 > 1.4832$). There is no significant difference between L1-ACELM and CELM.
(2) 5% noise environment: for the RMSE index, the performance of L1-ACELM is better than that of ELM ($3.7 - 1.0 = 2.7 > 1.4832$), RELM ($2.6 - 1.0 = 1.6 > 1.4832$), and CELM ($2.5 - 1.0 = 1.5 > 1.4832$). For the MAE and SSE/SST indices, the performance of L1-ACELM is better than that of ELM ($3.7 - 1.1 = 2.6 > 1.4832$, $3.8 - 1.1 = 2.7 > 1.4832$) and RELM ($2.7 - 1.1 = 1.6 > 1.4832$, $2.8 - 1.1 = 1.7 > 1.4832$). For the SSR/SST index, the performance of L1-ACELM is better than that of ELM ($3.7 - 1.15 = 2.55 > 1.4832$) and CELM ($2.65 - 1.15 = 1.5 > 1.4832$).
(3) 10% noise environment: similarly, for the RMSE, MAE, and SSE/SST indices, the performance of L1-ACELM is better than that of ELM, RELM, and CELM. For the SSR/SST index, the performance of L1-ACELM is better than that of ELM and RELM.
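The statistics used in this comparison, namely the Friedman chi-square of Equation (44), its F-distributed form of Equation (45), and the Nemenyi critical difference of Equation (46), can be computed as in the sketch below (our own helper; q_alpha = 2.569 is the tabulated Nemenyi value for four algorithms at alpha = 0.05).

```python
import numpy as np

def friedman_and_cd(avg_ranks, n_datasets, q_alpha=2.569):
    """Friedman chi-square (Eq. (44)), F statistic (Eq. (45)), and Nemenyi CD (Eq. (46))."""
    R = np.asarray(avg_ranks, dtype=float)
    k, N = len(R), n_datasets
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    f_stat = (N - 1) * chi2 / (N * (k - 1) - chi2)
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))
    return chi2, f_stat, cd

# Example: noise-free RMSE ranks of Table 7 (ELM, RELM, CELM, L1-ACELM)
print(friedman_and_cd([4.0, 2.5, 2.4, 1.1], n_datasets=10))
```

With the noise-free RMSE ranks of Table 7, the call reproduces the corresponding entries of Table 10: χ²_F = 25.32, F_F = 48.69, and CD = 1.4832.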

5. Conclusions

In this paper, a novel asymmetric, bounded, smooth, non-convex loss function based on the expectile loss and the correntropy loss, termed the AC-loss, is proposed. The AC-loss function and the L1 norm are introduced into the regularized extreme learning machine, and an improved robust regularized extreme learning machine, L1-ACELM, is proposed for regression. Owing to the non-convexity of the AC-loss function, L1-ACELM is difficult to solve directly, so the half-quadratic optimization algorithm is applied to address the non-convex optimization problem. To demonstrate the effectiveness of L1-ACELM, experiments are conducted on artificial and benchmark datasets with different types of noise. The results demonstrate the significant advantages of L1-ACELM in generalization performance and robustness, especially when the data distribution is asymmetric and contaminated by noise and outliers.
In this paper, the PGD algorithm is used to solve L1-ACELM. Since it is an iterative process, the training speed is reduced. In the future, we will investigate faster methods for solving this optimization problem.

Author Contributions

Conceptualization, Q.W. and F.W.; methodology, Q.W.; software, F.W.; validation, F.W., Y.A. and K.L.; writing—original draft preparation, F.W.; writing—review and editing, Q.W.; visualization, Y.A.; funding acquisition, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant (51875457), the Key Research Project of Shaanxi Province (2022GY-050, 2022GY-028), the Natural Science Foundation of Shaanxi Province of China (2022JQ-636, 2021JQ-701, 2021JQ-714), and Shaanxi Youth Talent Lifting Plan of Shaanxi Association for Science and Technology (20220129).

Data Availability Statement

The data presented in the article are freely available and are listed at the reference address in the bibliography.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ding, S.; Su, C.; Yu, J. An optimizing BP neural network algorithm based on genetic algorithm. Artif. Intell. Rev. 2011, 36, 153–162.
2. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Budapest, Hungary, 25–29 July 2004; pp. 985–990.
3. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
4. Silva, B.L.; Inaba, F.K.; Evandro, O.T.; Ciarelli, P.M. Outlier robust extreme machine learning for multi-target regression. Expert Syst. Appl. 2020, 140, 112877.
5. Li, Y.; Wang, Y.; Chen, Z.; Zou, R. Bayesian robust multi-extreme learning machine. Knowl.-Based Syst. 2020, 210, 106468.
6. Liu, X.; Ge, Q.; Chen, X.; Li, J.; Chen, Y. Extreme learning machine for multivariate reservoir characterization. J. Pet. Sci. Eng. 2021, 205, 108869.
7. Catoni, O. Challenging the empirical mean and empirical variance: A deviation study. Annales de l’IHP Probabilités et Statistiques 2012, 48, 1148–1185.
8. Deng, W.; Zheng, Q.; Chen, L. Regularized extreme learning machine. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; pp. 389–395.
9. Rong, H.J.; Ong, Y.S.; Tan, A.H.; Zhu, Z. A fast pruned-extreme learning machine for classification problem. Neurocomputing 2008, 72, 359–366.
10. Miche, Y.; Sorjamaa, A.; Bas, P.; Simula, O.; Jutten, C.; Lendasse, A. OP-ELM: Optimally pruned extreme learning machine. IEEE Trans. Neural Netw. 2009, 21, 158–162.
11. Ye, Q.; Yang, J.; Liu, F.; Zhao, C.; Ye, N.; Yin, T. L1-norm distance linear discriminant analysis based on an effective iterative algorithm. IEEE Trans. Circuits Syst. Video Technol. 2016, 28, 114–129.
12. Li, C.N.; Shao, Y.H.; Deng, N.Y. Robust L1-norm non-parallel proximal support vector machine. Optimization 2016, 65, 169–183.
13. Balasundaram, S.; Gupta, D. 1-Norm extreme learning machine for regression and multiclass classification using Newton method. Neurocomputing 2014, 128, 4–14.
14. Dong, H.; Yang, L. Kernel-based regression via a novel robust loss function and iteratively reweighted least squares. Knowl. Inf. Syst. 2021, 63, 1149–1172.
15. Dong, H.; Yang, L. Training robust support vector regression machines for more general noise. J. Intell. Fuzzy Syst. 2020, 39, 2881–2892.
16. Farooq, M.; Steinwart, I. An SVM-like approach for expectile regression. Comput. Stat. Data Anal. 2017, 109, 159–181.
17. Razzak, I.; Zafar, K.; Imran, M.; Xu, G. Randomized nonlinear one-class support vector machines with bounded loss function to detect of outliers for large scale IoT data. Future Gener. Comput. Syst. 2020, 112, 715–723.
18. Gupta, D.; Hazarika, B.B.; Berlin, M. Robust regularized extreme learning machine with asymmetric Huber loss function. Neural Comput. Appl. 2020, 32, 12971–12998.
19. Ren, Z.; Yang, L. Correntropy-based robust extreme learning machine for classification. Neurocomputing 2018, 313, 74–84.
20. Ma, Y.; Zhang, Q.; Li, D.; Tian, Y. LINEX support vector machine for large-scale classification. IEEE Access 2019, 7, 70319–70331.
21. Singh, A.; Pokharel, R.; Principe, J. The C-loss function for pattern classification. Pattern Recognit. 2014, 47, 441–453.
22. Zhou, R.; Liu, X.; Yu, M.; Huang, K. Properties of risk measures of generalized entropy in portfolio selection. Entropy 2017, 19, 657.
23. Ren, L.R.; Gao, Y.L.; Liu, J.X.; Shang, J.; Zheng, C.H. Correntropy induced loss based sparse robust graph regularized extreme learning machine for cancer classification. BMC Bioinform. 2020, 21, 1–22.
24. Zhao, Y.P.; Tan, J.F.; Wang, J.J.; Yang, Z. C-loss based extreme learning machine for estimating power of small-scale turbojet engine. Aerosp. Sci. Technol. 2019, 89, 407–419.
25. He, Y.; Wang, F.; Li, Y.; Qin, J.; Chen, B. Robust matrix completion via maximum correntropy criterion and half-quadratic optimization. IEEE Trans. Signal Process. 2019, 68, 181–195.
26. Ren, Z.; Yang, L. Robust extreme learning machines with different loss functions. Neural Process. Lett. 2019, 49, 1543–1565.
27. Chen, L.; Paul, H.; Qu, H.; Zhao, J.; Sun, X. Correntropy-based robust multilayer extreme learning machines. Pattern Recognit. 2018, 84, 357–370.
28. Huang, G.; Huang, G.B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48.
29. Robini, M.C.; Yang, F.; Zhu, Y. Inexact half-quadratic optimization for linear inverse problems. SIAM J. Imaging Sci. 2018, 11, 1078–1133.
30. Blake, C.L.; Merz, C.J. UCI Repository for Machine Learning Databases; Department of Information and Computer Sciences, University of California: Irvine, CA, USA, 1998. Available online: http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed on 15 June 2022).
31. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
32. Benavoli, A.; Corani, G.; Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 2016, 17, 152–161.
Figure 1. Asymmetric C-loss function.
Figure 2. Graphs of the sinc function with different noises.
Figure 3. Graphs of the self-defining function with different noises.
Figure 4. Fitting results of the sinc function with different noises.
Figure 5. Fitting results of the self-defining function with different noises.
Table 1. Artificial datasets with different types of noise.

Artificial dataset | Function definition
Sinc function | $y_i = \operatorname{sinc}(2x_i) = \dfrac{\sin(2x_i)}{2x_i} + \lambda_i$
Self-defining function | $y_i = e^{-x_i^2}\operatorname{sinc}(0.3\pi x_i) + \lambda_i$

Types of noise ($x \in [-3, 3]$ in all cases):
Type A: $\lambda_i \sim N(0, 0.15^2)$
Type B: $\lambda_i \sim N(0, 0.5^2)$
Type C: $\lambda_i \sim U[-0.15, 0.15]$
Type D: $\lambda_i \sim U[-0.5, 0.5]$
Type E: $\lambda_i \sim T(5)$
Type F: $\lambda_i \sim T(10)$
Table 2. Experiment results on artificial datasets with different types of noise.

Dataset | Noise | Algorithm | (γ, σ, τ) | RMSE | MAE | SSE/SST | SSR/SST
Sinc functionType AELM
RELM
CELM
L1−ACELM
(/, /, /)
(220, /, /)
(210, 2−2, /)
(2−23, 2−2, 0.7)
0.2429
0.2341
0.2345
0.2109
0.1957
0.1942
0.1949
0.1690
0.6206
0.5768
0.5785
0.4680
0.3808
0.4263
0.4256
0.5359
Type BELM
RELM
CELM
L1−ACELM
(/, /, /)
(22, /, /)
(2−19, 2−2, /)
(25, 2−2, 0.3)
0.5288
0.5270
0.5286
0.5221
0.4199
0.4186
0.4199
0.4143
0.9064
0.9004
0.9060
0.8838
0.0988
0.1004
0.0991
0.1246
Type CELM
RELM
CELM
L1−ACELM
(/, /, /)
(2−42, /, /)
(210, 2−2, /)
(239, 2−2, 0.7)
0.1923
0.2019
0.1922
0.1595
0.1581
0.1677
0.1582
0.1309
0.4332
0.4776
0.4325
0.2978
0.5701
0.5233
0.5705
0.7023
Type DELM
RELM
CELM
L1−ACELM
(/, /, /)
(212, /, /)
(2−38, 2−2, /)
(2−4, 2−2, 0.3)
0.3262
0.3246
0.3223
0.3199
0.2715
0.2709
0.2695
0.2678
0.6963
0.6890
0.6828
0.6706
0.7633
0.7578
0.7664
0.8571
Type EELM
RELM
CELM
L1−ACELM
(/, /, /)
(212, /, /)
(2−12, 2−2, /)
(2−12, 2−2, 0.2)
0.1737
0.1766
0.1725
0.1349
0.1406
0.1441
0.1398
0.1175
0.2369
0.2451
0.2338
0.1431
0.7633
0.7578
0.7664
0.8571
Type FELM
RELM
CELM
L1−ACELM
(/, /, /)
(2−2, /, /)
(2−1, 2−2, /)
(2−3, 2−2, 0.1)
0.1885
0.1746
0.1757
0.1753
0.1422
0.1412
0.1413
0.1416
0.2715
0.2328
0.2359
0.2346
0.7298
0.7681
0.7651
0.7663
Type AELM
RELM
CELM
L1−ACELM
(/, /, /)
(2−8, /, /)
(2−7, 2−2, /)
(2−10, 2−2, 0.5)
0.1572
0.1569
0.1565
0.1560
0.1304
0.1301
0.1294
0.1241
0.0908
0.0893
0.0888
0.0800
0.9105
0.9120
0.9127
0.9211
Type BELM
RELM
CELM
L1−ACELM
(/, /, /)
(226, /, /)
(215, 2−2, /)
(2−16, 2−2, 0.2)
0.4905
0.4862
0.4858
0.4849
0.3843
0.3850
0.3838
0.3795
0.4761
0.4766
0.4759
0.4641
0.5251
0.5249
0.5252
0.5369
Type CELM
RELM
CELM
L1−ACELM
(/, /, /)
(225, /, /)
(217, 2−2, /)
(237, 2−2, 0.2)
0.0937
0.0950
0.0936
0.0934
0.0794
0.0803
0.0792
0.0791
0.0288
0.0296
0.0287
0.0286
0.9714
0.9706
0.9715
0.9716
Type DELM
RELM
CELM
L1−ACELM
(/, /, /)
(215, /, /)
(2−34, 2−2, /)
(222, 2−2, 0.7)
0.3009
0.3006
0.2948
0.2929
0.2622
0.2614
0.2555
0.2534
0.2471
0.2466
0.2373
0.2342
0.7534
0.7539
0.7634
0.7665
Type EELM
RELM
CELM
L1−ACELM
(/, /, /)
(2−26, /, /)
(22, 2−2, /)
(244, 2−2, 0.4)
0.0434
0.0426
0.0425
0.0415
0.0372
0.0367
0.0363
0.0335
0.0074
0.0071
0.0071
0.0068
0.9929
0.9932
0.9932
0.9935
Self−defining functionType FELM
RELM
CELM
L1−ACELM
(/, /, /)
(25, /, /)
(212, 2−2, /)
(220, 2−2, 0.3)
0.0498
0.0761
0.0481
0.0513
0.0425
0.0586
0.0408
0.0372
0.0098
0.0230
0.0092
0.0104
0.9912
0.9779
0.9920
0.9908
Table 3. Description of benchmark datasets.

Dataset | Number of training data | Number of testing data | Number of features
Boston Housing | 404 | 102 | 13
Air Quality | 7485 | 1872 | 12
AutoMPG | 313 | 79 | 7
Triazines | 148 | 38 | 60
Bodyfat | 201 | 51 | 14
Pyrim | 59 | 15 | 27
Servo | 133 | 34 | 4
Bike Sharing | 584 | 147 | 13
Balloon | 1600 | 401 | 1
NO2 | 400 | 100 | 7
Table 4. Performance of different algorithms under noise-free environment.

Dataset | Algorithm | (γ, σ, τ) | RMSE | MAE | SSE/SST | SSR/SST
Boston HousingELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−16, /, /)
(2−31, 2−2, /)
(2−24, 2−2, 0.4)
4.4449(4)
4.1636(3)
4.1511(2)
4.0435(1)
3.1736(4)
2.9660(2)
2.9847(3)
2.9236(1)
0.2438(4)
0.2068(3)
0.2067(2)
0.1965(1)
0.7682(4)
0.7998(3)
0.8002(2)
0.8097(1)
Air QualityELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−32, /, /)
(2−37, 2−2, /)
(2−36, 2−2, 0.4)
8.3167(4)
7.4516(1)
7.5140(3)
7.4574(2)
6.5439(4)
5.7812(3)
5.7604(2)
5.7383(1)
0.0297(4)
0.0215(2.5)
0.0215(2.5)
0.0212(1)
0.9705(4)
0.9786(2)
0.9785(3)
0.9788(1)
AutoMPGELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−57, /, /)
(2−43, 2−2, /)
(2−32, 2−2, 0.5)
2.8296(4)
2.6859(3)
2.6590(2)
2.5914(1)
2.0956(4)
1.9632(3)
1.9582(2)
1.8949(1)
0.1352(4)
0.1205(3)
0.1202(2)
0.1143(1)
0.8710(4)
0.8845(2)
0.8840(3)
0.8907(1)
TriazinesELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−49, /, /)
(2−19, 2−2, /)
(2−31, 2−2, 0.5)
0.0664(4)
0.0557(3)
0.0529(2)
0.0490(1)
0.0465(4)
0.0410(3)
0.0393(2)
0.0365(1)
0.0816(4)
0.0545(3)
0.0526(2)
0.0416(1)
0.9283(4)
0.9547(3)
0.9573(2)
0.9645(1)
BodyfatELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−10, /, /)
(2−6, 2−2, /)
(2−16, 2−2, 0.1)
1.3123(4)
1.1374(3)
1.1352(2)
1.0036(1)
0.7449(4)
0.6904(3)
0.6858(2)
0.5936(1)
0.0298(4)
0.0233(2)
0.0234(3)
0.0189(1)
0.9732(4)
0.9794(2)
0.9787(3)
0.9820(1)
PyrimELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−1, /, /)
(2−20, 2−2, /)
(2−10, 2−2, 0.1)
0.1085(4)
0.0759(2)
0.0800(3)
0.0728(1)
0.0688(4)
0.0548(2)
0.0552(3)
0.0502(1)
0.6897(4)
0.3535(2)
0.3839(3)
0.2956(1)
0.6143(4)
0.8034(2)
0.7718(3)
0.8284(1)
ServoELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−40, /, /)
(2−41, 2−2, /)
(2−46, 2−2, 0.4)
0.7367(4)
0.6769(3)
0.6733(2)
0.6593(1)
0.5220(4)
0.4750(3)
0.4730(2)
0.4491(1)
0.2826(4)
0.2075(3)
0.2061(2)
0.1917(1)
0.7874(4)
0.8148(3)
0.8214(2)
0.8270(1)
Bike SharingELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−10)
(2−16, 2−2, /)
(2−9, 2−2, 0.2)
287.615(4)
236.107(2)
241.917(3)
217.385(1)
206.507(4)
178.976(2)
180.856(3)
160.747(1)
0.0230(4)
0.0157(2)
0.0161(3)
0.0130(1)
0.9773(4)
0.9851(2)
0.9844(3)
0.9873(1)
BalloonELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−29, /, /)
(2−25, 2−2, /)
(2−24, 2−2, 0.9)
0.0850(4)
0.0796(3)
0.0782(2)
0.0773(1)
0.0543(4)
0.0528(3)
0.0527(2)
0.0525(1)
0.3452(4)
0.2991(3)
0.2806(2)
0.2790(1)
0.7026(4)
0.7147(3)
0.7335(1)
0.7304(2)
NO2ELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−9, /, /)
(2−15, 2−2, /)
(2−17, 2−2, 0.2)
0.5272(4)
0.5154(2)
0.5161(3)
0.5132(1)
0.4128(4)
0.4034(2)
0.4047(3)
0.4028(1)
0.5157(4)
0.4844(2)
0.4910(3)
0.4823(1)
0.5060(4)
0.5298(2)
0.5271(3)
0.5338(1)
Table 5. Performance of different algorithms under 5% noise environment.

Dataset | Algorithm | (γ, σ, τ) | RMSE | MAE | SSE/SST | SSR/SST
Boston HousingELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−17, /, /)
(2−6, 2−2, /)
(2−5, 2−2, 0.5)
6.5817(4)
6.2972(3)
6.2155(2)
6.1256(1)
4.1292(4)
3.9095(3)
3.8937(2)
3.8185(1)
0.4196(4)
0.3835(3)
0.3756(2)
0.3675(1)
0.5962(4)
0.6327(3)
0.6407(2)
0.6478(1)
Air QualityELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−32, /, /)
(2−39, 2−2, /)
(2−39, 2−2, 0.8)
12.0381(4)
11.6199(2)
11.6303(3)
11.5540(1)
7.5222(4)
7.1866(3)
7.1554(2)
7.1145(1)
0.0531(4)
0.0496(2)
0.0499(3)
0.0489(1)
0.9471(4)
0.9504(2)
0.9501(3)
0.9511(1)
AutoMPGELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−21, /, /)
(2−28, 2−2, /)
(2−30, 2−2, 0.9)
5.6949(4)
5.5923(2)
5.6502(3)
5.4775(1)
3.2315(4)
3.1677(3)
3.1189(2)
3.0347(1)
0.4024(4)
0.3919(3)
0.3915(2)
0.3688(1)
0.6204(4)
0.6337(2)
0.6299(3)
0.6558(1)
TriazinesELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−16, /, /)
(2−39, 2−2, /)
(2−22, 2−2, 0.5)
0.0937(4)
0.0790(3)
0.0779(2)
0.0725(1)
0.0618(4)
0.0549(3)
0.0515(2)
0.0489(1)
0.1510(4)
0.1031(3)
0.0989(2)
0.0834(1)
0.8719(4)
0.9199(3)
0.9172(2)
0.9273(1)
BodyfatELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−16, /, /)
(2−36, 2−2, /)
(2−11, 2−2, 0.6)
4.1325(4)
3.9255(3)
3.8868(2)
3.7288(1)
2.0890(4)
2.0575(3)
2.0413(2)
1.9119(1)
0.2414(4)
0.2115(3)
0.2095(2)
0.1986(1)
0.7783(4)
0.8027(2)
0.8078(3)
0.8149(1)
PyrimELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−12, /, /)
(2−3, 2−2, /)
(2−13, 2−2, 0.8)
0.1019(4)
0.0825(2)
0.0871(3)
0.0743(1)
0.0722(4)
0.0591(2)
0.0609(3)
0.0562(1)
0.6711(4)
0.4008(2)
0.4435(3)
0.3720(1)
0.6685(4)
0.7537(2)
0.7153(3)
0.7762(1)
ServoELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−46, /, /)
(2−42, 2−2, /)
(2−49, 2−2, 0.7)
0.8424(4)
0.7753(3)
0.7598(1)
0.7724(2)
0.5868(4)
0.5473(3)
0.5252(1)
0.5299(2)
0.3224(4)
0.2794(3)
0.2763(1)
0.2983(2)
0.7235(4)
0.7742(3)
0.7752(2)
0.7778(1)
Bike SharingELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−1, /, /)
(2−9, 2−2, /)
(2−6, 2−2, 0.9)
1130.04(4)
1093.85(2)
1094.35(3)
1085.27(1)
497.051(4)
453.720(2)
461.094(3)
441.646(1)
0.2730(4)
0.2556(3)
0.2545(2)
0.2526(1)
0.7352(4)
0.7505(3)
0.7523(1.5)
0.7523(1.5)
BalloonELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−16, /, /)
(2−9, 2−2, /)
(2−5, 2−2, 0.9)
0.0874(4)
0.0850(3)
0.0799(2)
0.0782(1)
0.0546(3)
0.0544(2)
0.0549(4)
0.0536(1)
0.3815(4)
0.3444(3)
0.3086(2)
0.2704(1)
0.6794(4)
0.7170(2)
0.7135(3)
0.7368(1)
NO2ELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−31, /, /)
(2−19, 2−2, /)
(2−19, 2−2, 0.5)
0.9489(1)
0.9698(3)
0.9737(4)
0.9611(2)
0.5767(2)
0.5781(3)
0.5856(4)
0.5708(1)
0.7594(2)
0.7754(3)
0.7844(4)
0.7515(1)
0.2803(1)
0.2692(3)
0.2644(4)
0.2790(2)
Table 6. Performance of different algorithms under 10% noise environment.

Dataset | Algorithm | (γ, σ, τ) | RMSE | MAE | SSE/SST | SSR/SST
Boston HousingELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−30, /, /)
(2−36, 2−2, /)
(2−48, 2−2, 0.9)
8.6315(4)
8.2456(3)
8.2437(2)
8.1718(1)
5.1524(4)
5.1512(3)
4.9250(2)
4.8090(1)
0.5873(4)
0.5177(3)
0.5151(2)
0.5123(1)
0.4557(4)
0.4999(3)
0.5006(2)
0.5074(1)
Air QualityELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−39, /, /)
(2−45, 2−2, /)
(2−4, 2−2, 0.6)
14.7386(4)
14.5651(3)
14.5412(2)
14.4355(1)
8.8277(4)
8.4928(3)
8.4737(2)
8.4236(1)
0.0778(4)
0.0759(3)
0.0754(2)
0.0748(1)
0.9223(4)
0.9241(3)
0.9246(2)
0.9253(1)
AutoMPGELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−28, /, /)
(2−27, 2−2, /)
(2−39, 2−2, 0.1)
7.0139(3)
7.0729(4)
6.9306(2)
6.9151(1)
4.0307(2)
4.0592(3)
4.0792(4)
3.9845(1)
0.5218(3)
0.5278(4)
0.5147(2)
0.5032(1)
0.5009(4)
0.5068(3)
0.5183(1)
0.5169(2)
TriazinesELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−37, /, /)
(2−21, 2−2, /)
(2−29, 2−2, 0.6)
0.1166(4)
0.1068(2)
0.1074(3)
0.0963(1)
0.0776(4)
0.0703(2)
0.0705(3)
0.0638(1)
0.2077(4)
0.1693(2)
0.1729(3)
0.1378(1)
0.8116(4)
0.8536(2)
0.8501(3)
0.8815(1)
BodyfatELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−23, /, /)
(2−22, 2−2, /)
(2−8, 2−2, 0.4)
6.5116(3)
6.5075(2)
6.5343(4)
6.3088(1)
3.4749(2)
3.4977(3)
3.5697(4)
3.4931(1)
0.4184(4)
0.4094(2)
0.4119(3)
0.3743(1)
0.6129(4)
0.6180(3)
0.6182(2)
0.6515(1)
PyrimELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−23, /, /)
(2−10, 2−2, /)
(2−24, 2−2, 0.5)
0.1263(4)
0.1136(2)
0.1137(3)
0.1010(1)
0.0903(4)
0.0804(2)
0.0812(3)
0.0717(1)
0.9389(4)
0.7002(2)
0.7098(3)
0.4848(1)
0.5540(4)
0.6048(3)
0.6515(2)
0.7080(1)
ServoELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−34, /, /)
(2−39, 2−2, /)
(2−45, 2−2, 0.9)
0.8648(4)
0.8253(3)
0.8025(2)
0.7486(1)
0.6291(3)
0.6889(4)
0.5487(2)
0.5332(1)
0.3719(4)
0.2863(3)
0.2788(2)
0.2412(1)
0.7042(4)
0.7633(2)
0.7557(3)
0.7960(1)
Bike SharingELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−39, /, /)
(2−42, 2−2, /)
(2−49, 2−2, 0.1)
1614.52(4)
1587.01(3)
1582.54(2)
1562.74(1)
755.097(4)
716.147(2)
718.328(3)
714.710(1)
0.4224(4)
0.4052(3)
0.4012(2)
0.3952(1)
0.5926(4)
0.6055(3)
0.6089(2)
0.6194(1)
BalloonELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−34, /, /)
(2−39, 2−2, /)
(2−42, 2−2, 0.5)
0.0785(1)
0.0807(4)
0.0793(3)
0.0788(2)
0.0547(3)
0.0549(4)
0.0545(2)
0.0544(1)
0.2749(2)
0.2871(3)
0.2931(4)
0.2682(1)
0.7321(2)
0.7206(3)
0.7127(4)
0.7398(1)
NO2ELM
RELM
CELM
L1-ACELM
(/, /, /)
(2−16, /, /)
(2−27, 2−2, /)
(2−23, 2−2, 0.2)
1.2576(4)
1.2718(2)
1.2478(3)
1.2408(1)
0.7013(1)
0.7259(4)
0.7164(3)
0.7080(2)
0.8752(3)
0.8908(4)
0.8639(2)
0.8566(1)
0.1643(4)
0.1663(3)
0.1770(2)
0.1882(1)
Table 7. Average ranks of benchmark algorithms under noise-free environment.

Algorithm | RMSE | MAE | SSE/SST | SSR/SST
ELM | 4 | 4 | 4 | 4
RELM | 2.5 | 2.6 | 2.55 | 2.4
CELM | 2.4 | 2.4 | 2.45 | 2.5
L1-ACELM | 1.1 | 1.0 | 1.0 | 1.1
Table 8. Average ranks of benchmark algorithms under 5% noise environment.

Algorithm | RMSE | MAE | SSE/SST | SSR/SST
ELM | 3.7 | 3.7 | 3.8 | 3.7
RELM | 2.6 | 2.7 | 2.8 | 2.5
CELM | 2.5 | 2.5 | 2.3 | 2.65
L1-ACELM | 1.0 | 1.1 | 1.1 | 1.15
Table 9. Average ranks of benchmark algorithms under 10% noise environment.

Algorithm | RMSE | MAE | SSE/SST | SSR/SST
ELM | 3.5 | 3.1 | 3.6 | 3.8
RELM | 2.8 | 3.0 | 2.9 | 2.8
CELM | 2.6 | 2.8 | 2.5 | 2.3
L1-ACELM | 1.1 | 1.1 | 1.0 | 1.1
Table 10. Relevant values in the Friedman test on benchmark datasets.

Ratio of noise | χ²_F (RMSE) | χ²_F (MAE) | χ²_F (SSE/SST) | χ²_F (SSR/SST) | F_F (RMSE) | F_F (MAE) | F_F (SSE/SST) | F_F (SSR/SST) | CD
Noise-free | 25.32 | 27.12 | 27.03 | 25.32 | 48.69 | 84.75 | 81.91 | 48.69 | 1.4832
5% noise | 16.20 | 20.64 | 22.68 | 19.71 | 10.57 | 19.81 | 27.89 | 17.24 | 1.4832
10% noise | 18.36 | 15.96 | 21.72 | 22.68 | 14.20 | 10.23 | 23.61 | 27.89 | 1.4832