1. Introduction
In the field of machine learning, researchers have been dedicated to enhancing the efficiency and accuracy of models. Sakheta et al. [1] improved the prediction of a biomass gasification model with six machine learning algorithms, demonstrating that the XGBoost algorithm has significant advantages in improving the accuracy of gasification product prediction. Maydanchi et al. [2] systematically compared various machine learning methods and found that tree-based ensemble methods, such as XGBoost, gradient boosting, and random forest, excelled in diabetes prediction. Kim et al. [3] successfully classified three similar enterococci by combining MALDI-TOF mass spectrometry with multiple supervised learning algorithms (e.g., KNN, SVM, random forest). Although these methods have made significant progress in different domains, there remains room for improvement in computational efficiency and response times. The extreme learning machine (ELM) [4,5] offers a promising solution to these challenges with its efficient training process and superior generalization capability. It was first proposed by Huang et al. [6] and quickly gained widespread application in multiple fields, including image classification [7], fault detection [8,9], disease diagnosis [10], computer vision [11], face recognition [12], and signal processing [13]. These applications validate the practicability and effectiveness of ELM as an efficient neural network training method.
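To illustrate why ELM trains efficiently, the sketch below (a minimal Python example, not the implementation from any cited work; the toy dataset and hidden-layer size are invented for the demo) fixes random hidden-layer weights and fits only the output weights analytically:

```python
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    """Train a basic ELM: random hidden weights, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                  # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.sign(H @ beta)

# Toy binary problem: two well-separated Gaussian clusters.
X = np.vstack([np.random.default_rng(1).normal(loc=m, scale=0.3, size=(40, 2))
               for m in (-1.0, 1.0)])
y = np.hstack([-np.ones(40), np.ones(40)])
W, b, beta = elm_train(X, y)
acc = np.mean(elm_predict(X, W, b, beta) == y)
```

The single least-squares solve replaces iterative backpropagation, which is the source of ELM's speed advantage.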
In binary classification tasks, the traditional ELM learns only a single hyperplane to distinguish between classes. Recently, nonparallel-hyperplane classification algorithms have attracted significant attention and research interest [14,15]. These algorithms train two hyperplanes, each designed to minimize its distance to one of the two classes while maximizing its distance from the other. For example, the twin support vector machine (TSVM) is notable for learning two nonparallel separating hyperplanes more quickly than the traditional support vector machine (SVM) by solving two reduced-sized quadratic programming problems (QPPs). Numerous variants of the TSVM [16,17,18,19] have been studied extensively and applied successfully in classification tasks.
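To make the nonparallel-hyperplane idea concrete, the following minimal sketch classifies a point by whichever hyperplane lies closer; the hyperplanes are hand-set here purely for illustration, whereas a real TSVM would obtain them by solving the two QPPs:

```python
import numpy as np

def twin_predict(X, w1, b1, w2, b2):
    """Assign each sample to the class whose hyperplane lies closer
    (perpendicular distance |w.x + b| / ||w||)."""
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)   # 1 => closer to the class +1 hyperplane

# Hand-set hyperplanes (illustration only):
w1, b1 = np.array([1.0, 0.0]), -1.0    # x1 = 1 runs through the +1 class
w2, b2 = np.array([1.0, 0.0]), 1.0     # x1 = -1 runs through the -1 class
X = np.array([[0.9, 0.2], [-1.1, -0.3]])
labels = twin_predict(X, w1, b1, w2, b2)  # [1, -1]
```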
Inspired by the TSVM, Wan et al. [20] proposed the twin extreme learning machine (TELM). Notably, both the TELM and the TSVM use the hinge loss function, which is unbounded and tends to exaggerate the impact of noise and outliers on the model. Consequently, the research community has shown increasing interest in alternative loss functions. Wang et al. [21] proposed a robust capped L1-norm twin support vector machine (CTWSVM), which retains the benefits of the TWSVM while enhancing the robustness of the model. Wang and Yu [22] proposed a new robust loss function, the capped Linex loss, and applied it to the TSVM to enhance the model's classification capability. Kumari et al. [23] introduced the capped pinball loss function into the universum twin support vector machine (UTWSVM), yielding Tpin-UTWSVM, which improves the model's generalization performance. Ma et al. [24] proposed a robust adaptive capped loss that alters the loss value by adjusting an adaptive parameter during training; applying this loss to the TSVM produced an adaptive robust learning framework, the adaptive robust twin support vector machine (ARTSVM). All the above models use bounded capped loss functions, which confine the influence of noise within certain limits and make the classifiers less sensitive to noise.
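The effect of capping can be seen in a small numerical sketch (illustrative Python with an arbitrary cap value c = 2): the hinge loss grows without bound on a badly misclassified point, while a capped version contributes at most the cap:

```python
def hinge(u):
    """Standard hinge loss on the margin u = y * f(x): unbounded as u -> -inf."""
    return max(0.0, 1.0 - u)

def capped_hinge(u, c=2.0):
    """Capped hinge loss: identical near the margin, but never exceeds c,
    so a single outlier contributes at most c to the objective."""
    return min(max(0.0, 1.0 - u), c)

# An extreme outlier (margin -100) dominates the hinge loss but not the capped one.
outlier_margin = -100.0
h = hinge(outlier_margin)          # 101.0
ch = capped_hinge(outlier_margin)  # 2.0
```

Near the decision boundary the two losses coincide, so capping changes only how outliers are weighted.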
In order to further reduce the impact of noise, many scholars have sought new metrics to substitute for the squared L2-norm metric used in the TELM. Ma et al. [25] proposed a fast robust twin extreme learning machine (FRTELM) that introduces the capped L1-norm metric and loss function into the classic TELM learning framework, enhancing the robustness of the TELM in handling classification problems. Yang et al. [26] added the idea of projection to the twin extreme learning machine and, combining this with the capped L1-norm metric and loss function, proposed a new capped L1-norm projection twin extreme learning machine (CL1-PTELM), which lessens the influence of outliers and is more robust than the TELM. Ma and Yang [27] proposed a new robust TELM framework (RTELM) using the capped L1-norm metric and a capped loss function; RTELM addresses the limitations of the L2-norm metric and hinge loss, particularly in scenarios with outliers, while retaining the strengths of the TELM and further enhancing classification robustness. These algorithms show that the capped L1-norm metric is resistant to outliers. In fact, the capped L1-norm metric can be regarded as an effective approximation of the L0-norm controlled by a non-negative parameter, and it is superior in robustness to the L1-norm metric [27]. In addition, related scholars have begun to focus on the capped L2,p-norm metric and have applied it to their models. This metric is bounded and can be flexibly tuned by adjusting the p-value to adapt to diverse datasets and reduce the effect of noise. Yuan et al. [28] created a novel framework that improves robustness by substituting the squared L2-norm metric with the robust capped L2,p-norm metric in the least squares twin support vector machine (LSTSVM), called the capped L2,p-norm LSTSVM (CL2,p-LSTSVM). Wang et al. [29] proposed a capped L2,p-norm-based robust twin bounded support vector machine with the Welsch loss function (WCTBSVM), further improving the generalization performance and robustness of the TSVM. Jiang et al. [30] proposed a novel robust twin extreme learning machine framework (CWTELM) by combining the capped L2,p-norm metric and the Welsch loss function with the TELM; CWTELM improves robustness while preserving the advantages of the TELM, thereby enhancing classification performance.
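A small sketch may help illustrate these capped metrics (the parameter values are assumed for the demo, and the exact definitions in the cited works may differ in normalization): an outlying entry or row contributes at most the cap, whereas a squared L2-norm lets it dominate the objective:

```python
import numpy as np

def capped_l1(v, eps=1.0):
    """Capped L1-norm: sum of min(|v_i|, eps); each entry contributes at most eps."""
    return float(np.sum(np.minimum(np.abs(v), eps)))

def capped_l2p(M, p=0.5, eps=2.0):
    """Capped L2,p-style metric: sum over rows of min(||row||_2 ** p, eps)."""
    row_norms = np.linalg.norm(M, axis=1)
    return float(np.sum(np.minimum(row_norms ** p, eps)))

v = np.array([0.5, -0.3, 100.0])            # one outlying entry
clean = np.ones((3, 2))                      # three "residual" rows
outlier = np.vstack([clean, [100.0, 0.0]])   # same rows plus one outlier row
cl1 = capped_l1(v)                           # 0.5 + 0.3 + 1.0 = 1.8
delta = capped_l2p(outlier) - capped_l2p(clean)  # the outlier row adds only eps = 2.0
```

Lowering p or eps damps large residuals more aggressively, which is the flexibility the p-value provides.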
Besides altering metrics and loss functions, regularization techniques play a vital role in improving the generalization capability of models. The Fisher regularization term is a notable technique that minimizes within-class variance and excels at improving class separability and robustness. Ma et al. [31] proposed a Fisher regularization ELM (Fisher-ELM) that attains minimal within-class scatter; by utilizing the statistical properties of the data, Fisher-ELM exhibits excellent generalization ability. Although Fisher-ELM incorporates statistical knowledge into its framework, it tends to ignore the potential effects of noise and outliers. To reduce these negative effects, Xue and Zhao [32] first proposed a novel asymmetric Welsch loss function and integrated it into Fisher-ELM, yielding a robust Fisher regularization extreme learning machine with an asymmetric Welsch-induced loss function (AWFisher-ELM); this model better copes with the adverse effects of noise and outliers, enhancing robustness. Xue et al. [33] added Fisher regularization to the TELM and proposed the Fisher regularization TELM (FTELM), which keeps the strengths of the TELM while minimizing intra-class differences among samples. To further improve the noise immunity of the FTELM method, a capped L1-norm Fisher regularization TELM (CL1-FTELM) was proposed, combining the capped L1-norm metric and loss function to enhance the robustness of the model.
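The within-class scatter that Fisher regularization minimizes can be sketched as follows (a generic Python illustration with toy data, not code from the cited papers): the penalty w^T S_w w is small along directions in which each class is tight, so the regularizer steers the classifier toward such directions:

```python
import numpy as np

def within_class_scatter(X, y):
    """Within-class scatter S_w = sum over classes c of
    sum_{x in c} (x - mu_c)(x - mu_c)^T."""
    Sw = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        d = Xc - Xc.mean(axis=0)
        Sw += d.T @ d
    return Sw

# Two classes, each spread only along the second feature.
X = np.array([[0., 0.], [0., 2.], [10., 0.], [10., 2.]])
y = np.array([0, 0, 1, 1])
Sw = within_class_scatter(X, y)                       # [[0, 0], [0, 4]]
pen_sep = np.array([1., 0.]) @ Sw @ np.array([1., 0.])  # separating direction: penalty 0
pen_bad = np.array([0., 1.]) @ Sw @ np.array([0., 1.])  # intra-class spread direction: penalty 4
```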
In this paper, we first propose a bounded, smooth, and symmetric squared fractional loss (SF-loss). Based on the proposed SF-loss, we integrate the TELM, the capped L2,p-norm metric, and Fisher regularization, and propose a robust supervised TELM learning framework (SF-RSTELM). SF-RSTELM can effectively utilize the statistical properties of the data, which the TELM lacks. In addition, it can effectively reduce the impact of noise and outliers by employing the bounded capped L2,p-norm metric and the SF-loss function. In contrast, the TELM uses the unbounded squared L2-norm metric and hinge loss, which are susceptible to the influence of noise and outliers.
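This excerpt does not spell out the analytic form of the SF-loss. A representative squared-fractional shape with the stated properties (bounded, smooth, symmetric, and flat for large residuals) is u^2 / (u^2 + gamma); the sketch below uses this assumed form purely for intuition, and the paper's exact definition may differ:

```python
def sf_like_loss(u, gamma=1.0):
    """A squared-fractional form u^2 / (u^2 + gamma): bounded in [0, 1),
    smooth, symmetric in u, and nearly flat for large |u|, so outliers
    contribute a bounded amount.
    NOTE: illustrative stand-in, not necessarily the paper's SF-loss."""
    return u * u / (u * u + gamma)

small = sf_like_loss(0.5)   # behaves like a scaled squared loss near zero
huge = sf_like_loss(1e6)    # stays below the bound 1 even for an extreme residual
```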
The main work of this paper is summarized as follows:
- (1)
A new robust loss function called the squared fractional loss (SF-loss) is presented. It has several important properties: it is bounded, smooth, symmetric, and noise-insensitive. Moreover, the robustness of the SF-loss is analyzed from the perspective of M-estimation theory [34], and its Fisher consistency is proved according to the Bayesian rule [35].
- (2)
An innovative method named “The Robust Supervised Learning Framework: Harmonious Integration of Twin Extreme Learning Machine, Squared Fractional Loss, Capped L2,p-norm Metric, and Fisher Regularization” is proposed. This framework combines the efficiency of the TELM, the robustness of the SF-loss function, the flexibility of the capped L2,p-norm metric, and the advantages of Fisher regularization. This integrated approach not only takes the statistical information of the data into account but also significantly reduces the impact of noise, thereby enhancing the model’s performance.
- (3)
Due to the non-convex nature of the resulting optimization model, an efficient algorithm based on the concave-convex procedure (CCCP) [36] is proposed to solve the optimization problem, and the convergence of the proposed algorithm is proved.
- (4)
We performed extensive experiments on artificial datasets, UCI datasets, image datasets, and NDC-large datasets to validate the effectiveness of our proposed algorithm compared to other state-of-the-art algorithms.
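The CCCP solver mentioned in contribution (3) can be illustrated on a one-dimensional, invented difference-of-convex objective f(x) = x^2 - sqrt(1 + x^2) (not the paper's model): each step linearizes the concave part at the current iterate and minimizes the resulting convex surrogate, and the objective value decreases monotonically:

```python
import math

def f(x):
    """f = convex part x^2 plus concave part -sqrt(1 + x^2); minimum at x = 0."""
    return x * x - math.sqrt(1.0 + x * x)

def cccp_step(x):
    """One CCCP update: replace the concave part by its tangent at x,
    then minimize the convex surrogate x'**2 + h'(x) * x'  =>  x' = -h'(x) / 2."""
    h_prime = -x / math.sqrt(1.0 + x * x)   # derivative of the concave part
    return -h_prime / 2.0

x = 3.0
vals = [f(x)]
for _ in range(30):
    x = cccp_step(x)
    vals.append(f(x))
# vals is nonincreasing and x approaches the minimizer 0.
```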
The rest of this paper is structured as follows. In Section 2, we briefly review related work on Fisher regularization, the Fisher regularized twin extreme learning machine, the capped L2,p-norm metric, and the concave-convex procedure. In Section 3, we provide a comprehensive description of the proposed model and a detailed solution process. The experimental results on multiple datasets are presented in Section 4. Conclusions and suggestions for future work are given in Section 5.
5. Conclusions and Future Work
In this paper, we first propose a new SF-loss function that exhibits favorable characteristics including boundedness, smoothness, symmetry, noise insensitivity, and Fisher consistency. Then, SF-RSTELM is proposed by integrating the capped L2,p-norm metric, the SF-loss, and the Fisher regularization term. SF-RSTELM not only incorporates the Fisher regularization term, addressing the intra-class divergence of the data, but also exploits the parameter adjustability of the SF-loss and the flexibility of the capped L2,p-norm metric to reduce the influence of noise and outliers. Moreover, an efficient iterative algorithm is proposed to solve the model, and the convergence of the algorithm is proved. Experimental results on multiple datasets demonstrate the efficiency of the proposed model. Specifically, our model achieves higher ACC and F1 scores on most datasets, with improvements ranging from 0.28% to 4.5% over other state-of-the-art algorithms.
In the future, we will continue to study improvements to the algorithm. Because the model constructed in this paper is a non-convex optimization problem, we convert it into a series of convex subproblems solved by the CCCP method, which results in a long training time; finding a faster solution method is therefore necessary in future research. Moreover, extending the model from supervised to semi-supervised learning remains an important direction for future studies.