*Article* **Robust Online Support Vector Regression with Truncated** *ε***-Insensitive Pinball Loss**

**Xian Shan \*, Zheshuo Zhang, Xiaoying Li, Yu Xie and Jinyu You**

College of Science, China University of Petroleum, Qingdao 266580, China

**\*** Correspondence: 20120029@upc.edu.cn

**Abstract:** Advances in information technology have led to the proliferation of data in the fields of finance, energy, and economics. Unforeseen elements can cause data to be contaminated by noise and outliers. In this study, a robust online support vector regression algorithm based on a non-convex asymmetric loss function is developed to handle the regression of noisy dynamic data streams. Inspired by pinball loss, a truncated *ε*-insensitive pinball loss (TIPL) is proposed to solve the problems caused by heavy noise and outliers. A TIPL-based online support vector regression algorithm (TIPOSVR) is constructed under the regularization framework, and the online gradient descent algorithm is implemented to execute it. Experiments are performed using synthetic datasets, UCI datasets, and real datasets. The results of the investigation show that in the majority of cases, the proposed algorithm is comparable, or even superior, to the comparison algorithms in terms of accuracy and robustness on datasets with different types of noise.

**Keywords:** regression; data stream; non-convex loss function; noise-resilient; online-learning

**MSC:** 68T09; 62R07

#### **1. Introduction**

Machine learning techniques aim to uncover the patterns in data and the mechanisms behind them. Researchers in the field of machine learning have shown significant interest in support vector regression (SVR) algorithms owing to their strong theoretical basis and excellent generalization ability. SVR has proven to be a reliable method for regression and has been widely used in applications such as wind speed forecasting [1,2], solar radiation forecasting [3], financial time series forecasting [4,5], and travel time forecasting [6].

Classic SVR is a powerful regression method. It works by minimizing the empirical risk and the structural risk, which are defined by the loss function and the regularization term, respectively. Given a training dataset $T = \{(\mathbf{X}_i, y_i) \mid \mathbf{X}_i \in \mathbb{R}^m, y_i \in \mathbb{R}, i = 1, 2, \ldots, N\}$, SVR aims to find a linear function $f(\mathbf{X}) = \mathbf{W}^T\mathbf{X} + b$, with $\mathbf{W} \in \mathbb{R}^m$ and $b \in \mathbb{R}$, or a nonlinear function $f(\mathbf{X}) = \mathbf{W}^T\varphi(\mathbf{X}) + b$ in feature space, that reveals the patterns and trends in the data. The minimization problem is described as follows:

$$\min \quad \frac{1}{2} \|f\|_{\mathcal{H}}^2 + C \sum_{i=1}^N L(f(\mathbf{X}_i) - y_i) \tag{1}$$

*C* is the regularization parameter that trades off model complexity against training error. The loss function *L*(·) measures the difference between the predicted and observed values and is used to define the empirical risk. The regularization term makes *f* as flat as possible to avoid overfitting. When *L* is convex, the regression estimator is obtained by solving a convex optimization problem in which all local minima are also global.

The loss function is of significant importance for SVR. It should accurately reflect the noise characteristics present in the training data. In recent years, researchers have developed various loss functions. The most commonly used loss functions are the squared loss, linear loss, Huber loss, and *ε*-insensitive loss [7]. Squared loss [8,9] assesses the discrepancy between the predicted and actual values by calculating the mean squared error. It is a smooth function that can be minimized quickly and accurately by convex optimization methods. However, it is sensitive to large errors, making it less robust than other losses. Linear loss [10,11] is a general loss function applicable to several problems. It is less sensitive to large errors than the squared loss because it is based on absolute errors. Huber loss [12] combines the linear and squared losses and is designed to provide robustness and smoothness simultaneously by using the squared loss for smaller errors and the linear loss for larger errors. The combination of the two loss functions allows for a more sophisticated understanding of the data. The *ε*-insensitive loss [10,11] augments the linear loss by introducing an insensitive band, which promotes sparsity.

**Citation:** Shan, X.; Zhang, Z.; Li, X.; Xie, Y.; You, J. Robust Online Support Vector Regression with Truncated *ε*-Insensitive Pinball Loss. *Mathematics* **2023**, *11*, 709. https://doi.org/10.3390/math11030709

Academic Editors: Wen Zhang, Xiaofeng Xu, Jun Wu and Kaijian He

Received: 13 December 2022; Revised: 22 January 2023; Accepted: 27 January 2023; Published: 30 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In the current era of data abundance, accurately analyzing dynamic data with noise and outliers using SVR is a challenging but essential task.

One of the problems is that the loss function is not sufficiently resilient to general noise in the data and can be adversely affected by outliers. Data collection processes are influenced by various external factors, resulting in noise and outliers in the data. Across numerous fields, including finance, economics, and energy, data are regularly accompanied by a considerable amount of asymmetric noise. Further, noise takes several forms and is difficult to identify and remove. For example, asymmetric heavy-tailed noise that is predominantly positive is typically found in automobile insurance claims [13]. Energy load data contain non-Gaussian noise with a heavy-tailed distribution [14]. The popular loss functions mentioned above are usually symmetric, which means that the loss incurred is the same regardless of the direction of the prediction error. They have proven themselves in situations where the noise is symmetric, such as Gaussian noise and uniform noise. However, they are not as robust when dealing with asymmetric noise, including heavy-tailed noise and outliers, as demonstrated in previous studies [7,15,16]. In addition, over-estimation and under-estimation of the target value may have different impacts [7,14]. Take the energy market as an example: hedging contracts between retailers and suppliers are commonly used to stabilize the cost of goods in the short term, thus reducing economic risk. Over-prediction and under-prediction both result in economic losses, albeit of different magnitudes. Over-forecasting may incur the cost of disposing of unused orders, while under-forecasting may cause retailers to pay a higher price for energy loads than the contract price. Developing a more accurate regression model requires consideration of the different penalties for over-estimation and under-estimation.

To handle asymmetric noise while considering the distinct effects of positive and negative errors, two asymmetric loss functions have been proposed: quantile loss and pinball loss [15,17,18]. These loss functions assign different penalty weights to positive and negative errors, which makes them more robust to asymmetric noise.

Moreover, outliers have a severe impact on the accuracy of the model. The presence of outliers skews the data and causes large deviations from the expected results. It is crucial to consider the presence of outliers when constructing and evaluating a model. Given the shortcomings of existing loss functions in dealing with general noise and outliers, a broader range of loss functions is needed to address these issues. Researchers have developed non-convex loss functions to handle outliers, such as correlated entropy loss [16,19] and truncated loss functions [8,18,20,21]. These two strategies limit the outlier loss to a specific range, thereby reducing the impact of outliers on the regression function. Asymmetric loss functions and truncated loss functions have both proven to be viable strategies for improving the robustness of regression models.

Another problem is that the batch learning framework used by a typical SVR is unsuitable for the data flow environment. Traditional SVRs involve batch learning, which can be challenging when dealing with large datasets owing to the increased storage requirements and computational complexity. Researchers have proposed various solutions to address this problem, such as the convex optimization technique outlined in [9,15] and online learning algorithms based on stochastic approximation theory [22–24]. The online SVR presented in [9] and the Canal loss-based online regression algorithm described in [22] are examples. However, despite the efficiency of these online learning algorithms for handling regression problems in data streams, other solutions may be required when noise and outliers are present.

The current study aims to present an online learning regression algorithm based on truncated asymmetric loss functions, which can effectively address the regression problem in noisy data streams. We propose a novel online SVR, termed TIPOSVR, established within the regularization framework of SVR and solved by the online gradient descent (OGD) algorithm. TIPOSVR uses an innovative loss function. The main contributions of this study are as follows:


The remainder of this paper is organized as follows. Section 2 of this paper provides a literature review, while Section 3 presents the regularization framework and the robust loss function of SVR. Section 4 proposes an online SVR based on the TIPL function. To validate the performance of the proposed algorithm, Section 5 presents numerical experiments on the synthetic, UCI benchmark, and real datasets which compare TIPOSVR with some classical and advanced SVRs. Finally, Section 6 summarizes the main findings, limitations, and prospects for future work of this paper.

#### **2. Literature Review**

SVR has been widely used and implemented in various fields as a powerful machine learning algorithm. This section provides an overview of the recent advances in robust loss functions and online learning algorithms.

#### *2.1. Robust Loss Function*

Noise is generally classified into two categories: characteristic noise and outliers. It has been reported [21,25] that the *ε*-insensitive loss function is more effective for datasets with uniform noise, whereas the squared loss function is more effective for datasets with Gaussian noise. Aside from Gaussian and uniform noise, asymmetric noise, especially heavy-tailed noise, also significantly affects the accuracy of a regression model. Studies have shown that an asymmetric loss function can be used to solve the asymmetric noise problem [15,17,18]. Quantile regression theory provides the basis for deriving an asymmetric loss function [15]. Assigning different penalty weights to positive and negative errors accommodates a wider range of noise distributions. Quantile regression has been increasingly used since the 1970s. Refs. [26,27] adopted the quantile loss function, incorporating an adjustment term *ν* and *C*(*ντ*(1 − *τ*)), to adjust the asymmetric insensitive region within the regularization framework expressed in optimization problem (1). The parameter *ν* controls the width of the asymmetric *ε*-insensitive area and ensures that a certain percentage of the samples are situated in this area and classified as support vectors. Consequently, the model outputs an insensitive band that accommodates the required number of samples, which facilitates the automated control of accuracy. The pinball loss is developed based on quantile regression [15,18]. Extensive research on pinball loss has been conducted, leading to the development and application of the sparse *ε*-insensitive pinball loss [16] and the twin pinball SVR [28].

Several studies have been conducted to address the outlier problem, with a focus on developing non-convex loss functions. The correlation entropy loss [16,19] is a loss function derived from the correlation entropy theory based on the Gaussian or Laplacian kernel. As the error moves away from zero in either direction, the loss value eventually increases to a constant. Another type of non-convex loss function is the horizontally truncated loss. In this case, the loss value of an outlier is a constant. As described in [22], Canal loss is an *ε*-insensitive loss with horizontal truncation. Ref. [29] constructs a non-convex loss function by subtracting two *ε*-insensitive loss functions, which yields a linear loss with horizontal truncation. A non-convex least squares loss function, proposed in [8], is based on horizontal truncation and the squared loss. Experimental evidence suggests that these loss functions successfully reduce the impact of outliers. The aforementioned methods produce bounded loss functions that reduce the sensitivity of the model to outliers by keeping the loss of outliers within a certain limit.

For a dataset containing noise and outliers, a combination of the asymmetric loss function and truncated loss function is proposed to improve the robustness of the regression model, because such a combination covers a more comprehensive range of noise distributions.

Most of these analyses use batch learning SVR. Batch algorithms assume that data can be collected and used in a single step, ignoring any changes that may occur over time. This is not the case in today's Big Data era, where the data are constantly in flux [23,30,31]. Batch algorithms face memory and computational problems because of the considerable amount of data they require. Researchers have therefore turned to online learning algorithms to improve the performance of regression strategies on data streams.

#### *2.2. Online Learning Algorithm*

A regression algorithm should be designed to easily integrate new data into the existing model to address the storage and computational issues caused by a large amount of data. Online learning algorithms have been discussed and implemented in previous studies [23,24,32,33]. A kernel-based online extreme learning machine is proposed by [34]. An online sparse SVR method is introduced in [35].

Nevertheless, these techniques are formulated in the context of convex optimization, which is unsuitable for non-convex optimization problems. The online learning approach is further improved according to the theory of pseudo-convex function optimization [33]. Studies on online learning algorithms for non-convex loss functions have been performed, including the online SVR [22] and a variable selection method [36] based on Canal loss.

This study presents an online SVR that contains a bounded and non-convex loss function. The algorithm is designed to be noise-resilient and sparse while being capable of capturing the various data characteristics.

#### **3. Related Work**

#### *3.1. Robust Loss Function*

The loss function plays a key role in any regression model. The sensitivity of a model to asymmetric noise is reduced using the pinball loss function, which is an asymmetric loss function. Further, the truncated loss function reduces the effect of outliers.

#### 3.1.1. Pinball Loss

The pinball loss is derived from quantile regression, which is more effective for dealing with different forms of noise than linear loss. Pinball loss is defined as follows:

$$L_{\tau}(u) = \begin{cases} u, & u > 0 \\ -\tau u, & u \le 0 \end{cases} \tag{2}$$

where *τ* is an asymmetry parameter. By adjusting the value of *τ*, we can penalize over-prediction and under-prediction to different degrees, making the loss suitable for datasets with different noise distributions. Cross-validation is the preferred method to determine the best value of *τ*. When *τ* = 1, the pinball loss is equivalent to a linear loss.
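As a minimal sketch of Equation (2), the pinball loss can be evaluated with NumPy as follows. The function name `pinball_loss` and the sample errors are illustrative assumptions, not part of the original study.

```python
import numpy as np

def pinball_loss(u, tau):
    """Pinball loss of Equation (2): u for u > 0, -tau * u otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, u, -tau * u)

# Hypothetical prediction errors u = f(X) - y.
errors = np.array([-2.0, -0.5, 0.5, 2.0])

# tau < 1 penalizes negative errors less than positive ones.
print(pinball_loss(errors, tau=0.5))
# tau = 1 recovers the symmetric linear (absolute) loss.
print(pinball_loss(errors, tau=1.0))
```

Under this convention, choosing *τ* below or above 1 shifts the penalty toward positive or negative errors, respectively.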

Incorporating an *ε*-insensitive band into the pinball loss makes it sparser and resilient to minor errors. The *ε*-insensitive pinball loss is defined as:

$$L_{\varepsilon,\tau}(u) = \begin{cases} u - \varepsilon, & u > \varepsilon \\ 0, & -\varepsilon/\tau \le u \le \varepsilon \\ -\tau u - \varepsilon, & u < -\varepsilon/\tau \end{cases} \tag{3}$$

Pinball loss and *ε*-insensitive pinball loss present a potential solution to counteract the asymmetric distribution of noise. However, these two convex loss functions are particularly susceptible to severe noise and outliers owing to the lack of an upper bound.
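The following sketch evaluates Equation (3) and illustrates the lack of an upper bound. It relies on the observation that the three cases of Equation (3) coincide with the maximum of 0, *u* − *ε*, and −*τu* − *ε*; the function name and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def eps_pinball_loss(u, tau, eps):
    """Epsilon-insensitive pinball loss of Equation (3).

    Written as max(0, u - eps, -tau * u - eps), which matches the
    three cases of Equation (3) for tau > 0 and eps > 0.
    """
    u = np.asarray(u, dtype=float)
    return np.maximum(0.0, np.maximum(u - eps, -tau * u - eps))

tau, eps = 0.5, 0.1  # illustrative values

# Errors inside the band [-eps / tau, eps] incur no loss.
print(eps_pinball_loss([-0.2, 0.0, 0.1], tau, eps))
# The loss keeps growing with the error, so a single outlier can dominate.
print(eps_pinball_loss([10.0, 1000.0], tau, eps))
```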

#### 3.1.2. Truncated *ε*-Insensitive Loss

The horizontal truncation technique provides an efficient approach to dealing with outliers. The truncated *ε*-insensitive loss is a variant of the *ε*-insensitive loss that includes a horizontal truncation, as shown in Figure 1. As suggested by [22], the truncated *ε*-insensitive loss, known as Canal loss, promotes sparsity and robustness. It is defined as follows:

$$L_{\text{Canal}}(u) = \begin{cases} |u| - \varepsilon, & \varepsilon < |u| < \delta \\ 0, & |u| \le \varepsilon \\ \delta - \varepsilon, & |u| \ge \delta \end{cases} \tag{4}$$

**Figure 1.** Loss functions.

By limiting the loss of outliers to a predetermined value *δ* − *ε*, the impact of outliers on the model is limited, thereby increasing the robustness of the model. The *ε*-insensitive band and the area |*u*| ≥ *δ* contribute to the sparse solution of the algorithm.
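A short sketch of Equation (4) makes the capping behavior concrete. The name `canal_loss` and the parameter values below are assumptions for illustration only.

```python
import numpy as np

def canal_loss(u, eps, delta):
    """Truncated epsilon-insensitive (Canal) loss of Equation (4)."""
    a = np.abs(np.asarray(u, dtype=float))
    # Zero inside the band, linear in between, capped at delta - eps.
    return np.minimum(delta - eps, np.maximum(a - eps, 0.0))

eps, delta = 0.1, 2.0  # illustrative values
sample_errors = np.array([0.05, 0.5, -0.5, 5.0, -500.0])
print(canal_loss(sample_errors, eps, delta))
```

Whatever the size of an outlier's error, its loss never exceeds *δ* − *ε*, in contrast to the unbounded losses of Section 3.1.1.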

#### *3.2. Online SVR*

Online learning algorithms integrate newly arrived data into the historical model and adjust the model through a parameter update strategy. When constructing a model, the instantaneous risk is used instead of the empirical risk. Online SVR is expressed as an instantaneous risk minimization problem under the regularization framework. It is defined as follows:

$$\min \qquad \mathcal{R}_{\text{inst}}[f, \mathbf{X}_i, y_i] = \frac{1}{2} \|f\|_{\mathcal{H}}^2 + C \cdot L(f(\mathbf{X}_i) - y_i) \tag{5}$$

The objective function consists of two components: a regularization term $\frac{1}{2}\|f\|_{\mathcal{H}}^2$ and a loss function *L*(·) representing the instantaneous risk. The regularization parameter *C* is typically set by cross-validation. The model can be updated using the latest information and previous support vector data patterns by incorporating the instantaneous risk. Consequently, the memory requirements and the number of calculations are lower than those of the batch algorithm.

#### **4. Online SVR Based on Truncated** *ε***-Insensitive Pinball Loss Function**

In this section, we present a modified version of the pinball loss function, called TIPL, which is a non-convex and asymmetric loss function. In addition, an online SVR for the TIPL is designed and solved using the OGD.

#### *4.1. Truncated ε-Insensitive Pinball Loss Function (TIPL) and Its Properties*

#### 4.1.1. Truncated *ε*-Insensitive Pinball Loss Function

Inspired by the pinball loss function, we develop TIPL, which is defined as follows:

$$L_{\text{TIP}}(u) = \begin{cases} \delta - \varepsilon, & u \ge \delta \\ u - \varepsilon, & \varepsilon \le u < \delta \\ 0, & -\varepsilon/\tau \le u < \varepsilon \\ -\tau u - \varepsilon, & -\delta/\tau \le u < -\varepsilon/\tau \\ \delta - \varepsilon, & u < -\delta/\tau \end{cases} \tag{6}$$

where *τ* is the asymmetry parameter, *ε* is the insensitivity parameter, and *δ* is the truncation parameter. TIPL is divided into five parts, as shown in Figure 1. If the error *u* lies within the *ε*-insensitive region [−*ε*/*τ*, *ε*), the loss is zero. The loss is *u* − *ε* for *u* ∈ [*ε*, *δ*), and −*τu* − *ε* for *u* ∈ [−*δ*/*τ*, −*ε*/*τ*). In all other cases, i.e., when *u* ∈ (−∞, −*δ*/*τ*) or *u* ∈ [*δ*, ∞), the loss is fixed at the constant *δ* − *ε*.

TIPL is an improved version of the pinball loss that offers improved resistance to noise and outliers, and produces a sparser representation of the solution. The *ε*-insensitive band promotes sparsity and thus saves computing resources. Applying horizontal truncation limits the impact of outliers on the loss value and increases the algorithm's robustness to large disturbances and outliers. The asymmetric feature makes the model more versatile and applicable to a wide range of noise types.

TIPL can be expressed in the equivalent form:

$$L_{\text{TIP}}(u) = \min \{ \delta - \varepsilon, \max \{ -\tau u - \varepsilon, u - \varepsilon, 0 \} \} \tag{7}$$
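The equivalence of Equations (6) and (7) can be checked numerically. The sketch below implements both forms and compares them on a grid of errors; the function names and the parameter values (*τ* = 0.5, *ε* = 0.1, *δ* = 2) are assumptions chosen for illustration, with *δ* > *ε* > 0 and *τ* > 0.

```python
import numpy as np

def tipl_piecewise(u, tau, eps, delta):
    """TIPL written case by case, following Equation (6)."""
    u = np.asarray(u, dtype=float)
    conditions = [
        u >= delta,
        (u >= eps) & (u < delta),
        (u >= -eps / tau) & (u < eps),
        (u >= -delta / tau) & (u < -eps / tau),
        u < -delta / tau,
    ]
    values = [delta - eps, u - eps, 0.0, -tau * u - eps, delta - eps]
    return np.select(conditions, values)

def tipl_minmax(u, tau, eps, delta):
    """TIPL in the compact min/max form of Equation (7)."""
    u = np.asarray(u, dtype=float)
    inner = np.maximum(np.maximum(-tau * u - eps, u - eps), 0.0)
    return np.minimum(delta - eps, inner)

grid = np.linspace(-10.0, 10.0, 2001)
gap = np.max(np.abs(tipl_piecewise(grid, 0.5, 0.1, 2.0) - tipl_minmax(grid, 0.5, 0.1, 2.0)))
print("largest difference between the two forms:", gap)
```

The min/max form is convenient in vectorized code because it avoids explicit branching.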

#### 4.1.2. Properties of the TIPL Function

**Property 1.** *LTIP*(*u*) *is a non-negative, asymmetric, and bounded function.*

**Proof.** (1) For ∀*u* ∈ *R*, *LTIP*(*u*) ≥ 0. TIPL is non-negative.

(2) From Equation (6), *LTIP*(*u*) = −*τu* − *ε* if *u* ∈ [−*δ*/*τ*, −*ε*/*τ*), and *LTIP*(*u*) = *u* − *ε* if *u* ∈ [*ε*, *δ*). Hence, *LTIP*(*u*) = *LTIP*(−*u*) holds for all *u* only if *τ* = 1; for *τ* ≠ 1, *LTIP*(*u*) is not symmetric.

(3) From Equation (6), *LTIP*(*u*) ≤ *δ* − *ε*, so *LTIP*(*u*) is bounded.

**Property 2.** *LTIP*(*u*) *includes and extends both ε-insensitive loss function and truncated ε-insensitive loss function.*

**Proof.** From Equation (6):

When *δ* = ∞, *LTIP*(*u*) reduces to the *ε*-insensitive pinball loss function.

When *τ* = 1, *LTIP*(*u*) is equivalent to the truncated *ε*-insensitive loss function.
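The two reductions in the proof of Property 2 can also be verified numerically. In this sketch, all function names and parameter values are illustrative assumptions; TIPL is coded in its min/max form, and the two reference losses follow Equations (3) and (4).

```python
import numpy as np

def tipl(u, tau, eps, delta):
    """TIPL in min/max form."""
    u = np.asarray(u, dtype=float)
    inner = np.maximum(np.maximum(-tau * u - eps, u - eps), 0.0)
    return np.minimum(delta - eps, inner)

def eps_pinball(u, tau, eps):
    """Epsilon-insensitive pinball loss, Equation (3)."""
    u = np.asarray(u, dtype=float)
    return np.maximum(0.0, np.maximum(u - eps, -tau * u - eps))

def truncated_eps_insensitive(u, eps, delta):
    """Truncated epsilon-insensitive (Canal) loss, Equation (4)."""
    a = np.abs(np.asarray(u, dtype=float))
    return np.minimum(delta - eps, np.maximum(a - eps, 0.0))

u_grid = np.linspace(-20.0, 20.0, 4001)
# delta = infinity removes the truncation.
no_truncation = np.allclose(tipl(u_grid, 0.5, 0.1, np.inf), eps_pinball(u_grid, 0.5, 0.1))
# tau = 1 removes the asymmetry.
symmetric_case = np.allclose(tipl(u_grid, 1.0, 0.1, 2.0), truncated_eps_insensitive(u_grid, 0.1, 2.0))
print(no_truncation, symmetric_case)
```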

By inheriting the asymmetry of pinball losses and the immunity to outliers of truncated *ε*-insensitive loss, TIPL achieves a higher level of resilience. TIPL differs from the *ε*-insensitive pinball loss in that the former incorporates horizontal truncation, which limits the loss of outliers to some extent and makes the model more robust. Unlike truncated *ε*-insensitive loss, TIPL loss takes advantage of asymmetric functions. It assigns different penalty weights to positive and negative errors, enabling it to deal with general noise distributions. *δ* and *τ* are determined using a data-driven process.

**Property 3.** *The derivative of LTIP*(*u*) *is discontinuous.*

**Proof.**

$$L'_{\text{TIP}}(u) = \begin{cases} -\tau, & -\delta/\tau \le u < -\varepsilon/\tau \\ 0, & \text{otherwise} \\ 1, & \varepsilon \le u < \delta \end{cases} \tag{8}$$

As can be seen from Equation (8),

$$\begin{array}{l} \lim_{u \to (-\varepsilon/\tau)^{-}} L'_{\text{TIP}}(u) = -\tau \neq \lim_{u \to (-\varepsilon/\tau)^{+}} L'_{\text{TIP}}(u) = 0, \\ \lim_{u \to (-\delta/\tau)^{-}} L'_{\text{TIP}}(u) = 0 \neq \lim_{u \to (-\delta/\tau)^{+}} L'_{\text{TIP}}(u) = -\tau, \\ \lim_{u \to \varepsilon^{-}} L'_{\text{TIP}}(u) = 0 \neq \lim_{u \to \varepsilon^{+}} L'_{\text{TIP}}(u) = 1, \\ \lim_{u \to \delta^{-}} L'_{\text{TIP}}(u) = 1 \neq \lim_{u \to \delta^{+}} L'_{\text{TIP}}(u) = 0. \end{array}$$

Hence, the derivative of *LTIP*(*u*) is discontinuous at these four points, which, together with the non-convexity of TIPL, precludes solving the model with standard convex optimization methods.
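The jumps listed in the proof can be reproduced with a direct transcription of Equation (8). The function name, the step `h`, and the parameter values are assumptions made for this illustration.

```python
def tipl_derivative(u, tau, eps, delta):
    """Piecewise derivative of TIPL, following Equation (8)."""
    if eps <= u < delta:
        return 1.0
    if -delta / tau <= u < -eps / tau:
        return -tau
    return 0.0

tau, eps, delta = 0.5, 0.1, 2.0  # illustrative values
h = 1e-6                         # small offset to probe each side of a breakpoint

for point in (-delta / tau, -eps / tau, eps, delta):
    left = tipl_derivative(point - h, tau, eps, delta)
    right = tipl_derivative(point + h, tau, eps, delta)
    print(f"breakpoint {point:+.2f}: slope {left:+.1f} on the left, {right:+.1f} on the right")
```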

#### *4.2. Online SVR Based on the Truncated ε-Insensitive Pinball Loss Function*

Within the regularization framework of SVR, the online SVR model with TIPL is derived by incorporating the loss function *LTIP*(·) into Equation (5), which is defined as:

$$\min \quad \frac{1}{2}\|f\|_{\mathcal{H}}^2 + C \cdot L_{\text{TIP}}(f^{k-1}(\mathbf{X}_k) - y_k) \tag{9}$$

We chose the OGD to solve the non-convex optimization problem posed by the TIPL function. The online algorithm updates the regression function by combining the previous decision function $f^{k-1}$ with the new sample $(\mathbf{X}_k, y_k)$.

The learning process generates a sequence of decision functions $(f^0, f^1, \ldots, f^N)$, with the initial hypothesis $f^0$ and the updated regression function $f^k$. When a new sample $(\mathbf{X}_k, y_k)$ arrives, the predicted value $f^{k-1}(\mathbf{X}_k)$ is calculated by the historical decision function $f^{k-1}$, and the loss value $L_{\text{TIP}}(f^{k-1}(\mathbf{X}_k) - y_k)$ is determined by comparing $f^{k-1}(\mathbf{X}_k)$ with the actual label $y_k$.

The update process of $f^k$ is defined as follows:

$$f^k = f^{k-1} - \gamma_k \cdot z_k \tag{10}$$

where $\gamma_k > 0$ is the learning rate and $z_k = C \cdot \partial_f L_{\text{TIP}}(f^{k-1}(\mathbf{X}_k) - y_k)\big|_{f=f^{k-1}} + f^{k-1}$. The term $\partial_f L_{\text{TIP}}(f^{k-1}(\mathbf{X}_k) - y_k)$ is determined by the reproducing kernel, i.e., $\partial_f L_{\text{TIP}}(f^{k-1}(\mathbf{X}_k) - y_k) = L'_{\text{TIP}}(f^{k-1}(\mathbf{X}_k) - y_k) \cdot \kappa(\mathbf{X}_k, \cdot)$. With $u_k = f^{k-1}(\mathbf{X}_k) - y_k$, $z_k$ is expressed as follows:

$$z_k = \begin{cases} C \cdot \kappa(\mathbf{X}_k, \cdot) + f^{k-1}, & \varepsilon \le u_k < \delta \\ f^{k-1}, & \text{otherwise} \\ -C \cdot \tau \kappa(\mathbf{X}_k, \cdot) + f^{k-1}, & -\delta/\tau \le u_k < -\varepsilon/\tau \end{cases} \tag{11}$$

Combining Equations (10) and (11), the update of the decision function is given by:

$$f^k = \begin{cases} (1 - \gamma_k)f^{k-1} - \gamma_k C \cdot \kappa(\mathbf{X}_k, \cdot), & \varepsilon \le u_k < \delta \\ (1 - \gamma_k)f^{k-1}, & \text{otherwise} \\ (1 - \gamma_k)f^{k-1} + \gamma_k C \cdot \tau \kappa(\mathbf{X}_k, \cdot), & -\delta/\tau \le u_k < -\varepsilon/\tau \end{cases} \tag{12}$$
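As a worked check of how Equations (10) and (11) combine, consider the case in which the error lies in [*ε*, *δ*). Substituting the corresponding branch of Equation (11) into Equation (10) gives

$$f^k = f^{k-1} - \gamma_k \left( C \cdot \kappa(\mathbf{X}_k, \cdot) + f^{k-1} \right) = (1 - \gamma_k) f^{k-1} - \gamma_k C \cdot \kappa(\mathbf{X}_k, \cdot).$$

The other two branches follow in the same way: the branch with error in [−*δ*/*τ*, −*ε*/*τ*) yields

$$f^k = f^{k-1} - \gamma_k \left( -C \cdot \tau \kappa(\mathbf{X}_k, \cdot) + f^{k-1} \right) = (1 - \gamma_k) f^{k-1} + \gamma_k C \cdot \tau \kappa(\mathbf{X}_k, \cdot),$$

and in the remaining regions only the shrinkage factor $(1 - \gamma_k)$ is applied to $f^{k-1}$.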

The sample is not a support vector if $u_k \in [-\varepsilon/\tau, \varepsilon)$ or $u_k \in (-\infty, -\delta/\tau) \cup [\delta, +\infty)$. These are the *ε*-insensitive region and the outlier region, respectively, and samples falling in them do not enter the update. The proposed SVR model therefore not only preserves the sparsity of the *ε*-insensitive loss but also increases the sparsity by eliminating outliers. Algorithm 1 details the proposed TIPOSVR algorithm.

**Algorithm 1:** Online support vector regression algorithm based on the truncated *ε*-insensitive pinball loss function.

**Input:** Initial hypothesis (decision function) $f^0$; hyperparameters $\gamma > 0$, $\lambda > 0$, $\tau > 0$, $\varepsilon > 0$, $\delta > 0$, $C > 0$, $k > 0$; data samples $(\mathbf{X}_i, y_i)$, $i = 1, 2, \ldots$

**Output:** Sequence of decision functions $f^0, f^1, \ldots, f^N$

1. **for** $k = 1, 2, \ldots$ **do**
2. &emsp;Receive data $\mathbf{X}_k$
3. &emsp;Predict $f^{k-1}(\mathbf{X}_k)$
4. &emsp;Receive true label $y_k$
5. &emsp;Compute $u_k = f^{k-1}(\mathbf{X}_k) - y_k$
6. &emsp;**if** $\varepsilon \le u_k < \delta$
7. &emsp;&emsp;$f^k \leftarrow (1 - \gamma_k) f^{k-1} - \gamma_k C \cdot \kappa(\mathbf{X}_k, \cdot)$
8. &emsp;**elif** $-\delta/\tau \le u_k < -\varepsilon/\tau$
9. &emsp;&emsp;$f^k \leftarrow (1 - \gamma_k) f^{k-1} + \gamma_k C \cdot \tau \cdot \kappa(\mathbf{X}_k, \cdot)$
10. &emsp;**else**
11. &emsp;&emsp;$f^k \leftarrow (1 - \gamma_k) f^{k-1}$
12. &emsp;**end if**
13. **end for**
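The update loop of Algorithm 1 can be sketched in Python by storing the decision function as a kernel expansion over the retained samples. This is a minimal illustration under several assumptions that are not specified in the text: the class name `TIPOSVRSketch`, a Gaussian kernel with width `sigma`, a zero initial hypothesis, the decaying learning rate *γ*/√*k*, and the synthetic data stream. It is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian kernel; the kernel choice is an assumption of this sketch."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

class TIPOSVRSketch:
    """Hypothetical kernel-expansion version of the update in Algorithm 1."""

    def __init__(self, C=1.0, tau=0.5, eps=0.1, delta=2.0, gamma=0.5, sigma=1.0):
        self.C, self.tau, self.eps, self.delta = C, tau, eps, delta
        self.gamma, self.sigma = gamma, sigma
        self.support = []  # inputs kept as support vectors
        self.coef = []     # matching expansion coefficients
        self.k = 0         # number of samples seen so far

    def predict(self, x):
        # f(x) = sum_i coef_i * kernel(X_i, x); the empty sum is f^0 = 0 (assumed).
        return sum(a * rbf_kernel(s, x, self.sigma) for s, a in zip(self.support, self.coef))

    def partial_fit(self, x, y):
        self.k += 1
        gamma_k = self.gamma / np.sqrt(self.k)  # assumed learning-rate schedule
        u = self.predict(x) - y                 # step 5 of Algorithm 1
        # Every branch shrinks the previous function by (1 - gamma_k).
        self.coef = [(1.0 - gamma_k) * a for a in self.coef]
        if self.eps <= u < self.delta:
            self.support.append(np.asarray(x, dtype=float))
            self.coef.append(-gamma_k * self.C)
        elif -self.delta / self.tau <= u < -self.eps / self.tau:
            self.support.append(np.asarray(x, dtype=float))
            self.coef.append(gamma_k * self.C * self.tau)
        # Otherwise the sample is in the insensitive band or is an outlier: no new term.
        return u

# Synthetic stream (assumed): noisy sine with occasional large outliers.
rng = np.random.default_rng(0)
model = TIPOSVRSketch()
for _ in range(200):
    x = rng.uniform(-3.0, 3.0, size=1)
    y = np.sin(x[0]) + 0.05 * rng.standard_normal()
    if rng.random() < 0.05:
        y += 50.0  # outlier
    model.partial_fit(x, y)
print("support vectors kept:", len(model.support), "of", model.k, "samples")
```

Storing only the samples that trigger one of the two active branches reflects the sparsity discussed above: samples in the insensitive band and samples in the truncated region never enter the expansion.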

#### *4.3. Convergence of TIPOSVR*

In their study of regularized instantaneous risk minimization with Canal loss, the authors of [22] show that the regularized Canal loss satisfies an inequality analogous to that of a convex function on **R**, apart from two small, unidentifiable intervals. Strong pseudo-convexity is defined based on this representation, and the convergence performance of NROR, the online regression algorithm of [22], is analyzed using online convex optimization theory. It has been demonstrated that if the prediction deviation sequence does not fall into the unidentifiable region of Canal loss, the average instantaneous risk converges to the minimum regularized risk at a rate of $o(T^{-1/2})$.

Drawing inspiration from [22], this section establishes the strong pseudo-convexity of the regularized TIPL and derives the rate at which TIPOSVR's average instantaneous risk converges to the minimum regularized risk. The definitions and propositions on strong pseudo-convexity in this section are taken from [22].

**Definition 1** ([22] Strong pseudo-convexity)**.** *A function* $f: \chi \to \mathbb{R}$ *is said to be strongly pseudo-convex (SPC) on* $\chi_1 \subset \chi$ *with respect to* $\overline{\mathbf{x}} \in \chi$ *if*

$$f(\mathbf{x}) - f(\overline{\mathbf{x}}) \le K \langle f'(\mathbf{x}), \mathbf{x} - \overline{\mathbf{x}} \rangle \tag{13}$$

*holds for all* $\mathbf{x} \in \chi_1$*, where* $f'(\mathbf{x})$ *is a Clarke subgradient of f at* $\mathbf{x}$ *and* $K > 0$ *is a constant. If Inequality (13) holds with respect to any* $\overline{\mathbf{x}} \in \chi_1$*, f is called SPC on* $\chi_1 \subset \chi$*. The collection of SPC functions on* $\chi_1$ *with* $K > 0$ *is denoted as* $\mathcal{W}_K(\chi_1)$*.*

When *K* = 1, a strongly pseudo-convex function is equivalent to a convex function. The regularized TIPL is a piecewise convex function, so further analysis is required to establish its strong pseudo-convexity. Propositions 1 and 2 [22] enable us to verify the strong pseudo-convexity of the regularized TIPL.

**Proposition 1** ([22])**.** *Let* $f: \mathbb{R} \to \mathbb{R}$ *be a univariate continuous function. Assume that on each of the intervals* $(-\infty, a]$, $(a, b)$, $[b, \infty)$*,* $f(x)$ *is convex, and* $f'_{-}(a) < 0$, $f'_{+}(b) > 0$, $f'_{+}(a) = 0$, $f'_{-}(b) = 0$*. Then Inequality (13) holds for any fixed* $\overline{x} \in \mathbb{R}$ *and* $x \in \mathbb{R}$ *with*

$$K = \max\left\{1, \frac{f_{-}'(a)}{f_{+}'(a)}, \frac{f_{+}'(a)}{f_{-}'(a)}, \frac{f_{-}'(b)}{f_{+}'(b)}, \frac{f_{+}'(b)}{f_{-}'(b)}, \frac{f_{-}'(a)}{f_{+}'(b)}, \frac{f_{+}'(b)}{f_{-}'(a)}\right\} \tag{14}$$

**Proposition 2** ([22])**.** *Let* $f: \mathbb{R} \to \mathbb{R}$ *be a univariate continuous function. Let* $a_0 < a_1 < \cdots < a_m$ *be real numbers with* $a_0 = -\infty$ *and* $a_m = +\infty$*. On each interval* $[a_i, a_{i+1}]$*,* $i = 0, 1, \ldots, m - 1$*,* $f(x)$ *is convex. Let S be the set of minimum points of f on* $\mathbb{R}$*. Suppose that the optimal solution set satisfies* $S \subset [a_q, a_{q+1}]$ *with* $q \in [0, \cdots, m - 1]$*. Moreover, suppose that* $f(x)$ *is strictly decreasing when* $x \le \inf S$ *and strictly increasing when* $x \ge \sup S$*. Then, for any fixed* $\overline{x} \in [a_0, a_m]$ *and* $x \in [a_0, a_m]$*, Inequality (13) holds with*

$$K = \max\left\{1, \frac{f_{+}'(a_{\mu})}{f_{-}'(a_{\upsilon+1})}, \frac{f_{-}'(a_i)}{f_{+}'(a_j)} \;\middle|\; q \in [0, \ldots, m-1],\ \mu \in [\upsilon+1, \ldots, q],\ \upsilon \in [0, \ldots, q-1],\ i \in [q+1, \ldots, j],\ j \in [q+1, \ldots, m-1] \right\} \tag{15}$$

It is evident from Propositions 1 and 2 that the parameter *K* of strong pseudo-convexity is determined by the directional derivatives at the endpoints of the intervals. The strong pseudo-convexity of the regularized TIPL is obtained from Lemmas 1 and 2. The proofs of the lemmas and theorems can be found in the supplementary material.

**Lemma 1.** *Denote* $\Omega_0 = [t_0 - \delta/\tau - C|\beta|,\ t_0 - \delta/\tau] \cup [t_0 + \delta,\ t_0 + \delta + C|\beta|]$ *and suppose* $0 \notin \Omega_0$*. Then* $f(t) = \frac{1}{2}t^2 + C \cdot L_{\text{TIP}}(\beta(t - t_0))$ *is SPC with* $K = \max\left\{2,\ 1 + \frac{C \cdot \tau^2 X^2}{\delta},\ 1 + \frac{C \cdot \tau X^2}{\delta \cdot \tau - \varepsilon}\right\}$ *on* $\mathbb{R} \setminus \Omega_0$*.*

**Lemma 2.** *Let the instance sequence* $(\mathbf{X}_t, y_t)$ *satisfy* $\kappa(\mathbf{X}_t, \mathbf{X}_t) \le X^2$*. For a fixed* $g \in \mathcal{H}$*, let*

$$u^t = \left(f^t - \mathfrak{g}\right) / \left|\left[f^t - \mathfrak{g}\right]\right|,\\ t\_0 = y\_t - \mathfrak{g}(\mathbf{X}\_t) + u^t(\mathbf{X}\_t) \cdot \left,$$

$$\Omega\_0 = \left[-\delta / \tau - \left(u^t(\mathbf{X}\_t)\right)^2, -\delta / \tau\right] \cup \left[\delta, \delta + \left(u^t(\mathbf{X}\_t)\right)^2\right]$$

$$\text{Assuming } t\_0 \notin \Omega\_0, \tilde{\xi}\_t = f^t(\mathbf{X}\_t) - y\_t \notin \Omega\_0,\text{ we have}$$

$$\begin{aligned} &\mathcal{R}\_{\text{inst}}\left[f^t, \mathbf{X}\_t, y\_t\right] - \mathcal{R}\_{\text{inst}}[\mathcal{g}, \mathbf{X}\_t, y\_t] \le K \cdot \left<\bigotimes\_f \mathcal{R}\_{\text{inst}}\left[f^t, \mathbf{X}\_t, y\_t\right]\Big|\_{f=f^t}, f^t - \mathcal{g}\right>\_{\text{IV}} \end{aligned}$$

$$K = \max\left\{2, 1 + \frac{\mathbb{C} \cdot \tau^2 X^2}{\delta}, 1 + \frac{\mathbb{C} \cdot \tau X^2}{\delta \cdot \tau - \varepsilon}\right\}. \tag{16}$$

H

Lemma 2 demonstrates that, under the given assumptions, the instantaneous risk satisfies the strong pseudo-convexity inequality for *f<sup>t</sup>* and *g*. We can therefore employ online convex optimization techniques to analyze the convergence of TIPOSVR. The following theorem gives the rate at which the average instantaneous risk of TIPOSVR converges to the minimum risk.

**Theorem 1.** *Let the example sequence S* = {(**X**<sub>*t*</sub>, *y<sub>t</sub>*)}<sub>*t*=0</sub><sup>*T*</sup> *satisfy k*(**X**<sub>*t*</sub>, **X**<sub>*t*</sub>) ≤ *X*<sup>2</sup> *for all t. Let f*<sup>0</sup>, ··· , *f<sup>T</sup> be the hypothesis sequence produced by TIPOSVR, and let*

$$R\_{inst}[g, S] = \frac{1}{T} \sum\_{t=1}^{T} R\_{inst}[g, \mathbf{X}\_t, y\_t], \qquad \hat{g} = \arg\min\_{g \in \mathcal{H}} R\_{inst}[g, S].$$

*Fix C*, *ε* > 0 *and* 0 < *η* < *C, and set the learning rate η<sub>t</sub>* = *η* · *t*<sup>−1/2</sup>*. Assume that each hypothesis f<sup>t</sup> generated by TIPOSVR satisfies the assumptions of Lemma 2 for t* = 0, 1, 2, ··· , *T. Then*

$$\frac{1}{T} \sum\_{t=1}^{T} R\_{inst}\left[f^t, \mathbf{X}\_t, y\_t\right] \le R\_{inst}[\hat{g}, S] + \alpha T^{-1/2} + o\left(T^{-1/2}\right), \tag{17}$$

*where*

$$\alpha = \frac{2KX^2}{\eta} + 4KX^2\eta, \qquad K = \max\left\{2, 1 + \frac{C \cdot \tau^2 X^2}{\delta}, 1 + \frac{C \cdot \tau X^2}{\delta \cdot \tau - \varepsilon}\right\}.$$
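To get a feel for the size of the constants in Theorem 1, the following minimal sketch evaluates *K* from Equation (16) and *α* as it appears in the theorem. The function names are illustrative, the check that *δ*·*τ* − *ε* is positive is an added assumption to keep the third term well defined, and the value *η* = 0.25 is a hypothetical choice satisfying 0 < *η* < *C*. The other values are taken from the parameter grids used in Section 5, with *X*<sup>2</sup> = 1 because a Gaussian kernel satisfies *k*(*x*, *x*) = 1.

```python
import math


def spc_constant(C, tau, delta, eps, X2):
    """K from Equation (16); X2 is the bound on k(X_t, X_t)."""
    if delta * tau - eps <= 0:
        # Added assumption: keep the denominator of the third term positive.
        raise ValueError("delta * tau must exceed eps")
    return max(
        2.0,
        1.0 + C * tau ** 2 * X2 / delta,
        1.0 + C * tau * X2 / (delta * tau - eps),
    )


def regret_coefficient(K, X2, eta):
    """alpha from Theorem 1: 2*K*X^2 / eta + 4*K*X^2 * eta."""
    return 2.0 * K * X2 / eta + 4.0 * K * X2 * eta


def learning_rate(eta, t):
    """Step size eta_t = eta * t^(-1/2) for t >= 1."""
    return eta / math.sqrt(t)


K = spc_constant(C=0.5, tau=0.8, delta=0.8, eps=0.04, X2=1.0)
alpha = regret_coefficient(K, X2=1.0, eta=0.25)
print(K, alpha, learning_rate(0.25, 4))
```

For these values the maximum in Equation (16) is attained by the constant 2, so *K* does not depend on *τ* or *δ* here; the other two terms only take over when *C* is large relative to *δ* or to *δ*·*τ* − *ε*.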

Theorem 1 gives an *O*(*T*<sup>−1/2</sup>) regret bound. It requires that, for each *t*, *f*<sup>*t*−1</sup>(**X**<sub>*t*</sub>) − *y<sub>t</sub>* ∉ Ω<sub>0</sub>, where Ω<sub>0</sub> is the union of two intervals, each of length *u*<sup>2</sup>(**X**<sub>*t*</sub>) ≤ *X*<sup>2</sup>. When *f<sup>t</sup>* violates this assumption in practice, the prediction error *f<sup>t</sup>*(**X**<sub>*t*</sub>) − *y<sub>t</sub>* may fall within the zone Ω<sub>0</sub>, where the loss is flat. In this case, the sample (**X**<sub>*t*</sub>, *y<sub>t</sub>*) is identified as a non-support vector by TIPOSVR.
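The link between a flat loss and non-support vectors can be made concrete with a generic kernel online gradient descent step on the instantaneous risk ½‖*f*‖<sup>2</sup> + *C*·*L*(*f*(**X**<sub>*t*</sub>) − *y<sub>t</sub>*). The sketch below is not the authors' implementation: the class and function names are hypothetical, and the loss derivative is a symmetric truncated *ε*-insensitive stand-in (zero inside the *ε*-tube and beyond an assumed cutoff *δ*) rather than TIPL itself. It only illustrates that a zero derivative leaves the expansion without a new term.

```python
import math


def gaussian_kernel(x, z, kappa):
    """k(x, z) = exp(-kappa * ||x - z||^2)."""
    return math.exp(-kappa * sum((a - b) ** 2 for a, b in zip(x, z)))


def truncated_eps_grad(xi, eps, delta):
    """Stand-in loss derivative (assumption): sign(xi) for eps < |xi| < delta, else 0."""
    if eps < abs(xi) < delta:
        return 1.0 if xi > 0 else -1.0
    return 0.0


class OnlineKernelRegressor:
    """Kernel expansion f(x) = sum_i coef_i * k(center_i, x), updated one sample at a time."""

    def __init__(self, C, eta, kappa, loss_grad):
        self.C, self.eta, self.kappa = C, eta, kappa
        self.loss_grad = loss_grad
        self.centers, self.coefs = [], []
        self.t = 0

    def predict(self, x):
        return sum(a * gaussian_kernel(c, x, self.kappa)
                   for a, c in zip(self.coefs, self.centers))

    def update(self, x, y):
        """One gradient step; returns True if (x, y) became a support vector."""
        self.t += 1
        eta_t = self.eta / math.sqrt(self.t)       # eta_t = eta * t^(-1/2)
        g = self.loss_grad(self.predict(x) - y)    # derivative at xi = f(x) - y
        # The regularizer shrinks every existing coefficient.
        self.coefs = [(1.0 - eta_t) * a for a in self.coefs]
        if g != 0.0:
            # Only a non-zero loss derivative adds a new expansion term.
            self.centers.append(x)
            self.coefs.append(-eta_t * self.C * g)
        return g != 0.0


model = OnlineKernelRegressor(C=0.5, eta=0.25, kappa=1.0,
                              loss_grad=lambda xi: truncated_eps_grad(xi, 0.04, 0.8))
first = model.update((0.0,), 0.5)      # moderate error: becomes a support vector
second = model.update((0.0,), 100.0)   # outlier in the flat zone: skipped
```

In this toy run the second sample has an error far beyond the cutoff, so its derivative is zero and it contributes nothing to the model, which is the mechanism by which truncation limits the influence of outliers.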

#### **5. Numerical Experiments**

We performed experiments on multiple datasets with noise and outliers to evaluate the effectiveness of TIPOSVR, and compared its performance with that of other online SVR models. The datasets consist of synthetic datasets, benchmark datasets, and real datasets. The synthetic dataset is used to evaluate performance under controlled types of noise, while the benchmark and real-world datasets allow performance to be assessed in realistic settings. Three comparison algorithms are used: *ε*-SVR (SVR with *ε*-insensitive loss), SVQR (SVR with *ε*-insensitive pinball loss), and NROR (SVR with truncated *ε*-insensitive loss), as shown in Table 1. Batch algorithms are not included in the comparison because they are inappropriate for training with large datasets.

**Table 1.** Loss functions.


Experiments are carried out using Python 3.8 on a PC with an Intel i7-5500U CPU 2.40 GHz.

To ensure the accuracy and effectiveness of the assessment, we chose mean absolute error (MAE), root mean square error (RMSE), and running time (TIME) as the assessment metrics. Details of the evaluation criteria are presented in Table 2.

**Table 2.** Table of evaluation criteria.


MAE is the average of the absolute errors. RMSE is the square root of the mean squared error, which equals the standard deviation of the errors when their mean is zero. MAE is less sensitive to outliers than RMSE, which puts more emphasis on large errors.
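The different sensitivity of the two metrics is easy to see on a toy example. The following sketch uses the standard definitions of MAE and RMSE; the function names and the two small error patterns are illustrative only and are not data from the experiments.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


y_true = [0.0, 0.0, 0.0, 0.0]
spread = [1.0, 1.0, 1.0, 1.0]   # four errors of size 1
spiky = [0.0, 0.0, 0.0, 4.0]    # a single error of size 4

print(mae(y_true, spread), rmse(y_true, spread))
print(mae(y_true, spiky), rmse(y_true, spiky))
```

Both predictions have the same MAE, but concentrating the error in one sample doubles the RMSE, which is why RMSE reacts more strongly to outliers.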

Samples from the dataset are randomly selected to form the training and test sets, with outliers and noise added to the training set. The Gaussian kernel *k*(*x*, *x*′) = exp(−*κ*‖*x* − *x*′‖<sup>2</sup>) was selected as the kernel in this work. TIPOSVR has six hyperparameters: the insensitivity coefficient *ε*, the asymmetry parameter *τ*, the truncation parameter *δ*, the regularization parameter *C*, the learning rate *γ*, and the kernel parameter *κ*. Grid search with five-fold cross-validation was performed on the training set of each dataset to identify the parameter values giving the highest accuracy. To negate the effects of the kernel and regularization parameters, the most effective values of *C* and *κ* were identified through *ε*-SVR, and the same values were then used for the other three models. *C* was taken from {0.1, 0.5}, while *κ* was chosen from {0.5, 1, 2, 4, 8, 16}. The NROR algorithm was employed to determine the truncation parameter *δ* from {0.4, 0.8, 1.6}. Cross-validation and grid search in TIPOSVR were then used to select the asymmetry parameter *τ* from {0.4, 0.8, 1, 1.2, 1.4}. The insensitivity parameter *ε* was set to 0.04 for simplicity.
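The staged selection described above keeps the search small. As a rough sketch, the snippet below writes out the Gaussian kernel and counts the candidate settings, comparing the staged procedure with a hypothetical joint grid over *C*, *κ*, *δ*, and *τ*. The variable names are illustrative, the learning rate is left out because its candidate values are not listed, and each candidate would still be evaluated five times under five-fold cross-validation.

```python
import itertools
import math


def gaussian_kernel(x, z, kappa):
    """k(x, z) = exp(-kappa * ||x - z||^2) for equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-kappa * sq_dist)


# Candidate values quoted in the text.
C_GRID = [0.1, 0.5]
KAPPA_GRID = [0.5, 1, 2, 4, 8, 16]
DELTA_GRID = [0.4, 0.8, 1.6]
TAU_GRID = [0.4, 0.8, 1, 1.2, 1.4]
EPSILON = 0.04  # fixed, not searched

# Stage 1: (C, kappa) pairs scored with eps-SVR.
stage1 = list(itertools.product(C_GRID, KAPPA_GRID))
# Stages 2 and 3: delta via NROR, then tau via TIPOSVR, one parameter at a time.
staged_candidates = len(stage1) + len(DELTA_GRID) + len(TAU_GRID)
# For comparison: a joint grid would need every combination.
joint_candidates = len(stage1) * len(DELTA_GRID) * len(TAU_GRID)

print(staged_candidates, joint_candidates)
```

The price of the staged search is that interactions between parameters, for example between *δ* and *τ*, are not explored.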

#### *5.1. Synthetic Datasets*

The synthetic dataset is generated by a bivariate function defined as follows:

$$f(x\_1, x\_2) = \frac{\left(5 - x\_2\right)^2 + \left(6 - x\_1\right)^2}{\left(5 - x\_1\right)^2 + \left(5 - x\_2\right)^2} \tag{18}$$

where *x*<sub>1</sub> and *x*<sub>2</sub> are the input features of a sample, drawn from the uniform distribution *U*[0, 10], so that *x*<sub>1</sub> ∈ [0, 10], *x*<sub>2</sub> ∈ [0, 10], and (*x*<sub>1</sub>, *x*<sub>2</sub>) ≠ (5, 5), where the denominator vanishes. The outputs are generated by Equation (18), which is shown in Figure 2.
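A minimal sketch of how clean samples could be drawn is given below. It transcribes Equation (18) as printed and rejects the singular point (5, 5); the function names and the use of a fixed seed are assumptions made for illustration.

```python
import random


def target(x1, x2):
    """Equation (18) as printed; undefined at (5, 5), where the denominator is zero."""
    denominator = (5 - x1) ** 2 + (5 - x2) ** 2
    if denominator == 0:
        raise ValueError("Equation (18) is undefined at (5, 5)")
    return ((5 - x2) ** 2 + (6 - x1) ** 2) / denominator


def sample_clean(n, seed=0):
    """Draw n noise-free samples ((x1, x2), y) with x1, x2 ~ U[0, 10]."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        if (x1, x2) == (5.0, 5.0):
            continue  # skip the singular point
        data.append(((x1, x2), target(x1, x2)))
    return data


samples = sample_clean(5, seed=3)
```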

**Figure 2.** Function for synthetic dataset.

Different types of noise are added to the dataset to evaluate the effectiveness of the proposed algorithm. Consider the label of a training sample *ỹ<sub>i</sub>* to be of the form *ỹ<sub>i</sub>* = *y<sub>i</sub>* + *ζ<sub>i</sub>*, where *ζ<sub>i</sub>* is noise sampled according to the noise distribution. The synthetic dataset is affected by five different types of noise: symmetric homoscedastic noise, symmetric heteroscedastic noise, asymmetric homoscedastic noise, asymmetric heteroscedastic noise, and asymmetric heteroscedastic noise that varies with the independent variable. Samples generated by the bivariate function are polluted with noise. Five noisy training datasets are generated as follows:

Type I.

$$
\tilde{y}\_i^{(1)} = y\_i + \zeta\_i^{(1)} \tag{19}
$$

*ζ<sub>i</sub>*<sup>(1)</sup> is Gaussian noise with the normal distribution *N*(0, 2), and *ỹ<sub>i</sub>*<sup>(1)</sup> is the data label that contains symmetric homoscedastic noise.

Type II.

$$
\tilde{y}\_i^{(2)} = y\_i + \zeta\_i^{(2)} \tag{20}
$$

*ζ<sub>i</sub>*<sup>(2)</sup> is Gaussian noise whose distribution obeys *N*(0, *σ*<sup>2</sup>), where *σ*<sup>2</sup> is a random number on the interval [0, 6], and *ỹ<sub>i</sub>*<sup>(2)</sup> is the data label containing symmetric heteroscedastic noise.

Type III.

$$
\tilde{y}\_i^{(3)} = y\_i + \zeta\_i^{(3)} \tag{21}
$$

*ζ<sub>i</sub>*<sup>(3)</sup> is Chi-square noise whose distribution obeys *χ*<sup>2</sup>(1), and *ỹ<sub>i</sub>*<sup>(3)</sup> is the data label containing asymmetric homoscedastic noise.

Type IV.

$$
\tilde{y}\_i^{(4)} = y\_i + \zeta\_i^{(4)} \tag{22}
$$

*ζ<sub>i</sub>*<sup>(4)</sup> is Chi-square noise with the distribution *χ*<sup>2</sup>(*n*), where *n* is a random number on the interval [1, 4], and *ỹ<sub>i</sub>*<sup>(4)</sup> is the data label containing asymmetric heteroscedastic noise.

Type V.

$$
\tilde{y}\_i^{(5)} = y\_i + x\_i \cdot \zeta\_i^{(1)} \tag{23}
$$

*ỹ<sub>i</sub>*<sup>(5)</sup> is the data label containing asymmetric and heteroscedastic noise, where the noise *x<sub>i</sub>* · *ζ<sub>i</sub>*<sup>(1)</sup> varies significantly with the independent variable *x*.

The probability density function of Gaussian distribution *N*(0, 2) is shown in Figure 3. The mean is zero and the variance is 2. The skewness of Gaussian distribution *N*(0, 2) is 0. This distribution is symmetric.

The probability density function of *χ*<sup>2</sup>(4) − 4 is shown in Figure 4. The mean of *χ*<sup>2</sup>(4) − 4 is zero and its variance is 8. The skewness of *χ*<sup>2</sup>(4) − 4 is √2. This distribution is asymmetric.

**Figure 3.** The probability density function of Gaussian distribution *N*(0, 2).

**Figure 4.** The probability density function of Chi square distribution *χ*<sup>2</sup>(4) − 4.
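The stated moments can be cross-checked against the standard closed forms for a Chi-square variable with *k* degrees of freedom (mean *k*, variance 2*k*, skewness √(8/*k*)); subtracting a constant changes only the mean. The sketch below assumes that the distribution in Figure 4 is a Chi-square distribution with four degrees of freedom shifted by 4, which is the case that reproduces a zero mean, a variance of 8, and a skewness of √2. The function name is illustrative.

```python
import math


def shifted_chi_square_moments(k, shift):
    """Mean, variance, and skewness of (Chi-square with k d.o.f.) minus a constant shift."""
    mean = k - shift
    variance = 2.0 * k
    skewness = math.sqrt(8.0 / k)
    return mean, variance, skewness


mean, variance, skewness = shifted_chi_square_moments(k=4, shift=4)
print(mean, variance, skewness)
```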

The noise level is determined by the ratio of noisy data in the training set, which is set to 0%, 5%, 20%, 40%, 50%, or 60%. The training set consists of 5000 samples with noise, and the test set consists of 5000 samples without noise. The experimental results are listed in Tables 3–7. The performance of TIPOSVR demonstrates its effectiveness in making accurate predictions across diverse datasets.
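The contamination procedure for the five noise types can be sketched as follows. Several details are assumptions rather than facts from the text: the second parameter of *N*(0, 2) is read as a variance, the degrees of freedom for type IV are drawn as integers from {1, 2, 3, 4}, the type V noise is scaled by the first input feature, and the noisy samples are chosen uniformly at random. All function names are hypothetical.

```python
import math
import random


def chi_square(rng, k):
    """Draw from a Chi-square distribution with k degrees of freedom (Gamma(k/2, scale 2))."""
    return rng.gammavariate(k / 2.0, 2.0)


def draw_noise(kind, x, rng):
    """One noise value for noise types I to V; x is the feature tuple of the sample."""
    if kind == "I":    # N(0, 2): variance 2, so standard deviation sqrt(2)
        return rng.gauss(0.0, math.sqrt(2.0))
    if kind == "II":   # N(0, sigma^2) with sigma^2 drawn uniformly from [0, 6]
        return rng.gauss(0.0, math.sqrt(rng.uniform(0.0, 6.0)))
    if kind == "III":  # Chi-square with 1 degree of freedom
        return chi_square(rng, 1)
    if kind == "IV":   # assumption: integer degrees of freedom in {1, 2, 3, 4}
        return chi_square(rng, rng.randint(1, 4))
    if kind == "V":    # assumption: the noise is scaled by the first feature
        return x[0] * rng.gauss(0.0, math.sqrt(2.0))
    raise ValueError(f"unknown noise type: {kind}")


def contaminate(xs, ys, kind, rate, seed=0):
    """Return a copy of ys with noise added to a random fraction `rate` of the labels."""
    rng = random.Random(seed)
    n_noisy = round(rate * len(ys))
    noisy_idx = rng.sample(range(len(ys)), n_noisy)
    out = list(ys)
    for i in noisy_idx:
        out[i] = ys[i] + draw_noise(kind, xs[i], rng)
    return out


xs = [(1.0, 2.0)] * 100
ys = [0.0] * 100
noisy = contaminate(xs, ys, "III", rate=0.2, seed=1)
```

With a noise rate of 0% the labels are returned unchanged, which corresponds to the noise-free baseline in the tables.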

Table 3 lists the accuracy and learning time of the algorithms for the dataset with symmetric homoscedastic noise. The best result for each indicator is shown in bold. At low noise levels, the performance of the various algorithms is comparable, yet TIPOSVR's running time is significantly longer than that of the comparison algorithms. The benefits of TIPOSVR are not particularly noticeable on datasets with symmetric, low-level noise. However, when the noise level is high, TIPOSVR outperforms the comparison algorithms.


**Table 3.** Operation results of algorithms under the influence of noise type I for synthetic dataset.

The accuracy and learning time of the algorithms on the datasets with symmetric heteroscedastic noise are summarized in Table 4. The MAE and RMSE of TIPOSVR are the lowest when the noise rate is 5%, 40%, 50%, and 60%, showing that TIPOSVR fits the data well. TIPOSVR requires the least training time when the noise rates are 50% and 60%, indicating that it provides accurate predictions while saving computational resources. TIPOSVR shows the best accuracy, followed by NROR and SVQR. The *ε*-insensitive loss provides the worst accuracy for datasets corrupted by heteroscedastic noise. The asymmetric *ε*-insensitive loss and the truncated asymmetric *ε*-insensitive loss outperform the symmetric loss, indicating that the ability to deal with heteroscedastic noise can be significantly enhanced by adjusting the asymmetry parameter.


**Table 4.** Operation results of algorithms under the influence of noise type II for synthetic dataset.

Results of the different algorithms under asymmetric homoscedastic noise are tabulated in Table 5. In the absence of noise, SVQR is clearly superior to the other methods in both time and accuracy. This indicates that truncation has no effect on the regression when there is no noise but leads to a longer running time. As the noise rate increases, the superiority of TIPOSVR becomes more apparent, especially at noise rates of 40%, 50%, and 60%, where it attains the best accuracy. The results show that TIPOSVR can achieve the highest accuracy in a relatively short time for asymmetric noise.

Simulation results for the datasets with asymmetric heteroscedastic noise are presented in Table 6. TIPOSVR achieves the best MAE at all noise levels and yields the lowest RMSE when the noise rate is 0%, 20%, 50%, and 60%. TIPOSVR performs better than NROR. Since both methods use truncated losses, this comparison reveals that the asymmetric truncated loss is more effective than the symmetric truncated loss for heteroscedastic noise. The asymmetric feature diminishes the influence of asymmetric noise on the regression function. The results confirm the theoretical analysis.

The test accuracy and learning time of different algorithms, as the noise value varies with the independent variable, are shown in Table 7. This dataset presents a more intricate situation. The noise in the dataset depends on the independent variable. More noise and outliers are likely to appear. The results show that all algorithms are sensitive to noise level. TIPOSVR is still significantly more accurate than the comparison algorithms and shows its proficiency in dealing with general noise and outliers.


**Table 5.** Operation results of algorithms under the influence of noise type III for synthetic dataset.

**Table 6.** Operation results of algorithms under the influence of noise type IV for synthetic dataset.



**Table 7.** Operation results of algorithms under the influence of noise type V for synthetic dataset.

In summary, the study indicates that TIPOSVR is effective when applied to datasets with different types of noise. In particular, at high noise levels (when the noise rate reaches 50% and 60%), TIPOSVR provides accurate predictions in a timely manner. This algorithm shows excellent robustness and generalizability.

#### *5.2. Benchmark Datasets*

In this section, four datasets are selected from the UCI benchmark repository: the Dry Bean dataset (DB), the Grid Stability Simulation dataset (EGSSD), the Abalone dataset, and the Gas Turbine Generation dataset (CCPP). To evaluate the results, three benchmark algorithms are employed: *ε*-SVR, NROR, and SVQR. Table 8 provides an overview of the attributes and sample numbers of the UCI benchmark datasets.

**Table 8.** Benchmark datasets description.


The data are normalized and split equally into training and test sets. Experiments are conducted on datasets with symmetric noise of both homoscedastic and heteroscedastic variance. The performance of TIPOSVR is evaluated and compared with that of the comparison algorithms on the four benchmark datasets. The selection of the hyperparameters follows the same approach as that used for the synthetic datasets.

Figures 5 and 6 show the MAE and RMSE values obtained by TIPOSVR and the comparison algorithms on the UCI benchmark datasets with homoscedastic Gaussian noise added. On dataset A, no remarkable disparity is observed among the four methods without noise. The performance of *ε*-SVR and SVQR varies significantly as the noise rate increases. When the noise rate reaches 40%, 50%, and 60%, TIPOSVR and NROR demonstrate superior performance compared to the other two methods. In the majority of cases, TIPOSVR is the most effective; at a noise rate of 60%, it is inferior only to NROR. On dataset B, TIPOSVR, NROR, and *ε*-SVR demonstrate the same level of performance with no noise present. As the noise rate increases, both *ε*-SVR and SVQR display larger variations, whereas TIPOSVR and NROR vary less. At noise rates of 40% and 50%, TIPOSVR proves to be more effective than NROR. At a noise rate of 60%, TIPOSVR and NROR demonstrate comparable results. On dataset C, TIPOSVR, NROR, and *ε*-SVR outperform SVQR when the noise level is 0%, 5%, 20%, and 40%. At noise rates of 50% and 60%, NROR and TIPOSVR show superior performance compared to the other two methods, with TIPOSVR exhibiting the best results. On dataset D, NROR and TIPOSVR demonstrate the same level of performance regardless of the noise rate, and both are superior to the other two methods. When noise is present, the RMSE of TIPOSVR is smaller than that of NROR.

The results indicate that TIPOSVR does not excel on datasets with symmetric homoscedastic noise at low noise rates, and may even be inferior to NROR. At high noise rates, TIPOSVR's performance is equivalent to that of NROR, and in some cases better, while surpassing the other two comparison algorithms. TIPOSVR is therefore successful in dealing with regression problems that have a high noise level.

**Figure 5.** MAE for Gaussian noise of homogeneity on (**A**) CCPP, (**B**) DB, (**C**) EGSSD, (**D**) Abalone.

**Figure 6.** RMSE for noise of homogeneity on (**A**) CCPP, (**B**) DB, (**C**) EGSSD, (**D**) Abalone.

The bar graphs in Figures 7 and 8 illustrate the MAE and RMSE of each algorithm on the benchmark datasets with heteroscedastic noise added. If the noise rate is not more than 40% for datasets A and C, there is no significant distinction between the four methods. At noise rates of 40%, 50%, and 60%, NROR and TIPOSVR demonstrate superior performance compared to *ε*-SVR and SVQR, with TIPOSVR displaying the best results at 60%. On dataset B, NROR and TIPOSVR prove to be more effective than SVQR and *ε*-SVR at a noise rate of 20%, and TIPOSVR outperforms NROR when the noise rate increases to 50% and 60%. On dataset D, TIPOSVR and NROR consistently perform better than SVQR and *ε*-SVR, and the two perform at an equivalent level.

On the datasets with heteroscedastic noise, when the noise rate is high, TIPOSVR and NROR have comparable performance, and TIPOSVR is usually more effective than NROR. The loss function that accounts for asymmetry exhibits improved performance.

**Figure 7.** MAE for noise of heteroscedasticity on (**A**) CCPP, (**B**) DB, (**C**) EGSSD, (**D**) Abalone.

**Figure 8.** RMSE for noise of heteroscedasticity on (**A**) CCPP, (**B**) DB, (**C**) EGSSD, (**D**) Abalone.

#### *5.3. Real Datasets*

We perform an experiment using actual data from the gas consumption dataset [37]. The dataset consists of 18 features: minimum temperature (minT), average temperature (aveT), maximum temperature (maxT), minimum dew point (minD), average dew point (aveD), maximum dew point (maxD), minimum humidity (minH), average humidity (aveH), maximum humidity (maxH), minimum visibility (minV), average visibility (aveV), maximum visibility (maxV), minimum air pressure (minA), average air pressure (aveA), maximum air pressure (maxA), minimum wind speed (minW), average wind speed (aveW) and maximum wind speed (maxW), and the prediction label is Natural Gas Consumption (NGC).

No noise is added to the labels of the real dataset. The data are split equally into a training set and a test set. The parameter settings and comparison algorithms remain the same as before, and the results are presented in Table 9.


**Table 9.** Evaluation table based on real dataset experimental result.

Table 9 illustrates the performance of various algorithms on an actual dataset. In Table 9, TIPOSVR is shown to perform optimally with an MAE of 0.059 and an RMSE of 0.102, indicating its ability to provide a reliable estimate of the real datasets. TIPOSVR remains a viable option compared to other algorithms when dealing with real-world problems.

In summary, the experiments on all datasets show the effectiveness of TIPOSVR. It can still represent the data distribution accurately even when the data are corrupted by noise or outliers. This indicates that the algorithm is advantageous in handling noisy data and lends itself to regression on data streams.

#### **6. Conclusions**

In this paper, we review the progress of SVR and find that the existing regression algorithms are insufficient to effectively predict dynamic data streams containing noise and outliers.

This study introduces TIPL to assess instantaneous risk in the SVR model. This new loss function combines an asymmetric loss with a truncated non-convex loss, which offers a variety of advantages. TIPL adjusts the weights of the penalties for positive and negative errors using the asymmetry parameter *τ*, which allows us to partition the fixed width of the *ε*-insensitive area without sacrificing its sparsity. Horizontal truncation is used to deal with large noise and outliers. TIPL incorporates and extends the pinball loss, *ε*-insensitive loss, and truncated *ε*-insensitive loss.

Within the regularization framework, a TIPL-based online SVR algorithm is developed to perform robust regression in a data flow context. Given the non-convexity of the proposed model, an online gradient descent algorithm is chosen to solve the problem.

Experiments are performed on synthetic datasets, UCI datasets, and real datasets corrupted by Gaussian, heteroscedastic, asymmetric, and outlier noise. Our model has been found to be more resilient to noise and outliers than some classical and advanced methods. It also has better prediction performance and faster learning speed. The proposed model is therefore expected to provide more accurate predictions in the dynamic flow of data while consuming fewer computational resources than the batch learning approach.

The main disadvantage of this model lies in the use of multiple hyperparameters. Choosing the appropriate parameter values is essential for the algorithm to achieve optimal performance. In our ongoing research, we aim to develop techniques for determining the optimal hyperparameters for a given training set.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math11030709/s1.

**Author Contributions:** Conceptualization, X.S. and Z.Z.; methodology, X.S., J.Y. and Z.Z.; software, Y.X.; writing—original draft preparation, X.S., X.L. and Z.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially supported by the National Natural Science Foundation of China under Grant No. 71901219.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The datasets used in this paper are all available at the following links: 1. http://archive.ics.uci.edu/ml/ (accessed on 12 December 2022); 2. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ (accessed on 12 December 2022).

**Acknowledgments:** This work is supported by National Natural Science Foundation of China (Grant No. 71901219).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


TIPOSVR Support Vector Regression based on Truncated *ε*-insensitive pinball loss function

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
