*3.2. Support Vector Regression*

Support vector regression (SVR) is an emerging nonlinear regression method based on statistical learning theory that yields more stable solutions than traditional neural network models. Adopting the structural risk minimization principle reduces overfitting and local minima issues. In SVR, the nonlinear regression problem is transformed into a linear regression problem by mapping the input data into a high-dimensional feature space through kernel functions [47]. Consider a set of data $\{(\mathbf{x}\_i, y\_i)\}\_{i=1}^{n} \subset \mathbb{R}^m \times \mathbb{R}$, where $\mathbf{x}\_i$ is a vector of inputs and $y\_i$ is the scalar output. In the nonlinear regression case, the estimation function is linear in the feature space and can be formulated as $f(\mathbf{x}) = \langle w, \phi(\mathbf{x})\rangle + b$, where $w$ is the weight vector, $\phi(\mathbf{x})$ is the mapping function, $\langle \cdot, \cdot \rangle$ denotes the dot product in the feature space, and *b* is a constant. Several loss functions can be used in SVR, including the Huber, Gaussian, *ε*-insensitive, and Laplacian losses. The robust *ε*-insensitive loss function introduced by Vapnik [48] is the most frequently used and can be formulated as follows:

$$L\_{\varepsilon}(f(\mathbf{x}) - y) = \begin{cases} |f(\mathbf{x}) - y| - \varepsilon & \text{if } |f(\mathbf{x}) - y| \ge \varepsilon \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where *ε* is the radius of the tube around the regression function *f*(*x*), which affects the number of support vectors used to construct the regression function. Errors on points inside the tube incur zero cost. Figure 2 shows a schematic diagram of nonlinear regression by SVR.

**Figure 2.** A schematic diagram of the nonlinear regression by SVR based on the ε-insensitive loss function in the feature space.
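
For illustration, the *ε*-insensitive loss in Equation (3) can be evaluated directly. The following Python sketch (not part of the original study) implements it with NumPy; the function name and the default *ε* are chosen only for this example.

```python
import numpy as np

def eps_insensitive_loss(f_x, y, eps=0.1):
    """Vapnik's eps-insensitive loss (Eq. 3): zero inside the eps-tube,
    linear in the residual magnitude outside it."""
    residual = np.abs(np.asarray(f_x) - np.asarray(y))
    return np.where(residual >= eps, residual - eps, 0.0)
```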

The SVR performs linear regression in the feature space using the *ε*-insensitive loss function by minimizing the empirical risk $R\_{emp} = \frac{1}{n}\sum\_{i=1}^{n} L\_{\varepsilon}(f(\mathbf{x}\_i) - y\_i)$ together with the regularization term $\|w\|^2$, which reduces the model complexity (flatness). The slack variables $\xi\_i$ and $\xi\_i^\*$ represent the deviation of training samples outside the *ε*-insensitive zone. The optimal regression function can be obtained from [47]:

$$\min \frac{1}{2} \|w\|^2 + C \sum\_{i=1}^{n} (\xi\_i + \xi\_i^\*) \tag{4}$$

$$\text{s.t. } y\_i - \langle w, \phi(\mathbf{x}\_i) \rangle - b \le \varepsilon + \xi\_i \tag{5}$$

$$\langle w, \phi(\mathbf{x}\_i) \rangle + b - y\_i \le \varepsilon + \xi\_i^\* \tag{6}$$

$$\xi\_i, \xi\_i^\* \ge 0 \tag{7}$$

where *C* is the regularization constant determining the trade-off between the empirical risk and the regularization term. The above optimization problem can be solved by introducing the Lagrange multipliers $\alpha\_i$ and $\alpha\_i^\*$ and applying the Karush–Kuhn–Tucker conditions, which yields the following dual form:

$$\max \; -\varepsilon \sum\_{i=1}^{n} (\alpha\_i^\* + \alpha\_i) + \sum\_{i=1}^{n} (\alpha\_i^\* - \alpha\_i) y\_i - \frac{1}{2} \sum\_{i,j=1}^{n} (\alpha\_i^\* - \alpha\_i)(\alpha\_j^\* - \alpha\_j) \, K(\mathbf{x}\_i, \mathbf{x}\_j) \tag{8}$$

$$\text{s.t. } \sum\_{i=1}^{n} (\alpha\_i^\* - \alpha\_i) = 0 \tag{9}$$

$$0 \le \alpha\_i \le C, \ i = 1, \dots, n \tag{10}$$

$$0 \le \alpha\_i^\* \le C, \ i = 1, \dots, n \tag{11}$$
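
As an illustration only, the dual problem (8)–(11) can be solved with a generic convex optimizer. The following Python sketch uses NumPy and the cvxpy package (the solver choice, function name, and default parameter values are assumptions made for this example, not taken from the study) and assumes a precomputed positive semidefinite kernel matrix K.

```python
import numpy as np
import cvxpy as cp

def solve_svr_dual(K, y, C=1.0, eps=0.1):
    """Solve the SVR dual QP of Eqs. (8)-(11) for a precomputed kernel matrix K."""
    n = len(y)
    alpha = cp.Variable(n)       # alpha_i
    alpha_star = cp.Variable(n)  # alpha_i^*
    d = alpha_star - alpha       # combined dual coefficients
    # Cholesky factor of K (with a small jitter) so the quadratic term
    # -0.5 * d^T K d can be written as a concave sum-of-squares expression.
    L = np.linalg.cholesky(K + 1e-10 * np.eye(n))
    objective = cp.Maximize(
        -eps * cp.sum(alpha_star + alpha) + d @ y - 0.5 * cp.sum_squares(L.T @ d)
    )
    constraints = [cp.sum(d) == 0,
                   alpha >= 0, alpha <= C,
                   alpha_star >= 0, alpha_star <= C]
    cp.Problem(objective, constraints).solve()
    return alpha.value, alpha_star.value
```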

where $K(\mathbf{x}\_i, \mathbf{x}\_j)$ is the kernel function, defined as the inner product of $\phi(\mathbf{x}\_i)$ and $\phi(\mathbf{x}\_j)$ in the feature space. After solving the optimization problem, the optimal form of the regression function can be obtained as [47]:

$$f(\mathbf{x}) = \sum\_{i=1}^{n} (\alpha\_i - \alpha\_i^\*) K(\mathbf{x}, \mathbf{x}\_i) + b \tag{12}$$
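
Once the dual coefficients and the bias *b* are available, Equation (12) can be evaluated directly. The sketch below is illustrative only; the Gaussian (RBF) kernel, the function names, and the default kernel width are assumptions for this example rather than choices stated in the text.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def svr_predict(X_new, X_train, alpha, alpha_star, b, gamma=1.0):
    """Evaluate Eq. (12): f(x) = sum_i (alpha_i - alpha_i^*) K(x, x_i) + b."""
    K = rbf_kernel(X_new, X_train, gamma)   # shape (n_new, n_train)
    return K @ (alpha - alpha_star) + b
```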

The estimation accuracy depends on the choice of the parameters *C* and *ε* and of the kernel parameters. SVR was chosen because it is robust to outliers, its decision model can be easily updated, it has excellent generalization capacity with high prediction accuracy, and its implementation is straightforward.
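
As a sketch of how such parameter selection can be carried out in practice, the following Python example tunes *C*, *ε*, and the RBF kernel width with scikit-learn on synthetic toy data; the data, parameter grid, and scoring metric are assumptions made only for this illustration and are not taken from the study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data for illustration only; the study's actual inputs and outputs are not reproduced here.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = {
    "svr__C": [1, 10, 100],            # trade-off constant C in Eq. (4)
    "svr__epsilon": [0.01, 0.1, 0.5],  # tube radius eps in Eq. (3)
    "svr__gamma": ["scale", 0.1, 1.0], # RBF kernel width
}
search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```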
