1. Introduction
Regression models, also known as response models, establish a mapping between input features and outputs and can thus accurately predict the outputs of unseen samples [1]. The most commonly used regression models are single-output models, i.e., models with one or more inputs but only one output. In real engineering problems, however, there are often multiple outputs. For example, in novel battery material discovery, the multidimensional properties of battery electrode materials must be predicted simultaneously and comprehensively to help accelerate material discovery and design [2]. In environmental forecasting, particulate matter concentrations at different air quality monitoring stations, which often exhibit potentially nonlinear spatial correlations, need to be predicted simultaneously; reliable and accurate predictions support crisis response and can reduce health risks [3]. Ultra-high-performance fiber-reinforced concrete (UHPFRC) is used in a variety of civil engineering applications, and its structural behavior is closer to that of steel. To investigate the effect of component dosage on its strain and energy absorption capacity under peak tension and to optimize the material dosage, both outputs need to be predicted simultaneously [4]. Multiple outputs can be modeled separately, but this approach ignores the potential correlation between the outputs and therefore loses information. The correlation between the outputs can instead be exploited to build a multi-output model, also known as a multi-response or multi-task model.
Support vector regression (SVR) was first proposed by Vapnik based on the principle of structural risk minimization [5]. SVR obtains predictions for a single output by solving a quadratic programming problem. Compared with other models, SVR has superior performance due to the structural risk minimization principle, which allows it to avoid overfitting and achieve better output approximation [6,7]. Bayesian support vector regression (BSVR) introduces Gaussian process assumptions and Bayesian inference on top of SVR to obtain predicted values together with their distributions. The BSVR model not only provides estimates at unknown sample points but also inherits the adaptivity and prediction error distributions of Bayesian methods [8]. Meanwhile, SVR has shown superior performance in dealing with nonlinear problems and avoiding overfitting, with good generalization ability [9]. Bayesian support vector machines have therefore received considerable attention in recent years; see, e.g., [9,10,11] and the references therein.
Multi-output regression aims to establish a mapping from multivariate inputs to multiple outputs [12]. Despite the potential utility of BSVR, its standard form cannot handle multiple outputs. The simplest way to deal with the multi-output problem is to model the outputs individually: for each output, a separate model is built independently. This treatment is simple but does not account for the correlation between the outputs, so it is suitable only for scenarios in which the outputs are uncorrelated [13]. Another approach is chain modeling [14], which predicts one output and then predicts the next output using the previous prediction as an additional input, and so on. Chain modeling, however, requires determining the order of the outputs and the dependencies between them. Both baseline strategies are sketched below.
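For concreteness, the two baseline strategies can be sketched with scikit-learn's MultiOutputRegressor and RegressorChain wrapped around an SVR base learner. This is an illustrative sketch with placeholder data and settings, not part of the method proposed in this paper.

```python
# Illustrative sketch of the two baseline strategies (assumed setup,
# placeholder data): independent per-output SVRs vs. a regressor chain.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor, RegressorChain

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 3))            # 100 samples, 3 inputs
Y = np.column_stack([X.sum(axis=1),              # two correlated outputs
                     np.sin(X.sum(axis=1))])

# (a) Independent modeling: one SVR per output; output correlations ignored.
independent = MultiOutputRegressor(SVR()).fit(X, Y)

# (b) Chain modeling: output 0 is predicted first and then appended to the
#     inputs when predicting output 1; the order must be fixed in advance.
chain = RegressorChain(SVR(), order=[0, 1]).fit(X, Y)

print(independent.predict(X[:2]))
print(chain.predict(X[:2]))
```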
Considering that multiple outputs are often correlated and that modeling them individually loses information, an increasing number of multi-output modeling approaches have been proposed. Multi-output modeling exploits the correlation between the outputs so that each output can use information from the other outputs to obtain more accurate predictions [13]. Several methods extend the support vector machine so that it can handle multiple outputs simultaneously. Pérez-Cruz et al. [14] replaced the hypercubic insensitive zone of the standard ϵ-tube with a hyperspherical one by treating the errors of data points lying outside the tube jointly across outputs; this hyperspherical insensitive zone is designed to be more effective than modeling each output individually. Zhang et al. [15] proposed an extended LS-SVR (ELS-SVR), which extends the original feature space using vector virtualization so that the multi-output case is represented as an equivalent single-output case in the extended feature space and solved with a least-squares support vector machine. Inspired by multi-task learning, Xu et al. [16] split the weight vector of the least-squares support vector machine into two parts, one carrying generic information and the other carrying output-specific information, thus characterizing the correlation between the outputs; this method is referred to as multi-output LS-SVR (MLS-SVR). The literature [17] reviews these correlation-based methods and also analyzes their disadvantages: the hyperspherical ϵ-tube of M-SVR does not exhibit an advantage over a hypercubic one, ELS-SVR cannot handle negative correlations, MLS-SVR does not handle well the case of only partial correlations, and none of these methods consistently outperforms single-output support vector machines.
The above methods modify SVR and LS-SVR so that they can solve the multi-output problem. Among them, the support vectors in SVR are sparse, and only some of the samples are involved in constructing the model; LS-SVR transforms the convex quadratic optimization problem into a linear system of equations in which all the samples are involved in constructing the model. These methods better utilize the correlation between the outputs and improve model accuracy to some extent. However, they cannot provide a predictive distribution like that of BSVR, which quantifies uncertainty and has good application prospects. In addition, BSVR is based on Bayesian theory, which allows the optimal hyperparameters to be inferred systematically and effectively [8]. In terms of describing the correlation of multiple outputs, M-SVR has no explicit structure for describing output correlations; ELS-SVR describes the correlation only through a parameter greater than 0 and therefore cannot represent negative correlations; and MLS-SVR describes shared information by splitting the weight vector, but because this weight describes the correlation of all outputs in a unified manner, it cannot represent partial correlations among the outputs.
Therefore, this paper introduces a multi-output Gaussian process assumption into the Bayesian support vector regression (BSVR) model while accounting for the variability of the multiple outputs through separate SVR trade-off parameters. A Bayesian framework is used to systematically and comprehensively optimize the original BSVR hyperparameters together with the hyperparameters of the Gaussian process, which in turn yields the predicted values and probability distributions of the multiple outputs. The difference between the multi-output Bayesian support vector machine (MBSVR) and the single-output BSVR lies mainly in the kernel function: MBSVR uses the semiparametric latent factor model (SLFM) in its new kernel function, which describes the input–output and output–output relationships simultaneously through linear combinations of implicit functions, so that information can be transferred between the outputs to improve model accuracy. The main contributions of our work are as follows:
By introducing Bayesian inference into support vector regression, the method inherits the advantages of support vector machines on nonlinear, high-dimensional problems.
Compared with other SVR-based multi-output regression methods, the method rests on Bayesian theory: the predicted mean and its probability distribution (uncertainty) can be obtained, and hyperparameter optimization can be performed systematically and effectively.
Compared with BSVR, the method combines the SLFM structure with BSVR, optimizing the parameters comprehensively through information transfer between the outputs and using the shared information to improve model accuracy.
The use of per-output trade-off parameters makes the method less sensitive to outliers, allowing more robust performance on real datasets than the multi-output Gaussian process.
The rest of the paper is structured as follows: Section 2 introduces the Bayesian support vector machine model and the semiparametric latent factor model; Section 3 presents the new multi-output Bayesian support vector machine model; Section 4 evaluates the model on analytical test functions and real datasets; and Section 5 concludes.
3. Multi-Output Bayesian Support Vector Regression Model
The structure of the MBSVR model is shown in Figure 1. The left side is the SLFM structure: each output is a linear combination of latent implicit functions, and through these combinations the covariance between the outputs can be quantitatively described. The right side represents the trade-off parameters of the support vector machine; each output has a corresponding trade-off parameter, which balances the complexity and the error of the model for that output. The j-th output can be expressed as in (27), where the combination coefficients a_{jp} are the parameters to be optimized and the implicit functions u_p(x) are Gaussian processes. Based on this expression, the model can be written as
y_j(x_i) = \sum_{p=1}^{Q} a_{jp} u_p(x_i) + \varepsilon_{ij},
where x_i is the i-th sample and ε_{ij} is an independently and identically distributed random error. C is a number greater than 0 that determines the degree of error tolerated by the model. When C is large, the model tolerates little error; its complexity is high, and it may overfit, with poor generalization ability. When C is small, the model pays little attention to errors; it is simpler and prone to underfitting.
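The following minimal numerical sketch illustrates this structure under assumed values: each of m = 2 outputs is a linear combination of Q = 2 latent functions, the mixing matrix A carries the output correlations (including negative ones), and each output has its own trade-off constant. The functions and numbers are illustrative, not taken from the paper.

```python
# Minimal sketch of the SLFM structure: y_j(x_i) = sum_p a_jp u_p(x_i) + eps.
# A, u_p, and C follow the notation of the text; all values are assumed.
import numpy as np

rng = np.random.default_rng(1)
n, m, Q = 50, 2, 2                      # samples, outputs, latent functions
x = np.linspace(0.0, 1.0, n)

U = np.vstack([np.sin(2 * np.pi * x),   # latent implicit functions u_p(x),
               x ** 2])                 # here fixed curves for illustration
A = np.array([[1.0, 0.5],               # mixing matrix A: row j builds output j;
              [-0.8, 0.3]])             # negative entries give negative correlation
noise = 0.05 * rng.standard_normal((m, n))

Y = A @ U + noise                       # the two outputs share the latent functions
C = np.array([10.0, 100.0])             # one trade-off parameter per output
print(Y.shape, np.corrcoef(Y)[0, 1])    # outputs are correlated through A
```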
MBSVR combines the SLFM structure with the support vector machine model through Bayesian assumptions. Through the Gaussian process assumption and Bayesian derivation, the correlation between the outputs is effectively characterized, and finally the predicted means and probability distributions of the multiple outputs are obtained.
3.1. Bayesian Assumptions for MBSVR
Assume that a multi-output modeling problem consists of m outputs and n samples. Define a vector Y that collects the outputs of all sample points and therefore contains mn elements. Multi-output Bayesian support vector regression aims to approximate the m outputs simultaneously, building a more accurate model by considering the correlation between the outputs. In the multi-output Bayesian support vector machine, for a given sample x_i, the relationship between the outputs and the latent factors can be expressed as in (38), where ε_i is an independently and identically distributed random error whose distribution form is usually unknown, y(x_i) is the vector of the multiple output values, and f is the latent function of the support vector machine, which is assumed to follow a multi-output Gaussian process. Since there are multiple outputs, the outputs of all samples satisfy
Y = F + \varepsilon,
where F stacks the latent function values f_j(x_i) of all outputs.
According to the SLFM principle, the mixing matrix A = (a_{jp}) is the parameter to be optimized, and the Gaussian process outputs can be expressed as
f_j(x) = \sum_{p=1}^{Q} a_{jp} u_p(x).
In order for the model to satisfy the Gaussian process assumptions and to facilitate the solution, the Gaussian process outputs are stored in stacked form as F = (f_1(x_1), \ldots, f_1(x_n), \ldots, f_m(x_1), \ldots, f_m(x_n))^T; to simplify the derivation, this stacked vector is simply denoted F. Then, the likelihood function of F can be expressed as
p(F \mid \theta) = (2\pi)^{-mn/2} \, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} (F - \mu)^T \Sigma^{-1} (F - \mu) \right),
where μ denotes the mean vector of the mn elements, Σ denotes the covariance matrix of F, and θ denotes the parameter vector to be optimized. The definition of the covariance matrix Σ is the key element that distinguishes the multi-output Gaussian process from its single-output counterpart, and the description of the correlation between the outputs is contained in this covariance matrix.
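As an illustration of how the covariance matrix encodes the output correlations, the sketch below builds the stacked covariance in the Kronecker form Σ = B ⊗ K implied by the SLFM prior, assuming a single shared RBF kernel; the kernel choice and all values are assumptions, not the paper's exact configuration.

```python
# Sketch: stacked-output prior covariance Sigma = B kron K, with
# B = A A^T from the SLFM mixing matrix and K an assumed shared RBF kernel.
import numpy as np

def rbf_kernel(X, Z, length_scale=0.2):
    """Squared-exponential kernel matrix with entries k(x_i, z_j)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

rng = np.random.default_rng(2)
n, m = 30, 2
X = rng.uniform(0.0, 1.0, size=(n, 1))

A = np.array([[1.0, 0.5],
              [-0.8, 0.3]])          # SLFM mixing matrix (m x Q)
B = A @ A.T                          # output-correlation matrix (m x m)
K = rbf_kernel(X, X)                 # input kernel matrix (n x n)

Sigma = np.kron(B, K)                # covariance of the stacked latent vector F
print(Sigma.shape)                   # -> (60, 60), i.e., (mn, mn)
```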
Since the noise is assumed to be an independent and identically distributed random variable, the likelihood function of the sample outputs for the given training set can be expressed as
p(Y \mid F) = \prod_{j=1}^{m} \prod_{i=1}^{n} p(y_{ij} \mid f_j(x_i)),
where p(y_{ij} \mid f_j(x_i)) is the probability distribution of the observation y_{ij}, with the expression
p(y_{ij} \mid f_j(x_i)) \propto \exp\!\left( -C_j \, \ell( y_{ij} - f_j(x_i) ) \right),
where \ell(\cdot) is the loss function of the support vector machine and C_j is the trade-off constant; for each output, there is a corresponding trade-off constant. According to Bayesian theory, the posterior distribution of F satisfies
p(F \mid Y) = \frac{ p(Y \mid F) \, p(F \mid \theta) }{ p(Y) },
where p(Y) is a normalization constant. Further, the posterior can be expressed as (see Appendix A for more details)
p(F \mid Y) \propto \exp\!\left( -\sum_{j=1}^{m} C_j \sum_{i=1}^{n} \ell( y_{ij} - f_j(x_i) ) - \tfrac{1}{2} F^T \Sigma^{-1} F \right),
where \Sigma = B \otimes K, B = A A^T, K is the n \times n kernel matrix over the samples, and \otimes denotes the Kronecker product.
Therefore, by the maximum a posteriori principle, maximizing the posterior distribution is equivalent to solving
\min_F \; \sum_{j=1}^{m} C_j \sum_{i=1}^{n} \ell( y_{ij} - f_j(x_i) ) + \tfrac{1}{2} F^T \Sigma^{-1} F.
Similar to the original support vector machine, the first term of the objective function represents the empirical risk and the second term the smoothness of the function, with the single trade-off parameter expanded to one parameter per output.
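A small numerical sketch of evaluating this MAP objective, assuming the squared loss adopted in Section 3.2 and a zero prior mean; the helper map_objective and the test values are hypothetical.

```python
# Hypothetical helper evaluating the MAP objective:
# sum_j C_j * sum_i (y_ij - f_j(x_i))^2 + 0.5 * F^T Sigma^{-1} F.
import numpy as np

def map_objective(F, Y, C, Sigma):
    """F, Y: (m, n) latent values and observations; C: (m,) trade-offs;
    Sigma: (mn, mn) prior covariance of the stacked latent vector."""
    residual = Y - F
    empirical_risk = float(np.sum(C[:, None] * residual ** 2))
    f = F.reshape(-1)                              # stack outputs row by row
    smoothness = 0.5 * float(f @ np.linalg.solve(Sigma, f))
    return empirical_risk + smoothness

# Tiny usage with placeholder values (m = 2 outputs, n = 4 samples):
rng = np.random.default_rng(4)
m, n = 2, 4
Y, F = rng.standard_normal((m, n)), rng.standard_normal((m, n))
print(map_objective(F, Y, np.array([10.0, 100.0]), np.eye(m * n)))
```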
3.2. Model Construction for MBSVR
As with the single-output Bayesian support vector machine, MBSVR uses the squared loss function
\ell(\delta) = \delta^2.
The squared loss function corresponds to a Gaussian probability density function [20]. Substituting this loss into the objective yields the new objective function
\min_F \; (Y - F)^T \Lambda (Y - F) + \tfrac{1}{2} F^T \Sigma^{-1} F,
where \Lambda = \mathrm{diag}(C) \otimes I_n, \mathrm{diag}(C) is the diagonal matrix formed from the trade-off constants C_1, \ldots, C_m, and I_n is an n \times n identity matrix. The estimate of F is (see Appendix B for more details)
\hat{F} = \left( \Sigma^{-1} + 2\Lambda \right)^{-1} 2\Lambda \, Y.
Where it appears in the detailed derivations, \odot denotes the Hadamard product, i.e., the element-by-element multiplication of matrices.
For the output f_* = f(x_*) to be predicted, the joint distribution with the training set satisfies
\begin{pmatrix} Y \\ f_* \end{pmatrix} \sim N\!\left( 0, \begin{pmatrix} \Sigma + (2\Lambda)^{-1} & \Sigma_* \\ \Sigma_*^T & \Sigma_{**} \end{pmatrix} \right),
where \Sigma_* is the covariance between the stacked training latent values and f_*, and \Sigma_{**} is the prior covariance of f_*. The posterior of f_* still obeys a multi-output Gaussian distribution
f_* \mid Y \sim N( \mu_*, \bar{\Sigma}_* ),
where the mean and covariance are
\mu_* = \Sigma_*^T \left( \Sigma + (2\Lambda)^{-1} \right)^{-1} Y, \qquad \bar{\Sigma}_* = \Sigma_{**} - \Sigma_*^T \left( \Sigma + (2\Lambda)^{-1} \right)^{-1} \Sigma_*.
The j-th diagonal element of \bar{\Sigma}_* corresponds to the predictive variance of the j-th output.
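The predictive step amounts to standard Gaussian conditioning on the stacked training outputs; the sketch below is a hypothetical helper under the Gaussian noise induced by the squared loss, with the diagonal of the predictive covariance giving the per-output variances.

```python
# Hypothetical predictive helper via Gaussian conditioning: the noise
# (2*Lambda)^{-1} enters on the diagonal of the training covariance.
import numpy as np

def predict(Y_stack, Sigma, Sigma_star, Sigma_star_star, noise_diag):
    """Y_stack: (mn,) stacked outputs; Sigma: (mn, mn) prior covariance;
    Sigma_star: (mn, m) cross-covariance with the test point;
    Sigma_star_star: (m, m) test prior covariance; noise_diag: (mn,)."""
    S = Sigma + np.diag(noise_diag)
    alpha = np.linalg.solve(S, Y_stack)
    mean = Sigma_star.T @ alpha                        # predictive means
    cov = Sigma_star_star - Sigma_star.T @ np.linalg.solve(S, Sigma_star)
    return mean, np.diag(cov)                          # per-output variances

# Tiny usage with placeholder matrices (mn = 4 training values, m = 2 outputs):
mean, var = predict(np.ones(4), np.eye(4), 0.1 * np.ones((4, 2)),
                    np.eye(2), 0.05 * np.ones(4))
print(mean, var)
```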
3.3. Optimized Solution of Parameters
In MBSVR, the parameters to be optimized include the kernel function parameters; the trade-off parameters C_1, \ldots, C_m; and the matrix B, which describes the correlations between the outputs. For computational convenience, the implementation decomposes B by the Cholesky factorization B = L L^T. The optimal values of these hyperparameters are determined by maximizing the posterior probability
p(\theta \mid Y) = \frac{ p(Y \mid \theta) \, p(\theta) }{ p(Y) },
where p(\theta) is the prior distribution of the hyperparameters and p(Y) is a normalization constant. In general, a uniform probability distribution is specified for the hyperparameters, so the prior distribution p(\theta) is a constant. Therefore, it is only necessary to maximize p(Y \mid \theta) to obtain the maximum likelihood estimates of the parameters:
p(Y \mid \theta) = \int p(Y \mid F) \, p(F \mid \theta) \, dF,
where p(Y \mid F) is the observation likelihood and p(F \mid \theta) is the Gaussian process prior defined above. Under the squared loss both factors are Gaussian, so p(Y \mid \theta) can be expressed as
p(Y \mid \theta) = N\!\left( Y; \, 0, \; \Sigma + (2\Lambda)^{-1} \right).
Bringing (64) into the probability distribution in (63) yields the following negative log-likelihood:
-\log p(Y \mid \theta) = \tfrac{1}{2} Y^T \left( \Sigma + (2\Lambda)^{-1} \right)^{-1} Y + \tfrac{1}{2} \log \left| \Sigma + (2\Lambda)^{-1} \right| + \tfrac{mn}{2} \log 2\pi.
The hyperparameters are obtained by minimizing this negative log-likelihood. The resulting nonlinear programming problem is solved using the "fmincon" function in MATLAB R2022b: given an initial solution, iterative optimization yields the optimal hyperparameters. In general, the method can find the global optimum of the objective function, and the initial values of the parameters have little influence on the optimized solution.
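As a rough Python analogue of this optimization step (a sketch under the Gaussian marginal likelihood derived above, not the authors' MATLAB code), the hyperparameters can be optimized with scipy.optimize.minimize; the log-parameterization, kernel choice, and initial values are all assumptions.

```python
# Sketch: minimize the negative log marginal likelihood over the kernel
# length scale, the trade-offs C_j, and the Cholesky factor L of B.
import numpy as np
from scipy.optimize import minimize

def kern(X, ls):
    """Assumed shared RBF kernel matrix over the samples."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def neg_log_marginal(theta, X, y, n, m):
    ls = np.exp(theta[0])                         # positivity via log-params
    C = np.exp(theta[1:1 + m])                    # trade-off constants C_j
    L = np.zeros((m, m))
    L[np.tril_indices(m)] = theta[1 + m:]         # Cholesky factor of B
    S = np.kron(L @ L.T, kern(X, ls))             # Sigma = B kron K
    S += np.diag(np.repeat(1.0 / (2.0 * C), n))   # Gaussian noise from C_j
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (y @ np.linalg.solve(S, y) + logdet)

rng = np.random.default_rng(3)
n, m = 25, 2
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.concatenate([np.sin(6 * X[:, 0]), -np.sin(6 * X[:, 0])])  # stacked outputs

theta0 = np.concatenate([[0.0], np.zeros(m), [1.0, 0.0, 1.0]])   # initial guess
res = minimize(neg_log_marginal, theta0, args=(X, y, n, m), method="L-BFGS-B")
print(res.fun, np.round(res.x, 2))
```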
5. Conclusions
This paper investigates the multi-output modeling problem, aiming to improve model accuracy by quantitatively describing the correlation between the outputs and using the shared information to model multiple outputs simultaneously. To inherit the advantages of the single-output Bayesian support vector machine, the SLFM model is introduced on top of it and combined with Bayesian derivation, and the hyperparameters are optimized comprehensively, yielding a multi-output model that predicts the means and probability distributions of multiple outputs at the same time. Model validation is carried out on simple analytical test functions and real datasets; overall, MBSVR achieves higher accuracy than both single-output modeling and the multi-output Gaussian process model.
Because a large number of hyperparameters must be optimized in MBSVR, the efficiency of the algorithm is low. In addition, inaccurate hyperparameters create a gap between the shared information and the actual information, leaving limited room for improvement in model accuracy. Achieving efficient and accurate parameter optimization is therefore the main problem to be solved in future work. How to simplify the correlation-description structure and further improve the applicability and optimization efficiency of the model is also a direction for further research.