1. Introduction
Regression analysis is a forecasting method based on the causal relationships among changing quantities: from observed statistical data, the interdependent quantitative relationship between variables is determined by mathematical calculation, and a reasonable mathematical model is established to calculate future values of the variables. Linear regression is a widely used statistical method that applies regression analysis to determine the interdependent quantitative relationship between two or more variables. It is mainly used to analyze observed values and fit a reasonable model, so that a new value can be forecast with this model. The least squares method is a mathematical optimization technique and one of the most commonly used methods for estimating the unknown parameters of a linear regression. By minimizing the sum of squared errors, it finds the function that best matches the data and yields the fitted linear regression equation. A combination forecasting model applies different single-item forecasting models to the same forecasting object, makes full use of the information provided by each single-item method, and assigns appropriate weighting coefficients to improve forecasting accuracy. Many kinds of models can enter a combination forecast, including the linear regression model, exponential model, power function model, logistic model, and neural network. Each model has its own characteristics and scope of application. The idea of combining various models to achieve a better forecasting effect is the basis of combination forecasting. Many experts and scholars have studied linear combination forecasting models in depth, derived several forecasting models, achieved good results and applied them in practice [1,2,3,4,5]. Precision and imprecision are symmetrical: precise data are relative, and imprecise data are absolute. Many observed data are imprecise. In practice, an observation is often not a definite value and may only be known as an approximate range. In such cases, traditional combination forecasting models cannot be applied directly. However, the uncertainty theory proposed by Liu [6] can handle this problem.
The relation between certainty and uncertainty is symmetrical, and any random event is uncertain. Such problems need to be studied by means of uncertainty theory. Liu [7] founded uncertainty theory and gradually refined it [6,8,9,10]. Uncertainty theory is a branch of mathematics concerned with the analysis of degrees of belief. Its main concepts include uncertain measure, uncertain variable, uncertainty distribution, inverse uncertainty distribution and expected value. Uncertainty theory has become an important branch of axiomatic mathematics for dealing with uncertainty problems in reality. It has been widely used in uncertain planning, uncertain statistics, comprehensive evaluation and production planning [11,12,13], and has attracted great attention. In 2010, Liu [6] began his research on uncertainty statistics, a methodology for collecting and interpreting expert experience data through uncertainty theory. Uncertainty statistics mainly include uncertain regression equations, uncertain estimation and uncertain hypothesis testing. Driven by a keen interest in uncertain regression equations, experts and scholars have proposed many uncertain regression models [14,15,16,17,18]. Yao and Liu [19] proposed least squares estimation to solve for the unknown parameters of the uncertain regression equation. Wang et al. [20,21,22] proposed two new uncertain linear regression models. Shi et al. [23] proposed a total least squares estimation model based on uncertainty theory. Uncertainty statistics also have real applications: when COVID-19 was spreading rapidly in most countries around the world, Liu Z. [24] proposed an uncertain growth model for the cumulative number of COVID-19 infections in China.
It is not easy to build a scientific forecasting model, because whether a forecasting model is scientific depends on the accuracy of its forecasts on the one hand and on the simplicity of the model on the other. These two aspects conflict: a simple model is often not very accurate, and a relatively accurate model is usually not simple. On the basis of previous research [3,4,5] and uncertainty theory, this paper puts forward two kinds of uncertain combination forecasting models: the unary uncertain linear combination forecasting model and the uncertain relative error combination forecasting model. In general, the newer the data, the greater their impact on the model, but historical data also affect the accuracy of the model. According to the principle of minimum error, the unary uncertain linear combination forecasting model combines piecewise linear regressions fitted to the data of different periods into a prediction model with higher accuracy. The uncertain relative error combination forecasting model is based on the least squares principle and relative error, combined with uncertainty theory, and can better handle regression and forecasting with imprecise data. Both kinds of models can be used for imprecise as well as precise data, and they forecast well in the numerical example of this paper.
In this paper, we propose the unary uncertain linear combination forecasting model and the uncertain relative error combination forecasting model. Both models handle regression on imprecisely observed data and forecast well. The paper is organized as follows. Section 2 reviews the uncertain regression model and its least squares estimation. In Section 3, we propose the unary uncertain linear combination forecasting model, which establishes several piecewise linear regression models from the data of different periods and combines them into an uncertain combination forecasting model. In Section 4, we propose the uncertain relative error combination forecasting model, a new model that combines relative error with uncertainty theory. In Section 5, the feasibility of the proposed models is verified by a numerical example. Finally, Section 6 summarizes the proposed models and points out directions for future research.
2. Uncertain Regression Model
Certainty and uncertainty are symmetrical, and precision and imprecision are also symmetrical. To handle uncertainty problems such as imprecise data, Liu [7] founded uncertainty theory. Its main content includes the basic theory of uncertain variables, uncertain measure and uncertainty distributions, as well as the operational laws and the calculation of expected values. Readers interested in uncertainty theory and uncertainty statistics may consult Reference [10]. In this section, we mainly introduce the uncertain least squares estimation method for the uncertain regression equation.
Assume that $(x_1, x_2, \ldots, x_p)$ is a vector of independent variables and $y$ is a dependent variable. If there is a functional relationship between $(x_1, x_2, \ldots, x_p)$ and $y$, then $y$ can be expressed by a regression model
$$y = f(x_1, x_2, \ldots, x_p \mid \beta) + \varepsilon,$$
where $\beta$ is an unknown vector of parameters and $\varepsilon$ is a disturbance term, which is an uncertain variable. If the regression equation fits well, its expected value $E[\varepsilon]$ should be 0 [10].
In particular, Liu [10] calls
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
a linear regression model.
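Assume that we have a set of imprecisely observed data,
$$(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{ip}, \tilde{y}_i), \quad i = 1, 2, \ldots, m,$$
where $\tilde{x}_{i1}$, $\tilde{x}_{i2}$, …, $\tilde{x}_{ip}$, $\tilde{y}_i$ are independent uncertain variables with regular uncertainty distributions $\Phi_{i1}$, $\Phi_{i2}$, …, $\Phi_{ip}$, $\Psi_i$, $i = 1, 2, \ldots, m$, respectively.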
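Yao and Liu [19] proposed the least squares estimate of the unknown parameter $\beta$ of the regression model. The estimate is the solution of the following minimization problem
$$\min_{\beta} \sum_{i=1}^{m} E\left[\left(\tilde{y}_i - f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{ip} \mid \beta)\right)^2\right].$$
If the minimization solution is $\beta^*$, then the fitted regression equation is determined by $y = f(x_1, x_2, \ldots, x_p \mid \beta^*)$. Then, for each index $i$ ($i = 1, 2, \ldots, m$), the term
$$\hat{\varepsilon}_i = \tilde{y}_i - f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{ip} \mid \beta^*)$$
is called the $i$th residual.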
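The least squares estimate above can be sketched numerically for the unary linear case. The sketch below is an illustration under stated assumptions, not the estimation procedure of [19]: every observation is assumed to be a linear uncertain variable given as a `(lo, hi)` pair, the slope is assumed to be nonnegative, and the expected squared residual is evaluated through inverse uncertainty distributions with a midpoint rule. The names `inv_linear` and `fit_uncertain_line` are hypothetical.

```python
# Sketch: least squares for y = b0 + b1*x + eps with linear uncertain data.
# Assumptions (not from the source): every observation is a linear uncertain
# variable L(lo, hi) given as a (lo, hi) pair, and the fitted slope is >= 0.

def inv_linear(lo, hi, alpha):
    """Inverse uncertainty distribution of the linear uncertain variable L(lo, hi)."""
    return (1.0 - alpha) * lo + alpha * hi


def fit_uncertain_line(xs, ys, grid=200):
    """Return (b0, b1) minimising sum_i E[(y_i - b0 - b1*x_i)^2] for b1 >= 0.

    For b1 >= 0 the residual has inverse distribution
    Psi_i^{-1}(a) - b0 - b1 * Phi_i^{-1}(1 - a), so the objective becomes an
    ordinary least squares problem over the points
    (Phi_i^{-1}(1 - a), Psi_i^{-1}(a)) sampled at midpoints a in (0, 1).
    """
    us, vs = [], []
    for (x_lo, x_hi), (y_lo, y_hi) in zip(xs, ys):
        for k in range(grid):
            alpha = (k + 0.5) / grid  # midpoint rule on (0, 1)
            us.append(inv_linear(x_lo, x_hi, 1.0 - alpha))
            vs.append(inv_linear(y_lo, y_hi, alpha))
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b1 = s_uv / s_uu
    if b1 < 0:
        raise ValueError("this sketch only covers a nonnegative slope")
    return mean_v - b1 * mean_u, b1


# Illustrative data (made up for this sketch): precise x, interval-valued y.
xs = [(1, 1), (2, 2), (3, 3), (4, 4)]
ys = [(2.9, 3.1), (4.8, 5.2), (6.9, 7.1), (8.8, 9.2)]
b0, b1 = fit_uncertain_line(xs, ys)
print(b0, b1)
```

When all intervals collapse to points, the sketch reduces to ordinary least squares, which is a convenient sanity check.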
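Let the disturbance term $\varepsilon$ be an uncertain variable. Its expected value and variance can be estimated as
$$\hat{e} = \frac{1}{m} \sum_{i=1}^{m} E[\hat{\varepsilon}_i]$$
and
$$\hat{\sigma}^2 = \frac{1}{m} \sum_{i=1}^{m} E\left[(\hat{\varepsilon}_i - \hat{e})^2\right],$$
where $\hat{\varepsilon}_i$ is the $i$th residual, $i = 1, 2, \ldots, m$ [25].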
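Let $(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_p)$ be a new vector of independent variables; the forecast uncertain variable of the dependent variable $y$ is
$$\hat{y} = f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_p \mid \beta^*) + \varepsilon,$$
where the disturbance term $\varepsilon$ has the estimated expected value $\hat{e}$ and variance $\hat{\sigma}^2$. Lio and Liu [25] suggested that the forecast value be defined as the expected value of the uncertain variable $\hat{y}$, i.e.,
$$\hat{\mu} = E[\hat{y}].$$
Taking $\alpha$ (e.g., 95%) as the confidence level, the confidence interval of the dependent variable $y$ is
$$\hat{\mu} \pm b,$$
where $b$ is the minimum value such that $\hat{\Psi}(\hat{\mu} + b) - \hat{\Psi}(\hat{\mu} - b) \ge \alpha$ and $\hat{\Psi}$ is the uncertainty distribution of $\hat{y}$.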
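The residual estimates and the forecast value can be illustrated for a fitted unary line. This is a minimal sketch under assumptions that are not in the text: the data are linear uncertain variables given as `(lo, hi)` pairs, the slope `b1` is nonnegative, expected values are evaluated through inverse uncertainty distributions with a midpoint rule, and the new observation is independent of the disturbance term so that the expected value is additive. The function names are hypothetical.

```python
# Sketch: residual expected value/variance and forecast value for a fitted
# line y = b0 + b1*x (b1 >= 0) with linear uncertain data L(lo, hi).

def inv_linear(lo, hi, alpha):
    """Inverse uncertainty distribution of the linear uncertain variable L(lo, hi)."""
    return (1.0 - alpha) * lo + alpha * hi


def residual_moments(xs, ys, b0, b1, grid=2000):
    """Return (e_hat, var_hat) averaged over all residuals."""
    alphas = [(k + 0.5) / grid for k in range(grid)]  # midpoint rule on (0, 1)

    def inv_res(x, y, a):
        # Inverse distribution of y - b0 - b1*x when b1 >= 0.
        return inv_linear(*y, a) - b0 - b1 * inv_linear(*x, 1.0 - a)

    m = len(xs)
    e_hat = sum(
        sum(inv_res(x, y, a) for a in alphas) / grid for x, y in zip(xs, ys)
    ) / m
    var_hat = sum(
        sum((inv_res(x, y, a) - e_hat) ** 2 for a in alphas) / grid
        for x, y in zip(xs, ys)
    ) / m
    return e_hat, var_hat


def forecast_value(x_new, b0, b1, e_hat):
    """Forecast value b0 + b1*E[x_new] + e_hat for a linear uncertain x_new."""
    lo, hi = x_new
    return b0 + b1 * (lo + hi) / 2.0 + e_hat


# Illustrative numbers (made up for this sketch).
xs = [(1, 1), (2, 2), (3, 3)]
ys = [(2.9, 3.1), (4.8, 5.2), (6.9, 7.1)]
e_hat, var_hat = residual_moments(xs, ys, b0=1.0, b1=2.0)
print(e_hat, var_hat, forecast_value((3.5, 4.5), 1.0, 2.0, e_hat))
```

The confidence interval is left out on purpose, because it needs the full uncertainty distribution of the forecast variable rather than just its first two moments.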
3. Unary Uncertain Linear Regression Combination Forecasting Model
In this section, we derive the unary uncertain linear combination forecasting model, which is abbreviated as UULCFM. Recent data have a greater effect on a forecasting model than old data, so a scientifically established forecasting model should take changes in time and conditions fully into account. The data of different periods influence the model differently, and recent data are more valuable than data from long ago. The idea of UULCFM is to establish m regression models by successively discarding a certain amount of the earliest historical data, and then to assemble the m regression models into one forecasting model according to the principle of minimum error.
Assume that $(\tilde{x}_i, \tilde{y}_i)$ $(i = 1, 2, \ldots, n)$ is a set of imprecise data, where $\tilde{x}_i$, $\tilde{y}_i$ are independent uncertain variables with regular uncertainty distributions $\Phi_i$, $\Psi_i$ $(i = 1, 2, \ldots, n)$, respectively. We always assume that there is a linear relationship between $\tilde{x}_i$ and $\tilde{y}_i$ $(i = 1, 2, \ldots, n)$, and that $y$ can be expressed by an uncertain regression model $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$, $\beta_1$ are unknown parameters and $\varepsilon$ is an uncertain disturbance term.
The main steps of the UULCFM are as follows.
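- Step 1.
For the original $n$ sets of data, we obtain the following unary linear regression model using uncertain least squares estimation,
$$\hat{y}_1 = \beta_{10} + \beta_{11} x,$$
where $\beta_{10}$ and $\beta_{11}$ are unknown parameters.
- Step 2.
Discarding the first $k_1$ sets of data, we obtain the following unary linear regression model for the remaining $n - k_1$ sets of data through least squares estimation,
$$\hat{y}_2 = \beta_{20} + \beta_{21} x,$$
where $\beta_{20}$ and $\beta_{21}$ are unknown parameters, $k_1$ is a positive integer and $k_1 < n$.
- Step 3.
Discarding the first $k_2$ sets of data, we obtain the following unary linear regression model for the remaining $n - k_2$ sets of data through least squares estimation,
$$\hat{y}_3 = \beta_{30} + \beta_{31} x,$$
where $\beta_{30}$ and $\beta_{31}$ are unknown parameters. Both $k_1$, $k_2$ are positive integers, and $k_1 < k_2 < n$.
By analogy, we can obtain the $m$th unary linear regression equation.
- Step $m$.
Discarding the first $k_{m-1}$ sets of data, we obtain the following unary linear regression model for the remaining $n - k_{m-1}$ sets of data through least squares estimation,
$$\hat{y}_m = \beta_{m0} + \beta_{m1} x,$$
where $\beta_{m0}$ and $\beta_{m1}$ are unknown parameters. All of $k_1, k_2, \ldots, k_{m-1}$ are positive integers, and $k_1 < k_2 < \cdots < k_{m-1} < n$.
In this way, $m$ unary linear regression models are obtained as follows,
$$\hat{y}_j = \beta_{j0} + \beta_{j1} x, \quad j = 1, 2, \ldots, m, \tag{5}$$
where $\beta_{j0}$, $\beta_{j1}$ are unknown parameters.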
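The steps above can be sketched for precise data, where the uncertain least squares estimate reduces to ordinary least squares. The sketch is illustrative only: the function names, the list `cuts` holding the numbers of discarded data sets, and the sample data are assumptions introduced here, not part of the original derivation.

```python
# Sketch of Steps 1..m for precise (x, y) data: one least squares line per
# window, each window dropping more of the earliest observations.

def ols_line(points):
    """Ordinary least squares line through (x, y) points; returns (b0, b1)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1


def window_models(points, cuts):
    """Fit the full-data model, then one model per cut in `cuts`.

    `cuts` lists how many leading observations each later step discards and
    must be strictly increasing positive integers.
    """
    bounds = [0, *cuts]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("cuts must be increasing positive integers")
    if len(points) - bounds[-1] < 2:
        raise ValueError("the last window needs at least two points")
    return [ols_line(points[k:]) for k in bounds]


def window_errors(points, models, last_cut):
    """Errors of every model on the observations kept by the last window."""
    kept = points[last_cut:]
    return [[y - (b0 + b1 * x) for x, y in kept] for b0, b1 in models]


# Illustrative data (made up): a slope that steepens over time.
points = [(1, 2.0), (2, 3.1), (3, 3.9), (4, 5.2), (5, 6.8), (6, 8.1), (7, 9.9)]
cuts = [2, 4]
models = window_models(points, cuts)
errors = window_errors(points, models, cuts[-1])
print(models)
print(errors)
```

The error lists produced at the end are exactly the inputs needed when the individual models are later combined by weights.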
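Each regression equation of Equation (5) is fitted to the $n - k_{m-1}$ sets of data remaining after Step $m$, and the generated errors are, respectively,
$$e_{ji} = y_i - \hat{y}_j(x_i), \quad i = k_{m-1}+1, \ldots, n, \quad j = 1, 2, \ldots, m, \tag{6}$$
where $\hat{y}_j(x_i)$ is the value of the $j$th regression equation at $x_i$ and $k_{m-1}$ is the number of data sets discarded in Step $m$. Since $(\tilde{x}_i, \tilde{y}_i)$, $i = 1, 2, \ldots, n$ is a set of imprecise data, Equation (6) is deformed into the following form according to the uncertain expected value formula [10],
$$e_{ji} = E\left[\tilde{y}_i - \hat{y}_j(\tilde{x}_i)\right].$$
The aim is to find $m$ numbers $w_1$, $w_2$, $w_3$, …, $w_m$ that satisfy $\sum_{j=1}^{m} w_j = 1$. Then, we construct the composite model
$$\hat{y} = \sum_{j=1}^{m} w_j \hat{y}_j$$
that minimizes the sum of the squares of the errors $e_i = \sum_{j=1}^{m} w_j e_{ji}$, $i = k_{m-1}+1, \ldots, n$. This model is called the linear regression combination model.
We know from the above derivation that
$$Q = \sum_{i=k_{m-1}+1}^{n} e_i^2 = \sum_{i=k_{m-1}+1}^{n} \left(\sum_{j=1}^{m} w_j e_{ji}\right)^2 = W^T H W,$$
where $W = (w_1, w_2, \ldots, w_m)^T$ and $H = (h_{jl})_{m \times m}$ with $h_{jl} = \sum_{i=k_{m-1}+1}^{n} e_{ji} e_{li}$. So, the linear regression combination model becomes the problem of finding the minimum value of $Q = W^T H W$ under the constraint condition $R^T W = 1$, where $R = (1, 1, \ldots, 1)^T$.
We use the Lagrange multiplier method to solve the conditional extremum. We construct the Lagrange function as follows
$$L(W, \lambda) = W^T H W - 2\lambda\,(R^T W - 1).$$
The Lagrange function $L$ is an elementary function of $W$ and $\lambda$, and its minimum point is a stationary point of the function. If we take the first partial derivatives of $L$, then we obtain
$$\frac{\partial L}{\partial W} = 2HW - 2\lambda R = 0, \qquad \frac{\partial L}{\partial \lambda} = -2\,(R^T W - 1) = 0. \tag{26}$$
If $H$ is invertible and we solve Equation (26), we get
$$W = \lambda H^{-1} R. \tag{27}$$
According to the constraint $R^T W = 1$, we can solve Equation (27) and obtain
$$\lambda = \frac{1}{R^T H^{-1} R}, \qquad W = \frac{H^{-1} R}{R^T H^{-1} R}.$$
The $m$ numbers $w_1, w_2, \ldots, w_m$ obtained in this way satisfy $\sum_{j=1}^{m} w_j = 1$. Thus, the linear regression combination model is obtained.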
The derivation of the UULCFM involves matrix inversion and elementary matrix transformations, so readers need some background in matrix theory and linear algebra.
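The weight computation can be illustrated with a short sketch of the standard Lagrange-multiplier solution for minimizing a sum of squared combined errors under a sum-to-one constraint. The notation is assumed for illustration: `errors[j][i]` is the error of model j on kept observation i, and the information matrix of error cross-products is assumed to be invertible. The helper `solve` is a plain Gaussian elimination written here to keep the example dependency-free.

```python
# Sketch: combination weights that minimise the sum of squared combined
# errors subject to the weights summing to 1 (Lagrange multiplier solution).

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def combination_weights(errors):
    """errors[j][i] is the error of model j on observation i."""
    m = len(errors)
    # Matrix of error cross-products between every pair of models.
    h = [
        [sum(p * q for p, q in zip(errors[j], errors[l])) for l in range(m)]
        for j in range(m)
    ]
    z = solve(h, [1.0] * m)  # H^{-1} R with R = (1, ..., 1)^T
    total = sum(z)           # R^T H^{-1} R
    return [v / total for v in z]


# Illustrative errors (made up): model 1 is more accurate than model 2.
errors = [[1.0, -1.0], [2.0, 2.0]]
print(combination_weights(errors))  # approximately [0.8, 0.2]
```

Nothing in this solution prevents a weight from being negative, since only the sum of the weights is constrained.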
4. Uncertain Relative Error Linear Combination Forecasting Model
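In this section, we derive the uncertain relative error linear combination forecasting model, which is abbreviated as URELCFM. Suppose that we have a set of imprecise data $(\tilde{x}_{t1}, \tilde{x}_{t2}, \ldots, \tilde{x}_{tp}, \tilde{y}_t)$, $t = 1, 2, \ldots, n$, where $\tilde{x}_{t1}$, $\tilde{x}_{t2}$, …, $\tilde{x}_{tp}$, $\tilde{y}_t$ are independent uncertain variables with regular uncertainty distributions $\Phi_{t1}$, $\Phi_{t2}$, …, $\Phi_{tp}$, $\Psi_t$, respectively.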
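The basic principles of URELCFM are as follows. The forecasting result of the $i$th forecasting method for the $t$th observation is $\hat{y}_{it}$, $i = 1, 2, \ldots, m$. The linear combination of the $m$ forecasting results is
$$\hat{y}_t = \sum_{i=1}^{m} w_i \hat{y}_{it},$$
where $w_1, w_2, \ldots, w_m$ are the weighting coefficients. The relative error between the forecasting value and the original data can be defined as
$$e_t = \frac{y_t - \hat{y}_t}{y_t} = \sum_{i=1}^{m} w_i e_{it}, \qquad e_{it} = \frac{y_t - \hat{y}_{it}}{y_t}, \quad t = 1, 2, \ldots, n, \tag{31}$$
where the second equality holds when the weighting coefficients sum to 1. Since the observations are imprecise data, we have to evaluate Equation (31) by means of uncertain expected values [10],
$$e_{it} = E\left[\frac{\tilde{y}_t - \hat{y}_{it}}{\tilde{y}_t}\right].$$
The uncertain relative error linear combination forecasting model I (URELCFM I) with the minimum sum of squares of relative errors is
$$\min Q = \sum_{t=1}^{n} e_t^2, \tag{33}$$
and the constraint of the model is
$$\sum_{i=1}^{m} w_i = 1.$$
The sum of squares of relative errors is
$$Q = \sum_{t=1}^{n} \left(\sum_{i=1}^{m} w_i e_{it}\right)^2 = W^T H W,$$
where $W = (w_1, w_2, \ldots, w_m)^T$ and $H = (h_{ij})_{m \times m}$ with $h_{ij} = \sum_{t=1}^{n} e_{it} e_{jt}$. Equation (33) is transformed into
$$\min Q = W^T H W,$$
and the constraint is transformed into
$$R^T W = 1,$$
where $R = (1, 1, \ldots, 1)^T$. According to the Lagrange multiplier method, the optimal coefficient $W^*$ is
$$W^* = \frac{H^{-1} R}{R^T H^{-1} R}.$$
The solution of model I sometimes has negative components, which does not achieve the expected effect of a linear combination forecasting model. To overcome this limitation of URELCFM I, we put forward an uncertain combination forecasting model with the minimum sum of squares of relative errors and non-negative weights, namely, the uncertain relative error linear combination forecasting model II (URELCFM II),
$$\min Q = \sum_{t=1}^{n} e_t^2,$$
and the constraint of the model is
$$\sum_{i=1}^{m} w_i = 1, \qquad w_i \ge 0, \quad i = 1, 2, \ldots, m.$$
According to the derivation of URELCFM I, we can obtain
$$\min Q = W^T H W,$$
and the constraint is transformed into
$$R^T W = 1, \qquad W \ge 0.$$
URELCFM II is a convex quadratic programming problem. It can be solved by a simplex-type algorithm for convex quadratic programming, which works through a finite number of linear programming steps, or with the MATLAB Optimization Toolbox.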
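For readers without a quadratic programming solver, the non-negative weight problem can be approximated with projected gradient descent on the probability simplex. This is a substitute technique chosen for brevity, not the simplex-type algorithm or the MATLAB routine mentioned above. The matrix `h` of relative-error cross-products is assumed to be symmetric positive semidefinite, and all names are hypothetical.

```python
# Sketch: minimise W^T H W subject to sum(W) = 1 and W >= 0 using projected
# gradient descent (a swapped-in method, not the algorithm named in the text).

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    running = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        running += u_k
        t = (running - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(x - theta, 0.0) for x in v]


def nonneg_weights(h, iters=5000):
    """Approximate minimiser of W^T H W over the probability simplex."""
    m = len(h)
    # The largest absolute row sum bounds the largest eigenvalue of H, so
    # 1 / (2 * bound) is a safe step size for the gradient 2 * H * W.
    bound = max(sum(abs(x) for x in row) for row in h)
    step = 1.0 / (2.0 * bound)
    w = [1.0 / m] * m
    for _ in range(iters):
        grad = [2.0 * sum(h[j][l] * w[l] for l in range(m)) for j in range(m)]
        w = project_simplex([w_j - step * g for w_j, g in zip(w, grad)])
    return w


# Illustrative matrix (made up): the sum-to-one solution without the sign
# constraint would be (1.5, -0.5), so the non-negative solution sits on the
# boundary of the simplex.
h = [[1.0, 2.0], [2.0, 5.0]]
print(nonneg_weights(h))
```

When the sum-to-one solution of model I is already non-negative, this sketch converges to the same weights, so the two models only differ when model I produces a negative component.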
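Both URELCFM I and URELCFM II require that the sum of the weighting coefficients is 1. In fact, this restriction is not necessary. The weights can also be negative, and the goal is to minimize the sum of squares of the combined forecasting errors. Although negative weights are controversial, they are common from a mathematical perspective; for example, multiple regression often has negative coefficients. By removing the restrictions on the weighting coefficients, we obtain the uncertain relative error linear combination forecasting model III (URELCFM III) with the minimum sum of squares of relative errors,
$$\min Q = \sum_{t=1}^{n} e_t^2.$$
The sum of squares of relative errors is
$$Q = \sum_{t=1}^{n} \left(1 - \sum_{i=1}^{m} w_i r_{it}\right)^2, \tag{47}$$
where $w_1, w_2, \ldots, w_m$ are the weighting coefficients and $r_{it} = E[\hat{y}_{it} / \tilde{y}_t]$ is the expected ratio of the $i$th forecasting result to the $t$th observation. $Q$ in Equation (47) is an elementary function. To find the minimum value of $Q$, we set the partial derivatives of $Q$ with respect to $w_1, w_2, \ldots, w_m$ equal to 0 and obtain the following equations
$$\sum_{i=1}^{m} \left(\sum_{t=1}^{n} r_{jt} r_{it}\right) w_i = \sum_{t=1}^{n} r_{jt}, \quad j = 1, 2, \ldots, m.$$
We express the above equations as a matrix equation and obtain
$$Z W = B,$$
where $Z = (z_{ji})_{m \times m}$ with $z_{ji} = \sum_{t=1}^{n} r_{jt} r_{it}$, $W = (w_1, w_2, \ldots, w_m)^T$ and $B = \left(\sum_{t=1}^{n} r_{1t}, \ldots, \sum_{t=1}^{n} r_{mt}\right)^T$. When the matrix $Z$ is invertible, the solution $W$ is
$$W = Z^{-1} B.$$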
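The unconstrained model can be sketched for precise data as the solution of the normal equations for the ratios of each forecast to the observed value. The notation is assumed for illustration: `forecasts[i][t]` is the forecast of method i at observation t, and `actual[t]` is the (nonzero) observed value. For imprecise data the ratios would first be replaced by their uncertain expected values, which this sketch does not attempt.

```python
# Sketch: unconstrained weights minimising the sum of squared relative errors
# sum_t (1 - sum_i w_i * forecast_it / actual_t)^2 for precise data.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def unconstrained_weights(forecasts, actual):
    """Weights solving the normal equations of the relative-error objective."""
    ratios = [[f / y for f, y in zip(row, actual)] for row in forecasts]
    m = len(ratios)
    # Cross-products of ratio rows form the coefficient matrix.
    z = [
        [sum(p * q for p, q in zip(ratios[j], ratios[i])) for i in range(m)]
        for j in range(m)
    ]
    rhs = [sum(row) for row in ratios]
    return solve(z, rhs)


def relative_sse(weights, forecasts, actual):
    """Sum of squared relative errors of the combined forecast."""
    total = 0.0
    for t, y in enumerate(actual):
        combined = sum(w * row[t] for w, row in zip(weights, forecasts))
        total += ((y - combined) / y) ** 2
    return total


# Illustrative data (made up for this sketch).
actual = [10.0, 12.0, 15.0, 19.0]
forecasts = [[9.0, 12.5, 14.0, 20.0], [11.0, 11.0, 16.5, 18.0]]
w = unconstrained_weights(forecasts, actual)
print(w, relative_sse(w, forecasts, actual))
```

Because the weights are unrestricted, the resulting in-sample sum of squared relative errors cannot exceed that of any weight vector that also satisfies a sum-to-one or non-negativity constraint.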
URELCFM I has more constraints than URELCFM III, so its fitting accuracy cannot exceed that of URELCFM III, and URELCFM II has more constraints than URELCFM I. Therefore, URELCFM III has the highest fitting accuracy; that is, the sums of squared relative errors satisfy $Q_{\mathrm{III}} \le Q_{\mathrm{I}} \le Q_{\mathrm{II}}$.
5. Numerical Example
To verify the feasibility and effectiveness of the models proposed in this paper, we provide a numerical example with imprecise data. Following the numerical analysis of the disturbance term in Reference [25], we calculated the expected value and variance of the disturbance term, made forecasts and computed the confidence interval. The numerical results show that the models proposed in this paper forecast well.
Assume that $(\tilde{x}_i, \tilde{y}_i)$, $i = 1, 2, \ldots, 8$ are the imprecise data provided in Table 1, where $\tilde{x}_i$, $\tilde{y}_i$, $i = 1, 2, \ldots, 8$ are independent linear uncertain variables with regular uncertainty distributions $\Phi_i$ and $\Psi_i$, $i = 1, 2, \ldots, 8$, respectively.
We carried out linear regression using the uncertain slope mean method (USMM) [20] and the uncertain equation deformation method (UEDM) [21], respectively, and then solved the linear regression equations according to the combination forecasting models proposed in this paper. The results are shown in Table 2.
Table 2 shows that the coefficients of the linear regression equations obtained by USMM and UEDM differ somewhat. The coefficients of the linear regression equations of the models proposed in this paper are almost the same, so the models are stable, and the differences in fitting effect are small enough to be ignored.
The estimated expected values and estimated variances of the disturbance term of each model are shown in Table 3.
As can be seen from Table 3, the estimated expected values of the disturbance terms of URELCFM I, URELCFM II and URELCFM III are all 0.0000 and the variances are relatively small, indicating that the three models fit and forecast well; URELCFM III has the best performance.
We forecast with URELCFM III and computed the confidence interval. For a new imprecisely observed value of the independent variable and a given confidence level, the forecast value and confidence interval were obtained according to Reference [25], as shown in Table 4.
The numerical example shows that all four models proposed in this paper are feasible. The data analysis and the comparison with the existing models (USMM and UEDM) indicate that the four proposed models forecast better on this example.
6. Conclusions
Traditional forecasting models all require precise data. In fact, observed data can be imprecise. For example, after the college entrance examination, we invite a teacher to estimate the score of a certain candidate. If the teacher believes that the candidate’s score is bound to exceed 500, we obtain the expert experience data (500, 0); if the teacher’s belief degree that the score is less than 520 is 0.3, we obtain (520, 0.3); if the belief degree that the score is less than 550 is 0.6, we obtain (550, 0.6); if the belief degree that the score is less than 580 is 0.8, we obtain (580, 0.8); and if the teacher believes that the candidate will score no higher than 600, we obtain (600, 1). This gives us five pieces of expert experience data, (500, 0), (520, 0.3), (550, 0.6), (580, 0.8), (600, 1), all of which are imprecise data.
Based on traditional combination forecasting methods and uncertainty theory, this paper proposes two kinds of uncertain combination forecasting models. Both are aimed at imprecise data and rely on uncertainty theory for their solution. The unary uncertain linear combination forecasting model is a relatively basic linear model. It establishes several piecewise linear regression models based on the data of different periods and combines them into an uncertain combination forecasting model with high accuracy. The uncertain relative error combination forecasting model is based on the principle of minimizing the sum of squares of relative errors; different restrictions on the weights yield three uncertain relative error combination forecasting models with good forecasting results. The four models proposed in this paper are all feasible, and the data analysis in the numerical example indicates that they forecast better than the existing models used for comparison.
The numerical example in this paper is a univariate linear forecasting problem, and the model solution and data analysis are not too complicated. The derivation and calculation of a multivariable uncertain linear combination forecasting model are relatively complex and in practice require computer programs, for example written in MATLAB.