A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand

Zhao, Wenting; Zhao, Juanjuan; Yao, Xilong; Jin, Zhixin; Wang, Pan

doi:10.3390/en12071347

Open AccessArticle

A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand

by

Wenting Zhao

¹,

Juanjuan Zhao

^1,2,*,

Xilong Yao

¹,

Zhixin Jin

^1,3 and

Pan Wang

²

¹

School of Economic and Management, Taiyuan University of Technology, Jinzhong 030600, China

²

School of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China

³

Shanxi Coking Coal Group Co. Ltd., Taiyuan 030024, China

^*

Author to whom correspondence should be addressed.

Energies 2019, 12(7), 1347; https://doi.org/10.3390/en12071347

Submission received: 12 March 2019 / Revised: 28 March 2019 / Accepted: 2 April 2019 / Published: 8 April 2019

Download

Browse Figures

Versions Notes

Abstract

:

Effectively forecasting energy demand and energy structure helps energy planning departments formulate energy development plans and react to the opportunities and challenges in changing energy demands. In view of the fact that the rolling grey model (RGM) can weaken the randomness of small samples and better present their characteristics, as well as support vector regression (SVR) having good generalization, we propose an ensemble model based on RGM and SVR. Then, the inertia weight of particle swarm optimization (PSO) is adjusted to improve the global search ability of PSO, and the improved PSO algorithm (APSO) is used to assign the adaptive weight to the ensemble model. Finally, in order to solve the problem of accurately predicting the time-series of primary energy consumption, an adaptive inertial weight ensemble model (APSO-RGM-SVR) based on RGM and SVR is constructed. The proposed model can show higher prediction accuracy and better generalization in theory. Experimental results also revealed outperformance of APSO-RGM-SVR compared to single models and unoptimized ensemble models by about 85% and 32%, respectively. In addition, this paper used this new model to forecast China’s primary energy demand and energy structure.

Keywords:

forecasting; primary energy demand; ensemble model; improved particle swarm optimization algorithm

1. Introduction

Ensuring adequate energy supply is an important material basis for economic construction and development in most countries of the world. Especially for developing countries which are in the process of transformation and economic restructuring, energy supply and demand and structural adjustment are the keys to their transformation [1]. Estimating energy demand will not only affect energy market allocation [2], energy policy adjustment [3], economic development [4], carbon emission prediction [5], and other related energy planning research, it can also serve as the basis for building energy planning models [6]. Therefore, accurately forecasting energy demand can provide effective advice for energy policy-makers on energy development planning and policy formulation. It also helps to avoid imbalances in energy supply and demand and ensures energy security in the region and worldwide. In addition, adjustment and transformation of the energy structure directly affect environmental protection and governance in the region and worldwide [7]. In view of this, accurately forecasting primary energy demand and energy structure is important for energy planning.

Many methods have been presented in the field of prediction. In short, energy forecasting models fall into three categories: time-series models, soft-computing technologies, and the ensemble of models [8,9]. Time-series models are the most widely used in energy prediction, including regression models, the autoregressive integrated moving average model (ARIMA), and the grey model (GM), which are commonly used for energy demand forecasting [10,11]. The construction of regression models requires rigorous assumptions, which leads to poor generalization. In recent years, ARIMA and GM have performed better in the forecasting of small samples [11,12]. With the development of artificial intelligence (AI), soft-computing technology has gradually been applied to energy demand and power load forecasting [13]. This kind of method has high prediction accuracy, but most of the models have strict requirements on training samples. For remedying the shortcomings of single forecasting models, scholars began to develop ensemble models and applied them to various types of energy consumption and load forecasting research. Ensemble models extract the characteristics of every single model for improving forecast precision [14,15]. At present, the single-model features of most ensemble models are extracted by fixed weights. In comparison with the single model, the precision of prediction has been improved. However, for the ensemble forecasting model with a fixed weight, the feature extraction of a single model cannot be adjusted in real-time, resulting in poor generalization ability. In view of the above, although many forecasting methods have been put forward, more accurate and optimized forecasting of energy demand remains a key issue for energy management departments, because forecasting energy demand is the basis of energy planning and policy adjustment for a region or even a country. There are not many statistical data on primary energy consumption. Especially for primary electricity consumption, only about a dozen years of data are available, which is clearly a small sample. In addition, the change in energy consumption is volatile. The use of ARIMA [12], the first-order single variable grey model (GM (1,1)) [16], and rolling grey model (RGM) [17] in time-series models perform better for forecasting small-size samples [8,9]. In addition, support vector regression (SVR) in artificial intelligence methods has good generalization [18]. Therefore, considering the characteristics of primary energy consumption, this paper combines ARIMA, GM (1,1), and RGM with SVR to construct ensemble models. Then, we improve the global search ability of particle swarm optimization (PSO) by improving the inertia weight, and then optimize the ensemble model with the improved adaptive inertia weight particle swarm optimization algorithm (APSO). Thus, an adaptive variable weight ensemble forecasting model (APSO-RGM-SVR) is innovatively proposed to accurately forecast primary energy demand and energy structure. Accurately forecasting primary energy demand can provide reference and evidence for energy–environment–economic planning and policy-making. It can also help to maintain the balance of energy supply and demand, and ensure energy security and stability. The contribution of this paper mainly includes two aspects:

(1): The APSO algorithm based on adaptive inertia weight is proposed. The inertial weight of PSO is an important factor affecting global search ability. In this paper, an adaptive adjustment method of inertia weight was used to improve the PSO algorithm. A “momentum” factor was added to PSO to adjust the adaptive inertia weight of the particles so as to improve the global search ability of PSO.
(2): An adaptive variable weight ensemble forecasting model (APSO-RGM-SVR) is proposed to predict primary energy demand and energy structure. This paper combines the GM (1,1), RGM, and ARIMA in time-series model with SVR in machine learning method, and constructs GM (1,1) -SVR, RGM-SVR, and ARIMA-SVR ensemble models based on the standard deviation weight method. Through comparative experiments, we found RGM-SVR had higher precision. Then, the RGM-SVR model with higher accuracy was optimized by the APSO algorithm, and the APSO-RGM-SVR model was obtained. The new method proposed in this paper was used to forecast China’s primary energy demand and structure. The empirical results showed that the new method had higher accuracy and better generalization.

The remaining paper is structured as follows. Section 2 reviews the related work. Section 3 presents the single forecasting models involved in this paper, including GM (1,1), RGM, ARIMA, and SVR. Section 4 presents ensemble models, including ensemble models with the standard deviation method and the adaptive inertial weight ensemble forecasting model. Section 5 presents the results. Section 6 presents the conclusion.

2. Literature Review

Forecasting energy demand has always been the focus of authors in the study of energy. The key to improving the forecasting accuracy is to construct a reasonable and appropriate energy demand forecasting model according to different types of forecasting objects and targets [8]. Over the past decade, the energy consumption index has shown a globally increasing trend, and its actual situation is more complex. More new technologies will be developed for energy forecasting.

2.1. Time-Series Forecasting Model

The time-series forecasting model is a kind of traditional forecasting method which was put forward early. The popular methods include regression model [9,19], ARIMA [12], and the grey theory model [16,17]. The regression model can forecast the changes in the dependent variable by establishing the regression equation through the correlation between the dependent variable and the independent variable. The regression model was the earliest method applied to energy management [19,20]. Bianco et al. (2009) established a regression model to predict power consumption based on data of electricity consumption [20]. The construction of the regression model uses strict assumptions, which leads to poor generalization [9,12].

The ARIMA is a forecasting method that was developed by Box and Jenkins [19]. The principle of ARIMA is to treat the data sequence formed by the prediction object over time as a random sequence, and then establish a model to express the sequence [12,19]. This method is often used in energy consumption and price forecasting [21,22]. Contreras et al. (2003) used ARIMA to predict the power price in the Spanish mainland and the California market [21]. Abdel-Aal et al. (2014) forecasted power load using ARIMA [22]. Building the ARIMA model does not require the changes in random variables to be considered directly, but this method is not suitable for non-linear forecasting [23].

In time-series model research, the grey model has been more widely used in recent decades. It was proposed by Julong Deng in 1982 [16]. It can use small sample data to establish the grey differential equation and describe the development trend of things for a long time [16]. Prediction methods based on grey theory are often used to predict energy consumption [24,25]. Zhao and Guo (2016) used the RGM to predict power load [24]. Tsai et al. (2017) used three grey models to forecast the growth trend of renewable energy [25]. However, GM (1,1) has low prediction accuracy for samples with complex backgrounds and seasonal variation characteristics, and it is not good at long-term prediction, although the model has the advantages of a simple model structure and high accuracy for short-term predictions. At present, scholars generally optimize the grey prediction model or combine it with other methods to improve the accuracy and generalization of GM [26].

2.2. Soft-Computing Technology

With the progress of AI, soft-computing technology based on AI has gradually entered the field of energy management [27]. Compared with traditional time-series prediction methods, soft-computing technology can cope with more complex changes and has a higher fitting accuracy [28].

The artificial neural network (ANN) has been a hotspot since the 1980s [29]. It imitates human brain neurons from the perspective of information processing, and forms different networks according to different connection modes, and establishes models. In later years, ANN has been gradually applied to predict energy consumption and power load [29,30]. Ekonomou (2010) used ANN to predict energy demand [29]. Azadeh et al.(2013) proposed ANN based on multi-layer perceptron (MLP) training and to test data to forecast Iran’s energy consumption [30].

Deep learning was proposed by Hinton et al. in 2006 [31]. As the arrival of the era of big data, deep learning has been greatly developed. Scholars have proposed several time-series prediction methods based on in-depth learning, including Long Short-Term Memory (LSTM) [32,33], the Recurrent Neural Network (RNN) [34,35], and other prediction methods based on deep learning [36,37], which are used to forecast photovoltaic power generation, energy demand, and power load [36,38,39,40,41]. Liu et al. (2016) established the Gated Recurrent Unit prediction model (GRU) and forecasted China’s primary energy demand [40]. Lee and Kim (2019) used ANN, DNN, and LSTM to forecast photovoltaic power combined with meteorological information [41]. However, ANN and time-series prediction models based on deep learning have better performance in the training of large samples. Compared with ANN and deep learning, SVR has a good generalization ability and low-computational complexity.

The SVR model improves the generalization of learning machines by seeking structural risk minimization, so as to obtain good statistical rules in the case of fewer statistical samples [18]. This method has been widely used in financial market forecasting [42], weather forecasting [43], energy consumption and power load forecasting [44,45], etc. Jain et al. (2014) used SVR to construct sensor-based forecasting models to forecast energy consumption in New York buildings [44]. Chen et al. (2017) constructed SVR to forecast the power load based on the demand response [45]. The SVR model structure is simple, and the prediction accuracy is high. However, the model has better performance for short-term prediction. At present, more scholars choose to optimize SVR or combine it with other models to improve its generalization [46,47]. The applicability, advantages, and disadvantages of some single prediction models are listed in Table 1.

2.3. Optimization Algorithm

The intelligent optimization algorithm is an optimization algorithm with global optimization performance and strong versatility that is suitable for parallel computing [48]. In 1995, Kennedy and Eberhart [49] proposed the PSO algorithm, which represents significant progress in the research of the meta-heuristic optimization algorithm. The PSO algorithm starts from a random solution and finds the optimal solution through iteration. This algorithm has fewer parameters that need to be adjusted and the structure is simple. Thus, it is widely used in parameter optimization, engineering application, and stochastic optimization [50]. However, PSO also has the potential to fall into a local optimum. With the continuous evolution of optimization algorithms and the increasing complexity of optimization problems, more and more scholars are choosing to improve PSO for better application [51,52].

In recent years, some new swarm intelligence optimization algorithms have been proposed, including gray wolf optimization (GWO), moth-flame optimization (MFO), the whale optimization algorithm (WOA), etc. The GWO algorithm is an optimization method that imitates the hunting behavior of gray wolves [53]. The GWO algorithm has the characteristics of simple structure and good global performance, but its local search ability is poor [54]. The MFO algorithm is an optimization algorithm for simulating the flight characteristics of moths, which was proposed by Mirjalili in 2015 [55]. Although MFO has fast convergence speed and high accuracy, this method also has the problem of being easily trapped in local extreme points [56]. In 2016, the WOA was proposed by Mirjalili [57], and it is often used to solve large-scale, complex optimization problems [58]. Easily falling into local extremum is the common characteristic of most optimization models, and this represents a challenge and s goal for scholars trying to improve optimization models.

2.4. Ensemble-Based Methods

Energy management is affected by energy policy, population, consumption, and economic development, and the change in energy consumption is unstable. The single forecasting model can only extract some features of energy consumption time-series, and it is difficult to obtain a more perfect prediction effect. In later years, many scholars have chosen to adopt an ensemble model approach. More specifically, multiple single-prediction models have been established, and then these single models are combined. This combination can extract the advantages of each method and achieve more accurate prediction results [15,59,60]. Li et al. (2017) forecasted power demand based on a hybrid approach [59]. Xiao et al. (2018) combined with the grouping method of data processing selection set to construct an ensemble prediction model, which is mainly used to forecast the non-linear changes in energy consumption [60].

The ensemble model extracts the advantages of each individual model and has a higher prediction accuracy. Therefore, for constructing an adaptive ensemble model with stronger generalization ability and higher forecasting accuracy, determining the combined weight of every single model is the focus of the ensemble model [61,62]. The linear combination of predictive models can improve the predictive power of a single model, but the flexibility in assigning weights is relatively poor, thus affecting the adaptability of the ensemble model [63,64]. Liu et al. (2014) adjusted the weight of the ensemble model by PSO and used the optimized model to forecast carbon emissions [63]. Li et al. (2018) successfully developed an ensemble prediction model for short-term wind speed prediction using the variable weighted combination theory [64]. Thus, while improving the prediction accuracy of the model, the adaptability and flexibility of the enhanced model are also issues that deserve attention from scholars [65].

3. Methodology

The GM (1,1) [16], RGM [17], and ARIMA [12] in a time-series model have better performance in forecasting small-sized samples [8,9]. In addition, SVR in an AI method has good generalization ability [18]. In view of the small sample size of primary energy annual data, we combined ARIMA, GM (1,1), and RGM with SVR to construct three ensemble forecasting models based on standard deviation. Then, the global search ability of PSO was enhanced by improving the inertial weight, and the APSO algorithm was used to optimize the ensemble model with higher accuracy. Thus, an adaptive ensemble forecasting model with high accuracy and good generalization was constructed. The overall framework of this study is shown in Figure 1. In this section, we introduce the modeling principles and methods. We first introduce the modeling principles and processes of the four models: GM (1,1), RGM, ARIMA, and SVR in detail.

3.1. GM (1,1)

The basic idea of GM is that the original sequence

x^{(0)}

is composed of the original data, and the sequence

x^{(1)}

is generated by the method of cumulative generation [19]. GM can weaken the randomness of the original data and make them show more obvious characteristics. GM is a differential equation model that is used to generate a transformed sequence

x^{(1)}

.

GM (1,1) represents the first-order differential equation model with one variable. Specific steps are as follows:

Step 1. Assume

x^{(0)}

is the original sequence:

x^{(0)} = {x^{(0)} (1), x^{(0)} (2), \dots, x^{(0)} (n)}

(1)

where n is the number of years observed. Perform an accumulation of the original sequence

x^{(0)}

, recorded as the first-order accumulated generating sequence (1-AGO), and generate a sequence:

x^{(1)} = {x^{(1)} (1), x^{(1)} (2), \dots, x^{(1)} (n)}

(2)

where,

x^{(1)} (k) = \sum_{i = 1}^{k} x^{(0)} (i), k = 1, 2, \dots, n

(3)

Step 2. Assume

x^{(1)}

is continuously differentiable and satisfies the first-order linear differential equation:

{\begin{matrix} \frac{d x^{(1)} (t)}{d t} + a x^{(1)} (t) = u \\ x^{(1)} (t) | t = 0 = x^{(0)} (1) \end{matrix}

(4)

where a and u are pending parameters.

Step 3. Discretize Equation (4), and the differential equation becomes a difference equation of the form

x^{(0)} (t) + a z^{(1)} (t) = u

(5)

where

z^{(1)} (t) = {z^{(1)} (1), z^{(1)} (2), \dots, z^{(1)} (n)}

is a sequence of background values constructed by

x^{(1)} (t)

, namely,

z^{(1)} (t) = λ x^{(1)} (t - 1) + (1 - λ) x^{(1)} (t), t = 2, 3, \dots, n λ \in [0, 1]

(6)

where

λ

= 0.5.

Step 4. Solve for the parameters a and u in Equation (4) using the least squares method:

\hat{a} = {[a, u]}^{T} = {(B^{T}, B)}^{- 1} B^{T} Y_{N}

(7)

where

B = [\begin{matrix} - z^{(1)} (2) & 1 \\ - z^{(1)} (3) & 1 \\ ⋮ & ⋮ \\ - z^{(1)} (n) & 1 \end{matrix}], Y_{N} = [\begin{matrix} x^{(0)} (2) \\ x^{(0)} (3) \\ ⋮ \\ x^{(0)} (n) \end{matrix}]

(8)

Step 5. Establish a prediction formula. Solve Equation (8) after finding a and u:

{\hat{x}}^{(1)} (t) = (x^{(0)} (1) - \frac{u}{a}) e^{- a (t - 1)} + \frac{u}{a}

(9)

Perform a subtractive generation on Equation (9) to get the prediction formula:

{\hat{x}}^{(0)} (t) = (x^{(0)} (1) - \frac{u}{a}) (1 - e^{a}) + e^{- a (t - 1)}

(10)

3.2. RGM

In theory, GM (1,1) is a continuous-time function that can continue from the initial value to any moment in the future. As time goes by, some disturbances in the future may affect the system. The farther away the future is, the larger the grey interval of the predicted value. Therefore, for GM (1,1), the meaningful prediction data are only a few data points after

x^{(0)} (n)

, and other data can only represent the planning data under the premise that the current conditions are stable. For improving the precision of the predicted data, new information has to be constantly added while making full use of the known information [17].

Therefore, an improved prediction method is proposed: GM (1,1) is established based on the known series, a grey value is predicted, and then the predicted value is added to the known series to form a sequence of information. GM (1,1) is established based on the known series, a grey value is predicted once more, and then the predicted value is added to the known series again to form a new sequence of information. Whenever a new datum is added, GM (1,1) is created. Because the previous data are more and more unable to reflect the new situation, each time a new datum is added, the oldest datum is eliminated. Therefore, the dimensions of the sequence are unchanged, and then GM (1,1) is constructed to complete the prediction target. For each step of prediction, the parameters are corrected once, the model is improved, and the predicted values are generated dynamically. This prediction method is called the “grey model with the rolling mechanism” (RGM). Specific steps are as follows:

Step 1.: Construct GM (1, 1) by using the original sequence $x_{0}^{(0)} = (x^{(0)} (1), x^{(0)} (0), \dots, x^{(0)} (n))$ .
Step 2.: Forecast a new datum as $\bar{x} (n + 1)$ , add it to the original data $x_{0}^{(0)}$ , and remove the oldest original datum $x^{(0)} (1)$ to keep the original data dimension unchanged, the new sequence is recorded as $x_{1}^{(0)}$ .
Step 3.: Establish GM (1,1) with $x_{1}^{(0)} = (x^{(0)} (2), x^{(0)} (3), \dots, x^{(0)} (n), {\bar{x}}_{1} (n + 1))$ .
Step 4.: Forecast the next value, denoted as ${\bar{x}}_{2} (n + 1)$ . Add it to the original datum $x_{1}^{(0)}$ and remove one of the oldest raw datum $x^{(0)} (2)$ to form a new sequence: $x_{2}^{(0)}$ .

Remove the old and add the new, then add them one by one, until the prediction target is achieved, or a certain accuracy requirement is met. At each prediction step, the parameters are modified, and the model is improved constantly.

3.3. ARIMA

The digital characteristics of a non-stationary sequence, such as the mean, variance, and covariance, change with time. In 1970, Box and Jenkins [19] proposed a time-series analysis method based on stochastic theory. The basic models include three types [23]: the autoregressive model (AR (p)), the moving average model (MA (q)), and the ARIMA. The ARIMA carries out the disturbance analysis by using AR, single integer, and MA. It describes the change in the time-series by the extrapolation mechanism, and comprehensively takes into account the previous data, current fata, and error of the prediction variable, to improve the prediction accuracy.

For a single sequence

Y_{t}

, the non-stationary sequence can be translated into a stationary sequence

X_{t}

using the d-order difference so that the stationary sequence

X_{t}

is fitted to autoregressive moving average model (ARMA (p, q)).

The ARMA (p, q) consists of AR (p) and MA (q). The AR (p) is an autoregressive model with order p.

X_{t} = c + φ_{1} X_{t - 1} + \dots + φ_{p} X_{t - p} + u_{t}

(11)

where

φ_{1}, \dots, φ_{p}

are parameters,

c

is a constant, and the random variable

u_{t}

is a white-noise sequence.

MA (q) is a moving average model of order q with the expression:

X_{t} = μ + ε_{t} + θ_{1} ε_{t - 1} + \dots + θ_{q} ε_{t - q}

(12)

where

θ_{1}, \dots, θ_{q}

are parameters,

μ

is the expected value of

X_{t}

(usually assumed to be 0),

ε_{t}, ε_{t - 1}, \dots, ε_{t - q}

are the error terms of the white-noise sequence.

The expression of the ARMA (p, q) model is:

X_{t} = c + φ_{1} X_{t - 1} + \dots + φ_{p} X_{t - p} + ε_{t} + θ_{1} ε_{t - 1} + \dots + θ_{q} ε_{t - q}

(13)

The ARMA (p, q) is changed into ARIMA (p, d, q) by d-order differential transformation. In ARIMA (p, d, q), the parameter p is the order of AR (p), d is the difference, and q is the order of MA (q).

The construction of ARIMA (p, d, q) and the determination of the parameters are summarized as follows: (i) First of all, it should be examined whether the time-series is stable. A smooth time-series is the basic precondition for constructing this model. For unstable time-series, the d-order difference should be made before the model is constructed to make it stationary. (ii) Then the main parameters of the model need to be determined. The d-value can be obtained by the difference, and then the q-value and the p-value can be obtained by the autocorrelation function and the partial correlation function. (iii) After the model parameters are determined, the saliency of the model is tested to determine the rationality of each parameter and to verify whether the model’s settings conform to the specifications. (iv) The model is trained with the sample, the sample data are fitted, and ARIMA (p, d, q) is found.

3.4. SVR

The basic principle of SVR is to map the input factor

x

to the high-dimensional feature space through the non-linear mapping

φ

, and then linear regression is performed in this high-dimensional space to find the function

f (x)

[44]. The modeling steps are as follows.

Step 1. Given a set of numbers

G = {(x_{i}, y_{i})}_{i = 1}^{n}

, with

x_{i}

the input factor,

y_{i}

the expected value, and

n

is the amount of data, then the expression is found as:

f (x_{i}) = ω^{T} φ (x_{i}) + b

(14)

where,

ω

is the weight and b is the bias.

Step 2. By means of the principle of structural risk minimization, the parameters

ω

and b are estimated:

{\begin{matrix} \min J = \frac{1}{2} {‖ w ‖}^{2} + C \sum_{i = 1}^{n} (ξ_{i} + ξ_{i}^{*}) \\ s . t . {\begin{matrix} y_{i} - f (x_{i}) \leq ε + ξ_{i} \\ f (x_{i}) - y_{i} \leq ε + ξ_{i}^{*} \\ ξ_{i} \geq 0, ξ_{i}^{*} \geq 0, i = 1, 2 \dots n \end{matrix} \end{matrix}

(15)

where

{‖ w ‖}^{2}

is a confidence risk,

C

is a penalty parameter,

ε

is an insensitive coefficient, and

ξ_{i}^{*}

and

ξ_{i}

are slack variables.

Step 3. For convenience, Equation (15) is transformed into a dual problem, and SVR function is obtained:

f (x_{i}) = \sum_{i = 1}^{n} (α_{i} - α_{i}^{*}) K (X_{i}, X) + b

(16)

where

α_{i}

and

α_{i}^{*}

are support vector parameters, and

K (X_{t}, X)

is an inner product function.

Step 4. Since the radial basis function (RBF) is a good general kernel and can achieve non-linear projection [66], according to the Mercer condition, the kernel function is defined, and RBF is selected:

K (X_{i}, X) = \exp (- \frac{{‖ x_{i} - x ‖}^{2}}{2 σ^{2}})

(17)

By substituting Equation (17) into Equation (16), we obtain the following equivalent Equation (18):

f (x_{i}) = \sum_{i = 1}^{n} α_{i} \exp {- \frac{{‖ x_{i} - x ‖}^{2}}{2 σ^{2}}} + b

(18)

where

σ

is a hyper-parameter of the RBF kernel that defines the feature-length scale of the similarity between learning samples, and

x

is the center of the kernel function.

Equation (18) is operated to obtain parameters

α_{i}

and

b

, thereby obtaining an energy demand prediction model.

3.5. Model Evaluation Criteria

Three criteria are used to evaluate the performance of a forecasting algorithm: Mean Absolute Error (MAE), Mean Absolute Percent Error (MAPE), and Mean Square Percent Error (MPSE) which are defined as

M A E = \frac{1}{n} \sum_{i = 1}^{n} | e_{i} | = \frac{1}{n} \sum_{i = 1}^{n} | x_{i} - {\hat{x}}_{i} |

(19)

M A P E = \frac{1}{n} \sum_{i = 1}^{n} | \frac{e_{i}}{x_{i}} | = \frac{1}{n} \sum_{i = 1}^{n} | \frac{x_{i} - {\hat{x}}_{i}}{x_{i}} |

(20)

M S P E = \frac{1}{n} \sqrt{\sum_{i = 1}^{n} {(\frac{e_{i}}{x_{i}})}^{2}} = \frac{1}{n} \sqrt{\sum_{i = 1}^{n} {(\frac{x_{i} - {\hat{x}}_{i}}{x_{i}})}^{2}}

(21)

where,

e_{i}

represents the error values,

x_{i}

represents the sample data,

{\hat{x}}_{i}

represents the prediction data, and n is the sample size.

The accuracy examination of each forecasting algorithm is determined using the appropriate tests.

4. Construction of Ensemble Forecasting Model

In the process of constructing the ensemble forecasting model, the most important thing is to extract the effective forecasting results of the single forecasting model, that is, to determine the weight of each single forecasting model. In this paper, GM (1,1), RGM, and ARIMA are combined with SVR, respectively, and the weights of the ensemble prediction model are determined based on the standard deviation method to forecast China’s primary energy demand. Then, we propose adding the momentum factor into PSO to improve the global search ability of PSO, and compare with PSO, GWO, MFO, and WOA to test the optimization accuracy of the improved PSO (APSO). Finally, the cost function is constructed by the sum of the absolute values of the prediction errors, and RGM-SVR is optimized by the improved PSO to achieve the best combination; thus, the adaptive weighted ensemble prediction model (APSO-RGM-SVR) is constructed to forecast China’s primary energy demand.

4.1. Construction of the Ensemble Model Based on the Standard Deviation Weight Method

In this section, we describe three ensemble models: GM (1,1)-SVR, RGM-SVR, and ARIMA-SVR. The relevant parameters of each single model are estimated by model training and calculation. Then we calculate the weights of each model based on the standard deviation method to combine the single models based on the calculated weights.

4.1.1. Standard Deviation Method

The standard deviation weight method is a relatively common weight calculation method. The calculation formula is as follows:

Assume the standard deviations of the prediction methods are

s_{1}

and

s_{2}

, and

s = s_{1} + s_{2}

(22)

where,

s_{1}

and

s_{2}

are the standard deviations of the prediction error of each model.

Then, the weights are

ω_{i} = \frac{s - s_{i}}{s}, i = 1, 2

(23)

4.1.2. GM (1,1)-SVR

The modeling procedure for the GM (1,1)-SVR combined prediction model is shown in Figure 2. The modeling steps are as follows:

Step 1.: GM (1,1) and SVR are constructed, respectively. According to Step 4 of Section 3.1., the parameters a and u of GM (1,1) are calculated as shown in Table 2. The Lagrangian multiplier method is used to transform SVR into a “dual problem”, and then the parameters are solved using sequential minimal optimization (SMO). The parameters of SVR have been tested repeatedly. It is shown that the fitting result is optimal when C = 10000 and $σ^{2}$ = 0.01. The parameters $α_{i}$ and b of the SVR are estimated, as shown in Table 3.
Step 2.: The results of GM (1,1) and SVR are assigned weights according to the standard deviation method, and the weight distribution is as shown in Table 4.
Step 3.: The accuracy of the GM (1,1)-SVR is evaluated by MAE, MAPE, and MSPE.

4.1.3. RGM-SVR

The description of the RGM-SVR combined prediction model is shown in Figure 3. The steps are as follows:

Step 1.: RGM and SVR are constructed, respectively. Take China’s primary energy consumption from 2005 to 2012 as a training set, which is brought into the model for parameter estimation. The training samples are unchanged, therefore, the estimated parameters of the SVR model are unchanged. The parameter estimation results of the RGM model are shown in Table 5. The parameters of SVR have been tested repeatedly. C = 10000, $σ^{2}$ = 0.01. The parameters $α_{i}$ and b of the SVR are estimated as shown in Table 3.
Step 2.: The prediction results of RGM and SVR are assigned weights according to the standard deviation method. The weight distribution is shown in Table 6.
Step 3.: The accuracy of RGM-SVR is evaluated by MAE, MAPE, and MSPE.

4.1.4. ARIMA-SVR

The description of the ARIMA-SVR model is shown in Figure 4. The steps are as follows.

Step 1.: Establish an ARIMA (p, d, q) model for China’s primary energy demand forecast and examine whether the input samples are stationary time-series.
Step 2.: Perform a time-series difference d on the sample. Begin with the first-order difference and observe the sequence. After the d-order difference, the sequence tends to be stationary, and the difference order is the parameter d of ARIMA.
Step 3.: Estimate the parameters p and q. Check and observe the autocorrelation and partial correlation graphs of the stationary time series to obtain q and p, and so that the parameters of ARIMA are determined which are shown in Table 7.
Step 4.: Assign weights to ARIMA and SVR by the standard deviation method, and the weight distribution is as shown in Table 8.
Step 5.: The accuracy of ARIMA-SVR is evaluated by MAE, MAPE, and MSPE.

4.2. Construction of the Adaptive Weight Ensemble Forecasting Model

Linear and non-linear combinations are the most commonly used methods for ensemble forecasting models. However, when the prediction accuracy of the predicted system is unstable, the fixed-weight combination cannot be adjusted according to the changes in the predicted system in time, and thus, the generalization is poor. The adaptive variable weight combination can adjust the weight of each individual prediction model in time, thus improving the overall prediction accuracy and generalization of the ensemble model. Therefore, this section introduces the method of improving the inertia weight of PSO (APSO). Then, APSO is compared with PSO, GWO, MFO, and WOA to test the performance of the optimization algorithms. Finally, APSO is used to optimize the ensemble model to construct the adaptive weight ensemble prediction model (APSO-RGM-SVR).

4.2.1. PSO

PSO is a new evolutionary algorithm with easy implementation, high precision, and fast convergence. Therefore, PSO is suitable for solving practical problems.

The algorithm principle is as follows.

PSO abstracts the potential solution of the optimization problem into a particle in the D-dimensional search space. Each particle has a fitness value set according to the objective function, and the particle adjusts its position s according to the flight speed v. Assume the search space has P particles composing a population

s = (s_{1}, s_{2}, \dots, s_{i}, \dots, s_{P})

, where the position of the ith particle (

1 \leq i \leq P

) is a D-dimensional vector. Then the position change rate of particle i at time t is

v_{i}^{t} = (v_{i 1}^{t}, v_{i 2}^{t}, \dots, v_{i D}^{t})

, and the current optimal position of particle i is represented as

p_{i}^{t} = (p_{i 1}^{t}, p_{i 2}^{t}, \dots, p_{i D}^{t})

, which is often denoted by

p_{b e s t}

. The optimal position of the population from the particle swarm search to time t is denoted by

p_{g}^{t} = (p_{g 1}^{t}, p_{g 2}^{t}, \dots, p_{g D}^{t})

, or by

g_{b e s t}

. The position of the particle is updated by the following formula:

{\begin{array}{l} v_{i d}^{t + 1} = ω v_{i d}^{t} + c_{1} r_{1} (p_{i d}^{t} - x_{i d}^{t}) + c_{2} r_{2} (p_{g d}^{t} - x_{g d}^{t}) \\ s_{i d}^{t + 1} = s_{i d}^{t} + v_{i d}^{t + 1} \end{array}

(24)

where

ω

is the inertia weight,

d = 1, 2, \dots, D

is the particle dimension,

t

is the current iteration number,

v_{i d} \in [v_{m i n}, v_{m a x}]

is the particle velocity, the acceleration factors

c_{1}

and

c_{2}

are non-negative constants, and

r_{1}

and

r_{2}

are random numbers distributed in the interval (0, 1).

4.2.2. APSO

The main factor affecting the search capability of particle swarm is the inertia weight. For improving the adaptive ability of PSO, an adaptive adjustment method based on inertia weight is proposed, which is called APSO. The algorithm is briefly described as follows:

Define the particle population diversity function

F_{d} (t) = \frac{f_{m i n} (α (t))}{f_{m i n} (α (t)) + f_{m a x} (α (t))}

(25)

where

{\begin{matrix} f_{m i n} (α (t)) = Min (f (α_{i} (t))) \\ f_{m a x} (α (t)) = Max (f (α_{i} (t))) \end{matrix}

(26)

where

f (a_{i} (t))

is the fitness value of the ith particle (

i = 1, 2, \dots, P,

), and

f_{m i n} (α (t))

and

f_{m a x} (α (t))

are the minimum and maximum values of the particle fitness at time

t

, respectively.

The diversity function

F_{d} (t)

can be used to describe the motion characteristics of the particle, and the non-linear function

δ (t)

is defined accordingly for the adaptive adjustment of the inertia weight:

δ (t) = e^{{(F_{d} (t) - L)}^{- 1}}

(27)

where

L \geq 2

is the initialization constant.

The adjustment rules for defining particle inertia weight adaptation are

v_{i}^{t + 1} = β v_{i}^{t} + η d ω_{i}^{t}

(28)

ω_{i}^{t + 1} = e^{- {(δ (t) ε_{i}^{t} + v_{i}^{t + 1})}^{- 1}}

(29)

where

v_{i}^{t}

is the velocity of the particle at time t,

β

is the “momentum” hyper-parameter, that is usually set to 0.9,

η

is the learning rate,

d ω_{i}^{t}

is the gradient of the weight,

ξ > 0

is a custom slack variable used to adjust the global search ability of the ion, and

ε_{i}^{t}

is the distance between the particle

i

and the global optimal particle, defined as

ε_{i}^{t} = e^{\frac{{‖ s_{i}^{t} - g_{b} ‖}^{2}}{D}}

(30)

where

s_{i}^{t}

and

g_{b}

are the positions of particle

i

and the global optimal particle at time

t

, respectively.

4.2.3. Contrast Test of Optimization Algorithms

The PSO algorithm has fewer parameters that need to be adjusted and the structure is simple. However, PSO also has the potential to fall into local optimum. In recent years, some new swarm intelligence optimization algorithms have been proposed, including GWO, MFO, and WOA. These algorithms have high accuracy, but the problem of falling into local extremum is the common characteristic of most optimization models. Therefore, we improved PSO, which has fewer parameters and a simpler structure, to improve the global search ability of the optimization algorithm. In order to test the performance of the improved optimization algorithm, we conducted a comparative test.

The optimization performance of APSO was tested in a 30-dimensional search space. The population size was set to 30, the maximum number of iterations was 1000, and the benchmark functions are F₁–F₆. The standard deviation comparison results after 40 independent experiments of GWO, MFO, WOA, POS, and APSO are shown in Table 9.

It can be seen from Table 10 that for six benchmark functions, the optimization accuracy of APSO for most functions was many orders of magnitude higher than that of PSO, and the accuracy of APSO was better than that of traditional GWO, MFO, and WOA. In the process of solving continuous single-mode functions, APSO had the highest accuracy for solving function F₁. For the optimization of function F₃, APSO was slightly better than other algorithms. For the multimodal functions F₄, F₅, and F₆ with multiple local extremums, the optimization accuracy of APSO was significantly better than other algorithms.

4.2.4. Construction of APSO-RGM-SVR Based on Adaptive Inertia Weight

Based on the advantages of various forecasting models, considering the characteristics of the small sample size and long forecasting target of our paper, we combined RGM with SVR. Then, the combined weights of the two prediction models were determined by APSO. Finally, an adaptive optimal weight ensemble prediction model was established, which is shown in Figure 5. The involved steps are as follows.

Step 1.: Based on SVR and RGM, a model for China’s primary energy demand forecasting is constructed and brought into the sample training set. The parameters of SVR and RGM required to determine the model are shown in Table 3 and Table 5.
Step 2.: The sum of the absolute values of the prediction errors is used as the cost function of the adaptive weight. The expression is

$\min Q = \sum_{i = 1}^{n} | e_{i} |$

(31)

where $e_{i}$ represents the error values, and n is the sample size.
Step 3.: The optimal weighted model is obtained by the APSO based on the adaptive inertia weight, and thus, an optimal ensemble forecasting model is constructed. The optimal weight is shown in Table 11:

$x_{i (A P S O - R G M - S V R)} = ω_{i} x_{i (R G M)} + (1 - ω_{i}) x_{i (S V R)}$

(32)
Step 4.: The prediction accuracy of the APSO-RGM-SVR prediction model is evaluated by MAE, MAPE, and MSPE.

5. Results

The previous section introduced the data sources, described the detailed steps for constructing the four ensemble models, and estimated the parameters of each model. The calculation steps for applying the model for prediction are similar to those used to evaluate the fitting, and thus are not described here. This section introduces the dataset and presents the fitting results and evaluation of primary energy consumption by the single forecasting models, unoptimized ensemble models, and optimized ensemble models. Then, the forecast results of the future primary energy demand and primary energy structure are presented.

5.1. Data

The data used to construct China’s primary energy demand and energy structure prediction models came from the National Bureau of Statistics of China (http://www.stats.gov.cn/tjsj/). We selected 12 years (2005–2016) of annual consumption data as a dataset, including China’s total annual energy consumption, coal, crude oil, natural gas, and annual consumption of water, wind, and nuclear power, as shown in Table 12. Seventy percent of the dataset (2005–2012) was used as a model training set, and 30% of the sample (2013–2016) was used as the model test set.

Figure 6 depicts the annual consumption of coal, crude oil, natural gas, water, wind, and nuclear power as well as China’s total primary energy consumption. The growth rate of energy consumption is shown in Figure 7. From 2005 to 2016, China’s total energy consumption showed an overall growth trend. During the period 2005–2011, primary energy demand showed rapid growth. For promoting economic development, the demand for primary energy increased steadily, especially the demand for coal. Oil and natural gas are affected by supply, and their growth rates were small. Under the influence of environmental policies, the total primary energy consumption growth slowed from 2011. The primary energy structure began to change significantly, and coal consumption growth began to slow down and decline. The proportion of clean energy began to increase.

5.2. Model Adaptability

The experimental data came from the total annual consumption of primary energy in China and the annual consumption of four types of primary energy for the years 2005–2016. The dataset was divided into two parts: data from 2005–2012 was used as model data, and data from 2013–2016 was used as a test dataset. The prediction results of the prediction model including GM (1,1), RGM, ARIMA, SVR, GM (1,1)-SVR, RGM-SVR, ARIMA-SVR, and APSO-RGM-SVR are shown in Table 13. Then, the accuracy was evaluated by MAE, MAPE, and MSPE. As far as a single model is concerned, fitting results of SVR are better in the model training phase, followed by RGM and GM (1,1), while SVR had the highest prediction accuracy in the test phase. The optimization of the rolling mechanism makes the accuracy of RGM higher than GM (1,1). The ARIMA showed very poor prediction effect in both the training phase and the prediction phase. The ensemble model had better prediction accuracy than the single models. Compared with other unoptimized ensemble models, the RGM-SVR model had better fitting and prediction effects both in training and testing. However, the adaptive weight ensemble model APSO-RGM-SVR based on the APSO proposed in this study had higher prediction accuracy and stronger adaptability.

The best evaluation results for each type of energy are indicated in bold in Table 13. Because the variation of consumption of each type of energy was different, the characteristics extracted by each prediction method during training were also different. From the prediction results of the total annual energy consumption, the RGM and GM (1,1) had the best fitting effect in the training phase. Because the modeling steps of RGM and GM (1,1) were the same in the training phase, the fitting results of the two models in the phase were basically the same. The results of the three evaluations, MAE, MAPE, and MSPE were 28.310, 0.00840, and 0.00384, respectively, which were the lowest values, followed by those of APSO-RGM-SVR. In the test phase, the values of the three evaluations of APSO-RGM-SVR were 26.252, 0.00616, and 0.00352, respectively, which were the lowest values. The fitting and prediction results of annual coal consumption show that the fitting errors of RGM and GM (1,1) were the smallest in the training phase, and the MAE, MAPE, and MSPE of the two models were 29.404, 0.01207, and 0.00516, and 2939.859, 0.01207, and 0.00516, respectively, i.e., they were approximately the same. In the test phase, the APSO-RGM-SVR model had the highest accuracy. The three error evaluations results of the APSO-RGM-SVR model were the lowest compared with other methods: 14066, 0.00505, and 0.00320. For the training and test results of annual oil consumption, SVR had the highest fitting degree to the training data. The results of the three error evaluations were 4.345, 0.00700, and 0.00651, respectively. In the test phase, the accuracy of APSO-RGM-SVR was still higher than that of the other models, and the results of the three error evaluations were 5.956, 0.00785, and 0.00448, respectively. For the results of the annual consumption of natural gas, the results of the MAE and MAPE evaluations in the training phase were analyzed, and SVR had the smallest error of 0.622 and 0.00549, respectively, indicating that the SVR model had the highest fitting degree. However, the analysis results of MSPE, APSO-RGM-SVR, and RGM-SVR error evaluations were the lowest—0.00512. Most noteworthy, the MSPE value was low, indicating that the fitting curve was relatively stable, and the abnormality was low. For a fitted curve with a large MAPE value, if it is relatively stable, its MSPE value is low. Therefore, the MSPE value cannot fully reflect the accuracy of the curve fitting. In evaluating the prediction model, it should be combined with MAE, MAPE, and MSPE to give a comprehensive evaluation. In the test phase, the three error evaluations of APSO-RGM-SVR—2.092, 0.01259, and 0.00995—were the lowest, showing an optimal prediction effect. For the training and test results of water, wind, and nuclear power consumption, in the training phase, it can be seen from the values of MAE and MAPE (7.912 and 0.02546, respectively) that the SVR errors were the smallest, indicating that the SVR model had the highest fitting degree. However, regarding the value of MSPE, the error evaluations of RGM and GM (1,1) (both were 0.01333) were the lowest. In the test phase, the APSO-RGM-SVR model had the best prediction capacity, and the three error evaluations were the lowest—10.947, 0.02024, and 0.01232, respectively.

The fitting and prediction curves for primary energy consumption are shown in Figure 8a–e. In the training phase, ARIMA had the worst fitting effect, and the fitting effects of the GM (1,1), RGM, SVR, and APSO-RGM-SVR model were ideal. However, in the test phase, the ensemble model showed its advantages, especially the APSO-RGM-SVR model, which had a high fitting degree to the real data.

5.3. Prediction Results

Using the sample information, the APSO-RGM-SVR model was selected to forecast China’s total annual energy demand and annual demand for the four types of primary energy for the year 2020 and 2025. The prediction results are shown in Table 14. The total primary energy demand was predicted to increase to 4465.41 mtce in 2020 and 4825.43 mtce in 2025. Compared with 2016, the demand for coal was predicted to decrease. It was estimated that the demand for coal will be 2637.23 and 2203.86 mtce in 2020 and 2025, respectively. The total demand for oil is growing slowly. It was estimated that the demand for oil will be 922.15 and 989.19 mtce by 2020 and 2025, respectively. In the future, the demand for clean energy was predicted to drastically. It is estimated that the demand for natural gas will be 409.53 mtce in 2020 and 540.64 mtce in 2025. Water, wind, and nuclear power were predicted to receive more and more attention in the future energy market. The demand for 2020 was predicted to be 687.50 mtce, and the demand for 2025 was predicted to be 1091.74 mtce.

Figure 9a shows the prediction results of energy for the years 2017–2025 and the integration and the comparison of the energy demand structure with the energy structure, for the years 2005–2017. Figure 9b shows the ratio of coal, oil, natural gas, water, wind, and nuclear power for the years 2005 and 2025. As can be seen from Figure 9, China’s total primary energy demand has grown steadily. The demand for natural gas and water, wind, and nuclear power has risen sharply. In 2020, the ratio of natural gas was predicted to be about 8.8%. In 2025, the ratio of natural gas was expected to reach 11.2%. It was estimated that the ratio of water, wind, and nuclear power will reach 14.8% and 22.6% in 2020 and 2025, respectively. Coal and oil that cause pollution to the environment were predicted to gradually decrease. The ratio of coal will fall to 56.6% in 2020 and 45.7% in 2025. It was estimated that oil will account for 19.8% in 2020 and around 20.5% in 2025, with a very small increase.

Figure 10 shows the growth rates of annual demand for primary energy for the next nine years. Table 15 shows the average annual growth rate of primary energy demand. The results show that the total primary energy demand was predicted to grow steadily in the future, with an average annual growth rate of 1.14%. The annual growth rate of coal demand is negative, i.e., it is declining year by year with an average annual rate of 2.2%. Although the annual demand for oil has increased yearly, the growth rate has shown a downward trend, and the average annual growth rate is 2.42%. It is expected that natural gas and clean energy such as water, wind, and nuclear power will be fully developed. Although the growth rate is predicted to decline slightly after 2020, the annual demand is still predicted to showing a very high growth trend. The average annual growth rates of the two types of primary energy demand were predicted to be 7.43% and 7.7%.

6. Conclusions

Accurate forecasting of energy demand can help decision departments to develop reasonable policies and plans. At the same time, by forecasting energy demand and structure, the connections between energy consumption and economic, social, and environmental protection can be established to guide the involved departments to adjust the energy structure and industrial layout in a targeted manner. The purpose of this study was to develop an ensemble forecasting technology based on a time-series model and artificial intelligence, together with APSO to determine the optimal combination weight. The ensemble forecasting technology was optimized to predict China’s primary energy demand and primary energy structure more accurately. Through the research results, we obtained the following three important findings:

(1): The improved PSO algorithm with the “momentum” factor added had better global search capabilities.
(2): By comparing a variety of single-prediction models and fixed-weight ensemble models, the prediction results of the APSO-RGM-SVR model proposed in this study showed the smallest error, and the prediction accuracy of energy demand was significantly improved. The ensemble model (APSO-RGM-SVR), which was optimized by the APSO algorithm, effectively combined the characteristics of a time-series model and artificial intelligence, and could adjust the optimal weight at any time with the change of samples. The new method proposed in this paper is a prediction method with higher accuracy and better generalization.
(3): The APSO-RGM-SVR method was used to forecast China’s primary energy consumption from 2017 to 2025. Prediction results indicate that China’s energy demand will continue to rise. By 2020, China’s primary energy demand was predicted to reach about 4656.41 mtce, and the proportions of coal, oil, natural gas, and hydropower and nuclear power were predicted to be 56.6%, 19.8%, 8.8%, and 14.8%, respectively. By 2025, China’s primary energy demand was predicted to reach about 4656.41 mtce, and the proportion of coal, oil, natural gas, and hydro, nuclear, and wind power were predicted to be 45.7%, 20.5%, 11.2%, and 22.6%, respectively.

According to the evaluation results of the model training situation and predictive ability, it can be considered that the prediction model APSO-RGM-SVR proposed in this paper can be used for future short-term and medium-term primary energy demand forecasting and primary energy structure prediction in China, and it can produce better prediction results. At the same time, this method can also be used for the same types of data training and prediction in other regions and countries. However, whether the model proposed in this study is applicable to the prediction of large samples remains to be tested. This is also a direction that we need to explore further.

Author Contributions

W.Z. conceived and designed the experiments, and wrote the paper; J.Z. and Z.J. analyzed the data; X.Y. modified the manuscript; P.W. performed the experiments; and all authors read and approved the final manuscript.

Funding

This study was funded by the National Natural Science Foundation of China [grant number 41401655], the Qualified Personnel Foundation of Taiyuan University of Technology [grant number 2013W005], Program for the Top Young Academic Leaders of Higher Learning Institutions of Shanxi [grant number 163080127-S], and Program for the Philosophy and Social Sciences Research of Higher Learning Institutions of Shanxi [grant number 2016311], and the Postgraduate Education Innovation Project of Shanxi [grant number 2018BY044]. The APC was funded by [grant number 2018BY044].

Acknowledgments

The authors would like to thank the support from the project grants: the National Natural Science Foundation of China [grant number 41401655], the Qualified Personnel Foundation of Taiyuan University of Technology [grant number 2013W005], Program for the Top Young Academic Leaders of Higher Learning Institutions of Shanxi [grant number 163080127-S], Program for the Philosophy and Social Sciences Research of Higher Learning Institutions of Shanxi [grant number 2016311], and the Postgraduate Education Innovation Project of Shanxi [grant number 2018BY044].

Conflicts of Interest

The authors declare no conflict of interest.

References

Xie, N.M.; Yuan, C.Q.; Yang, Y.J. Forecasting China’s energy demand and self-sufficiency rate by grey forecasting model and Markov model. Int. J. Electr. Power Energy Syst. 2015, 66, 1–8. [Google Scholar] [CrossRef]
Shaikh, F.; Ji, Q. Forecasting natural gas demand in China: Logistic modelling analysis. Electr. Power Energy Syst. 2016, 77, 25–32. [Google Scholar] [CrossRef]
Lin, B.; Jiang, Z. Estimates of energy subsidies in China and impact of energy subsidy reform. Energy Econ. 2011, 33, 273–283. [Google Scholar] [CrossRef]
Hu, Y.; Guo, D.; Wang, M.; Zhang, X.; Wang, S. The relationship between energy consumption and economic growth: Evidence from china’s industrial sectors. Energies 2015, 8, 9392–9406. [Google Scholar] [CrossRef]
Li, M.; Wang, W.; De, G.; Ji, X.; Tan, Z. Forecasting Carbon Emissions Related to Energy Consumption in Beijing-Tianjin-Hebei Region Based on Grey Prediction Theory and Extreme Learning Machine Optimized by Support Vector Machine Algorithm. Energies 2018, 11, 2475. [Google Scholar]
Rehman, S.; Cai, Y.; Fazal, R.; Das Walasai, G.; Mirjat, N. An Integrated Modeling Approach for Forecasting Long-Term Energy Demand in Pakistan. Energies 2017, 10, 1868. [Google Scholar] [CrossRef]
He, Y.D.; Lin, B.Q. Forecasting China’s total energy demand and its structure using ADL-MIDAS model. Energy 2018, 151, 420–429. [Google Scholar] [CrossRef]
Suganthi, L.; Samuel, A.A. Energy models for demand forecasting—A review. Renew. Sustain. Energy Rev. 2012, 16, 1223–1240. [Google Scholar] [CrossRef]
Debnath, K.B.; Mourshed, M. Forecasting methods in energy planning models. Renew. Sustain. Energy Rev. 2018, 88, 297–325. [Google Scholar] [CrossRef]
Wang, Q.; Li, S.Y.; Li, R.R. Forecasting energy demand in China and India: Using single-linear, hybrid-linear, and non-linear time series forecast techniques. Energy 2018, 161, 821–831. [Google Scholar] [CrossRef]
Mustafa, A.; Nejat, Y. Year Ahead Demand Forecast of City Natural Gas Using Seasonal Time Series Methods. Energies 2016, 9, 727. [Google Scholar]
Ediger, V.S.; Akar, S. ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy 2007, 35, 1701–1708. [Google Scholar] [CrossRef]
Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372. [Google Scholar] [CrossRef]
Barak, S.; Sadegh, S.S. Forecasting energy consumption using ensemble ARIMA–ANFIS hybrid algorithm. Int. J. Electr. Power Energy Syst. 2016, 82, 92–104. [Google Scholar] [CrossRef]
Yuan, C.Q.; Liu, S.F.; Fang, Z.G. Comparison of China’s primary energy consumption forecasting by using ARIMA (the autoregressive integrated moving average) model and GM (1, 1) model. Energy 2016, 100, 384–390. [Google Scholar] [CrossRef]
Deng, J.L. Control problem of grey system. Sys. Contr. Lett. 1982, 5, 288–294. (In Chinese) [Google Scholar]
Akay, D.; Atak, M. Grey prediction with rolling mechanism for electricity demand forecasting of Turkey. Energy 2007, 32, 1670–1675. [Google Scholar] [CrossRef]
Wu, C.H.; Ho, J.M.; Lee, D.T. Travel-time prediction with support vector regression. IEEE Trans. Intell. Transp. Syst. 2004, 5, 276–281. [Google Scholar] [CrossRef]
Box, G.E.P.; Jenkins, G.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons.: Hoboken, NJ, USA, 2015; pp. 88–103. [Google Scholar]
Bianco, V.; Manca, O.; Nardini, S. Electricity consumption forecasting in Italy using linear regression models. Energy 2009, 34, 1413–1421. [Google Scholar] [CrossRef]
Contreras, J.; Espinola, R.; Nogales, F.J.; Conejo, A.J. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. 2003, 18, 1014–1020. [Google Scholar] [CrossRef]
Abdel-Aal, R.E.; Al-Garni, A.Z. Forecasting monthly electric energy consumption in eastern Saudi Arabia using univariate time-series analysis. Energy 2014, 22, 1059–1069. [Google Scholar] [CrossRef]
Wang, Q.; Li, S.; Li, R. China’s dependency on foreign oil will exceed 80% by 2030: Developing a novel NMGM-ARIMA to forecast China’s foreign oil dependence from two dimensions. Energy 2018, 163, 151–167. [Google Scholar] [CrossRef]
Zhao, H.; Guo, S. An optimized grey model for annual power load forecasting. Energy 2016, 107, 272–286. [Google Scholar] [CrossRef]
Tsai, S.B.; Xue, Y.; Zhang, J.; Chen, Q.; Liu, Y.; Zhou, J.; Dong, W. Models for forecasting growth trends in renewable energy. Renew. Sustain. Energy Rev. 2017, 77, 1169–1178. [Google Scholar] [CrossRef]
Li, S.; Li, R. Comparison of forecasting energy consumption in Shandong, China Using the ARIMA model, GM model, and ARIMA-GM model. Sustainability 2017, 9, 1181. [Google Scholar]
Ghanbari, A.; Hadavandi, E.; Abbasian-Naghneh, S. Comparison of artificial intelligence based techniques for short term load forecasting. In Proceedings of the 2010 Third International Conference on Business Intelligence and Financial Engineering, Hong Kong, China, 13–15 August 2010; pp. 6–10. [Google Scholar]
Daut, M.A.M.; Hassan, M.Y.; Abdullah, H.; Hussin, F. Building electrical energy consumption forecasting analysis using conventional and artificial intelligence methods: A review. Renew. Sustain. Energy Rev. 2017, 70, 1108–1118. [Google Scholar] [CrossRef]
Ekonomou, L. Greek long-term energy consumption prediction using artificial neural networks. Energy 2010, 35, 512–517. [Google Scholar] [CrossRef]
Azadeh, A.; Babazadeh, R.; Asadzadeh, S.M. Optimum estimation and forecasting of renewable energy consumption by artificial neural networks. Renew. Sustain. Energy Rev. 2013, 27, 605–612. [Google Scholar] [CrossRef]
Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
Lee, W.; Kim, K.; Park, J.; Kim, J.; Kim, Y. Forecasting Solar Power Using Long-Short Term Memory and Convolutional Neural Networks. IEEE Access 2018, 6, 73068–73080. [Google Scholar] [CrossRef]
Graves, A.; Fernández, S.; Schmidhuber, J. Multi-dimensional recurrent neural networks. In Proceedings of the International Conference on Artificial Neural Networks, Porto, Portugal, 9–13 September 2007; pp. 549–558. [Google Scholar]
Abdel-Nasser, M.; Mahmoud, K. Accurate photovoltaic power forecasting models using deep LSTM-RNN. Neural Comput. Applic. 2017, 1–14. [Google Scholar] [CrossRef]
Ryu, S.; Noh, J.; Kim, H. Deep neural network based demand side short term load forecasting. Energies 2016, 10, 3. [Google Scholar] [CrossRef]
Son, J.; Park, Y.; Lee, J.; Kim, H. Sensorless PV power forecasting in grid-connected buildings through deep learning. Sensors 2018, 18, 2529. [Google Scholar] [CrossRef]
Zhang, J.; Verschae, R.; Nobuhara, S.; Lalonde, J.F. Deep photovoltaic nowcasting. Solar Energy 2018, 176, 267–276. [Google Scholar] [CrossRef]
Mathe, J.; Miolane, N.; Sebastien, N.; Lequeux, J. PVNet: A LRCN Architecture for Spatio-Temporal Photovoltaic PowerForecasting from Numerical Weather Prediction. arXiv Preprint, 2019; arXiv:1902.01453. [Google Scholar]
Liu, B.; Fu, C.; Bielefield, A.; Liu, Y. Forecasting of Chinese Primary Energy Consumption in 2021 with GRU Artificial Neural Network. Energies 2017, 10, 1453. [Google Scholar] [CrossRef]
Lee, D.; Kim, K. Recurrent Neural Network-Based Hourly Prediction of Photovoltaic Power Output Using Meteorological Information. Energies 2019, 12, 215. [Google Scholar] [CrossRef]
Nava, N.; Di Matteo, T.; Aste, T. Financial time series forecasting using empirical mode decomposition and support vector regression. Risks 2018, 6, 7. [Google Scholar] [CrossRef]
Kisi, O.; Cimen, M. Precipitation forecasting by using wavelet-support vector machine conjunction model. Eng. Appl. Artif. Intell. 2012, 25, 783–792. [Google Scholar] [CrossRef]
Jain, R.K.; Smith, K.M.; Culligan, P.J. Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy. Appl. Energy 2014, 123, 168–178. [Google Scholar] [CrossRef]
Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670. [Google Scholar] [CrossRef]
Peng, L.L.; Fan, G.F.; Huang, M.L.; Hong, W.C. Hybridizing DEMD and quantum PSO with SVR in electric load forecasting. Energies 2016, 9, 221. [Google Scholar] [CrossRef]
Cao, G.; Wu, L. Support vector regression with fruit fly optimization algorithm for seasonal electricity consumption forecasting. Energy 2016, 115, 734–745. [Google Scholar] [CrossRef]
Boussaï, D.I.; Lepagnot, J.; Siarry, P. A survey on optimization metaheuristics. Inf. Sci. 2013, 237, 82–117. [Google Scholar] [CrossRef]
Eberhart, R.; Kennedy, J. Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
Banks, A.; Vincent, J.; Anyakoha, C. A review of particle swarm optimization. Part I: Background and development. Nat. Comput. 2007, 6, 467–484. [Google Scholar] [CrossRef]
Chan, C.L.; Chen, C.L. A cautious PSO with conditional random. Expert Syst. Appl. 2015, 42, 4120–4125. [Google Scholar] [CrossRef]
Liu, P.; Liu, J. Multi-leader PSO (MLPSO): A new PSO variant for solving global optimization problems. Appl. Soft. Comput. 2017, 61, 256–263. [Google Scholar] [CrossRef]
Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
Joshi, H.; Arora, S. Enhanced grey wolf optimization algorithm for global optimization. Fundam. Inform. 2017, 153, 235–264. [Google Scholar] [CrossRef]
Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowledge-Based Syst. 2015, 89, 228–249. [Google Scholar] [CrossRef]
Mohamed, A.A.A.; Mohamed, Y.S.; El-Gaafary, A.A.; Hemeida, A.M. Optimal power flow using moth swarm algorithm. Electr. Power Syst. Res. 2017, 142, 190–206. [Google Scholar] [CrossRef]
Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
Ben oualid Medani, K.; Sayah, S.; Bekrar, A. Whale optimization algorithm based optimal reactive power dispatch: A case study of the Algerian power system. Electr. Power Syst. Res. 2018, 163, 696–705. [Google Scholar] [CrossRef]
Li, W.; Yang, X.; Li, H.; Su, L. Hybrid Forecasting Approach Based on GRNN Neural Network and SVR Machine for Electricity Demand Forecasting. Energies 2017, 10, 44. [Google Scholar] [CrossRef]
Xiao, J.; Li, Y.X.; Xie, L.; Liu, D.H.; Huang, J. A hybrid model based on selective ensemble for energy consumption forecasting in China. Energy 2018, 159, 534–546. [Google Scholar] [CrossRef]
Samuels, J.D.; Sekkel, R.M. Model confidence sets and forecast combination. Int. J. Forecast. 2017, 33, 48–60. [Google Scholar] [CrossRef]
Hsiao, C.; Wan, S.K. Is there an optimal forecast combination? J. Econom. 2014, 178, 294–309. [Google Scholar] [CrossRef]
Liu, L.; Zong, H.; Zhao, E.; Chen, C.; Wang, J. Can China realize its carbon emission reduction goal in 2020: From the perspective of thermal power development. Appl. Energy 2014, 124, 199–212. [Google Scholar] [CrossRef]
Li, H.; Wang, J.; Lu, H.; Guo, Z. Research and application of a combined model based on variable weight for short term wind speed forecasting. Renew. Energy 2018, 116, 669–684. [Google Scholar] [CrossRef]
Wang, L.; Wang, Z.; Qu, H.; Liu, S. Optimal forecast combination based on neural networks for time series forecasting. Appl. Soft Comput 2018, 66, 1–17. [Google Scholar] [CrossRef]
Jackson, Q.; Landgrebe, D.A. An adaptive method for combined covariance estimation and classification. IEEE Trans. Geosci. Remote Sens. 2002, 40, 1082–1087. [Google Scholar] [CrossRef]

Figure 1. Research framework for forecasting energy demand and energy structure.

Figure 2. The procedure of GM (1,1)-SVR.

Figure 3. The procedure of RGM-SVR.

Figure 4. The procedure of ARIMA–SVR.

Figure 5. The procedure of APSO-RGM–SVR model.

Figure 6. Energy consumption in China from 2005–2016.

Figure 7. The energy consumption growth rate in China.

Figure 8. The simulation and forecasting results for (a) the total energy consumption; (b) coal; (c) oil; (d) natural gas; and (e) hydro, nuclear, and wind power.

Figure 9. (a) Prediction results of primary energy demand forecast; (b) prediction results of primary energy ratio.

Figure 10. Growth rate of primary energy demand.

Table 1. Summary of several models commonly used for energy prediction.

Models	Feature	Advantages	Disadvantages	Applied to
Regression analysis	Establishing the regression equation through the correlation between variables.	The correlation degree between the factors can be analyzed.	Poor generalization and low accuracy.	Finding the relationship between energy consumption and economic growth [4]; forecasting long-term electricity consumption [20].
ARIMA	Treating the data sequence formed by the prediction object over time as a random sequence, and then establishing a model to express the sequence.	The model is simple and only requires endogenous variables.	Requirements are stable time-series data or are stable after differentiation; essentially only captures linear relationships, not non-linear relationships.	Forecasting the next-day electricity price in the Spanish mainland [21]; forecasting monthly electricity consumption in Eastern Saudi Arabia [22].
Grey	Using small sample data to establish a grey differential equation and describing the development trend of things over a long time.	Less sample size required; short-term prediction accuracy is higher; less model parameters.	Cannot consider the relationship between factors; medium- and long-term prediction error is relatively large.	Forecasting annual power load in Shanghai, China [24]; forecasting growth trends in renewable energy [25].
ANN	A highly complex non-linear dynamic learning system; suitable for handling inaccurate and fuzzy information processing problems under various factors and conditions.	Can fully approximate complex non-linear relationships; parallel distributed processing; highly robust and fault tolerant.	More model parameters; learning time is too long; large sample size required for model training.	Forecasting long-term energy demand in Greece [29]; forecasting Iran’s future monthly energy consumption [30].
LSTM	Long-term save input; identifying useful information.	Suitable for dealing with problems highly related to time-series; solves long sequence dependency problems.	The training time of the model is a little long.	Forecasting photovoltaic power generation [33,35].
SVR	Application of Support Vector in Regression Function; maps data to a high-dimensional feature space through a non-linear mapping and performs regression in this space.	Small samples; simplifies regression problems; high flexibility.	With the increase of sample size, the time complexity of model training will increase.	Forecasting buildings’ energy consumption in New York [44]; forecasting the electrical load of four typical office buildings [45].

Table 2. Parameter estimation of GM (1,1).

Parameter	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
a	−0.056	−0.047	−0.055	−0.153	−0.096
u	2.656	1.966	4.528	6.285	1.875

Table 3. Parameter estimation of SVR.

Parameter	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
$α_{1}$	0.9999	0.9999	0.9999	0.9999	0.9999
$α_{1}$	0.9999	0.9999	0.9999	0.9999	0.9999
$α_{1}$	−2.1974	0.9999	−1.9356	−8.1981	−2.4216
$α_{1}$	0.9999	0.9999	0.9999	0.9999	0.9999
$α_{1}$	0.9999	0.9999	0.9999	0.9999	0.9999
$α_{1}$	−6.8117	−8.0748	0.9999	0.9999	0.9999
$α_{1}$	0.9999	0.9999	−4.5862	0.9999	−7.3687
$α_{1}$	0.9999	0.9999	0.9999	−1.4721	0.9999
b	0.5148	0.4971	0.5021	0.5119	0.4878

Table 4. Weight distribution of GM (1,1)-SVR.

Weight	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
$ω_{G M (1, 1)}$	0.1201	0.2437	0.1871	0.1606	0.5512
$ω_{S V R}$	0.8799	0.7563	0.8129	0.8394	0.4488

Table 5. Parameter estimation of RGM.

Parameter	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
$a_{1}$	−0.056	−0.047	−0.055	−0.153	−0.096
$a_{2}$	0.055	−0.045	−0.057	−0.152	−0.094
$a_{3}$	0.058	−0.047	−0.060	−0.152	−0.092
$a_{4}$	0.057	−0.047	−0.059	−0.153	−0.093
$u_{1}$	2.656	1.966	4.528	6.285	1.876
$u_{2}$	2.816	2.073	4.719	7.327	2.087
$u_{3}$	2.940	2.139	4.906	8.456	2.325
$u_{4}$	3.121	2.250	5.239	9.807	2.522

Table 6. Weight distribution of RGM-SVR.

Weight	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
$ω_{R G M}$	0.2201	0.2407	0.1809	0.2501	0.4748
$ω_{S V R}$	0.7799	0.7593	0.8191	0.7599	0.5254

Table 7. Parameter estimation of ARIMA.

Parameter	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
p	2	2	1	5	1
d	1	1	2	1	2
q	2	2	2	2	1

Table 8. Weight distribution of ARIMA-SVR.

Weight	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, Nuclear Power
$ω_{A R I M A}$	0.2928	0.3252	0.1281	0.5436	0.0981
$ω_{S V R}$	0.7072	0.6748	0.8719	0.4564	0.9019

Table 9. The standard deviation (Std. Dev) comparison results of the optimization algorithms.

Function		GWO	MFO	WOA	POS	APSO
F₁	Std. Dev	8.24E − 007	3.34E + 001	1.07E − 006	2.32E + 001	1.00E − 010
F₂	Std. Dev	8.47E − 001	4.22E − 001	1.26E − 001	1.95E + 005	2.63E − 001
F₃	Std. Dev	7.67E − 004	2.80E − 003	3.85E − 004	7.97E − 002	5.27E − 005
F₄	Std. Dev	1.03E + 003	6.45E + 002	2.75E + 002	3.94E + 002	2.39E + 001
F₅	Std. Dev	5.69E + 000	0.79E + 000	0.12E + 000	2.98E + 001	0.04E + 000
F₆	Std. Dev	6.45E − 003	6.50E − 002	3.78E − 003	3.98E − 001	8.25E − 009

Table 10. The benchmark functions.

Function	Formula	Dimension	Range
Schwefel 2.21	$F_{1} = \max_{i} {\| x_{i} \|, 1 \leq i \leq D}$	30	[−100, 100)
Rosenbrock	$F_{2} = \sum_{i = 1}^{D i m - 1} [100 {(x_{i + 1} - x_{i}^{2})}^{2} + {(x_{i} - 1)}^{2}]$	30	[−30, 30]
Quartic	$F_{3} = \sum_{i = 1}^{D i m} i \cdot x_{i}^{4} + r a n d o m (0, 1]$	30	[−1.28, 1.28]
Schwefel 2.26	$F_{4} = \sum_{i = 1}^{D i m} - x_{i} \sin (\sqrt{x_{i}})$	30	[−500, 500]
Rastrigin	$F_{5} = \sum_{i = 1}^{D i m} [x_{i}^{2} - 10 \cos (2 π x_{i}) + 10]$	30
Griewank	$F_{6} = \frac{1}{4000} \sum_{i = 1}^{D i m} x_{i}^{2} - \prod_{i = 1}^{D i m} \cos (\frac{x_{i}}{\sqrt{i}}) + 1$	30	[−600, 600]

Table 11. Weight distribution of APSO-RGM-SVR.

Weight	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, Nuclear Power
$ω_{A P S O - R G M}$	0.2928	0.3252	0.1281	0.5436	0.0981
$ω_{A P S O - S V R}$	0.7072	0.6748	0.8719	0.4564	0.9019

Table 12. The energy consumption of China from 2005–2016 (units: million ton coal equivalent (mtce)).

Year	The Total Energy Consumption	Coal	Oil	Natural Gas	Water, Wind, and Nuclear Power
2005	2613.69	1892.31	465.24	62.73	193.41
2006	2864.67	2074.02	501.32	77.35	211.99
2007	3114.42	2257.95	529.45	93.43	233.58
2008	3206.11	2292.37	535.42	109.01	269.31
2009	3361.26	2406.66	551.25	117.64	285.71
2010	3606.48	2495.68	627.53	144.26	339.01
2011	3870.43	2717.04	650.23	178.04	325.12
2012	4021.38	2754.65	683.63	193.03	390.07
2013	4169.13	2809.99	712.92	220.96	425.25
2014	4258.06	2793.29	740.90	242.71	481.16
2015	4299.05	2738.49	786.73	253.64	520.19
2016	4360.00	2703.20	797.88	279.04	579.88

Table 13. Model evaluation results.

Method	Total Energy Consumption
	Training			Testing
	MAE	MAPE	MSPE	MAE	MAPE	MSPE
PSO-RGM (1,1)-SVR	38.765	0.01174	0.00510	26.252	0.00616	0.00352
RGM (1,1)-SVR	41.123	0.01263	0.00548	28.092	0.00655	0.00378
GM (1,1)-SVR	42.503	0.01306	0.00570	35.005	0.00817	0.00457
ARIMA-SVR	54.939	0.01716	0.00767	56.142	0.01319	0.00680
SVR	52.555	0.01617	0.00747	99.841	0.02336	0.02741
RGM (1,1)	28.310	0.00840	0.00384	149.875	0.03483	0.02055
GM (1,1)	28.310	0.00840	0.00384	184.505	0.04282	0.02595
ARIMA	89.969	0.02818	0.01148	187.109	0.04332	0.02741
Method	Coal
	Training			Testing
	MAE	MAPE	MSPE	MAE	MAPE	MSPE
PSO-RGM (1,1)-SVR	32.901	0.01439	0.00691	14.066	0.00505	0.00320
RGM (1,1)-SVR	33.417	0.01460	0.00693	14.693	0.00528	0.00324
GM (1,1)-SVR	34.124	0.01493	0.00715	19.086	0.00689	0.00424
ARIMA-SVR	48.061	0.02143	0.00899	27.532	0.00997	0.00516
SVR	38.434	0.01694	0.00862	124.293	0.04520	0.02343
RGM (1,1)	29.404	0.01207	0.00516	254.165	0.09259	0.04987
GM (1,1)	29.399	0.01207	0.00516	315.762	0.11520	0.06403
ARIMA	113.113	0.04979	0.02029	327.478	0.11966	0.06900
	Oil
Method	Training			Testing
	MAE	MAPE	MSPE	MAE	MAPE	MSPE
PSO-RGM (1,1)-SVR	4.936	0.00814	0.00625	5.956	0.00785	0.00448
RGM (1,1)-SVR	5.430	0.00908	0.00611	6.397	0.00842	0.00464
GM (1,1)-SVR	5.924	0.01008	0.00623	7.128	0.00936	0.00499
ARIMA-SVR	6.827	0.01176	0.00615	21.482	0.02735	0.01732
SVR	4.345	0.00700	0.00651	14.118	0.02039	0.01386
RGM (1,1)	10.349	0.01851	0.00846	23.671	0.03050	0.01822
GM (1,1)	10.351	0.01852	0.00846	31.507	0.04047	0.02415
ARIMA	8.302	0.01471	0.00683	37.090	0.04777	0.02666
	Natural gas
Method	Training			Testing
	MAE	MAPE	MSPE	MAE	MAPE	MSPE
PSO-RGM (1,1)-SVR	1.034	0.00837	0.00512	2.092	0.01259	0.00895
RGM (1,1)-SVR	1.034	0.00837	0.00512	4.497	0.01896	0.01153
GM (1,1)-SVR	1.042	0.00830	0.00528	4.626	0.01897	0.00977
ARIMA-SVR	12.977	0.01023	0.00539	7.751	0.03031	0.01659
SVR	0.622	0.00549	0.00514	7.864	0.03196	0.01727
RGM (1,1)	3.190	0.02351	0.01073	22.625	0.08952	0.04906
GM (1,1)	3.188	0.02348	0.01073	43.689	0.16699	0.09828
ARIMA	2.283	0.01663	0.00795	24.671	0.09433	0.05736
	Hydro, Nuclear and Wind Power
Method	Training			Testing
	MAE	MAPE	MSPE	MAE	MAPE	MSPE
PSO-RGM (1,1)-SVR	8.092	0.02608	0.01435	10.947	0.02024	0.01232
RGM (1,1)-SVR	8.337	0.02694	0.01369	12.021	0.02237	0.01328
GM (1,1)-SVR	8.404	0.02718	0.01356	14.419	0.02679	0.01599
ARIMA-SVR	8.342	0.02698	0.01514	23.598	0.04405	0.02589
SVR	7.912	0.02546	0.01501	14.594	0.02707	0.01703
RGM (1,1)	8.808	0.02859	0.01333	15.707	0.02997	0.01660
GM (1,1)	8.805	0.02858	0.01333	17.408	0.03322	0.01840
ARIMA	13.768	0.04618	0.02123	127.062	0.24883	0.12645

Table 14. Prediction results.

Year	The Total Energy Consumption	Coal	Oil	Natural Gas	Hydro, Nuclear, and Wind Power
2020	4656.41	2637.23	922.15	409.53	687.50
2025	4825.43	2203.86	989.19	540.64	1091.74

Table 15. Average annual growth rate (%).

Energy	Total Energy Consumption	Coal	Oil	Natural Gas	Hydro, Nuclear, and Wind Power
Average annual growth rate	1.14	−2.2	2.42	7.43	7.70

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhao, W.; Zhao, J.; Yao, X.; Jin, Z.; Wang, P. A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand. Energies 2019, 12, 1347. https://doi.org/10.3390/en12071347

AMA Style

Zhao W, Zhao J, Yao X, Jin Z, Wang P. A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand. Energies. 2019; 12(7):1347. https://doi.org/10.3390/en12071347

Chicago/Turabian Style

Zhao, Wenting, Juanjuan Zhao, Xilong Yao, Zhixin Jin, and Pan Wang. 2019. "A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand" Energies 12, no. 7: 1347. https://doi.org/10.3390/en12071347

APA Style

Zhao, W., Zhao, J., Yao, X., Jin, Z., & Wang, P. (2019). A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand. Energies, 12(7), 1347. https://doi.org/10.3390/en12071347

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Novel Adaptive Intelligent Ensemble Model for Forecasting Primary Energy Demand

Abstract

1. Introduction

2. Literature Review

2.1. Time-Series Forecasting Model

2.2. Soft-Computing Technology

2.3. Optimization Algorithm

2.4. Ensemble-Based Methods

3. Methodology

3.1. GM (1,1)

3.2. RGM

3.3. ARIMA

3.4. SVR

3.5. Model Evaluation Criteria

4. Construction of Ensemble Forecasting Model

4.1. Construction of the Ensemble Model Based on the Standard Deviation Weight Method

4.1.1. Standard Deviation Method

4.1.2. GM (1,1)-SVR

4.1.3. RGM-SVR

4.1.4. ARIMA-SVR

4.2. Construction of the Adaptive Weight Ensemble Forecasting Model

4.2.1. PSO

4.2.2. APSO

4.2.3. Contrast Test of Optimization Algorithms

4.2.4. Construction of APSO-RGM-SVR Based on Adaptive Inertia Weight

5. Results

5.1. Data

5.2. Model Adaptability

5.3. Prediction Results

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI