Article

A Hybrid Grey System Model Based on Stacked Long Short-Term Memory Layers and Its Application in Energy Consumption Forecasting

School of Science, Southwest University of Science and Technology, Mianyang 621010, China
* Author to whom correspondence should be addressed.
Processes 2024, 12(8), 1749; https://doi.org/10.3390/pr12081749
Submission received: 10 July 2024 / Revised: 7 August 2024 / Accepted: 14 August 2024 / Published: 20 August 2024

Abstract:
Accurate energy consumption prediction is crucial for addressing energy scheduling problems. Traditional machine learning models often struggle with small-scale datasets and nonlinear data patterns. To address these challenges, this paper proposes a hybrid grey model based on stacked LSTM layers. This approach leverages neural network structures to enhance feature learning and harnesses the strengths of grey models in handling small-scale data. The model is trained using the Adam algorithm, with hyperparameter selection facilitated by the grid search algorithm. We use the latest annual data on coal, electricity, and gasoline consumption in Henan Province as the application background. The model’s performance is evaluated against nine machine learning models and fifteen grey models using four kinds of performance metrics. Our results show that, during the testing phase, the proposed model achieves the smallest prediction errors across all reported metrics (RMSE, MAE, MAPE, TIC, U1, and U2) compared with the 15 grey system models and 9 machine learning models, indicating higher prediction accuracy and stronger generalization performance. Additionally, the study investigates the impact of the number of LSTM layers on the model’s prediction performance, concluding that while increasing the number of layers initially improves prediction performance, too many layers lead to overfitting.

1. Introduction

Energy consumption reflects a region’s energy demand, industrial development, and economic growth, serving as a critical metric for devising effective energy scheduling strategies. Therefore, accurate prediction of energy consumption is necessary. However, predicting energy consumption with a small sample size and nonlinear dataset is highly challenging. The grey system model, grounded in the grey differential equation, features a simpler structure than machine learning models. It effectively learns and utilizes the characteristics of limited data. This capability has spurred the extensive exploration and adoption of the grey model in the energy sector. The grey model (GM) was initially proposed by Deng in 1983 [1]. In 1984, Deng further advanced this concept by introducing practical grey forecasting models known as GM(1,1) and GM(1,N), which were notably applied to forecast long-term grain output in China [2]. Building upon Deng’s foundational work, subsequent models such as DGM(1,1) [3] and DGM(1,N) [3] were developed as derivatives of GM(1,1) and GM(1,N), respectively.
After decades of advancement, grey modeling techniques have attained a level of maturity. Predominantly linear GMs can be categorized into univariate and multivariate models. Additionally, both types can be further distinguished based on whether they are continuous or discrete models. The majority of univariate grey models stem from the foundational GM(1,1) model. In 2020, Wang introduced a seasonal grey model, DSGM(1,1), which incorporates dynamic seasonal adjustment factors, significantly enhancing prediction accuracy [4]. Concurrently, Wu proposed the CFNGM(1,1,k,c) model in 2020, utilizing novel concepts of conformable fractional accumulation and differentiation, applied specifically to carbon dioxide prediction [5]. Following this, in 2021, Liu introduced the GM(1,1) power model, leveraging the principle of adjacent accumulation and validating its efficacy through case studies involving four central European countries [6]. In 2019, Luo introduced the DGMP(1,1,N) model, demonstrating its robust fitting and forecasting accuracy [7]. In 2020, Zhou developed an innovative discrete grey model, DGMNF(1,1), integrating considerations of nonlinearity and fluctuation. Two empirical examples were presented to validate the efficacy and reliability of that model [8]. Additionally, in 2021, Qian proposed the SADGM(1,1) model, an innovative adaptive discrete grey system model. Its performance was benchmarked against various grey and non-grey prediction methods using three real-world cases, affirming its feasibility and comparative superiority [9]. Similarly, the development of multivariate grey models originates from the GM(1,N) framework. In 2018, Ding proposed the grey DFCGM(1,N) model, employing dummy variables to effectively capture future trends influenced by these variables [10]. 
Concurrently, in 2018, another GM(1,N) model was introduced for mixed-frequency data, addressing challenges arising from inconsistent statistical frequencies in system feature and correlative factor series, especially under small-sample conditions [11]. Subsequently, in 2021, Luo introduced the TDAGM(1,N) model, applying it to analyze food production dynamics [12]. In 2013, He introduced the D-GMC(1,N) model to address boundary effects associated with bi-dimensional empirical mode decomposition (BEMD), demonstrating its effectiveness [13]. In 2019, Ding developed the CDGM(1,N) model, an enhanced discrete grey multivariable model utilized for forecasting output values in eastern high-tech industries [14]. Lastly, in 2020, Ding proposed a novel discrete grey system model incorporating grey power indices within its structural framework [15].
As research progresses in GM methodologies, it becomes increasingly evident that linearly structured GMs encounter challenges in effectively forecasting nonlinear data. As a result, there has been increasing interest in developing nonlinear GMs utilizing various approaches. Nonlinear GMs typically explore three primary methodologies: kernel methods, integration of nonlinear mathematical constructs such as $y^2(t)$ and $y^{\gamma}(t)$, and incorporation of neural networks. In 2018, Ma introduced a nonlinear multivariate grey system model known as the kernel-based GM(1,N) or KGM(1,N), leveraging the kernel method. It was demonstrated that the KGM(1,N) model achieved higher efficiency compared to existing linear multivariate grey models and the Least-Squares Support Vector Machine (LSSVM) [16]. In 2020, Duan introduced an enhanced version of the KGM(1,N) model, incorporating a Gaussian vector basis kernel function and a global polynomial kernel function, which demonstrated improved capability in handling nonlinearity [17]. In 2024, Ma presented the GMW-KRGM, a kernel ridge grey system model integrating an expanded parametric Morlet wavelet. Through analysis across six real-world examples, Ma illustrated the model’s superior accuracy in managing nonlinear data [18]. In 2017, Shaikh applied the grey Verhulst model and conducted a forecast analysis focusing on the gas consumption of China [19]. In 2020, Xiao developed the grey Riccati–Bernoulli model (GRBM(1,1)) by transforming a differential equation based on the concept of differential information. This model was validated through four examples, showcasing its effectiveness compared to existing models [20]. Also in 2020, Mao utilized the Lotka–Volterra model to measure and forecast the influence of commercial banks’ online payment systems on the growth of third-party online payment systems [21].
In 2021, Ma introduced the neural grey system model, highlighting its superior performance relative to other models and emphasizing its robust applicability across different scenarios [22]. In 2023, Liu developed an advanced conformable fractional-order grey forecasting model, integrating a pioneering accumulation mechanism grounded in the generalized conformable fractional calculus. This model showcased enhanced predictive capabilities, surpassing existing models in precision [23]. Also in 2023, Xie constructed a nonlinear grey multivariate model for energy structure forecasting by leveraging differential equations and grey differential data, successfully predicting China’s energy consumption trends [24]. In the same year, Wei introduced an innovative nonlinear grey Bernoulli model employing a physics-preserving Cusum operator. This method was utilized for extracting intrinsic dynamics from short-term traffic flow data, with outcomes confirming its efficacy [25].
Nowadays, grey system models have been extensively used in the energy field, for tasks such as power load forecasting, energy consumption prediction, and energy price forecasting. In 2016, Zhao successfully applied the Rolling-ALO-GM(1,1) model to predict annual power load with significant results [26]. In 2019, Jin proposed a novel grey model incorporating grey correlation and applied it to short-term power load forecasting, demonstrating its superior forecasting accuracy compared to existing methods [27]. In 2017, Zeng introduced the NSGM(1,1) model for predicting the trend of China’s total energy consumption [28]. In 2021, Guo utilized an enhanced GM(1,1) model to forecast energy usage of residential air source heat pump water heaters, validating the effectiveness of the predictions [29]. In 2022, Li developed a nonlinear grey system model utilizing grey difference information to forecast energy price, energy consumption, and economic growth. This model successfully predicted coal prices and consumption in China from 2021 to 2025 [30]. Also in 2022, Lei constructed the PGM(1,2,a,b) model to predict electricity price, demonstrating its efficient and accurate short-term forecasting capabilities [31]. In 2023, Duan integrated the logistic model of energy structure into the systemic framework, devising a novel grey prediction model. That model, when applied to the case study of China’s electricity consumption, exhibited commendable predictive performance [32]. Also within that year, Pandey utilized the grey forecasting model DGM(1,1,$\alpha$) to predict non-renewable and renewable energy from diverse sources, including hydro, solar, wind, and bioenergy [33]. In 2023, Zhao combined a fractional-order cumulative operator with a new information-priority accumulation method to create a hybrid grey univariate model, which was used to predict energy consumption in southwestern China [34].
In 2024, Yuan integrated a grey system model with Gaussian process residual uncertainty analysis and seasonal trend decomposition using LOESS to forecast carbon emissions in developed countries [35]. That same year, He used the vector-valued Bernoulli equation to establish a nonlinear multivariable grey Bernoulli model for predicting fuel and crude oil prices [36]. Despite their relative maturity in model structure and application, existing grey models still exhibit limitations when confronted with nonlinear data, hindering their ability to effectively capture data characteristics.
Long Short-Term Memory (LSTM) is a type of recurrent neural network employing a gating mechanism that enhances its ability to learn features in data, particularly effective for processing long time series. Initially introduced by Hochreiter in 1997 [37], LSTM has since evolved with various extensions, including bidirectional LSTM-CRF [38], multiplicative LSTM [39], and convolutional LSTM [40]. LSTM models have been widely used in the field of energy forecasting. In 2023, Wang deployed the GA-LSTM model for forecasting ship fuel consumption, successfully predicting fuel usage across various conditions [41]. In the same year, Lu introduced an innovative multi-source transfer learning model for short-term energy forecasting, leveraging LSTM networks combined with multi-kernel maximum mean discrepancy for domain adaptation [42]. This model addressed the challenge of limited historical data in predicting energy usage for diverse building types. In the same year, Lu proposed the Prophet-EEMD-LSTM model for workshop power consumption forecasting, demonstrating its high predictive accuracy [43]. However, the LSTM model shares common drawbacks with neural networks, where its predictive performance relies heavily on large-scale datasets. When handling small-scale datasets, it frequently encounters challenges such as overfitting and convergence to local optima.
Considering the strengths and weaknesses of both the grey model and LSTM, a natural consideration is to integrate the LSTM model into the grey model framework to leverage their respective advantages. Thus, in this paper, we propose a hybrid grey system model based on stacked LSTM layers, aiming to synergize the strengths of both approaches and enhance the handling of nonlinear and small-scale data efficiently. The idea for this combined framework originates from reference [22], which focuses on using a more complex optimization algorithm to train the model and employs only a simple neural network. This paper embeds stacked LSTM layers to develop a new hybrid grey model, using a simpler and more user-friendly algorithm to train the model, ultimately resulting in an effective model. Additionally, since the neural grey model framework in reference [22] has not been extensively studied, we chose to use this framework to build our model in order to verify its effectiveness and fill the research gap. Furthermore, the application of the neural grey model in predicting the annual consumption of electricity, coal, and gasoline energy in Henan Province is still unexplored, with most research on Henan Province focusing on agriculture. To address this gap, we apply the proposed hybrid grey model to predict the annual consumption of electricity, coal, and gasoline energy in Henan Province and verify the model’s prediction performance.
The rest of the paper is organized as follows: the methodology, including the general formulation of the grey system model, the proposed GreySLstm model, and its solution, is presented in Section 2; applications to three Henan energy consumption datasets are presented in Section 3; conclusions are given in Section 4.

2. Methodology

2.1. The General Formulation of a Grey System Model

Given the original sequences $I_i^{(0)}(p)$ and $T^{(0)}(p)$ $(p = 1, 2, \ldots, N)$, the first-order accumulations $I_i^{(1)}(p)$ and $T^{(1)}(p)$ can be obtained:

$$I_i^{(1)}(p) = \sum_{t=1}^{p} I_i^{(0)}(t), \qquad T^{(1)}(p) = \sum_{t=1}^{p} T^{(0)}(t), \qquad p = 1, 2, \ldots, N, \tag{1}$$
where Equation (1) is called 1-AGO [44], and the structure of 1-AGO is shown in Figure 1.
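As a concrete aside, the 1-AGO of Equation (1) is simply a cumulative sum, and its inverse (the 1-IAGO used later in Equation (8)) is a first difference. A minimal Python sketch, with function names of our own choosing:

```python
import numpy as np

def ago_1(x):
    """First-order accumulated generating operation (1-AGO), Equation (1)."""
    return np.cumsum(x)

def iago_1(x1):
    """First-order inverse AGO (1-IAGO): first differences recover the original series."""
    return np.diff(x1, prepend=0.0)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
x1 = ago_1(x)                      # [3, 4, 8, 9, 14]
assert np.allclose(iago_1(x1), x)  # 1-IAGO inverts 1-AGO
```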
The general whitening formula can be explained as follows:
$$\frac{\mathrm{d}T^{(1)}(p)}{\mathrm{d}p} + a T^{(1)}(p) = f\big(I^{(1)}(p); \theta\big), \tag{2}$$
Here, $I^{(1)}(p) = \big(I_1^{(1)}(p), I_2^{(1)}(p), \ldots, I_n^{(1)}(p)\big)$, where $a$ denotes the development factor, and the vector $\theta$ consists of the parameters associated with the input series. The function $f(\cdot)$ depends on $p$ through the input series $I^{(1)}(p)$.
Discretizing the whitening Equation (2) yields its discrete version:

$$T^{(0)}(k) + a z^{(1)}(k) = f\!\left(\tfrac{1}{2}\big(I^{(1)}(k-1) + I^{(1)}(k)\big); \theta\right), \tag{3}$$

where $z^{(1)}(k) = \tfrac{1}{2}\big[T^{(1)}(k-1) + T^{(1)}(k)\big]$, which is called the background value.
For convenience, let $v_k = \tfrac{1}{2}\big(I^{(1)}(k-1) + I^{(1)}(k)\big)$, so Equation (3) can be written as follows:

$$T^{(0)}(k) + a z^{(1)}(k) = f(v_k; \theta). \tag{4}$$
In a grey system model, Equation (4) is used to estimate the values of $a$ and $\theta$. After obtaining the parameters of the GM model, we need to construct the prediction equation. Solving Equation (2) with the initial condition $T^{(1)}(1) = T^{(0)}(1)$ yields the response function in continuous form:
$$T^{(1)}(p) = T^{(0)}(1)\, e^{-a(p-1)} + \int_{1}^{p} e^{-a(p-r)} f\big(I^{(1)}(r); \theta\big)\, \mathrm{d}r. \tag{5}$$
The integral generally has no closed form, so we need to exploit a numerical approximation. Discretizing the integral term on the right-hand side of Equation (5) gives its discrete form:

$$\hat{T}^{(1)}(k) = T^{(0)}(1)\, e^{-a(k-1)} + \sum_{r=2}^{k} e^{-a\left(k - r + \frac{1}{2}\right)} \cdot f\!\left(\tfrac{1}{2}\big(I^{(1)}(r-1) + I^{(1)}(r)\big); \theta\right), \tag{6}$$
With the definition of $v_r$ above, Equation (6) can also be written as follows:

$$\hat{T}^{(1)}(k) = T^{(0)}(1)\, e^{-a(k-1)} + \sum_{r=2}^{k} e^{-a\left(k - r + \frac{1}{2}\right)} \cdot f(v_r; \theta), \tag{7}$$
Thus, according to Equation (7), we can calculate the forecast value $\hat{T}^{(1)}(k)$. Then, by using the inverse AGO (1-IAGO), we obtain the value of $\hat{T}^{(0)}(k)$. The expression of 1-IAGO can be written as follows:

$$\hat{T}^{(0)}(k) = \hat{T}^{(1)}(k) - \hat{T}^{(1)}(k-1). \tag{8}$$
The general form of the grey model summarized in Equations (2)–(8) follows Ma’s paper [22].
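To make the computation concrete, the discrete response function (Equation (7)) and the 1-IAGO (Equation (8)) can be sketched as below. This is an illustrative implementation under our own naming: the fitted development factor $a$ and the precomputed values $f(v_r; \theta)$ for $r = 2, \ldots, N$ are assumed to be supplied by the caller.

```python
import numpy as np

def grey_response(T0_first, a, f_vals):
    """Discrete response function, Equation (7).
    T0_first: T^(0)(1); f_vals[r-2] holds f(v_r; theta) for r = 2..N."""
    N = len(f_vals) + 1
    T1_hat = np.empty(N)
    T1_hat[0] = T0_first
    for k in range(2, N + 1):
        conv = sum(np.exp(-a * (k - r + 0.5)) * f_vals[r - 2]
                   for r in range(2, k + 1))
        T1_hat[k - 1] = T0_first * np.exp(-a * (k - 1)) + conv
    return T1_hat

def iago_1(T1_hat):
    """1-IAGO, Equation (8); the first value equals T^(1)(1)."""
    return np.diff(T1_hat, prepend=0.0)

# Sanity check: with a = 0 and f constant at 2, the accumulated series
# grows by 2 per step, so the restored series is [T0, 2, 2, ...].
T1 = grey_response(1.0, 0.0, [2.0, 2.0, 2.0])   # [1, 3, 5, 7]
T0 = iago_1(T1)                                  # [1, 2, 2, 2]
```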

2.2. The Proposed Hybrid Grey System Model Based on Stacked LSTM Layers

The preceding section outlined a conventional grey system model and its solution. It is evident that current grey system models encounter challenges in predicting nonlinear time series data. This difficulty arises because the parameters $a$ and $\theta$ of the function $f(\cdot)$ are traditionally estimated by the least-squares method, which restricts $f(\cdot)$ to a linear function.
In this section, we construct stacked LSTM (SLSTM) layers as the function f ( · ) to more effectively capture features from nonlinear data. The architecture of f ( · ) is depicted in Figure 2. As shown in Figure 2, we develop a neural network comprising multiple LSTM layers and linear layers.
First, the input $I^{(1)}(p)$ is fed into the first LSTM layer. Within that layer, the data pass through three crucial gates: the input gate, forget gate, and output gate. The output value of the input gate is calculated as follows:
$$i^{(1)}(p) = \sigma\!\left(W_i^{(1)} I^{(1)}(p) + U_i^{(1)} h^{(1)}(p-1) + b_i^{(1)}\right), \tag{9}$$

where $\sigma$ denotes the sigmoid function, expressed as

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{10}$$
Next, the output of the forget gate is obtained as follows:
$$f^{(1)}(p) = \sigma\!\left(W_f^{(1)} I^{(1)}(p) + U_f^{(1)} h^{(1)}(p-1) + b_f^{(1)}\right), \tag{11}$$
Then, the value of the output gate is calculated as follows:
$$o^{(1)}(p) = \sigma\!\left(W_o^{(1)} I^{(1)}(p) + U_o^{(1)} h^{(1)}(p-1) + b_o^{(1)}\right), \tag{12}$$
Based on the outputs of the aforementioned gates, the output of the first LSTM layer is obtained as follows:
$$c^{(1)}(p) = f^{(1)}(p) \odot c^{(1)}(p-1) + i^{(1)}(p) \odot \tanh\!\left(W_c^{(1)} I^{(1)}(p) + U_c^{(1)} h^{(1)}(p-1) + b_c^{(1)}\right),$$
$$h^{(1)}(p) = o^{(1)}(p) \odot \tanh\!\big(c^{(1)}(p)\big), \tag{13}$$
where $h^{(1)}(p)$ is the output of the first LSTM layer. The symbol $\odot$ denotes element-wise multiplication, and $\tanh$ represents the hyperbolic tangent function, which is defined as follows:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{14}$$
Then, the output passes through the first linear layer:

$$s^{(1)}(p) = h^{(1)}(p) W^{(1)} + b^{(1)}. \tag{15}$$
Each time, we take the output of the linear layer as the input of the next LSTM layer. The output of the last LSTM layer is

$$h^{(z)}(p) = o^{(z)}(p) \odot \tanh\!\big(c^{(z)}(p)\big), \tag{16}$$
where $z$ indicates that the last LSTM layer is the $z$th LSTM layer. Finally, after mapping, we obtain the output of the stacked LSTM layers:

$$s^{(z)}(p) = h^{(z)}(p) W^{(z)} + b^{(z)}, \tag{17}$$
where $s^{(z)}(p)$ also represents the value of $f(I^{(1)}(p); \Theta)$. In the formulations above, $W_i^{(q)}, U_i^{(q)}, b_i^{(q)}, W_f^{(q)}, U_f^{(q)}, b_f^{(q)}, W_o^{(q)}, U_o^{(q)}, b_o^{(q)}, W_c^{(q)}, U_c^{(q)}, b_c^{(q)}$ $(q = 1, 2, \ldots, z)$ are the parameters of the LSTM layers, and $W^{(q)}, b^{(q)}$ are the parameters of the linear layers. For ease of representation, we collectively denote the parameters of the stacked LSTM layers as $\Theta$. Equations (9)–(17) follow Gers’s formulation [45].
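For readers who prefer code, a single LSTM time step implementing Equations (9)–(13) can be sketched in NumPy; the dictionary layout of the parameters is our own convention, not the paper's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Equation (10)

def lstm_step(x, h_prev, c_prev, P):
    """One LSTM time step, Equations (9)-(13).
    P maps names to weights: W* of shape (H, D), U* of shape (H, H), b* of shape (H,)."""
    i = sigmoid(P["Wi"] @ x + P["Ui"] @ h_prev + P["bi"])  # input gate,  Eq. (9)
    f = sigmoid(P["Wf"] @ x + P["Uf"] @ h_prev + P["bf"])  # forget gate, Eq. (11)
    o = sigmoid(P["Wo"] @ x + P["Uo"] @ h_prev + P["bo"])  # output gate, Eq. (12)
    c = f * c_prev + i * np.tanh(P["Wc"] @ x + P["Uc"] @ h_prev + P["bc"])  # cell, Eq. (13)
    h = o * np.tanh(c)  # hidden state, Eq. (13)
    return h, c

# Stacking (Equations (15)-(17)): feed s = h @ W + b of one layer
# in as the input x of the next LSTM layer.
```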
Thus, Equation (2) can be rewritten as follows:

$$\frac{\mathrm{d}T^{(1)}(p)}{\mathrm{d}p} + a T^{(1)}(p) = f\big(I^{(1)}(p); \Theta\big), \tag{18}$$

where the right-hand side is equal to $s^{(z)}(p)$.
To determine the parameter values, we discretize Equation (18) to obtain its discrete form:

$$T^{(0)}(k) + a z^{(1)}(k) = f(v_k; \Theta), \tag{19}$$

where

$$v_k = \tfrac{1}{2}\big(I^{(1)}(k-1) + I^{(1)}(k)\big). \tag{20}$$
The structure of Equation (19) is shown in Figure 3.
We can derive the response function from Equation (18) under the initial condition $T^{(1)}(1) = T^{(0)}(1)$:

$$T^{(1)}(p) = T^{(0)}(1)\, e^{-a(p-1)} + \int_{1}^{p} e^{-a(p-r)} f\big(I^{(1)}(r); \Theta\big)\, \mathrm{d}r. \tag{21}$$
Similarly, we obtain the discrete form of Equation (21):

$$\hat{T}^{(1)}(k) = T^{(0)}(1)\, e^{-a(k-1)} + \sum_{r=2}^{k} e^{-a\left(k - r + \frac{1}{2}\right)} \cdot f(v_r; \Theta). \tag{22}$$
Finally, after computing the value of $\hat{T}^{(1)}(k)$, we obtain the predicted value $\hat{T}^{(0)}(k)$ using 1-IAGO (Equation (8)).
The synergy between the stacked LSTM layers and the grey system model in the hybrid grey model is illustrated in Figure 4.

2.3. Adam Algorithm for Training the Proposed Model

In general, neural networks often lack closed-form analytical solutions, necessitating the use of optimization algorithms for iterative parameter updates. Among the various algorithms available, such as Gradient Descent (GD) [46], Stochastic Gradient Descent (SGD) [47], Adaptive Moment Estimation (Adam) [48], Momentum Gradient Descent (MGD) [49], and others [50], the Adam algorithm combines momentum with second-order moment estimation, enhancing the stability of the optimization process and accelerating convergence. Numerous studies have demonstrated its effectiveness and stability. For its stability and ease of use, we employed the Adam algorithm for model optimization.
First, we need to define the training error $e_k$ at each point $\big(I^{(1)}(k), T^{(0)}(k)\big)$:

$$e_k = T^{(0)}(k) + a z^{(1)}(k) - f(v_k; \Theta). \tag{23}$$
Then, we obtain the total training error $E$:

$$E(a, \Theta) = \frac{1}{N} \sum_{k=2}^{N} e_k^2 = \frac{1}{N}\, e^{\mathsf{T}} e. \tag{24}$$
Next, we need to calculate the gradient of the total training error:

$$\nabla E = \left[\frac{\partial E}{\partial a}, \frac{\partial E}{\partial \Theta}\right], \tag{25}$$

where

$$\frac{\partial E}{\partial a} = \frac{2}{N} \sum_{k=2}^{N} \left(T^{(0)}(k) + a z^{(1)}(k) - f(v_k; \Theta)\right) z^{(1)}(k), \tag{26}$$

$$\frac{\partial E}{\partial \Theta} = -\frac{2}{N} \sum_{k=2}^{N} \left(T^{(0)}(k) + a z^{(1)}(k) - f(v_k; \Theta)\right) \nabla_{\Theta} f(v_k; \Theta). \tag{27}$$
In the GD algorithm, iteration involves directly using the gradient:

$$\begin{bmatrix} a_{k+1} \\ \Theta_{k+1} \end{bmatrix} = \begin{bmatrix} a_{k} \\ \Theta_{k} \end{bmatrix} - l \cdot \nabla E, \tag{28}$$

where $l$ denotes the learning rate of the algorithm.
Unlike other optimization algorithms, the Adam algorithm additionally introduces the bias-corrected first-moment estimate $\hat{m}_k$ and the bias-corrected second raw moment estimate $\hat{\eta}_k$. Before calculating $\hat{m}_k$ and $\hat{\eta}_k$, we need their biased first- and second-moment estimates, represented by $m_k$ and $\eta_k$, respectively:

$$m_k = \mu_1 \cdot m_{k-1} + (1 - \mu_1) \cdot \nabla E, \qquad \eta_k = \mu_2 \cdot \eta_{k-1} + (1 - \mu_2) \cdot (\nabla E)^2, \tag{29}$$

where $\mu_1$ and $\mu_2$ are the decay rates. Then, we can compute the values of $\hat{m}_k$ and $\hat{\eta}_k$:

$$\hat{m}_k = \frac{m_k}{1 - \mu_1^{k}}, \qquad \hat{\eta}_k = \frac{\eta_k}{1 - \mu_2^{k}}. \tag{30}$$

Thus, we can update the parameters based on $\hat{m}_k$ and $\hat{\eta}_k$:

$$\begin{bmatrix} a_{k+1} \\ \Theta_{k+1} \end{bmatrix} = \begin{bmatrix} a_{k} \\ \Theta_{k} \end{bmatrix} - l \cdot \frac{\hat{m}_k}{\sqrt{\hat{\eta}_k} + \epsilon}, \tag{31}$$
where $\epsilon$ is a very small constant that prevents division by zero. The complete Adam algorithm is shown in Algorithm 1:
Algorithm 1: Adam algorithm for training the hybrid grey system model
 Input: $E(a, \Theta)$ (Equation (24)), learning rate $l$, max_iteration
 Output: trained parameters $[a, \Theta]$
  $[a_0, \Theta_0] \leftarrow \mathrm{random}()$; (Initialize the parameter set)
  $\mu_1 \leftarrow 0.9$; (Initialize the exponential decay rate)
  $\mu_2 \leftarrow 0.999$; (Initialize the exponential decay rate)
  $\epsilon \leftarrow 10^{-8}$; (Initialize the small constant)
  $m_0 \leftarrow 0$; (Initialize the 1st moment)
  $\eta_0 \leftarrow 0$; (Initialize the 2nd moment)
  $iteration \leftarrow 0$; (Initialize the number of iterations)
1 while $iteration < \mathrm{max\_iteration}$ do
2   $iteration = iteration + 1$;
3   $\nabla E \leftarrow$ Equations (26) and (27); (Calculate the objective function gradient)
4   $m_k, \eta_k \leftarrow$ Equation (29); (Calculate the biased first and second moments)
5   $\hat{m}_k, \hat{\eta}_k \leftarrow$ Equation (30); (Calculate the bias-corrected first and second moments)
6   $[a_{k+1}, \Theta_{k+1}] \leftarrow$ Equation (31); (Update the parameters)
7 end
8 return $[a, \Theta]$ (Resulting parameters)
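As an illustration of the moment updates in Equations (29)–(31), the sketch below applies Adam to a toy quadratic objective rather than the grey model loss $E(a, \Theta)$; the hyperparameter defaults match the initializations in Algorithm 1.

```python
import numpy as np

def adam(grad, theta0, lr=0.01, mu1=0.9, mu2=0.999, eps=1e-8, max_iteration=2000):
    """Adam iteration of Equations (29)-(31) for a generic gradient function."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)    # biased 1st moment
    eta = np.zeros_like(theta)  # biased 2nd raw moment
    for k in range(1, max_iteration + 1):
        g = grad(theta)
        m = mu1 * m + (1 - mu1) * g                 # Equation (29)
        eta = mu2 * eta + (1 - mu2) * g ** 2
        m_hat = m / (1 - mu1 ** k)                  # Equation (30)
        eta_hat = eta / (1 - mu2 ** k)
        theta = theta - lr * m_hat / (np.sqrt(eta_hat) + eps)  # Equation (31)
    return theta

# Toy objective ||theta - 3||^2 with gradient 2*(theta - 3): iterates approach [3, 3].
theta_star = adam(lambda t: 2.0 * (t - 3.0), np.zeros(2))
```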

2.4. Grid Search Algorithm for Tuning Parameters of the Proposed Model

In the preceding section, the Adam algorithm was employed to optimize the model parameters. However, achieving optimal model performance also requires fine-tuning hyperparameters such as the number of neurons $L$, the learning rate $l$, and the number of LSTM layers $z$.
Let the model parameter space be denoted as $\Phi$, where each parameter combination is represented by a vector $\phi = (L, l, z)$. A grid search aims to identify the optimal parameter combination $\phi^*$ from $\Phi$ based on the training dataset $D_{\text{train}}$, with the objective of enhancing model performance on the validation dataset $D_{\text{val}}$. This approach adheres to the mathematical principle:

$$\phi^* = \arg\min_{\phi \in \Phi} F(\phi, D_{\text{train}}, D_{\text{val}}), \tag{32}$$
where $F(\phi, D_{\text{train}}, D_{\text{val}})$ represents the performance metric obtained by training the model with parameter combination $\phi$ on $D_{\text{train}}$ and evaluating it on the validation set $D_{\text{val}}$. Here, we use the Mean Absolute Percentage Error (MAPE) to calculate the metric:

$$\text{MAPE} = \frac{1}{|D_{\text{val}}|} \sum_{i \in D_{\text{val}}} \left| \frac{T_i - \hat{T}_i}{T_i} \right| \times 100\%. \tag{33}$$
In our approach, we employ two distinct methods to compute T ^ i during the grid search. Firstly, we utilize the response function (Equation (22)) in conjunction with 1-IAGO (Equation (8)) to derive T ^ i . This method is denoted as GreySLstm-M1 for convenience. Secondly, leveraging the previously defined function f ( · ) , we directly use the output of the stacked LSTM layers as T ^ i during the grid search process. This approach is referred to as GreySLstm-M2.
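The grid search of Equation (32) with the MAPE criterion of Equation (33) reduces to an exhaustive loop over the (L, l, z) combinations. In the sketch below, `train_and_predict` is a hypothetical callback standing in for a full GreySLstm-M1 or GreySLstm-M2 training run that returns validation-set predictions:

```python
import itertools
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, Equation (33)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100.0

def grid_search(train_and_predict, val_actual, grid):
    """Exhaustive search over hyperparameter combinations, Equation (32)."""
    best_phi, best_score = None, np.inf
    for phi in itertools.product(grid["L"], grid["l"], grid["z"]):
        score = mape(val_actual, train_and_predict(*phi))
        if score < best_score:
            best_phi, best_score = phi, score
    return best_phi, best_score
```

For GreySLstm-M1 the callback would forecast through Equations (22) and (8); for GreySLstm-M2 it would return the stacked-LSTM output directly.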

2.5. The Proposed Complete Forecasting Process

Taking into account the previous algorithms and formula derivations, the complete forecasting process of the proposed hybrid grey model is shown in Algorithm 2.
Algorithm 2: Complete forecasting process of the GreySLstm model
 Input: training input: $I^{(0)}(p), T^{(0)}(p)$, $p = 1, \ldots, N$;
  test input: $I^{(0)}(p)$, $p = N+1, \ldots, N+T$;
  number of neurons $L$, learning rate $l$, max_iteration;
  number of LSTM layers $z$
 Output: $\hat{T}^{(0)}(p)$, $p = 1, \ldots, N+T$
  $T^{(1)}(p) \leftarrow \sum_{r=1}^{p} T^{(0)}(r)$; $I_i^{(1)}(p) \leftarrow \sum_{r=1}^{p} I_i^{(0)}(r)$; (Apply 1-AGO)
  $iteration = 0$;
1 $L, l, z \leftarrow$ Equation (32); (Use GreySLstm-M1 or M2 to select the best $L, l, z$)
2 while $iteration < \mathrm{max\_iteration}$ do
3   $[a^*, \Theta^*] \leftarrow$ Algorithm 1; (Use Algorithm 1 to train the model)
4 end
5 for $p = 2$ to $N+T$ do
6   $\hat{T}^{(0)}(p) \leftarrow$ Equations (22) and (8); (Forecast using the response function and 1-IAGO)
7 end
 Result: $\hat{T}^{(0)}(p)$, $p = 1, \ldots, N+T$;

3. Application

3.1. Data Collection

To validate the model’s efficacy and evaluate its predictive precision, we applied it to real-world examples for validation (as shown in Figure 5). This paper collected annual data on coal consumption in Henan Province from 1995 to 2019, electricity consumption from 1995 to 2022, and gasoline consumption from 1995 to 2019. These data were obtained from the latest available statistics on the official website of the National Bureau of Statistics of China (https://data.stats.gov.cn/index.htm (accessed on 10 July 2024)).

3.2. Selection of Comparison Models and Assessment Criteria

In order to better evaluate the performance of the proposed model, in this paper, we used 9 machine learning models and 15 grey system models for comparison. Meanwhile, four kinds of metrics were used to quantify prediction accuracy. Detailed information on the grey system models is shown in Table 1, details of the machine learning models are shown in Table 2, and the metrics used for evaluation are shown in Table 3. In Table 3, D denotes the set of training or testing data, and d denotes the length of D.

3.3. Case 1: Henan’s Coal Consumption

Coal remains a cornerstone of global energy supply, playing a crucial role in power production, industrial processes, and economic stability. In Henan Province, coal production constitutes a significant economic pillar, and coal is one of the primary energy sources. Therefore, predicting coal consumption in Henan Province is of paramount importance. Accurate forecasts can provide a theoretical foundation for the government to formulate relevant energy and economic strategies.
In this paper, we collected the latest annual coal consumption data in Henan Province from 1995 to 2019, totaling 25 data points. The first 20 data points were used for training the model, while the remaining 5 points were reserved for testing. The metrics for model performance are presented in Table 4, and the prediction curves are shown in Figure 6. As seen in Table 4, the proposed GreySLstm-M1 model achieved the best RMSE, MAE, MAPE, TIC, U1, and U2 values in both the training and testing phases. During training, the MAE and MAPE of the rf model, the RMSE and U2 of the lstm model, and the TIC and U1 of the convlstm model ranked second. However, in testing, their performance was significantly inferior to that of the GreySLstm-M1 model. Furthermore, the GreySLstm-M2 model showed only mediocre performance on the calculated indicators. Figure 6 illustrates that while GreySLstm-M1, GreySLstm-M2, gru, lstm, cnnlstm, and convlstm fitted the training data well, only GreySLstm-M1 maintained a close alignment between predicted and actual points during testing. It is also evident that all grey system models, except the proposed one, performed poorly.

3.4. Case 2: Henan’s Electricity Consumption

Electricity is essential for residents’ daily lives, industrial production, and scientific and technological research. Henan Province, with its large population, has a substantial demand for electricity. Therefore, accurately predicting electricity consumption in Henan is crucial for relevant authorities to formulate effective power distribution strategies.
In this paper, we collected annual electricity consumption data for Henan Province from 1995 to 2022, totaling 28 data points. The first 23 points were used for model training, and the last 5 points were used for prediction analysis. The results of the indicator calculations are presented in Table 5, and the prediction curves of the models are shown in Figure 7. From Table 5, it is evident that the proposed GreySLstm-M1 model achieved the best results for all indicators during testing, while the rf model performed best during training. However, the GreySLstm-M1 model’s training performance was close to that of the rf model, ranking second in all training indicators. During testing, the cnnlstm model ranked second in MAE and MAPE, and the BernoulliGM model ranked second in TIC, U1, and U2. However, their performance on the training set was significantly weaker than that of the GreySLstm-M1 model. The GreySLstm-M2 model showed average performance across the indicators. Figure 7 demonstrates that the prediction curve of GreySLstm-M1 closely aligned with the actual data. Among the comparison models, cnnlstm and convlstm effectively captured the nonlinearity of the real data. In contrast, most grey models produced prediction curves that resembled a straight line or arc, failing to accurately fit nonlinear data.

3.5. Case 3: Henan’s Gasoline Consumption

Gasoline is essential for transportation, powering vehicles, and facilitating the movement of goods and people, thereby driving economic activity and connectivity. Forecasting gasoline consumption in Henan Province can aid in the effective planning and allocation of resources, providing a scientific basis for developing strategies to reduce emissions and transition to alternative fuels.
In this paper, we collected annual gasoline consumption data for Henan Province from 1995 to 2019. The first 18 data points were used for training, and the remaining 7 points were used for testing. The related metrics are shown in Table 6, and the prediction curves are shown in Figure 8. The analysis of Table 6 reveals that the proposed GreySLstm-M1 model achieved the lowest values across all metrics in both training and testing phases. Additionally, the GreySLstm-M2 model demonstrated the second-best performance in all metrics, except for RMSE and U2 during testing. Figure 8 shows that while GreySLstm-M1, GreySLstm-M2, rf, lstm, and svr fitted the actual curve well during training, rf and svr failed to maintain this trend during testing. Furthermore, all grey system models exhibited poor performance in both training and testing, with their prediction curves resembling arcs.

3.6. Discussions

Analysis of the Performance of the Models on Real-World Cases

In the three real-world cases examined, the proposed GreySLstm-M1 model consistently demonstrated superior prediction performance with small-scale, nonlinear datasets. Compared to other models, GreySLstm-M1 maintained the best results across all scenarios, highlighting its versatility and stability. Conversely, the GreySLstm-M2 model showed mediocre performance, likely because it relied too heavily on the role of stacked LSTM layers and did not adequately leverage the strengths of the grey model component. This suggests that solely optimizing neural networks may not ensure superior prediction results for GreySLstm models.
Among the comparison models, the rf model often exhibited excellent fitting during the training phase but performed poorly during the prediction phase, primarily due to overfitting. Other machine learning models also suffered from similar overfitting issues to varying degrees. Moreover, the grey models in the comparison generally performed inadequately across all cases, indicating their inability to effectively handle nonlinear, small-scale datasets.

3.7. Analysis of the Indicator Optimization of the Proposed Model

To further analyze the performance of GreySLstm-M1 and quantify the improvement in prediction accuracy compared to other models, we used the following formula to calculate the degree of optimization:
$$x_i = \frac{M_i - M_{\text{GreySLstm-M1}}}{M_i} \times 100\%$$
where $x_i$ represents the optimization percentage of the GreySLstm-M1 model relative to model $i$ on a given indicator, $M_i$ is the value of that indicator for model $i$, and $M_{\text{GreySLstm-M1}}$ is the corresponding indicator value of the GreySLstm-M1 model.
The specific results for each case are listed in Table 7, Table 8 and Table 9, respectively. In case 1, the prediction accuracy of GreySLstm-M1 improved significantly: compared with the other models, the RMSE, MAE, MAPE, TIC, U1, and U2 indicators were improved by at least 43.7827%, 22.3263%, 25.9755%, 41.4517%, 41.4517%, and 43.7827%, respectively. In case 2, these indicators were improved by at least 0.9034%, 8.0892%, 6.8578%, 1.0207%, 1.0207%, and 0.9034%. In case 3, the improvements were at least 18.9155%, 15.4864%, 14.1890%, 18.1348%, 18.1348%, and 18.9155%, respectively. In cases 1 and 3, where the data exhibited strong nonlinearity, the proposed model enhanced prediction performance substantially. In case 2, where the data were less nonlinear, the improvement over the other models was smaller. We therefore conclude that the proposed GreySLstm-M1 model is particularly well suited to small-scale datasets with strong nonlinearity.
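The optimization percentage is a one-line computation. The sketch below reproduces the case 1 test-phase RMSE comparison using values from Table 4; treating lstm as the strongest comparison model on that indicator is an inference from the table, not a statement made in the text.

```python
def optimization_percentage(m_i: float, m_greyslstm: float) -> float:
    """x_i = (M_i - M_GreySLstm-M1) / M_i * 100, the percentage by which
    GreySLstm-M1 improves on model i for one indicator."""
    return (m_i - m_greyslstm) / m_i * 100.0

# Test-phase RMSE in case 1 (Table 4): GreySLstm-M1 = 928.6467, lstm = 1651.8883.
improvement = optimization_percentage(1651.8883, 928.6467)
print(round(improvement, 4))  # prints 43.7827, matching the reported minimum improvement
```

This matches the 43.7827% RMSE figure reported above, confirming that the "at least" improvements are measured against the best-performing comparison model on each indicator.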

Evaluating GreySLstm Performance with Different Numbers of LSTM Layers

To better understand the performance of the proposed GreySLstm model, we conducted a detailed analysis of its prediction performance based on different numbers of LSTM layers. Given that the GreySLstm-M2 model performed significantly worse than the GreySLstm-M1 model in the previous section, we focused exclusively on the GreySLstm-M1 model for the remainder of this study. Hereafter, we refer to GreySLstm-M1 simply as GreySLstm.
To ensure that our conclusions are more representative, we ran the GreySLstm model five times for each LSTM layer configuration and calculated the RMSE, MAE, MAPE, TIC, U1, and U2 indicators for each run. The results for each case are listed in Table 10, and the visualizations are shown in Figure 9, Figure 10 and Figure 11.
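All six indicators follow directly from the definitions in Table 3. The sketch below is a minimal dependency-free implementation; note that TIC and U1 share one formula (their values coincide in every results table), and the exact index handling at the series boundary in U2 is an assumption about the paper's convention.

```python
import math

def metrics(actual, predicted):
    """Return (RMSE, MAE, MAPE, TIC/U1, U2) for equal-length sequences."""
    d = len(actual)
    err = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in err) / d)
    mae = sum(abs(e) for e in err) / d
    mape = sum(abs(e) / abs(a) for e, a in zip(err, actual)) / d * 100
    # TIC (= U1): RMSE normalized by the quadratic means of both series.
    tic = rmse / (math.sqrt(sum(a * a for a in actual) / d)
                  + math.sqrt(sum(p * p for p in predicted) / d))
    # U2: squared errors relative to squared one-step changes of the actual series.
    u2 = math.sqrt(sum(e * e for e in err)
                   / sum((actual[k + 1] - actual[k]) ** 2 for k in range(d - 1)))
    return rmse, mae, mape, tic, u2
```

Computing the five values once per run and averaging over the five repetitions reproduces the evaluation protocol described above.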
In cases 1 and 2, we observe that during both the training and test phases, the indicators initially decreased, then increased, and eventually stabilized. When the test set indicators stabilized, they were significantly larger than those of the training set. In case 3, the indicators fluctuated with an increasing number of LSTM layers but remained relatively stable overall, with test set indicators consistently larger than those of the training set.
From these observations, we conclude that the models in cases 1 and 2 experienced overfitting with too many layers. This occurred because each LSTM layer contained multiple neurons, and when there were too many layers, the total number of neurons could exceed 2–3 times the number of training data points. A model that is too complex not only learns the underlying patterns but also captures noise and irrelevant features, reducing its generalization ability on new data. Furthermore, the performance in case 3 suggests overfitting from the outset, with its indicator trends resembling the stable phase observed in cases 1 and 2. This was likely due to the smaller training set size (18 data points) compared to cases 1 and 2 (20 and 23 data points, respectively).
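The complexity argument can be made concrete by counting trainable parameters. A standard LSTM layer with input size $n$ and hidden size $h$ has $4(h(n+h)+h)$ weights and biases; the hidden size of 8 below is an illustrative assumption, not the paper's configuration.

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def stacked_lstm_params(input_size: int, hidden_size: int, num_layers: int) -> int:
    total = lstm_layer_params(input_size, hidden_size)
    # Layers after the first take the previous layer's hidden state as input.
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    return total

# Even a single small layer (320 parameters) dwarfs a 20-point training set,
# and each added layer contributes another 544 parameters.
for layers in (1, 2, 4):
    print(layers, stacked_lstm_params(1, 8, layers))
```

This illustrates why the parameter count quickly exceeds several times the number of training points as layers are stacked, making the observed overfitting unsurprising.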
In summary, the prediction performance of the GreySLstm model generally improves with an increasing number of layers but may decline after reaching an optimal point. If overfitting persists from the beginning, increasing the training set size should be considered to mitigate this issue.

4. Conclusions

4.1. Paper Structure Overview

In this paper, we conducted a literature review on grey models and LSTM models in Section 1, identifying gaps in current research. In Section 2, we introduced the general form of the grey model and then proposed the GreySLstm model, explaining its construction and prediction principles. We also described the parameter optimization process using the Adam and grid search algorithms and summarized the complete prediction process. In Section 3, we collected the latest annual coal, electricity, and gasoline consumption data from Henan Province. We used these data to test the proposed model and quantified the prediction error with six different indicators (RMSE, MAE, MAPE, TIC, U1, and U2). We compared our model with 24 other models, demonstrating its superior generalization and prediction performance. We also discussed the case results and analyzed the impact of the number of LSTM layers on the prediction accuracy of the hybrid grey model.

4.2. Main Findings and Contributions

The results in Section 3 showed that the proposed model outperformed many grey and machine learning models across multiple cases and evaluation indicators. The model demonstrated strong generalization performance in various scenarios, indicating high reliability and applicability in practice. This finding broadens the current line of research by proposing a novel integration of grey models with neural networks, and it demonstrates that this framework effectively leverages the strengths of both approaches, yielding a hybrid grey model capable of handling nonlinear, small-scale data.
The analysis of the impact of the number of LSTM layers on the prediction performance of the GreySLstm model in Section 3 revealed a practical rule: increasing the number of layers initially improves the model’s prediction performance, but excessive layers lead to overfitting. This finding provides a clear guideline for future researchers, helping them identify the optimal layer configuration and avoid overcomplicating the model. It also supports the theoretical assumption that overly complex deep learning models, while performing well on training data, often underperform on test data, reducing generalization performance. Additionally, this insight suggests an important research direction: investigating methods to mitigate overfitting caused by increasing LSTM layers, such as regularization techniques and early stopping.
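Early stopping, mentioned above as one mitigation, can be sketched as a patience counter wrapped around any iterative trainer; the `train_one_epoch` and `validate` callables below are placeholders, not part of the paper's implementation.

```python
def fit_with_early_stopping(train_one_epoch, validate, max_epochs=500, patience=20):
    """Train until validation loss stops improving for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validate()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop
    return best_loss
```

Paired with a held-out validation split, this kind of stopping rule directly targets the layer-induced overfitting observed in Section 3.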
Compared with other studies, this paper proposed a novel framework combining stacked LSTM layers and grey models, resulting in a hybrid grey model with superior performance. This neural grey model was applied for the first time to predict the annual energy consumption of electricity, coal, and gasoline in Henan Province, demonstrating its effectiveness. This research enhances the combination model, fills a gap in the literature, and provides a reference for future research.

4.3. Analysis of Potential Limitations of the Model

Grey models typically include accumulation operations (1-AGO), which can be time-consuming when handling large-scale data. However, this is a common limitation of grey models, not a specific issue of the proposed framework. Additionally, while the GreySLstm model can process nonlinear and small-scale data, embedding too many LSTM layers is not advisable. Excessive layers can lead to overfitting due to the model’s complexity. Therefore, we recommend setting the number of LSTM layers as a hyperparameter to be adjusted for optimal prediction results.
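For reference, the 1-AGO operation that heads every grey model is simply a running sum over the raw series, and its inverse (1-IAGO) is the first difference; a minimal sketch:

```python
def ago_1(series):
    """First-order accumulated generating operation: x1[k] = sum(x0[0..k])."""
    out, total = [], 0.0
    for value in series:
        total += value
        out.append(total)
    return out

def iago_1(accumulated):
    """Inverse 1-AGO: recover the raw series by first differences."""
    return [accumulated[0]] + [accumulated[k] - accumulated[k - 1]
                               for k in range(1, len(accumulated))]
```

The operation itself is linear in the series length, so the cost concern above arises mainly when accumulation is applied repeatedly inside an iterative training loop on long series.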

4.4. Recommendations for Model Enhancement

This paper presents the mathematical principles and training methods of the GreySLstm model but suggests further optimization for improved performance. To enhance the robustness of the GreySLstm model, we recommend using outlier detection techniques such as isolation forest, Local Outlier Factor (LOF), and density-based spatial clustering of applications with noise (DBSCAN). To improve the model’s generalization ability, we suggest implementing regularization, early stopping, and other techniques to effectively prevent overfitting, thereby enhancing the model’s overall generalization performance.
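The detectors named above (isolation forest, LOF, DBSCAN) are available in scikit-learn. As a dependency-free stand-in, the sketch below flags points outside the interquartile-range fence; this Tukey-fence rule is much cruder than those methods, but it shows where such a pre-processing step would sit in the pipeline.

```python
import statistics

def iqr_outliers(series, factor=1.5):
    """Indices of points outside the Tukey fence [Q1 - f*IQR, Q3 + f*IQR]."""
    q1, _, q3 = statistics.quantiles(series, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [i for i, v in enumerate(series) if v < lo or v > hi]
```

Flagged points would then be inspected (or smoothed) before the series is passed to the GreySLstm training procedure.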

4.5. Future Research Directions

The results of this paper demonstrate that embedding neural networks into grey models is effective, leveraging the strengths of neural networks in feature capture and grey models in handling small-scale data. Future research will delve deeper into this model framework to gain a richer and more scientific understanding of its mechanisms.

Author Contributions

Conceptualization, X.M.; methodology, X.M.; validation, Y.H.; formal analysis, X.M.; investigation, Y.H.; resources, Y.H.; data curation, Y.H.; writing—original draft preparation, Y.H.; writing—review and editing, X.M.; visualization, X.M.; supervision, X.M.; project administration, X.M.; funding acquisition, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanities and Social Science Fund of the Ministry of Education of China, grant number 19YJCZH119.

Data Availability Statement

The data used in this study are publicly available from the National Bureau of Statistics of China (https://data.stats.gov.cn/index.htm) (accessed on 10 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Deng, J. Grey fuzzy forecast and control for grain. J. Huazhong Univ. Sci. Technol. Med. Sci. 1983, 2, 1–8.
2. Deng, J. Grey dynamic model and its application in the long-term forecasting output of grain. Discov. Nat. 1984, 3, 37–45. (In Chinese)
3. Xie, N.-M.; Liu, S.-F.; Yang, Y.-J.; Yuan, C.-Q. On novel grey forecasting model based on non-homogeneous index sequence. Appl. Math. Model. 2013, 37, 5059–5068.
4. Wang, Z.X.; Wang, Z.W.; Li, Q. Forecasting the industrial solar energy consumption using a novel seasonal GM (1, 1) model with dynamic seasonal adjustment factors. Energy 2020, 200, 117460.
5. Wu, W.; Ma, X.; Zhang, Y.; Li, W.; Wang, Y. A novel conformable fractional non-homogeneous grey model for forecasting carbon dioxide emissions of BRICS countries. Sci. Total Environ. 2020, 707, 135447.
6. Liu, L.; Wu, L. Forecasting the renewable energy consumption of the European countries by an adjacent non-homogeneous grey model. Appl. Math. Model. 2021, 89, 1932–1948.
7. Luo, D.; Wei, B.L. A unified treatment approach for a class of discrete grey forecasting models and its application. Syst. Eng.-Theory Pract. 2019, 39, 451–462.
8. Zhou, W.; Wu, X.; Ding, S.; Pan, J. Application of a novel discrete grey model for forecasting natural gas consumption: A case study of Jiangsu Province in China. Energy 2020, 200, 117443.
9. Qian, W.; Sui, A. A novel structural adaptive discrete grey prediction model and its application in forecasting renewable energy generation. Expert Syst. Appl. 2021, 186, 115761.
10. Ding, S.; Dang, Y.G.; Xu, H. Construction and application of GM (1, N) based on control of dummy variables. Control Decis. 2018, 33, 309–315.
11. Wang, J. The GM (1, N) Model for Mixed-frequency Data and Its Application in Pollutant Discharge Prediction. J. Grey Syst. 2018, 30, 97.
12. Luo, D.; An, Y.M.; Wang, X.L. Time-delayed accumulative TDAGM (1, N, t) model and its application in grain production. Control Decis. 2021, 36, 2002–2012.
13. He, Z.; Wang, Q.; Shen, Y.; Wang, Y. Discrete multivariate gray model based boundary extension for bi-dimensional empirical mode decomposition. Signal Process. 2013, 93, 124–138.
14. Ding, S. A novel discrete grey multivariable model and its application in forecasting the output value of China’s high-tech industries. Comput. Ind. Eng. 2019, 127, 749–760.
15. Ding, S.; Xu, N.; Ye, J.; Zhou, W.; Zhang, X. Estimating Chinese energy-related CO2 emissions by employing a novel discrete grey prediction model. J. Clean. Prod. 2020, 259, 120793.
16. Ma, X.; Liu, Z. The kernel-based nonlinear multivariate grey model. Appl. Math. Model. 2018, 56, 217–238.
17. Duan, H.; Wang, D.; Pang, X.; Liu, Y.; Zeng, S. A novel forecasting approach based on multi-kernel nonlinear multivariable grey model: A case report. J. Clean. Prod. 2020, 260, 120929.
18. Ma, X.; Deng, Y.; Ma, M. A novel kernel ridge grey system model with generalized Morlet wavelet and its application in forecasting natural gas production and consumption. Energy 2024, 287, 129630.
19. Shaikh, F.; Ji, Q.; Shaikh, P.H.; Mirjat, N.H.; Uqaili, M.A. Forecasting China’s natural gas demand based on optimised nonlinear grey models. Energy 2017, 140, 941–951.
20. Xiao, Q.; Gao, M.; Xiao, X.; Goh, M. A novel grey Riccati–Bernoulli model and its application for the clean energy consumption prediction. Eng. Appl. Artif. Intell. 2020, 95, 103863.
21. Mao, S.; Zhu, M.; Wang, X.; Xiao, X. Grey–Lotka–Volterra model for the competition and cooperation between third-party online payment systems and online banking in China. Appl. Soft Comput. 2020, 95, 106501.
22. Ma, X.; Xie, M.; Suykens, J.A.K. A novel neural grey system model with Bayesian regularization and its applications. Neurocomputing 2021, 456, 61–75.
23. Liu, C.; Xu, Z.; Zhao, K.; Xie, W. Forecasting education expenditure with a generalized conformable fractional-order nonlinear grey system model. Heliyon 2023, 9, e16499.
24. Xie, D.; Li, X.; Duan, H. A novel nonlinear grey multivariate prediction model based on energy structure and its application to energy consumption. Chaos Solitons Fractals 2023, 173, 113767.
25. Wei, B.; Yang, L.; Xie, N. Nonlinear grey Bernoulli model with physics-preserving Cusum operator. Expert Syst. Appl. 2023, 229, 120466.
26. Zhao, H.; Guo, S. An optimized grey model for annual power load forecasting. Energy 2016, 107, 272–286.
27. Jin, M.; Zhou, X.; Zhang, Z.M.; Tentzeris, M.M. Short-term power load forecasting using grey correlation contest modeling. Expert Syst. Appl. 2012, 39, 773–779.
28. Zeng, B.; Luo, C. Forecasting the total energy consumption in China using a new-structure grey system model. Grey Syst. Theory Appl. 2017, 7, 194–217.
29. Guo, J.J.; Wu, J.Y.; Wang, R.Z. A new approach to energy consumption prediction of domestic heat pump water heater based on grey system theory. Energy Build. 2011, 43, 1273–1279.
30. Li, H.; Wu, Z.; Yuan, X.; Yang, Y.; He, X.; Duan, H. The research on modeling and application of dynamic grey forecasting model based on energy price-energy consumption-economic growth. Energy 2022, 257, 124801.
31. Lei, M.; Feng, Z. A proposed grey model for short-term electricity price forecasting in competitive power markets. Int. J. Electr. Power Energy Syst. 2012, 43, 531–538.
32. Duan, H.; Pang, X. A novel grey prediction model with system structure based on energy background: A case study of Chinese electricity. J. Clean. Prod. 2023, 390, 136099.
33. Pandey, A.K.; Singh, P.K.; Nawaz, M.; Kushwaha, A.K. Forecasting of non-renewable and renewable energy production in India using optimized discrete grey model. Environ. Sci. Pollut. Res. 2023, 30, 8188–8206.
34. Zhao, X.; Ma, X.; Cai, Y.; Yuan, H.; Deng, Y. Application of a novel hybrid accumulation grey model to forecast total energy consumption of Southwest Provinces in China. Grey Syst. Theory Appl. 2023, 13, 629–656.
35. Yuan, H.; Ma, X.; Ma, M.; Ma, J. Hybrid framework combining grey system model with Gaussian process and STL for CO2 emissions forecasting in developed countries. Appl. Energy 2024, 360, 122824.
36. He, Q.; Ma, X.; Zhang, L.; Li, W.; Li, T. The nonlinear multi-variable grey Bernoulli model and its applications. Appl. Math. Model. 2024, 134, 635–655.
37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
38. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991.
39. Krause, B.; Lu, L.; Murray, I.; Renals, S. Multiplicative LSTM for sequence modelling. arXiv 2016, arXiv:1609.07959.
40. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28.
41. Wang, K.; Hua, Y.; Huang, L.; Guo, X.; Liu, X.; Ma, Z.; Ma, R.; Jiang, X. A novel GA-LSTM-based prediction method of ship energy usage based on the characteristics analysis of operational data. Energy 2023, 282, 128910.
42. Lu, H.; Wu, J.; Ruan, Y.; Qian, F.; Meng, H.; Gao, Y.; Xu, T. A multi-source transfer learning model based on LSTM and domain adaptation for building energy prediction. Int. J. Electr. Power Energy Syst. 2023, 149, 109024.
43. Lu, Y.; Sheng, B.; Fu, G.; Luo, R.; Chen, G.; Huang, Y. Prophet-EEMD-LSTM based method for predicting energy consumption in the paint workshop. Appl. Soft Comput. 2023, 143, 110447.
44. Deng, T.F.; Gui, Y.; Yan, J.Y. Prediction and analysis of tunnel crown settlement based on grey system theory. Adv. Mater. Res. 2012, 490, 423–427.
45. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
46. Andrychowicz, M.; Denil, M.; Gomez, S.; Hoffman, M.W.; Pfau, D.; Schaul, T.; Shillingford, B.; De Freitas, N. Learning to learn by gradient descent by gradient descent. Adv. Neural Inf. Process. Syst. 2016, 29.
47. Amari, S. Backpropagation and stochastic gradient descent method. Neurocomputing 1993, 5, 185–196.
48. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
49. Wang, S.; Lu, T.; Hao, R.; Wang, F.; Ding, T.; Li, J.; He, X.; Guo, Y.; Han, X. An Identification Method for Anomaly Types of Active Distribution Network Based on Data Mining. IEEE Trans. Power Syst. 2023, 39, 5548–5560.
50. Duan, Y.; Zhao, Y.; Hu, J. An initialization-free distributed algorithm for dynamic economic dispatch problems in microgrid: Modeling, optimization and analysis. Sustain. Energy Grids Netw. 2023, 34, 101004.
51. Liu, S.; Forrest, J.Y.L. Grey Systems: Theory and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2010.
52. Chen, P.Y.; Yu, H.M. Foundation settlement prediction based on a novel NGM model. Math. Probl. Eng. 2014, 2014, 242809.
53. Xie, N.; Wang, R.; Chen, N. Measurement of shock effect following change of one-child policy based on grey forecasting approach. Kybernetes 2018, 47, 559–586.
54. Chen, C.I.; Chen, H.L.; Chen, S.P. Forecasting of foreign exchange rates of Taiwan’s major trading partners by novel nonlinear Grey Bernoulli model NGBM (1, 1). Commun. Nonlinear Sci. Numer. Simul. 2008, 13, 1194–1204.
55. Wu, L.; Liu, S.; Yao, L.; Yan, S.; Liu, D. Grey system model with the fractional order accumulation. Commun. Nonlinear Sci. Numer. Simul. 2013, 18, 1775–1785.
56. Duan, H.; Lei, G.R.; Shao, K. Forecasting crude oil consumption in China using a grey prediction model with an optimal fractional-order accumulating operator. Complexity 2018, 2018, 3869619.
57. Ding, Y.; Dang, Y. Forecasting renewable energy generation with a novel flexible nonlinear multivariable discrete grey prediction model. Energy 2023, 277, 127664.
58. Wu, L.-F.; Liu, S.-F.; Cui, W.; Liu, D.-L.; Yao, T.-X. Non-homogenous discrete grey model with fractional-order accumulation. Neural Comput. Appl. 2014, 25, 1215–1221.
59. Wu, W.; Ma, X.; Zeng, B.; Wang, Y.; Cai, W. Forecasting short-term renewable energy consumption of China using a novel fractional nonlinear grey Bernoulli model. Renew. Energy 2019, 140, 70–87.
60. Wu, L.; Liu, S.; Chen, H.; Zhang, N. Using a novel grey system model to forecast natural gas consumption in China. Math. Probl. Eng. 2015, 2015, 686501.
61. Xia, J.; Ma, X.; Wu, W.; Huang, B.; Li, W. Application of a new information priority accumulated grey model with time power to predict short-term wind turbine capacity. J. Clean. Prod. 2020, 244, 118573.
62. Zhou, W.; Zhang, H.; Dang, Y.; Wang, Z. New information priority accumulated grey discrete model and its application. Chin. J. Manag. Sci. 2017, 25, 140–148.
63. Xie, N.M.; Liu, S.F. Research on the non-homogenous discrete grey model and its parameter’s properties. Syst. Eng. Electron. 2008, 5, 863–867.
64. Xiang, X.; Liu, L.; Cao, J.; Zhang, P. Forecasting the installed wind capacity using a new information priority accumulated nonlinear grey Bernoulli model. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2020; Volume 467, p. 012088.
65. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9.
66. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
67. Popescu, M.-C.; Balas, V.E.; Perescu-Popescu, L.; Mastorakis, N. Multilayer perceptron and neural networks. WSEAS Trans. Circuits Syst. 2009, 8, 579–588.
68. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T. Xgboost: Extreme gradient boosting. R Package Version 0.4-2. 2015. Available online: https://cran.r-project.org/web/packages/xgboost/vignettes/xgboost.pdf (accessed on 14 August 2024).
69. O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.
70. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600.
71. Kim, S.; Hong, S.; Joh, M.; Song, S.-K. Deeprain: Convlstm network for precipitation prediction using multichannel radar data. arXiv 2017, arXiv:1711.02316.
72. Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
Figure 1. The structure of 1-AGO.
Figure 2. The structure of function f(·).
Figure 3. Structure of the hybrid grey system model’s training equation (Equation (19)).
Figure 4. The synergy between stacked LSTM layers and grey system model in hybrid model.
Figure 5. Collected data on coal, electricity and gasoline consumption in Henan.
Figure 6. Predicted values of all models in case 1.
Figure 7. Predicted values of all models in case 2.
Figure 8. Predicted values of all models in case 3.
Figure 9. Performance of GreySLstm with different numbers of LSTM layers in case 1.
Figure 10. Performance of GreySLstm with different numbers of LSTM layers in case 2.
Figure 11. Performance of GreySLstm with different numbers of LSTM layers in case 3.
Table 1. Information on the grey system models used for comparison.

| Name | Reference | Year | Model Structure | Parameter |
|---|---|---|---|---|
| GM | [51] | 2010 | $T^{(0)}(k)+az^{(1)}(k)=b$ | / |
| NGM | [52] | 2014 | $T^{(0)}(k)+az^{(1)}(k)=bk+c$ | / |
| DGM | [53] | 2018 | $T^{(1)}(k+1)=\beta_{1}T^{(1)}(k)+\beta_{2}$ | / |
| NDGM | [3] | 2013 | $T^{(1)}(k)=\beta_{1}T^{(1)}(k-1)+\beta_{2}k+\beta_{3}$ | / |
| BernoulliGM | [54] | 2008 | $T^{(0)}(k)+az^{(1)}(k)=b\left(z^{(1)}(k)\right)^{n}$ | $b$ |
| FGM | [55] | 2013 | $T^{(r)}(k)-T^{(r)}(k-1)+\frac{a}{2}\left(T^{(r)}(k)+T^{(r)}(k-1)\right)=b$ | $\alpha$ |
| FNGM | [56] | 2018 | $I^{(r)}(t)-I^{(r)}(t-1)+\alpha z^{(r)}(t)=\beta t+\gamma$ | $\beta$ |
| FNDGM | [57] | 2023 | $I^{(r)}(k+1)=b_{1}I^{(r)}(k)+b_{2}k+b_{3}$ | $b_{1}$ |
| FDGM | [58] | 2014 | $T^{(r)}(k+1)=\beta_{1}T^{(r)}(k)+\beta_{2}$ | $\beta_{1}$ |
| FBernoulliGM | [59] | 2019 | $\frac{dX^{(r)}(k)}{dt}+aX^{(r)}(k)=b\left(X^{(r)}(k)\right)^{\gamma}$ | $b$ |
| NIPGM | [60] | 2015 | $I^{(\lambda)}(t)-I^{(\lambda)}(t-1)+\alpha z^{(\lambda)}(t)=\beta$ | $\beta$ |
| NIPNGM | [61] | 2020 | $I^{(\lambda)}(t)-I^{(\lambda)}(t-1)+\alpha z^{(\lambda)}(t)=\beta t+\gamma$ | $\beta$ |
| NIPDGM | [62] | 2017 | $I^{(\lambda)}(k+1)=b_{1}I^{(\lambda)}(k)+b_{2}$ | $b_{1}$ |
| NIPNDGM | [63] | 2008 | $I^{(\lambda)}(k+1)=b_{1}I^{(\lambda)}(k)+b_{2}k+b_{3}$ | $b_{1}$ |
| NIPBernoulliGM | [64] | 2020 | $\frac{dT(t)}{dt}+aT(t)=b\,T(t)^{r}$ | $b$ |
Table 2. Information of machine learning models used for comparison.

| Full Name | Abbreviation | Reference | Year |
|---|---|---|---|
| Support vector regression | svr | [65] | 1996 |
| Long short-term memory | lstm | [45] | 2000 |
| Random forest regression | rf | [66] | 2001 |
| Multilayer perceptron | mlp | [67] | 2009 |
| Extreme gradient boosting | xgb | [68] | 2015 |
| Convolutional neural network | cnn | [69] | 2015 |
| Gated recurrent unit | gru | [70] | 2017 |
| Convolutional LSTM | convlstm | [71] | 2017 |
| CNN-LSTM | cnnlstm | [72] | 2019 |
Table 3. Metrics used for evaluating.

| Full Name | Metric | Equation |
|---|---|---|
| Root-mean-square error | RMSE | $\sqrt{\frac{1}{d}\sum_{k \in D}\left(T^{(0)}(k)-\hat{T}^{(0)}(k)\right)^{2}}$ |
| Mean absolute error | MAE | $\frac{1}{d}\sum_{k \in D}\lvert T^{(0)}(k)-\hat{T}^{(0)}(k)\rvert$ |
| Mean absolute percentage error | MAPE | $\frac{1}{d}\sum_{k \in D}\frac{\lvert T^{(0)}(k)-\hat{T}^{(0)}(k)\rvert}{\lvert T^{(0)}(k)\rvert}\times 100$ |
| Theil’s inequality coefficient | TIC | $\frac{\sqrt{\frac{1}{d}\sum_{k \in D}\left(T^{(0)}(k)-\hat{T}^{(0)}(k)\right)^{2}}}{\sqrt{\frac{1}{d}\sum_{k \in D}T^{(0)}(k)^{2}}+\sqrt{\frac{1}{d}\sum_{k \in D}\hat{T}^{(0)}(k)^{2}}}$ |
| Theil’s U1 statistic | U1 | $\frac{\sqrt{\frac{1}{d}\sum_{k \in D}\left(T^{(0)}(k)-\hat{T}^{(0)}(k)\right)^{2}}}{\sqrt{\frac{1}{d}\sum_{k \in D}T^{(0)}(k)^{2}}+\sqrt{\frac{1}{d}\sum_{k \in D}\hat{T}^{(0)}(k)^{2}}}$ |
| Theil’s U2 statistic | U2 | $\sqrt{\frac{\sum_{k \in D}\left(T^{(0)}(k)-\hat{T}^{(0)}(k)\right)^{2}}{\sum_{k \in D}\left(T^{(0)}(k+1)-T^{(0)}(k)\right)^{2}}}$ |
Table 4. The metrics of the models in case 1.

| Phase | Metric | GreySLstm-M1 | GreySLstm-M2 | gru | rf | xgb | lstm | svr |
|---|---|---|---|---|---|---|---|---|
| Training | RMSE | 490.3060 | 960.1925 | 928.6231 | 870.6966 | 1574.2322 | 811.1070 | 1566.8574 |
| Training | MAE | 339.5045 | 609.2220 | 658.7882 | 518.2737 | 1352.1988 | 529.3345 | 1367.6741 |
| Training | MAPE | 1.8857 | 2.9844 | 4.6797 | 2.7050 | 11.1550 | 3.6196 | 11.9076 |
| Training | TIC | 0.0133 | 0.0260 | 0.0252 | 0.0238 | 0.0432 | 0.0221 | 0.0425 |
| Training | U1 | 0.0133 | 0.0260 | 0.0252 | 0.0238 | 0.0432 | 0.0221 | 0.0425 |
| Training | U2 | 0.0266 | 0.0522 | 0.0505 | 0.0473 | 0.0855 | 0.0441 | 0.0851 |
| Test | RMSE | 928.6467 | 5323.7461 | 2507.0439 | 3073.1162 | 1661.8960 | 1651.8883 | 2308.4474 |
| Test | MAE | 910.8103 | 5108.5371 | 2329.0432 | 2799.0660 | 1172.6112 | 1436.7536 | 2166.6322 |
| Test | MAPE | 4.1222 | 23.2868 | 10.6819 | 12.8810 | 5.5687 | 6.6503 | 9.7389 |
| Test | TIC | 0.0209 | 0.1066 | 0.0532 | 0.0645 | 0.0362 | 0.0357 | 0.0540 |
| Test | U1 | 0.0209 | 0.1066 | 0.0532 | 0.0645 | 0.0362 | 0.0357 | 0.0540 |
| Test | U2 | 0.0414 | 0.2373 | 0.1117 | 0.1370 | 0.0741 | 0.0736 | 0.1029 |

| Phase | Metric | cnn | mlp | cnnlstm | convlstm | GM | NGM | DGM |
|---|---|---|---|---|---|---|---|---|
| Training | RMSE | 2430.8773 | 3915.1753 | 861.2289 | 812.6766 | 3013.4610 | 2491.0085 | 3017.3331 |
| Training | MAE | 2159.5672 | 3154.0362 | 597.2083 | 566.9696 | 2640.8209 | 2156.8366 | 2655.5472 |
| Training | MAPE | 14.2518 | 17.5495 | 3.1941 | 3.5721 | 17.3731 | 14.6886 | 17.6296 |
| Training | TIC | 0.0665 | 0.1152 | 0.0236 | 0.0221 | 0.0815 | 0.0691 | 0.0815 |
| Training | U1 | 0.0665 | 0.1152 | 0.0236 | 0.0221 | 0.0815 | 0.0691 | 0.0815 |
| Training | U2 | 0.1321 | 0.2127 | 0.0468 | 0.0442 | 0.1637 | 0.1353 | 0.1639 |
| Test | RMSE | 11,761.7003 | 4033.4668 | 3381.0791 | 5542.6099 | 16,836.7866 | 9023.2598 | 16,728.2120 |
| Test | MAE | 11,319.1136 | 3203.2006 | 3175.0204 | 5061.8062 | 16,073.4176 | 8607.2024 | 15,971.1954 |
| Test | MAPE | 51.5327 | 14.9912 | 14.5368 | 23.2690 | 73.2872 | 39.2627 | 72.8201 |
| Test | TIC | 0.2092 | 0.0839 | 0.0704 | 0.1110 | 0.2756 | 0.1687 | 0.2743 |
| Test | U1 | 0.2092 | 0.0839 | 0.0704 | 0.1110 | 0.2756 | 0.1687 | 0.2743 |
| Test | U2 | 0.5243 | 0.1798 | 0.1507 | 0.2471 | 0.7505 | 0.4022 | 0.7456 |

| Phase | Metric | NDGM | BernoulliGM | FGM | FNGM | FNDGM | FDGM | FBernoulliGM |
|---|---|---|---|---|---|---|---|---|
| Training | RMSE | 2403.2727 | 5532.4287 | 4248.0164 | 4070.0333 | 2318.3493 | 4003.4205 | 6202.7378 |
| Training | MAE | 2120.1498 | 4954.7318 | 3574.8920 | 3599.8241 | 1948.7597 | 3384.7620 | 5537.3872 |
| Training | MAPE | 15.0650 | 36.3972 | 30.5036 | 26.0696 | 12.7429 | 28.3588 | 41.8137 |
| Training | TIC | 0.0656 | 0.1669 | 0.1146 | 0.1150 | 0.0623 | 0.1086 | 0.1903 |
| Training | U1 | 0.0656 | 0.1669 | 0.1146 | 0.1150 | 0.0623 | 0.1086 | 0.1903 |
| Training | U2 | 0.1306 | 0.3006 | 0.2308 | 0.2211 | 0.1260 | 0.2175 | 0.3370 |
| Test | RMSE | 9081.4995 | 1671.9496 | 3807.3586 | 6217.6065 | 11,715.2465 | 3689.2512 | 2269.5763 |
| Test | MAE | 8708.0001 | 1383.7527 | 3298.5991 | 5447.6971 | 11,347.7484 | 3197.6934 | 2226.4167 |
| Test | MAPE | 39.6817 | 6.0014 | 15.2810 | 25.1836 | 51.5896 | 14.8132 | 9.9161 |
| Test | TIC | 0.1695 | 0.0363 | 0.0791 | 0.1235 | 0.2084 | 0.0768 | 0.0482 |
| Test | U1 | 0.1695 | 0.0363 | 0.0791 | 0.1235 | 0.2084 | 0.0768 | 0.0482 |
| Test | U2 | 0.4048 | 0.0745 | 0.1697 | 0.2771 | 0.5222 | 0.1644 | 0.1012 |

| Phase | Metric | NIPGM | NIPNGM | NIPDGM | NIPNDGM | NIPBernoulliGM |
|---|---|---|---|---|---|---|
| Training | RMSE | 4317.6857 | 3141.6617 | 4127.4823 | 2429.1336 | 3457.6183 |
| Training | MAE | 3704.4585 | 2639.5239 | 3531.0537 | 1970.3929 | 2963.5155 |
| Training | MAPE | 31.4827 | 16.0038 | 29.7322 | 11.6506 | 23.9604 |
| Training | TIC | 0.1171 | 0.0886 | 0.1121 | 0.0656 | 0.0996 |
| Training | U1 | 0.1171 | 0.0886 | 0.1121 | 0.0656 | 0.0996 |
| Training | U2 | 0.2346 | 0.1707 | 0.2243 | 0.1320 | 0.1879 |
| Test | RMSE | 5153.0724 | 11,050.5598 | 3907.2315 | 13,174.3111 | 23,210.8592 |
| Test | MAE | 4600.4453 | 10,461.0367 | 3395.0718 | 12,732.8356 | 18,689.1563 |
| Test | MAPE | 21.2189 | 47.7791 | 15.7211 | 57.9115 | 87.2516 |
| Test | TIC | 0.1042 | 0.1995 | 0.0810 | 0.2286 | 0.6128 |
| Test | U1 | 0.1042 | 0.1995 | 0.0810 | 0.2286 | 0.6128 |
| Test | U2 | 0.2297 | 0.4926 | 0.1742 | 0.5872 | 1.0346 |
Table 5. The metrics of the models in case 2.
Table 5. The metrics of the models in case 2.
Model  GreySLstm-M1  GreySLstm-M2  gru  rf  xgb  lstm  svr
Training  RMSE  37.4968  63.6828  488.3888  34.8961  145.2803  295.4851  181.7269
MAE  28.8925  42.0938  220.5750  25.8150  121.5278  150.3027  156.6424
MAPE  1.6761  2.0229  32.1378  1.4577  11.4320  20.5718  14.6458
TIC  0.0096  0.0163  0.1259  0.0090  0.0378  0.0763  0.0468
U1  0.0096  0.0163  0.1259  0.0090  0.0378  0.0763  0.0468
U2  0.0193  0.0328  0.2514  0.0180  0.0748  0.1521  0.0935
Test  RMSE  115.3096  200.4789  421.2057  523.4443  772.2424  457.9253  129.5320
MAE  95.7251  148.0640  377.4322  480.7118  743.9410  414.9213  108.6905
MAPE  2.7584  3.9858  10.3856  13.2748  20.7229  11.4350  3.1498
TIC  0.0162  0.0288  0.0627  0.0791  0.1215  0.0685  0.0180
U1  0.0162  0.0288  0.0627  0.0791  0.1215  0.0685  0.0180
U2  0.0325  0.0564  0.1186  0.1474  0.2174  0.1289  0.0365
Model  cnn  mlp  cnnlstm  convlstm  GM  NGM  DGM
Training  RMSE  189.6547  118.7663  105.7004  47.3233  231.5212  165.9904  232.7358
MAE  142.9024  93.4915  75.3073  35.2646  192.6853  132.2427  193.8591
MAPE  14.3022  6.1338  4.0891  2.3983  14.3749  9.9145  14.5544
TIC  0.0490  0.0304  0.0267  0.0122  0.0586  0.0435  0.0589
U1  0.0490  0.0304  0.0267  0.0122  0.0586  0.0435  0.0589
U2  0.0976  0.0611  0.0544  0.0244  0.1192  0.0854  0.1198
Test  RMSE  293.7241  331.2288  124.8711  448.6507  1213.7337  277.8937  1215.1447
MAE  270.8711  310.8389  104.1500  392.9850  1163.1175  246.7314  1164.8605
MAPE  7.6876  8.8156  2.9615  10.9248  32.5157  6.9568  32.5664
TIC  0.0398  0.0447  0.0175  0.0598  0.1464  0.0378  0.1466
U1  0.0398  0.0447  0.0175  0.0598  0.1464  0.0378  0.1466
U2  0.0827  0.0933  0.0352  0.1263  0.3417  0.0782  0.3421
Model  NDGM  BernoulliGM  FGM  FNGM  FNDGM  FDGM  FBernoulliGM
Training  RMSE  152.1839  301.4826  268.4222  147.0146  136.9840  472.5171  263.7041
MAE  131.1087  266.4801  225.9895  110.9666  104.1567  376.2808  232.4938
MAPE  10.3028  21.8860  19.2908  5.8802  6.6519  37.5129  18.9998
TIC  0.0392  0.0822  0.0662  0.0381  0.0348  0.1151  0.0713
U1  0.0392  0.0822  0.0662  0.0381  0.0348  0.1151  0.0713
U2  0.0783  0.1552  0.1382  0.0757  0.0705  0.2432  0.1357
Test  RMSE  340.3275  116.3608  773.9608  480.4346  473.8145  301.5636  149.3024
MAE  315.9793  107.1247  751.6470  459.6381  457.4457  270.4788  123.7991
MAPE  8.9202  3.0525  21.1297  12.9606  12.9383  7.4586  3.3959
TIC  0.0459  0.0163  0.0984  0.0635  0.0627  0.0442  0.0212
U1  0.0459  0.0163  0.0984  0.0635  0.0627  0.0442  0.0212
U2  0.0958  0.0328  0.2179  0.1353  0.1334  0.0849  0.0420
Model  NIPGM  NIPNGM  NIPDGM  NIPNDGM  NIPBernoulliGM
Training  RMSE  214.9696  138.8300  232.3187  125.2973  272.0905
MAE  169.2131  109.0311  189.8683  90.8680  238.6010
MAPE  12.6166  6.0049  15.0377  5.2743  20.0758
TIC  0.0537  0.0361  0.0577  0.0319  0.0736
U1  0.0537  0.0361  0.0577  0.0319  0.0736
U2  0.1107  0.0715  0.1196  0.0645  0.1401
Test  RMSE  862.1933  404.1645  790.7576  429.9505  197.6275
MAE  836.4000  384.1215  768.0924  412.9963  150.0556
MAPE  23.4874  10.8571  21.5902  11.6904  4.0554
TIC  0.1085  0.0540  0.1004  0.0572  0.0283
U1  0.1085  0.0540  0.1004  0.0572  0.0283
U2  0.2427  0.1138  0.2226  0.1211  0.0556
Table 6. The metrics of models in case 3.
Model  GreySLstm-M1  GreySLstm-M2  gru  rf  xgb  lstm  svr
Training  RMSE  5.5247  8.7590  30.3996  9.4015  29.0164  17.2873  24.9984
MAE  3.3122  5.5920  19.1020  6.4205  25.1756  13.5572  23.0586
MAPE  1.4747  2.2650  12.0591  2.9254  15.7324  7.8731  13.2535
TIC  0.0128  0.0202  0.0711  0.0221  0.0666  0.0405  0.0593
U1  0.0128  0.0202  0.0711  0.0221  0.0666  0.0405  0.0593
U2  0.0257  0.0407  0.1413  0.0437  0.1348  0.0803  0.1162
Test  RMSE  76.9153  96.7842  141.9118  278.8786  318.5326  94.8583  287.8411
MAE  65.5358  77.5446  126.7438  265.2736  306.6917  85.9361  254.5826
MAPE  9.5421  11.1199  18.0005  38.6305  44.9456  12.3212  36.1240
TIC  0.0566  0.0691  0.1168  0.2592  0.3079  0.0751  0.2640
U1  0.0566  0.0691  0.1168  0.2592  0.3079  0.0751  0.2640
U2  0.1142  0.1437  0.2108  0.4142  0.4731  0.1409  0.4275
Model  cnn  mlp  cnnlstm  convlstm  GM  NGM  DGM
Training  RMSE  89.2788  39.4255  35.1136  34.6697  40.1285  177.1090  40.0240
MAE  73.5071  29.2930  26.1866  26.6884  32.8573  122.6439  32.7378
MAPE  43.4380  14.4692  13.2231  13.9887  17.4882  54.4627  17.4793
TIC  0.2081  0.0922  0.0822  0.0811  0.0954  0.2982  0.0949
U1  0.2081  0.0922  0.0822  0.0811  0.0954  0.2982  0.0949
U2  0.4149  0.1832  0.1632  0.1611  0.1865  0.8230  0.1860
Test  RMSE  462.0330  249.8680  104.4885  221.5902  168.8275  1536.1246  170.6696
MAE  453.9507  244.4299  95.2879  210.2770  164.6643  1349.0963  166.6187
MAPE  67.3986  36.2082  14.6379  30.5910  24.8887  193.0584  25.1626
TIC  0.5208  0.2275  0.0833  0.1959  0.1426  0.5397  0.1444
U1  0.5208  0.2275  0.0833  0.1959  0.1426  0.5397  0.1444
U2  0.6862  0.3711  0.1552  0.3291  0.2508  2.2815  0.2535
Model  NDGM  BernoulliGM  FGM  FNGM  FNDGM  FDGM  FBernoulliGM
Training  RMSE  34.0744  43.1875  37.6800  41.5449  42.8570  50.4337  36.0213
MAE  27.1748  33.1724  29.2709  34.4108  33.4238  40.2398  28.0154
MAPE  14.6721  15.7012  16.7802  18.6444  16.0451  22.6073  16.1310
TIC  0.0798  0.1050  0.0849  0.0978  0.1039  0.1174  0.0817
U1  0.0798  0.1050  0.0849  0.0978  0.1039  0.1174  0.0817
U2  0.1583  0.2007  0.1751  0.1931  0.1992  0.2344  0.1674
Test  RMSE  197.0548  232.0179  217.0482  193.0278  241.6070  311.1789  362.4447
MAE  144.3595  229.0458  159.5841  189.3324  238.4823  305.2174  260.6160
MAPE  20.1963  34.4236  22.2412  28.5918  35.7358  45.2670  35.9641
TIC  0.1337  0.2076  0.1436  0.1666  0.2182  0.3001  0.2202
U1  0.1337  0.2076  0.1436  0.1666  0.2182  0.3001  0.2202
U2  0.2927  0.3446  0.3224  0.2867  0.3588  0.4622  0.5383
Model  NIPGM  NIPNGM  NIPDGM  NIPNDGM  NIPBernoulliGM
Training  RMSE  40.0408  59.9252  33.8145  32.3140  37.7739
MAE  31.1999  42.8442  25.8625  28.0125  29.7247
MAPE  18.2299  24.0522  14.8879  16.1756  17.5189
TIC  0.0895  0.1439  0.0790  0.0760  0.0850
U1  0.0895  0.1439  0.0790  0.0760  0.0850
U2  0.1861  0.2785  0.1571  0.1502  0.1755
Test  RMSE  187.9752  107.3958  946.1340  7738.0291  443.3291
MAE  139.3251  87.2050  690.9050  4938.1195  311.9407
MAPE  19.4785  12.5335  94.8007  664.1561  42.8325
TIC  0.1268  0.0757  0.4286  0.8693  0.2582
U1  0.1268  0.0757  0.4286  0.8693  0.2582
U2  0.2792  0.1595  1.4052  11.4929  0.6585
Table 7. Optimization of GreySLstm-M1 compared with other models in the test phase of case 1.
Model  vs. GreySLstm-M2  vs. gru  vs. rf  vs. xgb  vs. lstm  vs. svr  vs. cnn
RMSE  82.5565  62.9585  69.7816  44.1213  43.7827  59.7718  92.1045
MAE  82.1708  60.8934  67.4602  22.3263  36.6064  57.9619  91.9533
MAPE  82.2981  61.4092  67.9978  25.9755  38.0148  57.6727  92.0008
TIC  80.3937  60.6812  67.6062  42.2685  41.4517  61.3125  90.0117
U1  80.3937  60.6812  67.6062  42.2685  41.4517  61.3125  90.0117
U2  82.5565  62.9585  69.7816  44.1213  43.7827  59.7718  92.1045
Model  vs. mlp  vs. cnnlstm  vs. convlstm  vs. GM  vs. NGM  vs. DGM  vs. NDGM
RMSE  76.9765  72.5340  83.2453  94.4844  89.7083  94.4486  89.7743
MAE  71.5656  71.3132  82.0062  94.3334  89.4180  94.2972  89.5405
MAPE  72.5025  71.6429  82.2846  94.3753  89.5010  94.3392  89.6118
TIC  75.0942  70.3242  81.1786  92.4163  87.6138  92.3802  87.6714
U1  75.0942  70.3242  81.1786  92.4163  87.6138  92.3802  87.6714
U2  76.9765  72.5340  83.2453  94.4844  89.7083  94.4486  89.7743
Model  vs. BernoulliGM  vs. FGM  vs. FNGM  vs. FNDGM  vs. FDGM  vs. FBernoulliGM  vs. NIPGM
RMSE  44.4573  75.6092  85.0642  92.0732  74.8283  59.0828  81.9788
MAE  34.1782  72.3880  83.2808  91.9736  71.5166  59.0908  80.2017
MAPE  31.3125  73.0239  83.6314  92.0096  72.1720  58.4291  80.5729
TIC  42.4376  73.5743  83.0796  89.9698  72.7863  56.6203  79.9418
U1  42.4376  73.5743  83.0796  89.9698  72.7863  56.6203  79.9418
U2  44.4573  75.6092  85.0642  92.0732  74.8283  59.0828  81.9788
Model  vs. NIPNGM  vs. NIPDGM  vs. NIPNDGM  vs. NIPBernoulliGM
RMSE  91.5964  76.2326  92.9511  95.9991
MAE  91.2933  73.1726  92.8468  95.1265
MAPE  91.3724  73.7791  92.8819  95.2755
TIC  89.5263  74.1977  90.8570  96.5897
U1  89.5263  74.1977  90.8570  96.5897
U2  91.5964  76.2326  92.9511  95.9991
Table 8. Optimization of GreySLstm-M1 compared with other models in the test phase of case 2.
Model  vs. GreySLstm-M2  vs. gru  vs. rf  vs. xgb  vs. lstm  vs. svr  vs. cnn
RMSE  42.4830  72.6239  77.9710  85.0682  74.8191  10.9799  60.7422
MAE  35.3488  74.6378  80.0868  87.1327  76.9293  11.9287  64.6603
MAPE  30.7929  73.4399  79.2205  86.6889  75.8773  12.4241  64.1183
TIC  43.9111  74.2029  79.5609  86.6971  76.4041  10.4043  59.4026
U1  43.9111  74.2029  79.5609  86.6971  76.4041  10.4043  59.4026
U2  42.4830  72.6239  77.9710  85.0682  74.8191  10.9799  60.7422
Model  vs. mlp  vs. cnnlstm  vs. convlstm  vs. GM  vs. NGM  vs. DGM  vs. NDGM
RMSE  65.1873  7.6571  74.2986  90.4996  58.5059  90.5106  66.1181
MAE  69.2043  8.0892  75.6415  91.7700  61.2027  91.7823  69.7053
MAPE  68.7097  6.8578  74.7508  91.5166  60.3493  91.5298  69.0766
TIC  63.8041  7.4294  72.9519  88.9578  57.2180  88.9685  64.7390
U1  63.8041  7.4294  72.9519  88.9578  57.2180  88.9685  64.7390
U2  65.1873  7.6571  74.2986  90.4996  58.5059  90.5106  66.1181
Model  vs. BernoulliGM  vs. FGM  vs. FNGM  vs. FNDGM  vs. FDGM  vs. FBernoulliGM  vs. NIPGM
RMSE  0.9034  85.1014  75.9989  75.6636  61.7628  22.7678  86.6260
MAE  10.6414  87.2646  79.1738  79.0740  64.6090  22.6771  88.5551
MAPE  9.6337  86.9452  78.7168  78.6801  63.0165  18.7709  88.2557
TIC  1.0207  83.5747  74.5354  74.1933  63.3875  23.8799  85.0928
U1  1.0207  83.5747  74.5354  74.1933  63.3875  23.8799  85.0928
U2  0.9034  85.1014  75.9989  75.6636  61.7628  22.7678  86.6260
Model  vs. NIPNGM  vs. NIPDGM  vs. NIPNDGM  vs. NIPBernoulliGM
RMSE  71.4696  85.4178  73.1807  41.6531
MAE  75.0795  87.5373  76.8218  36.2069
MAPE  74.5934  87.2237  76.4042  31.9820
TIC  70.0377  83.8895  71.7290  42.8803
U1  70.0377  83.8895  71.7290  42.8803
U2  71.4696  85.4178  73.1807  41.6531
Table 9. Optimization of GreySLstm-M1 compared with other models in the test phase of case 3.
Model  vs. GreySLstm-M2  vs. gru  vs. rf  vs. xgb  vs. lstm  vs. svr  vs. cnn
RMSE  20.5290  45.8006  72.4198  75.8532  18.9155  73.2785  83.3528
MAE  15.4864  48.2927  75.2950  78.6314  23.7390  74.2575  85.5632
MAPE  14.1890  46.9899  75.2991  78.7697  22.5557  73.5852  85.8423
TIC  18.1348  51.5610  78.1745  81.6272  24.6273  78.5708  89.1368
U1  18.1348  51.5610  78.1745  81.6272  24.6273  78.5708  89.1368
U2  20.5290  45.8006  72.4198  75.8532  18.9155  73.2785  83.3528
Model  vs. mlp  vs. cnnlstm  vs. convlstm  vs. GM  vs. NGM  vs. DGM  vs. NDGM
RMSE  69.2176  26.3887  65.2894  54.4415  94.9929  54.9332  60.9676
MAE  73.1883  31.2234  68.8336  60.2004  95.1422  60.6672  54.6024
MAPE  73.6466  34.8124  68.8075  61.6611  95.0574  62.0784  52.7533
TIC  75.1257  32.0472  71.1178  60.3303  89.5159  60.8297  57.6863
U1  75.1257  32.0472  71.1178  60.3303  89.5159  60.8297  57.6863
U2  69.2176  26.3887  65.2894  54.4415  94.9929  54.9332  60.9676
Model  vs. BernoulliGM  vs. FGM  vs. FNGM  vs. FNDGM  vs. FDGM  vs. FBernoulliGM  vs. NIPGM
RMSE  66.8494  64.5630  60.1532  68.1651  75.2826  78.7787  59.0822
MAE  71.3875  58.9334  65.3859  72.5196  78.5282  74.8535  52.9620
MAPE  72.2804  57.0973  66.6266  73.2983  78.9205  73.4678  51.0122
TIC  72.7439  60.6075  66.0292  74.0691  81.1451  74.3007  55.3960
U1  72.7439  60.6075  66.0292  74.0691  81.1451  74.3007  55.3960
U2  66.8494  64.5630  60.1532  68.1651  75.2826  78.7787  59.0822
Model  vs. NIPNGM  vs. NIPDGM  vs. NIPNDGM  vs. NIPBernoulliGM
RMSE  28.3814  91.8706  99.0060  82.6505
MAE  24.8486  90.5145  98.6729  78.9909
MAPE  23.8675  89.9346  98.5633  77.7223
TIC  25.3015  86.7980  93.4915  78.0911
U1  25.3015  86.7980  93.4915  78.0911
U2  28.3814  91.8706  99.0060  82.6505
Table 10. Metrics of the GreySLstm model with different numbers of LSTM layers.
LSTM Layer Number  1  2  3  4  5  6  7  8  9
Case 1  RMSE  Train  937.9942  851.0898  486.6038  844.6278  963.4058  3109.3818  10,351.4960  11,148.5194  11,152.6367
Test  5132.2100  3611.1851  1627.7841  3030.8841  5331.7715  19,477.6190  64,645.9305  68,870.8128  68,889.6010
MAE  Train  625.4415  521.0091  288.6944  547.3042  546.0693  1982.2757  6818.2956  7408.2859  7411.4835
Test  4841.1397  3463.4200  1190.5096  2618.6103  5128.7763  18,726.4847  62,154.8233  66,233.5773  66,251.8598
MAPE  Train  3.1277  2.4315  1.3231  2.6728  2.4374  10.8775  39.6334  42.5158  42.5315
Test  22.1315  15.7899  5.6178  12.0999  23.3665  85.2647  282.9392  301.4895  301.5725
TIC  Train  0.0254  0.0231  0.0132  0.0231  0.0260  0.0724  0.2276  0.2411  0.2412
Test  0.1027  0.0747  0.0354  0.0635  0.1066  0.2361  0.5946  0.6105  0.6106
U1  Train  0.0254  0.0231  0.0132  0.0231  0.0260  0.0724  0.2276  0.2411  0.2412
Test  0.1027  0.0747  0.0354  0.0635  0.1066  0.2361  0.5946  0.6105  0.6106
U2  Train  0.0510  0.0462  0.0264  0.0459  0.0523  0.1689  0.5624  0.6057  0.6060
Test  0.2288  0.1610  0.0726  0.1351  0.2377  0.8682  2.8815  3.0698  3.0707
Case 2  RMSE  Train  101.5587  94.1466  85.9299  118.1945  65.8725  343.2804  861.3843  1104.3757  1103.5302
Test  317.5203  192.0910  149.4493  360.5283  177.3545  2044.3780  5160.0670  6546.0611  6542.0924
MAE  Train  76.2756  69.4719  65.2312  85.1072  47.0495  202.2991  549.3752  699.6323  699.0246
Test  286.1201  168.4210  128.8550  314.3367  148.3387  1973.5665  4974.8283  6322.1106  6318.2444
MAPE  Train  4.1573  4.0826  3.7143  4.5947  2.5382  10.6773  30.1723  36.3884  36.3614
Test  8.0050  4.6901  3.6203  8.6849  4.1085  55.1810  138.8855  176.5324  176.4243
TIC  Train  0.0265  0.0245  0.0225  0.0313  0.0169  0.0807  0.1833  0.2274  0.2272
Test  0.0429  0.0276  0.0213  0.0493  0.0249  0.2058  0.4046  0.4814  0.4813
U1  Train  0.0265  0.0245  0.0225  0.0313  0.0169  0.0807  0.1833  0.2274  0.2272
Test  0.0429  0.0276  0.0213  0.0493  0.0249  0.2058  0.4046  0.4814  0.4813
U2  Train  0.0523  0.0485  0.0442  0.0608  0.0339  0.1767  0.4434  0.5685  0.5681
Test  0.0894  0.0541  0.0421  0.1015  0.0499  0.5756  1.4528  1.8430  1.8419
Case 3  RMSE  Train  38.5742  38.1353  31.8485  38.4768  38.4039  38.7443  38.8287  38.8505  38.6034
Test  208.8081  235.0953  206.5127  207.9963  206.1128  210.7499  211.8525  212.1576  208.7974
MAE  Train  30.9613  30.4737  25.0073  30.9819  30.8845  31.1706  31.2248  31.2383  31.0690
Test  152.8418  171.6049  150.9447  152.2497  150.8776  154.2628  155.0598  155.2788  152.8170
MAPE  Train  18.5792  18.1695  14.8266  18.5924  18.5317  18.7202  18.7552  18.7637  18.6545
Test  21.2923  23.8549  21.0333  21.2110  21.0236  21.4869  21.5960  21.6260  21.2890
TIC  Train  0.0867  0.0859  0.0720  0.0866  0.0864  0.0870  0.0872  0.0872  0.0868
Test  0.1395  0.1535  0.1381  0.1391  0.1380  0.1406  0.1412  0.1414  0.1395
U1  Train  0.0867  0.0859  0.0720  0.0866  0.0864  0.0870  0.0872  0.0872  0.0868
Test  0.1395  0.1535  0.1381  0.1391  0.1380  0.1406  0.1412  0.1414  0.1395
U2  Train  0.1793  0.1772  0.1480  0.1788  0.1785  0.1800  0.1804  0.1805  0.1794
Test  0.3101  0.3492  0.3067  0.3089  0.3061  0.3130  0.3147  0.3151  0.3101
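For readers who wish to reproduce the table entries, the six metrics can be computed as in the sketch below. The paper's exact TIC/U1/U2 formulas are not restated in this excerpt; the sketch uses common Theil-statistic conventions, which are at least consistent with the reported pattern that TIC and U1 coincide in every table (so they are treated as the same quantity here). Treat the U2 form in particular as an assumption.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (in percent), TIC/U1, and U2.

    TIC/U1 and U2 follow common Theil-statistic conventions
    (an assumption of this sketch, not the paper's stated formulas).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)
    # Theil inequality coefficient (also reported as U1): RMSE normalised
    # by the sum of the quadratic means of the actual and predicted series.
    tic = rmse / (np.sqrt(np.mean(y_true ** 2)) + np.sqrt(np.mean(y_pred ** 2)))
    # U2 taken here as RMSE normalised by the quadratic mean of the actual
    # series alone, which matches the U2 ≈ 2 × U1 pattern seen for the
    # more accurate models in the tables.
    u2 = rmse / np.sqrt(np.mean(y_true ** 2))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape,
            "TIC": tic, "U1": tic, "U2": u2}

m = forecast_metrics([100.0, 100.0], [110.0, 90.0])
print(m["RMSE"], m["MAE"], m["MAPE"])  # 10.0 10.0 10.0
```

Because U2 divides only by the actual series and U1 by both series, the percentage improvements of U2 across models always match those of RMSE, and those of U1 match TIC, exactly as seen in Tables 7, 8 and 9.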
Share and Cite

Hao, Y.; Ma, X. A Hybrid Grey System Model Based on Stacked Long Short-Term Memory Layers and Its Application in Energy Consumption Forecasting. Processes 2024, 12, 1749. https://doi.org/10.3390/pr12081749