Article

Natural Gas Consumption Forecasting Based on Homoheterogeneous Stacking Ensemble Learning

1 School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China
2 College of Economics and Management, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(19), 8691; https://doi.org/10.3390/su16198691
Submission received: 4 September 2024 / Revised: 7 October 2024 / Accepted: 7 October 2024 / Published: 9 October 2024

Abstract
Natural gas consumption is an important indicator of energy utilization and demand, and its scientific, high-accuracy prediction plays a key role in energy policy formulation. Building on the development of deep neural networks and ensemble learning, a homoheterogeneous stacking ensemble learning method is proposed for natural gas consumption forecasting. Firstly, to extract the potential characteristics of the data, a data dimension enhancement method based on nonlinear concave and convex transformations is designed. Then, within a stacking ensemble learning framework, multiscale autoregressive integrated moving average (ARIMA) and high-order fuzzy cognitive map (HFCM) models are chosen as the base learners, while the meta learner is a well-designed deep neural network built from long short-term memory (LSTM) cells. Finally, using the natural gas consumption data of China as a whole and of 30 provinces (data for Xizang are unavailable) from 2000 to 2019, the numerical results show that the proposed algorithm outperforms the seven compared traditional and ensemble methods in accuracy, robustness to noise, and sensitivity to data variations, and the corresponding model applicability rate exceeds 90%.

1. Introduction

Natural gas is the main fuel source for power generation and heating applications [1], and as an efficient, low-carbon, and clean energy source, its share of primary energy consumption keeps increasing. In recent years, China's natural gas consumption has maintained rapid growth, rising from 107.6 billion cubic meters in 2010 to 394.5 billion cubic meters in 2023, a compound annual growth rate of 9.7%. In April 2024, China's apparent natural gas consumption was 35.46 billion cubic meters, a year-on-year increase of 11.8%; from January to April, apparent consumption was 143.73 billion cubic meters, a year-on-year increase of 11.9% [2]. There remains significant room for growth in China's natural gas consumption. Natural gas consumption is an important indicator of a country's energy utilization and demand; if it can be predicted scientifically and reliably, the forecasts will support the formulation of natural gas strategies and decisions and help ensure gas supply across regions [3].
The current research on predicting natural gas consumption mainly focuses on two aspects: analysis of influencing factors and research on prediction methods:
(1) In terms of the analysis of factors affecting natural gas consumption, Lu's [4] research found that the main indicators affecting natural gas demand include the level of national economic and urbanization development and energy policies and markets; Parikh et al. [5] studied the impact of gross domestic product (GDP), agricultural GDP, industrial GDP, and population on natural gas consumption in India and verified that the use of natural gas can promote the development of agriculture, industry, and other sectors there; Shahbaz et al. [6] analyzed the relationship between Pakistan's economy, working population, and natural gas consumption, and their empirical analysis showed that economic growth is positively correlated with growth in natural gas consumption; Zhang et al. [7] selected six factors, including GDP, urban population, energy consumption structure, industrial structure, energy efficiency, and exports of goods and services, to predict natural gas demand; Lu et al. [8] used path analysis to select GDP, population, and urbanization rate as influencing factors on natural gas consumption; and Chai et al. [9] constructed 62 models for predicting national natural gas demand, selecting indicators such as GDP, industrial structure (industrial proportion), the shares of raw coal, crude oil, and hydropower, nuclear power, and wind power in total energy consumption, urbanization rate, natural gas prices, external dependence, and annual natural gas production in China.
(2) The existing research on forecasting methods for natural gas consumption can be summarized into two categories: single-model and combined-model forecasting. Representative single-model algorithms include time-series methods [10,11], machine learning [12,13], grey system methods [14,15], and econometric methods [16]. For example, Li et al. [17] used the traditional GM(1,1) and grey Verhulst models to fit and forecast China's urban natural gas consumption from 2006 to 2017 and found that the grey Verhulst model fits better and with higher precision than GM(1,1). Wang et al. [18] established three grey models, GM(1,1), DGM(1,1), and FGM(1,1), based on China's natural gas consumption data from 2014 to 2020 and compared them using the mean absolute percentage error; the results showed that the FGM(1,1) model had the highest simulation accuracy and could be used to predict China's future natural gas consumption. Gao et al. [19] used grey theory models to predict China's natural gas consumption in 2012, 2013, and 2014; the predictions were highly accurate and provide a reference for the management and scheduling departments of natural gas users. Cheng et al. [20] used low-carbon, grey, and linear programming models as a technical framework to comprehensively evaluate the future energy structure and natural gas demand of Fujian Province. Shashank and Mani [21] used artificial neural network models for natural gas prediction and compared them with multiple linear regression methods to verify the effectiveness of the proposed algorithms. Combined forecasting models have been a research hotspot in recent years; given the large number of single forecasting models, combinations can be formed in series, parallel, and mixed modes. For example, Tong et al. [22] used the grey fractional-order FGM(1,1) model to predict natural gas consumption in Beijing, although with limited initial data, long-term prediction with the FGM(1,1) model increases the error. Ma et al. [23] proposed a new extended-parameter Morlet wavelet kernel ridge grey system model, which combines kernel ridge regularization with grey system modeling and is trained using the conjugate gradient method; empirical results show that the model can effectively handle nonlinear complex systems and has great potential for nonlinear small-sample prediction. Zhu et al. [24] combined residual autoregression models and Kalman filtering into a combination forecasting model; their study of China's natural gas consumption from 1980 to 2017 showed that the combination model achieves high accuracy. Compared with a single prediction model, a combination model can exploit the different potential features captured by different models to predict the data efficiently.
Ensemble learning is also known as committee-based learning. Most ensemble methods generate homogeneous ensembles with a single learning algorithm, in which the individual learners are called base learners; other ensemble methods use multiple learning algorithms to generate heterogeneous ensembles. By combining diverse models, an ensemble can average out their individual noise and thus enhance generalization ability. Performance metrics are the evaluation criteria for a model's generalization ability, and different metrics often lead to different evaluation results. Current ensemble learning methods can be roughly divided into two categories: serialized methods, which generate base learners sequentially and in which strong dependencies exist between individual learners, and parallelized methods, which generate base learners simultaneously without strong dependencies between individuals. The stacking algorithm is an effective and general ensemble method that combines the predictions produced by different lower-layer learning algorithms through a higher-level algorithm to achieve higher prediction accuracy. Generally speaking, building an ensemble involves selecting the base learners and their generation method as well as the strategy for combining them; achieving better performance than any single learner is the key goal of ensemble learning [25].
In current research on ensemble learning methods, many scholars have used the simple average ensemble, linear ensemble, and random forest ensemble [26], and a large number of studies have shown that ensemble learning models generally have better predictive performance. For example, Zhou et al. [27] constructed a random forest (RF) ensemble prediction model based on empirical mode decomposition (EMD), extreme gradient boosting (XGBoost), extreme learning machine (ELM), and the corresponding combination models and conducted empirical research using carbon trading market data; the results showed that the proposed model predicts better than the benchmark models. Wang et al. [28] proposed a gated recurrent unit ensemble prediction model with random time-effect weights built on EMD; the experiments showed that the ensemble model was superior to the other compared methods for energy price prediction. Zhong et al. [29] proposed a prediction-based online trading strategy ensemble framework consisting mainly of isomorphic CNN predictors, classifiers, and actuators, and verified its effectiveness on the Standard & Poor's and DJI datasets. Chai et al. [30] provided a new comprehensive integrated prediction framework for natural gas consumption in different industries, but that integration relies on influencing factors of natural gas consumption; the integrated prediction of natural gas consumption without selecting influencing factors has not been studied further, and existing research in this area is relatively scarce.
The autoregressive integrated moving average (ARIMA) model, one of the classic models for time-series prediction, has the advantages of simple calculation, computational efficiency, and high prediction accuracy. The high-order fuzzy cognitive map (HFCM) compensates for the shortcomings of time-series prediction methods in describing causal relationships and modeling dynamic systems. Using both as the base models for ensemble prediction is conducive to better prediction results. Meanwhile, a review of existing machine-learning methods for natural gas consumption prediction shows that most focus on single-model combination and data decomposition, and few scholars have improved the internal structure of neural networks to enhance the data processing ability of machine-learning models.
On this basis, this article proposes a homoheterogeneous stacking (ARIMA–HFCM–LSTM) ensemble method for natural gas consumption prediction that achieves complementary advantages and deep collaboration among multiple models. Stacking is an ensemble learning method widely used in machine learning. Most existing studies on stacking ensembles are based on heterogeneous stacking, in which different prediction models are selected as the base learners (homogeneous stacking is possible in theory but less studied), and the meta learner is generally an existing single model. The stacking ensemble prediction framework built in this paper contains not only heterogeneous stacking but also homogeneous stacking. On this basis, a double-layer LSTM network is designed as the stacking meta learner, exploiting the strong generalization ability and stability of the LSTM model, so as to achieve efficient prediction. The empirical results show that the proposed homoheterogeneous stacking (ARIMA–HFCM–LSTM) model predicts better than the single models and the homogeneous stacking (ARIMA–LSTM and HFCM–LSTM) models, which further verifies that the proposed method has better prediction performance. It is hoped that, to some extent, the method proposed in this article can provide new research ideas for the ensemble prediction of one-dimensional natural gas consumption time series.
The main contributions of this article are as follows. Firstly, to meet the model's input requirements, a data dimensionality enhancement technique using concave–convex transformation is proposed; the transformed data meet the input requirements of the model and offer a new way to increase the dimensionality of one-dimensional time-series data. Secondly, starting from the internal structure of the neural network, a two-layer LSTM network is proposed as the meta learner of the stacking ensemble, which moves beyond the convention that the meta learner is a single existing model and improves ensemble prediction accuracy while enriching the types of meta learner models. Thirdly, comparisons with four single models (ARIMA, GM(1,1), HFCM, and LSTM), two homogeneous stacking ensembles (ARIMA–LSTM and HFCM–LSTM), and STK–DSN–R [31] show that the proposed homoheterogeneous stacking (ARIMA–HFCM–LSTM) ensemble achieves better prediction performance. The method broadens the research on stacking ensemble prediction, and the accurate prediction results can also provide a new decision-making basis for the policies of natural gas related departments.

2. Model Introduction

2.1. ARIMA Model

The autoregressive integrated moving average model (ARIMA) is a time-series prediction method for fitting differential stationary series. The model examines the dynamic and persistent characteristics of time series, reveals the intrinsic relationships of time series, and is suitable for short-term time-series prediction [32].
The general form of the ARIMA(p, d, q) model is as follows:
$u_t = a + \phi_1 u_{t-1} + \cdots + \phi_p u_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$ (1)
where $p$ is the order of the autoregressive part; $d$ is the number of differences; $q$ is the order of the moving-average part; $u_t$ is the stationary sequence after differencing; $a$ is a constant; $\phi_i$ are the autoregressive coefficients; $\theta_i$ are the moving-average coefficients; and $\varepsilon_t$ is a zero-mean white-noise sequence.
The modeling steps of this method are as follows (a minimal code sketch follows the steps):
Step 1: Time-series stationarity test. Examine the stationarity of the series through a time-series plot. If the stationarity test is not passed, transform the non-stationary sequence into a stationary one by differencing and test it again;
Step 2: Preliminary identification of the model. Estimate the values of p and q from the autocorrelation and partial autocorrelation plots and determine the optimal order of the model using the AIC criterion;
Step 3: Parameter estimation and model validation. Estimate the model parameters by maximum likelihood and test the estimated parameters in the model;
Step 4: Model prediction. Forecast with the model under the selected parameters.
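These steps can be illustrated in a few lines of Python using the statsmodels library; this is a minimal sketch on a synthetic series, not the authors' code, and the candidate order ranges are placeholders.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic, twice-integrated series standing in for a consumption record.
rng = np.random.default_rng(0)
y = np.cumsum(np.cumsum(rng.normal(1.0, 0.5, size=40)))

# Steps 1-2: with d = 2 differences assumed, compare candidate (p, q)
# orders by AIC (in practice, ACF/PACF plots guide the candidate set).
best_order, best_aic = None, np.inf
for p in range(3):
    for q in range(3):
        try:
            fit = ARIMA(y, order=(p, 2, q)).fit()
        except Exception:
            continue
        if fit.aic < best_aic:
            best_order, best_aic = (p, 2, q), fit.aic

# Steps 3-4: refit the selected model (maximum likelihood) and forecast.
model = ARIMA(y, order=best_order).fit()
print(best_order, model.forecast(steps=3))
```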

2.2. High-Order Fuzzy Cognitive Map

Cognitive maps were first proposed by Tolman in 1948 in his paper "Cognitive Maps in Rats and Men". In 1955, Kelly constructed a cognitive map in which concepts are binary and their relationships are represented by three values: +, −, and 0. In 1976, Axelrod further explained the dynamism of cognitive maps and the causal relationships between nodes. In 1986, Kosko [33] proposed fuzzy cognitive maps (FCMs) based on Axelrod's cognitive maps and Zadeh's fuzzy set theory. Fuzzy cognitive maps extend the ternary logical relationships between concepts to interval fuzzy relationships and can be understood as signed fuzzy directed graphs. Figure 1 shows an example of a fuzzy cognitive graph with three nodes, where $w_{11}$, $w_{12}$, $w_{23}$, and $w_{31}$ represent the weights of the influence relationships between conceptual nodes.
If $w_{11}$, $w_{12}$, $w_{23}$, and $w_{31}$ are +0.3, −0.6, +0.5, and −0.1, respectively, then the causal relationship matrix between the nodes is $W$, as shown in Equation (2):
$W = \begin{bmatrix} +0.3 & -0.6 & 0 \\ 0 & 0 & +0.5 \\ -0.1 & 0 & 0 \end{bmatrix}$ (2)
Assuming a fuzzy cognitive map has $N$ nodes, the state values of all nodes can be represented as the matrix $E$:
$E = [e_1, e_2, \ldots, e_N]$ (3)
where $e_n = [e_{1,n}, e_{2,n}, \ldots, e_{Q,n}]^{T}$ represents the state sequence of the $n$-th node, $n = 1, 2, \ldots, N$, and $e_{q,n}$ represents the state value of the $n$-th node at time $q$, $q = 1, 2, \ldots, Q$.
The strength of the influence relationship of node $i$ on node $j$ is called the weight of $i$ on $j$, quantified by $w_{i,j}$. The influence strengths between the nodes therefore form an $N \times N$ weight matrix $W$:
$W = [w_1, w_2, \ldots, w_N]$ (4)
where $w_n = [w_{1,n}, w_{2,n}, \ldots, w_{N,n}]^{T}$ represents the weight vector of the $n$-th node, with $n, i, j = 1, 2, \ldots, N$. When $w_{i,j} > 0$, nodes $i$ and $j$ have a positive relationship; when $w_{i,j} = 0$, the relationship between nodes $i$ and $j$ is neutral; and when $w_{i,j} < 0$, nodes $i$ and $j$ have a negative relationship.
In a broad sense, the dynamic characteristics of a fuzzy cognitive map with $N$ nodes can be expressed as follows:
$e_{(q+1),n} = \psi\left( \sum_{j=1}^{N} w_{n,j}\, e_{q,j} \right)$ (5)
where $q$ is the time tag, $q = 1, 2, \ldots, Q$; $e_{q,j}$ represents the state value of the $j$-th node at iteration $q$; $e_{(q+1),n}$ represents the state value of node $n$ at time $q+1$; $w_{n,j}$ represents the weight of node $j$ on node $n$; $N$ represents the number of nodes; and $\psi(\cdot)$ represents the transfer function.
In order to enhance the ability of fuzzy cognitive maps to describe complex systems, Stach et al. [34] proposed high-order fuzzy cognitive maps (HFCMs) in 2006, whose $K$-order dynamic characteristics can be expressed as follows:
$e_{(q+1),n} = \psi\left( \sum_{j=1}^{N} \left( w_{n,j}^{1} e_{q,j} + w_{n,j}^{2} e_{q-1,j} + \cdots + w_{n,j}^{K} e_{q-K+1,j} \right) + w_{n,0} \right)$ (6)
where $e_{(q+1),n}$ represents the state value of the $n$-th node at time $q+1$; $w_{n,j}^{k}$ represents the weight of node $j$ on node $n$ at lag $k$ (i.e., time $q-k+1$), $k = 1, 2, \ldots, K$; and $w_{n,0}$ represents the constant bias associated with the $n$-th node.
The training process of the high-order fuzzy cognitive map learning algorithm is as follows (a code sketch of these steps is given after the list):
Step 1: Set the order $K$ and the regularization parameter $\alpha$;
Step 2: Determine the weight matrix for each pair $(K, \alpha)$ using the training set;
Step 3: Use the training set and the weight matrix to predict the validation set and obtain the validation-set predictions;
Step 4: Calculate the root mean square error (RMSE) of the predictions against the actual values of the validation set;
Step 5: Select the HFCM weight matrix corresponding to the $(K, \alpha)$ that minimizes the RMSE as the training result.
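The paper does not publish code, but Steps 1–5 admit a compact sketch: for a fixed $(K, \alpha)$, inverting the tanh transfer function turns Equation (6) into a linear system that can be solved per node by ridge regression. The helper names below are hypothetical, and this is one of several possible HFCM learning schemes.

```python
import numpy as np

def fit_hfcm(E, K, alpha):
    """Ridge-regression fit of a K-order HFCM.
    E: (Q, N) array of node states in (-1, 1).
    Returns the stacked weight matrix (bias row included), one column per node."""
    Q, N = E.shape
    # Design matrix: for each target time, stack the K lagged state vectors.
    X = np.hstack([E[K - k - 1:Q - k - 1] for k in range(K)])   # (Q-K, N*K)
    X = np.hstack([X, np.ones((Q - K, 1))])                     # bias column
    Y = np.arctanh(np.clip(E[K:], -0.999, 0.999))               # invert tanh
    # Ridge solution (X'X + alpha*I)^(-1) X'Y, following Equation (6).
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def predict_next(E, W, K):
    """One-step-ahead prediction of all node states."""
    x = np.concatenate([E[-1 - k] for k in range(K)] + [np.ones(1)])
    return np.tanh(x @ W)
```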

2.3. Long Short-Term Memory Network LSTM Model

Long short-term memory (LSTM) [35] is a recurrent neural network that, by introducing controllable self-loops, compensates for the inability of ordinary recurrent networks to exploit long-term information. This design effectively addresses the problem of predicting time series with long delays and large temporal intervals.
The internal structure of a single LSTM neuron mainly consists of three gates: the forget gate, the input gate, and the output gate, as shown in Figure 2.
Forget gate: When the model runs, it must decide which information in the cell state to keep. By reading the previous output value $h_{t-1}$ and the current input value $x_t$, it determines how much information from the previous cell state $C_{t-1}$ is retained in the current state. In the model, 0 represents complete forgetting and 1 represents complete retention, with the weights $W_f$ and biases $b_f$ determining the degree of retention of the sequence information between these extremes. The control function of the forget gate is as follows:
$f_t = \sigma\left( W_f [h_{t-1}, x_t] + b_f \right)$ (7)
Input gate: The input gate controls how much of the new input sequence $x_t$ is added to the cell state. Firstly, a sigmoid layer determines the update value $i_t$, and a tanh layer generates the candidate state $\tilde{C}_t$. Together they determine the information update: the information discarded by the forget gate is dropped, and the new candidate information is added, yielding the current cell state $C_t$. The main calculation formulas are as follows:
$i_t = \sigma\left( W_i [h_{t-1}, x_t] + b_i \right)$ (8)
$\tilde{C}_t = \tanh\left( W_C [h_{t-1}, x_t] + b_C \right)$ (9)
$C_t = f_t C_{t-1} + i_t \tilde{C}_t$ (10)
Output gate: The output gate determines how much of the sequence information is output by controlling the cell state. A sigmoid layer serves as the judgment condition for which parts of the cell state to output, while a tanh layer processes the cell state $C_t$ obtained from the input gate; multiplying the two gives the sequence information output by the current cell. The calculation formulas are as follows:
$o_t = \sigma\left( W_o [h_{t-1}, x_t] + b_o \right)$ (11)
$h_t = o_t \tanh(C_t)$ (12)
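As a concrete illustration of Equations (7)–(12), a single LSTM time step can be written directly in numpy; the weight shapes follow the $[h_{t-1}, x_t]$ concatenation above, and all values are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (7)
    i = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (8)
    c_bar = np.tanh(W["c"] @ z + b["c"])     # candidate state, Eq. (9)
    c = f * c_prev + i * c_bar               # cell state update, Eq. (10)
    o = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq. (11)
    h = o * np.tanh(c)                       # hidden output, Eq. (12)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: 0.1 * rng.normal(size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```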

2.4. Stacking Ensemble Learning Algorithm

Stacking is an ensemble learning algorithm proposed by Wolpert [36] in 1992 that has good generalization ability. The algorithm selects models with good predictive performance as the first-layer base learners and trains them on the initial dataset; a model with a simple structure and strong generalization ability is then chosen as the second-layer meta learner and trained on the new dataset generated by the first layer, where the outputs of the first-layer learners serve as the inputs of the second layer. The structure is shown schematically in Figure 3.
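The two-layer flow can be sketched generically as follows; the scikit-learn models here are illustrative stand-ins for the base and meta learners (the paper's actual ARIMA/HFCM/LSTM stack appears in Section 3), and a production version would use out-of-fold base predictions to avoid leakage.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 4)), rng.normal(size=100)
X_tr, y_tr, X_te = X[:80], y[:80], X[80:]

# Layer 1: train each base learner on the initial dataset.
bases = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3)]
for m in bases:
    m.fit(X_tr, y_tr)

# Layer 2: the base outputs form the new dataset for the meta learner.
Z_tr = np.column_stack([m.predict(X_tr) for m in bases])
Z_te = np.column_stack([m.predict(X_te) for m in bases])
meta = Ridge(alpha=1.0).fit(Z_tr, y_tr)
y_hat = meta.predict(Z_te)
```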

2.5. Model Evaluation Indicators

To comprehensively validate the performance of the model, this paper evaluates the proposed model using RMSE, MAPE, and R2 error metrics, specifically:
(a) Root mean square error (RMSE):
$RMSE = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 }$ (13)
(b) Mean absolute percentage error (MAPE):
$MAPE = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$ (14)
(c) Coefficient of determination R2:
$R^2 = 1 - \frac{ \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 }{ \sum_{i=1}^{m} (y_i - \bar{y})^2 }$ (15)
In Equations (13)–(15), $y_i$ is the true value of the $i$-th sample, $\hat{y}_i$ is the predicted value of the $i$-th sample, $m$ is the number of samples, and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$ in (15).
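Equations (13)–(15) translate directly into code; the following Python functions are a straightforward rendering.

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))          # Equation (13)

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y))            # Equation (14)

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)                  # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)             # total sum of squares
    return 1.0 - ss_res / ss_tot                       # Equation (15)
```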

3. Design of Ensemble Prediction Model Based on Homoheterogeneous Stacking

3.1. Model Design

Stacking algorithms are usually heterogeneous ensembles, but homogeneous ensembles also exist. Most existing studies integrate either homogeneous or heterogeneous models, while this article focuses on an ensemble of both homogeneous and heterogeneous models.
Firstly, this article uses ARIMA models of different orders and high-order fuzzy cognitive maps (abbreviated here as ARIMA(n) and HFCM(n)) as the algorithms of the first-layer base learner. Relative to each other, the ARIMA and HFCM models constitute the heterogeneous ensemble of the stacking algorithm, while the different orders of the ARIMA and HFCM models constitute its homogeneous ensembles. In the specific calculations, n is taken as three, so the first layer contains six base models.
Selecting and applying the second-layer meta learner model involves a dropout layer and a fully connected layer, so their concepts are briefly explained here. Dropout is a technique for preventing overfitting in deep-learning models. Its main principle is to randomly remove several neurons so that the model has a different structure during each training pass, thereby avoiding dependence on any particular local feature, increasing the generalization ability of the model, and reducing overfitting; an example is shown in Figure 4. The fully connected (FC) layer, one of the layer types in neural networks, maps high-dimensional feature maps (for example, those obtained from convolutional and pooling layers) to a one-dimensional vector through matrix multiplication. Simply put, it is a dimension transformation that compresses high dimensions into one while preserving useful information.
The newly designed LSTM network is used as the second-layer meta learner of the stacking ensemble algorithm. Since model performance can gradually decrease as layers are added, mainly because of overfitting, this article introduces dropout into the network to improve model performance: adding dropout to the LSTM network both increases model stability and enhances the model's generalization ability.
In designing the LSTM network, this article first takes the predicted results of the six base learners as inputs and then designs a five-layer structure: LSTM 1, Dropout Layer 1, LSTM 2, Dropout Layer 2, and the FC layer, which outputs the final prediction of the network. The overall prediction process based on the homoheterogeneous stacking ensemble algorithm is shown in Figure 5.
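The parameter names in Section 4.2 (MaxEpochs, InitialLearnRate) suggest a MATLAB implementation; as a language-neutral illustration, the same five-layer meta learner can be sketched in Keras. The input width of 6 follows the six base learners named above, and the single-unit output is an assumption for a one-value forecast (Table 1 shows a 3-channel output in the authors' network).

```python
import tensorflow as tf

# LSTM 1 -> Dropout 1 -> LSTM 2 -> Dropout 2 -> FC, as described above.
meta = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 6)),                  # six base-learner predictions per step
    tf.keras.layers.LSTM(64, return_sequences=True),  # LSTM 1, 64 hidden units
    tf.keras.layers.Dropout(0.1),                     # Dropout Layer 1 (10%)
    tf.keras.layers.LSTM(64),                         # LSTM 2, 64 hidden units
    tf.keras.layers.Dropout(0.1),                     # Dropout Layer 2 (10%)
    tf.keras.layers.Dense(1),                         # FC layer -> forecast
])
meta.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
             loss=tf.keras.losses.Huber())            # Huber loss, as in Section 4.2
```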

3.2. Data Preprocessing

(1) Data dimensionality enhancement technique based on concave–convex transformation.
It should be noted that the HFCM requires multidimensional data input, while the research data have only one dimension; this paper therefore performs nonlinear transformations of the forms $y = x^{a}$ and $y = x^{b}$ ($a, b \geq 2$) on the input data of the ARIMA and HFCM models. An example is shown in Figure 6.
As Figure 6 shows, when $x > 0$ (the first quadrant), $y = x^{a}$ gives the concave curve and $y = x^{b}$ the convex curve. Based on these curve characteristics, the method is defined as a data dimensionality enhancement technique based on concave–convex transformation, and the transformed data meet the multidimensional input requirements of high-order fuzzy cognitive maps. At the same time, to keep the input data consistent and the prediction results accurate, the base learners of the stacking ensemble all make their predictions on the basis of the transformed data.
(2) Data normalization.
In the modeling process, to eliminate the dimensional influence between data, the nonlinearly transformed data are subjected to max–min normalization so that the normalized values lie in $[-1, 1]$, as shown in Equation (16). Subsequent research is conducted on this basis.
$\bar{x} = \frac{2(x - x_{\min})}{x_{\max} - x_{\min}} - 1$ (16)
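Both preprocessing steps are simple elementwise operations; a minimal sketch (with the a = 2, b = 3 exponents used later in Section 4.1, applied to a made-up series) is:

```python
import numpy as np

def enhance(x, a=2, b=3):
    """Lift a 1-D series to three channels: original, x^a, and x^b."""
    return np.column_stack([x, x ** a, x ** b])

def minmax_pm1(x):
    """Max-min normalization of each column into [-1, 1], per Equation (16)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

series = np.array([245.0, 274.3, 294.7, 339.1, 397.9, 467.6, 561.4])  # made-up values
X = minmax_pm1(enhance(series))   # shape (7, 3), each column in [-1, 1]
```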

4. Empirical Analysis

4.1. Data Sources and Preprocessing

Following the overall prediction process of the homoheterogeneous stacking ensemble algorithm in Section 3, the total natural gas consumption of China and of 30 provinces (excluding Xizang) from 2000 to 2019 (unit: 100 million cubic meters) was selected for empirical research (data source: National Bureau of Statistics of the People's Republic of China, https://data.stats.gov.cn/, accessed on 3 September 2024).
Before prediction, the preprocessing steps comprise the data dimensionality enhancement of Figure 6 (with a = 2 and b = 3) and the data normalization of Equation (16).

4.2. Main Parameter Values

(1) First-layer base learner models: ARIMA and HFCM.
Through multiple experiments, the ARIMA model of the original sequence was determined to be ARIMA(2,2,1) with t-distributed innovations; the model for the $y = x^2$ sequence is ARIMA(2,2,2), also with t-distributed innovations; and the model for the $y = x^3$ sequence is ARIMA(2,2,1) with Gaussian innovations.
For the original, $y = x^2$, and $y = x^3$ sequences, the HFCM order in this article is K = 2, 7, 9, 11 (K = 2, 7, 9 when the training length is less than 11). The transfer function is an important component of the HFCM, as it controls the output range. Commonly used transfer functions include the bivalent, trivalent, sigmoid, hyperbolic tangent, and threshold linear functions. Since the max–min normalized values lie between −1 and 1, this article uses the hyperbolic tangent function as the HFCM transfer function for prediction, whose expression is as follows:
$\tanh(\lambda x) = \frac{e^{\lambda x} - e^{-\lambda x}}{e^{\lambda x} + e^{-\lambda x}}$ (17)
where $\lambda \in [0, 1]$ is an adjustable parameter; $\lambda = 1$ is often used.
During prediction, the grid search method is used to optimize the order $K$ and the regularization parameter $\alpha$: the $(K, \alpha)$ pair that minimizes the cross-validation error is selected as optimal and gives the best prediction performance. A sketch of this search appears below.
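One way to sketch that grid search, reusing the hypothetical fit_hfcm/predict_next helpers from the Section 2.2 sketch and scoring each $(K, \alpha)$ pair by rolling one-step forecasts over a validation block:

```python
import itertools
import numpy as np

def grid_search_hfcm(E_train, E_val, Ks=(2, 7, 9, 11),
                     alphas=np.logspace(-4, 0, 9)):
    """Return the (K, alpha) pair with the smallest validation RMSE."""
    best = (None, None, np.inf)
    for K, alpha in itertools.product(Ks, alphas):
        if K >= len(E_train):                 # skip orders longer than the series
            continue
        W = fit_hfcm(E_train, K, alpha)       # from the earlier sketch
        E, preds = E_train.copy(), []
        for _ in range(len(E_val)):           # roll forward one step at a time
            nxt = predict_next(E, W, K)
            preds.append(nxt)
            E = np.vstack([E, nxt])
        err = np.sqrt(np.mean((np.asarray(preds) - E_val) ** 2))
        if err < best[2]:
            best = (K, alpha, err)
    return best
```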
(2) Second-layer meta learner model: the newly designed LSTM network.
The parameters of the second-layer meta learner concern the newly designed LSTM network. Firstly, there is an LSTM layer with 64 hidden units, followed by a dropout layer with a 10% drop probability, i.e., DropoutLayer(0.1); then a second LSTM layer with 64 hidden units is stacked, followed by another DropoutLayer(0.1); and finally a fully connected (FC) layer ensures that the output is the desired prediction result. For clarity, the detailed structure of the proposed LSTM regression network is shown in Table 1.
The detailed network parameters are listed in Table 2, and the loss function is Huber; the training curve for the national natural gas consumption prediction, shown in Figure 7, stabilizes after about 100 iterations.
(3) To enable readers to reproduce the algorithms in this article, the specific hyperparameter settings of ARIMA, GM(1,1), HFCM, STK–DSN–R, LSTM, ARIMA–LSTM, HFCM–LSTM, and the proposed ARIMA–HFCM–LSTM are listed in Table 2.

4.3. Prediction Result Analysis

Based on the natural gas consumption data of the whole country and of 30 provinces (excluding Xizang), combined with the ensemble forecasting process of Section 3, natural gas consumption forecasting based on homoheterogeneous stacking is carried out. Please note: because of missing data for Zhejiang, Anhui, Fujian, Jiangxi, Hunan, Guangxi, Guangdong, Hainan, and Ningxia, data from consecutive years were selected for prediction there. To verify the effectiveness of the proposed homoheterogeneous stacking ensemble algorithm, the prediction results of the homoheterogeneous stacking (ARIMA–HFCM–LSTM) ensemble and the RMSE, MAPE, and R2 of the ARIMA, GM(1,1), HFCM, and LSTM single models and of the ensemble models STK–DSN–R [31], ARIMA–LSTM, and HFCM–LSTM are compared and analyzed, as shown in Table 3, Table 4 and Table 5, respectively. (For brevity, the model names in the following tables omit "stacking".)
From Table 3, a smaller RMSE value indicates a better fit. Except for Shandong, Hubei, and Sichuan (marked in italics), the RMSE of the proposed homoheterogeneous stacking (ARIMA–HFCM–LSTM) ensemble is smaller than that of the four single models and the three other ensemble models, indicating that the applicability rate of the proposed ensemble model reaches 90.3%.
From Table 4, a smaller MAPE value indicates a better fit and better predictive performance. Only in Hebei, Shandong, Hubei, and Sichuan (marked in italics) is the MAPE of the stacking (ARIMA–HFCM–LSTM) ensemble larger than that of the four single models and the three other ensemble models; in the remaining provinces it is the smallest, from which the applicability rate of the proposed model is 87.1%.
Table 5 compares the R2 values of the predicted results; the larger the R2 value, the better the prediction performance. From Table 5, only Heilongjiang, Hubei, Sichuan, and Jiangxi have stacking (ARIMA–HFCM–LSTM) ensemble R2 values, of 0.84, 0.18, 0.28, and 0.85, respectively (marked in italics), that are relatively small compared with the four single and three ensemble prediction models, indicating that the R2-based evaluation for these four provinces is unsatisfactory. This again indicates that the applicability of the proposed model reaches 87.1%.
From the commonalities among the three evaluation indicators above, when the proposed stacking (ARIMA–HFCM–LSTM) ensemble model is used to predict the natural gas consumption of Hubei and Sichuan, the results are not satisfactory. The main reason is that the natural gas consumption data of these two provinces fluctuate significantly compared with other provinces. Besides regional factors, there may also be statistical errors in the data; such errors cannot be avoided and are one of the problems to be addressed in future research. Future work will also focus on multivariate natural gas consumption prediction to avoid the limitations of univariate prediction models. Overall, however, the stacking (ARIMA–HFCM–LSTM) ensemble model proposed in this article has good predictive performance.
To demonstrate the rationality of the choice of the stacking base learner and meta learner models in this article, the average RMSE, MAPE, and R2 values of each model were compared, as shown in Table 6.
Table 6 shows that, among the four single models, the average RMSE and MAPE of ARIMA and HFCM are smaller than those of GM(1,1) and LSTM and their average R2 is larger, which indicates that using them as the stacking base learner models is reasonable. The average RMSE and MAPE of the proposed homoheterogeneous stacking (ARIMA–HFCM–LSTM) model are 7.89 and 0.07, respectively, the smallest values among all compared models, and its average R2 of 0.54 is the highest. The proposed ensemble prediction model therefore has better predictive performance, and to some extent the method may provide a new research approach for predicting univariate time series with stacking ensembles.
To further demonstrate the good predictive performance of the proposed stacking ensemble model, the predicted RMSE, MAPE, and R2 of each comparison model are visualized in Figure 8, Figure 9 and Figure 10.
Figure 8, Figure 9 and Figure 10 show clear differences among the algorithms in the distributions of RMSE, MAPE, and R2 for predicting natural gas consumption nationally and by province. The values for the ARIMA–HFCM–LSTM algorithm are mainly concentrated at an RMSE of $10^9$ m³ or less, a MAPE of 0.1 or less, and an R2 between 0 and 1, while the other algorithms have larger and more scattered values, indicating that the proposed algorithm has the best prediction accuracy and stability.

5. Conclusions

This paper has conducted natural gas consumption prediction via stacking ensemble learning. To extract the potential characteristics of the data, concave and convex transformations are developed. Given the good performance of ARIMA and HFCM in time-series prediction, an ensemble learning framework is used in which multiscale ARIMA and HFCM are the base learners and a well-designed LSTM-based deep regression model is the meta learner. Compared with existing traditional and ensemble methods, the experimental results show that the proposed method performs best on the RMSE metric, is robust to data outliers on the MAPE metric, and is insensitive to data variations. The proposed method is, however, a univariate prediction method and cannot perform long-term forecasting; to improve prediction performance further, more influencing factors, including supply, population, and the economy, should be considered via multivariate time-series prediction in the future.

Author Contributions

Q.W.: methodology, software, writing—original draft, data curation. Z.L. and P.L.: validation, writing—review and editing, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Deng, Y.; Dewil, R.; Appels, L.; Van Tulden, F.; Li, S.; Yang, M.; Baeyens, J. Hydrogen-enriched natural gas in a decarbonization perspective. Fuel 2022, 318, 123680.
2. Available online: https://baijiahao.baidu.com/s?id=1808600980329759410&wfr=spider&for=pc (accessed on 3 September 2024).
3. Zhang, H.; Zhang, R. Natural gas importing and exporting strategy under influence of supply security abroad and domestic infrastructure. Chin. J. Manag. Sci. 2021, 29, 188–198.
4. Lu, D. Forecast about natural gas consumption level in medium-long term of China. Oil Gas Storage Transp. 2002, 21, 1–5.
5. Parikh, J.; Pallav, P.; Pallavi, M. Demand projections of petroleum products and natural gas in India. Energy 2007, 32, 1825–1837.
6. Shahbaz, M.; Lean, H.; Farooq, A. Natural gas consumption and economic growth in Pakistan. Renew. Sustain. Energy Rev. 2013, 18, 87–94.
7. Zhang, W.; Yang, J. Forecasting natural gas consumption in China by Bayesian model averaging. Energy Rep. 2015, 1, 216–220.
8. Lu, Q.; Chai, J.; Zhu, Q.; Xing, L.; Deng, J. Analysis and forecast for natural gas consumption demand. Chin. J. Manag. Sci. 2015, 23, 823–829.
9. Chai, J.; Wang, Y.; KIN, K. Analysis and prediction of natural gas consumption in China under the "New Normal". Oper. Res. Manag. 2019, 28, 175–183.
10. Gao, R.; Wang, W.; Zhou, K.; Zhao, Y.; Yang, C.; Ren, Q. Optimization of a multiphase mixed flow field in backfill slurry preparation based on multiphase flow interaction. ACS Omega 2023, 8, 34698–34709.
11. Yi, J.; Yan, H. Early prediction and warning of international trade risks based on wavelet decomposition and ARIMA-GRU hybrid model. Chin. J. Manag. Sci. 2023, 31, 100–110.
12. Huang, B.; Zheng, H.; Guo, X.; Yang, Y.; Liu, X. A novel model based on DA-RNN network and skip gated recurrent neural network for periodic time series forecasting. Sustainability 2021, 14, 326.
13. Belany, P.; Hrabovsky, P.; Sedivy, S.; Cajova Kantova, N.; Florkova, Z. A comparative analysis of polynomial regression and artificial neural networks for prediction of lighting consumption. Buildings 2024, 14, 1712.
14. Soltanisarvestani, A.; Safavi, A.; Rahimi, M. The detection of unaccounted for gas in residential natural gas customers using particle swarm optimization-based neural networks. Energy Sources Part B Econ. Plan. Policy 2023, 18, 2154412.
15. Hu, Z.; Jiang, T. Innovative grey multivariate prediction model for forecasting Chinese natural gas consumption. Alex. Eng. J. 2024, 103, 384–392.
16. Wu, W.; Ma, X.; Zeng, B.; Zhang, Y. A conformable fractional-order grey Bernoulli model with optimized parameters and its application in forecasting Chongqing's energy consumption. Expert Syst. Appl. 2024, 255, 124534.
17. Li, H.; Zhang, J.; Han, M.; Sun, Y. Urban natural gas consumption prediction with improved model. Urb. Gas 2021, 2021, 42–47.
18. Wang, H.; Chen, Y.; Dong, R. Prediction of China's natural gas consumption based on fractional grey model. J. Henan Univ. Sci. Technol. (Nat. Sci. Ed.) 2022, 50, 77–84.
19. Gao, Y.; Cao, Q.; Xue, F.; Gao, Y.; Xie, Y.; Zhang, N.; Gong, Z. The model for predicting annual production and consumption of natural gas in China. Liaoning Chem. Ind. 2013, 42, 500–502.
20. Cheng, B.; Jie, A.; Zhu, S. Research on natural gas demand forecasting in Fujian Province from the perspective of multiple models. Fujian Trib. 2013, 11, 128–134. Available online: https://d.wanfangdata.com.cn/periodical/fjlt-rwshkxb201311023 (accessed on 3 September 2024).
21. Shashank, N.S.; Mani, S.S. Artificial neural networks-based ignition delay time prediction for natural gas blends. Comb. Sci. Technol. 2023, 195, 3248–3261.
22. Tong, Y.; Chen, H.; Zhang, X.; Wu, L. Forecast of Beijing natural gas consumption based on FGM(1,1) model. Math. Pract. Theory 2020, 50, 79–83. Available online: https://d.wanfangdata.com.cn/periodical/sxdsjyrs202003009 (accessed on 3 September 2024).
23. Ma, X.; Deng, Y.; Ma, M. A novel kernel ridge grey system model with generalized Morlet wavelet and its application in forecasting natural gas production and consumption. Energy 2024, 287, 129630.
24. Zhu, M.; Wu, Q.; Wang, Y. Forecasting the gas consumption based on residual auto-regression model and Kalman filtering algorithm. J. Resour. Ecol. 2019, 10, 546–552.
25. Ci, T.; Liao, Z.; Ren, M.; Liang, Y.; Wu, Z. A new energy generation power prediction model based on VMD stacking ensemble learning. Electr. Power Sci. Eng. 2024. Available online: https://link.cnki.net/urlid/13.1328.tk.20240903.1644.011 (accessed on 3 September 2024).
26. Song, Z.; Zhou, Z.; Yu, L.; Ren, T. Portfolio optimization strategy with a hybrid ensemble forecasting algorithm and Black-Litterman model. Chin. J. Manag. Sci. 2024, 1–12.
27. Zhou, K.; Gao, X.; Li, L. Integrated carbon emission trading price prediction based on EMD-XGB-ELM and FSGM from the perspective of dual processing. Chin. J. Manag. Sci. 2024, 1–11.
28. Wang, B.; Wang, J. Energy futures and spots prices forecasting by hybrid SW-GRU with EMD and error evaluation. Energy Econ. 2020, 90, 104827.
29. Zhong, M.; Hang, L.; Peng, C.; Qi, B.; Zhi, H.; Wen, Y. An efficient isomorphic CNN-based prediction and decision framework for financial time series. Intell. Data Anal. 2022, 26, 893–909.
30. Chai, J.; Zhao, Y.; Liang, T.; Wei, Z. Dynamic analysis and prediction of industry heterogeneity in gas consumption of non-residential users in China: Based on a new comprehensive integrated analysis and prediction model framework. Syst. Sci. Math. 2022, 42, 318–336.
31. Nti, I.K.; Adekoya, A.F.; Weyori, B.A. A comprehensive evaluation of ensemble learning for stock-market prediction. J. Big Data 2020, 7, 20.
32. Xu, X.; Yu, L.; Lin, Z.; Su, Y. Short term sales combination prediction of fresh products based on feature fusion. J. Manag. Sci. 2022, 25, 102–123.
33. Kosko, B. Fuzzy cognitive map. Int. J. Man-Mach. Stud. 1986, 24, 65–75.
34. Stach, W.; Kurgan, L.; Pedrycz, W. Higher-order fuzzy cognitive maps. In Proceedings of the NAFIPS 2006—2006 Annual Meeting of the North American Fuzzy Information Processing Society, Montreal, QC, Canada, 3–6 June 2006; pp. 166–171.
35. Bo, F.; Chi, G.; Wen, W. Enterprise default prediction model based on LSTM and multi-head attention mechanism. J. Manag. Eng. 2024, 38, 213–226.
36. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
Figure 1. Example of a fuzzy cognitive graph with three nodes.
Figure 2. Schematic diagram of LSTM unit cell structure.
Figure 3. Stacking ensemble learning framework.
Figure 4. Dropout example (the network on the right discards neurons filled with blue with a probability of 0.1).
Figure 5. Prediction process based on the homoheterogeneous stacking ensemble algorithm.
Figure 6. $y = x^a$ and $y = x^b$ function curves.
Figure 7. Model training convergence diagram.
Figure 8. Distributions of the algorithm prediction RMSE.
Figure 9. Distributions of the algorithm prediction MAPE.
Figure 10. Distributions of the algorithm prediction R2.
Table 1. Structure of the proposed LSTM regression network.

Name | Type | Activation | Learnable Parameter Size | State Size
sequenceinput (sequence input: 12 dimensions) | Sequence input | 12 (C) × 1 (B) × 1 (T) | _ | _
lstm_1 (LSTM: 64 hidden units) | LSTM | 64 (C) × 1 (B) × 1 (T) | InputWeights 256 …; RecurrentWeights 256 …; Bias 256 … | HiddenState 64 × 1; CellState 64 × 1
dropout_1 (10% discard) | Dropout | 64 (C) × 1 (B) × 1 (T) | _ | _
lstm_2 (LSTM: 64 hidden units) | LSTM | 64 (C) × 1 (B) × 1 (T) | InputWeights 256 …; RecurrentWeights 256 …; Bias 256 … | HiddenState 64 × 1; CellState 64 × 1
dropout_2 (10% discard) | Dropout | 64 (C) × 1 (B) × 1 (T) | _ | _
fc_3 (fully connected layer) | Fully connected | 3 (C) × 1 (B) × 1 (T) | Weights 3 × 64; Bias 3 × 1 | _
Table 2. Algorithm parameter settings.

Algorithm | Parameters
ARIMA | p = 2, d = 2, q = 1
GM(1,1) | _
HFCM | a = 2, b = 3; $K \in [2, 11]$, $K \in \mathbb{Z}^{+}$; $\alpha \in [10^{-4}, 1]$, $\alpha \in \mathbb{R}^{+}$
STK–DSN–R | Decision tree (DT): criterion = entropy, max_depth = 4. SVM: radial basis function (RBF) kernel, regularization C = 100. MLP: three hidden layers (HL), HL1 and HL2 (with 5 nodes) and HL3 (with 10 nodes), maximum iterations = 5000, optimiser = Adam, activation = logistic.
LSTM | MaxEpochs = 300, optimiser = Adam, LearnRateSchedule = "piecewise", InitialLearnRate = 0.01, LearnRateDropPeriod = 50, LearnRateDropFactor = 0.2
ARIMA–LSTM | ARIMA: $p_1 = 2$, $d_1 = 2$, $q_1 = 2$; $p_2 = 2$, $d_2 = 2$, $q_2 = 1$; $p_3 = 2$, $d_3 = 2$, $q_3 = 2$. LSTM: MaxEpochs = 300, optimiser = Adam, LearnRateSchedule = "piecewise", InitialLearnRate = 0.01, LearnRateDropPeriod = 50, LearnRateDropFactor = 0.2
HFCM–LSTM | HFCM: a = 2, b = 3; $K = 2, 7, 9, 11$, $K \in \mathbb{Z}^{+}$; $\alpha \in [10^{-4}, 1]$, $\alpha \in \mathbb{R}^{+}$. LSTM: MaxEpochs = 300, optimiser = Adam, LearnRateSchedule = "piecewise", InitialLearnRate = 0.01, LearnRateDropPeriod = 50, LearnRateDropFactor = 0.2
ARIMA–HFCM–LSTM | ARIMA: $p_1 = 2$, $d_1 = 2$, $q_1 = 2$; $p_2 = 2$, $d_2 = 2$, $q_2 = 1$; $p_3 = 2$, $d_3 = 2$, $q_3 = 2$. HFCM: a = 2, b = 3; $K = 2, 7, 9, 11$, $K \in \mathbb{Z}^{+}$; $\alpha \in [10^{-4}, 1]$, $\alpha \in \mathbb{R}^{+}$. LSTM: MaxEpochs = 300, optimiser = Adam, LearnRateSchedule = "piecewise", InitialLearnRate = 0.01, LearnRateDropPeriod = 50, LearnRateDropFactor = 0.2
Table 3. Comparison with RMSE of predicted results.

Region | ARIMA | GM(1,1) | HFCM | LSTM | STK–DSN–R | ARIMA–LSTM | HFCM–LSTM | ARIMA–HFCM–LSTM
National | 360.79 | 605.82 | 145.37 | 278.34 | 300.05 | 359.29 | 184.48 | 73.85
Beijing | 11.20 | 34.72 | 77.60 | 64.52 | 14.45 | 19.31 | 10.15 | 9.43
Tianjin | 14.41 | 14.68 | 60.27 | 26.67 | 12.90 | 4.66 | 17.38 | 3.08
Hebei | 27.54 | 10.46 | 6.36 | 60.32 | 12.03 | 28.17 | 32.59 | 5.71
Shanxi | 13.78 | 136.63 | 27.35 | 49.79 | 15.34 | 4.63 | 18.37 | 4.60
Inner Mongolia | 23.39 | 29.21 | 11.63 | 74.65 | 10.48 | 19.61 | 16.39 | 7.97
Liaoning | 221.11 | 37.22 | 49.89 | 13.04 | 25.77 | 17.44 | 11.63 | 4.42
Jilin | 3.20 | 16.40 | 4.94 | 22.69 | 4.47 | 9.19 | 5.18 | 3.10
Heilongjiang | 0.91 | 0.85 | 1.78 | 3.26 | 6.17 | 4.71 | 3.30 | 0.80
Shanghai | 15.09 | 69.56 | 2.85 | 15.68 | 12.44 | 6.93 | 10.50 | 2.66
Jiangsu | 26.58 | 175.28 | 19.74 | 79.32 | 44.35 | 36.79 | 45.85 | 18.45
Shandong | 41.74 | 13.46 | 50.41 | 19.02 | 56.86 | 54.14 | 52.59 | 23.03
Henan | 6.26 | 28.02 | 11.12 | 87.85 | 14.29 | 14.57 | 9.60 | 2.11
Hubei | 3.88 | 53.16 | 5.11 | 7.50 | 8.14 | 11.51 | 16.94 | 9.54
Chongqing | 14.65 | 16.67 | 19.85 | 35.08 | 6.85 | 9.42 | 10.45 | 5.06
Sichuan | 32.91 | 13.30 | 47.72 | 228.65 | 30.00 | 43.19 | 53.03 | 24.72
Guizhou | 7.10 | 16.37 | 20.31 | 21.52 | 11.13 | 13.22 | 5.09 | 4.21
Yunnan | 3.53 | 8.02 | 9.85 | 6.30 | 3.57 | 4.42 | 3.46 | 3.20
Shaanxi | 7.60 | 19.12 | 4.94 | 81.45 | 6.09 | 10.70 | 14.52 | 3.42
Gansu | 1.62 | 9.39 | 1.12 | 30.05 | 1.27 | 1.96 | 4.49 | 0.39
Qinghai | 3.50 | 11.61 | 1.95 | 32.46 | 2.07 | 4.12 | 9.82 | 1.51
Xinjiang | 52.40 | 80.82 | 10.96 | 12.39 | 14.14 | 21.73 | 32.22 | 3.02
Zhejiang | 10.80 | 52.24 | 24.34 | 95.73 | 30.03 | 40.41 | 14.08 | 7.12
Anhui | 1.13 | 43.24 | 2.98 | 9.31 | 5.33 | 7.08 | 8.95 | 1.09
Fujian | 1.50 | 104.43 | 2.00 | 65.66 | 3.87 | 2.60 | 4.60 | 0.48
Jiangxi | 6.31 | 43.04 | 1.54 | 5.39 | 1.78 | 1.73 | 0.87 | 0.67
Hunan | 7.10 | 22.21 | 5.73 | 23.59 | 2.27 | 3.96 | 1.79 | 1.17
Guangdong | 9.57 | 8.77 | 13.63 | 17.36 | 8.62 | 8.15 | 3.34 | 1.69
Guangxi | 9.82 | 102.97 | 5.02 | 51.52 | 17.80 | 11.84 | 35.51 | 2.17
Hainan | 13.19 | 20.56 | 4.74 | 7.73 | 3.63 | 2.19 | 9.28 | 2.08
Ningxia | 3.52 | 7.42 | 1.45 | 12.31 | 2.52 | 2.05 | 2.65 | 1.40
Table 4. Comparison with MAPE of predicted results.

Region | ARIMA | GM(1,1) | HFCM | LSTM | STK–DSN–R | ARIMA–LSTM | HFCM–LSTM | ARIMA–HFCM–LSTM
National | 0.11 | 0.23 | 0.05 | 0.09 | 0.37 | 0.11 | 0.06 | 0.02
Beijing | 0.04 | 0.15 | 0.38 | 0.36 | 0.08 | 0.09 | 0.05 | 0.03
Tianjin | 0.14 | 0.16 | 0.51 | 0.24 | 0.14 | 0.04 | 0.16 | 0.03
Hebei | 0.18 | 0.10 | 0.06 | 0.35 | 0.10 | 0.17 | 0.19 | 0.09
Shanxi | 0.16 | 1.61 | 0.31 | 0.62 | 0.19 | 0.04 | 0.19 | 0.02
Inner Mongolia | 0.36 | 0.52 | 0.17 | 1.27 | 0.15 | 0.31 | 0.25 | 0.12
Liaoning | 2.77 | 0.55 | 0.71 | 0.17 | 0.38 | 0.27 | 0.16 | 0.05
Jilin | 0.10 | 0.61 | 0.14 | 0.77 | 0.16 | 0.28 | 0.15 | 0.08
Heilongjiang | 0.02 | 0.02 | 0.04 | 0.07 | 0.14 | 0.10 | 0.08 | 0.01
Shanghai | 0.16 | 0.74 | 0.03 | 0.18 | 0.13 | 0.06 | 0.11 | 0.02
Jiangsu | 0.11 | 0.69 | 0.08 | 0.34 | 0.15 | 0.12 | 0.15 | 0.06
Shandong | 0.23 | 0.10 | 0.28 | 0.12 | 0.31 | 0.30 | 0.29 | 0.14
Henan | 0.05 | 0.24 | 0.10 | 0.86 | 0.13 | 0.14 | 0.09 | 0.01
Hubei | 0.07 | 0.92 | 0.09 | 0.14 | 0.12 | 0.16 | 0.22 | 0.17
Chongqing | 0.14 | 0.16 | 0.19 | 0.36 | 0.07 | 0.09 | 0.10 | 0.06
Sichuan | 0.12 | 0.06 | 0.18 | 1.03 | 0.10 | 0.16 | 0.20 | 0.11
Guizhou | 0.18 | 0.53 | 0.73 | 0.74 | 0.30 | 0.29 | 0.22 | 0.17
Yunnan | 0.20 | 0.59 | 0.80 | 0.45 | 0.24 | 0.30 | 0.24 | 0.18
Shaanxi | 0.07 | 0.15 | 0.05 | 0.73 | 0.06 | 0.10 | 0.14 | 0.03
Gansu | 0.05 | 0.30 | 0.03 | 1.02 | 0.03 | 0.06 | 0.12 | 0.01
Qinghai | 0.07 | 0.21 | 0.03 | 0.65 | 0.03 | 0.07 | 0.19 | 0.02
Xinjiang | 0.35 | 0.59 | 0.08 | 0.09 | 0.09 | 0.16 | 0.24 | 0.02
Zhejiang | 0.08 | 0.41 | 0.20 | 0.72 | 0.22 | 0.27 | 0.08 | 0.06
Anhui | 0.02 | 0.81 | 0.06 | 0.13 | 0.09 | 0.11 | 0.13 | 0.01
Fujian | 0.03 | 1.92 | 0.03 | 1.22 | 0.07 | 0.04 | 0.08 | 0.01
Jiangxi | 0.24 | 1.61 | 0.05 | 0.22 | 0.05 | 0.06 | 0.03 | 0.04
Hunan | 0.22 | 0.68 | 0.19 | 0.79 | 0.06 | 0.12 | 0.05 | 0.03
Guangdong | 0.42 | 0.43 | 0.78 | 0.82 | 0.28 | 0.24 | 0.17 | 0.09
Guangxi | 0.05 | 0.50 | 0.02 | 0.28 | 0.09 | 0.06 | 0.18 | 0.01
Hainan | 0.29 | 0.46 | 0.10 | 0.17 | 0.07 | 0.04 | 0.19 | 0.02
Ningxia | 0.15 | 0.30 | 0.05 | 0.50 | 0.09 | 0.08 | 0.10 | 0.03
Table 5. Comparison with R2 of predicted results.

Region | ARIMA | GM(1,1) | HFCM | LSTM | STK–DSN–R | ARIMA–LSTM | HFCM–LSTM | ARIMA–HFCM–LSTM
National | 0.09 | −1.56 | 0.85 | 0.46 | 0.10 | 0.10 | 0.76 | 0.96
Beijing | 0.22 | −6.54 | −36.70 | −25.06 | −0.31 | −1.33 | 0.35 | 0.44
Tianjin | 0.04 | 0.01 | −15.76 | −2.28 | 0.23 | 0.90 | −0.39 | 0.93
Hebei | 0.41 | 0.92 | 0.97 | −1.81 | 0.89 | 0.39 | 0.18 | 0.98
Shanxi | −1.71 | −265.94 | −9.70 | −34.44 | −2.36 | 0.69 | −3.83 | 0.70
Inner Mongolia | −8.20 | −13.35 | −1.28 | −92.74 | −0.85 | −5.47 | −3.52 | −0.07
Liaoning | −511.21 | −13.52 | −25.07 | −0.78 | −5.96 | −2.19 | −0.42 | 0.80
Jilin | 0.21 | −19.78 | −0.88 | −38.79 | −0.54 | −5.52 | −1.07 | 0.30
Heilongjiang | 0.88 | 0.89 | 0.52 | −0.61 | −4.77 | −2.36 | −0.65 | 0.84
Shanghai | −2.50 | −73.25 | 0.88 | −2.77 | −1.38 | 0.26 | −0.69 | 0.89
Jiangsu | 0.65 | −14.18 | 0.81 | −2.11 | 0.03 | 0.33 | −0.04 | 0.89
Shandong | −0.55 | 0.84 | −1.26 | 0.68 | −1.88 | −1.61 | −1.46 | 0.85
Henan | 0.00 | −18.96 | −2.14 | −195.25 | −4.19 | −4.40 | −1.34 | 0.89
Hubei | 0.86 | −24.32 | 0.77 | 0.50 | 0.41 | −0.19 | −1.57 | 0.18
Chongqing | −6.76 | −9.05 | −13.25 | −43.47 | −0.70 | −2.21 | −2.95 | −0.30
Sichuan | −0.28 | 0.79 | −1.70 | −60.94 | −0.07 | −1.21 | −2.33 | 0.28
Guizhou | 0.29 | −2.75 | −4.78 | −5.48 | −0.73 | −1.45 | 0.64 | 0.46
Yunnan | −0.12 | −4.80 | −7.74 | −2.57 | −0.15 | −0.76 | −0.08 | 0.08
Shaanxi | −1.12 | −12.39 | 0.11 | −242.11 | −0.36 | −3.20 | −6.72 | 0.57
Gansu | 0.56 | −13.68 | 0.79 | −149.43 | 0.73 | 0.36 | −2.35 | 0.97
Qinghai | −1.43 | −25.77 | 0.25 | −208.23 | 0.15 | −2.38 | −18.16 | 0.55
Xinjiang | −332.34 | −792.11 | −13.58 | −17.64 | −23.26 | −56.33 | −125.04 | −0.11
Zhejiang | 0.79 | −3.91 | −0.07 | −15.51 | −0.62 | −1.94 | 0.64 | 0.91
Anhui | 0.98 | −29.26 | 0.86 | −0.40 | 0.54 | 0.19 | −0.30 | 0.99
Fujian | 0.15 | −4115.16 | −0.51 | −1626.52 | −4.66 | −1.56 | −6.97 | 0.91
Jiangxi | −3.84 | −224.44 | 0.71 | −2.54 | 0.61 | 0.64 | 0.91 | 0.85
Hunan | −9.02 | −97.01 | −5.53 | −109.58 | −0.02 | −2.12 | 0.36 | 0.73
Guangdong | −1.39 | −1.00 | −3.84 | −6.85 | −0.93 | −0.73 | 0.71 | 0.93
Guangxi | 0.50 | −53.99 | 0.87 | −12.77 | −0.64 | 0.27 | −5.54 | 0.98
Hainan | −56.23 | −137.93 | −6.39 | −18.63 | −3.34 | −0.58 | −27.32 | −0.47
Ningxia | −3.82 | −20.37 | 0.18 | −57.72 | −1.46 | −0.63 | −1.73 | 0.19
Table 6. Average RMSE, MAPE, and R2 of each model.

Model | Average RMSE | Average MAPE | Average R2 | Role
ARIMA | 30.84 | 0.23 | −30.12 | Base learner model
GM(1,1) | 58.25 | 0.53 | −193.28 | Not selected
HFCM | 21.05 | 0.21 | −4.57 | Base learner model
LSTM | 49.65 | 0.50 | −95.98 | Meta learner model
STK–DSN–R | 22.22 | 0.15 | −1.79 | Other integrated models
ARIMA–LSTM | 25.15 | 0.14 | −3.03 | Homogeneity
HFCM–LSTM | 20.94 | 0.15 | −6.77 | Homogeneity
ARIMA–HFCM–LSTM | 7.89 | 0.07 | 0.54 | Homoheterogeneity
