Article

Natural Gas Demand Forecasting Model Based on LASSO and Polynomial Models and Its Application: A Case Study of China

1
National & Local Joint Engineering Research Center of Harbor Oil & Gas Storage and Transportation Technology, Zhejiang Key Laboratory of Petrochemical Environmental Pollution Control, School of Petrochemical Engineering & Environment, Zhejiang Ocean University, Zhoushan 316022, China
2
College of Information and Engineering, Zhejiang Ocean University, Zhoushan 316022, China
3
National Engineering Laboratory for Pipeline Safety/MOE Key Laboratory of Petroleum Engineering/Beijing Key Laboratory of Urban Oil and Gas Distribution Technology, China University of Petroleum-Beijing, Fuxue Road No. 18, Changping District, Beijing 102249, China
*
Authors to whom correspondence should be addressed.
Energies 2023, 16(11), 4268; https://doi.org/10.3390/en16114268
Submission received: 15 April 2023 / Revised: 7 May 2023 / Accepted: 19 May 2023 / Published: 23 May 2023

Abstract

China aims to reduce carbon dioxide emissions and achieve its peak carbon and carbon neutrality goals. Natural gas, as a high-quality fossil fuel, is an important transition resource for China in the process of carbon reduction, so it is necessary to predict China's natural gas demand. In this paper, a novel combination forecasting model is constructed to accurately predict future natural gas demand. The Lasso model and the polynomial model are combined to overcome the shortcomings of traditional models, namely low data dimensions and poor prediction ability. In the modeling process, the cross-validation method is used to adjust the modeling parameters. A comparison of the combinatorial forecasting model with the single forecasting models and other commonly used forecasting models shows that the error of the combinatorial forecasting model (2.99%) is the smallest, which verifies its high accuracy and good stability. Finally, the paper analyzes the relevant data from 1999 to 2022 and predicts China's natural gas demand in the next 10 years. The results show that the annual growth rate of China's natural gas demand in the next 10 years will peak at 13.33%, with demand reaching 8.3 × 10¹¹ m³ in 2033, which indicates that China urgently needs to develop the supply capacity of its gas supply enterprises. This study integrates the impact of multiple factors on natural gas demand, predicts China's natural gas demand from 2023 to 2033, and provides decision-making support for China's energy structure adjustment and natural gas import trade.

1. Introduction

China is currently the world's largest consumer of energy and the largest emitter of carbon dioxide [1]. Compared to the world's leading economies, China's energy production and consumption mix has a higher share of coal, reaching 56.8% in 2020 [2]. The massive consumption of coal and oil has caused serious environmental pollution and has led China to commit to peaking carbon emissions by 2030 and reaching carbon neutrality by 2060 [3]. Therefore, there is a need to strengthen the development of a low carbon economy in the future and to strongly advocate for clean energy. As an efficient source of energy, natural gas is recognized as relatively low carbon compared to other fossil fuels such as coal and oil. Accurate forecasting of the natural gas demand is fundamental to achieving a reliable supply of natural gas and is essential to national energy policy, to planning future natural gas supply and demand in China's infrastructure development, and to distribution management and other issues [4]. Therefore, it is important to adopt suitable mathematical methods and forecasting models for natural gas demand forecasting in order to ensure the security of the energy supply in China.
In recent years, more and more scholars have focused on different forecasting methods. Single prediction methods mainly include the gray model, the regression model, the SVM model, etc. Shaikh et al. [5] used optimized nonlinear gray models (the gray Verhulst model and the nonlinear gray Bernoulli model) to construct a natural gas consumption prediction model for China. Bianco et al. [6] used a regression model to predict Italian non-residential natural gas consumption, and embedded the prediction in a complex market planning model by determining a simple consumption pattern. Zhu et al. [7] estimated the short-term gas demand in the UK by support vector regression (SVR) techniques, and the results showed that the model outperformed SVRLP, autoregressive moving average (ARMA) and artificial neural network (ANN) methods. Jia Ding et al. [8] proposed a new natural gas consumption prediction method: the double convolutional seasonal decomposition network. The results show that the method can be used to analyze the natural gas demand in different seasons, and the accuracy of the model in predicting urban natural gas demand was verified by simulation experiments. Gao et al. [9] proposed an objective prediction method based on a polynomial logistic regression (PLR) model, in which the predictors are objectively selected from the perspective of machine learning. The results show that, once the effects of collinearity and the number of predictors are considered, the prediction accuracy is comparable to that of traditional estimates.
Although the above single prediction method is simple and fast, it is unable to achieve a balance between subjectivity and objectivity, and the stability of the model is poor. In 1969, the concept of model combination was proposed by Bates et al. [10], and the combination prediction method was used to predict the natural gas demand. The method is simple and effective and has a reasonable distribution ratio and a high prediction accuracy. Thus, the combination of models based on weight allocation is widely used in various forecasting industries and is one of the most commonly used combination methods. Min J et al. [11] proposed a combined model for the intelligent prediction of natural gas demand in the heating season based on EMD and BP neural network algorithms (EMD_BP). Compared with a prediction model using only the BP neural network algorithm, the combination model has better applicability and wide prospects. However, the existing combinatorial model does not consider the important features of multicollinearity in modeling, resulting in overfitting of the model. The Lasso model can solve the problem of multicollinearity in the model data by considering the above factors, and is a regression method for parameter estimation and variable selection at the same time [12].
In different natural gas demand forecasting methods, the accuracy and stability of the model are affected by the number of factors and the prediction range, and the multicollinearity problem must also be excluded. The Lasso algorithm model was first proposed by Robert Tibshirani [13] in 1996. By shrinking some coefficients to exactly 0, the Lasso model automatically selects features and excludes weakly correlated ones, which improves the interpretability and generalization performance of the model and reduces its redundancy. Daalalyan et al. [14] studied the predictive performance of Lasso and showed that Lasso provides accurate predictions even when the covariates are highly correlated. Özmen et al. [15] used a LASSO analysis to generate gas demand forecasts for residential users for distribution system operators that require short- and long-term forecasts. Selecting a subset of explanatory variables by LASSO not only minimizes the estimation error of the quantitative dependent variable, but also improves the interpretability and robustness of the model. The results show that the LASSO model outperforms simple multiple linear regression (LR). Therefore, considering the accuracy of the prediction results and the stability of the model, this paper uses the Lasso model, which can automatically detect and exclude multicollinearity, as a single prediction model.
In summary, there is still a lack of a natural gas demand forecasting method that can balance the subjectivity and objectivity of the model and effectively judge and prevent the overfitting of the model. The purpose of this paper is to explain the requirements of model selection from the perspectives of subjectivity and objectivity, and to mine the highly subjective Lasso model and the polynomial model with strong objectivity to verify the modeling ideas. A combinatorial prediction model with good performance and strong explanatory performance is constructed; then, the established model is used to predict and analyze China’s natural gas demand to help formulate policies in the relevant energy sector. The model can process a large amount of data well, improve the accuracy of the model’s forecasts, more accurately predict China’s natural gas demand and predict and analyze China’s natural gas demand in the next decade.
There are two main innovation points of the model constructed in this paper:
(1) A new modeling idea based on model subjectivity and objectivity is proposed, which ensures the interpretability of the model while ensuring a high prediction accuracy.
(2) This paper combines the subjectivity of the Lasso model and the objectivity of the polynomial model for the combinatorial prediction model and changes the strength of the subjectivity of the combinatorial predictive model by adjusting the weight.
The structure of this article is as follows. Section 2 describes the methods for forecasting the natural gas demand, including Lasso modeling. Section 3 presents a detailed case study and discusses the prediction results, and Section 4 summarizes the conclusions.

2. Methodology

In this paper, a natural gas demand forecasting model based on the Lasso model and a polynomial model is proposed. Firstly, the concise Lasso model with strong subjectivity and the objective polynomial model are selected from the subjective and objective models as single prediction models, and the combined forecasting model is constructed by the weighted average method as the combination method. The model’s performance is then evaluated through model performance parameters.
The Lasso algorithm model and polynomial model are used as single models, and the weighted average method is used as a combinatorial method to construct a combinatorial model. As shown in Figure 1, the model application process is mainly divided into five steps.
Step 1: Data Collection. Determine the influencing factors of natural gas demand, including population, GDP, industrial structure, environmental protection mechanism, energy consumption intensity and energy consumption structure. Select reasonable influencing factors and collect the relevant data to establish an impact set; establish a target set from the natural gas demand data and a time set from the corresponding years.
Step 2: Build a single prediction model. Select the appropriate time set and target set and use the least squares method for second-order polynomial fitting. The influence set and target set are introduced into the Lasso model, the data are split by the cross-validation method and the parameter t is continuously adjusted by the evaluation set until the fitting effect and the prediction effect meet the accuracy requirements.
Step 3: Combined models. Use the weighted average method to linearly weight the objective model and the subjective model to build a combined prediction model. The linear programming method is used to determine the objective weight of the polynomial fitting model.
Step 4: Model evaluation. There are two aspects to evaluate the overall model performance: model training error and accuracy and the generalization error of the evaluation and test sets. Typical values for parameter identification of relevant evaluation standards refer to relevant industry standards. In this paper, the gray prediction model [16], ARIMA model [17], SVR regression model [18], single prediction model and combined prediction model are selected for a comparative analysis to evaluate the predictive performance of the model.
Step 5: Predictive analytics. Based on the influence set forecast, the time series model and gray prediction model are used to predict and analyze China’s natural gas demand from 2023 to 2033.

2.1. Lasso Model

The Lasso algorithm is a shrinkage estimation method that constructs a model through a penalty function. After model training is complete, the variables whose coefficients are equal to 0 are discarded, which identifies and removes the influence of collinear data, makes the prediction model simpler and effectively prevents the model from overfitting. The detailed modeling procedure is described below:
Target set:
$$Y = \left( y(1), y(2), y(3), \ldots, y(n) \right) \tag{1}$$
Impact set:
$$\begin{aligned} X_1 &= \left( x_1(1), x_1(2), x_1(3), \ldots, x_1(n) \right) \\ X_2 &= \left( x_2(1), x_2(2), x_2(3), \ldots, x_2(n) \right) \\ &\;\;\vdots \\ X_p &= \left( x_p(1), x_p(2), x_p(3), \ldots, x_p(n) \right) \end{aligned} \tag{2}$$
where $Y$ is the series of observations of the dependent variable at different times (the model usually has only one dependent variable), and $X_j$ is the series of the j-th influencing factor observed at the same times as $Y$; the model can have multiple independent variables.
Step 1: Build a multivariate linear model and set the regression coefficient of the influencing factor $X_j$ to $B_j$. The initial fitted model is shown in Equation (3).
$$y = \sum_{j=1}^{p} X_j B_j + b \tag{3}$$
where $y$ is the fitted value of the target, $B_j$ is the regression coefficient of the influencing factor $X_j$ and $b$ is the intercept of the regression model, which is a fixed value.
Step 2: According to the residual sum of squares minimization method, the objective explanatory function of the model is established, where Y is the observation value of the target set and its mathematical expression is shown in Equation (4).
$$B^{\mathrm{LASSO}} = \arg\min_{B} \left\{ \left\| Y - \sum_{j=1}^{p} X_j B_j - b \right\|^2 \right\} \tag{4}$$
Step 3: According to the Lasso algorithm, the compression constraint under the regression coefficient is established. The sum of the absolute values of the regression coefficient Bj is less than a constant, which yields some regression coefficients that are strictly equal to 0. The constraints are shown in Equation (5).
$$\text{s.t.} \quad \sum_{j=1}^{p} \left| B_j \right| \le t \tag{5}$$
where t is the adjustment parameter and the appropriate value is selected by subjective adjustment.
Shrinkage of the regression coefficients as a whole can be achieved by controlling the tuning parameter $t$. The value of $t$ can be estimated using the cross-validation method proposed by Efron and Tibshirani [19]. Minimizing Equation (4) subject to the constraint in Equation (5) is equivalent to the penalized least squares problem in Equation (6).
$$B^{\mathrm{LASSO}} = \arg\min_{B} \left\{ \left\| Y - \sum_{j=1}^{p} X_j B_j - b \right\|^2 + a \sum_{j=1}^{p} \left| B_j \right| \right\} \tag{6}$$
where $a$ is the penalty parameter; $a$ and $t$ can be converted to each other. The main advantage of the Lasso method is that it shrinks variables with large parameter estimates less, while variables with small parameter estimates are shrunk to 0.
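The penalized form in Equation (6) can be illustrated with a small coordinate descent sketch. This is not the SPSS procedure used in this paper; it is a minimal, dependency-free illustration under stated assumptions: the function names `soft_threshold` and `lasso_fit` and the toy data are hypothetical, and the predictors are centered so that the intercept b can be recovered after the coefficients are found.

```python
def soft_threshold(rho, thresh):
    """Shrink rho toward zero by thresh; values within [-thresh, thresh] become 0."""
    if rho > thresh:
        return rho - thresh
    if rho < -thresh:
        return rho + thresh
    return 0.0


def lasso_fit(columns, y, a, sweeps=500):
    """Coordinate descent for min ||y - sum_j X_j B_j - b||^2 + a * sum_j |B_j|.

    columns: list of p predictor series X_j, each of length n; y: target series.
    Returns (B, b). Hypothetical helper, written only for illustration.
    """
    n, p = len(y), len(columns)
    y_mean = sum(y) / n
    x_means = [sum(col) / n for col in columns]
    # Centering removes the (unpenalized) intercept from the problem.
    yc = [v - y_mean for v in y]
    xc = [[v - m for v in col] for col, m in zip(columns, x_means)]
    B = [0.0] * p
    residual = yc[:]  # residual = yc - sum_j xc[j] * B[j]
    for _ in range(sweeps):
        for j in range(p):
            z = sum(v * v for v in xc[j])
            if z == 0.0:
                continue  # constant predictor carries no information
            # Inner product of X_j with the residual that excludes X_j's own term.
            rho = sum(x * r for x, r in zip(xc[j], residual)) + z * B[j]
            new_bj = soft_threshold(rho, a / 2.0) / z
            delta = new_bj - B[j]
            if delta != 0.0:
                residual = [r - x * delta for r, x in zip(residual, xc[j])]
                B[j] = new_bj
    b = y_mean - sum(m * bj for m, bj in zip(x_means, B))
    return B, b


# Hypothetical toy data: x1 drives the target, x2 has only a weak effect.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [2.0 * u + 0.05 * v + 1.0 for u, v in zip(x1, x2)]

B_ols, b_ols = lasso_fit([x1, x2], y, a=0.0)  # no penalty: ordinary least squares
B_pen, b_pen = lasso_fit([x1, x2], y, a=1.0)  # penalty drops the weak factor
print(B_ols, B_pen)
```

With a = 0 the sketch reduces to ordinary least squares; with a = 1 the coefficient of the weak factor becomes exactly 0 while the strong factor is only slightly shrunk, which is the selection behavior described above.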

2.2. Polynomial Model

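Target set:
$$Y = \left( y(1), y(2), y(3), \ldots, y(n) \right) \tag{7}$$
Time set:
$$T = \left( t(1), t(2), t(3), \ldots, t(n) \right) \tag{8}$$
where $T$ is the set of observation times for $Y$, which here is the independent variable of the function. Let $\alpha$ be the coefficients and $k$ the degree of the polynomial; the polynomial to be fitted is shown in Equation (9):
$$y = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 x^4 + \cdots + \alpha_k x^k \tag{9}$$
where $y$ represents the fitted value, $\alpha_0, \ldots, \alpha_k$ represent the coefficients and $x$ represents the independent variable.
Applying the least squares method to the $n$ observations $(x_i, y_i)$ and writing the resulting equations in matrix form gives Equation (10).
$$\begin{bmatrix} n & \sum_{i=1}^{n} x_i & \cdots & \sum_{i=1}^{n} x_i^{k} \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^{2} & \cdots & \sum_{i=1}^{n} x_i^{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_i^{k} & \sum_{i=1}^{n} x_i^{k+1} & \cdots & \sum_{i=1}^{n} x_i^{2k} \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \vdots \\ \sum_{i=1}^{n} x_i^{k} y_i \end{bmatrix} \tag{10}$$
The linear system ($XA = Y$) is solved with linear algebra to obtain the coefficient vector $A = (\alpha_0, \alpha_1, \ldots, \alpha_k)^{T}$ and hence the fitting function.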
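The normal equations in Equation (10) can be assembled and solved directly. The sketch below is illustrative only: `solve_linear` and `polyfit_normal` are hypothetical helper names, the data are synthetic, and the time variable is assumed to be a small shifted index (for example, year minus a base year) so that the power sums stay well conditioned.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def polyfit_normal(xs, ys, k):
    """Least-squares polynomial of degree k via the normal equations.

    Entry (r, c) of the matrix is sum_i x_i^(r + c); entry r of the
    right-hand side is sum_i x_i^r * y_i. Returns [alpha_0, ..., alpha_k].
    """
    A = [[sum(x ** (r + c) for x in xs) for c in range(k + 1)]
         for r in range(k + 1)]
    rhs = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(k + 1)]
    return solve_linear(A, rhs)


# Synthetic example: points generated from y = 3 + 2x + 0.5x^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0 + 2.0 * x + 0.5 * x * x for x in xs]
alphas = polyfit_normal(xs, ys, k=2)
print(alphas)
```

For higher degrees or raw calendar years the normal equations become ill conditioned, which is one practical reason to shift the time variable and keep the degree low.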

2.3. Combined Model Weight Optimization Principle

After modeling with the single models, the model composition method is used to combine single models into combined models. Its combination methods include the arithmetic mean method, the geometric mean method, the weighted average method, etc., among which the combination effect of the weighted average method is relatively scientific and effective. It is necessary to determine the optimal weight of each single model according to the data characteristics for a combined prediction model. In this paper, the linear programming method is used to construct the optimization function to obtain the weight values.
If there are $N$ single prediction functions, each with a combination weight $\omega_j$, then the combined model is shown in Equation (11).
$$Y_t = \sum_{j=1}^{N} \omega_j Y_{jt} \quad (j = 1, 2, 3, 4, \ldots, N;\; t = 1, 2, 3, 4, \ldots, n) \tag{11}$$
where $Y_t$ is the predicted value of the combined model at time $t$, $\omega_j$ is the weight of the j-th single model and $Y_{jt}$ is the predicted value of the j-th single model at time $t$.
The sum of squared errors of the combined model, $E$, is shown in Equation (12).
$$E = \left( \omega_1, \ldots, \omega_N \right) \begin{bmatrix} \sum_{t=1}^{n} e_{1t}^{2} & \cdots & \sum_{t=1}^{n} e_{1t} e_{Nt} \\ \vdots & \ddots & \vdots \\ \sum_{t=1}^{n} e_{Nt} e_{1t} & \cdots & \sum_{t=1}^{n} e_{Nt}^{2} \end{bmatrix} \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_N \end{bmatrix} \tag{12}$$
where $E$ represents the sum of the squared errors of the combined forecasting model over all observations and $e_{jt}$ represents the error of the j-th single model at time $t$.
Let $e$ denote the error matrix in Equation (12), so that $E = \omega^{T} e\, \omega$, and let $P = (1, 1, \ldots, 1)^{T}$ with the constraint $P^{T} \omega = 1$. $E$ reaches its minimum at the optimal weight vector $\omega = \dfrac{e^{-1} P}{P^{T} e^{-1} P}$, where $e^{-1}$ is the inverse of $e$. The weight vector can be solved by linear algebra to obtain the optimal weights of the single models.
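For the two-model case used in this paper, the closed-form weight vector above reduces to a simple expression in the three entries of the error matrix. The sketch below is a minimal illustration: the function name `optimal_two_model_weights` and the error values are hypothetical and are not taken from the case study.

```python
def optimal_two_model_weights(errors_1, errors_2):
    """Weights minimizing the combined sum of squared errors with w1 + w2 = 1.

    For N = 2 the formula w = e^-1 P / (P^T e^-1 P) gives
    w1 = (s22 - s12) / (s11 + s22 - 2 * s12), where s_ij = sum_t e_it * e_jt.
    """
    s11 = sum(a * a for a in errors_1)
    s22 = sum(b * b for b in errors_2)
    s12 = sum(a * b for a, b in zip(errors_1, errors_2))
    denom = s11 + s22 - 2.0 * s12
    if denom == 0.0:
        raise ValueError("error series are identical; weights are not unique")
    w1 = (s22 - s12) / denom
    return w1, 1.0 - w1


# Hypothetical fitting errors (actual minus predicted) of two single models.
errors_a = [1.0, -1.0, 2.0, -2.0]
errors_b = [2.0, 2.0, -1.0, -3.0]

w1, w2 = optimal_two_model_weights(errors_a, errors_b)

# Because the weights sum to 1, the combined error is the weighted error.
combined = [w1 * a + w2 * b for a, b in zip(errors_a, errors_b)]
sse_combined = sum(c * c for c in combined)
sse_a = sum(a * a for a in errors_a)
sse_b = sum(b * b for b in errors_b)
print(w1, w2, sse_combined, sse_a, sse_b)
```

Note that this unconstrained solution can produce a weight outside [0, 1] when the two error series are strongly correlated; whether such weights are acceptable is a modeling choice that the formula itself does not settle.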

3. Case Study

3.1. Selection and Collection of Sample Data

China is undergoing a transformation in its energy structure [20], and the proportion of consumption of various energy sources is constantly changing. In the process, there are many factors affecting the natural gas demand and their degree of influence varies; thus, it is difficult to accurately predict China’s natural gas demand. Based on the existing research results of a natural gas influencing factor analysis, 15 influencing factors are selected as the influence set in this paper. The data for 2021–2022 are from the China Statistical Yearbook [21] and the wind database.
Influencing factors:
$x_1$: Number of people
$x_2$: Gross domestic product (GDP)
$x_3$: Proportion of secondary industry
$x_4$: Investment in environmental pollution control
$x_5$: CO₂ emissions
$x_6$: Primary energy consumption
$x_7$: Ethane and LPG
$x_8$: Naphtha
$x_9$: Gasoline
$x_{10}$: Aviation kerosene
$x_{11}$: Diesel/light oil
$x_{12}$: Fuel oil
$x_{13}$: Hydropower consumption
$x_{14}$: Nuclear energy consumption
$x_{15}$: Coal consumption
$Y$: Natural gas consumption
The data and units for the 24 year period after sorting the elements are shown in Table 1.

3.2. The Fitting Function of the Combined Predictive Model

3.2.1. Application of the Lasso Model

SPSS software was used to calculate the single and combined predictive models, running on an Intel i5-9300H 2400 MHz CPU with 16 GB of memory, an NVIDIA GeForce RTX 1650 graphics card and a 64-bit Windows 11 operating system.
Firstly, the cross-validation method [22] was used to adjust the parameter a. At the same time, the target set from 1999 to 2022 is divided into three parts, of which 1999–2020 is the training set, 2021 is the evaluation set and 2022 is the test set.
The 1999–2020 impact set and target set data are fed into the Lasso model to generate a first-generation coefficient set that allows the maximum value of the parameter t to be determined. Secondly, the predicted value of the 2021 target is calculated by using the coefficient set, the error rates of the predicted value and the true value are compared and the evaluation effect of the evaluation set is determined. Finally, the parameter t is continuously narrowed, so that the coefficients are iterated until the evaluation effect of the evaluation set is optimal. The B j results output after the iteration is completed are shown in Table 2.
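The tuning loop described above (fit on the training years, score on the evaluation year, adjust the parameter, repeat) can be sketched as a hold-out search over the penalty. This is a deliberately simplified illustration rather than the procedure run in SPSS: it assumes a single predictor with no intercept so that the Lasso solution has a closed form, it searches a fixed grid of the penalty a instead of narrowing t, and all names and data values are hypothetical.

```python
def soft_threshold(value, thresh):
    """Shrink value toward zero by thresh; small values become exactly 0."""
    if value > thresh:
        return value - thresh
    if value < -thresh:
        return value + thresh
    return 0.0


def fit_one_predictor_lasso(xs, ys, a):
    """Closed-form minimizer of sum (y - B*x)^2 + a*|B| (one predictor, no intercept)."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    return soft_threshold(xy, a / 2.0) / xx


def tune_penalty(train, evaluation, candidates):
    """Return (a, B, error) with the smallest relative error on the evaluation point."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_eval, y_eval = evaluation
    best = None
    for a in candidates:
        B = fit_one_predictor_lasso(xs, ys, a)
        error = abs(B * x_eval - y_eval) / abs(y_eval)
        if best is None or error < best[2]:
            best = (a, B, error)
    return best


# Hypothetical time-ordered records (x, y): earliest first, latest last.
records = [(1.0, 2.1), (2.0, 4.2), (3.0, 6.1), (4.0, 8.3),
           (5.0, 10.2), (6.0, 12.0), (7.0, 14.1)]
train, evaluation, test = records[:5], records[5], records[6]

best_a, best_B, eval_error = tune_penalty(train, evaluation, [0.0, 2.0, 4.0, 6.0, 8.0])

# The test point is touched only once, after the penalty has been chosen.
x_test, y_test = test
test_error = abs(best_B * x_test - y_test) / abs(y_test)
print(best_a, best_B, eval_error, test_error)
```

Keeping the split chronological, as in the 1999–2020 / 2021 / 2022 division, avoids using future observations to tune a model that is meant to forecast forward in time.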
It can be seen from Table 2 that the iterated model has a goodness-of-fit of R² = 0.999, and the fitting curve and actual values are shown in Figure 2. The average error rate was 3.36%, and the fitting effect was excellent. The test set prediction was calculated from the coefficient set obtained after the iteration: the predicted value was 377.43 × 10⁹ m³ and the actual value was 366.3 × 10⁹ m³, an error rate of 3.04%. The accuracies of the training set, evaluation set and test set all met the requirements and had good fitting effects.

3.2.2. Application of Polynomial Models

In general, a polynomial of order higher than two may fit the data better, but it can reduce the reliability of the model because of the changes in its monotonicity and the complexity of its stationary points. Therefore, this paper uses a quadratic polynomial fitted by least squares. The model function is shown in Equation (13), where $x$ is the year.
$$y = \alpha_0 + \alpha_1 (x - 1998) + \alpha_2 (x - 1998)^2 \tag{13}$$
The relevant parameters are shown in Table 3 and the fitting effect is shown in Figure 3.

3.2.3. Combined Models

After modeling the Lasso model and the polynomial model, they were built into a combinatorial predictive model through the model combination method. The combination of linear weighting is the process of selecting and utilizing the information of each individual predictive model, which has good stability and error tolerance. The combined predictive model is closer to the actual situation, which can improve the fitting effect and prediction effect of the model. In this paper, the Lasso model with strong subjectivity and the polynomial model with strong objectivity are combined to obtain a combinatorial model. The Lasso model has a weight of ω 1 and the polynomial model has a weight of ω 2 . The values of ω 1 and ω 2 are shown in Table 4.

3.2.4. Model Evaluation

Model evaluation provides a reasonable assessment of the reliability of the prediction results of the combined prediction model and can also be used as a basis for judging the improvement in prediction performance of the combined method compared to a single model. The generalization error for the evaluation and test sets in the evaluation element can be expressed as the mean generalization error, which is the average of the two. The smaller the mean generalization error, the better the prediction performance of the model. Next, we evaluated the model training error and accuracy. The usual training error measure is the MAPE, the measure levels of which are shown in Table 5 [23]. The model accuracy indicates the proportion of data that are accurately predicted for the training set to the total number of data, and the data can be considered to be accurately predicted when the error is less than 5%, with a higher model accuracy indicating a stronger ability of the model to correctly process the data.
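The evaluation measures described above are simple to compute. The sketch below uses hypothetical function names and, apart from the test-set values quoted in Section 3.2.1 (predicted 377.43 against an actual 366.3), purely illustrative numbers; in particular, the evaluation-set error used for the mean generalization error is a made-up placeholder.

```python
def ape(actual, predicted):
    """Absolute percentage error of one prediction, in percent."""
    return abs(predicted - actual) / abs(actual) * 100.0


def mape(actuals, predictions):
    """Mean absolute percentage error over a series, in percent."""
    errors = [ape(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def accuracy(actuals, predictions, tolerance=5.0):
    """Share of points whose error is strictly below tolerance (5% in this paper)."""
    hits = sum(1 for a, p in zip(actuals, predictions) if ape(a, p) < tolerance)
    return hits / len(actuals)


def mean_generalization_error(evaluation_error, test_error):
    """Average of the evaluation-set and test-set errors."""
    return (evaluation_error + test_error) / 2.0


# Test-set error of the Lasso model, using the values reported in Section 3.2.1.
lasso_test_error = ape(366.3, 377.43)

# Illustrative training series (not from Table 1).
toy_actuals = [100.0, 200.0, 400.0]
toy_predictions = [102.0, 190.0, 440.0]
toy_mape = mape(toy_actuals, toy_predictions)
toy_accuracy = accuracy(toy_actuals, toy_predictions)

# The evaluation-set error of 2.0% is a hypothetical placeholder.
toy_generalization = mean_generalization_error(2.0, lasso_test_error)
print(round(lasso_test_error, 2), toy_mape, toy_accuracy, toy_generalization)
```

The first value reproduces the 3.04% test-set error rate reported for the Lasso model, which serves as a quick consistency check on the error definition.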
The gray prediction model, the ARIMA model and the SVR regression model in machine learning are introduced to compare with the combinatorial model. By comparing the parameters of the single model and the combined prediction model, one can effectively determine whether the combined method improves the performance of the model. The data results are shown in Table 6.
Goodness-of-fit refers to the degree to which the model fits the observed values, and the goodness-of-fit test is commonly used when testing the model by statistical criteria. It can be seen from Table 6 that the goodness-of-fit of the single models and the combinatorial predictive model is greater than 0.900. The combinatorial prediction model and the single prediction models passed the statistical criterion test, but the fitting effect of the combinatorial model and the Lasso model was significantly better than that of the other models. The combinatorial predictive model, the Lasso model and the polynomial model all have an accuracy class of level 1, so both the subjective and objective models can make high-precision predictions of the data. The accuracy level of the SVR regression model is level 2 and those of the gray prediction and ARIMA models are both level 3, and the MAPE value of the combined prediction model is the smallest. Thus, the combined predictive model provides the most accurate predictions and the best prediction performance among all models. Whether in the prediction accuracy analysis of the training set or in the average generalization error analysis, the combined prediction model has the highest accuracy and the best performance.
From the analysis of the performance parameters between the combinatorial predictive model and the contrasting model, it can be concluded that the combinatorial predictive model constructed in this paper is better than the comparative model in terms of the fitting effect and the prediction accuracy. The combined model greatly improves the prediction performance of the single prediction model and can make the stability of the model better and the prediction more accurate.

3.3. Forecast and Analysis of China’s Natural Gas Demand

The cross-validation method was used to divide the basic data into a training set (1999–2021) and an evaluation set (2022). Four models, the gray prediction model, the polynomial regression model, the time series model and the BP neural network, were selected for a comparative analysis of how well they forecast the basic data. It was found that the prediction accuracy is the highest when using the general BP neural network, while the overall prediction accuracy of the other models is greatly reduced by excessive local error rates. The fitting performance parameters of the four models are shown in Table 7.
From the basic data predicted by the BP neural network, the Lasso model's natural gas demand forecasts for 2023 to 2033 are calculated. The polynomial model, in contrast, derives its predictions directly from time. The combined forecasting model linearly weights the two single forecasting models to obtain the forecast data. The data are shown in Table 8.
As can be seen from Figure 4 and Table 8, China's natural gas demand will continue to grow from 2023 to 2033. However, because the various data indicators grow at different rates, the annual growth varies considerably, with a maximum increase of 64.71 × 10⁹ m³ and a maximum annual growth rate of 13.33%. In 2026, the growth in natural gas demand will slow sharply due to the continuous decline in the total population. After 2028, limited by the increasing proportion of other energy sources in the energy structure, natural gas will begin to be used as a conventional energy source, resulting in a steady increase in the natural gas demand within a certain range. In summary, China's natural gas demand will exceed 600 × 10⁹ m³ in 2028 and reach 830 × 10⁹ m³ in 2033.

3.4. Discussion of the Results

In model building and the example analysis, there are several notable insights:
(1) The model established in this paper has a high reproducibility. In the process of discussing the idea of model establishment, our requirements for the selection of a single model are an obvious subjectivity or objectivity. Thus, the combined prediction model we establish can have the advantages of subjectivity and objectivity at the same time and the strength of subjectivity can be controlled by adjusting the weight. The modeling idea of this paper is clear, and the model establishment process is clearly explained in two aspects, model selection and model combination, and the model has a high reproducibility.
(2) The model established in this paper has a high predictive performance. In the case analysis, the model is evaluated and the indicators of the established combined forecasting model are compared with advanced forecasting models such as the single forecasting model, with the results showing that the prediction error in this paper is 2.99%, while the error of other advanced models is 15.37% for gray prediction, 10.55% for ARIMA and 8.54% for SVR. The comparative results show that the model constructed in this paper has a higher performance than the existing advanced predictive models.
(3) In our forecast analysis, we forecast China’s natural gas demand in the next ten years, providing support for relevant departments to formulate energy policies under the goal of carbon neutrality and ensuring the smooth adjustment of China’s energy structure transformation.
(4) The limitation of the application of the model in this paper lies in the prediction of the basic data; in the combination prediction of this type of model, the accuracy of the prediction largely depends on the prediction of the basic data, which causes different models to adapt to different regions of the data, so the model constructed in this paper is also limited by the accuracy of a single prediction model. Different models should be analyzed to choose the most suitable model.
(5) The MAPE obtained by the combinatorial prediction model established in this paper is 2.99%, which demonstrates a good performance. In comparison, the MAPE obtained by the MLP model established by Szoplik and Muchel [24] is 3.356%. Safiyari et al. [25] established a hybrid model by multi-layer perceptron and support vector machine, the MAPE of which was 7.6%. Ma et al. [26] compared 16 models based on in samples and out samples, and the minimum MAPE was greater than 4.3% and 4.9%. It can be seen that the error rate obtained by the proposed model is smaller and therefore the model has a higher prediction performance.

4. Conclusions

From the perspective of a combinatorial model, this paper combines the highly subjective Lasso model and the objective polynomial model to construct a new combinatorial prediction model, taking China as an example for a case analysis. The natural gas demand forecasting model established in this paper greatly improves on the accuracy of the single prediction models and has a better prediction performance than the other comparative models. At the same time, because more factors affecting the prediction data are considered, the stability of the model is improved, the sensitivity to individual factors is reduced and the reliability of the prediction results is greatly improved. According to the prediction results of the forecast model in this paper, the current and future situation of China's natural gas is analyzed, and the following conclusions are reached.
(1) Over the next decade, China's natural gas demand will continue to grow, with a minimum increase of 10.10 × 10⁹ m³ in 2026 and a maximum increase of 64.71 × 10⁹ m³ in 2033. The minimum annual growth rate is 1.96% in 2026, the maximum annual growth rate is 13.33% in 2025 and the annual demand reaches 833.04 × 10⁹ m³ in 2033, more than twice the demand in 2023. China's future natural gas demand will rise sharply, and the growth in domestic demand will bring sufficient vitality to China's natural gas market.
(2) Over the next decade, China’s imports of liquefied natural gas and pipeline natural gas will account for too much of the country’s natural gas supply. However, in the development of the natural gas market in the next decade, the demand is rising too fast and the domestic natural gas production will be insufficient to maintain the balance between supply and demand. Thus, imported natural gas is still the main source of natural gas for the domestic energy transition.
(3) After 2028, the growth rate of natural gas demand will gradually stabilize and the energy structure transformation will enter a stable period. Low-carbon energy sources such as nuclear power and hydropower will gradually replace coal to reduce carbon emissions, while natural gas production will also grow steadily. China can reduce its carbon emissions while maintaining a sufficient energy supply and accelerate progress toward its peak carbon and carbon neutrality goals.
(4) Over the next decade, the growth in demand for natural gas will place higher demands on many aspects of the natural gas market. Regarding the price of natural gas, pipeline laying, construction of gas storage tanks and regional natural gas forecasting, relevant departments need to formulate policies to guide the reasonable and healthy development of the natural gas market.

Author Contributions

Conceptualization, H.L. and B.H.; methodology, H.L., Y.L. and C.W.; software, H.L.; validation, H.L., Y.L. and C.W.; formal analysis, Y.L. and C.W.; investigation, C.L., Y.S. and S.Z.; resources, B.H.; writing—original draft preparation, H.L. and Y.S.; writing—review and editing, W.J., S.Z., Y.S. and B.H.; visualization, H.L. and Y.L.; supervision, S.Z. and B.H.; funding acquisition, B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National College Students Innovation and Entrepreneurship Training Program of Zhejiang Ocean University (202110340035), the Zhoushan Science and Technology Project (2021C21011) and the Basic Public Welfare Research Program of Zhejiang Province (LQ23E040004).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Symbol | Explanation
Y | Series of target data; the dependent variable of the model.
Xj | Series of data for the j-th influencing factor; the independent variables of the model.
t | Tuning parameter of the Lasso model.
Bj | Regression coefficient of the j-th influencing factor in the Lasso model.
b | Longitudinal intercept of the Lasso model.
Yjt | Predicted value of the j-th model at time t in the combined model.
YT | Predicted value of the combined model at time T.
ωj | Weight of the j-th model in the combined model.
MAPE | Mean absolute percentage error (average error rate)
ARIMA | Autoregressive integrated moving average model
SVR | Support vector regression
BP | Back propagation

References

  1. Liu, M.; Yang, X.; Wen, J.; Wang, H.; Feng, Y.; Lu, J.; Chen, H.; Wu, J.; Wang, J. Drivers of China’s carbon dioxide emissions: Based on the combination model of structural decomposition analysis and input-output subsystem method. Environ. Impact Assess. Rev. 2023, 100, 107043.
  2. Zeng, S.; Su, B.; Zhang, M.; Gao, Y.; Liu, J.; Luo, S.; Tao, Q. Analysis and forecast of China’s energy consumption structure. Energy Policy 2021, 159, 112630.
  3. Cao, Q.R.; Zhou, S.Y.; Sajid, M.J.; Cao, M. The impact of China’s carbon-reduction policies on provincial industrial competitiveness. Energy Effic. 2022, 15, 34.
  4. Shaikh, F.; Ji, Q. Forecasting natural gas demand in China: Logistic modelling analysis. Int. J. Electr. Power Energy Syst. 2016, 77, 25–32.
  5. Shaikh, F.; Ji, Q.; Shaikh, P.H.; Mirjat, N.H.; Uqaili, M.A. Forecasting China’s natural gas demand based on optimised nonlinear grey models. Energy 2017, 140, 941–951.
  6. Bianco, V.; Scarpa, F.; Tagliafico, L.A. Scenario analysis of nonresidential natural gas consumption in Italy. Appl. Energy 2014, 113, 392–403.
  7. Zhu, L.; Li, M.S.; Wu, Q.H.; Jiang, L. Short-term natural gas demand prediction based on support vector regression with false neighbours filtered. Energy 2015, 80, 428–436.
  8. Ding, J.; Zhao, Y.; Jin, J. Forecasting natural gas consumption with multiple seasonal patterns. Appl. Energy 2023, 337, 120911.
  9. Gao, L.; Wei, F.; Yan, Z.; Ma, J.; Xia, J. A Study of Objective Prediction for Summer Precipitation Patterns Over Eastern China Based on a Multinomial Logistic Regression Model. Atmosphere 2019, 10, 213.
  10. Bates, J.M.; Granger, C.W.J. The Combination of Forecasts. J. Oper. Res. Soc. 1969, 20, 451.
  11. Min, J.; Dong, Y.; Wu, F.; Li, N.; Wang, H. Comparative Analysis of Two Methods of Natural Gas Demand Forecasting. IOP Conf. Ser. Earth Environ. Sci. 2021, 632, 32033.
  12. Xia, M.; Cai, H.H. The driving factors of corporate carbon emissions: An application of the LASSO model with survey data. Environ. Sci. Pollut. Res. 2023, 30, 56484–56512.
  13. Regression Shrinkage and Selection via the Lasso on JSTOR [EB/OL]. Available online: https://www-jstor-org-s.zjou.edu.cn/stable/2346178 (accessed on 9 January 2023).
  14. Dalalyan, A.S.; Hebiri, M.; Lederer, J. On the prediction performance of the Lasso. Bernoulli 2017, 23, 552–581.
  15. Özmen, A. Sparse regression modeling for short- and long-term natural gas demand prediction. Ann. Oper. Res. 2023, 322, 921–946.
  16. Zhang, K.; Yin, K.; Yang, W. Probabilistic accumulation grey forecasting model and its properties. Expert Syst. Appl. 2023, 223, 119889.
  17. Mahendra, H.N.; Mallikarjunaswamy, S.; Kumar, D.M.; Kumari, S.; Kashyap, S.; Fulwani, S.; Chatterjee, A. Assessment and Prediction of Air Quality Level Using ARIMA Model: A Case Study of Surat City, Gujarat State, India. Nat. Environ. Pollut. Technol. 2023, 22, 199–210.
  18. Karray, E.; Elmannai, H.; Toumi, E.; Gharbia, M.H.; Meshoul, S.; Aichi, H.; Ben Rabah, Z. Evaluating the Potentials of PLSR and SVR Models for Soil Properties Prediction Using Field Imaging, Laboratory VNIR Spectroscopy and Their Combination. Comput. Model. Eng. Sci. 2023, 136, 1399–1425.
  19. Varying-Coefficient Models on JSTOR [EB/OL]. Available online: https://www-jstor-org-s.zjou.edu.cn/stable/2345993 (accessed on 15 January 2023).
  20. Baležentis, T.; Štreimikienė, D. Sustainability in the Electricity Sector through Advanced Technologies: Energy Mix Transition and Smart Grid Technology in China. Energies 2019, 12, 1142.
  21. China Statistical Yearbook—2021 [EB/OL]. Available online: http://www.stats.gov.cn/tjsj/ndsj/2021/indexch.htm (accessed on 27 January 2023).
  22. Lyu, Z.; Yu, Y.; Samali, B.; Rashidi, M.; Mohammadi, M.; Nguyen, T.N.; Nguyen, A. Back-Propagation Neural Network Optimized by K-Fold Cross-Validation for Prediction of Torsional Strength of Reinforced Concrete Beam. Materials 2022, 15, 1477.
  23. Zhou, W.; Wu, X.; Ding, S.; Pan, J. Application of a novel discrete grey model for forecasting natural gas consumption: A case study of Jiangsu Province in China. Energy 2020, 200, 117443.
  24. Szoplik, J.; Muchel, P. Using an artificial neural network model for natural gas compositions forecasting. Energy 2023, 263, 126001.
  25. Safiyari, M.H.; Shavvalpour, S.; Tarighi, S. From traditional to modern methods: Comparing and introducing the most powerful model for forecasting the residential natural gas demand. Energy Rep. 2022, 8, 14699–14715.
  26. Ma, X.; Lu, H.; Ma, M.; Wu, L.; Cai, Y. Urban natural gas consumption forecasting by novel wavelet-kernelized grey system model. Eng. Appl. Artif. Intell. 2023, 119, 105773.
Figure 1. Model building and application flowchart.
Figure 2. The change trend between the observed values and the fitted values of the Lasso model.
Figure 3. The trend between observations and the fitted values of the polynomial model.
Figure 4. Natural gas consumption forecast for 2023–2033.
Table 1. Data table of China’s natural gas consumption and its influencing factors.
Year | x1 (10⁸ People) | x2 (10¹² CNY) | x3 (%) | x4 (10¹⁰ CNY) | x5 (10⁸ Ton) | x6 (10¹² J) | x7 (10³ Bucket/Year) | x8 (10³ Bucket/Year)
1999 | 12.5786 | 9.0564 | 56.9 | 9.367 | 32.944 | 41.03 | 344.33 | 187.79
2000 | 12.6743 | 10.028 | 59.6 | 10.149 | 33.609 | 42.45 | 388.5 | 213.01
2001 | 12.7627 | 11.086 | 46.4 | 11.067 | 35.231 | 44.84 | 393.15 | 221.44
2002 | 12.8453 | 12.172 | 49.4 | 13.672 | 38.434 | 48.85 | 451.85 | 242.02
2003 | 12.9227 | 13.742 | 57.9 | 16.277 | 45.322 | 56.89 | 506.6 | 276.06
2004 | 12.9988 | 16.184 | 51.8 | 19.098 | 53.349 | 66.55 | 588.18 | 298.74
2005 | 13.0756 | 18.732 | 50.5 | 23.88 | 60.982 | 75.6 | 596.23 | 383.26
2006 | 13.1448 | 21.944 | 49.7 | 25.66 | 66.773 | 82.89 | 667.42 | 485.28
2007 | 13.2129 | 27.009 | 50.1 | 33.873 | 72.398 | 90.09 | 699.68 | 554.34
2008 | 13.2802 | 31.924 | 48.6 | 49.37 | 73.783 | 93.45 | 627.12 | 538.5
2009 | 13.345 | 34.852 | 52.3 | 52.584 | 77.139 | 97.53 | 639.76 | 598.3
2010 | 13.4091 | 41.212 | 57.4 | 76.122 | 81.458 | 104.29 | 677.16 | 778.08
2011 | 13.4916 | 48.794 | 52 | 71.14 | 88.272 | 112.54 | 690.7 | 810.98
2012 | 13.5922 | 53.858 | 20 | 82.535 | 90.042 | 117.05 | 689.08 | 867.85
2013 | 13.6726 | 59.296 | 48.5 | 90.372 | 92.474 | 121.38 | 771.2 | 914.05
2014 | 13.7646 | 64.356 | 45.6 | 95.755 | 92.932 | 124.82 | 884.02 | 1032.2
2015 | 13.8326 | 68.886 | 39.7 | 88.063 | 92.797 | 126.53 | 1028.2 | 1115.2
2016 | 13.9232 | 74.64 | 36 | 92.198 | 92.79 | 128.63 | 1207.5 | 1187.7
2017 | 14.0011 | 83.204 | 34.2 | 95.39 | 94.664 | 132.8 | 1350.5 | 1276.7
2018 | 14.0541 | 91.928 | 34.4 | 98.932 | 96.527 | 137.58 | 1408.5 | 1338.3
2019 | 14.1008 | 98.652 | 32.6 | 100.18 | 98.105 | 142.03 | 1402.7 | 1444
2020 | 14.1212 | 101.36 | 43.3 | 104.88 | 98.993 | 145.46 | 1457.5 | 1569.2
2021 | 14.126 | 114.92 | 0.394 | 106.25 | 99.725 | 147.22 | 1512.3 | 1635.2
2022 | 14.118 | 121.02 | 0.399 | 109.02 | 100.35 | 148.82 | 1594.2 | 1762.2

Year | x9 (10³ Bucket/Year) | x10 (10³ Bucket/Year) | x11 (10⁵ Bucket/Year) | x12 (10⁸ Bucket/Year) | x13 (10¹² J) | x14 (10¹² J) | x15 (10¹² J) | Q (10 M m³)
1999 | 5.6921 | 186.8 | 12.573 | 761.79 | 1.97 | 0.15 | 29.16 | 21.66
2000 | 8.0258 | 196.14 | 13.768 | 766.42 | 2.22 | 0.17 | 29.56 | 24.7
2001 | 8.263 | 202.07 | 14.514 | 770.8 | 2.76 | 0.17 | 31.09 | 27.65
2002 | 8.7474 | 209.33 | 15.796 | 758.36 | 2.84 | 0.25 | 34.08 | 29.41
2003 | 9.4218 | 211.36 | 17.38 | 889.41 | 2.78 | 0.43 | 40.62 | 34.17
2004 | 10.713 | 255.2 | 20.66 | 951.3 | 3.45 | 0.49 | 47.36 | 39.98
2005 | 11.106 | 270.66 | 22.375 | 898.77 | 3.84 | 0.51 | 55.46 | 46.98
2006 | 11.993 | 288.47 | 23.92 | 954.35 | 4.19 | 0.53 | 60.91 | 57.78
2007 | 12.626 | 308.94 | 25.48 | 905.57 | 4.64 | 0.59 | 66.33 | 71.08
2008 | 14.021 | 314.77 | 27.609 | 723.56 | 6.05 | 0.65 | 67.38 | 81.93
2009 | 14.121 | 362.94 | 27.614 | 662.34 | 5.81 | 0.66 | 70.58 | 90.22
2010 | 15.914 | 437.19 | 30.048 | 665.89 | 6.68 | 0.7 | 73.22 | 108.87
2011 | 17.377 | 458.71 | 31.961 | 587.62 | 6.42 | 0.81 | 79.71 | 135.16
2012 | 18.63 | 494.59 | 34.525 | 559.77 | 8 | 0.91 | 80.71 | 150.88
2013 | 21.427 | 545.55 | 35.034 | 564.07 | 8.38 | 1.03 | 82.44 | 171.88
2014 | 22.364 | 588.2 | 35.059 | 584.16 | 9.71 | 1.22 | 82.49 | 188.36
2015 | 26.007 | 654.47 | 35.404 | 622.1 | 10.15 | 1.56 | 80.94 | 194.69
2016 | 27.069 | 730.96 | 34.196 | 593.35 | 10.44 | 1.93 | 80.21 | 209.44
2017 | 28.127 | 820.96 | 34.27 | 554.58 | 10.49 | 2.23 | 80.59 | 241.25
2018 | 29.825 | 903.09 | 34.351 | 555.6 | 10.73 | 2.64 | 81.11 | 283.93
2019 | 31.111 | 956.04 | 34.299 | 562.86 | 11.34 | 3.11 | 81.79 | 308.38
2020 | 29.419 | 794.63 | 35.011 | 644.41 | 11.74 | 3.25 | 82.27 | 330.58
2021 | 32.254 | 854.54 | 35.228 | 586.46 | 12.41 | 3.64 | 81.561 | 372.6
2022 | 33.954 | 899.65 | 36.021 | 571.13 | 12.76 | 3.72 | 84.84 | 366.3
Table 2. Lasso model regression coefficients.
Coefficients | Values
B1 | −10.102
B2 | 2.925
B3 | −6.430
B4 | 0.158
B5 | 0.585
B6 | 3.596
B7 | −0.050
B8 | −0.012
B9 | 0.768
B10 | −0.044
B11 | −1.946
B12 | 0.027
B13 | −5.871
B14 | 5.487
B15 | −3.966
Longitudinal intercept b | 112.990
R² | 0.999
Table 3. Parameters of the polynomial model.
Parameters | α0 | α1 | α2 | R²
Values | 21.625 | −0.601 | 0.6654 | 0.996
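As a quick check on Table 3, the fitted curve can be evaluated directly. The sketch below assumes a second-order form Q(t) = α0 + α1·t + α2·t² with the time index t = year − 1998 (so that 1999 corresponds to t = 1). This indexing is not stated in the table; it is inferred only because it appears to reproduce the polynomial column of Table 4 to two decimals. The function and constant names are illustrative.

```python
# Parameters copied from Table 3.
ALPHA0, ALPHA1, ALPHA2 = 21.625, -0.601, 0.6654

# Assumption (not stated in the table): t = year - 1998, i.e. t = 1 in 1999.
BASE_YEAR = 1998


def polynomial_fit(year: int) -> float:
    """Evaluate the assumed quadratic trend, in the units of Table 4."""
    t = year - BASE_YEAR
    return ALPHA0 + ALPHA1 * t + ALPHA2 * t ** 2


if __name__ == "__main__":
    # Compare a few years against the polynomial column of Table 4.
    for year in (1999, 2000, 2010, 2020):
        print(year, round(polynomial_fit(year), 2))
```

If a different time origin was used in the original fit, only `BASE_YEAR` would need to change.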
Table 4. The fitted data of the combined predictive models.
Year | Q (10 M m³) | Lasso (10 M m³) | Polynomial (10 M m³) | Combined (10 M m³)
1999 | 21.66 | 22.57 | 21.69 | 22.26
2000 | 24.70 | 23.60 | 23.08 | 23.42
2001 | 27.65 | 25.31 | 25.81 | 25.49
2002 | 29.41 | 26.22 | 29.87 | 27.51
2003 | 34.17 | 35.79 | 35.26 | 35.60
2004 | 39.98 | 41.89 | 41.97 | 41.92
2005 | 46.98 | 45.32 | 50.02 | 46.98
2006 | 57.78 | 53.83 | 59.40 | 55.80
2007 | 71.08 | 67.24 | 70.11 | 68.25
2008 | 81.93 | 79.80 | 82.16 | 80.63
2009 | 90.22 | 88.24 | 95.53 | 90.81
2010 | 108.87 | 110.42 | 110.23 | 110.36
2011 | 135.16 | 134.45 | 126.26 | 131.56
2012 | 150.88 | 149.67 | 143.63 | 147.54
2013 | 171.88 | 167.05 | 162.33 | 165.38
2014 | 188.36 | 179.91 | 182.35 | 180.77
2015 | 194.69 | 195.21 | 203.71 | 198.21
2016 | 209.44 | 212.00 | 226.40 | 217.08
2017 | 241.25 | 240.25 | 250.42 | 243.84
2018 | 283.93 | 276.75 | 275.77 | 276.40
2019 | 308.38 | 307.43 | 302.45 | 305.67
2020 | 330.58 | 326.92 | 330.46 | 328.17
Weight |  | 0.647 | 0.353 | 
Error rate |  | 3.36% | 3.75% | 2.99%
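The combined column of Table 4 can be reproduced as a weighted sum of the two single-model fits, and the error-rate row as a MAPE over 1999–2020. The sketch below assumes the usual MAPE definition (mean of |predicted − observed| / observed, in percent), which is not spelled out in the table itself. The data and weights are copied from Table 4; the function names are illustrative.

```python
# Observed demand and single-model fits for 1999-2020, copied from Table 4.
OBSERVED = [21.66, 24.70, 27.65, 29.41, 34.17, 39.98, 46.98, 57.78,
            71.08, 81.93, 90.22, 108.87, 135.16, 150.88, 171.88, 188.36,
            194.69, 209.44, 241.25, 283.93, 308.38, 330.58]
LASSO = [22.57, 23.60, 25.31, 26.22, 35.79, 41.89, 45.32, 53.83,
         67.24, 79.80, 88.24, 110.42, 134.45, 149.67, 167.05, 179.91,
         195.21, 212.00, 240.25, 276.75, 307.43, 326.92]
POLYNOMIAL = [21.69, 23.08, 25.81, 29.87, 35.26, 41.97, 50.02, 59.40,
              70.11, 82.16, 95.53, 110.23, 126.26, 143.63, 162.33, 182.35,
              203.71, 226.40, 250.42, 275.77, 302.45, 330.46]

# Weights from the "Weight" row of Table 4.
W_LASSO, W_POLY = 0.647, 0.353


def combine(lasso, poly, w_lasso=W_LASSO, w_poly=W_POLY):
    """Weighted sum of the two single-model predictions, year by year."""
    return [w_lasso * a + w_poly * b for a, b in zip(lasso, poly)]


def mape(observed, predicted):
    """Mean absolute percentage error in percent (assumed definition)."""
    errors = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    combined = combine(LASSO, POLYNOMIAL)
    print("combined 1999:", round(combined[0], 2))
    for name, series in (("Lasso", LASSO), ("Polynomial", POLYNOMIAL),
                         ("Combined", combined)):
        print(name, "MAPE (%):", round(mape(OBSERVED, series), 2))
```

Under these assumptions the computed error rates should come out close to the error-rate row of the table; small differences in the last digit are possible because the tabulated fits are already rounded.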
Table 5. MAPE-based scale of prediction performance.
MAPE Range | Prediction Performance
<5% | Level 1, high precision
5–10% | Level 2, good accuracy
10–20% | Level 3, the accuracy is reasonable
>20% | Level 4, the accuracy is weak
Note: The MAPE represents the average error rate and is used to evaluate the fitting performance of the model.
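The scale in Table 5 amounts to a simple threshold lookup. The sketch below is one way to encode it; the table does not say which level the boundary values 5%, 10% and 20% belong to, so the choice made here (5% and 10% go to the higher-numbered level, 20% stays in Level 3) is an assumption, and the function name is illustrative.

```python
def mape_level(mape_percent: float) -> str:
    """Map a MAPE value (in percent) to the performance level of Table 5.

    Boundary handling is an assumption: 5 -> Level 2, 10 -> Level 3,
    20 -> Level 3 (only values strictly above 20 are Level 4).
    """
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    if mape_percent < 5:
        return "Level 1, high precision"
    if mape_percent < 10:
        return "Level 2, good accuracy"
    if mape_percent <= 20:
        return "Level 3, the accuracy is reasonable"
    return "Level 4, the accuracy is weak"


if __name__ == "__main__":
    # MAPE values taken from Table 6.
    for name, value in (("Combined", 2.99), ("SVR", 8.54), ("Gray", 15.37)):
        print(name, "->", mape_level(value))
```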
Table 6. Performance parameters of the combined predictive model and the single models.
Type | Goodness-of-Fit R² | MAPE | Accuracy | Average Generalization Error
Combined predictive model | 0.999 | 2.99% | 86.36% | 1.75%
Lasso | 0.999 | 3.36% | 81.82% | 2.56%
Polynomial | 0.996 | 3.75% | 72.73% | 2.81%
Gray prediction | 0.935 | 15.37% | 63.64% | 8.56%
ARIMA | 0.944 | 10.55% | 68.18% | 5.15%
SVR | 0.963 | 8.54% | 72.73% | 4.92%
Note: The Autoregressive Integrated Moving Average model (ARIMA) is a time series model. Support vector regression (SVR) is a machine learning model.
Table 7. The underlying data predicts model performance parameters.
Model | MAPE | Average Generalization Error | Prediction Accuracy
BP neural networks | 4.52% | 2.65% | Level 1
Gray prediction | 21.8% | 36.4% | Level 4
Polynomial regression | 9.7% | 9.56% | Level 2
Time series | 11.4% | 12.5% | Level 2
Note: Back propagation (BP) is a commonly used mathematical model.
Table 8. Natural gas consumption forecasts.
Year | Natural Gas Forecasts (10 M m³) | Difference (10 M m³) | Annual Growth Rate
2023 | 407.64 | 19.03 | 4.90%
2024 | 453.89 | 46.24 | 11.34%
2025 | 514.40 | 60.51 | 13.33%
2026 | 524.49 | 10.10 | 1.96%
2027 | 542.39 | 17.90 | 3.41%
2028 | 600.44 | 58.04 | 10.70%
2029 | 646.88 | 46.44 | 7.73%
2030 | 681.72 | 34.84 | 5.39%
2031 | 724.36 | 42.64 | 6.25%
2032 | 768.33 | 43.97 | 6.07%
2033 | 833.04 | 64.71 | 8.42%
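The difference and growth-rate columns of Table 8 follow from the forecast column by simple arithmetic. The sketch below recomputes them for 2024–2033, assuming the growth rate is the year-on-year difference divided by the previous year's forecast. The 2023 row is skipped because it depends on a 2022 value that is not listed in this table. Function names are illustrative, and recomputed differences may deviate from the table by about 0.01 because the tabulated forecasts are rounded.

```python
# Forecast column of Table 8 (same units as the table).
FORECAST = {
    2023: 407.64, 2024: 453.89, 2025: 514.40, 2026: 524.49,
    2027: 542.39, 2028: 600.44, 2029: 646.88, 2030: 681.72,
    2031: 724.36, 2032: 768.33, 2033: 833.04,
}


def yearly_changes(forecast):
    """Return (year, difference, growth rate in %) for each consecutive pair."""
    years = sorted(forecast)
    rows = []
    for prev, cur in zip(years, years[1:]):
        diff = forecast[cur] - forecast[prev]
        rate = 100.0 * diff / forecast[prev]  # assumed growth-rate definition
        rows.append((cur, diff, rate))
    return rows


if __name__ == "__main__":
    rows = yearly_changes(FORECAST)
    for year, diff, rate in rows:
        print(f"{year}: +{diff:.2f} ({rate:.2f}%)")
    fastest = max(rows, key=lambda r: r[2])
    slowest = min(rows, key=lambda r: r[2])
    print("fastest growth year:", fastest[0])
    print("slowest growth year:", slowest[0])
    print("2033 / 2023 ratio:", round(FORECAST[2033] / FORECAST[2023], 2))
```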
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, H.; Liu, Y.; Wang, C.; Song, Y.; Jiang, W.; Li, C.; Zhang, S.; Hong, B. Natural Gas Demand Forecasting Model Based on LASSO and Polynomial Models and Its Application: A Case Study of China. Energies 2023, 16, 4268. https://doi.org/10.3390/en16114268