1. Introduction
Forecasting the consumption of energy has been an important research topic for many decades. Its role has been growing in proportion to the increase in energy demand in all countries of the world. The quality of forecasts over various time horizons and for various time periods remains a significant research problem. A considerable research challenge is to improve the accuracy of long-term forecasts, which do not keep up with the growing volatility of the economic environment. For instance, for electricity markets, the quality of short-term and medium-term forecasts obtained with various models is relatively good: the typical forecasting error of monthly electricity consumption for one geographical area, measured by the mean absolute percentage error (MAPE), is about 2% [1]. However, the quality of electricity consumption forecasts may vary depending on the group of consumers: the MAPE ranges, for example, from 2% to 10% for short-term forecasts, while for long-term forecasts it ranges from 4% to 32% [2].
In the case of natural gas consumption forecasting, the quality of the forecasts is usually lower. The MAPE in daily gas consumption forecasts is around 10% [3], or slightly less for hourly forecasts on a daily basis (MAPE of 6–8%) [4]. Forecasts of daily gas consumption one day ahead, made by the best of the tested algorithms, achieve a MAPE of 2% to 5% [5]. Nevertheless, the quality of forecasts differs significantly depending on the time horizon and the algorithms used: the MAPE can range from 17% to over 100% [6,7]. As the time horizon extends, the quality of forecasts drops significantly.
In recent years, innovations in forecasting techniques have greatly improved forecasting accuracy. Nowadays, the modern forecasting toolbox consists of a wide range of methods, from classical statistical methods (the autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA) family, or the exponential smoothing (ETS) family) to modern computational intelligence tools (long short-term memory (LSTM) networks, other neural networks, etc.). The common denominator of the methods mentioned above is the way of describing the uncertainty related to errors in data collection, time lags, and reciprocal effects between variables. Modern forecasting methods most often try to quantify this uncertainty using a probabilistic approach based on the time series, which represents the behavior of a random variable over time. The occurrence of dependence between observations is a desirable feature that enables the forecasting of time series.
However, it should be remembered that traditional time series forecasting is based on historical data, which are usually incomplete, imprecise, and ambiguous; these uncertainties particularly degrade long-term forecasting accuracy. Unlike classical time series forecasting, the fuzzy approach can deal with vague and incomplete time series data under uncertain circumstances.
Long-term forecasting is an essential part of the strategic decision-making process. In the case of the natural gas market, such decisions usually concern situations where gas sellers want to know the demand in a territorial area in order to organize their supply chain and plan the infrastructure. It is assumed that the minimum period for this type of decision is five years.
The aim of our article is to show the effectiveness of the analog forecasting paradigm (forecasting by analogy) in combination with the fuzzy approach for forecasting long-term gas demand. Analog forecasting is a method of forecasting the value of a variable based on another variable whose value changes similarly over time, but not simultaneously. Our goal is to find a methodology for forecasting long-term annual gas consumption with a low forecast error, allowing the use of economic data from some territorial units to construct a forecast for other, similar territorial units. Territorial units can be entire countries, as in our research, but also administrative units of any country or geographic region.
In simple terms, the forecasting methodology proposed in this article is as follows. Based on data on the annual consumption of natural gas and other energy carriers over many years, in dozens of countries around the world, we find groups with similar characteristics of annual consumption and the structure of energy carriers. As a result of grouping, a single country may be included in several clusters in different years. Then, using a fuzzy decision tree, we classify all countries in individual years into previously created groups using other explanatory variables, this time economic, social, and climatic. As a result, we create a prediction of membership to the energy consumption groups based on the forecasts of socio-economic indicators being the values of the explanatory variables in the formerly built fuzzy decision tree. In the last step, using the chain increment method, we build forecasts of natural gas consumption for individual countries based on average indicators of changes in the structure of energy consumption in individual energy consumption groups.
The rest of the article is organized as follows.
Section 2 provides an overview of research in the area of forecasting of natural gas consumption.
Section 3 introduces the concept of analog forecasting.
Section 4 contains the formal foundations of fuzzy sets and fuzzy decision trees. The data with their basic statistical analysis are described in
Section 5.
Section 6 presents the research methodology.
Section 7 contains the research results and their discussion.
Section 8 presents conclusions and directions for further research.
2. Literature Review
Long-term gas consumption forecasting, although it seems to be a recognized research topic, has still not received universal solutions [
8,
9]. This is due to the diversified decision makers’ demand for the granularity of forecasting information and differences in gas consumption patterns between countries. Taking into account the publication activity, after 2008 we have observed a significant increase in interest in this area (10–20 publications a year [
8,
9], previously there were 1–2 publications a year).
Every market is made up of producers and consumers. In a broader context, forecasting the consumption of natural gas is an attempt to show the market potential. Thus, we are trying to solve two related problems—Can producers sell more and cheaper? Can demand keep pace with or exceed supply? On the gas market, we are dealing with a situation in which, on the one hand, we observe an increase in consumption. On the other hand, at some point (called peak), the production will start to decline. The market is therefore described by two types of models: one for production and the other for consumption. In our article, we will focus on the demand part of the natural gas market.
The most well-known model for predicting the production peak is the Hubbert curve. According to this theory, the production of any fossil fuel first increases due to the discovery of new resources and technology improvements, then reaches its peak and declines. The point at which each fossil fuel will achieve its peak has been predicted many times (e.g., Hubbert originally set the year 2000 for the oil peak). It is important to stress that the failed predictions are not due to the weakness of the theory, but to the fact that the parameters are burdened with uncertainty. The difficulties in building the model lie in the uncertain rates of discovery of new sources, development of mining technology, and consumption over time. Current research estimates peaks for coal in 2014, oil in 2022, and gas in 2030 [
10]. Other studies report a natural gas peak in 2035 with a ten to fifteen-year plateau [
11].
The Hubbert model [12] can be considered a pillar and one of the first studies of natural gas forecasting. Despite its simplicity, it worked quite well in long-term forecasts and was very popular in 1950–1970. Nowadays, it is still often used to forecast the oil peak. Several publications include significant uses of the Hubbert model in the 21st century for the gas peak. In [13], a multicycle Hubbert model is employed to forecast gas production in China and a Grey model to forecast consumption. Grey theory is especially useful for developing a model with a small amount of incomplete data, which is common in long-term forecasts. In [14], the Hubbert model with the Starzman modification was implemented to forecast gas demand in Poland. The researchers estimated the forecast error at 20%, which is high compared to classical methods. Mohr et al. [
15] present projections of fossil fuel production for China, USA, Canada, Australia and the whole world. They predict a gas peak for the world in 2050. Recent uses of the Hubbert model (2019) relate to forecasts for Ecuador [
16] and Pakistan [
17].
However, Cavallo in his paper [
18] made a critical assessment and showed that the Hubbert model has failed in situations of substantial economic change, especially for political or social reasons. The second major disadvantage is that it fits only forecasts for large countries, such as the USA, which are net producers and importers of gas.
In the gas forecasting literature, far more studies concern consumption than production. For smaller countries, an individual forecasting methodology of natural gas consumption, tailored to countries of a similar size, is more appropriate than the general Hubbert model. Relevant studies include: ref [
19] (decomposition method), ref [
20] (neural networks, Belgium), ref [
21] (Spain, stochastic diffusion models), ref [
22] (Turkey, econometric models), ref [
23] (Turkey, machine learning, neural networks), ref [
24] (Turkey, Gray models), ref [
25] (Argentina, aggregation of short- and long-run models). However, these models are usually country-specific, which makes them difficult to apply to other countries.
There are many forecasts of natural gas demand in the US, Canada, China (e.g., [
26,
27,
28]), and Australia, but for smaller countries their accuracy is worse than that of country-specific models. The trend of building country-specific forecasts is very visible in the literature, as policymakers are interested in the relationship between consumption and country-specific variables. Unfortunately, the literature on forecasting for middle markets, e.g., Poland, is quite scarce and includes: ref [
14] (logistic model), ref [
29] (neural networks, medium-term forecasting for a single urban area).
The main research gap is that long-term forecasting appears only in about 20% of studies [
8,
9,
30]. Many researchers create long-term forecasts using models with macroeconomic explanatory variables (GDP, population, unemployment rate, etc.), without distinguishing spatial patterns or types of gas consumption (private vs. industrial). Exceptions here are the forecasts for Iran [
31] (logistic regression and genetic algorithms) and Argentina [
25] (logistic models, computer simulation and optimization models), ref [
32] (regression, elasticity coefficients).
The construction of long-term forecasts should also consider the analysis of factors that are difficult to quantify, such as political regulations or changes in private consumption. For this purpose, attempts are made to approximate the behavior of consumers through the use of text streams on the web and sentiment analysis [
33]. This is a relatively new approach and there is no extensive research in the literature yet.
Short-term gas consumption forecasts, traditionally based on simple time series forecasting models, are more and more often made with artificial neural networks, in particular deep neural networks, including long short-term memory (LSTM) networks [
3,
5,
34].
Nevertheless, econometric and statistical methods are still the most often used in contemporary research. In the literature, the most popular groups are econometric models (e.g., [
35,
36]) and statistical models ([
19,
22]). Recent years have brought a lot of interest in artificial intelligence methods [
20,
24,
37,
38,
39]. The latest research (2018) [
9] compares the accuracy of more than 400 models of energy demand planning. The authors of this study note that statistical methods work better in short- and medium-term models, while in the long term, models based on computational intelligence perform much better, owing, among other things, to their advantage on poorly cleaned data. Typical data sets used in gas demand forecasts are gas consumption profiles, macro- and microeconomic data (e.g., households), and climatic data [
40].
Grey models have become quite popular recently [
24,
26,
27] and they give good results in the case of short time series. A relatively new method presented in this article is the use of the fuzzy approach in gas forecasting [
41,
42]. So far, there are few publications on the use of the fuzzy approach in the forecasting of natural gas consumption in relation to other forecasting methods. Significantly, in recent years the interest in this approach has been growing. It is commonly expressed in the literature that the accuracy of forecasts in each horizon can be significantly improved by aggregating forecasts made with different methods [
43,
44]. For gas consumption, a typical combination of forecasts is a mixture of regression and neural networks [
45,
46]. Regression extrapolates trends better, and neural networks give better estimation of changes in the behavior of the forecasted object over time. This is possible because the artificial intelligence approach allows for the evolution of parameters over time. For long-term forecasting Azadeh et al. [
47] proposed a methodology that integrates adaptive neuro-fuzzy inference systems with DEA and Fuzzy DEA to improve forecast for five Middle-Eastern countries.
An important element in creating forecasts is scenario analysis. Currently, only simple scenario-variant analyses are built. There are very few gas consumption forecasts with scenario analysis in the literature (e.g., [
35,
36]—Italy, regression analysis).
The concept of combining analogue forecasting with a fuzzy decision tree extends the method of medium-term forecasting of sales of clothing industry products proposed by Thomassey and Fiordaliso [
48]. This method is based on grouping products using k-means clustering and then classifying them with a decision tree. Analog forecasting is often used in forecasting new product sales based on past sales of similar products. This approach was presented, for instance, by Basallo-Triana, Rodríguez-Sarasty and Benitez-Restrepo [
49]. An analogous approach was applied also to the forecasting of sales of steel products [
50,
51].
The aim of the solution proposed in this article is to build a forecasting method based on analog forecasting and fuzzy decision trees. To the best of our knowledge, there are no such solutions in the literature on the subject.
3. Analog Forecasting
The concept of analog forecasting was introduced by Lorenz in 1969 [
52] in application to the states of the atmosphere. This concept was later also used in economic forecasting, e.g., [
53,
54,
55].
There are four main analog forecasting methods in the literature [
56]:
The method of biological analogies,
The method of spatial analogies,
Historical analogy method,
The method of space-time analogies.
In the remainder of the paper, we apply the space-time analogy method to selected countries in different periods of time. Assuming that the states of the economy can be described by a set of measurable quantitative indicators, we assume that x_T = f(S_T), where f is the transformation of the state of the economy into a vector of measurable indicators, S_T is a certain state of the economy in period T, and x_T is a vector of selected indicators.
Definition 1. Let S_T and S_τ be the states of the economy in one or two objects, described by the parameter vectors x_T and x_τ. Similar by analogy are the states of the economy for which x_T and x_τ differ only slightly from each other, given the assumed similarity index u* and the assumed error ε.
The most important methodological problem of analog forecasting is determining the similarity of states of the economy. For quantitative forecasts, quantitative similarity criteria should be used. Regarding time series, these are the shape similarity criterion and the level similarity criterion. In the case of similarity in shape, the critical value u* should be met, which indicates the existence of a sufficient similarity of the tested fragments of the time series. The objects for which the following condition is met:
u_{O,m} ≥ u*, (1)
where u_{O,m} is a measure of the similarity of objects O and m, are considered to be similar.
Regarding time series, the criterion for the similarity of the level of the vectors x_t in two objects O and m is:
|x_t^O − x_t^m| ≤ ε, (2)
where t is the time index.
The concept of similarity level commonly used in time series can be easily generalized to an n-dimensional state space (states of the economy or more generally—features of objects) in which objects will be similar if their distance in the n-dimensional feature space is small.
Generalizing the idea of simple time series similarity to the states of the economy S^O and S^m, where each observation is an n-dimensional real vector x_t, the similarity criterion takes the form:
‖x_t^O − x_t^m‖ ≤ ε, (3)
where ‖·‖ is the norm of a vector.
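As an illustration, the n-dimensional similarity criterion can be checked with a few lines of code; a minimal sketch, assuming the Euclidean norm (the function and variable names below are ours, not the paper's):

```python
import math

def similar(x_o, x_m, eps):
    # Similarity by analogy: the Euclidean norm of the difference between two
    # state vectors must not exceed the assumed error eps (illustrative sketch).
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_o, x_m)))
    return dist <= eps

# Two yearly indicator vectors for two hypothetical countries:
print(similar([0.21, 0.35], [0.22, 0.33], eps=0.05))  # True: states are analogous
```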
The generalization of the similarity criterion to n-dimensional vector allows the use of the k-means clustering algorithm. Its idea is to find objects close to each other in the n-dimensional feature space using the mean value of a group of similar objects.
Given a set of observations {x_1, x_2, …, x_m}, where each observation is an n-dimensional real vector, k-means clustering groups the m observations into k (≤ m) sets C = {c_1, c_2, …, c_k} so as to minimize the within-cluster sum of squares (i.e., variance). Formally, the objective is to find:
arg min_C Σ_{i=1}^{k} Σ_{x∈c_i} ‖x − μ_i‖², (4)
where μ_i is the mean of points in set c_i. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:
arg min_C Σ_{i=1}^{k} (1/(2|c_i|)) Σ_{x,y∈c_i} ‖x − y‖². (5)
The basic algorithm of the k-means method [
57] is as follows:
Determining the number of classes (clusters).
Selecting the class centers (centroids) by random selection of k observations, by selection of the first k observations from the set, or by a selection that maximizes the distance between the clusters.
Assigning points to the nearest centroids: each element is assigned to the class (cluster) to whose center it is closest (the measure of similarity here is the distance between the element and the centroid).
Calculating new cluster centers: most often the new center of a class is the point whose coordinates are the arithmetic means of the coordinates of the elements belonging to that class.
Repeating steps 3 and 4 until the convergence criterion is reached (usually when the allocation of points no longer changes or when the algorithm reaches the adopted number of iterations).
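The steps above can be sketched in a few dozen lines; this is a minimal illustrative implementation (random initialization, squared Euclidean distance), not the exact code used in the study:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    # Minimal sketch of the five steps above (illustrative, not production code).
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: k random observations as centers
    labels = None
    for _ in range(iters):  # step 5: repeat until convergence
        # step 3: assign each point to the class with the nearest centroid
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # allocation unchanged: converged
            break
        labels = new_labels
        # step 4: new center = arithmetic mean of the elements in the class
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

For two well-separated groups of country-year observations, the returned labels separate the groups regardless of the random initialization.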
To calculate the distance between observations, the Euclidean distance (Equation (6)), the Manhattan distance (Equation (7)), and the Minkowski distance (Equation (8)) are among the most popular measures. Let observations x and y be n-dimensional vectors. The Euclidean distance is defined as:
d(x, y) = (Σ_{i=1}^{n} (x_i − y_i)²)^{1/2}. (6)
Similarly, we can define the other measures: the Manhattan distance
d(x, y) = Σ_{i=1}^{n} |x_i − y_i|, (7)
and the Minkowski distance, which is a generalization of the Euclidean distance:
d(x, y) = (Σ_{i=1}^{n} |x_i − y_i|^p)^{1/p}. (8)
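For illustration, all three measures can be implemented in one function, since the Euclidean and Manhattan distances are special cases of the Minkowski distance:

```python
def minkowski(x, y, p=2):
    # Minkowski distance of order p; p = 2 gives the Euclidean distance
    # and p = 1 the Manhattan distance.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

print(minkowski((0, 0), (3, 4), p=2))  # 5.0 (Euclidean)
print(minkowski((0, 0), (3, 4), p=1))  # 7.0 (Manhattan)
```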
After obtaining the assignment of objects to classes based on a certain set of features, it is possible to make forecasts for new objects based on the similarity criterion determined for other features of objects belonging to each class.
4. Fuzzy Approach in Learning Decision Trees
4.1. Basic Concepts of Fuzzy Learning
Regardless of the continuous development of methods of obtaining knowledge from data, the classical crisp approach is insufficient to represent human thinking and perception. For this reason, the use of the fuzzy approach is becoming more and more common, and the proposed forecasting method therefore uses fuzzy set theory. First, we introduce some basic concepts, beginning with the dataset.
The training set is a bag (multiset) consisting of a set of instances (objects). Each instance is described by a vector of attribute values denoted as x = (a_1(x), …, a_n(x)). The bag schema gives the description of the attributes and their domains. In this article, the bag schema consists of a set of input attributes A = {a_1, …, a_n} and a target attribute y. Domain values of a_i are denoted by dom(a_i). All input attributes have infinite cardinalities, while the target attribute has the finite cardinality |dom(y)| = K, where dom(y) = {C_1, …, C_K}. Formally, the training set is S = {(x_1, y_1), …, (x_m, y_m)}, where x_j ∈ dom(a_1) × … × dom(a_n) and y_j ∈ dom(y). Given the training set S, the goal of the classification problem is to induce an optimal classifier with minimum generalization error.
Despite the rapid development of classification methods, crisp decision trees remain among the most used algorithms for this purpose. Unfortunately, the method has serious drawbacks. It does not deal well with datasets that mix numeric and categorical variables, and classical crisp trees are also very sensitive to incompleteness, imprecision, and ambiguity in data.
There are at least two main types of uncertainty: probabilistic (aleatory) and epistemic. Probabilistic uncertainty concerns the behavior of a process that arises from its natural, built-in randomness. Epistemic uncertainty, on the other hand, stems from limited knowledge and lack of information, i.e., from vagueness and ambiguity. Ambiguity occurs when the choice between two or more alternatives is unspecified. Vagueness, in turn, concerns the difficulty of making a precise distinction between states of nature [
58].
Fuzzy set theory, first introduced by Zadeh [
59], deals with epistemic uncertainty and provides a framework for study of vague phenomena. Fuzzy concept can be introduced into a classification problem if at least one of the input attributes is fuzzy.
Definition 2. Let X be a universe of discourse (UoD), representing a collection of objects denoted generically by x. A fuzzy set A in X is a collection of ordered pairs A = {(x, μ_A(x)) : x ∈ X}, where X is a non-empty set and μ_A : X → [0, 1] is a membership function that assigns to each x ∈ X a grade of membership μ_A(x).
There are several algorithms for induction of fuzzy decision trees. We implement the algorithm proposed by Yuan and Shaw [
60] because it can handle situations where numerical values can be represented as different types of fuzzy terms. The training procedure of the fuzzy decision tree model consists of the following steps:
partitioning,
fuzzification,
tree pruning.
4.2. Partitioning
Partitioning is the transformation of numerical attributes into linguistic variables. Each numerical attribute a needs to be represented as a linguistic attribute A. Let a be a numerical attribute whose Universe of Discourse is delimited by the known bounds of dom(a), such that dom(a) = [a_min, a_max] ⊂ R. The linguistic attribute A takes values from the domain dom(A) = {A_1, …, A_L}. The membership μ_{A_l}(a(x)) specifies the degree to which object x's attribute a is A_l.
Definition 3. A linguistic attribute A is called the linguistic attribute over a if dom(A) is a collection of fuzzified values of a.
Definition 4. The group of fuzzy sets A_l, for l = 1, …, L, is understood as the linguistic variable, and each fuzzy set A_l is a linguistic value of the linguistic variable A. Technically, dom(A) = {A_1, …, A_L}.
The partitioning process is thus the division of each dom(a) into intervals, i.e., the split of [a_min, a_max] into L overlapping intervals, creating the linguistic attribute A to describe a.
Three hyper-parameters impact the creation of the linguistic attribute A: the membership function μ, the number of partitions L, and the partitioning method. They indirectly influence the classification [61].
The membership function μ has the lowest impact on the accuracy of the forecast. The most popular membership functions are singleton, triangular, trapezoidal, and Gaussian. Yuan and Shaw [60] suggested a simple partitioning algorithm that uses a triangular membership function on attribute data. The high efficiency of this algorithm is confirmed by many studies and by the authors' own observations.
The choice of the number of partitions L has a direct impact on the accuracy of the model. The value of L is a trade-off between accuracy (bias) and generalization (variance). A low value of L will generate too few fuzzy sets representing the attribute, making the model too general. A high L will generate too many fuzzy sets, causing the model to overfit. The optimal number L has to be tuned for each problem.
The partitioning scheme should also be treated as a hyper-parameter. The most often used partitioning method is called grid partitioning, where the domain of a numerical attribute is divided into equal-length intervals. There are many other partitioning techniques, the most popular being k-means and fuzzy c-means. However, grid partitioning should always be the first approach to try, due to its simplicity and low cost.
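A minimal sketch of grid partitioning, assuming the centers of the L fuzzy sets are spread evenly over the attribute's known bounds (an assumption consistent with, but not stated in, the text):

```python
def grid_centers(a_min, a_max, L):
    # Grid partitioning sketch: L evenly spaced centers covering the attribute's
    # Universe of Discourse, so each interval has equal length.
    step = (a_max - a_min) / (L - 1)
    return [a_min + i * step for i in range(L)]

print(grid_centers(0.0, 100.0, 5))  # [0.0, 25.0, 50.0, 75.0, 100.0]
```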
In our method, each linguistic attribute A consists of the same number of intervals L, and each A_l has a triangular membership function:
μ_{A_l}(a) = max(0, 1 − |a − c_l|/Δ), (9)
where Δ is the (constant) distance between the centers of adjacent intervals. The only parameters that need to be determined are the set of centers {c_1, …, c_L}.
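A sketch of such a triangular membership function on an evenly spaced grid of centers (boundary shoulders are not modelled; the exact formula used in the study may differ):

```python
def tri_membership(a, centers, l):
    # Membership of crisp value a in the l-th triangular fuzzy set, assuming
    # evenly spaced centers (simplified sketch without boundary shoulders).
    delta = centers[1] - centers[0]  # constant spacing between adjacent centers
    return max(0.0, 1.0 - abs(a - centers[l]) / delta)
```

With centers [0, 5, 10], the value 2.5 belongs to the first two fuzzy sets with degree 0.5 each, so neighbouring sets overlap halfway and memberships sum to one.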
4.3. Fuzzification
Fuzzification aims to transform the crisp numerical attribute a into the linguistic attribute A. There are two approaches to this task: maximizing and holistic. Formally, the maximizing approach keeps only the strongest linguistic term,
A(x) = argmax_{A_l} μ_{A_l}(a(x)),
while the holistic approach keeps the whole vector of membership degrees,
A(x) = (μ_{A_1}(a(x)), …, μ_{A_L}(a(x))).
The method of fuzzification can be considered as another hyper-parameter of the method.
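The two fuzzification approaches can be sketched as follows, assuming evenly spaced triangular fuzzy sets as in the previous subsection; the function names are ours:

```python
def tri_membership(a, centers, l):
    # Triangular membership on an evenly spaced grid of centers (sketch).
    delta = centers[1] - centers[0]
    return max(0.0, 1.0 - abs(a - centers[l]) / delta)

def fuzzify_holistic(a, centers):
    # Holistic approach: keep the whole vector of membership degrees.
    return [tri_membership(a, centers, l) for l in range(len(centers))]

def fuzzify_maximizing(a, centers):
    # Maximizing approach: keep only the strongest linguistic term (its index).
    degrees = fuzzify_holistic(a, centers)
    return degrees.index(max(degrees))

print([round(d, 3) for d in fuzzify_holistic(4.0, [0.0, 5.0, 10.0])])  # [0.2, 0.8, 0.0]
print(fuzzify_maximizing(4.0, [0.0, 5.0, 10.0]))                       # 1
```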
Once the crisp attributes are converted into fuzzy ones, the process of induction of a fuzzy decision tree can start.
In our method we utilize the Fuzzy Iterative Dichotomizer 3 (FID3). The algorithm measures the classification ambiguity associated with each attribute and splits the data using the attribute with the smallest classification ambiguity.
Definition 5. For a given linguistic term A_l with membership function μ_{A_l} and a class C_k, the fuzzy subsethood S(A_l, C_k) measures the degree to which A_l is a subset of C_k:
S(A_l, C_k) = M(A_l ∩ C_k)/M(A_l),
where M is the cardinality measure defined as:
M(A) = Σ_{x∈X} μ_A(x).
Note that for a given pair of fuzzy sets A and B with membership functions μ_A and μ_B, respectively, the membership function of the intersection A ∩ B is defined as the minimum of the two individual membership functions:
μ_{A∩B}(x) = min(μ_A(x), μ_B(x)).
Note also that in our case the membership function for a class C_k is a crisp membership, i.e., μ_{C_k}(x) ∈ {0, 1}.
This measure can be used to determine the splitting criterion for an attribute. There are various splitting criteria; in our research, we choose classification ambiguity and fuzzy entropy.
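The cardinality and subsethood measures can be sketched as below, together with an entropy-style splitting criterion built on subsethoods normalized into a probability distribution (the normalization is our assumption for the sketch, not necessarily the paper's exact formula):

```python
import math

def cardinality(mu):
    # M(A): sigma-count of a fuzzy set given as a list of membership degrees.
    return sum(mu)

def subsethood(mu_a, mu_c):
    # S(A, C) = M(A ∩ C) / M(A), with min as the fuzzy intersection operator.
    m_a = cardinality(mu_a)
    if m_a == 0:
        return 0.0
    return sum(min(x, y) for x, y in zip(mu_a, mu_c)) / m_a

def fuzzy_entropy(mu_a, class_mus):
    # Entropy over the subsethoods of A in each crisp class, normalized so
    # they act as a probability distribution (our assumption for the sketch).
    s = [subsethood(mu_a, mu_c) for mu_c in class_mus]
    total = sum(s)
    if total == 0:
        return 0.0
    probs = [v / total for v in s if v > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A term fully contained in one class has zero entropy; a term split evenly between two classes has entropy 1 bit.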
Definition 6. The fuzzy entropy of a linguistic term A_l is calculated as follows:
E(A_l) = −Σ_{k=1}^{K} p_k log₂ p_k, where p_k = S(A_l, C_k)/Σ_{j=1}^{K} S(A_l, C_j).
Definition 7. The classification ambiguity of fuzzy attribute A, on fuzzy evidence F, denoted as G(A|F), is the weighted average of classification ambiguity over its linguistic values, calculated as follows:
G(A|F) = Σ_{l=1}^{L} w(A_l|F) G(A_l ∩ F),
where w(A_l|F) is the weight which represents the relative size of A_l ∩ F, defined subsequently:
w(A_l|F) = M(A_l ∩ F)/Σ_{j=1}^{L} M(A_j ∩ F).
4.4. Pruning and Inducing Fuzzy Decision Tree
Pruning a decision tree is a technique that involves the selective removal of certain branches of a tree in order to improve the tree's structure and overcome the problem of overfitting. Before presenting the tree-building algorithm, let us first consider the hyper-parameters responsible for pruning. Let n be the number of objects in the filtered fuzzy evidence F of a node and n_k the number of objects classified as C_k. The leaf control threshold θ_n specifies the minimum number of elements in the filtered fuzzy evidence F for a node. The fuzzy threshold θ_T determines the minimum share n_k/n of the most frequent value of the classification in F. The induction algorithm of the fuzzy decision tree is based on Definitions 3 through 7. The idea of the algorithm inducing the fuzzy decision tree is presented in Algorithm 1.
Algorithm 1. Algorithm for fuzzy decision tree induction |
Input: S – a training set, A – input feature set, y – target feature, θ_n, θ_T – thresholds for control of growth of the tree |
Output: Fuzzy Decision Tree FDT |
START |
1: Create fuzzy decision tree FDT with a single root node |
2: if A is empty or one of the thresholds θ_n, θ_T is reached then |
3: Mark FDT as leaf with the most common value of y in S as label |
4: Return FDT |
5: end if |
6: For each a ∈ A find the attribute a* with the smallest classification ambiguity |
7: for each outcome A_l of a* do |
8: Recursively call procedure with corresponding partition S_l |
9: Connect the root node to the returned subtree with an edge that is labelled as A_l |
10: end for |
11: Return FDT |
STOP |
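A compact illustrative version of the induction procedure is sketched below. For brevity it substitutes a weighted fuzzy entropy for the classification ambiguity measure of Algorithm 1 and assumes a simplified data layout (data[i][attr][term] holds membership degrees); it is not the exact algorithm configuration from the paper:

```python
import math

def entropy(class_weights):
    # Shannon entropy of a class-weight dictionary (weights need not sum to 1).
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in class_weights.values() if w > 0)

def build_fdt(data, labels, attrs, evidence, min_evidence=0.5, purity=0.9):
    # data[i][attr][term]: membership of instance i in a linguistic term;
    # evidence[i]: degree to which instance i reaches the current node.
    weights = {}
    for e, y in zip(evidence, labels):
        weights[y] = weights.get(y, 0.0) + e
    total = sum(weights.values())
    majority = max(weights, key=weights.get)
    # stopping rules: no attributes left, too little evidence, or pure enough
    if not attrs or total < min_evidence or weights[majority] / total >= purity:
        return majority  # leaf labelled with the majority class
    def weighted_entropy(a):
        # entropy of a split on attribute a, weighted by each term's evidence
        h = 0.0
        for term in data[0][a]:
            w = {}
            for i, y in enumerate(labels):
                d = min(evidence[i], data[i][a][term])  # min = fuzzy intersection
                w[y] = w.get(y, 0.0) + d
            h += sum(w.values()) / total * entropy(w)
        return h
    best = min(attrs, key=weighted_entropy)  # analogue of line 6 in Algorithm 1
    rest = [a for a in attrs if a != best]
    return {best: {term: build_fdt(data, labels, rest,
                                   [min(evidence[i], data[i][best][term])
                                    for i in range(len(data))],
                                   min_evidence, purity)
                   for term in data[0][best]}}
```

On a toy set where one fuzzified attribute perfectly separates two classes, the sketch returns a one-level tree with one leaf per linguistic term.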
Classification of a new instance: in a crisp decision tree, only one rule can be applied to each object in the training set. In a fuzzy decision tree, several rules can be applied; in contrast to the classical approach, an object is classified with different degrees of membership into several classes. The fuzzy classification procedure for a new instance consists of the following steps [60]:
Convert the FDT into a set of rules.
For each rule, calculate the membership of the object in the premise of the rule.
For each class, aggregate the memberships derived from all rules.
As a result, the object (new instance) is classified into several classes with different degrees. To obtain the final classification, the process of defuzzification needs to be completed (Section 4.3). A detailed description of our approach to classifying a new instance and building the forecast is given in Section 6.3.
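The three steps can be sketched as follows, with min used for rule firing and max for per-class aggregation; the rule representation and a simple max defuzzification are our simplifications:

```python
def classify(rules, memberships):
    # rules: list of (premise, class_label), where a premise maps attribute -> term;
    # memberships[attr][term]: fuzzified attribute values of the new instance.
    scores = {}
    for premise, cls in rules:
        firing = min(memberships[a][t] for a, t in premise.items())  # step 2
        scores[cls] = max(scores.get(cls, 0.0), firing)              # step 3
    return scores

rules = [({"gdp": "high"}, "c1"), ({"gdp": "low"}, "c2")]
instance = {"gdp": {"low": 0.3, "high": 0.7}}
scores = classify(rules, instance)
print(scores)                       # {'c1': 0.7, 'c2': 0.3}
print(max(scores, key=scores.get))  # c1  (simple max defuzzification)
```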
6. Methodology
6.1. Finding Analogies Using Clustering
The proposed forecasting method combines k-means clustering and fuzzy decision trees into a single framework that operates on panel data.
The k-means algorithm is an unsupervised learning method that assigns observations, based on the adopted distance measure, to a predetermined number of classes. The purpose of the k-means method here is to create groups of analogies. Determining the variables underlying the analogy for natural gas consumption is an important step in analog forecasting. For this purpose, two relative measures were selected: the energy intensity of the economy (Energy_GDP) and the share of natural gas in the consumption of primary energy carriers (NG_mix). The first characterizes the level of technological development (the negative correlation with GDP per capita mentioned in the previous section) and describes the energy intensity of the economy depending on the sectoral structure of GDP generation. The second indicator, the share of natural gas in the energy mix, takes into account two elements: the ease of access to natural gas resources (whether a country is a net exporter or importer) and a country's commitment to decarbonizing the economy by replacing other fossil fuels with natural gas.
Although the correlation analysis suggested it would not matter, we also tried to include the heating degree days index; however, the empirical verification did not bring positive results and did not improve the accuracy of classification and forecasting. Thus, consistent with the correlation analysis, climatic factors turn out not to be important for determining the level of natural gas consumption in the world. Finally, clusters for country-year observations were created based on two variables: Energy_GDP and NG_mix.
Subsequently, v-fold cross-validation was used to determine the number of clusters. The minimum number of clusters suggested by the automatic clustering algorithm is six and the optimal number is 12; however, the cost function (double log-likelihood) is further reduced up to about 15 clusters. The error in the training sample for normalized data was 0.047 for 15 clusters, 0.059 for 12 clusters, and 0.086 for six clusters. The characteristics of the obtained clusters are presented in
Table 4. The values in the Energy_GDP and NG_mix columns are the coordinates of the centers of individual clusters.
Detailed cluster analysis provides interesting observations on energy consumption and the level of natural gas use in the structure of primary energy carriers. For example, China in the years 1980–1994 was in cluster 3 (quite high energy intensity of the economy and moderate consumption of natural gas), and in the years 1995–2011 it was in cluster 2 (not very high energy intensity and low consumption of natural gas). Germany in the years 1980–1993 was in cluster 7 (low energy intensity of the economy and moderate consumption of natural gas), and then in the years 1994–2011 it was in cluster 15 (low energy intensity of the economy and higher consumption of natural gas). Similar observations can be made for other countries. France in the entire period 1980–2011 was in cluster 7, while the USA in the same period was in cluster 15. Japan in 1980–1983 was in cluster 12, and then in 1984–2011 in cluster 7. Russia throughout the period 1980–2011 belonged to cluster 13. Great Britain in the years 1980–1995 was in cluster 15, and then in the years 1996–2011 in cluster 11. Poland in the years 1990–1999 was in cluster 2, then in cluster 7. Brazil in 1980–2003 was in cluster 12, then in 2004–2011 in cluster 7. India was in cluster 2 throughout the entire period 1990–2011. Australia in 1980–2006 was in cluster 2, then in 2007–2011 in cluster 3. Switzerland in 1980–1998 was in cluster 8, then in 1999–2011 in cluster 2. It is visible that the analog approach can capture changes in the economic behavior of a country through changes in group membership. The synthetic indicators Energy_GDP and NG_mix partially capture the technological development of a country, and their combination also reflects the impact of increased use of alternative energy sources.
6.2. Fuzzy Decision Tree Learning
A decision tree is a supervised learning method in which assignment to previously created classes is based on selected explanatory variables. In the next step, a fuzzy decision tree was built. Its purpose was to assign a country, using selected variables, to the appropriate groups of analogies created by clustering. We started with the selection of explanatory variables for the fuzzy decision tree. As a rule, the chosen variables have to differ from those used for k-means clustering. The explanatory variables at this stage should reflect economic phenomena commonly measured in all countries of the world. These variables, as independent variables used to forecast future membership in clusters, must have individual long-term forecasts of their own.
Hence, we chose the variables for which strategic forecasts are usually created as a component of social and economic development plans in any country of the world. These are the forecasts of the country’s GDP and population, as well as the target structure of energy consumption reflected in the Energy mix indicator. The target structure of the Energy mix is actively supported by national governments through fiscal policy. For the sake of comparability between countries, we converted GDP to a relative value—GDP per capita. Finally, three explanatory variables were selected: GDP per capita, population, and the share of natural gas consumption in the Energy mix. The selected explanatory variables served as input variables in the learning of a fuzzy decision tree, with the consumption profiles as the explained variable. The construction of the fuzzy tree consisted of three steps:
Transformation of explanatory variables of crisp type into fuzzy variables.
Extraction of knowledge from a dataset containing explanatory variables and explained variable with the help of a decision tree.
Building a forecast based on the acquired knowledge.
The transformation of explanatory variables into their fuzzy versions is performed with the help of linguistic variables. In our case, the selected explanatory variables were used in the fuzzy decision tree learning process to classify observations from 1980–2011 into 15 classes, based on the labels previously assigned to each observation as a result of clustering with the k-means method (unsupervised learning). The accuracy of the best fuzzy decision tree classification, measured by the accuracy coefficient, was 80%.
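The first construction step, the transformation of a crisp explanatory variable into a linguistic one, can be illustrated with triangular membership functions. The term boundaries below are hypothetical, chosen only for the sketch (the paper does not specify them):

```python
def triangular(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# hypothetical linguistic terms for a normalized GDP per capita value
terms = {
    "low":    (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high":   (0.5, 1.0, 1.5),
}

def fuzzify(x):
    """Map a crisp value to the membership grade of each linguistic term."""
    return {name: triangular(x, *abc) for name, abc in terms.items()}

# a crisp value activates neighbouring terms with partial grades
grades = fuzzify(0.3)
```

With overlapping triangles laid out as above, the grades of adjacent terms sum to one, so no information about the crisp value is lost during fuzzification.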
The accuracy of the decision tree classification significantly depends on the parameters described in
Section 4 of this paper. Many combinations of hyper-parameters were tested during tree building. The method of hyper-parameter selection and a discussion of the results are presented in
Section 7.
6.3. Final Analog Forecasting of Natural Gas Consumption
The procedure described above creates country profiles using the k-means clustering method. Then the fuzzy tree determines the similarity of the territory for which the forecast is built to these groups of analogies. We will call this approach FAM (fuzzy analog model) in the rest of the article. Since we are dealing with a fuzzy classification, our model returns a collection of membership functions that define the similarity to the particular groups of analogies.
$$\mu(O) = \left\{ \mu_{cl}(O) \right\}_{cl=1}^{k},$$
where $\mu_{cl}(O)$ is the grade of membership of the object $O$ to the group of analogies $cl$.
The forecasting procedure based on FAM for a single object O consists of building a forecast for each group of analogies with a non-zero membership function. Then, all forecasts are aggregated by the mean weighted by the membership functions.
The easiest way to create a forecast for one group of analogies is to use the average values of energy consumption and the share of natural gas in the energy mix for the cluster members, and then multiply them by the forecast value of GDP. However, such a forecast is subject to a large error, because each country within a cluster has a different base value of natural gas consumption at the beginning of the forecasting period. We decided to use chain increments instead. This is a commonly used approach in analog forecasting [
56]. It is based on averaging the relative increments of the two indicators (Energy_GDP and Energy_mix) for a given cluster and the creation of a forecast starting from the base year—the last year of actual data.
The forecast for the object $O$ in the period $T+h$ for the group of analogies $cl$ is given by the function
$$\hat{y}^{cl}_{O,T+h} = \hat{y}^{cl}_{O,T+h-1} \cdot \bar{\Delta}^{cl}, \qquad \hat{y}^{cl}_{O,T} = y_{O,T},$$
where $\hat{y}^{cl}_{O,T+h}$ — the forecast for the object $O$ in the period $T+h$, $\bar{\Delta}^{cl}$ — the arithmetic mean of the relative chain increments of natural gas consumption in the cluster $cl$ to which the object $O$ belongs, based on economic indicators.
The forecast for the object $O$ in the period $T+h$ can thus be written as
$$\hat{y}^{cl}_{O,T+h} = y_{O,T} \cdot \left(\bar{\Delta}^{cl}\right)^{h},$$
where $\bar{\Delta}^{cl}$ is obtained as follows:
$$\bar{\Delta}^{cl} = \frac{1}{|cl|} \sum_{l \in cl} \frac{1}{end(l) - t_{i0}} \sum_{t=t_{i0}+1}^{end(l)} \Delta^{l}_{t},$$
where:
$\Delta^{l}_{t}$ — the relative increase of natural gas consumption of the country $l$ belonging to the cluster $cl$ in the period $t$,
$end(l)$ — the last year of the period in which the country $l$ belongs to the cluster $cl$,
$|cl|$ — the number of countries belonging to the cluster $cl$.
The relative chain increments for the country $l$ in the period $t$ are calculated according to the formula
$$\Delta^{l}_{t} = \frac{y_{l,t}}{y_{l,t-1}}, \qquad t = t_{i0}+1, \dots, end(l),$$
where:
$y_{l,t}$ — the gas consumption of the country $l$ belonging to the cluster $cl$ in the period $t$,
$t_{i0}$ — the first year of data for the country $l$.
As a result, k forecasts are built. In order to prepare the final forecast, it is necessary to aggregate them using the membership functions. This process is called sharpening. As shown above, there are, broadly speaking, two approaches. It is usually assumed that the aggregate forecast is equal to the forecast for the group of analogies with the highest membership function; however, this leads to the loss of information contained in the fuzzy set. In our method, we propose to build k forecasts, which are then sharpened based on the membership functions.
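The sharpening step, aggregation by a membership-weighted mean rather than winner-takes-all, can be sketched as follows (cluster names, forecasts, and grades are hypothetical):

```python
def sharpen(forecasts, memberships):
    """Aggregate per-cluster forecasts using the membership grades as
    weights: a weighted mean instead of picking the single best cluster."""
    num = sum(memberships[cl] * forecasts[cl] for cl in forecasts)
    den = sum(memberships[cl] for cl in forecasts)
    return num / den

# hypothetical per-cluster forecasts and fuzzy membership grades
forecasts = {"cl7": 120.0, "cl15": 150.0}
memberships = {"cl7": 0.75, "cl15": 0.25}
agg = sharpen(forecasts, memberships)  # pulled toward the dominant cluster
```

Unlike the winner-takes-all rule, which would return 120.0 here, the weighted mean retains the information carried by the smaller membership grade.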
Our hybrid forecasting process using analog forecasting and fuzzy decision trees is presented in
Figure 1 and Algorithm 2. After the necessary modifications of the input variables, the described forecasting process can also be used for territorial units of any country.
Algorithm 2. Algorithm for forecasting the annual consumption of natural gas |
Input: S(B)—collection of m observations containing attributes—annual economic indicators for ms territorial units from the period of T years, where m = ms·T |
Stest—set of forecasts of n economic indicators for ms territorial units for the period of T years |
Output: Sf—set of annual natural gas consumption forecasts for ms territorial units in h years |
START |
1: Calculate the growth rates of the economy’s energy consumption indicators and the share of gas in the Energy mix for each of the ms territorial units in the subsequent years of the period T |
2: Assign observations from the set S(B) to k subsets. Each subset contains selected energy carrier consumption indices, where the number of clusters k is determined by the cross-validation method for the declining cost function |
3: Calculate the average growth rates of the economy’s energy consumption indices and the share of gas in the Energy mix in the period T for each of the k clusters |
4: Fuzzify the s selected attributes (economic and social indicators) from the set S(B) |
5: Build the fuzzy decision tree describing the relationship between the s attributes and the k analogy groups (clusters) |
6: Fuzzify the input variables in the test set Stest |
7: Build a collection of membership functions that define the similarity to particular groups of analogies using the fuzzy decision tree obtained in step 5 |
8: Build gas consumption forecasts for each territorial unit by relative chain increments using average growth rates for all clusters to which each observation belongs |
9: Calculate the forecasts for each territorial unit by sharpening with the center-of-gravity method. Repeat the calculation for each unit in the subsequent years of the forecast |
STOP |