Statistical Comparison of Time Series Models for Forecasting Brazilian Monthly Energy Demand Using Economic, Industrial, and Climatic Exogenous Variables

Serrano, André Luiz Marques; Rodrigues, Gabriel Arquelau Pimenta; Martins, Patricia Helena dos Santos; Saiki, Gabriela Mayumi; Filho, Geraldo Pereira Rocha; Gonçalves, Vinícius Pereira; Albuquerque, Robson de Oliveira

doi:10.3390/app14135846

by André Luiz Marques Serrano ^1,†, Gabriel Arquelau Pimenta Rodrigues ^1,†, Patricia Helena dos Santos Martins ^2,†, Gabriela Mayumi Saiki ^1,†, Geraldo Pereira Rocha Filho ^1,3,†, Vinícius Pereira Gonçalves ^1,† and Robson de Oliveira Albuquerque ^1,*,†

¹ Professional Post-Graduate Program in Electrical Engineering (PPEE), Department of Electrical Engineering (ENE), Faculty of Technology, University of Brasilia (UnB), Brasília 70910-900, Brazil

² Post-Graduate Program in Economics (PPGECO), Department of Economics, School of Business, Economics, Accounting and Public Administration, University of Brasília (UnB), Brasília 70910-900, Brazil

³ Department of Exact and Technological Sciences (DCET), State University of Southwest Bahia (UESB), Vitória da Conquista 45083-900, Brazil

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2024, 14(13), 5846; https://doi.org/10.3390/app14135846

Submission received: 24 May 2024 / Revised: 26 June 2024 / Accepted: 27 June 2024 / Published: 4 July 2024

(This article belongs to the Special Issue Advanced Forecasting Techniques and Methods for Energy Systems)

Abstract:

Energy demand forecasting is crucial for effective resource management within the energy sector and is aligned with the objectives of Sustainable Development Goal 7 (SDG7). This study undertakes a comparative analysis of different forecasting models to predict future energy demand trends in Brazil, improve forecasting methodologies, and support the achievement of sustainable development goals. The evaluation encompasses the following models: Seasonal Autoregressive Integrated Moving Average (SARIMA), Exogenous SARIMA (SARIMAX), Facebook Prophet (FB Prophet), Holt–Winters, and Trigonometric Seasonality, Box–Cox transformation, ARMA errors, Trend, and Seasonal components (TBATS), and draws attention to their respective strengths and limitations. The findings reveal distinct capabilities among the models, with SARIMA excelling in tracing seasonal patterns, FB Prophet demonstrating its potential applicability across various sectors, Holt–Winters proving adept at managing seasonal fluctuations, and TBATS offering flexibility while requiring significant data inputs. Additionally, the investigation explores the effect of external factors on energy consumption by establishing connections through the Granger causality test and conducting correlation analyses. The accuracy of these models is assessed with and without exogenous variables, categorized as economic, industrial, and climatic. Ultimately, this investigation seeks to add to the body of knowledge on energy demand prediction and to support informed decision-making in sustainable energy planning and policymaking, thereby accelerating progress toward SDG7 and its associated targets. This paper concludes that, although FB Prophet achieves the best accuracy, SARIMA is the best-fitting model when residual autocorrelation is considered, and it predicts that Brazil will demand approximately 70,000 GWh in 2033.

Keywords:

energy; demand analysis; forecasting; time series

1. Introduction

Energy demand forecasting plays a key role in ensuring the efficient allocation and management of resources within the energy sector. It also helps achieve sustainable development goals, particularly Sustainable Development Goal 7 (SDG 7) and its targets [1].

SDG7 comprises several targets: ensuring universal access to affordable, reliable, and modern energy services (SDG 7.1), increasing the share of renewable energy in the global energy mix (SDG 7.2), doubling the global rate of improvement in energy efficiency (SDG 7.3), encouraging international cooperation for clean energy research and technology (SDG 7.4), and improving the infrastructure for sustainable energy services in developing countries (SDG 7.5). To this end, Table 1 summarizes the actions taken by Brazil to reach these targets.

Accurate predictions about the international energy outlook are essential to enable utilities, policymakers, and stakeholders to make informed decisions that support these targets. Long-term trends, such as socioeconomic factors and technological advances, influence energy consumption patterns over extended periods and are aligned with SDG7’s goal of increasing access to modern energy services (SDG 7.1) and enhancing energy efficiency (SDG 7.3). Short-term seasonal factors, including variations in weather conditions and industrial production cycles, also affect energy demand fluctuations. These are fundamental considerations when seeking to achieve the targets of SDG7, particularly with regard to the integration of renewable energy sources (SDG 7.2) and infrastructural development (SDG 7.5).

Furthermore, unexpected events such as economic fluctuations and policy changes highlight the importance of robust forecasting methods to support international cooperation in clean energy research and technology (SDG 7.4). By selecting appropriate forecasting techniques, utility companies and researchers can contribute to global efforts to enhance access to clean energy and create a sustainable infrastructure for renewable energy (SDG 7.A, 7.B).

Accurate knowledge of energy consumption enables cities to formulate effective strategies for energy management, advancing the development of smart cities [2]. By analyzing consumption patterns, authorities can identify peak demand periods, allocate resources efficiently, implement demand response programs to alleviate strain on the energy grid, and develop sustainable urban planning initiatives. This approach may also be used in data centers and cloud environments, leading to improved energy efficiency and reduced operational costs [3].

As the global population grows, urbanizes, and industrializes, the demand for electricity rises, introducing uncertainty regarding future consumption patterns [4]. Notably, the recent growth in the use of electric vehicles has made this a global concern [5]. Energy planners and utility providers are therefore challenged to accurately predict and meet future energy needs. This is especially true for developing countries [6], such as Brazil.

Due to the relevance of this field, researchers have focused on time series models for forecasting electricity consumption in Brazil [7,8,9,10,11,12]. None of them, however, forecasts the overall energy consumption in Brazil over the time frame proposed in this work, which extends to 2033. They are either sector-specific, covering industrial or residential demand, for example, or their forecast horizon is shorter than that of this work. Hence, this study addresses these limitations by producing a long-term forecast of Brazil’s energy consumption, thereby contributing to Brazilian policymaking and global energy research.

The primary objective of this research is to compare the performance of various forecasting models in predicting energy demand in Brazil. In doing so, it supports the targets of SDG7 by seeking ways to improve energy demand forecasting methods and by forecasting energy consumption in Brazil up to 2033. More specifically, this study evaluates the effectiveness of the following forecasting models: Seasonal Autoregressive Integrated Moving Average (SARIMA) and Exogenous SARIMA (SARIMAX) [13], Facebook Prophet (FB Prophet) [14], Holt–Winters [15], and Trigonometric Seasonality, Box–Cox transformation, ARMA errors, Trend, and Seasonal components (TBATS) [16]. The non-seasonal Autoregressive Integrated Moving Average (ARIMA) model is not evaluated because the studied time series is seasonal, which that model cannot handle.

These models are selected based on a literature assessment. Facebook Prophet is a time series model that has been demonstrated to have good accuracy for predicting energy demand [17,18]. Nonetheless, as seen in Section 2, it has not been as extensively explored as other models. Conversely, ARIMA-based models are broadly applied in the energy sector [19]. Holt–Winters and TBATS have achieved better accuracy than ARIMA models in specific energy scenarios [20,21]. As seen in Section 2, this is the first study to evaluate and compare these models concurrently.

For the optimal use of these models, however, specific characteristics of the energy consumption time series must be evaluated. These features, such as periodicity of seasonality and stationarity of the time series, are assessed with statistical tests and passed as parameters for the models.

1.1. Contributions and Limitations of the Work

By conducting a comparative analysis of SARIMA, FB Prophet, Holt–Winters, and TBATS, this research seeks to determine their respective strengths and weaknesses by investigating the complex dynamics of energy demand in Brazil, along with an estimation of energy demand in the country in 2033. It also contributes by updating the energy consumption forecasting in Brazil, with better-assessed accuracy than previous recent studies, as seen in Section 2.3.

This study aims to offer practical recommendations for enhancing energy demand forecasting since this will lead to more efficient resource management within the energy sector and, thus, assist in achieving sustainable development goals.

Additionally, this research contributes to the understanding of time series model biases, such as overestimation or underestimation, providing a nuanced perspective on model selection. This work, thus, serves as a robust framework for researchers aiming to improve forecasting accuracy in diverse fields. We also make available the forecasted energy consumption values for Brazil up to 10 years ahead using the four time series models, providing an important benchmark for future validation as actual data become available.

This work also proposes a statistically robust methodology that rigorously evaluates time series, patterns, and anomalies. This methodology is not limited to energy consumption forecasting but is versatile and can be applied to any time series application, enhancing its utility across various domains.

As limitations, this study uses solely economic, industrial, and climatic indexes as exogenous variables. Other categories of variables could also be useful in describing energy patterns, such as demographic data. Such a study could explore the relationship between energy consumption and population growth rate, age distribution of the population, and urbanization rate.

Another limitation refers to the restricted scope of the data and of the time series models evaluated. This work forecasts exclusively the energy consumption, not production. It also assesses only the models SARIMA/X, FB Prophet, Holt–Winters, and TBATS, not encompassing promising models such as Temporal Fusion Transformers (TFTs), N-BEATS, DeepAR, and Gated Recurrent Units.

1.2. Organization of the Work

The remainder of this paper is structured as follows. Section 2 reviews the literature on time series forecasting and its applicability in the energy sector, reinforcing the novelty of this work through bibliometric analysis. Section 3 presents the material and the methodology used. Section 4 examines patterns and the results of statistical tests on the time series of energy consumption in Brazil. Section 5 compares the different models’ accuracy and time performance, presenting forecast values for up to ten years. Section 6 concludes the paper.

2. Literature Review

Devising a sustainable energy plan and forecasting energy demand both underline the importance of managing energy demand in line with long-term ecological, societal, economic, and industrial sustainability goals [22]. By combining energy demand forecasting with sustainable energy planning, societies can progress toward a future that is both sustainable and equitable.

Therefore, energy demand forecasting is critical to energy planning and management, enabling stakeholders to make informed decisions regarding resource allocation, infrastructure development, and policy formulation. Energy demand forecasting methods can be broadly categorized into qualitative and quantitative approaches. As documented by [23], within quantitative methods, time-series models are widely used for energy demand forecasting because they capture the temporal patterns and trends inherent in energy consumption data.

Although this study focuses on energy demand, forecasting models have also been used for energy production, such as from wind power [24], including the use of state-of-the-art approaches, such as TFT [25,26].

Several studies have compared the performance of time-series models in different contexts and for various applications. Ref. [27] conducted a comprehensive review of forecasting methods, including classical and modern approaches. Similarly, ref. [28] used a Nonlinear Autoregressive (NAR) neural network to forecast the next decade of energy demand based on a publicly available data set for global energy consumption. Apart from energy, SARIMA, for example, has also been used for predicting traffic flow [29], and network traffic generated from an agricultural environment [30].

Energy demand forecasting has been a significant research area, with various methods proposed for accurate predictions. Neural networks, such as Nonlinear Autoregressive neural networks, can effectively handle statistical, empirical, and theoretical problems, as presented by [31].

Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs) have been used to forecast power consumption, achieving high accuracy, according to [32]. Electricity price forecasting is also relevant in this context [33].

Developing countries face challenges in achieving low-carbon and sustainable economic growth. According to [23], artificial neural networks (ANNs) hybridized with meta-heuristic techniques are superior in load forecasting. Probabilistic forecasting models, such as DeepAR and Deep State Space (DeepState), perform better for longer prediction horizons, as reported by [27]. Machine learning (ML) techniques are widely used, particularly for short-term electricity forecasting, while engineering-based models cover long temporal horizons and appliance consumption, according to [34]. In energy generation, forecasting models have also been used for predicting parameters of electrical energy production, such as wind speed [35,36].

Existing studies on energy demand forecasting methods in Brazil have explored various models and approaches. The Holt–Winters method, Seasonal ARIMA model, dynamic linear model, and autoregressive neural network model have been compared for forecasting electric energy consumption in the Brazilian industrial sector [37]. Regression models and recurrent neural networks have been evaluated for predicting electricity consumption in the northern region of Brazil, with the linear regression model using moving average and the recurrent neural network model performing well [38].

Time-series forecasting models such as Holt–Winters, SARIMA, dynamic linear model, TBATS, neural network autoregression, and multilayer perceptron have been applied to forecast industrial electricity consumption in Brazil, with the multilayer perceptron model showing the best performance [10]. Gray-type models, including the optimized nonlinear gray Bernoulli model and the nonlinear gray Bernoulli model with particle swarm optimization, have also been compared with classic time series models like ARIMA, with the models showing equal predictive performance [39].

Considering the relevance of the energy sector, which is deemed as critical infrastructure, for a country’s sustainable development, industrial productivity, and economic growth, the cybersecurity aspects of this sector should also be assessed and improved. A cyber threat can disrupt energy operations, compromise data integrity, and even pose risks to public safety. In that regard, common attack vectors in the energy sector have been studied [40], along with possible regulatory outcomes [41].

2.1. Bibliometric Review

A bibliometric analysis is conducted to examine the publication domain surrounding electric energy demand forecasting. To support this analysis, the Scopus database is used. The search terms presented in Listing 1 result in 3549 documents that are used in the analysis of this section.

Listing 1. Search terms used in Scopus for publication count per target country.

(time-series OR “time series”) AND (forecast OR forecasting OR
predicting OR predict) AND energy AND (electric OR electrical
OR electricity)

To keep the analysis close to the scope of this work, the subsequent investigations consider exclusively documents in at least one of the areas of Engineering, Energy, Computer Science, Mathematics, or Decision Sciences. This results in 2882 documents.

As electric energy consumption and production patterns vary among economies and geographical regions, forecasting studies usually target a specific country, as does this work, which focuses on Brazil. To assess how often the BRICS countries are the subject of similar studies, the filtered documents are further filtered, one country at a time, by the presence of each BRICS country in the document title; the number of publications targeting each country is depicted in Figure 1a. China leads the publication count, followed by India and Brazil, which have similar counts.

Although Brazil is a large country with significant energy resources, few publications target it compared to China.

As this work is based on the SARIMA/X, FB Prophet, Holt–Winters, and TBATS forecasting models, the frequency at which these models are used is also evaluated for both general and energy purposes. To achieve this, Listing 2 shows the search terms used in Scopus, including the filter for energy application and the terms for searching documents that use all models simultaneously.

Listing 2. Search terms used in Scopus for publication count per time series model.

(fb OR facebook) AND prophet / AND energy AND (electric OR
electrical OR electricity)

(sarima OR sarimax) / AND energy AND (electric OR electrical OR
electricity)

holt AND winters / AND energy AND (electric OR electrical OR
electricity)

tbats / AND energy AND (electric OR electrical OR electricity)

tbats AND holt AND winters AND (sarima OR sarimax) AND prophet
AND (fb OR facebook)

The results, shown in Figure 1b, suggest that no study in the Scopus database has compared all of these models simultaneously for any application, indicating the novelty of this work.

2.2. Discussion on the Strengths and Limitations of SARIMA, FB Prophet, Holt–Winters, and TBATS Models

The SARIMA model is a powerful tool for energy demand forecasting. It has been used in various studies to forecast electricity consumption, generation, peak load, and installed capacity [42]. It incorporates seasonal factors, whilst Exogenous SARIMA (SARIMAX) also supports additional regression with exogenous variables, which helps improve forecasting accuracy and reduce error values [43]. It outperforms simpler autoregressive integrated moving average-based techniques regarding forecasting accuracy [44].

The SARIMA model is particularly effective in capturing vital seasonality components in demand-related time series [45]. However, SARIMA may face challenges when dealing with irregular seasonal components in electricity demand. Despite this limitation, SARIMA has been shown to significantly improve the accuracy of predictions, leading to better management of electrical distribution.

Facebook Prophet is a time-series model used for energy demand forecasting. It has been applied in various studies to predict energy consumption in different sectors, such as manufacturing companies in the food sector [28]. The model has shown promising results in predicting energy consumption of electricity, water, and diesel fuel.

However, it is essential to note that the model’s performance may vary depending on the specific dataset and context. In the context of forecasting Italian electricity spot prices, combining seasonal and trend decomposition with Facebook Prophet has been found to improve forecast accuracy and reduce error rates [46]. The model considers factors like seasonality, climatic parameters, and special events to enhance projections.

The Holt–Winters model is a time-series forecasting model used in energy demand forecasting. It has several strengths and weaknesses. One of its strengths is that it can capture seasonal patterns in the data, making it suitable for forecasting energy demand with seasonal variations, according to [31].

Additionally, the model can handle nonlinear characteristics in the data, which is significant for accurately predicting energy consumption, as shown by [47]. However, the Holt–Winters model may have limitations when the data patterns change over time or when there are sudden shifts in energy demand, as noted by [48]. According to [49], it may also struggle to accurately forecast energy demand during significant disruptions, such as the COVID-19 pandemic. Despite these weaknesses, the Holt–Winters model has been shown to outperform other models, such as ARIMA, in terms of accuracy for energy demand forecasting [50].

Furthermore, the TBATS model is characterized by its flexibility in handling various seasonalities and non-linear trends, making it suitable for capturing intricate patterns in energy consumption [32]. Additionally, the incorporation of Box–Cox transformations in TBATS enables it to effectively deal with non-normality in the data, leading to more precise predictions [43]. Moreover, TBATS can handle missing values and outliers commonly encountered in energy demand data without compromising the accuracy of the forecasts [42].

However, certain limitations exist when using TBATS for energy demand prediction. One such restriction is that an adequate amount of historical data is required to accurately capture the seasonal and trend patterns [34]. Furthermore, TBATS might encounter obstacles in generating long-term projections, as it relies on the assumption that future patterns will resemble those observed in the past [42]. Lastly, TBATS is unlikely to perform optimally under sudden changes or disruptions in the energy system, as it assumes a relatively stable and predictable environment.

Table 2 provides a comparative overview of four prominent time series forecasting models: SARIMAX, Facebook Prophet, Holt–Winters, and TBATS. Each model entry details its strengths and limitations, highlighting its capabilities alongside potential drawbacks. Additionally, the table identifies gaps in the current literature for each model.

These gaps point toward potential areas for future research, such as improving SARIMAX’s handling of irregular seasonality or optimizing Prophet for specific energy demand datasets. The table therefore summarizes the key characteristics of these forecasting models and pinpoints where further research is necessary to optimize their performance in energy demand forecasting.

2.3. Comparison with Other Works

To establish an accuracy comparison base, the literature was searched for studies that used time series forecasting applied to energy. The common accuracy metric used is the Mean Absolute Percentage Error (MAPE), which is better defined in Section 3.

Reference [51] states that a MAPE value of less than 10% is considered a Highly Accurate Prediction (HAP), while values between 10% (inclusive) and 20% (exclusive) fall into the category of Good Prediction (GPR). Additionally, predictions with MAPE values between 20% (inclusive) and 50% (exclusive) are deemed Reasonable Predictions (RP), while those equal to or greater than 50% are labeled as Inaccurate Predictions (IPRs). All values presented in Table 3, which detail the accuracy of these studies, are classified as HAP.
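The thresholds above can be expressed as a small lookup. The following is a minimal sketch; the function name `classify_mape` is hypothetical and is not taken from [51] or from this study.

```python
def classify_mape(mape_percent: float) -> str:
    """Map a MAPE value (in percent) to the accuracy category described in the text."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    if mape_percent < 10:
        return "HAP"  # Highly Accurate Prediction: below 10%
    if mape_percent < 20:
        return "GPR"  # Good Prediction: 10% (inclusive) to 20% (exclusive)
    if mape_percent < 50:
        return "RP"   # Reasonable Prediction: 20% (inclusive) to 50% (exclusive)
    return "IPR"      # Inaccurate Prediction: 50% or more


# Illustrative values only; they are not results from Table 3.
for value in (3.2, 10.0, 19.99, 20.0, 50.0):
    print(value, classify_mape(value))
```

The boundary values 10, 20, and 50 fall into the higher-error category, matching the inclusive and exclusive bounds stated above.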

Some of the models presented in Table 3, and their acronyms, are Regression–SARIMA–generalized autoregressive conditional heteroskedastic (Reg–SARIMA–GARCH), methods based on Box–Cox Transformation Quantile Regression via Normal Distribution (N-BCQR), the Error-Correction State Space Model (ECSTSP), and a hybrid model based on seasonal adjustment (SEEMDGESN).

To the best of our knowledge, the study by [38] is the most recent to employ MAPE as an evaluation metric for forecasting energy in Brazil. However, despite its recent publication, other older studies have achieved superior performance in terms of MAPE values [58,59]. This presents an opportunity for our work to enhance the accuracy of energy forecasting in Brazil compared to recent studies.

Also, this study makes several contributions that distinguish our research in relation to these previous works. We propose a time series analysis methodology, which is conducted prior to the actual forecasting. This methodology enables pattern recognition within the data and assists in model selection and model parameters specification, making it applicable beyond energy consumption to any time series domain.

Furthermore, the assessment of these forecasting models does not rely solely on accuracy, but also on a rigorous statistical assessment of model fit through the Ljung–Box test on residuals, ensuring robust validation. We also publish predicted data extending up to 10 years into the future, providing an essential benchmark for future validation as actual data becomes available. Moreover, we incorporate a wide range of exogenous factors, such as climatic, industrial, and economic variables, to capture the complex dynamics influencing energy demand in Brazil.

3. Materials and Methods

This section describes the historical data used and the statistical analysis performed on the resulting dataset, as depicted in Figure 2.

3.1. The Dataset

The dataset used in this study is composed of data from different sources, in the time range between January 2004 and November 2023. The energy consumption data are obtained from the Energy Research Company (EPE). The economic and industrial variables are extracted from the Institute of Applied Economic Research (IPEA), except for the number of registered electric vehicles, which is obtained from the Brazilian Association of Electric Vehicles (ABVE). The source of the climatic variables is the National Institute of Meteorology (INMET).

The target variable, which is the focus of this study, is the energy consumption within the territory of Brazil, measured in megawatt-hours (MWh). The independent variables are categorized as industrial, climatic, and economic. The industrial variables are the production of assembled automobiles (units), oil production (average daily quantity, in barrels), capital goods production (quantum index, taking 2012 as the base year), consumer goods production (quantum index, taking 2012 as the base year), intermediate goods production (quantum index, taking 2012 as the base year), fertilizer production (tons), industrial wage bill (percentage, taking 2006 as the base year), sales of motor vehicles in the domestic market (units), and registration of electric cars (units).

The climatic variables are measured hourly in 473 municipalities across all 26 states and the Federal District of Brazil. In the consolidated dataset, the hourly values are grouped by month and averaged, encompassing all municipalities. The variables are total precipitation (mm), atmospheric pressure at station level (mb), maximum atmospheric pressure in the previous hour (aut) (mb), minimum atmospheric pressure in the previous hour (aut) (mb), solar global radiation (kJ/m²), air temperature-dry bulb (°C), dew point temperature (°C), maximum temperature in the previous hour (aut) (°C), minimum temperature in the previous hour (aut) (°C), maximum dew point temperature in the previous hour (aut) (°C), minimum dew point temperature in the previous hour (aut) (°C), maximum relative humidity in the previous hour (aut) (%), minimum relative humidity in the previous hour (aut) (%), relative humidity (%), wind direction (gr) (° (gr)), maximum wind gust (m/s), and wind speed (m/s).
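The monthly consolidation of the hourly climatic readings can be sketched as follows. This is an illustration under stated assumptions: the `(timestamp, station_id, value)` record layout is hypothetical, and every station-hour is pooled with equal weight, whereas the study may instead have averaged per municipality before averaging across municipalities.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_means(records):
    """Average hourly readings by calendar month, pooling all stations.

    `records` is an iterable of (timestamp, station_id, value) tuples.
    This layout is hypothetical and stands in for the station files.
    """
    buckets = defaultdict(list)
    for timestamp, _station, value in records:
        if value is None:  # skip missing sensor readings
            continue
        buckets[(timestamp.year, timestamp.month)].append(value)
    # One averaged value per (year, month), in chronological order.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Made-up air temperature readings (°C) from two hypothetical stations.
sample = [
    (datetime(2023, 1, 1, 0), "A001", 24.0),
    (datetime(2023, 1, 1, 1), "A002", 26.0),
    (datetime(2023, 2, 1, 0), "A001", 30.0),
    (datetime(2023, 2, 1, 1), "A002", None),
]
result = monthly_means(sample)
print(result)
```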

The economic variables are the Gross Domestic Product (GDP, measured in millions of Brazilian reais, R$), changes in prices of exports (%, taking 2018 as the base year), changes in prices of imports (%, taking 2018 as the base year), General Price Index-Domestic Availability (GPI-DA, or IGP-DI in Portuguese, taking 1994 as the base year), General Market Price Index (GMPI, or IGP-M, taking 1994 as the base year), National Consumer Price Index (NCPI, or INPC, taking 1993 as the base year), and Extended National Consumer Price Index (ENCPI, or IPCA, taking 1993 as the base year).

The GPI-DA tracks price variations in various products, including commodities, durable consumer goods, and services. Prices are collected between the 1st and the 30th of the reference month for this index. In contrast, the GMPI collects price data between the 21st of the previous month and the 20th of the reference month. While both indices measure price fluctuations in similar products and services, they assign different weights to various categories. The GMPI is commonly utilized for rent adjustments and public tariffs.

The NCPI and ENCPI track price changes for a basket of goods and services Brazilian households consume. The NCPI focuses explicitly on families with incomes ranging from 1 to 5 minimum wages, while the ENCPI covers all income levels. Both indices are inflation indicators in Brazil and are widely used by policymakers and economists to monitor changes in the cost of living.

3.2. Data Analysis Methods

All these variables are combined and then analyzed, as represented in Algorithm 1. Firstly, it is necessary to select the parameters for defining the time series models. These models require the specification of autoregression and moving average parameters, which are obtained with statistical tools like autocorrelation and partial autocorrelation functions. Additionally, the time series must be evaluated for the presence of a unit root, which is achieved with the Augmented Dickey–Fuller test. Finally, the seasonality of the energy consumption is assessed with both the autocorrelation function and a Kruskal–Wallis test.

Algorithm 1 Data analysis

Require: time_series
Require: exogenous_variables[]
1: procedure is_seasonal(data)
2:     detrended_data ← detrend(data)
3:     if (autocorrelation_confirms(detrended_data) AND kruskal_wallis_confirms(detrended_data)) then
4:         return True
5:     else
6:         return False
7:     end if
8: end procedure
9: procedure granger_variables(time_series, variables[])
10:     results := []
11:     for each var in variables[] do
12:         if granger_causality(var, time_series) then
13:             results.append(var)
14:         end if
15:     end for
16:     return results
17: end procedure
18: if is_seasonal(time_series) then
19:     components ← STL_decomposition(time_series)
20: end if
21: diff_order ← n_diff(time_series)
22: autoregressive_order ← PACF(time_series)
23: movingaverage_order ← ACF(time_series)
24: granger_var ← granger_variables(time_series, exogenous_variables + components)
25: correlation_matrix(granger_var)

Then, the persistence, or long-term memory, of the energy consumption time series is examined. This analysis shows how the time series behaves in relation to its previous values and whether it presents random walk patterns.

Finally, statistical analyses such as Granger causality tests and correlation studies were executed to further examine the relationship between external factors and energy consumption. These statistical methods aid in pinpointing significant variables and quantifying their effect on energy demand fluctuations, thereby boosting the forecasting accuracy of the models.

3.3. Models Evaluation and Forecasting

The forecasting models have undergone training and testing phases to evaluate the methodology. These models were trained on historical data, with a specific time frame set aside for testing purposes, as depicted in Algorithm 2. The training process involves fine-tuning model parameters and assessing performance metrics through techniques like cross-validation. The Mean Absolute Percentage Error (MAPE), Mean Percentage Error (MPE), Root Mean Squared Error (RMSE), and the RMSE normalized by the range between the minimum and the maximum values (NRMSE_minmax) are used to assess the precision and efficiency of the forecasting models.

Algorithm 2 Forecasting

Require: time_series
Require: granger_var

test_data ← time_series[-24:]
train_data ← time_series[:-24]
smoothed_data ← smooth(time_series)

procedure optimal_case(time_series, exogenous)
    best_MAPE := {}
    optimal_window := {}
    for each model in models do
        best_MAPE{model} ← 100
    end for
    for i = 1 to 6 do
        for each model in models do
            train_window ← train_data[-i * 36:]
            model.train(train_window, exogenous)
            prediction ← model.predict(24 months)
            MAPE ← get_mape(prediction, test_data)
            if MAPE < best_MAPE{model} then
                best_MAPE{model} ← MAPE
                optimal_window{model} ← i * 36
            end if
        end for
    end for
    return optimal_window
end procedure

optimal_original ← optimal_case(time_series, None)
optimal_smoothed ← optimal_case(smoothed_data, None)
optimal_original_exo ← optimal_case(time_series, granger_var)
optimal_smoothed_exo ← optimal_case(smoothed_data, granger_var)

for each model in models do
    model.train(optimal_original)
    model.predict(120 months)
    model.train(optimal_smoothed)
    model.predict(120 months)
end for

The MAPE offers a measure of the average percentage difference between the predicted and actual values, offering perspective into the relative magnitude of errors while inherently accounting for the scale of the data. As a complement, MPE provides an indication of directional bias by measuring the average percentage difference between the predicted and actual values without absolute value transformation, thereby discerning whether predictions consistently overestimate or underestimate actual values.

Furthermore, RMSE offers a measure of the typical magnitude of errors, while penalizing larger errors. Finally, NRMSE_minmax normalizes the RMSE by the range of actual values, allowing for a comparison of predictive accuracy relative to the variability of the data. These metrics are calculated by Equations (1)–(4), in which $y$ represents the actual data, $\hat{y}$ represents the predicted value, and $n$ is the number of observations.

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \tag{1}$$

$$\mathrm{MPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left( \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right) \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}} \tag{3}$$

$$\mathrm{NRMSE}_{\mathrm{minmax}} = \frac{\mathrm{RMSE}}{\max(y) - \min(y)} \tag{4}$$
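As an illustration of Equations (1)–(4), the following minimal sketch computes the four metrics in plain Python. The function names and the three sample values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study's data.

```python
import math


def mape(y, y_hat):
    """Mean Absolute Percentage Error, Equation (1), in percent."""
    return 100.0 / len(y) * sum(abs((a - p) / a) for a, p in zip(y, y_hat))


def mpe(y, y_hat):
    """Mean Percentage Error, Equation (2); negative when predictions exceed actuals."""
    return 100.0 / len(y) * sum((a - p) / a for a, p in zip(y, y_hat))


def rmse(y, y_hat):
    """Root Mean Squared Error, Equation (3)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def nrmse_minmax(y, y_hat):
    """RMSE normalized by the range of the actual values, Equation (4)."""
    return rmse(y, y_hat) / (max(y) - min(y))


# Hypothetical values: one over-prediction, one under-prediction, one exact hit.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mape(actual, predicted))          # average relative error size
print(mpe(actual, predicted))           # sign indicates the direction of the bias
print(rmse(actual, predicted))
print(nrmse_minmax(actual, predicted))
```

In this toy case, the over-prediction on the smallest value outweighs the under-prediction in relative terms, so the MPE is negative even though the two absolute errors are equal.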

These metrics elucidate each model’s capability to predict energy demand. In addition to assessing the individual performance of the forecasting models, this study inquires about the influence of external factors on energy consumption trends. External variables, selected according to the Granger tests, are examined alongside the forecasting models to assess their impact on energy demand patterns.

For the section of the work in which we evaluate the impacts of exogenous variables, SARIMAX is used instead of SARIMA. The multivariate models are run on SARIMAX and FB Prophet exclusively, as Holt–Winters and TBATS do not support this type of forecasting.

In addition to the accuracy metrics, the models’ fit is also evaluated, with the Ljung–Box test. This test evaluates whether the model has successfully captured the time series structure by checking for remaining autocorrelations in the residuals. If the residuals show no significant autocorrelation, the model is considered a good fit.

The accuracy and fitness of the models are evaluated for different training window sizes, and the optimal scenario is selected for forecasting the energy consumption in Brazil up to 2033. Ultimately, we also measure the time for training and predicting with each model.

A comparative analysis evaluates each forecasting model’s strengths and weaknesses in predicting Brazil’s energy demand trends. This assessment considers factors such as prediction precision, computational speed, and the ability to manage external factors, offering perspectives into the algorithms’ practical applicability in real-world situations.

4. Data Analysis

Understanding the statistical characteristics of a time series is essential for more precise forecasting. This enables the selection of forecasting models that best align with these characteristics.

For the SARIMA model, for example, some parameters must be passed; these are the Autoregressive (AR), Integrated (I), and Moving Average (MA) parameters, represented by the letters p, d, and q, respectively. The d parameter is determined in Section 4.1, while p and q are determined in Section 4.2.

Additionally, seasonality in the time series eliminates the suitability of linear regression models, as they assume a linear relationship between the independent and dependent variables. However, a straight line cannot accurately capture the cyclical patterns exhibited by seasonal time series. Section 4.3 presents statistical tests to evaluate whether the energy consumption in Brazil follows a seasonal pattern, along with the decomposition into its seasonal, trend, and residual components.

Moreover, the findings from an analysis of long-term memory in a time series may indicate whether the past observations in a time series are informative for future forecasting or not. This analysis is provided in Section 4.4.

Furthermore, the patterns and components of the time series may manifest as a consequence of other underlying factors, which may not directly correlate with the values in the time series or depend on them. These factors are termed exogenous variables, and considering them may also enhance the accuracy of the forecasting model. This evaluation is presented in Section 4.5.

This section presents these evaluations before comparing the results between the different models.

4.1. Unit Root Test and Differencing Order

Ensuring stationarity is a crucial step before applying many statistical methods. It implies that statistical properties of the data, such as mean, variance, and autocorrelation, remain constant over time. This characteristic impacts forecasting accuracy and model selection, as it is a fundamental assumption in many time series models, such as ARIMA.

Non-stationary data, often characterized by the presence of a unit root, can lead to unreliable parameter estimates and erroneous forecast results. It is manifested as a trend that increases or decreases. As seen in Section 4.3.1, the energy consumption time series in Brazil is not stationary, as it has an increasing trend component.

To address this unit root problem, stationarity is achieved by removing non-stationary features from the time series. The number of differences required to achieve stationarity is estimated using unit root tests, such as the Augmented Dickey–Fuller (ADF) test.

Hypothesis 1.

The time series has a unit root; that is, it is non-stationary.

Hypothesis 2.

The time series does not have a unit root; that is, it is stationary.

For the original time series, the ADF test yields a p-value of 0.84, which is greater than 0.05, so the null hypothesis of a unit root cannot be rejected. After one difference, however, the p-value is 0.0003, and the null hypothesis is rejected in favor of the alternative. This means that the time series is stationary after one difference, which sets the d parameter of the SARIMA model to 1.

4.2. Autoregressive and Moving Average Orders

To ascertain the remaining parameters for SARIMA, namely the Autoregressive (AR) and Moving Average (MA) parameters, denoted as p and q respectively, the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) can be employed.

To determine the AR parameter, p, and the MA parameter, q, the plots from Figure 3 are used. In this Figure, the blue shades represent the confidence interval, at a 95% confidence level.

The ACF displays the correlations between the time series and its lagged values, which are past observations of a variable at specific time points relative to the current observation. In contrast, the PACF plot reveals the autocorrelation of a time series with itself at different lags, while controlling for the influence of intermediate lags, thus isolating the direct correlation between each observation and its specific lagged value.

For the p parameter, the PACF plot, in Figure 3b, is examined. A significant peak at lag 1 in the PACF plot, indicated by a spike beyond the confidence interval, represented by the blue shade, suggests a strong correlation between the current observation and its immediate lagged value. This is indicative of an autoregressive relationship. Consequently, p is determined to be 1.

A similar approach is applied to the ACF plot, in Figure 3a, to determine the moving average parameter q. A significant decay in correlation after the first lag in the ACF plot indicates the presence of a moving average process, where subsequent observations are influenced by past forecast errors. Thus, the observation of a significant correlation at lag 1 in the ACF plot, coupled with a rapid decay in subsequent lags, signifies the influence of one lagged forecast error on the current observation, leading to the determination of q as 1.

In addition to that, the ACF may also provide evidence of seasonality in a time series. A 12-month seasonality in energy consumption in Brazil is evidenced by Figure 3a.

4.3. Seasonality Tests

Identifying cyclical patterns in the time series empowers forecasting models to deliver more precise predictions. Hence, the seasonality and the temporal periodicity at which the time series presents this behavior must be detected.

To evaluate the seasonality of the energy consumption, the stationary time series obtained after one difference, according to Section 4.1, is used. The intention is to eliminate the underlying trend in order to highlight short-term fluctuations and the seasonal component.

The Kruskal–Wallis test is a non-parametric statistical method utilized to evaluate whether the medians of two or more groups are different. Its hypotheses are as follows:

Hypothesis 3.

The medians of all the groups are equal; that is, $\tilde{\mu}_{1} = \tilde{\mu}_{2} = \ldots = \tilde{\mu}_{n}$, where $\tilde{\mu}_{i}$ represents the median of the ith group.

Hypothesis 4.

At least one of the group medians is different from the others.

To apply this test, the trend-subtracted time series is divided into 20 groups of 12 months each, each representing a year of energy consumption between 2004 and 2023. The length of 12 months is chosen based on the ACF results from Section 4.2.

These groups are input into the Kruskal–Wallis test, which returned a p-value of 0.994, greater than the significance level of 0.05. Therefore, the null hypothesis cannot be rejected, which suggests that the time series presents the same median throughout all years, endorsing the existence of yearly seasonality in the time series.
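The grouping described above can be sketched with `scipy.stats.kruskal`. The series here is synthetic (a 12-month sine pattern plus noise, an assumption standing in for the trend-subtracted consumption), split into 20 yearly groups of 12 months each.

```python
import numpy as np
from scipy.stats import kruskal

# Synthetic stand-in for the detrended series: 20 years of monthly data
# with a repeating 12-month pattern and small random noise.
rng = np.random.default_rng(1)
months = np.arange(240)
series = 10.0 * np.sin(2 * np.pi * months / 12) + rng.normal(0.0, 0.5, 240)

# One row per year, one column per month.
yearly_groups = series.reshape(20, 12)

# Each row is passed to the test as one group.
statistic, p_value = kruskal(*yearly_groups)
print(f"H = {statistic:.3f}, p-value = {p_value:.3f}")
```

When every year repeats the same pattern, the yearly medians are nearly identical and the p-value is large, which is the situation reported in the text.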

4.3.1. Seasonal and Trend Decomposition Using LOESS

After formal proof of the seasonality of the energy consumption in Brazil, the time series may be decomposed into its components using Locally Estimated Scatterplot Smoothing (LOESS). This results in the seasonal, trend, and residual components, as depicted in Figure 4. The decomposition is performed with additive seasonality, resulting in a consistent seasonal pattern, thus suggesting its additive nature.

The trend component represents the long-term direction of the time series. As seen in Figure 4, energy consumption is consistently rising, despite drops in 2009 and 2020 and a change in the slope in 2015.

The seasonal component represents the repeating pattern in the data at fixed intervals. This section proved that this fixed interval is 12 months wide. This is also confirmed by the seasonal component, shown in Figure 4.

In the seasonal component chart, all the dips below $-1.0 \times 10^{6}$ MWh occurred either in June or in July, when it is winter in Brazil, and all peaks above $1.0 \times 10^{6}$ MWh were registered in March, during the transition from summer (until the 20th) to autumn. This suggests a correlation between the energy consumption pattern and the temperatures in Brazil. This relationship is studied in more depth in Section 4.5.

After removing the trend and seasonal patterns, the residual component represents the unexplained variation or noise in the time series data. A sharp drop between April and July 2020 is observed in the residual component, potentially linked to the COVID-19 pandemic, officially declared by the World Health Organization on 11 March 2020. This period may have decreased energy consumption due to pandemic-related restrictions and changes in societal behavior, as found by [60]. This also appears to be responsible for the drop in the trend component in 2020.

4.4. Long-Term Memory Test

The capacity of a time series to retain patterns over extended periods refers to its long-term memory. It reflects the degree to which past observations influence future observations in the series. It indicates how well the historical data of the time series are preserved or remembered over time. Comprehending the long-term memory of a time series is fundamental for revealing its underlying dynamics, analyzing trends, identifying patterns, and improving the reliability of forecasts based on historical data.

In this regard, a time series is said to have persistent behavior if it tends to replicate its patterns over time, manifested as momentum in the data, whether characterized by increasing or decreasing trends. This means that increases in values tend to be followed by further increases, and decreases by further decreases.

Conversely, an anti-persistent time series is more likely to experience an increase in value after a decrease and vice versa. This creates a mean reversion behavior rather than continuing in the same direction.

A time series may also present a Brownian motion behavior in which future values do not depend on past observations. This poses a challenge to time series forecasting.

The Hurst exponent (H) evaluates the long-term memory of a time series. This metric, $H \in [0, 1]$, indicates whether a time series presents persistent ($H > 0.5$), anti-persistent ($H < 0.5$), or random ($H = 0.5$) behavior.

The variation of the Hurst Exponent over a 100-month wide window is calculated for the energy consumption time series and its residual and trend components. This analysis does not include the seasonal component due to its evident cyclical behavior. The results are presented in Figure 5, demonstrating that energy consumption in Brazil has a consistent growth trend despite the inflection points, such as the one possibly caused by the COVID-19 pandemic.

In contrast, the energy consumption time series and its residual component follow an anti-persistent behavior. The persistent behavior of the trend component, along with the fact that no Brownian motion behavior ($H = 0.5$, red dashed line in Figure 5) is observed, implies that Brazil’s energy consumption is predictable by observing its past values.
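The text does not specify which estimator of the Hurst exponent was used. As one possible illustration, the sketch below assumes the classical rescaled-range (R/S) analysis, in which H is the slope of log(R/S) against log(chunk size). The function name `hurst_rs` and the two deterministic test series are hypothetical.

```python
import numpy as np


def hurst_rs(series, min_chunk=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis.
    Assumption: this estimator stands in for the unspecified one in the text."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    log_sizes, log_rs = [], []
    size = min_chunk
    while size <= n // 2:
        rs_values = []
        for start in range(0, n - size + 1, size):
            chunk = x[start:start + size]
            deviations = np.cumsum(chunk - chunk.mean())
            value_range = deviations.max() - deviations.min()
            std = chunk.std()
            if std > 0:
                rs_values.append(value_range / std)
        if rs_values:
            log_sizes.append(np.log(size))
            log_rs.append(np.log(np.mean(rs_values)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for the chosen chunk sizes")
    slope, _ = np.polyfit(log_sizes, log_rs, 1)
    return float(slope)


# A steadily rising series (persistent) and a strictly alternating one
# (anti-persistent), both synthetic.
rising = np.arange(200.0)
alternating = np.array([1.0, -1.0] * 100)

print("H (rising):", hurst_rs(rising))
print("H (alternating):", hurst_rs(alternating))
```

Applying such a function to successive 100-month slices of a series would give a rolling estimate comparable in spirit to the one shown in Figure 5.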

4.5. Interaction with Exogenous Variables

Understanding the intricate relationship between energy usage and external factors is essential to gain a comprehensive perception of the dynamics of energy demand.

Economic fluctuations, for example, may impact energy consumption trends, reflecting shifts in production, consumption patterns, and overall economic activity. Similarly, climatic variables, such as temperature and precipitation, may influence energy demand, particularly in sectors like heating and cooling.

By leveraging exogenous variables in the analyses, not only is the understanding of the factors that shape energy consumption patterns enhanced, but forecasting models may also improve in accuracy. This possibly enables these variables to serve as valuable indicators of external influences that drive changes in energy demand, allowing us to anticipate and adapt to evolving dynamics in the energy landscape. By meticulously analyzing these interactions, we aim to improve comprehension that facilitates informed decision-making in energy planning, resource allocation, and policy formulation.

4.5.1. Granger Causality

Granger causality determines whether one time series can be used to predict another time series [61]. It asserts that a given time series, denoted here as X, Granger-causes (or G-causes) another time series, designated as Y, under the condition that the historical data of X contribute to a superior forecasting accuracy of Y compared to relying solely on Y’s past data. The hypotheses of the test are as follows:

Hypothesis 5.

The lagged X-values do not explain the variations in Y, i.e., X(t) does not G-cause Y(t).

Hypothesis 6.

The lagged X-values may explain the variation in Y, i.e., X(t) G-causes Y(t).

For each pair of variables in our dataset, the Granger causality is tested for a maximum lag of three months, and the causality is only considered valid if the null hypothesis is rejected for all three lags. Figure 6 displays the results, in which the rows represent the causes being tested and the columns are the targeted time series.

It is important to note, however, that Granger causality does not imply a symmetric relationship between two variables; that is, (X(t) → Y(t)) ⇏ (Y(t) → X(t)), in which → represents Granger causality. Because of that, the matrices in Figure 6 are not symmetrical with respect to their main diagonal. Also, one variable cannot G-cause itself; therefore, the main diagonal of each matrix is false.

While oil production was the only industrial factor to Granger-cause energy consumption (Figure 6a), the climatic factors were the main ones responsible for the seasonal component of the energy usage (Figure 6b). The Granger causality testing, however, does not provide information about the sign of the effect between variables. That is, it cannot determine whether the energy consumption rises or falls after an increase in the temperature. Section 4.5.2 assesses the sign of the relationship between these variables.

Figure 6c shows that the economic indexes do not G-cause the trend or seasonality. However, the GDP, NCPI, and ENCPI have a unidirectional Granger-causal influence on Brazil’s energy consumption (EC) time series. This finding diverges from that of [62], which reported the causality in the reverse order between 1963 and 1993; that is, EC → GDP.

The Granger causality between economic growth and energy consumption has been studied in several countries, such as Italy, as conducted by [63], and in G7 countries, as conducted by [64]. It has been observed that, for example, between 1990 and 2014, EC ↔ GDP globally, while EU countries had EC → GDP, and, between 1960 and 2012, Belgium had GDP → EC, according to [65], which matches our current findings for Brazil.

4.5.2. Correlation

Assessing the correlation between variables clarifies the direction of the relationship between them, complementing the causality analysis. For this analysis, we have selected exclusively the variables that demonstrated, in Section 4.5.1, Granger causality to energy consumption or any of its components.

The correlation confirms the observations from Section 4.3.1, which concluded that the energy consumption was higher in the Brazilian summer. Figure 7 shows that the dry bulb temperature and the seasonal component of the energy consumption have a strong positive correlation, suggesting that the greater the temperature, the higher the energy demand.

A strong correlation between the trend component and the economic and industrial factors is noted despite no Granger causality being observed between them. This phenomenon may be attributed to several plausible explanations. Firstly, a common cause could form the basis of the observed correlation, whereby alterations in these factors occur concurrently without direct causal links among them. Alternatively, Granger causality may be present, albeit at a lag greater than the three months used in the experiment.

4.5.3. Cointegration

Cointegration describes the long-term equilibrium relationship between two time series. Cointegrated series move together in the long run despite exhibiting short-term deviations from their equilibrium relationship.

For this analysis, we selected the variables that were found to be Granger causes of energy consumption in Brazil, according to Section 4.5.1. The cointegration between them was assessed with the Engle-Granger test.

This test involves regressing one non-stationary variable on another and then testing the stationarity of the residuals obtained from the regression, using the Augmented Dickey–Fuller test. If the p-value from the unit root test is below 0.05, it indicates that the residuals are stationary, suggesting that the variables are cointegrated. However, if the p-value exceeds this significance level of 0.05, the null hypothesis of non-stationarity in the residuals cannot be rejected, implying that there is no cointegration between the variables.

Figure 8 shows the matrix of cointegration between the selected Granger variables, in which a pair is marked as True if the p-value is less than 0.05 and as False otherwise.

All variables depicted in Figure 8 exhibit Granger causality with energy consumption, but only dew point temperature and relative humidity of air demonstrate cointegration with the energy consumption time series or its components. This suggests that the remaining variables have a predictive relationship with energy consumption, but not necessarily a stable long-term one.

5. Time Series Models

Having conducted the data analysis in Section 4, the patterns, trends, and correlations within the time series are now comprehended, providing meaningful observations of the historical behavior of energy consumption. This knowledge also enables a better configuration of the prediction models, resulting in more accurate values.

5.1. Training Settings

The forecast is conducted using the SARIMA, FB Prophet, Holt–Winters, and TBATS models, configured to consider energy consumption’s yearly seasonality. These models were selected due to their support for seasonal time series and their previous successful use, as seen in Section 2.

The experiment is conducted using Python, version 3.10.12. The libraries used for each model and their training parameters are presented in Table 4. These parameters were selected according to the data analysis discussed in Section 4, with a corresponding reference in Table 4, indicating where the determination of each parameter is elaborated. The parameters not mentioned in Table 4 were set to their default values. It is noted that FB Prophet is easier to use, as it automatically identifies the adequate parameters. Conversely, SARIMA requires manual configuration of several variables.

The 24 most recent months (December 2021 up to November 2023) of the data are used to test the predicted values, leaving 215 months available for training. As the features of the time series, such as the Hurst exponent, vary with time, each model is trained using an expanding training window that incrementally encompasses 36 months of historical data up to the complete training dataset.
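The split and the expanding training windows can be sketched in plain Python. The helper name `expanding_windows` is hypothetical, and a list of integers stands in for the 239 monthly observations (215 for training plus 24 for testing).

```python
def expanding_windows(series, test_size=24, step=36, n_windows=6):
    """Hold out the last `test_size` points and build training windows that
    grow by `step` points, always ending at the last training observation."""
    train, test = series[:-test_size], series[-test_size:]
    # A slice longer than the training set simply returns all of it.
    windows = [train[-step * i:] for i in range(1, n_windows + 1)]
    return windows, test


# Placeholder data: 239 monthly observations.
monthly = list(range(239))
windows, test = expanding_windows(monthly)
lengths = [len(w) for w in windows]
print(lengths)
```

The sixth window nominally spans 216 months, but only 215 are available, so it coincides with the complete training set.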

Moreover, to mitigate the influence of noise inherent in the residual component, the evaluation is conducted on both the original time series and on a smoothed version, with a smoothing parameter $\alpha = 0.3$. Figure 9 summarizes these forecasting scenarios.
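The smoothing method is not named in the text. Assuming simple exponential smoothing, in which each smoothed value is a weighted average of the current observation and the previous smoothed value, a minimal sketch with α = 0.3 is as follows; the function name and sample values are hypothetical.

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing (assumed method):
    s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# A single spike is damped to 30% of its height and then decays gradually.
spiky = [0.0, 10.0, 0.0, 0.0]
print(exponential_smoothing(spiky))
```

A smaller α gives more weight to the history and therefore a smoother series, at the cost of reacting more slowly to genuine changes.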

5.2. Univariate Models Accuracy Evaluation

The validity and performance of these time series forecasting models must be rigorously assessed to ensure their suitability for real-world applications. This section presents the test results of the time series univariate forecasting models SARIMA, FB Prophet, Holt–Winters, and TBATS for energy consumption, comparing them on their Mean Absolute Percentage Error (MAPE).

As presented in Figure 10, FB Prophet’s model achieved the best accuracy, with 0.71% MAPE for the smoothed energy usage time series, with a 108-month training window. Furthermore, upon comparing Figure 10a,b, it becomes evident that smoothing the time series enhanced the accuracy of the models. Additionally, in most scenarios, the utilization of a 36-month training window yielded the poorest MAPE values, suggesting an insufficient amount of data for robust model performance.

For the original time series, the optimal training window sizes for each model were 72 months for SARIMA (1.38% MAPE) and for FB Prophet (1.91%); 108 months for Holt–Winters (3.85%); and 215 months for TBATS (4.17%). For the smoothed version, the optimal sizes were 180 months for SARIMA (1.44%), 108 months for FB Prophet (0.71%, the lowest MAPE overall); 72 months for Holt–Winters (2.53%); and 215 months for TBATS (1.44%).

The predictions for the validation period for each optimal model are further compared using the additional accuracy metrics presented in Table 5. Both FB Prophet and SARIMA returned negative MPE values, meaning that their predicted values are greater than the actual values. In contrast, Holt–Winters and TBATS resulted in positive MPE values, as their estimated values are lower than the actual ones. The TBATS model also obtained a MAPE equal to MPE, meaning that the predicted values never cross over the actual values. These conclusions are supported by Equation (2) and visually represented in Figure 11, which compares these models, in these same scenarios, for both the original and the smoothed time series. It is noteworthy that all these models achieved Highly Accurate Prediction, according to [51].

These models are also used to forecast the energy consumption in Brazil for up to November 2033, in Section 5.5.

The MAPE assessment shows that FB Prophet achieved the best accuracy among the models assessed for energy forecasting in Brazil. Our findings are in accordance with [17,18].

5.3. Multivariate Models Accuracy Evaluation

Similar to the results presented in Section 5.2, this section evaluates the accuracy metrics for the forecasting models. In this section, however, we incorporate exogenous variables that exhibit a Granger causality relationship with the energy consumption time series, as seen in Section 4.5.

As the Holt–Winters and the TBATS models do not natively support exogenous variables, they are excluded from the analysis of this section. Also, SARIMAX is used, along with FB Prophet, instead of SARIMA.

As a consequence, these models are evaluated regarding their accuracy, measured by the MAPE, when adding the exogenous variables of oil production (industrial); GDP, NCPI, and ENCPI (economic) [66]; total precipitation, air temperature, dew point temperature, relative humidity of air and wind speed (climatic). We also assess the accuracy when including all these factors concurrently in the models. The results are shown in Figure 12 for SARIMAX and Figure 13 for FB Prophet.

It is worth noting that SARIMAX demonstrated improved optimal accuracy, in comparison to SARIMA, when incorporating exogenous variables in both the original and smoothed time series. Specifically, the original time series, utilizing solely climatic variables, achieved a MAPE of 1.28% with a training window size of 215 months. Conversely, the smoothed time series, incorporating economic variables, returned an even lower MAPE of 1.18% with a training window size of 180 months. In contrast, when considering only endogenous time series data, the MAPE values were slightly higher, at 1.38% for a training window size of 72 months and 1.44% for a window size of 180 months, respectively.

In the case of the FB Prophet model, an interesting observation emerges. Contrary to expectations, including economic exogenous variables seems to have a detrimental effect on model accuracy. When considered in isolation, these variables resulted in a maximum Mean Absolute Percentage Error (MAPE) of 42.08%, and when combined with industrial and climatic factors, the MAPE increased to 52.79%. Interestingly, while incorporating exogenous variables did not significantly enhance the model’s accuracy in the smoothed time series, it marginally improved accuracy in the original version. Specifically, accuracy improved from 1.91% with endogenous variables alone to 1.62% with a training window of 144 months, considering industrial variables.

None of the models incorporating all factors simultaneously produced superior results. However, when each category of exogenous variables was considered independently, the best accuracy was achieved under specific scenarios. Industrial variables enhanced the accuracy of the smoothed time series in the Facebook Prophet model, while climatic variables benefited the original time series in SARIMAX. Economic variables, conversely, demonstrated a modest improvement in accuracy for the smoothed version in SARIMAX. The original time series in the Facebook Prophet model generated the best results when considering endogenous variables exclusively.

For the original time series, the SARIMAX achieved its best MAPE value for a training window of 215 months and considering climatic exogenous variables, while FB Prophet achieved it for 144 months and industrial variables. For the smoothed version, SARIMAX tested the best with 180 months and economic variables, while FB Prophet with 36 months and industrial variables. Both SARIMAX and FB Prophet are compared in these optimal scenarios with different metrics in Table 6. These same scenarios are used to visually demonstrate the validation in Figure 14.

SARIMAX achieved Highly Accurate Prediction in all combinations of exogenous variables, both for the original and the smoothed time series. FB Prophet, contrarily, appears to have reacted poorly to the economic exogenous variables, achieving an Inaccurate Prediction when incorporating all variables in the smoothed version. When using exclusively industrial or climatic variables, it consistently reached Highly Accurate Predictions.

5.4. Models Fit Test

To estimate the adequacy of these models in each scenario, the Ljung–Box test is performed. The test is performed with a maximum lag of four, considering the default parameter of $\mathrm{max\_lags} = \lfloor \mathrm{number\_of\_observations} / 5 \rfloor$. Its hypotheses are as follows:

Hypothesis 7 ($H_{0}$). There is no autocorrelation in the data up to a specified lag.

Hypothesis 8 ($H_{1}$). There is significant autocorrelation present in the data.

The Ljung–Box test is run on the residuals (i.e., $y - \hat{y}$) of each model, in their optimal window sizes, according to Section 5.2.

Significant autocorrelation in the residuals suggests that the model fails to capture some underlying patterns in the data, indicating potential deficiencies in the estimation process. The test results therefore help verify the model's assumptions, particularly the independence and randomness of the residuals, and support the reliability of the forecasting model.
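As an illustration of how the statistic behind this test is obtained, the following minimal sketch computes the Ljung–Box Q statistic and its p-value in plain Python. It is not the code used in this study: the function names and the toy residual series are hypothetical, the closed-form p-value only covers an even number of lags (such as the four lags used here), and no degrees-of-freedom correction for estimated model parameters is applied. In practice, statsmodels (Table 4) provides `acorr_ljungbox` in `statsmodels.stats.diagnostic` for the same purpose.

```python
import math
import random


def autocorrelation(x, k):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box(residuals, max_lag=4):
    """Return (Q, p_value) of the Ljung-Box test up to max_lag (even lags only)."""
    if max_lag % 2 != 0:
        raise ValueError("this sketch only handles an even number of lags")
    n = len(residuals)
    q = n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    # Chi-square survival function with an even number of degrees of freedom
    # (closed form): exp(-q/2) * sum_{i < df/2} (q/2)^i / i!
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, max_lag // 2):
        term *= half / i
        total += term
    return q, math.exp(-half) * total


# Hypothetical residuals: a leftover 12-month cycle versus seeded Gaussian noise.
seasonal_residuals = [math.sin(2 * math.pi * t / 12) for t in range(120)]
rng = random.Random(0)
noise_residuals = [rng.gauss(0.0, 1.0) for _ in range(120)]

q_seasonal, p_seasonal = ljung_box(seasonal_residuals)
q_noise, p_noise = ljung_box(noise_residuals)
print(f"seasonal residuals: Q = {q_seasonal:.1f}, p = {p_seasonal:.4g}")
print(f"noise residuals:    Q = {q_noise:.1f}, p = {p_noise:.4g}")
```

Residuals that still contain a seasonal cycle produce a very large Q and a p-value close to zero, which corresponds to the lack-of-fit verdict described above; a large p-value means that the hypothesis of no autocorrelation is not rejected.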

The fourth lag demonstrated the highest p-values for all models. From Table 7, it is noted that only for FB Prophet and SARIMA, when applied to the original time series, the null hypothesis of no autocorrelation (Hypothesis 7) is not rejected. This means that these forecasting models adequately capture the temporal dependencies present in the time series, and that their residuals exhibit randomness and independence. The residuals for forecasting the smoothed version, however, present significant autocorrelation, implying a lack of fit. Holt–Winters and TBATS showed a lack of fit for both scenarios.

When the accuracy evaluation is restricted to SARIMA and FB Prophet applied to the original time series, as these are the only models that demonstrated an adequate fit, SARIMA is the most accurate fit model for univariate forecasting.
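This two-step selection (first keep the models that pass the Ljung–Box test, then pick the lowest MAPE) can be sketched as follows. The MAPE values and p-values are copied from Tables 5 and 7; the 0.05 significance level and the function name are assumptions made for illustration.

```python
ALPHA = 0.05  # assumed significance level; not stated explicitly in this section

# model -> (MAPE in %, Ljung-Box p-value at lag 4), from Tables 5 and 7
ORIGINAL = {
    "SARIMA": (1.38, 0.277),
    "FB Prophet": (1.91, 0.404),
    "Holt-Winters": (3.85, 0.002),
    "TBATS": (4.17, 0.001),
}
SMOOTHED = {
    "SARIMA": (1.44, 0.0107),
    "FB Prophet": (0.71, 0.0007),
    "Holt-Winters": (2.53, 0.0000008),
    "TBATS": (1.44, 0.0001),
}


def most_accurate_fit_model(results, alpha=ALPHA):
    """Return the lowest-MAPE model among those without residual autocorrelation."""
    fit = {name: mape for name, (mape, p_value) in results.items() if p_value > alpha}
    return min(fit, key=fit.get) if fit else None


print("original:", most_accurate_fit_model(ORIGINAL))
print("smoothed:", most_accurate_fit_model(SMOOTHED))
```

With these inputs, only SARIMA and FB Prophet remain for the original series, and no model remains for the smoothed series, so FB Prophet's lower MAPE on the smoothed series does not enter the comparison.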

Similarly, in the multivariate scenario, both SARIMAX and FB Prophet demonstrated adequate fit, with no autocorrelation within the residuals in the original time series, as demonstrated in Table 8. In contrast, for the smoothed version, both models showed a lack of fit in the Ljung–Box test.

Since smoothing the time series again caused the models to fail the Ljung–Box test, only the accuracy on the original time series is taken into consideration. With this restriction, SARIMAX returned the best MAPE values.

5.5. Energy Consumption Forecasting

This section uses the optimal training window size of each forecasting model, based on the univariate results presented in Section 5.2.

Although SARIMAX achieved its best accuracy in the multivariate modeling, we present here the predictions of the univariate scenarios, because forecasting the energy consumption up to 2033 in the multivariate scenario would also require predicting indexes such as Brazil's GDP, inflation, and oil production, which is not a trivial task and is beyond the scope of this work. Moreover, the accuracy of the univariate scenario is considered adequate for this purpose. The primary objective of the results from Section 5.3 was to illustrate the impact of incorporating exogenous factors on model performance.

The predicted values are displayed in Figure 15. When no smoothing is applied, SARIMA predicts values close to 70,000 GWh in November 2033, while FB Prophet forecasts values above 60,000 GWh. The value indicated by SARIMA is still near the upper limit of the values predicted by FB Prophet, represented by the gray shade in Figure 15. It is noteworthy that Holt–Winters and TBATS predict a stationary trend in this scenario.

When smoothing is applied, SARIMA and FB Prophet predict similar values, at around 55,000 GWh of consumption in November 2033. Although Holt–Winters still predicts a stationary trend in this scenario, TBATS forecasts an increase, predicting approximately 50,000 GWh for the same month. Smoothing the time series has also widened the lower and upper limits of the FB Prophet forecasts, possibly as a consequence of the increased autocorrelation in the residuals, as seen in Section 5.4.

The forecasted values generated by all four models are accessible in the GitHub repository (github.com/gabriel-arquelau/Brazilian-energy-consumption-prediction/ accessed on 21 June 2024). These predictions are based on scenarios that achieved the best MAPE assessments in their optimal univariate scenarios. They encompass data projections extending up to November 2033 for both the original and smoothed scenarios, and in the case of FB Prophet, accompanied by corresponding lower and upper limits. The publication of this data enables future validation of the forecasting with actual values.

5.6. Time Performance Assessment

While accuracy is a fundamental aspect of forecasting models, it is not the sole determinant of their effectiveness. Particularly in time-sensitive environments, time performance plays a vital role in ensuring the practicality and utility of these models.

In dynamic industries like energy management, decisions must be made rapidly in response to changing conditions. A forecasting model that provides accurate predictions but takes excessive time to train or generate forecasts may be impractical in such contexts. The decision-making process may require forecasting models that can deliver timely responses, enabling organizations to react adequately to emerging trends, seize opportunities, and mitigate risks effectively.

To extend the comparison between the models to their time performance, each model was trained (with 215 months of data) and used to predict (up to November 2033) 25 times, without exogenous variables, and each run was timed. The resulting average execution time is shown in Table 9, along with the margin of error of the measurement, computed using Equation (5), and the corresponding confidence level.

$\text{Margin of Error} = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ (5)

The time performance tests were conducted using a computer with 12.7 GB of RAM and an Intel(R) Xeon(R) CPU running at 2.20 GHz. No GPU was used.
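The timing procedure and Equation (5) can be sketched with the Python standard library. This is only an illustration under stated assumptions: the workload `dummy_fit_and_forecast` is a hypothetical stand-in for training and forecasting with a model, the sample standard deviation is used as the estimate of σ, and the 98% confidence level follows the caption of Table 9.

```python
import math
import statistics
import time


def margin_of_error(samples, confidence=0.98):
    """Equation (5): z_{alpha/2} * sigma / sqrt(n), with sigma estimated from the samples."""
    n = len(samples)
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sigma = statistics.stdev(samples)  # sample standard deviation (assumption)
    return z * sigma / math.sqrt(n)


def time_runs(func, runs=25):
    """Run func repeatedly and return the list of wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def dummy_fit_and_forecast():
    """Hypothetical placeholder for 'train on 215 months, forecast to November 2033'."""
    sum(i * i for i in range(10_000))


durations = time_runs(dummy_fit_and_forecast, runs=25)
print(f"average = {statistics.mean(durations):.6f} s")
print(f"margin  = {margin_of_error(durations):.6f} s (98% confidence)")
```

For a 98% confidence level, $z_{\alpha/2}$ is approximately 2.326, so with n = 25 runs the margin of error is roughly 0.465 times the standard deviation of the measured times.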

6. Conclusions

Precise forecasting in the energy sector is crucial for the achievement of sustainable development objectives in three key areas: making available cost-effective, dependable, and up-to-date energy utilities, exploring alternative energy sources, and improving energy efficiency.

This study seeks to meet this objective by comparing the performance of different forecasting models in the Brazilian energy sector. The analysis assesses Seasonal Autoregressive Integrated Moving Average (SARIMA), Facebook Prophet (FB Prophet), Holt–Winters, and TBATS, in handling the intricate dynamics of energy demand in Brazil. The results reveal that each model displays distinct strengths and weaknesses.

As noted earlier, SARIMA excels in tracing seasonal patterns but may struggle to deal with irregular fluctuations. Facebook Prophet shows promising outcomes in different sectors. However, it requires careful examination of specific datasets and contextual factors when conducting an analysis. Holt–Winters is adept at handling seasonal variations but may encounter difficulties when faced with abrupt changes in demand. In contrast, TBATS shows flexibility when managing diverse seasonal patterns but requires a considerable amount of historical data and may face challenges in making long-term projections.

Furthermore, the influence of external variables on energy consumption, such as economic indicators, industrial factors, and climatic conditions, was taken into account. The links between these factors and energy consumption were examined through Granger causality tests and correlation assessments. Additionally, the persistence of energy consumption patterns was investigated, highlighting persistent behavior in the trend component and non-persistent behavior in the residual component.

The precision and effectiveness of the forecasting models are evaluated, in both univariate and multivariate scenarios. The performances of SARIMAX and FB Prophet are assessed in terms of their accuracy in integrating external variables and enabling practical applications. Thus, this study seeks to contribute to advances in energy demand prediction, by allowing well-informed decision-making in sustainable energy planning, resource allocation, and policymaking and, thus, assists in achieving SDG7 and its related objectives.

Our findings indicate that, although FB Prophet achieved the best general accuracy metrics (0.71% MAPE on the smoothed time series), its residuals in that scenario still present autocorrelation, which implies a lack of fit of the model. When the accuracy evaluation is restricted to the models that do not present autocorrelation in the residuals, SARIMA returns the best MAPE for univariate forecasting, with 1.38%, as opposed to 1.91% from FB Prophet. In the forecasting with exogenous variables, SARIMAX achieved the best accuracy, with a MAPE of 1.28% compared to 1.62% from FB Prophet. The smoothed version of the time series resulted in autocorrelation within the residuals for all models.

These findings may support the Brazilian government in the elaboration of energy, economy, and sustainability policies. Additionally, the methodology proposed in this study may also be used as a framework for assessing and predicting energy consumption patterns worldwide.

As part of future work, we propose evaluating new categories of exogenous variables, such as demographic data, and their relationship to energy consumption patterns. We also propose expanding the statistical comparison to more recent time series models, such as TFT and N-BEATS, and possibly applying it to energy generation instead of demand.

Author Contributions

A.L.M.S., G.A.P.R. and P.H.d.S.M. conducted the experiments and formal analysis. G.M.S. curated the data. G.P.R.F., V.P.G. and R.d.O.A. supervised the experiment and the proposed methodology. All authors contributed equally to writing and revising the document. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Brasilia (UnB).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We used publicly available data published by the Brazilian institutions EPE, ABVE, INMET, and IPEA.

Acknowledgments

The authors acknowledge the technical and computational support from the LATITUDE Laboratory at the University of Brasília, to TED 01/2019 from the Attorney General’s Office (Grant AGU 697,935/2019), to TED 01/2021 from the National Secretariat for Social Assistance—SNAS/DGSUAS/CGRS for the SISTER City Project—Safe and Real-Time Effective Intelligent Systems for Smart Cities (Grant 625/2022), to the “Project for Project Control and Unification for the Federal District Government—Sispro-DF” (Grant 497/2023), to the Dean of Research and Innovation—DPI/UnB, and to FAP/DF. The authors would also like to thank the Brazilian National Confederation of Industry (CNI) for partially supporting this project and for their support and collaboration throughout this research project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ABVE	Brazilian Association of Electric Vehicles
ADF	Augmented Dickey–Fuller
ARMA	Autoregressive Moving Average
ARIMA	Autoregressive Integrated Moving Average
ENCPI	Extended National Consumer Price Index
EPE	Brazilian Energy Research Company
FB	Facebook
GDP	Gross Domestic Product
GMPI	General Market Price Index
GPI-DA	General Price Index-Domestic Availability
IPEA	Brazilian Institute for Applied Economic Research
MAPE	Mean Absolute Percentage Error
NCPI	National Consumer Price Index
SARIMA	Seasonal Autoregressive Integrated Moving Average
SARIMAX	SARIMA with exogenous variable
TBATS	Trigonometric Seasonality Box–Cox transformation, ARMA errors, Trend, and Seasonal components
TFT	Temporal Fusion Transformer

References

He, J.; Yang, Y.; Liao, Z.; Xu, A.; Fang, K. Linking SDG 7 to assess the renewable energy footprint of nations by 2030. Appl. Energy 2022, 317, 119167.
Rocha Filho, G.P.; Meneguette, R.I.; Maia, G.; Pessin, G.; Gonçalves, V.P.; Weigang, L.; Ueyama, J.; Villas, L.A. A fog-enabled smart home solution for decision-making using smart objects. Future Gener. Comput. Syst. 2020, 103, 18–27.
Guo, C.; Luo, F.; Cai, Z.; Dong, Z.Y. Integrated energy systems of data centers and smart grids: State-of-the-art and future opportunities. Appl. Energy 2021, 301, 117474.
Mir, A.A.; Alghassab, M.; Ullah, K.; Khan, Z.A.; Lu, Y.; Imran, M. A review of electricity demand forecasting in low and middle income countries: The demand determinants and horizons. Sustainability 2020, 12, 5931.
Andrenacci, N.; Valentini, M.P. A literature review on the charging behaviour of private electric vehicles. Appl. Sci. 2023, 13, 12877.
Wu, W.; Lin, Y. The impact of rapid urbanization on residential energy consumption in China. PLoS ONE 2022, 17, e0270226.
de Assis Cabral, J.; Legey, L.F.L.; de Freitas Cabral, M.V. Electricity consumption forecasting in Brazil: A spatial econometrics approach. Energy 2017, 126, 124–131.
Silva, F.L.; Souza, R.C.; Oliveira, F.L.C.; Lourenco, P.M.; Calili, R.F. A bottom-up methodology for long term electricity consumption forecasting of an industrial sector-Application to pulp and paper sector in Brazil. Energy 2018, 144, 1107–1118.
Maçaira, P.; Elsland, R.; Oliveira, F.C.; Souza, R.; Fernandes, G. Forecasting residential electricity consumption: A bottom-up approach for Brazil by region. Energy Effic. 2020, 13, 911–934.
Leite Coelho da Silva, F.; da Costa, K.; Canas Rodrigues, P.; Salas, R.; López-Gonzales, J.L. Statistical and artificial neural networks models for electricity consumption forecasting in the Brazilian industrial sector. Energies 2022, 15, 588.
Velasquez, C.E.; Zocatelli, M.; Estanislau, F.B.; Castro, V.F. Analysis of time series models for Brazilian electricity demand forecasting. Energy 2022, 247, 123483.
Albuquerque, P.C.; Cajueiro, D.O.; Rossi, M.D. Machine learning models for forecasting power electricity consumption using a high dimensional dataset. Expert Syst. Appl. 2022, 187, 115917.
Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; Wiley: San Francisco, CA, USA, 1976.
Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. 2018, 72, 37–45.
Winters, P.R. Forecasting sales by exponentially weighted moving averages. Manag. Sci. 1960, 6, 324–342.
De Livera, A.M.; Hyndman, R.J.; Snyder, R.D. Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 2011, 106, 1513–1527.
Guo, C.; Ge, Q.; Jiang, H.; Yao, G.; Hua, Q. Maximum power demand prediction using fbprophet with adaptive Kalman filtering. IEEE Access 2020, 8, 19236–19247.
Chaturvedi, S.; Rajasekar, E.; Natarajan, S.; McCullen, N. A comparative assessment of SARIMA, LSTM RNN and Fb Prophet models to forecast total and peak monthly energy demand for India. Energy Policy 2022, 168, 113097.
Debnath, K.B.; Mourshed, M. Forecasting methods in energy planning models. Renew. Sustain. Energy Rev. 2018, 88, 297–325.
Karabiber, O.A.; Xydis, G. Electricity price forecasting in the Danish day-ahead market using the TBATS, ANN and ARIMA methods. Energies 2019, 12, 928.
Alduailij, M.A.; Petri, I.; Rana, O.; Alduailij, M.A.; Aldawood, A.S. Forecasting peak energy demand for smart buildings. J. Supercomput. 2021, 77, 6356–6380.
Bispo, G.D.; Vergara, G.F.; Saiki, G.M.; Martins, P.H.d.S.; Coelho, J.G.; Rodrigues, G.A.P.; Oliveira, M.N.d.; Mosquéra, L.R.; Gonçalves, V.P.; Neumann, C.; et al. Automatic Literature Mapping Selection: Classification of Papers on Industry Productivity. Appl. Sci. 2024, 14, 3679.
Arnob, S.S.; Arefin, A.I.M.S.; Saber, A.Y.; Mamun, K.A. Energy Demand Forecasting and Optimizing Electric Systems for Developing Countries: A Systematic Review. IEEE Access 2023, 11, 39751–39775.
Liu, L.; Wang, J.; Li, J.; Wei, L. An online transfer learning model for wind turbine power prediction based on spatial feature construction and system-wide update. Appl. Energy 2023, 340, 121049.
Wu, B.; Yu, S.; Peng, L.; Wang, L. Interpretable wind speed forecasting with meteorological feature exploring and two-stage decomposition. Energy 2024, 294, 130782.
Wu, B.; Wang, L. Two-stage decomposition and temporal fusion transformers for interpretable wind speed forecasting. Energy 2024, 288, 129728.
Rafayal, S.; Cevik, M.; Kici, D. An empirical study on probabilistic forecasting for predicting city-wide electricity consumption. In Proceedings of the AI, Virtual, 27 May 2022; pp. 1–12.
Riady, S.R.; Apriani, R. Multivariate time series with Prophet Facebook and LSTM algorithm to predict the energy consumption. In Proceedings of the 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), Jakarta, Indonesia, 16 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 805–810.
Wang, Y.; Jia, R.; Dai, F.; Ye, Y. Traffic flow prediction method based on seasonal characteristics and SARIMA-NAR model. Appl. Sci. 2022, 12, 2190.
López Rivero, A.J.; Martínez Alayón, C.A.; Ferro, R.; Hernández de la Iglesia, D.; Alonso Secades, V. Network Traffic Modeling in a Wi-Fi System with Intelligent Soil Moisture Sensors (WSN) Using IoT Applications for Potato Crops and ARIMA and SARIMA Time Series. Appl. Sci. 2020, 10, 7702.
Abu Al-Haija, Q.; Mohamed, O.; Abu Elhaija, W. Predicting global energy demand for the next decade: A time-series model using nonlinear autoregressive neural networks. Energy Explor. Exploit. 2023, 41, 1884–1898.
Thangavel, A.; Govindaraj, V. Forecasting energy demand using conditional random field and convolution neural network. Elektron. Elektrotech. 2022, 28, 12–22.
Gundu, V.; Simon, S.P. PSO–LSTM for short term forecast of heterogeneous time series electricity price signals. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 2375–2385.
Verwiebe, P.A.; Seim, S.; Burges, S.; Schulz, L.; Müller-Kirchenbauer, J. Modeling energy demand—A systematic literature review. Energies 2021, 14, 7859.
Sengar, S.; Liu, X. Ensemble approach for short term load forecasting in wind energy system using hybrid algorithm. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 5297–5314.
Liu, Z.; Hajiali, M.; Torabi, A.; Ahmadi, B.; Simoes, R. Novel forecasting model based on improved wavelet transform, informative feature selection, and hybrid support vector machine on wind power forecasting. J. Ambient. Intell. Humaniz. Comput. 2018, 9, 1919–1931.
da Silva Mendes, R.F.; da Costa, K.; da Silva, F.L.C.; Coelho, J.d.S.C.; Vera-Tudela, C.A.R.; Pinto, R.V. Forecasting models for the electricity consumption of the cement industry in Brazil. Obs. Econ. Latinoam. 2023, 21, 6016–6031.
de Campos, L.M.L. Time Series Forecast Applied to Electricity Consumption. In Proceedings of the International Conference on Intelligent Systems Design and Applications, Virtual, 12–14 December 2022; Springer: Cham, Switzerland, 2023; pp. 178–187.
Khan, A.M.; Osińska, M. Comparing forecasting accuracy of selected grey and time series models based on energy consumption in Brazil and India. Expert Syst. Appl. 2023, 212, 118840.
Rodrigues, G.A.P.; Serrano, A.L.M.; Vergara, G.F.; Albuquerque, R.d.O.; Nze, G.D.A. Impact, Compliance, and Countermeasures in Relation to Data Breaches in Publicly Traded US Companies. Future Internet 2024, 16, 201.
Pimenta Rodrigues, G.A.; Marques Serrano, A.L.; Lopes Espiñeira Lemos, A.N.; Canedo, E.D.; Mendonça, F.L.L.d.; de Oliveira Albuquerque, R.; Sandoval Orozco, A.L.; García Villalba, L.J. Understanding Data Breach from a Global Perspective: Incident Visualization and Data Protection Law Review. Data 2024, 9, 27.
Borucka, A. Seasonal methods of demand forecasting in the supply chain as support for the company’s sustainable growth. Sustainability 2023, 15, 7399.
Alharbi, F.R.; Csala, D. A seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) forecasting model-based time series approach. Inventions 2022, 7, 94.
Trull, O.; García-Díaz, J.C.; Peiró-Signes, A. Forecasting irregular seasonal power consumption. An application to a hot-dip galvanizing process. Appl. Sci. 2020, 11, 75.
Kramar, V.; Alchakov, V. Time-Series Forecasting of Seasonal Data Using Machine Learning Methods. Algorithms 2023, 16, 248.
Kindalkar, S.S.; Itagi, A.R.; Kappali, M.; Karajgi, S. Time Series Based Short Term Load Forecasting using Prophet for Distribution System. In Proceedings of the 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India, 23–25 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
Almazrouee, A.I.; Almeshal, A.M.; Almutairi, A.S.; Alenezi, M.R.; Alhajeri, S.N. Long-term forecasting of electrical loads in kuwait using prophet and holt–winters models. Appl. Sci. 2020, 10, 5627.
Cihan, P. Time-series Forecasting of Energy Demand in Electric Vehicles and Impact of the COVID-19 Pandemic on Energy Demand. Sak. Univ. J. Comput. Inf. Sci. 2023, 6, 10–21.
Zhou, W.; Tao, H.; Jiang, H. Application of a novel optimized fractional grey holt-winters model in energy forecasting. Sustainability 2022, 14, 3118.
Aurna, N.F.; Rubel, M.T.M.; Siddiqui, T.A.; Karim, T.; Saika, S.; Arifeen, M.M.; Mahbub, T.N.; Reza, S.S.; Kabir, H. Time series analysis of electric energy consumption using autoregressive integrated moving average model and Holt Winters model. Telkomnika Telecommun. Comput. Electron. Control 2021, 19, 991–1000.
Lewis, C.D. Industrial and Business Forecasting Methods: A Practical Guide to Exponential Smoothing and Curve Fitting; Butterworth Scientific: Oxford, UK, 1982.
Sarkodie, S.A. Estimating Ghana’s electricity consumption by 2030: An ARIMA forecast. Energy Sources Part B Econ. Plan. Policy 2017, 12, 936–944.
Eshragh, A.; Ganim, B.; Perkins, T.; Bandara, K. The importance of environmental factors in forecasting australian power demand. Environ. Model. Assess. 2022, 27, 1–11.
Sigauke, C.; Chikobvu, D. Prediction of daily peak electricity demand in South Africa using volatility forecasting models. Energy Econ. 2011, 33, 882–888.
He, Y.; Zheng, Y.; Xu, Q. Forecasting energy consumption in Anhui province of China through two Box–Cox transformation quantile regression probability density methods. Measurement 2019, 136, 579–593.
Pao, H.T. Forecast of electricity consumption and economic growth in Taiwan by state space modeling. Energy 2009, 34, 1779–1791.
Qin, L.; Li, W. A combination approach based on seasonal adjustment method and echo state network for energy consumption forecasting in USA. Energy Effic. 2020, 13, 1505–1524.
Bernardi, M.; Petrella, L. Multiple seasonal cycles forecasting model: The Italian electricity demand. Stat. Methods Appl. 2015, 24, 671–695.
Angelopoulos, D.; Siskos, Y.; Psarras, J. Disaggregating time series on multiple criteria for robust forecasting: The case of long-term electricity demand in Greece. Eur. J. Oper. Res. 2019, 275, 252–265.
Strielkowski, W.; Firsova, I.; Lukashenko, I.; Raudeliūnienė, J.; Tvaronavičienė, M. Effective management of energy consumption during the COVID-19 pandemic: The role of ICT solutions. Energies 2021, 14, 893.
Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc. 1969, 37, 424–438.
Cheng, B.S. Energy consumption and economic growth in Brazil, Mexico and Venezuela: A time series analysis. Appl. Econ. Lett. 1997, 4, 671–674.
Magazzino, C.; Mutascu, M.; Mele, M.; Sarkodie, S.A. Energy consumption and economic growth in Italy: A wavelet analysis. Energy Rep. 2021, 7, 1520–1528.
Pirgaip, B.; Dinçergök, B. Economic policy uncertainty, energy consumption and carbon emissions in G7 countries: Evidence from a panel Granger causality analysis. Environ. Sci. Pollut. Res. 2020, 27, 30050–30066.
Tran, B.L.; Chen, C.C.; Tseng, W.C. Causality between energy consumption and economic growth in the presence of GDP threshold effect: Evidence from OECD countries. Energy 2022, 251, 123902.
Caldeira, A.A.; Wilbert, M.D.; Moreira, T.B.S.; Serrano, A.L.M. Brazilian State debt sustainability: An analysis of net debt and primary balance. Public Adm. Mag. 2016, 50, 285–306.

Figure 1. Number of published documents in energy forecasting.

Figure 2. Overview of the proposed solution.

Figure 3. Determination of AR and MA parameters.

Figure 4. Decomposition of the time series into the trend, seasonal, and residual components.

Figure 5. Variation of the Hurst exponent along the energy consumption time series and its components.

Figure 6. Granger causality results between energy consumption and different factors.

Figure 7. Pearson correlation coefficients between variables, selected according to Granger causality.

Figure 8. Cointegration between variables selected according to Granger causality.

Figure 9. Conditions of the training and testing of the models.

Figure 10. Mean absolute percentage error (MAPE—%) for the univariate models in different scenarios.

Figure 11. Comparison between the actual energy consumption and forecasting univariate models in their optimal training window sizes, according to Figure 10.

Figure 12. Mean absolute percentage error (MAPE—%) for the multivariate SARIMAX model with different exogenous variables.

Figure 13. Mean absolute percentage error (MAPE—%) for the multivariate FB Prophet model with different exogenous variables.

Figure 14. Comparison between the actual energy consumption and forecasting multivariate models in their optimal training window sizes, according to Figure 12 and Figure 13.

Figure 15. Endogenous energy consumption forecasting up to November 2033, using the models in their optimal training window sizes, according to Figure 10. The gray shades represent the upper and lower limits of the FB Prophet model.

Table 1. Brazil’s actions toward achieving SDG7 targets.

SDG Target	Description	Actions Taken by Brazil
7.1: Universal access to affordable, reliable, and modern energy services	Ensure everyone has access to clean and sustainable energy.	National Program for Universal Access to Electric Energy (Luz para Todos): Expands electricity access to remote and low-income areas Social rate programs: Provide subsidies for low-income households
7.2: Increase the share of renewable energy in the global energy mix	Promote the use of clean energy sources.	Biofuel program: Encourages the production and use of biofuels like sugarcane ethanol, reducing dependence on fossil fuels Investment in wind and solar energy: Government incentives and auctions promote the development of wind and solar farms
7.3: Double the global rate of improvement in energy efficiency	Reduce energy consumption without compromising economic activity.	National Energy Efficiency Plan (PNPE): Sets targets and strategies for various sectors to improve energy efficiency Industrial energy efficiency programs: Offer incentives and technical assistance to industries for adopting energy-saving practices
7.4: Enhance international cooperation for clean energy research and technology	Collaborate with other countries on clean energy development.	Participation in international initiatives: Brazil actively participates in forums like the International Renewable Energy Agency (IRENA) and Mission Innovation. Bilateral cooperation: Engages in joint research and development projects with other countries focusing on clean energy technologies
7.5: Expand infrastructure for sustainable energy services in developing countries	Assist developing nations in accessing clean energy solutions.	South-South Cooperation: Brazil shares its expertise and technology in renewable energy with other developing countries Technology transfer initiatives: Provides technical assistance and training programs for capacity building in clean energy technologies

Table 2. Strengths and limitations of forecasting models.

Model	Strengths and Limitations	Gaps in Literature
SARIMA/X	Captures seasonality and exogenous factors, complementing ARIMA models. Challenges arise when dealing with irregular seasonal components, requiring further refinement for optimal predictive outcomes.	How to address irregular seasonality for improved accuracy.
Facebook Prophet	Shows promising results in energy forecasting by incorporating factors such as seasonality and climate variables. Its performance may vary depending on the dataset and contextual factors.	How to optimize Prophet for specific energy demand datasets.
Holt–Winters	Captures seasonal patterns and handles non-linear data. However, its adaptability to changing data patterns is limited, and it may struggle to handle disruptions effectively.	How to improve Holt–Winters for handling changing data patterns.
TBATS	Exhibits flexibility in accommodating various seasonalities and trends, while also addressing non-normality through techniques like Box–Cox transformation. However, it needs a substantial amount of historical data for accurate predictions and may have limitations for long-term forecasting. Additionally, it operates under the assumption of a stable environment, which could impact its reliability in dynamic contexts.	How to adapt TBATS for long-term forecasting with potential disruptions.

Table 3. Best MAPE values achieved in other energy forecasting studies.

Reference	Year	Model	MAPE (%)	Target Country
[52]	2017	ARIMA	5.34	Ghana
[18]	2022	FB Prophet	3.01	India
[53]	2022	SARIMA-regression	2.48	Australia
[38]	2023	RNN	2.40	Brazil
[54]	2011	Reg–SARIMA–GARCH	1.42	South Africa
[55]	2019	N-BCQR	1.40	Anhui (China)
This work	2024	SARIMA	1.38	Brazil
[56]	2009	ECSTSP	1.10	Taiwan
[57]	2020	SEEMDGESN	0.89	United States
[58]	2015	Exponential smoothing	0.85	Italy
[59]	2019	Ordinal regression	0.74	Greece

Table 4. Parameters selected for each model.

Model	Library (Version)	Parameters	Reference Section
SARIMA	statsmodels (0.14.2)	Seasonality = 12	Section 4.3
		Autoregressive = 1	Section 4.2
		Differences = 1	Section 4.1
		Moving Average = 1	Section 4.2
FB Prophet	prophet (1.1.5)	Automatic
Holt–Winters	statsmodels (0.14.2)	Seasonality = 12	Section 4.3
		Seasonality component = additive	Section 4.3.1
TBATS	TBATS (1.1.3)	Seasonality = 12	Section 4.3

Table 5. Comparison of MAPE, MPE, RMSE, and NRMSE_minmax between the univariate models in their optimal conditions, according to Figure 10.

Smoothing	Metric	SARIMA	FB Prophet	Holt–Winters	TBATS
No	MAPE (%)	1.38	1.91	3.85	4.17
	MPE (%)	−0.53	−1.72	3.82	4.17
	RMSE (MWh)	$8.68 \times 10^{5}$	$1.07 \times 10^{6}$	$2.03 \times 10^{6}$	$2.13 \times 10^{6}$
	NRMSE_minmax	0.15	0.19	0.37	0.39
Yes	MAPE (%)	1.44	0.71	2.53	1.44
	MPE (%)	−1.30	−0.03	2.50	1.44
	RMSE (MWh)	$6.87 \times 10^{5}$	$3.99 \times 10^{5}$	$1.37 \times 10^{6}$	$7.81 \times 10^{5}$
	NRMSE_minmax	0.23	0.13	0.46	0.26
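
The four metrics reported in Table 5 can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `forecast_metrics` is hypothetical, the sign convention for MPE (actual minus forecast) is an assumption, and NRMSE_minmax is assumed to be the RMSE divided by the range of the observed values.

```python
import math


def forecast_metrics(actual, forecast):
    """Return MAPE (%), MPE (%), RMSE and min-max normalized RMSE.

    Assumptions (not confirmed by the source): errors are computed as
    actual minus forecast, and NRMSE_minmax = RMSE / (max - min) of the
    actual values.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage errors are undefined when an actual value is zero")

    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]

    # Mean absolute percentage error: average magnitude of the relative error.
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual))
    # Mean percentage error: keeps the sign, so it exposes systematic bias.
    mpe = 100.0 / n * sum(e / a for e, a in zip(errors, actual))
    # Root mean squared error, in the same unit as the series (MWh in Table 5).
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("min-max normalization is undefined for a constant series")
    nrmse_minmax = rmse / value_range

    return {"MAPE": mape, "MPE": mpe, "RMSE": rmse, "NRMSE_minmax": nrmse_minmax}
```

Under these definitions, a smoothed series can show a lower RMSE but a higher NRMSE_minmax than the original one, because smoothing narrows the range used as the denominator.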

Table 6. Comparison of MAPE, MPE, RMSE, and NRMSE_minmax between the multivariate models in their optimal conditions, according to Figure 12 and Figure 13.

Smoothing	Metric	SARIMAX	FB Prophet
No	MAPE (%)	1.28	1.62
	MPE (%)	0.08	0.29
	RMSE (MWh)	$6.94 \times 10^{5}$	$9.06 \times 10^{5}$
	NRMSE_minmax	0.13	0.17
Yes	MAPE (%)	1.18	0.84
	MPE (%)	−0.80	0.10
	RMSE (MWh)	$5.81 \times 10^{5}$	$4.75 \times 10^{5}$
	NRMSE_minmax	0.20	0.16

Table 7. Ljung–Box test p-values at a lag of four months for the optimal training window sizes of the univariate models, according to Figure 10.

Model	Original	Smoothed
SARIMA	0.277	0.0107
FB Prophet	0.404	0.0007
Holt–Winters	0.002	0.0000008
TBATS	0.001	0.0001
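
To make the p-values in Table 7 easier to interpret, the sketch below computes the Ljung–Box statistic for a residual series. It is an illustration under stated assumptions, not the procedure used in the study: the function name `ljung_box` is hypothetical, the degrees of freedom are set equal to the number of lags with no adjustment for fitted model parameters, and the closed-form chi-square tail used here is valid only for an even number of lags (such as the four-month lag of the table). In practice, statsmodels offers `acorr_ljungbox` in `statsmodels.stats.diagnostic` for the same test.

```python
import math


def ljung_box(residuals, lags=4):
    """Return (Q, p_value) of the Ljung-Box test for the given residuals.

    Assumptions: degrees of freedom equal `lags` (no correction for the
    number of fitted model parameters) and `lags` is even, so that the
    chi-square survival function has a simple closed form.
    """
    n = len(residuals)
    if lags < 2 or lags % 2 != 0 or lags >= n:
        raise ValueError("lags must be even, at least 2, and smaller than the sample size")

    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("residuals must not be constant")

    # Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k), with r_k the lag-k autocorrelation.
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)

    # Chi-square survival function for even df = 2m:
    # P(X > q) = exp(-q/2) * sum_{i=0..m-1} (q/2)^i / i!
    half = q / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(lags // 2))
    return q, p_value
```

The null hypothesis of the test is that the residuals show no autocorrelation up to the chosen lag. Assuming a conventional 0.05 significance level, a p-value above that threshold (for example, the SARIMA and FB Prophet entries in the Original column) means the hypothesis is not rejected.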

Table 8. Ljung–Box test p-values at a lag of four months for the optimal training window sizes of the multivariate models, according to Figure 12 and Figure 13.

Model	Original	Smoothed
SARIMAX	0.570	0.001
FB Prophet	0.151	0.006

Table 9. Time performance for forecasting with 215 months as the training window size. The margin of error was calculated with a confidence level of 98% and n = 25.

Model	Average (s)	Margin of Error (s)
SARIMA	2.634	0.392
FB Prophet	0.239	0.031
Holt–Winters	0.031	0.008
TBATS	147.268	10.573
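
A margin of error such as the one in Table 9 can be obtained from repeated timing runs. The sketch below is an assumption-laden illustration rather than the authors' method: the function name `margin_of_error` is hypothetical, and it uses a normal (z) critical value, whereas the source does not state which distribution was used. With n = 25, a Student's t critical value would give a slightly wider margin.

```python
import math
from statistics import NormalDist, stdev


def margin_of_error(samples, confidence=0.98):
    """Half-width of a two-sided confidence interval for the mean.

    Assumption: normal approximation (z critical value); the source does
    not say whether a z or a Student's t quantile was used.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate the spread")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(samples) / math.sqrt(n)
    return z * standard_error
```

Calling `margin_of_error(run_times, confidence=0.98)` on a list of 25 measured run times (in seconds) for one model would yield a value comparable in meaning to the last column of the table.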

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

