**1. Introduction**

The advent of Industry 4.0 revolutionized factories worldwide, since it allowed the connectivity between measuring machines and the automation of companies, distributing the capacity to collect massive volumes of data [1]. In high-level data analysis, forecasting models allow the extraction of behavior patterns, as well as the prediction of future values for the collected data set [2].

In the above-mentioned scenario, the construction of predictive models is gaining prominence in the literature [3–5], since economic agents deal with uncertainty in multiple spheres and aim to achieve the best results using available resources [6]. Developing acceptably accurate models presents a meaningful challenge, as prediction is a technique that deals with risk and there will always be a fundamental error associated with it. The best model is the one that most adequately represents the phenomenon of interest.

In relation to the object of our study, power generation, there are several forecasting applications: (i) classical time series models like the autoregressive moving average, autoregressive integrated moving average, and generalized autoregressive conditional heteroscedastic among others [7,8]; (ii) pre-processing techniques like spectrum analysis, wavelets, and Fourier analysis [9]; and, (iii) machine learning approaches such as neural networks, fuzzy systems, and support vector machine [10]. Alternatively, hybrid models aim to combine machine learning representations with di fferent methods. These methods include focused time-delay neural networks [11], wavelet neuro-fuzzy systems [12], finite-impulse response neural networks [13], local feedback dynamic fuzzy neural networks [14], type recurrent fuzzy networks [15], and neuro-fuzzy inference systems [16] among others.

Additionally, an alternative class known as hierarchical forecasting [17–19] deals with organized time series that can be aggregated at di fferent levels into groups based on geography, sources of energy, or other, specific features. Despite this being a recent topic, there is already research that has addressed the use of hierarchical forecasting models in the energy sector. Examples of hierarchical forecasting include electrical grids [20], solar power generation [21], energy transport [22], short-term load forecasting [23], long-term load forecasting [24], energy consumption [25], and air pollution [26] among others.

The papers identified above have calibrated the forecasts using only the bottom-up, top-down or Ordinary least squares (OLS) assumptions [19]. Thus, the following research question is formulated: how is it possible to make hierarchical predictions using advanced linear regression models with regularization? In this way, it is expected to obtain more reliable forecasts by rewriting the hierarchical problem in terms of finding a set of unbiased, minimum variance measures of projected values across the whole array of data. It is possible to minimize the sum of variances of the reconciled estimate errors under the property of unbiasedness, using the procedure called MinT (minimum trace) reconciliation [27].

The present paper presents a case study using a power generation data set from Brazil (2018–2020) organized by electrical subsystems and di fferent generating sources. Specifically, the main approaches used to aggregate and disaggregate predictions made for grouped time series are examined, namely: (i) bottom-up, (ii) top-down and (iii) optimal reconciliation models (OLS, WLS and MinT). The ARIMA and ETS predictive models were used to test the performance of these reconciliation methods, since these are the default models available in the R-package HTS. Further descriptions can be found in the materials and methods section.

The remainder of the present paper is organized as follows. Section 2 defines the study methodology, describing the data set, hierarchical procedures, and forecasting models employed. Section 3 presents the results and discussions of the techniques, in addition to the limitations of this paper. Finally, Section 4 presents the conclusions and guidelines for future work.

### **2. Materials and Methods**

The secondary data used in this study correspond to the amounts of power generated by each of the Brazilian electrical subsystems (North, Northeast, Southeast/Midwest, and South). We separated these data according to the source of energy (wind, hydroelectric, thermal, solar, and nuclear). Data were obtained from the National Electric System Operator [28], due to their reliability. The observations of hourly power generation (GWh) were made during the period from January 2018 to January 2020, making a total of 17,521 h.

Based on Hyndman et al. [19], we present a schematic representation of the Brazilian energy generation system, comprising a three-level hierarchical structure (Figure 1). Level 0 represents the total energy generated in Brazil (completely aggregated series). Level 1 denotes each of Brazil's electrical subsystems (first level of disaggregation). The last level, Level 2, represents each of the energy generating sources (Level *k*). According to this framework, it is possible to identify the most disaggregated time series (in this case *k* = 2).

Table 1 shows the amounts of power generation in Brazil (GWh), according to generating sources and electrical subsystems. There is a predominance of hydroelectric generation (73%), making the Brazilian electrical matrix one of the cleanest in the world. At the same time, the Southeast/Midwest subsystem accounts for more than half (56%) of all energy generated in the country.

**Figure 1.** Hierarchical aggregation structure for the energy generation in Brazil.


**Table 1.** Amounts of power generation in Brazil (GWh).

Routines were implemented using the R® programming language [29]. The R-package HTS was used to calculate the bottom-up, top-down, optimal combination reconciliation and trace minimization reconciliation. HTS is available at: https://cran.r-project.org/web/packages/hts/index.html. Although HTS includes functions for creating, plotting and forecasting hierarchical time series, it has some limitations. Those limitations include the fact that it has only three built-in forecasting options: ARIMA, ETS, and random walks [19]. This paper will use the ARIMA and the ETS models since they have automatic adjustment and allow consideration of factors such as the trend and seasonality of the data set. The computer used to execute the algorithms had CPU Intel Core i5-7200 2.70 GHz, RAM of 16 GB, and operating system Windows 10 x64. In the next subsection, we present the hierarchical reconciliation models used in the present paper, as well as the forecasting models.

### *2.1. The Bottom-Up (BU) Approach*

The BU procedure requires first providing forecasts for every series at the bottom-level, and then summing these to generate forecasts for all the levels of the hierarchical structure [30]. In its simplicity, this approach neglects the relations between time series and works, mainly unsuccessfully, on highly disaggregated data. These data tend to have a low signal-to-noise ratio [27]. According to the hierarchy (Figure 1), we first make h-step-ahead forecasts for all the bottom-level time series (*n* = 14):

*y* ˆ *AA*,*t*, *y* ˆ *AB*,*t*, *y* ˆ *AC*,*t*, *y* ˆ *BA*,*t*, *y* ˆ *BB*,*t*, *y* ˆ *BC*,*t*, *y* ˆ *BD*,*t*, *y* ˆ *CA*,*t*, *y* ˆ *CB*,*t*, *y* ˆ *CC*,*t*, *y* ˆ *CD*,*t*, *y* ˆ*DA*,*t*, *y* ˆ*DB*,*t*, *y* ˆ *DC*,*t*. (1) Summing these, we obtain h-step-ahead forecasts for the rest of the series:

$$
\begin{aligned}
\widetilde{y}\_{t} &= \widehat{y}\_{AA,t} + \widehat{y}\_{AB,t} + \widehat{y}\_{AC,t} + \widehat{y}\_{BA,t} + \widehat{y}\_{BB,t} + \widehat{y}\_{BC,t} + \widehat{y}\_{BD,t} + \widehat{y}\_{CA,t} + \widehat{y}\_{CB,t} \\ &+ \widehat{y}\_{CC,t} + \widehat{y}\_{CD,t} + \widehat{y}\_{DA,t} + \widehat{y}\_{DB,t} + \widehat{y}\_{DC,t}.
\end{aligned}
\begin{aligned}
\widetilde{y}\_{t} &= \widehat{y}\_{CA,t} + \widehat{y}\_{DA,t} + \widehat{y}\_{DC,t} + \widehat{y}\_{DC,t}.
\end{aligned}
$$

$$
\begin{aligned}
\widetilde{y}\_{B,t} &= \widehat{y}\_{BA,t} + \widehat{y}\_{BB,t} + \widehat{y}\_{BC,t}.
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
\widetilde{y}\_{C,t} &= \widehat{y}\_{CA,t} + \widehat{y}\_{CB,t} + \widehat{y}\_{CC,t} + \widehat{y}\_{CD,t}.
\end{aligned}
\tag{3}
$$

According to [19], it is possible to arrange the equations expressed in (2) into an algebra notation. Below is a complete notation for this problem:


Alternatively, the notation presented in (3) can be reformulated in a compact way by applying the summing matrix. Thus, the bottom-up approach can be represented as:

$$
\overline{y\_t} = S\delta\_{t,\*} \tag{4}
$$

where *yt* is an *n*-dimensional vector of *h*-step-ahead forecasts for the total energy, *S* is the summing matrix, and ˆ *bt* is an *m*-dimensional vector of *h*-step-ahead forecasts for each of the sources of energy at bottom-level. An advantage of this procedure is that we are forecasting at the bottom-level of a hierarchy. Consequently, no information is missed due to aggregation [17].

### *2.2. The Top-Down (TD) Approach*

Top-down methods operate with strictly hierarchical aggregation structures, not with grouped structures. They involve first making forecasts for the Total level *yt*, and next disaggregating these down the hierarchy [17]. Let *p*1, ... , *pm* be a set of disaggregation proportions that deliver the forecasts of the Total series, which are to be distributed in order to obtain forecasts for all series at the bottom-level of the structure. To illustrate, concerning our hierarchy by applying proportions to Figure 1, we ge<sup>t</sup> *p*1, ... , *p*14:

$$\begin{aligned} \overline{y}\_{AA,t} &= p\_1 \hat{y}\_{t\prime}, \overline{y}\_{AB,t} = p\_2 \hat{y}\_{t\prime}, \overline{y}\_{A\mathcal{C},t} = p\_3 \hat{y}\_{t\prime} \\ \overline{y}\_{BA,t} &= p\_4 \hat{y}\_{t\prime}, \overline{y}\_{BB,t} = p\_5 \hat{y}\_{t\prime}, \overline{y}\_{BC,t} = p\_6 \hat{y}\_{t\prime}, \overline{y}\_{BD,t} = p\_7 \hat{y}\_{t\prime} \\ \overline{y}\_{CA,t} &= p\_8 \hat{y}\_{t\prime}, \overline{y}\_{C\mathcal{B},t} = p\_9 \hat{y}\_{t\prime}, \overline{y}\_{CC,t} = p\_{10} \hat{y}\_{t\prime}, \overline{y}\_{CD,t} = p\_{11} \hat{y}\_{t\prime} \\ \overline{y}\_{DA,t} &= p\_{12} \hat{y}\_{t\prime}, \overline{y}\_{DB,t} = p\_{13} \hat{y}\_{t\prime}, \overline{y}\_{DC,t} = p\_{14} \hat{y}\_{t\prime}. \end{aligned} \tag{5}$$

This can be rewritten using matrix notation. If we stack the set of proportions in an *m*-dimensional vector *p* = (*p*1, ... , *pm*)-, we have the bottom-level *h*-step-ahead predictions. Overall, for a given set of proportions, top-down approaches can be written as:

$$\begin{array}{l} b\_{t} = p\_{j} \mathfrak{g}\_{t}. \\ \widetilde{y}\_{t} = \mathcal{S} p\_{j} \mathfrak{g}\_{t}. \end{array} \tag{6}$$

The main TD models stipulate disaggregation proportions according to the historical proportions of the data. Among the main models of this approach, we highlight the following three: (i) top-down Gross–Sohl method A (TDGSA), (ii) top-down Gross–Sohl method F (TDGSF), and (iii) Top-down forecast proportions (TDFP) (Table 2). Additional details and demonstrations of Table 2 can be obtained from [18,31].

**Table 2.** TD disaggregation proportions according to the historical proportions of the data.


### *2.3. The Optimal Reconciliation Approaches*

The optimal reconciliation approach proposed by [19] consists of an ordinary least squares problem based on the calculation of independent projections for all hierarchical levels, then applying a regression model to optimize the combination of these forecasts. According to [32], we can write the base prediction as:

$$
\hat{y}\_{t+h|t} = \mathcal{S}\beta\_{t+h|t} + \varepsilon\_{h\prime} \tag{7}
$$

where β*t*+*h*|*t* represents the unknown conditional mean of the most disaggregated series, and ε*h* is the error with mean of zero and covariance matrix *<sup>h</sup>*. If *h* were known, the estimator of β*t*+*h*|*t* would lead to the following weighted least squares, producing reconciled forecasts, as follows:

$$
\widetilde{y}\_{t+h|t} = S\widehat{\beta}\_{t+h|t} = S(S' \sum\_{lh}^{-1} S)^{-1} S' \sum\_{lh}^{-1} \widehat{y}\_{t+h|t} = SP\widehat{y}\_{t+h|t} \tag{8}
$$

where *P* = (*S*- −<sup>1</sup> *h S*)<sup>−</sup>1*S*- −<sup>1</sup> *h S*. If the base forecasts *y*<sup>ˆ</sup>*t*+*h*|*t* are unbiased, then the reconciled forecasts *yt*+*h*|*t* will be unbiased, provided that *SPS* = *S* [19]. This condition is valid for this reconciliation procedure for the bottom-up, although not for the top-down, methods. Consequently, the top-down approaches will never give unbiased reconciled forecasts, even if the base forecasts are unbiased. Additionally, [27] proved that, in general, *h* is not known and not identifiable. The covariance matrix of the *h*-step-ahead reconciled forecast errors is given by the following expression:

$$Var\_{(y\_{t+h} - \widetilde{y}\_{t+h|t})} = SPW\_h P'S',\tag{9}$$

for any *P* such that *SPS* = *S*, then *Wh* = *Var*(*yt*+*h*−*y*<sup>ˆ</sup>*t*+*h*|*t*) = *<sup>E</sup>*(*e*<sup>ˆ</sup>*t*+*h*|*te*<sup>ˆ</sup>*t*+*h*|*t*) is the covariance matrix of the corresponding *h*-step ahead base forecast errors. The purpose is to ge<sup>t</sup> the matrix *P* that minimizes the error variances of the reconciled forecasts which are on the diagonal of the covariance matrix *Var*(*yt*+*h*−*yt*+*h*|*t*). Finally, [27] demonstrated that the optimal reconciliation matrix *P* that minimizes the trace of *SPWhP*-*S*- =, such that *SPS* = *S*, and the optimal reconciled forecasts, respectively, are given by:

$$\begin{aligned} P &= \left(\mathbf{S'}\mathbf{W}\_h^{-1}\mathbf{S}\right)^{-1}\mathbf{S'}\mathbf{W}\_h^{-1} \\ \widetilde{y}\_{t+h\vert t} &= \mathbf{S}\left(\mathbf{S'}\mathbf{W}\_h^{-1}\mathbf{S}\right)^{-1}\mathbf{S'}\mathbf{W}\_h^{-1}\hat{y}\_{t+h\vert t\prime} \end{aligned} \tag{10}$$

which is introduced as the MinT (minimum trace) estimator. The next step consists of estimating *Wh*, a matrix of order *n*. Wickramasuriya, Athanasopoulos and Hyndman [27] proposed the following procedures (Table 3) to obtain the matrix:



Source: adapted by authors from: [27].

### *2.4. ARIMA and ETS Formulation*

ARIMA is one of the most-widely-used time series approaches for forecasting power generation [33]. Although studies have shown that ETS outperforms ARIMA [34], it is recommended to keep ARIMA as a reference model during the forecasting process. Moreover, several statistical software packages, like <sup>R</sup>®, provide automatic model identification and parameter estimation skills for both ARIMA and ETS [17]. Professor Hyndman [19] developed the HTS package initially based on these predictive models. The present paper aims to test different approaches to optimal forecast reconciliation and, to do so, only the ARIMA and ETS models will be used. It is recommended that future studies extend these forecasting procedures using different predictive models, such as machine learning ones.

ARIMA was proposed by [33]. It is a linear forecasting method for dealing with stationary time series [34]. In the initial step, a time series is built stationary by differencing *d* times along with some nonlinear transformations, such as logging [34]. The consequential data are recognized as a linear function of past *p* data values and *q* errors (11), i.e., modeled as an autoregressive moving average (ARMA) model,

$$y\_t = \mathcal{Q}\mathbf{1}y\_{t-1} + \mathcal{Q}\mathbf{2}y\_{t-2} + \dots + \mathcal{Q}\_{p}y\_{t-p} + \Theta\mathbf{1}\varepsilon\_{t-1} + \Theta\mathbf{2}\varepsilon\_{t-2} + \dots + \Theta\_{\emptyset}\varepsilon\_{\emptyset-1},\tag{11}$$

where *yt* denotes real value at time *t*, ε*t* describes the error sequence: it is supposed to be white noise and Gaussian distributed (0, <sup>σ</sup><sup>2</sup>). ∅*i* for (*i* = 1, 2, ... , *p*) are autoregressive (*AR*) coefficients and Θ*j* for (*j* = 1, 2, ... , *q*) are moving average (*MA*) coefficients. *p* and *q* are integers referred to as model orders. The time series model is denoted as *ARIMA*(*p*, *d*, *q*) [35,36].

According to [34], the group of exponential smoothing methods utilizes the principle of weighted averages of past information for making forecasts. Since its formulation in 1950, a variety of exponential smoothing methods have been developed. All exponential smoothing methods were initially classified by [37], which has been continued by [38–40]. ETS stands for error, trend, and seasonality elements. As pointed by [34], the usual representation for these patterns involves a state vector *xt* = (*lt*, *bt*,*st*,*st*−1, ... ,*st*−*m*+<sup>1</sup>)-, and the state space equations [39] have the resulting structure:

$$\begin{aligned} y\_t &= w(\mathbf{x}\_{t-1}) + r(\mathbf{x}\_{t-1})\varepsilon\_t \\ \mathbf{x}\_t &= f(\mathbf{x}\_{t-1}) + g(\mathbf{x}\_{t-1})\varepsilon\_t \end{aligned} \tag{12}$$

where (<sup>ε</sup>*t*) denotes a Gaussian white noise (0, σ<sup>2</sup>) and μ*t* = *w* (*xt* − <sup>1</sup>). The model with additive error has *rt*(*xt* − 1) = 1, so *yt* = μ*t* + ε*t*. The model with multiplicative errors has *rt*(*xt* − 1) = μ*t* = μ*<sup>t</sup>*, so *yt* = μ*t*(<sup>1</sup> + <sup>ε</sup>*t*). Consequently, ε*t* = (*yt* − μ*t*)/μ*t* is a relative error for the multiplicative model and any value of *rt*(*xt* − 1) will lead to the identical point forecast for *yt* [34,39].

### *2.5. Evaluating Forecast Accuracy*

According to [20], there are several accuracy metrics, such as mean absolute percentage error (*MAPE*), mean absolute error (*MAE*), mean absolute scaled error (*MASE*), or root-mean-square error (*RMSE*), to evaluate the performance of point prediction methods, defined as follows:

$$MAPE = \frac{1}{T} \sum\_{t=1}^{T} \left| \frac{y\_t - \hat{y}\_t}{y\_t} \right|. \tag{13}$$

$$MAE = \frac{1}{T} \sum\_{t=1}^{T} \left| y\_t - \hat{y}\_t \right|. \tag{14}$$

$$MAE = \frac{MAE}{MAE\_{in-sample\,naive}} \tag{15}$$

$$RMSE = \sqrt{\frac{1}{T} \sum\_{t=1}^{T} \left( y\_t - \hat{y}\_t \right)^2} \tag{16}$$

where *yt* is the amount of power generation at time t, *y*ˆ*t* is the fitted value for power generation, and *MAEin*−*sample*,*naive* is the MAE generated by a naive forecast.

Specifically, in studies of hierarchical time series, the *MAPE* indicator appears the most frequently in the literature [41–43]. *MAPE* was also the selected metric for the present paper (Figures 2 and 3). Complementarily, *MAE*, *MASE*, and *RMSE* were estimated, and the results can be found in the Appendix A (Figures A3 and A4). The values of the *MAPE*, *MAE*, *MASE* and *RMSE* statistics were obtained using a weighted average, with proportions from Table 1.


**Figure 2.** Hierarchical forecasting for electricity generation based on the ARIMA procedure (MAPE). (Note: The performance was indicated into a color scale, where green means better values for calculated accuracy, and red means worse accuracy. The intermediate values are colored yellow.).


**Figure 3.** Hierarchical forecasting for electricity generation based on the ETS procedure. (Note: The performance was indicated into a color scale, where green means better values for calculated accuracy, and red means worse accuracy. The intermediate values are colored yellow.).

### **3. Results and Discussion**

Figure 2, below, shows the predictive result obtained, using the ARIMA model, considering a predictive window of nine hours (*h* = 1, ... , <sup>9</sup>). Note that the model was estimated, taking the main hierarchical adjustment approaches into account, for the following levels: (i) total power generation in Brazil (Level 0), (ii) total energy generation by electrical subsystem (Level 1), and (iii) total energy generation by the energy generating source (Level 2). For Level 1, four forecasts (one for each electrical subsystem) were estimated. For Level 2, 14 forecasts (one for each energy source) were estimated.

Therefore, we estimated 1539 predictive models satisfying the following proportions: (i) 81 models for Level 0, (ii) 324 models for Level 1, and (iii) 1134 models for Level 2. The MAPE calculation for Levels 1 and 2 was based on a weighted average of the predictive errors. The weighting factors used are shown in Table 1.

The performance of each predictive model, divided by the forecast horizon, is illustrated by a color scale. The green colors indicate the most accurate forecasts, while the red colors symbolize less accurate forecasts. The best forecasts, for each of the predictive horizons, are highlighted in bold. The last column of Table 1 presents the average performance for each forecast horizon (*h*) for each hierarchical approach.

As pointed by [27], the MinT procedure has a useful feature: it systematizes results into a unique analytical solution that incorporates information about the correlation structure of the entire dataset. Additionally, the minimum trace reconciliation, with or without regularization, presented the best results of all linear reconciliation methods, such as OLS and WLS, with variations. Moreover, the MinT (Sample) approach returns the most accurate, coherent forecasts for all levels considering just the first forecast horizons. However, as the predictive window grows, the BU method becomes more accurate. Furthermore, the performance of the BU model increases as the time series disaggregate.

As expected, the results obtained using the top-down technique did not present good predictive results, since it is intended to generate forecasts for level 0, with worse accuracy for the other levels. Both BU and TD present disadvantages: they do not take the correlation among the series at each level into account.

The other accuracy metrics presented in the Appendix A (*MAE*, *MAE*, and *RMSE*) reinforce the results found. In general, the performance of the optimal reconciliation models, by trace minimization, provides more uniform estimates and better predictive potential for the first hours of the predictive horizon (Figures A3 and A4).

In addition to the ARIMA predictive model, Figure 3 presents the same forecasting procedures. However, they are based on the ETS automatic adjustment model. The objective is to show the influence of di fferent forecasting methods for each hierarchical reconciliation model. In general, the error percentage produced by the ETS model was slightly higher than that produced by the ARIMA model. Figure 3 also shows the influence of trace minimization procedures (MinT) on the improvement of predictive performance. In particular, the MinT models have good predictive performance, even with the increase of the forecast horizon hours.

The average performance of the trace minimization (MinT) models shows stability, considering all hierarchical levels. As shown in Figure 2, the ETS-based predictive model shares some similarities with the ARIMA model. The BU technique is better for the most disaggregated levels, whereas the TD technique stands out only at the more aggregated levels. Note that the trace minimization procedures show significant gains over the classic linear models, namely OLS and WLS.

Figures 2 and 3 present some limitations. In general, it is not possible to test the predictive influence of each of the subsystems within the established forecast horizon. To show this problem, Figure 4 presents a predictive comparison (MAPE) for each of the Brazilian electrical subsystems, considering the nine-hour predictive horizon. On the left is the technique with the best aggregation/disaggregation performance (BU) for the ARIMA model. On the right is the technique with the best average performance (MinT) for the ETS automatic selection model.

Figure 4 thus shows a negative influence of the "south" electrical subsystem in the global measures of accuracy, especially from a predictive horizon of three hours onward. This system should be analyzed more thoroughly to identify energy sources located in the "south" subsystem that contributed most to the predictive instability of this system. Simultaneously, the use of individualized predictive models for this "south" system can be a good strategy, since unique climatic conditions exist in southern Brazil.

**Figure 4.** Hierarchical forecasting for power generation: electrical subsystem versus forecast horizon.

Figures A1 and A2 (Appendix A) present the accuracy measure of the ARIMA and ETS models in detail, considering energy sources versus electrical subsystems. These results reinforce those in Figure 4, indicating instability in the southern subsystem, especially wind energy data.

Finally, some limitations of the present paper are recognized here. First, predictive models are based on past information evaluable, so the presented results cannot be extrapolated for different contexts and other time periods. Additionally, it is necessary to incorporate other predictive models to make the results more robust. In future research, it is recommended that models which integrate high-frequency data, e.g., the Wavelet approach, be adopted.
