**5. Conclusions**

Retailers need forecasts for a huge number of related time series, which can be organised into a hierarchical structure. Sales at the SKU level can be naturally aggregated into categories, families, areas, stores, and regions. To ensure aligned decision-making across the hierarchy, it is essential that forecasts at the most disaggregated level add up to the forecasts at the aggregate levels above. It is not immediately clear whether these aggregate forecasts should be generated independently or by using a hierarchical forecasting method, which ensures coherent decision-making at the different levels but does not guarantee at least the same accuracy. To give guidelines on this issue, our empirical study investigates the relative performance of independent and reconciled forecasting approaches.

We use weekly data of SKU sales from one large store of a Portuguese retailer, spanning the period between 3 January 2012 and 27 April 2015, and consider the hierarchical structure of products adopted by the company from the top level to the bottom level, comprising six levels overall. We generate the independent forecasts using two alternative forecasting model families, namely ETS and ARIMA. These are compared to the most commonly used hierarchical forecasting approaches. We evaluate the forecast accuracy of the competing methods using the Average Relative Mean Squared Error, computed through cross-validation based on a rolling forecast origin.
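The evaluation scheme described above can be illustrated with a minimal Python sketch. It assumes the usual definition of the Average Relative Mean Squared Error as the geometric mean, across series, of the ratio between a method's MSE and a benchmark's MSE. The function names (`rolling_origin_sq_errors`, `avg_rel_mse`), the toy series, and the two simple forecasters are hypothetical and stand in for the ETS and ARIMA models used in the study.

```python
import numpy as np

def naive_forecast(train, horizon):
    """Hypothetical benchmark: repeat the last observed value."""
    return np.full(horizon, train[-1])

def mean_forecast(train, horizon):
    """Hypothetical competitor: repeat the training mean."""
    return np.full(horizon, np.mean(train))

def rolling_origin_sq_errors(y, forecast_fn, min_train, horizon):
    """Squared forecast errors for every rolling forecast origin.

    Returns an array of shape (n_origins, horizon).
    """
    y = np.asarray(y, dtype=float)
    errors = []
    # Move the origin forward one observation at a time.
    for origin in range(min_train, len(y) - horizon + 1):
        train = y[:origin]
        actual = y[origin:origin + horizon]
        errors.append((actual - forecast_fn(train, horizon)) ** 2)
    return np.array(errors)

def avg_rel_mse(mse_method, mse_benchmark):
    """Geometric mean of per-series MSE ratios (below 1 favours the method)."""
    ratios = np.asarray(mse_method, dtype=float) / np.asarray(mse_benchmark, dtype=float)
    return float(np.exp(np.mean(np.log(ratios))))

# Toy data (assumption): two noisy weekly series, not the retailer's data.
rng = np.random.default_rng(0)
series = [50 + rng.normal(0, 5, 120), 200 + rng.normal(0, 20, 120)]

mse_mean, mse_naive = [], []
for s in series:
    # One-step-ahead MSE averaged over all forecast origins.
    mse_mean.append(rolling_origin_sq_errors(s, mean_forecast, 52, 1).mean())
    mse_naive.append(rolling_origin_sq_errors(s, naive_forecast, 52, 1).mean())

score = avg_rel_mse(mse_mean, mse_naive)
```

Because the ratios are averaged geometrically, series with very different sales volumes contribute on a comparable scale, which is one reason a relative measure of this kind suits a hierarchy that mixes SKU-level and store-level series.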

It is clear that MinT-Shrink forecasts generally improve on the accuracy of the ARIMA base forecasts for all levels and for the complete hierarchy, across all forecast horizons. The accuracy gains generally increase with the horizon, varying between 1.7% and 3.7% for the complete hierarchy. That is not the case for all other reconciliation methods, attesting to the difficulty of producing reconciled forecasts that are at least as accurate as base forecasts. The accuracy improvements of MinT-Shrink forecasts, across all forecast horizons, are more pronounced with the ARIMA base forecasts than with the ETS base forecasts (with the exception of horizon *h* = 1 at the bottom level), although the former are almost always more accurate than the latter.

To improve on the accuracy of the base forecasts, the reconciliation methods have to take advantage of the combination of informative signals from all levels of aggregation. It is clear that MinT-Shrink is able to do this and, hence, improvements in forecast accuracy over the base forecasts are attained. It is also evident that the gains in forecast accuracy are more substantial at higher levels of aggregation, which means that the information about the individual dynamics of the series, lost when aggregating, is brought back from the lower levels of aggregation to the higher levels by the reconciliation process, substantially improving the forecast accuracy over the base forecasts.
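The way reconciliation combines signals from all levels can be sketched in Python for a toy two-level hierarchy (Total = A + B). The sketch assumes the standard trace-minimisation form, in which reconciled forecasts are obtained as S (S' W⁻¹ S)⁻¹ S' W⁻¹ ŷ, with W a covariance matrix of the base forecast errors shrunk towards its diagonal. The function name `mint_shrink_reconcile`, the simulated residuals, and the fixed shrinkage intensity `lam` are assumptions for illustration; in practice the intensity is estimated from the data rather than fixed.

```python
import numpy as np

def mint_shrink_reconcile(base_forecasts, S, residuals, lam=0.5):
    """Reconcile base forecasts with a shrunk error covariance.

    base_forecasts: vector of forecasts for all series (top level first).
    S: summing matrix mapping bottom-level series to all series.
    residuals: in-sample one-step errors, shape (n_obs, n_series).
    lam: shrinkage intensity in (0, 1]; fixed here as an assumption.
    """
    resid = np.asarray(residuals, dtype=float)
    # Sample covariance of the base forecast errors.
    W1 = np.cov(resid, rowvar=False)
    # Shrink the off-diagonal entries towards zero.
    W = lam * np.diag(np.diag(W1)) + (1.0 - lam) * W1
    W_inv = np.linalg.inv(W)
    # G maps the base forecasts of all series to bottom-level forecasts.
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
    return S @ G @ np.asarray(base_forecasts, dtype=float)

# Toy hierarchy (assumption): rows are Total, A, B; columns are A, B.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Simulated residuals standing in for real base-model errors.
rng = np.random.default_rng(1)
residuals = rng.normal(size=(60, 3)) * np.array([8.0, 5.0, 5.0])

# Incoherent base forecasts: 45 + 50 does not equal 100.
base = np.array([100.0, 45.0, 50.0])
reconciled = mint_shrink_reconcile(base, S, residuals, lam=0.5)
```

The reconciled vector is coherent by construction, since it is the summing matrix applied to a set of bottom-level forecasts, and every entry draws on the base forecasts of all three series, weighted by how reliable each one has been.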

**Author Contributions:** Conceptualization, J.M.O. and P.R.; methodology, J.M.O. and P.R.; software, J.M.O. and P.R.; validation, J.M.O. and P.R.; formal analysis, J.M.O. and P.R.; investigation, J.M.O. and P.R.; resources, J.M.O. and P.R.; data curation, J.M.O. and P.R.; writing–original draft preparation, J.M.O. and P.R.; writing–review and editing, J.M.O. and P.R.; visualization, J.M.O. and P.R.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.
