*3.2. Data Description*

The data used in this study were real utility data collected from January 2010 to December 2019. Figure 5 shows the different periodicities of the UCLF over time, and Figure 5c shows its periodicity over weeks during parts of the 2019 South African winter (June–July) and summer (November–December) seasons.

The collected data covered four variables: the installed capacity, the demand, the PCLF, and the UCLF. To investigate how these variables affect the UCLF forecast accuracy of the different techniques, the variables were arranged into five experiments, as shown in Figure 6, where a tick indicates that a variable is used in the respective experiment and a cross indicates that it is not. The best-performing experiment thus indicates which variables should be used with which technique to achieve the lowest year-ahead UCLF forecasting error. The installed capacity is the total power, in megawatts, that the installed power generation plants can generate. The demand is the historic total national power demand in megawatts. The PCLF and UCLF are the respective historic values in megawatts. The UCLF data were daily peak values, and the UCLF inputs used to train and test the models were split into two variables: the UCLF two years before the target UCLF, *UCLF T-2 Years*, and the UCLF one year before the target, *UCLF T-1 Year*. A variable indicating whether a day is a weekend or a weekday, the *Weekend Index*, was also used as an input. It was 1 for weekends and 0 for weekdays, so that the models could differentiate between weekday and weekend data. This resulted in six input variables.

The training period was from 1 January 2012 to 31 December 2018, and the testing period was from 1 January 2019 to 31 December 2019. The forecasts were, thus, daily peak UCLF values for the year-ahead forecast period. All the variables except the weekend index were normalized to the range 0 to 1. The training input data were, thus, a 2555 × *n* matrix, where 2555 is the number of daily input values over the 7 training years (7 × 365) and *n* is the number of variables used in the respective experiment, as described next.
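The input construction described above can be sketched in Python with NumPy. This is a minimal sketch, not the study's code, and it rests on several assumptions not stated in the text: 29 February is dropped so that every year has exactly 365 daily rows (which matches 7 × 365 = 2555 training rows); the installed capacity, demand, and PCLF inputs are taken on the target day itself; the 2010–2011 data serve only as lagged UCLF inputs for the first training year; and the min-max bounds are fitted on the training rows only. The helpers `daily_dates`, `build_features`, and `minmax_scale` are hypothetical, and random series stand in for the utility data.

```python
import numpy as np
from datetime import date, timedelta

# Hypothetical helpers for illustration; not the study's implementation.
DAYS_PER_YEAR = 365  # assumption: 29 February is dropped so each year has 365 rows


def daily_dates(start, end):
    """All calendar days from start to end inclusive, skipping 29 February."""
    days, d = [], start
    while d <= end:
        if (d.month, d.day) != (2, 29):
            days.append(d)
        d += timedelta(days=1)
    return days


def build_features(days, capacity, demand, pclf, uclf):
    """Six inputs per target day t; the target is the daily peak UCLF at t.

    Assumption: capacity, demand and PCLF are aligned with the target day.
    """
    t = np.arange(2 * DAYS_PER_YEAR, len(days))  # first row where both lags exist
    weekend = np.array([1.0 if days[i].weekday() >= 5 else 0.0 for i in t])
    X = np.column_stack([
        capacity[t], demand[t], pclf[t],
        uclf[t - 2 * DAYS_PER_YEAR],  # UCLF T-2 Years
        uclf[t - DAYS_PER_YEAR],      # UCLF T-1 Year
        weekend,                      # Weekend Index: 1 = Sat/Sun, 0 = Mon-Fri
    ])
    return X, uclf[t], [days[i] for i in t]


def minmax_scale(train, test, cols):
    """Scale the given columns to [0, 1] using the training rows' min and max."""
    train, test = train.copy(), test.copy()
    lo = train[:, cols].min(axis=0)
    span = train[:, cols].max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant columns
    train[:, cols] = (train[:, cols] - lo) / span
    test[:, cols] = (test[:, cols] - lo) / span  # test values may fall outside [0, 1]
    return train, test


# Usage with synthetic stand-in data (not the utility data used in the study)
days = daily_dates(date(2010, 1, 1), date(2019, 12, 31))
rng = np.random.default_rng(0)
series = [rng.uniform(1_000, 45_000, len(days)) for _ in range(4)]
X, y, target_days = build_features(days, *series)

is_train = np.array([d.year <= 2018 for d in target_days])
X_train, X_test = minmax_scale(X[is_train], X[~is_train], cols=[0, 1, 2, 3, 4])
print(X_train.shape, X_test.shape)  # (2555, 6) (365, 6)
```

Fitting the scaling bounds on the training rows alone keeps 2019 information out of training, at the cost that some test values can fall slightly outside [0, 1]; if the bounds were instead taken over the full period, `minmax_scale` would be fitted on `X` as a whole.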

**Figure 4.** Location of 15 key South African coal-fired power stations.

The training input matrices were, thus, 2555 × 6 for Exp 1, 2555 × 5 for Exp 2 to Exp 4, and 2555 × 3 for Exp 5.
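These matrix sizes follow from selecting columns of the six-column training matrix. In the sketch below, the dictionary `EXPERIMENTS` and the helper `select` are hypothetical: only Exp 1 (all six inputs) and the column counts come from the text, while the specific variables assigned to Exp 2 to Exp 5 are placeholders, since the actual assignments are those ticked in Figure 6.

```python
import numpy as np

FEATURES = ["Installed Capacity", "Demand", "PCLF",
            "UCLF T-2 Years", "UCLF T-1 Year", "Weekend Index"]

# Hypothetical assignments: only Exp 1 and the column counts follow the text;
# the actual variables per experiment are those ticked in Figure 6.
EXPERIMENTS = {
    "Exp 1": FEATURES,
    "Exp 2": [f for f in FEATURES if f != "Installed Capacity"],
    "Exp 3": [f for f in FEATURES if f != "Demand"],
    "Exp 4": [f for f in FEATURES if f != "PCLF"],
    "Exp 5": ["UCLF T-2 Years", "UCLF T-1 Year", "Weekend Index"],
}


def select(X, names):
    """Return the columns of X that correspond to the named input variables."""
    idx = [FEATURES.index(n) for n in names]
    return X[:, idx]


X_train = np.zeros((7 * 365, len(FEATURES)))  # placeholder for the 2555 x 6 matrix
shapes = {exp: select(X_train, cols).shape for exp, cols in EXPERIMENTS.items()}
print(shapes)
```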
