(e) Outdoor temperature:

Some of the aforementioned models use the outdoor temperature as a predictor variable. Making future predictions with these models, therefore, requires outdoor temperature predictions. Solargis® services are again used to provide predictions for the outdoor temperature. The data obtained are post-processed with elevation correction and bias correction [44].

#### 2.2.1. Performance Evaluation of Predictions

The performance of the prediction models requires quantification. This is achieved by introducing various error metrics [45]. Two types of error metrics are used: those without dimensions (without units) and those with dimensions (with units). Error metrics *without* dimensions are essentially normalized errors, which are necessary for comparing results across studies with differently sized installations [46].

To normalize the data, error metrics without dimensions use the sum of all measured points, or the average of all measured points, in their denominators. This denominator has a downside when interpreting these error metrics for parameters such as the temperature, or when assessing the seasonal performance of the PV predictions: the temperatures in a dataset may, for example, average to approximately 0 °C in wintertime, which in turn gives an unreasonable scaling to the performance metric. Choosing an error metric with dimensions avoids this problem and gives an intuitive value with units.
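The scaling problem can be made concrete with a minimal sketch (the temperature values below are made up for illustration): a modest 0.5 °C prediction bias produces a reasonable RMSE, but normalizing by a near-zero wintertime mean makes the dimensionless metric blow up.

```python
import numpy as np

# Hypothetical winter temperatures (°C) fluctuating around 0 °C
observed = np.array([-2.0, -1.0, 0.5, 1.5, 1.0])
predicted = observed + 0.5  # a constant 0.5 °C prediction bias

# Dimensioned metric: stays interpretable (0.5 °C)
rmse = np.sqrt(np.mean((observed - predicted) ** 2))

# Normalized metric: the mean of the observations is exactly 0 °C here,
# so dividing by it yields an unusable (infinite) value
cvrmse = rmse / observed.mean() * 100
```

This is why the dimensionless metrics are reserved for quantities with a meaningful non-zero scale, such as power demand.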

#### (a) The Coefficient of Determination (R2)

*R<sup>2</sup>* evaluates how much of the variability in the actual values is explained by the model [41]. Generally, *R<sup>2</sup>* takes a value between 0 and 1, wherein 1 represents the best performance. It should be emphasized that while *R<sup>2</sup>* is a powerful metric when assessing linear models, it is an inadequate measure when assessing non-linear models [47]. *R<sup>2</sup>* is therefore only used for the assessment of the chiller model in this research. The mathematical definition of *R<sup>2</sup>* is given by Equation (4).

$$R^2 = 1 - \frac{\sum\_{i=1}^{N} \left(\hat{x}\_i - x\_i\right)^2}{\sum\_{i=1}^{N} \left(x\_i - \overline{x}\right)^2} \ [-] \tag{4}$$

where

$$\begin{array}{ll} \hat{x}\_i & \text{The predicted value for data point } i \text{ (e.g., power demand),} \\ x\_i & \text{The measured (observed) value for data point } i \text{, and} \\ \overline{x} & \text{The mean of all observed values in the dataset.} \end{array}$$
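Equation (4) translates directly into a short NumPy sketch (the function name and toy data are illustrative, not from the study):

```python
import numpy as np

def r_squared(predicted, observed):
    """Coefficient of determination per Equation (4)."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    ss_res = np.sum((predicted - observed) ** 2)          # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# A perfect prediction explains all variability, so R^2 = 1
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```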

#### (b) The Weighted Average Percentage Error (WAPE)

The WAPE describes the average magnitude of error produced by the model, relative to the measured values. It is widely used as a performance measure in forecasting, since it is easy to interpret and understand [48]. This metric is robust to outliers. Forecasting is best when the *WAPE* is close to 0. Equation (5) shows the mathematical definition of the *WAPE* [48].

$$WAPE = \frac{\sum\_{i=1}^{N} \left|\hat{x}\_i - x\_i\right|}{\sum\_{i=1}^{N} x\_i} \ [\%] \tag{5}$$
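A minimal sketch of Equation (5), with illustrative numbers (not from the study): two predictions that are each off by 10 units against observations summing to 200 give a WAPE of 10%.

```python
import numpy as np

def wape(predicted, observed):
    """Weighted average percentage error per Equation (5), in percent."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    # Total absolute error, weighted by the total observed demand
    return np.sum(np.abs(predicted - observed)) / np.sum(observed) * 100

example = wape([110.0, 90.0], [100.0, 100.0])  # (10 + 10) / 200 * 100
```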

#### (c) The Coefficient of Variation of the Root Mean Square Error (CVRMSE)

The *CVRMSE* is a performance metric that penalizes larger errors more than the WAPE [49]. The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) recommends *CVRMSE* values below 30% for hourly predictions [50], and this standard is therefore also adopted in this research. The mathematical definition is provided in Equation (6) [49].

$$CVRMSE = \frac{\sqrt{\frac{\sum\_{i=1}^{N} \left(x\_i - \hat{x}\_i\right)^2}{N}}}{\overline{x}} \ [\%] \tag{6}$$
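Equation (6) as a sketch, reusing the toy numbers from the WAPE example (illustrative, not from the study). For symmetric ±10 errors around a mean of 100, WAPE and CVRMSE coincide; the quadratic term in the CVRMSE only dominates when the errors are unevenly distributed.

```python
import numpy as np

def cvrmse(predicted, observed):
    """Coefficient of variation of the RMSE per Equation (6), in percent."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    # Normalize the RMSE by the mean of the observations
    return rmse / observed.mean() * 100

example = cvrmse([110.0, 90.0], [100.0, 100.0])
```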

#### (d) The Mean Absolute Error (MAE)

The *MAE* is the average of the absolute differences between the predicted and observed values; see Equation (7) [51]. The closer the value is to 0, the better the prediction performance.

$$MAE = \frac{\sum\_{i=1}^{N} \left|x\_i - \hat{x}\_i\right|}{N} \tag{7}$$

#### (e) The Root Mean Square Error (RMSE)

The *RMSE* is identical to the *CVRMSE*, except that it is not normalized by the average of all observations; see Equation (8) [52].

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{N} \left(x\_i - \hat{x}\_i\right)^{2}}{N}} \tag{8}$$

#### (f) The Mean Bias Error (MBE)

The *MBE* indicates whether a forecasting model, in general, tends to overestimate or underestimate in comparison to the actual values [46]. This metric could then be used to correct such systematic deviations. The *MBE* can be calculated according to Equation (9) [52]:

$$MBE = \frac{1}{N} \sum\_{i=1}^{N} \left(\hat{x}\_i - x\_i\right) \tag{9}$$
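The three dimensioned metrics of Equations (7)–(9) can be sketched together; the toy data (one overestimate and one underestimate of equal size) show why the MBE is reported alongside the MAE and RMSE: the signed errors cancel in the MBE even though the unsigned errors do not.

```python
import numpy as np

def mae(predicted, observed):
    """Mean absolute error, Equation (7); same units as the data."""
    return float(np.mean(np.abs(np.asarray(observed) - np.asarray(predicted))))

def rmse(predicted, observed):
    """Root mean square error, Equation (8); penalizes large errors."""
    return float(np.sqrt(np.mean((np.asarray(observed) - np.asarray(predicted)) ** 2)))

def mbe(predicted, observed):
    """Mean bias error, Equation (9); positive means systematic overestimation."""
    return float(np.mean(np.asarray(predicted) - np.asarray(observed)))

# One +2 error and one -2 error: MAE and RMSE are 2, but the bias cancels to 0
predicted, observed = [12.0, 8.0], [10.0, 10.0]
```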

#### 2.2.2. Total Demand Prediction of the Building

Finally, in order to predict the total day-ahead (lead time: 24 h) electricity demand of the building, the established prediction models for all load groups and the Solargis® temperature and PV prediction services are integrated into one combined model (see Figure 6). The predictions are performed each day at 00:00 and error metrics are computed. The dataset used in the integrated model consists of Solargis® data and historic building energy demands from 25 May to 4 April 2019. MATLAB is used for the integration of the models and the assessment of the data. After combining the predicted energy demands of each subcomponent, the resulting total building demand prediction, with hourly resolution in kWh·h<sup>−1</sup>, is taken numerically as the power demand in kW.
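The combination step can be sketched as follows. The load-group names, the assumption that PV generation is subtracted from consumption, and all numbers are illustrative placeholders, not the models of the study (which are implemented in MATLAB):

```python
import numpy as np

# Hypothetical hourly (24-value) day-ahead predictions per load group, in kWh/h
hvac = np.full(24, 20.0)            # placeholder load group
lighting = np.full(24, 5.0)         # placeholder load group
pv_generation = np.zeros(24)
pv_generation[8:17] = 15.0          # assumed daylight hours

# Net building demand: predicted consumption minus on-site PV generation.
# With hourly resolution, kWh/h is numerically equal to the average power in kW.
total_demand_kw = hvac + lighting - pv_generation
```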

**Figure 6.** Total demand prediction of the building.

#### *2.3. Step 2: Establish the Operational Strategy and BESS Simulations*

The objective of this study is to stabilize/flatten a building energy demand profile during office work hours using a BESS. Peak shaving and valley filling are necessary to meet the load shape objective. A peak refers to a significantly higher power demand than desired, and a valley to a significantly lower power demand than desired. Before peak shaving and valley filling can be considered, a '*desired power demand profile*' of the building should be established. A comparison between the actual building load and the desired load then allows for the identification of peaks and valleys. Instead of the term '*desired power demand*', henceforth, the term '*baseline*' (BL) is used. An illustrative example of a BL which is set between 07:00 and 17:00 (working hours of the building) is shown in Figure 7. By charging and discharging the BESS, load shape objectives can be met. In principle, this baseline can be developed to reflect the different objectives of the building owners, such as maximizing self-energy consumption, minimizing electricity costs, and matching flexible Smart Grid demands.

**Figure 7.** Peak shaving and valley filling depending on the established baseline.
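In its simplest idealized form (ignoring the BESS capacity, power, and efficiency limits that the study's simulations account for), peak shaving and valley filling against a baseline can be sketched as below; the demand values and flat baseline are made up for illustration:

```python
import numpy as np

# Illustrative hourly demand profile (kW) and a flat baseline (BL) in kW
demand = np.array([40.0, 55.0, 30.0, 50.0])
baseline = 45.0

# BESS power: positive = discharge (peak shaving), negative = charge (valley filling)
bess_power = demand - baseline

# Grid-side demand after BESS operation: ideally flat at the baseline
net_demand = demand - bess_power
```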

The operation of the BESS relies heavily on the established BL. When the baseline is too high, power demands are unnecessarily high, and the BESS may not be able to fill all valleys. On the other hand, when the BL is too low, the BESS may not be able to deliver the power necessary to shave all the peaks. Another important parameter that depends on the baseline (and vice versa) is the initial state of charge (SoC<sub>ini</sub>) of the BESS before the load balancing period starts. In this case, the load balancing period starts at 07:00. The mutual dependence of the BL and SoC<sub>ini</sub> calls for a strategy to determine the best balance. The steps taken to determine the best balance are described below. For both workdays and weekend days, the predictions are calculated at 00:00 for the upcoming 24 h and are then used when determining the BL and SoC<sub>ini</sub>.
