*2.2. Energy Analysis Methodology*

The impact on the energy demand of using two different *actual* weather datasets is studied through energy simulation, which requires building energy models (BEMs). This study employs detailed BEMs run with the EnergyPlus engine [52,53]. BEMs require weather files in the *EPW* format; these are created from the weather data of both sources (*on-site* and *third-party*) with the Weather Converter tool [54], an auxiliary program provided with EnergyPlus.

As shown in Figure 1, the energy analysis studies the impact on the energy demand of using the *on-site* and the *third-party* weather files. For each weather file, the BEM provides the energy demand needed to meet the defined requirements (the temperature setpoint of each space). The energy results obtained with the *on-site* weather file are taken as the reference because they correspond to the weather data measured in the building's surroundings. Comparing the energy demand obtained with the two weather files allows us to analyze the impact of using weather data from *third-party* sources with respect to the reference.

For a deeper study, a sensitivity analysis is performed to analyze the effect of each weather parameter on the energy demand. The methodology consists of replacing variables (one at a time) in the *on-site* weather file with data from the *third-party* weather file, generating a specific weather file for each parameter. For example, when the dry bulb temperature is analyzed, a weather file is prepared that takes the dry bulb temperature from the *third-party* source and keeps the rest of the weather parameters as in the *on-site* weather file. This way, the impact on the energy demand of using only the dry bulb temperature from a *third-party* source can be studied. This procedure is applied to the weather parameters analyzed in the weather comparison: the dry bulb temperature (*Temp*), relative humidity (*RH*), direct normal irradiation (*DNI*), diffuse horizontal irradiation (*DHI*), wind speed (*WS*), and wind direction (*WD*). These weather parameters are selected because they are all used by EnergyPlus in the simulations, unlike other parameters such as the global horizontal irradiation [54].

The other weather parameters provided by the weather stations, such as the atmospheric pressure and precipitation, are not presented in this study because their impact on the BEMs is low. The energy analysis follows the same process as before: the BEM is simulated with the generated weather file in which one parameter has been changed, and the resulting energy demand is compared with the reference (the energy demand obtained using the *on-site* weather data).
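The one-variable-at-a-time replacement described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes both datasets are already loaded into memory as dictionaries mapping a parameter name to a list of hourly values (a hypothetical layout, not the *EPW* file format), and the function names `swap_parameter` and `build_sensitivity_cases` are invented for the example.

```python
from copy import deepcopy

# Parameters analyzed in the study; the short keys are illustrative names.
PARAMETERS = ["Temp", "RH", "DNI", "DHI", "WS", "WD"]


def swap_parameter(on_site, third_party, name):
    """Return a copy of the on-site data with one parameter taken from the third party.

    Both inputs are dicts mapping a parameter name to a list of hourly values
    (an assumed in-memory layout, not the EPW file format itself).
    """
    if len(on_site[name]) != len(third_party[name]):
        raise ValueError("both datasets must cover the same time steps")
    mixed = deepcopy(on_site)               # leave the reference data untouched
    mixed[name] = list(third_party[name])   # replace only the parameter under study
    return mixed


def build_sensitivity_cases(on_site, third_party, names=PARAMETERS):
    """Build one mixed dataset per parameter, each differing from on-site in one variable."""
    return {name: swap_parameter(on_site, third_party, name) for name in names}


if __name__ == "__main__":
    on_site = {p: [1.0, 2.0, 3.0] for p in PARAMETERS}
    third_party = {p: [1.5, 2.5, 3.5] for p in PARAMETERS}
    cases = build_sensitivity_cases(on_site, third_party)
    print(cases["Temp"]["Temp"])  # third-party values
    print(cases["Temp"]["RH"])    # on-site values, unchanged
```

Each resulting dataset would then be written to its own weather file and simulated, so that any change in energy demand relative to the reference can be attributed to the single replaced parameter.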

As explained in the Introduction, most of the studies that analyzed the effect of using different weather datasets in building energy simulations used only the annual energy results, and only some of them used finer temporal resolutions (monthly or weekly). This study presents the analysis at different temporal resolutions and discusses the differences in the results. The time granularity levels proposed are annual, seasonal, monthly, weekly, daily, and hourly. Thus, the uncertainty metrics for the energy results are calculated on the energy demand accumulated by the model over year, season, month, week, day, and hour periods.
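Accumulating hourly demand over each time grain might look like the sketch below. Several choices are assumptions made for illustration: seasons follow the northern-hemisphere meteorological convention (this section does not state the season boundaries used), weeks are ISO weeks, and the names `PERIOD_KEYS` and `accumulate` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed meteorological seasons (northern hemisphere); the study's own
# season boundaries are not stated in this section.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}

# Each resolution maps a timestamp to the key of the period it belongs to.
PERIOD_KEYS = {
    "annual": lambda t: t.year,
    "seasonal": lambda t: (t.year, SEASON_OF_MONTH[t.month]),
    "monthly": lambda t: (t.year, t.month),
    "weekly": lambda t: tuple(t.isocalendar())[:2],  # ISO year and ISO week
    "daily": lambda t: t.date(),
    "hourly": lambda t: t,
}


def accumulate(timestamps, hourly_demand, resolution):
    """Sum hourly energy demand over each period of the chosen resolution."""
    key_of = PERIOD_KEYS[resolution]
    totals = defaultdict(float)
    for t, energy in zip(timestamps, hourly_demand):
        totals[key_of(t)] += energy
    return dict(totals)


if __name__ == "__main__":
    start = datetime(2019, 1, 1)
    stamps = [start + timedelta(hours=h) for h in range(48)]
    demand = [1.0] * 48  # placeholder values, not simulation results
    print(accumulate(stamps, demand, "daily"))
```

The accumulated series for the reference and for the compared weather file would then be passed, period by period, to the uncertainty metrics. Note that with this keying the winter months of one calendar year (January, February, and December) fall into the same seasonal total.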

For the statistical analysis of the results, three metrics are used in the study: the mean absolute deviation percent (*MADP*) (5), the coefficient of variation of the root-mean-squared error (*CV*(*RMSE*)) (6), and the coefficient of determination (*R*<sup>2</sup>) (7). These statistical indexes are defined as:

$$MADP = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\sum_{i=1}^{n} |y_i|} \tag{5}$$

$$CV(RMSE) = \frac{1}{\bar{y}} \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}} \tag{6}$$

$$R^2 = \left(\frac{n \cdot \sum_{i=1}^{n} y_i \cdot \hat{y}_i - \sum_{i=1}^{n} y_i \cdot \sum_{i=1}^{n} \hat{y}_i}{\sqrt{\left(n \cdot \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right) \cdot \left(n \cdot \sum_{i=1}^{n} \hat{y}_i^2 - \left(\sum_{i=1}^{n} \hat{y}_i\right)^2\right)}}\right)^2 \tag{7}$$

In the equations, *n* is the number of observations, *y<sub>i</sub>* is the *on-site* measured data at moment *i*, *ŷ<sub>i</sub>* is the *third-party* value at that moment, and *ȳ* is the mean of the *on-site* values.
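As a minimal sketch, Equations (5) to (7) can be written in plain Python as below. The function names are hypothetical, the results are returned as fractions (multiply by 100 for percentages), and the default `p=1` is an assumption, since this section does not state the value used.

```python
import math


def madp(y, y_hat):
    """Mean absolute deviation percent, Eq. (5), returned as a fraction."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / sum(abs(a) for a in y)


def cv_rmse(y, y_hat, p=1):
    """Coefficient of variation of the RMSE, Eq. (6), returned as a fraction.

    p=1 is an assumed default; this section does not state the value used.
    """
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return math.sqrt(sse / (n - p)) / mean_y


def r_squared(y, y_hat):
    """Squared Pearson correlation between the two series, Eq. (7)."""
    n = len(y)
    sum_y, sum_h = sum(y), sum(y_hat)
    num = n * sum(a * b for a, b in zip(y, y_hat)) - sum_y * sum_h
    den = math.sqrt((n * sum(a * a for a in y) - sum_y ** 2)
                    * (n * sum(b * b for b in y_hat) - sum_h ** 2))
    return (num / den) ** 2


if __name__ == "__main__":
    # Illustrative numbers only: the second series is the first plus a constant offset.
    reference = [10.0, 20.0, 30.0, 40.0]
    compared = [12.0, 22.0, 32.0, 42.0]
    print(madp(reference, compared))       # 0.08
    print(cv_rmse(reference, compared))
    print(r_squared(reference, compared))  # 1.0
```

In this toy example the constant offset leaves *R*<sup>2</sup> at 1 while *MADP* is 8%, so the two kinds of metric capture different aspects of the deviation.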

*MADP* and *CV*(*RMSE*) are both quantitative indexes that show the results in percentage terms. They allow the comparison between different test sites, weather parameters, and time resolutions. *MADP*, which is also called the MAD/mean in some studies [55], has advantages that overcome some shortcomings of other metrics. It is not infinite when the actual values are zero, does not become very large when actual values are close to zero, and does not take extreme values when managing low-volume data [55–57]. *CV*(*RMSE*), which gives a relatively high weight to large variations, is the other percentage metric selected for this study because it is a common metric in energy analysis. Indeed, the ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) Guidelines [44], FEMP (Federal Energy Management Program) [58], and IPMVP (International Performance Measurement and Verification Protocol) [59] use it to verify the accuracy of the models.

The coefficient of determination (*R*<sup>2</sup>) measures the strength of the linear relationship between the two patterns [60]. It ranges between 0.00 and 1.00, and higher values are better. It should be noted that uncertainty cannot be assessed using this metric alone, as the linear relationship may be strong while a substantial bias remains.

In the study, the *MADP* and *CV*(*RMSE*) metrics are shown for all the temporal resolutions, from annual to hourly. However, *R*<sup>2</sup> is only analyzed for the hourly time grain, because at coarser time grains there are too few points for the study of the linear relationship to be meaningful.
