*2.3. Pre-Processing Data*

Before presenting the results, the smoothing process used in this study must be explained. Because the variables are constant during parts of the day, splines were chosen as basis functions. To select the optimal number of basis functions, the coefficient of determination R<sup>2</sup> was used to measure how well the smoothed curves fit the raw data. As shown in Figure 3, the criterion was to select the minimum number of basis functions (in a given grid) for which R<sup>2</sup> exceeds 0.99.


**Table 1.** Variables monitored in the building to assess the impact of the refurbishment on its energy performance, illuminance and comfort.

**Figure 3.** Example of the process of selecting the optimum number of basis functions for smoothing the original sample. The red line marks the minimum number of basis functions for which R<sup>2</sup> is higher than 0.99.
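
The selection rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the study: the truncated-power cubic spline basis, the least-squares fit and the names `spline_design`, `r_squared` and `select_n_basis` are assumptions introduced here.

```python
import numpy as np

def spline_design(t, n_basis, degree=3):
    """Design matrix of a truncated-power spline basis with n_basis columns.

    Assumption: equally spaced interior knots; the study may use another
    spline basis (e.g., B-splines) and another knot placement.
    """
    if n_basis < degree + 1:
        raise ValueError("n_basis must be at least degree + 1")
    n_knots = n_basis - degree - 1                       # interior knots
    knots = np.linspace(t.min(), t.max(), n_knots + 2)[1:-1]
    cols = [t ** p for p in range(degree + 1)]           # polynomial part
    cols += [np.clip(t - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

def r_squared(y, y_hat):
    """Coefficient of determination of the smoothed curve against raw data."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_n_basis(t, y, grid, threshold=0.99):
    """Smallest number of basis functions in `grid` with R^2 above `threshold`.

    Returns (n_basis, r2), or None if no value in the grid reaches it.
    """
    for n in sorted(grid):
        X = spline_design(t, n)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares fit
        r2 = r_squared(y, X @ coef)
        if r2 > threshold:
            return n, r2
    return None
```

Because the grid is scanned in increasing order, the first value that passes the threshold is the minimum one, which corresponds to the red line in Figure 3.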

In addition, it is important to highlight the *data cleaning* that was carried out using a functional approach. Throughout the analysed years there were many days when the building was closed; the information from these days is not only unhelpful but can also distort the final results. To solve this problem, an algorithm that searches for these days and deletes them from the sample was developed (see Algorithm 1).

After an exploratory analysis of the data, the parameter values chosen for Algorithm 1 are: *β*<sub>1</sub> = 500, *β*<sub>2</sub> = 250, *α* = 0.25, and *θ* = 0.5. Applying the algorithm reduces the sample, but the remaining days all reflect the normal behaviour of the building. Figure 4 illustrates the performance of the algorithm: it deletes the days with abnormal behaviour without affecting the bulk of the data. The days when the building was unoccupied and closed, and therefore had very small or no electrical consumption, are detected and eliminated, as the right panel of Figure 4 shows.

Figure 4 also presents the mean functions (as curves) and how they change after the non-representative days are deleted. Including non-representative days can produce erroneous results in any study; for example, an ANOVA test may fail to reject the equality of means even when the groups differ.

**Figure 4.** Performance of the *Functional cleaning* algorithm with an example variable: electrical demand on the second floor of the building in the representative months (data before the refurbishment in dark gray, data after it in light gray). The number of days taken into account in each sample is also shown: (**left**) the raw data and the mean functions, separated into before and after the refurbishment; and (**right**) the data after applying the algorithm and the mean functions, separated into before and after the refurbishment.

**Algorithm 1:** Functional cleaning.

**Input:** Data divided into groups and the parameters *β*<sub>1</sub>, *β*<sub>2</sub>, *α*, *θ*.

**Output:** Data without inappropriate days.

- Have a variability less than or equal to a percentage *α* ∈ ℝ of the average sample variability.
- Be below the sample mean function for at least a percentage *θ* ∈ ℝ of the day.
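
The two conditions listed above can be sketched in Python as follows. This is a hedged illustration rather than the authors' algorithm: it assumes that "variability" means the within-day variance of each curve, that a day is deleted only when both conditions hold, and that the function is applied to each group separately. The parameters *β*<sub>1</sub> and *β*<sub>2</sub> are left out because their role is not described in this excerpt. The name `functional_cleaning` is hypothetical.

```python
import numpy as np

def functional_cleaning(curves, alpha=0.25, theta=0.5):
    """Flag the days to keep in one group of daily curves.

    curves: array of shape (n_days, n_points), one smoothed curve per day.
    Returns a boolean mask, True for the days that are kept.

    Assumptions (not confirmed by the source): variability is the variance
    of each curve over the day, and both conditions must hold to delete a day.
    """
    mean_fn = curves.mean(axis=0)            # sample mean function
    day_var = curves.var(axis=1)             # variability of each day
    avg_var = day_var.mean()                 # average sample variability

    # Condition 1: variability at most a fraction alpha of the average.
    low_variability = day_var <= alpha * avg_var

    # Condition 2: below the mean function for at least a fraction theta
    # of the day.
    frac_below = (curves < mean_fn).mean(axis=1)
    mostly_below = frac_below >= theta

    return ~(low_variability & mostly_below)
```

A call such as `clean = curves[functional_cleaning(curves)]` would then retain only the days considered representative, using the values *α* = 0.25 and *θ* = 0.5 reported in the text as defaults.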

### **3. Results and Discussion**

The effects of the refurbishment carried out in 2017 in the Rectorate building of the University of the Basque Country were analysed. The analysis covered lighting consumption, illuminance, indoor temperatures and heating demand. These variables were measured every minute between 2016 and 2019 (with 2015 replacing 2016 for the ground floor), and only the months in which the heating systems operate significantly (October to March) were taken into account. As the retrofitting started in the summer of 2017, the months of that year after the summer are not suitable for analysis. The data were therefore divided into nine months before retrofitting (six months in 2016 and three months in 2017) and nine months after retrofitting (six months in 2018 and three months in 2019).
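
The division of the data into periods can be sketched as a small Python helper. The function name `retrofit_period` and the `ground_floor` flag are hypothetical, and the sketch assumes that the ground floor simply uses 2015 in place of 2016 with all other rules unchanged.

```python
from datetime import date

# Months in which the heating systems operate significantly (October to March).
HEATING_MONTHS = {10, 11, 12, 1, 2, 3}

def retrofit_period(day, ground_floor=False):
    """Label a day as 'before' or 'after' the retrofitting, or None if excluded.

    Assumption: for the ground floor, 2015 replaces 2016 and nothing else
    changes.
    """
    if day.month not in HEATING_MONTHS:
        return None
    first_year = 2015 if ground_floor else 2016
    if day.year == first_year:
        return "before"                      # six heating months
    if day.year == 2017 and day.month <= 3:
        return "before"                      # January to March 2017
    if day.year == 2018:
        return "after"                       # six heating months
    if day.year == 2019 and day.month <= 3:
        return "after"                       # January to March 2019
    return None                              # e.g., October to December 2017
```

Under these rules, each period contains nine calendar months, matching the split described above.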

Section 3.1 presents the lighting analysis of the study and Section 3.2 the same analysis for heating demands. The numerical results are based on the *p*-values of the ANOVA and Kruskal-Wallis tests in the vectorial analysis, and on the *p*-values of the FANOVA in the functional analysis. Several measures are also shown to illustrate the differences between the sample groups: D<sub>vec</sub>, the difference between the medians from the vectorial approach; D<sub>func</sub>, the average minute difference between the mean functions; and D<sub>dist</sub>, the L<sup>2</sup>(*l*) distance between the mean functions. Additionally, the coefficient of determination R<sup>2</sup> is reported in the tables to measure how well the functional smoothing fits the raw data. The change in the variance of the monitored data (Var) and the savings obtained with the retrofitting, calculated in relation to the initial energy demands, are also shown for both approaches. Lastly, all figures presented here were produced with the R programming software.
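
As an illustration of how the three difference measures and the savings could be computed from minute-by-minute daily curves, a minimal Python sketch follows. It rests on several assumptions not stated in the text: the vectorial approach summarises each day by its mean value, differences are taken as before minus after, the L<sup>2</sup> distance is approximated by a Riemann sum with a one-minute step, and the savings are the average difference relative to the mean level before the retrofitting. The name `difference_measures` is hypothetical.

```python
import numpy as np

def difference_measures(before, after, dt=1.0):
    """Differences between two groups of daily curves.

    before, after: arrays of shape (n_days, n_points), one curve per day.
    dt: time step between samples (assumed to be one minute).
    """
    # Vectorial approach (assumption): one scalar per day, here the daily mean.
    d_vec = np.median(before.mean(axis=1)) - np.median(after.mean(axis=1))

    # Functional approach: compare the two mean functions point by point.
    mu_before = before.mean(axis=0)
    mu_after = after.mean(axis=0)
    diff = mu_before - mu_after

    d_func = diff.mean()                         # average minute difference
    d_dist = np.sqrt(np.sum(diff ** 2) * dt)     # approximate L2 distance

    # Savings relative to the initial demand (assumption), in percent.
    saving_pct = 100.0 * d_func / mu_before.mean()

    return {"D_vec": d_vec, "D_func": d_func,
            "D_dist": d_dist, "saving_pct": saving_pct}
```

With constant curves of 10 before and 6 after, for instance, both D<sub>vec</sub> and D<sub>func</sub> equal 4 and the savings are 40%.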
