### **1. Introduction**

The building sector is considered the largest energy consumer in the European Union, representing 40% of final energy consumption [1]. Globally, the energy consumption of this sector accounts for 20% of the total delivered energy [2]. Thus, there is increasing interest in improving the energy efficiency of buildings [3–5]. Furthermore, the potential for saving energy through renovation in Europe is considerable, as two-thirds of European buildings were constructed before 1980 [6]. Building retrofitting can contribute to reducing the energy consumption of existing buildings with low energy efficiency. In this context, it is important to develop methodologies that can evaluate the actual impact of refurbishment on renovated buildings in terms of energy consumption, thermal comfort and lighting.

Therefore, evaluating building retrofitting is essential to prove the effectiveness of the applied energy conservation measures and to check whether the energy efficiency of the building has actually improved. However, there are very few studies that actually evaluate the retrofit of buildings [7]. Most studies evaluate measures for building refurbishment based on energy simulation [8–10], mathematical models [11], artificial neural networks (ANN) [12] and building information modelling (BIM) [13]. Thus, most studies analyse the energy conservation measures based on model outputs, without proving the real effect on the building with monitored data. There is a performance gap between simulated and measured energy consumption: some studies have found that the heating energy consumption levels calculated in the design phase were much lower than the measured values [14].

Some authors have evaluated the effect of refurbishments and renovations made in buildings with real data. Ardente et al. [15] presented the results of an energy and environmental assessment of retrofit actions implemented in six public buildings using life cycle analysis (LCA), a common approach to evaluate the decrease in energy consumption, operational cost and environmental impact achieved by building retrofitting [16–18]. Another approach for the energy diagnosis of existing buildings is in-situ U-value measurement [19], which characterises the heat losses through the building envelope and can be used to evaluate retrofitting actions [20]. Zavadskas et al. [21] proposed an approach to assess indoor environmental conditions before and after the retrofitting of dwellings with multiplicative optimality criteria and experimental data. Hamburg et al. [14] analysed how well the energy performance targets of building refurbishment are reached by collecting energy consumption and indoor measurements after renovation and constructing simulations.

All of the previously presented methodologies for evaluating building retrofitting use vector-based data approaches. The methodology presented in this work for evaluating the impact of a building retrofitting is based on functional analysis of monitored data before and after the retrofit. By comparing vectorial and functional analyses, we demonstrate that functional analysis provides more realistic and accurate evaluations of the studied variables.

The methods for assessing building renovation have traditionally applied vectorial analysis to the data. These methods do not treat the observations within a day as a set when evaluating the daily behaviour of the data; as a consequence, the correlation between observations is missed. In this context, Functional Data Analysis (FDA) can be useful because it is able to detect days that do not contain individual outliers but may nonetheless be far from the mean behaviour [22–26]. A proof of its applicability is that FDA has expanded to a great number of scientific fields related to continuous-time monitoring processes, such as the environment [24,26–29], health and medical research [30,31], industrial processes [32,33], sensor technology [34,35] and even econometrics [36]. Moreover, it has also been applied with machine learning techniques in optimisation and classification problems [37,38]. Nowadays, FDA continues to expand into further fields such as quality control and sports [39,40].

In this work, we propose the use of FDA for assessing the impact that building retrofitting has on the energy performance, indoor temperatures and lighting conditions of a building. The methodology was applied to a case study of the renovated building of the Rectorate of the University of the Basque Country (Spain). The novelty of this work is the application of FDA to statistically contrast the differences in the energy performance of a building before and after a retrofit. The literature review shows that just a few studies have actually evaluated the retrofit of buildings with monitored data, and that functional analysis has not been used for this application.

The samples are composed of daily curves of variables such as electricity, heating demands and temperatures. FDA allows contrasting the samples while taking into account the average behaviour of the group throughout the whole day [24,37,41], which would not be possible with a vectorial approach. In vectorial analysis, the data of a whole day have to be summarised in a single value in order to work with daily observations, and a day is classified as an outlier when it moves away from the sample mean, in this case calculated from those simplified daily observations [42,43].

On the one hand, to carry out the functional analysis, a functional ANOVA (FANOVA) was used to evaluate whether there are differences between the monitored data in the building before and after retrofitting. On the other hand, a classical analysis of variance (ANOVA) was also used to study the differences between the samples before and after the retrofitting [43–46]. To complement the vectorial analysis, the non-parametric Kruskal–Wallis test was applied to contrast whether the two samples come from the same initial distribution [47–49]. In addition, measures such as the rate of change in sample variance or the functional $\mathcal{L}\_2$ distance between curves are also presented to quantify the impact of the refurbishment: the vectorial method is based on the differences between the medians, whereas the functional method consists of measuring the distance between the curves that represent the mean functions [50–52].
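
For reference, the vectorial contrasts mentioned above first collapse each day to a single summary value. A minimal Python sketch using the standard SciPy tests could look as follows; the daily consumption figures are synthetic and purely illustrative:

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

# Vectorial approach: each day is reduced to one summary value (e.g. daily kWh).
rng = np.random.default_rng(0)
daily_before = 120.0 + 10.0 * rng.standard_normal(60)  # 60 days before retrofit
daily_after = 100.0 + 10.0 * rng.standard_normal(55)   # 55 days after retrofit

F, p_anova = f_oneway(daily_before, daily_after)       # classical one-way ANOVA
H, p_kw = kruskal(daily_before, daily_after)           # Kruskal-Wallis test
```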

The results show that FDA efficiently demonstrates that the heating demands in the building were reduced thanks to the envelope insulation, even though ventilation was increased, indoor temperatures were higher and internal lighting loads were reduced. The results also show that a significant reduction in lighting consumption was achieved with the installation of LED lighting. Moreover, it is demonstrated that taking into account the correlation of the data through a functional approach is a more realistic and informative way to study how different two or more samples are.

### **2. Materials and Methods**

### *2.1. Functional Data Analysis (FDA)*

Functional Data Analysis (FDA) studies observations which form functions defined over a given set $T$. The infinite-dimensional structure of the data enlarges the possibilities of research [23,27,53]. A random variable $X$ is defined as a functional variable if it takes values in a complete metric or semi-metric space, and it is observed in a discrete set of points $\{t\_j\}\_{j=1}^{n\_p} \subset [a, b]$ (not necessarily equispaced) for each of the $n$ individuals studied [54,55]. Thus, the data consist of a matrix $\mathbf{X}$ with $n$ rows representing the different individuals and $n\_p$ columns representing the different discrete points where the functions are evaluated [55].

The functional model, through a process known as smoothing, converts the initial discrete values into a set of continuous functions over time, $x(t) \in \mathcal{X} \subset \mathcal{F}$, with $\mathcal{F}$ a functional space. To estimate these functions, $\mathcal{F}$ is taken to be $\mathcal{F} = \mathrm{span}\{\phi\_1, \ldots, \phi\_{n\_b}\}$, where $\phi\_k$ is a basis function and $n\_b$ the number of basis functions necessary to build a functional sample. Although there are other types, the most commonly used basis functions are splines or Fourier functions [56], and the expansion considered is [24,25,27,57]:

$$\mathbf{x}(t) = \sum\_{k=1}^{n\_b} c\_k \phi\_k(t) \tag{1}$$

where $\{c\_k\}\_{k=1}^{n\_b}$ represent the coefficients that shape the function $x(t)$ with respect to the chosen set of basis functions. In this way, the smoothing process consists of solving the following regularisation problem [24,25,27,57]:

$$\min\_{\mathbf{x}\in F} \sum\_{j=1}^{n\_p} \{\mathbf{z}\_j - \mathbf{x}(t\_j)\}^2 + \lambda \Gamma(\mathbf{x}) \tag{2}$$

where $z\_j = x(t\_j) + \epsilon\_j$ (with $\epsilon\_j$ a realisation of zero-mean random noise) is the result of observing $x$ at the point $t\_j$, $\lambda$ is a regularisation parameter that controls the intensity of the regularisation, and $\Gamma$ is an operator that penalises the complexity of the solution. Taking into account the basis expansion, Equations (1) and (2) can be expressed as [24,25,27,57]:

$$\min\_{\mathbf{c}} \left\{ (\mathbf{z} - \boldsymbol{\Phi}\mathbf{c})^{T} (\mathbf{z} - \boldsymbol{\Phi}\mathbf{c}) + \lambda\mathbf{c}^{T}\mathbf{R}\mathbf{c} \right\} \tag{3}$$

where $\mathbf{z} = (z\_1, \ldots, z\_{n\_p})^T$ is the observation vector, $\mathbf{c} = (c\_1, \ldots, c\_{n\_b})^T$ the vector of coefficients of the functional expansion, $\boldsymbol{\Phi}$ the $n\_p \times n\_b$ matrix with elements $\Phi\_{jk} = \phi\_k(t\_j)$, and $\mathbf{R}$ the $n\_b \times n\_b$ matrix with elements [24,25,57,58]:

$$R\_{kl} = \langle D^2 \phi\_k, D^2 \phi\_l \rangle\_{\mathcal{L}\_2(T)} = \int\_T D^2 \phi\_k(t) \, D^2 \phi\_l(t) \, dt \tag{4}$$

where $D^n \phi\_k(t)$ represents the $n$th-order derivative of the function $\phi\_k$. From here, it is straightforward to see that the solution can be calculated as follows:

$$\mathbf{c} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{R})^{-1} \boldsymbol{\Phi}^T \mathbf{z} \tag{5}$$
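
As an illustration of Equations (1)–(5), the following Python sketch fits one daily curve with a cubic B-spline basis and a second-derivative roughness penalty. This is a minimal sketch, assuming observations on a common grid; the basis size `n_b`, the penalty `lam` and the helper names are illustrative choices rather than part of the original methodology, and dedicated packages such as scikit-fda (Python) or fda (R) provide tested implementations.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, n_b, degree=3):
    """Evaluate n_b clamped B-spline basis functions (and their second
    derivatives) at the points t."""
    a, b = t.min(), t.max()
    interior = np.linspace(a, b, n_b - degree + 1)[1:-1]
    knots = np.concatenate(([a] * (degree + 1), interior, [b] * (degree + 1)))
    Phi = np.empty((t.size, n_b))
    D2Phi = np.empty((t.size, n_b))
    for k in range(n_b):
        coef = np.zeros(n_b)
        coef[k] = 1.0
        spl = BSpline(knots, coef, degree)
        Phi[:, k] = spl(t)                # Phi_jk = phi_k(t_j), as in Equation (3)
        D2Phi[:, k] = spl.derivative(2)(t)
    return Phi, D2Phi

def smooth(t, z, n_b=15, lam=1e-2):
    """Penalised least squares: c = (Phi^T Phi + lambda R)^{-1} Phi^T z,
    with the roughness matrix R of Equation (4) approximated on the grid."""
    Phi, D2Phi = bspline_basis(t, n_b)
    w = np.gradient(t)                    # crude quadrature weights on the grid
    R = (D2Phi * w[:, None]).T @ D2Phi    # R_kl ~ int D2phi_k(t) D2phi_l(t) dt
    c = np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ z)
    return c, Phi

# Example: a noisy daily indoor-temperature-like curve sampled every 15 minutes.
t = np.linspace(0.0, 24.0, 96)
z = 20.0 + 3.0 * np.sin(2.0 * np.pi * t / 24.0) + np.random.normal(0.0, 0.3, t.size)
c, Phi = smooth(t, z)
x_hat = Phi @ c                           # smoothed curve x(t_j) = sum_k c_k phi_k(t_j)
```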

### 2.1.1. Functional Depths

Initially, the depth concept appeared in multivariate statistics for measuring the centrality of a point $x \in \mathbb{R}^d$ within a specific dataset, giving greater value to the points near the centre [22]. Later, some authors extended this measure to FDA [59,60]. The functional depth gives us a centrality measure of a specific curve $x\_i$ with respect to a set of curves $x\_1, \ldots, x\_n$ that come from a stochastic process $\mathcal{X}(\cdot)$ defined on an interval $[a, b] \subset \mathbb{R}$.

There are several functional depths in the statistical literature, but the three main ones are *Fraiman–Muniz* [59], *h-modal* [61] and *Random Projections* [60]. The most widely used is the *h-modal* depth because it has a higher rate of correct detection than the others [22]. The *h-modal* depth defines the functional mode as the curve most densely surrounded by the other curves of the dataset. In this manner, the functional depth of a curve $x\_i$ with respect to the other curves in the sample is given by:

$$MD\_n(\mathbf{x}\_i, h) = \sum\_{k=1}^n K\left(\frac{||\mathbf{x}\_i - \mathbf{x}\_k||}{h}\right) \tag{6}$$

with $|| \cdot ||$ a norm in the functional space, $K : \mathbb{R}^+ \rightarrow \mathbb{R}^+$ a kernel function, and $h$ a bandwidth parameter [61]. Thus, the curve that attains the maximum value in Equation (6) is considered the functional mode. Moreover, some authors [60,61] recommend the use of the $\mathcal{L}\_2$ norm and a truncated Gaussian kernel:

$$||\mathbf{x}\_i - \mathbf{x}\_k||\_2 = \left(\int\_a^b (\mathbf{x}\_i(t) - \mathbf{x}\_k(t))^2 dt\right)^{1/2} \qquad\qquad K(t) = \frac{2}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right), t > 0 \tag{7}$$
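
The depth of Equation (6), with the $\mathcal{L}\_2$ norm and truncated Gaussian kernel of Equation (7), is straightforward to compute on curves discretised over a common grid. The sketch below is a minimal Python version; the default bandwidth (the 15th percentile of the pairwise distances) is an assumption borrowed from common FDA practice, not something fixed by this paper.

```python
import numpy as np

def l2_distances(X, t):
    """Pairwise L2 distances between the rows of X (discretised curves),
    approximating the integral in Equation (7) on the grid t."""
    w = np.gradient(t)                           # quadrature weights
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        D[i] = np.sqrt((((X[i] - X) ** 2) * w).sum(axis=1))
    return D

def h_modal_depth(X, t, h=None):
    """h-modal depth of each curve, Equation (6), with the truncated
    Gaussian kernel of Equation (7)."""
    D = l2_distances(X, t)
    if h is None:
        # Assumed default bandwidth: 15th percentile of pairwise distances.
        h = np.percentile(D[np.triu_indices_from(D, k=1)], 15)
    K = (2.0 / np.sqrt(2.0 * np.pi)) * np.exp(-(D / h) ** 2 / 2.0)
    return K.sum(axis=1)

# Example: the deepest of 50 synthetic daily curves is the functional mode.
t = np.linspace(0.0, 24.0, 96)
X = np.sin(2.0 * np.pi * t / 24.0) + 0.2 * np.random.randn(50, 96)
depths = h_modal_depth(X, t)
mode_idx = int(np.argmax(depths))                # index of the functional mode
```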

The principal aim of functional depths, viewed as a functional dispersion measure, is the detection of outliers. As in classical analysis, detecting and examining these curves is important because they may bias our functional estimations, and because it allows us to discover the reasons that make these curves deviate from the mean. Furthermore, it is essential from a functional approach because it may occur that the individual values of a curve are not vectorial outliers, but the complete curve is nevertheless a functional outlier [22,58]. If we assume that every curve in the data comes from the same stochastic process, a curve would be considered an outlier for one of two reasons: it is at a significant distance from the expected function of the stochastic process, or its shape represents a very different behaviour from the other curves. Therefore, the curves with functional depth below a specific value $C$ are considered atypical and are removed from the sample (see [23–26]). It is convenient to choose a $C$ that provides a controlled type I error level: a value such that, in the absence of outliers, the probability of mislabelling a correct observation as an outlier is approximately 1% [23–26]:

$$P(D(\mathbf{x}\_i) \le C) = 0.01, \quad i = 1, \ldots, n \tag{8}$$

In this way, the chosen $C$ is the first percentile of the distribution of the depths. Since this distribution is unknown, the percentile must be estimated from the sample data. For this purpose, there are two different bootstrap techniques: the trimming bootstrap [62] and the weighting bootstrap [63]. Some studies demonstrate that, despite a larger rate of incorrect outlier detection, the trimming bootstrap performs better at detecting the curves that are actually outliers [22,64].
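
A minimal sketch of how the cutoff $C$ of Equation (8) could be estimated with a trimming-bootstrap strategy is given below. It reuses `h_modal_depth` from the previous sketch, and the trimming proportion, the number of resamples and the function name are illustrative assumptions rather than the exact procedure of [62].

```python
import numpy as np  # assumes h_modal_depth from the previous sketch is in scope

def outlier_cutoff(X, t, n_boot=200, trim=0.10, seed=None):
    """Trimming-bootstrap estimate of C: resample from the sample after
    discarding its least-deep curves, then take the median of the
    bootstrap first percentiles of the depths (Equation (8))."""
    rng = np.random.default_rng(seed)
    depths = h_modal_depth(X, t)
    X_trim = X[depths >= np.quantile(depths, trim)]  # drop suspect curves
    firsts = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, X_trim.shape[0], X_trim.shape[0])
        firsts[b] = np.percentile(h_modal_depth(X_trim[idx], t), 1)
    return np.median(firsts)

# Daily curves with depth below C are flagged as functional outliers.
C = outlier_cutoff(X, t, seed=0)
outliers = np.where(h_modal_depth(X, t) < C)[0]
```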

### 2.1.2. Functional Test ANOVA (FANOVA)

Any test or contrast that can be made in a vectorial analysis can have a functional version that usually provides more relevant information. An example of this is the classical ANOVA. Its functional version, while also contrasting the mean levels of a variable, is based on $k$ independent samples $X\_{ij}(t)$, $j = 1, \ldots, n\_i$, $t \in [a, b]$, drawn from $\mathcal{L}\_2$ processes $X\_i$, $i = 1, \ldots, k$, such that $E(X\_i(t)) = m\_i(t)$ [65–68]. If we have a functional sample classified into several groups, $\{\mathcal{X}\_i, G\_i\}\_{i=1}^{n} \in \mathcal{F} \times \mathcal{G}$, with $\mathcal{G} = \{1, \ldots, G\}$, where $G\_i$ is a discrete variable that indicates the membership group, the contrast is:

$$\begin{cases} H\_0: m\_1 = m\_2 = \dots = m\_G \\\\ H\_1: \exists\, k, j \quad \text{s.t.} \quad m\_k \neq m\_j \end{cases} \tag{9}$$

After a few operations, as shown in [50,51], it is possible to go from the classical statistic ($F\_n$) to the functional statistic ($V\_n$):

$$F\_n = \frac{\sum\_{i=1}^{G} n\_i (\overline{Y}\_{i.} - \overline{Y})^2 / (G - 1)}{\sum\_{i=1}^{G} \sum\_{j=1}^{n\_i} (Y\_{ij} - \overline{Y}\_{i.})^2 / (n - G)} \qquad \implies \qquad V\_n = \sum\_{i<j} n\_i \, ||\overline{Y}\_{i.} - \overline{Y}\_{j.}||^2 \tag{10}$$

In addition, according to Cuevas et al. [50] and Tarrío-Saavedra et al. [51], the asymptotic distribution of $V\_n$ under $H\_0$ is the same as that of the following statistic:

$$V := \sum\_{i<j} ||Z\_i(t) - C\_{ij} Z\_j(t)||^2 \tag{11}$$

where $C\_{ij} = (p\_i/p\_j)^{1/2}$, with $p\_i$ the asymptotic proportion of observations in group $i$ (i.e., $n\_i/n \to p\_i$), and $Z\_1(t), \ldots, Z\_G(t)$ are independent Gaussian processes with mean 0 and covariance functions $K\_i(s, t)$.

Finally, $H\_0$ is rejected, at a level $\alpha$, whenever $V\_n > V\_\alpha$, where $P\_{H\_0}\{V > V\_\alpha\} = \alpha$ [52]. Because in practice it is not easy to estimate the distribution of $V$, it is usually necessary to implement a Monte Carlo procedure through which we get, for each $i = 1, \ldots, G$, $N$ iid observations

$$Z\_{il}^{\*} = (Z\_{il}^{\*}(t\_1), \dots, Z\_{il}^{\*}(t\_m)), \quad l = 1, \dots, N \tag{12}$$

from an $m$-dimensional Gaussian random variable with mean 0 and covariance matrix $(\hat{K}\_i(t\_p, t\_q))\_{1 \le p, q \le m}$. The functional $\mathcal{L}\_2$ distances $||Z\_i(t) - C\_{ij} Z\_j(t)||^2$ are approximated by the $\mathbb{R}^m$ Euclidean distances $||Z\_{il}^{\*} - C\_{ij} Z\_{jl}^{\*}||^2$. Ultimately, the replications $\tilde{V}\_l$ of $V$ are

$$\tilde{V}\_l = \sum\_{i<j} ||Z\_{il}^{\*} - C\_{ij} Z\_{jl}^{\*}||^2, \quad l = 1, \ldots, N \tag{13}$$

and the distribution of $V$ is approximated by the empirical distribution of the sample $\tilde{V}\_1, \ldots, \tilde{V}\_N$ [50,51].
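
Putting Equations (9)–(13) together, the following Python sketch implements the Monte Carlo FANOVA for curves discretised on a common grid: it computes $V\_n$, simulates the Gaussian processes $Z\_i$ from the estimated covariances $\hat{K}\_i$, and returns an approximate p-value. It is a minimal sketch under stated assumptions (grid quadrature, sample covariance estimates, an illustrative `n_sim`); tested implementations are available in packages such as fda.usc (R) or scikit-fda (Python).

```python
import numpy as np

def fanova(groups, t, n_sim=2000, seed=None):
    """Monte Carlo FANOVA: `groups` is a list of (n_i, n_p) arrays of
    curves discretised on the common grid t. Returns (V_n, p-value)."""
    rng = np.random.default_rng(seed)
    w = np.gradient(t)                                 # quadrature weights
    G = len(groups)
    n = sum(g.shape[0] for g in groups)
    means = [g.mean(axis=0) for g in groups]
    covs = [np.cov(g, rowvar=False) for g in groups]   # estimates of K_i(t_p, t_q)
    p = [g.shape[0] / n for g in groups]
    C = [[np.sqrt(p[i] / p[j]) for j in range(G)] for i in range(G)]

    # Observed statistic: V_n = sum_{i<j} n_i ||mean_i - mean_j||^2, Equation (10).
    Vn = sum(groups[i].shape[0] * (((means[i] - means[j]) ** 2) * w).sum()
             for i in range(G) for j in range(i + 1, G))

    # Null replicates from zero-mean Gaussian processes, Equations (11)-(13).
    zeros = np.zeros(t.size)
    sims = np.empty(n_sim)
    for l in range(n_sim):
        Z = [rng.multivariate_normal(zeros, covs[i], check_valid="ignore")
             for i in range(G)]
        sims[l] = sum((((Z[i] - C[i][j] * Z[j]) ** 2) * w).sum()
                      for i in range(G) for j in range(i + 1, G))
    return Vn, float((sims >= Vn).mean())

# Example: contrast daily curves before vs. after a retrofit (synthetic data).
t = np.linspace(0.0, 24.0, 48)
before = 21.0 + np.sin(2.0 * np.pi * t / 24.0) + 0.5 * np.random.randn(60, 48)
after = 20.0 + np.sin(2.0 * np.pi * t / 24.0) + 0.5 * np.random.randn(55, 48)
Vn, pval = fanova([before, after], t)   # a small p-value rejects H0 in Equation (9)
```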
