1. Introduction
Today, the building stock suffers from low energy efficiency and significant discrepancies between anticipated and actual heat consumption known as the performance gap. The performance gap has been documented in several studies, see, e.g., in [
1,
2]. In [
3], it is stated that only 3% of the building stock in the EU has energy label A, which corresponds to the level of new buildings. Additionally, the reliability of the energy labels has been proven to be limited. In a report from 2018 by the Danish Energy Agency it was stated that 20 to 30% of the Danish building energy labels were wrong. This corresponds to between 12,000 and 18,000 energy labels that specific year.
The energy efficiency directive (EED) of the European Union (EU) [
4] states that all member states are responsible for the installation of individual energy meters, including heat meters, on all buildings to the extent that it is technically possible and economically feasible. Furthermore, the new energy performance of buildings directive (EPBD) lists several requirements to boost the national renovation strategies [
5]. These initiatives are established in order to increase the energy efficiency of the EU building stock. With the current data collection requirements and the new EPBD, the relevancy of data-driven methods for the screening and documentation of the thermal performance of buildings has become more relevant than ever.
Several dynamic modelling approaches used for energy performance evaluation of buildings can be mentioned. That is from pure deterministic white-box models such as TRNSYS, Modelica, IDA ICE, EnergyPlus or ESP-r models [
6,
7,
8,
9,
10], to fully statistical black-box models such as artificial neural networks. The estimated energy performance of buildings based on deterministic white-box models is based on a number of assumptions which may or may not match reality. Typically, this is because the evaluations are done before the building is built and several parameters, which can affect the energy consumption, are unknown at the time. This is one of several possible reasons for the performance gap in the building stock [
11]. On the other hand, the black-box models are often used for prediction, control and clustering, rather than system identification due to its lack of interpretability. Additionally, as black-box models are data-driven models, they can only be applied after the building is build and data are collected. One exception is if black-box models are applied on simulated data obtained from white-box models. A review of different black-box approaches used for building energy consumption and performance estimation can be found in [
12].
A third category of models is the grey-box models. These kinds of models are a hybrid of the previously mentioned white and black-box models. Examples of grey-box models are physics-based stochastic differential equations [
13,
14], and autoregressive moving-average models with exogenous inputs (ARMAX models) for time series data [
15]. The ARMAX model can be explained in physical terms. First, by formulating a thermal lumped resistance capacity (RC) model, and subsequently by deriving the corresponding ARMAX representation as done in, e.g., [
16,
17]. However, the MA term in the ARMAX model is often omitted, and the ARX models are used instead, see, for example, [
18,
19].
Other approaches in the literature utilise time varying parameters estimation related to the thermal building performance. For example, in [
20] each parameter of interest is treated as time-varying states found by multivariate kernel estimation. This method is one of the key influences of this study.
In general, it can be said that the category of supervised machine learning techniques (such as grey-box models) can be used for building performance parameter estimation, prediction and control, whereas unsupervised techniques (such as black-box techniques) are suitable for prediction and control only, as the physical interpretability of the model parameters is lacking.
A shared commonality of most of the dynamical data-driven methods used for building performance estimation is that they require human interaction for model selection and validation as described in [
17]. For large-scale assessment of building performance the before mentioned methods are therefore currently not feasible.
Alternatively, simpler quasi-stationary models have been proposed in the literature. One quasi-stationary approach to quantify thermal performance of buildings is the energy signature method, which has been studied and applied for decades. A few of the early examples can be found in [
21,
22,
23] with the earliest known, dated all the way back to 1951 [
24].
The dominating principle of the methods is to apply linear regression on, e.g., outdoor temperature measurements in order to explain the heat consumption of a building. Consequently, information on, e.g., the heat transfer coefficient can be extracted form the estimated model.
In one of the simplest energy signature models found in [
25], the heat consumption is described as a linear function of the outdoor temperature during the heating period, i.e., by a slope and an intercept, where the slope represents the heat loss coefficient of the building. During the weather-independent period, the heat consumption is modelled as a constant for buildings without cooling and heat recovery. The change point for the weather dependent period (heating period) to the weather independent period is described by the base temperature. The base temperature is the outdoor temperature at which the building is in thermal balance. For a fixed value of the base temperature, the energy signature is similar to the well-known degree-day method, see, e.g., in [
26,
27]. As the energy signature operates in two distinctive modes during the weather-dependent and the weather-independent periods, respectively, it is a regime model with two regimes, where
determines the instantaneous change between them.
In the ASHRAE Guideline 14-2002—Measurement of Energy and Demand Savings [
25], seven different energy signature models are proposed depending on different factors concerning heating, cooling and heat recovery systems.
Common for all the methods in the ASHRAE Guideline is that the heat usage is a function of the outdoor temperature alone, contrary to, e.g., [
22,
28].
In [
28], it was found that the heat loss coefficient is fairly insensitive (±5%) to solar gain and electricity use. The estimated base temperature was more sensitive to these two factors. As both electricity and solar gain are additive terms to the heat demand (i.e., a functional offset), this seems like a reasonable finding.
More recent studies, like those in [
29,
30], show that the energy signature method is still used in research. However, only minor advances on the technique have been reported since the earliest applications of the method. For example, in [
29] a energy signature model for determining the domestic hot water production and heat loss due to hot water circulation is proposed. Additionally, only night-time data were used to reduce effects such as solar irradiation. To the best of our knowledge, the actual energy signature model in the literature is typically kept linear, which is likely to result in significant biases of the estimated building physical parameters.
Motivation
For many of the above-mentioned energy signature methods, it is assumed that the transition from a weather-dependent heat consumption to a weather-independent heat consumption is instantaneous at a specific outdoor temperature. However, this simplification does not match reality. Therefore, we propose an advancement of the traditional energy signature models by formulating it as a smooth nonlinear model, from which the transition period can be determined. Furthermore, by reformulating the traditional energy signature model as a smooth, i.e., fully differentiable, model, the estimation procedure can be made more efficient, and more advanced modelling approaches, which require full differentiability, can be applied.
As several weather phenomenons, such as wind and long-wave radiation, have nonlinear impacts on the energy consumption, new model formulations of, specifically, the heating period are proposed to get reliable building physical estimates. Five models of increasing complexity are proposed and the model accuracy is documented.
For occupied buildings, a major source of the variation in the heat consumption is related to the occupants [
31]. The effects are related to personal preferences to the indoor environment and the occupants’ understanding of the building and its systems [
32]. Therefore, the non-modelled effects on the heat consumption (e.g., window openings, changing temperature set points, etc.) are part of the model errors. We utilise this fact to form a method to quantify the occupants’ effect on the heat consumption, based on the model residuals.
The overall aim of this work is to establish a robust and scalable method for thermal energy performance estimation. The focus in this article is on buildings without secondary heat sources (e.g., wood stoves), cooling and heat recovery systems. However, it would be possible to extend the models to include mechanical cooling and heat recovery.
3. Case Study
3.1. Data
Sixteen randomly selected houses in Sønderborg in Southern Denmark have been used as a demonstration case. The built year of the houses ranges from 1937 to 1996, and the heated floor area from 86 to 173 . All houses are heated by district heating only, i.e., there is no additional and unmeasured heat sources in the houses except from internal gains. Finally, four of the houses have night-setback on the temperature set point.
The only measurement used from the houses is the total heat consumption provided by Sønderborg Fjernvarme—a consumer-owned district heating company in Sønderborg. In addition, outdoor temperature, wind speed and global solar irradiation are measured at the district heating plant, which is within 10 of the houses.
The sky temperature used in model M4 is obtained from the freely available reanalysis data set
ERA5-Land provided by the Copernicus Climate Change Service [
39]. The full documentation on the ERA5-Land can be found in [
40].
The heat consumption was measured every 10th minute, and the weather data were obtained on hourly basis. In the analyses, only daily averaged values are used from a period from 2 January 2009 to 1 May 2011.
The original heat consumption data consist of both domestic hot water consumption and space heating. Before the analyses carried out in this study, the hot water consumption has been separated from the space heating by means of kernel smoothing as described and done in [
41]. Therefore, the estimated space heating has been used, rather than the total heat consumption.
3.2. Software
Each of the models have been set up in Template Model Builder (TMB) (version 1.7.16) [
42] and fitted with the global optimisation algorithm, Multi-Level Single-Linkage (MLSL), alongside the local optimisation algorithm Limited-memory Broyden–Fletcher–Goldfarb–Shanno Algorithm (L-BFGS). The optimisation algorithms used are implemented in the
R package for nonlinear optimisation, R Interface to NLopt (nloptr) (version 1.2.1) [
43]. Version 3.5.1 of
R [
44] has been used throughout the study.
Despite the choice of software used in this study, the models can be estimated by using a broad range of simpler optimisers and other software.
3.3. Model Validation
All models presented in
Table 1 were fitted and validated by means of a five-fold cross-validation where the hyperparameter
k in Equation (
22) was tuned. The training and validation data were splitted randomly in a 80/20 ratio. The root mean squared error (RMSE) on the validation set is shown in the slope chart in
Figure 2.
The corresponding RMSE for each house is indicated as a grey or black dot. The red line indicates the mean RMSE across all 16 houses and the black line indicates House 6 which we will study in details. Notice that the RMSE is plotted on a logarithmic scale, i.e., the slopes of the lines between the points express the relative change in model error.
A general model improvement is seen when the simplest model, M0, is extended by including the base temperature as a free parameter as done in model M1.
Models M2 and M3 are the model extensions, which include wind and solar irradiation, respectively. For both models we see a decrease in the RMSE, meaning that both wind and solar irradiation have a significant effect on the heat consumption. However, including solar gain has the largest positive effect on the model errors when the models are evaluated in this successive manner. This is also referred to as type I partition [
45].
Finally, the heat loss related to long-wave radiation (M4) does not affect any of the model predictions. The parameter
in Equation (
15) is simply estimated as such a low and insignificant value that it does not affect the RMSE.
3.4. Residuals
In
Figure 3, the residual analysis plot for House 6 is illustrated. Each row consists of three plots: the model residuals of a given model, the fit of its extension and the residuals after the model extension.
In the top-left plot in
Figure 3, the residuals are shown for model M0. From the plot, a series of systematic errors from approximately 8 to 17
C and a negative trend in the residuals are observed.
The negative trend in the residuals is diminished after the model M0 is extended to model M1 (top-right). However, the systematic errors are still present. The systematic error means that both models M0 and M1 are not capable of describing the heat consumption during the transition period.
Last, in the residuals for temperatures below 8.0 C, a slight concave trend is seen for both M0 (top-left) and M1 (top-right). These patterns indicate that the heat consumption is not only a function of the outdoor temperature.
In the second row of plots to the left, the residuals of model M1 are shown as a function of the wind speed. The red dashed line indicates that model M1 tends to underestimate the heat consumption for wind speeds above per-mode=symbol / and overestimate the heat consumption for wind speeds below per-mode=symbol /. Modelling the wind-dependent heat loss with model M2, the tendency disappear as seen in the right-most plot in the second row.
In the third row of plots to the left, the model residuals of M2 are compared to the global solar irradiation. In this case, we see a negative linear trend in the residuals. This means that for days with high levels of solar irradiation, model M2 tends to overestimates the heat consumption, simply because the model does not include the effect of solar irradiation. After implementing the solar gain in model M3, the residuals flattens out.
It is seen that the solar irradiation explains a lot of the variance in the transition period around 5–15 C (third row, second column). In contrast to this, we see that the wind speed in the M2 model (second row, second column) mainly explains the variance in the lower outdoor temperature range. As the heat loss due to wind is proportional to the outdoor temperature, and due to the fact that the daily mean outdoor temperature is correlated with the solar irradiation, this is a reasonable result.
Additionally, the figure shows the residuals of model M3 as a function of the outdoor temperature in row four, column three. Compared to the residuals of the previous models, we now see that the systematic error in the transition period disappears. Only a changing variance between weather-dependent and weather-independent days is seen. Therefore, the solar gain seems to be the main reason for the heating system turning on and off in the transition period.
In the last row of plots, the effect of long-wave radiation between the building and its surroundings is shown. As expected, based on the unaffected RMSE in
Figure 2, the residuals remain unaffected.
It is now shown that the model residuals of model M3 (and model M4 for that matter) are independent of the outdoor temperature, wind speed, solar irradiation and the sky temperature. However, a clear time dependence in the residuals is found in
Figure 4 (grey lines). As argued for in
Section 2.5, this is likely related to time-correlated occupants’ behaviour.
In
Figure 4, the black line shows the time-varying heat gain needed to maintain thermal balance at all time, which is obtained as
as described in
Section 2.5. Based on the estimate of
, the time-varying base temperature
is found and shown as the red line. The dotted sections of the red and black lines indicate periods without a heat demand. During these periods, the estimate is not valid.
3.5. Parameter Sensitivity
It is now demonstrated how the model with outdoor temperature, global solar irradiation and wind speed as explanatory variables can describe the actual energy use of the buildings rather accurate. In this section, we will investigate how two of the common parameters across all models (UA and the base temperature
) are affected by the different model formulations found in
Table 1.
In
Figure 5 (left), the estimated UA value for each house and model is shown. The UA value is the estimated heat loss coefficient under influence of the mean wind speed (per-mode=symbol
/
) observed in the measurement period. In the plot to the right, the estimated base temperatures are shown for all the houses and models.
Going from model M0 to M1 gives rise to both increments and reductions in the estimate of UA and . It is therefore not possible to state any general bias by fixing ; it depends on the building and the occupants.
Including the effects of the wind speed, as done in model M2, the estimated UA values increase significantly and consistently for all houses. The effect on the base temperature is more moderate and reduces slightly. On the other hand, including solar gain in model M3 creates a small decrease in the UA value, as well as in the base temperature.
The introduction of long-wave radiation does not affect UA nor . With the proposed implementation of long-wave radiation in model M4, it is completely irrelevant for the heat consumption.
Going from the simplest reasonable model (M1) to the best model (M3) gives an average change in the estimated UA value of +54% across all 16 buildings considered, with the smallest and largest change of +26% and +84%, respectively.
The base temperature changes from M1 to M3 in average by −8%, with the smallest and largest change of −1% and −11%, respectively.
3.6. Thermal Performance Characterisation
Based on the finding in the previous sections, model M3 has proven to be the best model. In this section, some of the results obtained from model M3 are presented in
Table 2.
As an example,
Table 2 shows that House 6 has a
value of 115
/
. The
value represents the heat loss coefficient under weather conditions with a wind speed of 0
/
. Even though House 6 has the smallest
value estimated, the heat loss related to convection and infiltration (
) is the fourth highest among the 16 houses. This might indicate that House 6 potentially has air leakage issues.
Looking at the estimated values, all houses seem to be significantly affected by the wind. Furthermore, it is reasonable to believe that a high standard error (e.g., House 8) is a result of different wind effects from different wind directions, or by the fact that people often open and close the windows. Both scenarios will result in a high standard error.
in
Table 2 tells us at which outdoor temperature the house is in thermal balance given the average weather conditions. The base temperature varies from
to
. A low value of
can—as stated in the
Section 2.1.1—mean different things. Either the internal gains are high, the indoor temperature low, the house is poorly insulated or some sort of combination. The opposite is true for a high value of
.
The transition period indicates the outdoor temperature range where the building is in its transition phase. The spans indirectly indicate to which extent the heating system is affected by the weather. That is, a narrow band tells that the building or the heating system is insensitive to changing weather conditions, and visa versa.
As we do not know the indoor temperature and the heat gains related to the building use, we are not able to separate them. However, assuming that the indoor temperature is equal to the design indoor temperature (e.g., 20
), we can estimate the remaining unobserved heat balance contributions (
) needed to obtain thermal balance as described in
Section 2.5. The mean and standard deviation of the time-varying process
is stated in the two last columns of
Table 2.
In
Figure 6, the estimated
value and the wind-dependent increment of the U value (
per
heated floor area) are plotted as a function of the construction year. The bars around the dots indicate a 95% confidence interval of the estimate.
Regarding both and , the figure shows that there is a negative trend in the estimates, which indicates that the thermal performance has increased during the years. Despite the rather small sample, it seems like there is a significant difference between houses build before 1961, which is the year where the first energy frame came in force in Denmark, and after.
Even though the negative trend is present for the and the values, the reduction of for the individual houses does not necessarily lead to a reduction in . An example of this is shown for House 6, which has the lowest value of the houses build between 1961 and 1980, but one of the highest values in the same cluster.
4. Discussion
In
Section 3, it is shown how a number of building parameters, related to its thermal performance from heat consumption measurements and weather data, can be estimated. Making use of electricity consumption data and separately measured space heating is strongly believed to improve the models further.
However, for an estimation of the two most important thermal performance parameters in this paper—the and values—the inclusion of electricity and separately measured space heating would most likely affect the estimates to a minor extent, as long as the daily average values are of approximately equal magnitude for all weather conditions. This is because such heat gains are additive terms in the model formulations. Therefore, to keep the models flexible and scalable, as few as possible measurements are preferred, as long as they do not introduce a serious bias to the estimated model parameters.
In
Section 2.5, we have shown how it is possible to model the residuals and estimate the variation in the base temperature needed to maintain thermal balance. Additionally, for a given design indoor temperature, the time-varying heat balance contribution related to the building use is estimated. In
Table 2, the mean and the standard deviation of the heat balance contribution required to maintain an indoor temperature of 20
are given for all 16 houses; in
Figure 4, the variation is shown for House 6.
From the investigation of the residuals of model M3 (
Figure 3), there is no clear sign of any missing effects, that is, a function of outdoor temperature, wind speed, global solar irradiation or long-wave radiation. However, there is a correlation over time. Consequently, we have some time-dependent dynamics which are not captured by the model.
For daily averaged values or longer, the dynamics related to the heat capacities of the building should be averaged out sufficiently even though the long time constants often exceed 24
[
22]. Systematic errors such as those found in
Figure 4 are suggested to consist of two main effects: changing temperature set-points and ventilation rates. That is, both effects which are directly related to the building use (occupant behaviour).
Besides estimating a number of important parameters that describe the thermal building performance, the methods presented in this paper are seen as a potential stepping stone towards an identification of the main reasons for the well-known performance gap in buildings. Namely, the occupants, the weather conditions and the building envelope.
In general, to document the actual energy performance of a building under usage (including the way the occupants affect this performance), a few design parameters, used as input for the calculation of the building energy performance, are required. Those include the weather data (e.g., the design reference year (DRY) weather data), the design indoor temperature, internal heat gains and ventilation losses. The actual energy performance evaluation can then be used to illuminate the different reasons causing the discrepancy between expected and realised energy use. In
Figure 7, we illustrate the apparent performance gap and the three possible causes of the discrepancy which is further described in the following.
- A
Unintended occupants’ related differences in the energy consumption can be estimated as the difference between the estimated user-related heat gain and the user-related heat gain assumed in the design phase, .
Assuming that the indoor temperature is equal to the actual temperature, the user related heat gain assumed in the design phase is
where
and
are the anticipated internal heat gain (e.g., electricity and metabolism) the design ventilation loss, respectively.
Based on Equation (
34), the difference between the assumed heat gain caused by the occupants and the actual heat gain (see A in
Figure 7) can be estimated by
where
is the estimated actual heat gain related to the building use given
and
is the anticipated design indoor temperature.
is the heat loss coefficient under wind-free conditions estimated by model M3 found in
Table 1, and
is the time-varying estimate of the base temperature found as described in
Section 2.5.
The higher is, the more occupant-related heat is required to bring the building in thermal balance. This means, if , the internal heat gains are higher than expected in the design phase, the ventilation loss is lower than expected in the design phase, the indoor temperature is lower than the design temperature or a combination.
On the other hand, if the opposite is true.
In this scenario, it is assumed that ventilation and internal gains are independent of the weather. In reality, this might be violated, and Equation (
36) must by altered to account for that.
- B
Weather-related differences in the energy use can be estimated by comparing the predicted energy use with the actual weather conditions, and the predicted energy use with the outdoor temperature, wind speed and global solar irradiation used in the design phase. Model M3 in
Table 1 is used for prediction.
- C
Building envelope-related differences in the energy use can be estimated as the difference between the predicted energy use obtained using model M3 in
Table 1 and the occupants and weather corrected energy use obtained from points A and B, above.
The concept is further illustrated in
Figure 7, but it still needs to be validated on either simulated or actual building data. The method for estimating the occupants’ effect on the energy use may eventually be refined as well, as this can cause significant changes in the model parameter estimates as seen, e.g., in [
46]. In this study, the occupancy rate was used as proxy for different buildings operation modes. It was found that the energy signature (with only the ambient outdoor air temperature as explanatory variable) resulted in significantly different model parameter estimates if the models were estimated on occupied, unoccupied or both occupied and unoccupied data.
As the model is formulated as a fully differentiable model, contrary to the typical energy signature model, it allows for more advanced estimation techniques. One natural model extension is to estimate model M3 in
Table 1 and the occupant-related heat gain
, simultaneously. This can be done by formulating the model as a second-order state-space model, with the occupants’ related heat gain
seen in
Figure 4 (black line), as a mean-reverting hidden state.
Alternatively, the modelling approach presented in the paper can be estimated recursively, such that the estimate of is fed back into model M3 as an additional heat gain. A new realisation of can then be obtained, and the routine can be repeated until convergence.
5. Conclusions
This study has shown that the typical linear energy signature methods, found in the literature, can be significantly improved by applying a nonlinear and smooth model formulation.
Only daily averaged values of heat consumption and measurements of outdoor temperature, wind speed and global solar irradiation were used as model input. From that, several measures for the thermal building performance were estimated, including heat loss coefficients, heat losses related to convection and infiltration, solar transmittance, base heat load and transition periods. The use of so few variables makes the proposed method highly scalable and easily automatised. This was demonstrated on a study on 16 random selected single family houses.
It has been illustrated how the proposed model is more accurate in describing the variance and nonlinearities in the heat consumption compared to a simple energy signature with only the outdoor temperature as the explanatory variable.
Based on the model residuals, the heat gains related to the building usage were estimated by means of nonparametric kernel estimation methods. The estimation of building usage-related heat consumption paves the way for detailed building performance documentation and screening, as outlined in the current energy performance of buildings directive (EPBD) [
5]. For example, the impact on energy use related to weather, building use and the building envelope itself can be estimated separately.
The novel heat demand formulation using a smooth maximum function (LogSumExp) does not only provide a way of estimating the transition period from weather-dependent to weather-independent heat consumption. It also makes it possible to make more advanced models as they are fully differentiable contrary to the traditional energy signature model. This was further discussed in
Section 4.
In the future, three main issues should be addressed:
In the present paper, only 24 h average values were used with the argument that the effects of the heat capacities were averaged out as stated in [
22]. Several tests on parameter sensitivity could be done with the input variables averaged over longer and shorter periods than 24
.
Furthermore, the heat capacity could be modelled to account for potential dynamics related to the heat capacities of the building. Residuals () with no cross-correlation with the differentiated outdoor temperature () indicate that no thermal dynamics are left unmodelled. However, autocorrelation in the residuals might still appear, which indicates time correlated building use, e.g., the building use in one time step is correlated with the next.
In
Figure 5, it was shown that the wind speed had a tremendous effect on the estimated heat loss (UA value). Even though the model predictions improved and the parameters that describe the wind sensitivity are significant for all 16 houses (see
Table 2), it might be worth investigating other ways of modelling the wind’s effect on the heat consumption. As the effects are highly dependent on surroundings, building geometry and other unknown factors, it is suggested to model the wind dependence by means of nonparametric methods such as kernel or splines estimation.
As the variance of the model residuals is highly dependent on the outdoor temperature, they are seemingly heteroskedastic, i.e., not constant. The implication of heteroskedastic residuals is that the standard errors of the model parameters are biased. To correct it, the model should be formulated as a weighted least square problem where the weights are the inverse of the error variance.