1. Introduction
In the effort to fight global warming, one of the German government’s goals is to achieve a climate-neutral building stock by 2050. The policies focus on two strategies: the use of renewable energies and the increase in energy efficiency. Long-term climate neutrality in the building sector can be achieved by reducing energy consumption and expanding renewable energy [
1]. The thermal properties of the building envelope, the efficiency of the building technology, and the user behaviour significantly influence the energy efficiency of a building. In order to take measures to improve existing buildings, it is essential to determine the actual energy efficiency of a building. With on-site measurement data, the actual energy consumption can be detected and flaws in energy efficiency identified.
Deviations from efficient operation can result from plant defects, neglected maintenance, or changed or incorrect use by the residents. As research implies, these deviations can result in significant additional energy consumption in the order of 5–30% [
2,
3,
4]. This study aims to develop a method to determine the faults in HVAC installations to improve operational efficiency and, therefore, improve the energy efficiency of a building.
Fault detection (FD) identifies deviations (faults) between the target or expected operation and the actual operation. There are two possible fault detection scenarios: either measurements are performed on-site and faults are detected afterwards by analysing the data, or faults are detected automatically in real time during operation and reported directly to the building technology manager. Automated fault detection has the advantage that the commissioning of a system is monitored and optimised by the FD method, that energy is saved over the entire lifespan, and that the operation and maintenance process can be continuously monitored. In this way, an efficient HVAC process can be ensured.
With the development of new software, improved data availability and data analysis, and research on artificial intelligence [5], many institutions are improving and developing new fault detection methods. In the past, there have been many approaches to detecting faults using a number of different prediction methods. For example, Yan et al. [
6] suggested a combination of ARX and support vector machines for fault detection, while Luo et al. [
7] developed a fault detection method for machine tools based on deep learning. Kim et al. [
8] give a summary of automated fault detection and diagnostics (AFDD) studies published since 2004 that are relevant to the commercial buildings sector. They group AFDD methods in the HVAC area into three main categories: process history-based, qualitative model-based, and quantitative model-based methods. Lo et al. also give a thorough review of machine learning approaches in fault diagnosis [
9]. Z. Ge et al. [
10] give a systematic review, from the viewpoint of machine learning, of data mining and analytics methodologies in the process industry. Mattera et al. [
11] use the physical relations inside ventilation units to create virtual sensors from other sensors’ readings, introducing redundancy in the system. They employed statistical models such as linear and non-linear regression models. Lin et al. [
4] investigate the cost-benefit of FDD methods and which methods and data sets can be used to evaluate and compare FDD methods.
Within the ongoing IEA EBC Annex 71 [12] (the International Energy Agency’s Energy in Buildings and Communities Programme; Annex 71, Building Energy Performance Assessment Based on In-situ Measurements), the members explored FD techniques. Building on the Annex, this study focuses on faults of the building system. Simulation data of the twin houses (two identical case study houses at the Fraunhofer Institute for Building Physics in Holzkirchen, Germany) were used. The simulation was carried out as part of the “Building Energy Simulation (BES) model validation” study of the IEA EBC Annex 58 and 71 projects [
12,
13]. The data consist of two sets: a first part in which fault-free operation is simulated and a second part in which various system faults are integrated into the simulation. Each data set covers one month.
This study is an extension of the work presented by Parzinger et al. at the 12th Nordic Symposium on Building Physics (NSB 2020) [
14]. In a first phase, two different statistical models, a machine learning approach called random forest [
15] and a time series approach called ARX (autoregressive with exogenous inputs), predict the normal operation by predicting the total heating power of the building. Of these two models, the linear ARX model has the advantage of being easier to interpret. The random forest, as a black-box ensemble method, is more difficult to interpret, but has few problems with overfitting and provides a non-linear modelling technique. The fault detection approach presented in this study can, in principle, be performed with any prediction model for regression; two different modelling techniques (ARX and random forest) are used to show that the presented methodology for fault detection is independent of the prediction model. In the second phase, these two prediction models predict the data set that contains faults. For the fault detection, two different residual analyses are processed: 1. The exact times of the faults are known; thus, the best decision rule can be found for each data set and each statistical model by minimizing the misclassification. 2. The times of the faults are not known in advance; in this case, the faults are estimated using the rate of estimated faults. The fault detection is carried out using residual analysis; model checking based on residual analysis is a standard technique for time series analysis, cf. [
16], page 175 ff. and [
17], page 360 ff. An overview of time series modelling can be found in [
16,
18]. With a suitably adapted time series model, the residuals are generally expected to behave approximately like white noise and to be i.i.d. with a mean of zero. In this study, we use this characteristic as the starting point for a decision technique. We propose that this decision technique is suitable for residuals based on any well-fitting model (e.g., a random forest model).
There are two main types of time series methods for fault detection: non-parametric methods, which use spectral analysis, and parametric methods, which can be categorized as parameter-based and residual-based methods [
19]. For the residual technique, the estimates of the model parameters do not have to be considered. The residuals can be calculated directly from the predictions (based on the same modelling method) and the responses. The white noise property of the residuals can be analysed using various statistical tests, some of which are portmanteau tests. In this study, a data-driven decision rule for fault detection merges multiple tests. Different faults and different prediction methods for the response yield different deviations from the standard behaviour of the resulting residuals. The decision rule for faults is learned on the residual data and can be seen as a sort of portmanteau decision rule for fault detection, where the null hypothesis is specified by faultlessness. The technique is adjusted to the situations of observed and unobserved faults in the learning sample. Furthermore, the method is designed for fault detection at specific time points and within time intervals.
The developed methods for fault detection could replace a graphical, user-subjective evaluation of a residual plot with an automatic, data-based approach. The procedure of the fault detection method presented in this study is shown in
Figure 1. The focus is on the fault detection area highlighted in red.
2. Description of Simulated Data and System Faults
The simulation was built upon an empirical validation experiment of Annex 58 and Annex 71. A detailed description of the two identical full-size buildings of the Fraunhofer Institute for Building Physics in Holzkirchen, Germany, and of the data can be found in [
20,
21,
22,
23,
24].
The data set is obtained through detailed simulation with the building performance simulation (BPS) program IDA Indoor Climate and Energy (IDA ICE) [
25]. IDA ICE is a multi-zone, equation-based simulation program for describing and simulating the behaviour of buildings and HVAC systems. A physical model of the building and the HVAC system is described and simulated. The simulation uses the house description of Annex 58 and the climate boundary conditions of January and February 2019 in Holzkirchen [
26] (Annex 71).
For the simulation model, each room was equipped with a 2000 W electric radiator with a longwave radiation fraction of . The heating set point of the air temperature was set to and controlled room-wise by a thermostatic control with a dead band of K.
The simulation integrates an MVHR (mechanical ventilation with heat recovery) air handling unit with a heat recovery efficiency of 80 percent and an integrated summer bypass (the possibility to switch off the heat recovery during the summer months).
Table 1 shows the settings of the MVHR for the different rooms, divided into rooms with supply air and return air. The living room and the kitchen have an open floor plan.
The simulation includes a simple occupancy plan of a four-person household, with users absent between 7:30 and 17:00 each day. The occupancy is set as presented in
Table 2.
The data set starts on 1 January and ends on 28 February. During the first month (1–31 January), the building runs in regular operation. The second month (1–28 February) includes faults in the operation of the building. In this study, three faults are selected. The first fault (F1) is the tripping of the circuit breaker due to an overload of the upper-floor power cable or due to a short circuit; the result is the loss of heating power on the upper floor. The heat recovery of the ventilation system can be deactivated through a bypass to prevent overheating during the summer months. In the second fault (F2), this bypass is activated during the heating period, which results in cold supply air temperatures in the ventilation system. In the third fault (F3), the room temperature thermostat of the living room is broken. The result is a changed setpoint temperature of 28 °C.
Table 3 gives a detailed description of the faults and their start and end times.
The data set contains the indoor and outdoor properties shown in
Table 4. Air temperature for each room and total heating power supplied by all electrical radiators are measured indoors. The outdoor properties are the air temperature, relative humidity, diffuse and direct solar irradiation on horizontal surfaces, and wind speed and wind direction.
3. Statistical Tests
The presented statistical tests are implemented with the programming language R [
27], and the user interface RStudio [
28]. The graphics were created with the R package “ggplot2” [
29].
The predictive models for the response, the total heating power, use as predictors the indoor temperatures of all rooms, all outdoor variables listed in
Table 4, and the time of day in hours. The estimated total heating power is the model output, and the values of the total heating power one hour and two hours earlier are added as predictors (features) to the models at each point in time.
The predictive modelling is carried out with the method random forest [
30] and an ARX (autoregressive with exogenous variables) time series model [
31]. To predict the total heating power
in February, the January data are used to train a model. The total heating power in January is predicted with a 4-fold cross-validation. The 4-fold cross-validation divides the January data into four nearly equally sized parts; three of these parts are then used to predict the remaining part in order to evaluate the models. The differences between the observed values (responses) and the predicted values of the total heating power are the residuals. Let $y_t$ be the response variable of the observed total heating power and $\hat{y}_t$ the total heating power predicted by a model at time $t$. Then $r_t := y_t - \hat{y}_t$ denotes the residual at time $t$ and $r := (r_1, \ldots, r_n)$ the vector of the residuals. Furthermore, $n$ denotes the fixed number of time points in the considered data set.
Figure 2 shows the January and the February residuals for the developed random forest and the ARX model.
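For illustration, the following R sketch shows how such residuals could be computed with a random forest model; it is not the original implementation of this study. The data frames jan and feb, the response column P_heat, and the already prepared lag features are assumed names for this example. An ARX analogue could be fitted, e.g., with arima() and the exogenous predictors passed via the xreg argument.

```r
## Minimal sketch (not the original implementation of this study):
## residuals of a random forest model for the total heating power.
## `jan` and `feb` are assumed hourly data frames containing the response
## `P_heat`, the indoor/outdoor predictors, the time of day and the
## prepared lag features `P_heat_lag1` and `P_heat_lag2`.
library(randomForest)

set.seed(1)

## February residuals: train on January, predict February
rf_jan  <- randomForest(P_heat ~ ., data = jan)
res_feb <- feb$P_heat - predict(rf_jan, newdata = feb)

## January residuals: 4-fold cross-validation within January
folds   <- cut(seq_len(nrow(jan)), breaks = 4, labels = FALSE)
res_jan <- numeric(nrow(jan))
for (k in 1:4) {
  fit <- randomForest(P_heat ~ ., data = jan[folds != k, ])
  res_jan[folds == k] <- jan$P_heat[folds == k] -
    predict(fit, newdata = jan[folds == k, ])
}
```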
After successful modelling, the typical properties of the residuals are tested statistically. A fault in the data process is assumed if the behaviour of the residuals deviates significantly from the standard properties. The specific properties of the residuals depend on the modelling methodology, the data structure of the learning sample, and the prediction quality of the model.
It is assumed that the residuals have a median of zero, are independent and therefore uncorrelated with each other, and behave randomly.
The Sign Test [
32] and the Wilcoxon Signed-Rank Test [
33] are suitable for testing for median equal to zero. The Turning Point Test [
34] is well suited for testing independence, the Box-Pierce Test and the Ljung-Box Test [
35] for autocorrelation. Randomness can be tested with the Bartels-Rank Test, Cox-Stuart Trend Test, Difference-Sign Test and Mann-Kendall Rank Test [
36]. In total, this study examines the residuals using nine tests divided into four test objectives.
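As an illustration of the four test objectives, the following R sketch collects the p-values of the nine tests for one residual vector. Base R provides the Wilcoxon signed-rank and Box-Pierce/Ljung-Box tests, the sign test can be expressed as an exact binomial test on the signs, and the remaining randomness tests are taken here from the randtests and Kendall packages; this package choice and the lag parameter are assumptions of the sketch, not specifications of the study.

```r
## Sketch: p-values of the nine residual tests for one residual vector r.
## The sign test is expressed as an exact binomial test on the signs; the
## randomness tests are taken from the `randtests` and `Kendall` packages
## (one possible implementation among several).
library(randtests)
library(Kendall)

residual_pvalues <- function(r, lag = 10) {
  c(
    sign          = binom.test(sum(r > 0), sum(r != 0))$p.value,     # median = 0
    wilcoxon      = wilcox.test(r, mu = 0)$p.value,                  # median = 0
    turning_point = turning.point.test(r)$p.value,                   # independence
    box_pierce    = Box.test(r, lag = lag, type = "Box-Pierce")$p.value,
    ljung_box     = Box.test(r, lag = lag, type = "Ljung-Box")$p.value,
    bartels_rank  = bartels.rank.test(r)$p.value,                    # randomness
    cox_stuart    = cox.stuart.test(r)$p.value,
    diff_sign     = difference.sign.test(r)$p.value,
    mann_kendall  = MannKendall(r)$sl                                # two-sided p-value
  )
}
```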
3.1. Moving p-Value
The moving residuals for the shift $s$ with time window length $L$, which represents the sample size of the moving residuals, are defined by
$$r^{(s)} := (r_{s+1}, \ldots, r_{s+L}), \qquad s = 0, \ldots, n - L.$$
The moving residuals are used in order to avoid testing all residuals at once. If $p_{T,L}(s)$ is the p-value of the statistical test $T$ applied to the moving residuals $r^{(s)}$, these p-values are used, for a fixed $L$, to examine periods for faults. If the p-value of a test is less than a previously selected significance level $\alpha$, then the null hypothesis of this test is significantly not met [17], page 5 ff. For all nine tests, the null hypothesis is that a certain property of the residuals is fulfilled. Therefore, for each test, a fault is suspected when the p-value of this test is smaller than $\alpha$.
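A minimal sketch of this moving-window computation, reusing the residual_pvalues() helper from the previous sketch (the window length of 48 h is an assumed example value):

```r
## Sketch: moving p-values p_{T,L}(s) for all shifts s, reusing the
## residual_pvalues() helper. Row s + 1 of the result holds the nine
## p-values for the window r[(s + 1):(s + L)].
moving_pvalues <- function(r, L) {
  shifts <- 0:(length(r) - L)
  t(sapply(shifts, function(s) residual_pvalues(r[(s + 1):(s + L)])))
}

pv_feb <- moving_pvalues(res_feb, L = 48)   # assumed two-day window
```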
3.2. Mean p-Value (MPV)
A disadvantage of the p-values of the moving residuals used so far is that it can be recognized at which shift $s$ the p-value is no longer as expected, but not at which time point. In the following, a newly constructed function determines faulty time points. This is made possible by averaging the p-values from the moving residuals. Let
$$\mathrm{MPV}_{T,L}(t) := \frac{\sum_{s=0}^{n-L} p_{T,L}(s)\,\mathbb{1}\{t \in \{s+1, \ldots, s+L\}\}}{\sum_{s=0}^{n-L} \mathbb{1}\{t \in \{s+1, \ldots, s+L\}\}} \qquad (1)$$
be the mean p-value, in the following called MPV, where $\mathbb{1}\{A\}$ denotes the indicator function with $\mathbb{1}\{A\} = 1$ if $A$ holds and 0 else. The MPV for a given test $T$ and a window length $L$ is, for each time point $t$, the mean of all p-values generated by moving residuals that use $r_t$ in the calculation.
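The MPV defined in (1) could be computed, for example, as in the following sketch, which reuses the moving p-values from the previous example:

```r
## Sketch: mean p-value MPV_{T,L}(t) as in (1) for one test T; pv is the
## vector of moving p-values p_{T,L}(s), s = 0, ..., n - L.
mpv <- function(pv, L, n) {
  sapply(seq_len(n), function(t) {
    s <- 0:(n - L)
    in_window <- t >= s + 1 & t <= s + L      # indicator 1{t in window s}
    mean(pv[in_window])
  })
}

## n x 9 matrix of MPVs for the February residuals (one column per test)
mpv_feb <- apply(pv_feb, 2, mpv, L = 48, n = length(res_feb))
```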
For the fault detection it applies: if $\mathrm{MPV}_{T,L}(t) < \alpha$, then the test $T$ assumes a fault at time point $t$. The fault detection presented here uses a combination of tests, which raises the question of how many tests must assume a fault for a fault to be assumed in total. This value is abbreviated with $H$. The value of $H$ is at least one and at most the total number of tests used in the analysis; in this work, nine tests are used. Thus, an automated fault detection can be defined using the three parameters $\alpha$, $L$ and $H$. Therefore, in the following, each triple $(\alpha, L, H)$ from the set $G$ of all possible triples is simply termed a decision rule. For the application, $G$ is replaced by a large finite subset of $G$. Let
$$f_{(\alpha, L, H)}(t) := \mathbb{1}\big\{\#\{T : \mathrm{MPV}_{T,L}(t) < \alpha\} \ge H\big\} \qquad (2)$$
be the fault function. Since there are no system faults implemented in the January data, the decision rules that erroneously detect faults there are excluded. Accordingly, a decision rule from the following set, referred to as the choice set, is required:
$$C := \{(\alpha, L, H) \in G : f_{(\alpha, L, H)}(t) = 0 \text{ for all January time points } t\}. \qquad (3)$$
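The following sketch illustrates the fault function (2) and the construction of the choice set (3) on a small, assumed parameter grid; the grid values are examples and not the grid used in the study.

```r
## Sketch: fault function (2) of a decision rule (alpha, L, H) and the
## choice set (3). The parameter grid is a small example, not the grid
## used in the study.
fault_function <- function(mpv_mat, alpha, H) {
  rowSums(mpv_mat < alpha) >= H          # TRUE = fault assumed at time t
}

grid <- expand.grid(alpha = c(0.01, 0.05, 0.10),
                    L     = c(24, 48, 72),
                    H     = 1:9)

## keep only the rules that never signal a fault on the January residuals
in_choice_set <- apply(grid, 1, function(g) {
  mpv_jan <- apply(moving_pvalues(res_jan, g["L"]), 2, mpv,
                   L = g["L"], n = length(res_jan))
  !any(fault_function(mpv_jan, g["alpha"], g["H"]))
})
choice_set <- grid[in_choice_set, ]
```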
3.3. Parameter Optimization in the Case of Observed Faults
With the known time points of the faults, a grid search finds an optimized decision rule that minimises a previously defined fault rate. This optimization shows good results on the data of this study. The applicability of a decision rule optimized by this procedure to other data has not been tested so far; a potential problem is that the decision rule adapts too closely to the individual data set. In future work, the procedure should be repeated with other validation data. This technique can only be used when data with known faults are available, which is why other procedures are presented below.
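A possible realisation of this grid search is sketched below; the logical vector fault_true marking the known fault times is an assumption of the example, and the search is restricted here to the choice set constructed above.

```r
## Sketch: grid search when the true fault times are known. `fault_true`
## is an assumed logical vector marking the faulty February time points;
## here the misclassification rate over the choice set is minimised.
misclass <- apply(choice_set, 1, function(g) {
  mpv_f <- apply(moving_pvalues(res_feb, g["L"]), 2, mpv,
                 L = g["L"], n = length(res_feb))
  fault_hat <- fault_function(mpv_f, g["alpha"], g["H"])
  mean(fault_hat != fault_true)
})
best_rule <- choice_set[which.min(misclass), ]
```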
3.4. Parameter Optimization in the Case of Unobserved Faults
A realistic, application-oriented situation is that the system faults are unobserved. This study looks for a method to derive a valid decision rule from the choice set without prior knowledge of the faults. The aim is to find the decision rule from the choice set $C$ that recognizes the faults as well as possible. Note that there are many decision rules in the choice set that never assume a fault. A possible approach to obtain a decision rule with a high statistical power would be to restrict the choice set via adjacency. If a decision rule in the choice set is adjacent to a decision rule that estimates exactly one fault in January, then this decision rule still estimates January correctly but lies close to the boundary of the choice set, and it should therefore be able to classify faults with a high power. An example of such an adjacency can be seen in
Figure 3, which is a simplified representation of two parameters. The figure shows the fault estimations of 25 decision rules.
A disadvantage of this approach is that it is likely to discard many good combinations. Moreover, in future work, alternative measures of adjacency could be investigated.
Another way is to determine the most frequent value of each component of the decision rules in the set $C$, i.e., the mode of each component. This means a search for the values defined in (4)–(6) is needed:
$$\alpha^{*} := \arg\max_{\alpha}\, \#\{(\alpha', L', H') \in C : \alpha' = \alpha\}, \qquad (4)$$
$$L^{*} := \arg\max_{L}\, \#\{(\alpha', L', H') \in C : L' = L\}, \qquad (5)$$
$$H^{*} := \arg\max_{H}\, \#\{(\alpha', L', H') \in C : H' = H\}, \qquad (6)$$
where $\#A$ denotes the number of elements of a set $A$. Finally, it is checked whether $(\alpha^{*}, L^{*}, H^{*}) \in C$; in this case, $(\alpha^{*}, L^{*}, H^{*})$ is the selected decision rule. Instead of the mode, this procedure can also be performed with other measures of location such as the mean or the median. However, the choice set should be further restricted beforehand. This can be done by applying all decision rules to February and removing the decision rules that predict faults at too few or too many time points in the February data.
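A sketch of this componentwise mode selection, based on the choice set from the earlier example:

```r
## Sketch: componentwise mode of the choice set as in (4)-(6), followed by
## the check whether the resulting triple itself belongs to the choice set.
comp_mode <- function(x) as.numeric(names(which.max(table(x))))

cand <- c(alpha = comp_mode(choice_set$alpha),
          L     = comp_mode(choice_set$L),
          H     = comp_mode(choice_set$H))

in_C <- any(choice_set$alpha == cand["alpha"] &
            choice_set$L     == cand["L"] &
            choice_set$H     == cand["H"])
if (in_C) selected_rule <- cand
```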
3.4.1. Rate of Estimated Faults
If no suitable element can be found directly from $C$ with the previous methods, the following approach can be pursued. For this, each element of the choice set must provide a decision at every time point. Each decision rule can take two values per time point: zero if the time point is not classified as a fault and one if it is. For each time point, the number of decision rules that classify a fault is formed by summing the decisions of all elements of $C$. Next, these values are divided by the number of decision rules that classify a fault at least once in February; this way, the decision rules that never classify a fault are ignored. The resulting rate is referred to as the rate of estimated faults. If this rate is greater than a fixed threshold value, a fault is assumed. The following Table 5 serves as a simplified example.
Table 5 considers four time points, and $C$ contains five decision rules. Two of these five decision rules never assume a fault. Therefore, the sum of the dichotomous decision-rule values per time point is divided by three (blue in Table 5). The threshold value chosen here is 1/2, so that faults are classified at time points two and three.
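The rate of estimated faults could be computed as in the following sketch, which applies all decision rules of the choice set to the February residuals; the threshold of 1/2 follows the simplified example in Table 5.

```r
## Sketch: rate of estimated faults over the choice set. Each column of
## `decisions` holds the dichotomous decisions of one rule for February
## (assumes at least one rule signals a fault at least once).
decisions <- apply(choice_set, 1, function(g) {
  mpv_f <- apply(moving_pvalues(res_feb, g["L"]), 2, mpv,
                 L = g["L"], n = length(res_feb))
  fault_function(mpv_f, g["alpha"], g["H"])
})

active <- colSums(decisions) > 0            # rules signalling at least one fault
rate   <- rowSums(decisions[, active, drop = FALSE]) / sum(active)
fault_estimated <- rate > 0.5               # threshold 1/2 as in the example of Table 5
```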
3.4.2. Restricting Areas
The previous methods were based on the MPV, which was used to estimate the exact times at which faults occur. However, if it is sufficient to narrow the faults down to certain periods, the following steps can be taken. First, a significance level $\alpha$ is determined. Then the p-values of all January residuals are calculated for each test, and the tests with a p-value smaller than $\alpha$ are discarded. The remaining tests are applied to the February residuals, and the tests with a p-value greater than $\alpha$ are discarded. The tests that remain reject the null hypothesis in February but not in January. Step by step, the range of the faults is narrowed by dividing the February residuals into a first and a second half and calculating the p-values of both halves. Each half with a p-value smaller than $\alpha$ is further divided, and the process can be repeated. If there are only p-values greater than $\alpha$ in a range, it can be assumed that there is no fault in this range. However, the problem with this procedure is that p-values strongly depend on the sample size. Therefore, the threshold needs to be gradually adapted depending on the sample size.
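One possible implementation of this recursive restriction is sketched below; the choice of the Ljung-Box test, the minimum interval length, and the fixed threshold are assumptions of the example, and a sample-size-dependent threshold could be supplied via alpha_fun.

```r
## Sketch: restricting fault areas by recursive halving. `pfun` returns the
## p-value of one remaining test, `alpha_fun(n)` a (possibly sample-size-
## dependent) threshold; `min_len` limits how far a range is subdivided.
restrict_areas <- function(r, from, to, pfun, alpha_fun, min_len = 48) {
  n <- to - from + 1
  p <- pfun(r[from:to])
  if (p > alpha_fun(n)) return(NULL)        # no fault assumed in this range
  if (n < 2 * min_len) return(data.frame(from = from, to = to))
  mid <- from + n %/% 2 - 1
  res <- rbind(restrict_areas(r, from, mid, pfun, alpha_fun, min_len),
               restrict_areas(r, mid + 1, to, pfun, alpha_fun, min_len))
  if (is.null(res)) data.frame(from = from, to = to) else res
}

## example call with the Ljung-Box test and a fixed threshold
suspect <- restrict_areas(
  res_feb, 1, length(res_feb),
  pfun      = function(x) Box.test(x, lag = 10, type = "Ljung-Box")$p.value,
  alpha_fun = function(n) 0.05
)
```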
5. Conclusions
This study uses residual analysis for the fault detection of HVAC systems in buildings. A detailed simulation of a residential case study house provided the data for the analysis. The predictive modelling of the total heating power was carried out with the random forest method and an ARX (autoregressive with exogenous variables) time series model. The fault detection was carried out by a residual analysis; the residuals were calculated directly from the predictions and the observed responses. A combination of statistical tests explored the white noise properties of the residuals, while a data-driven decision rule that combines multiple tests predicted the faults. The methods for fault detection developed in this study could replace a graphical, user-subjective evaluation of a residual plot with an automatic, data-based approach.
The fault detection method based on residual analysis has several advantages: the method does not depend on the prediction model applied, and information such as model parameter estimates and specific model structures becomes superfluous. The research has shown that the fault detection methods can be applied with different types of prediction models, such as time series models and machine learning procedures (including black-box methods). Statistical tests can be added or removed depending on the suspected residual properties.
This study introduces two different methods of residual analysis: one that finds a decision rule by grid search when faults are observed, and another that uses the rate of estimated faults when faults are unobserved. Better results are achieved in the case of observed faults than in the case of unobserved faults. Both fault detection methods depend heavily on p-values, which in turn depend on the sample size, and the methods had to be adjusted to the specific situation and the given sample size.
The results show that the method for the case of unobserved faults should be pursued further in the future, since in practice the faults will not be observed. The evaluation of the case of unobserved faults revealed the following difficulty: the threshold value is difficult to find. The threshold value determines how accurately a fault is predicted and acts as a trade-off between the specificity and sensitivity of the classification rule. Fault detection based on the rate of estimated faults can be used without prior knowledge of the faults. However, this procedure also finds faults when none are present, see Figure 12. In this case, the advantage of the method, namely ignoring the decision rules that never detect faults, becomes a disadvantage. By adjusting the threshold for the decision on faults, the ratio between the decision rules that identify at least one fault and all decision rules in the choice set could be improved. At the end of the test month (approx. 600 h), the FD method detects a fault where there is none. We interpret this as the seasonal temperature fluctuations not being sufficiently represented in the training data set, so that the temperature increase during this period (winter-spring transition) was detected as a fault.
The method invites application to data measured on-site. A real building evaluation could be performed using a learning data set based on a building simulation (with simulated, observed faults) of the same building. Consequently, it is essential to test the applicability of a decision rule derived from simulated building data to the behaviour of the original building. The accuracy of the building simulation would then be a significant factor in the successful application of the method.
6. Outlook
In the next stage of this study, we plan, on the one hand, to apply more statistical methods for fault detection and, on the other hand, to use an extended data set that better represents seasonal fluctuations. We also plan to increase the number of technical faults in the ventilation system and in the heating and control system, and to include a hydraulic problem. In a third phase of our study, we plan to apply the developed fault detection models, based on the simulated annual data set, to data measured in situ.
The following methods could be suitable for finding an optimal algorithm to determine the best decision technique: defining upper and lower bounds for residuals, using the coefficient of determination, and the correlation comparison of the sensors.
A first new approach is to derive an upper and a lower bound from the residuals in the fault-free state. A fault might then exist if several residuals in a row lie outside these bounds.
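A minimal sketch of this bound-based idea, using empirical quantiles of the fault-free January residuals (the coverage level and the run length are assumed example values):

```r
## Sketch: empirical bounds from the fault-free January residuals; a fault
## is suspected when several residuals in a row fall outside the bounds.
bounds  <- quantile(res_jan, c(0.005, 0.995))   # assumed coverage level
outside <- res_feb < bounds[1] | res_feb > bounds[2]

run_len <- 5                                    # assumed run length
runs    <- stats::filter(as.numeric(outside), rep(1, run_len), sides = 1)
suspect_bounds <- !is.na(runs) & runs == run_len
```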
Mattera et al. [
11] present a second modelling approach which we plan to investigate as well. The coefficient of determination ($R^2$) between the predicted and the observed values per day is used to predict faults. In linear regression, a coefficient of determination of one indicates a perfect linear relationship, while a coefficient of determination of zero indicates no linear relationship. Mattera et al. assume a fault if $R^2$ is low. Instead of the coefficient of determination, other measures such as the mean squared error would also be possible indicators of the presence of faults. Mattera et al. also suggest using each sensor as a response to determine the specific cause of a fault. In future studies, statistical methods to estimate the cause of a fault in an automated and data-driven way can be considered.
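A simple sketch of such a daily coefficient of determination, reusing the February predictions of the random forest sketch above (the daily grouping of hourly data and the threshold for a "low" value are assumptions of the example):

```r
## Sketch: daily coefficient of determination between observed and predicted
## total heating power, reusing the February predictions from the random
## forest sketch; the squared Pearson correlation is used as R^2.
pred_feb <- predict(rf_jan, newdata = feb)
day      <- rep(seq_len(ceiling(nrow(feb) / 24)), each = 24)[seq_len(nrow(feb))]
r2_day   <- sapply(split(data.frame(obs = feb$P_heat, fit = pred_feb), day),
                   function(d) cor(d$obs, d$fit)^2)
fault_day_r2 <- r2_day < 0.5                 # assumed threshold for a "low" R^2
```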
In addition to a residual analysis based on a predictive model of one response, other approaches are also conceivable. An idea for such an approach is an overall correlation comparison, which is conducted by calculating the correlation (with respect to linear or monotone relationships) between all sensors per day. Then, with the help of the fault-free state, an upper and a lower bound are defined for each sensor pair. If a certain number of correlations lies outside these bounds, a fault can be assumed. The advantage of this method is that it would allow us to determine, without much additional effort, which sensors are suitable to detect the fault, by taking into consideration not only the number of correlations outside the bounds but also which pairs are outside the bounds.
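The following sketch outlines such a daily correlation comparison; the data frames sensors_jan and sensors_feb of hourly sensor readings with a day column, and the threshold on the number of out-of-bound pairs, are assumptions of the example.

```r
## Sketch: daily correlation comparison of all sensor pairs. `sensors_jan`
## and `sensors_feb` are assumed data frames of hourly sensor readings with
## a `day` column; the bounds come from the fault-free month.
daily_cor <- function(dat) {
  sapply(split(dat[, setdiff(names(dat), "day")], dat$day),
         function(d) { C <- cor(d); C[lower.tri(C)] })
}

cj <- daily_cor(sensors_jan)                # sensor pairs x days (January)
cf <- daily_cor(sensors_feb)                # sensor pairs x days (February)

lo_b <- apply(cj, 1, min)                   # lower bound per sensor pair
hi_b <- apply(cj, 1, max)                   # upper bound per sensor pair

n_outside <- colSums(cf < lo_b | cf > hi_b) # pairs outside the bounds per day
fault_day <- n_outside > 10                 # assumed threshold
```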
Creating a whole network of prediction models for real sensors as the response variables could be a useful extension of an overall correlation comparison. The predicted values as virtual sensors are checked against the observed values of the real sensors. The basis for the development of decision rules for fault detection could then be the deviation between the behaviour of the whole prediction network on faultless data and the behaviour of the network on a data set with faults.