*3.3. Validation and Application*

The validation of the prediction model is illustrated in Figure 13 as a comparison between the predicted and measured data from the validation dataset. The predicted power consumption, including the prediction interval, is shown as the gray shaded area, and the measured power consumption as the black line. The numbered red areas mark the identified periods of operational disruption; they comprise 14 of the 85 datapoints in the validation dataset. The operational disruptions have been identified as (A) uncontrolled water refill, (B,C) issues with the control system for the water temperature, (D) issues with controlling the indoor environment and the water refill system, leading to a subsequent lockdown of the facility, and (E) issues related to the control of the air handling unit and the air flow supply. The prediction model identifies all of the disruptions, as illustrated. When the facility operates without faults, it performs within the operational baseline provided by the prediction model. Each of the operational disruptions is identified as a major deviation from the baseline.
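The comparison against the baseline can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: it assumes that a datapoint counts as a deviation when the measured power consumption falls outside the prediction interval, and the names `Prediction`, `flag_deviations` and `group_periods`, as well as all numerical values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    lower: float  # lower bound of the prediction interval
    upper: float  # upper bound of the prediction interval


def flag_deviations(measured, predictions):
    """Return True for each datapoint whose measurement lies outside the interval."""
    return [not (p.lower <= m <= p.upper) for m, p in zip(measured, predictions)]


def group_periods(flags):
    """Merge consecutive flagged datapoints into (start, end) index pairs, inclusive."""
    periods = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i  # a new deviation period begins
        elif not flagged and start is not None:
            periods.append((start, i - 1))  # the period ended at the previous point
            start = None
    if start is not None:
        periods.append((start, len(flags) - 1))  # period runs to the end of the data
    return periods


# Hypothetical measurements and a constant (assumed) prediction interval.
measured = [50.0, 52.0, 71.0, 73.0, 51.0, 30.0]
predictions = [Prediction(lower=45.0, upper=55.0)] * len(measured)

flags = flag_deviations(measured, predictions)
periods = group_periods(flags)
```

Grouping consecutive flagged datapoints mirrors the way the disruptions are presented as periods (A to E) rather than as isolated points.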

When the data associated with operational disruptions are excluded (14 datapoints in total, approximately 16% of the validation dataset), the predicted operation fits the actual performance well. Figure 14 illustrates the correlation between the predicted and measured power consumption, exclusive of the operational disruptions. The Pearson correlation coefficient is 0.85. There are, however, periods in which the model appears to consistently over- or underpredict the performance, which may be due to a lack of explanatory variables in the model. These deviations nevertheless remain within the prediction interval, which is consistent with no operational disruption being detected for the relevant periods. Figures 15 and 16 present the range of the independent variables used in the prediction model. Even though the training dataset was initially reduced significantly, to only three months of data (29 datapoints), the dispersion of the variables within this dataset corresponds to that of the validation dataset.
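As an illustration of this step, the Pearson correlation coefficient can be computed after removing the datapoints marked as disruptions. The sketch below uses only the standard library; the helper names `pearson_r` and `exclude_disruptions` and the example values are hypothetical and do not reproduce the data behind Figure 14. Only the counts of 14 and 85 datapoints are taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def exclude_disruptions(values, disrupted):
    """Keep only the values whose disruption flag is False."""
    return [v for v, d in zip(values, disrupted) if not d]


# Share of excluded datapoints reported in the text: 14 of 85.
excluded_share = 14 / 85

# Hypothetical predicted and measured values; the third point is a disruption.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.1, 2.0, 9.0, 4.2, 4.9]
disrupted = [False, False, True, False, False]

r_all = pearson_r(predicted, measured)
r_clean = pearson_r(
    exclude_disruptions(predicted, disrupted),
    exclude_disruptions(measured, disrupted),
)
```

In this toy example, a single disrupted datapoint lowers the correlation considerably, which is why the disruptions are excluded before the fit is assessed.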

**Figure 13.** Visual validation of the prediction model from September 2018 to June 2019. The prediction model includes the prediction interval in gray, measured power consumption in black and periods associated with operational disruptions in red (see Appendix A for higher resolution).

With regard to applying the presented method in industry, the combination of a short-term training dataset and few predictors makes the method especially useful: a facility can develop a model over a short period of time with a minimum of sensors. However, the transferability of the choice of independent variables must be investigated further in order to obtain a universal method for industry.

**Figure 14.** The predicted power consumption plotted against the measured power consumption for the validation dataset. The Pearson correlation coefficient is given as the R-coefficient.

**Figure 15.** The dispersion of the independent variables in the prediction model, for each dataset used in the analysis.

**Figure 16.** The dispersion of the independent variables in the prediction model, for each dataset used in the analysis.
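The comparison of variable ranges shown in Figures 15 and 16 can also be expressed numerically. The following sketch assumes that a simple indicator is sufficient, namely the share of validation values that lie within the minimum and maximum of the training values. The function `range_coverage`, the variable names `x1` and `x2`, and all values are hypothetical.

```python
def range_coverage(train_values, validation_values):
    """Share of validation values inside the [min, max] range of the training values."""
    low, high = min(train_values), max(train_values)
    inside = sum(1 for v in validation_values if low <= v <= high)
    return inside / len(validation_values)


# Hypothetical values for two independent variables in each dataset.
training = {"x1": [2.0, 5.0, 9.0], "x2": [10.0, 20.0, 30.0]}
validation = {"x1": [3.0, 4.0, 8.5, 9.5], "x2": [12.0, 18.0, 25.0, 29.0]}

coverage = {
    name: range_coverage(training[name], validation[name]) for name in training
}
```

A coverage close to 1 for every variable indicates that the model is mainly applied within the range it was trained on, whereas low values point to extrapolation.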
