2.4.1. Cleaning the Dataset

The original training dataset had a resolution of 1 min time steps for all variables. The dataset was cleaned and preprocessed by manually detecting and analyzing outliers caused by broken sensors, miscoded values, operation disruptions (e.g., unintended operation due to mechanical flaws, software errors, or operator mistakes), etc. Outlier detection can also be carried out statistically, for example, by using approaches based on the standard deviation or the interquartile range [47]. Both techniques identify outliers by comparing each value/measurement to its population. Because this study is concerned with fault detection, outliers are of special interest. For the training dataset, operation disruptions were identified and excluded prior to regression analysis, whereas operation disruptions were included in the validation process.
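The two statistical approaches mentioned above can be sketched as follows. This is a minimal illustration and not the procedure used in this study, where outliers were identified manually. The function names, the thresholds (k = 2 for the standard deviation rule, k = 1.5 for the interquartile range rule), and the temperature readings are assumptions chosen for the example.

```python
import statistics


def zscore_outliers(values, k=3.0):
    """Return indices of values more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical 1 min supply air temperature readings in deg C (not study data).
# The value at index 6 imitates a broken sensor or an operation disruption.
supply_temp = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 19.8, 20.2]

print(zscore_outliers(supply_temp, k=2.0))  # [6]
print(iqr_outliers(supply_temp))            # [6]
```

A threshold of k = 2 is used for the standard deviation rule here because a single extreme value inflates the standard deviation of a small sample; with only ten readings, no value can lie more than three population standard deviations from the mean. The interquartile range rule is less sensitive to this effect because the quartiles are barely affected by one extreme value.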

Operation disruptions were identified and categorized through an in-depth investigation of the historical data stored in the BAS and in the dedicated control systems of the air handling unit and heat recovery system.
