#### *1.1. Automated Fault Detection and Diagnosis Methods for HVAC Systems*

The limitations of reactive and preventive maintenance approaches underline how essential "predicting" the faults of HVAC units could be. This task can be performed by means of so-called Automated Fault Detection and Diagnosis (AFDD), an automated process that detects faults and diagnoses the type of problem and/or its location [5,7,9]. AFDD can exploit the potential of building energy management systems in quasi-real time by comparing expected behavior with actual performance over a predefined period. AFDD technologies can provide numerous benefits, such as improved operational efficiency, energy savings, reduced utility costs, and reduced equipment downtime [5,7,9]. Although currently underutilized, AFDD products represent one of the most active research areas as well as one of the fastest-growing market segments among building analytics technologies [10]. The methodologies adopted for carrying out AFDD analyses can be categorized as (i) data-driven, (ii) quantitative model-based, and (iii) qualitative model-based [5]. The first category requires pre-labeled operational data acquired from the system under investigation in order to develop AFDD models; data-driven AFDD approaches have achieved promising results because they remain applicable even when simulation models are difficult to develop [5,10]. The quantitative model-based approach relies on simulation models that physically describe the system at different levels of detail. Finally, qualitative models are based on knowledge of the system derived from domain expertise.
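To make the qualitative, expertise-based category more concrete, the following minimal sketch encodes two expert rules for an AHU as simple checks. The signal names, the `AhuSnapshot` structure, and the thresholds (`valve_tol`, `min_airflow`) are hypothetical illustrations, not rules taken from the cited works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AhuSnapshot:
    """One sample of (hypothetical) AHU signals."""
    heating_valve: float   # heating coil valve position, 0 = closed, 1 = fully open
    cooling_valve: float   # cooling coil valve position, 0 = closed, 1 = fully open
    fan_command: bool      # True if the supply fan is commanded on
    supply_airflow: float  # measured supply airflow rate, m3/h


def qualitative_rules(s: AhuSnapshot, valve_tol: float = 0.05,
                      min_airflow: float = 100.0) -> List[str]:
    """Return the names of the expert rules violated by a single snapshot."""
    alarms = []
    # Rule 1: heating and cooling coil valves should not be open simultaneously.
    if s.heating_valve > valve_tol and s.cooling_valve > valve_tol:
        alarms.append("simultaneous heating and cooling")
    # Rule 2: a fan commanded on should deliver at least a minimum airflow.
    if s.fan_command and s.supply_airflow < min_airflow:
        alarms.append("fan on but airflow too low")
    return alarms


print(qualitative_rules(AhuSnapshot(0.4, 0.3, True, 50.0)))
# ['simultaneous heating and cooling', 'fan on but airflow too low']
```

Rules of this kind need no training data, which makes them attractive when labeled data are scarce; their weakness is that the thresholds must be tuned by hand for each system.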

Nowadays, buildings are equipped with numerous sensors used for energy management. In addition, innovative devices make it possible to connect occupancy sensors, power meters, and appliances that collect data, from which information can be derived in order to take data-driven actions. In this context, the integration of artificial intelligence technologies (which have advanced rapidly in recent years), including both unsupervised and supervised algorithms [5,11], is particularly promising because they could improve self-diagnosis capabilities and optimize energy management systems. In particular, an Artificial Neural Network (ANN) is a kind of artificial intelligence that mimics the operation of the human brain; it can learn from training data and replicate the trends of time series, approximating nonlinear relationships between the inputs and outputs of advanced energy systems without requiring explicit mathematical representations [11]. Compared with the other methods, the data-driven approach integrating artificial intelligence [5,12,13] allows (i) higher accuracy of fault detection and diagnosis; (ii) learning patterns from field data without physical models or a priori knowledge of the connections between faults and the associated symptoms; and (iii) performing AFDD analyses with a restricted number of variables and, therefore, a limited number of sensors. In more detail, supervised approaches use domain knowledge to develop a prediction tool, while unsupervised methods extract hidden knowledge without a predefined goal [5,12,13]. Supervised models mainly rely on residual analyses to perform the AFDD process [5,14,15], where a residual is the difference between the predicted and the measured values of a specific parameter.
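The residual analysis described above can be sketched as follows, assuming a model trained only on fault-free data. For brevity, a single-input least-squares line stands in for the ANN or other supervised predictor, and the alarm threshold of `k` standard deviations of the training residuals (`k = 3` here) is an assumed convention; the function names are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x (single input, for illustration)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def make_residual_detector(x_train, y_train, k=3.0):
    """Train on fault-free data; return a function that flags anomalous residuals."""
    a, b = fit_line(x_train, y_train)
    # Residual = predicted value - measured value, as defined in the text.
    train_res = [(a + b * xi) - yi for xi, yi in zip(x_train, y_train)]
    threshold = k * statistics.pstdev(train_res)

    def detect(x_new, y_new):
        res = [(a + b * xi) - yi for xi, yi in zip(x_new, y_new)]
        return [abs(r) > threshold for r in res], res

    return detect


# Synthetic fault-free training data: model input x and monitored parameter y.
x_train = [10, 12, 14, 16, 18, 20]
y_train = [7.1, 7.9, 9.05, 9.95, 11.1, 11.9]
detect = make_residual_detector(x_train, y_train)
flags, residuals = detect([15, 15], [9.5, 12.0])
print(flags)  # [False, True]
```

The second new sample deviates from the fault-free model by 2.5 units, far beyond the spread seen in training, so only it is flagged.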

Several studies focusing on supervised techniques for the AFDD of HVAC systems are reported in the scientific literature. Piscitelli et al. [5] proposed an innovative AFDD method based on both unsupervised and supervised data-driven approaches, using the operational data of an AHU recorded during steady-state and transient periods. Dehestani et al. [16] proposed a methodology based on a multi-class support vector machine to identify faults related to the air dampers and fans of AHUs. A Bayesian network was used in [17,18] to diagnose faults associated with air dampers, return fan failure, and the cooling coil valve; the network took as inputs the residuals derived from a set of statistical models and checking rules. Mulumba et al. [19] proposed a method to predict the occurrence of faults related to the return air fan, air dampers, and cooling coil valve by means of a support vector machine combined with an autoregressive model. Yan et al. [20] presented a combination of two supervised methods to detect blockage of coil valves and air dampers, return air fan failure, and duct leakage; a classification tree was developed using as inputs both field data and residuals derived from a regression model, while the labels of the different faults were taken as outputs; the method described in [20] can support AFDD analyses without having to consider the transient operation of HVAC systems. McHugh et al. [21] compared several classification models for AFDD, and the classification tree model was identified as the best option for detecting chilled water or steam leakage.
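In the spirit of the classification-tree approach of Yan et al. [20], in which field data and model residuals are inputs and fault labels are outputs, the sketch below grows a small CART-style tree using Gini impurity. It is a generic illustration, not the implementation of [20]: the two features (a supply air temperature residual and a supply fan power residual), the fault labels, and the training samples are synthetic and hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that most reduces weighted Gini impurity."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best = score, (f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Recursively grow a tree; leaves are labels, nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Synthetic features: [supply air temperature residual (K), supply fan power residual (kW)].
X_train = [[0.1, 0.0], [-0.2, 0.05], [0.0, -0.02],
           [3.0, 0.0], [2.5, 0.1], [3.5, -0.05],
           [0.1, -1.2], [-0.1, -1.0], [0.2, -1.5]]
y_train = ["normal"] * 3 + ["heating_valve"] * 3 + ["fan"] * 3
tree = build_tree(X_train, y_train)
print(predict(tree, [2.8, 0.0]))  # heating_valve
```

A depth limit keeps the tree readable, which is one reason classification trees are popular for AFDD: each path from root to leaf can be read as a diagnostic rule.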

#### *1.2. Novelty and Structure of the Paper*

The literature review performed in the previous subsection demonstrates that the scientific community is actively engaged in research on artificial intelligence-based AFDD for HVAC units. According to the authors of [1,7,22], even though AFDD is an effective approach to guarantee the efficient operation of HVAC systems and the associated technology is growing, it is still at an early stage of adoption. Additional investigations are therefore still needed to address several research gaps.

First, the sensor architecture of HVAC units is usually not designed with AFDD in mind; therefore, some important variables are generally not measured, resulting in a lack of labeled data. Moreover, measurements under faulty conditions are even more difficult to obtain, because faults occur infrequently and because deliberately introducing faults into complex and expensive devices to collect data is inconvenient [23]. In addition, relatively few studies give detailed information on how faults are empirically introduced into an existing HVAC system, and almost all works consider only one HVAC operating mode under different weather scenarios [5,24]. Lin et al. [7] highlighted the need for standard datasets to assess the accuracy of AFDD methods and suggested that future AFDD studies should focus on expanding databases and making them publicly available. Granderson et al. [25] also underlined that it is unusual to find datasets with labels clearly indicating whether the data represent faulty, healthy, or simply unusual operating states. Finally, Casillas et al. [26] indicated that one of the most important challenges for research on AFDD methods is the lack of shared databases for benchmarking the performance of algorithms, which is needed to assess improvements and prioritize future investments in these methods. In this regard, it should be highlighted that most AFDD studies are based on the ASHRAE RP-1312 dataset [24] (dated 2011), consisting of measurements recorded every minute from an experimental setup comprising two AHUs; recently, Piscitelli et al. [5], Yun et al. [27], and Fan et al. [28] proposed novel methodologies for performing AFDD analyses of AHUs based on the ASHRAE RP-1312 dataset [24]. Therefore, as also suggested by Hu et al. [23], additional research is required to obtain more experimental data under both normal and faulty operation, covering a number of different faults under varying boundary conditions.

One more research gap associated with the application of AFDD analyses is that few studies quantitatively examine how various faults and fault severities affect energy consumption, user comfort, maintenance cost, and equipment life cycle [6]. This is a demanding task, given that (i) several faults can have comparable symptoms and (ii) faults of AHUs can interact with each other, which makes isolating multiple faults challenging [1,24]. According to the authors of [7], additional works that better characterize the impact of faults on the basis of field measurements could prove valuable in guiding future developments and implementations of AFDD techniques. Piscitelli et al. [5] also indicated that the majority of AFDD applications are used to detect and/or diagnose faults of HVAC units during steady-state operation; therefore, they cannot be used effectively during transient periods, because they are not fully able to determine the system operation mode automatically and prevent false alarms. In this context, accurate simulation models of HVAC units can provide significant benefits for AFDD analyses, since they can help quantify the impact of faults on both energy demand and occupant comfort and thus support corrective actions that facilitate more reliable commissioning decisions, more efficient system operation, improved indoor conditions, and prolonged equipment service life [29]. However, according to the authors of [30,31], most existing simulation models of HVAC systems assume normal/healthy conditions without any operational faults and do not capture the significant impact of faults on energy consumption and indoor comfort conditions. In addition, Zhang and Hong [31] highlighted that the modeling of HVAC systems operating under faulty conditions is still insufficient, mostly because many fault-related studies focus on the operation of single subcomponents rather than on whole-system performance and, consequently, cannot predict the overall impact of faults.
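One simple way to reduce the false alarms mentioned above is to gate alarms with a steady-state detector. The sketch below, which is an assumption and not the method of [5], marks a sample as steady when the standard deviation over a trailing window stays below a threshold; the function names, the window length, the threshold, and the temperature series are hypothetical.

```python
import statistics


def steady_state_mask(signal, window=5, max_std=0.3):
    """Flag sample i as steady if the trailing window ending at i has a small spread."""
    mask = []
    for i in range(len(signal)):
        if i + 1 < window:
            mask.append(False)  # not enough history yet: treat as transient
            continue
        segment = signal[i + 1 - window:i + 1]
        mask.append(statistics.pstdev(segment) <= max_std)
    return mask


def gate_alarms(raw_alarms, mask):
    """Keep only the alarms raised while the system is judged to be in steady state."""
    return [alarm and steady for alarm, steady in zip(raw_alarms, mask)]


# Hypothetical supply air temperature (degC) during a start-up transient.
supply_temp = [14.0, 16.0, 18.0, 19.5, 20.0, 20.1, 20.0, 20.2, 20.1, 20.0]
raw = [True] * len(supply_temp)  # e.g. a residual detector firing on every sample
print(gate_alarms(raw, steady_state_mask(supply_temp)))
```

Alarms raised during the warm-up ramp are suppressed, and only those in the last three samples, once the temperature has settled, survive. The price of this simple design is a detection delay of roughly one window length after each transient.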

An additional knowledge gap is that models of HVAC units developed for AFDD purposes should be fully validated through extensive comparisons with experimental data under both faulty and normal conditions, as well as under different boundary scenarios. However, comparisons against field measurements are usually not performed for validation purposes, mainly because, as mentioned above, accurate experimental datasets covering a wide range of operating conditions and including faulty data are not generally available. For example, Zhang and Hong [31] introduced a methodology for modeling operational faults of HVAC units using a comprehensive whole-building performance simulation program; the impacts of faults on a small office building were investigated in [31], but no validation against experimental data was carried out. Similarly, Basarkar et al. [30] assessed the effects of four typical faults on the HVAC unit serving a commercial reference building by means of a simulation program, but no comparisons between predictions and field measurements were reported to check the accuracy of the models.
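Validation of this kind is usually summarized with error indices computed between measured and predicted series. The following sketch implements the mean bias error (MBE), the root mean square error (RMSE), and one common definition of the coefficient of variation of the RMSE, CV(RMSE), normalized by the mean of the measurements (other definitions use an n - 1 or n - p denominator), and uses CV(RMSE) to rank candidate models. The function names, candidate names, and data are hypothetical.

```python
import math


def mbe(measured, predicted):
    """Mean bias error in the units of the data (positive = model over-predicts)."""
    return sum(p - m for m, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean square error in the units of the data."""
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def cv_rmse(measured, predicted):
    """RMSE normalized by the mean of the measurements, in percent."""
    return 100.0 * rmse(measured, predicted) / (sum(measured) / len(measured))


def select_best(measured, candidates):
    """Return the name of the candidate model with the lowest CV(RMSE)."""
    return min(candidates, key=lambda name: cv_rmse(measured, candidates[name]))


measured = [20.0, 21.0, 22.0, 23.0]
candidates = {
    "ann_a": [20.5, 21.5, 22.5, 23.5],  # constant bias of +0.5
    "ann_b": [20.1, 20.9, 22.1, 22.9],  # small, unbiased scatter
}
print(select_best(measured, candidates))  # ann_b
```

Reporting MBE alongside CV(RMSE) is useful because the two capture different failure modes: a model can have low scatter yet a systematic offset, or zero bias yet large point-by-point errors.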

In this paper, the operation of the HVAC system serving the integrated test room of the SENS i-Lab of the Department of Architecture and Industrial Design of the University of Campania Luigi Vanvitelli (located in Aversa, southern Italy) has been experimentally characterized through a series of tests performed during both summer and winter under both normal and faulty operating conditions (transient and non-transient). In particular, five different typical faults (affecting the supply/return air fans, the valve supplying the heating coil, the valve supplying the cooling coil, and the valve supplying the steam humidifier) have been artificially implemented in the HVAC system and analyzed during transient and steady-state operation. An optimal artificial neural network-based system model has been identified and verified by comparing the experimental data with the predictions of twenty-two different neural network architectures developed in the MATLAB environment [32]; the selected artificial neural network has been coupled with a dynamic simulation model developed using the TRaNsient SYStems (TRNSYS) software platform (version 17) [33]. The effects of the selected faults on occupant indoor comfort, the temporal trends of key system operating parameters, and electric energy consumption have been assessed.
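As a hedged illustration of how the energy impact of a fault can be quantified, the sketch below integrates evenly sampled electric power with the trapezoidal rule and expresses the faulty-case consumption as a percentage deviation from a fault-free baseline. It assumes that both series cover the same period under comparable boundary conditions; the function names and numbers are hypothetical, and this is not necessarily the procedure used in Section 5.

```python
def energy_kwh(power_kw, dt_s):
    """Trapezoidal integration of evenly sampled power (kW, step dt_s seconds) into kWh."""
    if len(power_kw) < 2:
        return 0.0
    total_kw_s = sum((a + b) / 2.0 * dt_s for a, b in zip(power_kw, power_kw[1:]))
    return total_kw_s / 3600.0


def fault_impact_percent(faulty_kw, baseline_kw, dt_s):
    """Percentage change in electric energy of the faulty run relative to the fault-free run."""
    e_faulty = energy_kwh(faulty_kw, dt_s)
    e_baseline = energy_kwh(baseline_kw, dt_s)
    return 100.0 * (e_faulty - e_baseline) / e_baseline


# Hypothetical one-hour runs sampled every 10 minutes (7 samples = 6 intervals).
baseline = [1.0] * 7  # kW, fault-free
faulty = [1.2] * 7    # kW, same period with a fault
print(round(fault_impact_percent(faulty, baseline, 600), 1))  # 20.0
```

Integrating energy rather than averaging power keeps the comparison valid if the two runs share the sampling step but the power profile varies in time.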

This paper addresses several research gaps highlighted by the literature review on AFDD applications to HVAC systems. In fact, the dataset described in this article includes fault-free and faulty operational data of a typical HVAC unit, coupled with ground-truth information indicating the absence or presence of faults. In addition, this dataset covers a wide range of operating scenarios (both transient and steady-state) and weather conditions, and it encompasses five typical fault types. Moreover, a whole-system simulation model combining the MATLAB and TRNSYS environments has been created and extensively validated by comparing predicted data with measurements; it has then been used to identify a number of patterns related to faulty system operation and to assess the impacts of the selected typical faults. Both the labeled measured data and the developed simulation models will be made available in a public data repository, allowing readers and organizations to access, consult, and use them for institutional and research purposes.
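A labeled dataset of this type can be represented with one record per time step carrying the ground-truth fault tag. The sketch below assumes a hypothetical CSV layout (the column names, the `"none"` tag for fault-free samples, and fault tags such as `"cooling_coil_valve"` are illustrative and do not describe the actual repository format) and shows how healthy and faulty samples can be separated.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledSample:
    """One time step of a hypothetical labeled AFDD dataset."""
    timestamp: str
    season: str             # e.g. "summer" or "winter"
    regime: str             # "transient" or "steady"
    fault: str              # "none" for fault-free operation, otherwise a fault tag
    supply_air_temp: float  # degC

    @property
    def is_faulty(self) -> bool:
        return self.fault != "none"


def load_samples(text: str) -> List[LabeledSample]:
    """Parse CSV text with a header row into LabeledSample records."""
    reader = csv.DictReader(io.StringIO(text))
    return [LabeledSample(r["timestamp"], r["season"], r["regime"], r["fault"],
                          float(r["supply_air_temp"])) for r in reader]


# Synthetic rows, for illustration only.
data = """timestamp,season,regime,fault,supply_air_temp
00:00,summer,steady,none,16.2
00:01,summer,transient,cooling_coil_valve,19.8
00:02,winter,steady,none,28.5
"""
samples = load_samples(data)
faulty = [s for s in samples if s.is_faulty]
print(len(samples), len(faulty))  # 3 1
```

Keeping the operating regime as an explicit column lets users filter steady-state and transient data separately, which matters for the false-alarm issue discussed above.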

The paper consists of six main sections. Section 2 details the experimental setup. Section 3 describes the investigated faults as well as the experimental results of both fault-free and faulty tests. A detailed outline of the simulation model is reported in Section 4. The impact of the faults is assessed and discussed in Section 5. Finally, the conclusions and future research steps are presented in Section 6.
