**2. Related Work**

The recent literature presents several data analysis and computational modelling techniques aimed at developing early-warning systems for cyber-attack detection in water systems. For example, in [9] a classification algorithm based on Support Vector Machines is developed for identifying cyber-attacks in water systems. The authors propose a simple one-class classification approach based on a truncated Mahalanobis distance, and test the algorithm on a real dataset from a water distribution system in France. Hidden Markov chains are used in [10] for analyzing and detecting anomalies in the SCADA system of a water supply system. Normal behavior was first modelled and then modified with generated abnormal data to simulate the detection of potential attacks. Water treatment plants, and not only water distribution systems, have also been used for investigating cyber-attacks. In [11], attacks on Programmable Logic Controllers (PLCs) are designed to better understand their impact on the produced water.

In BATADAL, seven solutions from research groups all over the world were presented and ranked according to time to detection and classification accuracy of the events. As the approach in this paper competes directly with those seven solutions, we concisely describe their methodologies to make its novelty clear. Those contributions, together with several papers derived from the event (which we also mention later), can be considered the state-of-the-art literature on the subject, which can be complemented with [5].

A two-stage method based on feature vector extraction and classification was proposed in [12]: vector extraction was applied to multidimensional hydraulic data, and safety classification was performed by random forests, the machine-learning algorithm developed by [13]. In [14] recurrent neural networks (RNNs) were used for hydraulic state estimation of the network's district metered areas and, based on the RNN output, statistical process control was applied to detect abrupt changes in the residual time series.

The authors in [15] first use operational variables to check whether physical and/or operating rules have been violated; the resulting set of flagged events then feeds a deep learning method based on a convolutional variational auto-encoder, which calculates the probability that the measured data are anomalous.

Two detection methods were also proposed in [16]: the first evaluates the consistency of the SCADA data and verifies the relation between actuator rules (e.g., pump/valve operation) and the measured data; the second uses principal component analysis (PCA) to separate the hydraulic time series into normal and abnormal data.

A three-stage detection method was presented in [17]: the first step detects outliers in the data, focusing on single sensor analysis; the second stage employs a multilayer perceptron to detect SCADA data nonconformity to normal operation; and the third stage finds anomalies affecting multiple sensors.

Another three-module method was presented in [18]: the first module evaluates the consistency of the data against the set of control rules; the second applies statistical analysis to identify anomalous behaviors; then, the anomalies are confirmed by the third module, which finds correlations between hydraulic variables.

Finally, a model-based approach using EPANET for hydraulic simulations was developed in [19]; analyses of the residual time series between simulated data and data measured by the SCADA system detected the anomalies, and a multilevel classification algorithm was implemented to classify the residual time series into normal and abnormal events.

BATADAL opened a fruitful discussion among research groups around the world, and new approaches following the cyber-attack detection paradigm have appeared in the literature since the Battle. For example, work [1] points to multisite detection approaches based on simultaneous analysis for an efficient warning system. In that work the authors present a joint data-model-based approach for cyber-attack detection: the model of the water network is used for inference from the observational data. In [20], a model is presented that uses various machine-learning techniques to detect anomalies in a SCADA-controlled water system; the model classifies events including physical failures and cyber-attacks. As another example, research [21] tested a set of machine-learning algorithms, highlighting the performance of the extreme learning machine for classifying normal and abnormal data from multisite sensors.

Despite the considerable effort devoted in recent years to detecting cyber-attacks on WDSs, the literature has focused mainly on machine-learning and optimization techniques. Signal-processing techniques for cyber-physical attack detection are still not well explored, especially in water distribution.

Work [22] investigates the application of Independent Component Analysis (ICA) for stealthy false data attack detection without prior knowledge of any power grid topology. The signal separated by ICA is used for detecting virtually unobservable attacks. The authors in [23] apply ICA to obtain the fundamental traffic components; in a second stage, the components are classified by machine-learning-inferred decision trees. Still on ICA applications, work [24] develops an algorithm to characterize hidden structures in fused residuals. To decrease the likelihood of false alarms, possible noisy content in the residuals is suppressed by performing the residual analysis solely on the dominant parts of a so-called demixing matrix.

In the water resources field, ICA has been applied to drought analysis, exploring hydrological data [25]. In [26], the application of ICA to assess and estimate leakage in water distribution networks is proposed, and the algorithm is tested on data acquired in an experimental leakage platform. In [27], water demand is forecast using a principal component model, and ICA is applied to develop climate predictors.

Once demixed by ICA, the source signals can be processed to detect anomalies automatically, and this inspired us to apply ICA and then ACPD to the automatic detection of cyber-physical attacks. Along this line, still within urban hydraulics but with a different purpose, automatic identification of pipe bursts has been developed using statistical process control applied to hydraulic parameters (e.g., nodal pressure and flow in pipes) [28] or jointly with water demand forecasting [29]. Also, to improve burst and leakage detection, work [30] proposes ACPD applied to filtered signals of consumption data.

The remainder of the paper is structured as follows. The materials and methods are presented in the next section. A section is then devoted to the case study, including the results obtained and a discussion. The paper closes with the Conclusions section.

#### **3. Materials and Methods**

The methodology for cyber-attack detection proposed in this paper is based on two separate techniques. The first comes from the signal-processing field and applies a Blind Source Separation (BSS) algorithm that makes use of Independent Component Analysis (ICA). This technique separates the original measured signals, which are affected by the attacks, into independent components. Attacks can then be detected in these components using a statistical control method, which is the second technique in this work: an abrupt change point detection (ACPD) algorithm is applied to the separated signals to accurately detect the start and end times of the attacks, which helps characterize them. Let us first concisely describe these techniques.
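As a rough illustration of the second stage only, the sketch below locates the start and end of a level shift in a single signal, such as one separated component might look. It uses a two-sided CUSUM test as a stand-in; this section does not specify which change point algorithm is used, so this should not be read as the method of the paper. The function names, the synthetic signal, and the tuning values `n_ref`, `k` and `h` are all assumptions made for the example, and the detector is assumed to start in an attack-free period.

```python
from statistics import mean, stdev


def cusum_change_points(signal, n_ref=20, k=0.5, h=5.0):
    """Estimate indices where the mean of `signal` shifts abruptly.

    Two-sided CUSUM on standardized samples. n_ref (reference window),
    k (drift) and h (alarm threshold) are illustrative values only.
    """
    changes = []
    start = 0
    while start + n_ref < len(signal):
        # Baseline statistics from a window assumed to contain no change.
        ref = signal[start:start + n_ref]
        mu = mean(ref)
        sigma = stdev(ref) or 1.0  # guard against a constant window
        g_pos = g_neg = 0.0
        onset_pos = onset_neg = start + n_ref
        detected = None
        for t in range(start + n_ref, len(signal)):
            z = (signal[t] - mu) / sigma
            g_pos = max(0.0, g_pos + z - k)  # accumulates upward shifts
            g_neg = max(0.0, g_neg - z - k)  # accumulates downward shifts
            # The onset estimate is the sample after the last zero crossing.
            if g_pos == 0.0:
                onset_pos = t + 1
            if g_neg == 0.0:
                onset_neg = t + 1
            if g_pos > h or g_neg > h:
                changes.append(onset_pos if g_pos > h else onset_neg)
                detected = t
                break
        if detected is None:
            break
        # Re-estimate the baseline from the samples after the alarm.
        start = detected
    return changes


def make_demo_signal(n=160, attack=(60, 100), offset=2.0):
    """Synthetic test signal (hypothetical): small periodic noise plus a
    constant offset over the half-open index interval `attack`."""
    noise = [0.1 * ((i % 5) - 2) for i in range(n)]
    return [x + (offset if attack[0] <= i < attack[1] else 0.0)
            for i, x in enumerate(noise)]


# Two change points are expected: near the start and the end of the offset.
print(cusum_change_points(make_demo_signal()))
```

The onset estimates are approximate: CUSUM reports the sample after the last time its statistic was zero, which can precede the true change by a few samples. Restarting the baseline at the alarm time also assumes that the shift is abrupt; a gradual change would contaminate the new reference window.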
