*3.4. Performance Evaluation*

In addition to the BATADAL data sets, the performance evaluation also follows the criteria and metrics presented in [7], namely time-to-detection (*TTD*) and single classification rate (*SCR*).

*TTD* is the time required by the algorithm to find an attack and can be calculated as:

$$TTD = t_0 - t_d, \tag{8}$$

where *t*<sub>0</sub> is the time when the attack is detected and *t*<sub>*d*</sub> is the time when the attack actually started. When an attack is detected, *TTD* lies in the interval [0, Δ*t*], where Δ*t* is the total attack duration. To aggregate *TTD* over several attack scenarios, [7] defines a detection-time score, calculated by (9):

$$S_{TTD} = 1 - \frac{1}{n_a} \sum_{i=1}^{n_a} \frac{TTD_i}{\Delta t_i},\tag{9}$$

where *n*<sub>*a*</sub> is the number of attack scenarios.
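The detection-time score can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example attack list are hypothetical, and the treatment of a missed attack (charging the full attack duration) is an assumption that the text above does not state.

```python
from typing import Optional, Sequence, Tuple

# One attack scenario: (start time, end time, detection time or None), e.g., in hours.
Attack = Tuple[float, float, Optional[float]]


def time_to_detection(start: float, end: float, detected: Optional[float]) -> float:
    """TTD for one attack, kept inside the interval [0, attack duration]."""
    duration = end - start
    if detected is None:
        # Assumption: an attack that is never detected is charged its full duration.
        return duration
    return min(max(detected - start, 0.0), duration)


def s_ttd(attacks: Sequence[Attack]) -> float:
    """Score (9): one minus the mean of TTD divided by the attack duration."""
    ratios = [time_to_detection(s, e, d) / (e - s) for s, e, d in attacks]
    return 1.0 - sum(ratios) / len(ratios)


# Hypothetical scenarios: immediate detection, a 5 h delay on a 20 h attack, and a miss.
attacks = [(0, 10, 0), (20, 40, 25), (50, 60, None)]
score = s_ttd(attacks)
print(round(score, 3))  # 0.583
```

A score of 1 corresponds to every attack being detected at its first time stamp, and each hour of delay lowers the score in proportion to the duration of the attack concerned.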

An ideal algorithm for cyber-attack detection must not only disclose attacks quickly but also avoid false positive warnings. To evaluate the accuracy of the algorithm, the true positive rate *TPR* (10) and the true negative rate *TNR* (11) are calculated from a confusion matrix. Both rates are combined to calculate the *SCR* (12):

$$TPR = \frac{TP}{TP + FN} \tag{10}$$

$$TNR = \frac{TN}{TN + FP} \tag{11}$$

$$SCR = \frac{TPR + TNR}{2},\tag{12}$$

where *TP* and *TN* are the numbers of true positive and true negative time stamps, respectively. *FP* and *FN* are the numbers of false positive and false negative time stamps.
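Equations (10) to (12) translate directly into code. The sketch below is illustrative only; the function name and the confusion-matrix counts are hypothetical and are not taken from the case study.

```python
def classification_rates(tp: int, tn: int, fp: int, fn: int):
    """Return (TPR, TNR, SCR) from confusion-matrix counts of time stamps."""
    tpr = tp / (tp + fn)  # share of attack time stamps that were flagged
    tnr = tn / (tn + fp)  # share of normal time stamps that were left unflagged
    scr = (tpr + tnr) / 2
    return tpr, tnr, scr


# Hypothetical counts: 100 attack time stamps and 200 normal time stamps.
tpr, tnr, scr = classification_rates(tp=95, tn=196, fp=4, fn=5)
print(round(tpr, 3), round(tnr, 3), round(scr, 3))  # 0.95 0.98 0.965
```

Averaging the two rates keeps the score meaningful when attack time stamps are much rarer than normal ones, a situation in which plain accuracy would be dominated by the true negatives.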

Criteria (9) and (12) are both considered in [7], and the final score *S* is calculated as a weighted sum of *S*<sub>*TTD*</sub> and *SCR* (13):

$$S = \gamma S_{TTD} + (1 - \gamma)SCR,\tag{13}$$

where *γ* ∈ [0, 1] is a real number that makes (13) a convex combination. For equally weighted criteria, *γ* = 0.5.
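The weighting in (13) can be illustrated as follows. The function name and the input values are hypothetical and chosen only to show how the weight shifts the result between the two criteria.

```python
def final_score(s_ttd: float, scr: float, gamma: float = 0.5) -> float:
    """Convex combination (13) of the detection-time and classification scores."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1] for a convex combination")
    return gamma * s_ttd + (1.0 - gamma) * scr


# Hypothetical inputs, not results from the case study.
equal = final_score(0.90, 0.95)                   # equal weights
speed_only = final_score(0.90, 0.95, gamma=1.0)   # only detection time counts
print(round(equal, 3), round(speed_only, 3))  # 0.925 0.9
```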

## **4. Case Study**

The methodology presented in this paper is applied to the case study posed in BATADAL [7], which uses the D-town water network (Figure 2) and considers potential attacks on pump stations and on pressure and tank level sensors, as indicated in the figure. The network is composed of 429 pipes, 388 junction nodes, 7 tanks, 1 reservoir, 11 pumps and 5 valves.

BATADAL provides three data sets generated with epanetCPA [38], a MATLAB toolbox for cyber-attack design and hydraulic simulation. Please note that, for obvious security reasons, studies of cyber-physical attacks are usually conducted using simulated data that reproduce real-world conditions [5]. In the case of BATADAL, the data sets provide hourly pressure, flow, tank level and control device status. The first data set corresponds to one year of data without cyber-attacks. The second data set is based on a set of 492 h. It contains one entire, well-labeled cyber-attack and six other cyber-attacks that are partially or completely hidden. Finally, the third data set has 7 new attacks distributed along 407 h of data.

The application of the methodology starts by selecting the combination of data to be used as input for fastICA from the available data. Since the water network is naturally divided into small district metered areas according to its topology, eight combinations of data are used as input for the ICA algorithm. These combinations consider the hydraulic connections of the system and are summarized in Table 1.

**Table 1.** Description of control and measuring devices for fastICA application


**Figure 2.** D-town water network topology highlighting potential attack locations.

Using the combinations presented in Table 1, the fastICA algorithm is applied, separating each combination into two (approximate) sources. To illustrate the signal separation, Figure 3a presents the original data for combination B, and Figure 3b presents the separated signals, split into two sources. In the separated sources (Figure 3b), an abnormal trend in the time series is revealed in the test data set.

This behavior is repeated for the other combinations. One source has a periodic trend, typical of the behavior of a WDS, while the second source resembles random noise. The second source is usually highly affected by the attacks and is the one considered by the detection algorithm to identify abrupt changes.
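The separation into a periodic source and a noise-like source can be reproduced on synthetic data. The sketch below is a minimal two-signal version of the fastICA fixed-point iteration (tanh nonlinearity) written in plain Python. It is not the implementation used in this work. The signals, the mixing weights and all function names are hypothetical.

```python
import math
import random


def center(x):
    """Subtract the mean so the signal has zero average."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def corr(x, y):
    """Pearson correlation, used here only to inspect the result."""
    x, y = center(x), center(y)
    num = sum(u * v for u, v in zip(x, y))
    return num / math.sqrt(sum(u * u for u in x) * sum(v * v for v in y))


def whiten_2d(x1, x2):
    """Decorrelate two centred signals and scale them to unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    # Closed-form eigendecomposition of the 2x2 covariance [[a, b], [b, c]].
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - c)
    ct, st = math.cos(theta), math.sin(theta)
    z1 = [(ct * u + st * v) / math.sqrt(mid + rad) for u, v in zip(x1, x2)]
    z2 = [(-st * u + ct * v) / math.sqrt(mid - rad) for u, v in zip(x1, x2)]
    return z1, z2


def fastica_2d(x1, x2, max_iter=200, tol=1e-10):
    """Estimate two sources from two mixed signals."""
    z1, z2 = whiten_2d(center(x1), center(x2))
    n = len(z1)
    w1, w2 = 1.0, 0.0  # initial unit-norm unmixing vector
    for _ in range(max_iter):
        g = [math.tanh(w1 * p + w2 * q) for p, q in zip(z1, z2)]
        g_prime = sum(1 - v * v for v in g) / n
        # Fixed-point update: w <- E[z g(w.z)] - E[g'(w.z)] w, then renormalize.
        n1 = sum(p * v for p, v in zip(z1, g)) / n - g_prime * w1
        n2 = sum(q * v for q, v in zip(z2, g)) / n - g_prime * w2
        norm = math.hypot(n1, n2)
        n1, n2 = n1 / norm, n2 / norm
        converged = abs(abs(n1 * w1 + n2 * w2) - 1) < tol
        w1, w2 = n1, n2
        if converged:
            break
    # With two whitened signals, the second unmixing vector is orthogonal to w.
    s1 = [w1 * p + w2 * q for p, q in zip(z1, z2)]
    s2 = [-w2 * p + w1 * q for p, q in zip(z1, z2)]
    return s1, s2


# Synthetic hourly data: a daily cycle plus non-Gaussian (Laplace-like) noise.
rng = random.Random(0)
hours = range(24 * 30)
periodic = [math.sin(2 * math.pi * t / 24) for t in hours]
noise = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in hours]
mixed_a = [1.0 * p + 0.5 * e for p, e in zip(periodic, noise)]
mixed_b = [0.7 * p + 1.0 * e for p, e in zip(periodic, noise)]

est1, est2 = fastica_2d(mixed_a, mixed_b)
print(abs(corr(est1, periodic)), abs(corr(est2, periodic)))
```

The order and sign of the estimated sources are arbitrary, which is a general property of ICA, so in practice the noise-like source has to be identified after separation, for example by inspecting its periodicity.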

For automatic detection of the changes in the separated signals, ACDP is applied. The algorithm evaluates the second source, highly affected by the attacks, and allows a more accurate detection of the anomalies. Applying ACDP to the sources obtained from all combinations (Table 1), the start and end time indexes of the attacks are obtained.
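The step from a separated source to start and end indexes can be illustrated with a much simpler rule than ACDP. The sketch below is a stand-in, not the ACDP algorithm: it flags samples that deviate from an attack-free baseline by more than a chosen number of standard deviations and groups consecutive flags into intervals. The function name, the threshold, the minimum interval length and the data are all hypothetical.

```python
import statistics


def flag_intervals(signal, baseline, threshold=4.0, min_len=2):
    """Return (start, end) index pairs where the signal departs from the baseline."""
    mu = statistics.fmean(baseline)
    sigma = statistics.pstdev(baseline)
    flags = [abs(v - mu) > threshold * sigma for v in signal]

    intervals, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i  # a candidate interval opens here
        elif not flagged and start is not None:
            if i - start >= min_len:  # discard isolated spikes
                intervals.append((start, i - 1))
            start = None
    if start is not None and len(flags) - start >= min_len:
        intervals.append((start, len(flags) - 1))
    return intervals


# Hypothetical noise-like source: attack-free baseline, then test data with a shift.
baseline = [0.0, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.0]
test_signal = [0.0, 0.1, -0.1, 2.0, 2.1, 1.9, 0.1, 0.0, 0.9, -0.1]
intervals = flag_intervals(test_signal, baseline)
print(intervals)  # [(3, 5)]
```

The isolated spike at index 8 is ignored because it lasts a single sample, which shows how a minimum duration trades some detection speed for fewer false positives.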

The entire process may be summarized as follows. First, a combination of hydraulic time series is selected and processed by fastICA (Figure 4a); this algorithm splits the time series and produces two sources that are processed by ACDP (Figure 4b). Finally, ACDP is launched to locate the time interval when the attack occurred (Figure 4c), allowing the water company to start actions for mitigating the impacts of the attack. Figure 4c shows in detail the attack corresponding to combination F. The delay in detecting the attack can be observed as the interval between the first black line and the green line. As described in [7], this attack is related to changes in the tank T4 signal. Even though these changes are not easily identified in the original data, as shown in Figure 4a, after fastICA processing source signal 1 clearly reveals the change in the data, allowing ACDP to disclose the attack.

(**a**) Original measured pressure at nodes J307 and J302

**Figure 3.** Comparison between mixed and separated pressure signal—combination B.

To further illustrate the joint capability of fastICA and ACDP, Figure 5a shows the original measured data of pumps PU8 and PU9, node J306 and tank level T5. Applying fastICA and ACDP jointly to the corresponding test data set reveals no attacks in the sources. This result corroborates the accuracy of the algorithm, mainly in terms of minimizing false positives, since according to [7], there were no attacks occurring in the test data set.

The ACDP applied to all sources and combinations for the test data set resulted in the identification of 7 cyber-attacks, i.e., all the attacks were disclosed by the proposed methodology. Figure 6 presents the confusion matrix with the numbers of *TP*, *TN*, *FP* and *FN*.

Based on the confusion matrix, it is possible to calculate *TPR* = 0.966 and *TNR* = 0.980, resulting in *SCR* = 0.973. Compared to the seven teams that presented solutions for BATADAL, this value of *SCR* is the second highest, the first team having obtained *SCR* = 0.975, which is virtually identical. In terms of *TPR*, the methodology of the present work achieves the highest score, showing its efficiency in finding abnormal scenarios.
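As a quick arithmetic check, substituting the two reported rates into (12) reproduces the stated value:

$$SCR = \frac{0.966 + 0.980}{2} = \frac{1.946}{2} = 0.973$$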

(**a**) Original measured pressure at node J415, tank level at Tank T4 and flow at pumps PU6 and PU7

(**b**) Separated signals from node J415, tank level at Tank T4 and flow at pumps PU6 and PU7

(**c**) Detail of ACDP algorithm applied to test data set using signal one of fastICA applied to node J415, tank level at Tank T4 and flow at pumps PU6 and PU7

**Figure 4.** Complete data processing, illustrating fastICA and ACDP applied to Combination F.

(**a**) Original measured pressure at node J306, tank level at Tank T5 and flow at pumps PU8 and PU9

(**b**) Separated signals from node J306, tank level at Tank T5 and flow at pumps PU8 and PU9 processed by ACDP

**Figure 5.** Original and processed data for combination G.

**Figure 6.** Confusion matrix for the test data set presenting the number of true positives and negatives on the main diagonal and the false negatives and false positives on the counterdiagonal

The results in terms of *TTD* are summarized in Table 2. Four of the seven attacks are detected immediately or at most 1 h after they start. The rest are detected at most 10 h later, as shown in the table. Based on these values, the score for the other metric proposed in BATADAL, namely *S*<sub>*TTD*</sub>, is calculated, resulting in 0.913. Compared to the other teams, this value is the lowest, showing that, despite the accuracy of the methodology, early warnings cannot be suitably obtained for some abnormal scenarios. Based on both metrics, *SCR* and *S*<sub>*TTD*</sub>, the final score is calculated, resulting in 0.973. This final score is the second highest when compared with the seven teams that presented solutions in BATADAL.


**Table 2.** Summarized results for the test data sets presenting start and end time date for each attack

## **5. Conclusions**

The security of water distribution systems has become increasingly complex due to the rapid rise of telemetry and remote controls. The growing number of reported cyber-attacks in WDSs has also created an important need for new, fast and efficient methodologies for early-warning systems that help guarantee WDS security.

Most efforts devoted to detecting cyber-attacks in WDSs have primarily focused on machine-learning and optimization techniques. Statistical analysis of measured data can provide valuable results for quick detection of anomalies. However, as attested in [5], studies from other fields are necessary to build confidence in the models. In this paper, we focus on signal processing. Among the signal-processing techniques based on statistical analysis, fastICA is explored in this work. FastICA has proven to be a powerful tool for hydraulic data analysis, mainly under abnormal conditions. The signal separation follows a trend in which one signal is more related to a typical periodic oscillation of the system and the second is more related to a random process. The latter is highly affected by abnormal conditions and, consequently, is a possible input for detection algorithms. Applying fastICA to hydraulic time series (e.g., tank level) made it possible to clearly highlight the attacks against the studied water system. These attacks cannot be easily disclosed in the original time series; however, this task becomes easier after processing the data with a BSS algorithm.

Change point detection algorithms are useful for automatically detecting statistical changes in time series and can be used in early-warning systems. In this work, the ACDP algorithm is applied to the separated signals resulting from fastICA to automatically locate changes in the data, which are seen to correspond to cyber-attacks. The methodology applied to the BATADAL case study resulted in the detection of the seven attacks with high accuracy and few false positives. We claim that the methodology can be applied to any real system, as long as the water utility can measure at least one of the hydraulic parameters, namely flow, pressure and tank level.

Nevertheless, some attack scenarios were detected too late, which is a limitation, albeit one typical of most risk evaluation methodologies. Special attention should be paid to this kind of attack, and more investigation is required before drawing definitive conclusions about the global efficiency of the methodology. Future work, beyond ratifying the efficiency of detection algorithms, should go deeper into the cyber-physical problem by investigating the causes of the attacks, optimally placing grids of dedicated sensors, and responding in time to prevent damage. Optimal sensor placement is still a recent and only partially developed subject. Accordingly, efforts should be devoted to enriching this field of research with novel and efficient methodologies.

**Author Contributions:** Conceptualization, B.B. and E.L.J.; methodology, P.R., B.B. and D.B.; software, B.B. and P.R.; validation, G.M. and D.B.; formal analysis, J.I. and E.L.J.; writing—original draft preparation, B.B. and P.R.; writing—review and editing, G.M. and J.I.; supervision, J.I. and E.L.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**

