## 4.2.2. Cyber-Attack Models

The modelling of attacks is an important part of cyber-security research because it helps in understanding the vulnerabilities of cyber–physical systems, the resources required to carry out successful attacks, the impact of attacks, and the resilience of countermeasures. Over the past decade, attacks against cyber–physical systems have attracted increased interest from the security research community, particularly regarding the resources attackers need to carry out effective attacks.

We identified several papers that developed attack models to examine the behaviour of water systems and the impact of attacks. In [60], researchers investigated stealthy attacks that could cause damage while evading detection. They assumed an attacker with advanced skills and well-developed resources, such as knowledge of the system dynamics and of the system diagnostic schemes, and the ability to manipulate PV (sensor) data. Attacks were carried out on the SCADA system of the Gignac canal network (Southern France). The researchers were able to design attacks that evaded the diagnostic scheme, which was based on unknown input observers for fault detection and isolation.

Adepu and Mathur [61] investigated single-point cyber-attacks against the SWaT testbed and proposed attack detection based on the system's response to the attacks. Adepu et al. [62] and Tomic et al. [63] investigated jamming attacks against wireless communications in water systems. In [62], researchers carried out attacks against different parts of the SWaT testbed; in [63], researchers used the Waterbox testbed [51] to investigate the robustness of process control schemes against jamming attacks using different attack strategies. Such attacks have the potential to halt or slow down a process and cause components to fail [62].

Robles-Durazno et al. [64] investigated memory corruption attacks against a PLC used in a water supply process, demonstrating their research using a Festo MPA rig. They investigated memory corruption attacks in three locations: attacking PLC inputs by overwriting memory allocated to connected sensors; attacking PLC outputs by overwriting memory for actuators; and attacking PLC working memory, targeting runtime code that contained setpoint variables. They proposed a detection model based on monitoring the energy consumption and voltage signals of sensors and actuators. Amin et al. [65] demonstrated stealthy deception attacks against SCADA systems used within water infrastructures.

The RISKNOUGHT [55–57] simulation platform models the interaction between physical processes and the computational and networking layers to simulate a range of cyber–physical threats, including cyber-attacks that target sensors, actuators, PLCs, SCADA and historians and cause physical damage to hydraulic components such as pumps, valves and pipes. Similarly, Taormina et al. [66] included a range of attack scenarios with the epanetCPA [53] toolkit to simulate cyber and physical attacks that target sensors, actuators, PLCs and SCADA, and the communication between these components.

Erba et al. [67] investigated adversarial machine learning against ICS used in water distribution systems using the WADI and BATADAL datasets. They present two models for concealment attacks that evade detectors trained using deep neural networks: (i) a white box attacker, who has knowledge of the system and the detection model and uses optimisation to generate adversarial samples that are close to the normal operating values of sensors; and (ii) a black box attacker, who has no knowledge of the detection model and uses deep neural networks to learn the expected behaviour of the ICS and produce adversarial sensor readings that resemble real data.

## 4.2.3. Cyber-Attack Detection Models

Designing effective detection techniques for cyber–physical systems is an important and dynamic area of research. A general list of cyber–physical systems detection models is reported in [68]. In this section, we review models proposed for detecting cyber-attacks in water systems.

A wide variety of approaches have been used to detect abnormal behaviour in water systems. These approaches, illustrated in Table 5, can be divided into: model-based detection, which tries to model the physical evolution of systems; machine learning models, which learn representative characteristics of a system using data; and statistical models, which use statistical analysis to detect attacks.



Amin et al. [69] proposed a theoretical model-based detection scheme, built on hydrodynamic models, to detect cyber-attacks against sensor measurements and other anomalous behaviour in canal systems. Adepu and Mathur [70] used the SWaT testbed to detect cyber-attacks using invariants, the physical conditions that must be true for a process in a given state. They tested their approach using a selection of bias attacks, in which attackers modify sensor outputs and actuator commands by adding a small constant each time [74]. They extended this work in [71,72] to detect bias attacks [74] against sensors and actuators using physics-based invariants for each state of the process, derived from the process design. The extended work covers both single-point attacks, which occur at a single stage, and multi-point attacks, which affect multiple sensors and actuators at a single stage [72]. They also proposed a distributed attack detection method in [73] to detect coordinated cyber-attacks. Yoong and Heng [75] proposed a security framework to develop and evaluate machine learning invariants to detect anomalies, and tested their framework using the SWaT testbed. They used an autoregressive model with exogenous inputs (ARX) combined with group searching to construct machine learning invariants to detect anomalies. The proposed framework is capable of being tested in real-life water treatment plants without causing any disturbances.
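To make the idea of a physics-based invariant concrete, the following minimal sketch checks a hypothetical tank-level invariant (the reported level change must match the net inflow, within a tolerance) against a bias attack that adds a small constant to the reported level at every step. The tank model, function names, flow rates and tolerance are illustrative assumptions and are not taken from [70–74].

```python
def level_invariant_holds(prev_level, level, inflow, outflow,
                          dt=1.0, area=1.0, tol=0.05):
    """Hypothetical invariant: level change equals net flow / area, within tol."""
    expected = prev_level + (inflow - outflow) * dt / area
    return abs(level - expected) <= tol


def simulate(steps, bias_per_step=0.0, attack_start=None):
    """Return the time steps at which the reported level violates the invariant."""
    inflow, outflow = 0.30, 0.25      # assumed constant flows
    true_level = 1.0
    reported_prev = true_level
    bias = 0.0
    violations = []
    for k in range(steps):
        true_level += inflow - outflow
        if attack_start is not None and k >= attack_start:
            bias += bias_per_step     # bias attack: add a small constant each step
        reported = true_level + bias
        if not level_invariant_holds(reported_prev, reported, inflow, outflow):
            violations.append(k)
        reported_prev = reported
    return violations


if __name__ == "__main__":
    print("normal run:", simulate(10))
    print("bias attack from step 5:", simulate(10, bias_per_step=0.1, attack_start=5))
```

In this toy setting the invariant flags the attack only because the per-step bias (0.1) exceeds the tolerance (0.05); a smaller bias would pass unnoticed, which illustrates why the choice of tolerance matters for invariant-based detection.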

Miciolino et al. (2017) [54] proposed fault detection and network anomaly-based detection models for the FACIES testbed, monitoring data generated by sensors and the network traffic between PLCs and SCADA, which uses the Modbus over TCP protocol. Detection is based on the standard deviation between normal behaviour and actual observations. The normal behaviour of sensors and network traffic is determined using statistical averages calculated from data collected during normal runs.
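A statistical baseline of this kind can be sketched as follows. This is only an illustration under stated assumptions: the per-time-step mean and standard deviation are computed across several normal runs, and an observation is flagged when it deviates from the mean by more than `k` standard deviations. The function names, the value of `k`, and the sample data are hypothetical and do not reproduce the method in [54].

```python
from statistics import mean, pstdev


def fit_baseline(normal_runs):
    """Per-time-step mean and standard deviation across equally long normal runs."""
    columns = list(zip(*normal_runs))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def detect(observed, means, stds, k=3.0, min_std=1e-6):
    """Indices where the observation is more than k standard deviations from normal."""
    return [
        i
        for i, (x, m, s) in enumerate(zip(observed, means, stds))
        if abs(x - m) > k * max(s, min_std)  # min_std avoids a zero-width band
    ]


if __name__ == "__main__":
    normal_runs = [
        [10.0, 10.2, 10.1],
        [10.1, 10.1, 10.0],
        [9.9, 10.3, 10.2],
    ]
    means, stds = fit_baseline(normal_runs)
    print(detect([10.05, 12.0, 10.1], means, stds))
```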

Zohrevand et al. (2016) [76] used a hidden Markov model (HMM) to design an anomaly-based detection model for a water supply system. Training data was collected from a SCADA-based water supply system in the City of Surrey in British Columbia (Canada) between 2011 and 2014. Working with domain experts, the researchers generated anomalous cases and inserted these into the normal data as potential attack data. Four anomalies were constructed by targeting the flow capacity of water: maximum flow, minimum flow, continuous overflow and frequent overflow. Ahmed et al. (2017) [77] used EPANET to simulate a water distribution network to demonstrate a model-based attack detection technique. Detection involves determining the input-output dynamical model of the water distribution network as a set of linear time-invariant (LTI) equations. A Kalman filter is then used to estimate the state of the physical process. The differences between the actual measurements and the estimates form the residuals, which are fed into a change detection procedure, CUSUM (cumulative sum control chart), to identify abnormal behaviour. The generated attacks include false data injection (sending modified PVs to the controller and sending false signals to actuators) and a controller zero-alarm attack, in which the attacker changes sensor measurements in such a way that the residuals do not raise any alarms. Moazeni and Khazaei [78] proposed a mixed integer nonlinear programming (MINLP) approach to estimate state variables, and tested this on a simulated 6-node water distribution system modelled using the MATLAB OPTi toolbox.
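The Kalman filter and CUSUM pipeline described for [77] can be illustrated with a scalar example. The sketch below assumes a one-dimensional LTI model x[k+1] = a·x[k] + b·u[k] with a directly measured state; the model coefficients, noise variances `q` and `r`, CUSUM drift `delta` and threshold `tau` are illustrative assumptions, not the values used in [77].

```python
def kalman_cusum(measurements, inputs, a=0.9, b=0.1, q=1e-4, r=1e-2,
                 delta=0.05, tau=0.5):
    """Return the time steps at which the CUSUM statistic on the residual exceeds tau."""
    x_hat, p = measurements[0], 1.0    # initial state estimate and its variance
    s = 0.0                            # CUSUM statistic
    alarms = []
    for k in range(1, len(measurements)):
        # Kalman prediction from the assumed LTI model
        x_pred = a * x_hat + b * inputs[k - 1]
        p_pred = a * p * a + q
        # Residual between the actual measurement and the prediction
        resid = measurements[k] - x_pred
        # One-sided CUSUM on the absolute residual, with drift term delta
        s = max(0.0, s + abs(resid) - delta)
        if s > tau:
            alarms.append(k)
            s = 0.0                    # reset after raising an alarm
        # Kalman measurement update
        gain = p_pred / (p_pred + r)
        x_hat = x_pred + gain * resid
        p = (1.0 - gain) * p_pred
    return alarms


if __name__ == "__main__":
    u = [10.0] * 60
    normal = [10.0] * 60                                   # steady state of the model
    attacked = [y + (0.5 if k >= 30 else 0.0) for k, y in enumerate(normal)]
    print("normal:", kalman_cusum(normal, u))
    print("false data injection from step 30:", kalman_cusum(attacked, u)[:3])
```

Because the filter gradually absorbs a constant offset into its state estimate, the residual shrinks after the first few attacked steps; the CUSUM statistic accumulates the early residuals so that the alarm is still raised shortly after the injection begins.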

Many machine learning techniques, both supervised and unsupervised, have been used to detect anomalous behaviour. Inoue et al. [79] used a SWaT dataset [50], which consists of 41 cyber and physical attacks [45] against sensors, actuators and controllers including modifying PVs and MVs. Researchers used unsupervised learning approaches from deep learning (long short-term memory neural networks) and one-class support vector machines to detect anomalies.

Hindy et al. [80] built a water system testbed composed of two water tanks, a PLC, a Modicon M238 logic controller, pumps and five sensors that measure various water levels and the presence of water in the tanks. The testbed has two modes of operation, simulating water distribution and storage. Sensor measurements are sent to the control and monitoring units using the Modbus protocol. Anomalous behaviour is generated as a result of cyber-attacks (DoS, spoofing), system faults and physical attacks (e.g., humans hitting tanks). Classic machine learning algorithms are used to classify anomalous behaviour and affected components using the data gathered and reported by the PLCs. These algorithms are logistic regression, Gaussian naive Bayes, k-nearest neighbors (K-NN), support vector machine (SVM), decision trees and random forests [80]. They report that the K-NN model achieved the highest accuracy.
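As a brief illustration of the K-NN classifier mentioned above, the sketch below labels a sensor reading by majority vote among its `k` nearest labelled neighbours. The two-dimensional features (a tank level and a water-presence flag), the labels, and the training points are hypothetical and are not drawn from the dataset in [80].

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features: (tank level, water-presence flag)
    train_x = [(1.0, 0.0), (1.1, 0.0), (0.9, 0.1), (3.0, 1.0), (3.2, 1.0), (2.9, 0.9)]
    train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
    print(knn_predict(train_x, train_y, (3.1, 1.0)))
    print(knn_predict(train_x, train_y, (1.0, 0.05)))
```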

Several teams participated in the BATADAL challenge competition [47], developing attack detection for the fictitious C-Town water distribution network (WDN) benchmark [100]. This was built using the epanetCPA water distribution modelling toolkit, and presented at the 2017 World Environmental and Water Resources Congress organized by the Environmental and Water Resources Institute of the American Society of Civil Engineers (EWRI/ASCE). Each competing team was given three datasets [45]: one with normal operational data, and two (one for training, one for testing) containing cyber-attacks. The generated cyber-attacks were deception attacks (against PVs, MVs and SCADA data) and replay attacks. Taormina and Galelli [81,82] used autoencoders (deep neural networks) to detect attacks. Abokifa et al. [83,84] proposed a three-layer approach to detect anomalies in the BATADAL datasets: first removing outliers using statistical analysis; then using a feed-forward artificial neural network (ANN), specifically a multilayer perceptron (MLP), to identify anomalies; and finally using principal component analysis (PCA) to identify multiple affected sensors. Giacomoni et al. [85] developed two detection approaches based on data mining. The first uses actuator rules to ensure that readings from the SCADA are within defined normal ranges. The second uses an optimization routine that extracts low-dimensional components of the data, and thereby separates normal operation data from attack data. Pasha et al. [86,101] also used a data-mining approach on the BATADAL datasets, based on extracting control rules, pattern recognition, PCA, and the relationship between hydraulic and system parameters. Brentan et al. [87] applied nonlinear autoregressive networks with exogenous inputs (NARX), a type of recurrent neural network. Housh and Ohar [89,90] used physical simulation to model the system to detect cyber-attacks.
Their model-based approach uses mixed integer linear programming (MILP) to estimate the hydraulic processes of the water distribution systems under normal operating conditions to produce expected errors between the actual measurements and the estimated model. The difference between the expected and actual values is used to detect attacks. Chandy et al. [88] developed an ensemble model comprising two models to detect attacks for the BATADAL detection challenge competition. The first uses physical and operational rules and violations to generate events. The second uses these events along with raw data to train a deep learning model, a convolutional variational autoencoder, to detect attacks. Aghashahi et al. [91] first extracted features related to the characteristics of the attack and no-attack data by using a covariance matrix and a distance measure for every data point. Then, a random forest classifier was used to classify these characteristics as attack or normal operation. A detailed description of the competition and a discussion of results can be found in [47]. Marcos Quiñones-Grueiro [92] combined widely used signal processing techniques, PCA, the adaptive exponential weighted moving average chart (EWMA) and the reconstruction-based contribution (RBC) method to detect attacks and to diagnose the area of the network that was under attack using the BATADAL dataset. Ramotsoela et al. [93] used the BATADAL dataset to evaluate some of the traditional anomaly detection approaches to detect attacks in WDS, and proposed an ensemble technique. The proposed ensemble technique combines the subspace outlier degree (SOD) algorithm, a distance-based shared nearest neighbors approach designed to detect outliers in high-dimensional data [102], and a local outlier factor (LOF) algorithm [103] to detect outliers in low-dimensional data.
Both algorithms are run in parallel for each predicted datapoint and feed their outputs to a quadratic discriminant analysis (QDA) process that classifies datapoints as anomalous or normal. Kadosh et al. [94] proposed a one-class cyber-attack detection model, based on a support vector data description (SVDD) classifier, to detect attacks in WDS using both the BATADAL dataset and epanetCPA.
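Several of the approaches above rely on control-chart style monitoring. As one example, a standard (non-adaptive) EWMA chart can be sketched as follows. The smoothing factor `lam`, the limit width, and the input series are illustrative assumptions; the adaptive variant and the combination with PCA and RBC used in [92] are not reproduced here.

```python
from math import sqrt


def ewma_alarms(values, mu, sigma, lam=0.2, width=3.0):
    """Indices where the EWMA statistic leaves its steady-state control limits."""
    limit = width * sigma * sqrt(lam / (2.0 - lam))   # steady-state half-width
    z = mu                                            # start at the in-control mean
    alarms = []
    for i, x in enumerate(values):
        z = lam * x + (1.0 - lam) * z                 # exponentially weighted average
        if abs(z - mu) > limit:
            alarms.append(i)
    return alarms


if __name__ == "__main__":
    series = [0.0] * 10 + [2.0] * 10                  # sustained shift from index 10
    print(ewma_alarms(series, mu=0.0, sigma=1.0))
```

The smoothing means that a sustained shift is detected a few samples after it starts rather than immediately, while isolated spikes of similar size are less likely to trigger an alarm.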

Bakalos et al. [95] developed a cyber-attack detection approach for water systems using multimodal data fusion and adaptive deep learning. Multimodal data fusion involves combining different channels of information, including visual data from thermal camera streams, Wi-Fi reflection, and ICS data. The weight attached to each of these streams of data is determined through a deep learning process. The proposed adaptive deep learning approach uses a tapped delay line (TDL) convolutional neural network (CNN) with autoregressive moving average [95]. The data used to evaluate the approach is from the STOP-IT project.

Min et al. [96] used an artificial neural network to detect attacks against a water distribution network using the EPANET simulator [84]. Macas et al. [97] used an "unsupervised attention-based spatio-temporal autoencoder for anomaly detection (STAE-AD)" model to detect attacks against water infrastructures using the SWaT dataset. Zou et al. [98] proposed a hybrid model making use of an MLP and a one-class SVM. MLP was used to forecast measurement parameters, and prediction errors were used to train a one-class SVM to classify outliers; finally, Bayesian sequence analysis was used to detect contamination attacks against water distribution systems.

The majority of the cyber-attack detection models reviewed focus on detecting anomalous behaviour by monitoring and analyzing physical process variables; they do not monitor industrial control network traffic or use this knowledge to detect cyber-attacks. Ghaeini and Tippenhauer [99] proposed a hierarchical monitoring intrusion detection system (HAMIDS) for ICS to collect network events in different layers of industrial networks. HAMIDS extends Bro, an open-source tool for monitoring and analyzing network traffic. IDS sensors are installed on different layers of industrial networks to monitor network events. These events are then aggregated and processed in a central cluster to detect malicious behaviour. HAMIDS was validated using a range of network attacks (e.g., ARP poisoning, network flooding and man-in-the-middle attacks) against the SWaT testbed.

The proposed detection approaches are evaluated for effectiveness using (i) operational data from real-world systems; (ii) testbeds; and (iii) simulation. Existing studies show that a wide variety of techniques have been applied to detect cyber-attacks against water systems; however, a reliable comparison among detection approaches is not feasible, because studies lack common performance metrics, omit performance data, or use different datasets and dataset sizes.

## *4.3. Model-Based Security Analysis*

Several research studies focused on using modelling approaches to analyse the security of water systems and to identify vulnerabilities.

Kang et al. [104] proposed a model-based security analysis for a water treatment system. Testing their approach on SWaT, they modelled the interaction between the physical plant and the controller using approximate, discrete models to discover and explore potential attacks. The model is constructed using the first-order modelling language Alloy to capture, as state transition rules, the connections among various components and the behaviour of the plant.

Motivated by malware techniques that hide critical information from operators while executing an attack (e.g., Stuxnet), Patlolla et al. [105] proposed a multiple security domain non-deducibility (MSDND) model [106] using belief, information transfer and trust (BIT) logic [107] to identify critical information that attackers may hide. BIT logic is used to reason about the reliability of data moving between entities, defined as the belief and trust one entity has in information received from another entity. A system is decomposed into components, and each component that could change the state of the system is treated as a separate domain. The approach requires the development of invariants: the information execution flow across these domains, from source to destination, is monitored to identify when exploited vulnerabilities have resulted in an invariant violation. Mishra et al. [108] proposed an agent-based modelling framework to model critical CPS and their interdependencies, to understand the impact of attacks on interconnected critical infrastructures; they evaluated the application of the model to a water distribution system and used an invariant-based method [70] to generate rules to detect attacks.

Taormina et al. [66] and Hunter et al. [109] proposed a modelling approach to quantify the hydraulic behaviour of the system (such as tank overflow, variation in pumps) under cyber–physical attacks by defining components of a system, and specifying attack variables (starting time, duration). They give simulation results using the epanetCPA toolbox and the C-Town network [100].
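The idea of specifying an attack by its start time and duration and then observing the hydraulic response can be sketched with a toy single-tank model. The example below is not epanetCPA and does not model the C-Town network; the `SensorAttack` class, the hysteresis pump controller, and all numeric parameters are hypothetical, chosen only to show how a spoofed level reading can lead to tank overflow.

```python
from dataclasses import dataclass


@dataclass
class SensorAttack:
    start: int            # first time step of the attack
    duration: int         # number of time steps the attack lasts
    spoofed_level: float  # level value reported to the controller during the attack

    def active(self, t):
        return self.start <= t < self.start + self.duration


def simulate_tank(steps, attack=None, capacity=5.0, low=1.0, high=4.0,
                  pump_rate=0.5, demand=0.2):
    """Return (final level, number of time steps in which the tank overflowed)."""
    level, pump_on, overflow_steps = 2.0, False, 0
    for t in range(steps):
        # The controller sees the spoofed reading while the attack is active
        reading = attack.spoofed_level if attack and attack.active(t) else level
        # Hysteresis control: switch the pump on at the low mark, off at the high mark
        if reading <= low:
            pump_on = True
        elif reading >= high:
            pump_on = False
        # Mass balance for one time step
        level += (pump_rate if pump_on else 0.0) - demand
        level = max(level, 0.0)
        if level > capacity:
            overflow_steps += 1
            level = capacity          # excess water spills
    return level, overflow_steps


if __name__ == "__main__":
    print("no attack:", simulate_tank(100))
    print("spoofed low level:", simulate_tank(100, SensorAttack(10, 40, 0.5)))
```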
