Article

Detection of Random False Data Injection Cyberattacks in Smart Water Systems Using Optimized Deep Neural Networks

by Faegheh Moazeni 1,*,†,‡ and Javad Khazaei 2,‡
1 Civil & Environmental Engineering, Lehigh University, Bethlehem, PA 18015, USA
2 Electrical & Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
* Author to whom correspondence should be addressed.
† Current Address: 1 West Packer Avenue, Bethlehem, PA 18015, USA.
‡ These authors contributed equally to this work.
Energies 2022, 15(13), 4832; https://doi.org/10.3390/en15134832
Submission received: 9 June 2022 / Revised: 27 June 2022 / Accepted: 28 June 2022 / Published: 1 July 2022

Abstract:
A cyberattack detection model based on a supervised deep neural network is proposed to identify random false data injection (FDI) attacks on the tank level measurements of a water distribution system. The architecture of the neural network, as well as its various hyper-parameters, is modified and tuned to acquire the highest detection performance using the smallest training data set. The efficacy of the proposed detection model with various activation functions, including sigmoid, rectified linear unit, and softmax, is examined. Regularization and momentum techniques are applied to update the weights and prevent overfitting. Moreover, statistical metrics are presented to evaluate the performance and effectiveness of the proposed model in the presence of a range of measurement noise levels. The proposed model is tested for three attack scenarios composed for the Battle of the Attack Detection Algorithms. Results confirm that the size of the data sets required to train the neural network (NN) to accomplish the highest levels of accuracy and precision decreases significantly as the number of hidden layers is increased. The trained 4- and 5-layer deep neural networks are able to detect FDIs in the readings with 100% precision and accuracy in the presence of 30% background noise in the sensory data.

1. Introduction

The rising number of cyberattacks on various critical infrastructure sectors highlights the vulnerabilities of automated and digitized monitoring and decision-making systems. In particular, internet-of-things (IoT) sensors, recently adopted across the water sector, have enabled advanced monitoring and control of smart water systems, yet they have made cybersecurity an emerging concern. Cyberattacks on U.S. power plants and water facilities in 2018 [1], and the more recent cyberattack on a water treatment plant in Florida in 2021 (https://www.nytimes.com/2021/02/08/us/oldsmar-florida-water-supply-hack.html accessed on 8 June 2022), are only two examples of such attacks. Cyberattacks in the water sector can take different forms, including hijacking private consumers' or a water utility's information for ransom, causing reversible or irreversible damage to pumps, valves, or tanks in water systems, cutting off, reducing, or poisoning the water supply to a region or an entire city, and even manipulating a dam to cause flooding [2,3].
Cyberattacks in water distribution systems (WDS) are mainly carried out in the form of false data injections (FDIs) on smart meters' measurements. State estimation is one of the most commonly implemented methods for detecting false data injection in water systems [4,5], and it has also been extensively studied in power grids [6,7,8]. However, if left undetected, false data injection attacks can lead to false state estimation, which ultimately results in wrong operational decisions and system failures [9]. In addition to state-estimation methods, other detection techniques exist in the literature. In [10], a model-based detection approach using mixed-integer linear programming was proposed, where the nodal demands were estimated and fed to EPANET (a hydraulic simulation software); simulation results were then compared to the given data to identify potential false data injections. In [11], the information flow pattern sent from sensors to actuators was modified, and the number of redundant cyber components was increased to enhance the protection of WDS against cyber–physical attacks. In [12], a detection model incorporating a pressure-driven hydraulic simulation was proposed and verified against 15 failure scenarios adopted from the model developed in [13].
Meanwhile, deep learning methods have been adopted in many areas to solve complicated problems, and given their promising performance in generating efficient and accurate solutions to such problems, their application to cyberattack detection has gained attention in recent years [14,15,16,17].
In water systems, several studies have also used deep learning-based methods to detect cyberattacks. For example, Ref. [18] proposed a data-driven detection algorithm to identify and localize cyberattacks based on the hydraulic process patterns of the water distribution system (WDS), using the data sets from the Battle of the Attack Detection Algorithms (BATADAL). However, the results suggest a high dependency of the detection performance on the threshold defined for the reconstruction errors. Additionally, the BATADAL data sets were used in [19] to train a 3-layer anomaly detection algorithm to identify two types of anomalies in WDS: local anomalies affecting individual sensors and global anomalies impacting more than one sensor at the same time. While the trained models in [19] successfully detected the cyberattacks featured in the BATADAL historical data, the detection performance was sensitive to high measurement noise levels in the sensory data. In addition, the BATADAL attack scenarios and data were used in [20] to develop learning-based algorithms, including k-nearest neighbor, support vector machine, artificial neural network, and extreme learning machine (ELM). Even though the improved ELM versions proposed in [20] demonstrated much better results than the other studied algorithms, the highest reported accuracy was only 92.4%, using an improved ELM with 10 hidden neurons. The BATADAL data sets were also used in [21] to train and evaluate learning-based anomaly detection techniques combining density-based and parametric algorithms. Nevertheless, the large size of the training data set (i.e., hourly readings for 365 days) used in [18,19,20,21] can slow down the deep neural network (DNN) training procedure, which can also become computationally costly. In [22], multimodal data fusion and adaptive deep learning were used to develop a cyberattack detection model in WDS, where various weighted channels of information were considered and a deep learning approach was used to estimate the weight of each channel. However, the best accuracy and performance levels achieved by the proposed DNN-based detection model in [22] were 87.62% and 85.41%, respectively, meaning that more than 10% of cyberattacks could potentially bypass the proposed model and damage the WDS. In [23], a multilayer perceptron (MLP) and support vector machine (SVM) were used to predict the measurement parameters and to identify and classify outliers in WDS. Similarly, in [24], supervised and unsupervised detection models were developed to identify anomalies in water treatment plants. Nonetheless, these last two studies focused on training the DNN model based on water quality parameters, and the hydraulic features of the system were not considered. As such, the proposed models in [23,24] cannot detect cyberattacks targeting the hydraulic-related readings of a WDS, such as the tank level measurements.
Despite significant efforts by previous studies in applying deep learning and artificial intelligence to the detection of different forms of cyberattacks in water systems, considerable limitations remain to be addressed to improve the performance and efficacy of DNN-based cyberattack detection models. Firstly, the input data sets used to train the deep neural networks are so large, and the architectures so deep, that implementing and training DNNs becomes very costly and time consuming. Moreover, while the sensitivity of DNNs to noisy measurements has been acknowledged in the past, efforts are still needed to improve the detection performance in the presence of various levels of measurement noise, in order to reduce sensitivity and enhance the resilience of the proposed DNNs. Furthermore, effective methods should be implemented to accurately update the weights and prevent overfitting.
To address the aforementioned limitations, hyper-parameters are tuned and multiple overfitting control techniques are implemented in the present study to generate the highest detection performance using DNNs of the smallest possible depth. In particular, the paper aims to accomplish the following:
  • Developing and training supervised deep neural network algorithms that can detect false data injections targeting storage tanks’ water level in water distribution systems using the minimum number of input data;
  • Implementing regularization and momentum techniques to efficiently update the weights and prevent overfitting;
  • Evaluating the performance of the proposed neural-network-based detection methodology for shallow and deep neural networks;
  • Strengthening the performance of the proposed neural-network-based detection methodology by evaluating various activation functions such as sigmoid, rectified linear unit (ReLU), and softmax functions;
  • Enhancing the performance of the proposed deep neural-network-based detection methodology in the presence of various measurement noise levels.
The paper is organized as follows: the deep learning-based detection framework is developed in Section 2, numerical results are presented in Section 3, and Section 4 concludes the research.

2. Deep Neural Network-Based Detection Method Formulation

Deep learning is a subset of machine learning structured and designed after the brain's neurons and the way they are trained to perform various tasks. Deep neural networks are composed of input and output layers, with a number of hidden layers stacked in between [25]. The architecture of a shallow neural network contains one hidden layer (three layers overall), while deep neural networks include two or more hidden layers. The performance of DNNs is highly dependent on the number of hidden layers and the hyper-parameters selected to transfer data from one layer to the next in the forward direction, while the nodes (also known as neurons) in each layer are fully connected via connection weights.
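To make the distinction between shallow and deep architectures concrete, the following minimal NumPy sketch initializes fully connected networks with one, two, or three hidden layers; the layer widths, initialization scale, and function names are illustrative assumptions rather than the exact settings used in this work.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Initialize weights and biases for a fully connected network.

    layer_sizes = [n_inputs, n_hidden_1, ..., n_outputs]; a shallow network
    has one hidden layer (three layers overall), a DNN has two or more.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 0.1, size=(n_out, n_in))  # small random connection weights
        b = np.zeros((n_out, 1))                      # zero biases
        params.append((W, b))
    return params

# Example: 4 tank-level inputs, one binary label per tank (illustrative sizes)
shallow_net = init_network([4, 8, 4])        # 3 layers overall (1 hidden layer)
deep_net_4  = init_network([4, 8, 8, 4])     # 4 layers (2 hidden layers)
deep_net_5  = init_network([4, 8, 8, 8, 4])  # 5 layers (3 hidden layers)
```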
The proposed learning-based detection algorithm is illustrated in Figure 1 and in Algorithm 1. Herein, the input data, which are the hourly levels of water in the tanks, are sent to the proposed bad data detection (BDD) model of the water distribution system. The BDD is a deep neural network trained on the historical data of the tanks' measurements. The historical data are split into "training", "cross-validation" (CV), and "testing" data sets. The training and CV data sets are introduced to the proposed deep learning algorithm to train the hyper-parameters and update the weights, while the testing data set is used to evaluate the performance of the trained network. Provided the trained network is efficient and effective, it is used to identify FDI cyberattacks in the WDS. Because both the inputs and the outputs (i.e., the labels) are known, the proposed deep learning algorithm is considered supervised. Supervised deep learning has two steps, forward propagation and backward (or back) propagation, to predict the output $\hat{y}$ from an input vector $x$. In this case, the input $x$ is the vector of tanks' water levels, which may or may not include false data injections. The output (i.e., label) $d \in \{0, 1\}$ is a vector of binary values indicating bad data in the measurement by assigning 1 to the label of the data. Implementing a multiclass classification, the output vector has the same size as the vector of measurements, with 0 and 1 indicating clean and bad data, respectively. To find the highest computational efficiency of the proposed deep learning algorithm, fully connected architectures with various numbers of hidden layers are examined in this study.
Algorithm 1: Proposed deep learning detection framework.
    Input: Training set (X_train, y_train)
    Input: Validation set (X_val, y_val)
    Input: Initialize the learning rates α, λ, m
    Compute θ[k], ŷ(j), J
    for α = 0 : 1 with 0.01 step size
        for λ = 0 : 1 with 0.01 step size
            for m = 0 : 1 with 0.01 step size
                for i = 1 : Max iter
                    Calculate θ[k]
                    Update W[i] during Max iter training steps
                    Calculate J on the training set
                    Calculate J on the cross-validation set
                    E_val[i] ← validation error
                    if E_val[i] < E_val[i − 1]
                        W* ← W[i];  i* ← i;  λ* ← λ[i];  α* ← α[i];  m* ← m[i]
                    else
                        W* ← W[i − 1];  α* ← α[i − 1];  λ* ← λ[i − 1];  m* ← m[i − 1]
                    end if
                end for
            end for
        end for
    end for
    Output: Optimal weight values W*
    Output: Best number of training steps, i*
    Output: Best learning rate values, λ*, α*, m*
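A minimal Python sketch of the sweep in Algorithm 1 is given below. The helpers `train_step` and `val_error` are hypothetical placeholders for the forward/backward passes of Sections 2.1–2.3, and the accept/revert logic of the listing is simplified here to keeping the configuration with the lowest validation error seen so far.

```python
import itertools
import numpy as np

def grid_search(train_step, val_error, max_iter=100, step=0.01):
    """Sweep the learning rate (alpha), regularization (lam), and momentum (m)
    over [0, 1] with the given step, as in Algorithm 1, and keep the weights
    with the lowest cross-validation error. `train_step(W, alpha, lam, m)`
    performs one training update (initializing W when it is None), and
    `val_error(W)` returns the validation error; both are placeholders."""
    grid = np.arange(0.0, 1.0 + step, step)
    best = {"err": np.inf, "W": None, "alpha": None, "lam": None, "m": None, "iter": None}
    # Note: the full 101 x 101 x 101 sweep is expensive; a coarser grid is used in practice.
    for alpha, lam, m in itertools.product(grid, grid, grid):
        W = None
        for i in range(1, max_iter + 1):
            W = train_step(W, alpha, lam, m)      # one weight update
            err = val_error(W)                    # error on the cross-validation set
            if err < best["err"]:                 # keep the best (W, hyper-parameters, step)
                best.update(err=err, W=W, alpha=alpha, lam=lam, m=m, iter=i)
    return best
```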

2.1. Forward Propagation

The output of each neuron of the DNN is predicted in the forward propagation procedure. For instance, the output vector of the first set of nodes can be calculated as follows:
$$
\underbrace{\begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix}}_{\theta\,\in\,\mathbb{R}^{k\times 1}}
=
\underbrace{\begin{bmatrix} w_1^{[1]\,T} \\ w_2^{[1]\,T} \\ \vdots \\ w_k^{[1]\,T} \end{bmatrix}}_{W\,\in\,\mathbb{R}^{k\times m}}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}}_{x\,\in\,\mathbb{R}^{m\times 1}}
+
\underbrace{\begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ \vdots \\ b_k^{[1]} \end{bmatrix}}_{b\,\in\,\mathbb{R}^{k\times 1}}
\tag{1}
$$

where $\theta$ is the hypothesis, $b^{[1]}$ is the bias vector associated with layer 1, $k$ is the number of neurons in each layer, $x^{(j)} = [x_1, x_2, \ldots, x_m]^T$ is the $j$th tank level measurement vector over $m$ hours including clean and bad data, and $W$ is a weight matrix. The calculations of each layer are performed via the following set of equations [26]
$$\theta = f\!\left(W^T x + b\right) \tag{2}$$

where $\theta \in \mathbb{R}^{l_i}$ and $l_i$ denotes the size of the $i$th layer, $x \in \mathbb{R}^{l_{i-1}}$ is the vector of input values, $W \in \mathbb{R}^{l \times l_{i-1}}$ is the matrix of weights, with $W^{[i]}$ corresponding to the inputs of the $i$th layer, and $b \in \mathbb{R}^{l_i}$ is the vector of biases, with $b^{[i]}$ corresponding to the $i$th layer. The function $f(\cdot)$ represents the activation function of each layer, chosen from one or a combination of the following linear and nonlinear functions
$$f\!\left(x^{[i]}\right) = \frac{1}{1 + e^{-x^{[i]}}} \tag{3}$$

$$f\!\left(x^{[i]}\right) = \begin{cases} x^{[i]}, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{4}$$

$$f\!\left(x^{[i]}\right) = \frac{e^{x^{[i]}}}{\sum_{k=1}^{M} e^{x^{[k]}}} \tag{5}$$
where (3)–(5) are known as sigmoid, ReLU, and softmax functions, respectively. A sample calculation of an element-wise nonlinear application of (3) in (2) and (1) for the jth tank level measurement would be
$$\theta^{(j)} = f\!\left(W^{[1]} x^{(j)} + b^{[1]}\right) = \frac{1}{1 + e^{-\left(W^{[1]} x^{(j)} + b^{[1]}\right)}} \tag{6}$$
where f is the sigmoid activation function of the 1st layer. The output of the neural network for the jth tank measurement, y ^ ( j ) , will include the predictions at the second layer such that
$$\hat{y}^{(j)} = f\!\left(W^{[2]} \theta^{(j)} + b^{[2]}\right) \tag{7}$$
Thereafter, the error between the predicted and actual output will be used to update and optimize the weight matrices using the back propagation algorithm, explained in the following subsection. It is noted that while sigmoid and ReLU activation functions correspond to the input and hidden layers considering the weighted sum of inputs to each neuron, the softmax activation function is associated with the output layer. In (5), M represents the number of output nodes. Following (5), the softmax activation function will satisfy f ( x 1 ) + f ( x 2 ) + + f ( x M ) = 1 .
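As a concrete illustration of Equations (2)–(7), the sketch below propagates one input vector through a small fully connected network with sigmoid hidden layers and a softmax output; the weights are stored here as (outputs × inputs) matrices so no explicit transpose is needed, and all dimensions are illustrative.

```python
import numpy as np

def sigmoid(z):                # Equation (3)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                   # Equation (4)
    return np.maximum(z, 0.0)

def softmax(z):                # Equation (5): outputs sum to 1
    e = np.exp(z - z.max(axis=0, keepdims=True))   # shifted for numerical stability
    return e / e.sum(axis=0, keepdims=True)

def forward(x, params, activations):
    """Propagate input x through the layers: theta = f(W x + b) per layer."""
    a = x
    cache = [a]
    for (W, b), f in zip(params, activations):
        a = f(W @ a + b)       # Equation (2) with W stored as (n_out, n_in)
        cache.append(a)
    return a, cache            # final activation is y_hat

# Illustrative 4-layer network: sigmoid in the hidden layers, softmax at the output
rng = np.random.default_rng(0)
params = [(rng.normal(0, 0.1, (8, 4)), np.zeros((8, 1))),
          (rng.normal(0, 0.1, (8, 8)), np.zeros((8, 1))),
          (rng.normal(0, 0.1, (4, 8)), np.zeros((4, 1)))]
x = rng.normal(0, 1, (4, 1))   # one hour of (normalized) tank levels
y_hat, cache = forward(x, params, [sigmoid, sigmoid, softmax])
```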

2.2. Back Propagation

The gradient (i.e., derivative) of the cost function is estimated via the back propagation procedure, which updates the weights of the neural network and enhances the training process [27]. While forward propagation proceeds from left to right, taking in the inputs and calculating the output of each layer, back propagation is directed from right to left, starting from the output nodes, to minimize the errors between the estimated and actual output values.
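A minimal sketch of this right-to-left pass is shown below, reusing the `forward` cache and `params` from the previous sketch; it assumes sigmoid hidden layers and the cross-entropy cost of Section 2.3 (for which the output-layer error reduces to ŷ − d), and is a generic textbook backpropagation rather than the authors' exact implementation.

```python
import numpy as np

def backprop(cache, params, d):
    """Backward pass: propagate the output error from right to left and return
    (dJ/dW, dJ/db) for each layer. `cache` is the list of layer activations
    [x, a1, ..., y_hat] produced by the forward pass; `d` is the label vector."""
    grads = []
    delta = cache[-1] - d                           # output error for the cross-entropy cost
    for layer in range(len(params) - 1, -1, -1):
        a_prev = cache[layer]                       # activation feeding this layer
        grads.insert(0, (delta @ a_prev.T, delta))  # gradients for W and b of this layer
        if layer > 0:
            W, _ = params[layer]
            a = cache[layer]
            delta = (W.T @ delta) * a * (1.0 - a)   # sigmoid derivative: a * (1 - a)
    return grads
```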

2.3. Cost Function, Regularization, and Momentum

In training a DNN, W and b in (2) are acquired by solving an optimization problem such that the difference between the estimated output $\hat{y}^{(j)}$ and the actual output $d^{(j)}$ of the $j$th tank measurements (the latter also being the set of labels for each input $x^{(j)}$) is minimal. In this model, the cross-entropy function of the $j$th training pair $(x^{(j)}, \hat{y}^{(j)})$ is used to define the cost function $C(\hat{y}^{(j)}, d^{(j)})$ such that [28]
$$C = -\frac{1}{M}\sum_{i=1}^{M}\left[ d^{(j)} \ln\!\left(\hat{y}^{(j)}\right) + \left(1 - d^{(j)}\right)\ln\!\left(1 - \hat{y}^{(j)}\right)\right] \tag{8}$$
In order to minimize the error between the estimated and the actual output values of the NN, the cost function should be minimized. However, this can come at the risk of overfitting. To tackle this issue, regularization is incorporated by adding the sum of squared weights, $\lambda \frac{1}{2}\|w\|^2$, to the cost function. The cost function along with the regularization term is minimized for the training set, as well as the cross-validation set, via the stochastic gradient descent (SGD) method. The SGD estimates the error for each layer and adjusts the weights accordingly. The term stochastic refers to the random selection of data to be processed for the gradient descent. Stochastic gradient descent allows for an efficient performance of the DNNs for large training sets, including the FDI cyberattacks on the measurements of a large-scale water distribution system.
$$\text{minimize} \;\; J \tag{9}$$

where

$$J = C\!\left(\hat{y}^{(j)}, d^{(j)}\right) + \lambda\,\frac{1}{2}\left\|W^{[i]}\right\|^2 \tag{10}$$
Following SGD, $\theta^{(j)} = [\theta^{(j)}]^{-} + \alpha \nabla J$, where $[\theta^{(j)}]^{-}$ is the $\theta^{(j)}$ of the previous step and $\alpha$ represents the learning rate. Here, the gradient $\nabla$ is the derivative of the cost function with respect to the output prediction and hypothesis. Subsequently, the weights in each layer $k$ are updated as
$$w_{i,j}^{[k]} \leftarrow w_{i,j}^{[k]} + \alpha\,\delta_i^{[k]} x_j^{[k]} + \lambda\, m\, w_{i,j}^{[k]} \tag{11}$$

where the $i,j$th element of the $k$th layer weight matrix is updated based on the selected learning rate $\alpha$ and the regularization term $\lambda$, and $\delta_i^{[k]}$ is the back-propagated error associated with node $i$ of layer $k$. The SGD calculates the error for each training sample and updates the weights immediately.
Moreover, to further enhance the training process, an advanced weight adjustment known as momentum, $m$, is implemented. Momentum guides the direction of the weight adjustment by adding a momentum term to the weight update process. The algorithm is expressed as [25]:
$$\Delta w_{i,j}^{[k]} = \alpha\,\delta_i^{[k]} x_j^{[k]} + \lambda\, m\, w_{i,j}^{[k]} \tag{12}$$

$$m_{i,j}^{[k]} = \Delta w_{i,j}^{[k]} + \beta\, m_{i,j}^{[k]-} \tag{13}$$

$$w_{i,j}^{[k]} = w_{i,j}^{[k]} + m_{i,j}^{[k]} \tag{14}$$

where $m_{i,j}^{[k]}$ is the momentum for element $i,j$ of the weight matrix of the $k$th layer, $\beta \in [0, 1]$ is a positive constant that adjusts the momentum rate, and $m_{i,j}^{[k]-}$ is the momentum value from the previous step. The proposed optimized deep learning procedure is summarized in Algorithm 1 above.
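The sketch below combines the regularized cost of (9)–(10) with a momentum update in the spirit of (12)–(14); the sign convention follows standard gradient descent on the cost (the paper's delta-rule form folds the sign into the error term), and the hyper-parameter values shown are illustrative rather than the tuned ones.

```python
import numpy as np

def sgd_momentum_step(params, grads, velocities, alpha=0.1, lam=0.01, beta=0.9):
    """One SGD step with L2 regularization and momentum.

    `params` and `velocities` are lists of (W, b) / (vW, vb) arrays per layer,
    and `grads` holds the (dJ/dW, dJ/db) pairs from the backward pass."""
    new_params, new_velocities = [], []
    for (W, b), (dW, db), (vW, vb) in zip(params, grads, velocities):
        dW_reg = dW + lam * W                     # gradient of J including 0.5 * lam * ||W||^2
        vW = beta * vW - alpha * dW_reg           # accumulate momentum for the weights
        vb = beta * vb - alpha * db               # accumulate momentum for the biases
        new_params.append((W + vW, b + vb))       # move along the accumulated direction
        new_velocities.append((vW, vb))
    return new_params, new_velocities
```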

2.4. Statistical Metrics

The performance of the NN algorithm in detecting cyberattacks on the water distribution system is commonly evaluated in terms of statistical measures, including accuracy, precision, recall, and F1-score, expressed as follows.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{16}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{17}$$

$$F_1\text{-score} = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{18}$$
Accuracy in (15) calculates the correctly estimated outputs over the entire test data set, where TP, TN, FP, and FN denote the true positive, true negative, false positive, and false negative rates, respectively. Precision in (16) evaluates the ratio of accurately identified cyberattacks to all detected attacks. Recall in (17) estimates the number of accurately detected cyberattacks over all attacks (those correctly identified plus those falsely undetected). The F1-score in (18) is the harmonic mean of the precision and recall values. Furthermore, the receiver operating characteristic (ROC) curve is plotted to evaluate the performance of the classification framework for various scenarios in terms of the true positive rate (TPR) and false positive rate (FPR) such that
$$TPR = \frac{TP}{TP + FN} \tag{19}$$

$$FPR = \frac{FP}{TN + FP} \tag{20}$$

The ROC curve is summarized by the area under the curve (AUC), $AUC = \int_0^1 TPR \; d(FPR)$, such that an AUC of 1 denotes 100% correct predictions.
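The metrics in (15)–(20) can be computed directly from the binary predictions and labels, as in the sketch below; the ROC sweep over 101 thresholds and the trapezoidal integration of the AUC are illustrative choices, not necessarily those used to produce the figures.

```python
import numpy as np

def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from binary labels (Eqs. 15-18)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else float("nan")
    return accuracy, precision, recall, f1

def auc_from_scores(y_true, scores, thresholds=np.linspace(0.0, 1.0, 101)):
    """Trace the ROC curve (TPR vs. FPR, Eqs. 19-20) and integrate the area."""
    tprs, fprs = [], []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1)); fn = np.sum((y_pred == 0) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0)); tn = np.sum((y_pred == 0) & (y_true == 0))
        tprs.append(tp / (tp + fn) if (tp + fn) else 0.0)
        fprs.append(fp / (fp + tn) if (fp + tn) else 0.0)
    order = np.argsort(fprs)                       # integrate over increasing FPR
    return np.trapz(np.array(tprs)[order], np.array(fprs)[order])
```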

3. Case Studies

The proposed deep learning-based detection algorithm (described in Algorithm 1) is evaluated using partial data sets (80 out of 365 days) of the C-Town network designed for the BATADAL, a WDS cybersecurity competition held at the World Environmental and Water Resources Congress in 2017 [29]. Herein, the data related to the level measurements of tanks 1, 2, 3, 4, and 7 are considered as the input data under three different attack scenarios: attack 1 on tank 7 only, attack 2 on tank 1 only, and attacks 1 and 2 on tanks 1 and 7 simultaneously. This means that the SCADA readings associated with the tanks other than 1 and 7 are always considered clean data. The total number of measurements is 2119 for each tank, and hence the input samples and their associated labels $X = \{x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, x^{(7)}\}$ and $D = \{d^{(1)}, d^{(2)}, d^{(3)}, d^{(4)}, d^{(7)}\}$ are $2119 \times 4$ matrices. Out of these measurements, 2017 are clean and 102 contain false data injections; thus $[X_{clean}\; D_{clean}]$ is a $2017 \times 8$ matrix and $[X_{bad}\; D_{bad}]$ is a $102 \times 8$ matrix, where $D_{clean} = [0]_{2017 \times 4}$ and $D_{bad} = [1]_{102 \times 4}$. Both clean and bad data sets are shuffled and then split 80:10:10 for training, cross-validation, and testing, respectively [30]. Random initialization is used to estimate the initial W values. The input data, i.e., hourly tank levels, are normalized as $(x^{(j)} - x_{mean}^{(j)})/x_{std}^{(j)}$, where $x_{mean}^{(j)}$ and $x_{std}^{(j)}$ are the mean and standard deviation of the $j$th tank's level measurements, respectively.
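A sketch of the data preparation described above is given below: z-score normalization of each tank's level series followed by a shuffled 80:10:10 split; the array shapes, random seed, and placement of the bad-data rows are illustrative.

```python
import numpy as np

def prepare_data(X, D, seed=0):
    """Normalize each tank's readings to zero mean / unit variance and split
    the shuffled samples 80:10:10 into training, cross-validation, and testing."""
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)        # (x - x_mean) / x_std per tank
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_norm))                   # shuffle clean and bad rows together
    n_train = int(0.8 * len(idx))
    n_cv = int(0.1 * len(idx))
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (X_norm[train], D[train]), (X_norm[cv], D[cv]), (X_norm[test], D[test])

# Illustrative shapes: 2119 hourly samples, 4 tank-level columns, binary labels
X = np.random.rand(2119, 4)
D = np.zeros((2119, 4)); D[:102] = 1                     # 102 rows flagged as bad data
(train_set, cv_set, test_set) = prepare_data(X, D)
```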

3.1. Impact of Activation Functions and Number of Hidden Layers

The impacts of the number of hidden layers, as well as of the sigmoid, ReLU, and softmax activation functions, on the training errors and statistical metrics of the proposed detection algorithms are investigated.
The training errors represent the root-mean-square errors in the training and cross-validation data sets. Figure 2 demonstrates the training errors for a shallow neural network with 1 hidden layer (3 layers overall) using the sigmoid function. Figure 3 shows the errors for a 4-layer DNN, where the left column shows results with sigmoid and softmax functions for the first and second hidden layers, respectively; the middle column shows results using ReLU and softmax in the first and second hidden layers, respectively; and the right column shows results using the ReLU function in both hidden layers.
Figure 4 demonstrates the errors for a 5-layer DNN, where the left column shows results using sigmoid for the first and second hidden layers and softmax for the third hidden layer, and the right column shows results using ReLU in the first hidden layer, sigmoid in the second layer, and softmax in the third hidden layer. The DNN was not trainable when ReLU was used for more than one hidden layer (results not shown).
As can be seen in these figures, increasing the hidden layers with sigmoid as the primary activation function significantly enhances the convergence of the errors to zero. Moreover, more layers resolve the overfitting issues observed in Figure 2a,b.
However, in the 4-layer DNN associated with the attacks 1 and 2 scenario, overfitting occurs when ReLU is used in the first hidden layer. The worst overfitting occurs in the 5-layer DNN associated with attack 1 when ReLU is used in the first hidden layer. These results indicate that while adding more hidden layers to the DNN can prevent and resolve overfitting, the selection of the activation function is equally critical.
Figure 5, Figure 6 and Figure 7 present the AUC of the proposed model in the training, cross-validation, and testing data sets under the three attack scenarios for various hidden layers and activation functions.
Despite the overfitting observed in Figure 2, the ROC of the 3-layer DNN verifies the efficiency of the DNN in detecting cyberattacks in all three scenarios. This is justified because the average error in Figure 2 is on the scale of $10^{-6}$; thus, overfitting does not significantly affect the performance of the DNN. The accuracy of the DNN in the training, CV, and testing data sets is 1.0.
Figure 6 indicates the effectiveness of the 4-layer DNN-based detection model, where the bad data are accurately detected in all attack scenarios regardless of the selected activation function, except in attacks 1 and 2 when ReLU is used in both hidden layers. In this case, the proposed model detects all the bad data with 100% accuracy in the training data sets of both tanks 1 and 7, but it detects 92% and 95% of the bad data in the CV and testing data sets of tank 7's measurements (LT7), respectively, and 97% of the bad data in the CV data set associated with tank 1's measurements (LT1). These findings agree with the overfitting observed in Figure 3 when using ReLU in both layers under the attacks 1 and 2 scenario.
The results demonstrated in Figure 7 suggest that for the attack 1 scenario, the 5-layer DNN offers an effective detection method only when sigmoid is used for the first two hidden layers. For the attack 2 scenario, the 5-layer DNN-based detection model is effective regardless of whether sigmoid or ReLU is used in the first two hidden layers. For a simultaneous attack on both tanks (i.e., attacks 1 and 2), the detection model can identify the bad data of all data sets on tank 7's measurements, but it can only detect 96% of the bad data in the CV data set related to tank 1's measurements, using either activation function in the first two hidden layers.
Numerical results tabulated in Table 1 indicate that the proposed model can detect the bad data in tank 7's measurements with 100% accuracy and precision even using a shallow neural network containing 3 layers. However, the same performance can be achieved with the 4- and 5-layer DNNs using input data containing 1000 fewer samples than the 3-layer model, except for the 5-layer DNN using ReLU in the first 2 hidden layers. This agrees with Figure 4 and Figure 7, where, for the Attack 1–ReLU1 case, the training and validation errors do not converge to zero and the AUC values are very weak, respectively. The accuracy and precision of the 5LR DNN remain as low as 20% and 30%, respectively, even when the sample size is increased to that used in 3L (data not shown). Similar results are obtained for the attack 2 scenario, presented in Table 2, where a model as shallow as the 3-layer neural network is able to identify all the FDIs in tank 1's measurements, but the number of input samples required to achieve the same performance becomes significantly smaller as the number of hidden layers of the DNN is increased. For the attack scenario where the readings of both tanks 1 and 7 are tampered with simultaneously, the results in Table 3 indicate lower precision and accuracy for the 3-layer neural network model compared to those for the attack 1 and attack 2 scenarios. Similar performance is achieved with an extra hidden layer but a much smaller sample size, using sigmoid and softmax functions in the first and second hidden layers (refer to 4LS in Table 3). However, 4-layer DNNs with sigmoid and ReLU in the first and second hidden layers, or with ReLU in all hidden layers, demonstrate 100% accuracy and precision with input samples smaller than 4LS. The number of input samples required to accomplish 100% efficacy in detecting bad data in this attack scenario decreases even further as the number of layers is increased to 5 and the sigmoid function in the first hidden layer is replaced with ReLU (refer to 5LS and 5LR in Table 3).

3.2. Impact of Noise

In this case study, the performance of the proposed DNN in identifying cyberattacks in the presence of noise in the readings received by SCADA is investigated. Measurement noise gives the tank readings stochastic characteristics, which can potentially worsen the prediction accuracy of the DNN. The noise is characterized as a Gaussian distribution with zero mean and a specific standard deviation ($\sigma$) [31]. The level of noise in the tank readings is expressed as a signal-to-noise ratio (SNR), defined as the ratio of the noise-free signal to the noise, such that $SNR_{dB} = 20 \cdot \log\!\left(\frac{1}{\sigma}\right)$. In this case study, the noisy measurements are generated as follows:
$$L_{i,N}^{(j)} = (1 + M_N)\cdot L_i^{(j)} \tag{21}$$

where $L_{i,N}^{(j)}$ is the $i$th reading of the $j$th tank's measurements including noise (m), $M_N$ is the noise magnitude randomly generated from the Gaussian distribution $N(0, \sigma^2)$ (p.u.), and $L_i^{(j)}$ is the original $i$th reading of the $j$th tank's measurements without noise (m).
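A sketch of the noise model in (21) is given below: multiplicative Gaussian noise with zero mean and standard deviation σ is applied to a clean tank-level series. Mapping the quoted noise magnitudes (e.g., 10%) directly to σ = 0.10, the base-10 logarithm in the SNR formula, and the example profile are assumptions made for illustration.

```python
import numpy as np

def add_measurement_noise(levels, sigma, seed=0):
    """Apply Eq. (21): L_noisy = (1 + M_N) * L, with M_N ~ N(0, sigma^2) in p.u."""
    rng = np.random.default_rng(seed)
    m_n = rng.normal(0.0, sigma, size=levels.shape)   # per-reading noise magnitude
    return (1.0 + m_n) * levels

def snr_db(sigma):
    """Signal-to-noise ratio in dB: SNR_dB = 20 * log10(1 / sigma)."""
    return 20.0 * np.log10(1.0 / sigma)

# Example: 10% noise (sigma = 0.10) on a clean 24-hour tank-level profile (in meters)
clean = np.linspace(2.0, 5.0, 24)
noisy = add_measurement_noise(clean, sigma=0.10)
```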
A sample input data set for the studied tanks including clean and bad data without and with 10% noise is plotted in Figure 8.
In Figure 9, Figure 10 and Figure 11, a variety of noise magnitudes are generated to evaluate the resilience and reliability of various DNN architectures in detecting cyberattacks. Although measurement noise levels in practical applications often do not exceed 30%, higher noise magnitudes are also considered to stress-test the proposed DNNs. Results associated with noise magnitudes below 10% are not presented because they had no impact on the performance of the DNNs. The DNNs in this case study contain only the sigmoid function in the 3-layer network and sigmoid–softmax combinations in the 4- and 5-layer networks, where softmax is used only in the last hidden layer.
For the shallow neural network, $M_N \le 10\%$ does not impact the performance of the detection model under either attack scenario. However, when the noise level is increased to 15%, the AUC of prediction drops to 96% in both attack scenarios. Moreover, the numerical results shown in Table 4 and Table 5 show that accuracy and recall also deteriorate from 1.00 to 0.99 and 0.67 under attack 1, and to 0.99 and 0.50 under attack 2, respectively. This is also true when the noise level is further increased to 30%.
However, when the number of hidden layers is increased, the DNNs become more resilient against higher levels of measurement noise. For instance, the results in Figure 10 and Figure 11 indicate that noise levels up to 30% have no impact on the performance of the 4- and 5-layer DNNs under attack 1, and the AUC of predictions becomes only slightly impaired in the presence of 30% noise under attack 2. Moreover, the results enumerated in Table 4 and Table 5 indicate that noise levels up to 50% do not affect the precision, accuracy, and recall of these DNNs under any of the studied attack scenarios. As the noise level reaches 50% or higher, the performance of the proposed DNNs can no longer be trusted. For instance, although Figure 10 and Figure 11 indicate excellent AUC in the training and CV data sets and more than 90% AUC in prediction for the two studied attack scenarios, the numerical results presented in Table 4 and Table 5 do not suggest the same. As such, for such high measurement noise levels, operators ought to verify the performance of the DNN-based detection models with ROC graphs as well as statistical measures.
A comparison between the results shown in Table 3 and the best performance reported by previous studies that also used the BATADAL data, tabulated in Table 6, confirms a much better detection performance by the 5LS and 5LR DNNs proposed in this work. Furthermore, the training data for the 5-layer DNN proposed in the present work are collected from hourly readings over 21 days, as opposed to the 365 days used in previous studies. Even for the shallowest DNN developed in the present study (i.e., 3-layer), readings associated with only 80 days of the BATADAL database are used as input data. Finally, tuning the activation functions and the architecture of the proposed DNNs, along with regularization and momentum techniques to efficiently update the weights, has allowed us to develop an optimal and resilient detection model that is nearly insensitive to noisy measurements or other parameters. In contrast, the model proposed in [18] was highly sensitive to the error reconstruction threshold (i.e., θ), while [19] was sensitive to high background noise in the sensors.

4. Conclusions

A deep learning-based detection algorithm is proposed for false data injection cyberattacks on the tank level measurements of a water distribution system. The findings indicate that, with proper training of the deep learning algorithm, cyberattacks under various scenarios can be detected at rates of 99% and higher. Moreover, the results show that the size of the training data sets can be significantly reduced as the number of hidden layers of the deep neural networks (DNNs) is increased. This is especially critical in developing detection models for large-scale water systems, where there are numerous readings to be processed. Furthermore, the results stipulate the importance of choosing the proper activation function for different hidden layers of DNNs. In particular, in deep neural networks containing four or more layers, the accuracy and precision of the detection model are strongly correlated with the type of activation function used in each hidden layer. It is also demonstrated that the proposed detection algorithm can successfully detect false data injections even when measurement noise of up to 30% magnitude is present in the system. Future work will focus on the development of unsupervised learning algorithms to detect stealthy false data injections aimed at tanks' and pumps' measurements.

Author Contributions

Conceptualization, F.M. and J.K.; Data curation, F.M. and J.K.; Formal analysis, J.K.; Investigation, F.M. and J.K.; Methodology, Faegheh Moazeni; Software, F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was in part funded by Lehigh University’s Faculty Innovation Grant (FIG), 2022 award, titled “AI-BASED CYBERATTACK DETECTION MODEL FOR SMART WATER DISTRIBUTION SYSTEMS”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Perlroth, N.; Sanger, D.E. Cyberattacks put Russian fingers on the switch at power plants, US says. New York Times, 15 March 2018.
  2. Rasekh, A.; Hassanzadeh, A.; Mulchandani, S.; Modi, S.; Banks, M.K. Smart water networks and cyber security. J. Water Resour. Plan. Manag. 2016, 142, 01816004.
  3. Hassanzadeh, A.; Rasekh, A.; Galelli, S.; Aghashahi, M.; Taormina, R.; Ostfeld, A.; Banks, M.K. A review of cybersecurity incidents in the water sector. J. Environ. Eng. 2020, 146, 03120003.
  4. Krutys, P.; Gomolka, Z.; Twarog, B.; Zeslawska, E. Synchronization of the vector state estimation methods with unmeasurable coordinates for intelligent water quality monitoring systems in the river. J. Hydrol. 2019, 572, 352–363.
  5. Arsene, C.T.; Gabrys, B. Mixed simulation-state estimation of water distribution systems based on a least squares loop flows state estimator. Appl. Math. Model. 2014, 38, 599–619.
  6. Kazemi, Z.; Safavi, A.A.; Naseri, F.; Urbas, L.; Setoodeh, P. A Secure Hybrid Dynamic State Estimation Approach for Power Systems Under False Data Injection Attacks. IEEE Trans. Ind. Inform. 2020, 16, 7275–7286.
  7. Wang, Q.; Tai, W.; Tang, Y.; Ni, M. Review of the false data injection attack against the cyber-physical power system. IET Cyber-Phys. Syst. Theory Appl. 2019, 4, 101–107.
  8. Bobba, R.B.; Rogers, K.M.; Wang, Q.; Khurana, H.; Nahrstedt, K.; Overbye, T.J. Detecting false data injection attacks on dc state estimation. In Proceedings of the Preprints of the First Workshop on Secure Control Systems, CPSWEEK, Stockholm, Sweden, 12 April 2010; Volume 2010.
  9. Moazeni, F.; Khazaei, J. MINLP Modeling for Detection of SCADA Cyberattacks in Water Distribution Systems. In Proceedings of the World Environmental and Water Resources Congress 2020: Hydraulics, Waterways, and Water Distribution Systems Analysis. American Society of Civil Engineers, Reston, VA, USA, 17–21 May 2020; pp. 340–350.
  10. Housh, M.; Ohar, Z. Model-based approach for cyber-physical attack detection in water distribution systems. Water Res. 2018, 139, 132–143.
  11. Nicolaou, N.; Eliades, D.G.; Panayiotou, C.; Polycarpou, M.M. Reducing vulnerability to cyber-physical attacks in water distribution networks. In Proceedings of the 2018 International Workshop on Cyber-Physical Systems for Smart Water Networks (CySWater), Porto, Portugal, 10–13 April 2018; pp. 16–19.
  12. Shin, S.; Lee, S.; Burian, S.J.; Judi, D.R.; McPherson, T. Evaluating Resilience of Water Distribution Networks to Operational Failures from Cyber-Physical Attacks. J. Environ. Eng. 2020, 146, 04020003.
  13. Taormina, R.; Galelli, S.; Tippenhauer, N.O.; Salomons, E.; Ostfeld, A. Characterizing cyber-physical attacks on water distribution systems. J. Water Resour. Plan. Manag. 2017, 143, 04017009.
  14. Al-Abassi, A.; Karimipour, H.; Dehghantanha, A.; Parizi, R.M. An ensemble deep learning-based cyber-attack detection in industrial control system. IEEE Access 2020, 8, 83965–83973.
  15. Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419.
  16. Chen, D.; Wawrzynski, P.; Lv, Z. Cyber Security in Smart Cities: A Review of Deep Learning-based Applications and Case Studies. Sustain. Cities Soc. 2020, 66, 102655.
  17. Diro, A.A.; Chilamkurti, N. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Gener. Comput. Syst. 2018, 82, 761–768.
  18. Taormina, R.; Galelli, S. Deep-learning approach to the detection and localization of cyber-physical attacks on water distribution systems. J. Water Resour. Plan. Manag. 2018, 144, 04018065.
  19. Abokifa, A.A.; Haddad, K.; Lo, C.; Biswas, P. Real-time identification of cyber-physical attacks on water distribution systems via machine learning–based anomaly detection techniques. J. Water Resour. Plan. Manag. 2019, 145, 04018089.
  20. Choi, Y.H.; Sadollah, A.; Kim, J.H. Improvement of Cyber-Attack Detection Accuracy from Urban Water Systems Using Extreme Learning Machine. Appl. Sci. 2020, 10, 8179.
  21. Ramotsoela, D.T.; Hancke, G.P.; Abu-Mahfouz, A.M. Attack detection in water distribution systems using machine learning. Hum.-Centric Comput. Inf. Sci. 2019, 9, 13.
  22. Bakalos, N.; Voulodimos, A.; Doulamis, N.; Doulamis, A.; Ostfeld, A.; Salomons, E.; Caubet, J.; Jimenez, V.; Li, P. Protecting water infrastructure from cyber and physical threats: Using multimodal data fusion and adaptive deep learning to monitor critical systems. IEEE Signal Process. Mag. 2019, 36, 36–48.
  23. Zou, X.Y.; Lin, Y.L.; Xu, B.; Guo, Z.B.; Xia, S.J.; Zhang, T.Y.; Wang, A.Q.; Gao, N.Y. A Novel Event Detection Model for Water Distribution Systems Based on Data-Driven Estimation and Support Vector Machine Classification. Water Resour. Manag. 2019, 33, 4569–4581.
  24. Ahmed, C.M.; Raman, G.; Mathur, A.P. Challenges in machine learning based approaches for real-time anomaly detection in industrial control systems. In Proceedings of the 6th ACM on Cyber-Physical System Security Workshop, Taipei, Taiwan, 6 October 2020; pp. 23–29.
  25. Kim, P. Matlab deep learning. Mach. Learn. Neural Netw. Artif. Intell. 2017, 130, 21.
  26. Vinayakumar, R.; Alazab, M.; Soman, K.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep learning approach for intelligent intrusion detection system. IEEE Access 2019, 7, 41525–41550.
  27. Goodfellow, I.; Bengio, Y.; Courville, A. 6.5 Back-propagation and other differentiation algorithms. In Deep Learning; 2016; Available online: https://mitpress.mit.edu/books/deep-learning (accessed on 8 June 2022).
  28. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of the COMPSTAT’2010, Paris, France, 22–27 August 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 177–186.
  29. Taormina, R.; Galelli, S.; Tippenhauer, N.O.; Salomons, E.; Ostfeld, A.; Eliades, D.G.; Aghashahi, M.; Sundararajan, R.; Pourahmadi, M.; Banks, M.K.; et al. Battle of the attack detection algorithms: Disclosing cyber attacks on water distribution networks. J. Water Resour. Plan. Manag. 2018, 144, 04018048.
  30. Pandey, R.K.; Dahiya, A.K.; Pandey, A.K.; Mandal, A. Optimized deep learning model assisted pressure transient analysis for automatic reservoir characterization. Pet. Sci. Technol. 2022, 40, 659–677.
  31. Brown, M.; Biswal, M.; Brahma, S.; Ranade, S.J.; Cao, H. Characterizing and quantifying noise in PMU data. In Proceedings of the 2016 IEEE Power and Energy Society General Meeting (PESGM), Boston, MA, USA, 17–21 July 2016; pp. 1–5.
Figure 1. The proposed learning-based detection algorithm is illustrated. The top left box demonstrates a water distribution benchmark, used to generate hourly data points. On the bottom left, in an "offline mode", the historical data are split into "training", "cross-validation" (CV), and "testing" data sets. The training and CV data sets are introduced to the proposed deep learning algorithm to train the hyper-parameters and update the weights, while the testing data set is used to evaluate the performance of the trained network. Once the algorithm is effectively trained, it will be integrated in the SCADA system to identify the FDI cyberattacks in the WDS in real time.
Figure 2. Average of training and cross validation errors for a shallow neural network, including 1 hidden layer (overall of 3 layers), using sigmoid function are demonstrated. Subplots refer to the detection algorithm against attack 1, attack 2, and simultaneous attacks 1 and 2.
Figure 3. Average of training and cross validation errors for a 4-layer DNN (i.e., 2 hidden layers) are exhibited. The left column demonstrates the results associated with sigmoid and softmax functions for the first and second hidden layers, respectively; the middle column describes the results of using ReLU and softmax in the first and second hidden layers, respectively; and on the right column, the results of using ReLU function in both hidden layers are illustrated. From the top, the top row subplots refer to the detection algorithm against attack 1, the middle row is associated with attack 2, and the bottom row explains detection of simultaneous attacks 1 and 2.
Figure 4. Average of training and cross validation errors for a 5-layer DNN are demonstrated. The left column represents the results associated with using sigmoid for the first and the second hidden layers and softmax for the third hidden layer, while the right column shows the results of using ReLU in the first hidden layer, sigmoid in the second layer, and softmax in the third hidden layer. From the top, the top row subplots refer to the detection algorithm against attack 1, the middle row is associated with attack 2, and the bottom row explains detection of simultaneous attacks 1 and 2.
Figure 5. The ROC of various attack scenarios without any measurement noise, using shallow neural network detection algorithm.
Figure 6. The ROC of various attack scenarios without any measurement noise, using the 4-layer DNN detection algorithm.
Figure 7. The ROC of various attack scenarios without any measurement noise, using the 5-layer DNN detection algorithm.
Figure 8. Clean and bad data measurements of tank 1 (LT1), tank 2 (LT2), tank 3 (LT3), and tank 4 (LT4) considering 10% noise.
Figure 9. The ROC of various attack scenarios considering measurement noise levels.
Figure 10. The ROC of various attack scenarios considering measurement noise levels.
Figure 11. The ROC of various attack scenarios considering measurement noise levels.
Table 1. Attack 1 detection results without measurement noise.
| DNN | Prec. | Acc. | Rec. | F1 | TP | FP | TN | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3L | 1.00 | 1.00 | 1.00 | 1.00 | 4 | 0 | 766 | 0 |
| 4LS | 1.00 | 1.00 | 1.00 | 1.00 | 2 | 0 | 268 | 0 |
| 4LSR | 1.00 | 1.00 | 1.00 | 1.00 | 5 | 0 | 265 | 0 |
| 4LRR | 1.00 | 1.00 | 1.00 | 1.00 | 6 | 0 | 264 | 0 |
| 5LS | 1.00 | 1.00 | 1.00 | 1.00 | 3 | 0 | 267 | 0 |
| 5LR | NaN | 0.98 | 0 | NaN | 1 | 0 | 264 | 6 |
Table 2. Attack 2 detection results without measurement noise.
| DNN | Prec. | Acc. | Rec. | F1 | TP | FP | TN | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3L | 1.00 | 1.00 | 1.00 | 1.00 | 1 | 0 | 769 | 0 |
| 4LS | 1.00 | 1.00 | 1.00 | 1.00 | 5 | 0 | 265 | 0 |
| 4LSR | 1.00 | 1.00 | 1.00 | 1.00 | 5 | 0 | 265 | 0 |
| 4LRR | 1.00 | 1.00 | 1.00 | 1.00 | 7 | 0 | 263 | 0 |
| 5LS | 1.00 | 1.00 | 1.00 | 1.00 | 7 | 0 | 114 | 0 |
| 5LR | 1.00 | 1.00 | 1.00 | 1.00 | 1 | 0 | 120 | 0 |
Table 3. Attack 1&2 detection results without measurement noise.
| DNN | Prec. | Acc. | Rec. | F1 | TP | FP | TN | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3L | 0.89 | 0.99 | 1.00 | 0.94 | 8 | 1 | 1088 | 0 |
| 4LS | 0.87 | 0.99 | 0.87 | 0.87 | 7 | 1 | 528 | 1 |
| 4LSR | 1.00 | 1.00 | 1.00 | 1.00 | 2 | 0 | 381 | 0 |
| 4LRR | 1.00 | 1.00 | 1.00 | 1.00 | 2 | 0 | 381 | 0 |
| 5LS | 1.00 | 1.00 | 1.00 | 1.00 | 8 | 0 | 234 | 0 |
| 5LR | 1.00 | 1.00 | 1.00 | 1.00 | 1 | 0 | 164 | 0 |
Table 4. Attack 1 detection results considering measurement noise.
| M_N | 3L Prec. | 3L Acc. | 3L Rec. | 4L Prec. | 4L Acc. | 4L Rec. | 5L Prec. | 5L Acc. | 5L Rec. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10% | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15% | 1.00 | 0.99 | 0.67 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 25% | 1.00 | 0.99 | 0.20 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 30% | 1.00 | 0.99 | 0.50 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.67 |
| 50% | - | - | - | 1.00 | 0.99 | 0.33 | NaN | 0.99 | 0 |
| 60% | - | - | - | 1.00 | 0.99 | 0.33 | - | - | - |
Table 5. Attack 2 detection results considering measurement noise.
| M_N | 3L Prec. | 3L Acc. | 3L Rec. | 4L Prec. | 4L Acc. | 4L Rec. | 5L Prec. | 5L Acc. | 5L Rec. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10% | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15% | 1.00 | 0.99 | 0.50 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 25% | 1.00 | 0.99 | 0.67 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 30% | NaN | 0.99 | 0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 50% | - | - | - | 1.00 | 0.98 | 0.40 | NaN | 1.00 | NaN |
| 60% | - | - | - | NaN | 0.99 | 0 | - | - | - |
Table 6. Comparison with other algorithms.
| Ref. | Training Size | Performance | Sensitivity |
| --- | --- | --- | --- |
| [18] | 365 days | F1 = 0.897 | High to θ |
| [19] | 365 days | AUC = 0.953 | High to noise |
| [20] | 365 days | F1 = 0.806 | NA |
| [22] | 365 days | F1 = 0.852 | NA |
| Proposed | 21 days | F1 = 1.00 | Low |