4.1. Experimental Setup and Attack Implementation
To evaluate the different applications of the proposed network intrusion and anomaly detection system, a transmission power system model based on [29] was implemented in a real-time simulation engine using HYPERSIM (OPAL-RT). The grid topology consisted of 6 machines, 17 busbars, and 26 branches divided into a southern and a northern subgroup and interconnected by 5 inter-area transmission lines. The overall experimental test setup is given in Figure 8.
PMU slaves were implemented at all busbars to transmit frequency and voltage measurements using the IEEE C37.118 protocol at a fixed reporting rate of 25 f.p.s. One additional RTU slave was implemented at a single station to transmit voltage, current, and active and reactive power measurements via the IEC 60870-5-104 protocol with a fixed reporting interval of 2 s. A commercial gateway was used to convert the SCADA telegrams into IEC 61850-8-1 (MMS) reports.
On the adversary side, a MITM attack was implemented to eavesdrop on and manipulate arbitrary protocol information in the exchanged PMU and MMS network packets. The MITM attack used ARP spoofing to redirect the network packets between the control center and the substation level to the adversary. The adversary decoded the network packets, overwrote specific protocol information (e.g., measurement values), and forwarded the manipulated network packets to corrupt important monitoring or control applications. For simplicity, no detailed attacker model was assumed in this work, and no prior information about the system topology or historical measurements was available to the adversary. For the IEEE C37.118 protocol, the implemented attacks included various manipulations of frequency and voltage phasor information, SOC timestamps, time quality flags, as well as ID-codes or station names in transmitted DATA frames. This goes beyond related investigations [15,23,24,26], which focused on data replays, packet drops, and timing attacks. Regarding the IEC 61850-8-1 (MMS) protocol, the attacker could compromise integer-based data attribute values and the time of entry in transmitted MMS reports.
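To make the manipulated header fields more concrete, the following minimal sketch decodes the common header of a C37.118 frame (SYNC, FRAMESIZE, IDCODE, SOC, FRACSEC) and verifies the trailing CRC-CCITT checksum. It is an illustration only and not the implementation used in the experiments: the helper names `crc_ccitt`, `parse_common_header`, and `build_frame` are hypothetical, and the frame is synthetic with an arbitrary payload.

```python
import struct


def crc_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-CCITT (polynomial 0x1021, initial value 0xFFFF, no final XOR)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_frame(idcode: int, soc: int, fracsec: int, payload: bytes = b"") -> bytes:
    """Build a synthetic data frame (SYNC 0xAA01: data frame, version 1)."""
    size = 14 + len(payload) + 2  # common header + payload + CHK
    body = struct.pack(">HHHII", 0xAA01, size, idcode, soc, fracsec) + payload
    return body + struct.pack(">H", crc_ccitt(body))


def parse_common_header(frame: bytes) -> dict:
    """Decode the fields shared by all C37.118 frame types and check the CRC."""
    if len(frame) < 16:
        raise ValueError("frame too short")
    sync, framesize, idcode, soc, fracsec = struct.unpack(">HHHII", frame[:14])
    if sync >> 8 != 0xAA:
        raise ValueError("missing 0xAA sync byte")
    if framesize != len(frame):
        raise ValueError("FRAMESIZE does not match the received length")
    (chk,) = struct.unpack(">H", frame[-2:])
    return {
        "frame_type": (sync >> 4) & 0x7,  # 0 = data frame
        "version": sync & 0xF,
        "idcode": idcode,
        "soc": soc,                       # second-of-century timestamp
        "time_quality": fracsec >> 24,    # upper 8 bits of FRACSEC
        "fraction": fracsec & 0xFFFFFF,   # lower 24 bits of FRACSEC
        "crc_ok": chk == crc_ccitt(frame[:-2]),
    }


frame = build_frame(idcode=7, soc=1_600_000_000, fracsec=123, payload=b"\x00\x01")
print(parse_common_header(frame))
```

A checksum test of this kind only reveals frames that were altered without updating the CHK field. An adversary who re-encodes the frame can recompute the checksum, which is one reason to analyze the traffic and the measurements themselves.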
4.3. Attack Detection via Network Traffic Analysis
The evaluation of the anomaly-based NIDS focused on the number of abnormal network packets detected by the OC-SVM model in corrupted and noncorrupted network traffic. Figure 10 shows the OC-SVM predictions and raw scores (results of the decision function) for an exemplary baseline network excerpt of about 2 s.
As can be seen, no network anomalies were detected for the baseline traffic, i.e., all network packets were classified as normal. In contrast, Figure 11 shows the OC-SVM prediction results for network traffic corrupted by manipulating PMU frequency values within the MITM attack.
In that case, some of the network packets were detected as abnormal, and the corresponding score values decreased. The fraction of detected network anomalies in the total traffic mainly depends on the chosen hyperparameters of the OC-SVM. This is illustrated in Figure 12, which compares this fraction for the baseline traffic during training/validation and the manipulated traffic during testing for different RBF kernel coefficients.
As can be seen, high kernel coefficient values led to an increase in the number of abnormal network packets detected in both test data sets, while the number of detected normal network packets during training/validation remained almost constant. For kernel coefficients , a good separation between normal and abnormal network packets was achieved.
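The dependence of the anomaly fraction on the RBF kernel coefficient can be illustrated with a small sketch based on scikit-learn's `OneClassSVM`. Everything in it is assumed for illustration: the two-dimensional synthetic features stand in for the actual packet features, and the value of `nu`, the list of kernel coefficients, and the helper names `anomaly_fraction` and `sweep_gamma` are hypothetical. It does not reproduce the results of Figure 12.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical 2-D packet features: baseline traffic around the origin,
# manipulated traffic shifted away from it.
X_train = rng.normal(0.0, 1.0, size=(500, 2))
X_val = rng.normal(0.0, 1.0, size=(200, 2))
X_attack = rng.normal(6.0, 1.0, size=(200, 2))


def anomaly_fraction(model: OneClassSVM, X: np.ndarray) -> float:
    """Share of samples predicted as abnormal (predict() returns -1)."""
    return float(np.mean(model.predict(X) == -1))


def sweep_gamma(gammas, nu: float = 0.01) -> dict:
    """Fit one OC-SVM per kernel coefficient and collect anomaly fractions."""
    results = {}
    for gamma in gammas:
        model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train)
        results[gamma] = (
            anomaly_fraction(model, X_val),
            anomaly_fraction(model, X_attack),
        )
    return results


results = sweep_gamma([0.01, 0.1, 1.0, 10.0])
for gamma, (val_frac, attack_frac) in results.items():
    print(f"gamma={gamma}: baseline={val_frac:.3f}, manipulated={attack_frac:.3f}")
```

The raw scores discussed above correspond to `model.decision_function(X)`, where negative values indicate samples outside the learned boundary.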
4.4. Attack Detection via Phasor Measurement Analysis
To evaluate the PMU anomaly detection and correction application that was introduced in
Section 3.3, the frequency and voltage phasor measurements were derived from the dynamic simulations (RMS) of the CIGRE TB 536 reference model (see
Section 4.1). The training data included noncorrupted PMU signals with a fixed reporting rate of 25 f.p.s. and a window size of
timesteps from the busbars of all 16 substations. The dynamic simulation was carried out for three operating points and 20 different contingencies (e.g., short circuits, generator trips), recording the RMS signals until 20 s after the disturbance (approximately 30,000 training and validation samples). During testing, the data manipulations were created with arbitrary amplitudes, starting times, and durations for the simulated frequency, voltage magnitude, and voltage angle signals. Based on a simplified attacker model (see Section 4.1), the naïve attack patterns comprised positive and negative signal steps as well as the addition of Gaussian white noise. Additionally, the data manipulations could affect a single PMU or a randomly chosen subset of PMUs (concurrent or shifted manipulations).
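The two naïve attack patterns described above can be sketched as simple array operations. This is a minimal illustration under stated assumptions, not the test data generator used in the study: the function names `add_step` and `add_white_noise`, the nominal frequency of 50 Hz, and all amplitudes are hypothetical. Only the reporting rate of 25 f.p.s. and the 20 s signal length are taken from the text.

```python
import numpy as np


def add_step(signal, start: int, duration: int, amplitude: float) -> np.ndarray:
    """Return a copy of the signal with a constant offset inside the window."""
    out = np.array(signal, dtype=float, copy=True)
    out[start:start + duration] += amplitude
    return out


def add_white_noise(signal, start: int, duration: int, sigma: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a copy of the signal with Gaussian white noise inside the window."""
    out = np.array(signal, dtype=float, copy=True)
    window = out[start:start + duration]  # view into `out`
    window += rng.normal(0.0, sigma, size=window.shape)
    return out


rng = np.random.default_rng(42)
rate_fps = 25                               # reporting rate from the text
freq = np.full(20 * rate_fps, 50.0)         # 20 s flat signal, 50 Hz assumed

# Negative step of 0.2 Hz (assumed amplitude) starting at 10 s for 2 s.
stepped = add_step(freq, start=10 * rate_fps, duration=2 * rate_fps, amplitude=-0.2)

# White noise with an assumed standard deviation over the same window.
noisy = add_white_noise(freq, 10 * rate_fps, 2 * rate_fps, sigma=0.05, rng=rng)

# Randomly chosen subset of PMUs (here 3 of 16) for concurrent manipulations.
corrupted_pmus = rng.choice(16, size=3, replace=False)
print(sorted(corrupted_pmus.tolist()))
```

Shifted manipulations could be obtained by drawing a separate `start` value for each selected PMU.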
To assess the performance of the GRU-AE and GRU-FC models, special evaluation metrics were defined based on the F1-score. In the case of the GRU-AE model, the true positives, false positives, and false negatives were calculated as maximum values over all PMUs and summed up over all time steps. The resulting F1-score follows from these aggregated counts.
In the case of the GRU-FC model, the true positives, false positives, and false negatives were calculated for all PMUs and time steps within the forecast horizon without additional aggregation.
Table 3 lists the selected hyperparameters of the GRU-AE and GRU-FC models, which were derived from comprehensive training and validation runs.
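The two F1-score variants defined above can be sketched as follows, using the standard definition F1 = 2TP / (2TP + FP + FN). This is one possible reading of the verbal description, since the exact notation is not reproduced here: the array layout (PMUs along the first axis, time steps along the second), the function names, and the convention of returning 0 when no positives occur are assumptions.

```python
import numpy as np


def confusion_terms(y_true, y_pred):
    """Element-wise TP, FP, and FN indicators for binary label arrays."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return y_true & y_pred, ~y_true & y_pred, y_true & ~y_pred


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Standard F1-score; returns 0.0 if there are no positives (assumed)."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def f1_gru_ae(y_true, y_pred) -> float:
    """Maximum over all PMUs (axis 0), then sum over all time steps."""
    tp, fp, fn = confusion_terms(y_true, y_pred)
    return f1_from_counts(int(tp.max(axis=0).sum()),
                          int(fp.max(axis=0).sum()),
                          int(fn.max(axis=0).sum()))


def f1_gru_fc(y_true, y_pred) -> float:
    """Counts over all PMUs and time steps without further aggregation."""
    tp, fp, fn = confusion_terms(y_true, y_pred)
    return f1_from_counts(int(tp.sum()), int(fp.sum()), int(fn.sum()))


# 3 PMUs, 5 time steps: PMU 0 is manipulated at steps 2 and 3.
y_true = np.zeros((3, 5), dtype=int)
y_true[0, 2:4] = 1
# The model flags PMU 0 correctly but also flags PMUs 1 and 2 at step 4.
y_pred = y_true.copy()
y_pred[1, 4] = 1
y_pred[2, 4] = 1

print(f1_gru_ae(y_true, y_pred), f1_gru_fc(y_true, y_pred))
```

In this example the two false alarms at the same time step count once in the aggregated variant and twice in the per-PMU variant, so the per-PMU score is lower.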
For a better understanding of the results, Figure 13 shows an exemplary negative frequency step manipulation for about 2 s at a single PMU.
The start of the data manipulation at 10 s simulation time was detected correctly by the GRU-AE and GRU-FC models, which can be seen in the sudden increase of the respective model errors as well as the change of the predicted labels. Additionally, the GRU-FC model successfully identified the corrupted PMU, leading to an F1-score of 100%. After exceeding the forecast horizon at 100 ms, the GRU-FC model went into the “idle” mode (see also Section 3.3). The GRU-AE model failed to correctly predict the end of the data manipulation, so that its F1-score decreased to approximately 92%. The overall F1-score results of the GRU-FC model for different step and white noise manipulations as well as different numbers of corrupted PMUs are given in Table 4.
As can be seen, the F1-scores decreased only slightly for a high number of corrupted PMUs. Larger differences arise when comparing the F1-scores between the step and white noise manipulations. Due to their stochastic behavior, white noise manipulations appear to be more difficult for the forecast model to detect than step manipulations. This especially applies to the frequency and voltage angle signals. The corresponding GRU-AE model results are given in Table 5.
Compared to the GRU-FC model, larger drops in the F1-scores occur for a high number of corrupted PMUs, but no significant differences arise between step and white noise manipulations. Noticeably low F1-scores were obtained when performing step manipulations on the frequency or voltage angle signals of all PMUs.
4.5. Real-Time Capability of the Proposed Applications
To evaluate the efficiency and applicability of the proposed hybrid NIDS (see Section 3.2) and phasor data anomaly detection and correction (PADC, see Section 3.3) applications, comprehensive performance tests were carried out to verify their real-time capability. Assuming baseline network traffic, the average computational time for both applications is given in Table 6.
As can be seen, the PADC application requires considerably more computational time due to the larger number of observations per sample and the processing steps within the neural network models. Assuming a maximum data transmission rate of 25 f.p.s. for the PMU data communication, i.e., a time budget of 40 ms per frame, the real-time processing capability of both applications can be confirmed.
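The real-time criterion used above amounts to comparing the average processing time per sample with the frame interval implied by the reporting rate. The following sketch shows such a check under stated assumptions: the helper names are hypothetical, the `process` callable is a placeholder for either application, and no values from Table 6 are reproduced.

```python
import time
from typing import Callable, Iterable


def frame_budget_s(reporting_rate_fps: float) -> float:
    """Time available per frame, e.g., 1 / 25 f.p.s. = 0.04 s."""
    return 1.0 / reporting_rate_fps


def mean_runtime_s(process: Callable, samples: Iterable) -> float:
    """Average wall-clock time of `process` over the given samples."""
    samples = list(samples)
    start = time.perf_counter()
    for sample in samples:
        process(sample)
    return (time.perf_counter() - start) / len(samples)


def is_real_time_capable(mean_runtime: float, reporting_rate_fps: float) -> bool:
    """True if the average runtime stays below the per-frame budget."""
    return mean_runtime < frame_budget_s(reporting_rate_fps)


# Placeholder workload standing in for one detection step (hypothetical).
runtime = mean_runtime_s(lambda sample: sum(sample), [[1.0, 2.0, 3.0]] * 100)
print(is_real_time_capable(runtime, reporting_rate_fps=25))
```

An average below the budget does not rule out occasional slower samples, so a stricter check could compare the maximum or a high percentile of the runtimes instead.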