4.1. Data Transmission Path Optimization Algorithm Based on BPDM-GCN
In the backup path design of this study, the BPDM-GCN-based data transmission path optimization algorithm combines GCN with the DDPG algorithm to optimize the data transmission paths in the communication network topology of the wide-area measurement system. The resulting link weights provide the basis for determining backup paths for key links. The algorithm is described in detail below.
The BPDM-GCN backup path design method employs an enhanced version of the DDPG algorithm that integrates GCN. Specifically, a GCN replaces the original neural network within the DDPG algorithm to optimize the transmission paths for phasor data across the communication network topology of the wide-area measurement system. The DDPG algorithm, structured within the Actor–Critic framework, comprises two primary components: the Actor network and the Critic network. The Actor network uses a neural network to approximate the policy function, mapping the current network topology state to an action in the action space. The Critic network evaluates the actions generated by the Actor network according to a defined reward function and guides the Actor network toward better actions. The DDPG algorithm commonly uses Recurrent Neural Networks and Long Short-Term Memory Networks within the Actor and Critic networks. These networks typically demand substantial computational resources during training, particularly when optimizing network topology path weights, and their adaptability and flexibility are limited. To improve the operating efficiency and generalization ability of the DDPG algorithm, a GCN is used to learn the structural information and link state information of the nodes in the communication network topology of the wide-area measurement system, so that the GCN-based DDPG algorithm can adapt to the communication network topologies of wide-area measurement systems of various scales. The DDPG algorithm framework based on GCN is shown in Figure 2.
Figure 2 shows the GCN-based DDPG algorithm framework, where the online policy GCN network, target policy GCN network, online Q GCN network, and target Q GCN network represent the improved online Actor policy network, target Actor policy network, online Q Critic network, and target Q Critic network, respectively. The parameter sharing and local connectivity inherent in GCN enhance the algorithm’s generalization capability. The experience replay pool D stores the transitions generated during interactions between the agent and the network topology environment, thereby reducing data correlation and enhancing algorithm stability. In the figure, green nodes represent nodes with normal status information, red nodes (such as node 17) represent nodes with abnormal conditions (such as link failures or high packet loss rates), and the nodes are numbered 1–30 to identify each node in the network topology.
The update process for the online policy network and the target network in the GCN-based DDPG algorithm proceeds as follows.
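(1) Online network update: The online network includes the online policy GCN network and the online Q GCN network. The online policy GCN network generates action $a_i = \mu(s_i \mid \theta^{\mu})$ based on the current network topology state $s_i$ and parameter $\theta^{\mu}$. $s_i$ and $a_i$ are passed to the online Q GCN network, which generates the function $Q(s_i, a_i \mid \theta^{Q})$ after iteration. The online Q GCN network transmits the gradient information $\nabla_a Q$ to update the online policy GCN network. The update process of the online policy GCN network is shown in Equation (5).
$$\nabla_{\theta^{\mu}} G \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i} \tag{5}$$
In Equation (5), $\nabla_{\theta^{\mu}} G$ represents the gradient of the policy function $G$ with respect to parameter $\theta^{\mu}$. $\nabla_a Q$ is the gradient information transmitted in the online Q GCN network. The action $a = \mu(s_i)$ is provided by the online policy GCN network, which ensures that the online policy network selects high-yield actions. $N$ represents the number of randomly sampled samples. $\nabla_a Q(s, a \mid \theta^{Q})|_{s=s_i,\, a=\mu(s_i)}$ represents the gradient of the Q function $Q(s, a \mid \theta^{Q})$ with respect to action $a$ when $a$ takes the current output $\mu(s_i)$ of the policy network in state $s_i$. $\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})|_{s=s_i}$ represents the gradient of the policy network $\mu$ with respect to parameter $\theta^{\mu}$ in state $s_i$.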
(2) Target network update: The target network includes the target policy GCN network and the target Q GCN network. The target policy GCN network selects the network topology state $s_{i+1}$ from the experience replay pool D and generates $a' = \mu'(s_{i+1} \mid \theta^{\mu'})$ after training. The target Q GCN network trains with inputs $s_{i+1}$ and $a'$ to derive $Q'(s_{i+1}, a' \mid \theta^{Q'})$, computes the target reward value $y_i$, and subsequently transmits it to the online Q GCN network. The computation process for $y_i$ is illustrated in Equation (6).
$$y_i = r_i + \gamma \, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big) \tag{6}$$
In Equation (6), $r_i$ represents the reward value obtained by taking action $a_i$ in state $s_i$. $\gamma$ represents the discount factor. $Q'(s_{i+1}, a' \mid \theta^{Q'})$ represents the value evaluation of action $a'$ by the target Q GCN network in state $s_{i+1}$. Here, $a' = \mu'(s_{i+1} \mid \theta^{\mu'})$ is the action generated by the target policy GCN network based on state $s_{i+1}$ and parameter $\theta^{\mu'}$. $\theta^{Q'}$ is obtained by regularly copying the online Q GCN network parameter $\theta^{Q}$.
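The two updates above can be illustrated with a minimal NumPy sketch. It is not the GCN networks of this paper: the linear policy $\mu(s) = \theta \cdot s$, the hand-written critic $Q(s, a) = -(a - w \cdot s)^2$, the learning rate, and all function names are assumptions chosen so that the gradients can be written in closed form.

```python
import numpy as np

def td_target(rewards, next_q, gamma=0.99):
    """Target value y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    return rewards + gamma * next_q

def policy_gradient(states, theta, w):
    """Sampled policy gradient for a toy scalar-action model (assumed, not the paper's).

    Policy: mu(s) = theta . s      Critic: Q(s, a) = -(a - w . s)^2
    """
    actions = states @ theta                    # a_i = mu(s_i)
    dq_da = -2.0 * (actions - states @ w)       # dQ/da evaluated at a = mu(s_i)
    dmu_dtheta = states                         # d mu / d theta evaluated at s_i
    # Average of (dQ/da * d mu/d theta) over the N sampled states.
    return (dq_da[:, None] * dmu_dtheta).mean(axis=0)

def hard_update(online_params):
    """Periodic copy of the online parameters into the target network."""
    return {name: value.copy() for name, value in online_params.items()}

rng = np.random.default_rng(0)
states = rng.normal(size=(64, 3))               # N = 64 sampled states
w = np.array([0.5, -1.0, 2.0])                  # critic prefers a = w . s
theta = np.zeros(3)
for _ in range(200):
    theta += 0.1 * policy_gradient(states, theta, w)   # gradient ascent on Q
```

In this toy setting the policy parameters move toward `w`, the action the critic rates highest, which is the role the gradient passed from the online Q network plays in Equation (5).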
The agent is trained through repeated interactions between the GCN-based DDPG algorithm and the communication network topology of the wide-area measurement system. The agent dynamically adjusts its output action according to the phasor data transmission delay, bandwidth occupancy, packet loss rate, and reward value in the network topology until training converges. The processes of state mapping, action mapping, and reward value mapping are as follows.
(1) State mapping: The link state information and network topology in the wide-area measurement system communication network are input in matrix form. The link state information is represented by a feature matrix $E$, where $J$ is the number of rows in the matrix. Each row is the feature vector $e_j$ of link $l_j$ in the wide-area measurement system communication network, and $C$ is the dimension of the feature vector, so that $E \in \mathbb{R}^{J \times C}$. The network topology is represented by the adjacency matrix $A$. To maintain vector scale consistency following multi-layer matrix transformations, matrix $A$ undergoes normalization, $\hat{A} = D^{-1/2} A D^{-1/2}$, where $D$ is the degree matrix of $A$.
(2) Action mapping: The action strategy generated by the GCN-based DDPG algorithm after iterative training based on state and reward value is represented as $a = \{w_1, w_2, \ldots, w_J\}$. The action value $w_j$ in the set represents the link weight of link $l_j$.
(3) Reward value mapping: The reward value represents the feedback the agent receives for the action $a_t$ taken in state $s_t$. The BPDM-GCN algorithm is optimized for low phasor data transmission delay, low bandwidth occupancy, and low packet loss rate. These three quantities are standardized and used as the basis for the reward value calculation, which is given in Equation (7).
In Equation (7), the reward factor parameters are designated as a, b, and c. Their values are determined by the relative importance of phasor data transmission delay, link occupancy rate, and packet loss rate within the network topology. In this paper, a, b, and c are set to 0.4, 0.4, and 0.2, respectively.
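As a rough illustration of how the three standardized metrics and the weights 0.4, 0.4, and 0.2 could be combined, the sketch below assumes a negative weighted sum, so that lower delay, occupancy, and loss give a higher reward. This functional form, the min-max standardization, and the function names are assumptions for illustration and are not taken from Equation (7).

```python
import numpy as np

def min_max(values):
    """Scale raw measurements to [0, 1]; a constant vector maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def reward(delay, occupancy, loss, a=0.4, b=0.4, c=0.2):
    """Assumed reward: negative weighted sum of the standardized metrics."""
    return -(a * delay + b * occupancy + c * loss)

# Illustrative raw measurements for three links (made-up numbers).
delays = min_max([2.0, 4.0, 10.0])        # transmission delay
occupancy = min_max([0.1, 0.5, 0.9])      # bandwidth occupancy
loss = min_max([0.00, 0.01, 0.05])        # packet loss rate
link_rewards = reward(delays, occupancy, loss)
```

With this form the link with the lowest delay, occupancy, and loss receives the largest reward, and changing a, b, and c shifts the relative importance of the three metrics.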
In order to ensure the reliability of the backup path designed in the communication network of the wide-area measurement system, this paper proposes an improved DDPG algorithm based on the GCN. The detailed procedure is shown in Algorithm 1.
Algorithm 1: DDPG algorithm flow improved based on GCN
Input: Link state feature matrix and structural adjacency matrix
Output: Link weight
(1) Initialize the online networks, the target networks, and the experience replay pool D
(2) For episode = 1, M do:
(3)   Initialize the state and the noise strategy ℵ
(4)   For t = 1, T do
(5)     Select action $a_t$ according to the current policy
(6)
(7)
(8)
(9)     Obtain $r_t$ and $s_{t+1}$
(10)    Store transition $(s_t, a_t, r_t, s_{t+1})$ in D
(11)    Sample a random mini batch of N transitions from D
(12)    Calculate target return value $y_i$
(13)
(14)
(15)
(16)
(17)  End for
(18) End for
In Algorithm 1, the initial step initializes the network parameters and the experience replay pool D. Lines 2–16 outline the training and parameter update procedures of the DDPG algorithm enhanced with GCN. Specifically, lines 5–9 describe how the GCN parameters, states, actions, and reward values are updated during interactions between the algorithm and its environment. Line 10 stores the transition in the experience replay pool D. Lines 11–16 describe how the algorithm uses data from the experience replay pool to train both the Actor and Critic networks. In the first GCN layer, a scaling parameter adjusts the scale of the link state feature matrix E, each row of which is the feature vector of one link. The scaled matrix is multiplied by the normalized adjacency matrix and by the weight matrix of the GCN layer, and the result is passed through the rectified linear unit (ReLU) activation function, which introduces nonlinearity. In the second GCN layer, the ReLU output is multiplied by the normalized adjacency matrix and by another weight matrix to give Z. Finally, Z is passed through a normalization function.
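The two GCN layers described above can be sketched in a few lines of NumPy. The sketch assumes that the rows of E and of the adjacency matrix index the same graph elements, that the normalization is the symmetric form $D^{-1/2} A D^{-1/2}$, and that the final normalization is a row-wise softmax; the names `alpha`, `W0`, and `W1` are illustrative and not the paper's notation.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = Z - Z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def gcn_forward(A, E, W0, W1, alpha=1.0):
    """Two-layer forward pass: scale E, propagate, apply ReLU, propagate, normalize."""
    A_hat = normalize_adjacency(A)
    H = np.maximum(A_hat @ (alpha * E) @ W0, 0.0)   # first layer with ReLU
    return softmax(A_hat @ H @ W1)                  # second layer, assumed softmax

# Four elements connected in a ring, three features each (made-up data).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))
W0 = rng.normal(size=(3, 8))
W1 = rng.normal(size=(8, 2))
Z = gcn_forward(A, E, W0, W1, alpha=0.5)
```

Each row of `Z` sums to one because of the assumed softmax; a different normalization function would change only the last step.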
The backup path design in BPDM-GCN is divided into two cooperative stages:
(1) Route establishment stage: Through iterative training of DDPG-GCN, the link weight matrix is dynamically optimized to reflect the real-time status of the network topology (such as latency and bandwidth utilization).
(2) Route selection stage: Based on the link weight matrix, the incremental shortest path tree algorithm (Algorithm 2) is used to calculate the load-balanced shortest backup path.
Algorithm 2: Algorithm flow of backup path based on incremental shortest path tree
Input: Network topology and the link weights in the topology
Output: The maximum disjoint backup path from the starting node s to the destination node d
(1) Initialize
(2) Function build shortest path tree
(3)   Initialize dist, parent, and visited
(4)   For each neighbor v of s
(5)   End for
(6)   return (dist, parent)
(7) End function
(8) Function build incremental shortest path tree
(9)   For each neighbor v of s
(10)    Remove the link between s and v
(11)  End for
(12)  Output backup path
(13) End function
The two stages are dynamically coupled through the link weight matrix: when the network topology changes, DDPG-GCN retrains and updates the link weight matrix, which triggers a real-time adjustment of the incremental shortest path tree so that the backup paths always reflect the latest network conditions.
4.2. Backup Path Implementation Method Based on Incremental Shortest Path Tree
Because the physical distances between nodes in the communication network topology of the wide-area measurement system vary, selecting a longer link as the backup path increases the recovery delay when phasor data from a faulty link is rerouted along it. To reduce the phasor data transmission delay on the backup path, this paper jointly considers link load balancing and the shortest path, and designs a backup path implementation method based on the incremental shortest path tree, using the link weights output by the improved GCN-based DDPG algorithm in Section 4.1. The incremental shortest path tree algorithm computes the shortest backup path from source node s to destination node d within the communication network of the wide-area measurement system. The algorithm adapts to real-time changes in the network topology and updates the shortest path between nodes as required. In a WAMS communication network, the transmission characteristics of phasor data dictate that all switches connected to a PMU transmit phasor data to the switch linked to the PDC. This paper uses the Dijkstra algorithm to establish the shortest path tree T, with the switch connected to the PDC designated as destination node d, based on the link weights detailed in Section 4.1. This tree serves as the primary route for transmitting phasor data within the communication network of the wide-area measurement system. The source node s transmits phasor data to the root node d along the path given by the shortest path tree.
In order to ensure the reliability of phasor data transmission in the shortest path tree T in the communication network of the wide-area measurement system, the backup path from the source node s to the root node d is calculated for T using the incremental shortest path tree algorithm. During the backup path calculation, the algorithm updates the shortest path information incrementally, which reduces the computational resources and time required. The steps for computing the backup path using the incremental shortest path tree are as follows:
(1) Initialize the link weights in tree T to infinity and establish the link relationships between nodes in the communication network topology of the wide-area measurement system.
(2) Using node s as the root node in the network topology, employ the Dijkstra algorithm to construct the shortest path tree.
(3) Disconnect the links between node s and its neighboring nodes in the shortest path tree sequentially. Based on the priority principle of the incremental shortest path tree, generate an incremental shortest path tree with node s as the root node.
The shortest path from source node s to destination node d in the incremental shortest path tree is selected as the backup path. In the event of a malfunction in the network topology, the switch promptly transitions to the designated backup path, ensuring the uninterrupted transmission of phasor data. The procedure is given in Algorithm 2.
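The idea behind these steps can be sketched in Python as follows. This is a simplified illustration, not Algorithm 2 itself: it removes only the first-hop link of the primary path and reruns Dijkstra from scratch instead of updating the tree incrementally, and the graph, node names, and function names are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, parent) describing the shortest path tree rooted at source."""
    dist = {source: 0.0}
    parent = {source: None}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        for v, weight in graph.get(u, {}).items():
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, parent

def path_to(parent, target):
    """Walk the parent pointers back from target; None if target is unreachable."""
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def backup_path(graph, s, d):
    """Primary path from s to d, then a backup that avoids the primary first hop."""
    _, parent = dijkstra(graph, s)
    primary = path_to(parent, d)
    if primary is None or len(primary) < 2:
        return primary, None
    first_hop = primary[1]
    # Drop the link between s and its primary next hop in both directions.
    pruned = {
        u: {v: w for v, w in nbrs.items() if {u, v} != {s, first_hop}}
        for u, nbrs in graph.items()
    }
    _, backup_parent = dijkstra(pruned, s)
    return primary, path_to(backup_parent, d)

# Hypothetical undirected topology; the values stand in for learned link weights.
graph = {
    "s": {"a": 1.0, "b": 2.0},
    "a": {"s": 1.0, "d": 1.0},
    "b": {"s": 2.0, "d": 2.0},
    "d": {"a": 1.0, "b": 2.0},
}
primary, backup = backup_path(graph, "s", "d")
```

On this toy graph the primary path runs through `a` and the backup path through `b`. A true incremental shortest path tree would reuse the unaffected part of the tree rather than recomputing it, which is where the savings described above come from.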
As illustrated in Figure 3, the communication network topology of the wide-area measurement system comprises three types of nodes: the red node 8, which is the switch node connected to the PDC; the green nodes, which are switch nodes connected to PMUs; and the nodes without color marking, which are regular switch nodes. According to the link weights obtained in Section 4.1, the shortest path tree is constructed with node 1 as the root node, and the main path for transmitting phasor data from node 1 to destination node 8 follows this tree. After the link between node 1 and node 7 is disconnected, an incremental shortest path tree is built with node 1 as the root node, and phasor data is transmitted from node 1 to destination node 8 along the resulting backup path to ensure data transmission between node 1 and node 8. Green nodes (such as nodes 3, 6, and 10) represent nodes that have valid path connections to the root node (node 1). Red nodes (such as node 8) are nodes that are experiencing errors, congestion, or other abnormal conditions. The special symbol between node 1 and node 7 indicates a disconnection between these two nodes.
4.3. Theoretical and Practical Correlation of GCN and DDPG Fusion Mechanism
4.3.1. Benefits of GCN for Dynamic Network Topology Modeling
The traditional DDPG algorithm uses a fully connected neural network (FCN) or a recurrent neural network (RNN) to process the state space. However, in the communication network of a wide-area measurement system, the connections between nodes form a graph structure. An FCN struggles to capture the dynamic correlations of the topology, because its fixed input dimension and limited local perception ability do not adapt to the dynamic graph structure of the wide-area measurement system communication network [19]. This paper therefore uses a graph convolutional neural network (GCN) as the core network of DDPG for the following reasons:
(1) Graph structure perception capability: GCN directly models the connection relationship between switch nodes through the spectral domain convolution operation of the adjacency matrix and the node feature matrix, capturing the dynamic changes of the network topology.
(2) Parameter sharing and generalization: The convolutional layer of GCN shares the weight matrix between different nodes, avoiding the parameter explosion problem caused by changes in network scale in traditional FCN, and supports generalization training from 14 nodes to 118 nodes.
(3) Multi-scale feature fusion: By stacking multiple layers of GCN, the algorithm can simultaneously aggregate local link status (latency, bandwidth) and global topology (node degree, path redundancy), providing high-dimensional semantic features for DDPG.
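The parameter-sharing point in item (2) can be made concrete with a small NumPy sketch. It applies one GCN layer with the same weight matrix to a 14-node and a 118-node graph; the ring topologies, the feature size of 4, and the hidden size of 16 are arbitrary assumptions and do not reproduce the IEEE test systems.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of an n-node ring (placeholder topology)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return A

def gcn_layer(A, X, W):
    """One graph convolution with self-loops, symmetric normalization, and ReLU."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(deg, deg))
    return np.maximum(A_hat @ X @ W, 0.0)

def dense_param_count(n_nodes, n_features, hidden):
    """Weights of a fully connected layer acting on the flattened node features."""
    return n_nodes * n_features * hidden

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))          # shared by every node and every graph size
shapes = {}
for n in (14, 118):
    X = rng.normal(size=(n, 4))
    shapes[n] = gcn_layer(ring_adjacency(n), X, W).shape
    print(n, shapes[n], "GCN weights:", W.size,
          "dense weights:", dense_param_count(n, 4, 16))
```

The GCN layer keeps 64 weights for both graphs, while the fully connected layer on flattened features grows from 896 to 7552 weights and would have to be rebuilt whenever the node count changes.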
Zheng et al. focused mainly on data center networks. To address the fact that existing algorithms ignore network energy consumption, they took energy saving and network performance as joint optimization goals and combined an improved DDPG algorithm with a convolutional neural network to achieve energy-saving routing scheduling [20]. In contrast, this paper focuses on the communication network of wide-area measurement systems. To address the poor applicability of existing faulty link recovery algorithms and the backup path congestion that occurs during network topology migration, the graph convolutional neural network (GCN) and the deep deterministic policy gradient algorithm are integrated to optimize the backup path. The differences and innovations in the algorithm design and experimental verification of this paper can be summarized as follows.
(1) Innovative algorithm fusion: A backup path algorithm, BPDM-GCN, is proposed that improves the deep deterministic policy gradient algorithm with a graph convolutional neural network. When optimizing network topology path weights, the neural networks used in the traditional DDPG algorithm, such as recurrent neural networks, require large computational resources and have limited adaptability. In this paper, GCN is used to directly model the node connection relationships in the network topology. Through the spectral domain convolution operation on the adjacency matrix and the node feature matrix, the dynamic changes of the network topology are captured, and the adaptability of the algorithm to wide-area measurement system communication network topologies of different scales is improved.
(2) Optimization of backup path implementation: A backup path implementation method based on the incremental shortest path tree is designed. Considering the difference in physical distance of nodes in the communication network of the wide-area measurement system, the long backup path will increase the delay of faulty link recovery. According to the link weight output by the BPDM-GCN algorithm, this method uses the incremental shortest path tree algorithm to calculate the shortest backup path from the source node to the destination node. It can dynamically adapt to the changes in network topology and reduce the transmission delay of phasor data in the backup path.
(3) Comprehensive experimental verification: Six IEEE benchmark test power system communication networks are used for experiments, and different link load conditions are set as experimental network topologies. Compared with algorithms such as NRLF-RL, LIR-LFR, and FR-VLAN, the method is evaluated by several key indicators such as fault link recovery delay and fault link recovery success rate. The results show that the BPDM-GCN algorithm can effectively reduce the fault link recovery delay and improve the recovery success rate.
4.3.2. DDPG-GCN Collaborative Link Weight Optimization Implementation Mechanism
The synergy between GCN and DDPG is reflected in the following key relationships:
(1) State representation and feature extraction: The network topology state is encoded as the adjacency matrix $A$ and the node feature matrix $X$ (including link delay, bandwidth occupancy, etc.). GCN generates the node embedding $H$ through a graph convolution operation as shown in Equation (8).
$$H = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W\right) \tag{8}$$
In Equation (8), $\tilde{A} = A + I$ is the adjacency matrix with self-connection added, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $W$ is the trainable weight, and $\sigma$ is the activation function. The embedding $H$ is used as input to the Actor network to guide the link weight adjustment strategy.
(2) Action generation and strategy optimization: The Actor network outputs the link weight action $a$ based on $H$, and the Critic network calculates the value $Q$ based on $H$ and $a$ to evaluate the long-term benefit of the action. GCN’s feature propagation mechanism enables the Critic network to predict link congestion risks and avoids the strategy oscillations that arise when traditional DDPG ignores the topology structure.
(3) Dynamic weighting of the reward function: The latency, bandwidth utilization, and packet loss rate indicators in the reward value are dynamically weighted by the topological features extracted by GCN. For example, the neighbor node features of high-load links are automatically enhanced by GCN, prompting DDPG to prioritize the optimization of critical path weights and achieve load balancing.
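One way the embedding H could feed both networks is sketched below. The head designs are assumptions for illustration only: the actor scores each link from the concatenated embeddings of its two endpoints and squashes the score with a sigmoid, and the critic maps the mean-pooled embedding together with the action to a scalar. The paper does not specify these layers, and all names are hypothetical.

```python
import numpy as np

def actor_head(H, links, w_actor):
    """One weight in (0, 1) per link from the embeddings of its two endpoints."""
    feats = np.stack([np.concatenate([H[u], H[v]]) for u, v in links])
    return 1.0 / (1.0 + np.exp(-(feats @ w_actor)))

def critic_head(H, action, w_critic):
    """Scalar Q value from the mean-pooled embedding and the link-weight action."""
    pooled = H.mean(axis=0)
    return float(np.concatenate([pooled, action]) @ w_critic)

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))                        # embeddings of 5 nodes (stand-in for GCN output)
links = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # hypothetical links
w_actor = rng.normal(size=16)                      # 2 * embedding size
action = actor_head(H, links, w_actor)             # one weight per link
w_critic = rng.normal(size=8 + len(links))
q_value = critic_head(H, action, w_critic)
```

Because both heads read the same embedding, a change in the topology or link state alters the action and its evaluation together, which is the coupling described in items (1) and (2).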
4.3.3. Deep Integration Between BPDM-GCN and SDN Architecture
BPDM-GCN is integrated with the SDN architecture as follows [21]:
(1) Data plane awareness: The SDN controller collects link status data (latency, bandwidth usage) in real time via the OpenFlow protocol and builds a dynamic topology map.
(2) Control plane reasoning: BPDM-GCN receives topology data and uses the pre-trained DDPG-GCN model to output the optimal link weight matrix W.
(3) Application level decision: The primary path and backup path are generated from W by the incremental shortest path tree algorithm and sent to the switches as flow table entries.
The comparative analysis of BPDM-GCN and traditional methods is shown in Table 1.
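The last step, turning computed paths into per-switch forwarding rules, can be sketched as follows. The rule format is a hypothetical dictionary, not the OpenFlow message format, and the node names and priority values are made up; the sketch only shows that each hop of a path becomes one entry on the switch at the start of that hop.

```python
def flow_entries(path, dst, priority):
    """Turn a node path into one next-hop rule per switch (hypothetical rule format)."""
    return [
        {"switch": u, "match_dst": dst, "next_hop": v, "priority": priority}
        for u, v in zip(path, path[1:])
    ]

primary = ["s", "a", "d"]      # illustrative primary path
backup = ["s", "b", "d"]       # illustrative backup path

# The primary rules get the higher priority; the backup rules are pre-installed.
table = flow_entries(primary, "d", priority=200) + flow_entries(backup, "d", priority=100)
for entry in table:
    print(entry)
```

Pre-installing the backup entries is one possible design: the switch then needs no new path computation when a primary link fails. How the switch detects the failure and falls back is outside this sketch.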
In the BPDM-GCN algorithm, GCN is used to compute link weights. The GCN computation revolves around the adjacency matrix and the node feature matrix. Each GCN layer involves matrix multiplication operations whose cost scales roughly with the number of edges E. Because the number of GCN layers is usually small, the total computational complexity of the GCN part is of the same order as that of a single layer. In addition, the shortest path tree is constructed with Dijkstra's algorithm on sparse graphs (actual power system communication networks are mostly sparse graphs), and the complexity of the other auxiliary operations is relatively small and can be ignored. Considering these factors, the computational complexity of the BPDM-GCN method is determined by the GCN computation and the shortest path tree construction.