Article

A Graph Convolutional Network-Based Method for Congested Link Identification

1
School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
2
The 54th Research Institute of CETC, Shijiazhuang 050081, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(20), 9164; https://doi.org/10.3390/app14209164
Submission received: 20 July 2024 / Revised: 20 September 2024 / Accepted: 24 September 2024 / Published: 10 October 2024

Abstract

Accurate and efficient congested link identification is crucial in wireless sensor networks (WSNs). However, in some networks with a centralized management architecture, it is often not feasible to monitor large numbers of internal links directly or even impossible in some heterogeneous networks. Network tomography, the science of inferring the performance characteristics of a network’s interior by correlating sets of end-to-end measurements, was put forward to solve this problem. Nevertheless, a network always contains more links than end-to-end paths, making it problematic to find a determined solution. To solve this problem, most of the current methods try to use some additional prerequisites, such as the link congestion probability. However, most existing studies have not considered the congestion caused by node factors and the case of multiple congested links on one path. In this paper, we initially model the issue of link congestion as a Bayesian network model (BNM). Subsequently, we introduce a congestion link identification method based on graph convolutional networks (GCNs), novelly converting the intricate Bayesian network solving problem into a graph node classification task. The simulation results validate the feasibility of our proposed algorithm in identifying congested links and underscore its advantages in scenarios involving node congestion and multiple congested links.

1. Introduction

A wireless sensor network (WSN) consists of a limited collection of nodes which detect various properties of the environment across different locations and over predetermined time intervals. These distributed observations, when combined, can create a comprehensive, global spatiotemporal representation of the monitored area [1]. Recently, numerous approaches have emerged for sensing environmental phenomena, leveraging both short-term and long-term monitoring strategies [2]. Researchers have actively addressed diverse scenarios across heterogeneous application domains, harnessing distributed intelligence and the collaborative nature of this innovative technology.
Examples of heterogeneous application domains include but are not limited to precision agriculture [3,4], where sensor networks are deployed in farmlands to monitor soil moisture, crop health, and other vital parameters; environmental monitoring [5,6], where networks of sensors are used to track air quality, water levels, and wildlife habitats; smart cities, where WSNs facilitate traffic management, energy efficiency, and public safety; and industrial monitoring, where sensors are integrated into manufacturing processes to ensure quality control and safety standards. In each of these domains, the wireless network architecture enhances sensing capabilities beyond those of standalone devices, enabling real-time decision making and optimization.
The WSN, one of the key technologies for sensing the physical world, is of vital importance to the development of these technologies. However, due to the rigidity of the distributed WSN architecture and the rigid resource management model, flexible changes cannot be made quickly according to the requirements of the upstream application. Therefore, the current distributed WSN cannot fully meet the dynamic, on-demand sensing requirements for monitoring the physical world [7]. This also limits the rapid development of WSNs to some extent. Hunkeler et al. found that centralized management outperforms distributed management in WSNs, reflecting streamlined sensor node functions, efficient network management and diagnostics, and optimal network parameter configurations, which they argue better align with future WSN requirements [8]. At present, this view is widely accepted in the academic community.
In WSNs, congestion can easily occur due to the bandwidth constraints between nodes and the many-to-one data transmission pattern. In the worst case, data retransmission due to congestion causes additional congestion, which increases the packet loss rate of the network, decreases the operational duration of the sensor nodes, and renders the entire network unavailable. Effective management of network congestion is crucial to prevent issues like packet loss, delay, load imbalance, reduced throughput, and energy inefficiency, thereby maintaining optimal network performance [9].
Accurate and efficient congested link identification is crucial in WSNs. However, it is a well-established fact that acquiring such states through conventional IP-based network monitoring tools, such as the Simple Network Management Protocol (SNMP) and the Internet Control Message Protocol (ICMP), generates a high monitoring overhead. Additionally, this is often impossible due to the large scale, heterogeneity, multiple ownership, and partial observability of networks. Therefore, it is essential to develop novel monitoring techniques which offer low traffic overhead and complexity while ensuring accurate measurements [10].
Network tomography is a technique for inferring the performance and state of a network by performing end-to-end measurements at the network boundaries [11,12]. This technique primarily deals with additive network metrics commonly utilized in routing, such as link delay and packet loss. By selectively probing a limited set of paths from a small number of monitors positioned at the network’s edge, network tomography constructs a linear system capturing the relationship between observed paths and corresponding link metrics.
Network Boolean tomography is a generalization of network tomography technology to the realm of Boolean algebra [13]. Boolean network tomography uses “0” and “1” to indicate whether a link or path in a network is congested. A link is labeled as “1” if it is congested and “0” if it is not. The congestion state of a path is related to the state of the links it traverses. The relation between path status and link status can be represented by the logical operator “OR”. In other words, a path is good if all links it traverses are good; otherwise, it is congested if at least one of the links it traverses is congested.
Identification of congested links is crucial for adjusting routing policies and controlling congestion. Based on the relationship between end-to-end path congestion and link congestion, it is possible to construct multiple equations from the path states, each of which is obtained by applying the logical OR to the states of all links traversed by the path. However, since the number of links in the network is typically larger than the number of paths, the feasible solution to this problem is not unique. For instance, consider the scenario depicted in Figure 1, where four terminal nodes ($d_1$, $d_2$, $d_3$, and $d_4$) gather data and converge to a base station $s$ via a WSN. If the paths from $d_2$, $d_3$, and $d_4$ to the base station $s$ are congested, then we cannot solve for an exact result, except for the link between $d_1$ and an inner node $e_1$.
In such scenarios, by leveraging the network’s topology and fundamental performance parameters, collected end-to-end network measurements serve as valuable sample data to estimate the probability distribution of performance parameters. This statistical approach offers a nuanced understanding of various network conditions, particularly regarding congestion. By extrapolating insights from this analysis, informed decisions can be made to manage and optimize network performance without the necessity of expanding the monitoring infrastructure.
Network tomography has garnered significant attention due to its ability to reduce monitoring overhead in a network by inferring unknown internal link states based solely on end-to-end measurements. Numerous studies have explored its application in various network scenarios. For instance, in [14], a solution is presented for inferring the network topology, while [15] focused on loss rate inference by selecting a minimal set of paths. Recently, the authors of [16] introduced the incorporation of path information to infer link metrics. Additionally, research has investigated network tomography’s robustness under failure [17] and explored the security vulnerabilities associated with these approaches [18]. A recent survey comprehensively discussed the state of the art in network tomography [19]. Furthermore, recent advancements in this field have leveraged machine learning (ML) techniques to estimate the additive metrics of network slices, such as delays or logarithms of loss rates. In particular, the authors of [20] employed neural networks to address this challenge, demonstrating the potential of ML in enhancing network tomography. Meanwhile, the authors of [21] focused specifically on the intricacies posed by unknown links and dynamic routing, offering innovative solutions to overcome these issues within the context of network tomography.
Network Boolean tomography, as an extension of network tomography technology to Boolean algebra, has garnered extensive attention for its inherent advantages in reducing computational complexity and improving data processing efficiency. Nguyen et al. [22] proposed the CLINK algorithm, which introduces a probability model into congested link inference. Moreover, through multi-slot path performance detection, it effectively avoids the strong dependence of single-slot path detection on clock synchronization [23]. Compared with the SCFS algorithm without prior probability [13] and the MCMC algorithm with consistent prior probability [24], congested link inference performance is considerably improved. The authors of [25] elaborate on the ill-posed nature of network Boolean tomography by highlighting the significance of integrating link correlations. They modeled the problem as a maximum a posteriori (MAP) estimation problem and proposed a learning-based algorithm using long short-term memory (LSTM), which is adept at learning the statistical dependencies of sequence elements from historical data. Additionally, the authors of [26] treated the process of link inference as a Markov process, proving the rationality of the mathematical method employed.
The fault diagnosis technology rooted in graph theory has gained prominence. It deduces the most likely fault explanation set, relying on the accuracy of the graph model, such as the Bayesian network. In this context, the results of end-to-end path performance detection serve as observed nodes, revealing the states of these nodes’ variables and thus identifying the links most prone to congestion. The authors of [27] proposed a probabilistic symptom-fault map to isolate likely faults through incremental updates. On the other hand, the authors of [28] utilized a Bayesian network for root cause analysis, identifying the underlying reasons for faulty states. These techniques are crucial for efficient network fault troubleshooting.
However, none of the previous studies considered the congestion caused by the limited computing resources of nodes, which is a typical characteristic of a WSN. In contrast to traditional networks, WSNs are made up of sensor nodes which send and receive data through wireless communication. Consequently, congestion often occurs at the nodes and on the radio channel [29,30].
In WSNs, the downstream data flow from the sink to the sensor nodes typically involves a one-to-many multicast paradigm, whereas the upstream communication from sensor nodes to the sink adopts a many-to-one model. These networks are prone to two distinct types of congestion, as highlighted in [30] and illustrated in Figure 2. The first congestion type, which is commonly encountered in traditional networks, is node-level congestion. It arises from buffer overruns within individual nodes, leading to packet discarding and longer queuing delays. The subsequent packet loss triggers retransmission processes, thereby escalating energy consumption. The second type is link-level congestion. Since WSNs rely on shared wireless channels among multiple nodes utilizing protocols akin to Carrier Sense Multiple Access (CSMA), link-level congestion emerges when several active sensor nodes concurrently attempt to access the channel, causing collisions. This phenomenon increases packet service times, undermines link utilization and overall network throughput, and depletes energy resources at the sensor nodes.
The congested link identification algorithms have greatly improved over previous methods. However, they still face challenges in WSNs. Firstly, they often assume that there is a single congested link per path, reducing accuracy in multi-congestion scenarios. Secondly, they overlook congestion caused by node factors, further compromising accuracy. For example, when a node is congested, all links flowing into the node are unable to communicate normally, leading to link failure. The congestion caused by the node makes the link failure correlated. Ignoring this correlation will reduce the prediction accuracy. To address this, we introduced a Bayesian network model (BNM) and graph convolutional network (GCN) technology to extract network topology features.
We propose a deep learning method based on the GCN to solve the above two challenges. Our main contributions are as follows:
(1)
We model the issue of link congestion as a BNM, which makes it suitable for scenarios involving node congestion and multiple congested links.
(2)
We introduce a congestion link identification method based on the GCN, innovatively converting the intricate Bayesian network-solving problem into a graph node classification task.
(3)
We conduct extensive simulations to compare the performance of the proposed GCN-based identification of congested link (GIC) method with existing techniques based on various metrics. The results validate the feasibility of the GIC algorithm in identifying congested links and underscore its advantages in scenarios involving node congestion and multiple congested links.

2. Proposed Model

2.1. Criteria for Link Congestion

In many applications, the perception of link congestion is more realistic and helpful than a concrete measure of network performance [31]. According to [12], each packet traversing a link can suffer performance degradation, such as loss or delay. Once the nature of each impairment is known, a summary statistic (e.g., loss rate or mean delay) can be calculated and mapped to a binary performance measure using a predefined threshold. If the calculated statistical values exceed this threshold, then the performance is classified as “bad”; otherwise, it is classified as “good”. Paths consisting of multiple links will undergo the same classification process, with different thresholds being applied to each link.
In this paper, we choose the packet loss rate as the criterion for link congestion. Therefore, we measure the congestion status of all end-to-end paths in terms of packet losses. During an end-to-end packet loss rate measurement period, each route of the end-to-end path is probed $\lambda$ times. We denote the number of packets received from the source node through the $k$th route as $\lambda(p_i^k)$. To obtain the status of each route, we first calculate its packet loss rate $\gamma$ as follows:
$$\gamma = 1 - \frac{\lambda(p_i^k)}{\lambda} \tag{1}$$
By comparing the packet loss rate $\gamma$ with a given threshold $\alpha$ (e.g., 1% [12]), an estimated congestion status is attached to route $p_i^k$:
$$y_i^k = \begin{cases} 1 & \text{if } \gamma \ge \alpha, \\ 0 & \text{if } \gamma < \alpha. \end{cases} \tag{2}$$
If a route’s measured packet loss rate falls below the predefined threshold $\alpha$, then it receives a Boolean value of “0”, indicating that its state is good; otherwise, it receives a Boolean value of “1”, indicating congestion.
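The thresholding in Equations (1) and (2) can be illustrated with a minimal sketch. The function name `route_status` and the probe counts below are hypothetical and serve only as an example; the default threshold of 1% follows the value quoted above.

```python
def route_status(received: int, probes: int, alpha: float = 0.01) -> int:
    """Return 1 (congested) if the loss rate reaches alpha, else 0 (good)."""
    if probes <= 0:
        raise ValueError("probes must be positive")
    gamma = 1.0 - received / probes  # packet loss rate, Equation (1)
    return 1 if gamma >= alpha else 0  # Boolean status, Equation (2)


# Hypothetical measurement period with 1000 probes per route.
print(route_status(received=1000, probes=1000))  # 0 (no loss, route is good)
print(route_status(received=950, probes=1000))   # 1 (5% loss, route is congested)
```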

2.2. Network Model

The network model, the foundation of network tomography, refers to the topology used to implement the network tomography. We consider the fat tree topology, which has demonstrated its efficiency in SDN-based networks. We define $N$ as the set of network nodes and $L$ as the set of network links. Each node in set $N$ represents either a physical network node or a subnetwork consisting of numerous nodes and links. The connection between the source node and the destination node is called a path. Each path is composed of one or more links. Any connection between two nodes with no intermediate nodes is called a link. Each link can be a physical link or an abstraction of various physical links. For example, we consider the topology in Figure 3, which is an abstraction of the scenario in Figure 1. In this topology, set $N$ contains a center node $s$, aggregation nodes $a_1$ and $a_2$, edge nodes $e_1$ and $e_2$, and destination nodes $d_1$, $d_2$, $d_3$, and $d_4$. Set $N$ is
$$N = \{s, a_1, a_2, e_1, e_2, d_1, d_2, d_3, d_4\} \tag{3}$$
Set $L$ is
$$L = \{l_1, l_2, l_3, l_4, l_5, l_6, l_7, l_8, l_9, l_{10}\} \tag{4}$$
Let $r_{d_i}^k$ (abbreviated as $r_i^k$) indicate the $k$th route from the source node to the destination node $d_i$, and let $R$ be a routing table, which is the set of all optional routes from the source node to any destination node. For example, the routing table of Figure 3 is defined as follows:
$$R = \{r_1^1, r_1^2, r_2^1, r_2^2, r_3^1, r_3^2, r_4^1, r_4^2\} \tag{5}$$
The route table is shown in Table 1.
The authors of [32] pointed out that a congested link will cause congestion on all routes which traverse it. Moreover, each link traversed by a good route must be good. Consequently, we denote by $y(r_i^k)$ the binary status of route $r_i^k$. Since we do not perform loss detection directly on each link, the link states remain unknown. For link $l_i$ traversed by route $r_i^k$, we mark its unknown status as $x(l_i)$. We then construct a logical relation between a route and its component links as follows:
$$y(r_i^k) = x(l_1) \vee \cdots \vee x(l_j) = \bigvee_{l \in V_i^k} x(l) \quad \text{for all } l \in V_i^k \tag{6}$$
where $\bigvee$ is the Boolean operation “OR” and $V_i^k$ indicates the set of links that route $r_i^k$ traverses. In this paper, we use the vector $X = [x_1, \ldots, x_i, \ldots, x_n]^T$ to represent the state of the $n$ links in the network, and the state of the $m$ routes in the route table $R$ is the vector $Y = [y_1, \ldots, y_i, \ldots, y_m]^T$. For the Boolean relation between the observations of separate routes and the links’ status, we use the following equation to extend Equation (6):
$$D \odot X = Y \tag{7}$$
where $D = (d_{ik})$ is an $m \times n$ measurement matrix with elements $d_{ik} \in \{0, 1\}$. Specifically, $d_{ik} = 1$ indicates that the $i$th route in $R$ traverses link $l_k$; otherwise, $d_{ik} = 0$. In addition, “$\odot$” is the Boolean matrix product operator.
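As an illustration of Equation (7), the Boolean product can be sketched in a few lines. The matrix `D` below is a hypothetical three-route, four-link example and is not the measurement matrix of Figure 3; the function name `boolean_product` is likewise only illustrative.

```python
def boolean_product(D, X):
    """Compute Y = D (Boolean product) X: y_i is the OR over j of (d_ij AND x_j)."""
    return [int(any(d and x for d, x in zip(row, X))) for row in D]


# Hypothetical measurement matrix: rows are routes, columns are links.
D = [
    [1, 1, 0, 0],  # route 1 traverses links 1 and 2
    [1, 0, 1, 0],  # route 2 traverses links 1 and 3
    [0, 0, 1, 1],  # route 3 traverses links 3 and 4
]
X = [0, 1, 0, 0]  # only link 2 is congested
Y = boolean_product(D, X)
print(Y)  # [1, 0, 0]: only route 1 observes congestion
```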

2.3. Congestion Infer Model

Equation (7) formulates the relationship between the route status $y_i$ and the link statuses $x_j$ as $y_i = \bigvee_{j=1}^{n} (d_{ij} \wedge x_j)$. The goal of congested link identification is to find the link states $x_i$ by solving Equation (7), given $Y$ and $D$. If the state $x_i$ of the link $l_i$ can be uniquely determined by Equation (7), then the link $l_i$ is identifiable. However, for end-to-end measurements in real networks, there are usually more links than measured routes, so the tomography problem has more unknown variables than known variables, making it a typical ill-posed problem [33].
From a mathematical point of view, the ill-posed problem of congested link identification generally admits many solutions rather than a unique one. As a result, most state-of-the-art methods either make some assumptions about the nature of the various solutions or try to retrieve additional constraints to find a particular solution.
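The lack of uniqueness can be made concrete by enumerating every link-state vector that reproduces a given observation. The sketch below uses a hypothetical three-route, four-link measurement matrix (not the one of Figure 3) and exhaustive search, which is only practical for very small networks.

```python
from itertools import product


def consistent_link_states(D, Y):
    """Return every Boolean vector X whose Boolean product with D equals Y."""
    n = len(D[0])
    solutions = []
    for X in product((0, 1), repeat=n):
        predicted = [int(any(d and x for d, x in zip(row, X))) for row in D]
        if predicted == list(Y):
            solutions.append(X)
    return solutions


# Hypothetical measurement matrix: rows are routes, columns are links.
D = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
Y = [1, 1, 0]  # routes 1 and 2 are congested, route 3 is good
solutions = consistent_link_states(D, Y)
print(solutions)  # [(1, 0, 0, 0), (1, 1, 0, 0)]
```

In this toy case, links 1, 3, and 4 take the same value in every solution and are therefore identifiable, whereas the state of link 2 cannot be determined from the observations alone.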

3. The GCN Approach

3.1. Bayesian Network Model

According to the previous description, the congestion probability of a measured route of an end-to-end path is determined by the congestion probability of the links it traverses through, and the congestion probability of a link is determined by both the channel factors and the node factors. In this paper, we construct a BNM as shown in Figure 4 to describe the interaction between different factors in the congested link identification problem.
When constructing the BNM for congested link identification, consider the graph $DG = (V, E)$, where the set $V = \{v_i\}$ comprises three distinct sets of variables. Each variable in $V$ can take on two possible states: 1, representing congested, and 0, representing non-congested. These sets are as follows:
  • $Y = \{y_1, \ldots, y_m\}$ is the set of end-to-end measured route states (representing the status of the measured routes in the network), where $y_i$ represents the status of the $i$th route in the routing table (i.e., $y_1$ is the status of $r_1$, $y_2$ is the status of $r_2$, and so on).
  • $X = \{x_1, \ldots, x_n\}$ is the set of link states in the network, where $x_i$ represents the status of the $i$th link in the network (i.e., $x_1$ is the status of link $l_1$, $x_2$ is the status of link $l_2$, and so on).
  • $N = \{n_1, \ldots, n_l\}$ is the set of node states in the network, where $n_i$ represents the status of the $i$th node in the set of network nodes (i.e., $n_1$ is the status of network node $s$, $n_2$ is the status of network node $a_1$, $n_3$ is the status of network node $a_2$, and so on).
$E = \{e_{ij}\}$ is the set of edges connecting nodes $v_i$ and $v_j$, representing the causal influence among the variables in $V$. The graph $DG$ captures the first-order causal dependency between nodes (i.e., either $v_i$ influences $v_j$ or $v_j$ influences $v_i$). In the Bayesian network constructed in this paper, the state of a route is related to the states of the links it traverses, and the state of a link is related to the state of the node it flows into.
If congestion is detected on route $r_i$ ($i = 1, 2, \ldots, m$), that is, $y_i = 1$, then the problem of inferring congestion on each link traversed by the congested route $r_i$ can be transformed into a problem of selecting the most likely set of values for the hidden variables $X$, given the observed variables $Y$ in the BNM. The joint probability of the link nodes in the BNM is described as follows:
$$P(X) = \prod_{j=1}^{n} P(x_j \mid pa(x_j)) \tag{8}$$
where $pa(x_j)$ denotes the parent nodes of node $x_j$ in the BNM. Based on the detection results $y_i$ for each route $r_i$, we can utilize the BNM to infer the most likely set of links experiencing congestion. This can be solved using the $\arg\max$ function:
$$\arg\max_{X \subseteq L} P(X \mid Y) \tag{9}$$
where $Y$ is the set of measured route states. We define an $n$-dimensional indicator vector $X$ such that $x_i = 1$ when the link $l_i \in X$ is congested and, conversely, $x_i = 0$ when the link $l_i$ is not congested. According to Bayes’ theorem:
$$\arg\max_{X \subseteq L} P(X \mid Y) = \arg\max_{X \subseteq L} \frac{P(Y \mid X)\, P(X)}{P(Y)} \tag{10}$$
Since $P(Y)$ is only related to the network status and detection results and is independent of the selection of links (that is, $P(Y)$ is independent of $X$), it can be regarded as a constant. Therefore, Equation (10) is equivalent to
$$\arg\max_{X \subseteq L} P(X \mid Y) = \arg\max_{X \subseteq L} P(Y \mid X)\, P(X) = \arg\max_{X \subseteq L} \prod_{i=1}^{m} P(y_i \mid pa(y_i)) \prod_{j=1}^{n} P(x_j \mid pa(x_j)) \tag{11}$$
A normal route implies that all the links along the route are normal, and a congested route implies that at least one of the links along the route has experienced congestion. From this premise, we can derive the probabilistic relationship between the probed route and each link along the route, as shown in Equation (12):
$$P(y_i = 0 \mid pa(y_i) = \{0, \ldots, 0\}) = 1, \qquad P(y_i = 1 \mid \exists\, x_j = 1,\ x_j \in pa(y_i)) = 1 \tag{12}$$
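To make the MAP formulation tangible, the following sketch solves Equation (11) by brute force on a toy instance. It rests on two simplifying assumptions that are not part of the proposed method: the prior congestion probability of each link is known, and links are mutually independent (node parents are ignored). The measurement matrix and priors are hypothetical.

```python
from itertools import product


def map_link_states(D, Y, prior):
    """Brute-force MAP estimate of X given Y under independent link priors."""
    best_X, best_score = None, -1.0
    for X in product((0, 1), repeat=len(prior)):
        predicted = [int(any(d and x for d, x in zip(row, X))) for row in D]
        if predicted != list(Y):
            continue  # P(Y | X) = 0 under the deterministic OR of Equation (12)
        score = 1.0
        for x, p in zip(X, prior):
            score *= p if x else (1.0 - p)  # assumed independent prior P(X)
        if score > best_score:
            best_X, best_score = X, score
    return best_X, best_score


# Hypothetical measurement matrix, observations, and link priors.
D = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
Y = [1, 1, 0]
prior = [0.1, 0.2, 0.05, 0.05]
best_X, best_score = map_link_states(D, Y, prior)
print(best_X)  # (1, 0, 0, 0)
```

The search space grows as $2^n$, and the priors are rarely available in practice, which is the motivation for replacing explicit inference with a learned model.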

3.2. Graph Structure Construction

In Equation (11), $P(y_i \mid pa(y_i))$ can be obtained through Equation (12). However, estimating $P(x_j \mid pa(x_j))$ is hard in practice because it is difficult to accurately obtain the prior probabilities of channel-caused congestion and node-caused congestion. Inspired by graph models, we propose a new approach based on the GCN to avoid any explicit estimation of these prior probabilities in the BNM; that is, we use historical data to learn the mapping between the observed end-to-end path states and the unknown congested links.
We constructed a graph $G = (\nu, \varepsilon)$ for link congestion identification, where $\nu$ is the set of nodes in the graph and $\varepsilon$ represents the set of edges in the graph. There are three types of nodes in $\nu$. The first type of node represents the route status, the second type of node represents the link status, and the third type of node represents the node status in the network. There are two types of edges in $\varepsilon$. The edges between the link status nodes and route status nodes represent the routes which travel through the link, and the edges between the link status nodes and node status nodes represent the link starting from the node. We merged the route status nodes into two entities based on the test results. For example, if the test result is that the end-to-end path routes $r_3$, $r_4$, $r_5$, $r_6$, $r_7$, and $r_8$ are congested, and $r_1$ and $r_2$ are not congested, then the graph structure is constructed as shown in Figure 5.
During the experimental process, we initially conducted multiple calculations and simulations based on the complete graph structure as shown in Figure 5, which included the route state nodes. However, upon observing and analyzing these results, we noticed that the route state nodes introduced a certain level of complexity during multiple iterations and calculations. This complexity not only increased the computation time but also potentially had subtle impacts on the accuracy of the results.
To delve deeper into the reasons behind this observation, we conducted a comparative experiment, repeating the same calculations and simulations on a simplified graph which excluded all route state nodes. This alteration significantly simplified the graph structure, reducing the amount of data which needed to be processed.
By comparing the results of the two experiments, we discovered that the graph without route state nodes was more efficient in terms of computation, and the stability and accuracy of the results were also enhanced. Specifically, we observed a substantial reduction in computation time and a narrowing of the fluctuation range of the results, making them closer to theoretical expectations.
Given the experimental scenario, removing the route state nodes simplified the graph structure and improved the accuracy and efficiency of the calculation results. Consequently, based on this finding, we constructed the simplified graph shown in Figure 6 to carry out further analysis and research.
In this way, we transformed the problem of link congestion identification into a node classification task as shown in Figure 7, where the link status nodes were categorized into two classes: congestion and normal.
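A rough sketch of how such a simplified graph might be assembled is given below. The data layout, the function name `build_simplified_graph`, the small topology, and the choice to attach each link to the node it flows into are all assumptions made for illustration; the sketch is not the construction used in the experiments. It also records the links known to be good, using the fact that every link on a good route must itself be good.

```python
def build_simplified_graph(links, routes, Y):
    """links: {link: (tail, head)}; routes: {route: [links]}; Y: {route: 0 or 1}."""
    link_nodes = sorted(links)
    net_nodes = sorted({n for ends in links.values() for n in ends})
    # Assumed edge type: each link status node connects to the node it flows into.
    edges = [(l, links[l][1]) for l in link_nodes]
    # Every link traversed by a good route must be good.
    known_good = sorted({l for r, ls in routes.items() if Y[r] == 0 for l in ls})
    return {
        "link_nodes": link_nodes,
        "net_nodes": net_nodes,
        "edges": edges,
        "known_good": known_good,
    }


# Hypothetical four-link topology with two routes.
links = {"l1": ("s", "a1"), "l2": ("a1", "e1"), "l3": ("e1", "d1"), "l4": ("e1", "d2")}
routes = {"r1": ["l1", "l2", "l3"], "r2": ["l1", "l2", "l4"]}
Y = {"r1": 0, "r2": 1}
graph = build_simplified_graph(links, routes, Y)
print(graph["known_good"])  # ['l1', 'l2', 'l3']
```

The link status nodes in `graph["link_nodes"]` are the ones to be classified as congested or normal.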

3.3. The Network Structure of a GCN

The graph we constructed is inherently heterogeneous, meaning it comprises nodes and edges of diverse types and attributes. To address the complexities of such graphs, we drew inspiration from relational graph convolutional networks (RGCNs), introduced in [34]. The RGCN extends traditional graph convolutional networks (GCNs) to efficiently handle large-scale multi-relational graphs by adapting GCN principles specifically for realistic knowledge bases. It inherits the GCN’s convolutional architecture for graphs while incorporating mechanisms to model the complex interactions between different types of nodes and relations. Motivated by these strengths of the RGCN, we reformulated the congested link identification problem as a (semi-)supervised classification task on the nodes within our heterogeneous graph.
Relational graph convolutional networks (RGCNs) have the following advantages when dealing with network node classification problems. The RGCN excels at capturing the relationship between nodes in the network, whereas traditional graph neural networks cannot. By considering the relationships between nodes, the RGCN understands the role and importance of each node, thereby improving the accuracy of node classification. It is a scalable model which can handle networks of any size, enabling it to achieve good performance on large-scale network data and networks of different sizes. The RGCN is also highly robust to noise and outliers in the network, maintaining high accuracy even in the presence of such disturbances. Furthermore, it has high computational efficiency, processing large-scale network data in a short time. In short, the RGCN effectively tackles the network node classification problem by leveraging the relationship between nodes and its scalability, robustness, and computing efficiency.
The following propagation model is defined to compute the forward pass update of the entity or node indicated by ν in the previously constructed graph:
$$h_i^{(l+1)} = \sigma\left( \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right) \tag{13}$$
where $N_i^r$ denotes the set of neighbor indices of node $i$ under the relation $r \in R$ and $c_{i,r}$ is a normalization constant. In this paper, we set $c_{i,r} = |N_i^r|$ for convenience, so that each relation contributes the mean of its neighbor messages. Figure 8 shows the calculation chart of single-node updating in a GCN model.
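The single-node update of Equation (13) can be sketched without any deep learning library. The sketch assumes mean normalization (each message is divided by the number of neighbors under that relation) and takes $\sigma$ to be ReLU; both choices, as well as the relation name, features, and weights, are illustrative assumptions.

```python
def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def rgcn_update(i, h, neighbors, W_rel, W_self):
    """One relational update of node i.

    h: {node: feature vector}; neighbors: {relation: {node: [neighbor nodes]}};
    W_rel: {relation: weight matrix}; W_self: self-connection weight matrix.
    """
    out = matvec(W_self, h[i])  # self-connection term W_0 h_i
    for r, adj in neighbors.items():
        nbrs = adj.get(i, [])
        if not nbrs:
            continue
        c = len(nbrs)  # assumed normalization: number of neighbors under r
        for j in nbrs:
            msg = matvec(W_rel[r], h[j])
            out = [o + m / c for o, m in zip(out, msg)]
    return [max(0.0, o) for o in out]  # sigma assumed to be ReLU


# Hypothetical link node "l1" with two neighbors under one relation.
h = {"l1": [1.0, 0.0], "n1": [0.0, 2.0], "n2": [2.0, 0.0]}
neighbors = {"flows_into": {"l1": ["n1", "n2"]}}
W_rel = {"flows_into": [[1.0, 0.0], [0.0, 1.0]]}
W_self = [[0.5, 0.0], [0.0, 0.5]]
updated = rgcn_update("l1", h, neighbors, W_rel, W_self)
print(updated)  # [1.5, 1.0]
```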
In this paper, we stack the GCN layers as shown in Equation (13) and activate the final layer via a softmax($\cdot$) activation. The cross-entropy loss is shown in Equation (14). The task of the training phase is then to minimize the cross-entropy loss:
$$\mathcal{L} = -\sum_{i \in \mathcal{Y}} \sum_{k=1}^{K} t_{ik} \ln h_{ik}^{(L)} \tag{14}$$
The cross-entropy loss function is particularly suited for multi-class classification problems, as it combines the log-softmax operation and the negative log-likelihood loss in a single step. Specifically, the cross-entropy loss computes the softmax activation over the logits (raw output of the network) and then calculates the cross-entropy loss between the predicted probabilities and the target labels. By minimizing this loss function during training, we aimed to optimize the model’s parameters such that it could accurately classify the input data into the correct classes. The optimization of the model is carried out using an optimization function (e.g., stochastic gradient descent or its variants) which iteratively updates the model’s weights and biases in the direction which minimizes the loss.
In Equation (14), $\mathcal{Y}$ is the set of indices of the labeled nodes, $h_{ik}^{(L)}$ is the $k$th entry of the network output for the $i$th labeled node, and $t_{ik}$ is the ground truth label. In this paper, we train the model using (full-batch) gradient descent techniques. Figure 9 shows the schematic depiction of the congested link identification model.
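The loss of Equation (14) can be written out explicitly for a handful of nodes. The sketch below assumes one-hot ground truth labels, so the inner sum over $k$ reduces to a single term per labeled node; the node names and logits are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def masked_cross_entropy(logits, labels, labeled):
    """Sum of -ln(softmax output at the true class) over the labeled nodes only.

    logits: {node: [K scores]}; labels: {node: class index}; labeled: node names.
    """
    loss = 0.0
    for i in labeled:
        probs = softmax(logits[i])
        loss -= math.log(probs[labels[i]])  # one-hot t_ik selects one term
    return loss


# Hypothetical outputs for three link nodes, two of which are labeled.
logits = {"l1": [0.0, 0.0], "l2": [2.0, 0.0], "l3": [1.0, 3.0]}
labels = {"l1": 0, "l2": 0}
loss = masked_cross_entropy(logits, labels, labeled=["l1", "l2"])
print(round(loss, 4))
```

Node `l3` does not contribute to the loss because it is not in the labeled set.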

3.4. GCN-Based Congested Link Identification Algorithm

The congested link identification algorithm based on a GCN consists of two phases: the training phase and the diagnosis phase. In the training phase, the required parameters include the network topology $T$, the training data set $\Omega$, and the graph convolutional neural network model $M$.
In the training phase, the batch size in each epoch is $\eta$, and the number of training epochs is $\epsilon$. In each training iteration, an episode $(X, Y)$ is randomly selected from the training set $\Omega$, and the graph $G$ is constructed from the test results using the method in Section 3.2. The graph convolution model then produces the estimate $X_e$ from $G$. The loss is calculated using the loss function, and the neural network model $M$ is updated using the optimization function.
We saved the trained model M from the epoch with the maximum accuracy value as a result of the training.
We detail our algorithm in Algorithm 1.
Algorithm 1 Congested link identification (GIC) algorithm.
  • Training phase
  • Require: T, Ω, M, ϵ, η, acc
  • Inputs: Ω
  • Outputs: M
  • Steps:
  •       Initialize()
  •       while epoch < ϵ
  •             while iter < η
  •              select randomly (X, Y) ∈ Ω
  •              G = build_graph(Y, T)
  •              X_e = M.forward(G, Y)
  •              loss = get_loss(X, X_e)
  •              M.optimizer(loss)
  •             calculate acc
  •             if acc > max_acc
  •              max_acc = acc
  •              save(M)
  • Diagnosis phase
  • Require: T, M, Y
  • Inputs: Y = (y_1, y_2, …, y_n)
  • Outputs: X = (x_1, x_2, …, x_m)
  • Steps:
  •       load(M)
  •       G = build_graph(Y, T)
  •       X_e = M.forward(G, Y)
  •       return X_e
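The control flow of the training phase in Algorithm 1 can be sketched in Python as follows. This is only a skeleton under stated assumptions: the model interface (`forward`, `loss`, `step`), the `build_graph` callable, and the `evaluate` callable are hypothetical placeholders rather than the authors' PyTorch code, and "saving" the model is represented by keeping a deep copy of the most accurate epoch.

```python
import copy
import random


def train_gic(model, dataset, build_graph, topology, epochs, iters_per_epoch,
              evaluate, seed=0):
    """Run the training loop and return (best_model, best_accuracy).

    model:       object with forward(graph, y), loss(x, x_est), step(loss_value)
                 (placeholder method names, not a real library API)
    dataset:     list of (X, Y) episodes, i.e. the training set Omega
    build_graph: callable (Y, topology) -> graph
    evaluate:    callable (model) -> accuracy
    """
    rng = random.Random(seed)
    best_acc = float("-inf")
    best_model = copy.deepcopy(model)
    for _ in range(epochs):
        for _ in range(iters_per_epoch):
            x, y = rng.choice(dataset)          # random episode (X, Y) from Omega
            graph = build_graph(y, topology)    # graph built from path measurements
            x_est = model.forward(graph, y)     # estimated link states X_e
            model.step(model.loss(x, x_est))    # one optimizer update
        acc = evaluate(model)
        if acc > best_acc:                      # keep the most accurate epoch
            best_acc = acc
            best_model = copy.deepcopy(model)
    return best_model, best_acc
```

In the diagnosis phase, the returned snapshot would be used in the same way as the loaded model: build the graph from the observed path states and return the forward pass output.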

4. Performance Evaluation

4.1. Experimental Set-Up

Following [26], and to keep the evaluation controlled and focused on the core concepts, we validated the proposed method by numerical simulation. Figure 3 shows the topology of the network designed for the experiment. There are nine nodes in this topology. Without loss of generality, we took node $s$ as the source node and nodes $d_1$, $d_2$, $d_3$, and $d_4$ as the destination nodes. In this topology, there are two routes between the source node and each destination node, and some links are shared between different routes, which partly reflects the complexity of the network.
To evaluate the performance of the GIC algorithm, we devised a simulation framework which incorporates the concept of maximum prior congestion probabilities for both links and nodes.
For the links, we introduced the maximum link prior congestion probability threshold $\delta_l$, which represents the upper limit of the potential congestion level within the network. We randomly assigned a prior congestion probability $p_{l_k}$ to each link $l_k$ within the interval $[0, \delta_l]$. This approach ensured that while the overall congestion level was bounded by $\delta_l$, the specific congestion probabilities for individual links differed, reflecting the diversity in proneness to congestion across the network.
Furthermore, to capture the impact of node-related factors on network congestion, we introduce the concept of node-caused congestion probability. The node prior congestion probability $p_{n_k}$ represents the likelihood of congestion arising from the computational capacity, storage space, and energy limitations of node $n_k$. When a node becomes congested, we assume that all links originating from that node are also congested. Analogous to the link-level approach, we define a maximum node prior congestion probability $\delta_n$ for the network and randomly assign a congestion probability $p_n$ to each node within the interval $[0, \delta_n]$. This mechanism ensures that, in addition to link-specific congestion, the simulation also accounts for the heterogeneous nature of node congestion, thereby enhancing the realism and relevance of the simulation outcomes.
We assume that the congestion probability of a link $l_k$ due to channel factors is independent of the node-caused congestion probability of its sending node $N_l$. Then, the congestion probability of a link $p_c$ is obtained from the link factor congestion probability $p_l$ and the node factor congestion probability $p_n$:
$$p_c = p_l + p_n - p_l \times p_n \tag{15}$$
In the simulation experiments, the congested links in the network were generated randomly according to the congestion probability, and we calculated the congestion probability of each link according to Equation (15).
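As a quick check of Equation (15), the combined probability follows directly from the independence assumption: a link is normal only if neither the channel factor nor the sending node causes congestion. The numerical values below are illustrative and are not taken from the experiments.

```latex
\begin{aligned}
p_c &= 1 - (1 - p_l)(1 - p_n) = p_l + p_n - p_l \times p_n, \\
p_l = 0.1,\; p_n = 0.2 \;&\Rightarrow\; p_c = 0.1 + 0.2 - 0.02 = 0.28 .
\end{aligned}
```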
We then performed simulation experiments at different values of $\delta_l$ and $\delta_n$ and repeated each experiment 20 times. Each repeated experiment was called a scenario, and in each scenario, we regenerated the link-caused congestion probability $P_l = [p_1, \ldots, p_n]$ and the node-caused congestion probability $P_n = [p_1, \ldots, p_k]$ for each link in the network. We generated 1000 simulation data points for each scenario, of which the first 80% were used to train the models and the last 20% were used to validate the trained models.
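The scenario generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the argument `sender_of_link` (a list giving the index of the sending node of each link) is a hypothetical way of encoding the topology, and only link states are generated here.

```python
import random


def generate_scenario(sender_of_link, num_nodes, delta_l, delta_n,
                      num_samples=1000, train_fraction=0.8, seed=0):
    """Draw priors for one scenario and sample link states.

    Returns (train_samples, validation_samples); each sample is a list of
    0/1 link states, where 1 means congested.
    """
    rng = random.Random(seed)
    num_links = len(sender_of_link)

    # Per-scenario priors, drawn uniformly from [0, delta_l] and [0, delta_n].
    p_link = [rng.uniform(0.0, delta_l) for _ in range(num_links)]
    p_node = [rng.uniform(0.0, delta_n) for _ in range(num_nodes)]

    samples = []
    for _ in range(num_samples):
        node_congested = [rng.random() < p for p in p_node]
        # A link is congested if its channel is congested or its sending node is.
        link_states = [
            int(rng.random() < p_link[k] or node_congested[sender_of_link[k]])
            for k in range(num_links)
        ]
        samples.append(link_states)

    split = int(train_fraction * num_samples)
    return samples[:split], samples[split:]  # first 80% train, last 20% validate
```

Sampling the node state once per data point and sharing it across all outgoing links is what introduces the correlation between links that originate from the same node.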

4.2. Evaluation Metrics

Congested link identification algorithms are designed to increase the detection capability of congestion and decrease the likelihood that a regular link will be falsely recognized as a congested link. Accordingly, similar to [26], we evaluated the performance of the algorithm using the metrics of the detection rate (DR) and false positive rate (FPR). The DR indicates the probability that a congested link is accurately identified as congested, and we calculated the DR as follows:
$$\mathrm{DR} = \frac{\left| C \cap \{ l \mid l \in L;\ x(l) = 1 \} \right|}{\left| \{ l \mid l \in L;\ x(l) = 1 \} \right|}$$
The FPR indicates the probability that a normal link is incorrectly identified as congested, and we calculated the FPR as follows:
$$\mathrm{FPR} = \frac{\left| C \cap \{ l \mid l \in L;\ x(l) = 0 \} \right|}{\left| \{ l \mid l \in L;\ x(l) = 0 \} \right|}$$
where $C$ represents the set of links estimated to be congested and $x(l)$ represents the actual state of link $l$.
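The two metrics can be computed from a pair of 0/1 link-state vectors as in the short sketch below. The function name `detection_metrics` is illustrative, and returning NaN when a denominator is empty (no congested links, or no normal links) is an assumption made here; the text does not specify that case.

```python
def detection_metrics(estimated, actual):
    """Return (DR, FPR) for 0/1 link-state vectors of equal length.

    estimated[k] == 1 means link k is in the estimated congested set C;
    actual[k] is the true state x(l_k).
    """
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")
    congested = [k for k, x in enumerate(actual) if x == 1]
    normal = [k for k, x in enumerate(actual) if x == 0]
    true_pos = sum(estimated[k] == 1 for k in congested)
    false_pos = sum(estimated[k] == 1 for k in normal)
    dr = true_pos / len(congested) if congested else float("nan")
    fpr = false_pos / len(normal) if normal else float("nan")
    return dr, fpr


# Two congested links, one of them detected; three normal links, one flagged.
dr, fpr = detection_metrics(estimated=[1, 0, 1, 0, 0], actual=[1, 1, 0, 0, 0])
```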

4.3. Choosing Model Parameters

We implemented our GCN-based algorithm using the deep learning framework PyTorch. In this part, we describe the relevant parameters of our GCN model.
(1) Hyperparameters
The hyperparameters of the GCN-based congested link identification algorithm we propose mainly include the learning rate, batch size, training epochs, the number of hidden layers, and the number of hidden units. In the simulations, we manually adjusted and set the learning rate to 0.01, the batch size to 200, the number of training epochs to 50, and the number of hidden layers to 3. The number of hidden units significantly influences the performance of the GIC algorithm. The selection of the optimal number of hidden units is described below.
(2) Training
In the training phase, we used the training data set (80% of the total data) as input, and the remainder was used as input during the testing process. We trained the GCN model using the Adam optimizer.
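For reference, the settings reported in this subsection can be collected in a single configuration object. The class and field names below are illustrative. The number of hidden units is deliberately left without a default because it is selected experimentally, and the value 16 in the example is only a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GICTrainingConfig:
    hidden_units: int              # selected experimentally; no default on purpose
    learning_rate: float = 0.01
    batch_size: int = 200
    epochs: int = 50
    hidden_layers: int = 3
    optimizer: str = "adam"
    train_fraction: float = 0.8    # remaining 20% is used for testing


# Placeholder value for hidden_units, not a value reported in the paper.
config = GICTrainingConfig(hidden_units=16)
```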

4.4. Performance Evaluation Results

We simulated network congestion by assigning a congestion probability to each of the $n$ links. Congestion was caused either by limited link performance or by insufficient resources at the link's originating node. In this paper, we introduce the concept of congestion caused by node factors, which means that when a node is congested, all the links originating from that node are congested. Each link $l_k$ in the network is congested with a probability $p_{l_k}$ distributed uniformly in $[0, \delta_l]$. Each node $n_k$ becomes congested with a probability $p_{n_k}$ uniformly distributed in $[0, \delta_n]$, where $\delta_l$ is the maximum link congestion probability threshold and $\delta_n$ is the maximum node-caused congestion probability threshold. The values of $\delta_l$ and $\delta_n$ were varied in our simulations to evaluate the algorithm under different situations.
(1) Performance under different numbers of hidden units
The number of hidden units is an essential parameter in the GCN-based algorithm, since different numbers of hidden units can significantly influence the performance of the GCN model [35]. We chose the network topology shown in Figure 3 and measured the performance of our GIC algorithm with the values of the DR and FPR for different numbers of hidden units. We set $\delta_n = 0.09$ and $\delta_l = 0.09$. As shown in Figure 10, the x axis represents the number of hidden units.
Figure 10 shows that as the number of hidden units increased, the performance of the model first rose and then decreased. The main reason is that when the number of hidden units exceeds a certain threshold, the model complexity and the computational difficulty increase, which leads to overfitting [35].
(2) Performance versus different channel-caused congestion probabilities
We compared our algorithm with SCFS [12], CLINK [13], and RIC [26]. Because most of these algorithms neglect the impact of node congestion, we first compared the algorithms without node congestion: to test the impact of the channel-caused congestion probability, we set $\delta_n = 0$ and varied $\delta_l$ from 0 to 0.3.
The results are shown in Figure 11 and Figure 12. As the channel-caused congestion probability increased, the performance of all congested link identification algorithms degraded because as the number of congested links increased, the number of congested routes also increased. A large number of congested routes increases the number of feasible solutions, which raises the probability of errors in the identification algorithm. More importantly, as the network congestion level increased, the chance of multiple congested links on the same path increased, but the existing algorithms all assume only one congested link per path, which reduced their DR. The FPR values, however, differed little, since the existing algorithms are conservative in inferring congestion.
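The growth in feasible solutions can be made concrete with the route table of the network (Table 1). The sketch below is illustrative only: it assumes the usual Boolean tomography rule that a route is congested if at least one of its links is congested, and it counts by brute force how many link-state vectors reproduce a given set of observed route states. The function names are hypothetical.

```python
from itertools import product

# Route table (Table 1). Rows: links l1..l10; columns: the eight routes in table order.
ROUTING = [
    [1, 0, 1, 0, 1, 0, 1, 0],  # l1
    [0, 1, 0, 1, 0, 1, 0, 1],  # l2
    [1, 0, 1, 0, 0, 0, 0, 0],  # l3
    [0, 0, 0, 0, 1, 0, 1, 0],  # l4
    [0, 1, 0, 1, 0, 0, 0, 0],  # l5
    [0, 0, 0, 0, 0, 1, 0, 1],  # l6
    [1, 1, 0, 0, 0, 0, 0, 0],  # l7
    [0, 0, 1, 1, 0, 0, 0, 0],  # l8
    [0, 0, 0, 0, 1, 1, 0, 0],  # l9
    [0, 0, 0, 0, 0, 0, 1, 1],  # l10
]


def route_states(link_states):
    """A route is congested (1) iff at least one link on it is congested."""
    num_routes = len(ROUTING[0])
    return tuple(
        int(any(ROUTING[k][r] and link_states[k] for k in range(len(ROUTING))))
        for r in range(num_routes)
    )


def count_consistent(observed_routes):
    """Number of link-state vectors that reproduce the observed route states."""
    observed = tuple(observed_routes)
    return sum(
        route_states(x) == observed
        for x in product((0, 1), repeat=len(ROUTING))
    )
```

Under this rule, an observation in which only the first two routes are congested has a single consistent explanation (link l7), whereas observations with more congested routes admit several, so the end-to-end measurements alone no longer determine the link states.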
(3) Performance versus different node-caused congestion probabilities
To investigate the effect of the node-caused congestion probability on the performance of the algorithms, we varied $\delta_n$ from 0 to 0.3 in steps of 0.03. For each fixed value of $\delta_n$, we further varied $\delta_l$ from 0 to 0.3 in steps of 0.03. For each combination of $\delta_n$ and $\delta_l$, we recorded the DR and FPR. Then, for each fixed $\delta_n$ value, we computed the average values of the DR and FPR over all $\delta_l$ values. The trends of these average DR and FPR values with respect to changes in $\delta_n$ are presented in Figure 13 and Figure 14, respectively.
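The averaging procedure described above amounts to a grid sweep, sketched below. The `evaluate` callable is a hypothetical stand-in for running one experiment at a given pair of thresholds and returning its (DR, FPR); the experiments themselves are not reproduced here.

```python
def sweep_node_probability(evaluate, steps=11, step_size=0.03):
    """For each delta_n on the grid, average (DR, FPR) over all delta_l values.

    evaluate: callable (delta_l, delta_n) -> (dr, fpr)
    Returns a dict mapping delta_n -> (mean_dr, mean_fpr).
    """
    # Grid 0.00, 0.03, ..., 0.30; rounding avoids floating-point drift in the keys.
    grid = [round(i * step_size, 2) for i in range(steps)]
    averages = {}
    for delta_n in grid:
        results = [evaluate(delta_l, delta_n) for delta_l in grid]
        mean_dr = sum(dr for dr, _ in results) / len(results)
        mean_fpr = sum(fpr for _, fpr in results) / len(results)
        averages[delta_n] = (mean_dr, mean_fpr)
    return averages
```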
The experimental results show that once node-caused congestion was introduced, the accuracy of the SCFS, CLINK, and RIC algorithms decreased significantly compared with the case without it. As the node-caused congestion probability increased, the performance of SCFS decreased most severely, and the performance of the CLINK and RIC algorithms was also strongly affected. This is because node congestion increases the correlation between links, and the previous algorithms do not deal well with congestion that is correlated through the network topology. Across the different node-caused congestion probabilities, the proposed GCN-based congested link identification algorithm achieved the best DR and FPR among the compared algorithms, and its DR was far superior to that of the previous algorithms.
(4) Performances of various topology sizes
To validate the applicability of our algorithm, we conducted experiments on a series of binary tree-based topologies, which are illustrated in Figure 15. These topologies offer a contrasting structure to our initial evaluation scenario, enabling us to thoroughly assess the robustness of the diagnosis methods.
We systematically varied the network size by adjusting the number of nodes, denoted as $N$, from 2 to 16 and observed the impact on the DRs and FPRs of the different methods. For each network size, we conducted multiple simulations by iteratively changing $\delta_l$ and $\delta_n$ from 0.0 to 0.3 with a step size of 0.1.
The results, summarized in Figure 16 and Figure 17, reveal several interesting insights.
First, the mean DR values achieved by the GIC method remained relatively stable across different network sizes, indicating its robustness to changes in topology and scale. In contrast, the DR values of SCFS, CLINK, and RIC decreased as the network size increased, with SCFS showing the most significant decline. This finding aligns with previous experimental results reported in [35], which also observed a reduction in the DR values as the network size expanded.
The stability of the GIC method can be attributed to its ability to effectively identify and isolate faults in complex network topologies, regardless of their size. This suggests that the GIC method may be more suitable for large-scale networks, where the impacts of the topology and size on diagnosis performance are more pronounced.

5. Conclusions

In addressing the challenge of congested link identification in WSNs, we leveraged the power of Bayesian networks to model the intricate relationships among congestion factors as a directed acyclic graph. This approach offers a clear and intuitive visualization of the interdependencies within the congested network, facilitating a deep understanding. By incorporating prior knowledge and subjective beliefs into the modeling process, Bayesian networks demonstrate high adaptability to the unique characteristics of WSNs, including node-induced congestion. Furthermore, their ability to reason efficiently under uncertainty enabled us to make informed predictions and decisions, even with incomplete or ambiguous network data. Building upon this BNM, we reframed the congested link identification problem as a node classification task, a strategy which our proposed GIC algorithm employs effectively. The simulation results underscore the GIC method’s ability to accurately extract network topology features and address node-caused congestion while also overcoming the limitations of existing methods, such as the smallest consistent failure set theory, in handling multiple congestion links within the same path. Comprehensive simulations validated the feasibility and precision of the GIC method.

Author Contributions

Conceptualization, X.L.; methodology, J.S.; validation, J.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets generated and analyzed during the current study are available at https://doi.org/10.5281/zenodo.12755369.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Viani, F.; Rocca, P.; Oliveri, G.; Massa, A. Pervasive remote sensing through WSNs. In Proceedings of the 2012 6th European Conference on Antennas and Propagation (EUCAP), Prague, Czech Republic, 26–30 March 2012; IEEE: Piscataway, NJ, USA, 2012.
  2. Kandris, D.; Nakas, C.; Vomvas, D.; Koulouras, G. Applications of Wireless Sensor Networks: An Up-to-Date Survey. Appl. Syst. Innov. 2020, 3, 14.
  3. Popescu, D.; Stoican, F.; Stamatescu, G.; Ichim, L.; Dragana, C. Advanced UAV—WSN System for Intelligent Monitoring in Precision Agriculture. Sensors 2020, 20, 817.
  4. Singh, P.K.; Sharma, A. An intelligent WSN-UAV-based IoT framework for precision agriculture application. Comput. Electr. Eng. 2022, 100, 107912.
  5. Lanzolla, A.; Spadavecchia, M. Wireless sensor networks for environmental monitoring. Sensors 2021, 21, 1172.
  6. Ullo, S.L.; Sinha, G.R. Advances in Smart Environment Monitoring Systems Using IoT and Sensors. Sensors 2020, 20, 3113.
  7. Qin, Z.; Denker, G.; Giannelli, C.; Bellavista, P.; Venkatasubramanian, N. A software defined networking architecture for the internet-of-things. In Proceedings of the 2014 IEEE Network Operations and Management Symposium (NOMS), Krakow, Poland, 5–9 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–9.
  8. Hunkeler, U.; Lombriser, C.; Truong, H.L.; Weiss, B. A case for centrally controlled wireless sensor networks. Comput. Netw. 2013, 57, 1425–1442.
  9. Anitha, P.; Vimala, H.; Shreyas, J. Comprehensive review on congestion detection, alleviation, and control for IoT networks. J. Netw. Comput. Appl. 2023, 221, 103749.
  10. Tao, X.; Monaco, D.; Sacco, A.; Silvestri, S.; Marchetto, G. Delay-Aware Routing in Software-Defined Networks Via Network Tomography and Reinforcement Learning. IEEE Trans. Netw. Sci. Eng. 2024, 11, 3383–3397.
  11. Vardi, Y. Network tomography: Estimating source-destination traffic intensities from link data. J. Am. Stat. Assoc. 1996, 91, 365–377.
  12. Duffield, N. Network tomography of binary network performance characteristics. IEEE Trans. Inf. Theory 2006, 52, 5373–5388.
  13. Nguyen, H.X.; Thiran, P. The boolean solution to the congested IP link location problem: Theory and practice. In Proceedings of the IEEE INFOCOM 2007—26th IEEE International Conference on Computer Communications, Anchorage, AK, USA, 6–12 May 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 2117–2125.
  14. Zheng, Q.; Cao, G. Minimizing probing cost and achieving identifiability in probe-based network link monitoring. IEEE Trans. Comput. 2011, 62, 510–523.
  15. Holbert, B.; Tati, S.; Silvestri, S.; La Porta, T.F.; Swami, A. Network topology inference with partial information. IEEE Trans. Netw. Serv. Manag. 2015, 12, 406–419.
  16. Xue, L.; Marina, M.K.; Li, G.; Zheng, K. Paint: Path aware iterative network tomography for link metric inference. In Proceedings of the 2022 IEEE 30th International Conference on Network Protocols (ICNP), Lexington, KY, USA, 30 October–2 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–12.
  17. Arrigoni, V.; Bartolini, N.; Massini, A.; Trombetti, F. Failure localization through progressive network tomography. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; IEEE: Piscataway, NJ, USA, 2022; pp. 1–10.
  18. Chiu, C.C.; He, T. Stealthy dgos attack against network tomography: The role of active measurements. IEEE Trans. Netw. Sci. Eng. 2021, 8, 1745–1758.
  19. He, T.; Ma, L.; Swami, A.; Towsley, D. Network Tomography: Identifiability, Measurement Design, and Network State Inference; Cambridge University Press: Cambridge, UK, 2021.
  20. Rkhami, A.; Hadjadj-Aoul, Y.; Rubino, G.; Outtagarts, A. On the use of machine learning and network tomography for network slices monitoring. In Proceedings of the 2021 IEEE 22nd International Conference on High Performance Switching and Routing (HPSR), Paris, France, 7–10 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–7.
  21. Sartzetakis, I.; Varvarigos, E. Network Tomography with Partial Topology Knowledge and Dynamic Routing. J. Netw. Syst. Manag. 2023, 31, 73.
  22. Duffield, N. Simple network performance tomography. In Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, Miami Beach, FL, USA, 27–29 October 2003; pp. 210–215.
  23. Tsang, Y.; Yildiz, M.; Barford, P.; Nowak, R. Network radar: Tomography from round trip time measurements. In Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, Sicily, Italy, 25–27 October 2004; pp. 175–180.
  24. Padmanabhan, V.N.; Qiu, L.; Wang, H.J. Server-based inference of internet performance. In Proceedings of the IEEE INFOCOM; Citeseer: San Francisco, CA, USA, 2003; Volume 3, pp. 1–15.
  25. Pan, S.; Li, P.; Zeng, D.; Zhang, Z.; Guo, S.; Liang, Y.C. Learning-Based Network Boolean Tomography for Identifying Congested Links with Correlations. In Proceedings of the GLOBECOM 2020-2020 IEEE Global Communications Conference, Taipei, Taiwan, 8–10 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
  26. Pan, S.; Li, P.; Zeng, D.; Guo, S.; Hu, G. A Q-learning based framework for congested link identification. IEEE Internet Things J. 2019, 6, 9668–9678.
  27. Steinder, M.; Sethi, A.S. Probabilistic fault diagnosis in communication systems through incremental hypothesis updating. Comput. Netw. 2004, 45, 537–562.
  28. Kandula, S.; Katabi, D.; Vasseur, J.P. Shrink: A tool for failure diagnosis in IP networks. In Proceedings of the 2005 ACM SIGCOMM Workshop on Mining Network Data, Philadelphia, PA, USA, 26 August 2005; pp. 173–178.
  29. Ee, C.T.; Bajcsy, R. Congestion control and fairness for many-to-one routing in sensor networks. In Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, Baltimore, MD, USA, 3–5 November 2004; pp. 148–161.
  30. Wang, C.; Li, B.; Sohraby, K.; Daneshmand, M.; Hu, Y. Upstream congestion control in wireless sensor networks through cross-layer optimization. IEEE J. Sel. Areas Commun. 2007, 25, 786–795.
  31. Wei, W.; Wang, B.; Towsley, D.; Kurose, J. Model-based identification of dominant congested links. In Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, Miami Beach, FL, USA, 27–29 October 2003; pp. 115–128.
  32. Pan, S.L.; Zhang, Z.Y.; Zhou, Y.J.; Qian, F.; Hu, G.M. Identify congested links based on enlarged state space. J. Comput. Sci. Technol. 2016, 31, 350–358.
  33. Pan, S.; Zhou, Y.; Zhang, Z.; Yang, S.; Qian, F.; Hu, G. Identify congested links with network tomography under multipath routing. J. Netw. Syst. Manag. 2019, 27, 409–429.
  34. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 593–607.
  35. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
Figure 1. A simple multiple-route topology.
Figure 2. Congestion caused by node factors or channel factors.
Figure 3. A simple multiple-route topology.
Figure 4. BNM of the network.
Figure 5. The graph constructed when $r_3$, $r_4$, $r_5$, $r_6$, $r_7$, and $r_8$ are congested.
Figure 6. The simplified graph when $r_3$, $r_4$, $r_5$, $r_6$, $r_7$, and $r_8$ are congested.
Figure 7. Transforming the task of congested link identification into graph node classification.
Figure 8. GCN model structure.
Figure 9. Schematic depiction of our entity classification model.
Figure 10. Performance under different hidden units.
Figure 11. DR versus different channel cause congestion probabilities.
Figure 12. FPR versus different channel cause congestion probabilities.
Figure 13. DR versus different congestion caused by node factor probabilities.
Figure 14. FPR versus different congestion caused by node factor probabilities.
Figure 15. Schematic diagram of binary tree network topology.
Figure 16. DR versus different network size.
Figure 17. FPR versus different network size.
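Table 1. The route table of the network.
| Links \ Routes | $r_1^1$ | $r_1^2$ | $r_2^1$ | $r_2^2$ | $r_3^1$ | $r_3^2$ | $r_4^1$ | $r_4^2$ |
|---|---|---|---|---|---|---|---|---|
| $l_1$ | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| $l_2$ | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| $l_3$ | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| $l_4$ | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| $l_5$ | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| $l_6$ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| $l_7$ | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| $l_8$ | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| $l_9$ | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| $l_{10}$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |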
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Song, J.; Liao, X.; Qiao, J. A Graph Convolutional Network-Based Method for Congested Link Identification. Appl. Sci. 2024, 14, 9164. https://doi.org/10.3390/app14209164

AMA Style

Song J, Liao X, Qiao J. A Graph Convolutional Network-Based Method for Congested Link Identification. Applied Sciences. 2024; 14(20):9164. https://doi.org/10.3390/app14209164

Chicago/Turabian Style

Song, Jiaqing, Xuewen Liao, and Jiandong Qiao. 2024. "A Graph Convolutional Network-Based Method for Congested Link Identification" Applied Sciences 14, no. 20: 9164. https://doi.org/10.3390/app14209164
