1. Introduction
With the rapid development of the Internet of Things (IoT), the number of smart devices interconnected through the Internet has grown explosively, producing networks of unprecedented scale [1,2,3]. Correspondingly, the IoT network structure has gradually evolved from small-scale distributed systems to a large-scale hierarchical layout characterized by tight collaboration between backbone and access networks, showing strong potential for cross-domain communication [4,5,6]. In this context, the dynamic nature of backbone node connections and the surge in service demand raise a pressing challenge: how to detect link faults between backbone nodes rapidly and perceive topological changes in real time, thereby increasing the effective transmission time of services.
During service transmission, routing protocols play an increasingly critical role, especially in the IoT, where the on-demand routing protocol AODV is widely used [7,8]. However, AODV's path selection is relatively rigid, typically relying on a single route per destination, which can cause route contention and, in turn, network congestion. Furthermore, the delay of AODV's route discovery process conflicts with the requirement that backbone nodes perceive topological changes in real time, which limits the use of AODV in backbone networks. To mitigate routing contention in the IoT, Elappila et al. proposed a congestion- and interference-aware energy-efficient routing technique based on the signal-to-interference-plus-noise ratio of links. By evaluating the survivability factor of the path from the next-hop node to the destination together with its current congestion level, it effectively meets the data routing demands of high-load networks [9]. Guo et al. introduced a reinforcement learning-based routing algorithm whose reward function combines residual energy and hop count, optimizing routing in wireless sensor networks (WSNs), extending network lifetime, and maintaining a good data transmission rate [10]. Younus et al. used reinforcement learning to optimize routing in software-defined wireless sensor networks, proposing a reward function based on energy utilization and QoS requirements that significantly improves data transmission efficiency [11]. Although these algorithms reduce routing contention and improve data transmission efficiency to some extent, they do so mainly by optimizing paths. When a link between nodes fails and the fault is not detected promptly, even the best routing algorithm cannot mitigate the impact of the faulty link, leading to packet loss and reduced transmission efficiency.
Open Shortest Path First (OSPF) is a proactive routing protocol in which every node maintains a complete view of the network topology within its area [12,13,14]. When a link fails, OSPF quickly disseminates the change to all nodes and updates their routes, so that routing converges across all nodes nearly simultaneously. This mechanism is particularly suitable for IoT backbone nodes, as it avoids the route discovery delay of AODV. However, its fault detection process relies on fixed timers and is slow. As a result, failures are hard to identify quickly, which severely affects service transmission in the IoT and hinders OSPF's deployment there. To accelerate fault detection, the studies in [15,16,17] introduced the Bidirectional Forwarding Detection (BFD) protocol. BFD establishes independent sessions between OSPF neighbors and sends probe packets at millisecond intervals, allowing real-time monitoring of link status and rapid detection of link failures. However, such high-frequency probing increases routing overhead. In particular, under link congestion or interference, BFD is prone to false alarms, which trigger unnecessary route convergence and further increase routing overhead in the network [18]. To reduce routing overhead, Manousakis et al. developed a tool based on an enhanced simulated annealing algorithm that automatically optimizes OSPF area partitioning to balance multiple performance objectives. This tool effectively reduces routing overhead, convergence time, and latency while improving bandwidth utilization [19]. However, it mainly targets networks whose topology changes little, which limits its feasibility for practical IoT deployments with dynamic topologies. To handle time-varying topologies in mobile ad hoc networks (MANETs), the IETF introduced the OSPF-MPR [20,21], OSPF-MDR [22], and OSPF-OR [23] protocols. These protocols add specific mechanisms to meet the routing convergence demands of MANET topologies; however, they improve convergence speed only marginally. To address this gap, researchers have incorporated centrality into the routing convergence process to speed up convergence while reducing routing overhead to some extent. References [24,25] study the placement of Designated Routers (DRs) and optimize DR layouts using betweenness, closeness, and degree centrality to shorten routing convergence time. References [26,27] adjust the Hello sending frequency according to betweenness centrality, increasing it for nodes with higher betweenness centrality to expedite fault detection while reducing routing overhead. References [28,29,30] optimize Hello transmission based on load centrality, running the load centrality algorithm directly on distributed routers to reduce computational complexity, which significantly improves routing convergence efficiency. Although centrality-based algorithms have succeeded in accelerating fault detection and controlling routing overhead, nodes with lower centrality can still become critical routing nodes. Their failures can delay link fault detection and convergence, leading to data loss. Conversely, nodes with higher centrality may waste network resources by sending Hello messages too frequently.
To address these issues, this paper proposes an OSPF route convergence method based on inverse coupled simulated annealing (OSPF-ICSA). It targets three problems of IoT backbone nodes: time-varying topology, slow fault detection, and inadequate topology awareness, which together shorten the effective transmission time of services. The method integrates OSPF into the IoT and improves the traditional protocol in three steps. First, it uses the statistical characteristics of Hello packets to assess link states and characterizes each node's state by aggregating the states of its links, dynamically reflecting topological changes in the IoT. This captures the dynamics of connection relationships and supplies real-time network state data to the inverse coupled annealing algorithm (referred to in the following sections as the reverse coupled annealing algorithm). Second, building on an improved Hello packet, it introduces an OSPF interval synchronization and node state propagation mechanism. This mechanism synchronizes the Hello sending interval and fault detection time of all nodes within the same subnet and propagates node states, laying the foundation for the inverse coupled annealing algorithm. Finally, it introduces the inverse coupled annealing algorithm, in which two coupled annealing processes jointly and dynamically adjust the Hello sending interval and fault detection time. When link states are poor, the algorithm accelerates fault detection so that nodes identify and respond to link failures more quickly, shortening overall routing convergence time. When link states are good, it lowers the Hello sending frequency to reduce routing overhead.
2. Main Contribution
2.1. Design Overview
OSPF routing convergence refers to the process by which nodes in a network reach a consensus and update routing information after detecting a change in links or nodes [12,13,14]. This process is illustrated in Figure 1.
The process begins with fault detection. Node R periodically sends Hello packets via the Hello protocol to monitor the status of its neighbors. The Hello sending interval, denoted $T_{\mathrm{hello}}$, is controlled by the Hello Interval parameter and defaults to 10 s. The link fault detection time $T_{\mathrm{dead}}$ is usually set to four times $T_{\mathrm{hello}}$. If no Hello packet is received from a neighbor within $T_{\mathrm{dead}}$ (governed by the Dead Interval parameter), the neighbor is considered unreachable and link fault handling is triggered.

Assume that a fault occurs at a uniformly random time after the last Hello packet received from the neighbor, within one Hello interval. The Dead timer restarted by that Hello expires $T_{\mathrm{dead}}$ after it. Hence the time $X$ from the occurrence of the link fault to its detection by R is uniformly distributed on $(T_{\mathrm{dead}} - T_{\mathrm{hello}}, T_{\mathrm{dead}}) = (30, 40)$ s, with probability density function

$$f(x) = \begin{cases} \dfrac{1}{10}, & 30 < x < 40, \\ 0, & \text{otherwise}. \end{cases}$$

Its expected value is $E[X] = (30 + 40)/2 = 35$, so the average fault detection time is 35 s.
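The uniform-delay argument above can be checked numerically. The following is a minimal sketch under the same assumption: the fault time, measured from the last Hello received, is uniform within one Hello interval. It compares a Monte Carlo estimate of $E[X]$ with the closed form $T_{\mathrm{dead}} - T_{\mathrm{hello}}/2$. The function names are illustrative only.

```python
import random


def detection_delay(t_hello: float, t_dead: float, rng: random.Random) -> float:
    """Time from a link fault to Dead-timer expiry at node R.

    Assumes the fault occurs a uniformly random time after the last Hello
    received from the neighbor, within one Hello interval; the Dead timer
    restarted by that Hello expires t_dead after it.
    """
    elapsed = rng.uniform(0.0, t_hello)  # time since the last Hello when the fault occurs
    return t_dead - elapsed


def mean_detection_delay(t_hello: float, t_dead: float,
                         trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[X]."""
    rng = random.Random(seed)
    return sum(detection_delay(t_hello, t_dead, rng) for _ in range(trials)) / trials


def analytic_mean(t_hello: float, t_dead: float) -> float:
    """Mean of the uniform distribution U(t_dead - t_hello, t_dead)."""
    return t_dead - t_hello / 2


if __name__ == "__main__":
    print(analytic_mean(10, 40))          # 35.0 for the OSPF defaults
    print(mean_detection_delay(10, 40))   # Monte Carlo estimate, close to 35
    print(analytic_mean(1, 4))            # 3.5 for an illustrative 1 s / 4 s setting
```

The last line shows why shortening both timers is the main lever on detection time: the mean delay scales linearly with $T_{\mathrm{dead}}$ and $T_{\mathrm{hello}}$.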
When a fault is detected, R generates a new Link State Advertisement (LSA) reflecting the topological change. Neighboring nodes that receive the new LSA store it in their Link State Databases (LSDBs) and flood it to all of their neighbors until every node has received it. To make flooding efficient, OSPF elects a Designated Router (DR) and a Backup Designated Router (BDR). The DR receives LSAs within the broadcast segment and floods them to the other routers, which reduces redundant transmissions and bandwidth consumption. The BDR backs up the DR, ensuring continuity of LSA flooding. Together, the DR and BDR let OSPF flood LSAs efficiently while reducing routing overhead and network load. The flooding time of an LSA within one broadcast segment, $T_{\mathrm{seg}}$, therefore depends on the following quantities:
- the time taken by the DR ($t_{\mathrm{DR}}$) and by the BDR/other routers ($t_{\mathrm{O}}$) to generate the LSA;
- the LSA packet size $s$ in bytes;
- the interface transmission rate $r$;
- the time $t_{\mathrm{proc}}$ taken by the DR to process the LSA.

The total time for disseminating the LSA across the network is the sum over the broadcast segments it traverses:

$$T_{\mathrm{flood}} = \sum_{i=1}^{n} T_{\mathrm{seg},i},$$

where $n$ is the maximum number of broadcast segments the LSA must traverse to reach all nodes and $T_{\mathrm{seg},i}$ is the time required for the $i$-th traversed segment.
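Once LSA flooding is complete, each node recalculates the Shortest Path Tree (SPT) from the updated LSDB using Dijkstra's algorithm and updates its routing table. Let the time taken for this step be $T_{\mathrm{SPF}}$. The total routing convergence time is the sum of the three phases:

$$T_{\mathrm{conv}} = X + T_{\mathrm{flood}} + T_{\mathrm{SPF}},$$

where $X$ is the fault detection delay and $T_{\mathrm{flood}}$ is the LSA flooding time.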
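To make the relative size of the three phases concrete, here is a small worked example of $T_{\mathrm{conv}} = X + T_{\mathrm{flood}} + T_{\mathrm{SPF}}$. The segment times and SPF time are hypothetical values chosen only for illustration, not measurements from this paper. The function name `convergence_time` is likewise illustrative.

```python
from typing import Sequence, Tuple


def convergence_time(mean_detect: float, seg_times: Sequence[float],
                     t_spf: float) -> Tuple[float, float]:
    """T_conv = X + T_flood + T_SPF, where T_flood sums the per-segment times.

    All inputs are in seconds. Returns (T_conv, fraction of T_conv spent
    in fault detection).
    """
    t_flood = sum(seg_times)
    t_conv = mean_detect + t_flood + t_spf
    return t_conv, mean_detect / t_conv


if __name__ == "__main__":
    # Hypothetical: the LSA crosses n = 4 broadcast segments at 5 ms each and
    # the SPF run takes 50 ms; X uses the 35 s default-timer average.
    t_conv, share = convergence_time(35.0, [0.005] * 4, 0.05)
    print(f"T_conv = {t_conv:.2f} s, detection share = {share:.1%}")
    # T_conv = 35.07 s, detection share = 99.8%
```

Under these assumed values, almost all of the convergence time is spent waiting for the Dead timer. This is the motivation for optimizing the detection phase rather than flooding or SPF computation.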
Within the total convergence time $T_{\mathrm{conv}}$, the fault detection phase is typically the most time-consuming. The main reason is that a link failure is confirmed only when no Hello packet has been received within the Dead interval $T_{\mathrm{dead}}$, which is 40 s by default. The average detection delay $X$ within this window is 35 s. In contrast, the LSA flooding phase, which comprises LSA generation and flooding, generally takes less time than fault detection, although this time can increase with network size. The routing table calculation phase, which runs Dijkstra's shortest path algorithm, also completes within a limited time. Therefore, this paper restructures the traditional OSPF protocol to dynamically optimize the Hello transmission frequency and fault detection time. The aim is to accelerate fault detection, thereby speeding up routing convergence and, to some extent, reducing routing overhead, as illustrated in Figure 2.
The specific improvements are as follows:
Utilize the statistical characteristics of Hello packets to obtain the link status, and characterize the node state based on the aggregated features of the link state, thereby assessing the state of links and nodes in the network and providing data support for the reverse coupled annealing algorithm.
Design an OSPF interval synchronization and node state propagation mechanism (OSPF-IS-NSP) to synchronize the sending intervals of Hello packets and fault detection times of nodes within the same subnet and propagate the node state of the local node.
Propose a reverse coupled annealing algorithm, which consists of two coupled annealing algorithms: the upward optimization annealing algorithm and the downward optimization annealing algorithm. When one annealing algorithm is executed, the temperature of the other gradually rises, and vice versa. Through this mechanism of alternating heating and cooling, and based on the link and node status, the two algorithms collaboratively optimize the Hello packet transmission interval and fault detection time.
2.2. Link and Node State Monitoring
In the OSPF protocol, the instability of links or nodes can trigger excessive LSA flooding and frequent routing updates, leading to routing oscillations. This situation not only results in a significant increase in routing overhead but also places a greater burden on the node’s CPU. In severe cases, it may even lead to network storms, resulting in network failures. To address this issue, this section introduces link and node states to monitor the statuses of links and nodes in real time, thereby providing data support for subsequent algorithms.
A network topology can be represented as $G = (N, E)$, where $N$ is the set of network nodes and $E$ is the set of communication links. Consider a link $l$ between nodes $v$ and $u$, denoted $l = (v, u) \in E$. The state of link $l$ is evaluated by a link state evaluation function $S_L(l)$, based on how long Hello packets are absent on $l$ during an assessment period $T$. Each absence forms a time segment $\Delta t_i$ within the assessment period. It is defined as the difference between the expected time of receiving the next Hello packet and the time at which a Hello packet was actually last received, as illustrated in Figure 3.
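The link state can be sketched from a log of Hello arrival times. This is a hypothetical reading, not the paper's formula: it assumes that $S_L(l) = 1 - \sum_i \Delta t_i / T$, clipped to $[0, 1]$, so that higher values mean healthier links. It also assumes that each absence segment is the overrun past the expected arrival time (previous arrival plus $T_{\mathrm{hello}}$). The helper names are illustrative.

```python
from typing import List, Sequence


def absence_segments(arrivals: Sequence[float], t_hello: float,
                     period: float) -> List[float]:
    """Hypothetical reading of the Delta t_i segments on one link.

    arrivals: sorted Hello arrival times in (0, period], measured from the
    start of the assessment period. A Hello is assumed to have arrived at
    time 0. After each arrival the next Hello is expected t_hello later; if
    it arrives late, or not at all before the period ends, the overrun is
    recorded as one absence segment.
    """
    segments = []
    last = 0.0
    for t in [*arrivals, period]:  # `period` acts as a sentinel for a trailing gap
        expected = last + t_hello
        if t > expected:
            segments.append(t - expected)
        last = t
    return segments


def link_state(arrivals: Sequence[float], t_hello: float, period: float) -> float:
    """Assumed form S_L = 1 - sum(Delta t_i) / T, clipped to [0, 1]."""
    missing = sum(absence_segments(arrivals, t_hello, period))
    return min(1.0, max(0.0, 1.0 - missing / period))


if __name__ == "__main__":
    healthy = [10.0 * k for k in range(1, 11)]                  # every Hello on time
    lossy = [10.0, 20.0, 30.0, 60.0, 70.0, 80.0, 90.0, 100.0]  # Hellos at 40 s and 50 s lost
    print(link_state(healthy, 10.0, 100.0))  # 1.0
    print(link_state(lossy, 10.0, 100.0))    # 0.8 (one 20 s absence segment)
```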
The state of node $v$ is then assessed as the average link state over all of its interfaces:

$$S_N(v) = \frac{1}{|E_v|}\sum_{l \in E_v} S_L(l),$$

where $S_N(\cdot)$ is the node state assessment function and $S_L(l)$ is the state of link $l$. $E_v \subseteq E$ is the set of links incident to node $v$, and $|E_v|$ is its cardinality, i.e., the number of links associated with $v$.
Building on this, this paper introduces weight parameters $\alpha$ and $\beta$ to combine the link and node states into a comprehensive assessment:

$$C(v, u) = \alpha\, S_L(v, u) + \beta\, S_N(u),$$

where $C(v, u)$ is the comprehensive assessment function that evaluates the overall state of the link between node $v$ and its neighbor $u$. $S_L(v, u)$ is the state of the link between $v$ and $u$, and $S_N(u)$ is the state of node $u$. The weight coefficients $\alpha$ and $\beta$ set the relative influence of the link state and the node state on $C(v, u)$.
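The node-state average and the weighted comprehensive assessment translate directly into code. In the minimal sketch below, the defaults $\alpha = \beta = 0.5$ are assumptions for illustration, since this section does not fix the weights.

```python
from statistics import fmean
from typing import Sequence


def node_state(link_states: Sequence[float]) -> float:
    """S_N(v): mean link state over the links incident to node v."""
    if not link_states:
        raise ValueError("node has no links")
    return fmean(link_states)


def comprehensive_state(s_link: float, s_node: float,
                        alpha: float = 0.5, beta: float = 0.5) -> float:
    """C(v, u) = alpha * S_L(v, u) + beta * S_N(u); alpha = beta = 0.5 are assumed defaults."""
    return alpha * s_link + beta * s_node


if __name__ == "__main__":
    s_u = node_state([0.8, 1.0, 0.9])                # neighbor u with three links
    print(round(s_u, 3))                             # 0.9
    print(round(comprehensive_state(0.8, s_u), 3))   # 0.85
```

Including $S_N(u)$ lets a link to a neighbor that is unstable on its other interfaces score lower, even when the link itself currently looks healthy.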
2.3. OSPF Interval Synchronization and Node State Propagation Mechanism
In the OSPF protocol, configuring the same Hello Interval ($T_{\mathrm{hello}}$) and Dead Interval ($T_{\mathrm{dead}}$) on all nodes within a segment is essential for establishing and maintaining neighbor relationships. Hello packets are used mainly for neighbor discovery and fault monitoring. Standard OSPF (RFC 2328) discards a received Hello packet whose HelloInterval or RouterDeadInterval differs from the receiving interface's configuration. Inconsistent settings can therefore cause asymmetric neighbor state perception, prevent adjacencies from forming correctly, and delay fault detection. Consistent parameters also align the fault detection window across nodes, preventing delays or misjudgments caused by differing $T_{\mathrm{dead}}$ values. They further allow topological state information to propagate synchronously within the segment, minimizing topology update delays. This section therefore introduces an OSPF interval synchronization mechanism that keeps these parameters consistent among all nodes, providing the foundation for deploying the reverse coupled annealing algorithm, as illustrated in Figure 4.
The specific steps of the OSPF interval synchronization mechanism are as follows:
When the Designated Router (DR) calculates new values of $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$ (assumed here to be 10 s and 40 s, respectively), it places them in its Hello message and broadcasts the message to all neighbors in the subnet.
The Backup Designated Router (BDR) and the other non-DR routers are assumed to have previous $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$ values of 7 s and 28 s, respectively.
When the BDR or another router receives the DR's Hello message containing the new $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$, it immediately updates its own values to 10 s and 40 s and adjusts its Hello sending timer and failure detection timer accordingly. As shown in Figure 4, because of this timer adjustment, the next Hello message is sent 3 s later than it would otherwise have been (10 s − 7 s). At the start of the next Hello sending cycle, the node sends a Hello message containing the new parameters.
When the DR receives the Hello messages sent by the BDR and the other routers, it can verify that their $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$ have been updated to 10 s and 40 s. This achieves synchronization of $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$ across the entire subnet.
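The timer behavior in these steps can be mirrored with a toy model. This is an illustrative sketch, not an OSPF implementation. The class `Router` and the function `synchronize` are hypothetical. The sketch assumes that each router reschedules its next Hello one new interval after its last transmission, which reproduces the 3 s shift described above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Router:
    """Minimal timer model of one OSPF router (illustrative only)."""
    name: str
    t_hello: float
    t_dead: float
    last_hello_sent: float = 0.0

    @property
    def next_hello(self) -> float:
        # The Hello timer fires one interval after the last transmission.
        return self.last_hello_sent + self.t_hello

    def adopt(self, t_hello: float, t_dead: float) -> float:
        """Adopt the intervals carried in the DR's Hello.

        Returns how many seconds the next Hello transmission moves. The
        failure detection timer would be re-armed with the new t_dead in
        the same way; that timer is not modeled here.
        """
        before = self.next_hello
        self.t_hello, self.t_dead = t_hello, t_dead
        return self.next_hello - before


def synchronize(dr_params: Tuple[float, float],
                routers: Iterable[Router]) -> Tuple[Dict[str, float], bool]:
    """Every non-DR router adopts the DR's (T_hello, T_dead); report shifts and consistency."""
    routers = list(routers)
    shifts = {r.name: r.adopt(*dr_params) for r in routers}
    in_sync = all((r.t_hello, r.t_dead) == tuple(dr_params) for r in routers)
    return shifts, in_sync


if __name__ == "__main__":
    others = [Router("BDR", 7.0, 28.0), Router("R3", 7.0, 28.0)]
    print(synchronize((10.0, 40.0), others))
    # ({'BDR': 3.0, 'R3': 3.0}, True): each next Hello moves 3 s later
```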
In addition, this section designs a node state propagation mechanism that carries each node's state in the Priority field of the Hello message. This has two benefits. First, because DR election is based on the Priority field, nodes with better node states are favored as DR or BDR, so that well-conditioned nodes take on critical roles in the network, enhancing stability and routing efficiency. Second, OSPF has no native mechanism for propagating node state, and this mechanism lets nodes share node state information with their neighbors in real time.
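This section does not give the exact mapping from node state to the Priority field, so the following sketch assumes a linear mapping of $S_N \in [0, 1]$ onto the 8-bit Router Priority. It uses the range 1 to 255, because OSPF treats priority 0 as ineligible for DR/BDR. The election shown is deliberately simplified: highest priority wins, and ties go to the higher Router ID. The helper names are hypothetical.

```python
from typing import Iterable, Tuple


def state_to_priority(node_state: float) -> int:
    """Hypothetical linear mapping of a node state in [0, 1] to the 8-bit Router Priority.

    Priority 0 would make a router ineligible to become DR/BDR, so this
    sketch maps onto 1..255 to keep every node eligible.
    """
    s = min(1.0, max(0.0, node_state))
    return 1 + round(s * 254)


def elect_dr(candidates: Iterable[Tuple[int, float]]) -> int:
    """Simplified DR choice: highest priority, ties broken by the higher Router ID.

    candidates: (router_id, node_state) pairs. The full RFC 2328 election
    (Section 9.4) also considers routers already declaring themselves DR or
    BDR, which is omitted here.
    """
    return max(candidates, key=lambda c: (state_to_priority(c[1]), c[0]))[0]


if __name__ == "__main__":
    print(state_to_priority(0.0), state_to_priority(0.5), state_to_priority(1.0))  # 1 128 255
    print(elect_dr([(1, 0.9), (2, 0.6), (3, 0.9)]))  # 3: equal priority, higher Router ID wins
```

Clamping keeps out-of-range state values from producing an invalid priority byte.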
2.4. Reverse Coupled Annealing Algorithm Design
To cope with dynamic link relationships and frequent link failures in IoT networks, the Hello sending frequency and fault detection time must be adjusted in real time according to the link and node states. Simulated annealing can explore and approximate the optimal Hello interval and fault detection time over multiple iterations, guided by the network environment. A high initial temperature allows a broad search in the early stages, which helps the algorithm avoid local optima [31,32,33,34]. As the temperature gradually decreases, the search range narrows, accelerating convergence while maintaining solution diversity. The algorithm starts from a randomly generated initial solution $x_0$ and searches for candidate solutions $x'$ in the neighborhood of the current solution $x$. In each iteration, it decides whether to accept a worse solution according to the Metropolis criterion, with acceptance probability

$$P(x \to x') = \begin{cases} 1, & \Delta E \le 0, \\ \exp\left(-\dfrac{\Delta E}{\theta_k}\right), & \Delta E > 0, \end{cases}$$

where $\Delta E = f(x') - f(x)$ is the difference between the objective function values of the new and current solutions and $\theta_k$ is the temperature at iteration $k$. The temperature decreases geometrically:

$$\theta_k = \theta_0\, \gamma^{k},$$

where $\theta_0$ is the initial temperature and $0 < \gamma < 1$ is the decay factor. As the temperature decreases, so does the probability of accepting inferior solutions. After each iteration, the current solution is updated as

$$x_{k+1} = \begin{cases} x', & \text{if } x' \text{ is accepted}, \\ x_k, & \text{otherwise}. \end{cases}$$

The optimization process gradually stabilizes. With a sufficiently slow cooling schedule, it approaches the global optimum $x^*$ that minimizes $f(x)$.
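Before turning to the coupled variant, it may help to see the textbook procedure above in runnable form. The sketch below implements Metropolis acceptance, geometric cooling $\theta_k = \theta_0 \gamma^k$, and best-solution tracking on a toy one-dimensional objective. It is generic simulated annealing, not the paper's algorithm. The parameter values and the toy objective are arbitrary choices.

```python
import math
import random
from typing import Callable


def simulated_annealing(f: Callable[[float], float], x0: float,
                        neighbor: Callable[[float, random.Random], float],
                        theta0: float = 10.0, gamma: float = 0.95,
                        theta_min: float = 1e-3, iters_per_temp: int = 20,
                        seed: int = 0) -> float:
    """Minimize f using Metropolis acceptance and geometric cooling theta_k = theta0 * gamma**k."""
    rng = random.Random(seed)
    x = best = x0
    theta = theta0
    while theta > theta_min:
        for _ in range(iters_per_temp):
            x_new = neighbor(x, rng)
            delta = f(x_new) - f(x)  # Delta E = f(x') - f(x)
            # Metropolis criterion: always accept improvements; accept a worse
            # candidate with probability exp(-Delta E / theta).
            if delta <= 0 or rng.random() < math.exp(-delta / theta):
                x = x_new
            if f(x) < f(best):
                best = x  # track the best solution seen so far
        theta *= gamma  # one cooling step
    return best


if __name__ == "__main__":
    # Toy objective with its minimum at x = 3; neighbors are uniform steps of at most 1.
    result = simulated_annealing(lambda x: (x - 3) ** 2, x0=20.0,
                                 neighbor=lambda x, rng: x + rng.uniform(-1.0, 1.0))
    print(round(result, 2))  # close to 3
```

Once `theta` falls below `theta_min`, the loop exits and the search stops adapting. This is the limitation that the next paragraph attributes to one-way cooling.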
However, traditional simulated annealing optimizes in one direction only. It lowers the temperature step by step to reduce perturbations and slowly approaches the optimal solution; once the temperature reaches its minimum, the algorithm stops exploring. This one-way cooling strategy gradually weakens the algorithm's responsiveness to the network environment during execution, reducing its adaptability to dynamic changes. In particular, when failures occur, the algorithm may not respond promptly, slowing overall fault detection and recovery. Therefore, this paper proposes the reverse coupled annealing algorithm, which consists of two coupled annealing algorithms, Algorithms 1 and 2. Guided by the comprehensive assessment $C(v, u)$ of link and node states, the two algorithms cooperate to optimize the Hello interval $T_{\mathrm{hello}}$ and the fault detection time $T_{\mathrm{dead}}$. The results are applied consistently through the OSPF interval synchronization mechanism to achieve dynamic fault detection.

A node first evaluates the overall state of node $v$, its neighbor $u$, and the link between them using $C(v, u)$. If $C(v, u)$ falls below a preset threshold $C_{\mathrm{th}}$, the node triggers the downward optimization simulated annealing algorithm (DOSA, Algorithm 2), which reduces $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$. Otherwise, it triggers the upward optimization simulated annealing algorithm (UOSA, Algorithm 1). Through this alternation of heating and cooling, the two annealing algorithms are dynamically coupled and adjust $T_{\mathrm{hello}}$ and $T_{\mathrm{dead}}$ to the network environment. The details are as follows.
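Algorithm 1: Upward optimization simulated annealing algorithm (UOSA)
Input: UOSA current temperature; UOSA termination temperature; UOSA annealing weight; DOSA current temperature; UOSA heating weight; DOSA maximum temperature; UOSA inner loop iterations; UOSA current optimal sending interval; current Hello packet sending interval; current fault detection time
Output: next Hello packet sending interval; next fault detection time; UOSA optimal sending interval; DOSA next temperature
1: Initialize the working variables from the inputs
2: if the UOSA current temperature has not reached the UOSA termination temperature then
3:   Initialize the inner loop counter
4:   while the counter is below the UOSA inner loop iterations do
5:     Generate a new sending interval by adding a random number in [−2, 4] to the current sending interval
6:     difference value: compute the difference between the objective values of the new and current solutions
7:     if the new solution is better than the current one then
8:       Accept the new sending interval
9:     else if the Metropolis criterion accepts the new solution then
10:      Accept the new sending interval
11:    end if
12:    if the current sending interval is better than the UOSA optimal sending interval then
13:      Update the UOSA optimal sending interval
14:    end if
15:    Increment the inner loop counter
16:  end while
17:  Cool the UOSA temperature using the UOSA annealing weight
18:  Compute the next fault detection time from the updated sending interval
19: end if
20: Raise the DOSA temperature using the UOSA heating weight, up to the DOSA maximum temperature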
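Algorithm 2: Downward optimization simulated annealing algorithm (DOSA)
Input: DOSA current temperature; DOSA termination temperature; DOSA annealing weight; UOSA current temperature; DOSA heating weight; UOSA maximum temperature; DOSA inner loop count; DOSA current optimal sending interval; current Hello packet transmission interval; current fault detection time
Output: next Hello packet transmission interval; next fault detection time; DOSA optimal sending interval; UOSA next temperature
1: Initialize the working variables from the inputs
2: if the DOSA current temperature has not reached the DOSA termination temperature then
3:   Initialize the inner loop counter
4:   while the counter is below the DOSA inner loop count do
5:     Generate a new sending interval in the neighborhood of the current sending interval
6:     difference value: compute the difference between the objective values of the new and current solutions
7:     if the new solution is better than the current one then
8:       Accept the new sending interval
9:     else if the Metropolis criterion accepts the new solution then
10:      Accept the new sending interval
11:    end if
12:    if the current sending interval is better than the DOSA optimal sending interval then
13:      Update the DOSA optimal sending interval
14:    end if
15:    Increment the inner loop counter
16:  end while
17:  Cool the DOSA temperature using the DOSA annealing weight
18:  Compute the next fault detection time from the updated sending interval
19: end if
20: Raise the UOSA temperature using the DOSA heating weight, up to the UOSA maximum temperature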
The upward optimization simulated annealing (UOSA) algorithm optimizes the Hello packet transmission interval $T_{\mathrm{hello}}$ and the fault detection time $T_{\mathrm{dead}}$, gradually increasing both parameters toward their maximum values. Line 2 of the pseudocode checks whether the current temperature has reached the termination temperature; if not, the algorithm enters the inner loop. Lines 4 to 16 describe the inner loop of the annealing process. The algorithm draws a random number in the range −2 to 4 from a random number generator and adds it to $T_{\mathrm{hello}}$ to obtain a candidate interval $T'_{\mathrm{hello}}$. Because this range is skewed toward positive values (its mean is +1), candidate intervals are biased upward. The algorithm then checks whether the resulting difference value satisfies the Metropolis criterion, and updates $T_{\mathrm{hello}}$ and the current optimal interval accordingly. Line 17 performs the cooling step of the annealing process. Line 18 computes the new fault detection time $T_{\mathrm{dead}}$ from the updated $T_{\mathrm{hello}}$. Line 20 performs the heating step for the downward optimization simulated annealing (DOSA) algorithm: each time UOSA completes, the starting temperature of DOSA is raised. Overall, the algorithm contains a single loop whose iteration count equals the number of inner loop iterations $L$, so its time complexity per invocation is $O(L)$.
The downward optimization simulated annealing (DOSA) algorithm also optimizes the Hello transmission interval $T_{\mathrm{hello}}$ and the fault detection time $T_{\mathrm{dead}}$, but in the opposite direction to UOSA: it gradually reduces both parameters toward their minimum values. Its procedure mirrors that of UOSA and is therefore not elaborated here.
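The coupling between the two annealers can be illustrated with a compact, hypothetical sketch; it is not the paper's implementation. Its structure follows the text: dispatch on $C(v, u) < C_{\mathrm{th}}$, inner Metropolis loops, cooling of the annealer that runs, and capped heating of the other one. Several parts of the sketch are assumptions:
- $T_{\mathrm{dead}} = 4 T_{\mathrm{hello}}$;
- DOSA draws its steps from the mirrored range $[-4, 2]$ s;
- both annealers minimize an invented cost $C/T_{\mathrm{hello}} + (1 - C)\,T_{\mathrm{hello}}/10$, which favors long intervals on healthy links and short ones on poor links;
- all numeric defaults.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CoupledAnnealer:
    """Hypothetical UOSA/DOSA coupling sketch (illustrative assumptions throughout)."""
    t_hello: float = 10.0
    h_min: float = 1.0
    h_max: float = 30.0
    theta_u: float = 5.0      # UOSA current temperature
    theta_d: float = 5.0      # DOSA current temperature
    theta_end: float = 0.01   # termination temperature (shared here)
    theta_max: float = 5.0    # maximum temperature (shared here)
    cool: float = 0.8         # annealing weight
    heat: float = 1.5         # heating weight
    inner: int = 20           # inner loop iterations
    c_th: float = 0.7         # threshold C_th
    seed: int = 0

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)

    @property
    def t_dead(self) -> float:
        return 4 * self.t_hello  # assumed fixed ratio, as with the OSPF defaults

    @staticmethod
    def cost(h: float, c: float) -> float:
        # Good state (c near 1): the overhead term c / h dominates, favoring long intervals.
        # Poor state (c near 0): the delay term dominates, favoring short intervals.
        return c / h + (1.0 - c) * h / 10.0

    def _anneal(self, c: float, lo: float, hi: float, theta: float) -> float:
        h = best = self.t_hello
        for _ in range(self.inner):
            cand = min(self.h_max, max(self.h_min, h + self.rng.uniform(lo, hi)))
            delta = self.cost(cand, c) - self.cost(h, c)
            if delta <= 0 or self.rng.random() < math.exp(-delta / theta):
                h = cand  # Metropolis acceptance
            if self.cost(h, c) < self.cost(best, c):
                best = h
        return best

    def step(self, c: float):
        """One adaptation round driven by C(v, u); returns (T_hello, T_dead)."""
        if c < self.c_th:  # poor state: DOSA shortens the intervals
            if self.theta_d > self.theta_end:
                self.t_hello = self._anneal(c, -4.0, 2.0, self.theta_d)
                self.theta_d *= self.cool  # DOSA cools
            self.theta_u = min(self.theta_u * self.heat, self.theta_max)  # UOSA reheats
        else:  # good state: UOSA lengthens the intervals
            if self.theta_u > self.theta_end:
                self.t_hello = self._anneal(c, -2.0, 4.0, self.theta_u)
                self.theta_u *= self.cool  # UOSA cools
            self.theta_d = min(self.theta_d * self.heat, self.theta_max)  # DOSA reheats
        return self.t_hello, self.t_dead


if __name__ == "__main__":
    a = CoupledAnnealer()
    for c in [0.3] * 5 + [0.95] * 5:  # link degrades, then recovers
        t_hello, t_dead = a.step(c)
        print(f"C={c:.2f}  T_hello={t_hello:5.2f} s  T_dead={t_dead:5.2f} s")
```

Because the idle annealer is reheated on every round, it still has enough temperature to explore when the link state flips. This is the property that one-way cooling lacks.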
4. Conclusions
IoT network structures are evolving from small-scale distributed systems toward large-scale hierarchical collaboration between backbone and access networks. In this setting, dynamic changes in backbone node connections and the surge in service demand pose significant challenges, and slow fault detection substantially reduces the effective service transmission time. This paper proposes an OSPF dynamic routing convergence method based on reverse coupled simulated annealing (OSPF-ICSA). The method uses the statistical characteristics of Hello packets to obtain link states and characterizes node states by aggregating link states. This gives the reverse coupled simulated annealing algorithm information on network conditions. Furthermore, it improves the Hello packet and designs an OSPF interval synchronization and node state propagation mechanism. This mechanism synchronizes the Hello sending interval and fault detection time of nodes within the same subnet and shares node states among neighbors. Building on this foundation, the reverse coupled simulated annealing algorithm jointly optimizes the Hello sending interval and fault detection time. Experimental results demonstrate that OSPF-ICSA achieves strong performance across various fault scenarios in four key indicators: fault detection time, detection accuracy, routing overhead, and packet delivery rate. The algorithm effectively balances routing convergence time against routing overhead, accelerating convergence while significantly reducing resource consumption. We hope that OSPF-ICSA offers a new perspective on routing convergence in large-scale hierarchical IoT network topologies.
In future research, we plan to integrate machine learning, large models, and other advanced algorithms to make routing convergence more efficient and faster.