Article

RL-ANC: Reinforcement Learning-Based Adaptive Network Coding in the Ocean Mobile Internet of Things

1
College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
2
Key Laboratory of Space Photoelectric Detection and Perception, Nanjing University of Aeronautics and Astronautics, Ministry of Industry and Information Technology, Nanjing 211106, China
3
Institute of Logistics Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2024, 12(6), 998; https://doi.org/10.3390/jmse12060998
Submission received: 29 April 2024 / Revised: 29 May 2024 / Accepted: 6 June 2024 / Published: 15 June 2024
(This article belongs to the Section Ocean Engineering)

Abstract

As the demand for sensing and monitoring the marine environment increases, the Ocean Mobile Internet of Things (OM-IoT) has gradually attracted the interest of researchers. However, the unreliability of communication links represents a significant challenge to data transmission in the OM-IoT, given the complex and dynamic nature of the marine environment, the mobility of nodes, and other factors. Consequently, it is necessary to enhance the reliability of underwater data transmission. To address this issue, this paper proposes a reinforcement learning-based adaptive network coding (RL-ANC) approach. Firstly, the channel conditions are estimated based on the reception acknowledgment, and a feedback-independent decoding state estimation method is proposed. Secondly, the sliding coding window is dynamically adjusted based on the estimates of the channel erasure probability and decoding probability, and the sliding rule is adaptively determined using a reinforcement learning algorithm and an enhanced greedy strategy. Subsequently, an adaptive optimization method for coding coefficients based on reinforcement learning is proposed to enhance the reliability of the underwater data transmission and underwater network coding while reducing the redundancy in the coding. Finally, the sampling period and time slot table are updated using the enhanced simulated annealing algorithm to optimize the accuracy and timeliness of the channel estimation. Simulation experiments demonstrate that the proposed method effectively enhances the data transmission reliability in unreliable communication links, improves the performance of underwater network coding in terms of the packet delivery rate, retransmission, and redundancy transmission ratios, and accelerates the convergence speed of the decoding probability.

1. Introduction

With the gradual increase in the demand for the development of marine resources and the rising frequency of various maritime activities, real-time sensing and monitoring of the marine environment, as well as the efficient communication of maritime equipment, have become crucial [1,2]. The characteristics of the Internet of Things (IoT), such as the comprehensive sensing, reliable transmission, and intelligent processing, are highly suitable for the requirements of marine environment monitoring and maritime communication. Consequently, the Ocean Mobile Internet of Things (OM-IoT) has gradually piqued the interest of researchers. The traditional OM-IoT mainly refers to Underwater Wireless Sensor Networks (UWSNs) composed of various sensor nodes in the target sea area [3]. On the other hand, the generalized OM-IoT refers to a network that extends beyond traditional UWSNs, encompassing multiple areas and spaces. This network is established using new-generation information technologies like cloud computing, big data, and mobile Internet, and it is constructed across geographical areas, airspace, and sea areas [4,5].
A typical OM-IoT system is illustrated in Figure 1. It comprises a wide range of sensor nodes, including ships, unmanned submersibles, and traditional underwater sensor nodes. The system integrates different types of IoT systems, such as shore-based networks, satellite networks, and UAV-assisted relays. The underwater segment of the OM-IoT network utilizes hydroacoustic communication as the primary mode of communication. However, due to the intricate and fluctuating oceanic environment, the hydroacoustic channel conditions are quite harsh. Furthermore, the movement of nodes, current movement, noise interference and signal collisions may result in the corruption of packets. Additionally, the transmission of packets along an incorrect path may also result in packet loss. Consequently, the issue of unreliable communication links underwater represents a significant challenge to underwater communications. Enhancing the reliability of underwater data transmission has been a prominent topic [6,7,8]. In addition, there are a large number of mobile nodes in the OM-IoT that are influenced by environmental factors like the ocean currents and the technical characteristics of devices such as unmanned underwater vehicles (UUVs). These factors lead to unstable communication link quality between nodes, resulting in more serious issues such as packet corruption and loss. Therefore, the reliability of OM-IoT data transmission faces a significant challenge.
Researchers have conducted numerous studies to address this issue. Packet retransmission [9] is one of the more representative methods for improving the reliability of underwater data transmission. When the receiver detects a missing packet, it will send a retransmission request to the transmitter. The transmitter will retransmit the missing packet based on the retransmission request until the missing packet is acknowledged as having been received. In a communication link with poor quality, the packet loss rate is high, leading to frequent data retransmissions. This situation results in the decreased data transmission efficiency of the system. In addition, under channel conditions with more severe noise interference, data packets are often transmitted incorrectly, making it challenging to achieve error correction through the typical retransmission mechanism. In order to ensure the reliability of data transmission, it must be combined with other error correction methods, which also inevitably introduce additional time costs and energy overheads. To address the shortcomings of the general retransmission mechanism, researchers propose another method to enhance the reliability of data transmission, namely redundant transmission [10]. Redundant transmission can effectively reduce the interference of packet loss and erroneous transmission by transmitting the same packet multiple times. However, it inevitably increases the number of packets to be transmitted. In more complex network scenarios, general redundant transmission can easily cause network congestion and many other problems, reduce the efficiency of the system’s data transmission, and even affect the network lifetime.
Network coding (NC) [11] effectively improves network throughput and has been widely studied and applied in underwater data transmission in recent years. However, NC also faces challenges in scenarios with unreliable communication links. We take Figure 2 as an example to illustrate such problems. As shown in Figure 2a, node S sends packets to node D, one coded packet per time slot. During the time slots t1~t3, the communication link between S and D is stable, and the coded packets are transmitted sequentially. After receiving a coded packet, node D can decode it based on previously transmitted data to obtain a new packet. When the channel changes and the link quality deteriorates at the beginning of time slot t4, the packet transmission fails. At this point, node D is unable to decode the coded packet from the previous time slot, which in turn affects the subsequent decoding process. In the general transmission process, a large number of packets need to be transmitted, and it is often necessary to encode and transmit them in batches. However, batching packets at random can lead to issues such as excessively high or low complexity of the encoding and decoding operations, coding redundancy, or missing packets. As illustrated in Figure 2b, when the time slot t3 commences, node S transmits a coded packet to node D. In the subsequent time slots, due to inappropriate coding combinations or an abnormal number of zero elements in the coding coefficient matrix, some coded packets may become undecodable, rendering the corresponding packets inaccessible to node D. Moreover, most network coding algorithms rely on random linear network coding (RLNC) [12]. Since the coding coefficients of each packet in RLNC are chosen at random from the finite field GF(2^q), an improperly constructed coding coefficient matrix can also negatively impact data transmission.
Therefore, optimizing the network coding to achieve the adaptive selection of coding packets and coding coefficients is of great importance for enhancing the performance of underwater data transmission.
In this paper, we propose a data transmission method that integrates reinforcement learning and network coding for the data transmission problem under unreliable communication links in the OM-IoT and adaptive optimization for underwater network coding. The main contributions of this paper are as follows:
  • Establish a comprehensive Binary Erasure Channel (BEC) model for unreliable communication links affected by multiple factors by simulating the packet loss issue resulting from various causes using the channel erasure probability. Additionally, a method will be proposed to estimate the channel erasure probability, channel capacity, and other relevant metrics for the developed BEC model.
  • Address the issue of batch packet encoding by applying a dynamic adjustment method for the sliding coding window based on the channel conditions and decoding states. Additionally, we introduce a sliding rule adaptive optimization method based on the Q-learning algorithm. The method achieves packet batching for transmission and updates the packets for encoding in real-time by adjusting the sliding window size and sliding rules.
  • Address the issue of excessively high randomness in RLNC coding coefficients by utilizing a Deep Q-Network (DQN)-based adaptive optimization method for coding coefficients. The method adaptively selects the coding coefficients based on the current packets in the window and historical coding information. This approach restricts the complexity of the coding and decoding operations, thereby enhancing the probability of decoding.
  • Enhance the greedy strategy in the reinforcement-learning algorithm by introducing a time-varying exploration probability to improve the algorithm’s operational efficiency. Additionally, a sampling period optimization method based on the simulated annealing algorithm is proposed to improve the accuracy and timeliness of channel estimation.
The rest of this paper is organized as follows. Section 2 introduces the related work and reviews the research results in recent years. Section 3 describes the system model used in this paper, analyzes the underwater unreliable communication link problem, and establishes the BEC model. Section 4 explains the principles and details of the algorithms proposed in this paper, while Section 5 describes the simulation experiments and analyzes the results. Section 6 summarizes the conclusions of the research in this paper and anticipates future research endeavors.

2. Related Work

Researchers have conducted numerous studies on the application of network coding in UWSNs. Cai et al. [13] proposed a reliable data transmission protocol for UWSNs based on twin paths and network coding. They established twin paths and transmitted shareable redundant packets to enhance the reliability of data transmission. Feng et al. [14] introduced an asynchronous duty cycle and network coding MAC protocol for UWSNs. This protocol uses an asynchronous duty cycle to determine the rendezvous time of the exchanged data, and it includes a coding node selection strategy and a network coding algorithm to enable the coding and forwarding of packets. Kulhandjian et al. [15] presented a CDMA-based analog network coding method for UWSNs. This method tackles the issue of mutual interference of different packets at the relay nodes. In unidirectional multihop networks, it incorporates interference cancellation based on a priori information. In two-way relay networks, the superposition property of hydroacoustic signals is utilized to treat the received interfering packets as naturally coded packets and forward them. Hao et al. [16] proposed a partial network coding-based geographic routing protocol for UWSNs. This protocol employs partial network coding to encode the packets and, based on the positional information of the sensor nodes, adopts a greedy strategy to forward the encoded packets to reduce the network delay and the transmission energy consumption. Wang et al. [17] proposed an energy-efficient data transmission protocol based on network coding, hybrid auto-repeat request, and adaptive window size estimation algorithms to ensure reliability and efficiency and to optimize the trade-off between throughput and energy consumption for data transmission in UWSNs. Additionally, Wang et al. [18] proposed a network coding-based cross-layer routing protocol for UWSNs that takes advantage of multicast transmission to jointly decode coded packets received from multiple potential nodes throughout the network and to optimize the transmission power. Zhan et al. [19] proposed a joint scheduling strategy for network coding and transmission in UWSNs to address coding and transmission conflicts. The solution involves a heuristic approach to resolve conflicts in a conflict-free graph, searching for the maximum independent set to minimize the transmission time slots. Zhao et al. [20] introduced a network coding-aware opportunistic routing protocol and a sliding-window coding algorithm to enhance the data transmission robustness and reduce the decoding overheads in UWSNs. Su et al. [21] suggested a hybrid coding-aware routing protocol for UWSNs, incorporating inter-flow network coding and a combination of coding-aware and opportunistic routing. They also presented an encoding method that does not depend on opportunistic listening to leverage network coding opportunities and optimize transmission overheads.
The rise of reinforcement learning (RL) [22] has brought more possibilities to improve the performance of data transmission in UWSNs. Park et al. [23] proposed a reinforcement learning-based medium access control protocol for UWSNs that solves the underwater time synchronization problem through asynchronous operation, improves channel utilization by reducing the number of time slots per frame, and achieves collision-free scheduling by employing a new random backoff scheme. Chang et al. [24] proposed a reinforcement learning-based data-forwarding scheme for passive and movable UWSNs to enhance the data transmission performance of UWSNs. Di et al. [25] proposed a multipath adaptive routing scheme for UWSNs based on channel-aware reinforcement learning. Using a distributed reinforcement-learning framework, the scheme adaptively switches between single-path and multipath routing modes according to the underwater channel conditions to jointly optimize the routing energy consumption and packet delivery rate. Zhang et al. [26] proposed a reinforcement learning-based opportunistic routing protocol for UWSNs that selects suitable nodes by comprehensively considering the nodes' peripheral states. It introduces a recovery mechanism to minimize the impact of routing voids on the data transmission performance. Zhang et al. [27] introduced a reinforcement learning-based relay selection algorithm for UWSNs, combining RL with a simulated annealing algorithm to enhance the algorithm's performance. Ye et al. [28] suggested a deep reinforcement learning-based medium access control protocol for underwater acoustic networks. This protocol maximizes the network throughput by effectively utilizing the time slots that are available because of propagation delays or left unused by other nodes.
In addition, researchers have launched more studies on the application of reinforcement learning (RL) in network coding (NC) optimization. Jadoon et al. [29] proposed a relay selection algorithm based on Q-learning for cooperative networks employing spatio-temporal network coding. The proposed algorithm maximizes the total capacity of the network by learning the cooperative network environment. Gao et al. [30] introduced an RL framework to enhance the network capacity through decoder feedback by dynamically adjusting the network coding parameters online to improve the network coding performance for multihop transmission under dynamic sparse network coding. Xiao et al. [31] suggested a reinforcement learning-based network coding for UAV-assisted secure wireless communication to select network coding strategies based on the measured interference power, previous transmission performance, and channel loading. Their approach aims to enhance the interception probability, latency, outage probability, and energy consumption to improve the anti-eavesdropping performance. Ali et al. [32] proposed a reinforcement learning-based selective random linear network coding (RLNC) framework for the haptic Internet, which utilizes network and receiver feedback to optimally choose between block-based RLNC and sliding window-based RLNC to enhance the system’s data transmission performance.
The aforementioned research is of great significance in improving the performance of OM-IoT data transmission and advancing the utilization of network coding in underwater networks. The introduction of network coding in the OM-IoT can effectively improve network throughput and enhance data transmission reliability and communication efficiency. The underwater communication system, with the integration of RL, exhibits better adaptability to the complex marine environment and can enhance the reliability of underwater data transmission systems in complex environments. The optimization of network coding based on RL enables the system to adaptively adjust the coding strategy according to its environment, making data processing and transmission within or across systems more intelligent. However, the majority of the aforementioned studies rely on actual feedback from the receiver side, particularly for the optimization of coding coefficients. In unreliable communication links, the absence of accurate feedback from the receiver to the sender reduces the adaptability of the network coding coefficients to the channel conditions, which in turn degrades the system's data transmission performance. Concurrently, the prevailing solution to erroneous transmission and packet loss resulting from unstable link quality is data retransmission or redundant transmission. However, this approach is susceptible to inducing further complications, such as network congestion. Consequently, it is also essential to pursue a better balance between the efficiency and reliability of data transmission. Moreover, there is a paucity of research investigating the integration of reinforcement learning and network coding techniques for the transmission of OM-IoT data.
In order to address the aforementioned issues, this paper proposes a data transmission method for unreliable communication links in the OM-IoT. This method integrates reinforcement learning and network coding, and it is referred to as reinforcement learning-based adaptive network coding (RL-ANC). Firstly, the channel conditions are estimated based on the reception acknowledgment, the channel changes are tracked in real time, and a feedback-independent decoding state estimation method is proposed. Secondly, the sliding coding window is dynamically adjusted in accordance with the estimates of the probability of erasure and the probability of successful decoding. Subsequently, the sliding rule is adaptively determined using a reinforcement learning algorithm and an enhanced greedy strategy. An adaptive optimization method for coding coefficients based on reinforcement learning is proposed to enhance the reliability of underwater data transmission and underwater network coding while reducing the redundancy in coding. Finally, the sampling period and time slot table are updated using the enhanced simulated annealing algorithm to optimize the accuracy and timeliness of the channel estimation in real time. This optimization considers the convergence of the variance of the estimated channel erasure probability and the decoding probability of coded packets.

3. Theory Preparations

3.1. Ocean Mobile Internet of Things Model

In the OM-IoT, the nodes of the underwater network are primarily classified into two types: aggregation nodes, which float on the sea surface and facilitate communication between the underwater network and UAV relays, shore-based networks, and satellite networks, among others; and underwater sensor nodes, which consist of general ocean monitoring sensors and unmanned submarine vehicles that collect and transmit ocean monitoring data. In order to emphasize the principle and performance of the proposed algorithm, it is possible to disregard the inherent characteristics of the nodes that are not relevant to the algorithm's principle and do not significantly impact its performance. Therefore, in this paper, the underwater segment of the OM-IoT network depicted in Figure 1 is simplified as a three-dimensional stochastic network model, as illustrated in Figure 3. Assume that a set of OM-IoT nodes is randomly deployed in a finite 3D sea area. These nodes form a total node set N, with the total number of nodes n(N). The locations of nodes Ni ∈ N (i = 1, 2, …, n(N)) are described by the 3D coordinates (xi, yi, zi), where xi > 0, yi > 0, and zi < 0. The depth of a node is dep(Ni) = |zi|. During any data transmission, the set of source nodes is denoted as S, the set of destination nodes as D, and the set of intermediate nodes as R. Therefore, (S ∪ R ∪ D) ⊂ N.
As the node mobility and communicable range have a significant impact on the performance of the algorithm, we introduce the movable node model [33], as shown in Figure 4. The velocity of a node Ni ∈ N (i = 1, 2, …, n(N)) is given by vi = [vxi, vyi, vzi], where vxi, vyi, and vzi are the velocity components of Ni in the x, y, and z directions, respectively. The communication radius of node Ni is denoted as φ(Ni), and the necessary condition for nodes Ni and Nj to be able to transmit data is that d(Ni, Nj) ≤ min{φ(Ni), φ(Nj)}, where d(Ni, Nj) is the Euclidean distance between Ni and Nj.
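The communication condition above can be illustrated with a short Python sketch. The class name `Node`, its fields, and the constant-velocity `move` step are illustrative assumptions and are not taken from the movable node model in [33]; only the distance test d(Ni, Nj) ≤ min{φ(Ni), φ(Nj)} comes from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical container for one OM-IoT node (names are assumptions)."""
    x: float
    y: float
    z: float            # z < 0 below the sea surface
    radius: float       # communication radius, phi(Ni) in the text
    vx: float = 0.0
    vy: float = 0.0
    vz: float = 0.0

    @property
    def depth(self) -> float:
        # dep(Ni) = |zi|
        return abs(self.z)

    def move(self, dt: float) -> None:
        # Assumption: straight-line motion at constant velocity over dt.
        self.x += self.vx * dt
        self.y += self.vy * dt
        self.z += self.vz * dt


def distance(a: Node, b: Node) -> float:
    """Euclidean distance d(Ni, Nj)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def can_communicate(a: Node, b: Node) -> bool:
    """Necessary condition from the text: d <= min(phi(Ni), phi(Nj))."""
    return distance(a, b) <= min(a.radius, b.radius)
```

Because the condition uses the smaller of the two radii, a link that exists at one instant can disappear after a small movement of either node, which is one source of the link unreliability discussed in this paper.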

3.2. Underwater Data Transmission Mechanisms

We are primarily concerned with the multihop transmission process of data from an underwater source node to a surface sink node. In this paper, we primarily focus on the scenario in which there is a single sink node. The underwater data transmission mechanism based on the node depth information [34] is employed as the fundamental transmission model. Figure 5 serves as an illustrative example of this process.
In the underwater network depicted in Figure 5, each sensor node is assigned a unique ID, and node N3 will transmit data packets to the sink node. Once transmission has commenced, node N3 transmits the packet to N6, which has a greater change in depth according to the depth priority principle. At this point, N6 selects the next hop node, and since N7, N8, and N9 have the same depth, the forwarding probability must be determined based on other indicators, such as the node’s residual energy, the number of neighboring nodes, etc. Should N6 select N7 as the next hop node, the packet will continue to be transmitted in accordance with the aforementioned rules until it is received by the sink node.
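As a rough illustration of the depth priority principle, the following Python sketch picks the neighbor with the largest depth reduction and breaks ties with residual energy and then neighbor count. The `Candidate` type, the tie-break order, and the deterministic choice are assumptions made for illustration; the mechanism in [34] determines a forwarding probability from such indicators rather than a fixed ranking.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical neighbor record (field names are assumptions)."""
    node_id: str
    depth: float             # dep(Ni), metres below the surface
    residual_energy: float
    neighbor_count: int


def select_next_hop(current_depth: float,
                    candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the neighbor that moves the packet closest to the surface.

    Ties in depth are broken by residual energy, then by neighbor count.
    Returns None when no neighbor is shallower than the current node.
    """
    shallower = [c for c in candidates if c.depth < current_depth]
    if not shallower:
        return None
    return max(
        shallower,
        key=lambda c: (current_depth - c.depth,
                       c.residual_energy,
                       c.neighbor_count),
    )
```

With three equally deep neighbors, as for N7, N8, and N9 in the example, the choice falls on the one with the most residual energy under this assumed ranking.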
The underwater multihop data transmission method enables communication between nodes over long distances, and it is now widely used in UWSNs. Nevertheless, in the event of unreliable communication links, the possibility of packet mis-transmission or loss between nodes at each hop cannot be discounted. This issue is particularly pronounced when the nodes in question are mobile. Classical data retransmission and redundant transmission mechanisms are considered effective ways to address this problem, but they also introduce additional data transmission burdens, which will affect the overall data transmission performance of the system. Consequently, there is a necessity to achieve a further equilibrium between data transmission efficiency and reliability.

3.3. Network Coding and Decoding

The method proposed in this paper employs RLNC [35] as the foundation for the coding algorithm. To facilitate the description of the RLNC encoding and decoding process, the initial data packets and the encoded packets are described as symbol matrices. Assume that the matrix consisting of n packets to be encoded is M = [P1, P2, …, Pn] and that the matrix of encoded packets Mec = [Pec(1), Pec(2), …, Pec(n)] is obtained after the RLNC. Therefore,
$$ M_{ec}^{T} = \begin{bmatrix} P_{ec}(1) \\ \vdots \\ P_{ec}(n) \end{bmatrix} = \begin{bmatrix} g_{11} & \cdots & g_{1n} \\ \vdots & \ddots & \vdots \\ g_{n1} & \cdots & g_{nn} \end{bmatrix} \begin{bmatrix} P_{1} \\ \vdots \\ P_{n} \end{bmatrix} = G \times M^{T}, \tag{1} $$
where G denotes the matrix of coding coefficients, which is generated by randomly sampling elements from the finite field GF(2^q). In other words, the i-th coded packet Pec(i) is defined as
$$ P_{ec}(i) = \sum_{j=1}^{n} g_{ij} P_{j}. \tag{2} $$
The decoding of the coded packet is performed in accordance with the Gaussian elimination method. Consequently, the decoding probability of RLNC is contingent upon the rank of the coefficient matrix. The decoding probability of RLNC with respect to the degrees of freedom required for decoding is defined as follows.
Definition 1.
(Decoding probability and degrees of freedom required for decoding.) Assuming that the rank of the coding coefficient matrix G is rank(G) and the number of distinct packets in the coded packets Pec is n(M), the decoding probability η and the degrees of freedom χ required for decoding are defined as follows:
$$ \eta = \frac{\operatorname{rank}(G)}{n(M)}, \qquad \chi = n(M) - \operatorname{rank}(G). \tag{3} $$
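To make the role of the rank concrete, here is a minimal Python sketch restricted to GF(2), the q = 1 special case of GF(2^q), where addition is XOR and every coefficient is 0 or 1. The function names are assumptions, and the sketch takes the degrees of freedom still required as the number of packets minus the rank, which is the reading consistent with the estimates used later in Section 4.4.

```python
import random
from typing import List, Sequence, Tuple


def random_coefficient_matrix(n: int, rng: random.Random) -> List[List[int]]:
    """Draw an n x n coefficient matrix G uniformly over GF(2)."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]


def encode(G: Sequence[Sequence[int]], packets: Sequence[int]) -> List[int]:
    """Coded packet i is the XOR of all packets j with g_ij = 1."""
    coded = []
    for row in G:
        acc = 0
        for g, p in zip(row, packets):
            if g:
                acc ^= p
        coded.append(acc)
    return coded


def rank_gf2(G: Sequence[Sequence[int]]) -> int:
    """Rank over GF(2) by Gaussian elimination on bit-packed rows."""
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in G]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def decoding_state(G: Sequence[Sequence[int]], n: int) -> Tuple[float, int]:
    """Return (eta, chi): rank/n and the number of missing degrees of freedom."""
    r = rank_gf2(G)
    return r / n, n - r
```

A matrix whose rows are linearly dependent, such as [[1, 1, 0], [0, 1, 1], [1, 0, 1]] over GF(2), has rank 2, so one more independent coded packet would be needed before all three packets can be recovered.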

3.4. Unreliable Communication Link Model

In underwater data transmission systems, numerous factors contribute to link unreliability, including the seawater temperature, current movement, marine biological activity interference, marine equipment noise interference, and sensor node position changes. Modeling underwater unreliable communication links from the perspective of the physical mechanism is inherently complex. In terms of the outcome of data transmission over unreliable communication links, packet loss is the main undesirable effect and significantly degrades the performance of underwater data transmission. Consequently, in this paper, the unreliable communication link is modeled as an erasure channel. Typical erasure channels include the Binary Erasure Channel (BEC) and the Gilbert-Elliott Channel (GEC). Without loss of generality, the BEC is used in this paper as the base model to establish the underwater unreliable communication link model.
It is assumed that each transmission of nmax packets constitutes one transmission round, where nmax is the maximum transmission limit. In particular, if the number of packets to be transmitted does not exceed nmax, then the completion of transmission of all the packets is recorded as one transmission round. The channel erasure probability and channel capacity are defined as follows.
Definition 2.
(Channel erasure probability and channel capacity.) In the k-th round, the current node Nk will transmit n(k) packets to the next-hop node Nk+1, assuming that each packet within the same round has the same size and that only one packet is transmitted per transmission time slot ti ∈ Tk. For each packet Pi(k), node Nk+1 sends an acknowledgement packet ACKi(k) to node Nk after reception. Assuming that node Nk receives a total of n′(k) acknowledgement packets from Nk+1 after Tk transmission, the channel erasure probability pe(k) and the channel capacity c(k) are calculated as follows:
$$ p_{e}(k) = 1 - \frac{n'(k)}{n(k)}, \qquad c(k) = 1 - p_{e}(k) = \frac{n'(k)}{n(k)}. \tag{4} $$
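Definition 2 amounts to a ratio of acknowledgement and packet counts, as the following Python sketch shows. The function name and the input checks are assumptions added for illustration.

```python
from typing import Tuple


def channel_estimate(sent: int, acked: int) -> Tuple[float, float]:
    """Return (erasure probability, capacity) for one transmission round.

    sent  -- n(k), packets transmitted in the round
    acked -- n'(k), acknowledgement packets received back
    """
    if sent <= 0:
        raise ValueError("at least one packet must be sent")
    if not 0 <= acked <= sent:
        raise ValueError("acked must lie between 0 and sent")
    capacity = acked / sent
    return 1.0 - capacity, capacity
```

For example, 15 acknowledgements for 20 transmitted packets give an erasure probability of 0.25 and a capacity of 0.75.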

4. Reinforcement Learning-Based Adaptive Network Coding Algorithm

4.1. General Process of RL-ANC

The overall flow of the RL-ANC algorithm is given in Algorithm 1, in accordance with the descriptions presented in Sections 4.2 to 4.8. For the sake of clarity, the overall flow of the proposed RL-ANC algorithm is also depicted in Figure 6. In Algorithm 1, the maximum period of the k-th round is denoted as Γk, the sampling period is denoted as τ, and the transmission time slot is denoted as t. For each τ, the set of packets in node Ni is denoted as MNi(τ), and the set of transmitted packets in node Ni is denoted as M′Ni(τ).
Algorithm 1 RL-ANC Algorithm
while node D has not recovered M from node S do
    for each node Ni ∈ N do
        Select the next node based on Section 3.2
        if MNi(τ) ≠ ∅ then
            while τ ≤ Γk do
                while t ≤ τ do
                    Estimate the channel erasure probability and channel capacity via (5) and (6)
                    Resize the sliding window and determine the maximum repeatability via (7)
                    Estimate the decoding probability via (8)
                    while MNi(τ) \ M′Ni(τ) ≠ ∅ do
                        Determine the slide rule via (10) to (14)
                        Optimize the coding coefficients via (15) to (19)
                        Encode packets to obtain encoded packets based on Section 3.3
                        Send Pec(τ) to the next node
                    end while
                    Refresh the time slot table, τ ← τ + 1
                end while
                Optimize the sampling period via (20) to (22)
            end while
        end if
    end for
end while

4.2. Channel Estimation

In the actual transmission process, the estimation of the channel erasure probability and channel capacity is based on the number of packets and acknowledgement packets. However, this introduces a significant error. Furthermore, the timeliness of the resulting erasure probability and channel capacity estimates using the maximum transmission period as the channel condition update period is inadequate in the context of the complex underwater environment and node mobility faced by the OM-IoT. Consequently, the proposed algorithm estimates the channel erasure probability in terms of the percentage of time slots where erasure occurs during the sampling period.
It is assumed that the sampling period within round Tk is τk and that the maximum transmission period is Γk, with τkΓk. In τk, node Nk transmits n(τk) packets and receives n′(τk) acknowledgements. The channel erasure probability pe*(k) with channel capacity c* (k) within sampling period τk is then given by
p e * τ k = 1 1 τ k i = 1 n τ k t i ,     c * τ k = 1 τ k i = 1 n τ k t i ,                  
During the actual transmission process, the channel erasure probability and the channel capacity of the sampling period are estimated from the data transmission in the sampling period (ξ − 1)τk (ξ = 0, 1, 2, …, ξTk/τk). Consequently, the estimated value of the channel erasure probability and the channel capacity estimate is given by
$$\hat{p}_e(\xi\tau_k) = p_e^{*}\big((\xi-1)\tau_k\big), \qquad \hat{c}(\xi\tau_k) = c^{*}\big((\xi-1)\tau_k\big). \tag{6}$$
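The estimator described by Equations (5) and (6) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each t_i is the duration of a time slot whose packet was acknowledged, so that the erasure probability is one minus the acknowledged share of the sampling period. The names `estimate_channel`, `LaggedEstimator` and the prior value 0.5 are hypothetical.

```python
from typing import Sequence, Tuple


def estimate_channel(acked_slot_durations: Sequence[float],
                     sampling_period: float) -> Tuple[float, float]:
    """Return (erasure_probability, capacity) measured over one sampling period.

    Assumption: each t_i is the duration of a slot whose packet was
    acknowledged, so sum(t_i) / tau_k is the erasure-free share of the period.
    """
    if sampling_period <= 0:
        raise ValueError("sampling_period must be positive")
    delivered = sum(acked_slot_durations) / sampling_period
    delivered = min(max(delivered, 0.0), 1.0)  # keep the value a probability
    return 1.0 - delivered, delivered


class LaggedEstimator:
    """Serve the previous period's measurement as the current estimate."""

    def __init__(self, initial_erasure: float = 0.5) -> None:
        # Hypothetical prior used before any period has been measured.
        self._estimate = (initial_erasure, 1.0 - initial_erasure)

    def current(self) -> Tuple[float, float]:
        """Estimate to use during the running sampling period."""
        return self._estimate

    def close_period(self, acked_slot_durations: Sequence[float],
                     sampling_period: float) -> None:
        """Measure the period that just ended; it becomes the next estimate."""
        self._estimate = estimate_channel(acked_slot_durations, sampling_period)


estimator = LaggedEstimator()
estimator.close_period([1.0, 1.0, 1.0], sampling_period=4.0)
erasure_hat, capacity_hat = estimator.current()
```

With three acknowledged one-second slots in a four-second period, the estimate used in the next period is an erasure probability of 0.25 and a capacity of 0.75.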

4.3. Dynamic Adjustment of the Sliding Code Window

The proposed algorithm implements batch network coding via the sliding code window. When the time slot is updated, the code window slides a certain distance in a certain direction, which updates the set of packets that participate in the coded packet. Consequently, both the size of the sliding code window and the number of packets that may be repeated between consecutive windows must be designed. The proposed algorithm describes the number of packets to be coded in the current slot that may also appear in the window of the previous time slot as the minimum repetition degree. At the i-th time slot ti ∈ ξτk, the sliding window size Hi(k) and the minimum repetition degree Oi(k) are respectively given by
$$H_i(k) = \left[1 - \hat{p}_e(\xi\tau_k)\right] n_k(\tau_k), \qquad O_i(k) = \hat{p}_e(\xi\tau_k)\, H_i(k). \tag{7}$$
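As a rough illustration of the window sizing in Equation (7), the sketch below computes the window size and the repetition degree from the erasure estimate. It assumes that the window scales with the number of packets sent in the sampling period and that both quantities are rounded down to integers; the rounding rule and the function names are assumptions that the text does not state.

```python
import math
from typing import Tuple


def window_parameters(erasure_estimate: float,
                      packets_in_period: int) -> Tuple[int, int]:
    """Return (window_size, repetition_degree) for the current time slot."""
    if not 0.0 <= erasure_estimate <= 1.0:
        raise ValueError("erasure_estimate must lie in [0, 1]")
    # A worse channel shrinks the window (assumed floor rounding).
    window = math.floor((1.0 - erasure_estimate) * packets_in_period)
    # A worse channel repeats a larger share of the window.
    overlap = math.floor(erasure_estimate * window)
    return window, overlap


def slide_actions(window: int, overlap: int) -> range:
    """Admissible sliding distances 0, 1, ..., window - overlap."""
    return range(0, window - overlap + 1)


window, overlap = window_parameters(erasure_estimate=0.25, packets_in_period=16)
actions = list(slide_actions(window, overlap))
```

For an erasure estimate of 0.25 and 16 packets, this gives a window of 12 packets, of which 3 may be repeated, and sliding distances from 0 to 9.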

4.4. Pre-Estimation on the State of Decoding

The proposed algorithm estimates the current decoding state in terms of the decoding probability and the degrees of freedom required for decoding. During the j-th time slot tj ∈ Tk, the current node Nk sends a coded packet Pec(tj) containing nec(tj) packets to the next-hop node Nk+1. Assuming that the rank of the total coding coefficient matrix Gj(k) at time slot tj is rank[Gj(k)], the estimated decoding probability η̂j(k) and the estimated degrees of freedom required for decoding χ̂j(k) are respectively given by
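$$\hat{\eta}_j(k) = \frac{\mathrm{rank}\left[G_j(k)\right]}{\sum_{i=0}^{j}\left[n_{ec}(t_i) - O_i(k)\right]}, \tag{8}$$
$$\hat{\chi}_j(k) = \sum_{i=0}^{j}\left[n_{ec}(t_i) - O_i(k)\right] - \mathrm{rank}\left[G_j(k)\right], \tag{9}$$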
where Oi(k) is the minimum repetition of the sliding coding window in the i-th time slot.
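The feedback-independent decoding state estimate can be sketched as below. The sketch assumes that the rank of the coefficient matrix is computed elsewhere and passed in, and that the number of distinct packets sent so far is the per-slot packet count minus the per-slot repetition degree, summed over slots. The function name and the cap at 1.0 are illustrative choices, not part of the paper.

```python
from typing import Sequence, Tuple


def decoding_state(rank: int,
                   packets_per_slot: Sequence[int],
                   overlap_per_slot: Sequence[int]) -> Tuple[float, int]:
    """Return (decoding_probability, missing_degrees_of_freedom).

    rank             -- rank of the accumulated coding coefficient matrix
    packets_per_slot -- n_ec(t_i) for slots 0..j
    overlap_per_slot -- O_i(k) for slots 0..j
    """
    distinct = sum(n - o for n, o in zip(packets_per_slot, overlap_per_slot))
    if distinct <= 0:
        return 0.0, 0  # nothing has been sent yet
    probability = min(1.0, rank / distinct)  # cap is a defensive assumption
    missing = max(0, distinct - rank)
    return probability, missing


eta_hat, chi_hat = decoding_state(rank=5,
                                  packets_per_slot=[4, 4, 4],
                                  overlap_per_slot=[0, 1, 1])
```

Here 10 distinct packets have been covered and the matrix has rank 5, so the estimated decoding probability is 0.5 and 5 more degrees of freedom are needed.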

4.5. Adaptive Optimization of Sliding Rules for Coded Windows

The sliding rules for the coding window of the proposed algorithm encompass both the direction and the distance of the sliding movement. The proposed algorithm assumes that the coding window never slides backward (no backtracking), which fixes the sliding direction. In other words, the proposed algorithm only needs to consider the sliding distance when optimizing the sliding rule. In the context of underwater data transmission, encoding all the packets simultaneously can give rise to significant challenges, including reduced decoding efficiency and limited fault tolerance, so batch encoding is frequently employed. As a result, the number of updates to encoded packets in different time slots is constrained, and the algorithm faces a limited state space and action space. The Q-learning algorithm, a classical reinforcement-learning algorithm, is well suited to this setting and minimizes the influence of technical factors on the study. Nevertheless, the Q-learning algorithm is not without its limitations, and this paper examines its exploration strategy with a view to enhancing its computational efficiency. This section presents the sliding rule optimization method based on the Q-learning algorithm, and Section 4.6 presents the improvement to the exploration strategy.
The proposed algorithm is based on the Q-learning algorithm for the adaptive optimization of sliding rules. In the sliding rule optimization phase, the Q-value update formula is as follows:
$$Q_W(s,a) = Q_W(s,a) + \alpha\left[r_W(s,a) + \gamma \max_{a'} Q_W(s',a') - Q_W(s,a)\right], \tag{10}$$
where the Q value, QW(s, a), represents the value of selecting action a at state s. The reward, rW(s, a), is the immediate reward obtained by selecting action a at state s. The learning rate, α, is a parameter that determines the rate of change in the value of the selected action. The discount factor, γ ∈ [0, 1], is a parameter that determines the relative importance of future rewards. In the proposed algorithm, the coded packets and uncoded packets as of the current slot ti are used as state s. The proposed algorithm updates the coded packets through the sliding coding window. The sliding direction of the window is fixed once the total coding window is determined. Consequently, taking the sliding distance as the action a, the action space is AW = {0, 1, …, Hi(k) − Oi(k)}. In the unreliable communication link scenario, the proposed algorithm aims to improve the decoding probability, and the reward is therefore rW ∝ Δη(ti), where Δη(ti) is the increment of the decoding probability at time slot ti, that is:
$$\Delta\eta(t_i) = \hat{\eta}_i(k) - \hat{\eta}_{i-1}(k). \tag{11}$$
However, if the sole reward is the increment of the decoding probability, the window can easily stop sliding, so that no new packets are involved in encoding in consecutive time slots. The algorithm then becomes stuck in repetitive encoding, with a concomitant decrease in the efficiency of data transmission in the OM-IoT. To prevent this, new packets must enter the sliding coding window in every update time slot until the window has traversed all the packets to be transmitted. Conversely, if no restriction is placed on the number of new packets within the sliding window, the degrees of freedom required for decoding may still be greater than zero after the current round of transmission. In that case, the coding coefficient matrix is not of full rank and the resulting coded packets cannot be successfully decoded. Therefore, the proposed algorithm determines the final action a as follows:
$$a(t_i) = \begin{cases} 0, & H_i(k) - O_i(k) = 0, \\ \arg\max_{a \in A_W \setminus \{0\}} Q_W(s,a), & \text{else}. \end{cases} \tag{12}$$
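The tabular update and the action rule of this section can be sketched as follows. This is a minimal illustration under stated assumptions: the state is any hashable summary of the coded and uncoded packets, the reward is the increment of the decoding probability, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


class SlideQTable:
    """Tabular Q-learning over sliding distances."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q: Dict[Tuple[Hashable, int], float] = defaultdict(float)

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable, next_actions: Iterable[int]) -> None:
        """Apply one temporal-difference update to Q(state, action)."""
        best_next = max((self.q[(next_state, a)] for a in next_actions),
                        default=0.0)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

    def choose_slide(self, state: Hashable, window: int, overlap: int) -> int:
        """Greedy sliding distance; zero is allowed only when forced."""
        max_slide = window - overlap
        if max_slide <= 0:
            return 0  # the window cannot move
        # Excluding 0 guarantees that new packets enter the window.
        return max(range(1, max_slide + 1), key=lambda a: self.q[(state, a)])


table = SlideQTable(alpha=0.5, gamma=0.9)
# Sliding by 2 raised the decoding probability by 1.0 (illustrative value).
table.update(state="s0", action=2, reward=1.0,
             next_state="s1", next_actions=range(0, 4))
best = table.choose_slide("s0", window=5, overlap=2)
```

Excluding the zero distance whenever the window can move mirrors the restriction that every update slot must bring new packets into the window.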

4.6. Improved Greedy Strategies

The Q-learning algorithm selects actions based on the ε-greedy strategy. Under the general ε-greedy strategy, the action with the highest Q-value is chosen with probability 1 − ε, and a random action is explored with probability ε:
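$$a(t_i) = \begin{cases} \arg\max_{a \in A_W \setminus \{0\}} Q_W(s,a), & \text{with probability } 1-\varepsilon, \\ a \in A_W \setminus \{0\}, & \text{with probability } \varepsilon. \end{cases} \tag{13}$$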
In the general ε-greedy strategy, the exploration probability is constant. However, as the number of explorations accumulates, the exploration probability does not have to be maintained at its initial level. Consequently, the greedy strategy needs to be improved.
The proposed algorithm incorporates a decay function for the exploration probability ε, so that ε decays as the elapsed time Δt increases. The decay also takes into account how the difference between the Q-values of different actions affects the need for exploration. Consequently, the improved exploration probability is given by
$$\varepsilon = \varepsilon_0 \exp\left(-\left|Q_W(s,a') - Q_W(s,a)\right| \Delta t\right), \tag{14}$$
where ε0 is the initial value of the exploration probability and is defined as a real number from zero to one. As the algorithm progresses, ε gradually decays, which allows the algorithm to avoid unnecessary exploration and improve its overall efficiency. Consequently, the enhanced greedy strategy is as follows: for a candidate action a′ ∈ AW\{0}, if QW(s, a′) > QW(s, a), then a′ is the subsequent action. Otherwise, a′ is re-selected with the probability indicated in Equation (13).
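One possible reading of the enhanced greedy strategy is sketched below. It assumes that the exploration probability decays exponentially with the product of the absolute Q-value gap and the elapsed time, and that a candidate action that does not improve on the current one is still taken with that decayed probability. Both the exact decay form and the function names are assumptions made for illustration.

```python
import math
import random
from typing import Dict, Optional


def decayed_epsilon(eps0: float, q_candidate: float, q_current: float,
                    elapsed: float) -> float:
    """Exploration probability that shrinks with time and with the Q-value gap."""
    return eps0 * math.exp(-abs(q_candidate - q_current) * elapsed)


def next_action(q_values: Dict[int, float], current: int, candidate: int,
                eps0: float, elapsed: float,
                rng: Optional[random.Random] = None) -> int:
    """Choose between the current action and a candidate action.

    q_values maps each non-zero sliding distance to its Q-value in the
    present state.
    """
    rng = rng or random.Random()
    if q_values[candidate] > q_values[current]:
        return candidate  # a strictly better candidate is always accepted
    eps = decayed_epsilon(eps0, q_values[candidate], q_values[current], elapsed)
    # Otherwise explore the candidate only with the decayed probability.
    return candidate if rng.random() < eps else current


q = {1: 0.2, 2: 0.8}
early = decayed_epsilon(0.5, q[1], q[2], elapsed=1.0)
late = decayed_epsilon(0.5, q[1], q[2], elapsed=10.0)
```

In this sketch `late` is smaller than `early`, so a clearly worse candidate is explored less and less often as the transmission proceeds.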

4.7. Adaptive Optimization of Coding Coefficients

The proposed algorithm is based on RLNC as the fundamental coding algorithm. However, to avoid the high randomness of the network coding coefficients matrix, which could lead to an uncontrollable decoding probability, we employ the optimization of the network coding coefficients based on DQN. The fundamental framework of the coding coefficient optimization method is illustrated in Figure 7. In the proposed algorithm, each node is regarded as an agent with an embedded DQN, which is responsible for optimizing the coding coefficients. In the multi-node scenario, all the DQNs are trained centrally in order to simplify and accelerate the training process. At each discrete decision step, the sender performs the estimation of the coding sparsity of the packet. The environmental information encompasses the channel state and the historical packets stored in the node’s cache.
During the execution of the algorithm, in each decision step j, the sender takes an action aj in state sj with the objective of optimizing, by DQN, the coding coefficient of the j-th packet participating in the coded packet. After the action aj, the state transitions from sj to sj+1, and the sender obtains the reward rj from the environment. Thereafter, the sender stores the experience (sj, aj, rj, sj+1) in the replay buffer. In the multi-node scenario, during training, the centralized optimizer randomly draws a mini-batch of experience data from the replay buffer and updates the parameters θ of the DQN by minimizing the loss function. The optimizer then sends the updated parameters θ to each node, and the nodes update the parameters of their DQNs accordingly.
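The experience storage and mini-batch sampling described above can be sketched with the standard library alone. The class and field names are hypothetical, and the default capacity and batch size are placeholders rather than values prescribed by this section.

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple, Optional


class Experience(NamedTuple):
    state: Any
    action: int
    reward: float
    next_state: Any


class ReplayBuffer:
    """Fixed-capacity store of experiences with uniform random sampling."""

    def __init__(self, capacity: int = 20000,
                 seed: Optional[int] = None) -> None:
        # A bounded deque silently discards the oldest experience when full.
        self._items: Deque[Experience] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, experience: Experience) -> None:
        self._items.append(experience)

    def sample(self, batch_size: int = 32) -> List[Experience]:
        """Draw a mini-batch without replacement (smaller if the buffer is short)."""
        count = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), count)

    def __len__(self) -> int:
        return len(self._items)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.store(Experience(state=step, action=0, reward=1.0,
                            next_state=step + 1))
batch = buffer.sample(batch_size=2)
```

A centralized optimizer would call `sample` on a shared buffer and use the batch to compute the loss, while each node only calls `store`.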
During the process of optimizing the coding coefficients of the proposed algorithm, the Q-value update formula is given as follows:
$$Q_G(s_j,a_j;\theta) = Q_G(s_j,a_j;\theta) + \alpha\left[r_j + \gamma \max_{a_{j+1}} Q_G(s_{j+1},a_{j+1};\theta) - Q_G(s_j,a_j;\theta)\right], \tag{15}$$
where the Q-value, denoted by QG(s, a; θ), is the value of selecting action a at state s and θ represents the parameters of the estimation network. The value of θ is updated by minimizing the loss function, i.e.,
$$\mathrm{Loss}(\theta) = \frac{1}{N}\sum_{j=1}^{N}\left[Q_G^{j} - Q_G(s_j,a_j;\theta)\right]^2. \tag{16}$$
As shown in Equation (16), at each decision step j, the state s comprises two parts: the packet Pj and the information Pec(m) of m coded packets from the historical packets in the cache, i.e., s = [Pj, Pec(m)]. The action aj ∈ AG, where AG = {0, 1, …, 2^q} is the action space and q is the size of the random domain. For each participating packet Pj, the coding coefficient is gj = aj. The objective of the coding coefficient optimization is to prevent linear correlation between different coding coefficient vectors, which could result in an uncontrolled decoding probability. Consequently, the reward of the proposed algorithm is the increment of the rank of the total coefficient matrix, Δrank[G(ti)], which is given by
$$\Delta\mathrm{rank}(t_i) = \mathrm{rank}\left[G_i(k)\right] - \mathrm{rank}\left[G_{i-1}(k)\right]. \tag{17}$$
The selection of the coding coefficients g(sj) in the final state sj is performed with reference to the improved greedy strategy presented in Section 4.6, i.e.,
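$$g(s_j) = \begin{cases} \arg\max_{a_j} Q_G(s_j,a_j;\theta), & \text{with probability } 1-\varepsilon, \\ \mathrm{random}(A_G), & \text{with probability } \varepsilon, \end{cases} \tag{18}$$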
where the ε is given by
$$\varepsilon = \varepsilon_0 \exp\left(-\left|Q_G(s,a';\theta) - Q_G(s,a;\theta)\right| \Delta t\right). \tag{19}$$
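The rank-increment reward used for the coding coefficients can be illustrated with a small Gaussian elimination. To keep the example short, the sketch works over a prime field (arithmetic modulo 257) instead of the binary extension field that RLNC implementations normally use. This substitution, along with the function names, is an assumption made purely for illustration.

```python
from typing import List, Sequence


def rank_mod_p(rows: Sequence[Sequence[int]], p: int = 257) -> int:
    """Rank of a coefficient matrix over the prime field GF(p)."""
    m = [[x % p for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inverse = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(x * inverse) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def rank_reward(existing: List[List[int]], new_vector: List[int],
                p: int = 257) -> int:
    """Reward of 1 if the new coefficient vector is innovative, else 0."""
    return rank_mod_p(existing + [new_vector], p) - rank_mod_p(existing, p)


sent = [[1, 0, 0], [0, 1, 0]]
dependent = rank_reward(sent, [1, 1, 0])   # linear combination of earlier rows
innovative = rank_reward(sent, [0, 0, 5])  # adds a new degree of freedom
```

A coefficient vector that is a linear combination of earlier vectors earns no reward, which is the behaviour the optimization is meant to discourage.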

4.8. Optimization of the Sampling Period

The selection of the sampling period τk in the proposed algorithm affects the quality of the channel estimation. When τk is too long, the estimate is statistically closer to the actual value, but the channel estimation delay rises and the timeliness of the algorithm decreases, which in turn affects the performance of data transmission. Conversely, when τk is too short, the statistical significance of the estimate is weakened and the coded transmission degenerates into a retransmission mechanism. Consequently, we employ the simulated annealing algorithm to optimize the sampling period.
In the proposed algorithm, the temperature decay equation is given by
$$T_0 = T_{\max}, \qquad T_{m+1} = (1-\mu)\,T_{\min} + \mu\,T_m, \tag{20}$$
where T0 is the initial temperature, Tmax is the maximum temperature, Tmin is the minimum temperature, Tm is the temperature at annealing step m, and the annealing factor µ is a value between zero and one. When optimizing the sampling period, the initial sampling period τk is randomly generated. After each round, a random perturbation is applied to the current sampling period, generating a new sampling period τ′k in its neighborhood. The update probability of the sampling period is then calculated as follows:
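$$p(\tau) = \begin{cases} 1, & \sigma(\tau'_k) < \sigma(\tau_k), \\ \exp\left(-\dfrac{\sigma(\tau'_k) - \sigma(\tau_k)}{T_m}\right), & \sigma(\tau'_k) \ge \sigma(\tau_k), \end{cases} \tag{21}$$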
where σ(τk) represents the channel estimation standard deviation and is given by
$$\sigma(\tau_k) = \sqrt{\hat{p}_e(k)\left[1 - \hat{p}_e(k)\right]}. \tag{22}$$
Equation (21) indicates that when the standard deviation of the channel estimation under the new sampling period τ′k is lower than that under τk, the sampling period is updated to τ′k. Otherwise, the sampling period is updated to τ′k with the probability given by (21).
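The sampling period search of this section can be sketched as a standard simulated annealing step. The sketch assumes a Bernoulli-style standard deviation for the channel estimate, a temperature that relaxes geometrically toward its minimum, and a Metropolis acceptance rule. The perturbation size, the lower bound on the period and all function names are hypothetical.

```python
import math
import random
from typing import Callable, Optional


def estimate_std(erasure_estimate: float) -> float:
    """Assumed standard deviation of a Bernoulli erasure estimate."""
    return math.sqrt(erasure_estimate * (1.0 - erasure_estimate))


def cool(temperature: float, t_min: float, mu: float) -> float:
    """Move the temperature toward t_min by the annealing factor mu."""
    return (1.0 - mu) * t_min + mu * temperature


def accept_probability(sigma_new: float, sigma_old: float,
                       temperature: float) -> float:
    """Always accept an improvement; accept a worse period with decaying odds."""
    if sigma_new < sigma_old:
        return 1.0
    return math.exp(-(sigma_new - sigma_old) / temperature)


def anneal_step(tau: float, sigma_of: Callable[[float], float],
                temperature: float, max_shift: float = 0.5,
                rng: Optional[random.Random] = None) -> float:
    """Perturb the sampling period once and return the period to use next."""
    rng = rng or random.Random()
    candidate = max(1e-3, tau + rng.uniform(-max_shift, max_shift))
    p = accept_probability(sigma_of(candidate), sigma_of(tau), temperature)
    return candidate if rng.random() < p else tau


temperature = 10.0
tau = 2.0
for _ in range(20):
    # Toy objective: pretend the estimate is most stable near a 3 s period.
    tau = anneal_step(tau, lambda t: abs(t - 3.0), temperature,
                      rng=random.Random(0))
    temperature = cool(temperature, t_min=0.1, mu=0.5)
```

The toy objective in the usage loop only stands in for the measured standard deviation, which in practice would come from the channel estimates collected under each candidate period.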

5. Simulation and Analysis of Results

5.1. Preparation for Simulation

5.1.1. Dataset

In this paper, the simulation employs a random network model. A number of nodes are randomly generated in a finite 3D area with random movement speeds, and the movement direction of each node is determined by randomly assigning a positive or negative sign to each of its 3D velocity components. The node deployment strategy is not considered when the network is initialized, so the effect of the network coverage on the performance of the algorithm will be illustrated in subsequent simulations. Furthermore, the nodes in the random network are assigned unique identifiers for subsequent analysis of the data transmission and codec performance. However, the node identifiers do not affect the specific data transmission process. In other words, all the intermediate nodes are peers that differ only in their identifiers. In order to analyze the data transmission performance of the proposed algorithm, it is not necessary to consider the specific packet content in great detail. Therefore, the transmitted packet is a randomly generated 32-bit string.
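A random network of this kind could be generated along the following lines. The cubic deployment region, the uniform speed distribution and all names are assumptions made for illustration; the text does not specify them.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = Tuple[float, float, float]


@dataclass
class Node:
    node_id: int       # unique identifier, used only for later analysis
    position: Vector   # metres inside the cubic region
    velocity: Vector   # metres per second, one signed component per axis


def make_network(n_nodes: int, side_m: float, max_speed: float,
                 seed: Optional[int] = None) -> List[Node]:
    """Place nodes uniformly at random and give each a random velocity."""
    rng = random.Random(seed)
    nodes = []
    for node_id in range(n_nodes):
        position = tuple(rng.uniform(0.0, side_m) for _ in range(3))
        # A random sign on each component sets the movement direction.
        velocity = tuple(rng.choice((-1.0, 1.0)) * rng.uniform(0.0, max_speed)
                         for _ in range(3))
        nodes.append(Node(node_id, position, velocity))
    return nodes


def random_packet(rng: random.Random) -> str:
    """Random 32-bit payload written as a string of 0s and 1s."""
    return format(rng.getrandbits(32), "032b")


network = make_network(n_nodes=5, side_m=100.0, max_speed=2.0, seed=1)
payload = random_packet(random.Random(1))
```

Fixing the seed makes a simulation run repeatable, which helps when different algorithms are compared on the same topology.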

5.1.2. Platform, Parameters and Details

The simulations presented in this paper were conducted using Python 3.10.9 and PyTorch 1.13.1. The specific parameter settings are outlined in Table 1. The DQN employed in the algorithm was trained for 10,000 episodes and tested for 100 episodes, with a replay batch size of 32 and a replay buffer capacity of 20,000. In Table 1, the ideal speed of sound underwater (1500 m/s) is employed. For the purposes of analysis, the maximum transmission limit and the number of transmission rounds are taken as 100 and 10,000, respectively. A random domain size of 2^8 has been shown to be sufficient for encoding and decoding, and thus, it is set to 2^8. The remaining parameters are set according to the established research convention or experience in order to facilitate comparison with other algorithms.

5.2. Comparative Analysis of the Performance for Different Algorithms

In order to analyze the performance of the proposed algorithm, we have chosen to compare it with data retransmission (DR), redundant transmission (RT) and RLNC. The performance metrics of interest include the packet delivery rate (PDR), average retransmission rate (ARR) and redundant transmission rate (RTR), as well as the convergence of the decoding probability.

5.2.1. Packet Delivery Rate

In the context of underwater data transmission, the PDR is defined as the ratio of the number of packets received by the destination node or recovered by decoding to the number of packets sent by the source node. The simulation results for the four algorithms presented in this paper are shown in Figure 8.
The general trend of the simulation results is first analyzed. The results of this experiment demonstrate that the PDRs of all four algorithms are attenuated to varying degrees as the channel erasure probability increases. However, in general, the two algorithms with network coding (RLNC and RL-ANC) are more resistant against channel erasure than the two algorithms that do not use network coding (DR and RT). It should be noted that, in principle, the theoretical values of the PDR of the four algorithms should not be affected by the number of nodes in the network, provided that the distribution of the nodes can achieve complete coverage of the target area. Furthermore, the simulation experiments in this paper mainly focus on the random network, in which the nodes in the network are free to move.
Due to the high randomness of both the initial positions and the movement speeds of the nodes, the network coverage may not always reach 100%. Consequently, in the low node density scenario, the PDR of all four algorithms is limited by the network connectivity and remains low. As the number of nodes in the network increases, both the node density and the network coverage rise, the PDR of the four algorithms becomes less susceptible to the effects of network connectivity, and all four reach a higher level. Furthermore, when the node density is low, the network coverage and connectivity are low and fewer hops are available during data transmission, so direct transmission and retransmission can deliver data more efficiently. As the node density increases, the average number of hops for data transmission rises, and a large number of data retransmissions then degrades the network's data transmission performance. Consequently, as the node density rises, DR becomes progressively less effective than the other three algorithms in terms of the PDR. It can be observed that both RLNC and RL-ANC outperform DR in terms of the PDR, because the transmission mechanism utilizing network coding effectively enhances the network throughput. Furthermore, the proposed algorithm adapts to unreliable communication links, so the PDR advantage of RL-ANC over RLNC gradually emerges as the quality of the communication link decreases.
The following is a comparative analysis of the PDR performance of the four algorithms with the same channel erasure probability. When the channel erasure probability is pe = 0.1, the quality of the communication link between the nodes is more stable and the packet loss rate is lower than those presented in scenarios with higher pe (including cases with pe of 0.3, 0.5 and 0.7, as presented in this paper). Therefore, the four algorithms have similar PDR performance. As the channel erasure probability increases, the performance advantage of the proposed algorithm over the other three algorithms in terms of the PDR gradually emerges when pe = 0.3 and pe = 0.5. This advantage can be maintained at a high level with a certain degree of network coverage. The trend of the PDR performance of the proposed algorithm is similar to that of RLNC, but the proposed algorithm still has a certain advantage over RLNC. This is due to the fact that the proposed algorithm, in addition to enhancing the network throughput and facilitating redundant transmission through network coding, also exhibits robust environmental adaptability and is capable of promptly adjusting the coding strategy in accordance with the fluctuating channel conditions. When the channel erasure probability is at a high level, such as pe = 0.7, the quality of the communication link is markedly unstable and the conventional retransmission mechanism and redundant transmission demonstrate a considerable degree of performance deterioration. The proposed algorithm is capable of adapting its coding strategy in response to changes in the channel conditions, thereby enabling more effective communication in the face of unreliable communication links with unstable quality.

5.2.2. Average End-to-End Delay

The results of the end-to-end delay simulations are presented in Figure 9. The results indicate that the average end-to-end delay performance of data retransmission is generally better when pe = 0.1. This is due to the fact that when the channel erasure probability is lower, the packet loss rate during data transmission is lower and the scale of data retransmission faced by each hop is smaller. Conversely, when the probability of channel erasure is lower, the transmission of redundant data and the application of network coding necessitate the introduction of additional transmission and computational resources, thereby imposing an additional overhead. However, as the probability of channel erasure increases and the network size increases, the network employing data retransmission will face a significant number of retransmission requirements, resulting in a notable decline in the efficiency of data transmission between hops and a notable increase in the average end-to-end delay. In contrast, redundant transmission can reduce the number of burst retransmissions by repeated transmissions scheduled in advance. In networks where network coding is implemented, the network coding can effectively improve the network throughput, allowing redundant transmission to be achieved with higher efficiency and the average end-to-end delay advantage to be more obvious. A comparison of RL-ANC with RLNC reveals that the average end-to-end delay is marginally higher than that of RLNC due to the additional operations required by the proposed algorithm. However, when considered alongside the simulation results presented in Section 5.2.1, the minimal time cost introduced by RL-ANC is deemed acceptable in scenarios where enhanced data transmission reliability is a priority.

5.2.3. Average Retransmission Rate

Although network coding and redundant transmission can reduce the need for retransmission, they cannot completely eliminate it. Under harsher channel conditions, the destination node may still fail to receive a packet, or a coded packet may fail to be decoded and recovered. In order to analyze the performance of the proposed algorithm in greater detail, we evaluate the four algorithms in terms of the ratio of the number of retransmissions at each hop to the number of all the data transmissions during the transmission process, and refer to this metric as the average retransmission rate (ARR) of each hop. It should be noted that in the general retransmission mechanism, the specific lost packets are transmitted again during retransmission. In contrast, in the transmission mechanism using network coding, different coded packets with the same coding combination are transmitted during retransmission. The network coding combination is defined as follows:
Definition 3.
(Network coding combination.) If a coded packet Pec consists of n packets Pi (i = 1, 2, …, n), then the combination of packets Pi is defined as the coding combination of the coded packet Pec, denoted as C(Pec), i.e.,
C(Pec) = {P1, P2, …, Pn}.
We present a simulation and comparative analysis of the average retransmission rate of the proposed algorithm, DR and RLNC (Figure 10). Overall, the average retransmission rates of RLNC and RL-ANC are significantly lower than that of DR, and RL-ANC has a certain advantage over RLNC. As the erasure probability increases, the average retransmission rates of all three algorithms increase, but the increase for RLNC and RL-ANC is significantly smaller than that for DR. When pe = 0.1, the average retransmission rates of RLNC and RL-ANC are close to zero, whereas that of DR is already greater than 0.1. As the probability of packet loss during data transmission increases, the data transmission system employing DR faces a large number of retransmission requests. In contrast, the systems employing RLNC and RL-ANC complete the data transmission with fewer retransmissions, because network coding increases the network throughput and achieves redundant transmission through repetitive coding. Furthermore, RL-ANC consistently outperforms RLNC in terms of the average number of retransmissions. This is because the proposed algorithm determines the coding strategy based on the channel conditions, and its coding repetition is correlated with the channel erasure probability. Consequently, it is less affected by the erased channel and has better environmental adaptability.

5.2.4. Redundant Transmission Rate

The analyses in Section 5.2.1 and Section 5.2.2 reveal that the redundant transmission achieved through repeated coding is a significant factor contributing to the superior performance of the proposed algorithm and RLNC compared to the other two algorithms. To gain a more comprehensive understanding of the proposed algorithm’s performance, we compare and analyze the redundant transmission rate (RTR) of the proposed algorithm. The definition of RTR is as follows:
Definition 4.
(Redundant transmission rate.) After the OM-IoT data transmission is completed, if the set of packets received and decoded for recovery by the sink node is PD = {P1, P2, …, Pk}, then the redundant transmission rate for this data transmission is defined as follows:
$$RTR = \frac{\sum_{i=1}^{k}\left[n(P_i) - 1\right]}{\sum_{i=1}^{k} n(P_i)},$$
where n(Pi) is the frequency at which the packet Pi (i = 1, 2, …, k) is received and decoded for recovery by the sink node.
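Definition 4 can be computed directly from a log of the packets recovered at the sink node. The sketch below assumes such a log is available as a sequence of packet identifiers, one entry per successful reception or decoding; the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def redundant_transmission_rate(recovered_ids: Iterable[Hashable]) -> float:
    """Share of recoveries at the sink that repeat an already recovered packet."""
    counts = Counter(recovered_ids)  # n(P_i) for every distinct packet P_i
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Summing n(P_i) - 1 over k distinct packets equals total - k.
    return (total - len(counts)) / total


rtr = redundant_transmission_rate(["P1", "P1", "P2", "P3", "P3", "P3"])
```

In this example, three distinct packets were recovered six times in total, so half of the recoveries were redundant and `rtr` equals 0.5.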
We employ simulation to assess and compare the redundant transmission rates of three algorithms: RT, RLNC and RL-ANC (Figure 11). The findings indicate that the redundant transmission rates of the three algorithms increase as the erasure probability increases. However, the relative performance of the three algorithms, and how it changes, varies with the erasure probability. When pe = 0.1, the redundant transmission rates of the three algorithms remain low. As pe increases from 0.1 to 0.5, the redundant transmission rates of the three algorithms become closer to each other, although the ordering RTR(RT) < RTR(RL-ANC) < RTR(RLNC) is consistently observed. This is because the proposed algorithm adjusts the sliding coding window according to the current estimate of the channel erasure probability before network coding, which keeps the redundant transmission rate of RL-ANC within a relatively balanced interval. When pe = 0.7, the redundant transmission rates of the three algorithms differ significantly, yet that of RL-ANC remains between those of the other two algorithms. A comparison of the simulation results presented in Section 5.2.1 with those depicted in Figure 11 reveals that the proposed algorithm achieves a higher packet delivery rate than RLNC with a lower redundant transmission rate. Consequently, it can be concluded that the reliability of the OM-IoT data transmission can be effectively enhanced through the implementation of this algorithm.

5.2.5. Convergence of Decoding Probabilities

In order to further analyze the reliability of the coding and decoding strategies of the proposed algorithm, this paper analyses the convergence speed of the decoding probability of the proposed algorithm with RLNC. It should be noted that the simulation results and analyses presented in Section 5.2.1, Section 5.2.2, Section 5.2.3 and Section 5.2.4 indicate that when the number of nodes is over 90, the indices tend to be stable. Therefore, the convergence speed of the decoding probability in the network with a number of nodes of 90 is selected for analysis. The simulation results of the decoding probability under different channel erasure probabilities are presented in Figure 12. The decoding probability of both algorithms under different channel erasure probabilities is observed to converge to one as the data transmission process proceeds, which is in accordance with the general law of network coding and decoding. In the same channel erasure probability context, the decoding probability convergence speed of the proposed algorithm is superior to that of RLNC. Furthermore, the greater the channel erasure probability, the more pronounced this advantage becomes. As the channel erasure probability increases, the decoding probability convergence speeds of both the proposed algorithm and RLNC slow down to varying degrees. However, the decoding probability convergence speed of RL-ANC is less affected by changes in the channel conditions. The simulation results demonstrate that the adaptive optimization of the proposed network coding algorithm achieves the expected effect. Furthermore, the adaptive determination of the coding strategy for different communication link qualities can effectively improve the network coding performance.

5.3. Effectiveness Analysis of Improvement and Optimization in the Proposed Algorithm

In order to analyze the effect of the improved greedy strategy used in the reinforcement-learning algorithm and of the sampling period optimization, this paper compares, through simulation, the performance of RL-ANC built on the unimproved and unoptimized base algorithms with that of RL-ANC built on the improved and optimized algorithms. Since the improvement of the greedy strategy and the optimization of the sampling period each have a clear objective, the analysis focuses on the change in the corresponding target performance metrics. Furthermore, since the indicators tend to be stable when the number of nodes is over 90, the network with 90 nodes is selected for analysis.

5.3.1. Effects of Improvements in the Greedy Strategies

The enhanced greedy strategy is designed to enhance the operational efficiency of the algorithm. The average end-to-end delay is the primary metric for gauging the efficacy of the greedy strategy’s improvement. Figure 13 illustrates the simulation outcomes of the average end-to-end delay of the algorithm before and after the implementation of the enhanced greedy strategy under varying channel erasure probabilities. The results of the simulation, as shown in Figure 13, indicate that at a given erasure probability, the average end-to-end delay of the algorithm with the improved greedy policy is significantly lower than that of the algorithm with the unimproved greedy policy. As the channel erasure probability increases, the overall average end-to-end delay increases to varying degrees. However, the algorithm with the improved greedy policy is less affected by the increasing channel erasure probability. It is important to note that the results and analyses presented in Section 5.2.2 demonstrate that the introduction of the RL-ANC algorithm results in an additional time cost due to the additional operations. However, this additional time cost is deemed acceptable in light of the enhanced data transmission stability. Nevertheless, it is crucial to pursue further optimization of the algorithm to enhance its operational efficiency and improve the OM-IoT data transmission performance.

5.3.2. Effects of Optimization for the Sampling Period

The objective of sampling period optimization is to enhance the timeliness and accuracy of channel estimation, thereby facilitating the adaptivity of network coding. During the execution of the algorithm, one of the performance indicators that can be readily interpreted as a reflection of the reasonableness of the network coding strategy is the convergence speed of the decoding probability. Consequently, this study primarily focuses on the impact of sampling period optimization on the convergence speed of the decoding probability. In this section, we examine the impact of sampling period optimization on the performance of the algorithm in the presence of channel erasure. To this end, we conduct simulations in which the erasure probability was randomly determined and set to one of four values pe∈{0.1, 0.3, 0.5, 0.7} for each hop. These values were updated every five seconds. To emphasize the analysis of the impact of sampling period optimization, the performance of the algorithm under deterministic periods is additionally simulated. The fixed period used for comparison is τ∈{1 s, 2 s, 3 s, 4 s, 5 s}.
Figure 14 compares the convergence speed of the decoding probability before and after sampling period optimization. During data transmission, the optimization yields a dynamic sampling period. The simulation results in Figure 14 indicate that the decoding probability approaches one faster under the dynamic sampling period than under a fixed sampling period. Sampling period optimization therefore improves the adaptability of the network coding strategy to dynamic channel conditions. Note that under a fixed sampling period, the convergence speed of the decoding probability does not vary monotonically with the sampling period: it first increases and then decreases as the period grows. The reason is as follows. If the sampling period is too short, the estimate is more accurate in terms of timeliness, but it carries less statistical significance as a predictor of the channel erasure probability in the next sampling period. Conversely, a longer sampling period gives an estimate that better represents, in a statistical sense, the channel erasure probability in the current and subsequent sampling periods, but the estimate becomes less accurate in terms of timeliness. The optimized dynamic sampling period tracks the channel conditions in real time and balances the accuracy, timeliness, and statistical significance of the estimates, so it can be expected to improve the reliability and rationality of the network coding strategy.
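The trade-off described above can be made concrete with a simple sampling argument. The following is a sketch under assumptions that the paper does not state: the erasure probability is estimated as the fraction of lost packets within one sampling period, packet losses are independent, and packets are sent at a constant rate λ (a hypothetical symbol introduced here).

```latex
\hat{p}_e = \frac{N_{\text{lost}}}{N}, \qquad
N = \lambda \tau, \qquad
\operatorname{SE}(\hat{p}_e) = \sqrt{\frac{p_e \,(1 - p_e)}{\lambda \tau}}
```

For example, with pe = 0.3 and λ = 10 packets per second, a period of τ = 1 s gives a standard error of about 0.145, whereas τ = 5 s gives about 0.065. The longer period is therefore statistically more stable, but because the erasure probability is redrawn every five seconds, a longer window is also more likely to span a change in pe and to mix two channel states in one estimate.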

6. Conclusions

In the OM-IoT, underwater data transmission faces a number of challenges, including a high packet loss rate, low reliability, and low communication efficiency, which are compounded by the complex and changeable marine environment and the movement of nodes. To address the impact of unreliable communication links on the reliability of underwater data transmission in the OM-IoT, this paper proposes RL-ANC, an adaptive network coding algorithm based on reinforcement learning, and simulates and analyzes the underwater data transmission process with the proposed algorithm. RL-ANC introduces reinforcement learning on the basis of estimates of the channel conditions and decoding states, achieves adaptive packet batching by dynamically adjusting the sliding coding window size and sliding rules, and keeps the complexity of coding and decoding operations within a reasonable range through a DQN-based adaptive optimization method for the coding coefficients. Furthermore, RL-ANC improves the efficiency of the greedy strategy, optimizes the operation of the algorithm by dynamically adjusting the exploration probability, and optimizes the sampling period in real time, thereby enhancing the reliability and communication performance of underwater network coding. Simulations compared the proposed algorithm with common data retransmission, redundant transmission, and general network coding. The results show that the proposed algorithm outperforms the other three algorithms in terms of the packet delivery rate, average retransmission rate, and redundant transmission rate, and that its decoding probability converges faster. We also analyzed in detail the impact of the enhanced greedy strategy and the sampling period optimization on the algorithm, and found that both lead to improved results.
It should be noted that the algorithm proposed in this paper assumes a node-depth-based framework for underwater multihop data transmission and mainly considers in-stream network coding. In large-scale data transmission scenarios, e.g., multi-source and multi-sink networks, multi-flow intersection networks, and large-scale IoT data transmission, the use of network coding raises multi-level concurrent network coding scheduling and optimization problems. The next stage of this research will therefore investigate network coding algorithms in more intricate network scenarios using the RL-ANC framework. Furthermore, this paper primarily addresses the underwater network in the OM-IoT. The OM-IoT is increasingly integrated with other networks, such as satellite networks, which warrants further research. In these other networks, where media such as electromagnetic and optical signals can be widely used, research on network coding optimization based on reinforcement learning holds considerable potential. We will therefore also refine and extend the proposed RL-ANC framework to other networks, which is a key direction for future research.

Author Contributions

Y.Z. conceived and supervised the research and experiments, and also contributed as the lead author of the article; X.W. wrote the draft of the manuscript and conducted the experiments. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (no. 61673259); the Shanghai “Science and Technology Innovation Action Plan” Hong Kong, Macao and Taiwan Science and Technology Cooperation Project (no. 21510760600); the Capacity Building Project of Local Colleges and Universities of Shanghai (no. 21010501900); the Open Project Funds for the Key Laboratory of Space Photoelectric Detection and Perception (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology (no. NJ2024027-3); and the Fundamental Research Funds for the Central Universities (no. NJ2024027).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Vo, D.T.; Nguyen, X.P.; Nguyen, T.D.; Hidayat, R.; Huynh, T.T.; Nguyen, D.T. A Review on the Internet of Thing (IoT) Technologies in Controlling Ocean Environment. Energy Sources Part A Recovery Util. Environ. Eff. 2021, 8, 1–19. [Google Scholar] [CrossRef]
  2. Durlik, I.; Miller, T.; Cembrowska-Lech, D.; Krzemińska, A.; Złoczowska, E.; Nowak, A. Navigating the Sea of Data: A Comprehensive Review on Data Analysis in Maritime IoT Applications. Appl. Sci. 2023, 13, 9742. [Google Scholar] [CrossRef]
  3. Gola, K.K.; Arya, S. Underwater Acoustic Sensor Networks: Taxonomy on Applications, Architectures, Localization Methods, Deployment Techniques, Routing Techniques, and Threats: A Systematic Review. Concurr. Comput. Pract. Exp. 2023, 35, e7815. [Google Scholar] [CrossRef]
  4. Xu, Q.; Su, Z.; Lu, R.; Yu, S. Ubiquitous Transmission Service: Hierarchical Wireless Data Rate Provisioning in Space-Air-Ocean Integrated Networks. IEEE Trans. Wirel. Commun. 2022, 21, 7821–7836. [Google Scholar] [CrossRef]
  5. Deyoung, B.; Visbeck, M.; de Araujo Filho, M.C.; Baringer, M.O.; Black, C.; Buch, E.; Canonico, G.; Coelho, P.; Duha, J.T.; Edwards, M. An Integrated All-Atlantic Ocean Observing System in 2030. Front. Mar. Sci. 2019, 6, 428. [Google Scholar] [CrossRef]
  6. Schmidt, J.H. Using Fast Frequency Hopping Technique to Improve Reliability of Underwater Communication System. Appl. Sci. 2020, 10, 1172. [Google Scholar] [CrossRef]
  7. Ra, H.; Youn, C.; Kim, K. High-Reliability Underwater Acoustic Communication Using an M-Ary Cyclic Spread Spectrum. Electronics 2022, 11, 1698. [Google Scholar] [CrossRef]
  8. Zhai, Y.; Li, J.; Feng, H.; Hong, F. Application Research of Polar Coded OFDM Underwater Acoustic Communications. EURASIP J. Wirel. Commun. Netw. 2023, 2023, 26. [Google Scholar] [CrossRef]
  9. Lee, S.; Bae, Y.; Khan, M.T.R.; Seo, J.; Kim, D. Avoiding Spurious Retransmission over Flooding-Based Routing Protocol for Underwater Sensor Networks. Wirel. Commun. Mob. Comput. 2020, 2020, 8839541. [Google Scholar] [CrossRef]
  10. Ahmed, G.; Zhao, X.; Fareed, M.M.S.; Fareed, M.Z. An Energy-Efficient Redundant Transmission Control Clustering Approach for Underwater Acoustic Networks. Sensors 2019, 19, 4241. [Google Scholar] [CrossRef]
  11. Ahlswede, R.; Cai, N.; Li, S.-Y.; Yeung, R.W. Network Information Flow. IEEE Trans. Inf. Theory 2000, 46, 1204–1216. [Google Scholar] [CrossRef]
  12. Ho, T.; Médard, M.; Koetter, R.; Karger, D.R.; Effros, M.; Shi, J.; Leong, B. A Random Linear Network Coding Approach to Multicast. IEEE Trans. Inf. Theory 2006, 52, 4413–4430. [Google Scholar] [CrossRef]
  13. Cai, S.; Yao, N.; Gao, Z. A Reliable Data Transfer Protocol Based on Twin Paths and Network Coding for Underwater Acoustic Sensor Network. EURASIP J. Wirel. Commun. Netw. 2015, 2015, 28. [Google Scholar] [CrossRef]
  14. Feng, X.; Wang, Z.; Liu, X.; Liu, J. ADCNC-MAC: Asynchronous Duty Cycle with Network-Coding MAC Protocol for Underwater Acoustic Sensor Networks. EURASIP J. Wirel. Commun. Netw. 2015, 2015, 207. [Google Scholar] [CrossRef]
  15. Kulhandjian, H.; Melodia, T.; Koutsonikolas, D. CDMA-Based Analog Network Coding for Underwater Acoustic Sensor Networks. IEEE Trans. Wirel. Commun. 2015, 14, 6495–6507. [Google Scholar] [CrossRef]
  16. Hao, K.; Jin, Z.; Shen, H.; Wang, Y. An Efficient and Reliable Geographic Routing Protocol Based on Partial Network Coding for Underwater Sensor Networks. Sensors 2015, 15, 12720–12735. [Google Scholar] [CrossRef] [PubMed]
  17. Wang, H.; Wang, S.; Zhang, E.; Zou, J. A Network Coding Based Hybrid ARQ Protocol for Underwater Acoustic Sensor Networks. Sensors 2016, 16, 1444. [Google Scholar] [CrossRef] [PubMed]
  18. Wang, H.; Wang, S.; Bu, R.; Zhang, E. A Novel Cross-Layer Routing Protocol Based on Network Coding for Underwater Sensor Networks. Sensors 2017, 17, 1821. [Google Scholar] [CrossRef] [PubMed]
  19. Zhan, C.; Wen, Z.; Xiao, F.; Chen, S. Joint Coding and Transmission Scheduling for Underwater Acoustic Networks. J. Internet Technol. 2018, 19, 2187–2196. [Google Scholar]
  20. Zhao, D.; Lun, G.; Xue, R. Coding-Aware Opportunistic Routing for Sparse Underwater Wireless Sensor Networks. IEEE Access 2021, 9, 50170–50187. [Google Scholar] [CrossRef]
  21. Su, Y.; Xu, Y.; Pang, Z.; Kang, Y.; Fan, R. HCAR: A Hybrid Coding-Aware Routing Protocol for Underwater Acoustic Sensor Networks. IEEE Internet Things J. 2023, 10, 10790–10801. [Google Scholar] [CrossRef]
  22. Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement Learning Algorithms: A Brief Survey. Expert Syst. Appl. 2023, 231, 120495. [Google Scholar] [CrossRef]
  23. Park, S.H.; Mitchell, P.D.; Grace, D. Reinforcement Learning Based MAC Protocol (UW-ALOHA-Q) for Underwater Acoustic Sensor Networks. IEEE Access 2019, 7, 165531–165542. [Google Scholar] [CrossRef]
  24. Chang, H.; Feng, J.; Duan, C. Reinforcement Learning-Based Data Forwarding in Underwater Wireless Sensor Networks with Passive Mobility. Sensors 2019, 19, 256. [Google Scholar] [CrossRef] [PubMed]
  25. Di Valerio, V.; Presti, F.L.; Petrioli, C.; Picari, L.; Spaccini, D.; Basagni, S. CARMA: Channel-Aware Reinforcement Learning-Based Multi-Path Adaptive Routing for Underwater Wireless Sensor Networks. IEEE J. Sel. Areas Commun. 2019, 37, 2634–2647. [Google Scholar] [CrossRef]
  26. Zhang, Y.; Zhang, Z.; Chen, L.; Wang, X. Reinforcement Learning-Based Opportunistic Routing Protocol for Underwater Acoustic Sensor Networks. IEEE Trans. Veh. Technol. 2021, 70, 2756–2770. [Google Scholar] [CrossRef]
  27. Zhang, Y.; Su, Y.; Shen, X.; Wang, A.; Wang, B.; Liu, Y.; Bai, W. Reinforcement Learning Based Relay Selection for Underwater Acoustic Cooperative Networks. Remote Sens. 2022, 14, 1417. [Google Scholar] [CrossRef]
  28. Ye, X.; Yu, Y.; Fu, L. Deep Reinforcement Learning Based MAC Protocol for Underwater Acoustic Networks. IEEE Trans. Mob. Comput. 2022, 21, 1625–1638. [Google Scholar] [CrossRef]
  29. Jadoon, M.A.; Kim, S. Learning-Based Relay Selection for Cooperative Networks with Space–Time Network Coding. Wirel. Pers. Commun. 2019, 108, 907–920. [Google Scholar] [CrossRef]
  30. Gao, R.; Li, Y.; Wang, J.; Quek, T.Q. Dynamic Sparse Coded Multi-Hop Transmissions Using Reinforcement Learning. IEEE Commun. Lett. 2020, 24, 2206–2210. [Google Scholar] [CrossRef]
  31. Xiao, L.; Li, H.; Yu, S.; Zhang, Y.; Wang, L.-C.; Ma, S. Reinforcement Learning Based Network Coding for Drone-Aided Secure Wireless Communications. IEEE Trans. Commun. 2022, 70, 5975–5988. [Google Scholar] [CrossRef]
  32. Ali, R.; Haider, A.; Kim, H.S. RS-RLNC: A Reinforcement Learning Based Selective Random Linear Network Coding Framework for Tactile Internet. IEEE Access 2023, 11, 141277–141288. [Google Scholar]
  33. Zhao, Z.; Liu, C.; Guang, X.; Li, K. A Transmission-Reliable Topology Control Framework Based on Deep Reinforcement Learning for UWSNs. IEEE Internet Things J. 2023, 10, 13317–13332. [Google Scholar] [CrossRef]
  34. Patil, K.; Jafri, M.; Fiems, D.; Marin, A. Stochastic Modeling of Depth Based Routing in Underwater Sensor Networks. Ad Hoc Networks 2019, 89, 132–141. [Google Scholar] [CrossRef]
  35. Zhu, F.; Zhang, C.; Zheng, Z.; Farouk, A. Practical Network Coding Technologies and Softwarization in Wireless Networks. IEEE Internet Things J. 2021, 8, 5211–5218. [Google Scholar] [CrossRef]
Figure 1. Schematic of a typical marine mobile IoT system. It contains several types of generalized nodes.
Figure 2. Schematic diagram of the problems with network coding in unreliable communication links: (a) loss of coded packets leads to recursive decoding failure; and (b) an irrational coding strategy leads to partial decoding failure. For ease of comprehension, the XOR operation (⊕) is used here to represent the coding operation between packets. To simplify the description, network coding over a finite number of consecutive time slots is used as an illustrative example. Whether continuous network coding is needed during transmission depends on the specific circumstances and the underlying algorithm design.
Figure 3. A three-dimensional symbolic model of the OM-IoT. In this model, all of the nodes are divided into two categories: sink nodes and sensor nodes.
Figure 4. Schematic of a 3D movable node model. The symbol v represents the actual velocity of the node; vx, vy, and vz represent its velocity components in the x, y, and z directions, respectively; and φ represents the communication radius of the node.
Figure 5. Schematic diagram of the underwater data transmission mechanism based on the node depths.
Figure 6. Schematic diagram of the overall flow of the RL-ANC algorithm.
Figure 7. Schematic framework of the adaptive optimization method for the coding coefficients in RL-ANC. In multi-node networks, coding coefficient optimization is achieved through centralized training with distributed execution. For each step j, aj represents the action, sj the state, rj the reward, θj the loss parameter, and Dj the experience replay buffer.
Figure 8. Comparison of PDR simulation results of four algorithms with different channel erasure probabilities. Different channel erasure probabilities: (a) pe = 0.1; (b) pe = 0.3; (c) pe = 0.5; (d) pe = 0.7.
Figure 9. Comparison of the simulation results of the average end-to-end delay of the four algorithms under different channel erasure probabilities. Different channel erasure probabilities: (a) pe = 0.1; (b) pe = 0.3; (c) pe = 0.5; (d) pe = 0.7.
Figure 10. Comparison of simulation results on average retransmission rate of three algorithms with different channel erasure probabilities: (a) pe = 0.1; (b) pe = 0.3; (c) pe = 0.5; (d) pe = 0.7.
Figure 11. Comparison of redundant transmission rate simulation results of three algorithms with different channel erasure probabilities. Different channel erasure probabilities: (a) pe = 0.1; (b) pe = 0.3; (c) pe = 0.5; (d) pe = 0.7.
Figure 12. Comparison of the decoding probability simulation results of RL-ANC and RLNC.
Figure 13. Comparison of the average end-to-end delay of RL-ANC before and after improvement of the greedy strategy.
Figure 14. Convergence speed comparison of the decoding probability of RL-ANC before and after sampling period optimization.
Table 1. Parameters of the RL-ANC simulation.

Parameter                                   Value
Network size                                500 m × 500 m × 500 m
Simulation time                             5000 s
Number of simulations                       100
Number of nodes n (N)                       10/20/30/40/50/60/70/80/90/100/110/120
Node communication radius φ                 50 m
Node movement rate |v|                      0–3 m/s
Underwater ideal speed of sound             1500 m/s
Packet size |M|                             32 bit
Number of packets per transmission round    100
Number of transmission rounds k             10,000
Channel erasure probability pe              0.1/0.3/0.5/0.7
Maximum transmission period Γk              100
Annealing factor μ                          0.2
Learning rate α                             0.1
Discount factor γ                           0.9
Initial exploration probability ε0          0.01
Random field size q                         2^8
Initial temperature T0                      1000
Maximum temperature Tmax                    1000
Minimum temperature Tmin                    10

Chicago/Turabian Style

Zhang, Ying, and Xu Wang. 2024. "RL-ANC: Reinforcement Learning-Based Adaptive Network Coding in the Ocean Mobile Internet of Things" Journal of Marine Science and Engineering 12, no. 6: 998. https://doi.org/10.3390/jmse12060998
