A Q-Learning-Based Approach to Design an Energy-Efficient MAC Protocol for UWSNs Through Collision Avoidance

Gang, Qiao; Rahman, Wazir Ur; Zhou, Feng; Bilal, Muhammad; Ali, Wasiq; Khan, Sajid Ullah; Khattak, Muhammad Ilyas

doi:10.3390/electronics13224388


by Qiao Gang ^1,2,3, Wazir Ur Rahman ^1,2,3, Feng Zhou ^1,2,3,*, Muhammad Bilal ^1,2,3, Wasiq Ali ^1,2,3, Sajid Ullah Khan ^4 and Muhammad Ilyas Khattak ^5

^1 National Key Laboratory of Underwater Acoustic Technology, Harbin Engineering University, Harbin 150001, China
^2 Key Laboratory of Marine Information Acquisition and Security, Harbin Engineering University, Ministry of Industry and Information Technology, Harbin 150001, China
^3 College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China
^4 Department of Information Systems, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Alharj 16278, Saudi Arabia
^5 School of Control Science and Engineering, Shandong University, Jinan 250100, China
^* Author to whom correspondence should be addressed.

Electronics 2024, 13(22), 4388; https://doi.org/10.3390/electronics13224388

Submission received: 3 October 2024 / Revised: 30 October 2024 / Accepted: 5 November 2024 / Published: 8 November 2024

(This article belongs to the Special Issue New Advances in Underwater Communication Systems)


Abstract

Deploying and effectively utilizing wireless sensor networks (WSNs) in underwater habitats remains a challenging task. In underwater wireless sensor networks (UWSNs), a continuous energy source for communicating with nodes is either very costly or prohibited by marine life law enforcement agencies. To address this issue, we present a Q-learning-based approach to designing an energy-efficient medium access control (MAC) protocol for UWSNs through collision avoidance. The main goal is to prolong the network’s lifespan by optimizing the communication methods, specifically focusing on improving the energy efficiency of the MAC protocols. The factors affecting energy consumption in communication are adjustments to the interference ranges, i.e., changing frequencies repeatedly to obtain optimal communication; data packet retransmissions in case of a false acknowledgment; and data packet collisions in the channel. Our chosen protocol stands out by enabling sensor (Rx) nodes to avoid collisions without needing extra communication or prior interference knowledge. According to the simulation results, our protocol may increase network throughput by up to 23% compared to benchmark protocols, depending on the typical traffic load. It simultaneously decreases end-to-end latency, increases the packet delivery ratio (PDR), boosts channel usage, and lessens packet collisions by over 38%. All these gains result in a proportional reduction in the network’s energy consumption.

Keywords:

energy efficiency; low collision; MAC protocol; Q-learning; underwater wireless communication; underwater wireless sensor networks

1. Introduction

Information communication, which involves sharing relevant information to fulfill certain activities, has taken various shapes and forms throughout human history [1]. However, with technological breakthroughs in the 20th century, we humans have chosen many media through which to communicate between two entities [2,3,4]. The number of entities involved in the communication process has risen quite abruptly over the past three decades [5,6]. To cater for all humans with communication possibilities, networks have been created to accommodate this never-ending sharing of data [7,8]. There are two types of networks: wired and wireless. However, in this article, we focus on wireless communication [9]. Nowadays, WSNs play a key role in communication in all habitats: space communication between satellites and between satellites and base stations, underground communication, underwater communication, etc. [10,11,12].

This paper is devoted to UWSNs, whose peculiar constraints, such as high propagation delays, limited bandwidth, and scalability of network topologies, create communication challenges that a Q-learning-based MAC protocol can handle well. Among the solutions proposed to solve the MAC problem, Q-learning, a reinforcement learning method, is extremely efficient and can become a tool to optimize how nodes respond to changes in these harsh conditions, maximizing network performance in terms of throughput, delay, and energy consumption. For instance, the authors in [13] design a Q-learning-optimized multi-receiver handshake scheme for mitigating packet loss and delay. The Q-learning algorithm constantly updates a Q-table for smart data transmission choices, significantly improving delay and normalized throughput compared to traditional protocols. The authors in [14] achieve a similar goal by employing Q-learning to improve the utilization of the channel; that is, letting nodes learn how to transmit packets in an optimal way even if they do not have enough knowledge about the network topology. As a consequence, the increased channel utilization yields 13 to 26 percent higher throughput in different network environments. Regarding low energy, in [15] Q-learning reduces energy consumption by minimizing collisions, which helps to increase network lifetime, and also optimizes MAC slot allocation for better efficiency. Over the years, much progress has been achieved in designing and developing UWSNs, which serve various purposes in different applications such as resource extraction, military surveillance, environmental monitoring, ocean pollution tracking, etc. [16]. The progress in these new technologies has brought new marine and ocean research opportunities, which in turn are helping to develop a clearer understanding of the aquatic environment [17,18].

However, the specific nature of underwater communications gives rise to profound barriers that must be overcome to unlock the full potential of UWSNs. One of the critical issues is the absorption of radio frequency (RF) waves by water, which leads to the use of acoustic waves to transfer information in UWSNs [19]. This reliance on acoustics brings several technical issues, primarily due to the characteristics of acoustic channels/links. Traditional communication paths using acoustic technology in underwater situations have limited bandwidth and slow propagation. This delay in communication between nodes undermines the reliability of the network [20]. Table 1 illustrates the comparison between different existing MAC protocols and the present research.

Furthermore, there are numerous underwater-environment-specific factors, such as propagation conditions, that are continuously varying underwater [23]. Another very important factor affecting the whole communication process in UWSNs is the batteries’ storage capacity, which is used to keep the sensor (Rx) nodes running while specific sensor nodes perform a given task remotely [24]. So, if the battery charging and discharging frequency are high, the sensor nodes are consuming too much power to perform the assigned task [25]. Then, such a UWSN-based communication system is not energy efficient, because charging the battery again and again requires resources, and maybe those resources will increase so much with time that the whole application needs to be shut down because it is not financially sustainable [26,27]. In this context, to address the above-mentioned issue of energy, in our work we have exploited a Q-learning-based approach to formulate an energy-efficient MAC protocol for UWSNs, with collision prevention.

Given this, different research groups have put forward different MAC strategies targeting energy efficiency and communication dependability in underwater networks [28]. MAC protocols are crucial since they control the channel access of the sensor nodes in the UWSNs. Because the acoustic waves are very slow to propagate through water, MAC protocols can take advantage of the available bandwidth as long as future processes conserve power [29]. Due to the use of multiple nodes to perform a specific task for a given application to access the communication medium, contention-based MAC protocols have also proven more efficient underwater. These protocols, particularly the multiple access with collision avoidance (MACA)-based protocols, address the hidden node problem, an underlying cause of collisions in UWSNs [30]. The hidden node problem is where one or more nodes are not within each other’s range and do not know of the existence of one another but still send packets to the intermediary node. This causes packets to be sent and collide with each other, which in turn consumes energy. Conventional power control and collision avoidance (CAPC) MAC protocols have been developed to minimize the energy consumption of sensor nodes in UWSNs [31]. These protocols control the transmission power of the wireless devices depending on the data packet that is being transmitted, so that the power consumption is not overstretched. Power control methods are used to minimize energy usage, but these processes worsen the hidden node problem due to a high collision probability [32,33]. Interference in UWSNs is mainly because of the collision between control and data packets, which is generally termed long interference range collision (LIRC) [34]. The LIRC issue shows that it is challenging to identify potential interference in an underwater environment that suffers from both signaling overhead and propagation delay [35].

Moreover, different research groups have utilized Q-learning because it is an optimal solution for designing energy-efficient MAC and routing protocols for UWSNs [21,36,37,38]. As opposed to conventional protocols based on predefined rules, Q-learning leverages the nodes to adaptively learn optimal strategies to conserve energy using reinforcement learning [39]. This is especially useful in UWSNs, wherein the environmental conditions of high latency, low bandwidth, and unpredictable link qualities make communication very difficult [40,41,42]. Q-learning iteratively reinforces the decision-making process of the node from past experiences to yield better performance with reduced energy consumption and an elongated lifetime of the network without extensive prior knowledge regarding network dynamics [43]. Prior works have investigated several approaches to address these issues, such as employing directional antennas to enhance the transmission distance and minimize interference [44]. Some have incorporated scheduling policies for sensor nodes to avoid collision, enhancing the energy efficiency of the overall system/network [45].

To address these challenging issues, as mentioned earlier, we propose an adaptive Q-learning-based energy-efficient and low-collision MAC protocol to improve the performance of UWSNs. We use a reinforcement machine learning (ML) methodology to design an adaptive, energy-efficient, low-collision MAC protocol. By this method, we adapt transmission energy levels by observing the interference and noise levels of the network on the go, without prior knowledge about the interference sources. In addition, our proposed methodology has the potential to increase energy efficiency while enhancing the collision avoidance mechanism, which is a major problem of UWSNs. The detailed overview of the designed paradigm is given in Figure 1. The fundamental contributions of the research work presented by us are as follows:

We have developed a new low-collision method and combined it with an energy-efficient MAC protocol using Q-learning.
Our chosen methodology improves the system’s overall throughput.
Our chosen methodology also improves the collision avoidance of data packets.
We have proposed a new multi-cluster network for UWSNs to prove the theoretical analysis of our proposed system.

The rest of the paper is organized as follows: Section 2 describes the system model, with collision design and analysis of our proposed system, including collision avoidance, avoiding collisions between clusters, the problem of spatial–temporal uncertainty, the problem of hidden terminals, problems with exposed terminals, and the system model. Section 2 also presents an in-depth overview of our proposed MAC protocol: enhancing network efficiency and performance. Section 3 presents an improved Q-learning algorithm for better management of channel utilization in UWCNs. Section 4 describes the performance evaluation, including simulation design, performance metrics, and results evaluation, as well as a comparison with recent and advanced MAC protocols, while the last section presents the conclusions.

2. System Model with Collision Design and Analysis of Our Proposed System

During this phase of the design of our chosen technique, preventing collisions between communicating nodes is the main objective, in order to guarantee that every communicating node can access the medium. The model utilizes the concepts of organizational frameworks as well as distributed algorithms for clustering to reduce the possibility of collisions across both horizontal and vertical orientations.

2.1. Collision Avoidance

Every layer can efficiently handle horizontal collisions through energy efficiency and cluster formation. The decentralized clustering process helps the head communicating nodes of the clusters to select unique subsections from the other clusters in that region. Therefore, communicating nodes from neighboring clusters may communicate with one another without any collisions. Additionally, some of the cluster’s communicating nodes connect to the system’s cluster head (CH) using the handshake mechanism. This technique consists of layering the network area, allocating time periods, and controlling energy levels to prevent vertical collisions. The network is divided into four layers (layer 1, layer 2, layer 3, and layer 4), and every layer is subdivided into multiple clusters. Adjacent clusters communicate at different times throughout the operating phase, whereas non-adjacent clusters transmit concurrently. Furthermore, energy-efficiency and time-stamp approaches are utilized to isolate the levels within a cluster in order to avoid collisions among communicating nodes belonging to nearby layers. The collision probability $p_{c}$ as a function of the total number of communicating nodes N, where t is the time for transmission and W is the contention window, can be expressed as follows:

$$p_{c} = 1 - \left(t - \frac{1}{W}\right) \tag{1}$$

The contention window W controls how multiple nodes transmit simultaneously. The energy consumed because of collisions, $E_{c}$, is expressed as the product of the total number of collisions C, the energy per packet transmission $E_{t}$, and the number of retransmission attempts R:

$$E_{c} = C \cdot E_{t} \cdot R \tag{2}$$

This energy figure also includes the overhead of retransmitting packets due to losses from collisions. The probability of collision-free transmission $P_{cf}$ is as follows:

$$P_{cf} = \left(1 - \frac{p_{c}}{N}\right)^{N - 1} \tag{3}$$

Equation (3) shows that the probability of collision-free transmission increases as the contention window or network efficiency increases.
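As a rough numerical illustration of Equations (2) and (3), the following sketch evaluates the collision energy and the collision-free probability for caller-supplied inputs. The function names are hypothetical, and the collision probability is passed in directly rather than derived from the contention window.

```python
def collision_energy(num_collisions: int, energy_per_tx: float, retransmissions: int) -> float:
    """Equation (2): E_c = C * E_t * R."""
    return num_collisions * energy_per_tx * retransmissions


def collision_free_probability(p_c: float, num_nodes: int) -> float:
    """Equation (3): P_cf = (1 - p_c / N) ** (N - 1)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if not 0.0 <= p_c <= 1.0:
        raise ValueError("p_c must be a probability in [0, 1]")
    return (1.0 - p_c / num_nodes) ** (num_nodes - 1)
```

With a single node the exponent is zero, so the collision-free probability is 1 regardless of the supplied collision probability.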

2.2. Avoiding Collisions Between Clusters

Slots of time are allotted to enable effective communication inside the cluster, although at the cost of decreased energy efficiency, in order to ensure connection between the head communicating node of the cluster and communicating nodes inside the cell region. Energy is regulated as well and time frames are provided for control reasons in order to enable efficient communication inside the cluster and create a connection between the cluster’s central node and the nodes that make up the cluster in each cell region. But when two neighboring cell regions (in the horizontal position or the vertical position) interact at the same time, collisions could happen. Figure 2 demonstrates a scenario of clusters colliding. When both CHs of the network, X(a) and Y(a), communicate with the communicating nodes X(a − 1) and Y(a − 1), accordingly, inside neighboring clusters, it is apparent from the scenario that horizontal collisions could take place.

Similar to horizontal collisions, vertical collisions may happen when the CHs of the network, Z(a) and Y(a), communicate with communicating nodes Z(a − 1) and Y(a − 2), respectively. A collision analysis would include assessing the possibility of collisions between communicating data packets during transmission by various communicating nodes in the network in the context of a proposed energy-efficient MAC protocol system. The purposes of the analysis are to identify probable collision situations, evaluate how collisions affect the system’s performance, and propose strategies to reduce collisions and improve network efficiency. There are several problems, which are as follows.

2.2.1. Problem of Spatial–Temporal Uncertainty

The delay in submerged acoustical transmission renders the temporal and spatial characteristics of communication unpredictable in underwater wireless communication networks (UWCNs). According to [45], the time for communicating with data packets, the time for transmitting communicating data packets, and the propagation latency of the Rx node are all impacted by the uncertainty in the spatial and temporal components that are transmitted in UWCNs. The uncertainty in spatial–temporal factors considers the location of the Rx node and the moment at which the Tx anchor node delivers the signal when determining the status of a channel in a UWCN. The transmission time of the Tx anchor node and propagation latency have an impact on the collision of communicating data packets, which occurs when two communicating data packets arrive at the Rx node at the same time. Additionally, the distance between the communicating nodes introduces uncertainty into the channel’s state, and collisions can still happen even if other communicating nodes in the cluster communicate with one another independently. High propagation latency often has the potential to lead to collisions in UWCNs.

Figure 3 illustrates the example of spatiotemporal uncertainty. Both communicating node A and communicating node C might send communicating data packets at the same time in a particular scenario. On the other hand, the time of the data packets arriving at the Rx node might shift if the propagation latencies of the communicating nodes are not the same. Additionally, communicating node B could come across a collision if node A and communicating node C communicate data packets at distinct times.
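The effect of unequal propagation latencies can be sketched numerically. The example below is a minimal illustration, assuming a nominal sound speed of 1500 m/s and hypothetical helper names; it computes the interval during which each packet occupies the Rx node and checks whether two intervals overlap.

```python
SOUND_SPEED_M_S = 1500.0  # assumed nominal speed of sound in seawater


def arrival_window(send_time_s, distance_m, duration_s, speed=SOUND_SPEED_M_S):
    """Return (start, end) of the interval in which a packet occupies the receiver."""
    start = send_time_s + distance_m / speed  # send time plus propagation latency
    return start, start + duration_s


def windows_overlap(w1, w2):
    """Two packets collide at the receiver when their arrival intervals overlap."""
    return w1[0] < w2[1] and w2[0] < w1[1]


# Node A is 1500 m and node C is 3000 m from receiver B; packets last 0.5 s.
same_time = windows_overlap(arrival_window(0.0, 1500.0, 0.5),
                            arrival_window(0.0, 3000.0, 0.5))
staggered = windows_overlap(arrival_window(1.0, 1500.0, 0.5),
                            arrival_window(0.0, 3000.0, 0.5))
```

In this example, the simultaneous transmissions do not collide because the latencies differ by one second, whereas the transmissions sent one second apart arrive together and collide.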

2.2.2. Problem of Hidden Terminals

An Rx node that is unaware of the existence of another Rx node is referred to as a hidden terminal. As stated in [46], the problem of hidden terminals may result in collisions when an Rx node is unable to identify the interference created by an additional Rx node during the communication of data packets. Additionally, collisions might occur if two different Rx nodes communicate data packets to the same target Rx node. The issue with hidden and exposed communicating nodes is demonstrated in Figure 4. Specifically, Rx node A and Rx node C are not perceptible to each other, while both of them are accessible to Rx node B. Therefore, collisions at Rx node B may happen if communicating data packets are transmitted from both Rx node A and Rx node C. Additionally, the problem of hidden terminals may lead to poor throughput as well as very high energy consumption. These problems need to be handled efficiently so that there is no conflict between the transmitting and receiving times.
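The hidden-terminal condition can be stated geometrically: two senders are each within range of a common receiver but out of range of each other. The sketch below is only an illustration, assuming a fixed circular communication range and hypothetical node coordinates; it lists such sender pairs for a given receiver.

```python
import math
from itertools import combinations


def in_range(p, q, comm_range):
    """True when two positions are within the assumed communication range."""
    return math.dist(p, q) <= comm_range


def hidden_pairs(positions, receiver, comm_range):
    """Return sender pairs that both reach `receiver` but cannot hear each other."""
    senders = [name for name in positions
               if name != receiver
               and in_range(positions[name], positions[receiver], comm_range)]
    return [(a, b) for a, b in combinations(sorted(senders), 2)
            if not in_range(positions[a], positions[b], comm_range)]


# Hypothetical layout: A and C sit on either side of B, 100 m away from it.
layout = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (200.0, 0.0)}
pairs = hidden_pairs(layout, "B", comm_range=120.0)
```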

2.2.3. Problems with Exposed Terminals

According to [47], the communicating node exposure issue might occur if an Rx node delays communication as a result of receiving another signal. Rx node A and Rx node D are specifically single-hop neighbors of both Rx node B, Rx node C, and Rx node E, as illustrated in Figure 4. Rx node B and Rx node C can send communicating data packets that Rx node A and Rx node D can receive without colliding.

Despite the fact that Rx node A and Rx node D are outside of their respective communication ranges, Rx node B and Rx node C are unable to deliver communicating data packets since they are close to one another. Rx node C will not be able to transmit communicating data packets to Rx node D when the channel has been identified if Rx node A and Rx node B are in communication. Despite the possibility that Rx node B will interfere with Rx node C, Rx node D may still be able to receive the communicating data packet that is transmitted by Rx node C.

2.2.4. System Model

Figure 5 illustrates our chosen network system that uses a cluster topology for the collection of communicating data packets.

In this network, there are four clusters and each cluster has a central CH. Furthermore, the CH is uniformly encircled by a large number of Rx nodes within a single hop. The Rx nodes collect all the communicated data packets of the area and directly forward them to the CHs. Similarly, the sink node uses only a one-hop technique for collecting communicating data packets from the CHs, and then, transmitting them to the sensor Rx node. All Rx nodes are assumed to have the same battery lifetime, processing power, and transmission range. The hidden node issue results from Rx nodes being placed randomly across the overall network. Both inter-cluster and intra-cluster interferences are included in the analysis of this research. Additionally, any motion caused by features, including ocean waves and current movement, has not been considered since the nodes are regarded as static. For channel access, we use a traditional mechanism based on the handshaking MAC technique that follows the RTS/CTS and communicating data packet acknowledgment cycle. By controlling the magnitude of IR, the LIRC issue may be successfully solved in a more logical manner. The aforementioned objective may be accomplished by allocating the ideal transmission energy for transmitting the data packet. The transmission energy must satisfy the following requirements:

$$\mathrm{SINR}_{D,P} \geq \mathrm{SINR}_{dTH}, \tag{4}$$

where $\mathrm{SINR}_{dTH}$ is the minimal SINR threshold required to correctly decode a received communicating data packet, and $\mathrm{SINR}_{D,P}$ stands for the signal-to-interference and noise ratio of the communicating data packet. Equation (4) may also be written as follows:

$$\mathrm{SINR}_{D,P} = \frac{RX_{P}}{N + I_{S}} \geq \mathrm{SINR}_{dTH}, \tag{5}$$

where N denotes the ambient noise, $I_{S}$ denotes the interference signal, and $RX_{P}$ denotes the energy level of the communicating data packets arriving at the Rx node. If $S_{Tx}$ stands for the sender and $R_{Rx}$ stands for the receiver, assume that the energy of a communicating data packet of $S_{Tx}$ is $E_{D,P}$. Therefore, according to [48,49], we can describe $RX_{P}$ as:

$$RX_{P} = \frac{E_{D,P}}{A_{d_{sr},f}}, \tag{6}$$

where $A_{d_{sr},f}$ stands for the attenuation function, which accounts for the acoustical path loss for a central frequency f and distance $d_{sr}$ between $S_{Tx}$ and $R_{Rx}$. Given that the interference occurs outside of the RTS/CTS communication range, the following equation may be used to express the interference signal’s maximum value:

$$I_{S} = \frac{E_{max}}{A_{d_{IR},f}}, \tag{7}$$

where $E_{max}$ denotes the permitted maximum energy for transmission, and $A_{d_{IR},f}$ is the attenuation between the interference source and $R_{Rx}$, which are separated by the distance $d_{IR}$. The noise N may be roughly represented as follows (as stated in [48,49]):

$$N = 50 - 18 \log(f). \tag{8}$$

Furthermore, by substituting (6)–(8) into (5), we may rewrite (5) as:

$$\mathrm{SINR}_{D,P} = \frac{\frac{E_{D,P}}{A_{d_{sr},f}}}{50 - 18 \log(f) + \frac{E_{max}}{A_{d_{IR},f}}} \geq \mathrm{SINR}_{dTH}. \tag{9}$$

Extending (9), the following requirement must be satisfied to decode the received signal if there are many interferences:

$$\frac{\frac{E_{D,P}}{A_{d_{sr},f}}}{50 - 18 \log(f) + \sum \frac{E_{max}}{A_{d_{IR},f}}} \geq \mathrm{SINR}_{dTH}. \tag{10}$$
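The decoding condition in Equations (5)–(10) can be checked numerically with a short sketch. The code below follows the equations as written and makes several assumptions: the logarithm is taken as base 10, the attenuation values are supplied by the caller because the attenuation function is not specified in this section, and all function names are hypothetical.

```python
import math


def ambient_noise(f_khz: float) -> float:
    """Equation (8): N = 50 - 18 log(f); a base-10 logarithm is assumed."""
    return 50.0 - 18.0 * math.log10(f_khz)


def sinr(e_dp, a_sr, f_khz, interferers):
    """Left-hand side of Equation (10).

    `interferers` is an iterable of (E_max, A_dIR) pairs, one per interference
    source; an empty iterable means no interference.
    """
    received = e_dp / a_sr                                           # Equation (6)
    interference = sum(e_max / a_ir for e_max, a_ir in interferers)  # sum of Equation (7) terms
    return received / (ambient_noise(f_khz) + interference)


def can_decode(e_dp, a_sr, f_khz, interferers, sinr_threshold):
    """True when the packet satisfies the SINR threshold of Equation (4)."""
    return sinr(e_dp, a_sr, f_khz, interferers) >= sinr_threshold
```

Each additional interferer enlarges the denominator, so a packet that is decodable on a quiet channel may fall below the threshold once interference is present.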

2.3. An In-Depth Overview of Our Proposed MAC Protocol: Enhancing Network Efficiency and Performance

We elaborate the details of our proposed MAC protocol in Figure 6, which illustrates the complete process in a step-by-step manner. Communication is initiated from the terrestrial base station to the buoy via satellite or another communication vessel. Each layer also consists of one central node located slightly below the buoy. It interacts with these vertical central nodes to collect data and feedback from the buoy. Moreover, every central node is in charge of collecting data from all relay nodes in the same layer. Then, the relay nodes gather data from other sensor nodes, which are the ordinary sensor nodes, organized into clusters. After the establishment of the UWCN, the terrestrial base station gives a standard time for synchronization. Synchronization is achieved by a scheme of layers: buoy–center relay–ordinary nodes.

After this, the base station sends out a data acquisition order to begin the data gathering and sending step. The normal sensor nodes tabulate information and send it to th

In networking, every protocol layer functions cooperatively with other layers to deliver data transmission, and the MAC layer design significantly influences the efficiency of the next layer, the routing layer. The MAC protocol we propose targets energy efficiency and collision avoidance in the routing layer in the following ways. First and foremost, the protocol maintains a reliable link state at the MAC layer through minimizing collisions as well as energy in similar layers, hence improving the reliability of the synthesized data. This stability can influence the routing layer’s decision making and allow it to choose more stable and energy-efficie.

Second, when the MAC layer makes responses based on the condition of the network forwarding process, changing channel assignments and other network parameters, it assists in selecting the appropriate routes since the communication paths are more stable and have less chance of failing due to collisions or exhaustion of power. The routing layer can then leverage this increased reliability with less frequent recalculations of a path or deaths of a packet, which reduces delays and maximizes the rate a

Last but not least, the flexibility of the total system can be enhanced; for instance, the MAC layer provides information about the channel usage and energy state of nodes of the network. This allows the routing layer to receive more accurate feedback, allowing it to avoid nodes which are almost discharged or less crowded areas to enhance the routing layer results.

2.4. Energy Model of Our Proposed System

Energy models of UWCNs differ considerably from those of terrestrial wireless sensor networks (TWSNs) because of the special features of the submerged acoustical channel. The sensor node energy model is stated as follows [50]. The energy consumed in transmitting a communicating data packet is described as follows:

$$E_{t}(n, d) = n \times E_{tr} + n \times E_{e}, \tag{11}$$

where $E_{t}$ denotes the energy consumption of the transmission, and $E_{tr}$ is the energy consumed per bit of the communicating data packet, measured in joules per bit. The number of bits is denoted by n, while $E_{e}$ is defined as

$$E_{e} = E_{0} \times d^{k} \times φ^{d}. \tag{12}$$

The expression for $E_{e}$ considers several factors, including the distance d between the Tx anchor node and Rx node; the spreading factor k, which has a value of 1.7 for cylindrical spreading and 3 for spherical spreading; the power threshold $E_{0}$ necessary for the Rx node to receive communicating data packets; and a frequency-related term φ derived from the absorption coefficient:

$$φ = 10^{\frac{A(f)}{10}}, \tag{13}$$

where $A(f)$ is the absorption coefficient in dB/km, a frequency-dependent variable, and f is the frequency in kHz. For frequencies just over one hundred Hz, $A(f)$ can be determined using Thorp’s expression from [50] as follows:

$$A(f) = \frac{44 f^{2}}{4100 + f^{2}} + \frac{0.11 f^{2}}{1 + f^{2}} + \frac{0.275 f^{2}}{10^{3}} + \frac{3}{1000}. \tag{14}$$

The energy in joules required for an Rx node to receive n bits of a communicating data packet is

$$E_{r}(n) = n \times E_{tr}. \tag{15}$$

An Rx node uses the following quantity of energy when it is only idle listening:

$$E_{I}(n, d) = n \times E_{tr} \times μ_{0}, \tag{16}$$

where $μ_{0}$ denotes the ratio of active to passive listening energy. The energy efficiency of submerged acoustical Rx nodes during communication was investigated in [51]. It has been suggested that the submerged acoustical signal’s transmission and reception make up around 61.2% of the communicating node’s overall energy consumption. Energy-efficient MAC protocols in UWCNs therefore focus on three challenges: minimizing collisions during communication, increasing efficiency, and decreasing energy waste.
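The energy model of Equations (11)–(16) translates directly into a few functions. The sketch below is illustrative only: the function names are hypothetical, the distance d is assumed to be in kilometres so that it is consistent with the dB/km absorption coefficient, and the frequency is in kHz.

```python
def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Equation (14): Thorp's absorption coefficient A(f), with f in kHz."""
    f2 = f_khz ** 2
    return 44.0 * f2 / (4100.0 + f2) + 0.11 * f2 / (1.0 + f2) + 0.275 * f2 / 1e3 + 3.0 / 1000.0


def phi(f_khz: float) -> float:
    """Equation (13): phi = 10 ** (A(f) / 10)."""
    return 10.0 ** (thorp_absorption_db_per_km(f_khz) / 10.0)


def amplifier_energy(e0: float, d: float, k: float, f_khz: float) -> float:
    """Equation (12): E_e = E_0 * d**k * phi**d (d assumed to be in km)."""
    return e0 * d ** k * phi(f_khz) ** d


def tx_energy(n_bits, e_tr, e0, d, k, f_khz):
    """Equation (11): E_t = n * E_tr + n * E_e."""
    return n_bits * e_tr + n_bits * amplifier_energy(e0, d, k, f_khz)


def rx_energy(n_bits, e_tr):
    """Equation (15): E_r = n * E_tr."""
    return n_bits * e_tr


def idle_energy(n_bits, e_tr, mu0):
    """Equation (16): E_I = n * E_tr * mu_0."""
    return n_bits * e_tr * mu0
```

Because the amplifier term grows with both $d^{k}$ and $φ^{d}$, transmission over longer links costs more energy than reception of the same number of bits.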

Additionally, as communicating nodes can automatically wake up and the submerged habitat does not change significantly over a short period of time, sharing time monitoring is feasible. On the other hand, when comparing the idle mode with the sleep mode, the sleep mode is more energy efficient in UWCNs. Thus, by putting UWCNs in sleep mode when not in use and waking them up when necessary, their energy efficiency may considerably increase, and their lifespan may increase.

3. Improved Q-Learning Algorithm for Better Management of Channel Utilization in UWCNs

3.1. State Space ( $S_{s}$ )

A variety of variables that affect energy efficiency and transmission quality can be combined to define the state space of UWCNs, as follows:

Battery level ( $B_{l}$ ): How much power the sensor node has left.
Interference level ( $I_{l}$ ): The degree of interference in communication within the proximity of the node.
Data packet queue length (Q): The quantity of data packets that are waiting for transmission.
Recent collision history (C): A quantitative or binary metric showing the frequency of recent collisions.

As a result, a state $s \in S_{s}$ may be described by the vector $s = (B_{l}, I_{l}, Q, C)$.

3.2. Action Space ( $A_{s}$ )

The potential actions that the node may take are included in the action space:

Modify transmission energy (E): Data packets may be sent using varying transmission energy levels.
Retransmission strategy (R): Chooses which packets to retransmit and at what time.

Any action $a \in A_{s}$ may be a combination of the two, such as $a = (E, R)$.
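One possible way to represent the state and action spaces in code is shown below. This is a minimal sketch rather than the implementation used in the simulations: the discretization into integer levels, the three energy levels, and the retransmission strategy names are all illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class State:
    battery_level: int       # B_l, discretized remaining power (assumed integer levels)
    interference_level: int  # I_l, discretized interference near the node
    queue_length: int        # Q, packets waiting for transmission
    collision_history: int   # C, recent collision count or 0/1 flag


@dataclass(frozen=True)
class Action:
    energy_level: int        # E, index of the transmission energy level
    retransmission: str      # R, name of the retransmission strategy


ENERGY_LEVELS = (0, 1, 2)                             # hypothetical levels
RETX_STRATEGIES = ("none", "immediate", "backoff")    # hypothetical strategies


def action_space():
    """Enumerate every combination a = (E, R)."""
    return [Action(e, r) for e, r in product(ENERGY_LEVELS, RETX_STRATEGIES)]
```

Frozen dataclasses are hashable, so states and actions can be used directly as keys of a Q-table.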

3.3. Reward Function ( $R_{f}$ )

The success of an activity performed in a certain condition is measured by the reward function. Successful data transfer, collision avoidance, and energy efficiency should all be balanced:

Energy consumption ( $E C$ ): The energy used in a node to forward data packets at the transmission energy level. Nodes can be running at different energy levels, which impacts the achievable range and level of reliability. Transmission energy allows nodes to increase the coverage area of the communication and to conserve more energy by reducing the transmission power when interacting with other nearby nodes. This flexibility is particularly important in power-limited platforms such as UWSNs, where one has to be very careful when using both energy resources and communication subroutines.
Collision occurrence ( $C O$ ): Energy transmission: A node is also capable of varying the energy required for transmission over another node probably as a function of distance or the state of the network. Higher energy levels make the transmission more successful over longer distances but consume more energy. On the other hand, a reduced energy level is adopted every time nodes are nearby, or in cases where power usage is to be minimized.
Successful transmission ( $S T$ ): The simultaneous adjustment of transmission energy (E) and the selection of a retransmission strategy (R) makes the system much more dynamic. For example, a node can decide to boost the transmitted power level for urgent data frames for them to reach the intended destination intact, and decide on a retransmission technique for frames that were presumably not received at the intended destination due to interferences or collisions. These enable flexibility in the action space and lead to better energy management and overall network performance.

Thus,

R (s, a) = W_{1} \times ST - W_{2} \times EC - W_{3} \times CO

may be the reward function, with $W_{1}$ , $W_{2}$ , and $W_{3}$ being weights signifying the relative significance of each element.
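The weighted reward above can be sketched in a few lines. The default weights and the choice to represent ST and CO as 0/1 indicators and EC as a normalized energy cost are assumptions made for illustration only; the text does not give numerical values.

```python
def reward(success: bool, energy_used: float, collided: bool,
           w1: float = 1.0, w2: float = 0.5, w3: float = 1.0) -> float:
    """R(s, a) = W1 * ST - W2 * EC - W3 * CO.

    success     -> ST, 1 if the packet was delivered, else 0 (assumed encoding)
    energy_used -> EC, normalized energy spent on the transmission (assumed scale)
    collided    -> CO, 1 if a collision occurred, else 0 (assumed encoding)
    """
    st = 1.0 if success else 0.0
    co = 1.0 if collided else 0.0
    return w1 * st - w2 * energy_used - w3 * co


# A cheap successful transmission is rewarded; a costly collision is penalized.
good = reward(success=True, energy_used=0.2, collided=False)
bad = reward(success=False, energy_used=0.8, collided=True)
```

Raising `w2` relative to `w3` would push a node towards saving energy, while raising `w3` would push it towards avoiding collisions.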

3.4. Rule for Q-Learning Updates

Q-learning updates often follow this rule:

Q (s, a) = Q (s, a) + \alpha \left[ R (s, a) + \gamma \max_{a'} Q (s', a') - Q (s, a) \right] .

(17)

where $Q (s, a)$ is the current Q-value for the state–action pair $(s, a)$ , $\alpha$ is the learning rate, $\gamma$ is the discount factor, $s'$ is the new state reached after executing the action, and $\max_{a'} Q (s', a')$ is the highest Q-value attainable in the new state [21,52].
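The update rule can be illustrated with a small tabular sketch. The state and action labels, the values of the learning rate and discount factor, and the helper name `q_update` are placeholders chosen for the example, not parameters reported for the protocol.

```python
from collections import defaultdict


def q_update(q, state, action, r, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) = Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = r + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Q-table with a default value of 0 for unseen state-action pairs.
q = defaultdict(float)
actions = ["low", "high"]          # hypothetical transmission energy choices
q[("s1", "low")] = 2.0
q[("s1", "high")] = 1.0

# One step: in state s0 the node sends at "high" energy, earns reward 1.0,
# and lands in state s1, whose best known Q-value is 2.0.
new_value = q_update(q, "s0", "high", r=1.0, next_state="s1", actions=actions)
```

With these numbers the temporal-difference error is 1.0 + 0.9 × 2.0 − 0 = 2.8, so the stored value moves from 0 to about 0.28.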

3.5. Description of the Q-Learning Formula

Present Q-value $Q (s, a)$ : This represents an approximation of the expected benefit of taking action a in state s.
Learning rate $\alpha$ : This indicates the degree to which recently learned knowledge supersedes previously learned information. When the value is 1, the agent only considers the most recent information; when the value is 0, the agent learns nothing at all.
Discount factor $\gamma$ : This establishes how important rewards in the future are. When the value is near 1, the agent will aim for long-term high rewards, but when the value is close to 0, the agent becomes short-sighted and only considers immediate benefits.
Reward $R (s, a)$ : Instant reward obtained after the change from state s to $s'$ as a result of action a.
Maximum reward in the future $\max_{a'} Q (s', a')$ : The highest Q-value attainable from the newly reached state $s'$ , considering every action $a'$ that might be taken.

3.6. Explanation of Our Proposed Q-Learning Technique

In UWCNs, sensor nodes need to learn the optimum transmission methods over time, which may be accomplished by developing a reinforcement learning framework and using Q-learning techniques for an energy-efficient MAC protocol. Every sensor node in this configuration participates in the Q-learning process as an agent. Parameters such as battery level, current transmission power, number of surrounding nodes, channel condition, and recent collision history may be included in the state space of the Q-learning model. The action space would include various transmission power levels and perhaps different packet timing techniques. The reward function that drives the learning process must balance energy efficiency, collision avoidance, and a high PDR. Penalties may be applied for unsuccessful transmissions or excessive energy usage, while significant rewards can be given for successful transmissions with little energy use and few collisions. The nodes learn to anticipate the expected utility of various actions given their present state by continually updating their Q-values based on the results of their actions while they are in operation. A larger learning rate might be used at the beginning of the procedure to promote exploration, and it could be progressively lowered to stabilize the learning process. To ensure that the nodes are able to respond to changing network circumstances and avoid becoming trapped in sub-optimal strategies, the protocol may also have a mechanism to adaptively alter the exploration rate. Over time, sensor nodes are thus able to develop an effective transmission strategy that minimizes collisions, optimizes energy usage, and enhances network performance in general.
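The idea of starting with a high learning or exploration rate and lowering it over time can be sketched as follows. The text does not state which schedule or exploration policy is used, so the epsilon-greedy rule, the exponential decay, and every numerical value below are assumptions made purely for illustration.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def decayed(initial: float, floor: float, decay: float, step: int) -> float:
    """Exponentially decay a rate from `initial` towards a lower bound `floor`."""
    return max(floor, initial * decay ** step)


rng = random.Random(0)                      # seeded for repeatability
q = {("s0", "low"): 0.4, ("s0", "high"): 0.1}
actions = ["low", "high"]                   # hypothetical energy levels

for step in range(5):
    epsilon = decayed(initial=1.0, floor=0.05, decay=0.5, step=step)
    alpha = decayed(initial=0.5, floor=0.01, decay=0.9, step=step)
    action = epsilon_greedy(q, "s0", actions, epsilon, rng)
    # A real node would now transmit, observe the reward, and update q
    # using `alpha` as the learning rate.
```

Keeping a non-zero floor on the exploration rate lets a node keep sampling alternative actions, which is one simple way to react when network conditions change.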

4. Performance Evaluation

4.1. Simulation Design

The proposed protocol’s performance was evaluated using computer simulations in MATLAB R2022a. The simulation settings used in the evaluation are listed in Table 2. Thorp’s empirical model of the submerged acoustical channel [53] is used to represent the properties of the submerged channel. Moreover, our research takes the following two interference scenarios into consideration.

Scenario 1: The research assumes that, even when control data packets only partially overlap, the communicating data packet cannot be effectively decoded unless the criterion given in (7) is satisfied [54].

Scenario 2: The second assumption is that the interference and received signals are uncorrelated with one another and that the interference signals may be treated as cumulative values [54].

The Tx anchor node and Rx node energies, as well as the communicating data packet rate, were set according to the specifications of the ATM-903 submerged modem. As illustrated in Figure 5, after being randomly placed inside any of the four aforementioned clusters, the Rx nodes form a star topology with the sink node at its center. The four clusters are carefully deployed to take into consideration any inter-cluster interference situations. Additionally, a particular simulation scenario is considered, namely a variable average traffic load with a fixed number of Rx nodes in a cluster. As an example, cluster-1 has 12 Rx nodes that generate network traffic. This traffic follows a Poisson distribution with a data packet generation rate of between 0.005 and 0.015 packets/s. Each simulation is repeated three thousand times to increase the accuracy of the findings.

4.2. Performance Metrics

The network is evaluated on the basis of the following metrics: network throughput, network energy consumption, network energy efficiency, collisions per data packet, delay time, communication time, and PDR.
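As a rough illustration of how some of these metrics could be computed from simulation counters, the sketch below uses their common textbook definitions. These definitions, the function names, and the sample numbers are assumptions; the exact formulas used in the evaluation are the ones given in the following subsections.

```python
def packet_delivery_ratio(delivered: int, sent: int) -> float:
    """Fraction of sent packets that were received correctly."""
    return delivered / sent if sent else 0.0


def throughput_bps(delivered: int, packet_bits: int, duration_s: float) -> float:
    """Correctly delivered bits per second over the observation window."""
    return delivered * packet_bits / duration_s


def collisions_per_packet(collisions: int, sent: int) -> float:
    """Average number of collisions experienced per sent packet."""
    return collisions / sent if sent else 0.0


def energy_per_delivered_bit(total_energy_j: float, delivered: int,
                             packet_bits: int) -> float:
    """Joules spent per correctly delivered bit (lower is better)."""
    bits = delivered * packet_bits
    return total_energy_j / bits if bits else float("inf")


# Illustrative counters from one hypothetical simulation run.
pdr = packet_delivery_ratio(delivered=90, sent=100)
rate = throughput_bps(delivered=90, packet_bits=1000, duration_s=60.0)
```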

4.3. Result Evaluation

CAPC-MAC, which is the basic strategy of this wireless network, is designed to minimize collisions and manage the priority of channel access efficiently. This makes it possible to obtain better results, especially when many nodes contend for the channel at the same time. The CSMA-MAC (Carrier Sense Multiple Access with Collision Avoidance) protocol is one of the most popular techniques used in wireless communications. A node listens to the channel before it begins to transmit, to check that no other node is transmitting at the same time. SFAMA-MAC mainly aims at spatial fairness of network access based on the location of the nodes. It helps ensure that nodes in distinct regions receive fair access to the network regardless of their position. Lastly, the T-LOHI-MAC protocol is intended to cut battery consumption in wireless networks through a tone-based reservation scheme. It enables nodes to control use of the channel efficiently, making minimal use of system resources while maintaining good network performance.

4.3.1. Throughputs of the Network $ρ_{g}$

The throughput of a network is the rate at which communicating data packets (or bits) are successfully delivered, and it can be described as follows:

\rho_{g} = \frac{T_{L_{g}}}{T_{D_{g}}} = \frac{\sum_{w \in g} n_{w} \gamma L_{D, P} A_{H_{w}} (1 - b_{P_{e}})^{2 L_{D, P}}}{T_{D_{g}}} .

(18)

where $T_{L_{g}}$ stands for the total number of communicating data packets that the CH node properly receives during a round. This covers all of the cluster’s active communicating data packets. The features of the modem and the properties of the submerged acoustical channel have an impact on the bit error rate $b_{P_{e}}$ . The following equation may be used to obtain $A_{H_{w}}$ , which stands for the average amount of communicating data packets sent by cluster w in a round:

\begin{aligned} A_{H_{w}} &= \sum_{k = 1}^{T_{N_{w}}} k P_{k} \\ &= \sum_{k = 1}^{T_{N_{w}}} k C_{T_{N_{w}}}^{k} P_{B}^{k} (1 - P_{B})^{T_{N_{w}} - k} \\ &= T_{N_{w}} P_{B} . \end{aligned}

(19)

where the total number of communicating data packets is represented by $T_{N_{w}}$ and the probability of generating the communicating data packets is represented by $P_{B}$ :

P_{B} = 1 - e^{- \lambda R_{DC}} .

(20)
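A short numerical check may help here. It assumes the usual Poisson form, in which the probability that a node generates at least one packet during a round of length $R_{DC}$ at rate $\lambda$ is $1 - e^{-\lambda R_{DC}}$ (negative exponent), and it verifies that the binomial sum for the average number of packets per round collapses to the number of nodes times that probability. The round length of 20 s is an illustrative assumption; the rate of 0.01 packets/s and the 12 nodes come from the simulation description.

```python
import math


def generation_probability(lam: float, round_delay_s: float) -> float:
    """P_B: chance that a Poisson source emits at least one packet in a round."""
    return 1.0 - math.exp(-lam * round_delay_s)


def expected_packets(n_nodes: int, p: float) -> float:
    """Mean of a Binomial(n_nodes, p) count, written out as the explicit sum."""
    return sum(
        k * math.comb(n_nodes, k) * p ** k * (1.0 - p) ** (n_nodes - k)
        for k in range(1, n_nodes + 1)
    )


lam = 0.01            # packets/s, inside the 0.005-0.015 range used in the text
round_delay_s = 20.0  # assumed round length, for illustration only
n_nodes = 12          # Rx nodes in cluster-1

p_b = generation_probability(lam, round_delay_s)
a_h = expected_packets(n_nodes, p_b)   # should match n_nodes * p_b
```

With these values $P_{B}$ is roughly 0.18, so about 2.2 of the 12 nodes are expected to have a packet to send in a given round.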

The data packet generation rate of each communicating node is represented by $\lambda$ and the acquisition delay of each round is $R_{DC}$ . This system also comprises the transmission time $t_{w}^{d}$ for every cluster as well as the time $P_{w}^{d}$ between the clusters.

Total energy consumption takes into account the energy spent to send communicating data packets, the energy spent by the communicating nodes on reception during communication, and the energy spent while communicating nodes are in the idle state or not constantly transferring communicating data packets. Both the transmission frequency and the separation between the acoustical modems have a significant impact on the energy efficiency of the network. To increase energy efficiency, our chosen model adjusts the transmission energy depending on the distance and frequency during the transmission. The received energy is considered to be fixed or constant. The total energy consumption is then expressed as follows:

E_{t} = \sum_{w \in t} (E_{w}^{t} + E_{w}^{r}) + \mathrm{Idle}_{E} .

(21)

Each cluster is identified by the cluster index w. The energy utilized for transmission by the CH communicating node and the regular communicating nodes within that specific cluster is collectively referred to as $E_{w}^{t}$ , or transmitted energy. Thus, the following formula may be used to compute the energy to transmit communicating data packets:

\begin{aligned} E_{w}^{t} &= \sum_{n_{w} \in S_{w}} p_{w n^{w}}^{t} T_{w n^{w}}^{t} \\ &= E_{S_{W}}^{t} + \sum_{n_{w} \in S_{w}, \, n_{w} \neq s_{w}} \frac{p_{w n^{w}}^{t} L_{w n^{w}}^{t}}{R}, \end{aligned}

(22)

where $s_{w}$ is the cluster’s central node and $S_{w}$ represents the collection of communicating nodes in a cluster. Node $n_{w}$ in cluster w has a transmission power and time of $p_{w n^{w}}^{t}$ and $T_{w n^{w}}^{t}$ , respectively. In cluster w, the node’s packet length is $L_{w n^{w}}^{t}$ , which is made up of the length of the communicating data packet. The energy required for transmission by the cluster’s central node, $E_{S_{W}}^{t}$ , includes the WAKE communicating data packets, the ACK communicating data packets, and the energy for the transmission of inter-cluster communicating data packets. $E_{S_{W}}^{t}$ can be defined as

E_{S_{W}}^{t} = E_{max} (L_{wake} + n_{w} L_{ack}) + \frac{E_{max} N_{w n_{w}} \gamma L_{data}}{R} .

(23)

As soon as the distance exceeds its limit $d_{c}$ during the communication, the central cluster node’s energy for transmission is set to $E_{max}$ .

$E_{w}^{r}$ is the term used to describe all of the energy that cluster w has received, including energy that has been received by both the cluster head node and the ordinary communicating nodes. WAKE-up communicating packets and ACK communicating packets make up the majority of the energy that the inter-cluster communicating nodes receive. To be clear, the energy received by the central node in the cluster consists of both the energy that occurs from the communicating data packets and the energy from the active communicating data packets.

To control both space and time efficiently and avoid conflicts and retransmissions, the chosen MAC protocol includes an ordered cluster structure that improves network performance by using a time synchronization plan and energy-efficiency technique. The chosen MAC protocol performs better than other protocols, including TLPC, MACA-PC, and CAPC, under comparable traffic load situations [55,56]. The transmission energy for communicating data packets is constantly altered by the suggested protocol’s two iterations, one utilizing the algorithm known as greedy and another utilizing the greedy technique in response to the interference conditions within the area. Adaptive distribution of the energy reduces packet losses. However, in an environment of numerous interferences, the TLPC, MACA-PC, and CAPC protocols are unable to allocate sufficient energy for reliable DATA transmission, which results in greater packet losses. When packets are lost, the delay for successful packet delivery increases, which lowers network performance.

4.3.2. Average Throughput vs. Different Number of Nodes

Figure 7 compares the network throughput for different numbers of nodes. Notably, as the total number of nodes rises, the network throughput of the chosen MAC and T-LOHI protocols improves [57]. Beyond a certain number of nodes, the throughput of both CSMA and SFAMA peaks and then slowly declines [58,59]. In comparison to the other MAC protocols, the CSMA protocol carries less network traffic because it does not account for submerged space and time uncertainty. Each phase of the synchronous handshake may transfer additional communicating data packets because of SFAMA’s use of several RTS requests to accomplish data packet transmission via separated strings. This is advantageous when there is a lot of traffic in SFAMA. However, the SFAMA protocol’s performance falls as the number of nodes rises, since more RTS attempts occur as a result of the increasing network density. Compared to the CSMA protocol, the T-LOHI protocol successfully addresses the significant delays brought on by submerged space and time uncertainty, hence lowering collisions and retransmissions.

Through the use of time synchronization and power management, the suggested MAC protocol prevents communication collisions by using an organizational cluster architecture. As a result, traffic on the network is drastically decreased.

4.3.3. Network Throughput vs. Traffic Load

Our research indicates a significant improvement in network throughput under different traffic loads when our ML-based MAC protocol is compared to conventional MAC methods, as shown in Figure 8. With its capacity to dynamically change clusters, optimize bit rates, and effectively manage data packet sizes based on learned patterns, the ML-MAC protocol shows impressive versatility. Owing to this adaptability, congestion was reduced, collisions were avoided, and the available bandwidth was used more fully, particularly under heavy traffic. The results indicate how well our ML method works to maximize network performance and how well it can handle the demands of dynamic and varied traffic loads, outperforming conventional MAC protocols in the process. Moreover, our ML-MAC protocol demonstrated a distinct advantage over conventional protocols when examining the crucial parameters of clusters, bit rate, and data packet size. By using learned insights from the environment, our protocol made well-informed decisions that led to greater bit rates inside clusters, appropriate data packet sizes, and more effective intra-cluster communication. Throughput was increased using this comprehensive approach, and the protocol’s flexibility in various network situations was shown. Compared to standard MAC protocols, our machine-learning MAC protocol is a smart and resilient way to obtain higher network throughput under different traffic loads, which shows its potential as a solution for the changing demands on wireless communication networks.

4.3.4. Average Delay

Here, the average latency of each protocol is investigated by increasing the number of communication nodes used at one time, with all other parameters kept constant, to find out the capacity of each communicating data packet within the network. The average delay of our protocol is shown in Figure 9, along with four other current protocols.

The graph shows, among other things, that the average delay time grows in linear proportion to the overall quantity of nodes. Otherwise, the average delay time for each of the four operations has an inverse relationship with the total number of nodes. The average length of a delay decreases as the number of nodes increases. The average network latency of our protocol is much lower than the CAPC, CSMA, SFAMA, and TCPC protocols in a network with the same 60 nodes, with reductions of 82.7%, 81.7%, 81.2% and 80.4%, respectively. Furthermore, compared to the T-LOHI protocol, our protocol has a somewhat reduced average network latency (2.3% lower). As the number of nodes increases, there is a linear relationship between the number of packets from nodes and potential collisions. Therefore, the CAPC, CSMA, SFAMA, and T-LOHI protocols all see an increase in their average delay period. Compared to the CAPC and CSMA protocols, the SFAMA and T-LOHI protocols demonstrate delayed changes in reaction to changing network circumstances because of the concern for managing time and space uncertainty and minimizing collisions and retransmissions in their design. Additionally, the overall network delay period noticeably increases when the total number of nodes rises in the CSMA protocol. This is mainly because multi-hop situations include several request-to-send (RTS) frames, which might overlap and lead to delays when there is ambiguity about both space and time. Through simulations, we show that the average delay of our proposed MAC grows more slowly as the number of nodes in the total increases compared to other similar and related existing protocols. In particular, our MAC protocol is somewhat better than the T-LOHI scheme and much better than the CAPC, SFAMA, and CSMA schemes in terms of average delay time.

4.3.5. Different Traffic Loads vs. Average Delay

Next, while maintaining the same parameters, the average network latency of our protocol is compared with that of the existing protocols under different offered loads. The protocols’ typical network delays are shown in Figure 10. When the traffic load is set at 0.5 packets/s, the average network delay of our chosen MAC protocol is noticeably lower than that of the CAPC, CSMA, SFAMA, and T-LOHI protocols, with reductions of 60.9%, 60.7%, 60.4% and 3.3%, respectively. The fact that the T-LOHI and CSMA protocols are unable to effectively deal with the collisions caused by hidden nodes or space–time uncertainty ultimately leads to a higher interference level, increased packet loss, more collisions, and a longer delay. Our proposed protocol, however, addresses these issues. At sub-optimal utilization levels, the average latency differences between the protocols might not matter much due to the UWCN’s inferior scheduling efficiency.

However, this inefficiency causes the delay to grow as the volume of traffic increases steadily, and collisions and retransmissions become more common as a result. Based on simulation data, our proposed protocol outperforms the CSMA and SFAMA protocols in terms of average network latency. Our proposed protocol and the T-LOHI protocol are specifically designed to reduce collisions and retransmissions brought on by excessive latency and spatial–temporal uncertainty.

4.3.6. Average Traffic Time for Different Nodes

The purpose of this simulation is to observe how sensor nodes impact connection times for the four protocols. Figure 11 compares the average traffic times for the protocols for various node counts. The chart clearly shows that when more nodes are deployed, network traffic times for both the CSMA and SFAMA protocols grow, especially for the CSMA protocol. Our protocol has a network traffic time that is 92.7 percent and 89.5 percent lower than those of the CSMA and SFAMA protocols, respectively, when 140 nodes are used. Additionally, it is just 2.3% slower than the T-LOHI protocol’s traffic time. The simultaneous occurrence of RTS operations in the time domain brought on by a significant submerged delay in propagation results in a variety of collision situations for the CSMA protocol. However, the other three MAC protocols can effectively prevent the collision of the same nodes, and then, reduce network traffic time, while taking into account the uncertainty in underwater time and space. By efficiently using longer data lines as the traffic load rises, the CAPC and SFAMA protocols outperform the CSMA protocol in terms of throughput. High latency caused by submerged space and time uncertainty is a difficulty that the T-LOHI protocol attempts to solve. The sender is given time slots, each of which has a duration that is equal to the time it takes to transmit a packet plus its maximum propagation delay. Comparing this method to the CSMA and SFAMA protocols, it decreases collisions and retransmissions. Last but not least, the chosen protocol makes use of a clustering hierarchy structure to avoid transmission collisions by synchronizing time and managing energy. It thus cuts down on traffic on the network significantly.
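The slot rule described above for T-LOHI, where a slot lasts for the packet transmission time plus the maximum propagation delay, can be made concrete with a short calculation. The nominal sound speed of 1500 m/s and the packet size, bit rate, and range in the example are assumptions chosen for illustration, not simulation parameters from the paper.

```python
SOUND_SPEED_M_S = 1500.0  # nominal underwater sound speed (assumed)


def slot_duration_s(packet_bits: int, bit_rate_bps: float, max_range_m: float,
                    sound_speed_m_s: float = SOUND_SPEED_M_S) -> float:
    """Slot length = packet transmission time + maximum propagation delay."""
    transmission_time = packet_bits / bit_rate_bps
    max_propagation_delay = max_range_m / sound_speed_m_s
    return transmission_time + max_propagation_delay


# A 1000-bit packet at 1000 bit/s over a 1500 m maximum range.
slot = slot_duration_s(packet_bits=1000, bit_rate_bps=1000.0, max_range_m=1500.0)
```

In this example the propagation delay (1 s) is as long as the transmission itself (1 s), which illustrates why long acoustic delays dominate slot sizing in underwater networks.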

4.3.7. Channel Utilization vs. Slot Size

In UWCNs, effective communication depends critically on the dynamic interaction between channel use and slot size.

The severe underwater environment presents tremendous problems for traditional static approaches to channel allocation and slot sizing, since acoustic channels display substantial attenuation and unpredictable propagation delays. One potential option is to use the reinforcement learning method known as Q-learning. Here, Q-learning dynamically adjusts to changing network circumstances, tuning channel use and slot size for improved efficiency, as shown in Figure 12. By using historical data to identify the best policies, the algorithm creates a responsive and adaptable network that lowers latency, decreases collisions, and increases overall efficiency.

The applicability of Q-learning in the management and optimization of channel allocation and episodes makes this work very important in improving the effectiveness of communication in UWCNs. Underwater communication is characterized by high attenuation, varying propagation delay, and varying conditions, making it tricky to establish an appreciable, clear channel. Q-learning is effective in tackling these issues mainly due to the fact that the system can learn different approaches in different stages depending on the network conditions. In particular, it maximizes the usage level and distribution of the entire communication links, which plays a critical role in addressing issues of latency and loss of packets. An episode in Q-learning refers to a sequence of decision making and interaction that takes place within the network. Through such episodes, the Q-learning algorithm dynamically modifies the channel selection patterns depending on prior encounters to enhance system performance.

This dynamic adjustment process is more useful in underwater networks than in terrestrial networks since activity in the underwater environment changes more frequently. It is seen from Figure 13 that the Q-learning enables the system not only to manage the episodes but also to control the usage of channels that would enhance the entire performance of the UWCN. Further, the interaction of these aspects is examined in detail in the study to demonstrate how Q-learning can cope with diverse challenges of underwater communication management, including the variability in the propagation delay and high error rate, given that it can adapt to new conditions as they are encountered.

By comparing Q-learning to more conventional approaches, this comparative study aims to assess Q-learning’s effectiveness and provide insight into how well it can handle the complexities of underwater communication, eventually leading to the creation of more resilient and adaptable UWCNs.

The graph in Figure 14 shows channel usage, an important parameter that indicates how well the network uses the available channels for data transfer. In terms of throughput and energy consumption, the proposed protocol performs better than T-LOHI-MAC, CSMA-MAC, SFAMA-MAC, and CAPC-MAC under various network scenarios, especially mobile network scenarios. As illustrated in Table 3, our protocol outperforms these alternatives, showing notable gains in channel optimization: we achieved an average improvement in throughput of 24% compared to T-LOHI-MAC, 20% compared to SFAMA-MAC, 17% compared to CAPC-MAC, and 16% compared to CSMA-MAC. This demonstrates that the proposed system is capable of maintaining a good level of performance in dynamic mobile underwater networks, which is consistent with the general enhancement in system performance shown in Figure 15.

These percentages indicate significant improvements in the throughput, dependability, and efficiency of all important aspects of UWCN performance in situations where nodes are constantly moving. Our suggested protocol also remains better in static network circumstances, demonstrating 16%, 14%, 18%, and 17% gains in performance over CAPC-MAC, CSMA-MAC, SFAMA-MAC, and T-LOHI-MAC, respectively. This suggests that the protocol’s efficacy extends beyond mobile circumstances to static contexts, where stability and dependability are critical. The benefits of our suggested protocol hold when moving to big mobile networks, showing gains of 18%, 22%, 13%, and 6% over the current methods. These gains are especially remarkable since large mobile networks pose extra hurdles because of their increased complexity and propensity for interference. Our suggested protocol performs much better in the setting of large static networks, outperforming CAPC-MAC by a significant 29%, CSMA-MAC by 16%, SFAMA-MAC by 18%, and T-LOHI-MAC by an astounding 27%. The protocol’s exceptional performance in large static networks suggests that it can handle situations where there is a high node density and restricted mobility, which is important for applications like environmental monitoring and underwater surveillance.

Finally, these findings demonstrate the effectiveness and adaptability of our suggested strategy in maximizing channel use in a variety of underwater network conditions. The continual performance gains over current protocols show its promise as a dependable and effective option for UWCNs. These results provide important new information for academic comprehension as well as practical application in real-world underwater situations, which is beneficial for researchers and field practitioners.

As the average traffic load rises, Figure 16 shows a noticeable decline in the PDR. Notably, PDR dynamics are strongly influenced by the interaction between two weighting factors, $\beta_{1}$ and $\beta_{2}$ . The PDR records larger values when $\beta_{1}$ is greater than $\beta_{2}$ , meaning that the collision avoidance reward component $r_{1}$ is prioritized above the energy-efficiency reward factor $r_{2}$ . A greater priority for collision avoidance thus improves PDR at the price of energy efficiency, which highlights the trade-off between collision avoidance and energy efficiency in the network and clarifies the complex effects of parameter settings on overall network performance.

The PDR shows a steady increase as the communication range expands in Figure 17. The addition of more nearby nodes that are within the sender’s communication range is what is causing this development. A wider routing route is produced by the extended communication range, which improves packet transmission dependability. The participation of eligible nodes in the packet forwarding process rises in tandem with the total number of installed nodes. Concurrently, the increase in the quantity of deployed nodes results in a decrease in empty spaces. This decrease in empty spaces is essential to increasing the PDR even further. To put it simply, the combination of increasing the number of deployed nodes and the increasing communication range results in a methodical enhancement of PDR, which builds a more reliable packet transmission infrastructure.

The average number of collisions per packet for a range of typical traffic loads is shown in Figure 18. According to the observed trend, higher average traffic loads are correlated with a proportionate rise in collision incidence. Notably, every protocol that is examined responds to the increasing average traffic load in a comparable manner. This homogeneity results from the fact that CHs experience more collisions as the average traffic load increases, because of the increasing interference brought on by more simultaneous transmissions. The suggested technique stands out since it shows a significant improvement over the current methods. More specifically, compared to CAPC, CSMA, SFAMA, and T-LOHI, our proposed protocol records a 27%, 15%, 31%, and 37% decrease, respectively, in the average number of collisions per packet when the average traffic load is 0.015. This enhancement may be ascribed to the protocol’s proficiency in identifying and executing the best energy-efficiency tactics that can lessen the effects of many interferences. As such, there are fewer collisions on average for each packet.

In contrast, the results of T-LOHI, SFAMA, CAPC, and CSMA highlight the negative consequences of insufficiently allocating energy for data transmission. In these situations, there is a rise in packet losses, and therefore, in the average number of collisions per packet. Specifically, a protocol’s capacity to properly handle numerous interferences might be compromised by inadequate energy allocation for data transfer. It must be noted that the risk of collision is doubled when the range of interference exceeds twice the extent of transmission. The increased number of sensors falling within the enlarged interference range causes this phenomenon and raises the possibility of collisions.

4.3.8. Energy Consumption

When compared to the current MAC protocols, our suggested Q-learning method for UWCNs has been shown to perform better in terms of energy consumption and efficient use of network resources, as shown in Figure 19.

The MAC layer’s integrated Q-learning algorithm improves energy usage by constantly adjusting transmission settings in response to the state of the network. The system’s flexibility enables it to modify characteristics like duty cycle and transmission power, guaranteeing effective energy use between nodes. Moreover, our Q-learning methodology has shown exceptional performance in circumstances when the number of nodes in the network varies. The Q-learning method adjusts to the shifting dynamics as the number of nodes rises, improving the MAC protocol for better performance. In contrast to conventional MAC protocols that could have scaling problems, our Q-learning approach makes use of its learning power to efficiently handle the higher node density. UWCNs with dynamic deployment situations might find this flexibility to be an attractive option as it translates into lower congestion for network resources and increased energy efficiency. Overall, compared to current MAC protocols, our suggested Q-learning approach demonstrates its capacity to achieve improved energy consumption and scalability performance, making it a significant improvement in the field of underwater sensor network communication.

In terms of energy consumption versus offered load per node (in packets per second), our Q-learning method for UWCNs also shows clear gains over current MAC protocols, as shown in Figure 20. By adjusting transmission parameters, the Q-learning algorithm integrated into the MAC layer dynamically optimizes energy usage. This makes the underwater sensor nodes more energy efficient and thereby extends the overall network lifespan. Offered load per node is an important parameter for evaluating the network’s communication efficiency, and here too the method performs well: because of its learning capability, the system can tune the packet transmission rate of each node and distribute resources dynamically. As a consequence, the communication burden is spread more evenly across the network, reducing congestion and increasing overall throughput. In the comparative evaluations, our Q-learning technique consistently outperforms the current MAC protocols, giving a more equitable and efficient use of network resources in UWCNs while also reducing energy consumption.

The network’s energy efficiency, as shown in Figure 21, shows a steady decline as the average traffic load increases, supporting the pattern seen in Figure 16 regarding the PDR. Notably, energy efficiency is mostly determined by the interaction between the weighting factors $β_{1}$ and $β_{2}$. In particular, the network achieves better energy efficiency when $β_{1}$ is less than $β_{2}$, indicating a stronger focus on the energy-efficient reward element specified by $r_{2}$ as opposed to collision avoidance, $r_{1}$. This suggests that with these parameter values, the system gives energy conservation a higher priority than collision avoidance, which leads to a more effective use of network resources when traffic load fluctuates. The results highlight how important parameter adjustment is to achieve a balance between the competing goals of energy efficiency and network collision avoidance.
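The effect of the two weighting factors can be seen in a small numerical sketch. It assumes the total reward is a weighted sum of the collision-avoidance element r1 and the energy element r2; the exact functional form is not restated in this section, and the reward values and the function name are hypothetical.

```python
def combined_reward(r1, r2, beta1, beta2):
    """Assumed form: weighted sum of collision-avoidance (r1) and energy (r2) rewards."""
    return beta1 * r1 + beta2 * r2

# Hypothetical candidate actions, given as (r1, r2):
high_power = (1.0, 0.2)   # collision-free but energy hungry
low_power = (0.6, 0.9)    # some collision risk but energy frugal

# beta1 < beta2: energy conservation dominates, so the low-power action scores higher.
energy_first = (combined_reward(*high_power, 0.3, 0.7),
                combined_reward(*low_power, 0.3, 0.7))

# beta1 > beta2: collision avoidance dominates, so the high-power action scores higher.
collision_first = (combined_reward(*high_power, 0.7, 0.3),
                   combined_reward(*low_power, 0.7, 0.3))
```

Under this assumed form, swapping the weights reverses which action the learner favors, which is consistent with the trade-off between energy efficiency and collision avoidance discussed for Figure 21.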

The results in Figure 22 show the performance of the proposed Q-learning-based MAC protocol and a fixed MAC protocol from episode 1 to episode 500 in underwater wireless sensor networks (UWSNs). The collision count graph shows that the Q-learning MAC consistently has a much lower collision count than the fixed MAC, demonstrating the low-collision capability of this protocol. The energy consumption graph in Figure 22 shows lower energy consumption for the Q-learning MAC protocol, which results from its ability to vary the transmission energy, whereas the fixed MAC protocol has persistently higher energy use. Finally, the throughput graph shows that the Q-learning MAC protocol achieves a higher rate of successful transmissions, and a more stable number of them, than the DCF MAC protocol, demonstrating its ability to learn and adapt to the network environment in order to find good solutions for data transmission. Overall, the Q-learning MAC protocol outperforms the fixed MAC protocol in all three key metrics: collision count, energy consumption, and throughput. A bull’s-eye scatter plot of the collision rate in terms of energy efficiency and throughput is also shown.

4.3.9. Comparison Analysis with Recent MAC Protocols

Here, we compare our proposed MAC protocol with recently proposed MAC protocols for UWSNs and UAWCNs. Our proposed system outperforms the existing DR-ALOHA-Q [60] and MACA-Q protocols, which can be attributed to several key factors that enhance channel utilization and overall network efficiency. Firstly, our proposed system achieves consistently higher channel utilization, as evidenced by the green line in Figure 23. This indicates that our protocol can handle more data traffic and make better use of the available bandwidth, which is critical in the bandwidth-constrained underwater environment. Higher channel utilization means that more data packets are successfully transmitted without collisions, leading to more efficient use of network resources.

Secondly, the stability of our proposed system is notable. Unlike MACA-Q, which exhibits significant fluctuations and a lower overall channel utilization, our protocol maintains a steady performance. This stability is crucial for applications requiring reliable data transmission, such as environmental monitoring and underwater exploration. The reduction in packet collisions and retransmissions in our protocol translates to lower energy consumption, extending the operational life of the network nodes, which is a crucial factor in UWCNs where battery replacement or recharging is impractical. Lastly, compared to DR-ALOHA-Q, our proposed system reaches its peak performance more rapidly and maintains it over time. This rapid convergence to optimal performance is beneficial in dynamic underwater environments where conditions can change unpredictably.

The ability to quickly adapt and maintain high performance ensures that the network remains functional and efficient despite varying underwater conditions. Furthermore, our proposed system demonstrates a clear advantage over the existing DR-ALOHA-Q and MACA-Q protocols in terms of channel utilization, as shown in Figure 24. This superiority is evident across various numbers of network nodes, highlighting the robustness and efficiency of our approach. Our proposed system consistently achieves higher channel utilization, maintaining values above 0.55 Erlang even as the number of network nodes increases from 5 to 40. This indicates that our protocol efficiently manages network traffic and minimizes collisions, which is critical in the bandwidth-limited underwater environment. High channel utilization ensures that the network can support more data transmission activities simultaneously, enhancing overall communication efficiency. In comparison, the DR-ALOHA-Q protocol shows a peak in channel utilization at around 20 nodes but then experiences a decline as the number of nodes increases. This suggests that DR-ALOHA-Q may suffer from increased contention and collisions as the network density grows, leading to inefficiencies. The MACA-Q protocol performs even worse, with consistently lower channel utilization values, peaking around 20 nodes but failing to sustain high performance with further increases in network size.
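To make the Erlang figures concrete, the following sketch computes channel utilization as the fraction of the observation time occupied by successfully delivered data packets. This definition is an assumption, as are the function name, the packet count, and the use of the 10 Kb/s bandwidth from Table 2 as the bit rate; the 1000-bit packet size is taken from Table 2.

```python
def channel_utilization(successful_packets, packet_bits, bit_rate_bps, duration_s):
    """Carried load in Erlang: busy time due to successful packets / observation time."""
    if duration_s <= 0 or bit_rate_bps <= 0:
        raise ValueError("duration and bit rate must be positive")
    busy_time_s = successful_packets * packet_bits / bit_rate_bps
    return busy_time_s / duration_s

# Hypothetical run: 5500 successful 1000-bit packets in 1000 s at 10 kb/s.
utilization = channel_utilization(5500, 1000, 10_000, 1000.0)
```

In this hypothetical run the channel carries useful data 55% of the time, which corresponds to the 0.55 Erlang level mentioned for Figure 24. Collided or retransmitted packets occupy the channel without adding to the numerator, so fewer collisions directly raise this value.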

Our proposed system’s ability to maintain high channel utilization despite the growing number of nodes is particularly beneficial for UWCNs, where expanding the network size is often necessary to cover larger areas or increase the number of data collection points. The superior performance can be attributed to the optimized handling of transmission scheduling and collision avoidance, which are critical in underwater communication due to the high propagation delays and limited available bandwidth. Moreover, the stability and predictability of our proposed system’s performance make it a reliable choice for critical applications such as environmental monitoring, military surveillance, and underwater exploration. By ensuring efficient use of the communication channel, our protocol can significantly reduce the energy consumption required for retransmissions, thereby prolonging the operational life of the sensor nodes.
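The scale of the propagation delay can be checked with a short calculation based on Table 2 (6.5 km range, 1500 m/s sound speed, 1000-bit data packets, 220-bit RTS/CTS packets). The class name is hypothetical, and treating the 10 Kb/s bandwidth as the bit rate is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcousticLink:
    range_m: float = 6500.0          # Rx node range (Table 2)
    sound_speed_mps: float = 1500.0  # underwater sound speed (Table 2)
    bit_rate_bps: float = 10_000.0   # assumption: bit rate equals the 10 Kb/s bandwidth

    def propagation_delay_s(self) -> float:
        """One-way acoustic propagation delay over the full range."""
        return self.range_m / self.sound_speed_mps

    def tx_time_s(self, bits: int) -> float:
        """Time needed to put a packet of the given size on the channel."""
        return bits / self.bit_rate_bps

link = AcousticLink()
prop_delay = link.propagation_delay_s()   # roughly 4.33 s
data_time = link.tx_time_s(1000)          # 0.1 s for a data packet
rts_cts_round_trip = 2 * (prop_delay + link.tx_time_s(220))
delay_ratio = prop_delay / data_time
```

Under these assumptions the one-way delay is more than forty times the data packet duration, and an RTS/CTS exchange alone takes several seconds, which is why avoiding collisions and the retransmissions they trigger has such a large effect underwater.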

In the case of UWSNs, energy consumption is a very important issue, because it is difficult to recharge or replace batteries underwater. Our proposed system demonstrates significant improvements in energy efficiency compared to the existing DR-ALOHA-Q and MACA-Q protocols, as evidenced by Figure 25. Our proposed system consistently exhibits the lowest energy consumption across all network sizes, ranging from 5 to 40 nodes.

This trend highlights the protocol’s ability to manage communication tasks efficiently and minimize wasted energy. The green line in the graph shows that our system keeps energy consumption well below 1 joule even as the number of nodes increases, whereas both the DR-ALOHA-Q and MACA-Q protocols consume significantly more energy. In particular, MACA-Q has the highest energy consumption among the three protocols, consistently above 1.3 joules, and it decreases only slightly as the number of network nodes increases. This indicates inefficiencies in the MACA-Q protocol that lead to higher energy usage, likely due to the increased collisions and retransmissions that are common in underwater environments. DR-ALOHA-Q performs better than MACA-Q but still lags behind our proposed system. Its energy consumption falls as the number of nodes increases; however, it always remains above that of our proposed protocol. This suggests that while DR-ALOHA-Q attempts to manage energy usage, it cannot achieve the same level of efficiency as our protocol.
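The link between collisions and energy use can be sketched with a simple retransmission model. It assumes that each transmission attempt collides independently with probability p, that a collided packet is retransmitted until it succeeds, and that only transmit energy is counted. The 2 W transmit power and 1000-bit packet size come from Table 2; the 10 kb/s bit rate and the function names are assumptions.

```python
def tx_energy_j(power_w, packet_bits, bit_rate_bps):
    """Energy of a single transmission attempt: power multiplied by time on air."""
    return power_w * packet_bits / bit_rate_bps

def energy_per_delivered_packet_j(power_w, packet_bits, bit_rate_bps, p_collision):
    """Expected transmit energy per delivered packet.

    With independent collisions the number of attempts is geometric,
    so the expected number of attempts is 1 / (1 - p_collision).
    """
    if not 0.0 <= p_collision < 1.0:
        raise ValueError("p_collision must be in [0, 1)")
    return tx_energy_j(power_w, packet_bits, bit_rate_bps) / (1.0 - p_collision)

single_attempt = tx_energy_j(2.0, 1000, 10_000)                       # 0.2 J
low_collision = energy_per_delivered_packet_j(2.0, 1000, 10_000, 0.2)
high_collision = energy_per_delivered_packet_j(2.0, 1000, 10_000, 0.5)
```

In this simplified model, raising the collision probability from 0.2 to 0.5 increases the expected energy per delivered packet from 0.25 J to 0.4 J, which illustrates why protocols with more collisions and retransmissions appear higher in Figure 25. The values are illustrative only and are not taken from the simulation results.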

The superior performance of our proposed system can be attributed to several key factors. Firstly, the protocol effectively reduces collisions and retransmissions, which are major sources of energy drain in UWCNs. By optimizing the scheduling of transmissions and employing more efficient collision avoidance mechanisms, our system ensures that fewer packets need to be resent, conserving energy. Secondly, our protocol incorporates advanced energy management techniques that dynamically adjust transmission power and frequency based on the network conditions. This adaptability ensures that the system operates at optimal energy levels, extending the battery life of the sensor nodes. As a result, our proposed system not only outperforms DR-ALOHA-Q and MACA-Q in terms of channel utilization but also excels in energy efficiency. This makes it a highly suitable choice for UWCNs, where conserving energy is paramount to maintaining long-term network operations and reducing maintenance costs. The significant reduction in energy consumption ensures that our system can support more extensive and prolonged underwater monitoring and communication tasks.

5. Conclusions

The energy consumption issues of battery-powered Tx anchor and Rx nodes are successfully addressed by the proposed energy-efficient MAC protocol incorporating Q-learning. The technique improves network performance by emphasizing low collision rates and energy economy while dynamically optimizing transmission energy without prior interference information. The simulation findings indicate a considerable decrease in end-to-end latency and packet collisions, an increase in PDR and channel utilization, and a possible boost in network throughput of up to 23%. Alongside these advantages, the protocol also has some disadvantages, including the need for a large amount of training data and processing capacity for the Q-learning model. Subsequently, we investigated and concentrated on augmenting the flexibility of the procedure, carrying out empirical investigations for validation, which in turn strengthened the scalability of our approach.

Author Contributions

Conceptualization, Q.G. and W.U.R.; methodology, Q.G. and W.U.R.; software, W.U.R. and F.Z.; validation, Q.G., W.U.R. and F.Z.; formal analysis, F.Z., M.B. and W.A.; investigation, W.U.R., Q.G. and F.Z.; resources, Q.G. and F.Z.; data curation, S.U.K., M.B., M.I.K. and W.A.; writing—original W.U.R.; writing—review and editing, W.U.R., S.U.K., M.B., M.I.K. and W.A.; visualization, W.U.R., M.I.K., F.Z. and S.U.K.; supervision, Q.G. and F.Z.; project administration, Q.G. and F.Z.; funding acquisition, Q.G. and F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by: 1. Shenzhen Science and Technology Program under Grant No. JSGG20220831103800001. 2. Key Research and Development Program of ShanDong Province under Grant No. 2022CXGC020409.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Khan, S.U.; Khan, Z.U.; Alkhowaiter, M.; Khan, J.; Ullah, S. Energy-efficient routing protocols for UWSNs: A comprehensive review of taxonomy, challenges, opportunities, future research directions, and machine learning perspectives. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102128.
2. Cuzme-Rodríguez, F.; Velasco-Suárez, A.; Domínguez-Limaico, M.; Suárez-Zambrano, L.; Farinango-Endara, H.; Mediavilla-Valverde, M. Application for the Study of Underwater Wireless Sensor Networks: Case Study. In Proceedings of the International Conference on Advances in Emerging Trends and Technologies, Riobamba, Ecuador, 26–28 October 2022; pp. 124–136.
3. Zhang, Y.; Hong, Y.; Guizani, M.; Wu, S.; Zhang, P.; Liu, R. A Multi-Layer Information Dissemination Model and Interference Optimization Strategy for Communication Networks in Disaster Areas. IEEE Trans. Veh. Technol. 2024, 73, 1239–1252.
4. Sathish, K.; Hamdi, M.; Chinthaginjala, R.; Pau, G.; Ksibi, A.; Anbazhagan, R.; Abbas, M.; Usman, M. Reliable data transmission in underwater wireless sensor networks using a cluster-based routing protocol endorsed by member nodes. Electronics 2023, 12, 1287.
5. Anitha, D.; Karthika, R. DEQLFER—A Deep Extreme Q-Learning Firefly Energy Efficient and high performance routing protocol for underwater communication. Comput. Commun. 2021, 174, 143–153.
6. Zhao, D.; Mao, W.; Chen, P.; Hu, Y.; Liang, H.; Dang, Y. A Distributed and Parallel Accelerator Design for 3-D Acoustic Imaging on FPGA-Based Systems. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2023, 43, 1401–1414.
7. Gazi, F.; Ahmed, N.; Misra, S.; Wei, W. Reinforcement learning-based MAC protocol for underwater multimedia sensor networks. ACM Trans. Sens. Netw. (TOSN) 2022, 18, 1–25.
8. Zhao, D.; Zhou, H.; Chen, P.; Hu, Y.; Ge, W.; Dang, Y. Design of Forward-Looking Sonar System for Real-Time Image Segmentation with Light Multi-Scale Attention Net. IEEE Trans. Instrum. Meas. 2023, 73, 4501217.
9. Gang, Q.; Muhammad, A.; Khan, Z.U.; Khan, M.S.; Ahmed, F.; Ahmad, J. Machine learning-based prediction of node localization accuracy in IIoT-based MI-UWSNs and design of a TD coil for omnidirectional communication. Sustainability 2022, 14, 9683.
10. Nkenyereye, L.; Nkenyereye, L.; Ndibanje, B. Internet of Underwater Things: A Survey on Simulation Tools and 5G-Based Underwater Networks. Electronics 2024, 13, 474.
11. Alablani, I.A.; Arafah, M.A. EE-UWSNs: A joint energy-efficient MAC and routing protocol for underwater sensor networks. J. Mar. Sci. Eng. 2022, 10, 488.
12. Shen, Z.; Yin, H.; Jing, L.; Liang, Y.; Wang, J. A cooperative routing protocol based on Q-learning for underwater optical-acoustic hybrid wireless sensor networks. IEEE Sensors J. 2021, 22, 1041–1050.
13. Sun, W.; Sun, X.; Wang, B.; Wang, J.; Du, H.; Zhang, J. MR-SFAMA-Q: A MAC Protocol based on Q-Learning for Underwater Acoustic Sensor Networks. Diannao Xuekan 2024, 35, 51–63.
14. Chen, P.; Luo, L.; Guo, D.; Tang, G.; Zhao, B.; Li, Y.; Luo, X. Why and How Lasagna Works: A New Design of Air-Ground Integrated Infrastructure. IEEE Netw. 2024, 38, 132–140.
15. Ge, L.; Tu, S.; Dong, Y.; Chen, Y.; Wan, L. Meta-Learning Based Hyperparameter Reweighting MAC Protocol for Underwater Acoustic Networks. In Proceedings of the 17th International Conference on Underwater Networks & Systems, Shenzhen, China, 24–26 November 2023.
16. Luo, X.; Chen, L.; Zhou, H.; Cao, H. A survey of underwater acoustic target recognition methods based on machine learning. J. Mar. Sci. Eng. 2023, 11, 384.
17. Guo, J.; Song, S.; Liu, J.; Chen, H.; Lin, B.; Cui, J.-H. An efficient geo-routing-aware MAC protocol based on OFDM for underwater acoustic networks. IEEE Internet Things J. 2023, 10, 9809–9822.
18. Alhassan, I.B.; Mitchell, P.D. Packet flow based reinforcement learning MAC protocol for underwater acoustic sensor networks. Sensors 2021, 21, 2284.
19. Li, T.; Kouyoumdjieva, S.T.; Karlsson, G.; Hui, P. Data collection and node counting by opportunistic communication. In Proceedings of the 2019 IFIP Networking Conference (IFIP Networking), Warsaw, Poland, 20–22 May 2019.
20. Khan, Z.U.; Gang, Q.; Muhammad, A.; Muzzammil, M.; Khan, S.U.; Affendi, M.E.; Ali, G.; Ullah, I.; Khan, J. A comprehensive survey of energy-efficient MAC and routing protocols for underwater wireless sensor networks. Electronics 2022, 11, 3015.
21. Rahman, W.; Gang, Q.; Feng, Z.; Khan, Z.U. A Q-Learning-Based Multi-Hop Energy-Efficient and Low Collision MAC Protocol for Underwater Acoustic Wireless Sensor Networks. In Proceedings of the 2023 20th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Bhurban, Pakistan, 22–25 August 2023.
22. Balakiruthig, B.; Angayarkann, S.A.; Shekhawat, N.S.; Bathija, N.; Shrimali, K.S.; Gupta, N. Dynamic MAC Protocol for Layered Data Aggregation in Underwater Wireless Sensor Networks. In Proceedings of the 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), Coimbatore, India, 28–30 August 2024.
23. Chen, B.; Hu, J.; Zhao, Y.; Ghosh, B.K. Finite-time observer based tracking control of uncertain heterogeneous underwater vehicles using adaptive sliding mode approach. Neurocomputing 2022, 481, 322–332.
24. Ahmad, I.; Narmeen, R.; Kaleem, Z.; Almadhor, A.; Alkhrijah, Y.; Ho, P.-H.; Yuen, C. Machine Learning-Based Optimal Cooperating Node Selection for Internet of Underwater Things. IEEE Internet Things J. 2024, 11, 22471–22482.
25. Alsalman, L.; Alotaibi, E. A balanced routing protocol based on machine learning for underwater sensor networks. IEEE Access 2021, 9, 152082–152097.
26. Khan, Z.U.; Aman, M.; Rahman, W.U.; Khan, F.; Jamil, T.; Hashim, R. Machine Learning-based Multi-path Reliable and Energy-efficient Routing Protocol for Underwater Wireless Sensor Networks. In Proceedings of the 2023 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, 11–12 December 2023.
27. Huang, J.; Ye, X.; Fu, L. MAC Protocol for Underwater Acoustic Multi-Cluster Networks Based on Multi-Agent Reinforcement Learning. In Proceedings of the 17th International Conference on Underwater Networks & Systems, Shenzhen, China, 24–26 November 2023; pp. 1–5.
28. Centelles, D.; Soriano, A.; Marti, J.V.; Sanz, P.J. Underwater multirobot cooperative intervention MAC protocol. IEEE Access 2020, 8, 60867–60876.
29. Chen, Y.; Jin, Z.; Xing, G.; Zeng, Q.; Chen, Y.; Zhou, Z.; Yang, Q. An Energy-Efficient MAC Protocol for Three-Dimensional Underwater Acoustic Sensor Networks With Time Synchronization and Power Control. IEEE Access 2023, 11, 20842–20860.
30. ur Rahman, W.; Gang, Q.; Feng, Z.; Khan, Z.U.; Aman, M.; Bilal, M. A MACA-Based Energy-Efficient MAC Protocol Using Q-Learning Technique for Underwater Acoustic Sensor Network. In Proceedings of the 2023 IEEE 11th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 21–22 October 2023; pp. 352–355.
31. Rajasoundaran, S.; Kumar, S.S.; Selvi, M.; Thangaramya, K.; Arputharaj, K. Secure and optimized intrusion detection scheme using LSTM-MAC principles for underwater wireless sensor networks. Wirel. Netw. 2024, 30, 209–231.
32. Poudel, S.; Moh, S. Energy-efficient and fast MAC protocol in UAV-aided wireless sensor networks for time-critical applications. Sensors 2020, 20, 2635.
33. Kampen, A.-L.; Otnes, R. MAC and Network Layer Solutions for Underwater Wireless Sensor Networks. Int. J. Adv. Netw. Serv. 2022, 15, 18–28.
34. Hota, L.; Nayak, B.P.; Kumar, A. Machine Learning Algorithms for Optimization and Intelligence in Wireless Networks: WSNs, MANETs, VANETs, and USNs. In 5G and Beyond Wireless Communications; CRC Press: Boca Raton, FL, USA, 2025; pp. 306–332.
35. Shwetha, M.; Krishnaveni, S. A systematic analysis, outstanding challenges, and future prospects for routing protocols and machine learning algorithms in underwater wireless acoustic sensor networks. J. Interconnect. Netw. 2024, 2330001.
36. Zhu, R.; Boukerche, A.; Li, D.; Yang, Q. Delay-aware and reliable medium access control protocols for UWSNs: Features, protocols, and classification. Comput. Netw. 2024, 252, 110631.
37. Cheng, Y.; Deng, X.; Qi, Q.; Yan, X. Truthfulness of a Network Resource-Sharing Protocol. Math. Oper. Res. 2022, 48, 1522–1552.
38. Zhao, Z.; Liu, C.; Guang, X.; Li, K. MLRS-RL: An Energy-Efficient Multilevel Routing Strategy Based on Reinforcement Learning in Multimodal UWSNs. IEEE Internet Things J. 2023, 10, 11708–11723.
39. Bin, W.; Kerong, B.; Yixue, H.; Mingjiu, Z. SQMCR: Stackelberg Q-learning based Multi-hop Cooperative Routing Algorithm for Underwater Wireless Sensor Networks. IEEE Access 2024, 12, 56179–56195.
40. Ntabeni, U.; Basutli, B.; Alves, H.; Chuma, J. Device-Level Energy Efficient Strategies in Machine Type Communications: Power, Processing, Sensing, and RF Perspectives. IEEE Open J. Commun. Soc. 2024, 5, 5054–5087.
41. Aman, M.; Gang, Q.; Shang, Z.; Khan, Z.U.L.L.A.H.; Khan, M.S.; Ullah, I. Realization of RSSI Based, Three Major Components (Hx, Hy, Hz) of Magnetic Flux Created around the MI-TD Coil. In Proceedings of the 2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE), Changchun, China, 29–31 December 2023.
42. Huang, W.; Li, T.; Cao, Y.; Lyu, Z.; Liang, Y.; Yu, L.; Jin, D.; Zhang, J.; Li, Y. Safe-NORA: Safe reinforcement learning-based mobile network resource allocation for diverse user demands. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023.
43. Zheng, Z.; Jiang, S.; Feng, R.; Ge, L.; Gu, C. Survey of reinforcement-learning-based MAC protocols for wireless ad hoc networks with a MAC reference model. Entropy 2023, 25, 101.
44. Ma, Y.; Li, T.; Zhou, Y.; Yu, L.; Jin, D. Mitigating Energy Consumption in Heterogeneous Mobile Networks Through Data-Driven Optimization. IEEE Trans. Netw. Serv. Manag. 2024, 21, 4369–4382.
45. Hsu, C.-C.; Kuo, M.-S.; Chou, C.-F.; Lin, K.C.-J. The elimination of spatial-temporal uncertainty in underwater sensor networks. IEEE ACM Trans. Netw. 2012, 21, 1229–1242.
46. Hsu, C.-C.; Kuo, M.S.; Chou, C.-F.; Lin, K.C.-J. Collision-free and low delay MAC protocol based on multi-level quorum system in underwater wireless sensor networks. Comput. Commun. 2021, 173, 56–69.
47. Bouabdallah, F.; Zidi, C.; Boutaba, R.; Mehaoua, A. Collision avoidance energy efficient multi-channel MAC protocol for underwater acoustic sensor networks. IEEE Trans. Mob. Comput. 2018, 18, 2298–2314.
48. Stojanovic, M. On the relationship between capacity and distance in an underwater acoustic communication channel. ACM Sigmobile Mob. Comput. Commun. Rev. 2007, 11, 34–43.
49. Lucani, D.E.; Stojanovic, M.; Médard, M. On the relationship between transmission power and capacity of an underwater acoustic communication channel. In Proceedings of the OCEANS 2008-MTS/IEEE Kobe Techno-Ocean, Kobe, Japan, 8–11 April 2008.
50. Rizvi, H.H.; Khan, S.A.; Enam, R.N.; Nisar, K.; Haque, M.R. Analytical model for underwater wireless sensor network energy consumption reduction. Comput. Mater. Continua 2022, 72, 1611–1626.
51. Rachman, R.; Laksana, E.P.; Putra, D.S.; Sari, R.F. Energy consumption at the node in underwater wireless sensor network (UWSNs). In Proceedings of the 2012 Sixth UKSim/AMSS European Symposium on Computer Modeling and Simulation, Malta, Malta, 14–16 November 2012.
52. Sun, G.; Zhu, G.; Liao, D.; Yu, H.; Du, X.; Guizani, M. Cost-efficient service function chain orchestration for low-latency applications in NFV networks. IEEE Syst. J. 2018, 13, 3877–3888.
53. Lurton, X. An Introduction to Underwater Acoustics: Principles and Applications; Springer Science Business Media: Berlin/Heidelberg, Germany, 2002.
54. Leinhos, H.A. Capacity calculations for rapidly fading communications channels. IEEE J. Ocean. Eng. 1995, 21, 137–142.
55. Chen, Y.D.; Lien, C.Y.; Fang, Y.S.; Shih, K.P. TLPC: A two-level power control MAC protocol for collision avoidance in underwater acoustic networks. In Proceedings of the 2013 MTS/IEEE OCEANS-Bergen, Bergen, Norway, 10–14 June 2013; pp. 1–6.
56. Shih, K.P.; Chen, Y.D. CAPC: A collision avoidance power control MAC protocol for wireless ad hoc networks. IEEE Commun. Lett. 2005, 9, 859–861.
57. Syed, A.A.; Ye, W.; Heidemann, J. Comparison and evaluation of the T-Lohi MAC for underwater acoustic sensor networks. IEEE J. Sel. Areas Commun. 2008, 26, 1731–1743.
58. Shi, J.J.; Ma, C.B.; Zuo, Y.; Li, J.C.; Ao, J. SD-CSMA/CA Underwater Optical Wireless Communication Access Protocol Incorporating Spatial Location Information. In Proceedings of the 2023 Cross Strait Radio Science and Wireless Technology Conference (CSRSWTC), Guilin, China, 10–13 November 2023; pp. 1–3.
59. Molins, M.; Stojanovic, M. Slotted FAMA: A MAC protocol for underwater acoustic networks. In Proceedings of the OCEANS 2006-Asia Pacific, Singapore, 16–19 May 2006.
60. Tomovic, S.; Radusinovic, I. DR-ALOHA-Q: A Q-learning-based adaptive MAC protocol for underwater acoustic sensor networks. Sensors 2023, 23, 4474.

Figure 1. Detailed overview of the designed paradigm.

Figure 2. Example of a collision between system clusters.

Figure 3. Example of transmitting data packets with collision or no collision at the same and different times.

Figure 4. Tackling the hidden and exposed node challenges in UWCNs.

Figure 5. Multi-cluster underwater wireless sensor network.

Figure 6. Flow chart for our proposed system communication performance.

Figure 7. Average network throughput vs. no. of nodes.

Figure 8. Network throughput vs. traffic load.

Figure 9. Average network delay vs. number of nodes.

Figure 10. Different traffic loads vs. average delay.

Figure 11. Average traffic time vs. no. of nodes.

Figure 12. Comparison of channel utilization and slot size.

Figure 13. Comparison of channel utilization and slot size.

Figure 14. Comparison of Q-value and action.

Figure 15. Comparison of channel utilization and episode.

Figure 16. PDR vs. average traffic load.

Figure 17. PDR vs. number of nodes.

Figure 18. Collision vs. avg. traffic load.

Figure 19. Energy consumption vs. number of nodes.

Figure 20. Energy consumption vs. offered load.

Figure 21. Energy efficiency vs. avg. traffic load.

Figure 22. Evaluation of the convergence values for Q-learning algorithm.

Figure 23. Channel utilization vs. time blocks.

Figure 24. Channel utilization vs. number of nodes.

Figure 25. Energy consumption vs. number of nodes.

Table 1. Comparison between different existing MAC protocols and the present research.

Ref.	Collision Rate	Energy Efficiency	Throughput	Latency	Adaptability
[21]	Reduced through Q-learning and multi-hop approach.	Improved by optimizing path selection via Q-learning.	Higher compared to traditional schemes.	Mitigates latency, but not fully optimized.	Adaptive via Q-learning based on real-time data.
[22]	Low collision rate through hybrid optimization techniques (COOT-HOA, PSO-ACO).	High energy efficiency through optimized base node selection and long short-term memory network (LSTM) predictions.	High throughput due to optimized data routing and active zone selection.	Moderate latency, optimized via signal parameter estimation (ESPRIT).	Highly adaptable with deep learning LSTM for mobile node prediction.
Present research	Significantly lower due to robust efficient design.	Enhanced through robust, more advanced mechanisms.	Achieves even better throughput due to reduced collisions.	Further reduction in latency with robust design of new algorithm.	More adaptable, with robust flexible and dynamic approach.

Table 2. Simulation parameters.

Parameter	Value
No. of clusters	4
Underwater sound speed	1500 m/s
Communicating data packet	1000 bits
Control data packet	60 bits
Rx node range	6.5 km
Channel range	6.5 km
Bit error rate	2200 bps
Radius of CH	6.5 km
Size of RTS/CTS	220 bits
$SINR_{dTH}$ value	22 dB
Total simulation time	23,000 s
Power of transmission	2 watts
Power of receiver	0.75 watts
Gross power	8.5 mW
Bandwidth	10 Kb/s
Traffic rate	0.05 to 0.4 packets per s
Running rounds	20
Simulation area	600 × 600 m
No. of nodes	50 to 200
No. of buoys	Uniform: 14 × 14 grid (approx.)
Learning rate	0.01
Offset time step	6 ms

Table 3. Throughput analysis of different techniques.

Scenario	Proposed System	CAPC-MAC	CSMA-MAC	SFAMA-MAC	T-LOHI-MAC
Mobile network	127%	110%	112%	107%	103%
Static networks	108%	92%	94%	90%	91%
Large network (mobile)	125%	107%	103%	112%	119%
Large network (static)	106%	77%	90%	88%	79%


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
