1. Introduction
Medium access control (MAC) is one of the key requirements in underwater acoustic sensor networks (UASNs), garnering a major interest in the research community [
1,
2,
3]. As an analogue of terrestrial sensor networks, UASNs are envisaged to enable a multitude of civilian and military applications [
4,
5,
6]. To advance these applications, sensor nodes are being developed to be small/compact for easy transport, given that the environment is characteristically challenging to access. There is interest in new sensor nodes being energy efficient for longer deployments; as currently, there is no viable energy harvesting technology. Nodes should also be inexpensive to lower the overall cost, since UASNs are envisaged to be deployed to cover substantial marine areas and require a large number of devices. Employing acoustic waves in UASNs imposes some unique channel-centric constraints, such as: limited distance and frequency dependent capacity (bandwidth and data rate), long and variable propagation delay and high bit error rate (BER) on the design of UASNs [
2,
4,
7]. As such, there is growing demand for efficient MAC solutions, especially adaptive MAC protocols for practical networks in the highly dynamic underwater environment.
Although preliminary studies on adopting existing MAC techniques/schemes from the vast body of work on terrestrial MAC protocols to underwater networks was largely found to be ineffective [
1,
8], the insight from the underlying principles remains useful. As a general guide, the network topology gives an insight into the appropriate category of MAC scheme to employ, with contention-free and contention based schemes better suited to centralised and decentralised topologies, respectively. Centralised topologies typically facilitate schedule creation and coordination from a central controlling node. Therefore, uncoordinated channel access becomes too contentious and less efficient. On the other hand, in a decentralised topology, such coordination is prohibitively challenging to implement, and the limited resources make contention-free protocols inefficient.
Code division multiple access (CDMA) and frequency division multiple access (FDMA) are promising contention-free schemes considered for UWASNs [
9,
10]. CDMA assigns unique binary codes to users (nodes) to spread the information signal, thereby offering the complete frequency band to nodes for simultaneous transmissions. Frequency hopping and direct sequence spread spectrum (FHSS and DSSS, respectively) are the standard modulations employed in this scheme. FDMA splits the channel into distinctive frequency bands and assigns them to different users. In this way, users can initiate concurrent transmissions without incurring collisions [
5,
10]. While the radio bandwidth (GHz) enables the implementation of these schemes with relative ease, in UANS, the available bandwidth is very limited (kHz).
Time division multiple access (TDMA) [
11] creates schedules by splitting time into slots and is the most promising contention-free approach used in UASNs, because of its flexibility and potential to achieve true collision-free scheduling. Despite the challenges of synchronisation, some solutions leverage the long propagation delays for spatial reuse to improve performance. A gateway node in [
12] creates a gap-free schedule and then requests packets from the transmitting nodes. Other solutions incorporate sleep cycles between activities to save energy [
3]. The solution in [
13] is for a central node to use an initialisation stage to gather network-wide information, which is then optimised using genetic and particle swarm algorithms to create a collision-free schedule. However, the lack of complete knowledge of the environment poses a major challenge for creating a lasting collision-free schedule.
Contention based MAC protocols such as ALOHA [
14] and its variants offer low complexity and simplicity of implementation. The downside is that contention based protocols suffer low utilization and prohibitively large end-to-end delay at high loads due to the blind transmission strategy. The works in [
15,
16] integrated additional guard times between successive transmissions in order to reduce collisions, and reference [
17] demonstrated receiver initiation (RI) to improve the performance. In RI, the receiver makes the first move of initiating the data transfer session by sending a request packet to the transmitter(s) (essentially polling). Since collisions occur at the receiver, the RI approach aims to eliminate the most common source of collision (transmit-receive collision). All these approaches add to the complexity, and the overheads incurred by the control packets limit the achievable utilisation.
A popular technique is to incorporate both contention based and contention-free components to form hybrid MAC protocols. This strategy improves performance by allowing networks/devices to switch to an optimum MAC scheme based on demand or traffic profiles. Variations in traffic were addressed in [
18], where the protocol was pre-configured to assign capacity either by free assignment or on demand, and reference [
19] balanced performance with two time slots in a frame, one slot for scheduled transmissions and the other for random access.
In the highly dynamic underwater environment, MAC protocols need to be adaptive to changing conditions as well. This is because previous assumptions used to create schedules may be outdated or sub-optimal due to changes in topology, traffic, node(s) failure(s) and/or addition(s). Reinforcement learning is a promising solution used in MAC protocols to provide adaptability and robustness in wireless sensor networks, such as ad-hoc emergency networks for disaster monitoring [
20,
21]. In such networks, intelligent MAC protocols will adapt to the changing topology or the environment. Instead of switching between MAC schemes, reinforcement learning is used to continually assess the network condition through feedback and appropriately responds with a view towards maintaining (as much as possible) a collision-free schedule.
In [
21], we studied the use of ALOHA-Q [
20] underwater. ALOHA-Q is a MAC protocol originally developed for terrestrial wireless sensor networks. It employs a Q-learning algorithm to incorporate intelligence into framed ALOHA. The frame is created with a predetermined number of periodic fixed time slots. Each slot is structured such that it accommodates a data packet, an ACK packet and their corresponding one hop propagation delays (Figure 2). Initially, nodes randomly select and transmit in any slot, but eventually, each node settles on a collision-free slot as the underlying Q-learning reward/punishment serves to reinforce successful slots. However, because the ACK serves as the critical signal for the reward/punish mechanism in the Q-learning algorithm, the overhead with respect to the slot size due to the long propagation delay severely constrains the effectiveness of the Q-learning strategy in terms of achievable utilization and end-to-end delay. In
Section 3.2, we demonstrate the Q-learning update mechanism and how it is applied in the ALOHA-Q protocol.
The focus of this paper is to implement a robust, simple and computationally inexpensive MAC protocol that consistently and efficiently delivers the maximum channel utilisation in a monitoring chain UASN, such as an underwater pipeline. To achieve that, we were inspired by the research in [
20,
22,
23,
24]. For reference,
Table 1 describes the terms/symbols used in this paper.
Our specific contributions are:
To provide some background work on the feasibility of restructuring the slot size in a typical frame-based underwater MAC protocol to improve network performance.
To propose a new slot structure with minimal overhead based on the relationship between packet transmission duration and the one hop propagation delay that is capable of achieving the theoretical channel utilization.
To propose ALOHA-QUPAF, a novel dual-control intelligent approach to medium access control based on packet(s) flow in a linear chain network.
The rest of the paper is structured as follows.
Section 2 introduces the frame based approach of the MAC protocol design and the network model.
Section 3 presents the proposed slot size, the analytical modelling, and discusses the simulation results as compared to the theoretical results. It is followed by
Section 4, our detailed dual-control intelligent MAC scheme, and the results obtained when applied to varying lengths of chain networks.
Section 5 discusses the simulation results obtained of our proposed protocol. Finally, in
Section 6 we draw conclusions.
2. Frame Based MAC Protocol
In this section, an overview is given of the fundamental operation of a baseline frame based random access protocol. With the aid of a simple network model, we analyse and identify the limitations of frame based scheduling (in terms of achievable channel utilization) with a random access scheme.
Framed ALOHA is one of the baseline protocols we compare against our proposed intelligent scheme. In contrast to slotted ALOHA, whereby time is divided into slots and nodes can only transmit at the beginning of each slot, a frame is used in framed ALOHA, which comprises a fixed number of contiguous slots . In the framed ALOHA random access strategy, each node independently and randomly chooses one of the transmission slots at the beginning of each frame.
Typically, a slot is structured such that it accommodates: a data packet of duration (
), an acknowledgement packet of duration (
if required), the associated propagation delays of each packet (
) and a small guard band (
): the band is essential to correct and guard against drifts in clock precision and synchronisation. The slot structure is shown in
Figure 1, for cases with and without acknowledgements. Whereas, in radio networks, the overheads due to the wait period between successive data transmissions in a slot/frame can be of negligible length with respect to the packet duration, in an underwater acoustic channel however, the physics impose a long propagation delay, plus low capacity (bandwidth and therefore data rate), making the overheads significant, thus negatively impacting the channel utilization and end-to-end delay.
Defining the channel utilization (U) as the rate of delivering data at the designated sink node (Equation (
1)), then, in frame/slot based protocols, the utilization is also a function of the number of slots (
) in the frame. For example, if a node is allowed to transmit
N packets per frame, then the maximum effective utilization at the sink is going to be upper bounded at
. The value of
is determined from the topology and interference population of the network. Setting
inappropriately will negatively affect not just the utilisation, but potentially the stability of the MAC protocol as well. For example, in a star topology,
is equal to the number of transmitting nodes (
); as each node should have a unique transmitting slot, setting
adds extra un-utilised slot(s), and
will cause contention as some nodes will not exclusively own a slot. Therefore, for a particular topology and interference model, there is an optimum
(
) [
20]. Erlang [
25] is a dimensionless unit that represents continuous channel usage (for example 0E = zero channel activity, 0.5E = half channel activity and 1E = full channel usage).
therefore, the optimum utilization is:
where
S,
denote the slot duration and packet duration in seconds respectively.
One of the consequences of having low capacity is the long transmission duration, which presents two situations for a given transmitter and receiver pair: the transmission duration is either greater than or less than the propagation delay between the nodes. Following [
26], if we introduce the parameter
(Equation (
3)), then the resulting slot structure can have either of two sets of transmission-reception patterns: overlapping and non-overlapping based on the value of
, as shown in
Figure 1.
and
represent the slots’ length with ACK and are typically used by slotted protocols employing an ACK signal such as ALOHA-Q. Similarly,
and
are the slots without ACK as used in framed ALOHA and TDMA. Equations (
4) and (
5) are used to calculate the slot sizes.
In this slotted concept, nodes are allowed to transmit only one packet per frame (i.e., N = 1), and the expression of maximum utilisation (U) can be simplified to the ratio of packet duration-to-frame size (Equation (
6)). We can combine Equations (
2) and (
6) to calculate the expression of the utilisation below:
As
, Equation (
6) approximates to:
From Equation (
7), it can be seen that, since
and
dominate, the value of
will guide us on how to improve channel utilisation by restructuring the slot size. For
, we are constrained with respect to any change to the slot size. Any reduction will create overlapping slot reception that will effectively render the slotting meaningless, as demonstrated with the downgrade of slotted ALOHA to pure ALOHA underwater [
26].
In most UASNs applications, the propagation delay is longer than the transmission duration because of sparse connectivity. Therefore,
best describes such scenarios. We propose the slot structure in
Figure 2. The slot size is now reduced to approximate the propagation delay (
), which is possible since with
, the data packet can be safely accommodated in
. This simple slot structure aims to reduce and fill the otherwise wide gap in the conventional slots with useful data (compared to
Figure 1). Therefore, for a given chain UASN, designed with nodes separated by a
dm transmission range, we demonstrate that there are advantages to the performance improvements of using our slot structure; for example, the peculiar characteristic of the underwater communication channel in terms of its distance dependent capacity, that is the acoustic transmission bandwidth and data rates decrease with increasing transmission distance [
27]. As such, instead of a few hops transmitting over longer ranges (requiring high power) with low capacity, we can potentially achieve higher capacity transmissions with additional hops added to route data over shorter ranges (low power). To investigate the achievable utilisation, the slot structure shown in
Figure 2 is based on
: a special case of
. This is purely to limit the overhead in the slot, as increasing the slot size beyond
negatively affects the utilisation according to Equation (
6).
Scenario and Network Model
Consider a scenario comprising quasi-stationary equally spaced nodes in an N hop underwater network chain topology, with data delivered along the chain from one end to the other.
Figure 3 depicts an example of such a network with N = 4 and hop distance
d. This topology is representative of pipeline monitoring. As such, during the reporting cycle, the network can be considered loaded to capacity; accordingly, this work is primarily concerned with the achievable utilisation. To aid the analysis, the following assumptions are made:
All nodes are homogeneous and communicate over a single channel, half-duplex mode.
The collision model (non-capture) is used, i.e., if two or more packets overlap at the receiver, they are discarded.
Nodes are globally synchronised, an assumption commonly employed to simplify analysis and applicable to quasi-stationary nodes synchronised before deployment.
The interference range (Ifx) is twice the reception range (Rx); this model is typically employed for chain networks as an illustrative model to incorporate the effect of interference from nodes that are two hops away.
A source node has saturated traffic, i.e., always has a packet to send, to provide the maximum monitoring rate based on the transmission opportunities offered by the MAC layer. Similar research papers are concerned with achievable utilization [
23,
28,
29].
All source/relay nodes can only transmit one packet per frame, a consequence of Assumption (4) yielding a frame consisting of four slots [
20], as only one of four connected nodes can transmit successfully at a given time.
We re-write Equation (
7) of
to get the new utilisation for the proposed slot structure:
and in terms of
, it becomes:
In summary, while the traditional slot structure that incorporates the propagation delay and/or ACK packet within the constraints of the available channel resources, we show that with
, the propagation delay is sufficient to accommodate the data packet, then it is possible for the slot size to be effectively reduced and restructured (by at least 50% of the cases in the
regime), and as long as a protocol does not require an ACK packet, there is a potential for a dramatic improvement in performance (Equation (
9) vs. Equation (
7)).
3. Model Analysis
To analyse the network with the proposed slot structure (
Figure 2), we consider a baseline scheme whereby each node initialises by randomly choosing a transmission slot. The purpose of considering this scheme is first to demonstrate the inefficiency of a random access scheme by analysing the distribution of the achievable channel utilization, second to investigate the feasibility of applying intelligent techniques to the model that could lead to a significant performance improvement and, finally, to evaluate the efficacy of the proposed slot structure coupled with the intelligent techniques relative to similar intelligent approaches and random access baseline schemes.
To build the frame, we start with the optimal number of slots per frame
. In a linear chain network (such as
Figure 3 and longer,)
is four as computed according to the two hop interference model [
20]. This is because in a linear topology with two hop interference model, technically only one in four nodes can successfully transmit at a given time. Similarly, for one hop and three hop interference models, one in three and one in five nodes can transmit successfully [
20,
23]. Therefore, for a distributed MAC protocol, such as framed ALOHA employed in this setup, each node is free to chose any of the available four slots in the frame, resulting in
ways for nodes to independently select and occupy transmission slots.
Table 2 lists the range of the 256 possible slot combinations in a four column array of 64 unique patterns, with each column vector signifying the transmission slot pattern from Node 0 to Node 3. That is, the vector
denotes all nodes selecting and occupying Slot 0; likewise, slot sequence
signifies both Nodes 0 and 1 choosing Slot 2, while Nodes 2 and 3 choose Slot 1 and Slot 0, respectively. Pictorial timing depictions (see
Appendix A) are employed to observe and obtain the theoretical bounds of the scheme in terms of channel utilisation. The diagrammatic method provides a visual intuition of our core idea. In
Appendix A, six examples (
Figure A2,
Figure A3,
Figure A4,
Figure A5,
Figure A6 and
Figure A7) are provided to illustrate the process. For each pattern,
is the source node; it generates and transmits data in every frame to
, which forwards the packet (if successfully received) to
in the next frame, and so on. Overall, individual packets are traced frame-by-frame as they traverse the network from source to sink (
). The final utilisation is measured when an overall periodic pattern emerges at the sink node (vertical red lines in each example figure; refer to
Appendix A).
3.1. Results
In order to empirically evaluate the performance of the above random access scheme, we ran a simulation on a network of five nodes (
Figure 3) configured with the proposed slot structure analysed in
Section 3. Each node is pre-configured to run a MAC protocol that randomly selects and maintains a transmission slot at the beginning of each simulation run. It should be noted that in this simulation, since
, the transmission delay and propagation delay are abstracted to 1:1 for the best results.
Figure 4 shows and compares the utilisation results from both the analytical distributions of the slot patterns and the simulations. Overall, there are three dominant utilisation levels and some spurious intermediate levels, as summarised in
Table 3. The summary provides individual proportions of levels in each plot, and the overall column is the contribution of each sequence in the combined set of 256 slots.
Depending on the chosen slot by the source node, transmissions could be initiated from either the frame edge (Slots 0 or 3) or mid-frame (Slots 1 or 2), and to some degree, the results show how the position of a chosen slot affects the utilisation. As shown in the result summary (
Table 3), there is a subtle, but clear advantage in performance when the source node initiates transmissions with emerging slot patterns at frame edges (i.e., SEQ_0XXX, SEQ_3XXX) relative to the mid frames (i.e., SEQ_1XXX, SEQ_2XXX) or there is at least an 8% better chance of getting a packet received at the sink node when the source node transmits at the edges of a frame as compared to when source node uses mid frame (in terms of the worst case utilisation levels).
Intuitively, the distribution of the utilisation of the patterns can be assumed to be similar, since it can be demonstrated that each column sequence can be translated to another corresponding sequence in the remainder of the columns (
Table 2). However, due to the transmission strategy of the protocol of scheduling packet transmission at the beginning of each frame, the simple slot structure guarantees that packets transmitted at
be received at
. This means sequence translations will result in packet reception/interference across frames, consequently causing the distribution of the utilization outcomes to vary. For example, consider the corresponding slot selection sequences: [ 0 0 3 0 ], [ 1 1 0 1 ], [ 2 2 1 2 ] and [ 3 3 2 3 ]. [ 0 0 3 0 ] and [ 3 3 2 3 ] both have cross-frame receptions and have a similar utilization of 0.125 Erlangs (
Figure A7). In contrast, [ 1 1 0 1 ] and [ 2 2 1 2 ] have no cross-frame reception and yield 0 Erlangs (
Appendix A:
Figure A2). Only 60 out of the total 256 slot sequences yield the maximum utilization level as a whole and remain immune to the slot sequence translations because they are perfectly collision-free. In
Figure A2,
Figure A3,
Figure A4,
Figure A5,
Figure A6 and
Figure A7 (
Appendix A), we show how we computed six of the ten prominent utilisation levels for brevity.
The simulation results are in agreement with our analytical results, as they show that no data is delivered 48% of the time. This corresponds to the average of the possible 43–53% worst cases in the given original slot patterns, as expected. Most importantly, the simulation result confirms that the full channel utilization is achievable with the exact proportion of 23%. Finally, the simulation result shows the average performance of the random slot selection protocol and will serve as a baseline with which to demonstrate the merit of slot based learning in the new protocol ALOHA-QUPAF.
3.2. Q-Learning
This section demonstrates the underlying Q-learning update procedure based on stateless Q-learning [
30]. Following the standard Q-learning framework, an agent learns how to behave in an unknown environment by interaction with the environment. The agent perceives and changes the state of the environment by taking an action and receives a response from the environment, which indicates the quality of the action taken in the form of a reward/punish signal. This process is Markovian, and it is modelled as an MDP [
30,
31,
32]. To develop a MAC protocol, this is translated to a node taking the action of transmitting the data packet, and the successful/unsuccessful reception of an ACK packet represents the reward/punish signal. Each node is given a vector of Q-values (Q-table), and each Q-value is in turn assigned to one slot in the frame (
Section 1). At the beginning of each frame, a node will scan the Q-table and select the slot with the highest Q-value to schedule transmission in that slot. Successful transmissions are rewarded and unsuccessful transmissions punished based on the reception or otherwise of an ACK packet and updating the Q-value of the transmission slot using Equation (
10).
where
,
and
denote the Q-value of the current slot, the learning rate (0.1) and the reward/punish signal (±1).
Table 4 illustrates an example of the Q-learning as implemented in ALOHA-Q. Consider an initial situation (Frame 0) whereby a node i with data to send randomly chooses Slot 2 (because all slots have equal Q-values) at the beginning of a frame to schedule transmission and the transmission was unsuccessful.
The new Q-value of Slot 2 becomes;
; [−0.1]
In the next frame, Slot 2 has the lowest Q-value and is not considered, and the node again chooses Slot 1 randomly (among Slots 0, 1 and 3). Following a successful ACK reception, the new Q-value of Slot 1 is updated.
; [0.1]
For Frame 2, the node chooses Slot 1 as it has the highest Q-value (0.1) and sends data; with successful ACK reception, the Q-value is updated accordingly.
; [0.19]
The table gives the Q-values up to twenty frames assuming Slot 1 continues to be successful. This simple, yet effective recursive Q-learning update bootstraps the trial-and-error mechanism to a robust collision-free schedule as each node will eventually and independently occupy a unique transmission slot.
However, as previously stated, while the ACK signal is crucial to the Q-value update operation, it puts an additional burden on the scarce network resources underwater: reducing utilisation due to overheads and increased delay due to the ACK signal wait times. Our goal is to implement a novel Q-learning approach that maintains the level of intelligence without this explicit ACK signal, thereby maximising the channel utilisation and improving end-to-end delay.
5. Discussion
Figure 5 and
Figure 6 are the results obtained when the network was simulated on four and eight hop networks, respectively. The figures compare the performance of ALOHA-QUPAF with ALOHA-Q and framed ALOHA. This comparison is particularly important as the protocols share similar reception conditions in the
scenario; transmission and reception occur in the same slot (
Figure 1). Evidently, in this setup, both ALOHA-QUPAF and ALOHA-Q are dramatically affected as the network size increases (four hops to eight hops). The maximum utilisations of ALOHA-QUPAF (0.217 Erlang) and ALOHA-Q (0.191 Erlang) are both sharply halved for about 40% and 58% of the simulated cases, respectively. This performance drop can be explained by the presence of the hidden node phenomenon [
38,
39]. This is simply the situation whereby a particular communication between any two nodes is interfered by another transmission within range of the receiver.
Figure 7 depicts the hidden node problem in an eight hop chain network, in a situation whereby both
and
share the same transmission slots; thus, transmission from
to
will be periodically interfered by
transmitting to
, as packets are relayed along the chain. The effect of the hidden node problem as the reason for the performance degradation is confirmed by the agreement shown in the simulation results obtained when the interference range (
Ifx) is reduced from two hops to one hop in the eight hop chain (
Figure 6) with the results in the four hops network (
Figure 5). This is because, in a two hop interference range model, a four hop range chain network is of insufficient length for the issue to manifest. Mitigating the hidden node issue is a subject of further work. Another important metric worth mentioning is the end-to-end delay; however, it is not presented here, since ALOHA-QUPAF does not implement packet retransmissions. Therefore, neglecting any processing and queuing delays in the nodes, the E2E delay is fixed as a function of the number of hops in the network. The simulations show that ALOHA-QUPAF achieves 0.124 Erlangs at its worst and 0.248 Erlangs at its best, outperforming both ALOHA-Q (0.19 Erlangs best) and framed ALOHA (0.069 Erlangs) respectively by at least 13% and 148% in all simulated scenarios.
Figure 8 presents the performance of ALOHA-QUPAF with our proposed slot structure (
Figure 2) in the
scenario. To demonstrate how the ALOHA-QUPAF protocol is affected by the network length, we extend the range to 16 hops and evaluate its performance. The results show a subtle drop in the overall performance from four to 16 hops. The decrease in performance is attributable to the increase in the hidden node spots (bottlenecks points) and the time needed for the protocol to find a collision-free schedule as the network size increases. Each time a node switches to a different transmission slot, this will have a ripple effect across the neighbouring nodes, causing others to potentially switch slots as well, essentially resetting the process. Despite a lack of an explicit acknowledgement signal, the protocol demonstrates significant performance improvement with more than 90% of cases achieving 0.24 Erlangs for networks in the 4–12 hop range and 80% for the 16 hop range.