AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning

Bu, Fanfeng; Luo, Hanjiang; Ma, Saisai; Li, Xiang; Ruby, Rukhsana; Han, Guangjie

doi:10.3390/s23020578


by Fanfeng Bu ¹, Hanjiang Luo ¹,*, Saisai Ma ¹, Xiang Li ¹, Rukhsana Ruby ² and Guangjie Han ³

¹ College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China

² College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

³ Department of Information and Communication Systems, Hohai University, Changzhou 213022, China

* Author to whom correspondence should be addressed.

Sensors 2023, 23(2), 578; https://doi.org/10.3390/s23020578

Submission received: 27 November 2022 / Revised: 20 December 2022 / Accepted: 29 December 2022 / Published: 4 January 2023

(This article belongs to the Section Sensor Networks)


Abstract

Mobile data collection assisted by autonomous underwater vehicles (AUVs) in underwater wireless sensor networks (UWSNs) has received significant attention because of the mobility and flexibility of AUVs. To satisfy the increasingly diverse requirements of underwater data collection applications, such as time-sensitive data freshness, emergency event security and energy efficiency, in this paper we propose a novel multi-modal AUV-assisted data collection scheme that integrates acoustic and optical technologies and exploits their complementary strengths in communication distance and data rate. In this scheme, we consider the age of information (AoI) of the data packets, the node transmission energy and the energy consumed by AUV movement, and we trade these off to retrieve data in a timely and reliable manner. To optimize these objectives, we leverage a deep reinforcement learning (DRL) approach to find the optimal motion trajectory of the AUV while selecting suitable communication options. In addition, we design an optimal angle steering algorithm for AUV navigation under different communication scenarios to further reduce energy consumption. We conduct extensive simulations to verify the effectiveness of the proposed scheme, and the results show that it can significantly reduce the weighted sum of AoI and energy consumption.

Keywords:

autonomous underwater vehicles; optical–acoustic multi-modal communication; data collection; path planning; deep reinforcement learning

1. Introduction

Driven by the increasing demand for ocean exploration and protection, underwater wireless sensor networks (UWSNs) have received growing attention because they play an important role in diverse marine applications, such as coastal monitoring and protection, marine resource exploration, disaster warning and military operations [1,2,3,4]. However, due to the harsh hydrographic and geographical environment, it is difficult to collect data from underwater sensor devices via long-range routing paths. Even if the monitored data can be transmitted through multi-hop routing, nodes near the sink may carry a heavy relay workload and incur extra energy consumption [5]. Furthermore, because the battery power of underwater sensor nodes is severely limited and difficult to recharge underwater, uploading large volumes of ocean monitoring data directly to the sink is not energy-efficient. Moreover, for marine security operations, it is preferable to collect confidential data close to the monitoring sensors. To solve these problems, AUVs, whose data storage and signal processing capabilities have developed rapidly in recent years, can be used to enable underwater mobile data collection. In addition, the durability and mobility of AUVs alleviate the unbalanced energy consumption of underwater sensors [6,7].

To collect data efficiently, various underwater communication technologies, such as acoustic and optical communication, have been investigated [8]. Although underwater acoustic communication (UAC) is currently the most widely used technology owing to its unique advantages (e.g., long communication range), it is limited by its shortcomings (e.g., low bandwidth, slow propagation speed, high bit error rate and large delay) [9]. To address these issues, underwater optical communication (UOC) has emerged as an alternative solution, as it offers a higher propagation speed ($2.255 \times 10^{8}$ m/s) and a higher data rate (up to hundreds of Mbit/s) over short to medium ranges [10,11]. As both acoustic and optical communication have their pros and cons, employing multi-modal underwater communication systems in UWSNs has become a promising approach to improve network performance [12,13].

To facilitate mobile data collection in such multi-modal networks, the varying requirements of marine applications must be satisfied by exploiting the advantages of AUVs [14]. AUV-based multi-modal data collection can be divided into two categories: acoustic multi-modal and acoustic–optical multi-modal. In acoustic multi-modal data collection, the sensor node transmits control information using low-frequency acoustic waves to guide the AUV to the designated area, and then switches to a high-frequency UAC modem to transmit the data [15]. In this case, the high energy consumption of UAC shortens the lifetime of the sensor node when it transmits large volumes of network data. In contrast, in acoustic–optical multi-modal data collection, UAC provides long-range guidance that allows the AUV to approach the sensor node and assists with alignment for optical communication. The subsequent short-range UOC data transmission not only improves data transmission efficiency but also saves transmission energy [16]. However, because of the limited UOC range, the AUV must slowly approach the sensor node to establish a reliable optical link, which increases the traveling time. A promising solution to this problem is to engage both UAC and UOC in data collection, e.g., transmitting small volumes of data over long distances using UAC and retrieving and offloading large amounts of data using UOC [17,18].

Although the aforementioned pioneering studies have laid a solid foundation for multi-modal data collection, some issues remain when they are applied to mobile data collection. Firstly, the AUV should adaptively select the best communication technology according to the specific data requirements of marine operations (e.g., data importance and packet size). For example, collecting high-volume data (e.g., 4K high-resolution images) with UOC can prolong the lifetime of the UWSN at the cost of higher AUV energy consumption, whereas when the data volume is relatively small, UAC can be used for remote collection to reduce the energy consumption and travel time of the AUV. Furthermore, the AUV should complete the data collection quickly to guarantee data freshness, as the value of data usually decays over time [19]. The age of information (AoI) is generally used to measure data freshness in mobile data collection scenarios [5,20,21], and optimizing AoI better satisfies the network's requirement for timely data delivery. Therefore, in a multi-modal AUV-enabled mobile data collection scheme, a critical issue is how to optimize the AUV trajectory and select a communication option based on the size and importance of the packets so as to minimize both AoI and energy consumption.

To solve the aforementioned issues, in this paper, we propose an acoustic–optical multi-modal mobile data collection scheme. Based on the type and size of the data, the AUV intelligently searches for the optimal trajectory and communication options using the deep reinforcement learning (DRL) approach, thereby minimizing the AoI and extending the lifetime of the sensor network. To the best of our knowledge, this is the first study that integrates acoustic–optical multi-modal communication options with optimal AUV path planning for reliable and timely mobile data collection using the DRL approach. The main contributions of this study are as follows.

We investigate an AUV-assisted underwater trajectory planning problem for data collection by integrating the complementary advantages of both acoustic and optical communication with data diversity to perform reliable and timely mobile data collection.
We propose a DRL-based AUV-assisted multi-modal mobile data collection scheme in which we consider several key factors, such as data importance, packet size and data collection option, to minimize AoI and reduce energy consumption.
We propose an optimal angle steering algorithm for AUV navigation to reduce energy consumption, in which the steering angle of the AUV is determined based on the AUV and sensor positions as well as the data collection option.

The rest of the paper is structured as follows. We briefly review the related works in Section 2. In Section 3, we introduce the network model with necessary background. In Section 4, we analyze the problem of the multi-modal data collection. In Section 5, we describe the proposed scheme of DRL-based multi-modal data collection in detail. We evaluate the performance of the proposed scheme in Section 6. Finally, we conclude the paper in Section 7.

2. Related Works

In recent years, multi-modal communication has become an active research topic for improving network performance and optimizing data transmission in various marine application scenarios. Commonly adopted multi-modal technologies include acoustic multi-modal communication and acoustic–optical hybrid communication [13,22,23,24]. Acoustic multi-modal communication uses a set of UAC modems working on different frequency bands [13]. In [22], the authors proposed a multi-modal underwater routing protocol based on reinforcement learning, in which the reliability and delay of data transmission are optimized using UAC modems in multiple frequency bands. To exploit the advantages of both UAC and UOC during data transmission, Shen et al. proposed an acoustic–optical multi-modal routing scheme based on packet size and link adaptation, which reduces packet loss and end-to-end delay [23]. However, the challenge of unbalanced energy consumption still exists in multi-hop underwater networks.

AUV-assisted data collection can mitigate the energy consumption imbalance that occurs in multi-hop routing. Han et al. [18] explored the characteristics of underwater acoustic and optical communication in AUV-assisted data collection and showed that hybrid acoustic–optical data collection outperforms collection with a single acoustic modem in terms of both throughput and energy consumption. To cope with the harsh underwater environment, Luo et al. [25] maximized the network throughput by capturing the dynamic characteristics of the channel and the mobility of the AUV. Hu et al. [17] proposed a mobile data collection method for heterogeneous sensor networks in which multi-hop acoustic communication builds an intra-cluster network, and cluster heads (CHs) collect large-scale data and upload them to a mobile receiver via optical communication. Although these schemes improve the efficiency of data collection, they ignore differences in data importance and the decay of data freshness over time.

To handle these issues, Gjanci et al. [16] proposed a greedy adaptive navigation algorithm to guide the AUV during data collection, which considers the decay of data value but is only applicable to sparse networks because of the unavoidably long paths. To deliver emergency data faster, Liu et al. [26] proposed a hybrid data collection scheme in which urgent data are routed using a multi-hop scheme, while delay-insensitive data are collected by AUVs. Duan et al. [27] studied a hierarchical data collection problem using AUVs to optimize the information quality of the collected data while considering the importance and timeliness of events.

More recently, researchers have proposed the concept of AoI to model the timeliness of data while considering the quality of experience (QoE) [28]. Khan et al. [29] provided an optimization algorithm to ensure the freshness of the collected data. Fang et al. [5] used a vacation queuing model to improve data reliability and the peak AoI of the data, in which a UAC link is established when the AUV arrives near the node. Al-Habob et al. [30] proposed a framework to optimize the AUV trajectory and minimize the normalized weighted sum of the average AoI. Wu et al. [31] studied AUV transmission scheduling policies that consider both the age and the importance of messages.

Although the aforementioned approaches have advanced the study of underwater data collection, they consider only a single communication technology. Moreover, none of them leverages multiple data types and multiple communication technologies to improve data freshness and energy efficiency. To address this issue, in this paper, we propose an AUV-assisted acoustic–optical multi-modal data collection scheme, in which we use the DRL method to optimize the AUV trajectory, AoI and energy consumption by considering different communication options and packet sizes.

3. Network Model

3.1. Network Architecture

As shown in Figure 1, we consider an AUV-based multi-modal data collection network in which the deployed nodes are classified by function into ordinary nodes $S = \{s_{1}, s_{2}, \dots, s_{M}\}$, cluster heads (CHs) $CHs = \{c_{1}, c_{2}, \dots, c_{N}\}$ and a sink node. The sensor nodes are statically deployed on the seabed using anchor chains, and their locations are assumed to be known. The CHs perform intra-cluster data fusion and data compression [27] and then wait for the AUV to arrive and collect the data. In particular, during the network formation phase, all sensor nodes are divided into multiple clusters based on spatial distance, and only one node in each cluster is selected as the CH, while the other nodes serve as ordinary nodes for data collection [32]. The AUV performs global data collection by visiting all CHs and finally reports the data to the sink node.

In the multi-modal network, each node is equipped with both UAC and UOC modems, and all nodes have the same initial energy, sensing and communication capabilities. Specifically, each node has an acoustic modem that exchanges data at a low rate over long distances and an optical modem with a relatively short transmission range and a high data rate [33]. The AUV has similar communication capabilities to ensure data transmission [16]. Without loss of generality, we assume that data arrivals at each sensor node follow a Poisson distribution with parameter $λ$. When the AUV visits $c_{i}$, the CH packages the sampled data block into a packet of length $B_{i}$ with timestamp $T_{i}$.

3.2. Node Clustering Phase

We assume that the nodes are randomly deployed in the target area to monitor the underwater environment and are then clustered. In the initial phase, the sink node knows the location of each node and determines the number of clusters based on the network size, and the target area is then divided equally into several square sub-regions. The sink node broadcasts the sub-region message to all nodes, and each node determines its own cluster identifier based on its position. Nodes with the same identifier belong to the same cluster [26].
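The grid-based clustering step can be illustrated with a short sketch. It assumes a square monitoring area with its origin at one corner and a hypothetical `cells_per_row` parameter giving the number of sub-regions per side; the helper names are illustrative, not taken from the paper.

```python
def cluster_id(x: float, y: float, side: float, cells_per_row: int) -> int:
    """Map a node position to the index of the square sub-region containing it.

    Hypothetical helper: the area is a side x side square split into
    cells_per_row x cells_per_row equal cells, numbered row by row.
    """
    cell = side / cells_per_row
    col = min(int(x // cell), cells_per_row - 1)  # clamp nodes lying on the far edge
    row = min(int(y // cell), cells_per_row - 1)
    return row * cells_per_row + col


def group_by_cluster(positions, side, cells_per_row):
    """Group node names by cluster identifier (nodes sharing an id form one cluster)."""
    clusters = {}
    for node, (x, y) in positions.items():
        clusters.setdefault(cluster_id(x, y, side, cells_per_row), []).append(node)
    return clusters


nodes = {"s1": (10.0, 20.0), "s2": (900.0, 50.0), "s3": (30.0, 40.0), "s4": (1000.0, 1000.0)}
print(group_by_cluster(nodes, side=1000.0, cells_per_row=2))
# {0: ['s1', 's3'], 1: ['s2'], 3: ['s4']}
```

Because every node can compute its identifier locally from the broadcast sub-region description, no further message exchange is needed to form the clusters.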

CHs are selected to perform intra-cluster data collection and to communicate with the AUV. CH selection proceeds as follows. First, the numbers of optical and acoustic communication neighbors of each node in the sub-region are obtained; then, the node with the largest number of optical communication neighbors whose remaining energy satisfies the energy threshold is selected as the CH. This operation is repeated until the CHs of all sub-regions in the target area are determined. Finally, the sink sends a confirmation packet to the designated CHs. At the end of each data collection round, all CHs are evaluated, and if the energy of a CH falls below the energy threshold, the network performs a new round of CH selection.
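A possible sketch of the CH selection rule for one sub-region follows, assuming 2D node positions, a single optical range threshold and hypothetical names (`Node`, `select_cluster_head`). Ties in the number of optical neighbors are broken by remaining energy, which is an assumption not stated in the procedure above.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    x: float
    y: float
    energy: float  # remaining energy (J)


def select_cluster_head(members, optical_range, energy_threshold):
    """Return the member with the most optical neighbours among those whose
    remaining energy meets the threshold, or None if no member qualifies.

    Tie-breaking by remaining energy is an assumption of this sketch.
    """
    def optical_neighbours(n):
        return sum(1 for m in members
                   if m is not n and math.hypot(n.x - m.x, n.y - m.y) <= optical_range)

    eligible = [n for n in members if n.energy >= energy_threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (optical_neighbours(n), n.energy))


cluster = [Node("a", 0, 0, 80.0), Node("b", 40, 0, 95.0),
           Node("c", 80, 0, 90.0), Node("d", 300, 0, 99.0)]
print(select_cluster_head(cluster, optical_range=50.0, energy_threshold=85.0).name)  # b
```

Node `a` has as many optical neighbors as `c` but is excluded by the energy threshold, and `b` wins because it reaches both `a` and `c` optically.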

3.3. Acoustic Data Collection Link

When the AUV passes near a node, a communication link must be established for data collection. For the acoustic link, acoustic waves are attenuated by absorption in the medium and by scattering from impurities in the water. The path loss of the underwater acoustic channel depends on the frequency $f$ and the distance $d_{ac}$, and the total attenuation is given by [34]

$A(d_{ac}, f) = d_{ac}^{k}\, a(f)^{d_{ac}},$

(1)

where $k = 1.5$ is the spreading factor describing the geometric propagation loss (practical spreading), and $a(f)$ is the absorption coefficient in dB/km given by Thorp's formula [35], with $f$ in kHz:

$10 \log a(f) = 0.11 \frac{f^{2}}{1 + f^{2}} + 44 \frac{f^{2}}{4100 + f^{2}} + 2.75 \times 10^{-4} f^{2} + 0.003.$

(2)
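Because (1) is the product of a spreading term and an exponential absorption term, it is convenient to evaluate it in dB, where it becomes a sum. The sketch below assumes $f$ in kHz and $d_{ac}$ in km, so that both terms match the dB/km unit of (2); the function names are illustrative.

```python
import math


def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Eq. (2): Thorp absorption coefficient 10*log10(a(f)) in dB/km, f in kHz."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003


def acoustic_attenuation_db(d_km: float, f_khz: float, k: float = 1.5) -> float:
    """Eq. (1) in dB: 10*log10(A) = k*10*log10(d) + d*10*log10(a(f)).

    Assumes d is in km, matching the dB/km absorption coefficient.
    """
    return k * 10 * math.log10(d_km) + d_km * thorp_absorption_db_per_km(f_khz)


for f in (10, 25, 50):
    print(f, round(thorp_absorption_db_per_km(f), 3), round(acoustic_attenuation_db(2.0, f), 2))
```

The absorption term grows quickly with frequency, which is why long-range acoustic links are confined to low carrier frequencies and therefore to low data rates.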

Consequently, given the acoustic transmit power $P_{trans}^{ac}$ and frequency $f$, the signal-to-noise ratio (SNR) can be expressed as [36]

$SNR_{ac}(d_{ac}, f) = \frac{P_{trans}^{ac} / A(d_{ac}, f)}{N(f)\, Δf},$

(3)

where $N(f)$ is the power spectral density of the total ambient noise, which comprises four components (turbulence, shipping, wind-driven waves and thermal noise), and $Δf$ is the receiver bandwidth. Therefore, the transmit power required to achieve the minimum SNR $SNR_{min}^{ac}$ of acoustic communication is

$P_{trans}^{ac} = SNR_{min}^{ac}\, A(d_{ac}, f)\, N(f)\, Δf.$

(4)
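In dB, (4) reduces to a sum of the minimum SNR, the attenuation, the noise spectral density and $10 \log Δf$. The sketch below assumes the widely used empirical four-component noise model (turbulence, shipping with activity factor `shipping` in [0, 1], wind speed `wind_mps` in m/s, and thermal noise) with $f$ in kHz; the paper does not give these formulas, so treat them and all parameter values as assumptions.

```python
import math


def noise_psd_db(f_khz: float, shipping: float = 0.5, wind_mps: float = 0.0) -> float:
    """Total ambient noise 10*log10(N(f)) in dB re uPa per Hz, f in kHz.

    Assumed empirical model: turbulence, shipping, wind and thermal components.
    """
    nt = 17 - 30 * math.log10(f_khz)
    ns = 40 + 20 * (shipping - 0.5) + 26 * math.log10(f_khz) - 60 * math.log10(f_khz + 0.03)
    nw = 50 + 7.5 * math.sqrt(wind_mps) + 20 * math.log10(f_khz) - 40 * math.log10(f_khz + 0.4)
    nth = -15 + 20 * math.log10(f_khz)
    # Noise powers add in the linear domain, not in dB.
    return 10 * math.log10(sum(10 ** (n / 10) for n in (nt, ns, nw, nth)))


def required_tx_power_db(snr_min_db: float, atten_db: float,
                         noise_db: float, bandwidth_hz: float) -> float:
    """Eq. (4) in dB: P = SNR_min + A(d, f) + N(f) + 10*log10(Δf)."""
    return snr_min_db + atten_db + noise_db + 10 * math.log10(bandwidth_hz)


n10 = noise_psd_db(10.0)
print(round(n10, 2), round(required_tx_power_db(10.0, 20.0, n10, 4000.0), 1))
```

The attenuation argument can come from any implementation of (1); keeping it as a parameter decouples the power budget from the channel model.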

3.4. Optical Data Collection Link

For the optical link, the path loss $PL$ of the underwater wireless optical link can be expressed as [37]

$PL \approx 10 \log \left( \left( \frac{D_{r}}{2 θ d_{op}} \right)^{2} e^{- c\, d_{op} \left( \frac{D_{r}}{θ d_{op}} \right)^{ζ}} \right),$

(5)

where $D_{r}$ is the aperture diameter of the receiver, $θ$ is half of the transmitter beamwidth, $d_{op}$ is the distance between the transceivers, and $c$ and $ζ$ are the extinction coefficient and a coefficient characterizing the turbidity of the water, respectively. Owing to the limited optical-to-electrical conversion efficiency of the receiver, a minimum received power per bit $P_{rec}^{op}$ is defined. The transmit power of the UOC is then

$P_{trans}^{op} = \frac{P_{rec}^{op}}{PL},$

(6)

where $PL$ is taken here as the linear-scale loss factor, i.e., the argument of the logarithm in (5).
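A small numerical sketch of (5) and (6) follows, assuming SI units, $θ$ in radians and hypothetical parameter values (5 cm aperture, 10° half beamwidth, $c = 0.15\ \mathrm{m}^{-1}$, $ζ = 0.13$). It uses the linear-scale factor inside the logarithm of (5), since (6) divides a power by $PL$.

```python
import math


def optical_path_gain(d_op: float, d_r: float, theta: float, c: float, zeta: float) -> float:
    """Linear factor inside the log of Eq. (5): (D_r/(2θd))^2 * exp(-c d (D_r/(θd))^ζ)."""
    geometric = (d_r / (2 * theta * d_op)) ** 2
    attenuation = math.exp(-c * d_op * (d_r / (theta * d_op)) ** zeta)
    return geometric * attenuation


def optical_tx_power(p_rec_min: float, d_op: float, d_r: float,
                     theta: float, c: float, zeta: float) -> float:
    """Eq. (6), with PL taken as the linear factor above."""
    return p_rec_min / optical_path_gain(d_op, d_r, theta, c, zeta)


# Hypothetical parameters: 5 cm aperture, 10-degree half beamwidth, c = 0.15 1/m, zeta = 0.13.
for d in (5.0, 10.0, 20.0):
    gain = optical_path_gain(d, 0.05, math.radians(10), 0.15, 0.13)
    print(d, round(10 * math.log10(gain), 1), optical_tx_power(1e-9, d, 0.05, math.radians(10), 0.15, 0.13))
```

Both the quadratic geometric spreading and the exponential extinction term shrink the gain as $d_{op}$ grows, so the required optical transmit power rises steeply with distance.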

To ensure robust optical communication, the SNR of the link must satisfy the minimum requirement $SNR_{min}^{op}$ [38]:

$SNR_{min}^{op} = \left[ \frac{P_{trans}^{op}\, e^{- c\, d_{op}}\, D_{r}^{2} \cos φ}{4\, d_{op}^{2} \tan^{2} θ \; NEP} \right]^{2},$

(7)

where $NEP$ is the noise equivalent power and $φ$ is the offset angle between the transceivers. Taking the square root of (7) and rearranging gives $\frac{c\, d_{op}}{2} e^{\frac{c\, d_{op}}{2}} = \frac{c}{4} \left[ \frac{(SNR_{min}^{op})^{\frac{1}{2}}\, NEP \tan^{2} θ}{P_{trans}^{op} D_{r}^{2} \cos φ} \right]^{-\frac{1}{2}}$, so by the definition of the Lambert W function [39], the maximum underwater optical communication distance that satisfies $SNR_{min}^{op}$ is

$d_{op} = \frac{2}{c}\, W\!\left( \frac{c}{4} \left[ \frac{(SNR_{min}^{op})^{\frac{1}{2}}\, NEP \tan^{2} θ}{P_{trans}^{op} D_{r}^{2} \cos φ} \right]^{-\frac{1}{2}} \right).$

(8)
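Equation (8) can be evaluated without a special-function library. The following is a minimal sketch, assuming SI units, radians for $θ$ and $φ$, and hypothetical parameter values; it computes the principal branch $W_0$ by Newton's method (a choice of this sketch; SciPy's `scipy.special.lambertw` would serve equally) and checks that the resulting range makes the SNR in (7) equal $SNR_{min}^{op}$.

```python
import math


def lambert_w0(z: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch W0(z) for z >= 0, solving w * exp(w) = z by Newton's method."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # log(1 + z) >= W0(z), so Newton decreases monotonically to the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def optical_snr(p_tx, d, c, d_r, theta, phi, nep):
    """Eq. (7): SNR of the optical link at distance d."""
    amp = p_tx * math.exp(-c * d) * d_r ** 2 * math.cos(phi) / (4.0 * d ** 2 * math.tan(theta) ** 2 * nep)
    return amp ** 2


def max_optical_range(p_tx, snr_min, c, d_r, theta, phi, nep):
    """Eq. (8): d_op = (2 / c) * W0((c / 4) * ratio^(-1/2))."""
    ratio = math.sqrt(snr_min) * nep * math.tan(theta) ** 2 / (p_tx * d_r ** 2 * math.cos(phi))
    return 2.0 / c * lambert_w0(c / 4.0 * ratio ** -0.5)


# Hypothetical link: 1 W source, 5 cm aperture, 10-degree half beamwidth, aligned transceivers.
params = dict(p_tx=1.0, c=0.15, d_r=0.05, theta=math.radians(10), phi=0.0, nep=1e-9)
d_max = max_optical_range(snr_min=1e4, **params)
print(round(d_max, 2), optical_snr(d=d_max, **params))  # the SNR should equal 1e4
```

The round-trip check is a cheap way to confirm that the closed form in (8) and the SNR model in (7) use consistent parameters.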

Since optical modems are usually directional, in order to receive optical signals from any direction, we assume that an omni-directional optical modem can be achieved by using multiple LEDs [40].

4. Multi-Modal Data Collection Analysis

When the AUV performs multi-modal data collection, both the freshness of the collected data and the energy consumption of the network nodes must be considered. The choice of communication option fundamentally affects data collection efficiency. UOC can transmit a large volume of data rapidly, reducing transmission latency, but it increases the navigation time and energy consumption of the AUV. UAC, in contrast, has a lower bandwidth but can collect small volumes of data over long distances, reducing the travel time of the AUV. Consequently, to collect data in a timely and efficient manner, several key factors, such as the data collection option, data type, packet size and AoI requirement, should be analyzed and integrated into the path-planning scheme for data collection.

4.1. Problem Analysis

The primary goal of mobile data collection in this paper is to minimize both the weighted average AoI and the energy consumption. The factors that influence the AoI include the AUV trajectory, the data transfer time and the importance of the data. Consequently, the joint optimization problem can be expressed as

$\min \; \frac{1}{N} \sum_{i = 1}^{N} A_{i} + β \sum_{i = 1}^{N} e_{i},$

(9)

$\text{s.t.} \quad x_{i, t}^{ac} \in \{0, 1\},$

(9a)

$x_{i, t}^{op} \in \{0, 1\},$

(9b)

$\sum_{t = 1}^{T} \left( x_{n, t}^{ac} + x_{n, t}^{op} \right) = 1, \quad \forall n \in \{1, \dots, N\},$

(9c)

$E_{AUV} < energy_{AUV},$

(9d)

$P_{0} = (x_{0}, y_{0}),$

(9e)

where $A_{i}$ denotes the final AoI of the data from $c_{i}$ when they reach the sink node, $e_{i}$ denotes the energy consumed by $c_{i}$ during data collection, and $β$ is a weighting coefficient between the two terms. $x_{i, t}^{ac} = 1$ indicates that the AUV has reached the acoustic communication range of CH $c_{i}$ and receives its data through UAC; otherwise, $x_{i, t}^{ac} = 0$. Similarly, $x_{i, t}^{op}$ is the optical communication indicator. $E_{AUV}$ is the energy consumed by the AUV during data collection, and $energy_{AUV}$ is the initial total energy of the AUV. Constraint (9c) ensures that the data of each CH are collected exactly once, using exactly one communication option. Constraint (9d) guarantees that the AUV does not exhaust its energy, and constraint (9e) fixes the initial position of the AUV.
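To make the structure of (9) concrete, the sketch below evaluates the objective for given per-CH AoI and energy values and checks constraints (9a)–(9c) on 0/1 indicator matrices indexed by CH and time slot. The data layout and function names are hypothetical.

```python
def objective(aoi, energy, beta):
    """Eq. (9) objective: mean AoI over the N CHs plus beta times total CH energy."""
    if len(aoi) != len(energy):
        raise ValueError("aoi and energy must have one entry per CH")
    return sum(aoi) / len(aoi) + beta * sum(energy)


def satisfies_9c(x_ac, x_op):
    """Constraints (9a)-(9c): 0/1 indicators, and each CH is served exactly once,
    by exactly one option. x_ac[n][t] and x_op[n][t] are indexed by CH n and slot t.
    """
    for row_ac, row_op in zip(x_ac, x_op):
        if any(v not in (0, 1) for v in row_ac + row_op):
            return False
        if sum(row_ac) + sum(row_op) != 1:
            return False
    return True


x_ac = [[0, 1, 0], [0, 0, 0]]  # CH 1 served acoustically in slot 2
x_op = [[0, 0, 0], [0, 0, 1]]  # CH 2 served optically in slot 3
print(satisfies_9c(x_ac, x_op), objective([12.0, 30.0], [0.5, 0.2], beta=10.0))
```

The binary indicators and the coupling through the trajectory are what make (9) hard to solve directly, which motivates the MDP reformulation that follows.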

Problem (9) is a nonlinear integer programming problem that is difficult to solve directly because of the binary variables and the non-convex objective function. In the following section, we model it as a Markov decision process (MDP) and solve it using the DRL approach.

4.2. Definition of AoI

The AoI is an important metric for characterizing the freshness of collected data and is defined as the time elapsed from when the AUV collects the data from a CH until the data are delivered to the sink node [41]. We use $δ_{i, t}$ to denote the AoI of the data collected from $c_{i}$ along the navigation trajectory at time $t$. When $t < T_{i}$, CH $c_{i}$ has not yet been visited and its information has not been sampled, so $δ_{i, t} = 0$; otherwise, $δ_{i, t} = t - T_{i}$. Thus, the AoI of $c_{i}$ at the start of time slot $t$ is given by

$δ_{i, t} = \begin{cases} 0, & \text{if } t < T_{i}, \\ t - T_{i}, & \text{otherwise}. \end{cases}$

(10)

The primary factors affecting AoI during data collection are the data transmission delay and the AUV sailing time. We use $T_{i}^{ac}$ and $T_{i}^{op}$ to denote the data transmission times using UAC and UOC, respectively. The time to transmit $B_{i}$ bits by UAC can be written as

$T_{i}^{ac} = \frac{B_{i}}{R_{ac}} + \frac{d_{i}}{V_{ac}},$

(11)

where $R_{ac}$ and $V_{ac}$ are the data rate and the signal propagation speed of the underwater acoustic modem, respectively, and $d_{i}$ is the transmission distance between the AUV and $c_{i}$. Similarly, the data transmission time of UOC with data rate $R_{op}$ and propagation speed $V_{op}$ is

$T_{i}^{op} = \frac{B_{i}}{R_{op}} + \frac{d_{i}}{V_{op}}.$

(12)
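Equations (11) and (12) add a serialization delay $B_{i}/R$ to a propagation delay $d_{i}/V$. The sketch below compares the two links for one packet, assuming a hypothetical 1 MB packet, a 10 kbit/s acoustic modem at 500 m, a 10 Mbit/s optical modem at 20 m, a typical sound speed of about 1500 m/s, and the optical propagation speed quoted in the Introduction.

```python
def transmission_time(bits: float, rate_bps: float, distance_m: float, speed_mps: float) -> float:
    """Eqs. (11)-(12): serialization delay B/R plus propagation delay d/V, in seconds."""
    return bits / rate_bps + distance_m / speed_mps


B = 8e6  # hypothetical 1 MB packet, in bits
t_ac = transmission_time(B, rate_bps=10e3, distance_m=500.0, speed_mps=1500.0)   # acoustic link
t_op = transmission_time(B, rate_bps=10e6, distance_m=20.0, speed_mps=2.255e8)  # optical link
print(f"UAC: {t_ac:.1f} s, UOC: {t_op:.3f} s")  # UAC: 800.3 s, UOC: 0.800 s
```

For large packets the serialization term dominates both links, which is why the AUV may accept a longer trip to reach optical range when $B_{i}$ is large.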

To collect the monitored data, the AUV departs from the sink $p_{0}$, collects data from each of the $N$ CHs along a pre-determined trajectory, and returns to the sink node after completing the task. Let the AUV trajectory be $P = \{p_{0}, p_{i}, \dots, p_{j}, p_{0}\}$; the travel time of the AUV is then

$T_{travel} = \frac{D(P)}{V_{AUV}},$

(13)

where $D(P)$ is the total distance traveled and $V_{AUV}$ is the speed of the AUV.
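A minimal sketch of (13), assuming a planar trajectory given as a list of waypoints that starts and ends at the sink and a constant AUV speed; the waypoint coordinates are hypothetical.

```python
import math


def trajectory_length(points):
    """D(P): sum of Euclidean segment lengths along the tour p0 -> ... -> p0."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def travel_time(points, v_auv: float) -> float:
    """Eq. (13): T_travel = D(P) / V_AUV."""
    return trajectory_length(points) / v_auv


tour = [(0, 0), (300, 400), (300, 0), (0, 0)]  # sink -> hover point -> hover point -> sink
print(trajectory_length(tour), travel_time(tour, v_auv=2.0))  # 1200.0 600.0
```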

According to (11)–(13), from the moment $T_{i}$ when the AUV arrives at CH $c_{i}$ to the moment $T_{i+1}$ when it finishes collecting data and moves on to the next data collection point, the AoI of CH $c_{i}$ can be expressed as $T_{i}^{m} + t_{i, i+1}^{travel}$, where $m \in \{ac, op\}$ is the selected option. Optical communication has a much smaller transmission delay than acoustic communication; however, acoustic communication enables long-range transmission that significantly reduces the travel time of the AUV. Since the transmission delay is mainly determined by the data size $B_{i}$, the data collection time and the traveling time must be considered jointly to limit the loss of data freshness. Then, at time $t = T_{i+1}$, the AoI of the data collected from CH $c_{i}$ is

$δ_{i, T_{i+1}} = \begin{cases} T_{i}^{ac} + t_{i, i+1}^{travel}, & \text{if } b_{i} = 1 \text{ and } x_{i, t}^{ac} = 1, \\ T_{i}^{op} + t_{i, i+1}^{travel}, & \text{if } b_{i} = 1 \text{ and } x_{i, t}^{op} = 1, \end{cases}$

(14)

where $b_{i} = 1$, $i \in \{1, 2, \dots, N\}$, indicates that the data of CH $c_{i}$ have been collected; otherwise, $b_{i} = 0$. When the AUV arrives at the sink node, the AoI of $c_{i}$ is

$A_{i} = \sum_{k = i}^{N} η_{i}\, δ_{k, T_{k+1}},$

(15)

where $η_{i}$ denotes the importance weight of the data collected by CH $c_{i}$, with $\sum_{i = 1}^{N} η_{i} = 1$; a larger $η_{i}$ indicates more important data.
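The sketch below evaluates (15) with a suffix sum. It assumes that the CHs are indexed in visiting order, so that the data from $c_{i}$ keep aging over every later collection and travel segment, and that `delta[k]` holds $δ_{k, T_{k+1}}$ from (14) with the last travel leg ending at the sink. The numbers are hypothetical.

```python
def weighted_aoi(delta, eta):
    """Eq. (15): A_i = eta_i * sum_{k=i}^{N} delta_k, with CHs indexed in visiting order.

    delta[k] is assumed to be T_k^m + t_{k,k+1}^travel from Eq. (14).
    """
    if len(delta) != len(eta):
        raise ValueError("one weight per CH is required")
    if abs(sum(eta) - 1.0) > 1e-9:
        raise ValueError("importance weights must sum to 1")
    n = len(delta)
    suffix = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):  # suffix[k] = delta[k] + ... + delta[n-1]
        suffix[k] = suffix[k + 1] + delta[k]
    return [eta[i] * suffix[i] for i in range(n)]


delta = [100.0, 50.0, 200.0]  # hypothetical per-CH collection + travel times (s)
eta = [0.5, 0.3, 0.2]
A = weighted_aoi(delta, eta)
print(A, sum(A) / len(A))  # per-CH AoI and the mean used in Eq. (9)
```

Because earlier CHs accumulate every later segment, visiting high-weight CHs late tends to reduce the weighted AoI, which is the trade-off the trajectory planner must learn.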

4.3. Energy Consumption Associated with Data Collection

To satisfy the energy constraint (9d) in optimization problem (9), we analyze the energy consumption of both the AUV and the nodes. If the AUV runs out of energy before returning to the sink node, extra costs are incurred, so the AUV trajectory should be scheduled to minimize energy consumption. The power of the AUV in each time slot is mainly the sum of the propulsion power $Φ_{prop}$ and the hotel load power $Φ_{H}$ [42]. The hotel load $Φ_{H}$ is the power consumed by all subsystems other than the propulsion mechanism and is typically negligible compared with $Φ_{prop}$ [43]. Therefore, the AUV power along its trajectory can be approximated by

$Φ_{prop} = \frac{ρ}{2 η_{p}} C_{D} A_{s} \|V_{AUV}\|^{3},$

(16)

where $\|\cdot\|$ denotes the Euclidean vector norm, $ρ$ is the density of water, and $η_{p}$, $C_{D}$ and $A_{s}$ are the efficiency of the AUV's propulsion system, the drag coefficient and the wetted surface area, respectively [7]. Consequently, using (4), (6), (11)–(13) and (16), the total energy consumption can be expressed as

$E_{tot} = \sum_{i = 1}^{N} e_{i} + ϖ Φ_{prop} T_{travel} = \sum_{i = 1}^{N} \left( x_{i}^{ac} T_{i}^{ac} P_{trans}^{ac} + x_{i}^{op} T_{i}^{op} P_{trans}^{op} \right) + ϖ Φ_{prop} T_{travel},$

(17)

where $ϖ$ is a weighting parameter that balances the energy consumption of the sensor nodes against that of the AUV.
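A minimal sketch of (16) and (17), assuming SI units, hypothetical AUV and modem parameters, and a string option per CH that plays the role of the one-hot indicators $x_{i}^{ac}$ and $x_{i}^{op}$.

```python
def propulsion_power(v_auv: float, rho: float, eta_p: float, c_d: float, a_s: float) -> float:
    """Eq. (16): Phi_prop = rho / (2 eta_p) * C_D * A_s * |V|^3, in watts."""
    return rho / (2 * eta_p) * c_d * a_s * abs(v_auv) ** 3


def total_energy(nodes, phi_prop: float, t_travel: float, varpi: float) -> float:
    """Eq. (17). Each entry of `nodes` is (option, T_i, P_trans) with option in {"ac", "op"}.

    The option string stands in for the one-hot indicators x_i^ac / x_i^op.
    """
    node_energy = 0.0
    for option, t_i, p_trans in nodes:
        if option not in ("ac", "op"):
            raise ValueError(f"unknown option {option!r}")
        node_energy += t_i * p_trans  # e_i = T_i^m * P_trans^m for the chosen m
    return node_energy + varpi * phi_prop * t_travel


# Hypothetical AUV parameters (SI units).
phi = propulsion_power(v_auv=1.5, rho=1025.0, eta_p=0.6, c_d=0.0035, a_s=3.0)
print(round(phi, 2), total_energy([("ac", 800.3, 2.0), ("op", 0.8, 5.0)], phi, t_travel=600.0, varpi=0.1))
```

The cubic dependence on speed in (16) means that a modest reduction in AUV speed lowers propulsion power sharply, at the price of a longer $T_{travel}$ and hence higher AoI.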

5. Proposed DRL-Based Multi-Modal Data Collection Scheme

In this section, we design the AUV multi-modal data collection scheme by leveraging the DRL approach. In this scheme, we first provide the MDP formulation and then present a multi-modal steering angle optimization (MSAO) algorithm for the AUV. Afterwards, we design the AUV path planning using the Deep Q Network (DQN) method for multi-modal mobile data collection.

5.1. MDP Formulation

Once the network nodes are clustered, the next goal is to find an optimal data collection strategy over the CHs. The AUV-assisted data collection problem can be formulated as an MDP that is solvable by the DRL approach and is represented by the five-tuple $\langle S, A, P, R, γ \rangle$. Here, $S$ is the state space of the environment, $A$ is the action set of the agent, $P$ is the state transition probability, $R$ is the reward function, and $γ$ is the discount factor. At time slot $t$, the agent observes state $s_{t}$ and chooses an action to perform. The environment then transitions to $s_{t+1}$ with probability $p_{s_{t}, s_{t+1}}$, and the agent obtains a reward $r_{t}$ from the environment. In this paper, the AUV acts as the agent that collects data, and each element is defined as follows.

State space $S$: The state of AUV mobile data collection is defined as

$s_{t} = \{p_{a, t}, ψ_{t}, Δ_{t}, \{x_{i, t}^{ac}, x_{i, t}^{op}, d_{i, t}, δ_{i, t}\}_{i \in N}\},$

(18)

where $p_{a, t}$ and $ψ_{t}$ are the coordinates and sailing orientation of the AUV at time slot $t$; its position can be obtained via ultra-short baseline (USBL) positioning [44]. $Δ_{t}$ is the difference between the remaining energy of the AUV and the energy it requires to travel from its current position to its final destination. $d_{i, t}$ records the Euclidean distance from the AUV to CH $c_{i}$, and $x_{i, t}^{m}$ is the data collection indicator for communication option $m$. When the AUV arrives at the data collection point of $c_{i}$, the AoI of node $c_{i}$ starts to be updated.
Action space $A$: In state $s_{t}$, the action of the AUV is characterized by the target point $c_{i, t} \in N_{r}$ with transmission option $m_{i, t}$ and the next target point $c_{j, t} \in N_{r} \setminus \{c_{i, t}\}$, where $N_{r}$ is the set of CHs whose data have not yet been collected. The action performed by the AUV in state $s_{t}$ can then be expressed as

$a = \{c_{i, t}, m_{i, t}, c_{j, t} | s_{t}\} .$

(19)
State transition probability $P$: $P(s_{t+1} | s_{t}, a_{t})$ defines the probability of transitioning from state $s_{t}$ to the next state $s_{t+1}$ under action $a_{t}$. Here $P(s_{t+1} | s_{t}, a_{t}) = 1$, i.e., the state transitions are deterministic.
Reward $R$ : Applying action $a_{t}$ in state $s_{t}$ , the AUV enters state $s_{t + 1}$ and obtains an immediate reward $r (s_{t + 1} |s_{t}, a_{t})$ . In the AUV-assisted multi-modal data collection scenario, the immediate reward $r_{t}$ can be expressed as

$r_{t} = \begin{cases} x_{i, t}^{ac} k_{1} e_{i} + x_{i, t}^{op} k_{2} e_{i}, & \text{if } b_{i} = 1, \\ J, & \text{if done}, \\ η_{i} (\mathrm{dis}_{p_{a}, p_{i}} + 1), & \text{otherwise}, \end{cases}$

(20)

where $k_{1}$ and $k_{2}$ are constants with $k_{1} < k_{2}$; when the AUV has collected the data of CH $c_{i}$, it obtains a reward that depends on the selected modem. $\mathrm{dis}_{p_{a}, p_{i}}$ is the Euclidean distance from the current position of the AUV to the target point. $J$ denotes the reward at the end of the data collection process, which includes a reward for successful data collection and penalties for failure (e.g., exceeding the maximum energy consumption or crossing the boundary).

$J = \begin{cases} r_{out}, & \text{if } Δ_{t} < 0 \text{ or } p_{a} \notin Ω, \\ k_{3} - \frac{1}{N} \sum_{i = 1}^{N} A_{i}, & \text{if } p_{a} = p_{0} \text{ and } N_{r} = \emptyset, \end{cases}$

(21)

where $k_{3}$ is a constant, $r_{out}$ is the failure penalty, and $Ω$ is the region within which the AUV can move.
Discount factor $γ$ : $γ \in [0, 1]$ is the future reward discount factor.
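The action set in (19) and the terminal reward in (21) can be sketched as follows. This is an illustration under stated assumptions: CHs are identified by strings, the next target is `None` when only one CH remains (the definition above leaves this case open), and the function names are hypothetical.

```python
from itertools import product


def action_space(remaining, options=("ac", "op")):
    """Eq. (19): all (c_i, m, c_j) with c_i in N_r, an option m, and c_j in N_r minus {c_i}.

    With a single CH left, c_j is None (returning to the sink is an assumption here).
    """
    if len(remaining) == 1:
        return [(remaining[0], m, None) for m in options]
    return [(ci, m, cj) for ci, m, cj in product(remaining, options, remaining) if cj != ci]


def terminal_reward(delta_energy, in_region, at_sink, remaining, aoi, k3, r_out):
    """Eq. (21): failure penalty r_out, or k3 minus the mean AoI after a successful tour.

    Returns None while the episode is still running.
    """
    if delta_energy < 0 or not in_region:
        return r_out
    if at_sink and not remaining:
        return k3 - sum(aoi) / len(aoi)
    return None


acts = action_space(["c1", "c2", "c3"])
print(len(acts), acts[:2])  # 3 targets x 2 options x 2 next targets = 12
```

The action set shrinks as CHs are served, so a DQN implementation typically masks the Q-values of infeasible actions rather than changing the network's output size.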

5.2. Multi-Modal Steering Angle Optimization Algorithm

In the multi-modal data collection network, collecting data from anywhere within the communication radius, rather than at the CH itself, can reduce the AUV navigation time. We therefore propose an MSAO algorithm that adjusts the AUV heading under a maximum steering angle constraint. In MSAO, the steering angle of the AUV is calculated from its position, the navigation target and the communication option. As shown in Figure 2, the yellow triangle indicates the position $p_{a, t}$ of the AUV at time slot $t$, the blue pentagrams indicate the CHs that need to perform data collection, and the outer circle $C_{ac}$ and inner circle $C_{op}$ indicate the communication ranges of UAC and UOC, respectively. Let $c_{i}$ be the AUV's current target CH and $c_{j}$ the next target CH; $p_{r_{i}, m}$ denotes the target hover point when the AUV selects communication option $m \in \{ac, op\}$, and $ψ_{m, t}$ denotes the heading of the AUV toward $p_{r_{i}, m}$ at time slot $t$. The goal is to find the point $p_{r_{i}, m}$ within the communication range $C_{m}$ of option $m$ that minimizes $\|p_{a, t} - p_{r_{i}, m}\| + \|c_{j} - p_{r_{i}, m}\|$. This is a variant of the classical "pilgrimage to the ancient castle" problem, and an approximate solution $p_{r_{i}, m} = (x_{r_{i}}, y_{r_{i}})$ can be obtained from the following equation [45]:

$$-ς_{1} \sqrt{1 - y^{2}} + ς_{2} y = 0,$$

(22)

where

$$ς_{1} = \frac{x_{i} - x_{j}}{d_{m} d_{ij}} - \frac{x_{i} - x_{a,t}}{d_{m} d_{ai,t}}, \qquad ς_{2} = \frac{y_{i} - y_{a,t}}{d_{m} d_{ai,t}} - \frac{y_{i} - y_{j}}{d_{m} d_{ij}}, \qquad y = \frac{y_{i} - y_{r_{i}}}{d_{m}}.$$

Here $d_{m}$ is the communication radius of option $m$. $d_{ai,t}$ is the distance between the AUV and the current target CH $c_{i}$ at time slot $t$, and $d_{ij}$ is the distance between $c_{i}$ and the next target CH $c_{j}$. Then, the steering angle of the AUV at time slot $t$ can be expressed as

$$Ψ_{m, t} = \begin{cases} \min(ψ_{m, t} - ψ_{t}, ψ_{max}), & ψ_{m, t} \geq ψ_{t}, \\ \max(ψ_{m, t} - ψ_{t}, -ψ_{max}), & ψ_{m, t} < ψ_{t}, \end{cases}$$

(23)

where $ψ_{t}$ is the heading of the AUV at time slot $t$ and $ψ_{max}$ is the maximum steering angle allowed for the AUV. Depending on the target location and the communication option, the steering angle of the AUV is then adjusted in one of the following two cases.

Case 1: The path of the AUV from the current position $p_{a, t}$ to the next target CH $c_{j}$ does not pass through the region $C_{m}$. That is, the distance $d_{seg_{i}}$ from $c_{i}$ to the segment $\overline{p_{a, t} c_{j}}$ is greater than the communication radius $d_{m}$. As shown in Figure 2a, once the communication option is determined, the point $p_{r_{i}, ac}$ (or $p_{r_{i}, op}$) is found on circle $C_{ac}$ (or $C_{op}$) so as to minimize the length of the AUV trajectory. For example, when CHs $c_{i}$ and $c_{j}$ and the acoustic modem are selected, (22) gives the AUV hover position $p_{r_{i}, ac} = (x_{r_{i}}, y_{r_{i}})$ for data collection, and (23) gives the steering angle $Ψ_{ac, t}$. Similarly, when $m = op$, the hover point $p_{r_{i}, op}$ and the steering angle $Ψ_{op, t}$ are obtained in the same way.
Case 2: The path of the AUV from the current position $p_{a, t}$ to the next target CH $c_{j}$ passes through the communication region $C_{m}$ of $c_{i}$. If the AUV crosses the UAC area $C_{ac}$ but not the UOC area $C_{op}$, then $d_{op} < d_{seg_{i}} \leq d_{ac}$. As shown in Figure 2b, if UAC is selected, the data collection hover point of the AUV is the foot of the perpendicular $p_{r_{i}, ac}$ from $c_{i}$ to the segment $\overline{p_{a, t} c_{j}}$, and (23) gives the steering angle. If UOC is selected, the data collection point and steering angle are calculated as in Case 1. Furthermore, if $d_{seg_{i}} \leq d_{op}$, the AUV path crosses the UOC range of $c_{i}$, and UOC is selected directly. This is because, for the same AUV trajectory, UOC outperforms UAC in both energy consumption and transmission time. In this case the hover point is the foot of the perpendicular $p_{r_{i}, op}$ from $c_{i}$ to $\overline{p_{a, t} c_{j}}$, obtained in the same way as the UAC hover point above, and the steering angle again follows from (23).
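Case 1 relies on (22) to find the hover point. The derivation below is a sketch of its closed-form root: it solves (22) only as stated. It also assumes, as the definition of $y$ suggests, that the hover point lies on the boundary circle of $C_{m}$. Under that assumption the sign of $x_{i} - x_{r_{i}}$ still has to be chosen from the geometry.

```latex
% Eq. (22) rearranged: ς_2 y = ς_1 \sqrt{1 - y^2}, with y \in [-1, 1]
\begin{aligned}
ς_{2}^{2} y^{2} &= ς_{1}^{2} (1 - y^{2})
  \;\Longrightarrow\; y^{2} = \frac{ς_{1}^{2}}{ς_{1}^{2} + ς_{2}^{2}}, \\
y &= \operatorname{sgn}(ς_{2}) \, \frac{ς_{1}}{\sqrt{ς_{1}^{2} + ς_{2}^{2}}}
  \qquad (ς_{2} \neq 0), \\
y_{r_{i}} &= y_{i} - d_{m} y, \qquad
x_{r_{i}} = x_{i} \pm d_{m} \sqrt{1 - y^{2}} .
\end{aligned}
```

The factor $\operatorname{sgn}(ς_{2})$ discards the spurious root introduced by squaring: it makes $ς_{2} y$ carry the same sign as $ς_{1}$, as the unsquared equation requires. If $ς_{2} = 0$ and $ς_{1} \neq 0$, (22) forces $\sqrt{1 - y^{2}} = 0$, so $y = \pm 1$.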

Based on the above discussion, we obtain the MSAO algorithm that is shown in Algorithm 1.

Algorithm 1 Proposed MSAO Algorithm

Require: Coordinates of the AUV $p_{a, t}$, coordinates of the current target CH $c_{i}$, coordinates of the next target CH $c_{j}$, UAC communication radius $d_{ac}$, UOC communication radius $d_{op}$ and the selected communication option $m$.
1: if $d_{seg_{i}} > d_{ac}$ and $m = ac$ then
2: Calculate the data collection hover position $p_{r_{i}, ac}$ by (22).
3: Calculate steering angle $Ψ_{ac, t}$ by (23).
4: else if $d_{op} < d_{seg_{i}} \leq d_{ac}$ and $m = ac$ then
5: The data collection hover position is the foot of the perpendicular $p_{r_{i}, ac}$ from $c_{i}$ to the segment $\overline{p_{a, t} c_{j}}$.
6: Calculate steering angle $Ψ_{ac, t}$ by (23).
7: else if $d_{seg_{i}} > d_{op}$ and $m = op$ then
8: Calculate the data collection hover position $p_{r_{i}, op}$ by (22).
9: Calculate steering angle $Ψ_{op, t}$ by (23).
10: else if $d_{seg_{i}} \leq d_{op}$ then
11: The data collection hover position is the foot of the perpendicular $p_{r_{i}, op}$ from $c_{i}$ to the segment $\overline{p_{a, t} c_{j}}$.
12: Calculate steering angle $Ψ_{op, t}$ by (23).
13: end if
Ensure: The steering angle of the AUV $Ψ_{m, t}$.
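The branching of Algorithm 1 and the saturation rule (23) can be sketched in Python as follows. This sketch is hypothetical, and its assumptions are:

- The helper names are ours, and $d_{op} < d_{ac}$.
- When the projection of $c_{i}$ falls outside $\overline{p_{a, t} c_{j}}$, the perpendicular foot is clamped to the nearer endpoint. The text only speaks of the vertical foot.
- `msao_branch` returns a label for the rule to apply rather than solving (22) itself.
- (23) is implemented literally, without wrapping angles to a fixed interval.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def foot_and_distance(c: Point, a: Point, b: Point) -> Tuple[Point, float]:
    """Closest point on segment ab to c and its distance (d_{seg_i} when c = c_i)."""
    ax, ay = a
    vx, vy = b[0] - ax, b[1] - ay
    seg_len2 = vx * vx + vy * vy
    if seg_len2 == 0.0:                      # degenerate segment: a == b
        return a, math.dist(a, c)
    # Projection parameter of c onto the line, clamped so the foot stays on the segment
    t = ((c[0] - ax) * vx + (c[1] - ay) * vy) / seg_len2
    t = max(0.0, min(1.0, t))
    foot = (ax + t * vx, ay + t * vy)
    return foot, math.dist(foot, c)


def msao_branch(p_a: Point, c_i: Point, c_j: Point,
                d_ac: float, d_op: float, m: str) -> Tuple[str, str]:
    """Return (modem, rule) where rule is 'foot' (perpendicular foot) or 'eq22'."""
    _, d_seg = foot_and_distance(c_i, p_a, c_j)
    if d_seg <= d_op:
        return "op", "foot"                  # lines 10-12: path crosses C_op, UOC forced
    if m == "ac" and d_seg <= d_ac:
        return "ac", "foot"                  # lines 4-6: path crosses C_ac only
    return m, "eq22"                         # lines 1-3 (m = ac) and 7-9 (m = op)


def steering_angle(psi_target: float, psi_current: float, psi_max: float) -> float:
    """Eq. (23): turn toward psi_target, saturated at +/- psi_max."""
    diff = psi_target - psi_current
    if psi_target >= psi_current:
        return min(diff, psi_max)
    return max(diff, -psi_max)
```

For example, `msao_branch((0.0, 0.0), (5.0, 3.0), (10.0, 0.0), 10.0, 2.0, "ac")` returns `("ac", "foot")`. The path passes 3 units from $c_{i}$, which is inside the UAC range but outside the UOC range.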

5.3. DRL-Based Multi-Modal Path Planning Scheme

Due to the uncertainty of reference access points and node data arrivals, the location of the AUV and the AoI of the collected data are inherently random, which causes the dimension of the state space to grow rapidly. DRL can handle such extremely large state spaces by estimating the Q-values of states $s$ and actions $a$ with neural networks [45,46]. The DQN training framework consists of a current Q-network and a target Q-network. To balance exploitation of past experience against exploration of the unknown, the agent in state $s_{t}$ selects the action $a_{t}$ with the $ε$-greedy algorithm [47]:

$$a_{t} = \begin{cases} \text{a random action } a \in A, & \text{with probability } ε, \\ \arg\max_{a} Q(s_{t}, a; θ), & \text{with probability } 1 - ε. \end{cases}$$

(24)
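A minimal Python sketch of the $ε$-greedy rule (24) follows. It assumes a discrete action set indexed $0, \dots, |A| - 1$ and a list of Q-values already produced by the current network. The function name is ours, and ties in the greedy branch go to the lowest index.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Eq. (24): explore with probability ε, otherwise exploit argmax_a Q(s_t, a; θ)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # uniform random action a ∈ A
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy; ties -> lowest index
```

For instance, with $ε = 0$ the call `epsilon_greedy([0.1, 0.9, 0.3], 0.0, random.Random(0))` always returns `1`. Passing an explicit `random.Random` instance keeps runs reproducible.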

Immediately after adjusting its heading and executing action $a_{t}$, the AUV receives the reward $r_{t}$, and the data collection network moves to the next state $s_{t+1}$. To reduce the correlation between the training samples of the online Q-network, an experience replay buffer $B$ stores historical experience samples $(s_{t}, a_{t}, r_{t}, s_{t+1})$. At each training step, the parameters of the online Q-network are updated with a mini-batch $Φ_{b}$ of experience samples drawn at random from the replay buffer. Denoting the parameters of the DQN by $θ$, the parameters of the online Q-network are determined by minimizing the loss function

$$L(θ) = \mathbb{E}_{Φ_{b}}\left[δ(s, a)^{2}\right],$$

(25)

where $δ(s, a) = y_{t} - Q(s_{t}, a_{t}; θ)$ is the temporal difference, and $y_{t}$ is the target Q-value, which can be calculated by

$$y_{t} = r_{t} + γ \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; θ^{-}),$$

(26)

where $θ^{-}$ denotes the parameters of the target Q-network. The weights $θ$ of the current network are then updated with learning rate $α$ by

$$θ \leftarrow θ + \frac{α}{|Φ_{b}|} \sum_{t=1}^{|Φ_{b}|} δ(s_{t}, a_{t}) \nabla_{θ} Q(s_{t}, a_{t}; θ).$$

(27)
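The following Python sketch performs one mini-batch update under (25)–(27). It makes three simplifications:

- It deliberately swaps the paper's two-layer network for a hypothetical linear approximator $Q(s, a; θ) = θ[a] \cdot φ(s)$, so that $\nabla_{θ} Q$ is simply $φ(s)$ in row $a$.
- The factor 2 from differentiating $δ^{2}$ is absorbed into $α$, matching the form of (27).
- Terminal-state handling is omitted because (26) does not specify it.

The names `q_value` and `dqn_step` are ours.

```python
from typing import List, Sequence, Tuple

# One transition (φ(s_t), a_t, r_t, φ(s_{t+1})); φ is a hypothetical feature vector
Transition = Tuple[Sequence[float], int, float, Sequence[float]]


def q_value(theta: List[List[float]], phi: Sequence[float], a: int) -> float:
    """Hypothetical linear Q(s, a; θ) = θ[a] · φ(s)."""
    return sum(w * x for w, x in zip(theta[a], phi))


def dqn_step(theta: List[List[float]], theta_target: List[List[float]],
             batch: Sequence[Transition], gamma: float, alpha: float) -> float:
    """Update θ in place per Eqs. (25)-(27); return the mini-batch loss before the update."""
    n_actions = len(theta_target)
    grads = [[0.0] * len(row) for row in theta]
    loss = 0.0
    for phi, a, r, phi_next in batch:
        # Eq. (26): target computed with the frozen target parameters θ^-
        y = r + gamma * max(q_value(theta_target, phi_next, b) for b in range(n_actions))
        delta = y - q_value(theta, phi, a)      # temporal difference δ(s_t, a_t)
        loss += delta * delta
        for k, x in enumerate(phi):             # ∇_θ Q(s_t, a_t; θ) is φ(s_t) in row a_t
            grads[a][k] += delta * x
    n = len(batch)
    for a, row in enumerate(theta):
        for k in range(len(row)):
            row[k] += alpha / n * grads[a][k]   # Eq. (27)
    return loss / n                             # Eq. (25): empirical mean of δ² over Φ_b
```

All gradients are accumulated before $θ$ changes, so every sample in $Φ_{b}$ is evaluated against the same parameters, as the sum in (27) implies.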

The proposed AUV-assisted data collection algorithm is shown in Algorithm 2. It proceeds as follows:

- It starts by initializing all neural networks and the replay buffer $B$.
- Training iterates over $E$ episodes, and in each episode the environment is initialized by observing the distribution of CHs.
- At each step, an action is selected according to the $ε$-greedy policy and passed to Algorithm 1 to obtain the steering angle. The AUV then moves to the next state $s_{t+1}$ and receives an immediate reward $r_{t}$.
- The transition tuple $(s_{t}, a_{t}, r_{t}, s_{t+1})$ is stored in the replay buffer $B$. A randomly sampled mini-batch $Φ_{b}$ is then used to train the current network $Q$ and update its weights $θ$. The target network weights $θ^{-}$ are synchronized with $θ$ every $χ$ steps.
- Finally, $c_{i}$ is removed from $N_{r}$ if its data can be collected in the current state, and the episode terminates when $N_{r} = \emptyset$.
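The replay buffer $B$ and the periodic target synchronization described above can be sketched as follows. The text does not specify these details, so three choices here are assumptions: a fixed capacity with first-in-first-out eviction, uniform sampling without replacement, and the hypothetical names `ReplayBuffer` and `should_sync_target`.

```python
import random
from collections import deque
from typing import Any, Deque, List, Tuple

Transition = Tuple[Any, int, float, Any]   # (s_t, a_t, r_t, s_{t+1})


class ReplayBuffer:
    """Fixed-capacity experience replay B."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)  # oldest entries evicted first
        self.rng = random.Random(seed)

    def store(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniformly sample a mini-batch Φ_b without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def should_sync_target(step: int, chi: int) -> bool:
    """θ^- ← θ every χ steps (line 13 of Algorithm 2)."""
    return step % chi == 0
```

Sampling uniformly from the buffer, rather than using the latest transitions, is what breaks the temporal correlation between consecutive AUV steps.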

Algorithm 2 DRL-Based Multi-Modal Data Collection Algorithm

1: Input: Initialize the constants $k_{1}$, $k_{2}$ and $k_{3}$, maximum number of training episodes $E$, reward discount factor $γ$, learning rate $l_{r}$, experience replay $B$, mini-batch size $Φ_{b}$, exploration probability $ε$, and target update step $χ$;
2: Initialize the current network $Q(s_{t}, a_{t}; θ)$ with weights $θ$ and the target network $Q(s_{t}, a_{t}; θ^{-})$ with weights $θ^{-}$.
3: for $episode = 1, \dots, E$ do
4: Initialize the data collection network environment and observe the initial state $s_{1}$.
5: for $t = 1, \dots, T$ do
6: Select an action $a_{t}$ according to the $ε$-greedy algorithm.
7: Determine the AUV steering angle with Algorithm 1.
8: Execute action $a_{t}$ and observe the reward $r_{t}$ and the next state $s_{t+1}$.
9: Store experience $(s_{t}, a_{t}, r_{t}, s_{t+1})$ in experience replay $B$.
10: Sample a random mini-batch of $Φ_{b}$ experiences from $B$.
11: Calculate the target value $y_{t}$ by (26).
12: Update the current network weights $θ$ by (27).
13: Update the target network weights $θ^{-} = θ$ every $χ$ steps.
14: if the data of CH $c_{i}$ can be collected in state $s_{t+1}$ then
15: Remove the CH $c_{i}$ from $N_{r}$.
16: end if
17: Terminate the episode if $N_{r} = \emptyset$ holds.
18: end for
19: end for
20: Output: The AUV trajectory $p_{a, t}$ and the AoI $A_{i}$.

6. Results and Discussion

In this section, we conduct extensive simulations to verify the effectiveness of the proposed scheme. The simulation setup and numerical performance results are given as follows.

6.1. Simulation Setup

To evaluate the proposed scheme, we assume that 50 sensor nodes are uniformly distributed in an 800 m × 800 m square target area. After the CHs are designated, they perform data fusion and compression. The data collected and transmitted by ordinary sensor nodes are assumed to be text, records and images. The amount of data aggregated at each CH is set between 10 and 300 packets, each of 1024 bits. The AUV starts from the point $p_{0} = (50, 120)$ with an initial heading $ψ_{m, 0} = 0^{\circ}$ and returns to $p_{0}$ after collecting data from all the CHs.

The simulations were implemented in Python 3.8. The target Q-network and the current Q-network are two-layer fully connected networks with 256 neurons per layer and ReLU activations. Both are trained with the Adam optimizer. The other simulation parameters and their values are listed in Table 1.

For the sake of performance comparison, the benchmark algorithms are provided as follows.

Single Acoustic: The AUV exchanges data using acoustic waves only, and its hover positions are determined by the UAC radius when the steering angle is selected. The AUV trajectory is learned with the DQN algorithm.
Single Optical: The AUV exchanges data using optical waves only, and its hover positions are computed from the UOC radius. The AUV trajectory is learned with the DQN algorithm.
Energy Greedy: The AUV executes the steering procedure of Algorithm 1. It builds the data collection sequence greedily, always selecting the node with the shortest path length from its current position.
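The visiting order of the Energy Greedy benchmark might look like the sketch below. It assumes that "shortest path length" means the Euclidean distance from the AUV's current position to each unvisited CH. The function name is ours, and the sketch ignores the hover points and steering limits that Algorithm 1 would add.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_order(start: Point, chs: Sequence[Point]) -> List[int]:
    """Nearest-neighbour visiting order: always go to the closest unvisited CH."""
    remaining = list(range(len(chs)))
    order: List[int] = []
    pos = start
    while remaining:
        nxt = min(remaining, key=lambda i: math.dist(pos, chs[i]))
        order.append(nxt)
        remaining.remove(nxt)
        pos = chs[nxt]                  # the AUV continues from the CH just visited
    return order
```

Because the order depends only on geometry, data importance and AoI never enter the decision, which is the weakness discussed in Sections 6.4 and 6.5.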

6.2. The Convergence Performance

To demonstrate the convergence of the AoI optimization algorithm for AUV data collection, Figure 3 shows the cumulative reward. The x-axis is the number of training iterations and the y-axis is the cumulative reward. In the early stages of training, the cumulative reward is very low because random exploration under the $ε$-greedy policy is highly likely. As training proceeds, the reward gradually increases and stabilizes.

6.3. Impact of the AUV Velocity on Performance

To explore the effect of the AUV velocity on the average AoI, we simulated the average AoI of the collected data with data arrival rates $λ$ = 20 and 300 Kbits. The results in Figure 4 show that the average AoI of the data collected by the AUV decreases gradually as the AUV velocity increases. At an AUV speed of 0.5 m/s, the weighted average AoI of the single UAC scheme is lower than that of the multi-modal and single UOC schemes. As the AUV velocity increases, the multi-modal scheme achieves a lower average AoI than either single-option scheme.

The primary reason is that at low velocity the AUV travel time is long, and long-range data collection via single UAC shortens it, which mitigates the increase in AoI. As the AUV velocity increases, travel time affects AoI less and data transmission time accounts for a larger share of it. Thus, the UOC scheme outperforms the UAC scheme at higher AUV velocities. The multi-modal scheme selects the best communication option according to the data characteristics, the AUV navigation time and the data transmission time, so its overall performance is better than that of the single-modal schemes. At 0.5 m/s, however, the average AoI of the proposed multi-modal scheme is worse than that of the single UAC scheme. The reason is that the multi-modal scheme considers not only AoI but also data collection energy consumption, and therefore sacrifices some AoI performance to reduce the energy consumption of the CHs.

Under the parameter settings of Figure 4, we also analyzed the effect of the AUV velocity on the energy consumption of data collection. Since the energy consumed by the CHs cannot be replenished, we place more emphasis on data transmission energy and therefore set $ϖ = 0.03$.

As shown in Figure 5, the weighted energy consumption of the single UAC scheme always remains the highest, owing to the high weight on data transmission energy. The single UOC scheme consumes little CH transmission energy, so its weighted energy consumption is lower than that of the single UAC scheme. The multi-modal scheme reduces both the AUV energy consumption and the CH transmission energy by jointly choosing the best communication option based on packet size and path length. Furthermore, as the AUV velocity increases, the weighted energy consumption of the three schemes converges, since the AUV propulsion power grows sharply with its travel velocity.

6.4. Impact of the Data Arrival Rate on Performance

Figure 6 shows the effect of different data arrival rates on the weighted average AoI. The velocity of the AUV is set to 1 m/s and the length of each time slot to 6 s.

At low data arrival rates, the single UAC scheme is superior to the single UOC scheme, since acoustic waves support long-range data transmission, which significantly reduces AUV travel time. As the data arrival rate increases, data transmission time carries more weight in the AoI, so the single UOC scheme overtakes the single UAC scheme. The greedy algorithm performs poorly in weighted average AoI because it always visits the closest location, ignoring data importance and AoI.

Our proposed multi-modal scheme outperforms the other three schemes at all data arrival rates and approaches the single UOC performance when the data size exceeds 140 Kbits. This is because it selects acoustic communication to reduce sailing time when the data size is small, and optical communication for fast data transmission when the data size is large. It thus adapts to different data conditions and achieves a relatively low weighted average AoI.
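The back-of-the-envelope calculation below uses the data rates in Table 1 to show why transmission time dominates UAC, but not UOC, as the data size grows. It assumes 1 Kbit = 1000 bits and ignores propagation delay, protocol overhead and optical alignment. It is illustrative arithmetic, not a reproduction of the simulation.

```python
R_AC_BPS = 3.16e3    # UAC data rate R_ac from Table 1 (3.16 kbps)
R_OP_BPS = 0.5e9     # UOC data rate R_op from Table 1 (0.5 Gbps)


def tx_time_s(data_kbits: float, rate_bps: float) -> float:
    """Transmission time in seconds, assuming 1 Kbit = 1000 bits."""
    return data_kbits * 1000.0 / rate_bps


for size in (20, 140, 200):   # data sizes within the range considered in this subsection
    t_ac = tx_time_s(size, R_AC_BPS)
    t_op = tx_time_s(size, R_OP_BPS)
    print(f"{size:>3} Kbits: UAC {t_ac:6.2f} s, UOC {t_op * 1e3:.2f} ms, ratio {t_ac / t_op:,.0f}x")
```

Take 140 Kbits as an example. UAC needs about 44 s at 3.16 kbps, roughly the time the AUV needs to travel 44 m at 1 m/s. UOC needs well under a millisecond.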

To verify the advantage of the proposed multi-modal algorithm in data collection energy consumption, we compare the transmission energy consumption of the CHs and the energy consumption of the AUV for different data sizes. Here, we set the AUV velocity to 1 m/s and the data size to 20–200 Kbits; the results are shown in Table 2 and Figure 7. Under the single UOC approach, the average CH data transmission energy consumption is the smallest, while the AUV energy consumption is the highest. The single UAC and greedy approaches show the opposite pattern: the highest CH energy consumption, but low AUV energy consumption. In the multi-modal scheme, the CH energy consumption first increases and then decreases with increasing data size, while the AUV energy consumption gradually increases. Overall, its energy consumption remains low.

The reason for this phenomenon is as follows. The single UOC scheme requires the AUV to travel to the immediate vicinity of each node, which increases the AUV navigation energy consumption. Fortunately, owing to the low energy consumption and high bandwidth of optical modems, the CH data transmission energy is low. In contrast, the single UAC and greedy algorithms let the AUV collect data over longer distances using acoustic waves, which greatly reduces AUV energy consumption. As the data size increases, however, the low bandwidth and high energy consumption of UAC make the CH energy consumption significant.

In the multi-modal scheme, the AUV selects UAC when the data size is small, which keeps the CH transmission energy low while reducing the AUV mobility energy. When the data size exceeds 140 Kbits, the multi-modal scheme switches entirely to optical communication to avoid excessive CH energy consumption and thus extend the lifetime of the UWSN. The proposed multi-modal scheme therefore performs well on diverse data. When the data size of every CH is large (or small), it reduces to a pure UOC (or UAC) scheme.

Figure 8 shows the weighted AoI of each CH under the different schemes. The greedy scheme has the maximum AoI for CH index 5 and the lowest AoI for index 1. This is because the greedy algorithm ignores data importance when selecting the nearest node to visit, resulting in a large AoI for the first visited node. The other three schemes use reinforcement learning to select the node access order based on data importance, which avoids extreme AoI values. Furthermore, the multi-modal scheme flexibly selects the communication option based on the data size and importance of each node, so it outperforms the two single-modal schemes. Note that we neglected the details of optical beam alignment and the time it consumes during data collection, which makes the AoI performance of the single UOC scheme look more promising than it would in practice. In future work, we will model these details of underwater optical communication.

7. Conclusions

In this paper, we proposed an AUV-assisted multi-modal data collection scheme that provides timely and reliable data collection by adaptively using underwater acoustic and optical communication technologies. The trajectory planning problem is formulated as a mixed integer nonlinear problem that minimizes the weighted average AoI and energy consumption. The data collection problem is formulated as an MDP that considers data importance, packet size, and data collection options. We then developed a DQN-based learning algorithm to determine the optimal strategy. In addition, we proposed a multi-modal steering angle optimization (MSAO) algorithm to reduce the energy consumption of AUV navigation. Numerical simulations show that the proposed algorithm converges and that the AUV path-planning scheme effectively reduces both the AoI of the collected data and the energy consumption.

Author Contributions

F.B. proposed the main ideas, wrote the paper, designed the description framework, and conducted the simulations. H.L. provided guidance for the work, discussed and provided ideas, wrote and modified the paper and acquired funding. X.L., R.R. and G.H. provided guidance for the work, and collaborated in discussion on the proposed system model and techniques. S.M. assisted in testing the code and checked the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2020MF059; in part by the China NSFC under Grant 62072287.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the anonymous reviewers for their time towards the manuscript.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Jahanbakht, M.; Xiang, W.; Hanzo, L.; Azghadi, M.R. Internet of underwater things and big marine data analytics—A comprehensive survey. IEEE Commun. Surveys Tuts. 2021, 23, 904–956.
Wei, X.; Guo, H.; Wang, X.; Wang, X.; Qiu, M. Reliable data collection techniques in underwater wireless sensor networks: A survey. IEEE Commun. Surveys Tuts. 2021, 24, 404–431.
Luo, H.; Wu, K.; Ruby, R.; Liang, Y.; Guo, Z.; Ni, L.M. Software-defined architectures and technologies for underwater wireless sensor networks: A survey. IEEE Commun. Surveys Tuts. 2018, 20, 2855–2888.
Luo, H.; Wang, J.; Bu, F.; Ruby, R.; Wu, K.; Guo, Z. Recent progress of air/water cross-boundary communications for underwater sensor networks: A review. IEEE Sens. J. 2022, 22, 8360–8382.
Fang, Z.; Wang, J.; Jiang, C.; Zhang, Q.; Ren, Y. AoI-inspired collaborative information collection for AUV-assisted internet of underwater things. IEEE Internet Things J. 2021, 8, 14559–14571.
Jawhar, I.; Mohamed, N.; Al-Jaroodi, J.; Zhang, S. An architecture for using autonomous underwater vehicles in wireless sensor networks for underwater pipeline monitoring. IEEE Trans. Ind. Informat. 2018, 15, 1329–1340.
Mahmoodi, K.A.; Uysal, M. AUV Trajectory Optimization for an Optical Underwater Sensor Network in the Presence of Ocean Currents. In Proceedings of the 2021 IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), Bucharest, Romania, 24–28 May 2021; pp. 1–6.
Wang, X.; Luo, H.; Yang, Y.; Ruby, R.; Wu, K. Underwater Real-time Video Transmission via Optical Channels with Swarms of AUVs. In Proceedings of the 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), Beijing, China, 14–16 December 2021; pp. 859–866.
Wang, J.; Li, J.; Yan, S.; Shi, W.; Yang, X.; Guo, Y.; Gulliver, T.A. A novel underwater acoustic signal denoising algorithm for Gaussian/non-Gaussian impulsive noise. IEEE Trans. Veh. Technol. 2020, 70, 429–445.
Akhoundi, F.; Jamali, M.V.; Hassan, N.B.; Beyranvand, H.; Minoofar, A.; Salehi, J.A. Cellular underwater wireless optical CDMA network: Potentials and challenges. IEEE Access 2016, 4, 4254–4268.
Wang, J.; Luo, H.; Ruby, R.; Liu, J.; Guo, K.; Wu, K. Reliable Water-Air Direct Wireless Communication: Kalman Filter-Assisted Deep Reinforcement Learning Approach. In Proceedings of the 2022 IEEE 47th Conference on Local Computer Networks (LCN), Edmonton, AB, Canada, 26–29 September 2022; pp. 233–238.
Luo, H.; Xie, X.; Han, G.; Ruby, R.; Hong, F.; Liang, Y. Multimodal acoustic-RF adaptive routing protocols for underwater wireless sensor networks. IEEE Access 2019, 7, 134954–134967.
Zhao, Z.; Liu, C.; Qu, W.; Yu, T. An Energy Efficiency Multi-Level Transmission Strategy based on underwater multimodal communication in UWSNs. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 1579–1587.
Gauni, S.; Manimegalai, C.; Krishnan, K.M.; Shreeram, V.; Arvind, V.; Srinivas, T. Design and analysis of co-operative acoustic and optical hybrid communication for underwater communication. Wireless Pers. Commun. 2021, 117, 561–575.
Yu, T.; Liu, C.; Qu, W.; Zhao, Z. OD-PPS: An On-Demand Path Planning Scheme for Maximizing Data Completeness in Multi-modal UWSNs. In Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Nanjing, China, 25–27 June 2021; pp. 16–28.
Gjanci, P.; Petrioli, C.; Basagni, S.; Phillips, C.A.; Bölöni, L.; Turgut, D. Path finding for maximum value of information in multi-modal underwater wireless sensor networks. IEEE Trans. Mobile Comput. 2018, 17, 404–418.
Hu, Y.; Zheng, Y.; Liu, H.; Wang, Z.; Mao, Y.; Han, H. Mobile sink path planning research for underwater heterogeneous sensor network. In Proceedings of the 2018 Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 4443–4448.
Han, S.; Noh, Y.; Liang, R.; Chen, R.; Cheng, Y.J.; Gerla, M. Evaluation of underwater optical-acoustic hybrid network. China Commun. 2014, 11, 49–59.
Sun, Y.; Uysal-Biyikoglu, E.; Yates, R.D.; Koksal, C.E.; Shroff, N.B. Update or wait: How to keep your data fresh. IEEE Trans. Inf. Theory 2017, 63, 7492–7508.
Talak, R.; Karaman, S.; Modiano, E. Optimizing information freshness in wireless networks under general interference constraints. In Proceedings of the Eighteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing, Los Angeles, CA, USA, 26–29 June 2018; pp. 61–70.
Yi, M.; Wang, X.; Liu, J.; Zhang, Y.; Bai, B. Deep reinforcement learning for fresh data collection in UAV-assisted IoT networks. In Proceedings of the IEEE INFOCOM 2020—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; pp. 716–721.
Basagni, S.; Di Valerio, V.; Gjanci, P.; Petrioli, C. MARLIN-Q: Multi-modal communications for reliable and low-latency underwater data delivery. Ad Hoc Netw. 2019, 82, 134–145.
Shen, Z.; Yin, H.; Jing, L.; Liang, Y.; Wang, J. A Cooperative Routing Protocol Based on Q-Learning for Underwater Optical-Acoustic Hybrid Wireless Sensor Networks. IEEE Sens. J. 2021, 22, 1041–1050.
Júnior, E.P.C.; Vieira, L.F.; Vieira, M.A. CAPTAIN: A data collection algorithm for underwater optical-acoustic sensor networks. Comput. Netw. 2020, 171, 1–11.
Luo, H.; Xu, Z.; Wang, J.; Yang, Y.; Ruby, R.; Wu, K. Reinforcement Learning-Based Adaptive Switching Scheme for Hybrid Optical-Acoustic AUV Mobile Network. Wirel. Commun. Mob. Com. 2022, 2022, 1–14.
Liu, Z.; Meng, X.; Liu, Y.; Yang, Y.; Wang, Y. AUV-Aided Hybrid Data Collection Scheme Based on Value of Information for Internet of Underwater Things. IEEE Internet Things J. 2021, 9, 6944–6955.
Duan, R.; Du, J.; Jiang, C.; Ren, Y. Value-based hierarchical information collection for AUV-enabled internet of underwater things. IEEE Internet Things J. 2020, 7, 9870–9883.
Fang, Z.; Wang, J.; Jiang, C.; Wang, X.; Ren, Y. Average Peak Age of Information in Underwater Information Collection with Sleep-scheduling. IEEE Trans. Veh. Technol. 2022, 71, 10132–10136.
Khan, M.T.R.; Jembre, Y.Z.; Ahmed, S.H.; Seo, J.; Kim, D. Data freshness based AUV path planning for UWSN in the internet of underwater things. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
Al-Habob, A.A.; Dobre, O.A.; Poor, H.V. Age-optimal information gathering in linear underwater networks: A deep reinforcement learning approach. IEEE Trans. Veh. Technol. 2021, 70, 13129–13138.
Wu, T.; Wen, P.; Tang, S. Optimal scheduling strategy of AUV based on importance and age of information. Wirel. Netw. 2022, 1–9.
Huang, M.; Zhang, K.; Zeng, Z.; Wang, T.; Liu, Y. An AUV-assisted data gathering scheme based on clustering and matrix completion for smart ocean. IEEE Internet Things J. 2020, 7, 9904–9918.
Cheng, W.; Teymorian, A.Y.; Ma, L.; Cheng, X.; Lu, X.; Lu, Z. Underwater localization in sparse 3D acoustic sensor networks. In Proceedings of the IEEE INFOCOM 2008—The 27th Conference on Computer Communications, Phoenix, AZ, USA, 13–18 April 2008; pp. 236–240.
Stojanovic, M.; Preisig, J. Underwater acoustic communication channels: Propagation models and statistical characterization. IEEE Commun. Mag. 2009, 47, 84–89.
Brekhovskikh, L.M.; Lysanov, Y.P.; Beyer, R.T. Fundamentals of Ocean Acoustics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003.
Stojanovic, M. On the relationship between capacity and distance in an underwater acoustic communication channel. In Proceedings of the ACM SIGMOBILE Mobile Computing and Communications Review, Los Angeles, CA, USA, 25 September 2007; ACM: Los Angeles, CA, USA, 2007; Volume 11, pp. 34–43.
Elamassie, M.; Miramirkhani, F.; Uysal, M. Performance characterization of underwater visible light communication. IEEE Trans. Commun. 2018, 67, 543–552.
Giles, J.W.; Bankman, I.N. Underwater optical communications systems. Part 2: Basic design considerations. In Proceedings of the MILCOM 2005–2005 IEEE Military Communications Conference, Atlantic City, NJ, USA, 17–20 October 2005; pp. 1700–1705.
Steinvall, O. Laser system range calculations and the Lambert W function. Appl. Opt. 2009, 48, B1–B7.
Farr, N.; Chave, A.; Freitag, L.; Preisig, J.; White, S.; Yoerger, D.; Titterton, P. Optical modem technology for seafloor observatories. In Proceedings of the OCEANS 2005 MTS/IEEE, Washington, DC, USA, 17–23 September 2005; pp. 928–934.
Liu, J.; Wang, X.; Bai, B.; Dai, H. Age-optimal trajectory planning for UAV-assisted data collection. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Honolulu, HI, USA, 15–19 April 2018; pp. 553–558.
Phillips, A.; Haroutunian, M.; Murphy, A.J.; Boyd, S.; Blake, J.; Griffiths, G. Understanding the power requirements of autonomous underwater systems, Part I: An analytical model for optimum swimming speeds and cost of transport. Ocean Eng. 2017, 133, 271–279.
Furlong, M.E.; McPhail, S.D.; Stevenson, P. A concept design for an ultra-long-range survey class AUV. In Proceedings of the OCEANS 2007-Europe, Aberdeen, UK, 18–21 June 2007; pp. 1–6.
Rigby, P.; Pizarro, O.; Williams, S.B. Towards geo-referenced AUV navigation through fusion of USBL and DVL measurements. In Proceedings of the OCEANS 2006, Boston, MA, USA, 18–21 September 2006; pp. 1–6.
Su, N.; Wang, J.B.; Zeng, C.; Zhang, H.; Lin, M.; Li, G.Y. Unmanned-Surface-Vehicle-Aided Maritime Data Collection Using Deep Reinforcement Learning. IEEE Internet Things J. 2022, 9, 19773–19786.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
dos Santos Mignon, A.; da Rocha, R.L.d.A. An adaptive implementation of ϵ-greedy in reinforcement learning. Procedia Comput. Sci. 2017, 109, 1146–1151.

Figure 1. Illustration of the underwater network model.

Figure 2. The AUV multi-modal steering angle diagram. (a) The trajectory of the AUV from its current coordinates $p_{a,t}$ towards the next target $c_{j}$ does not pass through the communication region $C_{m}$ of $c_{i}$. (b) The trajectory of the AUV from its current coordinates $p_{a,t}$ towards the next target $c_{j}$ passes through the communication region $C_{m}$.
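The two cases in Figure 2 reduce to a geometric test: does the straight segment from $p_{a,t}$ to $c_{j}$ come within the communication region of $c_{i}$? The following is a minimal sketch, assuming that $C_{m}$ can be modelled as a sphere of radius $r$ centred on $c_{i}$ (the caption does not state its shape) and that positions are 3-D Cartesian coordinates in metres. The helper name `segment_hits_region` is hypothetical and is not taken from the paper; it only illustrates the case split, not the authors' steering-angle algorithm.

```python
import math

Vec = tuple[float, float, float]


def segment_hits_region(p: Vec, q: Vec, center: Vec, radius: float) -> bool:
    """Return True if the straight segment p -> q comes within `radius` of `center`.

    Hypothetical helper: the communication region C_m of cluster head c_i is
    assumed to be a sphere of the given radius (an illustrative assumption).
    """
    d = [qi - pi for pi, qi in zip(p, q)]        # segment direction q - p
    f = [pi - ci for pi, ci in zip(p, center)]   # vector from the center to p
    dd = sum(x * x for x in d)
    if dd == 0.0:                                # degenerate segment: p == q
        return math.dist(p, center) <= radius
    # Closest point on the infinite line p + t*d, with t clamped to the segment [0, 1]
    t = max(0.0, min(1.0, -sum(a * b for a, b in zip(f, d)) / dd))
    closest = [pi + t * di for pi, di in zip(p, d)]
    return math.dist(closest, center) <= radius
```

For example, a leg from `(0, 0, 0)` to `(100, 0, 0)` with $c_{i}$ at `(50, 10, 0)` and an assumed radius of 20 m corresponds to case (b), while moving $c_{i}$ to `(50, 30, 0)` gives case (a). Clamping `t` matters: a cluster head behind the AUV's start point should not count as lying on the path.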

Figure 3. Convergence performance of the DQN-based data collection algorithm.
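Figure 3 concerns the convergence of the DQN-based collection algorithm, whose training hyperparameters appear in Table 1 ($B$, $\Phi_{b}$, $\gamma$, $\chi$). As a sketch only, the plain-Python fragment here shows how such values typically enter a DQN update in the style of Mnih et al.: a bounded FIFO replay buffer, uniform mini-batch sampling, the one-step TD target, and a periodic target-network sync. The class and function names are hypothetical, the Q-network is abstracted as a callable `target_q` returning a list of action values, and we assume $\chi$ is the target-network update interval. This is not the authors' implementation.

```python
import random
from collections import deque

# Values from Table 1; everything else in this sketch is hypothetical.
BUFFER_SIZE = 500_000   # B: experience replay buffer size
BATCH_SIZE = 256        # Phi_b: mini-batch size
GAMMA = 0.95            # gamma: reward discount factor
TARGET_SYNC = 1000      # chi: assumed steps between target-network updates


class ReplayBuffer:
    """FIFO experience replay: the oldest transitions are evicted once full."""

    def __init__(self, capacity: int = BUFFER_SIZE):
        self.data = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = BATCH_SIZE):
        # Uniform sampling without replacement (deque is a registered Sequence)
        return random.sample(self.data, min(batch_size, len(self.data)))


def td_targets(batch, target_q, gamma: float = GAMMA):
    """One-step DQN targets y = r + gamma * max_a' Q_target(s', a'); y = r at terminal states."""
    return [r if done else r + gamma * max(target_q(s2))
            for (_, _, r, s2, done) in batch]


def should_sync(step: int, every: int = TARGET_SYNC) -> bool:
    """True on the steps where the target network would copy the online weights."""
    return step > 0 and step % every == 0
```

The bounded `deque` keeps memory fixed at $B$ transitions; a preallocated ring buffer would give faster random access for very large buffers, at the cost of a few more lines.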

Figure 4. Effect of AUV velocity on the AoI of the collected data.

Figure 5. Effect of AUV velocity on the weighted energy consumption of the task.

Figure 6. Average AoI of the collected data versus the data arrival rate.

Figure 7. Performance comparison in terms of AUV energy consumption.

Figure 8. AoI for different paths when the number of CHs is 5.

Table 1. Simulation parameters.

Parameter	Description	Value (Unit)
$f$	Carrier frequency	35 (kHz)
$\Delta f$	Bandwidth	2 (kHz)
$k$	Propagation loss	1.5
$SNR_{min}^{ac}$	UAC minimum SNR	3 (dB)
$R_{ac}$	UAC data rate	3.16 (kbps)
$R_{op}$	UOC data rate	0.5 (Gbps)
$\theta$	Half of the transmitter beamwidth	3 (°)
$c$	Extinction coefficient	0.18 (m$^{-1}$)
$\zeta$	Water turbidity	0.05
$D_{r}$	Aperture diameter	0.25 (m)
$SNR_{min}^{op}$	UOC minimum SNR	3 (dB)
$NEP$	Noise equivalent power	1 (mW)
$P_{rec}^{op}$	Average transmitted power	0.01 (mW)
$\rho$	Density of water	997 (kg/m$^{3}$)
$\eta_{p}$	Efficiency of the AUV propulsion system	100%
$C_{D}$	Drag coefficient	0.0064
$A_{s}$	Wetted surface area	0.8856 (m$^{2}$)
$B$	Experience replay buffer size	500,000
$\Phi_{b}$	Mini-batch size	256
$\gamma$	Reward discount factor	0.95
$\chi$	Update step	1000
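Table 1 lists $\rho$, $C_{D}$, $A_{s}$ and $\eta_{p}$, the quantities of the standard hydrodynamic drag model used in AUV cost-of-transport analyses such as Phillips et al. As a minimal sketch, assuming propulsion power $P = \tfrac{1}{2}\rho C_{D} A_{s} v^{3} / \eta_{p}$ at constant speed $v$ and ignoring hotel (payload) load, the energy to cover a distance $d$ is $E = P\,d/v = \tfrac{1}{2}\rho C_{D} A_{s} v^{2} d / \eta_{p}$. The function names are hypothetical and the model is our assumption, not a formula quoted from this section.

```python
# Table 1 values; the drag model itself is an assumption (standard hydrodynamic drag).
RHO = 997.0      # rho: density of water, kg/m^3
C_D = 0.0064     # drag coefficient (referenced to wetted surface area)
A_S = 0.8856     # wetted surface area, m^2
ETA_P = 1.0      # propulsion efficiency (100%)


def propulsion_power(v: float) -> float:
    """Power (W) needed to overcome drag at speed v (m/s): 0.5*rho*C_D*A_s*v^3 / eta_p."""
    return 0.5 * RHO * C_D * A_S * v ** 3 / ETA_P


def propulsion_energy(distance: float, v: float) -> float:
    """Energy (J) to travel `distance` metres at constant speed v: P(v) * (distance / v)."""
    return propulsion_power(v) * distance / v


if __name__ == "__main__":
    for v in (0.5, 1.0, 1.5, 2.0):
        print(f"v = {v} m/s: P = {propulsion_power(v):.2f} W, "
              f"E(1 km) = {propulsion_energy(1000.0, v) / 1000:.2f} kJ")
```

Under this model the energy per metre grows with $v^{2}$, so a faster AUV shortens travel time at a quadratic cost in propulsion energy; this is the kind of velocity trade-off that Figures 4 and 5 examine.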

Table 2. Average data transmission energy consumption of CHs versus the data arrival rate.

Collected Data (kbits)	Single UAC (mJ)	Single UOC (mJ)	Greedy (mJ)	Multi-Modal (mJ)
20	6659.29	0.02	5650.31	5112.19
50	15,403.83	0.05	12,780.47	3968.71
80	24,081.09	0.08	20,852.35	5583.11
110	32,623.83	0.11	28,789.69	7668.36
140	41,099.29	0.14	37,937.81	0.14
170	49,103.91	0.16	46,346.02	0.16
200	52,399.92	0.19	55,359.61	0.19
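Table 2 is easier to compare across links when each entry is normalised by the collected data volume. The reader-side calculation here (not reported by the authors) divides the Single UAC and Single UOC columns by the collected data and sets them against the on-air time implied by the Table 1 data rates. If transmission energy is taken as power times airtime (our assumption), the data-rate ratio $R_{op}/R_{ac} \approx 1.6 \times 10^{5}$ alone accounts for most of the roughly $3.3 \times 10^{5}$ energy gap at 20 kbits; note that the UOC column is rounded to 0.01 mJ. The Greedy and Multi-Modal columns are left out because they mix both links.

```python
# Table 2 columns copied verbatim (energy in mJ); the per-kbit view is reader-side arithmetic.
COLLECTED_KBITS = [20, 50, 80, 110, 140, 170, 200]
SINGLE_UAC_MJ = [6659.29, 15403.83, 24081.09, 32623.83, 41099.29, 49103.91, 52399.92]
SINGLE_UOC_MJ = [0.02, 0.05, 0.08, 0.11, 0.14, 0.16, 0.19]

R_AC_BPS = 3.16e3   # UAC data rate from Table 1 (3.16 kbps)
R_OP_BPS = 0.5e9    # UOC data rate from Table 1 (0.5 Gbps)


def per_kbit(energies_mj, kbits):
    """Energy per collected kilobit (mJ/kbit) for each row of the table."""
    return [e / k for e, k in zip(energies_mj, kbits)]


def airtime_s(kbits, rate_bps):
    """On-air time in seconds, assuming 1 kbit = 1000 bits."""
    return kbits * 1e3 / rate_bps


if __name__ == "__main__":
    uac = per_kbit(SINGLE_UAC_MJ, COLLECTED_KBITS)
    uoc = per_kbit(SINGLE_UOC_MJ, COLLECTED_KBITS)
    for k, e_ac, e_op in zip(COLLECTED_KBITS, uac, uoc):
        print(f"{k:>3} kbit: UAC {e_ac:7.2f} mJ/kbit ({airtime_s(k, R_AC_BPS):6.2f} s), "
              f"UOC {e_op:.4f} mJ/kbit ({airtime_s(k, R_OP_BPS) * 1e6:.0f} us)")
    print(f"data-rate ratio R_op/R_ac = {R_OP_BPS / R_AC_BPS:.3g}")
```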

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bu, F.; Luo, H.; Ma, S.; Li, X.; Ruby, R.; Han, G. AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning. Sensors 2023, 23, 578. https://doi.org/10.3390/s23020578

AMA Style

Bu F, Luo H, Ma S, Li X, Ruby R, Han G. AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning. Sensors. 2023; 23(2):578. https://doi.org/10.3390/s23020578

Chicago/Turabian Style

Bu, Fanfeng, Hanjiang Luo, Saisai Ma, Xiang Li, Rukhsana Ruby, and Guangjie Han. 2023. "AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning" Sensors 23, no. 2: 578. https://doi.org/10.3390/s23020578

APA Style

Bu, F., Luo, H., Ma, S., Li, X., Ruby, R., & Han, G. (2023). AUV-Aided Optical—Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning. Sensors, 23(2), 578. https://doi.org/10.3390/s23020578

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.