Communication

A Period Training Method for Heterogeneous UUV Dynamic Task Allocation

1 Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
2 Satellite Network Group, General Management Department of China, Ltd., Beijing 100029, China
3 National Lab of Radar Signal Processing, Xidian University, Xi’an 710000, China
4 Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(11), 2508; https://doi.org/10.3390/electronics12112508
Submission received: 14 April 2023 / Revised: 22 May 2023 / Accepted: 23 May 2023 / Published: 2 June 2023

Abstract

In the dynamic task allocation of unmanned underwater vehicles (UUVs), the schemes of UUVs need to be quickly reallocated to respond to emergencies. The most common heuristic allocation methods use predesigned optimization rules to obtain a solution iteratively, which is time-consuming. To quickly assign tasks to heterogeneous UUVs, we propose a novel task allocation algorithm based on multi-agent reinforcement learning (MARL) and a period training method (PTM). The PTM optimizes the parameters of the MARL model in periodically changing training environments, improving the algorithm's robustness. The simulation results show that the proposed methods can effectively allocate tasks to different UUVs within a few seconds and reallocate the schemes in real time to deal with emergencies.

1. Introduction

With the rapid development of UUV technologies, multiple UUVs can cooperatively perform various complicated tasks, such as target localization, photogrammetry, and cooperative search coverage [1,2,3]. Usually, multiple heterogeneous UUVs equipped with various sensors are allocated to search irregular task areas, where the UUVs may break down due to unpredictable threats and new search tasks may need to be conducted. To deal with such emergencies, each UUV is required to adaptively reallocate its task set, resulting in a multi-UUV search coverage task allocation problem in dynamic environments.
To solve the UUV task allocation problem, many heuristic algorithms have been studied in recent years. To deal with large-scale geospatial search problems, Kuhlman et al. use a novel iterative greedy method to plan the search paths of UUVs, an approach that scales well to the deployment of large groups of agents [3]. In [4], a new technique based on receiver operating characteristic (ROC) analysis is used to centrally handle the UUV task allocation problem, while the path planning problem is solved using a distributed model. To address the dynamic UUV task allocation problem, Sun et al. design a tri-level optimization method to plan safe paths in severe ocean environments, in which UUVs have limited detection ranges and are required to respond to emergencies in real time [5]. To guarantee the effectiveness of scheme allocation, heuristic algorithms commonly require a certain number of optimization iterations to generate, update, and select solutions, leading to high time consumption in dynamic task environments.
Multi-agent reinforcement learning (MARL) provides a novel fast-solving framework for multi-agent task planning problems [6,7]. MARL formulates the problem as a sequential Markov decision process, in which intelligent agents collaboratively complete tasks in a shared environment [8,9]. For the control and obstacle avoidance of UUVs, Fang et al. use a decentralized MARL training framework and improve the original multi-agent generative adversarial imitation learning method by introducing a new random selection and updating scheme [10]. In [11], the researchers construct multiple reward functions to adjust the action distribution of UUVs, which quickly allocates task sequences in ocean current environments. MARL has achieved significant results in different UUV planning problems and therefore has great potential in UUV coordination task allocation. However, real task scenarios change dynamically due to emergencies, and MARL inevitably has to reallocate task schemes in different task environments to ensure the completion of tasks. A changing number of UUVs reduces the generalization performance of the MARL algorithm and may lead to model invalidation [11,12].
In this paper, we propose a novel MARL algorithm with an attention mechanism (MARLAM) and a period training method (PTM) to quickly assign tasks to heterogeneous UUVs. The designed MARL model employs an encoder–decoder framework with an attention mechanism to adaptively allocate many irregular task areas to heterogeneous UUVs. In addition, PTM uses periodically changing training conditions to optimize the model parameters, which enables the MARL model to achieve better scalability after the training process. The experiments demonstrate that the proposed MARLAM with the designed period training method (MARLAM-PTM) can use a single trained MARL model to quickly allocate task schemes in different task environments.

2. Dynamic Task Allocation Formulation

2.1. Scenario Description of the Dynamic UUV Allocation Problem

Figure 1 is an illustration of the UUV search coverage task allocation problem with emergencies. Several irregular task areas are randomly distributed in a two-dimensional environment. Heterogeneous UUVs, shown in different colors, have different velocities and search ranges. Each UUV travels sequentially to its allocated tasks and searches the areas along designed coverage search paths. All UUVs start from the same base and return to it after finishing their tasks. The allocated schemes need to minimize the total distance traveled by the UUVs between tasks and the overall time spent searching the irregular areas. As shown in Figure 1, the coverage search path of each UUV needs to be elaborately designed according to the shape and size of the irregular task areas. In addition, two types of emergency are considered here, i.e., broken UUVs and new search tasks. When emergencies occur, the optimization algorithm needs to reallocate tasks quickly and adaptively.

2.2. Objective Functions and Constraints

The heterogeneous UUV group is set as $U = [U_1, U_2, \ldots, U_N]^T$, where $N$ is the number of UUVs. $[x_i, y_i]$, $v_i$, and $r_i$ represent the location coordinates, velocity, and search range of UUV $i$, respectively. $T = [T_1, T_2, \ldots, T_M]^T$ denotes the irregular task areas, where $M$ is the number of tasks. The UUVs are expected to search all irregular task areas with minimum traveling distances and search times. Accordingly, the objective function $f$ is defined as follows:
$$\min f = \sum_{i=1}^{N} \sum_{m=0}^{M} \sum_{n=0}^{M} \left[ d(T_m, T_n)\,\delta_{i,m,n} + \frac{t(U_i, T_m) + t(U_i, T_n)}{2}\,\delta_{i,m,n} \right] \quad (1)$$
where $d(T_m, T_n)$ is the Euclidean distance between the locations of task $m$ and task $n$, and $\delta_{i,m,n}$ is a binary decision variable. We define $\delta_{i,m,n}$ as 1 if task $m$ and task $n$ are allocated to UUV $i$, and otherwise we define it as 0. $t(U_i, T_m)$ denotes the search time of UUV $i$ in task $m$:
$$t(U_i, T_m) = \frac{L_{i,m}}{v_i} \quad (2)$$
where $L_{i,m}$ and $v_i$ are the length of the coverage search path of UUV $i$ on task $m$ and the search velocity of UUV $i$, respectively. As shown in Figure 2a,b, $L_{i,m}$ needs to be specially designed according to the UUV's search range and the task area's shape and size [4]. For the convenience of calculating the coverage search path, a maximum bounding rectangle is employed to approximate the irregular task areas.
In Figure 2a, the irregular task area is approximately represented by its bounding rectangle. Figure 2b shows an illustration of the UUV spiral search strategy, and the coverage search path $L_{i,m}$ can be calculated as follows:
$$L_{i,m} = \frac{l_m w_m}{r_i} - r_i + \frac{\sqrt{l_m^2 + w_m^2}}{2} \quad (3)$$
where $l_m$ and $w_m$ are the length and the width of the bounding rectangle, respectively.
To decrease the total search time of the allocated schemes, UUVs with faster search velocities and wider search ranges tend to be assigned more tasks, leading to an uneven allocation of tasks among the UUVs. To avoid this situation, the maximum task load of the UUVs is defined as follows:
$$\mathrm{load} = \mathrm{ceil}\left(\frac{M}{N-1}\right) \quad (4)$$
where $\mathrm{ceil}(\cdot)$ is the ceiling function, which guarantees that no UUV is overloaded.
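For illustration, the following minimal Python sketch evaluates the time and load model of Eqs. (2)–(4); the function names and the example attribute values are hypothetical, and the spiral-path expression follows the reconstruction of Eq. (3) given above rather than a verified formula.

```python
import math

def spiral_path_length(l_m, w_m, r_i):
    """Approximate coverage path length L_{i,m} for an l_m x w_m bounding
    rectangle searched with range r_i, following Eq. (3) as written above."""
    return l_m * w_m / r_i - r_i + math.sqrt(l_m ** 2 + w_m ** 2) / 2

def search_time(l_m, w_m, r_i, v_i):
    """Search time t(U_i, T_m) = L_{i,m} / v_i from Eq. (2)."""
    return spiral_path_length(l_m, w_m, r_i) / v_i

def max_task_load(num_tasks, num_uuvs):
    """Maximum task load per UUV, load = ceil(M / (N - 1)), from Eq. (4)."""
    return math.ceil(num_tasks / (num_uuvs - 1))

# Hypothetical example: a 6 x 4 area, search range 0.5, search velocity 1.5
print(round(search_time(6.0, 4.0, 0.5, 1.5), 2))  # coverage time for one area
print(max_task_load(40, 4))                        # -> 14, the bound used in case 1
```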

3. MARL with the Attention Mechanism and Period Training Method

To solve the UUV task allocation problem with emergencies, we propose a novel MARL algorithm with an attention mechanism. The algorithm consists of two parts; one is an encoder with deep feature extraction networks, and the other is a decoder based on the attention allocation mechanism. First, the encoder contains two linear projection networks and one self-attention network [13,14,15,16], which is used to extract the high-dimensional features of the UUV and task data. Then, the obtained high-dimensional features are used in the decoder, and we introduce the attention mechanism to allocate task sets for each UUV sequentially. Finally, we design a period training method to optimize the parameters of the encoder and decoder.

3.1. Encoder with Deep Feature Extraction Networks

It is noted that the data dimensions of the UUV data $U_i$ and the task data $T_m$ are different. We use two linear projection networks to unify the dimensions of $U_i$ and $T_m$ before the high-dimensional embedding process.
$$h_i^U = W_1 U_i + b_1 \quad (5)$$
$$h_m^T = W_2 T_m + b_2 \quad (6)$$
where $h_m^T$ and $h_i^U$ are the low-level features of task $m$ and UUV $i$, respectively, with the same dimension $dim$. $W_1$, $b_1$, $W_2$, and $b_2$ are the parameters of the two linear projection networks. Then, we use a simple attention model (SAM) to extract the high-level features [15].
$$[H_i^U, H_m^T] = \mathrm{SAM}\left([h_1^U, \ldots, h_i^U, \ldots, h_N^U, h_1^T, \ldots, h_m^T, \ldots, h_M^T]^T\right) \quad (7)$$
where $H_i^U$ and $H_m^T$ are the high-level features of UUV $i$ and task $m$, respectively, and $[\cdot, \ldots, \cdot]^T$ is the transposition operator.
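A minimal PyTorch sketch of this encoder is shown below, under stated assumptions: the input dimensions, the number of attention heads, and the class name Encoder are illustrative choices, and the SAM of [15] is approximated here by a single shared self-attention layer rather than the authors' exact network.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Sketch of the Section 3.1 encoder: two linear projections (Eqs. (5)-(6))
    followed by a shared self-attention layer standing in for the SAM of Eq. (7)."""

    def __init__(self, uuv_dim=4, task_dim=5, dim=128, n_heads=8):
        super().__init__()
        self.proj_uuv = nn.Linear(uuv_dim, dim)    # h_i^U = W1 U_i + b1
        self.proj_task = nn.Linear(task_dim, dim)  # h_m^T = W2 T_m + b2
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, uuvs, tasks):
        # uuvs: (batch, N, uuv_dim); tasks: (batch, M, task_dim)
        h = torch.cat([self.proj_uuv(uuvs), self.proj_task(tasks)], dim=1)
        H, _ = self.attn(h, h, h)                  # joint self-attention over UUVs and tasks
        n_uuvs = uuvs.size(1)
        return H[:, :n_uuvs, :], H[:, n_uuvs:, :]  # high-level UUV and task features
```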

3.2. Decoder with Attention Mechanisms

The decoder sequentially allocates the tasks to each UUV until all tasks are assigned. During the solving process, the decoder uses the high-level feature vectors extracted by the encoder to construct the task lists. The flow chart of the decoding process is shown in Figure 3.
The decoder uses an attention mechanism to output the task $m_i^t$ for UUV $i$ at a specific time step $t$. We aim to use one trained MARL model to solve problems with different numbers of UUVs and tasks. This means that the dimension of data embedding in the decoder must remain constant in changing task environments. Therefore, combining the high-level feature vectors, the context embedding $c_i^t$ of UUV $i$ at step $t$ is constructed as follows:
$$c_i^t = \left[ H_i^U, H_{m_i^t}^T, \frac{1}{N}\sum_{i=1}^{N} H_{m_i^t}^T, \frac{1}{N}\sum_{i=1}^{N} H_i^U, \frac{1}{M}\sum_{m=1}^{M} H_m^T \right] \quad (8)$$
where $[\cdot, \ldots, \cdot]$ is the horizontal concatenation operator. The dimension of $c_i^t$ is $5\,dim$, which remains unchanged for differing numbers of UUVs and tasks. A single-head attention layer in the decoder is used to calculate the high-level embedding $c_{i,m}^t$ of UUV $i$ with task $m$, which denotes the matching degree between UUV $i$ and task $m$ at decoding step $t$.
$$c_{i,m}^t = \tanh\left(\frac{(W_3 c_i^t)^T (W_4 H_m^T)}{\sqrt{dim}}\right) \quad (9)$$
where $W_3$ and $W_4$ are the parameters of the linear projection networks. The probability that UUV $i$ selects task $m$ at time step $t$ can be calculated as follows:
$$p_{\theta|i,m}^t = \frac{e^{c_{i,m}^t}}{\sum_{m=1}^{M} e^{c_{i,m}^t}} \quad (10)$$
where $p_{\theta|i,m}^t$ represents the selection probability and $\theta$ is the total parameter vector of the encoder and the decoder. According to $p_{\theta|i,m}^t$ and the maximum task load constraint in (4), the decoder circularly outputs the tasks for each UUV as the solution until all tasks are selected.
Note that none of the parameters in the encoder and decoder depend on the numbers of UUVs and tasks. Hence, we can apply one MARLAM model to different numbers of UUVs and tasks without retraining.
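A minimal sketch of one decoding step under the same PyTorch assumptions is given below; the helper decode_step, the parameter shapes, and the mask handling are illustrative and follow Eqs. (8)–(10) only in outline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decode_step(H_uuv, H_task, i, last_task, W3, W4, mask, dim=128):
    """Sketch of one decoding step for UUV i (Eqs. (8)-(10)).
    H_uuv: (N, dim) UUV features; H_task: (M, dim) task features;
    last_task: (N,) indices of each UUV's last selected task;
    mask: (M,) bool, True for tasks already finished or exceeding the load."""
    context = torch.cat([
        H_uuv[i],                   # features of UUV i
        H_task[last_task[i]],       # features of its last selected task
        H_task[last_task].mean(0),  # mean over all UUVs' last-task features
        H_uuv.mean(0),              # mean UUV feature
        H_task.mean(0),             # mean task feature
    ])                              # Eq. (8): 5*dim context, size-independent

    q = W3(context)                                    # (dim,)
    k = W4(H_task)                                     # (M, dim)
    compat = torch.tanh(k @ q / dim ** 0.5)            # Eq. (9)
    compat = compat.masked_fill(mask, float("-inf"))   # enforce the load constraint (4)
    return F.softmax(compat, dim=-1)                   # Eq. (10): selection probabilities

# W3 and W4 would be nn.Linear(5 * dim, dim) and nn.Linear(dim, dim) modules.
```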

3.3. Period Training Method

The total parameter vector $\theta$ is optimized with the proposed PTM. To ensure the convergence of a MARL model, the number of agents usually needs to be fixed during the training process. In the dynamic task allocation problem, however, UUVs may suffer unpredictable events and break down. This changing number of UUVs significantly reduces the performance of the MARL model [11,12]. To adapt to this situation, the trained model needs to reallocate tasks when the number of UUVs changes.
Figure 4 is an illustration of the PTM training framework. Different from common MARL training algorithms, PTM uses data with different numbers of UUVs to construct periodically changing training environments. In addition, an improved actor–critic framework is introduced in the PTM. Herein, we redesign the parameter exchange rule between the actor network and the critic network to achieve the convergence of the MARL model. The total parameter vector $\theta$ is optimized using the policy gradient algorithm [14], and the loss can be calculated as follows:
$$L_\theta = \mathbb{E}_{p_{\theta|i,m}^t}\left[ (f_\pi - f_{\bar{\pi}}) \log p_{\theta|i,m}^t(\pi) \right] \quad (11)$$
where $L_\theta$ represents the loss of the training process, and $\pi$ and $\bar{\pi}$ denote the solutions obtained by the actor network and the critic network, respectively.
To ensure the convergence of the MARL model under periodically changing training conditions, we design a greedy update method to control the parameter replacement between the critic network and the actor network. After finishing an epoch of training, PTM randomly samples the test data $\{U, T\}_1, \ldots, \{U, T\}_L$ not only from the current epoch but also from other epochs with different numbers of UUVs. The parameter replacement rule of the PTM is as follows:
$$\theta_c \leftarrow \theta_a, \quad \text{if } \left. \left( f_\pi - f_{\bar{\pi}} \right) \right|_{\{U,T\}_1, \ldots, \{U,T\}_L} > 0; \qquad \theta_c \leftarrow \theta_c, \quad \text{otherwise} \quad (12)$$
where θ c and θ a are the parameters of the critic network and actor network, respectively. According to (12), PTM replaces the parameters of the critic network by using the actor network parameters when the results of the actor network are better than those of the critic network. Due to periodically changing training conditions and the greedy parameter replacement rule, the critic network can maintain the best model parameters.
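The PTM loop can be sketched as follows, again assuming PyTorch; the helpers sample_instances, rollout, objective, and objective_gap are hypothetical placeholders for instance generation, solution decoding, the objective of Eq. (1), and the averaged difference between actor and critic objectives used in the test of Eq. (12).

```python
import copy
import random
import torch

def train_ptm(actor, critic, optimizer, epochs, uuv_counts=(3, 4), n_tasks=40):
    """Sketch of the PTM loop (Fig. 4): periodically changed UUV counts,
    a policy-gradient update (Eq. (11)), and greedy critic replacement (Eq. (12))."""
    replay = []                                        # instances from past epochs
    for epoch in range(epochs):
        n_uuvs = uuv_counts[epoch % len(uuv_counts)]   # periodically changing environment
        batch = sample_instances(n_uuvs, n_tasks)      # hypothetical instance generator
        replay.append(batch)

        sol_actor, log_prob = rollout(actor, batch, greedy=False)
        with torch.no_grad():
            sol_critic, _ = rollout(critic, batch, greedy=True)   # baseline pi_bar
        # Eq. (11): policy gradient with the critic rollout as baseline
        loss = ((objective(sol_actor) - objective(sol_critic)) * log_prob).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Eq. (12): greedy replacement, tested on samples from current and past epochs
        test_batches = random.sample(replay, k=min(len(replay), 4))
        if sum(objective_gap(actor, critic, b) for b in test_batches) > 0:
            critic.load_state_dict(copy.deepcopy(actor.state_dict()))
```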

4. Simulation Experiment

In the simulation experiments, we use two typical task allocation algorithms as comparisons, i.e., the modified two-part wolf pack search algorithm (MTWPS) [17] and dynamic discrete pigeon-inspired optimization (DDPIO) [18]. In addition, we use the traditional actor–critic training method to train another MARL model; the two MARL models have the same network construction and differ only in their training methods. The number of tasks in the training condition is 40. The common actor–critic method is trained with four UUVs, whereas the number of UUVs in PTM changes periodically between three and four. The simulation data are characterized as scalar values for the convenience of the tests.

4.1. Dynamic Case Settings with Emergencies

All UUVs start from the same base, whose coordinates are (12.5, 12.5), and return to the base when all their tasks are finished. We set three cases with emergencies in simulation experiments.
1. Case 1: We randomly generate 40 irregular task areas as the initial settings. Four heterogeneous UUVs with different velocities and search ranges need to conduct these tasks with preplanned orders and search coverage paths. The original task situation is shown in Figure 5a.
In Figure 5a,b, the irregular blue areas are the task areas that need to be searched, the red rectangles are the corresponding approximate area models, and the black star represents the base. The Euclidean distances and the search times of the UUVs in all areas can be approximately calculated using the simplified area model, which is a fast and efficient way of evaluating allocated task schemes.
2. Case 2: We consider that the UUVs find a group of new task areas while performing case 1. Similarly, the new target areas are approximated using bounding rectangles. To search the new task areas, the algorithms need to reallocate the task schemes in real time.
3. Case 3: Following case 2, we assume that UUV 4 breaks down and cannot search any tasks in case 3. To deal with this emergency, the remaining unfinished tasks are quickly assigned to the other UUVs according to objective function (1) and constraint (4).

4.2. Simulation Experimental Results

Figure 6 shows the allocation results of MARLAM-PTM in three cases. The values and times in Table 1 represent the values of the objective function in Equation (1) and the running times of the different methods, respectively.
Figure 6a displays the allocation result of case 1. The maximum task load of the UUVs in case 1 can be calculated with the constraint given in (4), i.e., $14 = \mathrm{ceil}(40/3)$. As shown in Figure 6a, the numbers of tasks allocated to the four UUVs are 9, 8, 9, and 14, respectively. The scheme allocation of MARLAM-PTM therefore meets the maximum task load constraint. In case 1, the UUVs conduct their allocated task sets in the planned order and search them using the spiral search strategy. After completing the tasks, all UUVs return to the base. As shown in Table 1, the proposed MARLAM-PTM exhibits a fast solution speed due to the new solving framework.
Figure 6b demonstrates the scheme allocation of MARLAM-PTM in case 2, where the number of tasks differs from the training conditions. Some task areas have already been searched and are connected by broken lines, and the brown dots represent the new search tasks. In Figure 6b, the new search tasks are mainly located in the left corner of the task environment, near the original allocated tasks of UUV 2. In the reallocated schemes, most of the new tasks are assigned to UUV 2 in real time, which reduces the overall traveling distances of the UUVs. As shown by the results of case 2, MARLAM-PTM adaptively reallocates tasks to each UUV and ensures that all new task areas can be searched.
As shown in Figure 6c, in case 3, the numbers of UUVs and tasks are both different from the training conditions of the two MARL models. During the process of carrying out tasks, UUV 4 breaks down, and its unfinished tasks need to be reallocated to the other UUVs. The locations of the unfinished tasks are mainly near the task sets of UUV 3, and Figure 6c shows that most of them are reallocated to UUV 3. It is noteworthy that, in case 3, the MARLAM-PTM model obtains a lower objective value than the MARLAM model. The periodically changing training conditions and the greedy parameter replacement rule in the PTM enable the model to remain robust when solving problems with different numbers of UUVs.
To test the robustness and scalability of MARLAM-PTM in various situations, we add four cases, each of which contains three different task allocation problems. As shown in Table 2 and Figure 7, Figure 8, Figure 9 and Figure 10, the additional tests have different settings. It is worth noting that we use the same trained model in all four cases, which have different test conditions. Table 2 and Figure 7, Figure 8, Figure 9 and Figure 10 show the results of task allocation.
According to the results of cases 4 and 5, the two MARL models exhibit stable problem-solving capabilities when the number of tasks increases, and they yield similar results in task allocation across all six problems. Previous studies have shown that MARL models possess strong scalability with respect to the number of tasks [13]. Conversely, in cases 6 and 7, the performance of MARLAM degrades as the number of UUVs decreases. From the results of MARLAM-PTM in cases 6 and 7, the proposed algorithm benefits from the PTM training scheme, which cyclically alters the number of UUVs, and the greedy parameter replacement rule in Equation (12) ensures the convergence and robustness of the proposed algorithm across different numbers of UUVs. As shown in the results, this approach maintains strong performance in most of the problems when the number of UUVs changes.

5. Conclusions

In this paper, a new MARL algorithm with a period training method is proposed to solve the heterogeneous UUV dynamic task allocation problem with emergencies. The proposed MARLAM algorithm can quickly allocate tasks to each UUV. The dimension of data embedding in the MARL model remains constant, which ensures that the algorithm can solve problems with different numbers of UUVs and allocate tasks with a single trained model. In addition, the designed PTM uses periodically changing training conditions and a greedy parameter replacement rule to improve the scalability of the MARLAM model. Based on the simulation test results, the proposed method can quickly allocate different task areas to heterogeneous UUVs in different task environments.

Author Contributions

Resources, S.G.; validation, L.Z. and J.X.; writing—original draft preparation, X.W. and K.Y.; writing—review and editing and supervision, S.G. and S.B.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported, in part, by the National Natural Science Foundation of China (grant number 61871307) and the Fundamental Research Funds for the Central Universities (JB210207).

Data Availability Statement

Data is unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gan, W.; Xia, T.; Chu, Z. A Prognosis Technique Based on Improved GWO-NMPC to Improve the Trajectory Tracking Control System Reliability of Unmanned Underwater Vehicles. Electronics 2023, 12, 921. [Google Scholar] [CrossRef]
  2. Lemieszewski, L.; Radomska-Zalas, A.; Perec, A.; Dobryakova, L.; Ochin, E. GNSS and LNSS Positioning of Unmanned Transport Systems: The Brief Classification of Terrorist Attacks on USVs and UUVs. Electronics 2021, 10, 401. [Google Scholar] [CrossRef]
  3. Zuo, L.; Hu, J.; Sun, H.; Gao, Y. Resource allocation for target tracking in multiple radar architectures over lossy networks. Signal Process 2023, 208, 108973. [Google Scholar] [CrossRef]
  4. Baylog, J.G.; Wettergren, T.A. A ROC-Based Approach for Developing Optimal Strategies in UUV Search Planning. IEEE J. Ocean. Eng. 2018, 43, 843–855. [Google Scholar] [CrossRef]
  5. Sun, S.; Song, B.; Wang, P.; Dong, H.; Chen, X. Real-Time Mission-Motion Planner for Multi-UUVs Cooperative Work Using Tri-Level Programing. IEEE Trans. Intell. Transp. Syst. 2022, 23, 1260–1273. [Google Scholar] [CrossRef]
  6. Ao, T.; Zhang, K.; Shi, H.; Jin, Z.; Zhou, Y.; Liu, F. Energy-Efficient Multi-UAVs Cooperative Trajectory Optimization for Communication Coverage: An MADRL Approach. Remote Sens. 2023, 15, 429. [Google Scholar] [CrossRef]
  7. Sun, Y.; He, Q. Computational Offloading for MEC Networks with Energy Harvesting: A Hierarchical Multi-Agent Reinforcement Learning Approach. Electronics 2023, 12, 1304. [Google Scholar] [CrossRef]
  8. He, Z.; Dong, L.; Sun, C.; Wang, J. Asynchronous Multithreading Reinforcement-Learning-Based Path Planning and Tracking for Unmanned Underwater Vehicle. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 2757–2769. [Google Scholar] [CrossRef]
  9. Qian, F.; Su, K.; Liang, X.; Zhang, K. Task Assignment for UAV Swarm Saturation Attack: A Deep Reinforcement Learning Approach. Electronics 2023, 12, 1292. [Google Scholar] [CrossRef]
  10. Fang, Z.; Jiang, D.; Huang, J.; Cheng, C.; Sha, Q.; He, B.; Li, G. Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning. Ocean Eng. 2022, 262, 112182. [Google Scholar] [CrossRef]
  11. Ding, C.; Zheng, Z. A Reinforcement Learning Approach Based on Automatic Policy Amendment for Multi-AUV Task Allocation in Ocean Current. Drones 2022, 6, 141. [Google Scholar] [CrossRef]
  12. Liang, Z.; Dai, Y.; Lyu, L.; Lin, B. Adaptive Data Collection and Offloading in Multi-UAV-Assisted Maritime IoT Systems: A Deep Reinforcement Learning Approach. Remote Sens. 2023, 15, 292. [Google Scholar] [CrossRef]
  13. Zhang, K.; He, F.; Zhang, Z.; Lin, X.; Li, M. Multi-vehicle routing problems with soft time windows: A multiagent reinforcement learning approach. Transp. Res. C Emerg. Technol. 2022, 121, 102861. [Google Scholar] [CrossRef]
  14. Kool, W.; van Hoof, H.; Welling, M. Attention, Learn to Solve Routing Problems! In Proceedings of the 2019 International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  15. Zuo, L.; Gao, S.; Li, Y.; Li, L.; Li, M.; Lu, X. A Fast and Robust Algorithm with Reinforcement Learning for Large UAV Cluster Mission Planning. Remote Sens. 2022, 14, 1304. [Google Scholar] [CrossRef]
  16. Ren, L.; Fan, X.; Cui, J.; Shen, Z.; Lv, Y.; Xiong, G. A Multi-Agent Reinforcement Learning Method with Route Recorders for Vehicle Routing in Supply Chain Management. IEEE Trans. Intell. Transp. Syst. 2022, 23, 16410–16420. [Google Scholar] [CrossRef]
  17. Chen, T.; Yang, T.; Yu, Y. Multi-UAV Task Assignment with Parameter and Time-Sensitive Uncertainties Using Modified Two-Part Wolf Pack Search Algorithm. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 2853–2872. [Google Scholar] [CrossRef]
  18. Duan, H.; Zhao, J.; Deng, Y.; Shi, Y.; Ding, X. Dynamic Discrete Pigeon-Inspired Optimization for Multi-UAV Cooperative Search-Attack Mission Planning. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 706–720. [Google Scholar] [CrossRef]
Figure 1. An illustration of UUV task allocation with dynamic emergencies.
Figure 2. An approximate model of irregular areas: (a) the maximum bounding rectangle; (b) the spiral search strategy.
Figure 3. The decoding process with the attention mechanism.
Figure 4. The training process of PTM.
Figure 5. The original task situation with the approximate area model. (a) Irregular target areas; (b) approximate area models.
Figure 6. The results of MARLAM-PTM: (a) the allocation result of case 1; (b) the reallocation result of case 2; (c) the reallocation result of case 3.
Figure 7. The results of MARLAM-PTM in case 4.
Figure 8. The results of MARLAM-PTM in case 5.
Figure 9. The results of MARLAM-PTM in case 6.
Figure 10. The results of MARLAM-PTM in case 7.
Table 1. Optimization results of different algorithms.

| Algorithm | Case 1 Value | Case 1 Time | Case 2 Value | Case 2 Time | Case 3 Value | Case 3 Time |
|---|---|---|---|---|---|---|
| MTWPS | 209.2 | 12.3 s | 223.1 | 13.1 s | 197.1 | 12.5 s |
| DDPIO | 206.5 | 14.6 s | 264.6 | 17.2 s | 211.1 | 14.9 s |
| MARLAM | 203.4 | 2.0 s | 210.2 | 2.3 s | 197.0 | 1.5 s |
| MARLAM-PTM | 205.8 | 1.8 s | 205.6 | 2.4 s | 183.3 | 1.5 s |
Table 2. Optimization results of different algorithms.

| Case | Setting | MARLAM-PTM | MARLAM | MTWPS | DDPIO |
|---|---|---|---|---|---|
| Case 4-1 | 40 T-4 U | 205.62 | 206.5 | 209.3 | 216.2 |
| Case 4-2 | 40 T-4 U | 229.65 | 232.98 | 224.16 | 233.24 |
| Case 4-3 | 40 T-4 U | 193.35 | 194.68 | 195.24 | 204.27 |
| Case 5-1 | 60 T-4 U | 252.56 | 259.61 | 270.27 | 274.86 |
| Case 5-2 | 60 T-4 U | 259.87 | 261.48 | 267.07 | 269.51 |
| Case 5-3 | 60 T-4 U | 259.90 | 261.28 | 258.54 | 267.67 |
| Case 6-1 | 40 T-3 U | 181.84 | 189.08 | 190.42 | 190.06 |
| Case 6-2 | 40 T-3 U | 199.19 | 217.81 | 205.88 | 210.12 |
| Case 6-3 | 40 T-3 U | 199.98 | 203.8 | 200.52 | 209.11 |
| Case 7-1 | 60 T-3 U | 248.96 | 253.98 | 244.82 | 258.10 |
| Case 7-2 | 60 T-3 U | 249.77 | 251.63 | 254.20 | 256.03 |
| Case 7-3 | 60 T-3 U | 244.40 | 245.89 | 244.12 | 260.85 |