1. Introduction
With the increasing complexity of underwater exploration missions, collaborative operations of Autonomous Underwater Vehicle (AUV) swarms have become a critical technology for addressing challenges in wide-area search, target localization, and environmental monitoring [1,2]. However, during complex and dynamic missions, AUV swarms must adapt their collaborative strategies in real time under conditions of limited communication, environmental uncertainty, and frequent mission changes [3]. Identifying individual intents and integrating them into swarm decision-making processes is a core challenge for enabling efficient collaborative search among AUVs.
Intent recognition, as a technique for inferring decision goals from observed behaviors, plays a crucial role in scenarios with incomplete or uncertain information [4]. By analyzing observed behaviors to infer underlying intents, this method provides essential support for swarm mission planning and role allocation. Combining intent recognition with collaborative control techniques can not only enhance the adaptability of AUV swarms in dynamic and complex environments but also offer innovative solutions for multi-mission collaboration and emergency response.
Intent recognition fundamentally involves analyzing behaviors or information to infer the goals or motivations of individuals or swarms within specific contexts. It transforms external observations (such as actions, states, or communication signals) into high-level cognitive constructs (such as intents or objectives), serving as a critical bridge between observable behaviors and underlying decision-making processes. In real-world scenarios, observed information is often incomplete or noisy. Intent recognition leverages the integration of multi-source data, prior knowledge, and environmental context to reconstruct genuine intents from limited observations.
Intent recognition has found widespread applications in various domains [5,6], including human behavior prediction [7,8], vehicle lane change intent detection [9,10], aerial combat target intent prediction [11,12], and question-answering systems [13,14], supported by relatively mature methodologies. Currently, intent recognition approaches can be broadly categorized into two main types: model-based methods and data-driven methods.
Model-based intent recognition methods rely on predefined frameworks combined with adaptive parameter adjustments to construct explicit inference models. These methods have made significant advancements and have largely reached maturity. Prominent approaches in this category include template matching, expert systems, decision trees, and Bayesian networks. These methods excel in scenarios where domain knowledge is well understood and can be explicitly modeled.
Data-driven intent recognition methods, on the other hand, leverage neural networks and deep learning techniques to learn latent patterns directly from data, without the need for prior assumptions about prototype models. The rapid advancements in data collection and computational capabilities have enabled the proliferation of data-driven algorithms in recent years. Key methods in this category include artificial neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs), as well as cutting-edge deep learning frameworks such as transformers. While model-based methods benefit from explainability and reliability, particularly in structured environments, data-driven approaches demonstrate superior adaptability and generalization in complex and dynamic scenarios. The integration of these two paradigms, combining the robustness of model-based inference with the flexibility of data-driven learning, represents a promising direction for advancing intent recognition technologies.
Although the theory of intent recognition has seen significant advancements and has reached a relatively mature stage, its application in the field of multi-agent collaboration remains underexplored and requires further development. In particular, research on intent recognition in the domain of underwater robotic collaboration is still in its infancy, with limited foundational work and significant gaps in practical implementation. The unique challenges posed by underwater environments, such as limited communication bandwidth, high latency, and the dynamic nature of oceanic conditions, further complicate the application of intent recognition techniques [15]. The existing intent recognition methods face several challenges when applied to underwater multi-AUV collaborative search missions, which are outlined as follows:
In contrast to other domains where intention recognition often relies on inherent or well-established mappings, such as the intuitive association between human actions and intentions, no such predefined or standardized relationship exists in the context of AUV collaborative search missions.
Unlike autonomous driving and collaborative surface or aerial robots, which can rely on high-precision sensors to obtain large volumes of high-resolution environmental data in real time, multi-AUV systems are constrained by the low bandwidth, high latency, and high packet loss rates of underwater acoustic communication. As a result, intent recognition must work with only a small amount of exchanged data. In addition, accumulated errors in underwater navigation lead to inaccurate positioning information. Together, these factors reduce the accuracy and completeness of the data available for intent recognition.
To address the challenges mentioned above, we propose an intent recognition method for AUVs, termed the consensus-based intent recognition (CBIR) approach. Specifically, this method is grounded in the Belief–Desire–Intention (BDI) framework [16]. The BDI framework is a cognitive architecture widely used in autonomous systems and multi-agent environments to model rational decision-making processes. The BDI framework consists of three core components: beliefs, desires, and intentions. Beliefs represent the agent’s understanding of the environment, including information received from sensors, prior knowledge, and inferred data. Beliefs form the foundation of the agent’s perception of the current state of the world. Desires reflect the goals or objectives the agent aims to achieve. Desires are not bound by practicality; instead, they represent the ideal outcomes or end states the agent wishes to reach. Intentions represent the specific plans or actions the agent commits to executing in order to achieve its desires, given its beliefs. Intentions bridge the gap between abstract goals and practical actions, guiding the agent’s behavior in a structured manner.
In this study, beliefs represent the AUV’s perception of the state information of both the swarm and the target. Desires refer to the goal of successfully locating the target, while intentions indicate the actions taken by the AUV, based on its understanding of global state information, to complete the collaborative search mission. The BDI framework is used to unify the decision-making process of the entire AUV swarm. Intent recognition enhances an AUV’s understanding of the global situation by identifying the intentions of other AUVs, thereby optimizing its decision-making process.
As previously analyzed, intent recognition in the AUV domain lacks a clear mapping between behavioral states and intentions. Therefore, the first challenge is to determine which state information influences AUV intentions and how this information affects decision making. Moreover, in collaborative search missions, the AUV swarm cannot complete the target search with a single decision. Throughout the mission, maintaining communication and improving detection accuracy must also be considered, so the swarm must adjust its decisions dynamically many times to accomplish the mission. This complexity makes the mapping between behavioral states and intentions highly intricate.
To address this, we introduce the concept of “landmarks” [17]. In this context, a “landmark” refers to a representative state that signifies potential short-term achievements during the intermediate stages of completing the mission. Specifically, to execute the collaborative search mission, the AUV needs to dynamically adjust its decisions at fixed intervals based on the current global situation, producing an immediate intention that serves as the “landmark”. In this study, “landmarks” are categorized into three types: optimizing communication, enhancing detection, and rapidly approaching the target. This classification unifies the intention types throughout the entire swarm decision-making process.
In practical operations, the situational information and action sequences of the AUV swarm often constitute massive datasets. It is neither feasible nor practical to use all this information to establish a mapping between behavioral states and intentions. On one hand, underwater acoustic communication is limited and cannot support large-scale data exchange. On the other hand, only a subset of this information truly influences intentions, while most of the data are redundant.
To address these challenges, we propose using a fuzzy inference method to establish the mapping between state information and AUV intentions. Specifically, by combining the three intention types mentioned above, we select key information that impacts decision making. By adjusting the fuzzy inference model, we enable it to produce corresponding intention results based on varying input data. Additionally, this approach ensures that when different AUVs receive identical or similar state information, they can make consistent decisions, achieving consensus within the swarm. We refer to this process as establishing the AUV’s behavior model.
In an ideal scenario with no communication packet loss or delay between AUVs, the intentions of other AUVs could be inferred directly by applying the behavior model to identical state features. In practice, however, packet loss and delay mean that each AUV can only infer the intentions of others from similar, rather than identical, state features. To overcome this limitation, we propose using a residual convolutional neural network (ResCNN) to achieve robust and accurate intention inference for AUVs, even under unreliable communication.
The main innovations of the proposed method are summarized as follows:
The method introduces the concept of “landmarks” to construct consensus among AUVs, focusing only on critical information that influences key behaviors. By leveraging the behavioral consistency of AUVs in similar or identical situations, this approach significantly reduces the data volume required for intent recognition, alleviates communication overhead in underwater acoustic channels, and enhances both the accuracy and efficiency of intent prediction.
The proposed method adopts a dual-stage architecture comprising a behavior modeling phase and an intent inference phase. In the behavior modeling phase, fuzzy inference maps the state space to the action space, providing an initial characterization of the current intent. The intent inference phase employs a residual convolutional neural network (ResCNN) to further analyze and predict the target intent based on behavioral data. This modular design enhances the flexibility and predictive power of the system.
The remainder of this paper is organized as follows:
Section 2 provides a brief overview of several intent recognition methods, highlighting their strengths and limitations.
Section 3 introduces the problem of intent recognition for the AUV collaborative search mission.
Section 4 presents a detailed explanation of the proposed behavior modeling and intent inference methods.
Section 5 conducts experiments and analyzes the performance of the proposed model.
Finally, Section 6 concludes the paper with a summary of the findings and discusses potential directions for future work.
3. Description of the Intent Recognition Problem in AUV Collaborative Search Mission
In multi-AUV collaborative search missions, AUV swarms are tasked with performing target search, tracking, and encirclement in unknown or partially observable complex marine environments. Given the dynamic and uncertain nature of the marine environment, along with the concealment and maneuverability of targets, AUVs must make rapid and accurate collaborative decisions under limited environmental information and communication constraints. During this process, each AUV predicts the intentions of other AUVs based on local sensor observations and inter-agent interaction data (e.g., position, velocity, direction), infers the behavioral objectives of each AUV, and designs multi-AUV collaborative strategies in each decision cycle to ensure the efficiency and robustness of mission execution.
The intent space and the intent feature inputs are described below.
3.1. AUV Intention Space
Collaborative search missions typically involve long durations and wide spatial spans. In such missions, a swarm of AUVs must collaborate intensively to complete tasks such as target detection, tracking, and localization. We assume that the process unfolds as follows:
In the initial stage, the AUV swarm patrols a broad marine area to identify potential targets. Because passive sonar offers low power consumption and high concealment, it is commonly used for detection in AUV swarms. At the start of the mission, one AUV detects a target with its passive sonar and shares the preliminary target information with the entire swarm, so that every other AUV receives the target’s approximate location.
To enhance the accuracy of target localization and gather more information about the target, the AUV swarm needs to perform a collaborative localization and tracking mission. The entire mission is divided into two stages:
Positioning Stage: To quickly approach the target and prepare for subsequent collaborative localization, the AUV swarm first designs a series of positioning points based on the target’s location. These points are chosen to provide a favorable spatial geometry for collaborative detection and localization. Each AUV navigates from its initial position to its assigned positioning point, preparing for continuous tracking and localization. During this stage, the AUV swarm must reach the positioning points as quickly as possible while ensuring that no AUV is left behind.
Tracking Stage: Once the AUVs reach their positioning points, they adopt a bearing-only passive target tracking (BOT) method to continuously track and localize the target [43]. The BOT method enables the AUVs to adjust their headings and speeds based on the relative direction of the target, thereby improving localization accuracy and stability.
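The BOT procedure of [43] is not detailed in this section. As a rough illustration of why bearings from several AUVs are enough to localize a target, the Python sketch below computes the least-squares intersection of simultaneous bearing lines. This is a swapped-in geometric technique, not the BOT method of [43]; it assumes a 2D plane, bearings measured counterclockwise from the +x axis, and a hypothetical function name `triangulate_bearings`.

```python
import math


def triangulate_bearings(positions, bearings):
    """Least-squares intersection of bearing lines (illustrative sketch).

    positions: list of (x, y) observer positions.
    bearings: bearing angles in radians, counterclockwise from +x (assumed).
    Returns the point minimizing the summed squared perpendicular distances
    to all bearing lines.
    """
    # Line through (xi, yi) with direction t: sin(t)*x - cos(t)*y = c.
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), t in zip(positions, bearings):
        a, b = math.sin(t), -math.cos(t)  # unit normal of the bearing line
        c = a * xi + b * yi
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("bearing lines are (nearly) parallel; target unobservable")
    x = (s_bb * s_ac - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

With noisy bearings the result is the point closest to all lines in the least-squares sense; the geometry of the positioning points determines how well conditioned `det` is, which is one reason the positioning points matter.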
Throughout the collaborative search mission, in addition to the target position and bearing information, the AUV swarm also needs to share other relevant information, such as detection errors, in real time. This allows the strategy to be adjusted flexibly as the environment and mission progress change, ensuring the efficiency and robustness of the mission. To achieve this, we collect the aforementioned information in each decision cycle to determine the AUV’s behavior in the next cycle, which is essentially the AUV’s intent. Specifically, we have designed intent strategies corresponding to the two main phases of the mission.
In the positioning phase, the AUV’s intent can be classified into two types: communication priority and positioning priority. The communication priority intent emphasizes maintaining stable communication connections within the swarm, ensuring smooth information flow and preventing swarm disintegration due to communication interruptions. On the other hand, the positioning priority intent focuses on rapidly reaching the predetermined positioning points to ensure a quick response for the subsequent target tracking mission.
In the tracking phase, the AUV swarm must minimize target detection errors to enhance localization accuracy. However, while the BOT method helps reduce detection errors, it overlooks the detection range, which may cause the AUV to exceed the detection radius. To address this limitation, the AUV’s intent in this phase can be categorized into two types: detection priority and positioning priority. Under the detection priority intent, the AUV continues to track the target using the BOT method, further refining the target’s localization accuracy. Once the distance to the target reaches a predefined threshold, the intent shifts to positioning priority. The positioning priority intent prioritizes rapidly approaching the target within a specified heading range to minimize the distance. Upon reaching the threshold, the intent transitions back to detection priority.
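The tracking-phase switching rule can be sketched as a small state machine. This Python sketch assumes two hypothetical thresholds, `d_leave` and `d_return`: setting them equal recovers the single threshold described above, while choosing `d_return < d_leave` adds hysteresis so the intent does not chatter near the boundary. The function and constant names are illustrative, not taken from the paper.

```python
DETECTION_PRIORITY = "detection priority"
POSITIONING_PRIORITY = "positioning priority"


def next_tracking_intent(current_intent, distance_to_target, d_leave, d_return):
    """Return the next tracking-phase intent (illustrative sketch).

    d_leave: distance at or beyond which BOT tracking is suspended and the AUV
        approaches the target (hypothetical parameter).
    d_return: distance at or below which BOT tracking resumes (hypothetical;
        use d_return == d_leave for a single threshold).
    """
    if d_return > d_leave:
        raise ValueError("d_return must not exceed d_leave")
    if current_intent == DETECTION_PRIORITY and distance_to_target >= d_leave:
        return POSITIONING_PRIORITY
    if current_intent == POSITIONING_PRIORITY and distance_to_target <= d_return:
        return DETECTION_PRIORITY
    return current_intent
```

For example, with `d_leave=4000` and `d_return=3000`, the distance sequence 3000, 3600, 4100, 3900, 3400, 2900 produces detection, detection, positioning, positioning, positioning, detection.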
Because the positioning priority intent in both phases directs the AUV toward the target, the intents throughout the search process reduce to three types: communication priority, positioning priority, and detection priority.
3.2. AUV Intention Characteristics
The intent of AUVs is manifested through their actions and states, with the mapping from behavior and state to intent being determined by the collaborative search decision-making process. As such, it is imperative to first identify the factors that influence this decision-making process. Building on our prior work [44], we identify several key factors in the positioning phase, including the distance to the positioning point, the success rate of the received data, the distance to the parent node in the communication topology (PNCT), and the number of companions. In the tracking phase, the principal factors include the distance to the target, current detection errors, and the variance of detection errors over time. If these factors are obtained in a timely manner, the intentions of other AUVs can be predicted using the same decision-making method, facilitating adjustments to the collaborative strategy. However, due to the inherent characteristics of underwater acoustic communication, such as significant delays and packet loss, the transmission of this information may become unreliable. As a result, intention recognition is crucial for predicting the intentions of other AUVs.
Intent recognition necessitates the fusion and analysis of information obtained from sensors, making the selection of appropriate feature inputs for the model crucial. The aforementioned factors are key contributors to intent recognition, but certain information, such as packet loss rate and detection error, exhibits significant temporal variation and has a more direct impact on intent. These factors can adversely affect the intent recognition of AUVs under conditions of high latency and packet loss. Therefore, we ultimately select the following information as inputs for the model: distance to the positioning point, distance to the PNCT, relative direction to the positioning point, relative direction to the PNCT, number of companions, and the average detection error. By integrating and analyzing these factors, we can more accurately capture the key elements influencing the variation in AUV intentions, thereby improving the accuracy and robustness of intent recognition.
Table 2 further lists the specific meanings and related descriptions of each AUV intent feature indicator.
3.3. Dataset Construction for Intent Recognition in Multi-AUV Collaborative Search Mission
In the context of the multi-AUV collaborative search mission, the dataset construction process is critical for training effective intent recognition models. The quality of the data, as well as the selection of relevant features, greatly impacts the model’s ability to predict and understand the intentions of AUVs in complex marine environments. The simulation dataset and real-time feature set used in this study were obtained through a collaborative search simulation platform developed in our previous work. The platform comprises multiple modules, including AUV kinematics and dynamics simulation, underwater acoustic channel simulation, underwater acoustic networking communication simulation, and the simulation of relevant control algorithms. During the simulation of the collaborative search mission, the platform can record the AUV state features in real time and construct the corresponding intent recognition dataset from these features.
During the simulation, various state information of the AUV swarm is recorded, including but not limited to: AUV motion characteristics, such as speed, attitude, and position, which reflect the dynamic state of the AUV in space; communication quality indicators, such as packet loss rate and latency, which represent the stability of the underwater acoustic network and the efficiency of information transmission; and networking status, including the communication state between AUVs, signal strength, and effective connections. These state features are used to construct the AUV behavior model, from which the AUV’s intent can be inferred.
To achieve cooperative decision making and intent recognition for the AUV swarm, the collected state features are matched during the simulation, and the corresponding intent labels are assigned based on the AUV’s behavior and mission requirements. Specifically, within each decision cycle, we determine the AUV’s behavioral intent from the feature information of the current phase. In the positioning phase, the AUV’s mission is to reach the designated position quickly, and state features such as the distance to the positioning point, the relative direction to the PNCT, and the number of companions indicate whether the AUV has deviated from the mission requirements. In the tracking phase, the AUV’s goal is to locate the target precisely, and information such as detection error and distance to the target is critical for determining its behavioral goal. Through the mapping relationships in the behavior model, we match the collected state features with the corresponding intent labels (e.g., “communication priority”, “positioning priority”) to form the intent recognition dataset.
To ensure the training effectiveness of the intent recognition model, we have added corresponding intent labels to each data entry in the simulation dataset. These labels are annotated based on the actual performance of the AUV’s state features in the mission. For example, during the positioning phase, the AUV’s intent label can be either “positioning priority” or “communication priority”, depending on the AUV’s mission requirements and communication status. During the tracking phase, the AUV’s intent label may be “detection priority” or “positioning priority”, depending on the current detection accuracy and mission requirements.
To improve the model’s generalization ability and avoid overfitting, we applied data augmentation to the dataset. Injecting noise makes the data better reflect the diversity and uncertainty of real-world environments, and normalization places all features on a comparable scale.
Once the dataset is ready, we divide it into training, validation, and test sets for model training and evaluation. The training set is used to train the model, the validation set is used for parameter tuning and preventing overfitting, and the test set is used to evaluate the model’s performance and generalization ability.
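The noise model and split ratios are not specified here, so the following Python sketch shows one common way to carry out these steps under stated assumptions: a 70/15/15 split, z-score normalization with statistics computed on the training split only, and additive Gaussian noise injected only into training samples. Splitting before normalizing and augmenting is a design choice made in this sketch to avoid leaking validation or test information into training; the function name and defaults are hypothetical.

```python
import random
import statistics


def split_normalize_augment(samples, labels, val_frac=0.15, test_frac=0.15,
                            noise_std=0.05, n_copies=1, seed=0):
    """Split a labeled feature dataset, z-score it, and add noisy copies.

    samples: equal-length feature vectors (e.g., DPP, DPN, RDP, RDPN, NoCs, ADE).
    labels: intent labels aligned with samples.
    Returns ((x_train, y_train), (x_val, y_val), (x_test, y_test)).
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_frac)
    n_val = round(len(idx) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]

    # Per-feature statistics from the training split only.
    n_feat = len(samples[0])
    cols = [[samples[i][j] for i in train_idx] for j in range(n_feat)]
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features

    def norm(v):
        return [(x - m) / s for x, m, s in zip(v, means, stds)]

    def subset(ids):
        return [norm(samples[i]) for i in ids], [labels[i] for i in ids]

    x_train, y_train = subset(train_idx)
    # Gaussian noise injection in normalized units (assumed augmentation scheme).
    aug_x, aug_y = [], []
    for _ in range(n_copies):
        for v, y in zip(x_train, y_train):
            aug_x.append([x + rng.gauss(0.0, noise_std) for x in v])
            aug_y.append(y)
    return (x_train + aug_x, y_train + aug_y), subset(val_idx), subset(test_idx)
```

Keeping the validation and test sets free of injected noise means they still measure performance on the unperturbed simulation data.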
4. CBIR Method in the AUV Collaborative Search Mission
To address the challenges faced by underwater robots in collaborative search missions, we propose a method called consensus-based intent recognition (CBIR), shown in Figure 1. We first unified the AUV swarm decision-making process based on the BDI framework. To establish the mapping between behavioral states and intentions, we introduced the concept of “landmarks”, standardizing the intention types throughout the swarm decision-making process. These intention types include optimizing communication, enhancing detection, and rapidly approaching the target. Building on our previous work [44], we identified the key information influencing AUV decision-making: distance to the positioning point (DPP), success rate of received data (PRR), distance to the PNCT (DPN), number of companions (NoCs), detection error (DE), and variance of detection error (VDE).
These features were used as inputs for the fuzzy inference model, with the three intention types as outputs. Using a fuzzy inference method, we constructed a mapping between this information and the intended actions, achieving consensus among the AUV swarm. This approach ensures that when different AUVs receive the same or similar state information, they make consistent decisions.
Once the mapping relationship is established, intent recognition for other AUVs can be achieved by classifying this information. However, due to the large delays and high packet loss characteristics of underwater acoustic communication, certain key information with significant temporal fluctuations, such as PRR, DE, and VDE, could negatively impact classification accuracy. Therefore, we replaced these variables with alternative features that exhibit less temporal variability but still reflect the necessary characteristics. The final input features for the classification model include the DPP, DPN, relative direction to the positioning point (RDP), relative direction to the PNCT (RDPN), number of companions (NoCs), and the average detection error (ADE). The outputs of the classification model remain consistent with the three intention types: communication priority, detection priority, and positioning priority.
In the collaborative search process of AUVs, information flow begins with collecting key state data from other AUVs, including parameters such as DPP, DPN, RDP, RDPN, NoCs, and ADE. These data are processed by a pre-trained ResCNN to predict the intents of other AUVs. Using both the predicted intents and the data received through underwater acoustic communication, the system computes the future states of other AUVs. The fuzzy inference model uses DPP, PRR, DPN, NoCs, DE, and VDE to determine the local AUV’s intent, ensuring consensus within the swarm. Finally, the identified intent guides the path planning process, enabling the AUV to make adaptive decisions and contribute effectively to the collaborative search mission. The overall process is illustrated in Figure 2.
In summary, by combining fuzzy inference and deep learning, and utilizing landmark consensus, the CBIR method effectively reduces communication overhead, enhancing the robustness and scalability of intent recognition in collaborative search missions. This provides an efficient and practical solution for AUVs to perform collaborative missions and achieve common objectives in complex underwater environments.
4.1. Construction of Behavior Models Based on Fuzzy Inference
Fuzzy inference is used in this study to establish the AUV behavior model, essentially serving to unify the decision-making method of the AUV swarm. The core idea is to ensure that the same or similar information consistently leads to the same decision outcome, thereby achieving decision-making consensus among the AUVs. We applied Gaussian membership functions and triangular membership functions to fuzzify the inputs and outputs, respectively. Through multiple tests, we established a comprehensive fuzzy rule table. The Mamdani-type fuzzy implication operator and the “max-min” fuzzy composition operation were used to compute the inference result for each rule. The weighted average defuzzification method was then applied. The complete process is illustrated in Figure 3.
Fuzzy inference is a method based on fuzzy logic, used to handle situations involving uncertainty and vagueness [45]. Fuzzy inference allows for the processing of fuzzy information in a system, simulating the way humans handle ambiguous and complex decisions to draw reasonable conclusions. Fuzzy inference relies on fuzzy set theory and fuzzy logic. Traditional binary logic can only represent “true” or “false”, but fuzzy logic allows for continuous values between “true” and “false”, commonly using numbers between 0 and 1 to represent the degree of membership of an element. Fuzzy inference consists mainly of four components: fuzzification, fuzzy rules, fuzzy inference, and defuzzification.
4.1.1. Fuzzification
The fuzzification process involves defining several fuzzy subsets for each input variable over its domain and assigning a membership function to each subset. The membership function describes the degree, between 0 and 1, to which each element of the domain belongs to the fuzzy subset, and it is typically presented graphically.
We define the fuzzy domain for NoCs as [0, n − 1], where n represents the total number of AUVs. The fuzzy domain for PRR is [0, 1], the fuzzy domain for DPP is [−3000, 0], the fuzzy domain for DPN is [−1000, 0], the fuzzy domain for detection error is [0, 10], and the fuzzy domain for detection error variance is [0, 3]. The linguistic values for all these domains are represented as “Large (L)”, “Medium (M)”, and “Small (S)”.
Because the smooth curve of the Gaussian membership function is well suited to continuous and fuzzy data, we choose it for the inputs. The Gaussian membership function is
$$\mu(x) = \exp\left(-\frac{(x - c)^2}{2\sigma^2}\right)$$
where $x$ is the input value, $c$ is the center of the Gaussian function, and $\sigma$ is the standard deviation (which controls the width of the curve).
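A minimal Python sketch of fuzzifying one input with three Gaussian sets (S, M, L). The evenly spaced centers (the two domain ends and the midpoint) and the width σ = (hi - lo)/4 are illustrative assumptions, not the tuned parameters of the behavior model; `gaussmf` and `fuzzify` are hypothetical names.

```python
import math


def gaussmf(x, c, sigma):
    """Gaussian membership degree of x for center c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def fuzzify(x, lo, hi):
    """Membership of x in Small/Medium/Large sets spread over [lo, hi].

    Centers at lo, the midpoint, and hi with sigma = (hi - lo) / 4 are
    illustrative choices only.
    """
    sigma = (hi - lo) / 4.0
    centers = {"S": lo, "M": (lo + hi) / 2.0, "L": hi}
    return {label: gaussmf(x, c, sigma) for label, c in centers.items()}
```

For a swarm of n = 5 AUVs, the NoCs domain is [0, 4], and `fuzzify(2, 0, 4)` gives a membership of 1.0 in M and exp(-2) ≈ 0.135 in both S and L. The same function applies unchanged to domains such as DPP on [-3000, 0].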
For the output, we designed three behavioral outcomes $B = \{B_1, B_2, B_3\}$, where $B_1$ is communication priority, $B_2$ is positioning priority, and $B_3$ is detection priority. The membership function chosen for these outcomes is the triangular membership function:
$$\mu(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x \le b \\ \dfrac{c - x}{c - b}, & b < x < c \\ 0, & x \ge c \end{cases}$$
where $a$ is the left endpoint (lower bound), with a membership degree of 0; $b$ is the center point (peak), with a membership degree of 1; and $c$ is the right endpoint (upper bound), with a membership degree of 0.
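The triangular membership function translates directly into code. In this Python sketch, the output universe [0, 4] with peaks at 1, 2, and 3 for communication, positioning, and detection priority is a hypothetical choice made for illustration; the paper's actual output parameters are not given in this section.

```python
def trimf(x, a, b, c):
    """Triangular membership: 0 at a, rising linearly to 1 at b, back to 0 at c."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical output sets on the universe [0, 4] (not the paper's values).
OUTPUT_SETS = {
    "communication priority": (0.0, 1.0, 2.0),  # B1
    "positioning priority": (1.0, 2.0, 3.0),    # B2
    "detection priority": (2.0, 3.0, 4.0),      # B3
}


def output_memberships(y):
    """Membership of an output value y in each behavioral outcome."""
    return {name: trimf(y, *abc) for name, abc in OUTPUT_SETS.items()}
```

With overlapping triangles, a value midway between two peaks, such as y = 1.5, belongs equally (0.5) to communication priority and positioning priority.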
Due to space limitations, we present the membership function curves for only one input (NoCs) and for the output, as shown in Figure 4.
4.1.2. Fuzzy Rules
Fuzzy rules define the relationship between input conditions and output variables. In fuzzy inference theory, fuzzy rules are expressed in “IF-THEN” form; here, they determine the search strategy of the AUV. In this work, the overall goal of the AUV is to reach the positioning point as quickly as possible. However, the AUV may adjust its behavior in individual cycles to maintain communication or optimize detection. For example, when communication quality deteriorates, the AUV is more likely to prioritize maintaining communication, and to optimize detection, it may even move away from the positioning point. By listing all possible combinations of antecedents and their corresponding consequents, we derive a rule table (see Table 3).
4.1.3. Fuzzy Inference
We use the Mamdani-type fuzzy implication operator and the “max-min” fuzzy composition operation to compute the inference result for each rule. Afterward, the maximum value method is employed to aggregate the outputs of all fuzzy rules.
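To make the Mamdani max-min procedure concrete, the Python sketch below evaluates a few rules over two inputs. The membership parameters and the four rules are hypothetical placeholders, not the rule table in Table 3. Each rule's firing strength is the minimum of its antecedent memberships, the consequent set is clipped at that strength (min implication), and the clipped sets are combined by the pointwise maximum over a sampled output grid.

```python
import math


def gaussmf(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def trimf(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical input sets (center, sigma): PRR on [0, 1], DE on [0, 10].
INPUT_SETS = {
    "PRR": {"S": (0.0, 0.25), "M": (0.5, 0.25), "L": (1.0, 0.25)},
    "DE": {"S": (0.0, 2.5), "M": (5.0, 2.5), "L": (10.0, 2.5)},
}
# Hypothetical output sets (a, b, c) on the universe [0, 4].
OUTPUT_SETS = {"comm": (0.0, 1.0, 2.0), "pos": (1.0, 2.0, 3.0), "det": (2.0, 3.0, 4.0)}
# Hypothetical rules, NOT the paper's Table 3: ({input: label}, output set).
RULES = [
    ({"PRR": "S"}, "comm"),
    ({"PRR": "L", "DE": "S"}, "pos"),
    ({"PRR": "L", "DE": "L"}, "det"),
    ({"PRR": "M", "DE": "M"}, "pos"),
]


def mamdani(inputs, grid):
    """Return the aggregated output membership sampled at each grid point."""
    aggregated = [0.0] * len(grid)
    for antecedent, consequent in RULES:
        # AND of antecedents: minimum membership degree (firing strength).
        strength = min(gaussmf(inputs[var], *INPUT_SETS[var][lab])
                       for var, lab in antecedent.items())
        a, b, c = OUTPUT_SETS[consequent]
        for k, y in enumerate(grid):
            clipped = min(strength, trimf(y, a, b, c))   # Mamdani (min) implication
            aggregated[k] = max(aggregated[k], clipped)  # max aggregation
    return aggregated
```

With PRR = 0.0 and DE = 5.0, the low-PRR rule fires fully, so the aggregated membership peaks at y = 1 (communication priority).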
4.1.4. Defuzzification
The weighted average defuzzification method is used, with the following formula:
$$y^* = \frac{\sum_{i=1}^{N} \mu(y_i)\, y_i}{\sum_{i=1}^{N} \mu(y_i)}$$
where $y^*$ is the defuzzified output, $\mu(y_i)$ is the membership degree of the $i$-th output value, $y_i$ is the $i$-th output value, and $N$ is the total number of output values.
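The weighted average y* = Σ μ(y_i) y_i / Σ μ(y_i) over N sampled output values is straightforward to compute. The Python sketch below also maps the crisp result to the intent whose set peak is nearest. That final mapping, and the peak positions used in the example, are assumptions for illustration, because this section does not state how the crisp output is converted into one of the three intents.

```python
def weighted_average_defuzzify(values, memberships):
    """Compute y* = sum(mu_i * y_i) / sum(mu_i) over N sampled output values."""
    total = sum(memberships)
    if total == 0.0:
        raise ValueError("no rule fired; output is undefined")
    return sum(m * y for m, y in zip(memberships, values)) / total


def nearest_intent(y_star, centers):
    """Map the crisp output to the intent whose set peak is closest (assumed rule)."""
    return min(centers, key=lambda name: abs(centers[name] - y_star))
```

For instance, sampled values 1, 2, 3 with memberships 0.2, 0.2, 0.6 give y* = 2.4, which lies closest to a positioning priority peak placed at 2.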
At this point, we have completed the mapping from the state space to the action space based on the observed data. If communication is good and the information exchange delay between AUVs is small, this method can be used directly to infer the intentions of other AUVs. However, as discussed earlier, acoustic communication delays are large and variable, so by the time data arrive at the receiving AUV, they may already be outdated. Therefore, the intentions of other AUVs must be recognized from data that reflect both communication quality and mission progress but are less sensitive to time.
4.2. Intent Recognition Model Based on Residual Convolutional Neural Networks
Intent recognition in underwater collaborative search missions is essentially a classification problem. Building on the residual block design proposed in [46], we constructed the intent recognition network used in this work. Residual blocks introduce skip connections, which allow deep neural networks (DNNs) to maintain effective gradient flow through deeper layers and thus avoid the vanishing gradient problem during training [47]. This structure improves training efficiency and model performance. The network structure is shown in Figure 5; it consists of multiple residual blocks that learn the features of the input data and infer the intent of the AUV.
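A toy scalar example, offered only to illustrate why skip connections help gradient flow: if every layer has the same local derivative f', a plain stack of L layers multiplies the gradient by f' at each layer, while a residual stack y_{l+1} = y_l + f(y_l) multiplies it by (1 + f'). The constant-derivative assumption and the function names are hypothetical simplifications, not a model of the actual network.

```python
def plain_chain_gradient(local_derivative, depth):
    """dy_L/dy_0 for y_{l+1} = f(y_l), assuming f' is the same at every layer."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_derivative          # chain rule: multiply by f'
    return grad


def residual_chain_gradient(local_derivative, depth):
    """dy_L/dy_0 for y_{l+1} = y_l + f(y_l): each block contributes (1 + f')."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_derivative    # identity path adds 1 to the local derivative
    return grad


print(plain_chain_gradient(0.1, 20), residual_chain_gradient(0.1, 20))
```

With f' = 0.1 and 20 layers, the plain product is about 1e-20, whereas the residual product is about 6.7, so the gradient signal does not vanish.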
The network architecture consists of an initial convolutional layer (Conv) followed by a sequence of residual blocks, each containing two convolutional layers. The output of the last residual block is passed through a fully connected layer (Dense) followed by a softmax activation function. Softmax is used because the intent classes are mutually exclusive: it converts the network outputs into a probability distribution over the classes, and the class with the highest probability is taken as the predicted intent. Each convolutional layer’s output is normalized using batch normalization (BN) and then passed through a rectified linear unit (ReLU). Additionally, dropout is applied after the nonlinearity to reduce overfitting.
The network consists of a series of convolutional and residual blocks, designed for 1D data processing. It begins with an initial convolutional layer, which applies a 1D convolution with 16 filters, followed by batch normalization, a ReLU activation function, and max pooling with a stride of 2. This first block reduces the spatial dimensions of the input data while extracting low-level features.
Following the initial block, there are four residual blocks, each constructed using the optimized residual network module. These blocks progressively increase the number of filters from 16 to 256, allowing the network to capture increasingly complex patterns as the depth increases. Each residual block contains skip connections to help the network learn deeper features without suffering from the vanishing gradient problem.
After the residual blocks, an adaptive average pooling layer reduces the length dimension to a fixed size of 1, regardless of the input length. A flattening operation then prepares the output for the final fully connected layer, which has 256 input units and outputs one score per intent category.
Overall, this architecture balances feature extraction and complexity, using residual connections to improve learning capacity while keeping the model relatively simple and efficient for classification tasks.
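The sketch below traces per-sample tensor shapes through the architecture described above. The text does not state kernel sizes, padding, residual-block strides, intermediate widths, or the number of intent classes, so this sketch assumes kernel size 3 with "same" padding, a max-pooling window of 2, stride-1 residual convolutions with widths 32, 64, 128, and 256, and four classes; all of these are hypothetical. In PyTorch, these stages would correspond to `nn.Conv1d`, `nn.BatchNorm1d`, `nn.MaxPool1d`, `nn.AdaptiveAvgPool1d(1)`, and `nn.Linear`.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution or pooling window (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def trace_rescnn_shapes(input_len, in_channels=1, num_classes=4, kernel=3,
                        stem_channels=16, block_channels=(32, 64, 128, 256)):
    """Return (stage, shape) pairs, with shapes given per sample.

    Hypothetical settings: kernel size 3 with 'same' padding, MaxPool1d(2, 2),
    stride-1 residual convs, block widths 32-256, and 4 intent classes.
    """
    same = kernel // 2
    shapes = [("input", (in_channels, input_len))]
    # Stem: Conv1d(16 filters) -> BN -> ReLU keeps the length ('same' padding)
    length = conv1d_out_len(input_len, kernel, padding=same)
    shapes.append(("stem_conv", (stem_channels, length)))
    # Max pooling with stride 2 roughly halves the length
    length = conv1d_out_len(length, kernel=2, stride=2)
    shapes.append(("stem_maxpool", (stem_channels, length)))
    for i, channels in enumerate(block_channels, start=1):
        # Two 'same'-padded convs per block; when the width changes, the skip
        # path needs a 1x1 conv so that F(x) + x has matching shapes.
        for _ in range(2):
            length = conv1d_out_len(length, kernel, padding=same)
        shapes.append((f"res_block{i}", (channels, length)))
    width = block_channels[-1]
    shapes.append(("adaptive_avg_pool", (width, 1)))   # length -> 1 for any input
    shapes.append(("flatten", (width,)))                # 256 features for Dense
    shapes.append(("dense_softmax", (num_classes,)))    # one probability per class
    return shapes


for stage, shape in trace_rescnn_shapes(64):
    print(f"{stage:>18}: {shape}")
```

Under these assumptions, a 64-step input is halved to length 32 by the stem, kept at length 32 by the residual blocks while widening to 256 channels, and collapsed by adaptive pooling to a 256-dimensional vector for the dense layer.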
Compared with plain convolutional neural networks, the residual connections in ResCNN allow the model to learn complex patterns more flexibly and to generalize better from smaller training sets. In AUV intent recognition, combining residual connections with convolutional feature extraction not only captures the deep relationships between AUV behaviors and states but also improves recognition accuracy and robustness in complex environments.
The training process of the ResCNN for intent recognition in AUV collaborative search missions involves several key steps. First, a data loader preprocesses the training dataset, converting state features and intent labels into PyTorch tensors. The ResCNN model is constructed with convolutional layers and residual blocks to maintain gradient flow and prevent vanishing gradients. During training, the model uses a cross-entropy loss function with the Adam optimizer and a learning rate scheduler to promote efficient convergence. Each of the 300 training epochs involves forward propagation, loss computation, backpropagation, and weight updates. Accuracy is monitored during training by comparing predicted and true labels, and the training loss is recorded to assess convergence. After training, the model is saved and evaluated with a custom testing function to validate its performance on unseen data. This procedure is intended to ensure the model’s robustness and accuracy in dynamic underwater environments.
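The training steps above (forward pass, cross-entropy loss, backpropagation, Adam update, learning-rate scheduling, accuracy monitoring) can be sketched in dependency-free Python. To keep it self-contained, a linear softmax classifier stands in for the ResCNN, the data are a hypothetical toy set, and the scheduler is assumed to be a step decay that halves the rate every 100 epochs; the paper does not specify its scheduler type.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax_classifier(X, y, num_classes, epochs=300, lr=0.05,
                             step_size=100, gamma=0.5,
                             beta1=0.9, beta2=0.999, eps=1e-8):
    """Full-batch training of a linear softmax classifier (stand-in for ResCNN).
    Returns (params, history); history holds (mean loss, accuracy) per epoch,
    measured on the forward pass before that epoch's update."""
    d = len(X[0])
    stride = d + 1                      # per-class weights followed by a bias
    params = [0.0] * (num_classes * stride)
    m = [0.0] * len(params)             # Adam first-moment estimates
    v = [0.0] * len(params)             # Adam second-moment estimates
    history = []
    for epoch in range(1, epochs + 1):
        cur_lr = lr * gamma ** ((epoch - 1) // step_size)  # assumed step decay
        grad = [0.0] * len(params)
        loss, correct = 0.0, 0
        for xi, yi in zip(X, y):
            # Forward propagation: logit_k = w_k . x + b_k
            logits = [params[k * stride + d]
                      + sum(params[k * stride + j] * xi[j] for j in range(d))
                      for k in range(num_classes)]
            probs = softmax(logits)
            loss -= math.log(probs[yi] + 1e-12)            # cross-entropy
            correct += max(range(num_classes), key=probs.__getitem__) == yi
            # Backpropagation: dLoss/dlogit_k = p_k - 1[k == y]
            for k in range(num_classes):
                g = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(d):
                    grad[k * stride + j] += g * xi[j]
                grad[k * stride + d] += g
        n = len(X)
        # Adam update with bias-corrected moments (epoch doubles as step count)
        for i in range(len(params)):
            g = grad[i] / n
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** epoch)
            v_hat = v[i] / (1 - beta2 ** epoch)
            params[i] -= cur_lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append((loss / n, correct / n))
    return params, history


# Hypothetical toy data: three well-separated classes of 2D feature vectors
X = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1],
     [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1],
     [-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
params, history = train_softmax_classifier(X, y, num_classes=3)
print("final loss %.4f, accuracy %.2f" % history[-1])
```

In PyTorch, the same roles are played by `nn.CrossEntropyLoss`, `torch.optim.Adam`, and a scheduler such as `torch.optim.lr_scheduler.StepLR`. Note that `nn.CrossEntropyLoss` expects raw logits, so an explicit softmax layer is typically applied only at inference time.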
5. Numerical Experimental Analysis
5.1. Numerical Experimental Data and Environment
The simulation experiment is based on the LP-AUV [48], shown in Figure 6, which features a wide speed range and high load capacity, allowing it to carry various sensors for rapid response operations. Due to space limitations, the mathematical model and relevant parameters of the LP-AUV are not repeated here; details can be found in [48]. For a detailed modeling process of the underwater acoustic channel, please refer to our previous work [44].
The simulation scenario is illustrated in Figure 7. During dataset generation, the initial positions of the AUVs are randomly distributed, and the communication network is ensured to be connected in the initialization phase. We assume that initially only one AUV (referred to as AUV1) detects the target’s position. To enhance detection accuracy, collaborative detection and localization are required. Therefore, based on the target’s initial position and the needs of collaborative localization, several positioning points are designed to guide the remaining AUVs to reach these points quickly. In the initial phase, the Hungarian method is used to assign each AUV a positioning point, with minimization of the total travel distance of the AUV swarm as the primary criterion.
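The Hungarian method is named but not spelled out; below is a compact O(n^3) sketch of the standard potentials-based Hungarian algorithm. The cost is assumed to be the straight-line Euclidean distance from each AUV to each positioning point, a simplification of "travel distance", and the helper names are hypothetical. A brute-force check is included for small cases; in practice, `scipy.optimize.linear_sum_assignment` solves the same assignment problem.

```python
import math
from itertools import permutations


def hungarian(cost):
    """O(n^3) Hungarian algorithm for a cost matrix with rows <= columns.
    Returns assignment[i] = column assigned to row i (minimum total cost)."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 = free
    way = [0] * (m + 1)   # back-pointers along the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):        # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:                # reached a free column
                break
        while True:                       # augment along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def brute_force_assignment(cost):
    """Exhaustive check for small matrices: tries every permutation."""
    n = len(cost)
    best = min(permutations(range(len(cost[0])), n),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


def assign_positioning_points(auv_positions, points):
    """Hypothetical helper: one point per AUV, minimizing total straight-line distance."""
    cost = [[math.dist(a, q) for q in points] for a in auv_positions]
    return hungarian(cost)


print(assign_positioning_points([(0, 0), (10, 0), (0, 10)], [(9, 1), (1, 9), (1, 1)]))
```

For n AUVs, brute force examines n! permutations (362,880 for n = 9), which is why a polynomial-time method is preferable as the swarm grows.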
Communication within the AUV swarm uses a multi-hop approach, in which data are transmitted from one AUV to another through intermediate nodes rather than directly between the source and the destination. This is particularly beneficial underwater, where acoustic communication has limited range, high latency, and high packet loss rates. Communication follows a hierarchical flooding routing protocol, ensuring efficient data dissemination while reducing redundant transmissions. During the simulation, the information exchanged between AUVs is affected by end-to-end delay and packet loss rate; for detailed calculation methods, please refer to our previous work [44]. During the collaborative search process, if any AUV loses connection, the search is considered unsuccessful. If all AUVs enter the tracking phase, the collaborative search is deemed successful.
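The success criterion above (the search fails if any AUV loses its connection) can be checked with a breadth-first search over the communication graph. This sketch assumes a simple disk model in which two AUVs share a direct link whenever they are within the 1 km communication radius, ignoring the delay and packet-loss effects the paper models; the function name is hypothetical.

```python
import math
from collections import deque


def is_connected(positions, comm_radius=1000.0):
    """True if every AUV can reach every other one over multi-hop links.
    Assumed disk model: a direct link exists when two AUVs are within comm_radius (m)."""
    n = len(positions)
    if n == 0:
        return True
    seen = {0}
    queue = deque([0])
    while queue:                      # breadth-first search from AUV 0
        i = queue.popleft()
        for j in range(n):
            if j not in seen and math.dist(positions[i], positions[j]) <= comm_radius:
                seen.add(j)
                queue.append(j)
    return len(seen) == n             # connected iff every AUV was reached


print(is_connected([(0, 0), (800, 0), (1600, 0)]), is_connected([(0, 0), (1580, 0)]))
```

Three AUVs spaced 800 m apart in a line remain connected through the middle node even though the end nodes are 1.6 km apart, while two AUVs 1.58 km apart with no relay are disconnected.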
The number of AUVs significantly affects the communication topology and the complexity of collaborative decision making. If there are too few AUVs, the problem becomes overly simple and fails to demonstrate the effectiveness of the decision-making process. The speed difference between the AUVs and the target also influences the success rate of the collaborative search: AUVs moving only slightly faster than the target may struggle to achieve effective positioning and tracking, while a large speed advantage makes the mission less challenging and reduces the need for advanced intent recognition and collaborative strategies. To focus on the specific research questions, we set the number of AUVs to nine, each with a cruising speed of 4 knots. The target is a single object moving at 2 knots. Although the actual communication radius of underwater acoustic networks can exceed 1 km, we set a 1 km communication radius in this study to simplify the problem and focus on the core aspects of the proposed method; the underwater acoustic communication parameters that yield this radius are detailed in Table 4. The detection radius was also set to 1 km.
The experiments were run on a Lenovo laptop (Lenovo, Beijing, China) equipped with an NVIDIA GeForce RTX 3060 GPU, running Windows 11 and Python 3.8.0. CUDA 11.0 was used for acceleration, together with the PyTorch 1.8.0 deep learning framework.
5.2. Impact of Underwater Acoustic Channels on Communication
In underwater collaborative search missions, numerous state variables are considered, including depth, position, velocity, acceleration, attitude, and detection error. The proposed method simplifies this complexity by constructing a behavior model that requires only key information rather than all available state data. This approach reduces the data volume in communication packets, thereby lowering the packet loss rate. This is particularly meaningful because the packet error rate (PER) is directly influenced by data size, as shown in the following formula:
$$ \mathrm{PER} = 1 - (1 - \mathrm{BER})^{N} $$
where $\mathrm{BER}$ is the bit error rate, which can be calculated from the signal-to-noise ratio (SNR) and the modulation method, and $N$ is the number of bits in a packet. The formula follows because the probability of error-free reception equals the probability that all $N$ bits are received correctly, assuming independent bit errors.
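A minimal sketch of the packet error rate relation, assuming independent bit errors. Since the paper defers the BER model to [44], the BER helper below assumes coherent BPSK over an additive white Gaussian noise channel purely as an example; both function names are hypothetical.

```python
import math


def packet_error_rate(ber, n_bits):
    """PER = 1 - (1 - BER)^N: a packet survives only if all N bits are correct
    (assumes independent bit errors)."""
    return 1.0 - (1.0 - ber) ** n_bits


def bpsk_awgn_ber(ebn0_db):
    """Illustrative BER for coherent BPSK over AWGN: 0.5 * erfc(sqrt(Eb/N0)).
    The modulation and channel model are assumptions, not the paper's."""
    ebn0 = 10 ** (ebn0_db / 10)       # dB -> linear ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))


for bits in (72, 288):
    print(bits, round(packet_error_rate(1e-3, bits), 4))
```

With BER = 1e-3, a 72-bit packet has a PER of about 0.07, while a 288-bit packet has a PER of about 0.25, which is the packet-length effect discussed below.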
Figure 8 illustrates the impact of different data lengths on the packet loss rate under consistent underwater acoustic channel parameters.
Table 5 lists the packet error rates at distances of 600 m, 700 m, and 800 m.
As Figure 8 shows, the PER rises sharply as the communication distance increases, particularly beyond 600 m. Packet size strongly affects this trend: larger packets (288 bits) exhibit a much steeper increase in PER than smaller packets (72 bits). This is because a packet is lost if any one of its bits is corrupted, so for a given bit error rate a longer packet is more likely to be lost, and the gap widens as the bit error rate grows with distance. This highlights the importance of minimizing data size to maintain reliable communication in underwater environments. The data in Table 5 further quantify this effect.
To the best of our knowledge, there is no existing research on intent recognition in the AUV domain, so the data volume of the proposed method cannot be compared directly with that of other approaches. However, because intent recognition combines multi-dimensional information to predict the development trend of the swarm from a decision-making perspective, it reduces the need for real-time data compared with trajectory prediction and better handles unexpected situations. This capability enables swarm collaboration under more challenging conditions and further reduces dependence on communication. Although the data packet lengths set in this study may not fully match real-world scenarios, reducing data volume by constructing a behavior model to lower the packet loss rate is a feasible strategy. This approach provides valuable insights and a practical foundation for future research in this area.
5.3. Parameter Tuning
When training deep learning models, properly setting and tuning hyperparameters is key to improving model performance. The main hyperparameters include, but are not limited to, the number of training epochs, the batch size, and the learning rate. These hyperparameters significantly affect the model’s convergence speed, training effectiveness, and final classification performance.
We use classification accuracy to evaluate the impact of each parameter on the classification task; the results are shown in Table 6. Accuracy is the ratio of correctly predicted labels to the total number of samples:
$$ \mathrm{Accuracy} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}} $$
where $N_{\mathrm{correct}}$ is the number of correctly predicted samples and $N_{\mathrm{total}}$ is the total number of samples.
Based on the results in Table 6, and using intent recognition accuracy as the selection criterion, we selected the number of epochs, batch size, and learning rate that maximized accuracy.
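The tuning procedure, together with the accuracy metric, can be sketched as an exhaustive grid search. The `evaluate` callable (which would train and test a model for one hyperparameter combination) and any candidate values passed in are hypothetical; ties keep the first combination found.

```python
from itertools import product


def accuracy(predicted, true):
    """Accuracy = correctly predicted labels / total samples."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need two equal-length, non-empty label sequences")
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


def grid_search(evaluate, grid):
    """Try every combination in `grid` (dict: name -> candidate values) and
    return (best_params, best_score); evaluate(**params) returns an accuracy."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:        # strict '>' keeps the first of any ties
            best_params, best_score = params, score
    return best_params, best_score


print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))
```

Each grid point requires a full training run, so the cost grows with the product of the number of candidates per hyperparameter.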
5.4. Comparative Analysis with Other Intention Recognition Methods
In this experiment, we compared the proposed ResCNN model with CNN, LSTM, and GAN models, evaluating their classification performance on the same dataset. To ensure fairness, all models used identical training and testing sets, the same optimizer, and the same early stopping mechanism. Hyperparameters for all models were tuned using grid search to select the best configuration for each. Classification accuracy is used to evaluate the performance of each method. The intent recognition results are shown in Figure 9 and Table 7.
The results show that the proposed ResCNN model outperforms the other models, achieving a classification accuracy of 95.83%. We attribute this improvement primarily to the residual connections, which mitigate the vanishing gradient problem and allow the network to train well even at greater depth. The CNN model achieved an accuracy of 94.98%, ranking second among the compared models. While CNN effectively captures local features in the data, its performance is somewhat limited by its shallower structure. The LSTM model achieved an accuracy of 94.55%, slightly lower than both CNN and ResCNN, suggesting that the advantages of LSTM in time-series modeling did not fully translate into better performance for the current task, potentially due to its weaker ability to extract local spatial features. The GAN model achieved an accuracy of 94.85%, similar to CNN but slightly lower than ResCNN, indicating that although GAN generates richer feature representations, its classification and feature extraction capabilities did not fully match those of ResCNN.
As the training curves in Figure 10 show, the training loss of ResCNN decreases rapidly overall and converges to a relatively low value within the first 50 epochs. As training progresses, the loss fluctuations gradually decrease and eventually stabilize, indicating good convergence. The training loss of CNN also declines quickly but fluctuates more in the later stages; especially after 250 epochs, the loss shows no significant improvement and the fluctuation amplitude is higher than that of ResCNN. The training loss of LSTM shows a clear downward trend, with rapid reduction within the first 50 epochs followed by gradual stabilization and low fluctuations in the later stages. The training loss of GAN decreases slowly in the early stages, and its fluctuations are significantly higher than those of the other models throughout training; even after many epochs, the loss still oscillates noticeably.
The loss curves indicate that ResCNN trains better than the other methods, with faster convergence and higher stability. CNN and LSTM also show certain advantages during training, but they remain limited in deep feature extraction and in handling complex tasks. GAN would require further optimization of its training strategy to improve stability and efficiency.
Overall, the experimental results support the effectiveness of the proposed ResCNN model. Compared with traditional CNN, LSTM, and GAN models, ResCNN exhibits better generalization and stability on this classification task, providing a more efficient solution for practical applications. Future research could further improve model performance by integrating attention mechanisms or other advanced techniques.
5.5. Comparative Analysis with Traditional Collaborative Search Methods
To validate the performance of the proposed decision consensus-based intent recognition method in collaborative search missions, we compared it with the collaborative search method discussed in [44]. We designed two typical experimental scenarios to test the search efficiency and mission completion capability of each method under different communication topology conditions (as shown in Figure 11):
Scenario A: Good Communication Topology Conditions
Scenario A simulates a collaborative search environment with favorable communication conditions. Under the initial conditions, the communication topology between the AUVs is fully connected, allowing each AUV to exchange state information in real time with all other AUVs. The objective of this scenario is to compare the efficiency of the two methods in completing the search mission, with the mission completion time being the primary evaluation metric. By testing the mission duration of both methods, we can assess the advantage of the proposed method in optimizing search paths and improving mission efficiency.
Scenario B: Extreme Communication Topology Conditions
Scenario B simulates an extreme environment with sparse communication. In this scenario, the communication topology between the AUVs is limited, and information can be exchanged only between nodes within a local neighborhood. This setting simulates real-world situations in which communication may be interrupted or constrained. The goal is to evaluate the differences between the two methods in prediction and decision-making capability, particularly in maintaining mission completion rates and search efficiency when communication is incomplete. By testing the mission success rate and search efficiency of both methods, we analyze the robustness and adaptability of the proposed method under extreme conditions. Both methods were tested under the same initial conditions, with search time and mission success rate as the evaluation metrics.
As shown in Table 8, the traditional method and the proposed method take 4219 s and 3140.5 s, respectively, to reach the target location. The proposed method thus reduces the mission completion time by approximately 25.6% ((4219 - 3140.5)/4219 ≈ 0.256), a substantial improvement in efficiency.
At each sampling instant, we calculated the total distance of all AUVs to their respective positioning points (DPP) to quantify the progress of the collaborative search mission:
$$ D_{\mathrm{PP}}(t) = \sum_{i=1}^{N} \left\lVert \mathbf{p}_i(t) - \mathbf{q}_i \right\rVert $$
where $N$ is the number of AUVs, $\mathbf{p}_i(t)$ is the position of AUV $i$ at time $t$, and $\mathbf{q}_i$ is the position of its assigned positioning point.
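The total distance of all AUVs to their positioning points can be computed per sampling instant as below; evaluating it over a trajectory gives a curve of the kind plotted in Figure 12. Straight-line Euclidean distance is assumed, and both function names are hypothetical.

```python
import math


def total_distance_to_points(auv_positions, assigned_points):
    """Sum of Euclidean distances from each AUV to its assigned positioning point
    at one sampling instant (positions[i] pairs with assigned_points[i])."""
    if len(auv_positions) != len(assigned_points):
        raise ValueError("each AUV needs exactly one assigned positioning point")
    return sum(math.dist(p, q) for p, q in zip(auv_positions, assigned_points))


def dpp_series(snapshots, assigned_points):
    """Evaluate the total distance at every sampling instant of a trajectory;
    each snapshot lists all AUV positions at that instant."""
    return [total_distance_to_points(s, assigned_points) for s in snapshots]


print(dpp_series([[(0, 0), (3, 4)], [(0, 0), (0, 0)]], [(0, 0), (0, 0)]))
```

A faster-falling series indicates that the swarm as a whole is converging on its positioning points more quickly.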
Figure 12 plots the results of the two methods. The total DPP declines faster over time with the proposed method (with IR) than with the method without IR, indicating that IR improves the efficiency of collaborative search by enabling AUVs to reach their target positions more quickly.
As shown in Figure 13, the simulation results for Scenario A demonstrate the advantages of the proposed CBIR method in collaborative search missions. Figure 13a shows that, under the traditional method, the AUVs’ search paths contain considerable redundant movement. This is primarily because traditional methods rely on global communication and strictly follow communication constraints for path planning, which makes it difficult to optimize mission execution through smarter decision making. In contrast, Figure 13b shows a more efficient path distribution: the intents of other AUVs are predicted and used to coordinate the AUVs more rationally, avoiding unnecessary path overlap. This indicates that the CBIR method, through its intent prediction and inference mechanism, reduces redundant paths and improves path-planning efficiency.
As shown in Table 9, the proposed method takes 15,340.5 s to complete the mission in Scenario B. Because the topology is more demanding, the mission completion time increases significantly. Nevertheless, the proposed method still accomplishes the collaborative search mission, whereas the traditional method fails under these conditions (Figure 15a), demonstrating better robustness.
As shown in Figure 14, despite the more stringent communication topology, the proposed method maintains superior search efficiency.
As shown in Figure 15, under extreme communication topology conditions, the simulation reveals significant performance differences between the traditional method and the intent prediction-based CBIR method.
Figure 15a shows the search path of the traditional method under sparse communication conditions. Because communication is limited, AUVs cannot exchange information in real time, and collaboration breaks down. Some AUVs fall into a passive state after losing communication and can no longer participate effectively in the mission, which leads to team separation and, ultimately, failure of the search mission. The distance of 1.58 km indicated by the arrow exceeds the communication radius (1 km).
Figure 15b illustrates the search path using the intent prediction-based CBIR method. Under the same extreme communication conditions, despite long periods without communication between AUVs, the CBIR method still enables global decision making based on local information through intent prediction and inference mechanisms. Each AUV is able to infer the intent of the entire team and dynamically adjust its own path, effectively avoiding the mission failure caused by communication interruptions in traditional methods. As a result, the collaborative search mission was successfully completed.
These results indicate that the proposed method offers strong adaptability and robustness under extreme communication conditions. By introducing a decision consensus-based intent recognition mechanism, it reduces dependence on global communication and significantly enhances the dynamic adjustment capabilities of multi-AUV collaborative search. In conclusion, in the simulated scenarios, the proposed method achieves superior search performance in complex environments and provides an efficient solution for multi-AUV collaborative search missions.