Article

Attention-Enhanced Contrastive BiLSTM for UAV Intention Recognition Under Information Uncertainty

Institute of Systems Engineering, Academy of Military Sciences, Beijing 100141, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(4), 319; https://doi.org/10.3390/drones9040319
Submission received: 28 February 2025 / Revised: 16 April 2025 / Accepted: 17 April 2025 / Published: 21 April 2025
(This article belongs to the Collection Drones for Security and Defense Applications)

Abstract

The widespread deployment of unmanned aerial vehicles (UAVs) in modern warfare has profoundly increased the complexity and dynamic nature of aerial combat. To address the limitations of traditional UAV combat intention recognition methods, which rely on the “complete information” assumption and struggle to adapt effectively to dynamic adversarial environments, this paper proposes a deep learning-based UAV air combat intention recognition model (BLAC). The BLAC model establishes dynamic temporal feature mappings through a bidirectional long short-term memory network (BL) and innovatively incorporates a cross-attention mechanism (A) paired with contrastive learning (C) to improve model performance. To mitigate battlefield information uncertainty, the BLAC model implements cubic spline interpolation for numerical features and proximity-based imputation for non-numerical features, effectively resolving data loss challenges. The experimental results demonstrate that the BLAC model achieves superior intention recognition accuracy compared to mainstream models, maintaining over 91% accuracy even under 30% data loss conditions. These outcomes confirm the robustness and adaptability of the model in dynamic combat environments. This research not only provides an efficient framework for UAV combat intention recognition under information uncertainty but also offers valuable theoretical and practical insights for advancing intelligent command and control systems.

1. Introduction

Since the inception of unmanned aerial vehicles (UAVs), military strategists and defense analysts have anticipated their transformative impact on modern warfare. The ongoing large-scale, high-intensity conflict between Russia and Ukraine serves as a compelling case study, demonstrating the extensive deployment of various UAV types by both belligerents. These platforms have emerged as pivotal assets, significantly altering the dynamics of the battlefield. Concurrently, their proliferation has escalated the complexity and intensity of air defense operations. Within this context, UAV intention recognition has become a critical component of air combat decision support systems, holding substantial implications for the efficacy of future command and control architectures.
Target intention recognition serves as a critical nexus between historical data and future operational planning, enabling commanders to derive actionable insights from battlefield information. Functioning as a transformative mechanism that converts empirical data into abstract decision-making frameworks, it occupies a pivotal role within the operational process [1]. Based on the level of reliance on prior knowledge, intention recognition methodologies in combat environments can be categorized into three distinct approaches: template-based methods [2,3,4,5,6], statistical theory-based methods [7,8,9,10,11,12,13,14,15], and data-driven methods [16,17,18,19,20,21,22,23,24]. The first two categories represent traditional approaches to intention recognition. However, with the rapid advancements in machine learning technologies, data-driven methods have emerged as a contemporary solution tailored to modern demands. In Section 2, these three categories of intention recognition methods are systematically reviewed, and their respective advantages and limitations are critically analyzed.
The majority of the aforementioned studies on intention recognition operate under the assumption of “complete information”, which presupposes that data are characterized by fully observable features derived from an ideal environment. However, in the context of air combat, the inherent uncertainties associated with detection processes and the deployment of advanced military technologies often result in partial or incomplete information regarding UAVs, leading to data gaps. Target intention recognition is fundamentally a pattern recognition problem under dynamic adversarial conditions. The characteristic data of UAVs exhibit temporal and multidimensional variability during air combat. Traditional methods, which rely predominantly on expert experience and single-moment feature information, are inadequate for accurately identifying the tactical intentions of enemy aircraft in real-time within complex air combat environments [25]. Furthermore, traditional mathematical modeling approaches [2,3,4,5,6,7,8,9,10,11,12,13,14,15] are constrained by their scenario-specific nature, resulting in limited transferability. These methods often necessitate the integration of domain-specific expertise and extensive empirical evaluations across multiple related fields, rendering them both challenging and inflexible. In contrast, deep learning-based methods, being inherently data-driven, are not reliant on specific scenario configurations. They not only exhibit superior transferability but also possess self-learning capabilities, thereby effectively addressing the “knowledge acquisition bottleneck” associated with traditional methods. Consequently, this paper advances the study of deep learning methodologies by proposing a novel UAV air combat intention recognition algorithm, termed BLAC, which operates under conditions of uncertain information. The BLAC framework integrates three key components: BiLSTM (BL) for sequential data modeling, cross-attention (A) for feature interaction, and contrastive learning (C) for enhanced discriminative capability.
The main contributions of this paper are as follows:
  • This paper provides a systematic summary of intention recognition research in combat scenarios. The methods are categorized into three distinct classes, and a comparative analysis of their respective advantages and disadvantages is conducted;
  • The problem of UAV intention recognition in air combat is formally defined. A basic intention space and a feature set for intention recognition are constructed. A hierarchical strategy is employed to select a 9-dimensional target feature set. To capture the dynamic and temporal attributes of the target, data from 12 consecutive time steps are collected for feature extraction. The target features are normalized and uniformly encoded, while the decision-maker’s cognitive experience is encapsulated as intention labels;
  • A novel data patching method is proposed to address uncertain information in air combat. The feature set is divided into numerical and non-numerical types. For numerical features, a cubic spline interpolation method is applied, while close filling is used for non-numerical features.
  • An LSTM network is designed to implicitly map the intention feature set to the intention space. On the basis of LSTM, we add the bidirectional mechanism to integrate historical and future information, enabling robust temporal analysis. This approach captures time-dependent relationships and enhances the model’s capacity to learn complex temporal patterns;
  • A cross-attention mechanism is innovatively integrated into the model to emphasize the importance of data at different time steps. This mechanism comprises the following two components:
    • Temporal attention mechanism: focuses on key temporal actions;
    • Feature attention mechanism: evaluates the importance of different feature categories, filtering out the most influential features;
  • Contrastive learning is introduced to improve feature discrimination. By minimizing intra-class distances and maximizing inter-class distances, recognition accuracy is improved under uncertainty.
The remainder of this paper is structured as follows: Section 2 reviews three categories of intention recognition methods in the combat domain and discusses their respective strengths and limitations. Section 3 describes in detail the problem of UAV intention recognition under battlefield conditions and provides the selection criteria and encoding approaches for the intention space and target features. Section 4 presents a comprehensive description of the proposed BLAC model. Section 5 describes the experimental setup and conducts model experiments. Finally, Section 6 summarizes the key findings and draws conclusions.

2. Related Works

Intention recognition in combat environments involves analyzing information gathered by various battlefield sensors to assess, predict, or interpret the operational intentions, plans, and strategies of the enemy [26]. This section provides a concise overview of the three main categories of intent recognition methods in the military domain, summarizing the advantages and limitations of each approach. Overall, the evolution of intent recognition methods can be broadly categorized into three stages: reliance on manual recognition, automation, and intelligence.
The manual recognition stage primarily relies on template-based methods, including template matching [2] and expert systems [3]. The earliest research on applying template matching for situational recognition was conducted by J. Azarewicz et al. [4] in 1989, who developed a multi-agent plan recognition and situational estimation system based on template matching. In 2017, Floyd et al. [5] extended the application of template matching to beyond-visual-range air combat scenarios. While template-based methods have facilitated the automation of intention recognition, their effectiveness is predominantly limited to simple scenarios, and they demonstrate restricted applicability in complex battlefield environments. Expert systems, on the other hand, describe knowledge through rules. Compared to fixed prior models, the conditions and outcomes of these rules are more flexible, enabling adaptation to different contexts and enhancing the flexibility of knowledge representation. In 2014, Huang et al. [6] constructed a rule base using expert knowledge to address uncertainty issues in target intention recognition. Despite their strong knowledge representation and reasoning capabilities, expert systems suffer from weak fault tolerance and limited learning abilities. Their recognition accuracy heavily depends on the size of the rule base, and the cost of updating and maintaining the system is high. Furthermore, expert systems struggle to adapt to the complexities and dynamics of modern battlefield environments.
Statistical-theory-based methods, such as D-S evidence theory [7], grey relational analysis [8], decision trees [9], and Bayesian networks [10], have propelled intention recognition into the automation stage. In 2017, Cao et al. [11] applied D-S evidence theory to perform sequential target recognition, improving recognition accuracy. While D-S evidence theory can effectively handle uncertain information and integrate multi-source evidence, it may produce erroneous conclusions when evidence conflicts, thereby limiting its recognition accuracy. In 2014, Dai et al. [12] employed grey relational analysis to represent the range of target state feature changes within a time interval using interval numbers, enabling the recognition of aerial target intention. Grey relational analysis requires minimal data and is computationally efficient, making it suitable for engineering applications. However, it tends to oversimplify the battlefield environment, often neglecting non-quantifiable factors and struggling to capture the dynamic evolution of intention. Decision trees provide an intuitive and systematic framework for modeling complex decision-making processes and play a critical role in operational intention recognition. In 2023, Yang et al. [13] proposed a cost-sensitive multi-class three-way decision-based target intention recognition method, addressing the issue of conflicting recognition results in traditional algorithms and resolving challenges in recognizing target intention in modern aerial combat. While decision trees perform well in specific scenarios, they struggle to handle nonlinear and high-dimensional interactions. In complex operational environments, decision trees often need to be combined with other methods to enhance accuracy and stability. Bayesian networks have garnered significant attention due to their causal reasoning capabilities and ability to process multi-source information. However, traditional Bayesian methods suffer from low inference efficiency and difficulty adapting to dynamic changes. To address these limitations, researchers have introduced dynamic Bayesian networks to improve intention recognition accuracy in battlefield environments. In 2023, Li et al. [14] integrated Multi-Entity Bayesian Networks (MEBN) into the DSBN model to tackle the challenges of dynamic and sequential air combat intention recognition in complex battlefield scenarios. In 2024, Yang et al. [15] applied dynamic sequential Bayesian networks (DSBN) to recognize intention in air–ground collaborative formations. Despite their advantages, the construction of Bayesian networks relies heavily on extensive domain knowledge and data, resulting in high computational costs for complex problems. Additionally, accurately modeling the dependencies between variables remains a significant challenge.
With the advancement of machine learning, data-driven methods for intention recognition have emerged. These methods can be categorized into three groups: traditional machine learning [16], reinforcement learning [17,18], and deep learning [19]. In 2022, Wu et al. [20] proposed an air target intention recognition method based on SSA-optimized support vector machines. While traditional machine learning algorithms perform well with limited structured data, they face significant challenges when processing large-scale, heterogeneous battlefield situational data. Reinforcement learning, which does not rely on predefined datasets, learns through exploration and trial-and-error, reducing its dependence on prior knowledge. This approach has provided innovative solutions for target intention recognition in complex and data-scarce operational environments. In 2024, Yang et al. [21] proposed an intelligent air combat maneuver decision-making method based on explainable reinforcement learning, constructing an intention recognition model. Bai et al. [22], addressing both intention recognition and deception in adversarial scenarios, developed solutions based on reinforcement learning from the perspectives of both the recognizer and the target. However, the effectiveness of reinforcement learning is highly dependent on the design of the reward function. If the reward function does not accurately reflect the intention of the target, it may lead to suboptimal or incorrect strategies. Additionally, the training process requires extensive experimentation and interaction, and in complex environments, it often suffers from slow convergence and high computational and time costs. Deep learning, on the other hand, automatically extracts features from high-dimensional data through multilayer neural networks, significantly improving the accuracy and robustness of intention recognition. It possesses self-learning capabilities, overcoming the knowledge acquisition limitations of traditional machine learning methods. In 2020, Zhou et al. [23] proposed an intention recognition method for air targets that combines LSTM networks with decision trees. Building on traditional LSTM networks, in 2021, Teng et al. [24] employed the BiLSTM network model to utilize both current and past target information for recognizing air combat target intention.
Through a comparative analysis of the methods outlined above, we have selected a deep learning approach for research on UAV intention recognition. The next section will describe the problem of UAV combat intention recognition, and Section 4 will provide a detailed introduction to the proposed intention recognition model.

3. UAV Intention Recognition Problem Description

UAV intention refers to the purpose exhibited by a UAV during the execution of its tasks, reflecting the tactical intent and strategic planning of the commander of the adversary. Intention recognition involves extracting and analyzing real-time situational information from UAVs in dynamic adversarial environments and inferring their tactical intention by integrating domain-specific military knowledge and expert experience. This process is illustrated in Figure 1.
The UAV combat target intention recognition model is typically described as a mapping from intention-related features extracted from situational information to the types of UAV air combat intention. However, due to the highly complex and dynamic nature of the air combat environment, it is challenging to describe the intricate mapping relationship between temporal feature sets and UAV tactical intention types using simple mathematical models. Additionally, since air combat is inherently a dynamic game process, adversary targets may employ decoys or disguise their true intentions, making battlefield information at a single moment insufficient to accurately reflect the real intention of the enemy. To address this challenge, it is necessary to integrate air combat data from multiple consecutive time steps to more effectively capture the intention. Therefore, in this paper, the BLAC model is trained using a UAV situational dataset obtained from a battlefield simulation system, implicitly establishing a mapping between UAV tactical intention types and temporal feature sets. The proposed UAV intention recognition framework is illustrated in Figure 2. First, we simulate real battlefield scenarios using the exercise and simulation system, or use historical data collected by sensors on the actual battlefield, to construct the required raw dataset. Subsequently, we perform data repair and preprocessing operations. Combat command personnel then calibrate the intention space of the preprocessed dataset based on a knowledge base derived from air combat expertise and battlefield rules. The calibrated complete air combat temporal sequence feature dataset is subsequently input into the BLAC model for training, establishing the mapping relationship between the UAV intention feature set and the intention type set. During this process, parameters are continuously optimized through iterative training, and the trained intention recognition model is used to derive the recognition results of air combat target intentions. Specifically, the temporal feature data represent the target features most relevant to the target’s intention, extracted from multi-source data obtained from various sensors during the time interval from $T_n$ to $T_{n+N}$. After normalization and encoding, these features form a temporal feature vector set. In the following section, we will provide a detailed introduction to the intention space and target features that constitute the training dataset in the “③ Dataset for Training” component of the model.

3.1. Intention Space

The intention space refers to the set of possible target intentions. Table 1 shows the intention spaces defined in some related studies, along with the number of intentions within each space. The dataset used in this study is based on a battlefield simulation involving enemy and friendly defenses. With reference to Table 1, and considering the corresponding operational context together with the basic attributes of enemy targets and their potential combat tasks, six types of intention are ultimately determined: attack, retreat, electronic jamming, surveillance, reconnaissance, and feint. Since these six types of intention cannot directly serve as evaluation criteria for the neural network, they are encoded as {0, 1, 2, 3, 4, 5} in this paper. Table 2 provides a detailed description of these six primary intentions. To facilitate model training and recognition, the six intentions in the above intention space are annotated using the method shown in Figure 3.

3.2. Target Features

From the perspective of operational reality, when a UAV receives a command, it typically follows specific operational rules to execute the mission, and these rules are reflected in its behavior. In other words, the target combat intention is manifested in the maneuvering actions or flight states of the flying target. When an enemy aircraft participates in a specific combat task, certain characteristic information of the aircraft must meet specific conditions. Therefore, when performing different combat tasks, the characteristics of the target will vary. For example, the flight altitude, speed, and flight path of the target will differ when performing tasks like attack, electronic jamming, or surveillance. Similarly, the sensor states will vary when executing tasks like reconnaissance or feints. Thus, intention recognition can be transformed into recognizing the maneuvering actions or flight states, which in turn translates into capturing radar measurements and flight characteristics. Table 3 presents the specific details regarding the selected target features and their quantities in the relevant literature. From the table, it can be seen that current research primarily selects two types of features—target motion state and sensor state—as the input attributes for intention recognition.
Referring to the feature selection in previous studies, as shown in Table 3, and considering the corresponding operational context, we ultimately select a continuous 12-frame sequence of 9-dimensional data as the key temporal features for intention recognition that are practically obtainable through battlefield reconnaissance, detection, and tracking. The feature descriptions are shown in Figure 4 and are divided into numeric and non-numeric features. Due to differences in measurement units, the original feature distributions often vary significantly. Features with larger numerical ranges tend to dominate, which means core features cannot be accurately captured from the raw data. Therefore, normalization is required to reduce the impact of differences in data dimensions and improve the convergence efficiency of the network. Additionally, since BLAC can only process numeric data, non-numeric data needs to be converted into the numeric format.

4. Model Description

Figure 2 illustrates the model framework designed to address intention recognition tasks. This section will specifically elaborate on two critical components proposed in the model: “② Data Preprocessing” and “④ BLAC Network”.

4.1. Data Patching

Given that aerial combat constitutes a highly adversarial process where detection procedures are fraught with uncertainties, certain UAV parameters may evade detection, resulting in data missingness. This scenario parallels the incomplete time-series classification problem where partial temporal information is absent, thereby allowing intention recognition challenges to be formally framed as incomplete time-series classification tasks. Incomplete time-series classification represents a more pragmatic and challenging research direction, where data imputation can be systematically integrated as a preprocessing step. By reconstructing missing values, models can be trained on more complete temporal sequences, thereby enhancing classification performance.
The most natural approach to incomplete time-series classification involves data imputation methodologies. The UAV characteristics collected in this study comprise two distinct data types: numerical and non-numerical. Numerical data primarily represent UAV state information with temporal dependencies between consecutive states. To ensure imputed data maintain continuity and smoothness consistent with original measurements—thereby improving reliability in classification tasks—cubic spline interpolation is employed for missing data reconstruction [38]. Non-numerical data, predominantly comprising UAV radar status information, exhibits a lower frequency of variation. Consequently, missing values in this category are assumed equivalent to their immediately preceding temporal instances. The complete data patching workflow is illustrated in Figure 5.
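A minimal sketch of this patching step is given below, assuming each sample is stored as a 12 × 9 floating-point array with missing entries marked as NaN; the function name and column groupings are illustrative rather than the authors’ implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def patch_sequence(seq, numeric_cols, non_numeric_cols):
    """Impute missing values (marked as NaN) in one (n_steps, n_features) sample.

    Numeric features: cubic spline interpolation over the observed time steps.
    Non-numeric features: carry the most recent observed value forward.
    """
    seq = seq.astype(float).copy()
    t = np.arange(seq.shape[0])

    for c in numeric_cols:
        obs = ~np.isnan(seq[:, c])
        if obs.sum() >= 4:                          # enough knots for a cubic spline
            spline = CubicSpline(t[obs], seq[obs, c])
            seq[~obs, c] = spline(t[~obs])
        elif obs.sum() >= 2:                        # too few points: fall back to linear
            seq[~obs, c] = np.interp(t[~obs], t[obs], seq[obs, c])

    for c in non_numeric_cols:
        last = seq[0, c] if not np.isnan(seq[0, c]) else 0.0   # default: radar off
        for i in range(seq.shape[0]):
            if np.isnan(seq[i, c]):
                seq[i, c] = last                    # assume state unchanged since last observation
            else:
                last = seq[i, c]
    return seq
```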

4.2. BLAC Network

The BLAC network architecturally comprises five components: the input layer, cross-attention layer, contrastive learning layer, BiLSTM layer, and output layer. Our methodology strategically coordinates three algorithmic components across intermediate layers to address UAV combat intention recognition challenges. Specifically, the cross-attention layer focuses on extracting key features, the BiLSTM layer captures long-term temporal dependencies within the sequence and retrieves future information, and the contrastive learning layer enhances feature discriminability, guiding the model to learn more discriminative features. The overall framework of the network is illustrated in Figure 6, with detailed descriptions provided below.

4.2.1. Input Layer

Define the vector $X_t^C$ as the $C$-dimensional real-time battlefield feature information of the UAV collected at time $t$. Assume that at each time step $t$, the $m$-dimensional features $X_t^i = \{X_t^1, X_t^2, X_t^3, \ldots, X_t^m\}$ are extracted. The $m$-dimensional feature information from $t_1$ to $t_n$ is represented as $X_T^C$. The intention space of the UAV is defined as $I = \{I_1, I_2, \ldots, I_K\}$, where $K$ denotes the total number of distinct intentions.
The mapping relationship between the temporal features and intentions is as follows:
$$I_k = f(X_T^C) = f\big(X_{t_1}^C, X_{t_2}^C, X_{t_3}^C, \ldots, X_{t_n}^C\big) \tag{1}$$
In the equation, $X_T^C$ represents the feature matrix of the $m$-dimensional features of the UAV over the $n$ consecutive time steps, and its expanded form is as follows:
$$X_T^C = \begin{bmatrix} X_{t_1}^1 & X_{t_2}^1 & \cdots & X_{t_n}^1 \\ X_{t_1}^2 & X_{t_2}^2 & \cdots & X_{t_n}^2 \\ \vdots & \vdots & \ddots & \vdots \\ X_{t_1}^m & X_{t_2}^m & \cdots & X_{t_n}^m \end{bmatrix} \tag{2}$$
For any numerical feature $X_{t_j}^i$, $i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$, the min–max normalization method is applied to map values to the interval [0, 1] as defined in Equation (3). For non-numerical features, a binary encoding scheme is implemented. Specifically, radar operational states are encoded as Boolean values: 0 denotes a deactivated radar system, while 1 signifies an active radar state.
$$[X_{t_j}^i] = \frac{X_{t_j}^i - X_{\min}^i}{X_{\max}^i - X_{\min}^i} \tag{3}$$
where $X_{\min}^i$ and $X_{\max}^i$ denote the minimum and maximum values of the $i$-th feature.
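The normalization of Equation (3) and the Boolean radar encoding can be sketched as follows; the per-feature minimum and maximum are assumed to be computed on the training set, and the function name is illustrative.

```python
import numpy as np

def normalize_features(batch, feat_min, feat_max):
    """Min-max normalize numeric features to [0, 1] per Equation (3).

    batch: array of shape (n_samples, n_steps, n_features)
    feat_min / feat_max: per-feature statistics computed on the training set.
    """
    scale = np.where(feat_max > feat_min, feat_max - feat_min, 1.0)  # avoid division by zero
    return (batch - feat_min) / scale

# Radar state is already Boolean after encoding: 0 = radar off, 1 = radar on,
# so non-numeric features pass through this step unchanged.
```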

4.2.2. Cross-Attention Layer

In the combat intention recognition task of UAVs, the input data comprise information from multiple sensors exhibiting distinct temporal and feature hierarchies. Direct integration of these heterogeneous data sources into the model frequently introduces interference from irrelevant information, potentially compromising the recognition process. The application of attention mechanisms enables the model to assign different weights to different time points and feature dimensions, guiding the model to focus on the most important features, thus improving both the expressiveness and efficiency of the model. The cross-attention mechanism presented in this research incorporates two principal components: the feature attention module and the time attention module. Regarding diverse input features at identical time steps, their interrelations manifest as feature channel relationships, where the feature attention module systematically investigates correlations among distinct feature categories to amplify the influence of pivotal characteristics. For input features across varying time steps, their associations present as temporal channel relationships, with the time attention module dedicated to extracting historical value dependencies within each temporal sequence, thereby strengthening the model’s comprehension of time-series data patterns.
The feature attention module aims to enhance the model’s focus on the feature dimensions that are most crucial for determining the intention at the current time step by assigning different attention weights to each feature. Specifically, the model learns a feature-level latent vector to compute similarity metrics with individual features, thereby quantifying the importance of each feature. The attention weights for features at each time step are averaged, yielding an overall importance score spanning the entire time series. Let the input sequence to the feature attention module be represented as the feature matrix encompassing all time steps. At time step $t$, the input sequence containing $m$-dimensional feature vectors is denoted as $C_{t_j} = [C_{t_j}^1, C_{t_j}^2, C_{t_j}^3, \ldots, C_{t_j}^m]$. The calculation of the feature attention mechanism proceeds as follows:
$$\alpha_{t_j}^i = \frac{\exp(C_{t_j}^i u_f)}{\sum_{i=1}^{m} \exp(C_{t_j}^i u_f)} \tag{4}$$
$$\alpha^i = \frac{1}{n} \sum_{j=1}^{n} \alpha_{t_j}^i \tag{5}$$
$$X^i = \alpha^i C_{t_j}^i = [\alpha^i c_{t_1}^i, \alpha^i c_{t_2}^i, \alpha^i c_{t_3}^i, \ldots, \alpha^i c_{t_n}^i] \tag{6}$$
Here, $\alpha_{t_j} = [\alpha_{t_j}^1, \alpha_{t_j}^2, \alpha_{t_j}^3, \ldots, \alpha_{t_j}^m]$ represents the attention weights for the $m$-dimensional features at time step $t_j$, which are calculated using the softmax function. $u_f$ is the hidden vector learned through training. The attention weight for the $i$-th feature, $\alpha^i$, is the average of the attention weights of the $i$-th feature, $\alpha_t^i = [\alpha_{t_1}^i, \alpha_{t_2}^i, \alpha_{t_3}^i, \ldots, \alpha_{t_n}^i]$, over the $n$ time steps, where $n$ is the number of time steps in the input feature matrix. By multiplying the feature weight $\alpha^i$ with the corresponding input feature vector $C_{t_j}^i$, the weighted feature vector $X^i$ is obtained. The temporal sequence formed by concatenating the weighted feature vectors $X^i$ from all $m$ features is then used as the input to the time attention module.
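A possible PyTorch sketch of the feature attention module of Equations (4)–(6) is shown below; it interprets the similarity with the latent vector $u_f$ as an element-wise product, which is one plausible reading of the formulation rather than the authors’ exact implementation, and the class name is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAttention(nn.Module):
    """Weights each feature dimension by its average attention over all time steps."""
    def __init__(self, n_features):
        super().__init__()
        self.u_f = nn.Parameter(torch.randn(n_features))   # learned latent vector u_f

    def forward(self, x):                          # x: (batch, n_steps, n_features)
        scores = x * self.u_f                      # similarity of each feature with u_f
        alpha_t = F.softmax(scores, dim=-1)        # Eq. (4): per-step feature weights
        alpha = alpha_t.mean(dim=1, keepdim=True)  # Eq. (5): average over time steps
        return x * alpha                           # Eq. (6): weighted feature vectors
```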
The time attention module aims to selectively focus on important time steps in the sequence through a weighted mechanism, thereby concentrating on the most critical parts for intention recognition. The core idea of time attention is to assign different weights to each time step in the sequence and compute the output based on these weights. Specifically, the time attention module calculates the similarity between the query, key, and value to determine the importance of each time step, and then performs a weighted summation to generate the final output. Let the sequence input to the time attention module after feature attention be denoted as $X_T = [X_1, X_2, X_3, \ldots, X_n]$, where $n$ represents the length of the time series, and each $X_i \in \mathbb{R}^m$ corresponds to the feature vector at time step $i$ with dimensionality $m$. The calculation process for time attention proceeds as follows:
Step 1: Obtain the $Q/K/V$ matrices using Equations (7)–(9).
$$Q_i = X_i W_Q, \quad W_Q \in \mathbb{R}^{m \times m} \tag{7}$$
$$K_i = X_i W_K, \quad W_K \in \mathbb{R}^{m \times m} \tag{8}$$
$$V_i = X_i W_V, \quad W_V \in \mathbb{R}^{m \times m} \tag{9}$$
where $Q, K, V \in \mathbb{R}^{n \times m}$, and each row corresponds to the mapping result of a specific time step. Specifically, $Q$ represents the query matrix, which consists of the query vectors $Q_i$, $i \in \{1, 2, \ldots, n\}$, for all time steps; $K$ denotes the key matrix, composed of the key vectors $K_i$, $i \in \{1, 2, \ldots, n\}$; and $V$ signifies the value matrix, formed by the value vectors $V_i$, $i \in \{1, 2, \ldots, n\}$.
Step 2: Calculate the similarity score vector for the query $Q_t$ at the $t$-th time step with respect to the entire set of keys in the sequence.
$$S_t^i = \frac{Q_t K_i^T}{\sqrt{m}} \tag{10}$$
where $S_t = [S_t^1, S_t^2, S_t^3, \ldots, S_t^n]$, $i \in \{1, 2, \ldots, n\}$.
Step 3: Apply the Softmax function to transform the similarity scores into a normalized probability distribution, thereby ensuring that the sum of the weights equals 1.
$$\beta_t^i = \frac{\exp(S_t^i)}{\sum_{i=1}^{n} \exp(S_t^i)} \tag{11}$$
where $\beta_t = [\beta_t^1, \beta_t^2, \beta_t^3, \ldots, \beta_t^n]$ is the overall attention weight vector.
Step 4: Obtain the attention output at the $t$-th time step.
$$X_t = \sum_{i=1}^{n} \beta_t^i V_i \tag{12}$$
where $X = [X_1, X_2, X_3, \ldots, X_n]$ denotes the attention output for each time step in the entire sequence, which serves as the input to the BiLSTM layer.
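The time attention of Equations (7)–(12) amounts to scaled dot-product self-attention applied across the time axis; a minimal PyTorch sketch, with illustrative names, is given below.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAttention(nn.Module):
    """Scaled dot-product attention across time steps (Eqs. (7)-(12))."""
    def __init__(self, n_features):
        super().__init__()
        self.W_q = nn.Linear(n_features, n_features, bias=False)  # W_Q
        self.W_k = nn.Linear(n_features, n_features, bias=False)  # W_K
        self.W_v = nn.Linear(n_features, n_features, bias=False)  # W_V
        self.scale = math.sqrt(n_features)

    def forward(self, x):                                    # x: (batch, n_steps, n_features)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = torch.bmm(q, k.transpose(1, 2)) / self.scale  # Eq. (10)
        beta = F.softmax(scores, dim=-1)                        # Eq. (11)
        return torch.bmm(beta, v)                               # Eq. (12)
```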

4.2.3. BiLSTM Layer

The BiLSTM layer enables the intention recognition model to process information from both past and future contexts, thereby fully utilizing all the available contextual information in the sequence. This enhances the model’s ability to understand and predict data patterns. The BiLSTM layer extends the LSTM network by incorporating a bidirectional propagation mechanism. This mechanism achieves bidirectional processing through two independent LSTM networks, allowing information to flow not only from past to future but also from future to past. This overcomes the limitation of unidirectional LSTMs, which can only encode information in the forward direction, thereby strengthening the capacity of the model to capture sequential information from both directions.
The BiLSTM layer comprises two independent LSTM networks. One network processes the input sequence in chronological order from past to future, while the other processes it in reverse temporal order from future to past. Both networks generate their respective hidden states at each time step. As illustrated in Figure 6, at time step $t$, $\overrightarrow{H_t}$ denotes the output of the forward LSTM, while $\overleftarrow{H_t}$ denotes the output of the backward LSTM. The final output of the BiLSTM network at this time step, $H_t = [\overrightarrow{H_t}, \overleftarrow{H_t}]$, is obtained by concatenating the outputs from both the forward and backward LSTMs.
LSTM is a specialized type of RNN that retains a similar recursive structure. However, LSTM introduces a gating mechanism to regulate the flow of information, effectively addressing the common issues of gradient vanishing and explosion encountered in traditional RNNs. The architecture of a single LSTM unit is illustrated in Figure 7. Here, $X_t$ is the input feature at time step $t$; $C_{t-1}$ is the cell state before the update, and $C_t$ is the cell state after the update; $\tilde{C}_t$ is the candidate cell state; $H_{t-1}$ and $H_t$ are the output features at the previous and current time steps, respectively; $f_t$, $i_t$, and $O_t$ are the forget gate, input gate, and output gate, respectively; and $\sigma$ denotes the Sigmoid function. The operation process is as follows:
Step 1: LSTM comprises a forget gate $f_t$, an input gate $i_t$, and an output gate $O_t$, which respectively regulate the extent to which historical information is forgotten, the quantity of new information introduced, and the proportion of the final output.
$$f_t = \sigma(W_f [H_{t-1}, X_t] + b_f) \tag{13}$$
$$i_t = \sigma(W_i [H_{t-1}, X_t] + b_i) \tag{14}$$
$$O_t = \sigma(W_o [H_{t-1}, X_t] + b_o) \tag{15}$$
where $W_f$, $W_i$, and $W_o$ are the weight matrices for the forget gate, input gate, and output gate, respectively, and $b_f$, $b_i$, and $b_o$ are the corresponding bias vectors. These weight matrices and bias vectors are learned during the training process to optimize the performance of the LSTM model.
Step 2: The candidate cell state, denoted as $\tilde{C}_t$, captures the potential new information at the current time step.
$$\tilde{C}_t = \tanh(W_c [H_{t-1}, X_t] + b_c) \tag{16}$$
where $W_c$ is the learnable weight matrix for the candidate cell state, and $b_c$ is the corresponding learnable bias vector.
Step 3: By combining the forget gate $f_t$ and the input gate $i_t$, the model performs selective forgetting of the previous cell state $C_{t-1}$ while integrating the candidate state $\tilde{C}_t$.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{17}$$
Step 4: The final output $H_t$ is modulated by the output gate according to Equation (18), as follows:
$$H_t = O_t \odot \tanh(C_t) \tag{18}$$
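In PyTorch, the bidirectional behavior described above is obtained directly from nn.LSTM with bidirectional=True; the snippet below is a usage sketch with the input sizes of Section 5 (12 time steps, 9 features) and the hidden size of 64 selected in Section 5.3.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=9, hidden_size=64, num_layers=1,
                 batch_first=True, bidirectional=True)

x = torch.randn(32, 12, 9)       # (batch, 12 time steps, 9 features)
out, _ = bilstm(x)               # out: (32, 12, 128), forward and backward outputs concatenated
h_last = out[:, -1, :]           # final time-step output, later fed to the fully connected layer
```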

4.2.4. Contrastive Learning Layer

The contrastive learning layer, as an auxiliary tool, strengthens the model’s representation learning and optimizes the feature space, thereby improving model performance. The core concept of contrastive learning is to construct positive and negative sample pairs, guiding the model to learn the ability to distinguish between similar and dissimilar samples. By integrating contrastive learning, samples from the same class are pulled closer together in the representation space, while samples from different classes are pushed farther apart. This not only helps the model learn more discriminative features, but also reduces the risk of overfitting in supervised learning tasks with labeled data.
SupCon is a supervised learning method based on contrastive learning. Unlike traditional contrastive learning methods, SupCon leverages label information by explicitly pulling together samples from the same class and pushing apart samples from different classes during the training process, thereby enabling more effective representation learning [39]. As illustrated in Figure 6, SupCon is integrated into the overall training framework and optimizes the network parameters through contrastive loss. The contrastive learning layer enhances the model’s ability to learn discriminative features by effectively utilizing label information in a supervised manner, thereby improving the model’s performance in tasks that require clear class distinctions.
In SupCon, let $y_i$ denote the label corresponding to sample $x_i$ in a batch $\{x_1, x_2, x_3, \ldots, x_N\}$. Each sample is mapped to the embedding space through the neural network, yielding the embedding representation $z_i = f(x_i)$. The network is then optimized using a contrastive loss function. For each sample $x_i$, the goal is to bring it closer to other samples of the same class and push it farther away from samples of different classes. Specifically, this can be formulated as follows:
$$L_{SupCon} = -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i^T z_p / \tau)}{\sum_{a=1}^{N} \exp(z_i^T z_a / \tau)} \tag{19}$$
where $z_i$ and $z_p$ are the embedding vectors of samples $x_i$ and $x_p$; $\tau$ is the temperature parameter, set to 0.1 in this paper, a choice verified through experiments [40] and consistent with conventions in the contrastive learning field; and $P(i)$ is the set of samples belonging to the same class as sample $x_i$.
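A compact PyTorch sketch of the supervised contrastive loss in Equation (19), with τ = 0.1, is given below; following common SupCon implementations, the anchor itself is excluded from the denominator, which differs slightly from the summation bounds written above, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss, Eq. (19).

    z: (N, d) embedding vectors, labels: (N,) integer class labels.
    """
    z = F.normalize(z, dim=1)                        # unit-norm embeddings
    sim = torch.matmul(z, z.T) / tau                 # z_i^T z_p / tau
    not_self = 1.0 - torch.eye(len(z), device=z.device)
    exp_sim = torch.exp(sim) * not_self              # denominator excludes the anchor itself
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))

    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float() * not_self
    n_pos = pos.sum(dim=1).clamp(min=1)              # |P(i)|
    return -((log_prob * pos).sum(dim=1) / n_pos).mean()
```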

4.2.5. Output Layer

The input to the output layer is derived from the output of the BiLSTM layer. Specifically, the final time-step output of the BiLSTM network serves as the input to the fully connected layer. The model then outputs the intention category label with the highest probability according to the following formula:
$$I_k = \mathrm{softmax}(wH + b) \tag{20}$$
where $I_k$ is the intention category label, $w$ is the trainable weight coefficient matrix, and $b$ is the bias vector.
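Putting the pieces together, one possible assembly of the BLAC forward pass is sketched below; it reuses the FeatureAttention and TimeAttention classes sketched in Section 4.2.2, and the class name and layer sizes are illustrative rather than the authors’ implementation.

```python
import torch.nn as nn

class BLACNet(nn.Module):
    """Sketch of the BLAC pipeline: feature attention -> time attention -> BiLSTM -> FC.

    Assumes the FeatureAttention and TimeAttention sketches above are in scope;
    sizes follow Section 5 (9 features, hidden size 64, 6 intention classes).
    """
    def __init__(self, n_features=9, hidden=64, n_classes=6):
        super().__init__()
        self.feat_attn = FeatureAttention(n_features)
        self.time_attn = TimeAttention(n_features)
        self.bilstm = nn.LSTM(n_features, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)    # Eq. (20)

    def forward(self, x):                 # x: (batch, 12, 9)
        x = self.time_attn(self.feat_attn(x))
        out, _ = self.bilstm(x)
        z = out[:, -1, :]                 # last-step concatenated hidden state
        return self.fc(z), z              # logits for cross-entropy, embedding z for SupCon
```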

5. Experimental Analysis

5.1. Experimental Data and Environment

The experimental background involves some specific types of combat UAVs engaged in free air combat scenarios within airspace. The data were obtained from a combat simulation system and exported in CSV format. Each row in the CSV file comprises an intention label and 9-dimensional data for 12 time-series frames, as illustrated in Figure 4. In total, each row contains 109 columns. Given the length of the time series, we provide an example by listing the 9-dimensional data corresponding to the first time point for the six intention categories, as shown in Table 4. Multiple runs of the simulation system yielded various air combat intention patterns. Given the large volume of the sample set, air combat domain experts developed intention recognition rules based on their experience to generate intention labels. Subsequently, the computer classified the air combat intention samples into different patterns. Finally, air combat domain experts reviewed and revised any samples with ambiguous intention classifications using their expertise. The total number of samples is 16,000, divided into a training set and a test set at an 8:2 ratio, resulting in 12,800 training samples and 3200 test samples. The distribution of categories is as follows: attack intention (16%), retreat intention (10%), electronic interference intention (10%), surveillance intention (24%), reconnaissance intention (23%), and feint attack intention (17%). The time step is 12, and the feature dimension is 9. The specific division of training and test samples is detailed in Table 5.
The experiment was conducted using Python 3.8 on a system equipped with an NVIDIA GeForce RTX 2060 GPU (manufactured by NVIDIA Corporation, Santa Clara, CA, USA) and CUDA 11.4 for acceleration. The deep learning framework employed was PyTorch 1.13.0, running on a PC with an x86-64 Windows 10 operating system. The hardware configuration included an Intel Core i7-9700 CPU @ 3.60 GHz and 16 GB of RAM (manufactured by Intel Corporation, Santa Clara, CA, USA). The experimental setup was configured with 60 epochs and a batch size of 128.

5.2. Evaluation Metric

We evaluated the performance of the proposed intention recognition model using the following five metrics: accuracy, precision, recall, F1 score, and loss. The calculations for each metric are as follows:
1. Accuracy, which represents the proportion of samples correctly predicted by the model out of the total number of samples, as follows:
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \tag{21}$$
where TP is the true positive, FN is the false negative, FP is the false positive, and TN is the true negative;
2. Precision, which represents the proportion of samples that are actually positive among all the samples predicted as positive by the model, as follows:
$$Precision = \frac{TP}{TP + FP} \tag{22}$$
3. Recall, which represents the proportion of actual positive samples that are correctly predicted as positive by the model, as follows:
$$Recall = \frac{TP}{TP + FN} \tag{23}$$
4. F1 score, which represents the harmonic mean of precision and recall, serving as a comprehensive metric that balances both measures. It is particularly important in scenarios with imbalanced class distributions. The F1 score is calculated as follows:
$$F1\,score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{24}$$
In multi-classification tasks, the F1 score is calculated as follows:
$$F1\_multi = \frac{1}{K} \sum_{k=1}^{K} F1_k \tag{25}$$
where $K$ denotes the total number of intentions, which is set to 6 in this study, and $F1_k$ represents the F1 score for the $k$-th intention;
5. Loss. The cross-entropy loss quantifies the discrepancy between the predicted probability distribution and the true label distribution, and is formulated as follows:
$$L_c = -\frac{1}{d} \sum_{i=1}^{d} \sum_{k=1}^{K} y_{ik} \lg(q_{ik}) \tag{26}$$
where $d$ denotes the number of samples, and $k$ indexes the UAV intention categories. $y_{ik}$ is the one-hot encoded value (0 or 1), $q_{ik}$ is the output value of the Softmax function, and $\sum_{k=1}^{6} q_{ik} = 1$.
$$Loss = L_c + \lambda L_{SupCon} \tag{27}$$
where $L_{SupCon}$ denotes the contrastive loss in SupCon, which quantifies the similarity among samples within the same class and the distance between samples from different classes; it is calculated as shown in Equation (19). $\lambda$ is introduced to balance the contribution of the contrastive loss within the overall loss function. In this study, $\lambda$ is set to 0.1, reflecting that the primary objective of the model is classification, while contrastive learning serves as an auxiliary task.
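A sketch of the combined objective in Equation (27), reusing the supcon_loss function sketched in Section 4.2.4; the function name and argument layout are illustrative.

```python
import torch.nn.functional as F

def total_loss(logits, z, labels, lam=0.1):
    """Overall training loss of Eq. (27): cross-entropy plus weighted contrastive term."""
    ce = F.cross_entropy(logits, labels)          # L_c, Eq. (26)
    return ce + lam * supcon_loss(z, labels)      # lambda = 0.1 in this paper
```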

5.3. Parameter Tuning

Prior to initiating the training process, it is essential to determine several hyperparameters in the model. Different combinations of these hyperparameters can significantly impact the model’s performance on both the training and testing sets. Grid search discretizes the possible range of hyperparameter values into a grid, systematically evaluating each combination to identify the configuration that yields optimal performance. Through this method, the optimal hyperparameter settings are identified to ensure the model achieves its best possible outcomes. Table 6 summarizes the parameter configurations used in the grid search.
A total of 288 possible hyperparameter combinations were evaluated, and those achieving an accuracy ≥ 98% are summarized in Table 7. The results demonstrate that the highest accuracy and F1 score are attained when the batch size is set to 128, the number of hidden layers to 1, the number of hidden nodes to 64, dropout to 0.2, and learning rate to 0.003. Table 8 summarizes the key hyperparameter settings for the model.

5.4. Results and Analysis

5.4.1. Intention Recognition Result Analysis of BLAC

The data collection process in an air combat environment is characterized by random data loss. To validate the effectiveness of the data patching method, we utilized a specific set of data representing the distance between two entities at 12 discrete time points: [8.5, 8.39, 8.08, 8.02, 7.71, 7.36, 7.13, 7.06, 6.81, 6.5, 6.44, and 6.22]. Data at the first, third, seventh, and ninth time points were randomly omitted. As illustrated in Figure 8, cubic spline interpolation was used to fit the curve. While there is some discrepancy between the interpolated values and the original missing data points, the overall trend remains consistent.
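This check can be reproduced with the sketch below, using scipy's CubicSpline on the listed distance series with the 1st, 3rd, 7th, and 9th points removed; the fitted values may differ slightly from Figure 8 depending on the spline boundary conditions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

distance = np.array([8.5, 8.39, 8.08, 8.02, 7.71, 7.36,
                     7.13, 7.06, 6.81, 6.5, 6.44, 6.22])
t = np.arange(12)
missing = np.array([0, 2, 6, 8])          # 1st, 3rd, 7th, and 9th time points (0-based)
observed = np.setdiff1d(t, missing)

spline = CubicSpline(observed, distance[observed])
patched = distance.copy()
patched[missing] = spline(missing)

print(np.round(patched[missing], 2))      # interpolated values at the omitted points
print(distance[missing])                  # original values, for comparison
```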
To evaluate the impact of the data patching method on the accuracy of intention recognition under varying missing rates, we randomly removed 10% to 70% of the combat intention data from the test set samples. The data were then patched using the proposed method and subsequently input into the BLAC model. The recognition rate results are presented in Table 9. As indicated in Table 9, the proposed data patching method achieves a recognition rate exceeding 91% even when the missing data rate reaches 30%. This demonstrates that the proposed data patching method can significantly improve the model’s recognition accuracy in uncertain environments.
Figure 9 illustrates the BLAC model’s experimental results. The model converges after approximately 20 iterations. The test set accuracy fluctuates around 98%, reaching a peak of 98.25%. The loss value stabilizes at approximately 0.2, with a minimum of 0.16.
Due to the unequal number of samples for each intention label in the test set, a confusion matrix was generated to further analyze the recognition rates for each intention, as shown in Figure 10. In Figure 10, it is evident that the model achieves an accuracy that exceeds 97.5% for all six intentions. The highest recognition accuracy is observed for the retreat intention at 100%, while the lowest accuracy is observed for the feint intention at 97.61%. The 100% recognition rate of the retreat intention can be attributed to its distinct dynamic features, such as maintaining high speed and negative acceleration during retreat, long distance from the target, and elevated RCS levels. These features facilitate the establishment of non-overlapping decision boundaries in the classifier, thus achieving an extremely high accuracy rate. Misclassifications are observed between the attack and feint intentions, as well as between the reconnaissance and surveillance intentions. This analysis suggests that these two pairs of intentions share significant similarities in their air combat features and exhibit strong deceptive characteristics. During training, the model was unable to ensure sufficiently distinct weights to recognize these combat intentions, leading to occasional misidentifications. Enhancing the accuracy of deceptive intention recognition will be a key focus of future research.

5.4.2. Comparative Analysis of Intention Recognition Methods

To further validate the effectiveness of the BLAC model, comparative experiments were conducted with seven other neural network-based models: BiGRU-Attention [41], LSTM-Attention [29], LSTM [42], GRU [43], TCN-Self-Attention [28], PCLSTM [30], and TCN [35]. The highest accuracy and corresponding loss value from the 60th iteration of each model on the test set were selected as the final performance metrics, as shown in Table 10. The experimental results show that the proposed BLAC model outperforms the other seven models in both accuracy and loss. Specifically, it achieves an accuracy improvement of approximately 12% over the basic LSTM and GRU models, and 14.6% over the basic TCN model. BiLSTM, as a sequential feature extraction network, demonstrates higher applicability in recognizing combat intentions of UAVs compared to basic neural network models. Moreover, BiLSTM’s ability to capture both past and future information provides advantages in handling classification problems. The overall performance of BiGRU-Attention is superior to that of LSTM-Attention, which further validates that recognizing UAV intentions based on sequential feature changes, rather than individual time-step features, leads to higher accuracy. Additionally, the incorporation of cross-attention mechanisms and contrastive learning significantly enhances the model’s performance compared to using a single attention mechanism. To further enrich the experimental results and provide deeper insights into model performance, Figure 11, Figure 12 and Figure 13 present the confusion matrix plots for each model, grouped by their fundamental network architectures. Additionally, we further evaluated the model complexity by examining two critical metrics: the number of parameters and computational complexity, thereby providing a holistic assessment of both performance and efficiency. The results are summarized in Table 10. From Table 10, it becomes evident that while BLAC exhibits a higher parameter count and computational cost compared to lightweight models, its performance is significantly superior. Specifically, when compared to the second-best BiGRU-Attention model, BLAC achieves a 50% reduction in parameters, a 29% decrease in computational cost, and an accuracy improvement of 2.29%. These results highlight BLAC’s superior performance-efficiency trade-off, as it maintains high accuracy without substantially increasing model complexity.

5.4.3. Ablation Experiment

To further validate the effectiveness of the BLAC model in UAV air combat intention recognition, we conducted an ablation experiment on the same dataset to evaluate the contribution of each component and assess its impact on overall performance. The effects of different model structure combinations on performance are summarized in Table 11, while the accuracy and loss value curves for each model configuration are illustrated in Figure 14.
From Table 11, it can be observed that Model ⑤ BLAC achieves the highest accuracy. Although Model ⑤ exhibits superior accuracy, Model ④ demonstrates a slightly lower loss. This discrepancy is attributed to the inclusion of the contrastive learning module in Model ⑤, which enhances accuracy but also results in a marginal increase in loss. Model ⑤ outperforms Model ③ by approximately 2% in terms of accuracy, indicating that the cross-attention mechanism significantly boosts model performance. Additionally, Model ⑤ achieves an approximately 2.5% higher accuracy compared to Model ②, further confirming the critical role of the bidirectional network in enhancing model performance. Compared to the baseline LSTM Model ①, Model ⑤ shows a substantial improvement of 11.94% in accuracy and also achieves a significant reduction in loss. This underscores the limitations of using LSTM alone for intention recognition tasks and highlights the necessity of integrating additional techniques to achieve better outcomes. The bidirectional network, cross-attention mechanism, and contrastive learning proposed in this paper collectively contribute to improving the model’s recognition accuracy.
In Figure 14, it is evident that BLAC consistently outperforms other models in terms of accuracy. By comparing Model ⑤ and Model ④, we observe that the inclusion of contrastive learning enables the model to achieve higher accuracy more rapidly. In the basic LSTM network, introducing the bidirectional propagation mechanism results in a slightly greater performance improvement compared to introducing only the cross-attention mechanism. However, when both mechanisms are incorporated together, the model’s recognition rate improves significantly.
Due to the varying sample sizes of each intention category in the test set, we use precision, recall, and F1 score to evaluate the recognition accuracy of the five models for the six types of intentions, as shown in Table 12. The performance varies across different intentions. All five models effectively recognize the “retreat” intention. The input features for the “retreat” intention are more distinct compared to other intentions, enabling the models to learn its characteristics more quickly and accurately. Consequently, the accuracy and F1 score for the “retreat” intention are the highest. Models ⑤ and ④ demonstrate a clear advantage in distinguishing between the more similar “surveillance” and “reconnaissance” intentions, while Model ① exhibits relatively weaker performance across all the intentions. This suggests that more complex or similar intentions may require specific enhancements in the model’s architecture or training strategies to achieve better differentiation.

6. Conclusions

This paper presents a UAV air combat intent recognition algorithm (BLAC) based on BiLSTM, attention mechanisms, and contrastive learning, aimed at solving the problem of intention recognition under uncertain information. Through a comparative classification of intention recognition methods in the operational domain, we emphasize the advantages of data-driven approaches in handling complex air combat scenarios. To address the issue of incomplete data in air combat, we design a unique data repair strategy. To improve the model performance, we construct a cross-attention layer that integrates both temporal and feature attention mechanisms, while also incorporating contrastive learning to enhance the model’s ability to differentiate features.
The BLAC model proposed in this study is capable of analyzing continuously evolving enemy behaviors and events within highly dynamic and variable combat environments, thereby enhancing the understanding of enemy intentions. To evaluate the performance of the model, comparative experiments were conducted against several state-of-the-art intention recognition models, including BiGRU-Attention [41], LSTM-Attention [29], LSTM [42], GRU [43], TCN-Self-Attention [28], PCLSTM [30], and TCN [35]. The experimental results demonstrate that the BLAC model outperforms these benchmarks in terms of accuracy for UAV combat intention recognition. Furthermore, considering the inherent uncertainties in operational environments, where target data are often incomplete or noisy, we further tested the performance of the model under different degrees of missing data. The results indicate that the BLAC model maintains an accuracy exceeding 91% even with approximately 30% of the data missing. These findings suggest that the BLAC model is highly effective in accurately recognizing UAV tactical intentions under conditions of information uncertainty, offering significant potential for enhancing command and control decision-support systems.
However, while the BiLSTM layer in the proposed model can partially mitigate the negative effects of deceptive behavior by capturing temporal dependencies in scenarios where such behavior is brief, its efficacy diminishes when deceptive behavior persists longer due to limitations imposed by the window size and the duration of the deceptive behavior. Future work will aim to enhance the accuracy of detecting deceptive intentions, increase sensitivity to deceptive behaviors, and investigate the potential of novel cross-modal neural network models for UAV intention recognition [44]. Given the constraints of single-modal information, subsequent research will focus on integrating multi-modal sensor data into the model’s input features to improve the utilization of contextual information and minimize misjudgments arising from local behavioral ambiguity. Moreover, in real-world applications, deceptive intention samples are scarce, and annotations necessitate domain expert involvement, resulting in distribution biases within the training set. Consequently, it is essential to explore data augmentation techniques to address the challenge posed by limited labeled data.

Author Contributions

Conceptualization, Q.N. and S.R.; methodology, Q.N. and S.R.; formal analysis, software, Q.N.; writing—original draft preparation, Q.N.; writing—review and editing, L.Z. and S.R.; supervision, C.W. and W.G.; and project administration, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available on request due to restrictions. Due to the sensitive nature of our institution, the dataset from this study is not openly available. However, if access to the data is required, interested parties may contact the corresponding author. Upon request, and following a review by our institution, the data may be shared under specific conditions. This ensures compliance with our institutional policies while facilitating potential collaboration and further research.

DURC Statement

The current research is limited to the intention recognition of unmanned aerial vehicles in the context of intelligent air combat, which is beneficial and does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving intelligent air combat and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws concerning DURC. The authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UAV      Unmanned aerial vehicle
BiGRU    Bidirectional gated recurrent unit
LSTM     Long short-term memory networks
TCN      Temporal convolutional network
PCLSTM   Panoramic convolutional long short-term memory networks
BiLSTM   Bidirectional long short-term memory networks

References

  1. Zhang, C.; Yan, Z.; Cai, Y.; Guo, J. A Review of Air Target Operational Intention Recognition Research. Mod. Def. Technol. 2024, 52, 1–15. [Google Scholar]
  2. Yin, J.; Guo, X. Aero space situation evolving and predicting methods. In Proceedings of the 19th CCSSTA 2018, College Park, MD, USA, 10–14 August 2018; pp. 303–306. [Google Scholar]
  3. He, S.; Qian, J. Application research of expert system in data fusion technology. Fire Control Radar Technol. 2003, 32, 67–74+80. [Google Scholar]
  4. Azarewicz, J.; Fala, G.; Heithecker, C. Template-based multi-agent plan recognition for tactical situation assessment. In Proceedings of the Fifth Conference on Artificial Intelligence Applications, Miami, FL, USA, 6–10 March 1989; pp. 247–248. [Google Scholar]
  5. Floyd, M.W.; Karneeb, J.; Aha, D.W. Case-based team recognition using learned opponent models. In Proceedings of the Case-Based Reasoning Research and Development: 25th International Conference, ICCBR 2017, Trondheim, Norway, 26–28 June 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 123–138. [Google Scholar]
  6. Huang, J.; Liu, W.; Zhao, Y. Intuitionistic cloud reasoning and its application in aerial target intention analysis. Oper. Res. Fuzziol. 2014, 4, 60–69. [Google Scholar] [CrossRef]
  7. Wang, X.; Xia, M.; Lin, Q.; Wang, Z.; Kong, F. Combat intent forecast based on DS evidence theory before contacting the enemy. Fire Control Command Control 2016, 41, 185–188. [Google Scholar]
  8. Zhang, S.; Cheng, Q.; Xie, Y.; Gang, Y. Intention recognition method under uncertain air situation information. J. Air Force Eng. Univ. (Nat. Sci.) 2008, 9, 50–53. [Google Scholar]
  9. Niu, X.; Zhao, H.; Zhang, Y. Warship intent recognition in naval battle field based on decision tree. Mil. Ind. Autom. 2010, 29, 44–46+53. [Google Scholar]
  10. Zhang, Y.; Deng, X.; Li, M.; Li, X.; Jiang, W. Air target intention recognition based on evidence network causal analysis. Acta Aeronaut. 2022, 43, 143–156. [Google Scholar]
  11. Cao, S.; Liu, Y.; Xue, S. Target tactical intention recognition method of improved high-dimensional data similarity. Sens. Microsyst. 2017, 36, 25–28. [Google Scholar]
  12. Dai, G.; Chen, W.; Liu, Z.; Bai, Z.; Chen, H. Aircraft tactical intention recognition method based on interval grey correlation degree. Pract. Underst. Math. 2014, 44, 198–207. [Google Scholar]
  13. Yang, C.; Song, S.; Fan, P. A target intention recognition method based on cost-sensitive and multi-class three-branch decision. J. Ordnance Equip. Eng. 2023, 44, 132–136. [Google Scholar]
  14. Li, J.; Zhang, P.; Hao, R. Unmanned Aerial Vehicle Tactical Intention Recognition Method Based on Dynamic Series Bayesian Network. In Proceedings of the 2023 IEEE International Conference on Unmanned Systems (ICUS), Hefei, China, 13–15 October 2023; pp. 427–432. [Google Scholar]
  15. Yang, R.; Yang, J.; Liu, X.; Zhang, Y.; Yan, Y. Air-ground cooperative operations intention recognition based on Dynamic Series Bayesian Network. Command Control Simul. 2024, 46, 75–85. [Google Scholar]
  16. Hu, Z.; Liu, H.; Gong, S.; Peng, C. Target intention recognition based on random forest. Mod. Electron. Tech. 2022, 45, 1–8. [Google Scholar]
  17. Zhang, P.; Zhang, Y.; Li, J.; Zhang, P.; Wei, X. Research on Air Target Intention Recognition Method Based on RL-LSTM. Fire Control Command Control 2024, 49, 75–81. [Google Scholar]
  18. Zhou, D.; Sun, G.; Zhang, Z.; Wu, L. On Deep Recurrent Reinforcement Learning for Active Visual Tracking of Space Noncooperative Objects. IEEE Robot. Autom. Lett. 2023, 8, 4418–4425. [Google Scholar] [CrossRef]
  19. Feng, Z.; Xiaofeng, H.; Lin, W. Simulation Method of Battlefields Situation Senior Comprehension Based on Deep Learning. Fire Control Command Control 2018, 43, 25–30. [Google Scholar]
  20. Wu, G.; Shi, H.; Qiu, C. Intention Recognition Method of Air Target Based on SSA-SVM. Ship Electron. Eng. 2022, 42, 29–34. [Google Scholar]
  21. Yang, S.; Zhang, D.; Xiong, W.; Ren, Z.; Tang, S. Decision-making method for air combat maneuver based on explainable reinforcement learning. Acta Aeronaut. Astronaut. Sin. 2024, 45, 257–274. [Google Scholar]
  22. Bai, L.; Xiao, Y.; Qi, J. Adversarial Intention Recognition Based on Reinforcement Learning. J. Command Control 2024, 10, 112–116. [Google Scholar]
  23. Zhou, T.; Chen, M.; Wang, Y.; He, J.; Yang, C. Information entropy-based intention prediction of aerial targets under uncertain and incomplete information. Entropy 2020, 22, 279. [Google Scholar] [CrossRef]
  24. Teng, F.; Liu, S.; Song, Y. BiLSTM-Attention: An Air Target Tactical Intention Recognition Model. Aero Weapon. 2021, 28, 24–32. [Google Scholar]
  25. Zhou, K.; Wei, R.; Xu, Z.; Zhang, Q.; Lu, H.; Zhang, G. An air combat decision learning system based on a brain-like cognitive mechanism. Cogn. Comput. 2020, 12, 128–139. [Google Scholar] [CrossRef]
  26. Zhu, B.; Fang, L.G.; Zhang, X.D. Intention Assessment to Aerial Target Based on Bayesian Network. J. Mod. Def. Technol. 2012, 40, 109–113. [Google Scholar]
  27. Xu, J.; Zhang, L.; Han, D. Air Target Intention recognition based on fuzzy inference. Command. Inf. Syst. Technol. 2020, 11, 44–48. [Google Scholar]
  28. Zhao, L.; Sun, P.; Zhang, Y. A fast aerial targets intention recognition method under imbalanced hard-sample. J. Air Force Eng. Univ. 2024, 25, 76–82. [Google Scholar]
  29. Liu, Z.; Chen, M.; Wu, Q.; Chen, S. Prediction of unmanned aerial vehicle target intention under incomplete information. Sci. Sin. Inf. 2020, 50, 704–717. [Google Scholar] [CrossRef]
  30. Xue, J.; Zhu, J.; Xiao, J.; Tong, S.; Huang, L. Panoramic convolutional long short-term memory networks for combat intension recognition of aerial targets. IEEE Access 2020, 8, 183312–183323. [Google Scholar] [CrossRef]
  31. Sun, L.; Yu, L.; Zou, D. Application of Dempster-Shafer Evidence Theory in Target Intention Prediction. Electron. Opt. Control. 2008, 15, 33–36. [Google Scholar]
  32. Zhang, Z.; Qu, Y.; Liu, H. Air target intention recognition based on further clustering and sample expansion. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 17–25 July 2018; pp. 3565–3569. [Google Scholar]
  33. Geng, Z.; Zhang, J. Research on Air Target Combat Intention Inference Based on Bayesian Networks. Mod. Def. Technol. 2008, 36, 40–44. [Google Scholar]
  34. Ou, W.; Liu, S.; He, X.; Guo, S. Tactical intention recognition algorithm based on encoded temporal features. Command Control Simul. 2016, 38, 36–41. [Google Scholar]
  35. Teng, F.; Song, Y.; Guo, X. Attention-TCN-BiGRU: An air target combat intention recognition model. Mathematics 2021, 9, 2412. [Google Scholar] [CrossRef]
  36. Ma, Y.; Sun, P.; Zhang, J.; Wang, P.; Yan, Y.; Zhao, L. Air group intention recognition method under imbalance samples. Syst. Eng. Electron. 2022, 44, 3747–3755. [Google Scholar]
  37. Wang, S.; Wang, G.; Fu, Q.; Song, Y.; Liu, J.; He, S. STABC-IR: An air target intention recognition method based on bidirectional gated recurrent unit and conditional random field with space-time attention mechanism. Chin. J. Aeronaut. 2023, 36, 316–334. [Google Scholar] [CrossRef]
  38. Zhang, H.; Yan, Y.; Li, S.; Hu, Y.; Liu, H. UAV behavior-intention estimation method based on 4-D flight-trajectory prediction. Sustainability 2021, 13, 12528. [Google Scholar] [CrossRef]
  39. Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 2020, 33, 18661–18673. [Google Scholar]
  40. Moradinasab, N.; Sharma, S.; Bar-Yoseph, R.; Radom-Aizik, S.; Bilchick, K.C.; Cooper, D.M.; Weltman, A.; Brown, D.E. Universal representation learning for multivariate time series using the instance-level and cluster-level supervised contrastive learning. Data Min. Knowl. Discov. 2024, 38, 1493–1519. [Google Scholar] [CrossRef]
  41. Teng, F.; Guo, X.; Song, Y.; Wang, G. An air target tactical intention recognition model based on bidirectional GRU with attention mechanism. IEEE Access 2021, 9, 169122–169134. [Google Scholar] [CrossRef]
  42. Ou, W.; Liu, S.; He, X.; Cao, Z. Study on intelligent recognition model of enemy target’s tactical intention on battlefield. Comput. Simul. 2017, 34, 10–14. [Google Scholar]
  43. Gou, X.T.; Wu, N.F. Air group situation recognition method based on GRU-attention neural network. Comput. Mod. 2019, 10, 11. [Google Scholar] [CrossRef]
  44. Li, L.; Yang, R.; Lv, M.; Wu, A.; Zhao, Z. From behavior to natural language: Generative approach for unmanned aerial vehicle intent recognition. IEEE Trans. Artif. Intell. 2024, 5, 6196–6209. [Google Scholar] [CrossRef]
Figure 1. Hierarchical representation and reasoning process of intention.
Figure 2. Process of UAVs tactical intention recognition.
Figure 3. Intention Space Coding.
Figure 4. Temporal Features of UAVs.
Figure 5. Data Patching Process.
Figure 6. Structure of BLAC Network.
Figure 7. LSTM Structure.
Figure 8. Fitting curve of distance under random missing data.
Figure 9. Experimental results of test and training sets of BLAC model: (a) Accuracy of BLAC. (b) Loss of BLAC.
Figure 10. Confusion matrix of BLAC model.
Figure 11. Confusion matrix of LSTM-based models: (a) Confusion matrix of BLAC. (b) Confusion matrix of LSTM-Attention. (c) Confusion matrix of LSTM. (d) Confusion matrix of PCLSTM.
Figure 12. Confusion matrix of GRU-based models: (a) Confusion matrix of BiGRU-Attention. (b) Confusion matrix of GRU.
Figure 13. Confusion matrix of TCN-based models: (a) Confusion matrix of TCN-Self-Attention. (b) Confusion matrix of TCN.
Figure 14. Results of ablation experiments: (a) Accuracy rate of ablation experiment. (b) Loss value of ablation experiment.
Table 1. Intention Space in Different Literature.

References | Intention Space | Qty.
[27] | Attack, penetration, retreat, and search | 4
[13] | Attack, scout, penetration, electronic interference, and circumvention | 5
[28] | Attack, electronic interference, retreat, surveillance, scout, and feint | 6
[29] | Attack, scout, surveillance, feint, penetration, defense, and electronic interference | 7
[30] | Penetration, attack, jamming, transportation, refueling, civil fight, AWACS, and scout | 8
Table 2. Detailed Description of 6 Intentions.

Intention | Description
Attack | UAVs launch bullets, bombs, or missiles to strike the strategic point to cause damage
Retreat | UAVs evacuate from the current battlefield area
Electronic interference | UAVs interfere with enemy radar and communication systems through electronic jamming equipment
Surveillance | Passive activities of UAVs to monitor an area
Reconnaissance | Active exploration activities of air targets to detect the situation
Feint | UAVs simulate an attack to deceive the enemy
Table 3. Target Features in Different Literature.

References | Target Features | Qty.
[31] | Velocity, distance, and azimuth angle | 3
[32] | Azimuth angle, distance, horizontal velocity, heading angle, and height | 5
[33] | Heading, distance, identity, aircraft type, velocity, and height | 6
[34] | Height, velocity, heading, repeated frequency, pulse width, carrier frequency, and level of RCS | 7
[35] | Heading angle, azimuth angle, height, distance, velocity, acceleration, level of RCS, marine air-to-air radar status, and disturbed state | 9
[28] | Azimuth angle, distance, heading angle, velocity, height, marine radar status, air-to-air radar status, disturbing state, disturbed state, and maneuver type | 10
[36] | Velocity, acceleration, height, distance, heading angle, azimuth angle, level of RCS, maneuver type, disturbing state, air-to-air radar status, and marine radar status | 11
[37] | Height, velocity, acceleration, heading angle, azimuth, distance, course short, 1D range profile, radar cross section, air-to-air radar status, air-to-ground radar state, and electronic interference state | 12
Table 4. Example of CSV format for data.

Intention Label | Time Frame 1 … Time Frame 12
Velocity | Acceleration | Height | Distance | Heading Angle | Azimuth Angle | Level of RCS | Radar Status | Disturbed State
024020480509000.310
1220−57022002701801.200
2505520404590311
37031800600180100.510
46021300400901800.311
51501280010027090411
Table 5. Classification of intention recognition sample data.

Intention Label | Intention Type | Total Samples | Training Samples | Test Samples
0 | Attack | 2560 | 2048 | 512
1 | Retreat | 1600 | 1280 | 320
2 | Electronic interference | 1600 | 1280 | 320
3 | Surveillance | 3840 | 3072 | 768
4 | Reconnaissance | 3680 | 2944 | 736
5 | Feint | 2720 | 2176 | 544
Table 6. Parameters of Grid Search.

Parameter | Value
Batch size | [64, 128, 256, 512]
Number of hidden layers | [1, 2]
Number of hidden nodes | [64, 128, 256, 512]
Dropout | [0.1, 0.2, 0.3]
Learning rate | [0.0005, 0.001, 0.003]
Table 7. Parameter combinations.

Batch Size | Hidden Layer | Hidden Nodes | Dropout | Learning Rate | Accuracy (%)
64 | 2 | 128 | 0.1 | 0.003 | 98.104
64 | 2 | 128 | 0.2 | 0.003 | 98.043
64 | 2 | 256 | 0.3 | 0.0005 | 98.165
64 | 2 | 512 | 0.3 | 0.0005 | 98.104
128 | 1 | 64 | 0.2 | 0.003 | 98.250
128 | 2 | 64 | 0.3 | 0.001 | 98.043
128 | 2 | 256 | 0.3 | 0.003 | 98.165
256 | 2 | 128 | 0.2 | 0.0005 | 98.165
256 | 2 | 512 | 0.3 | 0.001 | 98.043
512 | 1 | 128 | 0.2 | 0.001 | 98.043
512 | 1 | 256 | 0.3 | 0.0005 | 98.043
512 | 2 | 256 | 0.2 | 0.0005 | 98.043
512 | 2 | 256 | 0.3 | 0.001 | 98.165
Table 8. Model experiment parameters.

Parameter | Value
Loss function | Categorical_Crossentropy
Optimizer | Adam
Activation function | ReLU
Hidden layer | 1
Hidden nodes | 64
Batch size | 128
Epoch | 60
Dropout | 0.2
Learning rate | 0.003
Table 9. Model recognition accuracy under different data missing percentages.

Data Missing | Accuracy (%) | Data Missing | Accuracy (%)
0% | 98.25 | 40% | 86.51
10% | 96.82 | 50% | 73.27
20% | 93.33 | 60% | 60.83
30% | 91.57 | 70% | 45.60
Table 10. Model performance and complexity evaluation of various models.

Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score | Loss | Number of Parameters (K) | Computational Complexity (MFLOPs)
BLAC | 98.25 | 98.54 | 98.48 | 0.985 | 0.16 | 38.92 | 0.902
BiGRU-Attention | 95.96 | 96.42 | 96.17 | 0.963 | 0.19 | 78.47 | 1.270
LSTM-Attention | 90.74 | 91.85 | 91.47 | 0.916 | 0.17 | 21.37 | 0.504
LSTM | 86.31 | 86.96 | 86.64 | 0.868 | 0.32 | 18.97 | 0.446
GRU | 85.80 | 86.69 | 86.39 | 0.865 | 0.27 | 14.47 | 0.422
TCN-Self-Attention | 89.93 | 91.05 | 90.29 | 0.906 | 0.22 | 26.54 | 0.649
PCLSTM | 88.70 | 89.45 | 88.93 | 0.892 | 0.33 | 35.58 | 0.963
TCN | 83.65 | 84.78 | 83.85 | 0.843 | 0.29 | 21.21 | 0.503
Table 11. Results of ablation experiment.

Model Composition Structure: LSTM | Bidirectional | Cross Attention | Contrast Learning
Accuracy (%) | Loss
86.31 | 0.32
95.81 | 0.18
96.28 | 0.21
97.44 | 0.13
98.25 | 0.16
Table 12. Results of each intention evaluation index of ablation experiment.

Intention | Precision (%) | Recall (%) | F1 Score
Attack | 82.56 / 94.92 / 95.31 / 96.88 / 97.66 | 83.20 / 94.92 / 95.31 / 96.88 / 97.85 | 0.829 / 0.949 / 0.953 / 0.969 / 0.978
Retreat | 92.54 / 100 / 100 / 100 / 100 | 96.88 / 100 / 100 / 100 / 100 | 0.947 / 1.000 / 1.000 / 1.000 / 1.000
Electronic interference | 86.86 / 96.91 / 97.82 / 98.45 / 100 | 95.00 / 98.13 / 98.13 / 99.06 / 99.38 | 0.907 / 0.975 / 0.980 / 0.988 / 0.997
Surveillance | 86.26 / 95.43 / 95.84 / 97.13 / 97.92 | 85.03 / 95.18 / 95.96 / 96.88 / 97.92 | 0.856 / 0.953 / 0.959 / 0.970 / 0.979
Reconnaissance | 86.58 / 94.99 / 95.39 / 96.88 / 97.70 | 85.05 / 95.38 / 95.65 / 97.15 / 98.10 | 0.858 / 0.952 / 0.955 / 0.970 / 0.979
Feint | 85.36 / 95.18 / 95.93 / 97.05 / 97.97 | 81.43 / 94.30 / 95.22 / 96.70 / 97.61 | 0.833 / 0.947 / 0.956 / 0.969 / 0.978
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
