1. Introduction
Synthetic Aperture Radar (SAR) stands out as an active microwave remote sensing technology, offering uninterrupted, day-and-night Earth surface observation [1]. Because it is unaffected by illumination, cloud cover, or weather conditions, it has become indispensable in remote sensing [
2]. SAR has found extensive applications in both military and civilian domains, emerging as a pivotal tool for information acquisition [
2]. SAR has found extensive applications in both military and civilian domains, emerging as a pivotal tool for information acquisition [
3]. In military contexts, the detection of aircraft holds a central position in air defense research, leveraging the distinctive advantages offered by SAR images [
4]. Consequently, there is a global research emphasis on enhancing aircraft target detection in SAR imagery.
SAR imaging, distinct from optical methods, poses challenges in detecting and identifying aircraft targets due to its longer wavelength and complex imaging mechanism [
5]. The irregular distribution of land clutter, marked by bright backscattering points, introduces interference [
6]. SAR images often showcase intricate terrain features, complicating aircraft target detection as these features may mimic the representation of targets. Targets in SAR images manifest as irregular bright spots, necessitating spot integration for effective recognition. The varied imaging characteristics of aircraft targets in SAR images, combined with fluctuating scattering conditions, reduce the relevance of traditional manually designed features [
6,
7,
8,
9]. Detecting aircraft targets in SAR images is, therefore, a significant but intricate research direction.
Traditional SAR image target detection relies heavily on model features, encompassing characteristics like the target’s outline, size, texture, and scattering center [
10,
11,
12,
13]. A common traditional algorithm is the constant false alarm rate (CFAR) based on clutter statistics and threshold extraction [
14]. Scholars have delved into statistical features and non-uniform backgrounds, proposing improved CFAR algorithms like CA-CFAR [
15], SOCA-CFAR [
16], GOCA-CFAR [
17], OS-CFAR [
18], and VI-CFAR [
19]. Ai et al. [
20] proposed an SAR detection algorithm using bilateral fine-tuning thresholds, enhancing performance in ocean backgrounds by fitting the target to clutter with higher contrast. Chen et al. [
21] introduced an improved constant false alarm rate detection algorithm based on multiscale contrast and variable windows, elevating target detection accuracy in SAR images. Model-based recognition methods achieve higher target recognition accuracy as the template database evolves. However, this approach demands multiple iterations to generate high-precision simulated images, which taxes both computation speed and model accuracy; model-based methods therefore suffer from high computational complexity and low efficiency. Consequently, researchers are increasingly exploring machine learning algorithms, such as support vector machines, neural networks, and adaptive boosting, for automatic interpretation of SAR targets.
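To make the CFAR family of detectors discussed above concrete, the sketch below implements the basic cell-averaging variant (CA-CFAR) on a 1-D power signal. This is a generic textbook formulation, not the specific algorithm of any cited paper; the window sizes and false-alarm probability are assumed illustrative values.

```python
import numpy as np

def ca_cfar_1d(signal, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D power signal.

    For each cell under test (CUT), the local clutter level is estimated by
    averaging the training cells on both sides (guard cells excluded), and
    the detection threshold is scaled so the false-alarm probability `pfa`
    is held constant under an exponential (square-law) clutter model.
    """
    n = len(signal)
    num_side = num_train // 2
    # Threshold scaling factor for CA-CFAR with N training cells:
    # alpha = N * (pfa**(-1/N) - 1)
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    detections = np.zeros(n, dtype=bool)
    half = num_side + num_guard
    for i in range(half, n - half):
        lead = signal[i - half : i - num_guard]           # left training cells
        lag = signal[i + num_guard + 1 : i + half + 1]    # right training cells
        noise = (lead.sum() + lag.sum()) / num_train
        detections[i] = signal[i] > alpha * noise
    return detections
```

The improved variants cited in the text (SOCA, GOCA, OS, VI) differ mainly in how the two training windows are combined into the noise estimate, e.g., taking the smaller or larger of the two side averages instead of their sum.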
Recently, deep learning-based target detection has experienced rapid development across various fields [
22,
23,
24,
25]. Notable strides have been achieved in aircraft target recognition in Synthetic Aperture Radar (SAR) images through deep learning [
26,
27,
28]. This progress is largely credited to the automatic learning and pattern recognition capabilities inherent in deep learning methods for handling complex features. Zhao et al. [
29] introduced an SAR aircraft detection algorithm leveraging dilated convolution and attention mechanisms, creating a novel pyramid dilation network to optimize aircraft feature extraction in SAR images. Wang et al. [
30], utilizing the SSD object detection framework, applied a strategy integrating transfer learning and data augmentation to enhance SSD’s target detection performance in SAR images. In SAR aircraft target detection, scholars commonly modify existing algorithms to meet specific requirements for satisfactory results [
31,
32]. However, achieving a balance between model complexity and detection accuracy can be challenging, and existing algorithms may struggle in such scenarios. Moreover, aircraft in SAR images may experience deformation due to radar geometry effects, altering the target shape and complicating detection. Additionally, deep learning methods rely on ample samples for supervised training, and inadequate samples can result in overfitting, adversely affecting detection performance.
To address these challenges, we propose a Feature Enhancement and Multi-Scales Fusion Network (FEMSFNet). This network aims to improve detection accuracy while minimizing model complexity, achieving a balance between the two. Firstly, FEMSFNet employs a diverse image enhancement technique [
33,
34,
35], applying methods such as noise injection, mosaic, mixup, rotation, and cropping to enrich image features. This addresses the scarcity of SAR aircraft image data, increasing sample diversity for improved generalization and more robust network training. Secondly, drawing inspiration from Yolov4-tiny [
36,
37,
38], FEMSFNet utilizes the CSPDarknet53-tiny network [
39] as the backbone to create a lightweight model. It incorporates a residual module based on an improved attention mechanism, focusing more on critical image regions for enhanced recognition accuracy. Thirdly, the paper introduces superior CSP [
40] structures based on an improved feature pyramid [
41], preventing a reduction in the network’s receptive field and the loss of target feature information in deep structures. Finally, tailored for SAR aircraft target detection, FEMSFNet optimizes loss functions [
42,
43] while implementing cosine annealing learning rate decay [
44], and includes label smoothing [
45] techniques to prevent overfitting, expedite convergence, and improve regression accuracy.
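The learning-rate schedule and label-smoothing regularization mentioned above can be sketched compactly. The following shows the standard formulations only, not the authors' exact implementation; the initial rate, total steps, and smoothing factor are assumed values for illustration.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Cosine-annealing decay: the learning rate falls from lr_max to
    lr_min along half a cosine period over total_steps iterations."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: soften a one-hot target so the model is never
    pushed toward fully saturated (overconfident) outputs."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]
```

The cosine schedule keeps the rate high early for fast convergence and decays it smoothly, while the smoothed targets discourage the overfitting that hard 0/1 labels can induce on small SAR datasets.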
The main contributions of our work are as follows:
FEMSFNet, as proposed, prioritizes both speed and accuracy in target detection. It leverages the lightweight CSPDarknet53-tiny as the backbone for efficient feature extraction. The model’s performance is evaluated on the SAR Aircraft Detection Dataset (SADD).
To maintain a lightweight model without compromising detection accuracy, we integrate the optimized Squeeze-and-Excitation Networks (SE) attention module with the ResNet module in the backbone, forming the SdE-Resblock structure.
To prevent the deep network from causing a deviation in the receptive field during training, leading to ineffective global feature fusion and loss of feature information, we propose a CSP structure based on an improved pyramid pooling model, called ssppf-CSP.
Considering the unique characteristics of SAR aircraft target detection, FEMSFNet optimized the network’s loss functions while implementing techniques such as learning rate cosine annealing decay and label smoothing to prevent overfitting, ultimately enhancing convergence speed and regression accuracy.
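The SdE-Resblock contribution above combines a squeeze-and-excitation (SE) style attention module with a residual block. The paper's specific SE optimization is not reproduced here; the sketch below is a generic numpy forward pass of the standard SE pattern inside a residual connection, with the bottleneck sizes assumed and the convolutions of a real residual block omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(feat, w1, w2):
    """Squeeze-and-Excitation: globally average-pool each channel,
    pass the channel vector through a bottleneck MLP, and rescale the
    feature map channel-wise by the resulting attention weights."""
    # feat: (C, H, W); w1: (C // r, C); w2: (C, C // r)
    squeeze = feat.mean(axis=(1, 2))                       # (C,) global pool
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # (C,) in (0, 1)
    return feat * excite[:, None, None]                    # channel rescaling

def se_resblock(feat, w1, w2):
    """Residual block with SE attention on the residual branch:
    out = x + SE(x). A real SdE-Resblock would also include convolutions;
    they are left out to keep the attention pattern visible."""
    return feat + se_attention(feat, w1, w2)
```

Because the excitation weights lie in (0, 1), the block can only attenuate uninformative channels, letting the network concentrate on the critical image regions the text describes.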
The rest of the paper is arranged as follows.
Section 2 describes the proposed aircraft detection network in detail. Experimental results, as well as performance evaluation, are presented in
Section 3, and a detailed discussion of the results is provided at the end of this section.
Section 4 briefly summarizes this paper.
3. Experiments and Results
3.1. Datasets
Currently, there is a scarcity of publicly available datasets for SAR image aircraft detection. Consequently, we exclusively utilize the SAR Aircraft Detection Dataset (SADD) [
60] to validate our approach.
SADD is derived from the German TerraSAR-X satellite, operating in X-band with HH polarization. It provides image resolutions ranging from 0.5 to 3 m. Expert SAR Automatic Target Recognition (ATR) analysts manually annotated the ground truth of aircraft based on prior knowledge and corresponding optical images. After cropping the large source images, SADD comprises 2966 non-overlapping 224 × 224 slices containing 7835 annotated aircraft targets with clear structures, outlines, and main components. The dataset includes aircraft targets of varying sizes, with a significant number being small-scale targets. The SADD backdrop features complex environments with diverse scenes such as airport runways, aprons, and civil aviation facilities. Negative samples are predominantly drawn from areas surrounding the airports, including open spaces and forests. Refer to
Figure 7 for visual representations of sample images within the SADD.
In this article, to validate our method and enable comparison with other relevant papers using the same dataset, we randomly divide the SADD images into the training and test sets at a 5:1 ratio [
60]. The training set comprises 799 positive samples and 1673 negative samples, totaling 6948 annotated aircraft boxes. The test set includes 85 positive samples, 409 negative samples, and a total of 887 annotated aircraft boxes, as summarized in
Table 1.
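A random 5:1 split such as the one described above can be reproduced with a few lines; the helper below is an illustrative sketch (the function name and the fixed seed are assumptions, not from the paper).

```python
import random

def split_dataset(sample_ids, test_fraction=1 / 6, seed=42):
    """Randomly split sample identifiers into training and test sets
    at roughly a 5:1 ratio (test_fraction = 1/6 of the data)."""
    ids = list(sample_ids)
    rng = random.Random(seed)       # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (train, test)
```

Applied to the 2966 SADD slices, a 1/6 test fraction yields 2472 training and 494 test slices, matching the counts reported above.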
Figure 8 showcases sample images from the SADD. The images exhibit notable variations in the size of aircraft targets, and the backgrounds surrounding specific aircraft targets are particularly intricate, presenting challenges for accurate aircraft positioning. Furthermore, the intricate background clutter points may be misleadingly identified as aircraft components, further complicating the detection process.
3.2. Evaluation Metrics
In SAR aircraft detection, the precision rate and recall rate serve as common evaluation criteria. Yet, there is often a trade-off between precision and recall: enhancing one may reduce the other. To mitigate this, we introduce the F1 score as a supplementary metric, providing a holistic indicator that balances precision and recall. The calculation formulas are detailed in Equations (12)–(14):

Precision = TP / (TP + FP),  (12)

Recall = TP / (TP + FN),  (13)

F1 = 2 × Precision × Recall / (Precision + Recall),  (14)

where the meanings of TP, FN, and FP are as shown in
Table 2.
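The three metrics can be computed directly from the TP, FP, and FN counts; a minimal sketch using the standard definitions (the zero-denominator guards are a defensive addition, not from the paper):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts (standard definitions)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that trades one sharply against the other, which is exactly the trade-off noted above.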
3.3. Experimental Setup
FEMSFNet uses a 224 × 224 image resolution for both the training and testing phases. During training, random rotation, random mixup, and mosaic data-augmentation methods are employed. All experiments are run on an NVIDIA GeForce RTX 4060 Ti GPU. The configuration details of the experimental parameters are outlined in
Table 3.
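Of the augmentation methods listed above, mixup is the easiest to show compactly: two images are blended by a random convex combination, and the mixing weight is reused to weight the two samples' losses. The Beta parameter below is a commonly used assumed value, not one taken from the paper.

```python
import numpy as np

def mixup(img_a, img_b, alpha=0.2, rng=None):
    """Mixup augmentation: convex combination of two images.
    Returns the blended image and the mixing weight `lam`, which is
    also used to weight the two samples' losses during training."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)         # lam in (0, 1), usually near 0 or 1
    mixed = lam * img_a + (1.0 - lam) * img_b
    return mixed, lam
```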
3.4. Comparison to Existing Algorithms
To showcase the efficacy of FEMSFNet, we visualized its detection confidence maps in comparison to the baseline Yolov4-tiny, depicted in
Figure 9. FEMSFNet demonstrates superior accuracy in locating aircraft targets and effectively suppresses background clutter interference, producing sharper detection confidence maps than the baseline Yolov4-tiny.
To rigorously validate our algorithm, we compare it against two two-stage methods (Faster R-CNN [
61] and Cascade R-CNN [
62]) and four one-stage methods (SSD [
63], Yolov3 [
64], SEFEPNet [
60], and Yolov4-tiny [
36]), as outlined in
Table 4. To specify, the model complexities are as follows: Faster R-CNN at 160 MB, Cascade R-CNN at 319 MB, SSD at 96 MB, Yolov3 at 236 MB, Yolov4-tiny at 19 MB, and FEMSFNet at 23 MB. The results demonstrate that FEMSFNet excels across various metrics, particularly in precision and model size, with a maximum reduction of 92.7% in model size (compared to Cascade R-CNN) and maximum improvements of 12.2% in precision, 12.9% in recall, and 13.3% in F1 score over the contrasted algorithms. This success is attributed to enriching dataset diversity through image augmentation, incorporating attention modules in the residual structure, and integrating pyramid modules in the CSP network within the FEMSFNet architecture. These components enable FEMSFNet to capture deeper, broader, and more accurate target information.
Figure 10 illustrates the detection results of FEMSFNet and other methods. In support of further research in Synthetic Aperture Radar (SAR) aircraft target detection, we plan to open-source FEMSFNet soon at
https://github.com/WenboEth/Sar-Aircaft-Target-Detection, accessed on 10 May 2024.
The results below highlight that while FEMSFNet may make mistakes in complex target recognition scenarios, it consistently outperforms the compared algorithms, especially in terms of precision, recall, and model size. We anticipate that FEMSFNet has untapped potential, and with further optimization, broader expansion, and deeper evolution, it can achieve even better results in the future.
In the realm of deep learning-based object detection, assessing a network’s generalization ability is crucial. A model with robust generalization performs well on unseen, complex, or even distorted data, a key measure of its practical utility. FEMSFNet augments its training data through methods including rotation, scaling, and cropping, helping the model learn a broader range of variations and boosting generalization performance. Moreover, techniques like dropout are used within the SdE-Resblock module, randomly “dropping” a portion of neurons during training to reduce the model’s dependence on specific data and enhance its generalization capability. To validate the generalization performance of FEMSFNet, a non-learned dataset (n-LD) was created. This dataset consists of targets the network has never “learned” from, included in neither the training, validation, nor test sets. To challenge the network’s generalization and robustness, the n-LD samples were deliberately complicated and distorted, increasing the difficulty of target recognition. Examples of n-LD cases are shown in
Figure 11. The n-LD was then fed into both the FEMSFNet and state-of-the-art models for comparative testing and validation of FEMSFNet’s generalization ability, with results presented in
Table 5. The charts reveal that FEMSFNet performs exceptionally well across various metrics, including accuracy, error rate, and omission rate. Such outcomes are attributable to the superior SdE-Resblock module and the ssppf-CSP structure of FEMSFNet.
3.5. Ablation Study
To enhance readers’ understanding of each module’s impact, we conducted an ablation study. Yolov4-tiny serves as the base framework for evaluating the effects of various feature modules on aircraft detection in SAR images, as outlined in
Table 6. In the ablation comparison experiments, the “×” mark in the loss function column indicates the use of the pre-improvement Yolov4-tiny loss function, detailed in Equations (1)–(7). The “√” mark indicates the use of the improved loss function, as detailed in Equations (8)–(11). Simultaneously,
Figure 12 presents the results of the ablation study conducted on the SADD dataset. The charts illustrate that sequentially applying the different modules yields improvements across all metrics. Precision sees the largest enhancement, with a 7.0% increase, while recall and F1 score improve by 6.9% and 6.8%, respectively. The ssppf-CSP module, which integrates a soft feature pyramid into the CSP architecture, enhances network depth and incorporates features from multiple scales, substantially improving overall precision and recall. The SdE-Resblock module, which integrates an optimized attention mechanism into the residual structure, sharpens the network’s focus on target detection, improving recognition rate and recall. While the improved loss function module does not markedly increase the evaluation metrics, its main impact lies in accelerating convergence and reducing computational complexity. As illustrated in
Figure 13, we compare the iteration process of the loss function before and after the improvements. It is evident from the figure that the improved loss function yields an overall reduction in loss values relative to its prior state, along with a notably faster convergence rate. Furthermore, at the same iteration round, there is a maximum reduction of 26% in the loss value.
3.6. Result and Discussion
From the comparative validation analysis presented earlier, it is clear that in the domain of aircraft target detection in remote sensing SAR imagery, the FEMSFNet model excels across several metrics, including detection precision, recall rate, F1 score, and model size, achieving varying degrees of improvement. Specifically, it surpasses other algorithms, with a maximum increase of 12.2% in precision, 12.9% in recall, and 13.3% in F1 score. Moreover, in terms of model size, FEMSFNet achieves an 88% reduction compared to the largest model size benchmarked. The primary reason for this performance is that, unlike general-purpose state-of-the-art neural networks, FEMSFNet is a unique network specifically tailored for SAR imagery of aircraft targets. It eliminates unnecessary components for this task, such as the Region Proposal Network and multiple cascading detection heads, refining the network architecture. Initially, to address the challenges of limited remote sensing SAR image data, complex backgrounds due to noise, and various obstructions making target detection difficult, FEMSFNet incorporates a data augmentation module. This module complexifies or even distorts the source image data, tackling the issue of data scarcity while training the network to enhance its generalization capability and robustness with complicated images. Furthermore, addressing the diversity in target scales and complex angles in SAR imagery, FEMSFNet introduces the ssppf-CSP and SdE-Resblock modules. Optimized residual modules are employed to extract more feature information, and multi-scale fusion modules ensure the detection and recognition of targets across various scales, thereby improving the network’s accuracy. Lastly, in the specialized task of single-object detection for aircraft in SAR images, focusing on a singular category output, the class loss function is omitted to simplify computations. 
This approach addresses the imbalance between positive and negative samples—mostly background—by incorporating the focal loss function into the confidence measure of loss calculation. This modification significantly enhances the model’s capability to manage class imbalances and prioritize complex examples. Moreover, to ensure faster convergence and prevent overfitting, L2 regularization is integrated into the loss function, facilitating the weight decay of model parameters.
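The focal-loss modification described above down-weights easy (mostly background) examples so that hard examples dominate the gradient. Below is a minimal sketch of the standard binary focal loss for a single prediction, with commonly used (assumed) values for alpha and gamma; the paper's exact confidence-loss formulation may differ.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p: predicted confidence in (0, 1); y: ground-truth label (0 or 1).
    The (1 - p_t)**gamma factor shrinks the loss of well-classified
    examples, easing the object/background class imbalance.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The L2 regularization mentioned above would add a term of the form lambda times the sum of squared weights to the total loss, yielding weight decay; it is omitted here since it acts on the parameters rather than on individual predictions.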
4. Conclusions
This paper introduces FEMSFNet, an SAR aircraft detection model prioritizing both speed and accuracy. FEMSFNet utilizes the lightweight CSPDarknet53-tiny as its backbone for efficient feature extraction. To maintain a lightweight yet accurate model, we integrate the optimized SE attention module with the ResNet module, forming the SdE-Resblock structure. A novel CSP structure, ssppf-CSP, prevents deviations in the receptive field during training, enhancing global feature fusion. Addressing unique characteristics in SAR aircraft target detection, FEMSFNet optimizes loss functions and employs techniques like learning rate cosine annealing decay and label smoothing to prevent overfitting, improving convergence speed and regression accuracy. For increased sample diversity, FEMSFNet employs multi-faceted image augmentation with techniques like noise addition, mosaic, mixup, rotation, and cropping, enhancing training generalization and robustness. Experiments on the SADD dataset demonstrate FEMSFNet’s effectiveness, surpassing state-of-the-art object-detection algorithms. FEMSFNet exhibits significant improvements compared to the contrasted algorithms in terms of precision rate (92%), recall rate (96%), and F1 score (94%). Notably, it surpasses contrasted algorithms with a maximum increase of 12.2% in precision, 12.9% in recall, and 13.3% in F1 score. Anticipating untapped potential, further optimization, broader expansion, and deeper evolution are expected to propel FEMSFNet to even better results in the future.