Technical Note

Aircraft Target Detection in Low Signal-to-Noise Ratio Visible Remote Sensing Images

Research Center for Space Optical Engineering, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(8), 1971; https://doi.org/10.3390/rs15081971
Submission received: 16 February 2023 / Revised: 2 April 2023 / Accepted: 6 April 2023 / Published: 8 April 2023
(This article belongs to the Special Issue Computer Vision and Image Processing in Remote Sensing)

Abstract

With the increasing demand for wide-area refined detection of aircraft targets, remote sensing cameras have adopted ultra-large area-array detectors as a new imaging mode to obtain broad width remote sensing images (RSIs) with higher resolution. However, this imaging technology introduces new image degradation characteristics, especially weak target energy and a low signal-to-noise ratio (SNR), which seriously affect the target detection capability. To address these issues, we propose an aircraft detection method for RSIs with low SNR, termed L-SNR-YOLO. In particular, the backbone is built by blending a swin-transformer and a convolutional neural network (CNN), which obtains multiscale global and local RSI information to enhance the algorithm's robustness. Moreover, we design an effective feature enhancement (EFE) block integrating the concept of nonlocal means filtering to make the aircraft features salient. In addition, we utilize a novel loss function to optimize the detection accuracy. The experimental results demonstrate that L-SNR-YOLO achieves better detection performance in RSIs than several existing advanced methods.

1. Introduction

Accurate aircraft detection in remote sensing images (RSIs) remains challenging and has considerable research significance for airport surveillance, aviation security, national defense construction, and other applications. With the rapid development of remote sensing techniques, new imaging technologies have emerged to improve the resolution of RSIs and broaden the detection range, enabling the fine detection of small-sized targets represented by aircraft. An imaging system with an ultra-large area-array detector is one of the most effective ways to obtain broad width high-resolution RSIs; its focal plane is typically realized by mechanically splicing multiple small-scale detectors. Consequently, ultra-large area-array detector imaging can achieve high resolution and wide-area coverage simultaneously, which lays the foundation for improving target detection accuracy.
At present, many research institutions in the remote sensing field have carried out research on ultra-large area-array detector imaging technology [1,2,3,4,5,6,7,8]. These studies indicate that this imaging mode introduces new degradation characteristics in practice. Limited by the primary aperture dimension of the space-borne optical payload and by satellite resource constraints, ultra-large area-array detectors adopt small pixels. This structure reduces the energy received by each individual pixel, so the noise components in the remote sensing imaging chain interfere more strongly with the imaging quality, resulting in a significant decrease in the signal-to-noise ratio (SNR) of the imaging results compared with ideal RSIs, as shown in Figure 1.
Meanwhile, in broad width high-resolution RSIs, aircraft appear as small targets with sparse samples relative to the background. In addition, under low SNR conditions, the aircraft saliency is reduced and the target details become blurred by high-frequency noise interference, which greatly increases the difficulty of target detection. Nevertheless, existing remote sensing target detection algorithms are mainly aimed at general targets with sufficient sample sizes and lack a treatment scheme for small targets, represented by aircraft, under the new imaging system with an ultra-large area-array detector.
In response to these problems, we present a low SNR visible RSI aircraft detection network named L-SNR-YOLO. The contributions of this work are summarized as follows:
  • We propose a backbone combining the advantages of swin-transformer and traditional CNN, which extracts multiscale aircraft features and fuses the global–local information interactively.
  • We present an effective feature enhancement (EFE) module, which suppresses the complex background interference in low SNR visible RSIs.
  • We design a new loss function to promote the detection convergence performance.
  • We carry out comparative experiments against a range of existing popular detectors on UCAS_AOD, DOTA, and Google Earth data and verify the feasibility and the applicable SNR range of the proposed architecture.
The remainder of this paper is organized as follows. Section 2 elaborates the development of the aircraft detection technology based on visible RSIs and analyzes the defects of the existing methods. Section 3 describes the details of the proposed network framework. Section 4 analyzes the detection performance of the proposed method compared with the existing typical algorithms. Section 5 summarizes the research and provides the conclusion.

2. Related Work

Currently, there is a lack of a pertinent target detection scheme for the imaging results of a remote sensing camera with an ultra-large area-array detector. Template matching is one of the earliest categories of methods applied to target detection in RSIs. Templates are first generated from the effective features that distinguish the targets from the background; then, the similarities between the template and candidate positions within the entire RSI are calculated to locate the targets. Xu et al. [9] proposed a shape-matching method for low-altitude aircraft detection applying the artificial bee colony algorithm with an edge potential function. However, these approaches are sensitive to changes in target shape and density, so they have great limitations for target detection affected by the multiscale and multidirectional characteristics of aircraft in RSIs.
Machine learning-based methods are significant approaches for target detection in RSIs. The core idea of machine learning is to extract the features of the possible target region and then input them into the corresponding classifier to determine whether the region contains targets [10,11]. Gao et al. [12] utilized the circle-frequency filter to extract the target candidate locations to detect aircrafts in high-resolution RSIs. Liu et al. [13] implemented airplane detection by extracting the rotation invariant feature combined with sparse coding and radial gradient transform. Sun et al. [14] extracted the image space sparse coded bag-of-words (BOW) information as the input of a support vector machine (SVM) classifier to achieve aircraft detection. Cheng et al. [15] trained the target mixed variability component model based on multiscale histogram of oriented gradients (HOG) features and then performed target detection by calculating the mixture model response. Bi et al. [16] proposed the local context deformable part model (DPM) by establishing the local context HOG feature pyramids to achieve target detection on the aircraft candidate regions.
With the increasingly wide application of convolutional neural network (CNN) architecture in the field of image processing [17,18,19,20], the approaches based on deep learning have greatly improved the target detection accuracy compared with the traditional methods. The CNNs train the image data by convolution, nonlinearity mapping, and pooling operations and obtain the feature extractors applied in the target detection task.
Two-stage and one-stage architectures are two common underlying frameworks of CNNs used in RSI target detection. Two-stage algorithms, represented by regions with CNN features (R-CNN) [21], spatial pyramid pooling networks (SPP-Net) [22], Fast R-CNN [23], and Faster R-CNN [24], first generate candidate regions with potential targets on RSIs through CNNs and then perform classification and boundary regression on the candidate regions. In these algorithms, the accuracy and quantity of the candidate regions directly influence the target detection performance [25]. In contrast, one-stage algorithms, including you only look once (YOLO) [27,28,29], the single shot multi-box detector (SSD) [30], and anchor-free algorithms [31,32], omit the step of generating candidate regions and regard classification and localization as a regression problem [26], which reduces the computational cost and increases detection speed. In particular, the YOLO series has attracted much attention in RSI target detection by offering high detection accuracy and speed simultaneously.
Meanwhile, considering the characteristics including the complex background and small target proportion in the RSIs, an increasing amount of research has introduced an attention mechanism [33,34,35,36] into CNNs to obtain target detection results with the interpretability of visual perception. Shi et al. [37] integrated contextual information into feature maps based on a spatial attention mechanism, which effectively improved the aircraft detection accuracy. Chen et al. [38] constructed a cascade attention network by combining self-attention and spatial attention modules to enhance the representation of the target characteristics. In particular, the transformer-based architectures apply the self-attention mechanisms and multilayer perceptron replacing the convolution operator to capture long-range dependencies, which have been gradually utilized for computer vision tasks based on RSIs to improve the algorithm performance [39].
Although there has been much research on RSI target detection, the influence of the new imaging system with an ultra-large area-array detector on image quality restricts the application of existing methods. Concretely, the low SNR of the whole RSI and the blurred target details weaken target saliency, which decreases target detection accuracy. Under these conditions, existing algorithms are difficult to apply directly, and a targeted optimization and improvement scheme is urgently needed.

3. Proposed Method

We integrate the swin-transformer (ST) and YOLOv5 [40] as the backbone to obtain target multiscale information effectively and take global and local features into account. Concurrently, we enhance the target feature to inhibit the complex background interference in RSIs with a low SNR. In addition, we design a new loss function adding the angle between the prediction and the ground truth box as a constraint to promote the detection convergence performance.

3.1. Overall Architecture

The proposed network architecture is shown in Figure 2. Aiming at the aircraft target's directional uncertainty, size inconsistency, and interference from a complex airport background, a backbone network is proposed that takes both global and local features into account in low SNR visible RSIs. Convolution operations capture local interactions among pixels in the RSIs, which gives CNNs a generalizable and efficient framework but limited long-range spatial connectivity. In contrast, vision transformers (ViTs) can extract global interactions through their self-attention mechanisms and multilayer perceptrons. We integrate the swin-transformer (ST) [41] and YOLOv5 in the proposed network, which optimizes the traditional CNN by combining the advantages of ViTs: the former further captures multiscale information, and the latter offers a good balance of detection speed and accuracy.
Specifically, we replace the last cross stage partial layer in the backbone of YOLOv5 with swin-transformer encoder blocks, which consist of window-based and shifted-window multi-head self-attention (MSA) layers, each followed by a multilayer perceptron (MLP) and connected through layer normalization (LN) and residual connections.
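As a concrete illustration, the following PyTorch sketch shows a simplified window/shifted-window encoder pair of the kind described above; the channel width, window size, and feature-map shape are assumptions chosen for illustration rather than the exact configuration used in L-SNR-YOLO.

```python
# Minimal sketch of a swin-style encoder block: LN -> window MSA -> residual,
# then LN -> MLP -> residual; a second block uses "shifted windows" via a cyclic roll.
import torch
import torch.nn as nn


class WindowAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads=4, window=8, shift=0, mlp_ratio=4.0):
        super().__init__()
        self.window, self.shift = window, shift
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)), nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim))

    def forward(self, x):                      # x: (B, C, H, W), H and W divisible by window
        B, C, H, W = x.shape
        w = self.window
        if self.shift:                         # "shifted windows": cyclically roll the feature map
            x = torch.roll(x, shifts=(-self.shift, -self.shift), dims=(2, 3))
        # partition into non-overlapping w x w windows -> token sequences of length w*w
        t = x.view(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(B * (H // w) * (W // w), w * w, C)
        n = self.norm1(t)
        t = t + self.attn(n, n, n)[0]          # window MSA + residual connection
        t = t + self.mlp(self.norm2(t))        # MLP + residual connection
        # merge windows back to the (B, C, H, W) layout
        x = t.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        if self.shift:
            x = torch.roll(x, shifts=(self.shift, self.shift), dims=(2, 3))
        return x


# A window/shifted-window pair that could stand in for the last C3 stage (width assumed).
swin_stage = nn.Sequential(WindowAttentionBlock(256, shift=0),
                           WindowAttentionBlock(256, shift=4))
feats = torch.randn(1, 256, 32, 32)            # a hypothetical deep feature map
out = swin_stage(feats)                        # same shape: (1, 256, 32, 32)
```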
Meanwhile, facing the target saliency reduction in RSIs with low SNR, we design an effective feature enhancement module to suppress the interference of high frequency noise on target details, which is elaborated in Section 3.2.

3.2. Effective Feature Enhancement Module

In order to enhance the learning ability, YOLOv5 adopts the simplified cross stage partial network (C3) for feature extraction. This structure splits the gradient flow and merges the outputs of different network paths to avoid the gradient vanishing caused by deepening the network [42]. However, the deep features of an RSI with a low SNR contain a large amount of high-frequency noise caused by the small-pixel structure of the ultra-large area-array detector, which weakens the network's ability to extract the target's high-frequency features. Therefore, we design the EFE block based on C3 in combination with the idea of nonlocal means filtering.
Nonlocal means filtering can make full use of the feature map information and preserve detailed features to the maximum extent while suppressing the noise. The core idea is to compute the output value at a given pixel as the weighted sum of all pixels in the search window, which can be expressed as
$$y_i = \sum_{j} w(x_i, x_j)\, I(x_j),$$
where $x_i$ and $x_j$ are pixels in the feature map, $I(x_j)$ is the pixel value of $x_j$, $w(x_i, x_j)$ represents the weight, and $y_i$ is the response value at position $i$ in the filtered feature map. In addition, based on the principle of cosine similarity, we choose the embedded Gaussian function to calculate the similarity of $x_i$ and $x_j$ in a feature map. It can be calculated as
$$w(x_i, x_j) = \frac{\exp\left((\Psi x_i)^{T}(\mathrm{Z} x_j)\right)}{\sum_{j}\exp\left((\Psi x_i)^{T}(\mathrm{Z} x_j)\right)} = \mathrm{softmax}\left((\Psi x_i)^{T}(\mathrm{Z} x_j)\right),$$
where $\Psi$ and $\mathrm{Z}$ are linear mapping matrices.
Based on the above formulas, the effective feature enhancement module is illustrated in Figure 3.
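A minimal PyTorch sketch of this nonlocal enhancement idea is given below; the channel reduction, 1 × 1 projections, and residual fusion are illustrative assumptions and not the exact EFE layout of Figure 3.

```python
# Nonlocal enhancement sketch: every position is re-estimated as a softmax-weighted
# sum of all positions, with the embedded Gaussian similarity computed from two
# learned linear mappings (Psi and Z in the formulas above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalEnhance(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.psi = nn.Conv2d(channels, reduced, 1)     # Psi: embedding of x_i
        self.zeta = nn.Conv2d(channels, reduced, 1)    # Z:   embedding of x_j
        self.g = nn.Conv2d(channels, reduced, 1)       # value projection of I(x_j)
        self.out = nn.Conv2d(reduced, channels, 1)     # map back to the original width

    def forward(self, x):                              # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = self.psi(x).flatten(2).transpose(1, 2)     # (B, HW, C')
        k = self.zeta(x).flatten(2)                    # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)       # (B, HW, C')
        w = F.softmax(q @ k, dim=-1)                   # softmax((Psi x_i)^T (Z x_j)) over j
        y = (w @ v).transpose(1, 2).reshape(B, -1, H, W)   # weighted sum over all positions j
        return x + self.out(y)                         # residual fusion keeps the original details


feats = torch.randn(1, 128, 40, 40)                    # hypothetical backbone feature map
enhanced = NonLocalEnhance(128)(feats)                 # same shape, noise-suppressed response
```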

3.3. Loss Function

We redesign the loss function in order to improve the target detection accuracy in RSIs with low SNR.
Following the YOLO family, the total loss jointly comprises a confidence loss, a category loss, and a location loss. First, we add the angle between the prediction box and the ground truth box as a constraint to design a new angular location loss. The distance loss $\sigma_{dis}$ is defined as
$$\sigma_{dis} = \left[1 - e^{-(2-\eta)\rho_x}\right] + \left[1 - e^{-(2-\eta)\rho_y}\right]$$
$$\rho_x = \left[\frac{x_{ct} - x_{ct}^{gt}}{\max(x_r, x_r^{gt}) - \min(x_l, x_l^{gt})}\right]^2$$
$$\rho_y = \left[\frac{y_{ct} - y_{ct}^{gt}}{\max(y_r, y_r^{gt}) - \min(y_l, y_l^{gt})}\right]^2,$$
where $(x_l, y_l)$, $(x_r, y_r)$, and $(x_{ct}, y_{ct})$ respectively represent the coordinates of the left corner, right corner, and center point of the prediction box. Analogously, $(x_l^{gt}, y_l^{gt})$, $(x_r^{gt}, y_r^{gt})$, and $(x_{ct}^{gt}, y_{ct}^{gt})$ are the coordinates of the left corner, right corner, and center point of the ground truth box.
The angle parameter can be computed as
$$\eta = \sin\left(2\arcsin\left(\frac{d_h}{\sqrt{d_h^2 + d_w^2}}\right)\right),$$
where $d_w$ and $d_h$ are the distances along the $x$ and $y$ directions between the center points of the prediction box and the ground truth box.
Meanwhile, we define a new angular intersection over union (AGIoU) as
$$AGIoU = \frac{S \cap S^{gt} - \left(w_{dis} \times h_{dis} - S \cup S^{gt}\right)}{w_{dis} \times h_{dis}}$$
$$w_{dis} = \max(x_r, x_r^{gt}) - \min(x_l, x_l^{gt})$$
$$h_{dis} = \max(y_r, y_r^{gt}) - \min(y_l, y_l^{gt}),$$
where $S$ and $S^{gt}$ are the areas of the prediction box and the ground truth box, respectively, $S \cap S^{gt}$ and $S \cup S^{gt}$ denote their intersection and union areas, and $w_{dis} \times h_{dis}$ is the area of the minimum enclosing box.
Then, the angular location loss ( L a g l o c ) can be written as
$$L_{agloc} = 1 - AGIoU + \sigma_{dis}.$$
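The following NumPy sketch illustrates how the angular location loss above can be evaluated for a single axis-aligned box pair; the box coordinates are made-up examples, and the batched broadcasting used in real training code is omitted.

```python
# Sketch of sigma_dis, eta, AGIoU, and L_agloc for one (prediction, ground truth) pair,
# with boxes given as (x_l, y_l, x_r, y_r).
import numpy as np


def angular_location_loss(pred, gt, eps=1e-9):
    xl, yl, xr, yr = pred
    xlg, ylg, xrg, yrg = gt
    # center points and center-to-center distances d_w, d_h
    xct, yct = (xl + xr) / 2, (yl + yr) / 2
    xctg, yctg = (xlg + xrg) / 2, (ylg + yrg) / 2
    dw, dh = abs(xct - xctg), abs(yct - yctg)
    # angle parameter eta
    eta = np.sin(2 * np.arcsin(dh / (np.hypot(dh, dw) + eps)))
    # enclosing-box sides w_dis, h_dis and normalized center offsets rho_x, rho_y
    w_dis = max(xr, xrg) - min(xl, xlg)
    h_dis = max(yr, yrg) - min(yl, ylg)
    rho_x = ((xct - xctg) / (w_dis + eps)) ** 2
    rho_y = ((yct - yctg) / (h_dis + eps)) ** 2
    # distance loss sigma_dis
    sigma_dis = (1 - np.exp(-(2 - eta) * rho_x)) + (1 - np.exp(-(2 - eta) * rho_y))
    # AGIoU from intersection, union, and the enclosing-box penalty
    inter = max(0.0, min(xr, xrg) - max(xl, xlg)) * max(0.0, min(yr, yrg) - max(yl, ylg))
    union = (xr - xl) * (yr - yl) + (xrg - xlg) * (yrg - ylg) - inter
    agiou = (inter - (w_dis * h_dis - union)) / (w_dis * h_dis + eps)
    # angular location loss L_agloc
    return 1 - agiou + sigma_dis


loss = angular_location_loss(pred=(10, 10, 50, 40), gt=(12, 8, 52, 44))
print(round(float(loss), 4))
```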
In parallel, the confidence loss $L_{obj}$ and the category loss $L_{aircraft}$ are described by the binary cross-entropy loss and the cross-entropy loss, respectively, as
$$L_{obj} = -\sum_{i=0}^{M^2}\sum_{j=0}^{B} l_{ij}^{obj\_T}\left[C_i \ln(C_i^{gt}) + (1 - C_i)\ln(1 - C_i^{gt})\right] - \sum_{i=0}^{M^2}\sum_{j=0}^{B} l_{ij}^{obj\_F}\left[C_i \ln(C_i^{gt}) + (1 - C_i)\ln(1 - C_i^{gt})\right]$$
$$L_{aircraft} = -\sum_{i=0}^{M^2} l_{ij}^{obj\_T}\left(p_i^{gt\_aircraft}\log p_{i\_aircraft}\right),$$
where each input RSI is split into $M^2$ grids, each of which is assigned $B$ bounding boxes, $l_{ij}^{obj}$ indicates whether the $j$th bounding box in the $i$th grid is responsible for predicting a target, and $C_i^{gt}$ and $C_i$ are the annotated and predicted confidence values of the bounding box in the $i$th grid containing a target, respectively. $p_i^{gt}$ and $p_i$ are the annotated and predicted probabilities of the prediction result belonging to an aircraft.
Furthermore, we set weighting factors to allocate the significance of each loss term reasonably. The loss function of L-SNR-YOLO is obtained as
$$L_{L\text{-}SNR\text{-}YOLO} = \lambda_{obj} L_{obj} + \lambda_{aircraft} L_{aircraft} + \lambda_{agloc} L_{agloc},$$
where $\lambda_{obj}$, $\lambda_{aircraft}$, and $\lambda_{agloc}$ are the weighting factors. After testing and comparison, the most satisfactory detection results were obtained when $(\lambda_{obj}, \lambda_{aircraft}, \lambda_{agloc})$ were set to (1.0, 0.5, 0.05).
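As a small sketch of how the total loss above is assembled, the weighted sum with the reported factors could be written as follows; the individual loss values here are placeholders standing in for the batch-averaged terms computed during training.

```python
# Weighted combination of the confidence, category, and angular location losses.
def total_loss(l_obj, l_aircraft, l_agloc, lambdas=(1.0, 0.5, 0.05)):
    w_obj, w_air, w_loc = lambdas
    return w_obj * l_obj + w_air * l_aircraft + w_loc * l_agloc


print(total_loss(l_obj=0.42, l_aircraft=0.18, l_agloc=0.95))   # 0.5575 for these placeholders
```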

4. Discussion

4.1. Dataset

We carried out experiments based on RSIs from the University of Chinese Academy of Sciences high-resolution aerial object detection (UCAS_AOD) dataset [43], the dataset for object detection in aerial images (DOTA) [44], and Google Earth to analyze the application performance of the proposed algorithm. The dataset covers multi-sized aircraft targets in various parking directions along with complex backgrounds such as airstrips and airfield construction. Considering the new imaging characteristics caused by the ultra-large area-array detector integrated with small pixels in practical applications, we obtained degraded RSIs by simulating the optical remote sensing imaging chain [45], including atmospheric transmission, the optical imaging system, platform vibration, the detector, and other links, as shown in Figure 4. It is noteworthy that the RSI quality was severely degraded by the combined influence of dark current noise, shot noise, readout noise, and quantization noise generated by the ultra-large area-array detector and its electronic system.
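To make the degradation model concrete, the following NumPy sketch shows one way such low-SNR counterparts can be synthesized from clean RSIs; the noise magnitudes are illustrative assumptions rather than the calibrated parameters of the imaging-chain simulation in [45].

```python
# Rough simulation of detector noise sources: shot noise on the signal, dark-current
# noise, readout noise, and quantization back to an 8-bit output image.
import numpy as np


def degrade(img, full_well=2000.0, dark_e=30.0, read_e=12.0, bits=8, rng=None):
    rng = rng or np.random.default_rng(0)
    signal_e = img.astype(np.float64) / 255.0 * full_well      # gray level -> photoelectrons
    noisy_e = rng.poisson(signal_e)                            # shot noise on the signal
    noisy_e = noisy_e + rng.poisson(dark_e, img.shape)         # dark-current noise
    noisy_e = noisy_e + rng.normal(0.0, read_e, img.shape)     # readout noise
    out = np.clip(noisy_e / full_well, 0.0, 1.0)
    levels = 2 ** bits - 1
    return np.round(out * levels).astype(np.uint8)             # quantization to 8-bit output


clean = (np.random.default_rng(1).random((640, 640)) * 255).astype(np.uint8)
low_snr = degrade(clean)                                       # degraded counterpart of the RSI
```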
Finally, we acquired 2439 RSIs containing aircrafts with a resolution of 640 × 640 pixels and randomly selected 80% of them as the training set (1951 images) and the rest as the testing set (488 images). Figure 5 shows some image slices of the employed dataset.

4.2. Evaluation Metrics

We selected the average precision (AP), defined as the area under the precision–recall curve, to evaluate the detection algorithm performance.
$$AP = \int_{0}^{1} precision(recall)\, d(recall)$$
$$precision = \frac{TP}{TP + FP}$$
$$recall = \frac{TP}{TP + FN},$$
where $TP$ denotes the correctly predicted positive samples (true positives), and $FP$ and $FN$ denote the false positives and false negatives produced by the detection algorithm, respectively.
Furthermore, the interference of the complex backgrounds with detection was enhanced in the low SNR RSIs, which increased the target misjudgment probability of the algorithm. Therefore, we introduced the false alarm rate (FAR) to further evaluate the algorithm detection effect.
$$FAR = \frac{\text{number of detected false alarms}}{\text{number of detected candidates}}$$
Additionally, we calculated the SNR of the target region (TR-SNR) to measure the quality of the RSI imaging by the ultra-large area-array detector so as to determine the algorithm’s applicable range.
$$TR\_SNR = 10\lg\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} g(i,j)^2}{\sum_{i=1}^{M}\sum_{j=1}^{N} \left|g(i,j) - f(i,j)\right|^2},$$
where $M$ and $N$ are the length and width of the RSI target region, respectively, and $g(i,j)$ and $f(i,j)$ represent the gray values at $(i,j)$ of the original image and the processed image, respectively.
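For reference, a minimal NumPy sketch of the evaluation quantities defined above is given below: precision, recall, and FAR from TP/FP/FN counts, plus TR-SNR computed between an original and a processed target patch. The counts and patches are illustrative placeholders.

```python
# Precision, recall, FAR, and target-region SNR as defined in this subsection.
import numpy as np


def precision_recall_far(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    far = fp / (tp + fp)              # false alarms over all detected candidates
    return precision, recall, far


def tr_snr(g, f):
    g, f = g.astype(np.float64), f.astype(np.float64)
    return 10.0 * np.log10((g ** 2).sum() / (np.abs(g - f) ** 2).sum())


p, r, far = precision_recall_far(tp=164, fp=42, fn=36)
patch_clean = np.full((64, 64), 120.0)
patch_noisy = patch_clean + np.random.default_rng(0).normal(0, 25, (64, 64))
print(round(p, 3), round(r, 3), round(far, 3), round(tr_snr(patch_clean, patch_noisy), 2))
```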

4.3. Results and Analysis

The experiments were carried out on a workstation with an NVIDIA GeForce RTX 3080 Ti GPU (12 GB VRAM). Before network training, we applied the K-means clustering method to generate the anchor box dimensions based on the prior geometric features of the aircraft in the dataset, as sketched below. In addition, we used mosaic data augmentation to combine nine training samples into one, enriching the dataset while reducing GPU memory consumption.
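For illustration, a plain K-means anchor-generation routine of the kind referred to above might look like the following sketch; the box sizes are made-up examples, and YOLO pipelines commonly use an IoU-based distance rather than the Euclidean one shown here.

```python
# K-means clustering of ground-truth box sizes (width, height) into anchor dimensions.
import numpy as np


def kmeans_anchors(wh, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=-1)  # (N, k) distances
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = wh[assign == j].mean(axis=0)                  # move center to cluster mean
    return centers[np.argsort(centers.prod(axis=1))]                       # sort anchors by area


boxes_wh = np.array([[22, 20], [28, 25], [35, 31], [48, 44], [60, 52],
                     [75, 66], [18, 16], [40, 36], [90, 80], [55, 50]], dtype=float)
print(kmeans_anchors(boxes_wh, k=3))
```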
In the training parameter setting, we set the number of epochs to 220 and the batch size to eight. Combined with the aircraft target characteristics in our dataset, we used stochastic gradient descent (SGD) to minimize the loss with a momentum of 0.93 and a weight decay of 0.0005 and used cosine annealing with a periodic learning rate of 0.2 to adjust the learning rate. We conducted comparative experiments on the original RSIs and the degraded RSIs with low SNR to verify the effectiveness of the proposed method: the degraded RSIs were tested with the model trained on degraded RSIs, and the original RSIs with the model trained on original RSIs. Figure 6 shows several examples of the detection results under the condition of IoU ≥ 0.5 to illustrate this intuitively; the confidence labels in the detection results were removed for a concise and distinct display. Overall, the detection performance on the degraded RSIs with low SNR dropped sharply compared with that on the original RSIs, indicating that the degradation characteristics caused by the ultra-large area-array detector seriously interfered with target detection. In the original RSIs (the first, third, and fifth rows in Figure 6), where the targets could be detected, the prediction boxes of our proposed method were more accurate than those of the YOLOv5 baseline. In the degraded RSIs with low SNR (the second, fourth, and sixth rows in Figure 6), our method effectively avoided false alarms in the first scene, whereas the YOLOv5 network misjudged the airdrome facility as a target. For the second scene, our approach was capable of detecting targets with blurred details, while the YOLOv5 results contained missed detections. Moreover, the prediction boxes of our algorithm were more accurate in the third scene. Consequently, our method obtained more satisfactory detection results.
Meanwhile, we analyzed the unsatisfactory detection results of the proposed algorithm in Figure 7. Our approach detected the target that the YOLOv5 baseline failed to detect, improving the detection ability of the algorithm in the degraded RSIs. However, owing to the excessively small scale of some targets and texture features similar to the noisy background, our method did not detect all targets in this scene.
We performed ablation experiments on the dataset to illustrate the effects of the proposed modules and the new loss function. Furthermore, we compared the proposed algorithm with several advanced methods, including Faster R-CNN [24], YOLOv3 [28], YOLOv4 [29], and RetinaNet [46]. Table 1 and Table 2 show the detection results for the multi-scenario RSIs in the dataset. The proposed method outperformed the other compared approaches in detection accuracy. In particular, compared with YOLOv5, the proposed algorithm achieved better detection results, with the AP value increasing by 3.2% and the FAR value decreasing by 3.0%. When constrained by the new loss function, the performance was further improved: the AP value increased by 3.6% and the FAR value decreased by 3.9%.
In addition, in order to analyze the applicability of the proposed framework, we plotted the AP and FAR variation curves of our detection method for RSIs with different TR-SNR in Figure 8. According to existing prior knowledge, the TR-SNR of RSIs imaged by an ultra-large area-array detector varies between 7 dB and 15 dB. When the TR-SNR was between 10 and 15 dB, our algorithm effectively suppressed the influence of the SNR reduction on the detection performance. When the TR-SNR was lower than 5 dB, the AP of the algorithm was approximately equal to the FAR, indicating that the detection results lost their reference significance. Accordingly, the TR-SNR of the RSIs should be considered in the imaging system design, ensuring a value above 10 dB for subsequent image interpretation.

5. Conclusions

In this paper, facing the space application of the new imaging mode with ultra-large area-array detectors integrated with small pixels, the L-SNR-YOLO network is proposed for aircraft detection in degraded RSIs, considering the new imaging characteristics. In contrast to existing aircraft detection methods, the hybrid backbone is structured on a swin-transformer and a CNN, which extracts multiscale global and local RSI features to improve the algorithm's robustness effectively. Moreover, an effective feature enhancement module is developed based on the idea of nonlocal means filtering, which suppresses the background noise at the feature-map level to enhance the saliency of the aircraft features. Additionally, a new loss function is presented to optimize the detection accuracy in RSIs with low SNR. We conducted experiments on ordinary RSIs and RSIs with low SNR to verify the impact of the new imaging degradation characteristics caused by the ultra-large area-array detector on aircraft detection accuracy. The experimental results indicate that the detection precision of the L-SNR-YOLO network reaches 82.2% in RSIs with low SNR, outperforming the other benchmark frameworks in accuracy. In addition, the proposed architecture can be applied to other targets under this new imaging mode and can be extended to target detection in images degraded by harsh weather conditions or limited observation resources. In further research, we plan to establish an aircraft detection method under conditions of partial target occlusion and other complex background distractions.

Author Contributions

Conceptualization, R.N.; methodology, R.N.; software, R.N.; validation, S.J.; formal analysis, R.N.; investigation, L.Y.; resources, X.Z.; data curation, X.Z.; writing—original draft preparation, R.N.; writing—review and editing, R.N. and S.J.; supervision, J.G.; project administration, W.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers: 61975043 and 62101160.

Data Availability Statement

The datasets for this research were released in September 2015 and November 2017, which are available at https://hyper.ai/datasets/5419 and https://captain-whu.github.io/DOTA/dataset.html, respectively, accessed on 12 February 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kohley, R.; Gare, P.; Vetel, C.; Marchais, D.; Chassat, F. Gaia's FPA: Sampling the sky in silicon. In Proceedings of the Conference on Space Telescopes and Instrumentation—Optical, Infrared, and Millimeter Wave, Amsterdam, The Netherlands, 1–6 July 2012.
  2. Ivezic, Z.; Connolly, A.J.; Juric, M. Everything we'd like to do with LSST data, but we don't know (yet) how. In Proceedings of the 325th Symposium of the International Astronomical Union (IAU), Sorrento, Italy, 19–25 October 2016; pp. 93–102.
  3. O'Connor, P. Uniformity and stability of the LSST focal plane. J. Astron. Telesc. Instrum. Syst. 2019, 1, 41508.
  4. Romeo, S.; Cosentino, A.; Giani, F.; Mastrantoni, G.; Mazzanti, P. Combining Ground Based Remote Sensing Tools for Rockfalls Assessment and Monitoring: The Poggio Baldi Landslide Natural Laboratory. Sensors 2021, 21, 2632.
  5. Hu, J.; Wang, X.; Ruan, N. A Novel Optical System of On-Axis Three Mirror Based on Micron-Scale Detector. In Proceedings of the 5th International Symposium of Space Optical Instruments and Applications, Sino Holland Space Opt Instruments Joint Lab, Beijing, China, 5–7 September 2018; pp. 159–173.
  6. Sun, G.; Li, L. A new method of focal plane mosaic for space remote sensing camera. In Proceedings of the International Symposium on Photoelectronic Detection and Imaging 2011—Advances in Imaging Detectors and Applications, Beijing, China, 24–26 May 2011.
  7. Han, C.Y. Recent earth imaging commercial satellites with high resolutions. Chin. J. Opt. Appl. Opt. 2010, 3, 201–208.
  8. Zeng, D.; Du, X. Influence of detector's pixel size on performance of optical detection system. Zhongguo Kongjian Kexue Jishu/Chin. Space Sci. Technol. 2011, 31, 51–55.
  9. Xu, C.; Duan, H. Artificial bee colony (ABC) optimized edge potential function (EPF) approach to target recognition for low-altitude aircraft. Pattern Recognit. Lett. 2010, 31, 1759–1772.
  10. Alganci, U.; Soydas, M.; Sertel, E. Comparative Research on Deep Learning Approaches for Airplane Detection from Very High-Resolution Satellite Images. Remote Sens. 2020, 12, 458.
  11. Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
  12. Gao, F.; Xu, Q.Z.; Li, B. Aircraft Detection from VHR Images Based on Circle-Frequency Filter and Multilevel Features. Sci. World J. 2013, 2013, 1–7.
  13. Liu, L.; Shi, Z. Airplane detection based on rotation invariant and sparse coding in remote sensing images. Optik 2014, 125, 5327–5333.
  14. Sun, H.; Sun, X.; Wang, H.; Li, Y.; Li, X. Automatic Target Detection in High-Resolution Remote Sensing Images Using Spatial Sparse Coding Bag-of-Words Model. IEEE Geosci. Remote Sens. Lett. 2012, 9, 109–113.
  15. Cheng, G.; Han, J.; Guo, L.; Qian, X.; Zhou, P.; Yao, X.; Hu, X. Object detection in remote sensing imagery using a discriminatively trained mixture model. ISPRS J. Photogramm. Remote Sens. 2013, 85, 32–43.
  16. Bi, F.; Yang, Z.; Lei, M.; Bian, M. Airport aircraft detection based on local context DPM in remote sensing images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Yokohama, Japan, 28 July–2 August 2019; pp. 1362–1365.
  17. Shi, T.; Gong, J.; Hu, J.; Zhi, X.; Zhang, W.; Zhang, Y.; Zhang, P.; Bao, G. Feature-Enhanced CenterNet for Small Object Detection in Remote Sensing Images. Remote Sens. 2022, 14, 5488.
  18. Hu, J.; Zhi, X.; Shi, T.; Zhang, W.; Cui, Y.; Zhao, S. PAG-YOLO: A Portable Attention-Guided YOLO Network for Small Ship Detection. Remote Sens. 2021, 13, 3059.
  19. Yu, L.; Zhi, X.; Hu, J.; Jiang, S.; Zhang, W.; Chen, W. Small-Sized Vehicle Detection in Remote Sensing Image Based on Keypoint Detection. Remote Sens. 2021, 13, 4442.
  20. Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
  21. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 346–361.
  23. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 7–13 December 2015; pp. 1440–1448.
  24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  25. Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
  26. Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You Only Look One-level Feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13034–13043.
  27. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 27–30 June 2016; pp. 779–788.
  28. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
  29. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  30. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 14 September 2016; pp. 21–37.
  31. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. Int. J. Comput. Vis. 2020, 128, 642–656.
  32. Zhou, X.; Zhuo, J.; Krahenbuhl, P. Bottom-up Object Detection by Grouping Extreme and Center Points. In Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 850–859.
  33. Shi, L.; Kuang, L.; Xu, X.; Pan, B.; Shi, Z. CANet: Centerness-Aware Network for Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13.
  34. Zhao, Z.; Hu, D.; Wang, H.; Yu, X. Convolutional Transformer Network for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  35. Liang, Y.; Zhang, P.; Mei, Y.; Wang, T. PMACNet: Parallel Multiscale Attention Constraint Network for Pan-Sharpening. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  36. Deng, P.; Xu, K.; Huang, H. When CNNs Meet Vision Transformer: A Joint Framework for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
  37. Shi, G.; Zhang, J.; Liu, J.; Zhang, C.; Zhou, C.; Yang, S. Global Context-Augmented Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10604–10617.
  38. Chen, L.; Liu, C.; Chang, F.; Li, S.; Nie, Z. Adaptive multi-level feature fusion and attention-based network for arbitrary-oriented object detection in remote sensing imagery. Neurocomputing 2021, 451, 67–80.
  39. Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images. Remote Sens. 2021, 13, 3065.
  40. Ultralytics. YOLOv5 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 19 December 2022).
  41. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 9992–10002.
  42. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.H. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 14–19 June 2020; pp. 1571–1580.
  43. Zhu, H.; Chen, X.; Dai, W.; Fu, K.; Ye, Q.; Jiao, J. Orientation Robust Object Detection in Aerial Images Using Deep Convolutional Neural Network. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 3735–3739.
  44. Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-scale Dataset for Object Detection in Aerial Images. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
  45. Fiete, R.D. Image chain analysis for space imaging systems. J. Imaging Sci. Technol. 2007, 51, 103–109.
  46. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
Figure 1. Comparison of an ordinary RSI and a low SNR RSI: (a) the ordinary RSI, (b) the low SNR RSI, (c) histogram of the ordinary RSI, (d) histogram of the low SNR RSI.
Figure 2. Overall architecture of the proposed method. (a) Composition of the L-SNR-YOLO network. (b) Modules used in the L-SNR-YOLO network.
Figure 3. The network structure diagram of effective feature enhancement.
Figure 4. Optical remote sensing chain imaging simulation process.
Figure 5. Examples of image slices from the dataset. Slices contain aircraft and complex airport backgrounds.
Figure 6. Comparison experiment results of our method with the YOLOv5 baseline: (a) input RSIs (partial areas marked by the white boxes), (b) partially enlarged RSIs with ground truth (represented by the yellow boxes), (c) YOLOv5 results (represented by the blue boxes), and (d) L-SNR-YOLO results (represented by the red boxes).
Figure 7. The unsatisfactory detection result: (a) input RSI (partial area marked by the white box), (b) partially enlarged RSIs with ground truth (represented by the yellow boxes), (c) YOLOv5 results (represented by the blue boxes), and (d) L-SNR-YOLO results (represented by the red boxes).
Figure 8. Variation curves of the AP and FAR.
Table 1. Ablation experiments of our proposed method. The bold highlights the best performance.

Method | Loss Function | AP | FAR
YOLOv5 [40] | The original function | 78.6% | 29.4%
YOLOv5+ST | The original function | 80.3% | 29.0%
YOLOv5+NLC3 | The original function | 81.0% | 26.7%
YOLOv5+ST+EFE | The original function | 81.8% | 26.4%
YOLOv5+ST+EFE | The proposed function | 82.2% | 25.5%
Table 2. Results of the comparative experiment.

Methods | AP (Low SNR) | FAR (Low SNR) | AP (Original) | FAR (Original)
Faster R-CNN [24] | 54.9% | 49.8% | 65.8% | 25.4%
YOLOv3 [28] | 62.5% | 44.1% | 70.2% | 30.7%
YOLOv4 [29] | 76.4% | 32.1% | 81.6% | 29.2%
RetinaNet [46] | 63.3% | 42.8% | 71.0% | 33.2%
Proposed method | 82.2% | 25.5% | 86.0% | 25.2%
