High-Speed Spatial–Temporal Saliency Model: A Novel Detection Method for Infrared Small Moving Targets Based on a Vectorized Guided Filter

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
3 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(10), 1685; https://doi.org/10.3390/rs16101685
Submission received: 27 March 2024 / Revised: 4 May 2024 / Accepted: 8 May 2024 / Published: 9 May 2024

Abstract

Infrared (IR) imaging-based detection systems are of vital significance in the domains of early warning and security, necessitating a high level of precision and efficiency in infrared small moving target detection. IR targets often appear dim and small relative to the background and are easily buried by noise, making them difficult to detect. A novel high-speed spatial–temporal saliency model (HS-STSM) based on a guided filter (GF) is proposed, which innovatively introduces the GF into IR target detection to extract the local anisotropy saliency in the spatial domain and substantially suppresses the background region as well as the bright clutter false alarms present in the background. Moreover, the proposed model extracts the motion saliency of the target in the temporal domain through vectorization of IR image sequences. Additionally, the proposed model significantly improves the detection efficiency through a vectorized filtering process and effectively suppresses edge components in the background by integrating a prior weight. Experiments conducted on five real infrared image sequences demonstrate the superior performance of the model compared to existing algorithms in terms of the detection rate, noise suppression, real-time processing, and robustness to the background.

1. Introduction

Infrared imaging-based detection systems transform the infrared radiation emitted by an object into observable infrared images, which enables target detection and tracking. These systems currently serve as crucial tools in various fields such as early warning and security. However, infrared small moving targets, such as small aircraft and unmanned aerial vehicles [1], are typically observed at a considerable distance, resulting in images with small, dim targets that often occupy only a few pixels and lack distinctive features. Additionally, the complex imaging environment often includes areas with high radiation characteristics, leading to the presence of various forms of bright noise in the background [2]. As a result, IR targets often appear dim and small relative to the background and are easily buried by noise, making them difficult to detect. Moreover, real-time detection of infrared targets demands high-speed detection methods. Given these challenges, the demand for enhanced detection accuracy and timeliness has increased significantly, and enhancing the stability and real-time performance of infrared small moving target detection in complex backgrounds holds significant practical importance.

1.1. Related Works

Conventional spatial filtering methods for IR target detection, including the top-hat transform filter [3], max-mean/max-median filters [4], and bilateral filter [5], leverage the discernible features of the target and background in the image sequence. Although these techniques are computationally efficient and straightforward to implement, they tend to be less effective in complex scenes.
Based on low-rank sparse decomposition (LRSD), various methods have been developed, such as the infrared patch-image model (IPI) by Gao et al. [6] and the reweighted infrared patch-tensor model (RIPT) by Dai et al. [7]. Zhang et al. [8] introduced corner-point and edge prior weight information, approximated the rank of the matrix by the partial sum of tensor singular values, and proposed the partial sum of the tensor nuclear norm (PSTNN) model. Hu et al. [9] presented the multi-frame spatial–temporal patch-tensor model (MFSTPT), which utilizes rank approximation with Laplace operator constraints.
Human visual system (HVS)-based methods perform target detection by extracting salient features such as local contrast. The local contrast method (LCM), proposed by Chen et al. [10], was one of the first methods to use this approach for IR target detection. Wei et al. [11] introduced the multiscale patch-based contrast measure (MPCM), which builds upon the LCM by adding a multiscale contrast calculation, leading to improved detection performance. Shi et al. [12] introduced a high-boost-based multiscale local contrast measure (HB-MLCM) as a refinement to LCM. Additionally, Han et al. [13] proposed a multiscale three-layer local contrast measurement (TLLCM), while Cui et al. [14] proposed a weighted three-layer window local contrast method (WTLLCM). To capture time-domain information between frames, Deng et al. [15] developed the spatial–temporal local contrast filter (STLCF). Du and Hamdulla [16] also introduced a novel spatial–temporal local difference measure (STLDM) algorithm for the detection of moving targets in IR image sequences. Moreover, Ma et al. [17] proposed a method based on a coarse-to-fine structure (MFCS). HVS-based detection methods have gained widespread use due to their minimal reliance on prior information and efficient computational processing. However, there is still room to improve both the accuracy of detecting small moving IR targets in complex scenes and the robustness to noise.
In recent years, neural networks have emerged as a viable approach for detecting IR targets. Wang et al. [18] utilized convolutional neural networks (CNNs) for extracting target features. Furthermore, the YOLO series, introduced by Redmon et al. [19], and generative adversarial networks (GANs), proposed by Goodfellow et al. [20], have been the subject of extensive research. Wang et al. [21] introduced the asymmetric patch attention fusion network (APAFNet), aimed at fusing high-level semantics and low-level spatial details. Nevertheless, the limited resolution of IR images and the lack of adequate training samples pose a challenge, as they restrict the extraction of target features. Consequently, there is still a need to enhance the detection efficiency and robustness of these methods.

1.2. Motivation

Most of the current methods for infrared small moving target detection process single-frame images, focusing only on the spatial domain features of the target. Therefore, these methods are most effective in scenarios where the background is uncomplicated and the target stands out prominently in comparison to the background. In situations where targets have low contrast, small size, and complex backgrounds, ensuring the accuracy and robustness of detection results can be challenging. Effectively utilizing the inter-frame information of IR image sequences to extract temporal-domain features of the target becomes crucial for enhancing detection accuracy and reducing false alarms.
In addition, the current filtering methods and HVS-based methods are prone to miss detection and false alarms, particularly in scenarios with complex scenes. Therefore, algorithms for extracting spatial-domain features of the target based on local saliency require further enhancement. This is especially relevant in cases where strong edges exist in the background and point noise resembles the shape of the target. It becomes essential to integrate the suppression of background edge components into the spatial filtering process.
Finally, in practical applications, the speed of detection is as crucial as its accuracy. While single-frame-based methods can often guarantee high detection efficiency, their accuracy and robustness need to be enhanced. On the other hand, LRSD-based and multi-frame-based methods often achieve higher detection accuracy, but at the cost of efficiency. Therefore, it is essential to find a balance between detection accuracy and speed and to maintain high-precision detection while enhancing efficiency.
To address the aforementioned issues, a fast and robust method for detecting small moving IR targets is introduced in this paper. The main contributions are as follows:
(1)
A novel high-speed spatial–temporal saliency model (HS-STSM) is proposed, which simultaneously extracts the temporal saliency of the target from the inter-frame information of IR image sequences and the local anisotropy saliency in the spatial domain.
(2)
To enhance the extraction of spatial saliency of the target, this paper proposes a novel fast spatial filtering algorithm via a guided filter. This approach is combined with edge suppression using local prior weights, which serves to further reduce background residuals and highlight the target.
(3)
Achieving real-time performance in IR target detection is crucial. To address this, the vectorization of IR image sequences is introduced into the filtering process. This method significantly improves the speed of multi-frame detection and the extraction of the temporal saliency of the IR target, resulting in superior detection efficiency while maintaining high levels of detection accuracy.
(4)
Both qualitative and quantitative experimental results on five real sequences demonstrate that our model performs advanced, fast, and robust detection of IR small moving targets.
The rest of this paper is structured as follows. Section 2 presents the proposed model, covering the extraction of temporal and spatial saliency, as well as the filtering process. Additionally, a detailed explanation of the computation of local prior weights is provided. Section 3 presents the experimental results and analysis and offers both qualitative and quantitative evaluations. Section 4 is the discussion of the performance of the proposed method and provides insights into the effectiveness and limitations of the model. Finally, Section 5 presents the conclusion of this paper.

2. Proposed Model

Inspired by the guided filter (GF) introduced by He et al. [22], the edge preservation property is capitalized on to jointly filter IR image sequences in spatial and temporal domains, thereby achieving both background suppression and target enhancement. Specifically, the vectorization of the IR image sequence facilitates the mapping of inter-frame information into a two-dimensional matrix, allowing for the extraction of temporal features of the moving target and further target enhancement. This matrix serves as the guided image for the fast guided filter (FGF) [23], which leverages anisotropic features in the spatial domain to extract background components and differentiate them from the original image, effectively suppressing the background. Furthermore, to eliminate bright edge interference in the background, prior weights are computed utilizing the local structure tensor in the spatial domain. Finally, the target image is reconstructed, and detection results are derived through adaptive threshold segmentation.

2.1. Spatial–Temporal Saliency

The background often exhibits diverse properties such as gradient, texture, reflection, etc., which manifest differently in various directions within the spatial domain. In contrast, these properties appear isotropic in the target, as illustrated in Figure 1. Leveraging this disparity, background suppression can be achieved based on the anisotropic features between the target and the background. To better preserve target components from the background, the superior detail preservation capability of GF is harnessed. The image to be detected is input as the guide image, thereby enabling the filter output to retain more structural features. Moreover, the proposed approach also utilizes the temporal saliency of the moving target. In the temporal domain, under a fixed system, as the target traverses a pixel location, the pixel’s value at this location sharply rises to a high level and then decreases to its original level. This temporal feature can be described by a varying curve with a pronounced peak when the target is present and a relatively smooth line when the target is absent, as depicted in Figure 2.

2.2. Vectorization of IR Image Sequence

To extract the temporal saliency of a moving target for target enhancement, each frame in the IR image sequence is converted into a column vector. Each pixel of the current image is mapped into the column vector in left-to-right and top-to-bottom order, as shown in Figure 3. These column vectors are then employed to construct a two-dimensional matrix, which combines temporal and spatial domain information and serves as the input for the filtering process. Compared with single-frame-based spatial-domain filtering algorithms, the proposed method removes point-like bright noise in the background more effectively, since such noise fails to align with the temporal saliency feature of the moving target, thus improving robustness.
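To make the mapping concrete, the following is a minimal NumPy sketch of this vectorization step and its inverse; the function names and the (T, H, W) array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vectorize_sequence(frames):
    """Map each H x W frame to a column vector in left-to-right,
    top-to-bottom (row-major) order and stack the T columns into an
    (H*W) x T spatial-temporal matrix."""
    T, H, W = frames.shape
    # C-order reshape flattens each frame row by row, matching the
    # left-to-right, top-to-bottom mapping described above
    return frames.reshape(T, H * W).T

def devectorize(matrix, H, W):
    """Inverse mapping: place every pixel back at its original position."""
    return matrix.T.reshape(-1, H, W)
```

Each row of the resulting matrix is the temporal profile of one pixel, so a moving target shows up as a sharp, transient peak along its row, which is exactly the temporal saliency feature described above.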

2.3. Filtering Process Based on Vectorized Guided Filter

To satisfy the real-time requirements of infrared target detection, the improved FGF is employed. Initially, nearest-neighbor subsampling is applied to the two-dimensional matrix $F$, yielding the subsampled matrix $F'$. This subsampled result is utilized as both the input image $I$ and the guide image $G$. Specifically, the output $O$ is defined as a linear transformation of $G$ in a window $\omega_k$ centered at pixel $k$:
$$O_i = \sum_{j} W_{ij}(G) I_j = a_k G_i + b_k, \quad \forall i \in \omega_k \tag{1}$$
where $i$ and $j$ represent pixel indices, $W_{ij}$ denotes the filter kernel, and $(a_k, b_k)$ are constant linear coefficients within the window $\omega_k$, which takes the form of a square window of radius $r$. Since $\nabla O = a_k \nabla G$ inside the window, the filtered output image $O$ changes linearly with the guide image $G$, which serves to preserve detailed information such as targets and edges within the matrix $F$ prior to filtering. The objective is to minimize the following cost function within the window:
$$E(a_k, b_k) = \sum_{i \in \omega_k} \left[ \left( a_k G_i + b_k - I_i \right)^2 + \epsilon a_k^2 \right] \tag{2}$$
where $\epsilon$ is a regularization parameter constraining $a_k$. The coefficients $(a_k, b_k)$ can be solved by the principle of linear regression:
$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} G_i I_i - \mu_k \bar{I}_k}{\sigma_k^2 + \epsilon} \tag{3}$$
$$b_k = \bar{I}_k - a_k \mu_k \tag{4}$$
where $\mu_k$ and $\sigma_k^2$ are the mean and variance of the grayscale values of the pixels within the window $\omega_k$ in $G$, and $|\omega|$ is the number of pixels within the window $\omega_k$. $\bar{I}_k$, on the other hand, is the mean grayscale value of the pixels within the window $\omega_k$ in the input image $I$. It is calculated as follows:
$$\bar{I}_k = \frac{1}{|\omega|} \sum_{i \in \omega_k} I_i \tag{5}$$
By combining the previously mentioned formulas, the equation for deriving the background component in G extracted by the filter is as follows:
$$O_i = \frac{1}{|\omega|} \sum_{k : i \in \omega_k} \left( a_k G_i + b_k \right) = \bar{a}_i G_i + \bar{b}_i \tag{6}$$
$$\bar{a}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} a_k, \qquad \bar{b}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} b_k \tag{7}$$
where $\bar{a}_i$ and $\bar{b}_i$ are the averages of the coefficients over all windows overlapping pixel $i$. This yields the following filter kernel function $W_{ij}(G)$ with normalized characteristics; the filtered matrix $O$ is then upsampled to restore it to its original full resolution.
$$W_{ij}(G) = \frac{1}{|\omega|^2} \sum_{k : (i, j) \in \omega_k} \left( 1 + \frac{(G_i - \mu_k)(G_j - \mu_k)}{\sigma_k^2 + \epsilon} \right) \tag{8}$$
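The complete filtering step can be sketched compactly. The code below is a minimal NumPy/SciPy illustration of Formulas (1)–(8) under the assumption, stated above, that the subsampled spatial–temporal matrix serves as both input and guide (with $I = G$, $a_k$ reduces to $\sigma_k^2 / (\sigma_k^2 + \epsilon)$); the subsampling ratio s and the box-filter boundary mode are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def vectorized_guided_filter(F, r=6, eps=0.01, s=2):
    """Estimate the background component of the spatial-temporal matrix F
    with a fast guided filter (I = G = subsampled F) and subtract it."""
    Fs = F[::s, ::s].astype(float)      # nearest-neighbour subsampling
    size = 2 * max(r // s, 1) + 1       # window omega_k at the coarse scale

    def box(x):                         # mean over the square window omega_k
        return uniform_filter(x, size=size, mode='reflect')

    mu = box(Fs)                                   # mu_k
    var = box(Fs * Fs) - mu * mu                   # sigma_k^2
    a = var / (var + eps)                          # Formula (3) with I = G
    b = mu - a * mu                                # Formula (4)
    a_bar, b_bar = box(a), box(b)                  # Formula (7)
    # nearest-neighbour upsampling of the coefficients to full resolution
    a_full = np.kron(a_bar, np.ones((s, s)))[:F.shape[0], :F.shape[1]]
    b_full = np.kron(b_bar, np.ones((s, s)))[:F.shape[0], :F.shape[1]]
    O = a_full * F + b_full                        # Formula (6): background
    return F - O                                   # suppressed result
```

Because only the small coefficient maps are filtered and then upsampled, the cost per frame stays close to that of a single box filter, which is the source of the speed-up claimed for the FGF.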

2.4. Edge Suppression Based on Prior Weights

After the filtering, several strong edge components still remain in the suppressed results. Gao et al. [24] proposed a method to differentiate edges and corners by computing the eigenvalues $(\lambda_1, \lambda_2)$ of the structure tensor of the pixels in the image:
$$J_\rho = K_\rho * \left( \nabla I \otimes \nabla I \right) = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix} = \begin{bmatrix} K_\rho * I_x^2 & K_\rho * (I_x I_y) \\ K_\rho * (I_x I_y) & K_\rho * I_y^2 \end{bmatrix} \tag{9}$$
$$\lambda_1 = \frac{1}{2} \left( J_{11} + J_{22} + \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2} \right) \tag{10}$$
$$\lambda_2 = \frac{1}{2} \left( J_{11} + J_{22} - \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2} \right) \tag{11}$$
where $\otimes$ is the Kronecker (outer) product, $\nabla$ is the gradient operator, $K_\rho$ is the Gaussian kernel function with variance $\rho$, and $I_x$ and $I_y$ are the derivatives along the x and y directions, respectively. Gao et al. stated that $\lambda_1 \gg \lambda_2 \approx 0$ indicates that the pixel belongs to an edge. The spatial prior information of the edge is then calculated as follows, as adopted from Dai et al. [7] in RIPT:
$$E(x, y) = \lambda_1 - \lambda_2 \tag{12}$$
This spatial weight is combined in the filtering process to further suppress the residual edge components.
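As an illustration, a possible NumPy/SciPy computation of Formulas (9)–(12) is sketched below. The text only states qualitatively how the weight enters the pipeline, so the conversion of the edge map E into a multiplicative suppression weight at the end is an assumption, as are the Sobel derivatives and the smoothing scale rho.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def edge_prior_weight(img, rho=1.5):
    """Eigenvalues of the Gaussian-smoothed structure tensor;
    lambda1 >> lambda2 ~ 0 marks an edge pixel."""
    img = img.astype(float)
    Ix = sobel(img, axis=1)                 # derivative along x
    Iy = sobel(img, axis=0)                 # derivative along y
    J11 = gaussian_filter(Ix * Ix, rho)     # K_rho * Ix^2
    J12 = gaussian_filter(Ix * Iy, rho)     # K_rho * (Ix Iy)
    J22 = gaussian_filter(Iy * Iy, rho)     # K_rho * Iy^2
    root = np.sqrt((J22 - J11) ** 2 + 4.0 * J12 ** 2)
    lam1 = 0.5 * (J11 + J22 + root)
    lam2 = 0.5 * (J11 + J22 - root)
    E = lam1 - lam2                         # large on edges, small on blobs
    # assumed conversion: strong edges (large E) get weights close to zero
    return 1.0 - E / (E.max() + 1e-12)
```

An isotropic blob-like target gives $\lambda_1 \approx \lambda_2$ and hence a small E, so it survives the weighting, while elongated edge structures are attenuated.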

2.5. Adaptive Threshold Segmentation

Although the background clutter and edge interference have been suppressed and the target is preserved, the residual background values do not become exactly zero. The post-processing adaptive threshold segmentation module should therefore consider all pixels. The threshold proposed in this paper is computed as follows:
$$T_h = q \times I_{\max} + (1 - q) \times I_{\text{mean}} \tag{13}$$
where $I_{\max}$ is the maximum gray value, $I_{\text{mean}}$ is the average of the non-zero pixels, and $q$ is an adjustable threshold factor between 0 and 1.
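A direct implementation of Formula (13) is straightforward; the sketch below assumes the suppressed result is a non-negative array, and the choice q = 0.5 is purely illustrative (the paper only constrains 0 < q < 1).

```python
import numpy as np

def adaptive_threshold(result, q=0.5):
    """Th = q * I_max + (1 - q) * I_mean, with I_mean taken over the
    non-zero pixels; returns a binary target mask."""
    nonzero = result[result > 0]
    i_mean = nonzero.mean() if nonzero.size else 0.0
    th = q * result.max() + (1.0 - q) * i_mean
    return result >= th
```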

2.6. The Flowchart of the Proposed Model

The flowchart of the proposed model is depicted in Figure 4.
  • The input consists of an IR image sequence;
  • The spatial prior weight map is constructed by Formulas (9)–(12);
  • Each pixel of the current image is mapped into the column vector in left-to-right and top-to-bottom order to construct an input matrix for the filtering process, as shown in Figure 3;
  • The fast guided filter is utilized for the extraction of spatial saliency and background suppression in the filtering process;
  • The filtered image is subtracted from the original image to perform background suppression;
  • The reconstruction process involves placing each pixel in the IR sequence matrix back to their original positions in the IR images;
  • Spatial prior weights are integrated into the reconstructed infrared image to suppress edge residuals in the background;
  • Finally, the adaptive threshold segmentation, as shown in Formula (13), is performed on the recovered target detection result map to obtain the final target image; a composite code sketch of these steps is given after this list.
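Putting the steps of the flowchart together, a possible end-to-end composition is sketched below, reusing the illustrative helper functions from the previous sections (vectorize_sequence, devectorize, vectorized_guided_filter, edge_prior_weight, adaptive_threshold); computing the prior weight from the current frame and clipping negative residuals are assumptions on details the text leaves open.

```python
import numpy as np

def detect_targets(frames, r=6, eps=0.01, q=0.5):
    """End-to-end sketch of the HS-STSM flowchart for one IR sequence
    of shape (T, H, W); returns a binary mask for the current frame."""
    T, H, W = frames.shape
    weight = edge_prior_weight(frames[-1])                 # prior weight map
    F = vectorize_sequence(frames)                         # (H*W) x T matrix
    residual = vectorized_guided_filter(F, r=r, eps=eps)   # filter + subtract
    recon = devectorize(residual, H, W)[-1]                # reconstruct frame
    recon = np.clip(recon, 0.0, None) * weight             # edge suppression
    return adaptive_threshold(recon, q=q)                  # Formula (13)
```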

3. Experimental Results and Analysis

This section presents the experimental results and analysis of the proposed model. A detailed description of the five real infrared image sequences used in the experiments is provided; these sequences contain a variety of complex background components and targets of varying sizes. Additionally, the evaluation metrics used in the experiments are introduced. Quantitative experiments, including parameter analysis, background suppression, target detection accuracy, and detection efficiency, are performed, together with a qualitative analysis of the detection results. Finally, seven state-of-the-art methods are chosen for direct comparison, and the experimental results demonstrate the superiority of the proposed method.

3.1. Evaluation Metrics

Based on the evaluation metrics commonly used in the field of infrared target detection, the target and its neighboring region are defined as shown in Figure 5, where $a \times b$ denotes the target region and $(a + 2d) \times (b + 2d)$ denotes the background region around the target. The specific evaluation metrics used are described below.
  • First, the signal-to-clutter ratio (SCR) is defined as follows:
    $$SCR = \frac{|\mathrm{mean}_t - \mathrm{mean}_n|}{\mathrm{dev}_n} \tag{14}$$
    where $\mathrm{mean}_t$ and $\mathrm{mean}_n$ denote the grayscale means of the target region and the surrounding neighborhood region, respectively, and $\mathrm{dev}_n$ represents the grayscale standard deviation of the neighborhood region. The signal-to-clutter ratio gain (SCRG) is frequently utilized to assess the effectiveness of clutter suppression and target enhancement. From the SCR, the SCRG is calculated as follows:
    $$SCRG = \frac{SCR_r}{SCR_i} \tag{15}$$
    where $SCR_r$ and $SCR_i$ represent the SCR values of the detection result and the input IR image, respectively.
  • To measure the effect of the method on background suppression, the background suppression factor (BSF) is calculated as follows:
    $$BSF = \frac{\mathrm{dev}_i}{\mathrm{dev}_r} \tag{16}$$
    where $\mathrm{dev}_i$ and $\mathrm{dev}_r$ represent the standard deviation of the background region in the original input image and in the detection result, respectively.
  • The detection rate and false alarm rate are used for a comprehensive evaluation of the detection performance across the entire image sequence. The detection rate $P_d$ is calculated as follows:
    $$P_d = \frac{D_T}{A_T} \tag{17}$$
    where $D_T$ is the number of successful detections and $A_T$ is the total number of targets to be detected in the IR sequence. The false alarm rate $F_a$ is calculated as follows:
    $$F_a = \frac{F_P}{N_P} \tag{18}$$
    where $F_P$ is the number of falsely detected non-target pixels and $N_P$ is the total number of pixels in the IR sequence. Further, the receiver operating characteristic (ROC) curve can be plotted by varying the threshold, with the false alarm rate on the horizontal axis and the detection rate on the vertical axis. The AUC value, i.e., the area under the curve enclosed by the ROC curve and the axes, is a direct indicator of the target detection performance; a code sketch of these metrics is given after this list.
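For reproducibility, a minimal NumPy sketch of these metrics follows; the slice-based target specification, the neighborhood width d, and the small stabilizing constant are illustrative assumptions.

```python
import numpy as np

def scr(img, ys, xs, d=10):
    """SCR = |mean_t - mean_n| / dev_n for a target box given by the
    slices ys, xs; the neighborhood extends the box by d pixels per side."""
    ny = slice(max(ys.start - d, 0), ys.stop + d)
    nx = slice(max(xs.start - d, 0), xs.stop + d)
    patch = img[ny, nx].astype(float)
    mask = np.ones(patch.shape, dtype=bool)
    mask[ys.start - ny.start:ys.stop - ny.start,
         xs.start - nx.start:xs.stop - nx.start] = False  # exclude target
    mean_n, dev_n = patch[mask].mean(), patch[mask].std()
    return abs(img[ys, xs].mean() - mean_n) / (dev_n + 1e-12)

def scrg(scr_result, scr_input):       # SCRG = SCR_r / SCR_i
    return scr_result / scr_input

def bsf(dev_input, dev_result):        # BSF = dev_i / dev_r
    return dev_input / dev_result

def detection_rate(detected, total_targets):        # P_d = D_T / A_T
    return detected / total_targets

def false_alarm_rate(false_pixels, total_pixels):   # F_a = F_P / N_P
    return false_pixels / total_pixels
```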

3.2. Description of the Dataset

To better demonstrate the robustness of the proposed method to infrared small targets in different backgrounds, experiments are performed on real IR image sequences [25]. The targets in this dataset are mainly airborne IR targets in sky and ground scenes under complex conditions. To evaluate the performance of the proposed method, five representative scenes with high detection difficulty are selected as test sequences, as shown in Figure 6. The corresponding 3D displays are given in Figure 7. A detailed description of the sequences is shown in Table 1, where the SCR values represent the signal-to-clutter ratio.

3.3. Parameter Analysis

The model’s adjustable parameters primarily reside in the filtering process, specifically the radius r of the filtering window ω k and the regularization parameter ϵ . ϵ controls the degree of smoothness in the original guided filter. However, the method proposed in this study achieves background suppression through the disparity between the original image and the filtered image, rendering the detection result almost independent of the parameter ϵ . The experimental findings conducted on real IR target sequences also support this observation. Hence, the default ϵ value is set to 0.01.
Here, this article concentrates on how the radius $r$ of the filter window $\omega_k$ affects the detection performance. Given the size range of IR small targets, the radii are set to 2, 4, 6, 8, 10, and 12, respectively, and experiments are conducted on the five real IR target sequences. The 3D receiver operating characteristic (ROC) curves of the experimental results are depicted in Figure 8, with the corresponding area under the curve (AUC) value provided in the legend for each case. A larger AUC value indicates superior detection performance.
It is evident that the model exhibits the poorest detection performance when the radius is at the minimum value of 2, that the optimal performance is achieved when the radius is approximately 6, and that the effectiveness diminishes again once the radius reaches 8 or more. This trend can be attributed to the target size range of 2 × 2 to 5 × 5 within the five sequences. Specifically, when the window radius is small, the window fails to completely encompass the target area, resulting in excessive contraction of the target and a reduction in the detection rate. Conversely, when the window radius is too large, the window includes excessive background clutter components, leading to a higher false alarm rate. Consequently, the ideal window radius should be slightly larger than the size of the target to be detected. As a result, the value is set to $r = 6$, which yields the optimal and most robust detection performance.

3.4. Ablation Experiments

To validate the efficacy of each part of the proposed model, ablation experiments are conducted, and the 3D ROC curves derived from the results are illustrated in Figure 9. Ablation experiments are performed separately on five real infrared sequences listed in Table 1. In the legend, “Without” denotes an experiment conducted without a particular component of the proposed method. The AUC values derived from the 3D ROC curves are listed in Table 2. The best results are marked in red. The results of the ablation experiments demonstrate that the overall algorithm surpasses the removal or replacement of any individual module, thereby affirming the significance of each component in enhancing the detection performance.

3.5. Qualitative Experiments

In this section, qualitative experiments on robustness in different scenes are performed. Furthermore, a comparison with seven state-of-the-art methods is presented, evaluating the ability of target enhancement and background suppression.

Qualitative Comparison with State-of-the-Art Methods

To demonstrate the superiority of the proposed model, it is compared with seven current state-of-the-art and efficient IR target detection methods. These include the single-frame detection methods HB-MLCM [12], MPCM [11], PSTNN [8], TLLCM [13], and WTLLCM [14], as well as the multi-frame detection methods STLCF [15] and STLDM [16]. The parameter settings for each comparison method are shown in Table 3; all parameters are consistent with the settings reported in the corresponding literature.
The qualitative detection results are shown in Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14, where the red boxes indicate successful detection for the target, the blue oval boxes represent false alarms, and the green boxes indicate missed detection. Furthermore, 3D maps of the results are provided in the upper right corner of each image.
Observing the results in data 1 and data 4, where the target stands out more prominently against the background, all methods successfully detect the target. However, it is evident that HB-MLCM, MPCM, and STLDM exhibit a higher number of false alarms in the detection results for data 1, as well as insufficient suppression of the buildings present in data 4. This is attributed to their sensitivity to strong edges and bright noise in the background. The targets in data 2 and data 3 are characterized by their small size and low contrast and the presence of considerable clutter interference in the background. This makes it challenging to effectively detect them using only the spatial saliency features of the targets. Consequently, single-frame-based detection methods such as PSTNN are less effective on these sequences, resulting in a large number of missed detections. Furthermore, multi-frame-based detection methods such as STLCF and STLDM do not adequately suppress the background components in the spatial domain, thereby leading to a bright noise residual, which is easily confused with the target. The background of data 5 contains a substantial amount of noise components which are similar to the target. Therefore, pure spatial filtering methods like TLLCM and WTLLCM encounter difficulty in distinguishing between the target and the noise, leading to a high number of false alarms and missed detections simultaneously in the detection results. However, it is interesting to note that the single-frame-based PSTNN performs surprisingly well on data 5. This may be attributed to the ability of the low-rank matrix constructed by its neighborhood sliding-window sampling, which effectively preserves the local features of the target in the spatial domain.
The proposed HS-STSM model outperforms other methods on all five datasets by accurately detecting the target location and effectively preserving or enhancing the target. This superior performance is achieved through the extraction of temporal saliency features of the target motion by image vectorization, and the complete suppression of background components via spatial anisotropy as well as prior weight assignment. The 3D maps of the results also demonstrate that the proposed method can accurately distinguish the target from the complex background. These qualitative detection results in various scenarios show the robustness of the proposed method.

3.6. Quantitative Analysis

Quantitative analysis is performed on the detection results to achieve a more accurate evaluation of the algorithms. The evaluation metrics described before are employed, and the results are presented in Table 4 and Table 5.
The optimal results are highlighted in red in the tables. It is evident that the proposed method outperforms the seven state-of-the-art methods by achieving optimal or suboptimal values. The high BSF values denote superior suppression of background residuals by effectively leveraging spatial anisotropic features and prior weights. Furthermore, the high SCRG values indicate strong target enhancement through the utilization of temporal saliency. Finally, the high AUC values indicate that the proposed method achieves high detection rates while maintaining low false alarm rates, demonstrating its superior detection performance and robustness to noise and scenes. Overall, the proposed model effectively enhances targets while successfully suppressing background residuals.
To visualize the detection results across the entire image sequence, Figure 15 displays the ROC curves of the eight methods for the five image sequences mentioned. The corresponding AUC values for each case are provided in the legend. It is evident that methods such as HB-MLCM and MPCM, which rely on single-frame filtering, exhibit less robustness and display more fluctuations in different scenarios. Furthermore, the LRSD-based approach, PSTNN, demonstrates moderate performance, with a significant number of missed detections on data 3, where the target size is the smallest. STLCF and STLDM efficiently leverage multi-frame information, resulting in greater resistance to interference. Nevertheless, these methods perform unsatisfactorily in detecting small, dim targets in data 2 and data 3.
Compared with other algorithms, the model presented in this paper effectively suppresses background noise and improves target visibility by extracting spatial anisotropic features and temporal saliency. This approach achieves optimal detection and false alarm rates across various scenarios. Results from the 3D ROC curves for five real and complex scenes demonstrate the high accuracy and robustness of the proposed model in practical applications.

3.7. Detection Efficiency

Real-time detection is crucial for IR target detection. The average detection times per single IR image for the aforementioned methods are provided in Table 6, with the optimal results highlighted in red font. The results demonstrate the superior detection efficiency of the proposed method, which achieves optimal or near-optimal average detection times on the real IR sequences. Spatial filtering methods, such as HB-MLCM, MPCM, and WTLLCM, have relatively low computational complexity, resulting in higher detection efficiency; however, these methods compromise detection accuracy and exhibit high false alarm rates. LRSD-based methods typically have high computational complexity and longer detection times. PSTNN operates only on single-frame images and reduces computational complexity through the partial sum of the tensor nuclear norm, achieving considerable detection efficiency; compared to filtering methods, however, it still lacks competitiveness. The STLDM method is limited by its multi-frame differencing, resulting in high computational effort and low detection efficiency. As a multi-frame spatial–temporal filtering algorithm, the proposed method achieves significantly higher detection efficiency than similar filtering methods, and it even exceeds the detection efficiency of single-frame-based methods. This superiority can be attributed to the vectorization of IR image sequences, which reduces repeated and redundant computations. In conclusion, the proposed HS-STSM model maintains high detection accuracy without sacrificing efficiency, thus demonstrating practical application value.

3.8. Intuitive Effect

To intuitively demonstrate the superior detection performance of the proposed model, the experimental results from quantitative evaluation, i.e., BSF, SCRG, AUC, and computation time, are summarized in a histogram, as shown in Figure 16. The values in the histogram represent the average results obtained from the aforementioned five real IR sequences. It is evident that the proposed HS-STSM model has the highest BSF, SCRG, and AUC values and the lowest computation time, which exhibits the model’s improved detection accuracy and efficiency.

4. Discussion

Research in the field of IR target detection methods has indeed received considerable attention, and significant advancements have been made in recent years. However, there is always room for improvement and further development. Spatial filtering methods and HVS-based detection methods have low computational complexity and therefore are faster. Still, they often perform poorly on the detection of dim and small targets in complex scenes. LRSD-based models have shown improvements in detection accuracy, yet they often rely on limited prior information and struggle to effectively leverage inter-frame information in the temporal domain of moving targets. Furthermore, multi-frame-based detection tends to involve high computational complexity, and there is a need to enhance detection efficiency in this area.
To address the problem of robustness and efficiency for IR target detection in complex scenes, the HS-STSM model is proposed. This method leverages temporal information of moving targets to effectively suppress clutter interference from complex backgrounds. In addition, it utilizes the local structure tensor in the spatial domain to assign prior weight, further reducing residual bright edge components in the background. Furthermore, by the vectorization of IR image sequences, the model significantly improves detection efficiency, as shown in Table 6. Qualitative and quantitative experiments were conducted, including robustness evaluation, background suppression, target enhancement, detection ability, and computation time, on five real IR target image sequences as depicted in Figure 6 and Table 1. These results were compared with seven state-of-the-art methods, namely HB-MLCM, MPCM, PSTNN, STLCF, STLDM, TLLCM, and WTLLCM. The obtained qualitative results in Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 and the overall evaluation results presented in Figure 16 demonstrate that the proposed model outperforms the other methods in terms of both detection efficiency and robustness.
Nevertheless, the proposed model still has room for improvement. The current size of the filtering window radius r is a fixed value, which may lead to excessive target shrinkage or missed detection when the target size varies greatly. One possible improvement idea is to adaptively determine the optimal window size in the filtering process based on the maximum response principle.

5. Conclusions

In order to achieve fast and robust IR target detection in complex scenes, a high-speed spatial–temporal saliency model (HS-STSM) for detecting small moving IR targets based on vectorized guided filter is proposed. The model extracts the target’s motion saliency in the temporal domain and its local anisotropy saliency in the spatial domain from IR sequences in complex scenes. Thus, the target can be distinguished from both dimensions. Moreover, the local structure tensor in the spatial domain is employed to assign prior weights to strong edges, contributing significantly to the suppression of the background. Both qualitative and quantitative experimental results conducted on five real sequences demonstrate that the model proposed in this paper enhances the detection performance of small moving IR targets in complex scenes while preserving high speed and robustness.

Author Contributions

A.A. and Y.L. proposed the original idea and designed the experiments. A.A. performed the experiments and wrote the manuscript. Y.L. reviewed and edited the manuscript. Y.H. and G.Z. contributed computational resources and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Aerospace Information Research Institute, Chinese Academy of Sciences.

Data Availability Statement

Bingwei Hui et al., "A dataset for infrared image dim-small aircraft target detection and tracking under ground/air background." Science Data Bank, 28 October 2019. Available online: https://doi.org/10.11922/sciencedb.902 (accessed on 2 December 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Eysa, R.; Hamdulla, A. Issues on Infrared Dim Small Target Detection and Tracking. In Proceedings of the 2019 International Conference on Smart Grid and Electrical Automation (ICSGEA), Xiangtan, China, 10–11 August 2019; pp. 452–456.
  2. Tong, Z.; Can, C.; Xing, F.W.; Qiao, H.H.; Yu, C.H. Improved small moving target detection method in infrared sequences under a rotational background. Appl. Opt. 2018, 57, 9279–9286.
  3. Tom, V.T.; Peli, T.; Leung, M.; Bondaryk, J.E. Morphology-based algorithm for point target detection in infrared backgrounds. In Proceedings of the Defense, Security, and Sensing, Orlando, FL, USA, 22 October 1993.
  4. Deshpande, S.D.; Er, M.H.; Venkateswarlu, R.; Chan, P. Max-mean and max-median filters for detection of small targets. In Proceedings of the Optics and Photonics, Denver, CO, USA, 4 October 1999.
  5. Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), Bombay, India, 7 January 1998; pp. 839–846.
  6. Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared Patch-Image Model for Small Target Detection in a Single Image. IEEE Trans. Image Process. 2013, 22, 4996–5009.
  7. Dai, Y.; Wu, Y. Reweighted Infrared Patch-Tensor Model With Both Nonlocal and Local Priors for Single-Frame Small Target Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3752–3767.
  8. Zhang, L.; Peng, Z. Infrared Small Target Detection Based on Partial Sum of the Tensor Nuclear Norm. Remote Sens. 2019, 11, 382.
  9. Hu, Y.; Ma, Y.; Pan, Z.; Liu, Y. Infrared Dim and Small Target Detection from Complex Scenes via Multi-Frame Spatial-Temporal Patch-Tensor Model. Remote Sens. 2022, 14, 2234.
  10. Chen, C.L.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A Local Contrast Method for Small Infrared Target Detection. IEEE Trans. Geosci. Remote Sens. 2014, 52, 574–581.
  11. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226.
  12. Shi, Y.; Wei, Y.; Yao, H.; Pan, D.; Xiao, G. High-Boost-Based Multiscale Local Contrast Measure for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2018, 15, 33–37.
  13. Han, J.; Moradi, S.; Faramarzi, I.; Liu, C.; Zhang, H.; Zhao, Q. A Local Contrast Method for Infrared Small-Target Detection Utilizing a Tri-Layer Window. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1822–1826.
  14. Cui, H.; Li, L.; Liu, X.; Su, X.; Chen, F. Infrared Small Target Detection Based on Weighted Three-Layer Window Local Contrast. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7505705.
  15. Deng, L.; Zhu, H.; Tao, C.; Wei, Y. Infrared moving point target detection based on spatial–temporal local contrast filter. Infrared Phys. Technol. 2016, 76, 168–173.
  16. Du, P.; Hamdulla, A. Infrared Moving Small-Target Detection Using Spatial–Temporal Local Difference Measure. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1817–1821.
  17. Ma, Y.; Liu, Y.; Pan, Z.; Hu, Y. Method of Infrared Small Moving Target Detection Based on Coarse-to-Fine Structure in Complex Scenes. Remote Sens. 2023, 15, 1508.
  18. Wang, W.; Qin, H.; Cheng, W.; Wang, C.; Leng, H.; Zhou, H. Small target detection in infrared image using convolutional neural networks. In Proceedings of the AOPC 2017: Optical Sensing and Imaging Technology and Applications, Beijing, China, 4–6 June 2017; SPIE: Bellingham, WA, USA, 2017; Volume 10462, p. 1046250.
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
  20. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS'14), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
  21. Wang, Z.; Yang, J.; Pan, Z.; Liu, Y.; Lei, B.; Hu, Y. APAFNet: Single-Frame Infrared Small Target Detection by Asymmetric Patch Attention Fusion. IEEE Geosci. Remote Sens. Lett. 2023, 20, 7000405.
  22. He, K.; Sun, J.; Tang, X. Guided Image Filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1397–1409.
  23. He, K.; Sun, J. Fast Guided Filter. arXiv 2015, arXiv:1505.00996.
  24. Gao, C.Q.; Tian, J.; Wang, P. Generalised-structure-tensor-based infrared small target detection. Electron. Lett. 2008, 44, 1349–1351.
  25. Hui, B.; Song, Z.; Fan, H.; Zhong, P.; Hu, W.; Zhang, X.; Ling, J.; Su, H.; Jin, W.; Zhang, Y.; et al. A dataset for infrared detection and tracking of dim-small aircraft targets under ground/air background. China Sci. Data 2020, 5, 12.
Figure 1. Spatial saliency of IR target.
Figure 2. Temporal saliency of the IR target under a fixed system.
Figure 3. Vectorization of IR image sequence.
Figure 4. Flowchart of the proposed HS-STSM model.
Figure 5. The IR target and its neighboring region. The target region is enclosed within the red box, whereas the background region is enclosed within the blue box.
Figure 6. Five real IR sequences used in the experiments. The target regions are marked with red boxes.
Figure 7. Three-dimensional maps of the IR sequences used in the experiments. The target regions are marked with red boxes. The numbers represent corresponding IR sequences.
Figure 8. The impact of the filter window’s radius r.
Figure 9. Three-dimensional ROC curves of ablation experiments on five real infrared sequences.
Figure 10. Comparative detection results of data 1.
Figure 11. Comparative detection results of data 2.
Figure 12. Comparative detection results of data 3.
Figure 13. Comparative detection results of data 4.
Figure 14. Comparative detection results of data 5.
Figure 15. Three-dimensional ROC curves of the detection results by different methods.
Figure 16. Intuitive display of the detection performance by different methods.
Table 1. Description of the dataset.

Sequence | Frames | Image Size | Average SCR | Target Size
Data 1   | 399    | 256 × 256  | 6.07        | 3 × 3 to 4 × 4
Data 2   | 1500   | 256 × 256  | 5.20        | 1 × 1 to 2 × 2
Data 3   | 750    | 256 × 256  | 3.42        | 1 × 1 to 3 × 3
Data 4   | 1599   | 256 × 256  | 3.84        | 2 × 2 to 4 × 4
Data 5   | 499    | 256 × 256  | 2.20        | 3 × 3 to 5 × 5
Table 2. AUC values of the ablation experiments.

Method                | Data 1 | Data 2 | Data 3 | Data 4 | Data 5
Proposed              | 0.9908 | 0.9579 | 0.8925 | 1.0000 | 0.9930
Without prior         | 0.9641 | 0.8502 | 0.7919 | 0.9979 | 0.9698
Without vectorization | 0.9351 | 0.4458 | 0.5037 | 0.7500 | 0.8129
Table 3. Comparison methods and parameters.

Methods  | Parameters
HB-MLCM  | Window size: 15 × 15, ε = 25, K = 4
MPCM     | Window sizes: 3 × 3, 5 × 5, 7 × 7; mean filter size: 3 × 3
PSTNN    | Patch size: 40 × 40, step: 40, λ = 3.7/√(min(m, n)·n₃), ε = 10⁻⁷
STLCF    | Window size: 5 × 5, frames = 5
STLDM    | Frames = 5
TLLCM    | Gaussian filter kernel
WTLLCM   | Window size: 3 × 3, K = 4
Proposed | Window radius: r = 6, regularization parameter: ϵ = 0.01
Table 4. Quantitative results of different methods.

         |           Data 1          |           Data 2          |          Data 3
Methods  | BSF     | SCRG  | AUC     | BSF     | SCRG  | AUC     | BSF    | SCRG  | AUC
HB-MLCM  | 17.320  | 4.465 | 0.8816  | 16.343  | 3.107 | 0.0229  | 5.642  | 4.741 | 0
MPCM     | 402.936 | 2.907 | 0.5223  | 185.529 | 0.359 | 0.0426  | NaN    | 0.032 | 0.0015
PSTNN    | 18.799  | 4.941 | 0.9974  | 8.690   | 2.156 | 0.6647  | 14.545 | 0.205 | 0.0951
STLCF    | 6.314   | 4.189 | 0.9699  | 7.160   | 2.245 | 0.6854  | 11.349 | 0.494 | 0.2224
STLDM    | 10.127  | 3.241 | 0.8905  | 8.500   | 2.443 | 0.7925  | 10.884 | 2.606 | 0.5433
TLLCM    | 6.144   | 1.407 | 0.3503  | 3.735   | 2.484 | 0.5163  | 8.173  | 1.647 | 0.5310
WTLLCM   | 12.210  | 5.564 | 0.9709  | 5.157   | 5.014 | 0.8244  | 10.245 | 2.845 | 0.7210
Proposed | 396.504 | 5.762 | 0.9908  | 199.205 | 3.176 | 0.9579  | 32.117 | 4.965 | 0.8925
Table 5. Quantitative results of different methods.

         |          Data 4           |           Data 5
Methods  | BSF     | SCRG  | AUC     | BSF     | SCRG  | AUC
HB-MLCM  | 8.800   | 5.428 | 0.9243  | 30.757  | 2.184 | 0.1100
MPCM     | 242.064 | 2.236 | 0.9000  | 129.428 | 1.407 | 0.0482
PSTNN    | 24.889  | 2.030 | 1.0000  | 47.581  | 5.210 | 0.8416
STLCF    | 5.185   | 1.967 | 0.9146  | 18.567  | 4.203 | 0.6956
STLDM    | 5.653   | 3.717 | 0.9219  | 56.253  | 6.547 | 0.8121
TLLCM    | 30.354  | 2.812 | 0.9997  | 13.933  | 0.923 | 0.0679
WTLLCM   | 45.199  | 2.944 | 0.9992  | 16.641  | 3.214 | 0.3164
Proposed | 64.105  | 4.757 | 1.0000  | 650.755 | 7.758 | 0.9930
Table 6. Computation time of different methods (in seconds).

Method   | Data 1 | Data 2 | Data 3 | Data 4 | Data 5
HB-MLCM  | 0.026  | 0.030  | 0.026  | 0.031  | 0.022
MPCM     | 0.032  | 0.053  | 0.062  | 0.041  | 0.050
PSTNN    | 0.210  | 0.527  | 0.891  | 0.428  | 0.293
STLCF    | 0.315  | 0.327  | 0.375  | 0.333  | 0.383
STLDM    | 1.571  | 1.650  | 1.634  | 1.582  | 1.702
TLLCM    | 1.078  | 1.143  | 1.165  | 1.132  | 1.186
WTLLCM   | 0.054  | 0.805  | 1.145  | 0.216  | 0.038
Proposed | 0.030  | 0.025  | 0.024  | 0.028  | 0.021