3.1. Infrared Patch-Tensor Model
The original infrared image can be divided into three components: background, target, and noise. The image model can be expressed as follows:

$$f_D = f_B + f_T + f_N,$$

where $f_D$ represents the original infrared image, $f_B$ represents the background component, $f_T$ represents the target component, and $f_N$ represents the noise component. Based on the framework of low-rank sparse decomposition, we hypothesize that the background component is low-rank, whereas the target component is sparse and is accompanied by additive white noise.
The IPI model was developed based on this framework [25]:

$$D = B + T + N,$$

where $D$, $T$, $B$, and $N$ represent the original infrared image, the target, the background, and the noise, respectively. In the IPI model, the image is first sampled with a sliding window, and the resulting image blocks are vectorized into columns to construct a low-rank matrix. Target pixels, which have relatively high contrast and occupy only a few pixels, appear as sparse components. Robust principal component analysis (RPCA) is then applied to extract the sparse component and obtain the target detection result.
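For illustration only (not the authors' implementation), the patch-image matrix used by IPI-style models can be built by sliding a window over the image and vectorizing each patch into a column; the patch size and stride below are arbitrary assumptions.

```python
import numpy as np

def build_patch_image(image, patch=40, stride=20):
    """Slide a patch x patch window over the image and stack the
    vectorized patches as the columns of the patch-image matrix."""
    h, w = image.shape
    cols = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            block = image[top:top + patch, left:left + patch]
            cols.append(block.reshape(-1))      # vectorize the patch
    return np.stack(cols, axis=1)               # shape: (patch*patch, number of patches)
```

RPCA can then be applied to this matrix to separate the low-rank background columns from the sparse target entries.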
Compared with transforming each patch into a vector, the infrared patch-tensor (IPT) model introduced by Dai et al. [36] constructs the tensor directly by stacking the patches obtained by sliding a window over the image from the top left to the bottom right. Following this rationale, we establish the infrared patch-tensor model as follows:

$$\mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N},$$

where $\mathcal{D}$, $\mathcal{B}$, $\mathcal{T}$, and $\mathcal{N}$ represent the constructed image tensor, the decomposed background tensor, the target tensor, and the noise tensor, respectively. The construction method for the image tensor $\mathcal{D}$ is explained in a subsequent section. Using the low-rank sparse decomposition framework, we assume that the target tensor $\mathcal{T}$ is sparse, denoted as $\|\mathcal{T}\|_0 \le k$. Here, $k$ is a constant that represents the level of sparsity and is determined by the complexity of the infrared image. The noise tensor $\mathcal{N}$ is assumed to be additive white noise, expressed as $\|\mathcal{N}\|_F \le \delta$. This leads to the constraint $\|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F \le \delta$, where $\|\cdot\|_F$ denotes the Frobenius norm and $\delta > 0$.
The assumption of low-rankness for the background tensor $\mathcal{B}$ can be described as:

$$\operatorname{rank}(\mathcal{B}^{(i)}) \le r_i, \quad i = 1, 2, 3,$$

where $\operatorname{rank}(\mathcal{B}^{(i)})$ denotes the rank of the different slices (mode-$i$ unfoldings) of the background tensor $\mathcal{B}$. The constants $r_1$, $r_2$, and $r_3$ are positive and determined by the complexity of the image background. However, the rank corresponds to the L0 norm of the singular values and is difficult to minimize directly, so the L1 norm is commonly used as an alternative approximation.
To examine the low-rankness hypothesis, we unfold the raw tensor of the infrared image along its three modes. This expansion enables us to analyze the variation of the singular values for all modes. The results are presented in Figure 3. Notably, the singular values of the frontal slices, horizontal slices, and side slices all decrease sharply, which is consistent with the fundamental low-rank sparsity assumption. Using the framework of low-rank sparse decomposition, we establish the following objective function:

$$\min_{\mathcal{B}, \mathcal{T}} \ \operatorname{rank}(\mathcal{B}) + \lambda \|\mathcal{T}\|_0, \quad \text{s.t.} \ \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F \le \delta, \qquad (9)$$

where $\lambda$ is a compromising parameter that governs the trade-off between the target tensor and the background tensor. Because the L0 norm is computationally difficult to minimize, it is replaced with the L1 norm.
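As an illustrative check (not the authors' code), the singular-value analysis of Figure 3 can be reproduced by unfolding a third-order tensor along each mode and inspecting how quickly the singular values decay; the tensor shape below is an arbitrary assumption.

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Unfold a 3-way tensor along the given mode (0, 1, or 2)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def singular_value_curves(tensor):
    """Normalized singular values of each mode unfolding."""
    curves = []
    for mode in range(3):
        s = np.linalg.svd(mode_unfold(tensor, mode), compute_uv=False)
        curves.append(s / s.max())          # normalize for comparison across modes
    return curves

# Example with a synthetic 3-way tensor.
rng = np.random.default_rng(0)
tensor = rng.standard_normal((40, 40, 10))
curves = singular_value_curves(tensor)
```

A rapidly decaying curve for every mode supports the low-rank assumption on the background.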
3.2. Spatial–Temporal Block-Matching Patch-Tensor Construction
Using a single-frame sliding window to acquire infrared patches and then constructing the tensor from these neighboring blocks is often ineffective in complex scenes. The SFBM model [31] attempts to improve the low-rankness of the tensor but still produces a significant number of false alarms and background residuals because it lacks temporal information and its matching criterion is limited. The MFSTPT model [39] incorporates temporal information between frames through spatial–temporal sampling, but it still relies on the same neighborhood sliding-window method as PSTNN [37] to obtain the tensor. In addition, both the neighborhood sliding-window method and the matching-block search contain redundancies, leading to a large number of repeated calculations in the subsequent iterative solution of the low-rank sparse decomposition and, consequently, generally low detection efficiency.
To address the aforementioned issues, we propose a spatial–temporal block-matching method to construct a low-rank tensor. First, the image is divided into multiple reference blocks, as shown in Figure 4, and the matching value of each reference block is computed. A matching group is then formed by selecting the $k$ blocks with the closest matching values. The current image frame is located in the middle of the image sequence; the green rectangle represents the current reference block, whereas the red border marks the matching blocks for this reference block. To detect the current image, we utilize its previous $m$ frames and subsequent $m$ frames, giving a total of $2m + 1$ frames as the input to the model. Based on experimental testing, $m$ is set to balance detection accuracy and processing time. The middle frame is divided into several reference blocks of a designated size, and the local grayscale entropy of each reference block is calculated as its matching value. These matching values are sorted in descending order, and the top $k$ blocks with the closest matching values are selected. Together with the corresponding matching blocks in the preceding and subsequent frames, these $k$ blocks form a matching group, which is used to construct a low-rank tensor. The value of $k$ is determined by a ratio $r$ of the total number of matching blocks, and the next matching group is selected from the sorted list of matching values in increments of $k$. This approach avoids a global search for matching blocks, which reduces the number of decomposition iterations and thus improves the detection efficiency. A sketch of this grouping procedure is given below.
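A minimal sketch of the grouping step under stated assumptions: the block size, group size $k$, and entropy function (see Section 3.3) are placeholders, and the way blocks are cut from each frame is illustrative rather than the authors' exact implementation.

```python
import numpy as np

def split_into_blocks(frame, block=30):
    """Cut a frame into non-overlapping reference blocks (illustrative)."""
    h, w = frame.shape
    return {(r, c): frame[r:r + block, c:c + block]
            for r in range(0, h - block + 1, block)
            for c in range(0, w - block + 1, block)}

def build_matching_groups(frames, middle_idx, entropy_fn, block=30, k=8):
    """Group the k blocks of the middle frame whose local grayscale entropies
    are closest, together with the co-located blocks of the other frames."""
    blocks = split_into_blocks(frames[middle_idx], block)
    # Sort the reference blocks of the middle frame by their entropy value.
    ranked = sorted(blocks, key=lambda pos: entropy_fn(blocks[pos]), reverse=True)
    groups = []
    for start in range(0, len(ranked) - k + 1, k):   # step through the sorted list in increments of k
        group_positions = ranked[start:start + k]
        patches = [frames[t][r:r + block, c:c + block]
                   for t in range(len(frames))
                   for (r, c) in group_positions]
        groups.append(np.stack(patches, axis=-1))    # patch tensor of one matching group
    return groups
```

Each group collects blocks with similar entropy across the temporal window, which is what makes the stacked tensor approximately low-rank.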
3.3. Matching Criterion Based on the Local Grayscale Entropy
The SFBM model matches image blocks solely on the basis of grayscale differences, thereby disregarding their spatial distribution and structural characteristics. It also relies on a sliding-window traversal to search for matching blocks, leading to low detection efficiency. Recently, local grayscale entropy has gained significant attention in the fields of small target detection and image enhancement [43,44,45]. To capture the spatial characteristics of the grayscale information in an image block, we establish a matching criterion based on the local grayscale entropy of the block. For each pixel, the grayscale value of the pixel and the grayscale information of its surrounding neighborhood form a binary feature $(i, j)$, where $i$ and $j$ represent the grayscale values of the pixel and of the corresponding neighborhood pixels, respectively. To capture the joint characteristics of the gray value at a pixel position and the gray distribution of its neighboring pixels, we define the probability of occurrence of $(i, j)$ in the image as $p(i, j)$:

$$p(i, j) = \frac{n(i, j)}{L \times H},$$

where $n(i, j)$ denotes the number of occurrences of the binary feature $(i, j)$, and $L$ and $H$ signify the dimensions of the image. We can then calculate the local entropy of the given image block as follows:

$$S = -\sum_{i} \sum_{j} p(i, j) \log_2 p(i, j),$$

where $S$ represents the entropy value of the image block. The closer the entropy values of two image blocks, the better the match between them and, ultimately, the lower the rank of the constructed tensor.
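A runnable sketch of the local grayscale entropy used as the matching value; the neighborhood definition (here the mean gray level of a 3×3 window) and the quantization into a fixed number of levels are illustrative assumptions, not necessarily the authors' exact choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_grayscale_entropy(block, levels=64):
    """Entropy of the joint (pixel gray level, neighborhood gray level)
    distribution of an image block."""
    block = block.astype(np.float64)
    neigh = uniform_filter(block, size=3)                 # neighborhood gray information
    # Quantize both channels to a modest number of levels.
    q = lambda x: np.clip((x / (x.max() + 1e-12) * (levels - 1)).astype(int), 0, levels - 1)
    i, j = q(block).ravel(), q(neigh).ravel()
    hist = np.zeros((levels, levels))
    np.add.at(hist, (i, j), 1)                            # n(i, j): occurrence counts
    p = hist / hist.sum()                                 # p(i, j) = n(i, j) / (L * H)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())               # S = -sum p(i, j) log2 p(i, j)
```

This function can serve as the `entropy_fn` used in the grouping sketch of Section 3.2.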
Figure 5 shows the singular-value curve of the tensor constructed from the image in Figure 3. It can be seen that the proposed matching criterion yields a faster-decreasing curve than the grayscale-difference criterion employed in the SFBM model.
Moreover, we compare the PSTNN, SFBM, and MFSTPT models with the proposed STBMPT model for the construction of a low-rank tensor/matrix in two different infrared image sequences, and the corresponding singular-value curves are depicted in
Figure 6. The results demonstrate that the proposed tensor construction method displays the fastest decline in singular values and achieves the best low-rank properties.
3.4. Local Prior Information
False alarms in infrared small moving target detection frequently arise from strong edges and corner points in intricate backgrounds, which often remain as background residuals and degrade the detection accuracy. It is therefore necessary to compute local priors that suppress the pixels associated with strong edges and bright corner points before iteratively solving the model. Gao et al. [46] proposed a method to distinguish edges from corner points by calculating the eigenvalues ($\lambda_1$, $\lambda_2$) of the structure tensor of each pixel:

$$J_\sigma = K_\sigma \otimes \left(\nabla I \, \nabla I^{\mathrm{T}}\right) = K_\sigma \otimes \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},$$

where $\nabla$ is the gradient operator, $I_x$ and $I_y$ are the derivatives with respect to the $x$- and $y$-directions, $K_\sigma$ represents a Gaussian kernel function with variance $\sigma$, and ⊗ denotes the Kronecker product. Gao et al. demonstrated that when $\lambda_1 \approx \lambda_2 \gg 0$, the pixel corresponds to a corner point; when $\lambda_1 \gg \lambda_2 \approx 0$, the pixel is associated with an edge; and when $\lambda_1 \approx \lambda_2 \approx 0$, the pixel belongs to a smooth background region.
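As an illustration (with Sobel derivatives and an arbitrarily chosen Gaussian smoothing scale), the per-pixel structure-tensor eigenvalues used in this edge/corner analysis can be computed as follows.

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def structure_tensor_eigenvalues(image, sigma=1.5):
    """Per-pixel eigenvalues (lam1 >= lam2) of the smoothed structure tensor."""
    img = image.astype(np.float64)
    ix, iy = sobel(img, axis=1), sobel(img, axis=0)       # derivatives along x and y
    jxx = gaussian_filter(ix * ix, sigma)                 # Gaussian-weighted tensor entries
    jxy = gaussian_filter(ix * iy, sigma)
    jyy = gaussian_filter(iy * iy, sigma)
    trace, det = jxx + jyy, jxx * jyy - jxy ** 2
    root = np.sqrt(np.maximum((trace / 2) ** 2 - det, 0.0))
    lam1, lam2 = trace / 2 + root, trace / 2 - root       # eigenvalues of a 2x2 symmetric matrix
    return lam1, lam2
```

Thresholding these two eigenvalue maps reproduces the corner/edge/background classification described above.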
The local prior weight used in RIPT [36] is given in Formula (14), where $(x, y)$ denotes the pixel location. As can be observed in the second row of Figure 7, RIPT only evaluates the edges, which may leave more edge residues in the detection outcomes and thereby affect the results. The corner-point weighting function proposed by Brown et al. [47] is as follows:

$$W_{cp}(x, y) = \frac{\det(J_\sigma)}{\operatorname{tr}(J_\sigma)} = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2}, \qquad (15)$$

where $W_{cp}(x, y)$ is half of the harmonic mean of the eigenvalues; $\det(\cdot)$ and $\operatorname{tr}(\cdot)$ are the determinant and trace of the matrix, respectively; and $J_\sigma$ is the structure tensor. Using Formula (15) as the prior weight leaves residual corner-point elements, contributing to a significant amount of point noise in the detection results, as shown in the third row of Figure 7.
The PSTNN model [37] employs the maximum of the eigenvalues, instead of their difference, as the edge weighting function and multiplies it with Formula (15) to obtain the prior weights, as shown in Formula (16). Although this partly suppresses strong edges, the method still lacks sufficient suppression of strong corner pixels, as shown in the fourth row of Figure 7.
The MFSTPT model [39] computes the prior weight as a weighted geometric average that combines the edge weighting function in Formula (14) with the corner-point weighting function in Formula (15), as indicated in Formula (17). However, as depicted in the fifth row of Figure 7, the background residuals still cannot be eliminated when there are significant fluctuations near the edges.
To address the above issues, and in combination with the structure-tensor rules described above ($\lambda_1 \approx \lambda_2 \gg 0$ for corners and $\lambda_1 \gg \lambda_2 \approx 0$ for edges), we introduce a new local prior-weight calculation method inspired by the Frangi filtering method [48]. We begin by defining two indicators that describe the local shape at each pixel:

$$R = \frac{\lambda_2}{\lambda_1}, \qquad S = \sqrt{\lambda_1^2 + \lambda_2^2},$$

where $\lambda_1$ and $\lambda_2$ are the two eigenvalues and $\lambda_1 \ge \lambda_2$. For a speckled (blob-like) shape, the two eigenvalues are close to each other, resulting in a larger $R$-value. Conversely, for an edge, the difference between the two eigenvalues is large, leading to a smaller $R$-value. Furthermore, when a pixel belongs to the background, both eigenvalues are small, resulting in a smaller $S$-value; on the contrary, the $S$-value is large if at least one large eigenvalue is present. Therefore, we establish the local prior weights $W(x, y)$ as defined in Formula (20), where $W(x, y)$ represents the local prior weight at each pixel position in the image. The parameters $\beta$ and $c$ adjust the sensitivity to the edge and corner-point responses; in this paper, $\beta$ is set to 2 and $c$ is set to half of the maximum $S$-value over all pixels. The weights are then normalized as:

$$W_N(x, y) = \frac{W(x, y) - W_{\min}}{W_{\max} - W_{\min}}, \qquad (21)$$

where $W_{\min}$ and $W_{\max}$ represent the minimum and maximum values in $W$, respectively. The prior weights proposed in this paper allocate appropriate weights to the background edges and corner points, effectively suppressing strong edges and bright corner points while highlighting the target. To compare the impacts of different prior weights, we use the aforementioned state-of-the-art prior weights to compute the local weight maps for three common infrared images, which are presented in Figure 7. The targets are enclosed within red boxes, whereas the residual background is enclosed within yellow elliptical boxes. It is evident that the proposed prior weights offer superior results compared with the other models. Although the target shrinks slightly, the background residuals are significantly suppressed regardless of whether the background is complex or smooth.
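Formula (20) is not reproduced above, so the sketch below combines the shape indicators $R$ and $S$ in a Frangi-style response as one plausible reading; the combination, like the helper name, is an assumption for illustration and not the authors' exact definition.

```python
import numpy as np

def frangi_style_prior_weight(lam1, lam2, beta=2.0):
    """Illustrative Frangi-style prior weight built from the shape indicators
    R and S; the exact combination in Formula (20) may differ (assumed form)."""
    mag1, mag2 = np.abs(lam1), np.abs(lam2)
    small, large = np.minimum(mag1, mag2), np.maximum(mag1, mag2)
    R = small / (large + 1e-12)                    # close to 1 for blob-like pixels, small for edges
    S = np.sqrt(lam1 ** 2 + lam2 ** 2)             # small in flat background regions
    c = 0.5 * S.max() + 1e-12                      # c: half of the maximum S-value, as in the text
    W = (1.0 - np.exp(-(R ** 2) / (2 * beta ** 2))) * (1.0 - np.exp(-(S ** 2) / (2 * c ** 2)))
    return (W - W.min()) / (W.max() - W.min() + 1e-12)   # min-max normalization (Formula (21))
```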
3.5. Sparsity Constraints and Rank Approximation
For the weighted L1-norm sparsity constraint on the target component $\mathcal{T}$ in Formula (9), the widely used sparsity-enhancing reweighting scheme is employed to minimize the L1 norm [49]:

$$W_{sw}^{k+1} = \frac{p}{\left|\mathcal{T}^{k}\right| + \varepsilon}, \qquad (22)$$

where $p$ represents a non-negative constant, $\varepsilon$ is a small positive number introduced to prevent division by zero, and $k$ represents the number of iterations. Combining the sparse weights in Formula (22) and the prior weights in Formula (21), the target weight matrix is constructed as follows:

$$W_T^{k+1} = W_{sw}^{k+1} \odot W_{rec}, \qquad (23)$$

where $W_{rec}$ is the elementwise reciprocal of $W_N$ and ⊙ denotes the Hadamard product.
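A small sketch of this reweighting step under the reconstruction above; the constant $p$, $\varepsilon$, and the combination with the reciprocal prior weights follow the text, while the variable names are my own.

```python
import numpy as np

def update_target_weights(T_k, prior_weights_norm, p=1.0, eps=0.01):
    """Reweighted-L1 sparse weights combined with the reciprocal of the
    normalized local prior weights via the Hadamard (elementwise) product."""
    sparse_w = p / (np.abs(T_k) + eps)               # Formula (22): larger penalty on small entries
    recip_prior = 1.0 / (prior_weights_norm + eps)   # small penalty where the prior suggests a target
    return sparse_w * recip_prior                    # elementwise (Hadamard) product
```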
For the low-rank constraint on the background component $\mathcal{B}$ in Formula (9), the approximation of the tensor rank is a crucial issue. To address this, we adopt the rank approximation based on the Laplace operator used in the MFSTPT model [39], known as the Laplace patch-tensor nuclear norm (LPTNN), to approximate the background tensor rank, as shown in Formula (24). Compared with other surrogates, the Laplace operator assigns automatic weights to the singular values, resulting in a closer approximation to the background tensor rank:

$$\|\mathcal{B}\|_{\mathrm{LPTNN}} = \sum_{i=1}^{n_3} \sum_{j=1}^{p} \left(1 - e^{-\sigma_j^{(i)}/\varepsilon}\right), \qquad (24)$$

where $\sigma_j^{(i)}$ represents the $j$th singular value of the $i$th frontal slice, $n_3$ is the number of frontal slices, $p$ is the number of singular values, and $\varepsilon$ is a small positive number. As a result, the proposed solution model is updated in Formula (25):

$$\min_{\mathcal{B}, \mathcal{T}} \ \|\mathcal{B}\|_{\mathrm{LPTNN}} + \lambda \left\|W_T \odot \mathcal{T}\right\|_1, \quad \text{s.t.} \ \mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N}. \qquad (25)$$
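For intuition, a sketch of evaluating a Laplace-function rank surrogate over the frontal slices of a tensor in the Fourier domain; any normalization factor present in Formula (24) is omitted here, which is an assumption.

```python
import numpy as np

def laplace_tensor_norm(B, eps=0.01):
    """Sum of 1 - exp(-sigma/eps) over the singular values of each frontal
    slice of the tensor in the Fourier (third-mode FFT) domain."""
    B_hat = np.fft.fft(B, axis=2)                         # frontal slices in the Fourier domain
    total = 0.0
    for i in range(B_hat.shape[2]):
        sigma = np.linalg.svd(B_hat[:, :, i], compute_uv=False)
        total += np.sum(1.0 - np.exp(-sigma / eps))       # Laplace surrogate of the slice rank
    return total
```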
3.6. Optimization and Solution of the Proposed Model
The alternating direction method of multipliers (ADMM) has been widely recognized for its fast convergence and high solution accuracy in problems involving low-rank sparse decomposition [50]. We employ ADMM to solve the model in Formula (25) through the following procedure.
Firstly, we express the model to be solved in the form of an augmented Lagrangian function, given by:

$$L_\mu(\mathcal{B}, \mathcal{T}, \mathcal{Y}) = \|\mathcal{B}\|_{\mathrm{LPTNN}} + \lambda \left\|W_T \odot \mathcal{T}\right\|_1 + \langle \mathcal{Y}, \mathcal{D} - \mathcal{B} - \mathcal{T} \rangle + \frac{\mu}{2}\left\|\mathcal{D} - \mathcal{B} - \mathcal{T}\right\|_F^2, \qquad (26)$$

where $\mathcal{Y}$ is a Lagrange multiplier, $\mu$ is a penalty factor greater than 0, $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\mathcal{N}$ is replaced by $\mathcal{D} - \mathcal{B} - \mathcal{T}$ according to the constraints in Formula (25). To solve for $\mathcal{B}$ and $\mathcal{T}$, we optimize one variable while keeping the other fixed. As a result, the problem in Formula (26) can be decomposed into two subproblems:

$$\min_{\mathcal{T}} \ \lambda \left\|W_T \odot \mathcal{T}\right\|_1 + \frac{\mu}{2}\left\|\mathcal{D} - \mathcal{B} - \mathcal{T} + \frac{\mathcal{Y}}{\mu}\right\|_F^2, \qquad (27)$$

$$\min_{\mathcal{B}} \ \|\mathcal{B}\|_{\mathrm{LPTNN}} + \frac{\mu}{2}\left\|\mathcal{D} - \mathcal{B} - \mathcal{T} + \frac{\mathcal{Y}}{\mu}\right\|_F^2. \qquad (28)$$
The first subproblem can be solved efficiently with a soft-thresholding optimization algorithm [51]. By applying the thresholding operator to the tensor elements, the solution to Subproblem (27) for the target tensor is obtained as follows:

$$\mathcal{T} = \mathcal{S}_{\lambda W_T / \mu}\left(\mathcal{D} - \mathcal{B} + \frac{\mathcal{Y}}{\mu}\right), \qquad (29)$$

where $\mathcal{S}_{\tau}(\cdot)$ is the thresholding operator:

$$\mathcal{S}_{\tau}(x) = \operatorname{sgn}(x) \cdot \max\left(|x| - \tau, 0\right). \qquad (30)$$
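A direct implementation of the elementwise soft-thresholding operator; the threshold argument may be an array, so the weighted case in Formula (29) is covered.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding: sign(x) * max(|x| - tau, 0).
    `tau` may be a scalar or an array of per-element thresholds."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Example target update with an elementwise threshold lambda * W / mu (illustrative):
# T_new = soft_threshold(D - B + Y / mu, lam * W / mu)
```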
For the subproblem in Formula (28), the task of solving the background tensor is first cast as the optimization problem in Formula (31). In fact, the non-convex LPTNN rank surrogate can be computed by combining all the frontal slices of the tensor in the Fourier transform domain along the third mode. This transformation converts the problem into an optimization over the sum of the singular values of the $n_3$ frontal-slice matrices, as stated in Formula (32), where the frontal slices are two-dimensional matrices. To address Problem (32), the generalized weighted singular-value thresholding operator [52] is employed, as given in Formula (33). An inverse Fourier transform of the result yields $X$; therefore, the background tensor in Problem (28) can be resolved. The algorithm's flow is illustrated below in Algorithm 2.
Algorithm 2: Solution for the background tensor problem (28).
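A rough sketch of the frontal-slice update in the Fourier domain; the Laplace-derived singular-value weights and the shrinkage rule are assumptions based on the generalized weighted SVT literature, not a verbatim transcription of Algorithm 2.

```python
import numpy as np

def background_update(M, mu, eps=0.01):
    """Approximate the background tensor from M = D - T + Y/mu by applying a
    weighted singular-value threshold to each frontal slice in the Fourier domain."""
    M_hat = np.fft.fft(M, axis=2)
    B_hat = np.zeros_like(M_hat)
    for i in range(M_hat.shape[2]):
        U, s, Vh = np.linalg.svd(M_hat[:, :, i], full_matrices=False)
        w = np.exp(-s / eps) / eps                     # assumed Laplace-derived weights
        s_shrunk = np.maximum(s - w / mu, 0.0)         # weighted singular-value shrinkage
        B_hat[:, :, i] = (U * s_shrunk) @ Vh
    return np.real(np.fft.ifft(B_hat, axis=2))         # back to the spatial domain
```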
The Lagrange multiplier $\mathcal{Y}$ and the penalty factor $\mu$ are updated based on the following formulas:

$$\mathcal{Y} \leftarrow \mathcal{Y} + \mu\left(\mathcal{D} - \mathcal{B} - \mathcal{T}\right), \qquad \mu \leftarrow \rho\,\mu,$$

where $\rho > 1$ is a scaling constant. Algorithm 3 shows the complete process of utilizing the ADMM method to optimize and solve the proposed model. The algorithm's post-processing step applies a simple threshold segmentation to the target component. The threshold value is determined as

$$T_{th} = m + q \cdot s,$$

where $m$ represents the average grayscale value of the entire image, $s$ stands for the standard deviation of the image's grayscale, and $q$ is a constant.
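A one-line illustration of this adaptive threshold (symbol names are my own):

```python
import numpy as np

def segment_targets(target_image, q=3.0):
    """Threshold the recovered target component at mean + q * std (illustrative)."""
    t_th = target_image.mean() + q * target_image.std()
    return target_image > t_th            # boolean detection mask
```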
Algorithm 3: Solution of the proposed model.
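To tie the pieces together, a schematic ADMM loop under the reconstructions above; the update order, stopping tolerance, and penalty growth are assumptions rather than the authors' exact Algorithm 3, and the loop reuses the helper functions sketched earlier in this section.

```python
import numpy as np

def stbmpt_admm(D, W_prior, lam, mu=5e-3, rho=1.5, eps=0.01, tol=1e-7, max_iter=100):
    """Schematic ADMM solver for D = B + T with a weighted-L1 target term;
    relies on soft_threshold, background_update, and update_target_weights above."""
    B, T, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(max_iter):
        W = update_target_weights(T, W_prior, eps=eps)     # reweighted L1 + prior weights
        T = soft_threshold(D - B + Y / mu, lam * W / mu)   # target (sparse) update
        B = background_update(D - T + Y / mu, mu, eps)     # background (low-rank) update
        residual = D - B - T
        Y = Y + mu * residual                              # multiplier update
        mu = rho * mu                                      # penalty growth
        if np.linalg.norm(residual) / (np.linalg.norm(D) + 1e-12) < tol:
            break
    return B, T
```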