1. Introduction
Compared to the visible spectrum, the infrared band is broader and carries more detailed target information, and infrared optical imaging systems exhibit stronger penetration and better anti-interference performance. By passively exploiting the thermal radiation of targets, these systems can operate day and night, adapt to various environments, and remain covert. They are particularly effective under adverse, low-visibility weather such as fog, rain, snow, and frost, making them invaluable for industrial inspection, aerial surveillance, border patrol, and disaster rescue. Among infrared systems, Mid-Wave Infrared (MWIR) systems are sensitive to both thermal and reflective radiation. Cooled infrared detectors significantly reduce the thermal noise generated in high-temperature environments by cooling the core detector components; operating at lower temperatures enhances signal clarity and detection accuracy, and the resulting low-noise characteristics allow weak infrared radiation signals to be captured, substantially improving detection performance. Currently, most infrared imaging systems employ single-aperture configurations. However, as application scenarios and target detection requirements become increasingly diverse, single-aperture systems struggle to meet demands such as a wide field of view or rapid target detection. Building upon traditional single-aperture designs, researchers have therefore developed multi-aperture imaging systems, typically categorized into multi-view vision systems, camera arrays, microlens arrays, and compound eye lenslet arrays. Multi-view vision systems and camera arrays are typically bulky, while microlens arrays are constrained by aperture and focal length limitations, restricting their application scenarios. Consequently, researchers have developed bio-inspired compound eye imaging systems. These systems consist of numerous small optical units, which we term "ommatidia" by analogy to their biological counterparts, enabling wide field-of-view imaging with comparable image quality in a more compact form factor [1,2]. Moreover, compound eye visual information processing differs significantly from that of single-aperture systems. Generally, higher resolution demands greater processing power, and higher pixel counts entail more information to process; compound eye cameras, however, extract valuable information from limited resolution. Additionally, owing to their unique structural design, compound eye imaging systems can acquire more spatial and motion information about targets. Existing compound eye imaging systems are typically designed for visible light or uncooled infrared applications, whereas our cooled bio-inspired infrared compound eye camera exhibits enhanced detection capabilities for weak targets [3].
A key technology in infrared target detection is small target detection. Figure 1 shows a typical scenario of infrared small target detection against an aerial or space background. Infrared small targets are currently defined as targets occupying no more than 9 × 9 pixels [4]. With the advancement of modern technology, rapidly and accurately detecting small targets across diverse scenarios has become a primary task, so developing algorithms with high detection accuracy, strong robustness, and fast processing speed for complex scenarios presents a significant challenge. In an infrared bio-inspired compound eye system, adjacent ommatidia capture the same scene, but because of their geometric arrangement on a spherical surface, the same scene appears at different pixel coordinates across ommatidia, consistent with overlap ratio analysis. At the same time, for the same target, the relative position of each ommatidium results in varying target poses in the captured images; in the compound eye images, targets therefore exhibit different sizes and responses across ommatidia. Detecting all small targets appearing in the ommatidia is essential for further combining their geometric information to estimate target distance ranges.
Infrared small target detection methods can be broadly categorized into model-based approaches and deep learning-based approaches. Owing to the wide field of view and low pixel resolution of individual ommatidia in our infrared compound eye camera, detailed features are difficult to obtain. Therefore, within the model-based framework, we employ Low-Rank and Sparse Decomposition (LRSD) for small target detection. LRSD-based methods assume that infrared images consist of low-rank background signals, sparse target signals, and noise signals [6,7]. Specifically, in infrared imaging, some image patches in the background are approximately linearly correlated, indicating that the background satisfies the low-rank property. For targets, because of the considerable distance between real targets and the imaging system, the targets occupy only a few pixels in the entire infrared image, making the target component sparse relative to the whole image [7]. Performing low-rank and sparse decomposition on the original infrared image separates the low-rank and sparse components, with the sparse component corresponding to the target; applying threshold segmentation to the sparse component then yields the detection result. Based on these assumptions, Gao et al. [7] first developed the Infrared Patch-Image (IPI) model for infrared small target detection. However, on non-smooth backgrounds, the IPI model produces target images with strong edge residuals. Consequently, numerous improved algorithms have been proposed based on IPI, such as the Non-negative Infrared Patch-Image model via Partial Sum Minimization of Singular Values (NIPPS) proposed by Dai et al. [8], and the approach of Zhang and Xue [9,10], which employs a non-convex γ-norm to constrain the background, overcoming the limitations of the nuclear norm in the IPI model and achieving higher detection speeds, while using the $\ell_{2,1}$-norm to constrain sparse edges with approximately linear structures. Dai et al. [11] were the first to extend LRSD-based infrared small target detection from two-dimensional matrices to three-dimensional tensors, proposing the Reweighted Infrared Patch Tensor (RIPT) model. Following this tensor construction scheme, Zhang and Peng [12] utilized the Partial Sum of the Tensor Nuclear Norm (PSTNN) to constrain the low-rank component and effectively suppress the background, and Kong et al. [13] employed the log-based Tensor Fibered Nuclear Norm with hyper total variation (Log-TFNN) to suppress both background and noise.
In image sequences, the background exhibits low-rank properties in the temporal domain, while the target is sparsely distributed in the temporal domain [14]. To leverage the spatial–temporal correlations in infrared image sequences, Sun et al. [15] extended the IPI model from the spatial domain to the spatial–temporal domain. In recent years, many scholars have explored various methods, each with distinct advantages. Liu et al. [16] proposed a Non-convex Tensor Low-rank Approximation (NTLA) with Asymmetric Spatial–Temporal Total Variation (ASTTV) to enhance target detection capabilities. Wu et al. [17] constructed a four-dimensional infrared tensor from a series of infrared images and decomposed it into low-dimensional tensors using the Tensor Train (TT) technique and its extension, Tensor Ring (TR), while preserving the spatial structural features and temporal characteristics of the original data. Liu et al. [18] applied the RCTV method to small target detection in infrared image sequences, improving detection efficiency. Sun [19] proposed a novel approach based on multi-subspace learning and spatial–temporal tensor data structures. Lu [20] introduced a Long-term Spatial–Temporal Tensor (LSTT) model, employing image registration to achieve frame-to-frame alignment and constructing a new image tensor by directly stacking the aligned frames. Wei [21] developed a four-dimensional tensor model based on superpixel segmentation and statistical clustering for infrared dim target detection, further enhancing the spatial structural correlations among image patches. Liu [22] proposed a novel IPT model (termed IPT–TCTV), constructing an improved Spatial–Temporal Tensor (STT) model through sliding 3-D windows, which better preserves the spatial correlation and temporal continuity of multi-frame infrared images in the constructed tensor. Yin [23] introduced a 3-D paradigm framework that incorporates spatial–temporal weighting and regularization into the low-rank sparse tensor decomposition model. Zhao [24] developed an iterative corner and edge weighting method based on tensor decomposition, using corner intensity as the weight for target components and edge intensity as the weight for interference components, enabling more accurate separation of targets and interference. These recent methods preserve the characteristics of the original image sequences during tensor construction and calculate target weights from the original image, demonstrating strong performance in small target detection for single-aperture infrared image sequences. However, they are not well suited to leveraging the unique features of bio-inspired infrared compound eye images.
This paper proposes a low-rank and sparse decomposition method based on the image characteristics of bio-inspired infrared compound eyes. The method reconstructs the structural tensor of our infrared compound eye images according to their features. First, the entire compound eye image is segmented to retain the ommatidia regions. These ommatidia images are then arranged into a tensor along the temporal dimension according to their size and number. A compound eye structural weighting operator is designed to integrate information from all ommatidia images, effectively exploiting the scene correlation between adjacent ommatidia and the scene variability due to different imaging angles. We combine Representative Coefficient Total Variation (RCTV) and a reweighted $\ell_1$-norm that incorporates the compound eye structural weights into a novel model. The model imposes TV regularization on the representative coefficients to enhance computational efficiency and employs the compound eye structural weighting operator to improve the accuracy of target detection, enabling rapid and accurate detection of small infrared targets. Quantitative and qualitative experiments demonstrate that, compared to other methods, this approach offers significant advantages in quickly detecting small targets in infrared compound eye images and effectively suppressing background clutter.
2. Materials and Methods
2.1. Construction of the Compound Eye Spatiotemporal Tensor Model
We have designed a bio-inspired infrared compound eye camera that employs a high-performance cooled mid-wave infrared detector. The array of small lenses is arranged on a curved surface, with their optical axes perpendicular to the spherical mounting structure. The optical axis of the central microlens serves as the principal optical axis of the compound eye, with further optical axes extending outward at intervals of 10°. The microlenses on each concentric circle are evenly spaced, forming a field of view of 108° × 108°. The edge distortion of the microlenses is approximately 4–5%, essentially achieving wide-field imaging with minimal edge distortion [3].
Due to the unique structure and optical system of the bio-inspired infrared compound eye camera, the infrared compound eye images we obtain exhibit distinct structural characteristics. A complete compound eye image contains both ommatidial images with imaging information and non-imaging regions corresponding to the spherical shell. According to the arrangement of ommatidia in the compound eye camera, even the edge ommatidia have three adjacent neighbors, which means the same scene is captured by different clusters of adjacent ommatidia and appears in the imaging results as overlapping scenes with a certain geometric regularity. Typically, infrared images can be represented by the following model [6]:
$$f_D(x, y) = f_B(x, y) + f_T(x, y) + f_N(x, y)$$
Here, the subscripts $D$, $B$, $T$, and $N$ denote the original infrared image, background image, target image, and noise image, respectively. Based on the number of ommatidia and the imaging results from different ommatidia, we can represent the entire infrared compound eye image using the following model:
$$f_D = \bigoplus_{i=1}^{N_o} f_D^{(i)}$$
Here, $f_D^{(i)}$ represents the $i$-th ommatidial image, $N_o$ denotes the total number of ommatidia, and ⊕ is the composition operator, indicating that the ommatidial images, based on the mechanical geometric structure, jointly form the entire compound eye image.
In our previous research, we explored the scene overlap rate between adjacent ommatidia after imaging the same scene with the compound eye camera; this overlap rate can reach 50–70% [25]. Based on this characteristic and the geometric arrangement of the ommatidia, we make the following observations: the background images of adjacent ommatidia are correlated, and the target is present in a subset of adjacent ommatidia. As shown in Figure 2, we utilize these features to reconstruct our infrared compound eye image structural tensor: in the full compound eye image, only the ommatidial regions are retained, and the ommatidial images are stacked into a tensor along the temporal dimension.
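The construction just described can be sketched as follows. The helper below and its inputs (a frame list, precomputed ommatidium centers, and a crop radius) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def build_ommatidia_tensor(frames, centers, radius):
    """Stack square crops around each ommatidium center into a 4-D tensor.

    A minimal sketch of the tensor construction described above.
    frames  : list of F full compound eye images, each (H, W),
              with centers assumed at least `radius` from the border
    centers : (N, 2) integer array of ommatidium centers (row, col)
    radius  : half-size of the square crop enclosing one ommatidium
    returns : tensor of shape (2*radius, 2*radius, N, F)
    """
    F, N, s = len(frames), len(centers), 2 * radius
    tensor = np.zeros((s, s, N, F), dtype=np.float32)
    for f, img in enumerate(frames):
        for n, (r, c) in enumerate(centers):
            # Retain only the ommatidial region; non-imaging shell
            # areas of the full compound eye image are discarded.
            tensor[:, :, n, f] = img[r - radius:r + radius,
                                     c - radius:c + radius]
    return tensor
```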
By extending the IPI model to infrared compound eye images, our infrared compound eye image tensor model can be described as
$$\mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N},$$
where $\mathcal{D}$, $\mathcal{B}$, $\mathcal{T}$, and $\mathcal{N}$ represent the ommatidial tensor, background tensor, target tensor, and noise tensor, respectively.
The assumptions of low rankness and sparsity are well aligned with the structural characteristics of our compound eye infrared imaging data. Each frame consists of multiple ommatidium images capturing the scene from different but overlapping perspectives. This spatial redundancy, especially between neighboring ommatidia, results in strong correlations across the image regions, which in turn gives rise to a low-rank structure in the data.
To empirically verify this, we conducted mode-n unfolding of the four-dimensional data tensor (spatial width, spatial height, number of ommatidia, and time), followed by singular value decomposition. In all four modes, the singular values exhibit a steep decay: only a few components retain most of the energy, while the rest drop rapidly to near zero. This confirms a substantial low-rank structure across all dimensions; the singular value distributions are shown in Figure 2. Furthermore, in the context of infrared small target detection, the low-rank and sparsity priors have a clear physical meaning: the background tends to be smooth and slowly varying, and thus low rank, while small targets are localized and rare, and are naturally modeled as sparse components. This analysis validates the low-rank nature of the background tensor, $\operatorname{rank}(\mathcal{B}) \le r$, where the constant $r > 0$ reflects the complexity of the background, while the target tensor is sparse, satisfying the following condition:
$$\|\mathcal{T}\|_0 \ll \operatorname{size}(\mathcal{D})$$
Typically, the random noise in an infrared image can be assumed to be additive white Gaussian noise of intensity $\delta$, which satisfies the following condition:
$$\|\mathcal{N}\|_F \le \delta$$
Here, $\|\cdot\|_F$ denotes the Frobenius norm. Therefore, the low-rank background tensor and the sparse target tensor can be separated by considering the following problem:
$$\min_{\mathcal{B},\,\mathcal{T}} \ \operatorname{rank}(\mathcal{B}) + \lambda \|\mathcal{T}\|_0 \quad \text{s.t.} \quad \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F \le \delta$$
Here, $\lambda$ denotes the regularization weight parameter. However, since minimizing the $\ell_0$-norm is an NP-hard problem, the $\ell_1$-norm is typically used as a substitute, and the problem can be rewritten as:
$$\min_{\mathcal{B},\,\mathcal{T}} \ \operatorname{rank}(\mathcal{B}) + \lambda \|\mathcal{T}\|_1 \quad \text{s.t.} \quad \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F \le \delta$$
2.2. Compound Eye Structural Weighting Operator
For compound eye images, due to the overlapping imaging of different parts of the same scene by different ommatidia, adjacent ommatidia can capture the same target. Moreover, in our preliminary research, as the detection distance increases, the overlap ratio between ommatidia approaches a fixed value. Although adjacent ommatidia exhibit overlapping scenes, different ommatidia have varying scene complexities. For ommatidia with low scene complexity, the visual saliency of targets is higher. We use this characteristic to compute a weighting operator that enhances the sparsity of targets.
For each segmented ommatidial image, the Laplace operator is used to compute the gradients in four directions. After ignoring gradient values at the edges of the circular ommatidial imaging regions, we normalize the gradient values of each ommatidium and calculate the standard deviation of the normalized gradients. Because scene complexity varies across ommatidia, those with low scene complexity exhibit relatively smaller normalized gradient standard deviations. We take the ommatidial image with the smallest normalized gradient standard deviation as the baseline and, based on the geometric arrangement of the ommatidia on the spherical shell, calculate the horizontal and vertical displacement ranges of the remaining ommatidia relative to this baseline. The estimation method for the displacement ranges is as follows:
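A minimal sketch of this complexity measure and baseline selection is given below; the directional kernels and normalization are reasonable assumptions standing in for the paper's exact operators:

```python
import numpy as np
from scipy.ndimage import convolve

# Directional second-difference kernels (hypothetical choices standing in
# for the four-direction Laplace operator described in the text).
KERNELS = [
    np.array([[0, 0, 0], [1, -2, 1], [0, 0, 0]], dtype=float),   # horizontal
    np.array([[0, 1, 0], [0, -2, 0], [0, 1, 0]], dtype=float),   # vertical
    np.array([[1, 0, 0], [0, -2, 0], [0, 0, 1]], dtype=float),   # diagonal
    np.array([[0, 0, 1], [0, -2, 0], [1, 0, 0]], dtype=float),   # anti-diagonal
]

def gradient_std(omm, mask):
    """Normalized gradient standard deviation inside the circular mask."""
    grads = [np.abs(convolve(omm, k)) for k in KERNELS]
    g = sum(grads)[mask]              # keep only valid imaging pixels,
                                      # ignoring the circular edge region
    g = g / (g.max() + 1e-12)         # normalize to [0, 1]
    return g.std()

def pick_baseline(ommatidia, mask):
    """Index of the lowest-complexity ommatidium (smallest gradient std)."""
    return int(np.argmin([gradient_std(o, mask) for o in ommatidia]))
```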
We define the field of view of the compound eye camera as a function of three quantities: the half-field of view of a single ommatidium, the number of ommatidia, and the arrangement positions of the ommatidia on the spherical shell, which carry the geometric location information of the ommatidia.
Figure 3 shows the lens of our compound eye camera and a schematic diagram of the geometric arrangement of the ommatidia. The overlap rate between adjacent ommatidia can be expressed as [25] (Supplementary Materials):
We use the overlap rate of the imaging area to calculate the pixel displacement range, with the radius of the ommatidial imaging region converting the overlap ratio into pixel units. For all ommatidial images after displacement estimation based on geometric distance, it can be roughly assumed that their local imaging regions correspond to one another. The base ommatidial image can therefore be used to suppress the background gray-level response of the remaining ommatidial images, allowing a tolerance range around the estimated displacements; here, ⊗ denotes the multiplication operation over the regions matched by the displacement results. The resulting weight values are assembled into a tensor consistent with the input image tensor to be detected.
2.3. Weighted Regularization Model
Because the global low-rank property of the background makes it difficult to describe local details within the background, a common strategy for reducing the impact of edges, corners, noise, and other factors on detection performance is to add a regularization term that further constrains the background [26]. This preserves more detail within the background component, thereby reducing its impact on target detection.
Currently, when processing discrete images, two commonly used TV regularization terms are the Anisotropic Total Variation (ATV), based on the $\ell_1$-norm, and the Isotropic Total Variation (ITV), based on the $\ell_2$-norm. ITV tends to smooth the image while preserving edge clarity, whereas ATV places more emphasis on preserving image details and edges. For a given two-dimensional image $\mathbf{X}$ of size $m \times n$, and without considering the boundaries, the isotropic and anisotropic total variation are defined as
$$\|\mathbf{X}\|_{\mathrm{ITV}} = \sum_{i=1}^{m-1}\sum_{j=1}^{n-1} \sqrt{(x_{i+1,j}-x_{i,j})^2 + (x_{i,j+1}-x_{i,j})^2},$$
$$\|\mathbf{X}\|_{\mathrm{ATV}} = \sum_{i=1}^{m-1}\sum_{j=1}^{n-1} \left( |x_{i+1,j}-x_{i,j}| + |x_{i,j+1}-x_{i,j}| \right).$$
Clearly, ITV is isotropic but non-differentiable, and its optimization cannot match the speed, ease, and stability of the separable $\ell_1$-based formulation. Therefore, ATV is more commonly used and generally yields better results than ITV.
For convenience of representation, two auxiliary operators are introduced. Let $D_h$ and $D_v$ denote the two-dimensional difference operators in the horizontal and vertical directions, defined with periodic (circular) boundary conditions so that they act as convolutions. The ATV expression can then be rewritten as
$$\|\mathbf{X}\|_{\mathrm{ATV}} = \|D_h \mathbf{X}\|_1 + \|D_v \mathbf{X}\|_1.$$
In Ref. [27], Theorem 1 states that the spatial information of the original large-size matrix can, to some extent, be reflected in the much smaller coefficient matrix, thereby avoiding complex computations and improving the efficiency of target detection. That is, a matrix $\mathbf{B}$ with rank $r$ admits the decomposition $\mathbf{B} = \mathbf{U}\mathbf{V}^{\top}$, where $\mathbf{U}$ is an orthogonal matrix ($\mathbf{U}^{\top}\mathbf{U} = \mathbf{I}$), and the coefficient matrix can be obtained as $\mathbf{V} = \mathbf{B}^{\top}\mathbf{U}$. The TV semi-norm is applied to each slice of $\mathbf{V}$ [28] and the results are summed, yielding the Representative Coefficient Total Variation (RCTV) regularization:
$$\|\mathbf{B}\|_{\mathrm{RCTV}} = \sum_{j=1}^{r} \left( \|D_h \mathbf{v}_j\|_1 + \|D_v \mathbf{v}_j\|_1 \right),$$
where $\mathbf{v}_j$ denotes the $j$-th slice of $\mathbf{V}$. The expression can be simplified as
$$\|\mathbf{B}\|_{\mathrm{RCTV}} = \|D_h \mathbf{V}\|_1 + \|D_v \mathbf{V}\|_1.$$
Unlike existing TV-based methods that directly apply TV regularization to constrain the background tensor $\mathcal{B}$, RCTV employs TV regularization to constrain the representative coefficient matrix $\mathbf{V}$, describing the local smoothness prior. This method eliminates the need to compute a Singular Value Decomposition (SVD) of the full-size data and to solve complex regularization terms, thus lowering computational complexity and enhancing detection speed.
To further improve the sparsity of targets and differentiate sparse non-target points, we employ a reweighted $\ell_1$ minimization scheme combined with the compound eye structural weighting operator to adaptively assign weights to targets. In our model, ⊙ denotes the Hadamard product; the overall target weight is the Hadamard product of the reweighting term, whose entries are the reciprocals of the current target magnitudes, with the compound eye structural weights; the trade-off parameters are positive; and ε is a minimal value introduced to avoid division by zero in the reweighting computation.
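For readability, the overall shape of this model (Equation (21)) can be written as follows. This is a sketch assembled from the terms described above (RCTV on the representative coefficients plus a weighted $\ell_1$ penalty on the target), with symbol names chosen here for illustration rather than taken verbatim from the paper:
$$\min_{\mathbf{U},\,\mathbf{V},\,\mathcal{T}} \ \|D_h\mathbf{V}\|_1 + \|D_v\mathbf{V}\|_1 + \lambda\,\|\mathcal{W}\odot\mathcal{T}\|_1 \quad \text{s.t.} \quad \|\mathcal{D} - \mathcal{B} - \mathcal{T}\|_F \le \delta,\ \ \mathcal{B} = \mathbf{U}\mathbf{V}^{\top},$$
with the weight tensor updated per iteration as $\mathcal{W} = \mathcal{W}_{\mathrm{ce}} \odot \big(1/(|\mathcal{T}| + \varepsilon)\big)$, where $\mathcal{W}_{\mathrm{ce}}$ denotes the compound eye structural weights.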
2.4. Iterative Optimization Process
Using the Alternating Direction Method of Multipliers (ADMM) [29], we solve Equation (21) by fixing all other variables while solving for one variable at a time. This approach introduces auxiliary variables and Lagrange multipliers, decomposing the original problem into a series of smaller subproblems that are alternately updated to progressively approach the optimal solution of the original problem.
By introducing auxiliary variables, we reformulate Equation (21) into a separable form. We then employ the Augmented Lagrange Multiplier Method (ALMM) [30] to solve the resulting convex optimization problem, which combines nuclear norm and $\ell_1$-norm minimization, rewriting the problem as the augmented Lagrangian of Equation (23). In this formulation, the $\mathcal{Y}_i$ represent the Lagrange multipliers, and $\mu$ denotes the penalty parameter. Subsequently, the ADMM method is employed to iteratively solve Equation (23). The complete solution process is outlined as follows.
- (1)
Updating the sparse auxiliary variables: by fixing all other variables in Equation (23), we obtain an $\ell_1$-regularized subproblem, which is solved in closed form by the soft thresholding function [31], whose standard form is recalled below.
- (2)
Updating the orthogonal basis matrix: by fixing all other variables in Equation (23), we obtain a subproblem whose solution is computed using the theorem given in [32].
- (3)
Updating the representative coefficient matrix: by fixing all other variables in Equation (23), we obtain a quadratic subproblem. Taking the derivative of this subproblem and setting it to zero yields a linear system that involves the difference operators and their "transpose" (adjoint) operators. Treating each difference operation as convolution with the corresponding difference filter, the closed-form solution can be derived by applying the Fourier transform to both sides of the equation and utilizing the convolution theorem [33]. In the resulting expression, $\mathcal{F}(\cdot)$ and $|\cdot|^{2}$ denote the Fourier transform and the element-wise square operation, respectively, while $\mathbf{1}$ represents a tensor with all elements equal to 1. A generic Fourier-domain sketch of this step is given below.
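To make the convolution-theorem step concrete, the following sketch solves a generic quadratic TV subproblem of this type in the Fourier domain. It is a minimal illustration with an assumed penalty parameter `rho` and periodic boundaries, not the paper's exact update:

```python
import numpy as np

def solve_tv_quadratic(rhs, rho=1.0):
    """Closed-form FFT solve of (rho*I + Dh^T Dh + Dv^T Dv) x = rhs.

    Assumes periodic boundaries so the forward difference operators
    diagonalize in the Fourier domain via the convolution theorem.
    rhs : (m, n) right-hand side assembled from the other fixed variables
    """
    m, n = rhs.shape
    # Frequency responses of the forward difference filters [-1, 1].
    dh = np.zeros((m, n)); dh[0, 0], dh[0, -1] = -1.0, 1.0
    dv = np.zeros((m, n)); dv[0, 0], dv[-1, 0] = -1.0, 1.0
    denom = rho + np.abs(np.fft.fft2(dh))**2 + np.abs(np.fft.fft2(dv))**2
    # Element-wise division in frequency replaces the large linear solve.
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
```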
- (4)
Updating the target tensor: by fixing all other variables in Equation (23), we obtain a weighted $\ell_1$ subproblem, which is again solved by soft thresholding with the adaptive weights.
- (5)
Updating the weight tensor: the adaptive weights are recomputed from the current target estimate, as described in Section 2.3.
- (6)
Updating the Lagrange multipliers and the penalty parameter.
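In standard ADMM form, these dual and penalty updates follow the usual rules (shown generically here; the paper's Equations (35)–(37) are not reproduced verbatim):
$$\mathcal{Y}_i^{k+1} = \mathcal{Y}_i^{k} + \mu^{k}\,\big(\text{residual of the } i\text{-th constraint}\big), \qquad \mu^{k+1} = \min\big(\rho\,\mu^{k},\ \mu_{\max}\big), \quad \rho > 1.$$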
Algorithm 1 summarizes the entire solution process. In the initialization step, the input data dimensions represent the pixel size of each ommatidium image, the number of ommatidia, and the number of frames in the image sequence, respectively. The factor matrices are computed from the given rank and trade-off parameters, and the Lagrange multipliers and penalty parameter are initialized to the values listed in the table.
Algorithm 1 Pseudocode outlining the main steps of the proposed algorithm
Input: compound eye image sequences
Initialization: model parameters, Lagrange multipliers, and penalty parameter
While not converged do
 1: Update the sparse auxiliary variables via Equation (25)
 2: Update the orthogonal basis matrix via Equation (27)
 3: Update the representative coefficient matrix via Equation (30)
 4: Update the target tensor via Equation (32)
 5: Update the weight tensor via Equation (33)
 6: Update the Lagrange multipliers via Equations (35) and (36)
 7: Update the penalty parameter via Equation (37)
 8: Check the convergence conditions
End while
Output: T
2.5. Complexity Analysis
The computation of the compound eye structural weights involves only matrix operations, so we primarily analyze the computational complexity of the iterative optimization process. For the constructed infrared image tensor, the per-iteration costs are as follows: the soft thresholding updates are element-wise and scale linearly with the tensor size; the FFT-based update of the representative coefficient matrix scales log-linearly with the tensor size; the update of the orthogonal factor requires an SVD whose cost grows with the tensor size and the (small) rank; and updating the weights and other parameters is linear in the tensor size. The overall complexity of the iterative solution is therefore dominated by the FFT and SVD steps.
For an input image of size 640 × 640, the detection time per frame is approximately 0.1821 s. This includes all preprocessing steps applied to the original image, such as retaining only the valid imaging regions and computing the weighting operator. These steps involve various matrix operations, and there is room for further optimization in future implementations to improve computational efficiency.
2.6. Convergence Analysis
Following the above algorithm, each variable is solved iteratively, and the optimization terminates when the relative change between successive iterates or the reconstruction residual falls below a preset tolerance; if any of these conditions is met, the iteration stops.
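A typical stopping rule of this kind (a standard choice; the paper's exact thresholds are not reproduced) is
$$\frac{\|\mathcal{T}^{k+1} - \mathcal{T}^{k}\|_F}{\|\mathcal{D}\|_F} \le \epsilon \quad \text{or} \quad \frac{\|\mathcal{D} - \mathcal{B}^{k+1} - \mathcal{T}^{k+1}\|_F}{\|\mathcal{D}\|_F} \le \epsilon,$$
for a small tolerance $\epsilon > 0$.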
2.7. Target Detection Process
As shown in Figure 4, the input sequence images are cropped to retain the effective imaging regions and unfolded according to the ommatidia image dimensions and the temporal sequence to obtain the input tensor. Simultaneously, the effective imaging regions of all ommatidia in a single frame are used to compute the compound eye structural weight tensor. The results are then iteratively updated following Algorithm 1.
3. Results
3.1. Objective Evaluation Metrics
- (1)
Model-based approach evaluation metrics
To assess the effectiveness of the algorithm, we adopt four widely used metrics: the Receiver Operating Characteristic (ROC) curve, Signal-to-Clutter Ratio Gain (SCRG), Background Suppression Factor (BSF), and Contrast Gain (CG). The ROC curve provides a comprehensive evaluation of detection performance: its y-axis represents the True Positive Rate (TPR), and its x-axis represents the False Positive Rate (FPR).
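These rates are computed in the standard way for small target detection (definitions consistent with their usage here):
$$\mathrm{TPR} = \frac{\text{number of correctly detected targets}}{\text{number of actual targets}}, \qquad \mathrm{FPR} = \frac{\text{number of false alarm pixels}}{\text{total number of image pixels}}.$$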
Detection performance can be quantified using the Area Under the Curve (AUC). A higher AUC value indicates better detection performance. The Signal-to-Clutter Ratio (SCR) measures the difficulty of detecting targets in infrared images.
It is defined as
$$\mathrm{SCR} = \frac{|\mu_t - \mu_b|}{\sigma_b},$$
where $\mu_t$ represents the average grayscale of the target region, $\mu_b$ denotes the average pixel value of the local neighborhood region, and $\sigma_b$ is the standard deviation of that neighborhood. $\mathrm{SCR}_{\mathrm{in}}$ and $\mathrm{SCR}_{\mathrm{out}}$ refer to the SCR values of the input source image and the output detection image, respectively, and the Signal-to-Clutter Ratio Gain is defined as
$$\mathrm{SCRG} = \frac{\mathrm{SCR}_{\mathrm{out}}}{\mathrm{SCR}_{\mathrm{in}}}.$$
The effectiveness of background suppression is quantified by the Background Suppression Factor (BSF), expressed as
$$\mathrm{BSF} = \frac{\sigma_{\mathrm{in}}}{\sigma_{\mathrm{out}}},$$
where $\sigma_{\mathrm{in}}$ and $\sigma_{\mathrm{out}}$ represent the standard deviations of the local region before and after suppression, respectively. In summary, higher SCRG and BSF values indicate better target enhancement and background suppression. However, in methods based on sparse and low-rank recovery, the background may be suppressed almost completely, making the output standard deviation nearly zero and the calculated value Inf. To address this issue, we additionally use the Contrast Gain (CG) metric, defined as
$$\mathrm{CG} = \frac{\mathrm{CON}_{\mathrm{out}}}{\mathrm{CON}_{\mathrm{in}}},$$
where $\mathrm{CON}_{\mathrm{in}}$ and $\mathrm{CON}_{\mathrm{out}}$ represent the contrast of the input and output infrared images, respectively. The contrast is calculated as
$$\mathrm{CON} = |\mu_t - \mu_b|.$$
Furthermore, in the measurements of SCRG, BSF, and CG, the neighborhood size is defined as the area of the effective imaging region of a single ommatidium.
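Assuming binary masks for the target region and its neighborhood (the effective imaging region of one ommatidium, per the text), these metrics can be computed as in the following sketch:

```python
import numpy as np

def scr(img, t_mask, b_mask):
    """Signal-to-clutter ratio |mu_t - mu_b| / sigma_b on given masks."""
    mu_t, mu_b = img[t_mask].mean(), img[b_mask].mean()
    return abs(mu_t - mu_b) / (img[b_mask].std() + 1e-12)

def metrics(inp, out, t_mask, b_mask):
    """SCRG, BSF, and CG as defined above; small epsilons stand in for
    the Inf cases that arise when the background is fully suppressed."""
    scrg = scr(out, t_mask, b_mask) / (scr(inp, t_mask, b_mask) + 1e-12)
    bsf = inp[b_mask].std() / (out[b_mask].std() + 1e-12)
    con_in = abs(inp[t_mask].mean() - inp[b_mask].mean())
    con_out = abs(out[t_mask].mean() - out[b_mask].mean())
    return scrg, bsf, con_out / (con_in + 1e-12)
```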
- (2)
Deep learning-based approach evaluation metrics
To fairly evaluate and compare the performance of both traditional model-based and deep learning-based small target detection methods, we adopt two commonly used metrics: precision and recall. These metrics are model-agnostic and can be consistently applied across different types of detection outputs (saliency maps, confidence maps, bounding boxes), making them well suited for cross-method evaluation.
Recall is defined as the proportion of true targets that are successfully detected, and precision as the proportion of correct detections among all detections:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively.
Furthermore, the Precision–Recall (PR) curve provides an intuitive and informative visualization of the trade-off between these two metrics, making it well suited for comparing detection performance across fundamentally different methodologies.
3.2. Data Characteristics
A collimator, blackbody, and circular aperture target are combined to form a simulated infinite-distance target source system, as shown in Figure 5. The radiative flux of the target source is controlled by adjusting the blackbody temperature, while the projected cross-sectional size of the target source is simulated by substituting circular aperture targets of different sizes.
A rotating reflection device reflects the radiation from the target source at a fixed angle and, through a rotating arm moving within the vertical plane, generates an optically moving target. The optical axis of the infrared compound eye camera is aligned with the center of the target’s rotational axis to ensure that the optical motion target appears to move with a certain line-of-sight angular velocity and angular acceleration within the compound eye camera’s field of view. By setting the angular velocity of the rotating optical target, the equivalent angular velocity of the target relative to the compound eye camera can be precisely adjusted.
Figure 6a–c show different small target datasets, where the targets simulate objects at an infinite distance, and the backgrounds represent the laboratory environment.
To analyze the performance of our small target detection method across various scenarios, we captured image sequences using the infrared bio-inspired compound eye camera. By moving the camera at non-uniform speeds, we captured several common ground-based infrared scenes as data backgrounds. Panels (1)–(5) in Figure 6 depict buildings at different distances and densities, forests, and sky with clouds.
We employed the target embedding method described in the IPI work [7] to add targets: the normalized target image is blended into the normalized background image, with the target amplitude scaled by the maximum grayscale value of the background scene of the ommatidium containing the target.
The data characteristics are listed in Table 1.
3.3. Parameter Settings
- (1)
Frame Count: We evaluate the impact of different F values on algorithm performance by varying F from 1 to 10 in increments of 1. We use ROC curves to compare the results of different parameter settings.
- (2)
Trade-off parameter: the trade-off parameter between target and background is related to the image dimensions and is typically computed from the parameter H in combination with the image size and frame count. We perform parameter analysis with different H values, adjusting H from 1 to 10 in increments of 1.
- (3)
Rank: the rank primarily describes the low-rank property of infrared images; it determines the size of the orthogonal matrix U and therefore affects the computational complexity. We analyze the impact of the rank on algorithm performance by varying it in increments of 1, with F set to 10 frames.
- (4)
α: α is used to balance the target and background. Typically, a lower α-value enables the reconstruction of more detailed background information, but the target may also be retained in the background, leading to missed detections. A higher α-value results in a coarser background, with many background details potentially preserved in the target image, causing false alarms.
Based on the results in Figure 7, we set the parameters to the best-performing values identified in this analysis.
3.4. Ablation Study
We conducted an ablation study to demonstrate the importance of the weighting operator. As shown in Figure 8, we compare detection results with and without the weighting operator across five image sequences. It can be clearly observed that incorporating the weighting operator effectively suppresses false alarms.
3.5. Comparative Algorithms
We primarily compared several classic small target detection algorithms and recent low-rank sparse decomposition-based algorithms that have demonstrated excellent performance on single-aperture infrared image sequences: 4D_TT and 4D_TR [17], ADMD [34], ASTTV_NTLA [16], ICEW [24], IPI [7], RCTV [26], RCTVW [18], NFTDGSTV [35], Top_hat [36], and MPCM [37]. The parameters for all algorithms are listed in Table 2, and each algorithm is applied to all ommatidial imaging regions during data processing. We also compared several representative deep learning-based small target detection methods: SSD [38], YOLOv5 [39], ILNet [40], and MSHNet [41].
3.6. Visual Analysis
As shown in Figure 9, 4D_TT and 4D_TR achieve good detection results but suffer from missed detections when target responses are low and from false alarms in regions with high background response. RCTV detects most targets but generates numerous false alarms, especially in ommatidia that do not capture targets. RCTVW further improves target–background separability, yet it still produces false alarms in non-target ommatidial regions and missed detections in images with low target response. MPCM and ADMD miss detections even for strong targets, indicating that the local features of small targets in compound eye images are not prominent. ASTTV_NTLA and NFTDGSTV produce fewer false alarms in images with smooth background grayscale transitions but tend to detect bright background regions as targets. ICEW and Top_hat show varying degrees of false alarms across the image sequences. The IPI algorithm detects targets almost completely but still misidentifies background regions as targets in ommatidial images without targets.
Traditional model-based small target detection methods typically produce intermediate or post-processed result maps, which are well suited for direct visual comparison and qualitative analysis. In contrast, deep learning-based methods often provide detection results in the form of bounding boxes or confidence maps, which differ in format and interpretation.
Therefore, in the qualitative analysis section, we mainly focus on comparing traditional algorithms, as their visual outputs are more consistent and interpretable for analysis. Nonetheless, we have included quantitative comparisons with deep learning-based detection networks to ensure a comprehensive evaluation of all methods.
3.7. Quantitative Analysis
Since detection methods based on low-rank sparse decomposition typically suppress the background almost completely, the output standard deviation becomes nearly negligible, resulting in Signal-to-Clutter Ratio Gain (SCRG) and Background Suppression Factor (BSF) values of Inf, as shown in Table 3. Our method better suppresses the background within the ommatidial imaging regions, and its higher Contrast Gain (CG) values indicate enhanced contrast. The proposed algorithm therefore demonstrates strong background suppression performance.
As illustrated in Figure 10 and Table 3, across the five image sequences our detection algorithm achieves more comprehensive target detection in each ommatidium at a given false alarm rate, exhibiting a higher detection rate.
In addition to the comparison with traditional model-based methods, we evaluate our approach against several representative deep learning-based small target detection networks, including SSD [38], YOLOv5 [39], ILNet [40], and MSHNet [41]. These methods are trained and tested on our compound eye infrared dataset with a training-to-testing split ratio of 9:1. The comparison in Figure 11 further validates the effectiveness and generalizability of our method under the same experimental conditions.
Although the current deep learning-based small target detection methods included in our comparison demonstrate limited performance on our infrared compound eye dataset, we believe that developing deep learning approaches specifically tailored for compound eye infrared imaging is a promising and meaningful direction.
In future work, we plan to construct a more diverse and task-specific dataset to better support data-driven learning. Additionally, we aim to explore novel neural network architectures that are well suited to the unique spatial and angular structure of compound eye images, with the goal of enhancing detection performance under complex scenarios.
3.8. Noise Impact
Our equipment utilizes a cooled detector, which mitigates the influence of noise to some extent. Nevertheless, it remains important to evaluate the robustness of our algorithm to noise. As illustrated in Figure 12, even after introducing varying degrees of Gaussian white noise, the proposed algorithm still detects all small targets, demonstrating its robustness in noisy environments.
3.9. Algorithm Runtime Analysis
The experiments were run in MATLAB R2020b on a 12th-generation Intel(R) Core(TM) i5-12600K processor at 3.70 GHz under Windows 11. The original single-frame infrared compound eye image input has dimensions of 640 × 640 pixels.
Taking into account the runtime and detection efficiency of the various methods in Table 4, our approach demonstrates superior performance.
4. Discussion
The ability to accurately detect small targets in all ommatidium images greatly benefits the further application of compound eye cameras.
Upon obtaining the pixel coordinates of all detected targets, the motion trajectory of the targets can be determined by combining the geometric structure of the compound eye camera with the camera calibration results. As illustrated in Figure 13, we simulated the 19-ommatidia structure of our compound eye camera and calculated the spatial coordinates of a target from the pixel coordinates of an ideal imaging result of a point in space; the error of the calculation is shown in Figure 13b. The method establishes imaging equations using the pixel coordinates from a varying number of ommatidia and derives the optimal result through the least squares method. The results indicate that detecting all targets in a single-frame compound eye image and utilizing more target information significantly reduces computational error. Our method's lower false alarm rate and higher detection rate therefore greatly aid the further recovery of target spatial information.
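As an illustration of this least-squares estimation, the sketch below intersects viewing rays from several ommatidia; the ray representation is an assumption standing in for the paper's actual imaging equations and calibration model:

```python
import numpy as np

def triangulate(rays_o, rays_d):
    """Least-squares estimate of a 3-D point from several ommatidium rays.

    Each ommatidium contributes a viewing ray (origin, unit direction)
    derived from its calibration and the target's pixel coordinates.
    rays_o : (N, 3) ray origins; rays_d : (N, 3) unit direction vectors.
    Requires at least two non-parallel rays, otherwise A is singular.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(rays_o, rays_d):
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ o
    # Point minimizing the sum of squared distances to all rays;
    # using more ommatidia (larger N) reduces the estimation error.
    return np.linalg.solve(A, b)
```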
Although the proposed method is evaluated using data from a custom-built mid-wave infrared (MWIR) compound eye camera, the underlying detection framework is not tightly coupled to any specific hardware configuration or waveband. The method is based on general assumptions about the data, such as the sparsity of target signals, the low-rank nature of the background, and the spatial redundancy among ommatidia—features that are generally applicable to infrared imaging in both MWIR and long-wave infrared (LWIR) systems.
Therefore, we expect the method to generalize well to other compound eye platforms with different microlens arrangements or operating wavelengths. Further validation on diverse datasets and hardware will be part of our future work to enhance the robustness and applicability of the proposed approach.