1. Introduction
Space surveillance is paramount in space situational awareness (SSA) due to its significant effect on military action and space operations. In this field, space targets refer to spacecraft in orbit, space debris, near-Earth objects (NEOs) and so on. Developments in space, electronics and processing technologies have made it possible to realize space target detection using different methods such as radar and optical systems. In 1990, the first software algorithm based on CCD images was designed to detect space targets [1]. In the decades that followed, many experiments based on optical measurements demonstrated the feasibility and efficiency of this kind of method [2,3,4]. However, in visual images, space targets usually appear small, and their intensity and shape are extremely variable due to the long imaging distance. Because they follow the same imaging principles, the targets sometimes have a distribution similar to that of stars, especially when the exposure time is short. Thus, space target detection in a single image usually results in many stars being falsely detected. In addition, stray light, thermal noise, CCD imaging system noise and other interference may influence the detection. Together, these factors make space target detection challenging.
In recent decades, numerous studies have been dedicated to space target detection. Generally, a space target is presented as a light spot in an optical image, which is similar to the traditional infrared small target problem. Current small target detection methods can be broadly categorized into track-before-detect (TBD) methods [5] and detect-before-track (DBT) methods [6]. TBD methods usually utilize a sequence of frames and some prior knowledge of targets [7]. Reed [8] used a 3D matched (directional) filter to detect targets moving in uniform motion based on their shape and velocity. On this basis, the 3D double directional filter [9] and the improved 3D directional filter [10] were then proposed to improve the detection ability for weak targets. Moreover, a modified partial differential equation [11] and support vector machines [12] have been used to suppress the background and remove false alarms. Compared with TBD methods, DBT methods require fewer assumptions and less prior knowledge, so they are among the most commonly applied methods in engineering. For detection in single images, image binarization is a widely applied method. Stoveken [13] summarized several methods to identify objects, using star catalogs, object characteristics and median images to realize image binarization. On this basis, Virani [14] detected and tracked objects using a signal-to-noise ratio (SNR) threshold and a Gaussian mixture-based probability hypothesis density (GM-PHD) filter.
One of the earliest programs dedicated to detection was the Lincoln Near Earth Asteroid Research (LINEAR) program, in which a binary hypothesis test (BHT) was utilized to generate a binary map where the value of detected targets is 1 [15]. Zingarelli [16] proposed a new method combining local noise statistics and a multi-hypothesis test (MHT) to reject outliers. Hardy [17] found that an unequal-cost MHT achieves better performance than an equal-cost MHT when the potential intensity of targets covers a large range. Tompkins [18] used MHTs and parallax to find the difference between the point spread functions (PSFs) observed from a star and from Resident Space Objects (RSOs).
Another, more computationally complex, approach is the matched filter. The core of a matched filter approach is the match between the observed data and the expected PSF. Among these methods, the SExtractor software is widely used for object detection within the SSA community [19]. Bertin [20] has conducted further work to apply and update the software over the years. Murphy [21] used the multi-Bernoulli filter to detect space objects with a low SNR. Furthermore, they utilized prior knowledge to design a bank of matched filters, and the filtering results were transformed into the measurement likelihood for a Bayesian update and subsequent detection [22]. Lin [23] researched the essential features of targets and proposed the multiscale local target characteristic (MLTC) to judge whether a local area contains a target.
In other small target detection fields, the max-mean and max-median filters [24], top-hat filter [25], two-dimensional least mean square (TDLMS) filter [26] and other methods are widely used to suppress backgrounds and enhance targets. Recently, methods based on the human visual system (HVS), such as the local contrast measure (LCM) [27], improved local contrast measure (ILCM) [28], relative local contrast measure (RLCM) [29], neighborhood saliency map (NSM) [30] and weighted strengthened local contrast measure (WSLCM) [31], have been introduced for target detection. Some methods regard the detection task as a two-class recognition problem. Qin [32] addressed the problem with a novel small target detection method based on the facet kernel and random walker (FKRW) algorithm.
In the latest developments, many methods based on deep learning have been proposed, such as region-based convolutional neural networks (RCNNs) [33,34], YOLO [35,36,37,38], SSD [39] and the DEtection TRansformer (DETR) [40]. These methods learn features from huge amounts of data rather than relying on handcrafted features, so they perform well when the data are rich and distinctive. However, space targets are small and contain little texture information. Therefore, popular deep learning methods are not directly suitable for the detection of space targets. Jia [41] proposed a method based on the concept of the Faster R-CNN to detect and classify astronomical targets in images of certain sizes. Xi [42] proposed a space debris detection method using feature learning of candidate regions (FLCR) in optical image sequences. This method still has difficulty exploiting the spatial features of dim and small objects sufficiently. In addition, some deep-learning-based methods have been used for small targets by generating backgrounds and separating small targets, including a semantic constraint [43], a denoising autoencoder [44], generative adversarial networks (GANs) [45,46], a spatial-temporal feature-based method [47] and a feature extraction framework combining handcrafted feature methods and convolutional neural networks [48]. Li [49] designed a network named BSC-Net (Background Suppression Convolutional Network) to realize the background suppression of stray light in star images. However, this method is still unable to filter out weak targets.
The above methods still suffer from difficulties that need to be overcome when they are used for space target detection.
- 1.
Even though the intensities of the same target in different images, or of different targets in the same image, change dramatically, their distributions remain highly similar. However, it is difficult for traditional methods based on theoretical models to summarize a generalized distribution function of space targets that covers different situations.
- 2.
In the stage of final target confirmation, the thresholds are usually set based on information from the global image. This can barely balance the detection of weak targets against a decrease in false alarms.
- 3.
A star image contains a large number of targets that can be used to build a dataset, which provides a good foundation for building a deep convolutional neural network that can approximate the distribution function of different targets. However, due to the diverse detectability of optical systems, even images pointing to the same sky area contain different magnitudes and quantities of stars. Thus, although there are thousands of targets in an image, not all of them are certain, which results in a higher cost of annotation for a full image and a lower validity of detection.
- 4.
Common end-to-end detectors based on CNNs [50] do not perform well for small target detection. Besides, most existing network structures are complex and impose high requirements on in-orbit hardware resources.
To address these problems, this paper aims to develop a novel and low-cost method for space target detection based on distribution characteristics, which can potentially be used in actual in-orbit service. In this paper, a CNN is applied to learn the distribution of targets automatically, with no need for a specific matching function or criteria. Proposals are separated from the background, located at points whose intensity is higher than that of the local area, and confirmed depending on whether they follow a particular distribution. Real data are used to test the proposed method. The experimental results demonstrate that our method is simple and effective for targets with diverse signal-to-clutter ratio (SCR) values. Furthermore, the proposed method shows a significant superiority over baseline methods.
As a whole, the contributions of this paper can be summarized as follows:
- 1.
A method is presented to detect space targets by modeling the global detection of an image as local two-class recognition of several candidate regions so that manual labeling for a full image is not required.
- 2.
We design a small architecture of the CNN to realize automatic feature extraction and target confirmation, which avoids the problem of threshold setting for the final target confirmation. Furthermore, the architecture can potentially be used in actual in-orbit services because it is easy to implement and requires only a low cost in terms of computing resources.
- 3.
We add a normalization operation and a module of information guidance to improve the detection ability. The former is applied to each candidate region to reduce the difference in intensity among targets, which confers a better generalization ability on the network. The latter is added to provide extra features; in this way, the interference caused by pixels in the non-central area is further reduced and the distribution characteristics of the target are enhanced. This design shows good performance in experiments.
The remainder of this paper is organized as follows. In Section 2, we explain the target detection method based on a CNN. In Section 3, we introduce the implementation in detail. Extensive experimental results and discussions are given in Section 4, and the conclusions are provided in Section 5.
2. Proposed Method
2.1. Problem Description
Traditional methods usually design features and standards to distinguish targets from backgrounds. However, an image contains not only a large number of star targets and space targets whose intensity and size are extremely variable, but also contamination and noise such as smear effects and stray light [23]. Methods based on a normal threshold policy, such as thresholding, pyramids and morphology operations, can hardly maintain robustness when facing different complex backgrounds. A large threshold means a loss in the detection ability for weak targets, while a small threshold may result in many false alarms and connections between target regions.
Generally, when the relative motion between the optical system and the target is small during the exposure time, the intensity of a star image f can be regarded as a combination of the target t and noise n:

f(x, y) = t(x, y) + n(x, y),
t(x, y) = S · exp(−((x − x0)^2 + (y − y0)^2) / (2σ^2)),

where the noise n includes background and space environmental effects, (x0, y0) is the center coordinate of the target, S is the scale coefficient and σ is the standard deviation of the PSF.
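As a minimal sketch, the imaging model above can be simulated as follows; the function name and the specific parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def target_model(size, x0, y0, S, sigma):
    """Ideal target intensity t(x, y): a 2-D Gaussian PSF with scale
    coefficient S, center (x0, y0) and standard deviation sigma."""
    y, x = np.mgrid[0:size, 0:size]
    return S * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

# A noisy star image f = t + n, with n modeled as Gaussian background noise.
rng = np.random.default_rng(0)
t = target_model(11, 5.0, 5.0, S=100.0, sigma=1.5)
f = t + rng.normal(0.0, 2.0, t.shape)
```

Note that the peak of the noiseless model sits exactly at the target center, which is the property exploited by the region proposal search later on.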
However, targets, including space targets and star targets, usually do not follow the standard distribution in practical engineering, because when there is stray light in the image, some targets in the affected area may have a low signal-to-noise ratio and their original distribution characteristics will be overwhelmed by noise. At the same time, the camera and space target in the space-based observation system may move, which not only results in a certain image displacement of the target during the exposure time but also imposes a negative impact on the target distribution. It is difficult to summarize a generalized distribution function of space targets and to set thresholds that cover different situations. Even though the stars and targets are diverse in shape and intensity, the targets maintain the following characteristics:
- 1.
The target center is the local maximum.
Figure 1 shows the original images and the corresponding intensity distribution of the same target with low SCR values in different frames, where the center of the target in each image is marked in red. The distribution of the target is easily influenced by noise (Figure 1c) and by the PSFs of close stars and other targets (Figure 1d). The intensity value of each pixel inside the target region is the superposition of the target and other interference; thus, the target center still retains its saliency unless it is submerged.
- 2.
When the target has no fast relative motion, its distribution approximates a Gaussian distribution. The difference is that the actual distribution extends in the direction of its motion because of motion blur if the exposure time is long or the target has a high speed, as shown in Figure 2.
Considering that there is no relevant public dataset, another problem that needs to be solved is the construction of the dataset. There are differences among optical systems that affect the image quality, the representation of stars and targets and the identification of stars and targets. Even images of the same patch of sky contain different numbers of stars and targets because of different cameras or parameters, so it is difficult to label a whole image. In addition, even a simulation can hardly describe the full distribution because of the diversity of targets in shape, size and intensity.
Thus, we propose a method composed of the following three main phases, as depicted in Figure 3. All displayed images have been contrast-stretched. The first phase is a preprocessing stage, where a set of techniques is applied to the image, resulting in a background-removed image. The second phase is the region proposal search, where local maximum points in the foreground are extracted and regions of different sizes (marked in red, green and blue) around these points are formed as proposals. The third phase is the certification of proposals with a specially designed CNN. The detected targets are shown in different colors according to the size with the highest confidence.
2.2. Preprocessing Stage
The method used in SExtractor [19] is introduced to suppress the background due to its effectiveness and suitability for this research. Grids are set to divide a star image into background meshes. For each mesh, its local background histogram is clipped iteratively at ±3σ around its median until convergence. During the process, if σ drops by less than 20%, the mean of the clipped histogram is regarded as the background value. Otherwise, the local background B is estimated with:

B = 2.5 × Median − 1.5 × Mean.

The background map and the root mean square (RMS) map are then generated by a bilinear interpolation between the meshes of the grid. The mesh size is set to 32 here. The foreground F(x, y), mainly including stars and targets, is segmented from the background B(x, y) as follows:

F(x, y) = f(x, y) − B(x, y).

The RMS map is applied for binarization as follows:

I_b(x, y) = 1 if F(x, y) > RMS(x, y), and 0 otherwise,

where I_b is the binarization image and RMS is the RMS map.
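The mesh-based background estimation can be sketched as below. This is a simplified, assumed implementation: the iteration count and the nearest-neighbour upsampling (SExtractor itself interpolates bilinearly between meshes) are illustrative choices; only the mode estimator 2.5 × Median − 1.5 × Mean follows the documented SExtractor formula.

```python
import numpy as np

def mesh_background(image, mesh=32, clip=3.0, iters=5):
    """Sigma-clip each mesh of the grid and estimate its background level.
    When clipping shrinks sigma by more than 20%, the mesh is considered
    crowded and the mode estimator 2.5*median - 1.5*mean is used instead
    of the clipped mean."""
    h, w = image.shape
    bg = np.zeros((h // mesh, w // mesh))
    for i in range(h // mesh):
        for j in range(w // mesh):
            cell = image[i * mesh:(i + 1) * mesh, j * mesh:(j + 1) * mesh].ravel()
            sigma0 = cell.std()
            for _ in range(iters):                  # iterative 3-sigma clipping
                med, sig = np.median(cell), cell.std()
                kept = cell[np.abs(cell - med) <= clip * sig]
                if kept.size == cell.size:
                    break
                cell = kept
            if cell.std() > 0.8 * sigma0:           # sigma dropped < 20%: plain mean
                bg[i, j] = cell.mean()
            else:                                   # crowded mesh: mode estimator
                bg[i, j] = 2.5 * np.median(cell) - 1.5 * cell.mean()
    # Expand the mesh grid to full resolution (nearest-neighbour here).
    return np.kron(bg, np.ones((mesh, mesh)))

flat = np.full((64, 64), 10.0)
bgmap = mesh_background(flat)   # a flat background is recovered unchanged
```

Subtracting the returned map from the image then yields the foreground F used in the following steps.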
2.3. Region Proposal Search
According to the previous analysis, there is high relevance between target centers and local maximum points in an image. This feature can be applied to find region proposals of targets. Local maximum values in an image are extracted by an expansion operation:

F_e(x, y) = max_{(i, j) ∈ R_e} F(x + i, y + j),

where F_e is the expanded image and R_e is the region of the expansion operation, which is decided according to the minimum search distance between targets.

It is difficult to distinguish a target with a radius of 1 from thermal noise; thus, the smallest pixel-level radius at which one can judge whether a region contains a target is 2. We therefore set the size of R_e to 5 (5 × 5), i.e., a searching radius of 2, to maintain the sensitivity of the method to two close targets in this paper. The point set of suspected center points P is defined as

P = {(x, y) | F(x, y) = F_e(x, y), F(x, y) > 0}.

Regions of different sizes around each point will be extracted as proposals. A smaller region size may result in more proposals containing only the background; thus, these proposals will be classified by the CNN designed in the next subsection.
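A possible sketch of the local-maximum search via a grey-scale dilation follows; all names are illustrative, as the paper's exact implementation is not given in this excerpt:

```python
import numpy as np

def local_maxima(fg, size=5):
    """Grey-scale dilation with a size x size window; points equal to
    their dilated value (and above zero in the foreground) are the
    suspected target centers."""
    r = size // 2
    padded = np.pad(fg, r, mode="constant", constant_values=-np.inf)
    dilated = np.zeros_like(fg)
    h, w = fg.shape
    for y in range(h):
        for x in range(w):
            dilated[y, x] = padded[y:y + size, x:x + size].max()
    return [tuple(p) for p in np.argwhere((fg == dilated) & (fg > 0))]

fg = np.zeros((9, 9))
fg[4, 4] = 5.0
fg[4, 5] = 3.0   # dimmer neighbour inside the search window, suppressed
centers = local_maxima(fg)
```

With a 5 × 5 window, two bright points closer than the searching radius collapse into one candidate, which matches the sensitivity trade-off discussed above.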
2.4. Target Detection with an SF-CNN
For each searched region proposal, we need to confirm whether a target is present. In this way, target detection in the whole image is transferred into the recognition of sub-images. In order to solve the problem of simple segmentation thresholds in traditional methods and to improve the generalization ability for targets of different shapes, a method based on deep learning is considered. Given the limitations of hardware resources and the computing pressure in orbit, we use a simple and small CNN, called the SF-CNN, rather than state-of-the-art deep learning models. This network is trained to obtain a small model C that classifies a sub-image as target or non-target.
After the preprocessing stage and the region proposal search, the input of the model C is each processed proposal, and the output is the predicted class, i.e., target (1) or non-target (0), along with a corresponding confidence value (the output of C). Training is stopped once the loss stops decreasing and approximately converges to a constant value. The trained C is then considered to have learned the distribution of small targets in the dataset, so it can be directly used to predict the class of inputs unless the distribution of targets is severely impacted by the background or nearby stars in the image.
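This excerpt does not specify the layer configuration of the SF-CNN, so the following PyTorch sketch only illustrates its role as a small two-class classifier over proposal patches; the channel counts, the 11 × 11 patch size and the absence of pooling layers are assumptions:

```python
import torch
import torch.nn as nn

class SFCNN(nn.Module):
    """Illustrative two-class classifier for small proposal patches.
    No pooling layers are used, since pooling would discard most of a
    small target's pixels."""
    def __init__(self, in_size=11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(16 * in_size * in_size, 2)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SFCNN()
logits = model(torch.randn(4, 1, 11, 11))   # a batch of 4 proposals
conf = logits.softmax(dim=1)                # per-class confidence values
```

The softmax output for the target class plays the role of the confidence value used to build the value map V below.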
There is no prior information on targets or stars; thus, for each suspected center point p, proposals of different sizes are extracted to obtain the size information of the target. The value map V and radius map R are generated as:

V(p) = max_s c_s(p),   R(p) = arg max_s c_s(p),

where c_s(p) is the confidence value of point p at size s. If the center of a proposal is not the maximum, c_s(p) equals 0. The maximum size is set according to the information of the optical system, and 11 is chosen in this paper.
There can be more than one suspected point in the region of a target; thus, to remove redundant detections of the same target, we apply a non-maximum suppression (NMS) operation:

V(p) ← 1 if V(p) = V_max(p), and 0 otherwise,

where V_max(p) is the maximum of V within radius R around p. A point whose V value is 1 is regarded as the center of a target. The implementation details are provided as follows.
4. Experimental Results
The model described above was applied to detect targets in star images. The CNN was implemented in Python 3.7 on a PC with 16 GB of memory and a 1.9 GHz Intel i7 dual-core CPU. During the experiments, we used PyTorch to compute the convolutions. In the training process, stochastic gradient descent (SGD) was used for optimization, with a learning rate of 0.001, a momentum of 0.9 and a batch size of 10.
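The reported optimizer settings can be expressed in PyTorch as below. The stand-in linear model and random batch are illustrative, and standard cross-entropy is used here in place of the paper's bias loss, which is not specified in this excerpt:

```python
import torch
import torch.nn as nn

# Hyper-parameters as reported: SGD, lr 0.001, momentum 0.9, batch size 10.
model = nn.Sequential(nn.Flatten(), nn.Linear(11 * 11, 2))  # stand-in classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()  # stand-in for the paper's bias loss

# One illustrative training step on a random batch of 10 proposals.
inputs = torch.randn(10, 1, 11, 11)
labels = torch.randint(0, 2, (10,))
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()
```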
4.1. Experimental Settings
(1) Evaluation metrics
(a) Classification metrics
In this paper, the F1 score is selected to reflect the classification ability of the network. It is an indicator used in statistics to measure the accuracy of binary classification models. Precision refers to the proportion of correctly classified positive samples among all samples classified as positive. Recall refers to the proportion of correctly classified positive samples among all actual positive samples. They are calculated as follows:

Precision = TP / (TP + FP),   Recall = TP / (TP + FN),

where TP (true positive) represents the number of positive samples correctly classified as positive, FP (false positive) represents the number of negative samples incorrectly classified as positive and FN (false negative) represents the number of positive samples incorrectly classified as negative. The accuracy rate reflects the accuracy of the overall classification of the network. The F1 score takes into account both the precision and recall of the classification model. It can be seen as the harmonic mean of precision and recall, with a maximum value of 1 and a minimum value of 0. The calculation is as follows:

F1 = 2 × Precision × Recall / (Precision + Recall).
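These classification metrics can be computed with a short helper (names are illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1) from the
    TP/FP/FN counts defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 true positives, 2 false positives, 2 false negatives.
p, r, f1 = precision_recall_f1(8, 2, 2)
```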
(b) Detection metrics
SCR is a common metric used to reflect the saliency of targets. The SCR of the ith target is defined with the information of the target and its surrounding area. The calculation is as follows:

SCR_i = |μ_t − μ_b| / σ_b,

where μ_t is the mean intensity of the target and μ_b and σ_b are the mean value and standard deviation of the intensity of the surrounding area, respectively.
The true positive rate (TPR) and false positive rate (FPR) are defined as

TPR = TD / ND,   FPR = FD / NT,

where TD is the number of true targets detected, ND is the number of targets detected, FD is the number of false targets detected and NT is the total number of pixels in the area.
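A hedged sketch of these detection metrics follows; the denominators of TPR and FPR are taken from the variable list above, which is an interpretation of the paper's definition rather than a confirmed one:

```python
def scr(target_mean, bg_mean, bg_std):
    """Signal-to-clutter ratio of a target against its surrounding area."""
    return abs(target_mean - bg_mean) / bg_std

def tpr_fpr(td, nd, fd, nt):
    """TPR over the detected-target count, FPR over the pixel count,
    following the variable definitions in the text (assumed)."""
    return td / nd, fd / nt

s = scr(30.0, 10.0, 5.0)          # a fairly salient target
tpr, fpr = tpr_fpr(9, 10, 2, 1000)
```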
(2) Dataset
(a) Training data and validation data
There is no public dataset related to the research in this paper; thus, star images taken by two space-based telescopes and five ground-based telescopes under different imaging conditions were used to construct the dataset. For each determined target, the region around the target is extracted and normalized to form a data sample. The dataset consists of 3408 sub-images randomly selected from more than 800 star images and was divided into a training set and a validation set at a ratio of 7:3. Because there are many possible distributions of non-target areas, in order to avoid paying too much attention to negative samples during training and to speed up convergence, the training set contains 1428 positive samples and 956 negative samples, and the validation set contains 578 positive samples and 446 negative samples, as shown in Figure 5. The positive samples include the following three cases: (1) targets moving in different directions, i.e., targets that extend and are distributed in different directions; (2) targets of different sizes and significance and (3) images containing other stars or targets. In this way, overfitting can be reduced as much as possible.
(b) Test data
We used four new groups of star image sequences acquired by another ground-based space surveillance telescope to verify the performance of the proposed method. Table 2 shows the key parameters of the telescope during data acquisition. Nine typical targets with various SCR values in the sequences are summarized in Figure 6 and Figure 7. To speed up processing while preserving target information and interference, small image subsets of 201 × 201 pixels around the targets are extracted from each frame and used to form new sequences, denoted as sequence 1 to sequence 9. Details of the targets and backgrounds are listed in Table 3, where the information is calculated from the true targets and their nearby areas, whose sizes are slightly larger than the target radius, in each frame.
(3) Baselines
To validate the detection ability of our method, it was compared with several state-of-the-art traditional small target detection methods. Although some mainstream deep learning techniques outperform conventional non-deep-learning methods, they cannot be directly applied to this problem for the following reasons: (1) it is difficult to obtain correct labels for the whole image with prior information; (2) most deep learning methods contain several max-pooling layers, which are not suitable for small targets; (3) if deep learning methods are only used for two-class recognition, their complicated structures may require more training time and lead to inferior detection capability and (4) deep learning methods need more computing resources, which places more pressure on the in-orbit system. Considering the cost of manual labeling for a full image and the realizability in actual engineering, common deep-learning-based methods were not chosen in this paper. Methods based on global binarization (max-mean, RLCM, NSM and WSLCM), a method based on local binarization (FKRW) and local theoretical model-driven methods (SExtractor and MLTC) were utilized as baselines.
4.2. Comparative Analysis of the Processing Stage
To verify the influence of the preprocessing stage on a star image and the targets inside it, an image disturbed by strong stray light was selected for preprocessing. The comparison before and after the processing stage is shown in Figure 8. Since the depth of the original image is 16 bits, to compare the effects of processing more reasonably, the images before and after processing are stretched to the same gray range. In the contour map, to better reflect the gray change of the background, parts of the image with a gray value higher than 200 are truncated. There is a large area of uneven interference in the original image. This interference is clearly removed in the processed image, and the background is close to 0. Furthermore, eight stars with different significances, at different positions and under different degrees of interference were selected for comparison. Figure 9 shows the images and the corresponding gray distributions of three stars (Star 1, Star 4 and Star 8) before and after processing. The processed image better reflects the distribution of the target versus the background, i.e., the original distribution of the target. However, because the algorithm only roughly estimates the background, it is difficult to filter out interference contained within the target region. The processing effect is quantitatively analyzed in Table 4, including the number of candidate points in the figure and the SCR value of the target. It can be seen that preprocessing has little influence on the image details and little interference with the original distribution of the target and its region. On the other hand, because a large area of interference is filtered out, the false extraction of candidate points in the bright-dark junction area is greatly reduced, which increases the processing speed of the algorithm.
4.3. Comparative Analysis of Ablation Experiments
Furthermore, ablation experiments were conducted and the results are shown in Table 5.
Compared with no normalization, the traditional cross-entropy loss and the no-guidance mode, the network combining normalization, the bias loss and the addition of guidance information achieves the highest F1 score, which means that the network can effectively balance the classification precision and recall.
4.4. Comparative Analysis of Experiments for Different Targets
In order to demonstrate the generalization of the proposed method, sequences containing different targets with various backgrounds, shapes, sizes and intensities were processed using the above methods. Three typical targets in different sequences with a large range of SCRs (target 3, target 4 and target 6) were randomly selected from the dataset, and Figure 10 shows a comparison of the corresponding saliency maps and detection results for the different methods. Here, correctly detected space targets and star targets are labeled in red boxes and blue circles, respectively, and a yellow box represents a false alarm. The saliency maps of max-mean, RLCM, NSM, WSLCM and FKRW are the results of threshold segmentation. The MLTC column presents the final detection results, and the column of the proposed method only shows the proposals. For each method, the same parameters were used during the detection of a sequence.
Intuitively, false alarms and misdetections frequently occur with RLCM, NSM, WSLCM and FKRW due to the relatively unstable performance of their background suppression. Max-mean adopts target enhancement and performs better with respect to false alarms. However, this method may further reduce the saliency of targets present in a low-contrast environment, as shown in Figure 10(B(b)). Instead, SExtractor and MLTC focus on the distribution of targets rather than on local saliency, so fewer false alarms and misdetections occur.
Figure 11 shows four typical targets that MLTC failed to detect, together with the calculation results of MLTC and the SF-CNN. When targets deviate from the ideal distribution due to motion (Figure 11a), interference from the background (Figure 11b) or their small size (Figure 11c,d), these two methods are also subject to the problem of threshold setting. The proposed SF-CNN follows a similar principle of target estimation from the intensity distribution. The main difference is that the SF-CNN is data-driven rather than model-driven. Thus, the SF-CNN is more appropriate for targets with different shapes and less likely to be influenced by the background.
The receiver operating characteristic (ROC) curves for sequence 1 to sequence 6 are shown in Figure 12 to further reveal the advantages of the proposed method over the other seven methods. From the figure, we can see that the proposed algorithm achieves the lowest FPR for the same TPR in most cases.
The statistical results of the target detection methods are listed in Table 6. It should be noted that the same parameters are shared across different sequences when the MLTC and the SF-CNN are applied. In contrast, the parameters of the baseline methods in each sequence are confirmed after balancing the TPR and FPR: for each sequence, different values of each parameter are applied to some images, the values corresponding to the maximum acceptable FPR are selected after visual inspection and the TPR is then determined at these parameter values. On the whole, the highest TD and TPR for all sequences achieved by our method indicate that it can stably detect targets with different SCRs and performs better than the baseline methods.
Finally, Table 7 lists the average time required for detection in a single image. Compared to the baseline methods, the proposed method shows great advantages in processing time. Furthermore, the Multiply-Accumulate Operations (MACs) and the number of parameters (Params) of the proposed method and of some common models with simple structures, such as ResNet50, AlexNet and VGG13, are shown in Table 8. For the same image, the number of regions is much smaller than the number of pixels to be convolved, so the same model requires fewer MACs on the smaller subset. Due to its small architecture, the SF-CNN achieves the smallest number of MACs and Params, which means that it consumes the fewest computing resources.
4.5. Comparative Analysis of Experiments on the Same Targets with Different Saliency
In this subsection, we further evaluate the influence of target saliency on the detection performance. In group 4, different exposure times were set at the moment of image acquisition to simulate different saliency conditions for the same targets. A shorter exposure time results in a lower intensity, a smaller size and poorer saliency, so more robustness and stronger generalization are required.
The results of a quantitative comparison of the different methods for group 4 are summarized in Table 9. For targets with a high saliency (SCR values above 5), the methods exhibit no significant difference in detection performance. However, when the intensity of the target is close to the background, most of the baseline methods, including max-mean, RLCM, NSM, WSLCM and FKRW, do not work well because they lack a strong anti-interference ability against backgrounds similar to the targets. Despite focusing on the same target, their detection probability is severely influenced by the saliency of the targets. The proposed method almost always achieves the highest TPR in all test sequences, implying that the SF-CNN outperforms the baseline methods in terms of detection probability.
The visual results for three targets selected from sequence 7 to sequence 9 are illustrated in Figure 13. It can be observed that, compared to the subsets with a long exposure time, substantial background clutter and noise exist in the original subsets with a short exposure time, as shown in Figure 13(a1). This interference significantly decreases the saliency of the targets and impacts the detection performance of baseline methods based on background suppression and target enhancement, as shown in Figure 13(b1-g1), where many pixels are incorrectly identified as suspected targets. Although the MLTC has more stringent conditions for target identification, some targets submerged in this interference are easily missed because of their altered distribution. In contrast, due to the appropriate processing stage and proposal search, our method can remove interference while maintaining as much of the original distribution of the targets as possible. Moreover, irregular targets are used to construct the dataset, which results in a better generalization and robustness of the proposed method with respect to targets of various saliency.