Article

A Space Non-Cooperative Target Recognition Method for Multi-Satellite Cooperative Observation Systems

National Key Laboratory of Aerospace Mechanism, College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(18), 3368; https://doi.org/10.3390/rs16183368
Submission received: 26 June 2024 / Revised: 2 September 2024 / Accepted: 5 September 2024 / Published: 10 September 2024

Abstract
Space non-cooperative target recognition is crucial for on-orbit servicing. Multi-satellite cooperation has great potential for broadening the observation scope and enhancing identification efficiency. However, there is currently a lack of research on recognition methods tailored for multi-satellite cooperative observation. In this paper, we propose a novel space non-cooperative target recognition method to identify satellites and debris in images from multi-satellite observations. Firstly, we design an image-stitching algorithm to generate wide-area space images. Secondly, we propose a two-stage multi-target detection model, the lighter CNN with distance merge threshold (LCNN-DMT). Specifically, in the first stage, we propose a novel foreground extraction model based on the minimum bounding rectangle with a threshold for distance merging (MBRT-D) to address the redundant extraction of detection boxes for satellite components. In the second stage, we propose an improved SqueezeNet model, introducing separable convolutions and an attention mechanism for target classification. Moreover, owing to the absence of a public multi-target detection dataset containing satellites and debris, we construct two space datasets using a randomized data augmentation strategy. Further experiments demonstrate that our method achieves high-precision image stitching and superior recognition performance. Our LCNN-DMT model outperforms mainstream algorithms in target localization accuracy with only 0.928 M parameters and 0.464 GFLOPs, making it well suited to on-orbit deployment.

1. Introduction

Non-cooperative space targets usually refer to malfunctioning satellites and space debris that lack cooperative interfaces, artificial markers, and communication capabilities [1]. With the increase in space accidents and malfunctioning spacecraft, many non-cooperative space targets occupy orbital resources and threaten the safety of operational spacecraft. Because of their non-cooperative nature, on-orbit recognition of such targets has become a challenging problem in space situational awareness (SSA) [2,3].
Compared with expensive and heavy LIDAR systems, optical cameras provide an economical alternative with lower mass, lower power requirements, and fewer geometric constraints [4]. As a result, vision-based recognition algorithms for non-cooperative space targets have broad application prospects. With the development of convolutional neural networks, deep learning-based image classification and recognition has made breakthroughs in the field of natural images, producing the ResNet series [5,6], the YOLO series [7,8,9], Faster R-CNN [10], and other mainstream deep learning models. Deep learning methods learn target features autonomously and are more robust than traditional methods. Thus, space non-cooperative target recognition algorithms are shifting from conventional approaches to deep learning [11].
However, the space non-cooperative target recognition task differs from natural image recognition. The first issue is the dataset. The lack of datasets has consistently been a significant difficulty in space situational awareness (SSA), since realistic on-orbit space surveillance images are rare, costly, and sensitive to obtain. Published datasets mainly target spacecraft classification [12,13], component identification, and pose estimation [14]; there are no publicly available datasets for the multi-target detection task of distinguishing between satellites and debris in the space environment [15]. Although the SPARK dataset [16] is suitable for target detection missions, containing ten satellites and five types of space debris, each scene image contains only a single satellite or debris target. The second issue is on-orbit deployment. Although the YOLO and R-CNN series are end-to-end, high-performance target detection models, they are large in scale and computational demand. Under the compute and memory constraints of onboard hardware platforms [17], it is difficult for them to achieve accurate and efficient on-orbit recognition.
In addition, in terms of observation modes, with the development of satellite formation technology, multi-satellite cooperative observation has gradually been applied to space surveillance, space capture [18,19], and other tasks because it can obtain more comprehensive wide-area space images. A typical example is the autonomous maneuvering space flying net capture system [20,21] shown in Figure 1. The satellite formation carries photoelectric cameras and, through multi-satellite cooperative observation, can obtain high-resolution, large field-of-view space images, improving the scope and efficiency of deep space exploration. Unfortunately, to the best of our knowledge, most current studies employ a single spacecraft to monitor targets, and there is a lack of research on multi-satellite cooperative observation of non-cooperative targets.
To address the above issues, this paper proposes a space non-cooperative target recognition method that offers an extensive monitoring range and higher detection efficiency compared with single-satellite systems. Firstly, we employ the idea of feature matching to stitch multi-satellite images and generate smooth, wide-area space images. Secondly, we perform target localization and recognition on the generated wide-area space images. Considering the need for lightweight models with fast inference for on-orbit recognition, and inspired by the idea of two-stage recognition, we propose a lighter CNN model with a distance merge threshold (termed LCNN-DMT). This model first performs foreground extraction to locate and extract all suspicious targets in the scene and then sends each target region into a CNN classification network to distinguish satellites from debris. In conclusion, we propose a simpler, lighter, and more hardware-friendly method for non-cooperative multi-target recognition under multi-satellite cooperative observation while maintaining high recognition accuracy and speed.
The contributions of this paper are mainly as follows:
(1)
This paper proposes a space non-cooperative target recognition method, including wide-area image generation, as well as target recognition and localization.
(2)
A public space target dataset and a multi-target space scene dataset are constructed, encompassing 15 types of satellites and 39 types of debris. In particular, a randomized augmentation strategy is proposed by randomly assigning the parameters and transformation combinations of the samples.
(3)
Inspired by the minimum bounding rectangle with threshold (MBRT) [22,23], we propose a new foreground extraction model based on a minimum bounding rectangle with threshold for distance merging (MBRT-D) to achieve high-precision target localization and extraction. We replace the IOU threshold with a bounding box distance merge threshold, effectively addressing the issue of redundant extraction of satellite bodies and solar wing parts.
(4)
We propose an improved lightweight SqueezeNet model, which integrates separable convolutions [24] to reduce the model’s parameter count and computational demands. Furthermore, the incorporation of the efficient channel attention (ECA) [25] mechanism significantly enhances the model’s capacity for inter-channel information interaction, thereby improving its recognition performance.
(5)
The proposed LCNN-DMT adopts the two-stage idea of first positioning and then identification by combining MBRT-D with the improved SqueezeNet model. This approach not only reduces the data labeling time but also maintains extremely low parameter counts and computational requirements.
The remainder of this paper is organized as follows. In Section 2, we briefly discuss research related to ours. Section 3 presents the space target dataset, the scene dataset preparation, and the augmentation methods. Section 4 details our proposed multi-satellite non-cooperative target recognition method, including multi-satellite image stitching and the LCNN-DMT model design. Extensive experiments are conducted in Section 5, and the discussion and conclusions are presented in Section 6 and Section 7, respectively.

2. Related Work

2.1. Satellite Image Stitching

Owing to the limitations of detection equipment, a single image view cannot contain all the required information about the area of interest. Image stitching is an effective solution for obtaining images with a broader field of view. At present, satellite image stitching technology focuses on remote sensing images, and there are few studies on satellite space-monitoring images. However, we can draw inspiration from satellite remote sensing image stitching methods.
Image registration is the critical step of remote sensing image stitching [26] and can be broadly divided into region-based and feature-based methods. Feature-based methods achieve image registration by extracting and matching point, line, edge contour, and surface features between image pairs. They are the most commonly used for satellite image registration because of their low computational cost and strong robustness to noise, rotation, and illumination changes. Classical feature detection algorithms include the scale-invariant feature transform (SIFT) [27], speeded-up robust features (SURF) [28], oriented FAST and rotated BRIEF (ORB) [29], binary robust invariant scalable keypoints (BRISK) [30], AKAZE [31], etc. Once the feature point pairs of two images are obtained, the k-nearest neighbor (KNN) method is usually used for feature matching. Furthermore, a classical processing method utilizes the random sample consensus (RANSAC) [32] algorithm to screen the matching point pairs after the initial matching and to calculate the homography matrix, eliminating false feature matches. Scholars have proposed several optimization algorithms to enhance the robustness of RANSAC [33]. For example, Chum et al. proposed the progressive sample consensus (PROSAC) [34] method, which samples from the best-ranked feature matches and is more time efficient than RANSAC.
With the advancement of deep learning technology, some scholars have applied CNN models to remote sensing image stitching [35,36,37] to improve the stitching accuracy for complex and weakly textured images. However, due to the high complexity of these models, it is difficult for them to meet real-time stitching requirements. Moreover, deep learning models require a large number of samples, which further limits their application to satellite image stitching.
Fortunately, traditional stitching methods can meet the accuracy and real-time demand of stitching for space images with relatively simple background environments. This paper will apply the feature-based registration method in remote sensing images to stitch satellite space monitoring images.

2.2. Space Non-Cooperative Target Recognition

Space non-cooperative target recognition refers to the localization and identification of malfunctioning satellites and space debris in the scene through the recognition networks.
Some scholars have used traditional hand-designed feature extractors to achieve space target recognition. Chen et al. [1] proposed a Sample-HOG feature to localize typical grasping regions of satellites, generating better target proposal regions at the price of a slight increase in computational cost. However, traditional methods are limited by subjective feature selection, which makes it difficult to capture deep connections between targets and features and results in a lack of robustness.
In recent years, deep learning methods have attracted the attention of many scholars. Zeng et al. [38] first introduced a DCNN architecture into space target classification tasks based on the LeNet5 model and a hybrid data augmentation method. However, the shallow DCNN model can only perform the classification task [39]. For the detection task, current deep learning models mainly follow one-stage and two-stage frameworks. Li et al. [40] proposed a one-stage non-cooperative spacecraft detection model called SCNN-lite, which can realize multi-target spacecraft detection. However, the constructed simulation dataset is based on the ground environment and does not consider the differences between space and terrestrial scenes. Chen et al. [41] proposed a space target detection model based on Faster R-CNN, using HRNet as the backbone network to enhance the feature extraction capability, but its high complexity leads to a high computational cost.
However, both one-stage and two-stage detection networks are difficult to deploy on satellite platforms and have limited inference accuracy and speed. Therefore, developing lightweight deep learning target detection models is key to realizing on-orbit recognition. Liu et al. [42] proposed a lightweight algorithm for spacecraft component detection based on YOLOv5 [8], introducing the Ghost module and a channel compression method. Yang et al. proposed a two-stage convolutional neural network (T-SCNN) [22,23] to achieve satellite-debris multi-target recognition. In contrast to the R-CNN series, the network first applies the MBRT foreground extraction algorithm and then sends the target regions to a CNN for identification. This method achieves high recognition accuracy for cooperative targets on a simulated space dataset. Furthermore, the article argues that the approach would have higher detection efficiency than mainstream deep learning models such as YOLO v3 and Faster R-CNN for large-scale space images with simple backgrounds, but it lacks supporting experimental comparison data.
Inspired by the idea of T-SCNN, this paper designs a two-stage detection model that includes foreground extraction and target recognition. Due to the complexity of satellite structures, there remains room for optimizing both the foreground extraction and the classification stages of this idea. In this paper, we improve both for better localization and identification and conduct more comprehensive comparison experiments to verify the superiority of our model.

3. Datasets

Since the space non-cooperative target recognition model proposed in this paper adopts the two-stage idea of foreground extraction followed by classification, it is necessary to construct a space target dataset containing failed satellites and debris for the training of the classification network and a space scene dataset for the testing of the overall model recognition capability. Considering the variation of illumination angle and brightness of the natural space environment, we propose a randomized data augmentation strategy to improve the data richness.

3.1. Space Target Dataset

The space target dataset constructed in this paper contains two types of targets: malfunctioning satellites and space debris. We obtain satellite models from publicly accessible online resources and the BUAA-SID-share1.0 public dataset. The BUAA dataset contains 4600 grayscale images with a resolution of 320 × 230, covering 20 types of satellite targets, each with 230 multi-angle attitude images generated with 3ds Max software, which provides data support for the space target classification task. Since space debris often accompanies satellites in the real space environment, we collect various space debris images online, mainly parts of spacecraft or rockets.
The limited availability of publicly accessible space target images makes it challenging to meet the training requirements of deep learning models. To overcome the problem of poor generalization ability caused by small sample datasets, this paper proposes a randomized data augmentation strategy to achieve target dataset augmentation.
We use label-preserving transformations for data augmentation, including horizontal flipping, vertical flipping, random rotation, random cropping, and other geometric transformations. In addition, considering that the target's luminance and background change with the angle of light-source incidence, we randomly add varying degrees of brightness change based on contrast enhancement and Gamma correction to prevent the recognition model from overfitting in unevenly illuminated environments.
Instead of using only one transform or a fixed order of multiple transforms, we apply randomly generated combinations and coefficients of transforms to each image. Because transformation parameters such as the rotation angle, cropping scale, and Gamma coefficient are random, the uncertainty of the generated image samples is higher, improving the sample richness of the whole dataset and enhancing the model's ability to recognize non-cooperative targets. Figure 2 illustrates the effect of some of the data transformations used in this paper.
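For illustration, the following Python sketch shows one way such a randomized augmentation could be implemented with Pillow. It is a minimal example only: the probabilities, parameter ranges, and function names are illustrative assumptions and are not taken from our implementation.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def random_augment(img: Image.Image) -> Image.Image:
    """Apply a randomly chosen combination of label-preserving transforms
    in a random order. Sketch only; probabilities and parameter ranges
    are illustrative, not the values used in the paper."""
    ops = []
    if random.random() < 0.5:
        ops.append(ImageOps.mirror)                                   # horizontal flip
    if random.random() < 0.5:
        ops.append(ImageOps.flip)                                     # vertical flip
    if random.random() < 0.5:
        angle = random.uniform(-180, 180)                             # random rotation
        ops.append(lambda x, a=angle: x.rotate(a, expand=True))
    if random.random() < 0.5:
        w, h = img.size                                               # random crop
        s = random.uniform(0.8, 1.0)
        cw, ch = int(w * s), int(h * s)
        left, top = random.randint(0, w - cw), random.randint(0, h - ch)
        ops.append(lambda x, b=(left, top, left + cw, top + ch): x.crop(b))
    if random.random() < 0.5:
        factor = random.uniform(0.6, 1.4)                             # contrast change
        ops.append(lambda x, f=factor: ImageEnhance.Contrast(x).enhance(f))
    if random.random() < 0.5:
        g = random.uniform(0.5, 2.0)                                  # Gamma correction
        lut = [min(255, int((i / 255.0) ** g * 255)) for i in range(256)]
        ops.append(lambda x, t=lut: x.point(t * len(x.getbands())))
    random.shuffle(ops)                                               # random order of transforms
    for op in ops:
        img = op(img)
    return img
```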

3.2. Space Scene Dataset

In line with the characteristics of space scenes, namely wide frames and small target pixel footprints, and considering the pixel counts of satellite cameras in recent years, the resolution of our simulated space scene images is unified at 2048 × 2048. Considering the effect of illumination during real detection, we apply a luminance transformation to the pure black background. The targets within each scene are randomly taken from the space target dataset established above and randomly placed in the background. The target sizes are set to 0.6, 0.75, and 0.9 times the base resolution of (320, 240), corresponding to different stages of approaching the target during detection or capture tasks. Simulated images from our space scene dataset are shown in Figure 3. In the end, a total of 2900 space scene images were synthesized and annotated, containing all target types in the space target dataset. Compared with publicly available space datasets containing only a single spacecraft or debris target, our scenes contain multiple satellite and debris targets, which is closer to the actual space environment.

4. Methodology

Typical application scenarios of the multi-satellite cooperative observation system are shown in Figure 1. The system offers significant advantages for tasks such as capturing non-cooperative target satellites within satellite formations. By utilizing image stitching technology, the system can achieve wide-area information correlation and coverage, thereby enhancing the scope and efficiency of deep-space exploration. In cases where a satellite detector is non-functional, or the image quality is poor due to noise interference, other satellites in the formation can take over the target detection tasks. This redundancy improves the system's anti-interference capability and flexibility. Additionally, when close-range proximity detection prevents a single camera viewpoint from capturing a complete image of the target satellite, coordinated observation across multiple satellites can address this challenge, thereby improving the accuracy of target localization.
Figure 4 shows the architecture of the proposed space non-cooperative target recognition method for multi-satellite cooperative observation systems. It mainly consists of two parts: multi-satellite wide-area image generation based on image stitching and space target detection in wide-area images based on the LCNN-DMT model. Among them, our LCNN-DMT model is a two-stage space non-cooperative target recognition model. The first stage locates and extracts all the suspicious targets based on the proposed MBRT-D foreground extraction model. The second stage is based on our improved SqueezeNet network for classifying the extracted targets into binary satellites and debris.

4.1. Multi-Satellite Image Stitching

Considering the simpler background of space exploration images compared with remote sensing images, as demonstrated in Figure 5, we employ the feature-based image registration method to realize multi-satellite image stitching, which offers lower computational complexity and better real-time performance. Currently, feature-based matching algorithms such as SIFT, SURF, and ORB are widely used in image processing. SIFT and SURF require the extraction of high-dimensional image features, leading to significant computational complexity. In contrast, ORB employs a 256-bit binary descriptor with rotational invariance and noise suppression, making it approximately 100 times faster than SIFT. Consequently, we adopt the ORB feature descriptor to ensure the real-time performance of on-orbit image stitching.
Firstly, we extract the ORB feature points of the reference image and the image to be registered to form their respective feature point sets. Secondly, we perform coarse matching of the feature points with the KNN matcher. To further eliminate false matches, we apply the PROSAC algorithm to correct the matching points and solve the homography matrix, which represents the projective transformation between any two images. Finally, after the projective transformation of the image to be registered, we design a weighted average fusion algorithm to stitch it naturally with the reference image. The equation for the weighted average fusion algorithm is as follows:
$$F(x,y)=\begin{cases} A(x,y), & (x,y)\in R_A,\ (x,y)\notin R_B \\ \omega_A A(x,y)+\omega_B B(x,y), & (x,y)\in R_A\cap R_B \\ B(x,y), & (x,y)\in R_B,\ (x,y)\notin R_A, \end{cases}$$
where $R_A$ and $R_B$ denote the regions of images A and B, respectively; $\omega_A$ and $\omega_B$ are weighting coefficients satisfying $\omega_A + \omega_B = 1$. The value of $\omega_A$ gradually increases from 0 to 1 as the fusion region transitions from left to right. This fade-in and fade-out weighted stitching alleviates the visible seams between two images caused by luminance differences. $A(x, y)$ denotes the pixel value of image A at coordinates $(x, y)$, where x is the horizontal coordinate and y is the vertical coordinate.
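The following OpenCV-based Python sketch illustrates this pipeline (ORB features, KNN matching with a ratio test, PROSAC-style homography estimation, and the weighted fusion of Equation (1)). It is a simplified illustration under the assumption that the registered image extends to the right of the reference, as in the ordered stitching of Figure 6; function names, parameter values, and the canvas size are illustrative choices rather than our actual implementation.

```python
import cv2
import numpy as np

def stitch_pair(ref_bgr, img_bgr, ratio=0.75):
    """Register img_bgr to ref_bgr with ORB features and blend the overlap.

    Sketch only: ORB detection, KNN matching with a ratio test, PROSAC-style
    homography estimation (cv2.USAC_PROSAC needs OpenCV >= 4.5; cv2.RANSAC is
    a drop-in fallback), then the fade-in/fade-out fusion of Equation (1).
    Assumes the registered image lies to the right of the reference.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    g_ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2GRAY)
    g_img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    kp1, des1 = orb.detectAndCompute(g_ref, None)
    kp2, des2 = orb.detectAndCompute(g_img, None)

    # Coarse matching with KNN (k = 2) and Lowe's ratio test
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des2, des1, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.USAC_PROSAC, 3.0)

    # Warp the image to be registered onto a canvas twice the reference width
    h, w = ref_bgr.shape[:2]
    pano = cv2.warpPerspective(img_bgr, H, (2 * w, h)).astype(np.float32)
    ref = np.zeros_like(pano)
    ref[:, :w] = ref_bgr

    # Equation (1): keep whichever image is present outside the overlap and
    # apply a linear fade (weights summing to 1) across the overlap columns.
    mask_ref = ref.sum(axis=2) > 0
    mask_img = pano.sum(axis=2) > 0
    out = np.where(mask_img[..., None], pano, ref)
    cols = np.where((mask_ref & mask_img).any(axis=0))[0]
    if cols.size > 1:
        x0, x1 = cols.min(), cols.max()
        alpha = np.clip((np.arange(2 * w) - x0) / float(x1 - x0), 0, 1)
        alpha = alpha[None, :, None]                 # weight of the registered image
        both = (mask_ref & mask_img)[..., None]
        out = np.where(both, (1 - alpha) * ref + alpha * pano, out)
    return out.astype(np.uint8)
```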
Considering that multiple satellites acquire space images in formation, the relative orientation between satellites is fixed. Therefore, connecting the fields of view of the satellite images is an ordered stitching problem. As shown in Figure 6, the matching and stitching sequence can be set in advance to improve the efficiency of wide-area image generation.

4.2. Foreground Extraction Model

4.2.1. MBR and MBRT

The foreground extraction algorithm based on the minimum bounding rectangle (MBR) mainly includes four stages: grayscale conversion, filtering, threshold segmentation, and contour extraction. It provides accurate and fast localization of targets in continuous regions against a pure black background.
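A minimal OpenCV sketch of these four MBR stages is given below; the filter kernel size, the OTSU-based segmentation (also used in Section 5.5.1), and the minimum-area filter are illustrative choices rather than the exact settings of our implementation.

```python
import cv2

def mbr_boxes(image_bgr):
    """Four-stage MBR foreground extraction: grayscale conversion, filtering,
    threshold segmentation (OTSU), and contour extraction.
    Returns axis-aligned boxes as (x, y, x_b, y_b). Sketch only."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.blur(gray, (3, 3))                       # mean filtering
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)                   # minimum bounding rectangle
        if w * h > 9:                                      # drop isolated noise pixels
            boxes.append((x, y, x + w, y + h))
    return boxes
```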
However, a potential problem is that the connecting structures between the satellite body and the solar wings, and between the sail panels, often appear as pixel discontinuities in space images, which causes the MBR algorithm to divide the satellite into multiple parts and produce redundant detection boxes.
To solve the detection box redundancy problem of MBR, Yang et al. [22,23] proposed an improved foreground extraction algorithm, MBRT. The idea of MBRT is shown in Figure 7. MBRT introduces an IOU threshold to remove redundant detection boxes by searching for and merging overlapping boxes whose IOU is higher than the merging threshold. To a certain extent, this compensates for the deficiency of MBR when the target structure is not continuous.

4.2.2. The Proposed MBRT-D

The proposal of MBRT profoundly inspires us. However, we have identified certain limitations in its application. Specifically, when multiple detection boxes are generated for a given satellite target, the IOU is always zero for boxes that are in close proximity but do not overlap. As a result, MBRT ignores them, and a certain number of redundant detection boxes remain. To address this issue, we propose an enhanced foreground extraction algorithm, the minimum bounding rectangle with a threshold for distance merging (MBRT-D). The proposed MBRT-D takes into account the boundary distances between rectangular boxes under different relative positional relationships and merges boxes whose boundary distance is below the threshold. The concept of rectangular box merging in MBRT-D is illustrated in Figure 8.
The MBRT-D proposed in this paper computes the rectangular box distance under different positional relations, and the rectangular box merging formula is as follows:
$$x = \min(x_1, x_2), \quad y = \min(y_1, y_2), \quad x_b = \max(x_{1b}, x_{2b}), \quad y_b = \max(y_{1b}, y_{2b}), \qquad D \le \mathrm{Threshold},$$
where $(x, y, x_b, y_b)$ represents the position of the merged rectangle, with upper-left corner $(x, y)$ and lower-right corner $(x_b, y_b)$; $(x_1, y_1, x_{1b}, y_{1b})$ and $(x_2, y_2, x_{2b}, y_{2b})$ represent the positions of the two rectangular boxes before merging, respectively; $D$ denotes the boundary distance between the two detection boxes; and $\mathrm{Threshold}$ is the chosen distance merging threshold.
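For clarity, the following Python sketch illustrates the distance-based merging of Equation (2): the boundary distance between two axis-aligned boxes is zero when they touch or overlap, and any pair closer than the threshold is fused into their common bounding rectangle. The function names and the iterative merging loop are an illustrative reading of the algorithm, not the exact implementation.

```python
def boundary_distance(a, b):
    """Shortest distance between the boundaries of two axis-aligned boxes
    (x, y, x_b, y_b); zero when the boxes touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx ** 2 + dy ** 2) ** 0.5

def merge_boxes(boxes, threshold):
    """MBRT-D style merging: repeatedly fuse any two boxes whose boundary
    distance is at most `threshold`, following Equation (2). Sketch only."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boundary_distance(boxes[i], boxes[j]) <= threshold:
                    a, b = boxes[i], boxes[j]
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

With the boxes produced by the MBR stage as input, a call such as merge_boxes(boxes, threshold=26) corresponds to the threshold setting adopted in the experiments of Section 5.5.1.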
The primary objective of our MBRT-D strategy is to merge redundant localized detection boxes of the same satellite target after threshold segmentation, which initially contain only partial components such as sails and the body. This merging process enables complete framing and localization of the target before it is input into the recognition network for classification. The value of the distance merging threshold is mainly influenced by the detection distance between the observing satellite and the non-cooperative satellite target.
When the detection distance is large, the satellite target occupies fewer pixels in the scene, so the threshold should be kept small. This ensures that redundant detection boxes of the same small target are merged while preventing boxes belonging to different targets from being merged. Conversely, when the observing satellite is closer to the target, particularly in tasks involving the capture of inactive satellites, the satellite target occupies a larger portion of the image. In this case, the distance threshold should be widened so that every redundant detection box is merged into a complete and accurate detection frame. The specific relationship is as follows:
Assume that the size of a target in the scene is $H_0 \times W_0$, where $H_0$ denotes the height of the target and $W_0$ its width. Based on the principle of optical imaging, the corresponding size of the target in the image is as follows:
$$W_p = \frac{f}{d} \cdot W_0, \qquad H_p = \frac{f}{d} \cdot H_0,$$
where $H_p$ and $W_p$ denote the height and width of the target in the image, respectively; $f$ is the focal length of the camera, and $d$ is the distance from the observing satellite to the target. In this case, the pixel percentage of the target in an image with resolution $H \times W$ can be expressed as follows:
$$\beta = \frac{W_p \cdot H_p}{W \cdot H} = \frac{f^2 \cdot W_0 \cdot H_0}{d^2 \cdot W \cdot H},$$
where $\beta$ represents the pixel percentage of the target in the image. Therefore, the distance merging threshold is proportional to the pixel occupancy of the target and inversely proportional to the detection distance. When a formation of satellites with the same camera configuration performs coordinated detection in the same plane, the pixel occupancy refers to the percentage of pixels in the stitched image.
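As a brief numerical illustration of Equations (3) and (4), the sketch below computes the pixel fraction $\beta$ for a hypothetical target; it assumes the focal length is expressed in pixel units (focal length divided by pixel pitch), and all numbers are made up for illustration.

```python
def pixel_fraction(f_pixels, d, W0, H0, W=2048, H=2048):
    """Pixel fraction beta of a W0 x H0 (m) target observed at distance d (m)
    with focal length f_pixels (in pixel units), in an image of resolution
    W x H; Equations (3)-(4). Illustrative sketch only."""
    Wp = f_pixels / d * W0            # projected width in pixels, Equation (3)
    Hp = f_pixels / d * H0            # projected height in pixels
    return (Wp * Hp) / (W * H)        # Equation (4)

# Hypothetical example: a 4 m x 2 m target at 500 m with f equivalent to 20,000 px.
beta = pixel_fraction(f_pixels=20000, d=500, W0=4.0, H0=2.0)
print(f"pixel fraction: {beta:.4%}")  # larger beta -> a larger merge threshold is appropriate
```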

4.3. Target Classification Network

After the MBRT-D foreground extraction algorithm locates and extracts all the suspicious targets in the scene, the satellite and debris targets are distinguished by the classification network. We propose a lightweight convolutional neural network, improved based on SqueezeNet [43].

4.3.1. SqueezeNet Model

SqueezeNet is a lightweight network structure proposed by Iandola et al., which achieves recognition accuracy comparable to that of AlexNet on the ImageNet dataset while reducing the number of parameters by about 50 times, with a model size of only 4.8 MB. The authors further introduced SqueezeNet v1.1, which requires 2.4× less computation and slightly fewer parameters than the original SqueezeNet without sacrificing accuracy. The structure of SqueezeNet v1.1 is shown in Figure 9. The model consists of two convolutional layers, eight Fire modules, three max pooling layers, a global average pooling layer, and a Softmax layer. The Fire module is the core of SqueezeNet; it uses a large number of 1 × 1 convolutional kernels and a mixture of 1 × 1 and 3 × 3 kernels instead of only 3 × 3 kernels. This replacement greatly reduces the number of parameters and the computation of the model while increasing its depth, without impairing recognition accuracy.

4.3.2. Improved SqueezeNet Model

Due to the lightweight architecture of SqueezeNet, which facilitates deployment on onboard platforms with resource and computing power constraints, we select SqueezeNet as the basis of our recognition network. Furthermore, we propose an improved SqueezeNet model by refining the Fire module and introducing an attention mechanism, as illustrated in Figure 10. To reduce model complexity and improve energy efficiency, the original Fire6-Fire9 modules are replaced with our INFire module. Additionally, we use a combination of 3 × 1 and 1 × 3 convolutions to obtain the same receptive field as a 3 × 3 convolution, which leads to a 33% reduction in the number of parameters. Since the Fire and INFire modules achieve multi-scale feature fusion through simple channel concatenation, we introduce the efficient channel attention (ECA) mechanism after selected Fire and INFire modules to enhance inter-channel information interaction, allowing the network to extract more discriminative features for classification.
Specifically, the design of the INFire module is described here. Inspired by the decomposable convolutions of the Inception series of models [24], a combination of 1 × N and N × 1 convolution kernels is used instead of an N × N kernel, which achieves the same receptive field while compressing the number of parameters. Therefore, we split the 3 × 3 convolution in the original Fire module into parallel 3 × 1 and 1 × 3 convolutions. This decomposition not only reduces the model parameters but also deepens the network, improving its feature extraction capability. The structure of the INFire module is shown in Figure 10. First, in the squeeze stage, the input features are channel-compressed by a 1 × 1 convolution. Second, in the expand stage, the number of channels is extended by 1 × 1, 3 × 1, and 1 × 3 convolutions, respectively. Finally, the outputs of the different kernel sizes are passed to the next layer through channel concatenation.
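A minimal PyTorch sketch of this INFire structure is given below; the channel counts are illustrative, and the layer arrangement follows the description above rather than the exact implementation.

```python
import torch
import torch.nn as nn

class INFire(nn.Module):
    """Sketch of the INFire module: a 1x1 squeeze convolution followed by an
    expand stage whose 3x3 branch is replaced by parallel 3x1 and 1x3
    convolutions; outputs are fused by channel concatenation."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Sequential(
            nn.Conv2d(in_ch, squeeze_ch, kernel_size=1), nn.ReLU(inplace=True))
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=(3, 1), padding=(1, 0))
        self.expand1x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=(1, 3), padding=(0, 1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.squeeze(x)
        # Concatenate the three expand branches along the channel dimension
        return self.relu(torch.cat([self.expand1x1(s),
                                    self.expand3x1(s),
                                    self.expand1x3(s)], dim=1))
```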
Subsequently, the efficient channel attention (ECA) module is introduced. The ECA mechanism [25] enhances the feature representation of convolutional neural networks by capturing inter-channel dependencies, using one-dimensional convolutions instead of fully connected layers and thereby significantly reducing the number of network parameters. The principle of the ECA attention mechanism is illustrated in Figure 11.
For an input feature $X_{\mathrm{Input}} \in \mathbb{R}^{C \times H \times W}$, global average pooling computes the average response of each channel, yielding a $(C, 1, 1)$ feature that captures global contextual information. Then, channel-level feature learning is performed using a one-dimensional convolution with an adaptive kernel size to capture local dependencies among channels. The output of the convolutional layer is passed through a Sigmoid activation so that all values lie between 0 and 1, producing the attention weight vector. Finally, the weight vector is multiplied element-wise with the input features to obtain the output feature $X_{\mathrm{Output}} \in \mathbb{R}^{C \times H \times W}$. The attention weight vector is computed according to the following equation:
$$X_W = \sigma\left( \mathrm{Conv1D}_k\left( \mathrm{GAP}(X_{\mathrm{Input}}) \right) \right),$$
where $\mathrm{GAP}$ denotes global average pooling, and $\mathrm{Conv1D}_k$ denotes a one-dimensional convolution with kernel size $k$. In this paper, we set $k$ to 3. In addition, $\sigma$ is the Sigmoid activation function. The output feature $X_{\mathrm{Output}}$ can then be expressed as follows:
$$X_{\mathrm{Output}} = X_W \otimes X_{\mathrm{Input}},$$
where $\otimes$ denotes element-wise multiplication, and $X_W \in \mathbb{R}^{C \times 1 \times 1}$ is the attention weight map.
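The following PyTorch sketch illustrates Equations (5) and (6); it follows the standard ECA formulation with a fixed kernel size of 3 and is an illustrative reading rather than our exact implementation.

```python
import torch.nn as nn

class ECA(nn.Module):
    """Sketch of the ECA block in Equations (5)-(6): global average pooling,
    a 1D convolution with kernel size k across channels, a Sigmoid, and a
    channel-wise rescaling of the input features."""
    def __init__(self, k=3):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                       # x: (B, C, H, W)
        y = self.gap(x)                         # (B, C, 1, 1) channel responses
        y = y.squeeze(-1).transpose(1, 2)       # (B, 1, C): channels as a 1D sequence
        y = self.sigmoid(self.conv(y))          # attention weights X_W, Equation (5)
        y = y.transpose(1, 2).unsqueeze(-1)     # back to (B, C, 1, 1)
        return x * y                            # Equation (6): channel-wise rescaling
```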

5. Experiments

5.1. Implement Details

Our experiments are all performed on the Windows 11 operating system, and the programming environment is the PyCharm IDE with Python 3.8, PyTorch 1.8, and CUDA 11.1. The computing resources are an Intel Core i7-13700KF CPU and an NVIDIA RTX 4090 GPU.
To fairly compare the recognition algorithms and demonstrate the advantages of our method, the datasets are set up as follows. The space target dataset contains 17,000 samples, comprising 12,500 satellite targets and 4500 debris targets of varying sizes. The training/testing ratio is 8:2 for the binary classification network, and the target images are standardized to 320 × 240 before being fed to the network. The space scene dataset contains 2900 samples with a resolution of 2048 × 2048, of which 2000 are used as the training set and 900 as the test set. The scene images are randomly populated with different kinds of satellites and debris from our space target dataset; the test set contains a total of 1649 satellite targets and 1031 debris targets. To compare the recognition performance of different algorithms on space non-cooperative targets in natural scenes, the targets in the test sets of both datasets are almost entirely non-cooperative, i.e., their configurations do not appear in the training set.

5.2. Image Stitching Results

To measure the performance of our algorithm, we compare the ORB-based image registration algorithm presented in this paper with other classical feature extractors [27,29,30,31]. Uniformly, the KNN method is used for feature matching, the PROSAC algorithm for matching point correction, and the weighted fusion method for image fusion. We apply the different feature registration algorithms to 10 sets of simulated satellite images to be stitched and compare their number of correct feature point matching pairs ($N_c$), root mean squared error (RMSE), precision, and stitching speed. To evaluate the impact of imaging differences between sensors on stitching performance, we apply a transformation to each image pair: rotation transforms, luminance transforms, and Gaussian noise are added to simulate different viewing angles, illumination, and signal-to-noise ratio (SNR) conditions, respectively. The RMSE measures the deviation between the reference image and the matched image. The precision is the ratio of the number of correct feature point matches to the number of all matches after matching point correction, defined as follows:
$$\mathrm{Precision} = N_c / N,$$
where $N_c$ denotes the number of correct feature point matching pairs after matching correction, and $N$ denotes the number of all feature matches after matching correction. Stitching speed refers to the time from the start of feature extraction to the end of image fusion between two images.
As can be observed in Table 1, since the space background is simpler than the ground background, these feature matching algorithms show high robustness even for image pairs with imaging differences, and the matching algorithms based on SIFT, AKAZE, and ORB features reach up to 100% precision after PROSAC correction. In addition, regardless of which feature descriptor is used, the PROSAC-based matching correction outperforms RANSAC in terms of precision and RMSE and successfully eliminates the redundant matching pairs produced by RANSAC, reducing the computational burden. Among these, the SIFT+PROSAC algorithm has the lowest matching error. However, the ORB+PROSAC algorithm we adopt obtains a larger number of correctly matched points and reduces the image matching time by nearly 90 milliseconds, with only a 0.137 increase in RMSE compared with SIFT+PROSAC.
Figure 12 presents the matching and stitching results for three representative image pairs subjected to varying degrees of rotation, luminance transformation, and Gaussian noise. It can be seen that the PROSAC algorithm clearly eliminates incorrect feature pairs after the initial KNN matching. Meanwhile, when imaging deviations exist between sensors, our ORB+PROSAC stitching algorithm still works well, achieving smooth transitions and seamless stitching of multi-satellite observation images in overlapping regions without ghosting. This demonstrates the environmental adaptability of our stitching algorithm.

5.3. Randomized Data Augmentation Results

To verify the performance enhancement brought by the randomized data augmentation strategy to the target classification network, we compare, on the constructed space target dataset, the recognition accuracy of the SqueezeNet model under no augmentation, a fixed-order augmentation strategy, and the randomized data augmentation strategy. To ensure that the network's recognition accuracy is not affected by the amount of training data, for the original dataset without augmentation we replicate the samples 10× without any geometric or color transformation. Finally, the space target dataset for this experiment contains 10,000 satellite samples and 4000 debris samples in the training set and 2500 satellite samples and 500 debris samples in the test set.
As shown in Table 2, using a data augmentation strategy significantly improves the recognition accuracy of the network compared with using the original dataset. Among these strategies, owing to the added diversity and randomness of the samples, the randomized data augmentation strategy improves recognition accuracy by 21.97% and 5.43% compared with the replica-expansion strategy and the fixed-order augmentation strategy, respectively. This experiment demonstrates the effectiveness of our randomized data augmentation strategy.

5.4. Detection Performance of LCNN-DMT

5.4.1. Performance on Our Space Scene Dataset

Figure 13 demonstrates the space non-cooperative target detection results based on our proposed LCNN-DMT model. It can be seen that our model has multi-target identification and localization capabilities with high confidence. In practical applications, after target detection, the three-dimensional position of the target can be solved based on the principle of multi-camera positioning, and the target position information can then be provided to the control system to perform the approach and capture.

5.4.2. Performance on the SPARK Dataset

The SPARK dataset [16] is a widely used dataset for space target detection. To demonstrate the generalization capability of our model and its ability to identify space non-cooperative targets, we conduct detection experiments on the SPARK dataset, which was never used during the model's training or validation phases. As displayed in Figure 14, our model successfully locates and identifies satellite targets against Earth-background interference in the SPARK dataset, demonstrating its environmental adaptability.

5.4.3. Noise Case

As is well known, Gaussian noise usually appears during space image acquisition and transmission. To test the noise robustness of our LCNN-DMT model, recognition results under different levels of Gaussian noise are presented in Figure 15. The mean ($\mu$) of all the noise is 0, and the standard deviations ($\sigma$) range from 0 to 1. To evaluate image quality, we also calculate the signal-to-noise ratio (SNR) for images with varying noise levels.
It is observed that image quality significantly decreases as the noise level increases. Our LCNN-DMT model can correctly extract and identify satellite and debris targets with high confidence when the SNR remains above −10 dB. However, when the SNR is below −12 dB, the model, despite being able to detect targets normally, is unable to accurately identify them due to the overwhelming noise proportion in the image. It should be noted that the extreme scenarios depicted in Figure 15e–h, where the SNR is negative, are not commonly encountered in actual detection. Consequently, when considering practical applications, our model performs with commendable robustness.

5.5. Ablation Study

In this section, we conduct a series of ablation studies to systematically validate the contribution of each component within our proposed recognition approach.

5.5.1. Effectiveness of the Foreground Extraction Model

To demonstrate the advantages of the proposed MBRT-D foreground extraction algorithm, Table 3 presents a comparison of recognition performance when different foreground extraction algorithms are applied to the SqueezeNet model. The experiments are based on test images from our space scene dataset. It is evident that our proposed MBRT-D foreground extraction algorithm performs best: the mAP increases by 17.89% and 10.42% relative to MBR and MBRT, respectively. This indicates that removing redundant detection boxes based on the distance merge threshold yields better extraction and target localization.
As shown in Figure 16, we compare the effect of different distance merge thresholds on recognition performance, with thresholds ranging from 0 to 250. When the threshold is 0, no detection boxes are merged; the detection results are then identical to the MBR strategy, with poor recognition accuracy. Most of the targets in our test images are small, corresponding to long-distance detection, and are therefore more sensitive to the distance threshold. As a result, a small threshold effectively improves recognition accuracy, and the best performance is achieved at a threshold of 26, with an accuracy of 97.06%. As analyzed, recognition accuracy declines as the distance threshold continues to increase. When the distance merge threshold exceeds 250 pixels, recognition performance degrades below the MBR strategy, because an excessively large threshold causes detection boxes of different targets to be merged together, resulting in incorrect recognition and localization. The experiments show that the distance merge threshold is related to the detection distance and needs to be set within a reasonable interval accordingly. In subsequent experiments, our distance merge threshold is set to 26.
Figure 17 shows the foreground extraction results of MBR, MBRT, and MBRT-D on the same image. The image background has been subjected to brightness changes, and salt-and-pepper noise is added randomly to each image. We introduce OTSU [44] in all three algorithms to realize adaptive threshold segmentation and adapt to the brightness variation of the environment. It can be seen that, with our MBRT-D, the redundant neighboring detection boxes left by MBRT are merged. This indicates that MBRT-D reduces detection box redundancy and improves target localization accuracy while providing a certain degree of noise robustness.

5.5.2. Effectiveness of INFire Module

In this paper, we design the INFire module with parallel 3 × 1 and 1 × 3 convolutions instead of 3 × 3 convolutions. To examine the impact of the INFire module, we replace the Fire modules at different positions in SqueezeNet with INFire modules, training and testing the model on our space target dataset. As seen in Table 4, the spatially separable convolutions in the INFire module effectively reduce the number of model parameters and computations, and the more INFire modules are used, the lighter the model becomes. Compared with replacing all Fire modules or replacing Fire4-Fire5, the model accuracy is highest when the four positions Fire6-Fire9 are replaced, reaching 98.23%, while the model is compressed to 75.14% of the base SqueezeNet model.

5.5.3. Effectiveness of Attention Mechanism

To verify the effect of the ECA attention mechanism in our optimized SqueezeNet, we place the ECA module at the end of the Fire module in different layers; Figure 18 shows the effect of the ECA module and its position on recognition accuracy. It can be seen that introducing the ECA mechanism after the Fire module at most locations improves recognition performance.
However, the original recognition ability may be weakened if the location is chosen improperly. Adding the ECA mechanism after Fire9 reduces recognition accuracy from 98% to 97.63%, because the processing position is too late in the network. When the ECA mechanism is added after every module from Fire2 to Fire9, recognition accuracy drops by 0.33%, probably because the ECA mechanism attached to the Fire9 layer has a larger impact on the overall model. After experimentation, model recognition accuracy is highest, up to 98.47%, when the ECA module is embedded after both Fire3 and Fire7. In addition, the ECA attention mechanism improves recognition accuracy while leaving the computation and parameter counts essentially unchanged.

5.5.4. Effectiveness of Improved SqueezeNet

After confirming the effectiveness of the INFire and ECA modules, we introduce their combination into SqueezeNet. In particular, based on the single-module experimental results, we use the INFire module to replace the Fire6-Fire9 modules in the original model to lighten it, and we deploy the ECA module after the concatenation operation of the Fire3 and INFire7 modules for channel information fusion. According to Table 5, our improvement increases recognition accuracy to 98.83% while reducing the number of model parameters by 24.86% and the computation by 15.02%. This proves the positive effect of the INFire and ECA modules on the model's recognition performance.

5.6. Comparative Experiments

5.6.1. Comparison of Classification Model Performance

To demonstrate the excellent performance of our improved SqueezeNet-based classification model, we compare it with the LeNet-5 model used in [38], the ResNet-50 model used in [22,23], and other state-of-the-art (SOTA) models from recent years [6,45,46,47,48]. The comparison experiments are based on our space target dataset, and the training epochs are uniformly set to 32. As presented in Figure 19, owing to the sufficient number of training samples after data augmentation, all models achieve recognition accuracies above 97%, except for the LeNet-5 model, which has fewer layers. Although the LeNet-5 model has the fastest inference and the least computation, its fully connected layers result in an overly large number of parameters and model size.
The comparison results show that our model, with the introduction of separable convolutions and the ECA attention mechanism, is superior to the basic SqueezeNet model in accuracy, parameter count, and computation, at the cost of a speed delay of no more than 1 ms. Among the models, ours has the highest recognition accuracy and the smallest number of parameters, making it more lightweight and hardware-friendly for on-orbit recognition tasks. In addition, although MobileNetv3 and ShuffleNetv2 also have outstanding recognition performance with low computing power requirements, our model's inference time is only 44.83% and 56.52% of theirs, respectively. Therefore, our improved classification model has superior recognition performance compared with current SOTA models, proving the effectiveness of the improved modules.

5.6.2. Comparison of Recognition Method Performance

To further clarify the superiority of our LCNN-DMT model, five recent SOTA models are used for comparison, including lightweight models of the YOLO series [7,8,9], Faster R-CNN [10], and our baseline model T-SCNN [22]. All experiments are conducted on our space scene dataset. The YOLO series and Faster R-CNN are trained for 300 epochs with a batch size of 16.
The results are presented in Table 6. Our LCNN-DMT model is clearly lightweight: even the lightest of the mainstream models, YOLO v5n, has 1.9 times the parameters and 9.1 times the computation of our model. Moreover, our LCNN-DMT model remains lightweight while meeting the accuracy and speed requirements of space detection. The mAP of our model reaches 97.50%, only slightly lower than YOLO v8n, but the average single-frame inference speed is improved by 17.6% compared with YOLO v8n and is much faster than Faster R-CNN.
In practical detection, targets in space tend to be sparse. Our model adopts a two-stage strategy, so the image inference time depends mainly on the number of targets in the scene. This means that when the number of targets in the scene is less than three, our model has a significant speed advantage, with an average single-target inference time of only 2 ms. In addition, considering the resource limitations of onboard computing platforms, wide-area space images usually need to be recognized block by block after segmentation. Mainstream models such as YOLO and Faster R-CNN must run inference on every sub-image, wasting a great deal of time on blank areas. In contrast, this paper adopts image processing methods to quickly locate target regions first, so that only the sub-images containing targets are processed, which yields higher on-orbit recognition efficiency.
On the other hand, our LCNN-DMT only requires training the classification model, which saves a great deal of annotation and training time; the training time to network convergence is only 15 min. Compared with YOLO v5n and Faster R-CNN, the training time is reduced by factors of 6 and 168, respectively.
Finally, compared with our baseline model, T-SCNN, our LCNN-DMT model shows a notable performance improvement, owing to our optimized foreground extraction algorithm and the redesigned SqueezeNet classification model. Specifically, the mAP increases by 10.86%, the model size is reduced sharply by 96.3%, and the inference speed is improved by 66.5%, which once again verifies the effectiveness and advantages of our proposed method.

5.6.3. Visualization Comparison

The visualization results of each detection model under the space scene dataset are shown in Figure 20. Obviously, our LCNN-DMT model shows the best recognition performance for space non-cooperative targets and can even recognize targets that are difficult to judge by human eyes.
Particularly in Figure 20a, due to the low brightness of the debris target located at the top of the scene, it is misidentified as a satellite target by all mainstream recognition models except our LCNN-DMT model and the T-SCNN model, highlighting the advantages of the strategy we adopt.
On the other hand, since the foreground extraction algorithm designed in this paper incorporates the distance merge threshold, there is almost no detection box redundancy in the visualization results of LCNN-DMT compared with the other models. As shown in Figure 20a–d, both the YOLO series models and the T-SCNN model exhibit redundant detection of solar panels, which leads to misjudgment of the number of targets and localization errors. Although Faster R-CNN does not have this problem, its inference speed is too slow for realistic on-orbit recognition. Therefore, considering recognition precision, speed, parameter count, computational cost, and visualization results, our proposed recognition model, LCNN-DMT, has more outstanding recognition performance and is more suitable for on-orbit deployment.

6. Discussion

The experimental results demonstrate that our method can be applied to multi-satellite cooperative observation for missions such as space capture and space surveillance. It should be noted that our recognition model is not only applicable to spatial targets but also to conventional targets. Furthermore, the model is not limited to optical images; it can also be extended to synthetic aperture radar (SAR) images for the task of non-cooperative target detection.
As depicted in Figure 21, we trained and tested our LCNN-DMT model based on the SAR-Ship dataset [49]. Fortunately, our algorithm can successfully detect ship targets in SAR images. Moreover, our proposed foreground extraction algorithm successfully merged redundant extraction frames caused by pixel discontinuities, which are annotated with green ellipses in Figure 21b,c.
However, our cooperative recognition algorithm has certain limitations. On the one hand, our algorithm is better suited to the fast processing of scenes with simple backgrounds, such as space or open sea; for interfering structures such as harbors or islands, as in Figure 21c, our foreground extraction algorithm extracts them along with the targets. Although we include such backgrounds as negative samples in our classification model to mitigate this issue, recognition performance is likely to degrade when faced with excessively complex backgrounds.
On the other hand, in practical applications, imaging differences between satellite detectors due to varying viewing angles, light intensities, and SNRs directly affect the stitching and recognition accuracy. Figure 22 shows the cooperative recognition results of our model for satellite images with different SNRs. Owing to the image preprocessing stages adopted in our model, including mean filtering and contrast enhancement, the quality difference between images is reduced, and the space targets are successfully recognized by cooperative detection. However, larger imaging differences pose a greater challenge to the model and can adversely affect the recognition results.
In the future, we will consider utilizing super-resolution reconstruction techniques to enhance the algorithm’s robustness against variations in viewing angles, lighting conditions, and noise levels. Additionally, we will consider utilizing neural network hardware acceleration techniques to achieve the fast inference of our model on FPGA platforms and verify its feasibility for on-orbit deployment.

7. Conclusions

In contrast to terrestrial natural image detection tasks, space imagery is characterized by simple backgrounds, large image scale, high noise levels, and sparse targets that occupy few pixels in long-range detection scenarios. Given the memory and computational constraints of onboard hardware platforms, this paper has proposed a lightweight on-orbit space non-cooperative target recognition method with multi-satellite cooperation, aiming to achieve coordinated sensing across multiple satellites while reducing the parameter count and computational complexity of the recognition model. The input is the set of images collected by multiple satellites; through the image stitching algorithm and recognition model proposed in this paper, a wide-area image is obtained, and the target categories and locations are output.
Concretely, we employed a feature-based image registration algorithm to realize high-precision stitching of multi-satellite images, taking into account the effects of perspective, illumination, and noise. We then proposed a novel space non-cooperative target recognition model, LCNN-DMT. For the foreground extraction stage, we proposed the MBRT-D model, which solves the problem of redundant detection boxes by introducing a distance merge threshold, thereby achieving high-precision localization and extraction of suspicious targets in the scene. For the target classification stage, we introduced spatially separable convolutions and the ECA attention mechanism into the SqueezeNet model to obtain an improved model with higher accuracy and lighter weight. In addition, we constructed publicly available space target and space scene datasets to facilitate further research on space target classification and multi-target detection.
In conclusion, our method enables cooperative recognition of images obtained through multi-satellite observation, thereby enlarging the field of view and improving detection efficiency. The LCNN-DMT recognition model we designed is sufficiently lightweight to mitigate the high computational demands of deep learning models, making it feasible for deployment on satellite-borne platforms. Moreover, compared with mainstream deep learning models such as YOLO and Faster R-CNN, our model achieves high recognition accuracy and inference efficiency without the need for large-scale scene datasets, thereby reducing the time required for labeling and training.

Author Contributions

Conceptualization, Y.Z. and D.S.; methodology, Y.Z. and J.W.; software, Y.Z. and D.S.; data curation, D.S. and X.C.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z. and J.W.; funding acquisition, J.C.; supervision, J.W. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China with grant number U21B6002.

Data Availability Statement

The original data presented in the study are openly available at https://github.com/FPGAzzy/Space_datasets, accessed on 5 May 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, L.; Huang, P.; Cai, J.; Meng, Z.; Liu, Z. A non-cooperative target grasping position prediction model for tethered space robot. Aerosp. Sci. Technol. 2016, 58, 571–581. [Google Scholar] [CrossRef]
  2. Shan, M.; Guo, J.; Gill, E. Review and comparison of active space debris capturing and removal methods. Prog. Aerosp. Sci. 2016, 80, 18–32. [Google Scholar] [CrossRef]
  3. Sun, C.; Sun, Y.; Yu, X.; Fang, Q. Rapid Detection and Orbital Parameters’ Determination for Fast-Approaching Non-Cooperative Target to the Space Station Based on Fly-around Nano-Satellite. Remote Sens. 2023, 15, 1213. [Google Scholar] [CrossRef]
  4. Xiang, A.; Zhang, L.; Fan, L. Shadow removal of spacecraft images with multi-illumination angles image fusion. Aerosp. Sci. Technol. 2023, 140, 108453. [Google Scholar] [CrossRef]
  5. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  6. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  7. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  8. Jocher, G. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 5 May 2024).
  9. Jocher, G. YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 5 May 2024).
  10. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  11. Zhang, H.; Zhang, Y.; Feng, Q.; Zhang, K. Review of Machine-Learning Approaches for Object and Component Detection in Space Electro-optical Satellites. Int. J. Aeronaut. Space Sci. 2024, 25, 277–292. [Google Scholar] [CrossRef]
  12. Zhang, H.; Liu, Z.; Jiang, Z.; An, M.; Zhao, D. BUAA-SID1.0 space object image dataset. Spacecr. Recovery Remote Sens. 2010, 31, 65–71. [Google Scholar]
  13. Zhang, H.; Jiang, Z. Multi-view space object recognition and pose estimation based on kernel regression. Chin. J. Aeronaut. 2014, 27, 1233–1241. [Google Scholar] [CrossRef]
  14. Kisantal, M.; Sharma, S.; Park, T.H.; Izzo, D.; Märtens, M.; D’Amico, S. Satellite pose estimation challenge: Dataset, competition design, and results. IEEE Trans. Aerosp. Electron. Syst. 2020, 56, 4083–4098. [Google Scholar] [CrossRef]
  15. Zhang, Z.; Deng, C.; Deng, Z. A diverse space target dataset with multidebris and realistic on-orbit environment. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 9102–9114. [Google Scholar] [CrossRef]
  16. Musallam, M.A.; Gaudilliere, V.; Ghorbel, E.; Al Ismaeil, K.; Perez, M.D.; Poucet, M.; Aouada, D. Spacecraft recognition leveraging knowledge of space environment: Simulator, dataset, competition design and analysis. In Proceedings of the 2021 IEEE International Conference on Image Processing Challenges (ICIPC), Anchorage, AK, USA, 19–22 September 2021; pp. 11–15. [Google Scholar]
  17. Pang, Y.; Yao, L.; Luo, Y.; Dong, C.; Kong, Q.; Chen, B. RepSViT: An Efficient Vision Transformer Based on Spiking Neural Networks for Object Recognition in Satellite On-Orbit Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16. [Google Scholar] [CrossRef]
  18. Huang, P.; Zhang, F.; Chen, L.; Meng, Z.; Zhang, Y.; Liu, Z.; Hu, Y. A review of space tether in new applications. Nonlinear Dyn. 2018, 94, 1–19. [Google Scholar] [CrossRef]
  19. Forshaw, J.L.; Aglietti, G.S.; Navarathinam, N.; Kadhem, H.; Salmon, T.; Pisseloup, A.; Joffre, E.; Chabot, T.; Retat, I.; Axthelm, R.; et al. RemoveDEBRIS: An in-orbit active debris removal demonstration mission. Acta Astronaut. 2016, 127, 448–463. [Google Scholar] [CrossRef]
  20. Nakasuka, S.; Funane, T.; Nakamura, Y.; Nojiri, Y.; Sahara, H.; Sasaki, F.; Kaya, N. Sounding rocket flight experiment for demonstrating “Furoshiki Satellite” for large phased array antenna. Acta Astronaut. 2006, 59, 200–205. [Google Scholar] [CrossRef]
  21. Meng, Z.; Huang, P.; Guo, J. Approach modeling and control of an autonomous maneuverable space net. IEEE Trans. Aerosp. Electron. Syst. 2017, 53, 2651–2661. [Google Scholar] [CrossRef]
  22. Wu, T.; Yang, X.; Song, B.; Wang, N.; Gao, X.; Kuang, L.; Nan, X.; Chen, Y.; Yang, D. T-SCNN: A two-stage convolutional neural network for space target recognition. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 1334–1337. [Google Scholar]
  23. Yang, X.; Wu, T.; Wang, N.; Huang, Y.; Song, B.; Gao, X. HCNN-PSI: A hybrid CNN with partial semantic information for space target recognition. Pattern Recognit. 2020, 108, 107531. [Google Scholar] [CrossRef]
  24. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  25. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  26. Li, J.; Bi, G.; Wang, X.; Nie, T.; Huang, L. Radiation-Variation Insensitive Coarse-to-Fine Image Registration for Infrared and Visible Remote Sensing Based on Zero-Shot Learning. Remote Sens. 2024, 16, 214. [Google Scholar] [CrossRef]
  27. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  28. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Proceedings, Part I 9. Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417. [Google Scholar]
  29. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
  30. Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2548–2555. [Google Scholar]
  31. Alcantarilla, P.; Nuevo, J.; Bartoli, A. Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces. In Proceedings of the British Machine Vision Conference 2013, Bristol, UK, 9–13 September 2013; British Machine Vision Association: London, UK, 2013. [Google Scholar]
  32. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  33. Cao, M.; Yan, Q.; Lv, Z. FAPP: Extremely Fast Approach to Boosting Image Matching Precision. IEEE Sensors J. 2024, 24, 4907–4919. [Google Scholar] [CrossRef]
  34. Chum, O.; Matas, J. Matching with PROSAC-progressive sample consensus. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 220–226. [Google Scholar]
  35. Zhu, F.; Li, J.; Zhu, B.; Li, H.; Liu, G. UAV remote sensing image stitching via improved VGG16 Siamese feature extraction network. Expert Syst. Appl. 2023, 229, 120525. [Google Scholar] [CrossRef]
  36. Li, W.; Yang, C.; Peng, Y.; Zhang, X. A multi-cooperative deep convolutional neural network for spatiotemporal satellite image fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10174–10188. [Google Scholar] [CrossRef]
  37. Fan, R.; Hou, B.; Liu, J.; Yang, J.; Hong, Z. Registration of Multiresolution Remote Sensing Images Based on L2-Siamese Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 237–248. [Google Scholar] [CrossRef]
  38. Zeng, H.; Xia, Y. Space target recognition based on deep learning. In Proceedings of the 2017 20th International Conference on Information Fusion, Xi’an, China, 10–13 July 2017; pp. 1–5. [Google Scholar]
  39. Yang, X.; Nan, X.; Song, B. D2N4: A discriminative deep nearest neighbor neural network for few-shot space target recognition. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3667–3676. [Google Scholar] [CrossRef]
  40. Yingxiao, L.; Ju, H.; Ping, M.; Jiang, R. Target localization method of non-cooperative spacecraft on on-orbit service. Chin. J. Aeronaut. 2022, 35, 336–348. [Google Scholar]
  41. Chen, B.; Cao, J.; Parra, A.; Chin, T.J. Satellite pose estimation with deep landmark regression and nonlinear pose refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 2816–2824. [Google Scholar]
  42. Liu, Y.; Zhou, X.; Han, H. Lightweight CNN-based method for spacecraft component detection. Aerospace 2022, 9, 761. [Google Scholar] [CrossRef]
  43. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
  44. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
  45. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  46. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  47. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  48. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  49. Wang, Y.; Wang, C.; Zhang, H.; Dong, Y.; Wei, S. A SAR Dataset of Ship Detection for Deep Learning under Complex Backgrounds. Remote. Sens. 2019, 11, 765. [Google Scholar] [CrossRef]
Figure 1. Scene diagram of space non-cooperative target recognition based on an autonomous maneuvering space flying net capture system.
Figure 2. Diagram of the effect of data augmentation in the space target dataset.
Figure 3. Example of a simulated image from the space scene dataset.
Figure 4. The architecture of the proposed space non-cooperative target recognition method.
Figure 5. The framework of the multi-satellite space image stitching algorithm.
Figure 6. Multi-satellite image stitching order.
Figure 7. Schematic diagram of MBRT.
Figure 8. The boundary distance between two rectangular boxes at different relative positional relationships in MBRT-D, where D represents the boundary distance.
Figure 9. SqueezeNet v1.1 network structure.
Figure 10. The improved SqueezeNet network structure.
Figure 11. Efficient channel attention (ECA) module.
Figure 12. The stitching results of images exhibiting disparities in viewpoint, luminance, and signal-to-noise ratio.
Figure 13. Space non-cooperative target detection results based on our proposed LCNN-DMT model.
Figure 14. Space scene recognition result under the SPARK dataset.
Figure 15. Partial detection results of LCNN-DMT under different degrees of Gaussian noise effects.
Figure 16. Recognition precision of the model with different distance merge thresholds set by MBRT-D.
Figure 17. Schematic diagram of the effect of the three foreground extraction algorithms. (a) Original space scene image. (b) Results of MBR. (c) Results of MBRT, where the threshold is set to 0.5. (d) Results of MBRT-D, where the distance merge threshold is set to 26.
Figure 18. The effectiveness of the ECA attention mechanism on SqueezeNet.
Figure 19. Comparison results of SqueezeNet and existing SOTA methods.
Figure 20. Spatial scene image detection effect of different target detection models. The green boxes mark misidentified or redundantly identified detections. (a–d) represent typical test images from our space scene dataset, all of which contain both disabled satellite and debris targets.
Figure 21. Recognition results of our model on SAR images. (a) represents the detection process and results for a single ship target. (b) represents the detection process and results for several ship targets. (c) represents the detection process and results for a background containing island interferences.
Figure 22. The cooperative recognition result of our model for satellite images with different SNRs.
Table 1. Comparison of matching performance of various feature-matching algorithms.

| Feature Extractors  | Nc  | RMSE  | Precision (%) | Speed (ms) |
|---------------------|-----|-------|---------------|------------|
| SIFT + RANSAC       | 101 | 0.544 | 92.59         | 100        |
| SIFT + PROSAC       | 58  | 0.491 | 100.00        | 100        |
| SURF + RANSAC       | 118 | 0.988 | 88.00         | 36         |
| SURF + PROSAC       | 100 | 0.979 | 98.00         | 36         |
| BRISK + RANSAC      | 175 | 0.988 | 96.70         | 32         |
| BRISK + PROSAC      | 163 | 0.956 | 99.42         | 32         |
| AKAZE + RANSAC      | 113 | 0.644 | 96.00         | 78         |
| AKAZE + PROSAC      | 102 | 0.630 | 100.00        | 78         |
| ORB + RANSAC        | 184 | 0.638 | 96.84         | 9          |
| ORB + PROSAC (Ours) | 178 | 0.624 | 100.00        | 9          |
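For readers who wish to reproduce a pipeline similar to the best-performing combination in Table 1 (ORB features with PROSAC-based robust estimation), the sketch below uses OpenCV's ORB detector together with the USAC_PROSAC flag of findHomography available in recent OpenCV releases; the feature count, ratio-test threshold, and reprojection error are illustrative values, not the exact configuration behind Table 1.

```python
# Sketch of ORB feature matching with PROSAC-style robust homography estimation.
# Feature count, ratio-test threshold, and reprojection error are illustrative values.
import cv2
import numpy as np

def estimate_homography(img1, img2):
    """Match ORB features between two grayscale images and fit a homography."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des1, des2, k=2)

    # Lowe ratio test, then sort by descriptor distance: PROSAC exploits this quality ordering.
    good = [m for m, n in (p for p in pairs if len(p) == 2) if m.distance < 0.75 * n.distance]
    good.sort(key=lambda m: m.distance)

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # USAC_PROSAC is a PROSAC-style robust estimator shipped with recent OpenCV versions.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.USAC_PROSAC, ransacReprojThreshold=3.0)
    return H, inlier_mask
```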
Table 2. Comparison results of different data augmentation methods. “✓” represents that this strategy is used.

| Replication Extension | Fixed Order Augmentation | Randomized Augmentation | Accuracy (%) |
|-----------------------|--------------------------|-------------------------|--------------|
| ✓                     |                          |                         | 76.03        |
|                       | ✓                        |                         | 92.57        |
|                       |                          | ✓                       | 98.00        |
Table 3. Comparison of the recognition performance of foreground extraction algorithms.

| Foreground Extraction | MBR   | MBRT  | MBRT-D |
|-----------------------|-------|-------|--------|
| AP50 (Satellite)/%    | 83.64 | 93.71 | 97.32  |
| AP50 (Junk)/%         | 74.70 | 79.57 | 96.79  |
| mAP/%                 | 79.17 | 86.64 | 97.06  |
Table 4. Effectiveness of the INFire module on SqueezeNet. “x” represents that this module is not deployed in the model, and “✓” represents that it is deployed.

| Model      | INFire2-3 | INFire4-5 | INFire6-9 | Accuracy | Params/M | GFLOPs |
|------------|-----------|-----------|-----------|----------|----------|--------|
| SqueezeNet | x         | x         | x         | 98.00%   | 1.235    | 0.545  |
|            | ✓         | x         | x         | 97.90%   | 1.229    | 0.516  |
|            | x         | ✓         | x         | 98.23%   | 1.186    | 0.490  |
|            | x         | x         | ✓         | 98.23%   | 0.928    | 0.463  |
|            | x         | ✓         | ✓         | 98.13%   | 0.879    | 0.408  |
|            | ✓         | ✓         | ✓         | 97.97%   | 0.867    | 0.867  |
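Table 4 varies which Fire modules are replaced by INFire modules built on spatially separable convolutions. The generic PyTorch sketch below shows a spatially separable 3 × 3 convolution (a 3 × 1 kernel followed by a 1 × 3 kernel) purely to illustrate the parameter saving behind the table; it is not the actual INFire definition, and the channel widths are arbitrary. For the 64-channel example printed below, the separable variant needs roughly one third fewer weights than the standard convolution, which is the kind of saving reflected in the Params/M column.

```python
# Generic spatially separable 3x3 convolution sketch (illustrative, not the INFire module).
import torch
import torch.nn as nn

class SpatiallySeparableConv(nn.Module):
    """Replaces one k x k convolution with a k x 1 convolution followed by a 1 x k convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.vertical = nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(out_ch, out_ch, kernel_size=(1, k), padding=(0, k // 2))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.horizontal(self.vertical(x)))

# Parameter comparison against a standard 3x3 convolution with the same channel widths.
std = nn.Conv2d(64, 64, kernel_size=3, padding=1)
sep = SpatiallySeparableConv(64, 64)
print(sum(p.numel() for p in std.parameters()),   # standard 3x3 weights + bias
      sum(p.numel() for p in sep.parameters()))   # separable 3x1 + 1x3 weights + biases
```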
Table 5. The effectiveness of the INFire module and ECA module on SqueezeNet. “x” represents that this module is not deployed in the model, and “✓” represents that it is deployed.

| Model      | INFire | ECA | Accuracy | Params/M | GFLOPs |
|------------|--------|-----|----------|----------|--------|
| SqueezeNet | x      | x   | 98.00%   | 1.235    | 0.545  |
|            | ✓      | x   | 98.23%   | 0.928    | 0.463  |
|            | x      | ✓   | 98.47%   | 1.235    | 0.545  |
|            | ✓      | ✓   | 98.83%   | 0.928    | 0.464  |
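The ECA module evaluated in Table 5 follows the design of [25]: global average pooling produces a channel descriptor that is filtered by a lightweight 1-D convolution before re-weighting the feature maps. The PyTorch sketch below is a standard ECA block with an assumed kernel size of 3 and is not the authors' exact implementation; note that its parameter count is negligible, consistent with the unchanged Params/M column in Table 5.

```python
# Sketch of an Efficient Channel Attention (ECA) block following [25]; kernel size is assumed.
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Channel attention via global average pooling + 1-D convolution (no channel reduction)."""
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                       # x: (N, C, H, W)
        y = self.pool(x)                        # (N, C, 1, 1) channel descriptor
        y = y.squeeze(-1).transpose(-1, -2)     # (N, 1, C) for the 1-D convolution
        y = self.conv(y)                        # local cross-channel interaction
        y = y.transpose(-1, -2).unsqueeze(-1)   # back to (N, C, 1, 1)
        return x * self.sigmoid(y)              # re-weight the input feature maps

feat = torch.randn(1, 512, 13, 13)              # e.g., a late SqueezeNet feature map
print(ECA()(feat).shape)                        # torch.Size([1, 512, 13, 13])
```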
Table 6. Performance comparison results of non-cooperative target recognition algorithms.

| Method       | Input Size (Pixels) | mAP50 (%) | Parameters (M) | Model Size (MB) | GFLOPs | Speed (ms) | Train Time (h) |
|--------------|---------------------|-----------|----------------|-----------------|--------|------------|----------------|
| YOLOv5n      | 640 × 640           | 96.89     | 1.76           | 3.90            | 4.2    | 4.5        | 1.58           |
| YOLOv7-tiny  | 640 × 640           | 96.39     | 6.03           | 12.36           | 13.0   | 5.7        | 2.59           |
| YOLOv8n      | 640 × 640           | 97.85     | 3.01           | 6.38            | 8.2    | 7.4        | 3.27           |
| Faster R-CNN | 640 × 640           | 89.17     | 47.27          | 377.60          | 129.3  | 180.2      | 42.04          |
| T-SCNN       | 2048 × 2048         | 86.64     | 25.56          | 97.80           | 6.398  | 18.2 (6) ¹ | 0.37           |
| LCNN-DMT     | 2048 × 2048         | 97.50     | 0.928          | 3.58            | 0.464  | 6.1 (2)    | 0.25           |

¹ 18.2 (6) in the table represents an average image recognition time of 18.2 ms and a single-target recognition time of 6 ms.