Article

Effective Multi-Frame Optical Detection Algorithm for GEO Space Objects

1 National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(9), 4610; https://doi.org/10.3390/app12094610
Submission received: 6 April 2022 / Revised: 28 April 2022 / Accepted: 29 April 2022 / Published: 3 May 2022


Featured Application

Space object detection.

Abstract

The limited resource of Geostationary Earth Orbit (GEO) is precious, and most telecommunication, weather and navigational satellites are placed in this orbit. In order to guarantee the safety and health of active satellites, advanced surveillance and early warning of unknown space targets such as space debris are crucial. However, space object detection remains a very challenging problem because of weak target characteristics and the complex star background. To address this problem, we employ a deep-learning-based framework called PP-YOLOv2 for single-frame object detection and design a post-processing algorithm named CFS for further candidate filtration and supplement. First, we transform the label information and generate the corresponding bounding boxes to train the PP-YOLOv2 detector, which extracts candidate coordinates for each frame. Then, the CFS technique is applied as an effective post-processing procedure to obtain the final prediction results. Experiments conducted on a dataset from the Kelvins SpotGEO challenge demonstrate the effectiveness and the competitive detection performance of the proposed pipeline. Finally, deployment results on the NVIDIA Jetson Nano show that the proposed method has a promising application prospect for space target monitoring systems.

1. Introduction

Over the past couple of decades, we have witnessed vigorous development of space activities across the world, with tens of thousands of new space assets being launched globally. With the expansion of human space exploration, Resident Space Object (RSO) detection has become a critical issue of global concern in recent years [1].
Fast-moving artificial space objects such as dead satellites, fragments of abandoned launch vehicle stages and mission-related debris are a potential threat to all space vehicles. Hence, it is urgent to establish space surveillance stations that enable the detection, identification and avoidance of space hazards, as an important part of space situational awareness (SSA) systems [2]. As one of the critical technologies, space object detection based on optical imaging characteristics has received wide attention.
The Kelvins SpotGEO challenge [3], held in 2020, provides a platform for addressing the detection of space objects near the Geostationary Earth Orbit (GEO). In addition, the competition organizers assembled and released the first publicly available SpotGEO dataset [4] from a computer vision perspective.
The SpotGEO dataset consists of optical images captured at night by low-cost CMOS cameras mounted on a ground-based astronomical telescope, as shown in Figure 1. The key difficulty lies in effectively detecting the target objects, which are extremely distant from the observer, under weak target characteristics and a complex star background.
Several algorithms have already been developed for similar problems. Šára et al. [5] proposed a RANSAC method based on a statistical model and a Monte Carlo algorithm for detecting tracks of slowly moving objects, which performs well at low SNR with moving thin overhead clouds. Do et al. [6] proposed a pipeline named GP-ICP to detect resident space objects in the GEO band; they adopted a detect-before-track strategy with Gaussian process regression and a line-genuine technique. Yanagisawa et al. [7] proposed a line-identifying technique combined with a stacking method to detect uncatalogued GEO debris in CCD frames. Liu et al. [8] introduced a multi-target detection algorithm for geostationary space objects based on topological sweep, exploiting geometric duality to extract target objects with approximately linear trajectories. Ohsawa [9] introduced a robust moving-object detection algorithm called tracee, which accelerates tracklet identification by extracting linearly aligned points as a line segment in a three-dimensional space. To speed up space target detection, traditional algorithms based on hand-crafted feature selection need to be replaced by automatic, learning-based approaches.
The literature highlights that deep learning (DL) models surpass others in terms of accuracy and efficiency, making remarkable progress in various application scenarios, including natural scenery images, remote sensing images and aerial images. Currently, convolutional neural network (CNN)-based object detection algorithms have strong advantages owing to their ability to extract abstract and robust high-level semantic information, along with their high plasticity and universality. It is equally important that datasets play a dominant role in shaping the future of DL models, as DL itself is a data-driven approach. Consequently, we seize this opportunity to further investigate the object detection problem in deep space scenes with a star background based on the SpotGEO dataset.
In this paper, a multi-frame optical detection algorithm for GEO targets is proposed, leveraging both imaging characteristics and motion information. Motivated by state-of-the-art object detection methods for natural scene imagery, we adopt a real-time DL framework called PP-YOLOv2 [10] for single-frame object detection and design a post-processing algorithm named CFS for further candidate filtration and supplement. The main contributions of this paper are summarized as follows.
(1)
To the best of our knowledge, this is the first effort to introduce the deep model PP-YOLOv2 into the field of space object detection. We evaluate several representative models and conduct extensive comparative experiments. This is expected to facilitate the development and benchmarking of object detection algorithms for satellite imagery.
(2)
To make the most of the motion characteristic that space objects follow linear trajectories, an effective candidate filtration and supplement (CFS) method based on a straight-line searching strategy is designed to further increase accuracy. Experimental results demonstrate that the proposed algorithm reaches an F1-score of 93.47% with competitive performance.
(3)
In order to verify the efficiency of the proposed algorithm, we port the proposed pipeline to the embedded platform NVIDIA Jetson Nano. In terms of frames per second (FPS), the proposed method achieves an inference speed of 1.42 FPS with a ResNet50 [11] backbone rather than a typical lightweight backbone network, which paves the way for further optimization and improvement to meet the demands of future real-time image processing.
The remainder of this paper is organized as follows: Section 2 briefly highlights related work and materials. Section 3 describes the multi-frame object detection pipeline in detail, including the 2D single-frame detector and the proposed CFS technique. Section 4 provides overall performance and comparison results of the proposed method with analysis and discussion. Finally, conclusions with a summary of the key findings of our work are drawn in Section 5.

2. Related Work and Materials

2.1. Deep Models for Object Detection

Object detection is a longstanding problem that has witnessed rapid, revolutionary change in the field of computer vision. Because it combines object classification with object localization, it is one of the most challenging topics. Traditional object detection algorithms rely on manually selected features and can achieve good results in specific scenes. Compared with traditional methods, algorithms based on deep learning overcome the disadvantages of poor region selection strategies and the weak robustness of extracted features, and detection performance has been significantly improved.
Current mainstream object detection algorithms can be divided into two types: two-stage methods [12,13,14,15] and one-stage methods [16,17,18,19,20,21]. Two-stage methods follow the traditional object detection pipeline, first generating region proposals and then classifying each proposal into different object categories. Detection accuracy can be improved to a large degree, but the detection speed is slow and cannot meet the requirements of real-time detection.
One-stage methods regard object detection as a regression or classification problem, adopting a unified framework to obtain the final results (categories and locations) directly. Typical examples include the You Only Look Once (YOLO) family of methods. With the introduction of the PP-YOLO series [22], the detection performance of YOLO-based methods has been improved further. Currently, the newest version, PP-YOLOv2, integrates many new techniques to optimize computational efficiency.
Additionally, over the past two years, major advances in object detection research have focused on anchor-free detectors [23,24,25]. Anchor-free methods detect objects by predicting their centroids and sizes, and are architecturally simpler and faster.

2.2. Dataset

As a pioneer, the Kelvins SpotGEO challenge organizers have released the SpotGEO dataset, which was collected for the study of the space object detection problem. All image sequences are captured by a low-cost optical CMOS camera mounted on a ground-based telescope, with an angular pixel size of about 4.5 arcseconds. The camera remains stationary relative to the Earth's surface during the 40 s exposure time. Five consecutive frames are taken as one sequence imaging a portion of the sky at the geostationary ring. The camera motion is constant between any two consecutive frames within a sequence.
Under this observation strategy, background stars move relative to the camera during the exposure time, so their motion trajectories appear streak-like. GEO objects, in contrast, appear as spots or short streaks in each image, and their motion trajectory within one sequence appears as uniformly distributed discrete points along a straight line.
To better illustrate the SpotGEO dataset, two example sequences containing 3 and 5 targets, respectively, are selected for display, as described in Figure 2 and Figure 3. Example images annotated with the corresponding ground truth are shown in Figure 2a and Figure 3a–e. It is evident that some space targets are too faint and dark to be seen by the human eye. Possible causes include cloud cover, atmospheric/weather effects, star occlusion and sensor noise/defects. All these adverse factors increase the difficulty of the GEO space object detection problem. Furthermore, when the coordinates of objects in five consecutive frames are gathered together, explicit motion trajectories can be seen in Figure 2b and Figure 3f.
Specifically, the SpotGEO dataset contains 1280 sequences (6400 grayscale images) with annotations for training and 5120 sequences (25,600 grayscale images) for test. All images have a resolution of 640 × 480. There is only one object category, annotated with point coordinates (x, y).

3. Methodology

The flowchart of the proposed algorithm is presented in Figure 4. The algorithm mainly consists of two parts. The 2D object detector PP-YOLOv2 searches for potential candidate targets in each frame according to appearance features. Then, the tracker based on CFS is deployed to filter out true targets and supplement misdetected objects.

3.1. Data Preprocess: Label Transform

Owing to atmospheric distortion and the long exposure time, the photons received from a target object are smeared over a few pixels. On the other hand, the brightness of space targets is unstable because of their movement. Hence, we take advantage of the intensity correlations of neighboring pixels and transform the point-coordinate label information, generating corresponding bounding box labels as depicted in Figure 5. The conversion regards each target point as the center of its bounding box. The distances between the center point and the top, bottom, left and right edges are all set to 2 pixels to generate the ground-truth bounding box; the coordinates of the upper-left and lower-right corners are then calculated from this offset. Thus, we obtain new annotation files ready for training the single-frame object detector.
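For illustration, the following minimal sketch shows this point-to-box conversion under the stated 2-pixel offset; the function name and I/O format are ours, not part of the SpotGEO toolkit.

```python
def point_to_bbox(x, y, offset=2):
    """Convert a point label (x, y) into a bounding box (xmin, ymin, xmax, ymax).

    The point is treated as the box centre, and the distance from the centre
    to each edge is `offset` pixels, as described in Section 3.1.
    """
    return (x - offset, y - offset, x + offset, y + offset)


# Example: a target annotated at (321.4, 108.9) becomes the box below.
print(point_to_bbox(321.4, 108.9))  # (319.4, 106.9, 323.4, 110.9)
```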

3.2. Candidate Extraction: PP-YOLOv2

As a variant of YOLO, the main consideration of PP-YOLOv2 is to achieve a better balance between speed and accuracy rather than to design a novel framework. PP-YOLOv2 has shown excellent real-time performance in generic object detection due to its high computation speed. Therefore, we exploit the PP-YOLOv2 model as the single-frame object detector to extract potential candidate targets. The overall architecture of PP-YOLOv2 is shown in Figure 6. It is a further optimization and upgrade of PP-YOLO [22], improving accuracy by combining more than 10 refinements on top of the baseline. The mAP tested on the COCO2017 [26] test-dev data is improved from 45.9% to 49.5%.
The backbone network is ResNet50-vd-dcn, in which deformable convolutional layers are introduced to improve accuracy with little impact on inference speed. One of the FPN variants, PAN, is used as the detection neck to enhance the pyramid representation. The detection head contains two convolutional layers: a 3 × 3 convolution followed by a 1 × 1 convolution to obtain the final prediction. The number of predicted output channels is 3 × (K + 5), where K denotes the number of categories. Three anchors are assigned to each position of the predicted feature map. For each anchor, the first K channels represent the predicted probabilities of the K categories, the subsequent 4 channels represent the box position, and the last channel represents the objectness score. The classification and localization loss functions are cross-entropy loss and L1 loss, respectively.
By employing the 2D single-frame detector PP-YOLOv2, the output candidate coordinates form a set of positions as described in Equation (1):
$$ P = \left\{ P_n^i = (x_n^i, y_n^i) \;\middle|\; n \in \{1, \ldots, N_i\},\ i \in \{1, 2, 3, 4, 5\} \right\}, \quad (1) $$
where $N_i$ is the number of detections in the $i$-th frame.
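As a minimal sketch of how the detector outputs are organized into the set P of Equation (1), the snippet below assumes each detection is a (frame index, score, box) tuple; the exact output format depends on the inference code, and the score threshold is an illustrative choice rather than the value used in our experiments.

```python
from collections import defaultdict

def collect_candidates(detections, score_thresh=0.1):
    """Group single-frame detections into per-frame candidate centres.

    `detections` is assumed to be an iterable of
    (frame_idx, score, xmin, ymin, xmax, ymax) tuples. Returns a dict
    mapping frame index i (1..5) to a list of (x, y) box centres,
    mirroring the set P of Equation (1).
    """
    P = defaultdict(list)
    for frame_idx, score, xmin, ymin, xmax, ymax in detections:
        if score < score_thresh:  # discard low-confidence candidates
            continue
        P[frame_idx].append(((xmin + xmax) / 2.0, (ymin + ymax) / 2.0))
    return P
```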

3.3. Candidate Filtration and Supplement: CFS

After obtaining all the candidate coordinates, an effective post-processing technique based on CFS is deployed to filter out the true targets and supplement the misdetected target coordinates as the final prediction results. The CFS method is proposed in light of the moving continuity and trajectory consistency of space objects across frames: the targets move with uniform speed along a straight-line trajectory. The flowchart in Figure 7 summarizes the overall procedure of the proposed CFS method.
In the first step, considering all potential straight-line trajectories, we use a search window to group point sets. As shown in Figure 8a, a space object captured in all 5 frames within one sequence is denoted as $P_i$, $i \in \{1, 2, 3, 4, 5\}$. When all the $P_i$ are registered to a common coordinate system, the distance between any two adjacent points $P_i(x_i, y_i)$ and $P_{i+1}(x_{i+1}, y_{i+1})$ is the same. From the statistics of the training dataset, we obtained the maximum and minimum values of this interval distance, $R_{\max}$ and $R_{\min}$. We then use a ringlike search window to group point sets of potential trajectories, as shown in Figure 8b. To improve efficiency, the number of elements in each point set is limited to $N = 3$: only the collinear cases of 3 points from successive or inconsecutive frames are considered, all of which are shown in Figure 8c.
For the cases in which trajectory dots come from $N$ successive frames, all the coordinate points of the searching start frame $f_s \in \{f_1, f_2, f_3\}$ are used as search starting points. Next, we capture all the potential trajectory points of the next frame $f_{s+1}$ that fall into the search window. Then, frame $f_{s+1}$ is set as the start frame to search for potential trajectory points of frame $f_{s+2}$ in the same way.
As for the cases of inconsecutive frames, the maximum interval is two or three frames, and an enumeration strategy is used to capture as many potential trajectories as possible, as illustrated in the sketch below. When the maximum interval is 2 frames, the searching start frame is $f_s \in \{f_1, f_2\}$ and the single-step search window sizes $S_n$, $n \in \{1, 2\}$, are set to $\{r, 2r\}$ and $\{2r, r\}$, respectively. When the maximum interval is 3 frames, the searching start frame is $f_s \in \{f_1\}$ and the single-step search window sizes are set to $\{r, 3r\}$ and $\{3r, r\}$, respectively.
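The following sketch illustrates step 1 under these assumptions: `P` is the per-frame candidate dictionary from the detector, `r_min`/`r_max` are the interval-distance statistics, and the pair of frame gaps in `steps` covers both the consecutive case (1, 1) and the inconsecutive cases such as (1, 2), (2, 1), (1, 3) and (3, 1). The function names are ours, not the authors' implementation; collinearity is then assessed by the linear fitting of step 2.

```python
import math

def in_ring(p, q, r_min, r_max, step=1):
    """True if q falls inside the ringlike search window centred on p,
    with the radii scaled by the frame gap `step`."""
    d = math.dist(p, q)
    return step * r_min <= d <= step * r_max

def grow_triplets(P, start_frame, steps, r_min, r_max):
    """Enumerate 3-point candidate trajectories starting at `start_frame`.

    `P` maps frame index -> list of (x, y) candidates; `steps` gives the
    two frame gaps between the three points. Returns a list of point sets,
    each a list of (frame_idx, (x, y)) pairs.
    """
    f1 = start_frame
    f2 = f1 + steps[0]
    f3 = f2 + steps[1]
    triplets = []
    for p1 in P.get(f1, []):
        for p2 in P.get(f2, []):
            if not in_ring(p1, p2, r_min, r_max, steps[0]):
                continue
            for p3 in P.get(f3, []):
                if in_ring(p2, p3, r_min, r_max, steps[1]):
                    triplets.append([(f1, p1), (f2, p2), (f3, p3)])
    return triplets
```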
Then, in step 2, we apply linear fitting to all point sets resulting from step 1 and obtain all linear trajectories across the sequence frames. In step 3, according to the least-squares fitting results, we compute the missed target coordinates and supplement them in the final detection results.
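A possible realization of steps 2 and 3, assuming targets move with uniform speed so that x and y are linear in the frame index, is sketched below; the function name and the frame-index parametrization are ours.

```python
import numpy as np

def fit_and_supplement(point_set, all_frames=(1, 2, 3, 4, 5)):
    """Least-squares fit of a uniform-speed linear trajectory (step 2) and
    supplementation of the missing frames (step 3).

    `point_set` is a list of (frame_idx, (x, y)) pairs, e.g. one output of
    the triplet search. Returns {frame_idx: (x, y)} for all five frames.
    """
    frames = np.array([f for f, _ in point_set], dtype=float)
    xs = np.array([p[0] for _, p in point_set])
    ys = np.array([p[1] for _, p in point_set])
    kx, bx = np.polyfit(frames, xs, 1)  # x(f) = kx * f + bx
    ky, by = np.polyfit(frames, ys, 1)  # y(f) = ky * f + by
    return {f: (kx * f + bx, ky * f + by) for f in all_frames}
```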

4. Experiment

This section evaluates the performance of the proposed GEO space object detection algorithm and compares it with several state-of-the-art models. Furthermore, the inference performance of the proposed method on the NVIDIA Jetson Nano is presented. The specific experimental details are elaborated, including the establishment of the experiment platform and the parameter settings, and the corresponding results and analysis are highlighted.

4.1. Experiment Platform and Parameter Setting

The computing hardware platform used in this article is Baidu AI Studio [27], which provides free computing resources. The hardware is configured with 16 GB of memory and an NVIDIA Tesla V100 graphics card. We use the deep learning framework PaddlePaddle to conduct all the experiments in this paper. PaddlePaddle is an open-source industrial Python library that covers core deep learning frameworks, basic model libraries and end-to-end development kits, and supports both dynamic and static graphs. The CUDA version is 10.1 and the cuDNN version is 7.6. Finally, in order to verify practical performance, the proposed model is later ported to the embedded device NVIDIA Jetson Nano.
As shown in Table 1, the training portion contains 6400 frames in total, of which 4775 frames contain 11,205 target objects and 1625 frames contain no object. All frames are shuffled before being fed into the network; the first 80% are then split into a training set and the last 20% into an evaluation set. Additionally, augmentation methods such as horizontal flipping and random rotation are used to create new frames and address the class imbalance between the "spot" and "background" classes. The test portion contains 25,600 images with a total of 44,550 space objects. There is no overlap between the training and test samples.
To ensure the fairness of the experimental results, all initial algorithm parameters are kept the same across experiments. Training runs for 400 epochs, the learning rate of the Momentum optimizer is set to $1.25 \times 10^{-4}$ and remains fixed throughout, and a batch size of 8 is used.
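As a minimal sketch of these settings in PaddlePaddle (a small linear layer stands in for the actual PP-YOLOv2 network, which is built with PaddleDetection; the momentum value is an assumed default not stated above):

```python
import paddle

EPOCHS = 400
BATCH_SIZE = 8
LEARNING_RATE = 1.25e-4              # kept fixed throughout training

model = paddle.nn.Linear(10, 10)     # placeholder for the detector network
optimizer = paddle.optimizer.Momentum(
    learning_rate=LEARNING_RATE,
    momentum=0.9,                    # assumed value, not given in the paper
    parameters=model.parameters(),
)
```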

4.2. Evaluation Metric

We use the officially provided evaluation kit [28] to compute the average F1 score and Mean Square Error (MSE) score over all test images; the detailed calculation is introduced as follows.
Given the ground-truth object locations $Y = \{Y^f\}_{f=1}^{5}$ and the predicted object locations $\chi = \{\chi^f\}_{f=1}^{5}$ in the 5 successive frames, each $Y^f = \{y_j^f\}_{j=1}^{N}$ and $\chi^f = \{x_i^f\}_{i=1}^{M_f}$ contain the $N$ true objects and the $M_f$ candidates of frame $f$, respectively.
For a given frame, the binary matrix $H^f \in \{0,1\}^{M_f \times N}$ is obtained by a one-to-one matching between $\chi^f$ and $Y^f$, solving the minimum-weight unbalanced assignment problem $\operatorname{argmin}_{H^f} \sum_{i=1}^{M_f} \sum_{j=1}^{N} H^f_{i,j}\,\delta(x_i^f, y_j^f)$, where the function $\delta$ computes the truncated Euclidean distance as in Equation (2):
$$ \delta(x_i^f, y_j^f) = \begin{cases} \lVert x_i^f - y_j^f \rVert_2, & \text{if } \lVert x_i^f - y_j^f \rVert_2 \le \tau, \\ l, & \text{otherwise}, \end{cases} \quad (2) $$
where the matching distance threshold $\tau$ is set to 10 and $l$ is a sufficiently large positive number such as 1000. The matching procedure results in each object being matched with its nearest-neighbor prediction. Based on the distance function $\delta$, we obtain the following sets for frame $f$: true positives $TP^f$, false positives $FP^f$ and false negatives $FN^f$.
The regression error $SSE^f$, defined as a sum of squared errors, is given by Equation (3):
$$ SSE^f = \sum_{(i,j) \in TP^f} \pi(x_i^f, y_j^f) + \sum_{j \in FN^f} \tau^2 + \sum_{i \in FP^f} \tau^2, \quad (3) $$
where
$$ \pi(x_i^f, y_j^f) = \begin{cases} 0, & \text{if } \lVert x_i^f - y_j^f \rVert_2 \le \varepsilon, \\ \lVert x_i^f - y_j^f \rVert_2^2, & \text{otherwise}. \end{cases} \quad (4) $$
The smaller tolerance radius $\varepsilon$ is set to 3, and the SSE for a sequence is thus
$$ SSE = \sum_{f=1}^{5} SSE^f. \quad (5) $$
The MSE score is defined by Equation (6), based on the SSE over all $K$ test sequences:
$$ MSE = \frac{\sum_{k=1}^{K} SSE_k}{\sum_{k=1}^{K} \left( TP_k + FN_k + FP_k \right)}, \quad (6) $$
where, for each sequence, $TP = \sum_{f=1}^{5} TP^f$, $FN = \sum_{f=1}^{5} FN^f$ and $FP = \sum_{f=1}^{5} FP^f$.
The F1 score is defined by Equation (7):
$$ F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}, \quad (7) $$
where the overall precision is calculated from the TP and FP counts of all $K$ sequences as in Equation (8):
$$ Precision = \frac{\sum_{k=1}^{K} TP_k}{\sum_{k=1}^{K} \left( TP_k + FP_k \right)}, \quad (8) $$
and the overall recall is computed considering the false negatives (FN) as in Equation (9):
$$ Recall = \frac{\sum_{k=1}^{K} TP_k}{\sum_{k=1}^{K} \left( TP_k + FN_k \right)}. \quad (9) $$
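The per-frame matching described above can be reproduced with a standard assignment solver; the sketch below is a simplified re-implementation of the scoring logic, not the official evaluation kit [28], and the helper name is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def frame_stats(pred, truth, tau=10.0, eps=3.0, l=1000.0):
    """Return (TP, FP, FN, SSE) for one frame.

    `pred` and `truth` are arrays of shape (M, 2) and (N, 2) holding the
    predicted and ground-truth (x, y) coordinates. Predictions are matched
    one-to-one to objects on truncated Euclidean distances; pairs farther
    apart than tau count as a false positive plus a false negative.
    """
    if len(pred) == 0 or len(truth) == 0:
        return 0, len(pred), len(truth), (len(pred) + len(truth)) * tau ** 2
    d = np.linalg.norm(pred[:, None, :] - truth[None, :, :], axis=2)
    cost = np.where(d <= tau, d, l)               # truncated distance, Eq. (2)
    rows, cols = linear_sum_assignment(cost)
    matched = d[rows, cols] <= tau
    tp = int(matched.sum())
    fp, fn = len(pred) - tp, len(truth) - tp
    md = d[rows, cols][matched]
    # Eq. (3): zero error inside the tolerance radius, squared distance
    # otherwise, plus tau^2 for every unmatched prediction or object.
    sse = float(np.sum(np.where(md <= eps, 0.0, md ** 2))) + (fp + fn) * tau ** 2
    return tp, fp, fn, sse
```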
Additionally, FPS (frames per second) is the most common metric for measuring detection speed; it represents the number of images the model can process per second.

4.3. Comparison Results

In order to verify the detection performance of the 2D single-frame detector and the proposed CFS technique for space objects in deep space scenes, several common object detection methods are compared on the SpotGEO dataset under the same hardware conditions and environment configuration. The detection results are shown in Table 2 and Table 3.
We compare the PP-YOLOv2 method with four state-of-the-art networks: Faster R-CNN, Cascade R-CNN, YOLOv3 and PP-YOLO. Table 2 presents the comparison results of the different single-frame GEO object detectors. The quantitative results show that PP-YOLOv2 outperforms the other models on the test dataset at comparable compute speed, reaching an F1 score of 84.08% on the SpotGEO dataset. It achieves a compromise between performance and computational complexity and can better adapt to dynamic requirements.
To verify the benefit of the proposed CFS technique, we rerun the tests of the different models with the same post-processing procedure. The final results after the CFS step are shown in Table 3. The F1 score of every model improves significantly and the corresponding MSE decreases dramatically. As can be seen from the third column of Table 3, the F1 score increases by up to nearly 11%, demonstrating the effectiveness of the proposed CFS technique. In addition, the final F1 score of the proposed algorithm is 1.47% higher than that of GEO-FPN [29].
In the official challenge, the final ranking was based on 1 − F1 and MSE. For a more intuitive comparison, we directly list the F1 scores in Table 4. The proposed algorithm already surpasses the third-place result. According to the organizers' published article, the top team adopted a 3-step pipeline: first, background removal using an L1-spline and estimation of star shift using a hand-crafted star descriptor and RANSAC; second, an ensemble of 10 U-Net [30] models trained to predict object locations; and third, post-processing via line detection and trajectory filling. The team POTLAB@BUAA, which designed a non-learning-based approach, came second. They adopted a pixel-level classification method based on SNR calculation and extracted candidate targets from adjacent selected pixels; the final detection results were then obtained based on the estimation of inter-frame satellite shifts. Compared with these solutions, our pipeline is more intuitive and concise.

4.4. Model Deployment

The developed algorithm is oriented toward practical application in space-based or ground-based space object surveillance systems. Compared with GPUs and AI chips designed for high-performance parallel computing, the resources of onboard processor platforms are limited, with strict constraints on computational complexity and power consumption for executing deep neural networks. In this subsection, we deploy the proposed GEO object detection pipeline and analyze its performance on the NVIDIA Jetson Nano. Table 5 lists the specifications of the Jetson Nano board.
The operating system is Ubuntu 18.04, and the versions of CUDA and cuDNN are 10.2 and 8.0, respectively. Based on the high-level Python API of Paddle Inference, which has built-in support for deploying most common computer vision models, we run the PP-YOLOv2 model on the Jetson Nano to test the inference speed. The TensorRT library, which provides tuning of deep learning networks, is also used to boost inference efficiency. The average execution times are listed in Table 6.
It can be observed that the average inference time is 0.71 s per image for 640 × 480 input images. In terms of FPS, the proposed method achieves an inference speed of 1.42 FPS with a ResNet50 backbone rather than a common lightweight backbone network. Since typical onboard remote sensing systems have limited storage and compute capacity, this work paves the way for further optimization and improvement to meet the demands of future real-time image processing.
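For reference, a hedged sketch of this deployment path with the Paddle Inference Python API and TensorRT is shown below; the model file names are placeholders for the exported detector, and the exported PP-YOLOv2 also expects auxiliary inputs (e.g., the image scale factor) that are omitted here.

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths for the exported PP-YOLOv2 inference model.
config = Config("ppyolov2/model.pdmodel", "ppyolov2/model.pdiparams")
config.enable_use_gpu(100, 0)              # initial GPU memory (MB), device id
config.enable_tensorrt_engine(
    workspace_size=1 << 28,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Float32,  # FP32 engine; FP16 is another option
    use_static=False,
    use_calib_mode=False,
)
predictor = create_predictor(config)
# Inputs are then bound via predictor.get_input_handle(...), the model is run
# with predictor.run(), and outputs are read back with copy_to_cpu().
```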

4.5. Further Discussion

Previous methodologies for detecting space objects require algorithms that are complicated to design and depend on handmade feature vectors. This study evaluates an alternative, robust solution that adopts a supervised learning algorithm to avoid hand-crafting feature extractors. It could become part of a ground-based optical space surveillance system within the top-level design of space situational awareness (SSA) missions. Here, we list several limitations that remain unsolved in this study.
In this paper, transfer learning through fine-tuning pre-trained weights is used to speed up the training process and to obtain a more accurate 2D single-frame detection model overall. However, proper training settings can further improve model performance. Our results encourage the community to adopt more training heuristics (e.g., warmup, early stopping, synchronized batch normalization) to better explore the model's potential.
To describe the performance of the proposed method intuitively, several groups of prediction results are presented in Figure 9 and Figure 10. The single-frame detections of PP-YOLOv2 are marked by green solid circles and the final results after the CFS post-processing procedure are marked by red cross symbols; as a reference, the blue cross symbols indicate the ground truth. It can be seen from Figure 9a–c that the proposed CFS technique can effectively extract true space targets through motion track information. On the other hand, there is still a noticeable discrepancy between the predicted and true target locations. Together with the still-large MSE values, this indicates that the algorithm needs further improvement and optimization. A promising direction is to investigate different label transform strategies to see which type of labeled data gives better accuracy for target position estimation.
Furthermore, the ability to detect truncated and occluded objects is highly desirable for space object detection tasks, as such instances occur regularly in the images. We also present several challenging examples from the dataset in Figure 10. The blue cross symbols in the left column and the green circle symbols in the right column both indicate the ground truth. Misdetected samples stand out where no red prediction symbol appears nearby. In future work, we will continue to refine our approach to achieve higher detection accuracy.

5. Conclusions

In this paper, an effective multi-frame detection algorithm is proposed to solve the problem of GEO space object detection, in which the 2D deep learning object detector PP-YOLOv2 is first introduced to extract candidate targets from single-frame images, and a new post-processing technique called CFS is applied to further filter and supplement true space objects. Our work shows the adaptation of the popular PP-YOLOv2 framework for predicting objects and their locations in satellite images with high accuracy and inference speed. We used transfer learning for faster convergence of the model on the SpotGEO dataset and analyzed performance metrics including the F1 score, MSE and FPS. The trained single-frame model achieves an F1 score of 84.08% with an inference speed of 72.1 FPS on a Tesla V100 GPU. With the help of the CFS technique, the proposed pipeline gains a 9.39% improvement in F1 score, reaching an overall F1 score of 93.47%. In addition, the deployment results on the NVIDIA Jetson Nano platform show that the proposed pipeline has a competitive application prospect for the surveillance of resident space objects. In the future, our algorithm will be applicable to both ground and space platforms, with the prospect of integration into the vision processing units of a new generation of smart satellites. This work further paves the way for future advancements in the field of space situational awareness.

Author Contributions

Conceptualization, Y.D.; methodology, Y.D. and T.Z.; investigation, Y.D., C.X. and L.Z.; formal analysis, C.X., Y.D. and L.Z.; writing—original draft, Y.D.; writing—review and editing, T.Z. and C.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the ESA Advanced Concepts Team and University of Adelaide for hosting the SpotGEO challenge and providing the dataset that enabled us to conduct the research of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fitzmaurice, J.; Bédard, D.; Lee, C.H.; Seitzer, P. Detection and Correlation of Geosynchronous Objects in NASA’s Wide-Field Infrared Survey Explorer Images. Acta Astronaut. 2021, 183, 176–198.
  2. Diprima, F.; Santoni, F.; Piergentili, F.; Fortunato, V.; Abbattista, C.; Amoruso, L. Efficient and Automatic Image Reduction Framework for Space Debris Detection Based on GPU Technology. Acta Astronaut. 2018, 145, 332–341.
  3. Chen, B.; Liu, D.; Chin, T.-J.; Rutten, M.; Derksen, D.; Märtens, M.; von Looz, M.; Lecuyer, G.; Izzo, D. Spot the GEO Satellites: From Dataset to Kelvins SpotGEO Challenge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Nashville, TN, USA, 20–25 June 2021; pp. 2086–2094.
  4. Chen, B.; Liu, D.; Chin, T.-J.; Rutten, M.; Derksen, D.; Märtens, M.; von Looz, M.; Lecuyer, G.; Izzo, D. SpotGEO Dataset. Available online: https://doi.org/10.5281/zenodo.4432143 (accessed on 11 May 2021).
  5. Šára, R.; Matoušek, M.; Franc, V. RANSACing Optical Image Sequences for GEO and Near-GEO Objects. In Proceedings of the Advanced Maui Optical and Space Surveillance Technologies Conference, Maui, HI, USA, 10–13 September 2013.
  6. Do, H.N.; Chin, T.-J.; Moretti, N.; Jah, M.K.; Tetlow, M. Robust Foreground Segmentation and Image Registration for Optical Detection of GEO Objects. Adv. Space Res. 2019, 64, 733–746.
  7. Yanagisawa, T.; Kurosaki, H.; Nakajima, A. Activities of JAXA’s Innovative Technology Center on Space Debris Observation. In Proceedings of the Advanced Maui Optical and Space Surveillance Technologies Conference, Maui, HI, USA, 1–4 September 2009.
  8. Liu, D.; Chen, B.; Chin, T.-J.; Rutten, M. Topological Sweep for Multi-Target Detection of Geostationary Space Objects. IEEE Trans. Signal Process. 2020, 68, 5166–5177.
  9. Ohsawa, R. Development of a Tracklet Extraction Engine. arXiv 2021, arXiv:2109.09064.
  10. Huang, X.; Wang, X.; Lv, W.; Bai, X.; Long, X.; Deng, K.; Dang, Q.; Han, S.; Liu, Q.; Hu, X.; et al. PP-YOLOv2: A Practical Object Detector. arXiv 2021, arXiv:2104.10419.
  11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  12. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
  13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  14. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  15. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
  16. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  17. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
  18. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
  19. Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
  20. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. arXiv 2020, arXiv:1911.09070.
  21. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21–37.
  22. Long, X.; Deng, K.; Wang, G.; Zhang, Y.; Dang, Q.; Gao, Y.; Shen, H.; Ren, J.; Han, S.; Ding, E.; et al. PP-YOLO: An Effective and Efficient Implementation of Object Detector. arXiv 2020, arXiv:2007.12099.
  23. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
  24. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 6569–6578.
  25. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A Simple and Strong Anchor-Free Object Detector. arXiv 2020, arXiv:2006.09214.
  26. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014.
  27. Baidu AI Studio. Available online: https://aistudio.baidu.com/ (accessed on 20 August 2021).
  28. Derksen, D.; Märtens, M.; von Looz, M.; Lecuyer, G.; Izzo, D.; Chen, B.; Liu, D.; Chin, T.-J.; Rutten, M. SpotGEO Starter Kit. Available online: https://doi.org/10.5281/zenodo.3874368 (accessed on 11 May 2021).
  29. Abay, R.; Gupta, K. GEO-FPN: A Convolutional Neural Network for Detecting GEO and Near-GEO Space Objects from Optical Images. In Proceedings of the 8th European Conference on Space Debris (Virtual), Darmstadt, Germany, 20–23 April 2021.
  30. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer International Publishing: Cham, Switzerland, 2015.
Figure 1. A diagram of a ground-based space object surveillance platform.
Figure 2. An example image sequence of the SpotGEO dataset. (a) The typical sequence that captures three objects. (b) Object detection in a sequence of 5 consecutive frames. The source positions are shown by the circle symbols in different colors.
Figure 3. Another example image sequence: (a–e) five frame images within a sequence that contain six objects; (f) the source positions are shown by the cross symbols in different colors.
Figure 4. Flowchart of the proposed system.
Figure 5. The bounding box generated from the given target coordinates.
Figure 6. The framework of PP-YOLOv2.
Figure 7. Flowchart of the proposed CFS method.
Figure 8. The schematic diagrams of three steps in the proposed CFS strategy: (a) a schematic view of the line in space-time intersecting image planes in a discrete ordered set of locations; (b) the ringlike search window; (c) the enumeration of potential candidate targets of sequence-based detection results.
Figure 9. Performance of the proposed multi-frame space object detection method on the test set: (a–c) show the space targets that can be detected correctly in different spatial distributions within one sequence. The prediction results of the PP-YOLOv2 detector are marked by the green solid circles; the final results after the post-processing procedure of CFS are marked by the red cross symbols. Additionally, the blue cross symbols indicate the ground truth.
Figure 10. Visual examples of the missed detection problem in the SpotGEO dataset: (a–c) show space targets that cannot all be detected correctly in different spatial distributions within one sequence. Left column: the prediction results in terms of the sequence images. Right column: the first frame with annotations in the corresponding sequence, where the prediction results of the proposed algorithm are marked by red circles and the ground truth is marked by green circles.
Table 1. The SpotGEO dataset.

           | Number of Images | Number of Objects | Average Number of Objects
Training   | 5120             | 8964              | 1.751
Validation | 1280             | 2241              | 1.750
Test       | 25,600           | 44,550            | 1.740
Total      | 32,000           | 55,755            | 1.742
Table 2. Performance comparison of different 2D single-frame object detectors.

Methods           | Backbone        | FPS   | F1 (%) | MSE
GEO-Faster R-CNN  | ResNet50_vd_dcn | 18.06 | 80.19  | 162,168.66
GEO-Cascade R-CNN | ResNet50_vd_dcn | 20.6  | 82.54  | 177,082.09
GEO-YOLOv3        | ResNet50_vd_dcn | 61.3  | 81.89  | 141,504.56
GEO-PP-YOLO       | ResNet50_vd_dcn | 72.9  | 82.21  | 181,461.13
GEO-PP-YOLOv2     | ResNet50_vd_dcn | 72.1  | 84.08  | 160,997.98
Table 3. Performance comparison of different deep models with CFS.

Methods                 | F1 (%) | ΔF1 (%) | MSE
GEO-Faster R-CNN + CFS  | 90.02  | 9.83    | 53,001.35
GEO-Cascade R-CNN + CFS | 91.63  | 9.09    | 53,274.17
GEO-YOLOv3 + CFS        | 91.54  | 9.65    | 53,888.27
GEO-PP-YOLO + CFS       | 92.89  | 10.68   | 49,274.17
GEO-PP-YOLOv2 + CFS     | 93.47  | 9.39    | 40,222.44
Table 4. Final ranking of the top 10 teams in the SpotGEO challenge.

Rank | Participant Name | F1 (%) | MSE
1    | AgeniumSPACE     | 94.83  | 33,838.9931
2    | POTLAB@BUAA      | 94.43  | 30,541.73189
3    | dwiuzila         | 92.89  | 41,198.45863
4    | Magpies          | 90.43  | 48,919.9227
5    | Mr_huangLTZaaa   | 88.42  | 62,021.80923
6    | francescodg      | 87.89  | 65,772.46337
7    | mhalford         | 87.70  | 69,566.90857
8    | PedroyAgus       | 86.61  | 70,104.96654
9    | elmihailol       | 86.11  | 83,172.81408
10   | Barebones        | 83.66  | 105,518.4199
Table 5. Specifications of the NVIDIA Jetson Nano.

AI Performance | GFLOPs
CPU            | 4-core Cortex-A57
GPU            | 128-core NVIDIA Maxwell
Memory         | 4 GB 64-bit LPDDR4
Size (mm)      | 100 × 80 × 29
Power (W)      | 5 W (or 10 W)
Table 6. Average inference time on the NVIDIA Jetson Nano platform.

Methods                       | Inference Time (s)
GEO-PP-YOLOv2                 | 0.932
GEO-PP-YOLOv2 (with TensorRT) | 0.706
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
