1. Introduction
Detecting near-Earth asteroids (NEAs) is a crucial task in astronomy, pivotal both for studying the solar system and for safeguarding Earth. NEAs are defined as asteroids with perihelion distance q < 1.3 AU and aphelion distance Q > 0.983 AU. These bodies, a segment of the Solar System’s small-body population, offer insights into its formation and evolution. Moreover, certain NEAs pose significant threats to Earth’s ecosystem, capable of catastrophic consequences [
1]. The extinction of the dinosaurs approximately 65 million years ago has been attributed to the impact of a kilometer-sized asteroid, illustrating the profound effect NEAs can have on our planet’s history [
2]. Thus, developing an effective model for NEA detection is an imperative task, bridging scientific inquiry and planetary defense.
Because of their high apparent motion, NEAs undergo noticeable positional changes across the celestial sphere during a single telescope exposure. In telescope imaging data, NEAs often leave distinctive “streaks” spanning tens to hundreds of pixels. With the rapid development of wide-field and space-based telescopes, many NEAs await discovery [
3,
4,
5].
The methods employed for detecting NEAs over the past decade can be categorized as non-machine-learning methods, traditional machine-learning methods, and deep-learning methods [
6,
7,
8,
9,
10]. These three methods have their own techniques and limitations: (1) Non-machine-learning methods utilize the linear features of streaks to enhance image details and extract linear objects in the image, based on the characteristics of streaks left by NEAs. While detecting NEAs, these methods also collect artificial satellites, meteors, and other streak-like objects, which require another algorithm to distinguish them. (2) Traditional machine-learning methods extract morphological linear features in images and use them as inputs to the model. To avoid background and other target disturbance, differenced images are used (obtained by subtracting one astronomical image from another one that observes the same celestial region, revealing any changes or temporary targets in celestial objects over time). However, this process further blurs faint streaks. (3) Deep-learning methods take images as input. Because streaks can be generated according to a 2D Gaussian point spread function (PSF) [
11], it is convenient to simulate large amounts of training data. How to constrain the simulation parameters, such as the length and brightness of streaks, so as to minimize model bias and detect previously undiscovered streaks is a challenging problem for this approach. In addition, a larger input size can effectively prevent streak truncation and enables a better distinction of artificial satellites, whose streaks are typically longer than those of NEAs. Recently, more satellites, such as SpaceX Starlink satellites, have been deployed, and the number of affected images has increased [
12,
13]. Because satellites are morphologically similar to NEAs, they confuse the model’s recognition ability. Previous models did not have a sufficiently large input size and overlooked the rising number of satellites.
In recent years, deep-learning methods have emerged as the predominant approach in this field. Deep learning is highly suitable for research in astronomy, and it has proven equally effective in detecting NEAs. In the study of NEAs, there is not only a vast amount of data available for model training but also a massive volume of data that requires processing. NEAs appear in only approximately 1% of images, while telescopes like the Zwicky Transient Facility (ZTF) can release 5000 images in one night, rendering manual inspection excessively time-consuming. Deep-learning models, instead, can accurately screen for NEAs, eliminating interference from other linear objects such as satellites, saturated pixels, and cosmic rays.
To exploit the precision with which deep-learning methods can distinguish NEAs from similar targets, we design a deep-learning-based model. Through a new approach, we extensively collect real NEA streaks as the basis for simulating streaks, aiming to reduce bias. Simultaneously, we increase the input size to reduce the probability of truncating streaks and to improve the differentiation between NEAs and satellites.
Furthermore, one shortcoming shared by the three methods mentioned above is that they consider all input features. An NEA typically occupies only about one-thousandth of the two-dimensional matrix input to a model; in other words, most of the pixels are background noise and other targets, and those pixels burden the detection of NEAs.
NEAs only constitute a minuscule fraction of the images; thus, in this paper we propose an ICC-BiFormer model, containing an image compression and contrast enhancement block and BiFormer model [
14], to detect NEAs. ICC-BiFormer is a binary classifier that identifies NEAs against background images. It incorporates an adaptive mechanism to compress image bit depth and a dynamic module to filter out irrelevant information, utilizing the principles of sparse attention. This makes the model more robust against interference from streak-like objects and low signal-to-noise ratio (SNR) images.
The main contributions of the paper are as follows:
ICC-BiFormer is the first model to focus on extracting the local features of NEAs while disregarding the learning of global features. Compared to other models, it has high accuracy and a low false positive rate, rendering it more suitable for NEA detection.
We use a large model input size, corresponding to the side length of the input image matrix, together with a new cropping algorithm, to reduce the likelihood of truncating NEA streaks and preserve more NEA features in the images. This also facilitates the discrimination between NEAs and satellites.
A new approach to collecting streaks is established by differencing the positions of NEAs across multiple observations, aiming to improve the integrity of the data.
The subsequent sections of the article are structured as follows:
Section 2 provides a review of related work.
Section 3 delves into the architecture of ICC-BiFormer.
Section 4 introduces the construction of the dataset and presents an analysis of the experimental results, and
Section 5 offers a summary of the article.
3. Method
In order to consider both speed and accuracy in NEA detection tasks, we propose ICC-BiFormer, which is an improved version of BiFormer [
14]. This section discusses the basic architecture of BiFormer and our modifications to it. The overall architecture of ICC-BiFormer is shown in
Figure 1.
As shown in
Figure 1, our ICC-BiFormer mainly consists of three parts: a module for processing 16-bit data, a backbone network composed of BiFormer, and a classification head. The image compression and contrast enhancement module mainly consists of three stacked parts: CNN, pooling layer, and transposed convolution. In the original images, most of the useful information corresponds to very low pixel values, so directly displaying the original images results in almost entirely black images. This also implies that the streaks are not distinctly different from the background; thus, we require ICC to reduce the image’s bit depth, amplify the useful information, and enhance the contrast between streaks and the background. BiFormer is an improved spatial-pyramid module with stronger feature extraction ability and faster running speed.
BiFormer utilizes a bi-level routing attention mechanism: a dynamic, query-aware sparse attention that adaptively selects a small number of relevant key-value pairs based on the characteristics and positions of each query, rather than attending to all key-value pairs. This reduces computation and memory consumption while improving attention concentration and efficiency.
A classification head is a fully connected layer. Its function is to classify and detect NEAs based on the output of BiFormer and output a binary classification (0, 1) to indicate whether the image contains NEAs.
The formal definition of ICC-BiFormer is as follows:
Given a 16-bit grayscale image, $I \in \mathbb{R}^{H \times W}$, where
H and
W are the height and width of the image, ICC-BiFormer first uses the ICC module to compress and enhance the image, obtaining an 8-bit image, $I'$, that is:
$$I' = \mathrm{ICC}(I) \quad (1)$$
ICC is the image compression and contrast enhancement module, which includes CNN, pooling layer, and transposed convolution. Through the ICC module, we reduce the bit depth of the image, identify which pixels’ brightness needs enhancement, and make adjustments accordingly. The primary purpose of this module is to enhance the contrast between NEAs and the background. The 8-bit images suffice to highlight their differences, while higher bit-depth images add unnecessary information. Before and after ICC processing, the image’s height and width remain unchanged.
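To make the ICC block's end-to-end effect concrete, the following is a non-learned baseline for the same transformation (bit-depth reduction plus contrast stretching); the percentile window is an arbitrary illustrative choice, and the learned ICC module replaces this fixed mapping with an adaptive one:

```python
import numpy as np

def naive_compress_enhance(img16, lo_pct=5.0, hi_pct=99.5):
    """Non-learned stand-in for ICC: clip a 16-bit image to a percentile
    window, stretch the contrast to [0, 1], and quantize to 8 bits."""
    lo, hi = np.percentile(img16, [lo_pct, hi_pct])
    x = np.clip((img16.astype(float) - lo) / max(hi - lo, 1e-9), 0.0, 1.0)
    return (x * 255).astype(np.uint8)
```

Because most useful pixel values are very low, such a stretch makes faint streaks visible against the background, which is the behavior the ICC module learns adaptively.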
Next, ICC-BiFormer uses the BiFormer module to encode and decode features of the image, obtaining the feature map $F \in \mathbb{R}^{C \times H' \times W'}$, where
C is the number of channels, that is:
$$F = \mathrm{BiFormer}(I') \quad (2)$$
BiFormer is a double-layer routing attention module, including a spatial pyramid encoder and a decoder. The BiFormer block models cross-location relations. The primary purpose of this module is to filter out the majority of irrelevant key-value pairs at a coarse-grained regional level and retain only a small number of routed regions. Subsequently, it applies fine-grained token-to-token attention across the union of these routed regions. For NEA images, this module would output streaks’ features and discard noisy backgrounds.
Finally, ICC-BiFormer uses the classification head to classify the feature map, obtaining the binary classification result $y \in \{0, 1\}$, that is:
$$y = \mathrm{MLP}(F) \quad (3)$$
In Equation (
3), MLP is a multilayer perceptron, used to output the presence or absence of NEAs.
3.1. Image Compression and Contrast Enhancement Block
The image compression and contrast enhancement block reduces the bit depth of the image and enhances the features of streaks. Suppose that the number of channels in the input image is C, the height is H, the width is W, the normalized bit depth is B, and the brightness adjustment coefficient is K; then:
Encoder: The input image is passed through three convolutional layers and a max pooling layer, increasing the number of channels to 32, 64, and 128, respectively; the height and width are reduced to 1/8 of the original, resulting in the encoded feature map $F_e \in \mathbb{R}^{128 \times H/8 \times W/8}$.
Normalizer: The encoded feature map $F_e$ is flattened into a one-dimensional vector and then mapped to a B-dimensional vector $z$ through two fully connected layers and a Tanh activation function, giving the normalized encoding.
Brightness Adjustment Layer: The normalized encoding $z$
is passed through an AdaptiveAvgPool2d layer that reduces the feature map of each channel to 1 × 1, resulting in a B-dimensional vector.
AdaptiveAvgPool2d performs a two-dimensional adaptive average pooling operation. It dynamically pools the input feature map to a specified output size, without manually setting the size and stride of the pooling kernel. It can adapt to feature maps of any input size, and the output size is always the specified
$H_{out}$ ×
$W_{out}$. The number of channels of the output feature of AdaptiveAvgPool2d is the same as the number of channels of the input feature. The operation is
$$y_{n,c,i,j} = \frac{1}{|R_{i,j}|} \sum_{(h,w) \in R_{i,j}} x_{n,c,h,w} \quad (4)$$
In Equation (
4),
N is the batch size, and $H_{out}$
and $W_{out}$
are the height and width of the output feature. The calculation of AdaptiveAvgPool2d divides each channel of the input feature into $H_{out}$
× $W_{out}$
regions $R_{i,j}$ and then takes the average of the elements in each region to obtain the corresponding element of the output feature. Then, through a fully connected layer and a sigmoid activation function, the pooled vector is mapped to a scalar
c, the brightness adjustment coefficient. Multiplying
c by
K, the brightness adjustment coefficient preset manually, gives the adjusted brightness adjustment coefficient
$c' = cK$. Finally, multiplying the normalized encoding $z$
by $c'$
yields the brightness-adjusted encoding
$z'$.
Decoder: After the multiplication, the decoder reshapes the brightness-adjusted encoding $z'$ into a feature map and restores it to the size and number of channels of the original image through three transposed convolutional layers and a sigmoid activation function, yielding the decoded image.
After being processed by the ICC module, the astronomical images are compressed in bit rate, and the contrast of the images for trajectories is enhanced. The ICC module reduces the computational burden and optimizes the information contained for subsequent BiFormer image classification (
Figure 2).
3.2. BiFormer
BiFormer is a visual Transformer model based on Bi-Level Routing Attention (BRA), which can effectively allocate computing resources and capture content-dependent context. The overall architecture of our BiFormer is shown in
Figure 3. The model architecture of BiFormer is introduced as follows:
BiFormer mainly includes the following parts:
Overlapping Patch Embedding (OPE): This part divides the input image into overlapping small blocks and maps each block to a feature vector as the input to the Transformer.
Block Merge: This part involves merging four adjacent blocks into a larger block, reducing spatial resolution while increasing the number of channels.
BiFormer Block: This part contains a deep convolutional layer, a BRA module, and an MLP module for the feature transformation and contextual modeling of each block.
BRA module: This module is a dynamic, query-aware sparse attention mechanism that first filters out irrelevant key value pairs at the coarse region level and then applies fine-grained token-to-token attention in the remaining routing areas.
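The routing idea of the BRA module can be sketched in a simplified, single-head form; the region count, top-k value, and mean-pooling of regional queries/keys below are illustrative assumptions, not the exact BiFormer implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bi_level_routing_attention(q, k, v, n_regions, topk):
    """Toy BRA: route at the region level, then apply token-to-token
    attention only within each query region's top-k routed regions."""
    T, d = q.shape
    r = T // n_regions                          # tokens per region
    qr = q.reshape(n_regions, r, d)
    kr = k.reshape(n_regions, r, d)
    vr = v.reshape(n_regions, r, d)
    # coarse region-level affinity from mean-pooled queries and keys
    affinity = qr.mean(axis=1) @ kr.mean(axis=1).T      # (R, R)
    routes = np.argsort(-affinity, axis=1)[:, :topk]    # top-k regions per region
    out = np.empty_like(qr)
    for i in range(n_regions):
        kg = kr[routes[i]].reshape(-1, d)               # gathered keys
        vg = vr[routes[i]].reshape(-1, d)               # gathered values
        attn = softmax(qr[i] @ kg.T / np.sqrt(d), axis=-1)
        out[i] = attn @ vg                              # fine-grained attention
    return out.reshape(T, d)
```

The key property is that each query region attends to only `topk` of the `n_regions` key-value regions, so attention cost scales with the routed subset rather than with all tokens.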
3.3. Model Components
This part introduces the classification header, loss function, and optimizer.
Classification Header: The final classification header is an MLP that outputs the probability y of the current input image containing NEAs, with a value in [0,1]: 0 indicates no NEAs, and the closer the output is to 1, the more confident the model is that the input image contains NEAs.
Loss function: We use binary cross entropy as the loss function. Binary cross entropy is a loss function for binary classification problems that measures the difference between predicted probabilities and the actual binary labels. It quantifies the dissimilarity between probability distributions and assists model training by penalizing inaccurate predictions. The formula for binary cross entropy is as follows:
$$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] \quad (5)$$
In Equation (
5),
n is the number of samples, $y_i$
is the true label of the
i-th sample (0 or 1), and $\hat{y}_i$
is the predicted probability of the
i-th sample (a value between 0 and 1). For each sample, if its true label is 1, the loss is proportional to the negative logarithm of the predicted probability; if its true label is 0, the loss is proportional to the negative logarithm of the complement of the predicted probability. In other words, the closer the predicted probability is to the true label, the smaller the loss, and the further it is from the true label, the larger the loss. A typical application of binary cross entropy is the supervised training of neural networks on binary classification problems, such as predicting whether an image contains a cat or whether a comment is positive.
Optimizer: The Adam optimizer is used to update the weights and bias parameters in the neural network based on the gradient of the loss function, in order to minimize the loss function.
Raw astronomical grayscale data is typically represented in 16 bits, with pixel values ranging from 0 to 65,535. However, the brightness of NEAs is usually concentrated within a narrow range, and faint objects may be overwhelmed if the data are normalized directly. Therefore, this article’s method of first compressing the 16-bit data and then performing contrast enhancement is well suited to this problem. Furthermore, NEA trajectories exist in only part of an astronomical image, and the trajectory length is not fixed. By using BiFormer to detect near-Earth objects in astronomical images, the self-attention mechanism can capture long-distance context dependencies, thereby improving the localization and recognition of NEAs. Dynamic query-aware sparsity reduces computation on irrelevant areas and increases attention to NEAs. Multi-scale feature extraction through the pyramid structure adapts to NEAs of different sizes and shapes. Finally, depth convolution implicitly encodes relative position information, enhancing the prediction of NEA motion trajectories.
4. Experiment and Analysis
Our dataset comprises only two labels: images with streaks are categorized as positive, whereas those without streaks are classified as negative. All the data we use, including the real streaks and the backgrounds for injecting simulated streaks, are derived from the Zwicky Transient Facility (ZTF). The ZTF Observing System delivers optical imagery over a 47 deg² field of view, enabling ZTF to better capture NEAs [
28].
4.1. Streaks
In our training dataset, all streaks are simulated. Using only simulated data removes the bias arising from the limitations of already-discovered NEAs. To verify the effectiveness of our model in practical streak detection, we also propose a new way to collect real streaks.
4.1.1. Real Streak Collection
We combine two websites to build an NEA list and predict which images will contain streaks. The first website,
earthflyby, provides all the asteroids that made close passes by Earth each year, with orbit parameters and close-approach parameters (
https://astorb.com/browse/earthflyby_index/, accessed on January 2024, (
Table 1)).
The limiting magnitude of ZTF is 20.4. In order to appropriately expand the search range for NEAs, we set the upper limit of the search to an apparent magnitude of 26.0. To ensure that NEAs have enough angular velocity to leave streaks in images, we calculate the NEA angular velocity at close-approach time. Although the angular velocity at close approach is not necessarily the maximum, and ZTF cannot guarantee capturing NEAs at their maximum angular velocity, using it as a preliminary screening criterion is reasonable. The NEA angular velocity,
$\omega$, can be estimated from the close-approach speed,
$v$, and the close-approach distance,
$d$, as follows:
$$\omega \approx \frac{v}{d}$$
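This back-of-the-envelope estimate translates directly into code; the unit choices here (speed in km/s, distance in AU, output in arcsec/s) are assumptions for illustration, and the estimate treats the close-approach velocity as entirely in the plane of the sky:

```python
import math

AU_KM = 1.495978707e8                   # kilometers per astronomical unit
ARCSEC_PER_RAD = 3600 * 180 / math.pi   # arcseconds per radian

def close_approach_angular_velocity(v_km_s, d_au):
    """Rough sky-plane angular velocity (arcsec/s) at close approach,
    assuming motion perpendicular to the line of sight: omega = v / d."""
    return v_km_s / (d_au * AU_KM) * ARCSEC_PER_RAD
```

For example, an NEA passing at 15 km/s at 0.1 AU moves at roughly 0.2 arcsec/s, which over a typical exposure of tens of seconds already implies a streak of several pixels.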
After that, we can obtain a list of NEAs that meet the brightness and length requirements for streaks. These NEAs can be searched in bulk using the Moving Object Search Tool (
MOST) (
https://irsa.ipac.caltech.edu/applications/MOST/, accessed on January 2024). Before searching, the tool needs to input Observation Begin (UTC) and Observation End (UTC). We set the time according to a NEA close approach time, close approach angular velocity, and close approach magnitude.
After submitting an NEA name, Observation Begin (UTC), and Observation End (UTC), the website outputs a table of images that may leave an NEA streak (
Table 2).
Fitting the trajectory of the NEA between two points with a straight line, we can obtain the average velocity of the NEA during this time period. The average angular velocity,
$\bar{\omega}$, between two recorded images can be calculated from the observing times,
$t_1$ and $t_2$, the NEA right ascension,
$\alpha$, and the NEA declination,
$\delta$, as follows:
$$\bar{\omega} = \frac{\sqrt{\big((\alpha_2 - \alpha_1)\cos\bar{\delta}\big)^2 + (\delta_2 - \delta_1)^2}}{t_2 - t_1}$$
This allows us to estimate the streak length,
l, that would be left on the image. Given the exposure time,
$t_{exp}$, and the telescope pixel scale,
p, the streak length,
l, can be calculated as follows:
$$l = \frac{\bar{\omega}\, t_{exp}}{p}$$
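The two-epoch angular velocity and the resulting streak-length estimate can be sketched as follows; the flat-sky approximation and the unit conventions (degrees and seconds in, arcsec/s out) are assumptions of this sketch:

```python
import math

def mean_angular_velocity(ra1_deg, dec1_deg, t1_s, ra2_deg, dec2_deg, t2_s):
    """Average angular velocity (arcsec/s) between two observations,
    using a small-separation flat-sky approximation."""
    dec_mid = math.radians((dec1_deg + dec2_deg) / 2)
    dra = (ra2_deg - ra1_deg) * math.cos(dec_mid)  # RA compressed by cos(dec)
    ddec = dec2_deg - dec1_deg
    return math.hypot(dra, ddec) * 3600 / (t2_s - t1_s)

def streak_length_px(omega_arcsec_s, exposure_s, pixel_scale_arcsec):
    """Expected streak length in pixels: angular motion during the
    exposure divided by the pixel scale."""
    return omega_arcsec_s * exposure_s / pixel_scale_arcsec
```

An object moving at 0.6 arcsec/s observed with a 30 s exposure and a 1.0 arcsec/pixel scale would leave an 18-pixel streak, comfortably above the 10-pixel selection threshold used below.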
After downloading candidate images that may contain streaks over 10 pixels long, we manually inspect them to determine whether a streak is present, and then record the starting and ending points of the streak. These points allow us to calculate the length of the image streak and are used for later cropping. We then compare this length with the previously estimated length: if the estimated length is significantly higher, the object is likely a satellite, and the
sat_id program is used to check whether the streak belongs to a known satellite to confirm our label (
https://www.projectpluto.com/sat_id.htm, accessed on January 2024). Finally, we obtain a dataset of flagged NEAs and satellites.
4.1.2. Simulated Streak
With real NEAs as reference, we simulate streaks and inject them into ZTF images. A 2D Gaussian PSF has been shown to be accurate enough to describe asteroid streaks [
11]. We simulate streaks according to the equation from Ye [
29]:
$$S(x, y) = \frac{\Phi}{2L\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{y'^2}{2\sigma^2}\right) \left[\operatorname{erf}\!\left(\frac{x' + L/2}{\sqrt{2}\,\sigma}\right) - \operatorname{erf}\!\left(\frac{x' - L/2}{\sqrt{2}\,\sigma}\right)\right]$$
where
$\Phi$ is the total flux,
$\sigma$ is the width of the PSF,
L is the length of the streak, and
$\operatorname{erf}$ is the Gaussian error function, and
$$x' = (x - x_0)\cos\theta + (y - y_0)\sin\theta, \qquad y' = -(x - x_0)\sin\theta + (y - y_0)\cos\theta$$
where
$x_0$ and $y_0$ define the center of the streak, and
$\theta$ defines the angle between the motion of the streak and the
x axis.
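A direct numerical rendering of this trail model (a Gaussian PSF smeared along a line segment) can be written with the standard-library error function; the image size and parameter values below are illustrative only:

```python
import math
import numpy as np

def streak_image(shape, x0, y0, L, sigma, theta, flux):
    """Render a 2D Gaussian streak: Gaussian cross-section of width sigma,
    erf-bounded profile of length L along the motion direction theta,
    normalized so the pixel sum approximates the total flux."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    ct, st = math.cos(theta), math.sin(theta)
    xp = (xx - x0) * ct + (yy - y0) * st    # along-motion coordinate
    yp = -(xx - x0) * st + (yy - y0) * ct   # cross-motion coordinate
    erf = np.vectorize(math.erf)
    along = (erf((xp + L / 2) / (math.sqrt(2) * sigma))
             - erf((xp - L / 2) / (math.sqrt(2) * sigma)))
    cross = np.exp(-yp ** 2 / (2 * sigma ** 2))
    return flux / (2 * L * math.sqrt(2 * math.pi) * sigma) * along * cross
```

Because the erf difference integrates to 2L along the motion direction and the Gaussian cross-section integrates to sqrt(2*pi)*sigma, the image sums to approximately the requested total flux when the streak fits inside the frame.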
4.2. Dataset
Nearly all kilometer-sized NEAs have already been discovered; however, the survey completeness for smaller NEAs drops rapidly [
30]. The brightness of NEAs comes primarily from reflected sunlight and is positively correlated with their size; smaller NEAs are generally fainter. The key to discovering new NEAs lies in how we select the brightness range as a dataset parameter. We use synthetic streaks and inject them into real background images from ZTF.
We randomly download approximately 400 ZTF images from 2020 to 2021. To ensure that there are no streak-like objects in these images, we manually check and crop the raw images (3080 × 3072 pixels) into small blocks (256 × 256 pixels). Considering the non-uniformity at the edges of astronomical images, we discard a few columns of pixels during cropping. To evaluate the background noise, we calculate the mean pixel value of each image in order to better inject streaks.
We simulate 20,000 streaks and adjust their parameters, as parameters derived from the statistics of discovered NEAs might not be optimal for detecting undiscovered NEAs.
To determine the distribution of streak lengths and widths in the data, we construct a length function and a width function that are constant over 6–60 pixels and 0.3–1.0 pixels, respectively, and extend these ranges to 60–190 and 1.0–2.0 using inverse proportionality functions. The two functions can be described as follows:
$$f_L(l) = \begin{cases} c_L, & 6 \le l \le 60 \\ \dfrac{60\,c_L}{l}, & 60 < l \le 190 \end{cases} \qquad f_W(w) = \begin{cases} c_W, & 0.3 \le w \le 1.0 \\ \dfrac{c_W}{w}, & 1.0 < w \le 2.0 \end{cases}$$
where
$f_L$ is the length function,
$f_W$ is the width function, and
$c_L$ and $c_W$
are the constant factors of the two functions (
Figure 4 and
Figure 5).
Streaks with lengths between 6 and 60 and widths between 0.3 and 1.0 have fewer features in the images and are difficult for models to recognize, and thus we increased their proportion in the dataset.
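One way to draw lengths (or widths) from such a constant-then-inverse-proportional density is a two-piece sampler; the exact functional form here is our assumed reading of the construction above, with continuity at the breakpoint, not the paper's verbatim functions:

```python
import math
import random

def sample_piecewise(flat_lo, flat_hi, tail_hi):
    """Sample from a density that is constant on [flat_lo, flat_hi] and
    falls off as 1/x on (flat_hi, tail_hi], continuous at flat_hi
    (assumed form of the paper's length/width functions)."""
    m_flat = flat_hi - flat_lo                       # mass of the flat piece
    m_tail = flat_hi * math.log(tail_hi / flat_hi)   # mass of the 1/x tail
    if random.random() < m_flat / (m_flat + m_tail):
        return random.uniform(flat_lo, flat_hi)
    # inverse-CDF sample of the 1/x tail: x = a * (b/a)**u
    u = random.random()
    return flat_hi * (tail_hi / flat_hi) ** u
```

Called as `sample_piecewise(6, 60, 190)` for lengths or `sample_piecewise(0.3, 1.0, 2.0)` for widths, this over-represents the short, narrow streaks that the model finds hardest, as intended.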
After cropping approximately 40,000 ZTF background images and simulating 20,000 streaks, we randomly combine them and inject streaks into the backgrounds. To keep the center brightness of each streak matrix constant, we multiply it by a compensation factor that cancels the changes in center brightness caused by variations in length and width, and we then multiply the streak matrix by a brightness factor. The brightness factor is randomly sampled from 6.31 to 15.85, implying that the streak is between 2 and 3 magnitudes brighter than the average background. Streaks with lower brightness are difficult for humans to distinguish, so even if the model could recognize them, they would still be difficult to confirm. Streaks with higher brightness exhibit, after the ICC module, a similar brightness to the streaks we have defined, and the generalization capability of ICC-BiFormer ensures that they can be identified. The process can be expressed as follows:
$$P(x, y) = B(x, y) + \alpha\,\beta\,S(x, y)$$
where
$\alpha$ is the compensation factor, chosen so that the brightness of the streak center, $S_c$, stays constant across different lengths and widths,
$\beta$ is the brightness factor,
$S(x, y)$ is the brightness of the streak at position
$(x, y)$,
$B(x, y)$ is the brightness of the background at position
$(x, y)$, and
$P(x, y)$ is the brightness at position
$(x, y)$ after processing.
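The injection step can be sketched as follows; the interpretation that the compensation factor normalizes the streak center to the mean background level, so that the brightness factor directly sets the magnitude offset, is our assumption:

```python
import numpy as np

def inject_streak(background, streak, mag_brighter):
    """Add a simulated streak so its center sits `mag_brighter` magnitudes
    above the mean background (assumed reading of the compensation and
    brightness factors). `streak` must contain a nonzero peak."""
    beta = 10 ** (mag_brighter / 2.5)         # 2-3 mag -> 6.31-15.85
    alpha = background.mean() / streak.max()  # compensation: center -> mean bkg
    return background + alpha * beta * streak
```

With `mag_brighter` drawn uniformly in [2, 3], the multiplier beta falls in the paper's 6.31–15.85 range (since 10^(2/2.5) ≈ 6.31 and 10^(3/2.5) ≈ 15.85).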
This NEA image pair demonstrates many characteristics of NEA images (
Figure 6). In the original image, NEAs are often submerged in the background due to their low brightness. After processing by the ICC module, the contrast between the dim objects and background noise is enhanced. NEAs only occupy a small portion of the image and have distinct morphological features that differentiate them from other celestial bodies.
4.3. Larger Input Size
In our model, the input size is 256 pixels, which is larger than the input sizes used in previous detection works, such as 80 pixels in [
9] and 144 pixels in [
10]. We notice that when cropping training images from ZTF images (3080 × 3072 pixels), a larger training input size can effectively reduce the probability of truncating streaks. A longer streak retains more prominent features, which the model finds easier to recognize.
Consider nine images cropped from the raw image data, each with side length equal to the input size,
S, and suppose the streak’s bounding-box height and width are
H and
W, respectively. The probability,
P, that the middle image among the nine truncates the streak is as follows (
Figure 7 and
Figure 8):
$$P = 1 - \frac{(S - W)(S - H)}{S^2}$$
Another advantage of a larger input size is that it aids in distinguishing NEA streaks from satellite streaks, as NEAs and satellites have similar distances from the Earth and their brightness mainly comes from the reflection of the Sun. In addition to utilizing known satellite databases for confirmation by matching, the length of streaks also serves as a crucial criterion for identification. Satellites have a higher angular velocity than NEAs. Due to the rotation of satellites and their irregular shapes, the brightness of satellite streaks sometimes experiences a significant increase in a short period of time, which is referred to as a flare [
31]. Longer streaks increase the possibility of capturing flares, which provide conclusive evidence that the streaks were left by satellites.
4.4. Cropping Algorithm
To further save longer streaks while cropping, we propose a new cropping algorithm. For a non-cropping image, we crop it four times from different starting positions. We take a 768 × 768 image as an example and the input size,
S, is 256. The first cropping starts from the top left corner (1,1). The 768 × 768 image can be divided into nine smaller images without residuals. The second cropping needs to move half an input size along the x-axis, starting from (128,1). The third cropping moves half an input size along the y-axis, starting from (1,128). Both the second cropping and the third cropping can obtain six smaller images. The fourth cropping moves half an input size along both the x-axis and the y-axis, starting from (128,128). (
Figure 9) This approach ensures that streaks with maximum components in the x and y directions both less than 128 will not be truncated. For streaks with components exceeding 128, at least 128 pixels will be retained in one small image. Thus, our model demonstrates greater detection capability for streaks whose positions have not been pre-labeled.
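The four-pass cropping scheme can be expressed compactly as a grid of crop origins; this sketch uses 0-based pixel coordinates (the text's (1, 1) and (128, 1) become (0, 0) and (0, 128) here), and the function name is ours:

```python
def four_pass_crops(img_h, img_w, S=256):
    """Crop origins (row, col) for the four-pass scheme: a base grid plus
    grids shifted by S/2 in x, in y, and in both directions."""
    half = S // 2
    crops = []
    for dy, dx in [(0, 0), (0, half), (half, 0), (half, half)]:
        for y in range(dy, img_h - S + 1, S):
            for x in range(dx, img_w - S + 1, S):
                crops.append((y, x))
    return crops
```

On a 768 × 768 image with S = 256 this yields 9 + 6 + 6 + 4 = 25 crops, matching the counts in the text; every streak shorter than S/2 in both axes appears untruncated in at least one crop.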
4.5. Results
In order to evaluate the performance of the classification model, we use some evaluation indicators to measure the accuracy and robustness of the model. This article mainly introduces the following four evaluation indicators: Accuracy (Acc), False Positive Rate (FP), True Positive Rate (TP), and Area Under the Receiver Operating Characteristic Curve (ROC AUC) [
32].
After the ICC block, the characteristics of dim objects become more obvious and easier for the network to detect. From the ICC processing effect shown in
Figure 6, it can be observed that some objects that were submerged in the background, hardly distinguishable by the naked eye, become obvious after ICC processing, and the resulting image classification improves.
As can be seen from
Table 3, compared with other CNN-based models, ResNet uses residual connections to retain the original features, making network learning smoother and more stable and further improving the accuracy and generalization ability of the model; it therefore achieves better classification results. Next, comparing the predictions of ViT, Swin Transformer, and ICC-BiFormer, the classification indices of ICC-BiFormer are higher than those of the other two. ViT and Swin Transformer exhibit both higher FP and lower TP due to the interference of noise and celestial bodies in the image, which impedes global attention mechanisms from effectively learning NEA features. This demonstrates that ICC-BiFormer, which adds dynamic, query-aware sparse attention and filters out irrelevant key-value pairs at the coarse region level, is more suitable for the NEA detection task of this paper.
In order to verify the effect of each module, we designed corresponding experiments. As shown in
Table 4, the E-BiFormer model outperforms the BiFormer model, which shows that the Encoder fully extracts the information carried by the original image. Adding the image Decoder module gives ED-BiFormer, in which the images generated by the Decoder are input into BiFormer for classification, further improving the model. Therefore, we propose ICC-BiFormer, which adds the Quantizer and Contrast Enhancement blocks to ED-BiFormer, integrating the image compression and contrast enhancement block with BiFormer. This experiment also serves as an ablation study of our proposed model, demonstrating that each module in the architecture is indispensable.
4.6. Discussion
The excellent performance of ICC-BiFormer comes from its adaptation to NEAs that occupy a small portion of the image. Due to the BRA module, ICC-BiFormer filters out backgrounds with low brightness fluctuations, allowing it to distinguish celestial bodies such as stars, whose brightness increases slowly, from streaks with significant brightness increases in an area. The ICC module suppresses dark pixels, further helping to separate background from streaks. However, some sensitivity to faint streaks is inevitably lost; the suboptimal recognition of faint streaks is a drawback of ICC-BiFormer. In addition, besides satellites, some linear objects such as cosmic rays and saturated pixels also resemble streaks, and we have not amplified the proportion of these objects in our dataset. In some rare cases, ICC-BiFormer may therefore fail to classify correctly.
Comparing ICC-BiFormer with ResNet and BiFormer, ICC-BiFormer demonstrates a lower FP rate, albeit with a slight decrease in TP. However, during actual detection, streaks occur at a frequency of approximately 1%, which reduces to approximately 0.1% after cropping; in astronomical survey datasets, the ratio of positive to negative samples is thus roughly 1:1000. Notably, each candidate identified as a streak requires manual confirmation to verify it as a genuine NEA. Therefore, while maintaining a relatively high TP rate, reducing FP is imperative because it significantly reduces the time needed for manual confirmation. Moreover, applying ICC-BiFormer across a broader time span of data enhances our ability to detect NEAs.
5. Conclusions
In this study, we train the ICC-BiFormer model for ZTF surveys to detect NEAs, achieving a high level of accuracy. The BiFormer block plays a crucial role in ensuring a high TP by filtering out irrelevant information, such as background noise or stars, thereby mitigating interference. Additionally, our results suggest that considering local features of astronomical images, especially for small-scale objects like NEAs that appear nearly one-dimensional, is more effective than focusing solely on global features. Furthermore, due to the use of image compression and contrast enhancement blocks, we have achieved an exceptionally low FP. Given that NEAs are typically only slightly brighter than the background, scaling astronomical images to a range conducive to NEA detection proves to be another effective strategy for enhancing the model’s resistance to interference.
When applying the model to practical detection tasks, our advantage lies in utilizing a larger input size and our cropping algorithm. One key distinction between model training and practical detection lies in the absence of pre-cropping based on NEA positions in real detection scenarios. This absence can lead to the loss of NEA information due to random cropping during practical detection. In addition, with the increasing number of satellites in near-Earth orbits, distinguishing them becomes easier through the utilization of a larger input size and our cropping algorithm.
Our entire suite of NEA detection models has been improved, in both experimentation and practical applications. To detect more NEAs, we plan to extend this method to several astronomical surveys to collectively search for NEAs. In future work, the technique of adaptively extracting local features in deep learning presents a groundbreaking approach to astronomical object recognition. In astronomical images, the vast majority of pixels are background noise, with targets representing only a small proportion. The design of ICC-BiFormer can be extended to other identification tasks within astronomy.