Article

RACFME: Object Tracking in Satellite Videos by Rotation Adaptive Correlation Filters with Motion Estimations

1 Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Xi’an Key Laboratory of Spacecraft Optical Imaging and Measurement Technology, Xi’an 710119, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(4), 608; https://doi.org/10.3390/sym17040608
Submission received: 16 February 2025 / Revised: 26 March 2025 / Accepted: 8 April 2025 / Published: 16 April 2025
(This article belongs to the Special Issue Advances in Image Processing with Symmetry/Asymmetry)

Abstract:
Video satellites provide high-temporal-resolution remote sensing images that enable continuous monitoring of the ground for applications such as target tracking and airport traffic detection. In this paper, we address the problems of object occlusion and the tracking of rotating objects in satellite videos by introducing a rotation-adaptive tracking algorithm for correlation filters with motion estimation (RACFME). Our algorithm proposes the following improvements over the KCF method: (a) A rotation-adaptive feature enhancement module (RA) is proposed to obtain the rotated image block by affine transformation combined with the target rotation direction prior, which overcomes the disadvantage of HOG features lacking rotation adaptability, improves tracking accuracy while ensuring real-time performance, and solves the problem of tracking failure due to insufficient valid positive samples when tracking rotating targets. (b) Based on the correlation between peak response and occlusion, an occlusion detection method for vehicles and ships in satellite video is proposed. (c) Motion estimations are achieved by combining Kalman filtering with motion trajectory averaging, which solves the problem of tracking failure in the case of object occlusion. The experimental results show that the proposed RACFME algorithm can track a moving target with a 95% success score, and the RA module and ME both play an effective role.

1. Introduction

In recent years, the deployment of video satellites has generated a vast amount of images, facilitating the realization of numerous satellite applications [1,2]. These images are utilized in applications such as object tracking, airport traffic detection, and disaster monitoring [3]. These applications rely on essential technologies, including geometric correction, moving object detection, and tracking [4]. This paper specifically delves into the tracking of moving objects in satellite videos. Tracking moving objects poses a significant challenge in the realm of computer vision. Given the initial frame position of a particular object, the goal of the moving object tracking task is to accurately locate and estimate the bounding box of that object in subsequent frames [5]. Although many scholars have worked in the field of satellite video tracking and proposed additional datasets and algorithms in recent years, there are still many challenges that need to be addressed. The complexity of target tracking that this article focuses on can be divided into the following three main areas:
(1) The limited spatial resolution and small object size in satellite videos result in a weak image quality, often making it challenging to differentiate between the background and the object. This difficulty can lead to a loss of object features that would typically be discernible under natural conditions.
(2) The aerial perspective in satellite videos often causes many moving objects to be partially or entirely obscured, resulting in instances where objects may disappear from view. Existing methods often fail to effectively handle such occlusions, leading to tracking failure when objects are temporarily or completely blocked from view.
(3) Rotating objects are a common occurrence in satellite videos; however, existing correlation filtering methods often struggle to effectively track them. Traditional tracking algorithms, especially those based on Histogram of Oriented Gradients (HOG) features, lack rotation invariance, which severely degrades their performance when tracking rotating targets such as aircraft or ships.
Figure 1 illustrates these difficulties, which indicate the need to investigate tracking algorithms that address such problems.
The kernelized correlation filter (KCF) method [6] has demonstrated effectiveness in tracking objects and is considered one of the optimal choices for object tracking in satellite videos. However, when applying the KCF method to satellite video tracking, some limitations persist. (1) Objects are frequently lost when partially or entirely obscured due to the lack of robust occlusion detection and handling mechanisms. (2) The tracking of objects, particularly during rapid rotations such as aircraft maneuvers, poses challenges because the HOG features [7] used in the KCF method are not rotation invariant. In satellite videos, the objects being tracked are typically aircraft, ships, and vehicles, which generally exhibit uniform or uniformly accelerated motion. Recognizing this pattern, we were inspired to introduce RACFME. Our work stands out from others owing to the following contributions.
  • We introduce a rotation-adaptive (RA) feature enhancement module to address the issue of the lack of rotation invariance in HOG. By using a few affine transformations, the number of effective positive samples is increased and the rotating target tracking ability of the KCF method is enhanced.
  • Based on the correlation between peak response and occlusion, an occlusion detection method for vehicles and ships in satellite video is proposed. A Kalman filter is then combined with motion trajectory averaging to predict the target position when occlusion is detected, which solves the problem of target loss under partial or complete occlusion.
  • Our algorithm achieves a success score of 95% in tracking objects, operating at about 116 FPS, showcasing a superior performance. Moreover, the method effectively addresses tracking failures in scenarios involving occluded and rotating objects.

2. Related Work

Object tracking in satellite videos has received extensive research attention in recent years, and we will briefly introduce the following three areas: object tracking in satellite videos, moving object tracking, and kernelized correlation filters (KCFs) [6] for object tracking.

2.1. Object Tracking in Satellite Videos

Object tracking in satellite videos has been a significant focus of research efforts, with various methodologies being explored. Some researchers have incorporated moving object detection algorithms to enhance tracking algorithm performance [8]. For instance, a template matching approach utilizing HU matrices was proposed in [9], assuming linear vehicle motion over short intervals. While effective in simple road conditions, this method may encounter challenges in complex environments. Zhang et al. [10] proposed combining spectral and spatial features to model ships and aircraft, using a unique regional operator design in the target matching process, which is well-suited for tracking larger targets such as aircraft but less suitable for tracking smaller targets such as vehicles.
In contrast, Chen et al. [11] proposed the use of optical flow technology for target tracking, achieving results in vehicle speed estimation and traffic density monitoring, though tracking was not the primary focus, as it demands a high video quality. To enhance tracking performance, Du et al. [12] adopted more robust discriminative methods like correlation filters, integrating techniques such as multi-frame differencing for support. These algorithms refine correlation filters based on satellite video object characteristics, yielding favorable tracking outcomes. Nonetheless, these methods do not fully address inherent correlation filter limitations, such as boundary effects potentially leading to object loss during complete occlusion. Novel methods need to be developed to mitigate the shortcomings of these filters and improve tracking accuracy.
Satellite video, as an evolving Earth observation technology, has found widespread application in moving object detection studies. While current tracking methods excel under simpler conditions, there is ample room for enhancing algorithm precision and FPS. Moreover, existing methods have yet to comprehensively address the challenges related to object occlusion and rotating object tracking in satellite videos, necessitating further optimization and refinement based on satellite video characteristics.

2.2. Moving Object Tracking

To develop a more effective algorithm, we analyze the unique characteristics of satellite video and integrate them with cutting-edge tracking algorithms. Generative methods typically rely on hand-crafted features like color histograms [13], histograms of oriented gradients (HOGs) [7], scale-invariant feature transform (SIFT) descriptors [14], and HU matrices [9] to construct object templates. These methods use search algorithms such as particle filtering [15] to locate the most similar objects in subsequent frames. However, generative methods often struggle with complex backgrounds due to their neglect of background information.
Discriminative methods are also known as detection-based tracking. Initially, a classifier is trained in the first frame to distinguish an object from its surroundings; it is then used in subsequent frames to assess whether the predicted location corresponds to the object. Various machine learning techniques, including online boosting [16], semi-supervised methods [17], support vector machines, and structured output SVMs [18], have been utilized for training classifiers.
In recent years, discriminative methods have focused on deep learning and correlation filters. For instance, MDNet [19] adopts a multi-domain approach, training a convolutional neural network (CNN) [20] offline and fine-tuning it online during tracking, achieving a high precision but at the cost of speed. In contrast, GOTURN [21] employs CNNs for regression without online updating, resulting in a higher speed but lower precision. Some trackers leverage deep features instead of the hand-crafted features used in correlation-filtered trackers, significantly enhancing precision, albeit sacrificing real-time tracking capability [22]. Notably, correlation filters offer a high speed and precision comparable to deep learning methods [23]. Correlation filters like CSK [24] and kernelized correlation filters (KCFs) [6] leverage kernel tricks to enhance discriminative power through multi-channel features. While operating in the frequency domain reduces computational volume, it introduces boundary effects, leading to potential overfitting. While these methods significantly improve tracking results, they tend to degrade FPS and have difficulty in adapting to changes in target orientation, especially in satellite video, where changes are more often due to object rotation rather than scale changes. Addressing this, we propose a rotation-adaptive feature enhancement module to accommodate such fluctuations.

2.3. Kernelized Correlation Filter

Researchers have made significant advancements to enhance effectiveness, leading to the development of various correlation filtering algorithms. Among these, the kernelized correlation filter (KCF) algorithm [6] stands out as particularly influential due to its simplicity and performance improvements. The core concept of the KCF algorithm revolves around augmenting the number of negative samples to bolster the tracker’s capabilities, achieved through the utilization of cyclic matrix construction methods. The disadvantages of the mainstream filtering method KCF based on HOG [7] and ECO [22] are shown in Table 1 to highlight our research motivation.
In the KCF algorithm [6], training samples are generated based on cyclic matrix formulation, with positive samples serving as the base and additional samples as synthetic negatives. This sample set exhibits favorable characteristics and can be efficiently computed leveraging fast Fourier transform properties and Fourier diagonalization, eliminating the need to explicitly define the negative sample structure. By transforming all computations related to the negative samples into the frequency domain, the algorithm effectively addresses sample scarcity during tracker training by densely sampling the input image’s search region through cyclic matrix properties.
The primary objective of the KCF algorithm [6] in tracker training is to utilize these generated samples to create a filter that yields the expected distribution when applied to the samples. The cyclic matrix generated for any base sample x can be diagonalized in Fourier space using a discrete Fourier matrix, as represented by Equation (1), as follows:
X = C(x) = F \,\mathrm{diag}(\hat{x})\, F^{H}
Here, x̂ is the discrete Fourier transform of x, C(x) is the cyclic matrix generated from the base sample x, and F^H is the Hermitian transpose of F, which is the constant discrete Fourier matrix of the form shown in Equation (2), as follows:
F = \frac{1}{\sqrt{n}}
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \omega & \cdots & \omega^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & \omega^{n-1} & \cdots & \omega^{(n-1)(n-1)}
\end{bmatrix}
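The diagonalization in Equation (1) is easy to verify numerically. The following sketch (our own illustration, not code from the paper) builds C(x) for a small 1-D base sample and reconstructs it from the DFT of x:

```python
import numpy as np
from scipy.linalg import circulant

x = np.array([1.0, 2.0, 3.0, 4.0])            # 1-D base sample
X = circulant(x).T                            # C(x): each row is a cyclic shift of x

n = len(x)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # unitary DFT matrix F
x_hat = np.fft.fft(x)                         # discrete Fourier transform of x
X_rebuilt = F @ np.diag(x_hat) @ F.conj().T   # F diag(x_hat) F^H, Equation (1)

assert np.allclose(X, X_rebuilt)              # the identity holds
```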
Unlike the MOSSE algorithm [23], the KCF algorithm [6] uses ridge regression of the following form to train the tracker:
\min_{w} \sum_{i} L\big(y_i, f(x_i)\big) + \lambda \lVert w \rVert^{2}
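In the Fourier domain, this ridge regression admits a closed-form dual solution. The sketch below follows the standard KCF derivation for a 1-D, single-channel signal; the Gaussian kernel bandwidth sigma is an assumed hyperparameter, while lambda matches the value used in Section 4.2:

```python
import numpy as np

def gaussian_kernel_autocorr(x, sigma=0.5):
    """k^{xx}: Gaussian kernel between x and all of its cyclic shifts,
    computed in O(n log n) with the FFT (1-D, single-channel sketch)."""
    x_hat = np.fft.fft(x)
    corr = np.real(np.fft.ifft(x_hat * np.conj(x_hat)))   # circular autocorrelation
    return np.exp(-(2 * np.dot(x, x) - 2 * corr) / (sigma ** 2 * len(x)))

def train(x, y, sigma=0.5, lam=1e-4):
    """Closed-form dual solution of the ridge regression in Equation (3):
    alpha_hat = y_hat / (k_hat^{xx} + lambda)."""
    k_xx = gaussian_kernel_autocorr(x, sigma)
    return np.fft.fft(y) / (np.fft.fft(k_xx) + lam)
```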
In the detection process, the algorithm uses the already trained tracker to filter the padding window region to compute the distribution map of the output, in which the maximum response position is used as the center position of the predicted object. According to the concept of the kernelized correlation vector defined above, the kernel matrix of the input samples and template samples in the high dimensional space is of the following form:
K^{z} = C(k^{xz})
From the derivation of the previous equations, we know that we only need to update the training sample set x in the process of updating the tracker. Therefore, after the detection part of the algorithm is executed, a new object prediction position is obtained, from which a new base sample is extracted and used to generate a cyclic matrix, yielding a new sample set new_x. Training is then performed to obtain new model parameters new_α, and finally the tracker is updated with the model parameters from the previous frame using linear interpolation with a set update step, as shown in the following equations:
\alpha = (1 - \beta)\,\alpha' + \beta\,\mathrm{new}\_\alpha
x = (1 - \beta)\,x' + \beta\,\mathrm{new}\_x
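As an illustration of the detection and update steps, the following sketch filters a search patch with the learned dual coefficients, takes the response peak as the predicted displacement, and applies the linear-interpolation update of Equation (5); beta is an assumed update step, and the same blending would apply to the dual coefficients α:

```python
import numpy as np

def gaussian_kernel_corr(x, z, sigma=0.5):
    # k^{xz} against all cyclic shifts, via FFT-based circular cross-correlation
    corr = np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)))
    return np.exp(-(np.dot(x, x) + np.dot(z, z) - 2 * corr) / (sigma ** 2 * len(x)))

def detect_and_update(alpha_hat, x_model, z, beta=0.02, sigma=0.5):
    """Detection plus linear-interpolation model update (Equation (5) sketch)."""
    k_xz = gaussian_kernel_corr(x_model, z, sigma)
    response = np.real(np.fft.ifft(np.fft.fft(k_xz) * alpha_hat))
    shift = int(np.argmax(response))           # peak position = predicted displacement
    peak = float(response.max())               # confidence, used later for occlusion detection
    x_model = (1 - beta) * x_model + beta * z  # x = (1 - beta) x' + beta new_x
    return shift, peak, x_model
```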
In summary, the existing methods for object tracking in satellite videos and general moving object tracking have made significant progress. However, they still face challenges when dealing with specific issues prevalent in satellite videos, such as object occlusion and rotation. Traditional correlation filter-based methods, despite their efficiency and accuracy, lack robustness against target rotation and occlusion. Deep learning-based methods, although powerful, often suffer from computational inefficiency and are not well-suited for real-time applications. Moreover, none of these methods have comprehensively addressed the unique challenges of satellite video tracking, including small object sizes, a limited spatial resolution, and complex occlusion scenarios. This gap in the literature highlights the need for a novel approach that can effectively handle these challenges while maintaining real-time performance. Our proposed RACFME algorithm aims to bridge this gap by introducing rotation adaptivity and robust occlusion handling into correlation filter-based tracking, thereby enhancing its overall performance and applicability for satellite video tracking tasks.

3. Proposed Method

Symmetry, as a core concept in nature and engineering, has important applications in computer vision. For example, rotational symmetry can enhance the robustness of feature description to changes in target orientation, while the motion symmetry assumption in time series can improve tracking stability in occluded scenes. The RACFME algorithm proposed in this paper introduces a rotational symmetry prior through affine transformation to address the limitations of traditional HOG features [7] in rotating target tracking; at the same time, the motion estimation module combined with Kalman filtering implicitly incorporates an assumption of temporal symmetry, which further optimizes trajectory prediction in occlusion scenarios. This explicit and implicit use of the symmetry principle provides a new solution to the complex target tracking problem in satellite video.
In this section, we first introduce the rotation-adaptive (RA) feature enhancement module, then introduce ME, and then use them in combination with the correlation filter to form the RACFME algorithm to mitigate the problem of tracking rotating objects, as well as the occlusion problem.

3.1. Rotation-Adaptive Feature Enhancement Module

The Histogram of Oriented Gradients (HOG) feature [7] exhibits illumination invariance and is particularly suitable for object tracking. However, a limitation of HOG is its lack of rotation invariance, which hampers its effectiveness in tracking objects with significant angular rotation. Making HOG features themselves rotation invariant is challenging and can introduce complexity, potentially impacting the tracking algorithm’s efficiency. To solve this problem, we use affine transformations to rotate the object in the image to several angles and generate rotated image blocks, use the response peak to obtain the object’s rotation angle, and perform an affine transformation based on this angle to obtain effective positive samples in the next frame. This approach mitigates HOG’s inherent lack of rotation invariance, thus enhancing the algorithm’s tracking capabilities.
In Figure 2, the module is situated in the RA (rotation-adaptive) region. At the outset, in the initial frame, we set the current rotation angle to zero, after which our algorithm trains the correlation filter in the same way as the standard KCF methodology [6]. Unlike the KCF method, during the tracking process we introduce a rotation angle pool R0 = [0, ±0.5, ±1, ±1.5, ±2]. The process of generating rotated image blocks by affine transformation in the RA module essentially introduces a rotational symmetry prior. Specifically, the rotation angle pool defines a discrete symmetry transform group that covers the possible major rotation directions of the target. Through Fourier space diagonalization, the algorithm achieves efficient modeling of rotational symmetry in the frequency domain, thus enhancing the invariance of HOG features [7] under the rotation group transformation. This symmetry-driven feature enhancement strategy significantly improves the algorithm’s adaptability to rotating targets. First, we crop an image patch centered at the target position of the previous frame in the current frame. Subsequently, we perform an affine transformation to rotate the image. For ease of representation, these image patches are denoted as {Z_ri | ri ∈ R}. After the affine transformation, the sizes of the image patches do not necessarily match [w, h]. Any image patch larger than [w, h] has its edges trimmed to conform to the specified size, while patches smaller than [w, h] are padded with zero-valued pixels along the edges. The object’s rotation angle can be estimated using Equation (6), as follows:
\mathrm{rotation\ angle} = \arg\max_{r_i} f(Z_{r_i})
where the closest angle is the angle in R0 closest to the estimated rotation angle. After obtaining the rotation angle of the object, we can use the response map to obtain the position of the object. We then extract the image patch at the estimated position to update the correlation filters, at the same time updating the rotation angle pool as R = [0, rotation angle, closest angle]. The pseudo-code for RA module tracking is shown in Algorithm 1.
Algorithm 1 RA module tracking
Input: frames: video stream
Output: P_t: position at subsequent frames
for i = 1; i < length(frames); i++ do
  if i == 1 then
    /* Select the object to track and do some initialization */
    P_old ← P_first
    R ← R0
  else
    z ← extract an image patch at P_old
    {Z_ri | ri ∈ R} ← rotate z based on R
    rotation angle ← Equation (6)
    P_t ← peak position of the response map
    update the correlation filters ← extract an image patch at P_t
    R ← [0, rotation angle, closest angle]
    P_old ← P_t
  end if
end for
return P_t
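For illustration, a minimal sketch of the rotated-patch generation step is given below, using OpenCV’s affine warp; the function name and the exact crop/zero-pad handling are our assumptions, following the rule described above:

```python
import cv2
import numpy as np

def rotated_patches(frame, center, size, angle_pool):
    """Generate the rotated image blocks {Z_ri | ri in R} of the RA module
    by affine transformation (a sketch, not a published implementation)."""
    w, h = size
    patches = {}
    for angle in angle_pool:                        # e.g. R0 = [0, ±0.5, ±1, ±1.5, ±2]
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
        x0 = max(int(round(center[0] - w / 2.0)), 0)
        y0 = max(int(round(center[1] - h / 2.0)), 0)
        crop = rotated[y0:y0 + h, x0:x0 + w]        # trims patches larger than [w, h]
        patch = np.zeros((h, w) + frame.shape[2:], dtype=frame.dtype)
        patch[:crop.shape[0], :crop.shape[1]] = crop  # zero-pads smaller patches
        patches[angle] = patch
    return patches
```

The rotation angle is then taken as the pool angle whose patch yields the highest filter response, as in Equation (6).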

3.2. Motion Estimation

This paper employs Kalman filtering to predict the position and velocity of moving objects. While Kalman filtering offers a high precision in motion estimation, it necessitates an adequate number of frames to achieve filter convergence. Prior to the Kalman filter reaching convergence, we utilize motion trajectory averaging to estimate the object’s motion. Objects typically observed in satellite videos include vehicles, aircraft, and ships, which may experience occlusion, such as by bridges. We make the assumption that, during occlusion, the object maintains consistent motion for a brief duration, irrespective of whether it is accelerating or decelerating abruptly.
Under the aforementioned assumptions, the object’s position in previous frames serves as a basis for predicting its position in the current frame, as demonstrated in Equations (7) and (8), as follows:
(\dot{x}_{t-1}, \dot{y}_{t-1}) = \frac{1}{n} \sum_{i=1}^{n} \left[ (x_{t-i}, y_{t-i}) - (x_{t-i-1}, y_{t-i-1}) \right]
P_t = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \left( x_{t-1},\, y_{t-1},\, \dot{x}_{t-1},\, \dot{y}_{t-1} \right)^{T}
In Equation (7), n denotes the number of selected previous frames. If n is too small, the estimated current position becomes overly sensitive to instantaneous speed; if n is too large, computational efficiency suffers, so n must be chosen carefully. This paper sets {n | n ∈ (3, 5, 10)} according to the object type. In satellite videos, vehicles have lower speeds and smaller pixel sizes, ships generally maintain a constant speed or uniform acceleration with small accelerations, and aircraft show larger changes in speed during the landing and take-off phases. Therefore, n = 3, 5, and 10 correspond to vehicles, ships, and aircraft, respectively.
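As a compact illustration, the trajectory-averaging prediction of Equations (7) and (8) can be written as follows (a sketch; the history container and function name are our own):

```python
import numpy as np

def predict_position(history, n):
    """Average the last n frame-to-frame displacements (Equation (7)) and
    add the mean velocity to the previous position (Equation (8)).
    history holds the (x, y) centers of past frames; n is 3, 5, or 10 for
    vehicles, ships, and aircraft, respectively."""
    pts = np.asarray(history[-(n + 1):], dtype=float)  # positions t-n-1 ... t-1
    velocity = np.diff(pts, axis=0).mean(axis=0)       # Equation (7)
    return pts[-1] + velocity                          # Equation (8): P_t
```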
In order to address the occlusion problem, the algorithm proposed in this paper, as illustrated in Figure 2, uses a correlation filter response threshold discriminant for occlusion detection. The peak value of the response block generated by the kernelized correlation filter (KCF) tracker serves as an indicator of the object’s confidence level. In satellite videos, the response of the KCF tracker may be less prominent due to lighting variations, object motion blur, and deformation, while the drop in response under occlusion is significant. We therefore check whether the peak value of the response block obtained from the KCF tracker exceeds a predefined threshold to detect full or partial occlusion of the object. As shown in Figure 3, when occlusion occurs, the peak value decreases, and when the peak value is low enough, the object is completely occluded; by observing changes in the peak value, we can tell whether occlusion occurs. Because vehicles and ships differ in scale and motion, the resulting changes in the peak value also differ: occlusions of vehicles are typically shorter than those of ships, so we choose different thresholds for each. As shown in Figure 4, comparative experiments over different occlusion detection thresholds yielded the optimal values, which were set as 0.3 for vehicles and 0.4 for ships.
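In code, the occlusion test is a simple threshold comparison on the response peak (a sketch using the thresholds selected in Figure 4):

```python
# Occlusion test on the KCF response peak; the thresholds are the ones
# selected in the comparative experiments of Figure 4.
OCCLUSION_THRESHOLDS = {"vehicle": 0.3, "ship": 0.4}

def is_occluded(peak, object_type):
    return peak < OCCLUSION_THRESHOLDS[object_type]
```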
The RA and ME modules of RACFME have now been described; combining them with a KCF yields the complete RACFME algorithm. (1) First, we select the object to be tracked in the first frame, crop an image patch at the target position of the current frame, obtain the rotated image patches based on the rotation angle pool, extract HOG features [7], obtain the peak value, rotation angle, and exact position after passing through the correlation filter, and train the KCF [6] using the image patch at the new position. (2) After n frames, ME starts to work and outputs the estimated position; whether the object is occluded is determined by whether the peak of the response map is larger than the threshold. If the peak is larger than the threshold, the object is not occluded: the exact position is output directly and the KCF is trained and updated. If the peak is smaller than the threshold, the object is judged to be partially or fully occluded: the estimated position is output and the Kalman filter is updated. (3) We stop updating the KCF filter until the occlusion ends, then revert to the exact position output by the KCF. This prevents the performance degradation that would be caused by adding background features to the KCF during occlusion. A sketch of this switching logic is given below.
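The following sketch illustrates the switching logic with a constant-velocity Kalman filter; using OpenCV’s cv2.KalmanFilter and the specific noise settings are our assumptions, since the paper does not prescribe a particular implementation:

```python
import cv2
import numpy as np

# Constant-velocity Kalman filter with assumed state [x, y, vx, vy].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
kf.processNoiseCov = 1e-4 * np.eye(4, dtype=np.float32)
kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
kf.errorCovPost = np.eye(4, dtype=np.float32)

def me_step(kf, kcf_position, peak, threshold):
    """One step of the combined logic: if the response peak clears the
    occlusion threshold, trust the KCF position, correct the Kalman filter,
    and allow the model update; otherwise output the prediction and freeze
    the KCF update until the occlusion ends."""
    prediction = kf.predict()[:2].ravel()
    if peak >= threshold:                       # not occluded
        kf.correct(np.array(kcf_position, np.float32).reshape(2, 1))
        return tuple(kcf_position), True        # True: update the KCF model
    return tuple(prediction), False             # occluded: hold the model
```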

4. Experiment and Analysis

4.1. Datasets and Evaluation Metrics

The experimental data were obtained from China’s high-temporal-resolution satellite Jilin-1, utilizing a dataset comprising 10 videos. The spatial resolution of the data is approximately 1.5 m, and the frame rate is 20 frames per second. Among the videos, two depict airports in Guangzhou, China, and Copenhagen, Denmark, featuring aircraft as the objects with dimensions of around 50 × 40 pixels; notably, one aircraft undergoes rapid rotation. In addition, two videos show ships in ports in Shanghai, China, with objects measuring approximately 50 × 15 pixels, one of which exhibits progressive rotation and another of which undergoes occlusion. The remaining videos capture traffic scenarios in Hong Kong, Shanghai, and New York, where the moving objects are vehicles sized approximately 15 × 8 pixels, the smallest object measures 5 × 5 pixels, and multiple instances of vehicle occlusion are observed. All experimental data were captured in sunny weather, which is favorable for target tracking tasks; clouds appear in only a small number of frames and cause relatively little interference in the experiment.
As shown in Figure 5, compared to other published datasets, our dataset presents several unique challenges, as follows:
  • Small object size and limited spatial resolution: The objects in our dataset, such as vehicles and ships, are relatively small (e.g., 15 × 8 pixels for vehicles and 50 × 15 pixels for ships), and their spatial resolution is only about 1.5 m. This makes it difficult to distinguish the objects from the background, especially when the objects are partially occluded or have similar colors to the surroundings.
  • Frequent occlusions: The dataset contains numerous instances of object occlusions, especially in the traffic scenarios in Hong Kong, Shanghai, and New York. Vehicles are often occluded by other vehicles or infrastructure such as bridges, which poses a significant challenge for tracking algorithms.
  • Rotating objects: The dataset includes objects that undergo significant rotations, such as the rotating aircraft in the airport videos. Traditional tracking algorithms that rely on non-rotation-invariant features often fail to track these objects accurately.
  • Diverse object types and motion patterns: The dataset includes different types of objects (vehicles, ships, and aircraft) with varying sizes, shapes, and motion patterns. This diversity requires the tracking algorithm to be robust to different object characteristics and motion dynamics.
We evaluate the performance of the tracker in consecutive frames using the following two metrics that are widely accepted in the field: the center location error (CLE) and the overlap score. The CLE is the Euclidean distance between the true center of the tracked target and the predicted center. In evaluating the overall performance of the tracker over the entire video sequence, we consider metrics such as precision, success rate, and the area under the success curve (AUC), which are computed from the CLE and overlap scores. If the CLE is within five pixels, a frame is considered precisely tracked, and the precision score is the percentage of such frames. A frame is considered successfully tracked if its overlap score exceeds 0.5, and the success rate is the percentage of successfully tracked frames. In addition, the processing speed of the tracker is evaluated as the number of frames processed per second (FPS).
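For reference, the two per-frame metrics can be computed as follows (a straightforward sketch with (x, y, w, h) boxes):

```python
import numpy as np

def center_location_error(pred_center, gt_center):
    """CLE: Euclidean distance between predicted and ground-truth centers;
    a frame is precisely tracked when CLE <= 5 pixels."""
    return float(np.linalg.norm(np.subtract(pred_center, gt_center)))

def overlap_score(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes; a frame is a
    success when the overlap exceeds 0.5."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```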

4.2. Experimental Settings

In this paper, we chose KCF [6], ECO [22], BOOSTING [17], MEDIANFLOW [25], MIL [26], and Siam R-CNN [27] for comparison with our RACFME. KCF is the baseline of RACFME. MEDIANFLOW uses both optical flow and re-detection methods to prevent object loss. MIL and BOOSTING are classical detection-based tracking algorithms. ECO is one of the best correlation filtering algorithms in object tracking, and Siam R-CNN is one of the best deep learning algorithms in object tracking. BOOSTING and MEDIANFLOW are implemented by calling the OpenCV API. The source code for KCF, MIL, and ECO is from the open-source community. All algorithms are executed on a computer with an i7-12650H CPU and an NVIDIA GeForce RTX 3050 GPU with 4 GB of memory. The regularization factor λ is set to 10^{-4} and the learning rate η is set to 0.01.

4.3. Experimental Analysis on Moving Object Tracking

The results of object tracking on all sequences are shown in Table 2 and Figure 6. Compared to the KCF algorithm [6], the RACFME precision score improves by about 23%, the success score improves by about 24%, and the AUC improves by about 21%. ECO’s [22] performance is a small improvement over KCF, but still not impressive enough. Siam R-CNN [27] is the best deep learning method in the field of single object tracking and its performance is also good, but there is a gap compared to our method, as it does not handle rotating targets and occlusion separately and is also slower due to its computational complexity. MEDIANFLOW [25] predicts positions that are too large for small objects such as vehicles, resulting in a poor score. The performance of MIL [26] and BOOSTING [17] is inferior to KCF, and because MIL adopts an online learning strategy, its computational complexity increases in scenarios with many candidate samples and its FPS is very low, at only 7. It is apparent that our method does a much better job with small object tracking and object occlusion problems. Its FPS ranks second among all methods but is very close to the first-ranked KCF. Our method rotates HOG features to enhance the rotational adaptability of KCF and uses ME to solve the object occlusion problem, producing a good effect without drastically reducing computational efficiency.

4.4. Experimental Analysis on Rotating Object Tracking

The results of rotating object tracking are shown in Table 3 and Figure 7. The tracking performance of KCF [6] on rotating objects is degraded because its HOG features are not rotation-adaptive, with an accuracy of 57.2%, a success score of 50.1%, and an AUC of 48.7%. Compared to KCF, the accuracy score of RACFME is improved by about 32% to 89.1%, the success score is improved by about 36% to 86.7%, and the AUC is improved by about 24%. The performance of ECO [22] is similar to that of KCF, with a small increase. The performance degradation of Siam R-CNN [27] is not significant, because its training dataset covers a wide range of target motion scenarios. MEDIANFLOW [25] performs comparably to KCF, while MIL [26] and BOOSTING [17] outperform ECO in terms of AUC and success score, demonstrating their suitability for tracking larger rotating targets. Our method has a significant advantage in tracking rotating targets: it not only substantially leads KCF and ECO, but even exceeds Siam R-CNN trained with a large amount of data containing rotating targets. In addition, its FPS exceeds 100, reaching 114, which shows that the RA module proposed in this paper solves the correlation filter tracking problem for rotating targets through a few simple affine transformations and rotation angle pool updates, without greatly increasing computational complexity. When the rotation angle of the object is small and changes slowly, the RA module can accurately estimate the rotation angle, effectively enhancing the rotation adaptability of HOG features; in this case, RACFME’s tracking accuracy and success rate are significantly improved. However, when the rotation angle of the target changes very quickly and unpredictably, the rotation angle estimate of the RA module may have a large error, resulting in inaccurate feature extraction and degraded tracking performance. Due to the scarcity of satellite video data, the rotation of aircraft and ships in the adopted dataset is relatively slow, which may favor our method.

4.5. Experimental Analysis on Occluded Object Tracking

The results of occluded object tracking are shown in Table 4 and Figure 8. The KCF method [6] has an AUC of 37.4%, an accuracy score of 47.2%, a success score of 45.8%, and an FPS of 131. Compared to KCF, RACFME improves the accuracy score by about 44% to 91.3%, the success rate by about 46% to 92.0%, and the AUC by about 39%, demonstrating great success in occlusion detection and processing. At the same time, RACFME’s FPS is 121, a decrease of only 10 compared to KCF, so it still maintains excellent real-time performance. ECO’s [22] performance is similar to KCF’s, with a small increase. Since it has no occlusion detection and handling mechanism, the performance degradation of Siam R-CNN [27] is very significant, far behind RACFME but still better than ECO. MEDIANFLOW [25] performs on par with Siam R-CNN in terms of accuracy, success score, and AUC, still significantly below our approach. Both MIL [26] and BOOSTING [17] show decreased performance on occluded objects, comparable to KCF. Our approach has a clear advantage in tracking occluded targets, leading not only KCF and ECO by a wide margin but also Siam R-CNN, with an accuracy advantage of about 27%. The results show that the occlusion detection method proposed in this paper detects occlusion well and that the motion estimation based on motion trajectory averaging fused with a Kalman filter is effective, increasing the computational complexity only slightly. When the target is not occluded, the ME module predicts the target’s motion through the motion trajectory average and Kalman filter, optimizes the target’s search area, and improves tracking stability and accuracy. When the target is partially occluded, the ME module can predict the location of the target through the motion trajectory average and Kalman filter, keeping track of the target to a certain extent. When the target is completely occluded for a long time, the prediction error of the ME module may gradually accumulate, resulting in inaccurate target position prediction and degraded tracking performance.

5. Qualitative Evaluation

Here, we qualitatively compare our method with the other trackers in Figure 9. All trackers are given the same target and first-frame position, and their results are distinguished by different bounding box colors.
In the video sequence for New York, the target is a vehicle, and when occlusion occurs the object almost completely disappears from the frame. Only our method can estimate the position of the object; the ME mechanism plays an important role here, estimating the position of the object while it is occluded based on its displacement in the previous frames when the correlation filter cannot find it, preventing tracking failure caused by target drift. When the object reappears, our method correctly detects it and effectively handles the occlusion. The other methods do not handle occlusion effectively: after the target is occluded, their search areas remain at the pre-occlusion position for a long time and the target position cannot be estimated; when the occlusion ends, the search area is far from the target, which cannot be re-detected and is eventually lost.
In the video sequence for Hong Kong, the target is also a vehicle, and the object is never occluded, but MEDIANFLOW [25] gradually loses track of it because the vehicle, as a small target, lacks angular characteristics. The other methods maintain tracking. Our method gives the minimum bounding box of the object, so it has a higher precision than the KCF [6] method. Although the motion estimation module does not directly output the position of the object here, our analysis shows that it optimizes the search area to focus more on the center of the tracked object when there is no occlusion, which effectively alleviates the boundary effect caused by cyclic shifts.
In the video sequence for Copenhagen, the target is a rotating aircraft, larger than a vehicle. Our method and the best existing Siamese network method, Siam R-CNN [27], both maintain tracking of the rotating aircraft, achieving the same visual tracking effect. However, owing to their lack of rotation adaptation, the correlation filtering models relying on HOG features, KCF and ECO [22], almost lose the target after it rotates rapidly. MEDIANFLOW’s tracking improves compared with the vehicle sequences and is comparable to MIL [26] and BOOSTING [17], as it is better suited to aircraft with a larger scale and angular features.
In the video sequence for Shanghai, the target is a ship with an obvious wake whose color features are similar to the target’s, and the occlusion caused by crossing a bridge poses a further challenge to the tracking task. Only our method tracks the target effectively throughout the whole sequence, while the other methods are distracted by the wake, resulting in target drift. Due to the occlusion by the bridge, the target disappears from the picture, and even when the ship reappears, these methods cannot relocate it and finally lose it.

6. Conclusions

In this paper, an efficient satellite video target tracking framework is constructed by explicitly exploiting rotational symmetry and an implicit temporal symmetry assumption. The RA module incorporates a rotational symmetry prior into the feature enhancement process, which solves the rotational sensitivity problem of traditional methods; the ME module optimizes motion trajectory prediction based on a temporal symmetry assumption, which significantly improves tracking stability in occlusion scenarios. The experimental results show that the RACFME algorithm achieves a 95% success rate in tracking moving targets with an excellent overall performance, and that RA and ME effectively address the problems of rotating object tracking and object occlusion, respectively, which proves the algorithm’s effectiveness and practicality in the field of satellite video target tracking. However, due to the scarcity of satellite video data, our experimental objects, such as aircraft and ships, move relatively slowly, which favors both the RA and ME modules of our method. In the future, we will study the effectiveness of our method under more complex conditions.

Author Contributions

Conceptualization, H.Z.; methodology, X.W. and J.W.; software, X.W. and C.M.; validation, X.W. and H.Z.; investigation, H.Z. and H.A.; writing—original draft preparation, X.W.; writing—review and editing, H.Z. and X.W.; funding acquisition, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shaanxi provincial fund 2023-YBGY-234.

Data Availability Statement

The data presented in this study are openly available in the SatSOT dataset at https://doi.org/10.1109/TGRS.2022.3140809.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Z.; Peng, T.; Liao, L.; Xiao, J.; Wang, M. SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
  2. Shangguan, D.; Chen, L.; Ding, J. A Digital Twin-Based Approach for the Fault Diagnosis and Health Monitoring of a Complex Satellite System. Symmetry 2020, 12, 1307. [Google Scholar] [CrossRef]
  3. Ghaban, W.; Ahmad, J.; Siddique, A.A.; Alshehri, M.S.; Saghir, A.; Saeed, F.; Ghaleb, B.; Rehman, M.U. Sustainable Environmental Monitoring: Multistage Fusion Algorithm for Remotely Sensed Underwater Super-Resolution Image Enhancement and Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 16, 3640–3653. [Google Scholar] [CrossRef]
  4. Liu, H.; Zhang, C.; Fan, B.; Xu, J. Pro2Diff: Proposal Propagation for Multi-Object Tracking via the Diffusion Model. IEEE Trans. Image Process. 2024, 33, 6508–6520. [Google Scholar] [CrossRef] [PubMed]
  5. Xuan, S.; Li, S.; Han, M.; Wan, X.; Xia, G.-S. Object tracking in satellite videos by improved correlation filters with motion estimation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1074–1086. [Google Scholar] [CrossRef]
  6. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [Google Scholar] [CrossRef] [PubMed]
  7. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
  8. Li, H.; Man, Y. Moving ship detection based on visual saliency for video satellite. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1248–1250. [Google Scholar] [CrossRef]
  9. Wu, J.; Zhang, G.; Wang, T.; Jiang, Y. Satellite Video Point-target Tracking in Combination with Motion Smoothness Constraint and Grayscale Feature. J. Geomat. 2017, 46, 1135–1146. [Google Scholar] [CrossRef]
  10. Zhang, C.; Wang, C.; Song, J.; Xu, Y. Based Satellite Video Object Tracking: A Review. Remote Sens. 2022, 14, 3674. [Google Scholar] [CrossRef]
  11. Chen, Y.; Tang, Y.; Xiao, Y.; Yuan, Q.; Zhang, Y.; Liu, F.; He, J.; Zhang, L. Satellite Video Single Object Tracking: A Systematic Review and An Oriented Object Tracking Benchmark. J. Photogramm. Remote Sens. 2024, 210, 212–240. [Google Scholar] [CrossRef]
  12. Du, B.; Sun, Y.; Cai, S.; Wu, C.; Du, Q. Object tracking in satellite videos by fusing the kernel correlation filter and the three-framedifference algorithm. IEEE Geosci. Remote Sens. Lett. 2018, 15, 168–172. [Google Scholar] [CrossRef]
  13. Possegger, H.; Mauthner, T.; Bischof, H. In defense of color-based model-free tracking. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 2113–2120. [Google Scholar]
  14. Zhou, H.; Yuan, Y.; Shi, C. Object tracking using sift features and mean shift. Comput. Vis. Image Underst. 2009, 113, 345–352. [Google Scholar] [CrossRef]
  15. Nummiaro, K.; Koller-Meier, E.; Van Gool, L. Object tracking with an adaptive color-based particle filter. In Pattern Recognition; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2449, pp. 353–360. [Google Scholar]
  16. Grabner, H.; Bischof, H. On-line boosting and vision. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; pp. 260–267. [Google Scholar]
  17. Grabner, H.; Leistner, C.; Bischof, H. Semi-supervised on-line boosting for robust tracking. In Computer Vision—ECCV 2008; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; pp. 234–247. [Google Scholar]
  18. Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M.M.; Hicks, S.L.; Torr, P.H. Struck: Structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 263–270. [Google Scholar] [CrossRef] [PubMed]
  19. Nam, H.; Han, B. Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4293–4302. [Google Scholar] [CrossRef]
  20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25; MIT Press: Cambridge, MA, USA, 2012; p. 2. [Google Scholar]
  21. Held, D.; Thrun, S.; Savarese, S. Learning to track at 100 fps with deep regression networks. In Proceedings of the Computer Vision ECCV, Amsterdam, The Netherlands, 8–16 October 2016; pp. 749–765. [Google Scholar]
  22. Li, M.; Zhang, H.; Wang, J.; Liu, Y. Research on ECO-HC Target Tracking Algorithm Based on Adaptive Template Update and Multi-Feature Fusion. IEEE Trans. Image Process. 2024, 23, 10758587. [Google Scholar] [CrossRef]
  23. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 2544–2550. [Google Scholar] [CrossRef]
  24. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 702–715. [Google Scholar]
  25. Kwon, J.; Lee, K. Visual Tracking Decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 1269–1276. [Google Scholar] [CrossRef]
  26. Deng, R.; Cui, C.; Remedios, L.W.; Bao, S.; Womick, R.M.; Chiron, S.; Li, J.; Roland, J.T.; Lau, K.S.; Liu, Q.; et al. Cross-scale Multi-instance Learning for Pathological Image Diagnosis. Med. Image Anal. 2024, 82, 103124. [Google Scholar] [CrossRef] [PubMed]
  27. Voigtlaender, P.; Luiten, J.; Torr, P.H.; Leibe, B. Siam R-CNN: Visual Tracking by Re-Detection. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6577–6587. [Google Scholar] [CrossRef]
Figure 1. Moving objects in high-time-resolution satellite video. (a) A multitude of small moving objects; (b) an unobscured vehicle, framed in green; (c) a partially obscured vehicle; and (d,e) KCF tracking results on a rotating aircraft, where the red box indicates the KCF tracking result and the blue box indicates the ground truth.
Figure 2. Pipeline of rotation-adaptive correlation filter tracking with motion estimations.
Figure 3. Visualization of tracking. When occlusion occurs, the peak value decreases, and when the peak value is low enough, the object is completely occluded. (a,b) target a vehicle and a ship, respectively.
Figure 4. RACFME’s precision plot at each threshold. The tracking objects in (a,b) are vehicles and ships, respectively. Due to their different scales and motion characteristics, we choose different occlusion detection thresholds: accuracy is highest at 0.3 for vehicles and 0.4 for ships.
Figure 5. Examples of images from the dataset. (a,d) are vehicle images; (b,e) are plane images; and (c,f) are ship images.
Figure 6. Precision plots over all the sequences.
Figure 7. Precision plots over rotating object sequences.
Figure 8. Precision plots over occluded object sequences.
Figure 9. Visualization of the tracking results. The number in the top-left corner of each image is the index of the current frame in the video.
Table 1. The shortcomings of the mainstream correlation filtering method KCF based on HOG, and of ECO.

| Reference | Method | Research Gap |
| --- | --- | --- |
| [7] | HOG | HOG features describe the object by calculating the gradient direction and magnitude against a fixed coordinate system, so they cannot automatically adapt to rotation. |
| [6] | KCF | The HOG-based KCF model has no rotation adaptation capability and no mechanism for handling object occlusion. |
| [22] | ECO | ECO improves on KCF in some respects, but still falls short in rotating target tracking and occlusion handling, and its computational complexity is high. |
Table 2. Results of object tracking on all sequences; bold numbers indicate the highest.

| Metric | RACFME | KCF [6] | ECO [22] | Siam R-CNN [27] | MEDIANFLOW [25] | MIL [26] | BOOSTING [17] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | **78.6%** | 57.4% | 59.1% | 72.7% | 10.2% | 52.2% | 49.6% |
| Precision score | **96.7%** | 73.4% | 77.8% | 90.5% | 8.2% | 42.1% | 47.6% |
| Success score | **95.0%** | 70.9% | 73.2% | 86.2% | 8.2% | 51.7% | 48.1% |
| FPS | 116 | **130** | 60 | 46 | 90 | 7 | 62 |
Table 3. Results of rotating object tracking; bold numbers indicate the highest.

| Metric | RACFME | KCF [6] | ECO [22] | Siam R-CNN [27] | MEDIANFLOW [25] | MIL [26] | BOOSTING [17] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | **72.8%** | 48.7% | 54.6% | 70.2% | 52.8% | 67.5% | 59.8% |
| Precision score | **89.1%** | 57.2% | 67.1% | 85.9% | 57.4% | 67.4% | 62.2% |
| Success score | **86.7%** | 50.1% | 61.3% | 83.6% | 50.7% | 82.6% | 74.5% |
| FPS | 114 | **127** | 51 | 44 | 104 | 8 | 65 |
Table 4. Results of occluded object tracking; bold numbers indicate the highest.

| Metric | RACFME | KCF [6] | ECO [22] | Siam R-CNN [27] | MEDIANFLOW [25] | MIL [26] | BOOSTING [17] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | **76.5%** | 37.4% | 42.8% | 52.7% | 53.0% | 24.0% | 44.9% |
| Precision score | **91.3%** | 47.2% | 54.8% | 64.1% | 55.7% | 41.1% | 47.7% |
| Success score | **92.0%** | 45.8% | 52.4% | 62.7% | 61.9% | 42.7% | 48.4% |
| FPS | 121 | **131** | 60 | 46 | 97 | 7 | 64 |