Article

Particle Filter Tracking System Based on Digital Zoom and Regional Image Measure

1 Beijing Key Laboratory for Precision Optoelectronic Measurement Instrument and Technology, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
2 Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing 314019, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(3), 880; https://doi.org/10.3390/s25030880
Submission received: 13 December 2024 / Revised: 17 January 2025 / Accepted: 23 January 2025 / Published: 31 January 2025
(This article belongs to the Section Sensing and Imaging)

Abstract: To address the challenges of low accuracy and the difficulty of balancing a large field of view and long distance when tracking high-speed moving targets with a single sensor, an ROI adaptive digital zoom tracking method is proposed. In this paper, we discuss the impact of the ROI on image processing and describe the design of the ROI adaptive digital zoom tracking system. Additionally, we construct an adaptive ROI update model based on normalized target information. To capture target changes effectively, we introduce a multi-scale regional measure and propose an improved particle filter algorithm, referred to as the improved multi-scale regional measure resampling particle filter (IMR-PF). This method maintains high temporal resolution within a high-resolution, wide field of view, which is particularly beneficial for high-resolution videos. Simulation results demonstrate that the improved target tracking method effectively improves tracking robustness to target motion changes and reduces the tracking center error by 20% compared with other state-of-the-art methods. The IMR-PF also maintains good performance when confronted with various interference factors and in real-world scenario applications.

1. Introduction

Single object detection and tracking (SODT) technology has important applications in video surveillance, target analysis, smart ice and snow sports, and other fields. The technology has garnered widespread attention from scholars and industries worldwide. However, achieving a balance between wide viewing angles and high resolution in machine vision has always been a challenge. Maintaining spatial resolution and temporal resolution over long distances remains a key hurdle in the field of machine vision. Capturing, processing, and transmitting images with higher spatial resolution requires more time, consequently reducing temporal resolution [1]. The accuracy of an algorithm can be significantly influenced by spatial resolution [2], with even minor variations in it leading to notable consequences [3]. Both high temporal and spatial resolutions are highly desirable for machine vision [4], as they can enhance the precision of object detection and tracking [5].
The current single-target tracking methods mainly comprise discriminant correlation filtering methods [6] and methods based on deep learning [7]. Discriminant correlation filtering methods typically adopt the framework of Kalman filtering [8], which demonstrates superior tracking performance for linear and Gaussian motion. However, most real-world motion is nonlinear. The particle filter (PF) is employed for nonlinear motion filtering and prediction, as it does not rely on linearity or Gaussian assumptions [9]. Nevertheless, the time complexity and particle degradation associated with PF can easily result in filtering processes falling into local optima, thus hindering the tracking accuracy of nonlinear motion and non-Gaussian targets [10]. Some researchers have introduced swarm intelligence to particle filtering to address the issue of particles getting stuck in local optima. The firefly particle filter algorithm reduces the time complexity and the local optimum phenomenon by combining the firefly algorithm and the particle filter algorithm [11]. Furthermore, deep learning-based methods improve tracking accuracy by matching target features using a data-driven approach [12]. However, when applied to high-resolution images, these methods often sacrifice temporal resolution.
The experimental verification and datasets used in these methods are typically limited to small size changes in videos or images and cannot be readily applied to tracking scenarios with a wide dynamic range of distance and resolution.
One study achieves target tracking within a large field of view by setting a region of interest (ROI) [13]. However, this method is limited to targets with small motion changes at close range and is not suitable for tracking and shooting high-speed moving targets in ice and snow sports.
In this paper, we propose a particle filter tracking system based on digital zoom and regional image measure. Our contributions are as follows:
(1)
We introduce a technique for adaptive digital zoom that dynamically adjusts the location and size of the region of interest (ROI), based on the target’s nonlinear normalized center distance and aspect ratio. This enables real-time detection and tracking within a broad field of view and high dynamic range. The spatial and temporal resolution of the target tracking process is enhanced.
(2)
In addition, we introduce a particle filter tracking algorithm that integrates multi-scale regional measures into the resampling process. By constructing a multi-scale regional measure feature module, the resampling is improved according to the target change state, and the extended Kalman filter (EKF) is applied to improve the importance density of the particle filter. The improved method removes the interference of motion mutations and improves the stability and accuracy of target tracking.
(3)
Furthermore, our method achieves impressive results on public benchmark test sets. It outperforms existing methods, including deep feature matching and trackers based on correlation filters, in terms of target tracking accuracy and tracking error. Our method also attains state-of-the-art performance for object tracking tasks in scenes with complex motion variations and large distance variations, thereby demonstrating the general applicability of our approach.
The rest of this paper is organized as follows: Section 2 gives an overview of related work; Section 3 explains in detail the proposed particle filter tracking system based on digital zoom and regional image measure; Section 4 describes the experimental setup and presents the results; Section 5 reports the ablation study; and finally, Section 6 concludes this paper.

2. Related Works

2.1. Application of ROI

The utilization of ROI is widely employed in various detection and tracking applications as a means of reducing noise interference in non-target areas. In the field of medical imaging, there are technologies such as remote photoplethysmography [14], detection of hand veins and handprints [15], along with blink monitoring for patients with ALS [16], all of which need to be performed only in high-quality diagnostic ROI areas [17]. For computing-intensive fields, aerial surveillance transmission of ROI images can substantially decrease the data volume collected by drones [18], and information reconstruction on the ROI can also greatly speed up processing [19]. Lane detection applications also benefit from processing smaller ROIs instead of the entire image, enabling real-time detection, which is particularly important for transportation warning systems [20]. Correlation filtering-based target tracking methods also adopt ROI technology: the required object is detected and captured, and an estimation filter is used to predict the ROI position in the subsequent frame. However, fixed ROIs often limit the applicability of these methods, and dynamic signal ranges often necessitate adaptively changing ROIs. The selection strategy for ROIs has also been a topic of discussion among researchers. Manual selection requires a stable signal state. Certain studies have employed partition coding on images, hidden Markov chains, and Gaussian mixture models to differentiate between the ROI and background unrelated to road traffic [21]. Kiadtikorn and Tatnall [22] and Ma et al. [23] separately implemented the histogram segmentation technique and a deep learning model enhanced with attention mechanisms to identify regions of interest (ROIs) within remote sensing data. However, the large field of view and high resolution make ROI extraction a more complex task. Therefore, we propose an ROI selection method that is adaptive to changes in the field of view, accomplished through the normalization of the target distance and aspect ratio of the detection results.

2.2. Target Tracking

Target tracking faces several challenges, including occlusion (OCC), deformation (DEF), background clutter (BC), illumination variation (IV), and motion blur (MB). Numerous methods have been proposed to address these problems.
For the tasks of predicting trajectories of high-speed maneuvering targets, filtering algorithms such as Kalman filtering (KF) are commonly utilized to process original information like the relative distance and angle of the target [24]. These algorithms are then combined with target motion models to enable accurate tracking of target trajectories. The presence of nonlinearity in the system can lead to a loss of accuracy, particularly in cases of high nonlinearity where the estimation results of the extended Kalman filter (EKF) may exhibit significant errors [25]. To mitigate these challenges, the SORT method integrates Kalman filtering with the Hungarian algorithm to enhance target motion estimation and data association accuracy [26]. Additionally, the ByteTrack algorithm was developed to account for both low-confidence and high-confidence detection boxes, employing different strategies based on the similarity of detection boxes to enhance tracking performance [27].
The advancement of deep learning models has facilitated the formulation of the Siam RPN++ algorithm, which leverages twin networks to match deep features of the target, thereby improving the accuracy of target tracking [12]. Furthermore, researchers have also improved the accuracy of target tracking by introducing feature combination modules [28] and different attention mechanisms [29]. Sumaira et al. introduced the SPT model [30], which incorporates a re-identification module specifically designed for pedestrian targets. By integrating metric learning, they enhanced the tracking performance of the Siamese model. However, the model’s reliance on shared weights for extracting similar features makes it vulnerable to distortions caused by scene variations. Addressing this limitation, Feng et al. developed a solution by integrating multiple attention mechanisms with background features [31], enabling the model to adaptively update and better handle diverse target changes in complex environments. The Transformer-based single object tracking (SOT) model stands out for its straightforward architecture and optimal balance between performance and speed. To tackle the issue of targets moving out of the field of view, Miao et al. implemented a backtracking recognition module alongside a trajectory disappearance discrimination model [32], also taking into account fine-grained low-level target features. Further advancing the field, Liang et al. proposed the Multi Local Guided Tracker (MLGT) [33], which enhances target modeling and representation through the fusion of multi-level and multi-stream features. Notably, to ensure real-time processing, the model optimizes computational efficiency by reducing the size of processed images and search areas.
Despite these advancements, the effectiveness of these methods is often compromised by several challenges, including limited search areas, the absence of template updates, and sample imbalance. These factors render the models susceptible to variations in the field of view and the deformation of small-sized targets.
Particle filtering (PF) leverages random samples to estimate the probability density function, and the integral operation is replaced by the sample average to obtain the target state with the minimum variance estimate [34]. PF is particularly effective in handling nonlinear and non-Gaussian target tracking challenges due to the randomness of particles. To address issues like particle weight degradation, researchers have explored the integration of swarm intelligence, including particle swarm optimization (PSO) [35] and firefly algorithms [36], for locating high-likelihood areas.
However, this type of optimization method may be attracted by local optimal values during the iterative search process, and the global search ability will be reduced. The kernel-correlated particle filter [37] technique incorporates the Histogram of Oriented Gradients (HOG) [38] feature for enhanced discrimination and has demonstrated effectiveness in tracking partially occluded and rotated targets, underscoring the importance of image features in target tracking.
When applied to scenarios with large fields of view, deep learning models often face limitations due to their low resolution, particularly when tracking small targets that undergo significant changes. These target variations further constrain the robustness and accuracy of traditional particle filtering improvement methods. To address these challenges, we propose an enhanced particle filtering algorithm that incorporates image change features measured within the target area. By leveraging a Kalman filter to approximate the particle importance density function, we construct a robust and high-precision tracking framework. This approach, when integrated with an adaptive variable region of interest (ROI) module, enables real-time processing of high-resolution images from large scenes while maintaining the robustness and accuracy of target tracking. The proposed method effectively balances computational efficiency and tracking performance, making it well-suited for applications requiring precise tracking in complex and dynamic environments.

3. Methodology

3.1. Adaptive Digital Zoom Tracking System

This study aims to address the challenge of accurate detection and tracking of targets with wide-ranging changes in distance and size. For instance, in scenarios like tracking skiers in large outdoor venues, it is crucial to take into account both small targets at long distances and large targets at close ranges. To address this issue, we propose a particle filter tracking method based on digital zoom and image area measurement, which forms the core of a novel object detection and tracking system capable of tracking targets of varying sizes and distances within a large field of view.
The overall workflow of the proposed method is illustrated in Figure 1. The system primarily consists of two key modules. First, the region of interest (ROI) Adaptive Digital Zoom module adjusts the ROI in the input image based on the detection and tracking results. This refined region is then fed into a universal object detection module to obtain target information. Second, the target details (including size and position) generated by the object detection module are passed to the object tracking module, which employs an improved multi-scale regional measure resampling particle filter (IMR-PF) to address potential target omission issues. To implement this method, we developed a hardware system, as shown on the right side of Figure 1. The system comprises a high-resolution camera and a pan-tilt (P/T) rotary turntable. The high-resolution camera captures wide field-of-view images, which are processed for adaptive ROI digital zoom, object detection, and variable particle filter tracking.
The rotating turntable is designed to synchronize with the movement of small targets, ensuring that it consistently points toward them. It serves as a platform for high-resolution cameras and other narrow field-of-view sensors, enabling the capture of higher-quality images and facilitating close-up imaging of small targets. To achieve this functionality, the target’s center position, calculated by the IMR-PF method, is converted into control signals. These signals drive the turntable, enabling real-time pan-tilt (P/T) rotation and precise target tracking. Additionally, the detection results generated during the target information calculation process are stored in the trajectory management submodule for further analysis and utilization.
To validate the effectiveness of the algorithm in practical applications, we selected a large skiing venue with a range of 0–170 m as one of the test scenarios. Based on the single-aperture imaging model, we chose a high-resolution, high-frame-rate, large field-of-view industrial camera (MS-XG903C/M camera produced by MINDVISION in Shenzhen, China) as the visual sensor. This setup ensures robust performance in capturing and tracking targets across varying distances and sizes in a dynamic environment. The single-aperture imaging model relates the imaged target height to the object height as
$$h_i = \frac{f}{d}\, h_o$$
where $f$ is the focal length of the visual sensor, $d$ is the object distance, $h_o$ is the object height, and $h_i$ is the image height. The camera's resolution is 4208 × 2160, its field of view is 40.2° × 30.6°, its focal length is 8 mm, and its imaging frame rate is 100 fps. We selected a 45× zoom variable focal length camera (Sony PXW-Z750, produced by Sony in Tokyo, Japan) for close-up shooting.
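As a quick illustration of this scaling relation, the following sketch estimates the imaged target height in pixels as a function of object distance. The camera parameters (8 mm focal length, 8.8 mm × 6.6 mm CMOS, 4208 × 2160 pixels) come from the system description; the 1.8 m skier height and the helper name `target_pixel_height` are our own assumptions for illustration only.

```python
# Minimal sketch of the single-aperture (pinhole) scaling model h_i = (f / d) * h_o.
# Pixel pitch is derived from the stated 8.8 mm sensor width and 4208-px image width.
SENSOR_WIDTH_MM, IMAGE_WIDTH_PX = 8.8, 4208
PIXEL_PITCH_MM = SENSOR_WIDTH_MM / IMAGE_WIDTH_PX   # ~2.09 um per pixel

def target_pixel_height(object_height_m, distance_m, focal_length_mm=8.0):
    """Imaged target height in pixels at a given object distance."""
    h_i_mm = (focal_length_mm / (distance_m * 1000.0)) * (object_height_m * 1000.0)
    return h_i_mm / PIXEL_PITCH_MM

# Example: a 1.8 m tall skier (assumed height) at 20 m and at 170 m.
for d in (20.0, 170.0):
    print(f"{d:5.0f} m -> ~{target_pixel_height(1.8, d):.0f} px")
```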

3.2. ROI Adaptive Digital Zoom Algorithm

In a scene with a large field of view, the detection resolution can impact the detection speed. If the ROI is either too large or too small, detecting the target becomes challenging, so it is crucial to choose a suitable ROI. The ROI adaptive digital zoom algorithm processes the original picture from the visual sensor and updates the ROI in real time based on the detection and tracking results. The updated ROI is then used to extract the corresponding region from the original-resolution picture, and the extracted image is digitally zoomed for target detection.
In the adaptive ROI algorithm, the image of the entire field of view (FOV) captured by the visual sensor is divided into m × n partitions, and each partition is subjected to cyclic detection using the detection algorithm (ROI ≤ FOV). When a target is detected, the algorithm generates an ROI based on the detection results and predicts the ROI for the subsequent frame using both detection and tracking results. In this method, m = 6 and n = 4 are selected for partitioning. During the process of target change, the region of interest (ROI) must be updated in advance to ensure the target is correctly included in the next frame. This pre-update is performed adaptively based on changes in target information and can be modeled as a regression relationship with the target’s characteristics.
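The following sketch illustrates the 6 × 4 partition cyclic detection used to acquire the initial target, as described above. The `detect` callable is a stand-in for the system's detector and is assumed to return `(x, y, w, h, confidence)` boxes in crop coordinates; the frame is assumed to be a NumPy image array. This is an illustrative reading of the procedure, not the authors' code.

```python
def partition_loop_detect(frame, detect, m=6, n=4):
    """Scan the full field of view in m x n partitions until a target is found."""
    H, W = frame.shape[:2]
    ph, pw = H // n, W // m
    for row in range(n):
        for col in range(m):
            x0, y0 = col * pw, row * ph
            crop = frame[y0:y0 + ph, x0:x0 + pw]
            boxes = detect(crop)
            if boxes:
                # Map the highest-confidence detection back to full-image coordinates.
                x, y, w, h, conf = max(boxes, key=lambda b: b[-1])
                return (x0 + x, y0 + y, w, h, conf)
    return None  # no target in this cycle; continue with the next frame
```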
The regression relationship has been proven to require careful consideration of factors such as the error distance and aspect ratio between the predicted target and the true target [39]. Therefore, the size-adaptive update of the region of interest (ROI) is represented by Equation (2), which ensures that the ROI adjusts proportionally to changes in the size factor (representing the width and height of the target) and the error distance factor. Specifically, when the error distance increases, the size of the ROI is expanded proportionally to ensure the target remains within the ROI area. The equation is expressed as follows:
$$ROI_{new} = S\left(ROI_{old}, w, h\right) + \beta\, D\left(ROI_{old}, x, y\right)$$
Among them, $ROI_{old}$ is the previous ROI, $(w, h)$ are the width and height of the target box, $(x, y)$ is the center position of the target box, $S(\cdot)$ is the update function based on the size factor, $D(\cdot)$ is the update function based on the position factor, and the weighting factor is $\beta = 0.4$.
The target size not only impacts the effectiveness of target detection and tracking but also influences the observer's subjective perception of the image display. In ROI updates, the size adjustment is determined by multiplying two ratios: the change in target height between consecutive frames, $h_k / h_{k-1}$, and the ratio of the target height in the previous frame, $h_{k-1}$, to the preset height, as shown in Equation (3). When the target height in the current frame exceeds that of the previous frame, the ROI size (without considering the aspect ratio and distance factors) should be increased. To mitigate these effects, our objective is to adjust the target height to approach a preset height of 360 pixels after ROI updates. To prevent the target from overflowing the boundaries excessively, we introduced a minimum of 20% redundant space ($\alpha$ is the size bias factor, and $\alpha = 0.2$ in this method).
$$ROI = \left(\frac{h_k}{h_{k-1}} \times \frac{h_{k-1}}{360} + \alpha\right) \times ROI_{old}$$
Inspired by CIoU [39], we note that updating the ROI size directly without considering aspect ratio factors makes the system susceptible to the influence of target rotation. When the target rotates, its motion dynamics change significantly, which may cause the target to rapidly move outside the ROI. To address this, we incorporated a multiplier for aspect ratio consistency into Equation (3). This multiplier expands the ROI when the target rotates, ensuring that the target remains within the detection area. Furthermore, a normalized centroid distance multiplier was added to Equation (3). When the predicted target center deviates significantly from the actual target center, the normalized centroid distance increases, prompting the ROI to expand further. This prevents the target from moving too quickly and overflowing the image boundaries, and it enhances the stability of the algorithm. The ROI update equation (Equation (4)), which incorporates improvements based on the position factor and size factor, is as follows:
$$\begin{aligned}
S &= \frac{4}{\pi^{2}}\left(\arctan\frac{w_{k-1}}{h_{k-1}} - \arctan\frac{w}{h}\right)^{2} \times \left(\frac{h_{k}}{h_{k-1}} \times \frac{h_{k-1}}{360} + \alpha\right) \times ROI_{old} \\
D &= \frac{\rho^{2}\left(b, b_{k-1}\right)}{c^{2}} \times \left(\frac{h_{k}}{h_{k-1}} \times \frac{h_{k-1}}{360} + \alpha\right) \times ROI_{old}
\end{aligned}$$
Among them, $b = [x, y]$ and $b_{k-1} = [x_{k-1}, y_{k-1}]$ are the center points of the target boxes in adjacent frames, $c$ is the diagonal length of the target box, and $\rho(\cdot)$ is the Euclidean distance.
During the target detection process, a detection failure delay threshold $t_{delay}$ is set to handle long-term target loss. When the target loss time exceeds $t_{delay}$, we enlarge the ROI to prevent further loss. In the tracking state, the ROI follows the movement of the target to ensure it remains centered within the ROI; consequently, the positional distribution of the target within the image resembles a Gaussian distribution. We therefore propose an ROI size adaptive update method based on time and position changes to adapt to the target loss situation, where the position change factor $\gamma$ is calculated from an improved Gaussian distribution. If the target deviates significantly from the center position in the current frame, the ROI scaling amount increases; conversely, if the target is stable, the ROI scaling amount is appropriately reduced. To ensure that the position change factor is 1 when the target's distance from the center is 0, the position change factor is calculated using Equation (5), which leverages the properties of the normalized Gaussian distribution (with a range of [0, 1]). The equation is defined as follows:
$$\gamma = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left\|b - b_{cen}\right\|^{2}}{2\sigma^{2}}\right)$$
Among them, $b_{cen} = [x_{cen}, y_{cen}]$ is the center point of the picture and $\sigma$ is the target deviation variance.
Therefore, when the target loss time exceeds the threshold, the ROI size is updated mainly according to the time factor and the improved Gaussian distribution; otherwise, it is updated according to the aspect ratio and distance factors. The general ROI size adaptive update equation (Equation (6)) is derived by simplifying and combining Equations (3) and (4), as follows:
$$ROI_{new} =
\begin{cases}
\dfrac{t_{loss}}{t_{delay} + 0.001} \times \dfrac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\dfrac{\left(x - \mu\right)^{2}}{2\sigma^{2}}\right) \times ROI_{old}, & t_{loss} > t_{delay} \\[2ex]
\left(\dfrac{\rho^{2}\left(b, b_{k-1}\right)}{c^{2}} + \alpha\,\dfrac{4}{\pi^{2}}\left(\arctan\dfrac{w_{k-1}}{h_{k-1}} - \arctan\dfrac{w}{h}\right)^{2}\right) \times \left(\dfrac{h_{k}}{h_{k-1}} \times \dfrac{h_{k-1}}{360} + \alpha\right) \times ROI_{old}, & \text{otherwise}
\end{cases}$$
In addition, we hope that the position of the ROI can also change adaptively with the detection results. At the same time, to enhance the stability of ROI position adjustments, this study employs a method that updates the ROI position through nonlinear normalization of the center point distance, leading to improved stability in target detection.
$$ROI_{pos} = \left(x, y\right) + \Delta\left(x, y\right) \cdot \mathrm{sigmoid}\left(\frac{\rho^{2}\left(b, b_{k-1}\right)}{c^{2}}\right)$$
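The sketch below condenses the ROI size and position update under our reading of Equations (3)–(6). The values $\alpha = 0.2$ and the 360-pixel preset height come from the text, while `SIGMA`, `T_DELAY`, the `(x, y, w, h)` center-based box convention, and the use of the Gaussian factor in its normalized form (so that it equals 1 at zero center deviation, as stated above) are our assumptions.

```python
import math

ALPHA, PRESET_H = 0.2, 360.0
SIGMA, T_DELAY = 200.0, 0.5      # assumed deviation variance and loss-delay threshold (s)

def size_scale(h, h_prev):
    """Shared size factor (h_k / h_{k-1}) * (h_{k-1} / 360) + alpha from Equation (3)."""
    return (h / h_prev) * (h_prev / PRESET_H) + ALPHA

def roi_update(roi, box, prev_box, img_center, t_loss=0.0):
    x, y, w, h = box
    xp, yp, wp, hp = prev_box
    _, _, rw, rh = roi
    c2 = w * w + h * h                              # squared box diagonal c^2
    rho2 = (x - xp) ** 2 + (y - yp) ** 2            # rho^2(b, b_{k-1})
    if t_loss > T_DELAY:
        # Target lost: scale with elapsed loss time and the Gaussian position factor,
        # normalized so the factor is 1 when the target sits at the image center.
        dc2 = (x - img_center[0]) ** 2 + (y - img_center[1]) ** 2
        gamma = math.exp(-dc2 / (2 * SIGMA ** 2))
        scale = (t_loss / (T_DELAY + 1e-3)) * gamma
    else:
        # Aspect-ratio consistency and normalized center-distance terms.
        v = (4 / math.pi ** 2) * (math.atan2(wp, hp) - math.atan2(w, h)) ** 2
        scale = (rho2 / c2 + ALPHA * v) * size_scale(h, hp)
    # Position update: new center = detected center + sigmoid-gated displacement.
    gate = 1.0 / (1.0 + math.exp(-rho2 / c2))
    return (x + (x - xp) * gate, y + (y - yp) * gate, rw * scale, rh * scale)
```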
The ROI adaptive digital zoom algorithm is implemented as Algorithm 1.
Algorithm 1 The ROI adaptive digital zoom algorithm
Input: x, y, w, h
Output: ROI
1:  Initialize(ROI, Result, Result_track);
2:  Result_init(x, y, w, h) = Partition_loop_detect(img);
3:  ROI_init = Init_ROI(img, Result_init);
4:  While True do:
5:      Result(x, y, w, h, c) = Detect(img ⊗ ROI);
6:      Result_track(x, y, w, h) = Track(Result_{k−1});
7:      If t_loss > t_delay then:
8:          ROI_new = Renew_lost_ROI(ROI_{k−1}, Result_track_{k−1});
9:      Else:
10:         ROI_new = Renew_ROI(ROI_{k−1}, Result_track_{k−1});
11:     End;
12:     ROI_pos = Pos_renew_ROI(ROI_pos_{k−1}, Result_track_{k−1});
13:     Return ROI_new, ROI_pos;
14:     If stop then:
15:         Break;
16: End;

3.3. Particle Filter Tracking Algorithm Based on Multi-Scale Regional Measure Resampling

To tackle the trajectory mutation issue in target tracking, this paper proposes an improved particle filter algorithm based on multi-scale regional measure resampling (IMR-PF). In IMR-PF, the target area image measure is first calculated, and then the particle filter based on the extended Kalman prediction is used. At the same time, the firefly algorithm, which leverages the image measure, is applied to assign weights to the particles, thus improving the precision of the target tracking.

3.3.1. Multi-Scale Regional Feature Module

The spatial distribution of the target is continuous during the motion process but may appear discontinuous due to motion mutations. These discontinuities can be used to identify changes in the target's motion pattern. In the IMR-PF algorithm, the target box and its surrounding region are considered when calculating the feature distribution. Previous work [40] has demonstrated that the maximum eight-neighborhood sub-maximum (MENS) can be used to filter out background interference. In IMR-PF, a set of MENS local intensity-weighted gradient filters is employed to extract target area features.
The MENS filter contains eight sub-filters, each corresponding to a neighborhood in one direction. Each sub-filter is applied to the original image through convolution, and the global maximum is then calculated. The filter template $f_{SF_l}$, shown in Figure 2, is a ring structure with a size of 3 × 3. By convolving it with the original image $I_O$, we obtain the sub-maximum filter map $m_{B_l}$. The process can be summarized as follows:
$$m_{B_l} = I_O \otimes f_{SF_l}$$
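A minimal sketch of this directional filtering step is given below. The exact 3 × 3 ring kernel layout is our assumption based on the description of Figure 2, and `scipy.ndimage.convolve` is used as a generic convolution routine; this is an illustration of the MENS sub-filter responses, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def mens_sub_filters():
    """Eight 3x3 ring kernels, each selecting one neighbour direction (assumed layout)."""
    kernels = []
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for dy, dx in offsets:
        k = np.zeros((3, 3), dtype=np.float32)
        k[1 + dy, 1 + dx] = 1.0
        kernels.append(k)
    return kernels

def mens_maps(image):
    """Return the eight sub-maximum filter maps m_{B_l} obtained by convolution."""
    img = image.astype(np.float32)
    return [convolve(img, k, mode="nearest") for k in mens_sub_filters()]
```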
In order to more accurately characterize the continuity of spatial distribution between frames, we propose a method to improve image entropy based on multi-scale regional measure. The improved image entropy calculation method is as follows:
$$\begin{aligned}
G_{j} &= \mathrm{Downsample}\left(G_{j-1}\right), \quad G_{0} = I_{O} \\
m_{G_{b_l}}^{j} &= G_{j} \otimes f_{SF_l} \\
m_{B_l}^{k} &= \mathrm{fuse\_or\_upsample}\left(m_{G_{b_l}}^{0}, m_{G_{b_l}}^{1}, \ldots, m_{G_{b_l}}^{j}\right) \\
d_{EB}\left(B_{l}^{k}, B_{l}^{k-1}\right) &= d\left(m_{B_l}^{k}, m_{B_l}^{k-1}\right), \quad l = 1, \ldots, 8 \\
d_{m_{B_l}} &= -\,d_{EB}\left(B_{l}^{k}, B_{l}^{k-1}\right)\sum_{h=1}^{m} p\left(I_{h}\right)\log_{2} p\left(I_{h}\right)
\end{aligned}$$
Among them, $m_{B_l}^{k}$ and $m_{B_l}^{k-1}$ are the sub-maximum filter feature maps of the corresponding regions in two adjacent frames, $m$ is the number of pixels with different grayscale values in the window, and $p(I_h)$ is the probability of occurrence of a pixel with grayscale value $I_h$.
Finally, the improved image entropy is normalized by its sum $SE_m$ over the eight directions to obtain the multi-scale target area measure $d_{m_{B_l}}^{*}$, which reflects the target mutation state.
$$SE_{m} = \sum_{l=1}^{8} d_{m_{B_l}}, \qquad d_{m_{B_l}}^{*} = \frac{d_{m_{B_l}}}{SE_{m}}$$
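The sketch below illustrates one possible reading of this regional measure, using the eight directional maps from the MENS sketch above. The pyramid fusion step is omitted for brevity, and the use of the mean absolute inter-frame difference for $d_{EB}$ and a 32-bin histogram for the entropy are our assumptions.

```python
import numpy as np

def region_entropy(patch, bins=32):
    """Shannon entropy of the grayscale distribution of a target-region patch."""
    hist, _ = np.histogram(patch, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def regional_measure(maps_k, maps_k1, patch):
    """Per-direction change measure and its normalised form over the 8 directions."""
    entropy = region_entropy(patch)
    d = np.array([np.mean(np.abs(mk - mk1)) * entropy        # assumed d_EB form
                  for mk, mk1 in zip(maps_k, maps_k1)])
    se = d.sum() + 1e-12                                      # SE_m
    return d / se, se                                         # normalised measure, sum
```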

3.3.2. Improved Firefly Algorithm Based on Multi-Scale Area Measure (IMFA)

The resampling process of the PF involves duplicating high-weighted particles while discarding low-weighted particles. Nevertheless, this can lead to a particle impoverishment issue in severely degraded particle sets. The firefly algorithm, employing swarm intelligence, was introduced as a solution to this problem. In the standard firefly algorithm, particles move toward high-brightness particles with a fixed step factor. Previous work [11] has verified the correlation between the firefly-enhanced PF algorithm and the particle likelihood. When the tracking state changes suddenly, particle optimization guided by the prior tends to increase the inertial error.
Furthermore, particle interactions significantly increase computational complexity. Therefore, an enhanced firefly algorithm is proposed to address weight degradation and particle scarcity issues through improved resampling and particle distribution.
At the same time, by incorporating target motion characteristics, the motion state reflected by the image measure is integrated to guide particle optimization. This enables dynamic adjustment of the particle group’s motion trend, ensuring that most particles gravitate towards high-likelihood regions during sudden motion changes, thereby augmenting the algorithm’s ability to handle unexpected motion scenarios. We define the relative intensity and relative distance of particles based on the latest measure. The particle with the highest intensity determines the direction of movement of other particles, while the attraction determines the distance of movement of particles. In the improved firefly algorithm, the maximum attraction is 0.85 and the light absorption coefficient is 1.
$$I_{new}^{i}\left(k\right) = I_{0} \times e^{-\eta\left\|z_{new} - z_{pred}^{i}\right\|}$$
Among them, $z_{new}$ is the latest measure value and $z_{pred}^{i}$ is the predicted observation value of particle $i$.
The attraction of particle i is improved by using multi-scale regional measure:
$$\beta_{dm}^{i}\left(k\right) = d_{m_{B_l}}^{*} \times \left(1 - \beta_{0} \times e^{-\gamma r_{i\,gbest}^{2}}\right)$$
where $r_{i\,gbest}$ is the distance between particle $i$ and the optimal estimated particle $x_k^{gbest}$. In the improved method, the more drastic the change in the target area, the stronger the attraction, which pulls the particles closer to the likelihood area.
In the particle position update, a fixed update step factor may cause the firefly to vibrate near the optimal value, resulting in a decrease in calculation accuracy. Therefore, we use a variable update step in the particle position update equation,
$$\begin{aligned}
x_{k,dm}^{i} &= x_{k}^{i} + \beta_{k,dm}^{i} \times \left(x_{k}^{gbest} - x_{k}^{i}\right) + d_{k}^{i} \times \alpha \times \left(\mathrm{rand} - \tfrac{1}{2}\right) \\
d_{k}^{i} &= \frac{\left|x_{k}^{gbest} - x_{k}^{i}\right|}{d_{\max}} \\
d_{\max} &= \max_{i}\left|x_{k}^{gbest} - x_{k}^{i}\right|, \quad i = 1, 2, \ldots, N
\end{aligned}$$
where $\alpha$ is a step factor parameter, and $\mathrm{rand}$ is a random factor in $[0, 1]$ that obeys a uniform distribution and represents a random perturbation in the firefly algorithm. $d_{\max}$ is the maximum distance between the best particle $x_k^{gbest}$ and the other particles.
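A vectorized sketch of this measure-guided firefly move is shown below. The values $\beta_0 = 0.85$ and $\gamma = 1$ follow the text ("maximum attraction is 0.85, light absorption coefficient is 1"); the exact grouping of the attraction term, the step factor value, and the scalar use of the regional measure `d_star` are our reading of the equations above, not a confirmed implementation.

```python
import numpy as np

def firefly_move(particles, x_gbest, d_star, alpha_step=0.3, beta0=0.85, gamma=1.0):
    """particles: (N, dim) array; x_gbest: (dim,) best particle; d_star: scalar measure."""
    rng = np.random.default_rng()
    diff = x_gbest - particles                            # direction toward the best particle
    r2 = np.sum(diff ** 2, axis=1, keepdims=True)         # squared distance r_{i,gbest}^2
    beta = d_star * (1.0 - beta0 * np.exp(-gamma * r2))   # measure-scaled attraction (our grouping)
    d_max = np.sqrt(r2).max() + 1e-12
    d_i = np.sqrt(r2) / d_max                             # normalised distance term
    jitter = alpha_step * (rng.random(particles.shape) - 0.5)
    return particles + beta * diff + d_i * jitter
```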

3.3.3. Improved Firefly Algorithm Optimized Particle Filter Based on Multi-Scale Regional Measure (IMR-PF)

The standard PF algorithm suffers from a degeneracy problem, where the weights become concentrated on only a few particles after several iterations. In addition to improving the resampling method, a suitable choice of importance density can alleviate this issue. Although the standard PF utilizes an easily implementable prior distribution $p(x_k \mid x_{k-1})$, it does not incorporate the latest observations, causing the filter to degenerate quickly. To mitigate these problems, this method employs the EKF to sub-optimally approximate the optimal importance density. The mean and variance of the $i$th particle are calculated using the EKF and the latest observation $z_k$ to approximate the optimal importance density, and these statistics are then used to sample and update the particles.
It is assumed that the state transition equation and measure equation of the system model are:
$$\begin{aligned}
x_{k} &= F\left(x_{k-1}, u_{k-1}\right) + w_{k-1} \\
z_{k} &= H\left(x_{k}\right) + v_{k}
\end{aligned}$$
where $x_k$ is the state vector and $z_k$ is the measurement at time $k$, $x_{k-1}$ is the state vector at time $k-1$, $F$ is the state transition function, and $H$ is the measurement function. $w_{k-1}$ and $v_k$ are the process and measurement noise, respectively; they are uncorrelated zero-mean Gaussian noises with covariance matrices $Q_{k-1}$ and $R_k$.
The extended Kalman filter process is,
$$\begin{aligned}
x_{k,pre}^{i} &= F\left(x_{k-1}^{i}\right) \\
P_{k,pre}\left(i\right) &= F_{k}\left(i\right) P_{k-1}\left(i\right) F_{k}^{T}\left(i\right) + Q_{k-1} \\
K_{k} &= P_{k,pre}\left(i\right) H_{k}^{T}\left(i\right)\left[H_{k}\left(i\right) P_{k,pre}\left(i\right) H_{k}^{T}\left(i\right) + R_{k}\right]^{-1} \\
\bar{x}_{k}\left(i\right) &= x_{k,pre}^{i} + K_{k}\left(z_{k} - H\left(x_{k,pre}^{i}\right)\right) \\
\hat{P}_{k}\left(i\right) &= \hat{P}_{k,pre}\left(i\right) - K_{k} H_{k}\left(i\right) P_{k,pre}\left(i\right)
\end{aligned}$$
Using the latest observation $z_k$, we calculate the mean and covariance of the estimated $i$th particle at step $k$ as,
$$\begin{aligned}
\bar{x}_{k}\left(i\right) &= x_{k,pre}^{i} + K_{k}\left(z_{k} - H\left(x_{k,pre}^{i}\right)\right) \\
\hat{P}_{k}\left(i\right) &= \hat{P}_{k,pre}\left(i\right) - K_{k} H_{k}\left(i\right) P_{k,pre}\left(i\right)
\end{aligned}$$
In the improved particle filter algorithm, the proposal density distribution of particle $i$ is generated by the extended Kalman filter, and the firefly algorithm improved with the image measure is then used for resampling:
$$x_{k}^{i} \sim q\left(x_{k}^{i} \mid x_{0:k-1}^{i}, z_{1:k}\right) = N\left(\bar{x}_{k}\left(i\right), \hat{P}_{k}\left(i\right)\right)$$
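The sketch below illustrates the EKF-based proposal for a single particle, following the steps above. The motion and measurement functions `f`, `h` and their Jacobians `F_jac`, `H_jac`, as well as `Q` and `R`, are assumed to be supplied by the system model; this is an illustrative reading of the procedure, not the authors' code.

```python
import numpy as np

def ekf_proposal(x_prev, P_prev, z, f, F_jac, h, H_jac, Q, R):
    """EKF prediction/update for one particle, then a draw from N(x_bar, P_hat)."""
    x_pre = f(x_prev)                                # state prediction
    F = F_jac(x_prev)
    P_pre = F @ P_prev @ F.T + Q                     # covariance prediction
    H = H_jac(x_pre)
    S = H @ P_pre @ H.T + R                          # innovation covariance
    K = P_pre @ H.T @ np.linalg.inv(S)               # Kalman gain
    x_bar = x_pre + K @ (z - h(x_pre))               # proposal mean
    P_hat = P_pre - K @ H @ P_pre                    # proposal covariance
    x_new = np.random.multivariate_normal(x_bar, P_hat)   # sample the new particle
    return x_new, x_bar, P_hat
```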
After obtaining the particle estimate $x_{k,dm}^{i}$ and the observation $z_k$ of frame $k$, the particle weight $\omega_k^{i}$ can be calculated from the transition probability density function $p\left(\bar{x}_k^{i} \mid \bar{x}_{k-1}^{i}\right)$, the importance function $q\left(\bar{x}_k^{i} \mid \bar{x}_{k-1}^{i}, y_{1:k}\right)$, and the likelihood probability density function $p\left(y_k \mid \bar{x}_k^{i}\right)$. For a multivariate normal distribution, the importance weight of the particle can be approximately replaced by the likelihood probability density, and the particle weight $\omega_k^{i}$ is given by,
$$\omega_{k}^{i} = \omega_{k-1}^{i}\,\frac{p\left(y_{k} \mid \bar{x}_{k}^{i}\right) p\left(\bar{x}_{k}^{i} \mid \bar{x}_{k-1}^{i}\right)}{q\left(\bar{x}_{k}^{i} \mid \bar{x}_{k-1}^{i}, y_{1:k}\right)} \approx \frac{1}{\left(2\pi\right)\left|R_{k}\right|^{1/2}}\exp\left(-\frac{\left(y_{k} - H\left(\hat{x}_{k}^{i}\right)\right)^{T} R_{k}^{-1}\left(y_{k} - H\left(\hat{x}_{k}^{i}\right)\right)}{2}\right)$$
The particle weights are normalized to more accurately approximate the state posterior probability density function. The normalized weight is,
$$\tilde{\omega}_{k}^{i} = \frac{\omega_{k}^{i}}{\sum_{i=1}^{N}\omega_{k}^{i}}$$
Therefore, the state of the target to be tracked is the weighted average of the particles, calculated as,
$$\hat{x}_{k} = \sum_{i=1}^{N}\tilde{\omega}_{k}^{i}\, x_{k,dm}^{i}$$
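The following sketch condenses the likelihood-based weight update, normalization, and weighted state estimate described above. The measurement function `h` and noise covariance `R` are assumed to come from the system model, and the standard multivariate Gaussian normalization constant is used.

```python
import numpy as np

def update_weights(particles, weights_prev, z, h, R):
    """Return normalised particle weights and the weighted state estimate."""
    Rinv = np.linalg.inv(R)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** len(z)) * np.linalg.det(R))
    w = np.empty(len(particles))
    for i, x in enumerate(particles):
        innov = z - h(x)                                   # measurement residual
        w[i] = weights_prev[i] * norm * np.exp(-0.5 * innov @ Rinv @ innov)
    w /= w.sum() + 1e-300                                  # normalised weights
    x_hat = np.average(particles, axis=0, weights=w)       # weighted state estimate
    return w, x_hat
```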

4. Experiments

4.1. Experimental Setup

All algorithms were implemented on a PC equipped with a GTX 1660 Ti GPU and an Intel (R) Core (TM) i7-10700 CPU @ 2.9 GHz, running on a Windows 10 operating system with the PyTorch 1.9.0 software environment. The GTX 1660 Ti GPU is produced by NVIDIA in Santa Clara, CA, USA. The Intel (R) Core (TM) i7-10700 CPU is produced by Intel in Santa Clara, CA, USA. The experimental system includes a two-dimensional rotating platform integrated with a high-resolution wide-field camera and a close-up camera. The wide-field camera used is the MS-XG903C/M, which captures images at a resolution of 4208 × 2160 and a frame rate of 100 fps. The images acquired by this camera serve as the primary information source for the system, with full-resolution images being processed by the proposed adaptive digital zoom algorithm and the IMR-PF algorithm. The close-up camera is utilized solely for the auxiliary verification of experimental results and detailed observation of targets, ensuring accurate validation of the tracking performance. When the hardware system was equipped with algorithms for detection and tracking, we used the pixel difference between the target position and the center of the image as the miss amount, which was used as the error of the PID control of the two-dimensional rotating platform. The two-dimensional rotating platform was driven to rotate in real time, while the large-field-of-view camera and the close-up camera also followed the target with the system. For the detection of long-distance fast-moving targets, we utilized the advanced deep learning-based YOLOv8 [41] target detection algorithm as the target detection base-method of our system, which was optimized for the GTX 1660 Ti through TensorRT. The confidence threshold was set according to the official recommendation at conf = 0.3.
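For illustration, a setup along these lines could look like the sketch below, assuming the Ultralytics YOLOv8 package and its TensorRT export path; the model file names and the ROI-cropping helper are placeholders, and only the conf = 0.3 threshold is taken from the text.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")               # placeholder weights file
model.export(format="engine")            # TensorRT engine for GPU inference
trt_model = YOLO("yolov8n.engine")

def detect_in_roi(frame, roi):
    """Run detection on the ROI crop and return (x_c, y_c, w, h) boxes in crop coordinates."""
    x0, y0, x1, y1 = roi
    results = trt_model.predict(frame[y0:y1, x0:x1], conf=0.3, verbose=False)
    return results[0].boxes.xywh.cpu().numpy()
```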
In the tracking experiments, our method was compared with other state-of-the-art methods. The IMR-PF algorithm was compared against four improved particle filter methods, KCF-PF [37], IFA-PF [11], FAPF [42], and PSO-PF [35], to evaluate its performance. Numerous deep learning models have been applied to object tracking, and we conducted comprehensive comparisons with several state-of-the-art methods. These include Siamese network trackers based on deep feature matching (Siam RPN++) [12], multi-local guided trackers (MLGT) [33], tracking models utilizing adaptive update strategies with multiple attention mechanisms and background features (FBST) [31], and Siamese object tracking models integrated with re-identification modules (SPT) [30]. Additionally, we demonstrated the superiority of IMR-PF over other tracking techniques by comparing it with SORT based on the Kalman filter [26], ByteTrack [27], oc-sort using momentum consistency and inverse Kalman filtering [43], and the EKF [25].
To evaluate the effect of the adaptive ROI long-range object detection and tracking algorithm, we also applied it to detect and track objects with significant scale changes.

4.2. Dataset

In the method validation experiments, we selected both public datasets and partially self-made datasets as testing sources. Among these, the VOT2021 dataset [44] includes 51 target tracking test sets, incorporating factors such as occlusion, camera movement, lighting changes, size variations, and target motion. Specifically, VOT2021 contains: over 30% of data with rapid motion (defined as target displacement exceeding 50% of the target size between frames); over 25% of data with small targets (defined as targets occupying 1% to 5% of the image area); and over 40% of data with target changes (defined as scale variations exceeding 2 times). To comprehensively quantify the robustness of our method to challenges such as target changes and rapid motion, we selected video segments from the VOT2021 dataset that exhibit significant size variations and include five interfering factors. The details of the selected video segments are provided in Table 1.
Additionally, to evaluate the effectiveness of our method and system in practical applications, we created a self-made dataset consisting of real videos capturing long-distance, fast-moving targets in U-shaped skiing competitions. This self-built dataset has a resolution of 4096 × 2160 and includes 5 video clips, each containing 340 frames. The target size in this dataset ranges from 30 × 30 to 510 × 510 pixels, and all the data feature the characteristics of target motion and target deformation.
By combining these datasets, we ensured a thorough evaluation of our method’s performance across diverse and challenging scenarios.

4.3. Evaluation Criteria

To quantitatively evaluate different methods, we used the central error (CE), overlap rate (OR), the value of AUC (area under the curve), and the success rate plot (SP) as evaluation metrics.
The CE is defined as,
$$CE\left(k\right) = \sqrt{\left(x_{T}^{k} - x_{G}^{k}\right)^{2} + \left(y_{T}^{k} - y_{G}^{k}\right)^{2}}$$
where $(x_G^k, y_G^k)$ represents the true center position of the target and $(x_T^k, y_T^k)$ represents the center position obtained by the tracker. A good tracker should accurately track the target position in real time, so the CE value of the best tracker is expected to be small.
The OR evaluates the quality of the tracker by the ratio of the intersection and union of the bounding box obtained by the tracker and the bounding box of the true position,
$$OR\left(k\right) = \frac{\mathrm{area}\left(R_{T}^{k} \cap R_{G}^{k}\right)}{\mathrm{area}\left(R_{T}^{k} \cup R_{G}^{k}\right)}$$
where $R_G^k$ represents the ground-truth bounding box and $R_T^k$ represents the tracker's bounding box in the $k$th frame.
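A small sketch of these two metrics is given below; boxes are assumed to be given as (x_min, y_min, w, h) and centers as (x, y) tuples.

```python
import numpy as np

def center_error(c_tracker, c_gt):
    """CE: Euclidean distance between the tracker centre and the ground-truth centre."""
    return float(np.hypot(c_tracker[0] - c_gt[0], c_tracker[1] - c_gt[1]))

def overlap_rate(box_t, box_g):
    """OR: intersection over union of the tracker box and the ground-truth box."""
    xt, yt, wt, ht = box_t
    xg, yg, wg, hg = box_g
    ix = max(0.0, min(xt + wt, xg + wg) - max(xt, xg))
    iy = max(0.0, min(yt + ht, yg + hg) - max(yt, yg))
    inter = ix * iy
    union = wt * ht + wg * hg - inter
    return inter / union if union > 0 else 0.0
```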
The SP represents the success rate curve corresponding to the correct tracking threshold for different overlap rates, and the value of AUC is the area value under the curve in the success rate graph. Precision reflects the performance of a tracking algorithm by measuring the distance between the estimated target position and the center of the ground truth at various thresholds.

4.4. Target Tracking Experiment on U-Skier Dataset

Our method aims to solve the problem of poor tracking performance due to changes in distance and target. We first conducted a comparative test on the self-built U-skier dataset. In the simulation experiment, we used the full-size U-skier video as the input source, detected and tracked the target with different tracking algorithms and the proposed algorithm, and compared and evaluated the tracking results of the algorithms.
Qualitative experiment: Figure 3 illustrates the tracking results of each tracker when the target changes dramatically. For the sake of clarity, we only show the results of the six methods with the best performance in Figure 3. As can be seen from the figure, the tracking results of the proposed IMR-PF are better than those of the comparison methods, and its target box is most consistent with the real target box. Secondly, ByteTrack also has a good effect. It can enhance the precision of target tracking by associating low-score detection boxes. However, the IFA-PF method predicts a larger target box, which may be because the particle filter in IFA-PF averages the filtering results, making the tracking box change smoother. The Siam RPN++ method obtains relatively poor results, which may be due to the image blur caused by target changes, which affects the discriminability of the model. Compared to the Siam RPN++ method, the MLGT approach exhibits smaller positional deviations in its prediction results. This improvement can likely be attributed to the incorporation of contextual information through self-attention and cross-attention mechanisms, which enhance the precision of target localization. However, when compared to the method proposed in this study, the MLGT's prediction bounding boxes are notably larger, suggesting that our approach achieves more precise target delineation while maintaining superior localization accuracy.
Quantitative experiment: For the tracking results of each method, Table 2 describes the average center error and the time required for tracking calculation. It can be seen that our method achieves the smallest error and a high success rate.
Figure 4 illustrates the success rate and precision of different methods. It is evident that KCF-PF and PSO-PF exhibit better results compared to the FAPF algorithm, indicating that the use of KCF and swarm intelligence optimization can improve the effect of PF. Similarly, the Siam RPN++ method performs better when the overlap rate threshold is small, indicating that the matching of similar image features can significantly improve the effect of the algorithm. However, as the overlap rate threshold increases, Siam RPN++ suffers from a decline in the feature matching effect, and its advantages diminish. Algorithms including Kalman filtering follow a similar trend. The ByteTrack and oc-sort methods demonstrate better robustness to state mutations of the target when the overlap rate threshold is low. The incorporation of contextual features allows MLGT to achieve a higher success rate at low overlap thresholds. The tracking success rate, measured by Intersection over Union (IoU) with a confidence interval of 0–50%, demonstrates that the proposed method improves the success rate by at least 5% compared to other leading methods. Deep learning-based methods generally exhibit higher success rates than non-deep learning methods, except for the proposed method. However, when the confidence interval exceeds 50%, deep learning-based methods, such as MLGT and FBST, show lower success rates compared to traditional methods. Although the success rate of the proposed method also decreases in this range, it still outperforms other methods, further proving that resolution is a critical factor influencing tracking success rates.
However, the precision curve reveals that this method underperforms at low position error thresholds, which restricts its applicability in large-scale scenarios where precise localization is critical. Our method combines Kalman filtering and particle filtering with image measure features to achieve better results regardless of the size of the threshold. The time-consumption comparison in the tracking error table shows that our method improves temporal resolution, which is a significant advantage in the case of high-resolution images. It can be observed that while traditional tracking methods such as KCF-PF and PSO-PF, along with recent deep learning-based approaches, demonstrate advantages in terms of success rates, their precision curves reveal relatively poor performance. This suggests that these methods are capable of accurately estimating the general region of the target's features but struggle to pinpoint the exact location, likely due to limitations in resolution. Such a trade-off highlights the challenges these methods face in achieving both robust target localization and high positional accuracy, particularly in scenarios where fine-grained precision is critical.
Real-time experiment: under identical hardware configurations and experimental conditions, a comparison of computational time consumption in Table 2 reveals that our method significantly improves time resolution, particularly demonstrating notable advantages when processing high-resolution images. Experiments indicate that non-deep learning methods generally incur lower computational costs, making them more suitable for real-time applications. In contrast, the computational cost of MLGT is higher than that of convolutional Siamese networks, primarily due to the extensive time required for cross-attention mechanisms to extract contextual information. Furthermore, compared to the earlier FAPF method, the improved IFA-PF approach shows a substantial reduction in computational consumption. This improvement underscores the effectiveness of optimizing resampling processes in enhancing algorithmic efficiency. By incorporating image measures to further refine resampling, we achieve additional efficiency gains, making our method both robust and computationally efficient for real-time tracking tasks.

4.5. Indoor Simulation Experiment on U-Skier Dataset

To verify the actual effectiveness of the detection and tracking system, we conducted a scene simulation experiment, as illustrated in Figure 5. The experiment utilized a 65-inch 4K resolution display to present the U-skier dataset video, simulating a skiing scene with a depth range L of 0–170 m. To further observe and analyze the target captured by the detection and tracking system, we employed a variable focal length video camera (with a focal length ranging from 25 mm to 375 mm) to zoom in and closely monitor the target, enabling more accurate determination of tracking errors.
Due to the close-up camera’s longer focal length compared to the wide-field camera, the simulated scene was set up with a minimum working distance of 4.5 m to ensure clear imaging. The display size for the simulation was 1440 mm × 810 mm, while the CMOS size of the wide-field camera was 8.8 mm × 6.6 mm. Based on the proportional scaling model described in Equation (1), the detected skier’s target size ranged from 18 × 18 to 301 × 301 pixels. During the detection and tracking process, we dynamically adjusted the focal length of the close-up camera based on the target’s pixel size to facilitate detailed observation of the target’s state.
For the actual tracking result processing, we manually aligned the close-up images of adjacent frames to ensure accuracy and consistency in the analysis. Given the real-time requirements of the tracking system, we focused on presenting the target trajectory results obtained by five different methods, all of which exhibit high real-time performance. These results are illustrated in Figure 6, providing a clear comparison of the tracking precision and robustness of each method under realistic conditions. This approach allows us to evaluate the effectiveness of the proposed system while maintaining the practical constraints of real-time operation.
Qualitative experiment: in order to compare the results intuitively, we show the trajectory tracking results of each tracker when the real motion trajectory of the target changes on the U-skier dataset in Figure 6. Figure 6a,c show that our method can accurately track the target when there are interferences from other human targets and changes in the target's motion, but SiamRPN++ has tracking errors due to changes in target features. When the target's motion changes dramatically, algorithms containing the Kalman filter experience position drift issues, as depicted in Figure 6c. It can also be seen from the picture that our method's target box is most consistent with the real target under significant target changes. The IFA-PF method, in contrast, exhibits an over-smoothed prediction problem. This may be because IFA-PF is greatly affected by historical results, leading to failure in predicting the target trajectory when changes occur. The oc-sort method achieves relatively good results, although there remains a significant error in predicting the target box.
Quantitative experiment: Figure 7 illustrates the target measure and frame-by-frame tracking errors of various methods during target tracking. When the target measure value shows a peak mutation, the errors of each method increase accordingly. Figure 7 shows that the EKF has the largest error, failing to promptly reduce it following an increase in target error. This failure may be attributed to tracking failure. Although the PSO-PF method enhances the tracking success rate compared with the FAPF method, it experiences substantial error increments upon target changes, likely due to the swarm optimization being attracted to local optimal values during iterations, limiting the global search ability. The IFA-PF algorithm demonstrates relatively low error, indicating that taking the optimal particle as the center can improve the global search ability after the tracking state mutation. The center error of oc-sort is further reduced. Reference [43] also verifies that alternating reverse checking the parameters of the Kalman filter can reduce the center error of tracking. Our method has the lowest target center error in indoor simulation experiments. Figure 7b shows that the image measure is associated with the target trajectory change state and reflects the target area change. This observation shows that our method is less affected by the change in the target motion state, demonstrating that introducing measure features helps to improve the performance of target tracking.

4.6. Experiments on the VOT2021 Dataset

To evaluate the performance and robustness of the proposed method, we conducted experiments on the VOT2021 dataset, a general target tracking dataset. By testing on datasets containing six interference factors (Motion Change, Camera Motion, Occlusion, Background Clutter, Size Change, and Illumination Change), we compared and evaluated this method against other tracking methods.
Qualitative experiments: in the case of long-term tracking under severe occlusion, our method demonstrated excellent performance. Compared with the Kalman-filter-based matching trackers ByteTrack and oc-sort, the image-feature discriminant tracker SiamRPN++ can better address the problem of tracking failure caused by long-term target loss. As illustrated in Figure 8b,e, when the target changes significantly, the image feature discriminant tracker performs poorly, and the method based on the latest observation center and trajectory matching is more robust. When other targets and background interference are present in the scene, the MLGT method exhibits smaller positional deviations, demonstrating that its use of contextual information effectively mitigates the impact of environmental changes. However, its performance diminishes when tracking small targets, as highlighted by the results from the U-skier dataset. This limitation underscores the influence of resolution as a constraining factor for deep learning-based methods. In contrast, our method enhances algorithmic stability by integrating contextual information into the calculation of target domain image measures. This approach not only improves robustness in complex environments but also addresses the challenges posed by small targets, making it more versatile and effective across a wider range of tracking scenarios.
Quantitative Experiments: Figure 9 further illustrates that the tracking success rate of methods relying solely on the target’s motion state is significantly lower in scenarios where the target is lost for an extended period. This limitation highlights the challenges of maintaining robust tracking when relying exclusively on motion-based information, particularly in cases of prolonged target occlusion or disappearance. This further confirms that appearance similarity is useful for long distances. In Figure 8c, it can be observed that the various methods have very similar results when the target moves smoothly and the changes are not obvious. However, when motion blur and fast motion are present, as shown in Figure 9c,h, many trackers fail due to the limited tracking range, especially when sudden changes occur in the motion trajectory. Consequently, our method enhances the particle filter method by introducing a regional change feature measure. The interference of objects in the datasets corresponding to Figure 9b,e affects the tracking effect. The particle filter method reduces the influence of the interference target by sampling a large number of particles. Figure 6 and Figure 8 show that our method has good performance for both large and small targets.
Overall, when the confidence interval of Intersection over Union (IoU) is between 0 and 50%, as shown in Figure 9a,e and Table 3, our method demonstrates a significant advantage in success rate for datasets with a high proportion of target changes and small target movements. When the confidence interval exceeds 50%, although our method’s success rate converges toward that of other advanced methods as the error range decreases, it still maintains an advantage on datasets with a higher proportion of small targets. This further indicates that our method is robust across different confidence intervals and performs well in diverse and challenging tracking scenarios.
By analyzing the precision plot for different interference factors in Figure 10, it is evident that lighting changes have a minimal impact on the tracking methods. In contrast, EKF and FAPF exhibit relatively large center position prediction errors across various interference factors, likely due to their overly simplistic models. When occlusion occurs, improved particle filtering methods and those incorporating Kalman filtering mechanisms achieve higher accuracy. This improvement can be attributed to the trajectory continuity discrimination, which reduces the impact of occlusion on target detection and tracking. Figure 10c further demonstrates that deep learning models enhanced with contextual information exhibit greater stability under background clutter interference. However, in scenarios involving target motion and size variations, our method achieves the highest accuracy. When combined with the accuracy results for other interference factors, our approach consistently outperforms others, underscoring its robustness and adaptability in diverse and challenging tracking environments. This comprehensive performance highlights the effectiveness of our method in maintaining precision and stability across a wide range of conditions.

5. Ablation Study

To verify the effectiveness of the different components designed in our method, we conducted ablation experiments on the U-skier dataset with the same settings as Section 4.1. Specifically, we compared the two improved submodules:
(1)
Adaptive digital zoom structure: this structure is capable of handling both small targets at a distance and large targets at a close distance by adaptively cropping and scaling the visual sensor image. It ensures a uniform target size in the input detection network. High-resolution and large-field-of-view images are time-consuming for target detection. This structure can reduce unnecessary interference while reducing the amount of algorithm calculation, so that the detection and tracking algorithms can focus on effective target features.
(2)
Multi-scale feature measure of target region: this module enhances the resampling rule of the particle update and responds strongly to significant target changes. By introducing abrupt target-motion and target-change information into the tracking algorithm, it helps improve tracking performance (a toy weight-modulation sketch is given at the end of this ablation discussion).
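To make the adaptive digital zoom structure of submodule (1) concrete, the sketch below shows one plausible way to crop an ROI proportional to the previous target size and rescale it to a fixed network input. This is an illustrative reconstruction under our own assumptions (the frame/center/size variables, the margin factor, and OpenCV resizing), not the paper’s exact ROI update model.

```python
import cv2
import numpy as np

def adaptive_zoom(frame, center, target_size, out_size=640, margin=3.0):
    """Crop an ROI proportional to the target size and rescale it to a fixed input size."""
    h, w = frame.shape[:2]
    roi_side = int(max(target_size) * margin)          # ROI grows/shrinks with the target
    x1 = int(np.clip(center[0] - roi_side // 2, 0, w - 1))
    y1 = int(np.clip(center[1] - roi_side // 2, 0, h - 1))
    x2 = int(np.clip(x1 + roi_side, 1, w))
    y2 = int(np.clip(y1 + roi_side, 1, h))
    roi = frame[y1:y2, x1:x2]
    scale = out_size / max(roi.shape[:2])              # keep aspect ratio while resizing
    roi = cv2.resize(roi, (int(roi.shape[1] * scale), int(roi.shape[0] * scale)))
    return roi, (x1, y1, scale)                        # offset + scale map detections back to the full frame
```

Detections produced on the zoomed ROI can then be mapped back to full-frame coordinates using only the returned offset and scale.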
To validate the effectiveness of the two submodules, we conducted a comparative analysis of the improved method, evaluating the impact of including each submodule separately and both submodules simultaneously. Additionally, to assess the influence of the object detection base model on the enhanced tracking module, we selected YOLOv8 [41] and YOLOv5 [45] as base models for comparison, as both are widely recognized for their strong performance in object detection tasks. For the object detection model, we used the officially recommended confidence threshold, conf = 0.3.
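As a reference for how such a detection backbone is typically invoked, the snippet below uses the Ultralytics Python API with the conf = 0.3 threshold mentioned above; the checkpoint file and image path are placeholders, and this is not necessarily the integration code used in our system.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                          # or a YOLOv5 checkpoint for the second baseline
results = model.predict("roi_frame.jpg", conf=0.3)  # confidence threshold used in the ablation
boxes = results[0].boxes.xywh                       # candidate boxes handed to the tracking module
```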
As illustrated in Figure 11, the adaptive digital zoom structure, primarily designed to enhance the real-time performance of high-resolution and large field-of-view detection, also contributes to improved target tracking accuracy. Furthermore, the target area feature measurement module significantly enhances the overall performance of the algorithm, confirming the necessity of our proposed model improvements. By comparing different base models, we observed that our proposed modules consistently improve tracking accuracy across various detection models, demonstrating the broad applicability of our method.
From the center error curve, it is evident that introducing the adaptive zoom structure uniformly reduces target tracking errors. Moreover, incorporating the measurement features effectively mitigates errors caused by target mutations without significantly increasing computational time. These results underscore the robustness and efficiency of our approach in achieving high-precision tracking while maintaining real-time performance. A toy sketch of folding such a regional measure into the resampling weights is given below.
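The following sketch illustrates, in simplified form, how a regional-change measure can modulate particle resampling weights (the idea behind submodule (2)). The exponential weighting form, the alpha parameter, and all names are our assumptions rather than the exact IMR-PF formulation; `measure` is assumed to be larger when a candidate region differs strongly from the reference template.

```python
import numpy as np

def resample_with_measure(particles, likelihood, measure, alpha=0.5, rng=None):
    """Systematic resampling with weights modulated by a regional-change measure."""
    rng = rng or np.random.default_rng()
    # Penalize candidates whose regional measure signals a large appearance change.
    weights = likelihood * np.exp(-alpha * measure)
    weights /= weights.sum()
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n            # systematic resampling grid
    indexes = np.searchsorted(np.cumsum(weights), positions)
    indexes = np.minimum(indexes, n - 1)                     # guard against floating-point overflow of the CDF
    return particles[indexes]
```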

6. Conclusions

In general, we present a novel improved particle filter algorithm, IMR-PF, for visual tracking. The diverse and accurate distribution of particles is degraded by the standard resampling process of the particle filter. To address this issue, we combine the PF with a regional image measure and a KF, which enhances the overall accuracy of the particle filter and improves its performance when dealing with target mutations. In addition, we employ an adaptive zoom algorithm to handle long distances and target size variation, which reduces interference for the PF tracking algorithm and allows particles to concentrate in high-likelihood areas. The proposed IMR-PF is validated on the VOT2021 dataset and our custom U-skier dataset, demonstrating its robustness in a variety of challenging situations. Simulation results show that the ROI adaptive digital zoom tracking system delivers remarkable tracking performance. We also observed that, compared with deep learning models, our method’s success rate is affected more strongly at high overlap thresholds. To address this limited accuracy at high thresholds, we plan to integrate contextual deep feature extraction with the current object tracking module in future work. We also plan to draw on the advantages of combining shallow features with deep information to solve target tracking problems in more complex environments.
Furthermore, we will explore the multi-target tracking problem based on the proposed IMR-PF framework and investigate its applicability in multi-viewpoint object tracking scenarios. This expansion will not only broaden the scope of our method but also provide valuable insights into addressing more challenging real-world tracking tasks.

Author Contributions

Q.Z.: Conceptualization, Methodology, Software, Writing—original draft, Writing—review and editing. L.D.: Resources, Funding acquisition, Methodology, Formal analysis. X.C.: Visualization, Validation, Writing—review & editing. M.L.: Methodology, Writing—review and editing. L.K.: Conceptualization, Writing—review and editing. Y.Z.: Supervision, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China under Grant No. U1804261, by the National Key R&D Program of China under Grants 2020YFB20019003 and 2020YFB2009304, and in part by the JCJQ Program under Grant 2019-JCJQ-ZD254.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are not openly available due to reasons of sensitivity and are available from the author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, J.; Zhang, C.; Shum, H.-Y. Face Image Resolution versus Face Recognition Performance Based on Two Global Methods. In Proceedings of the Asia Conference on Computer Vision, ACCV 2004, Jeju, Republic of Korea, 27–30 January 2004; Volume 47, pp. 48–49. [Google Scholar]
  2. Koziarski, M.; Cyganek, B. Impact of Low Resolution on Image Recognitionwith Deep Neural Networks: An Experimental Study. Int. J. Appl. Math. Comput. Sci. 2018, 28, 735–744. [Google Scholar] [CrossRef]
  3. Song, R.; Zhang, S.; Cheng, J.; Li, C.; Chen, X. New Insights on Super-High Resolution for Video-Based Heart Rate Estimation with a Semi-Blind Source Separation Method. Comput. Biol. Med. 2020, 116, 103535. [Google Scholar] [CrossRef] [PubMed]
  4. Korshunov, P.; Ooi, W.T. Critical Video Quality for Distributed Automated Video Surveillance. In Proceedings of the 13th Annual ACM International Conference on Multimedia, Singapore, 6 November 2005; pp. 151–160. [Google Scholar]
  5. Handa, A.; Newcombe, R.A.; Angeli, A.; Davison, A.J. Real-Time Camera Tracking: When Is High Frame-Rate Best? In Computer Vision—ECCV 2012, Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7578, pp. 222–235. ISBN 978-3-642-33785-7. [Google Scholar]
  6. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596. [Google Scholar] [CrossRef] [PubMed]
  7. Song, Y.; Ma, C.; Wu, X.; Gong, L.; Bao, L.; Zuo, W.; Shen, C.; Lau, R.W.; Yang, M.-H. VITAL: Visual Tracking via Adversarial Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8990–8999. [Google Scholar]
  8. Liu, S.; Wei, G.; Song, Y.; Liu, Y. Extended Kalman Filtering for Stochastic Nonlinear Systems with Randomly Occurring Cyber Attacks. Neurocomputing 2016, 207, 708–716. [Google Scholar] [CrossRef]
  9. Arulampalam, M.S.; Maskell, S.; Gordon, N.; Clapp, T. A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. IEEE Trans. Signal Process. 2002, 50, 174–188. [Google Scholar] [CrossRef]
  10. Doucet, A.; Godsill, S.; Andrieu, C. On Sequential Monte Carlo Sampling Methods for Bayesian Filtering. Stat. Comput. 2000, 10, 197–208. [Google Scholar] [CrossRef]
  11. Tian, M.; Bo, Y.; Chen, Z.; Wu, P.; Yue, C. Multi-Target Tracking Method Based on Improved Firefly Algorithm Optimized Particle Filter. Neurocomputing 2019, 359, 438–448. [Google Scholar] [CrossRef]
  12. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4277–4286. [Google Scholar]
  13. Chen, J.; Huang, H.-W.; Rupp, P.; Sinha, A.; Ehmke, C.; Traverso, G. Closed-Loop Region of Interest Enabling High Spatial and Temporal Resolutions in Object Detection and Tracking via Wireless Camera. IEEE Access 2021, 9, 87340–87350. [Google Scholar] [CrossRef]
  14. Feng, L.; Po, L.-M.; Xu, X.; Li, Y.; Cheung, C.-H.; Cheung, K.-W.; Yuan, F. Dynamic ROI Based on K-Means for Remote Photoplethysmography. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; pp. 1310–1314. [Google Scholar]
  15. Kalluri, H.K.; Prasad, M.V.N.K.; Agarwal, A. Dynamic ROI Extraction Algorithm for Palmprints. In Advances in Swarm Intelligence, Proceedings of the Third International Conference, ICSI 2012, Shenzhen, China, 17–20 June 2012; Tan, Y., Shi, Y., Ji, Z., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 217–227. [Google Scholar]
  16. Yano, K.; Ishihara, K.; Makikawa, M.; Kusuoka, H. Detection of Eye Blinking from Video Camera with Dynamic ROI Fixation. In Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No. 99CH37028), Tokyo, Japan, 12–15 October 1999; Volume 6, pp. 335–339. [Google Scholar]
  17. Rahmiati, P.; Fajri, A.; Handayani, A.; Mengko, T.L.R.; Suksmono, A.B.; Pramudito, J.T. Distributed System for Medical Image Transfer Using Wavelet-Based Dynamic RoI Coding. In Proceedings of the 7th International Workshop on Enterprise networking and Computing in Healthcare Industry, 2005 (HEALTHCOM 2005), Busan, Republic of Korea, 23–25 June 2005; pp. 191–196. [Google Scholar]
  18. Meuel, H.; Kluger, F.; Ostermann, J. Region of Interest (ROI) Coding for Aerial Surveillance Video Using AVC & HEVC. arXiv 2018. [Google Scholar] [CrossRef]
  19. Gao, J.; Jin, Z. A Region of Interest Prediction Method for Real-Time Online Dynamic Magnetic Resonance Imaging. In Proceedings of the 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 14–16 October 2017; pp. 1–5. [Google Scholar]
  20. Tian, J.; Wang, Z.; Zhu, Q. An Improved Lane Boundaries Detection Based on Dynamic ROI. In Proceedings of the 2017 IEEE 9th International Conference on Communication Software and Networks (ICCSN), Guangzhou, China, 6–8 May 2017; pp. 1212–1217. [Google Scholar]
  21. Dynamic Region of Interest Extract Method for JPEG2000 Coding. IEEE Conference Publication. Available online: https://ieeexplore.ieee.org/document/5544812 (accessed on 1 September 2024).
  22. Kiadtikornthaweeyot, W.; Tatnall, A.R.L. Region of interest detection based on histogram segmentation for satellite image. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, XLI-B7, 249–255. [Google Scholar] [CrossRef]
  23. Ma, J.; Zhang, L.; Sun, Y. ROI Extraction Based on Multiview Learning and Attention Mechanism for Unbalanced Remote Sensing Data Set. IEEE Trans. Geosci. Remote Sens. 2020, 58, 6210–6223. [Google Scholar] [CrossRef]
  24. Khodarahmi, M.; Maihami, V. A Review on Kalman Filter Models. Arch. Comput. Methods Eng. 2023, 30, 727–747. [Google Scholar] [CrossRef]
  25. Einicke, G.A.; White, L.B. Robust Extended Kalman Filtering. IEEE Trans. Signal Process. 1999, 47, 2596–2599. [Google Scholar] [CrossRef]
  26. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
  27. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Computer Vision—ECCV 2022, Proceedings of the 7th European Conference, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; pp. 1–21. [Google Scholar]
  28. Li, G.; Chen, X.; Li, M.; Li, W.; Li, S.; Guo, G.; Wang, H.; Deng, H. One-Shot Multi-Object Tracking Using CNN-Based Networks with Spatial-Channel Attention Mechanism. Opt. Laser Technol. 2022, 153, 108267. [Google Scholar] [CrossRef]
  29. An, D.; Zhang, F.; Zhao, Y.; Luo, B.; Yang, C.; Chen, B.; Yu, L. MTAtrack: Multilevel Transformer Attention for Visual Tracking. Opt. Laser Technol. 2023, 166, 109659. [Google Scholar] [CrossRef]
  30. Manzoor, S.; An, Y.-C.; In, G.-G.; Zhang, Y.; Kim, S.; Kuc, T.-Y. SPT: Single Pedestrian Tracking Framework with Re-Identification-Based Learning Using the Siamese Model. Sensors 2023, 23, 4906. [Google Scholar] [CrossRef]
  31. Feng, W.; Meng, F.; Yu, C.; You, A. Fusion of Multiple Attention Mechanisms and Background Feature Adaptive Update Strategies in Siamese Networks for Single-Object Tracking. Appl. Sci. 2024, 14, 8199. [Google Scholar] [CrossRef]
  32. Miao, B.; Chen, Z.; Liu, H.; Zhang, A. A Target Re-Identification Method Based on Shot Boundary Object Detection for Single Object Tracking. Appl. Sci. 2023, 13, 6422. [Google Scholar] [CrossRef]
  33. Liang, X.; Chen, M.; Liu, E. MLGT: Multi-Local Guided Tracker for Visual Object Tracking. J. Real-Time Image Process. 2024, 21, 54. [Google Scholar] [CrossRef]
  34. Djuric, P.M.; Kotecha, J.H.; Zhang, J.; Huang, Y.; Ghirmai, T.; Bugallo, M.F.; Miguez, J. Particle Filtering. IEEE Signal Process. Mag. 2003, 20, 19–38. [Google Scholar] [CrossRef]
  35. Zhao, J.; Li, Z. Particle Filter Based on Particle Swarm Optimization Resampling for Vision Tracking. Expert Syst. Appl. 2010, 37, 8910–8914. [Google Scholar] [CrossRef]
  36. Yang, X.-S. Firefly Algorithms for Multimodal Optimization. In Stochastic Algorithms: Foundations and Applications, Proceedings of the 5th International Symposium, SAGA 2009, Sapporo, Japan, 26–28 October 2009; Watanabe, O., Zeugmann, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 169–178. [Google Scholar]
  37. Zhao, Z.; Feng, P.; Guo, J.; Yuan, C.; Wang, T.; Liu, F.; Zhao, Z.; Cui, Z.; Wu, B. A Hybrid Tracking Framework Based on Kernel Correlation Filtering and Particle Filtering. Neurocomputing 2018, 297, 40–49. [Google Scholar] [CrossRef]
  38. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645. [Google Scholar] [CrossRef] [PubMed]
  39. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2022, 52, 8574–8586. [Google Scholar] [CrossRef]
  40. Qin, Y.; Bruzzone, L.; Gao, C.; Li, B. Infrared Small Target Detection Based on Facet Kernel and Random Walker. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7104–7118. [Google Scholar] [CrossRef]
  41. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 22 January 2025).
  42. Gao, M.-L.; Li, L.-L.; Sun, X.-M.; Yin, L.-J.; Li, H.-T.; Luo, D.-S. Firefly Algorithm (FA) Based Particle Filter Method for Visual Tracking. Optik 2015, 126, 1705–1711. [Google Scholar] [CrossRef]
  43. Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9686–9696. [Google Scholar]
  44. Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Pflugfelder, R.; Kämäräinen, J.-K.; Chang, H.J.; Danelljan, M.; Cehovin, L.; Lukežič, A.; et al. The Ninth Visual Object Tracking VOT2021 Challenge Results. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 2711–2738. [Google Scholar]
  45. Jocher, G. Ultralytics YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 22 January 2025).
Figure 1. The overall structure of the proposed method.
Figure 2. MENS filter diagram.
Figure 3. Tracking results of the top six methods on the U-skier dataset (different video clips from top to bottom).
Figure 4. Evaluation results on the U-skier dataset. (a) Success rate of different methods on the U-skier dataset; (b) precision of different methods on the U-skier dataset.
Figure 5. System schematic: the large-FOV camera captures object images, the PC processor performs target detection and tracking on the captured images, and the tracking results drive the pan-tilt platform to rotate. The video camera is a zoom camera used to capture close-up images of the target and to evaluate the results. L is the depth range of the entire system.
Figure 6. Tracking results of the top five tracking methods on the U-skier dataset in an indoor simulation environment. (The bottom row shows the tracking results on a high-resolution image with a large field of view; the top row shows the close-up camera images captured at each tracking moment. The trajectory curves were formed by manually registering consecutive frames and connecting the predicted center positions of the different methods. Column (a) depicts the athlete leaping in the distance, with interference from similar targets in the background; column (b) shows the athlete descending and undergoing a sudden trajectory change at the ski slope; column (c) shows the athlete leaping into the air at close range; column (d) shows the athlete descending at close range and undergoing a sudden trajectory change at the ski slope.)
Figure 7. Tracking results on the U-skier dataset in an indoor simulation environment. (a) Ground truth of the target trajectory; (b) changes in the target trajectory and target area measure; (c,d) tracking error curves of different methods as the video frame changes.
Figure 8. Results of the top six methods for tracking different data segments of the VOT2021 dataset. Success rate on: (a) graduate_set; (b) matrix_set; (c) pedestrian_set; (d) road_set; (e) shaking_set.
Figure 9. Success rate of different methods on the different sub-datasets of the VOT2021 dataset: (a) graduate_set; (b) matrix_set; (c) soccer_set; (d) nature_set; (e) road_set; (f) racing_set; (g) pedestrian_set; (h) shaking_set.
Figure 10. Precision of different methods on data with different properties in the VOT2021 dataset: (a) size variation; (b) occlusion; (c) background clutter; (d) camera movement; (e) fast target motion; (f) illumination variation.
Figure 11. Tracking error and success rate curves of different module-improvement methods on the U-skier dataset. (a) Center error of methods with different structures on the U-skier sequence; (b) success rate of methods with different structures on the U-skier sequence.
Table 1. The tracking example of VOT2021 attributes.
Dataset | Frame Numbers | Challenging Factors
Graduate | 844 | Low Resolution, Motion Change, Camera Motion, Illumination Change, Size Change, Occlusion
Matrix | 100 | Occlusion, Camera Motion, Illumination Change, Motion Change, Scale Variation
Nature | 999 | Size Change, Camera Motion, Occlusion, Illumination Change, Motion Change
Racing | 156 | Key Frame, Camera Motion, Illumination Change, Motion Change, Occlusion, Size Change
Road | 558 | Out-of-Plane Rotation, Background Clutters, Deformation, Fast Motion, Size Change, Illumination Change, Occlusion, Camera Motion
Shaking | 365 | Motion Change, Camera Motion, Occlusion, Size Change, Illumination Change
Soccer | 1392 | Camera Motion, Illumination Change, Motion Change, Occlusion, Scale Variation
Pedestrian | 1140 | Size Change, Camera Motion, Illumination Change, Motion Change, Occlusion
Table 2. Average center error (ACE) on the U-skier dataset and time cost for each method.
Method | ACE | Time cost (ms)
EKF | 53.04 | 15.18
FAPF | 39.25 | 43.67
PSO-PF | 35.37 | 21.38
SORT | 22.42 | 27.13
KCF-PF | 41.72 | 12.72
SiamRPN++ | 17.40 | 36.31
IFA-PF | 16.48 | 10.42
Bytetrack | 24.98 | 16.83
oc-sort | 19.46 | 17.16
SPT | 14.94 | 55.65
FBST | 15.61 | 43.74
MLGT | 13.53 | 82.64
Ours | 12.93 | 8.40
Table 3. Average center error (ACE) on the VOT2021 dataset.
Dataset | EKF | FAPF | PSO-PF | SORT | KCF-PF | SiamRPN++ | IFA-PF | Bytetrack | oc-sort | SPT | FBST | MLGT | Ours
graduate | 66.47 | 23.46 | 9.43 | 15.78 | 4.68 | 6.74 | 4.01 | 9.99 | 2.08 | 2.41 | 2.94 | 2.59 | 1.33
matrix | 55.18 | 43.67 | 21.38 | 7.13 | 12.72 | 6.3 | 5.42 | 5.83 | 5.16 | 5.17 | 5.81 | 4.82 | 4.40
nature | 33.42 | 21.58 | 8.12 | 14.1 | 4.03 | 3.36 | 3.47 | 2.71 | 1.97 | 2.47 | 2.55 | 2.27 | 2.07
racing | 25.37 | 15.04 | 11.02 | 1.25 | 2.2 | 2.89 | 2.93 | 1.44 | 2.77 | 1.87 | 1.98 | 1.8 | 1.03
road | 40.81 | 35.51 | 17.16 | 10.25 | 7.57 | 17.09 | 6.98 | 8.73 | 8.10 | 4.29 | 4.48 | 3.88 | 3.71
shaking | 59.22 | 44.75 | 10.49 | 5.71 | 4.94 | 4.64 | 4.97 | 8.91 | 3.99 | 4.48 | 4.67 | 4.22 | 2.88
soccer | 159.07 | 45.54 | 11.42 | 6.35 | 3.92 | 4.82 | 5.27 | 7.46 | 3.66 | 4.66 | 6.93 | 4.22 | 4.03
pedestrian | 27.99 | 25.55 | 16.18 | 5.77 | 2.88 | 3.13 | 5.02 | 5.05 | 3.87 | 4.27 | 4.46 | 4.03 | 2.75