Article

Dynamic Tracking Method Based on Improved DeepSORT for Electric Vehicle

Kai Zhu, Junhao Dai and Zhenchao Gu
1 School of Automobile and Traffic Engineering, Jiangsu University of Technology, Changzhou 213001, China
2 School of Mechanical Engineering, Jiangsu University of Technology, Changzhou 213001, China
* Author to whom correspondence should be addressed.
World Electr. Veh. J. 2024, 15(8), 374; https://doi.org/10.3390/wevj15080374
Submission received: 25 July 2024 / Revised: 6 August 2024 / Accepted: 15 August 2024 / Published: 17 August 2024
(This article belongs to the Special Issue Deep Learning Applications for Electric Vehicles)

Abstract

The development of electric vehicles has facilitated intelligent transportation, which requires the swift and effective detection and tracking of moving vehicles. To satisfy this demand, this paper presents an enhanced DeepSORT algorithm. YOLO-SSFS is selected as the front-end detector, and a lightweight, high-precision feature extraction network, FasterNet, is incorporated to extract vehicle appearance attributes effectively. In addition, a noise scale adaptive Kalman filter is implemented, and the conventional cascade matching process is replaced with global linear matching, improving overall performance and tracking accuracy. Validation on the VisDrone dataset demonstrates the superiority of this method over the original DeepSORT algorithm, with a 4.76% increase in tracking accuracy and a 3.10% improvement in tracking precision. The findings highlight the advantages of the algorithm for vehicle detection and tracking, enabling significant technological advances in intelligent transportation systems.

1. Introduction

The rapid advancement of electric vehicles has brought about increasingly sophisticated autonomous driving technology. Currently, approximately 1.35 million people die in traffic accidents annually, while around 50 million are injured. The primary cause of these incidents, accounting for nearly 90% of all traffic accidents, is the human factor, which includes driver fatigue, driver negligence, pedestrian mistakes, and other related factors. Under these circumstances, identifying moving vehicles and other objects is crucial for improving transportation safety [1]. In complex traffic environments, the ability to efficiently and accurately identify and track surrounding vehicles is essential for ensuring the safety of both vehicles and pedestrians. Recent years have witnessed progress in multi-object recognition and tracking by researchers worldwide. One approach involves integrating object detection algorithms with tracking algorithms to strengthen tracking performance.

2. Current Research

2.1. Traditional Target Tracking Algorithm

In 2002, Comaniciu et al. [2] applied the mean shift algorithm to target tracking; the method models the sample distribution and iteratively seeks the point of maximum sample density to locate the target. The principle is simple and the method is fast, but it relies on a single type of image feature, which may result in target loss in practical searches. Soon afterwards, Nummiaro et al. [3] introduced particle filtering to the field of target tracking, using Monte Carlo methods to approximate recursive Bayesian filtering. The algorithm is easy to implement, making it suitable for nonlinear dynamic system analysis. In 2006, Stenger et al. [4] applied the recursive Bayesian algorithm to tracking and achieved good performance, laying a solid foundation for the development of subsequent filtering algorithms. In 2010, Bolme et al. [5] proposed the Minimum Output Sum of Squared Error (MOSSE) algorithm; it estimates the signal points in a window by minimizing the mean squared error criterion and filters the entire signal by continuously moving the window. However, this method failed to achieve high accuracy in complex situations. To overcome the shortcomings of the MOSSE algorithm, Henriques et al. [6] created the Kernelized Correlation Filter (KCF). The captured image is subjected to a Fourier transform, and the position in the next frame is determined by calculating the correlation between the frequency-domain representation of the target and that of the background.

2.2. Target Tracking Algorithm Based on Deep Learning

Ma et al. [7] put forward HCF, an algorithm that replaced the original HOG features with deep features to refine network performance. Algorithms such as C-COT [8] and ECO [9] combine deep features extracted by deep learning with traditional hand-crafted features to obtain composite features, which are then used to track targets. In 2016, Bewley et al. [10] proposed the SORT algorithm, an early detection-based multi-target tracking method in which a detector and a tracker cooperate to achieve target tracking. However, when facing target scale changes and long-term disappearance, incorrect tracking tended to occur. Zhou et al. [11] employed CenterTrack, which adopted a single-stage forward propagation framework and performed excellently in terms of speed and accuracy. Wojke et al. [12] integrated appearance information to improve the performance of SORT, effectively reducing the number of identity switches. Zhang et al. [13] proposed a simple and effective method called FairMOT based on the anchor-free object detection architecture CenterNet, which fused the advantages of CenterTrack and re-identification (ReID), thus enhancing detection accuracy. Zhang et al. [14] developed ByteTrack, a simple, effective, and general association method: in addition to retaining high-confidence detection boxes, low-score detection boxes are associated with tracklets by similarity to recover true objects and filter out background detections, thus improving tracking accuracy.

2.3. Improved Target Tracking Algorithm Based on Deep Learning

For instance, Cho et al. [15] projected images from local to global coordinate systems, leveraging YOLOv4 and DeepSORT to identify and track vehicles captured by traffic surveillance cameras. Likewise, Patel et al. [16] developed an online object tracking system for activity detection and crowd behavior analysis. Another avenue concerns optimizing detection or tracking algorithm modules. Ge et al. [17] employed ShuffleNetv2-YOLO as the primary detector, incorporating the CBAM attention mechanism and BiFPN, along with DeepSORT, for roadside pedestrian detection. Perera et al. [18] augmented DeepSORT’s Kalman filter by introducing a trajectory-free variant, thereby boosting tracking efficiency. Pei [19] substituted the Mahalanobis distance metric with the Generalized Intersection Over Union (GIOU) metric, allowing the network to more effectively determine the weight distribution between detection and prediction boxes. Jin et al. [20] substituted the detector with Gaussian YOLOv3 and replaced the cross-entropy function with the center loss function in the re-identification network, enhancing the algorithm’s vehicle feature extraction capabilities and thus its tracking performance. Song et al. [21] integrated the D2LA network into FairMOT, minimizing its complexity while preserving detection accuracy. Kesa et al. [22] conducted both object prediction and tracking, primarily utilizing FairMOT [23] to detect bounding boxes and consolidate prediction results in a joint learning framework. New modules have also been introduced to improve detection. He et al. [24] utilized NMS to reduce redundant bounding boxes, matched features, re-determined bounding boxes using Intersection Over Union (IOU) metrics, and re-established trajectory positions. Zou et al. [25] proposed a compensating tracker that reduces identity switching by retrieving lost objects through a motion compensation module. Peng et al. [26] introduced trajectory-plane matching, generating short trajectories from detected objects and aligning them on a trajectory plane; by allocating each trajectory a hyperplane based on its start and end times, they effectively correlated successive trajectories. Liang et al. [27] proposed a spatial attention mechanism, implementing a Spatial Transformation Network (STN) in the appearance model to concentrate exclusively on the foreground. Chen et al. [28] combined DeepSORT with a human detector and an I3D network module to achieve multi-target tracking of pedestrians. Chen et al. [29] improved YOLOv4 using the Generalized Intersection Over Union (GIOU) [30] and then combined the improved YOLOv4 with DeepSORT to improve the performance of the algorithm; however, false positives and missed detections related to occlusion and background interference remain. Zhang et al. [31] improved DeepSORT using YOLOv5s and MobileNetv3, which increased the detection speed of the algorithm but suffered from insufficient accuracy. Du et al. [32] optimized and improved the structures of various parts of DeepSORT and proposed StrongSORT; after repeated experiments, the tracking accuracy was greatly improved. Martyushev et al. [33] proposed a theory for the technical and economic evaluation of reliability improvements in autonomous electric buses and trolleybuses, operating electric buses based on optimal management theory.

2.4. Shortcomings of Existing Methods

Although various target tracking algorithms have improved performance in different ways, several problems remain. Firstly, when dealing with large-scale data, these algorithms perform complex tracking and feature extraction, which reduces tracking accuracy. Secondly, their ability to handle changes in target shape and occlusion is relatively weak. Finally, because of the influence of the noise matrices, the target state cannot be updated in a timely manner, so the algorithm cannot accurately estimate the target’s motion state.

2.5. Novelty and Contribution

In this article, we have mainly made the following four contributions:
Firstly, we have introduced YOLO-SSFS as the front-end detector, mainly to obtain higher-quality vehicle features for tracking;
Secondly, as regards the DeepSORT algorithm, we improved its feature extraction network, effectively enhancing the accuracy of feature extraction;
Meanwhile, we improved the Kalman filter by associating the noise matrix that affects target state updates with the confidence of the detected target, discarding some low-confidence targets and reducing the occurrence of ID switches;
Finally, the algorithm was transitioned from cascade matching to global linear matching, effectively alleviating the performance limitations imposed by cascade matching.

3. Overview of DeepSORT Algorithm

The DeepSORT algorithm presents significant advancements over the classic SORT (Simple Online and Realtime Tracking) algorithm. By utilizing a feature extraction network, DeepSORT derives comprehensive feature information from tracked targets.
This facilitates more robust object association between frames, reducing identity switches often caused by occlusions. In addition, DeepSORT creatively integrates Kalman filtering to optimize object trajectory predictions. The algorithm leverages the Hungarian algorithm and cascade matching for robust temporal association, finally achieving accurate and consistent tracking. Figure 1 demonstrates its operational flow, highlighting the crucial steps of data association and parameter updating.

3.1. Kalman Filtering Algorithm

Kalman filtering, a well-established algorithm for target tracking, is often implemented in conjunction with other algorithms. This approach proves particularly effective in real-world detection scenarios, where background noise and environmental factors can present significant challenges. By analyzing historical movement patterns, Kalman filtering anticipates future target behavior, effectively predicting the target’s motion state at the next time step.
During the prediction process, the algorithm constructs a prediction state equation based on the detected objects. The state equation at a given time is defined as $X_{k-1} = \begin{bmatrix} p_{k-1} \\ v_{k-1} \end{bmatrix}$. When the target vehicle moves at a constant velocity, the state at time $k$ can be expressed as $X_k = \begin{bmatrix} p_k \\ v_k \end{bmatrix}$, where $p_k = p_{k-1} + (t_k - t_{k-1})\,v_{k-1}$. However, in real-world scenarios, vehicles may experience a certain degree of acceleration. Therefore, defining the acceleration as $a_k$ and $\Delta t = t_k - t_{k-1}$ as the time difference between the two instants, the state equation at time $k$ becomes [12]
$$X_k = \begin{bmatrix} p_k \\ v_k \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_{k-1} \\ v_{k-1} \end{bmatrix} + \begin{bmatrix} \Delta t^2/2 \\ \Delta t \end{bmatrix} a_k \qquad (1)$$
Therefore, the first state transition equation during the prediction process is
$$X_{k|k-1} = F_k X_{k-1} + B_k u_k \qquad (2)$$
where the state transition matrix $F_k = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$ and the control matrix $B_k = \begin{bmatrix} \Delta t^2/2 \\ \Delta t \end{bmatrix}$. Via this transformation, each point in the initial estimate can be moved to a new position, completing the position estimation of the target vehicle. Here, $u_k$ represents the control input vector, i.e., the acceleration $a_k$, and $X_{k|k-1}$ represents the a priori estimate of the current state.
A distinctive feature of Kalman filtering is its reliance on predictions made at different times, which introduces uncertainty into the estimated position at each moment. To address this, the covariance matrix $P$ is introduced to denote the uncertainty of the estimated state and the correlation between state quantities. Its formula is
$$P_{k|k-1} = F_k P_{k-1} F_k^T \qquad (3)$$
In addition, real-world applications must account for the noise interference present in the environment. This noise influences the covariance matrix, as exemplified by the control input vector $u_k$ discussed earlier. Such noise is typically unavoidable and is represented by the covariance matrix $Q_k$. The second equation in the Kalman filter prediction process is therefore [12]
$$P_{k|k-1} = F_k P_{k-1} F_k^T + Q_k \qquad (4)$$
The result obtained from Formula (4) represents the system’s estimate of the target state at different times. The system’s observation equation can be derived from the transformation coefficient matrix $H$, as shown in Formula (5):
$$Z_k = H X_k + v_k \qquad (5)$$
where $Z_k$ represents the system observation and $v_k$ represents the system noise. Both $Z_k$ and $v_k$, as well as their product, follow Gaussian distributions. Therefore, the gain matrix $K$ can be obtained as demonstrated in Formula (6):
$$K_k = P_{k|k-1} H_k^T \left( H_k P_{k|k-1} H_k^T + R_k \right)^{-1} \qquad (6)$$
Finally, by employing this gain matrix for the optimal estimation of the aforementioned prediction equation, we arrive at the formulae for the target’s optimal estimate [12]:
$$X_k = X_{k|k-1} + K_k \left( Z_k - H_k X_{k|k-1} \right) \qquad (7)$$
$$P_k = P_{k|k-1} - K_k H_k P_{k|k-1} \qquad (8)$$
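To make this prediction and update cycle concrete, the following minimal sketch implements Equations (1)-(8) for a one-dimensional constant-velocity target using NumPy; the time step, noise levels, and measurement values are illustrative assumptions rather than values used in this paper.

```python
# A minimal sketch of the constant-velocity Kalman predict/update cycle of Eqs. (1)-(8);
# dt, the noise covariances, and the sample measurement are illustrative assumptions.
import numpy as np

dt = 0.1                                   # time step between frames
F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition matrix F_k
B = np.array([[0.5 * dt**2], [dt]])        # control matrix B_k (acceleration input)
H = np.array([[1.0, 0.0]])                 # observation matrix (position only)
Q = np.eye(2) * 1e-3                       # process noise covariance Q_k
R = np.array([[1e-2]])                     # measurement noise covariance R_k

def predict(x, P, a):
    """Prediction step: Eqs. (2) and (4)."""
    x_prior = F @ x + B * a
    P_prior = F @ P @ F.T + Q
    return x_prior, P_prior

def update(x_prior, P_prior, z):
    """Update step: gain (6), state (7), covariance (8)."""
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)
    x = x_prior + K @ (z - H @ x_prior)
    P = P_prior - K @ H @ P_prior
    return x, P

x = np.array([[0.0], [5.0]])               # initial position and velocity
P = np.eye(2)
x_prior, P_prior = predict(x, P, a=0.2)
x, P = update(x_prior, P_prior, z=np.array([[0.52]]))
```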

3.2. Data Association and Cascade Matching

The tracking algorithm leverages both motion and appearance information, derived through feature extraction, to enhance tracking accuracy. To assess the consistency between predicted and actual samples, the Mahalanobis distance is employed, effectively normalizing the covariance matrix to evaluate the stability of the predicted trajectory mean. This relationship is expressed as:
$$d^{(1)}(i,j) = (d_j - y_i)^T S_i^{-1} (d_j - y_i) \qquad (9)$$
where $d^{(1)}(i,j)$ represents the motion-based (Mahalanobis) distance for the dynamic target, $d_j$ is the coordinate of the $j$-th detection box, $y_i$ is the prediction of the $i$-th tracker for the tracked target, and $S_i$ denotes the covariance of the noise between the actual detection position and the predicted position.
To effectively track the motion trajectory information of objects, the cosine distance metric is introduced to facilitate appearance information matching, effectively distinguishing the scale differences across different time instances:
$$d^{(2)}(i,j) = \min \left\{ 1 - r_j^T r_k^{(i)} \,\middle|\, r_k^{(i)} \in R_i \right\} \qquad (10)$$
where $d^{(2)}(i,j)$ is the cosine distance metric, $r_j$ is the unit appearance feature vector extracted from the $j$-th detection box (i.e., $\lVert r_j \rVert = 1$), and $R_i$ stores the appearance feature vectors associated with the $i$-th trajectory.
The final metric is obtained by linearly weighting the Mahalanobis distance and the cosine distance [12]:
$$c_{i,j} = \lambda\, d^{(1)}(i,j) + (1 - \lambda)\, d^{(2)}(i,j) \qquad (11)$$
where λ represents the influence factor associated with each measurement method. If the metric value is in the overlapping threshold range of both, it indicates a successful association.
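As an illustration of how the two metrics are combined, the sketch below builds the cost of Equation (11) from the squared Mahalanobis distance of Equation (9) and the cosine distance of Equation (10); the function names, and the assumption that track predictions, innovation covariances, and appearance galleries are already available, are purely illustrative.

```python
# A minimal sketch of the combined association cost (Eq. 11); inputs are assumed to
# be NumPy arrays produced elsewhere in the tracking pipeline.
import numpy as np

def mahalanobis_sq(d_j, y_i, S_i):
    """Eq. (9): squared Mahalanobis distance between detection d_j and prediction y_i."""
    diff = d_j - y_i
    return float(diff.T @ np.linalg.inv(S_i) @ diff)

def cosine_distance(r_j, gallery_R_i):
    """Eq. (10): smallest cosine distance between the detection feature and the stored
    appearance features of track i (all vectors assumed L2-normalized)."""
    return float(min(1.0 - r_j @ r_k for r_k in gallery_R_i))

def combined_cost(d_j, y_i, S_i, r_j, gallery_R_i, lam=0.5):
    """Eq. (11): linear weighting of the motion and appearance terms."""
    return lam * mahalanobis_sq(d_j, y_i, S_i) + (1.0 - lam) * cosine_distance(r_j, gallery_R_i)
```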
When tracking a target, extended periods of occlusion can impair the Kalman filter’s predictive ability. This impairment causes the covariance matrix to inflate, reducing tracking accuracy. To combat this, a cascade matching mechanism is employed. Each tracker is assigned an update-time parameter; a tracker leaves the matching cascade once it is successfully matched, and it is discarded when its time parameter exceeds 60 s. This process is illustrated in Figure 2.
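A minimal sketch of this cascade, assuming NumPy and SciPy, is given below: tracks are matched level by level in order of the time elapsed since their last update, so recently updated tracks receive priority. The maximum age and gating threshold shown are illustrative rather than the exact values used by DeepSORT.

```python
# A minimal sketch of cascade matching by track age; cost[i, j] is the association
# cost between track i and detection j, and track_ages[i] is the number of frames
# since track i was last updated. The gate value 0.7 is an illustrative assumption.
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_cascade(cost, track_ages, max_age=30, gate=0.7):
    matches, unmatched_dets = [], list(range(cost.shape[1]))
    for age in range(1, max_age + 1):
        if not unmatched_dets:
            break
        level_tracks = [i for i, a in enumerate(track_ages) if a == age]
        if not level_tracks:
            continue
        sub = cost[np.ix_(level_tracks, unmatched_dets)]     # cost restricted to this level
        rows, cols = linear_sum_assignment(sub)
        for r, c in zip(rows, cols):
            if sub[r, c] <= gate:
                matches.append((level_tracks[r], unmatched_dets[c]))
        unmatched_dets = [d for d in unmatched_dets if d not in {m[1] for m in matches}]
    return matches, unmatched_dets

cost = np.array([[0.2, 0.8], [0.6, 0.3], [0.4, 0.9]])
print(matching_cascade(cost, track_ages=[1, 1, 2]))
```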

3.3. Feature Extraction Network

The feature extraction network marks a crucial step in extracting the visual characteristics of a tracked object. This network is trained offline through deep learning, which establishes the weights for appearance extraction. During the tracking process, these weights are utilized in conjunction with boundary decision directions to re-identify the object, finally enhancing recognition accuracy.
The DeepSORT algorithm leverages a CNN architecture, as depicted in Table 1, which functions as a residual network. This network comprises two convolutional layers followed by six residual blocks, and a global feature map of dimensionality 128 is computed in the Dense 10 layer. The network uses 3 × 3 convolutional kernels throughout, and downsampling within the residual blocks is performed by convolutions with a step of 2 rather than additional max-pooling operations. To reduce the bottleneck problems encountered when the spatial resolution is reduced, the number of channels is progressively increased. In addition, the Exponential Linear Unit (ELU) is used as the activation function throughout the network.

4. Improved DeepSORT Algorithm

4.1. Frontend Detector Optimization

Existing object tracking systems frequently employ YOLO algorithms, particularly YOLOv5. However, these algorithms often struggle to achieve high-precision detection in complex road environments. To address this, this paper utilizes YOLO-SSFS [34], which introduces several key improvements: (1) The architecture of the original YOLOv5 is enhanced by incorporating spatial depth layers in the Neck, improving its vehicle-detection capabilities. (2) Cross-scale connection lines are integrated into the FPN structure, optimizing the algorithm’s detection accuracy. (3) The SIoU loss function is employed to filter prediction boxes, enhancing both the stability and speed of target box regression. These modifications result in a 6.31% improvement in vehicle detection accuracy compared to the original algorithm, effectively meeting the demands of real-world detection scenarios.
Figure 3 offers a visual representation of this enhanced network structure. Through a series of improvements, the accuracy of the front-end detector for detecting vehicle targets has been greatly improved, laying a good foundation for subsequent target tracking.

4.2. Target Feature Extraction Optimization

The original DeepSORT algorithm demonstrates excellence in extracting and tracking target information. It achieves this by aggregating dynamic and static target features into a cost matrix. This matrix enables the association of data across consecutive time steps utilizing cascade matching, facilitating the accurate prediction of tracked target trajectories. While DeepSORT’s optimized structure, derived from its predecessors, significantly enhances tracking accuracy and stability, it can exhibit limitations in real-world applications. Specifically, the algorithm may struggle to extract comprehensive appearance features from targets due to limitations in its network depth. To overcome this, this paper incorporates a more robust feature extraction network, FasterNet [35], the architecture of which is outlined in Table 2.
Primarily developed for pedestrian detection, the DeepSORT algorithm typically uses an input size of 128 × 64. However, this experiment focuses on vehicles, which primarily present a flattened profile in images. To accommodate this difference, the input size was adjusted to 64 × 128 to better reflect the aspect ratio of the target objects. Figure 4 illustrates the core structure of both the FasterNet block and the overall FasterNet architecture.
FasterNet, a convolutional neural network built around Partial Convolutions (PConv), is known for its rapid processing speed. It consists of four hierarchical stages. An embedding layer (a 4 × 4 convolution with a step of 4) precedes the first stage, while a merging layer (a 2 × 2 convolution with a step of 2) precedes each of the remaining three stages. These layers perform spatial downsampling and channel expansion, and the number of FasterNet blocks varies across the stages. In each FasterNet block, a single PConv layer is followed by two 1 × 1 Conv layers; normalization and activation layers are placed only after the middle layer to preserve feature diversity while minimizing latency.
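The following PyTorch sketch illustrates the structure described above: a PConv layer followed by two 1 × 1 convolutions with normalization and activation only after the middle layer. The partial-convolution ratio, expansion factor, and layer names are illustrative assumptions, not the exact configuration of Table 2.

```python
# A minimal sketch of one FasterNet block (PConv + two 1x1 convs), assuming PyTorch;
# the 1/4 channel split and the expansion factor of 2 are illustrative assumptions.
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a 3x3 conv is applied to the first fraction of the channels,
    while the remaining channels are passed through unchanged."""
    def __init__(self, dim: int, ratio: float = 0.25):
        super().__init__()
        self.dim_conv = int(dim * ratio)
        self.dim_pass = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, 1, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_pass], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterNetBlock(nn.Module):
    """PConv followed by two 1x1 convs; normalization and activation sit only after
    the middle (expansion) layer, with a residual connection."""
    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.pconv = PConv(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.pconv(x))

# Example: a 64 x 128 vehicle crop after a stride-4 embedding gives a 16 x 32 feature map.
block = FasterNetBlock(dim=128)
out = block(torch.randn(1, 128, 16, 32))   # -> torch.Size([1, 128, 16, 32])
```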

4.3. Kalman Filter Improvement

In DeepSORT, the Kalman filter is tasked with estimating and predicting the trajectories of target objects. This procedure is composed of two main steps. First, the filter estimates the target’s present motion state and evaluates its stability. Second, by analyzing the effect of background and environmental noise, the filter corrects trajectory deviations by computing a weighted average of predicted and measured values, producing the final adjusted trajectory estimate. Because background noise cannot be expressed exactly, it is represented by a measurement noise covariance matrix R. Increased measurement noise, indicating greater uncertainty in the target’s position, causes subsequent state updates to progressively scale back the weighting of measurements from that target until it is finally disregarded.
In the original Kalman filter algorithm, the covariance matrix representing noise is a fixed value. This assumption, however, can occasionally lead to target loss and identity switches. To address this, the DeepSORT structure incorporates a Noise Scale Adaptive Kalman Filter (NSA Kalman Filter). In this adaptive methodology, the covariance is computed as follows [32]:
$$\tilde{R}_k = (1 - c_k)\, R_k \qquad (12)$$
where $\tilde{R}_k$ represents the noise covariance that varies with confidence, $R_k$ denotes the fixed noise covariance in the original algorithm, and $c_k$ indicates the confidence score of the detection at time $k$. An increase in confidence reduces the effect of the noise scale on the state update process. This adjustment enhances the accuracy of the updated state and thus improves target tracking accuracy.
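A minimal sketch of this confidence-adaptive scaling, assuming NumPy, is shown below; the covariance values are illustrative.

```python
# A minimal sketch of the NSA idea in Eq. (12): scaling the noise covariance by the
# detection confidence before the Kalman update; the matrix values are illustrative.
import numpy as np

def nsa_noise(R_k: np.ndarray, confidence: float) -> np.ndarray:
    """R~_k = (1 - c_k) * R_k: high-confidence detections shrink the adaptive noise,
    so the update step trusts the measurement more."""
    return (1.0 - confidence) * R_k

R_k = np.eye(4) * 1e-2          # fixed noise covariance of the original filter
print(nsa_noise(R_k, 0.9))      # confident detection -> small adaptive noise
print(nsa_noise(R_k, 0.3))      # weak detection      -> larger adaptive noise
```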

4.4. Global Linear Matching

It is well-established that DeepSORT’s cascade matching algorithm significantly improves accuracy; however, increasing tracker performance can counterintuitively reduce overall tracking accuracy. This occurs because a high-performing tracker, while adept at handling ambiguous matches, is negatively affected by the constraints imposed by the cascade matching method’s reliance on prior information. These associations increase computational complexity, finally hindering algorithm performance and reducing matching accuracy. To address this limitation, this paper proposes replacing cascade matching with global linear matching.
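The sketch below illustrates the global alternative, assuming SciPy’s Hungarian solver: all track-detection pairs are assigned in a single pass over the combined cost matrix, and the gating threshold is an illustrative assumption.

```python
# A minimal sketch of global linear matching over a precomputed cost matrix: instead
# of matching tracks in cascades ordered by age, all pairs are solved in one
# Hungarian assignment. The 0.7 gate is an illustrative assumption.
import numpy as np
from scipy.optimize import linear_sum_assignment

def global_match(cost: np.ndarray, max_cost: float = 0.7):
    """Solve one global assignment over all tracks x detections and drop pairs whose
    cost exceeds the gating threshold."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    unmatched_tracks = set(range(cost.shape[0])) - {r for r, _ in matches}
    unmatched_dets = set(range(cost.shape[1])) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets

cost = np.array([[0.1, 0.9, 0.8],
                 [0.7, 0.2, 0.95]])
print(global_match(cost))
```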

5. Experiment Results and Analysis

5.1. Materials and Methods of Work

This research utilized the VeRi-wild [36] dataset, an extensive compilation of urban traffic surveillance data. From this dataset, images of 390 vehicles were extracted, totaling 17,500 photos. These images were captured under diverse circumstances, including various angles, lighting conditions, and occlusion scenarios, with each vehicle represented by 30 to 60 images. Figure 5 offers representative images from the dataset. The following pictures were taken on the outskirts of a small city in China.
To verify the algorithm’s tracking capabilities, three video detection tasks were selected from the VisDrone-MoT [37] dataset, and each image sequence was synthesized into a dynamic video. For practical implementation, frame extraction operations were simulated during detection. The algorithm’s processing speed was approximated as 10–15 frames per second (fps). Therefore, during the synthesis of image sequences, the frame rate was fixed at 10 fps to align with the actual application scenario of the tracking algorithm. Table 3 offers further details.
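A minimal sketch of synthesizing such an image sequence into a 10 fps video, assuming OpenCV, is given below; the folder name, file pattern, and codec are illustrative.

```python
# A minimal sketch of turning an annotated image sequence into a 10 fps video, as
# described above; "visdrone_seq1" and the mp4v codec are illustrative assumptions.
import cv2
import glob

frames = sorted(glob.glob("visdrone_seq1/*.jpg"))
first = cv2.imread(frames[0])
h, w = first.shape[:2]
writer = cv2.VideoWriter("seq1_10fps.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 10, (w, h))
for f in frames:
    writer.write(cv2.imread(f))   # each image becomes one frame at 10 fps
writer.release()
```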
The experimental platform comprises two primary components, with Table 4 outlining the main hardware and software configurations. Specifically for the multi-object tracking of forward vehicles analyzed in this study, pre-training was performed for the feature extraction of vehicle targets and the training of the vehicle detector YOLO-SSFS.
This article mainly uses ablation experiments and comparative experiments to demonstrate the superiority of our algorithm. In the ablation experiment, we gradually introduced improvement methods to verify the improvement of algorithm performance achieved by introducing modules. In the comparative experiment, the superiority of the algorithm over other algorithms in certain aspects was demonstrated.

5.2. Evaluation Metrics

Vehicle tracking performance evaluation metrics are crucial for assessing the effectiveness of tracking algorithms, and employing multiple metrics allows various aspects of algorithmic performance to be evaluated. Commonly used metrics include Multi-Object Tracking Accuracy (MOTA), Multi-Object Tracking Precision (MOTP), and Identity Switches (IDS).
MOTA mainly reflects the quantity of inaccurate target detections during the detection process, thus indicating the algorithm’s detection accuracy. MOTP, on the other hand, represents the overlap ratio between predicted bounding boxes and ground truth bounding boxes, demonstrating the model’s overall resilience to errors. IDS primarily quantifies the tracking algorithm’s ability to resist interference. The calculation formulae for MOTA and MOTP are as follows:
$$MOTA = 1 - \frac{FP + FN + IDs}{GT} \qquad (13)$$
$$MOTP = \frac{\sum_{t,i} d_{t,i}}{\sum_{t} c_t} \qquad (14)$$
where $FP$ and $FN$ are the total numbers of false positives and false negatives, $GT$ denotes the total number of ground truth annotations, $t$ denotes the $t$-th frame, $d_{t,i}$ is the overlap ratio between the predicted bounding box and the ground truth box for the $i$-th matched object in the $t$-th frame, and $c_t$ is the number of ground truth boxes matched by predicted boxes in the $t$-th frame.
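For clarity, the sketch below computes MOTA and MOTP from Equations (13) and (14) given per-frame counts and matched-pair overlaps; the numbers in the example are illustrative, and in practice these counts would come from a matching tool such as the motmetrics library.

```python
# A minimal sketch of Eqs. (13)-(14); the example counts and overlaps are illustrative.
def mota(fp: int, fn: int, id_switches: int, gt: int) -> float:
    """MOTA = 1 - (FP + FN + IDs) / GT."""
    return 1.0 - (fp + fn + id_switches) / gt

def motp(overlaps_per_frame: list[list[float]]) -> float:
    """MOTP = sum of overlaps of matched pairs / total number of matched pairs."""
    total_overlap = sum(sum(frame) for frame in overlaps_per_frame)
    total_matches = sum(len(frame) for frame in overlaps_per_frame)
    return total_overlap / total_matches if total_matches else 0.0

print(mota(fp=120, fn=340, id_switches=37, gt=8547))
print(motp([[0.82, 0.76], [0.91], [0.64, 0.88, 0.79]]))
```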
The ID Switch metric, frequently utilized to assess tracking performance in multi-object tracking, reflects the number of times a single target is assigned different IDs across different frames. This phenomenon, where the same target is misidentified, often originates from shifts in the environment, and poses a significant challenge in tracking.
Figure 6 depicts a scenario of a single ground truth trajectory (GT), denoted by a black dashed line, overlaid with six images. The orange and blue lines represent predicted trajectories at different time points. In the first frame, the lack of a detection box covering the true trajectory results in a false negative (FN). The orange line then tracks the target accurately in the second, third, and fourth frames, producing true positives (TPs). However, an ID switch occurs in the fifth frame as the orange line deviates from the GT, while the blue line converges towards it. This leads to an FP for the orange line in the fifth frame and designates the fifth and sixth frames as TPs for the blue line, accurately reflecting the target’s movement. This example highlights how a greater frequency of ID switches directly correlates with a decrease in the tracking algorithm’s performance.

5.3. Training Results of Feature Extraction Network

Figure 7 and Figure 8 illustrate the top1 error rate and loss curves of the original and improved feature extraction networks. The training process is represented by the blue lines, while the validation process is illustrated by the red lines. The top1 error rate signifies the probability of the algorithm misidentifying a target; in simpler terms, it represents the probability of the model producing an incorrect prediction for a given object, whereas the loss value indicates the loss incurred during training. During the training of the feature extraction networks on the dataset, both networks used identical parameters: an initial learning rate of 0.001, reduced by a factor of 0.1 every 20 epochs. The discrepancy between the red and blue curves represents the degree of overfitting; a smaller gap between these curves indicates lower overfitting and thus better algorithm performance.
Figure 7 illustrates that during the first 20 training epochs, the training and validation set curves remain closely aligned. As training progresses beyond 20 epochs and approaches 30, the original network’s descending curve retains relative smoothness, while the improved network demonstrates more significant fluctuations in this range. After 40 epochs, both networks experience a sharp decline in top1err values; however, this rate of decrease gradually levels off, reaching a plateau at approximately 60 epochs. Specifically, after 80 epochs, the improved network exhibits a reduction in top1err values for both the training and validation sets.
As depicted in Figure 8, the loss values for both networks steadily decreased during the initial 20 epochs. However, the original network displayed more significant fluctuations between epochs 20 and 40. After epoch 40, the original network’s loss value plummeted, and a gap gradually formed between the training and validation sets, indicating overfitting. In contrast, the FasterNet network exhibited superior convergence. During the first 40 epochs, the curves representing the training and validation sets nearly overlapped, with fewer fluctuations compared to the original network. This observation highlights that the improved algorithm proposed in this paper has enhanced convergence characteristics, resulting in greater stability and reliability.
Table 5 presents the performance gains achieved by FasterNet, which attained 94.71% accuracy, a 4.33% improvement over the original network. This gain comes at a marginal cost in model size: the weight files are larger, although training times are similar. The accuracy improvement is therefore accompanied by a trade-off against some of the model’s speed advantage.

5.4. Analysis of Evaluation Metric Results

The algorithm’s performance can be evaluated through established metrics such as MOTA, MOTP, and ID Switch. Table 6 illustrates the effect of the enhanced DeepSORT algorithm on overall performance. Utilizing Video 1 as a case study and employing YOLO-SSFS as the front-end detector, the improved methods are incorporated progressively throughout the experiment. Higher MOTA and MOTP values, approaching 1, signify enhanced tracking performance, whereas a lower ID Switch value indicates superior algorithm processing capabilities.
As evident from the table, compared to the original algorithm, the proposed improvement strategies in this paper significantly enhance tracking accuracy. The MOTA value exhibited an increase of 2.96%, the MOTP value increased by 1.11%, and the ID Switch count experienced a reduction by eight occurrences.
In addition, this section presents comparative experiments conducted on three video sequences, comparing the original YOLOv5 algorithm and YOLO-SSFS as alternative front-end detectors. This study denotes the YOLOv5 + DeepSORT algorithm as combination one, the YOLOv5 + improved DeepSORT algorithm as combination two, and the YOLO-SSFS + improved DeepSORT algorithm as combination three. The specific experimental results are detailed in Table 7.
From the table, comparing combination two with combination one shows that solely enhancing the DeepSORT algorithm structure yields an increase in the MOTA value and a reduction in the number of ID switches. In combination three, substituting the front-end detection algorithm results in a further significant increase in the MOTA value, with MOTA values in videos one, two, and three increasing by 4.37%, 5.16%, and 4.76%, respectively, compared to the original algorithm. This is largely attributed to the enhancements in the YOLO-SSFS algorithm, which primarily focus on improving detection accuracy and thereby indirectly elevate the MOTA value. The comparison between combination two and combination three highlights that improving the front-end detector more effectively enhances the MOTA value while simultaneously reducing the ID switch count. On average, the improved algorithm proposed in this paper achieves a 4.76% increase in the MOTA value and an 11-count reduction in the total number of ID switches compared to the original algorithm.

5.5. Analysis of Visualization Results

In complex, real-world traffic environments, ID switching is highly probable during the detection process. The images in Figure 9 and Figure 10 were taken on a certain road section in Tianjin, China, and illustrate the detection results of two such scenarios. As illustrated in Figure 9, at frame 103, the vehicle initially appears with ID number 110. However, after moving for a while, at frame 107, the ID changes to 122 due to the added presence of an overpass in the detection scene, representing a typical ID Switch phenomenon. The overpass causes interference with the traffic background, leading to partial occlusion of the vehicle. Therefore, the algorithm fails to accurately recognize the vehicle’s appearance, resulting in misidentification.
However, utilizing the improved YOLO-SSFS and DeepSORT combined algorithm proposed in this paper significantly reduces the ID switching phenomenon. This is clearly illustrated in Figure 10, where a vehicle, assigned ID number 110 at frame 103, retains its ID despite the overpass interference when it progresses to frame 107. This consistency demonstrates the effectiveness of the proposed algorithm in enhancing detection accuracy and minimizing ID switching occurrences.

5.6. Comparison with Existing Achievements

Due to the rapid development of target tracking technology, many researchers have focused on improving the DeepSORT algorithm and have obtained a number of results. We selected several of these algorithms and compared their advantages from multiple points of view; the results are displayed in Table 8 and Table 9.
As is well known, MOTA, as an indicator of tracking accuracy, reflects the effectiveness of an algorithm to some extent. From Table 8, it can be seen that in complex traffic environments the MOTA values of the ER-DeepSORT, EAMTT, and DeepMOT algorithms are 24.57%, 9.17%, and 7.87% lower, respectively, than that of our algorithm, indicating the superiority of our method.
Table 9 shows a different picture. The algorithms listed there, Transformer-YOLOv5-DeepSORT, ByteTrack, and OC-SORT, achieve higher MOTA values. However, these algorithms exhibit a more severe ID switch phenomenon, and in this respect our algorithm outperforms them.
In summary, because traffic environments differ, the performance of different algorithms also varies. Moreover, datasets differ, and researchers rarely choose the same dataset, or the same section of it, for their experiments. Based on the above analysis, the algorithm proposed in this article remains competitive with existing methods in the field of vehicle tracking.

6. Conclusions

With the rapid development of electric vehicles in intelligent transportation systems, effectively detecting and tracking dynamic vehicles is of great importance. This paper presents an enhanced DeepSORT algorithm incorporating several key optimizations. Firstly, the algorithm utilizes YOLO-SSFS for object detection and introduces FasterNet as the feature extraction network. To enhance vehicle re-identification capabilities, the model is trained on the VeRi-wild dataset, improving its ability to extract vehicle appearance features. In addition, the conventional Kalman filter is substituted with the NSA Kalman filter; this adjustment associates the noise covariance matrix with detection confidence, thereby boosting tracking accuracy. Finally, the algorithm transitions from cascade matching to global linear matching, effectively alleviating prior performance limitations. To verify these enhancements, experimental validation was conducted across three video segments from the VisDrone-MOT dataset. The results demonstrate the effectiveness of the optimized algorithm, producing an average 4.76% gain in MOTA, a 3.10% increase in MOTP, and an 11-instance reduction in ID switches relative to the baseline algorithm. This research helps to enhance vehicle tracking ability, which will be crucial to intelligent transportation, and to intelligent life in general, in the future.
However, the method proposed in this paper has the following shortcomings. Firstly, although tracking accuracy has been improved, the accuracy decreases as the number of tracked targets increases. Secondly, the ID switching phenomenon still exists and needs to be further reduced.
In future work, we will use higher-quality object detection models to reduce object detection errors, or use some auxiliary information, such as the shape or color of the target, to help distinguish it.

Author Contributions

Formal analysis, Z.G.; investigation, Z.G.; methodology, K.Z.; resources, J.D.; validation, J.D.; writing—original draft, K.Z.; writing—review and editing, K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number 22KJD440001) and Changzhou Science & Technology Program (grant number CJ20220232).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Patil, L.N.; Khairnar, H.P. Investigation of Perceived Risk Encountered by Electric Vehicle Drivers in Distinct Contexts. Appl. Eng. Lett. 2021, 6, 69–79. [Google Scholar] [CrossRef]
  2. Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619. [Google Scholar] [CrossRef]
  3. Nummiaro, K.; Koller-Meier, E.; Van Gool, L. An adaptive color-based particle filter. Image Vis. Comput. 2003, 21, 99–110. [Google Scholar] [CrossRef]
  4. Stenger, B.; Thayananthan, A.; Torr, P.H.; Cipolla, R. Model-based hand tracking using a hierarchical Bayesian filter. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1372–1384. [Google Scholar] [CrossRef] [PubMed]
  5. Bolme, D.S.; Beveridge, J.R.; Draper, B.A.; Lui, Y.M. Visual object tracking using adaptive correlation filters. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2544–2550. [Google Scholar]
  6. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596. [Google Scholar] [CrossRef]
  7. Ma, C.; Huang, J.-B.; Yang, X.; Yang, M.-H. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3074–3082. [Google Scholar]
  8. Danelljan, M.; Robinson, A.; Shahbaz Khan, F.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 472–488. [Google Scholar]
  9. Zolfaghari, M.; Singh, K.; Brox, T. Eco: Efficient convolutional network for online video understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 695–712. [Google Scholar]
  10. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
  11. Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking objects as points. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 474–490. [Google Scholar]
  12. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
  13. Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 1–21. [Google Scholar]
  15. Cho, K.; Cho, D. Autonomous driving assistance with dynamic objects using traffic surveillance cameras. Appl. Sci. 2022, 12, 6247. [Google Scholar] [CrossRef]
  16. Patel, A.S.; Vyas, R.; Vyas, O.P.; Ojha, M.; Tiwari, V. Motion-compensated online object tracking for activity detection and crowd behavior analysis. Vis. Comput. 2023, 39, 2127–2147. [Google Scholar] [CrossRef]
  17. Ge, Y.; Lin, S.; Zhang, Y.; Li, Z.; Cheng, H.; Dong, J.; Shao, S.; Zhang, J.; Qi, X.; Wu, Z. Tracking and counting of tomato at different growth periods using an improving YOLO-Deepsort network for inspection robot. Machines 2022, 10, 489. [Google Scholar] [CrossRef]
  18. Perera, I.; Senavirathna, S.; Jayarathne, A.; Egodawela, S.; Godaliyadda, R.; Ekanayake, P.; Wijayakulasooriya, J.; Herath, V.; Sathyaprasad, S. Vehicle tracking based on an improved DeepSORT algorithm and the YOLOv4 framework. In Proceedings of the 2021 10th International Conference on Information and Automation for Sustainability (ICIAfS), Negambo, Sri Lanka, 11–13 August 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 305–309. [Google Scholar]
  19. Pei, Y.C. Research and Application of Pedestrian Multitarget Tracking Algorithm Based on DeepSORT. Master’s Thesis, Qilu University of Technology, Jinan, China, 2022. [Google Scholar]
  20. Jin, L.S.; Hua, Q.; Guo, B.C.; Xie, X.Y.; Yan, F.G.; Wu, B.T. Multi-target tracking of vehicles based on optimized DeepSort. J. Zhejiang Univ. Eng. Sci. 2022, 55, 1056–1064. [Google Scholar]
  21. Song, Y.; Zhang, P.; Huang, W.; Zha, Y.; You, T.; Zhang, Y. Multiple object tracking based on multi-task learning with strip attention. IET Image Process. 2021, 15, 3661–3673. [Google Scholar] [CrossRef]
  22. Kesa, O.; Styles, O.; Sanchez, V. Multiple object tracking and forecasting: Jointly predicting current and future object locations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 4–8 January 2022; IEEE: Snowmass, CO, USA, 2022; pp. 560–569. [Google Scholar]
  23. Chen, Y.; Chen, Z.; Zhang, Z.; Bian, S. AdapTrack: An adaptive FairMOT tracking method applicable to marine ship targets. AI Commun. 2023, 36, 127–145. [Google Scholar] [CrossRef]
  24. He, J.; Zhong, X.; Yuan, J.; Tan, M.; Zhao, S.; Zhong, L. Joint re-detection and re-identification for multi-object tracking. In Proceedings of the International Conference on Multimedia Modeling, Prague, Czech Republic, 5–8 January 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 364–376. [Google Scholar]
  25. Zou, Z.; Huang, J.; Luo, P. Compensation tracker: Reprocessing lost object for multi-object tracking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 4–8 January 2022; IEEE: Snowmass, CO, USA, 2022; pp. 307–317. [Google Scholar]
  26. Peng, J.; Wang, T.; Lin, W.; Wang, J.; See, J.; Wen, S.; Ding, E. TPM: Multiple object tracking with tracklet-plane matching. Pattern Recognit. 2020, 107, 107480. [Google Scholar] [CrossRef]
  27. Liang, T.; Lan, L.; Zhang, X.; Luo, Z. A generic MOT boosting framework by combining cues from SOT, tracklet and re-identification. Knowl. Inf. Syst. 2021, 63, 2109–2127. [Google Scholar] [CrossRef]
  28. Chen, X.Y. Research on Behavior Recognition Algorithm Based on Deep Learning. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2020. [Google Scholar]
  29. Chen, Z.Q.; Zhang, Y.L. An improved Deep Sort target tracking algorithm based on YOLOv4. J. Guilin Univ. Electron. Technol. 2021, 41, 140–145. [Google Scholar]
  30. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 658–666. [Google Scholar]
  31. Zhang, X.H.; Yan, J.X.; Zhang, C. Abnormal behavior recognition of coal blocks based on improved YOLOv5s + DeepSORT. Ind. Mine Autom. 2022, 48, 77–86, 117. [Google Scholar]
  32. Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. Strongsort: Make deepsort great again. IEEE Trans. Multimed. 2023, 25, 8725–8737. [Google Scholar] [CrossRef]
  33. Martyushev, N.V.; Malozyomov, B.V.; Kukartsev, V.V.; Gozbenko, V.E.; Konyukhov, V.Y.; Mikhalev, A.S.; Kukartsev, V.A.; Tynchenko, Y.A. Determination of the Reliability of Urban Electric Transport Running Autonomously through Diagnostic Parameters. World Electr. Veh. J. 2023, 14, 334. [Google Scholar] [CrossRef]
  34. Gu, Z.C.; Zhu, K.; You, S.T. YOLO-SSFS: A method combining SPD-Conv/STDL/IM-FPN/SIoU for outdoor small target vehicle detection. Electronics 2023, 12, 3744. [Google Scholar] [CrossRef]
  35. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H. Run, Don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; IEEE: Vancouver, BC, Canada, 2023; pp. 12021–12031. [Google Scholar]
  36. He, W.K.; Peng, Y.H.; Huang, W.; Yao, Y.J.; Chen, Z.H. Research on Dynamic Vehicle Multi-object Tracking Method Based on DeepSort. Automob. Technol. 2023, 11, 27–33. [Google Scholar]
  37. Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 213–226. [Google Scholar]
  38. Xu, Y.; Osep, A.; Ban, Y.; Horaud, R.; Leal-Taixé, L.; Alameda-Pineda, X. How to train your deep multi-object tracker. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6787–6796. [Google Scholar]
  39. He, S.L.; Zhang, J.J.; Zhang, L.J.; Mo, D.Y. An Improved Vehicle Tracking Algorithm Based on Transformer-Enhanced YOLOv5+DeepSORT. Automot. Technol. 2024, 7, 9–16. [Google Scholar]
  40. Cao, J.; Weng, X.; Khirodkar, R.; Pang, J.; Kitani, K. Observation-centric sort: Rethinking sort for robust multi-object tracking. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9686–9696. [Google Scholar]
Figure 1. The operational steps of the DeepSORT algorithm.
Figure 2. Cascade matching.
Figure 3. Network structure diagram.
Figure 4. FasterNet network structure diagram.
Figure 5. Typical Images from the VeRi-wild Dataset.
Figure 6. The schematic diagram depicting the occurrence of the ID Switch phenomenon.
Figure 7. Comparison of top1 error curves for re-identification training networks.
Figure 8. Comparison of loss curves for re-identification training networks.
Figure 9. Detection and tracking results of video one before algorithm improvement.
Figure 10. Detection and tracking results of video one after algorithm improvement.
Table 1. Original residual network structure.

Layer Name | Kernel Size/Step | Output Size
Conv1 | 3 × 3/1 | 32 × 128 × 64
Conv2 | 3 × 3/1 | 32 × 128 × 64
MaxPool | 3 × 3/2 | 32 × 64 × 32
Residual 1 | 3 × 3/1 | 32 × 64 × 32
Residual 2 | 3 × 3/1 | 32 × 64 × 32
Residual 3 | 3 × 3/2 | 64 × 32 × 16
Residual 4 | 3 × 3/1 | 64 × 32 × 16
Residual 5 | 3 × 3/2 | 128 × 16 × 8
Residual 6 | 3 × 3/1 | 128 × 16 × 8
Dense 10 | - | 128
BN | - | 128
Table 2. FasterNet network structure.

Combination Module | Type | Convolution Kernel | Convolution Stride | Output
Embedding | Conv1 | 4 × 4 | 4 | 128 × 16 × 32
Stage 1 | PConv1 | 3 × 3 | 1 | 128 × 16 × 32
Stage 1 | Conv2 | 1 × 1 | 1 | 256 × 16 × 32
Stage 1 | Conv3 | 1 × 1 | 1 | 128 × 16 × 32
Merging 1 | Conv4 | 2 × 2 | 2 | 256 × 8 × 16
Stage 2 | PConv2 | 3 × 3 | 1 | 256 × 8 × 16
Stage 2 | Conv5 | 1 × 1 | 1 | 512 × 8 × 16
Stage 2 | Conv6 | 1 × 1 | 1 | 256 × 8 × 16
Merging 2 | Conv7 | 2 × 2 | 2 | 512 × 8 × 16
Stage 3 | PConv3 | 3 × 3 | 1 | 512 × 4 × 8
Stage 3 | Conv9 | 1 × 1 | 1 | 512 × 4 × 8
Stage 3 | Conv10 | 1 × 1 | 1 | 512 × 4 × 8
Merging 3 | Conv11 | 2 × 2 | 2 | 1024 × 2 × 4
Stage 4 | PConv4 | 3 × 3 | 1 | 1024 × 2 × 4
Stage 4 | Conv13 | 1 × 1 | 1 | 1024 × 2 × 4
Stage 4 | Conv14 | 1 × 1 | 2 | 1024 × 2 × 4
Classifier | GAP | 4 × 2 | 1 | 1024 × 1 × 1
Classifier | Conv | 1 × 1 | 1 | 1024 × 1 × 1
Classifier | FC | - | - | 1024
Table 3. Information of the three selected tracking datasets from VisDrone-MOT.

Video ID | Number of Frames | Number of Tracked Vehicles | Data Annotations
1 | 145 | 120 | 8547
2 | 265 | 52 | 3061
3 | 420 | 125 | 16,111
Table 4. Experimental platform parameters.

Category | Name | Model Specifications
Hardware Information | CPU | AMD Ryzen 7 5800H with Radeon Graphics
Hardware Information | GPU | NVIDIA GeForce RTX 3060 Laptop GPU, 6 GB
Software Information | OS | Windows 11
Software Information | CUDA | 11.3
Software Information | cuDNN | 8.2.1
Software Information | PyTorch | 1.11.0
Software Information | OpenCV | 4.5.0
Table 5. Comparison of training results for re-identification models.

Re-Identification Network Name | Accuracy/% | Time
6-Layer Residual Network | 90.38 | 25.72
FasterNet | 94.71 | 27.77
Table 6. Results of the improved DeepSORT ablation experiment.

Algorithm Improvement Process | MOTA/% | MOTP/% | ID Sw/Time
DeepSORT | 58.71 | 72.32 | 45
DeepSORT + NSA Kalman | 59.07 | 72.17 | 43
DeepSORT + NSA Kalman + Global Matching Association | 61.67 | 73.41 | 37
Table 7. Comparison experiment results.

Video Numbering | Combination Numbering | MOTA/% | MOTP/% | ID Sw/Time
One | 1 | 57.30 | 72.03 | 49
One | 2 | 60.09 | 73.28 | 42
One | 3 | 61.67 | 75.41 | 37
Two | 1 | 58.11 | 72.06 | 32
Two | 2 | 61.13 | 72.21 | 25
Two | 3 | 63.27 | 75.06 | 20
Three | 1 | 32.65 | 72.41 | 117
Three | 2 | 33.37 | 74.21 | 110
Three | 3 | 37.41 | 75.35 | 109
Table 8. Comparison with existing achievements from the perspective of MOTA.

Algorithm | MOTA/% | MOTP/% | ID Sw/Time
ER-DeepSORT [35] | 37.10 | 83.20 | /
EAMTT [20] | 52.50 | 78.80 | /
DeepMOT [38] | 53.80 | / | 1947
Method in this paper | 61.67 | 73.41 | 37
Table 9. Comparison with existing achievements from the perspective of ID Sw.

Algorithm | MOTA/% | MOTP/% | ID Sw/Time
Transformer-YOLOv5-DeepSORT [39] | 78.68 | / | 339
ByteTrack [14] | 76.6 | / | 159
OC-SORT [40] | 73.1 | / | 250
Method in this paper | 61.67 | 73.41 | 37
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

