Article

ShipMOT: A Robust and Reliable CNN-NSA Filter Framework for Marine Radar Target Tracking

1 School of Computer Science and Technology, Wuhan Institute of Technology, Wuhan 430205, China
2 State Key Laboratory of Maritime Technology and Safety, Wuhan University of Technology, Wuhan 430079, China
3 Wanhua Chemical (Fujian) Terminal Co., Ltd., Fuzhou 350309, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(8), 1492; https://doi.org/10.3390/electronics14081492
Submission received: 13 February 2025 / Revised: 29 March 2025 / Accepted: 31 March 2025 / Published: 8 April 2025
(This article belongs to the Section Artificial Intelligence)

Abstract:
Conventional multi-object tracking approaches frequently exhibit performance degradation in marine radar (MR) imagery due to complex environmental challenges. To overcome these limitations, this paper proposes ShipMOT, an innovative multi-object tracking framework specifically engineered for robust maritime target tracking. The novel architecture features three principal innovations: (1) A dedicated CNN-based ship detector optimized for radar imaging characteristics; (2) A novel Nonlinear State Augmentation (NSA) filter that mathematically models ship motion patterns through nonlinear state space augmentation, achieving a 41.2% increase in trajectory prediction accuracy compared to conventional linear models; (3) A dual-criteria Bounding Box Similarity Index (BBSI) that integrates geometric shape correlation and centroid alignment metrics, demonstrating a 26.7% improvement in tracking stability under congested scenarios. For a comprehensive evaluation, a specialized benchmark dataset (Radar-Track) is constructed, containing 4816 annotated radar images with scenario diversity metrics, including non-uniform motion patterns (12.7% of total instances), high-density clusters (>15 objects/frame), and multi-node trajectory intersections. Experimental results demonstrate ShipMOT’s superior performance with state-of-the-art metrics of 79.01% HOTA and 88.58% MOTA, while maintaining real-time processing at 32.36 fps. Comparative analyses reveal significant advantages: 34.1% fewer ID switches than IoU-based methods and 28.9% lower positional drift compared to Kalman filter implementations. These advancements establish ShipMOT as a transformative solution for intelligent maritime surveillance systems, with demonstrated potential in ship traffic management and collision avoidance systems.

1. Introduction

In maritime transportation, continuously tracking and monitoring ship positions near ports and in crowded waters is of great significance for ensuring navigation safety and improving the efficiency of port operations. Marine radar (MR) is extensively utilized in ship navigation, collision avoidance, and ship surveillance [1,2]. Shore-based MR likewise plays a significant role, as it can continuously detect ships across wide water areas in bad weather and poor visibility; its imaging area is wide, and its imaging quality is stable at relatively short observation distances. Compared with detection technologies such as the Automatic Identification System (AIS) and Very High Frequency radio (VHF), MR does not require real-time information responses from ships, offering simpler and more efficient detection.
Typically, MR images consist of two-dimensional data represented as a succession of light spots. These light-spot data can be processed to produce radar images with varying resolutions. Furthermore, by superimposing and analyzing a series of radar images captured at different time intervals, the positions and movement trajectories of ships over time can be reconstructed.
In maritime surveillance based on MR, traditional ship trajectory tracking involves human operators visually observing ship positions on radar images at different times and relying on experience to determine the corresponding trajectories. This approach has a high error rate and depends heavily on operator experience. With the rapid development of deep learning methods in recent years, deep learning-based techniques have achieved outstanding performance in areas such as object recognition and tracking. Compared to traditional manual supervision methods, deep learning-based object tracking algorithms can realize automated ship trajectory tracking on MR images, greatly reducing labor intensity and improving tracking accuracy.
However, keeping the trajectory tracking of multiple targets stable in crowded waters involves several difficulties. First, the nonlinear and unpredictable motion of ships introduces considerable errors into linear prediction models, making trajectory association prone to failure. Second, when ships navigate along dense and intersecting paths, they may pass close to one another, which can lead to identity (ID) switches.
For multi-object tracking tasks in the context of port areas and other ship-dense water regions within MR images, this paper proposes a customized tracker algorithm for ship multi-object tracking (ShipMOT). Compared with previous studies, the improvements of this approach are as follows:
(1)
In the aspect of motion prediction in the tracker, this approach introduces a new motion filtering prediction method. Compared to the traditional Kalman filter (KF), it can incorporate the detection confidence of bounding boxes, adaptively adjusting the observation noise of the filter to reduce the offset error between the filtered output trajectory and the actual trajectory, thereby decreasing the number of ship ID switches.
(2)
In the data association part of the tracker, this algorithm employs a Bounding Box Similarity Index (BBSI) to replace the traditional Intersection over Union (IoU) cost, reducing ID switches during dense and crossing ship navigation.
(3)
To evaluate the practical effectiveness of the algorithm, this research establishes the Radar-Track dataset, consisting of 4816 real-world MR images. Scene generalization is performed on this dataset to facilitate the training and validation of various types of algorithms.

2. Literature Review

2.1. Multi-Object Tracking Methods Based on MR Images

Multi-object tracking methods for MR images can be primarily categorized into correlation filter-based approaches and deep learning-based approaches. Correlation filter-based methods are classical target tracking methods for processing MR images. Multi-target measurements from static Doppler radar are usually affected by noise, missed detections, and false positives. Do and Nguyen [3] introduced an approximate multi-target Bayesian filter into the radar measurement process. Yuan et al. [4] applied a vision-based correlation filter to MR, improving tracking robustness by adding a motion regularization term and a time consistency constraint. To address the low signal-to-noise ratio (SNR) in radar images, Guerraou et al. [5] utilized a track-before-detect algorithm based on particle filters, achieving promising results in real-world scenarios. Fowdur et al. [6] implemented clustering methods and a Principal Axis KF to detect and associate data from point cloud information provided by ground radar stations, proposing a new framework for object tracking.
In recent years, deep learning-based tracking methods have gradually been applied to MR. In 2023, Kim et al. [7] proposed a multi-object tracking method that combined an extended KF with deep learning, validating its feasibility using real coastal environment radar datasets. Kim et al. [8] developed a DPSE-Net for ship instance segmentation and proposed a new data association metric for MR images based on DeepSORT, demonstrating superior performance in target segmentation and tracking within MR.
Transformer-based trackers are increasingly being applied in maritime domains. These transformers, trained on extensive datasets, facilitate global association across extended sequences, thereby effectively modeling target trajectories within specific scenarios. Compared to traditional filter-based approaches, transformer trackers exhibit superior adaptability in complex environments. For multi-target tracking of marine animals, Hao et al. [9] introduced a memory aggregation module utilizing LSTM networks integrated into transformers. This integration aims to preserve and aggregate features across multiple frames, enhancing tracking accuracy. In underwater fish tracking, Li et al. [10] developed a transformer-based multi-fish tracking model incorporating a multi-association approach. By integrating simple IoU matching within the ID matching module, this approach improved robustness. For the visual tracking of unmanned surface vehicles, Din et al. [11] proposed the SeqTrack tracker, which leveraged transformers to achieve state-of-the-art performance, particularly under challenging sandstorm conditions, outperforming existing methodologies. However, it is important to note that transformer-based trackers necessitate vast quantities of high-quality training data. Furthermore, their reliance on self-attention mechanisms imposes significant computational demands. Another limitation pertains to their generalization capabilities: these trackers often encounter challenges when applied to tracking tasks outside their designated training scenarios.
The disadvantage of correlation filtering methods lies in their limited adaptability to complex scenarios. Once the scenario differs significantly from the preset model, prediction performance degrades severely. In contrast, rapidly developing deep learning technologies have demonstrated outstanding performance in tasks such as object detection and tracking. As long as there are sufficient datasets and appropriate training, deep learning-based object tracking methods can achieve tracking performance comparable to, or even surpassing, that of correlation filtering methods, particularly in terms of scene adaptability. Currently, research on deep learning-based MR target tracking is still relatively scarce, presenting substantial opportunities for future exploration.

2.2. Image-Based Deep Learning Multi-Object Tracking Methods

In the field of pedestrian tracking, research on multi-target tracking using deep learning has advanced rapidly. Pedestrian tracking and ship tracking share clear similarities: both involve recognizing targets within images, assigning a unique ID to each target, and maintaining consistent IDs as the targets move. This paper therefore draws on related research methods from pedestrian tracking to design a multi-target ship tracking algorithm for the MR scene.
In 2016, Bewley et al. [12] designed SORT, a two-stage multi-object tracking algorithm that combines Faster R-CNN [13] as the detector with a KF [14] and the Hungarian algorithm [15]. This algorithm performs effectively in scenarios where objects move at a constant linear velocity. To reduce ID switches, DeepSORT [16] was developed, adding a person re-identification (ReID) model to SORT. ByteTrack [17] addressed tracking failures due to occlusions by introducing a second matching round for low-confidence detection boxes. Cao et al. [18] developed OC-SORT, which incorporates consistency in direction between detection boxes and trajectories to reduce KF noise in direction prediction. StrongSORT [19] was proposed to tackle the lower tracking accuracy of DeepSORT in complex scenarios by replacing the original CNN feature extraction network with a stronger BOT [20] extractor and employing Enhanced Correlation Coefficient Maximization [21] for camera motion compensation. These algorithms have achieved promising results on public MOT pedestrian tracking datasets [22,23].

2.3. Nonlinear State Augmentation (NSA) Filter

Traditional linear filtering methods, such as the KF, often struggle with the challenges posed by nonlinear problems. To address this limitation, researchers have developed various techniques in recent years. One particularly promising approach is the Nonlinear State Augmentation (NSA) filter, which has attracted considerable attention for its effectiveness in managing targets with nonlinear movement patterns [24].
The NSA filter’s primary innovation lies in its ability to improve trajectory prediction accuracy through the adaptive adjustment of observation noise [25]. This feature makes it especially suitable for tracking nonlinearly moving targets. Unlike conventional methods, the NSA filter incorporates a mechanism for calculating an adaptive factor. This process not only effectively mitigates trajectory prediction errors caused by nonlinear target dynamics but also introduces only a marginal increase in computational overhead compared to the linear KF. Consequently, it ensures the system maintains lightweight performance characteristics.
By achieving a balance between precision and computational efficiency, the NSA filter significantly enhances the adaptability and accuracy of tracking systems in complex, dynamic environments. Its capability to dynamically adjust to varying conditions represents a significant advancement in the field of target tracking, making it an essential tool for applications requiring robust performance under nonlinear and rapidly changing scenarios. For instance, Du et al. [26] proposed GIAOTracker, which used the NSA Filter to improve tracking stability in complex scenarios.
In target tracking tasks, the detector is responsible for locating the target in the image, while the tracker is responsible for associating existing tracks with new detection boxes. In terms of detection, the target detection methods based on deep learning have achieved promising experimental results in the task of image detection. Therefore, in order to achieve efficient and stable target tracking, improving the tracker of the target tracking algorithm is an important development direction.

3. A Proposed Approach

The ShipMOT proposed in this paper is a tracking-by-detection algorithm: each video frame is first fed into a detector to obtain detection bounding boxes, which are then passed to a tracker to produce tracking results. The specific operational flow is illustrated in Figure 1. In the first step, the radar image from frame t is input into the detector to obtain ship detection boxes. Concurrently, ship trajectories from frame t−1 are processed through the NSA filter for motion prediction, yielding predicted trajectory boxes. In the second step, the algorithm performs its first data association by matching high-confidence detection boxes with predicted trajectory boxes using a matching cost derived from the BBSI. Trajectories and detections that match successfully at this stage are forwarded directly to the trajectory management phase. Step three involves a secondary data association, in which low-confidence detection boxes are matched with tracks that failed to match during the initial association; this stage employs IoU as the matching criterion. Finally, in the fourth step, the algorithm manages all matched and unmatched trajectories and detection boxes. Successfully matched detection boxes and trajectories are updated via the NSA filter and then output. High-confidence detection boxes that do not match any existing track are initialized as new trajectories, whereas their low-confidence counterparts are deleted. When a ship's radar signal is temporarily obscured or lost, ShipMOT employs NSA filtering to continuously predict the ship's trajectory, maintaining an uninterrupted path over time. This effectively mitigates the trajectory fragmentation that arises from signal interruptions: by leveraging the adaptive adjustment of observation noise inherent to NSA filtering, ShipMOT can accurately estimate and predict ship movement even under temporary signal loss.
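As a rough illustration of steps two and three, the two-stage association above can be sketched in Python. This is a simplified sketch under stated assumptions: it uses plain IoU and a greedy matcher for both stages (the actual first stage uses the BBSI cost and, typically, Hungarian assignment), and the `high` confidence split and `thresh` match threshold are illustrative values, not the paper's settings.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-7)

def greedy_match(tracks, dets, score_fn, thresh):
    """Greedily pair tracks with detections in order of descending similarity."""
    pairs, used_t, used_d = [], set(), set()
    scored = sorted(((score_fn(t, d), ti, di)
                     for ti, t in enumerate(tracks)
                     for di, d in enumerate(dets)), reverse=True)
    for s, ti, di in scored:
        if s < thresh or ti in used_t or di in used_d:
            continue
        pairs.append((ti, di)); used_t.add(ti); used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(dets)) if i not in used_d]
    return pairs, unmatched_t, unmatched_d

def associate(tracks, dets, confs, high=0.6, thresh=0.3):
    """Two-stage association: high-confidence detections first, then low."""
    hi = [i for i, c in enumerate(confs) if c >= high]
    lo = [i for i, c in enumerate(confs) if c < high]
    # Stage 1: high-confidence detections vs. all predicted track boxes.
    p1, ut, _ = greedy_match(tracks, [dets[i] for i in hi], iou, thresh)
    # Stage 2: still-unmatched tracks vs. low-confidence detections (IoU only).
    p2, ut2, _ = greedy_match([tracks[i] for i in ut],
                              [dets[i] for i in lo], iou, thresh)
    matches = [(t, hi[d]) for t, d in p1] + [(ut[t], lo[d]) for t, d in p2]
    return matches, [ut[i] for i in ut2]
```

Unmatched high-confidence detections would then seed new tracks, and unmatched tracks would continue on NSA predictions, as described in step four.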
The tracker design of the ShipMOT algorithm draws inspiration from ByteTrack, while the detector employs YOLOv7 [27]. YOLOv7 uses an Extended Efficient Layer Aggregation Network to achieve effective feature extraction for small ship targets. Additionally, by relying on a single-stage detection architecture, YOLOv7 offers faster operational capabilities compared to two-stage detection algorithms. Other dedicated detectors might also be applicable within this framework, such as MRNet [28]. Notably, the NSA filter and BBSI [29] are the improved modules introduced in this research.

3.1. Nonlinear State Augmentation (NSA) Filtering

For the nonlinear motion of ships, increasing the trust placed in high-confidence detection boxes can mitigate the error that the linear model introduces into the final output trajectories. The specific principle of NSA filtering is as follows:
Drawing inspiration from the initial state vector of the KF in ByteTrack, the initial state vector for NSA filtering is defined as (x, y, a, h, v_x, v_y, v_a, v_h). Here, x and y represent the horizontal and vertical coordinates of the center point of the target box, a denotes the width-to-height ratio of the target box, and h represents its height. v_x, v_y, v_a, and v_h represent the respective rates of change of x, y, a, and h over time.
NSA filtering comprises two primary stages: prediction and update. During the prediction stage, all trajectories are forecasted using a linear model to generate predicted trajectory boxes. In the update stage, only the matched detection boxes and trajectories are utilized. The objective of this stage is to merge each detection box with its corresponding predicted trajectory box using a specific weighting scheme to produce the final ship trajectory. The prediction model used in the prediction stage is expressed by Equation (1).
$$\text{predict:}\quad \hat{x}_k = F_k \hat{x}_{k-1}, \qquad \hat{P}_k = F_k \hat{P}_{k-1} F_k^{T} + Q_k \tag{1}$$
where x̂_k and x̂_{k−1} represent the state estimate vectors of the trajectory at times k and k−1, respectively; P̂_k and P̂_{k−1} denote the estimated covariance matrices of the trajectory at times k and k−1, respectively; F_k is the state transition matrix, and Q_k is the system process noise at time step k.
$$F_k = \begin{bmatrix} I_{4\times 4} & I_{4\times 4} \\ 0_{4\times 4} & I_{4\times 4} \end{bmatrix} \tag{2}$$
$$\hat{x}_1 = \left(x, y, a, h, v_x, v_y, v_a, v_h\right)^{T} \tag{3}$$
$$\alpha = \frac{h}{std\_position} \tag{4}$$
$$\beta = \frac{h}{std\_velocity} \tag{5}$$
$$\hat{P}_1 = \mathrm{diag}\left(2\alpha,\ 2\alpha,\ 10^{-2},\ 2\alpha,\ 10\beta,\ 10\beta,\ 10^{-5},\ 10\beta\right) \tag{6}$$
$$Q_k = \mathrm{diag}\left(\alpha,\ \alpha,\ 10^{-2},\ \alpha,\ \beta,\ \beta,\ 10^{-5},\ \beta\right) \tag{7}$$
Referring to the parameters used in ByteTrack, std_position is set to 20, while std_velocity is set to 160.
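A minimal NumPy sketch of the prediction stage (Equations (1)–(7)) follows. One assumption is flagged: following ByteTrack's reference implementation, the diagonal entries of P̂₁ and Q_k are treated here as standard deviations and squared to form covariances; whether the paper intends them squared is not stated explicitly.

```python
import numpy as np

STD_POSITION, STD_VELOCITY = 20.0, 160.0  # ByteTrack-style parameters

def init_state(x, y, a, h):
    """Initial 8-D state (x, y, a, h, vx, vy, va, vh) and covariance P̂₁."""
    alpha, beta = h / STD_POSITION, h / STD_VELOCITY
    state = np.array([x, y, a, h, 0.0, 0.0, 0.0, 0.0])
    P = np.diag(np.square([2 * alpha, 2 * alpha, 1e-2, 2 * alpha,
                           10 * beta, 10 * beta, 1e-5, 10 * beta]))
    return state, P

def predict(state, P):
    """Constant-velocity predict step: x̂ = F x, P̂ = F P Fᵀ + Q."""
    F = np.eye(8)
    F[:4, 4:] = np.eye(4)              # position += velocity (dt = 1 frame)
    h = state[3]
    alpha, beta = h / STD_POSITION, h / STD_VELOCITY
    Q = np.diag(np.square([alpha, alpha, 1e-2, alpha,
                           beta, beta, 1e-5, beta]))
    return F @ state, F @ P @ F.T + Q
```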
Furthermore, the formula used in the update stage is shown in Equation (8).
$$\text{update:}\quad K = \hat{P}_k H_k^{T}\left(H_k \hat{P}_k H_k^{T} + R_k\right)^{-1},\qquad x_k = \hat{x}_k + K\left(z_k - H_k \hat{x}_k\right),\qquad P_k = \left(I - K H_k\right)\hat{P}_k \tag{8}$$
$$H_k = \begin{bmatrix} I_{4\times 4} & 0_{4\times 4} \end{bmatrix} \tag{9}$$
$$R_k = \mathrm{diag}\left(\alpha^{2},\ \alpha^{2},\ 10^{-2},\ \alpha^{2}\right) \tag{10}$$
where K is the Kalman gain, H_k is the mapping (observation) matrix, and R_k is the system measurement noise at time k. x̂_k and P̂_k are the state vector and covariance matrix of the trajectory prediction box at time k, respectively. z_k is the state vector of the detection box at time k, while x_k and P_k are the state vector and covariance matrix of the final output trajectory, respectively.
In Equation (8), the Kalman gain K is calculated from P̂_k, H_k, and the measurement noise R_k. K is an 8 × 4 matrix that updates the center-point position, aspect ratio, and height of the bounding box, which occupy the first four rows of the state vector and covariance matrix. The Kalman gain is the ratio of the covariance of the state vector to the sum of that covariance and the observation noise, and it represents the level of trust placed in the detected bounding box. The covariance of the state vector indicates the uncertainty in the system's predicted state: the larger it is, the greater the uncertainty in the predicted trajectory. Consequently, the higher the Kalman gain, the more trust is placed in the detected bounding box and the less in the trajectory from the prediction model. Using the Kalman gain, the NSA filter merges the state vectors and covariances of the detection box and the predicted trajectory box with specific weights to derive the final output trajectory.
In the traditional KF, the system measurement noise R_k is often set as a fixed parameter. However, in reality, system noise varies over time. A static noise level can lead to inaccuracies in estimating the true trajectory box. To address this issue, this paper enhances the trust weight of the detection box during the update stage of the KF, thereby minimizing the offset error between the final output trajectory and the actual trajectory. Specifically, NSA filtering adjusts the system measurement noise based on the detection confidence of the target box, enabling adaptive adjustment of the detection box's trust weight, as illustrated in Equation (11).
$$\tilde{R}_k = \left(1 - c_k\right) R_k \tag{11}$$
where c_k is the detection confidence of the target box at time k, R̃_k is the modulated system measurement noise, and R_k is the original fixed system measurement noise. From Equation (11), it can be seen that as the detection confidence increases, the observation noise of the system decreases. According to Equation (8), a reduction in system measurement noise increases the Kalman gain, causing the state vector x_k of the final output trajectory to move closer to the detection box.
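The adaptive update of Equations (8)–(11) can be sketched as follows. This is an illustrative sketch, not the authors' code; `alpha` is the h/std_position factor defined earlier, and the 8-dimensional state layout matches Equation (3).

```python
import numpy as np

def nsa_update(state, P, z, conf, alpha):
    """NSA update step: measurement noise is scaled by (1 - confidence)."""
    H = np.hstack([np.eye(4), np.zeros((4, 4))])       # observe (x, y, a, h)
    R = np.diag([alpha**2, alpha**2, 1e-2, alpha**2])  # base measurement noise
    R_tilde = (1.0 - conf) * R                         # Eq. (11): adaptive noise
    S = H @ P @ H.T + R_tilde
    K = P @ H.T @ np.linalg.inv(S)                     # Kalman gain (8x4)
    new_state = state + K @ (z - H @ state)
    new_P = (np.eye(8) - K @ H) @ P
    return new_state, new_P
```

With confidence near 1 the scaled noise vanishes and the output snaps to the detection box; with low confidence the output stays closer to the linear prediction, which is exactly the behavior argued for in the text.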
The performance of several filtering models in the context of curved ship motion is illustrated in Figure 2. It is assumed that the detection boxes from previous frames represent the ship’s trajectory during those frames. Upon reaching the present frame, the ship’s trajectory is initially predicted based on its past motion direction using filtering methods. Both NSA filtering and the KF employ linear models for prediction, yielding a trajectory forecasted by the Kalman linear model. In the process of associating detection boxes with predicted trajectories, the present frame’s detection box and the predicted trajectory are fused and updated to obtain the final output trajectory of the ship, completing the tracking task. NSA filtering dynamically adjusts the trust weight of the detection boxes, which results in an output ship trajectory that aligns more closely with the actual detection boxes. This dynamic adjustment allows NSA filtering to adapt to changes in the target’s movement more effectively. Conversely, the KF relies on a fixed update scale, which causes the output trajectory to remain closer to the predictions made by its linear model. When the target detection is effective, the detection bounding box closely approximates the true position of the ship. The Kalman linear model shows the highest forecast trajectory offset error. During the filter update stage, both NSA and KF reduce the final output trajectory offset error compared to the linear model by integrating the bounding box information with the linear model-predicted trajectory.
However, compared to the KF, the NSA filter assigns greater weight to the confidence of the detection bounding boxes. As can be observed from Figure 2, the output trajectory of the NSA filter exhibits less offset error than that of the traditional KF. In scenarios where target detection performance is high, this approach can mitigate the impact of system prediction errors on the final output trajectory. In cases of large prediction error from the linear model, such as during nonlinear ship maneuvers, the NSA filter demonstrates better tracking robustness than the traditional KF.
When faced with sudden changes in ship motion direction, the linear prediction model employed by the KF frequently fails to accurately perceive these abrupt shifts due to its reliance on a fixed prediction scale. This limitation often results in significant prediction errors. In contrast, localization information derived from detected bounding boxes offers immediate feedback and remains unaffected by nonlinear navigation routes or sudden changes in motion direction. Consequently, it achieves higher accuracy in ship positioning during complex maneuvers.
Compared to the traditional KF, the NSA filter can adaptively enhance the utilization of high-quality bounding box localization information. Specifically, during sudden changes in ship motion direction, the NSA filter intensively leverages detection information characterized by immediate feedback and high localization accuracy. This intensive use allows the NSA filter to exhibit heightened sensitivity to changes in ship motion. As a result, the NSA filter effectively mitigates prediction errors that would otherwise appear in the final output trajectory when using a linear prediction model.

3.2. The Bounding Box Similarity Index (BBSI)

In scenarios characterized by dense and intersecting maritime traffic, a single detection box may exhibit high IoU with several trajectory prediction boxes during the data association stage. Relying solely on IoU for associating detection and trajectory boxes may therefore lead to incorrect matches in such complex environments. To address this issue, the ShipMOT algorithm introduces the BBSI during the first data association process as an advanced alternative to conventional IoU-based matching. This approach aims to enhance the accuracy and reliability of data association in complex maritime environments, thereby improving the overall performance of object tracking. The BBSI not only evaluates the overlap area but also integrates additional parameters that contribute to the similarity assessment between detection and prediction bounding boxes. Consequently, it offers a more resilient solution for maritime object tracking applications.
Specifically, BBSI is defined as follows:
$$x_{bottom} = \min\left(x_{bbox1,rb},\ x_{bbox2,rb}\right) \tag{12}$$
$$x_{top} = \max\left(x_{bbox1,tl},\ x_{bbox2,tl}\right) \tag{13}$$
$$y_{bottom} = \min\left(y_{bbox1,rb},\ y_{bbox2,rb}\right) \tag{14}$$
$$y_{top} = \max\left(y_{bbox1,tl},\ y_{bbox2,tl}\right) \tag{15}$$
$$h_{eff} = \max\left(0,\ x_{bottom} - x_{top}\right) \tag{16}$$
$$w_{eff} = \max\left(0,\ y_{bottom} - y_{top}\right) \tag{17}$$
$$S_h = \frac{h_{eff}}{h_{eff} + \left|h_{bbox2} - h_{bbox1}\right| + \epsilon} \tag{18}$$
$$S_w = \frac{w_{eff}}{w_{eff} + \left|w_{bbox2} - w_{bbox1}\right| + \epsilon} \tag{19}$$
$$S_c = \frac{\left|x_{bbox1,c} - x_{bbox2,c}\right| + \left|y_{bbox1,c} - y_{bbox2,c}\right|}{w_c + h_c} \tag{20}$$
$$BBSI = IoU + S_h + S_w - S_c \tag{21}$$
Figure 3 illustrates the calculation schematic of the BBSI. In Equation (12), x_{bbox1,rb} is the x-coordinate of the bottom-right corner of bbox1, and x_{bottom} is the minimum of the x-coordinates of the bottom-right corners of bbox1 and bbox2. In Equation (13), x_{bbox1,tl} represents the x-coordinate of the top-left corner of bbox1. Similarly, in Equations (14) and (15), y_{bbox1,rb} and y_{bbox1,tl} represent the y-coordinates of the bottom-right and top-left corners of bbox1, respectively. In Equation (16), if bbox1 and bbox2 share an overlapping region, h_{eff} is defined as the height of their intersection; otherwise, h_{eff} is 0. Analogously, in Equation (17), if there is an overlap between bbox1 and bbox2, w_{eff} is the width of their intersection area; otherwise, w_{eff} is 0.
In Equation (18), S_h represents the height similarity between bbox1 and bbox2. S_h approaches 1 when the two boxes overlap and have similar heights, whereas it approaches 0 if they do not overlap or have markedly different heights. Similarly, in Equation (19), S_w denotes the width similarity between bbox1 and bbox2. The parameter ε is set to 10⁻⁷ in Equations (18) and (19) to prevent division by zero. Equation (20) defines S_c, the normalized center distance between bbox1 and bbox2: it approaches 0 when the center points of the two bounding boxes are close and grows as they move apart, so it acts as a penalty term in Equation (21). Finally, Equation (21) defines the BBSI, synthesizing these metrics to provide a comprehensive measure of bounding box similarity.
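Equations (12)–(21) can be sketched in Python as follows. Two assumptions are flagged: boxes are taken in the conventional (x1, y1, x2, y2) image layout, so the intersection height is computed from the y-extent, and w_c and h_c in Equation (20) are interpreted as the width and height of the smallest box enclosing both inputs.

```python
def bbsi(b1, b2, eps=1e-7):
    """BBSI between two boxes given as (x1, y1, x2, y2); sketch of Eqs. (12)-(21)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    w_eff = max(0.0, ix2 - ix1)                       # intersection width
    h_eff = max(0.0, iy2 - iy1)                       # intersection height
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    inter = w_eff * h_eff
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    s_h = h_eff / (h_eff + abs(h2 - h1) + eps)        # height similarity, Eq. (18)
    s_w = w_eff / (w_eff + abs(w2 - w1) + eps)        # width similarity, Eq. (19)
    # Center-distance penalty, normalized by the enclosing box size (assumption).
    cx1, cy1 = (b1[0] + b1[2]) / 2, (b1[1] + b1[3]) / 2
    cx2, cy2 = (b2[0] + b2[2]) / 2, (b2[1] + b2[3]) / 2
    w_c = max(b1[2], b2[2]) - min(b1[0], b2[0])
    h_c = max(b1[3], b2[3]) - min(b1[1], b2[1])
    s_c = (abs(cx1 - cx2) + abs(cy1 - cy2)) / (w_c + h_c + eps)
    return iou + s_h + s_w - s_c
```

Identical boxes score close to 3 (IoU, S_h, and S_w each near 1, with no center penalty), while disjoint boxes can score below 0, giving the cost matrix a wider dynamic range than IoU alone.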
In scenarios involving dense and intersecting maritime traffic, a ship’s detection bounding box may exhibit high IoU values with trajectory bounding boxes corresponding to other ships. As discussed above, the BBSI evaluates additional dimensions beyond IoU by incorporating metrics such as height, width, and center consistency between the detection and predicted trajectory boxes. This approach effectively minimizes erroneous associations between detections and trajectories, thereby yielding superior tracking accuracy compared to relying solely on IoU as the basis for the cost matrix.
In general, BBSI parameters require no manual tuning, which simplifies its application in data association costs. For customized scenarios, if further optimization is needed, weights can be assigned to metrics such as IoU overlap, center consistency, length similarity, and width similarity to emphasize specific dimensions in bounding box matching.

4. A Case Study

4.1. Experiment Preparation

4.1.1. A Dataset

A high-quality MR image dataset, named Radar-Track, has been created by preprocessing data acquired from the JMA5300 MR (Japan Radio Company, Ltd., Chiba, Japan) stationed at the Wusongzhi Wharf in Zhoushan City, Zhejiang Province, China. The Radar-Track dataset comprises 4816 radar images, each with a resolution of 1024 × 1024 pixels. It encompasses a wide range of complex environments and maritime navigation scenarios, including varying weather conditions, imaging conditions, dense maritime traffic, crossing ships, and nonlinear ship movements.
To facilitate the visualization of ship distribution within the radar dataset, Figure 4 illustrates both the spatial distribution and size variations of ships as depicted in radar images. Figure 4a presents the spatial distribution of ships, highlighting their concentration along primary navigation channels, which include multiple nonlinear routes. In certain channels, ships form dense clusters, indicating areas of high maritime traffic. Figure 4b details the size distribution of target bounding boxes. The dimensions of these boxes vary due to changes in motion direction or fluctuations in signal strength. This variability underscores the importance of focusing on ship shapes to more accurately distinguish between different trajectories.
In order to enhance the generalization capability of the dataset, data augmentation techniques such as white balancing, random flipping, and random cropping are employed to expand the diversity of represented scenes. Under varying weather conditions, such as clear, rainy, and foggy days, MR images often exhibit varying tonal levels. Data augmentation techniques based on white balance adjustments can be employed to enrich radar image datasets with diverse weather scenarios, thereby enhancing the generalization capabilities of detection models. Random cropping involves extracting training samples from localized regions of the image. By cropping regions at different scales, the model becomes adept at recognizing multi-scale object characteristics, thereby simulating situations where ships may be partially occluded or only partially visible. Random flipping, which horizontally or vertically mirrors images, simulates targets appearing in different orientations within the radar field of view, enabling the model to adapt to ships with varying trajectory directions. Overall, data augmentation techniques improve the generalization of detection models by focusing on intrinsic object features while reducing overfitting risks.
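As one concrete example of the augmentations described above, a horizontal flip must mirror the bounding box coordinates along with the image. A minimal sketch follows (box coordinates only; the flip probability `p` is an illustrative parameter, not a value from the paper):

```python
import random

def random_hflip(image_width, boxes, p=0.5):
    """With probability p, mirror (x1, y1, x2, y2) boxes about the vertical axis."""
    if random.random() >= p:
        return boxes  # no flip this time
    return [(image_width - x2, y1, image_width - x1, y2)
            for (x1, y1, x2, y2) in boxes]
```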
Furthermore, meticulous annotation of ships within the images has been conducted using bounding boxes, categorical labels, and ID labels. Given that various types of ships exhibit similar point-like features in MR images, all ship categories are uniformly annotated as Boat.
To ensure accurate dataset annotation, real ship GPS signals were collected, and the latitude-longitude coordinates of the ships were converted into real-world Cartesian coordinates centered on the radar using the Mercator projection method [30]. In the real-world coordinate system, the Y-axis points north and the X-axis points east. Since radar images are processed in a planar Cartesian coordinate system, the following formulas are used to transform real-world ship coordinates into radar image coordinates.
$$x = 512 + \frac{x_t}{10.85}$$
$$y = 512 + \frac{y_t}{10.85}$$
The coordinates (512, 512) denote the maritime radar's position in the image. The scale factor 10.85 is the ratio between the radar's real-world scanning range and the image dimensions, calculated from a coverage radius of 5556 m and an image resolution of 1024 × 1024 pixels. $(x_t, y_t)$ denotes the ship's real-world coordinates, while $(x, y)$ represents its image coordinates.
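A minimal sketch of this mapping (the helper name and scalar interface are illustrative, not from the paper):

```python
def world_to_image(x_t, y_t, center=512.0, scale=10.85):
    """Map radar-centered real-world coordinates (meters, X east / Y north)
    to 1024x1024 radar-image pixel coordinates, per the formulas above.
    scale ~= 5556 m coverage radius / 512 px half-width."""
    x = center + x_t / scale
    y = center + y_t / scale
    return x, y
```

For example, a ship 1085 m east of the radar lands at x = 512 + 1085/10.85 = 612 pixels.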
Ship positions are then marked as key points for annotation. During the annotation process, to minimize land background interference, a masking technique based on radar image temporal information is used to obscure land areas and enhance the visibility of ship targets. The specific workflow is illustrated in Figure 5.
From the Radar-Track dataset, 3000 images are selected and partitioned into training and validation sets for detector training at a ratio of 9:1. The remaining 1816 images serve as the test set and are synthesized into four video sequences based on different scenarios. Each video sequence comprises approximately 450 images and contains around 35 ship tracks. A representative sample of images from the dataset is shown in Figure 6.
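The partitioning described above can be sketched as follows (the function name, seeding, and shuffling of the train/validation pool are assumptions; the paper's 1816 test images form four scenario-based video sequences and are kept in their original order here):

```python
import random

def split_radar_track(paths, n_trainval=3000, train_frac=0.9, seed=0):
    """Sketch of the Radar-Track split: 3000 images for training/validation
    at a 9:1 ratio, the remaining 1816 reserved for test video sequences."""
    trainval = list(paths[:n_trainval])
    test = list(paths[n_trainval:])          # 4816 - 3000 = 1816 test images
    rng = random.Random(seed)
    rng.shuffle(trainval)                    # shuffling is an assumption
    n_train = int(len(trainval) * train_frac)
    return trainval[:n_train], trainval[n_train:], test
```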
This comprehensive and versatile dataset has been developed to support the advancement of MR-based tracking algorithms, contributing significantly to the field of maritime surveillance and safety.

4.1.2. Experiment Platform

The experimental platform is based on the Windows 10 operating system, equipped with an NVIDIA GeForce RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA) with 24 GB of effective memory, CUDA 11.6, PyTorch 1.9.1, and Python 3.7.15. The central processing unit (CPU) is an Intel Core i7-8750H (Intel Corporation, Santa Clara, CA, USA) operating at 2.20 GHz.
During model training, the Adam optimizer is introduced to adaptively adjust the learning rate, thereby avoiding the instability of the model momentum caused by a fixed learning rate. Additionally, the online Mosaic data augmentation technique is employed to splice the dataset through random scaling, random cropping, and random arrangement, enriching the samples in the training set and enhancing the robustness of the model [31].

4.1.3. Target Detection

ShipMOT employs YOLOv7 as its detector for ship target detection, as it is widely acknowledged as an efficient detection method. Details of the model training process are presented in Figure 7. Other CNN-based algorithms are also applicable [27].
As shown in Figure 7a, the bounding box loss on the ship validation set has stabilized, indicating that the detection model has converged to a relatively stable state and achieved a good fit to the training data. As shown in Figure 7b, the training process for detecting ship targets has effectively reached convergence. The results indicate that the mean Average Precision (mAP@0.5) for detecting ship targets reaches 0.93. This detection performance is sufficient to support the object tracking task and serves as the foundational detection weights for subsequent validation. A visualization of the ship target detection results is presented in Figure 8.

4.2. Experiment Results

In this experiment, the tracking performance of the algorithms is evaluated using four metrics: High-level Object Tracking Accuracy (HOTA), Multiple Object Tracking Accuracy (MOTA), the number of ID Switches (ID Switch), and the Identification F1 Score (IDF1). HOTA serves to balance object detection precision with association accuracy, while the ID Switch emphasizes the capability to maintain identity consistency.
$$\mathrm{HOTA} = \mathrm{DetA} \times \mathrm{AssA} \times \mathrm{LocA}$$
$$\mathrm{DetA} = \frac{TP}{TP + FN + FP}$$
$$\mathrm{AssA} = \frac{TPA}{TPA + FNA + FPA}$$
$$\mathrm{LocA} = \frac{1}{n}\sum_{i=1}^{n} IoU_i$$
HOTA comprises three components: the DetA detection score, the AssA association score, and the LocA localization score. In DetA, TP denotes the number of correctly predicted detection samples, while FP and FN represent the numbers of false positives and false negatives, respectively. For AssA, TPA indicates the number of correct associations, whereas FPA and FNA denote the numbers of incorrect and missed associations, respectively. In LocA, $IoU_i$ refers to the IoU value calculated for each detected target with respect to its corresponding ground-truth bounding box. DetA represents the detection reliability of the tracking algorithm, AssA its identity association reliability, and LocA its localization reliability. MOTA places greater emphasis on the overall performance of detection and association. ID Switch counts the number of times a trajectory's identity changes, directly reflecting the capability to maintain identity consistency. Frames per second (FPS) denotes the inference speed of a tracking algorithm, measured as the number of frames processed per second.
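For a single matching threshold, the component scores above reduce to simple ratios over the TP/FP/FN counts; a minimal sketch (function names are illustrative, not from any tracking library):

```python
def det_a(tp, fn, fp):
    # detection accuracy: correct detections over all detection decisions
    return tp / (tp + fn + fp)

def ass_a(tpa, fna, fpa):
    # association accuracy: correct associations over all association decisions
    return tpa / (tpa + fna + fpa)

def loc_a(ious):
    # localization accuracy: mean IoU over matched detections
    return sum(ious) / len(ious)
```

In practice these counts are accumulated over a sweep of IoU thresholds (as in the official HOTA evaluation code) rather than computed once.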
This research conducts comparative experiments between the classic tracking algorithms DeepSORT, ByteTrack, StrongSORT, OC-SORT, and C-BIoU [32] and the proposed ShipMOT algorithm, aiming to verify the effectiveness of ShipMOT. Furthermore, ablation experiments are designed to evaluate the performance of each component of ShipMOT. The ship tracking performance of ShipMOT under diverse maritime navigation conditions is vividly illustrated, demonstrating its superior tracking capabilities in challenging environments.

4.2.1. Target Tracking

To evaluate the performance of the ShipMOT tracker, all algorithms listed in Table 1 utilize an identical detector and detection weights as those used by ShipMOT. Regarding the ReID models for DeepSORT and StrongSORT, this research extracts 75 ship trajectories from the Radar-Track dataset. These trajectories are formatted to conform to the structure of the Market1501 dataset and are subsequently used to train their respective ReID models. The training procedure is executed over 60 epochs. The data presented in Table 1 reflect the mean values of various tracking metrics, averaged across multiple test video sequences. It ensures a standardized comparison framework, highlighting the distinctive advantages of the ShipMOT algorithm under consistent evaluation conditions.
As shown in Table 1, the ShipMOT algorithm achieves superior metrics on the Radar-Track dataset compared to the other algorithms. Specifically, compared to OC-SORT, which ranks second in overall tracking performance, ShipMOT demonstrates an improvement of 1.74% in HOTA and a reduction of 25 ID switches. This shows that ShipMOT performs better at ship tracking in MR images. Since ShipMOT uses only motion features for its data association costs, it holds a significant advantage in computational efficiency over methods reliant on ReID models. Among lightweight trackers, ShipMOT exhibits moderate runtime performance, owing to the lightweight ByteTrack architecture on which it is based and its avoidance of computationally intensive modules.

4.2.2. Ablation Study

In order to evaluate the performance of each component of the proposed method, an ablation study is conducted using the Radar-Track dataset. The experiment process involves incrementally integrating various improvements to the baseline tracker, ByteTrack, and evaluating their impact on tracking metrics.
The findings of the ablation study are presented in Table 2. M1 denotes the baseline model, which is the traditional ByteTrack algorithm. Based on the model M1, M2 substitutes the KF used in ByteTrack with NSA filtering. The performance of M2 exhibits comprehensive improvements, most notably a 2.7% increase in HOTA, highlighting the efficacy of NSA filtering in enhancing tracking accuracy and stability.
M3 modifies the initial round of data association in M1 by replacing IoU matching with BBSI matching. After this improvement, all metrics of M3 show enhancements; in particular, the number of ID switches is reduced by 27 instances (from 82 to 55). By incorporating width similarity, height similarity, and center consistency between detection boxes and trajectories in addition to IoU, BBSI effectively reduces the number of erroneous trajectory associations. Compared to IoU matching, BBSI introduces additional computational overhead; however, the extra calculations for center consistency and height and width similarity involve only linear-complexity multiplication and division operations, so the increase in computational load remains minimal. Experimental results indicate that, compared to the original ByteTrack algorithm, the processing speed of M3 decreases by only 2.3 fps, maintaining an overall rate of 32.56 fps. Given that BBSI reduces ID switches by approximately 33%, this computational cost is justified.
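As a hedged illustration of why the overhead is small, a BBSI-style score combining IoU with width similarity, height similarity, and center consistency needs only a constant number of multiplications and divisions per detection-trajectory pair. The component definitions and equal weighting below are assumptions for illustration; they do not reproduce the paper's exact BBSI formula:

```python
def box_similarity(det, trk):
    """BBSI-style association score for two boxes given as (x1, y1, x2, y2).
    Sketch only: equal weighting of the four terms is an assumption."""
    ax1, ay1, ax2, ay2 = det
    bx1, by1, bx2, by2 = trk
    # IoU term
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # width/height similarity: ratio of the smaller to the larger dimension
    w_sim = min(ax2 - ax1, bx2 - bx1) / max(ax2 - ax1, bx2 - bx1)
    h_sim = min(ay2 - ay1, by2 - by1) / max(ay2 - ay1, by2 - by1)
    # center consistency: squared center distance normalized by the
    # diagonal of the smallest enclosing box (as in DIoU-style penalties)
    cxa, cya = (ax1 + ax2) / 2, (ay1 + ay2) / 2
    cxb, cyb = (bx1 + bx2) / 2, (by1 + by2) / 2
    diag2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    center = 1.0 - ((cxa - cxb) ** 2 + (cya - cyb) ** 2) / diag2
    return (iou + w_sim + h_sim + center) / 4.0
```

Identical boxes score 1.0; non-overlapping boxes of the same size are still separated by the center term, which is what lets same-shape radar blips on divergent paths be distinguished.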
M4 represents the proposed ShipMOT method, which integrates both NSA filtering and BBSI improvements. This method demonstrates the best performance among all evaluated models. These findings underscore the contributions of NSA filtering and BBSI matching in improving the robustness and reliability of multi-object tracking algorithms.
To comprehensively assess the multi-object tracking performance of the NSA filter relative to other advanced nonlinear motion filters, the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF) are utilized as comparative baselines. Leveraging the ByteTrack framework, the motion filter is sequentially replaced with NSA, EKF, and UKF to conduct a systematic comparison. The tracking performance of the NSA filter is evaluated using four key metrics: HOTA, MOTA, ID Switch, and FPS.
As shown in Table 3, the NSA filter’s HOTA, MOTA, and ID Switch metrics all rank second among the compared filters, with HOTA only 0.44% lower than the UKF. The NSA filter slightly lags behind the UKF in tracking accuracy. However, it introduces only one additional parameter compared to the KF, resulting in significantly better lightweight performance than the UKF, which requires computationally intensive sampling point processing. In extreme motion scenarios involving unpredictable ship maneuvers or abrupt trajectory changes, the EKF struggles to promptly fit ship motion paths, even when using a first-order Jacobian matrix approximation. In contrast, the NSA filter leverages real-time feedback from detection bounding box information, which remains unaffected by irregular ship routes, enabling superior adaptability to complex motion patterns. In terms of computational efficiency, the NSA filter also outperforms the EKF, which relies on Jacobian matrix calculations. Overall, the NSA filter achieves a well-balanced trade-off between tracking accuracy and lightweight computational demands.
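The behavior described above, i.e., a filter that adds a single parameter to the KF and leans on detection-box feedback, matches the confidence-adaptive measurement update popularized by GIAOTracker [26]. A scalar sketch of that principle follows; it illustrates the idea, not ShipMOT's exact filter:

```python
def nsa_update(x, p, z, r, conf):
    """One scalar Kalman measurement update with confidence-scaled noise.
    Shrinking the measurement noise r for high-confidence detections makes
    the filter trust the detection box more; conf is the single extra
    parameter relative to the standard KF. (Scaling form from [26].)"""
    r_nsa = (1.0 - conf) * r      # high conf -> small noise -> trust z
    k = p / (p + r_nsa)           # Kalman gain
    x_new = x + k * (z - x)       # pull the state toward the measurement
    p_new = (1.0 - k) * p
    return x_new, p_new
```

With conf near 1 the update snaps the state almost onto the detection, which is exactly the mechanism that keeps accumulated prediction error from growing during abrupt maneuvers; with conf = 0 it degenerates to the ordinary KF update.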
To investigate the impact of detectors on tracking performance, YOLOv5 and SSD algorithms are trained on the same dataset as the ShipMOT detector, using identical training parameters and number of epochs. These two comparative algorithms are then used to replace the detector component in ShipMOT, thereby showcasing how detector performance influences the overall tracking efficacy of the framework.
The mAP50 and mAP50-95 metrics represent the detection accuracy of the detectors. As shown in Table 4, as detection performance declines, the number of ship ID switches increases significantly, while HOTA tracking accuracy drops sharply. Analysis indicates that within ShipMOT, the NSA filter critically relies on high-quality detection information to correct trajectory prediction errors. Poor detection quality can severely degrade the positional accuracy of NSA filter outputs. Additionally, frequent missed detections lead to incomplete trajectory updates, causing ships to be marked as lost and exacerbating cumulative prediction errors in linear models. Conversely, excessive false positives may cause erroneous detection boxes to be mistakenly matched with ship trajectories, disrupting proper ID maintenance. In summary, detector performance plays a critical role in determining ShipMOT’s tracking efficacy. Poor detector performance significantly impacts the overall tracking capability of the framework.

4.2.3. Ship Tracking in Different Scenarios

As shown in Figure 9, in a scenario involving nonlinear ship motion, the ByteTrack method fails to sustain ID stability for Ship 22, erroneously switching its ID to that of Ship 23. In contrast, ShipMOT successfully preserves the correct ID assignment for Ship 22 throughout the tracking sequence.
Specifically, during the trajectory update phase at frame 156, ShipMOT enhances the trust weight of detection boxes, resulting in a ship output trajectory with reduced positional deviation. This adjustment in earlier frames leads to smaller deviations, which positively influence the accuracy of subsequent predicted trajectories by minimizing error accumulation. At frame 164, ByteTrack’s KF fails to match with the ship detection box due to excessive deviation in the predicted trajectory, leading to an ID switch. In contrast, ShipMOT’s NSA filtering intensively utilizes localization information from detection boxes to continuously correct deviations in the output trajectory. At the same frame, the overlap between the bounding boxes ensures sustained alignment between the predicted trajectory and the detection box, thereby preventing an ID switch. During nonlinear ship motion tracking, ID switches typically occur when accumulated prediction errors cause a mismatch between the predicted trajectory and the detection box. By reinforcing the use of detection box information through NSA filtering, ShipMOT effectively reduces real-time errors in the output trajectory. This enhancement results in more stable and accurate ship tracking, even under abrupt or unpredictable movement conditions. Thus, leveraging NSA filtering not only mitigates the risk of ID switches but also contributes to a more robust and dependable tracking performance. This demonstrates the superior tracking stability of ShipMOT in scenarios characterized by nonlinear ship motion.
When ships are engaged in close-range crossing navigation, an ID switch occurs if their motion directions change; otherwise, the ID remains consistent. As shown in Figure 10, when using ByteTrack for tracking, an ID switch is observed between ships 33 and 37 due to changes in their headings. Conversely, ShipMOT successfully maintains the correct IDs for both Ship 33 and Ship 37. This comparison clearly demonstrates ShipMOT’s superior tracking stability and accuracy during crossing navigation scenarios.
The overall architecture of ByteTrack closely mirrors that of ShipMOT. As shown in Figure 1, ByteTrack differs from ShipMOT in employing YOLO as its detector, the KF for motion prediction, and IoU as the association cost during the initial data association step. When ships navigate in close proximity to each other, a significant challenge for tracking algorithms is accurately associating detection boxes with their corresponding predicted trajectories to avoid misassociations. The example in Figure 10 highlights this challenge: Ship 33's detection box exhibits high IoU overlap with both its own predicted trajectory and that of Ship 37. Because ByteTrack relies solely on IoU to distinguish between predicted trajectories, it is susceptible to erroneous associations in such complex scenarios. In contrast, ShipMOT employs the BBSI as the data association cost, which considers not only the IoU overlap between detection boxes and predicted trajectories but also center consistency, height similarity, and width similarity. As shown in Figure 10, Ship 33's detection box is significantly more similar to its own predicted trajectory than to that of Ship 37 in all three of these respects. This multi-dimensional discriminative capability allows ShipMOT to accurately distinguish between the trajectories of Ship 33 and Ship 37, ensuring the correct association between Ship 33's detection box and its predicted trajectory. By employing multi-criteria data association, ShipMOT effectively aligns detection boxes with their corresponding predicted trajectories, thereby maintaining consistent ID assignments.
As shown in Figure 11, in scenarios characterized by dense ship distributions, there is a high likelihood of ships navigating in close proximity to each other. In frame 8, two pairs of ships—Ship 3 with Ship 4, and Ship 10 with Ship 7—exhibit overlapping radar signatures, indicating their close navigational proximity. At this juncture, the high degree of overlap between the detection bounding boxes of these closely positioned ships poses a substantial challenge for algorithms attempting to correctly associate trajectories with detection boxes.
In dense ship traffic scenarios, multiple crossing ships can be approximated as overlapping navigation patterns. In frame 37, although the bounding boxes of Ship 7 and Ship 10 exhibit high IoU overlap, their differences in length and width provide distinguishable characteristics. ShipMOT leverages height and width similarity assessments of bounding boxes to effectively differentiate trajectories between ships. In frames 8 and 37, despite high height-width similarity between Ship 3 and Ship 4’s bounding boxes, their distinct movement paths create opportunities for trajectory distinction. This enables algorithms to exploit center-distance analysis between bounding boxes as a means of differentiation. By employing discriminative center-distance criteria, the BBSI metric successfully distinguishes and accurately associates the trajectories of ships with divergent movement paths.
In frame 37, it is observed that DeepSORT fails to accurately associate all ships with overlapping radar signatures present in frame 8. OC-SORT manages to successfully associate only one pair of ships. In contrast, ShipMOT achieves successful association for all pairs.
The comparison experiment among different algorithms within dense ship scenarios demonstrates that ShipMOT possesses superior data association accuracy and more stable target tracking performance. This highlights ShipMOT’s adaptability and efficacy for ship target tracking tasks utilizing MR images.

5. Conclusions

To address the challenges of ship target tracking in MR images, this paper proposes a novel tracking algorithm, ShipMOT, which employs NSA for motion prediction and introduces BBSI as an alternative to the traditional IoU cost metric. For this research, a comprehensive Radar-Track dataset consisting of 4816 real-world MR images is compiled. In this dataset, ShipMOT achieved a HOTA score of 79.01% for ship targets, ranking first among comparative algorithms across all metrics. These results demonstrate ShipMOT’s superior adaptability and effectiveness for ship target tracking tasks within MR imagery, particularly in challenging scenarios involving nonlinear ship movements and dense crossing navigation. Furthermore, ShipMOT has an average operational speed of 32.36 fps, indicating significant potential for online tracking applications.
The overall architecture of ShipMOT is inspired by the lightweight ByteTrack algorithm. Compared to ByteTrack, ShipMOT introduces only a single parameter calculation in the motion filter and adds linear computational complexity to the data association cost, resulting in a frame rate reduction of merely 2.5 fps. With a 50% reduction in ID switch frequency, ShipMOT sacrifices minimal processing speed while achieving more robust ship trajectory tracking. Notably, its overall operational speed remains competitive despite this trade-off. Moreover, ShipMOT’s hyperparameters require no manual tuning under normal conditions, unless applied to customized BBSI scenarios.
The perspective of radar imagery resembles an overhead view of a scene. Consequently, ShipMOT demonstrates potential for application in bird’s-eye view monitoring scenarios such as traffic surveillance from an aerial drone perspective. While this research does not include direct experiments on multi-object tracking for non-marine targets, ShipMOT can effectively track targets under similar monitoring perspectives to MR images. Future research can build upon this methodology to explore ShipMOT’s tracking capabilities in bird’s-eye view scenarios such as aerial drone-based traffic monitoring.
Despite these promising results, ShipMOT has certain limitations. First, the NSA filter relies on high-precision localization information from detection bounding boxes. If the localization quality of ship detection data is low, it significantly impacts the prediction accuracy of the NSA filter. Additionally, ShipMOT adopts a two-stage framework for ship target tracking, which requires running two separate algorithmic components, thereby increasing computational complexity. Furthermore, the detection stage cannot leverage tracking information, reducing the efficiency of information utilization.
The experiments primarily concentrate on harbor scenarios. Future work will explore multi-object tracking of ships in inland waterway environments. While the current research has not yet implemented targeted enhancements to the detector, upcoming studies plan to integrate Vision Transformer modules into detection algorithms. Future efforts will also emphasize broadening the diversity of radar image scenarios. Additionally, subsequent research will consider incorporating visual detection interference factors into MR images to enhance the model’s robustness and generalization capabilities.

Author Contributions

Conceptualization, C.C. and F.M.; Methodology, F.M. and H.-H.L.; Software, F.M. and H.-H.L.; Validation, K.-L.W. and H.-H.L.; Formal analysis, K.-L.W.; Investigation, C.C., D.-H.Z. and P.L.; Resources, F.M.; Data curation, F.M., D.-H.Z. and P.L.; Writing—original draft preparation, K.-L.W. and H.-H.L.; Writing—review and editing, C.C.; Visualization, C.C.; Supervision, C.C.; Project administration, F.M., D.-H.Z. and P.L.; Funding acquisition, C.C. and F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 52201415), Fund of State Key Laboratory of Maritime Technology and Safety (No. 16-10-1), and National Key R&D Program of China (Grant No. 2023YFB4302300).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Donghai Zeng and Peng Lu were employees of Wanhua Chemical (Fujian) Terminal Company. The remaining authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Li, B.; Xu, J.; Pan, X.; Chen, R.; Ma, L.; Yin, J.; Liao, Z.; Chu, L.; Zhao, Z.; Lian, J.; et al. Preliminary Investigation on Marine Radar Oil Spill Monitoring Method Using YOLO Model. J. Mar. Sci. Eng. 2023, 11, 670. [Google Scholar] [CrossRef]
  2. Wei, Y.; Liu, Y.; Lei, Y.; Lian, R.; Lu, Z.; Sun, L. A New Method of Rainfall Detection from the Collected X-Band Marine Radar Images. Remote Sens. 2022, 14, 3600. [Google Scholar] [CrossRef]
  3. Do, C.-T.; Van Nguyen, H. Multistatic Doppler-Based Marine Ships Tracking. In Proceedings of the 2018 International Conference on Control, Automation and Information Sciences (ICCAIS), Hangzhou, China, 9 December 2018; pp. 151–156. [Google Scholar] [CrossRef]
  4. Yuan, X.; Liu, J.; Cheng, D.; Chen, C.; Chen, W. Motion-Regularized Background-Aware Correlation Filter for Marine Radar Target Tracking. IEEE Geosci. Remote Sens. Lett. 2023, 20. [Google Scholar]
  5. Guerraou, Z.; Khenchaf, A.; Comblet, F.; Leouffre, M.; Lacrouts, O. Particle Filter Track-Before-Detect for Target Detection and Tracking from Marine Radar Data. In Proceedings of the 2019 IEEE Conference on Antenna Measurements & Applications (CAMA), Kuta, Bali, Indonesia, 23–25 October 2019; pp. 304–307. [Google Scholar] [CrossRef]
  6. Fowdur, J.S.; Baum, M.; Heymann, F.; Banys, P. An Overview of the PAKF-JPDA Approach for Elliptical Multiple Extended Target Tracking Using High-Resolution Marine Radar Data. Remote Sens. 2023, 15, 2503. [Google Scholar] [CrossRef]
  7. Kim, E.; Kim, J.; Kim, J. Multi-Target Tracking Considering the Uncertainty of Deep Learning-based Object Detection of Marine Radar Images. In Proceedings of the 20th International Conference on Ubiquitous Robots (UR), Honolulu, HI, USA, 25–28 June 2023; pp. 191–194. [Google Scholar] [CrossRef]
  8. Kim, H.; Kim, D.; Lee, S.-M. Marine Object Segmentation and Tracking by Learning Marine Radar Images for Autonomous Surface Vehicles. IEEE Sens. J. 2023, 23, 10062–10070. [Google Scholar]
  9. Hao, Z.; Qiu, J.; Zhang, H.; Ren, G.; Liu, C. Umotma: Underwater multiple object tracking with memory aggregation. Front. Mar. Sci. 2022, 9, 1071618. [Google Scholar]
  10. Li, W.; Liu, Y.; Wang, W.; Li, Z.; Yue, J. TFMFT: Transformer-based multiple fish tracking. Comput. Electron. Agric. 2024, 217, 108600. [Google Scholar]
  11. Din, M.U.; Bakht, A.B.; Akram, W.; Dong, Y.; Seneviratne, L.; Hussain, I. Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments. IEEE Access 2025, 13, 15014–15027. [Google Scholar]
  12. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 23rd IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
  13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar]
  14. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  15. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [Google Scholar]
  16. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the 24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
  17. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object Tracking by Associating Every Detection Box. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 1–21. [Google Scholar]
  18. Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9686–9696. [Google Scholar]
  19. Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. IEEE Trans. Multimed. 2023, 25, 8725–8737. [Google Scholar] [CrossRef]
  20. Luo, H.; Jiang, W.; Gu, Y.; Liu, F.; Liao, X.; Lai, S.; Gu, J. A Strong Baseline and Batch Normalization Neck for Deep Person Re-Identification. IEEE Trans. Multimed. 2020, 22, 2597–2609. [Google Scholar] [CrossRef]
  21. Evangelidis, G.D.; Psarakis, E.Z. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1858–1865. [Google Scholar]
  22. Dendorfer, P.; Rezatofighi, H.; Milan, A.; Shi, J.; Cremers, D.; Reid, I.; Roth, S.; Schindler, K.; Leal-Taixé, L. Mot20: A benchmark for multi object tracking in crowded scenes. arXiv 2020, arXiv:2003.09003. [Google Scholar]
  23. Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar]
  24. Chen, B.; Hu, G. Nonlinear state estimation under bounded noises. Automatica 2018, 98, 159–168. [Google Scholar] [CrossRef]
  25. Müller, M.A. Nonlinear moving horizon estimation in the presence of bounded disturbances. Automatica 2017, 79, 306–314. [Google Scholar] [CrossRef]
  26. Du, Y.; Wan, J.; Zhao, Y.; Zhang, B.; Tong, Z.; Dong, J. GIAOTracker: A comprehensive framework for MCMOT with global information and optimizing strategies in VisDrone. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 2809–2819. [Google Scholar]
  27. Ma, F.; Kang, Z.; Chen, C.; Sun, J.; Xu, X.-B.; Wang, J. Identifying Ships from Radar Blips Like Humans Using a Customized Neural Network. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7187–7205. [Google Scholar] [CrossRef]
  28. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  29. Morsali, M.M.; Sharifi, Z.; Fallah, F.; Hashembeiki, S.; Mohammadzade, H.; Shouraki, S.B. SFSORT: Scene Features-based Simple Online Real-Time Tracker. arXiv 2024, arXiv:2404.07553. [Google Scholar]
  30. Lapaine, M.; Frančula, N. Web mercator projection-one of cylindrical projections of an ellipsoid to a plane. Kartogr. I Geoinformacije 2021, 20, 31–47. [Google Scholar] [CrossRef]
  31. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  32. Yang, F.; Odashima, S.; Masui, S.; Jiang, S. Hard to Track Objects with Irregular Motions and Similar Appearances? Make It Easier by Buffering the Matching Space. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 4788–4797. [Google Scholar] [CrossRef]
Figure 1. The flowchart of the ShipMOT framework.
Figure 1. The flowchart of the ShipMOT framework.
Electronics 14 01492 g001
Figure 2. Comparison of different filtering models.
Figure 2. Comparison of different filtering models.
Electronics 14 01492 g002
Figure 3. BBSI calculation diagram.
Figure 3. BBSI calculation diagram.
Electronics 14 01492 g003
Figure 4. Ship location and size distribution.
Figure 4. Ship location and size distribution.
Electronics 14 01492 g004
Figure 5. Annotation of the Radar-Track dataset. The green square in the figure represents the tracked target.
Figure 5. Annotation of the Radar-Track dataset. The green square in the figure represents the tracked target.
Electronics 14 01492 g005
Figure 6. Samples of the Radar-Track dataset.
Figure 6. Samples of the Radar-Track dataset.
Electronics 14 01492 g006
Figure 7. Metric information of the training process.
Figure 7. Metric information of the training process.
Electronics 14 01492 g007
Figure 8. Visualization of ship object detection results using YOLOv7.
Figure 8. Visualization of ship object detection results using YOLOv7.
Electronics 14 01492 g008
Figure 9. A scenario of nonlinear ship motion.
Figure 10. A scenario of ship crossing navigation. Red arrows indicate the direction of ship heading.
Figure 11. A scenario of dense ship navigation.
Table 1. Comparison of tracking performance among different algorithms.
Algorithm     HOTA (%) ↑   MOTA (%) ↑   ID Switch ↓   FPS ↑
DeepSORT      59.84        80.13        345           18.4
StrongSORT    68.65        81.83        265           15.6
C-BIoU        72.13        88.34        68            35.83
ByteTrack     75.31        87.43        82            34.86
OC-SORT       77.27        87.89        65            31.25
ShipMOT       79.01        88.58        40            32.36
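The MOTA figures in Table 1 follow the standard CLEAR-MOT definition, MOTA = 1 − (FN + FP + IDSW)/GT, accumulated over all frames. A minimal sketch of that computation (the counts below are illustrative, not from the Radar-Track benchmark):

```python
def mota(num_misses, num_false_positives, num_switches, num_gt):
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts
    summed over every frame of the sequence."""
    return 1.0 - (num_misses + num_false_positives + num_switches) / num_gt

# Illustrative counts: 50 misses, 30 false positives, 40 ID switches,
# 1200 ground-truth boxes in total.
print(mota(50, 30, 40, 1200))  # 0.9
```

Note that MOTA can be negative when the error counts exceed the number of ground-truth boxes, which is why it is usually reported alongside HOTA.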
Table 2. Ablation results.
Method   ByteTrack   +NSA   +BBSI   HOTA    MOTA    ID Switch ↓   FPS
M1       ✓                          75.31   87.43   82            34.86
M2       ✓           ✓              78.01   88.07   65            34.66
M3       ✓                  ✓       77.57   88.03   55            32.56
M4       ✓           ✓      ✓       79.01   88.58   40            32.36
Table 3. Comparison of tracking performance among different motion filters.
Filter   HOTA (%) ↑   MOTA (%) ↑   ID Switch ↓   FPS ↑
EKF      77.31        87.83        70            31.23
UKF      78.45        88.23        62            23.47
NSA      78.01        88.07        65            34.66
Table 4. Comparison of tracking performance among different detectors.
Detector   mAP@50 ↑   mAP@50-95 ↑   ID Switch ↓   HOTA ↑
SSD        0.76       0.22          85            74.35
YOLOv5     0.89       0.38          52            77.23
YOLOv7     0.93       0.41          40            79.01
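The mAP@50 column counts a detection as correct when its Intersection-over-Union (IoU) with a ground-truth box exceeds 0.5, while mAP@50-95 averages over thresholds from 0.5 to 0.95. A minimal IoU helper for axis-aligned boxes (an illustrative sketch, not the paper's evaluation code):

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes in
    (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 patch: IoU = 1 / (4 + 4 - 1) = 1/7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The same geometric overlap also underlies IoU-based association in trackers such as ByteTrack, which is the baseline the BBSI metric is compared against.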
Chen, C.; Ma, F.; Wang, K.-L.; Liu, H.-H.; Zeng, D.-H.; Lu, P. ShipMOT: A Robust and Reliable CNN-NSA Filter Framework for Marine Radar Target Tracking. Electronics 2025, 14, 1492. https://doi.org/10.3390/electronics14081492
