Article

An Anti-Occlusion Approach for Enhanced Unmanned Surface Vehicle Target Detection and Tracking with Multimodal Sensor Data

Navigation College, Jimei University, Xiamen 361021, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(9), 1558; https://doi.org/10.3390/jmse12091558
Submission received: 27 July 2024 / Revised: 17 August 2024 / Accepted: 1 September 2024 / Published: 5 September 2024
(This article belongs to the Special Issue Unmanned Marine Vehicles: Navigation, Control and Sensing)

Abstract

Multimodal sensors are often employed by unmanned surface vehicles (USVs) to enhance situational awareness, and the fusion of LiDAR and monocular vision is widely used in near-field perception scenarios. However, fusing data from LiDAR and monocular vision can lead to the incorrect matching of image targets and LiDAR point cloud targets when targets occlude one another. To address this issue, a target matching network with an attention module was developed to process occlusion information. Additionally, an image target occlusion detection branch was incorporated into YOLOv9 to extract the occlusion relationships of the image targets. The attention module and the occlusion detection branch allow occlusion information to be considered when matching point cloud and image targets, thereby achieving more accurate target matching. Based on the target matching network, a method for water surface target detection and multi-target tracking was proposed that fuses LiDAR point clouds and image data while considering occlusion information, and its effectiveness was confirmed through experimental verification. The experimental results show that the proposed method improved the correct matching rate in complex scenarios by 13.83% compared with IoU-based target matching methods, achieving an MOTA of 0.879 and an average frame rate of 21.98 frames per second. These results demonstrate that the method effectively reduces the mismatch rate between point cloud and image targets, meets real-time requirements, and offers a promising solution for USVs performing water surface target detection and multi-target tracking.

1. Introduction

Unmanned surface vehicles (USVs) are garnering increased attention in the maritime industry due to their compact size and ability to operate effectively in challenging conditions. These attributes make them well suited for tasks such as maritime surveillance, environmental monitoring, and search and rescue operations. Situational awareness is crucial for optimizing USV performance, enabling autonomous navigation and collision avoidance. A key component of situational awareness involves detecting and tracking surface targets accurately, which is essential for collision avoidance and efficient route planning, ultimately leading to energy savings and improved operational efficiency.
However, current WSTDT (water surface target detection and tracking) methods still have shortcomings. Ships usually rely on shipborne radar and AISs (automatic identification systems) to identify surface targets [1,2]. In open waters, these information sources are reliable enough to support operations such as collision avoidance and route planning, but they lack precision in perceiving near-field obstacles. Shipborne radar has a blind detection zone in the near field and lacks distance accuracy [3]. The update rate of AIS information is insufficient to facilitate collision avoidance for near-field obstacles, and AISs do not support the detection of non-ship obstacles. Compared with traditional shipborne radar and AISs, LiDAR and cameras have received more attention in recent research on near-field water surface target detection [4]. Numerous algorithms based on CNNs (convolutional neural networks), such as YOLO and Faster R-CNN, have been widely used to identify ships and other static targets in images [5]. Water surface point cloud segmentation and identification methods based on clustering and CNNs have also been studied in depth [6]. However, detection techniques that rely solely on images or LiDAR point clouds have poor stability in complex environments: in adverse weather or poor lighting conditions, the accuracy of image recognition decreases [7], and the sparsity of LiDAR point clouds means that targets are easily missed. To improve recognition stability and fully exploit the advantages of LiDAR and camera sensors, many studies have therefore begun to combine these two sensing technologies [8].
Considering the variations in perception range, data modality, and detection accuracy among the sensors used for WSTDT, a major research challenge is how to efficiently utilize multi-modal information from multiple sensors to achieve higher accuracy and faster target detection. The inappropriate application of multi-modal data for WSTDT can inflate costs while yielding suboptimal improvements in detection accuracy. There is therefore an urgent need for an effective method of water surface target fusion detection, which motivates this article. In response to the near-field perception requirements of USV collision avoidance, we consider the complementary characteristics of multiple sensors and propose a multi-stage WSTDT method that fuses data from three-dimensional LiDAR and a monocular camera. The main contributions are as follows:
  • A neural network for matching point cloud and image targets is constructed, addressing the problem of the incorrect matching of multi-modal information for multiple occluded targets. Simultaneously, the training process of this network replaces the tedious joint calibration work between the camera and LiDAR;
  • A method which integrates data from LiDAR and a camera and considers the occlusion relationships between targets for WSTDT is proposed.
The remainder of the paper is organized as follows: Section 2 is the literature review. In Section 3, the issue of incorrect matches caused by target occlusion in the fusion of LiDAR and monocular vision is analyzed, and a solution is proposed. In Section 4, the WSTDT method and the target matching network architecture, which are the core parts of the research, are introduced. In Section 5, the effectiveness of the WSTDT method is validated through comparative experiments. Finally, in Section 6, the experiment results are summarized, and future research directions are outlined.

2. Related Work

Recent studies on integrating LiDAR and camera data have explored various methods for detecting targets on water surfaces. Some scholars generate ROIs (regions of interest) by projecting point cloud targets onto the image plane and then classify the targets using traditional machine vision or deep learning methods. David John Thompson [9] converted LiDAR point clouds into an overhead grid map, generated image ROI areas based on the occupied grid, and used an SVM to achieve the detection and classification of water surface targets. Woo, J. et al. [10] utilized two-dimensional LiDAR to detect the orientation and distance of floating obstacles and classified the obstacles by color. Kamsvåg, V. et al. [11] implemented DBSCAN (density-based spatial clustering of applications with noise) for point cloud segmentation and established an image’s ROIs by projecting point cloud targets onto the image plane. Subsequently, they utilized Faster-RCNN for target classification. For methods that project point cloud targets onto the image plane to generate ROIs and subsequently classify targets within these ROIs, occlusion can lead to the misidentification of the targets. Additionally, if any target entities are not scanned by LiDAR, their corresponding targets in the image will be overlooked.
Compared to relying solely on point cloud targets to generate ROIs, a more prevalent approach, known as decision-level fusion, comprehensively combines the detection results of multiple sensors. This approach allows missed point cloud targets to be compensated by image targets. Wu, Y. et al. and Clunie, T. et al. [12,13] used a probabilistic data association method to fuse detection results from radar, LiDAR, and RGB cameras, achieving more stable marine environment target detection. Lu, Z. et al. [14] used a CNN to detect image targets and determined the distance of the image targets based on the projection of the point clouds in the image. Wang, L. et al. [15] used the CornerNet-Lite network to detect water surface images and projected the LiDAR point clouds onto the image plane; the targets’ confidence scores were adjusted, and the spatial position information of the targets was obtained based on the point clouds within the image target bounding boxes. This approach effectively reduced misidentifications caused by water surface ripples and reflections. Liu, D. et al. [16] used D-S evidence theory to fuse LiDAR, millimeter-wave radar, and stereo vision data for target detection in an overhead grid map, achieving higher detection accuracy than methods using only a single sensor. Chen, J. et al. [17] combined data from radar, binocular stereo vision, and GPS, using an extreme learning machine (ELM) as a binary classifier to match image targets with radar targets, effectively achieving target detection and classification in real marine environments. The primary challenge of decision-level fusion approaches is how to effectively integrate the results obtained from the various types of sensor data. Among them, the methods in [12,13,14,15] can be categorized as “projecting multi-modal targets onto bird’s-eye view maps or image planes and matching the targets with the highest IoU (intersection over union) to fuse the multi-modal information”. These approaches may lead to the mismatch of image targets and point cloud targets when some targets are occluded. Liu, D. et al. [16] match targets within each grid of an overhead grid map, which requires highly precise calibration results from the various sensors. Chen, J. et al. [17] use a machine learning method to match image targets and radar targets in the bird’s-eye view map, but this method directly uses the features of the radar and image targets as the input of the ELM, without considering the intra-group influences among either the image targets or the radar targets.
Apart from combining the detection results from various sensor data, some studies have also developed deep learning neural networks to extract comprehensive features from each sensor’s data for target recognition, known as feature-level fusion. Haghbayan, M. et al. [18] utilize a convolutional neural network to extract features from LiDAR point clouds and images. Based on the spatial mapping relationship between the image and point clouds, they combine the features from both sources and detect the combined features to achieve target detection. Feature-level fusion methods are often capable of utilizing data from each sensor more effectively, thereby enhancing the accuracy of target detection. However, feature-level fusion methods also require the strict spatial and time alignment of sensor data, and their performance depends on the quality of the neural network architecture.
Unlike “feature-level fusion” methods, which require complex neural networks and have large computational demands, “decision-level fusion” methods have smaller computational requirements, offer better real-time performance, and are easier to deploy in practical systems. To facilitate the deployment of our method in practical systems, and considering the real-time requirements of WSTDT in actual scenarios, we focus on “decision-level fusion” methods. To address the two shortcomings of the above-mentioned methods (“strict joint calibration of sensors is required” and “when a target is occluded, its features are prone to be mismatched”), a neural network based on the attention mechanism [19] is constructed to match the detection results derived from LiDAR point clouds with those derived from images. A multi-target detection and tracking method for water surface objects that considers occlusion relationships is then built on this network.
A summary of the above related works is provided in Table 1.

3. Problem Statement

For the decision-level fusion methods, the general process of data fusion is shown in Figure 1.
We focus on the fusion perception of water surface objects based on images and LiDAR point clouds. In prior studies, researchers usually established the match between targets of two modalities by comparing the IoU (calculated as the area of overlap between the predicted bounding box and the ground truth bounding box divided by the area of their union) of the point cloud target’s projection on the image plane with the image target [20]. However, when three-dimensional point cloud targets are projected onto two-dimensional images and matched by calculating the IoU, occlusions in the three-dimensional domain (such as a ship and a buoy in Figure 2) can lead to mismatch [21].
In Figure 2, when manually matching the point cloud targets and image targets of the ship and the buoy, we can see that the ship is occluded by the buoy. We therefore match the farther point cloud target with the image target of the ship and the closer point cloud target with the image target of the buoy. When targets are matched automatically, occlusion information should likewise be extracted and used as one of the inputs to enhance the accuracy of the match between image and point cloud targets.
In order to achieve the above aim, we utilize the attention mechanism to incorporate the occlusion relationship between targets, thereby improving the outcome of the target matching process. The attention mechanism helps the neural network focus on relevant parts of the input data by considering contextual and positional information. Given that the positions of point cloud targets inherently contain three-dimensional information, incorporating an attention module allows us to effectively extract the occlusion relationships among these targets. However, the position of image targets only provides two-dimensional information. To obtain the occlusion relationship between image targets, an additional step of occlusion detection is required during the process of image target detection, i.e., determining whether the target is occluded. Here, we introduce an occlusion detection branch to YOLOv9 [22] to realize the occlusion detection of image targets. By constructing a target matching network with an attention module and adding an occlusion detection branch to YOLOv9, the match between the point cloud and image targets while considering the occlusion relationships between targets is achieved. This work will be explained in Section 4.2.

4. Method

4.1. Target Detection and Multi-Target Tracking

The method for water surface target identification and tracking proposed in this study primarily uses LiDAR point clouds and images as inputs, and it outputs an overhead grid map of obstacles around the ship. The main functional modules of the system are divided into a target detection module (as shown in Figure 3) and a multi-target tracking module (as shown in Figure 4).

4.1.1. Water Surface Target Detection

We selected YOLOv9-C and DBSCAN for image target detection and point cloud segmentation, respectively. YOLOv9-C is an object detection algorithm for images, while DBSCAN is a density-based clustering method. The choice of YOLOv9-C was motivated by the fact that, at the time of conducting this research, it was one of the latest and best-performing image target detection algorithms. Additionally, the -C version was chosen because it has the smallest computational load among the released models, making it suitable for tasks with real-time requirements. DBSCAN was chosen because, in a maritime environment, different objects’ point clouds often have certain separation distances, facilitating voxel-based downsampling. The downsampled point clouds have fewer points and clear separations between different targets. In such cases, using DBSCAN for segmentation is both simple and efficient.
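As an illustration of this preprocessing step, the following sketch voxel-downsamples a raw point cloud and clusters it with scikit-learn's DBSCAN; the voxel size, eps, and min_samples values are placeholders, since the paper does not report its parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def voxel_downsample(points, voxel_size=0.2):
    """Keep one representative point per occupied voxel (points: N x 3 array)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx]

def segment_point_cloud(points, voxel_size=0.2, eps=1.0, min_samples=5):
    """Voxel-downsample the cloud, then cluster it with DBSCAN.

    Returns a list of per-target point arrays; label -1 (noise) is discarded.
    """
    down = voxel_downsample(points, voxel_size)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(down)
    return [down[labels == k] for k in set(labels) if k != -1]
```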
As shown in Figure 3, images and point cloud data are processed through YOLOv9 recognition and DBSCAN segmentation [6], respectively. YOLOv9 and DBSCAN output a series of image and point cloud features. These features are input together into the target matching network. The matching network correctly matches the point clouds and image targets, generating 3D targets with category information attached. Then, these targets are projected onto an overhead grid map based on IMU information. The overhead grid map is then a preliminary target detection result.
Owing to the sparsity of point clouds, some targets are not scanned by LiDAR, so during the matching process, some image targets are not matched with point cloud targets. For these image targets, we estimate their positions in the overhead grid map based on their size and position in the image using monocular estimation [23]. By combining the result of the monocular estimation and the preliminary target detection result, we obtain the final target detection result.
The specific calculation process of monocular estimation is as follows. Assuming the camera’s intrinsic matrix is K and the center point of the image target is (u, v), the coordinates of the pixel point in the camera coordinate system are as follows:

\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},

Assuming the rotation matrix from the USV to the camera coordinate system is R1, and the rotation matrix from the world to the USV coordinate system is R2, the coordinates of the pixel point in the world coordinate system are as follows:

\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R_2 R_1 \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},

Therefore, we can obtain the true bearing of the target relative to the USV as follows:

\theta = \arctan \frac{X}{Y},

At the same time, for floating docks and ships, the pixel height does not change with the variation in the observation perspective, and for buoys, the pixel width does not change with the variation in the observation perspective. Based on trigonometric principles, the following two formulas estimate the target distance from the height and the width, respectively:

D = \frac{H f_y}{h},

D = \frac{W f_x}{w},

where H and W represent the actual height and width of the target (in meters), which are obtained by importing the corresponding 3D model into Blender and measuring them; h and w represent the pixel height and width of the target; fx and fy denote the focal lengths of the camera in the x-direction and y-direction, respectively; and D represents the distance to the target. Based on D, θ, and the position of the boat, the approximate position of the target in the XY plane can be calculated, as shown in Figure 5. Due to the lack of precise location information from point clouds when estimating target position solely through monocular images, these temporary targets are only retained in the current detection frame and are not added to the tracking list.
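Putting the four formulas above together, a minimal monocular estimation sketch might look as follows; the function and argument names are ours, and the use of arctan2 for the bearing is an implementation convenience rather than part of the paper.

```python
import numpy as np

def monocular_estimate(box, K, R1, R2, real_height, use_height=True, real_width=None):
    """Estimate bearing and distance of an unmatched image target.

    box: (x1, y1, x2, y2) pixel bounding box; K: 3x3 camera intrinsics;
    R1: USV-to-camera rotation matrix; R2: world-to-USV rotation matrix.
    """
    u = 0.5 * (box[0] + box[2])
    v = 0.5 * (box[1] + box[3])
    # Back-project the box centre into the camera frame: [x, y, 1]^T = K^-1 [u, v, 1]^T
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate the ray into the world frame: [X, Y, Z]^T = R2 R1 [x, y, 1]^T
    X, Y, Z = R2 @ R1 @ ray_cam
    bearing = np.arctan2(X, Y)           # true bearing of the target relative to the USV
    fx, fy = K[0, 0], K[1, 1]
    h = box[3] - box[1]                  # pixel height
    w = box[2] - box[0]                  # pixel width
    if use_height:                       # ships, floating docks: D = H * fy / h
        distance = real_height * fy / h
    else:                                # buoys: D = W * fx / w
        distance = real_width * fx / w
    return bearing, distance
```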

4.1.2. Multi-Target Tracking

As shown in Figure 4, the multi-target tracking module takes a time sequence of target detection results as input, assigns a Kalman filter to each target, and tracks the category, position, and speed of targets. During multi-target tracking, we use the Hungarian algorithm (a combinatorial optimization algorithm for solving assignment problems) to match the targets between two timestamps (t and t + 1). Firstly, we use the Kalman filter of each target to predict its position at time t + 1. Secondly, we calculate the matching cost between this predicted position and the actual position of each target at time t + 1. The task of the Hungarian algorithm is to find a matching scheme that minimizes the total cost of all pairs. Finally, we can establish the correspondence between the targets at time t and t + 1, achieving continuous target tracking. The specific tracking process is as shown in Algorithm 1.
Algorithm 1 Multi-target Tracking
Input: detections_i
Output: tracks # target tracking list (including location, category, speed, and tracking ID)

1: tracks = [ ] # create an empty tracking list
2: for detection in detections_0:
3:     track = KF(detection)
       # assign a Kalman filter to each detected target in the initial detection frame
4:     tracks.append(track)
       # and add it to the tracking list
5: end for
6: i = 1
7: while (detections_i):
       # maintain the tracking loop while there are still new detection frames
8:     for track in tracks: # predict the state of each tracked target in the current frame
9:         track.predict()
10:    end for
11:    matches, unmatched_tracks, unmatched_detections = Match(tracks, detections_i)
       # match the tracked targets and detected targets using the Hungarian algorithm
12:    for match in matches:
       # for the successfully matched targets, update their tracking status
13:        track_index, detection_index = match
14:        tracks[track_index].update(detections_i[detection_index])
15:        tracks[track_index].times_from_last_detection = 0
           # reset the count of consecutive undetected frames
16:    end for
17:    for track in unmatched_tracks: # for the unmatched tracked targets
18:        track.times_from_last_detection += 1 # increase the count of consecutive undetected frames
19:    end for
20:    for detection in unmatched_detections: # for the unmatched detected targets
21:        track = KF(detection) # assign a Kalman filter to it
22:        tracks.append(track) # and add it to the tracking list
23:    end for
24:    for track in tracks:
       # remove tracked targets that have not been detected for 10 consecutive frames
25:        if track.times_from_last_detection > 10:
26:            tracks.remove(track)
27:        end if
28:    end for
29:    i += 1
30: end while
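As a concrete reading of the Match() step in Algorithm 1, the sketch below builds a Euclidean-distance cost matrix and solves the assignment with SciPy's Hungarian solver. The track/detection attribute names and the gating threshold are assumptions, and unmatched items are returned here as indices rather than objects for brevity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(tracks, detections, max_cost=5.0):
    """One possible implementation of Match() in Algorithm 1.

    Cost is the Euclidean distance (on the grid-map plane) between each track's
    predicted position and each detected position; pairs whose cost exceeds
    max_cost are treated as unmatched.
    """
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.zeros((len(tracks), len(detections)))
    for i, track in enumerate(tracks):
        for j, det in enumerate(detections):
            cost[i, j] = np.linalg.norm(track.predicted_position - det.position)
    rows, cols = linear_sum_assignment(cost)        # Hungarian algorithm
    matches = [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
    matched_rows = {i for i, _ in matches}
    matched_cols = {j for _, j in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_rows]
    unmatched_detections = [j for j in range(len(detections)) if j not in matched_cols]
    return matches, unmatched_tracks, unmatched_detections
```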

4.2. Target Matching Network and Improved YOLOv9

4.2.1. Target Matching Network

Considering the occlusion relationship between targets, we construct a neural network with an attention module to match point cloud and image targets, as shown in Figure 6. The feature of the point cloud target is the coordinates of the vertices of its 3D bounding box. The feature of the image target is the coordinates of the vertices of the 2D bounding box and the probability of the image target being occluded. The probability of the image target being occluded is predicted by the occlusion detection branch in the improved YOLOv9 algorithm. The occlusion detection branch will be detailed in Section 4.2.2. The point cloud and image features will each go through an attention module. Once processed by the attention module, each point cloud feature will integrate the features of other point cloud targets, and the same applies to the features of image targets. In this way, the occlusion relationship between point cloud targets and image targets will be considered by the subsequent MLP network. The loss function of the target matching network uses binary cross-entropy loss, and its formula is as follows:
\mathrm{BCELoss}(x, y) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(x_i) + (1 - y_i) \log(1 - x_i) \right],
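To make the architecture of Figure 6 more tangible, the following PyTorch sketch embeds both target sets, applies intra-group self-attention, and scores every point cloud/image pair with an MLP. All layer sizes and the pairing scheme are placeholders, since the paper does not specify them; this is a schematic, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TargetMatchingNet(nn.Module):
    """Schematic matching network. Point cloud targets are described by their
    3D box vertices (8 x 3 = 24 values); image targets by their 2D box corners
    plus an occlusion probability (4 + 1 = 5 values). Self-attention lets each
    target feature absorb context from the other targets of the same modality,
    which is how occlusion relationships enter the match score."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.cloud_embed = nn.Linear(24, d_model)
        self.image_embed = nn.Linear(5, d_model)
        self.cloud_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, cloud_feats, image_feats):
        # cloud_feats: (B, Nc, 24), image_feats: (B, Ni, 5)
        c = self.cloud_embed(cloud_feats)
        m = self.image_embed(image_feats)
        c, _ = self.cloud_attn(c, c, c)   # intra-group context for point cloud targets
        m, _ = self.image_attn(m, m, m)   # intra-group context for image targets
        Nc, Ni = c.shape[1], m.shape[1]
        # Score every (point cloud target, image target) pair.
        pairs = torch.cat([c.unsqueeze(2).expand(-1, -1, Ni, -1),
                           m.unsqueeze(1).expand(-1, Nc, -1, -1)], dim=-1)
        return torch.sigmoid(self.mlp(pairs)).squeeze(-1)   # (B, Nc, Ni) match probabilities

# Training would minimise nn.BCELoss() between these probabilities and 0/1 match labels.
```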

4.2.2. Improved YOLOv9

In order to extract the occlusion relationship between image targets, we introduce an occlusion detection branch to the detection head of the original YOLOv9 algorithm to detect whether the target is occluded. The architecture of the YOLO network after adding the branch is shown in Figure 7. When there are multiple targets in the same line of sight, the unoccluded target is closer, while the occluded target is farther away; thus, the near/far relationship of image targets can be learned to a certain degree. The recognition results after adding the occlusion detection branch are shown in Figure 8.
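A schematic of what such an added branch looks like is given below; it mirrors a classification branch but outputs a single per-anchor occlusion probability. This is an illustration of the idea only, with placeholder channel counts, not the actual YOLOv9 head.

```python
import torch
import torch.nn as nn

class DetectionHeadWithOcclusion(nn.Module):
    """Schematic detection head: the occlusion branch has the same structure as
    a classification branch but performs a single binary (occluded / not
    occluded) prediction per anchor."""
    def __init__(self, in_channels=256, num_classes=5, num_anchors=3):
        super().__init__()
        self.box_branch = nn.Conv2d(in_channels, num_anchors * 4, 1)
        self.cls_branch = nn.Conv2d(in_channels, num_anchors * num_classes, 1)
        self.occ_branch = nn.Conv2d(in_channels, num_anchors * 1, 1)      # added branch

    def forward(self, feature_map):
        boxes = self.box_branch(feature_map)                    # box regression
        classes = self.cls_branch(feature_map)                  # class logits
        occlusion = torch.sigmoid(self.occ_branch(feature_map)) # P(target is occluded)
        return boxes, classes, occlusion
```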
To verify the usability of the matching method proposed in this study, we constructed a method for multi-target detection and tracking on a water surface and applied and evaluated it in the Gazebo simulation environment [24]. The next section will provide a detailed introduction.

5. Experiments and Results

The experiment consists of three parts: comparing the IOU-based target matching method and the target matching method proposed in this study; implementing and evaluating the water surface target detection and multi-target tracking algorithm in the VRX 2022 simulation environment; and conducting a stability analysis to evaluate the robustness of the proposed method under various challenging conditions.
The hardware configuration used in the experiment is shown in Table 2.

5.1. Comparison of Target Matching Methods

5.1.1. Data Collection

We collected simulated sensor data in the VRX 2022 simulation environment built on ROS Gazebo for the RobotX 2022 Challenge. This environment contains various static water surface objects such as buoys and docks, and it also simulates the waves and the buoyancy of these water surface objects, as shown in Figure 9. It features dynamic environmental conditions such as changing weather and lighting conditions to test the robustness of navigation algorithms. We use the WAM-V in VRX 2022 as our own ship, which is equipped with various sensors such as an RGB camera, LiDAR, IMU, GPS, etc. To obtain the sensor data of dynamic water obstacles, we obtained a type of ship mesh model from the 3D Warehouse, imported it into the simulation environment, and wrote ROS nodes to control three ships moving along different paths.
In order to ensure that the USV can stably identify targets, it is necessary to gather sensor data from our own ship in various postures and positions to train the target matching network. For this purpose, we maneuvered the WAM-V to move along six different paths across two distinct water surface environments. Thus, we obtained six data packages. In the recorded data packages, the image frequency was 30 Hz, the GPS frequency was 15 Hz, the LiDAR frequency was 10 Hz, and the IMU frequency was 100 Hz. From the recorded data packages, we selected 244 frames of effective image and point cloud data pairs. Within these data pairs, we identified 4693 effective matching labels. The final training, validation, and test sets have 115, 34, and 95 frames, respectively, in which the training and validation sets only include the data from the first scene, and the testing set includes the data from both scenarios. In addition, the testing set was divided into ordinary and complex scenarios. The complex scenarios comprised a total of 47 frames and contained 1562 effective matching labels.

5.1.2. Comparative Experiment

In the process of matching point cloud targets with image targets, the method based solely on the IoU projects the point cloud targets onto the image plane and then calculates the IoU between the projected point cloud targets and the image targets; the image target and point cloud target with the maximum IoU are matched together. SDIOU [21] additionally considers the shape similarity and relative distance of the projection boxes on top of the IoU. The accuracy of the IoU-based matching methods and our proposed matching method, as tested on our self-built test dataset, is shown in Table 3. The accuracy is calculated as follows:
\mathrm{Accuracy} = \frac{TP + TN}{GT},
where T P stands for true positives, i.e., a point cloud target matched with the right image target; T N stands for true negatives, i.e., a point cloud target not matched with any image target and for which the prediction probabilities of matching with any image target are all below the threshold; and G T refers to the total number of point cloud targets to be matched.
The IoU-based methods clearly do not perform well on our self-built test dataset, and the SDIOU method, which considers target shapes, performs even worse than the simple IoU method. This is likely because the targets on the water surface are far away, their point clouds are sparse, and shape information is lost. Comparing the last three methods, it is clear that the inclusion of the attention module and the occlusion detection branch each significantly increased the matching accuracy, and the improvement is especially remarkable in complex scenarios.

5.2. Experiment of Target Detection and Multi-Target Tracking

5.2.1. Input Data Processing

Spatial alignment and time alignment are the basis for ensuring the effectiveness of multi-sensor data fusion. For spatial alignment, the trained target matching network includes the spatial location relationship between point clouds and images, so there is no need for further spatial alignment. To achieve time alignment, we use an image frame as a reference and select the point cloud frame with the minimal time interval relative to it as the input data. We then choose one IMU data point before and one after the selected image and point cloud data, ensuring the minimal time interval between them (as shown by IMU0 and IMU1 in Figure 10). Using the timestamps of the IMU, image, and point cloud data, we interpolate them to compute the ship’s pose corresponding to the image and point cloud timestamps (as shown by Rotate1 and Rotate2 in Figure 10). Next, the point cloud is rotated according to the rotational relationship between these two poses (as shown by Rotate1⁻¹ × Rotate2 in Figure 10). This process outputs point clouds corresponding to the image timestamp (as shown by AP in Figure 10), thus achieving time alignment between the point clouds and the image. The time alignment process is illustrated in Figure 10.
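A minimal sketch of this alignment step, using SciPy's spherical linear interpolation between the two IMU attitudes, is shown below; the quaternion convention and argument names are assumptions, and which interpolated pose plays the role of Rotate1 versus Rotate2 follows Figure 10 only loosely.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def align_cloud_to_image(points, t_cloud, t_image, t_imu0, t_imu1, q_imu0, q_imu1):
    """Rotate a point cloud so that it corresponds to the image timestamp.

    q_imu0 / q_imu1 are ship attitude quaternions (x, y, z, w) measured at
    t_imu0 and t_imu1, with t_imu0 <= t_cloud, t_image <= t_imu1.
    """
    slerp = Slerp([t_imu0, t_imu1], Rotation.from_quat([q_imu0, q_imu1]))
    rot_cloud = slerp(t_cloud)        # ship pose at the point cloud timestamp (Rotate2)
    rot_image = slerp(t_image)        # ship pose at the image timestamp (Rotate1)
    # Apply Rotate1^-1 * Rotate2 to move the cloud from its own timestamp to the image's.
    correction = rot_image.inv() * rot_cloud
    return correction.apply(points)   # aligned points (AP)
```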
In addition, in order to obtain pairs of image and point cloud data, we adopt the procedure shown in Figure 11 and Figure 12 for offline and real-time data pair capture, respectively.

5.2.2. Results of Target Detection and Multi-Target Tracking

The test scenario for the target detection and multi-target tracking method is shown in Figure 13. The visualization effects of target detection and multi-target tracking on the water surface are shown in Figure 14. Figure 14a is a grid map with a grid size of 0.5 m × 0.5 m, where each pixel represents a grid cell. The grid map provides a clear visualization of the spatial distribution and size of each target. Furthermore, in the grid map, pixels of different colors represent different categories. Figure 14b visualizes the target detection and tracking data in Pygame, showing various buoys, ships, and floating docks. The category of each target can be intuitively seen, as well as the direction of movement of the ships, with the heading of the ship indicating its direction of movement. The raw data from the occupancy grid are suitable as the perception input for autonomous ships, while the visualization in Pygame is more suitable as a reference for the manual operation of the ship.
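As an example of the grid-map rendering in Figure 14a, the snippet below maps a 2D array of category IDs to an RGB image with one pixel per 0.5 m × 0.5 m cell; the category IDs and colour palette are placeholders, not the paper's actual encoding.

```python
import numpy as np

# Placeholder category -> RGB colour mapping (the paper's actual palette is not specified).
CATEGORY_COLOURS = {0: (0, 0, 0),        # free space
                    1: (255, 0, 0),      # ship
                    2: (0, 255, 0),      # buoy
                    3: (0, 0, 255)}      # floating dock

def grid_to_image(category_grid):
    """Convert a 2D grid of category IDs (0.5 m x 0.5 m cells) to an RGB image
    in which each pixel represents one grid cell."""
    h, w = category_grid.shape
    image = np.zeros((h, w, 3), dtype=np.uint8)
    for cat, colour in CATEGORY_COLOURS.items():
        image[category_grid == cat] = colour
    return image
```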
We select a common indicator—MOTA [25]—to measure the performance of the detection and tracking methods. It takes into account false positives, missed detections, and identity switches. The calculation formula is as follows:
\mathrm{MOTA} = 1 - \frac{FP + FN + IDSW}{GT},
where F P stands for false positives, i.e., a location is detected as having a target when there is not one present or the target type is misjudged; F N stands for missed detections, i.e., a target is present at a location but is not detected; I D S W stands for identity switches, which refers to the number of times the ID of a target changes during tracking; and G T stands for ground truths, which refer to the actual locations and identities of targets within a scene. If there are m target entities in the scene and n frames in the tracking process, then the total number of G T would be m × n . Each target in each frame has a corresponding G T .
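For reference, the metric reduces to a one-line computation once the error counts are accumulated; the example value in the docstring reproduces the “Ours” row of Table 4.

```python
def mota(fp, fn, idsw, num_targets, num_frames):
    """MOTA = 1 - (FP + FN + IDSW) / GT, with GT = targets x frames.

    Example from Table 4 (Ours): mota(1437, 1511, 354, 19, 1436) ~= 0.879.
    """
    gt = num_targets * num_frames
    return 1.0 - (fp + fn + idsw) / gt
```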
In order to reflect the generalization performance of the matching method, the detection and tracking method is evaluated in a scenario that differs from those used to train the matching network. The final test data set consists of 1436 frames, with a total of 19 target entities. Table 4 shows a comparison between the detection and tracking performance achieved with the IoU-based method and the matching method proposed in this study.
The average frame rate for the detection and tracking method surpasses 20 frames per second, while the typical frequency of LiDAR ranges between 10 and 20 Hz. This indicates that our approach fulfills the requirements for real-time operation. It is worth emphasizing that we focus on improving the processing of occlusion information, rather than on optimizing the tracker itself. This is because our aim is to explore and demonstrate the significance of occlusion information in target detection and tracking, not to enhance the performance of the tracker. Consequently, our experiments did not involve comparisons with other trackers. Nonetheless, it is likely that combining our method of occlusion information processing with the latest trackers could further improve its tracking performance.

5.3. Stability Evaluation

To analyze the stability of the proposed method, several experiments were conducted. These experiments include evaluating the method’s performance under foggy and low-light conditions, analyzing the impact of varying ship densities in the waterway, examining the method’s stability during extended testing periods, and assessing the effect of external disturbances on the performance of the proposed method. These experiments provided clearer insights into the stability and applicability of the proposed method across various complex environments.

5.3.1. Evaluation under Foggy and Low-Light Conditions

To evaluate the impact of foggy conditions and insufficient lighting on the accuracy of the target matching method proposed, we maneuvered the WAM-V to move along the same path in an identical environment, adjusting only the fog density or lighting intensity, and recorded five data packages. From each data package, we extracted 44 pairs of image and point cloud data according to the same sequence of timestamps. Figure 15 presents the image detection results under normal weather, heavy fog, and poor lighting conditions. It can be observed that the recall rate of image target detection decreases under poor lighting or foggy conditions. The specific statistical results of image target detection are shown in Table 5.
Subsequently, we measured the accuracy of target matching on datasets under different environmental conditions, as shown in Table 6.
It can be observed that the accuracy of target matching does not decrease under foggy or poor lighting conditions.
To evaluate the impact of foggy conditions and insufficient lighting on the MOTA of the proposed WSTDT method, we ran the proposed WSTDT method on five recorded data packages. The results are shown in Table 7. To evaluate the impact of low recall rates of image targets, we modified the definition of FN in MOTA. Originally, FN referred to missed detections, but now each target with an uncertain category will also increase FN by 0.5.
The results indicate that when the fog is light or the lighting intensity is at 50%, the performance of the WSTDT method is almost unaffected. However, when the fog is dense or the lighting is only at 20%, the recall rate and precision of image detection decrease, leading to a significant increase in the FP and FN metrics for the WSTDT method and, consequently, a noticeable drop in MOTA.

5.3.2. Impact of Varying Ship Densities

To evaluate the impact of ship density in a navigational area on the performance of our proposed method, we set up three different quantities of ships in a 10,000 square meter area and recorded data packets for the same duration for each quantity. We then applied the proposed WSTDT method to these data packets to assess how the ship density affects the performance of the proposed WSTDT method. The specific results are shown in Table 8.
The results indicate that an increase in ship density within a certain range does not affect the performance of the proposed WSTDT method. However, when ship density exceeds a certain threshold, the performance of the WSTDT method declines. This decline occurs because, with excessively dense targets, some targets overlap significantly, making their features prone to be mismatched and causing a sharp increase in FP. Additionally, when using the Hungarian algorithm to match the same target across consecutive frames, the close proximity of targets can lead to incorrect matches, resulting in a rapid rise in IDSW.

5.3.3. Long-Term Stability Testing

To evaluate the WSTDT method’s stability during extended testing periods, we applied it to the data packets used in Section 5.2. The variation curve of MOTA as the cumulative runtime increases is shown in Figure 16.
The results shown in Figure 16 indicate that as the runtime increases, the MOTA metric gradually improves and stabilizes.

5.3.4. Effect of External Disturbances

To evaluate the impact of external disturbances on the proposed method, we added random disturbances to the measured point cloud target positions. Each disturbance is three-dimensional, with the perturbations in the X, Y, and Z directions following a normal distribution with a mean of 0 and the same standard deviation. The accuracy of target matching on the test dataset used in Section 5.1 under varying levels of external disturbances is shown in Figure 17.
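The disturbance model is straightforward to reproduce; a sketch is given below, where positions is an N × 3 array of point cloud target positions (the array layout is our assumption).

```python
import numpy as np

def perturb_positions(positions, std, rng=None):
    """Add zero-mean Gaussian noise with the given standard deviation
    independently to the X, Y, and Z coordinates of each point cloud target."""
    rng = np.random.default_rng() if rng is None else rng
    return positions + rng.normal(0.0, std, size=positions.shape)
```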
The data packets from Section 5.2 were used to test the impact of external disturbances on MOTA, with the detailed results shown in Table 9.
From the results in Table 9, as the standard deviation of the external disturbance on the point cloud target position increases, the MOTA metric gradually decreases. When the standard deviation of the external disturbance is 0.5, the impact is small; however, when the standard deviation increases to 1, the number of FP significantly increases. This is because the external disturbance exceeds the threshold that small-target matching can withstand. For large targets, due to their larger volume, a small disturbance still leaves most of their volume overlapping with the original position, thus having a smaller impact on the matching performance. However, for small targets, the same degree of disturbance can separate their position from the original, making matching difficult and significantly affecting the MOTA metric.

6. Conclusions

This research addresses a specific challenge in the field of situational awareness on water surfaces: when data from LiDAR and a monocular camera are fused using a decision-level method, target features are prone to being incorrectly matched when the targets occlude one another. To address this issue, a novel approach is proposed to accurately match image targets with point cloud targets. Furthermore, an attention-based target detection and multi-target tracking method that considers occlusion information is proposed. These methods were validated in the Gazebo water surface simulation environment. Compared with the IoU-based matching method, the accuracy of the proposed target matching method increased by 13.83% in complex scenarios. In addition, the MOTA score of the proposed multi-target detection and tracking method reached 0.879 in the simulation environment, with an average frame rate of 21.98 frames per second, meeting real-time requirements. With our method, the data from the LiDAR and the monocular camera are mapped into an overhead grid map of obstacles near the own ship, which can provide a basis for decision making in the autonomous navigation of the ship.
In future research, real-world datasets might be used for training, and feature extraction might be combined with target matching so that the network can autonomously select the necessary features, which could further improve the accuracy and stability of detection and tracking. In addition, using the perception results output by the proposed method as the input for reinforcement learning and other approaches to achieve the autonomous navigation of ships is another task for the follow-up work of this study.

Author Contributions

Conceptualization, D.L. and G.C.; methodology, D.L.; software, S.Y.; validation, D.L., G.C. and M.Z.; formal analysis, W.W.; investigation, M.Z.; resources, M.Z.; data curation, W.W.; writing—original draft preparation, D.L.; writing—review and editing, G.C.; visualization, S.Y.; supervision, W.W.; project administration, M.Z.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 52371369); the Natural Science Project of Fujian Province (No. 2022J01323, 2021J01822); the Fujian Provincial Science and Technology Plan Foreign Cooperation Project (No. 2023I0019); the Fuzhou-Xiamen-Quanzhou Independent Innovation Region Cooperated Special Foundation (No: 3502ZCQXT2021007); the Fujian Ocean and Fisheries Bureau High Quality Development Special Project (No: FJHYF-ZH-2023-10); and the Natural Science Foundation of Xiamen (No: 502Z202373038).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the ongoing nature of the research and the fact that the dataset is still being expanded and refined.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Naus, K.; Wąż, M.; Szymak, P.; Gucma, L.; Gucma, M. Assessment of Ship Position Estimation Accuracy Based on Radar Navigation Mark Echoes Identified in an Electronic Navigational Chart. Measurement 2021, 169, 108630. [Google Scholar] [CrossRef]
  2. Yan, Z.; Cheng, L.; He, R.; Yang, H. Extracting Ship Stopping Information from AIS Data. Ocean. Eng. 2022, 250, 111004. [Google Scholar] [CrossRef]
  3. Almeida, C.; Franco, T.; Ferreira, H.; Martins, A.; Santos, R.; Almeida, J.M.; Carvalho, J.; Silva, E. Radar Based Collision Detection Developments on USV ROAZ II. In Proceedings of the OCEANS 2009-EUROPE, Bremen, Germany, 11–14 May 2009. [Google Scholar]
  4. Thombre, S.; Zhao, Z.; Ramm-Schmidt, H.; Vallet Garcia, J.M.; Malkamaki, T.; Nikolskiy, S.; Hammarberg, T.; Nuortie, H.; Bhuiyan, M.Z.H.; Sarkka, S.; et al. Sensors and AI Techniques for Situational Awareness in Autonomous Ships: A Review. IEEE Trans. Intell. Transport. Syst. 2022, 23, 64–83. [Google Scholar] [CrossRef]
  5. Farahnakian, F.; Heikkonen, J. Deep Learning Based Multi-Modal Fusion Architectures for Maritime Vessel Detection. Remote Sens. 2020, 12, 2509. [Google Scholar] [CrossRef]
  6. Boonchoo, T.; Ao, X.; Liu, Y.; Zhao, W.; Zhuang, F.; He, Q. Grid-Based DBSCAN: Indexing and Inference. Pattern Recognit. 2019, 90, 271–284. [Google Scholar] [CrossRef]
  7. Liu, Z.; Zhang, Y.; Yu, X.; Yuan, C. Unmanned Surface Vehicles: An Overview of Developments and Challenges. Annu. Rev. Control 2016, 41, 71–93. [Google Scholar] [CrossRef]
  8. Kufoalor, D.K.M.; Johansen, T.A.; Brekke, E.F.; Hepsø, A.; Trnka, K. Autonomous Maritime Collision Avoidance: Field Verification of Autonomous Surface Vehicle Behavior in Challenging Scenarios. J. Field Robot. 2020, 37, 387–403. [Google Scholar] [CrossRef]
  9. Thompson, D.J. Maritime Object Detection, Tracking, and Classification Using Lidar and Vision-Based Sensor Fusion. Master’s Thesis, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA, 2017. [Google Scholar]
  10. Woo, J.; Lee, J.; Kim, N. Obstacle Avoidance and Target Search of an Autonomous Surface Vehicle for 2016 Maritime RobotX Challenge. In Proceedings of the 2017 IEEE Underwater Technology (UT), Busan, Republic of Korea, 21–24 February 2017. [Google Scholar]
  11. Kamsvåg, V. Fusion between Camera and Lidar for Autonomous Surface Vehicles. Master’s Thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2018. [Google Scholar]
  12. Wu, Y.; Qin, H.; Liu, T.; Liu, H.; Wei, Z. A 3D Object Detection Based on Multi-Modality Sensors of USV. Appl. Sci. 2019, 9, 535. [Google Scholar] [CrossRef]
  13. Clunie, T.; DeFilippo, M.; Sacarny, M.; Robinette, P. Development of a Perception System for an Autonomous Surface Vehicle Using Monocular Camera, LIDAR, and Marine RADAR. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
  14. Lu, Z.; Li, B.; Yan, J. Research on Unmanned Surface Vessel Perception Algorithm Based on Multi-Sensor Fusion. In Proceedings of the 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022. [Google Scholar]
  15. Wang, L.; Xiao, Y.; Zhang, B.; Liu, R.; Zhao, B. Water Surface Targets Detection Based on the Fusion of Vision and LiDAR. Sensors 2023, 23, 1768. [Google Scholar] [CrossRef]
  16. Liu, D.; Zhang, J.; Jin, J.; Dai, Y.; Li, L. A New Approach of Obstacle Fusion Detection for Unmanned Surface Vehicle Using Dempster-Shafer Evidence Theory. Appl. Ocean. Res. 2022, 119, 103016. [Google Scholar] [CrossRef]
  17. Chen, J.; Wang, H. An Obstacle Detection Method for USV by Fusing of Radar and Motion Stereo. In Proceedings of the 2020 IEEE 16th International Conference on Control & Automation (ICCA), Singapore, 9–11 October 2020. [Google Scholar]
  18. Haghbayan, M.-H.; Farahnakian, F.; Poikonen, J.; Laurinen, M.; Nevalainen, P.; Plosila, J.; Heikkonen, J. An Efficient Multi-Sensor Fusion Approach for Object Detection in Maritime Environments. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018. [Google Scholar]
  19. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  20. Kim, A.; Osep, A.; Leal-Taixe, L. EagerMOT: 3D Multi-Object Tracking via Sensor Fusion. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021. [Google Scholar]
  21. Wang, X.; Fu, C.; He, J.; Wang, S.; Wang, J. StrongFusionMOT: A Multi-Object Tracking Method Based on LiDAR-Camera Fusion. IEEE Sens. J. 2023, 23, 11241–11252. [Google Scholar] [CrossRef]
  22. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  23. Breitinger, A.; Clua, E.; Fernandes, L.A.F. An Augmented Reality Periscope for Submarines with Extended Visual Classification. Sensors 2021, 21, 7624. [Google Scholar] [CrossRef] [PubMed]
  24. Bingham, B.; Aguero, C.; McCarrin, M.; Klamo, J.; Malia, J.; Allen, K.; Lum, T.; Rawson, M.; Waqar, R. Toward Maritime Robotic Simulation in Gazebo. In Proceedings of the OCEANS 2019 MTS/IEEE SEATTLE, Seattle, WA, USA, 27–31 October 2019. [Google Scholar]
  25. Amosa, T.I.; Sebastian, P.; Izhar, L.I.; Ibrahim, O.; Ayinla, L.S.; Bahashwan, A.A.; Bala, A.; Samaila, Y.A. Multi-Camera Multi-Object Tracking: A Review of Current Trends and Future Advances. Neurocomputing 2023, 552, 126558. [Google Scholar] [CrossRef]
Figure 1. The perception process of decision-level fusion.
Figure 2. Complex scenario. The rectangular boxes in red and cyan, respectively, represent the projections of the point clouds from the ship and the buoy onto the image plane. Specifically, the IoU between the buoy’s point cloud projection and the image target of the buoy is 0.114, while its IoU with the image target of the ship is 0.225. The IoU between the ship’s point cloud projection and the image target of the buoy is 0.088, while its IoU with the image target of the ship is 0.174. If the targets with the highest IoU are matched first during the matching process, it could lead to the buoy’s point clouds being incorrectly associated with the ship’s image target. Similarly, even when the goal is to maximize the total IoU, a mismatch still occurs.
Figure 3. Water surface target detection module.
Figure 4. Multi-target tracking module.
Figure 5. Monocular estimation.
Figure 6. Target Matching Network.
Figure 7. The network structure of the improved YOLOv9 algorithm. The red dashed box in the figure indicates the occlusion detection branch. The occlusion detection branch performs a binary classification task, so the same structure as the classification branch is chosen. The improved YOLOv9 algorithm outputs three kinds of information: the category, the coordinates of the recognition box, and the probability of being occluded.
Figure 8. Examples of recognition results from the improved YOLOv9 algorithm. Each target recognition box displays the corresponding target category and the probability of being occluded. The dock is occluded by the floating dock, so its probability of being occluded is higher.
Figure 9. VRX simulation environment.
Figure 10. Alignment of timestamps between image and point cloud frame.
Figure 11. Reading data pairs in offline data package. A sliding window algorithm is used to capture data. For each image data frame within the sliding window, the nearest LiDAR, GPS, and IMU frames are searched before and after it. By using the interpolation method, the point cloud, GPS, and IMU information corresponding to the timestamp of the image frame are obtained.
Figure 12. Real-time capture of data pairs. Dual-thread processing is adopted. The main thread performs detection and tracking functions based on the input data pairs, while the auxiliary thread continuously receives data frames and stores them in the data queue. Following each detection and tracking operation, the main thread interacts with the data queue to extract the most recent pair of data.
Figure 13. Test scenario for target detection and multi-target tracking method.
Figure 14. Visualization of multi-target tracking frames. (a) The green numbers represent the tracking IDs of the targets, the right angle formed by red and blue lines represents our own ship, and the rest of the pixels represent the occupancy grid of obstacles; (b) The “question mark” icon represents obstacles of unknown categories, the black catamaran represents our own ship, and the rest of the categories are marked in the figure.
Figure 15. Image target detection results under normal, foggy, and low-light conditions.
Figure 16. Curve of MOTA over runtime.
Figure 17. Target matching accuracy curve with varying disturbance standard deviation.
Table 1. Overview of water surface target recognition based on LiDAR and camera.

Year of Research Publication | Sensors | Need for Joint Calibration | Feature Matching Capability *
2017 [9] | GPS/INS, 2D-LiDAR, Monocular camera | Yes | Bad
2017 [10] | 2D-LiDAR, Monocular camera | Yes | Bad
2018 [12] | 3D-LiDAR, Monocular camera | Yes | Good
2018 [11] | 3D-LiDAR, Monocular camera | Yes | Bad
2018 [18] | 3D-LiDAR, Monocular camera | Yes | Bad
2020 [17] | GPS, Radar, Binocular camera | Yes | Normal
2021 [13] | IMU, 3D-LiDAR, Radar, Monocular camera | Yes | Bad
2022 [14] | 3D-LiDAR, Monocular camera | Yes | Bad
2022 [16] | 3D-LiDAR, Millimeter-wave Radar, Binocular camera | Yes | Normal
2023 [15] | 3D-LiDAR, Monocular camera | Yes | /

* “Feature Matching Capability” refers to the ability to correctly associate point cloud and image features for each target, even when there are occlusions among targets.
Table 2. Hardware configuration.

Component | Specification
CPU | i7-10700
GPU | RTX-3060
Memory | 24 GB RAM
Storage | 256 GB SSD
Operating System | Linux Ubuntu 20.04 LTS (ROS Noetic)
Table 3. Average matching accuracy in ordinary and complex scenarios.

Method | Ordinary Acc (%) | Ordinary TP + TN | Ordinary GT | Complex Acc (%) | Complex TP + TN | Complex GT
IOU | 83.00 | 874 | 1053 | 76.76 | 1199 | 1562
SDIOU [21] | 73.22 | 771 | 1053 | 67.54 | 1055 | 1562
MLP (Ours) | 89.36 | 941 | 1053 | 84.83 | 1325 | 1562
Attention + MLP (Ours) | 92.02 | 969 | 1053 | 89.44 | 1397 | 1562
Occ + Attention + MLP (Ours) | 92.88 | 978 | 1053 | 90.59 | 1415 | 1562
Table 4. IoU-based vs. proposed matching method in conjunction with tracking module.

Method | FP | FN | IDSW | MOTA (%) | FPS
IoU-based | 5802 | 1511 | 354 | 71.9 | 23.73
Ours | 1437 | 1511 | 354 | 87.9 | 21.98
Table 5. Performance of image target detection under various environmental conditions.

Environment | Recall (%) | Precision (%)
Normal | 92.14 | 98.84
Light Fog | 96.43 | 98.15
Heavy Fog | 52.50 | 85.03
50% Lighting | 94.64 | 98.49
20% Lighting | 60.36 | 92.31
Table 6. Average matching accuracy under various environmental conditions.

Environment | Accuracy (%) | TP + TN | GT
Normal | 92.93 | 880 | 947
Light Fog | 91.34 | 865 | 947
Heavy Fog | 94.83 | 898 | 947
50% Lighting | 92.82 | 879 | 947
20% Lighting | 93.56 | 886 | 947
Table 7. Performance of proposed WSTDT method under various environmental conditions.

Environment | FP | FN | IDSW | Fixed MOTA (%) | FPS
Normal | 1437 | 5204 | 354 | 74.36 | 21.98
Light Fog | 1492 | 4965 | 532 | 74.38 | 21.65
Heavy Fog | 2541 | 6716.5 | 354 | 64.77 | 22.13
50% Lighting | 1465 | 5198 | 354 | 74.28 | 21.72
20% Lighting | 1959 | 6440 | 534 | 67.26 | 22.04
Table 8. Performance evaluation results of WSTDT method under different ship densities.

Number of Ships | Number of Other Obstacles | FP | FN | IDSW | MOTA (%)
3 | 16 | 911 | 360 | 5 | 90.19
6 | 16 | 1115 | 207 | 2 | 91.71
9 | 16 | 1270 | 302 | 6 | 91.31
12 | 16 | 1643 | 3222 | 4 | 70.88
Table 9. Performance of WSTDT method under various disturbance standard deviations.

Standard Deviation | FP | FN | IDSW | MOTA (%)
0 | 1437 | 1511 | 354 | 87.89
0.5 | 1593 | 1633 | 349 | 86.90
1 | 2712 | 1349 | 446 | 83.48

