Article

Toward an Augmented Reality Representation of Collision Risks in Harbors

by Mario Miličević 1,*, Igor Vujović 1,*, Miro Petković 1 and Ana Kuzmanić Skelin 2

1 Faculty of Maritime Studies, University of Split, Ruđera Boškovića 37, 21000 Split, Croatia
2 Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture, University of Split, Ruđera Boškovića 32, 21000 Split, Croatia
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9260; https://doi.org/10.3390/app15179260
Submission received: 16 July 2025 / Revised: 7 August 2025 / Accepted: 21 August 2025 / Published: 22 August 2025
(This article belongs to the Section Marine Science and Engineering)

Featured Application

A mobile application for the visualization of collision risk in maritime traffic based on augmented reality is proposed.

Abstract

In ports with a significant density of non-AIS vessels, the risk of collision is increased: physical limitations restrict the maneuverability of AIS-equipped vessels, while small vessels without AIS behave unpredictably. To help prevent collisions, we propose an augmented reality system that detects vessels in a video stream and estimates their speed with a single side-mounted camera, with the goal of visualizing a cone for risk assessment. Speed is estimated from the geometric relations between the camera and the ship, which are used to compute the distance travelled between points in a known time interval. The most important part of the proposal is vessel speed estimation with a monocular camera, validated against laser speed measurements. The system is intended to help port authorities manage risk, and it differs from similar trials in that it uses a single stationary camera linked to the authorities rather than to the bridge crew.

1. Introduction

Vessel speed estimation is a critical aspect of maritime domain awareness, encompassing safety enforcement, environmental monitoring, traffic optimization, and port security. As the global maritime industry grows, with increasing vessel density near ports and coastal areas, the demand for scalable, automated monitoring solutions becomes more pressing. Traditional methods, such as radar and AIS (Automatic Identification System), have long served this purpose, but both have significant limitations: radar systems require a large infrastructure and are sensitive to weather conditions, while AIS depends on the active cooperation of ships.
The current technology trend is moving towards image processing-based systems, due to their passive nature, relatively low cost, and ability to be deployed in a distributed manner. Such systems are particularly advantageous for detecting and analyzing the behavior of vessels that do not transmit AIS data, such as small fishing boats or unauthorized crafts. The visual approach also provides additional data, such as size, direction, and vessel type, which can enhance operational decision-making.
In this study, we propose a computer vision-based approach using a single overhead side-mounted camera installed at a harbor entrance. The methodology combines object detection (YOLOv9), object tracking (DeepSORT), and geometric calibration of image coordinates to real-world distances using camera parameters (height and tilt angle). This framework allows us to estimate the vessel speed by analyzing the displacement of tracked vessel centroids over a known time interval, accounting for pixel scaling variations due to the perspective projection.
Our approach emphasizes reproducibility and practicality: it does not require prior knowledge of vessel size or class, can be deployed in a wide range of settings, and provides output data in a form compatible with port control systems. The methodology was tested on real harbor surveillance footage collected in varied environmental conditions. The findings demonstrate that the system performs well across different vessel types and sizes, with manageable error margins, offering a viable alternative or supplement to traditional maritime surveillance technologies.
The use of computer vision (CV) in traffic monitoring is improving every day. The areas of application are very diverse: Autonomous Underwater Vehicles (AUVs) [1], local sea state estimation [2], ship navigation support [3], tracking of multiple objects in the maritime domain [4], innovative methods for CV techniques in the maritime sector [5], risk assessment for autonomous surface vessels [6], intruder/illegal activity monitoring [7], structuring of marine areas [8], ship detection in maritime environments [9], or for maritime safety [10], to name a few.
One possible approach to support the operator in assessing the traffic situation is the use of augmented reality (AR). To develop a tool for collision estimation, one needs the framework, the application, and the “brains” of the method. We propose the framework and the application, but the real problem lies in the operation behind the scenes: a reliable risk estimation system must estimate speed, and the estimation of the distance between the camera and the moving ship plays a crucial role. This is the main topic of this article. Distance estimation with a monocular camera was considered in [11], where multi-scale resolution is applied to improve the accuracy of the estimation by extending the expressiveness of the depth information. In [12], it is proposed to use facial features, in addition to body features, with various regression algorithms to estimate the distance from a camera; this is obviously not applicable to ships. In [13], the PTZ camera’s ability to pan, tilt, and optically zoom was used to estimate the distance. In [14], a self-supervised approach called masked object modeling is introduced. This shows that distance estimation remains an active research problem.
The contribution of this article is a validated method for estimating the speed of vessels using a side-mounted monocamera to detect and prevent possible collisions. The entire system is based only on the data from a single stationary camera at the port entrance, which makes this method affordable and easily applicable to other ports worldwide. It also offers modern approaches, such as the detection and tracking of objects by ANN with augmented reality, to indicate and possibly prevent collisions and improve the safety of navigation.
Table 1 summarizes the advantages of the proposed vision-based system over radar and AIS methods; each advantage is explained in the “Explanation” column, and the corresponding limitations are listed in the last column.
The article is structured as follows. The second section discusses the background and motivation for this research. The third section describes the proposed method. The fourth section presents the experimental results. The fifth section discusses important questions and issues.

2. Background and Motivation

Vessel speed monitoring is crucial in port security, maritime traffic control, and environmental regulation enforcement. Traditional solutions, such as radar and AIS, offer reliable data but have limitations in high-traffic, near-shore, or GPS-denied environments. Computer vision approaches present a low-cost, scalable alternative for passive, infrastructure-based monitoring.
Previous studies have explored optical flow and trajectory modeling for object speed estimation; however, their application in maritime contexts remains underdeveloped due to challenges, such as surface reflection, occlusion, and scale distortion. This research builds upon the recent advances in deep learning-based object detection and aims to offer a reproducible, geometry-aware methodology for estimating vessel speed from a single side-mounted camera.
Additionally, this research contributes to the broader effort in transforming ports into smart, sensor-driven environments. By offering scalable and non-invasive monitoring, vision-based systems allow authorities to track traffic trends, detect unauthorized behavior, and improve safety.
Image-based motion estimation relies on projective geometry principles, where an object’s pixel displacement corresponds to physical movement. Due to perspective projection, pixel movement varies with object distance from the camera. Thus, a geometry-aware calibration is essential to infer meaningful real-world motion from observed image-space trajectories.
While optical flow methods, such as Lucas–Kanade, Farnebäck, and Horn–Schunck, provide dense motion fields, they are less effective in maritime environments. Water surfaces introduce high-frequency motion and repetitive patterns, violating the brightness constancy assumption. Reflections from sunlight and waves add further noise, making optical flow unreliable for speed estimation.
YOLOv9 (You Only Look Once, version 9) improves upon earlier versions with better localization, anchor-free detection, and attention mechanisms. It performs robust detection, even in complex maritime conditions. Combined with DeepSORT (Simple Online and Realtime Tracking with Deep Association Metrics), the system assigns consistent identities to tracked vessels, maintaining continuity despite occlusion or frame skipping.

3. Materials and Methods

3.1. Proposed Idea

The artificial neural network (ANN) called You Only Look Once (YOLO) was trained on the Split Port Ship Classification Dataset (SPSCD) [15]. The training of the ANN was performed on a workstation equipped with an Intel(R) Core(TM) i7-13700F processor and an NVIDIA GeForce RTX 4070 graphics card.
A fixed HD camera (1920 × 1080 resolution, Dahua Technology, Hangzhou, China, model DH-TPC-PT8620A-T) was mounted at a height of 9 m with an inclination of 20° towards the water. Video footage was recorded over a period of 10 days in various weather and lighting conditions.
YOLOv9 was used to detect vessels in each frame. The centroid of each bounding box was recorded and tracked over time using DeepSORT. The positions were interpolated to reduce noise.
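To make this pipeline concrete, the following is a minimal sketch of the per-frame detect–track–record loop. It assumes commonly available open-source wrappers (the ultralytics YOLO interface and the deep_sort_realtime tracker) as stand-ins for the TensorFlow-based implementation described below; the weights file and the stream URL are placeholders. The recorded centroids would subsequently be interpolated as described above.

```python
# Minimal sketch of the detect-track-record loop. The ultralytics YOLO
# interface and the deep_sort_realtime tracker are stand-ins for the
# paper's TensorFlow-based implementation; the weights file and the
# stream URL are placeholders.
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolov9-spscd.pt")                  # placeholder weights
tracker = DeepSort(max_age=30)
centroids = {}                                   # track_id -> [(frame_no, cx, cy)]

cap = cv2.VideoCapture("rtsp://harbor-camera")   # placeholder stream URL
frame_no = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    detections = []
    for box in model(frame)[0].boxes:            # one YOLO pass per frame
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        detections.append(([x1, y1, x2 - x1, y2 - y1],
                           float(box.conf[0]), int(box.cls[0])))
    for track in tracker.update_tracks(detections, frame=frame):
        if not track.is_confirmed():
            continue
        x1, y1, x2, y2 = track.to_ltrb()
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # bounding-box centroid
        centroids.setdefault(track.track_id, []).append((frame_no, cx, cy))
    frame_no += 1
cap.release()
```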
Figure 1 shows the proposed overall framework. The camera installed in the harbor is connected to the computer via an IP (Internet Protocol) connection, which performs all steps, except image acquisition. The basic idea is to present the results of the computer estimation in the mobile app via a wireless connection.
The system was implemented using Python 3.10.4, OpenCV-python 4.6.0.66, and TensorFlow 2.16. The detection-tracking pipeline was optimized for batch processing of multiple vessels. To evaluate accuracy, speed estimates were compared with laser speed gun measurements.
False positives were filtered using trajectory length thresholds and detection confidence metrics. The proposed system ran in near-real time on a standard GPU-enabled personal computer.
On-site data collection was performed in the harbor with camera and laser measurements. In the proposed framework, the camera in the port operates online and captures live images that are processed on a remote computer (in the faculty laboratory for research purposes, and at the port authority if the system is adopted in the future). Because the frames are transmitted over IP, ship detection incurs a small network delay. The computer performs ship detection, the distance and speed estimation algorithm, and, subsequently, an AR visualization of the collision risk, which is sent to the operator, who can see the risk on their smartphone.

3.2. Proposed Methodology

The research is based on estimating the speed of a ship from video footage captured by a camera installed at the harbor entrance. The speed was estimated from the movement of the center of the bounding box, which was determined by applying the YOLOv9 detection algorithm and the DeepSORT algorithm to track the detected object. The center of the bounding box is represented as a single point in the image coordinate system for each frame of the observed time interval, whose position is given by the corresponding x-y coordinates in pixels. A time interval of three seconds was used for velocity estimation, as this is a sufficiently small time span to ensure good accuracy while keeping the computational complexity low enough to allow simultaneous tracking and computation for a large number of vessels in the image. In preliminary experiments, we evaluated observation intervals ranging from 1 to 5 s and found that a three-second interval provided the most consistent balance between accuracy and robustness across vessel sizes and speeds. Shorter intervals (1–2 s) led to noisier measurements, especially for slow-moving vessels, due to smaller pixel displacements; longer intervals (4–5 s) increased the chance of occlusion or vessel trajectory changes, which degraded tracking consistency. Experiments with 2 s, 3 s, and 5 s intervals also revealed that slower vessels (e.g., passenger ships) benefit from longer intervals, while faster vessels (e.g., speed craft) obtain better estimates with shorter intervals.
Figure 2 shows the movement of a ship leaving the harbor. The green color shows the trajectory “t”, which indicates the direction and speed of the ship’s movement. The linear interpolation of the trajectory points produces a line that indicates the direction of the ship’s movement. The ship has moved from point P1 to point P2 after 3 s. The red lines represent the position of the ship in the coordinate system for point P1, while the blue lines represent the position at point P2. The labels X1, X2, Y1, Y2 are the coordinates of the points mentioned and can be used to calculate the length of the trajectory l(t) in pixels:
From Figure 3, it can be written that:
$l(t) = \sqrt{(Y_1 - Y_2)^2 + (X_1 - X_2)^2}$.
The area of interest is presented in Figure 4 and is marked with two blue bars. The length and width of the pixels through which the ship passes can be estimated from the measured reference values shown in the image, which depend on the distance to the camera and the image resolution. The camera is mounted at a height of 9 m with a tilt of 20° towards the water. Since each row of image pixels views the ground at a different angle, the viewing angles of the upper and lower reference points must be determined using trigonometry and elevation angles:
$\alpha_{\mathrm{upper}} = \arctan\frac{h}{d_{\mathrm{upper}}} = \arctan\frac{9\ \mathrm{m}}{204\ \mathrm{m}} = 2.5^\circ$,
$\alpha_{\mathrm{lower}} = \arctan\frac{h}{d_{\mathrm{lower}}} = \arctan\frac{9\ \mathrm{m}}{44\ \mathrm{m}} = 11.5^\circ$.
The image therefore covers a vertical angular range of:
$\alpha = \alpha_{\mathrm{lower}} - \alpha_{\mathrm{upper}} = 9^\circ$.
The viewing angle for the n-th row of pixels is calculated as:
$\theta(n) = \alpha_{\mathrm{upper}} + \frac{n}{H-1}\left(\alpha_{\mathrm{lower}} - \alpha_{\mathrm{upper}}\right)$,
where θ(n) is the angle at which row n views the sea surface, and H is the total number of rows (612 in our case). For each row of the image, the ground distance (from the camera’s vertical projection to the point that the row “sees”) is:
$d(n) = \frac{h}{\tan\theta(n)}$.
The pixel length for row n is the difference between the ground distances of two neighboring rows:
$\Delta d(n) = d(n) - d(n+1)$.
The vertical length of a pixel is therefore not constant but decreases towards the bottom of the image, because the viewing angle increases and the tangent function grows rapidly. The pixel width decreases linearly from the top to the bottom of the image:
$S(n) = s_1 + \frac{n}{612\ \mathrm{px}}\left(s_2 - s_1\right)$,
where s1 and s2 represent the pixel size in the lower and upper parts of the area of interest, respectively:
$s_1 = \frac{44\ \mathrm{m}}{1920\ \mathrm{px}} = 0.0229\ \mathrm{m/px}$,
$s_2 = \frac{204\ \mathrm{m}}{1920\ \mathrm{px}} = 0.1062\ \mathrm{m/px}$.
The width of one pixel in the n-th row is then calculated as:
$px_w = \frac{S(n)}{W}$,
where W denotes the number of horizontal pixels (1920 in our case). Using this method, the actual pixel size (length and width in meters) can be calculated for each row of the image, which is a key parameter for estimating the speed of the observed ships. Applying the equations above yields the distance in meters travelled by the vessel in the three-second interval, from which the vessel speed is finally obtained as:
$v = \frac{\mathrm{distance}}{\mathrm{time}}$.
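The full row-wise calibration and speed estimate can be sketched in a few lines of Python, using the constants reported above (h = 9 m, H = 612 rows, W = 1920 columns, reference ground distances of 44 m and 204 m). This is a minimal sketch, not the published implementation: the function names and the row-indexing convention are ours, and, unlike the linear interpolation S(n) above, the pixel width here is derived directly from d(n)/W, which reproduces s1 and s2 at the two reference rows.

```python
# Sketch of the row-wise ground calibration and the speed estimate
# described above; constants are taken from the text, function names
# and the row-indexing convention (n = 0 at the top row) are ours.
import math

H, W = 612, 1920                     # rows in the area of interest, columns
h = 9.0                              # camera height (m)
d_top, d_bottom = 204.0, 44.0        # ground distances at top/bottom rows (m)
a_upper = math.atan(h / d_top)       # ~2.5 degrees, in radians
a_lower = math.atan(h / d_bottom)    # ~11.5 degrees, in radians

def ground_distance(n):
    """Ground distance d(n) for pixel row n (n = 0 is the top row)."""
    theta = a_upper + (n / (H - 1)) * (a_lower - a_upper)
    return h / math.tan(theta)

def pixel_length(n):
    """Vertical extent of one pixel at row n, in metres: d(n) - d(n+1)."""
    return ground_distance(n) - ground_distance(n + 1)

def pixel_width(n):
    """Horizontal extent of one pixel at row n, in metres.

    Derived as d(n)/W; this reproduces s1 = 44/1920 at the bottom row
    and s2 = 204/1920 at the top row.
    """
    return ground_distance(n) / W

def speed_kmh(p1, p2, dt=3.0):
    """Speed from two centroids (x, y in px) observed dt seconds apart."""
    n = int(round((p1[1] + p2[1]) / 2))        # mean row for the scale factors
    dx_m = (p2[0] - p1[0]) * pixel_width(n)    # lateral displacement (m)
    dy_m = (p2[1] - p1[1]) * pixel_length(n)   # along-view displacement (m)
    return math.hypot(dx_m, dy_m) / dt * 3.6   # m/s -> km/h
```

As an illustration, a centroid moving from (900, 300) to (1030, 310) over 3 s corresponds to roughly 5 m of travel at the mid-frame scale, i.e., a speed of about 6 km/h.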
To further assess the consistency of the proposed method, the standard deviation (SD) and coefficient of variation (CoV) of the category-wise estimation errors are calculated:
$SD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$
and
$CoV = \frac{SD}{E} \cdot 100$,
where E is the mean error over all categories.
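A direct transcription of these two formulas, assuming the population form (division by n) as written above:

```python
# Direct transcription of the two dispersion formulas above, using the
# population form (division by n) as written; input is a list of
# per-category errors in percent.
import math

def dispersion(errors):
    """Return (SD, CoV) of a list of error values, CoV in percent."""
    mean = sum(errors) / len(errors)   # E, the average error
    sd = math.sqrt(sum((x - mean) ** 2 for x in errors) / len(errors))
    return sd, sd / mean * 100.0
```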
To illustrate the collision-risk assessment, consider two moving vessels. When two vessels are present, the time each vessel needs to reach the potential intersection point must be calculated:
$t_{v1} = \frac{\sqrt{(P_{ix} - x_{v1})^2 + (P_{iy} - y_{v1})^2}}{v_{v1}}$,
$t_{v2} = \frac{\sqrt{(P_{ix} - x_{v2})^2 + (P_{iy} - y_{v2})^2}}{v_{v2}}$,
where $(P_{ix}, P_{iy})$ are the coordinates of the possible intersection point, $v_{v1}$ and $v_{v2}$ are the velocities of vessels 1 and 2, and $(x_{v1}, y_{v1})$, $(x_{v2}, y_{v2})$ are the positions of the corresponding vessels.
If vessel 1 and vessel 2 are determined to reach the intersection point at clearly different times, there is no collision risk in that situation, and the mobile app displays a green cone around the area. If the arrival times are close to each other, the cone color shifts towards red (yellow, orange, etc.), and if the times are identical, a hot-red cone is displayed in the mobile app. A minimal sketch of this logic is given below.
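The sketch assumes the intersection point Pi has already been obtained by intersecting the two interpolated trajectory lines; the 30 s safety window and the colour thresholds are illustrative assumptions, as the exact values are not fixed in this paper.

```python
# Sketch of the time-to-intersection test and the green-to-red colour
# ramp described above. The intersection point Pi is assumed given;
# the 30 s window and the colour thresholds are illustrative.
import math

def time_to_point(pos, speed, pi):
    """Travel time (s) from a vessel position to Pi (metres, m/s)."""
    return math.hypot(pi[0] - pos[0], pi[1] - pos[1]) / speed

def risk_colour(t1, t2, window=30.0):
    """Map the gap between the two arrival times to a cone colour."""
    gap = abs(t1 - t2)
    if gap >= window:
        return "green"            # clearly different arrival times: no risk
    ratio = gap / window          # 0 = simultaneous arrival, 1 = safe margin
    if ratio > 0.66:
        return "yellow"
    if ratio > 0.33:
        return "orange"
    return "red"                  # near-simultaneous arrival: hot-red cone
```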
Our goal is to further develop this function so that it is also useful for smartphones.

3.3. Detector and Tracker Selection

For vessel detection, we selected YOLOv9 after thorough benchmarking against previous YOLO versions on our Split Port Ship Classification Dataset. YOLOv9 provided the best trade-off between detection accuracy (especially for small or partially occluded vessels) and real-time inference speed, which is attributed to its improved anchor-free design and advanced backbone modules. This makes it highly suitable for the variable object sizes, viewing angles, and illumination typical of port scenes. The effectiveness of YOLO-based architectures for maritime vessel detection has been corroborated by recent studies, such as Liu and Li (2023) [16] and Zhai (2024) [17], who demonstrated reliable ship detection performance in real port and inland waterway environments [16,17].
For multi-object tracking, we adopted DeepSORT because it combines robust identity preservation using deep appearance features (re-ID embedding) with computational efficiency. This is essential for real-time operation and for handling moderate vessel density and partial overlaps, which are common in port monitoring scenes. DeepSORT’s performance for ship tracking and its ability to maintain target identity has been positively evaluated in several recent maritime surveillance studies [16,17].

4. Results

Experimental settings include a Dahua Technology (Hangzhou, China) model DH-TPC-PT8620A-T surveillance camera [18] installed at the beacon above the port entrance, monitoring all incoming and outgoing maritime traffic in the port of Split. The results were validated using a Braun Rangefinder 1000 WH laser distance and speed measuring device. The device has a speed range of 0–300 km/h and a maximum distance range of 1000 m, offers 6× magnification if necessary, and has a declared precision of ±1 m.
The research methodology was conducted on a sample of vessels of different types. The dataset includes sailboats, yachts, motorboats, large passenger vessels, and speed crafts. The vessels were tracked across the entire field of view of the camera and at different distances from the camera itself in order to fully validate the system for different vessel types and at different distances from the camera.
Figure 5 illustrates the complexity of daily traffic at the site. There are a variety of vessel types. In the figure, one can see two ferries moving in different directions, boats, and tourist vessels boarding. Many of these vessels are not equipped with AIS, so video detection offers an advantage in comparison to other detection systems. It should be noted that, in summer, almost all traffic takes place during daylight hours. In winter, the traffic is less dense.
Table 2 shows examples of the experimental results. At the time of the site investigation, there were five categories of vessels, based on the classification in [19]. Pleasure yachts are defined as over 12 m in length. Motorboats are defined as being in the 7–12 m length range. Speedboats longer than 2 m fall into the “Speed craft” category. Sailing boats, regardless of construction, are defined as longer than 6 m. Large passenger ships exceed 130 m in length. Figure 6 shows the number of occurrences (instances) of the measured categories.
Table 3 shows the category estimation error analysis for all the results as average and in the minimum error–maximum error range.
A detailed breakdown of the estimation error by vessel category reveals several important insights. Small boats exhibited an average error of 13.98%, with high variability primarily attributed to their smaller physical dimensions. These vessels tend to occupy fewer pixels in the image, especially when positioned farther from the camera, making them more susceptible to calibration inaccuracies and pixel quantization effects. Sailboats showed a similar average error of 13.61%, though the distribution of errors was wider. The variability in sail geometry, combined with motion induced by wind and water currents, contributed to fluctuations in the bounding box dimensions and centroid positions. Observed errors ranged from 2.8% to 21.9%, indicating significant sensitivity to changes in shape and appearance across frames. Motorboats recorded the highest average error among the tested categories, at 14.80%. This was largely influenced by their dynamic movement patterns, including rapid acceleration and deceleration, as well as the turbulence caused by their wake. These factors introduced erratic centroid trajectories that reduced estimation accuracy. In contrast, ferries achieved an average error of 11.24%. Their larger and more uniform shape contributed to consistent detection performance. Additionally, their typical presence in the mid-range pixel area—where pixel scaling is relatively stable—helped maintain errors within a narrower margin.
Finally, passenger ships demonstrated the most accurate results, with an average error of 10.21%. This is due to their substantial size and slow, stable movement; these vessels generated smooth trajectories with minimal noise in pixel displacement, enhancing the precision of the estimated speed.
The square root of the variance yielded the standard deviation. The resulting standard deviation of 1.86% indicates the extent to which the errors of the individual categories deviate from the average total error. The coefficient of variation is 14.53% and indicates a moderate dispersion of the error values for the various ship categories.
Compared to the laser speed measurements, the proposed method exhibits errors ranging from 2% to 22%, depending on the ship category; the average estimation error across all categories is approximately 12.78%.
In addition to category-wise average errors, we estimated the standard deviation (SD) and coefficient of variation (CoV) for each vessel type using the reported minimum and maximum relative errors. The SD was approximated with the range rule of thumb, i.e., the error range divided by four. These values provide an insight into the internal variability within each category; a worked example is given below.
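For instance, applying the range rule to the sailing boat category from Table 3 reproduces the corresponding values in Table 4 up to rounding:

```latex
\[
SD \approx \frac{e_{\max} - e_{\min}}{4}
   = \frac{21.97\% - 2.83\%}{4} \approx 4.79\%,
\qquad
CoV = \frac{SD}{\bar{e}} \cdot 100
    = \frac{4.79}{14.8} \cdot 100 \approx 32.4\%.
\]
```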
The results are presented in Table 4. The highest relative variation was found in the sailing boat category (CoV = 32.37%), consistent with its broad range of errors (2.83–21.97%) and dynamic shape behavior. The large passenger ship category also showed significant dispersion (CoV = 25.89%), likely due to differences in ship geometry and visual perspective across examples. Conversely, motorboats showed the lowest variability (CoV = 9.99%), followed closely by pleasure yachts (CoV = 15.96%). The speed craft category could not be analyzed for variability due to unavailable range data.
These findings reinforce the need for vessel-type-specific calibration or adaptive modeling, especially when monitoring smaller or less stable vessel classes.
Error histograms for each vessel type revealed positively skewed distributions, especially in the case of sailboats and motorboats. The kurtosis for the sailboat category was higher, indicating a greater number of extreme deviations, possibly linked to environmental factors, like wave interference.
Further analysis showed a strong correlation between error magnitude and the image position of the tracked centroid. Vessels tracked in the top third of the frame (farther from the camera) showed higher error rates. Motion components along the optical axis (perpendicular to the image plane) also introduced additional variance, which was expected given perspective compression effects.
Applying confidence-weighted averaging (using the YOLO detection confidence) slightly improved accuracy in categories with erratic shapes, suggesting that integrating temporal detection-stability metrics could further enhance robustness. A sketch of this weighting is shown below.
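A minimal sketch of such confidence weighting, with function and variable names of our choosing rather than from the implementation:

```python
# Minimal sketch of confidence-weighted speed averaging: each
# per-interval estimate is weighted by its YOLO detection confidence.
def weighted_speed(speeds, confidences):
    """Confidence-weighted mean of per-interval speed estimates (km/h)."""
    total = sum(confidences)
    return sum(s * c for s, c in zip(speeds, confidences)) / total

# e.g., three 3-second estimates with their detection confidences:
# weighted_speed([17.0, 17.8, 16.5], [0.91, 0.62, 0.88])
```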

4.1. Quantitative Analysis of Distance Estimation Accuracy

To evaluate the performance of the proposed monocular distance estimation method, a quantitative analysis was conducted to assess how the estimation error varies with the vessel’s distance from the camera. The results presented in Table 5 show that the average relative error increases progressively with distance. Specifically, for vessels located within the range of 50–100 m, the average error was below 7.6%. For distances in the range of 100–300 m, the error remained under 12.1%. Beyond 300 m, the error increased significantly due to reduced resolution and stronger perspective effects. These findings indicate that the proposed system is reliable for port monitoring and early warning within an operational range of up to 300 m from the camera.

4.2. Real-Time Performance and Multi-Vessel Efficiency

To evaluate the efficiency and scalability of the proposed algorithm in real-time scenarios, additional tests were conducted with video sequences containing multiple vessels (up to five) simultaneously visible in the frame. The method was deployed on a machine with an NVIDIA RTX 4070 GPU, and the average processing time per frame was measured. The results show that the system maintains an average processing time below 50 ms per frame, corresponding to real-time performance at 20 FPS.
These results, presented in Table 6, confirm the method’s feasibility for real-time port monitoring applications under moderate-to-high vessel density.
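The per-frame timing behind Table 6 can be reproduced with a simple wall-clock measurement, sketched below; `process_frame` is a stand-in for the full detect–track–estimate pipeline and is not part of any published code.

```python
# Sketch of the per-frame timing measurement behind Table 6;
# process_frame stands in for the full detect-track-estimate pipeline.
import time

def measure_throughput(frames, process_frame):
    """Return (ms per frame, FPS) over an iterable of frames."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    ms_per_frame = elapsed / count * 1000.0
    return ms_per_frame, 1000.0 / ms_per_frame
```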

5. Discussion

Since the ANN was trained on the SPSCD, the class imbalance noted by the authors in [15] applies. In practice, there is a trade-off between the granularity of the classification and the balance of the dataset when creating datasets. Too few categories lead to overgeneralization, because different vessels are grouped into broad categories; too many categories risk misclassification, because there are not enough training examples per category. Therefore, the dataset should be balanced. We propose balancing according to the real traffic profile, meaning that the number of vessels per category corresponds to the real-world occurrence of those categories.
Although newer studies can obtain more accurate speed estimates, they usually do not rely on a single side-mounted monocular camera, or they require a bird’s-eye view. One example of such research is presented in [20], but it deals with urban car traffic, not maritime traffic; maritime scenes are less cluttered, although weather, sea conditions, and larger distances pose their own challenges. From the results, we can conclude that the estimated speed follows the actual speed of the vessel to a satisfactory degree. Furthermore, the estimated values are in all cases higher than the measured values, which is due to approximations and rounding when calculating the speed. In future work, the number of monitored objects should be increased to obtain more statistically significant deviation data, and the observation time interval should be varied above and below three seconds to assess its impact on accuracy. Furthermore, the study should be repeated in bad weather, when larger waves cause greater oscillations of the ship at sea, to gain insight into the behavior of the proposed system under difficult conditions.
The analysis of category-level errors reveals that the proposed method performs with acceptable accuracy, yielding an average estimation error of approximately 12.78% when compared to laser-based speed measurements. The highest precision was achieved in the “Large passenger ship” category (10.21%), while the “Sailing boat” category exhibited the widest error range (2.83–21.97%), indicating higher variability in performance for smaller and slower vessels. The system was shown to maintain a distance estimation error below 12.1% within 300 m, which we define as its optimal operational range. This performance is sufficient for effective deployment in port monitoring and early warning applications.
The calculated standard deviation of 1.86% and coefficient of variation (CoV) of 14.53% suggest a moderate level of variability across vessel categories. These results support the reliability of the method in general use but highlight the potential for further improvement, particularly for vessel types with higher deviation from ground-truth measurements. These findings provide a foundation for refining the approach and tailoring it to vessel-specific characteristics in future work.
The proposed system has a broad range of potential real-world applications. It can be employed for enforcing speed limits within ports and coastal areas by automatically detecting vessels that exceed permitted velocities. Furthermore, it enables the identification of unauthorized docking and the entry of vessels into restricted or prohibited zones, thereby enhancing security and improving port management efficiency. The system can also contribute to environmental monitoring by estimating the cumulative wave energy generated by frequent vessel traffic. In addition, historical trajectory analysis can support search and rescue operations by reconstructing vessel movements over specific time intervals. In the context of port logistics, real-time traffic data can be used to optimize docking schedules and reduce waiting times, leading to more efficient maritime operations.
Future development will focus on several key improvements. The integration of Kalman filtering will enhance trajectory smoothing and tracking accuracy. Deploying the system on edge AI devices, such as Jetson Nano or Coral TPU modules, will enable real-time operation with minimal latency. The methodology will be adapted for nighttime conditions through the use of infrared imagery, while integration with AIS data will allow for advanced anomaly detection and predictive modeling of vessel behavior. Additionally, the approach will be extended to estimate acceleration and turning angles, and a dedicated dataset will be developed for supervised learning of vessel behavior profiles. Future research will also focus on implementing adaptive time intervals based on vessel class or behavior, which may further improve the precision of speed estimation, especially in heterogeneous traffic scenarios.
Despite these advantages, the system has certain limitations. Detection accuracy depends on visibility conditions and camera stability, while recognizing sailboats and small objects at long distances remains challenging. Nighttime performance is currently constrained by poor illumination, and speed estimation is sensitive to calibration precision. These limitations will be mitigated through improved hardware configurations and data fusion techniques. There are also the standard problems of monocular camera systems, such as loss of depth information, occlusions, and calibration complexity. Monocular distance estimation requires some prior knowledge to compensate for the loss of dimensionality: either knowledge about the object or about the scene.
One of the key limitations of monocular vision systems is the inability to perceive depth directly. In crowded port environments, overlapping vessels can result in occluded or merged bounding boxes, which negatively impacts the performance of tracking algorithms, such as DeepSORT. In these scenarios, spatial ambiguity can lead to temporary ID switching or even object loss.
As noted by Park et al. (2024) [21], multi-camera setups and camera fusion approaches can significantly reduce occlusion and improve spatial resolution by providing additional viewpoints. However, such systems require more complex calibration, infrastructure, and maintenance. In contrast, monocular vision remains a cost-effective and scalable solution, particularly suited for small-to-medium port installations or as a component in hybrid sensor networks.
In conclusion, this study demonstrates a reproducible, geometry-aware, vision-based method for estimating vessel speed using a single camera. The system provides a low-cost, non-intrusive alternative to radar and AIS technologies, with strong potential for applications in port monitoring and coastal management. With further refinement and integration into smart port infrastructures, it can become a key component of fully automated maritime surveillance systems.
Finally, the mobile AR application will be developed based on the success of this investigation. The main difference between existing systems and this work lies in the operator: the operator of existing systems is on the bridge of the ship, whereas our application is intended for port authorities, VTS centers, and shore-based personnel who manage collision risks. To demonstrate this, we plan to apply for a proof-of-concept project in collaboration with interested partners outside academia. We hope that this research will contribute toward AR representations of collision risk and play a role in protecting human lives and the environment.

Author Contributions

Conceptualization, I.V. and M.M.; methodology, I.V., M.M. and A.K.S.; software, M.P. and M.M.; validation, M.M. and I.V.; investigation, A.K.S., M.P., M.M. and I.V.; writing, I.V. and M.M.; visualization, I.V., M.M., M.P. and A.K.S.; writing revision, M.M., I.V. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The training set, used for the ANN in this research, is available at: https://labs.pfst.hr/maritime-dataset/ (accessed 17 July 2025).

Acknowledgments

The paper is a part of the following project: Application of augmented reality and machine learning for maritime transport safety improvement (see https://ivujovic.pfst.hr/index.php/primjena-prosirene-stvarnosti-i-strojnog-ucenja-u-poboljsanju-sigurnosti-pomorskog-prometa-application-of-augmented-reality-and-machine-learning-for-maritime-transport-safety-improvement/).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AISAutomatic Identification System
ANNArtificial Neural Network
ARAugmented Reality
AUVAutonomous Underwater Vehicle
CoVCoefficient of Variation
CVComputer Vision
DeepSORTSimple Online and Realtime Tracking with Deep Association Metrics
IPInternet Protocol
PCPersonal Computer
PTZPan, Tilt, and Zoom
SDStandard Deviation
SPSCDSplit Port Ship Classification Dataset
YOLOYou Only Look Once

References

  1. Danielis, P.; Brekenfelder, W.; Parzyjegla, H.; Ghogari, P.; Stube, P.; Torres, F.S. Integrated Autonomous Underwater Vehicle Path Planning and Collision Avoidance Algorithms. Trans. Marit. Sci. 2024, 13, 3. [Google Scholar] [CrossRef]
  2. Vorkapic, A.; Pobar, M.; Ivasic-Kos, M. A computer vision approach to estimate the localized sea state. Ocean Eng. 2024, 309, 118318. [Google Scholar] [CrossRef]
  3. Chen, S.; Gao, M.; Shi, P.; Zeng, X.; Zhang, A. Target Ship Recognition and Tracking with Data Fusion Based on Bi-YOLO and OC-SORT Algorithms for Enhancing Ship Navigation Assistance. J. Mar. Sci. Eng. 2025, 13, 366. [Google Scholar] [CrossRef]
  4. Huang, K.; Chong, W.; Yang, H.; Lertniphonphan, K.; Xie, J.; Chen, F. ReIDTracker_Sea: Multi-Object Tracking in Maritime Computer Vision. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, 1–6 January 2024; pp. 813–820. [Google Scholar] [CrossRef]
  5. Jiang, B.; Wu, X.; Tian, X.; Jin, Y.; Wang, S. Proposal of Innovative Methods for Computer Vision Techniques in Maritime Sector. Appl. Sci. 2024, 14, 7126. [Google Scholar] [CrossRef]
  6. Na, S.; Lee, D.; Baek, J.; Kim, S.; Choung, C. Qualitative Risk Assessment Methodology for Maritime Autonomous Surface Ships: Cognitive Model-Based Functional Analysis and Hazard Identification. J. Mar. Sci. Eng. 2025, 13, 970. [Google Scholar] [CrossRef]
  7. Nalamati, M.; Sharma, N.; Saqib, M.; Blumenstein, M. Automated Monitoring in Maritime Video Surveillance System. In Proceedings of the 35th International Conference on Image and Vision Computing New Zealand (IVCNZ), Wellington, New Zealand, 25–27 November 2020; pp. 1–6. [Google Scholar] [CrossRef]
  8. Krivoguz, D.; Bespalova, L.; Chernyi, S.; Zhilenkov, A.; Silkin, A.; Goryachev, I.; Daragan, P. The Structure and Technology of Structuring Marine Areas Using Remote Sensing Data in Semi-Arid Conditions. Trans. Marit. Sci. 2024, 13, 10. [Google Scholar] [CrossRef]
  9. Haijoub, A.; Hatim, A.; Guerrero-Gonzalez, A.; Arioua, M.; Chougdali, K. Enhanced YOLOv8 Ship Detection Empower Unmanned Surface Vehicles for Advanced Maritime Surveillance. J. Imaging 2024, 10, 303. [Google Scholar] [CrossRef] [PubMed]
  10. Şengül, B.; Yılmaz, F.; Uğurlu, Ö. Safety–Security Analysis of Maritime Surveillance Systems in Critical Marine Areas. Sustainability 2023, 15, 16381. [Google Scholar] [CrossRef]
  11. Liang, H.; Ma, Z.; Zhang, Q. Self-Supervised Object Distance Estimation Using a Monocular Camera. Sensors 2022, 22, 2936. [Google Scholar] [CrossRef] [PubMed]
  12. Duman, S.; Elewi, A.; Yetgin, Z. Distance Estimation from a Monocular Camera Using Face and Body Features. Arab. J. Sci. Eng. 2022, 47, 1547–1557. [Google Scholar] [CrossRef]
  13. Zhong, Q.; Cheng, X.; Song, Y.; Wang, H. Monocular Distance Estimated Based on PTZ Camera. Comput. Mater. Contin. 2024, 79, 3417–3433. [Google Scholar] [CrossRef]
  14. Panariello, A.; Mancusi, G.; Ali, F.H.; Porrello, A.; Calderara, S.; Cucchiara, R. Monocular per-object distance estimation with Masked Object Modeling. Comput. Vis. Image Underst. 2025, 253, 104303. [Google Scholar] [CrossRef]
  15. Petković, M.; Vujović, I.; Lušić, Z.; Šoda, J. Image Dataset for Neural Network Performance Estimation with Application to Maritime Ports. J. Mar. Sci. Eng. 2023, 11, 578. [Google Scholar] [CrossRef]
  16. Liu, J.; Li, C. Maritime video ship detection and tracking based on improved YOLOX and DeepSORT. J. Electron. Imaging 2023, 32, 013042. [Google Scholar] [CrossRef]
  17. Zhai, Y. River Ship Monitoring Based on Improved Deep-Sort Algorithm. Informatica 2024, 48, 163–176. [Google Scholar] [CrossRef]
  18. Dahua DH-TPC-PT8620A-T. Available online: https://www.dahuasecurity.com/uk/products/All-Products/Discontinued-Products/Thermal-Cameras/TPC-PT8620A-T (accessed on 15 July 2025).
  19. Petković, M.; Vujović, I.; Kaštelan, N.; Šoda, J. Every Vessel Counts: Neural Network Based Maritime Traffic Counting System. Sensors 2023, 23, 6777. [Google Scholar] [CrossRef] [PubMed]
  20. Lian, H.; Li, M.; Li, T.; Zhang, Y.; Shi, Y.; Fan, Y.; Yang, W.; Jiang, H.; Zhou, P.; Wu, H. Vehicle speed measurement method using monocular cameras. Sci. Rep. 2025, 15, 2755. [Google Scholar] [CrossRef] [PubMed]
  21. Park, J.H.; Roh, M.-I.; Lee, H.-W.; Jo, Y.-M.; Ha, J.; Son, N.-S. Multi-vessel target tracking with camera fusion for unmanned surface vehicles. Int. J. Nav. Arch. Ocean Eng. 2024, 16, 100608. [Google Scholar] [CrossRef]
Figure 1. Framework of the proposed method.
Figure 2. Example of real situation for calculations.
Figure 3. Triangle solution.
Figure 4. Area of interest distances.
Figure 5. Example of traffic complexity in the port of Split.
Figure 6. Distribution of measured categories in the training dataset.
Table 1. Pros and cons of proposed system.

| Feature | Radar/AIS | Vision-Based | Explanation | Limitations |
|---|---|---|---|---|
| Detects small boats | X | ✔ | Vision-based methods can detect even small, non-AIS-equipped vessels, unlike traditional radar/AIS systems. | Vision may misdetect small boats in poor visibility or cluttered backgrounds; radar may miss low-profile or non-metallic targets. |
| Requires onboard equipment | ✔ | X | Radar and AIS require transmitters/receivers on vessels, while the vision system only needs shore-based cameras. | AIS does not cover all vessels; vision depends on camera placement and quality. |
| Real-time video available | X | ✔ | Vision-based systems inherently provide visual footage for monitoring and analysis. | Vision requires a stable bandwidth for high-res streaming; the quality degrades in low light or adverse weather. |
| Affected by weather conditions | ✔ | ✔ (mildly) | Both are influenced by severe weather, but vision is less affected under mild conditions. | Vision is heavily impacted by fog, rain, or night without IR support. |
| Resolution at close range | Medium | High | Cameras provide high-resolution data for nearby vessels, while radar has a lower spatial resolution at close distances. | High resolution is only beneficial with good visibility, no occlusion, and sufficient processing power. |
| Depth estimation | ✔ (direct or inferred) | X | Radar can infer distance directly; monocular vision uses geometric estimation. | Monocular vision cannot perceive depth directly, leading to inaccuracies in crowded scenes. |
| Performance in crowded scenes | ✔ | X | Radar can better separate overlapping targets. | Vision suffers from occlusion and overlapping bounding boxes in dense traffic. |
Table 2. Experimental validation of the proposed method; one example per class.

| Type of Vessel | Estimated Speed | Measured Speed | Relative Error |
|---|---|---|---|
| Pleasure yacht | 23 km/h | 21 km/h | 9.52% |
| Motorboat | 24.2 km/h | 20.9 km/h | 15.79% |
| Large passenger ship | 17.2 km/h | 15.9 km/h | 8.17% |
| Sailing boat | 11.1 km/h | 9.1 km/h | 21.97% |
| Speed craft | 18.8 km/h | 16.9 km/h | 11.24% |
Table 3. Errors by vessel category.

| Vessel Category | Category Estimation Error | Min–Max Error Range |
|---|---|---|
| Pleasure yacht | 13.975% | 9.52–18.43% |
| Motorboat | 13.613% | 10.36–15.79% |
| Large passenger ship | 10.286% | 6.03–16.66% |
| Sailing boat | 14.8% | 2.83–21.97% * |
| Speed craft | 11.24% | N/A ** |

* The only category with one negative error; it is calculated as an absolute value. ** At the time of the field measurement, there was only one speed craft.
Table 4. Statistical analysis by vessel category.

| Vessel Category | Avg. Error | SD | CoV |
|---|---|---|---|
| Pleasure yacht | 13.975% | 2.23 | 15.96% |
| Motorboat | 13.613% | 1.36 | 9.99% |
| Large passenger ship | 10.286% | 2.66 | 25.89% |
| Sailing boat | 14.8% | 4.79 | 32.37% |
| Speed craft | 11.24% | N/A | N/A |
Table 5. Mean estimation error compared with distance range.

| Distance Range (m) | Mean Estimation Error (%) |
|---|---|
| 50–100 | 7.6 |
| 100–200 | 10.9 |
| 200–300 | 12.1 |
| 300–400 | 15.8 |
| >400 | 17.9 |
Table 6. Frame processing time with increasing number of vessels per frame.

| Number of Vessels in Frame | Average FPS | Processing Time per Frame (ms) |
|---|---|---|
| 1–2 | 22 | 44 |
| 3–4 | 20 | 50 |
| 5 | 19 | 53 |

Tested on NVIDIA RTX 4070 GPU.