Article

Development and Application of an Advanced Automatic Identification System (AIS)-Based Ship Trajectory Extraction Framework for Maritime Traffic Analysis

1 Maritime Development and Training Center, National Taiwan Ocean University, Keelung 202301, Taiwan
2 Department of Marine Engineering, National Taiwan Ocean University, Keelung 202301, Taiwan
3 Department of Merchant Marine, National Taiwan Ocean University, Keelung 202301, Taiwan
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(9), 1672; https://doi.org/10.3390/jmse12091672
Submission received: 20 August 2024 / Revised: 13 September 2024 / Accepted: 15 September 2024 / Published: 18 September 2024
(This article belongs to the Section Ocean Engineering)

Abstract

This study addresses the challenges of maritime traffic management in the western waters of Taiwan, a region characterized by substantial commercial shipping activity and ongoing environmental development. Using 2023 Automatic Identification System (AIS) data, this study develops a robust feature extraction framework involving data cleaning, anomaly trajectory point detection, trajectory compression, and advanced processing techniques. Dynamic Time Warping (DTW) and the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithms are applied to cluster the trajectory data, revealing 16 distinct maritime traffic patterns, key navigation routes, and intersections. The findings provide fresh perspectives on analyzing maritime traffic, identifying high-risk areas, and informing safety and spatial planning. In practical applications, the results help navigators optimize route planning, improve resource allocation for maritime authorities, and inform the development of infrastructure and navigational aids. Furthermore, these outcomes are essential for detecting abnormal ship behavior, and they highlight the potential of route extraction in maritime surveillance.

1. Introduction

In today’s globalized world, the volume of maritime transportation and the scale of international trade continue to expand, leading to a rapid increase in maritime traffic and a progressively complex traffic environment. This situation necessitates that all nations accord significant attention to maritime safety issues. Particularly in vast waters, where ships have a high degree of spatial freedom in navigation, traffic regulatory agencies must enhance the intelligence level of traffic information systems. This enhancement is crucial for more effectively managing navigation activities within specific waters.
To enhance navigation safety and management efficiency, the International Maritime Organization (IMO) mandated the installation of AIS as part of the International Convention for the Safety of Life at Sea (SOLAS) in December 2000 [1,2]. The convention requires all passenger ships, ships of 300 gross tonnage and upwards on international voyages, and cargo ships of 500 gross tonnage and upwards on domestic voyages within member states to install AIS in stages beginning 1 July 2002. AIS is an automatic tracking system that transmits a vessel’s static, dynamic, and voyage-related data at designated intervals. Through a global network of land-based and satellite AIS receivers, vast amounts of data on ship characteristics, voyages, and navigation trajectories (such as latitude, longitude, speed, and course) are continuously accumulated, forming one of the most comprehensive maritime big data repositories.
As AIS continues to accumulate vast amounts of historical spatiotemporal vessel trajectory data, these records come to encompass a wealth of maritime traffic characteristics. These characteristics can be extracted through data science techniques, aiding in Maritime Situational Awareness (MSA) and in the development and management of maritime domains. Data science has produced models and methods for extracting maritime patterns and predicting vessel behavior. These can be applied to build systems for transportation optimization and efficiency and safety management that reflect real-time traffic information. They provide valuable reference data to decision makers, enabling them to make informed decisions and enhance navigation safety. This has become a critical focus of current research and development.
As part of its response to globalization, Taiwan focuses on developing renewable energy, particularly offshore wind farms, due to their renewable and eco-friendly benefits. With their abundant wind resources, the western waters of Taiwan are ideal for these projects. However, the western waters of Taiwan are a vital shipping corridor for international trade, and installing wind farms in this critical shipping corridor poses significant challenges, including increased traffic density, potential collision risks, and interference with navigational instruments. Careful maritime planning is required to manage the overlap between wind farm areas and traditional shipping lanes, necessitating route adjustments and safety measures to ensure coexistence.
Ships plan routes based on international regulations, fuel efficiency, and safety [3]. Optimal routes save time and energy, but the high spatial freedom in vast waters increases uncertainty and navigation risks. Extracting maritime routes is crucial for identifying traffic patterns, optimizing voyage plans, and improving traffic management. This study aims to generate commercial ship route information and create a navigation map for the western waters of Taiwan. The results can serve as practical references for navigators and relevant authorities and provide a foundation for future research on the early detection of abnormal ship behavior within an intelligent maritime safety system. The contributions of this study are as follows:
  • This study proposes an innovative maritime route extraction framework that effectively reveals distinct maritime traffic patterns, key navigation routes, and intersections and creates a comprehensive navigation network map.
  • The framework introduces a novel approach for extracting valuable route information by integrating DTW and HDBSCAN methods with a hyperparameter tuning procedure. This approach enhances the differentiation of similar ship trajectories in complex waters, thereby improving the accuracy and efficiency of the analysis.
  • This study incorporates robust data processing and feature extraction methods, including anomaly trajectory point detection and trajectory compression, to significantly enhance data quality, reduce noise, and minimize computational complexity, providing a reliable foundation for further analysis.
  • The navigation network map compiled in this study provides valuable route information for shipping operators and government agencies, aiding decision making by identifying high-risk areas, supporting spatial planning, and improving navigation efficiency and safety while also reducing fuel consumption and emissions, contributing to the vision of sustainable oceans.
This paper is organized as follows: Section 2 reviews the literature. Section 3 details the maritime traffic extraction process, including data processing, vessel motion patterns, and cluster modeling. Section 4 presents the results of applying this method to the western waters of Taiwan. Section 5 concludes this paper.

2. Related Works

The mandatory installation of AIS has led to the global accumulation of extensive maritime big data. Among these, trajectory data are particularly crucial for understanding maritime situations and predicting vessel behavior, as they provide detailed insights into ships’ movement patterns and navigational choices.
AIS data are crucial and widely used for various research purposes. In route extraction and optimization, AIS data help analyze ship trajectories, extract regular routes, optimize navigation paths, reduce fuel consumption, and improve transportation efficiency [3,4,5,6,7]. For vessel traffic monitoring and management, AIS data are used to analyze traffic flow and density, identify busy waters, and monitor ship movements in real time, enhancing maritime traffic safety [8,9,10,11,12,13,14]. In accident analysis, regression analysis of AIS data helps study accident causes, propose improvements, identify high-risk waters, and designate safe navigation zones [15,16,17]. In navigational safety, AIS provides information on the position, speed, and course of other vessels to identify potential collision risks [18]. However, it is only one of several tools that contribute to a comprehensive safety strategy, alongside visual and radar observations and radio communication. Additionally, most maritime authorities now broadcast virtual Aids to Navigation (V-AtoN) via AIS, and some studies have also explored the communication techniques and functionality of AIS in V-AtoN to enhance the quality of maritime services [19]. For environmental impact assessments, ship emissions are evaluated, as well as their impact on the marine environment [20]. Jalkanen et al. [21] developed a model for underwater noise sources using AIS data to study the impact of ship noise on marine life. In search and rescue (SAR) and emergency response, real-time AIS data locate missing or distressed ships, assist SAR operations, and help formulate emergency response plans to improve efficiency [22]. For port operations management, AIS data analyze ship entry and exit patterns to optimize berth allocation and improve port efficiency [23]. 
In fisheries monitoring and management, AIS data are used to monitor fishing vessel activities, analyze fishing paths, identify illegal fishing activities, and assist in sustainable fisheries management. Kurekin et al. [24] proposed a cost-effective system using Earth observation and AIS data to monitor illegal, unreported, and unregulated fishing activities in Ghanaian waters.
Due to inherent flaws in raw data, such as noise, data loss, and duplication, data preprocessing is essential after collection. Procedures like data cleaning, compressing, and aggregation are crucial for visualization, analysis, and feature extraction. Raw ship trajectory data often have quality issues, including invalid data, errors, missing values, anomalies, and duplicates caused by equipment failures, transmission errors, and improper sampling. Multiple methods are needed to enhance data quality [25,26]. Guo et al. [27] proposed an iterative anomaly detection method using statistical analysis to identify outliers. Lv et al. [28] developed a real-time AIS data cleaning and analysis algorithm to improve AIS data quality and analyze large datasets. This algorithm, which includes data fusion, deduplication, decoding, anomaly detection, sorting, prediction, and statistical analysis, has developed into a mature and comprehensive approach for processing AIS data. With these advancements in data processing algorithms, along with recent improvements in equipment and preprocessing technologies, AIS data quality has significantly increased. In this study, we employ iterative anomaly detection and removal methods for data cleaning as proposed by Guo et al. [27].
In addition, handling large-scale ship trajectory data is challenging due to the vast spatiotemporal range of navigation and the high sampling frequency of AIS data. Trajectory compression is used to reduce data size, with algorithms classified as line-based or semantic-based [29]. Line-based algorithms represent trajectories as series of line segments. Qi and Ji [30] compared five trajectory compression algorithms, CIPA, LVDA, OAA, Douglas–Peucker (DP), and Grid Algorithm (GA), finding DP suitable for historical data and GA for real-time data. Wang et al. [31] used a trajectory compression algorithm to identify collision avoidance points in AIS data, proving more accurate than the CPA method. Ferreira et al. [32] evaluated DP, Time Ratio (TR), and Speed-Based (SB) algorithms, noting that the choice depends on balancing processing time and clustering accuracy. Gao et al. [33] developed a compression algorithm based on navigation status and acceleration changes suitable for emission inventory research. This method ensures accuracy under high compression rates by controlling acceleration changes. Liu et al. [34] proposed an AIS trajectory clustering method combining discrete Fréchet distance and the DP algorithm. Wei et al. [35] introduced a two-part AIS data compression algorithm using DP for spatial simplification and a sliding window for motion simplification, balancing compression rate, and behavior retention. Rong et al. [36] used the DP algorithm with a 500 m threshold to group historical traffic data for probabilistic characterization and anomaly detection. Huang et al. [37] used the DP algorithm to compress ship trajectories and introduced the Average Compression Score (ACS) for determining optimal thresholds. Choosing an appropriate AIS data compression method depends on application needs, data characteristics, and resources. Different algorithms and settings can significantly impact clustering results, necessitating careful selection. 
In this study, we utilize the DP algorithm for historical AIS data; unlike the approaches discussed in previous studies, this research proposes thresholds based on ship type and size and compares the compression ratio of trajectory points and lengths.
Data-driven methods for route extraction fall into three categories: statistics-based, grid-based, and vector-based [38]. Statistics-based methods are simple but challenging to model with large-scale data. Vector-based methods, however, identify waypoints in ship trajectories and reduce the computational load by employing data science clustering techniques. Ship trajectory clustering research is typically divided into three parts: trajectory point clustering, sub-trajectory clustering, and overall trajectory clustering [37]. Trajectory point clustering focuses on key turning points, start points, and docking points. Rong et al. [36] treated ship routes as a combination of straight segments and turning segments, clustering key points to form final routes. Yan et al. [39] used a semantic modeling approach to simplify navigation in vast waters into sequences of docking points and waypoints, clustering them with a density-based algorithm and then connecting them using graph theory. Liu et al. [38] proposed an AIS-based framework for extracting maritime traffic networks, including traffic pattern recognition, semantic route extraction, route decomposition, and network generation. Sub-trajectory clustering divides ship trajectories into segments and clusters them based on local features. Each method has its own strengths, and the choice depends on the specific application and data characteristics.
Ship trajectories are spatiotemporal curves with sequential and directional characteristics. Trajectory point clustering simplifies these trajectories into key points, such as start points, operation points, waypoints, and endpoints, representing local features like position, direction, and speed. This method requires extensive data for optimal results and is particularly suited for large-scale route extraction. Trajectory line clustering, through data compression, extracts approximate routes that capture overall trajectory features. This approach retains key characteristics and improves the efficiency of distance and similarity calculations, making it increasingly popular. Ship trajectory spatial distances are commonly used to measure the similarity between trajectories. The Hausdorff distance [40], which considers the maximum distance between the closest points of two trajectories, is often used for this purpose. To effectively represent spatiotemporal features and improve computational efficiency, exploring trajectory similarity has become an important research topic. Li et al. [41] introduced a multi-step clustering algorithm for AIS trajectory clustering, combining Dynamic Time Warping (DTW), Principal Component Analysis (PCA), and an improved center-selection clustering algorithm. Sheng and Yin [42] proposed a trajectory similarity measure based on accumulated weighted distances (spatial, directional, and velocity distances). Zhao and Shi [43] combined the DP compression algorithm with DTW to reduce the data needed for DTW distance matrix calculations, improving clustering performance by adaptively determining DBSCAN parameters based on trajectory distribution. Wang et al. [44] proposed a Hausdorff distance and HDBSCAN-based clustering method for shape-featured ship trajectories, with a statistical method for clustering parameter selection and sensitivity analysis. Huang et al. [37] used the DP algorithm to compress ship trajectories and introduced the Average Compression Score (ACS) to determine optimal compression thresholds, using MD-DBSCAN for noise reduction and route extraction. Yan et al. [45] suggested a Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) enhanced clustering method suitable for dynamic datasets, using Comprehensive Clustering Performance Metrics (CCPM) for evaluation and adaptive parameter determination. Zhang et al. [46] employed Minimum Description Length (MDL), improved DTW for trajectory clustering and anomaly detection, and improved DBSCAN to enhance its applicability and performance for large datasets.
The reviewed literature highlights advancements in trajectory feature extraction, data compression for efficiency, similarity measurement, clustering methods, and parameter optimization. These methods, particularly trajectory line clustering from historical AIS data, have matured, offering structured frameworks for route extraction applicable to various needs and maritime conditions.

3. Data Processing and Methodology

The widespread use of AIS devices has made long-term AIS records a crucial big data source for maritime traffic analysis. However, during transmission, AIS information is compressed, and once decoded, each record is discrete, lacking temporal and spatial continuity. A single AIS record only provides the vessel’s status at a specific moment, limiting its usefulness. Thus, extracting valuable features from AIS records and converting them into actionable maritime knowledge is essential. This study proposes a framework, shown in Figure 1, for maritime route extraction to address these challenges. The framework consists of three stages: it begins with AIS preprocessing to ensure the quality of AIS records and uses trajectory recognition to create continuous paths for each ship. Next, feature extraction, including anomaly trajectory point detection and trajectory compression, identifies irregular movements and reduces data size while retaining essential information. Finally, a clustering model groups similar ship trajectories to reveal maritime traffic patterns, key navigation routes, and high-risk areas. This approach provides practical references for navigators and authorities and lays the foundation for future research on abnormal navigation behavior detection.

3.1. AIS Preprocessing

Data quality significantly impacts the accuracy and reliability of machine learning models and the validity of analytical insights, making preprocessing a crucial step before modeling. According to the ITU’s 2014 AIS technical standards [47], AIS signals are transmitted on dedicated VHF channels (terrestrial AIS on channels 87B and 88B, satellite AIS on channels 75 and 76) at intervals from 2 s to 6 min, depending on the type of information and the vessel’s status. Static and voyage-related data (e.g., vessel name, call sign, dimensions, destination, ETA) are transmitted every 6 min or upon change, while dynamic data (e.g., position, SOG, COG, heading) are updated every 2 s to 3 min, based on speed and turning rate. Signal delays, interference, and disruptions can occur due to traffic volume and transmission capacity limitations. Given these challenges, previous research highlights the need for comprehensive AIS data preprocessing. This section adapts these methods to the characteristics of AIS data in Taiwanese waters, developing a preprocessing procedure that includes data cleaning and trajectory feature extraction. As the data used are already decoded, the AIS decoding step is omitted. The following sections detail the methods for data cleaning and trajectory feature extraction.
Data cleaning involves two main aspects. First, according to AIS performance standards [47], records with obvious anomalies are removed, such as duplicate data, incomplete or unidentifiable information, or dynamic information with longitude over 180°, latitude over 90°, speed over the maximum possible for vessels, course over 360°, or true heading over 360°. Additionally, static and voyage information, which is manually entered, often contains errors, omissions, or outdated information, making it more prone to inaccuracies compared to dynamic information directly from navigation instruments [48]. Therefore, verifying the accuracy of static data is crucial. This section introduces automated procedures to identify and remove records with errors, such as invalid Maritime Mobile Service Identity (MMSI) numbers (not precisely 9 digits or containing non-existent Maritime Identity Digits), unknown vessel types, or missing size details. Second, data outside the scope of the study are removed to reduce data volume and analysis costs. By setting boundaries and speed thresholds, AIS data from anchored ships, docked ships, or those not in normal navigation are excluded from the analysis.
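The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: the record field names (`mmsi`, `t`, `lon`, `lat`, `sog`, `cog`, `heading`, `ship_type`, `length`) and the two speed bounds are hypothetical, and the Maritime Identity Digit lookup and the study-area boundary check are omitted.

```python
from typing import Dict, Iterable, List

# Assumed thresholds for illustration only; the paper does not state these values.
MAX_SOG_KNOTS = 50.0  # upper bound on a plausible vessel speed
MIN_SOG_KNOTS = 1.0   # below this, treat the ship as anchored or docked

REQUIRED = ("mmsi", "t", "lon", "lat", "sog", "cog", "heading")


def is_valid_record(rec: Dict) -> bool:
    """Return True if a decoded AIS record passes the range and identity checks."""
    # Incomplete records are rejected outright.
    if any(rec.get(key) is None for key in REQUIRED):
        return False
    # MMSI must be exactly nine digits (MID validation is omitted here).
    mmsi = str(rec["mmsi"])
    if not (mmsi.isdigit() and len(mmsi) == 9):
        return False
    # Static data: unknown vessel type or missing size details.
    if not rec.get("ship_type") or not rec.get("length"):
        return False
    # Dynamic data: coordinates, course, and heading must be in range.
    if abs(rec["lon"]) > 180 or abs(rec["lat"]) > 90:
        return False
    if not (0 <= rec["cog"] <= 360) or not (0 <= rec["heading"] <= 360):
        return False
    # Speed window drops impossible speeds and ships not under way.
    return MIN_SOG_KNOTS <= rec["sog"] <= MAX_SOG_KNOTS


def clean(records: Iterable[Dict]) -> List[Dict]:
    """Drop duplicates (same MMSI and timestamp) and invalid records."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec.get("mmsi"), rec.get("t"))
        if key in seen or not is_valid_record(rec):
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Keeping the validity test in a single predicate makes it straightforward to add further rules, such as a bounding box for the study area, without touching the deduplication loop.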
AIS communication technology uses Self-Organized Time Division Multiple Access (SOTDMA) to stagger transmission times, ensuring that only one ship transmits at any given time [47]. Additionally, receiving data via satellite AIS can introduce significant signal delays [49], causing AIS records to be sorted by signal reception order rather than chronological sequence. This makes the data disorganized and fragmented. To address this issue, this section employs the MMSI, a unique identifier assigned to each ship or maritime station, to differentiate between vessels in the AIS dataset. Using MMSI allows for the accurate sequencing and attribution of data to individual ships, even when transmission times are staggered, thereby overcoming the challenges posed by the asynchronous nature of AIS transmissions. This approach allows for the creation of temporally and spatially linked multi-dimensional sequences for each ship.
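As a rough sketch of this step, records can be grouped by MMSI and each group sorted by record time, which restores chronological order regardless of reception order. The field names `mmsi` and `t` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_sequences(records: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Group AIS records by MMSI and sort each group by record time."""
    by_ship = defaultdict(list)
    for rec in records:
        by_ship[rec["mmsi"]].append(rec)
    # Sorting by the record timestamp undoes reception-order shuffling.
    return {
        mmsi: sorted(recs, key=lambda r: r["t"])
        for mmsi, recs in by_ship.items()
    }
```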
To compile cleaned AIS point data into coherent ship trajectory datasets, we employ mathematical representations where $T$ denotes the set of all ship trajectories and $\mathit{Track}_i$ represents the $i$-th trajectory in $T$. Each trajectory $\mathit{Track}_i$ consists of a set of trajectory points $P_{i,k}$ (where $k = 1, 2, \ldots, m(P_i)$), with $m(P_i)$ being the total number of points in $\mathit{Track}_i$. Each trajectory point $P_{i,k}$ includes multiple dimensions of information: record time ($t_{i,k}$), X-coordinate ($X_{i,k}$), Y-coordinate ($Y_{i,k}$), speed over ground ($\mathit{SOG}_{i,k}$), course over ground ($\mathit{COG}_{i,k}$), and true heading ($\mathit{THD}_{i,k}$), as described in Equation (1).
$$P_{i,k} = \left(t_{i,k},\ X_{i,k},\ Y_{i,k},\ \mathit{SOG}_{i,k},\ \mathit{COG}_{i,k},\ \mathit{THD}_{i,k}\right) \tag{1}$$
Typically, AIS coordinate information, including longitude and latitude, is provided in the WGS84 system. However, for this study, which requires extensive geometric and distance calculations, the coordinates are converted to the TWD97 projection coordinate system, resulting in $X_{i,k}$ and $Y_{i,k}$. This conversion reduces computational cost and errors, allowing direct calculation of Euclidean distances between points in meters.

3.2. Track Feature Extraction

In AIS data-based research, ensuring the accuracy and reliability of trajectory data is crucial for the validity of the findings. This section introduces two key methods to enhance the quality and usability of AIS trajectory data: (1) anomaly trajectory point detection, which mitigates errors and noise caused by GNSS signal inaccuracies and transmission interferences that can distort ship paths, and (2) trajectory compression, which reduces the number of trajectory points by using key points to represent steady navigation states, thereby preserving data integrity while minimizing computational complexity and improving efficiency in handling large datasets.

3.2.1. Anomaly Trajectory Point Detection Algorithm

Although the data cleaning process removes records with obvious errors, such as incorrect MMSI formats, out-of-range coordinates, and other violations, some noise caused by GNSS anomalies or transmission errors may still result in significant short-term position drifts in AIS records [50,51,52,53]. If left unaddressed, these drifts can distort subsequent trajectory compression and affect the accuracy of model development. Therefore, additional measures are needed to accurately identify and remove such anomalies.
This study proposes an algorithm for detecting and removing anomalous data based on the speed and rate of turn between consecutive points. The algorithm evaluates the plausibility of movement between points and iterates twice over each trajectory to ensure comprehensive detection and removal of anomalies. This results in a refined dataset of ship trajectories. To implement this, we developed Python scripts to perform iterative anomaly detection and segmentation for each trajectory, $\mathit{Track}_i$ ($i = 1, 2, \ldots, n$). At each trajectory point $P_{i,k}$ ($k = 1, 2, \ldots, m(P_i)$), the coordinate distance and movement distance to adjacent points are calculated. Anomalies are then identified based on these calculations:
$$d_{a,k}(p_{i,k},\ p_{i,k-1}) = \left\lVert (X_{i,k},\ Y_{i,k}) - (X_{i,k-1},\ Y_{i,k-1}) \right\rVert_2 \tag{2}$$
$$d_{c,k}(p_{i,k},\ p_{i,k-1}) = \mathit{SOG}_{i,k} \times (t_{i,k} - t_{i,k-1}) \times \alpha \tag{3}$$
$$\mathit{ROT}_k = \left(\Delta \mathit{COG}_{i,k} / \Delta t_{i,k}\right) \times 60 \tag{4}$$
$$f_k = \begin{cases} 0, & \text{if } d_{a,k}(p_{i,k},\ p_{i,k-1}) \le d_{c,k}(p_{i,k},\ p_{i,k-1}) \text{ and } \mathit{ROT}_k \le \beta; \\ 1, & \text{otherwise.} \end{cases} \tag{5}$$
$$\mathit{Flag}_{i,k} = (f_k,\ f_{k+1}) \tag{6}$$
where $d_{a,k}$ represents the coordinate distance between the trajectory point and its preceding point, calculated from their coordinates, and $d_{c,k}$ is the movement distance between a trajectory point and its preceding point, derived from the reference speed and time difference. The parameter $\alpha$ is a manually input adjustment coefficient aimed at reducing misjudgments caused by sensor errors. $\mathit{ROT}_k$ is the turning rate between the trajectory point and its preceding point, and $\beta$ is a manually input threshold. The variable $\mathit{Flag}_{i,k}$ is the judgment marker for the checkpoint, while $f_k$ and $f_{k+1}$ denote the relationship between the checkpoint and its adjacent points. The values of $\mathit{Flag}_{i,k}$ are interpreted as follows:
  • (0, 0): normal point.
  • (1, 1): anomaly trajectory point.
  • (0, 1): segment point I, denoted as ps1, which indicates the point immediately before an anomaly.
  • (1, 0): segment point II, denoted as ps2, which indicates the point immediately after an anomaly.
Removal of anomaly trajectory points is achieved as follows:
$$P_i = \left\{\, p_{i,k} \in P_i \;\middle|\; \mathit{Flag}_{i,k} \ne (1, 1) \,\right\} \tag{7}$$
After removing the anomalous points, a second calculation is performed using Equations (4)–(7). This involves computing the coordinate and movement distances between all segment points and their adjacent segment points. If the distances meet the criteria, ps1 and ps2 are merged. Otherwise, they remain separate segments, with ps1 as the endpoint of the previous segment and ps2 as the starting point of the new segment.
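The flagging and removal pass can be sketched as follows. This is an illustrative reading of the rule above rather than the authors' implementation. It assumes projected coordinates in meters, time in seconds, and SOG in knots (hence the knot-to-m/s factor). It also assumes the course difference is wrapped to the range 0° to 180°, that the first and last points have no anomaly on their open side, and that the comparisons are non-strict. The default `alpha` and `beta` values are placeholders, and the second pass that merges segment points is not shown.

```python
import math
from typing import Dict, List, Tuple

KNOT_TO_MPS = 0.514444  # assumed unit conversion: SOG in knots, distances in meters


def pair_flag(prev: Dict, cur: Dict, alpha: float, beta: float) -> int:
    """f_k: 0 if the step from prev to cur is plausible, otherwise 1."""
    dt = cur["t"] - prev["t"]
    if dt <= 0:
        return 1  # non-increasing time stamps are treated as implausible
    d_a = math.hypot(cur["x"] - prev["x"], cur["y"] - prev["y"])  # coordinate distance
    d_c = cur["sog"] * KNOT_TO_MPS * dt * alpha                   # movement distance
    d_cog = abs(cur["cog"] - prev["cog"]) % 360.0
    d_cog = min(d_cog, 360.0 - d_cog)  # wrap course change (assumption)
    rot = d_cog / dt * 60.0            # degrees per minute
    return 0 if (d_a <= d_c and rot <= beta) else 1


def flag_points(track: List[Dict], alpha: float = 1.5,
                beta: float = 30.0) -> List[Tuple[int, int]]:
    """Return Flag_k = (f_k, f_{k+1}) for every point in the track."""
    f = [0]  # the first point has no predecessor (assumed normal)
    f += [pair_flag(track[k - 1], track[k], alpha, beta) for k in range(1, len(track))]
    f.append(0)  # the last point has no successor (assumed normal)
    return [(f[k], f[k + 1]) for k in range(len(track))]


def remove_anomalies(track: List[Dict], alpha: float = 1.5,
                     beta: float = 30.0) -> List[Dict]:
    """Drop points flagged (1, 1); segment points (0, 1) and (1, 0) are kept."""
    flags = flag_points(track, alpha, beta)
    return [p for p, flag in zip(track, flags) if flag != (1, 1)]
```

With a single drifting position in an otherwise steady track, the point before the drift is flagged (0, 1), the drifting point (1, 1), and the point after it (1, 0), which matches the interpretation of the four flag values given above.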

3.2.2. Trajectory Compression

After detecting and removing anomalous trajectory points, trajectory data are compressed using the DP algorithm [54], enhancing data quality for analysis and pattern recognition. Compressed trajectory data also enhance visualization, making the trajectories easier to understand and analyze. As discussed in Section 2, the DP algorithm significantly reduces the number of data points while retaining the trajectory’s original features. The workflow of the DP algorithm is shown in Figure 2. From trajectory points TR1 to TR8 (Figure 2a), a baseline is formed between the first and last points (TR1 and TR8). The perpendicular distance of each intermediate point from the baseline is calculated, and the point with the greatest distance (TR3) is identified. If this distance exceeds a set threshold ε, the point is retained (Figure 2b), and the trajectory is split into two segments at that point (Figure 2c). If the distance does not exceed ε, all intermediate points are discarded, keeping only the first and last points of the segment (Figure 2d,e). The process is applied recursively until the maximum perpendicular distance in every segment is less than or equal to ε. The retained points are merged to form the simplified trajectory (Figure 2f).
The distance threshold ε directly affects the compression performance of the DP algorithm. Its purpose is to control the precision of curve simplification, ensuring the simplified curve closely resembles the original shape. A smaller ε value retains more trajectory points, making the simplified curve more accurate but less effective in reducing data. Conversely, a larger ε value removes more points, resulting in a greater deviation from the original curve but better simplification. Therefore, determining an appropriate ε is a key part of this research, as data quality and navigational characteristics also influence this value. Detailed discussions on setting the threshold ε are provided in Section 4.
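A compact recursive form of the DP procedure is sketched below for points given as `(x, y)` tuples in a projected coordinate system, with `epsilon` in the same units. The function names are illustrative, and the threshold values in the usage note are arbitrary rather than those selected in Section 4.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def perpendicular_distance(pt: Point, start: Point, end: Point) -> float:
    """Distance from pt to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # Degenerate baseline: fall back to the point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / norm


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping points that deviate more than epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the intermediate point farthest from the first-to-last baseline.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Keep that point and recurse on the two halves it defines.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the split point
    # Otherwise every intermediate point is within tolerance and is dropped.
    return [points[0], points[-1]]
```

For an L-shaped track such as `[(0, 0), (100, 1), (200, 0), (300, 0), (300, 100), (300, 200)]`, a threshold of 10 keeps only the two endpoints and the corner, while a threshold of 0.1 also keeps the small wiggle on the first leg. This is the trade-off between fidelity and reduction described above.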

3.3. Clustering Model

In route extraction, clustering analysis of multiple ship trajectories can reveal common operational paths and abnormal behaviors, which is significant for maritime traffic management, logistics optimization, and anomaly detection. After the data preprocessing and trajectory feature extraction in Section 3.1 and Section 3.2, we obtained a clean and streamlined trajectory dataset ready for building clustering models.
Ship trajectory clustering can be categorized into three main types: (1) trajectory point clustering, (2) segment clustering, and (3) full trajectory clustering. However, this research found that trajectory point clustering faces challenges due to the varying number of trajectory points among different ships. This can lead to sparse trajectory points being non-clustered and dense points being overly clustered. Additionally, trajectory point clustering may fail to fully capture the overall shape and pattern of the trajectories, potentially overlooking important global features. While segment clustering avoids the issue of uneven point distribution, it poses challenges in segmenting the trajectories. Improper boundary handling can lead to analytical biases. Therefore, this study adopts full trajectory clustering, which addresses the uneven distribution of trajectory points and considers the complete shape and pattern of the trajectories. Moreover, it has a lower computational cost compared to trajectory point and segment clustering, requiring significant computation only when calculating the similarity matrix.

3.3.1. Similarity Measurement

A crucial step in clustering is measuring similarity, often performed by calculating the distance between ship trajectories. Several methods define trajectory distance, with the Hausdorff distance [40] being a common choice. The Hausdorff distance measures the maximum distance between trajectories but does not account for similar shapes with opposite directions. Since ship trajectories are spatiotemporal curves that include both time and space, ships may have different speeds and times but similar paths or similar speeds and times but completely different paths. As a result, the Hausdorff distance alone does not accurately capture the similarity between ship trajectories. It often requires additional trajectory direction measures, increasing analysis costs and model complexity.
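The direction-blindness of the Hausdorff distance is easy to demonstrate. The sketch below uses a brute-force, point-based Hausdorff distance on `(x, y)` tuples; the two example tracks are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


northbound = [(0.0, 0.0), (0.0, 1000.0), (0.0, 2000.0)]
southbound = list(reversed(northbound))  # same path, opposite direction
print(hausdorff(northbound, southbound))  # 0.0: the two routes look identical
```

Because the measure treats each trajectory as an unordered point set, a northbound and a southbound passage along the same line are indistinguishable, which is why an additional direction measure would be needed.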
To address these limitations, this study uses Dynamic Time Warping (DTW) [55] to measure trajectory similarity. DTW calculates the cumulative distance between ship trajectories based on their temporal sequences, not just their spatial distribution. This approach simplifies the process by requiring only a single similarity matrix, unlike Hausdorff distance methods. DTW employs dynamic programming to find the optimal alignment that minimizes the cumulative distance between two time series, so it remains effective under non-linear distortions along the time axis and, because points are aligned in order, it separates trajectories that follow similar paths in opposite directions. For any two trajectories $Track_i = [P_{i,1}, P_{i,2}, \ldots, P_{i,p}]$, $p = m(P_i)$, and $Track_j = [P_{j,1}, P_{j,2}, \ldots, P_{j,q}]$, $q = m(P_j)$, initialize a $p \times q$ cost matrix $D$. The mathematical formulation is given by Equations (8)–(10).
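$$
D(k,m) =
\begin{cases}
\lVert P_{i,1} - P_{j,1} \rVert_2, & \text{if } k = 1 \text{ and } m = 1;\\
\lVert P_{i,1} - P_{j,m} \rVert_2 + D(1, m-1), & \text{if } k = 1 \text{ and } m \neq 1;\\
\lVert P_{i,k} - P_{j,1} \rVert_2 + D(k-1, 1), & \text{if } k \neq 1 \text{ and } m = 1;\\
\lVert P_{i,k} - P_{j,m} \rVert_2 + \min\{D(k-1,m),\, D(k,m-1),\, D(k-1,m-1)\}, & \text{otherwise.}
\end{cases}
\tag{8}
$$

$$
s_{ij} = DTW(Track_i, Track_j) = D(p,q), \quad i, j = 1, 2, 3, \ldots, n \tag{9}
$$

$$
S =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1n}\\
s_{21} & s_{22} & \cdots & s_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
s_{n1} & s_{n2} & \cdots & s_{nn}
\end{bmatrix}_{n \times n}
\tag{10}
$$

where $\lVert P_{i,k} - P_{j,m} \rVert_2$ ($k = 1, 2, \ldots, p$; $m = 1, 2, \ldots, q$) represents the Euclidean distance between points $P_{i,k}$ and $P_{j,m}$, $D(k,m)$ denotes the accumulated distance (cost) of aligning $P_{i,k}$ with $P_{j,m}$, and $s_{ij}$ is the DTW distance between any two trajectories, forming an element of the similarity matrix $S$, which is a symmetric matrix.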
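The recursion for the cost matrix and the construction of the similarity matrix can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the study: the names `dtw_distance` and `similarity_matrix` are hypothetical, indices are zero-based, and the points are assumed to be planar coordinates so that the Euclidean norm applies directly.

```python
import math

def dtw_distance(track_i, track_j):
    """Accumulated DTW cost D(p, q) between two point sequences."""
    p, q = len(track_i), len(track_j)
    D = [[0.0] * q for _ in range(p)]
    for k in range(p):
        for m in range(q):
            cost = math.dist(track_i[k], track_j[m])  # Euclidean distance
            if k == 0 and m == 0:
                D[k][m] = cost
            elif k == 0:                              # first row
                D[k][m] = cost + D[0][m - 1]
            elif m == 0:                              # first column
                D[k][m] = cost + D[k - 1][0]
            else:
                D[k][m] = cost + min(D[k - 1][m], D[k][m - 1], D[k - 1][m - 1])
    return D[p - 1][q - 1]

def similarity_matrix(tracks):
    """Symmetric n x n matrix of pairwise DTW distances."""
    n = len(tracks)
    S = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            S[a][b] = S[b][a] = dtw_distance(tracks[a], tracks[b])
    return S

# Hypothetical toy tracks: the same path sailed in opposite directions.
north = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
south = list(reversed(north))

print(dtw_distance(north, north))  # 0.0
print(dtw_distance(north, south))  # 4.0
```

Unlike a point-set measure, the ordered alignment assigns a non-zero distance to the reversed track. The pairwise loop is quadratic in the number of trajectories and each DTW call is proportional to p × q, which is why the similarity matrix dominates the computational cost.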

3.3.2. HDBSCAN

Clustering algorithms typically do not have explicit optimization objectives or predefined learning labels; instead, they differentiate data based on similarity features. Since these algorithms rely directly on the data, a good clustering algorithm should demonstrate stability, meaning that the clustering results should not change significantly due to minor variations in the sample data. Additionally, they should minimize human intervention to maintain the objectivity of the clustering process.
HDBSCAN [56] is a hierarchical, density-based clustering algorithm derived from DBSCAN [57]. Unlike DBSCAN, which assumes that all clusters have similar densities and uses specific hyperparameters like eps and min_samples to define clusters, HDBSCAN can handle clusters of varying densities. This flexibility is particularly important in maritime contexts, where geographical constraints and sea conditions often influence ships’ navigation routes, leading to different density patterns. HDBSCAN does not require predefining the number of clusters and can effectively manage noisy data, making it well suited for complex trajectory data.
The HDBSCAN algorithm constructs a minimum spanning tree (MST) over the ship trajectories, with edge weights derived from the distances in the similarity matrix. These weights rely on two key metrics: (1) the core distance (d_core), which is the distance from each point to the farthest point within its min_samples neighborhood, as shown in Equation (11), and (2) the density reachability distance (d_reach), which is the maximum of the core distances of two points and the distance between them, as defined in Equation (12).
$$
d_{core}(p) = \max\{\, d(p, p_k) \mid k = min\_samples \,\} \tag{11}
$$

$$
d_{reach}(p, q) = \max\{\, d_{core}(p),\; d_{core}(q),\; d(p, q) \,\} \tag{12}
$$
Here, d(·) denotes the distance between samples, and p_k is taken to be the k-th nearest neighbor of p. The advantage of the density reachability distance is that it preserves the sampling distance in dense regions while amplifying the distance between sampling points in sparse areas. This increases the algorithm’s robustness to noise points.
Finally, based on the density reachability distance, the edges of the MST are sorted and cut to form hierarchical clusters. This step is similar to top-down partitioning in hierarchical clustering. By analyzing the stability of clusters at different levels of the hierarchy, the most stable clustering result is selected, ensuring the reliability of the final clustering outcome.
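The two distances that drive the hierarchy can be computed directly from a precomputed distance matrix. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a point is not counted as its own neighbor, a convention that differs between implementations.

```python
def core_distances(dist, min_samples):
    """Distance from each sample to its min_samples-th nearest neighbour."""
    core = []
    for i, row in enumerate(dist):
        # Assumption: the sample itself is excluded from its neighbourhood.
        others = sorted(d for j, d in enumerate(row) if j != i)
        core.append(others[min_samples - 1])
    return core

def reachability_matrix(dist, min_samples):
    """Pairwise density reachability distances from a distance matrix."""
    core = core_distances(dist, min_samples)
    n = len(dist)
    return [
        [0.0 if a == b else max(core[a], core[b], dist[a][b]) for b in range(n)]
        for a in range(n)
    ]

# Hypothetical 1-D example: three close samples and one isolated sample.
xs = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in xs] for a in xs]

core = core_distances(dist, min_samples=2)
reach = reachability_matrix(dist, min_samples=2)
print(core)         # [2.0, 1.0, 2.0, 9.0]
print(reach[2][3])  # 9.0 (raw distance 8.0 is stretched by the sparse sample)
```

In this toy case the isolated sample has a large core distance, so every edge that touches it is lengthened, which pushes it toward the noise set when the tree is cut.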

3.3.3. Performance Metrics

To evaluate the clustering results, this study employs two standard evaluation metrics: the Silhouette Coefficient (SC) [58] and the Davies–Bouldin Index (DBI) [59]. The SC measures cohesion within clusters and the separation between different clusters. The SC value falls in the range [−1, 1], with values closer to 1 indicating better clustering performance because samples are well separated from neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two clusters, while negative values suggest that samples may have been assigned to the wrong clusters. The DBI compares, for each pair of clusters, the sum of their average within-cluster distances with the distance between their centers, and averages the worst such ratio over all clusters. Smaller within-cluster distances (samples lying closer together) and larger distances between cluster centers both lower the DBI, so a smaller DBI value signifies better clustering performance. The formulas for SC and DBI are provided in Equations (13) and (14).
$$
SC = \frac{1}{N} \sum_{i=1}^{N} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \tag{13}
$$

$$
DBI = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}} \tag{14}
$$

where $N$ is the total number of samples, $n$ is the total number of clusters, $a(i)$ is the average distance from sample $i$ to all other samples in the same cluster, and $b(i)$ is the average distance from sample $i$ to the samples in the nearest different cluster. $S_i$ and $S_j$ are the average distances within clusters $i$ and $j$, respectively, and $M_{ij}$ is the distance between the centers of clusters $i$ and $j$.
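Both metrics can be checked on a small example. The sketch below is an assumption-laden illustration, not the evaluation code of the study: the function names are hypothetical, the samples are plain 2-D points rather than trajectories, the per-sample silhouette values are averaged so that the result stays within [−1, 1], cluster centers are taken as centroids, and the within-cluster distance is taken as the mean distance to the centroid. Non-clustered samples would have to be excluded before calling either function.

```python
import math

def silhouette_coefficient(points, labels):
    """Mean silhouette value over all samples (needs at least two clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for single-member clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin_index(points, labels):
    """Average over clusters of the worst (S_i + S_j) / M_ij ratio."""
    clusters = {}
    for point, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(point)
    centers = {lab: tuple(sum(c) / len(pts) for c in zip(*pts))
               for lab, pts in clusters.items()}
    scatter = {lab: sum(math.dist(pt, centers[lab]) for pt in pts) / len(pts)
               for lab, pts in clusters.items()}
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)

# Hypothetical toy data: two compact, well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
sc = silhouette_coefficient(points, labels)
dbi = davies_bouldin_index(points, labels)
print(round(sc, 3), round(dbi, 3))  # 0.802 0.2
```

For trajectory data clustered on a DTW matrix, a cluster center is not a simple centroid, so the DBI would need a representative trajectory or a similar substitute; that choice is left open here.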

4. Case Study

This study conducts a case study of trajectory clustering based on HDBSCAN to validate the practical applicability of the proposed method. The Taiwan Strait, specifically the western waters of Taiwan, was chosen as the study area. The geographical boundaries for this region are latitude 23°40′ N to 25°20′ N and longitude 119°30′ E to 120°40′ E. This area is a busy international shipping route connecting Northeast Asia to Southeast Asia, with significant maritime traffic. With increasing awareness of environmental sustainability, the Taiwanese government’s energy policy has shifted towards green energy development. Given Taiwan’s limited land and surrounding seas, offshore development has become a crucial solution. The western waters of Taiwan, characterized by a narrow strait between Taiwan and mainland China, experience northeast monsoons in winter and southwest monsoons in summer, making it an internationally recognized wind farm location. Large-scale offshore wind power installations are planned for this area, which will potentially impact existing navigation routes, reduce available navigation space, and increase the risk of ship encounters and collisions.
The Changhua wind farm channel at the southern end of the study area has been operational since October 2021. Another channel is planned for the northern end, with the area off Taichung Port forming a critical intersection for multiple shipping lanes. This situation necessitates a study of the navigation characteristics in the western waters of Taiwan to assist the Vessel Traffic Service (VTS) in monitoring ship movements and reducing navigation risks. Therefore, unlike other studies that primarily discuss shipping routes from economic and shipping development perspectives, this research aims to clarify and understand the navigation characteristics of the area, providing a reference for maritime traffic planning and safety decision making. This study focuses on container ships, which frequently navigate these waters with consistent routes, offering a clearer understanding of maritime traffic patterns.

4.1. Data Preparation and Processing

This study processed container ship data using the data processing and trajectory extraction methods developed in Section 3.1 and Section 3.2. However, since the “Ship and cargo type” field in the AIS data only identifies cargo ships (Ship Type Code 70) and does not differentiate between specific types such as container ships, this study applied the ship classifier developed by Huang et al. [60], which identifies container ships, bulk carriers, general cargo ships, and car carriers based on ship geometry and trajectory behavior features in the AIS data. The relevant parameters are listed in Table 1 and Table 2. Figure 3 shows the resulting container ship trajectory density in the western waters of Taiwan for 2023. The small inset on the right side of Figure 3 shows the raw AIS data points within the analysis area. The main map on the left displays the traffic density, with red areas indicating high-density waters where ship traffic is more concentrated. Yellow represents medium-density areas, and blue shows low-density areas. The Changhua wind farm and its channels are already operational, with the eastern lane designated for northbound traffic and the western lane for southbound traffic. Multiple traffic flows converge offshore of Taichung Port and between Taichung Port and Taipei Port. As more offshore wind farms and channels are established in the western waters of Taiwan, the complexity of these traffic intersections is expected to increase.
Parameter analysis was conducted on the acquired data to determine appropriate threshold values for the anomaly trajectory point detection and trajectory compression methods developed and applied in this study. The results are presented in Figure 4. Figure 4a shows the relationship between different α values and the number of detected anomalies. When α is set to 1, approximately 20% of the data are flagged as anomalous, which is clearly unreasonable. As α increases, the number of detected anomalies decreases and stabilizes around α = 1.3. This study selected α = 1.1 (marked by the red dot), where the gradient change is most significant, identifying anomalies in about 1.5% of the data. Figure 4b–d inform the selection of the ε parameter for trajectory compression. This study first examined the statistical distribution of the container ships’ length and width, with the box plot shown in Figure 4b. The distribution indicates that the average length of the container ships in the dataset is about 195 m, with an average width of 30 m. Most of the vessels are under 200 m in length, with the largest ship measuring 400 m in length and 63 m in width, capable of carrying around 24,000 TEUs.
Trajectory compression reduces data storage and computational costs. It eliminates low-frequency noise, such as minor lateral shifts caused by wind, waves, current, or positioning errors, while preserving the primary features of the actual trajectory. Given that such shifts are often lateral, the ε parameter is closely related to the ship’s width. This section conducted parameter analysis by setting the ε value based on the average ship width in the dataset and its multiples, with the results shown in Figure 4c,d. Figure 4c shows the relationship between ε and the average reduction in trajectory length. In contrast, Figure 4d depicts the relationship between ε and the average decrease in AIS trajectory points. Both figures indicate that when ε is set to the average ship width (30 m), the slopes of both reduced length and reduced point count are the steepest. The average trajectory length reduction is only 89 m, and the average decrease in AIS trajectory points reaches 3731. As the ε value increases incrementally, the average reduced length increases, but the average reduction in AIS trajectory points shows no significant difference. Therefore, to preserve the main features of the actual trajectory, this study selected an ε value of 30 m for effective trajectory compression.

4.2. HDBSCAN Model and Hyperparameter

After the data processing in Section 4.1, we obtained the trajectory features of container ships in the western waters of Taiwan. Next, we applied the DTW method described in Section 3.3 to construct the similarity matrix of trajectory distances (Table 3) and used it as the input to the HDBSCAN model for clustering analysis. The advantage of the HDBSCAN algorithm is that it does not require setting a core distance or predefining the number of clusters; it needs only one hyperparameter, the minimum cluster size (min_samples). The model automatically segments the data through a minimum spanning tree mechanism, resulting in a hierarchical clustering structure.
To achieve the best clustering results and observe the impact of different hyperparameter settings on the clustering outcomes, this study trained the model with min_samples values ranging from 2 to 200. The clustering results were then evaluated using the criteria described in Section 3.3.3. The analysis results are shown in Figure 5 and Figure 6. Figure 5 illustrates the relationship between the hyperparameter min_samples and the numbers of clusters and non-clustered trajectories. The horizontal axis represents the min_samples value, the left vertical axis shows the number of clusters, and the right vertical axis indicates the number of non-clustered trajectories. In the figure, the purple line represents the change in the number of clusters, while the red line represents the change in the number of non-clustered trajectories. According to Figure 5, as min_samples increases from 2 to 200, the number of clusters decreases significantly from 43 to 9 (stabilizing at min_samples = 107). Simultaneously, the number of non-clustered trajectories increases from 336 (min_samples = 2) to 1889 (min_samples = 200). The lowest number of non-clustered trajectories, 314 (marked by the red arrow), occurs when min_samples is set to 3, resulting in 28 clusters.
Figure 6 presents the results of two key metrics for the clustering method: the Silhouette Coefficient (SC) and the Davies–Bouldin Index (DBI), as they vary with the min_samples. The horizontal axis represents the min_samples value, the left vertical axis indicates SC, and the right vertical axis represents DBI. In the figure, the orange line represents SC, while the blue line represents DBI. According to Figure 6, SC reaches its maximum value at min_samples = 50 (marked by the orange arrow), indicating the best clustering performance at this point. Afterward, SC gradually decreases as min_samples increases. On the other hand, DBI reaches its minimum value at min_samples = 4 (marked by the blue arrow), suggesting the best separation between clusters at this point. As min_samples increases, DBI fluctuates, with local peaks occurring at min_samples = 49 and min_samples = 165, before gradually decreasing again.
The SC and DBI metrics together show how the clustering performance of the HDBSCAN method changes with different min_samples settings. As shown in Table 4, when min_samples is set to 4, the DBI reaches its lowest value, resulting in 26 clusters with 366 non-clustered trajectories. However, even though the DBI is at its lowest, the SC is relatively low (0.37), indicating good separation between clusters but potentially insufficient cohesion within clusters. As the min_samples parameter increases, the SC reaches its maximum value when min_samples is set to 50, indicating the highest cluster cohesion. At this point, the DBI remains close to a local minimum (3.00). This configuration results in 10 clusters but with 803 non-clustered trajectories. The trends shown in Figure 5 and Figure 6 indicate that there is a balance point between clustering effectiveness, separation, the number of clusters, and the number of non-clustered trajectories. Based on the trends in SC and DBI, this study selects a compromise at min_samples = 17 (marked by the black dashed line), where SC (0.54) is close to its maximum value (0.55) and DBI (2.77) is lower than that at min_samples = 50 (3.00). This setting leads to 16 clusters with 529 non-clustered trajectories, a middle ground that likely provides better overall clustering performance than either min_samples = 4 or min_samples = 50.

4.3. Route Extraction Results

This section presents visualizations of the clustering results derived from the hyperparameter analysis discussed in Section 4.2. We calculated and generated the visualizations using ArcGIS Pro 3.3.1 software developed by Esri. Trajectory density distribution maps were generated using the trajectory length-weighted density, as defined in Equation (15), with a cell size of 30 m × 30 m. These maps depict the clustering results for each category label, illustrating the distribution and characteristics of the ship trajectories within the study area. The 30 m cell size matches the average ship width in the dataset, enabling a more precise representation of each container ship’s contribution to the overall density distribution. This approach maintains high spatial resolution while also improving data processing efficiency.
$$
Density = \frac{\sum_{i=1}^{n} L_i}{30 \times 30\ (\mathrm{m}^2)} \tag{15}
$$

where $L_i$ represents the length of a trajectory within a cell and $n$ represents the total number of trajectories within the cell.
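As a worked example of the density definition, the sketch below evaluates a single cell. The function name and the segment lengths are hypothetical; in the study the computation is carried out in ArcGIS Pro, so this only illustrates the arithmetic.

```python
CELL_SIZE_M = 30.0  # cell edge length in metres, as used in the study

def length_weighted_density(lengths_in_cell_m, cell_size_m=CELL_SIZE_M):
    """Total trajectory length inside a cell divided by the cell area."""
    return sum(lengths_in_cell_m) / (cell_size_m * cell_size_m)

# Hypothetical cell crossed by three trajectories (lengths in metres).
lengths = [30.0, 45.0, 15.0]
density = length_weighted_density(lengths)
print(density)  # 90 m of track over 900 m^2, i.e. about 0.1 m per m^2
```

Weighting by length rather than by point count keeps the density independent of how many AIS points survive trajectory compression.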
Figure 7, Figure 8 and Figure 9 illustrate the trajectory density distributions corresponding to min_samples set to 4, 17, and 50, respectively. This study also computes the average length and the average track made good (TMG) within each trajectory cluster to identify the predominant direction and compare clustering results across different hyperparameter settings. The relevant data are compiled in Table 5. Based on the analysis of Figure 7, Figure 8 and Figure 9 and the summary in Table 5, the clustering results across the three different hyperparameter settings can be categorized into four types, encompassing 16 groups of trajectory clusters with similar or distinct characteristics.
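The direction summary used for each cluster can be illustrated with a short sketch. It assumes that the TMG of a trajectory is taken as the initial great-circle bearing from its first to its last position, which is one common reading of the term and may differ from the exact definition used in the study; the function name and coordinates are hypothetical.

```python
import math

def track_made_good(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

# Hypothetical positions on the 120°E meridian inside the study area.
northbound = track_made_good(23.7, 120.0, 25.3, 120.0)
southbound = track_made_good(25.3, 120.0, 23.7, 120.0)
print(northbound, southbound)
```

Bearings are circular quantities, so averaging the TMG of northbound and southbound trajectories that share a cluster produces a direction that neither group sails, which is why mixed clusters are harder to interpret than direction-pure ones.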
In the first type, the data can be categorized into four groups (Group 1 to Group 4), where all three hyperparameter configurations yield identical clustering results. Group 1 comprises northbound routes passing through the Changhua wind farm channel without stopping at Taichung Port (A09, B05, and C05). The average TMG is around 30.5 degrees, and the average length is approximately 201,000 m. Group 2 consists of northbound routes passing through the Changhua wind farm channel and stopping at Taichung Port (A10, B06, and C04). The average TMG for these routes is 42.6 degrees, and the average length is 90,333 m. Group 3 includes northbound routes departing directly from Taichung Port (A12, B08, and C07). The average TMG is about 33.0 degrees, and the average length is 139,000 m. Group 4 represents southbound routes passing through the Changhua wind farm channel without stopping at Taichung Port (A24, B12, and C08). The average TMG is approximately 212.1 degrees, and the average length is about 206,000 m.
The second type includes five groups (Group 5 to Group 9), where all three hyperparameter configurations generally produce similar clustering results, although certain specific clusters are obtained only with particular hyperparameter settings. Group 5 is distributed southwest of the Changhua wind farm channel, following a southeast–northwest direction without passing through the wind farm channel (A00~03, B00, and C00). When min_samples is set to 4, four distinct clusters can be identified, distinguishing between southbound and northbound trajectories based on average TMG and average length. For instance, A00 and A01 are both southbound trajectories but differ significantly in shape—A00 has two turning points during its southbound course, whereas A01 cuts directly through the offshore wind farm’s boundary. A02 and A03 are northbound trajectories. A02 mirrors A01 in form but in the opposite direction, representing a customary route, while A03 involves a northbound trajectory that turns left and exits westward after approaching the wind farm channel. However, with the other two hyperparameter configurations, although these four trajectory patterns are recognized, they are not further distinguished, resulting in a single cluster. The average TMG and average length data indicate that B00 and C00 represent the same trajectory cluster. Still, unlike A00 to A03, the inability to differentiate between southbound and northbound directions means that the average TMG yields a westward average. Group 6 is located in the northwest of Taiwan, following a southwest–northeast direction (A04, B01~02, and C01~02). In contrast to Group 5, when min_samples is set to 4, only a single cluster is identified, which fails to differentiate between trajectory directions. However, the other two hyperparameter settings do distinguish between southwest-bound and northeast-bound trajectories based on average TMG and average length. 
Group 7 represents southbound routes departing from Taichung Port and passing through the Changhua wind farm channel (A13~15, B13~15, and C09). When min_samples is set to 4 and 17, the trajectories can be categorized into three distinct clusters: those that turn west after heading south through the wind farm channel (A13 and B13), those that continue south after passing through the southern lane (A14 and B14), and those that proceed south through the eastern buffer zone of the wind farm channel (A15 and B15). The average TMG and average length data show consistency within the same cluster across the two hyperparameters and significant differences between different clusters. However, when min_samples is set to 50, only a single cluster is identified, losing the finer distinctions. Group 8 includes southbound routes passing through the western waters of Taiwan and stopping at Taichung Port (A19~20, B03, and C03). When min_samples is set to 4, A20 exhibits a detour behavior compared to A19, resulting in an average length that is approximately 20,000 m longer. The other two hyperparameter settings produce a single cluster with an average TMG of about 216.1 degrees and an average length of approximately 147,000 m. Group 9 consists of westbound or northwest-bound routes departing from Taichung Port and distributed north of the Changhua wind farm channel (A21~23, B09~10, and C06). When min_samples is set to 4, the average TMG values are 327.9 degrees (A21), 268.6 degrees (A22), and 287.5 degrees (A23). At min_samples = 17, the average TMG values are 327.4 degrees (B10, corresponding to A21) and 269.8 degrees (B09, corresponding to A22~23). When min_samples is set to 50, only a single cluster is obtained (C06).
The third and fourth types involve scenarios where certain hyperparameter configurations fail to produce clusters, so the corresponding trajectories are labeled as non-clustered. In the third type, when min_samples is set to 50, no clustering results are obtained for three groups (Groups 10–12). Group 10 consists of eastbound or southeast-bound routes departing from Taichung Port and distributed north of the Changhua wind farm channel, represented by trajectories A06~07 (min_samples = 4) and B04 (min_samples = 17). Group 11 includes northbound routes that enter from the southwest of the Changhua wind farm channel, pass through the wind farm channel, and dock at Taichung Port, represented by trajectories A11 (min_samples = 4) and B07 (min_samples = 17). Group 12 represents anchorage trajectories outside Taichung Port, indicated by trajectories A16 (min_samples = 4) and B11 (min_samples = 17). In the fourth type, when min_samples is set to 17 or 50, no clustering results are obtained for four groups (Groups 13–16). This likely results from an insufficient number of trajectories: these groups consist of specific routes with only a few trajectories, which highlights the challenge of clustering limited data under specific hyperparameter settings.
In summary, Figure 7 (min_samples = 4) shows a highly detailed clustering result, where multiple distinct clusters capture minor variations in trajectory patterns. This granularity is particularly useful for identifying unique navigational behaviors, such as specific maneuvers or deviations from common routes, which can assist in precise navigation planning and localized traffic management. Figure 8 (min_samples = 17) demonstrates a balanced clustering outcome that captures the essential structure of maritime traffic while reducing excessive fragmentation. This configuration allows for effective traffic management strategies by maintaining sufficient detail to monitor significant routes and identify potential high-risk areas without being overwhelmed by minor variations. Figure 9 (min_samples = 50) represents a more generalized clustering result with fewer clusters, providing a simplified view of significant traffic patterns. This broader perspective is valuable for strategic planning, such as developing maritime infrastructure or assessing long-term navigational trends, as it highlights the most frequently traveled routes and critical intersections. Overall, through the framework proposed in this study, selecting specific hyperparameters generates clustering results tailored to different purposes, providing fresh perspectives on complex maritime traffic. While each configuration offers distinct advantages, setting min_samples to 17 balances the overly granular clustering of min_samples = 4 and the more generalized approach of min_samples = 50. It achieves a detailed representation of significant traffic patterns without excessive fragmentation, making it the most suitable hyperparameter configuration for explaining the container ship routes in the western waters of Taiwan. This conclusion is further supported by the metrics SC and DBI discussed in Section 4.2.
This study consolidates the results from the three hyperparameter settings to create a simplified map of primary navigation routes, which highlights areas of high traffic density and marks intersections with white dots, as shown in Figure 10. The map reveals several critical insights for practical maritime operations. Notably, it identifies multiple intersection points, particularly in the northwest waters of Taiwan and the offshore areas near Taichung Port. Among these, the most critical area is near the offshore waters of Taichung Port, where vessels traveling to and from the Changhua wind farm channel intersect with ships arriving at and departing from Taichung Port. For instance, northbound vessels from the wind farm channel or those entering the southbound traffic lane will encounter ships departing south from Taichung Port toward the channel. As a result, this area is currently designated as a precautionary area, managed by a dedicated Vessel Traffic Service (VTS). With the ongoing development of offshore wind farms, a new channel is expected to be established in the northwest waters of Taiwan. This will make the offshore area of Taichung Port the junction of two channels, requiring vessels to navigate with increased vigilance, leading to more complex traffic conditions. This area will likely become one of the busiest traffic zones, where navigational risks, such as collisions or congestion, may increase. Therefore, detailed monitoring, management, and continuous traffic analysis are necessary to enhance safety.
Moreover, from a practical perspective, this route network map is valuable for navigators optimizing their route planning. Navigators can adjust routes to avoid congestion, reduce transit time, and improve fuel efficiency by identifying the most frequently used paths and potential bottlenecks. Additionally, the map aids maritime authorities in strategic planning, such as allocating resources for traffic surveillance, optimizing VTS operations, and implementing regulatory measures to mitigate risks in densely trafficked zones. VTS can enhance navigational safety by providing real-time information and guidance to vessels, especially in high-risk or congested areas. Furthermore, the data in Figure 10 can guide decisions regarding the placement of navigational aids or infrastructure development, such as building new offshore facilities. When making these plans, it is essential to consider existing traffic patterns to minimize disruptions. This approach is necessary for creating sustainable maritime spatial development that considers safety, efficiency, and environmental factors.

5. Conclusions and Future Work

Ship trajectories record navigation processes, behavioral characteristics, and water region features. By clustering these trajectories, we can analyze navigational behaviors, supporting applications like route planning, anomaly detection, and maritime awareness. This paper proposes a ship trajectory clustering framework based on AIS data and tests it with container ship trajectories in the western waters of Taiwan. The framework includes data processing, anomaly trajectory point detection, trajectory compression, similarity measurement, and clustering models. First, we collected and organized AIS data from the western waters of Taiwan in 2023. Using anomaly trajectory point detection and trajectory compression, we extracted 10,267 container ship trajectories and analyzed the threshold settings for these methods. We then created a trajectory similarity matrix using the DTW method and trained the HDBSCAN model. During model tuning, we evaluated hyperparameter configurations using clustering metrics such as the SC and the DBI and observed changes in the number of clusters and non-clustered trajectories. Three hyperparameter configurations were selected, visualized through trajectory length-weighted density plots, and analyzed based on average TMG and average length.
Results showed that with min_samples set to 17, the SC (0.54) is close to its optimal value (0.55) and the DBI (2.77) is lower than the DBI (3.00) corresponding to the optimal SC. The number of clusters is 16, with 529 non-clustered trajectories, falling between the optimal SC and DBI values. Therefore, we concluded that when the optimal SC or DBI indicator does not correspond to an optimal value for the other indicator, the clustering result may not be the best. A balance point exists between clustering effectiveness, separation, number of clusters, and non-clustered trajectories, which can be verified through domain knowledge and expert interpretation. In this case, setting the min_samples to 17 provided the optimal hyperparameter configuration, resulting in comprehensively interpretable clusters from a practical perspective. This study depicted clustering results based on density distribution, extracting the maritime traffic routes, and generating a network map of container ship routes in the western waters of Taiwan. This highlights key maritime traffic characteristics, especially in the northwestern waters of Taiwan and offshore Taichung Port. These are vital areas to monitor for navigation, emphasizing potential navigational risks and future evolving trends. Additionally, this navigation network map serves as a valuable reference for navigators to optimize route planning and supports authorities in making informed maritime traffic management and development decisions.
In conclusion, this HDBSCAN-based trajectory clustering framework improves maritime traffic management and understanding of vessel navigation characteristics. It offers fresh perspectives on analyzing complex maritime traffic conditions, identifying high-risk areas, and informing safety and spatial planning. In practical applications, the results help navigators optimize route planning, improve resource allocation for maritime authorities, and inform the development of infrastructure and navigational aids. Future work will explore other weighting parameters like average speed, trajectory direction, and ship tonnage, extending to two-dimensional shipping channels and comprehensive maritime risk assessments, proposing new solutions for maritime supervision and spatial planning.

Author Contributions

Conceptualization, I.-L.H. and J.-C.H.; methodology, I.-L.H. and J.-C.H.; software, I.-L.H.; validation, I.-L.H. and J.-C.H.; formal analysis, I.-L.H.; investigation, I.-L.H.; resources, J.-C.H.; data curation, I.-L.H.; writing—original draft preparation, I.-L.H.; writing—review and editing, I.-L.H., M.-C.L., L.C., and J.-C.H.; visualization, I.-L.H.; supervision, M.-C.L. and J.-C.H.; project administration, I.-L.H.; funding acquisition, M.-C.L. and J.-C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Maritime and Port Bureau, Ministry of Transportation and Communications, TAIWAN, grant number MPB1130509C026, and National Science and Technology Council, TAIWAN, grant number NSTC113-2410-H-019-019.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request due to restrictions of privacy.

Acknowledgments

AIS data were obtained from the Maritime and Port Bureau, Ministry of Transportation and Communications, TAIWAN. The authors would like to thank the government for providing the data free of charge. The authors would also like to acknowledge the reviewers for evaluating this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. IMO. Safety of Navigation. In SOLAS; IMO Publishing: London, UK, 2012; Chapter V. [Google Scholar]
  2. IMO. Resolution MSC.74(69): Adoption of New and Amended Performance Standards. MSC 69/22. 1998. Available online: https://wwwcdn.imo.org/localresources/en/OurWork/Safety/Documents/AIS/Resolution%20MSC.74(69).pdf (accessed on 8 September 2024).
  3. Kytariolou, A.; Themelis, N. Ship routing optimisation based on forecasted weather data and considering safety criteria. J. Navig. 2022, 75, 1310–1331. [Google Scholar] [CrossRef]
  4. Li, M.; Mou, J.; Chen, P.; Chen, L.; van Gelder, P. Real-time collision risk based safety management for vessel traffic in busy ports and waterways. Ocean Coast. Manag. 2023, 234, 106471. [Google Scholar] [CrossRef]
  5. Wang, G.; Wang, J.H.; Wang, X.Y.; Wang, Q.Z.; Han, J.Y.; Chen, L.F.; Feng, K. A Method for Coastal Global Route Planning of Unmanned Ships Based on Human-like Thinking. J. Mar. Sci. Eng. 2024, 12, 476. [Google Scholar] [CrossRef]
  6. Onyango, S.O.; Owiredu, S.A.; Kim, K.I.; Yoo, S.L. A Quasi-Intelligent Maritime Route Extraction from AIS Data. Sensors 2022, 22, 8639. [Google Scholar] [CrossRef]
  7. Luo, D.; Chen, P.; Yang, J.S.; Li, X.A.; Zhao, Y.Z. A New Classification Method for Ship Trajectories Based on AIS Data. J. Mar. Sci. Eng. 2023, 11, 1646. [Google Scholar] [CrossRef]
  8. Durlik, I.; Miller, T.; Dorobczynski, L.; Kozlovska, P.; Kostecki, T. Revolutionizing Marine Traffic Management: A Comprehensive Review of Machine Learning Applications in Complex Maritime Systems. Appl. Sci. 2023, 13, 8099. [Google Scholar] [CrossRef]
  9. Wolsing, K.; Saillard, A.; Bauer, J.; Wagner, E.; van Sloun, C.; Fink, I.B.; Schmidt, M.; Wehrle, K.; Henze, M. Network attacks against marine radar systems: A taxonomy, simulation environment, and dataset. In Proceedings of the 2022 IEEE 47th Conference on Local Computer Networks (LCN), Edmonton, AB, Canada, 26–29 September 2022; pp. 114–122. [Google Scholar]
  10. Stach, T.; Kinkel, Y.; Constapel, M.; Burmeister, H.C. Maritime Anomaly Detection for Vessel Traffic Services: A Survey. J. Mar. Sci. Eng. 2023, 11, 1174. [Google Scholar] [CrossRef]
  11. Fu, X.; Xiao, Z.; Xu, H.; Jayaraman, V.; Othman, N.B.; Chua, C.P.; Lind, M. AIS data analytics for intelligent maritime surveillance systems. In Maritime Informatics; Springer: Cham, Switzerland, 2021; pp. 393–411. [Google Scholar]
  12. Ma, Q.D.; Tang, H.; Liu, C.; Zhang, M.Y.; Zhang, D.Z.; Liu, Z.; Zhang, L.Y. A big data analytics method for the evaluation of maritime traffic safety using automatic identification system data. Ocean Coast. Manag. 2024, 251, 107077. [Google Scholar] [CrossRef]
  13. Kim, H.S.; Lee, E.; Lee, E.J.; Hyun, J.W.; Gong, I.Y.; Kim, K.; Lee, Y.S. A Study on Grid-Cell-Type Maritime Traffic Distribution Analysis Based on AIS Data for Establishing a Coastal Maritime Transportation Network. J. Mar. Sci. Eng. 2023, 11, 354. [Google Scholar] [CrossRef]
  14. Chen, X.; Ma, D.; Liu, R.W. Application of Artificial Intelligence in Maritime Transportation. J. Mar. Sci. Eng. 2024, 12, 439. [Google Scholar] [CrossRef]
  15. Ma, X.F.; Shi, G.Y.; Shi, J.H.; Liu, J. A framework of marine collision risk identification strategy using AIS data. J. Navig. 2023, 76, 525–544. [Google Scholar] [CrossRef]
  16. Zhu, W.H.; Wang, S.D.; Liu, S.L.; Yang, L.B.; Zheng, X.R.; Li, B.H.; Zhang, L.X. Dynamic Multi-Period Maritime Accident Susceptibility Assessment Based on AIS Data and Random Forest Model. J. Mar. Sci. Eng. 2023, 11, 1935. [Google Scholar] [CrossRef]
  17. Huang, J.-C.; Ung, S.-T. Risk Assessment and Traffic Behaviour Evaluation of Ships. J. Mar. Sci. Eng. 2023, 11, 2297. [Google Scholar] [CrossRef]
  18. Liu, Z.; Zhang, B.; Zhang, M.; Wang, H.; Fu, X. A quantitative method for the analysis of ship collision risk using AIS data. Ocean Eng. 2023, 272, 113906. [Google Scholar] [CrossRef]
  19. Di Ciaccio, F.; Menegazzo, P.; Troisi, S. Optimization of the maritime signaling system in the lagoon of venice. Sensors 2019, 19, 1216. [Google Scholar] [CrossRef]
  20. Rapalis, P.; Silas, G.; Zaglinskis, J. Ship Air Pollution Estimation by AIS Data: Case Port of Klaipeda. J. Mar. Sci. Eng. 2022, 10, 1950. [Google Scholar] [CrossRef]
  21. Jalkanen, J.P.; Johansson, L.; Liefvendahl, M.; Bensow, R.; Sigray, P.; Östberg, M.; Karasalo, I.; Andersson, M.; Peltonen, H.; Pajala, J. Modelling of ships as a source of underwater noise. Ocean Sci. 2018, 14, 1373–1383. [Google Scholar] [CrossRef]
  22. Nasar, W.; Torres, R.D.; Gundersen, O.E.; Karlsen, A.T. The Use of Decision Support in Search and Rescue: A Systematic Literature Review. ISPRS Int. J. Geo-Inf. 2023, 12, 182. [Google Scholar] [CrossRef]
  23. Yan, Z.J.; Cheng, L.; He, R.; Yang, H. Extracting ship stopping information from AIS data. Ocean Eng. 2022, 250, 111004. [Google Scholar] [CrossRef]
  24. Kurekin, A.A.; Loveday, B.R.; Clements, O.; Quartly, G.D.; Miller, P.I.; Wiafe, G.; Agyekum, K.A. Operational Monitoring of Illegal Fishing in Ghana through Exploitation of Satellite Earth Observation and AIS Data. Remote Sens. 2019, 11, 293. [Google Scholar] [CrossRef]
  25. He, W.; Lei, J.Y.; Chu, X.M.; Xie, S.; Zhong, C.; Li, Z.X. A Visual Analysis Approach to Understand and Explore Quality Problems of AIS Data. J. Mar. Sci. Eng. 2021, 9, 198. [Google Scholar] [CrossRef]
  26. Zhao, L.B.; Shi, G.Y.; Yang, J.X. Ship Trajectories Pre-processing Based on AIS Data. J. Navig. 2018, 71, 1210–1230. [Google Scholar] [CrossRef]
  27. Guo, S.Q.; Mou, J.M.; Chen, L.Y.; Chen, P.F. Improved kinematic interpolation for AIS trajectory reconstruction. Ocean Eng. 2021, 234, 109256. [Google Scholar] [CrossRef]
  28. Lv, T.; Tang, P.; Zhang, J. A Real-Time AIS Data Cleaning and Indicator Analysis Algorithm Based on Stream Computing. Sci. Program. 2023, 2023, 8345603. [Google Scholar] [CrossRef]
  29. Muckell, J.; Hwang, J.-H.; Lawson, C.T.; Ravi, S. Algorithms for compressing GPS trajectory data: An empirical evaluation. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 402–405. [Google Scholar]
  30. Qi, L.; Ji, Y. Ship trajectory data compression algorithms for Automatic Identification System: Comparison and analysis. J. Water Resour. Ocean Sci. 2020, 9, 42–47. [Google Scholar] [CrossRef]
  31. Wang, Y.; Zhang, Y.; Zhao, H.C.; Wang, H.B. Assessment Method Based on AIS Data Combining the Velocity Obstacle Method and Pareto Selection for the Collision Risk of Inland Ships. J. Mar. Sci. Eng. 2022, 10, 1723. [Google Scholar] [CrossRef]
  32. Ferreira, M.D.; Campbell, J.; Purney, E.; Soares, A.; Matwin, S. Assessing compression algorithms to improve the efficiency of clustering analysis on AIS vessel trajectories. Int. J. Geogr. Inf. Sci. 2023, 37, 660–683. [Google Scholar] [CrossRef]
  33. Gao, J.B.; Cai, Z.; Yu, W.J.; Sun, W. Trajectory Data Compression Algorithm Based on Ship Navigation State and Acceleration Variation. J. Mar. Sci. Eng. 2023, 11, 216. [Google Scholar] [CrossRef]
  34. Liu, X.; Zhi, X.; Wang, P.; Mei, Q.; Su, H.; He, Z. An Improved DBSCAN Clustering Method for AIS Trajectories Incorporating DP Compression and Discrete Fréchet Distance. In Proceedings of the International Conference on Spatial Data and Intelligence, Nanjing, China, 25–27 April 2024; pp. 44–56. [Google Scholar]
  35. Wei, Z.K.; Xie, X.L.; Zhang, X.J. AIS trajectory simplification algorithm considering ship behaviours. Ocean Eng. 2020, 216, 108086. [Google Scholar] [CrossRef]
  36. Rong, H.; Teixeira, A.; Soares, C.G. Data mining approach to shipping route characterization and anomaly detection based on AIS data. Ocean Eng. 2020, 198, 106936. [Google Scholar] [CrossRef]
  37. Huang, C.; Qi, X.; Zheng, J.; Zhu, R.; Shen, J. A maritime traffic route extraction method based on density-based spatial clustering of applications with noise for multi-dimensional data. Ocean Eng. 2023, 268, 113036. [Google Scholar] [CrossRef]
  38. Liu, Z.; Gao, H.R.; Zhang, M.Y.; Yan, R.; Liu, J.X. A data mining method to extract traffic network for maritime transport management. Ocean Coast. Manag. 2023, 239, 106622. [Google Scholar] [CrossRef]
  39. Yan, Z.J.; Xiao, Y.J.; Cheng, L.; He, R.; Ruan, X.G.; Zhou, X.; Li, M.C.; Bin, R. Exploring AIS data for intelligent maritime routes extraction. Appl. Ocean Res. 2020, 101, 102271. [Google Scholar] [CrossRef]
  40. Hausdorff, F. Grundzüge der mengenlehre; Von Veit: Leipzig, Germany, 1914; Volume 7. [Google Scholar]
  41. Li, H.; Liu, J.; Liu, R.W.; Xiong, N.; Wu, K.; Kim, T.H. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis. Sensors 2017, 17, 1792. [Google Scholar] [CrossRef]
  42. Sheng, P.; Yin, J.B. Extracting Shipping Route Patterns by Trajectory Clustering Model Based on Automatic Identification System Data. Sustainability 2018, 10, 2327. [Google Scholar] [CrossRef]
  43. Zhao, L.B.; Shi, G.Y. A trajectory clustering method based on Douglas-Peucker compression and density for marine traffic pattern recognition. Ocean Eng. 2019, 172, 456–467. [Google Scholar] [CrossRef]
  44. Wang, L.H.; Chen, P.F.; Chen, L.Y.; Mou, J.M. Ship AIS Trajectory Clustering: An HDBSCAN-Based Approach. J. Mar. Sci. Eng. 2021, 9, 566. [Google Scholar] [CrossRef]
  45. Yan, Z.J.; Yang, G.H.; He, R.; Yang, H.; Ci, H.; Wang, R. Ship Trajectory Clustering Based on Trajectory Resampling and Enhanced BIRCH Algorithm. J. Mar. Sci. Eng. 2023, 11, 407. [Google Scholar] [CrossRef]
  46. Zhang, C.; Liu, S.T.; Guo, M.Z.; Liu, Y.C. A novel ship trajectory clustering analysis and anomaly detection method based on AIS data. Ocean Eng. 2023, 288, 116082. [Google Scholar] [CrossRef]
  47. Series, M. Technical Characteristics for an Automatic Identification System Using Time-Division Multiple Access in the VHF Maritime Mobile Band; M.1371-5; Recommendation ITU: Geneva, Switzerland, 2014. [Google Scholar]
  48. Harati-Mokhtari, A.; Wall, A.; Brooks, P.; Wang, J. Automatic identification system (AIS): Data reliability and human error implications. J. Navig. 2007, 60, 373–389. [Google Scholar] [CrossRef]
  49. ITU. Recommendation ITU-R M.2169—Improved Satellite Detection of AIS; Recommendation ITU: Geneva, Switzerland, 2009. [Google Scholar]
  50. Meng, F.; Yuan, G.; Lv, S.; Wang, Z.; Xia, S. An overview on trajectory outlier detection. Artif. Intell. Rev. 2019, 52, 2437–2456. [Google Scholar] [CrossRef]
  51. Chen, X.; Ling, J.; Yang, Y.; Zheng, H.; Xiong, P.; Postolache, O.; Xiong, Y. Ship trajectory reconstruction from AIS sensory data via data quality control and prediction. Math. Probl. Eng. 2020, 2020, 7191296. [Google Scholar] [CrossRef]
  52. Chen, S.; Huang, Y.; Lu, W. Anomaly detection and restoration for ais raw data. Wirel. Commun. Mob. Comput. 2022, 2022, 5954483. [Google Scholar] [CrossRef]
  53. Garcez Duarte, M.M.; Sakr, M. An experimental study of existing tools for outlier detection and cleaning in trajectories. GeoInformatica 2024, 1–21. [Google Scholar] [CrossRef]
  54. Douglas, D.H.; Peucker, T.K. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 1973, 10, 112–122. [Google Scholar] [CrossRef]
  55. Sankoff, D.; Kruskal, J.B. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison; Addison-Wesley Publishing: Reading, MA, USA, 1983. [Google Scholar]
  56. Campello, R.J.G.B.; Moulavi, D.; Zimek, A.; Sander, J. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Trans. Knowl. Discov. Data 2015, 10, 5. [Google Scholar] [CrossRef]
  57. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231. [Google Scholar]
  58. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
  59. Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 1, 224–227. [Google Scholar] [CrossRef]
  60. Huang, I.-L.; Lee, M.-C.; Nieh, C.-Y.; Huang, J.-C. Ship classification based on ais data and machine learning methods. Electronics 2023, 13, 98. [Google Scholar] [CrossRef]
Figure 1. Methodological framework of research.
Figure 2. Principle diagram of DP algorithm; (a) Original trajectory. (b) Baseline construction and distance calculation. (c) Trajectory segmentation at farthest points. (d) Segmentation progression. (e) Incomplete segment handling. (f) Final simplified trajectory.
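The steps listed in the Figure 2 caption (baseline construction, distance calculation, and splitting at the farthest point) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: coordinates are assumed to be planar and already expressed in meters, the distance is taken to the baseline segment rather than to the infinite line, and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate baseline: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the baseline, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the baseline."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack instead of recursion
    while stack:
        start, end = stack.pop()
        max_dist, max_idx = 0.0, None
        for i in range(start + 1, end):
            d = point_segment_distance(points[i], points[start], points[end])
            if d > max_dist:
                max_dist, max_idx = d, i
        if max_idx is not None and max_dist > epsilon:
            # Keep the farthest point and process both sub-segments.
            keep[max_idx] = True
            stack.append((start, max_idx))
            stack.append((max_idx, end))
    return [p for p, k in zip(points, keep) if k]


if __name__ == "__main__":
    track = [(0, 0), (100, 10), (200, 0), (300, 100), (400, 0)]
    # Threshold of 30 m, the value of epsilon listed in Table 1.
    print(douglas_peucker(track, 30.0))
```

In this toy track, the point that deviates by only 10 m from its local baseline is dropped, while the turning points are retained.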
Figure 3. Container ship traffic density in the western waters of Taiwan, 2023.
Figure 4. Parameter analysis for anomaly trajectory point detection and trajectory compression; (a) Relationship between parameter α and the number of anomaly data points. (b) Distribution of ship length and width. (c) Relationship between ε and average reduced distance. (d) Relationship between ε and average reduced point.
Figure 5. Impact of min_samples on the number of clusters and non-clustered trajectories.
Figure 6. Impact of min_samples on Silhouette Coefficient (SC) and Davies–Bouldin Index (DBI).
Figure 7. Clustering analysis results for min_samples = 4 (26 clusters and 1 non-clustered trajectory data).
Figure 8. Clustering analysis results for min_samples = 17 (16 clusters and 1 non-clustered trajectory data).
Figure 9. Clustering analysis results for min_samples = 50 (10 clusters and 1 non-clustered trajectory data).
Figure 10. Route extraction of container ships in the western waters of Taiwan in 2023.
Table 1. Configuration of the case study.

| Item | Configuration |
| --- | --- |
| Boundary | Latitude: 23°40′ N~25°20′ N; Longitude: 119°30′ E~120°40′ E |
| AIS points | 109,725,868 points |
| AIS trajectory | 35,460 trajectories |
| α of anomaly trajectory point detection | 1.1 |
| β of ROT limit | 45 |
| ε of Douglas–Peucker | 30 m (one times the average ship width) |

Table 2. Trajectory counts of four cargo ship types based on the ship classification model.

| Ship Type | Trajectory Count |
| --- | --- |
| Container ship | 10,267 |
| Bulk carrier | 6324 |
| General cargo ship | 2833 |
| Vehicles carrier | 580 |
| Others or unknown | 15,456 |

Table 3. Partial information of the similarity matrix of trajectory distances (10,267 × 10,267).

| Traj. | 1 | 2 | 10,267 |
| --- | --- | --- | --- |
| 1 | 0 | 2,396,564 | 286,987 |
| 2 | 2,396,564 | 0 | 3,295,866 |
| 10,267 | 286,987 | 3,295,866 | 0 |

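The matrix excerpt in Table 3 is symmetric with a zero diagonal, as expected for pairwise DTW distances. A minimal sketch of how such a matrix can be assembled is given below. It assumes planar point coordinates with Euclidean point-to-point cost and the standard unconstrained DTW recurrence; the point distance, any windowing, and the function names used here are assumptions, since the exact configuration of the study is not restated in this section.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic DTW cost between two point sequences (Euclidean point cost)."""
    n, m = len(traj_a), len(traj_b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty-vs-empty alignment costs nothing
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(traj_a[i - 1], traj_b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def dtw_matrix(trajectories):
    """Symmetric pairwise DTW matrix; only the upper triangle is computed."""
    k = len(trajectories)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(trajectories[i], trajectories[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    toy = [
        [(0, 0), (1, 0), (2, 0)],
        [(0, 1), (1, 1), (2, 1)],
        [(0, 0), (2, 0)],
    ]
    for row in dtw_matrix(toy):
        print([round(v, 3) for v in row])
```

Computing only the upper triangle roughly halves the work, which matters for a matrix of the size reported in Table 3; a matrix of this form can then be passed to a clustering routine that accepts precomputed distances.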
Table 4. Hyperparameter analysis results for different min_samples settings.

| Min_Samples | Clusters | Non-Clusters | SC | DBI |
| --- | --- | --- | --- | --- |
| 4 | 26 | 366 | 0.37 | 1.97 * |
| 17 | 16 | 529 | 0.54 | 2.77 |
| 50 | 10 | 803 | 0.55 * | 3.00 |

* indicates the best value of the metric among the min_samples settings from 2 to 200.

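Table 5. Clustering results based on different min_samples.

| Group | Cluster (Min_Samples = 4) | Avg. TMG (Degree) | Avg. Length (m) | Cluster (Min_Samples = 17) | Avg. TMG (Degree) | Avg. Length (m) | Cluster (Min_Samples = 50) | Avg. TMG (Degree) | Avg. Length (m) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | A09 | 030.5 | 201,620 | B05 | 030.5 | 201,540 | C05 | 030.3 | 201,272 |
| 2 | A10 | 042.6 | 90,333 | B06 | 042.6 | 90,333 | C04 | 042.6 | 90,333 |
| 3 | A12 | 033.6 | 138,471 | B08 | 032.6 | 138,625 | C07 | 033.0 | 139,263 |
| 4 | A24 | 212.1 | 206,418 | B12 | 212.1 | 206,231 | C08 | 212.5 | 207,744 |
| 5 | A00 | 151.7 | 70,434 | B00 | 249.0 | 54,340 | C00 | 249.0 | 54,340 |
|  | A01 | 167.4 | 51,937 |  |  |  |  |  |  |
|  | A02 | 345.8 | 54,793 |  |  |  |  |  |  |
|  | A03 | 302.6 | 48,956 |  |  |  |  |  |  |
| 6 | A04 | 121.1 | 167,770 | B01 | 243.9 | 159,745 | C01 | 244.2 | 157,401 |
|  |  |  |  | B02 | 060.9 | 172,255 | C02 | 060.8 | 172,805 |
| 7 | A13 | 244.4 | 120,398 | B13 | 244.4 | 120,255 | C09 | 225.8 | 101,373 |
|  | A14 | 225.0 | 100,848 | B14 | 225.0 | 100,800 |  |  |  |
|  | A15 | 220.8 | 85,864 | B15 | 220.8 | 85,864 |  |  |  |
| 8 | A19 | 216.1 | 146,386 | B03 | 216.1 | 146,800 | C03 | 216.1 | 146,828 |
|  | A20 | 214.2 | 165,464 |  |  |  |  |  |  |
| 9 | A21 | 327.9 | 66,180 | B10 | 327.4 | 66,706 | C06 | 279.0 | 938,171 |
|  | A22 | 268.6 | 100,075 | B09 | 269.8 | 98,754 |  |  |  |
|  | A23 | 287.5 | 78,181 |  |  |  |  |  |  |
| 10 | A06 | 105.8 | 80,245 | B04 | 099.2 | 885,181 | - | - | - |
|  | A07 | 090.5 | 98,945 |  |  |  |  |  |  |
| 11 | A11 | 064.5 | 118,369 | B07 | 064.5 | 118,226 | - | - | - |
| 12 | A16 | 134.3 | 10,970 | B11 | 137.5 | 10,796 | - | - | - |
| 13 | A05 | 229.9 | 248,269 | - | - | - | - | - | - |
| 14 | A08 | 157.9 | 80,098 | - | - | - | - | - | - |
| 15 | A17 | 268.4 | 110,206 | - | - | - | - | - | - |
|  | A18 | 281.9 | 103,913 |  |  |  |  |  |  |
| 16 | A25 | 220.6 | 241,645 | - | - | - | - | - | - |
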
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, I.-L.; Lee, M.-C.; Chang, L.; Huang, J.-C. Development and Application of an Advanced Automatic Identification System (AIS)-Based Ship Trajectory Extraction Framework for Maritime Traffic Analysis. J. Mar. Sci. Eng. 2024, 12, 1672. https://doi.org/10.3390/jmse12091672
