AIS Data Driven Ship Behavior Modeling in Fairways: A Random Forest Based Approach

Ma, Lin; Guo, Zhuang; Shi, Guoyou

doi:10.3390/app14188484


by Lin Ma ¹,²,*, Zhuang Guo ¹,² and Guoyou Shi ¹,²

¹ Navigation College, Dalian Maritime University, Dalian 116026, China

² Key Laboratory of Navigation Safety Guarantee of Liaoning Province, Navigation College, Dalian Maritime University, Dalian 116026, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(18), 8484; https://doi.org/10.3390/app14188484

Submission received: 12 August 2024 / Revised: 5 September 2024 / Accepted: 17 September 2024 / Published: 20 September 2024

(This article belongs to the Section Marine Science and Engineering)


Abstract: The continuous growth of global trade and maritime transport has significantly heightened the challenges of managing ship traffic in port waters, particularly within fairways. Effective traffic management in these channels is crucial not only for ensuring navigational safety but also for optimizing port efficiency. A deep understanding of ship behavior within fairways is essential for effective traffic management. This paper applies machine learning techniques, including Decision Tree, Random Forest, and Gradient Boosting Regression, to model and analyze the behavior of various types of ships at specific moments within fairways. The study focuses on predicting four key behavioral parameters: latitude, longitude, speed, and heading. The experimental results reveal that the Random Forest model achieves adjusted R² scores of 0.9999 for both longitude and latitude, 0.9957 for speed, and 0.9727 for heading. All three models perform well in accurately predicting ship positions at different times, with the Random Forest model particularly excelling in speed and heading predictions. It effectively captures the behavior of ships within fairways and provides accurate predictions for different types and sizes of vessels, especially in terms of speed and heading variations as they approach or leave berths. This model offers valuable support for predicting ship behavior, enhancing ship traffic management, optimizing port scheduling, and detecting anomalies.

Keywords: AIS data; ship behavior; fairway; random forest; trajectory prediction

1. Introduction

Maritime transport is a cornerstone of international trade and the global economy, with over 80% of international cargo trade carried by sea [1]. In most developing countries, this proportion is even higher. As port construction and maritime cargo transport rapidly develop, issues such as complex coastal and port entry and exit routes, increased vessel traffic and density, frequent encounters among various ships, and heightened water traffic safety risks have become increasingly prominent. Maritime traffic safety is crucial to human life, the marine environment, and the material and immaterial assets involved in maritime activities [2]. Moreover, as maritime vessel traffic increases, so do CO₂ emissions, exacerbating environmental pollution [3]. As a critical hub for waterway transportation, port waters are typically busy, especially within fairways where multiple vessels often sail simultaneously. The organization of this traffic affects the frequency of vessel encounters, local vessel density, and waiting times for port entry and exit. These factors directly impact navigation safety within channel waters and port operation efficiency. VTS (Vessel Traffic Service) officers need to understand vessel behavior within fairways for effective ship scheduling and risk assessment. This includes knowledge of preferred exit positions, speed variations of different vessel types and sizes, and ship turning behaviors within the channel [4]. Studying vessel behavior within fairways aids in traffic prediction, risk anticipation, anomaly detection, and optimizing the scheduling of port entry and exit. Therefore, research on vessel behavior within fairways is crucial for port traffic management.

The Automatic Identification System (AIS) is a modern navigational aid designed to enhance maritime safety. It is widely used in maritime supervision, traffic flow analysis, collision avoidance, and other fields [5]. AIS stores extensive historical trajectory data, including vessel positions, speed over ground (SOG), course over ground (COG), vessel length, vessel width, draft, vessel type, and more. This vast AIS data resource supports the extraction of traffic patterns and knowledge mining, further aiding in traffic optimization and safety management. It also helps address increasingly complex maritime transportation networks and plays a significant role in enhancing the operational capacity of vessels [6]. The richness of the vessel trajectory data makes traffic pattern exploration and knowledge mining feasible, attracting increasing interest from the research community. Current methods for maritime traffic knowledge mining primarily include grid-based, vector-based, and statistical approaches. The grid-based method divides the study area into indexed grids and places the raw traffic data into these grids. This method effectively reduces the problem size. For instance, analyzing five days of Class A AIS vessel traffic data within the waters east of Bohai, China (117.5° E–123.5° E, 37° N–41° N), yields 30,325,098 trajectory points. Dividing the study area into grids of 1′ in latitude and longitude (at most 360 × 240 = 86,400 cells for this area) would significantly reduce the problem size [7]. However, the grid-based method is more suitable for smaller study areas and often requires the grid size to be defined before the study [8]. An inappropriate grid size, or a uniform grid setting that does not consider regional characteristics, may affect the study results. The vector-based method effectively identifies route points in vessel trajectories. Most studies only identify entry and exit points, turning points, and stopping points, but do not consider traffic characteristics along the routes. For example, speeds along routes may vary depending on vessel type, size, and region. Ignoring these variations makes it challenging to accurately represent the actual traffic situation in the study area. Thus, statistical models are often needed to describe maritime traffic characteristics, such as traffic volume, heading, and speed distribution.

Based on this analysis, we have designed a supervised learning method using machine learning for multi-feature, multi-output learning of vessel behavior within fairways based on AIS data. First, we acquire AIS trajectory data within channel boundaries and preprocess the data. Then, we organize training data based on the preprocessed AIS trajectories. Finally, we apply three methods to learn vessel behavior within fairways. The resulting models can accurately predict vessel latitude, longitude, heading, and speed at specific future times based on berth positions and vessel dimensions. These models support traffic volume prediction, risk assessment, traffic management, and vessel scheduling within fairways.

The remainder of this paper is organized as follows: Section 2 reviews existing research on vessel behavior learning and trajectory prediction, discussing their advantages, disadvantages, and application scenarios. Section 3 defines the multi-attribute vessel behavior learning method within fairways based on AIS data. Section 4 presents experiments and analyzes the results using AIS data from the fairways of Tianjin Port, China. Section 5 concludes the paper and provides an outlook for future research.

2. Related Work

Unlike land-based traffic, maritime traffic enjoys greater freedom due to the vast expanse of water. However, it is not entirely unstructured, especially for Class A AIS vessels, which are typically engaged in transportation. When designing routes, navigators refer to various nautical resources such as world ocean routes, route design charts, routing guides, and port guides [9]. They also consider geographical features and navigation rules for specific areas. Different vessels may choose different routes based on these factors, and there are certain patterns to these choices. Learning vessel behavior within fairways is beneficial not only for predicting maritime traffic but also for visualizing maritime traffic, detecting anomalies, providing collision warnings, designing routes, and supporting other decision-making systems.

2.1. Ship Behavior Learning Method

Based on current research, ship behavior learning within fairways can be categorized into grid-based, vector-based, and statistical ship behavior learning methods. The basic idea of grid-based methods for learning ship behavior within fairways is to divide the study area into grids, map the ship traffic flow data onto these grids, and then learn the traffic patterns. Kim et al. [10] proposed creating a grid-based ship traffic database and introduced the problem of trajectory interpolation. They stored dynamic and static ship information in the gridded database, significantly reducing the problem size. Their research primarily applied to ship traffic display and traffic information statistics. Ristic et al. [11] described ship traffic patterns based on grids and used the developed model for ship traffic anomaly detection. Xiao et al. [12] designed the Lattice-Based DBSCAN clustering algorithm, which clusters grids based on their density (the number of trajectory points within a grid). This approach effectively removes noise and irregular ship traffic data, facilitating the learning of ship traffic patterns. The learned models were then applied to ship traffic prediction. Grid-based ship traffic pattern learning effectively reduces the problem size and allows various attributes (such as the speed, heading, and ship length distributions) to be stored within the grids. However, the grid size generally needs to be determined before the study, and grid-based methods are typically more suitable for smaller areas.

The basic idea of vector-based methods for learning ship behavior within fairways is to represent ship routes using turning points and the straight lines connecting them. Before extracting ship routes or constructing a maritime traffic network, specific methods are generally required to cluster ship routes and turning points, thereby constructing the traffic network. Pallotta et al. [13] developed a method called TREAD (Traffic Route Extraction and Anomaly Detection), which uses DBSCAN clustering to identify route points (such as ports, offshore platforms, and entry and exit points). They used statistical methods to obtain the distribution of parameters like ship types at these route points, learning traffic patterns for trajectory prediction and anomaly detection. The study also compared traffic patterns derived from different data volumes and suggested the data volume required to obtain reliable traffic patterns. Vespe et al. [14] identified turning areas, ports, offshore platforms, and entry and exit points to ultimately form ship routes. Vector-based ship traffic mining methods are generally considered effective for handling high-density traffic but are less effective for irregular ship traffic.

Statistical methods are widely used in maritime traffic learning to describe important maritime traffic parameters. Ristic et al. [15] established a statistical model of ship movement patterns based on density estimation and applied it to anomaly detection and maritime traffic prediction. Xiao et al. [16] set up gate lines in the navigation channel to collect statistics on the distribution of ship positions, speeds, and headings as they pass through, obtaining traffic patterns for modeling and simulating actual maritime traffic. Wu et al. [17] investigated ship behavior in densely trafficked areas, with their findings applied to identifying collision risks. Ma et al. [18] applied the Ising model to establish a calculation model for determining ship collision risk values in port waters. This model provides a quantitative analysis of collision risks in port waters. A case study in Qingdao Port demonstrated that the Ising model effectively identified high-risk collision areas, contributing to overall navigation safety in port waters.

2.2. Ship Trajectory Prediction

In recent years, significant progress has been made in ship trajectory prediction research. Various methods, such as Graph Attention Networks (GAT), Long Short-Term Memory Networks (LSTM), and Generative Adversarial Networks (GAN), have been combined with spatial feature extraction, temporal feature learning, and improved algorithms to enhance prediction accuracy and real-time performance, providing crucial support for maritime traffic management and ship safety. Zhao et al. [19] proposed a ship trajectory prediction method based on GAT and LSTM, combining spatial feature extraction and temporal feature learning to achieve high-precision ship trajectory prediction, which is significant for maritime traffic control and safe navigation. Wu et al. [20] introduced a novel multi-ship trajectory prediction model called GL-STGCNN, incorporating a ship interaction adjacency matrix extraction module and a Model Predictive Control (MPC) trajectory correction method, significantly improving trajectory prediction accuracy and rationality in multi-ship interaction scenarios. Li et al. [21] proposed a ship trajectory prediction model based on an improved Bi-directional LSTM (Bi-LSTM), which introduced the Rectified Adaptive Moment Estimation and Lookahead optimization algorithms to enhance model generalization and prediction accuracy. Experimental results demonstrated the superior performance of this model in ship trajectory prediction. Chen et al. [22] presented a ship trajectory prediction and analysis framework based on the Bi-LSTM model, utilizing Automatic Identification System (AIS) data. The model’s prediction performance was validated using Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), showing satisfactory ship trajectory prediction results in both single-ship and multi-ship scenarios. Zhang et al. [23] proposed a ship trajectory prediction method based on a hybrid K-Nearest Neighbor (KNN) and LSTM model. 
By considering the differences in ship density across various sea areas, the KNN algorithm was optimized and combined with spatio-temporal features to achieve high-precision ship trajectory prediction. Tian et al. [24] introduced a ship trajectory prediction method based on a Differential LSTM (D-LSTM) network, which effectively improved trajectory prediction accuracy in complex waterways with frequent maneuvers by integrating dynamic time-varying features from time series data and differential variables, significantly reducing training time. Zhang et al. [25] proposed a ship trajectory prediction method based on an improved LSTM and GAN, significantly enhancing the accuracy and real-time performance of ship trajectory prediction, which is crucial for maritime traffic management and infrastructure protection. Zhang et al. [26] developed a method for identifying ship maneuvering motion models based on least squares and combined ocean current velocity impacts to establish an extreme short-term ship trajectory prediction model. By real-time online correction of current velocity information, the model achieved high-precision ship trajectory prediction under different ocean current conditions. Bao et al. [27] introduced an improved ship trajectory prediction model based on a Multi-Head Attention mechanism and Bi-directional Gated Recurrent Unit (MHA-BiGRU). Experiments verified its effectiveness in achieving high-precision, real-time ship trajectory prediction, providing support for maritime traffic services and ship navigation. Wu et al. [28] proposed a ship trajectory prediction method based on a Sequence-to-Sequence model using Convolutional LSTM. The method improved ship trajectory prediction accuracy through preprocessing and feature extraction, validated with real AIS data. Liu [29] developed a ship trajectory prediction model based on Adaptive Chaotic Differential Evolution optimized Support Vector Regression (ACDE-SVR). 
By processing AIS data and optimizing SVR model parameters, the model effectively enhanced prediction accuracy and real-time performance. Qian [30] proposed a new method for inland ship trajectory prediction based on a Genetic Algorithm-optimized LSTM (GA-LSTM), significantly improving prediction accuracy and computational efficiency. Jia [31] presented a ship trajectory prediction model based on an Attention-enhanced Bi-directional LSTM combined with a Whale Optimization Algorithm (WOA) for hyperparameter optimization. This model comprehensively considered the context information of time series data, improving prediction accuracy and adaptability.

3. Methodology

3.1. Problem Definition

The management of vessel traffic in port waters primarily involves three core factors: berths, channels, and anchorages. As shown in Figure 1, for entry vessels, there are generally two scenarios: if the vessel meets the entry conditions upon arrival at the port waters, VTS (Vessel Traffic Service) operators will direct it into the channel for berthing; if the vessel does not meet the entry conditions, it will be directed to the anchorage to wait. For exit vessels, after cargo loading or unloading is completed, the vessel must apply to the VTS center for departure based on the existing scheduling plan. If the departure conditions are met, the vessel will depart with the assistance of tugs and pilots; if there are conflicts or the conditions are not met, the vessel will wait at the dock until the conditions are satisfied. VTS operators need to evaluate berth availability, channel vessel density, and port entry and exit schedules in real-time to determine whether a vessel can use the channel and the order of usage. The behavior of vessels within the channel is crucial for assessing channel risks and ensuring safe and efficient vessel scheduling.

The SOLAS Convention mandates that by 31 December 2004, all cargo ships of 300 gross tonnage and above engaged in international voyages, cargo ships of 500 gross tonnage and above engaged in domestic voyages, and all passenger ships must be equipped with AIS. AIS data is categorized into two types: dynamic information, which includes the vessel’s position, course, speed, etc., and static information, which encompasses the vessel’s name, call sign, length, width, and voyage-related details such as draft, destination, and estimated time of arrival. These two sets of information can be correlated using the MMSI (Maritime Mobile Service Identity) number. AIS data serves as a foundational dataset for learning vessel behavior within channels.

The problem of ship behavior modeling in fairways can be formulated as a supervised learning task that uses historical AIS data to capture the behavior of vessels across the entire port area. As shown in Equation (1), we seek a model f learned from historical AIS data. Given a ship’s berth position, the position at which it enters or leaves the channel, its length, width, draft, and type, and the time elapsed since the start of the operation, the model returns the ship’s position, COG, and SOG at the corresponding time.

lon_{i}, lat_{i}, cog_{i}, sog_{i} = f(lon_{s}, lat_{s}, lon_{e}, lat_{e}, l, b, dm, st, t_{i})    (1)

where lon_{i}, lat_{i}, cog_{i}, and sog_{i} represent the vessel’s longitude, latitude, course, and speed at time t_{i}; lon_{s} and lat_{s} represent the longitude and latitude of the start position; lon_{e} and lat_{e} represent the longitude and latitude of the end position; l represents the vessel’s length; b represents the vessel’s width; dm represents the vessel’s draft; st represents the vessel type; and t_{i} represents the time interval between the corresponding time and the time when the vessel enters the channel or leaves the berth.

As shown in Figure 2, whether for incoming or outgoing vessels, the vessel’s attributes, the channel location, and the berth location are known. For incoming vessels, the starting point is the channel location, and the endpoint is the berth. For outgoing vessels, the starting point is the berth, and the endpoint is the channel location. The problem we aim to solve is accurately predicting the vessel’s position, heading, and speed at times t_{0}, t_{1}, \dots, t_{n} within the channel based on the known information.

Taking an incoming vessel as an example, after entering the channel, the vessel will proceed towards the berth at a certain heading and speed until it reaches a certain distance before the berth. At this point, the vessel’s speed will decrease, and its heading may change until docking is completed. The position where the vessel decelerates and the rate of speed change will vary depending on the size and type of the vessel. The paper aims to solve the problem of obtaining a vessel behavior model within the channel using machine learning methods, and then predict the vessel’s behavior within the channel at different times based on known conditions.

The study focuses on the following key aspects:

Extraction of Waterway Data: Since the research is concentrated on vessel behavior within the channel, it is crucial to confine the range of AIS data to the boundaries of the channel;
Preprocessing of Historical AIS Data: This involves filtering out anomalous AIS data to ensure that the training process is not adversely affected;
Organization of Training Data: This step includes reformatting AIS data to create labeled training datasets suitable for analysis;
Learning Vessel Behavior Patterns: The study employs Decision Tree, Random Forest, and Gradient Boosting Regression methods to analyze and model vessel behavior patterns within the channel.

3.2. AIS Data Pre-Processing

The study primarily focuses on the behavioral patterns of vessels within the channel. Therefore, extracting data specific to the channel and port waters from the vast historical AIS trajectory data helps to effectively narrow down the scope of the problem. The channel and port areas are irregular polygons, so AIS trajectory points cannot simply be compared against the latitude and longitude of the channel boundaries to determine whether they lie within the channel. The Ray-casting algorithm [32] is employed to extract data within these waterway boundaries. The Ray-casting algorithm determines whether a given point lies inside or outside a polygon by counting the number of intersections between a ray starting from the point and the edges of the polygon. If the number of intersections is odd, the point is considered to be inside the polygon; if even, it is outside. As shown in Figure 3, the shaded area represents the port waters, which form an irregular polygon. Vessels a and b, located within the water area, emit rays in a fixed direction, and the rays intersect the port boundary 3 times and 1 time, respectively (both odd numbers). Vessels c and d, located outside the water area, emit rays in a fixed direction, and the rays intersect the port boundary 0 times and 2 times, respectively (both even numbers). Using the Ray-casting algorithm, we can obtain data within the channel boundaries and directly exclude data outside the water area from the study.
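The even-odd test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `point_in_polygon`, the L-shaped boundary, and the sample points are all hypothetical, and coordinates are treated as planar (longitude, latitude) pairs, which is assumed to be acceptable over the small extent of a fairway.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Return True if `point` lies inside `polygon` (even-odd rule).

    A horizontal ray is cast from the point towards increasing longitude and
    the polygon edges it crosses are counted; an odd count means "inside".
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the ray's latitude. The half-open comparison
        # avoids counting a vertex shared by two edges twice.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses the ray's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing flips the parity
    return inside


# Hypothetical L-shaped boundary (not the Tianjin Port fairway polygon).
fairway = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0), (2.0, 4.0), (0.0, 4.0)]
ais_points = [(1.0, 1.0), (3.0, 3.0), (1.0, 3.0), (5.0, 1.0)]
inside_points = [p for p in ais_points if point_in_polygon(p, fairway)]
```

Toggling a boolean on every crossing is equivalent to counting intersections and checking parity, and it avoids storing the count.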

Algorithm 1 describes how to determine whether an AIS trajectory point is within the fairway. AIS trajectory data often contains noise, such as erroneous ship information or anomalous positional and speed data. The Maritime Mobile Service Identity (MMSI) is used to identify various stations and groups of calling stations. Each vessel should have a unique MMSI, which can be used as an index in the study to determine whether AIS data pertains to the same vessel. However, some small vessels may have incorrectly configured MMSIs, so data preprocessing must handle these cases to prevent different vessels’ AIS data from being mistakenly assigned to the same vessel. Ship station identities are assigned as a 9-digit code in the format MIDXXXXXX, where each X can be any digit from 0 to 9. The first three digits are the Maritime Identification Digits (MID), which denote the authority responsible for the vessel’s radio station. As of 18 July 2023, 292 MIDs had been allocated to responsible authorities out of a possible 474, representing 61.60% of available combinations. A statistical analysis of the maritime identification digits in use is shown in Figure 4. Currently, no MIDs below 200 or above 800 are allocated. Therefore, as the first step in data processing, any data that does not comply with the MMSI coding rules based on the MID range is excluded.
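The MID-based screening step can be illustrated as follows. The sketch assumes only what the text states: a ship station MMSI has nine digits in the form MIDXXXXXX, and no MID lies below 200 or above 800. The helper `has_valid_mid`, the record layout, and the sample MMSIs are hypothetical.

```python
def has_valid_mid(mmsi, mid_min: int = 200, mid_max: int = 800) -> bool:
    """Check the 9-digit MIDXXXXXX format and the MID range given in the text."""
    digits = str(mmsi)
    if len(digits) != 9 or not digits.isdigit():
        return False
    mid = int(digits[:3])  # the first three digits are the MID
    return mid_min <= mid <= mid_max


# Hypothetical AIS records; only the first MMSI passes the screening.
records = [
    {"mmsi": 412345678, "sog": 8.4},   # MID 412, within range
    {"mmsi": 123456789, "sog": 7.9},   # MID 123, below 200
    {"mmsi": 99999, "sog": 3.1},       # too short to be a ship station MMSI
    {"mmsi": 812345678, "sog": 5.0},   # MID 812, above 800
]
clean = [r for r in records if has_valid_mid(r["mmsi"])]
```

A stricter variant could check the MID against the list of allocated values instead of a range, at the cost of maintaining that list.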

Algorithm 1: Ray-Casting Algorithm

The removal of other outliers primarily focuses on anomalies in position, speed, and draft information. Errors in position transmission can result in significant discrepancies between the positions of consecutive vessel trajectory points. By calculating the speed of the vessel based on the distance between adjacent points and the time interval, it is possible to identify abnormal position points. Such anomalies typically result in excessively high calculated speeds. The precision of the speed over ground (SOG) field in AIS data is 0.1 knots, with a representable range from 0 to 102.2 knots. When the SOG field value is 1022, it indicates that the vessel speed is 102.2 knots or higher; when the field value is 1023, it signifies that no speed information is available for that trajectory point. Currently, the maximum speed of ultra-large container vessels can reach 25–32 knots, bulk carriers typically operate at 12–17 knots, and general cargo ships usually maintain speeds of 15–17 knots. As shown in Figure 5, a speed distribution analysis of 2,884,799 trajectory points collected within one hour along the Chinese coast shows a high concentration of points in the 0–2 knot range, with almost no points exceeding 20 knots. Since this study focuses on vessel behavior within navigational channels, where speed limits are enforced by authorities to ensure safe navigation, a speed threshold of 20 knots is established based on the statistical analysis and port safety regulations. Thus, any data point whose speed field value, or whose speed calculated from the distance between consecutive trajectory points and the time interval, exceeds 20 knots is considered an anomaly and is removed.
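The two speed checks can be sketched as below. The 20-knot threshold and the SOG field encoding come from the text; everything else is an assumption made for illustration: the `Fix` record, the haversine distance, comparing each point with the last accepted point, treating the first point as trustworthy, and discarding points whose SOG is marked as unavailable.

```python
import math
from typing import List, NamedTuple, Optional

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles
MAX_SPEED_KN = 20.0         # threshold stated in the text


class Fix(NamedTuple):
    t: float      # Unix time in seconds
    lon: float    # degrees
    lat: float    # degrees
    sog_raw: int  # raw AIS SOG field, in steps of 0.1 knot


def decode_sog(sog_raw: int) -> Optional[float]:
    """Convert the raw SOG field to knots; 1023 means 'not available'."""
    return None if sog_raw == 1023 else sog_raw / 10.0


def haversine_nm(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes in nautical miles."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def drop_speed_outliers(track: List[Fix]) -> List[Fix]:
    """Keep fixes whose reported and position-derived speeds are <= 20 knots."""
    kept: List[Fix] = []
    for fix in track:
        sog = decode_sog(fix.sog_raw)
        if sog is None or sog > MAX_SPEED_KN:
            continue  # missing or implausible reported speed
        if kept:
            dt_h = (fix.t - kept[-1].t) / 3600.0
            if dt_h <= 0:
                continue  # duplicate or out-of-order timestamp
            if haversine_nm(kept[-1], fix) / dt_h > MAX_SPEED_KN:
                continue  # position jump implies an implausible speed
        kept.append(fix)
    return kept


# Hypothetical track with one position jump, one high SOG, and one missing SOG.
track = [
    Fix(0, 117.90, 38.9000, 90),
    Fix(60, 117.90, 38.9025, 90),
    Fix(120, 117.90, 39.1000, 90),    # jump of about 12 nautical miles in one minute
    Fix(180, 117.90, 38.9075, 250),   # reported 25.0 knots
    Fix(240, 117.90, 38.9100, 1023),  # SOG not available
    Fix(300, 117.90, 38.9125, 88),
]
cleaned = drop_speed_outliers(track)
```

Comparing against the last accepted fix, instead of the immediately preceding raw fix, prevents a single bad position from also invalidating the good point that follows it.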

The movement of vessels within port waters can generally be divided into three distinct phases: the incoming phase, where a vessel enters the channel and proceeds to a berth; the berthing phase, where the vessel is moored at the berth for cargo loading or unloading; and the outgoing phase, where the vessel departs from the berth and exits the channel. Based on this analysis, the speed of an individual vessel within the study area, when plotted over time, typically exhibits a pattern of high-to-low (incoming phase), a period of near-zero speed (berthing phase), and low-to-high (outgoing phase). Our research focuses primarily on the incoming and outgoing phases. Therefore, the vessel trajectory can be segmented based on speed variations to extract the trajectory data of individual voyages. To tolerate minor speed fluctuations, a threshold of 0.5 knots is selected to segment the trajectory.
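A minimal sketch of this speed-based segmentation is given below. It assumes that a point belongs to a voyage when its speed exceeds the 0.5-knot threshold and that a run of points at or below the threshold marks the berthing phase; the function `split_voyages` and the speed profile are hypothetical.

```python
from typing import List, Sequence

BERTH_SPEED_KN = 0.5  # segmentation threshold stated in the text


def split_voyages(sog_kn: Sequence[float], threshold: float = BERTH_SPEED_KN) -> List[range]:
    """Return index ranges of consecutive points with speed above `threshold`."""
    segments: List[range] = []
    start = None
    for i, sog in enumerate(sog_kn):
        moving = sog > threshold
        if moving and start is None:
            start = i                         # a new under-way segment begins
        elif not moving and start is not None:
            segments.append(range(start, i))  # the vessel has come to rest
            start = None
    if start is not None:
        segments.append(range(start, len(sog_kn)))  # track ends while under way
    return segments


# Hypothetical speed profile in knots: arrival, stay at berth, departure.
sog = [9.8, 8.1, 4.0, 1.2, 0.3, 0.0, 0.1, 0.0, 0.9, 3.5, 7.7, 10.2]
voyages = split_voyages(sog)
```

In this profile the first range covers the incoming phase and the second the outgoing phase; the points in between, all at or below 0.5 knots, form the berthing phase.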

A complete trajectory used for training is defined as follows: For incoming vessels, the trajectory starts at the channel boundary and ends at the berth. For outgoing vessels, the trajectory starts at the berth and ends at the channel boundary. Because the positions of the channel entrance, the channel exit, and the berth vary, the number of trajectory points in a single trajectory can vary significantly. Therefore, when organizing the training data, there is no restriction on the number of points in a single trajectory. However, the starting and ending positions of each trajectory are essential, because they serve as input features.

For a single trajectory, the vessel’s length, width, and type can be retrieved from the static data using the MMSI, while the draft information can be obtained from the voyage information database based on the time range of the trajectory. The last point of an entry port trajectory and the first point of an exit port trajectory can be used to determine the berth location. By iterating through each trajectory point, the latitude and longitude of each point, along with the time difference between the point and the starting time, can be obtained. With this data, labeled training data can be generated as in Equations (2) and (3). X and y form the labeled sample, and the solid lines in the matrix separate multiple complete trajectories. The meaning of each feature is described in detail in Section 3.1.

X = \left(\begin{array}{ccccccccc}
LON_{0S} & LAT_{0S} & LON_{0E} & LAT_{0E} & L_{0} & B_{0} & st_{0} & dm_{0} & t_{00} \\
LON_{0S} & LAT_{0S} & LON_{0E} & LAT_{0E} & L_{0} & B_{0} & st_{0} & dm_{0} & t_{01} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
LON_{0S} & LAT_{0S} & LON_{0E} & LAT_{0E} & L_{0} & B_{0} & st_{0} & dm_{0} & t_{0n} \\
\hline
LON_{1S} & LAT_{1S} & LON_{1E} & LAT_{1E} & L_{1} & B_{1} & st_{1} & dm_{1} & t_{10} \\
LON_{1S} & LAT_{1S} & LON_{1E} & LAT_{1E} & L_{1} & B_{1} & st_{1} & dm_{1} & t_{11} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
LON_{1S} & LAT_{1S} & LON_{1E} & LAT_{1E} & L_{1} & B_{1} & st_{1} & dm_{1} & t_{1n} \\
\hline
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\hline
LON_{mS} & LAT_{mS} & LON_{mE} & LAT_{mE} & L_{m} & B_{m} & st_{m} & dm_{m} & t_{m0} \\
LON_{mS} & LAT_{mS} & LON_{mE} & LAT_{mE} & L_{m} & B_{m} & st_{m} & dm_{m} & t_{m1} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
LON_{mS} & LAT_{mS} & LON_{mE} & LAT_{mE} & L_{m} & B_{m} & st_{m} & dm_{m} & t_{mn}
\end{array}\right)    (2)

y = \left(\begin{array}{cccc}
lon_{00} & lat_{00} & cog_{00} & sog_{00} \\
lon_{01} & lat_{01} & cog_{01} & sog_{01} \\
\vdots & \vdots & \vdots & \vdots \\
lon_{0n} & lat_{0n} & cog_{0n} & sog_{0n} \\
\hline
lon_{10} & lat_{10} & cog_{10} & sog_{10} \\
lon_{11} & lat_{11} & cog_{11} & sog_{11} \\
\vdots & \vdots & \vdots & \vdots \\
lon_{1n} & lat_{1n} & cog_{1n} & sog_{1n} \\
\hline
\vdots & \vdots & \vdots & \vdots \\
\hline
lon_{m0} & lat_{m0} & cog_{m0} & sog_{m0} \\
lon_{m1} & lat_{m1} & cog_{m1} & sog_{m1} \\
\vdots & \vdots & \vdots & \vdots \\
lon_{mn} & lat_{mn} & cog_{mn} & sog_{mn}
\end{array}\right)    (3)
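The row layout of Equations (2) and (3) can be produced by a short routine such as the following. It is a sketch under stated assumptions: each trajectory is a hypothetical dictionary holding the static attributes and a time-ordered list of (t, lon, lat, cog, sog) tuples, the vessel type is assumed to be already encoded as a number, and the start and end positions are taken from the first and last points.

```python
def build_samples(trajectories):
    """Stack per-point rows of X (Equation (2)) and y (Equation (3))."""
    X_rows, y_rows = [], []
    for traj in trajectories:
        pts = traj["points"]  # time-ordered (t, lon, lat, cog, sog) tuples
        t0, lon_s, lat_s = pts[0][0], pts[0][1], pts[0][2]
        lon_e, lat_e = pts[-1][1], pts[-1][2]
        for t, lon, lat, cog, sog in pts:
            # Static features repeat on every row; only the elapsed time changes.
            X_rows.append([lon_s, lat_s, lon_e, lat_e,
                           traj["length"], traj["width"],
                           traj["ship_type"], traj["draft"], t - t0])
            y_rows.append([lon, lat, cog, sog])
    return X_rows, y_rows


# Two hypothetical trajectories with different numbers of points.
trajectories = [
    {"length": 180.0, "width": 28.0, "ship_type": 70, "draft": 9.5,
     "points": [(1000, 117.95, 38.95, 280.0, 9.0),
                (1060, 117.93, 38.96, 281.0, 8.5),
                (1120, 117.90, 38.97, 285.0, 3.0)]},
    {"length": 90.0, "width": 15.0, "ship_type": 80, "draft": 5.0,
     "points": [(5000, 117.80, 38.98, 100.0, 2.0),
                (5030, 117.82, 38.97, 102.0, 6.0)]},
]
X, y = build_samples(trajectories)
```

Each trajectory contributes as many rows as it has points, which is why the text places no restriction on trajectory length.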

3.3. Model Training

Random Forest (RF) is an extended variant of the Bagging method. RF builds on the ensemble learning approach of Bagging, which is based on Decision Trees, by introducing random attribute selection during the training process of the Decision Trees. Specifically, while a traditional Decision Tree selects the optimal attribute from the entire set of attributes (assuming there are d attributes) at each node, RF takes a different approach. For each node in the base Decision Tree, RF randomly selects a subset of k attributes from the full set, and then chooses the best attribute from this subset for splitting. The parameter k controls the degree of randomness: when k = d, the base Decision Tree is constructed just like a traditional Decision Tree; when k = 1, a single attribute is randomly selected for splitting. RF leverages the simplicity of Decision Trees and the power of ensemble learning to create a more accurate model. Before learning, the method uses bootstrapping to randomly select samples, allowing for repetition in the selection process. Data that is not selected is referred to as the out-of-bag (OOB) dataset, which can be used to evaluate the model’s accuracy. Additionally, the randomness introduced in attribute selection when constructing Decision Trees enhances the robustness of the model.
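The two sources of randomness described above, bootstrap sampling and random attribute selection, can be illustrated with the standard library alone. This sketch does not train any tree; the helper names are hypothetical, d = 9 matches the number of input features in Equation (2), and k = 3 is an arbitrary choice for the example.

```python
import random
from typing import List, Tuple


def bootstrap_indices(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_samples indices with replacement; return (in-bag, out-of-bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [i for i in range(n_samples) if i not in chosen]  # never drawn
    return in_bag, oob


def candidate_features(d: int, k: int, rng: random.Random) -> List[int]:
    """Pick the k attribute indices considered at one node (k = d: plain tree)."""
    return sorted(rng.sample(range(d), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, oob = bootstrap_indices(n_samples=10, rng=rng)
features = candidate_features(d=9, k=3, rng=rng)
```

For large samples, the probability that a given point is never drawn is (1 - 1/n)^n, which approaches 1/e, so roughly 36.8% of the data ends up out of bag for each tree.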

Decision Tree (DT) is a classification and regression method that uses a tree-like structure to make decisions. It recursively splits the data into two parts at each node, creating a flowchart-like structure where each internal node represents a feature, each branch represents a feature’s value, and each leaf node represents the predicted outcome for the target variable. Gradient Boosting Regression (GBR) is an ensemble learning method that iteratively trains new models to correct the prediction errors of previous models, thereby gradually improving the overall predictive accuracy of the model.

When training the Decision Tree model, we adjusted the max_depth, min_samples_split, and min_samples_leaf parameters to effectively control the model’s complexity and prevent overfitting, thereby enhancing the model’s generalization ability on unseen data. Additionally, we employed cross-validation to evaluate various parameter configurations, ensuring that the model maintains stable performance across different subsets. In training the Random Forest model, in addition to adjusting the basic parameters of the Decision Tree, we also optimized the n_estimators parameter, which determines the number of trees in the model. Increasing the number of trees helps improve the model’s stability and accuracy. We also used cross-validation techniques to find the optimal parameter combination, ensuring that the Random Forest model performs well under various data conditions. For the Gradient Boosting Regression model, we adjusted the learning_rate, n_estimators, min_samples_split, min_samples_leaf, and max_depth parameters. The learning_rate parameter controls the influence of each tree on the final prediction, with a lower learning rate potentially requiring more trees to fine-tune the model. By gradually experimenting with different numbers of trees and learning rates, we aimed to find the optimal model configuration to accurately predict vessel behavior within the channel. The framework of the ship behavior model in fairways is shown in Figure 6.

4. Results Analysis and Discussion

4.1. Experimental Data

Tianjin Port (shown in Figure 7), the largest comprehensive port in northern China, experiences heavy vessel traffic and complex navigation conditions. Vessels primarily access the Tianjin New Port area through several channels: the Tianjin Port main channel, the Zha Dong channel, the 300,000-ton channel, and the small vessel channel. Along these channels, there are seven recommended entry and exit positions for vessels. Depending on the vessel’s draft, different entry and exit positions are chosen. The general principle is to minimize the vessel’s occupation of the channel, with deeper-draft vessels using positions farther from the port and shallower-draft vessels using positions closer to the port. For the analysis, historical AIS data from the Tianjin Port channel area was selected. The data covers vessel movements from 1 to 5 January 2021, during which approximately 100 vessel movements occurred daily. The initial dataset used for training the navigational traffic behavior model included 222,623 trajectory points from 333 vessels. After data preprocessing, the final dataset consisted of 190,999 AIS trajectory points from 220 vessels. Given that the data collection period spanned 5 days, the same vessel may have multiple trajectories, including both inbound and outbound routes. In total, the training dataset includes 359 trajectories, with 205 being inbound and 154 being outbound. By ship type, there are 88 container ships, 93 cargo ships, 24 oil tankers, and 15 other ship types.

The distribution of data regarding ship length, width, draft, speed, length-to-width ratio, and width-to-draft ratio is shown in Figure 8. Ship lengths range between 40 and 260 m, with the highest frequency observed around 170 m. Ship widths are primarily between 12 and 34 m, with the most common width being approximately 33 m. Drafts range from 2 to 15 m, with the most frequent value around 5 m. The SOG displays multiple peaks, particularly between 7.5 and 12.5 knots. The length-to-breadth ratio (L/B) predominantly falls between 5 and 8, peaking around 6 and 7. The breadth-to-draft ratio (B/D) shows a broad distribution, with a notable peak around 4, indicating that most vessels have a width roughly 4 times their draft.

The AIS data were divided into two sets, with 80% used as the training dataset and 20% as the test dataset.
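The 80/20 division can be sketched as a shuffled split over row indices. The text does not state whether the split was made per AIS point or per trajectory; the sketch below assumes a per-point split, and the function name and seed are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_rows(rows: Sequence[T], test_fraction: float = 0.2,
                          seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle row indices once, then hold out the last test_fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    cut = len(indices) - n_test
    train_idx, test_idx = indices[:cut], indices[cut:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# 190,999 is the number of preprocessed AIS points reported above.
train, test = train_test_split_rows(list(range(190_999)))
print(len(train), len(test))
```

A per-trajectory split would instead shuffle trajectory identifiers and assign all points of a trajectory to the same side, which avoids having points of one voyage in both sets.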

4.2. Performance Metrics

In this experiment, the performance of the models was evaluated using the Mean Squared Error (MSE) and the Adjusted R-squared (Adjusted $R^{2}$). Unlike the regular $R^{2}$, the Adjusted $R^{2}$ takes into account the number of independent variables, penalizing the addition of irrelevant variables. This means that the inclusion of irrelevant features does not increase the Adjusted $R^{2}$, while it may still increase the $R^{2}$. The formulas for calculating MSE and Adjusted $R^{2}$ are provided below. MSE measures the magnitude of the prediction error, while Adjusted $R^{2}$ assesses the model’s explanatory power. Together, these metrics provide a comprehensive evaluation of the model’s predictive performance and goodness-of-fit.

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}

(4)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(5)

\mathrm{Adjusted}\ R^{2} = 1 - \frac{(1 - R^{2}) (n - 1)}{n - p - 1}

(6)

where $n$ is the number of samples, $y_{i}$ is the actual value of the $i$-th sample, $\hat{y}_{i}$ is the predicted value of the $i$-th sample, $\bar{y}$ is the mean of the actual values, and $p$ is the number of independent variables.
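As a worked check of Equations (4) to (6), the three metrics can be computed directly from their definitions. The sketch below uses only the Python standard library; the four sample values are made up for illustration and are not taken from the experiments.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, Equation (4)."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, Equation (5)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true: Sequence[float], y_pred: Sequence[float], p: int) -> float:
    """Adjusted R^2, Equation (6); p is the number of independent variables."""
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - p - 1)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(mse(y_true, y_pred), 4),
      round(r2(y_true, y_pred), 4),
      round(adjusted_r2(y_true, y_pred, p=1), 4))  # 0.025 0.98 0.97
```

Because the correction factor is $(n - 1)/(n - p - 1)$, the gap between $R^{2}$ and Adjusted $R^{2}$ shrinks as $n$ grows relative to $p$; with nine features and well over 100,000 samples the two values are nearly identical.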

4.3. Model Parameter Settings

The study utilized Decision Tree (DT), Gradient Boosting Regression (GBR), and Random Forest (RF) models to learn the behavior of vessels within the channel. To enhance the accuracy and stability of the models, hyperparameter tuning was performed before finalizing the models. For the Decision Tree, the hyperparameters max_depth, min_samples_split, and min_samples_leaf were optimized. The max_depth parameter controls the maximum depth of the tree; a greater depth allows the model to capture more complex patterns but also increases the risk of overfitting. The min_samples_split parameter determines the minimum number of samples required to split an internal node, and min_samples_leaf controls the minimum number of samples required at a leaf node. For Gradient Boosting Regression, the additional parameters n_estimators and learning_rate were tuned. The n_estimators parameter determines the number of decision trees, with higher values leading to a more complex model, and learning_rate sets the contribution of each tree to the final model. For Random Forest, tuning covered n_estimators in addition to the Decision Tree parameters. Hyperparameter tuning was performed on the training dataset using grid search combined with cross-validation. The range of hyperparameters and the final selected values are listed in Table 1.
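The procedure of grid search combined with cross-validation can be sketched generically. The code below enumerates every parameter combination, scores each one on k validation folds, and keeps the combination with the lowest mean score. The grid values and the stand-in scoring function are hypothetical; a real run would fit a regressor on each training fold and return its validation MSE, and the actual search ranges are those of Table 1.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def grid_search(grid: Dict[str, Sequence], n: int, k: int,
                score: Callable[[Dict, List[int], List[int]], float]):
    """Return the parameter combination with the lowest mean validation score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = [score(params, tr, va) for tr, va in k_fold_indices(n, k)]
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical grid; the scoring function is a stand-in, not a trained model.
toy_grid = {"max_depth": [5, 10, 20], "min_samples_leaf": [1, 2, 4]}


def toy_score(params: Dict, train_idx: List[int], val_idx: List[int]) -> float:
    return abs(params["max_depth"] - 10) + params["min_samples_leaf"]


print(grid_search(toy_grid, n=100, k=5, score=toy_score))
```

The cost grows with the product of the grid sizes times the number of folds, which is why the tree-count and depth ranges are usually kept coarse.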

4.4. Learning Curves

This section compares the performance of three machine learning regression models on the dataset: Decision Tree Regressor, Gradient Boosting Regressor, and Random Forest Regressor. The learning curves for each model on both the training set and the test set are shown in Figure 9.

As illustrated in Figure 9, when the number of training samples increases from 20,000 to 120,000, the Decision Tree Regressor achieves a training score of 0.99 on the training set, indicating that the model fits the training data very well. However, the test set score increases only slightly from 0.925 to 0.95, suggesting that while the Decision Tree Regressor can effectively fit the training data, its performance on unseen data is slightly less robust. This highlights the need for further model optimization or the adoption of more complex models to improve generalization performance. The Gradient Boosting Regressor shows a training score increase from 0.88 to 0.89 on the training set, while the test set score slightly decreases from 0.875 to 0.87. The minimal impact of increased training samples on model performance suggests that the Gradient Boosting Regressor has already achieved a good fit for the current dataset. For the Random Forest Regressor, the training score is 0.995, and the test set score increases significantly from 0.825 to 0.975 as the number of training samples grows. This indicates that the model fits the training set well and demonstrates a strong improvement in performance on unseen data, showing excellent generalization capability. Overall, the Random Forest Regressor outperforms both the Gradient Boosting Regressor and the Decision Tree Regressor on both the training and test sets, reflecting its superior performance and stable generalization ability. The Random Forest model, by aggregating the results of multiple decision trees, is better able to capture the complex patterns and variations within the data, making it particularly effective on large-scale datasets.

4.5. Feature Importance Comparison

Feature importance plays a critical role in machine learning, aiding in understanding the decision-making process of models, enhancing model performance, and enabling feature selection. In this section, we compare the feature importance across Decision Tree, Gradient Boosting Regressor, and Random Forest models to evaluate the contribution of each feature to the models. The training data includes nine features: initial position longitude, initial position latitude, berth longitude, berth latitude, ship length, ship width, ship draft, ship type, and time. Figure 10 illustrates the feature importance for the Decision Tree model, Gradient Boosting Regressor, and Random Forest model.

As shown in Figure 10, the trends in feature importance across the three models are largely consistent. Initial position, berth position, ship length, and time are highly important in all models, while ship width, draft, and ship type have relatively lower importance. This aligns well with practical maritime operations, as a vessel’s position at a given time is closely related to the berth and navigational channel entry/exit points. According to maritime regulations, different sized vessels have varying speed limits within the channel, which affects their speed differently. Additionally, vessels of different sizes tend to have different speeds when approaching or leaving the berth.

Figure 11 and Figure 12 compare the performance in terms of Mean Squared Error (MSE) and adjusted $R^{2}$ under three scenarios: retaining all features; removing vessel width, draft, and ship type; and removing vessel length, width, draft, and ship type.

Figure 11 compares the MSE of the DT, GBR, and RF models on both the training and testing datasets, with all features included and with specific features (B, DM, SHIPTYPE) excluded. For the four prediction targets (Longitude (LON), Latitude (LAT), SOG, and COG), the results show that the Random Forest regressor achieves the lowest MSE in all scenarios, particularly excelling on the testing set. This indicates that the Random Forest regressor delivers optimal performance across these four prediction tasks and possesses strong generalization capabilities. The Random Forest regressor performs excellently across different feature sets, demonstrating lower MSE when B, DM, and SHIPTYPE are excluded. However, removing the L feature has a mixed impact on model performance. In some cases, such as with SOG, removing the L feature reduces the MSE on both the training and test sets, thus improving the model’s generalization ability. In other cases, such as with LON, LAT, and COG, removing the L feature increases the test set error. This is particularly evident with COG in the Random Forest model, where the MSE increases significantly, indicating that the L feature has a substantial impact on predicting COG. Overall, feature selection affects the predictive capabilities of all models, but the Random Forest regressor adapts better to these changes.
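The three feature scenarios amount to dropping columns from the feature matrix before retraining. A minimal sketch follows; the column names and their order are assumptions modeled on the nine features listed earlier, and the single sample row is made up.

```python
from typing import Dict, List, Sequence

# Assumed column order of the nine input features (illustrative names).
FEATURES = ["LON_S", "LAT_S", "LON_E", "LAT_E", "L", "B", "SHIPTYPE", "DM", "T"]

# The three scenarios compared in Figures 11 and 12.
SCENARIOS: Dict[str, List[str]] = {
    "All Features": [],
    "Except B, DM, SHIPTYPE": ["B", "DM", "SHIPTYPE"],
    "Except B, L, DM, SHIPTYPE": ["B", "L", "DM", "SHIPTYPE"],
}


def drop_features(X: Sequence[Sequence[float]], dropped: Sequence[str]) -> List[List[float]]:
    """Return a copy of X without the named feature columns."""
    keep = [i for i, name in enumerate(FEATURES) if name not in dropped]
    return [[row[i] for i in keep] for row in X]


X_demo = [[117.9, 38.9, 117.7, 39.0, 199.0, 32.0, 70.0, 13.1, 0.0]]
for label, dropped in SCENARIOS.items():
    print(label, len(drop_features(X_demo, dropped)[0]))
```

Each reduced matrix would then be passed through the same training and evaluation steps, with $p$ in the adjusted $R^{2}$ set to the number of remaining columns.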

Figure 12 shows the adjusted $R^{2}$ values of the Decision Tree Regressor, Gradient Boosting Regressor, and Random Forest Regressor on the four prediction tasks (LON, LAT, SOG, COG), comparing their performance on the training and testing datasets. The figure compares three feature sets: including all features (All Features), excluding specific features (Except B, DM, SHIPTYPE), and excluding a broader set of features (Except B, L, DM, SHIPTYPE). In the Decision Tree model, the adjusted $R^{2}$ values show almost no difference across the three feature sets, indicating that these features have minimal impact on the model’s performance. In the Gradient Boosting Regression model, the adjusted $R^{2}$ on the test set slightly decreases after removing B, DM, and SHIPTYPE features, but it improves when L is also removed, still showing good fit quality. For the Random Forest model, the adjusted $R^{2}$ increases after removing some features, suggesting that excluding these features might help improve the model’s performance. Overall, the performance of all models remains relatively stable across different feature sets, with the exclusion of specific features having a minor impact on the models’ fit quality. The Random Forest Regressor, in particular, demonstrates better fitting performance after certain features are excluded.

4.6. Prediction Error Analysis

In this study, we applied Decision Tree, Random Forest, and Gradient Boosting Regressor models to predict the target variables. To evaluate the performance of these models, we analyzed the distribution of prediction errors for each model. The prediction error plots reveal the relationship between actual values and predicted values, providing a visual representation of the accuracy and fit of the models on the test set, as shown in Figure 13.

The prediction error plots demonstrate that the Random Forest model exhibits a strong linear relationship when handling the data, providing accurate predictions for longitude and latitude. Although some deviations are observed in the predictions for Speed Over Ground (SOG) and Course Over Ground (COG), the overall prediction accuracy of the Random Forest model surpasses that of the other models. Overall, the Random Forest model displays the best predictive performance. The Decision Tree model’s prediction results, as seen in the scatter plots, exhibit more dispersed points, indicating that its predictions are less accurate than expected. Compared to the Random Forest and Gradient Boosting Regressor models, the Decision Tree model performs poorly when handling this data. The Gradient Boosting Regressor model shows exceptionally high accuracy in predicting target variables LON and LAT, with predicted values closely aligning with actual values, indicating excellent predictive performance. In predicting SOG, although there are some points that deviate from the ideal red prediction line, the overall error remains small, demonstrating the model’s stability and accuracy in this dimension. However, in predicting COG, the points in the scatter plot are more dispersed, indicating a certain degree of prediction error. The Mean Squared Error (MSE) and Adjusted $R^{2}$ are used to quantify the prediction errors of different models, as shown in Table 2. Despite the favorable performance of both the Decision Tree and Random Forest models on the learning curves, the Decision Tree model suffers from overfitting and single-model limitations, resulting in poorer performance when predicting new data. In contrast, the Random Forest model, by integrating multiple decision trees, effectively reduces the risk of overfitting, enhancing the model’s stability and prediction accuracy.

4.7. Complete Trajectory Prediction

In this study, we compared the performance of the Decision Tree, Gradient Boosting Regressor, and Random Forest models in predicting the outcomes for two ships of different sizes, drafts, and types, particularly during maneuvers involving various turning angles. One ship has the following specifications: a length of 199 m, a beam of 32 m, a draft of 13.1 m, and is a bulk carrier. The other ship has a length of 115 m, a beam of 20 m, a draft of 9.3 m, and is a general cargo vessel.

Figure 14 and Table 3 compare the predictions for the bulk carrier in terms of position, Course Over Ground (COG), and Speed Over Ground (SOG). Panel (a) compares the predicted ship positions across different models, where the Random Forest model’s predicted trajectory aligns most closely with the actual trajectory, indicating higher prediction accuracy and stability. The Gradient Boosting Regressor shows some deviation from the actual position, while the Decision Tree model exhibits larger discrepancies from the actual trajectory. Panel (b) compares the predicted COG across different models, with both the Random Forest and Gradient Boosting Regressor models closely matching the actual COG, demonstrating strong predictive performance; however, the Decision Tree model shows a significant gap compared to the actual COG, indicating poorer performance. Panel (c) compares the SOG predictions across different models, where the Random Forest model’s prediction curve closely aligns with the actual values at most data points, demonstrating very high prediction accuracy. In contrast, the Decision Tree model shows considerable deviation, and the Gradient Boosting Regressor provides moderate prediction accuracy.

Figure 15 and Table 4 compare the predictions for the general cargo vessel in terms of position, COG, and SOG. Panel (a) contrasts the actual ship trajectory with the predicted positions from different models, where the Random Forest model shows very close alignment with the actual trajectory, indicating high prediction accuracy. The Gradient Boosting Regressor also performs well, closely aligning with the actual trajectory. The Decision Tree model, however, shows significant deviations from the actual trajectory in some regions, indicating poorer performance. Panel (b) compares COG predictions across different models, where the Random Forest model closely matches the actual COG, showing excellent predictive performance. The Gradient Boosting Regressor’s predictions are also close to the actual values at most data points, while the Decision Tree model’s predictions deviate more significantly. Panel (c) compares SOG predictions across different models, with the Random Forest model’s prediction curve aligning very closely with the actual values at most data points, indicating very high prediction accuracy. In contrast, the Decision Tree model shows significant deviation, and the Gradient Boosting Regressor provides moderate prediction accuracy.

The comparison across these three aspects reveals that the Random Forest Regressor consistently outperforms in predicting ship trajectories, COG, and SOG, with prediction results that fit the actual values most accurately. The Gradient Boosting Regressor and Decision Tree models show relatively lower prediction accuracy, with the Gradient Boosting Regressor particularly exhibiting larger deviations in COG and SOG predictions. This suggests that the Random Forest model possesses superior stability and generalization ability, making it well-suited for precise ship behavior prediction.

In conclusion, the Random Forest model demonstrates outstanding performance in predicting the trajectories, COG, and SOG for ships of various sizes, drafts, and types, showing high prediction accuracy and stability. Therefore, in practical applications, the Random Forest model is recommended for use to obtain more accurate and reliable prediction results. These findings provide robust technical support for maritime traffic planning and management, contributing to enhanced safety and efficiency in maritime traffic.

5. Conclusions

Understanding vessel behavior within navigational channels is of paramount importance. These channels are critical passageways for maritime navigation, and comprehending the behavior of vessels of various types and sizes within these channels is essential for enhancing navigational safety, optimizing channel utilization, and reducing environmental impact. To address these needs, this study employs machine learning techniques to develop regression models that predict vessel behavior within navigational channels based on historical AIS data. The models accurately predict vessel position, heading, and speed at specific time intervals within the channels. Vessel behavior in navigational channels is influenced by factors such as vessel size, type, draft, and proximity to berths. Using data from the navigational channels of Tianjin Port in China, this study applies Decision Trees, Gradient Boosting Regression, and Random Forest methods to model vessel behavior within these channels. The key findings are as follows:

The Random Forest method outperforms both Gradient Boosting Regression and Decision Tree models in predicting vessel behavior within navigational channels. After hyperparameter tuning, the model achieves optimal performance with n_estimators set to 300, max_depth to 35, min_samples_split to 2, and min_samples_leaf to 1.
Through feature importance analysis, it is determined that vessel width, draft, and vessel type have low importance in the model. A comparison between models with and without these features shows minimal changes in mean squared error (MSE) and adjusted $R^{2}$ , indicating that these features can be omitted to reduce model complexity and training costs without significantly affecting performance.
On the test set, the Random Forest model consistently shows the lowest MSE for longitude, latitude, heading, and speed, with values of 2293.859, 6790.852, 6.071, and 33.242, respectively. Relative to the Gradient Boosting Regression model, these MSE values are lower by 65.1%, 50.6%, 60.5%, and 54.4%, respectively; relative to the Decision Tree model, they are lower by 69.1%, 40.1%, 43.5%, and 64.2%, respectively.
The study validates the models using complete trajectories of two vessels of different sizes, types, and berthing locations, along with predictions of speed and heading at each trajectory point. The Random Forest model demonstrates superior performance in accurately predicting vessel behavior within the navigational channel.
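The percentage reductions quoted in the findings above follow the usual relative-reduction formula, (baseline MSE minus model MSE) divided by baseline MSE. A minimal sketch is given below; the function name is illustrative and the numbers in the usage line are hypothetical, not values from Table 2.

```python
def relative_reduction(baseline_mse: float, model_mse: float) -> float:
    """Percentage by which model_mse is lower than baseline_mse."""
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return 100.0 * (baseline_mse - model_mse) / baseline_mse


# Hypothetical values: a baseline MSE of 100 reduced to 40 is a 60% reduction.
print(round(relative_reduction(100.0, 40.0), 1))  # 60.0
```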

The study proposes a Random Forest regression method for ship behavior research within fairways, which effectively predicts ship position, speed, and heading in the fairway. This model supports risk management and vessel scheduling within the fairway, particularly in predicting and calibrating speed and course changes for different types and sizes of vessels during berth approach and departure, as well as in estimating the time required for port entry and exit. By predicting vessel behavior in fairways, the model can support VTS in managing maritime traffic and can be used for ship collision avoidance.

This study’s dataset is primarily based on vessel traffic data from Tianjin Port. In the future, the dataset’s scope can be expanded to include data from different types of ports, different seasons, and varying weather conditions, to enhance the model’s generalization capability. Additionally, integrating multi-source heterogeneous data, such as historical vessel navigation data, meteorological data, and marine environmental data, could improve the comprehensiveness and accuracy of predictions. Furthermore, applying the model to real-time prediction systems could enable dynamic monitoring and adjustment of vessel navigation processes, further improving port management efficiency and navigation safety. The development of intelligent vessel navigation systems, combining real-time data and predictive models, could provide vessels with real-time navigation recommendations and warning information.

Author Contributions

Conceptualization, L.M. and G.S.; methodology, L.M. and Z.G.; validation, L.M. and Z.G.; data curation, L.M.; writing—original draft preparation, L.M. and Z.G.; funding acquisition, L.M. and G.S.; supervision, G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China Environment and Zoology Protection for Offshore Oil and Ocean Foundation (no. CF-MEEC-TR-2023-8) and the Fundamental Research Funds for the Central Universities (no. 3132024134).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We are especially grateful for the financial and data support provided by the Navigation Safety and Guarantee Institute.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AIS	Automatic Identification System
COG	Course over ground
SOG	Speed over ground
SOLAS	International Convention for Safety of Life at Sea

Figure 1. VTS workflow diagram.

Figure 2. Diagram of ship behavior in fairways.

Figure 3. Schematic diagram of the ray-casting algorithm. a, b, c, and d represent four ships at different positions, and the numbers indicate how many times the rays emanating from each ship intersect the channel boundaries.

Figure 4. MID Distribution.

Figure 5. Ship speed Distribution.

Figure 6. Ship behavior model in fairway framework.

Figure 7. Research area (Tianjin Port).

Figure 8. Distribution of Training Data: (a) Length Distribution, (b) Width Distribution, (c) Draft Distribution, (d) Speed Distribution, (e) Length-to-Width Ratio, (f) Width-to-Draft Ratio.

Figure 9. Learning Curve Diagrams: (a) Decision Tree Learning Curve; (b) Gradient Boosting Regressor Learning Curve; (c) Random Forest Learning Curve.

Figure 10. Feature Importance Diagrams: (a) Decision Tree Feature Importance; (b) Gradient Boosting Regressor Feature Importance; (c) Random Forest Feature Importance.

Figure 11. MSE Comparison Diagrams: (a) Longitude Mean Squared Error; (b) Latitude Mean Squared Error; (c) SOG Mean Squared Error; (d) COG Mean Squared Error.

Figure 12. Adjusted $R^{2}$ Comparison Diagrams: (a) Longitude Adjusted $R^{2}$; (b) Latitude Adjusted $R^{2}$; (c) SOG Adjusted $R^{2}$; (d) COG Adjusted $R^{2}$.

Figure 13. Prediction Error Scatter Plots: (a) Longitude Prediction Error; (b) Latitude Prediction Error; (c) SOG Prediction Error; (d) COG Prediction Error.

Figure 14. Predictions for the Bulk Carrier (Length 199 m, Beam 32 m, Draft 13.1 m): (a) Position Prediction; (b) COG Prediction; (c) SOG Prediction.

Figure 15. Predictions for the General Cargo Vessel (Length 115 m, Beam 20 m, Draft 9.3 m): (a) Position Prediction; (b) COG Prediction; (c) SOG Prediction.

Table 1. Model Parameters.

Model	Parameter Name	Parameter Range	Best Parameter
Decision Tree	max_depth	1, 5, 10, 20, 50, 100	20
	min_samples_split	2, 3, 5, 10, 20, 30	2
	min_samples_leaf	1, 2, 4, 10, 20, 30	1
Gradient Boosting Regression	learning_rate	0.01, 0.1, 0.2, 0.3, 0.5, 1.0	0.1
	n_estimators	10, 25, 50, 100, 200, 300	300
	max_depth	1, 3, 5, 9, 10, 30	9
	min_samples_split	2, 5, 7, 9, 15, 20	7
	min_samples_leaf	1, 2, 5, 8, 10, 15	2
Random Forest	n_estimators	10, 20, 50, 100, 300, 500	300
	max_depth	1, 5, 10, 35, 50, 100	35
	min_samples_split	2, 3, 5, 10, 20, 30	2
	min_samples_leaf	1, 3, 5, 10, 15, 20	1
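
To make the size of the search space in Table 1 concrete, the candidate values can be written as one dictionary per model and enumerated exhaustively. The following is a minimal sketch, not the authors' tuning code: the names `PARAM_GRIDS`, `BEST_PARAMS`, `grid_size`, `iter_candidates`, and `best_in_grid` are hypothetical, and an exhaustive grid search is assumed only because the table lists discrete candidate values.

```python
from itertools import product
from math import prod

# Candidate values transcribed from Table 1 (one dictionary per model).
PARAM_GRIDS = {
    "Decision Tree": {
        "max_depth": [1, 5, 10, 20, 50, 100],
        "min_samples_split": [2, 3, 5, 10, 20, 30],
        "min_samples_leaf": [1, 2, 4, 10, 20, 30],
    },
    "Gradient Boosting Regression": {
        "learning_rate": [0.01, 0.1, 0.2, 0.3, 0.5, 1.0],
        "n_estimators": [10, 25, 50, 100, 200, 300],
        "max_depth": [1, 3, 5, 9, 10, 30],
        "min_samples_split": [2, 5, 7, 9, 15, 20],
        "min_samples_leaf": [1, 2, 5, 8, 10, 15],
    },
    "Random Forest": {
        "n_estimators": [10, 20, 50, 100, 300, 500],
        "max_depth": [1, 5, 10, 35, 50, 100],
        "min_samples_split": [2, 3, 5, 10, 20, 30],
        "min_samples_leaf": [1, 3, 5, 10, 15, 20],
    },
}

# "Best Parameter" column of Table 1.
BEST_PARAMS = {
    "Decision Tree": {
        "max_depth": 20, "min_samples_split": 2, "min_samples_leaf": 1,
    },
    "Gradient Boosting Regression": {
        "learning_rate": 0.1, "n_estimators": 300, "max_depth": 9,
        "min_samples_split": 7, "min_samples_leaf": 2,
    },
    "Random Forest": {
        "n_estimators": 300, "max_depth": 35,
        "min_samples_split": 2, "min_samples_leaf": 1,
    },
}


def grid_size(grid):
    """Number of parameter combinations in an exhaustive search."""
    return prod(len(values) for values in grid.values())


def iter_candidates(grid):
    """Yield every combination of the grid as a {name: value} dictionary."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


def best_in_grid(grid, best):
    """Check that each reported best value is one of the listed candidates."""
    return all(best[name] in values for name, values in grid.items())


if __name__ == "__main__":
    for model, grid in PARAM_GRIDS.items():
        print(model, grid_size(grid), best_in_grid(grid, BEST_PARAMS[model]))
```

Dictionaries of this shape match the `param_grid` argument accepted by scikit-learn's `GridSearchCV`, whose estimator parameter names coincide with those in Table 1. Whether the authors used that class is not stated here.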

Table 2. Prediction Error Summary.

Model	Parameter	MSE	Adjusted $R^{2}$
Random Forest	LON	2293.8592	0.9999
	LAT	6790.8516	0.9999
	SOG	6.0709	0.9957
	COG	33.2415	0.9727
Gradient Boosting Regression	LON	18,823.2767	0.9998
	LAT	27,791.3605	0.9996
	SOG	38.9861	0.9923
	COG	160.2386	0.9812
Decision Tree	LON	23,957.7198	0.9979
	LAT	18,916.4763	0.9978
	SOG	19.0222	0.9871
	COG	259.5119	0.9359
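
For readers who want to reproduce the two metrics reported in Table 2, the sketch below computes MSE and adjusted $R^{2}$ using their standard textbook definitions. It assumes adjusted $R^{2} = 1 - (1 - R^{2})(n - 1)/(n - p - 1)$, with $n$ samples and $p$ input features. The function names and the toy numbers are illustrative only; the value of $p$ used by the authors is not given in this section.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """Penalise R^2 for the number of input features (standard definition)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


if __name__ == "__main__":
    # Toy values, not data from the paper.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    r2 = r2_score(observed, predicted)
    print(mean_squared_error(observed, predicted))
    print(r2)
    # n_features=2 is an arbitrary placeholder for the number of model inputs.
    print(adjusted_r2(r2, n_samples=len(observed), n_features=2))
```

Because adjusted $R^{2}$ is normalised by the variance of each target while MSE is not, MSE values for differently scaled targets, such as position and speed, are not directly comparable.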

Table 3. Prediction Error Summary (MSE) for the Bulk Carrier.

Model	Position	COG	SOG
Random Forest	743.8657	0.5998	0.0199
Gradient Boosting Regression	8899.3125	15.169	0.0178
Decision Tree	193,224.89	5405.0668	1.5231

Table 4. Prediction Error Summary (MSE) for the General Cargo Vessel.

Model	Position	COG	SOG
Random Forest	422.5002	1.2202	0.0023
Gradient Boosting Regression	8571.1916	132.4752	0.0401
Decision Tree	1,010,341.8	11,931.0257	3.2384

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
