4.1. Datasets and Model Parameters
This study focuses on the main navigational span of the Jintang Bridge and its adjacent waters as the designated research area. The Jintang Bridge is located in the coastal waters of Zhejiang Province, China. Spanning approximately 26 km, it connects Jintang Island with the mainland, serving as a major transportation corridor for both land and maritime traffic. The main navigational span, situated above one of the busiest shipping lanes in the region, accommodates frequent two-way passage of various vessel types. Due to the convergence of high vessel density, complex hydrological conditions, and limited navigational space beneath the bridge, the area presents significant challenges for maritime traffic management and safety monitoring. These characteristics make it an ideal and representative case study for evaluating the effectiveness of trajectory anomaly detection methods in high-risk, high-traffic maritime environments.
After data preprocessing, a total of 1400 northbound vessel trajectories were collected, as illustrated in Figure 3. To ensure that the anomaly detection model accurately learns normal vessel navigation patterns, the dataset was meticulously refined through manual screening combined with expert knowledge from maritime authorities. As a result, a training set consisting of 850 normal trajectories was constructed, while the remaining trajectories were used as a test set for anomaly detection.
The input data to the model is structured as a tensor with the shape (N, T, F), where N represents the number of trajectories, T = 95 denotes the number of time steps (i.e., trajectory points per trajectory), and F = 5 indicates the number of features at each time step. The dataset is normalized using min–max scaling. Considering the periodic nature of the COG feature, this study transforms its original range from [0°, 360°] to [−180°, 180°] in order to avoid abrupt numerical transitions during normalization. Because the trajectories studied here are northbound, their headings cluster around 0°/360°; the transformation moves the wrap-around point to ±180°, away from the headings that occur in the data. It therefore preserves the continuity of angular values after normalization and prevents the artificial jump between 0 and 1 that would occur if headings on either side of 0°/360° were normalized directly. As a result, the stability of the model in learning periodic features is improved. The output dimension of the transformer encoder was set to 128, with eight self-attention heads. The batch size was set to 64, and the number of training epochs was set to 100. The model was trained using the Adam optimizer with a learning rate of r = 0.001. Detailed model parameters are summarized in Table 1.
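The COG handling described above can be illustrated with a minimal sketch. The function names `wrap_cog` and `min_max_scale`, the fixed scaling bounds, and the sample headings are illustrative assumptions and are not taken from the study's implementation.

```python
def wrap_cog(deg):
    """Map a course over ground given in [0, 360] degrees onto [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def min_max_scale(values, lo, hi):
    """Min-max scale values to [0, 1] using fixed bounds lo and hi (assumed known)."""
    span = hi - lo
    return [(v - lo) / span for v in values]


def max_step(values):
    """Largest absolute difference between consecutive values."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical northbound headings that cross the 0/360 degree boundary.
cog_raw = [358.0, 359.5, 0.5, 2.0]
cog_wrapped = [wrap_cog(c) for c in cog_raw]  # [-2.0, -0.5, 0.5, 2.0]

scaled_raw = min_max_scale(cog_raw, 0.0, 360.0)
scaled_wrapped = min_max_scale(cog_wrapped, -180.0, 180.0)

# Direct scaling jumps from near 1 to near 0; wrapped scaling stays near 0.5.
print(max_step(scaled_raw), max_step(scaled_wrapped))
```

The wrapped representation still has a discontinuity at ±180°, so this choice is only appropriate when, as for the northbound traffic considered here, headings near 180° are rare.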
To evaluate the robustness of the transformer–VAE model with respect to architectural hyperparameters, a sensitivity analysis was conducted on four key parameters: latent dimension (latent_dim), hidden layer dimension (hidden_dim), number of encoder layers (num_layers), and number of attention heads (num_heads). In each experiment, only one parameter was varied while the others were held constant. The average reconstruction error on the test set was used as the evaluation metric.
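The one-parameter-at-a-time design can be sketched as follows. The baseline and candidate values are assumptions for illustration: only the 128-dimensional encoder output, the eight attention heads, and the layer counts 4, 6, and 8 appear in the text, and mapping the encoder output dimension to `hidden_dim` is itself an assumption.

```python
# Hypothetical baseline configuration (latent_dim is not stated in the text).
BASELINE = {"latent_dim": 32, "hidden_dim": 128, "num_layers": 6, "num_heads": 8}

# Hypothetical candidate values for each parameter.
CANDIDATES = {
    "latent_dim": [16, 32, 64],
    "hidden_dim": [64, 128, 256],
    "num_layers": [4, 6, 8],
    "num_heads": [4, 8, 16],
}


def one_at_a_time(baseline, candidates):
    """Return (varied_parameter, config) pairs that change a single parameter each."""
    configs = []
    for name, values in candidates.items():
        for value in values:
            cfg = dict(baseline)  # copy so the baseline is never mutated
            cfg[name] = value
            configs.append((name, cfg))
    return configs


configs = one_at_a_time(BASELINE, CANDIDATES)
for name, cfg in configs:
    # A real experiment would train the model with cfg and record the
    # average test-set reconstruction error here.
    print(name, cfg)
```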
As shown in Figure 4, the reconstruction errors remained within a narrow range across all configurations, with variation amplitudes generally within 0.02. This indicates that the model exhibits good stability and robustness under a variety of structural configurations. Specifically, changes in latent_dim had a negligible effect on reconstruction accuracy, indicating insensitivity to the size of the latent space. Increasing the hidden_dim slightly improved performance, although the differences were not substantial. When num_layers was set to 6, the model achieved the best performance, while using four or eight layers led to only minor fluctuations. Variations in num_heads had minimal impact on reconstruction accuracy, reflecting strong adaptability in the attention structure. These findings confirm that the transformer–VAE model maintains stable performance under different parameter settings and demonstrates high structural robustness and practical applicability in trajectory anomaly detection tasks.
To evaluate the transformer–VAE model’s ability to learn normal trajectory patterns and reconstruct trajectories, comparative experiments were conducted with two baseline models: the traditional VAE and the LSTM–VAE. The traditional VAE uses fully connected neural networks (MLPs) as the encoder and decoder for trajectory data. The LSTM–VAE, a variational autoencoder based on long short-term memory networks, employs LSTM units as both encoder and decoder. During the training process of all three models—VAE, LSTM–VAE, and transformer–VAE—the same set of hyperparameters was used. The reconstruction performance was evaluated using RMSE, MSE, and MAE as the evaluation metrics.
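For reference, the three evaluation metrics can be written as a short sketch using their standard definitions. The function names and the sample values are illustrative assumptions; how the study aggregates the errors across features and trajectories is not specified in the text.

```python
import math


def mse(observed, reconstructed):
    """Mean squared error between two equal-length sequences."""
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed)) / len(observed)


def mae(observed, reconstructed):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - r) for o, r in zip(observed, reconstructed)) / len(observed)


def rmse(observed, reconstructed):
    """Root mean squared error, i.e., the square root of the MSE."""
    return math.sqrt(mse(observed, reconstructed))


# Hypothetical normalized feature values and their reconstruction.
observed = [0.0, 1.0, 2.0]
reconstructed = [0.0, 1.0, 4.0]
print(mse(observed, reconstructed), mae(observed, reconstructed), rmse(observed, reconstructed))
```

Because RMSE and MSE square the residuals, they penalize a few large deviations more heavily than MAE does, which is why reporting all three gives a fuller picture of reconstruction quality.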
4.2. Experimental Validation of Trajectory Reconstruction
Figure 5 presents a comparison of the reconstruction performance of three models—transformer–VAE, LSTM–VAE, and conventional VAE—on four trajectory features: longitude (LON), latitude (LAT), SOG, and COG, based on a randomly selected test trajectory sample. In the figure, the red curve represents the original observed values, while the other colored curves correspond to the reconstruction results produced by the different models. As shown in the figure, transformer–VAE achieves the best reconstruction performance across all feature dimensions, demonstrating superior modeling capability and higher fitting accuracy compared to both LSTM–VAE and the conventional VAE.
For the LON feature, the reconstruction curve generated by the transformer–VAE closely aligns with the observed values, accurately capturing both the overall trend and local variations, indicating strong spatial modeling capability. The reconstruction performance of LSTM–VAE is slightly inferior; although the overall trend is generally consistent with the observations, local delays and deviations can be observed. The conventional VAE performs worst on this feature, with its reconstruction curve exhibiting substantial fluctuation and significant deviation from the ground truth, suggesting a limited ability to fit spatial sequences. As for the LAT feature, since vessels navigating through the Jintang Bridge area primarily travel along the north–south direction, the latitude generally follows a monotonically increasing pattern. Consequently, the reconstruction differences among the three models are relatively small. Nevertheless, the transformer–VAE still demonstrates the best fitting performance, followed by LSTM–VAE, while the reconstruction curve of the conventional VAE still presents a certain degree of fluctuation.
For the SOG feature, transformer–VAE effectively captures the trend of speed variation, with reconstruction results that are smooth and closely aligned with the ground truth, demonstrating a strong capability in modeling dynamic features. In contrast, the reconstruction output produced by the LSTM–VAE is consistently lower than the original values, while the reconstruction curve of the conventional VAE exhibits substantial fluctuations and fails to accurately reproduce the real speed variation process. Regarding the COG feature, the transformer–VAE maintains stable reconstruction performance and is capable of smoothly fitting the trend of heading variation. Although the reconstruction results of the LSTM–VAE generally follow the overall pattern of the ground truth, the curve exhibits noticeable fluctuations and fails to smoothly capture the periodic nature of heading changes, resulting in localized reconstruction instability. In contrast, the reconstruction produced by the conventional VAE shows significant oscillations and large reconstruction errors, making it incapable of accurately capturing the detailed variations in vessel heading.
To comprehensively evaluate the performance of the transformer–VAE, LSTM–VAE, and VAE in the trajectory reconstruction task, this study calculates the MSE, MAE, and RMSE of the three models on both the training and test datasets. The results are presented in Figure 6. On the training set, transformer–VAE achieves the lowest reconstruction errors across all trajectory feature dimensions, indicating its strong feature representation and trajectory pattern learning capabilities, which enable high-quality trajectory reconstruction. The test results further demonstrate that the transformer–VAE consistently maintains the lowest error levels across all metrics, reflecting its superior generalization performance. These findings further confirm that the transformer–VAE can effectively model complex spatiotemporal dynamics across different trajectory samples while maintaining a high level of reconstruction quality.
Based on the above trajectory reconstruction results, the transformer–VAE outperforms the baseline models in terms of reconstruction accuracy and feature fitting capability, exhibiting lower errors and higher consistency across trajectory features. This indicates that the proposed model can more accurately learn and reconstruct the spatiotemporal characteristics of normal vessel trajectories, providing a solid foundation for subsequent anomaly detection.
In addition to reconstruction accuracy, computational efficiency is also important for evaluating model practicality. We compared the transformer–VAE and LSTM–VAE models under identical settings on a standard PC (Intel i5-7300HQ CPU, GTX 1050 GPU, 16 GB RAM) using Python 3.11. The transformer–VAE had 1,228,100 trainable parameters and required 3037 s to train for 200 epochs, while LSTM–VAE had 1,484,164 parameters and took 2820 s. The transformer–VAE has fewer parameters owing to its streamlined attention-based architecture, in contrast to the gate-heavy structure of the LSTM. Its slightly longer training time is primarily attributed to the attention mechanism and the higher per-layer computational complexity of transformer blocks. Nonetheless, transformer–VAE remains suitable for large-scale offline anomaly detection. For real-time applications, lightweight optimization strategies can be considered.
4.3. Experimental Validation of Anomaly Detection
The transformer–VAE model distinguishes normal and abnormal trajectories by learning only the features of normal trajectories within the dataset. Specifically, the reconstruction error, or anomaly score, of a normal trajectory remains relatively low, as the model has learned its underlying patterns. Conversely, anomalous trajectories, which are not encountered during training, tend to exhibit higher reconstruction errors due to their deviations from learned patterns.
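As a minimal sketch of this idea, the anomaly score of one trajectory can be computed as the mean squared difference between its observed and reconstructed feature values. Averaging over all time steps and features is an assumption here, since the text does not specify the aggregation; the function name and the toy values are likewise illustrative.

```python
def anomaly_score(original, reconstructed):
    """Mean squared reconstruction error over all time steps and features.

    Both arguments are lists of time steps, each a list of feature values
    (shape T x F for a single trajectory).
    """
    total = 0.0
    count = 0
    for step_orig, step_rec in zip(original, reconstructed):
        for o, r in zip(step_orig, step_rec):
            total += (o - r) ** 2
            count += 1
    return total / count


# Toy trajectory with T = 2 time steps and F = 2 features (hypothetical values).
original = [[0.0, 0.0], [1.0, 1.0]]
close_reconstruction = [[0.0, 0.1], [1.0, 0.9]]
poor_reconstruction = [[0.0, 1.0], [1.0, 3.0]]

print(anomaly_score(original, close_reconstruction))
print(anomaly_score(original, poor_reconstruction))
```

A trajectory that the model reconstructs poorly receives a larger score, which is the quantity later compared against a threshold.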
Figure 7 shows the distribution of reconstruction errors for the transformer–VAE model on both the training set and test set. The distribution of training reconstruction errors exhibits a right-skewed pattern, with the majority of samples having reconstruction errors concentrated within the range of 0.0 to 0.01. The peak of the distribution is located in an extremely low-error region, approximately between 0.003 and 0.005. The sharp decline in the number of samples beyond 0.01 indicates that the temporal features of most training samples have been stably modeled, with no significant deviations from the patterns learned by the model. This observation suggests that the model achieves a good fit on the training set, accurately learning the patterns of normal trajectories and enabling high-precision reconstruction. The distribution of test errors follows a similar overall trend to that of the training set but with notable differences. Although the test error distribution remains right-skewed, its range is broader, and some test samples exhibit significantly higher reconstruction errors compared to the training set. In particular, a noticeable long-tail distribution appears in the region above 0.02. Within the low-error range of 0.0 to 0.01, the test error distribution still exhibits a high peak, indicating that most test trajectories conform to the learned normal patterns and are reconstructed accurately. This confirms the transformer–VAE model’s strong ability to generalize and reconstruct normal trajectories. However, beyond the 0.02 threshold, the tail of the test error distribution extends further than that of the training set, with the highest reconstruction error approaching 0.1.
These high-error trajectories are likely to contain anomalous behavior, suggesting that the transformer–VAE model is capable of generating significantly higher reconstruction errors for abnormal trajectories under unsupervised conditions, thereby providing a reliable basis for anomaly detection.
In an unsupervised learning setting, the model has no prior exposure to abnormal trajectories; therefore, the core idea of anomaly detection is to identify trajectories with large reconstruction errors as anomalies based on the statistical distribution of reconstruction errors. To determine a reasonable threshold for anomaly detection, this study adopts a quantile-based approach to control the sensitivity of detection. Specifically, the 90th, 95th, and 98th percentiles are selected as threshold values for detecting anomalous trajectories within the test set. As shown in Figure 8, the number of detected anomalies decreases as the threshold increases. Under the 90th percentile threshold, the largest number of anomalous trajectories is detected, though some normal trajectories may be falsely identified as anomalies. When using the 95th percentile threshold, fewer anomalies are detected, but most trajectories that deviate from normal patterns can still be captured. At the 98th percentile threshold, only trajectories with extremely large reconstruction errors are flagged as anomalous, reducing the false positive rate but potentially missing mildly abnormal trajectories. The results indicate that the choice of quantile threshold has a direct impact on anomaly detection outcomes. Lower thresholds (e.g., 90%) increase the recall rate but may result in more false positives, whereas higher thresholds (e.g., 98%) reduce false positives but may lead to missed detections. The 95th percentile threshold provides a relatively balanced trade-off between detection accuracy and robustness.
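The quantile-based rule can be sketched as follows. The `percentile` helper uses linear interpolation between order statistics, which is one common convention and an assumption here; the synthetic scores are for illustration only and are not the study's reconstruction errors.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def flag_anomalies(scores, q):
    """Return the indices of scores strictly above the q-th percentile."""
    threshold = percentile(scores, q)
    return [i for i, s in enumerate(scores) if s > threshold]


# Synthetic anomaly scores 0.00, 0.01, ..., 1.00 (hypothetical).
scores = [i / 100 for i in range(101)]
counts = [len(flag_anomalies(scores, q)) for q in (90, 95, 98)]
print(counts)  # fewer trajectories are flagged as the percentile increases
```

By construction, a threshold at the q-th percentile flags roughly (100 − q)% of the scored trajectories, so the percentile directly sets the alarm volume regardless of how many true anomalies are present.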
Because the quantile threshold governs this trade-off between recall and false alarms, evaluating the accuracy of anomaly detection under different quantile settings requires a reliable ground truth dataset of anomalous maritime trajectories.
In this study, representative anomalous vessel trajectories were identified and labeled by integrating multi-source data. Two primary sources were utilized:
- (1) Vessel traffic service (VTS) warning records, which capture real-time operational risk alerts flagged by maritime surveillance systems during navigation.
- (2) Administrative penalties and incident reports issued by maritime authorities, which provide authoritative documentation of historical violations or hazardous behaviors.
To further enhance the accuracy and representativeness of the labeled data, a manual verification process was conducted. A panel of three domain experts in maritime safety—including a certified captain, a VTS instructor, and a professor from a maritime academy—was invited to review the trajectories in the test set. The experts evaluated each trajectory in detail, taking into account real-world navigational contexts and vessel behavior characteristics, such as spatial deviation, abrupt speed changes, and anomalous course patterns. As a result, 33 anomalous trajectories were confirmed and used as ground truth labels to evaluate the performance of the proposed transformer–VAE model under different quantile thresholds. By comparing the detection results with expert-labeled ground truth, we further analyzed the model’s performance in terms of false positive rate, false negative rate, and F1-score, thereby providing empirical guidance for threshold selection and detection accuracy optimization.
Figure 9 presents the confusion matrices of anomaly detection results under three quantile thresholds, while Table 2 summarizes the corresponding performance metrics, including accuracy, precision, recall, and F1-score. These metrics are calculated as follows:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]
\[ \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]
where TP (true positive) refers to cases where the model correctly predicts an anomalous trajectory, and TN (true negative) refers to cases where the model correctly predicts a normal trajectory. FP (false positive) occurs when the model incorrectly predicts an anomalous trajectory for a normal one, and FN (false negative) occurs when the model incorrectly predicts a normal trajectory for an actual anomalous one.
As illustrated in both the figure and the table, the choice of quantile threshold plays a critical role in shaping the model’s detection performance. When using the 90% quantile threshold, the model achieved the highest recall (1.000), successfully identifying all 33 expert-labeled anomalous trajectories. However, due to the relatively low threshold, it also produced a high number of false positives (FP = 23), resulting in lower precision (0.5893) and overall accuracy (0.9586). This indicates that, under this setting, the model demonstrates strong sensitivity to anomalies but lacks sufficient discriminative ability for normal trajectories.
When the threshold is increased to 95%, the model shows a significant improvement in precision (0.9643) while maintaining a relatively high recall (0.8182), leading to the highest F1-score (0.8852). The number of false positives is reduced to only one, suggesting that this setting achieves an optimal balance between anomaly detection capability and false alarm control, with the highest overall accuracy (0.9874). However, when the threshold is further increased to 98%, the model becomes overly conservative. Although it attains perfect precision (1.0000), the recall drops sharply to 0.3636, and the F1-score declines to its lowest value (0.5333). The confusion matrix shows that most anomalous trajectories were misclassified as normal (FN = 21), indicating severe under-detection, which significantly limits the practical applicability of this threshold setting.
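The reported precision, recall, and F1 values can be checked with a short sketch based on the standard definitions. FP = 23, FP = 1, and FN = 21 are stated in the text; the remaining counts are inferred here from the 33 labeled anomalies and the reported precision and recall, so they should be read as assumptions. Accuracy is omitted because the true negative counts are not given in the text.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts per quantile threshold as (TP, FP, FN); partly inferred, see lead-in.
counts = {
    "90%": (33, 23, 0),
    "95%": (27, 1, 6),
    "98%": (12, 0, 21),
}

results = {}
for name, (tp, fp, fn) in counts.items():
    results[name] = tuple(round(v, 4) for v in precision_recall_f1(tp, fp, fn))
    print(name, results[name])
```

Under these counts, the 95% setting yields (0.9643, 0.8182, 0.8852) and the 98% setting yields (1.0, 0.3636, 0.5333), matching the values discussed above.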
In summary, the 95% quantile threshold provides the best compromise between sensitivity and reliability, making it more suitable for real-world maritime traffic monitoring, where both accurate anomaly detection and low false alarm rates are essential for ensuring navigational safety and supporting effective traffic management.
To explore the feasibility of dynamic thresholding, a preliminary experiment based on extreme value theory was conducted as a methodological extension. Specifically, the peaks-over-threshold (POT) method was applied. The initial threshold u was set at the 95th percentile of the reconstruction error distribution. All exceedances y = x − u, where x > u, were extracted and modeled using a generalized Pareto distribution (GPD). The shape parameter (ξ) and scale parameter (σ) were estimated using maximum likelihood estimation (MLE). The final anomaly threshold was calculated by adding the target GPD quantile to u, resulting in a value of 0.047382, which closely matched the original 95th percentile threshold (0.047244). Both thresholds produced identical anomaly detection results on the test set. Although no additional anomalies were identified, this experiment supports the feasibility of POT-based thresholding and highlights its potential applicability in dynamic maritime environments.
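The final step of the POT procedure can be sketched with the standard GPD tail-quantile formula. The sketch takes the shape and scale parameters as given inputs rather than fitting them by MLE, and the function name, argument names, and numerical values are illustrative assumptions, not the study's fitted parameters.

```python
import math


def pot_threshold(u, xi, sigma, n_total, n_exceed, q):
    """Threshold exceeded with probability 1 - q under a fitted GPD tail.

    u        : initial threshold (e.g., the 95th percentile of the errors)
    xi       : GPD shape parameter (assumed already estimated)
    sigma    : GPD scale parameter (assumed already estimated)
    n_total  : total number of observations
    n_exceed : number of observations above u
    q        : target overall quantile level, e.g., 0.99
    """
    ratio = (n_total / n_exceed) * (1.0 - q)
    if abs(xi) < 1e-12:
        # Limiting case xi -> 0: the GPD reduces to an exponential tail.
        return u - sigma * math.log(ratio)
    return u + (sigma / xi) * (ratio ** (-xi) - 1.0)


# Hypothetical values: 1000 scores, 50 of them above u = 0.05.
same_level = pot_threshold(0.05, 0.2, 0.01, 1000, 50, 0.95)
stricter = pot_threshold(0.05, 0.2, 0.01, 1000, 50, 0.99)
print(same_level, stricter)
```

When the target level equals the level of the initial threshold, the added GPD quantile is zero and the result coincides with u, which is consistent with the near-identical thresholds reported above; a stricter target level moves the threshold further into the tail.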
After completing the analysis of detection performance under different quantile thresholds, a representative normal trajectory and an anomalous trajectory were selected for detailed analysis, based on the fixed 95% quantile threshold.
Figure 10 presents the spatial visualization of the normal and anomalous trajectories identified by the transformer–VAE model. It is clearly observed from the figure that the vessel corresponding to the normal trajectory (blue line) traveled from south to north, strictly following the designated northbound navigational channel and smoothly passing through the bridge area. In contrast, the anomalous trajectory (red line) shows that the vessel did not comply with navigation regulations as it approached the bridge area, choosing instead to deviate significantly from the main navigation channel by taking a shortcut directly towards the bridge along its original heading. Such anomalous navigational behavior clearly violates the relevant regulations for bridge-area navigation and could potentially increase the risk of vessel-to-vessel collisions or severe vessel-bridge accidents.
Figure 11 illustrates the reconstruction results of the normal trajectory features. As shown in Figure 11a,b, the original and reconstructed values of LON and LAT features of the normal trajectory exhibit high consistency, indicating that the vessel maintained a steady course along the designated navigational channel during actual sailing without any noticeable deviation in route or position. This trajectory behavior aligns well with the normal patterns learned by the model. Similarly, subplots of SOG and COG also show good reconstruction performance. Despite minor fluctuations observed locally, the reconstructed values closely match the original observed values. These results demonstrate that the transformer–VAE model, by learning from a substantial amount of normal trajectory data, successfully captures the inherent patterns of typical vessel movements, enabling accurate reconstruction of normal trajectories.
A representative anomalous trajectory reconstruction result is shown in Figure 12. The original trajectory was identified as anomalous based on the quantile threshold, and the effectiveness of the transformer–VAE model can be verified visually. Specifically, the trajectory was flagged as anomalous due to significant reconstruction errors observed in three key features: LON, SOG, and COG. The spatial distribution of this trajectory deviates markedly from typical navigation patterns, particularly before the vessel enters the bridge area, where it strays far from the main navigational channel.
Prior to entering the bridge area, the actual longitude of the first 70 trajectory points (blue solid line) exhibits significant variation, clearly indicating that the vessel deviated from the standard navigational route. In contrast, since the transformer–VAE model has learned typical normal trajectory patterns, its reconstructed trajectory (orange dotted line) remains consistently aligned with the main navigational channel. As a result, the discrepancy between the actual trajectory and the model’s reconstruction leads to a substantial increase in reconstruction error for the LON feature. Regarding LAT, the reconstruction performed well, primarily due to the anomalous trajectory’s LAT remaining relatively stable and consistent with normal trajectories, enabling effective reconstruction based on learned normal patterns.
Regarding the SOG feature, vessels typically maintain stable speeds when navigating through cross-sea bridge areas under normal conditions. However, the trajectory under consideration reveals sharp fluctuations in speed characterized by brief bursts of acceleration and deceleration prior to reaching the bridge. Conversely, the transformer–VAE model, having learned stable speed patterns from mainstream trajectories, generates smoother reconstructed speed curves. This discrepancy leads to significant reconstruction errors in the SOG feature, further highlighting the anomalous nature of the actual trajectory. Similarly, for the COG feature, the anomalous trajectory showed frequent and irregular heading variations, whereas the model’s reconstruction remained stable, reflecting learned normal heading patterns. This resulted in considerable reconstruction errors due to these irregular deviations.
Overall, because the transformer–VAE model has effectively captured mainstream navigational patterns from normal vessel trajectories, its reconstructions consistently align with standard spatial (LON and LAT), speed, and heading behaviors. Thus, trajectories deviating from these typical patterns exhibit notable reconstruction errors, allowing the model to efficiently detect spatial deviations, abnormal speed fluctuations, and irregular heading changes. As an unsupervised anomaly detection method, the transformer–VAE demonstrates strong capability in identifying diverse types of anomalous trajectories.