1. Introduction
The increasing volumes of trajectory datasets bring challenges in storage, querying, and processing [1,2,3], which can be addressed by applying trajectory compression (or simplification). Trajectory compression refers to the elimination of points in a trajectory which do not contain new information [4]. Through trajectory compression, the overall volume of data is reduced, thereby facilitating data storage and analysis [4,5]. It also accelerates and improves the performance of subsequent trajectory processing [2,5]. In practical applications, trajectory compression is an effective method for preprocessing and visualizing massive trajectory data [6].
Algorithms use different criteria to identify redundant trajectory points [5], such as offset distance [7], spatial topology [8], time, velocity, or combinations of these, e.g., a combination of time and distance [5] or a combination of velocity and distance [9]. Although a compression algorithm eliminates redundant data, it must preserve the moving object's behavioral information within the trajectory [8], minimize information loss, and maintain trajectory quality [10].
The ultimate criterion for trajectory quality is the extent to which the trajectory can be compressed without damaging its use for further processing [9], meaning that there is a trade-off between the number of eliminated points and trajectory quality after compression [11]. Therefore, maintaining trajectory quality and obtaining a consistent or meaningful output is a complicated problem [12]. A trajectory compression method is typically evaluated based on (1) processing time, (2) compression ratio, and (3) error measurement [2], which are (1) the duration of the compression process, (2) the ratio between the sizes of the simplified and original trajectories, and (3) the deviation between the compressed and original trajectories, respectively [13]. The processing time is used to evaluate efficiency, while the remaining metrics assess the effectiveness of the process [13].
In trajectory databases, various techniques have been developed to compress trajectory data. The most common method is Piecewise Linear Segmentation (PLS) [14], which is intuitive, easy to implement, and relatively fast [15]. The Douglas–Peucker (DP) algorithm, another popular method, is based on PLS [16] and guarantees an error-bounded compression process [1]. PLS and DP have a worst-case time complexity of O(n²). While these methods are straightforward, obtaining consistent or meaningful results can be challenging [12]. To address this, DPhull was introduced, leveraging convex hull properties to achieve the same compression ratio as DP while reducing the complexity to O(n log n) [1].
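To make the recursion underlying DP concrete, the following is a minimal sketch of the classic algorithm in plain Python (function and variable names are illustrative; a planar perpendicular-distance measure is assumed, whereas production implementations for geographic coordinates would use a geodetic distance):

```python
import math

def _perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b
    (planar approximation)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len = math.hypot(dx, dy)
    if seg_len == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - dy * (a[0] - p[0])) / seg_len

def douglas_peucker(points, epsilon):
    """Keep the interior point farthest from the anchor segment if it
    deviates more than epsilon; recurse on both halves, else keep only
    the endpoints. Worst case O(n^2)."""
    if len(points) < 3:
        return list(points)
    dists = [_perp_dist(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > epsilon:
        left = douglas_peucker(points[:idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return left[:-1] + right  # drop duplicate split point
    return [points[0], points[-1]]
```

The quadratic worst case arises when each recursion removes only one point, so the farthest-point scan is repeated over nearly the whole trajectory at every level.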
To address both spatial and temporal dimensions of trajectory data, the TD-DP method was developed [17]. It uses Synchronous Euclidean Distance (SED) to account for temporal characteristics. Additionally, Zhang et al. [18] introduced the Adaptive Core Threshold Difference-DP (ACTD-DP) algorithm, which builds on DP by incorporating ships' course values. ACTD-DP compresses trajectories by employing curve fitting to establish a mathematical relationship between the compression threshold and the number of points. This function aids in determining the optimal core threshold, a critical parameter for controlling the algorithm's accuracy and efficiency. Guo et al. [19] proposed Top-Down Kinematic Compression (TDKC), an adaptive algorithm designed to preserve key features in AIS trajectory data, such as time, position, speed, and course. TDKC uses a Compression Binary Tree to address recursion termination and automatic threshold determination. A case study in the Gulf of Finland demonstrated TDKC's superior performance over conventional and improved algorithms, proving its effectiveness in maritime traffic analysis.
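For reference, the SED used by TD-DP compares each point with its time-synchronized counterpart on the approximating segment: for a point $p_i$ recorded at time $t_i$ between segment endpoints $p_s$ (at $t_s$) and $p_e$ (at $t_e$),

```latex
p_i' = p_s + \frac{t_i - t_s}{t_e - t_s}\,(p_e - p_s), \qquad
\mathrm{SED}(p_i) = \lVert p_i - p_i' \rVert_2 .
```

Unlike the perpendicular distance of classic DP, SED therefore penalizes points that deviate from where the object *should have been* at time $t_i$ under linear motion, capturing temporal as well as spatial error.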
Semantic trajectory compression has also gained attention. Zhou et al. [20] proposed a method that integrates semantic and spatio-temporal features using information entropy and Synchronous Euclidean Distance. By prioritizing feature points with the maximum semantic-spatio-temporal distance, the approach preserves semantic similarity and enhances the interpretability of compressed trajectories. Gao et al. [21] introduced another semantic trajectory compression technique inspired by synchronization. It uses a multi-resolution clustering model to create hierarchical semantic regions of interest (ROIs), compressing trajectories into sequences of semantic ROIs. The model is extendable to data streams and has been validated on synthetic and real-world datasets, demonstrating its efficiency and effectiveness.
Compression methods often face challenges when using position-preserving error measures, as these can lose critical information required for analyses such as trajectory clustering. To address this, Direction-Preserving Trajectory Simplification (DPTS) was developed [22]. By employing an angular distance measure, DPTS preserves both directional and positional information. However, this approach can increase computational complexity, illustrating the challenge of balancing the elimination of redundant points with the retention of meaningful movement patterns. This balance is vital for applications such as behavior analysis [11,12].
A GPU-based parallel computing framework was developed with the Adaptive DP with Speed and Course (ADPSC) algorithm for large datasets and real-time applications. This framework optimizes threshold calculations and accelerates trajectory compression. Experiments conducted on vessel trajectory datasets demonstrated the superior performance of the ADPSC algorithm in compressing trajectories. The GPU framework significantly reduced processing times, supporting real-time decision making. These advancements enhance maritime safety and minimize data storage costs, providing critical support for autonomous shipping in complex waters [23].
The movement of objects, such as animals, people, or vehicles, is often influenced by context, like weather, which can either enable or restrict movement [24,25]. For instance, vessels avoid rough seas or shallow waters, and humans become more aggressive in high temperatures [26,27,28]. Therefore, ignoring context in trajectory compression can lead to inconsistent results [8,12]. For example, in maritime trajectories, compression may produce segments that intersect coastlines, rendering them invalid. To address this, a contextual DP algorithm was developed to preserve topology and direction. However, it struggles to retain stopping points, which can misrepresent object behavior [9,12]. A two-stage context-aware PLS was proposed to solve the problem of retaining stopping points. This method first analyzes speed to detect stops and then integrates topological relations into PLS to ensure logical consistency in the simplified trajectory [8]. By incorporating context, this approach addresses the shortcomings of traditional methods, enabling more accurate and reliable trajectory compression for applications such as behavior analysis and movement modeling.
Previous context-aware trajectory compression methods primarily focus on the geometric properties of context, which leaves a persistent risk of losing meaningful trajectory points. This paper proposes incorporating spatial, non-spatial, and semantic context domains into the trajectory compression process to address this limitation. Initially, a novel context-aware trajectory formulation is introduced to effectively capture the contextual information surrounding the moving agent. Building on this foundation, a context-aware Douglas–Peucker algorithm is developed to minimize the risk of discarding significant trajectory points.
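The proposed algorithm itself is defined in Section 2; purely as an illustrative sketch of the general idea, one way to make the DP recursion context-aware is to keep a point if either its spatial deviation or the deviation of an annotated context attribute exceeds its own threshold. Everything below is an assumption for illustration, not the paper's formulation: context deviation is measured against index-wise linear interpolation between segment endpoints, distances are planar, and both deviations are normalized by their thresholds.

```python
import math

def _perp(p, a, b):
    """Planar perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len = math.hypot(dx, dy) or 1e-12
    return abs(dx * (a[1] - p[1]) - dy * (a[0] - p[0])) / seg_len

def context_aware_dp(points, context, eps_spatial, eps_context):
    """points: [(x, y)]; context: parallel list with one context value per
    point (e.g. water depth). A point is retained when its spatial OR
    context deviation exceeds the corresponding threshold."""
    if len(points) < 3:
        return list(points), list(context)
    n = len(points)
    best_i, best_score = None, 1.0  # score 1.0 == exactly at threshold
    for i in range(1, n - 1):
        s = _perp(points[i], points[0], points[-1]) / eps_spatial
        interp = context[0] + (context[-1] - context[0]) * i / (n - 1)
        c = abs(context[i] - interp) / eps_context
        if max(s, c) > best_score:
            best_i, best_score = i, max(s, c)
    if best_i is None:  # no deviation above either threshold
        return [points[0], points[-1]], [context[0], context[-1]]
    lp, lc = context_aware_dp(points[:best_i + 1], context[:best_i + 1],
                              eps_spatial, eps_context)
    rp, rc = context_aware_dp(points[best_i:], context[best_i:],
                              eps_spatial, eps_context)
    return lp[:-1] + rp, lc[:-1] + rc
```

In this sketch, a spatially collinear point with an abrupt context change (e.g. a sudden depth drop) survives compression even though plain DP would discard it, which is exactly the failure mode the proposed method targets.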
The remainder of this paper is organized as follows: Section 2 summarizes the DP algorithm and the proposed methodology. Section 3 presents and discusses the implementation results. Section 4 concludes the paper and suggests directions for future research.
3. Results and Discussion
3.1. Datasets and Preprocesses
This paper uses three datasets. First, vessel movement data were collected by Automatic Identification Systems (AIS), containing 3,213,700 locations of 1125 ships traveling around the English Channel in February 2016 (Figure 1). These AIS messages are converted into trajectories using a Python library named MovingPandas [41].
According to [42], the most influential context factors for vessel trajectory prediction are wave direction, wave height, and water depth. Therefore, these factors are used in this paper as the environmental context. The parameters come from an ocean condition dataset [43] covering a longitude range from 0° W to 10° W and a latitude range from 45° N to 51° N. The dataset has a three-hour temporal resolution and contains 43,206 gridded sampling points.
To link movement and context data coming from different sources with different temporal resolutions, this paper used the trajectory annotation method described in [42]. Trajectory annotation includes spatial and temporal interpolation [27]. As a result of annotation, each trajectory point contains environmental context. A Python environment and libraries such as MovingPandas [41] and tslearn [44] are used for the implementation. Trajectories are compressed using CADP with the maximum allowed distance threshold, ϵmax, set to 1 km and different thresholds for the context parameters. Then, the runtime, compression ratio R, and similarity between the simplified and original trajectories are calculated. The CADP-compressed trajectories are then used in a deep learning trajectory prediction model to evaluate the effectiveness of CADP and its effects on further trajectory analysis.
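The annotation step described above can be sketched as follows. This is a simplified stand-in for the method of [42], not its actual implementation: it assumes nearest-neighbor matching in space over the context grid and linear interpolation in time between the two bracketing three-hour snapshots; the array layout and function name are illustrative.

```python
import numpy as np

def annotate_point(lon, lat, t, grid_lons, grid_lats, times, values):
    """Annotate one trajectory point with a gridded context value.

    grid_lons, grid_lats: coordinates of the grid cells (1-D, same length)
    times: snapshot timestamps, sorted ascending
    values: context values shaped [time, cell]
    Spatial step: nearest grid cell; temporal step: linear interpolation
    between the two snapshots bracketing t."""
    cell = int(np.argmin((grid_lons - lon) ** 2 + (grid_lats - lat) ** 2))
    j = np.searchsorted(times, t)
    if j == 0:
        return values[0, cell]          # before first snapshot
    if j >= len(times):
        return values[-1, cell]         # after last snapshot
    w = (t - times[j - 1]) / (times[j] - times[j - 1])
    return (1 - w) * values[j - 1, cell] + w * values[j, cell]
```

Applying this per trajectory point and per context variable (wave direction, wave height, water depth) yields the annotated trajectories used in the experiments.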
3.2. Effects of Context on Compression Ratio and Similarity of Trajectories
Figure 2 shows that the compression ratio rises as the threshold increases. The larger the threshold, the higher the DTW distance (i.e., the smaller the similarity). The similarity graphs show sharp increases for all context parameters and exhibit critical points: beyond these, increasing the threshold loses more useful data than it reduces volume. In other words, the compression ratio increases more slowly after the critical point.
Furthermore, according to Figure 2, the similarity graphs of the compressed and original datasets converge. After the convergence point, increasing the context threshold has little further effect on similarity, meaning that most trajectory points containing useful context data have already been discarded. It is, therefore, possible to save those essential trajectory points using CADP.
Table 1 shows the compression ratio and similarity of simplified trajectories for different wave direction settings in the CADP method. The compression ratio increased by 1.68% at point 3, while the similarity of the compressed trajectories decreased by 72.99%. In other words, 1.68% of the trajectory points contain crucial information. Although the overall behavior of CADP is the same in all three experiments, i.e., as the compression ratio increases, the similarity decreases, there are slight differences in context behavior. This could help to balance the similarity of the original and compressed datasets and to select appropriate compression thresholds. Lastly, Figure 3 visualizes a trajectory before and after compression with the CADP algorithm.
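The two evaluation metrics used throughout this subsection can be stated compactly. The paper computes the similarity with tslearn; the sketch below uses a plain NumPy dynamic-programming DTW with Euclidean ground distance for self-containedness (tslearn's `dtw` may differ slightly in its accumulation convention), and the compression-ratio definition as the percentage of points removed is an assumption consistent with the reported values.

```python
import numpy as np

def compression_ratio(n_original, n_compressed):
    """Percentage of trajectory points removed by compression."""
    return 100.0 * (1 - n_compressed / n_original)

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 2-D point sequences,
    accumulated Euclidean cost over the optimal warping path."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A lower DTW value between the compressed and original trajectory means higher similarity, which is why the text treats a lower similarity score as better at a comparable compression ratio.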
Additionally, the DP and two-stage Context-Aware Piecewise Linear Segmentation (CPLS) algorithms are implemented for comparison. As the two-stage CPLS takes context and topological relationships into the trajectory compression process [6], comparing these methods offers a better-balanced test of the effectiveness of CADP in preserving spatial and contextual fidelity. In both algorithms, the spatial setting, i.e., ϵmax, is set to 1 km.
Using the DP algorithm, the compression ratio was 76.78% with a similarity score of 524,860, while the two-stage CPLS method achieved a compression ratio of 72.08% with a similarity score of 334,029. These results reflect the trade-off between compression efficiency and trajectory fidelity. Since better performance is indicated by a lower similarity score at a comparable compression ratio, two-stage CPLS outperforms DP in trajectory compression, with better retention of crucial spatial and contextual information. These findings provide a point of reference for comparing the performance of the proposed CADP algorithm, whose goal is the simultaneous optimization of both compression efficiency and context retention.
3.3. Effects of CADP on Trajectory Prediction
Accurate vessel trajectory prediction can improve safety management at sea. To evaluate CADP and the effects of a good compression method, this paper used a Long Short-Term Memory (LSTM) network for trajectory prediction. This method is widely used in maritime systems and applications, including collision avoidance, vessel route planning, and anomaly detection systems [42,45,46,47,48].
All trajectory datasets, compressed by DP, two-stage CPLS, and CADP, are divided into two main groups. The first group contains 80% of the data and is used for modeling (training). The remaining data are used as a test set for accuracy assessment. Parameter selection is a crucial task for recurrent neural networks and is a trial-and-error process. The Adam optimizer [49], a stochastic gradient descent method with adaptive learning rates, was used in the experiments. Including more hidden layers in a deep neural network can enhance the model's performance and learning ability; each layer contains 135 hidden units. The gradient threshold was set to 1 to prevent exploding gradients. Furthermore, the initial learning rate was 0.004, and the learning rate was reduced after 140 epochs by multiplying by a factor of 0.2. The LSTM takes three steps of a trajectory and predicts the next three steps, i.e., the average prediction interval is 15 s. According to the average speed of vessels in the dataset, a vessel moves about 141 m in 15 s. Evaluation results are summarized in Table 2.
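The hyperparameters listed above can be assembled into a model skeleton as follows. The paper does not specify the exact architecture, framework, or layer count, so this PyTorch sketch is an illustrative stand-in: the class name, the two-layer choice, and the linear output head are assumptions; only the hidden size (135), optimizer (Adam, lr 0.004), learning-rate drop (×0.2 after 140 epochs), gradient threshold (1), and the 3-step-in/3-step-out setting come from the text.

```python
import torch
import torch.nn as nn

class TrajPredictor(nn.Module):
    """Reads 3 trajectory steps and predicts the next 3 (x, y) positions."""
    def __init__(self, n_features=2, hidden=135, n_layers=2, horizon=3):
        super().__init__()
        self.horizon = horizon
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_features * horizon)

    def forward(self, x):                  # x: (batch, 3, n_features)
        out, _ = self.lstm(x)
        y = self.head(out[:, -1])          # last hidden state -> future steps
        return y.view(-1, self.horizon, x.size(-1))

model = TrajPredictor()
opt = torch.optim.Adam(model.parameters(), lr=0.004)
# learning rate multiplied by 0.2 after 140 epochs, as reported
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=140, gamma=0.2)
# gradient threshold of 1, applied after loss.backward() in the training loop:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

A forward pass on a batch of 3-step input windows returns a tensor of 3 predicted steps per sample, matching the evaluation setup of Table 2.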
The results highlight the ability of context-aware compression algorithms such as CADP and two-stage CPLS to reduce complexity while preserving the information in the trajectory data that is relevant for prediction. According to Table 2, the validation RMSE values demonstrate a general performance improvement for context-aware compressed trajectories compared to their counterparts in most situations. DP-compressed data perform much better than uncompressed trajectories, with an RMSE of 0.012226 versus 0.048789. This is because compressing spatial trajectories significantly reduces the prediction error by removing redundant or less critical points that contribute little to trajectory prediction accuracy, thereby reducing the complexity of the data during LSTM training.
The two-stage-CPLS-compressed trajectories result in an RMSE of 0.0047859. This is a substantial improvement over DP-compressed data, but the error is larger than that of the best-performing CADP variant, Spatial + Water Depth. This likely implies that including topological relationships and contextual features in two-stage CPLS retains important trajectory information essential for good prediction accuracy. However, the relatively higher RMSE compared to CADP might indicate that two-stage CPLS does not prioritize certain contextual dimensions as much as CADP does, since CADP is tailor-made to include environmental factors such as water depth and wave characteristics. The CADP-compressed trajectories contain specific contextual information, such as water depth, wave height, and wave direction, and achieve the best overall prediction accuracy. This is probably because context-aware compression retains essential spatial information while integrating environmental factors that significantly influence vessel movement.
Spatial + Water Depth (Compressed by CADP) yields the lowest RMSE of 0.004195, suggesting that water depth plays a significant role in accurately predicting vessel trajectories. This may be because vessels adjust their movements based on water depth, particularly in shallow or coastal regions.
Spatial + Wave Height (Compressed by CADP) results in an RMSE of 0.005792, meaning that wave height, though important, is somewhat less influential in these data than water depth; it still reduces the error and provides a better prediction than a DP-compressed dataset.
Spatial + Wave Direction (Compressed by CADP) shows an RMSE of 0.004738, demonstrating that wave direction influences vessel movement and prediction accuracy; it provides a better prediction than DP-compressed and two-stage-CPLS-compressed trajectories.
The model focuses on the most critical features by compressing trajectories before training and avoids overfitting redundant data. Two-stage CPLS and CADP provide advantages over DP by incorporating contextual features that improve prediction accuracy. However, CADP’s tailored approach to specific contextual dimensions, such as water depth, achieves the best performance, highlighting its effectiveness in reducing RMSE while maintaining computational efficiency. The results suggest that compression methods enhance prediction accuracy and optimize the training process by distilling essential trajectory characteristics.
3.4. Computational Efficiency Analysis
The worst-case theoretical time complexity of the proposed CADP algorithm is O(n²), which matches the DP algorithm. This means the changes introduced by CADP to add contextual awareness did not harm DP's theoretical time complexity. In contrast, the two-stage CPLS has a worst-case theoretical time complexity of O(n³), reflecting a higher computational overhead than CADP. This supports CADP's scalability and shows that adding contextual domains was achieved without sacrificing computational efficiency. Such an efficient compression method therefore has significant potential for enhancing a wide range of trajectory data analysis applications, such as location-based services, movement pattern analysis, and transportation optimization [8].
Besides the above theoretical time complexity evaluation, runtime measurements were conducted on trajectory datasets of varying sizes to benchmark the performance of the proposed CADP algorithm against the original DP algorithm. The experiments were performed in a controlled test environment using an AMD Ryzen 7 7700X 4.5 GHz CPU, 16 GB RAM, and an NVIDIA GeForce RTX 4060 Ti GPU. The experiments aimed to determine how much time both algorithms require to perform the task, compressing trajectories with a different number of points (n).
Annotated trajectory datasets of five sizes were prepared for the experiments: 1000, 5000, 10,000, 50,000, and 100,000 points per trajectory. In all sets of experiments, the spatial compression threshold (ϵ) was fixed at 10 m, and for CADP, the wave height, wave direction, and bottom depth thresholds were 0.15 m, 1.5 degrees, and 80 m, respectively. Each experiment was repeated 10 times to account for variability, and the average runtimes were recorded (Table 3 and Figure 4). Since the runtime of the two-stage CPLS method was significantly different from the other algorithms, a separate figure (Figure 5) was created to compare the runtime performance of DP and CADP. This gives a better view of their relative scalability and computational efficiency.
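The repeated-measurement protocol described above can be sketched as a small timing harness (the function name and interface are illustrative; the actual benchmark code is not part of the paper):

```python
import time
import statistics

def benchmark(compress_fn, trajectory, repeats=10):
    """Average wall-clock runtime in milliseconds of a compression
    function over several repetitions, mirroring the protocol of
    10 runs per dataset size."""
    runtimes = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        compress_fn(trajectory)
        runtimes.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(runtimes)
```

Using a monotonic high-resolution clock (`time.perf_counter`) and averaging over repetitions reduces the influence of scheduler noise on the reported runtimes.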
The theoretical time complexity of the two-stage CPLS method is O(n³), which is significantly higher than the O(n²) complexity of both the DP and CADP algorithms. The cubic time complexity results from how the method examines topological relationships for every point in the trajectory, including intensive operations such as testing pairwise relations and maintaining contextual and topological consistency along the trajectory. This O(n³) complexity becomes prohibitive for large trajectory sizes, with a far higher runtime than O(n²) algorithms. For example, while CADP takes roughly 3355 milliseconds to process 100,000 points, the CPLS method requires upwards of 127,315 milliseconds on the same dataset. This huge increase further emphasizes how computationally inefficient CPLS is for large-scale applications.
In contrast, the CADP algorithm shows that contextual parameters can be incorporated without a significant increase in computational complexity; it keeps the O(n²) complexity of the same class as DP. Thus, CADP balances the two goals well and is more suitable for large-scale trajectory analysis. This difference, both in theoretical time complexity and in empirical performance, shows that CADP enhances scalability for large-scale and real-world application scenarios with finite computational resources compared to other methods.
The results of the runtime analysis give an idea of the practical impact of this difference in complexity. For instance, whereas the DP algorithm takes 3190 milliseconds to process 100,000 points and the computationally most expensive variant of CADP (Spatial + Wave Direction) takes 3355 milliseconds, the two-stage CPLS method requires 127,315 milliseconds for the same number of points. Such a huge contrast reveals the inadequacy of CPLS for large-scale applications, where its runtime is almost 38 times higher than that of CADP.
Although the integration of contextual dimensions introduces additional overhead, CADP does so without increasing its theoretical time complexity beyond O(n²). For example, the CADP (Spatial + Water Depth) variant processes 100,000 points in 3290 milliseconds, while the CADP (Spatial + Wave Height) and CADP (Spatial + Wave Direction) variants take 3340 and 3355 milliseconds, respectively. This modest increase over DP, at 3190 milliseconds, demonstrates how efficiently CADP balances context awareness and scalability. By comparison, the two-stage CPLS method, though effective in retaining topological relationships and spatial consistency, has a prohibitive computational expense for large datasets. These results highlight the advantage of CADP, which combines the computational efficiency of O(n²) algorithms with multiple contextual dimensions, making it more practical for large-scale trajectory compression and analysis tasks.
3.5. Limitations
While Equation (8) copes very well with nominal and non-nominal contexts, its applicability depends directly on the quality and consistency of context data. Inconsistent quality or sampling rates may affect the robustness of the error measure. Moreover, the integration of contextual domains has higher computational costs than a standard DP algorithm. Further optimizations, such as parallel processing or adaptive thresholds, might be investigated to overcome drawbacks and enhance scalability.
Another limitation of this study is data availability, as evaluating the proposed CADP algorithm relies on AIS data and environmental context. Although AIS data are widely used for maritime trajectory analysis because of their rich spatial and contextual information, the applicability of CADP to semantic contexts and other types of trajectory datasets, such as those from pedestrians, vehicles, or aircraft, remains unexplored. This is mainly due to the lack of a semantic dataset for the analyzed trajectories. While trajectory datasets for land- or air-based movements are more often openly available, the contextual data required, such as traffic flow, road maps, or atmospheric conditions, are either unavailable or restricted to a few public repositories. The current study, therefore, focuses on demonstrating the efficiency of CADP in maritime applications using AIS data.
While DTW captures temporal distortions effectively and gives a robust measure for spatial alignment, it does not explicitly assess semantic similarity, reflecting the retention of meaningful contextual transitions or features. As a result, the study may not fully explore the impact of context integration on maintaining semantic fidelity in compressed trajectories.
4. Conclusions
This study introduced a novel Context-Aware Douglas–Peucker (CADP) trajectory compression method that integrates both spatial dimensions and environmental and semantic contextual factors, aiming to enhance the quality and utility of compressed trajectories. Traditional trajectory compression methods, such as the Douglas–Peucker (DP) algorithm, focus on spatial characteristics, often neglecting the influence of critical contextual factors like weather or ocean conditions, which can lead to the loss of valuable information during compression. In contrast, the CADP method addresses these limitations by incorporating external environmental data, including water depth, wave height, and wave direction, into the compression process.
The results demonstrate that the CADP method performs significantly better in maintaining trajectory similarity and improving trajectory prediction accuracy. Including contextual information, such as water depth, led to a Root Mean Square Error (RMSE) of 0.004195 in an LSTM-based trajectory prediction, outperforming uncompressed, DP-compressed, and two-stage-CPLS-compressed datasets. This improvement underscores the importance of integrating context into compression algorithms to preserve moving objects’ behavior and movement patterns, especially in complex environments like maritime navigation.
Moreover, the CADP method maintains traditional DP compression’s computational efficiency with a worst-case O(n²) complexity. This ensures that the method is scalable and effective for large datasets, such as those generated by Automatic Identification Systems (AIS) in maritime traffic monitoring. By preserving critical trajectory points influenced by contextual factors, CADP enhances the dataset’s representativeness without significantly increasing its volume, making it suitable for further analysis in applications such as collision avoidance, vessel route planning, and anomaly detection systems.
Furthermore, experiments show that CADP compresses trajectories while effectively embedding contextual domains like water depth, wave height, and wave direction with negligible effects on scalability and computational efficiency. The algorithm is a practical and dependable extension of the DP algorithm; hence, it can be very suitable for many context-aware applications involving trajectory analytics, such as maritime navigation, environmental monitoring, and road transportation systems.
This study shows how context-aware trajectory compression bridges the gap between spatial accuracy and contextual domains. CADP, therefore, represents a practical, scalable, and high-impact contribution to trajectory analysis: it addresses very real challenges in maritime navigation while opening very promising avenues for extensions into other industries.
Future research should be directed at embedding semantic context, like the activity of moving objects or operational purposes, into trajectory compression and analysis. This dimension extends the CADP algorithm to richer scenarios in which semantic information plays an important role in shaping the behavior of trajectories. It will also enable an investigation into how semantic context influences compression efficiency, semantic similarity, and prediction accuracy, further establishing the adaptability and robustness of the CADP framework.
Moreover, extending the CADP algorithm to more classes of trajectories, such as those generated by the movement of vehicles or aircraft, will allow for greater generalization. To that end, domain-specific contextual datasets will have to be sourced for roads; this may involve accessing a road network, traffic flow information, atmospheric data, or synthetic data that can be created to test various domains. Further support for such an extension can be achieved by collaborating with data providers or generating synthetic contextual datasets to enable a better demonstration of the adaptability of CADP to diverse scenarios.
Future work should explore the effects of spatial autocorrelation and uncertainty in trajectory compression, as these factors can influence both the data integration process and the compression quality. Addressing these challenges will improve the robustness and applicability of context-aware trajectory compression in a broader range of geospatial and transportation systems.
Also, future research should be channeled into developing trade-off curves that can consider how best to balance real-time and offline trajectory processing in shipping. These curves can consider the size of a ship, type of freight, environmental dynamics, and respective costs related to processing or memory. These quantified trade-off curves will give actionable insights into optimizing trajectory generation and compression strategies for the concrete needs of industry stakeholders.
Maritime trajectory generation and analysis occupy a special position between the robotics and spacecraft domains. While maritime navigation, like the robotics domain [50,51,52], is concerned with dense environments and safety-related constraints, it also requires energy efficiency in route planning, similar to the spacecraft domain [53,54,55]. On the other hand, unlike spacecraft, ships need to consider external forces such as waves, currents, and tides [56]. Therefore, tailored approaches like CADP are needed: CADP embeds environmental and contextual factors and balances spatial accuracy, fuel efficiency, and safety.
Therefore, further research is required to translate trajectory planning methods from robotics and spacecraft to the maritime domain. Strategies for dynamic obstacle avoidance may be adopted from robotics or energy optimization techniques from space. Further, trade-off curves can be drawn on the balance between offline and real-time trajectory processing with respect to ship size, freight type, and environmental conditions. This could provide a cross-domain contribution to improving the robustness and applicability of trajectory planning.