Article

Vehicle Trajectory Recovery Based on Road Network Constraints and Graph Contrastive Learning

1 SHU-UTS SILC Business School, Shanghai University, Shanghai 201899, China
2 Smart City Research Institute, Shanghai University, Shanghai 201899, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(8), 3705; https://doi.org/10.3390/su17083705
Submission received: 8 March 2025 / Revised: 11 April 2025 / Accepted: 17 April 2025 / Published: 19 April 2025
(This article belongs to the Collection Urban Street Networks and Sustainable Transportation)

Abstract

Location-based services and applications can provide large-scale vehicle trajectory data. However, these data are often sparse owing to human factors and faulty positioning devices, making them difficult to use in research tasks that require precision and degrading the efficiency and optimization of sustainable transportation systems. Therefore, this paper proposes a trajectory recovery model based on road network constraints and graph contrastive learning (RNCGCL). Vehicles must drive on the road, and their driving processes are affected by the surrounding road network structure. Motivated by these observations, bidirectional long short-term memory neural networks and an attention mechanism were used to obtain the spatiotemporal features of trajectories, graph contrastive learning was applied to extract local feature representations of the road network, and a multi-task module was introduced to guarantee that the recovered points are strictly projected onto the road network. Experiments showed that RNCGCL outperformed other benchmarks: it improved the F1-score by 2.81% and decreased the error by 8.62%, indicating higher accuracy and lower regression errors. Furthermore, this paper validates the effectiveness of the proposed method through case studies and downstream task performance. This study provides a robust solution for trajectory data recovery, contributing to the overall efficiency and sustainability of transportation.

1. Introduction

As urbanization speeds up, establishing intelligent traffic systems (ITS) is crucial for sustainable transportation development [1]. The widespread use of mobile devices equipped with a Global Positioning System (GPS) has been facilitated by the advancement of information technology. Traffic trajectory data from GPS-equipped mobile vehicles provide extensive information on vehicle movements [2]. Valuable traffic insights can be gained by processing and analyzing these data, thereby enhancing the sustainable development of ITS research [3]. These data can be analyzed to gain insights into traffic patterns, aiding in urban planning [4] and autonomous driving technology [5]. They are vital for improving traffic flow, energy efficiency, and understanding movement, which can optimize transportation systems [6]. They also underpin applications such as travel time estimation [7] and traffic demand prediction [8,9].
Many studies on location-based tasks depend on high-quality trajectories, including next location prediction [10], traffic flow estimation [11], and destination recommendation [12]. ITS research requires high-precision trajectories because low-sample rate trajectories miss detailed information and introduce more uncertainty [13]. However, real-world trajectory data are often sparse and of low quality due to factors such as unstable equipment [14,15]. Issues such as low sampling rates, positioning errors, and transmission instability can cause data to be missing or inaccurate [16]. Many real-life trajectories have a low sampling rate; for instance, taxis typically report GPS locations every 2 to 6 min to save energy [17]. This impairs the available information and affects subsequent trajectory-related work and research [18]. Additionally, navigation can introduce position drift and calibration errors, which refer to the discrepancy between the actual location and the map display, causing further inaccuracies [19]. These challenges make trajectory recovery crucial but complex. Compared with conventional location information, irregular location points further complicate the study of trajectory recovery [20]. Moreover, manually shutting down sampling equipment or data transmission failures may cause the loss of partial position information [21]. This can lead to non-uniform, low-sampling rate, and noisy trajectory data.
Low sampling rates and noisy trajectories provide less information and more uncertainty when studying the mobility of objects. Therefore, restoring a complete and precise trajectory is essential. The purpose of trajectory recovery is to recover high-sampling rate trajectories from low-quality trajectory data while ensuring their accuracy, as shown in Figure 1.
Specifically, the trajectory recovery task has two main goals. The first is to recover the high-sampling rate trajectory from the original low-sampling rate trajectory. The second is to match the recovered trajectory to the real road network [22]. Traditional methods can be divided into interpolation-based and statistical learning-based methods [23]. Linear interpolation has poor accuracy and ignores the spatial and temporal relationships of the trajectory [24]. In recent years, deep learning methods have been widely used in the study of intelligent transportation systems [18]. Xia et al. [17] proposed AttnMove, which utilizes attention mechanisms to recover trajectories by modeling regularity and periodic patterns in user flow. Ren et al. [25] proposed a sequence-to-sequence model, MTrajRec, which first combined trajectory recovery and map matching through multi-task learning. These methods often treat trajectory data as time series and neglect spatial properties and the overall information of the trajectory, for example, by predicting missing points only from the preceding portion of the trajectory. Existing trajectory restoration methods typically ignore the constraints of the road network, which detaches the recovered trajectory data from the real map and does not conform to real situations. Trajectory features are typically fused with road network features to improve the accuracy and stability of trajectory recovery. However, urban road networks are dynamic and intricate systems with diverse road functions and relatively complex structures. The conventional approach of road-based analysis [26] concentrates solely on the attribute information of the road, while the impact of the overall urban road structure on relevant features is often disregarded. In addition, before discrete trajectory data can be applied to other work in the transportation field, map matching must be performed to guarantee that trajectory points are accurately mapped onto actual roads [27]. This means that restoring missing data and applying them to other tasks still requires multiple steps, which is inefficient and prone to inaccurate results, as shown in Figure 1.
To address the limitations of existing trajectory recovery methods, a trajectory recovery model that integrates road network constraints and graph contrastive learning (RNCGCL) is proposed in this paper. The model is based on two insights: vehicles must drive on the road network, and the surrounding road network information plays a guiding role in the driving process. The key innovation lies in combining road network constraints with graph contrastive learning to enhance trajectory recovery accuracy and ensure alignment with real road networks. Specifically, the model employs bidirectional long short-term memory neural networks and attention mechanisms to capture spatiotemporal features, constructs local weighted road network graphs for each trajectory point, extracts node representation vectors through contrastive learning, and introduces a multi-task module based on Ren et al. [25] to accurately constrain the recovered trajectories onto the road network. The main innovations of this study are as follows:
  • Road network constraints. The model incorporates map matching to ensure that reconstructed trajectories align with actual road networks. Additionally, road network and GPS trajectory representations are fused to enhance the accuracy and reliability of recovery.
  • Graph contrastive learning. Local road network graphs were created for each trajectory point. Additionally, weights were assigned based on the distance of the nodes in the graph. Node representation vectors were extracted through contrastive learning of the local graphs to learn more spatial semantic information.
  • Multi-task recovery process. A multi-task learning module decomposes the recovery process into predicting the road segment to which each point belongs and predicting its location on that road. This helps the restored points map correctly onto the corresponding road network, reducing overfitting and improving stability.
  • Comprehensive recovery effect evaluation. The performance of RNCGCL is validated through extensive experiments, including comparison, ablation, parameter sensitivity, and robustness tests, using real-world datasets and various evaluation metrics. The effectiveness of the proposed methodology was comprehensively assessed by visualizing case studies and analyzing the downstream task performance.
This paper is structured as follows: Related works are generally reviewed in Section 2. The relevant definitions and research questions are described in Section 3. Section 4 introduces the proposed RNCGCL model and all the modules. The presentation of the experimental settings and results can be found in Section 5. Section 6 provides a summary of the study and directions for future research.

2. Related Work

Trajectory data have spatiotemporal characteristics, and extracting effective trajectory features is a key factor that affects recovery results. In research on vehicle trajectory data, the role of the road network is significant, and the extraction and integration of road network features are also key points. In addition, many methods of contrastive learning and deep learning have also been widely applied to the study of trajectory data. Therefore, this paper reviews trajectory and road network representation learning, contrastive learning, and trajectory recovery methods.

2.1. Trajectory and Road Network Representation Learning

Representation learning has been widely used as a fundamental task in the field of trajectory data processing [28]. It involves embedding sparse, low-dimensional sequences into more dense vectors [29]. The methods are mainly divided into word embedding [12] and graph embedding [30]. Table 1 provides an overview of trajectory and road network representation learning methods.
Trajectory data are context-sensitive sequence data [27]. However, unlike words in natural language processing, trajectory points consist of a series of coordinates representing latitude and longitude. Before vectorizing geographic coordinates, they must be uniquely indexed in a way that preserves their geographical information [34]. Existing studies treat trajectory data as sequence data for representation learning, and common sequence modeling methods include models based on recurrent neural networks (RNN), long short-term memory networks (LSTM), gated recurrent units (GRU), and transformers [35]. For example, Trembr, proposed by Fu et al. [36], first employed a skip-gram to obtain the embedding vectors of road segments and then extracted the trajectory representation using LSTM networks. The t2vec model proposed by Li et al. [16] divided the map space into grids and applied a GRU to model the grid sequences. One-hot encoding and word2vec are common ways to realize embedding [31]. The former is rarely used in trajectory processing because each vector is orthogonal to the others [44], which loses spatial connectivity and may seriously disturb trajectory recovery [45]. Another method to obtain word vectors is word2vec [46]. Mikolov et al. [32] introduced skip-gram, which utilizes neural networks to map words into feature vectors. Pennington et al. [33] proposed a global vector model that incorporates both a global interaction matrix and a local context window. Trajectory data can be fragmented to generate positional embeddings; however, with this representation learning approach, the spatial hierarchical relationships of the trajectories are lost [47].
In the representation learning of road networks, the road network is often regarded as a graph in which the edges represent road sections and the vertices represent intersections. In the field of graphs, many researchers have recently developed graph-embedding methods suitable for network graphs [11,48]. Graph embedding captures the topology of a graph and assigns nodes to low-dimensional, dense vectors. Commonly used graph embeddings include Node2vec [38] and DeepWalk [37]. However, road networks are complex and large, and these embedding methods have low computational efficiency and cannot effectively extract spatial structure information. The advancement of graph neural networks (GNN) led to the introduction of various graph convolutional architectures, including graph convolutional networks (GCN) [39], graph attention networks (GAT) [40], and GraphSage [41]. GAT incorporated the self-attention mechanism into graph representation learning. GraphSage used an LSTM to aggregate the feature information of neighbor nodes and simultaneously learns the topology of each node's neighborhood and the distribution of node features within it. DGTM [42] integrated DeepWalk and GAT to capture node features and graph structures more effectively by considering the context information of nodes as well as neighborhood features. Some scholars have proposed work specifically for representation learning in road networks [43,49]. The spatiotemporal dual graph neural network (STDGNN) [49] models the road network from dual views, where the node GCN captures intersection features and the edge GCN handles section features. Toast [43] proposed a traffic context-aware word-embedding module for road network representation, which takes as input the sequence of grids through which a road section passes; the model utilizes an RNN to analyze the correlations in the grid sequence and a GNN to model the graph structure. Both neighborhood characteristics and graph topology information are efficiently captured by the learned road network embedding vectors. However, on the one hand, the representation vectors obtained by this encoding method lack discrimination; that is, it is impossible to judge whether they belong to the same class in the vector space. On the other hand, the learned representation vectors are limited to the original data and lack generalization.
In summary, trajectory representation learning methods based on word embedding extract contextual information but neglect spatial relationships and the impact of the surrounding environment. Graph embedding methods for road network representation focus on spatial information but fail to reflect the dynamic relationships between trajectory nodes. The guiding role of the road network structure on trajectories is ignored, and the resulting representation vectors lack discrimination.

2.2. Contrastive Learning

Contrastive learning has gradually been applied to representation learning [50,51]. In contrastive learning, representation vectors are learned through comparison: positive samples are brought closer together in the vector space while negative samples are pushed farther away. This idea can be summarized as minimizing the distance between similar data points and maximizing the distance between dissimilar ones. Using contrastive learning to extract rich information from data sources can alleviate data sparsity. Deep Graph Infomax (DGI) is a technique that enhances the mutual information between local and global representations of nodes in a graph [50]; it learns graph representations by comparing the nodes of the graph. The graph mutual information model (GMI) [52] learns embedding vectors of nodes and edges by comparing the inputs and outputs of a GNN. DMGI [53] exploits mutual information between local and global graph representations to integrate node embeddings and introduces a consistency regularization framework to minimize the differences between relation-type-specific node embeddings. CPT-HG [54] uses contrastive learning to obtain higher-order semantic and structural information about graphs through relational and subgraph-level training tasks. Qiu et al. [51] introduced GCC, which applies contrastive learning to graph data and develops a self-supervised graph neural network pretraining framework. This framework helps capture common network topological properties across multiple networks: the training task discriminates subgraph instances in the network, and the graph neural network learns generalized structural representations through contrastive learning.
By contrasting positive and negative sample views, the encoder can be encouraged to learn rich representation vectors. To achieve good contrast, effective and reasonable data augmentation is essential. Various augmentation schemes have been proposed in previous studies, including edge removal, feature hiding [55,56], and perturbation transformation [57]. Contrasting node and graph representations from positive and negative views yields better results. However, these studies remain within the framework of node or graph comparison, and the characteristics of the road network and trajectory movement are ignored.
In summary, there are complex spatiotemporal and semantic relationships between trajectories and the road network. This paper proposes a method to establish a relationship between these two forms of data, the road network and the trajectory, which can be regarded as a topological structure and a sequential structure, respectively. Local graphs of the surrounding road network are created for each trajectory point. The weighted subgraphs obtained are then used for contrastive learning to extract feature representation vectors of the road nodes. In this way, the model can learn the road network and the trajectory representations simultaneously, ultimately extracting more comprehensive spatial semantic information.

2.3. Trajectory Recovery Methods

Trajectory recovery refers to completing missing values based on trajectory information when data are available before and after the gap [25]. The task of trajectory recovery is to enhance completeness and accuracy by restoring the missing points of a sparse trajectory. Existing research on trajectory recovery can be categorized into two primary groups: models that utilize statistical techniques and neural network models based on deep learning [2]. Table 2 provides an overview of trajectory recovery methods.
Statistical methods use probabilities or rules to estimate missing locations in trajectories [46]. Linear interpolation has poor accuracy and ignores the spatial and temporal relationships of the trajectory [24]. Markov models, which assume future states depend only on the current state, are commonly used [19,58]. Chen et al. [28] proposed a trajectory predictor, NLPMM, based on a Markov process, which models the movement trajectory from both global and individual aspects. The study by Mathew et al. [46] employed a model that first categorizes user trajectories based on historical data by grouping trajectories with similar motion patterns into the same class; the model then uses the data from each cluster to train a hidden Markov model. Compared with a single Markov model, these two methods preprocess and then model the trajectory according to different information, which can more fully mine the various characteristics of the trajectory data.
Deep learning-based methods often treat the trajectory recovery problem as a prediction problem, using data before and after the lost location to predict the missing location [7,9]. Wang et al. [9] proposed a two-stage solution known as DHTR, which first recovers the high-sample trajectory and subsequently employs a map-matching algorithm, the hidden Markov model (HMM) [13], to recover the real GPS locations. The model employs a sequence-to-sequence architecture in conjunction with the calibration component of a Kalman filter to recover complete trajectories from sparse ones. Ren et al. [25] proposed an encoder–decoder model called MTrajRec that employs multi-task learning to integrate trajectory recovery and map matching. Both DHTR and MTrajRec follow an encoder–decoder structure; however, MTrajRec uses RNNs as the basic units of the encoder and decoder, which makes it relatively difficult for the model to handle long trajectories [17]. Furthermore, the DHTR model disregards the spatial arrangement of the roads, so its results may not be consistent with the geographical area. Xi et al. [8] suggested a bidirectional sequential model named Bi-STDDP, which incorporates both bidirectional spatiotemporal dependencies and user preferences to capture intricate patterns. Xia et al. [17] introduced AttnMove, an approach that applies a range of attention mechanisms within and between trajectories to identify different patterns along the time dimension. Despite this, the model disregards the road network structure, which leads to reconstructed trajectories that do not align with the actual map. AttnMove and Bi-STDDP are tailored to individual users and are not designed to be universally applicable. Furthermore, the map-matching task [59] must be performed after trajectory recovery when using these methods. Map matching involves aligning the GPS points in a trajectory with the road network and serves as a fundamental preprocessing step. It not only improves road-based applications, such as vehicle navigation [60] and travel time estimation [61], but also enriches trajectories with more semantics and supports driver-based applications such as behavior analytics [62]. Conventional methods address trajectory recovery constrained by the road network using a two-stage approach. Although map matching can be performed on the recovered trajectories, the accumulation of inference errors is a concern. Furthermore, the two-stage approach is inefficient due to the time cost of the map-matching algorithm [27].
In summary, existing statistical methods have high interpretability, but the corresponding transition matrix cannot be established when some locations cannot be directly observed. Deep learning-based approaches can effectively identify intricate and nonlinear relationships in trajectories that may not be apparent through conventional techniques. However, existing deep learning-based methods usually ignore the constraints of the road network, which detaches the recovered trajectory data from the real map and does not conform to the real situation. Other models integrate complex features from multiple sources of information, but this user-based data processing is not universal. The two-stage scheme addresses road network constraints, but it accumulates errors and is inefficient.

3. Problem Description and Definition

To more effectively convey the research content in this paper, this section thoroughly outlines the relevant concepts that are present in the trajectory recovery issue and explicitly defines the trajectory recovery problem.
Definition 1 (Trajectory $T$).
The trajectory described in this paper refers to the trajectory of a vehicle in a moving state. In the trajectory recovery problem, a trajectory is usually represented by a sequence of continuously sampled GPS points, each accompanied by a timestamp so that all GPS locations can be arranged in the order of sampling time, which can be expressed as:
$$T = \{P_1, P_2, \ldots, P_n\}$$
where for each timestamp $t$, $t \in \{1, \ldots, n\}$, $P_t = (x_t, y_t, t)$ represents the GPS point with longitude $x_t$ and latitude $y_t$. A trajectory illustration is shown in Figure 2.
Definition 2 (Road network $G$).
The constructed road network includes geographic information such as road distance, road type, longitude and latitude of road intersections, and the connection relationships between roads. A road network can be described as an undirected graph using the following formula:
$$G = (V, E, F, A)$$
where $V = \{v_1, v_2, \ldots, v_t\}$ denotes the set of nodes, representing road segments; $E = \{e_1, e_2, \ldots, e_l\}$ represents the set of connections between nodes. For each $e \in E$, $e = (v^s, v^d)$, where $v^s$ is the start node and $v^d$ is the end node, and $(v_x^s, v_y^s)$ and $(v_x^d, v_y^d)$ represent the coordinates of the two ends of the road, respectively. $F = \{f_1, f_2, \ldots, f_t\} \in \mathbb{R}^{|V| \times 8}$ represents the feature set of the road segments, where $f_t = (length, level_i)$; the former denotes the length of the road, and $level_i$ ($i = 1, \ldots, 7$) denotes the type of the road segment, including main roads, streets, and so on. $A \in \mathbb{R}^{|V| \times |V|}$ represents the adjacency matrix of the graph: when two nodes are connected, the corresponding entry is 1; otherwise, it is 0.
Definition 3 (Constrained trajectory $\bar{T}$).
GPS data collected from the real world may have shifting errors, meaning they deviate from the actual road. In traditional trajectory interpolation recovery methods, map matching must be performed after interpolating the trajectory points to restore the real trajectory. Constructing an end-to-end trajectory recovery method eliminates this step and improves application efficiency. Before recovering and applying the trajectory data, each trajectory point is matched to its corresponding road segment in this paper.
For each trajectory point that has been matched by the map, it is expressed as follows:
$$p_t = (v_t, z_t)$$
where $v_t$ is the road segment number and $z_t$ is the location proportion of the vehicle relative to the segment, that is, the distance traveled along the road divided by its total length. The illustration of the location proportion is shown in Figure 3. The conversion formulas between the original point $P$ and the matched point $p$ are as follows:
$$P_t.y_t = p_t.v_y^s + p_t.z \times \left(p_t.v_y^d - p_t.v_y^s\right)$$
$$P_t.x_t = p_t.v_x^s + p_t.z \times \left(p_t.v_x^d - p_t.v_x^s\right)$$
The trajectory after map matching can be denoted as $\bar{T} = \{p_1, p_2, \ldots, p_n\}$. In addition, it can also be represented as a sequence of road segments $\bar{T}_{road} = \{v_1, v_2, \ldots, v_n\}$. The concept of the constrained trajectory is shown in Figure 4.
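To make the conversion concrete, the following minimal Python sketch (an illustration, not the paper's code; the field and function names are assumptions) recovers GPS coordinates from a matched point by linear interpolation along the segment endpoints:

```python
# Illustrative sketch (not from the paper): recover (x_t, y_t) from a matched
# point, i.e., a road segment and a location proportion z in [0, 1].
from dataclasses import dataclass

@dataclass
class Segment:
    vx_s: float  # longitude of the start node
    vy_s: float  # latitude of the start node
    vx_d: float  # longitude of the end node
    vy_d: float  # latitude of the end node

def matched_to_gps(seg: Segment, z: float) -> tuple[float, float]:
    """Linear interpolation along the segment, mirroring the conversion formulas."""
    x = seg.vx_s + z * (seg.vx_d - seg.vx_s)
    y = seg.vy_s + z * (seg.vy_d - seg.vy_s)
    return x, y

# Example: a vehicle 40% of the way along a segment in central Porto.
print(matched_to_gps(Segment(-8.61, 41.14, -8.60, 41.15), 0.4))
```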
Definition 4 (Missing trajectory $\tilde{T}$).
For a complete real trajectory, a missing mask matrix $M = (m_1, m_2, \ldots, m_n)$ labels the missing trajectory points, which is denoted as follows:
$$\tilde{T} = T \cdot M^{T} = \{\tilde{P}_1, \tilde{P}_2, \ldots, \tilde{P}_n\}$$
where $\cdot$ represents matrix multiplication; for each timestamp $t$, $m_t \in \{0, 1\}$ indicates whether the trajectory point at time step $t$ is missing: $m_t = 1$ indicates that the trajectory point is not missing, and $m_t = 0$ indicates that it is missing.
Problem description (Trajectory recovery). 
For a given sparse trajectory $\tilde{T} = \{\tilde{P}_1, \tilde{P}_2, \ldots, \tilde{P}_n\}$, the problem of trajectory recovery is to recover the low-sampling rate trajectory into a complete real trajectory $T = \{P_1, P_2, \ldots, P_m\}$ such that the recovered trajectory can be accurately projected onto the actual road network, forming a complete vehicle driving trajectory.

4. Methodology

The continuity of the trajectory and the correlation between the trajectory points are similar to the relationship between words in a sentence in natural language processing. An encoder–decoder framework [63] can be applied in trajectory processing. Although the conventional approach can progressively generate trajectory points step by step, it cannot ensure that the generated trajectories will accurately align with the road network.
Vehicles must drive on the road network. The surrounding road network also has a certain guiding role in the process of vehicle driving. Therefore, this paper proposed a trajectory recovery model based on road network constraints and graph contrastive learning, namely RNCGCL. The model can recover trajectory points by utilizing correlation information in the trajectory points and matching them to the actual road network simultaneously.
In this section, details of the proposed model RNCGCL are provided, which mainly include three modules: a trajectory sequence processing module, a road network local graph contrastive module, and a trajectory recovery multi-task module. The overall research framework is first described in Section 4.1. Then, each major module in this model is specifically introduced in Section 4.2, Section 4.3 and Section 4.4.

4.1. Research Framework

The proposed model RNCGCL utilized an encoder–decoder framework. The encoder captured the sequence relationship of the trajectory points. Additionally, it extracted spatial structure information from the local road network graph and passed it to the decoder. Then the decoder recovered the missing points through the encoding vector obtained from the encoder. There are three main modules in the proposed RNCGCL: (1) trajectory sequence processing, (2) road network local graph contrastive learning, and (3) multi-task trajectory recovery, each of which can be summarized as follows:
In the trajectory sequence processing module, the constraint effect of the road network on vehicle behavior is considered. Map matching is implemented to process the data and guarantee that each reconstructed point can be accurately aligned with the corresponding road network. For the trajectory data, a bidirectional long short-term memory neural network is used to obtain trajectory vectors, and a spatiotemporal attention mechanism is introduced to enhance the model's ability to learn the spatiotemporal characteristics and overall information of trajectories.
In the road network local graph contrastive module, graph theory is introduced into the construction of road networks, and a topology model of the urban road network based on road nodes and edges is constructed. Local graphs of the road network are generated for each point in the given trajectory, taking into account the road network structure around that point. The weights of the local graph are determined based on the distance of the roads. Additionally, the feature representation vectors of the road network nodes are extracted through local graph contrastive learning so that the model can learn richer spatial semantic information.
In the multi-task trajectory recovery module, the final decoding part of the model is decomposed into predicting the road section number and predicting the location proportion on the road. This guarantees that the recovered trajectory is accurately matched to the actual road network, improves the generalization ability of the model, and reduces the risk of overfitting.
The framework of the model is shown in Figure 5.
As shown in Figure 5, firstly, in the trajectory sequence processing module, the $i$-th vehicle trajectory with missing points $\tilde{T}_i = \{\tilde{P}_1, \tilde{P}_2, \ldots, \tilde{P}_n\}$ is taken as the input of the model. After the constraint is established by map matching, the constrained trajectory $\bar{T}_i = \{p_1, p_2, \ldots, p_n\}$ is input into the grid embedding layer to obtain the embedding vector sequence $S_i$. The embedding vector $S_i$ is encoded by a bidirectional long short-term memory network to obtain the encoding vector $r_t$ of each trajectory point. Through the attention layer, weights are assigned to the trajectory points in the sequence, and the final trajectory point encoding vector $a_t$ is obtained by weighted summation.
Secondly, in the road network local graph contrastive module, according to the matched road segment sequence $\bar{T}_{i,road} = \{v_1, v_2, \ldots, v_n\}$, each node is taken as a center to obtain the local graph sequence $T_i^G = \{Q_1, Q_2, \ldots, Q_n\}$. The weights of the edges in each local graph are assigned according to distance, and the road network features are encoded using a graph attention network (GAT). The representation vector $H$ of each local graph and the current node representation vector $h_t^v$ are obtained. The node vector at the next step, $h_{t+1}^v$, is taken as the positive sample, and the node vectors $(h_{t-1}^v, h_{t-2}^v, \ldots, h_{t-k+1}^v)$ of the previous $k$ steps are taken as negative samples. Contrastive learning over the local graphs guides the model to predict future local graph structures.
Finally, in the trajectory recovery multi-task module, the input is the concatenation of three encoding vectors: the trajectory point encoding vector $a_t$ obtained by the trajectory sequence module, the road node representation vector $h_t^v$ obtained by the road network local graph contrastive module, and the position proportion vector $z_t$ queried in the road network constraint layer. After decoding, the road node index $\hat{v}_{t+1}$ and the trajectory location proportion $\hat{z}_{t+1}$ are obtained. By decoding and converting, the recovered trajectory $T = \{P_1, P_2, \ldots, P_m\}$ is obtained. Each module is described in detail below.

4.2. Trajectory Sequence Processing Module

The trajectory sequence processing module is mainly used to extract trajectory sequence features, including the road network constraint layer, the grid embedding layer, the sequence encoding layer, and the attention layer. The details of this module can be seen in Figure 6.
As shown in Figure 6, this module uses the restraining effect of the road network on vehicle behavior to ensure that the restored trajectory can be correctly mapped to the corresponding road network. A bidirectional long short-term memory neural network is used to obtain the trajectory vector, and the self-attention mechanism is introduced to enhance the model's ability to learn the inter-sequence features of the trajectory and its overall information. The input is the missing trajectory $\tilde{T}_i = \{\tilde{P}_1, \tilde{P}_2, \ldots, \tilde{P}_n\}$, and the encoded vector $a_t$ of the trajectory point at the current time step is passed into the decoder.

4.2.1. Road Network Constraint Layer

In the road network constraint layer, the input missing trajectories $\tilde{T}_i = \{\tilde{P}_1, \tilde{P}_2, \ldots, \tilde{P}_n\}$ are queried through the mask matrix $M$, and the corresponding constrained trajectories $\bar{T}_i = \{p_1, p_2, \ldots, p_n\}$ and $\bar{T}_{i,road} = \{v_1, v_2, \ldots, v_n\}$ are finally obtained.
Data obtained after cleaning do not guarantee that all trajectory points fall correctly on the road network. Such trajectory points deviate from the actual situation and would impair recovery. Therefore, it is crucial to apply map matching to the trajectory data to establish road network constraints. Following the map matching work of Lou et al. [27], this paper matches the trajectory points to road nodes.
In the search for candidate points, for each trajectory point of a vehicle trajectory $T_i$, a certain threshold $r$ is used as the radius, and adjacent road network nodes within this range are retrieved as the candidate set. The haversine distance between two points is calculated as follows:
$$d_{haversine}(P_i, P_j) = 2R \arctan\left(\frac{\sqrt{a(P_i, P_j)}}{\sqrt{1 - a(P_i, P_j)}}\right)$$
$$a(P_i, P_j) = \sin^2\left(\frac{lat_i - lat_j}{2}\right) + \cos(lat_i)\cos(lat_j)\sin^2\left(\frac{lon_i - lon_j}{2}\right)$$
where $R$ is the radius of the earth, $R = 6371$ km, and $(lon_i, lat_i)$ are the longitude and latitude of the point at time step $i$; the calculated distance is in kilometers. The candidate point sequence selected for the trajectory point sequence $P_{i:j}$ is recorded as $p_{i:j}$, and each candidate point carries the corresponding road network node index $v_{i:j}$. The matching process is described by a triple, which can be expressed as:
$$RC = (S, O, \pi)$$
where $S$ is the transition probability matrix, $O$ is the observation probability matrix, and $\pi$ is the initial state probability. $S$ is fitted by an exponential function of the distances between trajectory points and their projected points; the closer the distance, the greater the state transition probability, which is calculated as follows:
$$p_S(\lambda) = e^{-\lambda / \beta}$$
$$\lambda = \left| \left\| P_t - P_{t+1} \right\|_h - \left\| p_{t,i} - p_{t+1,j} \right\|_v \right|$$
where $p_S$ denotes the transition probability, $P_t$ is the trajectory point at time step $t$, $p_{t,i}$ denotes the projection of $P_t$ onto road node $v_i$, and $\beta$ is a hyper-parameter. $\|\cdot\|_h$ denotes the haversine distance and $\|\cdot\|_v$ denotes the path distance between the projection points along the roads. The observation probability matrix $O$ reflects the distance between an observed trajectory point and the roads; the closer the distance, the greater the probability that the point is matched to this road, which is calculated as follows:
$$p_O(P_t \mid v_i) = \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left(-\frac{1}{2}\left(\frac{\left\| P_t - p_{t,i} \right\|_h}{\sigma_z}\right)^2\right)$$
$$\sigma_z = 1.4826 \times \mathrm{median}\left(\left\| P_t - p_{t,i} \right\|_h\right)$$
where $p_O$ denotes the observation probability, $v_i$ denotes the $i$-th road segment, $\sigma_z$ represents the standard deviation of the distances between the trajectory points $P_t$ and their projections $p_{t,i}$, and $\mathrm{median}(\cdot)$ denotes the median. A recursive method is used to solve the transition probability and the optimal path between two adjacent points. Then, the global optimal path is calculated using the state with the highest transition probability and the local optimal paths to achieve map matching. This process is carried out during data preprocessing; according to the matching results, in the road network constraint layer, the input missing trajectory $\tilde{T}_i = \{\tilde{P}_1, \tilde{P}_2, \ldots, \tilde{P}_n\}$ is queried through the mask matrix $M$, and the corresponding constrained trajectories $\bar{T}_i = \{p_1, p_2, \ldots, p_n\}$ and $\bar{T}_{i,road} = \{v_1, v_2, \ldots, v_n\}$ are finally obtained.
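For illustration, a minimal Python sketch of the haversine computation and the radius-based candidate search described above is given below; the function names and the threshold value are assumptions, not the authors' implementation.

```python
import math

# Illustrative sketch: haversine distance and candidate road-node search.
R_EARTH_KM = 6371.0

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat1 - lat2) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon1 - lon2) / 2) ** 2)
    return 2 * R_EARTH_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def candidate_nodes(point, nodes, r_km=0.05):
    """Return road nodes within a threshold radius r of a trajectory point.
    `point` is (lon, lat); `nodes` is a list of (node_id, lon, lat); r_km is illustrative."""
    return [nid for nid, lon, lat in nodes
            if haversine_km(point[0], point[1], lon, lat) <= r_km]
```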

4.2.2. Grid Embedding Layer

For a constrained trajectory $\bar{T}_i = \{p_1, p_2, \ldots, p_n\}$, the trajectory sequence is represented as a vector sequence via trajectory embedding. Owing to the discrete nature of trajectory points, directly using raw coordinates increases the data dimension and is inefficient. Therefore, the grid embedding method is used to mesh the map, and a simple multi-layer perceptron is constructed to encode the grid number in which each trajectory point is located, preserving the connection between longitude and latitude as well as the positional relationships between trajectory points, as follows:
$$\bar{T}_i = \{e_1, e_2, \ldots, e_n\}, \quad i \in \{1, 2, \ldots, N\}$$
$$S_i = \{s_1, s_2, \ldots, s_n\}, \quad i \in \{1, 2, \ldots, N\}$$
$$s_t = \sigma(W_s e_t + b_s), \quad t \in \{1, 2, \ldots, n\}$$
where $\bar{T}_i$ denotes the grid sequence corresponding to the $i$-th trajectory; $e_t = (e.x_t, e.y_t)$ is the grid index, with the row and column indexes of the grid represented by $x_t$ and $y_t$, respectively. $S_i \in \mathbb{R}^{n \times d}$ denotes the grid embedding vector sequence, $s_t$ is the embedding vector corresponding to grid $e_t$, and $\sigma(\cdot)$ is the activation function. The weights $W_s$ and biases $b_s$ of the multi-layer perceptron are trainable parameters. Through the grid embedding layer, the constrained trajectory sequence $\bar{T}_i$ is finally represented as an embedding vector sequence $S_i$.
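A minimal PyTorch sketch of such a grid embedding layer is shown below; the lookup-plus-perceptron structure and all dimensions are illustrative assumptions consistent with the description above, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch: map each grid cell (row, col) to a dense vector and
# pass it through a small perceptron, yielding s_t for each trajectory point.
class GridEmbedding(nn.Module):
    def __init__(self, n_rows, n_cols, emb_dim=64):
        super().__init__()
        self.n_cols = n_cols
        self.lookup = nn.Embedding(n_rows * n_cols, emb_dim)
        self.mlp = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU())

    def forward(self, grid_xy):             # grid_xy: (batch, seq_len, 2) long tensor
        flat = grid_xy[..., 0] * self.n_cols + grid_xy[..., 1]  # flatten (row, col)
        return self.mlp(self.lookup(flat))  # S_i: (batch, seq_len, emb_dim)

# Example: embed a batch of two 5-point grid-index sequences on a 100x100 grid.
emb = GridEmbedding(100, 100)(torch.randint(0, 100, (2, 5, 2)))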

4.2.3. Sequence Encoding Layer

In the sequence encoding layer, the input is the embedding vector sequence $S_i = \{s_1, s_2, \ldots, s_n\}$, and the trajectory point encoding vector $r_t$ is obtained after learning by the encoder. In this layer, the encoder is formed by stacking multiple Bi-LSTM layers to extract the trajectory sequence vector. Bi-LSTM is used as the encoder to obtain the context vector of the trajectory and learn its sequential correlations. For a forward LSTM, the update proceeds as follows. First, the information enters the input gate, with the trajectory point embedding vector serving as the initial value of the first layer; the $t$-th step combines the embedding vector $s_t$ and the output $h_{t-1}$ of the previous LSTM step. The formula is as follows:
$$i_t = \sigma(W_i [s_t, h_{t-1}] + b_i)$$
where $\sigma(\cdot)$ is the activation function, and $W_i$ and $b_i$ are the weights and biases of the input gate. Next, the information is filtered and the cell state is updated with long-term memory: part of the updated state comes from the forget gate acting on the previous moment, and part comes from the input gate at the current moment, so that unimportant information is forgotten. The specific formulas are as follows:
$$f_t = \sigma(W_f [s_t, h_{t-1}] + b_f)$$
$$\tilde{C}_t = \tanh(W_c [s_t, h_{t-1}] + b_c)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $\sigma(\cdot)$ is the activation function, $f_t$ is the forget gate output, and $C_t$ is the updated cell state; $s_t$ denotes the embedding vector obtained from the grid embedding layer at time step $t$, and $h_{t-1}$ is the hidden state vector output by the previous LSTM step. $W_f$ and $W_c$ are the weight parameters of the forget gate and the candidate state, respectively, while $b_f$ and $b_c$ are the corresponding biases. After the update, the output state vector $r_t$ is calculated as follows:
$$o_t = \sigma(W_o [s_t, h_{t-1}] + b_o)$$
$$r_t = o_t \odot \tanh(C_t)$$
where $W_o$ and $b_o$ are the weight and bias of the output gate, and $\tanh(\cdot)$ is an activation function. The final hidden state vector $r_t$ after sequence encoding is the mean of the forward and backward LSTM outputs $\overrightarrow{r}_t$ and $\overleftarrow{r}_{t+1}$, which can be calculated as follows:
$$r_t = \mathrm{mean}\left(\overrightarrow{r}_t, \overleftarrow{r}_{t+1}\right)$$
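The following PyTorch sketch (dimensions illustrative) indicates how the bidirectional encoding and the averaging of the two directions can be realized; for simplicity, both directions are taken at the same time index.

```python
import torch
import torch.nn as nn

# Illustrative sketch: stacked Bi-LSTM encoder producing r_t by averaging the
# forward and backward outputs at each step.
class TrajectoryEncoder(nn.Module):
    def __init__(self, emb_dim=64, hidden_dim=128, num_layers=2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)

    def forward(self, s):                  # s: (batch, seq_len, emb_dim)
        out, _ = self.bilstm(s)            # (batch, seq_len, 2 * hidden_dim)
        fwd, bwd = out.chunk(2, dim=-1)    # split forward/backward halves
        return (fwd + bwd) / 2             # r_t: (batch, seq_len, hidden_dim)

# Example usage with a random embedded trajectory batch.
r = TrajectoryEncoder()(torch.randn(2, 10, 64))
```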

4.2.4. Self-Attention Layer

In this layer, the model learns the temporal and spatial correlations of the data for trajectory restoration. By allocating weights over the inputs, the attention mechanism allows the model to focus on the features that matter most, effectively allocating computing resources under limited computational budgets so that the model concentrates on more informative inputs and computes faster. The goal of the attention mechanism is to generate context vectors by calculating the similarity between query vectors and key vectors. In this model, the query vector is the current hidden state in the decoder and the key vectors are the outputs of the encoder. The formulas for the attention layer are as follows:
$$a_t = \sum_{j=1}^{n} \alpha_{t,j}\, r_j$$
$$\alpha_{t,j} = \frac{\exp(e_{t,j})}{\sum_{j'=1}^{n} \exp(e_{t,j'})}$$
$$e_{t,j} = \sigma(W_e [u_t, r_j])$$
where the context vector $a_t$ corresponding to the trajectory point at the current time step $t$ is the weighted sum of all output vectors from the encoder. $r_j$ is the encoding vector of the trajectory point at time step $j$; $\alpha_{t,j}$ is the attention weight coefficient between the current time step $t$ and the trajectory point at time step $j$, computed from $e_{t,j}$ by the softmax function. The calculation formula of $e_{t,j}$ is shown in (26). $u_t$ is the hidden state vector obtained in the decoder at the current time $t$, $\sigma(\cdot)$ is a nonlinear activation function, and $W_e$ is a trainable linear transformation matrix.
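A compact PyTorch sketch of this attention computation is given below; the tanh scoring function and the dimensions are assumptions consistent with the formulas above.

```python
import torch
import torch.nn as nn

# Illustrative sketch: additive attention where the decoder state u_t queries
# all encoder outputs r_1..r_n and returns the context vector a_t.
class Attention(nn.Module):
    def __init__(self, dec_dim, enc_dim):
        super().__init__()
        self.W_e = nn.Linear(dec_dim + enc_dim, 1)

    def forward(self, u_t, r):                       # u_t: (B, dec_dim), r: (B, n, enc_dim)
        u = u_t.unsqueeze(1).expand(-1, r.size(1), -1)
        e = torch.tanh(self.W_e(torch.cat([u, r], dim=-1))).squeeze(-1)  # e_{t,j}: (B, n)
        alpha = torch.softmax(e, dim=-1)             # attention weights alpha_{t,j}
        return (alpha.unsqueeze(-1) * r).sum(dim=1)  # a_t: (B, enc_dim)

# Example usage with random decoder state and encoder outputs.
a_t = Attention(128, 128)(torch.randn(2, 128), torch.randn(2, 10, 128))
```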

4.3. Road Network Local Graph Contrastive Module

The road network local graph contrastive module is mainly designed to extract the static structural features of the road network and the dynamic spatial features of the trajectory, and it includes the local graph generation layer, the graph encoding layer, and the graph contrastive layer. The details of this module can be seen in Figure 7. In this section, the content of each layer is explained in detail.
As shown in Figure 7, the main components of this module are local graph generation and graph attention. Considering the guiding role of the surrounding road network on the trajectory, the module generates a local road network graph by extracting the local road network structure around each trajectory point in the sequence and sets the edge weights according to road distance. The feature representation vectors of the road network nodes are extracted through contrastive learning over the local graphs. The input is the sequence of segment nodes $\bar{T}_{i,road} = \{v_1, v_2, \ldots, v_n\}$, and the segment node representation vector $h_t^v$ at the current time step $t$ is obtained.

4.3.1. Local Graph Generation Layer

In the local graph generation layer, the model simultaneously extracts the structural features of the surrounding static road network and the dynamic spatial features of the trajectory; the input is the road node sequence $\bar{T}_{i,road} = \{v_1, v_2, \ldots, v_n\}$ and the output is the local graph sequence $T_i^G = \{Q_1, Q_2, \ldots, Q_n\}$. The entire road network is complex and huge, and using all of it wastes computing resources. When a vehicle is moving, distant road network structure is of little or no use for the current prediction; therefore, a local road network graph is built for each trajectory point in this study. According to the corresponding road node sequence $\bar{T}_{i,road} = \{v_1, v_2, \ldots, v_n\}$, for each trajectory point $P_t$, the current road node $v_t$ is taken as the center, and the second-order subgraph is retained as the local graph, yielding the local graph sequence $T_i^G = \{Q_1, Q_2, \ldots, Q_n\}$. Considering that different road segment nodes influence the trajectory differently, the weights of the edges in the local graph are set according to the distance between the nodes and the current trajectory point. The calculation formula is as follows:
$$W_{t,j} = \exp\left(-\frac{\left\| v_j - p_t \right\|^2}{\delta^2}\right)$$
where $v_j$ is the $j$-th road node, $p_t$ is the matched trajectory point at time $t$, $\|v_j - p_t\|$ represents the haversine distance between the matched point $p_t$ and road node $v_j$, and $\delta$ is the distance threshold, set to 30 m following [25]. The farther a segment node is from the current trajectory point, the less influence it should have on travel. According to the calculated weights, the feature representation matrix $X_t$ of the current road segment node can be obtained by weighted summation, and the formula is as follows:
$$X_t = \mathrm{mean}\left(\frac{1}{|N_t|} \sum_{j=1}^{|N_t|} W_{t,j}\, f_j,\; f_t\right)$$
where $N_t$ is the second-order neighborhood node set of the current node, and $f_j$ and $f_t$ represent the feature vectors of road segments $j$ and $t$, which are explained in detail in Section 3. Each local graph can be represented as $Q_t = (V_t, E_t, X_t, A_t)$.
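The distance decay of the edge weights can be illustrated in a few lines of Python, with $\delta$ = 30 m as stated above; the example distances are arbitrary.

```python
import math

# Illustrative sketch: Gaussian-style edge weight W_{t,j} = exp(-d^2 / delta^2).
DELTA_M = 30.0

def edge_weight(dist_m: float, delta: float = DELTA_M) -> float:
    """Weight of a neighborhood road node at (haversine) distance dist_m in meters
    from the current matched trajectory point; decays with distance."""
    return math.exp(-(dist_m ** 2) / (delta ** 2))

# Example: nodes 10 m and 60 m away receive very different influence.
print(edge_weight(10.0), edge_weight(60.0))   # ~0.895 vs ~0.018
```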

4.3.2. Graph Encoding Layer

In the graph encoding layer, a graph attention network is used to aggregate neighbor node information; the inputs are the adjacency matrix $A_t$ and the feature matrix $X_t$, and the output is the current node representation vector $h_t^v$. The graph attention network obtains a representation of the central node by assigning attention weights between the central node and its first-order neighbor nodes; for node $v_t$, the update is as follows:
$$h_t^v = \Big\Vert_{l=1}^{L}\, \sigma\left(\sum_{j \in N_t} \alpha_{tj}\, W X_j\right)$$
$$\alpha_{tj} = \frac{\exp\left(\sigma\left(A_{tj}\, a^{\top} \left[ W X_t \,\Vert\, W X_j \right]\right)\right)}{\sum_{j' \in N_t} \exp\left(\sigma\left(A_{tj'}\, a^{\top} \left[ W X_t \,\Vert\, W X_{j'} \right]\right)\right)}$$
where $h_t^v$ is the output node vector at the current time; $N_t$ is the neighborhood node set of the current node within the local graph; $\sigma(\cdot)$ denotes the nonlinear activation function; $\alpha_{tj}$ is the attention weight coefficient between the current node $t$ and node $j$; $A_{tj}$ denotes the entry of the adjacency matrix for the current node $t$ and node $j$; $X_t$ is the feature matrix of the current road; $a$ and $W$ denote trainable weight matrices; and $L$ denotes the number of attention heads.
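For illustration, a simplified single-head graph attention layer is sketched below; it is a minimal stand-in for the multi-head GAT described above and assumes the adjacency matrix includes self-loops.

```python
import torch
import torch.nn as nn

# Illustrative sketch: single-head graph attention over one local graph.
# Assumes the adjacency matrix A has self-loops so every row attends somewhere.
class SimpleGATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, X, A):                   # X: (N, in_dim), A: (N, N) adjacency
        H = self.W(X)                          # projected node features (N, out_dim)
        N = H.size(0)
        pair = torch.cat([H.unsqueeze(1).expand(-1, N, -1),
                          H.unsqueeze(0).expand(N, -1, -1)], dim=-1)
        e = torch.relu(self.a(pair)).squeeze(-1)          # raw scores (N, N)
        e = e.masked_fill(A == 0, float('-inf'))          # attend to neighbors only
        alpha = torch.softmax(e, dim=-1)                  # attention weights
        return torch.relu(alpha @ H)           # aggregated node vectors (N, out_dim)

# Example: a 4-node local graph with self-loops.
h = SimpleGATLayer(8, 16)(torch.randn(4, 8), torch.eye(4))
```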

4.3.3. Graph Contrastive Layer

The trajectory is dynamically changing, whereas the road network is static. Two adjacent local graphs have a strong correlation and can reflect the trend of the next step of the trajectory. Trajectories carry temporal dependencies and spatial information about movement. Therefore, a graph contrastive layer is designed to establish the connection between the trajectory and the road network. This layer extracts the dynamic time series features of the trajectory together with the structural information of the surrounding road network. The core idea of contrastive learning is to train an encoder by generating and comparing positive and negative samples. Contrastive learning helps the model capture these temporal dependencies and spatial relationships by contrasting trajectories that follow similar temporal patterns or spatial contexts: by pulling the representations of similar trajectories closer and pushing apart those of dissimilar ones, the model becomes better at capturing the essential features that define different types of movement. In the graph contrastive layer, a sliding window is used to guide the model to predict the future local graph structure: for the node vector $h_t^v$ at the current step, the node vector $h_{t+1}^v$ at the next time step is taken as the positive sample, and the node vectors $(h_{t-1}^v, h_{t-2}^v, \ldots, h_{t-k+1}^v)$ of the previous $k$ time steps are taken as the negative samples. The model maximizes the similarity between positive sample pairs and minimizes the similarity between negative sample pairs. Using InfoNCE [64] as the loss function, the loss is calculated as follows:
$$L_C = -\frac{1}{|V|} \sum_{v \in V} \log \frac{\exp\left(\mathrm{sim}(h_t^v, h_{t+1}^v)/\tau\right)}{\sum_{j=t-k+1}^{t+1} \exp\left(\mathrm{sim}(h_t^v, h_j^v)/\tau\right)}$$
where $V$ is the set of road segment nodes, $k$ is the step size of the sliding window, $h_t^v$ is the node vector at time step $t$, and $h_{t+1}^v$ denotes the node vector of the next local graph. $\tau$ is the temperature coefficient, set to 0.4 following [65], and $\mathrm{sim}(\cdot)$ is the cosine similarity function.
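A minimal PyTorch sketch of this sliding-window InfoNCE term for a single time step is shown below; the tensor shapes and the helper name are illustrative.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: contrastive loss for one step, with h_pos = h_{t+1}^v as
# the positive sample and the k previous node vectors as negatives.
def info_nce_step(h_t, h_pos, h_negs, tau=0.4):
    """h_t, h_pos: (d,); h_negs: (k, d). Returns a scalar loss."""
    pos = F.cosine_similarity(h_t, h_pos, dim=0) / tau                      # scalar
    negs = F.cosine_similarity(h_t.expand_as(h_negs), h_negs, dim=1) / tau  # (k,)
    logits = torch.cat([pos.unsqueeze(0), negs])
    return -torch.log_softmax(logits, dim=0)[0]  # -log(exp(pos) / sum(exp(all)))

# Example with random 64-dimensional node vectors and k = 4 negatives.
loss = info_nce_step(torch.randn(64), torch.randn(64), torch.randn(4, 64))
```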

4.4. Trajectory Recovery Multi-Task Module

The trajectory recovery multi-task module is mainly used to decode hidden state vectors, including a decoding layer and a model training layer. In this section, specific details of the content of the module are explained.

4.4.1. Decoding Layer

In the decoding layer, following [25], the trajectory recovery task is decomposed into predicting the road node number and predicting the position on the road, which guarantees that the recovered trajectory is accurately matched to the actual road network. Additionally, this multi-task setting improves the generalization ability of the model and reduces overfitting. The details of the decoding layer are shown in Figure 8.
As shown in Figure 8, RNCGCL decodes the encoding vectors obtained by the first two modules, and the decoder consists of a Bi-LSTM. The input is the concatenation of three encoding vectors: the trajectory point encoding vector $a_t$ obtained by the trajectory sequence module, the road node representation vector $h_t^v$ obtained by the road network local graph contrastive module, and the position proportion vector $z_t$ queried in the road network constraint layer. The decoder outputs the node index $\hat{v}_{t+1}$ and the trajectory position proportion $\hat{z}_{t+1}$. For the decoder, the input data are marked as missing by the mask matrix $M$. For the current timestamp $t$, if the corresponding value in the matrix is one (i.e., the point is observed), the input of the decoder is the raw data $z_t$; otherwise, the predicted values $\hat{v}_t$ and $\hat{z}_t$ from the previous decoder step are used as the input. The decoder thus uses the output of the previous moment as the input vector to iteratively predict the road segment node number at the next moment, and then predicts the trajectory position proportion through the sigmoid activation function. The formula for updating the decoder is as follows:
$$u_t = \mathrm{BiLSTM}\left(u_{t-1}, \left[h_{t-1}^v, z_{t-1}, a_t\right]\right)$$
where $\mathrm{BiLSTM}(\cdot)$ is the bidirectional long short-term memory model, whose update rules are shown in (17)–(23) in Section 4.2.3, and $u_{t-1}$ represents the hidden state vector of the previous decoder step. Through step-by-step decoding, the obtained road segment node indexes and trajectory location proportions are converted according to Equations (3) and (4), and the coordinates of the recovered positions are calculated. The recovered trajectory $T_i = \{P_1, P_2, \ldots, P_m\}$ is finally obtained.
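One decoding step can be sketched as follows; this is a simplified PyTorch stand-in in which a GRU cell replaces the paper's Bi-LSTM decoder cell, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sketch of one multi-task decoding step: the decoder consumes the
# concatenated inputs and emits a segment-id distribution and a proportion in [0, 1].
class MultiTaskDecoderStep(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_segments):
        super().__init__()
        self.rnn = nn.GRUCell(in_dim, hidden_dim)          # stand-in for the Bi-LSTM cell
        self.seg_head = nn.Linear(hidden_dim, n_segments)  # predicts node index v_{t+1}
        self.pos_head = nn.Linear(hidden_dim, 1)           # predicts proportion z_{t+1}

    def forward(self, x_t, u_prev):                # x_t: (B, in_dim), u_prev: (B, hidden)
        u_t = self.rnn(x_t, u_prev)
        seg_logits = self.seg_head(u_t)            # softmax over road segments at train time
        z_hat = torch.sigmoid(self.pos_head(u_t)).squeeze(-1)  # sigmoid keeps z in [0, 1]
        return seg_logits, z_hat, u_t

# Example step with 2 trajectories, 256-dim inputs, and 1000 candidate segments.
step = MultiTaskDecoderStep(256, 128, 1000)
seg_logits, z_hat, u_t = step(torch.randn(2, 256), torch.zeros(2, 128))
```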

4.4.2. Model Training Layer

The model was trained using a multi-task learning strategy. For learning the road segment node index $v_{t+1}$, the cross-entropy loss function is used. For the regression task of predicting $z_{t+1}$, the mean square error (MSE) is used to calculate the regression loss. This loss quantifies the expected value of the squared difference between the prediction and the true value; it is particularly sensitive to outliers and serves as a performance indicator of network stability. The loss function for the final model training is expressed as a linear weighting of the three tasks, calculated as follows:
$$L_I = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{m} y_{ic} \log(p_{ic})$$
$$L_R = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \left\| \hat{P}_{i,t} - P_{i,t} \right\|^2$$
$$L = L_I + \lambda_1 L_R + \lambda_2 L_C$$
where $N$ is the number of samples, $m$ is the number of labels, and $y_{ic}$ is an indicator taking the value 0 or 1: when the value is 1, the true category label of sample $i$ is category $c$; otherwise, it is not. $p_{ic}$ is the predicted probability that sample $i$ belongs to category $c$, and $T$ is the total number of steps in the trajectory. $\hat{P}_{i,t}$ denotes the position coordinates at time step $t$ of the $i$-th trajectory recovered by the model, and $P_{i,t}$ denotes the real position coordinates. $L_I$ denotes the classification loss over road segment nodes, $L_R$ denotes the mean square error of the trajectory position proportion regression, and $L_C$ is the contrastive loss over local graphs. $\lambda_1$ and $\lambda_2$ are hyperparameters used to balance the tasks.
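The joint objective can be expressed in a few lines of PyTorch; the $\lambda$ values here are placeholders rather than the paper's tuned settings.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the joint objective L = L_I + lambda1 * L_R + lambda2 * L_C:
# cross-entropy over segment ids, MSE over recovered coordinates, plus the
# contrastive term from Section 4.3.3.
def total_loss(seg_logits, seg_targets, coords_pred, coords_true, l_contrastive,
               lambda1=1.0, lambda2=0.1):
    l_i = F.cross_entropy(seg_logits, seg_targets)   # road-segment classification loss
    l_r = F.mse_loss(coords_pred, coords_true)       # position regression loss
    return l_i + lambda1 * l_r + lambda2 * l_contrastive

# Example with 2 samples, 1000 segment classes, and 2-D coordinates.
loss = total_loss(torch.randn(2, 1000), torch.tensor([3, 7]),
                  torch.randn(2, 2), torch.randn(2, 2), torch.tensor(0.5))
```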

5. Experiments

In this paper, comparative, ablation, parameter sensitivity, and robustness experiments were conducted. A variety of metrics were employed to confirm the accuracy and efficiency of the model. Additionally, the results of the trajectory recovery according to the model are visualized on the map, which makes the experimental results more intuitive and further enhances the interpretability of the model. This section describes the datasets, evaluation indicators, benchmark models, and the experimental protocols.

5.1. Experimental Setup

5.1.1. Datasets and Preprocessing

Trajectory dataset: The taxi trajectory dataset used in this paper was obtained from the public dataset provided by the Kaggle competition and covers the city of Porto, the second-largest city in Portugal. The trajectory data were collected through mobile terminals on the taxis; the GPS location of each taxi was recorded every 15 s after an order started. Each GPS point consists of longitude and latitude, and each order corresponds to a trajectory sequence composed of multiple GPS points. The dataset contains nearly 1.7 million trajectories from all 442 taxis operating in Porto over a full year, from 1 July 2013 to 30 June 2014. In this paper, trajectory data from the central urban area of Porto were sampled, with a longitude range of [−8.71099°, −8.51099°] and a latitude range of [41.04961°, 41.24961°]. Approximately 150,000 trajectories (157,515) were used, split into training, validation, and test sets at a 7:2:1 ratio.
Trajectory data preprocessing: This process included region filtering, data filtering, and training data construction. All data outside the study area were purged: with the longitude range [−8.71099°, −8.51099°] and latitude range [41.04961°, 41.24961°], the coordinates of the boundary points were used as the filter range, and only data inside this area were retained. Null data were removed. According to the 3σ guideline, trips with a travel duration of less than 3 min were removed as outliers. For trajectory points with a speed of more than 120 km/h, the average of the speeds immediately before and after was used instead. Missing vehicle data are usually due to equipment failure and signal transmission problems, and such data are missing completely at random; that is, the missingness is unrelated to the values themselves or to other variables. To simulate real missing data, 25% of the trajectory points in each trajectory of the original dataset were randomly retained, giving simulated low-sampling-rate trajectories with an average sampling interval of approximately 1 min.
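A simplified Python sketch of the area filtering and missing-data simulation described above is given below; the function names and the minimum-length guard are assumptions, and it presumes very short trips were already removed by the duration filter.

```python
import random

# Illustrative sketch: bounding-box filtering for the Porto study area and
# random down-sampling of 25% of points to simulate a low-sampling-rate input.
LON_RANGE = (-8.71099, -8.51099)
LAT_RANGE = (41.04961, 41.24961)

def in_study_area(point):
    lon, lat = point[0], point[1]
    return LON_RANGE[0] <= lon <= LON_RANGE[1] and LAT_RANGE[0] <= lat <= LAT_RANGE[1]

def simulate_sparse(trajectory, keep_ratio=0.25, seed=0):
    """trajectory: list of (lon, lat, timestamp). Returns (sparse, full):
    the sparse input for recovery and the full trajectory as ground truth."""
    full = [p for p in trajectory if in_study_area(p)]
    rng = random.Random(seed)
    n_keep = max(2, int(len(full) * keep_ratio))   # guard for very short trips
    kept = sorted(rng.sample(range(len(full)), n_keep))
    return [full[i] for i in kept], full
```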
Road network dataset: The road network map dataset used in this paper was derived from OpenStreetMap (OSM), an open-source map platform. An automated web scraping technique was applied to extract the data. The node data consist of node IDs and WGS84 coordinates, and the edge data consist of the starting and ending road IDs, coordinates, and the coordinate sequence along each road.
Road network data preprocessing: This process included area filtering, graph construction, and road network simplification. The road network was constructed from OSM map data. After obtaining the data for the same area as the trajectory data, ArcGIS software (version 10.8) from Esri (Redlands, CA, USA) was used to process it into a graph structure by extracting node and edge data. The simplification step addressed redundancy by merging the multiple nodes at a single intersection into one point and merging duplicate road types. Figure 9 shows an example of a local road network before and after simplification, where the multiple nodes at one intersection are merged into a single point.
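For readers who prefer a scriptable alternative to the ArcGIS workflow, a rough OSMnx-based sketch of the same area extraction and intersection merging is given below; this is only an approximation of the preprocessing described above, and the exact `graph_from_bbox` argument form depends on the OSMnx version.

```python
import osmnx as ox

# Bounding box of the Porto study area (same ranges as the trajectory filter).
north, south = 41.24961, 41.04961
east, west = -8.51099, -8.71099

# Download the drivable road network; simplify=True removes interstitial
# nodes so edges run between true intersections (cf. Figure 9). Note that
# the bbox argument form differs across OSMnx versions.
G = ox.graph_from_bbox(north, south, east, west,
                       network_type="drive", simplify=True)

# Merge clusters of nearby intersection nodes into single points,
# analogous to the intersection merging described above.
G_proj = ox.project_graph(G)
G_simple = ox.consolidate_intersections(G_proj, tolerance=15)

print(len(G_simple.nodes), len(G_simple.edges))
```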

5.1.2. Evaluation Metrics

Experimental evaluation uses a wide range of classification and regression indicators. There are four classification indicators: accuracy, recall, precision, and F1-score; and two regression indicators: mean absolute error (MAE) and root-mean-square error (RMSE). The six metrics are described below.
Accuracy: It is the most basic evaluation criterion for classification, representing the proportion of samples correctly predicted by the model. The calculation formula is as follows:
$$\text{Accuracy} = \frac{TP + TN}{N}$$
where $N$ is the total number of samples, and $TP$ and $TN$ are the numbers of correctly predicted positive and negative samples, respectively.
Precision: Beyond accuracy, precision measures the reliability of the samples predicted as positive, i.e., the proportion of predicted-positive samples that are truly positive. The higher the precision, the greater the probability that a sample predicted as positive is indeed positive. The calculation formula is as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
where $TP$ is the number of true positives (positive samples correctly predicted) and $FP$ is the number of false positives (negative samples incorrectly predicted as positive).
Recall: Also known as the recall rate, it is the proportion of actual positive samples that are correctly predicted as positive. The calculation formula is as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where $TP$ is the number of true positives and $FN$ is the number of false negatives (positive samples incorrectly predicted as negative).
F1-score: The F1-score combines precision and recall, reflecting both the accuracy and the completeness of the model. The calculation formula is as follows:
$$F1\text{-}score = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Mean absolute error (MAE): A statistical metric that measures the error between predictions and the ground truth, representing the average magnitude of the prediction errors; a lower MAE indicates a better model. The calculation formula is as follows:
$$MAE = \frac{1}{N} \sum_{q=1}^{N} \left| Y_q - \hat{Y}_q \right|$$
where $Y_q$ is the true value of the trajectory point coordinates, $\hat{Y}_q$ is the coordinate predicted by the model, and $N$ is the number of samples.
Root-mean-square error (RMSE): RMSE is another commonly used statistical metric describing the difference between predicted and actual values; it can be used to assess the accuracy and reliability of a predictive model, and it penalizes large discrepancies more heavily than the MAE. The calculation formula is as follows:
$$RMSE = \sqrt{\frac{1}{N} \sum_{q=1}^{N} \left( Y_q - \hat{Y}_q \right)^2}$$
where $Y_q$ is the true value of the trajectory point coordinates, $\hat{Y}_q$ is the coordinate predicted by the model, and $N$ is the number of samples.
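The six metrics can be computed with standard tooling. The sketch below assumes scikit-learn, macro averaging for the multi-class segment labels (the paper does not state the averaging scheme), and Euclidean per-point distance errors; the function names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def classification_metrics(y_true, y_pred):
    """Segment-ID classification metrics. Macro averaging is an assumption;
    the paper does not state the averaging scheme for multi-class labels."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall":    recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1":        f1_score(y_true, y_pred, average="macro", zero_division=0),
    }

def regression_metrics(coords_true, coords_pred):
    """MAE and RMSE over per-point Euclidean distance errors (e.g., metres)."""
    err = np.linalg.norm(np.asarray(coords_true) - np.asarray(coords_pred), axis=-1)
    return {"MAE": err.mean(), "RMSE": np.sqrt((err ** 2).mean())}

print(classification_metrics([0, 1, 2, 1], [0, 2, 2, 1]))
```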

5.1.3. Benchmark Models

Eight benchmark models were used in the comparative experiments to evaluate the effectiveness of the proposed model. Their selection follows the method types discussed in the literature review and falls into three groups. The first group contains traditional methods, i.e., trajectory recovery models based on statistics and rules, which use a two-step scheme: first recover the points and then map-match the results; map matching in this group uses the HMM model. In the second group, the encoder is a classical deep learning model and the decoder is the one proposed in this paper. The third group is the end-to-end trajectory recovery model MTrajRec. Because its input includes not only trajectory and road network data but also the city's point-of-interest (POI) data, this paper treats MTrajRec with and without POI input as two separate benchmarks. These three groups represent traditional methods, classical deep learning methods, and the current best end-to-end solution, allowing a comprehensive comparison with the proposed model.
Linear [66]: The linear model assumes that the trajectory moves uniformly in a straight line between observed points. Linear interpolation was used to generate the recovered trajectory points, and an HMM was then used to match the recovered trajectories onto the road network.
DHTR [67]: A hybrid Seq2Seq model with a Kalman filter was used to recover the positions, and the trajectories were then matched to the road network.
LSTM [68]: Commonly applied to time series forecasting, it compensates for the inability of traditional RNN models to retain long-term memory. Its long-term state and forget gates determine which information is preserved or forgotten, and the vanishing gradient problem is also mitigated to some extent.
GRU [69]: The GRU model has an update gate and a reset gate, which manage information transmission and output, filtering out irrelevant data while retaining historical information. It has a lower computational complexity than LSTM.
Transformer [35]: A sequence-to-sequence model based on attention mechanisms. Time-dependent representations can be learned using a grid embedding and positional encoding for the trajectories.
Deepmove [7]: A classic model for mobility prediction. The inputs are the historical trajectory and the current trajectory, and the current trajectory is predicted by learning the movement laws and patterns of similar historical trajectories.
MTrajRec-no poi [25]: A version of the MTrajRec model without the POI data input.
MTrajRec [25]: The first model using an end-to-end framework for trajectory recovery. A constraint mask, self-attention, and an attribute module are proposed to overcome the limitations of coarse grid representation and improve performance. POIs are used as input: the POI density of each category serves as the POI feature, and road network attributes are extracted as network features.

5.1.4. Parameters and Environmental Settings

In terms of hardware, the CPU used in this experiment was an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz, and the GPU was a 12 GB NVIDIA RTX A2000 (driver version 527.99). In terms of software, the system was Windows 11, the code was implemented in Python (version 3.8), and PyTorch (version 1.12.1) was adopted as the experimental framework. For the benchmark models, the source code and the default recommended parameters were used. Due to memory limitations, the GAT hidden layer dimension $d$ was set to 128, and the same hidden layer dimension was set for the benchmarks on the dataset. The grid cell unit was set to 50 × 50 m. $\lambda_1$ and $\lambda_2$ in the multi-task training formula were set to 10 and 0.1, respectively. The dimension of $f_t$ was set to 8, where 7 dimensions represent the level of the segment and 1 dimension represents its length. The hyperparameters $\delta$ and $\beta$ were set to 30 and 15, respectively. An Adam optimizer was used for all training. The model was trained for 10 epochs, approximately 25,000 batch iterations, with a batch size of 64 and a learning rate of 0.001.
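For reference, the settings above can be collected into a configuration sketch as follows; the dictionary keys and the placeholder module are illustrative, not the actual RNCGCL code.

```python
import torch
import torch.nn as nn

config = dict(
    hidden_dim=128,        # GAT hidden layer dimension d
    grid_size_m=50,        # grid cell unit (50 m x 50 m)
    lambda1=10.0,          # weight of the position-regression loss
    lambda2=0.1,           # weight of the graph contrastive loss
    segment_feat_dim=8,    # 7 dims for road level + 1 dim for road length
    delta=30, beta=15,     # model hyperparameters as set above
    epochs=10,             # roughly 25,000 batch iterations in total
    batch_size=64,
    lr=1e-3,
)

# Placeholder module standing in for the full RNCGCL network.
model = nn.Linear(config["segment_feat_dim"], config["hidden_dim"])
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
```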

5.2. Comparative Experiments

The experimental results of the proposed RNCGCL model and all benchmark models on the Porto vehicle dataset are shown in Table 3. The results can be analyzed from two aspects: the classification indicators (accuracy, recall, precision, and F1-score), which measure how well the model predicts the road segment numbers, and the regression indicators (MAE and RMSE), which measure the error of the model's final predicted coordinates. The evaluation results show that, compared with most of the baseline models, the proposed model achieved the best prediction results.
As shown in Table 3, RNCGCL outperformed the other baseline models on most of the indicators. Linear performed the worst on all metrics, and RNCGCL was much better than the two-stage baselines Linear and DHTR. In terms of regression indicators, RNCGCL reduced the MAE and RMSE by 60.14% and 53.84% compared with Linear, and by 43.41% and 31.92% compared with DHTR. In terms of accuracy and recall, RNCGCL was 19.93% and 12.34% higher than Linear, and 14.08% and 14.46% higher than DHTR. The main reason is that the proposed model combines contrastive learning and auxiliary training tasks to effectively model and capture the spatiotemporal dependence in the trajectory, whereas Linear only performs simple interpolation and ignores the spatial and temporal relationships of the trajectory. Furthermore, DHTR outperformed Linear, indicating that recovery based on simply interpolated points is inappropriate and imprecise.
Another observation is that the end-to-end approaches generally outperformed the traditional approaches. The Deepmove and MTrajRec-no poi models achieved better results than the other benchmarks because they can capture the spatiotemporal information and movement rules in low-sampling-rate trajectories. Compared with the model using GRU as the encoder, Deepmove and MTrajRec-no poi improved the accuracy of road segment classification by 11.12% and 9.51%, respectively; the final coordinate prediction errors improved by 4.70% and 3.35%, and the MAE was reduced by 10.41% and 20.42%, respectively. MTrajRec performed worse than MTrajRec-no poi despite the inclusion of POI data: compared with MTrajRec-no poi, MTrajRec decreased accuracy and recall by 2.71% and 0.43%, respectively, and increased MAE and RMSE by 4.04 and 6.43, respectively. This shows that more data sources are not necessarily better; if multi-source data are not well integrated, they interfere with model performance.
Although Deepmove and MTrajRec-no poi achieved the best results among the benchmarks, they rely only on RNN models to learn the temporal dependence. RNCGCL not only uses an attention-based trajectory encoder to effectively learn the medium- and long-term spatiotemporal dependence in the trajectory, but also uses contrastive learning to enhance the model's representation learning ability. More specifically, on the Porto dataset, RNCGCL improved the F1-score and recall by 2.81% and 2.23%, respectively, compared with the best baseline. For MAE and RMSE, the relative improvements over the best-performing baseline were 8.62% and 10.40%, respectively. This demonstrates the effectiveness of RNCGCL for trajectory recovery, and it is precisely in complex road network structures that the graph-based method exhibits high performance. RNCGCL focuses on the relatively significant features and simplifies the process by shrinking its range to the local, dynamic road network structure. The proposed graph contrastive learning module enables the model to learn the rich spatiotemporal characteristics of a given trajectory.

5.3. Ablation Experiments

To further verify the effectiveness of the proposed modules, ablation experiments were conducted. The corresponding modules were removed in turn, yielding three variants of RNCGCL, as listed below.
Variant 1 (w/o TSA): To test the importance of the attention mechanism in the model, the attention layer in the trajectory sequence processing module was removed and replaced with an ordinary fully connected layer.
Variant 2 (w/o GCL): This variant was designed to isolate the impact of contrastive learning on trajectory recovery performance. It removes the local graph contrastive module and directly uses a simple GNN to encode the road network graph. For each trajectory point, the local graph is no longer used; instead, the representation of the matched road network node is fed to the decoder as the road network representation vector.
Variant 3 (w/o MT): The model removes the multi-task module, directly predicts the road node number, and takes the midpoint coordinate of the segment as the final recovered position.
Figure 10 shows the variation in training set accuracy over the course of training for the different variants in the trajectory recovery task. The horizontal axis denotes the training iterations and the vertical axis represents the accuracy metric.
As shown in Figure 10, the training accuracy of each model increased rapidly in the initial stage and stabilized at approximately 5000 batch iterations. Overall, the RNCGCL containing all modules performed better than all of its variants and maintained the highest training accuracy throughout the training process. This indicated that the complete model was more effective and robust when dealing with trajectory recovery tasks.
Specifically, the accuracy of the w/o TSA variant was slightly reduced by removing the trajectory attention layer, which illustrates the effectiveness of the attention mechanism. In addition, the road network surrounding a vehicle is crucial for comprehending its driving intention: the overall performance of the w/o GCL variant decreased significantly after the road network local graph contrastive learning module was removed. The multi-task module in the decoder was the most important component of the model; the variant without it had the lowest training accuracy and tended to overfit, indicating that this module significantly improves model performance. The performance of the variants across all metrics is presented in Table 4.
The experimental results in Table 4 show that the complete RNCGCL significantly outperformed the ablated versions across the evaluation metrics of accuracy, precision, recall, and F1-score. In particular, removing the multi-task module had the greatest impact on training accuracy, which provides a strong reference for further model optimization. Specifically, when the attention mechanism was removed to test its contribution, the w/o TSA variant showed a noticeable decrease in performance compared with RNCGCL: the MAE increased by 15.80% and the recall decreased by 5.70%. One possible reason is that the attention mechanism effectively enforces the spatial constraints of missing locations. The variant without contrastive learning also degraded markedly: compared with RNCGCL, the MAE of the w/o GCL variant increased by 45.29% and the recall decreased by 4.85%. This indicates that contrastive learning contributes significantly to learning discriminative features from trajectory data, which is crucial for accurate recovery. By extracting the local road network graph, the final recovery result not only conforms to the road network constraints but also helps the model understand the motion trend of the trajectory. Removing the multi-task module degraded performance the most: the w/o MT variant increased the MAE by 62.70% and decreased the recall by 7.90%. This is because multi-task learning improves the generalization of RNCGCL and enriches its ability to extract spatiotemporal features. At the same time, the multi-task decoder imposes road network constraints on the recovered trajectory, making the results more reliable and stable.

5.4. Parameter Sensitivity Experiments

In this section, a parameter sensitivity analysis is conducted to test and optimize the performance of the model. The parameters involved include the batch size, the GAT hidden layer dimension, and the multi-task training hyperparameters. Using the control variable method, each parameter was varied in turn while the others were held fixed.
The batch size refers to the number of training samples used for each parameter update. The batch size was varied over 8, 16, 32, 64, and 128 to verify the performance of the model. Figure 11 shows the experimental results of the model under these settings for the same number of iterations (25,000).
As shown in Figure 11, the recall and accuracy of the model exhibited an upward trend as the batch size increased. With a larger batch size, the gradient estimate is more accurate because more samples are used for each update, which generally yields a smoother optimization process and helps improve accuracy. The model performed poorly at batch sizes of 8 and 16: with a small batch size, each update requires little computation and each iteration is fast, but the change in the model parameters per update is small, resulting in slower convergence and slower improvement for the same number of iterations. At batch sizes of 64 and 128, there was little difference in performance. Although a larger batch size can process large amounts of data more efficiently and speed up training, it also requires more memory to hold the training samples, and an excessively large batch size may exhaust memory. Therefore, balancing performance and efficiency, the batch size of the RNCGCL model was set to 64.
The GAT hidden layer dimension was set to 8, 16, 32, 64, 128, and 256, and sensitivity experiments were conducted to test the performance of RNCGCL. Figure 12 shows the experimental results of the model under these settings for the same number of iterations (25,000).
As shown in Figure 12, the model performance gradually improved as the dimension increased, and began to stabilize once the dimension reached 64. Increasing the number of neurons in the hidden layer raises the complexity of the model, enabling it to learn and represent more features and patterns, which often improves performance on complex tasks. When the dimension is small, the model's representational capacity is limited and may not fully capture all the information in the data, affecting accuracy and robustness. However, larger dimensions also require more computational resources and time because the number of parameters grows, increasing the computational and memory overhead of training. Balancing performance and efficiency, the dimension of the RNCGCL model was set to 128.
The loss function of the model adopts a multi-task training strategy composed of the weighted losses of three tasks: road segment classification, trajectory position prediction, and road network local graph contrast. The hyperparameters in Equation (35) assign the weight of each task, where $\lambda_1$ weights the trajectory position prediction loss and $\lambda_2$ weights the road network local graph contrastive loss. Three levels (0.1, 1, and 10) were set for each hyperparameter, and all pairwise combinations were evaluated; Table 5 shows the results of RNCGCL under the nine hyperparameter combinations.
As can be seen from Table 5, when either $\lambda_1$ or $\lambda_2$ is fixed, the results do not exhibit a strong monotonic trend. When $\lambda_1 = 10$, the results across the three combinations are generally better than when $\lambda_1 = 0.1$ or $\lambda_1 = 1$. This shows that the trajectory position prediction task is relatively important and that its error loss can improve the performance of the model. Among the combinations with $\lambda_1 = 10$, the performance first increases and then decreases as $\lambda_2$ grows, indicating that contrastive learning improves the model's trajectory representation ability and that incorporating the contrastive loss helps, but overemphasizing it degrades the model. Among the nine combinations, the model performed best at $\lambda_1 = 10$, $\lambda_2 = 1$, so the multi-task training hyperparameters were finally set to $\lambda_1 = 10$, $\lambda_2 = 1$. There remains room for further optimization of these parameters.
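The grid evaluation can be scripted as below; `train_and_evaluate` is a hypothetical stand-in for a full training run and returns a dummy score here for illustration.

```python
import random
from itertools import product

def train_and_evaluate(lambda1, lambda2):
    """Hypothetical stand-in for a full training run with the given loss
    weights; returns a validation score (dummy value for illustration)."""
    return random.random()

grid = [0.1, 1.0, 10.0]                        # levels used in the study
results = {(l1, l2): train_and_evaluate(l1, l2)
           for l1, l2 in product(grid, grid)}  # nine combinations (Table 5)

best_l1, best_l2 = max(results, key=results.get)
print(f"best combination: lambda1={best_l1}, lambda2={best_l2}")
```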

5.5. Robustness Experiments

This section describes the robustness experiments conducted to test the stability of the model's performance. The proportion of missing trajectory data was adjusted to evaluate the generalization and stability of the recovery effect. The sampling ratios were set to 12.5%, 25%, 50%, and 75%, corresponding to missing ratios of 87.5%, 75%, 50%, and 25%, respectively. Data with these different missing proportions were input into the model, and the experimental results are shown in Table 6.
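The missing-data simulation can be sketched as follows; the masking helper and the toy array are illustrative assumptions rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_trajectory(traj, keep_ratio):
    """Randomly keep `keep_ratio` of the points (missing completely at
    random), preserving temporal order; at least two points are retained."""
    n = len(traj)
    idx = np.sort(rng.choice(n, size=max(2, int(n * keep_ratio)), replace=False))
    return traj[idx]

# Sampling ratios of the robustness study; their complements (87.5%, 75%,
# 50%, 25%) are the corresponding missing ratios.
toy = np.arange(40, dtype=float).reshape(20, 2)
for keep in (0.125, 0.25, 0.50, 0.75):
    kept = mask_trajectory(toy, keep)
    print(f"keep={keep:.3f}  missing={1 - keep:.3f}  points kept={len(kept)}")
```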
As shown in Table 6, the model maintained good accuracy under all four missing proportions. As the missing proportion increased, the performance of the model gradually decreased: when the missing ratio increased from 25% to 87.5%, the accuracy of the model decreased by 19.12% and the MAE increased by 66.38. When the proportion of missing data increases, the amount of information available to the model decreases, so the model cannot obtain sufficient features and samples during training, making it difficult to learn the underlying patterns in the data. Even at the most severe missing proportion, the model maintained a reasonable level of accuracy, and at a 75% missing ratio its results were still better than those of some baseline models, as can be seen in Section 5.2.
To further study the impact of the missing ratios on model training, the training loss curves of the model under four different missing ratios were recorded, as shown in Figure 13.
As shown in Figure 13, the loss values of all curves were high at the initial stage of training because the model parameters were randomly initialized and the valid patterns in the data had not yet been learned. As training proceeded, the loss decreased rapidly and gradually flattened, indicating that the model can effectively learn the patterns in the data at all four missing ratios. At the 25% and 50% missing ratios, the training curves were smoother and converged faster; heavier missingness introduces more noise during training, making convergence harder. At the 87.5% missing ratio, the curve fluctuated strongly and converged the most slowly of the four, but eventually flattened and converged to a small value. This indicates that the proposed model exhibits good robustness across different missing ratios.

5.6. Time Cost Evaluation

The time costs of the models during both the training and testing phases are compared in Table 7. During training, the batch size was set to 64 for all models and the training time per iteration was recorded. During testing, the time to recover a single trajectory was recorded for each model. These metrics are crucial for understanding the efficiency and practicality of the models, especially when dealing with large-scale datasets.
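The timing protocol can be reproduced with a sketch like the following; the placeholder linear model and synthetic batch stand in for the real recovery models, and the CUDA synchronization calls are included only to make GPU timings meaningful.

```python
import time
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                        # placeholder for a recovery model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x, y = torch.randn(64, 8), torch.randn(64, 2)  # one batch of size 64

def timed_iteration():
    """Wall-clock seconds for one parameter update (one training iteration)."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()               # flush pending GPU work first
    start = time.perf_counter()
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start

print(f"{timed_iteration():.4f} s per iteration")
```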
Based on Table 7, Linear performed best during the testing phase but was less effective during training; moreover, it failed to recover trajectories accurately, achieving the lowest accuracy of 49.16%. For training time, Linear and DHTR exhibited the highest cost, requiring 259.34 s per iteration, likely because of their two-stage process, which does not scale well to large datasets. The LSTM and GRU models had relatively low training times, at 5.31 s and 4.9 s, respectively, indicating the efficiency of recurrent neural networks in handling sequential data. Among the end-to-end deep learning models, RNCGCL had the highest training time at 8.92 s, possibly because its graph contrastive learning mechanism and multi-task learning module increase the computational complexity. For testing time, Linear was the fastest, averaging 62.3 ms, consistent with its simple structure, while DHTR was the slowest at 104.2 ms, possibly due to its intricate decoding process. RNCGCL's testing time was relatively high at 94.3 ms, but it achieved the highest accuracy among these methods at 69.09%. Overall, RNCGCL achieved a balance between time cost and accuracy.

5.7. Validation of Trajectory Recovery Effectiveness

To comprehensively assess the trajectory recovery effect of the proposed method, this section validates it in terms of both visual analysis and downstream task performance. The visual case analysis shows the recovery results of the proposed RNCGCL model and two benchmark models, analyzed from both macro and micro perspectives. For downstream task performance, trajectory prediction is selected as the validation task, and the trajectory data recovered by the proposed model and three benchmark models are used as datasets to analyze the trajectory prediction results.

5.7.1. Visual Analysis of the Recovery Effect

To verify the effectiveness of the proposed RNCGCL model in the trajectory recovery task, a case of a trajectory with missing points was selected for visualization and analysis of the recovery effect. A vehicle trajectory on an urban road with a sampling rate of 15 s was selected as the case study, as shown in Figure 14. The trajectory points are labeled with blue symbols. Due to the loss of GPS signal, part of the trajectory is missing in the middle region.
In this paper, the original and recovered trajectories were plotted on OpenStreetMap, an open map platform. The recovery results of the RNCGCL model were compared with those of the benchmark models Linear and LSTM, analyzed from both macro and micro perspectives.
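A map overlay of this kind can be produced, for example, with the folium library; the coordinates below are toy values standing in for the real routes, and the file name is illustrative.

```python
import folium

# Toy coordinates standing in for the original and recovered routes
# (real coordinates come from the test set and the model output).
original  = [(41.1500, -8.6200), (41.1520, -8.6170), (41.1540, -8.6130)]
recovered = [(41.1500, -8.6200), (41.1521, -8.6172), (41.1539, -8.6131)]

m = folium.Map(location=original[0], zoom_start=16, tiles="OpenStreetMap")
folium.PolyLine(original, color="blue", weight=4, tooltip="original").add_to(m)
folium.PolyLine(recovered, color="red", weight=3, tooltip="recovered").add_to(m)
m.save("recovery_case.html")  # open in a browser to inspect the overlay
```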
For the macroscopic analysis, the routes of the original trajectory and the recovered trajectories are presented. The visualization of the recovery results is shown in Figure 15: the original trajectory is plotted with blue lines (Figure 15a), and the recovered trajectories with red lines. The result of the proposed RNCGCL is shown in Figure 15b, and the results of Linear and LSTM in Figure 15c and Figure 15d, respectively. Key areas of interest in the recovery results are highlighted by green and blue borders.
As shown in Figure 15b, the macroscopic recovery of the proposed RNCGCL was closely aligned with the ground truth, especially at turns and intersections. Its restored route is essentially consistent with the actual route, maintaining continuity and smoothness, as shown in the blue border. Figure 15c shows the less accurate recovery of the Linear model, with significant deviations from the actual route visible in the green border: its simplistic interpolation failed to capture road curvature, and the two-stage process degraded map matching precision, causing further divergence from the actual route. The macroscopic recovery of the LSTM model is shown in Figure 15d. Its recovered trajectory captured certain road bending features, and the restored route was generally accurate and smooth; however, there were deviations in local details, as shown in the blue border, owing to its limitations in capturing long-term dependencies.
From a microscopic point of view, the original trajectory points and the recovered trajectory points were plotted. The recovery effect is visualized in Figure 16: the original trajectory points are plotted in blue and the ground truth of the missing points in green (Figure 16a), while the trajectory points recovered by each model are shown in red. The results of the proposed RNCGCL are shown in Figure 16b, and the corresponding microscopic recovery results of Linear and LSTM in Figure 16c and Figure 16d, respectively.
As shown in Figure 16b, the recovery accuracy of the proposed RNCGCL was high: the coordinates of its recovered trajectory points were essentially identical to the actual ones. Figure 16c shows the microscopic recovery of the Linear model, whose recovered points deviated substantially from the ground truth. The interpolation error caused mismatched road segments during map matching, so its recovered results deviated from the original road and incurred a larger distance error. The microscopic recovery of the LSTM model is shown in Figure 16d. Being a neural network method that learns the temporal features of the trajectory, it captured a certain degree of road curvature in the missing region; however, some predicted points deviated from the actual results owing to the long-term dependency problem. Because the model decodes the recovery results step by step in a predictive manner, each recovered point considers only the historical information. The prediction accuracy of RNCGCL was significantly better than that of the benchmark models, especially at key points, and the overall error distribution was uniform. In addition, the proposed model uses a bidirectional LSTM, which fully exploits a characteristic of the missing trajectory problem: the information at both ends of the missing segment is known.

5.7.2. Analysis of Downstream Task Results

The quality of the recovered trajectories and their impact on practical applications were assessed by comparing performance on a downstream task when using the original versus the recovered trajectories. The downstream task selected in this paper is trajectory prediction, which aims to predict future trajectory points or paths from historical trajectory data; the smoothness and continuity of the recovered trajectories directly affect the performance of the prediction model. The vehicle trajectories recovered by RNCGCL, Linear, LSTM, and MTrajRec were used as training datasets to predict the next position from the preceding six positions. The trajectory prediction model was built with a two-layer LSTM whose hidden layers were both set to 128 dimensions. The corresponding data in the original trajectory dataset served as the ground truth during testing. The mean absolute error (MAE) and root-mean-square error (RMSE), calculated as in Equations (40) and (41), are used as evaluation metrics. The results of the trajectory prediction task are shown in Table 8.
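A minimal sketch of such a downstream predictor is given below, assuming (lon, lat) inputs; the class name and the toy batch are illustrative, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class NextPointPredictor(nn.Module):
    """Two-layer LSTM that maps the six preceding (lon, lat) points to the
    next point, mirroring the downstream model described above."""
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_dim,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, x):                 # x: (batch, 6, 2)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predicted (lon, lat)

model = NextPointPredictor()
x = torch.randn(32, 6, 2)                 # toy batch of recovered sub-trajectories
print(model(x).shape)                      # torch.Size([32, 2])
```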
As shown in Table 8, the MAE and RMSE of the trajectories recovered by RNCGCL were lower than those of the Linear- and LSTM-recovered trajectories and were close to the prediction errors obtained with the original data. The RNCGCL-recovered trajectories had better continuity and smoothness, providing more reliable inputs for the prediction model, whereas the Linear-recovered trajectories relied only on simple interpolation, which shifted the recovered points onto other roads and increased the prediction error. The downstream task results show that the RNCGCL-recovered trajectories performed close to the original trajectories in the prediction task and significantly better than the Linear- and LSTM-recovered ones, indicating that the RNCGCL-recovered trajectories are of high quality and practical value.

6. Conclusions

In this paper, the constraining effect of the road network on vehicle behavior was considered to form an end-to-end trajectory recovery model, RNCGCL. Map matching was adopted to process the data, guaranteeing that the recovered trajectory can be accurately matched to the actual road network. Road network data were also fused by combining the road network representation with the trajectory representation. Local road network graphs were created for each point in the trajectory based on the surrounding road network structure, and weights were assigned based on the distances of the nodes in the local graph. The feature representation vectors of the road network nodes were extracted through contrastive learning on the local graphs, so that the model could learn richer spatial semantic information. Multiple experiments on the Porto trajectory dataset and the corresponding road network dataset verified the accuracy and effectiveness of the model over the benchmark models. In addition, this paper comprehensively evaluated the effectiveness and usefulness of the proposed method through visual case studies and an analysis of downstream task performance. This study provides a robust solution for trajectory data recovery, contributing to the overall efficiency and sustainability of transportation.
However, the following aspects of the model merit future work. First, the model is trained with teacher forcing, which eases optimization but can lead to poor sample quality at inference. The model relies on learning the relationships on both sides of the missing data and therefore cannot be applied to segments with no observations at either end. In addition, the multi-task weights are determined empirically, which involves considerable trial and error; adaptive weighting methods that dynamically adjust the task weights could potentially lead to more accurate outcomes. Second, the trajectory distribution is susceptible to a variety of exogenous factors. Owing to data acquisition limitations, exogenous factors such as traffic accidents and weather could not be considered here; future research can incorporate these factors to adapt to different and complex scenarios. Dynamic traffic scenarios, especially abrupt changes caused by accidents or congestion, also pose a challenge, and exploring federated learning for collaborative traffic system learning across regions may help the model handle diverse traffic patterns effectively. Additionally, integrating datasets from several urban clusters is anticipated to enlarge the datasets, providing more comprehensive coverage of conditions and dynamics and thereby improving the robustness and applicability of the model across regions. Finally, as the road network expands and trajectories lengthen, the computational efficiency of the model decreases. Future research should therefore focus on optimizing the computational cost of the model to meet the requirements of online, large-scale trajectory data recovery. Exploring more efficient training methodologies, such as distributed training, can spread the processing across multiple nodes or machines, significantly reducing training time and making the model more scalable.

Author Contributions

Conceptualization, J.C. and Q.F.; methodology, Q.F.; software, Q.F.; validation, J.C. and Q.F.; writing—original draft preparation, Q.F.; writing—review and editing, Q.F.; visualization, Q.F.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61104166.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset was obtained from the ECML/PKDD 2015 competition and is publicly available through the Kaggle platform. The original data presented in the study are openly available in Kaggle at https://kaggle.com/competitions/pkdd-15-predict-taxi-service-trajectory-i (accessed on 7 March 2025) (Meghan O’Connell, Matias Moreira, and Wendy Kan. ECML/PKDD 15: Taxi Trajectory Prediction (I). Kaggle, 2015).

Acknowledgments

The authors would like to thank the reviewers for useful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cui, G.; Luo, J.; Wang, X. Personalized travel route recommendation using collaborative filtering based on GPS trajectories. Int. J. Digit. Earth 2022, 11, 284–307. [Google Scholar] [CrossRef]
  2. Graser, A.; Jalali, A.N.; Lampert, J.; Weißenfeld, A.; Janowicz, K. MobilityDL: A Review of Deep Learning From Trajectory Data. GeoInformatica 2024, 29, 115–147. [Google Scholar] [CrossRef]
  3. Jiao, P.; Zhao, X.; Zhang, D.; Hu, Y.; Yi, B. Survey of mobile mode analysis based on traffic big data. China J. Highw. Transp. 2021, 34, 175–202. [Google Scholar]
  4. Pan, Y.; Dong, Y.; Wang, D.; Cao, S.; Chen, A. Comparative study on fatigue evaluation of suspenders by introducing actual vehicle trajectory data. Sci. Rep. 2024, 14, 5165. [Google Scholar] [CrossRef]
  5. Fang, J.; He, H.; Xu, M.; Wu, X. Heterogeneous multi-modal graph network for arterial travel time prediction. Appl. Intell. 2025, 55, 446. [Google Scholar] [CrossRef]
  6. Li, M.; Tong, P.; Li, M.; Jin, Z.; Huang, J.; Hua, X.S. Traffic flow prediction with vehicle trajectories. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, virtual, 2–9 February 2021; Volume 35, pp. 294–302. [Google Scholar]
  7. Feng, J.; Li, Y.; Zhang, C.; Sun, F.; Jin, D. DeepMove: Predicting Human Mobility with Attentional Recurrent Networks. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018. [Google Scholar]
  8. Xi, D.; Zhuang, F.; Liu, Y.; Gu, J.; Xiong, H.; He, Q. Modelling of Bi-Directional Spatio-Temporal Dependence and Users’ Dynamic Preferences for Missing POI Check-In Identification. arXiv 2019, arXiv:2112.15285. [Google Scholar] [CrossRef]
  9. Wang, X.; Guan, X.; Cao, J.; Zhang, N.; Wu, H. Forecast Network-Wide Traffic States for Multiple Steps Ahead: A Deep Learning Approach Considering Dynamic Non-Local Spatial Correlation and Non-Stationary Temporal Dependency. arXiv 2020, arXiv:2004.02391. [Google Scholar] [CrossRef]
  10. Yang, S.; Liu, J.; Zhao, K. GETNext: Trajectory Flow Map Enhanced Transformer for Next POI Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022. [Google Scholar]
  11. Li, F.; Gui, Z.; Zhang, Z.; Peng, D.; Lei, Y. A hierarchical temporal attention-based LSTM encoder-decoder model for individual mobility prediction. Neurocomputing 2020, 403, 153–166. [Google Scholar] [CrossRef]
  12. Gui, Z.; Sun, Y.; Yang, L.; Peng, D.; Gong, J. LSI-LSTM: An attention-aware LSTM for real-time driving destination prediction by considering location semantics and location importance of trajectory points. Neurocomputing 2021, 440, 72–88. [Google Scholar] [CrossRef]
  13. Wang, S.; Bao, Z.; Culpepper, J.S.; Cong, G. A survey on trajectory data management, analytics, and learning. ACM Comput. Surv. CSUR 2021, 54, 1–36. [Google Scholar] [CrossRef]
  14. Zhen, Y. Urban Computing: Building future Cities with Big Data and AI. Satell. Netw. 2018, 12, 6. [Google Scholar]
  15. Zhao, Z.; Koutsopoulos, H.N.; Zhao, J. Individual mobility prediction using transit smart card data. Transp. Res. Part C Emerg. Technol. 2018, 89, 19–34. [Google Scholar] [CrossRef]
  16. Li, X.; Zhao, K.; Cong, G.; Jensen, C.; Wei, W. Deep representation learning for trajectory similarity computation. In Proceedings of the IEEE 34th International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 617–628. [Google Scholar]
  17. Xia, T.; Qi, Y.; Feng, J.; Xu, F.; Sun, F.; Guo, D.; Li, Y. AttnMove: History Enhanced Trajectory Recovery via Attentional Network. arXiv 2021. [Google Scholar] [CrossRef]
  18. Liao, L.; Lin, Y.; Li, W.; Zou, F.; Luo, L. Traj2traj: A road network constrained spatiotemporal interpolation model for traffic trajectory restoration. Trans. GIS 2023, 27, 1021–1042. [Google Scholar] [CrossRef]
  19. Calabrese, F.; Lorenzo, G.D.; Ratti, C. Human mobility prediction based on individual and collective geographical preferences. In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal, 19–22 September 2010; pp. 312–317. [Google Scholar]
  20. Song, C.; Qu, Z.; Blumm, N.; Barabasi, A.L. Limits of Predictability in Human Mobility. Science 2010, 327, 1018–1021. [Google Scholar] [CrossRef]
  21. Krumm, J.; Horvitz, E. Predestination: Inferring destinations from partial trajectories. In International Conference on Ubiquitous Computing; Springer: Berlin/Heidelberg, Germany, 2006; Volume 2006, pp. 243–260. [Google Scholar]
  22. Chen, Y.; Zhang, H.; Sun, W.; Zheng, B. Rntrajrec: Road network enhanced trajectory recovery with spatial-temporal transformer. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; Volume 2023, pp. 829–842. [Google Scholar]
  23. Zhou, F. Human Mobile Imputation Technology Based on Deep Learning. Master’s Thesis, University of Electronic Science and Technology of China, Hefei, China, 2022. [Google Scholar]
  24. Yuan, J.; Zhen, Y.; Zhang, C.; Xie, X.; Sun, G. An interactive voting based map matching algorithm. In Proceedings of the 2010 Eleventh International Conference on Mobile Data Management, Kansas City, MO, USA, 23–26 May 2010; Volume 2010, pp. 43–52. [Google Scholar]
  25. Ren, H.; Ruan, S.; Li, Y.; Bao, J.; Meng, C.; Li, R. MTrajRec: Map-constrained trajectory recovery via seq2seq multi-task learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; Volume 2021, pp. 1410–1419. [Google Scholar]
  26. Zhang, J.; Pei, H.; Ban, X.; Li, L. Analysis of cooperative driving strategies at road network level with macroscopic fundamental diagram. Transp. Res. Part C Emerg. Technol. 2022, 135, 103503. [Google Scholar] [CrossRef]
  27. Lou, Y.; Zhang, C.; Zheng, Y.; Xie, X.; Huang, Y. Map-matching for low-sampling-rate GPS trajectories. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 4–6 November 2009; Volume 2009, pp. 352–361. [Google Scholar]
  28. Chen, L.; Lv, M.; Chen, G. A system for destination and future route prediction based on trajectory mining. Pervasive Mob. Comput. 2010, 6, 657–676. [Google Scholar] [CrossRef]
  29. Endo, Y.; Nishida, K.; Toda, H.; Sawada, H. Predicting Destinations from Partial Trajectories Using Recurrent Neural Network. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, Republic of Korea, 23–26 May 2017. [Google Scholar]
  30. Park, S.; Kim, B.; Kang, C.M.; Chung, C.C.; Choi, J.W. Sequence-to-Sequence Prediction of Vehicle Trajectory via LSTM Encoder-Decoder Architecture. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1672–1678. [Google Scholar]
  31. Liang, Y.; Ke, S.; Zhang, J.; Yi, X.; Zheng, Y. Geoman: Multi-level attention networks for geo-sensory time series prediction. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; Volume 2018, pp. 3428–3434. [Google Scholar]
  32. Mikolov, T. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
  33. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Volume 2014, pp. 1532–1543. [Google Scholar]
  34. Kong, D.; Wu, F. HST-LSTM: A Hierarchical Spatial-Temporal Long-Short Term Memory Network for Location Prediction. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  35. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  36. Fu, T.Y.; Lee, W.C. Trembr: Exploring Road Networks for Trajectory Representation Learning. ACM Trans. Intell. Syst. Technol. 2020, 11, 1–25. [Google Scholar] [CrossRef]
  37. Rozemberczki, B.; Davies, R.; Sarkar, R.; Sutton, C. Gemsec: Graph embedding with self clustering. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Vancouver, BC, Canada, 27–30 August 2019; Volume 2019, pp. 65–72. [Google Scholar]
  38. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Volume 2016, pp. 855–864. [Google Scholar]
  39. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  40. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  41. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; p. 30. [Google Scholar]
  42. Chen, J. Research and Application of Multi-Scene Trajectory Representation Learning Based on Neural Network. Ph.D. Thesis, Suzhou University, Suzhou, China, 2023. [Google Scholar]
  43. Chen, Y.; Li, X.; Cong, G.; Bao, Z.; Long, C.; Liu, Y. Robust road network representation learning: When traffic patterns meet traveling semantics. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; Volume 2021, pp. 211–220. [Google Scholar]
  44. Liu, Q.; Wu, S.; Wang, L.; Tan, T. Predicting the next location: A recurrent model with spatial and temporal contexts. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; p. 30. [Google Scholar]
  45. Xue, A.Y.; Qi, J.; Xie, X.; Zhang, R.; Huang, J.; Li, Y. Solving the data sparsity problem in destination prediction. VLDB J. 2014, 24, 219–243. [Google Scholar] [CrossRef]
  46. Mathew, W.; Raposo, R.; Martins, B. Predicting future locations with hidden Markov models. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; Volume 2012, pp. 911–918. [Google Scholar]
  47. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; Volume 2014, pp. 701–710. [Google Scholar]
  48. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Mei, Q. LINE: Large-scale Information Network Embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; Volume 2015, pp. 1067–1077. [Google Scholar]
  49. Jin, G.; Yan, H.; Li, F.; Huang, J.; Li, Y. Spatio-temporal dual graph neural networks for travel time estimation. ACM Trans. Spat. Algorithms Syst. 2021. [Google Scholar] [CrossRef]
  50. Velickovic, P.; Fedus, W.; Hamilton, W.L.; Lio, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. ICLR Poster 2019, 2, 4. [Google Scholar]
  51. Qiu, J.; Chen, Q.; Dong, Y.; Zhang, J.; Yang, H.; Ding, M.; Tang, J. Gcc: Graph contrastive coding for graph neural network pretraining. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Long Beach, CA, USA, 6–10 August 2020; Volume 2020, pp. 1150–1160. [Google Scholar]
  52. Peng, Z.; Huang, W.; Luo, M.; Zheng, Q.; Huang, J. Graph representation learning via graphical mutual information maximization. In Proceedings of the Web Conference 2020, Taipei Taiwan, 20–24 April 2020; Volume 2020, pp. 259–270. [Google Scholar]
  53. Park, C.; Kim, D.; Han, J.; Yu, H. Unsupervised attributed multiplex network embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 5371–5378. [Google Scholar]
  54. Jiang, X.; Lu, Y.; Fang, Y.; Shi, C. Contrastive pretraining of GNNs on heterogeneous graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; Volume 2021, pp. 803–812. [Google Scholar]
  55. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wang, L. Deep graph contrastive representation learning. arXiv 2020, arXiv:2006.04131. [Google Scholar]
  56. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; Volume 2021, pp. 2069–2080. [Google Scholar]
  57. Hassani, K.; Khasahmadi, A.H. Contrastive multi-view representation learning on graphs. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 4116–4126. [Google Scholar]
  58. Yin, M.; Sheehan, M.; Feygin, S.; Paiement, J.F.; Pozdnoukhov, A. A generative model of urban activities from cellular data. IEEE Trans. Intell. Transp. Syst. 2017, 19, 1682–1696. [Google Scholar] [CrossRef]
  59. Newson, P.; Krumm, J. Hidden Markov map matching through noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, Seattle, WA, USA, 4–6 November 2009; Volume 2009, pp. 336–343. [Google Scholar]
  60. Joshi, R.R. A new approach to map matching for in-vehicle navigation systems: The rotational variation metric. In Proceedings of the ITSC 2001, 2001 IEEE Intelligent Transportation Systems, Proceedings (Cat. No. 01TH8585), Oakland, CA, USA, 25–29 August 2001; Volume 2001, pp. 33–38. [Google Scholar]
  61. Fang, X.; Huang, J.; Wang, F.; Zeng, L.; Liang, H.; Wang, H. Constgat: Contextual spatial-temporal graph attention network for travel time estimation at baidu maps. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 23–27 August 2020; Volume 2020, pp. 2697–2705. [Google Scholar]
  62. Dong, W.; Li, J.; Yao, R.; Li, C.; Yuan, T.; Wang, L. Characterizing driving styles with deep learning. arXiv 2016, arXiv:1607.03611. [Google Scholar]
  63. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2014, 27, 3104–3112. [Google Scholar]
  64. Oord, A.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
  65. Chen, K.J.; Liu, L.; Jiang, L.; Chen, J. Self-Supervised Dynamic Graph Representation Learning via Temporal Subgraph Contrast. ACM Trans. Knowl. Discov. Data 2023, 18, 1–20. [Google Scholar] [CrossRef]
  66. Hoteit, S.; Secci, S.; Sobolevsky, S.; Ratti, C.; Pujolle, G. Estimating human trajectories and hotspots through mobile phone data. Comput. Netw. 2014, 64, 296–307. [Google Scholar] [CrossRef]
  67. Zhao, W.X.; Lu, X.; Wu, N.; Wang, J.; Feng, K. Deep trajectory recovery with fine-grained calibration using kalman filter. IEEE Trans. Knowl. Data Eng. 2021, 33, 921–934. [Google Scholar]
  68. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197. [Google Scholar] [CrossRef]
  69. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; Volume 2017, pp. 1597–1600. [Google Scholar]
Figure 1. Illustration of trajectory recovery process. The red dash lines represent the matching pairs of recovered points and road segments.
Figure 2. Trajectory illustration.
Figure 3. Location proportion. Dark blue point represents the original point P and the light blue one represents the matched point p .
Figure 4. Constrained trajectory. Red dash lines represent the corresponding road segment sequence after matching.
Figure 5. RNCGCL model framework diagram.
Figure 6. Details of trajectory sequence processing module.
Figure 7. Details of road network local graph contrastive module.
Figure 8. Details of the multi-task decoder.
Figure 9. Road network preprocessing: (a) before preprocessing; (b) after preprocessing.
Figure 10. Accuracy curves for different variants of RNCGCL.
Figure 11. Effect of batch size parameters.
Figure 12. Effect of GAT hidden layer dimension parameters.
Figure 13. Training loss curves of RNCGCL at four missing ratios.
Figure 14. A case of vehicle trajectory with missing points. Blue mark represents the sampled trajectory point.
Figure 15. Comparison of macroscopic recovery effects: (a) original; (b) RNCGCL; (c) Linear; and (d) LSTM. The green and blue dashed boxes mark key areas of the recovered route; detailed recovery results for the area inside the blue dashed box are shown in the lower-right corner of (a,b,d).
Figure 16. Comparison of microscopic recovery effects: (a) ground truth; (b) RNCGCL; (c) Linear; and (d) LSTM. Blue circles represent the raw points, green circles the ground truth, and red circles the recovered points.
Table 1. Overview of trajectory and road network representation learning methods.

| Category | Method Name | Key Points | Limitations |
| --- | --- | --- | --- |
| Trajectory representation learning | One-hot [31] | Simple, orthogonal vectors | Loss of spatial connectivity, high computational cost |
| | Skip-gram [32] | Captures context, learns word vectors | Ignores spatial relationships |
| | GloVe [33] | Global interaction matrix, captures statistical and contextual info | Limited by vocabulary size |
| | RNN [34] | Captures sequential information | Struggles with long-term dependencies |
| | Transformer [35] | Attention mechanism, handles long sequences | High computational cost |
| | Trembr [36] | Road segment embeddings, captures traffic patterns | Limited by grid size division |
| | t2vec [16] | Grid-based, captures spatial relationships | Limited by grid resolution |
| Road network representation learning | DeepWalk [37] | Random walk, captures network topology | Computationally expensive |
| | Node2Vec [38] | Node embeddings, captures relationships | Loses local context information |
| | GCN [39] | Graph convolution, captures local connections | Requires substantial computational resources |
| | GAT [40] | Attention mechanism, flexible weight allocation | Model complexity, long training time |
| | GraphSage [41] | Aggregates neighbor features, learns topology | Requires extensive parameter tuning |
| | DGTM [42] | DeepWalk and GAT integration | Less effective for large-scale data |
| | Toast [43] | Traffic context-aware word embedding | Limited by the complexity of traffic data |
Table 2. Overview of trajectory recovery methods.

| Category | Method Name | Key Points | Limitations |
| --- | --- | --- | --- |
| Statistical methods | Linear [24] | Simple to implement | Poorly captures complex patterns, lacks spatial–temporal awareness |
| | Markov [27,46] | Models transitions between states | Limited to Markovian processes, may not capture all dynamics |
| | DHTR [9] | Hybrid model with Kalman filter | May not generalize well to all scenarios, complex to tune |
| Deep learning methods | LSTM [34] | Captures temporal dependencies | Struggles with long-term dependencies and road network constraints |
| | Transformer [35] | Attention mechanism, good for long sequences | Lacks road network constraints |
| | MTrajRec [25] | Multi-task learning, integrates trajectory recovery and map matching | Hard to handle long trajectories |
| | Bi-STDDP [8] | Incorporates bidirectional features and user preferences | Limited by the complexity of capturing intricate patterns |
| | AttnMove [17] | Complex attention mechanisms to capture patterns | Not universal due to user-based data |
Table 3. Experimental results of different models on each indicator.

| Model | Accuracy | Recall | Precision | F1-score | MAE | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| Linear | 0.4916 | 0.6597 | 0.6166 | 0.6374 | 358.24 | 494.32 |
| DHTR | 0.5501 | 0.6385 | 0.7149 | 0.6745 | 252.31 | 335.17 |
| LSTM | 0.5346 | 0.6241 | 0.6478 | 0.6357 | 234.77 | 312.25 |
| GRU | 0.5601 | 0.7123 | 0.7870 | 0.7478 | 196.34 | 298.22 |
| Transformer | 0.5902 | 0.7365 | 0.8229 | 0.7773 | 177.13 | 277.33 |
| Deepmove | 0.6713 | 0.6926 | 0.8340 | 0.7568 | 175.91 | 274.52 |
| MTrajRec-no poi | 0.6552 | 0.7608 | 0.8205 | 0.7895 | 156.25 | 254.70 |
| MTrajRec | 0.6281 | 0.7565 | 0.8210 | 0.7874 | 160.29 | 261.13 |
| RNCGCL | 0.6909 | 0.7831 | 0.8231 | 0.8026 | 142.78 | 228.20 |
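As a point of reference for how such indicators are commonly computed in trajectory-recovery work, the minimal sketch below follows one widespread convention: recall, precision, and F1-score compare the sets of road-segment IDs in the recovered and ground-truth routes, while MAE and RMSE aggregate point-wise Euclidean errors over projected coordinates. The function names, the set-based averaging scheme, and the assumption that coordinates are projected to meters are illustrative and may differ from the paper's exact definitions.

```python
import numpy as np

def segment_set_metrics(pred_route, true_route):
    # Set-based recall/precision over road-segment IDs, a common
    # convention in trajectory-recovery evaluation (the paper's exact
    # averaging scheme is an assumption here).
    pred, true = set(pred_route), set(true_route)
    inter = len(pred & true)
    recall = inter / len(true) if true else 0.0
    precision = inter / len(pred) if pred else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f1

def coordinate_errors(pred_xy, true_xy):
    # MAE/RMSE over point-wise Euclidean distances; assumes (x, y)
    # coordinates already projected so distances are in meters.
    d = np.linalg.norm(np.asarray(pred_xy) - np.asarray(true_xy), axis=1)
    return float(d.mean()), float(np.sqrt((d ** 2).mean()))
```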
Table 4. Experimental results for different variants of RNCGCL on all metrics.

| Model | Accuracy | Recall | Precision | F1-score | MAE | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| RNCGCL | 0.6909 | 0.7831 | 0.8231 | 0.8026 | 142.78 | 228.20 |
| w/o TSA | 0.6801 | 0.7385 | 0.8192 | 0.7768 | 185.34 | 273.49 |
| w/o TSA (Δ) | −1.56% | −5.70% | −0.47% | −3.22% | 15.80% | 19.85% |
| w/o GCL | 0.6613 | 0.7451 | 0.7946 | 0.7691 | 207.45 | 293.22 |
| w/o GCL (Δ) | −4.28% | −4.85% | −3.46% | −4.18% | 45.29% | 28.49% |
| w/o MT | 0.6786 | 0.7212 | 0.7741 | 0.7467 | 252.31 | 335.17 |
| w/o MT (Δ) | −1.78% | −7.90% | −5.95% | −6.96% | 52.70% | 35.48% |
Table 5. Experimental results with different hyperparameter combinations.

| λ1 | λ2 | Recall | Precision |
| --- | --- | --- | --- |
| 0.1 | 0.1 | 0.6912 | 0.7847 |
| 0.1 | 1 | 0.6941 | 0.8021 |
| 0.1 | 10 | 0.6671 | 0.7946 |
| 1 | 0.1 | 0.7142 | 0.7880 |
| 1 | 1 | 0.7711 | 0.7601 |
| 1 | 10 | 0.7246 | 0.7593 |
| 10 | 0.1 | 0.7348 | 0.8062 |
| 10 | 1 | 0.7593 | 0.8152 |
| 10 | 10 | 0.7446 | 0.7989 |
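λ1 and λ2 plausibly weight the terms of the multi-task objective (for example, a road-segment classification loss and a location-proportion regression loss, in line with the multi-task decoder of Figure 8). The exact pairing is not stated here, so the sketch below is only a generic weighted combination with hypothetical names; Table 5 suggests λ1 = 10, λ2 = 1 gives the best recall/precision trade-off.

```python
def total_loss(l_segment: float, l_proportion: float,
               lam1: float = 10.0, lam2: float = 1.0) -> float:
    # Hypothetical weighted sum of a segment-classification loss and a
    # location-proportion regression loss; which term each lambda
    # scales is an assumption, not a confirmed detail of RNCGCL.
    return lam1 * l_segment + lam2 * l_proportion
```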
Table 6. Experimental results of RNCGCL with different missing ratios.

| Missing Ratio | 25% | 50% | 75% | 87.5% |
| --- | --- | --- | --- | --- |
| MAE | 88.24 | 124.10 | 142.78 | 257.32 |
| Precision | 0.9015 | 0.8472 | 0.8231 | 0.7103 |
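One plausible way to construct such sparse inputs from a dense trajectory is to randomly drop interior points at the target ratio while keeping both endpoints; an 87.5% ratio then corresponds to keeping one point in eight on average. The sketch below illustrates this; the paper's actual sampling scheme (random versus fixed-interval) is an assumption here.

```python
import random

def sparsify(points, missing_ratio, seed=0):
    # Randomly drops interior points at the given ratio while keeping
    # both endpoints; assumes the trajectory has at least two points.
    # One plausible way to build the 25-87.5% settings of Table 6.
    rng = random.Random(seed)
    interior = points[1:-1]
    kept = [p for p in interior if rng.random() >= missing_ratio]
    return [points[0], *kept, points[-1]]
```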
Table 7. Training and testing time cost of different models.

| Model | Accuracy (%) | Training Time (s/its) | Testing Time (ms) |
| --- | --- | --- | --- |
| Linear | 49.16 | 259.34 | 62.3 |
| DHTR | 55.01 | 268.21 | 104.2 |
| LSTM | 53.46 | 5.31 | 85.2 |
| GRU | 56.01 | 4.9 | 84.6 |
| Transformer | 59.02 | 6.22 | 87.4 |
| Deepmove | 67.13 | 6.81 | 86.2 |
| MTrajRec | 62.81 | 8.41 | 84.6 |
| RNCGCL | 69.09 | 8.92 | 94.3 |
Table 8. Comparison of recovered data on the trajectory prediction task.

| Training Data | MAE | RMSE |
| --- | --- | --- |
| Ground Truth | 96.04 | 143.76 |
| Linear | 154.06 | 252.15 |
| LSTM | 103.27 | 159.78 |
| MTrajRec | 95.50 | 140.11 |
| RNCGCL | 95.83 | 142.49 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
