Article

LGTCN: A Spatial–Temporal Traffic Flow Prediction Model Based on Local–Global Feature Fusion Temporal Convolutional Network

1 School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen 518107, China
2 China Mobile Internet Co., Ltd., Guangzhou 510510, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8847; https://doi.org/10.3390/app14198847
Submission received: 1 September 2024 / Revised: 19 September 2024 / Accepted: 25 September 2024 / Published: 1 October 2024
(This article belongs to the Special Issue Advances in Intelligent Transportation Systems)

Abstract

High-precision traffic flow prediction facilitates intelligent traffic control and refined management decisions. Previous research has built a variety of exquisite models with good prediction results. However, when modeling spatial relationships, these models ignore the reality that traffic flow can propagate backwards on road networks, as well as the associations between distant nodes. In addition, more effective model components for modeling temporal relationships remain to be developed. To address these challenges, we propose a local–global feature fusion temporal convolutional network (LGTCN) for spatio-temporal traffic flow prediction, which incorporates a bidirectional graph convolutional network, probabilistic sparse self-attention, and a multichannel temporal convolutional network. To extract the bidirectional propagation relationships of traffic flow on the road network, we improve the traditional graph convolutional network so that information can propagate in both directions. In addition, in the spatial global dimension, we propose probabilistic sparse self-attention to effectively perceive global data correlations and reduce the computational complexity caused by the limited perspective of the graph. Furthermore, we develop a multichannel temporal convolutional network. It not only retains the temporal learning capability of temporal convolutional networks, but also assigns each channel to a node, realizing the interaction of node features through cross-channel output operations. Extensive experiments on four open-access benchmark traffic flow datasets demonstrate the effectiveness of our model.

1. Introduction

The growing number of automobiles has created many challenges for urban transportation, including traffic congestion and traffic-related environmental pollution [1,2]. The failure to divert congested traffic wastes travelers' time and fuel, which has long troubled urban transportation managers [3]. The development of Intelligent Transportation Systems (ITSs) offers promising solutions to this challenge [4]. In particular, high-precision spatial–temporal prediction of regional traffic flow provides important reference information for an ITS, supporting traffic control and management and improving the efficiency of urban traffic operation [5].
In the past few decades, scholars have proposed a large number of traffic flow prediction models and achieved excellent prediction results. These models mainly include statistical methods, traditional machine learning, and deep learning methods. Statistical methods include the history average (HA) [6], autoregressive integrated moving average (ARIMA) [7], seasonal autoregressive integrated moving average (SARIMA) [8], the Kalman filter model [9], etc. These models are simple, efficient, and easy to compute, but they struggle to capture the nonlinear temporal correlation of traffic flow and the spatial correlation of traffic in the city. To address this, machine learning methods have been proposed and widely used, including k-nearest neighbors (KNN), support vector regression (SVR) [10], and the gradient boosting machine (GBM) [11]. These methods can be applied to nonlinear traffic flow data, but they rely on manually set parameters, and their feature extraction capability is still limited. Deep learning methods solve the above problems well. For example, Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs) can learn long-term temporal dependencies [12,13], and Temporal Convolutional Networks (TCNs) can realize a larger receptive field through dilated convolution [14]. Furthermore, the topology of the road network can be modeled by advanced graph neural networks [15]. These models have achieved good results in traffic flow prediction, but the improvements achievable with simple models remain limited [16,17].
Focusing on the spatial characteristics of urban transportation networks is considered a significant contributor to improved prediction accuracy [18,19,20]. Early studies constructed traffic information as a spatio-temporal matrix and achieved good prediction results using convolutional neural networks (CNNs) [21,22,23]. However, CNNs are applicable to standard Euclidean data, while real traffic spatial relationships (i.e., road networks) are usually non-Euclidean. The Graph Convolutional Network (GCN), proposed later, introduces a spatially structured graph to represent road network nodes and connectivity relationships, and it shows excellent performance in spatio-temporal traffic prediction [2,24,25]. However, previous studies on traffic network graphs have overlooked the important fact that the nodal impacts of traffic flow can propagate not only in the forward direction, but also in the reverse direction. Specifically, a congested traffic flow not only affects the next node of the roadway at subsequent moments, but also spreads to the previous node. In addition, nodes with the same land use attributes may have similar change patterns due to commuting behavior, even though these nodes are non-adjacent in road topology, as shown in Figure 1. Note that Figure 1, Figure 2 and Figure 3 illustrate observational or experimental results on our datasets. Previous CNN and GCN studies could only extract long-distance features by stacking more layers, but deep stacks are prone to overfitting and still struggle to obtain global features [26]. In classical CNNs and GCNs, information can only propagate between neighboring vectors or nodes on the graph, and a multilayered propagation structure may introduce ambiguity or overfitting. How to effectively learn the global correlation of urban road networks has therefore been a hot and challenging research topic [27,28].
As shown in Figure 2, the patterns of the nodes exhibit not only certain commonalities but also heterogeneity. Specifically, node 1 has two traffic peaks, node 2 has only one traffic peak, and nodes 3 and 4 show strong similarity. Previous spatio-temporal studies of traffic prediction, whether based on convolutional networks, recurrent neural networks, or advanced Transformers, have treated all nodes of the city as a whole when learning temporal features, ignoring their independence [29,30]. A temporal module that can learn multinode heterogeneous features without losing commonality is still lacking.
To address the above challenges, we propose a local–global feature fusion temporal convolutional network (LGTCN) for accurate spatial–temporal traffic flow prediction, which incorporates a bidirectional graph convolutional network, probabilistic sparse self-attention, and a multichannel temporal convolutional network. Specifically, we employ bidirectional graph convolution to extract the bidirectional propagation of local road networks, as well as a probabilistic sparse self-attention mechanism to efficiently extract global features. Furthermore, we propose a multichannel temporal convolutional network to refine and extract temporal heterogeneity features without the loss of commonality. Finally, a multilayer perceptron is used to integrate the hidden states and output predictions. Extensive experiments are conducted on four open-access benchmark traffic datasets to evaluate the proposed model, and the experimental results demonstrate its effectiveness.
The main contributions of this paper are as follows:
  • We propose an innovative spatio-temporal traffic flow prediction scheme, which addresses the research difficulties in spatial and temporal dimensions.
  • In the spatial local dimension, we propose bidirectional graph convolution to extract local traffic characteristics with bidirectional propagation. It is more in line with the reality of traffic flow compared to previous graph neural network studies.
  • In the spatial global dimension, we express spatial nodes as spatial sequences and employ a probabilistic sparse self-attention mechanism for node interaction computation. Our method can reduce the computational complexity while maintaining the effectiveness of feature extraction.
  • In the temporal dimension, we propose a multichannel temporal convolutional network to extract temporal heterogeneity features without the loss of commonality. We set the number of channels equal to the number of nodes to realize fine extraction of multinode temporal features. Moreover, we utilize dilated causal convolution, which expands the receptive field to effectively extract the temporal features of long sequences while avoiding information leakage.
The rest of this paper is organized as follows: Section 2 summarizes the literature related to our topic; Section 3 introduces the problem studied in this paper; Section 4 presents the proposed spatio-temporal traffic flow prediction method; Section 5 presents experiments on four benchmark datasets, including descriptions of the datasets, experimental results, and findings. Finally, Section 6 concludes the paper and discusses possible future work.

2. Related Work

2.1. Deep Learning for Traffic Flow Prediction

The rapid development of deep learning provides new research methods for the spatial–temporal prediction of traffic flow [31,32,33]. Stacked autoencoders (SAEs) deepen the network layers to learn latent traffic spatial–temporal relationships and employ a greedy layerwise unsupervised algorithm to pretrain the model [34]. CNN-MLP can extract local spatial correlations and is less sensitive to data noise [18]. By abstracting network traffic into images, CNNs can be employed to learn large-scale spatio-temporal information [19]. Some scholars extracted traffic spatial–temporal information through Conv-LSTM and then used a Bi-LSTM to obtain temporal periodicity features of traffic flow [21]. An attention-optimized Conv-LSTM module has also been used to extract spatio-temporal features, combined with a BiLSTM module to efficiently capture daily and weekly periodic features [23]. T-GCN adopts a GCN to obtain the spatial dependence of urban roads and then introduces a GRU to obtain the temporal dependence of dynamic changes [35]. ST-ResNet uses convolutional and residual networks to extract spatial correlations and temporal features [36]. FCL-Net uses a Conv-LSTM network to extract spatial–temporal correlations, as well as an LSTM network to extract external weather features [37]. However, CNNs are usually suitable for grid data, whereas real traffic flow data are often non-Euclidean, so GCNs were introduced into traffic prediction. The STGCN employs a GCN and temporal gated convolutions to model spatial and temporal dependencies [38]. The DCRNN adopts a bidirectional random walk mechanism on the graph to model spatial dependence and uses Gated Recurrent Units (GRUs) to capture temporal dependence; the overall model adopts an encoder–decoder structure with scheduled sampling to improve prediction accuracy [39]. Furthermore, one study used advanced deep learning models for multitask learning and prediction on missing traffic data [40].

2.2. Attention Mechanism

The self-attention mechanism is an important component of the Transformer [41]. It can reduce the maximum length of the paths that network signals travel to the theoretical shortest $O(1)$ [42], so it has greater potential to capture the global dependencies of spatial nodes than CNNs and GCNs. The attention mechanism has been validated in traffic flow prediction for capturing long-range dependencies [27,43,44,45,46].
However, the quadratic computational complexity of the self-attention mechanism affects the efficiency of the model. The Longformer attention mechanism reduces the computational complexity through a combination of local self-attention and global attention, enabling the modeling of long text sequences [47]. Faster attention operations can be achieved by constructing a sparse Transformer that divides the complete attention computation into several blocks [48]. The LogSparse Transformer reduces the computational complexity to $O(L(\log L)^2)$ [49]. The Reformer achieves $O(L \log L)$ complexity by replacing dot-product attention with locality-sensitive hashing [50].
In addition, the self-attention mechanism computes weights over all nodes, which causes the final weights to follow a long-tailed distribution [51,52,53] and introduces noise from unrelated nodes. We used a self-attention mechanism to extract global spatial correlations on one of our datasets: as Figure 3a shows, long-range correlations can be learned beyond the prior graph structure. We then selected a point in the dataset for attention score distribution analysis; Figure 3b shows that the attention distribution is unstable. As shown in Figure 3c, when the attention scores are sorted from high to low, they clearly follow a long-tailed distribution, with a few nodes contributing most of the utility.

3. Preliminaries

In this paper, the task of traffic flow prediction is to predict the traffic flow in future time periods based on the historical traffic flow and the spatial relationship of nodes. This is defined as follows:
Nodes Graph: We use $G = (V, E, A)$ to represent the graph of nodes, where $V$ denotes the set of nodes, $E$ denotes the edges between nodes, and the adjacency matrix $A$ represents the connections between nodes.
Feature Matrix: The graph signal vector $X_N^{t} \in \mathbb{R}^{N}$, where $t$ denotes the time step, represents the observed traffic flow of the nodes at time step $t$.
Prediction Problem: Given the spatial–temporal data of the past $\tau$ time steps, learn a nonlinear mapping function to predict the data of the future $Q$ time steps. The formula is as follows:
$$\left[ X_N^{(t+1)}, \ldots, X_N^{(t+Q)} \right] = f\left( G;\ \left[ X_N^{(t-\tau+1)}, \ldots, X_N^{(t)} \right] \right)$$
where $f$ is the nonlinear mapping function to be learned, and $\tau$ denotes the number of look-back intervals.
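To make this mapping concrete, the following minimal sketch (our assumption about the data layout, not code from the paper) slices a node-by-time series into supervised samples, pairing $\tau$ past steps with $Q$ future steps:

```python
import numpy as np

def make_windows(data: np.ndarray, tau: int = 12, q: int = 12):
    """Slice a (T, N) traffic series into (input, target) samples.

    Hypothetical helper: inputs cover X^(t-tau+1)..X^(t) and targets
    cover X^(t+1)..X^(t+Q), matching the mapping f above.
    """
    xs, ys = [], []
    for t in range(tau, data.shape[0] - q + 1):
        xs.append(data[t - tau:t])   # past tau steps
        ys.append(data[t:t + q])     # future q steps
    return np.stack(xs), np.stack(ys)

# Example with a PEMS04-sized series (16,992 steps, 307 nodes):
# X, Y = make_windows(np.zeros((16992, 307)))  # -> (16969, 12, 307) each
```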

4. Methodology

We propose a prediction model that fuses local–global features through temporal convolutions, as shown in Figure 4. In the spatial dimension, a bidirectional graph convolutional network is used to extract spatial local features, and a probabilistic sparse self-attention mechanism is used to extract spatial global features. In the temporal dimension, a multichannel temporal convolutional network is used to extract temporal features.

4.1. Bidirectional Graph Convolutional Network

For local node relationships, we consider nodes with physical road connections to be neighbors on the actual road network. The graph convolutional network aggregates neighborhood information according to the spatial graph structure [54], which can be effectively used for spatial feature extraction of traffic flow. Traffic nodes have upstream and downstream relationships in a physical sense, but because traffic flow can propagate backwards, traffic events at downstream nodes also affect the upstream traffic state. Therefore, modeling both forward and backward propagation on the traffic graph is more appropriate [39,55]. The bidirectional message propagation function in the first-order approximation of ChebNet is defined as follows:
First, we calculate an adjacency matrix $\tilde{A}$ with a self-connected structure:
$$\tilde{A} = A + I$$
where $A$ denotes the adjacency matrix without self-connections, and $I$ is the identity matrix. More specifically, $A_{ij} = 1$ indicates that node $i$ and node $j$ are connected; the same notation applies below.
Then, the degree matrix $\tilde{D}$ can be obtained accordingly:
$$\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$$
Then, the normalized adjacency matrix $\hat{A}$ can be calculated as follows:
$$\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$$
Finally, we can calculate the hidden feature matrix $H^{l}$ of the $l$th graph convolutional layer:
$$H^{l} = \hat{A} H^{l-1} W + \hat{A}^{T} H^{l-1} W$$
where $\hat{A}^{T}$ is the transpose of the normalized adjacency matrix, and $W$ is a learnable parameter matrix.
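A minimal PyTorch sketch of this bidirectional layer follows. It is our reading of the equations above, not the authors' released code; note that the paper writes a single weight matrix W for both directions, whereas we use two separate mappings as an assumption (with a single shared mapping, the two terms would collapse for an undirected graph):

```python
import torch
import torch.nn as nn

class BiGCNLayer(nn.Module):
    """Sketch of H^l = A_hat H^{l-1} W_f + A_hat^T H^{l-1} W_b."""

    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor):
        super().__init__()
        # adj: (N, N) adjacency, assumed directed for traffic graphs
        a_tilde = adj + torch.eye(adj.size(0))            # A~ = A + I
        d_inv_sqrt = torch.diag(a_tilde.sum(dim=1).pow(-0.5))
        self.register_buffer("a_hat",
                             d_inv_sqrt @ a_tilde @ d_inv_sqrt)
        self.w_fwd = nn.Linear(in_dim, out_dim, bias=False)
        self.w_bwd = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, N, in_dim); aggregate along both edge directions
        return self.a_hat @ self.w_fwd(h) + self.a_hat.T @ self.w_bwd(h)
```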

4.2. Probabilistic Sparse Self-Attention

4.2.1. Canonical Self-Attention

The self-attention mechanism is an important structure of the Transformer, and it has effectively demonstrated its feature extraction ability [41]. The Transformer maps the input into query, key, and value matrices; attention is calculated by the scaled dot product, and Softmax is then used to obtain the weights. The formula is as follows:
$$A(Q, K, V) = \mathrm{Softmax}\left( \frac{Q K^{T}}{\sqrt{d}} \right) V$$
where $Q, K, V \in \mathbb{R}^{L \times d}$, and $d$ is the input dimension. Softmax is an activation function that allocates attention to each feature; after the Softmax calculation, the attention weights sum to 1.
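As a reference point, the canonical attention above takes only a few lines of PyTorch (a generic sketch, not specific to LGTCN):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (L, d); returns Softmax(Q K^T / sqrt(d)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v
```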

4.2.2. Multihead Self-Attention

The multihead attention mechanism divides the model into subspaces, which allows the model to learn information from different representation subspaces at different positions. The linear mappings of $Q$, $K$, and $V$ are split into $h$ heads of dimension $d_s = d / h$; the attention operation is performed for each head, and the $h$ attention outputs are concatenated and linearly mapped to obtain the output. The multihead attention mechanism expands the model's ability to attend to different positions, and the parameters of the linear mappings are learnable, which improves the model's fitting ability. The mechanism first computes the attention information of multiple heads independently and then fuses the multidimensional features through a linear mapping. The formula is as follows:
$$\mathrm{head}_i = A\left( Q W_i^{Q}, K W_i^{K}, V W_i^{V} \right)$$
$$\mathrm{MA}(Q, K, V) = \mathrm{Concat}\left( \mathrm{head}_1, \ldots, \mathrm{head}_h \right) W^{O}$$
where MA stands for the multihead attention function, $h$ is the number of heads, Concat concatenates the high-dimensional feature matrices of the multiple heads, each $W$ is a learnable linear mapping matrix, $W_i^{Q}, W_i^{K}, W_i^{V} \in \mathbb{R}^{d \times d_s}$, and $W^{O} \in \mathbb{R}^{d \times d}$.
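In practice the multihead computation is available off the shelf; the snippet below uses PyTorch's built-in layer with the head count h = 4 and d = 12 reported in Section 5.3 (the batch and node sizes are illustrative):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=12, num_heads=4, batch_first=True)
x = torch.randn(8, 307, 12)      # (batch, nodes as a sequence, d)
out, attn = mha(x, x, x)         # self-attention: Q = K = V = x
```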

4.2.3. Probabilistic Sparse Measurement

The self-attention mechanism usually forms a long-tailed distribution, in which a few dot products contribute the main attention and the others contribute little, so the computational complexity can be reduced by capturing only the dominant dot products. Elements of the long-tailed distribution that deviate from a uniform distribution have a better chance of being dominant contributors, so the Informer uses a variant of the KL divergence to filter out the more important dot-product pairs, known as the max-mean measurement [42]:
$$M(q_i, K) = \max_{j} \left\{ \frac{q_i k_j^{T}}{\sqrt{d}} \right\} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{T}}{\sqrt{d}}$$
where $q_i$ is the $i$th element of $Q$, and $k_j$ is the $j$th element of $K$.
The Informer achieves sparse self-attention by allowing each key to attend only to the $u$ dominant queries [42]:
$$A(Q, K, V) = \mathrm{Softmax}\left( \frac{\bar{Q} K^{T}}{\sqrt{d}} \right) V$$
where $\bar{Q}$ is a sparse matrix including only the top-$u$ dominant queries, and Softmax is an activation function that determines the attention weights of the features. First, a fixed number of keys is randomly sampled for each query; second, the $M$ value of each query is calculated; then, the $u$ queries with the highest $M$ values are selected, with $u = c \cdot \ln L$ for a constant sampling factor $c$, and the other dot-product pairs are filled with 0; finally, only the dot products between the $u$ queries and all keys are calculated. The outputs of the remaining queries are set to the mean of the self-attention layer's input.
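The following sketch illustrates the probabilistic sparse computation. For readability it departs from the Informer in one place, which we flag as an assumption: M(q_i, K) is evaluated against all keys rather than a random subsample, and the sampling factor c is left as a free parameter:

```python
import math
import torch

def prob_sparse_attention(q, k, v, c: float = 5.0):
    """q, k, v: (L, d). Dominant queries attend normally; the outputs
    of the remaining (lazy) queries are the mean of V."""
    L, d = q.shape
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)        # (L, L)
    # Max-mean measurement M(q_i, K) per query
    m = scores.max(dim=-1).values - scores.mean(dim=-1)
    u = max(1, min(L, int(c * math.log(L))))               # u = c * ln L
    top = m.topk(u).indices                                # top-u queries
    # Lazy queries output mean(V); dominant ones attend as usual
    out = v.mean(dim=0, keepdim=True).expand(L, -1).clone()
    out[top] = torch.softmax(scores[top], dim=-1) @ v
    return out
```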

4.3. Multichannel Temporal Convolutional Network

4.3.1. Temporal Convolutional Network

The TCN adopts structures such as dilated causal convolution and residual blocks, which can obtain a larger receptive field with fewer layers and maintain the stability of the model.
Dilated Causal Convolution: Causal convolution does not use recurrent connections, so it can achieve faster network training with parallel input of time series data. However, standard causal convolution needs to stack many layers or use large convolution kernels to increase the receptive field of neurons when dealing with long time series. Dilated causal convolution improves the receptive field without significantly increasing the computational cost through a skip operation [14]. The formula is as follows:
$$DCC(i) = \sum_{j=0}^{k-1} h(j) \cdot x_{i - d \cdot j}$$
where $DCC(i)$ is the convolution result of the $i$th element in the sequence $(x_0, \ldots, x_t)$; $h(j)$ is the convolution kernel, whose size for a one-dimensional sequence is $K = 1 \times k$; and $d$ is the dilation factor, with $d = 1$ corresponding to standard causal convolution.
The dilated causal convolution structure is shown in Figure 5, with kernel size $K = 1 \times 2$ and dilation factors $d$ of 1, 2, 4, and 4 in successive layers. With the same number of network layers, a much larger receptive field can be achieved.
Residual Block: The residual block maintains the stability of model features by introducing skip connections, which alleviate the degradation problem of deep networks [56]. The input $x$ of the residual block is added to the features extracted by the dilated causal convolution module, and the final output $o$ is obtained through the activation function:
$$o = \mathrm{ReLU}(x + D(x))$$
where ReLU is an activation function, and $D(\cdot)$ is the dilated causal convolution block.
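A compact PyTorch sketch of one such residual block, combining a left-padded (causal) dilated convolution with the skip connection, is given below. It is our illustration of the structure, with the channel count held fixed so the addition needs no projection:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalResBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 2, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # pad the past only
        self.conv = nn.Conv1d(channels, channels,
                              kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); left padding keeps the conv causal,
        # so no future information leaks into the output
        y = self.conv(F.pad(x, (self.pad, 0)))
        return torch.relu(x + y)                  # o = ReLU(x + D(x))
```

Stacking four such blocks with dilations 1, 2, 4, and 4 reproduces the receptive-field growth sketched in Figure 5.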

4.3.2. Multichannel Mechanism

Traffic flow spatial–temporal prediction can also be regarded as a multivariate time series forecasting problem consisting of multiple one-dimensional time series. The variation patterns of the sequences differ, so finer-grained extraction of temporal features can improve prediction accuracy. We regard different nodes as different channels and set the input channels $c_i$ and output channels $c_o$ equal to the number of nodes $N$; each channel then responds to the temporal variation of a different node. For each output channel, the convolution kernel is $c_i \times K$, and the total number of convolution kernels is $c_o \times c_i \times K$.
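Under this setting, the multichannel TCN is simply a 1-D convolution whose channel count equals the node count, so every output channel mixes all nodes' temporal features; a brief sketch follows (the PEMS04 node count is used as an example):

```python
import torch.nn as nn

N, k = 307, 2                                     # nodes; kernel 1 x k
conv = nn.Conv1d(in_channels=N, out_channels=N, kernel_size=k)
# One kernel per output channel is c_i x K = 307 x (1 x 2);
# in total c_o x c_i x K kernels, i.e., weight shape (307, 307, 2).
assert conv.weight.shape == (N, N, k)
```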

4.4. Prediction Module

The prediction module of our model is a fully connected neural network architecture used to improve the nonlinear prediction ability [41]. The formula is as follows:
$$FFN(X) = (X W_1 + b_1) W_2 + b_2$$
where $W_1$ and $W_2$ are learnable parameter matrices, and $b_1$ and $b_2$ are biases.
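A minimal sketch of this module follows; the ReLU between the two linear layers is our assumption, taken from the Transformer feed-forward design the module cites [41], since the text emphasizes nonlinear prediction ability:

```python
import torch.nn as nn

class PredictionModule(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, hidden: int = 128):
        super().__init__()
        # 128 hidden neurons, as reported in Section 5.3
        self.net = nn.Sequential(nn.Linear(in_dim, hidden),
                                 nn.ReLU(),   # assumed nonlinearity
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)
```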

5. Experiments

5.1. Datasets Description

We evaluated the prediction performance of LGTCN on four public traffic flow datasets, PEMS03, PEMS04, PEMS07, and PEMS08, released with STSGCN [57]. The data were collected from four regions in California and aggregated into discrete records at 5 min intervals, so each node generates 288 data points per day; a detailed description is given in Table 1.
We normalized the data using max-min scaling:
$$X' = \frac{X - X_{min}}{X_{max} - X_{min}}$$
where $X_{min}$ is the smallest value in $X$, and $X_{max}$ is the largest value in $X$.
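For reference, the scaling is a one-liner; computing the statistics on the training split only is our convention, not something the paper states:

```python
import numpy as np

def max_min_scale(x: np.ndarray, x_min: float, x_max: float) -> np.ndarray:
    # Map values into [0, 1] using (x - min) / (max - min)
    return (x - x_min) / (x_max - x_min)
```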

5.2. Baseline Methods

  • FC-LSTM: The Long Short-Term Memory Network is suitable for dealing with long-term dependencies in time series [60].
  • DCRNN: The Diffusion Convolutional Recurrent Neural Network uses bidirectional random walks on the graph to capture spatial correlations and an encoder–decoder architecture with predetermined sampling to capture temporal correlations [39].
  • STGCN: The Spatial–Temporal Graph Convolutional Network employs a temporal gated convolutional module to capture temporal correlations and a graph neural network to capture spatial correlations [38].
  • ASTGCN(r): The ASTGCN constructs recent, daily, and weekly modules. A spatial–temporal attention module is used to capture spatial–temporal dynamic features, a graph convolution module is used to capture spatial features, and a standard convolution is used to capture temporal features. The compared variant, ASTGCN(r), uses only the recent module [32].
  • STSGCN: The Spatial–Temporal Synchronous Graph Convolutional Networks construct a spatial–temporal synchronization graph to capture spatial–temporal relationships simultaneously and obtain prediction results by stacking multiple modules to aggregate long-range spatial–temporal relationships and heterogeneity [57].
  • STFGNN: The Spatial–Temporal Fusion Graph Neural Network constructs a temporal graph based on time series similarity to learn long-range dependencies, and it employs a gated dilated convolution module whose large dilation rate can also capture long-range dependencies [59].
The spatial-temporal feature extraction methods used by different baseline models are shown in Table 2.

5.3. Experiment Settings

We split the original datasets into training, validation, and test sets in a ratio of 6:2:2. We used the past one hour of data to predict the next hour, i.e., $\tau = Q = 12$.
We implemented the LGTCN model on the PyTorch platform, and the hyperparameters of the model were determined on the validation set. We used the same model architecture for all four datasets. The probabilistic sparse self-attention mechanism uses 4 heads with $d = 12$. The multichannel temporal convolutional network adopts dilation factors of 1, 2, 4, and 4, respectively. The prediction module has 128 hidden-layer neurons.
We adopted three metrics for model performance evaluation: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). Their formulas are as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i^{p} - y_i \right|$$
$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i^{p} - y_i \right)^2 }$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i^{p} - y_i}{y_i} \right|$$
where $n$ is the number of values, $y_i$ is the ground-truth value, and $y_i^{p}$ is the predicted value for sample $i$.
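These metrics are straightforward to compute; the epsilon guard below is our addition for the zero and tiny ground-truth values discussed in Section 5.4.1, since the paper does not state its masking rule:

```python
import numpy as np

def mae(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mape(y_pred, y_true, eps=1e-8):
    # eps avoids division by zero on tiny ground-truth values
    return np.mean(np.abs((y_pred - y_true) / (y_true + eps)))
```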

5.4. Experiment Results

5.4.1. Model Prediction Effects Comparison

The comparison of the prediction performance of the various models is shown in Table 3. The LGTCN achieved the best results in ten out of the twelve indicators, showing superior prediction performance. On all four datasets, our experimental results were consistently strong. Our proposed model, LGTCN, consistently outperformed the other baseline models in MAE and RMSE. In MAPE, our model took the lead on two datasets but fell slightly behind on the other two. One reason is the presence of zeros and tiny values in the data, which makes the MAPE less applicable. Another reason is that the loss function of the model is the MAE, so the model is optimized toward the minimum MAE value. Nonetheless, we argue that our model achieves an important improvement in prediction performance compared to the baseline models.
The FC-LSTM can only exploit temporal correlation, not spatial correlation, so its prediction performance is the worst. The DCRNN, STGCN, and ASTGCN(r) use different modules to exploit the temporal and spatial correlations of the data, so their prediction accuracy is improved. The STSGCN can extract temporal and spatial information simultaneously, and the STFGNN can take long-range spatial correlations into account, so they exhibit excellent predictive performance.
Our model can extract global spatial correlations with a small computational cost and then use a multichannel temporal convolutional network to achieve refined extraction of temporal correlations, and the comprehensive prediction performance is the best.

5.4.2. Horizon Analysis

As can be seen from Figure 6 and Table 4, the prediction error increased steadily as the horizon increased; the model had higher prediction accuracy in the near term than in the long term. The different evaluation metrics were not always consistent with one another. For example, the model achieved the lowest MAPE on the PEMS07 dataset, while its MAE and RMSE were the highest of the four datasets.

5.4.3. Ablation Experiment

To verify the role of each module in prediction, we fixed the random seed and conducted a series of ablation experiments on the four datasets with the following variants: NO-PSA: remove the probabilistic sparse self-attention; NO-MCTCN: remove the multichannel temporal convolutional network; NO-PM: remove the prediction module; NO-RES: remove the residual connections.
The experimental results are shown in Table 5 and Figure 7; each module plays a role in the prediction. Figure 7d shows the validation errors of the different variants on the PEMS08 dataset. In terms of MAE, the errors of NO-PSA, NO-MCTCN, NO-PM, and NO-RES increased by 3.91%, 20.84%, 16.09%, and 24.18% compared to the LGTCN, respectively. In terms of MAPE, the errors increased by 8.63%, 20.07%, 12.30%, and 23.52%, respectively. In terms of RMSE, the errors increased by 3.28%, 20.54%, 14.44%, and 25.02%, respectively.
After removing the PSA module, the model still showed strong predictive ability, because the cross-channel operations of the MCTCN module can also extract spatial relationships. The multichannel mechanism in the MCTCN module achieves refined extraction for each node. Although the PM and RES modules are relatively simple, they have a very important effect on the stability of the model.

6. Conclusions

In this paper, we propose a traffic flow prediction model based on a local–global feature fusion temporal convolutional network. First, we used a forward and backward graph convolutional network to obtain spatial local features, and we used a probabilistic sparse self-attention mechanism to efficiently extract spatial global features. Then, we used a multichannel temporal convolutional network to extract temporal features, where fine-grained extraction is achieved by making each channel correspond to a node. Finally, the prediction result is obtained through the prediction module. Extensive experiments demonstrate the overall better performance of our model, and ablation experiments demonstrate that each module plays a role in prediction.
In the future, we will study the simultaneous extraction of spatially local and global features. At the same time, due to the cross-correlation operation of the multichannel convolution output, the model cannot realize fully independent extraction of temporal features, which inspires us to construct a series of independent temporal extraction submodules in the future. In addition, we plan to embed real-time data from multiple sources (e.g., weather) into the model to further improve its prediction accuracy. We also hope to apply the model to more urban spatio-temporal data mining areas, such as parking demand analysis. Downstream applications, including congestion information dissemination systems and dynamic navigation, also remain to be developed.

Author Contributions

Conceptualization, W.Y. and D.Z.; methodology, W.Y. and D.Z.; software, W.Y., H.K. and D.Z.; validation, W.Y., D.Z., H.K. and K.D.; formal analysis, W.Y. and J.L.; investigation, W.Y. and D.Z.; resources, D.Z. and J.L.; data curation, W.Y., H.K. and K.D.; writing—original draft preparation, W.Y.; writing—review and editing, W.Y., H.K., K.D., D.Z. and J.L.; visualization, W.Y., H.K. and K.D.; supervision, D.Z.; project administration, D.Z.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Key Areas R&D Program, grant number 2019B090913001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Dongran Zhang was employed by the China Mobile Internet Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Lv, Y.; Chen, Y.; Zhang, X.; Duan, Y.; Li, N.L. Social media based transportation research: The state of the work and the networking. IEEE/CAA J. Autom. Sin. 2017, 4, 19–26. [Google Scholar] [CrossRef]
  2. Li, Z.; Xiong, G.; Chen, Y.; Lv, Y.; Hu, B.; Zhu, F.; Wang, F.Y. A hybrid deep learning approach with GCN and LSTM for traffic flow prediction. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 1929–1933. [Google Scholar]
  3. Liu, Y.; Feng, T.; Rasouli, S.; Wong, M. ST-DAGCN: A spatiotemporal dual adaptive graph convolutional network model for traffic prediction. Neurocomputing 2024, 601, 128175. [Google Scholar] [CrossRef]
  4. Li, Z.; Wei, S.; Wang, H.; Wang, C. ADDGCN: A Novel Approach with Down-Sampling Dynamic Graph Convolution and Multi-Head Attention for Traffic Flow Forecasting. Appl. Sci. 2024, 14, 4130. [Google Scholar] [CrossRef]
  5. Xia, Z.; Zhang, Y.; Yang, J.; Xie, L. Dynamic spatial–temporal graph convolutional recurrent networks for traffic flow forecasting. Expert Syst. Appl. 2024, 240, 122381. [Google Scholar] [CrossRef]
  6. Stephanedes, Y.J.; Michalopoulos, P.G.; Plum, R.A. Improved estimation of traffic flow for real-time control. Transp. Res. Rec. 1980, 95, 28–39. [Google Scholar]
  7. Ahmed, M.S.; Cook, A.R. Analysis of Freeway Traffic Time-Series Data by Using Box-Jenkins Techniques; Transportation Research Board: Washington, DC, USA, 1979; Number 722. [Google Scholar]
  8. Williams, B.M.; Hoel, L.A. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. J. Transp. Eng. 2003, 129, 664–672. [Google Scholar] [CrossRef]
  9. Okutani, I.; Stephanedes, Y.J. Dynamic prediction of traffic volume through Kalman filtering theory. Transp. Res. Part B Methodol. 1984, 18, 1–11. [Google Scholar] [CrossRef]
  10. Su, H.; Zhang, L.; Yu, S. Short-term traffic flow prediction based on incremental support vector regression. In Proceedings of the Third International Conference on Natural Computation (ICNC 2007), Washington, DC, USA, 24–27 August 2007; Volume 1, pp. 640–645. [Google Scholar]
  11. Yang, S.; Wu, J.; Du, Y.; He, Y.; Chen, X. Ensemble learning for short-term traffic prediction based on gradient boosting machine. J. Sens. 2017, 2017, 7074143. [Google Scholar] [CrossRef]
  12. Tian, Y.; Zhang, K.; Li, J.; Lin, X.; Yang, B. LSTM-based traffic flow prediction with missing data. Neurocomputing 2018, 318, 297–305. [Google Scholar] [CrossRef]
  13. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328. [Google Scholar]
  14. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  15. Ni, Q.; Peng, W.; Zhu, Y.; Ye, R. Graph dropout self-learning hierarchical graph convolution network for traffic prediction. Eng. Appl. Artif. Intell. 2023, 123, 106460. [Google Scholar] [CrossRef]
  16. Chen, J.; Xu, M.; Xu, W.; Li, D.; Peng, W.; Xu, H. A flow feedback traffic prediction based on visual quantified features. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10067–10075. [Google Scholar] [CrossRef]
  17. Zheng, G.; Chai, W.K.; Duanmu, J.L.; Katos, V. Hybrid deep learning models for traffic prediction in large-scale road networks. Inf. Fusion 2023, 92, 93–114. [Google Scholar] [CrossRef]
  18. Song, C.; Lee, H.; Kang, C.; Lee, W.; Kim, Y.B.; Cha, S.W. Traffic speed prediction under weekday using convolutional neural networks concepts. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1293–1298. [Google Scholar]
  19. Ma, X.; Dai, Z.; He, Z.; Ma, J.; Wang, Y.; Wang, Y. Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction. Sensors 2017, 17, 818. [Google Scholar] [CrossRef]
  20. Zhang, D.; Yan, J.; Polat, K.; Alhudhaif, A.; Li, J. Multimodal joint prediction of traffic spatial-temporal data with graph sparse attention mechanism and bidirectional temporal convolutional network. Adv. Eng. Inform. 2024, 62, 102533. [Google Scholar] [CrossRef]
  21. Liu, Y.; Zheng, H.; Feng, X.; Chen, Z. Short-term traffic flow prediction with Conv-LSTM. In Proceedings of the 2017 9th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017; pp. 1–6. [Google Scholar]
  22. Khajeh Hosseini, M.; Talebpour, A. Traffic prediction using time-space diagram: A convolutional neural network approach. Transp. Res. Rec. 2019, 2673, 425–435. [Google Scholar] [CrossRef]
  23. Zheng, H.; Lin, F.; Feng, X.; Chen, Y. A hybrid deep learning model with attention-based conv-LSTM networks for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6910–6920. [Google Scholar] [CrossRef]
  24. Yu, B.; Lee, Y.; Sohn, K. Forecasting road traffic speeds by considering area-wide spatio-temporal dependencies based on a graph convolutional neural network (GCN). Transp. Res. Part C Emerg. Technol. 2020, 114, 189–204. [Google Scholar] [CrossRef]
  25. Chen, Z.; Zhao, B.; Wang, Y.; Duan, Z.; Zhao, X. Multitask learning and GCN-based taxi demand prediction for a traffic road network. Sensors 2020, 20, 3776. [Google Scholar] [CrossRef]
  26. Kuang, H.; Qu, H.; Deng, K.; Li, J. A physics-informed graph learning approach for citywide electric vehicle charging demand prediction and pricing. Appl. Energy 2024, 363, 123059. [Google Scholar] [CrossRef]
  27. Zhang, X.; Huang, C.; Xu, Y.; Xia, L.; Dai, P.; Bo, L.; Zhang, J.; Zheng, Y. Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 19–21 May 2021; Volume 35, pp. 15008–15015. [Google Scholar]
  28. Luo, Q.; He, S.; Han, X.; Wang, Y.; Li, H. LSTTN: A Long-Short Term Transformer-based spatiotemporal neural network for traffic flow forecasting. Knowl. Based Syst. 2024, 293, 111637. [Google Scholar] [CrossRef]
  29. Wu, F.; Zheng, C.; Zhang, C.; Ma, J.; Sun, K. Multi-View Multi-Attention Graph Neural Network for Traffic Flow Forecasting. Appl. Sci. 2023, 13, 711. [Google Scholar] [CrossRef]
  30. Lian, Q.; Sun, W.; Dong, W. Hierarchical Spatial-Temporal Neural Network with Attention Mechanism for Traffic Flow Forecasting. Appl. Sci. 2023, 13, 9729. [Google Scholar] [CrossRef]
  31. Shi, X.; Qi, H.; Shen, Y.; Wu, G.; Yin, B. A Spatial–Temporal Attention Approach for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 22, 4909–4918. [Google Scholar] [CrossRef]
  32. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 922–929. [Google Scholar]
  33. Han, S.Y.; Zhao, Q.; Sun, Q.W.; Zhou, J.; Chen, Y.H. Engs-dgr: Traffic flow forecasting with indefinite forecasting interval by ensemble gcn, seq2seq, and dynamic graph reconfiguration. Appl. Sci. 2022, 12, 2890. [Google Scholar] [CrossRef]
  34. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F.Y. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2014, 16, 865–873. [Google Scholar] [CrossRef]
  35. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858. [Google Scholar] [CrossRef]
  36. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
  37. Ke, J.; Zheng, H.; Yang, H.; Chen, X.M. Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach. Transp. Res. Part C Emerg. Technol. 2017, 85, 591–608. [Google Scholar] [CrossRef]
  38. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3634–3640. [Google Scholar]
  39. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  40. Wang, A.; Ye, Y.; Song, X.; Zhang, S.; James, J. Traffic prediction with missing data: A multi-task learning approach. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4189–4202. [Google Scholar] [CrossRef]
  41. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  42. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 19–21 May 2021. [Google Scholar]
  43. Lin, Z.; Li, M.; Zheng, Z.; Cheng, Y.; Yuan, C. Self-attention convlstm for spatiotemporal prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11531–11538. [Google Scholar]
  44. Zheng, C.; Fan, X.; Wang, C.; Qi, J. Gman: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1234–1241. [Google Scholar]
  45. Wang, X.; Ma, Y.; Wang, Y.; Jin, W.; Wang, X.; Tang, J.; Jia, C.; Yu, J. Traffic flow prediction via spatial temporal graph neural network. In Proceedings of the Web Conference 2020, Taipei Taiwan, 20–24 April 2020; pp. 1082–1092. [Google Scholar]
  46. Xie, Y.; Xiong, Y.; Zhu, Y. SAST-GNN: A self-attention based spatio-temporal graph neural network for traffic prediction. In Proceedings of the International Conference on Database Systems for Advanced Applications, Jeju, Republic of Korea, 24–27 September 2020; Springer: Cham, Switzerland, 2020; pp. 707–714. [Google Scholar]
  47. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150. [Google Scholar]
  48. Child, R.; Gray, S.; Radford, A.; Sutskever, I. Generating long sequences with sparse transformers. arXiv 2019, arXiv:1904.10509. [Google Scholar]
  49. Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  50. Kitaev, N.; Kaiser, L.; Levskaya, A. Reformer: The Efficient Transformer. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  51. Liu, Z.; Miao, Z.; Zhan, X.; Wang, J.; Gong, B.; Yu, S.X. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2537–2546. [Google Scholar]
  52. Li, Y.; Shen, T.; Long, G.; Jiang, J.; Zhou, T.; Zhang, C. Improving Long-Tail Relation Extraction with Collaborating Relation-Augmented Attention. In Proceedings of the 28th International Conference on Computational Linguistics, Virtual Conference, 8–13 December 2020; pp. 1653–1664. [Google Scholar]
  53. Zhao, X.; Qi, R. Improving Long-tail Relation Extraction with Knowledge-aware Hierarchical Attention. In Proceedings of the 2021 IEEE 12th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 20–22 August 2021; pp. 166–169. [Google Scholar]
  54. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203. [Google Scholar]
  55. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019. [Google Scholar]
  56. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  57. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 914–921. [Google Scholar]
  58. Ji, C.; Xu, Y.; Lu, Y.; Huang, X.; Zhu, Y. Contrastive Learning-Based Adaptive Graph Fusion Convolution Network With Residual-Enhanced Decomposition Strategy for Traffic Flow Forecasting. IEEE Internet Things J. 2024, 11, 20246–20259. [Google Scholar] [CrossRef]
  59. Li, M.; Zhu, Z. Spatial-temporal fusion graph neural networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, 2–9 February 2021; Volume 35, pp. 4189–4196. [Google Scholar]
  60. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
Figure 1. Spatial correlations. (a) Spatial relationship according to prior graph structure. (b) Pearson correlation coefficient for all nodes.
Figure 2. Heterogeneity in temporal dimension.
Figure 3. Feature distribution learned by self-attention. (a) The distribution of global attention values. (b) A node attention score in node order. (c) The attention score of a node is sorted from high to low.
Figure 4. Overall framework of the model.
Figure 5. Structure of dilated causal convolution.
Figure 6. Prediction errors on different datasets. (a) MAE at different horizons. (b) MAPE(%) at different horizons. (c) RMSE at different horizons.
Figure 7. Comparison of ablation experiment results.
Table 1. Datasets description [58,59].
Datasets | Nodes | Days | Time Steps | Missing Rate
PEMS03 | 358 | 91 | 26,208 | 0.672%
PEMS04 | 307 | 59 | 16,992 | 3.182%
PEMS07 | 883 | 98 | 28,224 | 0.452%
PEMS08 | 170 | 62 | 17,856 | 0.696%
Table 2. Methodology used in the baseline model.
Model | Spatial Global | Spatial Local | Temporal
FC-LSTM | - | - | RNN
DCRNN | - | Bidirectional random walks | RNN
STGCN | - | GCN | TCN
ASTGCN(r) | - | GCN | TCN
STSGCN | - | Synchronization graph | Synchronization graph
STFGNN | Temporal graph | - | Gated dilated convolution
LGTCN | Probabilistic sparse self-attention | Bidirectional GCN | Multichannel TCN
Table 3. Performance comparison of different models in traffic flow dataset prediction.
Datasets | Metrics | FC-LSTM [60] | DCRNN [39] | STGCN [38] | ASTGCN(r) [32] | STSGCN [57] | STFGNN [59] | LGTCN
PEMS03 | MAE | 21.33 ± 0.24 | 18.18 ± 0.15 | 17.49 ± 0.46 | 17.69 ± 1.43 | 17.48 ± 0.15 | 16.77 ± 0.09 | 15.21 ± 0.04
PEMS03 | MAPE(%) | 23.33 ± 0.23 | 18.91 ± 0.82 | 17.15 ± 0.45 | 19.40 ± 2.24 | 16.78 ± 0.20 | 16.30 ± 0.09 | 14.79 ± 0.20
PEMS03 | RMSE | 35.11 ± 0.50 | 30.31 ± 0.25 | 30.12 ± 0.70 | 29.66 ± 1.68 | 29.21 ± 0.56 | 28.34 ± 0.46 | 24.11 ± 0.07
PEMS04 | MAE | 27.14 ± 0.20 | 24.70 ± 0.22 | 22.70 ± 0.64 | 22.93 ± 1.29 | 21.19 ± 0.10 | 19.83 ± 0.06 | 19.58 ± 0.09
PEMS04 | MAPE(%) | 18.20 ± 0.40 | 17.12 ± 0.37 | 14.59 ± 0.21 | 16.56 ± 1.36 | 13.90 ± 0.05 | 13.02 ± 0.05 | 15.92 ± 0.31
PEMS04 | RMSE | 41.59 ± 0.21 | 38.12 ± 0.26 | 35.55 ± 0.75 | 35.22 ± 1.90 | 33.65 ± 0.20 | 31.88 ± 0.14 | 31.37 ± 0.14
PEMS07 | MAE | 29.98 ± 0.42 | 25.30 ± 0.52 | 25.38 ± 0.49 | 28.08 ± 2.34 | 24.26 ± 0.14 | 22.07 ± 0.11 | 22.03 ± 0.13
PEMS07 | MAPE(%) | 13.20 ± 0.53 | 11.66 ± 0.33 | 11.08 ± 0.18 | 13.92 ± 1.65 | 10.21 ± 1.05 | 9.21 ± 0.07 | 9.96 ± 0.14
PEMS07 | RMSE | 45.94 ± 0.57 | 38.58 ± 0.70 | 38.78 ± 0.58 | 42.57 ± 3.31 | 39.03 ± 0.27 | 35.80 ± 0.18 | 34.83 ± 0.16
PEMS08 | MAE | 22.20 ± 0.18 | 17.86 ± 0.03 | 18.02 ± 0.14 | 18.60 ± 0.40 | 17.13 ± 0.09 | 16.64 ± 0.09 | 16.20 ± 0.08
PEMS08 | MAPE(%) | 14.20 ± 0.59 | 11.45 ± 0.03 | 11.40 ± 0.10 | 13.08 ± 1.00 | 10.96 ± 0.07 | 10.60 ± 0.06 | 10.56 ± 0.20
PEMS08 | RMSE | 34.06 ± 0.32 | 27.83 ± 0.05 | 27.83 ± 0.20 | 28.16 ± 0.48 | 26.80 ± 0.18 | 26.22 ± 0.15 | 25.22 ± 0.12
Table 4. Comparison of errors in different prediction horizons on four datasets.
Datasets | Metrics | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
PEMS03 | MAE | 13.31 | 13.92 | 14.44 | 14.86 | 15.23 | 15.53 | 15.79 | 16.02 | 16.30 | 16.65 | 17.06 | 17.63
PEMS03 | MAPE(%) | 13.38 | 13.78 | 14.12 | 14.41 | 14.71 | 15.00 | 15.28 | 15.50 | 15.77 | 16.12 | 16.54 | 17.10
PEMS03 | RMSE | 20.32 | 21.53 | 22.53 | 23.32 | 23.98 | 24.49 | 24.91 | 25.30 | 25.74 | 26.30 | 26.93 | 27.79
PEMS04 | MAE | 17.99 | 18.37 | 18.74 | 19.04 | 19.32 | 19.62 | 19.94 | 20.22 | 20.45 | 20.71 | 21.11 | 21.71
PEMS04 | MAPE(%) | 13.65 | 14.04 | 14.36 | 14.65 | 14.93 | 15.21 | 15.49 | 15.77 | 16.02 | 16.31 | 16.69 | 17.15
PEMS04 | RMSE | 28.66 | 29.45 | 30.13 | 30.67 | 31.16 | 31.62 | 32.10 | 32.54 | 32.92 | 33.30 | 33.85 | 34.60
PEMS07 | MAE | 18.47 | 19.81 | 20.77 | 21.49 | 22.07 | 22.64 | 23.23 | 23.83 | 24.46 | 25.10 | 25.84 | 26.83
PEMS07 | MAPE(%) | 8.39 | 9.00 | 9.44 | 9.81 | 10.13 | 10.43 | 10.74 | 11.05 | 11.33 | 11.66 | 12.07 | 12.61
PEMS07 | RMSE | 28.58 | 30.94 | 32.47 | 33.60 | 34.52 | 35.36 | 36.20 | 37.01 | 37.83 | 38.67 | 39.63 | 40.84
PEMS08 | MAE | 14.42 | 14.79 | 15.14 | 15.47 | 15.78 | 16.14 | 16.45 | 16.74 | 17.02 | 17.30 | 17.70 | 18.27
PEMS08 | MAPE(%) | 8.91 | 9.16 | 9.39 | 9.65 | 9.93 | 10.19 | 10.44 | 10.68 | 10.91 | 11.13 | 11.45 | 11.87
PEMS08 | RMSE | 21.97 | 22.77 | 23.49 | 24.16 | 24.76 | 25.35 | 25.88 | 26.37 | 26.80 | 27.21 | 27.73 | 28.49
Table 5. Ablation experiment.
Method | PEMS03 (MAE / MAPE(%) / RMSE) | PEMS04 (MAE / MAPE(%) / RMSE) | PEMS07 (MAE / MAPE(%) / RMSE) | PEMS08 (MAE / MAPE(%) / RMSE)
LGTCN | 15.14 / 14.73 / 24.00 | 19.77 / 15.36 / 31.75 | 22.88 / 10.56 / 35.47 | 16.27 / 10.31 / 25.41
NO-PSA | 17.14 / 15.67 / 26.90 | 20.47 / 15.72 / 32.44 | 25.26 / 11.97 / 38.52 | 16.90 / 11.20 / 26.25
NO-MCTCN | 17.31 / 17.05 / 27.54 | 23.61 / 17.17 / 36.66 | 25.98 / 11.89 / 39.77 | 18.89 / 11.58 / 29.08
NO-PM | 18.15 / 17.50 / 30.69 | 24.28 / 17.87 / 38.18 | 29.00 / 12.84 / 45.00 | 19.66 / 12.38 / 30.63
NO-RES | 18.45 / 18.73 / 31.53 | 21.60 / 19.51 / 34.80 | 28.36 / 12.80 / 47.35 | 20.20 / 12.73 / 31.77
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ye, W.; Kuang, H.; Deng, K.; Zhang, D.; Li, J. LGTCN: A Spatial–Temporal Traffic Flow Prediction Model Based on Local–Global Feature Fusion Temporal Convolutional Network. Appl. Sci. 2024, 14, 8847. https://doi.org/10.3390/app14198847


