Spatio-Temporal Self-Attention Network for Origin–Destination Matrix Prediction in Urban Rail Transit
Abstract
1. Introduction
2. Related Work
2.1. Origin–Destination Matrix Prediction in Traffic Scenarios
2.2. Self-Attention Mechanism
2.3. Our Contribution
3. Methods
3.1. Problem Formulation
3.2. Overview
3.3. Spatio-Temporal Non-Local Operation
3.3.1. Spatial Orthogonal Non-Local Operation
3.3.2. Spatio-Temporal Non-Local Operation
3.4. Spatio-Temporal Self-Attention Module
3.4.1. The Overall Architecture of STSM
3.4.2. Attention Map Generation Network
3.4.3. Non-Local Feature Aggregation Network
3.5. Loss Function
4. Experiments
4.1. Dataset
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Comparison Methods
4.5. Results and Discussions
4.5.1. Comparison of Prediction Performance
4.5.2. Ablation Studies
- (1) The proposed SSNet outperforms SSNet-A, which indicates that the proposed STSMs are crucial to improving the network's prediction performance. A possible reason is that the STSMs aggregate long-range spatio-temporal contextual information more effectively than convolutional layers, helping the optimizer reach lower local minima and therefore a lower best RMSE.
- (2) From Table 2, the prediction performance improves from SSNet-A to SSNet but degrades from SSNet to SSNet-B, which indicates that simply increasing the number of STSMs cannot continuously improve prediction performance. A possible explanation is as follows. Because the convolutional operation cannot efficiently capture non-local feature information, SSNet-A struggles to converge to a low minimum during parameter optimization, so its prediction performance is hard to improve further. Compared with SSNet-A, the proposed SSNet adds an appropriate number of STSMs, which effectively aggregates long-range spatio-temporal contextual information with little impact on parameter optimization; the prediction performance therefore improves. However, adding even more STSMs can hardly strengthen the network's non-local information aggregation ability any further, while it greatly increases the difficulty of parameter optimization and hinders convergence. Consequently, the prediction performance of SSNet-B decreases relative to SSNet. The convergence curves of SSNet-A, SSNet-B and SSNet in Figure 11 further support this analysis.
- (3) SSNet performs better than SSNet-C, which demonstrates that placing the STSMs toward the back of the network yields better prediction performance than placing them at the front. The main reason may be that, during backpropagation, the parameters of STSMs closer to the network output are easier to optimize, which is conducive to network convergence.
- (4) As shown in Table 2, the prediction performance improves from SSNet-D to SSNet but degrades from SSNet to SSNet-E, which indicates that increasing the number of RB–SRBs cannot continuously improve prediction performance. A possible explanation is as follows. SSNet-D contains only one RB–SRB, so the network is too shallow to extract high-level feature information. Compared with SSNet-D, the appropriately increased number of RB–SRBs in SSNet extracts higher-level features while having little impact on parameter optimization, so SSNet performs better. SSNet-E adds one more RB–SRB on top of SSNet; since SSNet already extracts effective high-level features, the extra block contributes little, while three RB–SRBs increase the difficulty of parameter optimization during training. Therefore, the OD prediction performance of SSNet-E decreases relative to SSNet.
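The long-range aggregation attributed to the STSMs above builds on the non-local (self-attention) operation of Wang et al. (see References). As an illustration only, not the authors' exact STSM (whose attention-map generation and feature aggregation networks are described in Section 3.4), a minimal NumPy sketch of an embedded-Gaussian non-local operation over a spatio-temporal feature map might look like this; all weight matrices here are hypothetical placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_aggregation(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local operation over a spatio-temporal
    feature map x of shape (T, H, W, C).

    Every position attends to every other position across all time
    steps, so long-range spatio-temporal context is aggregated in a
    single operation.
    """
    t, h, w, c = x.shape
    flat = x.reshape(-1, c)                 # (N, C) with N = T*H*W
    theta = flat @ w_theta                  # queries (N, C')
    phi = flat @ w_phi                      # keys    (N, C')
    g = flat @ w_g                          # values  (N, C')
    attn = softmax(theta @ phi.T, axis=-1)  # (N, N) attention map
    out = attn @ g                          # aggregated features
    return out.reshape(t, h, w, -1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 8))       # T=2, H=W=4, C=8
w = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
y = non_local_aggregation(x, *w)
print(y.shape)  # (2, 4, 4, 8)
```

Because every position attends to all T·H·W positions at once, a single such operation covers context that stacked convolutions would need many layers to reach, which is consistent with the SSNet versus SSNet-A comparison above. The quadratic (N, N) attention map also hints at why adding many STSMs complicates optimization.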
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993v5.
- Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 105–114.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. arXiv 2014, arXiv:1409.3215v3.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805v2.
- Zhang, J.; Zheng, Y.; Qi, D. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 4–9 February 2017; pp. 1655–1661.
- Liu, L.; Qiu, Z.; Li, G.; Wang, Q.; Ouyang, W.; Lin, L. Contextualized Spatial–Temporal Network for Taxi Origin-Destination Demand Prediction. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3875–3887.
- Zhang, J.; Chen, F.; Wang, Z.; Liu, H. Short-Term Origin-Destination Forecasting in Urban Rail Transit Based on Attraction Degree. IEEE Access 2019, 7, 133452–133462.
- Li, D.; Cao, J.; Li, R.; Wu, L. A Spatio-Temporal Structured LSTM Model for Short-Term Prediction of Origin-Destination Matrix in Rail Transit With Multisource Data. IEEE Access 2020, 8, 84000–84019.
- Zhang, J.; Che, H.; Chen, F.; Ma, W.; He, Z. Short-term origin-destination demand prediction in urban rail transit systems: A channel-wise attentive split-convolutional neural network method. Transp. Res. Part C Emerg. Technol. 2021, 124, 102928.
- Zhang, J.; Shen, D.; Tu, L.; Zhang, F.; Xu, C.; Wang, Y.; Tian, C.; Li, X.; Huang, B.; Li, Z. A Real-Time Passenger Flow Estimation and Prediction Method for Urban Bus Transit Systems. IEEE Trans. Intell. Transp. Syst. 2017, 18, 3168–3178.
- Ou, J.; Lu, J.; Xia, J.; An, C.; Lu, Z. Learn, Assign, and Search: Real-Time Estimation of Dynamic Origin-Destination Flows Using Machine Learning Algorithms. IEEE Access 2019, 7, 26967–26983.
- Bierlaire, M.; Crittin, F. An Efficient Algorithm for Real-Time Estimation and Prediction of Dynamic OD Tables. Oper. Res. 2004, 52, 116–127.
- Wang, S.-W.; Ou, D.-X.; Dong, D.-C.; Xie, H. Research on the model and algorithm of origin-destination matrix estimation for urban rail transit. In Proceedings of the 2011 International Conference on Transportation, Mechanical, and Electrical Engineering (TMEE), Changchun, China, 16–18 December 2011; pp. 1403–1406.
- Yang, C.; Yan, F.; Xu, X. Daily metro origin-destination pattern recognition using dimensionality reduction and clustering methods. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 548–553.
- Noursalehi, P.; Koutsopoulos, H.N.; Zhao, J. Dynamic Origin-Destination Prediction in Urban Rail Systems: A Multi-Resolution Spatio-Temporal Deep Learning Approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 5106–5115.
- Cao, Y.; Hou, X.; Chen, N. Short-Term Forecast of OD Passenger Flow Based on Ensemble Empirical Mode Decomposition. Sustainability 2022, 14, 8562.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 5753–5763.
- Lin, Z.; Feng, M.; dos Santos, C.N.; Yu, M.; Xiang, B.; Zhou, B.; Bengio, Y. A structured self-attentive sentence embedding. arXiv 2017, arXiv:1703.03130v1.
- Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv 2019, arXiv:1901.02860v3.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2016, arXiv:1409.0473v7.
- Choromanski, K.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking attention with performers. arXiv 2022, arXiv:2009.14794v4.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
- Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-Attention Generative Adversarial Networks. arXiv 2019, arXiv:1805.08318.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-Cross Attention for Semantic Segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2021, arXiv:2010.11929v2.
- Ye, J.; Zhao, J.; Zheng, F.; Xu, C. Completion and augmentation-based spatiotemporal deep learning approach for short-term metro origin-destination matrix prediction under limited observable data. Neural Comput. Appl. 2023, 35, 3325–3341.
- Zhou, W.; Du, H.; Mei, W.; Fang, L. Spatial orthogonal attention generative adversarial network for MRI reconstruction. Med. Phys. 2021, 48, 627–639.
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.; Wong, W.; Woo, W. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 802–810.
Table 1. Comparison of prediction performance among different methods.

Method | RMSE | MAE | SMAPE |
---|---|---|---|
ConvLSTM | 0.648 ± 0.343 | 0.161 ± 0.100 | 0.185 ± 0.078 |
STResNet | 0.722 ± 0.476 | 0.172 ± 0.119 | 0.189 ± 0.078 |
CASCNN | 0.740 ± 0.473 | 0.178 ± 0.116 | 0.195 ± 0.077 |
SSNet | 0.622 ± 0.336 * | 0.159 ± 0.105 | 0.183 ± 0.080 |
Table 2. Ablation study results of SSNet and its variants with different block sequences.

Method | Block Sequence | RMSE | MAE | SMAPE |
---|---|---|---|---|
SSNet-A | RB–RB–RB–RB | 0.726 ± 0.490 | 0.172 ± 0.120 | 0.188 ± 0.079 |
SSNet-B | SRB–SRB–SRB–SRB | 0.631 ± 0.310 | 0.172 ± 0.107 | 0.202 ± 0.086 |
SSNet-C | SRB–RB–SRB–RB | 0.648 ± 0.343 | 0.168 ± 0.111 | 0.190 ± 0.086 |
SSNet-D | RB–SRB | 0.656 ± 0.365 | 0.167 ± 0.105 | 0.192 ± 0.079 |
SSNet-E | RB–SRB–RB–SRB–RB–SRB | 0.652 ± 0.353 | 0.168 ± 0.110 | 0.193 ± 0.085 |
SSNet | RB–SRB–RB–SRB | 0.622 ± 0.336 * | 0.159 ± 0.105 | 0.183 ± 0.080 |
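The three metrics reported above are standard. For reference, a short NumPy sketch follows; note that SMAPE has several common variants, so the normalization shown here is an assumption and may differ slightly from the paper's exact definition:

```python
import numpy as np

def rmse(y_true, y_pred):
    # root mean squared error
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    # mean absolute error
    return float(np.mean(np.abs(y_true - y_pred)))

def smape(y_true, y_pred, eps=1e-8):
    # one common SMAPE variant: absolute error over the mean of the
    # absolute values; eps guards against zero denominators
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0 + eps
    return float(np.mean(np.abs(y_true - y_pred) / denom))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
print(rmse(y_true, y_pred))  # ≈ 0.354
print(mae(y_true, y_pred))   # 0.25
```

RMSE penalizes large errors more heavily than MAE, while SMAPE is scale-free, which is why reporting all three gives a more complete picture of OD prediction quality.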
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, W.; Tang, T.; Gao, C. Spatio-Temporal Self-Attention Network for Origin–Destination Matrix Prediction in Urban Rail Transit. Sustainability 2024, 16, 2555. https://doi.org/10.3390/su16062555