Article

Parallel Interactive Attention Network for Short-Term Origin–Destination Prediction in Urban Rail Transit

1 School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
2 Traffic Control Technology Co., Ltd., Beijing 100070, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(1), 100; https://doi.org/10.3390/app14010100
Submission received: 22 November 2023 / Revised: 18 December 2023 / Accepted: 20 December 2023 / Published: 21 December 2023

Abstract: Short-term origin–destination (termed as OD) prediction is crucial to improve the operation of urban rail transit (termed as URT). The latest research results show that deep learning can effectively improve the performance of short-term OD prediction and meet the real-time requirements. However, many advanced neural network design ideas have not been fully applied in the field of short-term OD prediction in URT. In this paper, a novel parallel interactive attention network (termed as PIANet) for short-term OD prediction in URT is proposed to further improve the short-term OD prediction accuracy. In the proposed PIANet, a novel omnidirectional attention module (termed as OAM) is proposed to improve the representational power of the network by calculating the feature weights in the channel–spatial dimension. Moreover, a simple yet effective feature interaction is proposed to improve the feature utilization. Based on the two real-world datasets from the Beijing subway, the comparative experiments demonstrate that the proposed PIANet outperforms the state-of-the-art deep learning methods for short-term OD prediction in URT, and the ablation studies demonstrate that the proposed OAMs and feature interaction play an important role in improving the short-term OD prediction accuracy.

1. Introduction

Urban rail transit (termed as URT) plays an important role in solving the problems of traffic congestion and environmental pollution in medium and large cities. Short-term origin–destination (termed as OD) prediction is crucial to improve the operation of URT. Short-term OD prediction in URT can alleviate problems such as congestion or an incident that may occur in the URT network, so as to improve the operation of URT and provide a better travel experience for passengers.
At present, there are three main challenges for short-term OD prediction in URT: (1) data sparsity—in an OD matrix, many OD pairs are either small or zero; (2) complex spatio-temporal correlations—specifically, the variation of passenger flow among adjacent OD matrices in the time dimension usually follows the same distribution. Moreover, for each OD matrix, the change of one element may cause the change of other elements. Thus, there are complex spatio-temporal correlations among adjacent OD matrices; (3) low data availability from the same day—specifically, in-and-out card swiping information for each passenger is required to obtain a complete OD matrix. However, there are different time delays for different passengers between entering the station and leaving the station; therefore, it is difficult to obtain the historical OD matrices of the same day, and only the historical inflow series of the same day can be obtained. The above challenges make it difficult to further improve the short-term OD prediction performance.
The existing OD prediction methods in traffic scenarios can be mainly divided into three categories: the conventional methods [1,2], the machine learning methods [3,4], and the recently emerging deep learning methods [5,6,7,8]. The conventional methods and machine learning methods have three main problems: (1) their computational complexity is high for large-scale URT networks, so they cannot meet the need for real-time prediction; (2) it is difficult for them to capture complex spatio-temporal correlations; (3) their prediction accuracy is low. Compared with the conventional methods and machine learning methods, the deep learning methods can not only meet the need for real-time prediction but also further improve the prediction accuracy.
In recent years, deep learning has developed rapidly and achieved excellent performance in various fields [9,10,11,12,13,14,15,16]. However, many advanced neural network design ideas have not been fully applied in the field of short-term OD prediction in URT. For example, channel–spatial attention and feature interaction are rarely applied in the short-term OD prediction in URT. The introduction of these advanced neural network designs into the field of short-term OD prediction in URT can further improve the short-term OD prediction accuracy by overcoming the above challenges.
Based on the above analysis, in this paper, a novel parallel interactive attention network (termed as PIANet) for short-term origin–destination prediction in URT is proposed to further improve the short-term OD prediction accuracy. The novelty of the proposed PIANet mainly lies in the architectural design, in which two innovative components including the omnidirectional attention module (termed as OAM) and the feature interaction are proposed. Therefore, our main contributions are three-fold:
  • A novel omnidirectional attention module (termed as OAM) is proposed to improve the representational power of the network by calculating the feature weights in the channel–spatial dimension, which can overcome the difficulty of extracting effective feature information caused by data sparsity and complex spatio-temporal correlations.
  • A simple yet effective feature interaction is proposed to improve the feature utilization between two network branches.
  • Based on the two real-world datasets from the Beijing subway, the comparative experiments demonstrate that the proposed PIANet outperforms the state-of-the-art deep learning methods for short-term OD prediction in URT, and the ablation studies show that the proposed OAMs and feature interaction play an important role in improving the short-term OD prediction performance.
The rest of this paper is organized as follows. Section 2 surveys deep learning methods for short-term OD prediction in URT and the attention mechanism. Section 3 describes the proposed PIANet in detail, including problem formulation, the proposed OAM, the dense compression attention block (termed as DCAB), the overall architecture of the proposed PIANet, and the loss function. In Section 4, the basic settings including datasets, implementation details, evaluation metrics, and comparison methods are introduced. Then, the comparative experimental results among the proposed PIANet and the comparison methods are shown and analyzed. After that, the ablation studies are performed to verify the effectiveness of the proposed OAMs and feature interaction. In Section 5, this paper is summarized.

2. Related Work

2.1. Deep Learning Methods for Short-Term OD Prediction in Urban Rail Transit

As an emerging technology, deep learning has been widely used in various fields due to its powerful feature extraction and expression ability. In recent years, several deep learning methods have emerged for short-term OD prediction in URT [8,17,18,19,20,21,22]. Based on multisource spatio-temporal data, Ref. [8] proposed a spatio-temporal long short-term memory network (termed as STLSTM) by improving the architecture of LSTM. The experimental results demonstrated that STLSTM can effectively exploit the potential information of multisource data and improve the prediction effect. Moreover, a channel-wise attentive split–convolutional neural network (termed as CASCNN) [17] was proposed to solve the data availability and data sparsity issues by using an inflow/outflow-gated mechanism and a masked loss function. Ref. [18] proposed a multi-resolution spatio-temporal neural network (termed as MRSTN) for real-time OD prediction, in which multi-resolution spatial feature extraction modules were first used to capture the local spatial dependencies, and ConvLSTMs were then used to capture the temporal evolution of demand. The experimental results demonstrated that MRSTN outperformed advanced deep learning methods (e.g., STResNet [6]). Ref. [19] proposed a completion augmentation-based self-attention temporal convolutional network (termed as CA-SATCN) that addresses the recent destination distribution availability, augments the flow representation for each station, and digs out the global spatial dependency to improve the short-term OD prediction accuracy. Ref. [20] proposed temporal Pearson correlation coefficients, approximate entropy, and spatial correlations as indicators to reflect the inherent spatio-temporal correlations and complexity of the OD flow. Ref. [21] proposed a multi-fused residual network (termed as MF-ResNet) to capture multiple complex dependencies, in which convolution-based residual network units model the temporal closeness, mid-term periodicity, and long-term periodicity features, and fully connected layers capture external factors. Ref. [22] proposed a spatio-temporal convolutional neural network (termed as STCNN) by using OD pair importance calculation, lagged spatio-temporal relationship construction, lagged spatio-temporal learning, real-time information learning, and sequential-temporal learning blocks. The experimental results demonstrated that STCNN outperforms the advanced methods with significant superiority on critical OD pairs.
It can be seen that there are few deep learning methods that introduce channel–spatial attention into the field of short-term OD prediction in URT. The first contribution of this paper is introducing the proposed OAM, a variant of the channel–spatial attention module, into the field of short-term OD prediction in URT. In addition, for the existing deep learning methods (e.g., CASCNN [17], CA-SATCN [19]) with several inputs, there is a problem of low feature utilization between two adjacent network branches. Thus, the second contribution of this paper is proposing a feature interaction to further improve the feature utilization.

2.2. Attention Mechanism

The attention mechanism is crucial for human perception, as it helps humans focus on the target area in a global image to obtain high-value information. In recent years, there has been plenty of research applying the attention mechanism to various fields (e.g., computer vision [23,24,25,26]). Notably, the Squeeze-and-Excitation (SE) block [23] was proposed to model cross-channel relationships by learning channel attention. Then, a Convolutional Block Attention Module (termed as CBAM) [25] was proposed by learning channel attention and spatial attention separately. In [25], a global max pooling operation was added to the channel attention to generate a finer channel attention map, and exploiting both spatial attention and channel attention was demonstrated to be superior to using only channel attention. In the field of short-term OD prediction in URT, channel attention was introduced in [17] to weigh different OD inputs.
Inspired by CBAM [25], in the proposed OAM of this paper, an omnidirectional attention map is first generated by combining a channel attention map and a spatial attention map, and then used to fine-tune the importance of each element in the input feature in the channel–spatial dimension. Note that the channel–spatial dimension denotes the three-dimensional coordinate space consisting of the channel axis, height axis, and width axis.

3. Methods

In this section, the short-term OD prediction of this paper is first formulated. Then, the architectures of the OAM and DCAB are introduced in turn. After that, the overall architecture of the proposed PIANet is introduced. Lastly, the loss function is introduced.

3.1. Problem Formulation

For the short-term OD prediction of this paper, the historical OD matrices M of previous days and the historical inflow series N of the same day are used to generate the predicted OD matrix O, where any of the historical OD matrices and any historical inflow series can be formulated as follows:
M^{d,t} =
\begin{bmatrix}
m_{11}^{d,t} & m_{12}^{d,t} & \cdots & m_{1j}^{d,t} & \cdots & m_{1\phi}^{d,t} \\
m_{21}^{d,t} & m_{22}^{d,t} & \cdots & m_{2j}^{d,t} & \cdots & m_{2\phi}^{d,t} \\
\vdots & \vdots & & \vdots & & \vdots \\
m_{i1}^{d,t} & m_{i2}^{d,t} & \cdots & m_{ij}^{d,t} & \cdots & m_{i\phi}^{d,t} \\
\vdots & \vdots & & \vdots & & \vdots \\
m_{\phi 1}^{d,t} & m_{\phi 2}^{d,t} & \cdots & m_{\phi j}^{d,t} & \cdots & m_{\phi\phi}^{d,t}
\end{bmatrix}, \quad (1)

N^{t} = \begin{bmatrix} n_1^t & n_2^t & \cdots & n_i^t & \cdots & n_\phi^t \end{bmatrix}^\top, \quad (2)
where m_{ij}^{d,t} denotes the OD flow between station i and station j at time interval t on day d, and n_i^t represents the inflow of station i at time interval t of the same day. Suppose that the predicted time interval is time interval κ on day δ, the time delay of OD data acquisition is α minutes, and the prediction time granularity is β minutes. Figure 1 shows the temporal relationship between the historical OD data and the predicted OD matrix. According to Figure 1, the short-term OD prediction can be formulated as follows:

O^{\delta,\kappa} = f\left( M^{\delta-y,\,\kappa-x_1},\; N^{\kappa-\alpha/\beta-x_2} \right), \quad (3)

where O^{δ,κ} represents the OD matrix at the predicted time interval, and M^{δ−y, κ−x_1} denotes the historical OD matrix at time interval κ−x_1 on day δ−y, where y = 1, 2, …, λ_1 and x_1 = 0, 1, …, λ_2−1. N^{κ−α/β−x_2} denotes the inflow series at time interval κ−α/β−x_2 of the same day, where x_2 = 1, 2, …, λ_3. f(·,·) is a nonlinear function; in this paper, f(·,·) denotes the forward process of the proposed PIANet.
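As a concrete illustration of the formulation above, the following NumPy sketch gathers the two inputs for one prediction. The array layout, the helper name `assemble_inputs`, and treating the delay as the integer α/β intervals are illustrative assumptions, not the paper's code.

```python
import numpy as np

def assemble_inputs(od, inflow, day, k, alpha=15, beta=15,
                    lam1=3, lam2=3, lam3=9):
    """Gather the historical inputs for predicting the OD matrix at
    time interval `k` on day `day` (a sketch of the formulation above).

    od:     array of shape (days, intervals, phi, phi) -- M^{d,t}
    inflow: array of shape (intervals, phi)            -- N^t of the same day
    """
    delay = alpha // beta  # acquisition delay expressed in intervals
    # Historical OD matrices: lam2 intervals from each of the previous lam1 days.
    M = np.stack([od[day - y, k - x1]
                  for y in range(1, lam1 + 1)
                  for x1 in range(lam2)])          # (lam1*lam2, phi, phi)
    # Historical inflow series of the same day, before the acquisition delay.
    N = np.stack([inflow[k - delay - x2]
                  for x2 in range(1, lam3 + 1)])   # (lam3, phi)
    return M, N
```

With the paper's settings (λ_1 = λ_2 = 3, λ_3 = 9), both inputs stack λ = 9 historical slices, matching λ = λ_1·λ_2 = λ_3.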

3.2. Omnidirectional Attention Module

The OAM is proposed to fine-tune the importance of each element of the input feature in the channel–spatial dimension by generating a learnable omnidirectional attention map, so as to achieve more effective feature expression. The architecture of the proposed OAM is shown in Figure 2. As shown in Figure 2, the input feature and output feature are X ∈ R^{C×H×W} and Y ∈ R^{C×H×W}, respectively, where C, H, and W denote the number of channels, the height, and the width, respectively. In the proposed OAM, the omnidirectional attention map P_1 ∈ R^{C×H×W} is first generated by a dot-product operation between the generated channel attention map R_1 and the generated spatial attention map R_2, and is then used to fine-tune the elements in the input feature. The specific process of the proposed OAM is as follows.
Firstly, the channel attention map R_1 ∈ R^{C×H×W} is generated to measure the importance of each feature map in the input feature, which can be formulated as:
G_1 = w_1\,\mathrm{GAP}_s(X) \oplus w_2\,\mathrm{GMP}_s(X), \quad (4)
L_1 = \sigma_{lr}(g_{fc}(G_1)), \quad (5)
S_1 = \sigma_{sig}(g_{fc}(L_1)), \quad (6)
R_1 = \psi_s(S_1), \quad (7)
where w_1 and w_2 are the trainable weights, and ⊕ represents the adding operation. GAP_s(·) and GMP_s(·) denote the global average pooling operation and the global max pooling operation in the spatial dimension, respectively. Note that the spatial dimension denotes the two-dimensional coordinate space consisting of the height axis and the width axis. g_fc(·) represents the fully connected layer. σ_lr and σ_sig denote the leaky ReLU activation function and the sigmoid activation function, respectively. ψ_s(·) represents the repeat operation in the spatial dimension. In Equation (4), compared with using only one global pooling operation, using two different global pooling operations in the spatial dimension can obtain two different feature statistics, thus inferring finer channel attention; in this way, G_1 ∈ R^{C×1×1} is generated. To reduce parameter overhead while ensuring the flexibility of learnable attention map generation, the fully connected layers in Equations (5) and (6) are used to obtain L_1 ∈ R^{C/r×1×1} and S_1 ∈ R^{C×1×1}, respectively, where r is the reduction ratio. After the repeat operation in Equation (7), the channel attention map R_1 ∈ R^{C×H×W} is generated.
Simultaneously, the spatial attention map R_2 ∈ R^{C×H×W} is generated to measure the importance of each element in the input feature in the spatial dimension, which can be formulated as:
G_2 = [\mathrm{GAP}_c(X), \mathrm{GMP}_c(X)], \quad (8)
L_2 = \sigma_{lr}(g_{bn}(g_{conv}(G_2))), \quad (9)
S_2 = \sigma_{sig}(L_2), \quad (10)
R_2 = \psi_c(S_2), \quad (11)
where [·,·] denotes the concatenation operation along the channel axis, and GAP_c(·) and GMP_c(·) denote the global average pooling operation and the global max pooling operation along the channel axis, respectively. g_conv(·) and g_bn(·) denote the two-dimensional convolution and the batch normalization, respectively. ψ_c(·) represents the repeat operation along the channel axis. In Equation (8), compared with using only one global pooling operation, using the global average pooling operation and the global max pooling operation along the channel axis can infer finer spatial attention; in this way, G_2 ∈ R^{2×H×W} is generated. In Equations (9) and (10), the convolutional layer and the sigmoid activation function are used to obtain L_2 ∈ R^{1×H×W} and S_2 ∈ R^{1×H×W}, respectively. Then, according to Equation (11), S_2 is copied along the channel axis to obtain the spatial attention map R_2 ∈ R^{C×H×W}. Note that the repeat operations on S_1 and S_2 are performed to facilitate the fusion between R_1 and R_2.
Eventually, the dot-product operation between R 1 and R 2 is performed to generate the omnidirectional attention map P 1 , and then the dot-product operation between X and P 1 is performed to obtain Y, which can be formulated as:
P_1 = R_1 \odot R_2, \quad (12)
Y = X \odot P_1, \quad (13)
where ⊙ denotes the dot-product (i.e., element-wise multiplication) operation. It can be seen that P_1 contains the feature weights of X in the channel–spatial dimension, which fine-tune the importance of each element in X in the channel–spatial dimension through Equation (13).
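To make the data flow of the OAM concrete, the following NumPy sketch traces one forward pass of Equations (4)–(13) with randomly initialized stand-in weights. The 1×1 convolution standing in for g_conv and the omission of batch normalization are simplifications of this sketch, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def oam(X, w1=0.5, w2=0.5, r=2, seed=0):
    """Forward pass of the omnidirectional attention module, Eqs. (4)-(13).
    Random matrices stand in for the learned fully connected / conv weights."""
    rng = np.random.default_rng(seed)
    C, H, W = X.shape
    # ---- channel attention R1 (Eqs. (4)-(7)) ----
    G1 = w1 * X.mean(axis=(1, 2)) + w2 * X.max(axis=(1, 2))   # (C,), GAP_s + GMP_s
    fc1 = rng.standard_normal((C, C // r))                    # g_fc: C -> C/r
    fc2 = rng.standard_normal((C // r, C))                    # g_fc: C/r -> C
    L1 = leaky_relu(G1 @ fc1)
    S1 = sigmoid(L1 @ fc2)                                    # (C,)
    R1 = np.broadcast_to(S1[:, None, None], X.shape)          # psi_s: repeat over H, W
    # ---- spatial attention R2 (Eqs. (8)-(11)) ----
    G2 = np.stack([X.mean(axis=0), X.max(axis=0)])            # (2, H, W), GAP_c + GMP_c
    k = rng.standard_normal(2)                                # 1x1 conv over 2 channels
    L2 = leaky_relu(np.tensordot(k, G2, axes=1))              # (H, W); BN omitted here
    S2 = sigmoid(L2)
    R2 = np.broadcast_to(S2[None, :, :], X.shape)             # psi_c: repeat over channels
    # ---- omnidirectional attention (Eqs. (12)-(13)) ----
    P1 = R1 * R2                                              # weights in (0, 1)
    return X * P1
```

Because both attention maps pass through a sigmoid, every weight in P_1 lies in (0, 1), so the module can only attenuate elements of X, never amplify them.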

3.3. Dense Compression Attention Block

The architecture of DCAB is shown in Figure 3. According to Figure 3, the proposed DCAB consists of a dense compression block (termed as DCB) and an OAM. In the DCAB, the proposed OAM is used to improve the feature expression by fine-tuning the importance of each element in the input feature in the channel–spatial dimension. Moreover, the DCB is composed of a dense block [11] and a convolutional layer. In the DCB, the dense block is used to improve the training efficiency. It can be seen that the number of channels in the output feature of the dense block is drastically increased compared to the number of channels in its input feature. Therefore, in order to reduce the computational complexity, a convolutional layer is subsequently used to compress the dimension along the channel axis.
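The channel bookkeeping of the DCB can be sketched as follows. The 1×1 stand-in convolutions, the growth rate, and the layer count are placeholders of this sketch, since the paper specifies the dense block only by reference to [11].

```python
import numpy as np

def dense_compression_block(X, growth=8, layers=3, out_channels=None, seed=0):
    """Channel bookkeeping of the DCB: a dense block whose layers each append
    `growth` feature maps, followed by a channel-compressing convolution.
    1x1 convolutions with random weights stand in for the learned layers."""
    rng = np.random.default_rng(seed)
    out_channels = X.shape[0] if out_channels is None else out_channels
    feats = X
    for _ in range(layers):
        # Each layer sees ALL features produced so far (dense connectivity).
        w = rng.standard_normal((growth, feats.shape[0])) * 0.1
        new = np.maximum(np.tensordot(w, feats, axes=1), 0.0)  # conv + ReLU
        feats = np.concatenate([feats, new], axis=0)
    # Channels have grown from C to C + layers*growth; compress back down.
    w = rng.standard_normal((out_channels, feats.shape[0])) * 0.1
    return np.tensordot(w, feats, axes=1)
```

The final 1×1 mapping plays the role of the compressing convolutional layer: it restores the channel count so that stacked DCABs keep a constant feature width.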

3.4. Parallel Interactive Attention Network

The proposed PIANet takes the historical OD matrices M ∈ R^{λ×ϕ×ϕ} of previous days and the historical inflow series N ∈ R^{λ×ϕ} of the same day as inputs, and produces the predicted OD matrix O ∈ R^{1×ϕ×ϕ} as output, where λ = λ_1·λ_2 = λ_3 is set. Figure 4 shows the overall architecture of the proposed PIANet, and Table 1 presents the PyTorch-like pseudocode for the forward process of the proposed PIANet. According to Figure 4 and Table 1, the proposed PIANet can be divided into three stages: stage 1 is the generation of the historical OD matrices of the same day, stage 2 is the feature interaction, and stage 3 is the generation of the predicted OD matrix.

3.4.1. Generation of the Historical OD Matrices of the Same Day

As shown in Figure 4, stage 1 can be described in two steps. The first step is to generate the OD distribution S 1 of the historical OD matrices of the same day by introducing the OD distribution information contained in the historical OD matrices of the previous days. The second step is to generate the historical OD matrices M 1 of the same day by using S 1 . The specific process is as follows.
Firstly, M is fed into two DCABs in turn to obtain the features D_1 ∈ R^{λ×ϕ×ϕ} and D_2 ∈ R^{λ×ϕ×ϕ}, respectively, where D_2 can provide effective OD distribution information for the generation of the historical OD matrices of the same day.
Simultaneously, N is fed into the unsqueeze layer and the repeat layer in turn to obtain U_1 ∈ R^{λ×ϕ×1} and B_1 ∈ R^{λ×ϕ×ϕ}, respectively. It can be seen that the dimension of B_1 is the same as that of M. However, B_1 is obtained by copying the inflow series in U_1 along the width axis; therefore, B_1 does not contain any OD flow information. After that, B_1 is fed into a DCAB for feature extraction to obtain D_3 ∈ R^{λ×ϕ×ϕ}. In order to extract effective OD flow feature information, the adding operation between D_2 and D_3 is performed to obtain A_1 ∈ R^{λ×ϕ×ϕ}. Then, A_1 is fed into the softmax layer to obtain S_1 ∈ R^{λ×ϕ×ϕ}, where the softmax layer performs normalization along the width axis. Thus, S_1 can be regarded as the OD distribution of the historical OD matrices of the same day, where row i of each ϕ×ϕ matrix in S_1 represents the OD distribution from station i to every station.
Secondly, the dot-product operation between B_1 and S_1 is performed to generate the historical OD matrices M_1 ∈ R^{λ×ϕ×ϕ} of the same day, which can be formulated as:

M_1 = B_1 \odot S_1. \quad (14)
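The two steps of stage 1 after the DCABs can be sketched as follows; here `A1` stands for the feature obtained by adding D_2 and D_3, and all names are illustrative.

```python
import numpy as np

def softmax_width(A):
    """Softmax along the width axis (the last axis), as in the softmax layer."""
    e = np.exp(A - A.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def same_day_od(N, A1):
    """Stage-1 sketch: N of shape (lam, phi) is the same-day inflow input,
    A1 of shape (lam, phi, phi) stands for the logits D2 + D3 from the DCABs."""
    B1 = np.repeat(N[:, :, None], A1.shape[-1], axis=-1)  # copy inflow along width
    S1 = softmax_width(A1)                                # row-wise OD distribution
    return B1 * S1                                        # dot-product operation
```

A useful consequence of this construction: because every row of S_1 sums to 1, row i of each generated matrix sums exactly to the inflow n_i of station i, so S_1 really does act as a distribution of each station's inflow over destinations.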

3.4.2. Feature Interaction

In stage 2, the feature information in D 1 and M 1 is further extracted separately, and then the extracted feature information is used interactively to improve the feature utilization, which is helpful to improve the short-term OD prediction accuracy. The specific process is as follows.
D_1 and M_1 are fed into the DCABs for feature extraction to obtain D_4 ∈ R^{λ×ϕ×ϕ} and D_5 ∈ R^{λ×ϕ×ϕ}, respectively, where D_4 and D_5 contain the feature information of the historical OD matrices of previous days and of the same day, respectively. In order to improve the feature utilization and provide richer feature expression, two different feature fusion operations are performed on D_4 and D_5 to obtain E_1 ∈ R^{2λ×ϕ×ϕ} and A_2 ∈ R^{λ×ϕ×ϕ}, respectively, which can be expressed as:

E_1 = [D_4, D_5], \quad (15)
A_2 = D_4 \oplus D_5. \quad (16)
In order to further improve the feature expression, the feature interaction is performed again. Specifically, E_1 and A_2 are fed into the DCABs to generate D_6 ∈ R^{λ×ϕ×ϕ} and D_7 ∈ R^{λ×ϕ×ϕ}, respectively. Then, the adding operation and the concatenation operation are performed on D_6 and D_7 to obtain A_3 ∈ R^{λ×ϕ×ϕ} and E_2 ∈ R^{2λ×ϕ×ϕ}, respectively, which can be formulated as:

A_3 = D_6 \oplus D_7, \quad (17)
E_2 = [D_6, D_7]. \quad (18)
After that, A_3 and E_2 are fed into the DCABs for deeper feature extraction to obtain D_8 ∈ R^{λ×ϕ×ϕ} and D_9 ∈ R^{λ×ϕ×ϕ}, respectively.
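Both interaction rounds described above reduce to the same pair of fusion operations, which can be sketched as:

```python
import numpy as np

def feature_interaction(Da, Db):
    """One interaction round: channel-axis concatenation and element-wise
    addition, giving each branch access to the other branch's features."""
    E = np.concatenate([Da, Db], axis=0)  # (2*lam, phi, phi), fed to one branch
    A = Da + Db                           # (lam, phi, phi), fed to the other
    return E, A
```

Applying it once to (D_4, D_5) yields (E_1, A_2), and applying it again to the DCAB outputs (D_6, D_7) yields (E_2, A_3); the concatenated tensor doubles the channel count, which the next DCAB's compression layer reduces again.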

3.4.3. Generation of the Predicted OD Matrix

After obtaining D_8 and D_9, the concatenation operation between D_8 and D_9 is performed to obtain E_3 ∈ R^{2λ×ϕ×ϕ}. Eventually, E_3 is fed into a DCB to generate the predicted OD matrix O.

3.5. Loss Function

In this paper, the L_2 loss is used as the loss function, which can be formulated as

L_2 = \frac{1}{\phi^2} \sum_{i=1}^{\phi} \sum_{j=1}^{\phi} \left( \hat{M}_{i,j} - \hat{O}_{i,j} \right)^2, \quad (19)

where \hat{O} is the predicted OD matrix obtained by removing the dimension of size 1 from O; \hat{M}_{i,j} and \hat{O}_{i,j} denote the values at row i and column j of the real OD matrix and the predicted OD matrix, respectively.
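Equation (19) transcribes directly into code, assuming the real and predicted OD matrices are given as ϕ×ϕ NumPy arrays:

```python
import numpy as np

def l2_loss(M_true, O_pred):
    """L2 loss of Equation (19): squared error averaged over the phi x phi grid."""
    phi = M_true.shape[0]
    return float(np.sum((M_true - O_pred) ** 2) / phi ** 2)
```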

4. Experiments and Results

In this section, the basic settings including datasets, implementation details, evaluation metrics and comparison methods are first introduced. Then, the experimental results are presented and discussed, in which the experiments include two parts: (1) comparison of short-term OD prediction performance among the proposed PIANet and the three comparison methods; and (2) ablation studies.

4.1. Basic Settings

4.1.1. Datasets

Two real-world large-scale OD datasets from the Beijing subway were used in the experiments, which are presented in Table 2. According to Table 2, these two OD datasets were collected in 2021 and 2022, and are named BJSubway2021 and BJSubway2022, respectively. For BJSubway2021, the date range is from 2021.9.14 to 2021.10.17 and from 2021.12.14 to 2021.12.29, the number of data records is 137 million, and the number of stations is 369; thus, the OD matrix dimension is 369 × 369. BJSubway2021 contains 50 days, of which 36 days, 7 days, and 7 days were used as the training set, validation set, and test set, respectively. For BJSubway2022, the date range is from 2022.1.1 to 2022.1.9 and from 2022.1.18 to 2022.2.21, the number of data records is 98 million, and the number of stations is 372; thus, the OD matrix dimension is 372 × 372. BJSubway2022 contains 44 days, of which 32 days, 6 days, and 6 days were used as the training set, validation set, and test set, respectively. For both BJSubway2021 and BJSubway2022, the time range in a day was set from 6:00 to 23:00; each record contains the entry station ID, the exit station ID, the entry time, and the number of passengers entering the station; and the time interval is 5 min.

4.1.2. Implementation Details

In this paper, the proposed PIANet was implemented using PyTorch and Python on an NVIDIA GeForce RTX 3090 with 24 GB of memory, and the learning rate and batch size were set to 0.001 and 16, respectively. The hyperparameters were set as follows: both w_1 and w_2 were initialized to 0.5, and α = 15, β = 15, λ_1 = 3, λ_2 = 3, λ_3 = 9, r = 2. Moreover, the Adam optimizer was used to optimize the parameters of the proposed PIANet. Early stopping was used as the training stopping criterion to avoid overfitting; that is, the best validation RMSE was recorded in each epoch, and training stopped once the best validation RMSE had failed to improve for 10 epochs in a row. All convolutional layers were followed by batch normalization layers and leaky ReLU layers. Furthermore, the historical OD matrices of previous days and the historical inflow series of the same day were preprocessed by Min-Max normalization to avoid gradient explosion before being fed into the proposed PIANet.
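The early stopping criterion and the Min-Max preprocessing described above can be sketched as follows; this is a simplified stand-in for the training script, and taking the scaling bounds from the training data is an assumption of the sketch.

```python
import numpy as np

def minmax_normalize(x, lo, hi):
    """Min-Max normalization of the inputs; lo/hi are the scaling bounds
    (taken from the training data in this sketch)."""
    return (x - lo) / (hi - lo)

class EarlyStopping:
    """Stop once the best validation RMSE has failed to improve for
    `patience` consecutive epochs (patience = 10 in this paper)."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad = 0            # consecutive epochs without improvement
    def step(self, val_rmse):
        if val_rmse < self.best:
            self.best = val_rmse
            self.bad = 0
        else:
            self.bad += 1
        return self.bad >= self.patience  # True -> stop training
```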

4.1.3. Evaluation Metrics

In this paper, the root mean square error (termed as RMSE), mean absolute error (termed as MAE), and symmetric mean absolute percentage error (termed as SMAPE) were used as evaluation metrics to evaluate the short-term OD prediction performance, in which RMSE, MAE, and SMAPE are, respectively, formulated as
\mathrm{RMSE} = \sqrt{ \frac{1}{\phi^2} \sum_{i=1}^{\phi} \sum_{j=1}^{\phi} \left( \hat{M}_{i,j} - g(\hat{O}_{i,j}) \right)^2 }, \quad (20)

\mathrm{MAE} = \frac{1}{\phi^2} \sum_{i=1}^{\phi} \sum_{j=1}^{\phi} \left| \hat{M}_{i,j} - g(\hat{O}_{i,j}) \right|, \quad (21)

\mathrm{SMAPE} = \frac{1}{\phi^2} \sum_{i=1}^{\phi} \sum_{j=1}^{\phi} \frac{ \left| \hat{M}_{i,j} - g(\hat{O}_{i,j}) \right| }{ \left( \hat{M}_{i,j} + g(\hat{O}_{i,j}) \right)/2 + c }, \quad (22)
where the notations \hat{M}_{i,j} and \hat{O}_{i,j} are the same as those in Equation (19), |·| denotes the L_1 norm (i.e., the absolute value), and c is a very small positive constant that prevents the denominator from being 0, which is set as 10^{-9}. Since \hat{O}_{i,j} usually contains decimals and negative numbers, in order to evaluate the model performance more accurately, \hat{O}_{i,j} was evaluated after rounding and non-negative integer processing. Thus, g(·) = g_2(g_1(·)) is a composite function, in which g_1(·) denotes the rounding function and g_2(·) denotes the function taking nonnegative integers. g_2(g_1(\hat{O}_{i,j})) can be expressed as:

g_2(g_1(\hat{O}_{i,j})) = \begin{cases} 0, & g_1(\hat{O}_{i,j}) < 0 \\ g_1(\hat{O}_{i,j}), & g_1(\hat{O}_{i,j}) \ge 0 \end{cases} \quad (23)
According to Equations (20) and (21), RMSE and MAE are two different evaluation metrics that can measure the absolute error between the predicted OD matrix and the real OD matrix but cannot reflect whether the predicted OD matrix is underestimated or overestimated relative to the real OD matrix. According to Equation (22), SMAPE as a relative error metric can reflect whether the predicted OD matrix is underestimated or overestimated relative to the real OD matrix. Therefore, the prediction performance of the network can be comprehensively evaluated by using RMSE, MAE, and SMAPE.
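The three metrics and the post-processing function g(·) transcribe directly into code; note that `np.rint` rounds halves to even, which is an assumption about the unspecified rounding rule.

```python
import numpy as np

def g(O):
    """Post-processing g = g2(g1(.)): round to the nearest integer, then
    replace negative values with zero."""
    return np.maximum(np.rint(O), 0.0)

def od_metrics(M, O, c=1e-9):
    """RMSE, MAE, and SMAPE of Equations (20)-(22), with M the real OD matrix
    and O the raw predicted OD matrix."""
    P = g(O)
    err = M - P
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    smape = float(np.mean(np.abs(err) / ((M + P) / 2 + c)))
    return rmse, mae, smape
```

The small constant c also keeps the SMAPE term at 0 (rather than undefined) for the many OD pairs where both the real and the post-processed predicted flows are zero, which matters under the data sparsity discussed in Section 1.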

4.1.4. Comparison Methods

In this paper, three state-of-the-art deep learning methods for short-term OD prediction in URT, including ConvLSTM [27], STResNet [6], and CASCNN [17], were selected as the comparison methods for the proposed PIANet. Compared with the original papers, the implementation details of the three comparison methods were fine-tuned to make them more suitable for the datasets in this paper. Specifically, in terms of input data, the inputs of ConvLSTM and STResNet are the same as those of the proposed PIANet. For CASCNN, the inflow and outflow series of the same day and the finished OD matrices of previous days were used, as in the original paper. The batch sizes of ConvLSTM, STResNet, and CASCNN were 16, 32, and 32, respectively. Moreover, for ConvLSTM, STResNet, and CASCNN, the learning rates were set to 0.001 and the L_2 loss was used as the loss function.

4.2. Results and Discussions

In this section, the proposed PIANet is firstly compared with the comparison methods to show the superiority of the proposed PIANet in short-term OD prediction performance. Then, ablation studies are conducted to investigate the impacts of the proposed OAMs and feature interaction on the prediction performance of the proposed PIANet.

4.2.1. Comparison of Prediction Performance

In order to show the excellent short-term OD prediction performance of the proposed PIANet, comprehensive comparisons among the proposed PIANet and the three comparison methods were conducted. Table 3 shows the comparisons of RMSE, MAE, and SMAPE among the proposed PIANet and the three comparison methods (i.e., ConvLSTM, STResNet, and CASCNN) in BJSubway2021 and BJSubway2022. Note that all values of RMSE, MAE, and SMAPE are displayed in the form of mean ± std (standard deviation) in Table 3 and Table 4.
As shown in Table 3, the proposed PIANet can obtain lower RMSE, lower MAE, and lower SMAPE than the comparison methods in BJSubway2021 and BJSubway2022, which indicates that the proposed PIANet performs better than the comparison methods in short-term OD prediction performance. Specifically, compared with the suboptimal results, the RMSE, MAE, and SMAPE obtained by the proposed PIANet are, respectively, reduced by 19.1%, 9.9%, and 4.3% in BJSubway2021, and the RMSE, MAE, and SMAPE obtained by the proposed PIANet are, respectively, reduced by 20.3%, 14.7%, and 6.6% in BJSubway2022.
In order to present more detailed comparison results, the comparisons of RMSE, MAE, and SMAPE at each time interval during the morning peak hours (i.e., 7:30–9:10) among the proposed PIANet and the comparison methods (i.e., ConvLSTM, STResNet, and CASCNN) in BJSubway2021 and BJSubway2022 are shown in Figure 5 and Figure 6, respectively.
As shown in Figure 5a, the RMSE obtained by ConvLSTM is lower than those obtained by STResNet and CASCNN at each time interval in BJSubway2021. According to Figure 6b,c, the MAE and SMAPE obtained by ConvLSTM are, respectively, lower than those obtained by STResNet and CASCNN at each time interval in BJSubway2022. However, according to Figure 5b,c, the MAE and SMAPE obtained by ConvLSTM do not maintain the performance advantage compared with STResNet and CASCNN at each time interval in BJSubway2021. Specifically, in Figure 5b, the MAE obtained by ConvLSTM is lower than those obtained by STResNet and CASCNN before 8:30, and then higher than those obtained by STResNet and CASCNN after 8:30. In Figure 5c, the SMAPE obtained by ConvLSTM is lower than those obtained by STResNet and CASCNN before 8:15, and then higher than those obtained by STResNet and CASCNN after 8:15. Similarly, as shown in Figure 6a, the RMSE obtained by ConvLSTM does not maintain the performance advantage compared with STResNet and CASCNN at each time interval in BJSubway2022.
The above results show that none among ConvLSTM, STResNet, and CASCNN can always maintain the advantages in short-term OD prediction performance under the three evaluation metrics (i.e., RMSE, MAE, and SMAPE). However, it can be seen from Figure 5 and Figure 6 that the RMSE, MAE, and SMAPE obtained by the proposed PIANet are lower than those obtained by the comparison methods at all time intervals in BJSubway2021 and BJSubway2022, which further verifies the superiority of the proposed PIANet in short-term OD prediction performance.
In addition, Figure 7 compares the convergence curves of the best validation RMSE for the proposed PIANet and the comparison methods on BJSubway2021 and BJSubway2022. According to Figure 7a, CASCNN and STResNet stop converging toward lower minima at the 7th and 15th epochs, respectively, while ConvLSTM converges relatively smoothly but slowly. Compared with all three, the proposed PIANet converges faster and stably reaches a lower best RMSE. According to Figure 7b, all four models converge smoothly, yet PIANet still converges faster and to a lower best RMSE.
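The "best validation RMSE" curves plot the running minimum of validation RMSE over epochs; a minimal sketch of this bookkeeping (an illustration of the metric, not the authors' training code):

```python
def best_so_far(val_rmse_per_epoch):
    """Running minimum of the validation RMSE: the curve value at epoch t
    is the best (lowest) RMSE observed up to and including epoch t."""
    best = float("inf")
    curve = []
    for rmse in val_rmse_per_epoch:
        best = min(best, rmse)
        curve.append(best)
    return curve

# A flat tail (after the fourth epoch below) is what "stops converging to
# lower minima" looks like in such a curve.
curve = best_so_far([0.9, 0.7, 0.8, 0.6, 0.65, 0.66])
# curve == [0.9, 0.7, 0.7, 0.6, 0.6, 0.6]
```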
The possible reasons for PIANet's better short-term OD prediction performance are as follows. In terms of feature extraction, ConvLSTM captures temporal correlations well but cannot capture spatial correlations; STResNet uses only a stack of convolutional layers, making it difficult to extract effective non-local features; and the architecture of CASCNN is relatively simple, making it difficult to extract effective features for complex spatio-temporal correlations. In the proposed PIANet, the OAMs improve the representational power of the network by computing feature weights in the channel–spatial dimension, which enables effective feature extraction for complex spatio-temporal correlations. In terms of feature utilization, none of the comparison methods exchange features between two network branches, so their inter-branch feature utilization is low, whereas the feature interaction between the two branches of PIANet further improves it.
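The channel–spatial reweighting idea can be illustrated with a toy sketch (this is not the actual OAM of Figure 2; the pooling-plus-sigmoid design and the function names here are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_reweight(x):
    """Toy attention: scale a (C, H, W) feature map by a per-channel weight
    and a per-position (spatial) weight derived from pooled statistics.
    Illustrative only; the actual OAM structure differs."""
    c_weight = sigmoid(x.mean(axis=(1, 2)))[:, None, None]  # shape (C, 1, 1)
    s_weight = sigmoid(x.mean(axis=0))[None, :, :]          # shape (1, H, W)
    return x * c_weight * s_weight

features = np.ones((2, 3, 3))
weighted = channel_spatial_reweight(features)  # shape preserved: (2, 3, 3)
```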
In short, compared with the comparison methods, the proposed PIANet stably converges to a lower best validation RMSE during training and thus shows better short-term OD prediction performance during testing.

4.2.2. Ablation Studies

The OAMs and the feature interaction are proposed to further improve the short-term OD prediction performance. To verify their effectiveness, four additional experiments were conducted.
To verify the effectiveness of the proposed OAMs, three variants of PIANet were constructed: (1) PIANet-A, in which all DCABs in Figure 4 are changed to DCBs; (2) PIANet-B, in which only the DCABs in stage 2 of Figure 4 are changed to DCBs; and (3) PIANet-C, in which only the DCABs in stage 1 of Figure 4 are changed to DCBs. Moreover, to verify the effectiveness of the feature interaction, the feature interaction in stage 2 of Figure 4 is removed to form a fourth variant, PIANet-D.
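The four variants can be summarized in a small configuration table (a hypothetical encoding for illustration; the flags below simply restate which components each variant retains):

```python
# True = the component is kept; False = the stage's DCABs are replaced by
# DCBs (the *_oam flags) or the stage-2 feature interaction is removed
# (the interaction flag).
VARIANTS = {
    "PIANet":   {"stage1_oam": True,  "stage2_oam": True,  "interaction": True},
    "PIANet-A": {"stage1_oam": False, "stage2_oam": False, "interaction": True},
    "PIANet-B": {"stage1_oam": True,  "stage2_oam": False, "interaction": True},
    "PIANet-C": {"stage1_oam": False, "stage2_oam": True,  "interaction": True},
    "PIANet-D": {"stage1_oam": True,  "stage2_oam": True,  "interaction": False},
}

# Only the full model keeps both attention stages and the interaction.
complete = [name for name, cfg in VARIANTS.items() if all(cfg.values())]
```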
The comparisons of RMSE, MAE, and SMAPE among the proposed PIANet and its variants (i.e., PIANet-A, PIANet-B, PIANet-C, and PIANet-D) in BJSubway2021 and BJSubway2022 are presented in Table 4. Furthermore, the comparisons of convergence curves of best validation RMSE among the proposed PIANet and its variants in BJSubway2021 and BJSubway2022 are shown in Figure 8.
Table 4. Comparisons of RMSE, MAE, and SMAPE among the proposed PIANet and its variants in BJSubway2021 and BJSubway2022.
BJSubway2021:

| Method | RMSE | MAE | SMAPE |
| --- | --- | --- | --- |
| PIANet-A | 0.558 ± 0.230 | 0.153 ± 0.088 | 0.185 ± 0.070 |
| PIANet-B | 0.556 ± 0.234 | 0.151 ± 0.089 | 0.182 ± 0.072 |
| PIANet-C | 0.550 ± 0.238 | 0.153 ± 0.096 | 0.187 ± 0.081 |
| PIANet-D | 0.610 ± 0.292 | 0.158 ± 0.096 | 0.186 ± 0.074 |
| PIANet | **0.524 ± 0.191** * | **0.145 ± 0.084** | **0.177 ± 0.074** |

BJSubway2022:

| Method | RMSE | MAE | SMAPE |
| --- | --- | --- | --- |
| PIANet-A | 0.521 ± 0.187 | 0.131 ± 0.072 | 0.164 ± 0.064 |
| PIANet-B | 0.499 ± 0.172 | 0.125 ± 0.071 | 0.158 ± 0.065 |
| PIANet-C | 0.498 ± 0.176 | 0.123 ± 0.072 | 0.156 ± 0.068 |
| PIANet-D | 0.551 ± 0.257 | 0.137 ± 0.084 | 0.169 ± 0.077 |
| PIANet | **0.492 ± 0.155** | **0.122 ± 0.067** | **0.155 ± 0.062** |

* Bold for the evaluation metric represents the best result.
According to Table 4, the proposed PIANet obtains lower RMSE, MAE, and SMAPE than PIANet-B and PIANet-C, and PIANet-B and PIANet-C in turn generally outperform PIANet-A, except that the SMAPE of PIANet-C is higher than that of PIANet-A on BJSubway2021. These results show that the OAMs in both stage 1 and stage 2 play an important role in improving the short-term OD prediction performance, and that more OAMs yield better performance. Furthermore, the proposed PIANet outperforms PIANet-D, which indicates that the feature interaction also plays an important role in improving the short-term OD prediction performance.
According to Figure 8, on both BJSubway2021 and BJSubway2022, the proposed PIANet converges to a lower best validation RMSE than PIANet-B and PIANet-C, which in turn converge to a lower best validation RMSE than PIANet-A; PIANet also converges to a lower best validation RMSE than PIANet-D. These convergence results further confirm the analysis of Table 4.
In summary, Table 4 and Figure 8 show that the proposed OAMs and feature interaction are effective in improving the short-term OD prediction performance.

5. Conclusions

In this paper, PIANet is proposed to further improve the short-term OD prediction accuracy in URT. PIANet involves two innovative components: the OAMs, which improve the representational power of the network by computing feature weights in the channel–spatial dimension, and the feature interaction, which improves feature utilization between the two network branches. On two real-world datasets from the Beijing subway, the comparative experiments show that PIANet obtains lower RMSE, MAE, and SMAPE than the comparison methods, indicating better short-term OD prediction performance, and the ablation studies show that the OAMs and the feature interaction both play an important role in this improvement.
The embedding of the OAMs and the feature interaction gives PIANet strong feature extraction ability. Therefore, PIANet is not only suitable for the Beijing URT system but can also be extended to URT systems in other cities. Specifically, when PIANet is applied to a URT system with different characteristics and scale, only the input and output sizes need to be modified to match the new network, after which the corresponding input and output data can be used for training. Owing to its feature extraction ability, PIANet can capture the new passenger flow distribution during training and, once trained, adapt to the new URT system and achieve accurate short-term OD prediction.

Author Contributions

Conceptualization, W.Z., C.G. and T.T.; methodology, W.Z.; software, W.Z.; validation, W.Z.; formal analysis, W.Z.; investigation, W.Z.; resources, W.Z. and C.G.; data curation, C.G.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z.; visualization, W.Z.; supervision, C.G. and T.T.; project administration, C.G. and T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the security of passenger flow information.

Conflicts of Interest

Author Wenzhong Zhou was employed by Traffic Control Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Temporal relationship between the historical OD data and the predicted OD matrix.
Figure 2. The architecture of omnidirectional attention module.
Figure 3. The architecture of the dense compression attention block.
Figure 4. The overall architecture of the parallel interactive attention network.
Figure 5. Comparisons of the RMSE, MAE, and SMAPE at each time interval during the morning peak hours (i.e., 7:30–9:10) among the proposed PIANet and the comparison methods in BJSubway2021.
Figure 6. Comparisons of the RMSE, MAE, and SMAPE at each time interval during the morning peak hours (i.e., 7:30–9:10) among the proposed PIANet and the comparison methods in BJSubway2022.
Figure 7. Comparisons of convergence curves of best validation RMSE among the proposed PIANet and the comparison methods in BJSubway2021 and BJSubway2022.
Figure 8. Comparisons of convergence curves of best validation RMSE among the proposed PIANet and its variants in BJSubway2021 and BJSubway2022.
Table 1. PyTorch-like pseudocode for the forward process of the proposed PIANet.

    Input: the historical OD matrices M of previous days, the historical inflow series N of the same day.
    # Stage 1: generation of the historical OD matrices of the same day
     1: D1 = DCAB(M);  D2 = DCAB(D1)
     2: B1 = N.unsqueeze(-1).repeat(1, 1, ϕ)
     3: M1 = B1 * torch.softmax(D2 + DCAB(B1), dim=-1)   # ⊙ is element-wise product; softmax dimension assumed
    # Stage 2: feature interaction
     4: D4 = DCAB(D1);  D5 = DCAB(M1)
     5: E1 = torch.cat((D4, D5), dim=1);  A2 = D4 + D5   # channel dimension assumed for all concatenations
     6: D6 = DCAB(E1);  D7 = DCAB(A2)
     7: A3 = D6 + D7;  E2 = torch.cat((D6, D7), dim=1)
     8: D8 = DCAB(A3);  D9 = DCAB(E2)
    # Stage 3: generation of the predicted OD matrix
     9: E3 = torch.cat((D8, D9), dim=1)
    10: O = DCB(E3)
    Output: the predicted OD matrix O.
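The interleaved concatenation/addition pattern of stage 2 can be exercised with a runnable sketch in which DCAB is replaced by a trivial channel-compressing placeholder (a simplification; the real DCAB is the dense compression attention block of Figure 3, and the channel sizes here are assumptions):

```python
import numpy as np

def dcab(x, c=2):
    """Placeholder for the dense compression attention block: compress a
    (k*c, H, W) tensor to (c, H, W) by averaging channel groups."""
    return x.reshape(-1, c, *x.shape[1:]).mean(axis=0)

def stage2_interaction(d1, m1, c=2):
    """Stage 2 of Table 1: the two branches repeatedly exchange information
    via concatenation (E terms) and element-wise addition (A terms)."""
    d4, d5 = dcab(d1, c), dcab(m1, c)
    e1 = np.concatenate((d4, d5)); a2 = d4 + d5
    d6, d7 = dcab(e1, c), dcab(a2, c)
    a3 = d6 + d7; e2 = np.concatenate((d6, d7))
    d8, d9 = dcab(a3, c), dcab(e2, c)
    return np.concatenate((d8, d9))  # E3, fed to the final DCB in stage 3

e3 = stage2_interaction(np.ones((2, 4, 4)), np.ones((2, 4, 4)))
# e3 has shape (4, 4, 4): both branch outputs concatenated along channels
```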
Table 2. Data description.
| Dataset | BJSubway2021 | BJSubway2022 |
| --- | --- | --- |
| Date | 14 September 2021 to 17 October 2021; 14 December 2021 to 29 December 2021 | 1 January 2022 to 9 January 2022; 18 January 2022 to 21 February 2022 |
| Data records | 137 million | 98 million |
| Station number | 369 | 372 |
| Matrix dimension | 369 × 369 | 372 × 372 |
| Time | 6:00 to 23:00 | 6:00 to 23:00 |
| Time interval | 5 min | 5 min |
Table 3. Comparisons of RMSE, MAE, and SMAPE among the proposed PIANet and the three comparison methods in BJSubway2021 and BJSubway2022.
BJSubway2021:

| Method | RMSE | MAE | SMAPE |
| --- | --- | --- | --- |
| ConvLSTM | 0.648 ± 0.343 | 0.161 ± 0.100 | 0.185 ± 0.078 |
| STResNet | 0.722 ± 0.476 | 0.172 ± 0.119 | 0.189 ± 0.078 |
| CASCNN | 0.740 ± 0.473 | 0.178 ± 0.116 | 0.195 ± 0.077 |
| PIANet | **0.524 ± 0.191** * | **0.145 ± 0.084** | **0.177 ± 0.074** |

BJSubway2022:

| Method | RMSE | MAE | SMAPE |
| --- | --- | --- | --- |
| ConvLSTM | 0.632 ± 0.353 | 0.143 ± 0.089 | 0.166 ± 0.070 |
| STResNet | 0.623 ± 0.348 | 0.149 ± 0.100 | 0.176 ± 0.078 |
| CASCNN | 0.617 ± 0.341 | 0.146 ± 0.090 | 0.175 ± 0.071 |
| PIANet | **0.492 ± 0.155** | **0.122 ± 0.067** | **0.155 ± 0.062** |

* Bold for the evaluation metric represents the best result.