Article

Augmentation of Deep Learning Models for Multistep Traffic Speed Prediction

1 Department of Creative Technologies, Faculty of Computing and Artificial Intelligence, Air University, Islamabad 44000, Pakistan
2 School of Software, Dalian University of Technology, Dalian 116024, China
3 Department of Computer Games Development, Faculty of Computing and Artificial Intelligence, Air University, Islamabad 44000, Pakistan
4 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
5 School of Software, South China University of Technology, Guangzhou 510006, China
6 School of Software, Tsinghua University, Beijing 100084, China
7 Deanship of Elearning and Distance Education, King Faisal University, P.O. Box 400, Al Ahsa 31982, Saudi Arabia
8 Faculty of Engineering and Technology, Future University in Egypt, New Cairo 11835, Egypt
9 Faculty of Computers and Artificial Intelligence, Cairo University, Giza 12613, Egypt
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(19), 9723; https://doi.org/10.3390/app12199723
Submission received: 30 August 2022 / Revised: 20 September 2022 / Accepted: 21 September 2022 / Published: 27 September 2022
(This article belongs to the Special Issue Innovative Solutions for Intelligent and Sustainable Machinery)

Abstract

Traffic speed prediction is a vital part of the intelligent transportation system (ITS). With the rapid development of deep learning and the growing size of traffic data, predicting traffic speed accurately is becoming an important and challenging task. In this study, we present a deep-learning-based architecture for network-wide traffic speed prediction. We propose a deep-learning-based model consisting of a fully convolutional neural network, bidirectional long short-term memory, and an attention mechanism. Our design considers both the backward and forward dependencies of traffic data to predict multistep network-wide traffic speed. Thus, we propose a model named AttBDLSTM-FCN for multistep traffic speed prediction, which augments an attention-based bidirectional long short-term memory recurrent neural network with a fully convolutional network to predict network-wide traffic speed. In traffic speed prediction, this is the first time that the augmentation of AttBDLSTM and FCN has been exploited to measure the backward dependency of traffic data as the building block of a deep architecture. We conducted comprehensive experiments, and the evaluations illustrate that the proposed architecture outperforms state-of-the-art models in both short- and long-horizon multistep traffic speed prediction, e.g., at 15, 30, and 60 min.

1. Introduction

The intelligent transportation system (ITS) is an advanced tool designed to provide services related to traffic management and different modes of transportation, enabling people to use the transportation network more intelligently and efficiently. The ITS is regarded as a key component of the transport system. Accurate, real-time short-term traffic prediction in the road network plays a vital role in traffic optimization control, network traffic planning, and the analysis of road network traffic conditions. Several classes of methods have been developed to enhance the efficiency and accuracy of traffic prediction, e.g., parametric and nonparametric approaches [1].
Modeling the interactions between vehicles in large-scale traffic on highly populated urban road networks is challenging, and collecting traffic data is also difficult. Existing short-term traffic prediction methods mostly operate at the sensor-location or corridor level and cannot achieve network-wide prediction accuracy unless N models are trained for a traffic network with N nodes [2]. For time series data, existing LSTM-based prediction models use only one hidden layer and, thus, a relatively shallow structure [2,3,4]. Along a corridor, future traffic speed values are influenced by the past speeds of downstream and upstream locations [5,6,7]. Predictive performance can be enhanced from both backward and forward temporal perspectives, particularly for recurring traffic patterns, by investigating the periodicity of the time series data [6]. When predicting network-wide traffic speed instead of the speed at one location, we cannot ignore the impact of downstream and upstream speeds at each network location. The learned features become more general when both the backward and forward dependencies of the spatiotemporal data are considered. This research focuses on predicting network-wide traffic speed to better understand traffic states across a network instead of on a single road link.
We propose an AttBDLSTM-FCN model for multistep traffic speed prediction over network-wide traffic data to overcome the aforementioned challenges. The experimental results show that our proposed model achieves better performance in predicting network-wide traffic speed. In summary, our contributions are as follows: we examined traffic forecasting network-wide instead of at a specific location or several nearby locations; we proposed a deep architecture, AttBDLSTM-FCN, that considers both forward and backward dependencies to enrich feature extraction and feature learning from spatial time series data; and we examined the behavior of the attention layer in our model and concluded that the attention mechanism enhances model performance accordingly.
The rest of the paper is organized as follows: Section 2 reviews the literature. Section 3 describes the methodology in detail. Section 4 presents the results and discussion. Conclusions are drawn in Section 5.

2. Literature Review

Over the last few decades, only a few studies on short-term traffic prediction have addressed network-wide traffic prediction; most approaches have focused on the arterial or corridor level. Numerous studies have been proposed to increase the accuracy and efficiency of traffic forecasting.
On the whole, these approaches are grouped into parametric and nonparametric approaches. In many traffic prediction practices, parametric approaches leverage time series counts and linear regressive models. To identify the primary travel time pattern, Fei et al. [8] proposed a Bayesian dynamic linear model for accurate travel time prediction. To characterize the temporal evolution of travel time, several entropy-based measures were proposed by Oh and Park [9]. Ahmed et al. [10] forecast expressway traffic flow by applying the ARIMA model. Parametric approaches are very appropriate when traffic conditions are regular. However, as traffic data involve nonlinear and stochastic properties, the performance of parametric approaches becomes undesirable, which makes traffic prediction with the techniques mentioned above unreliable.
Regarding nonparametric approaches, also known as data-driven models, Cai et al. [11] proposed a KNN model to depict the spatiotemporal correlation in traffic speed prediction, and MingHeng et al. [12] proposed an SVM-based short-term model for multistep traffic flow prediction. In large-scale, complex traffic networks with intersections and loops, upstream and downstream refer to relative positions, and two arbitrary locations can each be upstream or downstream of the other. Upstream and downstream are defined with respect to space, while forward and backward dependencies are defined with respect to time.
When traffic conditions become large in scale and complex, neither parametric nor nonparametric approaches work efficiently. Deep-learning-based approaches are widely used in traffic prediction due to their great ability to deal with spatiotemporal patterns. For traffic forecasting problems, many neural-network-based methods have been adopted, e.g., feedforward NNs [13], fuzzy NNs [14], recurrent NNs [15], hybrid NNs [16], and Gaussian processes (GP) [17]. Kim et al. [18] proposed a composite road network Caps-Net architecture, which replaced the max pooling operation of the CNN. To simultaneously predict weather conditions and traffic flow, decision-level data fusion and deep belief networks were incorporated by Koesdwiady et al. [19]. Because transportation is dynamic in nature, RNN models are well suited to the temporal evolution of traffic states. In traffic speed prediction, long-term temporal characteristics were captured using LSTMs by Ma et al. [3]. To predict travel demand in a large-scale transportation network, Cheng et al. [20] proposed a feature-level data fusion model using stacked long short-term memory. Zhang et al. [21] incorporated a deep spatiotemporal residual network to predict citywide crowd flow by considering the period, temporal closeness, and trend properties of human mobility. Zhang et al. [22] proposed a multitask model to predict network-wide traffic speed. Cui et al. [7] proposed a bidirectional and unidirectional RNN-based model for network-wide traffic speed prediction. Riaz et al. [23] proposed a stacked bidirectional attention-based gated recurrent unit model for large-scale traffic speed prediction.

3. Methodology

In this section, the architecture and its components are discussed in detail.

3.1. Input Data

Predicting the traffic speed at a single location usually takes as input the series of speed values over 'n' historical time steps. Our input data cover P locations $[l_1, l_2, \ldots, l_P]$, where each location $l_i$ provides traffic speed data over 't' historical time steps (frames) from sensor location $l_i$; this can be written as the vector in the following equation:

$$Y_T = \left[ Y_{i,t_1}, Y_{i,t_2}, \ldots, Y_{i,t_n} \right] \tag{1}$$

where $Y_{i,t_1}$ indicates the speed at the $i$th location at time $t_1$. When traffic jams or congestion propagate through a traffic network, they may affect the speed at nearby as well as faraway locations. Thus, considering these influences, network-wide speed data are taken as input for the proposed and compared models. We aim to predict the traffic speed at time 'T' using 't' historical time frames from a traffic network consisting of 'P' locations. The speed data matrix can be characterized as the input:

$$Y_{P,T} = \begin{bmatrix} Y_{T-n}^{1} & Y_{T-n+1}^{1} & \cdots & Y_{T-2}^{1} & Y_{T-1}^{1} \\ Y_{T-n}^{2} & Y_{T-n+1}^{2} & \cdots & Y_{T-2}^{2} & Y_{T-1}^{2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ Y_{T-n}^{P} & Y_{T-n+1}^{P} & \cdots & Y_{T-2}^{P} & Y_{T-1}^{P} \end{bmatrix} \tag{2}$$

In Equation (2), every component of $Y_{P,T}$ is the speed at the $p$th location at the $t$th time step. To capture the temporal features of the speed statistics and simplify the representation, the vector $Y_{P,T} = [Y_{t_1}, Y_{t_2}, \ldots, Y_{t_n}]$ denotes the speed matrix in which every element holds the speed values of the 'P' locations. The predicted speed output can be represented as

$$\hat{Y}_{P,T} = \begin{bmatrix} Y_{T-n+1}^{1} & \cdots & Y_{T-2}^{1} & Y_{T-1}^{1} & Y_{T}^{1} \\ Y_{T-n+1}^{2} & \cdots & Y_{T-2}^{2} & Y_{T-1}^{2} & Y_{T}^{2} \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ Y_{T-n+1}^{P} & \cdots & Y_{T-2}^{P} & Y_{T-1}^{P} & Y_{T}^{P} \end{bmatrix} \tag{3}$$
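To make this input construction concrete, the following is a minimal sketch (ours, not the authors' released code) that slices a T × P speed matrix into the (N, n, P) samples and one-step-ahead targets described above; the array contents and window length here are illustrative placeholders.

```python
import numpy as np

def make_windows(speed, n_lags):
    """Slice a (T, P) speed matrix into (N, n_lags, P) inputs and (N, P) targets.

    speed: array of shape (T, P) -- T five-minute frames for P sensor locations.
    n_lags: number of historical time steps per sample (e.g., 3, 6, or 12).
    """
    X, y = [], []
    for t in range(n_lags, speed.shape[0]):
        X.append(speed[t - n_lags:t])  # frames T-n ... T-1 (Equation (2))
        y.append(speed[t])             # frame T, the one-step-ahead target
    return np.asarray(X), np.asarray(y)

# Example: a year of 5-min data from 323 sensors, 12 lags (60 min of history).
speed = np.random.rand(105_120, 323).astype("float32")  # placeholder data
X, y = make_windows(speed, n_lags=12)
print(X.shape, y.shape)  # (105108, 12, 323) (105108, 323)
```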

3.2. Bidirectional LSTMs

The bidirectional RNN [24] deals with sequential data in both backward and forward directions using two separate hidden layers. The idea of using bidirectional LSTMs comes from the bidirectional recurrent neural network. In various fields, e.g., phoneme classification and speech recognition, BDLSTMs show better performance than LSTMs [25,26]. Figure 1 demonstrates the structure of the BDLSTM.
$$h_t = \sigma_h \left( W_{xh} x_t + W_{hh} h_{t-1} + b_h \right) \tag{4}$$

$$\hat{y}_t = \sigma_h \left( W_{hy} h_t + b_y \right) \tag{5}$$

$$f_t = \sigma_g \left( W_f x_t + U_f h_{t-1} + b_f \right) \tag{6}$$

$$i_t = \sigma_g \left( W_i x_t + U_i h_{t-1} + b_i \right) \tag{7}$$

$$o_t = \sigma_g \left( W_o x_t + U_o h_{t-1} + b_o \right) \tag{8}$$

$$\tilde{C}_t = \tanh \left( W_C x_t + U_C h_{t-1} + b_C \right) \tag{9}$$

The weight matrices $W_f$, $W_i$, $W_o$, and $W_C$ map the hidden layer input to the input cell state and the gates, whereas the weight matrices $U_f$, $U_i$, $U_o$, and $U_C$ connect the previous cell output to the input cell state and the three gates. The gate activation function $\sigma_g$ is typically a sigmoid function. The bias vectors are $b_f$, $b_i$, $b_o$, and $b_C$, and $\tanh$ is the hyperbolic tangent function. Figure 2 depicts the structure of the AttBDLSTM architecture.

$$\hat{y}_t = \sigma \left( \overrightarrow{h}, \overleftarrow{h} \right) \tag{10}$$

$\overrightarrow{h}$ and $\overleftarrow{h}$ are the forward and backward layer output sequences, computed by feeding the input sequence in positive time order from $T-n$ to $T-1$ and in reverse order, respectively. Both layer outputs are computed by iterating the LSTM equations, Equations (4)–(9). $\hat{Y}_T$ is the output vector, in which the function $\sigma$ combines the two output sequences. The final output of the BDLSTM is likewise a vector, as for the LSTM, $\hat{Y}_T = [y_T, \ldots, y_{T+n}]$, in which the element $y_T$ is the predicted speed for the next time iteration, and so on.
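As a minimal illustration of this bidirectional setup, the Keras sketch below wraps an LSTM in a Bidirectional layer so that the forward and backward hidden-state sequences are combined; concatenation is used here as one common choice for the merge function $\sigma$. The layer width is an assumption, not the paper's exact configuration.

```python
import tensorflow as tf

n_lags, n_locations = 12, 323  # illustrative values
inputs = tf.keras.Input(shape=(n_lags, n_locations))
# Forward and backward LSTM passes over the window; merge_mode plays the role of σ.
h = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True), merge_mode="concat"
)(inputs)  # shape: (batch, n_lags, 128)
```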

3.3. Attention

Because our research targets the attention mechanism, we study the self-attention mechanism in particular. As an RNN processes a sequence, it outputs a hidden state $\hat{h}_i$ at each step, and this hidden state enfolds a summary of the sequence seen so far.
We strengthen the contribution of the self-attention mechanism [27]. Attention-based RNNs have proven very proficient in various sequence transduction tasks, including machine translation [27], image captioning [28], mining bug repositories [29], and speech emotion recognition [30]. The magnitudes of the weights $\hat{\alpha}_i$ learned by the network signify the importance of the corresponding hidden states. The context vector $\hat{r}$ is computed by Equation (11):
$$\hat{r} = \sum_{i=1}^{N} \hat{\alpha}_i \hat{h}_i \tag{11}$$

The input of the attention layer is $\hat{h}_i$; at every time step, the attention weight $\hat{\alpha}_i$ is calculated as

$$\hat{m}_i = \tanh \left( \hat{h}_i \right) \tag{12}$$

$$\hat{b}_i = a^{T} \hat{m}_i + b \tag{13}$$

$$\hat{\alpha}_i = \frac{\exp(\hat{b}_i)}{\sum_k \exp(\hat{b}_k)} \tag{14}$$

The output of the attention layer at the $i$th time step is computed as shown in Equation (15):

$$\hat{r}_i = \hat{\alpha}_i \hat{h}_i \tag{15}$$
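A minimal self-attention (attention pooling) layer following Equations (11)–(15) could look like the sketch below; the class name and variable names are ours, and the implementation is one plausible reading of the equations rather than the authors' exact code.

```python
import tensorflow as tf

class AttentionPooling(tf.keras.layers.Layer):
    """Scores each hidden state and returns the weighted context vector (Eq. 11)."""

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.a = self.add_weight(name="a", shape=(d, 1), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(1,), initializer="zeros")

    def call(self, h):                               # h: (batch, time, d) hidden states
        m = tf.tanh(h)                               # Eq. (12)
        scores = tf.matmul(m, self.a) + self.b       # Eq. (13): shape (batch, time, 1)
        alpha = tf.nn.softmax(scores, axis=1)        # Eq. (14): weights over time steps
        return tf.reduce_sum(alpha * h, axis=1)      # Eq. (11): context vector r
```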

3.4. Fully Convolutional Network (FCN)

The fully convolutional network has been applied successfully in many domains, such as semantic segmentation [31,32,33], breast cancer detection from histopathological image data [34], time series classification [35,36,37], and model-free prediction of multilane traffic scenes [38]. Inspired by CNN models used for time series prediction and by our literature review, we applied a fully convolutional network to the traffic prediction problem. The FCN branch consists of Conv1D layers, also known as temporal convolutions, which apply 1D filters to the spatial/temporal input. A convolutional block consists of a convolution layer, batch normalization (BN), and a sigmoid activation function.
$$y = W * x + b \tag{16}$$

$$\hat{y} = \mathrm{BN}(y) \tag{17}$$

$$h = \mathrm{Sigmoid}(\hat{y}) \tag{18}$$

where $*$ is the convolution operator. We used the causal convolutional architecture provided by the TensorFlow implementation: setting the padding parameter to 'causal' yields causal convolutions, meaning that no information leaks from the future to the past in the network. The filter shifts causally to the right and does not depend on future inputs, as shown in Figure 3. Finally, after the convolutional block, the learned features are fed into a global average pooling layer rather than a fully connected layer, which vastly decreases the number of model parameters and weights.
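A convolutional block matching this description — causal Conv1D (Eq. (16)), batch normalization (Eq. (17)), sigmoid activation (Eq. (18)), then global average pooling — can be sketched as follows; the kernel size is our assumption, while the filter count of 128 follows Section 3.5.

```python
import tensorflow as tf

def fcn_branch(inputs, filters=128, kernel_size=3):
    """Causal temporal convolution block; kernel_size is an assumption."""
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal")(inputs)  # Eq. (16)
    y = tf.keras.layers.BatchNormalization()(y)       # Eq. (17)
    y = tf.keras.layers.Activation("sigmoid")(y)      # Eq. (18)
    # Global average pooling replaces a fully connected layer, shrinking the
    # parameter count as described in the text.
    return tf.keras.layers.GlobalAveragePooling1D()(y)
```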

3.5. AttBDLSTM-FCN

The proposed model, AttBDLSTM-FCN, augments an attention-based bidirectional LSTM with a fully convolutional network to predict multistep network-wide traffic speed. The augmentation of AttBDLSTM and FCN is exploited to encapsulate the bidirectional spatial features and temporal dependencies. In traffic speed prediction, this is the first time that the augmentation of AttBDLSTM and FCN has been used as the building block of a deep architecture to compute the backward dependency of the data. Figure 4 describes the architecture of the proposed methodology. Stacking several hidden layers yields a deep LSTM architecture that works more effectively than the shallower structures described in existing studies [25,26,39]. AttBDLSTM-FCN comprises two blocks, as depicted in Figure 4. The FCN block consists of temporal convolutions that are applied for general feature extraction, and global average pooling [40] is used to diminish the number of parameters.
The FCN block is augmented with the AttBDLSTM block. The FCN block contains a temporal convolutional layer with a filter size of 128, followed by batch normalization [40] and a sigmoid activation function. Finally, we apply global average pooling after the convolutional block.
We used the self-attention mechanism in the BDLSTM module, as attention-based RNNs are very efficient in many fields, e.g., machine translation, image captioning, and speech emotion recognition; thus, we strengthened the contribution of the self-attention mechanism in our study. The AttBDLSTM and FCN blocks are concatenated and fed into a sigmoid layer, which serves as the activation function of the last layer of the model. AttBDLSTM-FCN takes spatial time series data as input and predicts the expected future values for multistep traffic speed prediction. When the spatiotemporal time series traffic data are fed to the AttBDLSTMs, the spatial correlation of the speeds at different locations and the temporal dependencies of the speed values are captured during feature learning; because bidirectional LSTMs can utilize both backward and forward dependencies, the BDLSTM is an entirely reasonable model for extracting valuable information from spatial time series data. The FCN has no fully connected layer, so the learned filters apply everywhere, and it is faster than a conventional CNN; an FCN learns representations and makes decisions based on local spatial input. The proposed model, AttBDLSTM-FCN, uses the augmentation of AttBDLSTM and FCN: when the FCN and AttBDLSTM blocks are applied in parallel, they augment and force each other to find a combined set of features, resulting in a model with better overall performance. Our proposed model significantly enhances the performance of the FCN network with a minimal increase in model size and requires minimal processing of the dataset. A major advantage of using the attention BDLSTM cell is that it provides a visual representation of the attention vector, allowing one to visualize the decision process of the LSTM cell. Another advantage of the proposed model is that it does not require heavy preprocessing or feature engineering.
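Putting the pieces together, the following is a minimal sketch of the two-branch architecture, reusing the AttentionPooling and fcn_branch sketches above. The LSTM width is an assumption; the sigmoid output layer follows the text and presumes min–max-normalized speed values, and the RMSProp optimizer and MSE loss follow Table 1.

```python
import tensorflow as tf

n_lags, n_locations = 12, 323
inputs = tf.keras.Input(shape=(n_lags, n_locations))

# AttBDLSTM branch: bidirectional LSTM followed by attention pooling.
h = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True))(inputs)
att = AttentionPooling()(h)

# FCN branch: causal Conv1D -> BN -> sigmoid -> global average pooling.
fcn = fcn_branch(inputs, filters=128)

# Concatenate both branches and map to the final sigmoid layer.
merged = tf.keras.layers.Concatenate()([att, fcn])
outputs = tf.keras.layers.Dense(n_locations, activation="sigmoid")(merged)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="mse")  # RMSProp + MSE per Table 1
```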
Figure 4 depicts the mechanism used to make short- and long-term traffic speed predictions in the present study. The one-step prediction scheme is shown in Figure 5: we treat the univariate traffic speed time series so that $Y_T$ is the predictor and $\hat{Y}_{T+1}$ is the response variable; afterward, to predict $Y_{T+2}$, $\hat{Y}_{T+1}$ is appended to the input data. In Figure 6, the model predicts multiple future time steps at once, i.e., from $\hat{Y}_{T+1}$ to $\hat{Y}_{T+n}$, where n is the number of future time steps predicted.
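A sketch of the one-step recursive scheme of Figure 5 is given below: each prediction is appended to the input window to produce the next step. The function name is ours, and the model is assumed to be a one-step predictor such as the sketch above.

```python
import numpy as np

def predict_recursive(model, window, n_steps):
    """Roll a one-step model forward by appending each prediction to the window."""
    window = window.copy()                 # shape (n_lags, P)
    preds = []
    for _ in range(n_steps):
        y_hat = model.predict(window[np.newaxis], verbose=0)[0]  # shape (P,)
        preds.append(y_hat)
        window = np.vstack([window[1:], y_hat])  # drop oldest frame, append ŷ
    return np.stack(preds)                 # shape (n_steps, P)
```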

3.6. Dataset Description

The dataset used in this study is publicly available [41]. It covers four connected freeways, I-405, I-5, SR-520, and I-90, in the Seattle area, where 323 sensor stations located on these freeways collect data every 5 min. The dataset covers the whole of 2015.

3.7. Experimental Setup

Each sample of the input data $Y_{P,T}$ is a two-dimensional matrix, and the output is a one-dimensional vector with 323 components. Based on the model description, $[N, n, P] = [N, n, 323]$ is the dimension of the input data. We used an early stopping mechanism to overcome the problem of overfitting. Before feeding the input to the model, all samples were randomized and divided in the ratio 7:2:1 into training, validation, and testing sets, respectively.
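For reference, a minimal sketch of this split and early-stopping setup follows, using the X, y, and model sketches from earlier sections. The 7:2:1 ratio and the patience of 20 come from the text and Table 1; the shuffling seed and epoch budget are arbitrary assumptions.

```python
import numpy as np
import tensorflow as tf

idx = np.random.RandomState(0).permutation(len(X))            # randomize samples
n_train, n_val = int(0.7 * len(X)), int(0.2 * len(X))
train, val, test = np.split(idx, [n_train, n_train + n_val])  # 7:2:1 split

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True)
model.fit(X[train], y[train], validation_data=(X[val], y[val]),
          epochs=200, callbacks=[early_stop])
```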
Table 1 shows the hyperparameters of the training process. To evaluate the performance of the different speed prediction algorithms, we used the mean absolute error (MAE) and mean absolute percentage error (MAPE) as evaluation metrics, computed by the following equations:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| h_i - \hat{h}_i \right| \tag{19}$$

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\left| h_i - \hat{h}_i \right|}{h_i} \tag{20}$$

where $h_i$ is the actual traffic speed and $\hat{h}_i$ is the predicted speed. The results of the proposed AttBDLSTM-FCN model are examined and compared with those of different models.
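Equations (19) and (20) translate directly into code; below is a small sketch assuming h_true and h_pred are arrays of actual and predicted speeds.

```python
import numpy as np

def mae(h_true, h_pred):
    """Mean absolute error, Eq. (19)."""
    return np.mean(np.abs(h_true - h_pred))

def mape(h_true, h_pred):
    """Mean absolute percentage error, Eq. (20); assumes h_true > 0."""
    return 100.0 * np.mean(np.abs(h_true - h_pred) / h_true)
```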

4. Results Analysis and Discussion

Different baseline models have been used for traffic forecasting problems, such as ARIMA and the Kalman filter. ARIMA and Kalman filter methods are not compared in this study, as their performance lags far behind, based on our literature review [42]. Instead, we compared and evaluated the performance of our proposed model, AttBDLSTM-FCN, against different state-of-the-art baseline models: the fully convolutional network (FCN) [38], LSTM [3], GRU NN [4], LSTM with a fully connected deep neural network layer (LSTM-DNN) [43,44], LSTM with a fully convolutional network (LSTM-FCN) [35,45], and BDLSTM [46]. We also evaluated the baseline models with the different numbers of time lags (time frames) studied in this paper, namely 3, 6, and 12.

The comprehensive experiments show that the compared models performed well on small time lags, but as the number of time lags increased, their performance decreased. The GRU model performed best when it used six time lags, with an MAE of 1.80 and a MAPE of 3.69%; as the time lags increased further, its performance decreased accordingly. Likewise, the performance of the LSTM started decreasing when the time lag was set greater than six; its best results were an MAE of 1.74 and a MAPE of 3.90% with six time lags as input. Adding a DNN layer to the LSTM enhanced the model: the LSTM-DNN reduced the error to an MAE of 1.86 and a MAPE of 4.45% and performed best when the time lag was set to three; increasing the time lags beyond three decreased its performance and increased its error rate. Among the nonhybrid models, the BDLSTM outperformed the other baselines, such as GRU, LSTM, and LSTM-DNN, and behaved consistently with the LSTM-DNN discussed above: it performed best with a time lag of three, with the MAE and MAPE reduced to 1.40 and 2.99%, respectively. Among the hybrid models, the LSTM-FCN performed best with six time lags as input, reducing the MAE and MAPE to 1.45 and 3.40%, respectively.

The proposed model achieved its best performance with a time lag of three, with the MAE and MAPE reduced to 1.17 and 2.73%, respectively; unlike the baseline models, its performance did not degrade as the number of time lags increased beyond three. The experimental results are shown in Table 2 and Table 3. Figure 7 compares the training time per epoch of the different models, and the results are shown graphically in Figure 8 and Figure 9. Figure 10 shows the ground truth and predicted speed values of the proposed model at the sensor station at milepost 167.56 on a randomly selected day for multistep future traffic speed prediction over 15, 30, and 60 min time frames. We used the same sensor location to clearly show the performance of the proposed model at different time steps without biased or ambiguous results.

Ablation Study

We performed an ablation study of the proposed model to investigate the contribution of its components. The results are shown in Table 4. The experimental evaluations demonstrate that combining the AttBDLSTM block features with the FCN block features improved the model's performance. We performed the ablation study on three different time lags, i.e., 3, 6, and 12 (15, 30, and 60 min, respectively), and obtained remarkable results by augmenting the two models. When the time lag was set to 3 (i.e., 15 min), the FCN, BDLSTM, AttBDLSTM, and the proposed model had MAE and MAPE values of 6.73 and 18.10%, 1.40 and 2.99%, 1.29 and 2.77%, and 1.17 and 2.73%, respectively. When the time lag was set to 6 (i.e., 30 min), the MAE and MAPE of the FCN, BDLSTM, AttBDLSTM, and the proposed model were 6.58 and 18.99%, 1.42 and 3.11%, 1.34 and 2.99%, and 1.19 and 2.75%, respectively. When the time lag was set to 12 (i.e., 60 min), the MAE and MAPE of the compared models were 8.09 and 20.48%, 1.54 and 3.41%, 1.39 and 3.11%, and 1.30 and 2.99%, respectively. Based on these experimental findings, we can see that the attention mechanism, together with the BDLSTM, helps improve the performance of the proposed model; moreover, augmenting the FCN block with the AttBDLSTM block statistically improved the overall model performance.

5. Conclusions and Future Work

In this study, we proposed an attention-based bidirectional long short-term memory fully convolutional network to predict multistep network-wide traffic speed. The contributions of this study are as follows:
  • We proposed a deep stacked model that studies both the forward and backward dependencies of the traffic data. It integrates attention-based BDLSTMs and an FCN as its introductory module to capture the spatiotemporal correlation in the network-wide traffic data. We examined the behavior of the attention layer in our model and concluded that the attention mechanism enhances model performance accordingly.
  • The proposed model exploited the benefits of deep-learning-based architectures, e.g., bidirectional long short-term memory and fully convolutional neural networks, to improve prediction performance in this study.
  • Our study demonstrated that AttBDLSTM-FCN achieves better performance in network-wide traffic speed prediction when predicting traffic for 15, 30, and 60 min, respectively. Moreover, the proposed model is also competent for multistep network-wide traffic speed prediction further into the future.
Nevertheless, as the results indicate, deep-learning-based models for predicting network-wide traffic speed still need further study. In the future, we will extend this work by combining evolutionary computation to automatically optimize the deep neural network hyperparameters. Another possible future direction is to extend this work on speed prediction between different levels, which is still missing.

Author Contributions

Conceptualization, A.R.; Data curation, A.R., M.A.A. and M.N.; Formal analysis, A.R. and M.A.A.; Investigation, M.N.; Methodology, A.R. and H.R.; Software, A.R.; Validation, A.R. and M.H.A.-A.; Visualization, A.R. and A.Y.; Writing—original draft, A.R.; Writing—review & editing, H.R., A.Y., M.H.A.-A., E.T.E. and N.A.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable for studies not involving humans or animals.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code is available at the following link: https://github.com/AdnanRiaz107/AttBDLSTM_FCN (accessed on 31 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Van Lint, J.; Van Hinsbergen, C. Short-term traffic and travel time prediction models. Artif. Intell. Appl. Crit. Transp. Issues 2012, 22, 22–41. [Google Scholar]
  2. Duan, Y.; Yisheng, L.; Wang, F.Y. Travel time prediction with LSTM neural network. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 1053–1058. [Google Scholar]
  3. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197. [Google Scholar] [CrossRef]
  4. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328. [Google Scholar]
  5. Jiang, X.; Adeli, H. Wavelet packet-autocorrelation function method for traffic flow pattern analysis. Comput.-Aided Civ. Infrastruct. Eng. 2004, 19, 324–337. [Google Scholar] [CrossRef]
  6. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  7. Cui, Z.; Ke, R.; Pu, Z.; Wang, Y. Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv 2018, arXiv:1801.02143. [Google Scholar]
  8. Fei, X.; Lu, C.C.; Liu, K. A bayesian dynamic linear model approach for real-time short-term freeway travel time prediction. Transp. Res. Part C Emerg. Technol. 2011, 19, 1306–1318. [Google Scholar] [CrossRef]
  9. Oh, C.; Park, S. Investigating the effects of daily travel time patterns on short-term prediction. KSCE J. Civ. Eng. 2011, 15, 1263–1272. [Google Scholar] [CrossRef]
  10. Ahmed, M.S.; Cook, A.R. Analysis of Freeway Traffic Time-Series Data by Using Box-Jenkins Techniques; The National Academies of Sciences, Engineering, and Medicine: Washington, DC, USA, 1979; Number 722. [Google Scholar]
  11. Cai, P.; Wang, Y.; Lu, G.; Chen, P.; Ding, C.; Sun, J. A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting. Transp. Res. Part C Emerg. Technol. 2016, 62, 21–34. [Google Scholar] [CrossRef]
  12. Mingheng, Z.; Yaobao, Z.; Ganglong, H.; Gang, C. Accurate multisteps traffic flow prediction based on SVM. Math. Probl. Eng. 2013, 2013, 418303. [Google Scholar] [CrossRef]
  13. Park, D.; Rilett, L.R. Forecasting freeway link travel times with a multilayer feedforward neural network. Comput.-Aided Civ. Infrastruct. Eng. 1999, 14, 357–367. [Google Scholar] [CrossRef]
  14. Yin, H.; Wong, S.; Xu, J.; Wong, C. Urban traffic flow prediction using a fuzzy-neural approach. Transp. Res. Part C Emerg. Technol. 2002, 10, 85–98. [Google Scholar] [CrossRef]
  15. Van Lint, J.; Hoogendoorn, S.; van Zuylen, H.J. Freeway travel time prediction with state-space neural networks: Modeling state-space dynamics with recurrent neural networks. Transp. Res. Rec. 2002, 1811, 30–39. [Google Scholar] [CrossRef]
  16. Yu, R.; Li, Y.; Shahabi, C.; Demiryurek, U.; Liu, Y. Deep learning: A generic approach for extreme condition traffic forecasting. In Proceedings of the 2017 SIAM International Conference on Data Mining, Houston, TX, USA, 27–29 April 2017; pp. 777–785. [Google Scholar]
  17. Wang, P.; Kim, Y.; Vaci, L.; Yang, H.; Mihaylova, L. Short-term traffic prediction with vicinity Gaussian process in the presence of missing data. In Proceedings of the 2018 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 9–11 October 2018. [Google Scholar]
  18. Kim, Y.; Wang, P.; Zhu, Y.; Mihaylova, L. A capsule network for traffic speed prediction in complex road networks. In Proceedings of the 2018 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 9–11 October 2018. [Google Scholar]
  19. Koesdwiady, A.; Soua, R.; Karray, F. Improving traffic flow prediction with weather information in connected cars: A deep learning approach. IEEE Trans. Veh. Technol. 2016, 65, 9508–9517. [Google Scholar] [CrossRef]
  20. Cheng, Q.; Liu, Y.; Wei, W.; Liu, Z. Analysis and forecasting of the day-to-day travel demand variations for large-scale transportation networks: A deep learning approach. In Transportation Analytics Contest; Technical Report; FHWA Resource Center: Atlanta, GA, USA, 2016. [Google Scholar]
  21. Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; Yi, X.; Li, T. Predicting citywide crowd flows using deep spatio-temporal residual networks. Artif. Intell. 2018, 259, 147–166. [Google Scholar] [CrossRef]
  22. Zhang, K.; Zheng, L.; Liu, Z.; Jia, N. A deep learning based multitask model for network-wide traffic speed prediction. Neurocomputing 2020, 396, 438–450. [Google Scholar] [CrossRef]
  23. Riaz, A.; Nabeel, M.; Khan, M.; Jamil, H. SBAG: A hybrid deep learning model for large scale traffic speed prediction. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 287–291. [Google Scholar] [CrossRef]
  24. Song, X.; Kanasugi, H.; Shibasaki, R. Deeptransport: Prediction and simulation of human mobility and transportation mode at a citywide level. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 2618–2624. [Google Scholar]
  25. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  26. Graves, A.; Jaitly, N.; Mohamed, A.R. Hybrid speech recognition with deep bidirectional LSTM. In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, 8–12 December 2013; pp. 273–278. [Google Scholar]
  27. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  28. He, C.; Hu, H. Image captioning with visual-semantic double attention. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2019, 15, 1–16. [Google Scholar] [CrossRef]
  29. Arshad, M.A.; Huang, Z.; Riaz, A.; Hussain, Y. Deep Learning-Based Resolution Prediction of Software Enhancement Reports. In Proceedings of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 27–30 January 2021; pp. 492–499. [Google Scholar] [CrossRef]
  30. Li, P.; Song, Y.; McLoughlin, I.V.; Guo, W.; Dai, L.R. An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition; International Speech Communication Association: Baixas, France, 2018. [Google Scholar]
  31. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  32. Li, Q.; Li, L.; Wang, W.; Li, Q.; Zhong, J. A comprehensive exploration of semantic relation extraction via pre-trained CNNs. Knowl.-Based Syst. 2020, 194, 105488. [Google Scholar] [CrossRef]
  33. Zhang, Y.; Qiu, Z.; Yao, T.; Liu, D.; Mei, T. Fully convolutional adaptation networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6810–6818. [Google Scholar]
  34. Budak, Ü.; Cömert, Z.; Rashid, Z.N.; Şengür, A.; Çıbuk, M. Computer-aided diagnosis system combining FCN and Bi-LSTM model for efficient breast cancer detection from histopathological images. Appl. Soft Comput. 2019, 85, 105765. [Google Scholar] [CrossRef]
  35. Karim, F.; Majumdar, S.; Darabi, H.; Chen, S. LSTM fully convolutional networks for time series classification. IEEE Access 2017, 6, 1662–1669. [Google Scholar] [CrossRef]
  36. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585. [Google Scholar]
  37. Khan, M.; Wang, H.; Riaz, A.; Elfatyany, A.; Karim, S. Bidirectional LSTM-RNN-based hybrid deep learning frameworks for univariate time series classification. J. Supercomput. 2021, 77, 7021–7045. [Google Scholar] [CrossRef]
  38. Schörner, P.; Hubschneider, C.; Härtl, J.; Polley, R.; Zöllner, J.M. Grid-based micro traffic prediction using fully convolutional networks. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 4540–4547. [Google Scholar]
  39. Yasrab, R.; Jiang, W.; Riaz, A. Fighting Deepfakes Using Body Language Analysis. Forecasting 2021, 3, 303–321. [Google Scholar] [CrossRef]
  40. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. PMLR, Lille, France, 7–9 July 2015; pp. 448–456. [Google Scholar]
  41. Wang, Y.; Zhang, W.; Henrickson, K.; Ke, R.; Cui, Z. Digital Roadway Interactive Visualization and Evaluation Network Applications to WSDOT Operational Data Usage; Technical Report; Department of Transportation: Washington, DC, USA, 2016. [Google Scholar]
  42. Guo, J.; Huang, W.; Williams, B.M. Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification. Transp. Res. Part C Emerg. Technol. 2014, 43, 50–64. [Google Scholar] [CrossRef]
  43. Kang, D.; Lv, Y.; Chen, Y.Y. Short-term traffic flow prediction with LSTM recurrent neural network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–6. [Google Scholar]
  44. Liu, Y.; Wang, Y.; Yang, X.; Zhang, L. Short-term travel time prediction by deep learning: A comparison of different LSTM-DNN models. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017. [Google Scholar]
  45. Naeem, H.; Bin-Salem, A.A. A CNN-LSTM network with multi-level feature extraction-based approach for automated detection of coronavirus from CT scan and X-ray images. Appl. Soft Comput. 2021, 113, 107918. [Google Scholar] [CrossRef]
  46. Wang, J.; Chen, R.; He, Z. Traffic speed prediction for urban transportation network: A path based deep learning approach. Transp. Res. Part C Emerg. Technol. 2019, 100, 372–385. [Google Scholar] [CrossRef]
Figure 1. Model diagram for bidirectional LSTM architecture.
Figure 2. AttBDLSTM architecture.
Figure 3. Example of causal convolution.
Figure 4. The proposed model architecture.
Figure 5. One-step prediction model.
Figure 6. Multistep prediction model.
Figure 7. Training time per epoch vs. time (seconds).
Figure 8. MAE vs. time lags.
Figure 9. MAPE vs. time lags.
Figure 10. Performance prediction of the proposed model on different time lags.
Table 1. Hyperparameters.

| Parameter | Value |
| --- | --- |
| Time lags | 3, 6, and 12 |
| Total time steps | 105,120 |
| Activation function | Sigmoid |
| Training process | Mini-batch gradient descent |
| Optimizer | RMSProp |
| Loss | MSE |
| Patience | 20 |
| Training samples | 75,679 |
| Validation samples | 18,920 |
| Trainable parameters | 3,491,173 |
| Nontrainable parameters | 256 |
| Total parameters | 3,491,429 |
Table 2. Comparison with baseline models (MAE).

| Time Lags | Model | T+1 | T+3 | T+6 | T+12 |
| --- | --- | --- | --- | --- | --- |
| 3 | FCN | 6.53 | 5.03 | 5.61 | 7.60 |
| 3 | GRU | 1.89 | 2.45 | 3.43 | 3.30 |
| 3 | LSTM | 1.94 | 2.64 | 2.83 | 2.67 |
| 3 | LSTM-DNN | 1.86 | 2.32 | 2.37 | 2.75 |
| 3 | LSTM-FCN | 1.54 | 2.21 | 2.24 | 2.31 |
| 3 | BDLSTM | 1.40 | 2.18 | 2.22 | 2.26 |
| 3 | Proposed model | 1.17 | 2.10 | 2.16 | 2.11 |
| 6 | FCN | 6.58 | 5.28 | 5.58 | 8.27 |
| 6 | GRU | 1.80 | 2.66 | 3.80 | 3.53 |
| 6 | LSTM | 1.74 | 2.87 | 3.14 | 2.69 |
| 6 | LSTM-DNN | 2.09 | 2.74 | 2.80 | 2.61 |
| 6 | LSTM-FCN | 1.45 | 2.82 | 2.66 | 2.50 |
| 6 | BDLSTM | 1.42 | 2.62 | 2.71 | 2.61 |
| 6 | Proposed model | 1.19 | 2.25 | 2.56 | 2.44 |
| 12 | FCN | 8.09 | 5.29 | 7.70 | 12.06 |
| 12 | GRU | 2.17 | 3.36 | 3.77 | 3.52 |
| 12 | LSTM | 2.29 | 3.51 | 3.54 | 3.11 |
| 12 | LSTM-DNN | 2.62 | 3.45 | 3.38 | 3.09 |
| 12 | LSTM-FCN | 1.71 | 3.43 | 3.33 | 2.71 |
| 12 | BDLSTM | 1.54 | 3.20 | 3.31 | 2.75 |
| 12 | Proposed model | 1.30 | 3.06 | 3.21 | 2.67 |
Table 3. Comparison with baseline models (MAPE, %).

| Time Lags | Model | T+1 | T+3 | T+6 | T+12 |
| --- | --- | --- | --- | --- | --- |
| 3 | FCN | 18.10 | 16.04 | 16.72 | 20.30 |
| 3 | GRU | 3.82 | 5.48 | 7.52 | 7.24 |
| 3 | LSTM | 4.22 | 5.87 | 6.35 | 6.23 |
| 3 | LSTM-DNN | 4.45 | 5.59 | 5.56 | 6.97 |
| 3 | LSTM-FCN | 3.62 | 5.18 | 5.29 | 5.59 |
| 3 | BDLSTM | 2.99 | 4.99 | 5.26 | 5.15 |
| 3 | Proposed model | 2.73 | 4.95 | 5.15 | 5.05 |
| 6 | FCN | 18.99 | 16.26 | 17.14 | 21.87 |
| 6 | GRU | 3.69 | 6.52 | 8.88 | 8.64 |
| 6 | LSTM | 3.90 | 6.96 | 7.53 | 6.28 |
| 6 | LSTM-DNN | 5.0 | 7.09 | 7.08 | 6.33 |
| 6 | LSTM-FCN | 3.40 | 6.91 | 6.81 | 5.96 |
| 6 | BDLSTM | 3.11 | 6.60 | 6.74 | 6.09 |
| 6 | Proposed model | 2.75 | 5.39 | 6.66 | 5.80 |
| 12 | FCN | 20.48 | 17.01 | 21.18 | 27.89 |
| 12 | GRU | 4.37 | 9.06 | 10.04 | 8.53 |
| 12 | LSTM | 5.13 | 9.36 | 9.45 | 7.53 |
| 12 | LSTM-DNN | 6.63 | 10.07 | 9.52 | 7.80 |
| 12 | LSTM-FCN | 4.02 | 9.78 | 9.52 | 6.98 |
| 12 | BDLSTM | 3.41 | 8.93 | 9.25 | 7.20 |
| 12 | Proposed model | 2.99 | 7.52 | 9.05 | 6.80 |
Table 4. Ablation study.

| Time Lag | Error | FCN | BDLSTM | AttBDLSTM | Proposed Model |
| --- | --- | --- | --- | --- | --- |
| 3 | MAE | 6.73 | 1.40 | 1.29 | 1.17 |
| 3 | MAPE | 18.10% | 2.99% | 2.77% | 2.73% |
| 6 | MAE | 6.58 | 1.42 | 1.34 | 1.19 |
| 6 | MAPE | 18.9% | 3.11% | 2.99% | 2.75% |
| 12 | MAE | 8.09 | 1.54 | 1.39 | 1.30 |
| 12 | MAPE | 20.48% | 3.41% | 3.11% | 2.99% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
