Article

Predicting Traffic Flow Parameters for Sustainable Highway Management: An Attention-Based EMD–BiLSTM Approach

1 School of Transportation, Southeast University, Nanjing 211189, China
2 Institute on Internet of Mobility, Southeast University and University of Wisconsin-Madison, Southeast University, Nanjing 211189, China
3 Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, School of Transportation, Southeast University, Nanjing 211189, China
4 Highway Development Center of Jiangsu Provincial Department of Transportation, Nanjing 210001, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(1), 190; https://doi.org/10.3390/su16010190
Submission received: 22 October 2023 / Revised: 12 December 2023 / Accepted: 19 December 2023 / Published: 25 December 2023
(This article belongs to the Special Issue Big Data Analytics in Sustainable Transport Planning and Management)

Abstract: The long-term prediction of highway traffic parameters is frequently undermined by cumulative errors from various influencing factors and unforeseen events, resulting in diminished predictive accuracy and applicability. In the pursuit of sustainable highway development and eco-friendly transportation strategies, forecasting these traffic flow parameters has emerged as an urgent concern. To mitigate issues associated with cumulative error and unexpected events in long-term forecasts, this study leverages the empirical mode decomposition (EMD) method to deconstruct time series data. This aims to minimize disturbances from data fluctuations, thereby enhancing data quality. We also incorporate the BiLSTM model, ensuring bidirectional learning from extended time series data for a thorough extraction of relevant insights. In a pioneering effort, this research integrates the attention mechanism with the EMD–BiLSTM model. This synergy deeply excavates the spatiotemporal characteristics of traffic volume data, allocating appropriate weights to significant information, which markedly boosts predictive precision and speed. Through comparisons with ARIMA, LSTM, and BiLSTM models, we demonstrate the distinct advantage of our approach in predicting traffic volume and speed. In summary, our study introduces a groundbreaking technique for the meticulous forecasting of highway traffic volume. This serves as a robust decision-making instrument for both sustainable highway development and transportation management, paving the way for more sustainable, efficient, and environmentally conscious highway transit.

1. Introduction

In recent years, with the continuous advancement of science and technology, intelligent transportation systems (ITSs) have begun to be applied to highway traffic management and control. Their objectives are to make full use of traffic control resources, intelligently manage the operations of transportation systems, and enhance the overall quality of road traffic management, thereby promoting the intelligent, information-driven development of road traffic systems. Precise predictions can increase road traffic efficiency and support the rational allocation of traffic resources, reducing traffic congestion and the associated carbon emissions. Accurately predicting traffic volume parameters has therefore become a focal point of research. Predicting traffic speed is also pivotal: it improves the efficiency of traffic flow management, thereby reducing congestion, and bolsters safety by identifying areas prone to accidents due to speed variations. Accurate speed predictions further serve as a crucial input for travel time estimation, aiding robust traffic planning and decision-making and ultimately contributing to the optimization of overall transportation network performance. How to fully tap the potential efficiency of existing road resources, effectively alleviate road traffic congestion, reduce the traffic accident rate, and enhance the quality of highway traffic management and services remains a challenging topic for researchers in relevant fields and professionals in the transportation industry. Computing and analyzing traffic parameters on different time scales is crucial for enhancing the accuracy of highway control. Meanwhile, accurate prediction of traffic parameters plays a crucial role in fostering sustainable transportation by enabling efficient traffic management, which leads to reduced congestion and lower emissions. This, in turn, contributes to a more environmentally friendly and resource-efficient transportation system. Therefore, this paper mainly studies traffic flow parameter prediction methods based on spatial and temporal features.
Scholars both domestically and internationally have conducted extensive research on traffic flow parameter predictions. Most of these studies are focused on predicting based on traffic flow parameter data. The main research approaches include methods based on statistics and methods based on artificial intelligence.
(1)
Statistical prediction methods primarily aim for the smallest discrepancy between output data and real data. They seek optimal parameters while fitting historical traffic data and then incorporate these optimal parameters into the model to achieve the minimal prediction error; representative examples include the ARIMA model [1], the SARIMA model [2], and the k-nearest neighbor (k-NN) model [3]. Among the traffic flow prediction methods based on statistical learning, time series methods predict the future from the development patterns of historical traffic flow data.
(2)
With the global rise in artificial intelligence and big data technologies, scholars from various countries have also started using artificial intelligence techniques for traffic flow parameter prediction, primarily encompassing the recurrent neural network (RNN) [4], the convolutional neural network (CNN), the graph neural network (GNN) [5] and others.
Despite the advancements in contemporary traffic flow parameter prediction methods that leverage the synergy of spatiotemporal characteristics through deep learning, there are still inherent limitations in these approaches. First, the majority of these methods focus predominantly on traffic flow predictions based on a single data source, rarely considering the potential enhancement that could be achieved through multi-source data fusion. Secondly, the existing methods might lack sufficient robustness when addressing unexpected events or unconventional traffic patterns. Furthermore, while deep learning techniques offer advantages in predictive accuracy, their computational complexity is relatively high, potentially rendering them unsuitable for scenarios that demand real-time responses. Lastly, the current techniques still fall short in delivering satisfactory long-term traffic flow parameter predictions, especially in dynamic traffic environments.
Therefore, this paper delves into the methods of traffic flow parameter prediction with an emphasis on those grounded in spatiotemporal features. To address the aforementioned challenges, we introduce the EMD–Attention–BiLSTM model in the realm of deep learning. The main contributions of this paper are as follows.
(1)
EMD Data Processing and Denoising: We present a novel utilization of empirical mode decomposition (EMD) for the preprocessing and denoising of raw traffic data. EMD effectively decomposes the original traffic flow and speed time series data into intrinsic mode functions, thereby elevating data quality and minimizing the impact of fluctuations. This process provides a more dependable foundation of input data for our model. Furthermore, we have streamlined the input data by focusing on key features that significantly impact traffic prediction, which in turn reduces the complexity and computational demands of the model.
(2)
EMD–BiLSTM–Attention Model: Our study introduces an innovative model for predicting traffic flow parameters by combining EMD with bidirectional long short-term memory (BiLSTM) and the attention mechanism. The model proficiently captures spatiotemporal features, leading to more precise predictions of traffic flow and speed, and demonstrates significant performance advantages over traditional forecasting methods. It employs efficient data preprocessing techniques that reduce the computational burden during the training phase; by optimizing how the data are handled, it decreases the time and resources required for model training. We also carefully designed the architecture of the EMD–BiLSTM–Attention model to be as lean as possible without compromising performance, using fewer layers and parameters and thus reducing the computational load.
By leveraging the fusion of empirical mode decomposition (EMD) and bidirectional long short-term memory (BiLSTM) networks enhanced with attention mechanisms, this approach offers precise and dynamic forecasting of traffic flow parameters. Such accurate predictions are invaluable for optimizing highway operations, reducing congestion, and enhancing road safety, ultimately contributing to more sustainable and efficient transportation systems. It is also a cornerstone in achieving sustainable transportation. The application potential of this method lies in its ability to inform real-time traffic management decisions and long-term urban planning strategies, thereby facilitating a more environmentally friendly and resource-efficient approach to highway management.
The arrangement of this paper is as follows. Section 2 focuses on traffic flow parameter prediction, reviewing mainstream methods and summarizing their shortcomings. Section 3 uses the EMD algorithm to denoise the data source of this paper. In Section 4, based on the attention mechanism and the bidirectional long short-term memory model, an EMD–Attention–BiLSTM model is constructed. Section 5 validates the proposed model’s performance and compares it with classic models. Finally, Section 6 summarizes the content of this research and looks forward to the challenges and future development trends in this research field.

2. Literature Review

There are two main types of traffic parameter prediction models, statistics-based and artificial intelligence-based, as described below.

2.1. Traffic Parameter Prediction Models Based on Statistics

Common statistical traffic parameter prediction models include the autoregressive moving average family, of which ARIMA has become the typical representative of statistical methods [1]. The ARIMA model and its variants are still widely used in the field of traffic flow prediction. For instance, the SARIMA model extends ARIMA with the capability to identify cyclical patterns, and its empirical performance has also improved [2]. The k-nearest neighbor (k-NN) model, a non-parametric regression method, has been widely applied in the domain of traffic flow prediction: leveraging a similarity mechanism, it matches the k best neighbors within historical time series data to predict future traffic volumes. Statistical approaches also encompass regression models such as linear regression and time series models such as Kalman filtering and ARIMA. In the last century, Ahmed utilized the autoregressive integrated moving average model to predict traffic flow information parameters [6]. Clark distilled the intrinsic patterns of traffic volume, speed, and density from London highway traffic data and input them into a multivariate non-parametric regression model [7], mining the changing patterns of traffic operational states and achieving precise predictions of traffic operational conditions. The main innovation by Thomas et al. was the categorization of traffic travel patterns, taking into consideration the impact of daily and special travel on traffic operational states [8]. These were then incorporated as influencing factors into the autoregressive integrated moving average prediction model, substantially improving the accuracy of long-term predictions.
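As a toy illustration of the fitting idea behind the ARIMA family described above, an AR(1) model can be estimated from historical flow data by least squares and iterated forward. This is a deliberately simplified sketch (a full ARIMA additionally applies differencing and moving-average terms); the function names are illustrative, not from the paper.

```python
import numpy as np

def fit_ar1(series):
    """Fit an AR(1) model x_t = phi * x_{t-1} + c by least squares,
    i.e. seek the parameters that minimize the fitting error on
    historical data, as statistical methods do."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[:-1], np.ones(len(x) - 1)])  # regressors [x_{t-1}, 1]
    phi, c = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    return phi, c

def forecast_ar1(last_value, phi, c, steps):
    """Iterate the fitted recurrence to produce multi-step forecasts."""
    preds = []
    for _ in range(steps):
        last_value = phi * last_value + c
        preds.append(last_value)
    return preds
```

On data generated exactly by such a recurrence, the least-squares fit recovers the coefficients, which is why this family works well when traffic follows stable historical patterns.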
In addition to time series data-based statistical models, spatial data are crucial for statistics-based traffic parameter prediction. Min and Wynter considered the influence of road segment differences on traffic flow [9]. They calculated the spatiotemporal correlation coefficients of traffic flow and speed across different segments of the traffic network, integrating the spatiotemporal characteristics of traffic flow with the ARIMA model to achieve precise predictions of traffic flow from 5 min to 1 h into the future. Guo et al. proposed an adaptive spatiotemporal k-NN model, addressing the limitation of traditional k-NN algorithms, in which the state space dimension m and time window n must be set manually [3]. The size of the time window n is determined by the autocorrelation between road segments, while the space dimension m is determined by the correlations between different road segments. Ultimately, the adaptive spatiotemporal state matrix replaces the traditional state matrix, effectively integrating spatiotemporal states and enhancing prediction accuracy.
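The k-NN matching mechanism described above can be sketched in a few lines. This simplified version matches plain historical windows by Euclidean distance and averages the values that followed the nearest matches; it is an illustration of the mechanism, not the adaptive spatiotemporal variant of Guo et al. [3].

```python
import numpy as np

def knn_forecast(history, window, k):
    """Illustrative k-NN time-series forecast: find the k historical
    windows most similar to the latest one and average the values
    that immediately followed them."""
    history = np.asarray(history, dtype=float)
    query = history[-window:]                       # current state vector
    # each candidate window must be followed by one observed "next value"
    candidates = [
        (np.linalg.norm(history[i:i + window] - query), history[i + window])
        for i in range(len(history) - window)
    ]
    candidates.sort(key=lambda pair: pair[0])       # nearest neighbors first
    neighbours = candidates[:k]
    return float(np.mean([nxt for _, nxt in neighbours]))

# e.g. on a perfectly periodic series the nearest match predicts the
# next value of the repeating pattern
print(knn_forecast([1, 2, 3, 1, 2, 3, 1, 2], window=2, k=1))
```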

2.2. Traffic Parameter Prediction Models Based on Artificial Intelligence

Common artificial intelligence traffic parameter prediction models include the recurrent neural network (RNN), the convolutional neural network (CNN), and the graph neural network (GNN).
Dia et al. employed time-lagged recurrent network methods for speed prediction, with experimental results indicating a prediction accuracy of up to 90% [10]. As one of the deep learning algorithms, the recurrent neural network (RNN) can aptly capture the spatiotemporal evolution of traffic flow. However, due to issues like gradient explosion and gradient vanishing, it cannot grasp the long-term spatiotemporal evolution patterns [4]. Long short-term memory (LSTM) networks introduced the concept of gates based on RNNs, including forget gates, input gates, and output gates. This gate structure enhances the ability to control the accumulation of internal information [11]. Subsequently, to overcome the problem where LSTM prediction errors accumulate as the sequence lengthens, various improved methods have emerged. Although the LSTM model has, to a certain extent, addressed the issues of gradient explosion and gradient vanishing, it has not fully resolved them, and LSTMs have a large computational overhead and demand rigorous training. This gave rise to derivatives like the gated recurrent unit (GRU) and various LSTM variants such as Bi-LSTM, Tree–LSTM, Graph LSTM, and GS–GLSTM [12].
In 2015, Byeonghyeop and others, based on speed data collected by microwave detectors, used LSTM to fit the nonlinear time series analysis of traffic volume and verified its accuracy and stability [13]. In 2019, Loan and others, under the spatiotemporal attention mechanism, used deep learning for short-term traffic volume forecasting, further mining the spatiotemporal correlation of the traffic network [14]. Hochreiter used the attention mechanism to connect influential historical time steps with the current time step, addressing the long-term dependency issue of LSTM [15]. Wang and others introduced spatial characteristics into time series data, processed the time series data through random forests, then input them into the LSTM network, achieving traffic flow parameter prediction based on spatiotemporal features [16]. Given that traffic flow changes in the road network exhibit complex dynamics, randomness, and high spatiotemporal dependency, scholars worldwide proposed various improved graph neural networks for traffic flow parameter prediction tasks that jointly consider spatiotemporal features.
There are many algorithms in deep learning that can effectively combine the spatiotemporal characteristics of traffic flow for prediction. Among them, the convolutional neural network (CNN) performs well in processing map images containing traffic information. CNNs can automatically capture and process the spatiotemporal features and traffic data in the traffic network for traffic flow parameter prediction and can also be extended to macro road networks. The time and space information in the road network can be transformed into a two-dimensional grid image, where adjacent road sections in the grid image are also adjacent points, thus preserving spatial information. The CNN then processes this grid image for prediction. The CNN consists of a five-layer architecture: the input layer, convolution layer, pooling layer, fully connected layer, and output layer.
Graph neural networks (GNNs) leverage neural networks to mine the data characteristics and patterns within graph structures, and they are used for various graph learning tasks, such as clustering, prediction, and generation [5]. In GNNs, there is a state variable that can represent neighborhood information of any dimension and convey graph structure-related information through node propagation. The road network can be abstracted into a graph structure. GNNs can effectively describe the spatial correlation in traffic networks, making them widely used for traffic flow parameter prediction in regional road networks. First, a local spatiotemporal graph is constructed. Then, the spatiotemporal correlations in the graph are extracted through the spatiotemporal synchronized graph convolution module (STSGCM). Finally, a hierarchical model containing multiple STSGCMs, called the spatiotemporal synchronized graph convolution layer (STSGCL), is established to capture the spatiotemporal correlation and heterogeneity in the spatial network sequence. In 2020, Feng and others combined a CNN with graph neural networks to introduce the graph convolutional neural network (GCN), predicting traffic speeds on roads within the research scope based on spatiotemporal correlations [17]. Chen and others proposed the spatiotemporal synchronized graph convolution model, which can directly extract local spatiotemporal correlation. However, in actual applications, graph neural networks mainly use a fixed graph structure, which does not mine the hidden spatial information in traffic networks effectively and lacks universality.
The above provides an overview of the development history of traffic flow parameter prediction methods. Traditional statistical methods have limited capabilities in mining the spatiotemporal characteristics of traffic networks, typically represented by the road spatial matrix. On one hand, in deep learning methods, RNNs are proficient in handling time series information. Numerous LSTM variants are improvements over the basic LSTM model in specific aspects. Different variants have their unique features and targeted applications, such as the CNN–LSTM model which stacks CNN layers before the LSTM layer; the ConvLSTM model embeds convolution; the StackedLSTMs model piles multiple LSTM models; the Encoder–Decoder LSTM model encodes and decodes input sequences, specifically designed for seq2seq problems; the BiLSTM model adds a backward LSTM model, and so on. On the other hand, for processing high-dimensional spatial structures, convolutional neural networks can automatically extract spatial information, enhancing the ability to depict road space [18]. However, traditional CNNs are only suitable for processing data structures in regular Euclidean spaces. In real roads, the directions of the same road may differ significantly. That is, neighboring points in Euclidean space may not necessarily be similar, so the spatial structure of the road network is non-Euclidean and directed. Therefore, CNNs are not the optimal choice for processing road network spatial structures, but graph neural networks effectively address this problem.
While the realm of artificial intelligence has made significant strides in traffic parameter prediction, there still exists an underlying issue that most models and methods do not holistically address. Many of the aforementioned deep learning models excel in processing either the temporal or spatial features, but rarely both in a cohesive manner. Moreover, these models tend to handle vast amounts of data effectively, but might fall short in discerning intricate patterns in smaller data sets or in scenarios with rapid, unpredictable fluctuations. Additionally, there is a noticeable gap in the existing literature when it comes to the adaptive fusion of multiple models, which could potentially harness the strengths of individual models while compensating for their respective weaknesses. This, coupled with the observation that real-world traffic flow is subject to a multitude of dynamic factors, both predictable (like daily work routines) and unpredictable (like accidents or sudden weather changes), signifies the pressing need for a model that not only robustly captures the spatiotemporal intricacies of the traffic flow but also rapidly adapts to unforeseen changes.
Addressing these shortcomings and inspired by the potential of blending the strengths of existing methodologies, this paper introduces an EMD–Attention–BiLSTM model, aiming to seamlessly integrate spatiotemporal features while ensuring adaptability and precision in diverse traffic scenarios.

3. Data Preparation

This research primarily focuses on traffic prediction using data from electronic toll collection (ETC) gantries. The specificity of this data type necessitates a high level of precision and contextual relevance in the datasets used. As a result, our initial study was conducted with a dataset specifically tailored to ETC gantry data, which provided the most relevant and accurate context for our objectives.

3.1. Raw Data Description

In the current landscape, numerous mature datasets pertaining to traffic flow are readily available [19]. The data in this instance comes from a research section of the G50 Shanghai–Chongqing Expressway, which is approximately 50 km long. The data collection range is the section from Jingze Interchange near Suzhou City on the G50 Expressway to Nanxun Interchange, which includes eight microwave radar detector sections and 12 ETC gantry sections. According to the spatial locations of the microwave radar detectors and ETC gantries, they are sequentially arranged, and after fusion, there are a total of 20 road sections. The data include two attribute parameters: flow and speed. Data were selected from 14 February 2022 to 13 March 2022, a total of 4 weeks. One section generates 288 sample data points in one day, resulting in a total of 8064 data entries over 4 weeks.
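The sample counts quoted above can be verified arithmetically, assuming the 5-min aggregation interval adopted later in this paper:

```python
# Sanity check on the dataset size: one section at a 5-min aggregation
# interval yields 288 samples per day, and 4 weeks of data yield 8064
# samples per section.
MINUTES_PER_DAY = 24 * 60
INTERVAL_MIN = 5
samples_per_day = MINUTES_PER_DAY // INTERVAL_MIN   # 288
samples_4_weeks = samples_per_day * 28              # 8064
print(samples_per_day, samples_4_weeks)
```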

3.2. Data Denoising Based on the Empirical Mode Decomposition Algorithm

The original volume time series data and speed time series data are used as input signals for the empirical mode decomposition (EMD) algorithm for decomposition. EMD is an adaptive time–frequency domain signal-processing method [20]. It is mainly used to decompose non-stationary and nonlinear time series data to achieve a stabilizing effect. EMD assumes that any complex original signal can be composed of several intrinsic mode functions (IMFs). The EMD expression is shown in Equation (1). EMD decomposes the original data into several IMFs and a residual series based on the data’s own time characteristics. Each decomposed intrinsic mode function has adaptability, which means each IMF possesses feature information at different time scales. Moreover, the sum of the decomposed IMF and residual series can reconstruct the original data.
$I(t) = \sum_{i=1}^{n} \mathrm{IMF}_i(t) + R_n(t) \quad (1)$
where $I(t)$ is the input signal; $\mathrm{IMF}_i(t)$ is the $i$th IMF; and $R_n(t)$ is the residual series.
The EMD algorithm procedure is as follows:
  • Step 1: For the original input signal $I(t)$, traverse the entire time series, examining each point in sequence to identify its local maxima and minima. Using the cubic spline interpolation method, fit the maxima and minima to obtain the upper envelope $e_+(t)$ and the lower envelope $e_-(t)$, respectively.
  • Step 2: Calculate the mean of the upper and lower envelopes, defined as $\mu_i(t)$ in Equation (2). Define the new signal series $I_{i+1}(t)$, obtained by subtracting the envelope mean from the current series, as indicated in Equation (3).
    $\mu_i(t) = \dfrac{e_+(t) + e_-(t)}{2} \quad (2)$
    $I_{i+1}(t) = I_i(t) - \mu_i(t) \quad (3)$
  • Step 3: Repeat Steps 1 and 2 until the series $I_k(t)$ satisfies the IMF conditions: the mean of the upper and lower envelopes is close to zero, and the number of extrema and the number of zero-crossings differ by no more than one. This yields the first IMF, denoted $f_1$. This IMF is then subtracted from the original signal series $I(t)$ to obtain the residual series $H_1(t) = I(t) - f_1(t)$.
  • Step 4: Take the residual series $H_1(t)$ as the new input series and repeat Steps 1 to 3. This process continues until the resulting series $H_{k+1}(t) = I_k(t) - f_k(t)$ is a monotonic function or a constant. The final residual series is the residue $r_n(t)$, and a total of $n$ IMFs $f_1(t), f_2(t), \ldots, f_n(t)$ are obtained through the process.
  • Step 5: The EMD algorithm concludes.
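The steps above can be sketched in NumPy/SciPy. The function below performs one sifting pass (Steps 1 and 2) with a common boundary simplification: the envelopes are pinned at both series endpoints. It is an illustrative sketch of the procedure, not the implementation used in this study.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(signal):
    """One sifting pass (Steps 1 and 2): fit cubic-spline envelopes
    through the local extrema and subtract the envelope mean
    (Equations (2) and (3))."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal))
    # local maxima/minima found by comparing each interior point to neighbors
    maxima = [i for i in range(1, len(signal) - 1)
              if signal[i] >= signal[i - 1] and signal[i] >= signal[i + 1]]
    minima = [i for i in range(1, len(signal) - 1)
              if signal[i] <= signal[i - 1] and signal[i] <= signal[i + 1]]
    # boundary simplification: pin both envelopes at the endpoints
    maxima = [0] + maxima + [len(signal) - 1]
    minima = [0] + minima + [len(signal) - 1]
    upper = CubicSpline(maxima, signal[maxima])(t)   # e_+(t)
    lower = CubicSpline(minima, signal[minima])(t)   # e_-(t)
    mean_env = (upper + lower) / 2.0                 # Equation (2)
    return signal - mean_env                         # Equation (3)
```

In a full EMD implementation this pass is repeated until the IMF conditions of Step 3 hold, and the whole procedure is then re-applied to the residual.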

3.3. Analysis of Denoised Data

The original volume time series data and speed time series data were each decomposed into 12 IMFs and one residual series. The original input series and the 12 IMFs $\mathrm{IMF}_1, \mathrm{IMF}_2, \ldots, \mathrm{IMF}_{12}$ are shown in Figure 1. The frequency decreases progressively from $\mathrm{IMF}_1$ to $\mathrm{IMF}_{12}$, while the stationarity of the data gradually increases. After decomposition using the EMD, the errors caused by fluctuations in the volume time series and speed time series data are reduced. The intrinsic mode components of volume and speed can be assembled into a matrix that serves as the input data matrix for the BiLSTM–Attention model, thus improving prediction accuracy.
In light of the need for high precision and the capacity for short-term traffic volume prediction in dynamic environments, the single-point-ahead prediction mode is chosen: it provides accurate forecasts for the immediate future point while minimizing model complexity and uncertainty. In addition, the selection of a 5-min prediction interval for traffic flow, as opposed to longer durations such as 10 min or more, is primarily driven by the need for greater accuracy and reliability in short-term predictions, which are subject to rapid changes due to road incidents, weather conditions, and unforeseen events. Shorter prediction intervals also align better with the requirements of real-time traffic management systems, which demand timely data to make effective adjustments to traffic signal control and to disseminate traffic advisories. Therefore, this paper adopts the single-point-ahead prediction mode; that is, the short-term traffic flow parameters of the previous $T$ 5-min intervals are used to predict the traffic flow parameters at time $T+1$. To eliminate the impact of the different dimensions of the evaluation metrics and to avoid oscillations during gradient updates, enabling the network to converge faster, the min–max scaler method is used for normalization: after a linear transformation, the values are mapped into the range $[0, 1]$. The distributions of normalized traffic flow and speed are shown in Figure 2 and Figure 3, respectively; the traffic flow parameter data show a clear periodicity.
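The normalization and single-point-ahead sample construction described above can be sketched as follows (an illustrative helper, not the paper's exact preprocessing pipeline):

```python
import numpy as np

def make_supervised(series, T):
    """Min-max normalize a series to [0, 1] and build single-point-ahead
    samples: each input row is the previous T intervals, and the target
    is the value at the next (T+1-th) interval."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    scaled = (series - lo) / (hi - lo)               # min-max scaler
    X = np.stack([scaled[i:i + T] for i in range(len(scaled) - T)])
    y = scaled[T:]                                    # value one step ahead
    return X, y

# 7 observations with a window of 3 yield 4 (input, target) pairs
X, y = make_supervised([10, 12, 15, 11, 13, 16, 12], T=3)
```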

4. Methodology

4.1. Bidirectional Long Short-Term Memory (BiLSTM) Neural Network Algorithm

The classical LSTM model can capture characteristics in long-term forward time series data and has therefore found extensive applications in the field of traffic flow prediction. LSTM (long short-term memory), an improved version of the RNN, emerged as a solution to the short-term memory and vanishing gradient problems of RNNs, making it more capable of extracting valuable information from earlier timestamps. LSTM introduces significant changes in the hidden layers, replacing the single tanh layer of the original RNN with a four-layer structure. The LSTM model is composed of multiple memory modules arranged in a chain-like recurrent neural network, enabling it to selectively retain and utilize information at each time step. This enhances the LSTM model’s autonomous learning and search capabilities, yielding remarkable performance in sequential data analysis.
Due to its memory capabilities, the LSTM model can unearth past information, significantly enhancing the accuracy of traffic flow prediction. Building upon this foundation, incorporating the extraction of future information enables a more comprehensive analysis of historical and future data features, further improving prediction accuracy. BiLSTM (bidirectional long short-term memory) [21], a variant of the LSTM model, combines forward LSTM and backward LSTM, training them from both directions and linearly merging their training structures. Consequently, it adeptly captures bidirectional influences in time series data.
The fundamental idea behind bidirectional long short-term memory (BiLSTM) is to train the forward and backward sequences of the input sequence using two separate LSTM models, which are then connected to the same output layer. As a result, each data point in the input sequence within the output layer contains both complete forward and backward information. The overall network structure of the BiLSTM model is depicted in Figure 4. In the figure, it can be observed that BiLSTM comprises a forward LSTM and a backward LSTM. The forward LSTM encodes the traffic flow data from front to back, while the backward LSTM encodes the traffic flow data from back to front. These two LSTM models work in parallel, processing the data independently through their respective hidden states and eventually linearly combining the two output results to produce the final training outcome.
At time $t$, the input vector $x_t$ is processed through the forward LSTM model and the backward LSTM model to produce the corresponding vectors $h_{F,t}$ and $h_{B,t}$. These vectors are then concatenated to form the output variable $h_t$ at time $t$. The specific computational formulas for the BiLSTM model are as follows: the forward LSTM is given by Equation (4), and the backward LSTM, which computes from the input at time $t+1$, is given by Equation (5). The vectors $h_{F,t}$ and $h_{B,t}$ produced by the forward and backward layers collectively form the final output vector $h$ of the BiLSTM model, calculated as in Equation (6).
$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i),\quad f_t = \sigma(W_f[h_{t-1}, x_t] + b_f),\quad \tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c),$
$c_t = f_t \times c_{t-1} + i_t \times \tilde{c}_t,\quad o_t = \sigma(W_o[h_{t-1}, x_t] + b_o),\quad h_{F,t} = o_t \times \tanh(c_t) \quad (4)$
$i_t = \sigma(W_i[h_{t+1}, x_t] + b_i),\quad f_t = \sigma(W_f[h_{t+1}, x_t] + b_f),\quad \tilde{c}_t = \tanh(W_c[h_{t+1}, x_t] + b_c),$
$c_t = f_t \times c_{t+1} + i_t \times \tilde{c}_t,\quad o_t = \sigma(W_o[h_{t+1}, x_t] + b_o),\quad h_{B,t} = o_t \times \tanh(c_t) \quad (5)$
$h = [h_{F,t}, h_{B,t}] = W_F h_{F,t} + W_B h_{B,t} + b \quad (6)$
where $\sigma$ represents the sigmoid activation function; $b$ represents a bias term; and $W_F$ and $W_B$ represent the weight matrices for the hidden layers of the forward and backward LSTM models, respectively.
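A minimal NumPy sketch of the BiLSTM forward pass defined by Equations (4)–(6) follows, using the concatenation form of Equation (6) and random placeholder weights (this illustrates the computation, not the trained model of this study):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step (the gate equations of Equations (4)/(5));
    W stacks the input, forget, output, and candidate weight rows."""
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    c_tilde = np.tanh(z[3*H:])
    c = f * c_prev + i * c_tilde          # cell state update
    h = o * np.tanh(c)                     # hidden output
    return h, c

def bilstm(xs, W_f, b_f, W_b, b_b, H):
    """Encode the sequence front-to-back and back-to-front with two
    independent LSTMs, then concatenate h_{F,t} and h_{B,t} per step."""
    h, c = np.zeros(H), np.zeros(H)
    fwd = []
    for x in xs:                           # forward pass
        h, c = lstm_step(x, h, c, W_f, b_f)
        fwd.append(h)
    h, c = np.zeros(H), np.zeros(H)
    bwd = []
    for x in reversed(xs):                 # backward pass
        h, c = lstm_step(x, h, c, W_b, b_b)
        bwd.append(h)
    bwd.reverse()                          # align backward outputs with time
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```

Each output vector thus carries both past and future context for its time step, which is the property the model exploits for traffic series.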

4.2. Traffic Flow Parameter Prediction Model Based on EMD–Attention–BiLSTM Model

In neural network learning, more parameters generally mean stronger expressive ability and greater capacity to store information. When a neural network must process excessive information, the attention mechanism allows it to select the key information, improving the efficiency of the network. In this paper, the original data are decomposed through EMD to enhance data quality before being fed into the model. On top of the BiLSTM model, the attention mechanism is introduced to assign different weights to the output vectors of the BiLSTM hidden layer. On this basis, a short-term traffic flow parameter prediction model based on EMD–Attention–BiLSTM is constructed. The hierarchical structure of the EMD–Attention–BiLSTM model is shown in Figure 5 and mainly consists of the following five parts.
  • Input layer: Traffic parameters aggregated at 5 min intervals, i.e., volume, speed, and road physical structure, are used as input.
  • EMD decomposition: The time series data are decomposed into several IMFs.
  • BiLSTM layer: Uses both forward and backward LSTM models.
  • Attention layer.
  • Output layer.
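For the EMD decomposition step, the sifting idea can be sketched as follows. This is a deliberately simplified illustration that uses linear envelopes via `np.interp`; production implementations (e.g. the PyEMD package) use cubic-spline envelopes and more careful stopping criteria, so the components below are only indicative.

```python
import numpy as np

def envelope(x, idx):
    """Linear envelope through the extrema at positions idx
    (a simplification: true EMD uses cubic splines)."""
    t = np.arange(len(x))
    return np.interp(t, idx, x[idx])

def sift_imf(x, n_sift=10):
    """Extract one intrinsic mode function by repeatedly subtracting the
    mean of the upper and lower envelopes."""
    h = x.copy()
    for _ in range(n_sift):
        maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
        minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
        if len(maxima) < 2 or len(minima) < 2:   # too few extrema: stop sifting
            break
        mean_env = (envelope(h, maxima) + envelope(h, minima)) / 2.0
        h = h - mean_env
    return h

def emd(x, max_imfs=5):
    """Decompose x into IMFs plus a residual; summing them recovers x."""
    imfs, residual = [], x.copy()
    for _ in range(max_imfs):
        imf = sift_imf(residual)
        imfs.append(imf)
        residual = residual - imf
        if np.all(np.abs(residual - residual.mean()) < 1e-10):
            break
    return imfs, residual
```

By construction the IMFs and the residual sum back to the original series, which is why the decomposed components can be recombined into the model's input matrix without losing information.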
Attention mechanisms are widely used to allocate different weights to various parts of the input data, enabling the model to “focus” on the most relevant information when making predictions. The attention mechanism computes the attention weight α_t from the output vector h_t of the BiLSTM model's hidden layer at time t.
$$
u_t = \tanh(W h_t + b) \tag{7}
$$
$$
\alpha_t = \frac{\exp(u_t)}{\sum_{k=1}^{T} \exp(u_k)} \tag{8}
$$
where u t calculated by Equation (7) represents the importance or impact of the output vector h t from the BiLSTM model at time t on the result; W is the weight matrix; b is the bias term; and Equation (8) computes the attention weight α t for time t .
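Equations (7) and (8), together with the weighted sum of hidden states used later in Equation (10), can be sketched in NumPy as follows. This is a simplified single-sequence version; treating W as a weight vector that yields one scalar score per time step is an illustrative assumption, not the paper's exact parameterization.

```python
import numpy as np

def attention_weights(H, W, b):
    """Score each BiLSTM hidden state (Eq. (7)) and normalize the scores
    with softmax (Eq. (8)).  H has shape (T, d); W has shape (d,)."""
    u = np.tanh(H @ W + b)          # importance score u_t for each time step
    e = np.exp(u - u.max())         # shift scores for numerical stability
    alpha = e / e.sum()             # attention weights, summing to 1
    return alpha

def attention_output(H, alpha):
    """Weighted sum of hidden states, i.e. the attention-layer output (Eq. (10))."""
    return alpha @ H
```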

4.3. Prediction Using EMD–BiLSTM–Attention

The steps for short-term traffic state prediction based on the EMD–BiLSTM–Attention model are as follows:
  • Handling Missing Values:
Examine the original dataset for missing and anomalous values. Substitute missing values with the average of the values above and below the missing data point.
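This neighbor-averaging imputation can be sketched as follows (an illustrative helper, not the authors' code; the handling of gaps at the very start or end of the series is an assumption):

```python
import numpy as np

def fill_missing(series):
    """Replace each NaN with the mean of the nearest valid values above
    and below it, as described in the preprocessing step."""
    x = np.asarray(series, dtype=float).copy()
    valid = np.where(~np.isnan(x))[0]
    for i in np.where(np.isnan(x))[0]:
        above = valid[valid < i]
        below = valid[valid > i]
        if len(above) and len(below):
            x[i] = (x[above[-1]] + x[below[0]]) / 2.0   # mean of neighbors
        elif len(above):
            x[i] = x[above[-1]]                         # trailing gap: carry last value
        else:
            x[i] = x[below[0]]                          # leading gap: backfill
    return x
```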
  • Sample Construction:
Based on the discussions above, select traffic volume, average speed, and road physical structure as influencing factors to input into the model.
  • Sample Preprocessing:
To improve the model training outcome, use EMD to decompose the sample data into relatively stationary components. Once decomposed, recombine the components into a matrix to be used as input data. Use the min–max scaler method to normalize the input data, and then split the processed data into training and test sets. The normalized data fall within the [0, 1] range. The specific calculation formula is given as Equation (9).
$$
x_{scale} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{9}
$$
where x_scale represents the data normalized by the min–max scaler; x represents the input data; and x_max and x_min are the maximum and minimum values of the input data, respectively.
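A minimal sketch of the min–max normalization in Equation (9), together with the inverse transform needed to map the model's predictions back to physical units (function names are illustrative; the paper's experiments use the equivalent scikit-learn `MinMaxScaler`):

```python
import numpy as np

def min_max_scale(x):
    """Normalize data into [0, 1] per Equation (9); also return (min, max)
    so predictions can later be mapped back to original units."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def inverse_scale(x_scaled, x_min, x_max):
    """Undo the scaling to recover values in the original units."""
    return x_scaled * (x_max - x_min) + x_min
```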
  • Model Construction:
Integrate the attention mechanism with the BiLSTM model. Begin by inputting the training set into the forward and backward LSTM models and computing the hidden states according to Equations (4) and (5). During training, apply dropout to randomly drop neurons, preventing overfitting, and adjust parameters using the Adam optimizer.
Next, the output matrix of the BiLSTM model enters the attention layer, where the weight coefficient at each moment is determined according to Equations (11) and (12). Denote the output of the attention layer at moment t as S_t, with the calculation formula given as Equation (10).
$$
S_t = \sum_{k=1}^{T} \alpha_k h_k \tag{10}
$$
Lastly, in the fully connected layer, the activation function chosen is the sigmoid function, producing the predicted value.
  • Grid Search for Optimal Parameters:
Different parameter combinations in the neural network can affect the prediction results. Hence, this paper employs the grid search method to adjust parameters such as the number of units in the BiLSTM layer, the number of iterations, and the time window size to find the optimal parameter combination.
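The grid search over BiLSTM units, iterations, and time-window size can be sketched as follows; `evaluate` is a hypothetical user-supplied function that trains the model with the given parameters and returns a validation error such as RMSE, and the candidate values are illustrative.

```python
import itertools

def grid_search(param_grid, evaluate):
    """Exhaustively try every parameter combination and keep the one with
    the lowest validation error."""
    best_params, best_score = None, float("inf")
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)        # e.g. validation RMSE for this setting
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Search space mirroring the parameters tuned in the paper.
grid = {"units": [32, 64, 128], "epochs": [50, 100], "timestep": [5, 10, 15]}
```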
The calculation process of the attention mechanism can be divided into the following three stages:
  • Stage 1, Calculating Similarity: Based on the query and the input key–value pairs, calculate their similarity S i j . The calculation formula is shown in Equation (11).
$$
S_{ij} = F(Q_i, K_j) \tag{11}
$$
    where: Q_i is the query vector of the i-th input data;
    K_j is the key vector of the j-th input data;
    F is the similarity model. Common choices include the dot-product model, S_ij = Q_i · K_j; the cosine-similarity model, S_ij = Q_i · K_j / (|Q_i||K_j|); and an MLP, S_ij = MLP(Q_i, K_j).
  • Stage 2, Normalization: The softmax function is introduced for two primary reasons. First, it transforms the raw scores into a probability distribution whose weights sum to 1. Second, its exponential form accentuates the weights of the important elements. The attention score of the query vector against the key vector is denoted α_ij, with the calculation formula shown in Equation (12).
$$
\alpha_{ij} = \mathrm{softmax}(S_{ij}) \tag{12}
$$
  • Stage 3, Weighted Sum: At this stage, the attention weights are used to form a weighted sum of the value vectors, yielding the attention value of the query vector with respect to the input data. The computation can be represented by Equation (13).
$$
\mathrm{Attention}(Q_i, K, V) = \sum_{j=1}^{N} \alpha_{ij} V_j \tag{13}
$$
    The computational structure flowchart is shown in Figure 6.
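The three stages above map directly onto a few lines of NumPy using the dot-product similarity model; this is an illustrative sketch rather than the paper's implementation.

```python
import numpy as np

def softmax(s):
    """Row-wise softmax with a stability shift."""
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_attention(Q, K, V):
    """Stage 1: similarity scores S_ij (Eq. (11), dot-product model);
    stage 2: softmax normalization (Eq. (12));
    stage 3: weighted sum of value vectors (Eq. (13))."""
    S = Q @ K.T                 # (num_queries, num_keys) similarity matrix
    alpha = softmax(S)          # each row of weights sums to 1
    return alpha @ V, alpha     # attention values and the weights themselves
```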

4.4. Evaluation Criteria

The experimental results will be evaluated using the root mean square error (RMSE) and mean absolute error (MAE) to assess the predictive performance of the model. RMSE and MAE are the most classic regression evaluation metrics; both reflect the discrepancy between the actual and predicted values, so larger RMSE and MAE values indicate greater error. Notably, RMSE is more sensitive to large deviations because errors are squared before averaging, whereas MAE weights all errors equally. The formulae for RMSE and MAE are given in Equations (14) and (15).
$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \tag{14}
$$
$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{15}
$$
where ŷ_i represents the predicted value; y_i represents the actual measured value; and n denotes the number of predicted samples.
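The two metrics in Equations (14) and (15) can be computed as:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Equation (14)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, Equation (15)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_pred - y_true)))
```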

5. Experiment and Results

5.1. Experiment Environment

In this study, the experiments were conducted on an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz under 64-bit Windows 10. The neural network framework was built using the Keras deep learning library in a Python 3.7 and TensorFlow 2.6.0 environment. The model was trained on 5184 data entries from 14 February to 3 March 2022, serving as the training set; data from 3 March to 13 March 2022, comprising 2880 entries, were used as the test set. The optimal parameters for the model are a timestep of 10, 64 units in the hidden layer, a dropout rate of 0.3, and 100 training iterations.
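With a timestep of 10, supervised training samples can be constructed with a sliding window over the 5 min series; the function name below is a hypothetical sketch, not the authors' code.

```python
import numpy as np

def make_windows(series, timestep=10):
    """Build supervised samples: each input is `timestep` consecutive
    observations and the target is the observation that follows
    (timestep=10 being the optimum found by grid search)."""
    X, y = [], []
    for i in range(len(series) - timestep):
        X.append(series[i:i + timestep])
        y.append(series[i + timestep])
    return np.array(X), np.array(y)
```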

5.2. Performance Comparison among Different Prediction Models

The proposed model framework is compared with the ARIMA model, the LSTM neural network, and the BiLSTM model to validate the advantages of the EMD–Attention–BiLSTM model for short-term traffic volume prediction. The comparative results of the different models are illustrated in Figure 7, Figure 8, Figure 9 and Figure 10.
In the same experimental environment, the ARIMA, LSTM, BiLSTM, and EMD–BiLSTM–Attention models were trained and tested on traffic volume and speed. The evaluation metrics RMSE and MAE were computed. The results are presented in Table 1.

5.3. EMD–Attention–BiLSTM Model Prediction Result Analysis

Figure 11 and Figure 12 depict the changes in evaluation metrics for the four prediction models. From these figures, it can be observed that the EMD–Attention–BiLSTM model outperforms the traditional ARIMA model, LSTM model, and BiLSTM model in terms of both RMSE and MAE metrics. From a volume prediction perspective, the EMD–Attention–BiLSTM model shows significant improvements over the ARIMA model, LSTM model, and BiLSTM model. The RMSE metric decreases by 31.4%, 29.2%, and 10.1%, respectively, compared to the ARIMA, LSTM, and BiLSTM models. The MAE metric also decreases by 21.1%, 22.3%, and 9.7%, respectively. From a speed prediction perspective, the EMD–Attention–BiLSTM model performs better than the ARIMA model, LSTM model, and BiLSTM model. The RMSE metric decreases by 28.6%, 6.6%, and 0.9%, respectively, compared to the ARIMA, LSTM, and BiLSTM models. The MAE metric also decreases by 39.2%, 21.6%, and 19.5%, respectively.
The traditional ARIMA model relies on moving-average and autoregressive principles; its predictions are closely tied to historical data and tend toward the historical average. However, traffic data exhibit significant fluctuation and randomness, making neural network models more suitable. The proposed model achieves better prediction results because the BiLSTM model improves on the LSTM model and EMD decomposition helps denoise the data. The BiLSTM model captures both forward and backward data features, since traffic volume is related not only to past spatiotemporal characteristics but is also influenced by drivers' future intentions. The BiLSTM model can therefore better uncover the hidden characteristics of traffic volume, and the introduction of the attention mechanism further enhances the ability to capture long-range dependencies, optimizing the prediction results.

6. Conclusions

This paper primarily focuses on short-term prediction of traffic volume and speed. It begins by analyzing the historical development of traffic volume parameter prediction, from traditional methods to incorporating spatiotemporal features. The paper reviews the theoretical foundations and methods in the field of traffic volume parameter prediction, providing a comprehensive overview of research in this area. Next, the paper introduces the empirical mode decomposition (EMD) method for decomposing time series data. It applies EMD to denoise 5184 data points. Furthermore, the paper proposes an EMD-based model that integrates bidirectional long short-term memory (BiLSTM) with the attention mechanism. Finally, the EMD–BiLSTM–Attention model is employed for short-term prediction of traffic volume and speed. The model’s performance is compared with traditional models such as ARIMA, LSTM, and BiLSTM. The results demonstrate that the EMD–BiLSTM–Attention model exhibits superior short-term prediction performance. It is particularly well suited to modeling the operational characteristics of highway traffic, allowing for more accurate predictions of both traffic volume and speed. This model can be valuable for sustainable traffic planning and management.
In future research, it may be worthwhile to explore self-attention models in the field of sustainable traffic planning, management, and prediction, and to compare their performance with models employing the attention mechanism. Moreover, the dataset selection should be expanded to include a broader range of data sources, encompassing not only additional ETC gantry data but also other relevant datasets that enable a more comprehensive analysis of traffic patterns and behaviors.

Author Contributions

Conceptualization, Y.R. and Y.G.; methodology, Y.G.; software, Y.Z.; validation, Y.R., Y.G. and W.L.; formal analysis, K.L.; investigation, Y.Z.; resources, Y.R.; data curation, Y.G.; writing—original draft preparation, Y.G.; writing—review and editing, Y.R.; visualization, Y.Z.; supervision, K.L.; project administration, Y.R.; funding acquisition, Y.R. All authors have read and agreed to the published version of the manuscript.

Funding

Key R&D Program of Shandong Province, China. No. 2020CXGC010118.

Data Availability Statement

The data used in this research cannot be disclosed for reasons of privacy of the individual vehicle users.

Acknowledgments

Thanks to Shandong Expressway Group for their strong support of the study.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Empirical mode decomposition results. (a) Description of the intrinsic mode components of volume; (b) description of the intrinsic mode components of speed.
Figure 2. The distribution of normalized traffic flow.
Figure 3. The normalized vehicle speed distribution.
Figure 4. BiLSTM structure.
Figure 5. EMD–Attention–BiLSTM architecture.
Figure 6. Attention mechanism architecture.
Figure 7. ARIMA model prediction results compared with raw data. (a) Predicted results on traffic volume; (b) predicted results of travel speed.
Figure 8. LSTM model prediction results compared with raw data. (a) Predicted results on traffic volume; (b) predicted results of travel speed.
Figure 9. BiLSTM model prediction results compared with raw data. (a) Predicted results on traffic volume; (b) predicted results of travel speed.
Figure 10. EMD–BiLSTM–Attention model prediction results compared with raw data. (a) Predicted results on traffic volume; (b) predicted results of travel speed.
Figure 11. Comparison of root mean square error (RMSE) for volume and speed under four prediction methods.
Figure 12. Comparison of mean absolute error (MAE) for volume and speed under four prediction methods.
Table 1. The comparison results for different models. An asterisk (*) is used to highlight the smallest RMSE and MAE values obtained among the compared models.
Model                   | Volume RMSE | Volume MAE | Speed RMSE | Speed MAE
ARIMA                   | 8.79851     | 6.34704    | 6.97533    | 5.47037
LSTM                    | 8.53072     | 6.44815    | 5.33418    | 4.23885
BiLSTM                  | 6.74355     | 5.55348    | 5.02398    | 4.20131
EMD–BiLSTM–Attention    | * 6.03410   | * 5.01282  | * 4.98031  | * 3.32175
