Article

Short-Term Traffic Flow Prediction Considering Weather Factors Based on Optimized Deep Learning Neural Networks: Bo-GRA-CNN-BiLSTM

Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
*
Author to whom correspondence should be addressed.
Sustainability 2025, 17(6), 2576; https://doi.org/10.3390/su17062576
Submission received: 18 February 2025 / Revised: 11 March 2025 / Accepted: 12 March 2025 / Published: 14 March 2025
(This article belongs to the Section Sustainable Transportation)

Abstract:
Accurately predicting road traffic flows is a primary challenge in the development of smart cities, providing a scientific basis and reference for urban planning, construction, and traffic management. Road traffic flow is influenced by various complex features, including temporal and weather conditions, which introduce challenges to traffic flow prediction. To enhance the accuracy of traffic flow prediction and improve the adaptability across different weather conditions, this study introduced a traffic flow prediction model with explicit consideration of weather factors including temperature, rainfall, air quality index, and wind speed. The proposed model utilized grey relational analysis (GRA) to transform weather data into weighted traffic flow data, expanded input variables into a new data matrix, and employed one-dimensional convolutional neural networks (CNNs) to extract valuable feature information from these input variables, as well as bidirectional long short-term memory (BiLSTM) to capture temporal dependencies within the time-series data. Bayesian optimization was employed to fine-tune the hyperparameters of the model, offering advantages such as fewer iterations, high efficiency, and fast speed. The performance of the proposed prediction model was validated using the traffic flow data collected at an intersection in China and on the M25 motorway in the United Kingdom. The results demonstrated the effectiveness of the proposed model, achieving improvements of at least 9.0% in MAE, 2.8% in RMSE, 2.3% in MAPE, and 0.06% in R2 compared to five baseline models.

1. Introduction

Intelligent transportation systems (ITSs) greatly enhance the connectivity between vehicles, roads, and users. Short-term traffic flow prediction is a key component of ITSs [1,2,3]. More accurate short-term traffic flow predictions help travelers obtain reasonable route guidance, saving cost and time, and help traffic managers improve route planning and reduce traffic congestion and accidents [4,5]. Owing to problems such as unreasonable road flow management, air pollution, extended travel times, and citizen dissatisfaction, predicting urban street traffic flow and speed has become a popular research topic [6].
Traffic flow prediction is typically categorized into two distinct scenarios: road-section prediction and regional traffic flow prediction. This study focused on the first scenario, which supports traffic management at specific locations. The time interval for short-term traffic flow prediction is generally 5–15 min, and the prediction is performed for a specific road [7]. Because of the short time interval, high data complexity, and the random and uncertain characteristics involved in short-term traffic flow prediction, the nonlinear modeling capability of neural networks has received special attention and has been widely used in this field [8]. Fully accounting for the complexity of the factors affecting traffic flow while improving the prediction accuracy and speed of road traffic flow has remained a major challenge [9].
Most researchers have focused on using historical traffic flows on roads to forecast future flows [10]. However, road traffic flow is also influenced by factors such as time and weather. Several researchers have therefore treated weather elements as crucial features in traffic flow prediction. Experiments have shown [11] that weather factors such as wind speed, temperature, fog, and visibility affect travel volume on highways. Similarly, in urban road studies, factors such as temperature, visibility, and heavy fog influence travel volume on roads. Although weather factors have been used as features for predicting traffic flow in previous studies, the interdependent trends between traffic flow and weather factors as a holistic system, as well as the interactions between them, have been overlooked.
Regarding prediction model architectures, long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) networks are commonly used for time-series forecasting. However, on their own they can only capture the features inherent in the dataset and excel mainly at modeling temporal correlations. To enhance the performance of LSTM and BiLSTM networks, some studies have proposed using convolutional neural networks (CNNs) to extract valuable features from the input data matrix and then feeding those features into LSTM or BiLSTM networks for prediction. Many results indicate that CNN-LSTM and CNN-BiLSTM outperform single-network prediction models under typical conditions [12].
In this study, we propose a new hybrid model, Bo-GRA-CNN-BiLSTM, for road traffic flow prediction. Building on the existing CNN-BiLSTM model, grey relational analysis (GRA) is used to convert weather data into the corresponding weighted traffic flow data, thereby enhancing the alignment between different types of data and improving their correlation as components of a complex system. Determining the optimal hyperparameters for a network model is crucial: small variations in hyperparameters such as the learning rate and the number of epochs can significantly affect the prediction results. To address this challenge, we employed the Bayesian optimization algorithm, known for its low number of iterations and high efficiency, to determine the optimal hyperparameters. To validate the predictive performance of the proposed model, we compared it with five baseline models.
The main contribution of this study is the proposal of a traffic flow prediction model based on the fusion of CNN and BiLSTM neural networks, incorporating Bayesian optimization and GRA. The Bayesian optimization algorithm was employed to search for hyperparameters, while GRA was used to convert weather data into corresponding weighted traffic flow data. We focused on three widely accepted weather indicators: temperature, wind speed, and precipitation. We also used the air quality index (AQI) for the Changzhou dataset.
The remainder of this paper is organized as follows: Section 2 provides a review of the relevant research on network traffic prediction. In Section 3, we introduce the datasets used and outline the problems to be addressed. Section 4 describes the methods used in the model. Section 5 presents the experimental results for the datasets and comparisons with baseline models. Finally, Section 6 concludes the main findings of this study and offers insights into future prospects.

2. Literature Review

Traffic flow prediction has been studied extensively both in China and internationally. Over the years, many machine learning algorithms have been developed to address this problem. Based on recent trends, these approaches can be classified into traditional machine learning models and deep learning models.
Among machine learning models, Lihua et al. [13] employed the autoregressive integrated moving average (ARIMA) algorithm combined with wavelet analysis for traffic volume prediction. Ryu et al. [14] used the k-nearest neighbor (KNN) algorithm to estimate the spatiotemporal correlations of road segments for traffic flow prediction, achieving promising results. Zhang et al. [15] applied support vector regression (SVR) together with emerging computational intelligence techniques under a state-space approach to traffic forecasting. Dai et al. [16] employed a MapReduce-based multivariate linear regression model for traffic flow prediction, achieving accurate results in large-scale multidimensional time-series traffic flow prediction.
With the development of technology in recent years, deep learning models such as long short-term memory (LSTM) have been applied to traffic flow predictions. Kang et al. [17] analyzed the impacts of different inputs on LSTM-based traffic flow predictions. Chen et al. [18] proposed an LSTM model based on deep reinforcement learning (DRL) for traffic flow prediction. In addition, they utilized a fuzzy comprehensive evaluation (FCE) model to characterize traffic conditions. Poonia and Jain [19] utilized LSTM for momentary traffic stream forecasting. Lu et al. [20] proposed an LSTM traffic flow prediction method utilizing ARIMA for capturing linear regression features.
As emerging deep learning approaches, graph neural networks (GNNs), CNNs, BiLSTMs, and their variations have been used for traffic flow prediction [21,22,23]. Traffic networks can be represented as graphs, where nodes represent traffic nodes (such as intersections or junctions) and edges represent connections between nodes (such as roads or pathways) [24]. Therefore, GNNs and graph convolutional networks (GCNs) can more effectively capture the topological structure and connectivity of traffic networks. Regional traffic flow prediction involves spatial dependencies between traffic nodes, such as the propagation and influence of traffic flows between different road segments. GNNs and GCNs can leverage adjacency relationships between nodes to learn spatial dependencies and incorporate these relationships into a prediction model [25]. In summary, GNNs and GCNs exhibit superior performance in capturing the topological structures, spatial dependencies, and temporal dependencies inherent in traffic networks. However, for the road-section traffic flow prediction, which is the object of this study, the node connections in the graph are relatively sparse, and the adjacency relationships between nodes may not be sufficiently dense, which can hinder GNNs from capturing effective information. Hence, they are more optimally utilized for regional traffic flow prediction tasks than for predicting the traffic flow on individual road sections.
Combining CNNs with time-series models has proven effective for road-section traffic flow forecasting. Wang et al. [26] addressed the complexity and long-term dependencies of urban road traffic by proposing a short-term traffic flow prediction model based on an attention mechanism and a 1DCNN-LSTM network. However, that study considered only a single weather factor, temperature, and did not account for additional factors such as precipitation. Zhuang and Cao [27] proposed a multistep traffic flow prediction model combining a CNN and BiLSTM, achieving accurate short-term traffic volume predictions on highways. However, this model is limited to the traffic volume time series and does not consider factors such as weather, which can impact traffic flow. Li et al. [28] proposed a model combining BiLSTM with attention mechanism units; it not only handles forward and backward dependencies in time-series data but also integrates the attention mechanism to improve the representation of key information. Méndez et al. [29] demonstrated the relevance of extracting features from a time-series data matrix using a CNN layer. They also found that BiLSTM outperformed LSTM in time-series tasks and that CNN-BiLSTM performed better than CNN-LSTM. However, in a subsequent study, the authors introduced new predictive variables but did not provide a detailed analysis of the different hyperparameters of the model. Bharti et al. [30] proposed a short-term traffic flow prediction model combining particle swarm optimization (PSO) and bidirectional long short-term memory (BiLSTM) neural networks, referred to as the PSO-BiLSTM model. Although an analysis of the model's hyperparameters was conducted, multiple influencing factors were not considered.
In summary, previous forecasting models overlooked the interactive development trends between traffic flow and weather factors as an integrated system as well as the inter-relationships between traffic flow and weather data. To enhance the performance of traffic flow prediction models and strengthen the fit between different types of data, the network structure was modified by considering several widely accepted weather indicators that affect traffic flow as input variables for the model. A Bayesian optimization method was employed to search for the optimal hyperparameters, and the hyperparameters of the model were analyzed. To the best of our knowledge, the Bo-GRA-CNN-BiLSTM hybrid model has not been previously applied in the context of time-series or traffic flow prediction.

3. Data Source

Two actual road datasets were used to establish and evaluate the proposed model. The first dataset consisted of traffic flow data collected by road detectors at a main road intersection in Changzhou, China, in July 2023. The weather data for this dataset were obtained from the Changzhou Meteorological Observatory. The second dataset was sourced from the UK Highways Agency. Specifically, these traffic flow data were collected in September 2019 by detectors at the MIDAS site M25/4883A on the M25 motorway in the United Kingdom. The weather data for this dataset were obtained from public records collected at Heathrow Airport, UK. Both datasets included traffic flow data collected every 15 min using road detectors. The first dataset comprised 2976 traffic flow data points along with corresponding weather factors, including temperature, wind speed, rainfall, and AQI. The second dataset contained 2880 traffic flow data points along with corresponding weather factors, including temperature, wind speed, and rainfall. The subsequent steps involved splitting each dataset into training, validation, and test sets at an 8:1:1 ratio.
Table 1 lists a subset of the weather factor datasets used in this study. The temperature data were calculated by averaging the highest and lowest temperatures within a day. This choice was made considering that the time periods with the highest travel demand generally occur in the morning and evening, whereas the impact of the highest temperature around noon on travel behavior is relatively small. The rainfall amount refers to the total rainfall of each day, which can, to a certain extent, reflect the degree of rainfall and the road surface conditions on that day. AQI stands for air quality index. It is a comprehensive indicator for measuring the degree of air pollution. In its evaluation system, it includes fine particulate matter (PM2.5), inhalable particulate matter (PM10), etc. To a certain extent, it can determine people’s means of travel. Dataset 2 lacks an AQI compared to Dataset 1, and the details are not repeated here. Figure 1a,b show traffic flow data for the two observed roads.

4. Methodology

Figure 2 shows the overall workflow of the prediction model. Owing to the diverse measurement units of the input variables, we initially nondimensionalized the data collected in the preceding section. Using the GRA method, we computed the correlation coefficients between weather factors and road traffic flow and subsequently determined their relative weights with respect to traffic volume (Step 1). Prior to model construction, initializing the parameters for the Bayesian optimization algorithm was crucial because it enabled the subsequent optimization of the model hyperparameters (Step 2). The construction of the predictive model involved the integration of the CNN, BiLSTM, and fully connected layers (Step 3).
The input variables consisted of traffic flow N and K weather factors sampled at 15 min intervals per road segment. We utilized data from the preceding eight time steps to forecast the traffic flow for the subsequent time step. By leveraging a month’s worth of observational data, our input ensemble was represented as an N × (K + 1) data matrix, denoted as R(N × (K + 1)). Subsequently, normalization was performed before the data matrix was input into the model. The MinMaxScaler() function in Python facilitated this normalization process. Finally, the processed data matrix was used as the input, yielding the final predictive outcomes.
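To make this preprocessing concrete, the following Python sketch scales the N × (K + 1) matrix with MinMaxScaler(), builds sliding windows of eight time steps, and splits the samples 8:1:1 in chronological order. The file name and column names are hypothetical placeholders; only the scaler, window length, and split ratio come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file and column names; flow plus K weather factors per row.
df = pd.read_csv("changzhou_traffic_weather.csv")
cols = ["flow", "temperature", "rainfall", "aqi", "wind_speed"]   # N x (K + 1) matrix

scaler = MinMaxScaler()                       # scales every variable to [0, 1]
matrix = scaler.fit_transform(df[cols].values)

def make_windows(data, lag=8):
    """Use the preceding `lag` time steps to predict the next traffic flow value."""
    X, y = [], []
    for t in range(lag, len(data)):
        X.append(data[t - lag:t, :])          # window of shape (lag, K + 1)
        y.append(data[t, 0])                  # next-step flow (first column)
    return np.array(X), np.array(y)

X, y = make_windows(matrix)

# Chronological 8:1:1 split into training, validation, and test sets.
n = len(X)
i1, i2 = int(0.8 * n), int(0.9 * n)
X_train, y_train = X[:i1], y[:i1]
X_val, y_val = X[i1:i2], y[i1:i2]
X_test, y_test = X[i2:], y[i2:]
```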

4.1. Step 1: Grey Relational Analysis

The grey system theory introduces the concept of grey relational analysis for various subsystems to seek numerical relationships among these subsystems or factors through specific methods. Grey relational analysis aims to quantify the degree of association between two factors in system development [31]. If two factors exhibit a synchronized trend in their variations, indicating a high degree of simultaneous change, they are considered to have a strong correlation, whereas if their trends are dissimilar, they are considered to have a weak correlation [32]. Therefore, grey relational analysis provides a quantitative measure of developmental trends within a system, making it particularly suitable for analyzing dynamic processes.
Grey relational analysis can be used to perform weight calculations after assessing the correlation between various factors, resulting in a weight proportion that indicates the influence of different factors relative to a specific factor.
If we represent a traffic flow sequence as shown in Equation (1), where the dependent variable is denoted as Y, it can be expressed as follows:
$$Y = \left( y_1, y_2, \ldots, y_n \right)^{T}$$
The factors that influence traffic flow constitute a data sequence represented by the independent variable X, which is denoted as
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
where n represents the time points and m represents the number of factors.
Because the different factors have varying units and data ranges, they were first normalized to bring them within a consistent range. Subsequently, the relational coefficients between the influencing factors and traffic flow were calculated. Let $x_0$ denote the normalized reference sequence (traffic flow) and $x_i$ the i-th normalized influencing factor, and let a and b denote the two-level minimum and maximum absolute differences, respectively:
$$a = \min_{i} \min_{k} \left| x_0(k) - x_i(k) \right|$$
$$b = \max_{i} \max_{k} \left| x_0(k) - x_i(k) \right|$$
To construct z, we used
$$z_{k,j} = \varepsilon_j(k) = \frac{a + \rho b}{\left| x_0(k) - x_j(k) \right| + \rho b}$$
where ρ is the resolution coefficient, typically taking a value of 0.5.
To calculate the degree of association, we used
$$r_j = \frac{1}{n} \sum_{k=1}^{n} \varepsilon_j(k) = \frac{1}{n} \sum_{k=1}^{n} z_{k,j}$$
where $r_j$ represents the grey relational degree of the j-th factor.
Next, the weights of each indicator relative to the traffic flow were calculated using Equation (7).
$$w_j = \frac{r_j}{\sum_{k=1}^{m} r_k}$$
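The following is a minimal Python sketch of Equations (3)–(7), assuming the normalized traffic flow is the reference sequence and the normalized weather factors are the comparison sequences; the min–max normalization and the function name are illustrative choices rather than details taken from the paper.

```python
import numpy as np

def gra_weights(flow, factors, rho=0.5):
    """Grey relational analysis following Equations (3)-(7).

    flow    : array of shape (n,), the reference sequence (traffic flow)
    factors : array of shape (n, m), the weather factor sequences
    Returns the grey relational degrees r_j and weights w_j.
    """
    y = np.asarray(flow, dtype=float)
    F = np.asarray(factors, dtype=float)
    norm = lambda v: (v - v.min()) / (v.max() - v.min())   # bring each sequence to [0, 1]
    y = norm(y)
    X = np.column_stack([norm(F[:, j]) for j in range(F.shape[1])])

    delta = np.abs(X - y[:, None])            # |x_0(k) - x_j(k)| for every k, j
    a, b = delta.min(), delta.max()           # two-level minimum and maximum, Eqs. (3)-(4)

    coeff = (a + rho * b) / (delta + rho * b) # relational coefficients, Eq. (5)
    r = coeff.mean(axis=0)                    # relational degrees, Eq. (6)
    w = r / r.sum()                           # weights relative to traffic flow, Eq. (7)
    return r, w

# The weighted input columns of Tables 4 and 5 are then w_j * flow for each factor j
# (e.g., 222 * 0.2456 ≈ 54.52 for the temperature column of Dataset 1).
```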

4.2. Step 2: Bayesian Optimization Algorithm

In traditional machine learning, the model parameters are often determined through extensive experimentation. With the continuous advancements in technology, methods for automatically searching for model hyperparameters have emerged. Bayesian optimization is one such method of hyperparameter optimization [33]. Bayesian optimization does not impose specific requirements on the objective function and can optimize the hyperparameters with a limited number of sampling points. It is widely used to optimize deep learning models.
In the context of the GRA-CNN-BiLSTM model proposed in this paper, we considered three hyperparameters, namely epoch, batch, and learning rate, as optimization targets for Bayesian optimization.
The Bayesian optimization algorithm assigns a range of values to each hyperparameter in the combination $C = \{c_1, c_2, \ldots, c_n\}$, where $x_n$ denotes the value range of the n-th hyperparameter. These hyperparameters are then used as inputs to the deep learning model function f for hyperparameter optimization. The optimal hyperparameter combination is determined as
$$c^{*} = \underset{c \in C}{\operatorname{argmin}} \; f(x)$$
where c represents the combination of hyperparameters for the model, f(x) represents the function representing the optimization objective for the GRA-CNN-BiLSTM model, and c* represents the optimal set of hyperparameters. Figure 3 shows the algorithm flow.
The workflow of the Bayesian optimization for GRA-CNN-BiLSTM is as follows:
(1)
Initialize the process by randomly sampling points within the specified bounds.
(2)
Track the best point and its corresponding value.
(3)
For a set number of iterations, fit a Gaussian process model to the sampled points and use it, via an acquisition function, to select the next sampling point. A minimal code sketch of this loop is given below.
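The sketch below uses the bayes_opt package; the paper does not name its implementation, so the library choice, the search ranges, and the build_and_train() helper are assumptions, while the 6 initial points, 10 iterations, and random state of 5 follow Table 6.

```python
from bayes_opt import BayesianOptimization

def objective(epochs, batch_size, learning_rate):
    """Train the GRA-CNN-BiLSTM with the given hyperparameters and score it.

    build_and_train() is a hypothetical helper that fits the model on the
    training set and returns the validation RMSE; bayes_opt maximizes the
    objective, so the negative error is returned.
    """
    rmse = build_and_train(int(round(epochs)), int(round(batch_size)),
                           learning_rate, X_train, y_train, X_val, y_val)
    return -rmse

# Assumed search ranges; only the initialization values follow Table 6.
pbounds = {"epochs": (50, 300), "batch_size": (32, 256), "learning_rate": (1e-4, 1e-2)}

optimizer = BayesianOptimization(f=objective, pbounds=pbounds, random_state=5)
optimizer.maximize(init_points=6, n_iter=10)   # GP surrogate + acquisition function
print(optimizer.max)                           # best hyperparameter combination found
```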

4.3. Step 3: Hybrid Model Integrating CNN and BiLSTM

4.3.1. Convolutional Neural Network

The LeNet model is the original model of convolutional neural networks (CNNs). Its inception was inspired by the study of the cat’s visual system in neuroscience, and it introduced a feature known as the “receptive field” to construct convolutional network layers [34]. Currently, CNN models demonstrate excellent performance in various scientific domains. Their primary function is feature extraction from the data. Traditional CNNs typically comprise convolutional, activation, pooling, and fully connected layers.
In terms of architecture, there is no significant difference between the 1DCNN and CNN. Both encompass a series of convolutional layers, pooling layers, and activation functions with a final fully connected layer to produce the output. The main distinction lies in the application: CNN models are typically employed for image recognition, whereas one-dimensional CNN (1DCNN) networks are widely used for feature extraction and recognition within data matrices. Despite operating in only one dimension, a 1DCNN retains the advantages of a CNN in terms of translational invariance. Owing to its single-dimensional nature, a 1DCNN does not introduce a substantial computational load or parameter count in convolution kernel calculations. Consequently, larger kernel widths can be utilized to comprehensively extract features [35].
However, in the context of time-series traffic flow prediction, the performance of CNNs appears to be mediocre. This is primarily because CNNs struggle to capture contextual information, implying the relationships between the preceding and subsequent time steps. Therefore, the integration of CNNs with memory networks is crucial in time-series traffic flow prediction scenarios.
Figure 4 shows the structure of the 1DCNN. Here, the convolution kernel ω operates on the input data $x_t \in \mathbb{R}^{s \times f}$ at time step t to extract a feature matrix $D_t = \{D_{t,1}, D_{t,2}, \ldots, D_{t,s-1}\} \in \mathbb{R}^{\tau \times d}$. In this representation, s is the length of the time step, f is the feature dimension, τ is the length of the output features, and d is the dimension of the output features, which is determined by the filter settings.
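As a minimal Keras illustration of this mapping, assuming a window of s = 8 time steps and f = 5 features (Dataset 1) and a kernel width of 2, which is not specified in the paper:

```python
import tensorflow as tf

s, f = 8, 5     # window length and feature dimension (Dataset 1: flow + 4 weather factors)
x = tf.keras.Input(shape=(s, f))

# One 1D convolution: 64 filters (the first entry in Table 7) of assumed width 2;
# "same" padding keeps the time length, so the output is a feature matrix of shape (8, 64).
d = tf.keras.layers.Conv1D(filters=64, kernel_size=2, padding="same", activation="relu")(x)
print(d.shape)   # (None, 8, 64)
```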

4.3.2. Bidirectional Long Short-Term Memory Neural Network

The long short-term memory (LSTM) neural network was introduced by Hochreiter and Schmidhuber in 1997 to address the "vanishing gradient" problem of traditional RNNs [36]. LSTM networks can retain information from many time steps in the past, enabling them to learn long-term dependencies. The LSTM model is represented by Equations (9)–(14), where $x_t$ is the input at time t; $h_t$ denotes the hidden state at time t; $f_t$, $i_t$, and $o_t$ correspond to the forget, input, and output gates, respectively; and $c_t$ represents the memory cell. W, U, and b represent the weights and biases, whereas σ and tanh denote the sigmoid and hyperbolic tangent activation functions, respectively.
$$f_t = \sigma \left( W_f x_t + U_f h_{t-1} + b_f \right)$$
$$i_t = \sigma \left( W_i x_t + U_i h_{t-1} + b_i \right)$$
$$\tilde{c}_t = \tanh \left( W_c x_t + U_c h_{t-1} \right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh \left( c_t \right)$$
$$o_t = \sigma \left( W_o x_t + U_o h_{t-1} + b_o \right)$$
BiLSTM is a variant of the LSTM. Because an RNN processes time steps in the order of data, reversing or disrupting the order of time steps can affect the extraction of features from the data [37]. Additionally, because traffic flow exhibits periodic characteristics with morning and evening peaks and the traffic flow data for adjacent days do not show significant fluctuations, this method is particularly suitable for traffic flow prediction. Figure 5 illustrates the internal structure of the BiLSTM. The output of the BiLSTM model is
$$h_t = \omega_y \cdot \left[ h_i^{f}, h_i^{b} \right] + b_y$$
where $h_i^{f}$ and $h_i^{b}$ are the output vectors of the forward and backward LSTM networks and $\omega_y$ and $b_y$ are the weight and bias, respectively.

4.3.3. Combination Model of CNN + BiLSTM

In summary, the future traffic flow at a given moment is closely influenced not only by past traffic conditions but also by weather factors. We therefore leveraged the advantages of CNNs in handling spatial features and of BiLSTMs in handling temporal sequence features to construct a combined prediction model, CNN-BiLSTM, to enhance the prediction accuracy of road traffic flow. In the model construction phase, we employed a four-layer one-dimensional CNN with the rectified linear unit (ReLU) as the activation function and "same" padding, a Flatten layer, three BiLSTM layers with ReLU activation, a dropout rate of 0.5 (with one dropout layer placed between every two BiLSTM layers), two fully connected layers with ReLU activation, and a regression layer.
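A Keras-style sketch of this architecture is given below. The filter and unit counts follow Table 7, but the kernel width, the placement of the dropout layers, and the ordering of the dense layers relative to the Flatten step are interpretations; the sketch keeps the time dimension through the BiLSTM stack so that it runs as written, and the learning rate, epochs, and batch size in the usage comment are the Bayesian-optimized values from Table 7.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bilstm(s=8, f=5, learning_rate=0.0006121):
    """Sketch of the CNN-BiLSTM predictor; layer sizes follow Table 7,
    other details are interpretations of Section 4.3.3."""
    model = models.Sequential([
        layers.Input(shape=(s, f)),
        # Four one-dimensional convolutional layers with "same" padding and ReLU.
        layers.Conv1D(64, 2, padding="same", activation="relu"),
        layers.Conv1D(128, 2, padding="same", activation="relu"),
        layers.Conv1D(128, 2, padding="same", activation="relu"),
        layers.Conv1D(64, 2, padding="same", activation="relu"),
        # Three BiLSTM layers (128, 256, 64 units) with dropout of 0.5 between them.
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(256, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(64)),
        # Two fully connected layers and a single-neuron regression output.
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

model = build_cnn_bilstm()
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=186, batch_size=118)
```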

5. Results and Discussion

5.1. Measures of Performance

To gauge the efficacy of the traffic flow prediction model, an array of performance metrics, including the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2), were employed for evaluation as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 }$$
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
$$R^2 = \frac{\sum_{i=1}^{N} \left( \hat{y}_i - \bar{y} \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the observed values, and N is the number of samples.
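A short helper computing the four metrics as defined above might look as follows (a sketch; Equation (19) is reproduced in its explained-variance form, as written):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, MAPE (%), and R2 as defined in Equations (16)-(19)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    # Explained-variance form of R2, per Equation (19).
    r2 = np.sum((y_pred - y_true.mean()) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, mape, r2
```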

5.2. GRA Calculates Weights

In this section, the weights of various weather indicators relative to traffic flow are analyzed using GRA. Table 2 and Table 3 list the weights of the weather factors for Datasets 1 and 2, respectively. Table 4 and Table 5 list the partial traffic flows based on the calculation of the weather factor weights.

5.3. Bayes Parameter Initialization

Table 6 lists the initial hyperparameter settings for the Bayesian optimization algorithm.

5.4. Construction Model Hyperparameters

A Bayesian optimization algorithm and multiple experiments were employed to set the detailed hyperparameters of the prediction model, as listed in Table 7.

5.5. Baseline Models

In this section, we briefly describe the baseline models whose performances are compared with that of the proposed model and specify the hyperparameter values used for each of them.

5.5.1. Backpropagation Neural Network

The BP neural network, a multi-layer feedforward neural network, operates by iteratively adjusting the weights of each layer network based on error backpropagation. Training ceases once the specified number of training iterations or desired training accuracy is achieved. Considering reference [38], we set the number of hidden neurons in the BP neural network to 1200, with 12 hidden layers, each containing 100 neurons.

5.5.2. Convolutional Neural Network

Following the configuration in [29], this study used a 9 × 4 matrix as the input to obtain the corresponding target variable as the output. We employed a 2D convolutional layer with 256 filters, kernel size of 2 × 2, and rectified linear unit activation function. Subsequently, a max-pooling layer with a pool size of 2 × 2 pixels was added. Next, a flattened layer was introduced, and this was followed by a dense layer with one neuron and a linear activation function.

5.5.3. LSTM Network

In accordance with the configuration in [39], we set the initial learning rate to 0.005, the minimum batch size to 128, and the maximum number of epochs to 300. The adaptive moment estimation (Adam) optimizer was selected as the training optimizer. A piecewise learning rate schedule was used, with a drop period of 125 epochs and a drop factor of 0.2. Additionally, the model comprised four LSTM layers, each with 300 hidden units.

5.5.4. BiLSTM Network

According to the configuration in [30], we set the number of neurons to 150, number of hidden layers to 1, maximum epochs to 350, and learning rate drop factor to 0.2.

5.5.5. CNN-BiLSTM Network

Based on the configuration in [27], this study used a CNN layer and two BiLSTM layers to construct a CNN-BiLSTM prediction model. Specifically, the Conv1D filters were set to 64, the Conv1D kernel size was set to 3, the Conv1D activation function was set to ReLU, and the dropout rate was set to 0.2. The BiLSTM units were set to 64 and 32, with ReLU activation. The first BiLSTM layer had its return-sequence option set to true, the second BiLSTM layer had it set to false, and, finally, a fully connected layer was added. A key distinction from our proposed model is that this baseline uses only historical traffic flow data as input.

5.6. Results Comparison

To analyze the performance of the model in predicting short-term traffic flow, the constructed new data matrix was divided into training, validation, and test sets at an 8:1:1 ratio. Subsequently, using Python 3.9 and TensorFlow 2.9.0 in PyCharm 2024.3, a deep-learning framework was developed, and the Bo-GRA-CNN-BiLSTM model was applied for short-term traffic flow prediction. The model was run five times to ensure the effectiveness of the experiments, and the average of the prediction results was obtained. It was then compared with several traditional neural network models, including BP, CNN, LSTM, BiLSTM, and CNN-BiLSTM. Various evaluation metrics were calculated to evaluate the prediction accuracy of the model, as listed in Table 8.
The results of the five baseline models on the test sets of Datasets 1 and 2 are illustrated in Figure 6a,b, respectively, depicting a comparison with the actual traffic flow data. A comparison between the predicted results of the Bo-GRA-CNN-BiLSTM network and the actual traffic flow data on the test sets for Datasets 1 and 2 is illustrated in Figure 7a,b.
According to Equations (16)–(19), smaller values of RMSE, MAE, and MAPE, as well as higher values of R2, indicate better model performance. The experimental results demonstrate that the Bo-GRA-CNN-BiLSTM model performs best overall in terms of RMSE, MAE, MAPE, and R2 on the two datasets. In terms of the RMSE, MAE, and MAPE evaluation metrics, the Bo-GRA-CNN-BiLSTM model outperformed the five traditional neural network models: BP, CNN, LSTM, BiLSTM, and CNN-BiLSTM. Table 8 shows that the BP neural network and the convolutional neural network perform relatively poorly in time-series prediction, which may be attributed to the tendency of BP neural networks to become stuck in local minima when dealing with time-series tasks, whereas CNNs are better suited to extracting local spatial features than to modeling temporal dependencies.
Our proposed Bo-GRA-CNN-BiLSTM prediction model outperformed the LSTM model, the best-performing baseline in terms of the MAE and RMSE metrics, on both datasets: the MAE improved by 9.04% and 9.08%, and the RMSE by 2.84% and 33.5%, on Datasets 1 and 2, respectively. For the MAPE metric, the improvements over the best-performing baseline, CNN-BiLSTM, were 29.3% and 2.3%, respectively. The R2 metric remained unchanged compared with the best-performing LSTM model on the first dataset and improved by 0.06% on the second dataset. We attribute these improved results to the use of input data that capture the correlations within the complex traffic–weather system, enabling the model to better learn the underlying features and achieve a more accurate fit to real traffic flow data.
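For reference, each percentage is the relative reduction in error with respect to the corresponding baseline value in Table 8; for example, for MAE on Dataset 1,

$$\frac{7.7614 - 7.0601}{7.7614} \times 100\% \approx 9.04\%.$$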

5.7. Sensitivity Analysis

In this section, we focus on the sensitivity analysis and parameter settings of the Bo-GRA-CNN-BiLSTM model, primarily considering three types of parameters: the epochs, size of the convolutional kernels of the CNN layer, and number of hidden units in the BiLSTM layer. This analysis was performed using Dataset 1.
In Table 9, the compared epochs are set from 50 to 250 at intervals of 50, together with the Bayesian-optimized value of 186. The models for all considered epochs were trained five times, and the best values were compared. Evidently, the Bayesian-optimized epoch outperformed the other, regularly spaced epochs for all evaluation metrics. This demonstrates the precision of the Bayesian optimization algorithm in finding optimal hyperparameters for the network model: compared with the setting of 250 epochs, the MAE, RMSE, and MAPE improved by 45.17%, 35.46%, and 22.74%, respectively. Therefore, it can be concluded that the Bo-GRA-CNN-BiLSTM model at 186 epochs provides more accurate and stable traffic flow predictions than the same model at the other epochs.
For the convolutional kernel sizes, we conducted comparisons with sizes ranging from 8 to 128, doubling the size each time (8, 16, 32, 64, 128), as shown in Figure 8a. Simultaneously, we kept the hyperparameters of the Bayesian optimization and the number of hidden units unchanged. Compared with the other settings, the convolutional kernel sizes obtained through Bayesian optimization showed minimum improvements of 50.19%, 43.94%, and 49.89% across the three evaluation metrics.
Next, we compared the number of hidden units in the BiLSTM layers, set from 32 to 256 and doubled each time (32, 64, 128, 256), while keeping the convolution kernels specified in Table 7 constant, as shown in Figure 8b. Among the various combinations of hidden unit numbers, our chosen hidden layer sizes of 128, 256, and 64 demonstrated improvements of at least 0.56%, 8.79%, and 35.5%, respectively, across the three evaluation metrics.
In all cases, this can be attributed to the fact that, under these combinations of network parameters, the input data contain sufficient traffic flow information to accurately predict future traffic flow. Therefore, it can be concluded that, at 186 epochs, setting the BiLSTM layers with 128, 256, and 64 hidden units and using convolution kernel sizes of 64, 128, 128, and 64 in the one-dimensional convolution layers outperforms the other choices in terms of accuracy and stability.

6. Conclusions

This study proposes a short-term traffic flow prediction model that combines GRA to calculate the weights of various weather factors and Bayesian optimization to integrate weather factors with a one-dimensional convolutional and BiLSTM neural network. Our proposed forecasting model outperformed five baseline models on two datasets, which were collected at an intersection in China and on the M25 motorway in the United Kingdom. The following conclusions were drawn.
(1)
In short-term traffic flow prediction, the impact of weather factors on daily traffic volume cannot be ignored. Road traffic, as a complex system, is influenced by interactions between time and weather factors. Therefore, it is essential to consider the mutual development trends between these data. By comparing the models with and without weather factors, we demonstrated the influence of weather factors on short-term traffic flow prediction. In Dataset 1, the model considering weather factors exhibited improvements of 45.59%, 41.25%, 29.3%, and 0.40% in the MAE, RMSE, MAPE, and R2 metrics, respectively. In Dataset 2, the corresponding improvements were 35.57%, 60.94%, 2.3%, and 0.28%, respectively.
(2)
The proposed Bo-GRA-CNN-BiLSTM network addresses the challenge of determining optimal hyperparameters using Bayesian optimization and enhances the alignment between weather and traffic flow data using GRA. Compared with the five baseline models, the improvements in the proposed model in terms of MAE, RMSE, MAPE, and R2 were at least 9.0%, 2.8%, 2.3%, and 0.06%, respectively.
(3)
The selection of the CNN and BiLSTM layer parameters was tested using sensitivity analysis. The proposed model outperformed the alternative convolutional kernel-size settings by at least 50.19%, 43.94%, and 49.89% in terms of MAE, RMSE, and MAPE, respectively, and outperformed the alternative BiLSTM hidden-unit settings by at least 0.56%, 8.79%, and 35.5%, respectively.
The experimental results demonstrate that the one-dimensional convolutional and bidirectional long short-term memory algorithms, combined with Bayesian optimization and the fusion of weather factors using grey relational analysis, can provide a more comprehensive and accurate prediction of traffic flow. However, this study also has some limitations. For example, it is necessary to further analyze additional weather indicators and holiday factors, explore the interpretability of neural network models, validate the model on more datasets, and conduct experiments under more diverse geographical conditions. Future work will involve a more detailed exploration and optimization of the model's hyperparameters.

Author Contributions

Conceptualization, C.W. and S.H.; Formal analysis, C.W. and S.H.; Funding acquisition, C.Z.; Methodology, S.H.; Project administration, C.Z.; Supervision, C.Z.; Writing—original draft, C.W. and S.H.; Writing—review and editing, C.W. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 52202401 and the Shanghai Pujiang Program under Grant No. 22PJC081.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zahid, M.; Chen, Y.; Jamal, A.; Mamadou, C.Z. Freeway short-term travel speed prediction based on data collection time-horizons: A fast forest quantile regression approach. Sustainability 2020, 12, 646. [Google Scholar] [CrossRef]
  2. Li, M.; Li, M.; Liu, B.; Liu, J.; Liu, Z.; Luo, D. Spatio-temporal traffic flow prediction based on coordinated attention. Sustainability 2022, 14, 7394. [Google Scholar] [CrossRef]
  3. Lan, T.; Zhang, X.; Qu, D.; Yang, Y.; Chen, Y. Short-term traffic flow prediction based on the optimization study of initial weights of the attention mechanism. Sustainability 2023, 15, 1374. [Google Scholar] [CrossRef]
  4. Feng, X.; Chen, Y.; Li, H.; Ma, T.; Ren, Y. Gated Recurrent Graph Convolutional Attention Network for Traffic Flow Prediction. Sustainability 2023, 15, 7696. [Google Scholar] [CrossRef]
  5. Kim, M.; Lee, D. Why Uncertainty in Deep Learning for Traffic Flow Prediction Is Needed. Sustainability 2023, 15, 16204. [Google Scholar] [CrossRef]
  6. Zhou, S.; Wei, C.; Song, C.; Fu, Y.; Luo, R.; Chang, W.; Yang, L. A hybrid deep learning model for short-term traffic flow prediction considering spatiotemporal features. Sustainability 2022, 14, 10039. [Google Scholar] [CrossRef]
  7. Smith, B.L.; Williams, B.M.; Oswald, R.K. Comparison of parametric and nonparametric models for traffic flow forecasting. Transp. Res. Part C Emerg. Technol. 2002, 10, 303–321. [Google Scholar] [CrossRef]
  8. Chan, K.Y.; Dillon, T.S.; Singh, J.; Chang, E. Neural-network-based models for short-term traffic flow forecasting using a hybrid exponential smoothing and Levenberg–Marquardt algorithm. IEEE Trans. Intell. Transp. Syst. 2011, 13, 644–654. [Google Scholar] [CrossRef]
  9. Li, Y.; Chai, S.; Ma, Z.; Wang, G. A hybrid deep learning framework for long-term traffic flow prediction. IEEE Access 2021, 9, 11264–11271. [Google Scholar] [CrossRef]
  10. Cui, J.; Zhao, J. Construction of Dynamic Traffic Pattern Recognition and Prediction Model Based on Deep Learning in the Background of Intelligent Cities. IEEE Access 2023, 12, 1418–1433. [Google Scholar] [CrossRef]
  11. Yue, X.; Yang, X.; Song, D.; Yuan, Y. The interaction effect of severe weather and non-weather factors on freeway travel volume. Sci. Total Environ. 2022, 808, 152057. [Google Scholar] [CrossRef] [PubMed]
  12. Bogaerts, T.; Masegosa, A.D.; Angarita-Zapata, J.S.; Onieva, E.; Hellinckx, P. A graph CNN-LSTM neural network for short and long-term traffic forecasting based on trajectory data. Transp. Res. Part C Emerg. Technol. 2020, 112, 62–77. [Google Scholar] [CrossRef]
  13. Lihua, N.; Xiaorong, C.; Qian, H. ARIMA model for traffic flow prediction based on wavelet analysis. In Proceedings of the 2nd International Conference on Information Science and Engineering, Hangzhou, China, 4–6 December 2010; IEEE: Piscataway, NJ, USA, 2010. [Google Scholar]
  14. Ryu, U.; Wang, J.; Kim, T.; Kwak, S.; Juhyok, U. Construction of traffic state vector using mutual information for short-term traffic flow prediction. Transp. Res. Part C Emerg. Technol. 2018, 96, 55–71. [Google Scholar] [CrossRef]
  15. Zhang, Y.; Liu, Y. Traffic forecasting using least squares support vector machines. Transportmetrica 2009, 5, 193–213. [Google Scholar] [CrossRef]
  16. Dai, L.; Qin, W.; Xu, H.; Chen, T.; Qian, C. Urban traffic flow prediction: A MapReduce based parallel multivariate linear regression approach. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; IEEE: Piscataway, NJ, USA, 2014. [Google Scholar]
  17. Kang, D.; Lv, Y.; Chen, Y.Y. Short-term traffic flow prediction with LSTM recurrent neural network. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; IEEE: Piscataway, NJ, USA, 2017. [Google Scholar]
  18. Chen, Z.; Luo, X.; Wang, T.; Wang, W.; Zhao, W. Deep reinforcement learning-based LSTM model for traffic flow forecasting in internet of vehicles. In Proceedings of the 2021 Chinese Intelligent Automation Conference, Zhanjiang, China, 5–7 November 2021; Springer: Singapore, 2022. [Google Scholar]
  19. Poonia, P.; Jain, V.K. Short-term traffic flow prediction: Using LSTM. In Proceedings of the 2020 International Conference on Emerging Trends in Communication, Control and Computing (ICONC3), Sikar, India, 21–22 February 2020; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar]
  20. Lu, S.; Zhang, Q.; Chen, G.; Seng, D. A combined method for short-term traffic flow prediction based on recurrent neural network. Alex. Eng. J. 2021, 60, 87–94. [Google Scholar] [CrossRef]
  21. Huang, F.; Yi, P.; Wang, J.; Li, M.; Peng, J.; Xiong, X. A dynamical spatial-temporal graph neural network for traffic demand prediction. Inf. Sci. 2022, 594, 286–304. [Google Scholar] [CrossRef]
  22. Wang, T.; Ni, S.; Qin, T.; Cao, D. TransGAT: A dynamic graph attention residual networks for traffic flow forecasting. Sustain. Comput. Inform. Syst. 2022, 36, 100779. [Google Scholar] [CrossRef]
  23. Shuai, C.; Wang, W.; Xu, G.; Lee, J.; Lee, J. Short-term traffic flow prediction of expressway considering spatial influences. J. Transp. Eng. Part A Syst. 2022, 148, 04022026. [Google Scholar] [CrossRef]
  24. Hou, F.; Zhang, Y.; Fu, X.; Jiao, L.; Zheng, W. The prediction of multistep traffic flow based on AST-GCN-LSTM. J. Adv. Transp. 2021, 2021, 9513170. [Google Scholar] [CrossRef]
  25. Luo, X.; Li, D.; Yang, Y.; Zhang, S. Spatiotemporal traffic flow prediction with KNN and LSTM. J. Adv. Transp. 2019, 2019, 4145353. [Google Scholar] [CrossRef]
  26. Wang, K.; Ma, C.; Qiao, Y.; Lu, X.; Hao, W.; Dong, S. A hybrid deep learning model with 1DCNN-LSTM-Attention networks for short-term traffic flow prediction. Phys. A Stat. Mech. Its Appl. 2021, 583, 126293. [Google Scholar] [CrossRef]
  27. Zhuang, W.; Cao, Y. Short-term traffic flow prediction based on cnn-bilstm with multicomponent information. Appl. Sci. 2022, 12, 8714. [Google Scholar] [CrossRef]
  28. Li, Z.; Xu, H.; Gao, X.; Wang, Z.; Xu, W. Fusion attention mechanism bidirectional LSTM for short-term traffic flow prediction. J. Intell. Transp. Syst. 2022, 28, 511–524. [Google Scholar] [CrossRef]
  29. Méndez, M.; Merayo, M.G.; Núñez, M. Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model. Eng. Appl. Artif. Intell. 2023, 121, 106041. [Google Scholar] [CrossRef]
  30. Bharti, P.; Redhu, P.; Kumar, K. Short-term traffic flow prediction based on optimized deep learning neural network: PSO-Bi-LSTM. Phys. A Stat. Mech. Its Appl. 2023, 625, 129001. [Google Scholar] [CrossRef]
  31. Eça, L.; Hoekstra, M. Evaluation of numerical error estimation based on grid refinement studies with the method of the manufactured solutions. Comput. Fluids 2009, 38, 1580–1591. [Google Scholar] [CrossRef]
  32. Liu, H.; Wang, W.; Zhang, Q. Multi-objective location-routing problem of reverse logistics based on GRA with entropy weight. Grey Syst. Theory Appl. 2012, 2, 249–258. [Google Scholar] [CrossRef]
  33. Lindauer, M.; Eggensperger, K.; Feurer, M.; Biedenkapp, A.; Deng, D.; Benjamins, C.; Ruhkopf, T.; Sass, R.; Hutter, F. SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization. J. Mach. Learn. Res. 2022, 23, 1–9. [Google Scholar]
  34. Zhang, W.; Yu, Y.; Qi, Y.; Shu, F.; Wang, Y. Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning. Transp. A Transp. Sci. 2019, 15, 1688–1711. [Google Scholar] [CrossRef]
  35. Reza, S.; Ferreira, M.C.; Machado, J.J.M.; Tavares, J.M.R.S. Traffic state prediction using one-dimensional convolution neural networks and long short-term memory. Appl. Sci. 2022, 12, 5149. [Google Scholar] [CrossRef]
  36. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  37. Li, L.; Yang, Y.; Yuan, Z.; Chen, Z. A spatial-temporal approach for traffic status analysis and prediction based on Bi-LSTM structure. Mod. Phys. Lett. B 2021, 35, 2150481. [Google Scholar] [CrossRef]
  38. Peng, Y.; Xiang, W. Short-term traffic volume prediction using GA-BP based on wavelet denoising and phase space reconstruction. Phys. A Stat. Mech. Its Appl. 2020, 549, 123913. [Google Scholar] [CrossRef]
  39. Abduljabbar, R.L.; Dia, H.; Tsai, P.-W. Unidirectional and bidirectional LSTM models for short-term traffic prediction. J. Adv. Transp. 2021, 2021, 5589075. [Google Scholar] [CrossRef]
Figure 1. The number of vehicles per 15 min on the target road.
Figure 2. Flowchart of traffic flow prediction.
Figure 3. Bayesian optimization process.
Figure 4. One-dimensional CNN structure.
Figure 5. BiLSTM network structure.
Figure 6. Traffic flow prediction using baseline models on the test dataset.
Figure 7. Traffic flow prediction using the proposed model on the test dataset.
Figure 8. An analysis of the prediction performance of the Bo-GRA-CNN-BiLSTM model for different parameters.
Table 1. Extraction of raw weather indicator data.

No. | Date        | Temperature (°C) | Rainfall (mm) | AQI | Wind Speed (m/s)
0   | 1 July 2023 | 28               | 0             | 97  | 2.45
1   | 2 July 2023 | 28.5             | 3.2           | 65  | 2.45
2   | 3 July 2023 | 29               | 0             | 37  | 2.45
Table 2. Weight index for weather factors of Dataset 1.

Traffic Volume | Temperature | Rainfall | AQI    | Wind Speed
1              | 0.2456      | 0.2618   | 0.2464 | 0.2462
Table 3. Weight index for weather factors of Dataset 2.

Traffic Volume | Temperature | Rainfall | Wind Speed
1              | 0.3526      | 0.3413   | 0.3059
Table 4. Partial input variables of Dataset 1.

No. | Traffic Volume | Temperature | Rainfall | AQI     | Wind Speed
0   | 222            | 54.5232     | 58.1196  | 54.7008 | 54.6564
1   | 211            | 51.8216     | 55.2398  | 51.9904 | 51.9482
2   | 190            | 46.664      | 49.742   | 46.816  | 46.778
3   | 159            | 39.0504     | 41.6262  | 39.1776 | 39.1458
Table 5. Partial input variables of Dataset 2.

No. | Traffic Volume | Temperature | Rainfall | Wind Speed
0   | 400            | 141.057     | 136.551  | 122.390
1   | 372            | 131.183     | 126.993  | 113.823
2   | 362            | 127.656     | 123.579  | 110.763
3   | 427            | 150.578     | 145.768  | 130.652
Table 6. Initial hyperparameters for Bayesian optimization.

Bayesian Optimization | Initial Points | Number of Iterations | Random State
Parameters            | 6              | 10                   | 5
Table 7. Hyperparameters for the Bo-GRA-CNN-BiLSTM model.

Network Parameters      | Parameter Values | Network Parameters | Parameter Values
Conv1D_filters          | 64, 128, 128, 64 | Best epochs        | 186
Bi_LSTM_Dence           | 128, 256, 64     | Best learn rate    | 0.0006121
Number of hidden layers | 128, 64, 1       | Best batch size    | 118
Table 8. Prediction accuracy of the considered models on the test dataset.

Dataset | Metric   | BP      | CNN     | LSTM    | BiLSTM  | CNN-BiLSTM | Proposed Model
1       | MAE      | 44.4806 | 39.5168 | 7.7614  | 8.7172  | 12.9746    | 7.0601
1       | RMSE     | 57.8209 | 46.4000 | 8.9579  | 11.4945 | 14.8132    | 8.7034
1       | MAPE (%) | 10.4223 | 13.4030 | 3.9514  | 5.2719  | 2.6861     | 1.8992
1       | R2       | 0.9843  | 0.9898  | 0.9995  | 0.9993  | 0.9991     | 0.9995
2       | MAE      | 59.2074 | 48.4668 | 10.6699 | 15.0491 | 15.0562    | 9.7010
2       | RMSE     | 89.6136 | 60.8913 | 16.8019 | 18.2497 | 28.6081    | 11.1738
2       | MAPE (%) | 9.5656  | 6.2124  | 3.0053  | 3.9177  | 3.0434     | 2.9735
2       | R2       | 0.9690  | 0.9855  | 0.9989  | 0.9984  | 0.9967     | 0.9995
Table 9. Analysis of the prediction performance of the Bo-GRA-CNN-BiLSTM model for different epochs.

Epochs | MAE     | RMSE    | MAPE (%)
50     | 17.4387 | 21.3825 | 3.6171
100    | 37.5731 | 46.2217 | 15.7805
150    | 10.2704 | 11.5550 | 2.5753
186    | 2.6418  | 3.4759  | 1.5638
200    | 8.8983  | 9.2790  | 2.9319
250    | 4.8178  | 5.3860  | 2.0242