1. Introduction
Onsets of freeway congestion reduce the efficiency and capacity of transportation networks. In most situations, they should be forecast accurately and in a timely manner so that measures can be taken to prevent congestion from forming [1,2]. Compared with flow or volume prediction [3,4,5,6], traffic congestion forecasting is more intuitive to both road travelers and transportation administrators. It can help road travelers make better route selections, reduce pollution such as greenhouse gas emissions, and improve transportation operation efficiency. For these reasons, many methods have been proposed and evaluated for traffic congestion prediction [7,8,9].
In recent years, researchers have used various kinds and amounts of traffic data for traffic condition prediction and related intelligent transportation systems research. Most of these works use data sources such as road sensors, induction loops, automatic vehicle identification systems, remote traffic microwave sensors, in-road reflectors, floating car data, and simulation [3,4,10,11,12,13]. Images from roadside cameras, aerial photographs, and remote sensing images have also been used as data sources [14,15]. These kinds of data are either difficult to access because they require special permissions, or are often outdated. On the other hand, many traffic administration departments [16,17] and online map service providers [18,19,20] provide real-time or near-real-time online traffic congestion maps to the general public free of charge. Such services integrate various data sources, for example induction loops and, more generally, location data originating from apps running on Global Positioning System (GPS)-enabled smartphones carried in moving cars. These maps provide a new kind of real-time traffic condition data source that differs vastly from the aforementioned sources in timeliness, accessibility, availability, and coverage. Yet few studies have used such data sources for traffic congestion prediction. This paper proposes a systematic method to collect and use this new kind of data source for traffic congestion forecasting.
Furthermore, traffic congestion maps provided by traffic administration departments and online map services often cover road networks of various scales rather than a single road. Understanding and forecasting traffic congestion across an entire road network is more significant and interesting than doing so for a single road. Complete network-wide congestion information also helps travelers choose routes wisely and helps traffic administrators manage road networks and allocate resources systematically. However, network-scale congestion prediction is more demanding: it must cope with the additional computational complexity introduced by the network topology, generate timely and effective predictions from a three-dimensional perspective, and output accurate congestion levels. Unfortunately, conventional traffic congestion prediction models, which either represent each road or road segment as an element in a sequence while ignoring the impact of road length [7] or consider only a limited number of road links, do not meet these requirements because of limitations in input and output representation and normalization, improper hypotheses and assumptions, and an inability to cope with the curse of dimensionality [21]. Thus, existing models may fail to predict large-scale network traffic congestion.
Motivated by hierarchical feature extraction and motion prediction, this paper represents the congestion levels of a traffic network as a 2D matrix and introduces an image-based deep learning approach for congestion forecasting. We propose a deep learning architecture inspired by deep autoencoders (DAEs) [22] that first learns low-dimensional vector representations of the features and relationships embodied in images obtained from the aforementioned data sources, and then predicts future congestion levels in a supervised way based on these learned representations. An autoencoder is a neural network consisting of an encoding module (encoder) and a decoding module (decoder), designed to learn representational properties of the input data [23]. Furthermore, stacked encoders and decoders can transform high-dimensional data into low-dimensional vector representations for further applications [22,24,25].
This paper forecasts future congestion levels of a transportation network using only the historical temporal information and correlations of the network's congestion levels. In particular, we examine whether it is feasible to directly predict and output road network congestion images from just a sequence of previous congestion images of the same transportation network, in an end-to-end manner, using a simple yet efficient deep neural network architecture inspired by DAEs.
The contributions of the paper are summarized as follows:
We propose an accessible and general approach to collect, transform, and represent snapshots of transportation network maps marked with the traffic congestion conditions of their roads, which are publicly available from transportation administration departments and online traffic map service providers. Based on this approach, we have built and released a long-span traffic congestion dataset.
We develop a deep neural network model for efficient end-to-end prediction of transportation network congestion levels using hierarchical feature extraction. Our end-to-end model directly outputs prediction results that are presented visually and intuitively in the same road network structure and form as the inputs, thus eliminating the need for manual feature selection and engineering.
Our extensive experiments on a transportation network in the Seattle area demonstrate the effectiveness and efficiency of the proposed approach.
The remainder of this paper is organized as follows:
Section 2 discusses the related literature on traffic prediction.
Section 3 presents a systematic approach to collect and transform snapshots of transportation network maps with congestion levels, introduces a grid-based representation method for traffic congestion levels, and shows the architecture of our deep neural network for learning temporal feature representations and correlations for traffic congestion prediction.
In Section 4, a transportation network in the Seattle area, Washington State, is used to build a traffic congestion dataset and to test the effectiveness of the proposed model. To evaluate the performance of our proposed neural network model, we compare it with two state-of-the-art deep learning models: the convolutional long short-term memory (LSTM) network (ConvLSTM) [26] and the Spatiotemporal Recurrent Convolutional Network (SRCN) [6].
Then, in Section 5, we discuss the implications and limitations of our work. The conclusions and future studies are presented in the final section.
2. Related Work
There are two main categories of approaches for predicting traffic-related variables such as speed, volume, and density, namely parametric and nonparametric approaches [21,27].
As a parametric approach, the Auto Regressive Integrated Moving Average (ARIMA) model was proposed to construct models from time series of historical states to predict future values. The parameters of an ARIMA model can be configured via the Box-Jenkins method [28]. Smith and Williams used the ARIMA model for the first time to predict traffic flow at a single point [28]. Since then, a family of ARIMA-based models, such as seasonal ARIMA (SARIMA) models [29,30,31], KARIMA models [32], ARIMAX models [33], and CTM-SARIMA models [34], has been deployed for traffic forecasting. These parametric approaches share common requirements: the model structure must be predetermined according to theoretical or physical assumptions, and a set of parameters must be tuned to reflect the evolution of real-world traffic conditions as closely as possible [35,36]. SARIMA models in particular have been found to incur high computational cost [28].
The limitations of parametric algorithms have led to a shift toward nonparametric approaches for traffic prediction, such as nonparametric regression, Support Vector Machines (SVMs), and Artificial Neural Networks. As a nonparametric approach, k-nearest neighbors (KNN) models have been used to forecast traffic speeds and flows in both univariate and multivariate settings [37,38,39,40,41]. SVMs and their variants, such as support vector regression (SVR), seasonal SVM, and Online-SVM, have been explored to improve traffic prediction performance because they generalize well and capture the high dynamics and sensitivity of traffic data [42,43,44]. Artificial neural networks (ANNs), with advantages such as the capability to work with multi-dimensional data, implementation flexibility, generalization ability, and good forecasting performance, have been applied to traffic prediction problems [45]. Kumar et al. applied an ANN to predict traffic volume using time information in addition to past traffic-related data such as volume, speed, and density [46]. Kashi and Akbarzadeh used a wavelet transform to remove unimportant fluctuations from the flow signal and then trained an ANN on past data to predict future flow on different highways and at different locations [47].
Although these methods can model non-linearity and extract spatial-temporal relationships in traffic data to achieve better results than parametric approaches, they require significant prior domain knowledge and extensive preprocessing such as feature engineering. With increasing traffic density, the wide adoption of sensors and cameras, and the popularity of GPS-enabled navigation apps on smartphones, transportation data have entered the big data paradigm. This data explosion introduces a problem well known as the curse of dimensionality, which traditional approaches cannot handle efficiently [48,49]. To gain insight from big traffic data, deep neural networks have become popular in recent years because they learn deep correlations inherent in data with little or no prior knowledge or manual feature engineering [13,50]. Stacked autoencoder models have been used to exploit temporal and spatiotemporal information in real-world or simulated datasets to predict traffic flow [5,51]. Recurrent neural networks (RNNs), especially LSTMs, have been used to predict traffic flow, speed, and congestion because their built-in memory cells enable them to learn temporal knowledge, which makes them suitable for time-series analytics [52,53,54]. Convolutional neural networks (CNNs), with their particular strength in extracting spatial correlations, have been adopted for traffic speed prediction on images generated from speed data [4,55]. Deep belief networks (DBNs) have also been used for traffic flow prediction because of their capability to learn effective representative features from data in an unsupervised way [50,56,57]. Furthermore, some research efforts combine more than one kind of deep neural network. Yu et al. [6] proposed a deep learning model named SRCN that combines CNNs and LSTMs to capture both the spatial dependencies among different roads and the long-term temporal dependency of each road, and to predict traffic speed on a certain set of roads using in-house synthesized road maps with floating car GPS data.
However, these attempts mainly focus on predicting traffic-related properties such as flow, speed, and travel time on a single road segment, a few roads, or a small network region [21,58,59,60,61]. One major reason for the lack of network-scale traffic congestion prediction studies is the difficulty of obtaining large datasets. One recent work by Ma et al. [7] used congestion information derived from taxi GPS data to model and predict traffic congestion evolution. However, this dataset has time intervals of 30 and 60 minutes, which is too sparse for training models for real-world congestion prediction, and it spans only four weeks.
3. Methodology
In this section, we describe our accessible and general approach for obtaining and processing raw snapshots of traffic congestion maps from online map service providers. In our approach, traffic congestion data are extracted automatically using an image mask for highways, and a customized map-reduce implementation processes these extracted congestion data, originally represented as pixels, and transforms them into floating-point numbers. Our method can extract traffic congestion levels from raw traffic congestion maps or snapshots provided publicly and for free by online map and traffic service providers. We then propose a deep neural network (DNN) architecture for traffic congestion prediction.
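The map and reduce steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the color legend `LEGEND` (map color to congestion level), the `BACKGROUND` value, and all function names are hypothetical assumptions, because the exact colors and level values of the source maps are not given in this section.

```python
from functools import reduce

# Hypothetical legend mapping map colors (RGB) to congestion levels.
# The real colors and level values of the source maps are assumptions here.
LEGEND = {
    (0, 255, 0): 1.0,    # e.g. free-flowing traffic (assumed)
    (255, 255, 0): 2.0,  # e.g. moderate congestion (assumed)
    (255, 0, 0): 3.0,    # e.g. heavy congestion (assumed)
    (0, 0, 0): 4.0,      # e.g. stop-and-go traffic (assumed)
}
BACKGROUND = 0.0  # hypothetical level for masked-out or unrecognized pixels


def map_snapshot(item):
    """Map step: convert one (timestamp, pixel rows) snapshot into float levels."""
    timestamp, pixels = item
    levels = [[LEGEND.get(rgb, BACKGROUND) for rgb in row] for row in pixels]
    return timestamp, levels


def reduce_snapshots(acc, mapped):
    """Reduce step: collect converted snapshots into one dict keyed by timestamp."""
    timestamp, levels = mapped
    acc[timestamp] = levels
    return acc


def transform(snapshots):
    """Run the map step over all snapshots and fold the results together."""
    return reduce(reduce_snapshots, map(map_snapshot, snapshots), {})


snapshots = [("t0", [[(0, 255, 0), (255, 0, 0)], [(9, 9, 9), (0, 0, 0)]])]
print(transform(snapshots))
```

In a real map-reduce setting the map step would run in parallel over many snapshot files (for example with `multiprocessing.Pool.map`); the sequential built-in `map` keeps the sketch short.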
Inspired by research findings in computer vision and deep learning on motion prediction, which estimate the future trajectories of objects from sequences of their past scenes [62], we first collect time series of snapshots of the network-wide traffic congestion map from the Washington State Department of Transportation (WSDOT) [17] using web browsers and web crawlers, and then build a dataset from these snapshots. The approach can easily be extended to other online map service providers to build more datasets. This dataset is then used to train and back-test different deep learning models for congestion prediction. Based on the theory of city management grid modeling [63], we segment the transportation network covered by these snapshots into grids. Each snapshot is divided into non-overlapping, tiled 8 × 8 pixel grids, where each grid corresponds to an area of about 80 m × 80 m in the real world. The congestion level of each grid is calculated as the average of the traffic congestion levels represented by all pixels in that grid.
3.1. Representation of Congestion Level of the Transportation Network
Since we want to forecast congestion levels inside every grid of a traffic network, we first retain only the road network and its congestion conditions by removing all other pixels with an image mask, as shown by the transformation from Figure 1a to Figure 1b.
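The masking step can be sketched as follows. This minimal example assumes the mask is a binary image of the same size as the snapshot (truthy where a highway pixel lies), prepared once for the fixed map extent; the function name and the white `background` color are hypothetical choices, not details taken from the paper.

```python
def apply_mask(image, mask, background=(255, 255, 255)):
    """Keep pixels where the mask is truthy; replace all others with `background`.

    `image` is a list of rows of RGB tuples and `mask` a same-sized list of rows
    of 0/1 values. The white background color is an assumption.
    """
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (0, 0, 0)]]
mask = [[1, 0], [0, 1]]
print(apply_mask(image, mask))
```

Because the road network does not move between snapshots, a single mask can be reused for every image in the time series.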
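Then we segment the network-only images into $R \times C$ grids, as shown in Figure 1c. For each grid located at $(i, j)$, where $1 \le i \le R$ and $1 \le j \le C$, let $P_{i,j}^{t}$ denote the set of congestion levels represented by all pixels in that grid at time $t$. Calculated from $P_{i,j}^{t}$ according to Equation (1), $x_{i,j}^{t}$ denotes the averaged congestion level of the grid at time $t$:
$$x_{i,j}^{t} = \frac{1}{\lvert P_{i,j}^{t} \rvert} \sum_{p \in P_{i,j}^{t}} p \tag{1}$$
Based on Equation (1), the final congestion level representation of the segmented road network at time $t$ is expressed by the matrix $X_t$ in Equation (2), where $R$ is the number of grids latitudinally and $C$ the number of grids longitudinally:
$$X_t = \begin{bmatrix} x_{1,1}^{t} & x_{1,2}^{t} & \cdots & x_{1,C}^{t} \\ x_{2,1}^{t} & x_{2,2}^{t} & \cdots & x_{2,C}^{t} \\ \vdots & \vdots & \ddots & \vdots \\ x_{R,1}^{t} & x_{R,2}^{t} & \cdots & x_{R,C}^{t} \end{bmatrix} \tag{2}$$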
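Equations (1) and (2) amount to block-averaging the pixel-level congestion image. The sketch below assumes the image height and width are exact multiples of the tile size `k` (8 in this paper; the test uses 2 for brevity); how a real snapshot is cropped or padded to satisfy this is an assumption not covered here, and the function name is hypothetical.

```python
def to_grid_matrix(levels, k=8):
    """Average non-overlapping k x k pixel tiles into an R x C matrix.

    Each output entry is the mean of all pixel levels in its tile (Equation (1));
    the nested list of entries is the matrix X_t (Equation (2)).
    """
    height, width = len(levels), len(levels[0])
    if height % k or width % k:
        raise ValueError("image size must be a multiple of the tile size k")
    rows, cols = height // k, width // k
    return [
        [
            sum(levels[y][x]
                for y in range(i * k, (i + 1) * k)
                for x in range(j * k, (j + 1) * k)) / (k * k)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


pixels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 4, 0],
    [0, 0, 0, 4],
]
print(to_grid_matrix(pixels, k=2))
```

With $k = 8$, a snapshot of $8R \times 8C$ pixels shrinks to an $R \times C$ matrix, a 64-fold reduction in the number of values per time step.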
Figure 1d provides a colored visualization of a matrix representation as expressed by Equation (2). Each pixel in Figure 1d stands for an area of 80 m × 80 m and is rendered according to a custom linear color map [64] generated from a sequence of RGB values.
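A linear color map of this kind interpolates evenly spaced anchor colors. The pure-Python sketch below assumes congestion levels have already been normalized to $[0, 1]$; the anchor colors are hypothetical placeholders, since the RGB sequence used for Figure 1d is not reproduced here. In practice, matplotlib's `LinearSegmentedColormap.from_list` builds an equivalent map.

```python
def make_linear_colormap(anchors):
    """Return a function mapping v in [0, 1] to an RGB tuple.

    The anchors are evenly spaced over [0, 1], and colors between two
    neighbouring anchors are linearly interpolated per channel.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two anchor colors")
    segments = len(anchors) - 1

    def colormap(value):
        v = min(max(value, 0.0), 1.0)        # clamp into [0, 1]
        pos = v * segments
        idx = min(int(pos), segments - 1)    # index of the left anchor
        frac = pos - idx                     # position within the segment
        left, right = anchors[idx], anchors[idx + 1]
        return tuple(round(a + (b - a) * frac) for a, b in zip(left, right))

    return colormap


# Hypothetical anchors; not the RGB sequence used in the paper.
cmap = make_linear_colormap([(0, 0, 0), (100, 200, 50), (200, 0, 100)])
print(cmap(0.25), cmap(0.5), cmap(1.0))
```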
Suppose that we need to predict the traffic congestion levels of the road network at time points in $\{t+1, \dots, t+h\}$, where $h$ is the prediction horizon, and that historical congestion levels in the time range $\{t-n+1, \dots, t\}$ are used as inputs, where $n$ is the number of input time steps. Arranging the historical congestion representations $X_t$ of Equation (2) by time yields the time-series sequence $X_{t-n+1}, X_{t-n+2}, \dots, X_t$.
Because multiple time intervals are used as input for forecasting, the traffic congestion prediction task can be regarded as a time-series sequence prediction problem. When all grids in the network are predicted simultaneously, it becomes a multi-dimensional sequence learning problem.
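The sequence formulation above can be turned into supervised training pairs with a sliding window. This is a minimal sketch under two assumptions: `n` (our label for the number of input frames) is fixed, and each sample has a single target, the frame `h` steps after the last input, matching a model that outputs one future time point. For multi-step targets, `frames[t + 1 : t + h + 1]` would be used instead. The function name is hypothetical.

```python
def make_samples(frames, n, h):
    """Build (inputs, target) pairs from a chronologically ordered list of frames.

    inputs: the n consecutive frames X_{t-n+1}, ..., X_t
    target: the frame X_{t+h}, h steps after the last input
    """
    if n < 1 or h < 1:
        raise ValueError("n and h must be positive")
    samples = []
    for t in range(n - 1, len(frames) - h):
        inputs = frames[t - n + 1 : t + 1]
        target = frames[t + h]
        samples.append((inputs, target))
    return samples


# Integers stand in for the R x C matrices X_t.
samples = make_samples(list(range(10)), n=3, h=2)
print(samples[0], samples[-1], len(samples))
```

For chronological back-testing, these pairs would typically be split by time (earlier samples for training, later ones for testing) rather than shuffled.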
3.2. Temporal Features
Sequences of snapshots of traffic congestion levels across a road network, taken at a fixed time interval and arranged in chronological order, are very similar to natural language sentences consisting of words separated by spaces. Future congestion levels may be affected by earlier congestion levels to varying degrees because of the temporal dependency inherent in traffic data. In our work, we exploit such temporal correlations to predict future congestion levels.
3.3. Deep Congestion Prediction Network
We propose a deep learning model named the Deep Congestion Prediction Network (DCPN), inspired by DAEs, for transportation network congestion prediction. It is designed to learn and represent temporal features and correlations of traffic congestion levels among roads in the transportation network. The proposed model consists of two components. The first component contains an encoder and a decoder. The encoder first obtains a vector representation of the historical congestion levels of a transportation network and their correlations using four encoding layers. The decoder then builds a representation of the congestion levels at a future time point using four decoding layers. This first component uses symmetrical layers for the encoder and the decoder, whereas in DAEs the encoder and the decoder share the innermost layer [22]. The second component of DCPN uses two dense layers to construct the congestion levels of each grid in the transportation network at that future time point. These two dense layers take the output $\mathbf{d}$ of the decoder in the first component and calculate a vector representation $\hat{\mathbf{y}}$ of the predicted traffic congestion levels, as shown by Equation (3), where $\sigma_1$ and $\sigma_2$ denote the activation functions of the hidden and output dense layers:
$$\hat{\mathbf{y}} = \sigma_2\left(W_2\, \sigma_1\left(W_1 \mathbf{d} + \mathbf{b}_1\right) + \mathbf{b}_2\right) \tag{3}$$
$W_1$ and $\mathbf{b}_1$ represent the weight matrix and the bias between the last layer of the DAE-like architecture and the hidden dense layer, while $W_2$ and $\mathbf{b}_2$ are those between the hidden dense layer and the output dense layer. To avoid overfitting, a dropout layer is added between the two dense layers. The prediction vector $\hat{\mathbf{y}}$ has the same number of elements as each input in the series and is reshaped to the same shape, so the proposed model is trained end to end. To integrate the DAE-like architecture and dense layers for traffic congestion prediction, we use the architecture in Figure 2 for DCPN to forecast traffic congestion at network scale. A DCPN model includes one input layer, one flatten layer, one deep autoencoder, two dense layers with one dropout layer in between, and one reshape layer.
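The layer sequence described above (input, flatten, four encoding and four mirrored decoding layers, the two dense layers of Equation (3) with dropout in between, and reshape) can be traced with a framework-free forward pass. This is a shape-level sketch only, not the authors' model: the layer widths, ReLU activations, linear output, dropout rate, and random initialization are all hypothetical, and no training is shown. A practical implementation would use a deep learning framework, for example the Keras `Flatten`, `Dense`, `Dropout`, and `Reshape` layers.

```python
import random


def dense(inputs, weights, bias, activation):
    """One fully connected layer: activation(W x + b), row by row."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]


def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DCPNSketch:
    """Shape-level sketch of a DCPN-like model; widths and activations are assumed."""

    def __init__(self, n_frames, rows, cols, enc_widths=(256, 128, 64, 32),
                 hidden=128, drop_rate=0.5, seed=0):
        self.rng = random.Random(seed)
        self.rows, self.cols, self.drop_rate = rows, cols, drop_rate
        # Four encoding layers, then four decoding layers mirroring them,
        # so encoder and decoder each have their own innermost layer.
        dims = [n_frames * rows * cols] + list(enc_widths) + list(reversed(enc_widths))
        self.autoencoder = [init_layer(a, b, self.rng) for a, b in zip(dims[:-1], dims[1:])]
        # Equation (3): hidden dense layer (W1, b1) and output dense layer (W2, b2).
        self.hidden = init_layer(dims[-1], hidden, self.rng)
        self.output = init_layer(hidden, rows * cols, self.rng)

    def forward(self, frames, training=False):
        x = [v for frame in frames for row in frame for v in row]   # flatten layer
        for weights, bias in self.autoencoder:                       # DAE-like part
            x = dense(x, weights, bias, relu)
        x = dense(x, *self.hidden, relu)                             # hidden dense layer
        if training:                                                 # inverted dropout
            keep = 1.0 - self.drop_rate
            x = [v / keep if self.rng.random() < keep else 0.0 for v in x]
        y = dense(x, *self.output, identity)                         # output dense layer
        return [y[i * self.cols:(i + 1) * self.cols] for i in range(self.rows)]  # reshape


model = DCPNSketch(n_frames=3, rows=4, cols=5)
frames = [[[0.5] * 5 for _ in range(4)] for _ in range(3)]
prediction = model.forward(frames)
print(len(prediction), len(prediction[0]))
```

The sketch makes the end-to-end property visible: the output has exactly the $R \times C$ shape of a single input frame, so a prediction can be rendered with the same color map as the inputs.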
5. Discussion
The lack of large-scale, openly accessible traffic datasets hinders research on intelligent transportation systems. Data used in previous work, for example GPS and floating car data captured by taxis or images from CCTV surveillance cameras, require either complex or special application processes by users such as researchers, or permission from authorities such as government agencies and companies. Nevertheless, transportation network maps marked with congestion indicators or levels are provided online for free by transportation administration departments and map service providers. For example, WSDOT has provided such traffic congestion maps since at least 2012, according to the Internet Archive [70]. To ease the building of large-scale datasets and to promote traffic congestion research, we present a convenient and general workflow for creating traffic congestion prediction datasets from raw data provided by these free online services. Specifically, we first made available an archived dataset of raw snapshots of the highway transportation network in the Seattle area from 1 January 2016 to 28 February 2017, collected from WSDOT; most of these snapshots are no longer provided on the WSDOT website. Based on this raw dataset, we then built a dataset named SATCS and released it for future traffic congestion research by us and others.
Even though the SATCS dataset is now available for traffic congestion research, some information is lost in the way congestion levels are represented in our current work. We had to shrink the matrix representations of congestion levels in the Seattle-area highway transportation network using a grid-based scheme because of the limited GPU memory of our current experimental equipment. As a result, each grid covers an area of 80 m × 80 m, and the average of the congestion levels represented by all pixels in a grid is used as the congestion level of that grid. This shrinking process loses information and reduces geographical accuracy. It is not yet clear what information is lost and how this loss affects prediction performance. We will address this information loss problem in future work in order to predict traffic congestion at a finer granularity.
Furthermore, there is still room to improve computational efficiency. In the current representation of traffic congestion levels, most values equal the value assigned to grids that contain no roads at all, which make up the vast background area. However, DCPN still processes these values during training and testing, and thus incurs unnecessary computation. In future work, we will try to exclude such background information to further improve efficiency.
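One possible way to exclude background grids, offered only as an illustration of the idea and not as the authors' planned method, is to pack each matrix into a vector of road cells before feeding it to a model and to scatter predictions back afterwards. The sketch assumes a fixed per-grid road mask (truthy where a grid contains any road pixel) derived from the image mask; the function names and the `background` fill value are hypothetical, since the background value used in the paper is not stated here.

```python
def road_cell_index(grid_mask):
    """(i, j) indices of grids that contain road pixels, in row-major order."""
    return [(i, j) for i, row in enumerate(grid_mask)
            for j, is_road in enumerate(row) if is_road]


def pack(matrix, index):
    """Keep only road cells, producing a much shorter model input vector."""
    return [matrix[i][j] for i, j in index]


def unpack(values, index, rows, cols, background):
    """Scatter road-cell values back into a full rows x cols matrix."""
    matrix = [[background] * cols for _ in range(rows)]
    for (i, j), v in zip(index, values):
        matrix[i][j] = v
    return matrix


grid_mask = [[0, 1, 0], [1, 1, 0]]
index = road_cell_index(grid_mask)
packed = pack([[9, 2, 9], [3, 4, 9]], index)
print(index, packed)
```

Deriving the index from the road mask, rather than from the congestion values, keeps it stable across time steps, so every packed vector has the same length and element order.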
6. Conclusions
In this work, we first propose an accessible and general approach to collect, transform, and represent snapshots of road networks marked with congestion levels, and apply it to build a dataset named SATCS for traffic congestion research. We then develop a deep learning model, DCPN, which combines a DAE-inspired feature learning architecture with dense layers to learn representational features and temporal correlations from historical traffic congestion data and to predict future congestion levels of a transportation network in the Seattle area, Washington State, USA. To evaluate the effectiveness of the proposed DCPN model for short-term traffic congestion forecasting, we compare its prediction performance with that of two state-of-the-art deep neural network models using back-testing. Results on the SATCS benchmark dataset show that DCPN is more effective and computationally efficient for short-term traffic congestion forecasting.
This study focuses only on predicting traffic congestion levels from traffic congestion snapshots of a single data source, and it is limited by our experimental equipment. However, more comprehensive traffic forecasting solutions are possible by covering travel time, volume, speed, and occupancy, and by using other information such as weather conditions, which may be more accurate and meaningful for travelers, commuters, and administration departments. In future work, we will try to enhance the computing capability of our experimental equipment to perform more thorough trials, experiment with snapshots from other service providers, and fuse multiple types of data from different sources, in order to build forecasting models for the aforementioned traffic condition properties.