5.1. Data Description and Preparation
In our research, a private-sector company provided historical and real-time probe vehicle data for our study of travel time prediction. Probe vehicles collect information such as instantaneous speed, timestamp, location, and azimuth, which reflects the running state of urban traffic and can play a crucial part in travel time estimation and prediction. The Oracle database was acquired from the ITS (Intelligent Traffic System) in Wuhan, China. We chose part of the road network in Wuhan city as the study area, as shown in Figure 1. The road network was bounded by Wuluo Road, Luoshi South Road, Xiongchu Avenue, and Dingziqiao Road, and included many branches and paths. It was divided into links at the crossing points where roads intersect. The degree of each link was obtained for all links in the network except those on the edge of the study area. Table 4 lists the selected local roads in the Wuhan road network, including the section number, geographic location, and length of each segment.
Because of GPS positioning error [24], the recorded positions of probe vehicles usually deviate from the roads they actually travel. Therefore, we first projected the GPS points onto the roads along each probe vehicle trajectory using a map-matching algorithm [15,25,26,27], and then calculated link travel time from these corrected points. We calculated travel information, including travel time and average probe vehicle speed, taking into consideration the running state of probe vehicles at intersections [28,29,30,31].
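The geometric step behind "projecting GPS points onto roads" can be illustrated with a point-to-segment projection. The sketch below is not the map-matching algorithm of the cited works; it only shows the basic projection, assuming planar coordinates in metres and a road link represented as a single straight segment. The function name and argument layout are hypothetical.

```python
def project_point_to_segment(p, a, b):
    """Project point p onto the segment a-b.

    All points are (x, y) tuples in a planar coordinate system (assumed metres).
    Returns the projected point and t in [0, 1], the relative position
    of the projection along the segment (0 = at a, 1 = at b).
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return (float(ax), float(ay)), 0.0
    # Scalar projection of (p - a) onto (b - a), normalised by segment length.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    # Clamp so the projected point stays on the segment.
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy), t


# Hypothetical GPS fix 3 m off a 10 m east-west link.
corrected, position = project_point_to_segment((5.0, 3.0), (0.0, 0.0), (10.0, 0.0))
```

A full map-matching procedure would additionally choose among candidate links using the trajectory and heading, which this sketch does not attempt.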
Table 5 lists the travel characteristics extracted from the large volume of travel time data: link ID, exiting endpoint ID, entering endpoint ID, probe vehicle ID, the travel time of a probe vehicle traversing the link, the moment the probe vehicle entered the link, and the average speed of the probe vehicle traversing the link. Existing research has shown that probe vehicle trajectories display similar traffic patterns over a weekly cycle [18,21,32]. Following this weekly cycle, historical characteristics of the target and upstream links were extracted.
Figure 2 depicts the road numbers and traffic directions on the partial road network shown in Figure 1. We used our model to predict travel times for link 82, considering the spatiotemporal correlation among link 82, link 88, and link 77. To this end, we extracted spatiotemporal correlation characteristics from the large historical probe vehicle dataset covering May to July 2014, about three billion records.
Table 6, Table 7 and Table 8 summarize the historical data, about 299,773,570 probe vehicle travel time records from January to May 2014, with the following descriptive statistics: mean value; standard deviation (SD); the 25th, 50th and 75th percentiles of travel time; and the minimum (Min) and maximum (Max) observations. Travel time was recorded in seconds. From these three tables, it can be inferred that the quartile speeds for the same link were similar from day to day. In contrast, speeds differed greatly among links.
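As an illustration of the descriptive statistics reported in these tables, the following minimal sketch computes the same summary for one link. It assumes the sample standard deviation and the "inclusive" percentile method of Python's standard library; the source does not state which conventions were used, and the sample values are made up.

```python
import statistics


def describe_travel_times(travel_times_s):
    """Summarise link travel times (seconds): mean, SD, quartiles, min, max."""
    values = sorted(travel_times_s)
    # quantiles(n=4) returns the three cut points: 25th, 50th, 75th percentiles.
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (assumption)
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "min": values[0],
        "max": values[-1],
    }


# Made-up travel times for one link on one weekday, in seconds.
sample = [42.0, 55.0, 61.0, 48.0, 90.0, 52.0, 47.0]
summary = describe_travel_times(sample)
```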
Figure 6, Figure 7 and Figure 8 show the distribution of observed speeds on link 88, link 82 and link 77, respectively, on Mondays and Wednesdays. The histograms for the same link present a similar pattern, with an approximately normal distribution if outliers are excluded. The distribution of travel speed, however, differs slightly among links.
We took link 82 as the sparse-data link. Thus, link 82 and its adjacent links, link 76, link 77, link 81 and link 88, were the research objects. The historical data were preprocessed to remove noise, and only workday data (Monday to Friday, excluding holidays) were retained as experimental data. We then calculated the expected speed and the standard deviation of speed on the adjacent links for every 30 min period from the preprocessed historical data, according to the weekly cycle and the traffic flow direction depicted in Figure 2. The features were calculated according to Section 3 and are shown in Table 9. The travel time of adjacent links is key to predicting the target link travel time because it reflects the local state of traffic. Finally, we extracted 2078 features as input for the neural network and 2078 features as the corresponding output. One portion of the extracted features was used as training data for the ANN model, and a separate portion was used as test data to verify the validity of the model.
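The aggregation into 30 min periods over the weekly cycle can be sketched as follows. This is an illustrative outline rather than the processing actually used: the record layout `(link_id, entry_time, avg_speed)` is hypothetical, holidays are not filtered, and groups with a single observation are given an SD of 0 by assumption.

```python
from collections import defaultdict
from datetime import datetime
import statistics


def slot_index(ts):
    """Return the 30 min slot of the day, from 0 (00:00) to 47 (23:30)."""
    return ts.hour * 2 + ts.minute // 30


def speed_profile(records):
    """Expected speed and SD per (link, weekday, 30 min slot).

    records: iterable of (link_id, entry_time, avg_speed) tuples
    (hypothetical layout). Weekend records are dropped.
    """
    groups = defaultdict(list)
    for link_id, entry_time, speed in records:
        if entry_time.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        key = (link_id, entry_time.weekday(), slot_index(entry_time))
        groups[key].append(speed)
    profile = {}
    for key, speeds in groups.items():
        sd = statistics.stdev(speeds) if len(speeds) > 1 else 0.0
        profile[key] = (statistics.mean(speeds), sd)
    return profile


# Made-up records for link 88: two Mondays and one Saturday.
records = [
    (88, datetime(2014, 5, 5, 8, 10), 30.0),
    (88, datetime(2014, 5, 12, 8, 20), 34.0),
    (88, datetime(2014, 5, 10, 8, 15), 50.0),
]
profile = speed_profile(records)
```

Here both Monday records fall into slot 16 (08:00 to 08:30), so they form one group, while the Saturday record is discarded.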
Selected meteorological information was used to study its influence on link travel time prediction. We considered only temperature and rainfall, based on the analysis discussed in Section 3.4. We classified rainfall into four ranks according to historical rainfall: rainless, drizzle, downpour and thunder, coded as 1, 2, 3 and 4, respectively. The input and output information of our model is listed in Table 9.
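The rainfall coding described above amounts to a simple lookup. The sketch below shows one way to produce the two meteorological inputs; the function name is hypothetical, and the temperature unit (degrees Celsius) is an assumption, since the source does not state it.

```python
# Rank codes for the four rainfall categories, as defined in the text.
RAIN_RANK = {"rainless": 1, "drizzle": 2, "downpour": 3, "thunder": 4}


def encode_weather(temperature_c, rain_category):
    """Return the (temp, rain) inputs for one observation.

    temperature_c: temperature, assumed to be in degrees Celsius.
    rain_category: one of the four category names in RAIN_RANK.
    """
    if rain_category not in RAIN_RANK:
        raise ValueError(f"unknown rainfall category: {rain_category!r}")
    return float(temperature_c), RAIN_RANK[rain_category]


temp, rain = encode_weather(26, "drizzle")
```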
5.5. Sensitivity Analysis of Different Influencing Factors
We constructed different ANN models to understand the factors influencing link travel time, such as weather and temperature. As mentioned in Section 3, the input features of the neural network model include day of the week, 30 min period of the day, expected speed, standard deviation of speed, degree ratio between target link and adjacent link, length ratio between target link and adjacent link, temperature (temp) and rainfall (rain). The output of the ANN models was the travel time ratio between target link and adjacent link. The role each feature plays in predicting travel time needed to be verified. Therefore, models were constructed with different sets of factors, and the sensitivity of travel time prediction to each factor was analyzed. We constructed the neural network models by combining input features and used the mean absolute percentage error (MAPE) to measure their performance. For convenience, we use the simple variables F1 to F8, as listed in Table 3, to denote the input features, and constructed the models as follows.
(1) Model M: including all input features
This model includes all input features and is regarded as the benchmark against which the other models are compared. The input features of Model M are F1, F2, F3, F4, F5, F6, F7 and F8.
(2) Model A: without day of the week
Time information reflects the travel characteristics of probe vehicles during different time periods. The day of the week distinguishes the travel times of different days in a week. The ANN model without day of the week was trained as Model A. The input features of Model A are F2, F3, F4, F5, F6, F7 and F8.
(3) Model B: without 30 min time interval of the day
The 30 min intervals of the day distinguish the travel times of different time periods. The ANN model without the 30 min interval of the day was trained as Model B. The input features of Model B are F1, F3, F4, F5, F6, F7 and F8.
(4) Model C: without expected speed
The expected speed reflects the state of traffic on a road. The ANN model without expected speed was trained as Model C. The input features of Model C are F1, F2, F4, F5, F6, F7 and F8.
(5) Model D: without the standard deviation of speed
The standard deviation of speed reflects the variability of speeds on a link. The ANN model without the standard deviation of speed was trained as Model D. The input features of Model D are F1, F2, F3, F5, F6, F7 and F8.
(6) Model E: without length ratio
The ANN model without length ratio was trained as Model E. The input features of Model E are F1, F2, F3, F4, F6, F7 and F8.
(7) Model F: without degree ratio
The ANN model without degree ratio was trained as Model F. The input features of Model F are F1, F2, F3, F4, F5, F7 and F8.
(8) Model G: without temperature
The ANN model without temperature was trained as Model G. The input features of Model G are F1, F2, F3, F4, F5, F6 and F8.
(9) Model H: without rain
The ANN model without rain was trained as Model H. The input features of Model H are F1, F2, F3, F4, F5, F6 and F7.
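The nine model configurations listed above follow a leave-one-feature-out pattern, which can be written compactly. The sketch below only builds the feature lists and selects the matching values from a record; the names `DROPPED_FEATURE`, `feature_sets` and `select_inputs` are hypothetical, and the ANN training itself is not shown.

```python
ALL_FEATURES = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8"]

# Feature removed from each model; Model M is the benchmark with nothing removed.
DROPPED_FEATURE = {
    "M": None,
    "A": "F1",  # day of the week
    "B": "F2",  # 30 min interval of the day
    "C": "F3",  # expected speed
    "D": "F4",  # standard deviation of speed
    "E": "F5",  # length ratio
    "F": "F6",  # degree ratio
    "G": "F7",  # temperature
    "H": "F8",  # rain
}


def feature_sets():
    """Return {model name: ordered list of input features}."""
    return {
        model: [f for f in ALL_FEATURES if f != dropped]
        for model, dropped in DROPPED_FEATURE.items()
    }


def select_inputs(record, features):
    """Pick the input values for one model from a dict keyed by feature name."""
    return [record[f] for f in features]


models = feature_sets()
# Made-up record with placeholder values 1.0 to 8.0 for F1 to F8.
record = {f: float(i) for i, f in enumerate(ALL_FEATURES, start=1)}
inputs_c = select_inputs(record, models["C"])
```

Keeping the dropped feature in one table makes it easy to check that every model other than Model M differs from the benchmark by exactly one input.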
In the comparison experiment, we trained the different ANN models on the same training dataset and evaluated each trained model with MAPE on the same four groups of testing data. The results reflect how the factors included in each model influence link travel time prediction. Table 11 quantifies the influence of the different factors. In general, Model M has a smaller MAPE than the other models.
Figure 11 illustrates the performance of the different ANN models on the different testing datasets. As shown in Figure 11, different factors influence the prediction of link travel time. Model M had greater prediction accuracy overall, as its MAPE value was lower. Three factors, day of the week, 30 min period of the day, and the expected speed of the adjacent link, influenced link travel time prediction: the models that excluded them had higher MAPE values than the model that included them. Among those three factors, the expected speed of an adjacent link had the greatest effect; the largest MAPE value appeared in the model excluding expected speed. The degree ratio and temperature slightly influence link travel time prediction. Rainfall affects link travel time prediction but is not as important as the 30 min interval of the day, the expected speed of adjacent links, or the day of the week. The MAPE value was smaller when rainfall was excluded from the model, as shown in the sensitivity analysis in Figure 11.
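For reference, MAPE and its average over several testing groups can be computed as in the following sketch. It uses the standard definition of MAPE expressed in percent; the helper `mean_mape_by_model` and the sample values are hypothetical and are not taken from Table 11.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mean_mape_by_model(results):
    """Average MAPE over testing groups for each model.

    results: {model name: [(actual, predicted), ...]}, one pair of
    lists per testing group (hypothetical layout).
    """
    return {
        model: sum(mape(a, p) for a, p in groups) / len(groups)
        for model, groups in results.items()
    }


# Made-up travel times (seconds) for two models on one testing group.
observed = [100.0, 200.0, 50.0]
results = {
    "M": [(observed, [110.0, 180.0, 50.0])],
    "C": [(observed, [130.0, 150.0, 60.0])],
}
scores = mean_mape_by_model(results)
```

With these made-up values the model without expected speed scores a higher MAPE than the benchmark, which is the kind of comparison the sensitivity analysis relies on.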