**4. Evaluation**

In this section, we evaluate our approach using data from TOPIS (Seoul Traffic Operation and Information Service, https://topis.seoul.go.kr/), which contains hourly average speed information for the 4670 major traffic links in the Seoul metropolitan area. We obtained data at one-hour intervals, 8760 h in total, from 00:00 on 1 January 2018 to 23:00 on 31 December 2018. The temperature and precipitation for each link were obtained from the nearest weather station; we retrieved the climate readings from the AWS (Automatic Weather System) of the KMA (https://data.kma.go.kr/). We took 60% of the data as the training set, 20% as the validation set, and the rest as the test set. The attributes of the dataset used in this paper are listed in Table 2 with basic statistics, units, measurement intervals, and data types.


We conducted our experiments on an NVIDIA DGX-1 with an 80-core (160-thread) CPU, 8 Tesla V100 GPUs with 32 GB of dedicated memory each, and 512 GB of RAM. The NVIDIA DGX-1 ran the Ubuntu 16.04.5 LTS server, and the machine learning tasks were executed through Docker containers. The machine learning algorithms were implemented with the Python (v3.6.9), TensorFlow (v1.15.0), and Keras (v2.3.1) libraries. We used ReLU as the activation function [36,37] and Adam as the optimizer [38,39]. The learning rate was empirically set to 0.001. Early stopping was configured by setting the patience value to 200 through Keras. To measure the prediction performance, we employed the following metrics: (1) MAE (Mean Absolute Error) (Equation (7)); (2) RMSE (Root Mean Squared Error) (Equation (8)); and (3) MAPE (Mean Absolute Percentage Error) (Equation (9)). $y_t$ and $\hat{y}_t$ are the actual speed and the predicted speed, respectively, and $n$ is the number of test cases. The threshold for checking the convergence of the $Z$ value, as defined in Equation (6), was set empirically, as shown in Table 3. With the threshold set to 0.005, we achieved the lowest MAPE.

$$MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t| \tag{7}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2} \tag{8}$$

$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100 \quad (\%) \tag{9}$$
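The three metrics follow directly from the definitions above; the sketch below computes them with NumPy (the sample arrays are illustrative values, not data from our test set):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error (Equation (7))."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Squared Error (Equation (8))."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in % (Equation (9)).
    Divides by the actual speed y_true, so zero speeds must be excluded."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([40.0, 25.0, 60.0])  # actual link speeds (km/h), illustrative
y_pred = np.array([38.0, 30.0, 55.0])  # predicted link speeds, illustrative
print(mae(y_true, y_pred))   # → 4.0
print(rmse(y_true, y_pred))
print(mape(y_true, y_pred))
```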


**Table 3.** Prediction performance according to the convergence threshold setting.

### *4.1. Measurement of Prediction Performance*

We measured the prediction performance of various approaches, denoted as DNN, TFC-DNN, GRU, TFC-GRU, LSTM, and TFC-LSTM. The prefix TFC- stands for Traffic Flow Centrality and indicates the seven novel features we introduced in Section 3, i.e., *V*, *Pin*, *Pout*, *ρFin*, *ρFout*, *Zin*, and *Zout*. We evaluated the effectiveness of information about external conditions, such as climate and date information, separately. DNN, GRU, and LSTM are the artificial neural network architectures we employed for machine learning. Table 4 lists the prediction performance of each approach. For each model, we picked the empirically best hyper-parameter settings, such as the number of hidden layers, the number of perceptrons per layer, and the number of time windows for the recurrent neural networks. With two hidden layers, we used 64 and 8 perceptrons for the first and the second layer, respectively; with a single hidden layer, we used 512 perceptrons. We cite representative existing works that fall under the prediction model categories that do not use our techniques. We did not compare against methods that cannot handle traffic networks that scale to thousands of links. The MSE values during the training and validation stages are plotted in Figure 10; the MSEs converged by epoch 400.
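The training runs plotted in Figure 10 stop according to the patience rule configured earlier (patience = 200 via Keras). The behaviour of that rule can be sketched in plain Python as follows; the class name and interface are our own for illustration, not the Keras API:

```python
class EarlyStopper:
    """Minimal sketch of patience-based early stopping: training stops after
    `patience` consecutive epochs without improvement in validation loss."""

    def __init__(self, patience=200):
        self.patience = patience
        self.best = float("inf")  # best validation loss observed so far
        self.wait = 0             # epochs since the last improvement

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

With a small patience value, e.g. `EarlyStopper(patience=2)`, two successive non-improving epochs trigger the stop.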

Considering every link's external conditions was effective only for LSTM and TFC-LSTM. On the other hand, we observed that using the features related to traffic flow centrality consistently led to improvements over the baseline approaches. Moreover, when both the external conditions and the traffic flow centrality features were used with LSTM, i.e., TFC-LSTM, we achieved the lowest MAPE of 10.39%.



The DNN-based approaches without the information about the external conditions performed poorly compared to the recurrent neural network models because they do not model the temporal transitions of the link features.

**Figure 10.** MSEs during training and validation.

### *4.2. The Effect of Reachable Path Length Cutoff*

According to the Floyd–Warshall algorithm we used for retrieving the lowest-cost inbound and outbound paths between any OD pairs, any link in the Seoul traffic system could be reached from any other link within an hour, regardless of the distance, even during rush hours with heavy traffic. It turns out that the Floyd–Warshall algorithm retrieved several unrealistic reachable paths between OD pairs by ignoring specific restrictions on some of the traffic links. For example, the algorithm generated routes that included U-turns that are not allowed on some roads. As a result, path lengths measured as the number of hops tended to be excessively long. Thus, we were unnecessarily taking into account links that, in practice, cannot influence the state of remote links.

One possible solution to this problem is to use a time window shorter than an hour. However, TOPIS only makes the hourly data available to the public. Therefore, we revised the algorithm to limit the path length instead. For the period when overall traffic flows at the lowest average speed, we limited the reachable inbound and outbound path lengths to 15 hops. On the other hand, for the period when overall traffic flows at the highest average speed, we relaxed the limit to 45 hops. A link's low speed may be attributed to congestion on neighboring links in the vicinity; in that case, reachability from other links to the low-speed link is also limited, so it is sufficient to consider only the links in proximity. Compared to the period of the lowest average traffic speed ("Lowest Speed Period" in Table 5), the reachability of the traffic from one link to the others is higher during the period of the highest average traffic speed. Thus, for such a period ("Highest Speed Period" in Table 5), we considered the states of the other links within a wider range. Our simple revision of the algorithm is empirically proven effective, as shown in Table 5, which lists the prediction performance for different path length cutoff settings. Our approach performed most effectively, with a MAPE of 10.39%, when the range of neighboring links to consider was proportional to the traffic's average speed.
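The revised reachability computation can be sketched as follows. For illustration we use a hop-limited BFS in place of a full shortest-path computation, and we assume a linear interpolation of the cutoff between 15 hops (slowest period) and 45 hops (fastest period) with the period's average speed; the adjacency format and function names are hypothetical:

```python
from collections import deque

def reachable_within(adj, source, max_hops):
    """Links reachable from `source` within `max_hops` hops (breadth-first search).
    `adj` maps a link id to the list of its downstream link ids."""
    seen = {source}
    frontier = deque([(source, 0)])
    reached = set()
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the cutoff
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.add(nxt)
                frontier.append((nxt, hops + 1))
    return reached

def hop_cutoff(avg_speed, v_min, v_max, lo=15, hi=45):
    """Cutoff proportional to the period's average speed:
    `lo` hops at the slowest period, `hi` hops at the fastest."""
    frac = (avg_speed - v_min) / (v_max - v_min)
    return round(lo + frac * (hi - lo))
```

For example, with period-wide average speeds ranging from 10 to 50 km/h, a 30 km/h period would use a 30-hop cutoff under this interpolation.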


**Table 5.** The effect of reachability path length limit on speed prediction.

### *4.3. The Discussion on Scalability*

One of the merits of our approach is its capability to predict traffic speeds even for a very large-scale traffic network such as the road system of Seoul. The larger the traffic network, the greater the opportunity to predict speed accurately, because we can consider more of the conditions around the links that are often overlooked by existing works. The works that only consider the conditions of adjacent neighbors [18] exhibited a decline in prediction accuracy compared to what was originally reported and performed worse than our approach, because they are unable to capture the highly probable influence of links farther away.

Naively feeding in the raw adjacency matrix was the most space-inefficient approach [40,41]. We could not even compare the prediction performance with such approaches, as they quickly encountered out-of-memory errors when dealing with Seoul's large-scale road system. Our approach is agnostic to the scale of the network and even to changes in its structure. Regardless of the scale and any changes, we run the aggregation functions to compute a fixed-length input feature vector concerning the traffic flow centrality. Therefore, we do not need to restructure the neural network architecture upon changes to the traffic system (i.e., the addition/deletion of links or changes to adjacent links).
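A minimal sketch of this scale-agnostic aggregation: however many links are reachable, a fixed set of aggregation functions reduces their feature values to a vector of constant length, so the input dimensionality of the neural network never changes. The particular aggregates shown (mean, min, max, sum) are illustrative, not necessarily the exact set we used:

```python
import numpy as np

def aggregate_features(values):
    """Reduce a variable-size collection of neighbour feature values
    (e.g., Z values of reachable links) to a fixed-length vector."""
    v = np.asarray(values, dtype=float)
    if v.size == 0:
        return np.zeros(4)  # same length even when no links are reachable
    return np.array([v.mean(), v.min(), v.max(), v.sum()])
```

Both `aggregate_features([1, 2, 3])` and an input with thousands of values yield a length-4 vector, which is what makes the downstream network architecture independent of the traffic system's size.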

However, we dealt with only a fragment of the entire national traffic system of South Korea. The traffic networks in Seoul are connected to the systems of other districts, such as Gyeonggi Province and the Incheon metropolitan area. Due to this fragmented view, we inadvertently identified the links that bridge different traffic networks as dead ends, as shown in Figure 11. We could not accurately reflect the traffic flow centrality for these dead-end links: the *Zout* value was zero because there is no way out, and the *Zin* value was not credible because the inbound traffic from other regions was not accounted for. The inaccurate *Z* values on the dead ends may negatively impact the neighboring links' *Z* values.

To address this issue, we applied the average *Z* value of the whole system as the *Z* value of the dead-end links, which still cannot be viewed as an ideal solution. This motivated us to venture into applying our approach to speed prediction for all the links in the entire national traffic system. By expanding the view of the traffic network, we expect the accuracy to improve further. We plan to do so once we are given integrated traffic information from all Korean regions.
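The stop-gap for dead-end links can be sketched as follows; the dictionary format, and the choice to average over the non-dead-end links only, are our assumptions for illustration:

```python
def impute_dead_end_z(z_values, dead_ends):
    """Replace the Z value of each dead-end link with the system-wide
    average computed over the remaining (non-dead-end) links.
    `z_values` maps link id -> Z value; `dead_ends` is a set of link ids."""
    valid = [z for link, z in z_values.items() if link not in dead_ends]
    avg = sum(valid) / len(valid)
    return {link: (avg if link in dead_ends else z)
            for link, z in z_values.items()}
```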

**Figure 11.** Bridge links between the traffic networks subject to analysis and out-of-range traffic networks accidentally recognized as dead-end links.
