*3.1. Link Embedding*

Feature representation of a link proceeds in three steps. First, we abstract the traffic system as a link adjacency matrix *A*. The traffic network illustrated in Figure 2A is abstracted as a directed graph, as shown in Figure 2B, with every directional link uniquely identified. Figure 3A shows the link adjacency matrix for the sample traffic system in Figure 2A. *Alilj* denotes whether traffic can flow from link *li* to link *lj*. For instance, in the example in Figure 2A, *Al2l3* = 1 since traffic can flow from *l*2 to *l*3; otherwise, *Al2l3* equals 0. We exclude the links through which no traffic can reach any other part of the traffic system; for instance, *l*1 and *l*16 are disconnected from the rest of the traffic system. Figure 3B illustrates the matrix of distances between every pair of adjacent links, and, given *Alilj*, Figure 3C shows the speed of the traffic flowing from *li* to the adjacent link *lj*.
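
The abstraction above can be sketched as follows. This is a minimal toy example, not the paper's data: the link numbering, distances, and speeds are hypothetical, chosen only to mirror the *Al2l3* = 1 example and the exclusion of disconnected links.

```python
# Toy link-level abstraction: A[i][j] = 1 when traffic can flow from link i to
# link j, with companion matrices for the distance and speed between adjacent
# links (cf. Figure 3A-C). All values below are illustrative.
NUM_LINKS = 6
INF = float("inf")

# Hypothetical connectivity (0-based link indices); (1, 2) mirrors A_l2l3 = 1.
edges = {(1, 2), (2, 3), (3, 4), (1, 4), (4, 5)}

A = [[1 if (i, j) in edges else 0 for j in range(NUM_LINKS)] for i in range(NUM_LINKS)]

# Distance (m) and observed speed (km/h) between adjacent links only.
D = [[INF] * NUM_LINKS for _ in range(NUM_LINKS)]
V = [[0.0] * NUM_LINKS for _ in range(NUM_LINKS)]
for (i, j), dist, speed in [((1, 2), 250, 40.0), ((2, 3), 300, 35.0),
                            ((3, 4), 180, 50.0), ((1, 4), 400, 30.0),
                            ((4, 5), 220, 45.0)]:
    D[i][j], V[i][j] = dist, speed

# Exclude links with no connection to the rest of the network, mirroring the
# removal of l1 and l16 in the example.
connected = [i for i in range(NUM_LINKS)
             if any(A[i][j] or A[j][i] for j in range(NUM_LINKS))]
```

Here link 0 has no inbound or outbound edge, so it is dropped from `connected`, just as *l*1 and *l*16 are excluded from the example network.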

**Figure 3.** Matrices representing adjacency, distance, and information about reachable paths between links.

Second, we conduct a traffic flow reachability analysis. For every link *l*, we extract all inbound multi-hop paths through which traffic traverses towards *l*. Conversely, we compute all outbound multi-hop paths through which traffic originating from *l* diffuses. Given *Alilj*, we run the Floyd–Warshall algorithm [20] to obtain the minimum time it takes to travel from every source link *li* to every other destination link *lj*, as shown in Figure 3F. We also generate matrices of hop counts and minimum distances from every source link *li* to every other destination link *lj*, as shown in Figure 3D,E, respectively. For instance, the traffic on *l*5 (at Hop 3) takes *l*6 (at Hop 2) to reach *l*9 at time *t*2 in Figure 4B, as opposed to taking *l*4 (at Hop 1) at time *t*1 in Figure 4A, due to the heavy congestion on *l*4. After obtaining the paths, we discard the links that traffic cannot reach before the beginning of the next time window. For example, if the traffic on *l*9 is too slow to reach the remote link *l*15 by the end of the current time window, then we discard *l*15 from the reachable outbound paths of *l*9, as shown in Figure 5.
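
The reachability analysis can be sketched with a plain Floyd–Warshall pass that tracks both minimum travel time and hop count, followed by the time-window pruning step. The helper names and the toy travel-time matrix are assumptions for illustration; the real inputs come from the matrices in Figure 3.

```python
# Floyd-Warshall over direct travel times (distance / speed between adjacent
# links), also accumulating hop counts along the chosen shortest paths.
INF = float("inf")

def floyd_warshall(time):
    """time[i][j]: direct travel time from link i to link j (INF if not adjacent).
    Returns (min travel time matrix, hop count matrix)."""
    n = len(time)
    t = [row[:] for row in time]
    h = [[1 if t[i][j] < INF else 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if t[i][k] + t[k][j] < t[i][j]:
                    t[i][j] = t[i][k] + t[k][j]
                    h[i][j] = h[i][k] + h[k][j]
    return t, h

def reachable(t, src, window):
    """Keep only destinations that traffic from `src` can reach within the
    current time window, mirroring the pruning of l15 from l9's paths."""
    return [j for j, tt in enumerate(t[src]) if 0 < tt <= window]
```

For example, with direct times 0→1 of 2 and 1→2 of 3 but a slow direct edge 0→2 of 10, the algorithm prefers the two-hop route (total time 5), and a window of 4 prunes link 2 from link 0's reachable set.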

$$P\_{in} = \sum\_{i \in R} C\_{in\_i} \quad , \quad P\_{out} = \sum\_{i \in R} C\_{out\_i} \tag{1}$$

$$\rho F\_{in} = \sum\_{i \in R} \frac{v\_{in\_i}}{f\_{in\_i} d\_{in\_i}} \quad , \quad \rho F\_{out} = \sum\_{i \in R} \frac{v\_{out\_i}}{f\_{out\_i} d\_{out\_i}} \tag{2}$$

$$L\_{x\_{in}} = \frac{\rho F\_{in}}{\sum\_{i \in R} \rho F\_{in\_i}} \quad , \quad L\_{x\_{out}} = \frac{\rho F\_{out}}{\sum\_{i \in R} \rho F\_{out\_i}} \tag{3}$$

$$Z\_{in} = \ln\Big(\sum\_{n \in N\_{in}} \frac{L\_{n\_{in}}}{e^{hop\_n}}\Big) \quad , \quad Z\_{out} = \ln\Big(\sum\_{n \in N\_{out}} \frac{L\_{n\_{out}}}{e^{hop\_n}}\Big) \tag{4}$$

$$Z\_{in} = \frac{Z\_{l\_{in}}}{\sum\_{l \in N} Z\_{l\_{in}}} \quad , \quad Z\_{out} = \frac{Z\_{l\_{out}}}{\sum\_{l \in N} Z\_{l\_{out}}} \tag{5}$$

$$|Z\_{in} - Z\_{in}^{\prime}| < \varepsilon \quad , \quad |Z\_{out} - Z\_{out}^{\prime}| < \varepsilon \tag{6}$$

**Figure 4.** Extraction of inbound reachable paths and the computation of the inbound Z value for *l*9.

In the final step, we compute the *Z* value for every link, which we refer to as the traffic flow centrality. Given the matrix of minimum travel times from source link *li* to destination link *lj* (Figure 6A), we count the number of inbound and outbound reachable paths (*R*), yielding *Pin* and *Pout* for every link using Equation (1), as illustrated in Figure 6B. We also compute the speed and distance between every pair of adjacent links on the reachable paths. Equation (2) defines *ρF* as the weighted sum of the average traffic speed (*v*) on every link *i* on the multi-hop reachable paths. The weight of each intermediate link *i* is the inverse of the product of the fanout *f* and the distance *d* to *i*, where *f* represents the number of alternate paths at a junction. The weight represents the impact on a given link of the traffic that is either destined to it or stems from it, and we expect this impact to be sensitive to the *f* and *d* values. For instance, traffic on links with higher *f* and *d* values is less likely to move towards a target link *l* than traffic on links with lower *f* and *d* values, because it is more likely to veer away by taking different turns at junctions or to halt at some point along the path. As an example, the computation of the inbound and outbound *ρF* values for link *l*9 is illustrated in Figure 6C,D. These aggregation steps account for the traffic flow dynamics around every link.
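
One plausible reading of Equations (1) and (2) can be sketched as follows; the per-path data structure and the toy speeds, fanouts, and distances are assumptions made for illustration, not values from the paper.

```python
# Sketch of Equations (1) and (2) for a single link. Each reachable path is a
# list of (speed v_i, fanout f_i, distance d_i) tuples, one per intermediate
# link on the path.

def path_count(paths):
    # Equation (1): P is the number of reachable paths R for the link.
    return len(paths)

def rho_f(paths):
    # Equation (2): weight each intermediate link's speed by 1 / (f_i * d_i),
    # so traffic on high-fanout, far-away links contributes less impact.
    return sum(v / (f * d) for path in paths for (v, f, d) in path)

# Hypothetical inbound paths towards one link (e.g. l9).
inbound_paths = [
    [(40.0, 2, 250.0), (35.0, 3, 300.0)],   # a two-hop path, e.g. l5 -> l6 -> l9
    [(30.0, 1, 400.0)],                     # a one-hop path, e.g. l4 -> l9
]
P_in = path_count(inbound_paths)
rhoF_in = rho_f(inbound_paths)
```

A link fed by a slow, high-fanout neighbor thus receives a smaller *ρF* contribution than one fed by a fast, low-fanout neighbor at the same distance, which is exactly the sensitivity to *f* and *d* described above.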

Note that adjacent links are inter-dependent with regard to computing their *Z* values. Due to this inter-dependence, we compute Equations (4) and (5) iteratively until the condition in Equation (6) holds, at which point the *Z* value has converged. We first re-scale the *ρF* values to *Lx* through normalization, as specified in Equation (3). Suppose a given link has sets of adjacent inbound and outbound neighbors *Nin* and *Nout*, respectively. We take the sums of *Lx*/*ehop* over every neighbor in *Nin* and *Nout*; the division by *ehop* reflects that the impact of a link's *Lx* value on its neighbor decays with the hop distance between them. We then apply the natural logarithm to the sums, as shown in Equation (4), and normalize the result, as defined in Equation (5). Whenever Equation (4) repeats, *Ln* is substituted by the *Z* value obtained in the previous iteration. The *ε* value in Equation (6) determines when *Z* has converged; we set it empirically to the value that leads to the most accurate link speed prediction. The iterative computation of the *Z* value for every link in a traffic network is illustrated in Figure 7.
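
The fixed-point iteration over Equations (4)–(6) can be sketched as below. This is a minimal sketch under stated assumptions: the graph, hop distances, and convergence threshold are hypothetical, and only one direction (e.g. inbound) is shown.

```python
# Iterative Z computation: each link's raw score is the log of its neighbors'
# current values discounted by e^hop (Equation (4)), normalized across links
# (Equation (5)), repeated until successive vectors differ by < eps (Equation (6)).
import math

def iterate_z(L, neighbors, hops, eps=1e-6, max_iter=100):
    """L: initial normalized per-link values (the L_x of Equation (3)).
    neighbors[i]: neighbor indices of link i; hops[(i, n)]: hop distance."""
    z = L[:]
    for _ in range(max_iter):
        raw = [math.log(sum(z[n] / math.exp(hops[(i, n)]) for n in neighbors[i]))
               for i in range(len(z))]
        total = sum(raw)
        new_z = [r / total for r in raw]
        if all(abs(a - b) < eps for a, b in zip(new_z, z)):
            return new_z          # Equation (6) satisfied: Z has converged
        z = new_z
    return z
```

On a symmetric two-link toy graph the iteration converges immediately to equal centralities, which matches the intuition that identical links should carry identical *Z* values.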

The traffic flow centrality captures the inter-link relationship, and we expect it to be one of the key factors for accurately predicting traffic speed. Intuitively, a link is likely to experience traffic swarming in and causing congestion when a large portion of its immediate neighbors also experience high inbound traffic through the reachable paths. Conversely, a link surrounded by links that spread traffic out quickly is likely to disseminate its own traffic more easily.

**Figure 7.** Iterative computation of traffic flow centrality for every link in the traffic network.

We embed the seven features (*V*, *Pin*, *Pout*, *ρFin*, *ρFout*, *Zin*, *Zout*) into a vector for every link, where *V* is the traffic speed. We make this link embedding context-aware by concatenating an additional vector of external conditions surrounding the link: temperature, precipitation, time of day, day of the week, and an indication of whether the day is a public holiday. The steps for generating the final input matrix fed into a neural network for speed prediction are illustrated in Figure 8.
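
The per-link embedding can be sketched as a simple concatenation; the field order and the encodings of the external conditions (e.g. hour of day as an integer) are assumptions for illustration.

```python
# Per-link input vector: the 7 traffic features followed by the external
# context. Its length does not depend on how many neighbors the link has.

def embed_link(v, p_in, p_out, rho_in, rho_out, z_in, z_out, external):
    """Return the per-link input vector: 7 traffic features + external context."""
    return [v, p_in, p_out, rho_in, rho_out, z_in, z_out] + list(external)

# Hypothetical external conditions: temperature (C), precipitation (mm),
# hour of day, day of week, public-holiday flag.
external = [18.5, 0.0, 8, 2, 0]
vec = embed_link(42.0, 3, 4, 0.19, 0.22, 0.51, 0.49, external)
```

Because the neighborhood has already been summarized into *P*, *ρF*, and *Z*, the vector length stays fixed even when neighboring links are added or removed, which is the invariance property discussed next.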

No matter how many adjacent neighbors a link has, the length of the input vector remains invariant, making our solution resilient to changes such as the addition or deletion of neighboring links. Because we do not feed the entire adjacency matrix into the recurrent-neural-network-based prediction engine, our approach is also space-efficient: the input vectors contain only the essential information instead of the sparse raw adjacency matrix. With every link expressed through highly distinctive features, we expect better prediction results. Note that the input vectors are re-computed at every time window. In the following section, we introduce the method for modeling the transition of these features over time.

**Figure 8.** Generation of the input tensor reflecting the context-aware traffic flow centrality of every link.

### *3.2. Modeling the Temporal Patterns with Recurrent Neural Networks*

Given the input matrix generated at a time window, we predict the traffic speed of every link in the next time window using recurrent neural networks, such as GRU and LSTM. With the hidden layers, we can model a complex non-linear relationship between the input and output values. Figure 9 illustrates the example of feeding the input features of every link into an LSTM network and predicting its speed. The states summarized in the previous time window are fed into the block of hidden layers at the subsequent time window, alongside the updated input feature vectors of every link. By training this network, we model the temporal transition of the spatial information, such as the structure of the traffic system, the traffic flow dynamics, and the external conditions, and correlate the traffic speed of each link with the temporal transition of this comprehensive set of features.
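
To make the recurrence concrete, the core LSTM update can be sketched in plain Python for a hidden size of one. This is a didactic sketch, not the paper's trained model: the zero-initialized weights and the scalar hidden state are assumptions chosen for readability.

```python
# One LSTM step: the previous hidden and cell states (summarizing earlier time
# windows) are combined with the new per-link feature vector x to produce the
# updated states, from which the next-window speed would be predicted.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """W maps gate name ('i', 'f', 'o', 'g') -> (input weights wx, recurrent
    weight wh, bias b); hidden size is 1, so h_prev and c_prev are scalars."""
    gates = {}
    for name in ("i", "f", "o", "g"):
        wx, wh, b = W[name]
        s = sum(w * xi for w, xi in zip(wx, x)) + wh * h_prev + b
        gates[name] = math.tanh(s) if name == "g" else sigmoid(s)
    c = gates["f"] * c_prev + gates["i"] * gates["g"]   # updated cell state
    h = gates["o"] * math.tanh(c)                       # updated hidden state
    return h, c
```

Feeding the per-window feature vectors through such a step in sequence is what lets the network carry the summarized traffic state from one time window into the next.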

**Figure 9.** Using LSTM for modeling the correlation between the temporal patterns and the speed prediction at every time window.
