Article

Discovering Key Sub-Trajectories to Explain Traffic Prediction

1
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
2
Center for Spatial Information Science, The University of Tokyo, 4 Chome-6-1 Komaba, Meguro City, Tokyo 153-8505, Japan
*
Authors to whom correspondence should be addressed.
Sensors 2023, 23(1), 130; https://doi.org/10.3390/s23010130
Submission received: 26 October 2022 / Revised: 6 December 2022 / Accepted: 19 December 2022 / Published: 23 December 2022
(This article belongs to the Section Sensing and Imaging)

Abstract

Flow prediction has attracted extensive research attention; however, achieving both reliable efficiency and interpretability from a unified model remains a challenging problem. In the literature, the Shapley method offers a unified framework for interpreting predictions. Nevertheless, applying the Shapley value directly to traffic prediction raises certain issues. On the one hand, the correlation between the positive and negative regions of fine-grained interpretation areas is difficult to understand. On the other hand, computing Shapley values is an NP-hard problem with an enormous number of possibilities under grid-based interpretation. Therefore, in this paper, we propose Trajectory Shapley, an approximate Shapley approach that decomposes a flow tensor input into a multitude of trajectories and outputs each trajectory's Shapley value for a specific region. However, trajectories often appear at random, leading to instability in the interpretation results. We therefore propose a feature-based submodular algorithm to summarize representative Shapley patterns. The summarization method can quickly generate a summary of the Shapley distributions over all trajectories so that users can understand the mechanisms of the deep model. Experimental results show that our algorithm can find multiple traffic trends on different arterial roads, together with their Shapley distributions. Our approach was tested on real-world taxi trajectory datasets and outperformed explainable baseline models.

1. Introduction

With the development of wireless communication and location acquisition, people can easily acquire their location by using a smartphone with the Global Positioning System (GPS), which has resulted in a massive amount of fragmentary spatio-temporal (ST) data [1]. Therefore, taking full advantage of using such mobile data is key to meeting human mobility demands. In recent years, many researchers have studied ST data, such as crowd flow and traffic flow. Deep ST neural networks (e.g., ST-resnet [2], DeepST [3], DMVST-Net [4], STDN [5]) have demonstrated that deep networks can take maximum advantage of ST data for prediction.
Although those approaches predict future traffic flow with high accuracy, they are based on deep learning and involve stacked nonlinear operations, which are unexplainable and impede their deployment in cities. To understand such black box systems, great achievements have been made in recent years in convolutional neural network (CNN) visualization and interpretation. Saliency maps [6], a gradient-based method, back-propagate through the entire model from the output to the input and exhibit a correlation score between each grid square of the input and the output. Integrated gradients [7] analyze a wide range of outputs and solve the problem of gradient saturation. SmoothGrad [8] removes noise during visualization.
In the meantime, various methods exist to interpret neural networks by extracting features in the field of image recognition, owing to the high dimensionality of the pixel space. LIME [9] and kernel SHAP [10] combine image segmentation with a transformation into superpixels, explaining each superpixel through ablation. Time2graph [11] extracts time-aware shapelets [12] using a two-level timing factor; by extracting the key timing signals, Time2graph constructs a shapelet evolution graph and successfully detects abnormal time series. Activation maximization [6] finds the input pattern that maximizes the activation value of a given hidden layer and uses the hidden layer to extract features. Although the above methods successfully explain models in their respective domains, they may be unfit for crowd prediction due to the different definition of the feature space. The crowd flow tensor G is defined as the summation of all independent trajectories (see Definition 1), so the input for crowd prediction is additive (see Definition 2), which differs from other domains. In this paper, we intend to combine trajectories with the Shapley method and produce a relevant Shapley value for each trajectory. The Shapley value was first proposed by Shapley in the field of game theory and has recently been applied to explain neural networks [10]. The Shapley method is NP-hard, meaning that it is impractical to enumerate all possibilities. Our key idea is to separate G into multiple trajectories and explain how each trajectory performs independently. This solution converts the flow tensor into the sum of multiple tracks, which reduces the computational complexity from $O(2^{d \times H \times W})$ to $O(2^N)$, where $d$ is the number of history time slots, $H$ is the height of the flow tensor, $W$ is the width of the flow tensor, and $N$ is the number of trajectories.
However, despite reducing the solution space significantly, the trajectory space remains immense because, in real life, there are millions of cars on the road.
To address the above-mentioned challenges, in this paper, we combine the Shapley method with the trajectory tensor and propose a novel approach called Trajectory Shapley, which can compute approximate trajectory Shapley values with time complexity $O(N)$. Moreover, to address the chaotic distribution of trajectories and to find explainable common trajectory patterns, we need to divide each trajectory into many small sub-trajectories, because long-distance trajectory patterns are random and difficult to explore. Segmenting trajectories into sub-trajectories helps us eliminate unimportant and redundant fragments. Then, we use a submodular method to find K representative trajectories. Each representative trajectory represents a set of trajectories and a Shapley distribution. Our goal is for the distributions of different subsets to be as scattered as possible, to show that we have found representative trajectories. A trajectory selection example is shown in Figure 1.
Our contributions to the field are as follows:
(1)
We propose Trajectory Shapley, a method that can effectively extract features from in–out flow and interpret neural networks. As far as we know, we are the first to introduce the Shapley value into crowd prediction;
(2)
In order to understand the patterns hidden in randomly distributed trajectories, we use a submodular method to discover key sub-trajectories that are representative of a certain distribution;
(3)
We validate the effectiveness of our approach on two real-world public datasets. Experimental results show that our approach achieves notably better performance in the aspects of coverage and summarization.

2. Architecture

Figure 2 shows the architecture of our explanatory process and the mining of the sub-trajectory correlation, which comprises two parts: data processing and model training, and maximum explainability coverage. The first part generates the flow tensor G and the trajectory flow tensor T in Definition 1 and Definition 2, respectively. The second part computes the trajectory Shapley values and finds the most representative K sub-trajectories through summarization.
Data processing and model training: Given multiple users' GPS logs, we build two types of data: a flow tensor and a trajectory flow tensor. The flow tensor is generated to train a deep model, as in previous approaches. For explanation, we extract the trajectory flow tensor T from G. Note that the time and space complexity when using the input T is N times greater than when using G.
Maximum explainability coverage: There are four parts of this architecture: model output, Trajectory Shapley, Trajectory Shapley subset, and trajectory segment. The model output represents the deep model output with a summation of the trajectory flow tensor Equation (1). Trajectory Shapley is produced by grid-based Shapley values; see Section 4.1. The purpose of the Trajectory Shapley subset is to reduce the explanation space. We use the receptive field of the model to screen the subsets of trajectories to be explained. For details, see Section 4.2. The purpose of Shapley segmentation is to generate the solution space to discover the pattern of explainable common trajectories; see Section 4.3. The chaotic Trajectory Shapley distribution is summarized to provide a clear explanation for users; see Section 4.4.

3. Preliminaries

Definition 1
(Inflow and outflow [3]). Let $\mathbb{P}$ be the set of trajectories in the $t$-th time interval. For a grid of inflow and outflow matrices with $i$ rows and $j$ columns, the inflow $g_t^{in,i,j}$ and outflow $g_t^{out,i,j}$ of the crowds are defined as
$$g_t^{in,i,j} = \sum_{Tr \in \mathbb{P}} \big|\{\, m > 1 \mid v_{m-1} \notin (i,j) \wedge v_m \in (i,j) \,\}\big|, \qquad g_t^{out,i,j} = \sum_{Tr \in \mathbb{P}} \big|\{\, m \ge 1 \mid v_m \in (i,j) \wedge v_{m+1} \notin (i,j) \,\}\big|$$
where $Tr : v_1 \to v_2 \to \cdots \to v_{|Tr|}$ is a single trajectory in $\mathbb{P}$; $v_m \in (i,j)$ means that the point $v_m$'s coordinates lie in the region of $g_t^{i,j}$, and $|\cdot|$ denotes the cardinality of a set.
The inflow and outflow matrices are mixtures: given an area to explain, it is difficult to attribute the contribution of each region. Therefore, we extract the features from G following Definition 1. An equivalent definition for Definition 1 is
Definition 2
(Trajectory flow splicing). Let G be the flow matrix over the whole time range. Each trajectory can be split by time interval into a tensor. Let $\Omega$ be the set of all trajectories; $T_r \in \Omega$ denotes a trajectory. $T_r^{in}$ and $T_r^{out}$ refer to a transfer representation with the following constraint:
$$T_r^{in,i,j} = \big|\{\, m > 1 \mid v_{m-1} \notin (i,j) \wedge v_m \in (i,j) \,\}\big|, \qquad T_r^{out,i,j} = \big|\{\, m \ge 1 \mid v_m \in (i,j) \wedge v_{m+1} \notin (i,j) \,\}\big|$$
Therefore, $G^{in}$ and $G^{out}$ are defined as
$$G^{in} = \sum_{T_r^{in} \in \Omega} T_r^{in}, \qquad G^{out} = \sum_{T_r^{out} \in \Omega} T_r^{out}$$
Figure 3 shows the aggregation process.
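To make the two definitions concrete, the following sketch (the helper names are ours, not the paper's) rasterizes grid-cell trajectories into in/out flow tensors and verifies the additivity that Definition 2 relies on:

```python
import numpy as np

def trajectory_to_tensor(traj, H, W):
    """Rasterize one trajectory (a sequence of (row, col) grid cells)
    into a (2, H, W) tensor: channel 0 counts inflow events and
    channel 1 counts outflow events, in the spirit of Definition 2."""
    T = np.zeros((2, H, W))
    for m in range(1, len(traj)):
        (pi, pj), (ci, cj) = traj[m - 1], traj[m]
        if (pi, pj) != (ci, cj):        # a cell boundary was crossed
            T[1, pi, pj] += 1           # outflow from the previous cell
            T[0, ci, cj] += 1           # inflow into the current cell
    return T

def flow_tensor(trajectories, H, W):
    """G is additive: the sum of all independent trajectory tensors."""
    return sum(trajectory_to_tensor(t, H, W) for t in trajectories)
```

The additivity $G = \sum_{T_r \in \Omega} T_r$ is what later lets the model input be decomposed trajectory by trajectory.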
Definition 3
(Flow prediction). Given the historical observations $\mathcal{G}_t = \{ V_i \mid i \in [t-n+1, t] \}$, predict $Y_{t+1}$:
$$f : \mathcal{G}_t \to Y_{t+1}$$
where $f$ is a neural network, $n \in \mathbb{N}$ denotes the length of the input timestamps, and $V_t \in \mathbb{R}^{1 \times 2 \times W \times H}$ is the one-frame inflow and outflow. $W$ and $H$ denote the region size.
Definition 4
(Shapley values [13]). The Shapley value is defined via a value function (val) of the players in $S$: a feature's Shapley value is its contribution to the payout, weighted and summed over all possible feature value combinations:
$$\phi_j(val) = \sum_{S \subseteq \{x_1, \ldots, x_p\} \setminus \{x_j\}} \frac{|S|!\,(p - |S| - 1)!}{p!} \big( val(S \cup \{x_j\}) - val(S) \big)$$
where $S$ is a subset of the features used in the model, $x$ is the vector of feature values of the instance to be explained, and $p$ is the number of features. $val_x(S)$ is the prediction for the feature values in set $S$. Ref. [14] showed that the Shapley value is the only attribution scheme satisfying the Shapley fairness axioms.
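For intuition, the formula above can be evaluated exactly for a toy game by brute-force enumeration (a sketch of ours; the $2^p$ subset loop is precisely what makes direct grid-based Shapley computation intractable):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, val):
    """Exact Shapley values by enumerating every coalition S:
    phi_j = sum_S |S|! (p-|S|-1)! / p! * (val(S u {j}) - val(S)).
    Feasible only for a handful of players."""
    p = len(players)
    phi = {}
    for j in players:
        others = [x for x in players if x != j]
        total = 0.0
        for r in range(p):
            for S in combinations(others, r):
                w = factorial(r) * factorial(p - r - 1) / factorial(p)
                total += w * (val(set(S) | {j}) - val(set(S)))
        phi[j] = total
    return phi
```

For an additive game, each player's Shapley value is its own contribution, which is the behavior the paper exploits when splitting the additive flow tensor into trajectories.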

4. Trajectory Shapley

In this section, we present a novel algorithm for computing the Trajectory Shapley value. We name the proposed framework Trajectory Shapley, as it combines trajectory flow tensors and Shapley values.

4.1. Trajectory Shapley

While extracting the trajectory flow tensor reduces the computational complexity from $O(2^{d \times H \times W})$ to $O(2^N)$ within a certain time slot, there may be millions of trajectories in a city, so the computational cost remains large. Fortunately, Deep SHAP, a high-speed approximation algorithm for SHAP values in deep learning models that builds on a connection with DeepLIFT, allows the Trajectory Shapley value to be obtained with $O(N)$ complexity. Following Definition 3, in crowd prediction, Deep SHAP can be formulated as
$$\phi(V_t) = (Y_{t+1} - \mathbb{E}[Y_{t+1}]) \times \frac{\partial Y_{t+1}^{x,y}}{\partial V_t}, \quad V_t \in \mathcal{G}_t, \tag{3}$$
where $Y_{t+1}^{x,y}$ denotes the region at coordinates $(x, y)$ in the model output, and $\mathbb{E}[Y_{t+1}]$ is the expectation of the output, which can be approximated from the background samples. Therefore, we have the region Shapley $\phi(V_t)$. According to the chain rule, Definition 2, and Equation (3), we can obtain the Trajectory Shapley $\phi(T_i)$ with
$$\phi(T_i) = (Y_{t+1} - \mathbb{E}[Y_{t+1}]) \times \frac{\partial Y_{t+1}^{x,y}}{\partial T_i} = (Y_{t+1} - \mathbb{E}[Y_{t+1}]) \times \frac{\partial Y_{t+1}^{x,y}}{\partial V_t} \times \frac{\partial V_t}{\partial T_i} = \phi(V_t) \times \frac{\partial V_t}{\partial T_i} \tag{4}$$
Therefore, the process of obtaining $\phi(T_i)$ can be divided into two steps: (1) compute the region Shapley $\phi(V_t)$; (2) calculate the Trajectory Shapley $\phi(T_i)$ from $\phi(V_t)$. One of the benefits of using trajectories to explain traffic forecasting is attribution. According to Definition 1 and Figure 4, we know that there are four trajectories flowing into the two opposite grids, but we do not know where they come from. While using Gradient × Input can eliminate a significant amount of noise, such as the areas that no track passes through, some small particles are inevitably retained, because Gradient × Input discards the information on where each trajectory came from. On the contrary, with Trajectory Shapley, the prior information from the GPS logs helps to attribute each trajectory, and the interpretation results are easy to understand. Here, we give the definition of the Shapley flow.
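The two-step procedure can be sketched as follows. How $\phi(V_t)$ itself is produced (e.g., by Deep SHAP) is treated as external here, and distributing the region attribution to trajectories in proportion to their per-cell share of the flow is our reading of Equation (4) for an additive input; the function name is ours:

```python
import numpy as np

def trajectory_shapley(phi_V, traj_tensors, eps=1e-12):
    """Step 2 of the method: given the region Shapley map phi_V
    (shape (2, H, W)) and the trajectory tensors whose sum is the
    flow tensor V_t (Definition 2), attribute phi_V to each
    trajectory by its per-cell share of the total flow."""
    V = sum(traj_tensors)                # V_t = sum_i T_i
    per_unit = phi_V / (V + eps)         # attribution per unit of flow
    return [float((per_unit * T).sum()) for T in traj_tensors]
```

By construction, the trajectory values sum (up to `eps`) to the total region attribution, mirroring the additivity axiom.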
Definition 5
(Shapley Flow).Given a trajectory T, the Shapley flow refers to the spatial and temporal contribution of the trajectory for a certain region in the model output.
See Algorithm 1 for an overview of Trajectory Shapley.
Algorithm 1 Trajectory Shapley
Input:
  Randomly selected background samples $S = \{G_1, \ldots, G_{|V|}\}$,
  Explained region coordinates $(x, y)$ in terms of the grid,
  Set of trajectories $P = \{T_1, \ldots, T_N\}$ in time slot $t$,
  Flow tensor $G_t$, pretrained model $f$.
Output: Trajectory Shapley set $M$
1: Initialize $\phi(G_t) = 0$, $M = \emptyset$
2: Generate the output $Y_{t+1} = f(G_t)$
3: Obtain $\mathbb{E}[Y_{t+1}] = \frac{1}{|V|} \sum_{G_i \in S} G_i$
4: Calculate $\phi(G_t)$ using Equation (3)
5: for all trajectories $T_i$ in $P$ do
6:   Calculate $\phi(T_i)$ using Equation (4)
7:   Add $\phi(T_i)$ to $M$
8: end for
9: return $M$

4.2. Maximum Explainability Coverage

The Trajectory Shapley values obtained in Section 4.1 are difficult to analyze directly because of the large number of trajectories and their chaotic distribution (see Section 5.2.1). At the same time, the limited number of roads means that few distinct paths are available to trajectories, and trajectory flows are shaped by the time of day, as in the morning and evening peaks, which provides the opportunity to discover special patterns. Many trajectories are therefore redundant. To give users an intuitive explanation, in this section, we discover key sub-trajectories that represent the other trajectory signals.

4.3. Trajectory Segment

To discover representative sub-trajectories, we need to discretize trajectories to find the common Shapley flow. For example, in Figure 1, three trajectories converge from three directions and then separate at the crossroads. With such driving patterns, it is difficult to say which whole trajectory is representative. However, after segmentation, if we treat each sub-trajectory as independent, we can easily find a common Shapley flow. Fortunately, segmentation does not change the output of the neural network, due to additivity. The trajectory discrete equivalence theorem is as follows.
Theorem 1
(Trajectory Discrete Equivalence). Take a neural network f and a set of trajectory tensors $\Omega$, where each trajectory tensor $T_r \in \Omega$ is segmented into multiple parts $t_{r_i}$ under the constraint $T_r = \sum_i t_{r_i}$. Then, the neural network f has the same output:
$$f\Big(\sum_r T_r\Big) = f\Big(\sum_r \sum_i t_{r_i}\Big)$$
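Theorem 1 holds because segmentation leaves the summed input unchanged, so it is true for any model $f$; here is a quick numerical check with an arbitrary nonlinear stand-in for the network (our choice, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda G: float(np.tanh(G).sum())   # stand-in for any model of the summed input

T_r = rng.random((2, 4, 4))             # one trajectory tensor
segments = [0.3 * T_r, 0.7 * T_r]       # segmented so that T_r == sum(segments)
rest = rng.random((2, 4, 4))            # the remaining trajectories, summed

# The summed input is identical, hence so is the output of f.
assert np.isclose(f(rest + T_r), f(rest + sum(segments)))
```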
The segmentation of trajectories can be obtained by executing the approximate trajectory partitioning algorithm [15]. We assume that each segment is an independent trajectory; in other words, a trajectory in the submodular method is a set of line segments. According to Theorem 1, we can expand the Trajectory Shapley of Equation (4) to the sub-trajectories. After rerunning Trajectory Shapley, we obtain the Shapley value of each sub-trajectory. The two distances used in segmentation and the submodular method, the perpendicular distance and the angle distance, are defined as follows.
Definition 6
(Perpendicular distance). Suppose the projections of the points $S_a$ and $S_b$ onto $L_i$ are $P_a$ and $P_b$, respectively; $l_1$ is the Euclidean distance between $S_a$ and $P_a$, and $l_2$ is that between $S_b$ and $P_b$. The perpendicular distance is defined in Formula (6); Figure 5 shows the semantics of the perpendicular distance.
$$d_\perp(L_i, L_j) = \frac{l_1^2 + l_2^2}{l_1 + l_2}$$
Definition 7
(Angle distance). The angle distance between $L_i$ and $L_j$ is defined in Formula (7), where $\|L_j\|$ is the length of $L_j$ and $\theta$ ($0^\circ \le \theta \le 180^\circ$) is the smaller intersecting angle between $L_i$ and $L_j$. Figure 5 shows the semantics of the angle distance.
$$d_\theta(L_i, L_j) = \begin{cases} \|L_j\| \times \sin(\theta), & 0^\circ \le \theta < 90^\circ \\ \|L_j\|, & 90^\circ \le \theta \le 180^\circ \end{cases}$$
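Formulas (6) and (7) can be sketched directly; the projection lengths $l_1$, $l_2$ and the angle $\theta$ are assumed to be precomputed from the segment geometry, and the helper names are ours:

```python
import math

def perpendicular_distance(l1, l2):
    """Formula (6): d = (l1^2 + l2^2) / (l1 + l2), where l1 and l2 are
    the Euclidean distances from a segment's endpoints to their
    projections onto the other segment."""
    return 0.0 if l1 + l2 == 0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)

def angle_distance(length_j, theta_deg):
    """Formula (7): ||L_j|| * sin(theta) for theta < 90 degrees,
    ||L_j|| otherwise (theta is the smaller intersecting angle)."""
    if 0 <= theta_deg < 90:
        return length_j * math.sin(math.radians(theta_deg))
    return float(length_j)
```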

4.4. Trajectory Shapley Maximum Coverage

Maximum coverage functions aim to maximize the number of features that have a non-zero element in at least one selected example; there is no marginal benefit to observing a variable in two examples. If each variable is thought of as an item in a set, and the data are a binary matrix where 1 indicates that the item is present in the example and 0 indicates that it is not, optimizing a maximum coverage function solves the set coverage problem. These functions are useful when the space of variables is massive and each example only includes a small subset of them, a common situation when analyzing text data, where the variables are words. The maximum coverage function is an instance of a feature-based function whose concave function is the minimum $\min(1, \cdot)$; maximum coverage is thus a special case of submodular maximization. Here, we give the definition of Trajectory Shapley maximum coverage.
Definition 8
(Trajectory Shapley maximum coverage). Take a set of sub-trajectories $\Omega$ and their Shapley distribution $D_{total}$. Trajectory Shapley maximum coverage finds the $K$ representative trajectories and $K$ sub-distributions that minimize
$$d_t(D_{total}, D_{union}) - \sum_{i}^{K} \sum_{j}^{K} d_t(D_i, D_j)$$
where $D_i, D_j \in \mathcal{R}$ are sub-Trajectory-Shapley distributions, $\mathcal{R}$ is the set of $K$ sub-distributions, $D_{union}$ is the union of $\mathcal{R}$, and $d_t$ is a distance function between two distributions.
To achieve Trajectory Shapley maximum coverage in Definition 8, we divide the task into three parts: (1) similar Shapley flows should be as close as possible; (2) sub-trajectories with large Shapley values should be chosen as the representative trajectories; (3) trajectories in the same cluster should run in similar directions and be close to each other. For (1), we use the Euclidean distance to measure the segment Shapley distance:
$$d_s(\phi(L_1), \phi(L_2)) = \|\phi(L_1) - \phi(L_2)\|_2$$
For (2), on account of the large range of Trajectory Shapley values, we apply the sigmoid function to smooth them. Inspired by [16], we introduce a temperature parameter $T$ to adjust the sigmoid function, $\delta(\phi(L_i), T) = \frac{1}{1 + e^{-\phi(L_i)/T}}$, which modulates the weight of the Trajectory Shapley value and controls the distance between $L_i$ and $L_j$. We call $T$ the Shapley weight. The smaller the Shapley weight $T$, the larger the distance between a pair of Shapley trajectories. By adjusting $T$, we can easily control the Trajectory Shapley distribution.
With Definitions 6 and 7, for (3), we use the perpendicular and angle distance to regularize the direction and distance of trajectories in the same cluster.
To sum up, we provide the asymmetric distance formula of the trajectory submodule for two sub-trajectories $L_1$ and $L_2$ with respect to a reference segment $L_r$:
$$d_{L_r}(L_1, L_2) = \big( \lambda_\theta\, d_\theta(L_1, L_2) + \lambda_\perp\, d_\perp(L_1, L_2) + \lambda_s\, d_s(\phi(L_1), \phi(L_2)) \big)\, \delta(\phi(L_r), T)$$
where $L_1, L_2 \in L_r$. The selection matrix for maximum coverage is obtained by setting the threshold parameter $\omega$. Through Theorem 2, an approximate solution of maximum coverage can be obtained by greedy maximum covering. Note that $\lambda_\theta$, $\lambda_\perp$, and $\lambda_s$ control the weights of the angle distance, perpendicular distance, and distribution distance, respectively.
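A sketch of the combined asymmetric distance, under our reading of Formula (9) in which the Shapley weight enters multiplicatively; the component distances are assumed precomputed, and the function names are ours:

```python
import math

def shapley_weight(phi_r, T):
    """Temperature-scaled sigmoid delta(phi, T) = 1 / (1 + exp(-phi / T))."""
    return 1.0 / (1.0 + math.exp(-phi_r / T))

def asymmetric_distance(d_theta, d_perp, d_shap, phi_r, T,
                        lam_theta=1.0, lam_perp=1.0, lam_s=1.0):
    """Weighted sum of angle, perpendicular, and Shapley-distribution
    distances, scaled by the Shapley weight of the reference
    segment L_r (whose Shapley value is phi_r)."""
    base = lam_theta * d_theta + lam_perp * d_perp + lam_s * d_shap
    return base * shapley_weight(phi_r, T)
```

As $T \to \infty$, $\delta$ flattens to $1/2$ for every segment, so the distance becomes symmetric, matching the observation in Section 5.3.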
Theorem 2.
The greedy algorithm for monotone submodular maximization is a (1-1/e) approximation.
Proof. 
We use the standard argument for greedy maximization of a monotone submodular function. Let $S_i$ denote the set of elements chosen by the algorithm after $i$ steps, and let $S^*$ be the set that maximizes $f$. Let $\Delta_i$ be the difference between the values of these two sets, i.e., $\Delta_i = f(S^*) - f(S_i)$. Our goal is to show that $\Delta_i / k \le \Delta_i - \Delta_{i+1}$; once this is established, unrolling the recursion over $k$ steps yields the $(1 - 1/e)$ bound.
To show this, let $K_i^* = \{y_1, \ldots, y_{|K_i^*|}\}$ be the set of elements included in $S^*$ but not in $S_i$ after $i$ steps. Since $f$ is submodular, we have the following inequality:
$$\sum_{j=1}^{|K_i^*|} \Big[ f\big(S_i \cup \{y_1, \ldots, y_j\}\big) - f\big(S_i \cup \{y_1, \ldots, y_{j-1}\}\big) \Big] \le \sum_{j=1}^{|K_i^*|} \Big[ f\big(S_i \cup \{y_j\}\big) - f(S_i) \Big]$$
where $\{y_1, \ldots, y_0\}$ is defined as the empty set. The left-hand side telescopes and therefore equals $f(S^* \cup S_i) - f(S_i)$. Since $f$ is also monotone, we have $f(S_i \cup S^*) \ge f(S^*)$. Thus, the inequality above implies that
$$\Delta_i = f(S^*) - f(S_i) \le \sum_{j=1}^{|K_i^*|} \Big[ f\big(S_i \cup \{y_j\}\big) - f(S_i) \Big] \le |K_i^*| \max_{j} \Big[ f\big(S_i \cup \{y_j\}\big) - f(S_i) \Big] \le k\,(\Delta_i - \Delta_{i+1})$$
where the last step follows from the fact that the algorithm chooses the element that increases the value of $f$ the most, and $|K_i^*| \le k$ (the optimal solution can pick at most $k$ elements). Thus $\Delta_i / k \le \Delta_i - \Delta_{i+1}$, as intended, and the $(1 - 1/e)$ bound follows.    □
See the formulation of the trajectory submodular framework in Algorithm 2.
Algorithm 2 Trajectory Shapley maximum coverage
Input:
  The Trajectory Shapley set $M$ calculated by Algorithm 1,
  Trajectory set $P = \{T_1, \ldots, T_N\}$ in time slot $t$,
  Submodular distance threshold $\omega$.
Output: $O$ = {$K$ representative sub-trajectories}
1: Initialize the Trajectory Shapley subset $S = \{\phi(T_i) \mid \phi(T_i) \ne 0, \phi(T_i) \in M\}$
2: Initialize the trajectory subset $K = \{T_i \mid T_i \in P, \phi(T_i) \in S\}$
3: Initialize the segment set $R = \{\}$, segment Shapley set $S' = \{\}$, and segment distance matrix $D$
4: for all trajectories $T_i$ in $P$ do
5:   Run the approximate trajectory partitioning algorithm for $T_i$
6:   Add the resulting segment set $Q$ to $R$
7:   for all segments $L_i$ in $Q$ do
8:     Rerun Algorithm 1 for $L_i$, adding $\phi(L_i)$ to $S'$
9:   end for
10: end for
11: for all segments $L_i$ in $R$ do
12:   for all segments $L_j$ in $R$ do
13:     Calculate $d_{L_i}(L_i, L_j)$ and $d_{L_j}(L_i, L_j)$ using Equation (9)
14:     Set $D(L_i, L_j) = d_{L_i}(L_i, L_j)$ and $D(L_j, L_i) = d_{L_j}(L_i, L_j)$
15:   end for
16: end for
17: Set $D(i, j) = 1$ if $D(i, j) \le \omega$ and $D(i, j) = 0$ if $D(i, j) > \omega$
18: Run the greedy max cover algorithm to find the $K$ common patterns
19: return $O$
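Steps 17 and 18 of Algorithm 2 amount to thresholding the distance matrix into a binary coverage matrix and running greedy maximum coverage over it; a minimal sketch (the function name is ours):

```python
import numpy as np

def greedy_max_cover(D, k):
    """Given a binary coverage matrix D (row i covers the columns where
    D[i, j] == 1), greedily pick up to k representative rows.  By
    Theorem 2, this greedy choice is a (1 - 1/e) approximation to the
    optimal cover."""
    D = np.asarray(D, dtype=bool)
    covered = np.zeros(D.shape[1], dtype=bool)
    chosen = []
    for _ in range(k):
        gains = (D & ~covered).sum(axis=1)   # newly covered columns per row
        best = int(gains.argmax())
        if gains[best] == 0:                 # nothing left to cover
            break
        chosen.append(best)
        covered |= D[best]
    return chosen, covered
```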

5. Experiments

In this paper, we used a large-scale online taxi request dataset collected from DiDi Chuxing, one of the largest online car-hailing companies in China. One dataset contains taxi requests from 1 November 2016 to 30 November 2016 for the city of Chengdu, divided into 38 × 36 regions of 0.450 km × 0.450 km each. The other dataset contains taxi requests from 1 October 2016 to 31 October 2016 for the city of Xi’an, divided into 39 × 32 regions of the same size. We used the Chengdu data from 1 November 2016 to 23 November 2016 for training (23 days) and the data from 24 November 2016 to 30 November 2016 (7 days) for testing. The Xi’an data from 1 October 2016 to 24 October 2016 were used for training (24 days), and the data from 25 October 2016 to 31 October 2016 (7 days) were used for testing. We used 10 min as the time interval.

5.1. Classic Prediction Methods for Comparison

We compared the proposed Trajectory Shapley model with three classic baselines, with these baselines trained on the Chengdu and Xi’an datasets:
CNN: We used a basic deep learning predictor constructed with four CNN layers. The 4D tensor is represented by (H, T, W, C). The CNN predictor utilizes four convolutional layers to take the currently observed t-step frames as input and predicts the next frame as output;
ST-GCN [17]: For ST-GCN, we set the adjacency matrix to have the same receptive field as the CNN. The receptive field was set on the basis of a grid. It is regulated by the distance parameter ω . The three layer channels in the ST-Conv block were 64, 64, and 64, respectively. Both the graph convolution kernel size K and temporal convolution kernel size K t were set to 3 in the model.
DNN: We flattened the in–out flow grids into vectors and used them as the input of the DNN. We also erased the time information in the DNN and used five layers of a fully connected network; the feature size of each layer was T × W × C.

5.2. Case Study

5.2.1. Case Study of Trajectory Shapley Visualization

Figure 6 shows the performance of Trajectory Shapley with the CNN, ST-GCN, and DNN on the same region. We use transparency to represent the Shapley value of each trajectory. The time slot we chose was from 8:00 to 8:50 for both Xi’an and Chengdu. The area selected in Xi’an was the overpass at the Chang’an interchange and Chang’an Road; in one day, 66 DiDi taxis passed by every 10 min. In Chengdu, we chose the intersection of the Second Ring Elevated Road and Fuqing Road, where speeds are high and the traffic flow is the largest. We chose these areas because their flows are the greatest over the whole day in the two cities; the model pays more attention to these areas due to the loss function, so the visualization areas are representative. We can see that with the CNN and ST-GCN, the receptive fields of the models are limited by the depth of the models and the size of the kernel, so their results are similar. However, with the DNN, perception is global because the DNN eliminates spatial information. The distribution of trajectories under each classical method is turbid and disordered; thus, we propose the summarization method to summarize the Shapley distribution.

5.2.2. Case Study of Explainable Summarization

We use an example to explain the process of mining representative trajectories. In this experiment, we tested the morning and evening peaks of taxi driving in Chengdu, from 8:00 to 8:50 and from 18:00 to 18:50, respectively, on 1 November 2016; there are 12,971 and 10,871 tracks in these time periods, respectively. We chose the same place in Chengdu as in Section 5.2.1. Due to space limitations, we only display the CNN model results here.

Subset Trajectories

By calculating the Shapley value of each trajectory with $O(N)$ time complexity, we can easily filter out a large number of irrelevant trajectories; we then retain 533 trajectories in the morning case, as shown in Figure 7b.

Subsets Segment

We used the trajectory segment approach to segment the 533 tracks in Figure 7b. Then, we recalculated the Shapley value of each line segment and filtered out the line segments with $\phi(L_i) = 0$. Finally, we obtained the sub-trajectory summarization set in Figure 7c; 487 segments were retained.

Trajectory Shapley Cover

We used Formula (9) to compute the asymmetric distance for each pair of sub-trajectories and set the 5% quantile as the distance coverage parameter $\omega$ to divide the coverage. The segment coverage matrix $D$ is binary, with entries in $\{0, 1\}$: each row and column represents a segment, 1 means that the segments cover each other, and 0 means that they are unrelated. Note that the segment coverage matrix is the input of summarization, and the goal is to select K sets that cover all samples. The result of trajectory coverage is shown in Figure 8. Figure 8c,f show the algorithm results for the morning and evening peaks, with the green line representing the whole set, the orange line representing the Trajectory Shapley maximum coverage algorithm, and the blue line representing a randomly selected set. It should be noted that there are several reasons why the coverage is not complete. (1) Our goal is to find the representative distributions and the common Shapley flow under those distributions, rather than to cover every sample; the coverage is therefore only made as complete as possible. (2) Coverage is influenced by the covering parameter $\omega$. (3) Even after segmentation, some segments still pass through different regions, such as in Clusters 1 and 2; such segments are difficult to cover, and their Shapley values are relatively random.

Common Shapley Flow

The common Shapley flow of a submodular cover describes the overall importance of the trajectory partitions that belong to the cluster. We need to extract quantitative information on the movement within a cluster such that domain experts can understand the movement in the trajectories. Thus, to gain full practical potential from trajectory clustering, a representative trajectory is required. Figure 8b,e show the distribution of the cluster. For the morning peak, we found three obvious patterns, while for the evening peak, we found two patterns. We can find that their distributions vary and try to cover different areas. In Figure 8a,d, we display the original Shapley trajectories and the union of cluster distribution. In the morning case, two distributions cover the main body, and one distribution covers the large Shapley value region. In the evening case, all distributions are focused on the main body region. Note that the distribution and coverage can be adjusted by five parameters, which are discussed in Section 5.3.

Result Analysis

The results of the spatial visualization are displayed in Figure 9. We use transparency to represent the Shapley value, as in Figure 6, and display the trajectory directions to reflect the trend of traffic flow. Since most people live in the suburbs, in the morning, the traffic flow mainly comes from North Star Road and then passes through Second Ring Road; this was successfully perceived by the neural network. Therefore, the distribution with the largest Shapley weight is the part with the blue sub-trajectories. On the contrary, during the evening rush hour, people move from the city to the suburbs, and the Second Ring traffic is successfully mined by the model. We only show the first two classes because the Shapley values in the latter classes are too small. The reason why the common Shapley flow is short and near the interpretation region is that it is affected by the perpendicular distance and the Shapley weight. On the one hand, if the common Shapley flow is very long, it is difficult to balance segments from all directions, which mainly affects the coverage. On the other hand, according to Equation (4), the closer a segment is to the interpretation area, the greater its weight, and our algorithm gives priority to trajectories with large Shapley values.

5.3. Parameter Analysis

We examine the sensitivities of five important hyperparameters: the angle-distance weight $\lambda_\theta$, the perpendicular-distance weight $\lambda_\perp$, the distribution-distance weight $\lambda_s$, the coverage parameter $\omega$, and the Shapley weight $T$. We use the coverage $P_{cover}$ to reflect the ability of the algorithm to cover samples, which can be written as
$$P_{cover} = \frac{\big|\bigcup_i U_i\big|}{N},$$
where $U_i$ denotes the cover set of the $i$-th sample, $N$ denotes the total number of samples, and $\cup$ denotes the union operation. Moreover, the common patterns for each coverage should intuitively be distinguishable; otherwise, they should be merged into one block. We therefore use the distance between different distributions to measure the influence of the parameters. Here, we introduce the Wasserstein distance
W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x, y) ∼ γ} [ ‖x − y‖ ],
where Π(P_r, P_g) is the set of all possible joint distributions whose marginals are P_r and P_g. For every possible joint distribution γ, we can sample (x, y) ∼ γ to obtain samples x and y and compute the expected distance E_{(x, y) ∼ γ}[‖x − y‖] under γ.
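For the one-dimensional empirical distributions of Shapley values compared here, this distance can be computed in closed form from the two samples' empirical quantile functions, as implemented in SciPy. The sketch below is a minimal illustration; the sample arrays standing in for the Shapley distributions of two patterns are our own assumption, not data from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two illustrative empirical 1-D distributions, e.g., the Shapley values
# attributed to the sub-trajectories of two candidate patterns.
rng = np.random.default_rng(0)
d_i = rng.normal(loc=0.0, scale=1.0, size=1000)
d_j = rng.normal(loc=0.3, scale=1.0, size=1000)

# W(P_r, P_g): for 1-D samples, SciPy evaluates the infimum directly
# as the L1 distance between the empirical quantile functions.
w_ij = wasserstein_distance(d_i, d_j)
```

Averaging such pairwise distances over all cluster pairs yields the W̄(D_i, D_j) statistic used in the parameter analysis below.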
We set the average distance between all samples as W̄(D_i, D_j) and the average distance between the sample distribution and the union-of-set distribution as W̄(D_total, D_union). The ranges are λ, λ_θ, λ_s, T ∈ {0.2, 0.4, 0.6, 0.8, 1} and ω ∈ {2, 4, 6, 8, 10}. From the results in Figure 10, we see that W̄(D_total, D_union) is stable at all times except when changing the segment receptive field ω or the Shapley weight T. This shows the stationarity of the algorithm, which covers the overall distribution as much as possible. W̄(D_total, D_union) has an inverse relationship with ω and T. This is intuitively reasonable because if the receptive field is too small, segments will be indistinguishable; moreover, enlarging T eliminates the effect of the Shapley values, and as T approaches infinity, the asymmetric distance becomes symmetric. In all five cases, W̄(D_i, D_j) remains at about 0.3, showing that the distance between classes is relatively stable. ω should be large enough to cover as many samples as possible, which is reflected in P_cover. P_cover is proportional to λ, inversely proportional to λ_θ, and only weakly related to λ_s and T. If the perpendicular distance weight λ increases, fewer segments are selected; if the angle distance weight λ_θ increases, the algorithm is more likely to select segments with similar angles. It is reasonable that changes in the distribution distance λ_s and T have little effect on the coverage, because these parameters are mainly used to adjust the clustering of the Shapley distribution.
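The sensitivity study above amounts to a one-at-a-time sweep: each hyperparameter is varied over its grid while the others are held at a fixed default, and P_cover is recorded for each setting. The sketch below illustrates that organization; the `summarize` stand-in for our submodular summarization algorithm and the default values are illustrative assumptions, not the actual implementation.

```python
def p_cover(cover_sets, n_samples):
    """Coverage P_cover = |union of per-sample cover sets| / N."""
    return len(set().union(*cover_sets)) / n_samples

# Hyperparameter grids from the parameter analysis.
GRIDS = {
    "lam": [0.2, 0.4, 0.6, 0.8, 1.0],           # perpendicular distance weight
    "lam_theta": [0.2, 0.4, 0.6, 0.8, 1.0],     # angle distance weight
    "lam_s": [0.2, 0.4, 0.6, 0.8, 1.0],         # distribution distance weight
    "T": [0.2, 0.4, 0.6, 0.8, 1.0],             # Shapley weight
    "omega": [2, 4, 6, 8, 10],                  # segment receptive field
}
DEFAULTS = {"lam": 0.6, "lam_theta": 0.6, "lam_s": 0.6, "T": 0.6, "omega": 6}

def sweep(summarize, n_samples, grids=GRIDS, defaults=DEFAULTS):
    """One-at-a-time sensitivity sweep: vary each parameter over its grid
    with the others at default. `summarize` is a hypothetical stand-in that
    returns the cover sets of the selected patterns for given parameters."""
    results = {}
    for name, values in grids.items():
        for v in values:
            params = dict(defaults, **{name: v})
            results[(name, v)] = p_cover(summarize(**params), n_samples)
    return results
```

The same loop structure also collects W̄(D_i, D_j) and W̄(D_total, D_union) per setting; we omit those here for brevity.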

6. Related Work

6.1. Urban Computing and Crowd Prediction

GPS data [18,19,20], social network data [21,22], and query data [23] have been extensively researched in recent years. Massive datasets have been published, and relevant studies have demonstrated the potential of big data to solve difficult problems in urban computing, for example, traffic jams [24], supply–demand balancing [25], and energy consumption [26]. The classic review [27] summarizes the key challenges, general framework, and applications of urban computing. Many studies have proposed different methods for the task of crowd prediction, such as DCRNN [28], SRCNs [29], and multitask-net [30]. VLUC [31], PCRN [32], and PDB-ConvLSTM [33] use CNNs to process recent, near, and far data, respectively, and treat each timestamp as an equivalent convolution channel. STGCN [17], MRGCN [34], and ST-MGCN [34] fit a graph to the road structure and use convolution to learn temporal correlations. However, practical experiments explaining how these models produce their results by learning features from an input are still lacking. Therefore, in this work, we first propose a novel framework that focuses on dealing with mixed trajectory inputs; second, our model attempts to summarize and attribute the Shapley value to trajectories.

6.2. Explainable Model

Linear models and basic decision trees are still widely used in many applications that require a highly explainable model, even at the expense of a large compromise in accuracy. However, recent works with elaborately designed interpretation techniques [6,35] have demonstrated how neural networks obtain the mapping relation between input and output and have represented the decision-making process of neural networks. A general framework for achieving model-agnostic explanation is to visualize and understand the activation values produced by the neural network. Deconvolution [36] maps the features of the activation function back to the grid space to reveal what input patterns produce a particular output. Guided backpropagation [37] replaces the pooling operation with strided convolution, while ReLU backpropagation [38] prevents the backward flow of negative gradients. Game theory can be used to calculate the importance of each feature [10]. However, these methods are not effective for crowd prediction, since crowd prediction involves both spatial and temporal patterns. Furthermore, the direct use of these methods in crowd prediction is difficult due to the lack of attribution.

6.3. Trajectory Cluster

Clustering similar trajectories to produce representative exemplars can be a powerful visualization tool for tracking the mobility of vehicles and humans. It has been investigated for many different applications, such as spatial databases [39,40], data mining [41], transportation [42], motion segmentation [43], and visualization [44]. Clustering is most often applied to spatial-only trajectories, with prior work on spatial–textual trajectory clustering being relatively rare. Trajectory clustering can be broadly divided into two categories: partition-based clustering [44,45,46] and density-based clustering [39,47,48]. Both require extensive similarity computations, the only distinction being whether similarity is computed over whole trajectories or only over sub-trajectories. However, these methods have not been applied to mining representative model patterns.

7. Conclusions and Discussion

In this paper, we proposed a novel framework called Trajectory Shapley to explain spatial and temporal correlations in flow prediction. To capture the common patterns behind model predictions, we proposed the idea of summarizing Trajectory Shapley value distributions. We presented the theory of Trajectory Shapley and showed that our method produces a structured and continuous result that is easy for users to understand. We conducted experiments on two real-world public datasets from DiDi, using morning and evening rush hours as comparison experiments to test whether our submodular method can successfully capture information in time and space. We demonstrated the effectiveness and interpretability of our proposed model, which can obtain the Trajectory Shapley value with time complexity O(N). In the future, we will try to explore more diverse patterns, such as flocking, gathering, swarming, or meeting [49], in explaining crowd prediction.

Author Contributions

Conceptualization, H.W.; methodology, H.W.; software, H.W.; validation, J.C.; formal analysis, H.W.; investigation, H.W.; resources, H.W.; data curation, H.W.; writing—original draft preparation, H.W.; writing—review and editing, H.W.; visualization, H.W.; supervision, X.S., Z.F., and L.Z.; project administration, X.S., Z.F., and L.Z.; funding acquisition, X.S., Z.F., and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Project of China (2021YFB1714400), the Guangdong Provincial Key Laboratory (2020B121201001), and the Grant-in-Aid for Scientific Research (B) (22H03573) of the Japan Society for the Promotion of Science (JSPS).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This paper did not report any data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zheng, Y.; Zhang, L.; Xie, X.; Ma, W.Y. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20–24 April 2009; pp. 791–800. [Google Scholar]
  2. Zhang, J.; Zheng, Y.; Qi, D. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017. [Google Scholar]
  3. Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; Yi, X. DNN-based prediction model for spatio-temporal data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Burlingame, CA, USA, 31 October–3 November 2016; pp. 1–4. [Google Scholar]
  4. Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; Li, Z. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, AK, USA, 2–7 February 2018. [Google Scholar]
  5. Yao, H.; Tang, X.; Wei, H.; Zheng, G.; Li, Z. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5668–5675. [Google Scholar]
  6. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  7. Sundararajan, M.; Taly, A.; Yan, Q. Gradients of counterfactuals. arXiv 2016, arXiv:1611.02639. [Google Scholar]
  8. Smilkov, D.; Thorat, N.; Kim, B.; Viégas, F.; Wattenberg, M. Smoothgrad: Removing noise by adding noise. arXiv 2017, arXiv:1706.03825. [Google Scholar]
  9. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  10. Lundberg, S.; Lee, S.I. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
  11. Cheng, Z.; Yang, Y.; Wang, W.; Hu, W.; Zhuang, Y.; Song, G. Time2graph: Revisiting time series modeling with dynamic shapelets. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3617–3624. [Google Scholar]
  12. Ye, L.; Keogh, E. Time series shapelets: A new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 947–956. [Google Scholar]
  13. Shapley, L.S. A value for n-person games. Contrib. Theory Games 1953, 2, 307–317. [Google Scholar]
  14. Weber, R.J. Probabilistic values for games. In The Shapley Value. Essays in Honor of Lloyd S. Shapley; Cambridge University Press: Cambridge, UK, 1988; pp. 101–119. [Google Scholar]
  15. Lee, J.G.; Han, J.; Whang, K.Y. Trajectory clustering: A partition-and-group framework. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, 11–14 June 2007; pp. 593–604. [Google Scholar]
  16. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  17. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875. [Google Scholar]
  18. Fan, Z.; Song, X.; Shibasaki, R.; Adachi, R. CityMomentum: An online approach for crowd behavior prediction at a citywide level. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 7–11 September 2015; pp. 559–569. [Google Scholar]
  19. Song, X.; Zhang, Q.; Sekimoto, Y.; Shibasaki, R.; Yuan, N.J.; Xie, X. Prediction and simulation of human mobility following natural disasters. ACM Trans. Intell. Syst. Technol. (TIST) 2016, 8, 1–23. [Google Scholar] [CrossRef]
  20. Wang, L.; Yu, Z.; Guo, B.; Ku, T.; Yi, F. Moving destination prediction using sparse dataset: A mobility gradient descent approach. ACM Trans. Knowl. Discov. Data (TKDD) 2017, 11, 1–33. [Google Scholar] [CrossRef]
  21. Yang, Z.; Lian, D.; Yuan, N.J.; Xie, X.; Rui, Y.; Zhou, T. Indigenization of urban mobility. Phys. A Stat. Mech. Its Appl. 2017, 469, 232–243. [Google Scholar] [CrossRef] [Green Version]
  22. Zhang, J.; Guo, B.; Han, Q.; Ouyang, Y.; Yu, Z. CrowdStory: Multi-layered event storyline generation with mobile crowdsourced data. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, Heidelberg, Germany, 12–16 September 2016; pp. 237–240. [Google Scholar]
  23. Konishi, T.; Maruyama, M.; Tsubouchi, K.; Shimosaka, M. CityProphet: City-scale irregularity prediction using transit app logs. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September 2016; pp. 752–757. [Google Scholar]
  24. Chawla, S.; Zheng, Y.; Hu, J. Inferring the root cause in road traffic anomalies. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 141–150. [Google Scholar]
  25. Jian, S.; Rey, D.; Dixit, V. An integrated supply-demand approach to solving optimal relocations in station-based carsharing systems. Netw. Spat. Econ. 2019, 19, 611–632. [Google Scholar] [CrossRef]
  26. Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
  27. Zheng, Y.; Capra, L.; Wolfson, O.; Yang, H. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2014, 5, 1–55. [Google Scholar] [CrossRef]
  28. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926. [Google Scholar]
  29. Yu, H.; Wu, Z.; Wang, S.; Wang, Y.; Ma, X. Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 2017, 17, 1501. [Google Scholar] [CrossRef] [Green Version]
  30. Zhang, K.; Zheng, L.; Liu, Z.; Jia, N. A deep learning based multitask model for network-wide traffic speed prediction. Neurocomputing 2020, 396, 438–450. [Google Scholar] [CrossRef]
  31. Jiang, R.; Cai, Z.; Wang, Z.; Yang, C.; Fan, Z.; Song, X.; Tsubouchi, K.; Shibasaki, R. VLUC: An Empirical Benchmark for Video-Like Urban Computing on Citywide Crowd and Traffic Prediction. arXiv 2019, arXiv:1911.06982. [Google Scholar]
  32. Zonoozi, A.; Kim, J.j.; Li, X.L.; Cong, G. Periodic-CRN: A Convolutional Recurrent Model for Crowd Density Prediction with Recurring Periodic Patterns. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; pp. 3732–3738. [Google Scholar]
  33. Song, H.; Wang, W.; Zhao, S.; Shen, J.; Lam, K.M. Pyramid dilated deeper convlstm for video salient object detection. In Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 715–731. [Google Scholar]
  34. Geng, X.; Li, Y.; Wang, L.; Zhang, L.; Yang, Q.; Ye, J.; Liu, Y. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27–28 January 2019; Volume 33, pp. 3656–3663. [Google Scholar]
  35. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 2015, 10, e0130140. [Google Scholar] [CrossRef] [Green Version]
  36. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European conference on computer vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833. [Google Scholar]
  37. Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for simplicity: The all convolutional net. arXiv 2014, arXiv:1412.6806. [Google Scholar]
  38. Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the ICML, Madison, WI, USA, 21–24 June 2010. [Google Scholar]
  39. Agarwal, P.K.; Fox, K.; Munagala, K.; Nath, A.; Pan, J.; Taylor, E. Subtrajectory clustering: Models and algorithms. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, 10–15 June 2018; pp. 75–87. [Google Scholar]
  40. Hung, C.C.; Peng, W.C.; Lee, W.C. Clustering and aggregating clues of trajectories for mining trajectory patterns and routes. VLDB J. 2015, 24, 169–192. [Google Scholar] [CrossRef]
  41. Pelekis, N.; Kopanakis, I.; Kotsifakos, E.; Frentzos, E.; Theodoridis, Y. Clustering trajectories of moving objects in an uncertain world. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, Miami, FL, USA, 6–9 December 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 417–427. [Google Scholar]
  42. Wu, Y.; Shen, H.; Sheng, Q.Z. A cloud-friendly RFID trajectory clustering algorithm in uncertain environments. IEEE Trans. Parallel Distrib. Syst. 2014, 26, 2075–2088. [Google Scholar] [CrossRef]
  43. Shen, J.; Peng, J.; Shao, L. Submodular trajectories for better motion segmentation in videos. IEEE Trans. Image Process. 2018, 27, 2688–2700. [Google Scholar] [CrossRef] [PubMed]
  44. Ferreira, N.; Klosowski, J.T.; Scheidegger, C.E.; Silva, C.T. Vector field k-means: Clustering trajectories by fitting multiple vector fields. In Proceedings of the Computer Graphics Forum; Wiley Online Library: New York, NY, USA, 2013; Volume 32, pp. 201–210. [Google Scholar]
  45. Chan, T.H.; Guerqin, A.; Sozio, M. Fully dynamic k-center clustering. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 579–587. [Google Scholar]
  46. Gudmundsson, J.; Valladares, N. A GPU approach to subtrajectory clustering using the Fréchet distance. IEEE Trans. Parallel Distrib. Syst. 2014, 26, 924–937. [Google Scholar] [CrossRef]
  47. Andrienko, G.; Andrienko, N.; Rinzivillo, S.; Nanni, M.; Pedreschi, D.; Giannotti, F. Interactive visual clustering of large collections of trajectories. In Proceedings of the 2009 IEEE Symposium on Visual Analytics Science and Technology, Atlantic City, NJ, USA, 12–13 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 3–10. [Google Scholar]
  48. Han, B.; Liu, L.; Omiecinski, E. Road-network aware trajectory clustering: Integrating locality, flow, and density. IEEE Trans. Mob. Comput. 2013, 14, 416–429. [Google Scholar]
  49. Zheng, K.; Zheng, Y.; Yuan, N.J.; Shang, S.; Zhou, X. Online discovery of gathering patterns over trajectories. IEEE Trans. Knowl. Data Eng. 2013, 26, 1974–1988. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed method. In this paper, we aim to quantify the significance of the trajectory of the Shapley flow among a set of input trajectories and discover the key Shapley flow forming a common pattern.
Figure 2. Framework for discovering key representative sub-trajectories.
Figure 3. Schematic inflow and outflow matrices following Definition 2. For each trajectory, we calculate the inflow and outflow separately and use three-dimensional tensors T_i^in and T_i^out of shape (timeline, x, y) to represent them; finally, we sum all the tensors to obtain the final flow tensor G.
Figure 4. Input × Gradient vs. Trajectory × Shapley. The biggest difference between these is attribution. In traffic prediction tasks, using input × gradient rather than employing trajectories will result in the loss of attribution.
Figure 5. The distance function for line segments.
Figure 6. The simulation experiments of Trajectory Shapley in Chengdu and Xi’an with different classical prediction models.
Figure 7. Finding a subset of interpretable trajectories. (a) City-wide trajectory. (b) Subset tracks and explanation region. (c) Sub-trajectory summarization set.
Figure 8. Coverage results of the morning and evening peaks.
Figure 9. Comparison of the key Shapley flow and the trend of flows during morning and evening rush hours in Chengdu. (a) The pattern during the evening peak. (b) The pattern during the morning peak.
Figure 10. Parameter analysis.

Share and Cite

MDPI and ACS Style

Wang, H.; Fan, Z.; Chen, J.; Zhang, L.; Song, X. Discovering Key Sub-Trajectories to Explain Traffic Prediction. Sensors 2023, 23, 130. https://doi.org/10.3390/s23010130
