2. Background
The journey planning problem takes as input a timetable that contains data concerning stops, vehicles (e.g., trains, buses or any means of transportation) connecting stops and departure and arrival times of vehicles at stops. More formally, a timetable is defined by a triple , where is a set of vehicles, S is a set of stops (often in the literature also referred to as stations) and is a set of elementary connections whose elements are 5-tuples of the form . Such a tuple is interpreted as vehicle leaves departure stop at departure time , and the immediately next stop of vehicle Z is stop at time (i.e., is the arrival time of Z at arrival stop ). Departure and arrival times are integers in representing times in minutes after midnight, where is the largest time allowed within the timetable (typically , where n is the number of days that are represented by the timetable). We assume , that is we do not consider vehicles and stops that do not take part to any connection. In the realistic scenario, each stop has an associated minimum transfer time, denoted by mtt, that is the time, in minutes, required for moving from one vehicle to another inside stop .
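As an illustration of the definition above, the following minimal Python sketch encodes a timetable as a list of elementary connections plus per-stop minimum transfer times. The names `Connection`, `Timetable`, and the field names are assumptions introduced here for illustration only; they are not notation from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Connection:
    """Elementary connection: `vehicle` leaves `dep_stop` at `dep_time` and its
    immediately next stop is `arr_stop`, reached at `arr_time`.
    Times are integers in minutes after midnight."""
    vehicle: str
    dep_stop: str
    arr_stop: str
    dep_time: int
    arr_time: int


@dataclass
class Timetable:
    connections: List[Connection]
    # Minimum transfer time (minutes) of each stop; hypothetical container name.
    mtt: Dict[str, int] = field(default_factory=dict)

    @property
    def stops(self) -> Set[str]:
        # Only stops taking part in at least one connection are considered.
        deps = {c.dep_stop for c in self.connections}
        arrs = {c.arr_stop for c in self.connections}
        return deps | arrs

    @property
    def vehicles(self) -> Set[str]:
        return {c.vehicle for c in self.connections}


tt = Timetable(
    connections=[
        Connection("Z1", "A", "B", 480, 495),
        Connection("Z1", "B", "C", 497, 510),
        Connection("Z2", "B", "D", 500, 520),
    ],
    mtt={"A": 2, "B": 3, "C": 2, "D": 2},
)
```

Deriving the stop and vehicle sets from the connections mirrors the assumption that stops and vehicles not taking part in any connection are ignored.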
Definition 1 (Trip). A trip is a sequence of k connections that: (i) are operated by the same vehicle; (ii) share pairwisely departure and arrival stop, that is, formally, we have and with for any .
Clearly, connections in a trip are ordered in terms of the associated departure times, hence we say connection follows connection in a trip trip whenever the departure time of the former is larger than that of the latter. Similarly, we say connection precedes connection in a trip trip.
Definition 2 (Journey). A journey connecting two stops and is a sequence of n connections that: (i) can be operated by different vehicles; (ii) allows reaching a given target stop starting from a distinguished source stop at a given departure time , that is, the departure stop of is , the arrival stop of is and the departure time of is larger than or equal to τ; (iii) is formed by connections that satisfy the time constraints imposed by the timetable, namely that if the vehicle of connection is different with reference to that of at a certain stop , then the departure time of must be larger than the arrival time of plus mtt.
As with trips, journeys are implicitly ordered by time according to the departure times of their connections. The traveling time of a journey is given by the difference between the arrival time of its last connection and τ.
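The conditions of Definition 2 can be checked mechanically. The sketch below is only one possible reading: the names `Conn`, `is_journey` and `traveling_time` are hypothetical, the requirement that consecutive connections meet at the same stop and that a vehicle does not depart before it arrives are assumptions made explicit in the comments, and the traveling time is assumed to be measured from the query departure time τ.

```python
from typing import Dict, List, NamedTuple


class Conn(NamedTuple):
    vehicle: str
    dep_stop: str
    arr_stop: str
    dep_time: int
    arr_time: int


def is_journey(conns: List[Conn], s: str, t: str, tau: int, mtt: Dict[str, int]) -> bool:
    """Check conditions (ii) and (iii) of Definition 2 for a sequence of connections."""
    if not conns:
        return False
    # (ii) starts at s no earlier than tau and ends at t.
    if conns[0].dep_stop != s or conns[-1].arr_stop != t or conns[0].dep_time < tau:
        return False
    for prev, nxt in zip(conns, conns[1:]):
        if prev.arr_stop != nxt.dep_stop:
            return False  # assumed: consecutive connections meet at a stop
        if prev.vehicle == nxt.vehicle:
            if nxt.dep_time < prev.arr_time:
                return False  # assumed: a vehicle cannot leave before it arrives
        elif nxt.dep_time <= prev.arr_time + mtt[prev.arr_stop]:
            # (iii) on a vehicle change, departure must be larger than arrival + mtt.
            return False
    return True


def traveling_time(conns: List[Conn], tau: int) -> int:
    # Assumed: measured from the query departure time tau.
    return conns[-1].arr_time - tau


journey = [Conn("Z1", "A", "B", 480, 495), Conn("Z2", "B", "D", 500, 520)]
```

With a minimum transfer time of 3 minutes at stop B the transfer above is feasible, whereas with 5 minutes it is not, since the comparison is strict as in the wording of the definition.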
An earliest arrival query ea asks, given a triple consisting of a source stop , a target stop , and a departure time , to compute a quickest journey, that is, a journey that starts at any , connects to , and minimizes traveling time. In what follows we provide two useful definitions that are necessary to introduce the notion of profile query.
Definition 3 (Time-Dominated Journey). Let and be two journeys, both connecting two stops and . Then journey is time-dominated by journey if and only if both the following conditions hold:
the departure time of the first connection of is larger than the departure time of the first connection of ;
the arrival time of the last connection in is smaller than the arrival time of the last connection in .
By the above, if we let be the set of all journeys connecting two stops and in a transit network, then trivially a journey is non-time-dominated if and only if either one of the two following conditions hold: (i) the departure time of the first connection of J is larger than the departure time of the first connection of all other journeys in ; (ii) the arrival time of the last connection of J is smaller than the arrival time of the last connection of all other journeys in .
Hence, we define a profile query pq as the one that asks for the set of non-time-dominated journeys between stops and in the time range , subject to , that is, the set of journeys connecting stops and that start at any time in and are non-time-dominated journeys.
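A naive way to read the definition of profile query is as a filter over candidate journeys. The sketch below represents each journey only by a pair (departure time of its first connection, arrival time of its last connection); the function names, the closed time range, and the reading of Definition 3 in which the dominating journey is the one that departs later and arrives earlier are assumptions made for illustration. It is a quadratic filter, not the query algorithm used by the paper.

```python
from typing import List, Tuple

# (departure time of first connection, arrival time of last connection)
Journey = Tuple[int, int]


def is_time_dominated(j1: Journey, j2: Journey) -> bool:
    """True if j1 is time-dominated by j2, i.e., j2 departs later and arrives earlier."""
    return j2[0] > j1[0] and j2[1] < j1[1]


def profile(journeys: List[Journey], tau_lo: int, tau_hi: int) -> List[Journey]:
    """Non-time-dominated journeys departing within [tau_lo, tau_hi] (assumed closed)."""
    in_range = [j for j in journeys if tau_lo <= j[0] <= tau_hi]
    return [j for j in in_range
            if not any(is_time_dominated(j, k) for k in in_range)]


candidates = [(480, 540), (490, 530), (500, 560), (470, 520)]
result = profile(candidates, 475, 505)
```

In this example the journey (480, 540) is discarded because (490, 530) departs later and arrives earlier, while (470, 520) is excluded because it departs outside the range.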
Finally, we define a multi-criteria query mc-ea as the one asking to compute the set of Pareto-optimal journeys. Informally, such journeys simultaneously optimize more than one criterion (e.g., traveling time and number of vehicle transfers), while departing from the source stop no earlier than the given departure time and arriving at the target stop. More precisely, given a set of criteria, a journey is in the Pareto-optimal set if it is not dominated by any other journey. A journey dominates another journey if it is better with respect to every criterion, while it is non-dominated otherwise. Note that the most commonly considered optimization criteria are traveling time and number of vehicle transfers, although other criteria can be found in the literature, for example, monetary cost.
It has long been known that the problem of computing the mentioned Pareto-optimal set is (weakly) NP-hard [25], since such journeys can be exponential in number. However, if some degree of importance of the optimization criteria is imposed, then the problem is polynomially solvable by using a simple multi-criteria modification of Dijkstra's algorithm, based on lexicographical optimality [25]. An example of this scenario is when one wants to compute the set of quickest journeys between two given stops and then, among them, to choose the one minimizing the number of transfers between vehicles. In this paper, we focus on this latter realistic variant of the journey planning problem. As a final remark, observe that profile queries are a special case of multi-criteria ones using arrival and departure times as criteria.
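The lexicographic variant can be pictured as an ordering on pairs of criteria. The following sketch is only meant to show the ordering itself on a given list of candidates; the names `Candidate` and `lexicographic_best` are hypothetical, and the sketch does not implement the multi-criteria modification of Dijkstra's algorithm mentioned above.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    arrival_time: int   # minutes after midnight
    transfers: int      # number of vehicle changes


def lexicographic_best(cands: List[Candidate]) -> Candidate:
    """Pick a quickest candidate and, among those, one with fewest transfers."""
    # Tuples compare lexicographically: arrival time first, transfers as tie-breaker.
    return min(cands, key=lambda c: (c.arrival_time, c.transfers))


cands = [Candidate(540, 1), Candidate(530, 3), Candidate(530, 2)]
best = lexicographic_best(cands)
```

Here the candidate arriving at 540 with a single transfer is discarded, since arrival time takes priority over the number of transfers.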
5. Dynamic Public Transit Labeling
In this section, we introduce Dynamic Public Transit Labeling (d-ptl, for short), a new technique that is able to maintain the ptl data structure under delays occurring in the given transit network. In particular, we first show a dynamic algorithm (referred to as basic d-ptl) to update the basic ptl framework, that is, how to maintain a red-te graph G, the corresponding 2hc-r labeling l and the stop labeling sl under delays affecting connections, and then discuss how to extend this procedure to the multi-criteria setting.
Formally, a delay is an increase in the departure time of an elementary connection by a finite quantity δ. Hence, it is easy to see how a delay can induce an arbitrary number of changes to both the graph and the labelings [2, 13], depending on the structure of the trip the connection belongs to, thus in turn inducing arbitrarily wrong answers to queries.
A general strategy to update G, the 2hc-r labeling l and the stop labeling sl after a delay, while preserving the correctness of the queries, is to first update the graph representing the timetable (via, e.g., the solutions in References [2, 9, 10]) and then reflect all these changes on both l and sl by: (i) detecting and removing obsolete label entries; and (ii) adding new updated label entries induced by the new graph, as done in other works on the subject [21, 24]. However, this results in a quite high computational effort, as shown by preliminary experimentation we conducted.
In order to minimize the number of changes to both l and sl, we hence exploit the specific structure of the red-te graph and design a dynamic algorithm that alternates phases of update of the graph with phases of update of the labeling l through the procedures given in Reference [24]. At the end of such phases, changes to l are reflected onto its compact representation sl through a dedicated routine. In particular, our algorithm is based on the following observation: a delay affecting a connection of a trip might be propagated to all subsequent connections in the same trip, if any. Hence, the impact of a given delay on both the graph and the labelings strongly depends on δ, on the structure of the trip and, in particular, on the departure times of subsequent connections. Therefore, d-ptl processes the connections of a trip incrementally, in order of departure time. In detail, d-ptl comprises two sub-routines, called, respectively, removal phase (Algorithm rem-d-ptl, see Algorithm 1) and insertion phase (Algorithm ins-d-ptl, see Algorithm 2), that update l along with the graph. Such phases are then followed by a bundle update of sl by a suitable procedure (Algorithm UpdateStopLab, see Algorithm 3).
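The observation on delay propagation can be illustrated on the timetable alone, before any graph or labeling is involved. The sketch below is a simplified model under stated assumptions: the delayed connection has both its departure and arrival shifted by δ, and each subsequent connection of the trip is shifted only by the amount needed for it not to depart before the previous one arrives, so that slack in the schedule can absorb the delay. The names `Conn` and `propagate_delay` are hypothetical.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    dep_time: int
    arr_time: int


def propagate_delay(trip: List[Conn], k: int, delta: int) -> List[Conn]:
    """Return the trip after delaying its k-th connection by delta minutes."""
    out = list(trip)
    # Assumed: the delayed connection keeps its duration.
    out[k] = Conn(trip[k].dep_time + delta, trip[k].arr_time + delta)
    for i in range(k + 1, len(out)):
        shift = out[i - 1].arr_time - out[i].dep_time
        if shift <= 0:
            # The slack absorbed the delay: later connections are unaffected,
            # so processing in order of departure time can stop here.
            break
        out[i] = Conn(out[i].dep_time + shift, out[i].arr_time + shift)
    return out


trip = [Conn(480, 490), Conn(495, 505), Conn(515, 525)]
delayed = propagate_delay(trip, 0, 10)
```

In the example, a 10-minute delay on the first connection pushes the second one by 5 minutes and leaves the third untouched, which is why the impact depends on the departure times of the subsequent connections.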
Algorithm 1: Algorithm rem-d-ptl.
Input: red-te graph G, a delay affecting a connection , the trip including the connection
Output: red-te graph G not including vertices of connections violating red-te constraints and the 2hc-r labeling l of G
Algorithm 2: Algorithm ins-d-ptl.
Input: red-te graph G not including vertices of connections violating red-te constraints, the 2hc-r labeling l of G, delay , delayed connection , trip
Output: red-te graph G including vertices of connections affected by the delay, the 2hc-r labeling l of G, the delay affecting the connection and the trip including the connection
Algorithm 3: Algorithm UpdateStopLab.
Input: Outdated stop labeling sl, 2hc-r labeling l of G, sets us, us
Output: Updated stop labeling sl of l
Algorithm 4: Algorithm RewireWaitingDep.
Input: Graph , departure vertex , stop
Algorithm 5: Algorithm RewireTransferDep.
Input: Graph , departure vertex , successor vertex succ, stop
Algorithm 6: Algorithm RewireArr.
Input: Graph , arrival vertex , trip trip, stop
In the removal phase, we first remove from G the vertices and arcs that are associated with the delayed connection and that violate the red-te constraints. We say a vertex (arc, respectively) violates the red-te constraints whenever the associated time (the difference of the times of the endpoints, respectively) does not satisfy at least one of the inequalities imposed by the red-te model discussed in Section 2. Note that vertices and arcs of the above kind can be: (i) departure and arrival vertices of the delayed connection; (ii) departure and arrival vertices following the delayed connection in the same trip; (iii) arcs adjacent to vertices in (i) and (ii).
Once the above is done, G might no longer be a red-te graph, since the removal of the above vertices and arcs can, in turn, induce some other vertex or arc to violate the red-te constraints. Hence, we first reflect such removals onto l by running the decremental algorithm dec-bu of Reference [24] and then check whether we need to insert some new arcs into G to make it a red-te graph again. If this is the case, we add the label entries induced by these insertions by using the incremental algorithm inc-bu of Reference [24]. At this point, the graph G is a red-te graph of a timetable that does not include the delayed connection. Then, if some changes have been applied to G (and l) in the above step, we proceed by analyzing the connections following the delayed one in the same trip, one by one, and by removing vertices and arcs that violate the red-te constraints. At the end of these iterations, G is a red-te graph of a timetable that includes neither the delayed connection nor those following it in the same trip that have violated the red-te constraints because of δ.
After completing the above, we perform the insertion phase, where we check whether we need to insert back into G some vertices and arcs, with updated associated times, to let the graph be a red-te graph of the updated timetable. This might require executing algorithm inc-bu to add the label entries induced by such insertions. Once both G and l have been updated, we reflect the changes onto the stop labeling via a suitable routine (see Algorithm 3). In the next sections we describe the above sub-routines in detail.
5.1. Removal Phase
In the negative case, we do not remove since, after updating time(), all vertices of dv[s] do not violate the time inequalities imposed by waiting arcs. In the affirmative case (see Line 9), instead, must be removed and the arcs adjacent to vertices in and must be rewired. In particular, we proceed as follows: if there exists some waiting arc in A, that is, there is some other whose time was larger than or equal to that of before the delay), and (thus the ordering imposed by waiting arcs is violated), then we compute a set of vertices that will be wired at v, given by . Note that the time of said vertex v is necessarily larger than the time of vertices such that plus mtt, thus satisfy the red-te inequality for transfer arcs.
Moreover, we search for two vertices, named pred and succ respectively, defined as follows:
pred is the unique vertex (if any) such that pred and (pred,) ;
succ is the unique vertex (if any) such that succ and (,succ) .
These are the vertices adjacent to the waiting arcs having as one endpoint, that we will need to rewire to preserve the red-te properties. Then, we remove from V, and run dec-bu to obtain an updated version of the 2hc-r labeling (see Line 14). Note that the removal of a vertex also removes all arcs (v,) and (,v) (if any) from A. Finally, we add: a waiting arc (pred, succ) to A, if both pred and succ are vertices in the graph, and a transfer arc for each entry in . In particular, for each vertex , we add a new transfer arc (w, succ). To reflect such changes on l, we run inc-bu (see Line 16).
Regarding vertex , graph G remains unchanged either if there is no transfer arc in A having as endpoint, or if there is a transfer arc (, v) but such arc is not affected by the delay, that is, when . In all other cases, we proceed by removing from G and by updating l via dec-bu (see Line 21). An example of execution of the removal phase is shown in Figure 2.
As a final remark on this part, notice that (see Figure 2) the removal phase is stopped at a given connection of trip , with whenever the delay induces a change neither in the time associated to and nor in their adjacent arcs, as this trivially implies that no change will be performed on any of the vertices and (and their adjacent arcs) for all j, with . This can be detected by comparing the status of vertices (namely time and set of adjacent arcs) before and after performing the procedure for a given connection. In the remainder of the paper, for the sake of brevity, we denote this test by writing either “the graph has changed” or not.
5.2. Insertion Phase
In this section, we discuss in detail Algorithm ins-d-ptl, whose aim is to add to G vertices and arcs according to the delayed connection in such a way that G is a red-te graph properly representing the updated timetable, and then to update l accordingly (see Algorithm 2). In particular, once Algorithm 1 has been executed, the following four cases can occur, for each connection in trip that has been affected by the delay, depending on whether the associated vertices have been removed or not from the graph:
- (a)
and ;
- (b)
and ;
- (c)
and ;
- (d)
and .
In what follows we describe in detail how Algorithm ins-d-ptl manages each of these cases.
5.2.1. Discussion on Case I
In this case, when both vertices have remained in G (see Line 9 of Algorithm 2), we only check whether some transfer arcs have to be updated. This process is summarized in Algorithm 5 which is called as sub-routine by Algorithm 2. In particular, if is the last vertex in dv[s] (see Line 2 of Algorithm 5—Sub-case I.a), i.e., there is no waiting arc outgoing then we compute the subset candidates of vertices in av[s] that do not have any adjacent transfer arc and would not violate the red-te constraints, i.e., we add a vertex to candidates if and only if mtt and v does not have any adjacent transfer arc.
Then, for each vertex we add a new arc (v,) to A.
If, instead, is not the last vertex in dv[s] (see Line 14 of Algorithm 5—Sub-case I.b), i.e., there exists some waiting arc connecting to a vertex succ, then some of the transfer arcs having succ as endpoint in G may need to be updated and connected to (i.e., rewired to ). To this purpose, we first determine the subset of transfer arcs in A having w as endpoint and then, for each arc in , if mtt we replace arc by a new arc (v,). Notice that, for replaced transfer arcs we do not need to update l, since any two vertices that were reachable before such update remain reachable afterward. Moreover, also vertices in dv[s] remain in ordered form, therefore we do not need to add/replace any waiting arc of A. On the contrary, if some modification has been applied to the topology of G or to the ordering of the vertices, then we run inc-bu to obtain an updated version of the 2hc-r labeling (see Line 11).
5.2.2. Discussion on Case II
In this case, occurring when both vertices have been removed from V (see Line 13), we know that the affected connection has no counterpart in G in terms of departure and arrival vertices. Thus, to make G reflect the updated network as a correct red-te model, we proceed as follows.
First, we add a vertex to V and to dv[s] and set its associated time to be equal to the new departure time of the (delayed) connection. After that, we add arcs adjacent to , depending on the presence of other vertices in dv[s] and av[s] and on their times. In particular, if and , i.e., there is no waiting arc outgoing vertex and there exists another departure vertex besides in dv[s], we need to add a waiting arc incoming into , in particular we insert arc (m,) into A.
On the other hand, if there exist some vertices such that , then we remove waiting arc (m1,m2) and add two new waiting arcs (m1,) and (,m2) to A. It is worth remarking here that cannot be such that since otherwise the original vertex would not have been removed by Algorithm 1. The pseudo-code of this part of the insertion phase is shown in Algorithm 4, which is again executed as a sub-routine of Algorithm 2. Regarding transfer arcs, after is inserted we execute Algorithm 5, as already discussed for case I. Finally, we run inc-bu to update the 2hc-r labeling l (see Line 17).
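The rewiring of waiting arcs around a newly inserted departure vertex can be sketched as an insertion into a time-ordered list. This is a simplified illustration, not the pseudo-code of Algorithm 4: the list `dv`, the arc set `waiting`, the pair representation of vertices and the function name are assumptions, and the case in which the new vertex becomes the first one of the stop is handled in the obvious symmetric way even though it is not spelled out in the text.

```python
import bisect
from typing import List, Set, Tuple

Vertex = Tuple[int, str]  # (time, identifier); tuples sort by time first


def insert_departure_vertex(dv: List[Vertex],
                            waiting: Set[Tuple[Vertex, Vertex]],
                            v: Vertex) -> None:
    """Insert v into the time-ordered list dv and rewire waiting arcs around it."""
    pos = bisect.bisect_left(dv, v)
    pred = dv[pos - 1] if pos > 0 else None
    succ = dv[pos] if pos < len(dv) else None
    if pred is not None and succ is not None:
        waiting.discard((pred, succ))   # the old arc (m1, m2) is split in two
    if pred is not None:
        waiting.add((pred, v))          # arc (m, v) or (m1, v)
    if succ is not None:
        waiting.add((v, succ))          # arc (v, m2); also covers v being first (assumed)
    dv.insert(pos, v)


a, b = (480, "a"), (500, "b")
dv = [a, b]
waiting = {(a, b)}
x = (490, "x")
insert_departure_vertex(dv, waiting, x)
```

Keeping the departure vertices of a stop sorted by time makes both sub-cases (new last vertex, or vertex between two existing ones) a matter of looking at the immediate neighbors of the insertion position.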
Once vertex has been handled, we focus on the arrival stop and insert a vertex into V and av[t], and a connection arc (,) to A. Then, to properly set transfer arcs induced by such connection arc, we search for the vertex v in dv[t] such that: (i) and (ii) is minimum among vertices satisfying (i). If such a vertex v exists, then we add arc to A. Moreover, to properly set bypass arcs, if we add an arc , where we remark that is the arrival vertex of connection of trip. Similarly, we add an arc where is the arrival vertex of connection of trip (see Algorithm 6 for the pseudo-code of this phase). Again, we run inc-bu to update l (see Line 20).
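The search for the departure vertex that receives the transfer arc of a newly inserted arrival vertex amounts to finding the earliest departure time compatible with the minimum transfer time. The sketch below works on a sorted list of departure times only; the function name is hypothetical, and the strict comparison (departure larger than arrival plus mtt) is an assumption following the wording used for transfers elsewhere in the text. A non-strict reading would use `bisect.bisect_left` instead.

```python
import bisect
from typing import List, Optional


def transfer_target(dep_times: List[int], arr_time: int, mtt: int) -> Optional[int]:
    """Index of the earliest departure vertex whose time exceeds arr_time + mtt.

    dep_times must be sorted in non-decreasing order. Returns None when no
    departure vertex of the stop can be reached by a transfer.
    """
    i = bisect.bisect_right(dep_times, arr_time + mtt)
    return i if i < len(dep_times) else None


dep_times = [480, 500, 510]
```

Since later departures are reachable through waiting arcs, connecting the arrival vertex only to the earliest feasible departure vertex is enough in this simplified picture.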
5.2.3. Discussion on Case III
In this case, when has been removed while is in V (see Line 2 of Algorithm 2), we first add a vertex to V and to dv[s] and a connection arc (,) to A. This is followed by the wiring of suited transfer and waiting arcs to , in order to preserve the red-te properties. As in the previous cases, this is achieved by Algorithms 4 and 5, discussed above. Algorithm inc-bu is also run to reflect changes on the 2hc-r labeling (see Line 26).
5.2.4. Discussion on Case IV
In this case, occurring when is part of V while has been removed by the removal phase (see Line 27), we insert a vertex into V and av[t], and the corresponding connection arc (,) into A. This is followed by the addition of bypass and transfer arcs adjacent to , achieved again by Algorithm 6. Furthermore, we obtain the final version l of the 2hc-r labeling (see Line 30).
An example of execution of the insertion phase is shown in Figure 3. In addition, for ease of understanding, we show an example of execution of the procedures for: (i) rewiring transfer arcs (Algorithm 5) and waiting arcs (Algorithm 4) to a departure vertex in Figure 4; and (ii) rewiring arcs to an arrival vertex (Algorithm 6) in Figure 5.
5.3. Updating the Stop Labeling
Once both the graph and the 2hc-r labeling have been updated, if a corresponding compressed stop labeling sl is available and one wants to reflect the mentioned updates on said compressed structure, a straightforward way would be to recompute the stop labeling from scratch via, for example, the routine in Reference [13]. This computational effort is not as large as that required for recomputing the 2hc-r labeling. However, we propose an alternative routine that is incorporated in d-ptl and is faster than recomputing the stop labeling from scratch. Our routine requires, during the execution of Algorithms 1 and 2, to compute two sets of so-called updated stops, denoted, respectively, by us and us. These are defined as the stops i such that vertices in dv[i] (av[i], respectively) had their time value or forward label (backward label, respectively) changed during Algorithm rem-d-ptl or during Algorithm ins-d-ptl. Sets us and us can be easily determined by inserting stops satisfying the property in said sets during the execution of Algorithms 1 and 2, after each update to times or labels.
Once this is done, we update the stop labeling sl by recomputing only the entries of sl (sl, respectively) for each (for each , respectively). To this aim, for each stop (, respectively) we first reset sl (sl, respectively) to the empty set. Then, we scan departure (arrival, respectively) vertices in decreasing (increasing, respectively) order with respect to time and add entries to sl (sl, respectively) accordingly. In particular, for all departure (arrival, respectively) vertices v of in the above mentioned order, we add a pair for each u in sl (sl, respectively) only if there is no pair in sl (sl, respectively) having u as hub vertex. This guarantees that each pair contains latest departure (earliest arrival, respectively) times. After updating the stop labels, we sort both sl and sl to restore the ordering according to the hub vertices [13]. Details on how to update the stop labeling by executing the procedure are given in Algorithm 3.
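The recomputation of the stop label of a single updated stop, on the departure side, can be sketched as follows. This is an illustrative reading of the procedure rather than the pseudo-code of Algorithm 3: the function name, the representation of a forward label as a list of hub identifiers, and the representation of a stop label as a list of (hub, time) pairs are assumptions. The arrival side is symmetric, scanning arrival vertices in increasing order of time.

```python
from typing import Dict, List, Tuple


def rebuild_forward_stop_label(
    dep_vertices: List[Tuple[str, int]],    # (vertex id, time) of the stop's departure vertices
    forward_label: Dict[str, List[str]],    # vertex id -> hub vertices of its forward label
) -> List[Tuple[str, int]]:
    """Recompute the forward stop label of one updated stop from scratch."""
    latest: Dict[str, int] = {}             # the label is first reset to the empty set
    # Scan departure vertices in decreasing order of time.
    for v, time_v in sorted(dep_vertices, key=lambda p: p[1], reverse=True):
        for hub in forward_label.get(v, []):
            if hub not in latest:            # add a pair only if the hub is not yet present,
                latest[hub] = time_v         # so each pair holds the latest departure time
    return sorted(latest.items())            # restore the ordering by hub vertex


dep_vertices = [("d1", 480), ("d2", 500)]
forward_label = {"d1": ["h1", "h2"], "d2": ["h2", "h3"]}
stop_label = rebuild_forward_stop_label(dep_vertices, forward_label)
```

Because only the stops collected in the two sets of updated stops are processed this way, the cost of the update is tied to the number of affected stops rather than to the size of the whole stop labeling.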
We are now ready to give the following results.
Theorem 1 (Correctness of Basic d-ptl). Given an input timetable and a corresponding red-te graph G, let l be a 2hc-r labeling of G and let be a stop labeling associated to l. Assume is a delay occurring on a connection, that is, an increase of δ on its departure time. Let , , and be the output of d-ptl when applied to G, l and , respectively, by considering the delay. Then: (i) is a red-te graph for the updated timetable; (ii) is a 2hc-r labeling for ; (iii) is a stop labeling for .
Notice that the above theorem is based on the correctness of the approaches in References [2, 13, 24]. In particular, it is easy to see that whenever we update the graph, we do it by preserving the constraints imposed by the red-te model on both vertices and arcs, by suitably modifying connection arcs and associated waiting, bypass, and transfer arcs. In more detail, it is easy to prove, by contradiction, that after the execution of Algorithms 1 and 2, G is a red-te graph. Concerning the labeling data structures, observe that after each change to G we use either dec-bu or inc-bu, depending on the type of modification performed. These algorithms have been shown to compute a labeling that is a 2hc-r labeling for the modified graph [24]. Hence, at the end of Algorithm 2, l is a 2hc-r labeling for G. Finally, Algorithm 3 applies the definition of stop labeling, by updating the entry of a stop with the proper hub vertices and time values. Hence, after the execution of Algorithm 3, sl is a stop labeling of l and the theorem follows.
Theorem 2 (Complexity of Basic d-ptl). Algorithm d-ptl takes computational time in the worst case.
Proof. The complexity of Algorithm d-ptl is given by the sum of the complexities of Algorithms 1, 2 and 3. In what follows, we analyze separately the three algorithms.
Concerning Algorithm 1, we first bound the cost of executing Lines 1–21, that is, the amount of computational time per connection. Lines 1–8 require a time that is linear in the number of neighbors (incoming and outgoing) of , which is a constant in red-te graphs, while Lines 9–21 spend a time that grows as said number of neighbors times the time required for performing the dynamic algorithms dec-bu and inc-bu. Each execution of these algorithms takes in the worst case [24]. Thus, Lines 1–21 require time in the worst case. These lines are repeated for all stops traversed by the vehicle of the trip from connection to , therefore in the worst case for all stops of the transit network, which are . Since , we have that Algorithm rem-d-ptl runs in worst case time.
Concerning Algorithm 2, notice that all sub-routines require a time that is linear in the size of the processed stop (i.e., in the number of associated arcs). Hence, by summing up the contribution for all considered stops (those traversed by the trip from connection to ), we obtain that updating the graph via Algorithm ins-d-ptl takes , as and, in the worst case, the affected trip can traverse all stops of the network. On top of that, we need again to consider the time for executing dec-bu and inc-bu, which are performed again times in the worst case. Since , we have that Algorithm ins-d-ptl runs in worst case time.
Concerning Algorithm 3, it scans label entries of vertices in both us and us in non–increasing and non–decreasing order, respectively (thus requiring either to sort them or to use a priority queue). In both cases, we have an additional logarithmic factor in terms of computational time per vertex. Since all vertices for all stops can be , and since sorting stop labels with respect to hub vertices at the end of the procedure requires worst-case time, it follows that the worst case time of Algorithm 3 is . If we sum up the complexities of Algorithms 1, 2 and 3, the claim follows. □
Notice that Theorem 2 implies that d-ptl, in the worst case, is slower than the reprocessing from scratch via ptl, whose worst case running time is cubic in the size of the graph due to the recomputation of the labeling [20]. However, our experimental study, which is described in Section 7, clearly shows that d-ptl always outperforms ptl in practice.
6. Dynamic Multi-Criteria Public Transit Labeling
In this section, we extend d-ptl to handle the multi-criteria setting. We refer to the extended version as multi-criteria d-ptl.
We remark that, to update the data structures employed by the basic ptl framework, d-ptl exploits the structure of the red-te graph and alternates phases of modifications of the graph itself with corresponding updates of the reachability labeling via the procedures given in [24]. These phases are bundled in two blocks, namely the removal phase (Algorithm rem-d-ptl, see Algorithm 1) and the insertion phase (Algorithm ins-d-ptl, see Algorithm 2), that update the labeling along with the graph.
The above two routines, however, cannot be directly employed within the multi-criteria ptl approach, which relies on a shortest path labeling rather than on a reachability one. In particular, while the modifications to the graph applied by the two routines are almost the same for wred-te graphs (the only exception is that, whenever we add a transfer arc between two vertices associated with connections of different trips, we also need to add a suitable intermediate vertex for modeling the transfer), we cannot use algorithm butterfly, which is designed for reachability labelings, to update the shortest path labeling at hand. Hence, we need to replace dec-bu (in lines 14 and 21 of Algorithm 1) and inc-bu (in lines 11, 17, 20, 26, and 30 of Algorithm 2) with decremental and incremental algorithms that are suited to update the 2hc-sp labeling. In this regard, we can employ the decremental algorithm decpll of Reference [21] and the incremental algorithm incpll of Reference [15], respectively, which are designed to update 2hc-sp labelings in general graphs.
Unfortunately, in preliminary experiments we conducted on some relevant instances of the problem (we remind the reader that the graphs treated in this paper are DAGs), we observed that, while incpll is quite fast and updates the labeling within a few seconds even in very large graphs, decpll is painfully slow, and sometimes its computational time is comparable with that required for recomputing the labeling from scratch. This is most likely due to the sparse nature of the red-te graph and to how decpll updates 2hc-sp labelings. In more detail, decpll works in three phases whose running time is proportional to the cardinality of the set of vertices that contain at least one incorrect label entry. It is easy to see that, in DAGs, this cardinality tends to the number of vertices of the graph in most cases (see Reference [21] for more details on this part of the computation).
For such reasons, in what follows we propose an extension of algorithm dec-bu, named dag-decpll, that is explicitly designed to update shortest path labelings in DAGs, instead of reachability labelings, as a consequence of decremental updates to the graph. The main intuition behind dag-decpll is to exploit the specific relationships between shortest paths in DAGs, which are instead neglected by decpll, which is designed for general graphs.
Given a graph , we discuss the new approach by focusing on how to handle the removal of a vertex, say x, which is the decremental operation of interest in our scenario. Note that the routine can be easily extended to handle arc removals or arc weight increases, as discussed at the end of this section. In what follows, we call the graph obtained by removing vertex x from V. Furthermore, we denote by the distance (i.e., the weight of a shortest path) between two vertices u and v of a graph, say S, and define two subsets of vertices of V, namely right and left, as follows:
right: the set of vertices of V that are reachable from x in G, i.e., right if and only if there exists a path from x to u in G;
left: the set of vertices of V that can reach x in G, i.e., left if and only if there exists a path from u to x in G.
Since G is a DAG, it is easy to see that right and left are inherently disjoint, that is, right ∩ left = ∅. Additionally, given the above definitions, we say a label entry l of some vertex is affected by the removal of a vertex only if x lies on a shortest path between v and h induced by l. Similarly, a label entry l is affected by the removal of a vertex only if x lies on a shortest path between h and v induced by l.
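The two sets defined above can be obtained with a forward and a backward graph traversal from x. The sketch below uses breadth-first search on an adjacency-list representation; the function names and the dictionary-based graph encoding are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def reachable(adj: Dict[str, List[str]], x: str) -> Set[str]:
    """Vertices reachable from x by following arcs of adj (x itself excluded in a DAG)."""
    seen: Set[str] = set()
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def left_right(out_adj: Dict[str, List[str]], x: str) -> Tuple[Set[str], Set[str]]:
    """Return (left, right): vertices that can reach x, and vertices reachable from x."""
    in_adj: Dict[str, List[str]] = {}
    for u, ws in out_adj.items():
        for w in ws:
            in_adj.setdefault(w, []).append(u)   # reversed arcs for the backward search
    right = reachable(out_adj, x)
    left = reachable(in_adj, x)
    return left, right


# Small DAG: b -> a -> x -> c -> d, plus b -> e.
out_adj = {"b": ["a", "e"], "a": ["x"], "x": ["c"], "c": ["d"]}
left, right = left_right(out_adj, "x")
```

In the example, vertex e belongs to neither set, so none of its label entries needs to be inspected when x is removed.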
In what follows, given a vertex , we highlight some simple yet important properties of the two sets right and left that are easily derived by the structure of DAGs.
Property 1. For any vertex such that right no label entry in l is affected by the removal of x from G.
Corollary 1. For any vertex right, a label entry in l may be affected only if left or if .
Property 2. For any vertex such that left no label entry in l is affected by the removal of x from G.
Corollary 2. For any vertex left, a label entry in l may be affected only if right or if .
Lemma 3. For any pair of vertices in V, if left and right, then . Symmetrically, if right and left then .
Proof. The above easily follows by Properties 1 and 2. Notice that, when left (right, respectively), the shortest path from h to v (from v to h, respectively) cannot pass through x by the definition of left (right, respectively). □
According to the previous observations, we now provide a strategy to carefully identify the label entries that are affected by the removal of a vertex x from G. In particular, for each vertex right (left, respectively) we know that l (l, respectively) can contain affected label entries, which must be either removed or updated in order to preserve the correctness of the query algorithm. The routine to achieve the update is based on the notion of marking label entries, that is we assume to store an additional boolean field, attached to each label entry, encoding the information “the label entry is marked or not”. We assume initially all these bits are set to false.
Given the additional boolean field, we define a so-called marked query between two vertices u and v, denoted as mquery, that behaves as a regular query on the labeling, with the difference that it considers only those label entries that are either marked or such that their associated vertices do not belong to either left or right. This is done with the purpose of using only label entries that have already been updated with the correct distance or whose attached distance is not changed by the removal of x. We will show later in the section how this modified query is used to retrieve correct distances during the update.
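A marked query can be sketched as a standard 2-hop query with an extra filter on the label entries. The code below is one possible reading under explicit assumptions: each label is a dictionary from hub vertex to a (distance, marked) pair, the "associated vertex" of a label entry is taken to be its hub, and the function name and signature are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Label = Dict[str, Tuple[float, bool]]  # hub -> (distance, marked flag)


def mquery(l_out_u: Label, l_in_v: Label, left: Set[str], right: Set[str]) -> float:
    """Distance from u to v using only entries that are marked or whose hub
    lies outside both left and right (assumed reading of the definition)."""

    def usable(hub: str, marked: bool) -> bool:
        return marked or (hub not in left and hub not in right)

    best = math.inf
    for hub, (d_uh, m_uh) in l_out_u.items():
        if hub in l_in_v:
            d_hv, m_hv = l_in_v[hub]
            if usable(hub, m_uh) and usable(hub, m_hv):
                best = min(best, d_uh + d_hv)
    return best


l_out_u = {"h1": (2, False), "h2": (1, False), "h3": (1, True)}
l_in_v = {"h1": (3, False), "h2": (1, False), "h3": (2, True)}
```

With left = {"h2"}, the unmarked entries through h2 are ignored even though they would give the smallest sum, while the marked entries through h3 are trusted.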
Algorithm dag-decpll, whose pseudocode is given in Algorithm 7, exploits the above properties and definitions and works as follows. Given the vertex x, the algorithm first computes a topological order T of the graph in linear time. Then, sets right and left are determined, again in linear time, via a forward and a backward execution, respectively, of the well-known breadth-first search (BFS, for short) algorithm, starting from x. This is followed by the removal of x from G. Now, if either right or left is empty, the algorithm simply removes all entries that have x as first field in the labeling l, by linearly scanning it, and terminates. Note that it is very unlikely for right or left to be empty; therefore, the removal of x from l is done in this trivial way, rather than by explicitly maintaining a data structure that stores an inverted index for each label entry in l. Otherwise, the algorithm proceeds in two phases, called forward update and backward update, that scan vertices that can contain obsolete label entries (namely vertices in left and right, respectively) with the purpose of either removing them or updating the associated distances. The two phases are described in detail separately in the following sections. At the end of the two phases, dag-decpll removes l and l from l and returns the updated label set. In the pseudocode, we denote by (, respectively) the out-neighbors (in-neighbors, respectively) of the generic vertex v of graph G.
Algorithm 7: Algorithm dag-decpll.
Input: Directed Acyclic Graph G, 2hc-sp labeling l of G, vertex x to be removed from G
Output: Directed Acyclic Graph G′, 2hc-sp labeling l′ of G′
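The preliminary steps of dag-decpll described above (forward and backward BFS from x, removal of x, and the trivial scan used when one of the two sets is empty) can be sketched as follows. The adjacency-list representation and all function names are assumptions made for illustration; the two update phases are not shown.

```python
from collections import deque

def reach(adj, source):
    """Vertices reachable from `source` via BFS, excluding `source` itself."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(source)
    return seen

def reverse(adj):
    """Graph with every arc reversed, used for the backward BFS."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for w in outs:
            rev.setdefault(w, []).append(u)
    return rev

def affected_sets(adj, x):
    """Return (left, right): vertices that reach x and vertices reached by x."""
    return reach(reverse(adj), x), reach(adj, x)

def remove_vertex(adj, x):
    """Copy of the graph without x and without its incident arcs."""
    return {u: [w for w in outs if w != x] for u, outs in adj.items() if u != x}

def drop_hub(labels, x):
    """Linear scan removing every label entry whose hub is x."""
    return {v: {h: d for h, d in entries.items() if h != x}
            for v, entries in labels.items()}

adj = {"a": ["x", "d"], "x": ["b"], "b": ["c"], "c": [], "d": []}
left, right = affected_sets(adj, "x")
print(sorted(left), sorted(right))
```

Both traversals visit each arc at most once, which matches the linear-time bound stated in the text.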
6.1. Forward Update
The procedure processes vertices in left in decreasing order with respect to a topological ordering T of G. Assume we are processing a given vertex, say v. If v has a maximum value in T as compared to that for the rest of vertices in left, then we know by the definition of T that no vertex in belongs to left. We also know that a label entry l may be affected if and right (see Corollary 1). Moreover, it can be easily seen that, for any vertex with left, no label entry in l is affected by the removal of x from G (see Corollary 2). Additionally, for the rest of cases where and left, by definition of T, u must have been processed before v.
The routine hence proceeds by removing all affected label entries from l. Notice that, after removing such label entries, we can retrieve the correct distance in the new graph G′ for any vertex such that right, by performing a query query, since the path induced by the labeling does not contain x in these cases. However, to guarantee that the cover property of l is satisfied with respect to all pairs of vertices of the new graph G′, we may need to add new label entries to l and possibly to backward label sets of vertices in right. To this aim, we exploit the notion of superset of hubs, originally presented in Reference [24], and incorporate it in the dag-decpll update procedure after suitably adapting it in order to make it compatible with 2hc-sp labeling.
In more detail, the superset of hubs for a forward label l, denoted by , is defined as the union of the hub vertices, belonging to right, in all forward label sets of all vertices in . More formally:
In the case of reachability labeling, one can exploit the notion of superset of hubs to update the reachability properties of a given vertex v: if a neighbor of v is reachable from a given vertex, so is v. Here, instead, we exploit it to simplify the update of the distances stored in the label entries. In detail, since we are updating a 2hc-sp labeling, to achieve the update of the label of a given vertex v, we need to compute for all and use it to update entries l so that they correspond to distances in the new graph.
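The verbal definition of the superset of hubs for a forward label can be sketched as a plain set union. The names `out_neighbors` and `l_out` are hypothetical, and the vertices over which the union ranges are assumed here to be the out-neighbors of v.

```python
def superset_out(v, out_neighbors, l_out, right):
    """Union of the hubs that belong to `right`, taken over the forward
    label sets of the out-neighbors of v (illustrative data layout)."""
    hubs = set()
    for u in out_neighbors.get(v, ()):
        hubs.update(h for h in l_out.get(u, {}) if h in right)
    return hubs

out_neighbors = {"v": ["a", "b"]}
l_out = {"a": {"r1": 2, "z": 1}, "b": {"r2": 4, "r1": 3}}
print(sorted(superset_out("v", out_neighbors, l_out, right={"r1", "r2"})))
```

The backward case is symmetric: it would range over in-neighbors, backward label sets, and hubs in left.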
One way to do this is to execute a baseline algorithm for computing shortest paths in DAGs. However, although this is well known to take linear time with respect to the graph size, it can easily become a computational bottleneck when dealing with medium to large scale graphs, since we need to compute many distances during an update.
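For reference, the baseline mentioned above is the textbook single-source shortest path computation on a DAG, which relaxes arcs in topological order. The sketch below uses a hypothetical weighted adjacency-list representation and is given only to make the linear-time baseline concrete.

```python
import math
from collections import deque

def dag_distances(adj, s):
    """Single-source shortest paths on a DAG in O(|V| + |A|) time.
    `adj[u]` is a list of (successor, weight) pairs (assumed layout)."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for w, _ in adj[u]:
            indeg[w] = indeg.get(w, 0) + 1
    dist = {u: math.inf for u in indeg}
    dist[s] = 0
    # Kahn's algorithm: a vertex is dequeued only after all its predecessors.
    queue = deque(u for u, d in indeg.items() if d == 0)
    while queue:
        u = queue.popleft()
        for w, weight in adj.get(u, ()):
            if dist[u] + weight < dist[w]:
                dist[w] = dist[u] + weight
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return dist

adj = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("c", 7)],
       "b": [("c", 2)], "c": []}
print(dag_distances(adj, "s"))
```

Running such a computation once per processed vertex is what makes the baseline expensive during an update.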
To overcome this limit, we propose a hybrid approach that exploits and l to compute distances faster. In more details, it is easy to observe that for any and for any such that left, the correct distance can be computed via a query on the labeling query, since the path induced by the labeling from w to h cannot include x (see Lemma 3). Moreover, for any the path between v and h must pass through at least a vertex in right. This implies that, if we have the set of vertices leftleft}, that are reachable from v in , then is given by the minimum value between among all vertices (note that can be retrieved from l). Therefore, to compute for all , we run a pruned BFS starting from v (see sub-routine shown in Algorithm 8).
Algorithm 8: Algorithm customBFS.
Input: Directed acyclic graph G, a vertex s of G, sets right and left
Output: Set of pairs of vertices and relative distances from s
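The idea behind the pruned search can be sketched as follows. This is an illustrative reading of the description above, not the pseudocode of Algorithm 8: we assume unit arc weights (so that BFS levels are distances), we assume the search expands only vertices in left and stops at the first vertices it meets outside left, and `query` is a hypothetical callable returning the distance encoded in the labeling.

```python
import math
from collections import deque

def pruned_bfs(adj, s, left):
    """BFS from s that expands only vertices in `left`. The first vertices
    met outside `left` are recorded with their hop distance from s and are
    not expanded further (assumed pruning rule)."""
    dist = {s: 0}
    frontier = {}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w in dist:
                continue
            dist[w] = dist[u] + 1
            if w in left:
                queue.append(w)
            else:
                frontier[w] = dist[w]
    return frontier

def distance_via_frontier(frontier, query, h):
    """Minimum over frontier vertices w of d(s, w) + query(w, h)."""
    return min((d + query(w, h) for w, d in frontier.items()), default=math.inf)

# Graph after the removal of x; s and a belong to `left`, p and q do not.
adj = {"s": ["a", "p"], "a": ["q"], "p": ["h"], "q": ["h"], "h": []}
frontier = pruned_bfs(adj, "s", left={"s", "a"})
stored = {("p", "h"): 4, ("q", "h"): 1}  # stand-in for labeling queries
print(frontier, distance_via_frontier(frontier, lambda w, h: stored[(w, h)], "h"))
```

The sketch relies on the fact that a vertex outside left cannot have a successor in left, so every path from s consists of a prefix inside left followed by a suffix outside it; the suffix is answered by the labeling rather than explored. With weighted arcs, the levels would have to be replaced by relaxations in topological order.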
Once all distances are available, we process the vertices in in increasing order with respect to topological sorting. In particular, for each in increasing order of , we update the label entries by using the computed distances, the notion of superset, and the labeling l (see Lines 11–21 of Algorithm 9). Whenever we add a new label entry or update an existing one, we mark the entry so that we keep track of distances that have already been checked. On top of that, after the first iteration, we exploit the marked query every time we need to check whether a discovered distance d, passing through a vertex, is already encoded in the labeling or not. Finally, notice that, whenever we add a new label entry to the 2hc-sp labeling, we insert it so as to preserve the well-ordered property [26]. This property guarantees that the labeling is minimal in size (i.e., if a single entry is removed, the cover property is broken). To achieve it, vertices are sorted according to any reasonable criterion before the initial preprocessing takes place and, whenever a label entry associated with a hub h has to be added to the label set of a vertex v, this is done if and only if h precedes v in the established order (we refer the reader to References [21,26] for more details). We denote by the position of a vertex according to the established order.
Algorithm 9: Procedure forward.
Input: Directed Acyclic Graph G, 2hc-sp labeling l of G, vertex x to be removed from G, sets right and left
6.2. Backward Update
The procedure processes vertices in right in increasing order with respect to the same topological ordering T of G. Assume we are processing a given vertex, say v. We know that a label entry l is affected if and left (see Lemma 3). However, in this case, there may be marked label entries, for example, l such that left, that have been added in the forward update phase and that are therefore not considered as affected (they have already been updated). Moreover, it can be easily seen that, for any vertex with right, no label entry in l is affected by the removal of x from G (see again Lemma 3). Additionally, for the rest of cases where and right, by definition of T, u must have been processed before v. Hence, we proceed by removing all affected entries from l, where a label entry l now is affected only if or left and is not marked.
Notice that, after removing affected label entries from l, we can compute correct values of for any such that left, via a query query. However, to restore the cover property for other vertices, we may need to add new label entries to l and possibly to forward label sets of some of the vertices in left. To this end, symmetrically to the forward update case, we compute the superset of hubs, this time for the backward label l, denoted as , as the union of the hub vertices, belonging to left, for all backward label sets of all vertices in , that is:
Observe that, for any , the path between w and v in must pass through one of the vertices in , by the structure of the DAG G. Moreover, we also know that for any and for any , the distance can be correctly computed via a query query. In particular, it is given by the minimum value we obtain for among all vertices . If this value is not encoded in the labeling, we add to l if , otherwise we add to l.
We are now ready to discuss the correctness of the newly proposed approach.
Theorem 4 (Correctness of dag-decpll). Let G be a DAG, let l be a 2hc-sp labeling of G, and let x be a vertex of G. Let G′ and l′ be the output of algorithm dag-decpll when applied to G, l and x. Then: (a) G′ is a DAG and (b) l′ is a 2hc-sp labeling of G′.
Proof. Concerning (a), the proof is trivial. In fact, if the topological ordering property is true on the arcs of G, then it will hold on G′, as we only remove a vertex and its adjacent edges. Regarding (b), we need to show that the cover property holds for all pairs of vertices of the new graph. To this end, first observe that we remove all label entries that induce paths that include the removed vertex x. Then, notice that, in both forward and backward procedures, we test the property for all and only the vertices that are affected by the removal of x (sets right and left) and that the algorithm adds new label entries to vertices by considering them in the order imposed by the topological sorting. The addition of new label entries is done incrementally, by relying on distances that are: (i) computed in the new graph via customBFS; or (ii) obtained by combining distances encoded in the labeling that have surely not changed because of the removal of x; or (iii) marked, and hence already updated by previous iterations of the two procedures. □
Theorem 5 (Complexity of dag-decpll). Algorithm dag-decpll takes in the worst case.
Proof. Note that, for each vertex in left, the algorithm: (i) scans the neighbors and analyzes the label sets of such neighbors (possibly removing some entries); (ii) executes procedure customBFS; (iii) processes vertices in and for each one of them possibly performs a marked query. Concerning (i), asymptotically this costs overall quadratic time in the size of G, since the graph is acyclic and the worst case label size is . Concerning (ii), again we have an asymptotic time complexity that is quadratic with respect to the size of G, since customBFS must explore the whole graph in the worst case. Finally, the asymptotic time complexity of (iii) can be bounded by observing that vertices in can be at most and that for each of them we may execute a constant number of queries, which take each. Similar considerations can be made to bound the time spent by the algorithm for each vertex in right, with the exception of procedure customBFS, which is not executed, as vertices in left have already been processed. Therefore the claim follows. □
Theorem 6 (Correctness of Multi-criteria d-ptl). Consider an input timetable and a corresponding wred-te graph G. Let l be a 2hc-sp labeling of G and let e-sl be an extended stop labeling associated to l. Assume is a delay occurring on a connection, i.e., an increase of δ on its departure time. Let , , and e-sl′ be the output of d-ptl when applied to G, l and e-sl, respectively, by considering the delay, in the multi-criteria setting. Then: (i) is a wred-te graph for the updated timetable; (ii) is a 2hc-sp labeling for ; (iii) e-sl′ is an extended stop labeling for .
Proof. The correctness of d-ptl in the multi-criteria case is based on that of the approach in [15] and on Theorem 4. In particular, observe that, whenever we update the graph, we do it by preserving the constraints imposed by the wred-te model on both vertices, by suitably modifying connection arcs and associated waiting, bypass, and transfer arcs. In more detail, it is easy to prove, by contradiction, that after the execution of the d-ptl algorithm for the multi-criteria case, G is a wred-te graph. Concerning the labeling data structures, observe that after each change to G we use either dag-decpll or incpll, depending on the type of performed modification. These algorithms have been shown to compute a labeling that is a 2hc-sp labeling for the modified graph (see Theorem 4 or the proof in [24]). Hence, at the end of Algorithm 2, l is a 2hc-sp labeling for G. Finally, note that the algorithm described in Section 6.6 applies the definition of extended stop labeling, by updating the entry of a stop with the proper hub vertices, times and distance values. Hence, after the execution of the algorithm described in Section 6.6, e-sl is still an extended stop labeling of l. □
6.3. On Handling Arc Removals or Arc Weight Increases by dag-decpll
In this section, we provide an overview of how to extend dag-decpll to handle arc removals and arc weight increases. In detail, given an arc (u,v) to be removed from G, to update a 2hc-sp labeling l via dag-decpll, we model the removal as the removal of a virtual vertex, say x′, having arcs (u,x′) and (x′,v) in G. In particular, we remove (u,v) from G and run the dag-decpll procedure by considering x′ as the vertex to be removed. It is easy to observe that this has the same effect as updating l after the removal of (u,v) from G. It is important to mention that, since x′ is a virtual vertex, no label set exists that is associated to x′. To handle an arc weight increase for a generic arc (u,v), as a first step, we remove (u,v) and then update l using the approach mentioned above. We then insert a new arc (u,v) with the updated arc weight value, and run incpll, which updates l by possibly adding new label entries to l.
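To make the reduction concrete, note that a virtual vertex with arcs entering from u and leaving toward v has as ancestors u together with the ancestors of u, and as descendants v together with the descendants of v. The sketch below computes these two sets and the graph without the arc; the function names and the adjacency-list layout are assumptions, and the subsequent call to dag-decpll is not shown.

```python
def closure(adj, start):
    """All vertices reachable from `start`, including `start` (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def arc_removal_setup(adj, u, v):
    """Return (graph without arc (u, v), left, right), where left and right
    are the sets that a virtual vertex placed on (u, v) would induce."""
    rev = {a: [] for a in adj}
    for a, outs in adj.items():
        for b in outs:
            rev.setdefault(b, []).append(a)
    left = closure(rev, u)    # u and every vertex that reaches u
    right = closure(adj, v)   # v and every vertex reached from v
    pruned = {a: [b for b in outs if (a, b) != (u, v)] for a, outs in adj.items()}
    return pruned, left, right

adj = {"a": ["u"], "u": ["v", "c"], "v": ["b"], "b": [], "c": []}
pruned, left, right = arc_removal_setup(adj, "u", "v")
print(pruned["u"], sorted(left), sorted(right))
```

Since the graph is acyclic, neither set depends on whether the arc (u,v) itself is still present when the traversals are run.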
6.4. Compacting a Multi-Criteria Public Transit Labeling
In this section, we propose an extension of the notion of stop labeling sl, named extended stop labeling (shortly, e-sl), suited for answering multi-criteria queries. In particular, given a 2hc-sp labeling l of a wred-te graph G, we associate to each stop two sets, namely a forward stop label e-sl and a backward stop label e-sl where, in this case, the forward (backward, respectively) stop label is a list of triples of the form where
v is a hub vertex reachable from (that reaches, respectively) at least one vertex in dv[i] (av[i], respectively);
encodes the latest departure (earliest arrival, respectively) time to reach hub vertex v from one vertex in dv[i] (to reach a vertex in av[i] from vertex v, respectively);
encodes the minimum number of transfers to reach hub vertex v from one vertex in dv[i] (to reach a vertex in av[i] from vertex v, respectively).
Our approach to compact a labeling for multi-criteria queries is as follows. To compute e-sl, we process vertices in dv[i] in decreasing order with respect to departure time. In particular, let v be the vertex under consideration. Then, for each , we add to e-sl only if one of the following conditions hold:
To compute e-sl, symmetrically, we process vertices in av[i] in increasing order with respect to arrival times. If v is the vertex under consideration then, for each , we add to e-sl only if one of the following conditions hold:
Note that the above second conditions are necessary since, differently from the original stop labeling, here a generic hub h can be added more than once to e-sl or e-sl, since we might have more paths toward h (or from h) having different number of transfers.
Notice that, for the sake of efficiency, we sort entries in e-sl and e-sl with respect to the first, second and third fields, in this order, similarly to what is done in Reference [13] for the stop labeling. The detailed procedure for computing the extended stop labeling is shown in Algorithm 10.
Algorithm 10: Algorithm e-sl Computation.
Input: wred-te graph G, 2hc-sp labeling l of G
Output: Extended stop labeling e-sl
6.5. Answering to Multi-criteria Queries via Extended Stop Labeling
For answering a multi-criteria query mc-ea via extended stop labeling, we proceed as follows. Note that e-sl and e-sl are arrays sorted with respect to ids. As a first step, the algorithm finds the vertex v in e-sl (e-sl, respectively) whose time is greater than or equal to . Assume that this vertex is in position p (q, respectively) in such arrays.
Then, a linear sweep, starting from location p, is performed on sl to find the first entry satisfying the condition that . Let us assume this entry is stored in location . This part of the computation is known as the process of computing relevant hubs and it is followed by the computation of all hubs that are in both e-sl and e-sl, stored at locations greater than and q in e-sl and e-sl, respectively. We then perform a linear sweep on both e-sl and e-sl starting from and q, respectively. While performing the linear sweep, we add the corresponding journey for each matched hub we found to a temporary set, say M. Once M is computed, we order the journeys in M with respect to their arrival times. Then, we process journeys in M sequentially, and add a journey J to profile only if the accumulated number of transfers in J is less than the number of transfers of the journeys added so far to profile. Finally, we return profile as the answer to the profile query mc-ea.
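The final filtering step on the set M can be sketched as follows. Journeys are represented here, purely for illustration, as pairs (arrival time, number of transfers); breaking ties on arrival time in favor of fewer transfers is an assumption of the sketch.

```python
import math

def build_profile(matched):
    """Sort journeys by arrival time and keep a journey only if it uses
    strictly fewer transfers than every journey kept so far."""
    profile = []
    fewest = math.inf
    for arrival, transfers in sorted(matched):
        if transfers < fewest:
            profile.append((arrival, transfers))
            fewest = transfers
    return profile

M = [(600, 2), (590, 3), (610, 1), (605, 2), (620, 1)]
print(build_profile(M))
```

Each journey that is kept arrives no earlier than the previous ones but uses fewer transfers, so no journey in the returned profile dominates another with respect to the two criteria.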
6.6. Updating the Extended Stop Labeling
If an extended stop labeling e-sl is available and the network undergoes a delay then, after updating both the wred-te graph and the 2hc-sp labeling, the trivial way to update e-sl is to recompute it from scratch. However, to reduce the required computational effort, in what follows we propose a procedure that is able to exploit the information about the changed part of the graph and the 2hc-sp labeling to update the corresponding extended stop labeling in very short time.
In detail, the procedure for dynamically updating the extended stop labeling requires computing, during the execution of Algorithms 7–11, two sets of so-called updated stops, denoted, respectively, by us and us. In the multi-criteria case these two sets are defined as the stops such that vertices in dv[i] (av[i], respectively) had their time value or forward label (backward label, respectively) changed during Algorithm dag-decpll. Once this is done, we update the extended stop labeling e-sl by recomputing only those entries of e-sl (e-sl, respectively) for each (for each , respectively).
We are now ready to provide the following results.
Theorem 7 (Complexity of Multi-criteria d-ptl). Algorithm d-ptl in the multi-criteria setting takes computational time in the worst case.
Proof. The proof can be derived by the argument given in the proof of Theorem 2. In particular, we know that the worst case time complexity of both incpll and dag-decpll is for a graph with vertices and that these routines, in Algorithm d-ptl, can be executed, in the worst case, for all stops, which are . Since for any wred-te graph, the claim follows. □
Algorithm 11: Procedure backward.
Input: Directed Acyclic Graph G, labeling L of G, vertex x to be removed from G, sets and .