1. Introduction
Worldwide, cities face pressures to dynamically adapt their public transportation systems to better respond to increasingly complex changes in urban mobility. Pandemic-induced changes are driven by shifts in travel patterns, increased variability of traffic demand, the need to enforce safety norms of social distancing and hygiene, and rising advocacy for active modes of transportation [1]. In a disruptive and evolving socioeconomic context such as a pandemic, static studies with fixed temporal spans are of limited relevance, as their findings can quickly become obsolete [2]. Instead, a methodology for the dynamic and comprehensive tracing of actionable changes in urban mobility dynamics is needed, so that findings can be periodically updated as new multimodal traffic data arrive [3]. Still, the research literature on the dynamic assessment of the impacts of COVID-19 on public transport use is limited [4].
This work provides a comprehensive and actionable description of non-trivial traffic demand changes within a multimodal public transportation system between two reference periods from individual trip data. The city of Lisbon in Portugal is used as the reference study case to this end. Complete individual trip record data along the public transportation system from 2019 and 2020 is considered in this study. Public transportation in Lisbon operates under an integrated fare collection, offering the unprecedented possibility to trace the movements of each user along multiple modes within the public transportation system. In particular, we consider trip record data gathered from the three major public transportation modes in Lisbon: subway or underground, bus, and tramways.
The discovery of changing mobility patterns in urban centers is hindered by four major challenges [5,6]. First, trip record data are inherently multimodal and exhibit spatiotemporal stochasticity; their rich modal, geographical, and calendrical content should be properly explored. Second, massive volumes of timestamped validations are produced at subway stations, buses, and trams around the city, with over 50 million trips recorded in Lisbon per month. Analyzing such massive data places strict scalability requirements on the processing and learning algorithms. Third, there is a need to go beyond trivial changes in demand towards more informative views that account for changes in mode preference as well as emerging origin-destination traffic flows throughout the city. Fourth, there is a need to guarantee the statistical significance, actionability, interpretability, and navigability of the changing urban mobility patterns. This work applies a novel methodology to address these challenges, characterizing mobility dynamics in urban centers from user trip data by combining statistical principles with advanced pattern mining principles based on order-preserving biclustering searches.
The gathered results comprehensively map statistically significant changes in travel patterns within the city of Lisbon, going far beyond simplistic localized changes to the magnitude of traffic demand per geography. This research offers a statistical ground for the spatiotemporal assessment of actionable mobility changes and provides essential insights for other cities and public transport operators facing similar mobility challenges.
This work is anchored in the pioneering research and innovation project ILU, which joins the Lisbon City Council and national research institutes, bridging ongoing research on urban mobility with recent advances in artificial intelligence.
2. Background
Automated fare collection (AFC) systems produce individual trip records in public transportation, generally consisting of smart card validations from users at stations or vehicles. For each card validation, an individual trip record is issued with the passenger identifier, timestamp, boarding or alighting location and, for validations inside vehicles, additional details pertaining to the vehicle and route. In cities, such as Lisbon, the ticketing systems of public carriers are consolidated, offering the possibility to trace multimodal user movements along the public transportation network.
Individual trip record data are generally subjected to statistical exploration and visual inspection, and are often mapped into spatiotemporal data structures more conducive to subsequent traffic demand analysis. Three major structural representations of trip record data can be found: georeferenced time series of traffic demand at different locations and routes; end-to-end origin-destination (OD) data mapped from paired entry-and-exit card validations of users along the public transport network; and raw trip/event data. Given a specific traffic data structure:
descriptive tasks generally aim at extracting statistically significant patterns or at generatively modeling traffic dynamics;
predictive tasks are considered when forecasting upcoming traffic dynamics or discriminating particular traffic conditions of interest; and
prescriptive tasks rely on optimization principles over the described/predicted traffic dynamics to issue mobility recommendations.
Computational approaches for the analysis of traffic demand and OD series generally rely on classic statistical principles (including decomposition, auto-regression, differencing, and exponential smoothing operations) and on advances from machine learning: distance-based approaches reliant on series similarities (motif analysis, lazy learning, barycenter computation) [7] and recurrent neural network approaches for the autonomous learning of complex temporal associations. Complementary approaches have been proposed for raw trip record data analysis, including distance-based and generative approaches from event sets [8], episode mining [9], and dedicated neural processing architectures [10]. More frequently, raw trip data analysis resorts to the approaches introduced for demand/OD traffic series under specific spatiotemporal aggregation criteria.
Given a specific spatiotemporal data structure, a pattern is a set of spatially correlated observations that change coherently along time [11]. For instance, periodic patterns of urban traffic describe recurrent demand over regular time intervals at certain locations or origin-destination flows. For a given pattern solution, different criteria of interest can be measured: (i) pattern support, the number of observations satisfying the pattern; (ii) pattern length, the multivariate order and spatiotemporal extension of the pattern; and (iii) pattern strength, including confidence, lift, and interestingness, which defines the association strength among the elements composing a pattern.
Specific contributions within the spatiotemporal pattern mining field aim at describing changing dynamics.
Emerging Patterns (EPs) were first introduced by Dong et al. [12] in the context of multivariate observations collected from two periods/datasets. In this context, an emerging pattern was defined as a multivariate pattern whose support changed significantly between the two given periods. Neves et al. [5] extended this early notion of emerging pattern to encompass an arbitrary number of time periods and to further incorporate spatial information. In the context of this work, an emerging pattern is defined as a set of spatially correlated observations whose values satisfy specific growth, fitness, and support criteria along time. The growth criterion defines the rate at which observations change along time. Given a specific growth rate, the fitness (quality) criterion defines how well observations follow (or deviate from) the given expectations. Finally, the support criterion defines the number of observations (temporal extent) satisfying the given growth and fitness criteria. Emerging patterns not satisfying specific growth, fitness, and support criteria may be spurious and should therefore be discarded. Paradigmatic examples include patterns without clear trends, patterns without guarantees of fitness quality due to fitting errors, and patterns lacking the support needed for sound statistical testing.
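To make the three criteria concrete, the following minimal Python sketch assumes one possible operationalization that the text does not prescribe: growth as the slope of an ordinary least-squares linear trend, fitness as the coefficient of determination (R²) of that fit, and support as the number of observations. The names `assess_trend` and `is_emerging`, and any thresholds passed to them, are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class TrendAssessment:
    growth: float   # slope of the fitted linear trend (change per time step)
    fitness: float  # coefficient of determination (R^2) of the linear fit
    support: int    # number of observations used in the fit

def assess_trend(values: Sequence[float]) -> TrendAssessment:
    """Fit y = intercept + slope * t by ordinary least squares, t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(values))
    ss_tot = sum((y - y_mean) ** 2 for y in values)
    # A perfectly flat series is fitted exactly by a zero-slope line.
    fitness = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return TrendAssessment(growth=slope, fitness=fitness, support=n)

def is_emerging(values: Sequence[float], min_abs_growth: float,
                min_fitness: float, min_support: int) -> bool:
    """Hypothetical filter: keep a trend only if it passes all three criteria."""
    a = assess_trend(values)
    return (abs(a.growth) >= min_abs_growth
            and a.fitness >= min_fitness
            and a.support >= min_support)
```

For instance, a steadily declining series such as `[100, 90, 80, 70, 60]` yields a growth of -10 per step with a perfect fit, whereas a flat, noisy series fails the growth criterion. A relative growth (slope divided by mean demand) may be preferable when comparing locations with very different demand levels.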
Despite the relevance of these previous pattern-centric studies, they are arguably insufficient to assess changing traffic patterns in a pandemic context. Given the large extent of mobility changes observed throughout pandemic contexts, there is a need to deal with arbitrarily high volumes of patterns and to go beyond trivial views focused on how demand is changing at specific locations, towards more actionable views able to capture changing travel patterns throughout an urban centre. In this context, and in addition to the introduced growth, fitness, and support criteria, emerging patterns should further satisfy the following properties of interest: (i) non-triviality (novelty); (ii) actionability (ability to support real-world decisions and reveal relevant knowledge); (iii) robustness (bounded noise tolerance); (iv) statistical significance (exclusion of spurious patterns occurring by chance); (v) interpretability; (vi) coverage (complete solutions spanning different geographies and time periods); and (vii) efficiency of the pattern retrieval process.
3. Related Work
The discovery of actionable spatiotemporal mobility patterns has received particular attention in recent years with the increased availability of urban data, advances in spatiotemporal data analysis, and global pressure towards sustainability [13,14,15]. Classic approaches make use of statistics, parametric models, and visualization principles to understand spatiotemporal traffic dynamics, with particular focus on highlighting discrepancies between origin-destination matrices [15,16] and establishing views on changing traffic flows [17,18,19]. Clustering has also been applied to identify geographies with correlated traffic demand [20], to detect vulnerabilities along a transport network [21], and to assess complementary spatial associations against external factors of influence [22]. Classic pattern mining algorithms have also been successfully extended to detect trajectory patterns [23], patterns of daily congestion and spatial propagation [24,25], amongst other patterns of urban mobility [26]. Neves et al. [11] proposed the combined use of spatiotemporal data transformations and biclustering to comprehensively find congestion patterns from heterogeneous sources of road traffic data. Despite their relevance, applying these approaches to assess significant mobility changes in a pandemic context requires the visual inspection of the differences between the solutions derived from the multiple periods of interest.
Recently published studies have described the effects of the COVID-19 pandemic on urban mobility dynamics [27,28,29]. Most studies within this context focus on road traffic, identifying general trends observed before versus during quarantine [27,28]. Complementarily, some studies assess the pandemic effects on public transportation. Campisi et al. [30] offer a statistical analysis of post-pandemic mobility needs using comprehensive questionnaire data on public transportation in Sicily. The results capture changes in residents' perceptions regarding transportation, including the importance of remote work and active modes of transportation (e.g., cycling). However, the gathered results are not anchored in actual patterns from available traffic data. Similarly, Przybylowski et al. [31] rely on questionnaire data to assess the impact of the COVID-19 pandemic on public transport users in Gdansk, Poland. Sharifi and Khavarian-Garmsird [50] provide a qualitative assessment of the impacts of COVID-19 on air and water quality across cities during lockdown periods, and offer recommendations related to socioeconomic factors, urban management, governance, transportation, and urban design that can be used for post-COVID urban planning and design. Tamagusko and Adelino [32] studied changing traffic dynamics in Portugal during the first months of the pandemic, measuring drops in public transport demand. This study also establishes a relationship between re-transmission rates and the safety measures enforced by the Portuguese government. Despite its relevance, the study does not explore changing travel patterns.
In contrast with frequent, periodic, or anomalous patterns, changing patterns can be dynamically discovered to reveal the impacts of a pandemic context on urban traffic dynamics. The discovery of changing patterns can be traced back to two different research streams. In time series data analysis, the discovery of emerging behaviors generally corresponds to the modeling of both linear and non-linear trends within a time series [33]. Changing behaviors are generally approximated using regressive or auto-regressive models, including regime switching models and neural network models, fitted on the original time series or on a decomposed series after removing seasonal and cyclical components [34]. In the pattern mining field, emerging behaviors were coupled with the pattern concept in 1999, implying the satisfaction of statistical frequency criteria. An emerging pattern (EP), as first introduced by Dong et al. [12], is a set of data instances whose characteristics entail significant changes between two (or more) periods. This original notion of EPs has since been extended and applied in different domains [35,36]. Novak et al. [37] consistently combined principles of contrast set mining, emerging pattern mining, and subgroup discovery with the aim of discovering supervised temporal rules. Chen et al. [38] propose association rule discovery along different time periods. They extend the early concepts of Song et al. [39] towards emerging, unexpected, and added rules, and propose the corresponding evaluation measures of growth, difference, and modified difference. In the urban mobility domain, Neves et al. [5] proposed principles for the linear-time discovery of emerging and abrupt changes in road traffic along a specific interval.
Alternative time-changing patterns have been proposed for different data structures, including three-dimensional data via triclustering [40]; collections of events, using both generative and deterministic approaches [8,41]; and streaming data [42], by combining evolutionary algorithms with batch strategies.
Despite the relevance of the surveyed studies, their application for the comprehensive discovery of changing traffic dynamics within a multimodal public transportation system is still generally hampered by some of the following four major challenges:
the need to go beyond trivial views on how demand is changing at specific locations/routes, and unravel the newly formed circulation dynamics within a city;
the need to address the high volume of patterns in contexts where urban mobility entails large modifications;
the need to prioritize changing traffic dynamics in accordance with their discriminative power; and
the need to guarantee the statistical significance, interpretability and actionability of the found patterns.
In recent years, a clearer understanding of the synergies between biclustering and pattern mining paved the way for a new class of algorithms, generally referred to as pattern-based biclustering algorithms [43]. In 2021, biclustering found its first application in intelligent transportation systems [11], allowing the comprehensive discovery of (de)congestion road traffic patterns from stationary and mobile sensor data. Pattern-based biclustering algorithms are inherently prepared to efficiently find exhaustive sets of biclusters and offer the unprecedented possibility to control their structure, coherence, and quality [44]. This explains why this class of biclustering algorithms has received increasing attention in recent years [43]. BicPAMS (Biclustering based on PAttern Mining Software) consistently combines these state-of-the-art contributions on pattern-based biclustering [45].
Despite the relevance of pattern-based biclustering for pattern discovery, these algorithms have not been extended to explore actionable changes in demand from individual trip record data. In the context of our work, BicPAMS is applied with an order-preserving assumption [46] and extended with discriminative power to find changing mobility dynamics over the denormalized traffic data. The statistical frame placed by the pattern-based biclustering algorithms in BicPAMS [47] guarantees that the found changing mobility dynamics further satisfy specific criteria of statistical significance and actionability. An actionable pattern is one with practical value to acquire novel knowledge, aid decision making, or support mobility reforms.
4. Materials and Methods
4.1. Data
The target individual trip record data were made available by the two major carriers in the Lisbon metropolitan area, CARRIS (the tramway and major bus operator) and METRO (the subway operator). Individual trips correspond to smart card validations at METRO stations and on CARRIS buses and tramways, monitored through an integrated fare collection system. In this study, we consider all individual trips recorded throughout a typical pre-pandemic month, October 2019, and a post-pandemic month, May 2020. The city of Lisbon was under strict quarantine throughout May 2020, with changes to the applied quarantine restrictions at two moments, May 2nd and 18th. Over the October and May periods, a total of 38,845,645 and 14,867,335 trips were observed at the METRO and CARRIS networks, respectively. An illustrative set of anonymized raw trip records from CARRIS is provided in Table 1.
4.2. Statistical Exploration
Varying spatiotemporal criteria were considered to identify statistically significant changes in traffic demand within the public transportation system. Spatially, we consider both coarser geographies, given by well-established zoning criteria and groups of routes, and finer spatial criteria given by stations and routes. Geographical information pertaining to the major city routes vulnerable to congestion is also available. Temporally, we consider different calendrical conditions (weekdays, weekends, day of week), with typical weekdays (Tuesdays, Wednesdays, and Thursdays) as the default calendar for the conducted statistical analyses. To assess traffic demand levels, hourly aggregations of trip records are provided by default. A graphical interface with spatiotemporal navigation facilities was further developed to visualize demand changes across the city of Lisbon for two parameterizable periods of interest.
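The default hourly aggregation over typical weekdays can be sketched as follows. This is a minimal Python illustration assuming each trip record has been reduced to a hypothetical `(station, timestamp)` pair; the actual CARRIS and METRO records (Table 1) carry more fields, and the function name `hourly_demand` and the station identifiers are illustrative only.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

# Default calendar: Tuesday, Wednesday, Thursday (datetime.weekday() returns 0 for Monday).
TYPICAL_WEEKDAYS = {1, 2, 3}

def hourly_demand(records: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count card validations per (station, date, hour), keeping typical weekdays only."""
    counts: Counter = Counter()
    for station, ts in records:
        if ts.weekday() in TYPICAL_WEEKDAYS:
            counts[(station, ts.date(), ts.hour)] += 1
    return counts

# Hypothetical records: two validations on Tuesday 1 October 2019, one on Saturday 5 October 2019.
records = [
    ("s1", datetime(2019, 10, 1, 8, 15)),
    ("s1", datetime(2019, 10, 1, 8, 40)),
    ("s1", datetime(2019, 10, 5, 8, 10)),
]
demand = hourly_demand(records)
```

Here the Saturday validation is discarded and the two Tuesday validations fall into the same 8:00 bin, so `demand` holds a single entry with a count of 2. Coarser geographies could be handled by mapping each station to its zone before counting.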
4.3. Discovery of Discriminative Patterns of Changing Traffic Dynamics
To address the limitations of the surveyed approaches for traffic pattern discovery, we propose the combined use of spatiotemporal data transformations and biclustering to comprehensively find actionable patterns of traffic demand changes from individual trip record data distributed along two periods of interest. In contrast with clustering, biclustering (the discovery of subspaces within real-valued data) makes it possible to search for traffic patterns across subsets of geographies, offering modular views.
Definition 1. Let $S = \{s_1, \dots, s_m\}$ be the set of stations and geographies of interest; $\delta$ be a time interval of interest within a day; and $a_s^{\delta}$ be the demand observed for a station, $s \in S$, along a $\delta$ interval, generally corresponding to the number of entry and/or exit card validations. An order of stations, $\pi = (\pi_1, \dots, \pi_m)$, is a permutation of stations in accordance with their demand for a given time interval $\delta$, i.e., such that $\{\pi_1, \dots, \pi_m\} = S$ and $a_{\pi_1}^{\delta} \le a_{\pi_2}^{\delta} \le \dots \le a_{\pi_m}^{\delta}$.
Definition 2. Given a set of day instances $\{x_1, \dots, x_n\}$, a set of stations $S$, and a time interval $\delta$, an order-preserving pattern is a frequent ordering of stations, $\pi$, where the frequency should be sufficient to guarantee that the association is statistically significant, thus deviating from null expectations.
Given two reference time periods, $T_1$ and $T_2$, each with a set of day instances, a discriminative order-preserving pattern corresponds to either an order-preserving pattern in $T_1$ that is frequently disrupted in $T_2$, denoted $\pi \Rightarrow T_1$, or an order-preserving pattern in $T_2$ that is frequently disrupted in $T_1$, denoted $\pi \Rightarrow T_2$.
A changing mobility dynamic is seen as a disrupted ordering of traffic demand among stations or geographies. In other words, changing dynamics occur when order-preserving associations are significantly altered between two time periods. A changing mobility dynamic can thus be given by a discriminative order-preserving pattern of traffic demand in accordance with Definition 2.
Figure 1 instantiates some of the concepts introduced in Definitions 1 and 2. In this illustrative scenario, we consider the analysis of demand along a set of days from two reference periods, $T_1$ and $T_2$. We observe the formation of two order-preserving patterns: one of them is discriminative, whereas the other is not, as it is not frequently disrupted in the contrasting period. Further details on the discovery process and on the computation of the discriminative power and statistical significance of the patterns are introduced later.
The intuition behind the target pattern discovery process is to comprehensively identify changes in the orders of demand among stations between two reference periods during particular times of the day (e.g., peak hours). The focus on disrupted order-preserving patterns is essential to guarantee their actionability. Understandably, an alternative and simplistic focus on changes to demand levels would lead to large volumes of trivial patterns informing the user of general decreases in demand throughout pandemic times, with poor translation into operational and tactical mobility reforms.
The methodological approach for mining the target discriminative patterns of order-preserving demand is introduced along three major steps:
data mappings necessary to solve the tackled problem using biclustering tasks;
application of a biclustering algorithm with order-preserving coherence and guarantees of discriminative power;
retrieval and postprocessing of public transport traffic patterns from biclustering solutions.
4.3.1. Data Mappings
The first step of the discovery process of changing mobility dynamics is to fix spatial and temporal constraints, including the target geographies, time periods, and weekday annotations for the composition of data instances. By default, the discovery process considers all available geographies of the search space at their finest level (i.e., stations) and uses typical weekdays as the reference calendar to produce comparable data instances. In addition, the time intervals of interest (e.g., hour, on/off-peak intervals) can be optionally specified to guide traffic demand aggregation. By default, daily demand is considered.
Once these constraints are fixed, data mappings are applied to transform the original spatiotemporal data into a tabular data structure, more conducive to the subsequent pattern mining task. In the target structure, each observation/row represents a day; each variable/column corresponds to a network station or geography (set of stations); and the observed values correspond to the aggregate demand on a given day and station under a specific calendric constraint.
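A minimal Python sketch of this mapping follows, assuming the aggregated demand is available as a hypothetical mapping from `(day, station)` pairs to counts. The function name and toy values are illustrative, and filling missing pairs with zero (no validations recorded) is an assumption; the text does not specify how gaps are handled.

```python
from typing import Dict, List, Sequence, Tuple

def to_day_station_matrix(
    daily_demand: Dict[Tuple[str, str], int], stations: Sequence[str]
) -> Tuple[List[str], List[List[int]]]:
    """Rows are days (sorted), columns follow `stations`; each cell is aggregated demand.

    Missing (day, station) pairs are filled with 0 (assumed: no validations recorded).
    """
    days = sorted({day for day, _ in daily_demand})
    rows = [[daily_demand.get((day, s), 0) for s in stations] for day in days]
    return days, rows

# Hypothetical aggregated demand for two typical weekdays and two stations.
daily_demand = {
    ("2019-10-01", "s1"): 120,
    ("2019-10-01", "s2"): 300,
    ("2019-10-02", "s1"): 110,
}
days, matrix = to_day_station_matrix(daily_demand, ["s1", "s2"])
```

Here `days` is `["2019-10-01", "2019-10-02"]` and `matrix` is `[[120, 300], [110, 0]]`; each row is one data instance (a day) and each column one station, matching the tabular structure described above.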
4.3.2. Discriminative Order-Preserving Biclustering
Under the previous mappings, traffic data still preserve their spatiotemporal content, yet denormalized within a tabular data structure, making them a candidate for the application of biclustering. Biclustering aims at finding subsets of observations with values correlated on a subset of variables.
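Definition 3. Given a traffic dataset defined by a set of day instances, $X = \{x_1, \dots, x_n\}$, stations/geographies $Y = \{y_1, \dots, y_m\}$, and elements $a_{ij}$ measuring the demand observed on day $x_i$ at station $y_j$ over a given time interval of the day $\delta$:
A bicluster $B = (D, S)$ is a subspace, where $D = \{x_{i_1}, \dots, x_{i_p}\} \subseteq X$ is a subset of days and $S = \{y_{j_1}, \dots, y_{j_q}\} \subseteq Y$ is a subset of stations or geographies;
The biclustering task aims at identifying a set of biclusters $\mathcal{B} = \{B_1, \dots, B_r\}$ such that each bicluster $B_k = (D_k, S_k)$ satisfies specific criteria of homogeneity and statistical significance.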
Homogeneity criteria are commonly guaranteed through the use of a merit function. Merit functions, such as the variance of the values in a bicluster [48], are typically applied to guide the formation of biclusters in greedy and exhaustive searches.
Statistical significance criteria can be further placed to guarantee that the retrieved biclusters cannot occur by chance, i.e., their occurrence deviates from expectations.
Definition 4. Elements in a bicluster $B = (D, S)$ have coherence iff $a_{ij} = k_j + \gamma_i + \eta_{ij}$, where $k_j$ is the expected demand for station $y_j$, $\gamma_i$ is the adjustment for day $x_i$, and $\eta_{ij}$ is the noise factor.
A bicluster is said to be constant when $\gamma_i = 0$. A bicluster, $B = (D, S)$, is order-preserving iff the values for each day in $D$ induce the same linear ordering $\pi$ along the set of stations in $S$.
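The order-preserving condition can be checked directly on a candidate bicluster. The Python sketch below is illustrative only: it verifies a given bicluster rather than searching for one as BicPAMS does, the helper names and toy demand values are hypothetical, and ties in demand are broken by station name, an assumption the text does not address.

```python
from typing import List, Optional, Sequence, Tuple

def induced_order(values: Sequence[float], stations: Sequence[str]) -> Tuple[str, ...]:
    """Stations sorted by ascending demand on one day (ties broken by station name)."""
    return tuple(s for _, s in sorted(zip(values, stations)))

def bicluster_order(
    matrix: List[List[float]],
    day_idx: Sequence[int],
    station_idx: Sequence[int],
    stations: Sequence[str],
) -> Optional[Tuple[str, ...]]:
    """Return the shared ordering pi if every selected day induces the same order
    over the selected stations (an order-preserving bicluster), else None."""
    cols = [stations[j] for j in station_idx]
    orders = {induced_order([matrix[i][j] for j in station_idx], cols) for i in day_idx}
    return orders.pop() if len(orders) == 1 else None

stations = ["s1", "s2", "s3"]
matrix = [
    [120, 300, 80],   # day 0
    [60, 150, 45],    # day 1: roughly half the demand of day 0, same ordering
    [200, 100, 90],   # day 2: different ordering
]
```

Days 0 and 1 form an order-preserving bicluster over all three stations, with ordering `("s3", "s1", "s2")`, even though their demand levels differ markedly, so they would not form a constant bicluster; adding day 2 breaks the shared ordering and the function returns `None`.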
The constant coherence assumption has a limitation: two days need to show identical demand (up to noise) in order to count as supporting observations of a bicluster. However, demand is inherently stochastic and highly variable, particularly in a pandemic context. Order-preserving patterns are therefore pursued to guarantee greater robustness to traffic variability, while still guaranteeing the coherence of the target traffic patterns. According to Definitions 2 and 4, order-preserving patterns correspond to frequent orders of demand among a set of stations for a given time interval (e.g., peak hours).
BicPAMS is the suggested biclustering search for the discovery of order-preserving patterns [45]. As discussed in greater detail in Section 3, BicPAMS is inherently able to retrieve patterns with an easily parameterized homogeneity criterion and strict guarantees of optimality and statistical significance [45,47].
To guarantee that the found patterns are discriminative of a specific period, the biclustering search was extended so that the observed orders yield discriminative power with respect to the period class. For instance, given two stations, $s_1$ and $s_2$, if $s_1$ shows consistently higher demand than $s_2$ in period $T_1$ and consistently lower demand in period $T_2$, the corresponding order-preserving patterns, $s_1 > s_2$ and $s_2 > s_1$, are discriminative.
To this end, the lift of the candidate association rules,
$$\mathrm{lift}(\pi \Rightarrow T) = \frac{P(T \mid \pi)}{P(T)} = \frac{P(\pi \wedge T)}{P(\pi)\,P(T)}, \qquad T \in \{T_1, T_2\},$$
is assessed to restrict the formation of candidate patterns within the biclustering search to patterns with lift above a given threshold $\theta_{\mathrm{lift}}$. On one hand, the higher the minimum lift, the higher the discriminative power of the retrieved order-preserving patterns, i.e., only the more accentuated differences in demand orderings between time periods $T_1$ and $T_2$ are returned. On the other hand, lower lifts (yet still considerably above 1) can be useful to obtain a more comprehensive view of the disrupted orders, even when they occur with looser discriminative power. From empirical evidence, $\theta_{\mathrm{lift}} = 1.5$ is suggested to extract order-preserving patterns with solid discriminative power without losing potentially relevant patterns.
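The lift criterion can be illustrated outside the biclustering search. The Python sketch below assumes day instances labeled with their period and computes lift(π ⇒ T) = P(T | π) / P(T), where a pattern holds on a day when demand strictly increases along its station order. It is not the BicPAMS implementation; the function names, the data layout, and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Day = Tuple[str, Dict[str, float]]  # (period label, demand per station)

def holds(demand: Dict[str, float], order: Sequence[str]) -> bool:
    """True iff demand strictly increases along the station order on this day."""
    return all(demand[a] < demand[b] for a, b in zip(order, order[1:]))

def lift(days: List[Day], order: Sequence[str], period: str) -> float:
    """lift(order => period) = P(period | order holds) / P(period).

    Returns 0.0 when the order never holds or the period has no days.
    """
    n_period = sum(1 for label, _ in days if label == period)
    supporting = [label for label, demand in days if holds(demand, order)]
    if not supporting or n_period == 0:
        return 0.0
    confidence = sum(1 for label in supporting if label == period) / len(supporting)
    return confidence / (n_period / len(days))

MIN_LIFT = 1.5  # threshold suggested in the text

# Toy data: three days per period, two stations.
days = [
    ("T1", {"s1": 10, "s2": 30}), ("T1", {"s1": 12, "s2": 28}), ("T1", {"s1": 9, "s2": 25}),
    ("T2", {"s1": 8, "s2": 5}), ("T2", {"s1": 7, "s2": 6}), ("T2", {"s1": 11, "s2": 12}),
]
```

In this toy example, the ordering s1 < s2 holds on all T1 days and on one T2 day, giving a lift of 1.5 for T1, exactly at the suggested threshold; the reversed ordering s2 < s1 holds only on two T2 days, giving a lift of 2.0 for T2. In the described method this check is embedded in the search and complemented by statistical significance testing, which the sketch omits.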
6. Discussion
The reported research applied an enhanced methodology for discovering changing traffic patterns in public transportation systems using the city of Lisbon as a case study. More specifically, it bridged principles from urban computing and artificial intelligence algorithms to detect order-preserving patterns of demand among stations, along with their discriminative assessment, to place a focus on actionable mobility changes. Principles from incremental data mining and online learning can further be applied to learn from continuously arriving trip record data. These principles allow the computational models to be updated as more recent data arrive, offering the possibility to dynamically reflect ongoing mobility changes throughout the current COVID-19 pandemic and future pandemics.
To the best of our knowledge, this is the first study that applies advanced pattern mining principles based on order-preserving biclustering algorithms within the urban mobility field. This feature enables transportation planners to analyse critical nodes from a holistic perspective along corridors or specific zones, and to trace changes occurring in different modes across multiple geographies at a time. This approach aims to support the city and its public transportation operators in optimizing supply, exploring synergies, and dynamically redesigning their services and routes during disruptive events to better meet passenger needs.
Among the multiplicity of findings, the research revealed that the impact of COVID-19 on public transportation demand was considerably lower at stations located outside the Lisbon municipality and in lower-income zones. A possible interpretation is the need experienced by some daily commuters to keep relying on these transport modes to travel to work. The observed relationship between the demand impact and average income is consistent with the arguments on inequality reported by Sharifi and Khavarian-Garmsird [50]. In this context, public transportation proves key to addressing mobility needs during disruption and, hence, the provision of safe services is a means to contribute to social equity goals. On the other hand, the patterns and stations registering the highest decrease in demand are mostly located in specific areas of the city of Lisbon (with higher housing prices per square meter), which seems to indicate that residents there could adapt their working arrangements and shift to digital forms of work. Overall, the impacts of COVID-19 on public transportation highlighted the socioeconomic disparities of users across different areas and corridors.
The analysis of the changing mobility dynamics along the major city arteries (Figure 8 and Figure 9) shows that stations serving the downtown district (Cais do Sodré, Rossio, and Terreiro do Paço) and the central business district (Saldanha and Marques de Pombal) suffered a heightened demand contraction after the pandemic, altering the demand effects that these classic traffic-attracting poles had established before the pandemic. Upper stretches of Avenida Almirante Reis also showed larger changes than stations lower on the same avenue, for both modes of transportation. On the Marques de Pombal to Campo Grande route, we observe a greater impact on demand at METRO stations than at bus stops, particularly at stations serving large commercial and sports poles, such as Benfica and Campo Grande. The observed degree of pandemic-triggered changes to urban mobility is consistent with other works [27,30,51,52].
The gathered discriminative patterns (Figure 10 and Figure 11) augment these views, supporting the comparison of relative geographical variations in traffic demand with the aim of unveiling actionable bottlenecks. As the rates of change in demand are not uniform throughout the city, the discovery of patterns with such actionable characteristics is essential to produce focused information for promoting a more resilient public transportation system.
Further policy and planning measures are expected to outline a roadmap or pathway towards sustainable and multimodal mobility within the Lisbon metropolitan area. In future work, it would also be important to understand whether the patterns found have changed after the second and third waves of the COVID-19 pandemic in Lisbon. The computational approach can be directly applied to periods other than those considered, including different moments within the pandemic. Following the introduced methodology, the user only has to specify the periods of interest, with the patterns and views being dynamically updated.