#### *2.1. Starting Premise*

Our first purpose is to construct a methodology and a mathematical framework that allow us to classify BSS trips according to their intrinsic purpose. In this respect, we can distinguish four types of BSS users [5]: (a) commuters, who move between their residence and their workplace or a secondary transport node; (b) utility users, who need to reach commercial, cultural or sports facilities; (c) leisure users, who cycle for fun and sport; and (d) tourists, who visit tourist sites and attractions. To reach our main objective, we define two categories of BSS trips: *transport*, which includes user types (a) and (b); and *leisure*, which includes user types (c) and (d). Note that we classify *trips*, not *users*, because our final goal is to characterize BSS mobility and its intrinsic network. In this sense, a given user can perform trips that fall in either category.

A trip is a sequence of locations and time stamps. From this set of spatio-temporal points, we can obtain different variables that describe the trip, such as speed and length. Regarding speed, we could expect leisure trips to be slower than transport trips. However, speed may not be adequate for this purpose, as it depends on factors such as age, physical abilities, road steepness, or traffic. Length, on the other hand, is intrinsically determined by the separation between origin and destination, so its absolute value cannot directly reflect the purpose of the journey.
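As a minimal sketch of how length and mean speed can be derived from such a sequence, the code below assumes that each point is a GPS fix given as latitude and longitude in degrees plus a time stamp in seconds, and it approximates the Earth as a sphere. The `Fix` type and the function names are hypothetical and do not correspond to any particular BSS data format.

```python
import math
from dataclasses import dataclass
from typing import Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption


@dataclass
class Fix:
    """One spatio-temporal point of a trip (hypothetical record layout)."""
    lat: float  # degrees
    lon: float  # degrees
    t: float    # seconds since the start of the trip


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance in meters between two fixes."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def trip_length_m(fixes: Sequence[Fix]) -> float:
    """Length of the trajectory: sum of the distances between consecutive fixes."""
    return sum(haversine_m(a, b) for a, b in zip(fixes, fixes[1:]))


def mean_speed_ms(fixes: Sequence[Fix]) -> float:
    """Mean speed in m/s over the whole trip."""
    duration = fixes[-1].t - fixes[0].t
    if duration <= 0:
        raise ValueError("a trip needs increasing time stamps")
    return trip_length_m(fixes) / duration
```

For two fixes one degree of latitude apart, `trip_length_m` returns about 111.2 km, the spherical arc length of one degree.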

So, let us look at the trajectory of the trip. Research on public transportation systems assumes that users apply some utility function when deciding whether to choose a given transport mode [18,19]. Consequently, we build our framework on the following starting premise: transport trips describe trajectories close to the shortest path.

#### *2.2. Trip Index*

Our objective is to define a *trip index α* that allows us to classify trips as transport or leisure. To calculate this index, we rely on trajectories. Note that trajectories do not need to be identical to describe a specific type of trip; they just have to share a common semantic meaning, which, in our case, is given by the trip index. Following our starting premise, the trip index must represent the deviation of the user's actual trajectory from the shortest path.

In other scientific disciplines, such as biology and hydraulics, researchers have faced similar problems when describing the movements of animals searching for food [20] and the course of rivers through valleys [21], respectively. These works make use of the *sinuosity index*, *SI*, which relates the lengths of the actual and linear paths:

$$SI = \frac{d\_p}{d\_L},$$

where *dp* is the length of the actual trajectory and *dL* is the Euclidean distance between origin and destination.

However, straight-line trajectories are rarely viable in cities because of the intrinsic topology of the infrastructure. Thus, we must take the shortest possible path as the base reference. Considering these premises, we define the trip index *αp* of a trip *p* from origin *i* to destination *j* as:

$$
\alpha\_p = \frac{d\_{SP}(i, j)}{d\_p},
\tag{1}
$$

where *dp* is the actual distance traveled in trip *p* and *dSP* is the length of the shortest viable path from origin to destination. However, the peculiarities of a BSS add a level of complexity to the calculation of this shortest path.
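The sinuosity index and Equation 1 can be contrasted in a few lines of Python. This is a sketch with hypothetical function names and made-up distances, not code from the original study; it shows that *SI* grows above 1 as a trip becomes more sinuous, whereas the trip index drops below 1 as the trip departs from the shortest viable path.

```python
def sinuosity_index(d_p: float, d_L: float) -> float:
    """SI = d_p / d_L: actual length over straight-line (Euclidean) distance."""
    if d_L <= 0:
        raise ValueError("origin and destination must differ (d_L > 0)")
    return d_p / d_L


def trip_index(d_SP: float, d_p: float) -> float:
    """Equation 1: alpha_p = d_SP / d_p, shortest viable path over actual length.

    When the actual trajectory is itself a viable path, d_SP <= d_p, so alpha_p
    lies in (0, 1] and equals 1 only if the user followed a shortest path.
    """
    if d_p <= 0:
        raise ValueError("trip length must be positive")
    return d_SP / d_p


# Hypothetical trip: 1000 m apart in a straight line, 1300 m along the
# shortest viable path, 1625 m actually ridden.
print(sinuosity_index(1625.0, 1000.0))  # 1.625
print(trip_index(1300.0, 1625.0))       # 0.8
```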

#### *2.3. Spaces, Trajectories and Shortcuts in a BSS*

To represent trips in a BSS, we must consider two spaces. On the one hand, BSSs are deployed in a *real physical space* with constraints. Some of these constraints are immutable, such as buildings or rivers, while others are subject to change, like one-way streets or shared lanes (which cars and bicycles can use simultaneously). On the other hand, this constrained physical space defines an *underlying graph*, which represents the space in which travel is actually allowed. The underlying graph continuously evolves in time, as it is affected by the mutable constraints.

In addition, given the particularities of BSSs, we have to contend with the fact that the underlying graph is not complete: we must also consider connections that exist in the real space but are missing from the graph. This is because cyclists can get around some restrictions using, for example, pedestrian crossings or sidewalks. We will collectively refer to these extra graph elements as *shortcuts*. Furthermore, the existence of these shortcuts has a complex time dependency, as they may be affected by factors that make them viable or not at specific moments. For example, consider crossing a street at a traffic light: only when the light is green does this shortcut become a viable alternative that reduces the minimum distance. Thus, shortcuts can be regarded as graph edges with a time-to-live.
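To make the idea of shortcuts as edges with a time-to-live concrete, the sketch below runs Dijkstra's algorithm on a directed graph (directed so that one-way streets can be represented) and adds each shortcut only if the query time falls inside its validity window. The edge lists, node names, time windows and the function name are all hypothetical; calling it without a time returns the shortest path on the underlying graph alone.

```python
import heapq
import math
from collections import defaultdict


def shortest_length(edges, shortcuts, source, target, t=None):
    """Dijkstra over directed edges plus the shortcuts alive at time t.

    edges:     iterable of (u, v, length), always available.
    shortcuts: iterable of (u, v, length, start, end), usable only when
               start <= t < end (a hypothetical time-to-live model).
    Returns math.inf if target cannot be reached.
    """
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
    if t is not None:
        for u, v, length, start, end in shortcuts:
            if start <= t < end:
                graph[u].append((v, length))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


# Hypothetical example: two street segments A->B->C and a crossing A->C
# that is usable only while its light is green (0 s <= t < 30 s).
edges = [("A", "B", 400.0), ("B", "C", 400.0)]
shortcuts = [("A", "C", 500.0, 0.0, 30.0)]
print(shortest_length(edges, shortcuts, "A", "C"))          # 800.0
print(shortest_length(edges, shortcuts, "A", "C", t=10.0))  # 500.0
print(shortest_length(edges, shortcuts, "A", "C", t=45.0))  # 800.0
```

Checking every shortcut at a single instant is a simplification: a fuller treatment would test each window at the moment the cyclist actually reaches that edge, which is one reason why a set of viable shortest paths, rather than a single one, emerges.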

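Considering these premises, we can define four different trajectories for a given BSS trip:

- *Linear trajectory* (*L*): the straight line from origin to destination, with length *dL*.
- *Orthodox trajectory* (*O*): the shortest path on the underlying graph that respects the regulations applicable to bicycles, with length *dO*.
- *Heterodox trajectory* (*H*): the shortest path once the shortcuts viable at the time of the trip are added to the graph, with length *dH*.
- *Actual trajectory* (*p*): the path the user actually followed, with length *dp*.
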
#### *2.4. Characterizing the Shortest Path in a BSS Trip*

Given a trip *p*, the Euclidean distance *dL* from origin *i* to destination *j* defines the length scale of the problem: the size of graph links is measured relative to this distance.

The orthodox trajectory, in turn, is the one that minimizes the overall distance traveled from origin to destination using the corresponding city infrastructure (streets and bicycle lanes) and respecting the regulations applicable to bicycles. This orthodox trajectory does not include the heterodox alternatives that result from inserting shortcuts in the graph, which can reduce the shortest available distance below *dO*.

In addition, we must take into account that these shortcuts evolve in time. This variation implies that we cannot aim to define a *fixed* shortest path, only a *set of viable* shortest paths. Effectively, we need some way of characterizing the statistical distribution of this set of available shortest paths.

The problem of directed paths is intrinsic to many disciplines, such as polymer physics in the presence of forces, percolation theory, and directed random walks, to name but a few. Using some of the more basic concepts from these fields, we can infer the basic structure of the probability distribution of shortest paths. This distribution function is built from a collection of paths constrained to have the same start and end points, which are separated by *dL*. In effect, we are considering a distribution function constructed from a set of paths with a fixed extension. The actual distribution will be complex, and the details of its form will depend on the nature of the available paths, but we propose that its coarse characteristics are defined by only a limited number of parameters. The simplest spatial representation of this statistical distribution of paths is an ellipsoid with origin and destination located at its vertices. The width of the ellipsoid is dictated by the allowed variance and frequency of deviations from the direct path. Consequently, we obtain a crude characterization of this distribution from the defining geometry of the ellipsoid. Hence, the solution to our problem revolves around the calculation of the major and minor axes of this ellipsoid. The former is the linear trajectory, with length *dL*. Therefore, let us concentrate on the latter.

The minor axis reflects the width of the distribution caused by deviations of the actual trip from the linear path. The scientific literature on urban mobility offers a set of approaches that estimate deviations from trajectories using map-matching methods or path-searching problems [22]. These approaches often aim to match a set of GPS data points to one of the possible trajectories a vehicle can describe through the city's underlying graph. Among them, researchers have used methods such as the A\* search algorithm [23], the Manhattan distance [24], or the Hausdorff distance [25].

However, our problem is intrinsically different, as we need to characterize the distribution function of a set of viable and time-dependent shortest paths. This set of shortest paths has an absolute minimum, the linear path. Consequently, we can evaluate deviations using a metric of similarity to the linear path [26]. For this purpose, we chose the Fréchet distance [27], which measures the similarity between two curves, taking into account the location and order of the points along them. Intuitively, imagine a person walking a dog on a leash. The person and the dog describe finite paths *f* and *g*, respectively. Both can vary their speed, but neither can walk back. The Fréchet distance between curves *f* and *g* can be seen as the minimum length of the leash that allows both of them to traverse their paths from start to end.

Formally [28], let (*M*, *d*) be a metric space; we define a *curve* as a continuous mapping:

$$f \colon [v\_0, v\_1] \to M, \quad v\_0, v\_1 \in \mathbb{R}, \ v\_0 \le v\_1.$$

Given two curves *f* : [*v*0, *v*1] → *M* and *g* : [*w*0, *w*1] → *M*, their *Fréchet distance* is defined as:

$$\delta\_{F}(f, g) = \inf\_{\substack{v \colon [0,1] \to [v\_0, v\_1] \\ w \colon [0,1] \to [w\_0, w\_1]}} \max\_{t \in [0,1]} d\left(f\left(v(t)\right), g\left(w(t)\right)\right),$$

where *v* and *w* are arbitrary continuous nondecreasing functions with *v*(0) = *v*0, *v*(1) = *v*1, *w*(0) = *w*0, and *w*(1) = *w*1, and *t* ∈ [0, 1] is the path parameter, typically time.

For trajectories formed by a collection of points rather than a continuous function, we can use the discrete Fréchet distance, also called the *coupling distance* [29], which approximates the Fréchet metric for polygonal curves. Following the intuitive example above, the discrete Fréchet distance replaces the person and the dog with a pair of frogs leaping from point to point.

Using the formal definition in [29], let *P* and *Q* be two polygonal curves whose line-segment endpoints form the sequences *σ*(*P*) = ⟨*v*1, *v*2, ..., *vp*⟩ and *σ*(*Q*) = ⟨*w*1, *w*2, ..., *wq*⟩. A coupling *C* between *P* and *Q* is a sequence of distinct pairs of endpoints in *σ*(*P*) × *σ*(*Q*) taken in order:

$$C = \left( \left( v\_{a\_1}, w\_{b\_1} \right), \left( v\_{a\_2}, w\_{b\_2} \right), \dots, \left( v\_{a\_m}, w\_{b\_m} \right) \right).$$

The first and last pairs are formed by the two origins and the two destinations, respectively, i.e., (*v*1, *w*1) and (*vp*, *wq*). The remaining pairs follow an iterative sequence: in each step, we move to the next endpoint in *σ*(*P*), in *σ*(*Q*), or in both, never going back. Consequently, the coupling is not unique; every admissible combination of endpoints generates a coupling *Ck*, with *k* ∈ [1, *K*], whose length is determined by the longest distance among its pairs of endpoints, i.e.,

$$||C\_k|| = \max\_{i=1,2,\dots,m} d\left(v\_{a\_i}, w\_{b\_i}\right).$$
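The definition of a coupling can be checked by brute force on tiny curves. This sketch, with hypothetical function names, assumes that each step advances along *P*, along *Q*, or along both, never backwards; it enumerates every coupling as a list of 0-based index pairs (*ai*, *bi*) and evaluates ||*Ck*|| with Euclidean distances in the plane.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]
Coupling = List[Tuple[int, int]]  # 0-based index pairs (a_i, b_i)


def couplings(p: int, q: int) -> Iterator[Coupling]:
    """Yield every coupling between curves with p and q endpoints."""
    path: Coupling = [(0, 0)]  # first pair: the two origins

    def extend() -> Iterator[Coupling]:
        a, b = path[-1]
        if (a, b) == (p - 1, q - 1):  # last pair: the two destinations
            yield list(path)
            return
        # Advance along P, along Q, or along both; never go back.
        for da, db in ((1, 0), (0, 1), (1, 1)):
            na, nb = a + da, b + db
            if na < p and nb < q:
                path.append((na, nb))
                yield from extend()
                path.pop()

    yield from extend()


def coupling_length(P: Sequence[Point], Q: Sequence[Point], C: Coupling) -> float:
    """||C_k||: the longest Euclidean distance among the coupled endpoints."""
    return max(math.dist(P[a], Q[b]) for a, b in C)


# Hypothetical curves: a straight line given by its endpoints and a detour.
L = [(0.0, 0.0), (2.0, 0.0)]
O = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
lengths = [coupling_length(L, O, C) for C in couplings(len(L), len(O))]
print(len(lengths), min(lengths))  # 5 couplings; minimum is sqrt(2), about 1.414
```

The number of couplings grows exponentially with the number of endpoints (it follows the Delannoy numbers), so enumeration is only useful for checking small cases; taking the minimum of `coupling_length` over all couplings gives the quantity defined next.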

Finally, the *discrete Fréchet distance* between the polygonal curves *P* and *Q* is defined as the minimum length among all the possible *K* couplings between them:

$$\delta\_{DF}(P,Q) = \min\_{k \in [1,K]} \{||C\_k||\}. \tag{2}$$
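Computing Equation 2 by enumerating couplings is impractical, but the minimum can be obtained by dynamic programming over pairs of endpoints. The following is a minimal sketch of the standard quadratic-time recurrence, assuming planar points and Euclidean distance; `discrete_frechet` is a hypothetical name, and geographic coordinates would first need to be projected to a planar system.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def discrete_frechet(P: Sequence[Point], Q: Sequence[Point]) -> float:
    """Discrete Frechet (coupling) distance between two polylines.

    ca[i][j] holds the smallest possible length of a coupling between the
    prefixes P[0..i] and Q[0..j]; the answer is the entry for the full curves.
    """
    p, q = len(P), len(Q)
    if p == 0 or q == 0:
        raise ValueError("both curves need at least one point")
    ca = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d  # both curves start at their origins
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)  # only Q can have advanced
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)  # only P can have advanced
            else:
                # The previous pair advanced P, Q, or both; keep the cheapest.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[p - 1][q - 1]


# Hypothetical detour reaching 1 unit away from a 2-unit straight line.
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
line_endpoints = [(0.0, 0.0), (2.0, 0.0)]
line_resampled = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(discrete_frechet(detour, line_endpoints))  # sqrt(2), about 1.414
print(discrete_frechet(detour, line_resampled))  # 1.0
```

Because the discrete distance only looks at endpoints, its value for a linear trajectory depends on how densely that trajectory is sampled: a two-point line gives about 1.414 for a detour that is only 1 unit away from it, whereas adding the midpoint gives 1. Resampling the linear trajectory before computing the distance is therefore a sensible, though assumed, preprocessing step.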


We use the discrete Fréchet distance to calculate the maximum deviation of the orthodox trajectory from the linear trajectory, i.e., *δDF*(*O*, *L*). Given this maximum deviation, the length of the heterodox trajectory lies between the lengths of those two trajectories, i.e., *dH* ∈ [*dL*, *dO*]. Thus, we define the length of the heterodox trajectory as the minimum distance between origin and destination given a deviation of *δDF*(*O*, *L*) from the linear trajectory, which is calculated as:

$$d\_H = 2\sqrt{(\delta\_{DF}(O, L))^2 + \frac{d\_L^2}{4}} = \sqrt{4(\delta\_{DF}(O, L))^2 + d\_L^2} \tag{3}$$
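One geometric reading that reproduces Equation 3, offered as an interpretation rather than as the authors' stated derivation: place the origin and destination at (−*dL*/2, 0) and (*dL*/2, 0) and require the path to reach a point (*x*, *δ*) at perpendicular distance *δ* = *δDF*(*O*, *L*) from the linear trajectory. The shortest such path consists of two straight legs through that point:

$$
d\_H = \min\_{x \in \mathbb{R}} \left[ \sqrt{\left(x + \frac{d\_L}{2}\right)^2 + \delta^2} + \sqrt{\left(\frac{d\_L}{2} - x\right)^2 + \delta^2} \right] = 2\sqrt{\delta^2 + \frac{d\_L^2}{4}} = \sqrt{4\delta^2 + d\_L^2}, \qquad \delta = \delta\_{DF}(O, L).
$$

The bracketed sum is convex and symmetric in *x*, so its minimum lies at *x* = 0, above the midpoint of the linear trajectory. For *δ* = 0 the expression reduces to *dL*, the lower end of the interval [*dL*, *dO*].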

We take *dH* as an approximation of the length of the shortest available path between the origin *i* and destination *j* of a particular trip *p*. Consequently, the trip index defined in Equation 1 finally becomes:

$$\alpha\_p = \frac{d\_H}{d\_p} = \frac{\sqrt{4(\delta\_{DF}(O, L))^2 + d\_L^2}}{d\_p} \tag{4}$$
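Equations 3 and 4 reduce to a few arithmetic steps once *δDF*(*O*, *L*), *dL* and *dp* are known. The sketch below uses hypothetical function names and made-up distances in meters; how *δDF*(*O*, *L*) is obtained (for instance, from a discrete Fréchet routine applied to the orthodox and linear trajectories) is left outside the example.

```python
import math


def heterodox_length(delta_df: float, d_L: float) -> float:
    """Equation 3: d_H = sqrt(4 * delta_DF(O, L)^2 + d_L^2)."""
    if delta_df < 0 or d_L < 0:
        raise ValueError("distances must be non-negative")
    return math.sqrt(4.0 * delta_df ** 2 + d_L ** 2)


def trip_index(delta_df: float, d_L: float, d_p: float) -> float:
    """Equation 4: alpha_p = d_H / d_p."""
    if d_p <= 0:
        raise ValueError("trip length must be positive")
    return heterodox_length(delta_df, d_L) / d_p


# Hypothetical trip: origin and destination 800 m apart, orthodox trajectory
# deviating at most 300 m from the straight line, 1250 m actually ridden.
print(heterodox_length(300.0, 800.0))    # 1000.0
print(trip_index(300.0, 800.0, 1250.0))  # 0.8
```

With no deviation (*δDF* = 0), *dH* equals *dL* and the index reduces to the ratio between the straight-line distance and the distance actually ridden.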

The trip index in Equation 4 combines the actual distance traveled in a trip with an estimate of the shortest available path that accounts for the possible trajectories in the underlying physical and topological spaces of the BSS. Following the premise stated in Section 2.1, we will use the trip index to classify trips as leisure or transport.

#### **3. Application to a Real BSS**

We applied the trip index to a dataset of real trips performed in BiciMAD, the public BSS of the city of Madrid, Spain. In this section, we describe the dataset and the methodology we followed to compute the trip indexes and classify the trips accordingly. Finally, we analyze the characteristics of the resulting groups of transport and leisure trips to validate the classifier from both statistical and operational perspectives.
