#### 3.2.1. Preprocessing

We first carry out a preprocessing phase to ensure the quality of the data we use. Errors in the database are mainly due to the low accuracy of GPS receivers in urban environments, which produces either inconsistent positions or time slots with no recorded location. Furthermore, the database contains trips performed by staff members, which we exclude from the study.

We apply this preprocessing to 303 962 trips connecting 25 818 origin-destination (OD) pairs, which results in 142 968 movements and 22 929 OD pairs.

In addition, we filter out trips that start and end at the same docking station, since they cannot be considered transport trips. For this filter, we treat docking stations located at the same point of interest as *twin* stations, i.e., as a single station. These round trips follow their own internal dynamics and are also intention-based. They may be retrieval trips (going to a target destination and returning), tourist or leisure activities, or simply the result of the user's interpretation of the tariff scheme, under which the user considers it more convenient to keep the bike for whatever reason. Such trips deserve a separate study, but since their actual destination is unknown, they cannot be used to analyze the transport aspects of the service. In any case, this restriction places our analysis in a worst-case scenario.

The remaining set includes 139 956 trips and 22 750 OD pairs.
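The filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the `Trip` record, its fields (`is_staff`, `track_ok`), the station IDs and the `twins` mapping from a station to the representative of its point of interest are all hypothetical, and the GPS consistency checks are reduced to a precomputed boolean flag.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    origin: str        # origin docking station ID (hypothetical field)
    destination: str   # destination docking station ID (hypothetical field)
    is_staff: bool     # hypothetical flag: trip performed by a staff member
    track_ok: bool     # hypothetical flag: GPS track passed consistency checks


def canonical(station: str, twins: dict[str, str]) -> str:
    """Map a station to the representative of its twin group (same point of interest)."""
    return twins.get(station, station)


def preprocess(trips: list[Trip], twins: dict[str, str]) -> list[Trip]:
    """Drop staff trips, trips with unreliable GPS data, and round trips."""
    kept = []
    for t in trips:
        if t.is_staff or not t.track_ok:
            continue
        # Same or twin origin/destination: cannot be treated as a transport trip.
        if canonical(t.origin, twins) == canonical(t.destination, twins):
            continue
        kept.append(t)
    return kept


def od_pairs(trips: list[Trip]) -> set[tuple[str, str]]:
    """Distinct origin-destination pairs among the given trips."""
    return {(t.origin, t.destination) for t in trips}


# Toy example with hypothetical station IDs; S101 is a twin of S100.
twins = {"S101": "S100"}
trips = [
    Trip("S100", "S200", False, True),
    Trip("S100", "S200", False, True),   # second trip on the same OD pair
    Trip("S100", "S101", False, True),   # twin stations -> removed
    Trip("S300", "S300", False, True),   # same station -> removed
    Trip("S200", "S300", True, True),    # staff trip -> removed
    Trip("S200", "S400", False, False),  # inconsistent GPS track -> removed
]
clean = preprocess(trips, twins)
```

Here `clean` keeps the two trips from S100 to S200, which form a single OD pair.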

#### 3.2.2. Calculation of the Trip Index

For a given trip $p$ from origin $i$ to destination $j$, the calculation of the trip index $\alpha_p$ defined in Equation 4 involves the lengths of four trajectories: the actual trajectory of the trip, $d_p$; the linear trajectory, $d_L$; the orthodox trajectory, $d_O$; and the heterodox trajectory, $d_H$.

The actual trajectory of the trip is retrieved from the track data in the BiciMAD database. The linear trajectory is the segment connecting origin $i$ to destination $j$. The orthodox trajectory, in turn, is retrieved from Google Maps with *bicycle* selected as the transport mode; it is therefore a polygonal curve that includes $n$ intermediate points.
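As a sketch of how the three directly measurable lengths can be obtained, assume every trajectory is stored as a list of points in a projected, metric coordinate system (for instance, GPS fixes already converted from latitude/longitude to metres); each length is then the sum of the Euclidean lengths of its segments. The coordinates below are hypothetical, and fetching the orthodox route from a routing service is not shown.

```python
import math

Point = tuple[float, float]  # (x, y) in metres in a projected CRS (assumption)


def polyline_length(points: list[Point]) -> float:
    """Length of a polygonal curve: sum of its Euclidean segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical trip from origin i = (0, 0) to destination j = (300, 400).
track = [(0.0, 0.0), (0.0, 200.0), (300.0, 400.0)]     # actual GPS track
route = [(0.0, 0.0), (150.0, 150.0), (300.0, 400.0)]   # orthodox route
d_p = polyline_length(track)            # actual trajectory, about 560.6 m
d_L = math.dist(track[0], track[-1])    # linear trajectory: 500.0 m
d_O = polyline_length(route)            # orthodox trajectory, about 503.7 m
```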

To calculate the Fréchet distance from the orthodox ($O$) to the linear ($L$) trajectory, we construct a linear interpolation of $n$ segments linking origin $i$ to destination $j$. We then apply the discrete Fréchet distance $\delta_{DF}(O, L)$ specified in Equation 2 to these two polygonal curves.
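A minimal Python sketch of this step follows. It assumes planar metric coordinates and uses the standard dynamic-programming recurrence for the discrete Fréchet distance, which we assume is equivalent to Equation 2; the helper names `interpolate_segment` and `discrete_frechet` and the example route are hypothetical. The table is filled iteratively rather than recursively so that long GPS tracks do not hit Python's recursion limit.

```python
import math

Point = tuple[float, float]  # planar coordinates (assumption)


def interpolate_segment(i: Point, j: Point, n: int) -> list[Point]:
    """Split the segment from i to j into n equal segments (n + 1 points)."""
    return [(i[0] + (j[0] - i[0]) * k / n, i[1] + (j[1] - i[1]) * k / n)
            for k in range(n + 1)]


def discrete_frechet(P: list[Point], Q: list[Point]) -> float:
    """Discrete Fréchet distance between two polygonal curves, O(|P| * |Q|)."""
    m, n = len(P), len(Q)
    ca = [[0.0] * n for _ in range(m)]  # ca[a][b]: coupling cost of P[:a+1], Q[:b+1]
    for a in range(m):
        for b in range(n):
            d = math.dist(P[a], Q[b])
            if a == 0 and b == 0:
                ca[a][b] = d
            elif a == 0:
                ca[a][b] = max(ca[a][b - 1], d)
            elif b == 0:
                ca[a][b] = max(ca[a - 1][b], d)
            else:
                ca[a][b] = max(min(ca[a - 1][b], ca[a - 1][b - 1], ca[a][b - 1]), d)
    return ca[m - 1][n - 1]


# Hypothetical orthodox route with n = 2 intermediate points.
orthodox = [(0.0, 0.0), (0.0, 3.0), (4.0, 3.0), (4.0, 6.0)]
n = len(orthodox) - 2
linear = interpolate_segment(orthodox[0], orthodox[-1], n)
delta_df = discrete_frechet(orthodox, linear)
```

For this toy route, `linear` is `[(0.0, 0.0), (2.0, 3.0), (4.0, 6.0)]` and `delta_df` is 2.0, attained where both corners of the route, (0, 3) and (4, 3), are paired with the midpoint (2, 3).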

Finally, the length of the heterodox trajectory is calculated from the linear and the orthodox trajectories as defined in Equation 3.

#### *3.3. Results of the Classification of BSS Trips*

We frame the classification of BSS trips as a data-driven problem, since there is no previously well-defined set of trips known to belong to either category: transport or leisure. Consequently, we classify BSS trips by applying the trip index to the empirical dataset and analyzing their inherent features.

Figure 1 shows the histogram of the trip indexes of the 139 956 BiciMAD trips from February 2019 that remain after the preprocessing phase. The trip index shows two clearly distinct behaviors, which suggests a concatenation of a uniform and a Gaussian distribution.

**Figure 1.** Histogram of trip indexes of trips in the dataset.

Note that some trip indexes are greater than 1; these correspond to trips whose deviation from the linear path is smaller than the Fréchet distance from the orthodox trajectory to the linear path. This is fully consistent with the proposed methodology. However, it raises a new set of questions about the actual statistical distribution of the shortest available routes between two points in a BSS, which we will address in a dedicated research line discussed in Section 5.

To determine the optimal threshold separating these two behaviors, we use the *elbow method*, commonly employed in data-driven clustering. To this end, Figure 2 shows the trip indexes sorted in ascending order.
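The text does not spell out the exact elbow criterion, so the sketch below uses one common heuristic as an assumption: after sorting the values and rescaling both axes to [0, 1], the elbow is the point farthest from the straight line joining the first and last points. The function name and the sample are hypothetical; the sample is a toy curve, not the BiciMAD data.

```python
import math


def elbow_threshold(values: list[float]) -> float:
    """Elbow of the ascending curve of values: the point farthest from the chord.

    Both axes are rescaled to [0, 1], so the chord runs from (0, 0) to (1, 1)
    and the distance of a point (x, y) to it is |y - x| / sqrt(2).
    """
    ys = sorted(values)
    n = len(ys)
    if n < 3 or ys[-1] == ys[0]:
        raise ValueError("need at least three values that are not all equal")
    lo, span = ys[0], ys[-1] - ys[0]
    best_k, best_d = 0, -1.0
    for k, y in enumerate(ys):
        x = k / (n - 1)          # normalised rank
        yn = (y - lo) / span     # normalised trip index
        d = abs(yn - x) / math.sqrt(2)
        if d > best_d:
            best_k, best_d = k, d
    return ys[best_k]


# Hypothetical toy sample: a spread-out "uniform" part followed by a dense cluster.
sample = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
          0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8]
alpha_star = elbow_threshold(sample)  # 0.7 for this toy sample
```

Normalising both axes matters: without it, the rank axis (which runs to roughly 140 000 for the real dataset) would dominate the distance and the elbow would be misplaced.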

The optimal value of the trip index threshold is $\alpha^* = 0.7$. We then use this value to classify trips as transport or leisure.
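Once the threshold is fixed, the classification reduces to a comparison, following the convention used in Figure 3: leisure if the trip index is below 0.7, transport otherwise. A minimal sketch with hypothetical trip indexes; the function name `classify` is illustrative only.

```python
ALPHA_STAR = 0.7  # threshold obtained with the elbow method in the text


def classify(alpha_p: float, alpha_star: float = ALPHA_STAR) -> str:
    """Transport if alpha_p >= alpha_star, leisure otherwise."""
    return "transport" if alpha_p >= alpha_star else "leisure"


# Hypothetical trip indexes; values above 1 can occur, as noted for Figure 1.
labels = [classify(a) for a in (0.35, 0.7, 0.92, 1.05)]
# ['leisure', 'transport', 'transport', 'transport']
```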

#### *3.4. Validation of the Results*

Mobility characterization often lacks a solid ground truth [30]. In such cases, we must analyze the features exhibited by each resulting group and test whether the groups actually represent distinct behaviors.

Consequently, in this work, we perform two different analyses to validate the classification methodology based on the trip index: statistical and operational.

#### 3.4.1. Statistical Analysis

First, we carry out a statistical analysis of three defining features of each trip: distance, duration, and speed. Figure 3 shows the results.

**Figure 3.** Statistical characterization of trips: *leisure* ($\alpha_p < \alpha^*$) on the left, and *transport* ($\alpha_p \geq \alpha^*$) on the right.

We observe that the statistical distributions of *leisure* (left) and *transport* (right) trips have fundamentally different shapes. We now examine each of these features in some depth and compare them with results reported for other BSSs.

The distances traveled in leisure and transport trips differ clearly in range, average value, and standard deviation, as Table 1 shows. Leisure trips reach significantly higher maximum distances, which reflects users *wandering*. This result agrees with the conclusions of [31], whose authors state that *daily members* of a BSS (leisure) make longer trips than *annual members* (transport). In addition, the average distances we obtained match those reported for *non-commute* and *commute* trips in [32].
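The range, mean and standard deviation reported per class in Tables 1 to 3 can be computed with Python's `statistics` module, as sketched below. Two assumptions apply: the standard deviation is the sample (n - 1) estimator, and the toy distances and labels are hypothetical values, not the BiciMAD results. The helper names are illustrative.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Minimum, maximum, mean and sample standard deviation of one class."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) estimator; needs >= 2 values
    }


def summarize_by_class(values: list[float], labels: list[str]) -> dict[str, dict[str, float]]:
    """Group values by their transport/leisure label and summarize each group."""
    groups: dict[str, list[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: summarize(vs) for label, vs in groups.items()}


# Hypothetical toy distances (km) and labels, not the BiciMAD results.
distances_km = [1.0, 2.0, 3.0, 5.0, 9.0]
labels = ["transport", "transport", "transport", "leisure", "leisure"]
table = summarize_by_class(distances_km, labels)
```

The same helpers apply unchanged to durations and speeds.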


**Table 1.** Distance Statistics (km).

Table 2 lists the statistics of the trip duration distribution. In this case, the ranges are similar, but the means and standard deviations differ significantly. Transport trips are typically shorter in time, which agrees with [32], whose authors conclude that the duration of BSS trips is highly correlated with distance. In addition, the authors of [33] state that commuters spend 1 hour a day in total on their outward and return trips, i.e., roughly 30 min per trip; consistently, Figure 3b shows that the tails of the trip duration distributions are negligible beyond 30 min.

**Table 2.** Travel Time Statistics.


Finally, Table 3 shows the numerical results for the speed distributions. The average speed is significantly higher in transport trips. This is particularly relevant because BiciMAD bicycles are electric, which limits the extent to which speed depends on the user's physical condition. Higher speeds for commuters were also observed in [32].
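For reference, the average speed of a trip in km/h follows from its distance $d$ in km and its duration $t$ in seconds as $v = 3600\,d/t$. The sketch below compares mean speeds by class under two assumptions: per-trip distance and duration are available, and the trips listed are hypothetical. Which distance the paper uses for speed (for example, the actual trajectory length) is not stated here and is likewise assumed.

```python
import statistics


def speed_kmh(distance_km: float, duration_s: float) -> float:
    """Average speed in km/h from distance (km) and duration (s)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return distance_km * 3600.0 / duration_s


# Hypothetical trips: (distance in km, duration in s, class label)
trips = [
    (3.0, 900, "transport"),
    (2.0, 600, "transport"),
    (4.0, 1800, "leisure"),
]
mean_speed = {
    label: statistics.mean(speed_kmh(d, t) for d, t, lab in trips if lab == label)
    for label in ("transport", "leisure")
}
# {'transport': 12.0, 'leisure': 8.0}
```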

**Table 3.** Speed Statistics (km/h).

