*Article* **Cluster Analysis of Public Bike Sharing Systems for Categorization**

#### **Tamás Mátrai \* and János Tóth**

Department of Transport Technology and Economics, Budapest University of Technology and Economics, Stoczek utca 2, H-1111 Budapest, Hungary; toth.janos@mail.bme.hu

**\*** Correspondence: tamas.matrai@mail.bme.hu; Tel.: +36-20-260-6171

Received: 14 June 2020; Accepted: 3 July 2020; Published: 8 July 2020

**Abstract:** The world population will reach 9.8 billion by 2050, with increased urbanization. Cycling is one of the fastest developing sustainable transport solutions. With the spread of public bike sharing (PBS) systems, it is very important to understand the differences between systems. This article focuses on the clustering of different bike sharing systems around the world. The lack of a comprehensive database about PBS systems in the world does not allow comparing or evaluating them. Therefore, the first step was to gather data about existing systems. The existing systems can be categorized by grouping criteria, and then typical models can be defined. Our assumption was that 90% of the systems could be classified into four clusters. We used clustering techniques and statistical analysis to create these clusters. However, our estimation proved to be too optimistic; therefore, we only used four distinct clusters (public, private, mixed, other) and the results were acceptable. The analysis of the different clusters and the identification of their common features is the next step of this line of research; however, some general characteristics of the proposed clusters are described. The result is a general method that could identify the type of a PBS system.

**Keywords:** public bike sharing; cluster analysis; categorization; data collection

#### **1. Introduction**

According to the UN forecast, the world population in mid-2017 was about 7.6 billion people, and by 2050 it is predicted to reach 9.8 billion. Along with this, urbanization is expected to increase [1,2]. Cycling is one of the fastest developing sustainable transport solutions [3–6]. Modernized, urban lifestyles have reduced the physical activity of everyday life, and the resulting sedentary lifestyles pose a threat to population health [7]. It is estimated that physical inactivity causes 21–25% of breast and colon cancer, and even greater proportions are estimated for diabetes (27%) and ischemic heart disease (30%) [8].

Public bike sharing (PBS) systems, also known as "Public-Use Bicycles", "Bicycle Transit," "Bikesharing", or "Smart Bikes," can be defined as short-term urban bicycle rental schemes that allow bicycles to be picked up at any self-service bicycle station and returned to any other bicycle station, resulting in point-to-point trips [9]. Basically, people use bicycles on an "as-needed" basis, without the responsibility of bicycle ownership [10]. Nowadays, a different type of PBS system has started to spread around the world; it operates without docking stations and is hence called dockless [11,12]. With the spread of public bike sharing systems, it is very important to understand the differences between systems [10,13–16]. Without understanding these differences, neither can the impact of these systems be calculated, nor is high-quality decision support possible.

We developed a complete framework during doctoral research for analyzing, comparing, and categorizing public bike sharing systems, as such a comprehensive system is still missing from the literature [17]. The first level of our framework is to collect data about existing systems and perform a cluster analysis. Then, a SWOT (strengths, weaknesses, opportunities, and threats) analysis for each cluster is compiled based on the examined systems. The third step is to create a benchmark tool, which supports the evaluation of systems. At the fourth level, impact analysis and impact assessment are carried out [18–21].

The present article deals with the clustering of different bike sharing systems around the world (i.e., it concerns the first level of our framework). The lack of a comprehensive database about PBS systems in the world does not allow for a simple comparison or evaluation of the systems [22]. Furthermore, the original goal of the creation of a PBS system is quite often unclear [23]. Without knowing the initial goal, the success of the system cannot be evaluated. Si et al. [17] conducted a systematic literature review and scientometric analysis covering most of the bike-sharing-related articles published between 2010 and 2018, from which it is clear that the researchers' main focus was not on business models.

Several articles analyze the value creation of a bike sharing system [10,24–26], although all of them start from the assumption that there are several distinct business models for bike sharing. DeMaio [16] introduced several examples of model provision in his article, but there was no clear definition of the different models. Other articles [24,25,27–29] use the business model canvas [30] approach or at least some of its elements, but these do not provide an easy-to-use categorization.

Our initial idea was to apply an unsupervised machine learning algorithm to a dataset, which should lead us to findings related to business models. This approach has been applied successfully in other fields, such as Spanish scientific journals [31] or electric mobility [32]. The cluster analysis methodology has not so far been applied in the field of PBS business models, but we collected a large dataset, which can be used for this purpose.

The goal of the clustering process is to create groups (clusters of objects) from the dataset, in such a way that: (i) the objects in a given cluster are as similar as possible; and (ii) the objects belonging to different clusters are highly different [33]. In the public bike sharing domain, cluster analysis is usually applied in spatial studies (e.g., [34–37]). In this field, the studies mostly focus on the distribution of bikes or stations.

Our main assumption is that a large proportion (i.e., 90%) of the public bike sharing systems around the world could be classified into one of four clusters. These clusters are formulated based on the type of the owner and the type of the operator. A SWOT analysis based on this categorization could help PBS project promoters and owners to develop higher-quality systems. The clustering methodology proposed by the authors helps, among other things, to reduce a large volume of primary data to a few basic categories that can be treated as subjects for further analysis in the public bike sharing domain.

#### **2. Methodology**

Our research followed the steps described in Figure 1.

The first step was data collection, which was followed by the initial dataset analysis. Then, the first cluster analysis based on expert opinion was conducted. The statistical tests and regression analysis were applied in order to select the parameters for the second cluster analysis. In the end, both internal and external cluster validation techniques were applied. During the analysis of the results, we compared three scenarios to each other, where different parameters were considered:


**Figure 1.** Flow-chart for the cluster analysis.

#### *2.1. Data Collection, Database*

The original idea was to collect 80 parameters on 125 systems around the world. The collection of this data was based on open web databases and the webpages of the different bike sharing systems. We assumed that the data of the bike sharing system website are up-to-date and accurate. Our starting point was the collection of systems by Meddin [38]; this database contained 2124 active systems at the beginning of 2019. We selected the 125 systems based on the following criteria:


After a 6-month-long collection period, we had to reduce the dataset to 64 systems and 64 parameters. We decided to exclude the dockless systems (*n* = 31) from the analysis and focus only on docked bike sharing. There were several systems (*n* = 30) where, despite all efforts, we did not reach the minimum viable information. These systems were excluded from the analysis so as to not distort the results. Out of the originally desired 80 parameters, we had to exclude some due to the lack of available data. For example, we intended to gather information about the goals of the different systems, although it was not possible since very few systems declare their initial goal, as Ricci pointed out earlier [23].

The final database was grouped around the following main topics:


#### *2.2. Dataset Analysis*

The first step was to visualize the dataset in a two-dimensional space. As the dataset itself contains several parameters, a principal component analysis (PCA) algorithm was used to reduce the number of dimensions. The results are presented in a scatter plot, which gives us an easily understandable visual representation of the dataset [33].

There are several methods to calculate the distance between each pair of observations. Gower distance [39] is one of the few measures that are capable of handling both categorical and continuous variables; therefore, this method was used for our calculation. The dissimilarity between two observations is the weighted mean of the contributions of each variable. This automatically implies that a particular standardization process is applied to each variable.

The daisy function from the cluster package [40] is suitable for calculating Gower distances in R. The result of computation of these distances is known as a dissimilarity matrix. The Gower distance can be described with the following Equation (1).

$$d_{ij} = \frac{\sum_{k=1}^{p} \omega_k \, \delta_{ij}^{(k)} \, d_{ij}^{(k)}}{\sum_{k=1}^{p} \omega_k \, \delta_{ij}^{(k)}} \tag{1}$$

where $d_{ij}$ is the weighted mean dissimilarity between observations $i$ and $j$; $p$ is the number of variables; $\omega_k$ is the weight of variable $k$; $\delta_{ij}^{(k)}$ is the 0–1 weight, which becomes zero when the variable *x*[, *k*] is missing in either or both rows (*i* and *j*), or when the variable is asymmetric binary and both values are zero, and is 1 in all other situations; and $d_{ij}^{(k)}$ is the contribution of the $k$-th variable to the total distance.
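As an illustration of Equation (1), the following minimal Python sketch computes the Gower dissimilarity between two observations with mixed variables. It is not the `daisy` implementation: the function name `gower_distance`, the `ranges` argument, and the example rows are hypothetical, numeric contributions are assumed to be range-normalized absolute differences, categorical contributions are assumed to be simple mismatches, and the asymmetric binary case is left out for brevity.

```python
def gower_distance(row_i, row_j, ranges, weights=None):
    """Gower dissimilarity between two observations (sketch of Equation (1)).

    ranges[k] is the observed range (max - min) of variable k when it is
    numeric, or None when variable k is categorical. A None entry in a row
    marks a missing value, which sets the 0-1 weight delta to zero.
    """
    p = len(row_i)
    if weights is None:
        weights = [1.0] * p
    numerator = 0.0
    denominator = 0.0
    for k in range(p):
        x, y = row_i[k], row_j[k]
        if x is None or y is None:
            continue  # delta = 0: the variable is skipped for this pair
        if ranges[k] is None:
            contribution = 0.0 if x == y else 1.0  # categorical mismatch
        elif ranges[k] > 0:
            contribution = abs(x - y) / ranges[k]  # range-normalized difference
        else:
            contribution = 0.0  # constant variable carries no information
        numerator += weights[k] * contribution
        denominator += weights[k]
    if denominator == 0:
        raise ValueError("no variable is observed in both rows")
    return numerator / denominator


# Hypothetical systems described by (bikes, stations, owner type).
system_a = (100, 10, "Public")
system_b = (300, 10, "Private")
system_c = (None, 35, "Public")  # number of bikes unknown
ranges = (400, 50, None)

print(gower_distance(system_a, system_b, ranges))
print(gower_distance(system_a, system_c, ranges))
```

Because each numeric contribution is divided by the variable's range, every contribution lies between 0 and 1, which is the implicit standardization mentioned above.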

We analyzed the entire dataset from the cluster tendency point of view. During the visual assessment of clustering tendency (VAT approach), we used the following steps:


The color level is proportional to the value of the dissimilarity between observations. The observations in the same cluster are displayed in a consecutive order [41].

After the visual inspection, we also used a statistical method, the Hopkins statistic, to evaluate clusterability. This method measures the probability that a dataset was generated by a uniform distribution, i.e., it tests the spatial randomness of the data. The calculations are the following:


The Hopkins statistic is defined in Equation (2):

$$H = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i} \tag{2}$$

where $H$ is the Hopkins statistic, $y_i$ is the nearest neighbor distance in the random dataset, $x_i$ is the nearest neighbor distance in the real dataset, and $n$ is the number of sample points in the dataset.

The null hypothesis is that the original real dataset is uniformly distributed (i.e., there are no meaningful clusters). The alternative hypothesis is that the dataset is not uniformly distributed (i.e., meaningful clusters can be found). If the Hopkins statistic is close to 1, we can reject the null hypothesis and conclude that there is significant clusterability. A value higher than 0.75 indicates clusterability at the 90% confidence level.
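Equation (2) can be sketched in a few lines of Python. This sketch rests on several assumptions that are not taken from the article: the data are purely numeric, distances are Euclidean, the random dataset is drawn uniformly from the bounding box of the real data, and the nearest neighbor of each random point is searched in the real dataset. The function names `hopkins` and `_nearest` are illustrative only.

```python
import math
import random


def _nearest(point, data, skip_index=None):
    """Distance from point to its nearest neighbor in data."""
    best = math.inf
    for index, other in enumerate(data):
        if index == skip_index:
            continue  # do not match a sampled point with itself
        best = min(best, math.dist(point, other))
    return best


def hopkins(data, n, seed=123):
    """Hopkins statistic H as in Equation (2); values near 1 suggest clusters."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(row[d] for row in data) for d in range(dims)]
    highs = [max(row[d] for row in data) for d in range(dims)]

    # x_i: nearest neighbor distances for n points sampled from the real data.
    sampled = rng.sample(range(len(data)), n)
    x = [_nearest(data[i], data, skip_index=i) for i in sampled]

    # y_i: nearest neighbor distances for n uniformly generated points.
    y = []
    for _ in range(n):
        q = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        y.append(_nearest(q, data))

    return sum(y) / (sum(x) + sum(y))
```

For strongly grouped data the real nearest neighbor distances are small compared with the random ones, so the ratio moves towards 1; for uniformly scattered data the two sums are of similar size and the ratio stays near 0.5.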

#### *2.3. Clustering Based on Expert Opinion*

Our main hypothesis was that most of the PBS systems can be clustered based on the owner type and the operator type. Therefore, during data collection, we used two owner categories: Public and Private, while in the type of operator we created 4 categories: Advertising company, Private Company, Service provider, and Public. Based on these types, we created 4 clusters, which can be seen in Table 1. This categorization was based on the expert opinion of the two authors.


**Table 1.** Clustering logic based on the operator and the owner.

#### *2.4. Univariate Statistical Tests*

In order to determine which of the 64 parameters should be included in a multivariate regression model, some preselection is required [42]. As the dependent variables are both categorical and continuous, while the independent variable is categorical, we had to use two types of statistical tests. We used the SPSS statistical software for these tests.

We used Pearson's chi-square test to discover whether there is a relationship between two categorical variables. As all the variables were measured at an ordinal or nominal level (i.e., were categorical data) and both variables consist of at least two independent groups, the test was applicable. The null hypothesis was that Variable 1 (Cluster) is independent of Variable 2 (all other categorical variables) [43].
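The test itself was run in SPSS; the sketch below only illustrates the underlying calculation in plain Python. The function names `chi2_sf` and `chi_square_test` are hypothetical, the contingency table in the usage lines is invented for illustration, and no continuity correction is applied. The upper-tail probability uses the closed-form series that exists for integer degrees of freedom.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{r=0}^{dof/2-1} (x/2)^r / r!
        term = math.exp(-half)
        total = term
        for r in range(1, dof // 2):
            term *= half / r
            total += term
        return total
    # Odd dof: normal tail plus a finite series.
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * root
    for r in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi_square_test(table):
    """Pearson chi-square statistic, degrees of freedom, and p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, chi2_sf(statistic, dof)


# Hypothetical 2 x 2 table: rows are clusters, columns are categories.
statistic, dof, p_value = chi_square_test([[20, 5], [5, 20]])
print(statistic, dof, p_value)
```

A small p-value speaks against independence, so the categorical variable would be kept as a candidate for the regression model.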

We used the one-way ANOVA test to determine whether there is a statistically significant difference between the means of independent groups. The independent variable (cluster in our case) divides the dataset into mutually exclusive groups. We used this test where the dependent variables were continuous. The null hypothesis was that all group means are equal, while the alternative hypothesis was that at least one of the group means is not equal to the others. As the one-way ANOVA is an omnibus test, it does not tell us which of the groups are different [44].
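The F statistic behind the one-way ANOVA can be sketched as follows. The function name `one_way_anova_f` and the example groups are hypothetical, and the p-value (which requires the F distribution and was obtained from SPSS in this study) is deliberately not computed here.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group dof, within-group dof)."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, means)
        for value in group
    )
    df_between = k - 1
    df_within = n - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical values of one continuous parameter in three clusters.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A large F value means that the group means differ more than the spread within the groups would suggest, which is evidence against the null hypothesis of equal means.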

We selected a higher significance level for both tests so as not to eliminate possible candidates from the multivariate regression analysis, as suggested by Bursac et al. (2008). If the *p*-value was less than our chosen significance level (α = 0.25), we rejected the null hypothesis and concluded that there is an association between our two variables; therefore, we selected the dependent variable for further tests [42].

#### *2.5. Multinomial Regression*

We used multinomial logistic regression to predict the nominal dependent variable (cluster of the PBS system) based on the preselected independent variables (both categorical and continuous ones). This also allows interactions between the independent variables to be used to predict the dependent one. We used the SPSS statistical software for this.

The applicability of this method is based on the following assumptions:


We checked the entire dataset for the first 3 assumptions. The multicollinearity assumption was continuously tested for each different model and the rest were automatically tested in SPSS. As the software is not capable of running any automated model selection process when categorical variables are present, we decided to use the backward method and computed each step manually. First, we eliminated those independent parameters where we believed that the relationship to the dependent one would only be statistical, with no real reason for them to be related (e.g., start of operation, country, etc.). Then, we added all remaining parameters to the model. We selected the variables with multicollinearity and eliminated one of them based on the significance. We reduced the model until we got a statistically significant one.

#### *2.6. Cluster Analysis for Selected Parameters*

We used a clustering method to create associated groups from the dataset, applying the same method to different parameter sets. We decided to use a *k*-medoids algorithm, which, like *k*-means, belongs to the partitioning clustering approaches. The most commonly used *k*-medoids method is the partitioning around medoids (PAM) algorithm [45]. The PAM algorithm searches for *k* representative medoids in the dataset and then clusters the remaining observations around them. As it does not use the means of the clusters, this method is less sensitive to outliers. The method consists of two phases: the build phase and the swap phase. In the build phase, the first step is the selection of *k* medoids. The second step is the calculation of the dissimilarity matrix, while the third step is the assignment of each observation to the closest medoid (and therefore cluster), based on the calculated distance. In the swap phase, the fourth step is to check whether swapping the current medoid of a cluster with any other object in the given cluster reduces the average dissimilarity. If it does, the cluster medoid is changed to the new object and we go back to the third step and start over again. If none of the medoids change in the fourth step, the procedure stops.
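The two phases can be sketched in Python on a precomputed dissimilarity matrix. This is a simplified illustration, not the implementation used in the R packages: the function names `pam` and `total_cost` are hypothetical, the build phase is assumed to add medoids greedily, and the swap phase tries every non-medoid object as a replacement, as in the standard PAM formulation.

```python
def total_cost(dist, medoids):
    """Sum of dissimilarities between each object and its closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Simplified partitioning around medoids on a dissimilarity matrix."""
    n = len(dist)

    # Build phase: greedily add the object that lowers the total cost most.
    medoids = []
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # Swap phase: accept any medoid/non-medoid swap that lowers the cost,
    # and repeat until a full pass brings no improvement.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for position in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:position] + [candidate] + medoids[position + 1:]
                cost = total_cost(dist, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True

    # Final assignment of every object to its closest medoid.
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Hypothetical one-dimensional example with two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(pam(dist, k=2))
```

Because the algorithm only needs pairwise dissimilarities, the same sketch works unchanged with a Gower dissimilarity matrix for mixed data.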

We used the R software [46] and the factoextra package [47] to compute the clustering. We used Gower distance to calculate the dissimilarity of the variables.

#### *2.7. Internal Cluster Validation*

In order to determine how good the clustering is, we applied internal cluster validation statistics, which use the internal information of each cluster without external data. The different statistics measure the compactness, the separation, and the connectedness of the different clusters [40,48].


In addition to the statistical indexes, we can also use visual methods to explore the results of clustering. The first possibility is to visualize the clusters with PCA in a two-dimensional space. The other option is the silhouette plot, where the diagram shows the silhouette coefficient for each object in an ordered way separated for each cluster.
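As a minimal sketch of the quantity shown in a silhouette plot, the following Python function computes the silhouette width of each object from a dissimilarity matrix and a cluster assignment. It assumes at least two clusters and assigns a width of 0 to singleton clusters, which is a common convention; the function name `silhouette_widths` and the example data are hypothetical.

```python
def silhouette_widths(dist, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every object i."""
    n = len(dist)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean dissimilarity to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denominator = max(a, b)
        widths.append((b - a) / denominator if denominator > 0 else 0.0)
    return widths


# Hypothetical example: six points on a line, split into two clusters.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
widths = silhouette_widths(dist, [0, 0, 0, 1, 1, 1])
print(sum(widths) / len(widths))  # average silhouette width
```

A width close to 1 indicates an object that sits well inside its cluster, while a negative width indicates an object that is, on average, closer to another cluster than to its own.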

#### *2.8. External Cluster Validation*

During the external cluster validation, we can compare two clusterings with each other. As in this research we created an expert-based categorization as well as the wider parameter-based clustering using the PAM method, we can compare the two categorizations with each other. The external cluster validation parameters measure how well the externally provided cluster labels match the computed ones.

The Rand index [49] measures the similarity between two clusterings; in its chance-corrected form, its range is from −1 (no agreement) to 1 (completely the same). The variation of information index described by Meila [50] is also a valuable tool to measure the similarity of the two clusterings.
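The pair-counting idea behind the Rand index can be sketched in Python. The plain Rand index lies between 0 and 1, whereas its chance-corrected (adjusted) variant can become negative when two clusterings agree less than expected by chance. The function names and the toy labelings below are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(a, b):
    """Share of object pairs on which the two clusterings agree."""
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(a)), 2):
        same_in_a = a[i] == a[j]
        same_in_b = b[i] == b[j]
        agreements += same_in_a == same_in_b
        pairs += 1
    return agreements / pairs


def adjusted_rand_index(a, b):
    """Rand index corrected for the agreement expected by chance."""
    n = len(a)
    sum_cells = sum(comb(count, 2) for count in Counter(zip(a, b)).values())
    sum_a = sum(comb(count, 2) for count in Counter(a).values())
    sum_b = sum(comb(count, 2) for count in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g., both clusterings are trivial
    return (sum_cells - expected) / (maximum - expected)


expert = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same grouping, different label names
crossed = [0, 1, 0, 1]      # grouping that cuts across the expert one
print(rand_index(expert, relabelled), adjusted_rand_index(expert, relabelled))
print(rand_index(expert, crossed), adjusted_rand_index(expert, crossed))
```

Both measures depend only on which objects are grouped together, so renaming the clusters does not change the result.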

#### **3. Results and Discussion**

The initial phase of our research was to collect the necessary data for our clustering analysis. We shared all the data collected for this purpose online [51].

We presented the results in three different scenarios below. In the first case, we always made the assessment on the entire dataset. The second presented scenario is the one with the selected parameters based on multinomial regression. The third case is when we only use the operator and the owner parameters. We used Gower distance here as the measure of dissimilarity of the different objects.

We visualized the raw dataset in a two-dimensional space using PCA methodology (Figure 2). The two axes have no specific meaning, they only provide an artificial scale for visualization purposes. Although the scaling and the axes of the figures are not the same, it is viable for comparing the resulting patterns to each other. The dataset with all parameters is less clusterable than the one with selected parameters. In the last one, only four datapoints are visible, since the entire dataset is clustered into these four points.

**Figure 2.** Dataset visualized with principal component analysis (PCA) (**a**) based on all parameters; (**b**) based on the selected parameters; (**c**) based on the operator and owner parameters.

The same conclusion can be drawn from the heatmap resulting from the VAT approach (see Figure 3) as well as from the Hopkins statistic. The factoextra package [47] implements $H_{alt} = 1 - H$, where $H$ is defined as in the methodology section. The result was $H_{alt} = 0.2661683$ for all parameters (scenario 1) and $H_{alt} = 0.1565344$ for the selected parameters (scenario 2). We used the seed number 123 for the calculation of the Hopkins statistic.
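To relate these values to the 0.75 threshold given in the methodology section, they can be converted back to $H$. The subscripts 1 and 2 below are introduced here only to denote scenario 1 and scenario 2:

$$H_1 = 1 - 0.2661683 = 0.7338317, \qquad H_2 = 1 - 0.1565344 = 0.8434656$$

Under this conversion, the dataset with all parameters falls slightly below 0.75, while the dataset with the selected parameters lies above it, which is in line with the visual impression that the selected parameters are more clusterable.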

**Figure 3.** Visualization of the dissimilarity matrix (**a**) based on all parameters; (**b**) based on the selected parameters; (**c**) based on the operator and owner parameters.

We used SPSS for the preselection of the parameters for the multivariate regression. Table 2 contains those variables whose *p*-value is lower than the chosen significance level (α = 0.25) in the Chi-square test.

Table 3 contains those variables whose *p*-value is lower than the chosen significance level (α = 0.25) in the one-way ANOVA test. Parameter names can be found next to the dataset description in [51].

After seven iterations with the multinomial regression, the model with the following parameters was selected:


We ran the cluster analysis in the R software using the PAM method for all three scenarios with *k* = 4. As shown in Figure 4, the clustering is better with the selected parameters.


**Table 2.** Results of the Chi-square tests.

**Table 3.** Results of the one-way ANOVA tests.


**Figure 4.** Clustered dataset visualized with PCA (**a**) based on all parameters; (**b**) based on the selected parameters; (**c**) based on the operator and owner parameters.

The results of the clustering can be described with the average silhouette width of each cluster (Table 4) and the silhouette plots (Figure 5). The average silhouette width increases to 1 in the absolutely clustered scenario. The clustering based on the selected parameters has no negative silhouette values in clusters 1 and 2, which means a good clustering result.


**Table 4.** Cluster size and average silhouette width in three different scenarios.

**Figure 5.** Silhouette plot for clustering (**a**) based on all parameters; (**b**) based on the selected parameters; (**c**) based on the operator and owner parameters.

The internal cluster validation statistics show similar results (Table 5). Where appropriate, the owner and operator scenario has the theoretical minimum or maximum values. There are some exceptions, e.g., the first Dunn index shows worse results for the selected-parameter scenario than for the all-parameters one.

**Table 5.** Internal cluster validation statistics for the three different alternatives.


External cluster validation was based on the clustering related to the expert opinion. Therefore, the last column of Table 6 is just for reference purposes; obviously, it has no meaning besides that the statistical calculation is working.



The results of the clustering algorithm for the selected parameter can be seen in Figure 6. There is a distinct cluster which is clearly different from the others, while the remaining three are somewhat overlapping.

**Figure 6.** Clustering visualized with PCA based on the selected parameters.

We compared expert-based clustering with parameter-based clustering. Out of 64 systems, 42 were assigned to the same cluster as the expert-based method suggests, and the remaining 22 (35%) were misplaced by the clustering algorithm. Based on our results, we decided that these 22 systems should be categorized as a fifth cluster, called Other. The clustering was correct for the systems owned by a private company: no system with such parameters was missing, and none was misplaced into this cluster. The public systems showed similarly good results, as only 3 out of 12 were misplaced, and even these 3 showed some similarity with the selected clusters (e.g., Bixi Montreal is a public system but has similarities with a commercial one; the two others were Chinese systems, where the difference between a public and a private company is sometimes hard to spot).

Eleven systems were misplaced into the cluster where the operator is supposed to be an advertising company, although all these systems are operated by a service provider. It was also true for the other way around: Four systems out of five were misplaced into the cluster with a service provider as operator, although they have an advertising company operator. This might indicate that the service provider and the advertising company business model and those system characteristics are not as distinct from each other as the purely public and purely private models.

If we only use three big clusters (purely public, purely private, and mixed), we end up only with the miscategorization of 6 systems out of 64. The analysis of the different clusters and the identification of their common features is the next step of this line of research, however some general characteristics of the proposed four clusters can be described here as a starting point. The basis of these descriptions is both the categorization described in Section 2.3 and the results of the clustering exercise:

• Cluster 1 (Public systems): Both the operator and owner of the system are public institutions. The owner is usually a city or one of its companies. The operator can be the same organization or a new one created for this specific purpose. The income is coming directly from the user fees, but usually it requires subsidization. The goal of such a scheme usually is to provide an alternative transport mode or educate the citizens rather than profit making. Typical example: MVG Rad (Munich).


Additionally, there are some limitations with the current methodology. As it was stated above, collecting all parameters from the different systems is a time-consuming process. Furthermore, the current data become outdated very quickly as not only new systems emerge, but also the technology changes. This research does not consider dockless schemes, although they have become more and more popular in recent years [52]. At the same time, these systems are almost all profit-oriented, privately funded systems, which can be easily put under the same, distinct category. The other problem with this type of applied, data-driven approach is that an error in data collection can cause problems in interpreting the results. This was one of the reasons that we chose a clustering method that is less vulnerable to outliers.

#### **4. Conclusions**

In this study, we developed a method for categorizing public bike sharing systems, which consisted of 8 steps:


During data collection, we faced several problems, therefore only 64 parameters were collected from 64 systems around the world [51]. The dataset analysis showed that the dataset is clusterable. We selected the operator and owner type as initial parameters for expert based clustering, from which we created four clusters.

We preselected 19 factor type and 15 continuous type parameters for multinomial regression, based on the univariate statistical tests, out of which five factor parameters and six continuous parameters were selected for the final model.

We reran the PAM-based clustering with the selected parameters, which resulted in a better fit than in the case of all parameters. Forty-two systems were assigned to the correct clusters, while the remaining 22 were misplaced by the clustering algorithm. Our initial assumption was too optimistic, as only 65% of the systems could be clustered with this method. Thirty-five percent fall into the "Other" category. At the same time, if we only use three main clusters (public, private, and mixed), the error is reduced to 6 systems out of 64.
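The shares quoted above follow directly from the system counts; the unrounded values are shown here as a simple check of the arithmetic:

$$\frac{42}{64} \approx 65.6\%, \qquad \frac{22}{64} \approx 34.4\%, \qquad \frac{64 - 6}{64} = \frac{58}{64} \approx 90.6\%$$

The last ratio is the share of systems categorized consistently when only three main clusters are used, which is close to the 90% figure of the initial assumption.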

This is arguably a correct solution, as the service provider and the advertising company business models might not be separable. So, there are four proposed clusters: public systems, private systems, mixed systems, and other systems.

This article describes the basic characteristics of these clusters; however, analyzing the characteristics of the different clusters in detail is the next step of this ongoing research. Additionally, future research work will be devoted to overcoming some of the limitations of the presented methodology. One of the main limitations is data availability; if new, reliable data become available (e.g., usage data, travel patterns, financial data), the current methodology can be expanded to cover them. Another development path can be the inclusion of dockless schemes in the current analysis, which were neglected in this article due to the lack of reliable data.

This article can help those who would like to apply the clustering methodology in a different domain. At the same time, it can provide a basis for further research in the public bike sharing domain, as the proposed methodology can be applied to a different set of PBS systems. A newly designed system can be categorized based on the owner and operator, which can help to find similar systems and identify problems and best practices at an early stage. In other words, this paper can provide significant added value for researchers and academics as well as policy makers and practitioners.

**Author Contributions:** Conceptualization T.M. and J.T.; methodology T.M. and J.T.; writing—original draft preparation T.M. and J.T.; writing—review and editing, T.M. and J.T.; visualization, T.M.; supervision, J.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Evaluating the Efficiency of Bike-Sharing Stations with Data Envelopment Analysis**

**Leonardo Caggiani <sup>1</sup> , Rosalia Camporeale 2,3,\* , Zahra Hamidi 3,4 and Chunli Zhao 2,3**


**Abstract:** This paper focuses on the efficiency evaluation of bike-sharing systems (BSSs) and develops an approach based on data envelopment analysis (DEA) to support the decisions regarding the performance evaluation of BSS stations. The proposed methodology is applied and tested for the Malmöbybike BSS in Malmö, Sweden. This was done by employing spatial analyses and data about the BSS usage trends as well as taking into account transport, land use, and socioeconomic context of the case study. The results of the application demonstrate consistency with the literature and highlight meaningful associations between the station relative efficiency and the urban context. More specifically, the paper provides in-depth knowledge about the preprocessing data, selection of input and output variables, and the underlying analytical approach to be potentially applied to other cases and urban contexts. Overall, the DEA-based methodology presented in this study could assist decision-makers and planners with developing operational strategies for planning and management of BSS stations and networks.

**Keywords:** BSS station efficiency; data envelopment analysis; spatial analysis in transport; bikesharing system; bike-sharing station

#### **1. Introduction**

A bike-sharing system (BSS) is considered an alternative to cars. It is a measure designed to inspire modal shift from short car trips to cycling and intermodal travel. The primary function of BSS, typically regarded as a last-mile solution for metropolitan areas, has motivated investments to provide such services in cities around the world [1,2]. Two main types of BSS exist in cities today: the conventional BSS and the free-floating BSS. The conventional BSS requires passengers to borrow and return the bicycle from/to fixed stations. By contrast, the more recently introduced free-floating BSS does not have fixed stations for picking up and dropping off bicycles; users are allowed to park the bikes potentially "everywhere" (or within areas with geo-fenced boundaries), as close as possible to their destinations [3]. Both BSS types enable passengers to cycle in a city without owning a bike. This study focuses on conventional BSS.

The first bicycle-sharing scheme was introduced in Amsterdam, the Netherlands, in 1965 and it was followed by a station-based BSS implemented in Denmark in 1991 [4]. The first Swedish BSS was a pilot project introduced in Gothenburg in 2005 which operated exclusively in the northern part of the city. The project led to the development of the current BSS in Gothenburg, Styr and Ställ, which was launched in 2010 with 300 bicycles distributed in 20 stations (operating between April and October) [5] and expanded in 2020 to provide 1750 bicycles in 135 stations available throughout the year [6]. Similar systems have also been implemented in other major Swedish cities including Stockholm and Malmö. In 2019, Linköping, the fourth largest city in Sweden, launched the LinBike program as the first Swedish BSS with e-bikes (100 bikes and 17 charging stations). The system, instead of fixed stations, employs recent BSS technologies such as geofencing to define GPS-based virtual zones where users can access or leave the rental bikes [7]. In recent years, in addition to the rapid emergence of these systems in the larger Swedish cities, there has been a growing interest in the development of regional BSSs that could provide viable bike-sharing services across several smaller cities [8].

**Citation:** Caggiani, L.; Camporeale, R.; Hamidi, Z.; Zhao, C. Evaluating the Efficiency of Bike-Sharing Stations with Data Envelopment Analysis. *Sustainability* **2021**, *13*, 881. https://doi.org/10.3390/su13020881

Received: 31 December 2020; Accepted: 13 January 2021; Published: 17 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

As a measure to reduce car use and emissions, BSSs offer a set of advantages that explains their widespread adoption by many cities around the world [9,10]. They improve accessibility to cycling and could thus increase the cycling mode share in general [10]; BSSs are a possible last/first-mile mode for connecting to public transport services or can be used as a single mode for shorter journeys. In terms of costs, using rental bikes is often cheaper than renting a car and is not necessarily more expensive than buying a public transport ticket for the equivalent travel distance in an urban area. Overall, BSS is considered an affordable, convenient, sustainable, and healthy transport alternative, hence gaining the attention of cities committed to social, economic, and environmental sustainability [11]. It is worth noting that BSS popularity has not declined due to the recent COVID-19 pandemic but rather has grown, considering the reported increase in trip duration and distance compared to the nonpandemic time [12,13], which strengthens its potential for future mobility.

Despite the wide-ranging possible benefits and the global popularity of BSSs, there have been cases of financial or operational failures that were mostly caused by mismanagement or under-designed implementations of these systems [14]. Due to inflexible standardized business models or a lack of strategies tailored to the local context, such systems typically face issues such as underuse, misplaced bicycles, vandalism and theft, unusable or dysfunctional devices, impractical or unreliable service, sluggish expansion, and lack of adequate cycling infrastructure [15,16]. Previous studies in a Swedish context suggest that the pressure to deliver a commercially viable and profitable service presents challenges to the success of BSSs, as it may result in sociotechnical configurations that fall short in delivering long-term sustainability benefits [14]. Other studies highlight public acceptance as another relevant factor for the success of BSSs in both Swedish [5] and global contexts [16]. Nikitas [16] maintains that while BSSs are often widely appreciated by users and nonusers, without long-term support and investment from the local authorities such public acceptance may not translate into actual usage, and the systems may therefore fail to achieve sustainability goals.

In general, it is challenging to provide an effective BSS in cities since a range of behavioral aspects, as well as technical and organizational factors, can impact the usage of a bike-sharing system. From the BSS planning perspective, station location, membership, the accessibility of the stations, the number of bikes and racks in each station, the redistribution of bikes during rush hours, the technology used for building and operating the system, as well as the attractiveness of the service are considered significant for an effective BSS [4,15,17–19]. In terms of land use, similarly to other travel modes, activity patterns and urban form influence BSS users' travel behaviors. Previous research suggests that population density, job density, as well as cycling infrastructure are all crucial for passengers' choice of traveling by shared bicycles [15,20].

Even though the factors associated with the usage of BSSs have been fairly well studied within the research on shared-bike systems, the topic of station efficiency and its determinant factors has been understudied [21]. Similarly, in transport practice, typical bike-sharing strategies do not involve scientifically backed and evidence-based measures of station efficiency. In the absence of a station efficiency analysis, it is difficult to identify and eliminate the bottlenecks in a BSS effectively. At the same time, BSSs have been increasingly planned and implemented to meet mobility needs in an environmentally sustainable way, hence the growing relevance and importance of analyzing the efficiency of shared-bicycle stations.

The objective of this study is to propose and test a method to evaluate the relative efficiency of each shared-bicycle station within a given system and identify its determinants to establish an operational strategy for public BSSs. The proposed method will not only evaluate the efficiency of shared-bicycle stations but also consider the influence of the external variables, thereby contributing to the literature as a methodology for analyzing the efficient operation of BSS stations and the management of the shared-bicycle systems.

The method is proposed and tested through carrying out an analysis of the comparative efficiency of bike-sharing stations, putting forward a general methodology to apply potentially to any context and proposing a numerical application for the city of Malmö, Sweden. The efficiency measures are calculated by a nonparametric approach known as data envelopment analysis (DEA), showing its particular applicability to BSSs. The evaluation result is expected to help in reallocating the existing resources and assist policymakers when deciding where to allocate new stations (planning stages). In this way, it is possible to discover those stations that work better, that are more *efficient* according to the considered parameters, and optimize the system with low costs, i.e., reallocating racks where they are more needed (moving them from less used to more used stations, for instance).

The paper is structured as follows. Section 2 provides the introduction of the proposed DEA methodology from a general perspective, specifying the variables that, according to literature and planning guides, mostly characterize BSSs. Section 3 details the study material and method for the application of DEA to the BSS in Malmö, Sweden, including the detailed description of the explanatory analysis on the dataset to identify a subselection of significant variables. Section 4 presents and discusses the obtained results in Malmö. Section 5 concludes the paper with final remarks and reflections on the proposed approach and its implications.

#### **2. Proposed Methodology**

The methodology presented in this section first defines the input and output variables that best characterize BSS stations. More specifically, inputs refer to BSS station, built environment, and population-related variables; outputs refer to station usage trends and are based on the trips made using the system. Data related to BSS usage have to be cleaned and prepared before applying DEA (i.e., removing anomalies that can indicate temporary malfunctioning of the system or broken bicycles/stations) so that the efficiency of each station can be calculated.

Furthermore, to obtain a sufficient differentiation between the efficiency scores and remove from the analysis any potential outliers among the pool of BSS stations (DEA is sensitive to outliers), we propose to use Robust CoPlot (more details in Section 3.4). Robust CoPlot allows choosing inputs, outputs, and stations more significant for the studied context, considering the available data.

After this preliminary data preparation, DEA can be applied to determine the different degrees of efficiency associated with each BSS station. In the following subsections, we provide a more detailed description of the DEA methodology and the inputs/outputs that we suggest to include in the analysis. The data cleaning, elaboration and variable selection are more extensively described when presenting the case study (Section 3).

#### *2.1. Data Envelopment Analysis (DEA)*

Mathematically, DEA is a linear programming-based model for evaluating the relative efficiency of a set of decision making units (DMUs) which are homogeneous in the sense that they use the same types of resources (inputs) to produce the same kinds of goods or services (outputs) [22]. DEA evaluates the efficiency of each DMU relative to an estimated production possibility frontier determined by all DMUs. It has been used in several contexts (including education systems, health care units, agricultural production, military logistics, etc.); however, when analyzing the areas approached thus far, energy and transportation have the highest number of applied studies [23].

The application of the method in the transport sector is widespread, especially in the evaluation of airports, ports, railways, and urban transport companies [24,25]. In this paper, we suggest applying DEA to evaluate the relative efficiency of bike-sharing stations: hence, each DMU, in this case, corresponds with a bike-sharing station of a selected system.

To our knowledge, only two recent studies present an application of DEA in bike-sharing research. The first one, by Hong et al. [21], is applied to a station-based BSS, but it does not include any external variable in the first stage of the model. The second one, by Chang and Wei [26], uses DEA to evaluate and determine the optimal bike-sharing parking points for free-floating bicycles. We believe that the application of DEA to shared systems, although unconventional, is an interesting line of research that is worthy of further investigation.

DEA does not require any functional relationship between inputs and outputs, although it is important to provide their accurate measurements to apply it successfully. This means that only those variables that could appropriately capture the nuances in the efficiency of the DMUs have to be selected as inputs and outputs.

Since the DEA model employed in this paper relies on the standard input-oriented CCR model [22], the DMUs that obtain efficiency values equal to 1 are considered efficient. On the other hand, efficiency scores less than 1 denote some inefficiency of the considered DMU.
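For reference, the input-oriented CCR model is commonly written in its envelopment form as the linear program below, solved once for each evaluated DMU. The paper cites [22] without printing the formulation, so the notation is an assumption introduced here for illustration: *n* DMUs, *m* inputs *x*, *s* outputs *y*, intensity weights *λ*, and *θ* as the efficiency score of the evaluated DMU *o*.

```latex
\begin{aligned}
\min_{\theta,\ \lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j\, x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m, \\
& \sum_{j=1}^{n} \lambda_j\, y_{rj} \ge y_{ro}, && r = 1,\dots,s, \\
& \lambda_j \ge 0, && j = 1,\dots,n.
\end{aligned}
```

The optimal value satisfies 0 < θ ≤ 1. A score of θ = 1 corresponds to the efficient DMUs described above, while θ < 1 indicates that a combination of other DMUs could produce at least the same outputs with proportionally fewer inputs.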

Note that to obtain sufficient differentiation between the efficiency scores, the number of DMUs should not be too small when compared to the total number of inputs and outputs. In the literature, there is no theoretical treatment that gives a unique suggestion on this issue, but there are different rules of thumb. In this paper, we follow the recommendation by Dyson et al. [27], keeping the number of DMUs greater than or equal to twice the product of the number of inputs and the number of outputs.
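The rule of thumb can be checked with a one-line comparison. The sketch below is illustrative only: the function name is hypothetical, and the figures are the ones reported for the Malmöbybike case study in Section 3 (94 DMUs; 20 inputs and five outputs before variable selection, 11 inputs and three outputs after).

```python
def satisfies_dyson_rule(n_dmus: int, n_inputs: int, n_outputs: int) -> bool:
    """Return True when n_dmus >= 2 * n_inputs * n_outputs (rule cited as [27])."""
    return n_dmus >= 2 * n_inputs * n_outputs


# Full candidate set: 2 * 20 * 5 = 200 DMUs would be needed, but only 94 exist.
print(satisfies_dyson_rule(94, 20, 5))
# Selected set: 2 * 11 * 3 = 66, which does not exceed 94.
print(satisfies_dyson_rule(94, 11, 3))
```

This makes explicit why the number of variables has to be reduced before DEA is applied to the 94 stations.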

#### *2.2. List of Inputs to Include in the Model*

Input variables for DEA represent the aspects that impact the usage of the BSS and travel behavior in general and may explain the differences in the performance of the stations. To include such aspects in the DEA model, they need to be quantified and recorded as a set of variables. Nevertheless, other relevant qualitative parameters, such as weather and seasonal conditions, that may influence the use of BSS network as a whole, could play an important role in the step of interpreting the result.

In this study, a set of input variables was identified based on a review of the literature on the usage of BSS and travel behavior. In particular, the research by Ewing and Cervero [28] and the review study carried out by Eren and Uz [18] were used as key literature for establishing the list of input variables, which are described in Table 1 below.

**Table 1.** Suggested input variables for measuring the efficiency of the bike-sharing system (BSS) stations using DEA.




#### *2.3. List of Outputs to Include in the Model*

The outputs are needed in the model to analyze the performance of BSS stations and calculate generation/attraction factors connected to (the usage of) each station. We propose the following three classes of indicators (five outputs in total), all able to appropriately capture the nuances in the efficiency of bike-sharing stations.

The usage trend of each BSS station shows a cyclical trend, i.e., a pattern that repeats itself after a certain time interval ∆t. Here, we suggest calculating the output indicators as daily averages (∆t = 24 h). Note that the output values have to be normalized according to the number of racks of the largest BSS station in the analyzed system, meaning that each station score is adjusted for the number of racks available at that station (this is the reason why we did not include them among the inputs of the model).

• *Station daily amplitude*: The station daily amplitude is a way to express the daily variation of the number of bicycles in each station. A higher value (higher amplitude) corresponds to a station that is more regularly used throughout the day. We suggest calculating the amplitude of each station using the fast Fourier transform [44]. The fast Fourier transform converts a time-domain waveform (amplitude versus time) into a series of discrete waves in the frequency domain. The daily amplitude for each station can be calculated starting from the bicycle variations (usage trends) in ∆T, obtaining their frequency domain using the fast Fourier transform, and assessing the (daily) amplitude value for frequency (cycles/day) = 1.
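The daily amplitude described in the item above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: instead of a full FFT it evaluates directly the single discrete Fourier coefficient at 1 cycle/day (the value an FFT would return for that bin), it assumes a regularly sampled series covering a whole number of days, and the function name, the sampling step, and the amplitude scaling are hypothetical choices.

```python
import cmath
import math


def daily_amplitude(counts, samples_per_day, max_racks):
    """Amplitude of the 1 cycle/day component of a station's bike-count series.

    counts          : bike counts at a fixed time step over a whole number of days
    samples_per_day : samples in one day (e.g. 24 for hourly data), must exceed 2
    max_racks       : racks of the largest station, used to normalize the series
    """
    n = len(counts)
    if samples_per_day <= 2 or n == 0 or n % samples_per_day != 0:
        raise ValueError("need a whole number of days and more than 2 samples/day")
    n_days = n // samples_per_day
    x = [c / max_racks for c in counts]
    # With n samples spanning n_days days, DFT bin k = n_days is 1 cycle/day.
    k = n_days
    coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
    # Scale so that a pure sinusoid A*cos(...) yields amplitude A.
    return 2 * abs(coeff) / n


# Synthetic example (made-up values): 3 days of hourly counts oscillating
# around 10 bikes with a daily swing of 5, in a system whose largest station
# has 20 racks.
series = [10 + 5 * math.cos(2 * math.pi * t / 24) for t in range(72)]
print(daily_amplitude(series, samples_per_day=24, max_racks=20))
```

For this synthetic series the normalized swing is 5/20, so the returned amplitude is approximately 0.25; a station with no daily rhythm would return a value close to zero.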


#### **3. Case Study: Malmöbybike**

#### *3.1. Context Description and Related Variables*

Malmö, with more than 344,000 inhabitants [45], is the third-largest urban area in Sweden. The central-northern part (city center) has the highest population concentration, while smaller urban agglomerations exist in the southwest and eastern parts (Figure 1). As illustrated in Figure 2, the public transportation network follows a similar configuration and is concentrated in areas with higher population density. The cycling infrastructure (Figure 3) includes a bike path network with 520 km of bike paths completely separated from motor vehicle traffic and prioritized bike paths shared with other road users [46]. In 2016, Malmöbybike (i.e., the Malmö BSS) started operating with 50 stations in the central areas of the city; during 2019, the network expanded to a total of 100 stations. The travel survey conducted in 2018 indicates that the modal shares of cycling and public transport in Malmö are, respectively, 25.5% and 25.4% [47].

The spatial data about the population statistics and the built environment characteristics in Malmö were extracted from multiple sources including Statistics Sweden (SCB) [45], Lantmäteriet (Swedish mapping, cadastral and land registration authority) [48] and Trafikverket (Swedish Transport Administration) [49]. The population size data were in a grid format of 100 × 100 m, while other socioeconomic data (such as employment status, education level, income level, etc.) were available with two different cell sizes (250 × 250 m for urban areas and 1000 × 1000 m for suburban areas). Land use data made available by Lantmäteriet were employed to map three types of land use, namely residential, public and commercial, and green areas. Moreover, the transport-related geodata capture the existing cycling infrastructure as well as the public transport network including bus stops and train stations [50].

**Figure 1.** Population distribution in Malmö (number of inhabitants per 100 × 100 m).

**Figure 2.** Map of land use and public transport network in Malmö.

**Figure 3.** Map of the cycling infrastructure in Malmö.

#### *3.2. BSS Data Description and Preparation*

The available dataset on Malmöbybike (January 2018–July 2020) was provided by Clear Channel [51]. It covers all the OD trips in the system during this timeframe, and it makes it possible to have detailed information about the usage of the system, allowing different analyses and data aggregations.

For this application, we selected one month of data, ∆T = June 2020, i.e., the month with the largest number of movements (64,763 trips) in the available dataset. At that date, 100 BSS stations were built and operating in the network. According to Weather Spark [52], the average daylight time in June is 17.5 h, with an average temperature of 28 °C; the summer vacation in Sweden usually starts in the last week of June. This background offers attractive conditions for outdoor activities. Regarding the restrictions related to the COVID-19 pandemic, in June 2020 Sweden restricted social gatherings in restaurants and public spaces (which should not exceed 50 people) and advised everyone to keep social distance in outdoor activities.

Out of the 100 stations, five (namely, stations no. 21, 61, 62, 69, and 79) were not used at all during June; hence, they were removed from the dataset. Stations that were only partially used during the month (i.e., due to malfunctioning on some days) were excluded only if they had not been used for more than 50% of the observation time (station no. 41 was removed at this stage). The reason is that we were performing a monthly (∆T) efficiency analysis, determining which stations were more efficient in the considered period; minor malfunctioning of the stations should be part of the calculations.

An additional data cleaning step concerned bikes that were used for longer than 1 h (i.e., picked up and not dropped off within 60 min). According to the Malmöbybike terms of use [53], a bike should be used for a maximum of 60 min at a time, and if a bike is not returned within an hour the user is charged a fine. Therefore, it is assumed that trips longer than 60 min are due to bikes that are broken or not functioning correctly. The result of data cleaning was a dataset with 94 stations and 63,338 OD trips.
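The two cleaning rules described above (dropping stations that were unused for more than half of the observation period and dropping trips longer than 60 min) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run on the Clear Channel data: the data layout, the function names, and the choice to measure inactivity in whole days are hypothetical.

```python
from datetime import datetime, timedelta

MAX_TRIP = timedelta(minutes=60)   # limit taken from the terms of use cited in the text
MAX_INACTIVE_SHARE = 0.5           # stations unused for more than 50% of the period are dropped


def inactive_stations(active_days, n_days):
    """Return stations whose share of days without trips exceeds the threshold.

    active_days: dict {station_id: number of days with at least one trip}
                 (counting inactivity in whole days is an assumption).
    """
    return {
        station
        for station, days in active_days.items()
        if (n_days - days) / n_days > MAX_INACTIVE_SHARE
    }


def drop_long_trips(trips):
    """Keep only trips lasting at most 60 min; trips are (start, end) datetimes."""
    return [(start, end) for start, end in trips if end - start <= MAX_TRIP]


# Toy example with made-up values, not data from the study.
t0 = datetime(2020, 6, 9, 8, 0)
toy_trips = [(t0, t0 + timedelta(minutes=45)), (t0, t0 + timedelta(minutes=75))]
print(len(drop_long_trips(toy_trips)))
print(inactive_stations({1: 30, 21: 0, 41: 10}, n_days=30))
```

A station that was never used has an inactive share of 100% and is therefore caught by the same threshold as a partially used one.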

Considering the previous research [54] as well as the contextual conditions in Malmö (e.g., the urban area size, the Malmöbybike coverage area), a radius R = 300 m was considered acceptable to define the *catchment area* (buffer) around each BSS station.

The selected input and output variables are explained and listed in the following Section 3.3.

#### *3.3. Specification of Inputs and Outputs*

Based on the input variables suggested in Section 2.2, we used publicly available statistical data to calculate the following list of input variables to apply DEA to the Malmöbybike BSS (Table 2). Note that all the values in the final input table are positive; zero values were eliminated by adding a small positive constant, to meet the "positivity" requirement of DEA [55].




**Table 2.** *Cont.*

Although the station age was listed among the suggested input variables (Section 2.2), we did not include this variable for the case study of Malmöbybike. The decision was made since the system is fairly recent, and it has been mainly built in two steps (50 stations in 2016 and 50 more stations in 2019). As previously explained, since DEA provides a relative efficiency of each station, it is important to provide indicators able to capture in a nuanced way the differences among stations from a certain perspective. The (50 + 50) BSS stations have not been opened simultaneously, but gradually over the year(s). Since the information about the exact days/weeks/months of operation of each station is not available and the *Station age* input would have had only two values (the two known years: 2016 and 2019), it was not added to the model.

In the following Table 3, some descriptive statistics (mean, median, minimum, maximum, standard deviation) of the input variables used in this analysis are provided.

**Table 3.** Descriptive statistics of input variables for the DEA applied to Malmöbybike bike-sharing system (94 DMUs, 20 inputs).


Regarding the output calculation, notation and descriptive statistics are summarized in the following Table 4.


**Table 4.** Variable notations and descriptive statistics of output variables for the DEA applied to Malmöbybike bike-sharing system (94 DMUs, five outputs).

While the calculation of station prevalence (O2 and O3) and attractiveness (O4 and O5) is straightforward following the description in Section 2.3, we provide a more detailed explanation of the assessment of the station daily amplitude O1 using the fast Fourier transform.

Using the Clear Channel database [51] for the Malmöbybike BSS, it was possible to obtain the usage trend of each station in ∆T (June 2020). We did not have any information about bicycle relocations among stations performed by the operator; hence, we made an assumption based on the available data, which indicate the origin and destination of each bike-sharing trip in the network. If the bicycle *b<sub>k</sub>* is in the station *s<sub>i</sub>* at a certain time *h*<sub>1</sub>, but the previously registered trip (ended at *h*<sub>2</sub>) in the system does not have *s<sub>i</sub>* as the destination station, we assumed that a relocation happened in the time interval *h*<sub>1</sub>-*h*<sub>2</sub>, more specifically at the midpoint *h*<sub>3</sub> (so that the time interval *h*<sub>2</sub>-*h*<sub>3</sub> has the same length as *h*<sub>3</sub>-*h*<sub>1</sub>).
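The midpoint assumption for relocations can be illustrated with a short sketch. It assumes that trips can be grouped by bicycle and sorted in time, which the text implies but does not state; the tuple layout and the function name are hypothetical.

```python
from datetime import datetime


def infer_relocations(bike_trips):
    """Infer operator relocations for one bicycle.

    bike_trips: that bicycle's trips sorted by time, each a tuple
                (origin, start_time, destination, end_time).
    Returns a list of (from_station, to_station, assumed_time), where
    assumed_time is the midpoint h3 between the end of the previous trip (h2)
    and the start of the next one (h1).
    """
    relocations = []
    for previous, following in zip(bike_trips, bike_trips[1:]):
        _, _, previous_destination, h2 = previous
        next_origin, h1, _, _ = following
        if next_origin != previous_destination:
            h3 = h2 + (h1 - h2) / 2   # midpoint of the idle interval
            relocations.append((previous_destination, next_origin, h3))
    return relocations


# Toy example with made-up values: the bike is dropped at station 5 at 10:00
# and next picked up at station 9 at 14:00, so a relocation is assumed at 12:00.
trips = [
    (1, datetime(2020, 6, 9, 9, 30), 5, datetime(2020, 6, 9, 10, 0)),
    (9, datetime(2020, 6, 9, 14, 0), 2, datetime(2020, 6, 9, 14, 20)),
]
print(infer_relocations(trips))
```

Each inferred relocation can then be added to the station time series as one bike leaving the first station and one bike arriving at the second at the assumed time.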

After obtaining the final usage trends (i.e., the bicycle variations) in ∆T taking into account relocations as just described, the fast Fourier transform was applied to convert the time domain waveforms to the frequency domain. The value of each station daily amplitude is the one corresponding to frequency (cycles/day) = 1 (Figure 6).

The following Figures 4–6 show a practical example for two bike-sharing stations in the system.

**Figure 4.** Normalized time-domain (according to the number of racks of the largest BSS station) waveforms for the bike-sharing stations 1 and 15 of the Malmöbybike system on the 9th of June 2020. The daily usage trends (with bicycle variations) can be visualized in blue.

**Figure 5.** Time-domain waveforms (in blue) for the bike-sharing station 1 of the Malmöbybike system over 10 days included in the analyzed time interval ∆T. A cyclical (daily) periodical behavior can be detected (in red).

**Figure 6.** Frequency domain for the bike-sharing stations 1 and 15 of the Malmöbybike system. The station daily amplitude is the one corresponding to frequency (cycles/day) = 1.

Transforming the temporal domain (trend over time of the number of bikes in each station) into the frequency domain allows finding signal periodicity that otherwise would not be easy to identify. Figure 6 highlights a series of peaks representing the different amplitudes of the periodicities identified using the fast Fourier transform. Larger amplitudes show the prevailing periodicities.

We chose to visualize stations 1 and 15 (in Figures 4 and 6) since they are representative of the different behaviors that the stations in the Malmöbybike system had during ∆T. Some of them (39.4% of the BSS stations) show a peak corresponding to frequency (cycles/day) = 1 (such as the one shown in Figure 6, Station 1): this means that a typical (daily) periodic behavior (∆t = 24 h) was detected for these stations (see the corresponding time domain in Figure 4, station 1, and Figure 5, over 10 days of observations).

The other stations (look at the representative trend of Station 15, Figure 6) show a smaller amplitude corresponding to frequency (cycles/day) = 1, and peak(s) at lower frequencies (i.e., with cycles longer than 24 h).


The highest frequency peak that was found in the entire database for all the BSS stations is the one corresponding to frequency (cycles/day) = 1, that is, the smallest cyclical temporal unit that can be detected in the system corresponds to ∆t.

#### *3.4. Inputs, Outputs, and Station Selection*

Since DEA is sensitive to outliers [60] and CoPlot has often been used in the literature as a supplemental tool to cluster analysis, DEA, and outlier detection methods [61–63], we suggest applying it to the proposed analysis [64–66]. Additionally, this analysis allows reducing the number of variables/DMUs to obtain a sufficient differentiation between the efficiency scores, while following the rule of Dyson et al. [27].

We propose to use Robust CoPlot, an adaptation of multidimensional scaling (MDS) that facilitates rich interpretation of multivariate data [67]; it has the capacity to work better than CoPlot with datasets containing outliers since it is not affected by their presence.

Both CoPlot and Robust CoPlot are able to reduce multidimensional data into a two-dimensional structure, by superimposing two graphs [68–70], simultaneously evaluating associations between variables and between observations. The first map uses a nonmetric version of MDS to spatially represent the distances between observations (in our case, the observations are the DMUs, that is, the bike-sharing stations in Malmöbybike): similar observations are located close to one another, and the goodness-of-fit of this representation is summarized by a single parameter, the Kruskal stress value, *σ* [71]. The second map, which is conditional on the first, generates vectors that display the relationships among the variables (which, in our case, are inputs and outputs, Section 3.3). Each variable has its vector: if two variables are highly correlated, the vectors describing them are close together, and if their correlation is negative, the vectors describing them go in opposite directions. In this case, we have a goodness-of-fit for each variable, which expresses the goodness of the regression with respect to the observations, and is visualized by the length (magnitude) of the vector (for more details, see [62,67]).

The procedure to identify correlated variables and outliers consists of repeating the Robust CoPlot several times, removing, before each repetition, some of the mutually correlated variables and the outliers. DMUs characterized by a specific input/output variable are positioned in the direction of that input/output vector. Correlated variables are represented by vectors having the same direction in space, while outlier DMUs are represented by points positioned far from the center of gravity (the point where the vectors diverge) compared to the other points of the chart.

Figure 7 shows, for example, the Robust CoPlot obtained for the 20 inputs and five outputs described in Section 3.3 in the first repetition.

The DMUs (bike-sharing stations) are graphically represented by red dots: as explained above, similarities between the stations in the dataset are transformed into distances on the map such that similar stations are closer together than less similar stations. The Kruskal stress value *σ* is 9.18%, showing a goodness-of-fit between good and fair [71].
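As a reminder of what the reported value measures, Kruskal's stress is commonly defined as below, where *d<sub>ij</sub>* is the distance between observations *i* and *j* on the two-dimensional map and *d̂<sub>ij</sub>* is the corresponding disparity obtained by monotone regression on the original dissimilarities. The paper does not print the formula, so this is the standard stress-1 definition given for orientation; the exact variant used by the Robust CoPlot software is an assumption.

```latex
\sigma = \sqrt{\frac{\sum_{i<j} \left(d_{ij} - \hat{d}_{ij}\right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

Kruskal's original benchmarks label a stress of about 20% as poor, 10% as fair, 5% as good, and 2.5% as excellent, so a value of 9.18% falls between "good" and "fair".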

The inputs and outputs are each represented by a black vector (labeled with notation and magnitude). Vectors having the same direction in space are highly correlated; hence, we decided not to consider some of them and to repeat the procedure, so as to apply the DEA considering only the most significant variables.

Note that the analysis to remove the highly correlated inputs and outputs has to be done separately for inputs and outputs. Looking at the outputs (Figure 7), we can see that O2 and O3 are almost overlapping, and O4 and O5 have a similar direction. Hence, we selected O1, O3, and O5 since they seem to be the least correlated outputs and the most significant for this dataset. Similar reasoning was applied to the 20 inputs, also taking into account those more meaningful in the Malmö context. The procedure was repeated three times, progressively removing the vectors with higher correlation, obtaining at the end the configuration shown in Figure 8, with 11 inputs and three outputs (the rule of Dyson et al. [27] is satisfied). When a variable is removed, the remaining ones are rearranged in the Robust CoPlot map, depicting the associations in the new configuration.

**Figure 7.** Robust CoPlot map of 25 variables (20 inputs and five outputs) describing the bike-sharing stations of the Malmöbybike system.

**Figure 8.** Final Robust CoPlot map of 14 selected variables (11 inputs and three outputs) describing the bike-sharing stations of the Malmöbybike system.


Looking at Figure 8, the efficient DMUs (bike-sharing stations) are represented with a blue cross (28 in total), while the less efficient ones are represented with a red dot. By eliminating variables with low correlations, the goodness-of-fit is slightly improved and the Kruskal stress value *σ* equals 9.01%. We did not remove any DMU since we did not notice any significant cluster/variable positioned too far from the center of gravity.

The estimated efficiency scores for the remaining DMUs as well as the inputs and outputs are presented and further discussed in the next section.

#### **4. Results and Discussion**

Figure 9 presents the efficiency scores yielded by DEA. It shows an overall pattern of the relative efficiency for the BSS stations included in the analysis based on the data from June 2020. As represented by the color ramp (dark green to light yellow), stations exhibit clear differences in their efficiency levels. Mapping the efficiency scores across space is helpful both for identifying the most/least efficient stations and for comparing a subset of the stations to one another or to the contextual conditions. The variation in the relative efficiency scores demonstrates a meaningful pattern concerning the contextual factors and highlights three categories of stations according to their level of efficiency: (1) the efficient BSS stations (having efficiency = 1); (2) the medium efficient BSS stations; (3) the least efficient BSS stations. Each efficiency category is further addressed and discussed in the following subsections.

**Figure 9.** Monthly stations efficiency map for the Malmöbybike system using DEA, June 2020.

#### *4.1. The Efficient BSS Stations*

The stations visualized in the darkest green color represent efficient stations, that is, those having efficiency equal to one (for instance, stations no. 30, 18, or 63). Located in different areas of the city, the efficiencies of these stations may be attributed to varying land use contexts. However, the availability of separated cycling lanes indicates that the catchment areas for these stations contain a high level of bicycle infrastructure. This pattern reflects the results found in previous studies [35,72,73]. Consistent with the literature [4,15,20], another common property of this category is the proximity to a green area or an activity center such as commercial buildings, public facilities, and job centers. Considering the spatial properties and the urban context of the station locations, three groups can be identified.

The first group includes stations located in the northern part of the city with good access to nature, e.g., green areas and the waterfront. Trips originating from or ending at these stations are likely made by cyclists visiting the area for outdoor activities. Therefore, the presence of natural resources seems to contribute positively to the efficiency of these stations. This result is similar to the finding reported by Kim et al. [74].

The weather or the seasonal conditions may be considered another external factor contributing to a larger number of trips connected to this area [18]. The last week in June coincides with the start of summer vacations in Sweden, hence the increased usage of shared bikes in areas with a larger share of recreational activities. In general, a combination of these contextual factors is likely to improve the DEA-evaluated efficiency of these stations.

The second group of efficient stations is located in areas with a high level of access to public transport (no. 18, 16, 1, 24, 25 next to railway stations) and close to the city center. In this case, the shared-bicycle users are likely public transport passengers who use the bikes as a first/last-mile feeder mode. Such trips can be both commuting and noncommuting trips, meaning the efficiency of these stations may be less affected by the weather or seasonal conditions in June. Hence, good access to public transport may be a major contributor to the higher efficiency of these stations. This result confirms the findings of previous research suggesting that successful BSSs complement existing transport infrastructure such as public transport [16,75].

The third group includes stations located further from the city center than the first two groups, but still in the urban area, e.g., stations no. 46, 57, 89. Most of them are newly added stations with a station age of less than one year. They are located in areas with high population density, next to public facilities or commercial centers, with good bicycle infrastructure available, and close to bus stops. Previous studies have provided strong evidence that these factors contribute to increased use of BSS services [4,74,76]. In some cases, the density of BSS stations within 1 km is lower than average (the next station is more than 500 m away, e.g., stations no. 57, 89), which could contribute to the efficiency of these stations. The pattern of this group may indicate that, for the less dense areas located further from the city center, locations next to public facilities and commercial centers, where bus stops are often planned, are likely to be the optimal spots for efficient BSS stations. At the same time, good-quality cycling infrastructure should be provided.

#### *4.2. The Medium Efficient BSS Stations*

The stations colored in mid-range green are categorized as medium efficient stations, such as stations no. 11, 14, 99. Most of these stations are located in the central area of the city, with a higher concentration of public facilities and commercial buildings. The central area is often characterized by a high density of population and jobs which, in turn, implies that it generates or attracts a larger number of trips and, due to the densely built environment, makes traveling by bike or public transport more convenient than by car [77]. Similarly, this context may create a higher demand for cycling compared to the peripheral areas, which often motivates a medium/high level of BSS service provision in urban centers.

In the case of Malmö, although these stations did not fall into the efficient station group, many of them have obtained an efficiency score close to 1 (that is, the maximum efficiency score in DEA). Their slightly lower efficiency scores are probably due to the very high density of the BSS stations in the area. Most of the stations in this category have overlapping catchment areas and/or more than one BSS station may be present within their 300 m catchment area. Reducing the density by removing some stations would likely make the remaining ones more efficient. However, given the urban form context in the city central area, the level of the current efficiency of all the stations rather demonstrates the success of the BSS service in the area. In a similar urban context, previous studies have suggested the buffer to be between 200 and 400 m when planning for new stations [18,29,78]. In general, a smaller radius seems to contribute positively to the usage of the service.
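The overlap argument above rests on counting how many other stations fall inside a station's catchment radius. A minimal sketch of that count is given below, assuming great-circle (haversine) distance as a stand-in for whatever distance measure the study used, which may well have been network distance. The function names, the station labels, and the coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_m(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def neighbours_within(stations: Dict[str, Coordinate],
                      radius_m: float = 300.0) -> Dict[str, int]:
    """Count, for each station, the other stations within radius_m."""
    return {
        sid: sum(
            1
            for other, other_pos in stations.items()
            if other != sid and haversine_m(pos, other_pos) <= radius_m
        )
        for sid, pos in stations.items()
    }


# Hypothetical coordinates, chosen only to illustrate the calculation.
example_stations = {
    "A": (55.6050, 13.0000),
    "B": (55.6070, 13.0000),  # about 222 m north of A
    "C": (55.6150, 13.0000),  # more than 800 m from both A and B
}
example_counts = neighbours_within(example_stations, radius_m=300.0)
```

A station with a non-zero count shares part of its 300 m catchment with at least one neighbour, which is the situation the paragraph above associates with slightly lower efficiency scores in the central area.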

The stations located further away from the city center (no. 64, 87, 90) are commonly placed within at most 600 m of one another. While this distance falls within the reasonable range noted in previous studies, these stations seem to benefit further from proximity to bus stops or large public facilities/commercial buildings. Additionally, despite the lower population size in the peripheral areas, a higher residential density in the form of apartment housing, as opposed to single-family housing, could be observed in the catchment areas of these stations. In general, the observed pattern further confirms the results discussed in the earlier section and in previous studies: for the noncentral urban area, the density of the BSS stations, proximity to bus stops and large public buildings, and high population density could contribute to the efficiency of the BSS stations.

#### *4.3. The Least Efficient BSS Stations*

The least efficient stations, visualized in the lightest shade of green/yellow, mostly include those added during 2019, meaning that their age is less than one year (e.g., stations no. 52, 53, 56, 60, 66, 68, 71, 74, 83, 85, 86, 93). Most of these stations are located further away from the city center and in areas with lower population density. While some of the stations (such as no. 71, 74, 75) are located in proximity to small scale public facilities and commercial buildings, the low population density in their catchment areas indicates a low travel demand [28]. Similarly, the cycling infrastructure connected to the stations is rather poor which can significantly impact cycling behavior [18]. Station no. 60 is an exception to this, most likely because it is located next to two other BSS stations (no. 59 and 63) which are, respectively, next to a train station (no. 59) and public facility buildings (no. 63), providing sufficient service demand in the area. In this single case, removing station no. 60 perhaps would make stations no. 59 and 63 more efficient, reducing the running cost in general. This shows how in the areas far away from city center, where the population density is relatively low, even though there is demand due to the connection to the public transport and access to the public facilities or commercial areas, a higher density of BSS stations may not be needed. This issue has been discussed in the previous studies which have suggested different buffers according to the distance between the location of the stations and the central area [11,18].

Station no. 85 is located in an area of detached villa houses. The low efficiency of the station may be due to the low population density around the station and to the socioeconomic features of the population living in the catchment area. More specifically, the residents in the area seem to be associated with larger household sizes and with a higher income group that is more likely to travel by car than by bicycle [79]. However, we would argue that, although the station has low efficiency, it is still worth placing the BSS service here from a behavior-nudging perspective, to promote and normalize cycling for the groups living in these contexts.

Based on the examination of the three efficiency categories in relation to the urban contexts, the relative efficiencies evaluated by the DEA method seem highly reasonable and well supported by the previous studies.

#### **5. Conclusions**

The study proposed and tested a method, the data envelopment analysis, for evaluating the relative efficiency of BSS stations. The method was tested by applying DEA to a Swedish case study, the BSS Malmöbybike in Malmö.

The efficiencies were evaluated starting from a pool of input and output variables supported by literature, reports, and BSS planning guides, defined in variants that potentially allow the same procedure to be applied to any city. This method not only evaluates the efficiency of each shared-bicycle station but also makes it possible to consider the influence of external variables, thereby contributing to the literature as a methodology for analyzing the efficient operation of shared-bicycle stations and the management of shared-bicycle systems.

The results provided by the application to the Malmöbybike BSS are meaningful in relation to both the specificities of the urban context and the findings reported in previous studies. This seems to indicate that the suggested method can provide a reliable evaluation of the BSS efficiency and that it can be used by decision-makers and planners for developing operational strategies to plan BSS stations and networks.

One of the limitations of the proposed methodology is related to the identification of a specific timeframe under evaluation. If external factors change during the days/weeks/months after the analysis, the calculated efficiencies are no longer correct. Furthermore, the analyst should have a good knowledge of the urban context under examination to be sure to include the most suitable variables capable of representing it.

It is important to point out that the objective of the study is to propose and test the DEA methodology rather than to carry out a comprehensive evaluation of the BSS in Malmö. In future studies, broader spatial and temporal information should be included and compared to achieve a more complete evaluation of the Malmöbybike efficiency. The evaluation should be carried out during the seasons when cycling is most and least popular. The differences between and within days, weeks, and months should all be analyzed and compared to gain a good overview of the efficiency and to support effective operational and planning strategies.

Some of the input variables, such as station visibility, may be difficult to express quantitatively. This type of variable could be defined through fuzzy sets. A new formulation of the methodology proposed here, which considers a fuzzy DEA approach [66], is currently being prepared.

A further line of research should possibly investigate the inclusion of the suggested methodology in bike-sharing network design models, to take into account the potential efficiency of BSS stations when planning or expanding such a system.

**Author Contributions:** Conceptualization, L.C. and R.C.; Methodology, L.C. and R.C.; Software, L.C.; Formal Analysis, L.C., Z.H. and R.C.; Writing—original draft preparation, R.C. and C.Z.; Writing review and editing, R.C., Z.H., C.Z. and L.C.; Visualization, L.C. and Z.H.; Supervision, L.C. and R.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to thank Clearchannel for their assistance with the collection of the Malmöbybike BSS data.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

