Article

Decision System Based on Markov Chains for Sizing the Rebalancing Fleet of Bike Sharing Stations

Department of Automation, Technical University of Cluj-Napoca, 40014 Cluj-Napoca, Romania
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6743; https://doi.org/10.3390/app14156743
Submission received: 13 June 2024 / Revised: 17 July 2024 / Accepted: 29 July 2024 / Published: 2 August 2024
(This article belongs to the Special Issue Intelligent Transportation System Technologies and Applications)

Abstract

Docked Bike Sharing Systems often experience load imbalances among bike stations, which lead to an uneven distribution of bicycles and make it difficult to meet users’ demand. To address these imbalances, many docked Bike Sharing Systems employ rebalancing vehicles that actively redistribute bicycles across stations, ensuring a more equitable distribution and improving bike availability for users. The number of rebalancing vehicles in docked Bike Sharing Systems is typically determined from criteria such as the size of the system, the density of stations, the expected demand patterns, and the desired level of service quality. Fleet size is therefore a key factor in increasing the efficiency of customer service at a reasonable cost. To enable cost-effective rebalancing, we used a cluster-based approach, owing to the large scale of Bike Sharing Systems, and our model is based on Markov Chains, given their proven effectiveness in this domain. Degrees of subsystem load at station level were used for modeling purposes. Additionally, a quantization strategy around cluster load was developed to avoid state space explosion. This allowed the computation of the probability of transitioning from one degree of system load to another. A new method was developed to determine the fleet size, based on the identified subsystem steady state, which describes the rebalancing necessity. The model was evaluated on traffic data collected from the Citi Bike New York Bike Sharing System. The evaluation showed that the model transition rates were in accordance with the expected values, indicating that the rebalancing operations are efficient with respect to meeting on-time arrival constraints.

1. Introduction

The imperative to shift travel from motor vehicles to eco-friendly bikes has become evident due to traffic congestion and pollution. Motor vehicles contribute significantly to emissions, primarily through fossil fuel consumption. To address these challenges and promote sustainable cities, integrating Bike Share Programs (BSPs) into transportation systems has gained traction. BSPs mitigate pollution, conserve resources, and improve air quality. They offer environmental benefits, promote physical activity, and provide cost-effective and efficient transportation. Implementing BSPs fosters greener urban environments, economically viable transportation, and healthier societies, facilitating the transition to sustainable modes of travel [1].
Bike sharing primarily replaces sustainable modes of transportation, although the rate at which it replaces cars is relatively low. However, bike sharing can indirectly reduce car travel by creating synergies with public transport, such as expanding catchment areas, reducing overcrowding, and improving overall travel times. Different types of modal shift result in different benefits. Transitioning public transport users to bike sharing can alleviate overcrowding in saturated public transport networks. It has been observed that socially disadvantaged groups are underrepresented among bike sharing users, but implementing equity measures and awareness-raising campaigns could help increase their usage rates [2].
The availability of bike sharing services presents both benefits and challenges in traffic management. One of the challenges is ensuring efficient bike sharing dispatch. Polarized traffic patterns within a bike sharing network can give rise to hotspots, which are locations characterized by an imbalance of bikes, either in excess or deficiency. Insufficient bike supply can result in lower user acceptance, while increasing the supply can lead to higher dispatch costs and potential underutilization of resources [3,4].
The operators responsible for bike dispatching play a crucial role in balancing the demand for bicycles. They ensure that bikes are replenished at the most popular starting stations and that docking bases are available at the most popular destination stations [5]. It is important to note that how effectively the operators handle this issue directly impacts the satisfaction level of the service users. This presents another significant challenge, as it requires precise control on the part of the operators, and determining the optimal solution remains a complex and unresolved problem. Sub-optimal solutions for the control of bike sharing rebalancing operators can lead to several pitfalls. Inadequate control measures may result in inefficient redistribution of bikes, leading to imbalanced bike availability at different stations. This can result in a shortage of bikes at popular starting stations, causing frustration and inconvenience for users. Conversely, an excess of bikes at popular destination stations can lead to overcrowding and limited docking space, making it difficult for users to return their bikes. Sub-optimal control strategies may fail to consider dynamic changes in demand patterns, resulting in a mismatch between bike supply and user demand. This can further exacerbate the imbalances and hinder the overall effectiveness of the Bike Sharing System (BSS). Sub-optimal control of operators may lead to increased operational costs, as resources are not efficiently utilized. Therefore, it is crucial to develop robust control strategies that consider real-time demand fluctuations and optimize the redistribution of bikes, to ensure a seamless and satisfactory BSS user experience [6].
The present study introduces an approach based on capacity scaling in the BSS, using bike counts as a measure of load. This allows for the identification of highly dynamic bike stations, which are then analyzed as a separate subsystem. A state vector is associated with this subsystem, capturing the frequency of critical events at the bike stations. By examining highly dynamic bike stations in the real-world Citi Bike New York system during specific time intervals of practical significance, a Markov Chain (MC)-based model of critical events was developed and validated. This model provides supporting information for estimating subsystem demand and facilitates the optimization of fleet size for operators implementing the rebalancing operation strategy developed in [7].
One of the general challenges encountered when utilizing an MC is the issue of state space explosion. An MC relies on defining a set of states and the transition probabilities between those states. However, as the complexity of the system increases, the number of possible states grows exponentially, leading to a phenomenon known as state space explosion. This explosion in the number of states can quickly become computationally infeasible to handle, making it challenging to accurately model and analyze the system using traditional MC techniques. In this paper, the challenge of state space explosion in an MC is addressed by employing an effective approach that involves fixing the number of states through feature quantization. This approach is particularly suitable for tackling the state space explosion problem within the context of a BSS. The quantization technique is motivated by the understanding that the definition of hotspots in a BSS, as measured by bike counts, is relative rather than absolute. Specifically, it is scaled by the total capacity of bike stations and compared with the system-wide average. By quantizing the features and constraining the number of states, the computational complexity associated with the state space explosion can be mitigated, while still capturing the essential characteristics of the BSS dynamics.
This paper is organized into several sections. The second section, Literature Review, provides an in-depth review of the existing literature and research related to the problem at hand. The Methodology section, which is the core of this paper, presents a detailed description of the proposed approach. The Results section presents the findings and outcomes of the research: it includes a detailed analysis of the data collected during the practical evaluation, as well as the results obtained from the theoretical evaluation. Our discussion of the results and methodology can be found in the Discussion section. Finally, the Conclusions section summarizes the key findings of this study and discusses their implications.

2. Literature Review

Understanding actual system behavior is essential for achieving a satisfactory control strategy. Knowing how a system functions and how it responds to inputs and operating conditions makes it possible to identify critical parameters, analyze system dynamics, and predict system responses. Key features for describing systems, such as land use intensity [8], bike counts [9], trip data [10], and multiple categories [11], are used in the literature. Common algorithms used in the literature for feature evaluation are deep neural networks [8], dynamic linear models [9], gradient-boosting decision trees [10], and ensemble learning model-based methods [11]. Leveraging this understanding, optimal control strategies can be developed to enhance system performance, stability, and safety. By incorporating insights from the study of actual system behavior, a reliable and efficient control strategy can be formulated.
For large systems, it is often impractical to comprehensively analyze the intricate behavior of the entire system, which necessitates reductionist approaches that study simplified subsystems or components in order to gain insight. In this sense, clustering plays an important role in addressing challenges related to a BSS. Clustering techniques provide valuable insights into the system’s behavior by grouping similar bikes, stations, or users based on attributes such as location [12,13], usage patterns [14,15], or user preferences [16,17]. This enables the identification of user segments and the tailoring of services accordingly. Clustering also aids in optimizing bike allocation [18], station placement [19], and route planning [20], resulting in enhanced operational efficiency and improved user experience. Moreover, clustering facilitates anomaly detection [21], enabling the identification and resolution of issues such as bike theft, vandalism, or system malfunctions. Overall, clustering techniques provide a powerful toolset for analyzing and understanding the complex dynamics of BSSs, enabling data-driven decision making and the development of innovative solutions for enhancing the sustainability and usability of BSSs.
One common approach to performing clustering is utilizing feature descriptors. Feature descriptors provide a concise representation of the characteristics and attributes of bike sharing data, enabling efficient and effective clustering algorithms to be applied. The careful selection of feature descriptors is of the utmost importance in ensuring the accuracy and reliability of the clustering results. These descriptors should capture the relevant information that distinguishes different user behaviors and patterns. In the literature, descriptors such as bike demand [22], trip duration [23], or distance traveled [24] can provide insights into user preferences and usage patterns.
Moreover, the proper usage of these feature descriptors is also of great importance. Techniques such as Principal Component Analysis (PCA) have been used in other studies, to reduce the dimensionality of the feature space, allowing for a more efficient clustering process. Li et al. [25] have shown that by extracting the most informative features, PCA helps to eliminate noise and redundancy, leading to more meaningful and interpretable clustering results.
Feature descriptors play a vital role in uncovering hidden patterns and structures within bike sharing data. They enable the identification of distinct user groups, such as commuters, leisure riders, or occasional users, facilitating targeted interventions and improvements in BSS operations [26]. Additionally, other studies use feature descriptors in anomaly detection, identifying unusual or abnormal user behaviors that may require further investigation [27].
In recent years, there has been a growing interest in the application of MCs for modeling and optimizing BSSs [28,29,30]. An MC provides a powerful framework for capturing the dynamic nature of a BSS, where the availability of bikes at different stations changes over time, due to user demand and system operations. Several studies have explored the use of MCs to analyze and improve various aspects of BSSs, such as station capacity planning [31], rebalancing strategies [32], and user behavior prediction [30].
In contrast, the current work focused on analyzing highly dynamic stations in a BSS. Such stations are characterized by frequent fluctuations in bike availability, often being either completely full or completely empty. Focusing on them allows rebalancing control strategies to be tailored more effectively, ensuring compliance with specific system time constraints. In other words, the approach flexibly targets the problematic areas within the system instead of imposing a universal, system-wide solution that is not always needed and not always feasible.
Researchers have developed models that incorporate factors like weather conditions, temporal patterns, and station characteristics, to accurately predict bike availability and demand [33,34,35]. These models have been used to optimize the allocation of bikes across stations, minimize rebalancing costs, and enhance the overall efficiency and user experience of BSSs. Furthermore, in their work, Zhou et al. [36] demonstrated that the use of MCs allows for the evaluation of different operational policies and the assessment of their impact on system performance. Overall, the application of MCs in the context of BSSs has shown great potential in improving the sustainability, reliability, and usability of these systems.
However, several studies have concluded that Markov models have certain drawbacks as modeling tools [37]. First, Markov models cannot represent conflict, parallelism, preemption, and synchronization [37]. Second, a system modeled with a Markov model is difficult to extend: even a small change in the system design can result in significant changes to the MC structure [38].
Cluster-based approaches offer a promising solution to addressing these two key limitations of MCs [37]. With respect to the first issue mentioned, the inability to convey different forms of concurrency, study [39] showed that clustering allows a more granular representation of system dynamics, which enables the modeling of complex behaviors that are not easily captured by traditional Markov models. For the second issue mentioned, the model extension, by organizing the system into clusters, each representing a subset of the overall system, Rayati et al. [40] demonstrated that it becomes easier to incorporate new components or modify existing ones without disrupting the entire model.
Obtaining a model based on a BSS using an MC opens up a wide range of applications in the field of transportation and urban planning. One potential application highlighted by [35] is predicting the availability of bikes at different stations throughout the day. By analyzing the transition probabilities between states, Chen et al. [33] used their proposed model to provide insights into the patterns of bike usage and to help optimize the distribution of bikes across stations. This information can be valuable for bike sharing operators, to ensure a sufficient supply of bikes at popular locations and avoid overcrowding or shortages. Another application showcased in [41] is evaluating the impact of different policies or interventions on the system. By simulating the MC model in various scenarios, decision makers can assess the effectiveness of strategies, such as adding new stations, adjusting pricing schemes, or implementing incentives to encourage bike usage. Additionally, the model was used in [42] to analyze the spatial and temporal dynamics of BSSs, identifying hotspots in demand and understanding the factors influencing bike usage patterns. This knowledge can inform the planning of infrastructure, such as the placement of new stations or the design of bike lanes, to enhance the accessibility and usability of the system. Overall, the application of MC models in BSSs provides a powerful tool for data-driven decision making and sustainable urban mobility planning.
A model that can help forecast bike usage from weather data was explored in [43], with temperature and precipitation identified as the key factors: high temperatures and little or no precipitation result in increased bike usage. Chen et al. [44] proposed a dynamic cluster-based framework for predicting over-demand in BSSs. Compared to the work of [43], they considered common and opportunistic contextual factors and modeled the relationships among bike stations using a weighted correlation network. They also introduced a geographically constrained clustering method. Based on data collected from NYC Citi Bike and DC Capital Bikeshare, their framework improved the accuracy of predicting over-demand occurrences by considering dynamic contexts and spatial relationships among stations. Article [45], in turn, discussed the importance of understanding the dynamics among stations in a BSS. The user’s satisfaction level is a global indicator of system performance, but it does not capture the variations among stations. The availability rate, on the other hand, reflects the status of available bicycles in each station, which can vary with geographical location. Article [45] emphasized the need for a comprehensive analysis of the system to determine its dynamic behaviors, which is crucial for system control or redesign. In contrast with articles [43,44], the current work’s clustering strategy uses the bike stations’ loads, which are directly related to the rebalancing needs. Moreover, compared to the approach in [45], the current work aimed at capturing the memoryless behavior of the bike stations’ loads; Markov Chains were specifically chosen to model them because they are well suited to capturing the dynamic behavior of the system and the transitions between its states.
In the context of rebalancing, the research in [46] focused on reducing operation costs by considering depot location. The study utilized K-means and WK-means algorithms to determine the optimal number and location of depots. The spatio-temporal patterns of bicycle sharing systems were also taken into account, providing valuable insights for rebalancing management [47,48]. Study [47] aimed to identify spatio-temporal patterns in bike sharing station usage, using a model based on Functional Data Analysis (FDA). They developed a novel bi-clustering algorithm for functional data, which grouped stations and days with similar behavior. The analysis revealed distinct usage patterns based on city area and day of the week, providing valuable insights for fleet managers to effectively manage the service. The model presented in [17] provides a unique mathematical formulation for bike station clusters, which utilizes user bike flows from operational data in BSSs to optimize rebalancing planning. Unlike traditional models that focus on factors like travel distance or operational costs, this model incorporates user trip information for clustering. To efficiently solve the long-distance rebalancing problem, the researchers proposed a modified fast unfolding algorithm based on network theory.

3. Methodology

3.1. Problem Description

A BSS consists of bike stations equipped with docks where bikes are stored. Clients take bikes for their travel and then return them to another bike station at a dock. To ensure bike availability, trucks are used to rebalance the stations by moving bikes between them. The modeling which encompasses these activities is illustrated in Figure 1:
State $BS_i$ signifies that the bike is inside a bike station and that it is connected only to itself, to state $Ri_i$, or to state $Re_i$ (black solid connections). State $Ri_i$ indicates that the bike is involved in a ride performed by a client and that it is connected to itself and to every state $BS$ (red dashed connections). State $Re_i$ represents that the bike is being transported by a rebalancing vehicle and that it is connected to itself and to every state $BS$ (green dashed connections) except the corresponding state $BS_i$. This exception reflects the uselessness of a rebalancing move in which a bike is taken from a bike station only to be returned to the same bike station.
Each client undocks a bike needed for travel from, and later docks it at, a station of the set $A = \{B_1, B_2, \ldots, B_M\}$ of the BSS. A load value, defined as the ratio between the number of available bikes ($k$) and the station capacity ($C$, the number of docks), is assigned to each bike station $B$:
$$\text{load} = L(B) = \frac{k}{C} \tag{1}$$
Obviously, $0 \le k \le C \Rightarrow 0 \le L(B) \le 1$.
A bike station state $\xi(B)$ is considered “critical” when its load is close to either 0 or 1, and a station is considered “highly dynamic” when it reaches the “critical” state with a frequency much higher than the average. These two properties are formulated in Equations (2) and (3), respectively:
$$\xi(B) \text{ is CRITICAL} \iff L(B) \approx 0 \;\text{ OR }\; L(B) \approx 1, \tag{2}$$
$$B \text{ is HIGHLY DYNAMIC} \iff f(B) \gg \bar{f}, \tag{3}$$
where $f(B)$ is the frequency with which $B$ reaches the critical state and $\bar{f}$ is the average frequency of reaching the critical state over the set of bike stations $A$.
Highly dynamic bike stations are those that most require rebalancing and, thus, they are the main contributors to the rebalancing strategy (e.g., to the design of the rebalancing fleet). They can therefore be isolated from the set $A$ in the form of the subset $\dot{A}$:
$$\dot{A} \subseteq A; \quad \dot{A} = \{B \in A \mid B \text{ is HIGHLY DYNAMIC}\} \tag{4}$$
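Equations (1)–(4) can be illustrated with a short Python sketch that selects highly dynamic stations from historical bike counts. The text leaves several quantities qualitative, so the sketch rests on assumptions of its own: “close to 0 or 1” is read as a load within a hypothetical tolerance `eps` of either bound, $f(B)$ is taken as the share of observed snapshots in which $B$ is critical, and “much higher than the average” is modeled by a hypothetical multiplicative `factor`. The function names are illustrative only.

```python
from statistics import mean

def load(k: int, capacity: int) -> float:
    """Equation (1): L(B) = k / C for a station with k bikes and C docks."""
    if capacity <= 0 or not 0 <= k <= capacity:
        raise ValueError("expected C > 0 and 0 <= k <= C")
    return k / capacity

def is_critical(k: int, capacity: int, eps: float = 0.1) -> bool:
    """Equation (2): load close to 0 or 1; 'close' is the hypothetical tolerance eps."""
    value = load(k, capacity)
    return value <= eps or value >= 1.0 - eps

def critical_frequency(counts: list[int], capacity: int, eps: float = 0.1) -> float:
    """Assumed definition of f(B): share of bike-count snapshots in which B is critical."""
    return sum(is_critical(k, capacity, eps) for k in counts) / len(counts)

def highly_dynamic(history: dict[str, tuple[list[int], int]],
                   factor: float = 2.0, eps: float = 0.1) -> set[str]:
    """Equations (3)-(4): keep stations with f(B) >= factor * f_bar (factor is hypothetical).

    history maps a station name to (list of observed bike counts, capacity).
    """
    freqs = {name: critical_frequency(counts, cap, eps)
             for name, (counts, cap) in history.items()}
    f_bar = mean(freqs.values())
    if f_bar == 0.0:  # no station was ever critical, so none is highly dynamic
        return set()
    return {name for name, f in freqs.items() if f >= factor * f_bar}
```

For example, among 20-dock stations, one whose counts alternate between empty and full is critical in every snapshot and is selected, while stations hovering around half load are not.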
For large-scale BSSs (such as Citi Bike, in our case), optimizing a rebalancing strategy for the whole BSS is impractical, and cluster-based strategies are preferred instead [49,50,51,52].
According to [49], geographic location alone is not sufficient to obtain an effective clustering method; usage patterns are also required.
If we select a cluster $\ddot{A} \subseteq \dot{A}$, which is a BSS subsystem (referred to as the “subsystem” from now on), the problem addressed here is the development of a model that predicts the behavior of $\ddot{A}$. The data used for identification and validation were collected from the Citi Bike system over time intervals spanning several months.
An abstract grid of bike stations comprising the whole system and the clustering is shown in Figure 2, where black squares correspond to highly dynamic bike stations, and the others are illustrated by gray squares.
As can be seen in Figure 2, four entities are specified:
  • Bike stations: the elements of each of the sets below;
  • BSS ($A$): the set that contains all bike stations;
  • Special subset ($\dot{A}$): the set that contains only the highly dynamic bike stations;
  • Subsystem ($\ddot{A}$): a region of the special subset that is manageable for the rebalancing operations.
Their individual relevance to the study is as follows: BSS, represented by Citi Bike New York, provides a practical context for the application; the special subset holds the criterion of selecting highly dynamic stations, defined in Equation (3); the subsystem is derived from the clustering activity, which is justified directly after introducing Equation (4), and it is the focus of the present study; the bike stations hold the inherent properties that will be used as features for modeling purposes.
The notations are summarized in Table 1.

3.2. A Feature Descriptor for the Subsystem

To derive a behavioral model of the subsystem, a time-dependent state vector $X(t)$ is assigned to $\ddot{A}$:
$$X(\ddot{A}) = X(t) = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}, \tag{5}$$
where $m < M$ is the number of bike stations contained in the subsystem and $x_i$ is a function attached to station $B_i$, defined as follows:
$$x_i = x(B_i) = \begin{cases} 1, & B_i \text{ is CRITICAL} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
The overall criticality of the subsystem is then extracted from the state vector, to serve as input for the states of the model proposed. The overall criticality is defined as the percentage of bike stations from the subsystem that are in a critical state, formulated as below:
$$\bar{x} = \bar{x}(\ddot{A}) = \frac{\sum_{i=1}^{m} x_i}{m} \quad (\cdot 100\%) \tag{7}$$
Figure 3 summarizes, in the form of a flowchart, the method of computing the overall criticality ($\bar{x}$) from the number of available bikes ($k_i$) and the capacity ($C_i$) of all the bike stations ($B_i$) of a subsystem comprising $m$ bike stations, $1 \le i \le m$; the flowchart incorporates the concepts from Equations (1), (2) and (5)–(7).
The flowchart from Figure 3 is organized in three columns, from left to right: variables initialization, computation loop, and final computation of the overall criticality:
  • In the first column, $\mathit{sum}$ is the variable that will finally hold the value of the numerator of the fraction in Equation (7), and $i$ corresponds to the index $i$ of the sum in Equation (7).
  • In the second column:
    - Based on Equation (5), a loop iterates over $i = 1, \ldots, m$.
    - In each iteration, the load of station $B_i$ of the subsystem is first computed in accordance with Equation (1).
    - Then, the computed load is used in the critical-load decision specified by Equation (2).
    - Finally, $\mathit{sum}$ is incremented if a new critical load is found, as stated by Equation (6).
  • In the third column, the overall criticality formula from Equation (7) is applied, with $\mathit{sum}$ already computed and $m$ representing the number of bike stations in the subsystem.
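The three columns of the flowchart can be transcribed almost line for line into Python. The sketch below is a hypothetical transcription, not the authors’ implementation; because Equation (2) is qualitative, it assumes a tolerance `eps` (hypothetical) for deciding when a load is close to 0 or 1.

```python
def overall_criticality(bikes: list[int], capacities: list[int], eps: float = 0.1) -> float:
    """Overall criticality x_bar of a subsystem, in percent (Figure 3, Equation (7)).

    bikes[i] and capacities[i] hold k_i and C_i of station B_i; eps is a
    hypothetical tolerance used to interpret 'close to 0 or 1' in Equation (2).
    """
    # Column 1: initialization. The flowchart's `sum` is renamed to avoid
    # shadowing Python's built-in sum().
    if not bikes or len(bikes) != len(capacities):
        raise ValueError("need one capacity per station and at least one station")
    m = len(bikes)
    critical_count = 0
    # Column 2: loop over i = 1..m
    for k_i, c_i in zip(bikes, capacities):
        load_i = k_i / c_i                          # Equation (1)
        if load_i <= eps or load_i >= 1.0 - eps:    # Equation (2)
            critical_count += 1                     # x_i = 1, Equation (6)
    # Column 3: Equation (7), expressed as a percentage
    return 100.0 * critical_count / m
```

Because the result is a ratio, it lies in $[0\%, 100\%]$ whatever the size $m$ of the subsystem, although its resolution ($100\%/m$) still depends on $m$.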

3.3. A Markov Chain for the Subsystem

Before defining the MC for the subsystem, we must address two issues: the state explosion problem and the interdependence between states and the number of bike stations. Both issues are resolved by a quantization strategy, described as follows.
Until now, we have associated with the subsystem state vector a specific value from the domain $[0\%, 100\%]$. The resolution of this measurement depends on $m$ (the number of bike stations in Equation (7)). The states of the MC must be independent of the number of bike stations; otherwise, subsystem scalability is not ensured (i.e., if $m$ increases, the number of states should remain constant; otherwise, the state explosion problem appears).
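Therefore, a quantization is performed on the value range $[0\%, 100\%]$ of $\bar{x}$, as follows:
$$[0\%, 100\%] = \underbrace{[0\%, \upsilon_1)}_{S_1} \cup \underbrace{[\upsilon_1, \upsilon_2)}_{S_2} \cup \cdots \cup \underbrace{[\upsilon_{N-1}, 100\%]}_{S_N} \tag{8}$$
where $S_1, S_2, \ldots, S_N$ represent the states of the MC and $\upsilon_1, \upsilon_2, \ldots, \upsilon_{N-1}$ are constants ($0\% < \upsilon_1 < \upsilon_2 < \cdots < \upsilon_{N-1} < 100\%$).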
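Equation (8) amounts to a lookup in a sorted list of thresholds. In the sketch below, the threshold values are hypothetical placeholders, since the text does not fix them at this point; whatever their values, the number of states stays $N$ regardless of how many stations $m$ the subsystem contains, which is the scalability property argued above.

```python
from bisect import bisect_right

def state_index(x_bar: float, thresholds: list[float]) -> int:
    """Return the index n of the state S_n that contains x_bar (Equation (8)).

    thresholds = [v_1, ..., v_{N-1}] in percent, strictly increasing inside (0, 100).
    bisect_right counts the thresholds <= x_bar, so a value equal to v_i falls
    into the half-open interval [v_i, v_{i+1}), i.e., state S_{i+1}.
    """
    if not 0.0 <= x_bar <= 100.0:
        raise ValueError("x_bar must lie in [0, 100]")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    if thresholds and not (0.0 < thresholds[0] and thresholds[-1] < 100.0):
        raise ValueError("thresholds must lie strictly between 0 and 100")
    return bisect_right(thresholds, x_bar) + 1
```

With the hypothetical thresholds `[25.0, 50.0, 75.0]` ($N = 4$), a subsystem in which 11 of 19 stations are critical ($\bar{x} \approx 57.9\%$) maps to $S_3$, and $\bar{x} = 100\%$ maps to $S_4$ because the last interval is closed.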
At any given state ($S_i$), a subsystem can, in theory, transition to any other state ($S_j$), as can be seen in the state evolution example of Figure 4 or in the general state transition diagram of Figure 5.
In the example of Figure 4, each snapshot of the subsystem is drawn as a set of small circles, with white or dark fill, enclosed by a large dashed circle whose fill color matches the state it belongs to. For example, the left-most snapshot contains eleven dark and eight white small circles, enclosed by a green-filled circle. Each small circle represents a bike station: dark if it has a critical load, white otherwise. In this example, the subsystem therefore contains nineteen bike stations. Above the snapshots, a graph plots time on the horizontal axis and the overall criticality ($\bar{x}$) on the vertical axis; it reflects all variables of Equation (8) and their relationships. Each measurement in the graph, marked with ×, indicates that the overall criticality $\bar{x}$ of the subsystem at time $t_i$ has the value $\bar{x}_i$, which places the subsystem in one of the states $S_1$, $S_2$, $S_3$, or $S_4$. Each value $\bar{x}_i$ is computed by dividing the number of dark small circles by the total number of small circles inside the large circle; for the left-most snapshot, $\bar{x} = 11/19 \approx 57.9\%$.
The state $S_i$ of the MC is defined by the ratio of bike stations in critical load, and the state changes when this ratio leaves its current interval. Each state can therefore be characterized by the range $[\upsilon_{i-1}, \upsilon_i) \equiv S_i$ of the ratio of bike stations in critical load, with $\upsilon_0 = 0\%$ and $\upsilon_N = 100\%$ (the last interval, $S_N$, being closed at $100\%$).
The timing associated with the state transition is expressed by a random variable X with a specified distribution function. The exponential distribution function is expressed by the following relation:
$$F(x) = P(X \le x) = 1 - e^{-\lambda x}. \tag{9}$$
If $t_{ij}$ represents the time interval for which a subsystem remains in state $S_i$ until it reaches state $S_j$, expressed by an exponentially distributed random variable with mean value $\gamma_{ij}$, then the execution frequency of the transition is $\lambda_{ij} = 1/\gamma_{ij}$, which will from now on be called the “transition rate”.
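Following this definition literally, $\lambda_{ij}$ can be estimated from an observed state sequence as the reciprocal of the mean time spent in $S_i$ before the next jump to $S_j$. The sketch below assumes time-ordered samples with strictly increasing timestamps and integer state indices; it is an illustration, not the identification procedure used in the study. A common alternative in continuous-time Markov chain estimation divides the number of observed $S_i \to S_j$ jumps by the total time spent in $S_i$; the two estimates generally differ, and the sketch simply follows the definition given in the text.

```python
from collections import defaultdict
from statistics import mean

def estimate_rates(samples: list[tuple[float, int]]) -> dict[tuple[int, int], float]:
    """Estimate lambda_ij = 1 / gamma_ij from time-ordered (time, state) samples.

    gamma_ij is the mean observed time t_ij spent in S_i before the chain was
    next seen in S_j. The first sojourn is measured from the first sample, so
    it may be shorter than the true sojourn (left censoring).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    sojourns: dict[tuple[int, int], list[float]] = defaultdict(list)
    entry_time, current = samples[0]
    for t, s in samples[1:]:
        if s != current:  # a transition S_current -> S_s was observed at time t
            sojourns[(current, s)].append(t - entry_time)
            entry_time, current = t, s
    return {pair: 1.0 / mean(times) for pair, times in sojourns.items()}
```

Transitions that are never observed simply do not appear in the result, which matches the assumption below that some transitions never occur in real subsystems.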
Thus, from $S_i$ to $S_j$, the transition probability is described by the exponential distribution function of Equation (9), with rate $\lambda_{ij}$ and mean value $\gamma_{ij}$. This probability depends only on the current state and not on the sequence of previously visited states (the Markov property).
In Figure 5, $S_1, \ldots, S_N$ are the states of the MC, where $N$ is the number of states, and $\lambda_{ij}$ is the execution frequency of the transition from state $S_i$ to state $S_j$. This general diagram acts as an abstract template whose properties are inherited by concrete diagrams: each identified subsystem yields its own concrete case, and two examples are presented in Section 4.
The state transition matrix Q of the general MC will then be as follows:
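$$
Q = \begin{bmatrix}
\lambda_{11} & \boldsymbol{\lambda_{12}} & \lambda_{13} & \cdots & \lambda_{1(N-1)} & \lambda_{1N} \\
\boldsymbol{\lambda_{21}} & \lambda_{22} & \boldsymbol{\lambda_{23}} & \cdots & \lambda_{2(N-1)} & \lambda_{2N} \\
\lambda_{31} & \boldsymbol{\lambda_{32}} & \lambda_{33} & \cdots & \lambda_{3(N-1)} & \lambda_{3N} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\lambda_{(N-1)1} & \lambda_{(N-1)2} & \lambda_{(N-1)3} & \cdots & \lambda_{(N-1)(N-1)} & \boldsymbol{\lambda_{(N-1)N}} \\
\lambda_{N1} & \lambda_{N2} & \lambda_{N3} & \cdots & \boldsymbol{\lambda_{N(N-1)}} & \lambda_{NN}
\end{bmatrix} \tag{10}
$$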
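The abstract states that fleet sizing relies on the identified subsystem steady state. The sketch below shows one standard way to obtain a steady state from estimated rates, under an assumption the text does not spell out: $Q$ is treated as a continuous-time generator, so the diagonal entries $\lambda_{ii}$ of Equation (10) are not used and are replaced by minus the sum of the other rates in their row, and the steady-state distribution $\pi$ solves $\pi Q = 0$ with the entries of $\pi$ summing to 1. NumPy is assumed to be available, and the function names are hypothetical.

```python
import numpy as np

def generator(rates: dict[tuple[int, int], float], n_states: int) -> np.ndarray:
    """Build an N x N generator from off-diagonal rates lambda_ij (states numbered 1..N).

    Assumption: each diagonal entry is minus its row sum (standard CTMC convention).
    """
    Q = np.zeros((n_states, n_states))
    for (i, j), lam in rates.items():
        if not (1 <= i <= n_states and 1 <= j <= n_states):
            raise ValueError(f"state index out of range: {(i, j)}")
        if i != j:
            Q[i - 1, j - 1] = lam
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows now sum to zero
    return Q

def steady_state(Q: np.ndarray) -> np.ndarray:
    """Solve pi Q = 0 subject to sum(pi) = 1 (least squares on the stacked system)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi
```

For a chain that only moves between adjacent states with rates $\lambda_{12} = 2$, $\lambda_{21} = 1$, $\lambda_{23} = 1$, and $\lambda_{32} = 2$, the steady state is $\pi = (0.25, 0.5, 0.25)$. If some states are unreachable (the chain is not irreducible), the steady state is not unique and the least-squares solution is only one of many.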
A general statement that applies to historical data collected from Citi Bike is as follows: The rate of transition between states whose percentage ranges are adjacent (rates λ highlighted with bold text in Equation (10)) is much higher compared to the rest, formalized in the equation below:
$$\lambda_{ia} \gg \lambda_{ib}, \quad \forall\, a, b, i \in \{1, \ldots, N\} \text{ for which } |i - b| > |i - a| = 1 \tag{11}$$
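Because Equation (11) is stated qualitatively, checking it on estimated rates requires choosing what “≫” means. The sketch below uses a hypothetical minimum ratio between the smallest adjacent rate and the largest distant rate of each state; rates are passed as a mapping from $(i, j)$ to $\lambda_{ij}$, for instance obtained as $\lambda_{ij} = 1/\gamma_{ij}$.

```python
def adjacent_dominates(rates: dict[tuple[int, int], float], n_states: int,
                       ratio: float = 10.0) -> bool:
    """Check Equation (11) on estimated rates lambda_ij (states numbered 1..N).

    For every state i, each rate to an adjacent state (|i - a| = 1) must be at
    least `ratio` times every rate to a distant state (|i - b| > 1). `ratio` is
    a hypothetical stand-in for '>>'. Unobserved transitions count as rate 0.
    """
    for i in range(1, n_states + 1):
        adjacent = [rates.get((i, a), 0.0) for a in (i - 1, i + 1) if 1 <= a <= n_states]
        distant = [rates.get((i, b), 0.0) for b in range(1, n_states + 1) if abs(i - b) > 1]
        # Only states that have both adjacent and distant neighbors can violate (11).
        if adjacent and distant and min(adjacent) < ratio * max(distant):
            return False
    return True
```

Comparing the minimum adjacent rate with the maximum distant rate encodes the “for all $a$, $b$” quantifier of Equation (11) in a single inequality per state.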
The MC was constructed with the following assumptions:
  • Once a truck has left a bike station, it cannot return to the same station unless it has visited other bike stations in between, as already illustrated in Figure 1;
  • Bikes are not exchanged between trucks that meet during their journeys;
  • When observing real subsystems during the time interval defined for model identification, the transitions between some states never occur;
  • The transition probability between states that correspond to neighboring ranges of critical load ratios is much higher than between distant ranges, as already described in Equation (11).

3.4. Subsystem Identification

The region to be selected has two dimensions: geographical and temporal. The geographical dimension specifies which stations belong to the subsystem, while the temporal dimension determines which time intervals ($\Delta t$) are used for model identification.
The predictability of a subsystem also depends on the predictability of its underlying bike stations. From the point of view of predictability, similar stations are better suited to belong to the same subsystem. The similarity score from [53] is therefore used for the geographical selection of the bike stations.
In the remainder of this section, the temporal-selection methodology is elaborated.
As presented in Equation (6), the criticality value over time is the indicator for each bike station used to derive a specific behavior.
According to historical traffic data collected from Citi Bike, each subsystem exhibits variation in its behavior, intermittently following different patterns. In some situations, the subsystem seems to conform to a certain stable pattern, while at other times it adopts a completely different model [53]. This fact is expressed in Equation (12):
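$$T = \begin{Bmatrix}
\Delta t_{11} & \Delta t_{12} & \cdots & \Delta t_{1n_1} \\
\Delta t_{21} & \Delta t_{22} & \cdots & \Delta t_{2n_2} \\
\vdots & \vdots & & \vdots \\
\Delta t_{g1} & \Delta t_{g2} & \cdots & \Delta t_{gn_g}
\end{Bmatrix}
\begin{matrix} \mapsto \Psi_1 \\ \mapsto \Psi_2 \\ \vdots \\ \mapsto \Psi_g \end{matrix} \tag{12}$$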
The recorded lifetime $T$ of a subsystem is thus partitioned into time intervals, and each time interval $\Delta t$ is associated with a behavior model $\Psi$, where $g$ is the total number of observed models and $n_i$ is the number of time intervals in which the subsystem conforms to model $\Psi_i$. In general, $\Delta t_{ij}$ and $\Delta t_{i(j+1)}$ are not consecutive time intervals, for $i = 1, \ldots, g$ and $j = 1, \ldots, n_i$.
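The partition of Equation (12) is essentially a grouping of labeled time intervals by behavior model. A minimal Python sketch, assuming each interval is represented as a hypothetical `(start, end)` tuple already labeled with its model, could look like this:

```python
from collections import defaultdict


def group_by_model(labelled_intervals):
    """Arrange (interval, model) pairs into the rows of Equation (12).

    labelled_intervals: iterable of ((start, end), model_label) pairs covering T.
    Returns a dict model_label -> chronologically sorted list of intervals, so
    the number of keys is g and each list has length n_i.
    """
    rows = defaultdict(list)
    for interval, model in labelled_intervals:
        rows[model].append(interval)
    return {model: sorted(intervals) for model, intervals in rows.items()}
```

For instance, the labeled intervals `((0, 2), "A")`, `((2, 5), "B")`, `((5, 6), "A")` yield two rows, and the two intervals of model `"A"` are not consecutive, as noted above.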
For bike relocation purposes, it is not necessary to provide a behavior model covering the whole lifetime of the subsystem. Separate models can be offered corresponding to a set of partitions of time.
However, it is important to have a fixed reference interval each day for which the same model applies, so that the station rebalancing operators can prepare for it. This association (reference interval ↦ model) is formulated in Equation (13):
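$$\Delta t_{REF} \mapsto \Psi_{\hat{i}} \iff \frac{\left|(\Delta t_{\hat{i}j} \cup \Delta t_{REF}) \setminus (\Delta t_{REF} \cap \Delta t_{\hat{i}j})\right|}{|\Delta t_{REF}|} \le \epsilon_t, \quad \hat{i} = \text{constant}, \ \forall j \text{ when } \Delta t_{REF} \cap \Delta t_{\hat{i}j} \ne \emptyset, \tag{13}$$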
where $\Delta t_{REF}$ corresponds to the Controlled Time Interval (CTI); $\Psi_{\hat{i}}$ is the model attached to it; $\Delta t_{\hat{i}j}$ is any time interval overlapping with $\Delta t_{REF}$ in which the bike station conforms to model $\Psi_{\hat{i}}$; and $\epsilon_t$ is a tolerance threshold (set to 3% in this paper).
A visualization of $\epsilon_t$ is provided in Figure 6.
One remark on Equation (13): the behavior mapped to the CTI may also manifest outside the CTI (as exemplified in Figure 7), as long as it does not do so consistently; if it does, either the CTI or the behavior model must be adapted.
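The CTI condition can be sketched in Python, assuming (our reading of Equation (13)) that the mismatch is the length of the symmetric difference between $\Delta t_{\hat{i}j}$ and $\Delta t_{REF}$, normalized by $|\Delta t_{REF}|$. Intervals are hypothetical `(start, end)` tuples in hours of the day, and the function names are illustrative.

```python
def interval_mismatch(ref, interval):
    """Length of the symmetric difference of two intervals, relative to |ref|.

    ref, interval: (start, end) tuples with start < end (e.g. hours of the day).
    """
    (r_start, r_end), (s, e) = ref, interval
    overlap = max(0.0, min(r_end, e) - max(r_start, s))
    sym_diff = (r_end - r_start) + (e - s) - 2.0 * overlap
    return sym_diff / (r_end - r_start)


def cti_holds(ref, model_intervals, eps_t=0.03):
    """True if every interval of the model that overlaps ref deviates by <= eps_t."""
    overlapping = [
        iv for iv in model_intervals
        if min(ref[1], iv[1]) > max(ref[0], iv[0])
    ]
    return all(interval_mismatch(ref, iv) <= eps_t for iv in overlapping)
```

With a 07:00–10:00 CTI, an occurrence of the model from 07:00 to 10:03 deviates by about 1.7% and is accepted, whereas one from 08:00 to 10:00 deviates by one third and violates the 3% tolerance. Intervals that do not overlap the CTI are ignored, in line with the remark above.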
In addition to the reference time interval associated with the model identified for a subsystem ($\Delta t_{REF} \mapsto \Psi_{\hat{i}}$), a set of probability distribution functions is associated with the state transitions ($\{F\} \mapsto \Psi_{\hat{i}}$). Each function in $\{F\}$ is identified via model fitting (least-squares minimization) over the cumulative histogram of state transition events, as visually represented in Figure 8.
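A least-squares fit of the exponential distribution function to the cumulative histogram can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the empirical CDF uses $(k + 0.5)/n$ plotting positions, and a golden-section search (our choice of optimizer) minimizes the squared error over $\lambda$.

```python
import math


def ecdf_points(samples):
    """Cumulative histogram as (x, F_emp(x)) pairs, using (k + 0.5) / n positions."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (k + 0.5) / n) for k, x in enumerate(xs)]


def squared_error(rate, points):
    """Sum of squared residuals between the empirical CDF and 1 - exp(-rate * x)."""
    return sum((y - (1.0 - math.exp(-rate * x))) ** 2 for x, y in points)


def fit_rate_lsq(samples, lo=1e-6, hi=100.0, iters=200):
    """Least-squares estimate of lambda via golden-section search on [lo, hi].

    Assumes the squared error is unimodal in lambda on the search interval.
    """
    points = ecdf_points(samples)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if squared_error(c, points) < squared_error(d, points):
            b, d = d, c                 # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d                 # minimum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2.0
```

A general-purpose routine such as a nonlinear curve fitter from a numerical library could replace the hand-written search; the one-parameter case is small enough that a bracketing search keeps the sketch dependency-free.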

3.5. Rebalancing Strategy for Model Validation

In order to provide an example of practical application that can be supported by the proposed model, the rebalancing method from [7] was chosen. There, the rebalancing problem was decomposed into two smaller problems (bike station prioritization, station-to-rebalancer assignment) that were solved individually as follows: the bike station prioritization was provided by a fuzzy logic-controlled genetic algorithm; the assignment between the stations and rebalancing agents (trucks) was obtained by applying an inference mechanism. Real data from Citi Bike New York BSSs were used for the validation of the solution.
An example of results for the bike station prioritization and assignment to trucks can be seen in Table 2:
The presented results have the corresponding graphic representation in Figure 9:
In Figure 9, the location labeled “0” is the depot of the trucks, and the correspondence (color ↔ truck index) is as follows: blue ↔ 1, green ↔ 2, red ↔ 3. The locations labeled “1”,“2”,…,“9” correspond to bike stations, as mentioned in Table 2.

Reasoning for the Attached Model

Although Ref. [7] features a comparative performance analysis using different numbers of trucks (2, 3, …, 10) in order to prove effectiveness, the method still relies on an external decision for the final number of trucks allocated to each subsystem.
Moreover, when rebalancing is considered, it must be taken into account that, even though all stations in the subsystem are highly dynamic, most of the time not all of them are simultaneously critical, but only a percentage of them (<100%); i.e., it is not the number of bike stations ($m$ from Equation (5)) but the overall criticality (defined in Equation (7)) of a subsystem that matters when choosing the number of rebalancing trucks.
Based on the considerations above, the degree of subsystem demand (for rebalancing) is defined in the equation below:
$$D = \bar{x} \cdot m \tag{14}$$
The relationship between optimal fleet size and different subsystem demands can also be visualized in Figure 10.
For exemplification purposes, Figure 10 draws the actual rebalancing cost based on the available results from [7]: the continuous black line corresponds to the maximum subsystem demand ($D = m = 50$, marked on top of the line), while dashed lines are used for other demands ($D = 10, 15, 20, \ldots, 45$, marked on top of the lines). The red point corresponds to the optimal number of trucks, as a trade-off between the initial fleet acquisition cost and the rebalancing cost; the blue line is the scaled truck acquisition cost, which grows linearly with the number of trucks involved, since each additional truck represents a fixed cost. The scaling of the truck acquisition cost is formulated below:
$$\mathrm{FK}(\mathrm{Tr}) = \alpha \cdot K_t \cdot \mathrm{Tr}, \tag{15}$$
where $\mathrm{FK}: \mathbb{N} \to \mathbb{R}^{+}$ is the scaled fleet cost; $\mathrm{Tr} \in \mathbb{N}$ is the number of trucks allocated to the subsystem; $\alpha \in \mathbb{R}^{+}$ is a constant scaling factor; and $K_t \in \mathbb{R}^{+}$ is the fixed cost of acquiring one truck.
The scaling factor $\alpha$ is selected so that the optimality criterion from Equation (16) is met:
$$\hat{\mathrm{Tr}} \text{ is OPTIMAL} \iff \mathrm{FK}(\hat{\mathrm{Tr}}) \approx \mathrm{CK}(\hat{\mathrm{Tr}}), \tag{16}$$
where $\hat{\mathrm{Tr}}$ is the optimal number of trucks allocated to the subsystem; $\mathrm{FK}: \mathbb{N} \to \mathbb{R}^{+}$ is the scaled fleet cost; and $\mathrm{CK}: \mathbb{N} \to \mathbb{R}^{+}$ is the rebalancing cost. Approximation (≈) is used instead of equality (=) because it is impossible to allocate fractions of trucks, so the closest natural number is used instead (for example, in Figure 10, $\hat{\mathrm{Tr}} = 7$ or $8$).
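Equation (16) amounts to choosing the integer fleet size at which the scaled fleet cost and the rebalancing cost are closest. A minimal Python sketch, assuming the rebalancing cost $\mathrm{CK}$ is available as a table indexed by truck count (the cost values in the usage note are hypothetical, not taken from [7]):

```python
def scaled_fleet_cost(trucks, alpha, k_t):
    """Equation (15): FK(Tr) = alpha * K_t * Tr."""
    return alpha * k_t * trucks


def optimal_fleet_size(rebalancing_cost, alpha, k_t):
    """Pick the integer Tr for which FK(Tr) is closest to CK(Tr), Equation (16).

    rebalancing_cost: dict mapping a truck count Tr to its rebalancing cost CK(Tr).
    """
    return min(
        rebalancing_cost,
        key=lambda tr: abs(scaled_fleet_cost(tr, alpha, k_t) - rebalancing_cost[tr]),
    )
```

For a hypothetical decreasing cost curve $\mathrm{CK}(\mathrm{Tr}) = 1000/\mathrm{Tr}$ over 2 to 10 trucks and $\alpha \cdot K_t = 20$, the two curves cross near $\mathrm{Tr} \approx 7.07$, so the sketch returns 7.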
Equation (7) shows that the overall criticality $\bar{x}$ changes over time and, based on Equation (14), the subsystem demand $D$ therefore also changes over time. The optimal number $\hat{\mathrm{Tr}}$ of trucks also depends on $D$. Because the subsystem to be modeled belongs to a real-world BSS, another constraint is that $\hat{\mathrm{Tr}}$ should ideally be selected once, in the initial phase of the subsystem. In order to determine a time-invariant demand $\hat{D}$ of the subsystem, the steady state probabilities of the subsystem are used. The steady state probability vector $P^*$ is defined in the equation below:
$$P^* = \begin{bmatrix} P_1^* & P_2^* & \cdots & P_N^* \end{bmatrix}, \tag{17}$$
where $N$ is the number of individual states of the subsystem, and $P_i^*$ is the probability for the subsystem to be in state $S_i$, which can be determined by solving the following system of equations:
$$\begin{cases} P^* Q = 0 \\ \sum_{i=1}^{N} P_i^* = 1 \end{cases} \tag{18}$$
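The system in Equation (18) can be solved numerically with NumPy. This sketch rests on two assumptions that are ours: the diagonal of $Q$ is set to minus the sum of the off-diagonal rates in each row (the standard continuous-time generator convention, so the $\lambda_{ii}$ entries are ignored), and the chain is irreducible, so that replacing one redundant equation by the normalization constraint yields a nonsingular system.

```python
import numpy as np


def build_generator(rates):
    """Off-diagonal entries are lambda_ij; diagonal set so each row sums to zero."""
    q = np.array(rates, dtype=float)
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


def steady_state(rates):
    """Solve P* Q = 0 with sum(P*) = 1 for an irreducible chain."""
    q = build_generator(rates)
    n = q.shape[0]
    a = q.T.copy()        # P* Q = 0  <=>  Q^T (P*)^T = 0
    a[-1, :] = 1.0        # replace one redundant equation by sum(P*) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(a, b)
```

For a two-state chain with $\lambda_{12} = 1$ and $\lambda_{21} = 2$, the sketch gives $P^* = [2/3 \;\; 1/3]$, as expected from balancing the flows between the two states.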
Considering the quantization Equation (8), which states that $S_i$ corresponds to an overall criticality equal to the average of the interval boundaries ($\frac{1}{2}(\upsilon_{i-1} + \upsilon_i)$), the steady demand $\hat{D}$ of the subsystem is calculated as follows:
$$\hat{D} = P^* \, Y \, m, \tag{19}$$
where $Y$ is the vector of average overall criticality values associated with the subsystem states ($S_1, S_2, \ldots, S_N$, in order), defined as follows:
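$$Y = \begin{bmatrix} \overline{\upsilon_{0,1}} \\ \overline{\upsilon_{1,2}} \\ \overline{\upsilon_{2,3}} \\ \vdots \\ \overline{\upsilon_{N-2,N-1}} \\ \overline{\upsilon_{N-1,N}} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} \upsilon_1 \\ \upsilon_1 + \upsilon_2 \\ \upsilon_2 + \upsilon_3 \\ \vdots \\ \upsilon_{N-2} + \upsilon_{N-1} \\ \upsilon_{N-1} + 1 \end{bmatrix} \tag{20}$$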
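Equations (19) and (20) reduce to a weighted average of interval midpoints scaled by the number of stations. A short Python sketch, with hypothetical function names and the inner delimiters $\upsilon_1, \ldots, \upsilon_{N-1}$ given as fractions (with $\upsilon_0 = 0$ and $\upsilon_N = 1$):

```python
def criticality_midpoints(delimiters):
    """Vector Y of Equation (20): midpoints of [0, v1), [v1, v2), ..., [v_{N-1}, 1]."""
    bounds = [0.0] + list(delimiters) + [1.0]
    return [0.5 * (bounds[k] + bounds[k + 1]) for k in range(len(bounds) - 1)]


def steady_demand(p_star, delimiters, m):
    """Equation (19): D_hat = P* . Y * m."""
    y = criticality_midpoints(delimiters)
    if len(y) != len(p_star):
        raise ValueError("P* must have one entry per state")
    return sum(p * yk for p, yk in zip(p_star, y)) * m
```

With the delimiters used in Section 4 (25%, 50%, 75%), $Y = [0.125, 0.375, 0.625, 0.875]^\top$; for $m = 20$ and a hypothetical uniform $P^*$, the steady demand is $\hat{D} = 10$ stations.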

3.6. Practical Evaluation for the Proposed Model

In the case of BSSs, the practical evaluations regarding constraints (e.g., reduction of unworking time, lost customer rate) are related to the on-time arrival of the rebalancing agents [54,55,56,57,58]. Therefore, to provide good performance, the proposed model must have all the values of the state transition rates correlated with the time intervals required by the trucks. This quality-related constraint of the model can be formulated as in the equation below:
$$\hat{\lambda}_{ij} = \frac{1}{\gamma(m_{ij})} = \frac{1}{\Gamma \cdot m_{ij}}, \tag{21}$$
where $\hat{\lambda}_{ij}$ is the ideal transition rate; $\gamma(m_{ij})$ is the maximum time allowed for a truck to travel to $m_{ij}$ bike stations for rebalancing; and $\Gamma$ is a constant equal to the amount of time required to travel to each bike station (as part of the trip formed by the $m_{ij}$ stations). In practice, the ideal transition rate is not always achieved, because the high dynamism of the bike stations, with their unpredictable and ever-changing nature, cannot be captured by the model without a degree of tolerance. Therefore, the acceptable transition rate, which also incorporates a degree of noise, is defined below:
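$$\tilde{\lambda}_{ij} \approx \hat{\lambda}_{ij} \iff \alpha_{LB} \cdot \hat{\lambda}_{ij} \le \tilde{\lambda}_{ij} \le \alpha_{UB} \cdot \hat{\lambda}_{ij}, \quad \forall i, j \in \{1, \ldots, N\},\ i \ne j, \tag{22}$$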
where $\tilde{\lambda}_{ij}$ is the acceptable transition rate from state $S_i$ to $S_j$; $\hat{\lambda}_{ij}$, given by Equation (21), is the ideal transition rate relevant for supporting a truck that serves $m_{ij}$ bike stations; and $\alpha_{LB} = \frac{1}{2}$ and $\alpha_{UB} = 2$ are constant factors applied to $\hat{\lambda}_{ij}$ to obtain the tolerated lower and upper bounds, respectively, for $\tilde{\lambda}_{ij}$.
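Equations (21) and (22) translate directly into a tolerance check. In the following Python sketch (hypothetical function names), $\Gamma$ is expressed in seconds, so the rates are in transitions per second:

```python
def ideal_rate(gamma_s, m_ij):
    """Equation (21): ideal rate 1 / (Gamma * m_ij), in transitions per second."""
    return 1.0 / (gamma_s * m_ij)


def is_acceptable_rate(rate, gamma_s, m_ij, alpha_lb=0.5, alpha_ub=2.0):
    """Equation (22): the rate must lie within [alpha_lb, alpha_ub] * ideal rate."""
    ideal = ideal_rate(gamma_s, m_ij)
    return alpha_lb * ideal <= rate <= alpha_ub * ideal
```

For example, with $\Gamma = 282$ s and $m_{ij} = 10$ stations, the ideal rate is $1/2820\ \text{s}^{-1}$ (about 1.28 transitions per hour), and any identified rate between half and twice that value is accepted.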
For the particular case of Citi Bike, according to [7,59], the truck velocity during rebalancing is expected to reach up to 25.5 km/h, and the average distance between two bike stations to be handled during rebalancing trips is at most 2 km. This leads to the following equality:
$$\Gamma(\text{Citi Bike}) = 4\ \text{min}\ 42\ \text{s}, \tag{23}$$
where $\Gamma(BSS)$ is the constant corresponding to the given BSS, equal to the amount of time required to travel to each bike station during the rebalancing trip.
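Assuming that $\Gamma$ is simply the maximum inter-station distance divided by the truck velocity (our reading of the two figures quoted from [7,59]), the value in Equation (23) follows from:

$$\Gamma(\text{Citi Bike}) = \frac{2\ \text{km}}{25.5\ \text{km/h}} \approx 0.0784\ \text{h} \approx 282.4\ \text{s} \approx 4\ \text{min}\ 42\ \text{s}$$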
The worst-case number $m_{ij}$ of bike stations to be served when transitioning from $S_i$ to $S_j$ is the maximum difference between any number from $(\upsilon_{i-1}, \upsilon_i)$ and any number from $(\upsilon_{j-1}, \upsilon_j)$. This can be formulated as follows:
$$m_{ij} = \max\left(|\upsilon_j - \upsilon_{i-1}|,\ |\upsilon_i - \upsilon_{j-1}|\right) \tag{24}$$
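Equation (24) yields a difference between criticality fractions. The Python sketch below evaluates it and, as an assumption of ours that is not stated in the text, converts the fraction into a station count by multiplying by the subsystem size $m$ and rounding up, so that the worst case is never underestimated. The function names are hypothetical.

```python
import math


def worst_case_fraction(upsilon, i, j):
    """Equation (24) with upsilon = [v_0, v_1, ..., v_N], v_0 = 0, v_N = 1.

    States are 1-indexed as in the text (S_1 ... S_N).
    """
    return max(abs(upsilon[j] - upsilon[i - 1]), abs(upsilon[i] - upsilon[j - 1]))


def worst_case_stations(upsilon, i, j, m):
    """Assumption: convert the criticality fraction to a count out of m stations."""
    return math.ceil(worst_case_fraction(upsilon, i, j) * m)
```

With the four quartile states of Section 4, a transition between adjacent states spans at most half of the criticality range (10 of 20 stations), while the non-adjacent transition from $S_1$ to $S_3$ spans three quarters of it (15 of 20 stations).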
In addition to the maximum arrival-time constraint, defined in Equation (23), another common operational goal for BSSs is the reduction of the unworking time [54]. The unworking time refers to the cumulative duration, during a rebalancing chain of orders, for which a station has a critical load.
During the evaluation, the average (Equation (25)), the worst case (Equation (26)), and the best case (Equation (27)) of all the unworking times are used as performance indicators:
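$$\bar{\omega} = \frac{\sum_{B_i \in B} \sum_{t \in T_I \cap \Delta t_{REF}} \xi_t(B_i)}{m \cdot |T_I|}, \tag{25}$$
$$\omega_{max} = \max_{B_i \in B} \frac{\sum_{t \in T_I \cap \Delta t_{REF}} \xi_t(B_i)}{|T_I|}, \tag{26}$$
$$\omega_{min} = \min_{B_i \in B} \frac{\sum_{t \in T_I \cap \Delta t_{REF}} \xi_t(B_i)}{|T_I|}, \tag{27}$$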
where $\xi_t(B_i)$ is equivalent to the function $\xi(B_i)$ defined in Equation (2), evaluated at moment $t$, which returns 1 when bike station $B_i$ has critical load and 0 otherwise; $\bar{\omega}$, $\omega_{max}$, and $\omega_{min}$ are the average, worst-case, and best-case unworking times, respectively, with lower values corresponding to better performance. The average is computed using the data of all the bike stations ($B_i \in B$) in the analyzed subsystems, over all the days covered by model identification ($|T_I|$ denoting the number of days covered by the time interval $T_I$) for which traffic data are collected during the reference time interval ($\Delta t_{REF}$ from Equation (13)), yielding a daily unworking time that describes a typical bike station. The worst and best cases are calculated in a similar manner, focusing on the bike station with the highest and lowest unworking time, respectively.
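The three unworking-time indicators can be computed from per-station sequences of critical-load flags. In this Python sketch, the input layout and the sampling period that converts flag counts into seconds are our assumptions (the text does not state how $t$ is sampled), and the function name is hypothetical.

```python
def unworking_indicators(xi, num_days, sample_period_s=60.0):
    """Daily unworking-time indicators in the spirit of Equations (25)-(27).

    xi: dict mapping a station id B_i to the 0/1 flags xi_t(B_i) for all sampled
        instants t in T_I that fall inside the reference interval Delta t_REF.
    num_days: |T_I|, the number of days covered by the identification period.
    sample_period_s: assumed sampling period, converting flag counts to seconds.
    Returns (average, worst case, best case) daily unworking time in seconds.
    """
    per_station = {
        station: sum(flags) * sample_period_s / num_days
        for station, flags in xi.items()
    }
    values = list(per_station.values())
    average = sum(values) / len(values)   # overall division by m * |T_I|
    return average, max(values), min(values)
```

For two stations observed over two days with one-minute sampling, where station A is critical in two samples and station B in one, the daily unworking times are 60 s and 30 s, giving an average of 45 s, a worst case of 60 s, and a best case of 30 s.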
Apart from the operational goals, the practical evaluation is also performed with respect to the end-user (bike rider) expectation, by determining the lost customer rate [58] of the rebalanced subsystem.
The lost customer rate is defined as the ratio of customers who have arrived at a bike station and are not able to fulfill their expectation of either renting or returning a bike, because of critical load. It is therefore defined in Equation (28) as follows:
$$\phi = \frac{k_{lost}}{k_{lost} + k_{util}}, \tag{28}$$
where $\phi$ is the lost customer rate, $k_{lost}$ represents the number of bikes that cannot be rented/returned due to critical load at the bike station, and $k_{util}$ represents the number of bikes that are rented/returned successfully. During the evaluation, the average ($\bar{\phi}$), the worst case ($\phi_{max}$), and the best case ($\phi_{min}$), computed in a similar manner to Equations (25), (26), and (27), respectively, are used as performance indicators, with the same interpretation that lower values mean better performance.
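A minimal Python sketch of the lost customer rate and its three indicators follows. Two choices here are assumptions of ours: the average is taken over per-station rates, and a station with no rental or return attempts is counted as $\phi = 0$. The function names are hypothetical.

```python
def lost_customer_rate(k_lost, k_util):
    """Equation (28): phi = k_lost / (k_lost + k_util)."""
    total = k_lost + k_util
    # assumption: a station with no rental/return attempts counts as phi = 0
    return k_lost / total if total > 0 else 0.0


def lost_customer_indicators(counts):
    """counts: dict station -> (k_lost, k_util). Returns (average, worst, best)."""
    rates = [lost_customer_rate(k_lost, k_util) for k_lost, k_util in counts.values()]
    return sum(rates) / len(rates), max(rates), min(rates)
```

For three stations with (lost, successful) counts of (5, 15), (0, 10), and (2, 8), the per-station rates are 0.25, 0, and 0.2, so $\bar{\phi} = 0.15$, $\phi_{max} = 0.25$, and $\phi_{min} = 0$.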
In the context of unworking time and lost customer rate, in order to assess the model's effectiveness, a distinction is made between rebalancing assisted by the model and unassisted rebalancing. Specifically, “assisted” means that, before applying the rebalancing strategy [7], the model is used to determine the cost-effective number of trucks, while “unassisted” means that an arbitrary number of trucks is used, as defined in Equation (29):
$$\tilde{\mathrm{Tr}} = \hat{\mathrm{Tr}} + \epsilon_{\mathrm{Tr}}, \tag{29}$$
where $\tilde{\mathrm{Tr}}$ is the arbitrary number of trucks; $\hat{\mathrm{Tr}}$ is the cost-effective number of trucks (defined in Equation (16)); and $\epsilon_{\mathrm{Tr}}$ is the deviation from the cost-effective number, which has a non-zero integer value.
When $\epsilon_{\mathrm{Tr}} > 0$, the overall unworking time and lost customer rate are, of course, expected to decrease or at least remain the same, but this would imply an excessive acquisition of trucks, which, in turn, would lead to low profitability of the system. Alternatively, to maintain profitability, the extra cost could be passed on to the end users by increasing ride prices, which, in turn, would make BSSs less attractive.
Instead, the validation analysis is more pertinent when $\epsilon_{\mathrm{Tr}} < 0$. However, it is of no practical significance to use values of $\epsilon_{\mathrm{Tr}}$ that are too low, because they result in very low numbers of trucks, which leads to bike stations that are never rebalanced, with obviously long unworking times and high lost customer rates. It is therefore essential to evaluate the model's effectiveness by concentrating the analysis on the minimal achievable benefit it provides, when $\epsilon_{\mathrm{Tr}} = -1$.

3.7. Theoretical Evaluation for the Proposed Model

As mentioned in Section 3.4, $\Delta t_{REF}$ is a daily time slot repeating each day at the same hourly interval. If a few months of the lifetime are used for subsystem identification ($T_I \subset T$), then the same daily time slot $\Delta t_{REF}$ over another period of the same length ($T_E \subset T$, $|T_I| = |T_E|$) is used for evaluation.
Model identification on the same subsystem is performed again for $T_E$, obtaining a second model $\Psi_E$, which is equivalent to the previously identified model ($\Psi_E \equiv \Psi_I$) if and only if the maximum error $\epsilon$ between the associated probability distribution functions ($F_I(\lambda^I) \in \Psi_I$ and $F_E(\lambda^E) \in \Psi_E$) of the two models is below a tolerance $th = 3\%$:
$$\Psi_I \equiv \Psi_E \iff \epsilon := \max_i \frac{2\,|\lambda_i^I - \lambda_i^E|}{|\lambda_i^I| + |\lambda_i^E|} < th \tag{30}$$
The model equivalence via accepted probabilistic function deviation can also be visualized in Figure 11:
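The equivalence test of Equation (30) compares matching transition rates of the two models. A minimal Python sketch, assuming both models list their rates in the same transition order (function names hypothetical):

```python
def max_relative_deviation(rates_i, rates_e):
    """Equation (30): max over i of 2|l_I - l_E| / (|l_I| + |l_E|)."""
    if len(rates_i) != len(rates_e):
        raise ValueError("both models must list the same transitions")
    deviations = []
    for l_i, l_e in zip(rates_i, rates_e):
        denom = abs(l_i) + abs(l_e)
        # assumption: a transition absent from both models contributes no error
        deviations.append(0.0 if denom == 0 else 2.0 * abs(l_i - l_e) / denom)
    return max(deviations, default=0.0)


def models_equivalent(rates_i, rates_e, th=0.03):
    """True if the identification and evaluation models deviate by less than th."""
    return max_relative_deviation(rates_i, rates_e) < th
```

The symmetric denominator makes the measure independent of which period is treated as the reference: rates of 1.00 and 1.02 deviate by about 2% (equivalent), whereas 1.0 and 1.1 deviate by about 9.5% (not equivalent).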

4. Results

In Figure 12 and Figure 13, two MCs are presented, corresponding to the subsystems with the parameters documented in Table 3 and Table 4, respectively.
The parameters from Table 3 can be interpreted as follows:
  • $m$, with a value of 20 bike stations, aligns with the usual cluster size of 10–30 bike stations in the context of rebalancing operations [60,61,62];
  • $\Delta t_{REF}$ spans early to mid-morning, which correlates with the time at which workers depart for their activities; this might indicate increased predictability and, hence, explain why the subsystem was discovered;
  • $|T_I| = |T_E|$: the identification and evaluation intervals are compact chunks of circa 3 months each;
  • $\upsilon_1 = 25\%$, $\upsilon_2 = 50\%$, and $\upsilon_3 = 75\%$ are the interval delimiters corresponding to the $N = 4$ states; so, the states $S_1$, $S_2$, $S_3$, and $S_4$ are associated with overall criticality levels in the intervals [0%, 25%), [25%, 50%), [50%, 75%), and [75%, 100%], respectively, in accordance with the procedure derived from Equation (8).
The content of Figure 12 is an instance derived from the general representation shown in Figure 5. In this instance, $\lambda_{ij}^{k}$ is the transition rate from state $S_i$ to state $S_j$, where $1 \le i \le 4$ and $1 \le j \le 4$; $k = 1$ specifies the context of the first subsystem.
Another important aspect, particular to the first subsystem, is that each state $S_i$ in the MC is connected only to itself or to the states with neighboring indexes, $S_{i-1}$ and $S_{i+1}$. This is the most common pattern observed throughout the Citi Bike data and implies that sudden jumps in overall criticality are quite rare, i.e., such a jump would require too many bike stations to become critical simultaneously.
The parameters from Table 4 can be interpreted as follows:
  • $m$, with a value of 16 bike stations, is also usual for clusters in the context of rebalancing operations [60,61,62];
  • $\Delta t_{REF}$ overlaps with the usual end of the working day, which might explain the subsystem's predictability;
  • $|T_I| = |T_E|$: the identification and evaluation intervals each span almost 3 months;
  • $\upsilon_1 = 25\%$, $\upsilon_2 = 50\%$, and $\upsilon_3 = 75\%$ are the interval delimiters corresponding to the $N = 4$ states; so, the states $S_1$, $S_2$, $S_3$, and $S_4$ are associated with overall criticality levels in the intervals [0%, 25%), [25%, 50%), [50%, 75%), and [75%, 100%], respectively, in accordance with the procedure derived from Equation (8).
Compared to Table 3, the number of bike stations that could be captured was reduced by 20%; this was to be expected, because the second subsystem exhibits additional transition rates, which would be restricted if too many bike stations were included.
The content of Figure 13 is another instance derived from the general representation shown in Figure 5, where $\lambda_{ij}^{k}$ is the transition rate from state $S_i$ to state $S_j$; $1 \le i \le 4$ and $1 \le j \le 4$; $k = 2$ specifies the context of the second subsystem.
Compared to Figure 12, the MC features two extra transition rates ($\lambda_{13}^{2}$ and $\lambda_{24}^{2}$). This is a good opportunity to test whether the theoretical and practical evaluations still pass, given that fewer data points were available for deriving these two rates, so the risk was higher because of the potentially lower signal-to-noise ratio.
The MCs had the following state transition matrices:
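$$Q_1 = \begin{pmatrix} \lambda_{11}^{1} & \lambda_{12}^{1} & 0 & 0 \\ \lambda_{21}^{1} & \lambda_{22}^{1} & \lambda_{23}^{1} & 0 \\ 0 & \lambda_{32}^{1} & \lambda_{33}^{1} & \lambda_{34}^{1} \\ 0 & 0 & \lambda_{43}^{1} & \lambda_{44}^{1} \end{pmatrix}, \tag{31}$$
$$Q_2 = \begin{pmatrix} \lambda_{11}^{2} & \lambda_{12}^{2} & \lambda_{13}^{2} & 0 \\ \lambda_{21}^{2} & \lambda_{22}^{2} & \lambda_{23}^{2} & \lambda_{24}^{2} \\ 0 & \lambda_{32}^{2} & \lambda_{33}^{2} & \lambda_{34}^{2} \\ 0 & 0 & \lambda_{43}^{2} & \lambda_{44}^{2} \end{pmatrix}, \tag{32}$$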
and the following values for the transition rates (Table 5 and Table 6):
The probabilistic distribution functions associated with the models obtained from identification and evaluation ( F I and F E , respectively) are presented in Figure 14 and Figure 15:
Given that no transition rate reached a deviation of 3%, according to Equation (30), the theoretical evaluation passed. As expected from Equation (11), the small values of $\lambda_{13}^{2}$ and $\lambda_{24}^{2}$ are clearly visible as the two isolated curves in Figure 15.
The practical evaluation of the (relevant) transition rates, in accordance with Equation (22), is presented in Figure 16 and Figure 17. The blue and red delineations correspond to the lower and upper bounds, respectively, of the acceptable domain of the transition rate, while the green line marks the actual identified value. Therefore, an acceptable green line always lies between blue and red, never above red and never below blue. For transition rates whose evaluation is not relevant (value “0” or $\lambda_{ii}$ in Table 5 and Table 6), “N/A” (not applicable) is shown.
Because of the additional transitions present in the second subsystem, and the resulting lower signal-to-noise ratio, the identified values of all rates were closer to the bounds.
Nevertheless, in both subsystems, all rates are within the bounds, meaning that the practical evaluation also passed.
In Figure 18, the subsystems’ overall unworking time is shown.
The term “assisted” refers to the rebalancing that used the presented model; “unassisted” means that the rebalancing was performed without the aid of the model. The average, worst, and best cases are defined in Equations (25)–(27), respectively.
Based on customer feedback reported in published studies [54], decreasing unworking times by 15–20% from values of around 100 s produces an improvement that is noticeable to end users. Therefore, the results of the “assisted” cases justify the use of the model.
In Figure 19, the subsystems’ overall lost customer rate is presented.
The obtained results show that the model reduced the lost customer rate, confirming its positive effect. Nevertheless, ideally no customer is lost [58], and the model cannot guarantee that.

5. Discussion

Citi Bike New York was used to evaluate the proposed strategy. The raw data, which consisted of the number of available bikes at each station, were processed to determine the station load for modeling purposes.
Given the large geographical area covered by the BSSs and the specific traffic limitations, it was not feasible to implement a centralized rebalancing strategy. Therefore, a cluster-based approach was explored, to assess the effectiveness of the proposed model. In this context, a cluster refers to a collection of bike stations that are highly dynamic and located within a defined area.
BSS cluster behavior identification is relevant because it enables the assessment of rebalancing needs. The cluster behavior is defined as a set of states and the transition rates between them. Each state corresponds to an overall criticality degree, depending on how many bike stations are unable to provide the resources required by the customers.
Determining the fleet size for a set of bike stations is important for efficient operations. A fleet size that is too big can result in high acquisition and maintenance costs. On the other hand, a fleet size that is too small can lead to slow reaction times to customer needs, ultimately resulting in customer loss. To achieve the right balance, a method was developed based on the cluster steady state and the fleet rebalancing cost.
The model's effectiveness was demonstrated on two subsystems. Each subsystem had four distinct states corresponding to an overall criticality degree as follows: [0%; 25%), [25%; 50%), [50%; 75%), [75%; 100%]. The first subsystem had 20 bike stations and, in total, 206 selected days for identification and validation purposes, while the second had 16 bike stations and 174 selected days. The second subsystem posed particular challenges because of the observed transitions between states with non-neighboring intervals: [0%; 25%)→[50%; 75%) and [25%; 50%)→[75%; 100%]. Both subsystems passed the evaluations performed.
The proposed model underwent a theoretical evaluation strategy that involved using a few months of data to identify a model. This was followed by another few months of data, with the goal of identifying an equivalent model. The equivalence was determined by ensuring that the deviation between the two models was less than 3%. To assess the deviation, each transition rate from one model was compared to the corresponding transition rate from the other model. For both subsystems, the theoretical evaluation criterion passed by obtaining maximum deviations of 2.47% and 2.28%, respectively.
Within the practical evaluation strategy, the model together with the fleet sizing method was applied to the same subsystems. The selected rebalancing application was tested to ensure that it operated within the maximum travel time restriction ($\Gamma = 282$ s) to each bike station during the rebalancing trip. The practical evaluation was also successful, because all the underlying transition rates were within bounds derived from the cited studies of real-world BSS conditions (25.5 km/h truck velocity; 2 km distance between stations). Additionally, the model reached another operational goal by noticeably lowering the bike station unworking times. Furthermore, with respect to user expectations, it reduced the rate of lost customers, which is noteworthy, although it does not eliminate lost customers completely, which is a limitation.
The model is, in theory, applicable to other BSSs because its viability is not conditioned by the specific properties of the studied system. This means that any BSS with a subset of bike stations that frequently experience load imbalances could use the model effectively. When doing so, the BSS behavior characteristics would need to be taken into consideration, as they would influence the decision on the number of states in the model, the connections between states, the execution rates, and the definition of the steady state. This introduces another limitation to the study, because it requires an extensive statistical study in order to determine suitable model or evaluation components, such as defining the boundaries for station load intervals, expected execution rates, and reference time intervals for model identification. Future research could focus on conducting practical experiments to verify the transferability of the model.

6. Conclusions

The main objective of this work was to develop a model based on an MC that would support the optimization of the fleet size for rebalancing the loads of BSS stations in highly dynamic conditions.
Compliance with the imposed constraints was ensured. The model was consistent because the variation in behavior throughout the identification and validation time intervals was negligible. The maximum arrival time constraint was also fulfilled. This was demonstrated by the fact that the MC transition rates corresponded to acceptable travel times for the rebalancing routes. Also, the efficiency was increased, with respect to avoiding unworking situations or losing customers.
The effectiveness of the model was checked, using real-world scenarios and data.
Based on its consistent positive outcomes, the model is also recommended for representing other similar systems. Consequently, in future research, there is potential to integrate the presented model into various other logistic applications, tailoring it to their specific contexts and requirements. One potential improvement involves using machine learning techniques to automate the feature-engineering process. This approach would help overcome the limitations of the current study’s statistical analysis by replacing it with a more efficient method.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in https://ride.citibikenyc.com/system-data, accessed on 17 January 2024.

Acknowledgments

This paper was financially supported by the Project “Entrepreneurial competencies and excellence research in doctoral and postdoctoral programs - ANTREDOC”, a project co-funded by the European Social Fund financing agreement no. 56437/24.07.2019.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Eren, E.; Uz, V.E. A review on bike-sharing: The factors affecting bike-sharing demand. Sustain. Cities Soc. 2020, 54, 101882.
  2. Teixeira, J.F.; Silva, C.; Moura e Sá, F. Empirical evidence on the impacts of bikesharing: A literature review. Transp. Rev. 2021, 41, 329–351.
  3. Sun, C.; Lu, J. The Reliability Model for Bike-Sharing Dispatch Based on Hotspot Detection and Hypothesis Test: A Case Study in Beijing. Discret. Dyn. Nat. Soc. 2022, 2022, 7049765.
  4. Sun, C.; Lu, J. Modeling Spatial Riding Characteristics of Bike-Sharing Users Using Hotspot Areas-Based Association Rule Mining. J. Adv. Transp. 2022, 2022, 5705080.
  5. Vallez, C.M.; Castro, M.; Contreras, D. Challenges and Opportunities in Dock-Based Bike-Sharing Rebalancing: A Systematic Review. Sustainability 2021, 13, 1829.
  6. Zhou, J.; Guo, Y.; Sun, J.; Yu, E.; Wang, R. Review of bike-sharing system studies using bibliometrics method. J. Traffic Transp. Eng. 2022, 9, 608–630.
  7. Florian, H.; Avram, C.; Pop, M.; Radu, D.; Aștilean, A. Resources Relocation Support Strategy Based on a Modified Genetic Algorithm for Bike-Sharing Systems. Mathematics 2023, 11, 1816.
  8. Zhao, J.; Fan, W.; Zhai, X. Identification of land-use characteristics using bicycle sharing data: A deep learning approach. J. Transp. Geogr. 2020, 82, 102562.
  9. Mohammed, H.; Almannaa, M.E.; Rakha, H.A. Dynamic linear models to predict bike availability in a bike sharing system. Int. J. Sustain. Transp. 2020, 14, 232–242.
  10. Lin, P.; Weng, J.; Hu, S.; Alivanistos, D.; Li, X.; Yin, B. Revealing Spatio-Temporal Patterns and Influencing Factors of Dockless Bike Sharing Demand. IEEE Access 2020, 8, 66139–66149.
  11. Lv, Y.; Zhi, D.; Sun, H.; Qi, G. Mobility pattern recognition based prediction for the subway station related bike-sharing trips. Transp. Res. Part C Emerg. Technol. 2021, 133, 103404.
  12. Fontes, T.; Arantes, M.; Figueiredo, P.V.; Novais, P. A Cluster-Based Approach Using Smartphone Data for Bike-Sharing Docking Stations Identification: Lisbon Case Study. Smart Cities 2022, 5, 251–275.
  13. Kim, K. Spatial Contiguity-Constrained Hierarchical Clustering for Traffic Prediction in Bike Sharing Systems. IEEE Trans. Intell. Transp. Syst. 2022, 23, 5754–5764.
  14. Lv, C.; Zhang, C.; Lian, K.; Ren, Y.; Meng, L. A hybrid algorithm for the static bike-sharing re-positioning problem based on an effective clustering strategy. Transp. Res. Part B Methodol. 2020, 140, 1–21.
  15. Lv, C.; Zhang, C.; Lian, K.; Ren, Y.; Meng, L. A two-echelon fuzzy clustering based heuristic for large-scale bike sharing repositioning problem. Transp. Res. Part B Methodol. 2022, 160, 54–75.
  16. Albuquerque, V.; Andrade, F.; Ferreira, J.C.; Dias, M.S.; Bacao, F. Bike-sharing mobility patterns: A data-driven analysis for the city of Lisbon. EAI Endorsed Trans. Smart Cities 2021, 5, e2.
  17. Wang, Y.J.; Kuo, Y.H.; Huang, G.Q.; Gu, W.; Hu, Y. Dynamic demand-driven bike station clustering. Transp. Res. Part E Logist. Transp. Rev. 2022, 160, 102656.
  18. Zhang, J.; Meng, M. Bike allocation strategies in a competitive dockless bike sharing market. J. Clean. Prod. 2019, 233, 869–879.
  19. Caggiani, L.; Camporeale, R.; Marinelli, M.; Ottomanelli, M. User satisfaction based model for resource allocation in bike-sharing systems. Transp. Policy 2019, 80, 117–126.
  20. Chen, J.; Li, K.; Li, K.; Yu, P.S.; Zeng, Z. Dynamic Planning of Bicycle Stations in Dockless Public Bicycle-sharing System Using Gated Graph Neural Network. ACM Trans. Intell. Syst. Technol. 2021, 12, 25.
  21. Liu, C.; Gao, X.; Wang, X. Data adaptive functional outlier detection: Analysis of the Paris bike sharing system data. Inf. Sci. 2022, 602, 13–42.
  22. Sathishkumar, V.E.; Cho, Y. Season wise bike sharing demand analysis using random forest algorithm. Comput. Intell. 2024, 40, e12287.
  23. Li, A.; Zhao, P.; Liu, X.; Mansourian, A.; Axhausen, K.W.; Qu, X. Comprehensive comparison of e-scooter sharing mobility: Evidence from 30 European cities. Transp. Res. Part D Transp. Environ. 2022, 105, 103229.
  24. Sangli, M.; Kadadevaramath, R.S.; Madaka, S. Comparative study of methods to identify sensitive parameters for improving performance of predictive models. Int. J. Bus. Syst. Res. 2023, 17, 636–658.
  25. Li, D.; Zhao, Y. A Multi-Categorical Probabilistic Approach for Short-Term Bike Sharing Usage Prediction. IEEE Access 2019, 7, 81364–81369.
  26. Willberg, E.; Salonen, M.; Toivonen, T. What do trip data reveal about bike-sharing system users? J. Transp. Geogr. 2021, 91, 102971.
  27. Zhang, M.; Li, T.; Yu, Y.; Li, Y.; Hui, P.; Zheng, Y. Urban Anomaly Analytics: Description, Detection, and Prediction. IEEE Trans. Big Data 2022, 8, 809–826.
  28. Crisostomi, E.; Faizrahnemoon, M.; Schlote, A.; Shorten, R. A Markov-chain based model for a bike-sharing system. In Proceedings of the 2015 International Conference on Connected Vehicles and Expo (ICCVE), Shenzhen, China, 19–23 October 2015; pp. 367–372.
  29. Zhou, Y.; Wang, L.; Zhong, R.; Tan, Y. A Markov Chain Based Demand Prediction Model for Stations in Bike Sharing Systems. Math. Probl. Eng. 2018, 2018, 8028714.
  30. Chen, X.; Jiang, H. Detecting the Demand Changes of Bike Sharing: A Bayesian Hierarchical Approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3969–3984. [Google Scholar] [CrossRef]
  31. Cheng, R.; Zhong, S.; Wang, Z.; Anker Nielsen, O.; Jiang, Y. A hyper-heuristic approach to the strategic planning of bike-sharing infrastructure. Comput. Ind. Eng. 2022, 173, 108704. [Google Scholar] [CrossRef]
  32. Mo, X.; Liu, X.; Chan, W.K.V. Modeling and Optimization in Resource Sharing Systems: Application to Bike-Sharing with Unequal Demands. Algorithms 2021, 14, 47. [Google Scholar] [CrossRef]
  33. Chen, X.; Huang, K.; Jiang, H. Detecting Changes in the Spatiotemporal Pattern of Bike Sharing: A Change-Point Topic Model. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18361–18377. [Google Scholar] [CrossRef]
  34. Li, X.; Xu, Y.; Chen, Q.; Wang, L.; Zhang, X.; Shi, W. Short-Term Forecast of Bicycle Usage in Bike Sharing Systems: A Spatial-Temporal Memory Network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 10923–10934. [Google Scholar] [CrossRef]
  35. Sathishkumar, V.E.; Park, J.; Cho, Y. Using data mining techniques for bike sharing demand prediction in metropolitan city. Comput. Commun. 2020, 153, 353–366. [Google Scholar] [CrossRef]
  36. Zhou, Y.; Lin, Z.; Guan, R.; Sheu, J.B. Dynamic battery swapping and rebalancing strategies for e-bike sharing systems. Transp. Res. Part B Methodol. 2023, 177, 102820. [Google Scholar] [CrossRef]
  37. Yang, N.; Yu, H.; Qian, Z.; Sun, H. Modeling and quantitatively predicting software security based on stochastic Petri nets. Math. Comput. Model. 2012, 55, 102–112. [Google Scholar] [CrossRef]
  38. Brinkmann, J.; Ulmer, M.W.; Mattfeld, D.C. The multi-vehicle stochastic-dynamic inventory routing problem for bike sharing systems. Bus. Res. 2019, 13, 69–92. [Google Scholar] [CrossRef]
  39. Zhou, Y.; Yuan, Q.; Yang, C.; Wang, Y. Who you are determines how you travel: Clustering human activity patterns with a Markov-chain-based mixture model. Travel Behav. Soc. 2021, 24, 102–112. [Google Scholar] [CrossRef]
  40. Rayati, M.; Bozorg, M.; Carpita, M.; Cherkaoui, R. Stochastic optimization and Markov chain-based scenario generation for exploiting the underlying flexibilities of an active distribution network. Sustain. Energy Grids Netw. 2023, 34, 100999. [Google Scholar] [CrossRef]
  41. Shi, B.; Song, Z.; Huang, X.; Xu, J. User Incentive Based Bike-Sharing Dispatching Strategy. In Advances in Knowledge Discovery and Data Mining: Proceedings of the 26th Pacific-Asia Conference, PAKDD 2022, Chengdu, China, 16–19 May 2022; Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F., Eds.; Springer: Cham, Switzerland, 2022; pp. 338–352. [Google Scholar]
  42. Duan, Y.; Wu, J. Spatial-Temporal Inventory Rebalancing for Bike Sharing Systems with Worker Recruitment. IEEE Trans. Mob. Comput. 2022, 21, 1081–1095. [Google Scholar] [CrossRef]
  43. Quach, J.; Malekian, R. Exploring the weather impact on bike sharing usage through a clustering analysis. arXiv 2020, arXiv:2008.07249. [Google Scholar]
  44. Chen, L.; Jakubowicz, J.; Zhang, D.; Wang, L.; Yang, D.; Ma, X.; Li, S.; Wu, Z.; Pan, G.; Thi Mai Trang, N. Dynamic cluster-based over-demand prediction in bike sharing systems. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September 2016; pp. 841–852. [Google Scholar] [CrossRef]
  45. Feng, Y.; Affonso, R.C.; Zolghadri, M. Analysis of bike sharing system by clustering: The Vélib’ case. IFAC-PapersOnLine 2017, 50, 12422–12427. [Google Scholar] [CrossRef]
  46. Boonjubut, K.; Hasegawa, H. A Comparison of Clustering Method to Determine Depot Location for a Bike-sharing Operation. In Proceedings of the 2022 5th Asia Conference on Machine Learning and Computing (ACMLC), Bangkok, Thailand, 28–30 December 2022; pp. 127–132. [Google Scholar] [CrossRef]
  47. Galvani, M.; Torti, A.; Menafoglio, A.; Vantini, S. A novel spatio-temporal clustering technique to study the bike sharing system in Lyon. In Proceedings of the EDBT/ICDT Workshops, Copenhagen, Denmark, 30 March–2 April 2020. [Google Scholar]
  48. Ma, X.; Cao, R.; Jin, Y. Spatiotemporal Clustering Analysis of Bicycle Sharing System with Data Mining Approach. Information 2019, 10, 163. [Google Scholar] [CrossRef]
  49. Dai, P.; Song, C.; Lin, H.; Jia, P.; Xu, Z. Cluster-Based Destination Prediction in Bike Sharing System. In Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference, AICCC ’18, Tokyo, Japan, 21–23 December 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–8. [Google Scholar] [CrossRef]
  50. Tavares, B.; Soares, C.; Marques, M. A Cluster-Based Trip Prediction Graph Neural Network Model for Bike Sharing Systems. arXiv 2022, arXiv:2201.00720. [Google Scholar]
  51. Huang, J.; Tan, Q.; Li, H.; Li, A.; Huang, L. Monte carlo tree search for dynamic bike repositioning in bike-sharing systems. Appl. Intell. 2021, 52, 4610–4625. [Google Scholar] [CrossRef]
52. Mukku, V.D.; Salah, I.H.; Roy, A.; Assmann, T. Evaluation of Station Distribution Strategies for Next-Generation Bike-Sharing System. In Smart Energy for Smart Transport: Proceedings of the 6th Conference on Sustainable Urban Mobility, CSUM2022, Skiathos Island, Greece, 31 August–2 September 2022; Nathanail, E.G., Gavanas, N., Adamos, G., Eds.; Springer: Cham, Switzerland, 2023; pp. 1358–1373.
53. Florian, H.; Pop, M.; Avram, C.; Aştilean, A. Similarity measure for station clustering in bike sharing systems. In Proceedings of the 2020 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), Cluj-Napoca, Romania, 21–23 May 2020; pp. 1–5.
54. Lin, Y.C. A Demand-Centric Repositioning Strategy for Bike-Sharing Systems. Sensors 2022, 22, 5580.
55. Yi, P.; Huang, F.; Peng, J. A Rebalancing Strategy for the Imbalance Problem in Bike-Sharing Systems. Energies 2019, 12, 2578.
56. Alabool, H.M.; Alarabiat, D.; Abualigah, L.; Heidari, A.A. Harris hawks optimization: A comprehensive review of recent variants and applications. Neural Comput. Appl. 2021, 33, 8939–8980.
57. Obaid, O.I. Solving Capacitated Vehicle Routing Problem (CVRP) Using Tabu Search Algorithm (TSA). Ibn AL-Haitham J. Pure Appl. Sci. 2018, 31, 199–209.
58. Xia, Y.; Fu, Z.; Pan, L.; Duan, F. Tabu search algorithm for the distance-constrained vehicle routing problem with split deliveries by order. PLoS ONE 2018, 13, e0195457.
59. Chiariotti, F.; Pielli, C.; Zanella, A.; Zorzi, M. A Dynamic Approach to Rebalancing Bike-Sharing Systems. Sensors 2018, 18, 512.
60. Chemla, D.; Meunier, F.; Wolfler Calvo, R. Bike sharing systems: Solving the static rebalancing problem. Discret. Optim. 2013, 10, 120–146.
61. Raviv, T.; Kolka, O. Optimal inventory management of a bike-sharing station. IIE Trans. 2013, 45, 1077–1093.
62. Nair, R.; Miller-Hooks, E.; Hampshire, R.C.; Bušić, A. Large-Scale Vehicle Sharing Systems: Analysis of Vélib’. Int. J. Sustain. Transp. 2012, 7, 85–106.
Figure 1. Graph modeling the bike transfer of a BSS.
Figure 2. Visual representation of $A$, $\dot{A}$, and $\ddot{A}$.
Figure 3. Flowchart for the computation of overall criticality of the subsystem.
Figure 4. Evolution of subsystem state.
Figure 5. The general state transition diagram for a subsystem.
Figure 6. Visualization of $\epsilon_t$.
Figure 7. Visual examples of allowed behavior outside CTI.
Figure 8. Probabilistic function fit.
Figure 9. Rebalancing routes.
Figure 10. Fleet acquisition cost versus rebalancing cost.
Figure 11. Negligible error of probabilistic functions for model equivalence.
Figure 12. MC of the first subsystem identified.
Figure 13. MC of the second subsystem identified.
Figure 14. Graphical representation of Subsystem 1 model identification and theoretical evaluation.
Figure 15. Graphical representation of Subsystem 2 model identification and theoretical evaluation.
Figure 16. Graphical representation of Subsystem 1 practical evaluation.
Figure 17. Graphical representation of Subsystem 2 practical evaluation.
Figure 18. Comparative analysis—unworking time.
Figure 19. Comparative analysis—lost customer rate.
Table 1. Notations summary.
| Notation | Units | Definition |
|---|---|---|
| $A$ | - | Bike Sharing System |
| $\dot{A}$ | - | Set of all highly dynamic bike stations |
| $\ddot{A}$ | - | Bike sharing subsystem |
| $B$ | - | Bike sharing stations |
| $\xi$ | [boolean] | Bike station state (critical/not critical) |
| $M$ | [bike stations] | Number of bike stations in the system |
| $k$ | [bikes] | Number of bikes present at a bike station |
| $C$ | [docks] | Capacity of a bike station |
| $L$ | [%] | Bike station load |
| $f$ | [s$^{-1}$] | Frequency of criticality of a bike station |
| $X$ | [boolean array] | Subsystem state vector |
| $x$ | [boolean] | Criticality of a bike station |
| $\bar{x}$ | [%] | Overall criticality of a subsystem |
| $m$ | [bike stations] | Number of bike stations in a subsystem |
| $S$ | - | Subsystem state of the MC |
| $N$ | [states] | Number of states of the MC |
| $\upsilon$ | [%] | Quantile of $\bar{x}$ |
| $\lambda$ | [s$^{-1}$] | Execution rate (MC) |
| $\hat{\lambda}$ | [s$^{-1}$] | Ideal execution rate (MC) |
| $\tilde{\lambda}$ | [s$^{-1}$] | Acceptable execution rate (MC) |
| $Q$ | - | State transition matrix (MC) |
| $T$ | [time intervals] | Recorded lifetime of a bike station |
| $T_I$ | [time intervals] | Partition of $T$ (model identification) |
| $T_E$ | [time intervals] | Partition of $T$ (model evaluation) |
| $\Psi$ | - | Identified model |
| $\Delta t_{ij}$ | [s] | Time interval of consistent model fit |
| $D$ | [bike stations] | Subsystem demand |
| $FK$ | [cost units] | Rebalancing fleet acquisition cost |
| $K_t$ | [cost units] | Single truck acquisition cost |
| $CK$ | [cost units] | Rebalancing operation cost |
| $Tr$ | [trucks] | Number of trucks allocated for subsystem rebalancing |
| $\widehat{Tr}$ | [trucks] | Optimal $Tr$ |
| $P^*$ | - | Steady-state probability vector of the subsystem |
| $\hat{D}$ | [bike stations] | Steady demand of the subsystem |
| $Y$ | - | Average overall criticality associated with the subsystem rates |

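As an illustration of how the station-level notation of Table 1 fits together, the following minimal Python sketch computes a station load $L$ from $k$ and $C$, builds the subsystem state vector $X$, and derives the overall criticality $\bar{x}$. It assumes $L = 100 \cdot k / C$ and that $\bar{x}$ is the percentage of the $m$ stations flagged as critical. The criticality rule itself (the thresholds `LOW_LOAD_PCT` and `HIGH_LOAD_PCT`) and all function names are hypothetical placeholders, not the criterion used in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical thresholds: the paper's actual criticality rule is not reproduced here.
LOW_LOAD_PCT = 10.0
HIGH_LOAD_PCT = 90.0


@dataclass
class Station:
    bikes: int     # k [bikes]
    capacity: int  # C [docks]


def load_pct(s: Station) -> float:
    """Assumed load definition: L = 100 * k / C, in percent."""
    return 100.0 * s.bikes / s.capacity


def is_critical(s: Station) -> bool:
    """Hypothetical rule: a station is critical when it is nearly empty or nearly full."""
    load = load_pct(s)
    return load <= LOW_LOAD_PCT or load >= HIGH_LOAD_PCT


def state_vector(stations: Sequence[Station]) -> List[bool]:
    """Subsystem state vector X: one criticality flag x per station."""
    return [is_critical(s) for s in stations]


def overall_criticality(x: Sequence[bool]) -> float:
    """Assumed x_bar: percentage of the m stations that are currently critical."""
    return 100.0 * sum(x) / len(x)


if __name__ == "__main__":
    subsystem = [Station(0, 20), Station(10, 20), Station(19, 20), Station(5, 20)]
    X = state_vector(subsystem)
    print(X, overall_criticality(X))
```

Swapping in the paper's own criticality criterion only requires changing `is_critical`; the aggregation into $X$ and $\bar{x}$ is independent of it.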
Table 2. Example result for the rebalancing strategy [7].
| Truck (Index) | Assigned Bike Stations (Index, Prioritized) |
|---|---|
| ALL | 4 ⟶ 5 ⟶ 2 ⟶ 8 ⟶ 3 ⟶ 1 ⟶ 7 ⟶ 9 ⟶ 6 |
| 1 | 4 ⟶ 2 ⟶ 9 ⟶ 6 |
| 2 | 5 ⟶ 3 ⟶ 1 |
| 3 | 8 ⟶ 7 |

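In the example of Table 2, each truck's list preserves the relative order of the global priority list (row ALL), and the three lists together cover all nine stations exactly once. The short Python sketch below checks those two properties on the transcribed values. It does not reproduce the assignment rule of [7], and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence

# Values transcribed from Table 2.
PRIORITIZED = [4, 5, 2, 8, 3, 1, 7, 9, 6]
ROUTES: Dict[int, List[int]] = {1: [4, 2, 9, 6], 2: [5, 3, 1], 3: [8, 7]}


def is_subsequence(route: Sequence[int], order: Sequence[int]) -> bool:
    """True if the stations of `route` appear in `order` in the same relative order."""
    it = iter(order)
    # `in` advances the shared iterator, so later stations must appear later in `order`.
    return all(station in it for station in route)


def routes_consistent(routes: Dict[int, List[int]], order: Sequence[int]) -> bool:
    """Every station is served by exactly one truck and each route follows the priority."""
    served = [s for route in routes.values() for s in route]
    if sorted(served) != sorted(order):
        return False
    return all(is_subsequence(route, order) for route in routes.values())


if __name__ == "__main__":
    print(routes_consistent(ROUTES, PRIORITIZED))
```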
Table 3. Subsystem 1 parameters.
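| Parameter | Property of | Value |
|---|---|---|
| $m$ | Subsystem | 20 [bike stations] |
| $\Delta t_{REF}$ | Identification/Evaluation | Wednesday 5:47 AM–10:08 AM |
| $T_I$ | Identification | 103 [days] |
| $T_E$ | Evaluation | 103 [days] |
| $\upsilon_1$ | Quantization | 25% |
| $\upsilon_2$ | Quantization | 50% |
| $\upsilon_3$ | Quantization | 75% |
| $N$ | MC | 4 [states] |
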
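Tables 3 and 4 use three quantization levels $\upsilon_1, \upsilon_2, \upsilon_3$ to reduce the overall criticality $\bar{x}$ to $N = 4$ MC states. The minimal sketch below assumes the $\upsilon$ values act as cut points on $\bar{x}$ and that a value equal to a cut point belongs to the lower state; both are assumptions. Since Table 1 describes $\upsilon$ as a quantile of $\bar{x}$, an alternative helper derives the cut points as empirical quartiles of an observed $\bar{x}$ history with `statistics.quantiles`. Which reading matches the paper is not settled here, and the function names are hypothetical.

```python
import bisect
import statistics
from typing import List, Sequence

# Quantization levels from Tables 3 and 4, read here as cut points on x_bar (assumption).
UPSILON = [25.0, 50.0, 75.0]


def quantize(x_bar: float, cuts: Sequence[float] = UPSILON) -> int:
    """Map an overall criticality x_bar in [0, 100] to an MC state S in 1..len(cuts)+1.

    A value equal to a cut point is assigned to the lower state (assumed convention).
    """
    if not 0.0 <= x_bar <= 100.0:
        raise ValueError("x_bar must be a percentage in [0, 100]")
    return bisect.bisect_left(cuts, x_bar) + 1


def empirical_cuts(history: Sequence[float], n_states: int = 4) -> List[float]:
    """Alternative reading: cut points as empirical quantiles of observed x_bar values."""
    return statistics.quantiles(history, n=n_states)


if __name__ == "__main__":
    for value in (0.0, 30.0, 60.0, 90.0):
        print(value, "->", quantize(value))
```

Keeping $N$ small in this way is what prevents the state space from growing with the number of stations $m$: the chain has $N$ states rather than one state per combination of station flags.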
Table 4. Subsystem 2 parameters.
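| Parameter | Property of | Value |
|---|---|---|
| $m$ | Subsystem | 16 [bike stations] |
| $\Delta t_{REF}$ | Identification/Evaluation | Tuesday 6:35 PM–9:16 PM |
| $T_I$ | Identification | 87 [days] |
| $T_E$ | Evaluation | 87 [days] |
| $\upsilon_1$ | Quantization | 25% |
| $\upsilon_2$ | Quantization | 50% |
| $\upsilon_3$ | Quantization | 75% |
| $N$ | MC | 4 [states] |
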
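Once each record of the identification window $T_I$ has been mapped to a state $S$, transition rates can be estimated from the state sequence. The sketch below uses the standard maximum-likelihood estimator for a continuous-time Markov chain, $\lambda_{ij} = n_{ij} / \tau_i$, where $n_{ij}$ counts observed jumps from $i$ to $j$ and $\tau_i$ is the total time spent in $i$. This is a generic estimator shown for illustration, not necessarily the identification procedure used in the paper; it assumes every state change is recorded and that the state stays constant between consecutive records.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

Observation = Tuple[float, int]  # (timestamp in seconds, MC state S numbered from 1)


def estimate_rates(obs: Sequence[Observation], n_states: int) -> List[List[float]]:
    """Estimate a CTMC generator from a time-ordered, timestamped state sequence.

    Off-diagonal rates use the maximum-likelihood estimate
    lambda_ij = (number of i -> j jumps) / (total time spent in i);
    each diagonal entry is minus the sum of its row's off-diagonal rates.
    """
    time_in = [0.0] * n_states
    jumps: DefaultDict[Tuple[int, int], int] = defaultdict(int)
    # The state recorded at t0 is assumed to hold until the next record at t1.
    for (t0, s0), (t1, s1) in zip(obs, obs[1:]):
        time_in[s0 - 1] += t1 - t0
        if s1 != s0:
            jumps[(s0, s1)] += 1
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), count in jumps.items():
        if time_in[i - 1] > 0:
            q[i - 1][j - 1] = count / time_in[i - 1]
    for i in range(n_states):
        q[i][i] = -sum(q[i][j] for j in range(n_states) if j != i)
    return q


if __name__ == "__main__":
    sample = [(0, 1), (100, 2), (150, 1), (250, 2), (300, 3), (400, 3), (500, 2)]
    for row in estimate_rates(sample, 4):
        print(row)
```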
Table 5. Transition rate values of Subsystem 1.
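| $\lambda_{ij}^{1}$ [$10^{-4}$ s$^{-1}$] | $j = 1$ | $j = 2$ | $j = 3$ | $j = 4$ |
|---|---|---|---|---|
| $i = 1$ | −7.48 | 7.48 | 0 | 0 |
| $i = 2$ | 6.71 | −12.52 | 5.81 | 0 |
| $i = 3$ | 0 | 6.39 | −10.66 | 4.27 |
| $i = 4$ | 0 | 0 | 9.10 | −9.10 |
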
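The steady-state vector $P^*$ of Table 1 can be obtained from the rates of Table 5 by solving $P^* Q = 0$ subject to $\sum_i P^*_i = 1$. The sketch below reads the diagonal entries of Table 5 as negative, so that each row sums to zero as a generator matrix requires; that sign convention is an assumption. It solves the linear system with plain Gaussian elimination, and the common $10^{-4}$ scale factor cancels out. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Subsystem 1 rates from Table 5 (units of 1e-4 s^-1); diagonal taken as negative.
Q1: Matrix = [
    [-7.48, 7.48, 0.0, 0.0],
    [6.71, -12.52, 5.81, 0.0],
    [0.0, 6.39, -10.66, 4.27],
    [0.0, 0.0, 9.10, -9.10],
]


def steady_state(q: Matrix) -> List[float]:
    """Solve p Q = 0 with sum(p) = 1 for an irreducible CTMC generator q."""
    n = len(q)
    # Transpose so the unknown p becomes a column vector: Q^T p = 0.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # One balance equation is redundant; replace the last one with normalization.
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (b[r] - s) / a[r][r]
    return p


if __name__ == "__main__":
    for state, prob in enumerate(steady_state(Q1), start=1):
        print(f"P*[{state}] = {prob:.4f}")
```

For Subsystem 1 the resulting probabilities are roughly 0.28, 0.31, 0.28 and 0.13. Because this $Q$ is tridiagonal (a birth-death chain), the same values follow from the detailed-balance ratios $P^*_{i+1}/P^*_i = \lambda_{i,i+1}/\lambda_{i+1,i}$, which makes a convenient cross-check.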
Table 6. Transition rate values of Subsystem 2.
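| $\lambda_{ij}^{2}$ [$10^{-4}$ s$^{-1}$] | $j = 1$ | $j = 2$ | $j = 3$ | $j = 4$ |
|---|---|---|---|---|
| $i = 1$ | −10.13 | 7.29 | 2.84 | 0 |
| $i = 2$ | 16.27 | −25.65 | 6.08 | 3.30 |
| $i = 3$ | 0 | 5.69 | −12.95 | 7.26 |
| $i = 4$ | 0 | 0 | 8.84 | −8.84 |
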
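Two further quantities follow directly from a generator matrix: the mean time spent in state $i$ before leaving it, $1/(-\lambda_{ii})$, and the embedded jump chain, whose entries $\lambda_{ij}/(-\lambda_{ii})$ give the probability that the next transition out of $i$ goes to $j$. The sketch below computes both for Subsystem 2, reading the diagonal of Table 6 as negative (an assumption consistent with each row summing to zero). The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

RATE_SCALE = 1e-4  # Table 6 lists rates in units of 1e-4 s^-1

# Subsystem 2 rates from Table 6; diagonal entries taken as negative (assumption).
Q2: Matrix = [
    [-10.13, 7.29, 2.84, 0.0],
    [16.27, -25.65, 6.08, 3.30],
    [0.0, 5.69, -12.95, 7.26],
    [0.0, 0.0, 8.84, -8.84],
]


def is_generator(q: Matrix, tol: float = 1e-9) -> bool:
    """Off-diagonal rates must be non-negative and every row must sum to zero."""
    n = len(q)
    for i in range(n):
        if any(q[i][j] < 0 for j in range(n) if j != i):
            return False
        if abs(sum(q[i])) > tol:
            return False
    return True


def mean_sojourn_seconds(q: Matrix, scale: float = RATE_SCALE) -> List[float]:
    """Expected holding time in each state, 1 / (-lambda_ii), converted to seconds."""
    return [1.0 / (-q[i][i] * scale) for i in range(len(q))]


def jump_chain(q: Matrix) -> Matrix:
    """Embedded chain: P_ij = lambda_ij / (-lambda_ii) for i != j, and P_ii = 0."""
    n = len(q)
    return [
        [0.0 if i == j else q[i][j] / -q[i][i] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    assert is_generator(Q2)
    for i, t in enumerate(mean_sojourn_seconds(Q2), start=1):
        print(f"state {i}: mean sojourn {t:.0f} s")
    for i, row in enumerate(jump_chain(Q2), start=1):
        print(f"state {i}: next-state probabilities {[round(p, 3) for p in row]}")
```

With these values, Subsystem 2 stays in state 1 for roughly 990 s on average and leaves it toward state 2 with a probability of about 0.72.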
Share and Cite

MDPI and ACS Style

Florian, H.; Avram, C.; Radu, D.; Aștilean, A. Decision System Based on Markov Chains for Sizing the Rebalancing Fleet of Bike Sharing Stations. Appl. Sci. 2024, 14, 6743. https://doi.org/10.3390/app14156743

AMA Style

Florian H, Avram C, Radu D, Aștilean A. Decision System Based on Markov Chains for Sizing the Rebalancing Fleet of Bike Sharing Stations. Applied Sciences. 2024; 14(15):6743. https://doi.org/10.3390/app14156743

Chicago/Turabian Style

Florian, Horațiu, Camelia Avram, Dan Radu, and Adina Aștilean. 2024. "Decision System Based on Markov Chains for Sizing the Rebalancing Fleet of Bike Sharing Stations" Applied Sciences 14, no. 15: 6743. https://doi.org/10.3390/app14156743
