An Electric Fence-Based Intelligent Scheduling Method for Rebalancing Dockless Bike Sharing Systems

Jia, Lulu; Yang, Dezhen; Ren, Yi; Feng, Qiang; Sun, Bo; Qian, Cheng; Li, Zhifeng; Zeng, Chenchen

doi:10.3390/app12105031


by Lulu Jia ¹, Dezhen Yang ¹, Yi Ren ¹, Qiang Feng ¹,*, Bo Sun ¹, Cheng Qian ¹, Zhifeng Li ¹ and Chenchen Zeng ²

¹ The School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China

² Mobvoi, Suzhou 215000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(10), 5031; https://doi.org/10.3390/app12105031

Submission received: 8 March 2022 / Revised: 12 May 2022 / Accepted: 13 May 2022 / Published: 16 May 2022

(This article belongs to the Topic Advanced Systems Engineering: Theory and Applications)


Abstract:

With a new generation of bike sharing services emerging, dockless bike sharing brings considerable socioeconomic and environmental benefits but also creates new issues, such as inappropriate parking behaviors and bike imbalances. To solve the inappropriate parking problem, electric fences have been introduced to guide users to park bikes in designated zones. Considering the role of electric fences in restricting user parking behaviors, this paper proposes an electric fence-based intelligent scheduling method for rebalancing dockless bike sharing systems. In this dynamic method, which considers the real-time usage of bike sharing systems, each electric fence adjusts its capacity based on real-time information, guiding users to return bikes to the electric fences where they are most urgently needed. Because existing approaches require prespecified models and cannot capture all the intricacies of this dynamic optimization problem, the problem is solved with a model-free intelligent scheduling approach based on deep Q-learning, which can adapt to the changing distributions of customer arrivals, available bikes, bike locations, and user travel times. Finally, a case study of Beihang University shows that the method performs well in rebalancing the bike sharing system and improves both the mean utilization (MU) and customer satisfaction (CS).

Keywords:

bike sharing system; dockless bikes; electric fence; deep Q-learning; intelligent scheduling

1. Introduction

Bike sharing systems (BSS) have existed for nearly 50 years, and both their prevalence and popularity have increased sharply in recent years [1]. Like electric vehicles [2,3], bikes are environmentally friendly, and they have frequently been cited as a solution to the “last mile” problem [1,2,3,4,5]. Public bikes on the market are of two types: docked bikes and dockless bikes. The mainstream form of BSS in China is dockless, which allows users to pick up a bike and park it anywhere they want and is, thus, highly convenient and flexible. However, dockless bikes generate new urban issues. The first is inappropriate parking: some users park bikes at places that are unsuitable as parking spaces (e.g., on a pedestrian street or right next to a metro entrance), with negative consequences [6,7,8]. The second is the imbalance caused by the tidal phenomenon [9]. Because there are no restrictions on parking locations, bikes become spatially imbalanced over time. Moreover, the mismatch between users’ concentrated travel demand and the distribution of bikes is further exacerbated because the number of bikes fluctuates dramatically during rush hours [10,11,12].

To solve the imbalance problem mentioned above, vehicle-based methods and user-based methods have been proposed. In vehicle-based methods, operators deploy trucks to rebalance the bike inventory. For static rebalancing, Chemla et al. presented a relaxation of the original mixed integer programming (MIP) model of the single-vehicle one-commodity capacitated pickup and delivery problem (SVOCPDP) and solved it with a branch-and-cut algorithm [13]. Based on the multiple traveling salesman problem, Dell’Amico et al. described three formulations [12] and solved the problem with a branch-and-cut algorithm that combines the three formulations and invokes separation procedures. Liu et al. studied a dynamic BSS with multiple heterogeneous vehicles, depots, and visits and concluded that an enhanced chemical reaction optimization (CRO) algorithm yielded better solutions than the preliminary CRO [14]. Schuijbroek et al. proposed a mixed-integer programming approach that decomposes multivehicle rebalancing into single-vehicle problems [15]; they also provided a “cluster first, route second” heuristic to reduce its running time. Other static rebalancing studies can be found in [16,17,18,19,20]. Because a static rebalancing plan is difficult to adjust in real time to fluctuating demand [20], dynamic rebalancing solutions began to appear. For dynamic rebalancing, Caggiani et al. carried out dynamic bike rebalancing with a constant gap time, aiming to increase user satisfaction at a low rebalancing cost [21]. To decide how many bikes to reposition and at which stations to carry out the rebalancing, Legros et al. developed an implementable decision-support tool [22]. Mellou et al. proposed a novel mixed-integer programming formulation for the dynamic rebalancing problem and provided a linear programming model that captures the bike flows of all trips [23,24]. A rebalancing framework for the dynamic bike sharing problem was presented in [25]. These methods predict upcoming critical statuses and plan the most effective rebalancing operations using an entirely data-driven approach. However, their ability to predict critical stations is limited by the absence of temporal sequence information, especially for long-term predictions. The rebalancing effect of vehicle-based approaches depends heavily on the accuracy of demand prediction [26]. Additionally, because of the maintenance, traveling, and labor costs of trucks, the truck-based approach can deplete a limited budget rapidly.

User-based methods reposition bikes from the perspective of demand/inventory management [27]. They usually provide incentives or implement regulations to encourage users to participate in bike rebalancing. Existing user-based rebalancing methods can be divided into three types: best-of-two regulation, parking space reservation, and dynamic pricing incentives [27]. Fricker et al. presented a two-choice model in which every user is offered two station choices at the time of a rental and is given an incentive if he/she chooses the less loaded station [28]. They showed that even if only a small portion of users make the intended choice, the number of unbalanced stations is dramatically reduced. Kaspi et al. investigated the parking space reservation problem by comparing complete, partial, and no parking space reservation policies and found that the complete parking reservation policy achieved the lowest total excess time [29]. Dynamic pricing is the most commonly used incentive for responding to rapid changes in bike inventory levels. The V+ scheme described in [30] induces users to avoid certain stations and prefer others: fifteen minutes are added to users’ travel time credit if they return their bikes at one of the hundred uphill stations. Using a fluid approximation, Waserhole et al. also developed a pricing strategy for the imbalance problem [31]. A graph-theoretic approach that rewards users was employed to solve the imbalance problem and maximize the profit of the system [26]. Chemla et al. determined the incentive price at each station dynamically to maximize the bike service level [32]. Pfrommer et al. and Singla et al. considered the current and projected demand–supply conditions in the incentive mechanism design [33,34]. Specifically, Pfrommer et al. combined the vehicle-based and user-based strategies by computing dynamically varying rewards for customers based on the current and predicted bike demand–supply conditions of the system [33], whereas Singla et al. presented a crowdsourcing mechanism based on regret minimization in online learning [34]. Haider et al. also presented an incentive mechanism that encourages users to pick up/drop off bikes at neighboring stations to generate hub stations, which reduces the need for vehicle-based strategies [35]. Li and Liu formulated a mixed-integer nonlinear and nonconvex problem to design the rebalancing strategy under a static scenario. Compared with vehicle-based approaches, user-based approaches offer more flexible ways to rebalance the system [36,37,38].

According to the literature above, current vehicle-based methods for bike rebalancing mainly focus on strategy design, either as a tactical problem (i.e., bike rebalancing under a static scenario) or as an operational problem (i.e., bike rebalancing under a dynamic scenario). However, vehicle-based methods operate on a routine basis, so unmet demand cannot be satisfied until the next rebalancing operation. By incentivizing users to rebalance bikes in real time, user-based rebalancing strategies effectively improve the service level of a bike sharing system. However, the majority of existing user-based rebalancing studies employ a post-price model-based incentive mechanism, which increases the operating cost of the bike sharing system. Meanwhile, to solve the inappropriate parking problem, electric fences have been built to regulate parking behavior. An electric fence is a predetermined “virtual fence” without a physical installation. Users who park bikes outside the allowed areas cannot lock them and continue to be charged [13,39], so the application guides them to proper parking locations. Electric fence policies and technology have been recommended in several important government documents, such as the “National Guidance to Encourage and Regulate the Development of the Internet-based Dockless Bike-sharing Service”, and have been tested in pilot projects in several Chinese cities since early 2017 [40].

Considering the application of electric fences in recent years, we propose a scheduling method based on electric fences. In this dynamic method, which considers the real-time usage of the BSS, electric fences adjust their capacities based on real-time information, guiding users to return bikes to the dynamic electric fences where they are most urgently needed. The proposed model-free intelligent scheduling approach is based on deep Q-learning (DQN) and can adapt to the changing distributions of customer arrivals, active bikes, bike locations, and users’ valuations of total travel time [41,42,43]. To the best of our knowledge, this is the first work that uses electric fences to address the imbalance problem faced by BSS and that casts this problem as a reinforcement learning problem. The remainder of this paper is organized as follows. Section 2 models the bike sharing system to describe its structure and operation and then defines the goals and constraints of our optimization approach. Section 3 presents an intelligent dispatching solution and optimization model based on dynamic electric fences. Finally, Section 4 takes the BSS at Beihang University as an example, derives the scheduling strategy for its electric fences, and verifies the effectiveness of the scheme in AnyLogic.

2. Methodology

A bike sharing system (BSS) is a complex system that includes interactions between users and electric fences.

2.1. System Description

2.1.1. Users

Users are the people in the BSS. When users want to ride somewhere, they usually walk to the bike closest to them and, without external influence, park it at the fence nearest to their destination. However, because urban traffic is concentrated in time and unidirectional in space, unrestricted and unguided user behavior can cause a serious imbalance between user demand and the distribution of bikes. To solve this problem, scheduling strategies are used to lead users to park bikes at electric fences with greater demand.

2.1.2. Electric Fence

An electric fence is a predetermined “virtual fence” without a physical installation [6], and a typical electric fence is shown in Figure 1. As a virtual site based on satellite positioning, radio frequency identification (RFID), and Bluetooth technology, an electric fence can obtain the current location of a bike and match it with the border of the electric fence to determine whether the bike is parked in the correct position. Based on electric fences, operators can use different strategies to guide users during the parking process. With reasonable settings, electric fences can help bike systems find a balance between flexibility and standardization.

Considering the dynamic changes in users’ needs, the capacities of the electric fences in our method are adjustable and serve as the decision variables of this problem. By changing the capacities of electric fences, we can change the distribution of bikes to meet users’ riding needs. According to the actions they take, electric fences are divided into target electric fences and dispatch electric fences. Target electric fences are expanded to attract more bikes, whereas dispatch electric fences reduce their capacities. Based on the current bike distribution and user needs, we increase the areas of target electric fences and reduce the areas of dispatch electric fences to guide users to park their bikes within fences with greater demand, which helps the system balance supply and demand.

2.2. Mathematical Notations

Assuming that there are $N_{P}$ users, $N_{B}$ dockless bikes, and $N_{R}$ electric fences in a given bike sharing system, we define the mathematical notation of the system, which is shown in Table 1.

2.3. The Definitions of Intelligent Scheduling Objectives

Considering both the service ratios of bikes and the service levels of providers, we use two representative metrics, mean utilization (MU) and customer satisfaction (CS), as the objectives of our dispatching scheme.

2.3.1. Mean Utilization (MU)

The MU is the average service ratio of bikes, i.e., the average number of times each bike is used in a specific period, such as an hour, as expressed in (1).

M U = \frac{\sum_{i = 1}^{N_{B}} s_{B i}}{N_{B}}

(1)

where $N_{B}$ is the total number of bikes and $s_{Bi}$ is the number of times that the ith bike is used in a period of time.

Assuming that there are $N_{P}$ users in the system, let $s_{Pi}$ denote the number of times that the ith user wants to use a bike in a period of time. When every such demand is served, the total number of times all bikes are used in that period is $\sum_{i=1}^{N_{P}} s_{Pi}$, so (1) can be rewritten as:

M U = \frac{\sum_{i = 1}^{N_{P}} s_{P i}}{N_{B}}

(2)
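As a quick check of (1) and (2), the Python sketch below computes the MU both from per-bike use counts and from per-user ride counts. The function names and the counts are hypothetical; the second function assumes every riding demand was served, and both divide by the number of bikes $N_{B}$, because (2) only rewrites the numerator of (1).

```python
def mean_utilization(bike_uses: list[int]) -> float:
    """Eq. (1): total number of bike uses divided by the number of bikes N_B."""
    return sum(bike_uses) / len(bike_uses)


def mean_utilization_from_users(user_rides: list[int], n_bikes: int) -> float:
    """Eq. (2): the same total, counted per user, divided by N_B.

    Only valid if every riding demand in `user_rides` was served by a bike.
    """
    return sum(user_rides) / n_bikes


# Hypothetical hour: 4 bikes used 2, 0, 3 and 1 times (6 uses in total),
# serving 3 users who rode 1, 2 and 3 times (also 6 rides).
bike_uses = [2, 0, 3, 1]
user_rides = [1, 2, 3]
print(mean_utilization(bike_uses))                              # 1.5
print(mean_utilization_from_users(user_rides, len(bike_uses)))  # 1.5
```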

2.3.2. Customer Satisfaction (CS)

CS reflects users’ satisfaction with their trips and depends on their walking times on the way to their destinations [5]. During a dispatch, CS is calculated in two steps: first, determine whether users with riding demands can obtain bikes; then, calculate the time they spend reaching their destinations. Customer satisfaction is determined as follows:

C S_{i j} = \begin{cases} 1 - k \times (t_{g_{i j}} + t_{p_{i j}}), & l_{i j} = 1 \\ 0 - k \times t_{w_{i j}}, & l_{i j} = 0 \end{cases}

(3)

C S_{m e a n} = \frac{\sum_{i = 1}^{N_{P}} \sum_{j = 1}^{s_{P i}} C S_{i j}}{N_{P}}

(4)

where $i = 1, 2, \dots, N_{P}$ and $j = 1, 2, \dots, s_{Pi}$, and $l_{ij}$ indicates whether the jth riding demand of the ith user is realized: $l_{ij} = 1$ means that the user rides a bike successfully, and $l_{ij} = 0$ means that the user wants to ride but cannot obtain a bike and walks to the destination. $t_{g}$ is the time it takes a user to find and walk to an available bike, $t_{p}$ is the time it takes the user to walk from the electric fence where the bike is parked to the destination, and $t_{w}$ is the time it takes the user to walk from the departure location to the destination. If the parking electric fence is at the destination, $t_{p} = 0$; otherwise, $t_{p} = D_{P} / V_{P}$, i.e., the remaining walking distance divided by the walking speed. $k$ is a weight coefficient, and $CS_{mean}$ represents users’ overall satisfaction.
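Equations (3) and (4) translate directly into code. In this sketch, the `Demand` record, the weight k = 0.05 per minute, and all times are hypothetical values chosen for illustration; the inner sum runs over each user's own demands.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """One riding demand j of user i; all times in minutes (hypothetical units)."""
    served: bool          # l_ij: True if the user obtained a bike
    t_g: float = 0.0      # time to find and reach an available bike
    t_p: float = 0.0      # walk from the parking fence to the destination (0 if the fence is the destination)
    t_w: float = 0.0      # walking time to the destination when no bike is available


def cs_single(d: Demand, k: float) -> float:
    """Eq. (3): satisfaction of a single riding demand."""
    if d.served:
        return 1.0 - k * (d.t_g + d.t_p)
    return 0.0 - k * d.t_w


def cs_mean(demands_per_user: list[list[Demand]], k: float) -> float:
    """Eq. (4): sum CS_ij over all users and their demands, divided by N_P."""
    total = sum(cs_single(d, k) for user in demands_per_user for d in user)
    return total / len(demands_per_user)


users = [
    [Demand(True, t_g=2.0), Demand(True, t_g=3.0, t_p=1.0)],  # user 1: two rides
    [Demand(False, t_w=12.0)],                                # user 2: no bike, walks
]
print(round(cs_mean(users, k=0.05), 2))  # 0.55
```

The unserved demand contributes a negative term, so every user who cannot find a bike pulls $CS_{mean}$ down, which is what makes it usable as a scheduling objective.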

2.4. The Calculation of Intelligent Scheduling Objectives

The scheduling process of a BSS is shown in Figure 2. Assume that the initial states of the users are (0, 0, …, 0). If users do not want to ride for a short trip or the nearest bike is farther away than their maximum acceptable distance, they walk to the destination, and the walking time is $t_{w}$. Otherwise, they ride a bike, and the time they need is the sum of the time spent walking to the bike ($t_{g}$), cycling ($t_{r}$), and walking from the parking point to their destination ($t_{p}$). The detailed calculation processes of CS and the MU are shown in Figure 3. The two calculations follow the same steps, so CS and the MU are positively correlated: the MU increases as CS increases. Conversely, if a user cannot find an available bike ($l = 0$), the MU is lower, and that user's CS is negative ($-k t_{w}$) and lower than for a ride because $t_{w} > t_{g} + t_{p}$. Given this correlation between the MU and CS, the scheduling goal is simplified to maximizing $CS_{mean}$.
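The walk-or-ride rule above can be sketched as a function that returns whether a demand is served and the resulting trip time. Straight-line distances, the coordinates, the speeds (in metres per minute), and the parameter names are assumptions of this sketch; the paper's own simulation runs in AnyLogic with the settings of Table 2.

```python
import math

Point = tuple[float, float]


def simulate_demand(origin: Point, dest: Point, bikes: list[Point], fence: Point,
                    wants_ride: bool, max_dist_to_bike: float,
                    v_walk: float, v_ride: float) -> tuple[bool, float]:
    """Return (rode, trip_time) for one demand.

    `fence` is the electric fence where the rider parks; distances are Euclidean.
    """
    t_w = math.dist(origin, dest) / v_walk            # walk the whole way
    if not wants_ride or not bikes:
        return False, t_w
    nearest = min(bikes, key=lambda b: math.dist(origin, b))
    if math.dist(origin, nearest) > max_dist_to_bike:
        return False, t_w                               # nearest bike is too far away
    t_g = math.dist(origin, nearest) / v_walk          # walk to the bike
    t_r = math.dist(nearest, fence) / v_ride           # cycle to the parking fence
    t_p = math.dist(fence, dest) / v_walk              # walk from the fence to the destination
    return True, t_g + t_r + t_p


# Hypothetical 1 km campus trip; walking 80 m/min, cycling 250 m/min.
rode, t = simulate_demand((0, 0), (1000, 0), bikes=[(100, 0)], fence=(900, 0),
                          wants_ride=True, max_dist_to_bike=200,
                          v_walk=80, v_ride=250)
print(rode, round(t, 2))  # True 5.7
```

Moving the parking fence closer to the destination shrinks $t_{p}$, which is the lever that fence capacity adjustments act on.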

A discrete-time rebalancing method is used in this paper. At regular intervals, the capacities of the electric fences are adjusted based on the current state, which includes the travel demand of users and the states of the electric fences. The capacity of each electric fence is treated as a decision variable; by changing these capacities, the distribution of bikes can be better matched to users’ needs. Parameters such as the number of users, the number of bikes, the number of electric fences, the locations of electric fences, the walking speed of users, and the speed of bikes are fixed and treated as constraints of the scheduling process.

3. Intelligent Electric Fence Dispatching Algorithm Based on DQN

As described above, the scheduling strategies given by the control system are the core of the scheduling method. We need a set of electric fence scheduling schemes that rebalance the system by guiding users to park bikes in areas with the greatest demand. Because DQN does not require solving a complex explicit model and adapts well to dynamic environments, an electric fence scheduling algorithm based on DQN is proposed in this section. The flow of DQN is shown in Algorithm 1.

Algorithm 1 The flow diagram of DQN [41]

1: Initialize the online network with weights $θ$

2: Initialize the target network with weights $θ^{-}$

3: Initialize the total number of scheduled bikes as $n$

4: Initialize the total number of scheduling requests as $t$

5: Initialize the capacity of the experience pool as $D$

6: Initialize the batch capacity as $m$

7: While episode $< m$, do:

8: Initialize the state of the electric fences as $s_{t}$

9: If num $< n$ and time $< t$:

10: With probability $ε$, select an action $a_{t}$ at random (ε-greedy strategy);

11: otherwise, select $a_{t} = \arg\max_{a} Q(s_{t}, a; θ)$

12: Execute $a_{t}$, observe the reward $r$, and set the next state as $s_{t+1}$

13: Store $(s_{t}, a_{t}, r, s_{t+1})$ in the experience replay

14: Randomly sample $m$ transitions from the experience pool and compute $TargetQ = r + γ \max_{a'} Q(s_{t+1}, a'; θ^{-})$ with the target network

15: Compute $Q(s, a; θ)$ with the online network and the loss $L = E[(TargetQ - Q(s, a; θ))^{2}]$

16: Update the parameters $θ$ of the online network by gradient descent on $L$

17: $s_{t} := s_{t+1}$

18: Every $C$ steps, set $θ^{-} := θ$

19: End if

20: End while

Figure 4 is the flow chart of the electric fences’ training and scheduling processes. The proposed scheduling algorithm includes three parts: (1) training the reinforcement learning model (the DQN-based neural network); (2) obtaining the electric fence scheduling scheme; and (3) evaluating the effectiveness of the scheme.

3.1. Generating Actions through Neural Networks

3.1.1. Forward Propagation

Figure 5 shows the neural network used to obtain the value function, in which the leftmost and rightmost columns are the input and output layers, respectively, and the columns in between are hidden layers. The depth of the neural network is 2; s₁ represents the number of bikes scheduled for the target electric fence at time t, and s₂, s₃, …, s_j represent the numbers of bikes dispatched from the dispatch electric fences. The network outputs action a_k at time t + 1 according to the corresponding Q-values and the ε-greedy strategy. The output of the jth neuron is

Q (s_{t}, a_{j}) = σ (\sum_{k = 0}^{i} W_{j, k}^{l} a_{k}^{l - 1} + b_{j}^{l})

(5)

where $W_{j,k}^{l}$ is the weight (multiplying factor) between the kth element in the (l − 1)th layer and the jth element in the lth layer, $b_{j}^{l}$ is a constant (normally referred to as a threshold or bias), and $a_{k}^{l-1}$ is the input from the kth element of the previous layer. Because of its fast convergence and simple calculation, the rectified linear unit (ReLU) is used as the activation function $σ$:

σ (x) = \max (0, x)

(6)
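A NumPy sketch of the forward propagation in (5) and (6). The layer sizes (three state inputs, two hidden layers of 16 units, and four outputs for the scheduling actions a to d of Section 4.3) and the random initialization are assumptions. Unlike (5), which applies σ at every layer, the sketch leaves the output layer linear so that Q-values can be negative, a common choice when rewards such as CS can be negative.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Eq. (6): sigma(x) = max(0, x), element-wise."""
    return np.maximum(0.0, x)


def init_params(sizes: list[int], seed: int = 0) -> list[tuple[np.ndarray, np.ndarray]]:
    """One (W, b) pair per layer; W has shape (n_out, n_in), biases start at zero."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def q_values(params: list[tuple[np.ndarray, np.ndarray]], state: np.ndarray) -> np.ndarray:
    """Forward propagation: a^l = sigma(W^l a^(l-1) + b^l) in the hidden layers."""
    a = state
    for W, b in params[:-1]:
        a = relu(W @ a + b)
    W_out, b_out = params[-1]
    return W_out @ a + b_out          # linear output layer: one Q-value per action


params = init_params([3, 16, 16, 4])
q = q_values(params, np.array([10.0, 5.0, 5.0]))   # hypothetical state (s1, s2, s3)
print(q.shape)               # (4,)
print(int(np.argmax(q)))     # index of the greedy action
```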

3.1.2. Bias Calculation

Markov decision processes (MDPs) offer a standard formalism for describing sequential decision making in a probabilistic environment. More precisely, an MDP is a discrete-time stochastic control process in which, at each time step, the process is in some state s and the decision maker chooses a feasible action a. The process then moves to a new state s' and gives the decision maker a corresponding reward $r(s, a, s')$. Rewards provide feedback to a reinforcement learning model about the performance induced by previous actions, so the reward must be defined carefully to guide the learning process toward the best policy. In our system, the main goal is to increase customer satisfaction. Thus, we define the reward as follows:

r (s_{t}, a) = C S_{m e a n}

(7)

To perform dispatch actions under different bike states, we use deep Q-networks to dynamically generate optimized action values. This learning technique is widely used in modern decision making because of its adaptability to the dynamic features of a system. The optimal action value function for an electric fence is defined as the maximum expected achievable return, where the return is

R = \sum_{t = 0}^{\infty} γ^{t} r (s_{t}, a_{t})

(8)

and $0 < γ < 1$ is the discount factor for future rewards. At any time slot t, the dispatcher observes the current state $s_{t}$ and feeds it to the neural network to generate an action. Note that we do not use a full representation of $s_{t}$ to compute the expectation; rather, we use a neural network to approximate the Q function.
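The return in (8) can be checked numerically for a finite episode. This sketch reads the discount as $γ^{t}$ (the standard discounted return) and truncates the sum at the end of the episode; the reward sequence is a hypothetical series of $CS_{mean}$ values over five timeslots.

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """Eq. (8) truncated to a finite episode: R = sum_t gamma**t * r_t."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return sum(gamma ** t * r for t, r in enumerate(rewards))


print(discounted_return([1.0, 1.0], gamma=0.5))                 # 1.5
print(discounted_return([0.5, 0.6, 0.4, 0.7, 0.8], gamma=0.9))  # hypothetical CS_mean rewards
```

A smaller γ makes later timeslots count less, so the agent favours dispatches that pay off early in the scheduling period.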

For each electric fence, the action that maximizes the output of the neural network is taken. The learning process starts with zero knowledge, and actions are chosen with the ε-greedy method. After the electric fence chooses an action and receives the reward, the Q-value is updated with a learning factor α as follows:

Q (s_{t}, a) : = Q (s_{t}, a) + α (r (s_{t}, a) + γ \max_{a^{'}} Q (s_{t + 1}, a^{'}) - Q (s_{t}, a))

(9)

where γ is the discount factor defined in (8). This is the Q-learning update, a sample-based form of value iteration; provided that every state–action pair keeps being visited and the learning factor decays appropriately, it converges to the optimal action value function, $Q(s, a) \to Q^{*}(s, a)$ as $t \to \infty$. The action value function can be represented with a neural network, which takes the current system state and action as input and outputs the corresponding Q-value.
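For intuition, the update in (9) can be written for a lookup table before any neural network is involved. The integer state labels, the reward, and the values α = 0.5 and γ = 0.9 are hypothetical; unseen state–action pairs start at zero, matching the "zero knowledge" starting point.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Eq. (9): Q(s,a) := Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error


Q = defaultdict(float)                 # every (state, action) pair starts at 0.0
actions = ["a", "b", "c", "d"]         # scheduling actions of Section 4.3
q_learning_update(Q, s=0, a="c", r=0.8, s_next=1, actions=actions, alpha=0.5, gamma=0.9)
print(Q[(0, "c")])                     # 0.4
```

The DQN replaces the table `Q` with the neural network, because the state space of fence occupancies is too large to enumerate.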

To update the parameters of the neural network, a target value is defined to guide the update process:

T a r g e t Q = r + γ \max_{a_{t + 1}} Q (s_{t + 1}, a_{t + 1}; θ^{-})

where TargetQ is the target Q-value for state s and action a, computed with the parameters $θ^{-}$ of the target network (Section 3.4). The neural network is updated by minimizing the mean square error (MSE):

L (θ) = E [{(T a r g e t Q - Q (s, a; θ))}^{2}]

(10)

3.1.3. Bias Reduction

Since the output of the neural network is Q but the expected target value is TargetQ, the loss of the value network for a single sample is $(TargetQ - Q)^{2}$, and (10) is its expectation. The objective is to find the set of network weights that makes $L(θ)$ as small as possible. This is accomplished with gradient descent, which repeatedly computes the gradient of the loss and moves the weights in the opposite direction until a minimum is approached. Differentiating the loss with respect to the parameters $θ_{i}$ at iteration i gives the gradient, and the parameters are updated with learning rate α as in (11):

θ_{i} : = θ_{i} - α \frac{\partial L (θ)}{\partial θ_{i}}

(11)
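To make (10) and (11) concrete, the sketch below replaces the neural network with a linear Q-function, $Q(s, a; θ) = θ_{a} \cdot s$, so that the gradient can be written by hand. This substitution, the random data, and the learning rate α = 0.05 are assumptions for illustration; as in DQN, TargetQ is treated as a constant when differentiating.

```python
import numpy as np


def mse_and_grad(theta, states, actions, targets):
    """Loss (10) for a linear Q(s, a; theta) = theta[a] @ s, and dL/dtheta.

    theta: (n_actions, n_features); states: (m, n_features);
    actions: (m,) integer indices; targets: (m,) TargetQ values (held constant).
    """
    q = np.einsum("ij,ij->i", theta[actions], states)     # Q(s_i, a_i; theta)
    err = targets - q
    loss = float(np.mean(err ** 2))
    grad = np.zeros_like(theta)
    # dL/dtheta[a] = -(2/m) * sum over samples with action a of err_i * s_i
    np.add.at(grad, actions, (-2.0 / len(err)) * err[:, None] * states)
    return loss, grad


def gradient_step(theta, states, actions, targets, alpha):
    """Eq. (11): theta := theta - alpha * dL/dtheta (returns the pre-step loss)."""
    loss, grad = mse_and_grad(theta, states, actions, targets)
    return theta - alpha * grad, loss


rng = np.random.default_rng(0)
states = rng.normal(size=(32, 3))
actions = rng.integers(0, 4, size=32)
targets = rng.normal(size=32)
theta = np.zeros((4, 3))
initial_loss, _ = mse_and_grad(theta, states, actions, targets)
for _ in range(200):
    theta, loss = gradient_step(theta, states, actions, targets, alpha=0.05)
print(initial_loss, loss)    # the loss shrinks as theta is fitted
```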

3.2. ε-Greedy Strategy

To prevent the electric fences from always selecting the action with the maximum Q-value, the DQN algorithm uses a random exploration strategy (the ε-greedy scheme), which makes electric fences select actions randomly with probability ε and helps prevent the algorithm from settling on a locally optimal solution instead of a globally optimal one. Under this policy, the agent chooses the action with the highest Q-value with probability 1 − ε; otherwise, it selects a random action.
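A minimal ε-greedy selector in Python. The function name and the seeded `random.Random` (used for reproducibility) are assumptions of this sketch. Because a random draw can coincide with the greedy action, the greedy action is actually chosen with probability 1 − ε + ε/|A| for |A| actions.

```python
import random


def epsilon_greedy(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else the arg-max action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


rng = random.Random(0)
q = [0.2, 0.9, 0.1, 0.4]                      # hypothetical Q-values for actions a, b, c, d
picks = [epsilon_greedy(q, 0.1, rng) for _ in range(10_000)]
print(picks.count(1) / len(picks))            # close to 1 - 0.1 + 0.1/4 = 0.925
```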

3.3. Experience Replay

In a DQN, the target value of a sample is computed from the network's own Q-value predictions, so the target depends on the predicted Q-values; in addition, consecutive transitions are strongly correlated. To reduce the correlation between training samples, experience replay is used. For any initial state, an action is selected according to the ε-greedy strategy, an immediate reward is obtained, and the system moves to a new state; the transition $(s_{t}, a_{t}, r, s_{t+1})$ is then stored in the experience replay. Once the experience replay holds enough samples, a batch of samples is drawn from it at random to update the online Q-network.
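Experience replay can be sketched as a bounded buffer with uniform random sampling. The class and field names are hypothetical; `collections.deque(maxlen=...)` drops the oldest transitions once the pool capacity D is reached.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple


class ReplayMemory:
    """Experience pool of capacity D with uniform minibatch sampling."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer = deque(maxlen=capacity)     # oldest transitions are discarded first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state) -> None:
        self.buffer.append(Transition(state, action, reward, next_state))

    def ready(self, batch_size: int) -> bool:
        """Only start training once enough samples have been collected."""
        return len(self.buffer) >= batch_size

    def sample(self, batch_size: int) -> list[Transition]:
        """Draw a random minibatch without replacement, breaking temporal order."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


memory = ReplayMemory(capacity=50)
for t in range(100):                                  # hypothetical transitions
    memory.push((t,), t % 4, 0.0, (t + 1,))
batch = memory.sample(8)
print(len(memory), len(batch))                        # 50 8
```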

3.4. Parameter Update

To reduce the dependence of the target value on the parameters being updated, two neural networks with the same structure but different parameters are established. The parameters of the online network are always up to date, and its training samples come from the experience replay. The target network with parameters $θ^{-}$ is identical to the online network except that its parameters are copied from the online network every τ steps ($θ^{-} = θ$) and kept fixed during all other steps. The training process of the DQN is shown in Figure 6.
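The online/target split can be sketched with a linear stand-in for the network. The class name, the linear Q-function, and the hyperparameters are assumptions; `copy_every` plays the role of τ (C in Algorithm 1), and episode termination is ignored for brevity.

```python
import numpy as np


class LinearDQN:
    """Online parameters theta and a frozen copy theta_target (theta^-)."""

    def __init__(self, n_actions, n_features, gamma=0.9, alpha=0.01, copy_every=100, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(scale=0.1, size=(n_actions, n_features))
        self.theta_target = self.theta.copy()
        self.gamma, self.alpha, self.copy_every = gamma, alpha, copy_every
        self.steps = 0

    def targets(self, rewards, next_states):
        """TargetQ = r + gamma * max_a' Q(s', a'; theta^-), using the frozen copy."""
        return rewards + self.gamma * (next_states @ self.theta_target.T).max(axis=1)

    def train_step(self, states, actions, rewards, next_states):
        """One gradient step on the MSE loss; only the online parameters move."""
        y = self.targets(rewards, next_states)
        q = np.einsum("ij,ij->i", self.theta[actions], states)
        err = y - q
        grad = np.zeros_like(self.theta)
        np.add.at(grad, actions, (-2.0 / len(err)) * err[:, None] * states)
        self.theta -= self.alpha * grad
        self.steps += 1
        if self.steps % self.copy_every == 0:
            self.theta_target = self.theta.copy()   # theta^- := theta every tau steps
        return float(np.mean(err ** 2))


rng = np.random.default_rng(1)
agent = LinearDQN(n_actions=4, n_features=3, copy_every=5)
frozen = agent.theta_target.copy()
for step in range(1, 6):
    s, s2 = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
    a, r = rng.integers(0, 4, size=8), rng.normal(size=8)
    agent.train_step(s, a, r, s2)
    if step < 5:
        assert np.array_equal(agent.theta_target, frozen)   # target still frozen
print(np.array_equal(agent.theta_target, agent.theta))      # True after 5 steps
```

Keeping $θ^{-}$ fixed between copies means the regression target does not shift after every gradient step.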

4. Case Study

Campuses are densely populated, and the tidal character of people’s travel causes a very serious bike imbalance problem there, especially during the noon peak period. Taking the bike sharing system at Beihang University as an example, a scheduling strategy is obtained with the proposed intelligent scheduling method, and its effectiveness is verified in this case study.

4.1. User Travel Data

By analyzing data from questionnaire surveys, a data review, and field observations, users’ travel patterns can be determined. Figure 7 shows the traffic flows of users from classrooms to canteens on working days and weekends. According to Figure 7, students’ travel needs are concentrated and tidal because their daily routines are similar. The travel flows during the breakfast, lunch, and dinner periods increase significantly, and the tidal phenomenon is very obvious on both weekdays and weekends. The tidal phenomenon during lunch is more serious than during breakfast and dinner, so a scheduling scheme is formulated to alleviate the bike imbalance during lunch.

4.2. Simulation Model

Because applying our scheduling strategy in reality would require considerable financial resources, an AnyLogic model of the bike sharing system is built, and the effectiveness of the strategy is verified by applying it in the simulation model. The simulation model interface, which also shows the layout of Beihang University, is shown in Figure 8. By modeling the operations of the BSS at Beihang University, the effects of a scheduling strategy, such as the improvements in CS and the MU, can be evaluated. The details of the model, including riding speed, walking speed, etc., are shown in Table 2.

4.3. Scheduling Strategy

Taking the bike imbalance problem on working days as an example, we simulate the peak travel patterns of students in the simulation model. At noon, a large number of students go from teaching buildings to canteens, which makes it hard for students near teaching buildings to find available bikes. Our scheduling strategies guide users to park their bikes at electric fences near the teaching buildings. In our scheduling strategy, the status of the electric fences is (in_num, out_num1, out_num2), where in_num is the number of bikes that the target electric fence near the teaching building obtains from the dispatch electric fences, and out_num1 and out_num2 are the numbers of bikes obtained from the two dispatch electric fences (in_num = out_num1 + out_num2). There are four scheduling actions for the electric fences: a: (+10, −5, −5), b: (+20, −10, −10), c: (+30, −15, −15), and d: (+40, −20, −20). On working days, the scheduling period runs from 9:00 a.m. to 11:30 a.m. and consists of five half-hour timeslots. On weekends, the scheduling period from 10:00 a.m. to 11:30 a.m. is divided into three timeslots. If n denotes the maximum number of scheduling iterations, a schedule ends at n = 5 on working days and n = 3 on weekends.
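The action set and timeslots above can be encoded directly. This sketch interprets each action as a shift of capacity from the two dispatch fences to the target fence; that interpretation, the function names, and the starting capacities are assumptions rather than details from the paper.

```python
from datetime import datetime, timedelta

# Scheduling actions: (in_num, out_num1, out_num2) for (target, dispatch 1, dispatch 2).
ACTIONS = {
    "a": (+10, -5, -5),
    "b": (+20, -10, -10),
    "c": (+30, -15, -15),
    "d": (+40, -20, -20),
}
# Every action satisfies in_num = out_num1 + out_num2.
for name, (inc, out1, out2) in ACTIONS.items():
    if inc != -(out1 + out2):
        raise ValueError(f"action {name} is not balanced")


def timeslots(start: str, end: str, minutes: int = 30) -> list[str]:
    """Split a scheduling period into equal timeslots."""
    t = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    step = timedelta(minutes=minutes)
    slots = []
    while t < stop:
        slots.append(f"{t:%H:%M}-{t + step:%H:%M}")
        t += step
    return slots


def apply_action(capacities: list[int], action: str) -> list[int]:
    """Apply one action to hypothetical [target, dispatch1, dispatch2] capacities."""
    new = [c + d for c, d in zip(capacities, ACTIONS[action])]
    if min(new) < 0:
        raise ValueError("a dispatch fence cannot drop below zero capacity")
    return new


print(len(timeslots("09:00", "11:30")), len(timeslots("10:00", "11:30")))  # 5 3
print(apply_action([20, 40, 40], "b"))                                       # [40, 30, 30]
```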

The parameters of the DQN model are shown in Table 3. By taking the end condition for dispatches on weekends as 50 bikes, the scheduling scheme corresponding to the tidal phenomenon observed on weekends caused by the flow of students from classrooms to cafeterias is shown in Table 4, and the scheme is a→a→c. The MU and CS values obtained with and without scheduling on weekends are shown in Table 5.
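If the end condition is read as the cumulative number of bikes moved to the target fence (an interpretation, not stated explicitly in the text), the weekend scheme a→a→c can be checked with a few lines of Python: its in_num values add up to 50 bikes within the three weekend timeslots. The helper names are hypothetical.

```python
IN_NUM = {"a": 10, "b": 20, "c": 30, "d": 40}   # in_num of each scheduling action


def cumulative_dispatch(scheme: str) -> list[int]:
    """Running total of bikes moved to the target fence, e.g. for 'a→a→c'."""
    total, running = 0, []
    for step in scheme.replace("→", "->").split("->"):
        total += IN_NUM[step.strip()]
        running.append(total)
    return running


def meets_end_condition(scheme: str, bikes: int, max_slots: int) -> bool:
    """True if the scheme moves at least `bikes` bikes within `max_slots` timeslots."""
    running = cumulative_dispatch(scheme)
    return len(running) <= max_slots and running[-1] >= bikes


print(cumulative_dispatch("a→a→c"))                         # [10, 20, 50]
print(meets_end_condition("a→a→c", bikes=50, max_slots=3))  # True
```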

By taking the end condition for dispatches on weekdays as 80 bikes, the scheduling strategy corresponding to the tidal phenomenon observed on working days is shown in Table 6. After using this scheduling strategy, the obtained MU and CS values on working days are shown in Table 7.

4.4. Results Analysis

4.4.1. The Effect of Scheduling Strategies on CS

To show the effect of the scheduling strategy, it is applied to the BSS in AnyLogic. The distributions of user satisfaction before and after executing five dispatches and the average CS values with and without scheduling are shown in Figure 9, where the red line represents the distribution with dispatches. Analyzing the average CS values with and without scheduling shown in Table 8, we find that users’ satisfaction with scheduling is significantly higher than that without scheduling, which means that scheduling strategies are useful for solving the imbalance problem and improving CS.

4.4.2. The Effect of Scheduling Strategies on the MU

Taking the scheduling strategy with an end condition of 80 bikes as an example, Figure 10 shows the bike utilization rates with and without scheduling. Consistent with the travel flows shown in Figure 7, bike utilization begins to increase at 6:00 a.m. It then decreases because students' similar travel flows from dormitories to canteens result in a bike imbalance. After the morning peak, students' travel patterns become more random, which alleviates the imbalance. According to Figure 7, the travel flow from the classroom to the cafeteria decreases from 9:30 a.m. to 10:30 a.m., which leads to a decline in the MU. Starting at approximately 10:30 a.m., this travel flow increases again. However, because the number of bikes at the electric fence near the classroom is limited, users near the teaching building cannot find available bikes, and an imbalance emerges during the peak period, causing the MU without dispatching to fall below 0.8. With the dispatching strategy, the MU remains higher than without scheduling.

4.4.3. The Loss Function

Figure 11 shows the loss function of the DQN for different learning rates. During training, the loss decreases rapidly in the early iterations and then gradually approaches zero, with some oscillation. This illustrates that the network gradually approximates the Q-value function and that training converges, which suggests that the proposed method settles on a good strategy rather than a poor locally optimal one; a converging loss alone, however, does not guarantee global optimality. Furthermore, the loss converges better with a learning rate of 0.06 than with a learning rate of 0.03.
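The loss in Figure 11 can be read against the standard DQN formulation of Mnih et al. (Nature, 2015), in which the loss is the mean squared temporal-difference (TD) error. The sketch below shows only that generic loss, with plain Python callables standing in for the online and target networks and the discount factor 0.9 taken from Table 3. The toy Q-table and transitions are invented for illustration, and the use of a separate target network is an assumption (passing the same callable twice gives the single-network case).

```python
# Discount factor gamma from Table 3.
GAMMA = 0.9


def dqn_loss(batch, q_online, q_target, gamma=GAMMA):
    """Mean squared TD error over a batch of (s, a, r, s_next, done) transitions.

    q_online and q_target map a state to a list of Q-values, one per action.
    The TD target is r for terminal transitions and r + gamma * max Q_target(s_next) otherwise.
    """
    squared_errors = []
    for state, action, reward, next_state, done in batch:
        target = reward if done else reward + gamma * max(q_target(next_state))
        td_error = target - q_online(state)[action]
        squared_errors.append(td_error ** 2)
    return sum(squared_errors) / len(squared_errors)


# Toy Q-table over 2 states and 4 actions (a, b, c, d); values are illustrative only.
q_table = {0: [0.5, 0.6, 0.7, 0.4], 1: [0.3, 0.2, 0.9, 0.1]}
batch = [(0, 2, 0.5, 1, False), (1, 2, 0.4, 0, True)]
loss = dqn_loss(batch, q_table.__getitem__, q_table.__getitem__)
print(round(loss, 5))  # 0.31105
```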

4.5. Discussion

Without scheduling, MU on weekdays is higher than on weekends because of the greater demand for cycling on weekdays. However, CS on weekdays is lower than on weekends because the imbalance problem on weekdays is more serious. With the proposed method, MU increases by about 19% on weekdays (from 0.505 to 0.603) and about 18% on weekends (from 0.412 to 0.487), and CS increases by about 9% on weekdays (from 0.501 to 0.544) and about 7% on weekends (from 0.528 to 0.563). These results show that the proposed method performs well in alleviating the imbalance problem caused by tidal phenomena and improves both MU and CS.
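The relative improvements can be recomputed from the daily values in Tables 5 and 7 as (after - before) / before. A minimal Python sketch:

```python
# Daily MU and CS (without scheduling, with scheduling) from Tables 5 and 7.
results = {
    "weekdays": {"MU": (0.505, 0.603), "CS": (0.501, 0.544)},
    "weekends": {"MU": (0.412, 0.487), "CS": (0.528, 0.563)},
}


def relative_gain(before, after):
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100


for period, metrics in results.items():
    for metric, (before, after) in metrics.items():
        print(f"{period:9s} {metric}: {before:.3f} -> {after:.3f} ({relative_gain(before, after):+.1f}%)")
```

This yields +19.4% (MU) and +8.6% (CS) on weekdays and +18.2% (MU) and +6.6% (CS) on weekends.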

5. Conclusions

This work focuses on the imbalance problem caused by tidal phenomena in BSSs while attempting to improve their mean utilization and customer satisfaction. Considering the role of electric fences in restricting user parking behaviors, we propose an electric fence-based intelligent scheduling method that uses deep neural networks and reinforcement learning to learn optimal dispatch policies through interaction with the external environment. As a dynamic method that considers the real-time usage of a BSS, the proposed approach combines travel demand statistics with deep learning models to manage electric fences so as to improve mean utilization and customer satisfaction. Adjusting the capacities of electric fences guides users to return bikes to the dynamic electric fences with the largest demand. Taking the campus of Beihang University as an example, we find that the proposed method performs well in alleviating the imbalance problem. On working days, MU improves from 0.505 to 0.603 and CS from 0.501 to 0.544. On weekends, MU improves from 0.412 to 0.487 and CS from 0.528 to 0.563.

Based on electric fences, this paper proposes a new management method for rebalancing bike sharing systems, in which MU and CS are the goals of the rebalancing strategy. In future work, we plan to add operator profits to the optimization objectives to achieve more reasonable management of bike sharing systems. In addition, with the rapid development of the sharing economy, shared electric vehicles and shared e-scooters have also emerged, and future work will include applying this method to shared electric vehicle and e-scooter systems.

Author Contributions

Writing (original draft), L.J.; Manuscript modification, Q.F. and Y.R.; Manuscript proofreading, D.Y., C.Q. and B.S.; Data analysis, Z.L.; AnyLogic model building, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No research data.

Conflicts of Interest

The authors declare no conflict of interest.

References

Shaheen, S.A.; Martin, E.W.; Chan, N.D. Public Bike sharing in North America: Early Operator and User Understanding, MTI Report 11-19. Public Bike Sharing in North America during a Period of Rapid Expansion: Understanding Business Models, Industry Trends & User Impacts; Mineta Transportation Institute: San Jose, CA, USA, 2012.
Wr, P.; Dro, W. Total Cost of Ownership and Its Potential Consequences for the Development of the Hydrogen Fuel Cell Powered Vehicle Market in Poland. Energies 2021, 14, 2131.
Wr, P.; Dro, W. Methodology for Assessing the Impact of Aperiodic Phenomena on the Energy Balance of Propulsion Engines in Vehicle Electromobility Systems for Given Areas. Energies 2021, 14, 2314.
Chen, Z.; van Lierop, D.; Ettema, D. Travel satisfaction with dockless bike-sharing: Trip stages, attitudes and the built environment. Transp. Res. Part D Transp. Environ. 2022, 106, 103280.
Hui, Y.; Xie, Y.; Yu, Q.; Liu, X.; Wang, X. Hotspots Identification and Classification of Dockless Bicycle Sharing Service under Electric Fence Circumstances. J. Adv. Transp. 2022, 5218254.
Chang, S.; Song, R.; He, S.; Qiu, G. Innovative Bike-Sharing in China: Solving Faulty Bike-Sharing Recycling Problem. J. Adv. Transp. 2018, 2018, 4941029.
Shi, J.-g.; Si, H.; Wu, G.; Su, Y.; Lan, J. Critical Factors to Achieve Dockless Bike-Sharing Sustainability in China: A Stakeholder-Oriented Network Perspective. Sustainability 2018, 10, 2090.
Yu, D.-S.; Shang, L.-C. Opportunities and Challenges Faced by Share Economy: Taking Sharing Bicycle as an Example. DEStech Trans. Econ. Bus. Manag. 2018, 6, 254–258.
Maulit, A.; Baiburin, Y.; Rakhymbek, M.; Sadykova, G.; Nugumanova, A. Statistical and Network Analysis of Shared Bikes—In the Case of Almaty Bike. In Proceedings of the 2021 IEEE International Conference on Smart Information Systems and Technologies (SIST), Nur-Sultan, Kazakhstan, 28–30 April 2021.
DeMaio, P. Bike-sharing: History, Impacts, Models of Provision, and Future. J. Public Transp. 2009, 12, 41–56.
Nair, R.; Miller-Hooks, E.; Hampshire, R.C.; Busic, A. Large-Scale Vehicle Sharing Systems: Analysis of Vélib’. Int. J. Sustain. Transp. 2012, 7, 85–106.
Dell’Amico, M.; Hadjicostantinou, E.; Iori, M.; Novellani, S. The bike sharing rebalancing problem: Mathematical formulations and benchmark instances. Omega 2014, 45, 7–19.
Chemla, D.; Meunier, F.; Calvo, R.W. Bike sharing systems: Solving the static rebalancing problem. Discret. Optim. 2013, 10, 120–146.
Liu, Y.; Szeto, W.; Ho, S.C. A static free-floating bike repositioning problem with multiple heterogeneous vehicles, multiple depots, and multiple visits. Transp. Res. Part C Emerg. Technol. 2018, 92, 208–242.
Schuijbroek, J.; Hampshire, R.; van Hoeve, W.-J. Inventory rebalancing and vehicle routing in bike sharing systems. Eur. J. Oper. Res. 2017, 257, 992–1004.
Szeto, W.; Shui, C. Exact loading and unloading strategies for the static multi-vehicle bike repositioning problem. Transp. Res. Part B Methodol. 2018, 109, 176–211.
Bulhões, T.; Subramanian, A.; Erdoğan, G.; Laporte, G. The static bike relocation problem with multiple vehicles and visits. Eur. J. Oper. Res. 2018, 264, 508–523.
Jia, H.; Miao, H.; Tian, G.; Zhou, M.; Feng, Y.; Li, Z.; Li, J. Multiobjective Bike Repositioning in Bike-Sharing Systems via a Modified Artificial Bee Colony Algorithm. IEEE Trans. Autom. Sci. Eng. 2019, 17, 909–920.
Tang, Q.; Fu, Z.; Qiu, M. A Bilevel Programming Model and Algorithm for the Static Bike Repositioning Problem. J. Adv. Transp. 2019, 8641492.
Contardo, C.; Morency, C.; Rousseau, L.-M. Balancing a Dynamic Public Bike-Sharing System; Cirrelt: Montreal, QC, Canada, 2012.
Caggiani, L.; Camporeale, R.; Ottomanelli, M.; Szeto, W.Y. A modeling framework for the dynamic management of free-floating bike-sharing systems. Transp. Res. Part C Emerg. Technol. 2018, 87, 159–182.
Legros, B. Dynamic repositioning strategy in a bike-sharing system; how to prioritize and how to rebalance a bike station. Eur. J. Oper. Res. 2018, 272, 740–753.
Mellou, K.; Jaillet, P. Dynamic Resource Redistribution and Demand Estimation: An Application to Bike Sharing Systems. SSRN Electron. J. 2019, 1–58.
Erdoğan, G.; Laporte, G.; Calvo, R.W. The One Commodity Pickup and Delivery Traveling Salesman Problem with Demand Intervals. Eur. J. Oper. Res. 2013, 238, 451–457.
Cipriano, M.; Colomba, L.; Garza, P. A data-driven based dynamic rebalancing methodology for bike sharing systems. Appl. Sci. 2021, 11, 6967.
Ramesh, A.A.; Nagisetti, S.P.; Sridhar, N.; Avery, K.; Bein, D. Station-level Demand Prediction for Bike-Sharing System. In Proceedings of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 27–30 June 2021; pp. 916–921.
Shui, C.S.; Szeto, W.Y. A review of bicycle-sharing service planning problems. Transp. Res. Part C Emerg. Technol. 2020, 117, 102648.
Fricker, C.; Gast, N. Incentives and redistribution in homogeneous bike-sharing systems with stations of finite capacity. EURO J. Transp. Logist. 2016, 5, 261–291.
Kaspi, M.; Raviv, T.; Tzur, M. Parking reservation policies in one-way vehicle sharing systems. Transp. Res. Part B Methodol. 2014, 62, 35–50.
Fricker, C.; Gast, N.; Mohamed, H. Mean field analysis for inhomogeneous bike sharing systems. Discret. Math. Theor. Comput. Sci. 2012.
Waserhole, A.; Jost, V.; Brauner, N. Vehicle Sharing System Pricing Regulation: A Fluid Approximation; HAL Id: Hal-00727041; HAL Open Access: Lyon, France, 2013.
Chemla, D.; Pradeau, T.; Calvo, R.W. Self-Service Bike Sharing Systems: Simulation, Repositioning, Pricing; HAL Open Access: Lyon, France, 2013.
Pfrommer, J.; Warrington, J.; Schildbach, G.; Morari, M. Dynamic vehicle redistribution and online price incentives in shared mobility systems. IEEE Trans. Intell. Transp. Syst. 2014, 15, 1567–1578.
Singla, A.; Santoni, M.; Bartók, G.; Mukerji, P.; Meenen, M.; Krause, A. Incentivizing users for balancing bike sharing systems. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 1, pp. 723–729.
Haider, Z.; Nikolaev, A.; Kang, J.E.; Kwon, C. Inventory rebalancing through pricing in public bike sharing systems. Eur. J. Oper. Res. 2018, 270, 103–117.
Singla, A.; Santoni, M.; Bartók, G.; Mukerji, P.; Meenen, M.; Krause, A. Incentivizing users for balancing bike sharing systems. Proc. Natl. Conf. Artif. Intell. 2015, 1, 723–729.
Pan, L.; Cai, Q.; Fang, Z.; Tang, P.; Huang, L. A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1393–1400.
Zhang, J.; Meng, M.; David, Z.W. A dynamic pricing scheme with negative prices in dockless bike sharing systems. Transp. Res. Part B Methodol. 2019, 127, 201–224.
Park, C.; Sohn, S.Y. An optimization approach for the placement of bicycle-sharing stations to reduce short car trips: An application to the city of Seoul. Transp. Res. Part A Policy Pract. 2017, 105, 154–166.
Rahman, S.-U.; Smith, D.K. Use of location-allocation models in health service development planning in developing nations. Eur. J. Oper. Res. 2000, 123, 437–452.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
Chiu, W.-Y.; Hu, C.-W.; Chiu, K.-Y. Renewable Energy Bidding Strategies Using Multiagent Q-Learning in Double-Sided Auctions. IEEE Syst. J. 2021, 16, 985–996.
Luo, Y.; Chin, K.-W. Learning to Charge RF-Energy Harvesting Devices in WiFi Networks. IEEE Syst. J. 2021, 15, 5516–5525.

Figure 1. A typical electric fence.

Figure 2. Scheduling process of a bike sharing system (BSS).

Figure 3. The calculation of Customer Satisfaction (CS) and the Mean Utilization (MU).

Figure 4. Optimization process of electric fence scheduling.

Figure 5. The neural network of the deep Q-network (DQN).

Figure 6. DQN learning process.

Figure 7. Peak travel flows from the classroom to other places (not the canteen) on working days and weekends.

Figure 8. Interface of the campus environment simulation.

Figure 9. Users’ satisfaction with five scheduling actions. The horizontal axis indicates users’ instantaneous satisfaction, and the vertical axis denotes probability (Blue columns represent users’ satisfaction without scheduling, and red columns represent users’ satisfaction with scheduling).

Figure 10. Mean utilization of bikes with and without scheduling strategies.

Figure 11. DQN loss function for learning rates lr = 0.03 and lr = 0.06.

Table 1. The mathematical model of a bike sharing system.

Notation	Description
$N_{P}$	the number of users
$S_{P}$	$S_{P} = (s_{P_1}, s_{P_2}, \dots, s_{P_i}, \dots, s_{P_{N_P}})$: the states of the users ($s_{P_i}$ denotes the number of times that the $i$th user wants to use a bike in a period of time)
$V_{P}$	the walking speed of a user
$T_{P}$	the maximum walking time that can be accepted by users during a ride
$N_{B}$	the number of bikes
$S_{B}$	$S_{B} = (s_{B_1}, s_{B_2}, \dots, s_{B_i}, \dots, s_{B_{N_B}})$: the states of the bikes ($s_{B_i}$ denotes the number of times that the $i$th bike is used in a period of time)
$V_{B}$	the speed of a bike
$N_{R}$	the number of electric fences
$S_{R}$	$S_{R} = (s_{R_1}, s_{R_2}, \dots, s_{R_i}, \dots, s_{R_{N_R}})$: the states of the electric fences. $s_{R_1}$ is the state of the target electric fence (the total number of bikes dispatched to it from the dispatch electric fences); $s_{R_2}, \dots, s_{R_i}, \dots, s_{R_{N_R}}$ are the states of the dispatch electric fences ($s_{R_i}$ is the total number of bikes dispatched from the $i$th electric fence to the target electric fence). The initial states of the electric fences are $(0, 0, \dots, 0)$.
$a_{R}$	$a_{R} = (a_{R_1}, a_{R_2}, \dots, a_{R_i}, \dots, a_{R_{N_R}})$: the actions of the electric fences. $a_{R_1}$ is the action of the target electric fence (the number of bikes dispatched to it); $a_{R_2}, \dots, a_{R_i}, \dots, a_{R_{N_R}}$ are the actions of the dispatch electric fences ($a_{R_i}$ is the number of bikes dispatched from the $i$th dispatch electric fence). The initial actions are $(0, 0, \dots, 0)$. Since the bikes entering the target electric fence come from the dispatch electric fences, $a_{R_1} + a_{R_2} + \dots + a_{R_{N_R}} = 0$.
$D_{w}$	the distance from the departure location to the destination
$t_{w}$	the time it takes for users to walk from the departure location to the destination, $t_{w} = D_{w} / V_{P}$
$D_{g}$	the distance from the departure location to the nearest available bike
$t_{g}$	the time it takes for users to find an available bike, $t_{g} = D_{g} / V_{P}$
$D_{r}$	the distance from the nearest available bike to the target electric fence
$t_{r}$	the time it takes for users to ride from the closest bike to the target fence, $t_{r} = D_{r} / V_{B}$
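The time definitions in Table 1 are distance-over-speed ratios. The sketch below evaluates them with the speeds of Table 2 for a hypothetical trip whose distances are invented for illustration; how these times enter the CS calculation is defined earlier in the paper and is not reproduced here.

```python
# Speeds from Table 2, converted from km/h to metres per minute.
V_P = 5 * 1000 / 60   # walking speed, about 83.3 m/min
V_B = 15 * 1000 / 60  # bike speed, 250 m/min


def trip_times(d_w, d_g, d_r):
    """Return (t_w, t_g, t_r) in minutes for distances in metres (Table 1):
    t_w = D_w / V_P, t_g = D_g / V_P, t_r = D_r / V_B."""
    return d_w / V_P, d_g / V_P, d_r / V_B


# Hypothetical trip: 1500 m on foot versus 200 m to the nearest bike plus a 1200 m ride.
t_w, t_g, t_r = trip_times(1500, 200, 1200)
print(f"walk {t_w:.1f} min; find bike {t_g:.1f} min + ride {t_r:.1f} min = {t_g + t_r:.1f} min")
# The 500 m walking limit in Table 2 corresponds to 500 / V_P minutes of walking.
print(f"500 m limit: {500 / V_P:.1f} min")
```

For this hypothetical trip, walking takes 18.0 min, whereas finding a bike and riding takes 2.4 + 4.8 = 7.2 min; the 500 m limit corresponds to 6.0 min of walking.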

Table 2. The parameters of the simulation model.

Parameter	Value
The number of users	25,000
The walking speed of a user	5 km/h
The maximum walking time that can be accepted by users during a ride	500 m (about 6 min at 5 km/h)
The number of bikes	2000
The speed of a bike	15 km/h

Table 3. The parameters for the DQN method.

Parameter	Value
Learning rate	0.1
Discount factor	0.9
Attenuation factor	0.9
Batch size	32
Episode	1200

Table 4. Electric fence scheduling strategy based on reinforcement learning during weekends.

No.	Schedule1	Schedule2	Schedule3
action	(+10, −5, −5)	(+10, −5, −5)	(+30, −15, −15)
state	(+10, 5, 5, 1)	(+20, 10, 10, 2)	(+50, 25, 25, 3)
reward(α)	0.407	0.412	0.521
sum_reward	0.407	0.819	1.34

Table 5. The MU and CS on weekends.

	Without Scheduling	With Scheduling
Mean Utilization (day)	0.412	0.487
Customer Satisfaction (day)	0.528	0.563

Table 6. Electric fence scheduling strategy based on reinforcement learning during working days.

No.	Schedule1	Schedule2	Schedule3	Schedule4	Schedule5
action	(+10, −5, −5)	(+40, −20, −20)	(+20, −10, −10)	(+20, −10, −10)	none
state	(+10, 5, 5, 1)	(+50, 25, 25, 2)	(+70, 35, 35, 3)	(+90, 45, 45, 4)	(+90, 45, 45, 4)
reward	0.356	0.509	0.617	0.522	none
sum_reward	0.356	0.865	1.482	2.004	2.004

Table 7. The MU and CS on working days.

	Without Scheduling	With Scheduling
Mean Utilization (day)	0.505	0.603
Customer Satisfaction (day)	0.501	0.544

Table 8. CS with and without scheduling.

No.	S1	S2	S3	S4	S5
Without scheduling	0.5525	0.4665	0.4975	0.4660	0.5610
With scheduling	0.6980	0.5725	0.5850	0.7040	0.6890


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
