Article

Smart Carbon Emission Scheduling for Electric Vehicles via Reinforcement Learning under Carbon Peak Target

1 Department of Intelligent Science and Information Law, East China University of Political Science and Law, Shanghai 200042, China
2 School of Electronic Information and Electrical Engineering, China Institute for Smart Court, Shanghai Jiao Tong University, Shanghai 200240, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(19), 12608; https://doi.org/10.3390/su141912608
Submission received: 26 August 2022 / Revised: 21 September 2022 / Accepted: 29 September 2022 / Published: 4 October 2022

Abstract

Electric vehicles (EVs) have become popular in daily life, influencing carbon dioxide emissions and reshaping community load curves. It is therefore crucial to study efficient carbon emission scheduling algorithms that lessen the influence of EVs' charging demand on carbon dioxide emissions and reduce the carbon emission cost for EVs arriving in the community. We study an electric vehicle (EV) carbon emission scheduling problem to shave the peak community load and reduce the carbon emission cost when future EV data are unknown. First, we investigate an offline carbon emission scheduling problem that minimizes the carbon emission cost of the community when the future data of incoming EVs are known. Then, we study the online carbon emission problem and propose an online carbon emission algorithm based on a heuristic rolling algorithm. Furthermore, we propose a reinforcement learning smart carbon emission algorithm (RLSCA) to obtain the dispatching plan of the charging carbon emission of EVs. Last but not least, simulation results show that our proposed algorithm can reduce the carbon emission cost by 21.26%, 16.60%, and 8.72% compared with three benchmark algorithms.

1. Introduction

With increasing awareness of the need for environmental protection and the support of national policy, people are more willing to use electric vehicles (EVs); Chinese national policy in particular has contributed to their rapid adoption. Unlike the base community load, EVs have intermittent charging demands, which is a remarkable feature [1]. Because of this intermittency and the unknown future charging demand, it is crucial and challenging to design online electric vehicle (EV) carbon emission scheduling strategies to shave the peak load [2]. According to the IPCC report on climate change, limiting the temperature rise to 1.5 degrees Celsius requires reaching carbon neutrality around 2050. To achieve carbon peak and neutrality, we need optimal scheduling methods and green technologies to control carbon emissions. In this work, we study an online carbon emission scheduling algorithm under the constraint of carbon peak.
There is a large body of literature on EV charging scheduling and the design of EV charging algorithms. Mou et al. [3] proposed a dynamic wireless charging system powered by wind energy with traffic-flow-based charging demand prediction to cut down carbon emissions. Al-Hanahi et al. [4] investigated optimal charging problems for EVs supported by on-route public charging stations to reduce carbon emissions. For dynamic charging demand, online EV charging problems with carbon emission are studied in the following works. A linear function approximator is proposed in [5] for the charging scheduling state-value function to optimize pricing strategies for a public EV charging station. Ren et al. [6] studied a low-carbon schedule model with constraints on carbon emission costs and rights to reduce the total system carbon emissions. The references [3,4,5,6] discussed the EV charging scheduling problem with the target of reducing carbon emissions, but did not emphasize the EV carbon emission scheduling problem. We formulate our EV carbon emission model from these works.
The log-mean divisia index method is utilized in [7] to analyze the factors influencing road transport carbon emission reduction in eight policy scenarios. Vehicle carbon emissions were calculated in [8] based on monthly average energy consumption and the carbon emissions of electricity generation. The carbon emission monetary value method is utilized in [9] to convert the carbon emission reduction problem into a cost optimization problem, the better to evaluate energy transition. An online EV energy schedule model with an evolutionary method was proposed in [10] to manage energy storage. Yang et al. [11] proposed a layered transactive energy model to coordinate EV charging demand and energy-resource generation in distributed power systems. Masoum et al. [12] proposed an online EV charging fuzzy coordination algorithm to reduce the total energy cost and power losses. Jian et al. [13] proposed a valley-filling policy for the large-scale EV charging problem. Dong et al. [14] developed a charging price strategy based on a model of EV mobility for EV fast charging stations. Wang et al. [15] proposed a dynamic traffic assignment model to optimize the spatiotemporal distribution of EV flows on roads to enhance renewable energy absorption and reduce carbon emissions. Bilh et al. [16] proposed an EV charging algorithm that exploits the flexibility of EV loads to flatten the power fluctuation caused by wind energy. The studies [14,15,16] depend on specific models and scenarios, which are not completely applicable to the carbon emission scenario. We formulate our EV online carbon emission model from these works. We aim to develop a less model-dependent online carbon emission algorithm, and we utilize a model-free method to design the optimal EV carbon emission strategy. To our knowledge, this paper is the first work to propose an EV online carbon emission model under the constraint of dynamic charging demand.
Some works use reinforcement learning (RL) methods to solve EV scheduling problems. A deep Q-network-based energy and emission management strategy and two distributed deep RL algorithms for hybrid electric vehicles were proposed to enhance training efficiency [17]. A double-Q-learning-based arbitrage strategy is proposed in [18] to reduce operation costs. Standard Q-learning methods have discrete charging action values, which may not achieve the optimal plan over a practical continuous action space. Therefore, in our work, we do not discretize the decisions and aim at achieving the optimal EV charging carbon emission dispatching plan with continuous scheduling actions. We utilize the actor–critic (AC) method [19] with continuous scheduling actions to solve the practical scheduling problem with continuous states and actions. Some works study the EV charging scheduling problem with the AC method. An online incentive RL-based algorithm was proposed in [20] to schedule electricity to shave the peak and enhance the reliability of the power system. Bahrami et al. [21] developed an online scheduling algorithm based on the AC method, which can lower the peak-to-average ratio and users' cost. A policy-gradient method with continuous actions is proposed in [22] to solve a resource allocation problem. However, the reinforcement learning algorithms in the studies [17,18,19,20,21,22] are not completely applicable to continuous EV charging scheduling in the carbon emission scenario. We therefore design an improved AC algorithm to achieve continuous EV charging carbon emission values. A deep reinforcement learning method was utilized in [23] to solve the optimal dispatch problem of electricity–gas systems and improve scheduling efficiency; this is the research closest in nature to our own. To our knowledge, this work is the first to design an EV carbon emission problem based on an improved AC algorithm to achieve continuous charging emissions under carbon peak.
We aim to study a carbon emission scheduling algorithm for EVs arriving in a community; with this algorithm, we can dispatch the EVs in response to the day-ahead price. Scheduling EV charging carbon emissions can reduce both the system cost and the fluctuation of the total load. We design the carbon emission schedule as a Markov decision process, considering uncertainty about the price and loads in the community. We first propose an offline EV carbon emission scheduling problem to minimize the carbon emission cost when the future data are known. Then, we work on the online carbon emission problem and apply a heuristic rolling algorithm to solve it. Furthermore, we propose a smart charging carbon emission scheduling algorithm via the improved AC method to achieve an optimal EV charging action with good convergence [24]. Finally, we compare the average cost per EV of the RLSCA algorithm with three benchmark algorithms. The contributions of our paper are summarized in the following three points.
  • A carbon emission scheduling model that considers the random arrival of EVs with power constraints is formulated to minimize the carbon emission cost in accord with the community’s known EV profiles. Our model is practical and easily adaptable to different scenarios.
  • We propose an online carbon emission scheduling algorithm based on a heuristic rolling algorithm by making carbon emission decisions according to the current EV profiles, such as arrival time and state of charge. This serves as a baseline by which to evaluate our proposed learning-based algorithm.
  • This work proposes a reinforcement learning-based smart charging carbon emission scheduling algorithm (RLSCA) via the asynchronous advantage AC method. The simulation results show that the RLSCA algorithm can reduce the expected carbon emission cost of EVs by 21.26%, 16.60%, and 8.72%, compared with three benchmark algorithms, respectively.
The rest of this work contains the following five parts. Section 2 studies the offline carbon emission scheduling problem. Section 3 reformulates the offline carbon emission schedule as an online carbon emission scheduling problem. Section 4 presents and analyzes the reinforcement learning algorithm RLSCA. Section 5 shows the simulation performance and analysis. Section 6 concludes the paper and discusses future work.

2. Offline Carbon Emission Scheduling Problem

In this part, we study the optimal offline carbon emission scheduling problem, in which the future data are known ahead of time.

2.1. System Model

As shown in Figure 1, we consider a community carbon emission model with enough charging piles, and we aim to minimize the carbon emission cost of this community. In particular, we study the charge and discharge scheduling of electric vehicle batteries over a time horizon of $T$ equal-length intervals, that is, $t = 1, 2, \ldots, T$. Without loss of generality, we assume EVs come to the community in the order 1 to $N$. We denote the state of charge (SoC) of EV $k$ at time slot $t$ as $SoC_{k,t}$; the initial SoC of EV $k$ is $SoC_{k,0}$, and the expected SoC at its departure time is $SoC_{k,end}$. The arrival time of EV $k$ is denoted as $t_k^{arr}$ and its departure time as $t_k^{dep}$. We set the charging demand of EV $k$ as $D_k$. In the offline scheduling problem, the profiles $D_k$, $t_k^{arr}$, and $t_k^{dep}$ are known to the community when EV $k$ arrives. The expected SoC value should be reached before the EV leaves.
According to the practical constraints of the battery, the carbon emission of EV $k$ corresponding to its charging energy is denoted as $b_k(t)$, and the carbon emission per time slot has a range limit
$$b_k(t) \in [0, b_{k,\max}], \qquad (1)$$
where $b_{k,\max}$ is the upper limit of EV $k$'s carbon emission in each time slot. We denote the battery capacity of EV $k$ as $B_{k,\max}$ and let $G(t)$ be the set of EVs in the community at time slot $t$. $U(t)$ is the set of time slots from $t$ to $\bar{t}$, where $\bar{t}$ is the latest departure time among the EVs in $G(t)$,
$$U(t) = \{\, \bar{t} \mid \bar{t} \ge t \ \text{and} \ \bar{t} \le \max\{ t_k^{dep} \mid k \in G(t) \} \,\}. \qquad (2)$$
We explain the concepts of $G(t)$ and $U(t)$ with the example shown in Figure 2. There are five EVs in the community in this example. At time slot 4, four EVs are in the community (that is, $G(4) = \{2, 3, 4, 5\}$), the maximum service time of the EVs parked in the community runs from slot 4 to slot 9 (that is, $U(4) = \{4, 5, 6, 7, 8, 9\}$), and EV 5 is the last one to leave the community, in time slot 9. The community can control the charging rate $b_k(t)$. We denote the charging load of the community at time slot $t$ as $w_{ce}(t)$,
$$w_{ce}(t) = \sum_{k \in G(t)} b_k(t). \qquad (3)$$
We denote the inelastic base load of the community (for example, washing machines or lighting), which follows a certain distribution, as $w_b(t)$. The total community load $W(t)$ at time slot $t$ is
$$W(t) = w_{ce}(t) + w_b(t). \qquad (4)$$
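To make the Figure 2 example concrete, the following sketch recomputes $G(4)$ and $U(4)$; the exact arrival and departure slots below are assumptions chosen to be consistent with the description in the text, since the paper gives them only graphically.

```python
# A small check of the Figure 2 example. The (arrival, departure) stay
# intervals below are assumed; they match the text but are not taken
# verbatim from the figure.
stay = {1: (1, 3), 2: (2, 6), 3: (3, 7), 4: (4, 8), 5: (4, 9)}

def G(t):
    """Set of EVs parked in the community at time slot t."""
    return {k for k, (arr, dep) in stay.items() if arr <= t <= dep}

def U(t):
    """Time slots from t up to the latest departure among EVs in G(t)."""
    latest = max(stay[k][1] for k in G(t))
    return set(range(t, latest + 1))

print(G(4))   # {2, 3, 4, 5}, as stated in the text
print(U(4))   # {4, 5, 6, 7, 8, 9}
```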
The total community load $W(t)$ has an upper bound $L_{\max}$, and the unit price can be formulated as a linear function of the load (that is, $\xi_0 + 2\xi_1 z$), where $\xi_0$ and $\xi_1$ are nonnegative constants. The carbon emission cost $c(t)$ of the community is then
$$c(t) = \int_{w_b(t)}^{W(t)} (\xi_0 + 2\xi_1 z)\, dz = \xi_0\big(W(t) - w_b(t)\big) + \xi_1\big[(W(t))^2 - (w_b(t))^2\big] = \xi_0 w_{ce}(t) + \xi_1\big[w_{ce}(t)^2 + 2 w_{ce}(t) w_b(t)\big] = \xi_0 \sum_{k \in G(t)} b_k(t) + \xi_1\Big(\sum_{k \in G(t)} b_k(t)\Big)^2 + 2\xi_1 w_b(t) \sum_{k \in G(t)} b_k(t), \qquad (5)$$
where the carbon emission cost $c(t)$ is the integral of the unit price $\xi_0 + 2\xi_1 z$ over the charging load. Let $J$ denote the index set of the time intervals from 1 to $T$, and let $e_j$ ($j \in J$) denote the length of the $j$th interval. Then we define the total bill paid by the community over the period $T$ as
$$C = \sum_{t=1}^{T} c(t). \qquad (6)$$
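As a quick illustration of the cost model, the sketch below evaluates Equations (3)-(6) for a toy schedule; the charging rates, base load, and price coefficients $\xi_0$, $\xi_1$ are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the cost model in Section 2.1; all numbers are
# illustrative assumptions.
xi0, xi1 = 0.05, 0.001                    # nonnegative price coefficients

# b[k][t]: carbon emission of EV k in slot t (zero when the EV is absent)
b = {1: {0: 2.0, 1: 1.5}, 2: {1: 3.0, 2: 3.0}}
w_b = [10.0, 12.0, 11.0]                  # inelastic base load per slot

def cost(t):
    """c(t) in (5): the integral of the unit price xi0 + 2*xi1*z
    from w_b(t) to W(t) = w_b(t) + w_ce(t)."""
    w_ce = sum(rates.get(t, 0.0) for rates in b.values())   # eq. (3)
    return xi0 * w_ce + xi1 * (w_ce**2 + 2 * w_ce * w_b[t])

C = sum(cost(t) for t in range(len(w_b)))  # eq. (6): total cost
print(f"total carbon emission cost C = {C:.4f}")
```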

2.2. Problem Formulation

We then formulate the carbon emission scheduling problem to minimize the total carbon emission cost as follows:
$$\min_{b_k(t)} \ \sum_{t \in J} \Big[ \sum_{k \in G(t)} \big( \xi_0 b_k(t) + 2\xi_1 b_k(t) w_b(t) \big) + \xi_1 \Big( \sum_{k \in G(t)} b_k(t) \Big)^2 \Big] e_j \qquad (7)$$
$$\text{s.t.} \quad \sum_{t = t_k^{arr}}^{t_k^{dep}} b_k(t)\, e_j = D_k / \eta, \quad k = 1, 2, \ldots, N,$$
$$0 \le b_k(t) \le b_{k,\max},$$
$$0 \le W(t) \le L_{\max},$$
where $\eta$ is the conversion coefficient between electricity consumption and carbon emissions. If $w_b(t)$, $t_k^{arr}$, $t_k^{dep}$, $b_{k,\max}$, and $D_k$ are known in advance, we can achieve the optimal carbon emission schedule $b_k(t)$ by solving Problem (7). Next, we reformulate the offline problem as an online carbon emission problem in which the future profiles are unknown.
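Since Problem (7) is a convex quadratic program, it can be prototyped directly in CVXPY, the toolbox later used for OA in Section 5. The sketch below is one such formulation under assumed data; the horizon, demands, and coefficients are illustrative, not the paper's settings.

```python
# A sketch of offline Problem (7) in CVXPY; all parameter values are
# illustrative assumptions.
import numpy as np
import cvxpy as cp

T, N = 24, 3                        # time slots and number of EVs
xi0, xi1, eta = 0.05, 0.001, 1.0    # price coefficients, conversion factor
e_j, L_max, b_max = 1.0, 60.0, 3.2  # interval length, load cap, rate cap
w_b = 10.0 + 2.0 * np.random.rand(T)        # base load per slot
arr, dep = [0, 4, 8], [10, 14, 20]          # arrival/departure slots
D = [12.0, 8.0, 10.0]                       # charging demands

b = cp.Variable((N, T), nonneg=True)        # b[k, t]: emission schedule
w_ce = cp.sum(b, axis=0)                    # aggregate charging load, eq. (3)
cost = cp.sum(cp.multiply(xi0 + 2 * xi1 * w_b, w_ce)
              + xi1 * cp.square(w_ce)) * e_j

cons = [b <= b_max, w_ce + w_b <= L_max]    # rate and total-load limits
for k in range(N):
    outside = np.ones(T)
    outside[arr[k]:dep[k] + 1] = 0.0        # EV k charges only while parked
    cons += [cp.multiply(b[k], outside) == 0,
             cp.sum(b[k]) * e_j == D[k] / eta]

cp.Problem(cp.Minimize(cost), cons).solve()
print("optimal offline cost:", cost.value)
```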

3. Online Carbon Emission Scheduling Algorithm

In this section, we design a heuristic online algorithm (OA) that works in a rolling fashion. We schedule the carbon emission without considering the demand of EVs that have not yet arrived. We assume that the inelastic load in the community does not change within a time slot; in practice, the time interval can be made smaller.
Figure 3 shows an illustrative example. An EV comes to the community at time slot $t_s$. We set $U(t_s)$ as the set of interval indices at time slot $t_s$, and $e_j(t_s)$ is the length of the $j$th interval, $j \in U(t_s)$. For instance, in Figure 3, $U(t_s) = \{t_3^{dep}, t_1^{dep}, t_2^{dep}\}$ and $e_1(t_s) = t_3^{dep} - t_s$, $e_2(t_s) = t_1^{dep} - t_3^{dep}$, $e_3(t_s) = t_2^{dep} - t_1^{dep}$. We define $G(j, t_s)$ as the set of EVs that have arrived by time slot $t_s$ and stay in the community during interval $j$, $j \in U(t_s)$. We denote the set of time interval indices during which EV $k$ stays in the community as $Q(k, t_s)$. The charging rate of EV $k$ is $b_k(t)$, $t \in U(t_s)$, which is constant within each time interval. The online carbon emission optimization problem is then formulated as follows:
$$\min_{b_k(t)} \ \sum_{t \in U(t_s)} \Big[ \sum_{k \in G(j, t_s)} \big( \xi_0 b_k(t) + 2\xi_1 b_k(t) w_b(t_s) \big) + \xi_1 \Big( \sum_{k \in G(j, t_s)} b_k(t) \Big)^2 \Big] e_j(t_s) \qquad (8)$$
$$\text{s.t.} \quad \sum_{j \in Q(k, t_s)} b_k(j)\, e_j(t_s) = D_k(t_s), \quad k \in G(t_s),$$
$$0 \le b_k(t) \le b_{k,\max}, \quad k \in G(t_s), \ t \in Q(k, t_s),$$
$$0 \le W(t) \le L_{\max}.$$
The community follows the optimal carbon emission schedule of Problem (8) as long as no new EVs arrive. When the base load in the community changes or a new EV arrives, the system updates $w_b(t_s)$, $G(j, t_s)$, $Q(k, t_s)$, and $D_k(t_s)$ and re-solves Problem (8).
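The rolling procedure can thus be summarized as a small event loop: whenever a trigger occurs, the community re-solves Problem (8) over the remaining horizon and commits only the current slot. The sketch below shows this skeleton; `solve_problem_8` is a hypothetical stand-in for a convex solve of Problem (8), whose internals are not specified here.

```python
# A sketch of the OA rolling loop in Section 3. `solve_problem_8` is a
# hypothetical solver of Problem (8) over U(t_s); `events` yields the
# trigger times together with the updated community state.
def online_rolling(events, solve_problem_8):
    schedule = {}                            # committed rates b_k(t)
    for t_s, state in events:
        # state bundles w_b(t_s), G(j, t_s), Q(k, t_s), and the
        # remaining demands D_k(t_s) of the EVs currently parked.
        plan = solve_problem_8(t_s, state)   # rates over U(t_s)
        for k, rates in plan.items():
            # Commit only the current slot; later slots are re-optimized
            # at the next event, so no future EV information is needed.
            schedule[(k, t_s)] = rates[t_s]
    return schedule
```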

4. Smart Charging Carbon Emission Scheduling Algorithm

The OA algorithm tackles the problem with a heuristic rolling procedure that ignores the demand of EVs that may arrive in future time slots. It executes online scheduling in a rolling manner according to the current profile of the community. This results in poor performance because it neglects incoming EVs that bring stochastic charging demand. Therefore, we seek a more effective charging algorithm. Reinforcement learning aims to find an optimal policy that makes the agent obtain the maximum reward by taking actions according to the current state. Sarsa [25] and Q-learning [26] estimate the future reward of a policy using temporal difference iteration and then choose the discrete action with the highest estimated reward. However, it is difficult to choose a convincing discretization of the action space. Therefore, we utilize the AC approach to solve the online carbon emission schedule with continuous charging actions. The AC method learns both the optimal policy and the value function and consists of two parts: the actor and the critic. The actor part defines a parameterized policy and produces the corresponding carbon emission action for EVs. The critic part evaluates and criticizes the current policy using the rewards obtained from the actions taken in each state. The convergence of the basic AC algorithm is not ideal [24] because it evaluates a policy similar to the one used to generate the data, and the approximate state value is self-generated. Therefore, we apply an asynchronous advantage AC approach [24] to improve the convergence of the AC algorithm in our problem.

4.1. RL Framework Formulation

We dispatch the carbon emission over a time window of $T$ and denote the state space as $\Phi$, with state $\varphi(t)$. The carbon emission state $\varphi(t) \in \Phi$ contains the percentage battery capacity $SoC_k(t)$ of each EV $k$ and the base load, which determines the unit price:
$$\varphi(t) = ( SoC_1(t), \ldots, SoC_N(t), w_b(t) ), \qquad (9)$$
where $w_b(t)$ is the community energy demand defined in Section 2. In practice, the charging process has a continuous state space, and the probability of being in any specific state approaches zero. We use the EV carbon emission cost to design the following reward function $r(t)$,
$$r(t) = -\big[ \xi_0 + 2\xi_1 a(t) + 2\xi_1 w_b(t) \big] a(t). \qquad (10)$$
Then, we define the following state-action function $Q^{\Theta_\varepsilon}(\varphi(t), a(t))$ to estimate the expected accumulated reward of the current community carbon emission state,
$$Q^{\Theta_\varepsilon}(\varphi(t), a(t)) = E_{\Theta_\varepsilon}\Big\{ \sum_{k=0}^{\infty} \epsilon^k r(t+k+1) \,\Big|\, \varphi(t), a(t), \Theta_\varepsilon \Big\}, \qquad (11)$$
where the discount factor is $\epsilon \in (0, 1)$.
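The pieces of this formulation are easy to state in code. The sketch below implements the state (9), the reward (10) (with the sign convention, assumed here, that the reward is the negative emission cost), and a truncated estimate of the discounted return in (11); all numeric values are assumptions for illustration.

```python
# A minimal sketch of the RL formulation in Section 4.1; the numbers
# are illustrative assumptions.
import numpy as np

def state(soc, w_b_t):
    """phi(t) = (SoC_1(t), ..., SoC_N(t), w_b(t)), eq. (9)."""
    return np.append(soc, w_b_t)

def reward(a_t, w_b_t, xi0=0.05, xi1=0.001):
    """r(t) in (10): the negative carbon emission cost of action a(t)
    (sign convention assumed so that maximizing reward minimizes cost)."""
    return -(xi0 + 2 * xi1 * a_t + 2 * xi1 * w_b_t) * a_t

def discounted_return(rewards, eps=0.9):
    """Truncated version of the target in (11): sum_k eps**k * r(t+k+1)."""
    return sum(eps**k * r for k, r in enumerate(rewards))

phi = state(np.array([0.3, 0.5]), 10.0)    # two EVs plus the base load
print(phi, discounted_return([reward(2.0, 10.0), reward(1.5, 11.0)]))
```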

4.2. The Actor Part

The state-action function is the expected reward from the current state and satisfies the recursion
$$Q^{\Theta}(\varphi(t), a(t)) = E\{ r(t) + \gamma Q^{\Theta}(\varphi(t+1), a(t+1)) \}, \qquad (12)$$
where $E\{\cdot\}$ is the expectation operator, $\gamma$ is the discount factor used to estimate the action decisions, and $\Theta$ can be estimated by $\Theta_\varepsilon(\varphi(t), a(t))$. The policy $\Theta_\varepsilon(\varphi(t), a(t))$ is differentiable in $\varepsilon$, and the gradient step on $\varepsilon$ is
$$\Delta\varepsilon = \beta_a \nabla_\varepsilon J(\Theta_\varepsilon) = \beta_a \frac{\partial J(\Theta_\varepsilon)}{\partial \Theta_\varepsilon} \frac{\partial \Theta_\varepsilon}{\partial \varepsilon}, \qquad (13)$$
where $\beta_a$ is the actor learning rate, which must be kept small to avoid policy oscillation [24]. The Gaussian distribution is widely used in the actor part based on the maximum entropy principle [27]. We therefore design the following parameterized policy to generate continuous carbon emission actions,
$$\Theta_\varepsilon(\varphi(t), a(t)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( -\frac{(a(t) - \kappa(\varphi(t)))^2}{2\sigma^2} \Big), \qquad (14)$$
where $\kappa(\varphi(t))$ is the mean action in the carbon emission state, and $\sigma$ is the standard deviation over all probable carbon emission actions. The mean $\kappa(\varphi(t))$ is defined as
$$\kappa(\varphi(t)) = \sum_{j=1}^{n} \varepsilon_j x_j(\varphi(t)), \qquad (15)$$
where $x_j(\varphi(t))$ is the $j$th feature of state $\varphi(t)$. $\Theta_\varepsilon(\varphi(t), a(t))$ gives the probability density of choosing action $a(t)$ in state $\varphi(t)$. We optimize the policy $\Theta_\varepsilon$ to make carbon emission decisions.
We use the AC approach to solve the EV carbon emission problem with continuous states and actions, since it is impossible to enumerate and store the value function in a lookup table for every state-action pair. The target of the AC approach is therefore to find an optimal strategy $\Theta_\varepsilon$ that maximizes the following objective,
$$J(\Theta_\varepsilon) = E\{ Q^{\Theta_\varepsilon}(\varphi(t), a(t)) \} = \int_{\Phi} d^{\Theta_\varepsilon}(\varphi(t)) \int_{a(t)} \Theta_\varepsilon(a(t) \mid \varphi(t))\, Q^{\Theta_\varepsilon}(\varphi(t), a(t))\, da(t)\, d\varphi(t), \qquad (16)$$
where $d^{\Theta_\varepsilon}(\varphi)$ is the state distribution induced by strategy $\Theta_\varepsilon$. We improve the parameters of the policy $\Theta_\varepsilon$ iteratively.
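Concretely, the actor keeps a weight vector $\varepsilon$, computes the Gaussian mean (15) from state features, samples an action from (14), and later scales its log-likelihood gradient by the critic's signal. The sketch below shows these steps with an assumed feature map; the paper does not specify the exact features, so `features` is a placeholder.

```python
# A sketch of the Gaussian actor in Section 4.2; the feature map and
# all constants are illustrative assumptions.
import numpy as np

sigma = 0.5                                   # policy standard deviation

def features(phi):
    """x(phi): an assumed feature map (state plus a bias term)."""
    return np.append(phi, 1.0)

def sample_action(eps_vec, phi, rng):
    """Draw a(t) ~ N(kappa(phi), sigma^2), with kappa from eq. (15)."""
    kappa = eps_vec @ features(phi)           # kappa(phi) = sum_j eps_j x_j(phi)
    return rng.normal(kappa, sigma)

def score(eps_vec, phi, a):
    """grad_eps ln Theta_eps(a | phi) = (a - kappa(phi)) x(phi) / sigma^2,
    the Gaussian log-likelihood gradient underlying eq. (23)."""
    kappa = eps_vec @ features(phi)
    return (a - kappa) * features(phi) / sigma**2

rng = np.random.default_rng(0)
phi = np.array([0.3, 0.5, 10.0])              # example state, eq. (9)
eps_vec = np.zeros(phi.size + 1)              # actor parameters
a = sample_action(eps_vec, phi, rng)
print(a, score(eps_vec, phi, a))
```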

4.3. The Critic Part

We estimate the value function by function approximation with parameter vector $w = (w_1, w_2, \ldots, w_n)^T$,
$$Q_w(\varphi(t), a(t)) \approx Q^{\Theta_\varepsilon}(\varphi(t), a(t)). \qquad (17)$$
We apply the linear combination method to approximate the value function because it approximates well and has an easily computed gradient. Here, we have
$$Q_w(\varphi(t), a(t)) = w^T \Psi(\varphi(t), a(t)) = \sum_{i=1}^{n} w_i \psi_i(\varphi(t), a(t)), \qquad (18)$$
where the basis function vector $\Psi(\varphi(t), a(t))$ for EV carbon emission action $a(t)$ in state $\varphi(t)$ at time slot $t$ is defined as
$$\Psi(\varphi(t), a(t)) = ( \psi_1(\varphi(t), a(t)), \psi_2(\varphi(t), a(t)), \ldots, \psi_n(\varphi(t), a(t)) )^T, \qquad (19)$$
where $\Psi(\varphi(t), a(t))$ is the vector of features observed in state $\varphi(t)$ when taking carbon emission action $a(t)$, created by tile coding [19] and combined linearly with the parameter vector $w$. Linear function approximation guarantees that the algorithm converges to a local or global optimum [19]. We apply the temporal difference error, which measures the gap between the true value and the approximated value, to update the critic value function. The temporal difference (TD) error is
$$e_t = r(t+1) + \epsilon Q_w[\varphi(t+1), a(t+1)] - Q_w(\varphi(t), a(t)), \qquad (20)$$
where $r(t+1) + \epsilon Q_w[\varphi(t+1), a(t+1)]$ is the actual return value at slot $t$. The critic parameters are updated as
$$\Delta w = \beta_c e_t \nabla_w Q_w(\varphi(t), a(t)), \qquad (21)$$
where $\beta_c$ is the critic learning rate; too large a $\beta_c$ causes oscillation, while too small a $\beta_c$ makes convergence very slow.
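The critic is thus a handful of vector operations. The sketch below implements the linear value function (18), the TD error (20), and the update (21); the simple state-action feature map stands in for the tile coding mentioned above and is an assumption.

```python
# A sketch of the linear critic in Section 4.3; the feature map and
# constants are illustrative assumptions.
import numpy as np

beta_c, eps_discount = 0.01, 0.9    # critic learning rate, discount factor

def psi(phi, a):
    """Psi(phi, a): assumed simple state-action features (a stand-in
    for tile coding)."""
    return np.append(phi, a)

def q_value(w, phi, a):
    return w @ psi(phi, a)          # Q_w = w^T Psi, eq. (18)

def critic_update(w, phi, a, r_next, phi_next, a_next):
    # TD error, eq. (20): e_t = r(t+1) + eps * Q_w(next) - Q_w(current)
    e_t = (r_next + eps_discount * q_value(w, phi_next, a_next)
           - q_value(w, phi, a))
    # Update, eq. (21): w <- w + beta_c * e_t * grad_w Q_w, with grad = Psi
    return w + beta_c * e_t * psi(phi, a), e_t

w = np.zeros(4)
w, e_t = critic_update(w, np.array([0.3, 0.5, 10.0]), 2.0,
                       -0.5, np.array([0.4, 0.6, 10.5]), 1.8)
print(w, e_t)
```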

4.4. Smart Charging Carbon Emission Scheduling Algorithm

The critic part updates the value function and its parameters; the actor part then utilizes the critic's output to update the policy parameters. The TD error indicates whether the learning process is performing better or worse than expected, because the learning process is positively correlated with the critic output. We aim to reduce the TD error, which adjusts both the actor part and the critic part along the gradient direction. The TD error is generated from the current state, the next state, and the current reward in the critic part, and it then helps update the policy parameter given the current EV carbon emission action and state. The critic part estimates the strategy from the state-action function $Q_w(\varphi(t), a(t))$. The actor part utilizes the policy gradient in the following expression, where we use $a$ as an abbreviation for the charging action $a(t)$,
$$\nabla_\varepsilon J(\Theta_\varepsilon) \approx \int_{\Phi} d^{\Theta_\varepsilon}(\varphi(t)) \int_{a} Q^{\Theta_\varepsilon}(\varphi(t), a)\, \nabla_\varepsilon \Theta_\varepsilon(a \mid \varphi(t), \varepsilon)\, da\, d\varphi(t). \qquad (22)$$
We use the Gaussian policy to achieve continuous charging action values. Because $\nabla_\varepsilon \Theta_\varepsilon(\varphi(t), a(t)) = \Theta_\varepsilon(\varphi(t), a(t)) \nabla_\varepsilon \ln \Theta_\varepsilon(\varphi(t), a(t))$, we have
$$\nabla_\varepsilon \Theta_\varepsilon(\varphi(t), a(t)) = \Theta_\varepsilon(\varphi(t), a(t)) \frac{(a(t) - \varrho(\varphi(t)))\, \Gamma(\varphi(t))}{\sigma^2}. \qquad (23)$$
We update the actor parameter $\varepsilon$ and the critic parameter $w$ at the same time, with $\varepsilon$ updated in the direction determined by the critic output. However, the convergence of the plain AC algorithm is not ideal because of its on-policy search [19]: it evaluates the same policy that is used to generate the data, and the approximate state value is self-generated. To improve convergence, we utilize the asynchronous advantage AC algorithm to dispatch EV carbon emission actions; the complete RLSCA procedure is given in Algorithm 1.
Algorithm 1 Reinforcement learning-based smart charging carbon emission scheduling algorithm (RLSCA)
Input: discount factor $\epsilon$, critic learning rate $\beta_c$, actor learning rate $\beta_a$, strategy $\Theta_\varepsilon(\varphi(t), a(t))$, $a \sim N(\varrho(\varphi(t)), \sigma^2)$
1: Init: thread step counter $t = 1$, global shared counter $k = 0$; set parameter $\varepsilon = \varepsilon_0$.
2: for each thread do
3:  for each step do
4:   Reset gradients: $d\varepsilon = 0$, $d\omega = 0$.
5:   Update thread parameters: $\varepsilon' = \varepsilon$, $\omega' = \omega$.
6:   Set $t_{start} = t$, get state $\varphi(t)$.
7:   repeat
8:    Generate carbon emission action $a(t+1) \sim \Theta_\varepsilon(\varphi(t), a(t))$, move to state $\varphi(t+1) \sim P(\varphi(t), a(t), \varphi(t+1))$, receive reward $r(t+1)$.
9:    Update $t \leftarrow t + 1$, $k \leftarrow k + 1$.
10:   until $t - t_{start} = t_{max}$
11:   Critic:
12:   Update $\Psi(\varphi(t), a(t)) = \nabla_\varepsilon \ln \Theta_\varepsilon(\varphi(t), a(t))$.
13:   Update $Q_w(\varphi(t+1), a(t+1)) = w^T \nabla_\varepsilon \ln \Theta_\varepsilon(\varphi(t+1), a(t+1))$.
14:   Update critic parameters: $w_{t+1} = w_t + \beta_c e_t I$, where $e_t$ is given by (20).
15:   Actor:
16:   Update the policy parameter: $\varepsilon_{t+1} = \varepsilon_t + \beta_a e_t \nabla_\varepsilon J(\Theta_\varepsilon)$.
17:   Update $\varphi(t)$, $a(t)$, $z(t)$, $Q_w(\varphi(t), a(t)) \leftarrow Q_w(\varphi(t+1), a(t+1))$.
18:  end for
19:  Update $\varepsilon$ using $d\varepsilon$ and $\omega$ using $d\omega$ according to (24) and (25).
20: until $k > \xi_{max}$
21: end for
RLSCA maintains the policy $\Theta(a_t \mid \varphi_t; \varepsilon)$ and the approximate state function $Q(\varphi_t; \omega)$, and adopts multi-step returns [19] to update the strategy and the value function after every $t_{max}$ actions. We update the critic parameters as
$$d\omega = d\omega + \partial\big( R - Q(\varphi_t; \omega) \big)^2 / \partial\omega, \qquad (24)$$
and the actor parameters as
$$d\varepsilon = d\varepsilon + \nabla_\varepsilon \log \Theta(a_t \mid \varphi_t; \varepsilon)\,\big( R - Q(\varphi_t; \omega) \big). \qquad (25)$$
The cumulative reward is $R(t+1) = r(t) + \epsilon R(t)$, and the immediate reward is $r(t) = -[\xi_0 + 2\xi_1 a(t) + 2\xi_1 w_b(t)] a(t)$. Each thread has an agent that works in a replica of the environment, and each step generates a gradient of the parameters.
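To make the per-thread bookkeeping explicit, the sketch below accumulates the gradients (24) and (25) over one $t_{max}$-step segment using the multi-step return $R$; the linear value and score functions match the forms sketched for Sections 4.2 and 4.3, and everything else is an assumed illustration rather than the exact implementation.

```python
# A sketch of one worker's accumulation of the updates (24)-(25); the
# helper functions correspond to the linear critic and Gaussian actor
# sketched earlier, and the setup is an illustrative assumption.
import numpy as np

def accumulate_gradients(traj, w, eps_vec, q_value, score, eps_discount=0.9):
    """traj: list of (phi, a, r) tuples collected over t_max steps.
    Returns the accumulated (d_eps, d_w) of eqs. (25) and (24)."""
    d_eps, d_w = np.zeros_like(eps_vec), np.zeros_like(w)
    R = 0.0                                  # multi-step return, R <- r + eps*R
    for phi, a, r in reversed(traj):
        R = r + eps_discount * R
        adv = R - q_value(w, phi, a)         # advantage R - Q(phi; w)
        # eq. (25): d_eps += grad_eps log Theta(a|phi; eps) * (R - Q)
        d_eps += score(eps_vec, phi, a) * adv
        # eq. (24): d_w += d(R - Q)^2/dw = -2 * (R - Q) * grad_w Q,
        # where grad_w of the linear Q is the feature vector Psi(phi, a)
        d_w += -2.0 * adv * np.append(phi, a)
    return d_eps, d_w
```

After each segment, the shared parameters would be updated asynchronously with `d_eps` and `d_w`, as in lines 19–20 of Algorithm 1.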

5. Simulation

In this part, we evaluate the performance of the RLSCA algorithm against three benchmark algorithms using community load profiles. For our simulation, the base load profile and the statistical EV data, which include the EV arrival distribution and the energy level at arrival, are given and known ahead of time.

5.1. Simulation Parameter

We utilize a two-day base load profile from [28] and obtain the statistical EV data from [29]; Figure 4 shows the EV arrival distribution. The initial SoC of an EV's battery influences the load profile of the community and is estimated based on the urban dynamometer driving schedule [30]. According to measured statistics from [31], the probability density function of the EVs' battery SoC is shown in Figure 5. We consider two types of EVs: Type 1, with a maximum charging rate of 3.2 kW and a battery capacity of 36 kWh, and Type 2, with a maximum charging rate of 1.4 kW and a battery capacity of 16 kWh.

5.2. Simulation Performance

The simulation performance of the RLSCA algorithm is compared with the online optimal charging algorithm, the eagerly charging algorithm, and the reinforcement learning adaptive energy management algorithm. The three benchmark algorithms are as follows.
(1) OA: EV $k$ follows the optimal carbon emission schedule $b_k^*(t)$ from the community, which is the optimal solution of Problem (8). We denote the carbon emission cost of OA as $\Theta_{OA}$.
(2) EC [32]: EV $k$ charges at the maximum carbon emission $b_{max}$ in every time slot in the community. We denote the carbon emission cost of EC as $\Theta_{EC}$.
(3) RLAEM [33]: EV $k$ follows the carbon emission schedule $b_k^*(t)$ obtained by the adaptive Q-learning algorithm, an adaptive reinforcement learning algorithm. We denote the carbon emission cost of RLAEM as $\Theta_{RL}$.
The RLAEM algorithm is based on the Q-learning method, which uses discrete charging action values and may therefore not achieve the optimal plan over a practical continuous action space; it is also impossible to enumerate and store the value function in a lookup table for every state-action pair. In our work, we do not discretize the decisions and instead obtain the optimal EV charging carbon emission dispatching plan with continuous scheduling actions: the RLSCA algorithm is based on the AC approach and handles continuous states and actions. The RLSCA algorithm is applicable to other places and scenarios as long as the base load profile and statistical EV data are given. The OA algorithm is simulated with CVXPY [34], a Python toolbox for convex optimization problems. We compare the four algorithms in a case study with 40 Type-1 EVs over 48 h and show the optimal charging results in Figure 6. The EV axis shows the index of the EV, the time slot axis shows the index of the time slot, and the z axis shows the total load of the community. The fluctuation of the total community load under the RLSCA algorithm is smaller than under the EC, OA, and RLAEM algorithms. From Figure 6, we see that RLAEM and RLSCA charge EVs more smoothly than EC and OA and shave the total peak load together with the base load; among all tested algorithms, RLSCA performs best at minimizing the total peak load. We show the carbon emissions from EV charging in Figure 7. We present the four simulation results for 40 EVs and show the total load, including the EV load and the inelastic load of the community, in Figure 8. Our RLSCA algorithm has the lowest total load peak. The carbon emission peaks of the EC, OA, RLAEM, and RLSCA algorithms are 67.51 kg, 60.34 kg, 60.37 kg, and 56.94 kg, so the carbon emission peak of RLSCA is 15.66%, 5.63%, and 5.68% less than those of EC, OA, and RLAEM, and the fluctuation of the carbon emission peak of RLSCA is also the smallest. We also calculate the carbon emission costs of the four simulated algorithms and show the results in Figure 9. The total carbon emission costs of the EC, OA, RLAEM, and RLSCA algorithms in this system are $20.60, $19.45, $17.77, and $16.22, so RLSCA reduces the carbon emission cost by 21.26%, 16.61%, and 8.72% relative to EC, OA, and RLAEM. Similarly, for the other type of EV, with a maximum charging rate of 1.4 kW and a battery capacity of 16 kWh, the total carbon emission costs of the EC, OA, RLAEM, and RLSCA algorithms are $14.98, $13.50, $12.43, and $11.32, and RLSCA reduces the carbon emission cost by 24.43%, 16.15%, and 8.93% relative to EC, OA, and RLAEM, respectively.
In addition, the RLSCA algorithm reduces the EVs' expected cost by 16.61% compared with the OA algorithm. In Figure 9, we compare the carbon emission costs of the two types of EVs under the EC, OA, RLAEM, and RLSCA algorithms. For Type-1 EVs, the total EV carbon emission costs of the EC, OA, RLAEM, and RLSCA algorithms are $21.32, $18.95, $18.01, and $17.32, and the EV carbon emission costs of EC, OA, and RLAEM are 23.09%, 9.41%, and 3.98% higher than the cost of our RLSCA algorithm. For Type-2 EVs, the total EV carbon emission costs of the EC, OA, RLAEM, and RLSCA algorithms are $15.58, $13.95, $11.95, and $11.31, and the EV carbon emission costs of EC, OA, and RLAEM are 37.75%, 23.34%, and 5.66% higher than the cost of our RLSCA algorithm, respectively.
The discount factor $\epsilon$ is a crucial parameter that can cut down the TD error in the critic part. From Figure 10, we conclude that the discount factor has an important impact on the convergence of our algorithm and that a low discount factor is preferable. We simulate our algorithm with discount factors 0.005, 0.01, and 0.05, and find that a lower discount factor yields better performance. Considering the computational efficiency of the algorithm, we select $\epsilon = 0.01$ as a trade-off between reward and convergence, achieving both reasonably fast convergence and high reward. The actor learning rate $\beta_a$ is a crucial parameter in the actor part. From Figure 11, the actor learning rate has an important impact on the convergence of our algorithm and the update of the policy: a large actor learning rate (e.g., 0.0005 or 0.001) leads to big overshoots in the evolution of the rewards, and the steady-state reward also depends on the configuration of the actor learning rate. From the simulation results in Figure 11, we see that $\beta_a = 0.0001$ gives the better convergence performance and the higher reward.
Last but not least, we give the average carbon emission costs of all the simulated algorithms in Figure 12. For 40 EVs, the carbon emission costs per EV of EC, OA, RLAEM, and RLSCA are $0.5447, $0.5129, $0.4756, and $0.4372, and the cost per EV of RLSCA is 24.59%, 17.31%, and 8.78% less than those of EC, OA, and RLAEM. For 50 EVs, the carbon emission costs per EV of EC, OA, RLAEM, and RLSCA are $0.5946, $0.5573, $0.5168, and $0.4607, and the cost per EV of RLSCA is 29.06%, 20.97%, and 12.18% less than those of EC, OA, and RLAEM. The average costs of the four algorithms rise steadily with the number of EVs. Our RLSCA algorithm has the lowest average cost among the tested algorithms, and its effectiveness is verified in scenarios with different numbers of electric vehicles.

6. Conclusions

In this work, we first study an offline carbon emission scheduling problem that minimizes the carbon emission cost of a community when the future data are known. Then, we recast the offline EV carbon emission problem as an online problem and propose the OA algorithm based on a heuristic rolling procedure. Furthermore, we propose the RLSCA algorithm based on the AC method to update the continuous EV carbon emission action. In our algorithm, the actor part has a parameterized policy whose parameter is updated with a policy gradient, and the critic part uses the TD error over states and actions to update the critic parameter, which helps the actor part optimize the policy. Simulation results show that the carbon emission cost of our RLSCA algorithm is 21.26%, 16.60%, and 8.72% lower than those of the EC, OA, and RLAEM algorithms, respectively, while the algorithm retains good convergence. The average cost per EV of the RLSCA algorithm is the lowest among all the simulated algorithms. In future work, we will formulate the carbon emission of renewable energy as a negative cost function or a reward function and study the transactive carbon emission schedule problem with renewable energy under the transactive energy framework and the carbon peak target.

Author Contributions

Conceptualization, Y.C. and Y.W.; methodology, Y.C.; software, Y.C.; validation, Y.C.; formal analysis, Y.C.; investigation, Y.C.; resources, Y.C.; data curation, Y.C.; original draft preparation, Y.C.; review and editing, Y.C.; visualization, Y.C.; supervision, Y.C.; project administration, Y.C.; funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by Shanghai Science and Technology Innovation Action Plan Morning Star Project (Sail Special) (22YF1411900); China Postdoctoral Science Foundation (2022TQ0210); National Social Science Foundation of China (20&ZD199); National Social Science Fund Major Project of China on “Legalization of Technology Standards for Public Data in China” (21&ZD200); the Key interdisciplinary project for the Central Universities (223201800084); the Humanities and Social Science Research Project of Ministry of Education (20YJC820030).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lopes, J.A.P.; Soares, F.J.; Almeida, P.M.R. Integration of Electric Vehicles in the Electric Power System. Proc. IEEE 2011, 99, 168–183.
  2. He, Y.; Venkatesh, B.; Guan, L. Optimal Scheduling for Charging and Discharging of Electric Vehicles. IEEE Trans. Smart Grid 2012, 3, 1095–1105.
  3. Mou, X.; Zhang, Y.; Jiang, J.; Sun, H. Achieving Low Carbon Emission for Dynamically Charging Electric Vehicles Through Renewable Energy Integration. IEEE Access 2019, 7, 118876–118888.
  4. Al-Hanahi, B.; Ahmad, I.; Habibi, D.; Pradhan, P.; Masoum, M.A.S. An Optimal Charging Solution for Commercial Electric Vehicles. IEEE Access 2022, 10, 46162–46175.
  5. Wang, S.; Bi, S.; Zhang, Y.A. Reinforcement Learning for Real-Time Pricing and Scheduling Control in EV Charging Stations. IEEE Trans. Ind. Inform. 2021, 17, 849–859.
  6. Ren, Y.; Ma, C.; Chen, H.; Huang, J. Low-carbon power dispatch model under the carbon peak target. In Proceedings of the 2021 IEEE Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 23–25 December 2021; pp. 2078–2082.
  7. Dong, J.; Li, Y.; Li, W.; Liu, S. CO2 Emission Reduction Potential of Road Transport to Achieve Carbon Neutrality in China. Sustainability 2022, 14, 5454.
  8. Lajunen, A. Evaluation of energy consumption and carbon dioxide emissions for electric vehicles in Nordic climate conditions. In Proceedings of the 2018 Thirteenth International Conference on Ecological Vehicles and Renewable Energies (EVER), Monte-Carlo, Monaco, 10–12 April 2018; pp. 1–7.
  9. Kai, Y.; Shuai, J.; Chunxuan, H.; Zheng, Z.; Tianran, L. Analysis on the emission reduction benefits of electric vehicle replacing fuel vehicle. In Proceedings of the 2021 IEEE Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 25–27 November 2021; pp. 3396–3402.
  10. Qi, X.; Wu, G.; Boriboonsomsin, K.; Barth, M.J. Development and Evaluation of an Evolutionary Algorithm-Based OnLine Energy Management System for Plug-In Hybrid Electric Vehicles. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2181–2191.
  11. Yang, J.; Wiedmann, T.; Luo, F.; Yan, G.; Wen, F.; Broadbent, G.H. A Fully Decentralized Hierarchical Transactive Energy Framework for Charging EVs with Local DERs in Power Distribution Systems. IEEE Trans. Transp. Electrif. 2022, 8, 3041–3055.
  12. Masoum, A.S.; Deilami, S.; Abu-Siada, A.; Masoum, M.A.S. Fuzzy Approach for Online Coordination of Plug-In Electric Vehicle Charging in Smart Grid. IEEE Trans. Sustain. Energy 2015, 6, 1112–1121.
  13. Jian, L.; Zheng, Y.; Shao, Z. High efficient valley-filling strategy for centralized coordinated charging of large-scale electric vehicles. Appl. Energy 2017, 186, 46–55.
  14. Luo, L.; Gu, W.; Zhou, S.; Huang, H.; Gao, S.; Han, J.; Wu, Z.; Dou, X. Optimal planning of electric vehicle charging stations comprising multi-types of charging facilities. Appl. Energy 2018, 226, 1087–1099.
  15. Wang, H.; Ye, Y.; Wang, Q.; Tang, Y.; Strbac, G. An Efficient LP-based Approach for Spatial-Temporal Coordination of Electric Vehicles in Electricity-Transportation Nexus. IEEE Trans. Power Syst. 2022, 1–11.
  16. Bilh, A.; Naik, K.; El-Shatshat, R. A Novel Online Charging Algorithm for Electric Vehicles Under Stochastic Net-Load. IEEE Trans. Smart Grid 2018, 9, 1787–1799.
  17. Tang, X.; Chen, J.; Liu, T.; Qin, Y.; Cao, D. Distributed Deep Reinforcement Learning-Based Energy and Emission Management Strategy for Hybrid Electric Vehicles. IEEE Trans. Veh. Technol. 2021, 70, 9922–9934.
  18. Yu, Y.; Cai, Z.; Huang, Y. Energy Storage Arbitrage in Grid-Connected Micro-Grids Under Real-Time Market Price Uncertainty: A Double-Q Learning Approach. IEEE Access 2020, 8, 54456–54464.
  19. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction. IEEE Trans. Neural Netw. 1998, 9, 1054.
  20. Lu, R.; Hong, S.H. Incentive-based demand response for smart grid with reinforcement learning and deep neural network. Appl. Energy 2019, 236, 937–949.
  21. Bahrami, S.; Wong, V.W.S.; Huang, J. An Online Learning Algorithm for Demand Response in Smart Grid. IEEE Trans. Smart Grid 2017.
  22. Wei, Y.; Yu, F.R.; Song, M.; Han, Z. User Scheduling and Resource Allocation in HetNets with Hybrid Energy Supply: An Actor-Critic Reinforcement Learning Approach. IEEE Trans. Wirel. Commun. 2018, 17, 680–692.
  23. Teng, X.; Long, H.; Yang, L. Integrated Electricity-Gas System Optimal Dispatch Based on Deep Reinforcement Learning. In Proceedings of the 2021 IEEE Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 25–27 November 2021; pp. 1082–1086.
  24. Mnih, V.; Badia, A.P.; Mirza, L.; Graves, A.; Harley, T.; Lillicrap, T.P.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 4, pp. 2850–2869.
  25. You, Y.; Zhu, J.; Huang, Y.; Jing, Z. Optimal Decision-Making Method for a Plug-In Electric Taxi in Uncertain Environment. IEEE Access 2021, 9, 62467–62477.
  26. Wang, H.; Zhang, B. Energy Storage Arbitrage in Real-Time Markets Via Reinforcement Learning. arXiv 2017, arXiv:1711.03127.
  27. Yan, Y.; Ma, H.; Wen, M.; Dang, S.; Xu, H. Multi-Feature Fusion-Based Mechanical Fault Diagnosis for On-Load Tap Changers in Smart Grid with Electric Vehicles. IEEE Sens. J. 2021, 21, 15696–15708.
  28. Gan, L.; Topcu, U.; Low, S.H. Optimal decentralized protocol for electric vehicle charging. IEEE Trans. Power Syst. 2013, 28, 940–951.
  29. Cao, Y.; Wang, H.; Li, D.; Zhang, G. Smart Online Charging Algorithm for Electric Vehicles via Customized Actor–Critic Learning. IEEE Internet Things J. 2022, 9, 684–694.
  30. United States Environmental Protection Agency. Emission Standards Reference Guide for On-Road and Nonroad Vehicles and Engines; United States Environmental Protection Agency: Washington, DC, USA, 2022.
  31. Leou, R.C. Optimal Charging/Discharging Control for Electric Vehicles Considering Power System Constraints and Operation Costs. IEEE Trans. Power Syst. 2016, 31, 1854–1860.
  32. Tang, W.; Bi, S.; Zhang, Y.J. Online Coordinated Charging Decision Algorithm for Electric Vehicles without Future Information. IEEE Trans. Smart Grid 2014, 5, 2810–2824.
  33. Liu, T.; Zou, Y.; Liu, D.; Sun, F. Reinforcement Learning of Adaptive Energy Management with Transition Probability for a Hybrid Electric Tracked Vehicle. IEEE Trans. Ind. Electron. 2015, 62, 7837–7846.
  34. Diamond, S.; Boyd, S. CVXPY: A Python-Embedded Modeling Language for Convex Optimization. J. Mach. Learn. Res. 2016, 17, 2909–2913.
Figure 1. The system model of the EVs' carbon emission schedule in the community.
Figure 2. Illustration of G(t) and U(t).
Figure 3. An example of the online case.
Figure 4. Probability of EVs' arrival.
Figure 5. Probability of EVs' SoC at the arrival time.
Figure 6. The community load of the four algorithms.
Figure 7. Comparison of carbon emissions from EV charging.
Figure 8. Total community load of the four algorithms.
Figure 9. Two types of EVs' carbon emission cost in the four algorithms.
Figure 10. Total moving reward versus discount factor.
Figure 11. Total community load versus actor learning rate.
Figure 12. Average costs per EV of the four algorithms.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
