Article

Online Charging Strategy for Electric Vehicle Clusters Based on Multi-Agent Reinforcement Learning and Long–Short Memory Networks

College of Information Science and Engineering, Guilin University of Technology, Guilin 541006, China
* Author to whom correspondence should be addressed.
Energies 2022, 15(13), 4582; https://doi.org/10.3390/en15134582
Submission received: 19 May 2022 / Revised: 17 June 2022 / Accepted: 19 June 2022 / Published: 23 June 2022

Abstract

The electric vehicle (EV) cluster charging strategy is a key factor affecting grid load shifting in vehicle-to-grid (V2G) mode. The conflict between time-varying tariffs and electricity demand at different times of the day directly affects the charging cost and, in the worst case, can even lead to the collapse of the whole grid. In this paper, we propose an online charging strategy for community home EV clusters based on multi-agent reinforcement learning and long short-term memory (LSTM) networks to solve the grid load problem and minimize the charging cost while keeping the EV cluster charging load benign. Accurate prediction of grid prices is achieved through an LSTM network, and the optimal charging strategy is derived from the MADDPG multi-agent reinforcement learning algorithm. The simulation results show that, compared with the DQN algorithm, the proposed EV cluster online charging strategy reduces the overall charging cost by about 5.8% by dynamically adjusting the charging power in each time period while maintaining grid load balance.

1. Introduction

Against today's tight global energy supply and demand, countries around the world are striving to achieve carbon neutrality, and electric vehicles (EVs) are rapidly expanding their market share thanks to their low pollution and low noise. However, the huge user base overwhelms community home charging stations, whose power supply capacity and load headroom are limited. The traditional EV cluster charging strategy favors charging during off-peak periods, but as EVs become popular, electricity demand explodes at night, grid load peaks shift, and electricity costs increase sharply. Traditional methods steer users' electricity consumption through prices [1], but when large quantities of electricity are consumed this can trigger an avalanche in the wholesale power market, and such methods offer little flexibility. Many real-time incentive schemes have gradually emerged with the development of reinforcement learning (RL) [2,3]; they hedge the avalanche risk in the grid market and dynamically adjust the incentive scheme in real time to induce the choices most favorable to users. Vehicle-to-grid (V2G) [4] technology has been proposed to provide a buffer by feeding power back to the grid from the EVs' batteries when the grid load is high, reaching a win–win situation for both users and the grid.
This paper proposes an online charging strategy that combines long short-term memory (LSTM) networks and multi-agent reinforcement learning to solve the problem of smart charging of community home EV clusters under uncertain electricity prices. The framework uses an LSTM to predict the next electricity price and thereby guide the charging piles to make the best charging and discharging decisions, minimizing the charging cost while maintaining a good load balance in the community grid. A variety of EV charging scheduling strategies already exist. Reference [5] proposes a nonlinear programming algorithm that includes energy price models to optimize EV charging for fleets, and reference [6] uses Monte Carlo sampling and proposes a solution that approaches the optimum following the approximate dynamic programming idea of an infinite-horizon dynamic program. However, such nonlinear programming and dynamic programming methods are computationally heavy and improve too slowly, so they only suit charging scheduling in specific environments. Reference [7] develops a stochastic optimized dispatch model that combines energy retailers with local operators with a high degree of flexibility and tests it in several scenarios with uncertain parameters; however, with the popularity of the V2G model, the instability of grid-connected tariffs should be taken into account, and the model should be combined with V2G to further reduce charging costs by selling excess electricity to local operators. In recent years, with the increasing maturity of machine learning techniques, RL has become popular in EV cluster charging strategies. Reference [8] seeks the best charging strategy by controlling the charging of all EVs in a charging station through a batch reinforcement learning approach, where the value network is fitted with a fully connected neural network; however, its ability to handle high-dimensional data is poor when the agent state parameters increase, the charging characteristics of individual EVs are not retained when the total demand is binned, and the approximation error introduced in the discretization process slows the learning of the whole algorithm. Reference [9] proposes an RL-based intelligent charging coordination system for EV fleets from the grid perspective, which keeps grid facilities operating at low load by creating charging schedules for EVs 24 h in advance; however, this system only provides charging strategies for individual EVs and does not propose a strategy that adapts to the global situation.
Smart charging strategies for EV clusters need to meet users' charging demand while taking into account the grid load and the charging cost, where the variable electricity price and users' uncertain usage time become a challenge. Reference [10] uses a long short-term memory network to improve the prediction accuracy of unstable tariffs, overcoming the challenges posed by realistic battery models of limited types and non-negligible parameter uncertainties, but it does not take the uncertainty of user usage into account. Reference [11] uses a feedforward neural network (NN) based on an extreme learning machine (ELM) to predict uncertainty in electricity prices and EV commuting behavior by fitting the value function of the best action through a Q-network; however, the ELM network has a slow learning rate that is difficult to determine and is prone to falling into local minima.
The main contributions of this paper are as follows:
  • We propose an LSTM-based prediction strategy that provides accurate predictions of floating tariffs for the charging strategy algorithm and greatly improves the accuracy of the algorithm.
  • We propose a multi-agent RL-based online charging strategy algorithm for EV clusters that takes into account uncertainties such as variable electricity prices and user usage.
  • We design a centralized-training, distributed-execution approach that coordinates the charging strategies of the charging piles within a community home charging station, maintaining a cooperative and competitive relationship between the piles while controlling and coordinating them globally, minimizing users' usage cost and maintaining load balance on the community grid.
The remainder of this paper is organized as follows. Section 2 presents the LSTM-based real-time grid price forecasting model. Section 3 presents the design of the multi-agent RL-based EV cluster charging strategy model. The simulation results of the algorithm are discussed in Section 4. Finally, Section 5 provides the conclusion and future work.

2. Real-Time Grid Price Forecasting Model Based on Long Short-Term Memory Networks

2.1. RNN-Based LSTM Network

Recurrent neural networks (RNNs) [12] are neural networks that process sequential data; they operate on sequences of input and output vectors and therefore handle sequential data better than traditional neural networks. RNNs are equipped with a memory module through which information about the current state and the previous state is continuously passed to the next stage, enabling prediction of the next state.
The RNN structure is shown in Figure 1, where x is the input of sequence data in the current state, y is the output of the current state, and h is the memory module, which stores the output of the previous state and passes on the output of the current state.
LSTM [12] is a special form of RNN that aims to solve the vanishing and exploding gradient problems that occur when training on long sequence data. For the purpose of this paper, LSTM networks are well suited to long-term real-time price prediction.
The LSTM electricity price prediction model mainly includes an input layer, a recurrent (LSTM) layer, a fully connected layer, and an output layer. It is calculated as follows:
$$c_t = \sigma\left(W_f\left[x_t, h_{t-1}\right]\right) \odot c_{t-1} + \sigma\left(W_i\left[x_t, h_{t-1}\right]\right) \odot \tanh\left(W\left[x_t, h_{t-1}\right]\right) \quad (1)$$
$$h_t = \sigma\left(W_o\left[x_t, h_{t-1}\right]\right) \odot \tanh\left(c_t\right) \quad (2)$$
$$y_t = \sigma\left(W' h_t\right) \quad (3)$$
where $x_t$ is the feature vector of the input layer, $y_t$ is the output state vector, and $h_{t-1}$ and $h_t$ are the hidden states at times t−1 and t. $W_f$, $W_i$, and $W_o$ are the weights of the forget gate, the selection (input) gate, and the output gate, respectively; $W$ and $W'$ are the input and output weights; $\sigma$ and $\tanh$ are the sigmoid and tanh functions, respectively.
The recurrent layer of the LSTM model consists of several memory gates; its internal structure is shown in Figure 2, where $c_{t-1}$ is the memory state, which changes only slightly as it is passed along, $h_{t-1}$ is the hidden state at time t−1, and $x_t$ is the input feature vector at time t. $h_{t-1}$ and $x_t$ are passed together as inputs to the forget gate, which has a selective memory function and chooses which part of the previous memory state $c_{t-1}$ to keep. The second gate is the selection (input) gate, where the input $x_t$ is memorized and then added to the result of the previous step to obtain the new memory state $c_t$. Finally, at the output gate, the new memory state is squashed through the tanh function to produce $h_t$, and $h_t$ is multiplied by the output weight $W'$ and passed through the sigmoid function to output $y_t$.
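To make the gate computations of Equations (1)–(3) concrete, the following is a minimal sketch of a single LSTM cell step in NumPy. The concatenation of $x_t$ and $h_{t-1}$ and the weight shapes are illustrative assumptions, not the exact parameterization used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W_f, W_i, W, W_o, W_out):
    """One LSTM step following Equations (1)-(3).

    x_t:    input feature vector at time t
    h_prev: hidden state h_{t-1}
    c_prev: memory state c_{t-1}
    W_f, W_i, W_o: forget / selection / output gate weights
    W:      candidate (input) weights; W_out: output weights
    """
    z = np.concatenate([x_t, h_prev])            # [x_t, h_{t-1}]
    f_t = sigmoid(W_f @ z)                       # forget gate
    i_t = sigmoid(W_i @ z)                       # selection (input) gate
    c_t = f_t * c_prev + i_t * np.tanh(W @ z)    # Eq. (1): new memory state
    o_t = sigmoid(W_o @ z)                       # output gate
    h_t = o_t * np.tanh(c_t)                     # Eq. (2): new hidden state
    y_t = sigmoid(W_out @ h_t)                   # Eq. (3): cell output
    return h_t, c_t, y_t
```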

2.2. LSTM-Based Real-Time Grid Price Prediction Model

The LSTM-based real-time grid price prediction model proposed in this paper defines an LSTM network with a single hidden layer of 300 units that reads an input sequence and outputs a 300-element vector capturing the features of the input sequence. The real-time charging demand of EVs is taken as input; a fully connected layer then interprets each time step of the output sequence, and the output layer predicts one step of the output sequence.
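A compact PyTorch sketch of such a predictor is shown below, assuming a single-layer LSTM with 300 hidden units followed by a fully connected layer that maps the final hidden state to a one-step-ahead price. The input features (a univariate series here) and the class and parameter names are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class PricePredictor(nn.Module):
    """One-step-ahead electricity price prediction (illustrative sketch)."""
    def __init__(self, n_features: int = 1, hidden_size: int = 300):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)   # interpret the 300-element summary vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features), e.g. the previous week of 5-min prices
        _, (h_n, _) = self.lstm(x)             # h_n: (1, batch, 300)
        return self.fc(h_n.squeeze(0))         # predicted price for the next time step
```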

3. A Multi-Agent Reinforcement Learning-Based Model for Electric Vehicle Cluster Charging Strategy

3.1. Introduction to Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning [13] is a sub-field of reinforcement learning that is still in its infancy across the various areas of artificial intelligence. It studies the interaction of multiple agents in a common environment, where the agents usually collaborate towards a final goal.
There are cooperative and competitive relationships as these agents interact with the environment and with each other, and the complexity increases exponentially as the number of agents grows. Agent training frameworks can be divided into centralized training and distributed training. In centralized training, a central controller coordinates the agents, and the agents communicate with each other to update their strategies during training; its strategy updates are slow, and it is mostly used in traffic signal control [14,15], multi-robot coordination [16], production scheduling [17], etc. In contrast, in distributed training each agent focuses only on its own information for policy updates, which is fast but less scalable and is mostly used for distributed sensors [18]. The MADDPG algorithm used in this paper takes a centralized-training, distributed-execution approach to speed up policy updates while maintaining the independence and scalability of the community charging pile policies.

3.2. Electric Vehicle Cluster Charging Strategy Model

3.2.1. Cluster Charging Behavior Analysis

In this paper, we focus on the charging strategy of a community home EV cluster. A day is divided into 288 time periods, i.e., a 24 h cycle with a 5 min time window, and each charging pile is treated as a single agent whose decision variable is the output power of the pile when it is in working condition. The cluster charging cost consists of the users' charging cost and the overall charging load cost of the whole charging station. Among them, the user charging cost includes the cost of power sold back in V2G mode and the cost of conventional power use, expressed as follows:
$$C = \sum_{k=0}^{N} c(i,k)\, E_p(i,k)\, P(i,k)\, \Delta t \quad (4)$$
In Equation (4), $E_p(i,k)$ is the real-time tariff, $P(i,k)$ is the charging power of the charging pile, $\Delta t$ is the time window, i.e., 5 min, $N$ is the total number of time periods, and $c(i,k)$ is the charging status in the current time period, defined as follows:
$$c(i,t) = \begin{cases} -1, & \text{output (selling power)} \\ 0, & \text{done (standby)} \\ 1, & \text{charging} \end{cases} \quad (5)$$
In Equation (5), $c(i,t)$ is equal to 1 if the pile is charging, −1 if it is selling power, and 0 if it is in standby. The overall charging load cost $Q$ of a charging station can be expressed as follows:
$$Q(t) = \sum_{k=0}^{N} \alpha\, Q_{burden}(k,t) \quad (6)$$
In Equation (6), $\alpha$ is the load cost coefficient and $Q_{burden}$ is the grid load factor, which is calculated as follows:
$$Q_{burden}(i,t) = \frac{\int_t^{t+\Delta t} P(i,t)\,dt}{S_n^i \times \cos\phi_2} \times 100\% \quad (7)$$
In Equation (7), $S_n^i$ is the rated power of the charging pile, $\cos\phi_2$ is the power factor of the secondary side, and $P(i,t)$ is the output power.
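The sketch below illustrates how the per-pile charging cost of Equation (4) and the load factor of Equation (7) might be accumulated over the 288 five-minute windows of a day. The rated power is taken from the experimental setup in Section 4.2.1, while the power factor value and the treatment of the window integral as the average power over $\Delta t$ are assumptions.

```python
import numpy as np

DT_H = 5 / 60          # time window Δt = 5 min, expressed in hours
S_N = 37.5             # rated power of a charging pile (kW), per Section 4.2.1
COS_PHI = 0.95         # assumed secondary-side power factor (not stated in the paper)

def charging_cost(c, e_p, p):
    """Equation (4): daily charging cost of pile i.
    c, e_p, p are length-288 arrays (charging status, tariff, charging power)."""
    return float(np.sum(c * e_p * p * DT_H))

def load_factor(p):
    """Equation (7) per window: ratio of the (assumed constant) power over Δt to
    the rated apparent power, expressed as a percentage."""
    return p / (S_N * COS_PHI) * 100.0
```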

3.2.2. EV Cluster Charging Strategy: Markov Decision Process

The biggest challenge of EV cluster charging [19] is to make a charging choice for each charging pile while maintaining load balance and minimizing users' usage cost. Since there are different cooperative or competitive relationships between the charging piles, ordinary reinforcement learning has difficulty meeting these requirements, whereas multi-agent reinforcement learning coordinates the strategies of the agents through a central processor, ensuring the independence of each agent's decisions while maintaining cooperative communication.
As described above, a Markov decision model for EV cluster charging is as follows:
  • State space: the states of EV cluster charging can be divided into two groups: (1) the single-agent charging state, which is primarily responsible for ensuring that an individual EV's charging demand is satisfied; (2) the multi-agent charging state, which is a collective state and is mainly responsible for maintaining overall stability.
  • As shown in Table 1, the EV battery level $SOC(i,t)$ is obtained directly when the charging pile starts a charging task, and the charging status $c(i,t)$ is updated at the same time. To further reduce users' usage cost and extend battery life, our idea is to set a charging threshold $u(i,t)$ for daytime charging according to the community users' past usage $SOC_{his}(i,t)$, and to charge the remaining power during the low-consumption period. At the same time, a discharge threshold $v(i,t)$ is set in V2G mode to ensure that users have electricity available when they use the car unexpectedly.
In addition to the four single-agent state features, there are two multi-agent charging states, the tariff $e_p(t)$ and the area load $W_{burden}(t)$, where the area load is a real-time value obtained as a weighted average of the load factors of all charging piles, and the tariff is stored as historical data to continuously improve the neural network of Section 2.
Table 1. EV cluster charging state characteristics.

| | Single-Agent Charging State | Multi-Agent Charging State |
|---|---|---|
| Real-time data | Amount of electricity $SOC(i,t)$; charging status $c(i,t)$; charging threshold status $y_u(i,t)$; discharge threshold state $y_v(i,t)$ | Area load $W_{burden}(t)$ |
| Historical data | Historical electricity $SOC_{his}(i,t)$ | Electricity price $e_p(t)$ |
  • Action space: the EV cluster charging action $a(i,t)$ is the output power of each charging pile.
  • Reward: the reward provides delayed feedback to an agent for its action decisions, enabling it to constantly optimize its strategy. The reward function therefore has a great influence on the optimization of the agent's decision-making. In the multi-agent reinforcement learning algorithm proposed in this paper, the reward function is set as a negative penalty function, which helps the neural network converge quickly.
The reward function can be divided into a charging cost term, a grid load penalty, a charging power penalty, and a charging threshold penalty. The charging cost reward function is expressed as follows:
$$r_{charge}(i,t) = -\lambda\, c(i,t) \int_t^{t+\Delta t} e_p(t)\, P(i,t)\, dt \quad (8)$$
In Equation (8), $\lambda$ is the discount parameter, $c(i,t)$ is the charging status function, $e_p(t)$ is the real-time tariff, and $P(i,t)$ is the charging power. The grid load reward function is as follows:
$$w_p(t) = \begin{cases} \delta\, W_{burden}(t), & 75\% \le W_{burden}(t) \le 80\% \\ -\delta\, W_{burden}(t), & W_{burden}(t) < 75\% \ \text{or}\ W_{burden}(t) > 80\% \end{cases} \quad (9)$$
In Equation (9), $\delta$ is the discount parameter and $W_{burden}(t)$ is the regional grid load factor, a weighted average of the load factors of all charging piles, calculated as follows:
$$W_{burden}(t) = \sum_{i=0}^{N} Q_{burden}(i,t) \quad (10)$$
In Equation (10), $Q_{burden}(i,t)$ is the load factor of a single charging pile. Finally, the charging power reward function is:
$$w_s(i,t) = -\sigma\left(1 - SOC(i,t)\right)^2 \quad (11)$$
In Equation (11), $\sigma$ is the power discount parameter, and this term mainly judges whether charging has been completed. Considering the uncertainty of user usage scenarios, a charging threshold $g$ is set, and the threshold limit function is as follows:
$$r_g(i,t) = \begin{cases} \varepsilon, & c(i,t) > 0,\ SOC(i,t) > 80\% \\ -\varepsilon, & c(i,t) > 0,\ SOC(i,t) < 80\% \end{cases} \quad (12)$$
In Equation (12), $\varepsilon$ is the threshold discount parameter. When the battery level is below the threshold, the charging state is maintained and discharging is prohibited; when it is above the threshold, discharging is allowed, and discharging stops once the level falls back to the threshold. The final total reward function is shown in Equation (13):
$$R(i,t) = r_{charge}(i,t) + w_p(t) + w_s(i,t) + r_g(i,t) \quad (13)$$
  • Observation value: to give the central processor a better grasp of the global information, an observation value is set for each agent, as in Equation (14):
$$o(i) = \left\{ SOC(i,t),\ e_p(t+1),\ c(i,t) \right\} \quad (14)$$
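As a worked illustration of the reward design above, the following sketch combines Equations (8)–(13) for one pile in one time window. The discount parameters follow the values given in Section 4.2.2; the sign conventions and the assumption of constant power within a window are reconstructions, so this is a minimal sketch rather than the paper's exact reward implementation.

```python
DT_H = 5 / 60  # time window Δt = 5 min, in hours

def total_reward(soc, c, p, e_p, w_burden,
                 lam=0.1, delta=1.0, sigma=10.0, eps=0.5):
    """Equation (13): R(i,t) = r_charge + w_p + w_s + r_g (illustrative sketch).

    soc:      battery level SOC(i,t) in [0, 1]
    c:        charging status c(i,t) in {-1, 0, 1}
    p:        charging power P(i,t) in kW (assumed constant within Δt)
    e_p:      real-time tariff e_p(t)
    w_burden: regional grid load factor W_burden(t) in [0, 1]
    """
    r_charge = -lam * c * e_p * p * DT_H                  # Eq. (8): charging cost term
    if 0.75 <= w_burden <= 0.80:                          # Eq. (9): keep load in the band
        w_p = delta * w_burden
    else:
        w_p = -delta * w_burden
    w_s = -sigma * (1.0 - soc) ** 2                       # Eq. (11): completion penalty
    if c > 0:                                             # Eq. (12): threshold term
        r_g = eps if soc > 0.80 else -eps
    else:
        r_g = 0.0
    return r_charge + w_p + w_s + r_g
```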

3.2.3. Multi-Agent Reinforcement Learning-Based Charging Strategy Process for Electric Vehicle Clusters

The pseudo-code of the multi-agent RL-based EV cluster charging strategy is shown in Algorithm 1. At the beginning of each training round, a random process $N$ is initialized for the agents' action exploration, and the initialized states are fed into the model. During exploration, each agent chooses the best action $a(i,t)$ based on its current policy and the latest tariff $e_p(t)$ obtained from the LSTM-based tariff prediction network; after performing the action, it observes the reward $r(i,t)$, the new state $s(i,t+1)$, and the latest real-time tariff $E_p(t)$, and stores $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer $D$. The latest real-time tariff is returned to the LSTM network for storage and training. Next, the state is updated to $s_{t+1}$, and random samples are drawn from the buffer $D$ to update the critic network and the actor network. Finally, the target network parameters of each agent are updated.
Algorithm 1: MADDPG Electric Vehicle Cluster Charging Strategy Process
1: for episode = 1 to M do
2:          Initialize the random process N
3:          Accept the initial state s
4:          for t = 1 to maximum time period n do
5:           Forecast the electricity price e_p(t) with the LSTM network
6:           Select action a(i,t) according to the current strategy
7:           Perform action a(i,t); observe reward r(i,t) and new state s(i,t+1)
8:           Obtain the real-time electricity price E_p(t) and return it to the LSTM network for further training
9:           Put (s(i,t), a(i,t), r(i,t), s(i,t+1)) into the experience pool D
10:           s(i,t) = s(i,t+1)
11:           for agent i = 1 to K do
12:                Draw samples (s(i,t), a(i,t), r(i,t), s(i,t+1)) from the experience pool D
13:                Update the critic network and the actor network
14:            end for
15:            Update the target network parameters of each charging pile agent
16:        end for
17: end for
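The following PyTorch sketch shows the core components of the centralized-training, distributed-execution update that Algorithm 1 refers to: each pile has its own actor that acts on local observations, while a centralized critic sees the joint observations and actions of all piles. The network sizes, activation choices, and the soft-update rate are illustrative assumptions rather than the paper's reported settings.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Per-pile policy: local observation -> normalized charging power in [-1, 1]."""
    def __init__(self, obs_dim: int, act_dim: int = 1, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: joint observations and actions of all piles -> Q value."""
    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.01):
    """Polyak-average the target network parameters (the target-update step of Algorithm 1)."""
    for t_p, s_p in zip(target.parameters(), source.parameters()):
        t_p.data.mul_(1 - tau).add_(tau * s_p.data)
```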

4. Algorithm Analysis

4.1. Analysis of LSTM-Based Electricity Price Prediction Algorithm

4.1.1. Data Description

In this paper, we use electricity price data from the Australian Energy Market Operator (AEMO) [20], starting on 1 January 2022 and ending on 31 January 2022, with a data interval of 5 min, i.e., 288 data items per day; the monthly electricity price data are shown in Figure 3.

4.1.2. Parameter Setting

The algorithm takes 7 days (one week) as a cycle, so a cycle contains 2016 time period samples, and the electricity price of the next time period is predicted by learning from the data of the preceding 7 days. As shown in Figure 4, a total of 8928 data points are included; the first 6696 are the training data and the last 2232 are the test data. A sliding window of size 2017 with step size 1 is adopted; in each window, the first 2016 data points are the training set and the last one is the validation target. The learning rate of the LSTM algorithm is set to 0.001 and the number of training rounds is 300.
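A small sketch of the sliding-window split described above is given below, assuming the month of 5-min prices is held in a one-dimensional array. A window size of 2017 means the first 2016 points of each window form the model input and the final point is the prediction target; the file name and variable names are hypothetical.

```python
import numpy as np

def sliding_windows(prices: np.ndarray, window: int = 2016):
    """Build (input, target) pairs: each week of 5-min prices predicts the next step."""
    X, y = [], []
    for start in range(len(prices) - window):        # step size 1
        X.append(prices[start:start + window])        # previous 2016 points (one week)
        y.append(prices[start + window])               # the point to predict
    return np.asarray(X), np.asarray(y)

# Illustrative split of the 8928 January samples: first 6696 train, last 2232 test.
# prices = np.loadtxt("aemo_jan2022.csv")             # hypothetical file of 8928 values
# X_train, y_train = sliding_windows(prices[:6696])
# X_test,  y_test  = sliding_windows(prices[6696 - 2016:])  # keep one week of look-back
```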

4.1.3. Analysis of Results

The results of the LSTM-based electricity price prediction algorithm are shown in Figure 5, from which it can be seen that the LSTM network predicts the electricity price with a very small error and an accuracy of 90.12%, while the traditional ARIMA prediction algorithm, whose results are shown in Figure 6, achieves an accuracy of 89.57%. From the error comparison of the two algorithms in Figure 7, the blue dashed LSTM-based forecast shows a smoother error than the orange dashed ARIMA forecast. Additionally, as shown in Algorithm 1, the charging strategy algorithm obtains the predicted electricity price for the current time from the LSTM in every time period and, after obtaining the actual electricity price, returns it to the LSTM network for continued training in order to improve the accuracy of the algorithm.

4.2. Analysis of a Cluster Charging Strategy Algorithm for Community Home Electric Vehicles Based on Multi-Agent Reinforcement Learning

4.2.1. Experiment Description

This experiment simulates the charging scenario of a community home charging station with 12 charging piles of rated power 37.5 kW and an EV battery capacity of 60 kWh; in order to reduce the number of training rounds, the charging piles are assumed to work continuously for 24 h. Specifically, the initial battery level of each charging pile is initialized at the beginning of each training round, noise is added at each charging time step to simulate the uncertainties of a real charging scenario, and whenever a pile completes its charging task its battery level is re-initialized and charging continues until the end of the 288 time steps of the day.

4.2.2. Parameter Setting

To verify the algorithm of this paper, the following experiments are conducted; the simulation environment is shown in Table 2. In this algorithm, the target policy network learning rate is $10^{-4}$ and the value network learning rate is $10^{-4}$. Combining the data in Figure 3 and Figure 5, the average electricity price in January is about AUD 75, so the charging cost discount parameter $\lambda$ is set to 0.1, which makes the average charging cost $r_{charge}$ about 0.625. From Equation (9), the grid load rate $W_{burden}$ stays between 0 and 1, so the grid load discount parameter $\delta$ is set to 1; from the charging power reward function $w_s(i,t)$, we can see that $w_s(i,t)$ is on the order of $10^{-2}$, so the charging power discount parameter $\sigma$ is set to 10; finally, the threshold discount parameter $\varepsilon$ is set to 0.5.
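Collected as a single configuration block, the hyperparameters described above might look like the following sketch; values not given explicitly in the text (batch size and replay buffer size) are placeholders.

```python
config = {
    "actor_lr": 1e-4,          # target policy network learning rate
    "critic_lr": 1e-4,         # value network learning rate
    "lambda_cost": 0.1,        # charging-cost discount λ (avg. January price ≈ AUD 75)
    "delta_load": 1.0,         # grid-load discount δ (W_burden kept in [0, 1])
    "sigma_power": 10.0,       # charging-power discount σ
    "eps_threshold": 0.5,      # threshold discount ε
    "n_piles": 12,             # charging piles in the simulated station
    "steps_per_day": 288,      # 5-min windows over 24 h
    "batch_size": 64,          # placeholder, not stated in the paper
    "buffer_size": 100_000,    # placeholder, not stated in the paper
}
```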

4.2.3. Analysis of Results

To demonstrate the performance of the proposed EV cluster charging strategy algorithm, we discuss the simulation results for 1 February 2022 in detail. The average execution time of the proposed algorithm is 1.884 milliseconds, which provides strong real-time performance and meets the requirements of industrial applications. As shown in Figure 8, the blue line shows the electricity price in each period on 1 February and the green bars show the output power of the charging piles in each period. The electricity price fluctuates sharply around 13:00 and 18:00, the peak hours of electricity consumption; the charging strategy successfully avoids the high charging cost during these periods and reverses the output power to sell electricity to the grid, lowering the charging cost.
In terms of load, the dynamic power load versus the rated power load is shown in Figure 9. The dynamic power load is below the rated power load for most of the day, with an average load factor of 68%, which is 32 percentage points below the rated level. Compared with the single strategy of the DQN algorithm, this algorithm adopts centralized training and distributed execution, and each charging pile maintains both cooperative communication and policy independence, so it can better adjust its strategy to achieve overall load balancing of the community home charging station. As shown in Figure 10, Figure 11 and Figure 12, MADDPG greatly reduces the load pressure on the community grid and helps it operate efficiently and smoothly.
The charging cost is another key indicator of the performance of the charging strategy. From Figure 13, Figure 14 and Figure 15, the electricity cost of the proposed algorithm in each period is lower than that of a single-power charging strategy. Reinforcement learning combined with V2G technology significantly reduces users' electricity cost, and the MADDPG algorithm adjusts each charging pile's strategy more flexibly than the DQN algorithm, which effectively reduces the usage cost to a large extent.

5. Conclusions

In this paper, a cluster online charging strategy algorithm for community household electric vehicles was proposed. The algorithm, which combines multi-agent reinforcement learning with a long short-term memory network, aims to reduce the load of the regional power grid, expand the operating scope of local operators to achieve better energy scheduling, and reduce the charging cost of community users, achieving a win–win situation among the three parties. In particular, in the context of today's volatile energy prices, the algorithm predicts the next stage of energy prices through a long short-term memory network and combines a multi-agent reinforcement learning algorithm with a centralized-training, distributed-execution model to maintain communication between agents while ensuring the independence of their decisions. Simulation results show that this algorithm, combined with the V2G model, can effectively avoid peak consumption periods and sell power back to the grid, which greatly reduces users' cost. In the future, a recommendation algorithm for a user-oriented charging strategy will be researched to guide users to avoid peak usage periods through incentive strategies and further reduce charging costs.

Author Contributions

Pre-research data collection for this article was done by X.S. and Y.Z. Software was done by Y.Z. X.S. and Y.Z. were responsible for writing and editing the article. D.W. was responsible for project management, obtaining funding and reviewing the article. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partly funded by the National Natural Science Foundation of China (Grant no.61961010), Natural Science Foundation of Guangxi (Grant no. 2021AC19255), Guangxi Science and technology major special projects (Grant no. AA19046004) and Innovation Project of Guangxi Graduate Education (Grant no. YCSW2022314).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ensslen, A.; Ringler, P.; Dörr, L.; Jochem, P.; Zimmermann, F.; Fichtner, W. Incentivizing smart charging: Modeling charging tariffs for electric vehicles in German and French electricity markets. Energy Res. Soc. Sci. 2018, 42, 112–126.
  2. Lu, R.; Hong, S.H. Incentive-based demand response for smart grid with reinforcement learning and deep neural network. Appl. Energy 2019, 236, 937–949.
  3. Yang, T.; Zhao, L.; Li, W.; Zomaya, A.Y. Reinforcement learning in sustainable energy and electric systems: A survey. Annu. Rev. Control 2020, 49, 145–163.
  4. Bibak, B.; Tekiner-Moğulkoç, H. A comprehensive analysis of Vehicle to Grid (V2G) systems and scholarly literature on the application of such systems. Renew. Energy Focus 2021, 36, 1–20.
  5. Rücker, F.; Bremer, I.; Linden, S.; Badeda, J.; Sauer, D.U. Development and Evaluation of a Battery Lifetime Extending Charging Algorithm for an Electric Vehicle Fleet. Energy Procedia 2016, 99, 285–291.
  6. Schneider, F.; Thonemann, U.W.; Klabjan, D. Optimization of Battery Charging and Purchasing at Electric Vehicle Battery Swap Stations. Transp. Sci. 2018, 52, 1211–1234.
  7. Habeeb, S.A.; Tostado-Veliz, M.; Hasanien, H.M.; Turky, R.A.; Meteab, W.K.; Jurado, F. DC Nanogrids for Integration of Demand Response and Electric Vehicle Charging Infrastructures: Appraisal, Optimal Scheduling and Analysis. Electronics 2021, 10, 2484.
  8. Sadeghianpourhamami, N.; Deleu, J.; Develder, C. Definition and Evaluation of Model-Free Coordination of Electrical Vehicle Charging with Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 203–214.
  9. Tuchnitz, F.; Ebell, N.; Schlund, J.; Pruckner, M. Development and Evaluation of a Smart Charging Strategy for an Electric Vehicle Fleet Based on Reinforcement Learning. Appl. Energy 2021, 285, 116382.
  10. Chang, F.; Chen, T.; Su, W.; Alsafasfeh, Q. Control of battery charging based on reinforcement learning and long short-term memory networks. Comput. Electr. Eng. 2020, 85, 106670.
  11. Wan, Y.; Qin, J.; Ma, Q.; Fu, W.; Wang, S. Multi-agent DRL-based data-driven approach for PEVs charging/discharging scheduling in smart grid. J. Frankl. Inst. 2022, 359, 1747–1767.
  12. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. Nonlinear Phenom. 2020, 404, 132306.
  13. Zhang, H.; Li, D.; He, Y. Multi-Robot Cooperation Strategy in Game Environment Using Deep Reinforcement Learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 12–15 December 2018; pp. 886–891.
  14. Li, L.; Lv, Y.; Wang, F.-Y. Traffic signal timing via deep reinforcement learning. IEEE/CAA J. Autom. Sin. 2016, 3, 247–254.
  15. Wang, T.; Cao, J.; Hussain, A. Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103046.
  16. Li, X.; Wang, X.; Zheng, X.; Jin, J.; Huang, Y.; Zhang, J.J.; Wang, F.-Y. SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning. Neurocomputing 2022, 467, 300–309.
  17. Popper, J.; Yfantis, V.; Ruskowski, M. Simultaneous Production and AGV Scheduling Using Multi-Agent Deep Reinforcement Learning. Procedia CIRP 2021, 104, 1523–1528.
  18. Xu, T.; Zhao, M.; Yao, X.; Zhu, Y. An improved communication resource allocation strategy for wireless networks based on deep reinforcement learning. Comput. Commun. 2022, 188, 90–98.
  19. Narasipuram, R.P.; Mopidevi, S. A technological overview & design considerations for developing electric vehicle charging stations. J. Energy Storage 2021, 43, 103225.
  20. AEMO | Combined Price and Demand Data. Available online: https://aemo.com.au/energy-systems/electricity/national-electricity-market-nem/data-nem/aggregated-data (accessed on 14 April 2022).
Figure 1. RNN structure.
Figure 2. LSTM structure.
Figure 3. January 2022 electricity prices in New South Wales, Australia.
Figure 4. LSTM sliding window.
Figure 5. LSTM algorithm electricity price prediction.
Figure 6. ARIMA algorithm electricity price prediction.
Figure 7. Comparison chart of tariff forecast errors.
Figure 8. Electricity price and power chart for each time period on 1 February.
Figure 9. Load comparison chart for each time period on 1 February.
Figure 10. Comparison of the power of the MADDPG algorithm and the DQN algorithm for charging post #1.
Figure 11. Comparison of the power of the MADDPG algorithm and the DQN algorithm for charging post #5.
Figure 12. Comparison of the MADDPG algorithm with the DQN algorithm for the sum of power of all charging posts in the community.
Figure 13. Comparison of electricity costs by time period on 1 February.
Figure 14. MADDPG algorithm vs. DQN algorithm charging post #2 cost comparison chart.
Figure 15. Comparison of the average cost per charging post by time period for the MADDPG and DQN algorithms.
Table 2. Simulation environment configuration.

| Experimental Configuration | Specific Parameters |
|---|---|
| CPU | Intel Core i7-11700K @ 5.0 GHz |
| GPU | NVIDIA GeForce GTX 1660 Super |
| Memory | 16 GB |
| Operating System | Windows 10 |
| Programming Environment | Python 3.7, PyTorch 1.10.2 + cu102 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
