*Article* **A Frequency and Voltage Coordinated Control Strategy of Island Microgrid including Electric Vehicles**

**Peixiao Fan 1, Song Ke 1,\*, Salah Kamel 2, Jun Yang 1, Yonghui Li 1, Jinxing Xiao 3, Bingyan Xu <sup>3</sup> and Ghamgeen Izat Rashed <sup>1</sup>**


**Abstract:** Frequency and voltage deviations are important indicators of power quality, so it is important for microgrids to maintain the stability of voltage and frequency (VF). To address the VF fluctuations of a microgrid caused by wind disturbance and load fluctuation, this paper proposes a comprehensive VF control strategy for an islanded microgrid with electric vehicles (EVs) based on Deep Deterministic Policy Gradient (DDPG). First, *SOC* constraints of EVs are added to construct a cluster-EV charging model that considers the randomness of users' travel demand and charging behavior. In addition, a four-quadrant two-way charger capacity model is introduced to build a microgrid VF control model including load, micro gas turbine (MT), EVs, and their random power increment constraints. Second, the structure of the DDPG controller is designed according to the two control goals of microgrid frequency and voltage. Then, the state and action spaces are defined, the global and local reward functions are designed, and the optimal hyperparameters are selected. Finally, different scenarios are set up in an islanded microgrid with EVs, and the simulation results are compared with traditional PI control and *R*(λ) control. The results show that the proposed DDPG controller can quickly and efficiently suppress the VF fluctuations caused by wind disturbance and load fluctuations at the same time.

**Keywords:** islanded microgrid; electric vehicles; charger capacity model; VF control; DDPG

### **1. Introduction**

A microgrid is a small power generation and distribution system composed of distributed power sources, energy storage devices, energy conversion devices, related loads, and monitoring and protection devices. It is an autonomous system that can realize self-control, protection, and management, and it can operate in grid-connected mode or islanded mode. In islanded mode, the power quality of the microgrid is usually maintained by the micro sources and flexible loads [1]. At the same time, with the development of vehicle-to-grid (V2G) technology, research on EVs in the areas of grid peak shaving and valley filling, suppression of power fluctuations, and microgrid stability control has also deepened [2,3], which brings opportunities and challenges to the VF regulation of microgrids.

Due to the limited capacity of the islanded microgrid, ensuring the stability of frequency and voltage is the key to the operational safety of the microgrid. In [4], a VF strategy for an islanded microgrid based on a fuzzy logic controller is proposed, which can control active and reactive powers and decrease power losses of the microgrid, thus demonstrating the effectiveness and robustness of the proposed controller over the conventional proportional-integral controller. In [5], a decoupled VF controller for DGs is proposed, which is able to keep the

**Citation:** Fan, P.; Ke, S.; Kamel, S.; Yang, J.; Li, Y.; Xiao, J.; Xu, B.; Rashed, G.I. A Frequency and Voltage Coordinated Control Strategy of Island Microgrid including Electric Vehicles. *Electronics* **2022**, *11*, 17. https://doi.org/10.3390/electronics11010017

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 5 December 2021 Accepted: 20 December 2021 Published: 22 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

grid VF magnitude constant, so as to enhance the resilience and increase the penetration of renewable energy to the stand-alone microgrid. In [6], an optimized solution is proposed for minimizing both frequency and voltage deviations. The simultaneous control of VF is achieved with proper load sharing among the DG units. However, the system parameter setting of the traditional control strategy mentioned above is complicated, and the control performance needs to be further improved when faced with complex working conditions such as wind disturbance and load fluctuation.

Therefore, various intelligent algorithms are gradually being applied to the control of microgrids. In [7], a new scheme for the online minimization of harmonic distortion of an islanded microgrid based on a population-based optimization method is proposed. It presents a new central controller that optimizes network voltage harmonics with the particle swarm optimization (PSO) algorithm, while active power is shared between distributed generation units. In [8], a coordinated load shedding control scheme based on Double-*Q* learning for an islanded microgrid is proposed. It determines the appropriate load shedding amount and objects when frequency is disturbed by considering the relationship between the active power and frequency deviation of each distributed energy resource. However, the intelligent controllers mentioned above can only regulate frequency or voltage, that is, they cannot handle frequency recovery and voltage adjustment at the same time.

Meanwhile, in the construction of the microgrid model, the access of EVs is not considered, and the boundary of the output power of each unit is ignored. Thus, there is room for further optimization in the microgrid model and control strategy. EVs have become a new type of distributed energy storage unit owing to their energy saving, environmental protection, and flexibility [9,10], and they can provide power support for the islanded microgrid and improve its operational flexibility through V2G technology. In [11], an islanded microgrid LFC model including loads, distributed power sources, MT, EVs, and their constraints is established. However, the output power boundary of the EV charging station model in that work is a fixed value, which does not match the actual situation. In [12], a microgrid including micro gas turbine (MT), EVs, distributed power, and loads is established, and an improved robust model predictive frequency control strategy of microgrids with EVs is proposed, which can better suppress the frequency fluctuation with a faster response speed than other methods, but the random output power boundary is not refined from the perspective of cluster EVs. In addition, none of the above references considers the reactive power regulation effect of EVs on voltage stability. In fact, the power boundary of the charging station can be affected by user travel demand, charging behavior, and the characteristics of EV clusters. Thus, the active power *P* and reactive power *Q* output by the EV charging station can be adjusted according to the control command and the power factor angle of the charger, so as to complete the stability control of VF.

In summary, the randomness of users affects the charging behavior of EV stations, and no suitable intelligent control algorithm yet uses EVs to realize coordinated VF control of an islanded microgrid. Thus, a VF coordinated control strategy based on Deep Deterministic Policy Gradient (DDPG) is proposed in this paper and applied to the VF control of an islanded microgrid with EVs. The main contributions are as follows: (1) To address the randomness in the charging boundary of EVs caused by user behavior, the VF control model of EVs is established. The *SOC* constraint of the EVs is established, and a four-quadrant two-way charger capacity model is introduced. Thus, a microgrid VF control model including load, MT, EVs, and their random power boundary is built. (2) Because voltage and frequency fluctuations can be caused by wind disturbance and load fluctuations, a DDPG controller with online learning and experience replay capabilities is selected. DDPG has good convergence characteristics, so it can effectively coordinate the frequency recovery and voltage regulation of the islanded microgrid. (3) To regulate voltage and frequency effectively at the same time, the structure of the DDPG controller is designed according to the two control goals of microgrid frequency and voltage. Then, the state and action spaces are defined, the global and local reward functions are designed, and the optimal hyperparameters are selected, so that the VF control requirements are met simultaneously.

### **2. Microgrid Control Model with EVs**

The VF control in microgrid can be realized by distributed power supply, energy storage device, etc. In addition, EVs can also participate in microgrid VF regulation. For the VF control of microgrid, the microgrid control system of distributed power supply, load, MT, and EVs is established in this section.

### *2.1. Electric Vehicle Control Model*

As a flexible energy storage device in microgrid control, EVs can regulate the charge and discharge power of the battery according to the instructions of the controller, thereby controlling the exchange of active power with the grid [13]. At the same time, charger scheduling is applied to regulate voltage or reactive power. The two-way charger can realize four-quadrant operation [14]. However, the power factor alone cannot determine the transmission direction of reactive power, so it cannot identify the operating quadrant of the charger. Taking the power factor angle as the control variable instead determines the transmission direction and magnitude of active and reactive power together, which is more conducive to the two-way transmission control of active and reactive power between the grid and the EV.

The function of EVs in microgrid control is similar to that of energy storage devices. In terms of active power, the charging and discharging power of an EV station is limited to ±*λ*e by the inverter capacity. *E*max is the maximum capacity of the EV station. In addition, the recommended maximum capacity *E*rmax = 0.9*E*max and the recommended minimum capacity *E*rmin = 0.1*E*max are set to ensure the safe and stable operation of the EV station. When the current capacity *E* of the EV station is higher than *E*rmax, the station can only discharge to the microgrid, and the discharge power range is 0 to *λ*e. Similarly, if the current capacity of the EV station is lower than *E*rmin, the station can only be charged from the microgrid, and the charging power range is −*λ*e to 0. In addition, the EV control model is affected by uncertain user factors such as the randomness of travel demand and charging behavior.

Firstly, the randomness of user travel demand makes the capacity and power limits of the charging station random. Therefore, it is necessary to establish *SOC* constraints to ensure that the user's normal travel is still satisfied under the interaction between EVs and the grid. In addition, the initial *SOC* of the battery in this paper is set as a random number [15] obeying a Gaussian distribution, with the probability density function given in Equation (1):

$$f(s) = \frac{1}{\sigma\_s \sqrt{2\pi}} e^{\frac{-\left(s - \mu\_s\right)^2}{2\sigma\_s^2}}\tag{1}$$

where *μ*<sub>s</sub> represents the average value of *SOC*, and *σ*<sub>s</sub> represents the standard deviation.

According to the 2017 National Household Travel Survey (NHTS) of the US Department of Transportation [16], it can be obtained that the daily mileage *L* obeys lognormal distribution, and its probability density function is as follows:

$$f(L) = \frac{1}{L\sigma_L\sqrt{2\pi}} e^{\frac{-(\ln L - \mu_L)^2}{2\sigma_L^2}} \tag{2}$$

where *μ*<sub>L</sub> and *σ*<sub>L</sub> are the mean and standard deviation of ln *L*, the logarithm of the daily mileage *L*.

According to the daily driving mileage, the charging time *T*c is calculated:

$$T_c = \frac{L Q_{100}}{100 P_c} \tag{3}$$

where *P*<sub>c</sub> is the charging power, and *Q*<sub>100</sub> is the power consumption per 100 km.

For the leaving time *T*<sub>leave</sub>, it is required that *T*<sub>leave</sub> ≥ *T*<sub>c</sub>. Thus, *T*<sub>leave</sub> is set as follows:

$$T_{leave} = (1 + \sigma_T) T_c \tag{4}$$

where *σ*<sub>T</sub> is a positive random number.

Based on the above parameters, the *SOC* demanded for future travel, named *SOC*<sub>m</sub>, can be calculated [17]:

$$SOC_m = S_0 + \frac{L}{L_{max}} \tag{5}$$

where *S*<sub>0</sub> is the initial *SOC* of the EV.
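The sampling chain of Equations (1)–(5) can be illustrated with a minimal Python sketch. The function name `sample_ev_profile`, all default parameter values, the assumed units (km, kWh/100 km, kW, h), the uniform range used for *σ*<sub>T</sub>, and the cap of *SOC*<sub>m</sub> at 1 are assumptions made for illustration only; none of them is taken from the paper.

```python
import random


def sample_ev_profile(rng, mu_s=0.5, sigma_s=0.1, mu_l=3.2, sigma_l=0.9,
                      q100=15.0, p_c=7.0, l_max=300.0):
    """Draw one EV's travel and charging parameters following Eqs. (1)-(5).

    Every default value is an illustrative assumption, not a value
    reported in the paper.
    """
    # Eq. (1): initial SOC from a Gaussian, clipped to the physical range [0, 1]
    s0 = min(1.0, max(0.0, rng.gauss(mu_s, sigma_s)))
    # Eq. (2): daily mileage L from a lognormal distribution (assumed km)
    mileage = rng.lognormvariate(mu_l, sigma_l)
    # Eq. (3): charging time T_c (assumed hours, Q100 in kWh/100 km, P_c in kW)
    t_c = mileage * q100 / (100.0 * p_c)
    # Eq. (4): leaving time with a positive random margin sigma_T
    sigma_t = rng.uniform(0.05, 1.0)  # assumed range for the random number
    t_leave = (1.0 + sigma_t) * t_c
    # Eq. (5) as printed, capped at 1 because SOC cannot exceed 100 % (assumption)
    soc_m = min(1.0, s0 + mileage / l_max)
    return {"s0": s0, "mileage": mileage, "t_c": t_c,
            "t_leave": t_leave, "soc_m": soc_m}


profile = sample_ev_profile(random.Random(0))
print(profile)
```

Repeating the call for every vehicle in the station would yield a randomized EV cluster of the kind the charging model is built on.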

Therefore, for EVs in the station, the *SOC* is maintained within the range [*SOC*rmin, *SOC*rmax], where *SOC*rmax and *SOC*rmin are the recommended maximum and minimum values of *SOC* that protect the life of the battery. To guarantee a sufficient *SOCm* for follow-up driving when an EV leaves, constraint conditions are added to the *SOC* of EVs, as shown in Figure 1. The blue dotted line represents the charge boundary, which means that the EV can no longer charge when the *SOC* reaches *SOC*rmax. The red dotted line represents the discharge boundary, which means that the EV can no longer discharge when the *SOC* reaches *SOC*rmin. The solid green line represents the boundary of forced charging, which means that the EV is forced to charge to ensure the *SOCm* when leaving the charging station.

**Figure 1.** Boundary of charging and discharging constraints for EVs.

Furthermore, in terms of active power, the rated charging power of a single EV is denoted *P*<sup>ch</sup><sub>EV,i</sub> and the rated discharging power *P*<sup>dis</sup><sub>EV,i</sub>. The relationship between the power increment of a single EV and its charging and discharging state is as follows. When *SOC*<sub>i</sub> ≥ *SOC*rmax, the single EV can only discharge, that is, only a positive power increment 0 < Δ*P*<sub>EV,i</sub> < *P*<sup>dis</sup><sub>EV,i</sub> is allowed, which ensures that *SOC*<sub>i</sub> is kept below *SOC*rmax. When *SOC*<sub>i</sub> ≤ *SOC*rmin, the single EV can only be charged, that is, only a negative power increment −*P*<sup>ch</sup><sub>EV,i</sub> < Δ*P*<sub>EV,i</sub> < 0 is allowed, which ensures that *SOC*<sub>i</sub> is kept above *SOC*rmin. When *SOC*rmin < *SOC*<sub>i</sub> < *SOC*rmax, the single EV can be both charged and discharged, so the power increment satisfies −*P*<sup>ch</sup><sub>EV,i</sub> < Δ*P*<sub>EV,i</sub> < *P*<sup>dis</sup><sub>EV,i</sub>. In summary, the instruction distribution of the EV station through the controller is shown in Figure 2. In addition, the charging and discharging constraint boundary of a single EV can be obtained as follows:

$$P_{EV,i}^{+}(t) = \begin{cases} 0, & t \ge T_{leave,i}, \text{ or } 0 \le t < T_{leave,i} \text{ and } SOC_{EV,i}(t) \le SOC_{EV,i}^{-}(t) \\ P_{EV,i}^{dis}(t), & 0 \le t < T_{leave,i} \text{ and } SOC_{EV,i}(t) > SOC_{EV,i}^{-}(t) \end{cases} \tag{6}$$

$$P_{EV,i}^{-}(t) = \begin{cases} 0, & t \ge T_{leave,i}, \text{ or } 0 \le t < T_{leave,i} \text{ and } SOC_{EV,i}(t) \ge SOC_{EV,i}^{+}(t) \\ P_{EV,i}^{ch}(t), & 0 \le t < T_{leave,i} \text{ and } SOC_{EV,i}(t) < SOC_{EV,i}^{+}(t) \end{cases} \tag{7}$$

**Figure 2.** The distribution of controller commands in the charging station.

The charging and discharging constraint boundary of the cluster EVs' *PEV* can be obtained from the boundary of a single EV as follows:

$$\begin{cases} P\_{EV}^-(t) < \Delta P\_{EV}(t) < P\_{EV}^+(t) \\\ P\_{EV}^+(t) = \sum\_{i=1}^{n\_{EV}} P\_{EV,i}^+(t) \\\ P\_{EV}^-(t) = \sum\_{i=1}^{n\_{EV}} P\_{EV,i}^-(t) \end{cases} \tag{8}$$

where *n*<sub>EV</sub> is the number of EVs.
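As an illustration of Equations (6)–(8), the following Python sketch computes the power bounds of each vehicle and sums them over the cluster. The `EV` data class, its field names, and the numerical values are hypothetical. The sketch also assumes that the lower bound is the negative of the rated charging power, so that charging appears as a negative increment, in line with the inequality −*P*<sup>ch</sup> < Δ*P* < 0 stated in the text.

```python
from dataclasses import dataclass


@dataclass
class EV:
    """Hypothetical container for the per-vehicle quantities in Eqs. (6)-(7)."""
    soc: float        # current SOC of the vehicle
    soc_lower: float  # discharge boundary SOC^-_{EV,i}(t)
    soc_upper: float  # charge boundary SOC^+_{EV,i}(t)
    p_dis: float      # rated discharging power (>= 0)
    p_ch: float       # rated charging power (>= 0)
    t_leave: float    # leaving time T_leave,i


def single_ev_bounds(ev, t):
    """Return (P_minus, P_plus) of one EV at time t."""
    if t >= ev.t_leave:
        # The vehicle has left the station and offers no regulation capacity
        return 0.0, 0.0
    # Eq. (6): discharging is allowed only above the discharge boundary
    p_plus = ev.p_dis if ev.soc > ev.soc_lower else 0.0
    # Eq. (7): charging is allowed only below the charge boundary.
    # Assumption: the bound is taken as -p_ch so that charging is negative.
    p_minus = -ev.p_ch if ev.soc < ev.soc_upper else 0.0
    return p_minus, p_plus


def cluster_bounds(evs, t):
    """Eq. (8): sum the single-EV bounds over the whole cluster."""
    bounds = [single_ev_bounds(ev, t) for ev in evs]
    return sum(b[0] for b in bounds), sum(b[1] for b in bounds)


fleet = [
    EV(soc=0.50, soc_lower=0.2, soc_upper=0.9, p_dis=3.0, p_ch=3.0, t_leave=8.0),
    EV(soc=0.95, soc_lower=0.2, soc_upper=0.9, p_dis=3.0, p_ch=3.0, t_leave=8.0),
    EV(soc=0.50, soc_lower=0.2, soc_upper=0.9, p_dis=3.0, p_ch=3.0, t_leave=1.0),
]
print(cluster_bounds(fleet, t=2.0))
```

In this example the first vehicle can charge and discharge, the second can only discharge because its *SOC* exceeds the charge boundary, and the third has already left the station.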

In addition, the active power capacity calculation is related to the number and the *SOC* state of EV:

$$E_{ct} = \sum_{i=1}^{N_{EV}} (SOC_i \times E_i) / E_{all} \tag{9}$$

where *Ei* represents the active power capacity of a single EV, *Eall* represents the total active power capacity of EVs, and *Ect* represents the real time active power capacity of the EVs station.

From this, it can be obtained that the output power Δ*PEV* of the EV charging station during the charging and discharging process should meet the following constraints:

$$\begin{cases} 0 < \Delta P_{EV} < \lambda_e, & E_{ct} > E_{rmax} \\ -\lambda_e < \Delta P_{EV} < \lambda_e, & E_{rmin} < E_{ct} < E_{rmax} \\ -\lambda_e < \Delta P_{EV} < 0, & E_{ct} < E_{rmin} \end{cases} \tag{10}$$

when *Ect* > *E*rmax, the real time active power capacity *Ect* of the EV station is higher than the recommended maximum capacity *E*rmax, due to the rapid increase in the number of EVs in the charging station. When *Ect* < *E*rmin, the number of EVs in the charging station is too small, or the EVs in the charging station are all in a low battery state. When *E*rmin < *Ect* < *E*rmax, the EV station can either discharge to the microgrid or charge from the microgrid.

Furthermore, the capacity state *E* of the EV station depends on the EVs present in the station and their different *SOC* states. Therefore, combining Equations (8) and (10) gives the constraint on the active output power Δ*PEV* that considers the travel demand of users, the number of EVs, and the real-time *SOC* of the EVs:

$$\begin{cases} 0 < \Delta P_{EV} \le P_{EV}^{+}(t), & E_{ct} > E_{rmax} \\ P_{EV}^{-}(t) \le \Delta P_{EV} \le P_{EV}^{+}(t), & E_{rmin} < E_{ct} < E_{rmax} \\ P_{EV}^{-}(t) \le \Delta P_{EV} < 0, & E_{ct} < E_{rmin} \end{cases} \tag{11}$$
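The case distinction of Equation (11) can be sketched in a few lines of Python. The function names `station_power_bounds` and `clip_command` and the numerical values are hypothetical. The sketch treats the strict inequalities at 0 as closed bounds and assigns the boundary cases *E*ct = *E*rmin and *E*ct = *E*rmax, which the equation leaves open, to the middle branch; both choices are assumptions.

```python
def station_power_bounds(e_ct, e_rmin, e_rmax, p_minus, p_plus):
    """Return (lower, upper) limits of the station increment, following Eq. (11).

    p_minus (<= 0) and p_plus (>= 0) are the cluster bounds from Eq. (8).
    """
    if e_ct > e_rmax:
        # Capacity above the recommended maximum: discharging only
        return 0.0, p_plus
    if e_ct < e_rmin:
        # Capacity below the recommended minimum: charging only
        return p_minus, 0.0
    # Otherwise the station may both charge and discharge
    return p_minus, p_plus


def clip_command(delta_p, bounds):
    """Limit a controller command to the admissible range."""
    lower, upper = bounds
    return max(lower, min(upper, delta_p))


bounds = station_power_bounds(e_ct=0.95, e_rmin=0.1, e_rmax=0.9,
                              p_minus=-5.0, p_plus=6.0)
print(bounds, clip_command(-2.0, bounds))
```

A charging command issued while the station is above *E*rmax is therefore clipped to zero, which is the behavior the constraint is meant to enforce.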

After obtaining the boundary of the active power Δ*PEV* of the EVs, the reactive power boundary can be obtained through the power factor angle of the charger. The circuit topology of the four-quadrant bidirectional charger is mostly a double-buck AC–DC half-bridge conversion circuit, a traditional AC–DC half-bridge conversion circuit, or an AC–DC full-bridge conversion circuit. The capacity curve of the charger is shown in Figure 3 [18].

**Figure 3.** The capacity curve of the charger.

*ϕ* is the power factor angle when the apparent rated power is Δ*SEV*. *ϕ*min and *ϕ*max are the minimum and maximum power factor angles of the charger. The positive directions of the *P* axis and *Q* axis represent energy transferred from the grid to the EV charger. When the active power is *OA*, the adjustable range of reactive power is *CC*', and the length of *OB* is the apparent rated power Δ*SEV*. In addition, the relationship between the active power Δ*PEV* and the reactive power Δ*QEV* can be obtained from Figure 3, as in Formulas (12) and (13):

$$
\Delta Q\_{EV} = \Delta P\_{EV} \tan \varphi \tag{12}
$$

$$\begin{gathered} \varphi_{min} < \varphi < \varphi_{max} \\ -\varphi_{min} < \varphi < -\varphi_{max} \end{gathered} \tag{13}$$

Thus, the power factor angle needs to meet the operating characteristics of the charger. When Δ*PEV* > 0, the grid feeds active power to the EVs; when Δ*QEV* > 0, the grid feeds reactive power to the EVs.
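Formula (12) can be illustrated with a short Python sketch. The function name `reactive_power` is hypothetical. The sketch interprets Formula (13) as a limit on the magnitude of the power factor angle that applies symmetrically to positive and negative angles; this interpretation is an assumption and is not confirmed by the paper.

```python
import math


def reactive_power(delta_p, phi, phi_min, phi_max):
    """Reactive power from Formula (12): dQ = dP * tan(phi).

    Assumption: |phi| is limited to [phi_min, phi_max] and its sign is kept,
    which is one possible reading of Formula (13). Angles are in radians.
    """
    magnitude = min(max(abs(phi), phi_min), phi_max)
    phi_limited = math.copysign(magnitude, phi)
    return delta_p * math.tan(phi_limited)


# A command of 45 degrees within limits gives dQ equal to dP
print(reactive_power(1.0, math.pi / 4, 0.0, math.pi / 3))
# A command beyond phi_max is limited to phi_max before Formula (12) is applied
print(reactive_power(2.0, 1.5, 0.0, math.pi / 4))
```

Because the sign of the angle is preserved, the same function covers both directions of reactive power flow.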

In summary, the boundary of the output power increment of the EV charging station is affected by the number of EV in the charging station *NEV*, *SOC* state, electric vehicle charging station real time capacity *E*, and the angle of charging power factor.

### *2.2. VF Control Model of Microgrids with EVs*

The output characteristics of distributed wind power and photovoltaic system are random, and load fluctuations simultaneously affect the output of active and reactive power. Therefore, in the process of microgrid VF control in this paper, the wind power and photovoltaic system are equivalent to disturbance sources [19]. In addition, the load response characteristics of wind power system and photovoltaic power system are similar, so only the microgrid load VF control under the wind power disturbance is considered, and it is applied using recorded historical data [20]. In addition, the MT is added to the microgrid system as a main control unit in this paper to ensure the flexibility and validity of microgrid regulation.

The structure of the microgrid is shown in Figure 4. The microgrid includes an MT, EVs, distributed wind power, and load.

**Figure 4.** Structure of the islanded microgrid.

Δ*PL* and Δ*QL* are the load disturbance power, Δ*PW* and Δ*QW* are the wind disturbance power, Δ*PMT* and Δ*QMT* are the power variation of *MT*, and Δ*PEV* and Δ*QEV* are the power variation of EVs.

### **3. The Design of Microgrid VF Controller Based on DDPG**

In the islanded microgrid, it is important to maintain the stability of VF, but there are some control problems such as various uncertainties and nonlinearities caused by DGs and EVs, which can inevitably cause the VF fluctuation and make it deviate from the reference value.

In addition, Deep Reinforcement Learning (DRL), with online learning, experience replay, and other advantages, is suitable for nonlinear systems [21]. Therefore, in this paper, a VF controller based on DDPG for an islanded microgrid with EVs is designed. The frequency and voltage deviations are fed back to the DDPG controller, which adjusts the power output of each unit to ensure the stability of the frequency and voltage of the system.

### *3.1. Theoretical Analysis of DDPG*

*Q*-learning and Deep Q-Network (DQN) are typical value-based reinforcement learning algorithms that use value functions to learn the optimal strategy during the interaction with the environment [22]. However, since *Q*-learning cannot process continuous signals, it is necessary to discretize the action space. Therefore, it is difficult to realize the precise control of MT, EVs, and chargers, which makes these algorithms unsuitable for the design of this paper.

In contrast, DDPG can learn in a continuous action space [23]. DDPG contains four networks: the actor current network, the actor target network, the critic current network, and the critic target network. At time *t*, the actor current network parameter is *θ*, the actor target network parameter is *θ*′, the critic current network parameter is *ω*, and the critic target network parameter is *ω*′.

Among these four networks, the actor current network generates the action *a*<sub>t</sub> according to the current state *s*<sub>t</sub>. The actor target network generates the action *a*<sub>t+1</sub> at time *t* + 1 according to the subsequent state of the environment. The critic current network calculates the value *R*<sub>t</sub> corresponding to the state *s*<sub>t</sub> and action *a*<sub>t</sub>. The critic target network generates the value *Q*′<sub>value</sub>(*s*<sub>t+1</sub>, *a*<sub>t+1</sub>|*ω*′) based on the subsequent state *s*<sub>t+1</sub> and action *a*<sub>t+1</sub>, which is used to calculate the target value *y*, as shown in Formula (14):

$$y = r_t + \gamma Q'_{value}(s_{t+1}, a_{t+1} \mid \omega') \tag{14}$$

where *γ* is a discount factor with 0 < *γ* < 1, and *Q*′<sub>value</sub>(*s*<sub>t+1</sub>, *a*<sub>t+1</sub>|*ω*′) is the value generated by the critic target network for the subsequent state *s*<sub>t+1</sub> and action *a*<sub>t+1</sub>.

Meanwhile, the critic current network parameter *ω* is updated by gradient back-propagation through the neural network, using the mean squared error loss function in Formula (15). In addition, the parameter *θ* of the actor current network is updated through the gradient of the neural network, as shown in Formula (16):

$$L = \frac{1}{m} \sum_{j=1}^{m} \left( y_j - Q(s_j, a_j, \omega) \right)^2 \tag{15}$$

$$\nabla f(\theta) = \frac{1}{m} \sum_{j=1}^{m} \left[ \nabla_a Q(s, a, \omega) \Big|_{s=s_j, a=\pi_\theta(s)} \nabla_\theta \pi_\theta(s) \Big|_{s=s_j} \right] \tag{16}$$

where *m* is the number of samples, *y*<sub>j</sub> is the target value of the *j*-th sample, *Q*(*s*<sub>j</sub>, *a*<sub>j</sub>, *ω*) is the output value of the critic current network for the *j*-th sample, and *π*<sub>θ</sub>(·) is the output value of the actor current network.

Furthermore, it is necessary to update the critic target network and actor target network parameters by Equation (17):

$$\begin{aligned} \omega' &\leftarrow \tau \omega + (1 - \tau) \omega'\\ \theta' &\leftarrow \tau \theta + (1 - \tau) \theta' \end{aligned} \tag{17}$$

where *τ* is an update coefficient, which is generally small.
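The arithmetic of Formulas (14), (15), and (17) can be illustrated with plain Python lists standing in for network outputs and parameters. The function names and the numerical values are hypothetical, and the actor gradient of Formula (16) is omitted because it requires an automatic differentiation framework. This is a sketch of the update rules only, not of the controller implemented in the paper.

```python
def td_target(reward, q_next, gamma):
    """Formula (14): y = r_t + gamma * Q'(s_{t+1}, a_{t+1} | omega')."""
    return reward + gamma * q_next


def critic_loss(targets, q_values):
    """Formula (15): mean squared error over a minibatch of m samples."""
    m = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / m


def soft_update(target_params, current_params, tau):
    """Formula (17): target <- tau * current + (1 - tau) * target."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]


# Hypothetical minibatch of two samples
rewards = [-1.0, 0.0]
q_next = [-2.0, -0.5]   # outputs of the critic target network
q_now = [-2.5, -0.4]    # outputs of the critic current network
targets = [td_target(r, q, gamma=0.9) for r, q in zip(rewards, q_next)]
print(targets, critic_loss(targets, q_now))
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

With a small *τ*, the target parameters move only a short distance toward the current parameters at each step, which is what keeps the target value *y* slowly varying during training.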

In addition, *E* is a termination flag that indicates whether the agent has entered a termination state. If the agent enters the termination state, the iterative process stops and a new round of state sequence starts. Otherwise, the iterative process of the current round continues.

In summary, the state, action, reward, subsequent state, and termination flag {*s*, *a*, *R*, *s*′, *E*} form a sample unit that is stored in the experience replay set *D*. Then, *m* sample units are drawn from *D* and used for training by Formulas (14)–(17). A total of *T* rounds are trained, and the training step length of each round is *T*m. The specific training process is shown in Figure 5.
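The storage and sampling of sample units can be sketched as a small replay buffer in Python. The class name `ReplayBuffer`, the fixed capacity with first-in-first-out eviction, and the uniform random sampling are assumptions commonly made in DDPG implementations; the paper does not specify these details.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical experience replay set D holding (s, a, R, s', E) units."""

    def __init__(self, capacity, seed=None):
        # Assumption: the oldest units are discarded once capacity is reached
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        """Store one sample unit."""
        self._data.append((state, action, reward, next_state, done))

    def sample(self, m):
        """Draw up to m distinct sample units uniformly at random."""
        return self._rng.sample(list(self._data), min(m, len(self._data)))

    def __len__(self):
        return len(self._data)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.store(state=step, action=0.0, reward=-0.1 * step,
                 next_state=step + 1, done=(step == 4))
print(len(buffer), buffer.sample(2))
```

Sampling at random from the stored units, rather than training on consecutive steps, reduces the correlation between the samples in each minibatch.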

**Figure 5.** Structure of the islanded microgrid.

### *3.2. Design of DDPG VF Controller Structure*

Considering MT and EV output power increment limiting constraints, a VF controller structure based on DDPG is proposed, as shown in Figure 6. The controller is composed of two layers: coordinate layer and control layer. The coordinate layer provides real-time regulation signal Δ*A* to the control layer according to the frequency deviation Δ*f*, voltage deviation Δ*U*, and the real-time boundary of output power of EV charging station, and then controls the output power of MT and EV to quickly suppress the frequency and voltage deviation.

**Figure 6.** Microgrid LFC controller structure based on DDPG.

### *3.3. Definition of Space and Reward Function*

As mentioned above, the state set of the control system consists of the frequency deviation Δ*F*(*t*), the voltage deviation Δ*U*(*t*), and the real-time boundaries of the output power of the EV charging station, *P*<sup>±</sup><sub>EV</sub>(*t*) and *Q*<sup>±</sup><sub>EV</sub>(*t*), so the state space *S* can be defined as follows:

$$S = \left[\Delta F(t), \Delta U(t), P_{EV}^{+}(t), P_{EV}^{-}(t), Q_{EV}^{+}(t), Q_{EV}^{-}(t)\right] \tag{18}$$

In addition, the joint action set *A* of the DDPG controller, namely the output of the controller, should be a real-time set of dispatch instruction of the active and reactive power output of MT, the output active power of EVs, and the power factor angle of the charger. Thus, the action space *A* can be defined as follows:

$$A = \left[ \Delta A\_{P,MT} \left( t \right), \Delta A\_{P,EV} \left( t \right), \Delta A\_{Q,MT} \left( t \right), \Delta A\_{q,EV} \left( t \right) \right] \tag{19}$$

In addition, China's power safety work principle stipulates that the frequency of the power system during normal operation should be within 50 ± 0.2 Hz, and that the voltage deviation should be within 5%. On this basis, and allowing for an adjustment dead zone, the discrete set of real-time frequency deviation Δ*F*(*t*) can be set as (−∞, −0.2), [−0.2, −0.15), [−0.15, −0.10), [−0.10, −0.03), [−0.03, 0.03], (0.03, 0.10], (0.10, 0.15], (0.15, 0.2], (0.2, +∞), in Hz, and the discrete set of real-time voltage deviation Δ*U*(*t*) can be set as (−1, −0.05), [−0.05, −0.03), [−0.03, −0.02), [−0.02, −0.01), [−0.01, 0.01], (0.01, 0.02], (0.02, 0.03], (0.03, 0.05], (0.05, 1), in p.u.

Meanwhile, the control objectives in this paper are: (1) restore the frequency to the rated value; and (2) regulate the voltage to restore it to the best state. As a result, a comprehensive reward function including two local reward functions can be set up to coordinate frequency recovery and voltage adjustment:

$$R = r\_f + r\_u \tag{20}$$

$$r_f = \begin{cases} 0, & |\Delta f| < 0.03 \\ -\mu_1 |\Delta f|, & 0.03 \le |\Delta f| < 0.10 \\ -\mu_2 |\Delta f|, & 0.10 \le |\Delta f| < 0.15 \\ -\mu_3 |\Delta f|, & 0.15 \le |\Delta f| < 0.2 \\ -\mu_4 |\Delta f|, & 0.2 \le |\Delta f| \end{cases} \tag{21}$$

$$r_u = \begin{cases} 0, & |\Delta U| < 0.01 \\ -\delta_1 |\Delta U|, & 0.01 \le |\Delta U| < 0.02 \\ -\delta_2 |\Delta U|, & 0.02 \le |\Delta U| < 0.03 \\ -\delta_3 |\Delta U|, & 0.03 \le |\Delta U| < 0.05 \\ -\delta_4 |\Delta U|, & 0.05 \le |\Delta U| < 1 \end{cases} \tag{22}$$

where *R* is the global reward, *r*<sub>f</sub> is the frequency reward, *r*<sub>u</sub> is the voltage reward, *μ*<sub>1</sub>, *μ*<sub>2</sub>, *μ*<sub>3</sub>, and *μ*<sub>4</sub> are the weights of the control regions in the frequency penalty term *r*<sub>f</sub>, and *δ*<sub>1</sub>, *δ*<sub>2</sub>, *δ*<sub>3</sub>, and *δ*<sub>4</sub> are the weights of the voltage control regions.

During control, the frequency is regulated through *r*<sub>f</sub>. When Δ*f* is within the adjustment dead zone [−0.03, 0.03] Hz, the frequency meets the minimum error requirement of normal operation, so the DDPG controller receives the maximum reward value of 0. When |Δ*f*| lies in the normal control areas (0.03, 0.10) and (0.10, 0.15) Hz, the auxiliary control area (0.15, 0.2) Hz, or the emergency control area (0.2, +∞) Hz, the controller receives the corresponding negative incentive, namely the penalty value. Similarly, the voltage is regulated through *r*<sub>u</sub>. When Δ*U* is within the adjustment dead zone [−0.01, 0.01], the DDPG controller receives the maximum reward value of 0, and when |Δ*U*| lies in the normal control areas (0.01, 0.02) and (0.02, 0.03), the auxiliary control area (0.03, 0.05), or the emergency control area (0.05, 1), the controller receives the corresponding penalty value.
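The piecewise reward of Equations (20)–(22) can be sketched in Python as follows. The thresholds are those of Equations (21) and (22), whereas the weight values in `MU` and `DELTA` are hypothetical placeholders, since the weights are to be determined by simulation tests. The names `piecewise_penalty` and `global_reward` are likewise introduced only for illustration.

```python
F_EDGES = (0.03, 0.10, 0.15, 0.20)   # Hz, region limits from Eq. (21)
U_EDGES = (0.01, 0.02, 0.03, 0.05)   # p.u., region limits from Eq. (22)
MU = (1.0, 2.0, 4.0, 8.0)            # hypothetical frequency weights mu_1..mu_4
DELTA = (10.0, 20.0, 40.0, 80.0)     # hypothetical voltage weights delta_1..delta_4


def piecewise_penalty(deviation, edges, weights):
    """Return 0 inside the dead zone, otherwise -weight * |deviation|."""
    magnitude = abs(deviation)
    if magnitude < edges[0]:
        return 0.0
    # Each region [edges[k], edges[k + 1]) uses weights[k]
    for upper, weight in zip(edges[1:], weights):
        if magnitude < upper:
            return -weight * magnitude
    # Beyond the last limit the largest weight applies
    return -weights[-1] * magnitude


def global_reward(delta_f, delta_u):
    """Eq. (20): R = r_f + r_u."""
    return (piecewise_penalty(delta_f, F_EDGES, MU)
            + piecewise_penalty(delta_u, U_EDGES, DELTA))


print(global_reward(0.0, 0.0))     # both deviations inside their dead zones
print(global_reward(0.12, 0.025))  # both deviations in a normal control area
```

Because the two local rewards are summed, an action that restores the frequency while pushing the voltage out of its dead zone is still penalized, which is how the reward couples the two control goals.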

When determining the values of the above parameters, it should be noted that the size of the reward value can affect the convergence effect and the learning speed. Therefore, it is necessary to perform simulation tests based on actual calculation examples, and the specific process will be discussed later.

In summary, the state space and reward function designed in this paper can realize the simultaneous adjustment of voltage and frequency. When the frequency is restored, it can consider whether the voltage exceeds the limit, and, when adjusting the voltage, it can also consider whether the frequency deviates from the rated value, which significantly improves the overall stability of the microgrid.

### *3.4. The Selection of Hyperparameter*

In DRL, it is necessary to provide the agent with a set of optimal hyperparameters to improve the performance and effect of learning [24].

First of all, the larger the discount factor *γ*, the more weight the agent gives to future rewards, so that it can give up current interests and pursue overall interests. However, if *γ* is too large, the training of the agent may fail to converge. The greater the learning rate *α*, the faster the agent converges, but the worse the stability; the smaller the *α*, the better the stability, but the slower the agent converges. Therefore, the convergence speed should be improved on the premise that the agent training can converge. In addition, the design of the network structure can be discussed from two aspects: network type and network depth. The choice of network type depends largely on the state space. The state space of the control system in this paper consists of the frequency and voltage deviations, which form a one-dimensional vector, so fully connected layers can meet the requirements of storing the strategy set. In addition, the network depth determines the generalization ability of the neural network, which includes the number of layers of the neural network *h* and the number of neurons in each layer *u*.

In addition, the specific values of *γ*, *α*, *h* and *u* need to be selected according to the calculation example.
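To give a feeling for how *h* and *u* set the size of a fully connected network, the following sketch lists the layer shapes and counts the trainable parameters. It assumes that *h* counts hidden layers, that the input is the two-element state (frequency and voltage deviation), and that the network has a single output; the output width and the function names are illustrative assumptions, not the configuration used in this paper.

```python
def layer_shapes(n_inputs, n_outputs, h, u):
    """Return (fan_in, fan_out) for every dense layer of the network."""
    if h < 1 or u < 1:
        raise ValueError("h and u must be positive integers")
    sizes = [n_inputs] + [u] * h + [n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    """Weights plus biases summed over all dense layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# Assumed example: 2 state inputs, 1 output, h = 5 hidden layers of u = 50 neurons.
shapes = layer_shapes(n_inputs=2, n_outputs=1, h=5, u=50)
n_params = parameter_count(shapes)
print(shapes, n_params)
```

Because the hidden-to-hidden layers contribute roughly *u*² weights each, increasing *u* grows the network much faster than increasing *h*, which is one reason both values are worth testing on the calculation example.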

### *3.5. Summary of Control Strategy*

In summary, the control strategy of this paper is carried out in the following steps:


### **4. Simulation Results**

In order to evaluate the control effect of the above strategy, the coupled islanded microgrid system is built as shown in Figure 7. In addition, the specific settings of equipment parameters are shown in Table 1. The verification of the calculation examples in this paper is carried out through simulation experiments. The computing platform is a PC with i7-1165G7@2.80GHz CPU and 16 GB RAM, and the software environment is Windows 10 Professional and MATLAB R2021a.

**Figure 7.** Microgrid LFC controller structure based on DDPG.

In the microgrid, there is an MT with a capacity of 40 kW, a WT with a capacity of 20 kW, EV station 1 with a capacity of 16 kW, EV station 2 with a capacity of 14 kW, and 60 kW of ordinary loads. In addition, this paper assumes that the initial state of the microgrid is stable. Thus, when there is no external disturbance, the power outputs of the MT, EV stations, WT, and conventional loads are always in balance. Therefore, in the following calculation examples, only the per-unit values of the power fluctuations of the MT, EV stations, WT, and loads need to be considered.

**Table 1.** Parameters of equipment in microgrids.


### *4.1. Pre-Learning Stage*

Before the controller is used, it needs to undergo a random trial-and-error learning process, which is called the pre-learning stage. In the initial stage of pre-learning, the controller has not accumulated any experience and has no intelligent control ability [25]. Only after experiencing various state-action pairs can the optimal value function network *Qϕ*(*s*,*a*) be obtained. Therefore, wind and load disturbances formed by superimposing functions of different amplitudes and types are set up for repeated training of the controller. Meanwhile, according to the output capacity change data of the electric vehicle charging station, a boundary function of the output power increment that changes randomly over time is set. Taking the active power disturbance and the active power output boundary of the EVs as examples, the random disturbance of one training process is shown in Figure 8.

**Figure 8.** Random perturbation function in the pre-learning phase: (**a**) Random Function of Active Power Disturbance; (**b**) Random Function of EV Output Power Boundary.
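A minimal sketch of how such random training signals could be generated is given below. It superimposes a step, a sinusoid, and uniform noise with random amplitudes, and builds a piecewise-constant random boundary for the EV output. All amplitudes, frequencies, hold times, and function names are hypothetical choices for illustration; only the 500 steps of 0.1 s per training iteration follow the settings of this paper.

```python
import math
import random


def random_disturbance(n_steps=500, dt=0.1, max_amp=0.05, seed=None):
    """Superimpose a step, a sinusoid and noise with random amplitudes (p.u.)."""
    rng = random.Random(seed)
    step_time = rng.uniform(0.0, n_steps * dt)   # instant of the step change
    step_amp = rng.uniform(-max_amp, max_amp)
    sine_amp = rng.uniform(0.0, max_amp)
    sine_freq = rng.uniform(0.01, 0.2)           # Hz, assumed range
    noise_amp = rng.uniform(0.0, max_amp / 5.0)
    signal = []
    for k in range(n_steps):
        t = k * dt
        value = step_amp if t >= step_time else 0.0
        value += sine_amp * math.sin(2.0 * math.pi * sine_freq * t)
        value += rng.uniform(-noise_amp, noise_amp)
        signal.append(value)
    return signal


def random_ev_boundary(n_steps=500, hold_steps=50, low=0.01, high=0.03, seed=None):
    """Piecewise-constant output boundary redrawn every hold_steps steps (p.u.)."""
    rng = random.Random(seed)
    level = rng.uniform(low, high)
    boundary = []
    for k in range(n_steps):
        if k > 0 and k % hold_steps == 0:
            level = rng.uniform(low, high)
        boundary.append(level)
    return boundary


disturbance = random_disturbance(seed=1)
ev_boundary = random_ev_boundary(seed=1)
```

Drawing a fresh seed for every training iteration exposes the agent to a different disturbance each time, which is the purpose of the repeated training described above.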

Meanwhile, through a large number of simulation studies, *μ*1, *μ*2, *μ*3, and *μ*4 are set to 1, 5, 10, and 20, respectively; *δ*1, *δ*2, *δ*3, and *δ*4 are set to 5, 20, 50, and 100, respectively; and *α* and *γ* are set to 0.01 and 0.09, respectively. The number of learning iterations of the DDPG controller is set to 500, each with 500 steps, and the step length is 0.1 s. Six groups of parameters (*h*, *u*) are then set for the convergence test, and the learning results are shown in Table 2. It can be seen that the reward value of the system at convergence is the highest when *h* = 5 and *u* = 50.


**Table 2.** Convergence test results under different parameters.

Thus, when *h* = 5 and *u* = 50, the pre-learning process of the agent is shown in Figure 9.

**Figure 9.** Trend of the reward function during pre-learning: (**a**) the complete trend graph of the reward function; (**b**) the trend graph of the reward function for the last 50 iterations.

It can be seen that the agent has essentially converged after 80 iterations, and the system judges that the learning process is complete and stops the training after 248 iterations. In this case, the average reward is −21.096 and the final reward is 0.65307, which shows that the controller is ready for the subsequent simulations.

### *4.2. The Implementation of Constraint Conditions in the EV Model*

In order to verify the implementation of the constraint conditions in the EV model, this paper selects several typical *SOC* simulation cases of individual EVs as examples, as shown in Figures 10 and 11. In addition, to protect battery life, the initial *SOC* is set between *SOC*rmin = 0.2 and *SOC*rmax = 0.8.

The first situation in Figure 10 shows that, when *SOC* < *SOC*rmin, the EV is forced to enter the charging state. Only when *SOC* > *SOC*rmin can the EV participate in system regulation. The second situation in Figure 10 shows that, when the EV is close to its leaving time and *SOC* < *SOCm*, it switches to the forced charging state to ensure that the *SOC* reaches the expected *SOCm* when leaving the charging station. In general, the changes in the *SOC* of EVs participating in the regulation of the microgrid are shown in Figure 11. The *SOC* of the EVs remains within the constraint range.
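The two forced-charging situations can be summarized as a simple decision rule. The sketch below is only an illustration of that logic: the function and parameter names are hypothetical, and "close to the leaving time" is modeled by an assumed parameter `min_charge_time`, since the exact criterion is not restated here. The upper limit *SOC*rmax is not handled in this sketch.

```python
from enum import Enum


class EVMode(Enum):
    FORCED_CHARGE = "forced_charge"
    REGULATION = "regulation"


SOC_RMIN = 0.2  # lower SOC limit used in this section


def ev_mode(soc, soc_expected, time_to_departure, min_charge_time):
    """Decide whether an EV must charge or may take part in regulation.

    soc_expected plays the role of SOCm; min_charge_time is an assumed
    stand-in for "close to the leaving time" (same time unit as
    time_to_departure).
    """
    # Situation 1: SOC below the lower limit forces charging.
    # SOC exactly equal to the limit is treated as available (a choice made here).
    if soc < SOC_RMIN:
        return EVMode.FORCED_CHARGE
    # Situation 2: departure is near and the expected SOC is not yet reached.
    if soc < soc_expected and time_to_departure <= min_charge_time:
        return EVMode.FORCED_CHARGE
    return EVMode.REGULATION


print(ev_mode(0.15, 0.7, 5.0, 1.0))  # low SOC
print(ev_mode(0.50, 0.7, 0.5, 1.0))  # near departure, below expected SOC
print(ev_mode(0.50, 0.7, 5.0, 1.0))  # free to participate in regulation
```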

### *4.3. Case Study*

After completing the pre-learning phase and the verification of the EV *SOC* constraints, the example can be simulated under different operation scenarios. Meanwhile, in order to evaluate the effect of the DDPG controller proposed in this paper, a traditional PID controller and an *R*(λ) controller are used in the same scenarios, and the corresponding controller parameters are shown in Table 3.

**Figure 10.** Changes of *SOC* of EVs at critical value.

**Figure 11.** Changes of *SOC* of EVs in the normal range.


**Table 3.** PID and R(λ) controller system parameters.

### 4.3.1. Case 1: The Response to Wind Power Disturbance

First of all, wind power disturbance is added to the islanded microgrid system, and wind mainly provides active power disturbances to the grid. In order to compare the adjusting speed of each controller, the wind power disturbance ends after 43 s. The disturbance setting is shown in Figure 12.

**Figure 12.** Wind power disturbance.

There is no reactive power fluctuation in this case, so the impact of voltage fluctuation is not considered here. The variation of frequency deviation under wind power disturbance is shown in Figure 13. Meanwhile, according to the simulation results, this paper takes |Δ*f*| as the evaluation object, sets the threshold of the frequency deviation excellence rate to 2 × 10<sup>−4</sup> Hz, and defines *T*<sub>recover</sub> as the time taken for |Δ*f*| to recover to 5 × 10<sup>−5</sup> Hz after the wind power disturbance ends. The results of the control test under wind disturbance are shown in Table 4.
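One possible way to compute these two indices from a sampled frequency trace is sketched below. It assumes that the excellence rate is the share of samples with |Δ*f*| within the threshold, and that the recovery time is measured from the end of the disturbance until |Δ*f*| enters the recovery band and stays there for the rest of the record. Both readings, as well as the function names, are assumptions of this sketch rather than definitions given in the paper.

```python
def excellence_rate(delta_f, threshold=2e-4):
    """Fraction of samples whose absolute deviation is within the threshold."""
    if not delta_f:
        raise ValueError("delta_f must not be empty")
    within = sum(1 for x in delta_f if abs(x) <= threshold)
    return within / len(delta_f)


def recovery_time(delta_f, dt, t_end, band=5e-5):
    """Time after t_end until |delta_f| stays inside the band.

    Returns 0.0 if the trace is already inside the band at t_end and
    None if it has not settled by the end of the record.
    """
    start = int(round(t_end / dt))
    last_violation = None
    for k in range(start, len(delta_f)):
        if abs(delta_f[k]) > band:
            last_violation = k
    if last_violation is None:
        return 0.0
    if last_violation == len(delta_f) - 1:
        return None
    return (last_violation + 1) * dt - t_end


# Hypothetical trace sampled every 0.1 s; the disturbance ends at t = 0.5 s.
trace = [3e-4, 1e-4, 1e-4, 1e-4, 1e-4, 8e-5, 6e-5, 4e-5, 1e-5, 0.0]
rate = excellence_rate(trace)
t_recover = recovery_time(trace, dt=0.1, t_end=0.5)
```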

**Figure 13.** Performance of frequency control under wind power disturbance.


**Table 4.** Frequency simulation results under wind disturbance.

It can be seen from Figure 13 and Table 4 that, compared with the PID controller, the DDPG and *R*(λ) controllers, which have the ability of online learning and experience replay, can deal more effectively with highly random disturbances. Under the wind disturbance, the frequency fluctuation of the islanded microgrid under the DDPG controller can be limited to within 2 × 10<sup>−4</sup> Hz, and the excellence rate can reach 98%, which is significantly better than the traditional controller. In addition, analyzed purely from the perspective of frequency control, the DDPG and *R*(λ) controllers in this paper achieve a better control effect, a smaller amplitude of frequency fluctuation, and a faster regulation speed than the traditional controller. Furthermore, the regulation speed of the DDPG controller is much faster than that of the *R*(λ) controller.

Furthermore, the power variations of each equipment in islanded microgrid under the DDPG controller are shown in Figure 14. It can be seen that, when the system suffers disturbance, the MT undertakes the main work of frequency regulation, and the output power of EV charging station is also significant. In addition, when the limit is reached, the power variations of different charging stations are different.

**Figure 14.** Power variations of each equipment under wind power disturbance.

### 4.3.2. Case 2: The Response to Load Power Disturbance

The fluctuation of load power is gentler than that of wind power, but the load change is abrupt and can cause fluctuations in active and reactive power at the same time. In this case, the load active power variations are set as Δ*P*<sub>L</sub> = −0.025 p.u. during 10–40 s, Δ*P*<sub>L</sub> = 0.005 p.u. during 40–55 s, Δ*P*<sub>L</sub> = −0.0025 p.u. during 55–70 s, and Δ*P*<sub>L</sub> = 0.015 p.u. during 70–150 s. The load reactive power variations are set as Δ*Q*<sub>L</sub> = −0.04 p.u. during 7.5–34 s, Δ*Q*<sub>L</sub> = −0.015 p.u. during 34–66 s, Δ*Q*<sub>L</sub> = 0.005 p.u. during 66–96 s, and Δ*Q*<sub>L</sub> = −0.0075 p.u. during 96–150 s. The specific setting of load disturbance is shown in Figure 15.

**Figure 15.** Load power disturbance: (**a**) load active power disturbance; (**b**) load reactive power disturbance.
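The load disturbance of this case can be written as two piecewise-constant schedules. The sketch below encodes the values listed above; treating each interval as half-open and returning zero outside the listed intervals are assumptions of this sketch, as are the function and variable names.

```python
# (start time in s, end time in s, value in p.u.)
P_SCHEDULE = [
    (10.0, 40.0, -0.025),
    (40.0, 55.0, 0.005),
    (55.0, 70.0, -0.0025),
    (70.0, 150.0, 0.015),
]
Q_SCHEDULE = [
    (7.5, 34.0, -0.04),
    (34.0, 66.0, -0.015),
    (66.0, 96.0, 0.005),
    (96.0, 150.0, -0.0075),
]


def piecewise_constant(schedule, t):
    """Return the scheduled value at time t, or 0.0 outside all intervals."""
    for start, end, value in schedule:
        if start <= t < end:  # half-open interval: an assumption of this sketch
            return value
    return 0.0


def load_disturbance(t):
    """Return (delta_P_L, delta_Q_L) in p.u. at time t in seconds."""
    return piecewise_constant(P_SCHEDULE, t), piecewise_constant(Q_SCHEDULE, t)


print(load_disturbance(20.0))
```

Sampling `load_disturbance` every 0.1 s from 0 to 150 s yields the stepwise active and reactive power profiles used as inputs in this case.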

The DDPG controller is compared with the traditional PID and *R*(λ) controllers, and the frequency and voltage fluctuations are shown in Figures 16 and 17. As in Case 1, this part takes |Δ*f*| and |Δ*U*| as the evaluation objects, and sets the threshold of the |Δ*f*| excellence rate to 2 × 10<sup>−4</sup> Hz and that of the |Δ*U*| excellence rate to 0.01 p.u. Meanwhile, *T*<sub>recover</sub> is defined as the time taken for |Δ*f*| to recover to 5 × 10<sup>−5</sup> Hz and |Δ*U*| to recover to 0.002 p.u. after the load power disturbance no longer changes. Thus, the statistical results of the control test under load disturbance are shown in Tables 5 and 6.

**Figure 16.** Performance of frequency control under load power disturbance.

**Figure 17.** Performance of voltage control under load power disturbance.


**Table 5.** Frequency simulation results under load disturbance.

**Table 6.** Voltage simulation results under load disturbance.


It can be seen from Figures 16 and 17 and Tables 5 and 6 that, when the load changes, compared with the PI controller and *R*(λ) controller, the DDPG controller can ensure that the frequency deviation of the microgrid is maintained within ±1 × 10<sup>−3</sup> Hz, and the voltage deviation is also close to 0, which is much smaller than the control index of the power quality of the power grid. In addition, compared with the *R*(λ) controller, the DDPG controller can coordinate the frequency recovery and voltage adjustment of the islanded microgrid, so as to meet the VF control requirements at the same time, which gives it superior dynamic control characteristics.

Furthermore, the power variations of each piece of equipment are shown in Figure 18. The MT in the microgrid is used as the main source to maintain the stability of the VF amplitude of the microgrid, while EV1 and EV2, as the slave sources, are mainly responsible for the regulation of the active power of the microgrid and also participate in the regulation of the reactive power. In addition, due to the randomness of users, the output power boundary of the EV charging stations is random, showing obvious jagged shapes.

**Figure 18.** Power variations of each equipment under load power disturbance: (**a**) active power increment; (**b**) reactive power increment.

### **5. Conclusions**

To solve the problem in which the stability of island microgrid is greatly affected by random power sources, and it is difficult to control frequency and voltage together, a VF control strategy of islanded microgrids with EVs is proposed in this paper. The randomness of charging behavior is considered, and an islanded microgrid system including MT, WT, EVs stations, and loads is established. Thus, a VF synergistic control strategy based on DDPG is proposed. The simulation results show that:


For microgrid systems with more complex structures and larger volumes, it is necessary to consider the multi-microgrid interconnection technology. In addition, multi-agent algorithms such as MA-DDPG, COMA, CommNet, etc. will also be applied to the control of multi-microgrid. The follow-up work will focus on in-depth analysis and research in these directions, and add corresponding hardware circuit experiments or semi-physical simulation experiments.

**Author Contributions:** P.F., S.K. (Song Ke) and S.K. (Salah Kamel) conceptualized the idea of this research; P.F. performed the experiments and data analysis; P.F. and S.K. (Song Ke) wrote the paper; J.Y., Y.L., J.X., B.X. and G.I.R. provided supervision and reviewed the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the science and technology project of the State Grid Corporation of China, "Research and application of flexible control technology for a distribution system with large-scale distributed generation and a multi microgrid" (No. 52093220000H).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

### **References**

