Improvement of the Control of a Grid Connected Photovoltaic System Based on Synergetic and Sliding Mode Controllers Using a Reinforcement Learning Deep Deterministic Policy Gradient Agent

Nicola, Marcel; Nicola, Claudiu-Ionel; Selișteanu, Dan

doi:10.3390/en15072392



Marcel Nicola ¹, Claudiu-Ionel Nicola ¹,²,* and Dan Selișteanu ²

¹ Research and Development Department, National Institute for Research, Development and Testing in Electrical Engineering—ICMET Craiova, 200746 Craiova, Romania

² Department of Automatic Control and Electronics, University of Craiova, 200585 Craiova, Romania

* Author to whom correspondence should be addressed.

Energies 2022, 15(7), 2392; https://doi.org/10.3390/en15072392

Submission received: 16 February 2022 / Revised: 21 March 2022 / Accepted: 23 March 2022 / Published: 24 March 2022

(This article belongs to the Special Issue New Frontiers in Electrical Power Systems Quality)


Abstract

This article presents the control of a grid connected PV (GC-PV) array system, starting from a benchmark. The control structure used in this article was a cascade-type structure, in which PI or synergetic (SYN) controllers were used in the inner control loop of the i_d and i_q currents, and PI or sliding mode control (SMC) controllers were used in the outer control loop of the u_dc voltage of the DC intermediate circuit. This paper presents the mathematical model of the PV array together with the main component blocks: simulated inputs for the PV array; the PV array itself; the MPPT algorithm; the DC-DC boost converter; the voltage and current measurements for the DC intermediate circuit; the load and connection to the power grid; the DC-AC converter; and the power grid. It also presents the stages of building and training the reinforcement learning (RL) agent. To improve the performance of the GC-PV control system without resorting to controllers with a more complicated mathematical description, the advantages that an RL agent offers in process control can be exploited; this technique requires exact knowledge of neither the mathematical model of the controlled system nor the type of uncertainties. The performance of the GC-PV control system, whether it uses simple PI-type controllers or complex SMC- and SYN-type controllers, was improved using an RL agent based on the Deep Deterministic Policy Gradient (DDPG), specifically its Twin-Delayed DDPG (TD3) variant. The improvement was obtained by adding the correction command signals provided by the trained RL agent to the command signals u_d, u_q and i_dref. The parametric robustness of the proposed control system based on SMC and SYN controllers was demonstrated for a 30% variation in the three-phase load.
Moreover, the results of the numerical simulations are presented comparatively, and the synthesis of the proposed control system was validated by comparing it with a software benchmark for the control of a GC-PV array system implemented in MATLAB Simulink. The numerical simulations showed the superior performance of the control systems that use the RL-TD3 agent.

Keywords:

photovoltaic system; grid; sliding mode control; synergetic control; reinforcement learning

1. Introduction

The importance of studying renewable energy, from the phenomena of its generation from sources such as solar, wind, water and geothermal to its integration into microgrids or main grids, is undeniable [1].

In parallel, research on the control systems used to generate energy from renewable sources has also intensified. This includes studies on hybrid microgrids [2,3,4], the optimization of battery charging in microgrids [5,6], the optimization of converters in microgrids [7,8], the faults that can occur in microgrids [9] and the dispatching of microgrids according to economic criteria [10,11,12,13,14].

A specific problem addressed in this article is the control system for connecting a PV array system to a main grid. This problem involves a chain of primary elements consisting of the following blocks: the inputs for the PV array; the PV array itself; the MPPT algorithm; the DC-DC boost converter; the voltage and current measurements for the DC intermediate circuit; the DC-AC converter; the PLL (phase locked loop); the load and connection to the power grid; and the core element that controls these blocks, the voltage-source converter (VSC). The main objective of the control system is to stabilize the u_dc voltage as precisely as possible, including under variations caused by the three-phase load [15].

To achieve this goal, adaptive control [16], robust control [17,18] and predictive control [19] algorithms can be used. Control systems based on fuzzy logic and neuro-fuzzy systems [20,21], genetic algorithms [22], particle swarm optimization (PSO) [23], RL [24] and passivity theory [25] form a special category of these control systems.

Given that the equations describing GC-PV array systems are nonlinear, SMC [26] provides a control system that ensures parametric robustness. SYN control systems [27], which can be considered an extension of SMC, also receive particular attention.

Control systems based on RL for process control are organized as a series of tasks that run on a computer to control an industrial process without requiring an explicit mathematical description of that process [28,29,30].

This article starts from a benchmark presented in MATLAB Simulink [15], which is revisited in order to compare the best results obtained in [26,27,31,32]. After the main characteristics of the benchmark system are presented, numerical simulations based on the theoretical elements of Sections 2–5 are reported. Thus, starting from the cascade control structure in which PI-type controllers were used in both the inner control loop of the i_d and i_q currents and the outer control loop of the u_dc voltage, and adding the RL-TD3 agent, a superior performance of the control system for the GC-PV array was obtained.

Moreover, in the second part of the numerical simulations, starting from the peak performances presented in [32,33] for the cascade control system in which an SYN-type controller was used in the inner control loop of the i_d and i_q currents and an SMC-type controller was used in the outer control loop of the u_dc voltage, the addition of the RL-TD3 agent yielded a superior performance of the control system for the GC-PV array, both in a direct comparison of these performances and in the robustness of the control system under parametric variations, such as the variation caused by the three-phase load.

The main contributions of this article are as follows:

The proposal of a cascade control structure for the GC-PV array system, in which an SMC-type controller is used in the outer control loop of the u_dc voltage of the DC circuit and SYN-type controllers are used in the inner control loops of the i_d and i_q currents;
Improvement of the performance of the GC-PV array control system, whether it uses simple PI-type controllers or complex SMC-type or SYN-type controllers, by means of an RL agent based on TD3;
Validation of the results in the MATLAB Simulink environment, showing the performance improvements of the GC-PV array control system obtained with the RL-TD3 agent, even under parametric uncertainties such as a 30% variation from the nominal value caused by the three-phase load.

The rest of the paper is organized as follows. Section 2 presents the mathematical model of the GC-PV array system. Section 3 describes the RL agent used for process control. Section 4 presents a correction of the control signals and the MATLAB Simulink implementation of the control for the GC-PV array system based on PI-type controllers using the RL-TD3 agent. Section 5 presents a correction and MATLAB Simulink implementation of the command signals for the control system for the GC-PV array system based on SMC- and SYN-type controllers using the RL-TD3 agent. The results of the numerical simulations are presented in Section 6 and Section 7 presents our conclusions.

2. Grid Connected PV Array System: The Mathematical Model

The block diagram of the main circuit of the GC-PV system is presented in Figure 1 [15,27,31]. The inputs to the PV array model were solar irradiance and temperature. A key component was the maximum power point tracking (MPPT) module, which acted on the DC boost converter so as to extract the maximum power from the PV array. A detailed description of the MPPT is given in [15,26,31]. A three-phase DC–AC converter, which powered a three-phase load, was added to the diagram in Figure 1. The controller proposed and described in the following sections acted on the three-phase DC–AC converter in order to stabilize the u_dc voltage as precisely as possible, even under a significant variation in the three-phase load. Using the notations in Figure 1, Equations (1)–(4) can be written as follows:

C_{1} \frac{d u_{P V}}{d t} = i_{P V} - i_{s}

(1)

u_{P V} = R_{1} i_{s} + L_{1} \frac{d i_{s}}{d t} + u_{s}

(2)

C_{2} \frac{d u_{d c}}{d t} = i_{d c 1} - i_{d c 2}

(3)

u_{a b c} - e_{a b c} = R_{3} i_{a b c} + L_{3} \frac{{d i}_{a b c}}{d t}

(4)

where u_abc denotes the output voltages of the DC–AC converter (represented by the VSC), e_abc denotes the grid voltages and i_abc denotes the alternating currents, with

u_{a b c} = {[\begin{matrix} u_{a} & u_{b} & u_{c} \end{matrix}]}^{T}, \quad e_{a b c} = {[\begin{matrix} e_{a} & e_{b} & e_{c} \end{matrix}]}^{T}, \quad i_{a b c} = {[\begin{matrix} i_{a} & i_{b} & i_{c} \end{matrix}]}^{T}

Equation (5) gives the well-known Park transformation matrix P:

P = [\begin{matrix} \sin (ω t) & \sin (ω t - \frac{2 π}{3}) & \sin (ω t + \frac{2 π}{3}) \\ \cos (ω t) & \cos (ω t - \frac{2 π}{3}) & \cos (ω t + \frac{2 π}{3}) \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \end{matrix}]

(5)
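As an illustration only, the following minimal Python sketch applies the matrix P of Equation (5) exactly as written, i.e., without the 2/3 scaling factor used in amplitude-invariant variants of the transformation. The function name `park` and the numerical values are hypothetical. For a balanced three-phase set of amplitude I aligned with the sine rows of P, the d component equals 3I/2 and the q and zero components vanish.

```python
import math

PHASE_SHIFTS = (0.0, -2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0)


def park(x_abc, wt):
    """Apply the P matrix of Equation (5) to (x_a, x_b, x_c) at angle wt = ω·t.

    P is used exactly as written, without a 2/3 scaling factor.
    """
    d = sum(x * math.sin(wt + s) for x, s in zip(x_abc, PHASE_SHIFTS))  # first row of P
    q = sum(x * math.cos(wt + s) for x, s in zip(x_abc, PHASE_SHIFTS))  # second row of P
    zero = 0.5 * sum(x_abc)                                              # third row of P
    return d, q, zero


# Balanced set of amplitude I aligned with the sine rows of P.
I, wt = 10.0, 0.7
i_abc = [I * math.sin(wt + s) for s in PHASE_SHIFTS]
i_d, i_q, i_0 = park(i_abc, wt)   # i_d ≈ 1.5·I = 15, i_q ≈ 0, i_0 ≈ 0
```

With an amplitude-invariant convention (first two rows multiplied by 2/3), the same set would give i_d = I instead.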

The transformation from the abc reference frame to the d–q reference frame was performed using Equation (5): u_dq₀ = P u_abc, e_dq₀ = P e_abc and i_dq₀ = P i_abc. Equation (4) was thus transformed into:

u_{d q 0} - e_{d q 0} = R_{3} i_{d q 0} + L_{3} \frac{d i_{d q 0}}{d t} + L_{3} [\begin{matrix} - ω i_{q} \\ ω i_{d} \\ 0 \end{matrix}]

(6)

Equation (6) could be written by components as follows:

L_{3} \frac{d i_{d}}{d t} = - R_{3} i_{d} + ω L_{3} i_{q} - e_{d} + u_{d} = u_{3 d} + u_{d}

(7)

L_{3} \frac{d i_{q}}{d t} = - R_{3} i_{q} - ω L_{3} i_{d} - e_{q} + u_{q} = u_{3 q} + u_{q}

(8)

where u_d and u_q are the control variables used to command the DC–AC converter. In the above equations, the following notations were used:

u_{3 d} = - R_{3} i_{d} + ω L_{3} i_{q} - e_{d}, \quad u_{3 q} = - R_{3} i_{q} - ω L_{3} i_{d} - e_{q}
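A minimal Python sketch of the d–q current dynamics in Equations (7) and (8), assuming that the grid voltages e_d, e_q and the angular frequency ω are known and constant over one step. The function names and the forward-Euler integration are illustrative choices, not the solver used in the benchmark.

```python
def dq_current_derivatives(i_d, i_q, u_d, u_q, e_d, e_q, R3, L3, omega):
    """Return (di_d/dt, di_q/dt) from Equations (7) and (8)."""
    u3d = -R3 * i_d + omega * L3 * i_q - e_d   # u_3d
    u3q = -R3 * i_q - omega * L3 * i_d - e_q   # u_3q
    return (u3d + u_d) / L3, (u3q + u_q) / L3


def euler_step(i_d, i_q, u_d, u_q, e_d, e_q, R3, L3, omega, dt):
    """Advance (i_d, i_q) by one forward-Euler step of length dt."""
    did, diq = dq_current_derivatives(i_d, i_q, u_d, u_q, e_d, e_q, R3, L3, omega)
    return i_d + dt * did, i_q + dt * diq
```

Choosing u_d and u_q that cancel u_3d and u_3q (for example u_d = R₃i_d − ωL₃i_q + e_d) makes both derivatives zero, which is a quick consistency check of the signs.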

Following [15,26,31], the DC boost converter, controlled through the duty cycle D, was described by Equations (9) and (10):

i_{d c 1} = (1 - D) i_{s}

(9)

u_{s} = (1 - D) u_{d c}

(10)
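The DC side of the circuit can be summarized in a short Python sketch that evaluates Equations (1)–(3) together with the averaged boost-converter relations (9) and (10). The function name `dc_side_derivatives` and the numerical values are hypothetical, the current i_dc2 drawn by the DC–AC converter is treated as an external input, and the sketch is for illustration, not a reproduction of the benchmark model.

```python
def dc_side_derivatives(u_pv, i_s, u_dc, i_pv, i_dc2, D, C1, L1, R1, C2):
    """Return (du_PV/dt, di_s/dt, du_dc/dt) from Equations (1)-(3), (9) and (10)."""
    if not 0.0 <= D < 1.0:
        raise ValueError("duty cycle D must lie in [0, 1)")
    u_s = (1.0 - D) * u_dc                 # Equation (10)
    i_dc1 = (1.0 - D) * i_s                # Equation (9)
    du_pv = (i_pv - i_s) / C1              # Equation (1)
    di_s = (u_pv - R1 * i_s - u_s) / L1    # Equation (2)
    du_dc = (i_dc1 - i_dc2) / C2           # Equation (3)
    return du_pv, di_s, du_dc


# Example operating point: D = 0.5 halves the DC-link voltage seen by the inductor.
rates = dc_side_derivatives(u_pv=250.1, i_s=10.0, u_dc=500.0, i_pv=10.0,
                            i_dc2=5.0, D=0.5, C1=1e-3, L1=5e-3, R1=0.01, C2=1e-2)
```

At this operating point all three derivatives are (numerically) zero, i.e., it is an equilibrium of the averaged DC-side model.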

A general block diagram of the entire application described in this article is shown in Figure 2. The chain of primary elements consisted of the following blocks: the simulated inputs for PV array; the PV array itself; the MPPT algorithm; the DC-DC boost converter; the voltage and current measurements for the DC intermediate circuit; the load and connection to power grid; the DC-AC converter; and the power grid. It can also be noted that the main element that we focus on in this article is the control block of the three-phase DC-AC converter. The main objective of the control system is to stabilize the u_dc voltage as precisely as possible, including under variation caused by the three-phase load. While in the classic case, the control system is built with PI-type controllers for the two voltage and current control loops, this article presents an improvement in the performance of the control system through the use of an RL-TD3 agent. Furthermore, in the complex case of a control system in which the control of the voltage loop is performed by an SMC-type controller and the control of the current loop is performed by an SYN-type controller, there was an improvement in the performance of the GC-PV control system through the use of the RL-TD3 agent that was created and trained accordingly.

3. Reinforcement Learning for Process Control

RL for process control is organized as a series of tasks that run on a computer to control an industrial process without requiring an explicit mathematical description. The RL agent interacts with the controlled process by transmitting decisions (commands) chosen so as to maximize a cumulative “Reward”. Figure 3 presents the block diagram of an RL-based process control system. “Observation” and “Reward” are the input signals of the RL agent. Observations are measurable signals that characterize the process, together with their rate of change or their error relative to a reference. Actions are the control quantities that act on the controlled process. Over time, Actions are selected so that the cumulative Reward increases toward an optimal value. The Reward is expressed in terms of the squared errors of the process signals and the squares of the previous Actions. The RL agent contains a “Policy”, analogous to the operating mode of a process controller, which is optimized during training. The process contains the usual elements, namely a plant, reference signals, converters, filters and sensors.

The usual stages in the design of an RL process are the following [28,29,30]:

The Problem statement defines the task of the RL agent and how it interconnects with the components of the process;
The Process creation defines the dynamic model of the controlled GC-PV process and its interface;
The Reward creation defines the mathematical expression of the Reward used to measure performance in executing the proposed task;
The Agent training trains the RL agent to learn the Policy based on the Reward, the RL algorithm and the controlled process;
The Agent validation evaluates the performance of the agent after training;
The Deploy policy implements the trained RL agent within the GC-PV control system.

In this article, we used an RL-TD3 agent, an improved variant of the RL-DDPG agent. This type of agent is an actor-critic agent that searches for a Policy maximizing the expected long-term cumulative Reward.

The steps performed by the RL-TD3 agent during training are as follows [28,29]:

For the current Observation S, the action $A = μ (S) + N$ is selected, where N is stochastic noise obtained from the noise model;
Action A is executed, and then the Reward R and the next Observation S′ are computed;
The experience $(S, A, R, S^{'})$ is stored in the experience buffer;
A mini-batch of M experiences $(S_{i}, A_{i}, R_{i}, S_{i}^{'})$ is randomly sampled from the buffer;
If $S_{i}^{'}$ is a terminal state, the value function target y_i is set to R_i.

Otherwise, y_i is computed using Equation (11) [28,29]:

y_{i} = R_{i} + γ \cdot \min_{k} \left( Q_{k}^{'} \left( S_{i}^{'}, \mathrm{clip} \left( μ^{'} (S_{i}^{'} | θ_{μ^{'}}) + ε \right) \middle| θ_{Q_{k}^{'}} \right) \right)

(11)

The value function target is the sum of the experience Reward R_i and the minimum discounted future Reward estimated by the target critics Q′_k, where ε is the noise added to the target action to smooth the target policy.

At every training step, the parameters of each critic are updated by minimizing the following loss:

L_{k} = \frac{1}{M} \sum_{i = 1}^{M} {(y_{i} - Q_{k} (S_{i}, A_{i} |θ_{Q k}))}^{2}

(12)
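The TD3 target of Equation (11) and the critic loss of Equation (12) can be sketched in plain Python for a mini-batch with scalar actions. This is a minimal sketch under stated assumptions: the function names are hypothetical, the networks are passed in as plain callables, and the handling of ε (clipping the noise to ±`noise_clip` and then clipping the action to [`a_min`, `a_max`]) follows the usual TD3 target policy smoothing rather than a detail stated in the article.

```python
def clip(x, lo, hi):
    return max(lo, min(hi, x))


def td3_targets(rewards, next_states, terminal, target_actor, target_critics,
                gamma=0.99, noise=lambda: 0.0, noise_clip=0.5,
                a_min=-1.0, a_max=1.0):
    """Value-function targets y_i of Equation (11) for a mini-batch (scalar actions)."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, terminal):
        if done:                                          # terminal S'_i: y_i = R_i
            targets.append(r)
            continue
        eps = clip(noise(), -noise_clip, noise_clip)      # target policy smoothing noise
        a_next = clip(target_actor(s_next) + eps, a_min, a_max)
        q_min = min(q(s_next, a_next) for q in target_critics)  # pessimistic twin-critic value
        targets.append(r + gamma * q_min)
    return targets


def critic_loss(targets, states, actions, critic):
    """Mean squared error L_k of Equation (12) for one critic Q_k."""
    M = len(targets)
    return sum((y - critic(s, a)) ** 2 for y, s, a in zip(targets, states, actions)) / M
```

During training, `noise` would typically draw zero-mean Gaussian samples (for example `lambda: rng.gauss(0.0, 0.2)` with a `random.Random` instance `rng`, values chosen only for illustration); the default of zero noise keeps the sketch deterministic.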

At every step, the actor parameters are updated using the following sampled policy gradient in order to maximize the expected Reward:

\nabla_{θ_{μ}} J = \frac{1}{M} \sum_{i = 1}^{M} G_{a i} G_{μ i}

(13)

where G_ai, G_μi and A are represented by the following expressions, respectively:

G_{a i} = \nabla_{A} \min (Q_{k} (S_{i}, A |θ_{Q}))

(14)

G_{μ i} = \nabla_{θ_{μ}} μ (S_{i} |θ_{μ})

(15)

A = μ (S_{i} |θ_{μ})

(16)
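To make Equations (13)–(16) concrete, the following sketch computes the sampled policy gradient for a hypothetical scalar linear actor A = θ·S, for which G_μi = S_i, and estimates G_ai by a central finite difference of the minimum over the critics. Real implementations obtain both gradients by automatic differentiation through the actor and critic networks; this version only illustrates the chain rule in Equation (13). The function name and the step `h` are assumptions.

```python
def actor_gradient(theta, states, critics, h=1e-6):
    """Sampled policy gradient of Equations (13)-(16) for a scalar linear actor A = θ·S."""
    def q_min(s, a):
        return min(q(s, a) for q in critics)            # minimum over critics, Equation (14)

    total = 0.0
    for s in states:
        a = theta * s                                     # A = μ(S_i | θ_μ), Equation (16)
        g_a = (q_min(s, a + h) - q_min(s, a - h)) / (2.0 * h)  # G_ai by central difference
        g_mu = s                                          # G_μi = ∂(θ·S)/∂θ, Equation (15)
        total += g_a * g_mu
    return total / len(states)                            # Equation (13)
```

A gradient-ascent step θ ← θ + α·∇_θ J (with a learning rate α chosen by the user) then moves the actor toward Actions that the critics rate more highly.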

Additionally, the parameters of the target critics and the target actor are updated using a selected smoothing coefficient τ, as in the following equations:

θ_{Q k^{'}} = τ θ_{Q k} + (1 - τ) θ_{Q k^{'}}

(17)

θ_{μ^{'}} = τ θ_{μ} + (1 - τ) θ_{μ^{'}}

(18)
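Equations (17) and (18) apply the same exponential smoothing to every parameter. A minimal Python sketch, assuming the network parameters are flattened into plain lists (the function name `soft_update` and the value τ = 0.005 are hypothetical):

```python
def soft_update(target_params, online_params, tau):
    """Return the smoothed target parameters τ·θ + (1 − τ)·θ′ of Equations (17) and (18)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return [tau * p + (1.0 - tau) * p_t for p_t, p in zip(target_params, online_params)]


# Example: one soft update of a two-parameter target network.
target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.005)
```

With a small τ, the target networks follow the online networks slowly, which is intended to keep the targets y_i of Equation (11) from changing abruptly between training steps.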

4. Correction of the Control Signals Used for the Control of a Grid Connected PV Array System Based on PI Controllers Using RL-TD3 Agent

The classic control system for a GC-PV system is presented in detail in [15] and can be considered as the benchmark for the performance of the control system. The control system is also presented in [27,31], both under low voltage and normal operation conditions. Figure 4 shows the schematic diagram of the classic control system for the GC-PV array, which consists of a cascade structure in which PI-type controllers and control loops are used for the control of currents i_d and i_q (inner control loop) and for the control of the u_dc voltage (outer control loop).

Figure 5 shows the MATLAB Simulink model of the control system for the GC-PV system based on PI-type controllers, in which an RL-TD3 agent corrects the control signals, i.e., a customization of Figure 2 for the control system presented in this section. The RL-TD3 agent learns the behavior of the GC-PV control system and, after the training phase, supplies correction signals for the three command inputs of the cascade control system (i_dref, u_dref and u_qref), so that the improved GC-PV control system achieves a superior performance.

The steps in Section 3 were followed to implement the RL-TD3 agent. In the first step, the deep neural network (DNN) object was created, with two inputs (Observation and Action) and one output. An example code sequence from the software program developed in MATLAB for the design of the neural network is shown in Figure 6, and its graphical representation is presented in Figure 7.
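Since the MATLAB code of Figure 6 is not reproduced here, the following plain-Python sketch shows one common way to build a critic with two inputs (Observation and Action) and one output: separate fully connected paths for the observation and the action, merged by addition and followed by a scalar output. The class name, layer sizes, weight initialization and merge-by-addition are all assumptions for illustration, not the architecture of Figure 7.

```python
import random


def dense(W, b, x):
    """Fully connected layer: returns W·x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def init_layer(n_out, n_in, rng, scale=0.1):
    """Small uniform random weights and zero biases (an illustrative choice)."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class CriticNet:
    """Q(S, A) with an observation path and an action path merged by addition."""

    def __init__(self, n_obs, n_act, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.obs_fc = init_layer(n_hidden, n_obs, rng)
        self.act_fc = init_layer(n_hidden, n_act, rng)
        self.out_fc = init_layer(1, n_hidden, rng)

    def __call__(self, obs, act):
        h_obs = relu(dense(*self.obs_fc, obs))            # Observation input path
        h_act = dense(*self.act_fc, act)                  # Action input path
        merged = relu([a + b for a, b in zip(h_obs, h_act)])
        return dense(*self.out_fc, merged)[0]             # single scalar output Q(S, A)
```

For the outer-loop agent of Section 4.1, n_obs = 2 (u_dc and u_dc_error) and n_act = 1 (the correction added to i_dref) would be natural choices.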

To train the RL-TD3 agent to control the GC-PV system, 200 episodes were chosen, with about 100 steps per episode and an agent sample time of 10⁻⁴ s. The training stage could be stopped when the cumulative average Reward exceeded −190 over 100 consecutive episodes or after the initially set 200 training episodes had finished. To improve the performance of the RL-TD3 agent during training, Gaussian noise was superimposed on the signals received and transmitted by the agent.
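The stopping rule and the exploration noise can be expressed compactly. The sketch below reads “cumulative average Reward greater than −190 over 100 consecutive episodes” as a moving average of the episode Rewards over a 100-episode window; that interpretation, the constant and function names, and the noise standard deviation `sigma` are assumptions.

```python
import random

MAX_EPISODES = 200     # episode budget from the text
WINDOW = 100           # averaging window (interpretation of "100 consecutive episodes")
STOP_VALUE = -190.0    # Reward threshold from the text


def should_stop(episode_rewards):
    """Return True when training should stop under the assumed criterion."""
    if len(episode_rewards) >= MAX_EPISODES:
        return True
    if len(episode_rewards) < WINDOW:
        return False
    return sum(episode_rewards[-WINDOW:]) / WINDOW > STOP_VALUE


def explore(action, sigma, rng):
    """Superimpose zero-mean Gaussian noise on an action (sigma is a tuning assumption)."""
    return action + rng.gauss(0.0, sigma)
```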

4.1. Implementation of the RL-TD3 Agent for the Correction of Commands for the Outer Voltage Control Loop

The MATLAB Simulink model of the control system for the GC-PV array based on PI-type controllers, using the RL-TD3 agent to correct the i_dref current command produced by the outer u_dc voltage control loop, is shown in Figure 8. Figure 9 presents the subsystem diagram of the MATLAB Simulink implementation of the RL-TD3 agent. In this case, the correction signals of the RL-TD3 agent were added to the command signal i_dref. The Observations were the following signals: u_dc and u_dc_error.

The Reward at every step in this case was calculated using the following equation:

r_{1} = - (Q_{1} u_{d c_e r r o r}^{2} + R \sum_{j} {(u_{t - 1}^{j})}^{2})

(19)

where Q₁ = 0.5 and R = 0.1, and u_{t−1}^j denotes the Actions from the previous step.
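A one-line Python sketch of the per-step Reward in Equation (19), with Q₁ = 0.5 and R = 0.1 as defaults; `prev_actions` holds the Actions u_{t−1}^j from the previous step. The function name is hypothetical.

```python
def reward_outer(u_dc_error, prev_actions, Q1=0.5, R=0.1):
    """Per-step Reward r_1 of Equation (19): -(Q1·e² + R·Σ_j (u_{t-1}^j)²)."""
    return -(Q1 * u_dc_error ** 2 + R * sum(a ** 2 for a in prev_actions))
```

Because the Reward is never positive, the agent maximizes it by driving both the voltage error and the size of its own corrections toward zero.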

The training time in this case was 2 h, 37 min and 12 s. The graphical results for this training stage are presented in Figure 10.

4.2. Implementation of the RL-TD3 Agent for the Command Correction of the Inner Currents Control Loop

The MATLAB Simulink model of the inner loop of the GC-PV array control system (which controls the i_d and i_q currents) based on the RL-TD3 agent is shown in Figure 11. After the learning stage, the RL-TD3 agent supplied correction signals for the command signals u_d and u_q. Figure 12 presents the block diagram of the MATLAB Simulink subsystem implementation of the RL-TD3 agent. The Observations consisted of the following signals: i_d, i_q, i_derror and i_qerror.

The Reward at every step in this case was calculated using the following equation:

r_{1} = - (Q_{1} i_{d e r r o r}^{2} + Q_{2} i_{q e r r o r}^{2} + R \sum_{j} {(u_{t - 1}^{j})}^{2})

(20)

where Q₁ = Q₂ = 0.5, R = 0.1 and u_{t−1}^j denotes the Actions from the previous step.

The training time in this case was 3 h, 12 min and 42 s. The graphical results for this training stage are presented in Figure 13.

4.3. Implementation of the RL-TD3 Agent for the Command Correction of the Outer Voltage Control Loop and Inner Current Control Loops

The MATLAB Simulink model of the command correction for the outer voltage control loop and the inner current control loops based on the RL-TD3 agent is shown in Figure 14. Figure 15 shows the block diagram of the MATLAB Simulink subsystem implementation of the RL-TD3 agent. In this case, the correction signals of the RL-TD3 agent were added to the command signals u_d and u_q and to the i_dref signal. The Observations were the following signals: u_dc, u_{dc_error}, i_d, i_q, i_{d_error} and i_{q_error}.

The Reward at every step in this case was calculated using the following equation:

r_{1} = - (Q_{1} u_{d c_e r r o r}^{2} + Q_{2} i_{d e r r o r}^{2} + Q_{3} i_{q e r r o r}^{2} + R \sum_{j} {(u_{t - 1}^{j})}^{2})

(21)

where Q₁ = Q₂ = Q₃ = 0.5 and R is 0.1.
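Equations (20) and (21) share the same structure: a weighted sum of squared tracking errors plus a penalty on the previous Actions. A hypothetical helper that covers both cases, assuming the errors and their weights are passed as matching lists:

```python
def reward_weighted(errors, weights, prev_actions, R=0.1):
    """Per-step Reward of Equations (20) and (21): -(Σ_k Q_k e_k² + R Σ_j (u_{t-1}^j)²)."""
    if len(errors) != len(weights):
        raise ValueError("one weight Q_k is needed per error signal")
    tracking = sum(q * e ** 2 for q, e in zip(weights, errors))   # tracking-error term
    effort = R * sum(a ** 2 for a in prev_actions)                # penalty on previous Actions
    return -(tracking + effort)
```

For Equation (21), errors = [u_dc_error, i_d_error, i_q_error] with weights [0.5, 0.5, 0.5]; for Equation (20), only the two current errors and their weights are passed.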

The training time in this case was 1 h, 27 min and 33 s. The graphical results for this training stage are presented in Figure 16.

5. Correction of the Control Signals for the Control System of the Grid Connected PV Array Based on SMC and Synergetic Controllers Using the RL-TD3 Agent

In this section, we present the design and synthesis algorithms of the SMC and SYN controllers of the control system for the GC-PV array. Figure 17 presents the schematic diagram of the control system for the GC-PV array based on the SMC and SYN controllers. This control system consisted of a cascade in which the control loops were used for the control of the i_d and i_q signal currents (inner control loop with the SYN controller) and the control of the u_dc signal voltage (outer control loop with the SMC controller).

Moreover, the RL-TD3 agent that learned the behavior of the GC-PV control system was used, which supplied the correction signals for the three control inputs of the cascade-type control system (i_dref, u_dref, u_qref) after the training stage, so that the improved control system would produce a superior performance, even when the control system used SMC- and SYN-type controllers.

5.1. Sliding Mode Control

Based on the elements presented in Section 2 and denoting the switching functions of the DC-AC converter by S_a, S_b and S_c, the following equation can be written in the abc reference frame:

C_{2} \frac{d u_{d c}}{d t} = i_{d c 1} - (i_{a} S_{a} + i_{b} S_{b} + i_{c} S_{c})

(22)

Using the transformation in Equation (5), the switching functions S_d and S_q are obtained:

{[\begin{matrix} S_{d} & S_{q} & 0 \end{matrix}]}^{T} = P {[\begin{matrix} S_{a} & S_{b} & S_{c} \end{matrix}]}^{T}

(23)

With these, Equation (22) became:

C_{2} \frac{d u_{d c}}{d t} = i_{d c 1} - \frac{3}{2} (i_{d} S_{d} + i_{q} S_{q})

(24)

As in [15,31,33], the same MPPT algorithm was considered, so the focus here is on obtaining the SMC and SYN command laws. Furthermore, following [26], i_qref = 0 was selected and, assuming that the inner current loop tracks its references (i_d ≈ i_dref and i_q ≈ i_qref = 0), Equation (24) became:

C_{2} \frac{d u_{d c}}{d t} = i_{d c 1} - \frac{3}{2} i_{d r e f} S_{d}

(25)

Moreover, to obtain the reference current i_dref using the SMC design procedure, the state variable x₁ in Equation (26) and the switching surface S in Equation (27) were introduced:

x_{1} = u_{d c r e f} - u_{d c}

(26)

\{\begin{matrix} S = c_{1} x_{1} + x_{2} \\ \dot{S} = c_{1} x_{2} + {\dot{x}}_{2} \end{matrix}

(27)

In Equation (27), the state variable x₂ was defined as follows:

x_{2} = {\dot{x}}_{1} = - {\dot{u}}_{d c}

(28)

To ensure convergence, the reaching law in Equation (29) was then imposed:

\dot{S} = - ε sgn S - k S

(29)

where ε and k are positive constants.

Differentiating x₂ and using Equation (25), the following is obtained:

{\ddot{x}}_{1} = {\dot{x}}_{2} = - {\ddot{u}}_{d c} = \frac{3}{2} \frac{S_{d}}{C_{2}} {\dot{i}}_{d r e f} - \frac{{\dot{i}}_{d c 1}}{C_{2}},

(30)

and so, by substituting Equation (30) into Equations (27) and (29), the following equation can be written:

- ε sgn S - k S = c_{1} x_{2} + \frac{3}{2} \frac{1}{C_{2}} S_{d} {\dot{i}}_{d r e f} - \frac{{\dot{i}}_{d c 1}}{C_{2}}

(31)

Following [32,33], to improve convergence and attenuate the high-frequency oscillations (chattering), the sgn function was replaced with the function defined by Equation (32):

h (x) = \frac{2}{1 + e^{- a (x - b)}} - 1

(32)

For a = 4 and b = 0, h(x) ∈ (−1, 1) and a smooth transition is obtained over this interval. Thus, the output of the designed SMC-type controller was obtained as:

i_{d r e f} = \frac{2}{3} \frac{C_{2}}{S_{d}} \int_{0}^{t} [- (c_{1} x_{2} + k S + ε h (S)) + \frac{{\dot{i}}_{d c 1}}{C_{2}}] d t

(33)
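A discrete-time Python sketch of the SMC voltage controller may help in reading Equations (26)–(33). It is a minimal sketch under explicit assumptions: the class name and parameter values are hypothetical; x₁ is taken as u_dcref − u_dc so that x₂ = ẋ₁ = −u̇_dc holds as in Equation (28) for a constant reference; the derivatives u̇_dc and i̇_dc1 are estimated by backward differences; S_d is treated as constant, as when it is taken outside the integral; and the integrand is obtained by solving Equation (31) for the derivative of i_dref with sgn replaced by h. The function h is evaluated through the identity h(x) = tanh(a(x − b)/2), which avoids overflow of the exponential for large |x|.

```python
import math


def h(x, a=4.0, b=0.0):
    """Smooth replacement for sgn, Equation (32), using 2/(1 + e^(-a(x-b))) - 1 = tanh(a(x-b)/2)."""
    return math.tanh(a * (x - b) / 2.0)


class SMCVoltageController:
    """Discrete-time sketch of the SMC outer-loop law that outputs i_dref."""

    def __init__(self, C2, c1, k, eps, Ts, S_d=1.0):
        self.C2, self.c1, self.k, self.eps, self.Ts, self.S_d = C2, c1, k, eps, Ts, S_d
        self.integral = 0.0       # running value of the integral in the controller output
        self.prev_u_dc = None     # previous samples for backward differences
        self.prev_i_dc1 = None

    def step(self, u_dc_ref, u_dc, i_dc1):
        if self.prev_u_dc is None:                       # first sample: zero derivative estimates
            self.prev_u_dc, self.prev_i_dc1 = u_dc, i_dc1
        x1 = u_dc_ref - u_dc                             # voltage error (sign convention assumed)
        x2 = -(u_dc - self.prev_u_dc) / self.Ts          # x2 = -du_dc/dt, Equation (28)
        di_dc1 = (i_dc1 - self.prev_i_dc1) / self.Ts     # estimate of di_dc1/dt
        S = self.c1 * x1 + x2                            # sliding surface, Equation (27)
        # bracketed term from solving Equation (31) for di_dref/dt, with sgn(S) -> h(S)
        integrand = -(self.c1 * x2 + self.k * S + self.eps * h(S)) + di_dc1 / self.C2
        self.integral += integrand * self.Ts             # forward-Euler integration
        self.prev_u_dc, self.prev_i_dc1 = u_dc, i_dc1
        return (2.0 / 3.0) * (self.C2 / self.S_d) * self.integral
```

On the first call the derivative estimates are zero, so the output starts at zero; in practice the backward differences would normally be filtered, since they amplify measurement noise.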

Figure 18 presents the block diagram of the MATLAB Simulink subsystem implementation of the proposed SMC controller.

5.2. Synergetic Control

For a nonlinear system of the form (34), an SYN control law can be synthesized, which can be seen as a generalization of the SMC-type control law [27,32,33]:

\dot{x} = f (x, u, t)

(34)

where x ∈ ℜⁿ is the state vector, f(·) is a continuous nonlinear function and u ∈ ℜᵐ (m < n) is the control input vector.

A macro variable ψ(x, t), defined as a function of the system states, was chosen for each control input. To synthesize the SYN-type control law, the macro variable was forced to evolve according to the following equation:

T \dot{ψ} + ψ = 0

(35)

where T > 0 is selected to achieve the desired convergence rate.
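As a worked step added here for illustration: for constant T, Equation (35) is a first-order linear differential equation in ψ, whose solution is

ψ (t) = ψ (0) \, e^{- t / T}

so the macro variable decays exponentially to ψ = 0 with time constant T. After t = 4T it has fallen to e^{−4} ≈ 1.8% of its initial value, which is why a smaller T yields faster convergence.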

Differentiating the chosen macro variable gives:

\dot{ψ} = \frac{\partial ψ}{\partial x} \dot{x},

(36)

After inserting Equation (36) into Equation (35), the following could be obtained:

T \frac{\partial ψ}{\partial x} \dot{x} + ψ = 0

(37)

Substituting the explicit expressions of ẋ from Equation (34) into Equation (37) yields the control law:

u = u (x, ψ (x, t), T, t)

(38)

The outputs of the SYN controller were given by u_d and u_q.

For the d axis and k_d > 0, the macro variable ψ_d was chosen in the following form:

ψ_{d} = (u_{d c r e f} - u_{d c}) + k_{d} (i_{d r e f} - i_{d})

(39)

The state variables x₁ and x₂ were defined as in Equation (40):

\{\begin{matrix} x_{1} = u_{d c r e f} - u_{d c} \\ x_{2} = i_{d r e f} - i_{d} \end{matrix}

(40)

From Equation (40), for slowly varying reference quantities (a quasi-stationary regime), the following expressions are obtained:

\{\begin{matrix} {\dot{x}}_{1} = - {\dot{u}}_{d c} \\ {\dot{x}}_{2} = - {\dot{i}}_{d} \end{matrix}

(41)

Based on these, the derivative of the macro variable in Equation (39) is:

{\dot{ψ}}_{d} = {\dot{x}}_{1} + k_{d} {\dot{x}}_{2} = - {\dot{u}}_{d c} - k_{d} {\dot{i}}_{d}

(42)

For T = T₁, Equation (35) became:

T_{1} (- {\dot{u}}_{d c} - k_{d} {\dot{i}}_{d}) + (u_{d c r e f} - u_{d c}) + k_{d} (i_{d r e f} - i_{d}) = 0

(43)

Using Equation (7), Equation (43) could be written in the following form:

- T_{1} {\dot{u}}_{d c} - T_{1} k_{d} \frac{1}{L_{3}} (u_{3 d} + u_{d}) + (u_{d c r e f} - u_{d c}) + k_{d} (i_{d r e f} - i_{d}) = 0

(44)

After rearranging the terms in Equation (44), we could obtain the following expression:

T_{1} k_{d} \frac{1}{L_{3}} u_{d} = - T_{1} {\dot{u}}_{d c} - T_{1} k_{d} \frac{1}{L_{3}} u_{3 d} + (u_{d c r e f} - u_{d c}) + k_{d} (i_{d r e f} - i_{d})

(45)

Thus, the control law u_d was obtained:

u_{d} = \frac{L_{3}}{T_{1} k_{d}} [- T_{1} {\dot{u}}_{d c} - T_{1} k_{d} \frac{1}{L_{3}} u_{3 d} + (u_{d c r e f} - u_{d c}) + k_{d} (i_{d r e f} - i_{d})]

(46)
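A minimal Python sketch of the d-axis law in Equation (46). The function and argument names are hypothetical, and the derivative u̇_dc is assumed to be supplied by the caller (for example from a filtered numerical derivative of the measured u_dc).

```python
def syn_u_d(u_dc_ref, u_dc, du_dc_dt, i_d_ref, i_d, i_q, e_d, R3, L3, omega, T1, k_d):
    """d-axis synergetic control law, Equation (46)."""
    u3d = -R3 * i_d + omega * L3 * i_q - e_d                # u_3d as defined after Equation (8)
    psi_d = (u_dc_ref - u_dc) + k_d * (i_d_ref - i_d)       # macro variable, Equation (39)
    return (L3 / (T1 * k_d)) * (-T1 * du_dc_dt - T1 * k_d * u3d / L3 + psi_d)
```

Substituting this u_d back into Equation (7) makes T₁ψ̇_d + ψ_d vanish, which is a convenient check of the sign conventions.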

For the q axis and k_q > 0, we selected the macro variable ψ_q in the following form:

ψ_{q} = i_{q r e f} - i_{q}

(47)

The state variable x₃ was then defined, together with x₁ and x₂, as follows:

\{\begin{matrix} x_{1} = u_{d c r e f} - u_{d c} \\ x_{2} = i_{d r e f} - i_{d} \\ x_{3} = i_{q r e f} - i_{q} \end{matrix}

(48)

For i_qref = 0 and slowly varying references, differentiating Equation (48) gives:

\{\begin{matrix} {\dot{x}}_{1} = - {\dot{u}}_{d c} \\ {\dot{x}}_{2} = - {\dot{i}}_{d} \\ {\dot{x}}_{3} = - {\dot{i}}_{q} \end{matrix}

(49)

Thus, the derivative of the macro variable ψ_q defined in Equation (47) is:

{\dot{ψ}}_{q} = {\dot{x}}_{3}

(50)

For T = T₂, Equation (35) became:

- T_{2} {\dot{i}}_{q} + (i_{q r e f} - i_{q}) = 0

(51)

Using Equation (8), Equation (51) could be written in the following form:

- T_{2} \frac{1}{L_{3}} (u_{3 q} + u_{q}) + i_{q r e f} - i_{q} = 0

(52)

After rearranging the terms in Equation (52), the following expression is obtained:

(u_{3 q} + u_{q}) = \frac{L_{3}}{T_{2}} (i_{q r e f} - i_{q})

(53)

Thus, the control law u_q was obtained:

u_{q} = \frac{L_{3}}{T_{2}} (i_{q r e f} - i_{q}) - u_{3 q}

(54)
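The q-axis law of Equation (54) is simple enough to simulate directly. In this sketch (hypothetical names and parameter values, forward-Euler integration, i_d held fixed), the closed loop reduces to T₂ i̇_q = i_qref − i_q, so i_q decays toward i_qref = 0 with time constant T₂.

```python
def syn_u_q(i_q_ref, i_q, i_d, e_q, R3, L3, omega, T2):
    """q-axis synergetic control law, Equation (54)."""
    u3q = -R3 * i_q - omega * L3 * i_d - e_q     # u_3q as defined after Equation (8)
    return (L3 / T2) * (i_q_ref - i_q) - u3q


def simulate_iq(i_q0, i_d, e_q, R3, L3, omega, T2, dt, n_steps):
    """Forward-Euler simulation of Equation (8) under the law above (i_qref = 0)."""
    i_q = i_q0
    for _ in range(n_steps):
        u_q = syn_u_q(0.0, i_q, i_d, e_q, R3, L3, omega, T2)
        u3q = -R3 * i_q - omega * L3 * i_d - e_q
        i_q += dt * (u3q + u_q) / L3             # Equation (8): L3 di_q/dt = u_3q + u_q
    return i_q


i_q_end = simulate_iq(i_q0=5.0, i_d=20.0, e_q=0.0, R3=0.1, L3=2e-3,
                      omega=314.0, T2=1e-3, dt=1e-5, n_steps=500)
```

After 5T₂ (500 steps of 10⁻⁵ s with T₂ = 10⁻³ s), i_q has dropped below 1% of its initial value.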

Figure 19 presents the MATLAB Simulink implementation subsystem of the designed SYN controller.

Figure 20 shows the MATLAB Simulink model of the control system for the GC-PV array based on the SMC (subsystem shown in Figure 18) and SYN (subsystem shown in Figure 19) controllers, using the RL-TD3 agent to correct the control signals.

Following on from the aspects presented in Section 4, this section continues to present the ways in which the performance of the GC-PV system could be improved by using the RL-TD3 agent, even when using complex SMC- and SYN-type controllers, i.e., a customization of Figure 2 for the control system presented in this section.

5.3. Implementation of the RL-TD3 Agent for the Correction of the Outer Voltage Control Loop Using SMC and Synergetic Control

The block diagram of the MATLAB Simulink subsystem implementation of the GC-PV control system using SMC and SYN controllers, with the RL-TD3 agent used to improve the performance of the outer control loop, is shown in Figure 21. The correction signals of the RL-TD3 agent were added to the command signal i_dref, the RL-TD3 block structure was similar to that in Figure 9 and the Reward was given by Equation (19).

The training time in this case was 3 h, 12 min and 17 s. The graphical results for this training stage are presented in Figure 22.

5.4. Implementation of the RL-TD3 Agent for the Correction of the Inner Currents Control Loop Using SMC and Synergetic Control

The MATLAB Simulink subsystem implementation of the GC-PV control system using SMC and SYN controllers, with the RL-TD3 agent used to improve the performance of the inner control loop, is presented in Figure 23. The correction signals of the RL-TD3 agent were added to the command signals u_dref and u_qref, the RL-TD3 block structure was similar to that in Figure 12 and the Reward was obtained using Equation (20).

The training time in this case was 3 h, 3 min and 36 s. The graphical results for this training stage are presented in Figure 24.

5.5. Implementation of the RL-TD3 Agent for the Correction of the Outer Voltage Control Loop and Inner Current Control Loops Using SMC and Synergetic Control

Figure 25 shows the MATLAB Simulink subsystem block implementation of the GC-PV control system based on SMC and SYN controllers, in which the RL-TD3 agent is used to improve the performance of both the outer and the inner control loops. The correction signals of the RL-TD3 agent were added to the command signals i_dref, u_dref and u_qref, the structure of the RL-TD3 block was similar to that in Figure 15 and the reward was given by Equation (21).
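The additive correction used in Sections 5.3 to 5.5 can be illustrated with a minimal Python sketch. It assumes that a trained actor is available as a plain function mapping an observation vector to one correction signal per command; the names `actor` and `corrected_commands`, the observation choice and the symmetric saturation `action_limit` are hypothetical and are not taken from the authors' Simulink implementation.

```python
import numpy as np


def corrected_commands(baseline, correction, action_limit):
    """Return baseline controller commands plus the agent's saturated correction.

    baseline     -- commands produced by the SMC/SYN (or PI) controllers,
                    e.g. [i_dref, u_dref, u_qref] for the case of Section 5.5
    correction   -- raw output of the trained actor for the current observation
    action_limit -- hypothetical symmetric bound on each correction signal
    """
    baseline = np.asarray(baseline, dtype=float)
    correction = np.asarray(correction, dtype=float)
    if baseline.shape != correction.shape:
        raise ValueError("one correction signal is needed per command signal")
    # Saturate the correction (an assumption) so the agent only nudges the commands.
    correction = np.clip(correction, -action_limit, action_limit)
    return baseline + correction


def actor(observation):
    """Hypothetical stand-in for a trained TD3 actor (a fixed linear map here)."""
    gain = np.array([[0.05, 0.0], [0.0, 0.02], [0.01, 0.01]])
    return gain @ np.asarray(observation, dtype=float)


# Assumed observation: [u_dc tracking error, i_d tracking error].
obs = [2.0, -1.0]
# Illustrative baseline values for [i_dref, u_dref, u_qref].
commands = corrected_commands([120.0, 0.8, 0.0], actor(obs), action_limit=0.05)
print(commands)
```

The baseline controllers stay in the loop and the agent only adds a bounded offset, which mirrors the idea of correcting, rather than replacing, the SMC and SYN commands.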

The training time in this case was 2 h, 15 min and 24 s. The graphical results for this training stage are presented in Figure 26.

6. Numerical Simulations

This section starts with the benchmark presented in MATLAB Simulink [15], which is used here again to compare the results with the best results obtained in [26,27,31,33]. After the main characteristics of the benchmark system are presented, the numerical simulations based on the theoretical elements from the previous sections are given. Starting from the cascade control structure in which PI-type controllers were used for the inner control loop of the i_d and i_q currents and for the outer control loop of the u_dc voltage, and adding the RL-TD3 agent, an improved performance of the GC-PV array control system was obtained.

In the second part of the numerical simulations, the starting point was the best performance reported in [32,33] for the cascade control system in which an SYN-type controller was used for the inner control loop of the i_d and i_q currents and an SMC-type controller was used for the outer control loop of the u_dc voltage. Adding the RL-TD3 agent to this structure again improved the performance of the GC-PV array control system, both in a direct comparison of the performance indices and in terms of the robustness of the control system under parametric variations, such as the variation of the three-phase load.

Regarding the characteristics of the benchmark system presented in Section 2, it is a 100 kW model in which the u_dc voltage of the DC intermediate circuit is set to 500 V; maintaining this voltage is therefore one of the objectives of the control system. The voltage supplied by the DC–AC converter was 260 V, and the load was connected to the main grid via a 25 kV/260 V transformer. The nominal value of the three-phase load was 10 kvar. The MPPT algorithm was the one presented and implemented in the benchmark system [15,31], and it was kept unchanged so that the performances of the different control systems for the GC-PV array could be compared. The PV array consisted of 330 modules that could supply 100.7 kW (305.2 W per module); the short-circuit current of each module was I_sc = 5.96 A and the open-circuit voltage was V_oc = 64.2 V. The sampling period for the PWM generator was 1 ms and the sampling period for the voltage and current measurements was 100 ms. As in the benchmark, the control system was bypassed for the first 50 ms.
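As a quick consistency check of the benchmark data, the array rating follows from the module count and the module power. The sketch below also estimates the steady-state modulation index of the DC–AC converter from the 500 V DC-link voltage and the 260 V AC voltage. It assumes that 260 V is the line-to-line RMS value and that sinusoidal PWM without third-harmonic injection is used; both assumptions are ours, so the modulation index is only an indicative value.

```python
import math

N_MODULES = 330          # number of PV modules in the array
P_MODULE_W = 305.2       # rated power of one module (W)
U_DC_V = 500.0           # DC intermediate circuit voltage reference (V)
U_AC_LL_RMS_V = 260.0    # converter AC voltage, assumed line-to-line RMS (V)

# Rated array power in kW: 330 * 305.2 W.
p_array_kw = N_MODULES * P_MODULE_W / 1e3

# Sinusoidal PWM (assumed): U_LL,rms = (sqrt(3) / (2 * sqrt(2))) * m * U_dc,
# hence m = 2 * sqrt(2) * U_LL,rms / (sqrt(3) * U_dc).
m_index = 2 * math.sqrt(2) * U_AC_LL_RMS_V / (math.sqrt(3) * U_DC_V)

print(f"array rating: {p_array_kw:.1f} kW")   # array rating: 100.7 kW
print(f"modulation index: {m_index:.3f}")     # modulation index: 0.849
```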

The simulation of the PV array operation depended on the irradiance and temperature input signals, whose time evolution is shown in Figure 27. These signals, referred to as the type 1 PV array signals, were used in the numerical simulations of the GC-PV array control system based on PI-type controllers.

Figure 28 shows the time evolution of the u_dc voltage for the GC-PV array control system based on PI-type controllers, driven by the type 1 PV array irradiance and temperature signals. The steady-state error of the PI-based control system was 1 V, i.e., 0.2%, and the overshoot was negligible.

Figure 29, Figure 30, Figure 31 and Figure 32 show the time evolution of the following quantities of interest of the control system: the i_d and i_q currents; the power P_mean and voltage U_mean of the PV array; the duty cycle of the DC-DC converter; the modulation index of the DC-AC converter; the u_a voltage and i_a current of the main grid; and the power flow P between the PV array and the main grid. Figure 29 shows the i_d and i_q currents, with the reference i_qref = 0 and the i_d current tracking the i_dref reference. Figure 30 shows P_mean and U_mean of the PV array, the duty cycle of the DC-DC converter and the modulation index of the DC-AC converter. Figure 31 shows the u_a voltage and the i_a current of the main grid, and Figure 32 shows the power flow P between the PV array and the main grid.

To evaluate the performance of the GC-PV array control system based on PI-type controllers, a step change of the reference u_dcref from 500 V to 550 V was applied at t = 1 s. The result of the numerical simulation is presented in Figure 33.

The RL-TD3 agents (Section 3) were designed and trained for the cases described in Section 4.1, Section 4.2 and Section 4.3; the corresponding numerical simulation results are presented in Figure 34, Figure 35 and Figure 36, respectively.

Figure 37 compares the responses to a step change of the reference from 500 V to 550 V for the GC-PV array control system based on PI-type controllers and for three variants of it that use the RL-TD3 agent to correct the command signals of the outer and/or inner control loops.

Table 1 compares the performances of these GC-PV array control system variants in terms of response time, voltage ripple (the RMS value of the voltage error signal computed using Equation (55)), overshoot and steady-state error. In all of the cases in which the RL-TD3 agent was used, the overshoot remained below 0.5% and the steady-state error was 0.2%. The numerical simulations show that the RL-TD3 agent improved the performance of the GC-PV array control system: when all three command signals were corrected, the response time decreased from 40.4 ms to 33.8 ms and the voltage ripple from 57.87 V to 56.22 V.

u_{dc,rip} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( u_{dc}(i) - u_{dc,ref}(i) \right)^{2}}

(55)

where N is the number of samples, u_dc(i) is the i-th sample of the DC voltage and u_dcref(i) is the corresponding sample of the reference voltage.
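Equation (55) is the root-mean-square value of the tracking error over the N samples. A minimal NumPy sketch is given below; the function name `voltage_ripple` and the synthetic first-order step response used in the usage example (including its 10 ms time constant) are illustrative assumptions, not data from the simulations.

```python
import numpy as np


def voltage_ripple(u_dc, u_dcref):
    """Voltage ripple of Equation (55): RMS value of u_dc(i) - u_dcref(i)."""
    u_dc = np.asarray(u_dc, dtype=float)
    u_dcref = np.asarray(u_dcref, dtype=float)
    if u_dc.shape != u_dcref.shape or u_dc.size == 0:
        raise ValueError("u_dc and u_dcref must be non-empty and of equal length")
    error = u_dc - u_dcref
    return float(np.sqrt(np.mean(error ** 2)))


# Synthetic example only: the reference steps from 500 V to 550 V at t = 1 s
# and the measured voltage follows with an assumed first-order lag.
t = np.arange(0.9, 1.2, 1e-4)                      # time axis (s)
u_ref = np.where(t >= 1.0, 550.0, 500.0)           # reference voltage (V)
tau = 0.01                                         # assumed time constant (s)
u_meas = np.where(t >= 1.0, 550.0 - 50.0 * np.exp(-(t - 1.0) / tau), 500.0)
ripple = voltage_ripple(u_meas, u_ref)
print(f"ripple: {ripple:.2f} V")
```

Because Equation (55) averages over all N samples, its value depends on the evaluation window; ripple values are therefore only comparable when they are computed over the same window and sampling period.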

The irradiance and temperature signals shown in Figure 38, referred to as the type 2 PV array signals, were used in the numerical simulations of the GC-PV array control systems based on SMC- and SYN-type controllers.

Figure 39 presents the response obtained by combining the SMC-type controller, designed for the control of the u_dc voltage, with the SYN-type controller, designed for the control of the i_d and i_q currents, for the DC voltage reference u_dcref = 500 V. The detail in Figure 39 shows that the steady-state error was 0.1 V, i.e., 0.02%.

To demonstrate the parametric robustness of the control system, Figure 40 and Figure 41 show the evolution of the u_dc voltage when the three-phase load varies by 30% from its nominal value of 10 kvar, i.e., for loads of 13 kvar and 7 kvar, respectively. In both cases, the GC-PV array control system based on SMC- and SYN-type controllers maintained its performance.
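Both steady-state error figures quoted in this section (1 V for the PI controllers and 0.1 V for the SMC and SYN controllers) are expressed relative to the 500 V reference. The short sketch below reproduces the percentages; the function name `steady_state_error_percent` is hypothetical.

```python
def steady_state_error_percent(error_v, reference_v=500.0):
    """Steady-state error of u_dc as a percentage of the reference voltage."""
    if reference_v == 0:
        raise ValueError("reference voltage must be non-zero")
    return 100.0 * abs(error_v) / abs(reference_v)


pi_error = steady_state_error_percent(1.0)        # PI controllers: 1 V
smc_syn_error = steady_state_error_percent(0.1)   # SMC and SYN controllers: 0.1 V
print(f"PI: {pi_error:.2f} %, SMC-SYN: {smc_syn_error:.2f} %")  # PI: 0.20 %, SMC-SYN: 0.02 %
```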

For the case in which the GC-PV array is driven by the type 2 PV array signals, Figure 42, Figure 43, Figure 44 and Figure 45 show the time evolution of the following quantities of interest of the control system: the i_d and i_q currents; the power P_mean and voltage U_mean of the PV array; the duty cycle of the DC-DC converter; the modulation index of the DC-AC converter; the u_a voltage and i_a current of the main grid; and the power flow P between the PV array and the main grid. Figure 42 shows the i_d and i_q currents, with the reference i_qref = 0 and the i_d current tracking the i_dref reference. Figure 43 shows P_mean and U_mean of the PV array, the duty cycle of the DC-DC converter and the modulation index of the DC-AC converter. Figure 44 shows the u_a voltage and i_a current of the main grid, and Figure 45 shows the power flow P between the PV array and the main grid.

To evaluate the performance of the GC-PV array control system based on SMC- and SYN-type controllers, a step change of the reference u_dcref from 500 V to 550 V was applied at t = 1 s. The result of the numerical simulation is presented in Figure 46.

The numerical simulation results for the RL-TD3 agents designed and trained as described in Section 5.3, Section 5.4 and Section 5.5 are presented in Figure 47, Figure 48 and Figure 49, respectively.

Figure 50 compares the responses to a step change of the reference from 500 V to 550 V for the GC-PV array control system based on SMC and SYN controllers and for three variants of it that use the RL-TD3 agent to correct the command signals of the outer and/or inner control loops.

Table 2 compares the performances of these GC-PV array control system variants in terms of response time, voltage ripple (computed using Equation (55)), overshoot and steady-state error. In all of the cases in which the RL-TD3 agent was used, the overshoot remained below 0.2% and the steady-state error was 0.02%. The numerical simulations show that the RL-TD3 agent improved the performance of the GC-PV array control system: when all three command signals were corrected, the response time decreased from 14.1 ms to 13.2 ms and the voltage ripple from 55.63 V to 54.03 V.

Compared with the PI-type control system, the PI control system with the RL-TD3 agent reduced the response time by approximately 6.6 ms (about 16%), whereas, compared with the SMC and SYN control system, the SMC and SYN control system with the RL-TD3 agent reduced it by approximately 0.9 ms (about 6%). Between the simplest case (PI controllers) and the most complex case (SMC and SYN controllers with the RL-TD3 agent correcting all three command signals), the response time decreased by about 27 ms, i.e., by about 67%. The remaining performance indices can be compared in the same way, in absolute or relative terms.
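The absolute and relative reductions of the response time can be recomputed directly from Table 1 and Table 2. The sketch below assumes that each percentage is taken relative to the response time of the baseline (uncorrected) controller; the dictionary labels are shorthand introduced here for the table rows.

```python
# Response times (ms) from Table 1 and Table 2.
RESPONSE_TIME_MS = {
    "PI": 40.4,
    "PI + RL-TD3 (i_dref, u_dref, u_qref)": 33.8,
    "SMC and SYN": 14.1,
    "SMC and SYN + RL-TD3 (i_dref, u_dref, u_qref)": 13.2,
}


def reduction(before_ms, after_ms):
    """Absolute (ms) and relative (% of the baseline) response-time reduction."""
    delta = before_ms - after_ms
    return delta, 100.0 * delta / before_ms


comparisons = [
    ("PI", "PI + RL-TD3 (i_dref, u_dref, u_qref)"),
    ("SMC and SYN", "SMC and SYN + RL-TD3 (i_dref, u_dref, u_qref)"),
    ("PI", "SMC and SYN + RL-TD3 (i_dref, u_dref, u_qref)"),
]
results = []
for before, after in comparisons:
    d_ms, d_pct = reduction(RESPONSE_TIME_MS[before], RESPONSE_TIME_MS[after])
    results.append((d_ms, d_pct))
    print(f"{before} -> {after}: {d_ms:.1f} ms ({d_pct:.1f} %)")
```

The three printed lines report reductions of 6.6 ms (16.3 %), 0.9 ms (6.4 %) and 27.2 ms (67.3 %).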

7. Conclusions

This paper described the control system for a GC-PV array, starting from a benchmark system. The control structure was a cascade-type structure in which PI or SYN controllers were used for the inner control loops of the i_d and i_q currents and PI or SMC controllers were used for the outer control loop of the u_dc voltage of the DC intermediate circuit. The paper presented the model of the PV array together with the main component blocks: the simulated inputs for the PV array; the PV array itself; the MPPT algorithm; the DC-DC boost converter; the voltage and current measurements for the DC intermediate circuit; the DC-AC converter; the load and connection to the power grid; and the power grid. It also presented the stages of building and training the RL-TD3 agent. Comparative results were given for the cases in which the properly trained RL-TD3 agent provided correction signals that were added to the command signals u_dref, u_qref and i_dref. The parametric robustness of the proposed control system for the GC-PV array based on SMC and SYN controllers was demonstrated for a 30% variation of the three-phase load. The synthesis of the proposed control system was validated by comparing the numerical simulation results with those of the software benchmark of a GC-PV array control system implemented in MATLAB Simulink. The numerical simulations showed the superior performance of the control systems that used the RL-TD3 agent.

Author Contributions

Conceptualization, M.N. and C.-I.N.; data curation, M.N., C.-I.N. and D.S.; formal analysis, M.N., C.-I.N. and D.S.; funding acquisition, M.N. and D.S.; investigation, M.N., C.-I.N. and D.S.; methodology, M.N., C.-I.N. and D.S.; project administration, M.N. and D.S.; resources, M.N. and D.S.; software, M.N. and C.-I.N.; supervision, M.N. and D.S.; validation, M.N. and D.S.; visualization, M.N., C.-I.N. and D.S.; writing—original draft, M.N., C.-I.N. and D.S.; writing—review and editing, M.N., C.-I.N. and D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the European Regional Development Fund Competitiveness Operational Program, project TISIPRO, ID: P_40_416/105736, 2016–2021, and with funds from the Ministry of Research and Innovation in Romania as part of the NUCLEU program, PN 19 38 01 03.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tricarico, T.; Gontijo, G.; Neves, M.; Soares, M.; Aredes, M.; Guerrero, J.M. Control Design, Stability Analysis and Experimental Validation of New Application of an Interleaved Converter Operating as a Power Interface in Hybrid Microgrids. Energies 2019, 12, 437.
2. Petersen, L.; Iov, F.; Tarnowski, G.C. A Model-Based Design Approach for Stability Assessment, Control Tuning and Verification in Off-Grid Hybrid Power Plants. Energies 2020, 13, 49.
3. Veerashekar, K.; Askan, H.; Luther, M. Qualitative and Quantitative Transient Stability Assessment of Stand-Alone Hybrid Microgrids in a Cluster Environment. Energies 2020, 13, 1286.
4. Zhao, F.; Yuan, J.; Wang, N.; Zhang, Z.; Wen, H. Secure Load Frequency Control of Smart Grids under Deception Attack: A Piecewise Delay Approach. Energies 2019, 12, 2266.
5. Montoya, O.D.; Gil-González, W.; Rivas-Trujillo, E. Optimal Location-Reallocation of Battery Energy Storage Systems in DC Microgrids. Energies 2020, 13, 2289.
6. Alshehri, J.; Khalid, M.; Alzahrani, A. An Intelligent Battery Energy Storage-Based Controller for Power Quality Improvement in Microgrids. Energies 2019, 12, 2112.
7. Estévez-Bén, A.A.; Alvarez-Diazcomas, A.; Rodríguez-Reséndiz, J. Transformerless Multilevel Voltage-Source Inverter Topology Comparative Study for PV Systems. Energies 2020, 13, 3261.
8. Yan, X.; Cui, Y.; Cui, S. Control Method of Parallel Inverters with Self-Synchronizing Characteristics in Distributed Microgrid. Energies 2019, 12, 3871.
9. Coppola, M.; Guerriero, P.; Dannier, A.; Daliento, S.; Lauria, D.; Del Pizzo, A. Control of a Fault-Tolerant Photovoltaic Energy Converter in Island Operation. Energies 2020, 13, 3201.
10. Khan, K.; Kamal, A.; Basit, A.; Ahmad, T.; Ali, H.; Ali, A. Economic Load Dispatch of a Grid-Tied DC Microgrid Using the Interior Search Algorithm. Energies 2019, 12, 634.
11. Cook, M.D.; Trinklein, E.H.; Parker, G.G.; Robinett, R.D., III; Weaver, W.W. Optimal and Decentralized Control Strategies for Inverter-Based AC Microgrids. Energies 2019, 12, 3529.
12. Oviedo Cepeda, J.C.; Osma-Pinto, G.; Roche, R.; Duarte, C.; Solano, J.; Hissel, D. Design of a Methodology to Evaluate the Impact of Demand-Side Management in the Planning of Isolated/Islanded Microgrids. Energies 2020, 13, 3459.
13. Stadler, M.; Pecenak, Z.; Mathiesen, P.; Fahy, K.; Kleissl, J. Performance Comparison between Two Established Microgrid Planning MILP Methodologies Tested On 13 Microgrid Projects. Energies 2020, 13, 4460.
14. Artale, G.; Caravello, G.; Cataliotti, A.; Cosentino, V.; Di Cara, D.; Guaiana, S.; Nguyen Quang, N.; Palmeri, M.; Panzavecchia, N.; Tinè, G. A Virtual Tool for Load Flow Analysis in a Micro-Grid. Energies 2020, 13, 3173.
15. MathWorks—Detailed Model of a 100-kW Grid-Connected PV Array. Available online: https://nl.mathworks.com/help/physmod/sps/ug/detailed-model-of-a-100-kw-grid-connected-pv-array.html;jsessionid=29903e2e045151ffb3e27a4920e1 (accessed on 4 November 2020).
16. Hong, W.; Tao, G. An Adaptive Control Scheme for Three-phase Grid-Connected Inverters in Photovoltaic Power Generation Systems. In Proceedings of the Annual American Control Conference (ACC), Milwaukee, WI, USA, 27–29 June 2018; pp. 899–904.
17. Naderi, M.; Khayat, Y.; Bevrani, H. Robust Multivariable Microgrid Control Synthesis and Analysis. Energy Procedia 2016, 100, 375–387.
18. Hua, H.; Qin, Y.; Xu, H.; Hao, C.; Cao, J. Robust Control Method for DC Microgrids and Energy Routers to Improve Voltage Stability in Energy Internet. Energies 2019, 12, 1622.
19. Villalón, A.; Rivera, M.; Salgueiro, Y.; Muñoz, J.; Dragičević, T.; Blaabjerg, F. Predictive Control for Microgrid Applications: A Review Study. Energies 2020, 13, 2454.
20. Zeb, K.; Islam, S.U.; Din, W.U.; Khan, I.; Ishfaq, M.; Busarello, T.D.C.; Ahmad, I.; Kim, H.J. Design of Fuzzy-PI and Fuzzy-Sliding Mode Controllers for Single-Phase Two-Stages Grid-Connected Transformerless Photovoltaic Inverter. Electronics 2019, 8, 520.
21. Kamal, T.; Karabacak, M.; Perić, V.S.; Hassan, S.Z.; Fernández-Ramírez, L.M. Novel Improved Adaptive Neuro-Fuzzy Control of Inverter and Supervisory Energy Management System of a Microgrid. Energies 2020, 13, 4721.
22. Song, L.; Huang, L.; Long, B.; Li, F. A Genetic-Algorithm-Based DC Current Minimization Scheme for Transformless Grid-Connected Photovoltaic Inverters. Energies 2020, 13, 746.
23. Yoshida, Y.; Farzaneh, H. Optimal Design of a Stand-Alone Residential Hybrid Microgrid System for Enhancing Renewable Energy Deployment in Japan. Energies 2020, 13, 1737.
24. Younesi, A.; Shayeghi, H.; Siano, P. Assessing the Use of Reinforcement Learning for Integrated Voltage/Frequency Control in AC Microgrids. Energies 2020, 13, 1250.
25. Serra, F.M.; Fernández, L.M.; Montoya, O.D.; Gil-González, W.; Hernández, J.C. Nonlinear Voltage Control for Three-Phase DC-AC Converters in Hybrid Systems: An Application of the PI-PBC Method. Electronics 2020, 9, 847.
26. Wu, B.; Zhou, X.; Ma, Y. Bus Voltage Control of DC Distribution Network Based on Sliding Mode Active Disturbance Rejection Control Strategy. Energies 2020, 13, 1358.
27. Qian, J.; Li, K.; Wu, H.; Yang, J.; Li, X. Synergetic Control of Grid-Connected Photovoltaic Systems. Int. J. Photoenergy 2017, 2017, 1–11.
28. Brandimarte, P. Approximate Dynamic Programming and Reinforcement Learning for Continuous States. In From Shortest Paths to Reinforcement Learning: A MATLAB-Based Tutorial on Dynamic Programming; Springer Nature: Cham, Switzerland, 2021; pp. 185–204.
29. Beale, M.; Hagan, M.; Demuth, H. Deep Learning Toolbox™ Getting Started Guide, 14th ed.; MathWorks, Inc.: Natick, MA, USA, 2020.
30. MathWorks—Reinforcement Learning Toolbox™ User’s Guide. Available online: https://www.mathworks.com/help/reinforcement-learning/getting-started-with-reinforcement-learning-toolbox.html?s_tid=CRUX_lftnav (accessed on 4 November 2020).
31. de Brito, M.A.G.; Sampaio, L.P.; Luigi, G.; e Melo, G.A.; Canesin, C.A. Comparative analysis of MPPT techniques for PV applications. In Proceedings of the International Conference on Clean Electrical Power (ICCEP), Ischia, Italy, 14–16 June 2011; pp. 99–104.
32. Nicola, M.; Nicola, C.-I. Sensorless Fractional Order Control of PMSM Based on Synergetic and Sliding Mode Controllers. Electronics 2020, 9, 1494.
33. Nicola, M.; Nicola, C.-I. Fractional-Order Control of Grid-Connected Photovoltaic System Based on Synergetic and Sliding Mode Controllers. Energies 2021, 14, 510.

Figure 1. The schematic block of the main circuit for the GC-PV system.

Figure 2. The schematic diagram of the cascade control system for the GC-PV system.

Figure 3. The schematic diagram of RL-based process control.

Figure 4. The schematic diagram of the control system for the GC-PV system based on PI-type controllers.

Figure 5. The MATLAB Simulink model of the control system for the GC-PV array based on PI-type controllers using an RL-TD3 agent for the correction of the control signals.

Figure 6. An example of the MATLAB program code for creating the DNN.

Figure 7. The graphic representation of the created DNN.

Figure 8. The MATLAB Simulink model of the control system for the GC-PV array based on PI controllers using the RL-TD3 agent for the correction of the i_dref command.

Figure 9. The MATLAB Simulink subsystem of the RL-TD3 agent for the correction of the i_dref command.

Figure 10. The training stage of the RL-TD3 agent for the correction of the i_dref command.

Figure 11. The MATLAB Simulink model of the control system for the GC-PV array based on PI controllers using the RL-TD3 agent for the correction of the u_dref and u_qref commands.

Figure 12. The MATLAB Simulink subsystem of the RL-TD3 agent for the correction of the u_dref and u_qref commands.

Figure 13. The training stage of the RL-TD3 agent for the correction of the u_dref and u_qref commands.

Figure 14. The MATLAB Simulink model of the control system for the GC-PV array based on PI controllers using the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands.

Figure 15. The MATLAB Simulink subsystem of the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands.

Figure 16. The training stage of the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands.

Figure 17. The schematic diagram of the control system for the GC-PV array based on the SMC and SYN controllers.

Figure 18. The MATLAB Simulink subsystem implementation of the SMC controller.

Figure 19. The MATLAB Simulink subsystem implementation of the SYN controller.

Figure 20. The MATLAB Simulink model of the control system for the GC-PV array based on SMC and SYN controllers using the RL-TD3 agent for the correction of the control signals.

Figure 21. The block diagram of the MATLAB Simulink subsystem implementation of the control system for the GC-PV array based on SMC and SYN using the RL-TD3 agent for the correction of the i_dref command.

Figure 22. The training stage of the RL-TD3 agent for the correction of the i_dref command.

Figure 23. The MATLAB Simulink subsystem block implementation of the control system for the GC-PV array based on SMC and SYN and using the RL-TD3 agent for the correction of the u_dref and u_qref command signals.

Figure 24. The training stage of the RL-TD3 agent for the correction of the u_dref and u_qref commands.

Figure 25. The MATLAB Simulink subsystem block implementation of the control system for the GC-PV array based on SMC and SYN and using the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands.

Figure 26. The training stage of the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands.

Figure 27. The time evolution of irradiance and temperature (signal evolution of the type 1 PV array).

Figure 28. The time evolution of the u_dc voltage for the irradiance and temperature using PI-type controllers (signal evolution of the type 1 PV array).

Figure 29. The time evolution of the i_d and i_q currents (signal evolution of the type 1 PV array).

Figure 30. The time evolutions of P_mean and U_mean, the duty cycle of the DC-DC converter and the modulation index of the DC–AC converter (signal evolution of the type 1 PV array).

Figure 31. The time evolution of the u_a voltage and i_a current of the main grid (signal evolution of the type 1 PV array).

Figure 32. The time evolution of the power flow P between the PV and the main grid (signal evolution of the type 1 PV array).

Figure 33. The time evolution of the u_dc voltage for a step variation of u_dcref from 500 V to 550 V using PI controllers (signal evolution of the type 1 PV array).

Figure 34. The time evolution of the u_dc voltage for a step variation of the u_dcref reference voltage from 500 V to 550 V using PI controllers and the RL-TD3 agent for the correction of the i_dref command (signal evolution of the type 1 PV array).

Figure 35. The time evolution of the u_dc voltage for a step variation of the u_dcref reference voltage from 500 V to 550 V using PI controllers and the RL-TD3 agent for the correction of the u_dref and u_qref commands (signal evolution of the type 1 PV array).

Figure 36. The time evolution of the u_dc voltage for a step variation of the u_dcref reference voltage from 500 V to 550 V using PI controllers and the RL-TD3 agent for the correction of the i_dref, u_dref and u_qref commands (signal evolution of the type 1 PV array).

Figure 37. The comparison of voltage u_dc for a step variation of the u_dcref reference voltage from 500 V to 550 V using PI controllers and the RL-TD3 agent for the correction of the outer and inner loop commands (signal evolution of the type 1 PV array).

Figure 38. The time evolution of irradiance and temperature (signal evolution of the type 2 PV array).

Figure 39. The time evolution of the u_dc voltage for irradiance and temperature using SMC- and SYN-type controllers for a 10 kvar load (signal evolution of the type 2 PV array).

Figure 40. The time evolution of the u_dc voltage for irradiance and temperature using SMC- and SYN-type controllers for a 13 kvar load (signal evolution of the type 2 PV array).

Figure 41. The time evolution of the u_dc voltage for irradiance and temperature using SMC- and SYN-type controllers for a 7 kvar load (signal evolution of the type 2 PV array).

Figure 42. The time evolution of the i_d and i_q currents (signal evolution of the type 2 PV array).

Figure 43. The time evolutions of the P_mean and U_mean of the PV, the duty cycle of the DC–DC converter and the modulation index of the DC–AC converter (signal evolution of the type 2 PV array).

Figure 44. The time evolution of the u_a voltage and i_a current of the main grid (signal evolution of the type 2 PV array).

Figure 45. The time evolution of the power flow P between the PV and the main grid (signal evolution of the type 2 PV array).

Figure 46. The time evolution of voltage u_dc for a step variation of the u_dcref reference voltage from 500 V to 550 V using SMC- and SYN-type controllers (signal evolution of the type 2 PV array).

Figure 47. The time evolution of voltage u_dc for a step variation of the u_dcref reference voltage from 500 V to 550 V using SMC- and SYN-type controllers and the RL-TD3 agent for the correction of the i_dref command (signal evolution of the type 2 PV array).

Figure 48. The time evolution of voltage u_dc for a step variation of the u_dcref reference voltage from 500 V to 550 V using SMC- and SYN-type controllers and the RL-TD3 agent for the correction of the u_dref and u_qref commands (signal evolution of the type 2 PV array).

Figure 49. The time evolution of voltage u_dc for a step variation of the u_dcref reference voltage from 500 V to 550 V using SMC- and SYN-type controllers and the RL-TD3 agent for the correction of the i_dref, u_dref and u_qref commands (signal evolution of the type 2 PV array).

Figure 50. The comparison of voltage u_dc for a step variation of the u_dcref reference voltage from 500 V to 550 V using SMC- and SYN-type controllers and the RL-TD3 agent for the outer and inner loop corrections (signal evolution of the type 2 PV array).

Table 1. The performances of the GC-PV array control system based on PI-type controllers using the RL-TD3 agent.

Controllers for the GC-PV Array	Response Time (ms)	Voltage Ripple (V)	Overshooting (%)	Steady-State Error (%)
PI	40.4	57.87	<0.5	0.2
PI using the RL-TD3 agent for the correction of the i_dref command	37.1	57.23	<0.5	0.2
PI using the RL-TD3 agent for the correction of the u_dref and u_qref commands	35.9	56.67	<0.5	0.2
PI using the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands	33.8	56.22	<0.5	0.2

Table 2. The performances of the GC-PV control systems based on SMC and SYN controllers using the RL-TD3 agent.

Controllers for the GC-PV Array	Response Time (ms)	Voltage Ripple (V)	Overshooting (%)	Steady-State Error (%)
SMC and SYN	14.1	55.63	<0.2	0.02
SMC and SYN using the RL-TD3 agent for the correction of the i_dref command	13.7	55.12	<0.2	0.02
SMC and SYN using the RL-TD3 agent for the correction of the u_dref and u_qref commands	13.5	54.58	<0.2	0.02
SMC and SYN using the RL-TD3 agent for the correction of the u_dref, u_qref and i_dref commands	13.2	54.03	<0.2	0.02

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
