Maximum Power Point Tracker Controller for Solar Photovoltaic Based on Reinforcement Learning Agent with a Digital Twin

Artetxe, Eneko; Uralde, Jokin; Barambones, Oscar; Calvo, Isidro; Martin, Imanol

doi:10.3390/math11092166


by Eneko Artetxe *, Jokin Uralde, Oscar Barambones *, Isidro Calvo and Imanol Martin

Department of Systems Engineering and Automatic Control, Faculty of Engineering of Vitoria-Gasteiz, University of the Basque Country (UPV/EHU), 01006 Vitoria-Gasteiz, Spain

* Authors to whom correspondence should be addressed.

Mathematics 2023, 11(9), 2166; https://doi.org/10.3390/math11092166

Submission received: 31 March 2023 / Revised: 24 April 2023 / Accepted: 3 May 2023 / Published: 5 May 2023

(This article belongs to the Special Issue Advanced Control Theory with Applications)


Abstract

Photovoltaic (PV) energy, as a renewable energy source, plays a key role in reducing greenhouse gas emissions and achieving a sustainable energy generation mix. To maximize solar energy harvest, PV power systems require Maximum Power Point Tracking (MPPT). Traditional MPPT controllers, such as Perturb and Observe (P&O), are easy to implement, but they are inherently slow and oscillate around the MPP, losing efficiency. This work presents a reinforcement learning (RL)-based control to increase the speed and efficiency of the controller. Deep Deterministic Policy Gradient (DDPG), the selected RL algorithm, works with continuous action and state spaces to achieve a stable output at the MPP. A Digital Twin (DT) enables training in simulation, which accelerates the process and allows it to proceed independently of weather conditions. In addition, we use the maximum power achieved in the DT to adjust the reward function, making training more efficient. The RL control is compared with a traditional P&O controller to validate the increase in speed and efficiency, both in simulation and in real implementations. The results show a 10.45% improvement in total power output and a settling time 24.54 times faster in simulation. Moreover, in real-time tests, the DDPG obtains a 51.45% improvement in total power output and a settling time of 0.25 s, compared with 4.26 s for the P&O.

Keywords:

solar PV; maximum power point tracking (MPPT); reinforcement learning (RL); deep deterministic policy gradient (DDPG); digital twin (DT)

MSC:

93C40; 93C55; 90C29

1. Introduction

Electrical energy is vital for the survival and development of humans. Global energy demand continues to grow year by year and, according to IEA estimates adjusted after the COVID-19 pandemic, is expected to reach 170% of 1990 consumption by 2030 [1,2]. Historically, the energy mix has relied heavily on fossil fuels, which are major sources of CO₂ and other greenhouse gases that drive climate change [3]. The lack of profitable fossil-fuel fields, combined with growing awareness of environmental concerns, led to the need for a common framework, endorsed in the Paris Agreement, which sets a limit of 2 °C on the global temperature increase [4].

Renewable energies play a key role in the reduction of greenhouse gas emissions and the achievement of a sustainable mix of energy generation [5]. Solar energy can be harvested through a Photovoltaic (PV) panel [6], and turned into electrical energy. PV panels are an arrangement of doped solar cells that generate an electron flow when the photons of solar light hit the surface [7]. The applications of solar energy extend beyond electricity generation for networks, also providing energy for isolated facilities in remote locations without access to electrical networks, or in the aerospace industry [8]. In comparison with other clean energies with higher efficiency, PV energy offers major advantages such as avoidance of moving parts, minimal maintenance, long life span, silent energy production, and lower installation costs [9].

One of the challenges of solar energy is maximising the power output across all working conditions. In PV systems, DC/DC converters allow the operating point of the panel to be manipulated in order to reach the maximum power point (MPP) [10,11]. The MPP is the operating point at which the system works at its best efficiency. Because the MPP changes with environmental conditions, converters can be controlled by different tracking algorithms, forming an MPP tracker (MPPT) [12,13,14].

Digital twins (DTs) are digital representations of a real-world product, system, or process that serve as its digital counterpart for practical purposes such as simulation, integration, testing, monitoring, and maintenance [15]. DTs of photovoltaic systems can model the global MPP under different environmental conditions [16,17]. Integrating this technique into the control loop can accelerate the training of an MPPT controller.

MPPT techniques can be categorized into conventional and artificial-intelligence-based approaches. The most widely used ones, perturb and observe (P&O) and incremental conductance, fall into the conventional category and, more specifically, into hill-climbing algorithms (HCAs). These algorithms determine whether the operating point is on the left or right side of the power curve and repeatedly move one step towards the MPP [18]. P&O is a low-cost and reliable MPPT algorithm. It calculates the power from current and voltage readings. At every step, the next fixed-size move is determined by whether the power increased or decreased and by the direction of the previous move [19,20]. P&O requires very little computational power but presents two main disadvantages: (1) once it reaches the MPP, it oscillates around that point, thus losing efficiency, and (2) under partial shading conditions, it can converge to a local maximum instead of the global one [21,22].

One way to address these issues is to reduce the step size, but this makes the algorithm track the MPP more slowly [23]. Modified and improved P&O algorithms have been developed recently. For example, Numan et al. [24] proposed a variable-step-size P&O controller with incremental calculations, achieving a stable controller that improves on the classical one. A similar approach was used by Alangammal and Rathina [25], where a scaling factor dependent on the power change was used to calculate the step size. Optimisation algorithms have also been studied to find appropriate step sizes; Mendez et al. [26], for example, compared particle swarm optimisation (PSO) and the earthquake optimisation algorithm (EOA), achieving higher energy performance. Nevertheless, optimisation algorithms require high computational resources to solve the optimisation problem [27]. Incremental conductance (IncCond) algorithms track changes in voltage and current instead of power. This makes IncCond algorithms faster and more precise, but increases the computational cost. IncCond algorithms suffer from the same oscillation around the MPP. Other alternatives have been proposed to address this problem; for example, Li et al. [28] developed a variable-step IncCond algorithm that uses an adjustment coefficient for the step size, achieving fast response and better steady-state performance.
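The incremental conductance rule follows from $dP/dV = I + V\,dI/dV$: the MPP satisfies $dI/dV = -I/V$, the operating point lies to the left of the MPP when $dI/dV > -I/V$, and to the right when $dI/dV < -I/V$. The minimal fixed-step sketch below assumes a controller that adjusts a PV voltage reference directly; the function name, step size `dv` and tolerance `tol` are hypothetical, and it illustrates the classical rule rather than the variable-step method of [28].

```python
def incond_step(v, i, v_prev, i_prev, v_ref, dv=0.5, tol=1e-3):
    """One fixed-step incremental-conductance update of a PV voltage reference.

    dP/dV = I + V*dI/dV vanishes at the MPP, i.e. dI/dV = -I/V.
    Returns the new voltage reference.
    """
    delta_v = v - v_prev
    delta_i = i - i_prev
    if delta_v == 0.0:
        # Voltage unchanged: a current change means the irradiance changed.
        if abs(delta_i) < tol:
            return v_ref
        return v_ref + dv if delta_i > 0 else v_ref - dv
    # dI/dV + I/V has the same sign as dP/dV for V > 0.
    slope = delta_i / delta_v + i / v
    if abs(slope) < tol:
        return v_ref                                  # at (or very near) the MPP
    return v_ref + dv if slope > 0 else v_ref - dv    # left of MPP: raise V


# Example: a sample left of the MPP (power still rising with voltage)
print(incond_step(v=20.0, i=8.0, v_prev=19.0, i_prev=8.05, v_ref=20.0))  # prints 20.5
```

Comparing conductances rather than powers is what lets IncCond detect that it sits at the MPP and hold the reference there, at the cost of a division per step.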

Newer techniques have also been studied for MPPT. Fuzzy logic controllers (FLC) [29] and controllers based on artificial neural networks (ANN) can be found in the literature. Al-Giziz et al. [30] compared FLC with classical methods, showing that FLC achieves a faster response and greater steady-state stability than the P&O and IncCond methods. Roy et al. [31] compared the Levenberg–Marquardt, Bayesian Regularization and Scaled Conjugate Gradient training algorithms for ANN-based MPPT energy harvesting. The results showed that the Levenberg–Marquardt algorithm performs best in overall data processing.

Reinforcement learning (RL) is gaining popularity for MPPT controllers due to its robustness to environmental changes and its self-learning capabilities [32]. Several types of RL agents have been discussed in the literature. Chou et al. [33], for example, compared a Q-table (RL-QT) and a Q-network (RL-QN) method; RL-QT showed smaller oscillations, while RL-QN achieved a higher average power. Singh et al. [34] addressed the problem of a continuous space by introducing a fuzzified reward function.

Other authors, such as Phant et al. [35], discussed the applicability of a DDPG agent, the type used in this work, comparing it with both a DQN agent and a P&O. A DQN agent can only deal with a discrete action space, while a DDPG agent handles a continuous action space, which makes it more applicable to control tasks. Nevertheless, in that work, the controllers operate in the manner of a P&O, increasing or decreasing the stored value each time it is sampled until the MPP is reached. In this setting, where the controllers do not obtain the optimal duty cycle instantly as the DDPG proposed in this work does, the DQN exhibits higher performance (up to 2.5% over the DDPG) because its training with discrete actions requires fewer iterations and is better optimized. However, the DDPG performs better under partial shading conditions (PSC): it extracts more power than the DQN-based method, has the highest tracking speed, and is the most efficient. Under these conditions, the efficiency of the DDPG method increases by 44.6%, while that of the DQN method increases by only about 38.3%.

Nicola et al. [36] also used the DDPG agent and its variant, the Twin-Delayed DDPG (TD3), to improve the control process of PI, Sliding Mode Control (SMC) and Synergetic (SYN) controllers. In that case, the RL agent is not used directly as a controller; instead, it improves the performance of the other controllers by providing references for the control signals.

Building on the state of the art, this work presents a controller based on a DDPG agent whose training incorporates a DT of the solar panel and DC/DC converter. With this combination, the DDPG agent directly provides the duty cycle that achieves the MPP, eliminating steady-state oscillations and reaching high efficiency. This control design is motivated by the following aspects:

Continuous action space. Compared with other RL agents such as DQN, the continuous action space handled by the DDPG agent is better suited to control tasks, providing the precise control signal needed to achieve the MPP for each combination of temperature and irradiance.
Instantaneous control action. Unlike in [35], after training, the DDPG agent instantly provides the optimal duty cycle to obtain the MPP, so the time needed to reach the MPP is limited only by the response of the system.
Direct control. Here, the DDPG agent alone provides the duty cycle to the converter; it neither relies on other controllers nor assists them as in [36]. This lowers the computational cost and simplifies the system.
Training simplicity. Compared with an ANN or other machine learning model working as an MPPT, an RL agent learns the correct control signal during training for a range of irradiance and temperature values. A supervised machine learning model or ANN needs to know in advance the optimal duty cycle for each combination of irradiance and temperature, which requires a lengthy data-collection process using scanning or other controllers, in addition to the time needed to train the network afterwards. An RL agent eliminates this initial process at the cost of greater design difficulty.
DT for the training of an RL agent. Using the DT as part of the reward function of the DDPG training accelerates the training process.

In this way, the MPP is reached as fast as the system allows, at low computational cost, across all the irradiance and temperature values considered. To validate the efficiency of this control scheme, this paper compares the RL DDPG algorithm for MPPT control against a traditional P&O controller, both in simulation and in a real implementation.

The rest of this article is arranged as follows. Section 2 defines the materials and methods used, such as the MPP, the equations of the DT, the RL algorithm used and the P&O implementation. It also includes a description of the hardware involved in the PV implementation. Section 3 provides the results and discussion of both the simulations and the real implementation. Finally, concluding remarks and future work are presented in Section 4.

2. Materials and Methods

2.1. Digital Twin (PV Model)

The use of the DT makes it possible to use simulation in the controller training process. By characterising the system, we obtain a model that resembles the characteristics of the real system. Therefore, it is especially useful to use this resource to conduct simulations of the system and thus accelerate the learning process. In addition, this DT allows us to abstract from the weather conditions and simulate any state, making the control more robust. Another benefit of the DT is the possibility of using the maximum power that we achieve in it to complement the reward function of the agent and thus optimise the learning process. Additionally, the use of the DT reduces the real training time by 40.76% due to the higher speed of the simulations compared to the real system. A block diagram showing the implementation of the training process is given in Section 2.3.

To characterize the photovoltaic panel, a single-diode model [37], shown in Figure 1, was used for its simplicity and accuracy. This model treats the PV panel as a DC source in an electrical circuit. The current source provides the current $I_{ph}$ generated by solar irradiance. The model also includes two resistances: $R_{sh}$, the shunt resistance, which accounts for the leakage current of the diode at the p-n junction, and $R_{s}$, the series resistance of the PV panel.

The single-diode model [37] follows Kirchhoff's current law. The output current of the panel, $I_{PV}$, is defined and developed in Equations (1) and (2):

$$I_{PV} = I_{ph} - I_{d} - I_{sh} \tag{1}$$

$$I_{PV} = I_{ph} - I_{0}\left[\exp\left(\frac{q\,(V_{PV} + R_{s} I_{PV})}{\alpha K T_{c}}\right) - 1\right] - \frac{V_{PV} + R_{s} I_{PV}}{R_{sh}} \tag{2}$$

In Equation (1), $I_{ph}$ represents the light-generated current in the cell, $I_{d}$ the voltage-dependent current lost to recombination, and $I_{sh}$ the current lost through the shunt resistance. In Equation (2), $I_{0}$ denotes the reverse saturation current, $V_{PV}$ the PV output voltage, $R_{s}$ the series resistance, $\alpha$ the diode ideality factor, $K$ the Boltzmann constant, $q$ the elementary charge, and $T_{c}$ the operating temperature. The current generated by solar irradiance, $I_{ph}$, is expressed in Equation (3):

$$I_{ph} = \frac{G}{G_{ref}}\left[I_{sc,ref} + K_{I,ref}\,(T - T_{ref})\right] \tag{3}$$

where $G$ and $G_{ref}$ are the effective and reference irradiances, respectively, $I_{sc,ref}$ is the short-circuit current at reference conditions, $T$ is the PV temperature and $T_{ref}$ the reference temperature, and $K_{I,ref}$ is the thermal coefficient of the short-circuit current. Equations (4)–(6) give the output current and voltage, taking into account the number of modules in parallel and in series:

$$I_{m} = N_{p} I_{PV} \tag{4}$$

$$V_{m} = N_{s} V_{PV} \tag{5}$$

$$I_{m} = N_{p} I_{ph} - N_{p} I_{0}\left[\exp\left(\frac{q\,(V_{PV} + R_{s} I_{PV})}{\alpha K T_{c}}\right) - 1\right] - N_{p}\,\frac{V_{PV} + R_{s} I_{PV}}{R_{sh}} \tag{6}$$

Here, $N_{p}$, $N_{s}$, $I_{m}$ and $V_{m}$ are the number of modules in parallel, the number of modules in series, the output current and the output voltage, respectively.
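Because $I_{PV}$ appears on both sides of Equation (2), the current at a given voltage has to be found numerically. The Python sketch below does this with Newton's method and also evaluates Equation (3) and the series/parallel scaling of Equations (4)–(6). It is an illustration only: the function names are hypothetical, temperatures are in kelvin, and the cell parameters in the example are made-up single-cell values, not the Peimar SG340P data of Table 2.

```python
import math

Q = 1.602176634e-19  # elementary charge q [C]
K = 1.380649e-23     # Boltzmann constant K [J/K]


def photo_current(G, T, isc_ref, ki_ref, G_ref=1000.0, T_ref=298.15):
    """Equation (3): light-generated current I_ph [A]; temperatures in kelvin."""
    return (G / G_ref) * (isc_ref + ki_ref * (T - T_ref))


def pv_current(V, I_ph, I_0, R_s, R_sh, alpha, T_c, tol=1e-12, max_iter=100):
    """Solve the implicit Equation (2) for I_PV at voltage V with Newton's method."""
    v_t = alpha * K * T_c / Q  # alpha*K*T_c/q
    # For V >= 0 the residual at I = I_ph is <= 0; since the residual is
    # decreasing and concave in I, Newton then approaches the root from one side.
    I = I_ph
    for _ in range(max_iter):
        e = math.exp((V + R_s * I) / v_t)
        f = I_ph - I_0 * (e - 1.0) - (V + R_s * I) / R_sh - I  # residual of Eq. (2)
        df = -I_0 * e * R_s / v_t - R_s / R_sh - 1.0           # always negative
        step = f / df
        I -= step
        if abs(step) < tol:
            break
    return I


def array_current(V_m, N_s, N_p, **cell):
    """Equations (4)-(6): V_PV = V_m / N_s and I_m = N_p * I_PV."""
    return N_p * pv_current(V_m / N_s, **cell)


# Hypothetical single-cell parameters (not the SG340P values of Table 2)
cell = dict(I_0=1e-9, R_s=0.005, R_sh=10.0, alpha=1.3, T_c=308.15)
i_ph = photo_current(800.0, 308.15, isc_ref=8.0, ki_ref=0.004)
print(pv_current(0.6, i_ph, **cell))
```

Sweeping `V` from 0 to the open-circuit voltage with `pv_current` traces the I-V curve of Figure 2; a DT built on this model can then report the maximum power for any $(G, T)$ pair.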

2.2. Maximum Power Point (MPP)

The current $I_{PV}$ and voltage $V_{PV}$ define the power generated by a solar panel. Panel performance can be described by plotting current against voltage in an I-V curve that shows all the operating points from short circuit ($I_{sc}$) to open circuit ($V_{oc}$), as shown in Figure 2. The MPP is the unique point on the curve that maximizes the power generated by the panel. The current and voltage at this point are named $I_{mp}$ and $V_{mp}$, respectively. When a load is connected to the PV panel, the operating point depends on the resistance of the load. If the load resistance differs from $R_{mp} = V_{mp}/I_{mp}$, the panel does not produce the maximum power.
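The MPP and $R_{mp}$ can be located numerically by sampling the I-V curve and taking the maximum of $P = VI$. To keep the sketch short and self-contained, it assumes the ideal form of Equation (2) with $R_{s} = 0$ and $R_{sh} \to \infty$, for which $I_{PV}$ is explicit in $V_{PV}$; the function names and parameter values are hypothetical.

```python
import math


def ideal_pv_current(V, I_ph, I_0, v_t):
    """Equation (2) with R_s = 0 and R_sh -> infinity (explicit in V)."""
    return I_ph - I_0 * (math.exp(V / v_t) - 1.0)


def find_mpp(I_ph, I_0, v_t, n=10_000):
    """Grid search for (V_mp, I_mp, P_mp, R_mp) between short and open circuit."""
    v_oc = v_t * math.log(I_ph / I_0 + 1.0)  # voltage where the current reaches zero
    best_v, best_i, best_p = 0.0, I_ph, 0.0  # short-circuit point, P = 0
    for k in range(1, n + 1):
        V = v_oc * k / n
        I = ideal_pv_current(V, I_ph, I_0, v_t)
        if V * I > best_p:
            best_v, best_i, best_p = V, I, V * I
    return best_v, best_i, best_p, best_v / best_i  # R_mp = V_mp / I_mp


# Hypothetical cell: I_ph = 8 A, I_0 = 1 nA, alpha*K*T_c/q = 33.5 mV
v_mp, i_mp, p_mp, r_mp = find_mpp(8.0, 1e-9, 0.0335)
print(f"V_mp={v_mp:.3f} V, I_mp={i_mp:.3f} A, P_mp={p_mp:.3f} W, R_mp={r_mp:.4f} ohm")
```

A grid search is used here for transparency; any load whose resistance differs from the returned `r_mp` moves the operating point away from `p_mp`.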

Moreover, the MPP also changes with irradiance and temperature [38]. Figure 3 and Figure 4 show the I-V and P-V curves for changes in irradiance and temperature, respectively. As irradiance increases, the curves shift almost vertically upwards; conversely, as temperature increases, the I-V curves shrink horizontally. These changes in the I-V curves shift the location of the MPP, so most of the time a panel operates away from the MPP. This can be corrected with an MPPT: a DC/DC converter is connected between the panel and the load, and changing the converter duty cycle ($d$) changes the operating point. With proper control, the duty cycle is manipulated to reach the MPP.
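The link between duty cycle and operating point can be made concrete under one simplifying assumption: an ideal, lossless boost converter in CCM feeding a resistive load $R_{load}$. Then $V_{out} = V_{in}/(1-d)$ and $I_{out} = (1-d)\,I_{in}$, so the panel sees $R_{in} = (1-d)^{2} R_{load}$, and the duty cycle that places the panel at the MPP is $d = 1 - \sqrt{R_{mp}/R_{load}}$. The sketch below (hypothetical function names) applies this relation and saturates $d$ to the [0.1, 0.9] range used for the agent's actions; converter losses in real hardware shift this value.

```python
import math


def boost_input_resistance(d, r_load):
    """Resistance seen by the panel through an ideal CCM boost converter."""
    return (1.0 - d) ** 2 * r_load


def duty_for_mpp(r_mp, r_load, d_min=0.1, d_max=0.9):
    """Duty cycle that makes the panel see R_mp, saturated to [d_min, d_max].

    An ideal boost converter can only lower the apparent resistance
    (R_in <= R_load), so if R_mp >= R_load the result saturates at d_min.
    """
    if r_mp >= r_load:
        return d_min
    d = 1.0 - math.sqrt(r_mp / r_load)
    return min(max(d, d_min), d_max)


# Hypothetical values: R_mp = 4 ohm at the panel, 25 ohm load
d = duty_for_mpp(4.0, 25.0)
print(d, boost_input_resistance(d, 25.0))
```

Because $R_{mp}$ itself moves with irradiance and temperature, the required $d$ moves too, which is exactly what an MPPT controller has to track.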

2.3. Reinforcement Learning

Reinforcement learning (RL) is a subset of machine learning in which the agent learns a desired behaviour through trial and error while interacting with its environment [39]. The agent receives feedback in the form of penalties or rewards for each action it takes and tries to learn a policy that maximizes the cumulative reward over time. The key components of RL are the agent, the environment, and the reward function. The agent takes actions in the environment, and the environment responds with a new state and a reward, which the agent uses to adjust its policy parameters.

The control variable manipulated in MPPT control is the duty cycle of the converter, which is continuous; therefore, continuous action and state spaces are more suitable. Among the algorithms that handle continuous action and state spaces, Deep Deterministic Policy Gradient (DDPG) is one of the most common. DDPG is a reinforcement learning algorithm that combines ideas from deep learning and policy gradient methods [40]. It is particularly well suited to continuous control problems, where the action space is continuous and possibly high-dimensional. DDPG is an off-policy algorithm: it learns from past experiences stored in an experience buffer, which may have been generated by earlier versions of its policy, rather than only from the agent's current interactions with the environment. This makes learning more data-efficient and more stable. The agent is composed of two neural networks, the actor and the critic, as shown in Figure 5. The actor network takes the current state and outputs a continuous action, while the critic network takes the current state and action and outputs a Q-value, which estimates the expected cumulative reward from that state–action pair.

The algorithm first initializes the critic $Q(S, A; \phi)$ with random parameter values $\phi$, and initializes the target critic $Q_{t}(S, A; \phi_{t})$ with the same parameters, $\phi_{t} = \phi$. The same is done for the actor $\pi(S; \theta)$, with random parameter values $\theta$, and the target actor $\pi_{t}(S; \theta_{t})$, with $\theta_{t} = \theta$. Once initialized, at every training time step the agent selects an action $A = \pi(S; \theta) + N$ for the current observation $S$, where $N$ is stochastic noise drawn from an Ornstein–Uhlenbeck noise model to enhance exploration. After executing $A$, the next state $S'$ and the reward $R$ are observed, and the experience $(S, A, R, S')$ is stored in the experience buffer. The algorithm then samples a random mini-batch of $M$ experiences $(S_{i}, A_{i}, R_{i}, S_{i}')$ from the experience buffer. If $S_{i}'$ is a terminal state, the value function target $y_{i}$ is set to $R_{i}$; otherwise, Equation (7) is used:

$$y_{i} = R_{i} + \gamma\, Q_{t}\left(S_{i}', \pi_{t}(S_{i}'; \theta_{t}); \phi_{t}\right) \tag{7}$$
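The interaction and target-computation steps described above can be sketched in Python for a scalar action (the duty cycle). This is a minimal illustration under stated assumptions, not the MATLAB RL Toolbox implementation used in this work: the Ornstein–Uhlenbeck noise is discretized with the Euler–Maruyama scheme, its parameters are placeholders rather than the values of Table 1, and the target actor and critic are stand-in callables.

```python
import math
import random
from collections import deque


class OUNoise:
    """Ornstein-Uhlenbeck noise, Euler-Maruyama step:
    x <- x + theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.1, dt=1e-2, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu
        self.rng = random.Random(seed)

    def sample(self):
        self.x += (self.theta * (self.mu - self.x) * self.dt
                   + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0))
        return self.x


def explore(actor, state, noise, a_min=0.1, a_max=0.9):
    """A = pi(S; theta) + N, saturated to the admissible duty-cycle range."""
    return min(max(actor(state) + noise.sample(), a_min), a_max)


def sample_minibatch(buffer, m, rng=random):
    """Draw M experiences (S_i, A_i, R_i, S'_i, terminal_i) uniformly at random."""
    return rng.sample(list(buffer), m)


def td_targets(batch, target_actor, target_critic, gamma=0.99):
    """Equation (7), with y_i = R_i when S'_i is terminal."""
    ys = []
    for _s, _a, r, s_next, terminal in batch:
        if terminal:
            ys.append(r)
        else:
            ys.append(r + gamma * target_critic(s_next, target_actor(s_next)))
    return ys


buffer = deque(maxlen=100_000)  # experience buffer of (S, A, R, S', terminal)
```

In the full algorithm, `target_actor` and `target_critic` would be the target networks $\pi_{t}$ and $Q_{t}$; the OU process is used because its temporally correlated noise explores more smoothly than independent samples.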

The value function target $y_{i}$ is the sum of the experience reward $R_{i}$ and the discounted future reward, where $\gamma$ is the discount factor, bounded between 0 and 1. To compute the discounted future reward, the agent first computes the next action by passing the next observation $S_{i}'$ from the sampled experience to the target actor. It then obtains the future reward estimate by passing the next observation and the next action to the target critic. Once this step is completed, the critic updates its parameters by minimizing the loss $L$ of Equation (8) across all sampled experiences:

$$L = \frac{1}{2M} \sum_{i=1}^{M} \left(y_{i} - Q(S_{i}, A_{i}; \phi)\right)^{2} \tag{8}$$
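To show what minimizing Equation (8) does, the sketch below replaces the critic network with a critic that is linear in hand-picked features $[S, A, 1]$ (a hypothetical simplification), so the gradient can be written by hand: $\partial L/\partial \phi = -\frac{1}{M}\sum_i (y_i - Q_i)\,\text{features}_i$. The mini-batch, targets and learning rate are made-up values.

```python
def features(s, a):
    """Hypothetical critic features [S, A, 1]."""
    return [s, a, 1.0]


def q_value(params, s, a):
    """Linear critic Q(S, A; phi) = phi . features(S, A)."""
    return sum(p * f for p, f in zip(params, features(s, a)))


def critic_loss(params, batch, ys):
    """Equation (8): L = 1/(2M) * sum_i (y_i - Q(S_i, A_i; phi))^2."""
    m = len(batch)
    return sum((y - q_value(params, s, a)) ** 2 for (s, a), y in zip(batch, ys)) / (2 * m)


def critic_step(params, batch, ys, lr=0.1):
    """One gradient-descent step: dL/dphi = -(1/M) * sum_i (y_i - Q_i) * features_i."""
    m = len(batch)
    grad = [0.0] * len(params)
    for (s, a), y in zip(batch, ys):
        err = y - q_value(params, s, a)
        for j, f in enumerate(features(s, a)):
            grad[j] -= err * f / m
    return [p - lr * g for p, g in zip(params, grad)]


batch = [(0.0, 0.2), (1.0, 0.5), (0.5, 0.8)]  # (S_i, A_i) pairs
ys = [1.0, 2.0, 1.5]                          # targets y_i from Equation (7)
params = [0.0, 0.0, 0.0]
for _ in range(2000):
    params = critic_step(params, batch, ys)
print(critic_loss(params, batch, ys))
```

With a deep critic the same loss is minimized by backpropagation; the $\frac{1}{2}$ factor only cancels the 2 from the squared error in the gradient.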

The actor parameters are then updated with Equations (9)–(11), following the sampled policy gradient, to maximize the expected discounted reward:

$$\nabla_{\theta} J \approx \frac{1}{M} \sum_{i=1}^{M} G_{ai}\, G_{\pi i} \tag{9}$$

$$G_{ai} = \nabla_{A} Q(S_{i}, A; \phi) \quad \text{where } A = \pi(S_{i}; \theta) \tag{10}$$

$$G_{\pi i} = \nabla_{\theta} \pi(S_{i}; \theta) \tag{11}$$

$G_{ai}$ is the gradient of the critic output with respect to the action computed by the actor network, and $G_{\pi i}$ is the gradient of the actor output with respect to the actor parameters; both gradients are evaluated for observation $S_{i}$. Here, $\nabla_{\theta}$ and $\nabla_{A}$ denote the gradients with respect to the actor parameters $\theta$ and the action $A$, respectively. Finally, the target actor and critic parameters are updated according to the target update method. With the smoothing method, at every time step the target parameters are updated using a smoothing factor $\tau$, as shown in Equations (12) and (13) for the critic and actor, respectively:

$$\phi_{t} = \tau \phi + (1 - \tau)\, \phi_{t} \tag{12}$$

$$\theta_{t} = \tau \theta + (1 - \tau)\, \theta_{t} \tag{13}$$
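The actor update of Equations (9)–(11) and the soft target update of Equations (12) and (13) can be illustrated on a toy problem whose gradients can be written by hand. In this hypothetical sketch, the actor is linear, $\pi(S;\theta) = \theta S$, and a fixed analytic critic $Q(S, A) = -(A - 0.5S)^{2}$ stands in for the trained critic network, so the optimal parameter is $\theta = 0.5$; the learning rate and $\tau$ are placeholders, not the values of Table 1.

```python
def grad_q_wrt_action(s, a):
    """G_a (Equation 10) for the toy critic Q(S, A) = -(A - 0.5*S)**2."""
    return -2.0 * (a - 0.5 * s)


def actor_gradient(theta, states):
    """Equation (9) for the linear actor pi(S; theta) = theta * S."""
    total = 0.0
    for s in states:
        a = theta * s                      # A = pi(S_i; theta)
        g_a = grad_q_wrt_action(s, a)      # Equation (10)
        g_pi = s                           # Equation (11): d pi / d theta
        total += g_a * g_pi
    return total / len(states)


def soft_update(target, online, tau):
    """Equations (12)-(13): w_t <- tau * w + (1 - tau) * w_t, element-wise."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


theta, theta_t = 0.0, [0.0]               # actor and target-actor parameters
states = [0.2, 0.5, 0.8, 1.0]             # hypothetical mini-batch of observations
for _ in range(2000):
    theta += 0.5 * actor_gradient(theta, states)       # gradient ascent on J
    theta_t = soft_update(theta_t, [theta], tau=0.01)  # slowly tracking target
print(theta, theta_t[0])
```

A small $\tau$ makes the target parameters lag the online ones, which keeps the targets $y_i$ of Equation (7) from shifting abruptly between updates.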

The environmental conditions (irradiance and temperature), power output and duty cycle are fed to the controller as the state. The reward function used in the implementation rewards high power outputs and positive changes in power. It also compares the power with the maximum obtained under the same conditions in the DT to improve training efficiency, as shown in Figure 6. The action space is continuous but bounded between 0.1 and 0.9 to ensure continuous conduction mode (CCM) operation of the DC/DC converter. The hyperparameters used for the agent are shown in Table 1.
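The exact reward function is given in Figure 6 and is not reproduced here. Purely as a hypothetical illustration of the three ingredients named above (high power, positive power changes, and comparison with the DT maximum under the same conditions), a shaped reward could look like the sketch below; the weights and the functional form are assumptions, not the authors' design.

```python
def reward(p, p_prev, p_dt_max, w_power=1.0, w_delta=0.5, w_gap=1.0):
    """Hypothetical MPPT reward (weights and form are illustrative only).

    p         -- power measured after the action [W]
    p_prev    -- power at the previous step [W]
    p_dt_max  -- maximum power the digital twin reaches under the same G and T [W]
    """
    ratio = p / p_dt_max                      # normalized power
    gain = max(p - p_prev, 0.0) / p_dt_max    # only positive changes count
    gap = abs(1.0 - ratio)                    # distance to the DT maximum
    return w_power * ratio + w_delta * gain - w_gap * gap


print(reward(95.0, 80.0, 100.0))  # close to the DT maximum and still improving
print(reward(60.0, 70.0, 100.0))  # far from it and getting worse
```

Normalizing by the DT maximum makes the reward comparable across irradiance and temperature levels, so a low absolute power at low irradiance is not punished when it is already the best achievable.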

2.4. Perturb and Observe (P&O) Controller

The P&O MPPT method is widely used due to its ease of implementation, low cost, minimal sensor requirements, and simplicity [14]. This technique tracks the MPP iteratively by making slight adjustments to the duty cycle of the DC/DC converter, which changes the PV panel voltage, and monitoring the resulting power changes. The changes in power determine the direction of perturbation required to converge towards the MPP, and the iteration continues until the MPP is reached. If the voltage is increased and the power also increases, the operating point of the PV module is on the left side of the P-V curve. If the voltage is increased and the power decreases, the operating point is on the right side of the P-V curve.

Figure 7 illustrates the steps involved in the P&O algorithm. First, the algorithm reads the voltage and current to calculate the power. If the power at the current step equals that of the previous step, the algorithm takes no action and starts again. Otherwise, it checks whether the power is larger or smaller than the previous one and whether the voltage has increased or decreased. Based on these two conditions, the algorithm decides whether to increase or decrease the duty cycle. Finally, it stores the current values as the previous-step values before starting again with new measurements.
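The flowchart logic can be sketched as a single fixed-step P&O iteration acting on the duty cycle. The sketch assumes an ideal boost converter feeding a resistive load, for which increasing $d$ lowers the PV voltage, so "move the voltage up" translates into "decrease $d$"; the function name and step size are hypothetical, and the output is saturated to [0.1, 0.9] to keep the converter in CCM.

```python
def po_step(v, i, state, step=1e-3, d_min=0.1, d_max=0.9):
    """One fixed-step P&O iteration that perturbs the duty cycle d.

    state = (v_prev, p_prev, d). Assumes an ideal boost converter feeding a
    resistive load, where increasing d lowers the PV voltage.
    Returns the updated state (v, p, d) for the next iteration.
    """
    v_prev, p_prev, d = state
    p = v * i
    dp, dv = p - p_prev, v - v_prev
    if dp != 0.0:                      # equal power: take no action
        if (dp > 0) == (dv > 0):
            d -= step                  # left of the MPP: raise V by lowering d
        else:
            d += step                  # right of the MPP: lower V by raising d
        d = min(max(d, d_min), d_max)  # keep the converter in CCM
    return (v, p, d)


state = (30.0, 200.0, 0.5)             # hypothetical previous sample
state = po_step(31.0, 7.0, state)      # power rose with voltage, so d decreases
print(state)
```

Because the step is fixed, the duty cycle keeps alternating around the optimum once it is reached, which is the steady-state oscillation discussed in Section 1.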

2.5. Hardware

The proposed control structures were implemented in real time using commercial hardware. The main power source was a Peimar SG340P polycrystalline solar panel consisting of 72 high-quality cells arranged in a 12 × 6 array. The panel is designed for residential and small industrial use and is protected by a low-iron tempered glass front cover and a double-wall aluminium frame for mechanical stiffness. Table 2 provides additional details on the electrical properties of the panel. The irradiance and temperature were measured using a Meteo Control Si-V-010-T irradiance sensor.

A TEP-192 boost converter was used in conjunction with the PV panel. This device steps up and regulates the output voltage for various applications, and also provides measurements of input and output current and voltage for monitoring. The boost converter is switched by a metal-oxide-semiconductor field-effect transistor (MOSFET) driven by the input PWM signal. The resistive load is set by the BK8500 electronic load in series with a fixed load, since the BK8500 has a rated power of 300 W while the PV panel can supply up to 340 W. Additional technical information is provided in Table 3 and Table 4.

The PWM signal was generated using dSPACE's DS1202 MicroLabBox, which was developed for mechatronics development and can produce analog, digital, and PWM signals. The device features a programmable FPGA and a dual-core processor running at up to 2 GHz, with 1 GB of DRAM and 128 MB of flash memory. It also supports Real-Time Interface (RTI), a platform for fast, automatic C code generation that lets designers work solely in the Simulink interface. Additionally, dSPACE provides ControlDesk, which not only displays measured variables but also allows control signals to be manipulated. The circuit was closed with variable resistances that act as an adjustable load to perform different experimental scenarios. Figure 8 shows a diagram of the hardware and the connections between the elements.

The PC used for training has a 2.8 GHz Intel i9-10900 CPU with 32 GB of RAM at 1600 MHz and an integrated Intel UHD Graphics 630 GPU. The design, training and simulation were carried out in MATLAB 2022b and Simulink with the RL Toolbox on Windows 10 Enterprise, version 1809. MATLAB compiled the code to C++ and uploaded it onto the DS1202 MicroLabBox. Data from the real-time experiments were acquired with dSPACE ControlDesk and imported into MATLAB for processing and visualisation.

3. Results

3.1. Simulation Results

The trained DDPG agent was first tested in simulation under different irradiance and temperature conditions. Two experiments are presented below: the first simulated different irradiance levels at a fixed temperature, and the second simulated different temperatures at a fixed irradiance. In this way, the effect of each variable on the maximum power the panel can deliver can be observed separately. In addition, the DDPG agent was compared with a fixed-step Perturb and Observe (P&O) controller to determine whether it improves the search for the maximum power point. Both experiments were carried out with a fixed resistive load on the DC/DC converter.

As mentioned above, the first experiment was conducted at a constant temperature of 20 °C with varying irradiance, as shown in Figure 9. The maximum power that the PV panel can achieve at those irradiance levels, as obtained by both controllers, is shown in Figure 10. The first observation is the direct relationship between irradiance and maximum power described in Section 2.2: when irradiance increases, power increases. Regarding the performance of the controllers, both reach the MPP in less than one second, before the next irradiance change. However, because the DDPG agent directly supplies the required duty cycle to the converter, its time to reach the MPP is limited only by the converter model, 0.096 s. In the case of P&O, by contrast, it is the controller itself that limits the time required, needing 0.8 s to reach the MPP. This is due to the very nature of P&O control discussed in Section 2.4, which forces the controller to perform a new MPP search at each change of irradiance. Thus, the DDPG is 8.43 times faster than the P&O at the first irradiance change.

Note that at the other three irradiance steps (seconds 1, 2 and 3), the MPP is reached earlier than at the first one. This is because the response of the converter depends on the environmental conditions. With more irradiance and, to a lesser extent, at lower temperatures, the current flowing through the converter is higher and the capacitor at the converter output therefore charges faster. Conversely, with less irradiance, the current is lower, the capacitor takes longer to charge, and the response becomes slower, reaching the MPP later. The settling times for all irradiance steps and the improvement in total power output of the DDPG over the P&O are listed in Table 5.

Note that the P&O oscillates around the maximum power point with an amplitude that depends on its designed step size. In this case, however, the step size is very small ($10^{-5}$), so the power oscillations are not visible.

In the second experiment, simulations were conducted with a fixed irradiance of 750 W/m² and different temperature levels, as shown in Figure 11. As in the previous experiment, Figure 12 shows the maximum power points obtained by the two controllers at different temperatures. Since only the temperature changes in this case, it can be concluded that power and temperature are inversely related: when the temperature decreases, the maximum power increases. Again, the DDPG controller acts faster than the P&O because the only factor limiting how fast the MPP is reached is the DC/DC converter, yielding a settling time of 0.06 s against 0.58 s for the P&O. Additionally, as seen in Figure 4, which shows the panel curves at different temperatures, the higher the temperature, the lower the panel voltage required to reach the MPP. Therefore, in the second simulation, with a higher temperature, the difference between the initial and the desired voltage is smaller than in the first simulation, which makes the output response in the first simulation 55% slower.

On the other hand, the P&O still needs more time to reach the same point, as discussed above. At seconds 1 and 2 the temperature change is smaller, so the P&O needs less time. However, it is still slower than the DDPG agent, as shown in Table 6. In those data, an improvement of 10,667% for the DDPG over the P&O stands out. This is because the system's response at these instants is faster and therefore reaches the MPP sooner. The DDPG takes advantage of this because, as mentioned above, it produces the optimal duty cycle instantaneously, while the P&O is still limited by its own search nature. Therefore, the DDPG further improves its settling time relative to the P&O.

Note that at second 2 the temperature change is even smaller, so the P&O settling time comes closer to that of the DDPG, which cannot improve its settling time much further. Thus, the DDPG shows only a 480% improvement over the P&O. As in the previous simulation, the improvement in total power output of the DDPG over the P&O is provided in Table 6.

3.2. Real Solar PV Experiments Results

Regarding the experiments with the hardware described in Section 2.5, two tests were conducted under different irradiance and temperature conditions in which, as in the simulations, the DDPG agent is compared with a P&O controller with a fixed step size. On the real solar panel, the controllers are run one immediately after the other, within a short space of time, so that the irradiance and temperature conditions are as similar as possible.

The conditions of the first experiment are an irradiance of 240 W/m² and a temperature of 27 °C, and the controllers' performance is shown in Figure 13. As shown in Figure 14, the test starts with a constant duty cycle of 0.86 until the controllers are triggered at 2.5 s. At that instant, the DDPG immediately applies the duty cycle required to reach the MPP, and the P&O starts its search for the MPP. As in the simulations, although the DDPG outputs the control signal for the MPP at once, the system cannot reach that point as quickly (Figure 13). Because of this, and because of the step size chosen for the P&O, the two controllers take almost the same time to reach the MPP, although the DDPG is 5% faster. However, with the selected P&O step size, even though the P&O reaches the desired value quickly, its duty cycle then oscillates within a range of 0.1 around the value of 0.2 given by the DDPG, as shown in Figure 14. These oscillations appear as power oscillations with an amplitude of 3 W in Figure 13, which result in an 11.19% efficiency improvement for the DDPG.

Note that, in the P&O control actions in Figure 14, the duty cycle reaches 0.1 around 3 s and stays there for a short time. This happens because a saturation has been added so that the duty cycle provided by the controllers does not fall below 0.1, keeping the DC/DC converter in continuous conduction mode (CCM). Additionally, the P&O has been designed to start its search for the MPP from an initial value of 0.9, which explains the first peak of the P&O control signal at 2.5 s in Figure 14.

In the second experiment, the environmental conditions are an irradiance of 1090 W/m² and a temperature of 43.5 °C. In this case, as shown in Figure 15, the experiment starts with a duty cycle of 0.84 until the controllers take action after about 3 s. As in the previous experiment, the DDPG control action is immediate, whereas the P&O takes 4.26 s to find the MPP. In this second test, the DDPG needs only 0.25 s to reach the MPP, a settling-time improvement of 1704% (about 17 times faster).

It is worth noting that the P&O step size has been reduced by a factor of 10 in this test; therefore, the P&O takes 3.4 times longer to find the MPP than in the first test, although its duty-cycle oscillations around the desired value are also 10 times smaller, with an amplitude of 0.01, as shown in Figure 15. Correspondingly, the power oscillations in Figure 16 have an amplitude of only 0.5 W, so the 51.45% efficiency improvement of the DDPG stems mainly from the long time the P&O needs to reach the MPP.
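The step-size trade-off, together with the 0.1 duty-cycle saturation and the 0.9 starting value mentioned earlier, can be illustrated with a minimal fixed-step P&O acting directly on the duty cycle. This is a generic hill-climbing formulation, not necessarily the exact flow chart of Figure 7, and the concave power curve `toy_power` (peak of 60 W at d = 0.2) is hypothetical. Because the toy ignores converter dynamics, the number of updates scales exactly with 1/step, whereas the measured slowdown on the real panel was 3.4 times.

```python
def po_step(d, direction, p, p_prev, step, d_min=0.1, d_max=1.0):
    """One fixed-step P&O update of the duty cycle d.

    Keep perturbing in the same direction while power rises; reverse when it falls.
    The result is clamped to [d_min, d_max]; d_min = 0.1 mirrors the CCM saturation.
    """
    if p < p_prev:
        direction = -direction
    d = min(max(d + direction * step, d_min), d_max)
    return d, direction


def toy_power(d):
    """Hypothetical concave P(d) curve with its maximum (60 W) at d = 0.2."""
    return 60.0 * (1.0 - ((d - 0.2) / 0.8) ** 2)


def track(step, d0=0.9, iters=200):
    """Run P&O from d0 (0.9, as in the text) and return the duty-cycle history."""
    d, direction, p_prev = d0, -1, 0.0
    history = []
    for _ in range(iters):
        p = toy_power(d)
        d, direction = po_step(d, direction, p, p_prev, step)
        p_prev = p
        history.append(d)
    return history


for step in (0.1, 0.01):
    h = track(step)
    reach = next(k + 1 for k, d in enumerate(h) if abs(d - 0.2) < step / 2)
    ripple = max(abs(d - 0.2) for d in h[-20:])
    print(f"step={step}: reaches d=0.2 after {reach} updates, ripple +/-{ripple:.3f}")
```

A ten times smaller step gives a ten times smaller steady-state ripple but needs ten times as many updates to arrive, which is the compromise that a fixed-step P&O cannot escape.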

As can be seen in Figure 16, the DDPG reaches the MPP 4.1 times faster than in the previous experiment (see Figure 13). This is the same phenomenon explained in Section 3.1 for the simulations: when the irradiance is high, the current flowing through the converter is higher and the converter capacitor charges faster. Consequently, the converter responds faster than at lower irradiance. Moreover, the temperature also influences the voltage that the solar panel must reach to operate at the MPP. As shown in Figure 17, the higher the temperature, the lower the voltage the panel has to deliver, and therefore the faster the converter reaches it.
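The capacitor-charging argument can be made concrete with a first-order estimate. Assuming that the relevant converter capacitance $C$ is charged by an approximately constant average current $\bar{I}$ (both symbols introduced here for illustration only), the time needed to move its voltage by $\Delta V$ is approximately:

```latex
i_C = C\,\frac{\mathrm{d}v_C}{\mathrm{d}t}
\quad\Longrightarrow\quad
\Delta t \approx \frac{C\,\Delta V}{\bar{I}}
```

A higher irradiance raises $\bar{I}$, and a higher temperature lowers the MPP voltage and therefore, when the operating point starts above it as argued in Section 3.1, reduces $\Delta V$. Both effects shorten $\Delta t$, consistent with the faster response in the second test.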

It should be noted that the two controllers reach slightly different MPPs, although they are very close. This is because the irradiance was not constant throughout the experiment but rose by about 10 W/m² (see Figure 18), which slightly changed the MPP. Since the P&O control was executed after the DDPG control, the MPP it reached is slightly higher.

From the tests performed, it can be concluded that the DDPG outperforms the P&O in finding the MPP under different environmental conditions. It reaches the desired power in a shorter time, being limited only by the slow dynamics of the converter, and it does not oscillate around the MPP as the P&O does. In fact, both in simulations and in real experiments, the DDPG is the faster controller and delivers the higher total power output. This is reflected in Table 7, which summarizes the improvement of the DDPG in terms of speed and efficiency. For the simulations, the settling-time improvement listed in Table 7 is the best value among all steps of each simulation. The 51.45% efficiency improvement of the DDPG over the P&O in the second real test stands out. It arises because the system responds quickly under these conditions, so the DDPG agent reaches the MPP quickly, whereas the P&O in this test was designed with a small step size: its oscillations at the MPP are almost imperceptible, but it takes much longer to reach that point, which prevents it from delivering a total power output similar to that of the DDPG agent.

Compared with similar works such as [35], improvements can be observed. Phan et al. [35] carried out two simulations similar to those in this work, in which one of the two environmental variables was kept constant while the other varied. For the simulation at constant temperature, they obtained an efficiency improvement of 0.96%, compared with 8.59% in this work. For the simulation at constant irradiance with varying temperature, they obtained 2.74%, while this work achieved 10.45%. Note that their simulations and ours did not cover the same length of time. Their simulations were shorter, so if ours were run over the same period, the absolute efficiency would decrease; however, since settling time matters more over short periods and our settling time is shorter, the difference in efficiency would increase. Conversely, if they simulated a period as long as the one used in this work, their efficiency would increase, but because the controllers would spend more time at the MPP, settling time would lose importance and the difference in efficiency would decrease.
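The effect of the window length can be illustrated with an idealized model, which is an assumption made here and not the authors' analysis: the DDPG sits at $P_{\mathrm{mpp}}$ for the whole window $T$, while the P&O ramps linearly from zero to $P_{\mathrm{mpp}}$ over its settling time $t_s$ and then stays there without ripple.

```latex
E_{\mathrm{DDPG}} \approx P_{\mathrm{mpp}}\,T, \qquad
E_{\mathrm{P\&O}} \approx P_{\mathrm{mpp}}\left(T - \tfrac{t_s}{2}\right)
\quad\Longrightarrow\quad
\frac{E_{\mathrm{DDPG}} - E_{\mathrm{P\&O}}}{E_{\mathrm{P\&O}}} \approx \frac{t_s}{2T - t_s}
```

The relative gain falls as $T$ grows: for a hypothetical $t_s = 0.5$ s it is about 33% for $T = 1$ s but only about 6.7% for $T = 4$ s. Efficiency improvements from simulations of different lengths are therefore not directly comparable.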

In another work that also used neural networks, Raj and Gupta [41] compared their ANN-INC method (an artificial neural network combined with the incremental conductance method) with a conventional P&O in simulation and obtained a 5.2% efficiency improvement, 5.25 percentage points below the 10.45% obtained in this work.

4. Conclusions and Future Work

This paper has presented the use of an RL DDPG agent trained with a DT as the MPPT method for a solar PV panel. After being designed and trained across the range of operating conditions considered, the DDPG agent was first tested in simulation, using a solar PV model and a DC/DC converter model, where it reproduced the expected relationship of irradiance and temperature with the maximum power point. The controller was then transferred to a real solar panel to examine its performance. Both in simulation and in real experiments, the controller was compared with a fixed-step-size P&O controller to assess the improvements offered by the DDPG agent.

The results showed that the DDPG agent reached the maximum power point faster than the P&O controller in every test (Table 7). This is relevant because the ability to adapt quickly to changing irradiance conditions is crucial for the proper operation of solar energy systems. Additionally, the DDPG agent achieved an efficiency improvement over the P&O control of 11.19% over a period of four seconds in the first real test, and of 51.45% in the second real test.

The proposed MPPT controller can contribute to solar PV research by advancing state-of-the-art MPPT algorithms. By incorporating reinforcement learning and digital twin technology, it achieved improved performance over the conventional P&O algorithm, and by increasing efficiency and reducing the cost of solar energy generation, it could help accelerate the adoption of solar energy.

Despite the promising results, further studies and tests are needed to validate the effectiveness of the DDPG agent under other operating conditions, such as partial shading conditions (PSC). Owing to the nature of RL, it is difficult to find the parameters that yield adequate control; since these adjustments are made by trial and error, long training times make it difficult to obtain optimal controllers. In addition, because of the dependence on the weather, it is not possible to recreate, as in a controlled environment, the scenarios in which the response of the controllers can best be observed. It would also be interesting to explore combining reinforcement learning or digital twins with other control and optimization methods, such as feedback control or model predictive control. Finally, it would be valuable to test reinforcement learning control on a solar PV array to examine its performance in a more complex system. In summary, deep learning techniques have great potential in solar energy system control, but more research is needed to fully exploit it.

Author Contributions

Conceptualization, O.B. and I.C.; methodology, O.B., E.A. and J.U.; investigation, E.A. and J.U.; writing—original draft preparation, E.A. and J.U.; writing—review and editing, E.A., J.U., O.B. and I.M.; visualization, E.A., J.U. and I.M.; supervision, O.B. and I.C.; project administration, O.B. and I.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the authors.

Acknowledgments

The authors wish to express their gratitude to the Basque Government, through the project EKOHEGAZ II, to the Diputación Foral de Álava (DFA), through the project CONAVANTER, and to the UPV/EHU, through the project GIU20/063, for supporting this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. IEA. Energy Statistics Data Browser. 2022. Available online: https://www.iea.org/data-and-statistics/data-tools/energy-statistics-data-browser?country=WORLD&fuel=Energy%20consumption&indicator=TFCbySource (accessed on 18 April 2023).
2. IEA. Global Primary Energy Demand Growth by Scenario, 2019–2030. 2022. Available online: https://www.iea.org/data-and-statistics/charts/global-primary-energy-demand-growth-by-scenario-2019-2030 (accessed on 18 April 2023).
3. Goman, V.; Prakht, V.; Kazakbaev, V.; Dmitrievskii, V. Comparative Study of Energy Consumption and CO₂ Emissions of Variable-Speed Electric Drives with Induction and Synchronous Reluctance Motors in Pump Units. Mathematics 2021, 9, 2679.
4. Brauers, H.; Oei, P.Y.; Walk, P. Comparing coal phase-out pathways: The United Kingdom’s and Germany’s diverging transitions. Environ. Innov. Soc. Transit. 2020, 37, 238–253.
5. Gielen, D.; Boshell, F.; Saygin, D.; Bazilian, M.D.; Wagner, N.; Gorini, R. The role of renewable energy in the global energy transformation. Energy Strategy Rev. 2019, 24, 38–50.
6. Hrovatin, D.; Žemva, A. Exploiting Solar Energy during an Aerial Mapping Mission on a Lightweight UAV. Electronics 2021, 10, 2876.
7. Mukhtar, S.; Gul, T. Solar Radiation and Thermal Convection of Hybrid Nanofluids for the Optimization of Solar Collector. Mathematics 2023, 11, 1175.
8. Verduci, R.; Romano, V.; Brunetti, G.; Yaghoobi Nia, N.; Di Carlo, A.; D’Angelo, G.; Ciminelli, C. Solar energy in space applications: Review and technology perspectives. Adv. Energy Mater. 2022, 12, 2200125.
9. Alsadi, S.; Khatib, T. Photovoltaic power systems optimization research status: A review of criteria, constrains, models, techniques, and software tools. Appl. Sci. 2018, 8, 1761.
10. Parada-Salado, J.G.; Rodríguez-Licea, M.A.; Soriano-Sanchez, A.G.; Ruíz-Martínez, O.F.; Espinosa-Calderon, A.; Pérez-Pinal, F.J. Study on Multiple Input Asymmetric Boost Converters with Simultaneous and Sequential Triggering. Electronics 2021, 10, 1421.
11. Troudi, F.; Jouini, H.; Mami, A.; Ben Khedher, N.; Aich, W.; Boudjemline, A.; Boujelbene, M. Comparative Assessment between Five Control Techniques to Optimize the Maximum Power Point Tracking Procedure for PV Systems. Mathematics 2022, 10, 1080.
12. Shen, C.L.; Chen, L.Z.; Chuang, T.Y.; Liang, Y.S. Cascaded-like High-Step-Down Converter with Single Switch and Leakage Energy Recycling in Single-Stage Structure. Electronics 2022, 11, 352.
13. Danandeh, M.; Mousavi G., S. Comparative and comprehensive review of maximum power point tracking methods for PV cells. Renew. Sustain. Energy Rev. 2018, 82, 2743–2767.
14. Petrescu, C.; Sharma, A.K.; Pachauri, R.K.; Choudhury, S.; Minai, A.F.; Alotaibi, M.A.; Malik, H.; Márquez, F.P.G. Role of Metaheuristic Approaches for Implementation of Integrated MPPT-PV Systems: A Comprehensive Study. Mathematics 2023, 11, 269.
15. Grieves, M. Completing the Cycle: Using PLM Information in the Sales and Service Functions [Slides]. In Proceedings of the SME Management Forum, Troy, MI, USA, 31 October 2002.
16. Wang, K.; Ma, J.; Wang, J.; Xu, B.; Tao, Y.; Man, K.L. Digital Twin based Maximum Power Point Estimation for Photovoltaic Systems. In Proceedings of the 2022 19th International SoC Design Conference (ISOCC), Gangneung-si, Republic of Korea, 19–22 October 2022; pp. 189–190.
17. Zhang, G.; Wang, X. Digital Twin Modeling for Photovoltaic Panels Based on Hybrid Neural Network. In Proceedings of the 2021 IEEE 1st International Conference on Digital Twins and Parallel Intelligence (DTPI), Beijing, China, 15 July–15 August 2021; pp. 90–93.
18. Liu, H.D.; Lu, S.D.; Lee, Y.L.; Lin, C.H. A novel photovoltaic module quick regulate mppt algorithm for uniform irradiation and partial shading conditions. Processes 2021, 9, 2213.
19. Yildirim, M.A.; Nowak-Ocłoń, M. Modified maximum power point tracking algorithm under time-varying solar irradiation. Energies 2020, 13, 6722.
20. Zafar, M.H.; Al-shahrani, T.; Khan, N.M.; Feroz Mirza, A.; Mansoor, M.; Qadir, M.U.; Khan, M.I.; Naqvi, R.A. Group teaching optimization algorithm based MPPT control of PV systems under partial shading and complex partial shading. Electronics 2020, 9, 1962.
21. Ko, J.S.; Huh, J.H.; Kim, J.C. Overview of maximum power point tracking methods for PV system in micro grid. Electronics 2020, 9, 816.
22. Ahmed, J.; Salam, Z. An enhanced adaptive P&O MPPT for fast and efficient tracking under varying environmental conditions. IEEE Trans. Sustain. Energy 2018, 9, 1487–1496.
23. Macaulay, J.; Zhou, Z. A fuzzy logical-based variable step size P&O MPPT algorithm for photovoltaic system. Energies 2018, 11, 1340.
24. Farhat, M.; Barambones, O.; Sbita, L. A real-time implementation of novel and stable variable step size MPPT. Energies 2020, 13, 4668.
25. Alagammal, S.; Rathina Prabha, N. Combination of modified P&O with power management circuit to exploit reliable power from autonomous PV-battery systems. Iran. J. Sci. Technol. Trans. Electr. Eng. 2021, 45, 97–114.
26. Mendez, E.; Ortiz, A.; Ponce, P.; Macias, I.; Balderas, D.; Molina, A. Improved MPPT algorithm for photovoltaic systems based on the earthquake optimization algorithm. Energies 2020, 13, 3047.
27. Ajani, T.S.; Imoize, A.L.; Atayero, A.A. An overview of machine learning within embedded and mobile devices–optimizations and applications. Sensors 2021, 21, 4412.
28. Li, C.; Chen, Y.; Zhou, D.; Liu, J.; Zeng, J. A high-performance adaptive incremental conductance MPPT algorithm for photovoltaic systems. Energies 2016, 9, 288.
29. Khan, M.J.; Mathew, L.; Alotaibi, M.A.; Malik, H.; Nassar, M.E. Fuzzy-Logic-Based Comparative Analysis of Different Maximum Power Point Tracking Controllers for Hybrid Renewal Energy Systems. Mathematics 2022, 10, 529.
30. Al-Gizi, A.G.; Al-Chlaihawi, S.J. Study of FLC based MPPT in comparison with P&O and InC for PV systems. In Proceedings of the 2016 International Symposium on Fundamentals of Electrical Engineering (ISFEE), Bucharest, Romania, 30 June–2 July 2016; pp. 1–6.
31. Roy, R.B.; Rokonuzzaman, M.; Amin, N.; Mishu, M.K.; Alahakoon, S.; Rahman, S.; Mithulananthan, N.; Rahman, K.S.; Shakeri, M.; Pasupuleti, J. A comparative performance analysis of ANN algorithms for MPPT energy harvesting in solar PV system. IEEE Access 2021, 9, 102137–102152.
32. Kofinas, P.; Doltsinis, S.; Dounis, A.I.; Vouros, G.A. A reinforcement learning approach for MPPT control method of photovoltaic sources. Renew. Energy 2017, 108, 461–473.
33. Chou, K.Y.; Yang, S.T.; Chen, Y.P. Maximum Power Point Tracking of Photovoltaic System Based on Reinforcement Learning. Sensors 2019, 19, 5054.
34. Singh, Y.; Pal, N. Reinforcement learning with fuzzified reward approach for MPPT control of PV systems. Sustain. Energy Technol. Assess. 2021, 48, 101665.
35. Phan, B.C.; Lai, Y.C.; Lin, C.E. A Deep Reinforcement Learning-Based MPPT Control for PV Systems under Partial Shading Condition. Sensors 2020, 20, 3039.
36. Nicola, M.; Nicola, C.I.; Selișteanu, D. Improvement of the Control of a Grid Connected Photovoltaic System Based on Synergetic and Sliding Mode Controllers Using a Reinforcement Learning Deep Deterministic Policy Gradient Agent. Energies 2022, 15, 2392.
37. Humada, A.M.; Hojabri, M.; Mekhilef, S.; Hamada, H.M. Solar cell parameters extraction based on single and double-diode models: A review. Renew. Sustain. Energy Rev. 2016, 56, 494–509.
38. Villalva, M.G.; Gazoli, J.R.; Filho, E.R. Comprehensive Approach to Modeling and Simulation of Photovoltaic Arrays. IEEE Trans. Power Electron. 2009, 24, 1198–1208.
39. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
40. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
41. Raj, A.; Gupta, M. Numerical Simulation and Performance Assessment of ANN-INC Improved Maximum Power Point Tracking System for Solar Photovoltaic System Under Changing Irradiation Operation. Ann. Rom. Soc. Cell Biol. 2021, 25, 790–797.

Figure 1. Single-diode model electrical diagram.

Figure 2. I-V and P-V curves.

Figure 3. I-V and P-V curves at 25 °C.

Figure 4. I-V and P-V curves for 1000 W/m².

Figure 5. DDPG actor-critic structure diagram.

Figure 6. Block Diagram of RL DDPG Agent training with Digital Twin.

Figure 7. P&O controller flow chart.

Figure 8. Hardware implementation schematic.

Figure 9. Irradiation conditions.

Figure 10. Maximum power tracking with DDPG and P&O under constant temperature and variable irradiance.

Figure 11. Temperature conditions.

Figure 12. Maximum power tracking with DDPG and P&O under constant irradiation and variable temperature.

Figure 13. Maximum power point tracking with DDPG and P&O at 244 W/m² and 27 °C.

Figure 14. DDPG and P&O control actions for MPP tracking at 244 W/m² and 27 °C.

Figure 15. DDPG and P&O control actions for MPP tracking at 1090 W/m² and 43.5 °C.

Figure 16. Maximum power point tracking with DDPG and P&O at 1090 W/m² and 43.5 °C.

Figure 17. Solar PV voltage.

Figure 18. Irradiance over time.

Table 1. Hyperparameters of the DDPG agent.

Properties	Values
Sample time	0.01
Experience buffer length	$10^{6}$
Mini-batch size	64
Discount factor ( $γ$ )	0.99
O-U noise variance	0.15
Smoothing factor ( $τ$ )	0.001
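Two entries of Table 1 enter the standard DDPG update of Lillicrap et al. [40]: the discount factor γ appears in the critic target, and the smoothing factor τ sets the soft (Polyak) update of the target networks. The sketch below shows both on plain Python lists that stand in for network weights. It is a generic illustration, not the authors' training code, and reading the sample time of 0.01 as seconds is an assumption.

```python
GAMMA = 0.99   # discount factor (Table 1)
TAU = 0.001    # target-network smoothing factor (Table 1)
TS = 0.01      # agent sample time (Table 1); unit assumed to be seconds


def critic_target(reward, q_next, gamma=GAMMA, done=False):
    """DDPG critic target y = r + gamma * Q'(s', mu'(s')), with Q' already evaluated."""
    return reward if done else reward + gamma * q_next


def soft_update(target, online, tau=TAU):
    """Polyak averaging of target weights: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target, online)]


# Rough time scales implied by Table 1.
horizon_steps = 1.0 / (1.0 - GAMMA)   # effective look-ahead of about 100 steps
print(f"effective horizon ~ {horizon_steps:.0f} steps ~ {horizon_steps * TS:.1f} s")
print(f"target networks follow the online ones with a time constant of ~{1.0 / TAU:.0f} updates")
```

With τ = 0.001, a target weight covers about 63% of a step change in the online weight after 1000 updates, which, under the same assumption of one update per 0.01 s sample, is roughly 10 s of interaction.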

Table 2. Technical data of the SG340P panel.

Properties	Values	Units
Dimensions	156 × 156	mm
Open-circuit voltage	45	V
Max power voltage	37	V
Max power current	9	A
Maximum power	340	W
Number of parallel cells	12	units
Number of series cells	6	units
Short-circuit current (Isc)	9.9	A

Table 3. TEP-192 Details.

Properties	Values	Units
Switching frequency	20	kHz
Max input voltage	60	V
Max output voltage	250	V
Max input current	30	A
Max output current	30	A

Table 4. BK8500 Details.

Properties	Values	Units
Power	300	W
Operating voltage	0–120	V
Rated current	30	A
Load range	0.1–4 k	Ω

Table 5. Settling times of the DDPG agent and the P&O controller for irradiance changes.

Step	Step Time (s)	DDPG Settling Time (s)	P&O Settling Time (s)	Settling Time Improvement	Overall Efficiency Improvement
1	0	0.096	0.81	843%	8.59%
2	1	0.011	0.28	2454%
3	2	0.014	0.1	714%
4	3	0.005	0.12	2400%

Table 6. Settling times of the DDPG and P&O under changes in temperature.

Step	Step Time (s)	DDPG Settling Time (s)	P&O Settling Time (s)	Settling Time Improvement	Overall Efficiency Improvement
1	0	0.06	0.58	966%	10.45%
2	1	0.003	0.32	10,667%
3	2	0.0025	0.012	480%

Table 7. Results of improvement of the DDPG over the P&O controller.

Test	Settling Time Improvement	Efficiency Improvement
1st Simulation	2454%	8.59%
2nd Simulation	10,667%	10.45%
1st Real Test	5%	11.19%
2nd Real Test	1704%	51.45%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

