Robust Reinforcement Learning-Based Multiple Inputs and Multiple Outputs Controller for Wind Turbines

Tomin, Nikita

doi:10.3390/math11143242

by

Nikita Tomin

Melentiev Energy Systems Institute of SB RAS, Lermontov Str. 130, 664033 Irkutsk, Russia

Mathematics 2023, 11(14), 3242; https://doi.org/10.3390/math11143242

Submission received: 25 June 2023 / Revised: 19 July 2023 / Accepted: 20 July 2023 / Published: 24 July 2023

(This article belongs to the Special Issue Numerical Simulation and Control in Energy Systems, 2nd Edition)

Abstract

The control of variable-speed wind turbines that generate electricity from the kinetic energy of the wind involves subsystems that need to be controlled simultaneously, namely, the blade pitch angle controllers and the generator torque controllers. The presented study solves the control problem with multiple inputs and multiple outputs (MIMO) using the reinforcement learning-based Trust Region Policy Optimization method, through which the control parameters of both subsystems are simultaneously optimized. In this case, the robust control problem is transformed into a constrained optimal control problem with an appropriate choice of value functions for the nominal system. The study aims to synthesize a robust controller that maximizes the generated energy (power) and minimizes unwanted forces (thrust). The innovative control architecture uses an extended input space, which allows fine-tuning of parameters for each operating state. Test calculations carried out in simulation experiments using models of the 5 MW NREL wind turbine and the 4 MW Enercon E-126 EP3 wind turbine are presented to illustrate the performance and practicality of the proposed approach.

Keywords:

wind turbine; controller; MIMO; reinforcement learning; blade pitch control; generator torque control; Trust Region Policy Optimization

MSC:

37M22; 37M05

1. Introduction

Wind energy has become one of the most competitive renewable alternatives to conventional energy sources based on fossil fuels [1]. Designing an efficient controller for variable-speed wind turbines (VSWTs) is challenging due to the complex dynamics caused by wind variability and the large number of variables that need to be controlled simultaneously. The actual control problem for a VSWT can be represented as a control problem for a multiple input–multiple output (MIMO) system, where multiple inputs (e.g., blade pitch angle and generator torque) need to be adjusted to achieve the desired outputs (e.g., generated power and rotor speed).

One of the main difficulties encountered when solving the VSWT control problem is the nonlinearity of the system dynamics. The dynamics of VSWT, in particular, depend on many factors such as wind speed, angle of attack of the blades, rotor speed, etc. Furthermore, these factors interact with each other, which leads to complex nonlinear dynamics throughout the entire system. Another difficulty is the presence of uncertainties in the system, such as changes in wind speed and other external factors. These uncertainties can lead to system instability and inefficient control.

Most of the scientific studies addressing VSWT control [2,3,4] solve this problem in two ways:

Using the reduced dynamic models of VSWT operation, which simplify the derivation of analytical expressions;
Decomposing the control problem into subproblems, where each local control objective of the subsystem contributes to the achievement of the global goal of control.

The first approach involves the use of simplified models, which, unfortunately, can lead to a loss of accuracy and a lack of universality. Reduced models may leave out of consideration some important factors, such as the influence of wind, changes in the speed of blade rotation, and others. In addition, these models can only be applied under certain conditions, which limits their versatility. In the second approach, the decomposition of the control problem into subproblems can lead to problems with integration and coordination of control. Each subsystem may have its objectives and constraints, which may conflict with each other or may not take into account interaction with other subsystems. This can lead to inefficient control and failure to achieve the global goal of control. As in other systems with MIMO control, the adverse effect of interactions between control loops on system performance is one of the most challenging problems in uncertain nonlinear systems.

1.1. Review of MIMO Controller Research for Wind Turbines

The use of MIMO controllers for wind turbines has been a significant area of research in recent years. These controllers aim to optimize the performance and efficiency of wind turbines by simultaneously controlling multiple variables. In one study [5], a multivariable individual pitch control (IPC) rotor active load controller for a large wind turbine was designed using a mixed sensitivity $H_{\infty}$ optimization approach. The proposed MIMO controller was optimized to reject periodic load disturbances in an optimal manner. In another study [6], the authors presented a passive IPC controller that was independent of fault diagnosis and based on double multivariable adaptive control without modeling.

Several studies have investigated various designs of MIMO controllers for VSWT based on a doubly-fed induction generator (DFIG), with the goal of improving the performance and robustness of such systems. For example, one study [7] introduced a MIMO power control strategy for a grid-connected DFIG-based wind turbine with slip power recovery. The control design was based on second-order sliding modes and Lyapunov methods, which are known for their robustness. Another study [8] proposed a MIMO controller that partially linearized the original nonlinear DFIG system to achieve fully decoupled control of the external dynamics, while the stability of the remaining internal dynamics was analyzed via the Lyapunov stability method.

Several studies have utilized the advantages of a MIMO control approach to compensate for uncertainties in control and mitigate mechanical loads on wind energy systems. One study addressed the problem of parameter variation in wind turbine systems due to wind speed fluctuations [9]. The paper proposed a MIMO self-tuning regulator that used local generator speed to compensate for parametric uncertainty. Another study [10] proposed a MIMO linear quadratic regulator (LQR) controller for a VSWT, focusing on the operating range above nominal wind speeds. In [11], the authors confirmed that the MIMO LQR controller enabled optimal suppression of random disturbances from the load, critical changes in wind speed, etc. However, it is important to remember that the real proof of control performance is obtained when controls are implemented and tested in the field. In [12], multivariable MIMO controls based on an LQR controller were implemented and tested for active tower damping, with good load alleviation results.

Another promising direction of the MIMO approach is the development of controllers aimed at optimizing the performance and efficiency of wind turbines by simultaneously controlling multiple variables, such as rotor speed, pitch angle, and generator torque. One study [13] proposed a multivariable model predictive control (MPC) strategy for variable-speed and variable-pitch wind turbines. The control strategy simultaneously controlled the blade pitch angle and generator torque to maximize energy capture, reduce transient loads, and smooth power output. In [14], the authors proposed a fuzzy MPC control concept, wherein the fuzzy state estimation was used, because, in real cases, measurement noise was usually present and not all the states were measurable. It has been shown that significant improvements in terms of control performance can be obtained with a fuzzy MPC strategy. A similar task can also be successfully addressed using artificial intelligence and machine learning methods. For example, one study [1] provided a reinforcement learning (RL) architecture for optimizing the parameters of the VSWT controller using different sets of input variables to simultaneously control the blade pitch angle and generator torque. In [15], a new fuzzy MIMO controller was developed which controlled both machine-side (MSC) and grid-side (GSC) controllers of DFIG together to extract the maximum power from varying wind velocity for the grid synchronization (Table 1).

The studies presented in this review demonstrate the potential of MIMO controllers for wind turbines to simultaneously control multiple variables and optimize performance and efficiency. However, there are some limitations and areas for improvement in the studies reviewed. Firstly, the studies primarily focus on designing MIMO controllers for specific wind turbine systems, which may limit the generalizability of the results to other wind turbine systems. Additionally, some studies do not address the potential increase in computational complexity associated with MIMO controllers and the impact on real-time implementation. Secondly, while the studies demonstrate promising results in improving the performance and robustness of wind turbines, there is limited discussion on the potential trade-offs between performance and robustness when designing MIMO controllers. Furthermore, there is limited analysis of the scalability of the MIMO controllers proposed in the studies reviewed. Finally, the studies reviewed primarily focus on traditional control approaches, with limited exploration of the potential of artificial intelligence and machine learning methods for MIMO control of wind turbines. While one study suggests the potential of reinforcement learning, further exploration of these methods could be beneficial in optimizing MIMO control for wind turbines.

1.2. The Paper Contribution

Overall, the studies reviewed above demonstrate the potential of MIMO controllers for wind turbines, but there is room for improvement in generalizability, scalability, consideration of computational complexity, trade-offs between performance and robustness, and exploration of AI and machine learning methods for MIMO control. This paper presents the results of a study on the development of the concept of a robust MIMO controller based on RL for wind turbines of the VSWT type. To synthesize such a controller and implement the principle of joint control, the presented study uses the Trust Region Policy Optimization (TRPO) method, which simultaneously optimizes the control parameters of the blade pitch and generator torque controller subsystems. The experimental results obtained for the dynamic model of the NREL wind turbine show the high performance of the proposed controller trained with the TRPO method.

Its difference from the existing approaches lies in the following:

To overcome the above difficulties related to using MIMO control, the VSWT robust control problem in the presented paper was transformed into a class of optimal control problems by choosing the right cost functions for the nominal system. This means that such problems can be effectively solved with so-called model-free RL methods;
The proposed model-free RL approach based on the TRPO method allows the parameters of the decision-making policy to be learned with minimal designer input and without domain-specific knowledge in the form of labeled samples. Whereas traditional wind turbine control methods require domain knowledge and labeled samples for training, the control system here can autonomously learn and optimize its decision-making policy based on observed data, which makes RL a promising tool for investigating and developing control systems for VSWT;
The proposed approach is also not focused on or tailored to a specific wind turbine system. Due to the fact that an agent can learn from experience gained by interacting with the environment using a model-free method, the developed controller can be “tuned” to the required wind turbine system with minimal effort using recognized aerodynamic programs, such as FAST;
The use of the TRPO method in the development of the controller is also a novel aspect. This method allows for the optimization of the control policy while considering constraints and safety, which is crucial for wind turbines, where restrictions on rotor speed, blade loads, and other parameters must be respected to prevent damage and ensure safe operation.

It is important to note that the developed controller is not only applicable to wind turbines but can also be generalized to other MIMO systems. This broad applicability further enhances the novelty of the proposed controller.

2. Problem Statement

2.1. General Principles and Objectives of VSWT Control

Wind turbines convert the kinetic energy of the wind into electrical energy. However, they cannot “collect” 100% of the energy passing through the disk area swept by the blades. Only a certain part is converted into electrical power, according to the following expression [16]:

P_{a} = \frac{1}{2} ρ \cdot π \cdot R^{2} \cdot v^{3} \cdot C_{p},

(1)

where $ρ$ is air density, R is the radius of the disk area swept by the blades, v is wind speed, and $C_{p}$ is the wind turbine power coefficient (power factor), which is a unimodal function of the blade tip-speed ratio $λ$. The curve is usually defined as a dependence of $C_{p}$ on $λ$, and real controllers have “hard-coded” optimal values.
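As a quick numerical illustration of Equation (1), the sketch below evaluates the aerodynamic power for a rotor with swept area πR². The air density, rotor radius, and power coefficient used here are illustrative assumptions and are not values taken from the paper.

```python
import math


def aerodynamic_power(rho, radius, wind_speed, cp):
    """Aerodynamic power P_a = 0.5 * rho * pi * R^2 * v^3 * C_p, in watts."""
    swept_area = math.pi * radius ** 2
    return 0.5 * rho * swept_area * wind_speed ** 3 * cp


# Illustrative values (assumptions, not taken from the paper).
RHO = 1.225    # air density [kg/m^3]
RADIUS = 63.0  # rotor radius [m]
CP = 0.45      # power coefficient [-]

p_8 = aerodynamic_power(RHO, RADIUS, 8.0, CP)
p_16 = aerodynamic_power(RHO, RADIUS, 16.0, CP)

print(f"P_a at 8 m/s:  {p_8 / 1e6:.2f} MW")
print(f"P_a at 16 m/s: {p_16 / 1e6:.2f} MW")
print(f"power ratio for doubled wind speed: {p_16 / p_8:.1f}")
```

Doubling the wind speed at a fixed power coefficient multiplies the available power by eight, which is the cubic dependence on v in Equation (1).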

Each control system has its unique control method, which depends on the operating region and the purpose of the VSWT control. Figure 1 shows the individual operating regions for any VSWT system. For most types of VSWT, the wind speed required to start the generator (cut-in speed) is typically around 3 to 4 m/s, and the wind speed at which the generator shuts down (cut-out speed) is usually in the range of 20 to 25 m/s. However, the specific values of cut-in and cut-out speeds may vary depending on the particular wind turbine model and operating conditions. It is also worth noting that some new wind turbine technologies may have lower cut-in speed values, which allow for the use of wind energy even at very low wind speeds.

Generator torque control allows for changing the speed of the turbine rotor, applying maximum power point tracking (MPPT) strategies to achieve the maximum possible extraction of wind energy. In the MPPT region, the VSWT generates electricity, but below the rated power, and the focus is on maximizing electricity generation. As seen from (1), the available wind power depends on the cube of the wind speed. The rotor speed is varied to ensure that $λ$ is maintained at the optimum level under changing wind speed to produce maximum power.

Thus, the maximum power can be obtained when the VSWT is operated at the optimum tip-speed ratio $λ_{opt}$ with the rotor blades set at the optimum pitch angle $β_{opt}$. In light of this, the generated power is maximized by the controller of the generator torque $T_{g}$, which drives the turbine to $λ_{opt}$, and the torque is represented as a function of the rotor speed:

T_{g} = K ω^{2},

(2)

where $ω$ is the rotor speed and K is the VSWT aerodynamic constant, defined as

K = 0.5 \cdot ρ \cdot π \cdot R^{5} \cdot \frac{C_{p.opt}}{λ_{opt}^{3}},

(3)

where $ρ$ is air density, $C_{p.opt}$ is the optimal power coefficient, and R stands for the blade radius.
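The torque law in Equations (2) and (3) can be sketched in a few lines. The numerical values of the air density, blade radius, optimal power coefficient, and optimal tip-speed ratio below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def mppt_gain(rho, radius, cp_opt, lambda_opt):
    """Aerodynamic constant K from Equation (3)."""
    return 0.5 * rho * math.pi * radius ** 5 * cp_opt / lambda_opt ** 3


def generator_torque(k, omega):
    """Torque demand T_g = K * omega^2 from Equation (2); omega in rad/s."""
    return k * omega ** 2


# Illustrative parameters (assumptions, not taken from the paper).
K = mppt_gain(rho=1.225, radius=63.0, cp_opt=0.48, lambda_opt=7.5)

t_low = generator_torque(K, 0.8)   # rotor speed 0.8 rad/s
t_high = generator_torque(K, 1.6)  # rotor speed 1.6 rad/s

print(f"K = {K:.3e} N*m*s^2")
print(f"torque ratio for doubled rotor speed: {t_high / t_low:.1f}")
```

Because the demand is quadratic in ω, doubling the rotor speed quadruples the torque. With ω taken as the rotor speed, as in the text, the result is a torque on the rotor side; any gearbox ratio would have to be handled separately.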

Generator torque can be considered as the force required to move the generator shaft. If its value is too low for a given wind speed, the blades will spin rapidly, as the generator does not offer any counter-resistance. If, on the contrary, the value is too high for a given wind speed, the blades will hardly move because the wind does not have enough force to overcome the resistance created by the generator. In both cases, the energy produced will be very low. The torque of the generator, in this case, acts as a “brake” that regulates the rotation speed of the blades (rotor speed). Thus, the main question here is, what torque should be applied to make the blades spin at the optimal speed that generates the most power under the current wind?

When the VSWT reaches the rated wind speed, it enters the pitch-control region, which is considered to be a full-load region. In this region, the wind speed lies between the rated and cut-out speeds, the pitch controller holds the rotor at the rated speed, and the generator produces rated power, as shown in Figure 1. In contrast to the MPPT region, where the control aims to maximize energy production, the desired control goal in the pitch-control region is to limit energy production. This is achieved by limiting both the torque and the rotor speed of the VSWT generator to ensure that a constant rated power is obtained from the wind. PID control is commonly used in this region for the purpose of blade pitch control to adjust the speed of the VSWT under changing wind conditions. In this case, the pitch increment $Δ Θ$ can be calculated from the speed error $Δ ω$ as

Δ Θ = \left(K_{P} + \frac{K_{I}}{s} + \frac{K_{D}}{τ s + 1}\right) Δ ω,

(4)

where $Δ ω$ is the error in the generator speed and $K_{P}$, $K_{I}$, and $K_{D}$ are the gains that are selected for the desired closed-loop performance of the controller.

Most commercial VSWTs rely on the collective pitch-control method, applying the same pitch command to all wind turbine blades [18]. Each VSWT blade has the same pitch even though the blades have independent servos. The controlled variable in this case is the total blade pitch angle, and the difference between the given nominal rotor speed and its actual value is the error:

β_{c} = K_{P} \left(1 + \frac{K_{I}}{s}\right) (ω_{ref} - ω),

(5)

where $β_{c}$ is the total blade pitch angle demand, $ω_{ref}$ is the reference rotor speed, and $ω$ is the actual speed, which is measured on the rotor axis.
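A discrete-time version of the PI law in Equation (5) may help make the structure concrete. The sketch below approximates the integrator 1/s with a forward-Euler sum. The class name, gains, and time step are hypothetical choices for illustration, and a practical controller would also need pitch limits and anti-windup, which are omitted here.

```python
class CollectivePitchPI:
    """Discrete PI controller: beta_c = K_P * (e + K_I * integral of e)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp          # proportional gain K_P
        self.ki = ki          # integral gain K_I
        self.dt = dt          # sampling period [s]
        self.integral = 0.0   # running integral of the speed error

    def step(self, omega_ref, omega):
        """Return the collective pitch demand for one sample."""
        error = omega_ref - omega           # (omega_ref - omega) in Equation (5)
        self.integral += error * self.dt    # forward-Euler approximation of 1/s
        return self.kp * (error + self.ki * self.integral)


# Illustrative gains and a constant unit speed error (assumed values).
controller = CollectivePitchPI(kp=2.0, ki=0.5, dt=0.1)
first = controller.step(omega_ref=1.0, omega=0.0)
second = controller.step(omega_ref=1.0, omega=0.0)
print(first, second)
```

With a constant error, the output grows from one sample to the next because the integral term keeps accumulating, which is the behaviour that removes steady-state speed error.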

Typically, control of an individual VSWT involves two main actions: changing the generator torque and changing the blade pitch angle. At the same time, the VSWT can change the rotor speed to operate at maximum efficiency over a wide range of wind speeds. The disadvantage is that the control of the VSWT is more complex, since the controller must set both the pitch angle of the blades and the torque of the generator in pursuit of a single control objective, i.e., to maximize the efficiency of the power take-off. Modern approaches [19,20] normally solve the VSWT control problem by splitting the overall problem into two separate control subproblems:

Controlling the blade pitch angle $β$ to follow the reference value of the generator speed $ω_{ref}$, which depends on v, i.e., the blade pitch controller aims to minimize the absolute generator speed error $e_{ω_{g}} = |ω_{g} - ω_{ref}|$;
Controlling the torque $T_{g}$ to reduce the absolute power error $e_{p} = |P_{e} - P_{ref}|$, where $P_{e}$ is the generated electric power and $P_{ref}$ is the reference value of the electric power to be reached (which also depends on the wind speed, v).

These two subproblems are usually solved with closed-loop PID controllers, each of which aims to minimize its error variable e and whose parameters are denoted by $k_{i}, i = 1, 2, \dots, n$. In traditional approaches, these parameters are usually set by manual tuning (or using a heuristic method) and kept constant throughout the entire life cycle of the system. The well-known basic controllers of Vidal [20] and Boukhezzar [19] are obtained analytically from a simplified single-mass dynamic model of VSWT.

Today, the more advanced joint control of torque and pitch angle for VSWT is a crucial problem in the field of wind turbine control. This problem is solved using a MIMO controller, which controls several inputs and outputs of the system simultaneously. In this case, the torque and the blade pitch angle are the inputs, and the rotor speed and generator power are the outputs. At the same time, such a MIMO controller can be built on the basis of a system model that describes the VSWT dynamics. In the traditional approach, the dynamic model of the system can be represented as a matrix of transfer functions that relates the input and output signals of the system. The control usually aims to maximize energy production while minimizing unwanted loads on the turbine.

2.2. General Applications of AI Methods to VSWT Control

It is important to emphasize that the joint control of the torque and pitch angle of a VSWT is a complex control problem that requires advanced control and optimization methods and technologies, such as adaptive control methods, artificial intelligence (AI), and heuristic optimization [21,22]. The most popular modern AI methods are fuzzy logic systems and machine learning (primarily reinforcement learning). Fuzzy logic-based VSWT blade pitch control attracts much attention due to its adaptability and simplicity. A unique characteristic of fuzzy logic controllers is the possibility of quickly changing their parameters to rapidly respond to changes in system dynamics without prior evaluation of the parameters. At the same time, the performance of such a controller largely depends on the knowledge of the user, and the memory requirements of fuzzy methods are the main disadvantage of this VSWT control method [22]. In [23], the authors used fuzzy logic to analyze various operating regions of a low-speed wind system by generating reference power from a VSWT and evaluating the difference between the reference power and the actual generator power. A downside of the proposed technology is its high cost. In [24], a pitch angle controller was proposed to smooth out wind turbine power output fluctuations that occur at sub-rated wind speeds. This approach showed high efficiency with only a small drop in power output.

According to [25], methods based on fuzzy logic and neural networks have their “weaknesses”, primarily, because their implementation requires accurate measurements of wind speed and some system parameters during the training phase, to ensure the correct direction to the MPPT strategy. Heuristic optimization algorithms can be a solution to these problems. For example, in [25], a new particle swarm-based MPPT technique was proposed, which required only the generated power from the VSWT as the input to the controller. Additionally, similar optimization algorithms were successfully used for optimal tuning of traditional PID-controllers. In [26], the particle swarm optimization algorithm was used to calculate the optimal PID-controller parameters in the VSWT blade pitch control problem.

The most promising methods for solving the VSWT optimal control problem, however, can be reinforcement learning methods, a specialized branch of AI methods, since they factor in uncertainty, adapt to changes in wind speed, and independently find the optimal control strategy based on the reinforcement signals received from the system. These methods make it possible to create autonomous decision-making systems that can implement wind turbine control with a high degree of adaptability and minimal dispatcher involvement. In [1], for example, a trained RL model was proposed for the synthesis of a MIMO controller to implement such a form of wind turbine control. In [27], the use of deep RL methods to control the turn of a wind turbine demonstrated that this approach was significantly superior to traditional RL algorithms, due to the combination of RL with learning capabilities through neural networks.

Some studies also use RL techniques to improve the performance of basic VSWT controllers. For example, [1] provided an Actor–Critic RL architecture for optimizing the parameters of two base VSWT controllers (Vidal and Boukhezzar) using different sets of input variables for the RL agent and base controller. The proposed architecture optimized the parameters of each subsystem controller to minimize the overall electric power error. In [28], the authors noted that the convergence of RL methods in the VSWT optimal control problem was still limited due to the slowness of the learning process, and, therefore, suggested a solution based on hybrid control, relying on RL and a traditional PID-controller. The latter was used during the first training instances, since RL-based control did not yet have any experience to learn from. As a result, the hybrid controller reduced the output power error by about 41% compared to the PID-controller.

It is worthwhile to note that most studies on the application of RL to VSWT control refer to the model-based methods. Certain disadvantages of such approaches are the need for an accurate dynamic model and high computational complexity. In addition, model-based methods can be unstable if the environment model contains errors or does not take into account all the factors that affect the operation of the wind turbine. This can lead to incorrect action choices and performance degradation. In contrast to model-based methods, model-free RL methods do not require an exact model for training, are easier to implement, and afford greater robustness to errors in the input data.

3. Robust Model-Free RL-Based MIMO Controller

3.1. Reinforcement Learning

The interaction of an agent with its environment by trial and error is modeled in the framework of RL as a Markov decision process (MDP) $\langle S, A, T, R \rangle$, where S is the set of observable variables that determine the space of system states, A is the set of actions that the agent can take, $T : S \times A \times S \to [0, 1]$ is the stochastic transition function that gives the probability of observing the state $s'$ after execution of the action a in the state s, and R is the reward function that evaluates the value of the transition result (Figure 2).

In applications to control problems, the goal of an agent is to learn a deterministic policy $π (s)$ that maximizes the value function $V^{π} (s)$, defined as the expected cumulative discounted reward if the agent follows the deterministic policy $π (s)$ starting from state s:

V^{π} (s) = E \left\{\sum_{k = 0}^{\infty} γ^{k} \cdot r_{t + k + 1} | s_{t} = s\right\},

(6)

where $γ$ is the discount factor weighting immediate and future rewards.
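For a single finite trajectory, the discounted sum inside the expectation of Equation (6) weights the k-th future reward by γ^k. The short sketch below computes it with a backward recursion; the reward sequence and discount factor are made-up illustrative numbers, and averaging such returns over many trajectories would give a sample estimate of the value function.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma
    # to everything that comes after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 0.0, 2.0]   # r_{t+1}, r_{t+2}, r_{t+3} (illustrative)
g = discounted_return(rewards, gamma=0.5)
print(g)  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```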

RL differs significantly from typical approximate dynamic programming methods in that it does not offer a prescribed behavior or learning model. Therefore, RL is often applied to adaptive optimal controller designs [29,30,31]. Following the conceptual basis of RL methods for the problem of robust control of a nonlinear system with continuous time, it is required to find a robust controller that ensures the stability of the system in the sense of uniform finite boundedness. This problem can be successfully transformed into an optimal control problem with the right choice of value functions for the nominal system.

Let a nonlinear system with continuous time be given and defined by the equation

\dot{s} (t) = f (s (t), a (t)),

(7)

where $s (t) \in R^{n}$ is the system state vector, $a (t) \in R^{m}$ is the vector of control actions, and $f : R^{n} \times R^{m} \to R^{n}$ is a vector function describing the system dynamics.

It is required to find a robust controller $a (t)$ that ensures the stability of the system in the sense of uniform finite boundedness. To this end, the robust control problem can be successfully transformed into an optimal control problem with the right choice of value functions for the nominal system.

Let $s^{*} (t)$ be the nominal trajectory of the system that satisfies the equation

{\dot{s}}^{*} (t) = f (s^{*} (t), a^{*} (t)),

(8)

where $a^{*} (t)$ is the optimal control action that minimizes the cost functional

J (a) = \int_{0}^{\infty} L (s (t), a (t)) d t,

(9)

where $L (s (t), a (t))$ is a cost function that determines the cost of transition from state $s (t)$ to state $s (t + d t)$ under control action $a (t)$.

Then, solving the constrained optimal control problem, we can obtain a robust controller (in our case, for VSWT) that guarantees the stability of a nonlinear system with continuous time in the sense of uniform finite boundedness.

3.2. Trust Region Policy Optimization

The Trust Region Policy Optimization (TRPO) method belongs to the class of model-free optimization methods used in reinforcement learning problems, which do not require explicit modeling of the dynamics of the system that the agent is trying to control. Instead, these methods use the observational data obtained during the interaction of the agent with the environment to find the optimal control strategy. TRPO additionally constrains the change in the strategy between iterations to ensure that the new strategy does not differ too much from the previous one and does not lead to a worse reward.

More rigorously, during the learning process, the TRPO agent alternates between sampling data through interaction with the environment and updating the policy parameters by solving a constrained optimization problem. The Kullback–Leibler (KL) divergence between the old and new policies is used as a constraint during optimization. As a result, by keeping the updated policy within a trust region close to the current policy, the algorithm avoids the significant drops in performance that can occur with standard policy gradient methods [32].

Let $π_{θ}$ denote a policy with parameters $θ$. The theoretical TRPO update in this case can be written as follows:

\begin{matrix} θ_{k + 1} = \arg \max_{θ} & L (θ_{k}, θ) \\ \text{s.t.} & {\bar{D}}_{KL} (θ \| θ_{k}) \leq δ, \end{matrix}

(10)

where $L (θ_{k}, θ)$ is the surrogate advantage, a measure of how policy $π_{θ}$ performs relative to the old policy $π_{θ_{k}}$ using data from the old policy:

L (θ_{k}, θ) = E_{s, a \sim π_{θ_{k}}} \left[\frac{π_{θ} (a | s)}{π_{θ_{k}} (a | s)} A^{π_{θ_{k}}} (s, a)\right],

(11)

and ${\bar{D}}_{KL} (θ \| θ_{k})$ is an average KL divergence between policies across states visited by the old policy:

{\bar{D}}_{KL} (θ \| θ_{k}) = E_{s \sim π_{θ_{k}}} \left[D_{KL} (π_{θ} (\cdot | s) \| π_{θ_{k}} (\cdot | s))\right].

(12)

Here, $A^{π_{θ_{k}}} (s, a)$ is the advantage function of the old policy, and $δ$ bounds the size of the trust region.

We can visualize this in the manner presented in Figure 3.
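To make Equations (11) and (12) more tangible, the following sketch computes sample estimates of the surrogate advantage and the average KL divergence. It assumes, purely for illustration, a policy over a small discrete action set stored as probability tables, whereas the controller discussed in this paper acts on continuous quantities. All names and numbers are hypothetical.

```python
import math


def surrogate_advantage(pi_new, pi_old, actions, advantages):
    """Sample mean of (pi_new(a|s) / pi_old(a|s)) * A(s, a), cf. Equation (11)."""
    terms = [
        pi_new[i][a] / pi_old[i][a] * adv
        for i, (a, adv) in enumerate(zip(actions, advantages))
    ]
    return sum(terms) / len(terms)


def mean_kl(pi_new, pi_old):
    """Sample mean over states of D_KL(pi_new(.|s) || pi_old(.|s)), cf. Equation (12)."""
    per_state = [
        sum(p * math.log(p / q) for p, q in zip(new_row, old_row) if p > 0.0)
        for new_row, old_row in zip(pi_new, pi_old)
    ]
    return sum(per_state) / len(per_state)


# Two sampled states, two actions each (illustrative numbers).
pi_old = [[0.5, 0.5], [0.5, 0.5]]   # action probabilities under the old policy
pi_new = [[0.6, 0.4], [0.5, 0.5]]   # action probabilities under the candidate policy
actions = [0, 1]                    # action taken in each sampled state
advantages = [1.0, -1.0]            # advantage estimates under the old policy

L_hat = surrogate_advantage(pi_new, pi_old, actions, advantages)
kl_hat = mean_kl(pi_new, pi_old)
print(L_hat, kl_hat)
```

The candidate policy raises the probability of the action with positive advantage, so the surrogate is positive, while the KL term measures how far the candidate has moved from the old policy; a TRPO step accepts the candidate only if that distance stays below δ.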

In practice, it is quite difficult to implement the theoretical TRPO update. Therefore, the TRPO algorithm makes some approximations to obtain the answer sufficiently quickly. To this end, a Taylor expansion can be used to expand the objective and constraint to leading order around $θ_{k}$, which leads to an approximate optimization problem:

\begin{matrix} θ_{k + 1} = \arg \max_{θ} & g^{T} (θ - θ_{k}) \\ \text{s.t.} & \frac{1}{2} {(θ - θ_{k})}^{T} H (θ - θ_{k}) \leq δ. \end{matrix}

(13)

This approximate problem can be analytically solved by the methods of Lagrangian duality, yielding the solution:

θ_{k + 1} = θ_{k} + \sqrt{\frac{2 δ}{g^{T} H^{- 1} g}} H^{- 1} g.

(14)

TRPO adds a modification to this update rule, a backtracking line search:

θ_{k + 1} = θ_{k} + α^{j} \sqrt{\frac{2 δ}{g^{T} H^{- 1} g}} H^{- 1} g,

(15)

where $α \in (0, 1)$ is the backtracking coefficient, and j is the smallest nonnegative integer such that $π_{θ_{k + 1}}$ satisfies the KL constraint and produces a positive surrogate advantage.
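The update with backtracking line search in Equation (15) can be sketched on a toy problem. In the sketch, g is understood as the gradient of the surrogate advantage and H as the Hessian of the average KL divergence, both at θ_k, which is the usual TRPO reading; the text above does not define them explicitly. The product H⁻¹g is obtained with a small conjugate-gradient solver, a common implementation choice that is an assumption here rather than something the paper states. The toy surrogate and KL functions, and all numbers, are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_spd(H, g, iters=50, tol=1e-12):
    """Conjugate-gradient solve of H x = g for symmetric positive-definite H."""
    x = [0.0] * len(g)
    r = list(g)            # residual g - H x (x starts at zero)
    p = list(g)            # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Hp = [dot(row, p) for row in H]
        step = rs / dot(p, Hp)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def trpo_update(theta, g, H, delta, alpha, surrogate, kl, max_backtracks=10):
    """Equation (15): shrink the natural-gradient step by alpha**j until accepted."""
    x = solve_spd(H, g)                          # x = H^{-1} g
    scale = math.sqrt(2.0 * delta / dot(g, x))   # sqrt(2 delta / (g^T H^{-1} g))
    for j in range(max_backtracks):
        candidate = [t + (alpha ** j) * scale * xi for t, xi in zip(theta, x)]
        if kl(candidate) <= delta and surrogate(candidate) > 0.0:
            return candidate, j
    return list(theta), None   # no acceptable step found: keep old parameters


# Toy problem (hypothetical): linear surrogate, quadratic KL plus a quartic term
# so that the full step of Equation (14) slightly violates the constraint.
theta_k = [0.0, 0.0]
g = [1.0, 1.0]
H = [[2.0, 0.0], [0.0, 1.0]]
delta, alpha = 0.01, 0.5


def surrogate(theta):
    return dot(g, [t - tk for t, tk in zip(theta, theta_k)])


def kl(theta):
    d = [t - tk for t, tk in zip(theta, theta_k)]
    quadratic = 0.5 * dot(d, [dot(row, d) for row in H])
    return quadratic + dot(d, d) ** 2


theta_new, j_accepted = trpo_update(theta_k, g, H, delta, alpha, surrogate, kl)
print(theta_new, j_accepted)
```

In this toy case the full step is rejected because the quartic term pushes the KL value just above δ, and the first shrunken step (j = 1) is accepted. This mirrors the role of the line search: the quadratic model of Equation (13) is only approximate, so the true constraint is re-checked before the parameters are changed.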

Based on the above general mathematical description, TRPO was adapted to the VSWT control problem as follows:

At each step, the agent chooses an action $a_{t}$ based on the current state $s_{t}$ using the control strategy $a (s)$ ;
The agent interacts with the environment and receives a reward $r_{t}$ for the action performed;
The control strategy is updated by optimizing the value function $V (s)$ and the control policy function $a (s)$ using the TRPO method;
The value function $V (s)$ estimates the expected total reward from the current state $s_{t}$ to the end of episode $T$;
The control policy function $a (s)$ determines the probability of choosing each action $a_{t}$ based on the current state $s_{t}$ .

The functions $V (s)$ and $a (s)$ were optimized subject to the TRPO constraint on the change in the control strategy, i.e., the KL bound in (10). The control strategy was updated according to expression (15) until the optimal strategy was reached.
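The value function $V (s)$ described above is typically fitted to the total reward actually collected from each visited state until the end of the episode. The following is a minimal sketch of how such regression targets could be computed from one recorded episode. The discount factor `gamma` is an assumption (the text does not state one); with `gamma=1.0`, the result is the undiscounted total reward to the end of the episode.

```python
def rewards_to_go(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

undiscounted = rewards_to_go([1.0, 2.0, 3.0])
discounted = rewards_to_go([1.0, 2.0, 3.0], gamma=0.5)
```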

3.3. MIMO Controller Synthesis

This subsection presents the proposed architecture, which uses a TRPO-based method to simultaneously learn the parameters of the blade pitch controller $β$ and the generator torque controller $T_{g}$. The general control diagram is shown in Figure 4.

A single reward signal combines the responses of both subsystems in search of optimal performance for the entire VSWT:

r_{t} = w_{p} P_{rate} - w_{F} F_{rate} - w_{c} a_{t_{sum}}^{2},

(16)

where $P_{rate} = (P_{e, t} - P_{e, t - 1}) / P_{e, t - 1}$ and $F_{rate} = (F_{T, t} - F_{T, t - 1}) / F_{T, t - 1}$ are the relative changes in generated electric power and thrust, respectively; $w_{p}$, $w_{F}$, and $w_{c}$ are weight coefficients; and $a_{t_{sum}}$ is the sum of the control actions.
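The reward (16) is simple enough to write out directly. The sketch below is illustrative only: the default weights `w_p=0.8` and `w_f=0.2` are the values reported for the NREL 5 MW case study, whereas the value of `w_c` and the guard against a zero denominator (which would otherwise occur when the previous power or thrust is zero) are assumptions not specified in the text.

```python
def relative_change(current, previous, eps=1e-9):
    """(x_t - x_{t-1}) / x_{t-1}; returns 0 when the previous value is ~0 (assumed guard)."""
    if abs(previous) < eps:
        return 0.0
    return (current - previous) / previous

def reward(p_e, p_e_prev, f_t, f_t_prev, actions, w_p=0.8, w_f=0.2, w_c=0.0):
    """Eq. (16): reward power growth, penalize thrust growth and control effort."""
    p_rate = relative_change(p_e, p_e_prev)
    f_rate = relative_change(f_t, f_t_prev)
    a_sum = sum(actions)
    return w_p * p_rate - w_f * f_rate - w_c * a_sum ** 2

# Power rises by 10 %, thrust by 5 %; w_c = 0.01 is a hypothetical value.
r = reward(1100.0, 1000.0, 210.0, 200.0, actions=[0.1, -0.1], w_c=0.01)
```

With these numbers, the power term contributes $0.8 \times 0.1$ and the thrust term subtracts $0.2 \times 0.05$, so the reward is positive whenever power grows faster than the weighted thrust penalty.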

To solve the above problem using RL, the key components of the MDP were defined as follows:

Action space: Since the controller acts on the generator torque rate $T_{g_{rate}}$ [kNm/s] and the collective pitch rate $β_{rate}$ [deg/s], the allowed actions are these rates of change;
State space: The state of the wind turbine is selected as $s_{t} = (P_{e}, w_{t}, T_{g}, β, F_{T}, v)$ to characterize the operating parameters of the wind turbine, where $P_{e}$ is the electric power produced [kW]; $w_{t}$ is the rotor speed [rpm]; $T_{g}$ is the generator torque [kNm]; $β$ is the collective pitch [deg]; $F_{T}$ is the thrust [kN]; and $v$ is the wind speed [m/s];
Observation space: Each observation of the environment consists of the six dimensions of the state vector, $o_{t} = s_{t}$ ;
Transition probabilities: The transition probability $T (s^{'} | s, a)$ is a characteristic of the wind turbine dynamics. In this study, an OpenAI Gym environment was created using a model that realistically reproduces the behavior of a wind turbine by interacting with the open-source code CCBlade, which calculates aerodynamic forces using Blade Element Momentum (BEM) theory. This theory assumes that a blade can be divided into small elements, called “blade elements”, each with its own aerodynamic characteristics, such as angle of attack, lift coefficient, and drag coefficient. The BEM equations then give the thrust and moment generated by each element of the blade. The approach used reduces the BEM equations to a one-dimensional residual function of the local inflow angle $ϕ$ :

$R (ϕ) = \frac{sin ϕ}{1 - a (ϕ)} - \frac{cos ϕ}{λ_{r} (1 + a^{'} (ϕ))} = 0$

(17)

Reducing the BEM equations to a one-dimensional residual function means that they can be represented as a single equation that depends on one variable only, i.e., the local inflow angle $ϕ$, with the axial and tangential induction factors $a$ and $a^{'}$ expressed as functions of $ϕ$. The equation can therefore be solved with a bracketed one-dimensional root-finding method, which yields the aerodynamic loads for whatever blade pitch the RL agent selects. The study presented in [33] demonstrated, through mathematical proof, that the methodology always finds a bracket around a zero of $R (ϕ)$ without any singularities in the interior. This proof, along with existing proofs for root-finding methods such as Brent’s method [34], implies that a solution is guaranteed. The CCBlade model accounts for both hub and tip losses using the Prandtl method and includes a high-induction-factor correction [35]. Drag is included in the calculation of the induction factors.
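To make the residual (17) concrete, the following sketch solves it for a single blade element. It is a toy illustration under several assumptions and is not the CCBlade implementation: the airfoil is modeled with a thin-airfoil lift curve and constant drag, hub and tip losses and the high-induction correction are omitted, plain bisection is swapped in for Brent's method to keep the example dependency-free, and the local solidity `sigma_p`, the combined twist and pitch `theta`, and all numerical values are hypothetical. The induction factors follow the usual momentum-region closure $a = κ / (1 + κ)$ and $a^{'} = κ^{'} / (1 - κ^{'})$.

```python
import math

def induction_terms(phi, sigma_p, theta, cd=0.01):
    """Return (kappa, kappa_p) for one blade element at inflow angle phi [rad]."""
    alpha = phi - theta                              # angle of attack [rad]
    cl = 2.0 * math.pi * alpha                       # toy thin-airfoil lift (assumption)
    cn = cl * math.cos(phi) + cd * math.sin(phi)     # normal force coefficient
    ct = cl * math.sin(phi) - cd * math.cos(phi)     # tangential force coefficient
    kappa = sigma_p * cn / (4.0 * math.sin(phi) ** 2)
    kappa_p = sigma_p * ct / (4.0 * math.sin(phi) * math.cos(phi))
    return kappa, kappa_p

def residual(phi, lambda_r, sigma_p, theta):
    """R(phi) of Eq. (17), using 1/(1 - a) = 1 + kappa and 1/(1 + a') = 1 - kappa_p."""
    kappa, kappa_p = induction_terms(phi, sigma_p, theta)
    return (math.sin(phi) * (1.0 + kappa)
            - math.cos(phi) * (1.0 - kappa_p) / lambda_r)

def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Bracketed root search; requires a sign change of f on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical element: local speed ratio 5, local solidity 0.05, twist + pitch 0.05 rad.
lambda_r, sigma_p, theta = 5.0, 0.05, 0.05
phi_star = bisect(lambda p: residual(p, lambda_r, sigma_p, theta), 1e-6, math.pi / 2)
kappa, kappa_p = induction_terms(phi_star, sigma_p, theta)
a = kappa / (1.0 + kappa)            # axial induction factor
a_prime = kappa_p / (1.0 - kappa_p)  # tangential induction factor
```

Changing `theta` shifts the angle of attack of the element and hence the converged induction factors, which is how a pitch command from the agent feeds into the aerodynamic loads in a model of this kind.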

Thus, the TRPO agent learns both the weight vector $θ_{k + 1}$ according to (15), which corresponds to the current policy $π_{θ_{k + 1}}$, and the parameter vectors that correspond to each base controller. Figure 4 shows these parameter vectors, grouped into the following two sets of weights:

${\vec{θ}}^{β}$ is the parameter vector of the linear function approximation of the blade pitch controller $β$ ;
${\vec{θ}}^{T_{g}}$ is the parameter vector of the linear function approximation of the generator torque controller $T_{g}$ .

Thus, the TRPO agent generates two vector policies corresponding to the parameters of each subsystem controller: ${\vec{π}}^{β}$ and ${\vec{π}}^{T_{g}}$.

4. Experiments

Testing the proposed approach involved a series of simulation experiments conducted using the NREL 5 MW baseline turbine and Enercon E-126 EP3 4.0 MW. The NREL 5 MW was developed by the National Renewable Energy Laboratory (NREL) for testing and evaluating wind turbine technology and is one of the most widely used prototype wind turbines in the world. The NREL 5 MW baseline turbine has a tower height of about 80 m and a rotor diameter of 126 m (Figure 5). It is equipped with a three-bladed rotor that can rotate at a speed of 6 to 20 revolutions per minute. The generator has a capacity of 5 MW and generates enough energy to power more than 1400 homes. In the context of our problem, it is important to note that this wind turbine model is well suited for use in the OpenAI Gym environment.

The Enercon E-126 EP3 4.0 MW is a newer version of the E-126 wind turbine model manufactured by Enercon GmbH, a German wind turbine manufacturer. It has a rotor diameter of 127 m and a hub height of up to 135 m. The rotor blades are made of a hybrid fiberglass–carbon material and have a variable-pitch mechanism, which allows the angle of attack of the blades to be adjusted to optimize energy capture at different wind speeds.

4.1. Dynamic Model

The experiments relied on a specially designed Gym wind turbine environment, which strikes a balance between simplicity and realism, so that the results obtained approximate those expected from more extensive aeroelastic programs such as FAST (Fatigue, Aerodynamics, Structures, and Turbulence). The environment interacts with the aerodynamic code CCBlade to calculate aerodynamic forces. CCBlade is an open-source code developed at NREL that computes the aerodynamic loads on wind turbine blades using BEM theory. CCBlade takes into account the effects of aerodynamic and geometric blade warping and the effects of aerodynamic interaction between the blades. A simplified drivetrain model was then added so that the controller could be implemented. The specific models used were the reference 5 MW VSWT presented in [36] and the Enercon VSWT considered for the Valentia offshore wind farm (Ireland). The dimensionless power curve ($λ$ vs. $C_{p}$) for the NREL 5 MW reference model is shown in Figure 6.

The parameters of the environment used in all experiments are shown in Table 2 and Table 3. The wind turbine specifications determine the maximum rate of change per second. However, since the simulation advanced in steps of $1 / 20$ of a second ($d t = 0.05$), the maximum change allowed at each decision step was $1 / 20$ of the per-second limit.
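The scaling of the actuator limits by the time step can be sketched as follows. The numerical limits are those listed in Table 2 and Table 3; the function and dictionary names are hypothetical, and clipping the resulting setpoint to the Table 2 range is an assumption about how the environment enforces its bounds.

```python
DT = 0.05  # simulation step [s]

MAX_RATE = {"torque": 15.0, "pitch": 8.0}                    # Table 3: kN·m/s and deg/s
LIMITS = {"torque": (0.606, 47.403), "pitch": (0.0, 90.0)}   # Table 2: kNm and deg

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def apply_action(value, action, name):
    """Advance one actuator by one decision step.

    `action` is the requested change over this step; it is limited to
    +/- MAX_RATE * DT before being added to the current setpoint.
    """
    max_step = MAX_RATE[name] * DT
    step = clip(action, -max_step, max_step)
    return clip(value + step, *LIMITS[name])

torque_next = apply_action(10.0, 5.0, "torque")   # request exceeds 15 * 0.05 = 0.75 kNm
pitch_next = apply_action(0.0, -1.0, "pitch")     # cannot go below 0 deg
```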

4.2. Case Study of the NREL 5 MW

In this experiment, the TRPO agent reward function (16) was a linear combination of power (80%) and thrust (20%) generated in the range [−200, 5600]. This means that the assumed weight variables were $w_{p} = 0.8$ and $w_{F} = 0.2$. At the same time, there is an optimal rotor speed at which the wind turbine produces maximum power at each wind speed. For example, for this reference model, the optimum rotor speed was 10 rpm at a wind speed of approximately 8 m/s; this could be achieved by setting the generator torque to 10.147 kNm with a collective blade pitch angle of 5 degrees. This is the operating point that the TRPO algorithm had to find by setting different rates of change at each time step.
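The operating point quoted above can be located on the power curve of Figure 6 by converting it to a tip-speed ratio. The short calculation below assumes the usual definition $λ = ω R / v$ and takes the rotor radius $R = 63$ m from the 126 m rotor diameter given in Section 4; $ω$ denotes the rotor speed in rad/s and is introduced here for illustration.

```latex
λ = \frac{ω R}{v} = \frac{(2 π \cdot 10 / 60) \ \text{rad/s} \times 63 \ \text{m}}{8 \ \text{m/s}} \approx 8.25
```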

Each instance of the training experiment of the TRPO agent consisted of 100 training episodes. If the limits were not reached, the environment terminated after 60 s of simulation time (2400 time steps). As a result, the TRPO agent learned about $8 \times 10^{3}$ of the total simulation time. Figure 7 shows the change in average reward over the training episodes. This graph demonstrates a stable trend of increasing agent reward, indicating the effectiveness of the trained controller design.

Two simulation scenarios of the dynamic environment model were considered to test the trained MIMO controller: (1) with a sequential increase in wind speed, and (2) with random changes in wind speed. It is evident that the second scenario was closer to the real operating conditions of the wind turbine.

Figure 8 shows the input and output variables of the trained MIMO controller with sequentially increasing wind speed and the change in the reward function for the test simulation. This figure shows that the TRPO agent found the following optimal policy $π_{θ}$. At low wind speeds (up to approximately 10 m/s), the generator torque was increased gradually through $T_{g_{rate}}$ (action 0). With a further increase in wind speed, the collective blade pitch angle was additionally changed through $β_{rate}$ (action 1) to stabilize the rotor speed $w_{t}$ and thrust $F_{T}$. Thus, the trained controller successfully solved the problem of stabilizing the rotor speed and thrust when the wind speed changed. Figure 8a also shows that the average values of the agent’s reward changed rapidly at the beginning. Although the controller did not settle at a single stable point by the end of the simulation, the optimal reward values (close to zero) were reached repeatedly, which indicates that the controller successfully solved the wind turbine control problem. Figure 8b shows the results of testing the trained controller for a more “stressful” yet more realistic scenario with random changes in wind speed. It can be seen that, in this case, the agent followed the strategy described above, attempting to adapt the available actions to sudden changes in external wind conditions.

Thus, the figures demonstrate the stable trend of increasing agent reward during training and the successful adaptation of the controller to changing wind conditions. Overall, the results suggest that the TRPO-based MIMO controller is effective in solving the wind turbine control problem.

4.3. Case Study of the Enercon E-126 EP3 4.0 MW

The Valentia offshore wind farm is a wind farm located on Valentia Island in County Kerry, on the west coast of Ireland. It was commissioned in 2018 and has an installed capacity of 10 MW. The wind farm consists of three Enercon E-70 wind turbines, each with a height of 64 m and a rotor diameter of 70 m. The turbines generate electricity, which is fed into the grid and supplied to consumers. Valentia Island Wind Park is one of the first wind farms in Ireland to be based on wind energy. It contributes to the development of renewable energy in the region and helps to reduce greenhouse gas emissions, which is an important step in the fight against climate change.

Although Enercon E-70 wind turbines are currently installed at this location, the E-126 EP3 model was considered in this example because this wind turbine features a new control system, which is intended to improve the efficiency and performance of the turbine [37]. The control system uses advanced algorithms to optimize the rotor speed and blade pitch in real time, based on data from sensors that measure wind speed and other factors. This model is better adapted for use with the proposed controller. To model wind conditions in the Valentia offshore wind farm area, the NREL TurbSim software (version 1.50) was used. This program utilizes wind field observation data and models atmospheric turbulence using mathematical equations. As a result of the modeling, parameters characterizing the wind field at a given point, including wind speed, were obtained (Figure 9). The reward function settings for the TRPO agent and its training parameters were the same as in the previous case study.

In the presented test simulation of the Enercon E-126 EP3 (Figure 10), which was controlled by the proposed controller, it is evident that, at low wind speeds of up to 4–5 m/s, the VSWT was in the parking region and the generator did not output power. As the wind speed increased to 10–12 m/s, the controller increased the generator torque through $T_{g_{rate}}$ to achieve the maximum power output conditions according to the MPPT strategy and obtained the optimal policy $π_{θ}$. The tested case did not include higher wind speeds of 18–25 m/s, at which generator torque alone can no longer limit the rotor speed and a more significant change in blade pitch angle through $β_{rate}$ would be required.

5. Conclusions

This paper describes an RL architecture for optimizing the parameters of a wind turbine MIMO controller using a TRPO-based agent and a base controller. Additional state variables that are independent of the base controller parameters were used to characterize the operating state of the system, allowing the TRPO agent to configure the controller parameters for each operating state. The proposed architecture optimizes the parameters of each subsystem controller in a coordinated manner to maximize power generation and minimize unwanted forces. This differs from traditional VSWT control approaches, which split the control problem into two separate subproblems.

The proposed approach was tested using realistic simulations that reproduced the behavior of a wind turbine using CCBlade to simulate the aerodynamic performance of an NREL 5 MW wind turbine reference model and an E-126 EP3 real model. Computational experiments showed that the proposed architecture can improve the base controllers through learning the parameters that maximize the global performance of the VSWT as a function of input variables that were left out of consideration by the original base controllers.

Funding

This research was supported by the Russian Science Foundation, project No. 19-19-00673.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MIMO	Multiple inputs and multiple outputs
VSWT	Variable-speed wind turbine
RL	Reinforcement learning
MPC	Model Predictive Control
AI	Artificial Intelligence
TRPO	Trust Region Policy Optimization
MPPT	Maximum power point tracking
MDP	Markov decision process
BEM	Blade Element Momentum
LQR	Linear quadratic regulator
DFIG	Doubly-Fed Induction Generator
FAST	Fatigue, Aerodynamics, Structures, and Turbulence
NREL	National Renewable Energy Laboratory
KL	Kullback–Leibler

References

1. Fernandez-Gauna, B.; Graña, M.; Osa-Amilibia, J.-L.; Larrucea, X. Actor-critic continuous state reinforcement learning for wind-turbine control robust optimization. Inf. Sci. 2022, 591, 365–380.
2. Khezami, N.; Benhadj Braiek, N.; Guillaud, X. Wind turbine power tracking using an improved multimodel quadratic approach. ISA Trans. 2010, 49, 326–334.
3. Boukhezzar, B.; Siguerdidjane, H. Nonlinear Control of a Variable-Speed Wind Turbine Using a Two-Mass Model. IEEE Trans. Energy Convers. 2011, 26, 149–162.
4. Rubio, J.O.M.; Aguilar, L.T. Maximizing the performance of variable speed wind turbine with nonlinear output feedback control. Procedia Eng. 2012, 35, 31–40.
5. Vali, M.; van Wingerden, J.-W.; Kühn, M. Optimal multivariable individual pitch control for load reduction of large wind turbines. In Proceedings of the 2016 American Control Conference (ACC), Boston, MA, USA, 6–8 July 2016; pp. 3163–3169.
6. Li, J.; Wang, S.; Hou, Z.; Zhao, J. Multivariable Model-Free Adaptive Controller Design with Differential Characteristic for Load Reduction of Wind Turbines. IEEE Trans. Energy Convers. 2022, 37, 1106–1114.
7. Evangelista, C.; Valenciaga, F.; Puleston, P. Active and Reactive Power Control for Wind Turbine Based on a MIMO 2-Sliding Mode Algorithm With Variable Gains. IEEE Trans. Energy Convers. 2013, 28, 682–689.
8. Yang, B.; Jiang, L.; Wang, L.; Yao, W.; Wu, Q.H. Nonlinear maximum power point tracking control and modal analysis of DFIG based wind turbine. Int. J. Electr. Power Energy Syst. 2016, 74, 429–436.
9. Muhando, E.B.; Senjyu, T.; Urasaki, N.; Yona, A.; Funabashi, T. Robust Predictive Control of Variable-Speed Wind Turbine Generator by Self-Tuning Regulator. In Proceedings of the 2007 IEEE Power Engineering Society General Meeting, Tampa, FL, USA, 24–28 June 2007; pp. 1–8.
10. Pintea, A.; Wang, H.; Christov, N.; Borne, P.; Popescu, D.; Badea, A. Optimal control of variable speed wind turbines. In Proceedings of the 2011 19th Mediterranean Conference on Control & Automation (MED), Corfu, Greece, 20–23 June 2011; pp. 838–843.
11. Lemmer, F.; Schlipf, D.; Cheng, P.W. Control design methods for floating wind turbines for optimal disturbance rejection. J. Phys. Conf. Ser. 2016, 753, 092006.
12. Wright, A.D.; Fingersh, L.J.; Stol, K.A. Design and Testing Controls to Mitigate Tower Dynamic Loads in the Controls Advanced Research Turbine. NREL/CP-500-40932, National Renewable Energy Laboratory. January 2007. Available online: http://www.nrel.gov/docs/fy07osti/40932.pdf (accessed on 13 July 2023).
13. Soliman, M.; Malik, O.P.; Westwick, D.T. Multiple model MIMO predictive control for variable speed variable pitch wind turbines. In Proceedings of the 2010 American Control Conference, Baltimore, MD, USA, 30 June–2 July 2010; pp. 2778–2784.
14. Novak, J.; Chalupa, P. MIMO Predictive Control of a Wind Turbine. Int. J. Energy Environ. 2014, 8, 22–38.
15. Sudarsana Reddy, K.; Mahalakshmi, R. A MIMO-Based Compatible Fuzzy Logic Controller for DFIG-Based Wind Turbine Generator. In Artificial Intelligence and Technologies; Raje, R.R., Hussain, F., Kannan, R.J., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2022; Volume 806.
16. Meisam, M.; Francisco, J.; Verdú, R.R.A.; Augustine, A. Hybrid biomass, solar and wind electricity generation in rural areas of Fez-Meknes region in Morocco considering water consumption of animals and anaerobic digester. Appl. Energy 2023, 343, 121253.
17. Rezaei, M.M. A nonlinear maximum power point tracking technique for DFIG-based wind energy conversion systems. Eng. Sci. Technol. Int. J. 2018, 21, 901–908.
18. Njiri, J.G.; Söffker, D. State-of-the-art in wind turbine control: Trends and challenges. Renew. Sustain. Energy Rev. 2016, 60, 377–393.
19. Boukhezzar, B.; Lupu, L.; Siguerdidjane, H.; Hand, M. Multivariable control strategy for variable speed, variable pitch wind turbines. Renew. Energy 2007, 32, 1273–1287.
20. Vidal, Y.; Acho, L.; Luo, N.; Zapateiro, M.; Pozo, F. Power Control Design for Variable-Speed Wind Turbines. Energies 2012, 5, 3033–3050.
21. Chatterjee, J.; Dethlefs, N. Scientometric review of artificial intelligence for operations & maintenance of wind turbines: The past, present and future. Renew. Sustain. Energy Rev. 2021, 144, 111051.
22. Apata, O.; Oyedokun, D.T.O. An overview of control techniques for wind turbine systems. Sci. Afr. 2020, 10, e00566.
23. Kamel, R.M.; Chaouachi, A.; Nagasaka, K. Three Control Strategies to Improve the Microgrid Transient Dynamic Response During Isolated Mode: A Comparative Study. IEEE Trans. Ind. Electron. 2013, 60, 1314–1322.
24. Chowdhury, M.A.; Hosseinzadeh, N.; Shen, W.X. Smoothing wind power fluctuations by fuzzy logic pitch angle controller. Renew. Energy 2012, 38, 224–233.
25. Zeddini, M.A.; Pusca, R.; Sakly, A.; Mimouni, M.F. PSO-based MPPT control of wind-driven Self-Excited Induction Generator for pumping system. Renew. Energy 2016, 95, 162–177.
26. Iqbal, A.; Ying, D.; Saleem, A.; Hayat, M.A.; Mateen, M. Proposed particle swarm optimization technique for the wind turbine control system. Meas. Control. 2020, 53, 1022–1030.
27. Saenz-Aguirre, A.; Zulueta, E.; Fernandez-Gamiz, U.; Ulazia, A.; Teso-Fz-Betono, D. Performance enhancement of the artificial neural network-based reinforcement learning for wind turbine yaw control. Wind Energy 2020, 23, 687–701.
28. Sierra-Garcia, J.E.; Santos, M.; Pandit, R. Wind turbine pitch reinforcement learning control improved by PID regulator and learning observer. Eng. Appl. Artif. Intell. 2022, 111, 104769.
29. Li, H.; He, H. Learning to Operate Distribution Networks With Safe Deep Reinforcement Learning. IEEE Trans. Smart Grid 2022, 13, 1860–1872.
30. Wang, Z.; Yu, Y.; Gao, W.; Davari, M.; Deng, C. Adaptive, Optimal, Virtual Synchronous Generator Control of Three-Phase Grid-Connected Inverters Under Different Grid Conditions—An Adaptive Dynamic Programming Approach. IEEE Trans. Ind. Inform. 2022, 18, 7388–7399.
31. Vu, N.T.T.; Nguyen, H.D.; Nguyen, A.T. Reinforcement Learning-Based Adaptive Optimal Fuzzy MPPT Control for Variable Speed Wind Turbine. IEEE Access 2022, 10, 95771–95780.
32. Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Bach, F., Blei, D., Eds.; Proceedings of Machine Learning Research. Volume 37, pp. 1889–1897. Available online: http://proceedings.mlr.press/v37/schulman15.pdf (accessed on 8 June 2023).
33. Ning, S.A. A simple solution method for the blade element momentum equations with guaranteed convergence. Wind Energy 2013, 17, 1327–1345.
34. Brent, R.P. An algorithm with guaranteed convergence for finding a zero of a function. Comput. J. 1971, 14, 422–425.
35. Buhl, M.L. A New Empirical Relationship between Thrust Coefficient and Induction Factor for the Turbulent Windmill State; NREL/TP-500-36834; National Renewable Energy Laboratory: Golden, CO, USA, 2005; pp. 32–58.
36. Jonkman, J.; Butterfield, S.; Musial, W.; Scott, G. Definition of a 5-MW Reference Wind Turbine for Offshore System Development; Technical Report; National Renewable Energy Laboratory: Golden, CO, USA, 2009.
37. Enercon E-126 EP3 4.0MW. Retrieved from wind-turbine-models.com. 2018. Available online: https://www.enercon.de/fileadmin/Redakteur/Medien-Portal/windblatt/pdf/Windblatt_03_18_GB_Web.pdf (accessed on 8 June 2023).

Figure 1. Operating regions of a typical wind turbine. Adapted from [17].

Figure 2. Agent–environment interaction loop.

Figure 3. Visualization of the TRPO algorithm concept.

Figure 4. Diagram of the proposed architecture for optimizing the parameters of a robust RL-based MIMO controller.

Figure 5. Rotor–nacelle assembly of the NREL 5 MW baseline turbine.

Figure 6. Power coefficient as a function of tip–speed ratio.

Figure 7. Mean reward over episodes.

Figure 8. The simulation plots for the RL-based MIMO controller for the NREL 5 MW: (a) sequentially increasing wind speed, (b) random variation of wind speed.

Figure 9. The variation in wind speed over the course of a year obtained in NREL TurbSim for the Valentia offshore wind farm.

Figure 10. The simulation plots for the RL-based MIMO controller for the Enercon E-126 EP3 4.0 MW.

Table 1. MIMO controllers for wind turbines.

Reference	Method	MIMO Controller Operation Principle	Controlled Parameters
[5]	Mixed sensitivity $H_{\infty}$ optimization	MIMO individual pitch controller	Blade root flap-wise bending moments
[6]	Dual multivariable model-free adaptive control strategy	Passive MIMO fault-tolerant individual pitch controller	The components of each blade
[7]	Second-order sliding modes and Lyapunov methods	MIMO second-order sliding controller	Reactive power and generator torque
[8]	Partial linearization and Lyapunov stability method	MIMO controller that achieves fully decoupled control of the external dynamics of a DFIG-based wind turbine	Multiple variables
[9]	Self-tuning regulator	MIMO pitch + generator speed controller	Blade pitch angle and generator torque
[10,11,12]	Linear quadratic regulator	MIMO pitch + generator speed controller	Blade pitch angle and generator torque
[13,14]	Model predictive control and fuzzy logic	MIMO pitch + generator speed controller	Blade pitch angle and generator torque
[1]	Reinforcement learning	MIMO pitch + generator speed controller	Blade pitch angle and generator torque
[15]	Fuzzy logic	MIMO-Based MSC+GSC controller	Modulation indexes for the GSC and MSC controllers.

Table 2. The environment parameter values.

Index	Name and Units	Min	Max
1	Wind speed [m/s]	3	25
2	Power generated [kW]	0	7000
3	Thrust [kN]	0	1000
4	Rotor speed [rpm]	0	15
5	Generator torque [kNm]	0.606	47.403
6	Collective pitch [deg]	0	90

Table 3. The control actions values.

Index of Actions $a_{t}$	Name and Units	Min	Max
0	Generator torque rate [kN·m/s]	$- 15.0 \cdot dt$	$15.0 \cdot dt$
1	Collective pitch rate [deg/s]	$- 8.0 \cdot dt$	$8.0 \cdot dt$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

