Article

Reinforcement-Learning-Based Virtual Energy Storage System Operation Strategy for Wind Power Forecast Uncertainty Management

Department of Electrical and Electronic Engineering, Hanseo University, Chungcheongnam-do 31962, Korea
Appl. Sci. 2020, 10(18), 6420; https://doi.org/10.3390/app10186420
Submission received: 31 August 2020 / Revised: 10 September 2020 / Accepted: 11 September 2020 / Published: 15 September 2020

Abstract

Uncertainties related to wind power generation (WPG) restrict its usage. Energy storage systems (ESSs) are key elements in managing this uncertainty. This study proposes a reinforcement learning (RL)-based virtual ESS (VESS) operation strategy for WPG forecast uncertainty management. A VESS logically shares a physical ESS among multiple units, and VESS operation reduces the cost barrier of the ESS. In this study, a VESS operation model is proposed that considers not only a unit's own operation but also the operation of other units, and the VESS operation problem is formulated as a sequential decision-making problem. To solve this problem, a policy-learning strategy based on an expected state-action-reward-state-action (SARSA) approach is proposed that is robust to variations in uncertainty. Moreover, multi-dimensional clustering is performed on the WPG forecast data of multiple units to enhance performance. Simulation results using real datasets recorded by the U.S. National Renewable Energy Laboratory demonstrate that the proposed strategy provides near-optimal performance, with a gap of less than 2 percentage points from the optimal solution. In addition, the performance of the VESS operation is enhanced by multi-user diversity gain in comparison with individual ESS operation.

1. Introduction

The use and development of renewables have grown continuously in the power sector owing to climate change, with renewable power generation becoming the second-largest source in the electricity mix in 2018 [1]. More than 200 gigawatts of renewable capacity was installed in 2019, the largest annual increase to date [2]. Renewable-based power capacity is expected to grow by 50% between 2019 and 2024 [3]. In particular, forecasts predict that wind capacity installations will triple by 2024 [3].
Wind power generation (WPG) is subject to high fluctuations and intermittent properties. The characteristics of WPG make it difficult to ensure power system reliability [4,5]. Although various wind power forecasting methods such as the ensemble method [6], aggregated probabilistic method [7], and machine learning-based method [8] have been researched, uncertainty cannot be completely eliminated owing to the nature of wind-resource phenomena.
An energy storage system (ESS) plays an essential role in managing the uncertainty of WPG [9]. ESSs for WPG are used in various applications such as frequency regulation [10], ramp rate mitigation [11], and demand response [12]. The basic role of an ESS is to charge surplus energy and discharge the stored energy according to the operational objective. Therefore, the primary issue in using an ESS is its effective operation. Gomes et al. proposed a stochastic mixed-integer linear programming approach to manage the mismatch caused by renewable power generation uncertainty using ESS operation in the day-ahead market [13]. Sperstad and Korpas presented stochastic ESS scheduling over an extended planning horizon [14]. Kalavani et al. proposed stochastic ESS scheduling considering a demand response [15]. Yuan et al. suggested a revised genetic algorithm (GA) to solve the ESS operation optimization problem for economic dispatch [16]. Khare proposed particle swarm optimization and chaotic particle swarm optimization algorithms to minimize energy cost in renewable systems with ESSs [17]. Liu et al. suggested a chicken swarm optimization algorithm to improve the reliability of ESSs for renewables [18]. Oh and Son presented a frequency-domain-analysis-based ESS operation algorithm to reduce the uncertainty of WPG [19]. These studies show that ESSs effectively manage uncertainty to enhance the utilization of WPG. However, there are two constraints: first, the performance of an ESS is strongly related to its capacity [19,20,21]; second, effective ESS operation in conventional stochastic and meta-heuristic approaches requires environment modeling of the WPG characteristics.
This study focuses on reinforcement learning (RL)-based virtual ESS (VESS) operation to manage the uncertainty of WPG. A VESS (also called a cloud ESS or shared ESS) is an ESS that is linked to different end-units and promotes coordination [22]. A VESS virtually operates a physical ESS to serve several units. Virtualization reduces the investment cost of the ESS and increases its utilization [23,24]. By applying a VESS, the limitation of the ESS capacity can be addressed. Moreover, RL is a model-free approach [25]. In recent years, various studies have been conducted using RL for demand response [26], micro-grid systems [27,28], frequency oscillation control [29], and energy trading [30]. The RL-based, model-free approach reduces the difficulties related to modeling.
The goal of this study is to design an RL-based VESS operation strategy to manage WPG forecast uncertainty. The VESS logically shares the physical ESS. The suggested VESS operation system considers not only a unit's own operation but also the operation of other units. Under this system, the proposed RL approach to determine the VESS operation is presented. Most previous studies have used Q-learning-based RL approaches. However, in WPG environments, Q-learning is not efficient owing to the high penetration of uncertainty [31]. In this study, an expected state-action-reward-state-action (SARSA)-based RL approach is applied to obtain a solution that is more robust to WPG forecast fluctuations. Moreover, multi-dimensional data clustering that considers the WPG forecast of each unit is combined to enhance the proposed strategy. A simulation study using real datasets recorded by the National Renewable Energy Laboratory (NREL) of the U.S. demonstrates that the proposed strategy provides near-optimal performance.
The main contributions of this study are summarized as follows:
1. This study proposes a VESS operation strategy based on an expected SARSA-based RL approach that is more robust to WPG forecast fluctuations. To the best of our knowledge, this is the first work to apply an RL approach to VESS operation;
2. The proposed strategy is combined with multi-dimensional data clustering to enhance the policy-learning performance of the RL approach;
3. The effects of the VESS and clustering are carefully discussed, and a usage case of the proposed strategy is suggested.
The rest of this paper is organized as follows. In Section 2, the system description, including the forecast uncertainty model and the VESS system, and the formulation of the VESS operation problem are presented; in Section 3, the design of the proposed RL-based VESS operation strategy is discussed. In Section 4, simulation studies applying the proposed strategy to real WPG profiles are demonstrated, and in Section 5, conclusions are presented.

2. System Description and Problem Formulation

2.1. System Description

2.1.1. Uncertainty Model

In this study, a group of WPGs $\mathcal{U} = \{1, \dots, u, \dots, U\}$, such as a wind farm, was considered to be connected to the grid, as shown in Figure 1. To connect to the grid, the WPG operator forecasts the power generation over $\mathcal{T} = \{1, \dots, t, \dots, T\}$, e.g., $T = 24$ h for day-ahead operation. Let $g_t^u$ and $\hat{g}_t^u$ be the actual WPG and its forecast for the $u$-th WPG at time $t$, respectively. The $u$-th WPG forecast uncertainty at time $t$ is defined as

$e_t^u = \hat{g}_t^u - g_t^u.$  (1)

An ESS is operated to manage the uncertainty by charging or discharging energy. By applying the ESS, the uncertainty is calculated as

$\varepsilon_t^u = e_t^u + q_t^u,$  (2)

where $q_t^u$ is the ESS charging/discharging quantity for the $u$-th WPG at time $t$.
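For concreteness, the two definitions above can be sketched numerically. The profiles below are illustrative toy values, not taken from the study's dataset, and the ESS quantities are chosen to cancel the error exactly.

```python
import numpy as np

# Toy single-unit example of Equations (1)-(2) over T = 4 hours.
g_hat = np.array([12.0, 15.0, 10.0, 8.0])   # forecast WPG (MW), illustrative
g = np.array([11.0, 16.5, 10.0, 6.5])       # actual WPG (MW), illustrative

e = g_hat - g        # forecast uncertainty, Eq. (1)
q = -e               # ESS quantities chosen here to offset the error exactly
eps = e + q          # residual uncertainty with ESS support, Eq. (2)
```

In practice the ESS cannot always realize `q = -e` because of the capacity and efficiency constraints introduced next.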

2.1.2. VESS System

The ESS is constructed in two parts: a power subsystem (PS) and an energy subsystem (ES). The PS capacity limits instantaneous charging and discharging power. The ES stores the energy, and its capacity determines the ESS service time. The ESS operation is performed within these two constraint regions in the individual ESS [21]. However, the VESS is operated by logically dividing one physical ESS over several units. Therefore, the VESS operation region is limited not only by the ESS capacity of the PS and the ES, but also by the operation of each unit.
Let $a_t^u$ be the ESS charging or discharging action for the $u$-th WPG at decision time $t$. The action is first restricted by the PS capacity $C_{PS}$:

$-C_{PS} \le a_t^u \le C_{PS}, \quad \forall t \in \mathcal{T}.$  (3)

Moreover, in a VESS, the action is limited by the actions of the other WPGs. Including this constraint, the action range in Equation (3) is modified as

$-C_{PS} - \sum_{i \in \mathcal{U} \setminus \{u\}} a_t^i \le a_t^u \le C_{PS} - \sum_{i \in \mathcal{U} \setminus \{u\}} a_t^i, \quad \forall t \in \mathcal{T}.$  (4)

Considering the charging/discharging efficiency $\eta$, the actual operation quantity $q_t^u$ in Equation (2) is calculated as

$q_t^u = \begin{cases} \eta\, a_t^u, & \text{if } a_t^u \le 0, \\ a_t^u / \eta, & \text{if } a_t^u > 0. \end{cases}$  (5)
Second, the action is performed within the stored-energy range, referred to as the state of charge (SoC). The SoC at decision time $t$, $s_t$, is given by

$s_t = s_{t-1} + \sum_{u \in \mathcal{U}} q_t^u \, \Delta T,$  (6)

where $\Delta T$ is the operation time interval. The SoC is limited by the ES capacity $C_{ES}$:

$C_{ES}^{\min} \le s_t \le C_{ES}^{\max}, \quad \forall t \in \mathcal{T}.$  (7)

Herein, $C_{ES}^{\min}$ and $C_{ES}^{\max}$ denote the minimum and maximum operable ES capacity, respectively. These values are determined by considering ESS characteristics such as the depth of discharge (DoD).
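The constraints above can be sketched as a per-stage feasibility check. The capacity values and the sign convention (positive actions as one direction of operation) are illustrative assumptions of this sketch; the efficiency mapping follows Equation (5) as stated.

```python
# Sketch of the VESS feasibility checks in Equations (4)-(7).
C_PS = 5.0                       # power subsystem capacity (MW), illustrative
C_ES_MIN, C_ES_MAX = 1.0, 9.0    # operable energy range (MWh), illustrative
ETA = 0.95                       # one-way charging/discharging efficiency
DT = 1.0                         # operation time interval (h)

def actual_quantity(a):
    """Map an action to the actual operation quantity q, as in Equation (5)."""
    return ETA * a if a <= 0 else a / ETA

def feasible(actions, soc_prev):
    """Check one decision stage: the accumulated action of all units must
    respect the PS capacity, and the updated SoC must stay in the ES range."""
    if abs(sum(actions)) > C_PS:
        return False
    soc = soc_prev + sum(actual_quantity(a) for a in actions) * DT
    return C_ES_MIN <= soc <= C_ES_MAX
```

The shared PS bound is what couples the units: one unit's feasible range shrinks as the other units act, exactly as Equation (4) expresses.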
Note that power system requirements such as line flow and frequency should be considered for implementation in practical systems. However, this study focuses on the forecast error management aspect of the WPG; hence, the simple model used in [23,24] is considered herein. VESS operation considering power system requirements remains an open problem.

2.2. Problem Formulation

The aim of this study is to determine the VESS operation action required to manage WPG forecast uncertainty. Particularly, this study only deals with the WPG forecast error minimization problem. Therefore, the mean absolute error (MAE) is considered to be the performance metric of uncertainty management.
The MAE is calculated as

$O(\mathbf{a}) = \frac{1}{T} \sum_{u \in \mathcal{U}} \sum_{t \in \mathcal{T}} |e_t^u + q_t^u| = \frac{1}{T} \sum_{u \in \mathcal{U}} \sum_{t \in \mathcal{T}} |\varepsilon_t^u|,$  (8)

where $\mathbf{a} = \{a_1^1, \dots, a_t^u, \dots, a_T^U\}$.
Considering the VESS constraints, the VESS operation problem for managing WPG forecast uncertainty can be formulated as

$\min_{\mathbf{a}} \; O(\mathbf{a}) \quad \text{subject to (4) and (7)}.$  (9)
If perfect information is available, such as the actual WPG output at future times, the problem in Equation (9) can be solved optimally using iteration-based search algorithms such as the gradient descent and Newton methods [32]. However, this non-causal assumption cannot be implemented in the real world. In this study, the solution obtained using perfect information, including future times, serves as the optimal benchmark against which the performance of the proposed VESS operation strategy is compared.
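For a tiny instance, the perfect-information benchmark can even be sketched by exhaustive search over a discretized action grid (rather than the iteration-based algorithms mentioned above). All numbers below are illustrative: a single aggregated unit, unit efficiency, and $T = 3$.

```python
import itertools

import numpy as np

# Toy single-unit instance of problem (9), assuming perfect (non-causal)
# knowledge of the forecast errors and eta = 1.
e = np.array([1.0, -2.0, 0.5])        # known forecast errors over T = 3
action_grid = [-1.0, 0.0, 1.0]        # discretized ESS actions
S0, S_MIN, S_MAX = 1.0, 0.0, 2.0      # initial SoC and ES bounds

best_mae, best_a = float("inf"), None
for a in itertools.product(action_grid, repeat=len(e)):
    soc = S0 + np.cumsum(a)           # SoC trajectory, Eq. (6) with eta = 1
    if soc.min() < S_MIN or soc.max() > S_MAX:
        continue                      # violates the ES capacity bound (7)
    mae = np.mean(np.abs(e + np.array(a)))
    if mae < best_mae:
        best_mae, best_a = mae, a
```

Exhaustive search scales exponentially in $T$ and is only a sanity check; it is not the approach the study uses.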

3. Method

As shown in Equation (9), ESS operation is a sequential decision-making (SDM) problem. The SDM problem is mathematically formulated as a state-action space model, and the transition probability among states is required to solve the problem optimally. However, the RL approach estimates the transition behavior through learning, so it requires only the state-action space model to solve the SDM problem in Equation (9) [25].

3.1. State-Action Space

The state-action space for an individually operated ESS is described using a one-dimensional model [31]. The VESS is operated as one physical ESS, so the state-action space for the VESS is also represented as a one-dimensional model. However, whereas the action of an individually operated ESS is limited only by the ESS capacity, the action range of the VESS is determined according to the accumulated actions of all units.
When the VESS is operated over the decision sequence $\mathcal{T}$, the state-action space has $T + 1$ decision stages, including the initial stage, as shown in Figure 2. The state-action space for the RL approach can only be handled in a discrete model. Therefore, the ESS operation is quantized with a unit action step $\delta$, as shown in Figure 2. Accordingly, the state and action sets are expressed as

$\mathcal{S} = \{s_1, \dots, s_i, \dots, s_{\kappa_s}\}, \quad \mathcal{A} = \{a_{-\kappa_a}, \dots, a_j, \dots, a_{\kappa_a}\},$  (10)

where $a_t$ is the accumulated action over all units, $a_t = \sum_{u \in \mathcal{U}} a_t^u$. Considering the PS and ES capacity constraints in Equations (4) and (7), $\kappa_s$ and $\kappa_a$ are calculated as

$\kappa_s = \lfloor (C_{ES}^{\max} - C_{ES}^{\min}) / \delta \rfloor, \quad \kappa_a = \lfloor C_{PS} / \delta \rfloor,$  (11)
where $\lfloor \cdot \rfloor$ is the floor operation. The discretized ESS operation introduces a quantization error, but the error is bounded according to the step size [33].
At each stage $t$, the current state is defined as the current SoC, $s_t$. The next state, $s_{t+1}$, is determined by the current state and the selected action $a_{t+1}$, following Equation (6):

$\langle s_t = s_i, \; a_{t+1} = a_j \rangle \;\rightarrow\; s_{t+1}.$  (12)

Herein, action $a_t$ should be selected from the feasible action range according to the current state:

$\mathcal{A}_t = \{a_{j^{\min}}, \dots, a_j, \dots, a_{j^{\max}}\},$  (13)

where $j^{\min} = \max(-\kappa_a, 1 - i)$ and $j^{\max} = \min(\kappa_a, \kappa_s - i)$. As an example, in Figure 2, the feasible action ranges for the first action $a_1$ and the second action $a_2$ are $\mathcal{A}_1 = \{a_{-2}, a_{-1}, a_0, a_1, a_2\}$ and $\mathcal{A}_2 = \{a_{-2}, a_{-1}, a_0\}$, respectively.
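The feasible index range in Equation (13) is easy to sketch. The values $\kappa_s = 5$ and $\kappa_a = 2$ below are assumptions chosen so that the example ranges quoted for Figure 2 are reproduced.

```python
# Feasible action indices from Eq. (13), assuming kappa_s = 5 and kappa_a = 2.
KAPPA_S, KAPPA_A = 5, 2

def feasible_action_indices(i):
    """Feasible action indices j at the current SoC state s_i."""
    j_min = max(-KAPPA_A, 1 - i)
    j_max = min(KAPPA_A, KAPPA_S - i)
    return list(range(j_min, j_max + 1))

A1 = feasible_action_indices(3)      # full range, as A_1 in the example
A2 = feasible_action_indices(3 + 2)  # after taking a_2, as A_2 in the example
```

Starting from $s_3$, every action is feasible; after taking $a_2$ the state reaches the top SoC level, so only non-positive actions remain, matching $\mathcal{A}_2$.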

3.2. Decision Policy Design

An RL-based VESS operation strategy determines the decision rule of the current action among the feasible action ranges at each stage in the state-action space. The decision rule is designed to maximize the reward provided by the VESS operation.
The goal of the VESS operation is to minimize the forecast uncertainty given by the objective function in (9). At stage $t$, the operator is unaware of the uncertainty beyond time $t$. Therefore, the uncertainty involved in the VESS operation at stage $t$ is expressed as

$O_t(\mathbf{a}_t \mid \boldsymbol{\varepsilon}_t) = \frac{1}{T} \sum_{m=t}^{T} \sum_{u \in \mathcal{U}} |\varepsilon_m^u| = \frac{1}{T} \sum_{u \in \mathcal{U}} |\varepsilon_t^u| + \frac{1}{T} \sum_{m=t+1}^{T} \sum_{u \in \mathcal{U}} |\hat{\varepsilon}_m^u| = \frac{1}{T} \sum_{u \in \mathcal{U}} |\varepsilon_t^u| + O_{t+1}(\hat{\mathbf{a}}_{t+1} \mid \hat{\boldsymbol{\varepsilon}}_{t+1}),$  (14)

where the values with a hat represent expected values.
As shown in Equation (14), the uncertainty comprises the current uncertainty and the expected uncertainty at future times. In the RL approach, the current uncertainty performance of the VESS operation is defined as the instantaneous reward of the current decision action at each stage. The reward at stage $t$ is

$r_t = \frac{1}{T} \sum_{u \in \mathcal{U}} |\varepsilon_t^u|.$  (15)
Moreover, the accumulated uncertainty is defined as the return, i.e., the accumulated reward from time $t$ onward:

$R_t = r_t + \gamma r_{t+1} + \cdots + \gamma^{T-t} r_T = r_t + \gamma R_{t+1},$  (16)

where $\gamma \in (0, 1]$ is the discount factor, which reduces the risk of the expected value from future decision times. The return in Equation (16) is the weighted uncertainty performance of the VESS operation in Equation (14).
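The backward recursion in Equation (16) can be sketched directly; the reward sequence below is illustrative.

```python
# Discounted return R_t = r_t + gamma * R_{t+1}, Eq. (16).
GAMMA = 0.95

def discounted_return(rewards):
    """Accumulate rewards from the last stage backwards."""
    R = 0.0
    for r in reversed(rewards):
        R = r + GAMMA * R
    return R
```

For the rewards `[1.0, 2.0, 3.0]`, the recursion yields `3.0`, then `2.0 + 0.95 * 3.0`, then `1.0 + 0.95 * 4.85`.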
The RL-based decision-making approach is used to determine the VESS operation action that minimizes the return, i.e., the accumulated uncertainty. For this, the state-action value function is defined, which represents the quality of an action $a_t$ at a given state $s_t$, as follows:

$Q(s_t, a_t) = \mathrm{E}[R_t \mid s_t, a_t] = \mathrm{E}[r_t + \gamma Q(s_{t+1}, a_{t+1}) \mid s_t, a_t].$  (17)
If the probability of selecting an action at each state is known, $\pi = \Pr(a_t \mid s_t), \; \forall t \in \mathcal{T}, s_t \in \mathcal{S}, a_t \in \mathcal{A}_t$, the optimal state-action value function $Q^*(s_t, a_t)$ is obtained using the Bellman optimality equation [34]:

$Q^*(s_t, a_t) = \mathrm{E}\big[r_t + \gamma \min_{a_{t+1} \in \mathcal{A}_{t+1}} Q(s_{t+1}, a_{t+1}) \mid s_t, a_t\big] = \mathrm{E}[r_t + \gamma Q^*(s_{t+1}, a_{t+1}) \mid s_t, a_t],$  (18)

and the optimal action is determined as

$a_t^* = \arg\min_{a_t \in \mathcal{A}_t} Q^*(s_t, a_t) = \arg\min_{a_t \in \mathcal{A}_t} \mathrm{E}[r_t \mid s_t, a_t] + \gamma Q^*(s_{t+1}, a_{t+1}).$  (19)

However, it is impractical to determine the transition probability of an action at each state.
In the RL approach, the state-action value function is estimated by learning. In the widely used Q-learning-based RL approaches [26,27,28,29], the state-action value function is estimated as

$Q^{QL}(s_t, a_t) \leftarrow (1 - \alpha) Q^{QL}(s_t, a_t) + \alpha \big[ r_t + \gamma \min_{a_{t+1} \in \mathcal{A}_{t+1}} Q^{QL}(s_{t+1}, a_{t+1}) \big],$  (20)

where $\alpha \in (0, 1]$ is the learning rate. Moreover, the action is determined as

$a_t^{QL} = \arg\min_{a_t \in \mathcal{A}_t} \big[ r_t + \gamma \min_{a_{t+1} \in \mathcal{A}_{t+1}} Q^{QL}(s_{t+1}, a_{t+1}) \big].$  (21)

However, the WPG has a high uncertainty variance [19]. This variance reduces the reliability of expected values at future times, such as $Q^{QL}(s_{t+1}, a_{t+1})$. Therefore, the Q-learning-based RL approach cannot guarantee uncertainty management performance in WPG environments [31].
Instead of employing the minimum value as in the Q-learning-based approach, the expected SARSA-based RL approach uses the expected state-action value to decide the action. The expectation reduces the risk of variance at future times [25]. In the expected SARSA-based RL approach, the action is determined as

$a_t^{eSARSA} = \arg\min_{a_t \in \mathcal{A}_t} \big[ r_t + \gamma \, \mathrm{E}_{\mathcal{A}_{t+1}} \{ Q^{eSARSA}(s_{t+1}, a_{t+1}) \} \big].$  (22)

In addition, the state-action value function is updated as

$Q^{eSARSA}(s_t, a_t) \leftarrow (1 - \alpha) Q^{eSARSA}(s_t, a_t) + \alpha \big[ r_t + \gamma \, \mathrm{E}_{\mathcal{A}_{t+1}} \{ Q^{eSARSA}(s_{t+1}, a_{t+1}) \} \big].$  (23)
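The action rule (22) and update (23) can be sketched as follows. Taking the expectation over the feasible next actions as uniform is an assumption of this sketch (the weighting policy is not spelled out above); `Q` is a table keyed by (state, action).

```python
import numpy as np

# Expected-SARSA sketch for a cost-minimizing agent; uniform expectation
# over feasible next actions is an assumption of this sketch.
ALPHA, GAMMA = 0.1, 0.95

def expected_q(Q, s_next, feasible_next):
    """Uniform expectation of Q over the feasible next actions."""
    return float(np.mean([Q[(s_next, a)] for a in feasible_next]))

def esarsa_action(Q, candidates):
    """Eq. (22): `candidates` maps each feasible action to its
    (reward, next_state, feasible_next_actions) triple."""
    return min(candidates, key=lambda a: candidates[a][0] +
               GAMMA * expected_q(Q, candidates[a][1], candidates[a][2]))

def esarsa_update(Q, s, a, r, s_next, feasible_next):
    """Eq. (23): move Q(s, a) toward r + gamma * E[Q(s', A')]."""
    target = r + GAMMA * expected_q(Q, s_next, feasible_next)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * target
    return Q
```

Replacing the expectation in `expected_q` with a minimum over `feasible_next` would recover the Q-learning rule (20)-(21); the averaging is exactly what damps the variance of the forward-time estimate.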

3.3. Multi-Dimensional Clustering

The determined action for the VESS operation in (22) is an accumulated action set to manage the uncertainty of each WPG, i.e., $\mathbf{a}_t = \{a_t^1, \dots, a_t^U\}$. The expected SARSA approach reduces the expected risk at future times. However, the multi-dimensional action renders convergence difficult and also reduces the uncertainty management performance of the VESS operation.
To mitigate this effect of multi-dimensional action, data classification is considered. Data classification is a technique that involves the categorization of data to enable organization for effective operation [35]. With RL approaches, data classification can enhance the learning performance of the state-action value function [36].
In this study, the k-means clustering method is applied, which performs vector quantization of the data into $K$ clusters [37]. With $K$ cluster centroids, $\mathbf{c} = \{c_1, \dots, c_k, \dots, c_K\}$, the method classifies the data into $K$ cluster sets, $\mathcal{C} = \{\mathcal{C}_1, \dots, \mathcal{C}_k, \dots, \mathcal{C}_K\}$, as follows:

$\arg\min_{\mathbf{c}} \sum_{k=1}^{K} \sum_{\hat{\mathbf{g}}^u \in \mathcal{C}_k} \| \hat{\mathbf{g}}^u - c_k \|^2,$  (24)

where $\hat{\mathbf{g}}^u = \{\hat{g}_1^u, \dots, \hat{g}_T^u\}$ and $\|\cdot\|^2$ denotes the squared Euclidean norm. The k-means clustering problem in (24) is iteratively solved by Lloyd's algorithm, which determines the centroids of Voronoi diagrams [37].
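Lloyd's algorithm alternates between assigning each profile to its nearest centroid and recomputing each centroid as a cluster mean. The sketch below makes the initialization and iteration-count choices explicit; both are illustrative assumptions.

```python
import numpy as np

def lloyd_kmeans(X, K, iters=50, seed=0):
    """Lloyd's algorithm for the k-means problem (24). Each row of X is one
    forecast profile; centroids are initialized from random rows (an
    illustrative choice) and refined by assign/recompute iterations."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(iters):
        # squared Euclidean distances of every profile to every centroid
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                c[k] = X[labels == k].mean(axis=0)
    return labels, c
```

Each assign/recompute pass is exactly the Voronoi-cell step mentioned above: the assignment induces Voronoi cells around the centroids, and the mean update moves each centroid to its cell's center of mass.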

3.4. RL-Based VESS Operation Strategy

The proposed strategy comprises data clustering for enhancing the learning performance and policy learning to determine the VESS operation action. The proposed strategy is described as follows (Algorithm 1).
Algorithm 1. Proposed RL-based VESS operation algorithm
  Data clustering
1:  Initialization
2:   Set the number of clusters to $K$.
3:   Initialize centroids $c_k$ using historical WPG forecasting data.
4:  Data clustering
5:   Set cluster $k$ as $k = \arg\min_k \sum_{u \in \mathcal{U}} \| \hat{\mathbf{g}}^u - c_k \|^2$.
6:   Update $c_k$ including $\hat{\mathbf{g}}^u$.
  Policy learning
7:  Initialization
8:   Set $Q^{eSARSA}$ as $Q_k$ from $\mathbf{Q} = \{Q_1, \dots, Q_K\}$.
9:   Set $s_1$ as the current SoC and $\mathcal{A}_1$ using (13).
10:  Policy learning
11:  for $t = \{1, \dots, T\}$
12:   Set $a_t^{eSARSA}$ in $\mathcal{A}_t$ using (22).
13:   Update $s_{t+1}$, $\mathcal{A}_{t+1}$, and $Q^{eSARSA}$ using (12) and (23).
14:  end for
First, to apply k-means clustering, the number of clusters is set to $K$, and the centroids are initialized using the historical WPG forecast data to solve (24) (steps 2 and 3). The cluster number $k$ of a dataset is selected as the cluster with the minimum Euclidean distance to the cluster centroid in step 5. The cluster number is used to select the active state-action value function for the policy-learning process. Moreover, the centroid of the selected cluster is updated considering the dataset in step 6.
Combined with the $K$ clusters, $K$ state-action value functions are required. In the policy-learning process, the $k$-th state-action value function is loaded as the active state-action value function, $Q^{eSARSA}$, according to the cluster number $k$ in step 8. The initial state $s_1$ is set as the current VESS condition, and the feasible action range $\mathcal{A}_1$ is determined by the current state $s_1$ in step 9. During the operation time horizon $T$, the VESS operation action is selected using (22), and the state and the state-action value function are updated according to the selected action in steps 12 and 13.
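The steps above can be condensed into one operation-day routine. The environment hooks (`initial_state`, `feasible_actions`, `next_state`, `reward`) and the shapes of `centroids` and `Q_tables` are assumptions of this sketch, not the study's reference implementation.

```python
import numpy as np

def run_day(g_hat, centroids, Q_tables, env, alpha=0.1, gamma=0.95):
    """One day of Algorithm 1: cluster selection, then an expected-SARSA
    episode over the T decision stages."""
    # Steps 4-6: pick the cluster nearest to the stacked forecast profiles.
    x = np.ravel(g_hat)
    k = int(np.argmin([np.sum((x - c) ** 2) for c in centroids]))

    Q = Q_tables[k]                    # step 8: active state-action values
    s = env.initial_state()            # step 9: current SoC state
    for t in range(env.T):             # steps 11-14
        # Step 12, Eq. (22): minimize r + gamma * E[Q(s', A')].
        def cost(a):
            s2 = env.next_state(s, a)
            eq = np.mean([Q[(s2, a2)] for a2 in env.feasible_actions(s2)])
            return env.reward(s, a, t) + gamma * eq
        a = min(env.feasible_actions(s), key=cost)
        # Step 13, Eqs. (12) and (23): advance the state and update Q.
        s_next = env.next_state(s, a)
        target = env.reward(s, a, t) + gamma * np.mean(
            [Q[(s_next, a2)] for a2 in env.feasible_actions(s_next)])
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
        s = s_next
    return k, Q
```

Because the $k$-th table is both read and updated, each cluster accumulates experience only from days whose forecast profiles resemble its centroid, which is how clustering sharpens the learned policy.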

4. Results and Discussion

To verify the performance of the proposed strategy, simulation results were evaluated. In the simulation, five WPG datasets recorded by the National Renewable Energy Laboratory to develop eastern wind resources in the United States from 2004 to 2006 were employed [38]. Each WPG had a capacity of 20 megawatts (MW). Day-ahead forecasting data were provided at a 1-h time resolution. Therefore, the operation time horizon was set to 24 h, $\mathcal{T} = \{1, \dots, 24\}$.
The simulation results were measured using the data from the first 14 days of December 2006, and the remaining data were used for RL training. For policy learning, the learning rate and discount factor were set to $\alpha = 0.1$ and $\gamma = 0.95$, respectively. The number of clusters was set to three; however, a discussion of the cluster size is also presented below.
A lithium-ion-based ESS was considered, as it is widely used in renewable energy systems [39]. The charging/discharging efficiency $\eta$ was set to 0.95, which provides a round-trip efficiency of approximately 0.9, and the DoD margin that restricts the minimum and maximum operable ES capacity was 0.1. The ES capacity was expressed as the normalized WPG capacity, i.e., in per unit (p.u.), and the service time was 2 h, corresponding to a 0.5 charging rate (C-rate).
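The parameter values above are related as follows; how the DoD margin maps to the operable ES range is an assumption of this sketch.

```python
# Relations among the simulation ESS parameters described in the text.
ETA = 0.95
ROUND_TRIP = ETA * ETA                 # one-way 0.95 -> ~0.9 round trip
SERVICE_TIME_H = 2.0
C_RATE = 1.0 / SERVICE_TIME_H          # 2-h service time -> 0.5 C-rate
DOD_MARGIN = 0.10

def operable_range(c_es):
    """Assumed mapping from the DoD margin to the operable ES range."""
    return DOD_MARGIN * c_es, (1.0 - DOD_MARGIN) * c_es
```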
The simulations in this study were implemented on a 64-bit PC with a 4 GHz quad-core Intel Core i7 CPU and 32 GB RAM, using MATLAB R2020a with IBM CPLEX Optimization Studio.

4.1. Performance Results of Proposed Strategy

Figure 3 shows the uncertainty management performance, as MAE, with varying VESS size. The black line with circles, the red line with squares, and the blue line with diamonds present the results obtained by applying the optimal solution, the proposed method, and the stochastic method, respectively. The optimal solution is the solution to problem (9) with known information, including future times, and the stochastic method is the VESS operation according to the probabilistic information of the WPG suggested in [40]. The VESS size represents the available operating room for managing uncertainty; therefore, increasing the size reduces the MAE, as shown in Figure 3a. In particular, in Figure 3a, the optimal solution and the proposed method show a similar slope with increasing VESS size, whereas the result obtained with the stochastic method decreases less significantly. This implies that the optimal solution and the proposed method operate effectively according to the environment, while the stochastic method does not. The stochastic method is designed according to the Markov decision process, similar to the proposed method. However, the stochastic method, applying the backward induction approach in [40], predetermines the reserved capacity for future decision stages according to the probabilistic information of the WPG, so its operational diversity is lower than those of the optimal solution and the proposed method. Figure 3b shows the optimal gap, which represents the difference from the optimal solution. The optimal gap of the proposed method is less than 2 percentage points and decreases with increasing size. The proposed method can effectively consider environmental characteristics through learning and clustering, and therefore achieves gain with increasing size. However, the stochastic method cannot reflect this; therefore, its optimal gap increases with increasing size.

4.2. Effect of VESS

Figure 4 compares the MAE of the individual ESS operation and the proposed VESS operation. The results for the individual ESS operations are the optimal solutions of problem (9) reformulated for each WPG. Each WPG presents different uncertainties, so the decrease in slope with increasing size also differs, as shown in Figure 4a. However, the proposed VESS operation outperforms all individual operation results. Individual ESS operation uses only its own information, whereas the proposed VESS operation uses information from multiple units. Therefore, the proposed VESS operation achieves a multi-user diversity gain [41]. Figure 4b verifies the diversity gain. Increasing the size increases the operational availability, and the proposed VESS operation effectively exploits this availability through multi-user diversity. Therefore, the gain of the VESS operation over the individual ESS operation is enhanced with increasing operational availability.

4.3. Effect of Clustering

Figure 5 shows the optimal gap of the proposed method for the 1-, 3-, and 5-cluster cases. As shown in Figure 5, the optimal gap is reduced as the number of clusters increases. In particular, with five clusters, an optimal gap improvement of more than 1.5 percentage points is obtained when the ESS size is 0.6 p.u. This indicates that clustering is an effective way to enhance the performance of the proposed method. However, the performance increase from three to five clusters is smaller than that from one to three clusters. This is because the distance between the centroids decreases as the number of clusters increases. Moreover, increasing the number of clusters also increases the number of state-action value functions required for policy learning, which increases the system complexity of the implementation. Therefore, it is important to set an appropriate number of clusters by considering both the performance enhancement and the system complexity. For example, in this study operating five WPG units, three clusters are efficient considering the performance enhancement, as shown in Figure 5.

4.4. Usage of the Proposed Strategy

The VESS operation applying the proposed strategy achieves higher forecast error management performance than individual ESS operation. For example, when the MAE target of each WPG is set to 1.5, individual ESS operation requires an ESS size larger than 1 p.u. for each unit, as shown in Figure 4a, which is not economically viable. In the proposed VESS case, however, an ESS size of 0.2 p.u. per WPG suffices for the same target. This enables a business model, such as a VESS service, that provides economic benefits by reducing the required ESS size. Moreover, by increasing the number of clusters, the ESS size can be further reduced, as shown in Figure 5. The number of clusters affects the number of state-action value functions, which is related to the memory size and the computational complexity. Therefore, a VESS service provider can select the ESS size and the number of clusters by considering the ESS cost, the memory cost, and the computational complexity, as well as the WPG forecast error management target.

5. Conclusions

This study proposed an RL-based VESS operation strategy to manage WPG forecast uncertainty. The VESS operation model is the first to consider not only a unit's own uncertainty management requirement, but also the requirements of other units. Applying the VESS model, an expected SARSA-based learning policy is suggested to solve the sequential decision-making problem of the VESS operation. Moreover, the k-means data clustering method is employed to enhance the performance of the proposed strategy by reducing uncertainty variance. The simulation results demonstrate that the proposed strategy provides near-optimal performance, with a gap of less than 2 percentage points from the optimal solution, which requires information including future times. Moreover, the MAE improvement of the proposed method has a slope similar to that of the optimal method with respect to storage size. This shows that the proposed method attains operational diversity similar to that of the optimal method and can generally achieve near-optimal performance. In addition, we evaluated the performance achieved by the VESS operation in terms of multi-user diversity and the effect of the clustering method according to the number of clusters.
Research on VESS operation is at an early stage. This study shows that VESS operation can outperform individual ESS operation. However, the performance enhancement provided by the VESS operation differs for each unit. Therefore, VESS operation considering the performance balance among units will be the subject of further research. Moreover, this study only considers a simple system model; by including power system requirements, the system model can be extended further toward practical implementation. Finally, the forecast error management of WPG is highly related to revenue, and VESS operation is more cost-efficient than individual ESS operation. Therefore, this study can be extended to research on economic aspects, such as a revenue maximization problem considering ESS costs.

Funding

This work was supported by 2020 Research Grant of Hanseo University.

Conflicts of Interest

The author declares no conflict of interest.

Figure 1. Wind power generation system model with a virtual energy storage system; VESS: virtual energy storage system, and RL: reinforcement learning.
Figure 2. State-action space model for a virtual energy storage system; PS: power subsystem, ES: energy subsystem, and ESS: energy storage system.
Figure 3. Mean absolute error (MAE) comparison among the optimal solution, the proposed method, and the stochastic method. (a) Mean absolute error; (b) Optimal gap.
Figure 4. MAE comparison between individual ESS operation and VESS operation. (a) Mean absolute error; (b) Optimal gap.
Figure 5. Change in the optimal gap of the proposed method for the 1-, 3-, and 5-cluster cases.

Share and Cite

Oh, E. Reinforcement-Learning-Based Virtual Energy Storage System Operation Strategy for Wind Power Forecast Uncertainty Management. Appl. Sci. 2020, 10, 6420. https://doi.org/10.3390/app10186420
