1. Introduction
The use and development of renewables has grown continuously in the power sector owing to climate change, with renewable power generation becoming the second-largest source in the electricity mix in 2018 [1]. More than 200 gigawatts of renewable capacity were installed in 2019, the largest increase to date [2]. Renewable-based power capacity is expected to grow by 50% between 2019 and 2024 [3]. In particular, forecasts predict that wind capacity installations will triple by 2024 [3].
Wind power generation (WPG) is highly fluctuating and intermittent, which makes it difficult to ensure power system reliability [4,5]. Although various wind power forecasting methods, such as the ensemble method [6], the aggregated probabilistic method [7], and machine learning-based methods [8], have been researched, uncertainty cannot be completely eliminated owing to the nature of wind resources.
An energy storage system (ESS) plays an essential role in managing the uncertainty of WPG [9]. ESSs for WPG are used in various applications, such as frequency regulation [10], ramp rate mitigation [11], and demand response [12]. The basic role of an ESS is to store surplus energy and discharge it according to the operational objective. Therefore, the primary issue in using an ESS is operating it effectively. Gomes et al. proposed a stochastic mixed-integer linear programming approach that uses ESS operation to manage the mismatch caused by renewable power generation uncertainty in the day-ahead market [13]. Sperstad and Korpas presented stochastic ESS scheduling over an extended planning horizon [14]. Kalavani et al. proposed stochastic ESS scheduling that considers demand response [15]. Yuan et al. suggested a revised genetic algorithm (GA) to solve the ESS operation optimization problem for economic dispatch [16]. Khare proposed particle swarm optimization and chaotic particle swarm optimization algorithms to minimize energy cost in renewable systems with ESS [17]. Liu et al. suggested a chicken swarm optimization algorithm to improve the reliability of ESS for renewables [18]. Oh and Son presented a frequency domain analysis-based ESS operation algorithm to reduce the uncertainty of WPG [19]. These studies show that ESSs effectively manage uncertainty to enhance the utilization of WPG. However, there are two limitations. First, the performance of the ESS is strongly related to the ESS capacity [19,20,21]. Second, for effective ESS operation, conventional stochastic and meta-heuristic approaches require environment modeling of the characteristics of WPG.
This study focuses on reinforcement learning (RL)-based virtual ESS (VESS) operation to manage the uncertainty of WPG. A VESS (also called a cloud ESS or shared ESS) is an ESS that is linked to different end-units and promotes coordination among them [22]. The VESS virtually operates the physical ESS to serve several units. Virtualization reduces the investment cost of the ESS and increases its utilization [23,24]. Applying a VESS therefore addresses the limitation of ESS capacity. Moreover, RL is a model-free approach [25]. In recent years, RL has been studied for demand response [26], micro-grid systems [27,28], frequency oscillation control [29], and energy trading [30]. The RL-based, model-free approach reduces the difficulties related to modeling.
The goal of this study is to design an RL-based VESS operation strategy to manage WPG forecast uncertainty. The VESS logically shares the physical ESS. The suggested VESS operation system considers not only a unit's own operation but also the operation of other units. For this system, an RL approach that determines the VESS operation is proposed. Most previous studies used Q-learning-based RL approaches. However, in WPG environments, Q-learning is not efficient owing to the high level of uncertainty [31]. In this study, the expected state-action-reward-state-action (SARSA)-based RL approach is applied to obtain a solution that is more robust to WPG forecast fluctuation. Moreover, multi-dimensional data clustering that considers the WPG forecast of each unit is combined with it to enhance the proposed strategy. A simulation study using real datasets recorded by the U.S. National Renewable Energy Laboratory (NREL) demonstrates that the proposed strategy provides near-optimal performance.
The main contributions of this study are summarized as follows:
(1) This study proposes a VESS operation strategy based on an expected SARSA-based RL approach, which provides a solution that is more robust to WPG forecast fluctuation. To the best of our knowledge, this is the first work to apply the RL approach to VESS operation;
(2) The proposed strategy is combined with multi-dimensional data clustering to enhance the policy learning performance of the RL approach;
(3) The effects of the VESS and of clustering are discussed in detail, and a use case for the proposed strategy is suggested.
The rest of this paper is organized as follows. Section 2 describes the system, including the forecasting uncertainty model and the VESS system, and formulates the VESS operation problem. Section 3 discusses the design of the proposed RL-based VESS operation strategy. Section 4 presents simulation studies that apply the proposed strategy to real WPG profiles, and Section 5 concludes the study.
3. Method
As shown in Equation (9), ESS operation is a sequential decision-making (SDM) problem. An SDM problem is mathematically formulated as a state-action space model, and the transition probability among states is required to solve it optimally. The RL approach, however, learns this information through a learning algorithm instead of requiring it in advance, so it needs only the state-action space model to solve the SDM problem in Equation (9) [25].
3.1. State-Action Space
The state-action space for the individually operated ESS is described using a one-dimensional model [31]. The VESS is operated as a physical ESS, so the state-action space for the VESS is also presented as a one-dimensional model. However, the action of an individually operated ESS is limited only by the ESS capacity, whereas the action range of the VESS is determined by the accumulated actions of all units.
When the VESS is operated during
the decision sequence, the state-action space is at
a decision stage including the initial stage, as shown in
Figure 2. The RL approach operates only on a discrete state-action space. Therefore, the ESS operation is quantized by the unit action step
, as shown in
Figure 2. Accordingly, all state and action sets are expressed as
where
is the accumulated action for all units
. Considering the PS and ES capacity constraints in Equations (4) and (7),
and
are calculated as
where
is the floor operation. Quantizing the ESS operation introduces a quantization error, but the error is bounded according to the step size [33].
At each stage
, the current state is defined as the current SoC,
. The next state,
, is determined by the current state and the selected action
set by Equation (6) as follows:
Herein, action
should be selected from within the feasible action range according to the current state
where
and
. As an example, in
Figure 2, the feasible action range for the first action
and the second action
become
and
, respectively.
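As a rough illustration of the discrete state-action space described above, the following Python sketch quantizes a quantity by flooring it to unit steps, lists the feasible integer actions from a given SoC step, and applies the transition. It is a minimal sketch under stated assumptions: the function names, the sign convention (positive actions charge), and the lossless transition are hypothetical simplifications, since Equations (10) to (13) are not reproduced here and the source model includes charging/discharging efficiency.

```python
import math


def quantize(value, unit_step):
    """Floor a continuous quantity to a whole number of unit action steps."""
    return math.floor(value / unit_step)


def feasible_actions(soc_step, max_soc_step, max_action_step):
    """Return the integer action steps allowed from the current SoC step.

    Assumption: positive actions charge, negative actions discharge.
    The range is clipped by the power rating (max_action_step) and by
    the room left in the storage (0 .. max_soc_step).
    """
    lowest = max(-max_action_step, -soc_step)
    highest = min(max_action_step, max_soc_step - soc_step)
    return list(range(lowest, highest + 1))


def next_state(soc_step, action_step):
    """Lossless transition on the quantized grid (efficiency is ignored)."""
    return soc_step + action_step
```

With an empty storage of 10 steps and a power rating of 3 steps, only charging (or idling) is feasible, which mirrors how the feasible range depends on the current state.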
3.2. Decision Policy Design
An RL-based VESS operation strategy determines the decision rule that selects the current action from the feasible action range at each stage in the state-action space. The decision rule is designed to optimize the reward obtained from the VESS operation; because the reward defined below measures forecast uncertainty, this means minimizing it.
The goal of the VESS operation is to minimize the forecast uncertainty presented in the objective function in (9). At stage
, the operator is unaware of the uncertainty of forward time over
. Therefore, the uncertainty included in the VESS operation at stage
is expressed as
where the values with a hat represent the expected values.
As shown in Equation (14), the uncertainty comprises the accumulated uncertainty of the current and expected uncertainties at the future time. In the RL approach, the current uncertainty performance of the VESS operation is defined as the instantaneous reward value from the current decision action at each stage. The reward at state
is presented as
Moreover, the accumulated uncertainty is defined as the return that is the accumulated reward from time
onward, and is calculated as
where
is the discount factor,
, which reduces the risk of the expected value from the onward decision time. The return in Equation (16) becomes the weighted uncertainty performance of the VESS operation in Equation (14).
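The return can be illustrated with the standard discounted sum used in RL. The sketch below assumes the conventional form, in which the reward k stages ahead is weighted by the discount factor raised to the power k; the exact indexing of Equation (16) is not visible here, so the function name and indexing are assumptions.

```python
def discounted_return(rewards, gamma):
    """Accumulate rewards from the current stage onward.

    rewards[0] is the reward at the current stage and rewards[k] the
    reward k stages later, weighted by gamma ** k.
    """
    total = 0.0
    # Work backward so that each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

A discount factor below 1 gives less weight to uncertain values further in the future, which is the risk-reducing role described above.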
The RL-based decision-making approach is used to determine the VESS operation action that minimizes the reward, which measures the uncertainty. For this purpose, the state-action value function is defined to represent the quality of an action,
, at a given state
, as follows
If the transition probability of an action at each state is known,
, the optimal state-action value function
is measured using the Bellman optimality equation [
34]
and the optimal action is determined as
However, it is impractical to attempt to determine the transition probability of an action at each state.
In the RL approach, the state-action value function is estimated by learning. In widely used Q-learning-based RL approaches [
26,
27,
28,
29], the state-action value function is estimated as
where
is the learning rate in
. Moreover, the action is determined as
However, the WPG has a high uncertainty variance [
19]. This variance reduces the reliability of the expected value in forward time, such as
. Therefore, the Q-learning-based RL approach cannot guarantee uncertainty management performance in WPG environments [
31].
Instead of employing the minimum value in the Q-learning-based approach, the expected SARSA-based RL approach uses the expected state-action value to decide the action. The expectation of the value reduces the risk of variance in forward time [
25]. In the expected SARSA-based RL approach, the action is determined as
In addition, the state-action value function is updated as
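For illustration, the difference between the two learning targets can be sketched in Python. This is a hedged sketch, not the exact form of Equations (20) to (23): it assumes the textbook temporal-difference update, a cost-type reward that is minimized, and an epsilon-greedy behavior policy for the expectation; `epsilon`, the function names, and the policy choice are assumptions introduced here.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy preferring the LOWEST value."""
    n = len(q_values)
    best = min(range(n), key=lambda i: q_values[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def q_learning_target(reward, next_q_values, gamma):
    """Q-learning bootstraps from the single best (minimum) next value."""
    return reward + gamma * min(next_q_values)


def expected_sarsa_target(reward, next_q_values, gamma, epsilon):
    """Expected SARSA bootstraps from the policy-weighted average next value."""
    probs = epsilon_greedy_probs(next_q_values, epsilon)
    expected = sum(p * q for p, q in zip(probs, next_q_values))
    return reward + gamma * expected


def td_update(old_q, target, alpha):
    """Move the stored value toward the target by the learning rate alpha."""
    return old_q + alpha * (target - old_q)
```

Here `next_q_values` holds the values of the feasible actions at the next state only. Averaging over them makes the target less sensitive to a single noisy estimate than taking the minimum.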
3.3. Multi-Dimensional Clustering
The determined action for the VESS operation in (22) is an accumulated action set to manage the uncertainty of each WPG, such as . The expected SARSA approach reduces the expected risk in forward time. However, the multi-dimensional action renders convergence difficult, and also reduces the uncertainty management performance of the VESS operation.
To mitigate this effect of multi-dimensional action, data classification is considered. Data classification is a technique that involves the categorization of data to enable organization for effective operation [
35]. With RL approaches, data classification can enhance the learning performance of the state-action value function [
36].
In this study, a k-means clustering method is applied, which involves the vector quantization of data into K clusters [
37]. With K cluster centroids,
, the method classifies the data into K cluster sets,
, as follows
where
and
express the Euclidean norm. The k-means clustering problem in (24) is solved iteratively by the Lloyd algorithm, which determines the centroids of Voronoi diagrams [37].
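The Lloyd iteration can be sketched in a few lines of Python. This is a minimal sketch of the generic algorithm using plain lists; the function names, the fixed iteration cap, and the handling of empty clusters are assumptions rather than details taken from the study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def lloyd_kmeans(points, centroids, iterations=20):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels
```

In the setting of this study, each point would be a multi-dimensional vector built from the WPG forecasts of the units, and the initial centroids would come from historical forecast data.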
3.4. RL-Based VESS Operation Strategy
The proposed strategy comprises data clustering for enhancing the learning performance and policy learning to determine the VESS operation action. The proposed strategy is described as follows (Algorithm 1).
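Algorithm 1. Proposed RL-based VESS operation algorithm
Data clustering
1: Initialization
2: Set the number of clusters to K.
3: Initialize the centroids using historical WPG forecasting data.
4: Data clustering
5: Set the cluster k to the cluster whose centroid is nearest to the dataset.
6: Update the centroid of cluster k, including the dataset.
Policy learning
7: Initialization
8: Set the active state-action value function to the k-th state-action value function.
9: Set the initial state as the current SoC and the feasible action range using (13).
10: Policy learning
11: For each stage of the operation time horizon,
12: Set the action in the feasible action range using (22).
13: Update the state and the state-action value function using (12) and (23).
14: end for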
First, to apply k-means clustering, the number of clusters is set to K, and the centroids are initialized using the historical WPG forecast data to solve (24) (steps 2 and 3). In step 5, the cluster number k of a dataset is selected as the cluster whose centroid has the minimum Euclidean distance to the dataset. The cluster number is used to select the active state-action value function for the policy learning process. Moreover, the centroid of the selected cluster is updated to include the dataset in step 6.
Combined with clustering, one state-action value function is required for each cluster. In the policy learning process, the k-th state-action value function is loaded as the active state-action value function according to the cluster number in step 8. The initial state is set as the current VESS condition, and the feasible action range is determined by the current state in step 9. Over the operation time horizon, the VESS operation action is selected using (22), and the state and the state-action value function are updated according to the selected action in steps 12 and 13.
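The overall flow of cluster selection followed by policy learning can be sketched as below. This is an illustrative sketch under several assumptions that are not taken from the source: a lossless integer SoC grid, a cost equal to the absolute residual between a hypothetical per-stage forecast error and the chosen action, an epsilon-greedy policy, and one value table per cluster stored as a dictionary keyed by (stage, SoC, action). All names are hypothetical.

```python
import random
from collections import defaultdict


def feasible_actions(soc, max_soc, max_step):
    """Integer actions allowed by the power rating and the storage limits."""
    return list(range(max(-max_step, -soc), min(max_step, max_soc - soc) + 1))


def nearest_cluster(profile, centroids):
    """Cluster number whose centroid is closest to the forecast profile."""
    def dist2(c):
        return sum((x - y) ** 2 for x, y in zip(profile, c))
    return min(range(len(centroids)), key=lambda k: dist2(centroids[k]))


def choose_action(q, t, soc, actions, epsilon, rng):
    """Epsilon-greedy choice that prefers the lowest estimated cost."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return min(actions, key=lambda a: q[(t, soc, a)])


def expected_value(q, t, soc, actions, epsilon):
    """Expected next value under the epsilon-greedy policy."""
    values = [q[(t, soc, a)] for a in actions]
    return (1 - epsilon) * min(values) + epsilon * sum(values) / len(values)


def run_episode(q, errors, soc, max_soc, max_step, alpha, gamma, epsilon, rng):
    """One pass over the horizon with expected SARSA updates on table q."""
    total_cost = 0.0
    for t, err in enumerate(errors):
        actions = feasible_actions(soc, max_soc, max_step)
        a = choose_action(q, t, soc, actions, epsilon, rng)
        cost = abs(err - a)  # assumed reward: residual error after the action
        next_soc = soc + a
        if t + 1 < len(errors):
            next_actions = feasible_actions(next_soc, max_soc, max_step)
            nxt = expected_value(q, t + 1, next_soc, next_actions, epsilon)
        else:
            nxt = 0.0  # no future cost after the last stage
        q[(t, soc, a)] += alpha * (cost + gamma * nxt - q[(t, soc, a)])
        soc = next_soc
        total_cost += cost
    return total_cost, soc


# Usage: pick the value table of the nearest cluster, then learn on it.
centroids = [[0.0, 0.0], [2.0, 2.0]]
q_tables = [defaultdict(float) for _ in centroids]
k = nearest_cluster([3.0, 3.0], centroids)
run_episode(q_tables[k], [1, -1], 0, 2, 1, 0.5, 0.9, 0.1, random.Random(0))
```

Keeping a separate table per cluster means that each table only has to learn from forecast profiles that resemble each other, which is the motivation given above for combining clustering with policy learning.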
4. Results and Discussion
To verify the performance of the proposed strategy, simulation results were evaluated. The simulation employed five WPG datasets recorded by the National Renewable Energy Laboratory from 2004 to 2006 to develop eastern wind resources in the United States of America [38]. Each WPG had a capacity of 20 megawatts (MW). Day-ahead forecasting data were provided with a 1-h time resolution. Therefore, the operation time horizon was set to 24 h,
.
The simulation results were measured using the data from the first 14 days of December 2006, and the other datasets were used for RL training. Moreover, for policy learning, the learning rate and discount factor were set as and , respectively. The cluster size was assumed to be three. However, a discussion of the cluster size is also presented here.
A lithium-ion based ESS system was considered, which is widely used with renewable energy systems [
39]. The charging/discharging efficiency
was set as,
which provided a
round-trip efficiency, and the DoD margin that restricts the minimum and maximum operable ES capacity was
. The ES capacity was expressed as the normalized WPG capacity, that is, per unit (p.u.), and the service time was 2 h, which provided a 0.5 charging rate (C-rate).
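The relationship between these storage parameters can be checked with a short calculation. The sketch below assumes that the round-trip efficiency is the product of the charging and discharging efficiencies and that the C-rate is the reciprocal of the service time in hours; the efficiency value used in the example is hypothetical and is not the value used in the study.

```python
def round_trip_efficiency(charge_eff, discharge_eff):
    """Fraction of the input energy recovered after one charge-discharge cycle."""
    return charge_eff * discharge_eff


def c_rate(service_time_hours):
    """C-rate for a storage that empties in the given number of hours."""
    return 1.0 / service_time_hours


# Hypothetical one-way efficiency of 0.95 applied in both directions.
example_round_trip = round_trip_efficiency(0.95, 0.95)
# A 2 h service time corresponds to the 0.5 C-rate stated above.
example_c_rate = c_rate(2.0)
```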
The simulations in this study were implemented on a 64-bit PC with a 4 GHz quad-core Intel Core i7 CPU and 32 GB RAM, using MATLAB R2020a with IBM ILOG CPLEX Optimization Studio.
4.1. Performance Results of Proposed Strategy
Figure 3 shows the uncertainty management performance as MAE, with varying VESS size. The black line with circles, the red line with squares, and the blue line with diamonds present the results obtained when applying the optimal solution, the proposed method, and the stochastic method, respectively. The optimal solution is the solution to problem (9), for known information including the future time, and the stochastic method is the VESS operation according to the probabilistic information of the WPG suggested in [
40]. The VESS size is the available operation room to manage uncertainty. Therefore, by increasing the size, the MAE is reduced, as shown in
Figure 3a. In particular, in
Figure 3a the optimal solution and proposed method have a similar slope with increasing VESS size, but the result obtained with the stochastic method shows a less significant decrease. This implies that the optimal solution and the proposed method effectively operate according to the environment, while the stochastic method does not. The stochastic method is designed according to the Markov decision process, similar to the proposed method. However, the stochastic method applying the backward induction approach in [
40] predetermines the reserved capacity for the future decision stage according to the probabilistic information of the WPG, so the operational diversity of the stochastic method is lower than those of the optimal solution and proposed method.
Figure 3b shows the optimal gap, which represents the difference from the optimal solution. The optimal gap of the proposed method is less than 2%, and is reduced with increasing size. The proposed method can effectively consider environmental characteristics by learning and clustering, and therefore achieves gain by increasing the size. However, the stochastic method cannot reflect this. Therefore, the optimal gap in the stochastic method increases with increasing size.
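The two metrics used in this comparison can be written down explicitly. The sketch below is an assumption about their form: the MAE is taken as the mean of the absolute residual forecast errors, and the optimal gap as the plain difference between the MAE of a method and the MAE of the optimal solution, in the same units as the MAE. The precise definitions in the study may differ, for example by normalization.

```python
def mean_absolute_error(residuals):
    """Mean of the absolute residual errors over all time steps."""
    return sum(abs(r) for r in residuals) / len(residuals)


def optimal_gap(method_mae, optimal_mae):
    """How much worse a method is than the optimal solution (same units as MAE)."""
    return method_mae - optimal_mae
```

Because the optimal solution uses future information, its MAE is a lower bound, so the gap of any practical method is non-negative under this definition.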
4.2. Effect of VESS
Figure 4 compares the MAE of the individual ESS operation and the proposed VESS operation. The results for the individual ESS operations are the optimal solutions of problem (9) reformulated for each WPG. Each WPG presents different uncertainties, so the slope of the decrease with increasing size also differs, as shown in
Figure 4a. However, the results of the proposed VESS operation demonstrate that it outperforms all individual operation results. The individual ESS operation works by using its own information. However, in the case of the proposed VESS operation, information from multiple units is used. Therefore, the proposed VESS operation achieves multi-user diversity gain [
41].
Figure 4b verifies the diversity gain. By increasing the size, the operation availability also increases. The proposed VESS operation is effectively operated to achieve availability with multi-user diversity. Therefore, the VESS operation gain compared to the individual ESS operation is enhanced with increasing operation availability.
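One intuition for why sharing storage among units can help is that forecast errors of opposite sign partly cancel when they are pooled. The sketch below only illustrates this netting effect with hypothetical numbers; it is not the model used in the study, and the function names are assumptions.

```python
def individual_burden(errors_by_unit):
    """Total error magnitude when every unit compensates its own error."""
    return sum(abs(e) for e in errors_by_unit)


def pooled_burden(errors_by_unit):
    """Error magnitude left after the units' errors are netted together."""
    return abs(sum(errors_by_unit))


# Hypothetical errors of three units at one time step (surplus > 0, deficit < 0).
errors = [2.0, -1.5, 0.5]
separate = individual_burden(errors)  # each unit covers its own error
shared = pooled_burden(errors)        # errors are netted before compensation
```

By the triangle inequality, the pooled burden never exceeds the sum of the individual burdens, so a shared storage never needs to move more energy than the separate storages combined in this simplified view.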
4.3. Effect of Clustering
Figure 5 shows the optimal gap of the proposed method with 1, 3, and 5-cluster cases. As shown in
Figure 5, the optimal gap is reduced as the number of clusters increases. In particular, with five clusters, an optimal gap enhancement of more than 1.5% can be obtained when the ESS size is 0.6 p.u. This indicates that the clustering method is an effective way to enhance the performance of the proposed method. However, the additional improvement gained by moving from three to five clusters is smaller than that gained by moving from one to three clusters. This is because the distance between the centroids is reduced as the number of clusters increases. Moreover, increasing the number of clusters also increases the number of state-action value functions for policy learning, which increases the implementation complexity of the system. Therefore, it is important to set an appropriate number of clusters by considering both performance enhancement and system complexity. As an example, in this study operating with five WPG units, three clusters are efficient considering the performance enhancement, as shown in
Figure 5.
4.4. Usage of the Proposed Strategy
The VESS operation applying the proposed strategy achieves a higher forecast error management performance than the individual ESS operation. For example, when the MAE target of each WPG is set to 1.5, the individual ESS operation requires each ESS to be larger than 1 p.u., as shown in Figure 4a. This is not economically viable. In the proposed VESS case, however, an ESS size of 0.2 p.u. is required for each WPG to meet the same target. This enables a business model, such as a VESS service, that gains an economic benefit by reducing the ESS size. Moreover, increasing the number of clusters can further reduce the ESS size, as shown in Figure 5. The number of clusters determines the number of state-action value functions, which affects the memory size and the computational complexity. Therefore, the VESS service provider can select the ESS size and the number of clusters considering the ESS cost, the memory cost, and the computational complexity, as well as the WPG forecast error management target.
5. Conclusions
This study proposed an RL-based VESS operation strategy to manage WPG forecasting uncertainty. The VESS operation model is the first to consider not only a unit's own uncertainty management requirement but also the requirements of other units. Using the VESS model, an expected SARSA-based learning policy is suggested to solve the sequential decision-making problem of the VESS operation. Moreover, the k-means data clustering method is employed to enhance the performance of the proposed strategy by reducing uncertainty variance. The simulation results demonstrate that the proposed strategy provides near-optimal performance, with a gap of less than 2 percentage points to the optimal solution, which requires knowledge of future information. Moreover, as the storage size increases, the MAE of the proposed method improves with a slope similar to that of the optimal method. This shows that the proposed method obtains an operational diversity similar to that of the optimal method and can generally achieve near-optimal performance. In addition, we evaluated the performance achieved by the VESS operation in terms of multi-user diversity and the effect of the clustering method according to the number of clusters.
Research on VESS operation is at an early stage. This study shows that VESS operation can outperform individual ESS operation. However, the performance enhancement provided by the VESS operation differs for each unit. Therefore, VESS operation that considers the performance balance among units will be the subject of further research. Moreover, this study considers only a simple system model. Including power system requirements would extend the system model toward practical use. Finally, the forecast error management of WPG is closely related to revenue, and the VESS operation is more cost-efficient than the individual ESS operation. Therefore, this study can be extended to economic aspects, such as a revenue maximization problem that considers ESS costs.