A Wind Power Probabilistic Model Using the Reflection Method and Multi-Kernel Function Kernel Density Estimation

Choi, Juseung; Eom, Hoyong; Baek, Seung-Mook

doi:10.3390/en15249436


by Juseung Choi, Hoyong Eom and Seung-Mook Baek *

Department of Electrical, Electronic, and Control Engineering, Institute of IT Convergence Technology, Kongju National University, Cheonan 31080, Republic of Korea

* Author to whom correspondence should be addressed.

Energies 2022, 15(24), 9436; https://doi.org/10.3390/en15249436

Submission received: 26 October 2022 / Revised: 30 November 2022 / Accepted: 8 December 2022 / Published: 13 December 2022

(This article belongs to the Special Issue Recent Advances in Isolated Power Systems)


Abstract:

This paper proposes a wind power probabilistic model (WPPM) using the reflection method and multi-kernel function kernel density estimation (KDE). With the increasing penetration of renewable energy sources (RESs) into power systems, several probabilistic approaches have been introduced to assess the impact of RESs on the power system. A probabilistic approach requires a wind power scenario (WPS), and the WPS is generated from the WPPM. Previously, WPPM was generated using a parametric density estimation, and it had limitations in reflecting the characteristics of wind power data (WPD) due to a boundary bias problem. The paper proposes a WPPM generated using the KDE, which is a non-parametric method. Additionally, the paper proposes a reflection method correcting for the boundary bias problem caused by the double-bounded characteristic of the WPD and the multi-kernel function KDE minimizing the effect of tied values. Six bandwidth selectors are used to calculate the bandwidth for the KDE, and one is selected by analyzing the correlation between the normalized WPD and the calculated bandwidth. The results were validated by generating WPPMs with WPDs in six regions of the Republic of Korea, and it was confirmed that the accuracy and goodness-of-fit are improved when the proposed method is used.

Keywords:

bandwidth selection; kernel density estimation; probabilistic model; sampling-based method; scenario generation; wind power output


1. Introduction

Changes in global energy policy have increased the penetration of renewable energy sources (RESs) in power systems. The intermittence and variability of RESs increase uncertainty in power systems. These characteristics have led to the use of probabilistic approaches to economic dispatch [1], power flow [2], stability evaluation [3], unit commitment [4], optimization problems [5,6,7], and forecasting [8,9,10].

The probabilistic approach requires scenario generation (SG), and the SG method has been proposed in previous studies [11]. SG methods include sampling-based methods [12], prediction-based methods [13,14], and optimization-based methods. Sampling-based methods include the Monte Carlo method and the Latin hypercube method. It is recommended to use different sampling methods according to the given information [15]. The sampling-based method has the advantage of being fast and simple because SG is possible directly from a wind power probabilistic model (WPPM). The prediction-based methods are influenced by historical data and can capture the variability and complex nonlinearity of RESs. The optimization-based method has high scenario approximation accuracy because it captures the characteristics of the RESs well, but it is not suitable for application to large-scale power systems due to the non-deterministic polynomial time problem.

For SG with time series characteristics, using a prediction-based or optimization-based method is a way to reduce uncertainty. Alternatively, scenarios can be hypothesized from models created with wind speed and climate data from surrounding areas [16]. Both methods are realistic for small-scale power systems and have high accuracy. However, both methods require a lot of data and calculations in large-scale power systems. In probabilistic approaches where scenarios with time-series characteristics are not required, it is efficient to use the sampling-based method.

There are two sampling-based methods for the SG. The first converts the wind speed scenario generated from the wind speed probabilistic model into a wind power scenario (WPS) using a power curve [17,18], and the second directly generates WPPM from measured wind power data (WPD) and then creates the WPS from the WPPM [19,20]. The first method requires correction because the wind speed distribution and output curve differ depending on the region and climate, even for the same wind power generator [21]. The WPS for use in large-scale power systems requires data from many regions and power curve corrections [22,23], and power curves can be subject to uncertainty [24]. The second method generates WPPM directly from WPD; therefore, it requires less calculation and less data than the first method.

Considerable research has been performed to estimate wind speed probabilistic models through parametric [25,26] and non-parametric [27,28] density estimation. These methods are also used to generate WPPM from WPD. The Weibull distribution is a parametric distribution that is often used to estimate the distribution of WPD [12]. A representative non-parametric density estimation is kernel density estimation (KDE). For the KDE, it is necessary to calculate a bandwidth suitable for the characteristics of the data, and there are various bandwidth selectors to calculate the optimal bandwidth. However, few studies directly estimate WPPM from WPD using the KDE, and few examine which bandwidth selectors are suitable for WPD.

WPD has a double-bounded characteristic [29]. A parametric distribution leaks density beyond the second boundary, and KDE leaks density because of the boundary bias problem. Hong [30] used the delta function to model the double-bounded characteristic in parametric density estimation.

In this paper, a WPPM using the reflection method and multi-kernel function KDE is proposed. To generate WPPM directly from WPD using the KDE, the optimal bandwidth is required. The first step is to choose the most suitable bandwidth selector by calculating the bandwidth using various bandwidth selectors and analyzing the correlation between the calculated bandwidth and WPD. In the second step, the boundary bias problem due to the double-bounded characteristic of the WPD is corrected using the reflection method, and the effect of tied values is minimized using the multi-kernel function.

The main objectives of this study are the following:

(1): Generate the WPPM directly from WPD by using KDE with the reflection method and multi-kernel function;
(2): Choose the appropriate bandwidth selector for WPD among six bandwidth selectors by analyzing the correlation between WPD and bandwidth;
(3): Apply the proposed method for generating WPPM from WPD in six regions in the Republic of Korea. To validate the proposed WPPM, the accuracy and goodness-of-fit are compared with several known methods.

The remainder of this paper is organized as follows. Section 2 analyzes the characteristics of WPD. Section 3 describes Weibull, KDE, and six bandwidth selectors and calculates the optimal bandwidth for WPD. Section 4 proposes the KDE method to overcome the boundary bias problem and tied values of WPD. Section 5 evaluates the accuracy and goodness-of-fit of the proposed WPPM. Section 6 provides concluding remarks and future research directions.

2. Wind Power Data and Analysis

2.1. Wind Power Data

In this paper, data measured from six regions in the Republic of Korea are used and named WPD. The data can be found in the “Public Data Portal” system [31]. The period of all data is one year, the time interval is 1 h, and the unit is kW. If the data are negative, they are corrected to 0. Regions and installed capacity are listed in Table 1, and histograms of WPD for the six regions are shown in Figure 1.

Figure 2 shows the locations where six WPDs were measured on a map of the Republic of Korea. It is shown that the six regions are evenly distributed in the map. Additionally, the WPD measured on Jeju Island is included. By using WPDs from different regions, various WPPMs will be applied and compared.

2.2. Double-Bounded Characteristics of WPD

WPD has double-bounded characteristics [29]. A boundary exists both when the output is 0 and when the output is at installed capacity (i.e., maximum output). The data range is 0 ≤ x ≤ maximum output, and the total probability within the data range must be 1. Figure 3 shows the double-bounded characteristic of the WPD. The first bound means that the wind power generation cannot be negative, and the second bound means that the wind power generation cannot exceed the installed capacity.

2.3. Tied Values at Zero Output

Tied values occur when two or more observations are equal, whether the observations occur in the same sample or in different samples. When a wind turbine does not produce electricity, the output is recorded as 0 in the data. As a result, WPD generally has many tied values, but only at zero output. Tied values cause leakage density. To improve the accuracy of the WPPM, the leakage density due to tied values must be addressed.

3. Wind Power Probabilistic Model (WPPM)

A WPPM is generated through density estimation. Density estimation constructs an estimate of the density from the observed data, and there are parametric and non-parametric methods. The parametric method assumes a known probability distribution and estimates its parameters. The non-parametric method assumes no pre-specified probability distribution. Histograms and KDEs are representative non-parametric methods.

3.1. Weibull Distribution

The Weibull distribution, which is commonly used for parametric density estimation of WPD, is used here [12]. The probability density function (PDF) and cumulative distribution function (CDF) of the Weibull distribution are given by Equations (1) and (2), respectively.

f (x) = \frac{k}{λ} {(\frac{x}{λ})}^{k - 1} e^{- {(\frac{x}{λ})}^{k}}

(1)

F (x) = 1 - e^{- {(\frac{x}{λ})}^{k}}

(2)

where k is the shape factor and λ is the scale factor of the Weibull distribution.

Table 2 lists the parameters of the Weibull distribution estimated in the six regions [32]. A parametric distribution such as the Weibull distribution cannot model the double-bounded characteristic of WPD. Figure 4 shows the PDF and empirical CDF (ECDF) of WPD and WPPM generated from Weibull distribution (WPPM-Weibull) in YG1. Probability must exist only within the double bounds of the WPD. However, in the Weibull distribution, the probability leaks beyond the second boundary, as shown in Figure 4b, so that total probability is less than 1 within the data range.
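The leakage beyond the second boundary can be quantified directly from Equation (2): the mass a fitted Weibull places above the installed capacity is 1 - F(capacity). The following is a minimal sketch of that calculation. The shape factor, scale factor, and capacity below are hypothetical values chosen for illustration, not the Table 2 estimates.

```python
import math


def weibull_pdf(x: float, k: float, lam: float) -> float:
    """Weibull PDF, Equation (1), evaluated for x > 0."""
    z = x / lam
    return (k / lam) * z ** (k - 1) * math.exp(-(z ** k))


def weibull_cdf(x: float, k: float, lam: float) -> float:
    """Weibull CDF, Equation (2), evaluated for x >= 0."""
    return 1.0 - math.exp(-((x / lam) ** k))


def leaked_probability(capacity: float, k: float, lam: float) -> float:
    """Probability mass the Weibull model places above the installed capacity."""
    return 1.0 - weibull_cdf(capacity, k, lam)


# Hypothetical parameters (assumptions for illustration only).
k, lam, capacity_kw = 1.2, 600.0, 1500.0
leak = leaked_probability(capacity_kw, k, lam)
print(f"probability above installed capacity: {leak:.4f}")
print(f"probability inside the data range:    {1.0 - leak:.4f}")
```

Any positive value of `leak` means the total probability inside the data range is below 1, which is the effect described for Figure 4b.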

3.2. Kernel Density Estimation

KDE is a non-parametric method that estimates the PDF directly from observed data. An estimate \hat{f} of the unknown true density f is constructed from the observed data. The procedure by which the observed data are reconstructed through KDE is described in Figure 5. The equation is as follows [33]:

\hat{f} (x) = \frac{1}{n h} \sum_{i = 1}^{n} K (\frac{x - X_{i}}{h})

(3)

where X_{i} is the observed data, n is the number of observed data, and h is the bandwidth, the scaling factor that determines the spread of the kernel function. K(u) is a kernel function with \int K = 1. The Gaussian kernel function and the Dirac delta kernel function are given by Equations (4) and (5), respectively.

K (u) = \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} u^{2}}

(4)

K (u) = δ (u)

(5)

Representative kernel functions include the Gaussian, box, triangle, and Epanechnikov kernels. In this paper, the Gaussian kernel function and the Dirac delta function are used.
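As an illustration of Equations (3) and (4), the sketch below evaluates a Gaussian KDE on a small sample and integrates it numerically. The sample values and the bandwidth are hypothetical. The mass found below zero shows how an unbounded kernel spreads density outside the physical range when observations sit on the boundary.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Gaussian kernel, Equation (4)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x: float, data: list, h: float) -> float:
    """Kernel density estimate, Equation (3)."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)


# Hypothetical hourly outputs in kW, including two zero readings.
data = [0.0, 0.0, 120.0, 340.0, 560.0, 900.0]
h = 100.0  # hypothetical bandwidth

# Riemann sums with a 1 kW step.
total_mass = sum(kde(float(x), data, h) for x in range(-1000, 2001))
mass_below_zero = sum(kde(float(x), data, h) for x in range(-1000, 0))

print(f"total mass:      {total_mass:.4f}")
print(f"mass below zero: {mass_below_zero:.4f}")
```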

3.2.1. Bandwidth Selection

Bandwidth selection is an important issue in KDE. If the bandwidth is too small, spurious fine structure of the data becomes visible, and if the bandwidth is too large, the overall characteristics of the distribution are blurred [33]. Depending on the bandwidth, PDFs with different shapes may be estimated for the same data. The variation of the PDF according to the bandwidth is shown in Figure 6, which confirms that the bandwidth is an important factor in determining the shape of the PDF. A bandwidth selector calculates the optimal bandwidth for the data, and there are various bandwidth selectors.

The mean integrated square error (MISE) is used as a method to evaluate the KDE model and expressed as a bias term and variance term. The equation is as follows [34]:

\mathrm{MISE}(\hat{f}) = \int \mathrm{bias}^{2} + \int \mathrm{var}

(6)

where the bias term is described in Equation (7) and the variance term is described in Equation (8).

\int \mathrm{bias}^{2} = \frac{1}{4} h^{4} μ_{2}(K)^{2} R(f'') + o(h^{4})

(7)

\int \mathrm{var} = \frac{R(K)}{n h} + o(\frac{1}{n h})

(8)

where μ_{2}(K) = \int z^{2} K(z) dz and R(K) = \int K(z)^{2} dz. MISE is expressed as Equation (9).

\mathrm{MISE}(\hat{f}) = \mathrm{AMISE}(\hat{f}) + o(\frac{1}{n h} + h^{4})

(9)

According to the leading terms of the Taylor expansion in Equations (7) and (8), when the bandwidth increases, the bias increases, and when the bandwidth decreases, the variance increases. This is called the bias–variance tradeoff: the error is minimized at an intermediate, optimal bandwidth rather than at one that is too small or too large. The first term on the right side of Equation (9) is called the asymptotic MISE (AMISE), and finding the optimal bandwidth means finding the bandwidth that minimizes AMISE, which is expressed as [34]:

\mathrm{AMISE}(\hat{f}) = \frac{1}{4} h^{4} μ_{2}(K)^{2} R(f'') + \frac{R(K)}{n h}

(10)

h_{AMISE} = \arg\min_{h} \mathrm{AMISE}(\hat{f}) = {[\frac{R(K)}{μ_{2}(K)^{2} R(f'') n}]}^{0.2}

(11)

However, the optimal bandwidth cannot be obtained directly through Equation (11), because R(f'') depends on the true density, which is unknown. By assuming a reference distribution for f in R(f''), Equation (11) can be used to calculate the optimal bandwidth.
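For readers who want the intermediate step, Equation (11) follows from Equation (10) by setting the derivative with respect to h to zero. The short derivation below is a sketch using only the symbols already defined; the closing expression for the minimum AMISE is a standard consequence and is not stated in the text above.

```latex
\frac{d}{dh}\,\mathrm{AMISE}(\hat{f})
  = h^{3}\,\mu_{2}(K)^{2}\,R(f'') - \frac{R(K)}{n h^{2}} = 0
\quad\Longrightarrow\quad
h^{5} = \frac{R(K)}{\mu_{2}(K)^{2}\,R(f'')\,n}
\quad\Longrightarrow\quad
h_{AMISE} = \left[\frac{R(K)}{\mu_{2}(K)^{2}\,R(f'')\,n}\right]^{1/5}

% Substituting h_{AMISE} back into Equation (10):
\mathrm{AMISE}(\hat{f})\Big|_{h = h_{AMISE}}
  = \frac{5}{4}\,\left[\mu_{2}(K)^{2}\,R(f'')\right]^{1/5} R(K)^{4/5}\, n^{-4/5}
```

The optimal bandwidth therefore shrinks at the rate n^{-1/5}, which is the n^{-0.2} factor that appears in the selectors that follow.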

3.2.2. Rule of Thumb (ROT)

The ROT bandwidth selector can be calculated by assuming that f in R(f'') is a normal distribution with standard deviation σ. The optimal bandwidth h_{ROT} is [33]:

h_{ROT} = {[\frac{8 π^{0.5} R(K)}{3 μ_{2}(K)^{2} n}]}^{0.2} \hat{σ}

(12)

\hat{σ}_{IQR} = \frac{\mathrm{sample\ IQR}}{Φ^{-1}(0.75) - Φ^{-1}(0.25)}

(13)

where μ_{2}(K) = 1 and R(K) = \frac{1}{2 \sqrt{π}} for the Gaussian kernel, and Φ^{-1} is the inverse cumulative distribution function of the standard normal distribution. With these values, Equation (12) is expressed as:

h_{ROT} = 1.06 n^{-0.2} \hat{σ}

(14)

where \hat{σ} is the standard deviation of the observed data or \hat{σ}_{IQR}, and n is the number of observed data.
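Equations (12) to (14) can be checked numerically. The sketch below evaluates the constant of Equation (12) for the Gaussian kernel and implements Equation (14) with either scale estimate. The sample values are hypothetical, and the quartile convention used for the sample IQR (`method="inclusive"`) is an assumption, since the text does not specify one.

```python
import math
from statistics import NormalDist, quantiles, stdev

# Gaussian-kernel constants quoted in the text.
MU2_K = 1.0
R_K = 1.0 / (2.0 * math.sqrt(math.pi))

# Constant of Equation (12); it evaluates to (4/3)^0.2, roughly 1.06.
rot_constant = (8.0 * math.sqrt(math.pi) * R_K / (3.0 * MU2_K ** 2)) ** 0.2


def sigma_iqr(data: list) -> float:
    """Equation (13): scale estimate from the sample interquartile range."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    z = NormalDist()
    return (q3 - q1) / (z.inv_cdf(0.75) - z.inv_cdf(0.25))


def h_rot(data: list, use_iqr: bool = False) -> float:
    """Equation (14): rule-of-thumb bandwidth for the Gaussian kernel."""
    sigma = sigma_iqr(data) if use_iqr else stdev(data)
    return 1.06 * len(data) ** (-0.2) * sigma


# Hypothetical hourly outputs in kW.
wpd = [0.0, 0.0, 35.0, 120.0, 340.0, 560.0, 900.0, 1480.0]
print(f"constant in Eq. (12): {rot_constant:.4f}")
print(f"h_ROT (std):          {h_rot(wpd):.2f} kW")
print(f"h_ROT (IQR):          {h_rot(wpd, use_iqr=True):.2f} kW")
```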

3.2.3. Direct Plug-In (DPI)

The DPI selector is obtained by substituting Ψ_{4} for the unknown value R(f'') in Equation (11). h_{DPI} is calculated by replacing Ψ_{4} with its kernel estimate \hat{Ψ}_{4}(g), which is the case r = 4 of the general estimator \hat{Ψ}_{r}(g). The equations are as follows [33]:

h_{DPI} = {[\frac{R(K)}{μ_{2}(K)^{2} \hat{Ψ}_{4}(g) n}]}^{0.2}

(15)

\hat{Ψ}_{r}(g) = \frac{1}{n} \sum_{i = 1}^{n} \hat{f}^{(r)}(X_{i}; g)

(16)

where \hat{f}^{(r)}(X_{i}; g) is the r-th order derivative of the density estimate with bandwidth g and kernel function L, and the optimal bandwidth g of this equation is as follows:

g_{AMSE} = {[\frac{- k! L^{(r)}(0)}{μ_{k}(L) Ψ_{r + k} n}]}^{\frac{1}{r + k + 1}}

(17)

Calculating \hat{Ψ}_{4}(g) requires \hat{Ψ}_{6}(g), and calculating \hat{Ψ}_{6}(g) requires \hat{Ψ}_{8}(g). The last of these is calculated by assuming that f is a normal distribution, and the equation is as follows:

ψ_{r} = \frac{{(- 1)}^{\frac{r}{2}} r!}{{(2 σ)}^{r + 1} (\frac{r}{2})! \sqrt{π}}

(18)

where r is the order of the derivative and \hat{σ} is calculated using Equation (13). As a result, \hat{ψ}_{8}^{NS} = \frac{105}{32 \sqrt{π} \hat{σ}^{9}} is obtained.
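Equation (18) is simple to evaluate, and doing so confirms the stated value for r = 8. The sketch below is a direct transcription of the formula for even r; the function name is hypothetical.

```python
import math


def psi_normal_scale(r: int, sigma: float) -> float:
    """Equation (18): the functional psi_r under a normal reference density."""
    if r % 2 != 0:
        raise ValueError("Equation (18) is stated for even r")
    half = r // 2
    numerator = (-1) ** half * math.factorial(r)
    denominator = (2.0 * sigma) ** (r + 1) * math.factorial(half) * math.sqrt(math.pi)
    return numerator / denominator


sigma = 1.0
for r in (4, 6, 8):
    print(f"psi_{r} = {psi_normal_scale(r, sigma):+.6f}")

# Closed form quoted in the text for r = 8.
psi8_closed_form = 105.0 / (32.0 * math.sqrt(math.pi) * sigma ** 9)
```

For r = 4 the function returns 3 / (8 \sqrt{π} σ^{5}), which is R(f'') for a normal density and is the value that turns Equation (11) into Equation (12).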

3.2.4. Sheather–Jones (SJ)

The equation to find the optimal bandwidth of the SJ bandwidth selector is as follows [34]:

h_{SJ} = {[\frac{R(K)}{μ_{2}(K)^{2} \hat{Ψ}_{4}(γ(h)) n}]}^{\frac{1}{5}}

(19)

where γ(h) is:

γ(h) = {[\frac{- 2 L^{(4)}(0) μ_{2}(K)^{2} \hat{Ψ}_{4}(g_{1})}{\hat{Ψ}_{6}(g_{2}) R(K) μ_{2}(L)}]}^{\frac{1}{7}} h^{\frac{5}{7}}

(20)

where g_{1} and g_{2} can be obtained using Equation (17), and the \hat{Ψ}_{6} and \hat{Ψ}_{8} required to calculate \hat{Ψ}_{4}(g_{1}) and \hat{Ψ}_{6}(g_{2}) are calculated from Equation (18).

3.2.5. Smooth Cross Validation (SCV)

The rate of convergence increases under a wide range of smoothness conditions. SCV has the advantage of increasing the rate of convergence by presmoothing the data. An integrated bias square (IBS) can be obtained by modifying Equation (7), and the value that minimizes SCV(h) is the optimal bandwidth. The equation is as follows [35]:

\mathrm{SCV}(h) = \frac{R(K)}{n h} + \widehat{\mathrm{IBS}}(h)

(21)

\widehat{\mathrm{IBS}}(h) = \int {\{K_{h} * \hat{f}_{L} - \hat{f}_{L}\}}^{2}(x) dx

(22)

where \hat{f}_{L} is an estimate with kernel function L and bandwidth g, and the equation is as follows:

\hat{f}_{L}(x) = \frac{1}{n} \sum_{i = 1}^{n} L_{g}(x - X_{i})

(23)

3.2.6. Least Square Cross Validation and Biased Cross Validation

Least square cross validation (LSCV), also called unbiased cross validation (UCV), and biased cross validation (BCV) are widely used bandwidth selectors for calculating the optimal bandwidth. Both minimize a cross-validation objective numerically, and the objective can have several local minima. To address this, the optimal bandwidth is searched for over a bandwidth grid. However, cross-validation methods are limited when tied values exist [36].
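The limitation with tied values can be reproduced on a small example. The sketch below uses the standard LSCV score for a Gaussian kernel, \int \hat{f}^{2} minus twice the mean leave-one-out density; this textbook form is an assumption, since the text does not write the objective out. The sample is hypothetical normalized data with five tied zeros. With ties, the leave-one-out term grows without bound as h shrinks, so the score keeps decreasing toward the lower end of the grid instead of reaching an interior minimum.

```python
import math


def normal_pdf(x: float, s: float) -> float:
    """Density of N(0, s^2) evaluated at x."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def lscv(h: float, data: list) -> float:
    """LSCV score of a Gaussian-kernel KDE with bandwidth h."""
    n = len(data)
    # Integral of fhat^2: two N(0, h^2) kernels convolve to N(0, 2 h^2).
    integral_sq = sum(
        normal_pdf(xi - xj, h * math.sqrt(2.0)) for xi in data for xj in data
    ) / n ** 2
    # Mean leave-one-out density at the observations.
    loo = sum(
        normal_pdf(xi - xj, h)
        for i, xi in enumerate(data)
        for j, xj in enumerate(data)
        if i != j
    ) / (n * (n - 1))
    return integral_sq - 2.0 * loo


# Hypothetical normalized outputs with five tied zeros.
tied = [0.0] * 5 + [0.12, 0.31, 0.47, 0.66, 0.85]
grid = [1e-4, 1e-3, 1e-2, 1e-1]
scores = [lscv(h, tied) for h in grid]
for h, s in zip(grid, scores):
    print(f"h = {h:g}: LSCV = {s:.2f}")
```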

3.2.7. Optimal Bandwidth of WPD

The optimal bandwidth was calculated using six bandwidth selectors for two data sets. Dataset (1) is the original WPD from the six regions and dataset (2) is the WPD normalized to 1 MW. The reason for calculating the optimal bandwidth of the normalized WPD is to analyze the correlation between WPD and optimal bandwidth. Table 3 shows the bandwidth calculated from dataset (1), the bandwidth calculated from dataset (2), and the ratio of (1) to (2), respectively. The ratio of bandwidth calculated for WPD and normalized WPD equals the size of the installed capacity. This means that the bandwidth calculated by the bandwidth selector has a linear relationship with the installed capacity of the wind power plant. Even if the installed capacity changes due to the expansion or contraction of a wind farm, or the size of the WPD for simulation changes, the bandwidth size only needs to be linearly changed without recalculating the optimal bandwidth.

The LSCV and BCV bandwidths are calculated with a bandwidth grid ranging from 0.001 to 500. The cases where a proper bandwidth was not found in this range are marked with an X. Although a proper bandwidth was found in some cases, it was not found in most. This confirms that the LSCV and BCV bandwidth selectors cannot reliably calculate the optimal bandwidth for WPD.

Figure 7 shows the bandwidth values calculated from the normalized WPD of the six regions. Four bandwidth selectors, ROT, DPI, SJ, and SCV, calculate bandwidths within a certain range regardless of the characteristics of various normalized WPDs. Additionally, the normalization factor for WPD and the optimal bandwidth have a linear relationship.
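The linear relationship can be illustrated with the ROT selector of Equation (14), whose bandwidth is proportional to the sample standard deviation and therefore scales with the data. In the sketch below the sample, the installed capacity, and the expanded capacity are hypothetical, and normalization is taken to mean division by the installed capacity.

```python
from statistics import stdev


def h_rot(data: list) -> float:
    """Equation (14) using the sample standard deviation."""
    return 1.06 * len(data) ** (-0.2) * stdev(data)


capacity_kw = 1500.0  # hypothetical installed capacity
wpd = [0.0, 0.0, 45.0, 210.0, 480.0, 775.0, 1130.0, 1500.0]
normalized = [x / capacity_kw for x in wpd]

h_original = h_rot(wpd)
h_normalized = h_rot(normalized)
ratio = h_original / h_normalized

# Rescaling to a hypothetical expanded plant without re-running the selector.
expanded_capacity_kw = 3000.0
h_expanded = h_normalized * expanded_capacity_kw

print(f"bandwidth ratio: {ratio:.3f}")
print(f"bandwidth for expanded plant: {h_expanded:.2f} kW")
```

The ratio equals the installed capacity up to floating-point error, which is the pattern reported for Table 3.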

4. Proposed Method

4.1. Correction of the Boundary Bias Problem through the Reflection Method

The double-bounded characteristic of WPD cannot be modeled with a parametric distribution, and the boundary bias problem exists in KDE. The boundary bias problem occurs when data exist near the boundary and density leaks over the boundary during the KDE process. A method of solving the boundary bias is called a boundary correction method, of which there are various. The boundary correction method proposed in this paper is a reflection method [33]. The reflection method has the advantage of being simple: it can be used without complicated calculations and assumptions. Figure 8 shows a WPPM generated using KDE (WPPM-KDE) and a WPPM generated via KDE using the reflection method (WPPM-KDE-Reflection). The boundary bias problem is effectively corrected with the reflection method. The equation is as follows [33]:

\hat{f} (x) = \frac{1}{n h} \sum_{i = 1}^{n} [K (\frac{x + X_{i}}{h}) + K (\frac{x - X_{i}}{h})]

(24)

where X_{i} is the observed data, K is the kernel function with \int K = 1, and h is the bandwidth.
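A minimal sketch of Equation (24) with the Gaussian kernel is given below. It follows the equation as written, mirroring each observation about zero and defining the estimate on x ≥ 0 only. The sample and bandwidth are hypothetical. The numerical integral over the non-negative axis shows that the mass which a plain KDE would place below zero is folded back into the data range.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Gaussian kernel, Equation (4)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_reflection(x: float, data: list, h: float) -> float:
    """Equation (24): KDE with each observation reflected about zero."""
    if x < 0.0:
        return 0.0  # the reflected estimate is only defined on x >= 0
    n = len(data)
    return sum(
        gaussian_kernel((x + xi) / h) + gaussian_kernel((x - xi) / h)
        for xi in data
    ) / (n * h)


# Hypothetical hourly outputs in kW and a hypothetical bandwidth.
data = [0.0, 0.0, 120.0, 340.0, 560.0, 900.0]
h = 100.0

# Midpoint rule on [0, 2500] with a 1 kW step.
mass_on_range = sum(kde_reflection(i + 0.5, data, h) for i in range(2500))
print(f"mass on x >= 0: {mass_on_range:.6f}")
```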

4.2. Minimization of the Effect of Tied Values through the Multi-Kernel Functions

The bandwidth calculated using the bandwidth selector depends on the installed capacity, but tied values exist only when the output is 0, regardless of installed capacity. As a result, the tied values cause leakage density. Unlike the boundary bias that arises from boundaries, this leakage density is caused by too many values concentrated at a single point. Suga [37] solves this problem with multiple bandwidths. In contrast, this paper uses KDE with a multi-kernel function. In particular, the Dirac delta function is used as the kernel function to minimize the leakage density in the range with many tied values, and the Gaussian function is used in the remaining range. Figure 9 shows WPPM-KDE-Reflection and WPPM-KDE-Reflection using the multi-kernel method (WPPM-PM). The expression of the multi-kernel function method is as follows:

K(u) = \begin{cases} δ(u), & X_{i} = 0 \\ \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} u^{2}}, & X_{i} \neq 0 \end{cases}

(25)

where the delta function is used if X_{i} is 0, and the Gaussian kernel function is used otherwise.
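One way to read Equation (25) together with Equation (24) is as a mixed distribution: each zero observation contributes a point mass of 1/n at zero, and each non-zero observation contributes a reflected Gaussian kernel. The sketch below implements the resulting CDF under that reading. The closed form for the Gaussian part is obtained here by integrating Equation (24) from 0 to x and is not taken from the text; the sample and bandwidth are hypothetical.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def multi_kernel_cdf(x: float, data: list, h: float) -> float:
    """CDF of a point mass at zero plus a reflected Gaussian KDE."""
    if x < 0.0:
        return 0.0
    n = len(data)
    # Dirac delta kernel: every tied zero adds a step of height 1/n at x = 0.
    zeros = sum(1 for xi in data if xi == 0.0)
    # Gaussian kernel with reflection about zero, integrated from 0 to x.
    smooth = sum(
        _PHI((x - xi) / h) + _PHI((x + xi) / h) - 1.0
        for xi in data
        if xi != 0.0
    )
    return (zeros + smooth) / n


# Hypothetical hourly outputs in kW and a hypothetical bandwidth.
data = [0.0, 0.0, 120.0, 340.0, 560.0, 900.0]
h = 100.0
for x in (0.0, 300.0, 600.0, 5000.0):
    print(f"F({x:g}) = {multi_kernel_cdf(x, data, h):.4f}")
```

Under this reading the CDF jumps at zero by exactly the share of zero observations, so none of that mass is smeared into positive outputs.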

4.3. Proposed Model

The proposed WPPM (WPPM-PM) is based on the reflection method and multi-kernel function KDE. The WPPM-PM corrects the boundary bias problem with the reflection method. The influence of tied values can be minimized by using the Dirac delta function. A flowchart of the proposed method is shown in Figure 10.
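Since the WPPM is intended for sampling-based scenario generation, a simple Monte Carlo sampler helps show how the two ingredients combine. The sketch below is an assumption about one possible sampling scheme, not the procedure of Figure 10: a zero observation is returned as an exact zero (delta kernel), while a non-zero observation is perturbed by Gaussian noise of scale h and folded about zero (reflection, following Equation (24)). The function name, sample, bandwidth, and seed are hypothetical.

```python
import random


def sample_wps(data: list, h: float, size: int, seed: int = 0) -> list:
    """Draw wind power scenarios from a delta-plus-reflected-Gaussian model."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(size):
        xi = rng.choice(data)  # pick one observation uniformly at random
        if xi == 0.0:
            scenarios.append(0.0)  # delta kernel: keep the exact zero
        else:
            # Gaussian kernel draw, folded about zero (reflection method).
            scenarios.append(abs(rng.gauss(xi, h)))
    return scenarios


# Hypothetical hourly outputs in kW and a hypothetical bandwidth.
data = [0.0, 0.0, 120.0, 340.0, 560.0, 900.0]
wps = sample_wps(data, h=100.0, size=20000, seed=1)
zero_share = sum(1 for x in wps if x == 0.0) / len(wps)
print(f"share of zero-output scenarios: {zero_share:.3f}")
```

Taking the absolute value of a kernel draw is equivalent to sampling from the reflected density, because folding about zero maps the leaked mass back onto the non-negative axis.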

5. Results and Evaluation

5.1. Evaluation Methods

5.1.1. Mean Absolute Error

MAE is calculated as the average of the absolute errors of the model. The closer the result is to 0, the higher the accuracy. MAE can intuitively identify errors between the model and the WPD. The equation is as follows [38]:

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |\hat{Y}_{i} - Y_{i}|

(26)

where n is the number of intervals, \hat{Y}_{i} is the i-th value of the generated model, and Y_{i} is the i-th observation of the WPD.

5.1.2. Root Mean Square Error

RMSE is the square root of the mean of the squared errors of the model and WPD. Compared to MAE, the larger the error, the larger the penalty, and the smaller the error, the smaller the penalty. The equation is as follows [38]:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(\hat{Y}_{i} - Y_{i})}^{2}}

(27)

where n is the number of intervals, \hat{Y}_{i} is the i-th value of the generated model, and Y_{i} is the i-th observation of the WPD.
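Equations (26) and (27) can be sketched as two small functions. In the example below the observed values are taken from the ECDF of a hypothetical sample at a few evaluation points, and the model values are made-up CDF values standing in for a WPPM; neither comes from the paper's data.

```python
import math
from bisect import bisect_right


def ecdf(sorted_data: list, x: float) -> float:
    """Empirical CDF: share of observations less than or equal to x."""
    return bisect_right(sorted_data, x) / len(sorted_data)


def mae(model: list, observed: list) -> float:
    """Equation (26): mean absolute error."""
    return sum(abs(m - o) for m, o in zip(model, observed)) / len(observed)


def rmse(model: list, observed: list) -> float:
    """Equation (27): root mean square error."""
    return math.sqrt(
        sum((m - o) ** 2 for m, o in zip(model, observed)) / len(observed)
    )


# Hypothetical sample (kW) and evaluation points.
sample = sorted([0.0, 0.0, 120.0, 340.0, 560.0, 900.0])
points = [0.0, 300.0, 600.0, 900.0]
observed = [ecdf(sample, p) for p in points]

# Made-up model CDF values at the same points.
model = [0.30, 0.52, 0.80, 0.97]

print(f"MAE  = {mae(model, observed):.4f}")
print(f"RMSE = {rmse(model, observed):.4f}")
```

RMSE is never smaller than MAE for the same errors, and the gap widens when a few errors are much larger than the rest.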

5.1.3. Kolmogorov–Smirnov Test (K-S Test)

The K-S test is a non-parametric test method. Through the two-sample K-S test, we can test whether the two sets of samples to be compared are drawn from the same probability distribution. If the p-value is greater than 0.05 after comparing the ECDF of two samples, the null hypothesis cannot be rejected, and the two samples are considered to come from the same distribution. The equation is as follows [39]:

D = \sup |F_{1} - F_{2}|

(28)

where F_{1} and F_{2} are the ECDF of the first and second samples, respectively. It is decided whether to reject the null hypothesis by calculating the KS statistic of the two ECDFs.
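The statistic in Equation (28) can be computed without any library support, because both ECDFs are right-continuous step functions and the supremum is therefore attained at one of the sample points. The sketch below implements only the statistic D; the samples are hypothetical. Converting D into a p-value requires the Kolmogorov distribution, for which a library routine such as SciPy's `scipy.stats.ks_2samp` can be used.

```python
from bisect import bisect_right


def ks_statistic(sample1: list, sample2: list) -> float:
    """Equation (28): largest gap between the ECDFs of two samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    # Both ECDFs only change at sample points, so checking those is enough.
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / len(s1)
        f2 = bisect_right(s2, x) / len(s2)
        d = max(d, abs(f1 - f2))
    return d


# Hypothetical measured data and generated scenarios (kW).
measured = [0.0, 0.0, 120.0, 340.0, 560.0, 900.0]
generated = [0.0, 15.0, 110.0, 365.0, 540.0, 880.0]
print(f"D = {ks_statistic(measured, generated):.4f}")
```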

5.2. Comparison and Results

The results are validated by comparing the WPPM-PM, the WPPM-Weibull, and the WPPM-KDE. When generating the WPPM-PM and WPPM-KDE, the bandwidth is calculated using four bandwidth selectors: ROT, DPI, SJ, and SCV. Accuracy in terms of MAE and RMSE is calculated using the ECDF of the WPD and the CDF of the WPPM. The K-S test compares the WPD to the WPS generated by the WPPM to test the goodness-of-fit.

Table 4 and Table 5 list the MAE and RMSE values. In Table 4 and Table 5, the errors of WPPM-PM and WPPM-KDE are improved compared to WPPM-Weibull, and the average error also confirms this. Compared to WPPM-KDE, the accuracy of WPPM-PM improved by 50% on average for all models, and the results are shown in Table 6. Four bandwidth selectors were applied to propose the best WPPM-PM. The WPPM-PM-SJ shows the highest accuracy in all regions.

Table 7 lists the K-S test results. Only WPPM-PM-SJ passed the K-S test in all regions. WPPM-PM-DPI and WPPM-PM-SCV failed the K-S test for some WPDs. WPPM-KDE, WPPM-PM-ROT and WPPM-Weibull failed the K-S test across all regions. In conclusion, the WPPM-PM-SJ achieves the highest accuracy and goodness-of-fit and generates the WPPM stably.

6. Conclusions

In this paper, a wind power probabilistic model (WPPM) using the reflection method and multi-kernel function kernel density estimation (KDE) is proposed. A boundary bias problem existed due to the double-bounded characteristics of WPD, which was corrected using the reflection method. In addition, the leakage density caused by the effect of tied values at zero output was minimized using the multi-kernel function.

First, as a result of comparing six bandwidth selectors using wind power data (WPD) from six regions in the Republic of Korea, it was confirmed that the LSCV and BCV bandwidth selectors are inappropriate for calculating the optimal bandwidth of WPD. For the remaining four selectors, the calculated bandwidths were within a certain range regardless of the characteristics of the various normalized WPDs. Next, the MAE, RMSE, and K-S test results of WPPM-PM, WPPM-Weibull and WPPM-KDE were compared. The accuracy of WPPM-PM and WPPM-KDE was better than that of WPPM-Weibull, and for each bandwidth selector, the accuracy of WPPM-PM was better overall. Only WPPM-PM-SJ passed the goodness-of-fit test in all regions, and its accuracy and goodness-of-fit were the best. As a result, when generating WPPM directly from WPD for WPS generation, a better model can be generated if the method proposed in this paper is used.

Analyzing power systems through a probabilistic approach has become an important task. Through the proposed method, a better WPPM was created than the existing WPPM, and it can be used in various probabilistic approaches.

Author Contributions

Conceptualization, J.C., H.E. and S.-M.B.; methodology, J.C.; software, J.C. and H.E.; validation, H.E.; investigation, H.E.; data curation, H.E.; writing—original draft preparation, J.C.; writing—review and editing, S.-M.B.; visualization, J.C. and H.E.; supervision, S.-M.B.; project administration, S.-M.B.; funding acquisition, S.-M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Korea Electric Power Corporation (No. R21XO01-43), and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2020R1I1A3074996).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

AMISE	Asymptotic mean integrated square error
BCV	Biased cross validation bandwidth selector
CDF	Cumulative distribution function
DPI	Direct plug-in bandwidth selector
ECDF	Empirical cumulative distribution function
KDE	Kernel density estimation
K-S test	Kolmogorov–Smirnov test
LSCV	Least square cross validation bandwidth selector
MAE	Mean absolute error
MISE	Mean integrated square error
PDF	Probability density function
RMSE	Root mean square error
ROT	Rule-of-thumb bandwidth selector
SCV	Smooth cross validation bandwidth selector
SG	Scenario generation
SJ	Sheather–Jones bandwidth selector
UCV	Unbiased cross validation bandwidth selector
WPD	Wind power data
WPS	Wind power scenario
WPPM	Wind power probabilistic model
WPPM-KDE	WPPM generated using KDE
WPPM-KDE-Reflection	WPPM generated using KDE via the reflection method
WPPM-PM	WPPM generated using KDE via the reflection method and multi kernel function method (=proposed WPPM)
WPPM-Weibull	WPPM generated using Weibull distribution

References

Pham, L.H.; Duong, M.Q.; Phan, V.-D.; Nguyen, T.T.; Nguyen, H.-N. A High-Performance Stochastic Fractal Search Algorithm for Optimal Generation Dispatch Problem. Energies 2019, 12, 1796. [Google Scholar] [CrossRef] [Green Version]
Peng, S.; Lin, X.; Tang, J.; Xie, K.; Ponci, F.; Monti, A.; Li, W. Probabilistic Power Flow of AC/DC Hybrid Grids With Addressing Boundary Issue of Correlated Uncertainty Sources. IEEE Trans. Sustain. Energy 2022, 13, 1607–1619. [Google Scholar] [CrossRef]
Qi, B.; Hasan, K.N.; Milanović, J.V. Identification of Critical Parameters Affecting Voltage and Angular Stability Considering Load-Renewable Generation Correlations. IEEE Trans. Power Syst. 2019, 34, 2859–2869. [Google Scholar] [CrossRef] [Green Version]
Hong, Y.-Y.; Apolinario, G.F.D. Uncertainty in Unit Commitment in Power Systems: A Review of Models, Methods, and Applications. Energies 2021, 14, 6658. [Google Scholar] [CrossRef]
Rakipour, D.; Barati, H. Probabilistic optimization in operation of energy hub with participation of renewable energy resources and demand response. Energy 2019, 173, 384–399. [Google Scholar] [CrossRef]
Lorca, Á.; Sun, X.A. Adaptive Robust Optimization With Dynamic Uncertainty Sets for Multi-Period Economic Dispatch Under Significant Wind. IEEE Trans. Power Syst. 2015, 30, 1702–1713.
Riaz, M.; Ahmad, S.; Hussain, I.; Naeem, M.; Mihet-Popa, L. Probabilistic Optimization Techniques in Smart Power System. Energies 2022, 15, 825.
Viet, D.T.; Phuong, V.V.; Duong, M.Q.; Tran, Q.T. Models for Short-Term Wind Power Forecasting Based on Improved Artificial Neural Network Using Particle Swarm Optimization and Genetic Algorithms. Energies 2020, 13, 2873.
Wang, Y.; Xu, H.; Zou, R.; Zhang, L.; Zhang, F. A deep asymmetric Laplace neural network for deterministic and probabilistic wind power forecasting. Renew. Energy 2022, 196, 497–517.
Zhou, N.; Xu, X.; Yan, Z.; Shahidehpour, M. Spatio-Temporal Probabilistic Forecasting of Photovoltaic Power Based on Monotone Broad Learning System and Copula Theory. IEEE Trans. Sustain. Energy 2022, 13, 1874–1885.
Li, J.; Zhou, J.; Chen, B. Review of wind power scenario generation methods for optimal operation of renewable energy systems. Appl. Energy 2020, 280, 115992.
Kim, S.; Hur, J. Probabilistic power output model of wind generating resources for network congestion management. Renew. Energy 2021, 179, 1719–1726.
Park, H. A Unit Commitment Model Considering Feasibility of Operating Reserves under Stochastic Optimization Framework. Energies 2022, 15, 6221.
Wu, H.; Wang, M.; Xu, Z.; Jia, Y. Graph Attention Enabled Convolutional Network for Distribution System Probabilistic Power Flow. IEEE Trans. Ind. Appl. 2022, 58, 7068–7078.
Alzubaidi, M.; Hasan, K.N.; Meegahapola, L.; Rahman, M.T. Identification of Efficient Sampling Techniques for Probabilistic Voltage Stability Analysis of Renewable-Rich Power Systems. Energies 2021, 14, 2328.
Hu, J.; Li, H.; Liu, Z. A Novel Scenario Generation Framework Based on the Knowledge of Existing Wind Power Plants. IEEE Trans. Sustain. Energy 2021, 12, 1229–1241.
Malekshah, S.; Banihashemi, F.; Daryabad, H.; Yavarishad, N.; Cuzner, R. A zonal optimization solution to reliability security constraint unit commitment with wind uncertainty. Comput. Electr. Eng. 2022, 99, 107750.
Shaheen, M.A.M.; Ullah, Z.; Qais, M.H.; Hasanien, H.M.; Chua, K.J.; Tostado-Véliz, M.; Turky, R.A.; Jurado, F.; Elkadeem, M.R. Solution of Probabilistic Optimal Power Flow Incorporating Renewable Energy Uncertainty Using a Novel Circle Search Algorithm. Energies 2022, 15, 8303.
Lee, M.; Yoon, M.; Cho, J.; Choi, S. Probabilistic Stability Evaluation Based on Confidence Interval in Distribution Systems with Inverter-Based Distributed Generations. Sustainability 2022, 14, 3806.
Kim, G.; Hur, J. Probabilistic modeling of wind energy potential for power grid expansion planning. Energy 2021, 230, 120831.
Wang, Y.; Hu, Q.; Li, L.; Foley, A.M.; Srinivasan, D. Approaches to wind power curve modeling: A review and discussion. Renew. Sustain. Energy Rev. 2019, 116, 109422.
Wang, Y.; Hu, Q.; Srinivasan, D.; Wang, Z. Wind Power Curve Modeling and Wind Power Forecasting With Inconsistent Data. IEEE Trans. Sustain. Energy 2019, 10, 16–25.
Marčiukaitis, M.; Žutautaitė, I.; Martišauskas, L.; Jokšas, B.; Gecevičius, G.; Sfetsos, A. Non-linear regression model for wind turbine power curve. Renew. Energy 2017, 113, 732–741.
Wang, J.; AlShelahi, A.; You, M.; Byon, E.; Saigal, R. Integrative Density Forecast and Uncertainty Quantification of Wind Power Generation. IEEE Trans. Sustain. Energy 2021, 12, 1864–1875.
Wang, J.; Hu, J.; Ma, K. Wind speed probability distribution estimation and wind energy assessment. Renew. Sustain. Energy Rev. 2016, 60, 881–899.
Aydin, O.; Igliński, B.; Krukowski, K.; Siemiński, M. Analyzing Wind Energy Potential Using Efficient Global Optimization: A Case Study for the City Gdańsk in Poland. Energies 2022, 15, 3159.
Wahbah, M.; Mohandes, B.; EL-Fouly, T.H.M.; El Moursi, M.S. Unbiased cross-validation kernel density estimation for wind and PV probabilistic modelling. Energy Convers. Manag. 2022, 266, 115811.
Han, Q.; Ma, S.; Wang, T.; Chu, F. Kernel density estimation model for wind speed probability distribution with applicability to wind energy assessment in China. Renew. Sustain. Energy Rev. 2019, 115, 109387.
Wang, Z.; Wang, W.; Liu, C.; Wang, B.; Feng, S. Short-term probabilistic forecasting for regional wind power using distance-weighted kernel density estimation. IET Renew. Power Gener. 2018, 12, 1725–1732.
Hong, P.; Qin, Z. Distributed Active Power Optimal Dispatching of Wind Farm Cluster Considering Wind Power Uncertainty. Energies 2022, 15, 2706.
Public Data Portal. Available online: https://www.data.go.kr/ (accessed on 1 December 2022).
Teimourian, H.; Abubakar, M.; Yildiz, M.; Teimourian, A. A Comparative Study on Wind Energy Assessment Distribution Models: A Case Study on Weibull Distribution. Energies 2022, 15, 5684.
Silverman, B.W. Density Estimation for Statistics and Data Analysis, 1st ed.; Routledge: New York, NY, USA, 1998.
Wand, M.P.; Jones, M.C. Kernel Smoothing, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 1994.
Hall, P.; Marron, J.S.; Park, B.U. Smoothed cross-validation. Probab. Theory Relat. Fields 1992, 92, 1–20.
Żychaluk, K.; Patil, P.N. A cross-validation method for data with ties in kernel density estimation. Ann. Inst. Stat. Math. 2008, 60, 21–44.
Suga, N.; Yano, K.; Webber, J.; Hou, Y.; Higashimori, T.; Suzuki, Y. Estimation of Probability Density Function Using Multi-bandwidth Kernel Density Estimation for Throughput. In Proceedings of the 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020; p. 19532270.
Mohammadi, K.; Alavi, O.; Mostafaeipour, A.; Goudarzi, N.; Jalilvand, M. Assessing different parameters estimation methods of Weibull distribution to compute wind power density. Energy Convers. Manag. 2016, 108, 322–335.
Chang, T.P. Estimation of wind energy potential using different probability density functions. Appl. Energy 2011, 88, 1848–1856.

Figure 1. Histogram of WPD for the six regions.

Figure 2. Locations where the six WPDs were measured.

Figure 3. Double-bounded characteristics of the WPD.

Figure 4. WPD and WPPM-Weibull in YG1. (a) Comparison of YG1-PDF and WPPM-Weibull-PDF. (b) Comparison of YG1-ECDF and WPPM-Weibull-CDF.

Figure 5. The procedure to obtain probability density through KDE.

Figure 6. Variation of PDFs according to the bandwidth.

Figure 7. Bandwidths calculated from the normalized WPD of six regions.

Figure 8. WPPM-KDE and WPPM-KDE-Reflection in YH.

Figure 9. WPPM-KDE-Reflection and WPPM-PM in YH.

Figure 10. The flowchart of the proposed method.

Table 1. Information on the six sites: region, installed capacity, and type.

Regions	Installed Capacity	Type
Jeonnam Yeongwang 1 (YG1)	2300 kW	Generator
Jeonnam Yeongwang 2 (YG2)	3000 kW	Generator
Gyeongbuk Gunwi (GW)	11,550 kW	Plant
Incheon Yeongheung (YH)	46,000 kW	Plant
Jeju Seongsan (SS)	20,000 kW	Plant
Jeonnam Hwasun (HS)	16,000 kW	Plant

Table 2. Parameters of the Weibull distribution estimated for the six regions.

Region	Scale Parameter	Shape Parameter
YG1	181.963	0.365
YG2	69.366	0.304
GW	866.286	0.372
YH	2490.728	0.388
SS	3409.666	0.549
HS	2072.767	0.510
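
As a rough illustration of why an unbounded parametric model sits awkwardly on double-bounded WPD, the sketch below evaluates the two-parameter Weibull CDF, $F(x) = 1 - e^{-(x/\lambda)^{k}}$, at the installed capacity of YG1. The scale and shape values are taken from Table 2 and the capacity from Table 1. Two points are assumptions made only for this illustration: that the fit is expressed in kW, and the helper name `weibull_cdf`, which is not from the paper.

```python
import math


def weibull_cdf(x, scale, shape):
    """Two-parameter Weibull CDF; zero for non-positive x."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


# YG1 values: scale and shape from Table 2, installed capacity from Table 1.
# Assumption: the Weibull fit and the capacity share the same unit (kW).
scale, shape = 181.963, 0.365
capacity_kw = 2300.0

# Probability mass that the fitted model places above the physical upper bound.
tail_mass = 1.0 - weibull_cdf(capacity_kw, scale, shape)
print(f"Weibull mass above installed capacity (YG1): {tail_mass:.3f}")
```

Under these assumptions, roughly 8% of the fitted probability mass lies above 2300 kW, a range the generator cannot reach.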

Table 3. Bandwidth calculated using six bandwidth selectors for six regions.

Region	Dataset	ROT	DPI	SJ	LSCV	BCV	SCV
YG1	(1)	83.19	13.16	2.18	X	X	15.93
YG1	(2)	36.17	5.72	0.95	X	X	6.93
YG1	Ratio	2.3	2.3	2.3	X	X	2.3
YG2	(1)	38.06	5.05	0.60	X	X	14.05
YG2	(2)	12.69	1.68	0.20	X	X	4.68
YG2	Ratio	3	3	3	X	X	3
GW	(1)	420.18	62.14	13.83	X	121.52	58.19
GW	(2)	36.38	5.38	1.20	X	X	5.04
GW	Ratio	11.55	11.55	11.55	X	X	11.55
YH	(1)	1009.30	160.93	39.86	X	X	189.11
YH	(2)	21.94	3.50	0.87	X	X	4.11
YH	Ratio	46	46	46	X	X	46
SS	(1)	840.49	199.20	66.16	X	X	185.76
SS	(2)	42.02	9.96	3.31	X	252.69	9.29
SS	Ratio	20	20	20	X	X	20
HS	(1)	570.23	119.22	34.20	X	X	109.10
HS	(2)	35.64	7.45	2.14	X	116.60	6.82
HS	Ratio	16	16	16	X	X	16
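
The Ratio rows coincide with the installed capacities of Table 1 expressed in MW (2.3, 3, 11.55, 46, 20, and 16). This is what one would expect if dataset (2) is dataset (1) divided by that constant, since a bandwidth such as the rule of thumb scales linearly with the data. The sketch below checks that property on synthetic numbers. It assumes one common form of Silverman's rule, $h = 0.9\,\min(\hat{\sigma}, \mathrm{IQR}/1.34)\,n^{-1/5}$, which may differ from the exact variant used in the paper; the uniform sample is a stand-in, not the WPD, and `rot_bandwidth` is a hypothetical helper name.

```python
import random
import statistics


def rot_bandwidth(data):
    """One common form of Silverman's rule of thumb (assumed variant)."""
    n = len(data)
    std = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    spread = min(std, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


random.seed(0)
capacity_mw = 46.0  # YH: 46,000 kW in Table 1

# Synthetic stand-in for raw output in kW (NOT the measured WPD).
raw = [random.uniform(0.0, 46000.0) for _ in range(2000)]
# Assumed relation between datasets (1) and (2): division by capacity in MW.
scaled = [x / capacity_mw for x in raw]

ratio = rot_bandwidth(raw) / rot_bandwidth(scaled)
print(f"bandwidth ratio: {ratio:.4f}")
```

Because the standard deviation and the interquartile range both scale with the data, the ratio of the two bandwidths equals the scaling constant regardless of the sample drawn.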

Table 4. Results of MAE.

MAE ($10^{-4}$)
Region	WPPM-KDE (ROT)	WPPM-KDE (DPI)	WPPM-KDE (SJ)	WPPM-KDE (SCV)	WPPM-PM (ROT)	WPPM-PM (DPI)	WPPM-PM (SJ)	WPPM-PM (SCV)	WPPM-Weibull
YG1	111.227	16.172	3.785	19.373	35.461	4.188	1.819	4.864	350.132
YG2	49.428	7.837	2.935	18.286	11.317	1.995	1.154	3.678	16.434
GW	106.926	12.851	3.794	12.135	48.876	4.645	1.951	4.390	619.539
YH	68.762	9.057	3.447	10.482	45.530	4.526	1.330	5.359	389.439
SS	73.502	13.697	5.098	12.793	37.376	5.220	2.285	4.889	538.388
HS	70.994	12.852	4.024	11.829	40.266	6.468	2.189	5.978	591.832
Average	80.1398	12.0777	3.8472	14.1497	36.4710	4.5070	1.7880	4.8597	417.6273

Table 5. Results of RMSE.

RMSE ($10^{-6}$)
Region	WPPM-KDE (ROT)	WPPM-KDE (DPI)	WPPM-KDE (SJ)	WPPM-KDE (SCV)	WPPM-PM (ROT)	WPPM-PM (DPI)	WPPM-PM (SJ)	WPPM-PM (SCV)	WPPM-Weibull
YG1	836.267	99.492	15.533	121.698	46.711	0.969	0.121	1.272	1954.606
YG2	516.928	56.682	4.126	169.632	12.685	0.354	0.179	15.432	530.007
GW	872.343	56.888	11.745	53.103	212.379	1.711	0.206	1.485	4732.856
YH	501.010	32.811	6.212	40.286	265.007	7.689	0.883	9.987	1952.145
SS	276.515	32.773	9.202	30.075	38.773	0.935	0.118	0.801	3834.946
HS	331.689	36.435	8.492	32.820	91.009	4.892	0.619	4.274	4407.879
Average	555.7920	52.5135	9.2183	74.6023	111.0940	2.7583	0.3543	5.5418	2902.0732

Table 6. Improvement rate of MAE (%).

Region	ROT	DPI	SJ	SCV
YG1	68.118	74.103	51.941	74.893
YG2	77.104	74.544	60.681	79.886
GW	54.290	63.855	48.577	63.824
YH	33.786	50.028	61.416	48.874
SS	49.150	61.889	55.179	61.784
HS	43.283	49.673	45.601	49.463
Average	54.2885	62.3487	53.8992	63.1207
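
The rates in Table 6 are consistent, to within rounding of the tabulated inputs, with the relative reduction in MAE from WPPM-KDE to WPPM-PM in Table 4, i.e., $100 \times (\mathrm{MAE}_{\mathrm{KDE}} - \mathrm{MAE}_{\mathrm{PM}}) / \mathrm{MAE}_{\mathrm{KDE}}$. This formula is inferred here from the numbers rather than quoted from the text, and the function name `improvement_rate` is ours. A minimal check for the YG1 row:

```python
def improvement_rate(mae_kde, mae_pm):
    """Relative MAE reduction in percent (formula inferred from Tables 4 and 6)."""
    return 100.0 * (mae_kde - mae_pm) / mae_kde


# YG1 row of Table 4, in units of 1e-4.
mae_kde_yg1 = {"ROT": 111.227, "DPI": 16.172, "SJ": 3.785, "SCV": 19.373}
mae_pm_yg1 = {"ROT": 35.461, "DPI": 4.188, "SJ": 1.819, "SCV": 4.864}

rates_yg1 = {
    selector: improvement_rate(mae_kde_yg1[selector], mae_pm_yg1[selector])
    for selector in mae_kde_yg1
}
for selector, rate in rates_yg1.items():
    print(f"{selector}: {rate:.3f} %")
```

The same function applied to the other rows of Table 4 reproduces the remaining entries of Table 6 to the same precision.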

Table 7. Results of K-S test.

Region	WPPM-KDE (ROT)	WPPM-KDE (DPI)	WPPM-KDE (SJ)	WPPM-KDE (SCV)	WPPM-PM (ROT)	WPPM-PM (DPI)	WPPM-PM (SJ)	WPPM-PM (SCV)	WPPM-Weibull
YG1	0	0	0	0	0.005	0.645	0.997	0.581	0
YG2	0	0	0	0	0.001	0.402	0.487	0.159	0
GW	0	0	0	0	0	0.540	0.802	0.565	0
YH	0	0	0	0	0	0	0.133	0.001	0
SS	0	0	0	0	0.001	0.567	0.792	0.592	0
HS	0	0	0	0	0	0.001	0.077	0.002	0

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


