1. Introduction
Wind power has gained significant traction in the global energy mix due to its clean and renewable nature. To promote the energy revolution and reduce carbon emissions, there is a strong focus on wind power engineering, which is pollution-free and widely distributed, and international research and development efforts prioritize this field. During the “14th Five-Year Plan” period, China plans to add 310 million kilowatts of installed wind power capacity. According to statistics from China’s National Energy Administration (NEA), approximately 37.6 GW of newly installed wind power capacity was expected in 2022; Inner Mongolia leads in terms of installed capacity. Accurate and reliable assessment of wind resources is imperative for combating climate change and ensuring energy security [1]. Two important characteristics of wind are its speed and direction, and wind energy density varies across directions. Therefore, studying the joint probability density function (JPDF) of wind speed and direction allows us to quantify their correlation and assess the wind energy potential associated with a specific wind direction [2,3,4]. Wind resource assessment in conjunction with wind direction is important for improving the accuracy of micro-siting for wind farms, reducing operating costs, and improving the efficiency of wind turbines and power generation availability [5].
In recent years, various statistical models have been used to fit wind speed as a random variable. These include the Rayleigh and Weibull distributions [6], the lognormal distribution [7], and generalized extreme value distributions [8]. D’Amico et al. [9] proposed modeling wind speed data using a semi-Markov chain. Compared with a general Markov chain, the synthetic time series generated by this model more accurately reflect the statistical characteristics of wind speed data; among the variants considered, the second-order semi-Markov process in state and duration fits best. Aljeddani and Mohammed [10] used the probability density function (PDF) of the inverse Weibull distribution to model wind speed characteristics. They proposed a modified maximum likelihood function based on this distribution to enhance parameter estimation accuracy, resulting in a reliable framework for wind speed assessment. Researchers have also investigated improving the validity and robustness of the marginal PDF for wind speed by employing mixture distribution models [11,12], extended distribution models [13,14], and nonparametric kernel density estimations [15,16,17]. Alharthi [18] introduced a new statistical model called the modified sine-Weibull distribution, obtained by incorporating the Weibull distribution into the modified sine-G family of distributions, and used it to analyze wind data from Spain. This approach represents a new advancement in utilizing trigonometric functions for wind speed modeling. In the modeling and prediction of wind direction, Hirata et al. [19] proposed a nonlinear multi-observation wind direction prediction model, which improved prediction performance and expected power generation. Despite the extensive research on wind speed modeling, studies focusing on continuous angular probability distributions of wind direction remain scarce. Currently, the most commonly used distributions to characterize changes in wind direction are harmonic functions [20] and finite mixtures of von Mises distributions [21].
Previous findings highlight the limitations of assuming complete independence between wind speed and wind direction and indicate that the interdependence of the two variables should be fully considered. Johnson and Wehrly [22] proposed an angular-linear (AL) approach, which describes variable dependency by defining circular-related coefficients. Carta et al. [23] improved the AL model and applied it to the JPDF of wind speed and wind direction. In this JPDF model, the wind speed marginal distribution was described by a mixed Normal-Weibull distribution, and the marginal distribution of wind direction was obtained by fitting a mixture of von Mises distributions. Since then, the AL model has become a representative method for constructing JPDFs of wind vectors, as it fits better than conventional models [24,25]. However, in complex terrain where airflow can be blocked or accelerated, strong winds may concentrate in the prevailing direction. In such cases, the AL method is restricted by symmetry and does not always adequately represent the dependence structure.
Recently, copula functions have been widely used to construct joint models for multivariate random variables [26,27,28]. This approach allows the marginal distributions to be determined independently of the dependence structure, offering high flexibility to capture non-normal and asymmetric distributional features [29]. A number of researchers have explored the use of copula functions in wind energy studies, showcasing their ability to accurately describe the correlation among wind characteristics. Schindler and Jung [30] analyzed directional wind power generation in Germany using the Gaussian copula model. Li et al. [31] compared the copula approach with conventional methods and demonstrated that it is superior in fitting bivariate distributions of wind speed and direction, as well as in predicting extreme wind speeds. Huang et al. [32] evaluated the directional wind energy potential in Hong Kong based on various copula functions.
However, the studies mentioned above used parametric copula models, which rely on a priori distributional assumptions and are limited in the types of distributions they can handle. These assumptions and limitations can introduce errors when applied to real data. In contrast to parametric models, a non-parametric kernel density estimator is fully data-driven. Without assuming a specific functional form, it can handle intricate relationships among variables and can therefore capture non-linear correlations between them. Charpentier et al. [33,34] proposed a non-parametric copula model utilizing kernel functions. Among these, the kernel-based copula model built on the transformation idea was used to analyze financial risk data; the beta boundary kernel has proven sophisticated and robust in analyzing wind speed and direction data [35]. The empirical Bernstein copula (EBC) proposed by Sancetta and Satchell shows great flexibility in correlation analyses of circular–circular or circular–linear variables. Carnicero et al. illustrated the Bernstein copula-based circular–linear and circular–circular modeling approaches using two cases: the relationship between wind direction and precipitation, and the relationship between the wind directions at two adjacent buoys [36]. In a recent study [37], the nonparametric Bernstein copula was used to construct a JPDF of wind speed and direction, where the order of the model was determined by a stepwise search strategy combined with the cube root of the sample size recommended by Sancetta and Satchell. The model accurately describes the prevailing wind direction in complex wind environments and, in addition, the EBC method provides the desired JPDF accuracy even when the marginal distributions are poorly represented. However, to date, few studies have examined the performance of nonparametric copula methods for fitting JPDFs of wind speed and direction. Previous studies [30,38,39] contributed to the fields of wind speed and wind energy; however, they did not consider nonparametric models.
Situated along the northern border of China, Inner Mongolia boasts abundant wind resources. To promote balanced and sustainable development, Inner Mongolia has become a key region for wind energy development in China. While many studies have analyzed wind speed variations and the characteristics of wind energy distribution, there is a lack of research applying nonparametric copula methods to construct JPDFs for wind speed and direction in Inner Mongolia. Additionally, no study has explored the potential of directional wind energy in Inner Mongolia or its impact on engineering structures using this method. To address this gap, this study introduces a non-parametric copula model that uses a probabilistic transformation and an optimal bandwidth algorithm to establish correlations between wind speed and wind direction. Various parametric copula models and models that do not consider interdependence are also introduced for comparison. Measured data from monitoring stations in four different leagues and cities in Inner Mongolia are used to evaluate the fitting accuracy of the various models and to obtain marginal PDFs of wind vectors suitable for this study area. JPDFs of wind speed and direction are then established on the basis of the nonparametric copula model, and a direction-dependent wind energy assessment is carried out.
The rest of this paper is structured as follows.
Section 2 introduces the nonparametric methodology for constructing the marginal PDFs and binary JPDFs, as well as the model evaluation metrics.
Section 3 briefly describes the wind data used.
Section 4 compares the fitting accuracies of the different models and determines the JPDFs for wind speed and direction, as well as obtaining the marginal PDFs.
Section 5 calculates the directional wind energy for sites located in four different leagues and cities in Inner Mongolia, employing the best-performing JPDF model.
Section 6 summarizes the entire paper.
2. Nonparametric Probabilistic Model
This section provides a brief description of the characteristics of the wind vector components considered in this study. Nonparametric kernel density estimation (KDE) models are established separately for wind speed and wind direction, yielding marginal probability density distributions for both. In this paper, we conduct a correlation analysis between wind speed and wind direction, introducing a nonparametric kernel density estimation copula (KDE-COP) model as well as several classical copula models developed for the JPDF of the wind vector. Subsequently, various evaluation metrics are introduced to evaluate the fitting performance of the models. Among them, the KDE model and the KDE-COP model employ an optimal bandwidth algorithm to select the most suitable bandwidth.
2.1. Marginal Probability Density Function of Wind Speed
When employing a kernel density estimation model, the initial challenge is selecting the appropriate kernel function and bandwidth. Based on historical research experience, the optimality of different choices of kernel functions in kernel density estimation is nearly consistent. In practice, the selection of the smoothing parameter (bandwidth, denoted as h) is a crucial and complex issue that directly impacts the performance of the kernel estimation. If h is chosen too small, the resulting kernel estimation curve exhibits pronounced fluctuations, and it may not be sufficiently smooth, which leads to an increase in variance. Conversely, if h is chosen too large, it may overlook the multimodality of the kernel estimation, resulting in an overly smooth curve and causing significant estimation bias.
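For a sample $X_1$, $X_2$, …, $X_n$ from an unknown density $f$, the kernel estimator of the wind speed probability density function is as follows:
$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \tag{1}$$
where $n$ is the sample size, $h > 0$ is the bandwidth, and $K(\cdot)$ represents the kernel function. In this paper, the Gaussian kernel is chosen as the kernel function for fitting wind speed data in the KDE model, $K(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right)$.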
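As a minimal sketch of the Gaussian kernel estimator described above, the following pure-Python snippet evaluates the density estimate at a single point. The sample values, the function names `gaussian_kernel` and `kde`, and the bandwidth `h = 1.0` are illustrative assumptions, not data or settings from this study.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate at point x for bandwidth h > 0."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical daily maximum wind speeds in m/s (illustrative values only).
speeds = [3.2, 4.1, 5.0, 5.4, 6.3, 6.8, 7.5, 8.1, 9.6, 12.4]
density_at_6 = kde(6.0, speeds, h=1.0)
```

Because each kernel integrates to one, the estimate integrates to one for any positive bandwidth; only its smoothness changes with `h`.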
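The bandwidth is typically chosen to minimize an error criterion such as the mean integrated squared error (MISE) or its asymptotic approximation (AMISE). For the density function $f$ and its corresponding kernel estimator $\hat{f}_h$, the MISE can be expressed as follows:
$$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 dx\right] \approx \frac{R(K)}{nh} + \frac{1}{4}h^4\mu_2(K)^2 R(f'') \tag{2}$$
in which $R(g) = \int g(x)^2\,dx$, $g$ is a square-integrable function, and $\mu_2(K) = \int x^2 K(x)\,dx$. The second-order continuous derivative of the target density is denoted by $f''$, and $f''$ is square-integrable. The asymptotically optimal bandwidth obtained by minimizing the MISE (2) is
$$h_{\mathrm{AMISE}} = \left[\frac{R(K)}{\mu_2(K)^2 R(f'')\,n}\right]^{1/5} \tag{3}$$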
Commonly used bandwidth selection methods include rule-of-thumb, plug-in (PI), and data-driven cross-validation (CV) methods. Applying the normal reference distribution rule (nrd0) [40] to Equation (3), the optimal bandwidth is obtained as
$$h_{\mathrm{nrd0}} = 0.9\,\hat{\sigma}\,n^{-1/5} \tag{4}$$
In the above equation, $\hat{\sigma}$ is taken as $\min(S, Q/1.34)$, in which $S$ is the standard deviation of the sample and $Q$ is the difference between the 75% and 25% quantiles of the sample. In another rule of thumb, nrd [41], the factor in Equation (4) is taken to be 1.06. This bandwidth formula is the adjusted Equation (4), i.e.,
$$h_{\mathrm{nrd}} = 1.06\,\hat{\sigma}\,n^{-1/5} \tag{5}$$
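The two rules of thumb differ only in their leading factor, which a short sketch makes explicit. The function name `rule_of_thumb_bandwidth` and the sample are hypothetical; the quantile convention (`method="inclusive"`, linear interpolation) is an assumption chosen to mimic common statistical software, and the zero-spread fallback some implementations apply is omitted.

```python
import statistics

def rule_of_thumb_bandwidth(sample, factor):
    """factor * min(S, Q / 1.34) * n^(-1/5); factor 0.9 gives nrd0, 1.06 gives nrd."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation S
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    sigma_hat = min(s, (q3 - q1) / 1.34)  # Q is the interquartile range
    return factor * sigma_hat * n ** (-0.2)

# Hypothetical daily maximum wind speeds in m/s (illustrative values only).
speeds = [3.2, 4.1, 5.0, 5.4, 6.3, 6.8, 7.5, 8.1, 9.6, 12.4]
h_nrd0 = rule_of_thumb_bandwidth(speeds, 0.9)
h_nrd = rule_of_thumb_bandwidth(speeds, 1.06)
```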
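The least squares cross-validation (LSCV) method, which selects the bandwidth automatically from the data, is a commonly used bandwidth selector; its criterion is an unbiased estimate of the MISE up to a term that does not depend on $h$. Expanding the first equation in expression (2),
$$\mathrm{MISE}(h) = E\left[\int \hat{f}_h(x)^2\,dx\right] - 2E\left[\int \hat{f}_h(x) f(x)\,dx\right] + \int f(x)^2\,dx \tag{6}$$
It is evident that the last term in the above expression does not depend on $\hat{f}_h$, and consequently does not depend on $h$ either. Therefore, minimizing the MISE is equivalent to minimizing
$$E\left[\int \hat{f}_h(x)^2\,dx - 2\int \hat{f}_h(x) f(x)\,dx\right] \tag{7}$$
According to the principles of LSCV, the following LSCV estimate can be constructed:
$$\mathrm{LSCV}(h) = \int \hat{f}_h(x)^2\,dx - \frac{2}{n}\sum_{i=1}^{n}\hat{f}_{h,-i}(X_i) \tag{8}$$
in which $\hat{f}_{h,-i}(x) = \frac{1}{(n-1)h}\sum_{j \neq i} K\left(\frac{x - X_j}{h}\right)$ is the leave-one-out estimator. Hence, the bandwidth estimate based on the LSCV method is given by
$$\hat{h}_{\mathrm{LSCV}} = \arg\min_{h > 0}\,\mathrm{LSCV}(h) \tag{9}$$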
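The LSCV criterion can be sketched for the Gaussian kernel, where the integral of the squared estimate has a closed form (the convolution of two Gaussian kernels is a Gaussian with variance $2h^2$). The grid search, the function names, and the sample below are illustrative assumptions; a numerical optimizer could replace the grid.

```python
import math

def integral_of_squared_kde(sample, h):
    """Closed form of the integral of the squared Gaussian KDE."""
    n = len(sample)
    total = 0.0
    for xi in sample:
        for xj in sample:
            d = xi - xj
            total += math.exp(-d * d / (4.0 * h * h))
    return total / (n * n * 2.0 * h * math.sqrt(math.pi))

def leave_one_out_density(i, sample, h):
    """Gaussian KDE at sample[i], built without the i-th observation."""
    n = len(sample)
    xi = sample[i]
    s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for j, xj in enumerate(sample) if j != i)
    return s / ((n - 1) * h * math.sqrt(2.0 * math.pi))

def lscv(sample, h):
    """LSCV score: integral of squared estimate minus 2/n times the leave-one-out sum."""
    n = len(sample)
    loo = sum(leave_one_out_density(i, sample, h) for i in range(n))
    return integral_of_squared_kde(sample, h) - 2.0 * loo / n

def lscv_bandwidth(sample, grid):
    """Bandwidth on a candidate grid that minimizes the LSCV score."""
    return min(grid, key=lambda h: lscv(sample, h))

# Hypothetical daily maximum wind speeds in m/s (illustrative values only).
speeds = [3.2, 4.1, 5.0, 5.4, 6.3, 6.8, 7.5, 8.1, 9.6, 12.4]
h_grid = [0.2 + 0.1 * k for k in range(40)]  # candidates 0.2 ... 4.1
h_lscv = lscv_bandwidth(speeds, h_grid)
```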
In addition to the aforementioned bandwidth selection algorithms, two plug-in algorithms introduced by Sheather and Jones are also widely used: the direct plug-in rule (SJ-dpi) and the solve-the-equation rule (SJ-ste), which use prior estimates of the density derivatives to select the bandwidth. In this paper, these methods are also used for bandwidth estimation in the wind speed models.
The above kernel density estimation models with different bandwidths are recorded as KDE-nrd0, KDE-nrd, KDE-lscv, KDE-dpi, and KDE-ste, respectively.
2.2. Marginal Probability Density Function of Wind Direction
For an angular sample $\Theta_1$, $\Theta_2$, …, $\Theta_n$ from an unknown density $f$, the circular kernel density estimator of $f$ is defined as follows:
$$\hat{f}_\nu(\theta) = \frac{1}{n}\sum_{i=1}^{n} K_\nu(\theta - \Theta_i), \quad 0 \le \theta < 2\pi \tag{10}$$
where the bandwidth parameter is denoted by $\nu$, and $K_\nu$ represents the circular kernel function.
Currently, the most widely used parametric model for circular data is the von Mises distribution, which has a PDF of
$$f_{\mathrm{vM}}(\theta; \mu, \kappa) = \frac{1}{2\pi I_0(\kappa)}\exp\{\kappa\cos(\theta - \mu)\} \tag{11}$$
where $\theta$ represents the wind angle, the scale parameter $\kappa \ge 0$, and $\mu$ denotes the mean value of wind direction. $I_r(\cdot)$ denotes the r-order modified Bessel function of the first kind. Taking into consideration the flexibility of the von Mises distribution, this paper employs the density function of the von Mises distribution as the kernel function in wind direction kernel density estimation, yielding the density estimator based on the von Mises kernel as follows:
$$\hat{f}_\nu(\theta) = \frac{1}{2\pi n I_0(\nu)}\sum_{i=1}^{n}\exp\{\nu\cos(\theta - \Theta_i)\} \tag{12}$$
Here, $\nu$ represents the smoothing parameter (bandwidth) of the kernel density.
Following the principle of cross-validation, the optimal bandwidth can be found by searching for the maximum value of the likelihood cross-validation (LCV) function, expressed as follows:
$$\mathrm{LCV}(\nu) = \sum_{i=1}^{n}\log\hat{f}_{\nu,-i}(\Theta_i) \tag{13}$$
In this equation, $\hat{f}_{\nu,-i}$ denotes the circular kernel density estimate excluding the $i$th observed value. Consequently, the maximum likelihood bandwidth for the circular kernel density is given by the following
$$\hat{\nu}_{\mathrm{LCV}} = \arg\max_{\nu > 0}\,\mathrm{LCV}(\nu) \tag{14}$$
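A compact sketch of the von Mises kernel estimator and a grid-based likelihood cross-validation search may help make these steps concrete. Everything here is an assumption for illustration: the Bessel function is computed from its power series rather than a library call, the log-likelihood form of the LCV score is used, and the wind directions and candidate grid are hypothetical. In this parametrization, larger values of the concentration give less smoothing.

```python
import math

def bessel_i(order, x, terms=150):
    """Modified Bessel function of the first kind, integer order >= 0, via its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))
        total += term
    return total

def vm_kde(theta, sample, nu):
    """Von Mises kernel density estimate at angle theta (radians)."""
    n = len(sample)
    total = sum(math.exp(nu * math.cos(theta - t)) for t in sample)
    return total / (2.0 * math.pi * n * bessel_i(0, nu))

def lcv(sample, nu):
    """Sum of log leave-one-out densities at the observed angles."""
    n = len(sample)
    norm = 2.0 * math.pi * (n - 1) * bessel_i(0, nu)
    score = 0.0
    for i, ti in enumerate(sample):
        loo = sum(math.exp(nu * math.cos(ti - tj))
                  for j, tj in enumerate(sample) if j != i)
        score += math.log(loo / norm)
    return score

def lcv_bandwidth(sample, grid):
    """Concentration on a candidate grid that maximizes the LCV score."""
    return max(grid, key=lambda nu: lcv(sample, nu))

# Hypothetical wind directions in degrees (clockwise from north).
directions_deg = [300, 315, 320, 330, 337.5, 345, 350, 10, 20, 290, 180, 200]
directions = [math.radians(d) for d in directions_deg]
nu_grid = [0.5 * k for k in range(1, 81)]  # candidates 0.5 ... 40
nu_lcv = lcv_bandwidth(directions, nu_grid)
```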
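The mean integrated squared error for the circular kernel density is represented by $\mathrm{MISE}(\nu) = E\left[\int_0^{2\pi}\left(\hat{f}_\nu(\theta) - f(\theta)\right)^2 d\theta\right]$. The MISE typically lacks a closed-form expression, and practitioners often resort to optimizing its asymptotic approximation [42]. The AMISE is derived as
$$\mathrm{AMISE}(\nu) = \frac{1}{16}\left[1 - \frac{I_2(\nu)}{I_0(\nu)}\right]^2\int_0^{2\pi}\{f''(\theta)\}^2\,d\theta + \frac{I_0(2\nu)}{2n\pi I_0(\nu)^2} \tag{15}$$
where $f''$ represents the second derivative of the target density to be estimated.
According to the well-known rule of thumb [43], the samples are assumed to obey a von Mises distribution with a scale parameter $\kappa$, which is used as a reference density for the target circular density $f$, so that
$$\mathrm{AMISE}(\nu) \approx \frac{3\kappa^2 I_2(2\kappa)}{32\pi\nu^2 I_0(\kappa)^2} + \frac{\nu^{1/2}}{2n\pi^{1/2}} \tag{16}$$
Thus, the optimal bandwidth obtained by minimizing the above equation is
$$\hat{\nu}_{\mathrm{RT}} = \left[\frac{3n\hat{\kappa}^2 I_2(2\hat{\kappa})}{4\pi^{1/2} I_0(\hat{\kappa})^2}\right]^{2/5} \tag{17}$$
in which $\hat{\kappa}$ denotes the maximum likelihood estimate of the scale parameter $\kappa$.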
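The rule of thumb can be sketched in two steps: estimate the von Mises concentration by maximum likelihood, then plug it into the closed-form bandwidth. The code below assumes the commonly quoted form of this rule, $[3n\hat{\kappa}^2 I_2(2\hat{\kappa}) / (4\pi^{1/2} I_0(\hat{\kappa})^2)]^{2/5}$, solves the likelihood equation $I_1(\kappa)/I_0(\kappa) = \bar{R}$ by bisection, and caps the search at a hypothetical upper bound of 30. The function names and the sample are illustrative.

```python
import math

def bessel_i(order, x, terms=150):
    """Modified Bessel function of the first kind, integer order >= 0, via its power series."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))
        total += term
    return total

def mean_resultant_length(sample):
    """Length of the mean direction vector of the angles (radians)."""
    n = len(sample)
    c = sum(math.cos(t) for t in sample) / n
    s = sum(math.sin(t) for t in sample) / n
    return math.hypot(c, s)

def kappa_mle(sample, upper=30.0):
    """Solve I1(k)/I0(k) = mean resultant length by bisection on [0, upper]."""
    r_bar = mean_resultant_length(sample)
    lo, hi = 0.0, upper  # very concentrated samples are clamped at upper
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bessel_i(1, mid) / bessel_i(0, mid) < r_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rule_of_thumb_nu(sample):
    """Rule-of-thumb concentration bandwidth under a von Mises reference density."""
    n = len(sample)
    k = kappa_mle(sample)
    num = 3.0 * n * k * k * bessel_i(2, 2.0 * k)
    den = 4.0 * math.sqrt(math.pi) * bessel_i(0, k) ** 2
    return (num / den) ** 0.4

# Hypothetical wind directions in degrees (clockwise from north).
directions_deg = [300, 315, 320, 330, 337.5, 345, 350, 10, 20, 290, 180, 200]
directions = [math.radians(d) for d in directions_deg]
nu_rt = rule_of_thumb_nu(directions)
```

For a fixed concentration estimate, the bandwidth grows as $n^{2/5}$, so larger samples are smoothed less.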
Another approach is the plug-in rule adopted in this paper, which inserts a mixture of von Mises distributions into Equation (15) as the reference density [44]. A finite mixture of $M$ von Mises distributions, $\mathrm{vM}(\mu_m, \kappa_m)$, $m = 1, \ldots, M$, is defined as
$$f(\theta) = \sum_{m=1}^{M}\omega_m\frac{\exp\{\kappa_m\cos(\theta - \mu_m)\}}{2\pi I_0(\kappa_m)} \tag{18}$$
In the above equation, $\omega_m$ represents the weight coefficient, with $\sum_{m=1}^{M}\omega_m = 1$. After obtaining the estimate of AMISE($\nu$), the bandwidth is estimated by minimizing AMISE($\nu$).
Among the numerous bandwidth selection algorithms for kernel density estimation on circular data, data variability may give rise to cases in which no solution is found. Ameijeiras-Alonso [45] proposed the direct plug-in rule (AA-dpi) and the solve-the-equation rule (AA-ste), both based on the plug-in idea. These bandwidth selectors for circular data can be iterated until an estimate of the derivative of the target density is obtained, and they extend the bandwidth selectors of Sheather and Jones. The bandwidth estimates obtained in this paper by applying these two rules are denoted as $\hat{\nu}_{\mathrm{DPI}}$ and $\hat{\nu}_{\mathrm{STE}}$, respectively, and their algorithms are implemented in the R package NPCirc.
The circular kernel density estimation models with different bandwidths are respectively recorded as KDE-LCV, KDE-RT, KDE-PI, KDE-DPI, KDE-STE.
2.3. Metrics for Model Evaluation
In this paper, three metrics are introduced to assess the goodness-of-fit of the marginal distribution: the coefficient of determination ($R^2$), the root-mean-square error (RMSE), and the mean absolute error (MAE). The expressions of these metrics are shown in Table 1.
In Table 1, $y_i$ and $\hat{y}_i$ represent the actual and the estimated values of the distribution for the $i$th sample, respectively; $\bar{y}$ denotes the mean of the modified empirical cumulative distribution, and $n$ is the sample size. A higher $R^2$, approaching 1, and smaller RMSE and MAE values indicate better model fitting accuracy.
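Since Table 1 is not reproduced here, the following sketch assumes the standard textbook definitions of the three metrics. The function name `fit_metrics` and the example values are hypothetical.

```python
import math

def fit_metrics(actual, estimated):
    """Return (R2, RMSE, MAE) using the standard definitions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((y - yh) ** 2 for y, yh in zip(actual, estimated))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - yh) for y, yh in zip(actual, estimated)) / n
    return r2, rmse, mae

# Hypothetical empirical and fitted cumulative probabilities.
r2, rmse, mae = fit_metrics([0.1, 0.4, 0.7, 0.9], [0.12, 0.38, 0.71, 0.88])
```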
2.4. Joint Probability Density Function Estimation of Wind Speed and Direction
For wind speed and wind direction variables, conventional linear correlation coefficients may not accurately reflect the correlation between them. Copula functions, as a type of linking function, are widely applied in correlation analysis. In this paper, they are used to characterize the non-linear relationship between wind speed and wind direction.
$F_X(x)$ and $F_\Theta(\theta)$ represent the cumulative distribution functions (CDFs) of the two wind vector variables, respectively, and their joint cumulative distribution function (JCDF) is denoted as $F(x, \theta)$. In accordance with Sklar’s theorem [46], the relationship between wind speed and direction can be expressed with a copula function $C$:
$$F(x, \theta) = C\left(F_X(x), F_\Theta(\theta)\right) \tag{19}$$
Since $u = F_X(x)$ and $v = F_\Theta(\theta)$, then $F(x, \theta) = C(u, v)$. The JPDF of wind speed and direction is expressed as the following equation:
$$f(x, \theta) = c(u, v)\,f_X(x)\,f_\Theta(\theta) \tag{20}$$
where $c(u, v) = \partial^2 C(u, v)/\partial u\,\partial v$ is the copula density, and $f_X$ and $f_\Theta$ are the marginal PDFs of wind speed and wind direction.
Parametric models are typically used to estimate the copula function, and commonly used parametric copula models include the Gaussian copula, Student t-copula, Clayton copula, Frank copula, and Gumbel copula. The formulas of these models can be found in the literature [31]. However, parametric copula models also have limitations, such as exhibiting boundary biases and being more suitable for describing symmetric distributions.
Nonparametric copula models offer greater flexibility as they do not rely on prior knowledge or assumptions about known distributions. However, only a few studies have considered the nonparametric copula approach for analyzing directional wind energy. Therefore, this paper aims to explore and describe the nonlinear correlation mechanism between wind vector variables employing this model. The methodology introduced by Charpentier et al. [33] uses the transformation method to construct a nonparametric copula model, which can effectively avoid the boundary error of kernel estimation. In this paper, the Gaussian CDF Φ is selected as the transformation function; given the variable $(U, V)$, the transformed bivariate variable is $(S, T) = \left(\Phi^{-1}(U), \Phi^{-1}(V)\right)$. According to Sklar’s theorem, the joint density function of the transformed variables can be derived as follows:
$$g(s, t) = c\left(\Phi(s), \Phi(t)\right)\varphi(s)\,\varphi(t) \tag{21}$$
where Φ denotes the standard Gaussian CDF and φ is the first-order derivative of Φ. Hence, the above equation can be estimated using standard kernel density methods to obtain an estimate $\hat{g}$ of $g$. In turn, the estimate of the copula density for wind speed and direction can be obtained:
$$\hat{c}(u, v) = \frac{\hat{g}\left(\Phi^{-1}(u), \Phi^{-1}(v)\right)}{\varphi\left(\Phi^{-1}(u)\right)\varphi\left(\Phi^{-1}(v)\right)} \tag{22}$$
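The transformation estimator can be sketched end to end in a few lines: map the data to pseudo-observations, apply the probit transform, run an ordinary bivariate kernel estimate, and divide by the Gaussian densities. This is a simplified illustration under stated assumptions, not the estimator used in this study: pseudo-observations are rank based, a product Gaussian kernel with a single scalar bandwidth `h` is used, `h = 0.5` is arbitrary, the data are hypothetical, and wind direction is treated as a linear variable on its recorded range.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard Gaussian: cdf, pdf, inv_cdf

def pseudo_observations(values):
    """Rank-based pseudo-observations in (0, 1); ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u

def copula_density(u, v, speeds, directions, h):
    """Probit-transformation kernel estimate of the copula density at (u, v), 0 < u, v < 1."""
    us = pseudo_observations(speeds)
    vs = pseudo_observations(directions)
    s, t = _STD.inv_cdf(u), _STD.inv_cdf(v)
    n = len(us)
    g = 0.0
    for ui, vi in zip(us, vs):
        si, ti = _STD.inv_cdf(ui), _STD.inv_cdf(vi)
        g += _STD.pdf((s - si) / h) * _STD.pdf((t - ti) / h)
    g /= n * h * h  # kernel estimate of the transformed density
    return g / (_STD.pdf(s) * _STD.pdf(t))  # back-transform to the unit square

# Hypothetical wind speeds (m/s) and directions (degrees), illustrative only.
speeds = [5.1, 7.4, 6.2, 9.8, 12.3, 4.4, 8.7, 10.5, 6.9, 11.2]
directions = [200, 290, 250, 310, 330, 180, 300, 320, 270, 325]
c_mid = copula_density(0.5, 0.5, speeds, directions, h=0.5)
```

Multiplying such a copula density estimate by the two marginal density estimates gives a JPDF estimate in the sense of Sklar's theorem.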
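The MISE criterion and the CV criterion are again employed when determining the bandwidth, with their expressions as follows:
$$\mathrm{MISE}(H) = E\left[\iint\left(\hat{g}_H(s, t) - g(s, t)\right)^2 ds\,dt\right] \tag{23}$$
$$\mathrm{CV}(H) = \iint\hat{g}_H(s, t)^2\,ds\,dt - \frac{2}{n}\sum_{i=1}^{n}\hat{g}_{H,-i}(S_i, T_i) \tag{24}$$
in which $H$ is the bandwidth matrix, and $\hat{g}_{H,-i}$ is the estimate $\hat{g}_H$ computed excluding the $i$th observation and evaluated at $(S_i, T_i)$.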
Two optimal smoothing parameter selection methods, i.e., the PI algorithm and the profile CV algorithm, are applied, building upon the improved transformation kernel estimation method [47]. The optimal bandwidth for the non-parametric copula is calculated by minimizing Equations (23) and (24), respectively. The JPDF of the two wind vector variables can then be calculated from the estimate $\hat{c}(u, v)$. These two non-parametric copula-based models are referred to as KDE-COP-PI and KDE-COP-CV. Additionally, several parametric copula models are introduced in this paper for comparison.
3. Wind Data
In this study, observed daily maximum wind speeds and the corresponding wind directions at four meteorological stations in the Inner Mongolia Autonomous Region of China were used as samples. These four meteorological sites, located in central and eastern Inner Mongolia, play a vital role in regional climate monitoring; their geographic information is provided in Table 2, and their locations are shown in Figure 1. For brevity, the stations Hohhot, Arxan, Abag Banner, and Linxi County are hereafter referred to as S1, S2, S3, and S4, respectively. Wind data covering a 5-year period, obtained from the China National Weather Network, were used in this paper. Wind direction is circular data taking values from 0° to 360°, with 0° denoting due north and angles increasing clockwise. Directions are divided at 22.5° intervals into 16 sectors: due north, north-northeast, northeast, east-northeast, due east, east-southeast, southeast, south-southeast, due south, south-southwest, southwest, west-southwest, due west, west-northwest, northwest, and north-northwest, noted as N, NNE, NE, ENE, E, ESE, SE, SSE, S, SSW, SW, WSW, W, WNW, NW, and NNW, respectively.
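The 16-sector convention can be illustrated with a small helper that maps an angle in degrees to its sector label. The function name `sector_label` is hypothetical, and the sketch assumes each sector is centered on its nominal direction (so N covers 348.75° to 11.25°), which the text does not state explicitly.

```python
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def sector_label(degrees):
    """Label of the 22.5-degree sector containing the angle (clockwise from north)."""
    index = int((degrees % 360.0) / 22.5 + 0.5) % 16
    return SECTORS[index]

labels = [sector_label(d) for d in (0, 22.5, 180, 315, 350)]
```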
5. Directional Wind Energy Assessment
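Wind power density (WPD) is a crucial metric in wind energy assessment [48]. Its formula is expressed as follows:
$$\mathrm{WPD} = \frac{1}{2}\rho\int_0^{\infty} x^3 f(x)\,dx \tag{26}$$
In this paper, based on the constructed bivariate distribution model KDE-COP-CV and Formula (26), the WPD related to wind direction is rewritten as follows:
$$\mathrm{WPD}(\theta) = \frac{1}{2}\rho\int_0^{\infty} x^3 f(x, \theta)\,dx \tag{27}$$
Here, $x$ is the wind speed, $f(x, \theta)$ is the JPDF of wind speed and direction, and $\rho$ represents air density, typically assumed to be a constant (commonly $\rho = 1.225\ \mathrm{kg/m^3}$, the standard sea-level value).
The WPD for 16 wind directions is obtained using Formula (27). The overall WPD ($\mathrm{WPD}_{\mathrm{all}}$) for all directions is derived by substituting the wind speed marginal PDF in place of the JPDF in Equation (27), namely
$$\mathrm{WPD}_{\mathrm{all}} = \frac{1}{2}\rho\int_0^{\infty} x^3 f_X(x)\,dx \tag{28}$$
The reference WPD calculated using real data is denoted as $\mathrm{WPD}_{\mathrm{ref}}$.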
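The integrals above are one-dimensional and straightforward to approximate numerically. The sketch below uses the trapezoidal rule for the directional WPD and a sample mean of cubed speeds for the reference value. Several pieces are assumptions made for illustration: the air density of 1.225 kg/m³, the truncation of the integral at 60 m/s, the reading of "reference WPD" as the empirical mean, and `toy_jpdf`, a placeholder joint density (Rayleigh speeds independent of a uniform direction) that stands in for a fitted model.

```python
import math

RHO = 1.225  # air density in kg/m^3 (standard sea-level value, assumed)

def trapezoid(func, a, b, steps):
    """Composite trapezoidal rule for func on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for k in range(1, steps):
        total += func(a + k * width)
    return total * width

def directional_wpd(jpdf, theta, x_max=60.0, steps=6000):
    """0.5 * rho * integral of x^3 * f(x, theta) over speeds, truncated at x_max."""
    return 0.5 * RHO * trapezoid(lambda x: x ** 3 * jpdf(x, theta), 0.0, x_max, steps)

def empirical_wpd(speeds):
    """Reference WPD from observations: 0.5 * rho * mean of cubed speeds."""
    return 0.5 * RHO * sum(x ** 3 for x in speeds) / len(speeds)

def toy_jpdf(x, theta, sigma=5.0):
    """Placeholder JPDF: Rayleigh speed density times a uniform direction density."""
    rayleigh = (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))
    return rayleigh / (2.0 * math.pi)

wpd_north = directional_wpd(toy_jpdf, 0.0)
```

With direction measured in radians, the directional values are densities per unit angle, so integrating them over the full circle recovers the overall WPD.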
Table 6 compares the distribution of the reference WPD and the model-based WPD in different directions, showing significant variability in wind energy across wind directions. Figure 10 displays the distribution of WPD across wind directions. As shown in Figure 10a,b, the dominant wind directions at sites S1 and S2 are northwest and north-northwest, and the highest WPD at station S1 reaches 416.15 W/m². Moreover, the WPD at site S1 in the due north, west-northwest, and south-southwest directions also exceeds 60 W/m². For the S3 site, Figure 10c shows that wind energy is more abundant in the westerly, northerly, and southerly directions, with a WPD greater than 150 W/m², while the north-northeast and easterly directions have a more uniform distribution of WPD, with values ranging from 64 W/m² to 73 W/m². As shown in Figure 10d, the wind resources at S4 are concentrated in five directions from west-southwest to north-northwest, with WPD values in the range of 108–221 W/m². To validate the fitting performance of the model employed in this study, Figure 11 illustrates the results for the overall WPD and the reference WPD at the four sites. The WPD obtained in this paper is calculated according to the wind direction of the maximum wind speed, which provides a basis for siting the turbine and determining the direction of blade rotation. If the actual turbine installation direction deviates from the direction of maximum wind speed, the WPD obtained here will overestimate the energy actually available.
It should be noted that this study evaluates wind energy variation based on wind speeds at 10 m above ground level. The specific wind energy density should be determined based on the hub height of different wind turbines. See https://en.wind-turbine-models.com/ (accessed on 10 December 2023) for pertinent technical parameters and power curves of the turbine model. In general, higher hub heights correspond to greater WPD values, since wind speed typically increases with height. Differences in wind power generation potential underscore the need for directional investigations. These findings aid in optimizing the design and state monitoring of wind turbine assemblies and have significant practical implications for the design and selection of flexible structures.