**1. Introduction**

When the penetration of wind power exceeds a certain value, it seriously affects power quality. At present, the error rate of wind speed forecasting of wind farms is about 25%–40%, and the research on wind speed forecasting of wind farms has not reached a satisfactory level [1]. If wind speed and wind power could be accurately predicted, it would be beneficial for the power system dispatching department to adjust the scheduling plan in time, which could effectively reduce the impact of wind power on the power grid [2]. At the same time, the improvement of prediction accuracy could also reduce the operating cost and rotation reserve of power systems [3], increase the limit of wind power penetration, and lay the foundation for wind farms to participate in bidding for power generation [4]. Many researchers have developed several different wind speed prediction methods. The simplest prediction method is the continuous method, which uses the closer wind speed or power observation value as the prediction value for the next point [5]. Other prediction methods include Kalman filters [6], ARMA [7], artificial neural network (ANN) [8], fuzzy logic, and so on. These methods only need the wind speed or power time series of the wind farm to build a model and make predictions. The spatial correlation method needs to consider the wind farm and the wind speed time series of several places

close to it, using several locations. Then, the spatial correlation between wind speeds is used to predict the wind speed of a wind farm and to predict wind power.

In recent years, the prediction of stochastic sequences with long-range dependence (LRD) characteristics has become a hot topic and can be applied to the prediction of non-stationary stochastic processes. The LRD model [9,10] can give better forecasting of the stochastic sequence by comprehensively considering the influence of both the past state and the current state on the future state. Fractional Brownian motion models with LRD characteristics have been widely applied in this field [11–13]. The LRD of fractional Brownian motion is described by the only parameter H (self-similarity index). Compared with fractional Brownian motion the LRD of the fractional Levy stable motion (fLsm) is determined instead by two parameters α and H, which can separately characterize the local irregularity and global persistence [14] so that fLsm can describe the long correlation process more flexibly. Therefore, in the following, we used a prediction model of stochastic sequences based on fLsm with LRD to predict wind speed.

The prediction method used in this paper involves the maximum Lyapunov exponent and fLsm iterative prediction model [15]. The Lyapunov exponent can help us distinguish between noise and signals that obey a certain law. In this paper, we mainly used the reciprocal of the maximum Lyapunov exponent to represent the maximum predictive steps. The methods for calculating the Lyapunov exponent are the definition method, small-data method, wolf method, Jacobian method, etc. This paper used the small-data method [16], which makes full use of all available data and therefore has relatively high accuracy. The small-data method is fast in operation and easy to implement, and it shows strong robustness to the embedding dimension and time delay, as well as the size of the data amount. However, the choice of the embedding dimension is subjective, and the time delay is not necessarily accurate. Therefore, we needed to use the c-c method [17,18] to avoid this problem.

The fLsm iterative prediction model was established by fLsm-driven Langevin-type stochastic differential equation (SDE) [19]. First, the fractional Black-Scholes model [20,21] was extended and the parameterized SDE was obtained. Then, the fLsm was discretized by Taylor series expansion of fractional order [22], and the mathematical relationship between the increment of flsm and Levy's stable white noise was obtained and substituted into discrete Langevin-type SDE. Finally, using the discrete Langevin-type SDE and the difference equation, the expression of the proposed fLsm finite difference iterative prediction model was obtained.

Wind speed is mainly affected by weather and terrain shape, and it also changes with altitude, so randomness is the basic property of wind speed, at least on a small scale. In this paper, we used Langevin-type SDE [19] driven by fLsm to describe the randomness of wind speed. However, the wind speed data in most regions do not have heavy tail characteristics, which will lead to a larger error when using wind speed data to estimate the parameters of the fLsm iterative prediction model. From the characteristics of the data: weighting can increase the variance of the data so that the data can show heavy-tailed features.

This paper is organized as follows. The small-data method is introduced in Section 2, The fLsm is introduced in Section 3, where we also analyze the model and LRD characteristics. The fLsm finite difference iterative forecasting model is proposed in Section 4, which establishes the finite-difference iterative forecasting model by making Langevin-type SDE [19] driven by fractional Levy stable motion, and the Langevin-type SDE [19] parameters estimated by the novel Characteristic Function (CF) method [23–25]. The wind speed forecasting results show the superiority of the method used in this paper (Section 5). The mathematical relationship between wind speed and wind power is introduced in Section 6. Concluding remarks are given in Section 7.

#### **2. Maximum Prediction Steps Based on Lyapunov Exponent**

The small-data method [16] is defined as follows.

Let {*<sup>x</sup>*1, *x*2 ··· *xN*}, be a given chaotic time series, then the reconstructed phase space is defined as:

$$\mathbf{Y}\_{i} = \left(\mathbf{x}\_{i}, \mathbf{x}\_{i+\tau\_{i}} \cdots, \mathbf{x}\_{i+(m-1)\tau}\right) \in \mathbb{R}^{m}, (i = 1, 2, \cdots, M), \tag{1}$$

where *N* = *M* + (*m* − <sup>1</sup>)<sup>τ</sup>. The embedding dimension m and the time delay τ can be chosen according to the C-C method [17,18].

After the reconstruction of phase space, find the nearest adjacent point of each point on the given orbit, i.e.,

$$d\_{\dot{\jmath}}(0) = \min\_{\mathcal{X}\_{\dot{\jmath}}} \|Y\_{\dot{\jmath}} - Y\_{\dot{\jmath}}\|\_{\mathcal{I}} \tag{2}$$

$$\left| \left| j - \hat{f} \right| > p \right. \tag{3}$$

where *p* is the average period of the time series, which can be estimated by the inverse of the average frequency of the power spectrum, and the maximum Le can be estimated by the average divergence rate of each point on the basic orbit. For each reference point, calculate the distance to the nearest discrete point after the first discrete time step by

$$d\_j(i) = \min\_{\mathbf{x}\_j} \|Y\_{j+i} - Y\_{j+i}\|\_\star \text{ } i = 1, 2, \dots, \min\{M - j, M - j\}, \tag{4}$$

The average divergence rate obeys the exponential divergence, i.e.,:

$$d\_{\rangle}(i) = \mathbb{C}\_{\rangle} x^{\lambda\_1(i \wedge \Lambda t)}, \; \mathcal{C}\_{\rangle} = d\_{\rangle}(0), \tag{5}$$

Take the logarithm on both sides to get:

$$
\ln d\_{\bar{f}}(i) = \ln \mathcal{C}\_{\bar{f}} + \lambda\_1 (i \cdot \Delta t), \tag{6}
$$

Obviously, the maximum Le is roughly equivalent to the slope on this set of straight lines. It can be obtained by approximating this set of lines by the method of least squares.

$$
\lambda\_1 = \frac{\ln d\_j(i) - \ln \mathbf{C}\_j}{i \cdot \Delta t},
\tag{7}
$$

The reciprocal of the maximum Lyapunov exponent is the maximum prediction steps ε when λ1 > 0.

$$
\varepsilon = \frac{1}{\lambda\_1 \prime} \tag{8}
$$

#### **3. Fractional Levy Stable Motion**

#### *3.1. Parameter Meaning of Levy Stable Motion*

Levy stable motion represents a non-Gaussian random process with LRD and high variability, we denote by *X* ∼ *<sup>S</sup>*α(β, δ, μ) the stable distribution with parameters α, β, δ and μ. Its characteristic function form is as follows [26]:

$$\varphi(\theta:\alpha,\beta,\delta,\mu) = E[e^{j\theta\mathbf{x}}] = \begin{cases} \exp\left[j\mu\theta - \delta|\theta|^a \left[1 - j\beta \frac{\partial}{|\theta|} \tan\left(\frac{\pi\alpha}{2}\right)\right] \right], \alpha \neq 1\\ \exp\left[j\mu\theta - \delta|\theta|^a \left[1 + j\beta \frac{\partial}{|\theta|} \frac{2}{\pi} \ln|\theta|\right] \right], \alpha = 1 \end{cases} \tag{9}$$

where δ > 0, β ∈ [−1, 1], μ ∈ *R*. The parameter β is called the skewness parameter, while δ is called scale parameter, and μ the location parameter. In this article, we studied symmetric stable distribution, so we make β = 0. The location parameter μ indicates the mean, and the scale parameter δ represents the discrete nature of the distribution.

Where α ∈ (0, 2]. The parameter α is the tail parameter and the distribution is Gaussian when α = 2, whereas the tail is exponential. In what follows, we typically supposed 0 <α< 2. When *x* → <sup>∞</sup>, the probability tails of *X* satisfy [27]:

$$P\{|X| > \mathbf{x}\} \sim \mathcal{C}\_{\mathbf{a}} \delta^{a} \mathbf{x}^{-a},\tag{10}$$

where *Ca* is a constant. The tail of the distribution with 0 <α< 2 obeys a power law and decreases to zero so slowly that the variance is infinite; the smaller the value of α, the slower the decrease. From the perspective of probability distribution, as the value of α decreases, its tail becomes thicker (Figure 1).

**Figure 1.** Influence of different characteristic index values on the probability distribution function.

*3.2. Long-Range Dependence and Self-Similarity Fractional Levy Stable Motion*

The model of fLsm [14] is given by the following stochastic integral:

$$L\_{H,a}(t) = \int\_{-\infty}^{\infty} \left\{ a \left[ (t-s)\_{+}^{H-\frac{1}{a}} - (-s)\_{+}^{H-\frac{1}{a}} \right] + b \left[ (t-s)\_{-}^{H-\frac{1}{a}} - (-s)\_{-}^{H-\frac{1}{a}} \right] \right\} \lambda ds,\tag{11}$$

where *a* and *b* are the arbitrary constants, *xH*−1/<sup>α</sup> + = 0 for *x* ≤ 0 and *xH*<sup>−</sup> 1α + = *xH*<sup>−</sup> 1α for *x* > 0, *M* ∈ *R* is the symmetric Levy stable random measure, and *H* is the self-similarity parameter. The incremental process of fLsm [28] is as follows:

$$\begin{array}{lcl}X\_{H,a}(t) &= L\_{H,a}(t+1) - L\_{H,a}(t) \\ &= \int\_{-\infty}^{\infty} \left[ a \left[ (t+1-s)\_{+}^{H-1/a} - (-s)\_{+}^{H-1/a} \right] \right. \\ &+ b \left[ (t+1-s)\_{-}^{H-1/a} - (-s)\_{-}^{H-1/a} \right] \omega\_{a}(s), \end{array} \tag{12}$$

where ωα(*s*) is the Levy stable white noise.

Symmetric Levy stable motion is 1/α self-similar, namely, *<sup>L</sup>*α(*t*) - *<sup>a</sup>*<sup>−</sup>1/<sup>α</sup>*L*α(*at*) for all *a* > 0. Laskin et al. [29] have shown that the fLsm is a self-similar process with self-similar parameter *H* − 1/2 + 1/<sup>α</sup>. The incremental process *LH*,<sup>α</sup>(*<sup>t</sup>*2) − *LH*,<sup>α</sup>(*<sup>t</sup>*1) is also self-similar with *H* − 1/2 + 1/<sup>α</sup>.

The key parameters α, *H* of the fLsm model are not independent in some cases, i.e., the fLsm has LRD characteristics for α*H* > 1 [30]. It is worth noting that the fLsm model has no long memory when 0 <α< 1, therefore, the range of α is limited to (1, 2) to ensure that the fLsm model has the LRD characteristic. At the same time, 0.5 < *H* < 1 is also required.

#### **4. Iterative Forecasting Model Based on Fractional Levy Stable Motion**

#### *4.1. Iterative Forecasting Model*

Let us consider the following Langevin-type stochastic differential equation driven by Levy stable motion [19]:

$$dX(t) = b(t, X(t))dt + \delta(t, X(t))dL\_{\mathfrak{a}}(t), \; X(0) = X\_{0\prime} \tag{13}$$

where *dL*α(*t*) stands for the increments of Levy α-stable motion *<sup>L</sup>*α(*t*). By replacing *LH*,<sup>α</sup>(*t*) to *<sup>L</sup>*α(*t*), we obtain the Langevin-type stochastic differential equation driven by fractional Levy stable motion:

$$dX\_{H,a}(t) = b(t, X\_{H,a}(t))dt + \delta(t, X\_{H,a}(t)), \\ dL\_{H,a}(t)X\_{H,a}(0) = X\_{0\prime} \tag{14}$$

where *b*(*<sup>t</sup>*, *X*(*t*)) and <sup>δ</sup>(*<sup>t</sup>*, *X*(*t*)) represent the drift and diffusion functions, respectively.

The fractional Black-Scholes model [20,21], which was developed by W. DAI et al. [31,32] has expression in the form:

$$dS\_l = \mu S\_l dt + \delta S\_l dB\_H(t),\tag{15}$$

where μ indicates the expected return rate and δ is the volatility rate. The Levy stable distribution is the Gaussian distribution when α = 2 so that when α = 2 the fLsm becomes the fractional Brownian motion, μ represents the mean, and δ represents the diffusion coefficient. The parameters *b* and δ in the Levy stable distribution represent the mean and diffusion coefficient, respectively, in 1 < α ≤ 2. Consequently, Equation (14) can be rewritten as follows:

$$dX\_{\mathcal{H},a}(t) = \mu X\_{\mathcal{H},a}(t)dt + \delta X\_{\mathcal{H},a}(t)dL\mu\_{\mathcal{A}}(t),\tag{16}$$

where μ and δ are constants. They are derived from the novel CF method in the Appendix.

By using the Maruyama symbol [22], *dBt* = *w*(*t*)(*dt*)1/2, the following equations can be obtained:

$$\int\_0^t f(\tau) \left(d\tau\right)^a = \rho \int\_0^\tau \left(t - \tau\right)^{a-1} f(\tau) d\tau,\tag{17}$$

$$d\mathbf{x} = f(t)(dt)^a,\tag{18}$$

where 0 < *a* < 1, and *a* represents the self-similar parameter of *x*. The incremental expression of fLsm can be obtained by replacing *f*(*t*) with *<sup>w</sup>*α(*t*):

$$dL\_{H,a} = w\_a(t) (dt)^{H - \frac{1}{2} + \frac{1}{a}} \,, \tag{19}$$

Equation (16) can be written the discrete form, which reads as follows:

$$
\Delta X\_{H,a}(t) = \mu X\_{H,a}(t)\Delta t + \delta X\_{H,a}(t)w\_a(t)(\Delta t)^{H - \frac{1}{2} - \frac{1}{a}},\tag{20}
$$

The iterative predictive model was obtained from the identity Δ*X*(*t*) = *X*(*t* + 1) − *<sup>X</sup>*(*t*):

$$L\_{H,a}(t+1) = L\_{H,a}(t) + \mu L\_{H,a}(t)\Delta t + \delta L\_{H,a}(t)w\_a(t)(\Delta t)^{H - \frac{1}{2} + \frac{1}{a}},\tag{21}$$

#### *4.2. Parameter Estimation with the Characteristic Function*

In the essay of Wang et al. [23–25], some methods were introduced and the validity of these methods was compared, including the quantiles method, empirical characteristic function method, logarithmic moment method, Monte Carlo method, etc. It was concluded that the CF accuracy method was better. The parameter estimation methodology can be subdivided into the following steps:

Step 1: Let *xi*|*<sup>i</sup>*=1...*<sup>N</sup>* be the sampling data for the fLsm,

Step 2: δ estimation:

$$\left| \left| \boldsymbol{\varrho} \left( \boldsymbol{\theta}; \boldsymbol{\alpha}, \boldsymbol{\beta}, \mu, \delta \right) \right| = \left| \mathbb{E} \left\{ \boldsymbol{e}^{j\boldsymbol{\beta} \mathbf{x}} \right\} \right| = \boldsymbol{e}^{-\boldsymbol{\gamma} \left| \boldsymbol{\theta} \right|^{a}},\tag{22}$$

$$\ln \left| \wp(\theta; \alpha, \beta, \mu, \delta) \right| = -\gamma \left| \theta \right|^{\alpha}, \tag{23}$$

$$\delta = -\ln \left| \wp(1; \alpha, \beta, \mu, \delta) \right| = -\ln \left| E\{e^{jx} \} \right|,\tag{24}$$

The estimated δ has the form:

$$\delta = -\ln \left| \phi(1; \alpha, \beta, \mu, \delta) \right| = -\ln \frac{1}{N} \left| \sum\_{i=1}^{N} e^{j x\_i} \right|,\tag{25}$$

Step 3: Further, we estimate parameter α,

$$\Theta\_0^{\alpha} = \frac{\ln \left| E \left\{ e^{j\theta\_0 \chi} \right\} \right|}{\ln \left| E \left\{ e^{jx} \right\} \right|} = \frac{\ln \left| \phi(\theta\_0; \alpha, \beta, \mu, \delta) \right|}{\ln \left| \phi(1; \alpha, \beta, \mu, \delta) \right|}, \tag{26}$$

$$\hat{\alpha} = \log\_{0\_0} (\frac{\ln[\phi(\theta\_0; \alpha, \beta, \mu, \delta)]}{\ln[\hat{\phi}(1; \alpha, \beta, \mu, \delta)]}),\tag{27}$$

where ϕ<sup>ˆ</sup>(<sup>θ</sup>0; α, β, μ, δ) = 1*N Ni*=<sup>1</sup> *ej*θ0*xi* .

Step 4: Parameter μ is estimated by complex domain of the cumulant generating function of fLsm,

$$\ln \rho(\theta\_0; \alpha, \beta, \mu, \delta) = \delta |\theta|^a + j \left[ \delta |\theta|^a \beta \frac{\theta}{|\theta|} \tan \left( \frac{\pi \alpha}{2} \right) + \mu \theta \right],\tag{28}$$

$$\hat{\mu} = \frac{\operatorname{Im} \Big| \theta\_0^{\text{th}} \ln \left| \phi(1; \alpha\_\prime \beta\_\prime \mu\_\prime \delta) \right| - \ln \left| \phi(\theta\_0; \alpha\_\prime \beta\_\prime \mu\_\prime \delta) \right|}{\theta\_0^{\text{a}} - \theta\_0} \Big) \tag{29}$$

Step 5: As we know, the fLsm model drive function is symmetric β ˆ = 0.

#### **5. Wind Speed Forecasting**

We used the average daily wind speed data from the 2011 actual historical wind speed of Inner Mongolia. The historical wind speed waveform is shown in Figure 2. When the wind speed is too high, it will seriously affect the power grid, so we focused on accurately predicting the time period when the wind speed is high. It can be seen from Figure 2 that the wind speed data began to fluctuate greatly from the 100th day, which was harmful to the power grid, so we chose to start from the 100th forecast. In terms of selecting the prediction steps, the small-data method of the second part was used to calculate the maximum prediction steps. The calculation results are shown in Table 1. The maximum forecast steps were 43 days, we could set the forecast time period from the 100th day to the 140th day. Before using the fLsm iterative forecasting model, we needed to determine whether the wind speed sequence was LRD. Through parameter estimation, we could ge<sup>t</sup> the value of *H* and α (Table 2), satisfying α*H* > 1. Finally, the fLsm iterative forecasting model was used to forecast the wind speed sequence, and the forecast result is shown in Figure 3. The specific method flow is shown in Figure 4.


**Table 1.** 2011 Small-data method parameters.

**Figure 2.** 2011 wind speed waveform.

**Table 2.** Parameters and errors of the three weighting methods.


**Figure 3.** Unweighted wind speed predictions.

**Figure 4.** Forecasting process.

As can be seen from Figure 3, when the prediction steps exceeded nine steps, the prediction error gradually increased, and the prediction data was often larger than the actual data. However, by calculating the maximum prediction steps of 43, its effective prediction steps were much less than the maximum prediction steps.

As fLsm is an infinite variance process and the variance of the wind speed data is not large, if the historical wind speed data is used to estimate the parameters of the fLsm iterative prediction model, a large error will occur. In this section, we used a method of weighting the wind speed data to increase the variance of the data, thereby reducing the error in parameter estimation.

It can be seen from Figures 5 and 6 that the prediction effect of the wind speed weighted data had been significantly improved. Generally speaking, increasing the variance will cause the tail parameter α to decrease. However, it can be seen from Table 2 that the α value of the five-times weighted wind speed data was larger than the α value of the unweighted wind speed data, which indicated that a larger error occurred when modeling the wind speed sequence using the fLsm iterative prediction model. Of course, α*H* > 1 must be guaranteed when weighting the wind speed data.

**Figure 5.** Five-time weighted wind speed predictions.

**Figure 6.** Ten-time weighted wind speed predictions.

In order to prove the extensiveness of the wind speed prediction model in this paper, we forecasted the wind speed data of Inner Mongolia in 2012. As can be seen from Figure 7, the wind speed data on the 70th day began to fluctuate. We then calculated the maximum number of prediction steps to 45 and set the prediction time period to the 70th to 115th days. After weighting the wind speed data 10 times, the fLsm iterative prediction model was used for prediction in Figure 8. In addition, the fLsm iterative prediction model was compared with the GA-BP neural network, which showed that the wind speed prediction model in this paper had better prediction accuracy.

**Figure 7.** 2012 wind speed waveform.

**Figure 8.** 2012 wind speed forecast results.

Table 3 lists the maximum and average percentage errors for the two prediction models. As can be seen from the table, the fLsm iterative prediction model had higher prediction accuracy. At the same time, it can be seen from Figures 9 and 10 that the GA-BP neural network had a poor prediction of the peak value, which will lead to the inability to prevent the impact of excessive wind speed on the grid.


**Table 3.** Errors of the two prediction models.

**Figure 9.** Comparison of prediction effects in 2011.

**Figure 10.** Comparison of prediction effects in 2012.

#### **6. Relationship between Wind Speed and Wind Power**

Taking a variable-pitch wind turbine with a single unit capacity of 600 kW as an example, the power characteristics are shown in Figure 11. The cut-in wind speed, cut-out wind speed, and rated wind speed were 3, 50, and 25 m/s, respectively. The raw data of wind power time series could be obtained from the original data of wind speed and power characteristic curve of the wind turbines.

**Figure 11.** Power curve of a wind power generator.

When the wind speed was less than the cut-in wind speed and greater than the cut-out wind speed, the power generation was zero; when the wind speed was equal to the cut-in wind speed, the rated wind speed, and the cut-out wind speed, the power characteristic curve had a significant turning point. When the wind speed was greater than the rated wind speed and less than the cut-out wind speed when the wind speed was out, the generating power was a certain value. Only when the wind speed was greater than the cut-in wind speed and less than the rated wind speed did the generating power, and the wind speed approximate a linear relationship.

$$P(v) = \begin{cases} 0 & 0 \le v \le v\_i \\ f\_p(v) & v\_i \le v \le v\_r \\ P\_r & v\_r \le v \le v\_c \\ 0 & v\_r > v\_c \end{cases} \tag{30}$$

where *<sup>P</sup>*(*v*) is the wind power, *Pr* is the rated power of the generator, *vi* is the cut-in wind speed, *vc* is the cut-out wind speed also known as the cut-off wind speed, *vr* is the rated wind speed, and *fp*(*v*) is the output characteristic of the wind speed between *vi* and *vr*. Its characteristics can be linear functions, quadratic functions, or cubic functions.
