#### *3.3. GA-ELM Modeling*

The intelligent algorithm used for model construction is the key to WT temperature monitoring. Compared with conventional Neural Networks (NN), the Extreme Learning Machine (ELM) has the advantages of fast training speed and high accuracy. The ELM is a single-hidden-layer feed-forward neural network.

In ELM, $X = [X\_1, X\_2, \dots, X\_n]^T \in R^n$ and $Y = [Y\_1, Y\_2, \dots, Y\_m]^T \in R^m$ are the input and output of the model, respectively; $\omega\_{ij}$ and $\omega\_{jk}$ are the input and output weights, respectively. For $n$ distinct samples $X$, the ELM can approximate the target as

$$\hat{Y}\_k = \sum\_{j=1}^{\tilde{n}} \omega\_{jk} \cdot g\left(\omega\_j \cdot X\_i + b\_j\right), \quad k = 1, 2, \dots, m \tag{1}$$

where $g(\cdot)$ represents the activation function, $\tilde{n}$ is the number of hidden nodes, $\omega\_j = [\omega\_{1j}, \omega\_{2j}, \dots, \omega\_{nj}]^T$ is the input weight vector of hidden node $j$, and $b\_j$ is the bias of hidden node $j$.

If the ELM can fit the $n$ distinct samples with zero error, the matrix form of the approximation can be expressed as

$$
Y = \hat{Y} = H\omega\_{\tilde{n}m} \tag{2}
$$

where the output weight matrix $\omega\_{\tilde{n}m} = [\omega\_1, \omega\_2, \dots, \omega\_m]$ with $\omega\_k = [\omega\_{1k}, \omega\_{2k}, \dots, \omega\_{\tilde{n}k}]^T$, and the hidden layer output matrix $H$ can be expressed as

$$H = \begin{bmatrix} g(\omega\_1 \cdot X\_1 + b\_1) & \dots & g(\omega\_{\tilde{n}} \cdot X\_1 + b\_{\tilde{n}}) \\ \vdots & \ddots & \vdots \\ g(\omega\_1 \cdot X\_m + b\_1) & \dots & g(\omega\_{\tilde{n}} \cdot X\_m + b\_{\tilde{n}}) \end{bmatrix}\_{m \times \tilde{n}} \tag{3}$$

With given input weights $\omega\_{ij}$ and hidden layer biases $b\_j$, the output weights can be analytically calculated by a least squares method as

$$\left\| H\hat{\omega}\_{\tilde{n}m} - Y \right\| = \left\| HH^{+}Y - Y \right\| = \min\_{\omega\_{\tilde{n}m}} \left\| H\omega\_{\tilde{n}m} - Y \right\| \tag{4}$$

where *H*<sup>+</sup> is the generalized Moore–Penrose inverse of *H*.

Then, the solution can be expressed as

$$
\hat{\omega}\_{\tilde{n}m} = H^{+}Y \tag{5}
$$
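The closed-form training in Eqs. (1)–(5) can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the paper's MATLAB code; the sigmoid activation and the helper names are assumptions.

```python
import numpy as np

def elm_train(X, Y, n_hidden, rng=None):
    """Closed-form ELM training (Eqs. (1)-(5)): random input weights and
    biases, output weights from the Moore-Penrose generalized inverse.
    A sketch with hypothetical names, not the authors' implementation."""
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # g(.) = sigmoid, Eq. (3)
    beta = np.linalg.pinv(H) @ Y                             # output weights, Eq. (5)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the trained ELM on new inputs (Eq. (2))."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because the output weights come from a single pseudoinverse rather than iterative back-propagation, training is fast; the quality of the fit still depends on the random input weights and biases, which motivates the GA optimization below.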

However, the model is built from one set of initial input weights $\omega\_{ij}$ and hidden layer biases $b\_j$, which are randomly generated. With this randomness in the initial coefficients, the ELM model can easily fall into local minima. To solve this problem, GA is applied in this paper to optimize the ELM.

In GA, the coefficients to be optimized are encoded as individual chromosomes. In this paper, the individual chromosome encodes the input weights and hidden layer biases of the ELM. The fitness $F$, which judges whether a chromosome is a good solution, is calculated as

$$F = 1 \Big/ \sum\_{k=1}^{m} |e\_k| \tag{6}$$

where $e\_k = Y\_k - \hat{Y}\_k$ is the error of the ELM and $m$ is the number of output layer nodes.

The GA optimization process proceeds as follows:

Step 1, selection. GA selection is based on fitness; the probability of selection is calculated as

$$p\_i = \frac{k/F\_i}{\sum\_{j=1}^{N} k/F\_j} \tag{7}$$

where $N$ is the number of individuals and $k$ is a coefficient.
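The fitness and selection steps of Eqs. (6) and (7) can be sketched as follows (Python/NumPy rather than the paper's MATLAB; the helper names are hypothetical). Note that the constant $k$ cancels out of the ratio in Eq. (7).

```python
import numpy as np

def fitness(e):
    """Fitness of one chromosome, Eq. (6): F = 1 / sum(|e_k|)."""
    return 1.0 / np.sum(np.abs(e))

def selection_probs(F, k=1.0):
    """Selection probabilities, Eq. (7) implemented as written:
    p_i = (k / F_i) / sum_j (k / F_j); the constant k cancels."""
    inv = k / np.asarray(F, dtype=float)
    return inv / inv.sum()

def roulette_select(n_parents, F, rng=None):
    """Roulette-wheel sampling of parent indices (a sketch)."""
    rng = np.random.default_rng(rng)
    p = selection_probs(F)
    return rng.choice(len(p), size=n_parents, p=p)
```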

Step 2, crossover. GA crossover of two chromosomes at gene *j* is calculated as

$$\begin{aligned} \alpha\_{kj} &= \alpha\_{kj}(1-\beta) + \alpha\_{lj}\beta \\ \alpha\_{lj} &= \alpha\_{lj}(1-\beta) + \alpha\_{kj}\beta \end{aligned} \tag{8}$$

where $\alpha\_{kj}$ and $\alpha\_{lj}$ are gene $j$ of chromosome $k$ and chromosome $l$, respectively, and $\beta$ is the cross-coefficient, a random number in the range (0, 1).

Step 3, mutation. The GA mutation of gene $\alpha\_{ij}$ is calculated as

$$\alpha\_{ij} = \begin{cases} \alpha\_{ij} + \left(\alpha\_{ij} - \alpha\_{\max}\right) f(g), & \gamma > 0.5 \\ \alpha\_{ij} + \left(\alpha\_{\min} - \alpha\_{ij}\right) f(g), & \gamma \le 0.5 \end{cases} \tag{9}$$

$$f(g) = \gamma \left(1 - g/G\_{\max}\right)^2 \tag{10}$$

where $\alpha\_{\max}$ and $\alpha\_{\min}$ are the upper and lower thresholds of $\alpha\_{ij}$, respectively, $g$ and $G\_{\max}$ are the current and maximum number of GA generations, respectively, and $\gamma$ is a random coefficient in the range (0, 1).
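The crossover and mutation operators of Eqs. (8)–(10) can be sketched as below (a minimal Python/NumPy illustration with hypothetical names; in an actual run $\beta$ and $\gamma$ would be drawn at random for each operation).

```python
import numpy as np

def crossover(chrom_k, chrom_l, j, beta):
    """Arithmetic crossover of two chromosomes at gene j (Eq. (8)),
    applied simultaneously so both updates use the pre-crossover genes."""
    chrom_k, chrom_l = chrom_k.copy(), chrom_l.copy()
    chrom_k[j], chrom_l[j] = (chrom_k[j] * (1 - beta) + chrom_l[j] * beta,
                              chrom_l[j] * (1 - beta) + chrom_k[j] * beta)
    return chrom_k, chrom_l

def mutate(chrom, j, a_min, a_max, g, G_max, gamma):
    """Mutation of gene j (Eqs. (9)-(10)); the step size f(g)
    shrinks to zero as generation g approaches G_max."""
    f = gamma * (1.0 - g / G_max) ** 2        # Eq. (10)
    chrom = chrom.copy()
    if gamma > 0.5:
        chrom[j] += (chrom[j] - a_max) * f    # Eq. (9), upper branch
    else:
        chrom[j] += (a_min - chrom[j]) * f    # Eq. (9), lower branch
    return chrom
```

A useful property of the arithmetic crossover is that it preserves the sum of the two genes, so children stay inside the interval spanned by their parents.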

It is necessary to repeat the GA optimization until the maximum fitness is obtained. The chromosome with the maximum fitness is the optimal solution. By decoding the optimal solution, the optimal initial input weights and hidden layer biases are obtained for the ELM. The flowchart of building the WT model based on GA-ELM is shown in Figure 6.

**Figure 6.** Topology of Extreme Learning Machine (ELM).
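As a rough sketch of the overall GA-ELM loop described above, the following hypothetical routine evolves the ELM input weights and biases and returns the decoded optimum. It deliberately simplifies the operators (elitism, whole-vector crossover, a damped random mutation, selection proportional to $F$ rather than Eq. (7) as written, and an even population size), so it illustrates the structure rather than reproducing the authors' MATLAB implementation.

```python
import numpy as np

def ga_elm_init_weights(X, Y, n_hidden, pop_size=20, G_max=30, rng=0):
    """Evolve ELM input weights/biases by GA and return the best pair.
    Simplified sketch; pop_size is assumed even."""
    rng = np.random.default_rng(rng)
    n_in = X.shape[1]
    n_genes = n_in * n_hidden + n_hidden       # input weights + biases

    def decode(c):
        W = c[:n_in * n_hidden].reshape(n_in, n_hidden)
        return W, c[n_in * n_hidden:]

    def fitness(c):
        W, b = decode(c)
        H = 1.0 / (1.0 + np.exp(-(X @ W + b))) # sigmoid hidden layer
        beta = np.linalg.pinv(H) @ Y           # output weights, Eq. (5)
        e = Y - H @ beta
        return 1.0 / (np.sum(np.abs(e)) + 1e-12)  # fitness, Eq. (6)

    pop = rng.uniform(-1, 1, (pop_size, n_genes))
    for g in range(G_max):
        F = np.array([fitness(c) for c in pop])
        p = F / F.sum()                        # fitness-proportional selection
        parents = pop[rng.choice(pop_size, pop_size, p=p)]
        beta_c = rng.uniform(size=(pop_size // 2, 1))
        a, b2 = parents[0::2], parents[1::2]
        children = np.vstack([a * (1 - beta_c) + b2 * beta_c,
                              b2 * (1 - beta_c) + a * beta_c])  # Eq. (8)
        f_g = rng.uniform(size=(pop_size, 1)) * (1 - g / G_max) ** 2
        children += (rng.uniform(-1, 1, children.shape) - children) * f_g * 0.1
        children[0] = pop[np.argmax(F)]        # elitism: keep current best
        pop = np.clip(children, -1, 1)
    F = np.array([fitness(c) for c in pop])
    return decode(pop[np.argmax(F)])
```

The decoded weights and biases would then seed the closed-form ELM training, so the pseudoinverse step is only ever run on GA-screened initial coefficients.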

## **4. Model Development and Validation**

#### *4.1. SCADA Datasets*

To build the WT model and verify its accuracy, the learning set and test set described in Table 1 are used.


**Table 1.** Description of the learning and test sets.

To ensure model accuracy, the learning set should cover as wide a range of working conditions and states as possible while containing no failures. Similarly, the test set should also contain a variety of working conditions and states without failures.

#### *4.2. Model Testing Result*

To verify the GA optimization, the original ELM and a Back-Propagation Neural Network (BPNN) are used for comparison with GA-ELM. The residuals of GA-ELM, ELM, and BPNN are shown in Figure 7a; the ambient temperature and wind speed of the testing set are shown in Figure 7b.

**Figure 7.** (**a**) Model testing results of different intelligent algorithms; (**b**) ambient conditions of the testing set.

For the testing set, as shown in Figure 7b, the ambient temperature rises in waves and the wind speed drops rapidly after remaining stable for a period of time. This kind of irregularity is poorly fitted by the ELM and BPNN models, which can become trapped in local minima. Thus, as shown in Figure 7a, the residual temperature of GA-ELM is smaller than that of ELM and BPNN.

To quantitatively compare the performance of the three algorithms, the Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are used to analyze the results; they are calculated as

$$\text{MSE} = \frac{1}{s} \sum\_{k=1}^{s} e\_k^2 \tag{11}$$

$$\text{MAE} = \frac{1}{s} \sum\_{k=1}^{s} |e\_k| \tag{12}$$

$$\text{MAPE} = \frac{1}{s} \sum\_{k=1}^{s} \left| \frac{e\_k}{Y\_k} \right| \tag{13}$$

where $s$ is the number of samples in the testing set.
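Eqs. (11)–(13) can be written directly in Python with NumPy (a sketch; the function names are hypothetical):

```python
import numpy as np

def mse(y, y_hat):
    """Mean Square Error, Eq. (11)."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(e ** 2))

def mae(y, y_hat):
    """Mean Absolute Error, Eq. (12)."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(e)))

def mape(y, y_hat):
    """Mean Absolute Percentage Error, Eq. (13); requires nonzero y."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(e / y)))
```

For example, with actual values [1, 2, 4] and predictions [1, 1, 2], the errors are [0, 1, 2], giving MSE = 5/3, MAE = 1, and MAPE = 1/3.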

Statistical indicators are shown in Table 2. In this study, all calculations are performed in the MATLAB R2017b environment on a 64-bit Windows operating system, on a computer with an Intel Core i7-6700 CPU at 4 GHz.


**Table 2.** Statistical indicators of different intelligent algorithms.

Compared to ELM and BPNN, GA-ELM achieves a smaller MSE, MAE, and MAPE, demonstrating that the GA optimization is effective and the accuracy of the WT model is improved.

## **5. Case Study**

To verify the proposed method, this paper sets up experiments for different ambient conditions and a real failure. For extreme ambient temperatures, cases of winter midnight, summer noon, and a cold wave in a normal working state are presented. For wind speed changes, cases of wind speed increase and decrease in a normal working state are presented. Meanwhile, for failure detection, a main bearing offset case is presented as well.

In these experiments, a comparison method is also applied to analyze the effects of the ambient conditions. The framework of the comparison method is the same as that of the proposed method, as shown in Figure 5, and it uses the same GA-ELM model. The only difference is that the inputs of the comparison method, as with existing methods, do not contain the ambient conditions. To compensate for this, four other parameters (the pitch motor 1, 2, and 3 temperatures and the hub ambient temperature) are added to the inputs of the comparison method. It should be noted that the residual in this paper is the actual value minus the predicted value. Moreover, for concise description, the proposed method is referred to as Method I, and the comparison method is referred to as Method II.

#### *5.1. Ambient Temperature Change in Normal State*

For ambient temperature, the data in winter, summer, and the cold wave are shown in Table 3. It should be noted that, during these three periods, the WT is in a normal working state.

**Table 3.** Description of datasets under different weather conditions.


The residual results are shown in Figure 8, and the statistical indicators are shown in Table 4.

**Table 4.** Statistical indicators of Method I and II in a normal working state under different weather conditions.


It can be seen from Figure 8a that the Method I residual results are between −0.45 °C and 0.23 °C, but the Method II residual results are between −2.52 °C and 1.86 °C. Similarly, in Figure 8b, the Method II residual results (between −2.12 °C and 1.67 °C) are much greater than the Method I residual results (between −0.19 °C and 0.28 °C) in summer. Table 4 also shows that Method I achieves smaller statistical indicators than Method II. This proves that, with the ambient temperature and its change included as model inputs, Method I achieves better performance when facing cyclical and seasonal ambient temperature changes.

Furthermore, comparing Figure 8a,b, the effect of the ambient temperature in summer is smaller than in winter. A reasonable explanation is that the difference between the main bearing temperature and the ambient temperature is smaller in summer than in winter, since the range of the main bearing temperature in a normal working state is approximately 55–70 °C.

During the cold wave shown in Figure 8c, Method I shows only some reasonable fluctuations, but Method II shows a downward trend. If the monitoring were based on Method II, the main bearing temperature would appear to decline continuously for 2 h, which may lead to false alarms. However, the actual situation is that the WT is in a normal working state. This proves that, although Method II has more temperature parameters as compensation, Method I adapts to rapid ambient temperature changes better than Method II.

**Figure 8.** Residual results of Method I and II in a normal working state under different weather conditions: (**a**) winter; (**b**) summer; (**c**) cold wave.

#### *5.2. Wind Speed Change in Normal State*

For wind conditions, the datasets of wind speed increase and decrease are shown in Table 5. Similarly, in these two periods, the WT is in a normal working state. At the same time, to rule out any influence of the ambient temperature, periods in which the ambient temperature is around 15 °C are selected.

**Table 5.** Datasets of wind speed increase and decrease.


The residual results of Method I and II are shown in Figure 9, and the statistical indicators are shown in Table 6.

**Figure 9.** Residual results of Method I and II in a normal working state under wind speed changes: (**a**) wind speed increase; (**b**) wind speed decrease.

**Table 6.** Statistical indicators of Method I and II in a normal working state during wind speed increase and decrease.


As shown in Figure 9a, during the wind speed increase, the residual results of Method I and Method II are negative. As mentioned in Section 2, these results are due to the delay between the wind speed change and the internal temperature change. Since the wind speed change is used as a model input of Method I, the delay is reduced effectively, and the residuals' absolute values of Method I are much smaller than those of Method II: the amplitude of Method I is 0.48 °C, and that of Method II is 4.48 °C. The same reasoning explains Figure 9b: during the wind speed decrease, the residuals' absolute values of Method I are much smaller than those of Method II (0.39 °C vs. 1.48 °C, respectively). The statistical indicators in Table 6 also prove that Method I achieves better performance than Method II during wind speed changes.

Additionally, both Method I and II achieve better performance during a wind speed decrease than during an increase. These results occur because the rate of wind speed change directly determines the delay between the wind speed change and the internal temperature change. In these two periods, the wind speed increases faster than it decreases, which means that the delay during the increase is larger than that during the decrease. Thus, the absolute values of the residual results during the wind speed increase are generally larger than those during the decrease.

#### *5.3. Main Bearing Failure Detection*

To verify the failure detection ability of the proposed method, a serious main bearing offset that occurred in the wind farm, shown in Figure 10, is used as a failure case. The dataset covering the 5 h before the failure occurred is shown in Table 7, and the residual results are shown in Figure 11.

**Table 7.** Datasets of main bearing offset failure.


**Figure 10.** Main bearing offset failure.

**Figure 11.** Residual results of Method I and II in main bearing offset failure.

The residual results of Method I and Method II both show upward trends, which means that both methods could predict the failure. However, during the 5 h period, the ambient temperature continued to drop and the wind speed continued to increase, which caused the residual results of Method II to be negative at first, falling behind Method I. In particular, at 10:16 (time point 276), the wind speed increased rapidly, which caused the residual results of Method II to decrease over a short time. Comparing the two curves, it can be seen that Method II generally falls behind Method I by more than 50 min, reaching approximately 90 min at a residual result of 2 °C.

Considering the conclusions of the previous experiments, Method I exhibits stable performance during extreme ambient temperatures and wind speed changes, and the residual results of Method I are generally less than 0.5 °C. However, for Method II, due to changes in the ambient conditions, the residual results can exceed 1 °C, sometimes reaching 4 °C, in a normal working state. Thus, the safe range of Method I can be set narrower than that of Method II. If the safe range of Method I is set to ±1 °C and that of Method II is set to ±2 °C, the alarm from Method I would be approximately 120 min earlier than that from Method II. If the safe range of Method I is set to ±0.5 °C and that of Method II is set to ±4 °C, the alarm from Method I would be more than 180 min earlier than that from Method II. These results demonstrate that Method I, with ambient conditions as inputs, can achieve higher monitoring accuracy and earlier failure alarms.
