Article

WTBD-YOLOv8: An Improved Method for Wind Turbine Generator Defect Detection

School of Shipping and Naval Architecture, Chongqing Jiaotong University, Chongqing 400074, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(11), 4467; https://doi.org/10.3390/su16114467
Submission received: 7 April 2024 / Revised: 22 May 2024 / Accepted: 22 May 2024 / Published: 24 May 2024

Abstract

Wind turbine blades are the core components responsible for efficient wind energy conversion and ensuring stability. To address challenges in image-based wind turbine blade damage detection, such as complex image backgrounds, degraded detection performance at high image resolution, prolonged inference time, and insufficient recognition accuracy, this study introduces an enhanced wind turbine blade damage detection model named WTBD-YOLOv8. Firstly, the GhostCBS and DFSB-C2f modules are incorporated to reduce the number of model parameters while enhancing feature extraction capability. Secondly, the MHSA-C2f module, which incorporates a multi-head self-attention mechanism, enables the model to attend to global information, mitigating interference from irrelevant and complex backgrounds. Lastly, the Mini-BiFPN structure improves the retention of small-target features in shallow networks and reinforces the propagation of these features in deep networks, enhancing the detection accuracy of small-target damage and reducing the false negative rate. Trained and tested on the Wind Turbine Blade Damage Dataset (WTBDD), the WTBD-YOLOv8 model achieves a mean average precision of 98.3%, a 2.2 percentage point improvement over the original YOLOv8 model. Particularly noteworthy is the increase in precision from 93.1% to 97.9% in small-target damage detection. Moreover, the total parameter count of the model decreases from 3.22 million in YOLOv8 to 1.99 million, a reduction of 38.2%. The WTBD-YOLOv8 model therefore not only enhances the performance and efficiency of wind turbine blade damage detection but also significantly reduces the model parameter count, demonstrating its practical advantages in engineering applications.

1. Introduction

In recent decades, the global demand for renewable energy has surged, leading to a significant increase in production capacity [1]. Wind energy, as a crucial renewable energy source, plays a pivotal role in ensuring the stability and reliable operation of national grids in countries heavily invested in wind energy development. The continued growth of wind power not only reinforces its importance in the power system but also strengthens its contribution to the global energy supply [2]. Wind turbine blades are critical components ensuring system stability and high efficiency [3]. However, due to prolonged operation in harsh environments like high winds, salt spray, and regions prone to natural disasters, the failure rate of blades is relatively high [4]. As wind turbines trend towards larger sizes and single-machine capacities increase with longer blade lengths, the risk of damage escalates. Hence, timely detection and accurate diagnosis of blade damage locations and characteristics are imperative for extending their lifespan and maintaining the efficient operation of wind turbines.
In the wind energy industry, various technologies are employed for detecting wind turbine blade damage, including manual inspection, strain monitoring, acoustic emission, ultrasonic testing, vibration analysis, thermal imaging, and machine vision methods [5]. However, most detection techniques are better suited to pre-installation blade inspection [6]. Traditional ultrasonic and vibration detection technologies necessitate direct contact with the blade surface, while strain and acoustic emission detection heavily depend on surface contact, making comprehensive post-installation testing challenging. Implementation of these methods faces constraints such as the complexity of blade structures, economic costs, installation time pressures, reliability issues of sensors under cyclic loading, and the need for coupling agents. Infrared thermal imaging and machine vision, by contrast, are non-contact methods, although the accuracy of infrared technology may degrade under high temperatures and humidity changes [5].
With advancements in image processing technology, machine vision detection has emerged as a key technology for identifying wind turbine blade damage and deformation. Traditional image processing techniques excel in addressing simple and clear-cut damage cases but often falter in accuracy and robustness when dealing with complex, subtle damage or variable environments. In contrast, deep-learning-based damage detection methods leverage neural networks’ exceptional feature extraction capability to automatically identify key features from raw image data. These methods broadly fall into two main categories.
One category comprises two-stage algorithms based on region proposal networks, such as the Faster Region-based Convolutional Neural Network (Faster R-CNN) [7] and the Mask Region-based Convolutional Neural Network (Mask R-CNN) [8]. Shi et al. [9] employed an enhanced multi-scale Faster R-CNN model to detect cracks inside wind turbine blades and utilized the Visual Geometry Group 19-layer network (VGG-19) to determine the precise locations of the cracks. They also developed a model for predicting the remaining lifespan of coatings. Tong et al. [10] improved the backbone network and feature fusion of Faster R-CNN, thereby enhancing the accuracy of blade damage detection. Zhang et al. [11] proposed an image enhancement detection process based on Mask R-CNN, outperforming You Only Look Once version 3 (YOLOv3) and YOLOv4, and introduced new evaluation criteria. Zhang et al. [12] developed the Mask-MRNet network for blade fault detection, which improved detection performance by combining Mask R-CNN-512 and MRNet, selecting DenseNet-121 as the optimal classifier. Diaz et al. [13] designed a fast detection model based on Cascade Mask R-CNN, which utilized depthwise separable convolutions and image enhancement techniques to achieve high-accuracy damage detection and instance segmentation.
Although two-stage object detection models have made significant progress in wind turbine blade defect detection, they still suffer from several issues. Firstly, these models often have a large number of parameters, resulting in high computational costs, especially in real-time applications. Secondly, the detection speed of two-stage models is not satisfactory, limiting their application in real-time wind turbine blade defect detection and increasing the difficulty of deployment in practical engineering. Therefore, despite excellent performance in detection accuracy, the real-time performance and deployment convenience of the models still need further improvement and optimization.
To address the shortcomings of two-stage object detection models, another class of algorithms based on regression, such as the Single-Shot MultiBox Detector (SSD) [14] and YOLO [15], has been proposed. Single-stage object detection models are recognized for their swift detection speed but often provide lower accuracy compared to two-stage models. As a result, current research on these models focuses primarily on improving their detection accuracy. For instance, Lv et al. [16] developed the EADD detector, which utilized an improved SSD framework and ResNet backbone network, incorporating FSDB and FAM technologies for fast and accurate blade damage detection. Zhu et al. [17] significantly improved network performance by replacing the backbone VGG of the SSD one-stage object detection model with ResNet and ResNeXt, without increasing the complexity of the network structure. Ran et al. [18] enhanced feature extraction and attention mechanisms based on YOLOv5s and optimized the loss function, thus improving the accuracy and speed of small damage detection. Hao et al. [19] introduced a lightweight channel attention mechanism and replaced the Resunit module in the backbone network with the CR residual module to improve the detection accuracy of small-target damage. Yao et al. [20] proposed a lightweight cascaded feature fusion neural network model based on YOLOX, which made lightweight improvements to the backbone feature extraction network of the RepVGG structure, thereby improving the model’s inference speed. Liu et al. [21] proposed a detection algorithm based on YOLOv8, which significantly improved detection accuracy by enhancing the feature extraction capability of the backbone network, achieving an mAP of 79.9%. Yu et al. [22] proposed an enhanced YOLOv8 model to improve the accuracy and robustness of wind turbine blade damage image detection by integrating the CBAM attention mechanism, employing a weighted bidirectional feature pyramid network (BiFPN), and improving the loss function. Liu et al. [23] proposed a wind turbine damage detection method based on an improved YOLOv8 algorithm, achieving high average precision in damage detection by introducing the C2f-FocalNextBlock module and ResNet-EMA module and adopting a slim-neck structure.
In the aforementioned studies, researchers optimized and improved single-stage object detection models, enhancing the accuracy of the models in detecting wind turbine blade damage. Particularly, the improvements to the YOLOv8 model make it perform the best among current object detection models. Despite the significant progress in accuracy achieved by the YOLOv8 model, it still needs to find a better balance between detection accuracy and processing speed. Additionally, existing models often struggle to simultaneously meet the requirements of high real-time performance and low computational resource consumption when faced with the specific task of wind turbine blade defect detection.
Two-stage and single-stage algorithms each have their own advantages in object detection, yet, as noted above, neither readily satisfies the combined demands of high real-time performance and low computational resource consumption posed by wind turbine blade defect detection. Moreover, because wind turbine blades are often set against complex backgrounds, small-target defects are prone to background interference, leading to missed detections and false alarms.
To address these issues, this study builds upon the state-of-the-art single-stage object detection model YOLOv8 and proposes the WTBD-YOLOv8 (Wind Turbine Blade Detection-YOLOv8) model. The design objective of this model is to enhance the accuracy of wind turbine blade damage detection in complex backgrounds by improving the feature extraction capability of the backbone network and strengthening multi-scale feature fusion techniques, without sacrificing detection speed or increasing the number of model parameters. Additionally, the model aims to maintain real-time performance and reduce computational resource consumption to meet the stringent standards and practical demands of the wind energy industry for blade damage detection.

1.1. Base Network Model

YOLO, designed specifically for real-time object detection tasks, is an efficient neural network model that strikes a balance between detection accuracy and processing speed [15]. Within the evolution of the YOLO model series, YOLOv8 notably enhances the efficiency and performance of real-time object detection.
As illustrated in Figure 1, the architecture of YOLOv8 consists of four key components: input, backbone, neck, and head. These components collectively ensure the efficiency and accuracy of the model, with the core architecture comprising Convolution-BatchNorm-SiLU (CBS), CSP Bottleneck with two convolutions (C2f), Spatial Pyramid Pooling Fusion (SPPF), and Detect modules. Specifically, the CBS module optimizes learning efficiency and enhances gradient handling capabilities through convolutional layers, batch normalization layers, and SiLU activation functions. The C2f module strengthens feature extraction using split attention layers, enhancing the capture of multiscale information and improving detection accuracy. The SPPF module addresses the issue of inconsistent feature map sizes, ensuring output consistency and spatial information preservation. Meanwhile, the Detect module is responsible for identifying and locating targets, employing an anchor-free mechanism to directly predict bounding boxes and optimizing prediction results through loss functions, thereby enhancing detection capabilities for objects of varying sizes.
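To give a concrete reference point for the CBS building block described above (the C2f, SPPF, and Detect modules are not shown), the following is a minimal PyTorch sketch; the kernel size and stride defaults are illustrative rather than the exact YOLOv8 configuration.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Convolution-BatchNorm-SiLU block (a sketch; defaults are illustrative)."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)  # stabilizes training and gradient flow
        self.act = nn.SiLU()             # smooth activation used throughout YOLOv8

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# quick shape check on a 640 x 640 input
x = torch.randn(1, 3, 640, 640)
print(CBS(3, 16)(x).shape)  # torch.Size([1, 16, 640, 640])
```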
In conclusion, this study employs an improved YOLOv8 model to address the challenges of wind turbine blade damage detection, aiming to enhance efficiency and performance.

1.2. The Proposed WTBD-YOLOv8 Framework

Considering the high demand for convenience in mobile deployment in practical applications, although YOLOv8 has shown effectiveness in real-time object detection, addressing the issues of model lightweighting and finding a balance between speed and accuracy is necessary to adapt it for wind turbine blade detection. Therefore, this study proposes an improved lightweight network model, WTBD-YOLOv8 (Figure 2). Specific improvements include:
  • In the Backbone part, the introduction of Depthwise Factorize Separable Bottleneck (DFSB)-C2f and GhostConv-BatchNorm-SiLU modules (GhostCBS) replaces the original C2f and CBS modules to reduce the model’s parameter count while enhancing its performance in extracting features from raw images.
  • In the Neck part, the Multi-Head Self-Attention (MHSA)-C2f module focuses on improving the extraction capability of critical features to effectively reduce interference from complex backgrounds, thereby enhancing the model’s detection accuracy in complex backgrounds.
  • In the Neck part, based on the feature fusion structure of YOLOv8, a weighted bidirectional pyramid feature fusion strategy is adopted to effectively integrate features of different scales. This significantly improves the model’s performance in detecting small damages without sacrificing computational efficiency, aiming to enhance the model’s accuracy in detecting small target damages on wind turbine blades.
These improvements aim to ensure that the proposed lightweight network model maintains the real-time detection performance of YOLOv8 while being more suitable for deployment on mobile devices and providing a high-precision solution for wind turbine blade damage detection.

2. Module Design of the Proposed Model

2.1. DFSB-C2f Module

In this study, a new DFSB-C2f module was developed based on the original C2f module of the YOLOv8 model (Figure 3a). The improved part of the module is depicted by the dashed box in Figure 3b. The DFSB-C2f module integrates the GhostCBS and DFSB proposed in this study. This design maintains the structure of the original C2f module while inheriting its capability for multi-gradient information flow. Additionally, it incorporates the advantages of multi-linear information extraction from GhostCBS and DFSB. This architecture reduces the number of model parameters and computational costs, achieving model lightweighting while maintaining efficient performance.

2.1.1. GhostCBS

To enhance the network’s feature extraction capability while maintaining computational efficiency, we drew inspiration from the Ghost Module concept proposed in GhostNet [24] and designed the GhostConv module, as depicted in Figure 4b. Building upon this, we further designed the GhostCBS structure. As illustrated in Figure 5, the GhostCBS module consists of GhostConv, batch normalization (BN), and the SiLU activation function. The GhostCBS structure not only strengthens the model’s feature extraction capability by increasing the number of feature maps but also facilitates interaction between features across different channels. This maximizes the model’s learning and representation capabilities within limited computational resources.
As shown in Figure 4a, traditional convolutional operations may lead to considerable redundant computations when generating feature maps. To address this issue, GhostConv adopts small-sized filters to alleviate the burden of convolutional computations while still efficiently generating numerous feature maps. The core idea behind GhostConv is to employ a lightweight and innovative feature extraction strategy, utilizing fewer parameters and lower computational costs to mimic the effects of multiple convolutional channels.
As depicted in Figure 4b, GhostConv first performs convolutional operations using a limited number of standard convolutional kernels to generate a preliminary set of feature maps. Subsequently, for the remaining feature maps that need to be simulated, instead of stacking additional convolutional layers in the traditional manner, GhostConv applies a series of “Φ” operations on the base feature maps obtained from the primary convolution. These operations may include depthwise separable convolutions or linear transformations, aimed at channel fusion to evoke deep interactions among feature channels, thereby further enhancing the model’s performance. These operations have relatively fewer parameters and computational costs but can dynamically generate additional feature mappings, enriching the network’s feature representation hierarchy while conserving resources.
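To make the idea concrete, below is a minimal PyTorch sketch of GhostConv and GhostCBS consistent with the description above. The 1:1 split between primary and cheap feature maps and the 5 × 5 depthwise kernel for the Φ operation are assumptions, not the paper’s exact settings.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Sketch of GhostConv: a primary convolution produces a base set of
    feature maps; a cheap depthwise 'Phi' operation generates the rest.
    Assumes c_out is even (1:1 split is an assumption)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_base = c_out // 2
        # primary convolution: a limited number of standard kernels
        self.primary = nn.Conv2d(c_in, c_base, k, s, k // 2, bias=False)
        # cheap Phi operation: depthwise conv simulating extra channels
        self.cheap = nn.Conv2d(c_base, c_base, 5, 1, 2, groups=c_base, bias=False)

    def forward(self, x):
        base = self.primary(x)
        return torch.cat([base, self.cheap(base)], dim=1)

class GhostCBS(nn.Module):
    """GhostConv + BatchNorm + SiLU, mirroring the structure in Figure 5."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.ghost = GhostConv(c_in, c_out, k, s)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.ghost(x)))
```

Because the cheap branch operates depthwise on the base maps, the second half of the output channels costs only a small fraction of a standard convolution’s parameters.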

2.1.2. DFSB

Figure 6 illustrates several exemplary designs of residual blocks. The bottleneck residual structure [25] employs a combination of 1 × 1 convolutional layers, 3 × 3 convolutional layers, and 1 × 1 convolutional layers (as in Figure 6a). By adjusting the output channels of the 1 × 1 convolutional layers, it achieves compression or expansion of feature map dimensions, thereby reducing computational complexity. The MobileNet [26] model utilizes depthwise separable convolution (DSConv), decomposing standard convolution into depthwise convolution and pointwise convolution steps (as in Figure 6b), effectively extracting spatial features while reducing computational complexity and parameter count. The non-bottleneck-1D approach [27] employs a one-dimensional decomposed convolutional kernel (as in Figure 6c), similarly reducing computational costs without sacrificing performance.
This study integrates the three types of residual blocks above and proposes DFSB, as depicted in Figure 6d. It retains the Bottleneck structure, draws on the DSConv concept from MobileNet, and introduces the GhostCBS structure in place of the CBS module to facilitate feature interaction between different channels. Simultaneously, by incorporating the FConv kernel from non-bottleneck-1D, a novel depthwise factorized separable convolution (DFSConv) is proposed, decomposing the 3 × 3 convolutional layer into 1 × 3 and 3 × 1 convolutional layers. Compared with traditional approaches, this design preserves model performance while reducing the parameter count and improving computational efficiency, meeting the dual requirements of model compactness and speed.
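A minimal sketch of DFSConv and the resulting DFSB residual block, reusing the GhostCBS sketch from Section 2.1.1, might look as follows; the channel expansion ratio and the placement of normalization and activation are assumptions.

```python
import torch.nn as nn

class DFSConv(nn.Module):
    """Sketch of the factorized depthwise separable convolution: the 3x3
    depthwise kernel is split into 1x3 and 3x1 depthwise convolutions,
    followed by a 1x1 pointwise convolution for channel mixing."""
    def __init__(self, c):
        super().__init__()
        self.dw_1x3 = nn.Conv2d(c, c, (1, 3), 1, (0, 1), groups=c, bias=False)
        self.dw_3x1 = nn.Conv2d(c, c, (3, 1), 1, (1, 0), groups=c, bias=False)
        self.pw = nn.Conv2d(c, c, 1, bias=False)  # pointwise channel mixing
        self.bn = nn.BatchNorm2d(c)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw_3x1(self.dw_1x3(x)))))

class DFSB(nn.Module):
    """Sketch of the DFSB residual block: GhostCBS compresses channels,
    DFSConv extracts spatial features, GhostCBS restores channels, and a
    shortcut is added. GhostCBS is the class from the Section 2.1.1 sketch;
    the 0.5 expansion ratio is an assumption."""
    def __init__(self, c, e=0.5):
        super().__init__()
        c_mid = max(2, int(c * e))       # assumed even, as GhostCBS requires
        self.reduce = GhostCBS(c, c_mid)
        self.spatial = DFSConv(c_mid)
        self.expand = GhostCBS(c_mid, c)

    def forward(self, x):
        return x + self.expand(self.spatial(self.reduce(x)))
```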

2.2. MHSA-C2f Module

In this study, to enhance the accuracy of blade damage detection in complex environments, we improved the C2f module of the model by integrating MHSA [28]. Although the C2f module performs well in handling local features, it relies solely on local convolutional operations and may therefore fail to fully utilize global contextual information when facing long-range dependencies and complex background interference in blade images. This limitation could affect the detection performance and robustness of the model under complex operating conditions. To overcome it, we added an MHSA module after the Concat operation in the C2f module, as shown in Figure 7. This improvement gives the model a global view and the ability to allocate weights dynamically while retaining the original computational efficiency, helping to enhance the accuracy of damage detection in complex environments.
As illustrated in Figure 8, MHSA extends the self-attention mechanism by mapping the input through different linear transformations into multiple representation subspaces, thereby forming multiple “heads.” Within each head, self-attention is performed independently, computing the query matrix Q, the key matrix K, and the value matrix V. The attention score of each head is computed as shown in Equation (1), where $d_k$ is the dimensionality of the key vectors; the dot products are scaled by $\sqrt{d_k}$ to keep the softmax gradients from vanishing.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$
Each head generates a distinct attention matrix, from which the corresponding context vectors are computed. MHSA then concatenates the outputs of all heads and integrates them through a fully connected layer. The computation is expressed in Equations (2) and (3), where $W_i^Q$, $W_i^K$, and $W_i^V$ are the weight matrices for the linear transformations of the input, and $W^O$ is the weight matrix that transforms the concatenated vector into the final output.

$$\mathrm{head}_i = \mathrm{Attention}\left(QW_i^Q,\ KW_i^K,\ VW_i^V\right) \tag{2}$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right)W^O \tag{3}$$
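The following PyTorch sketch implements Equations (1)–(3) directly. The head count is illustrative, and the flattening of the H × W feature map into a token sequence before attention is an assumption about the exact wiring inside MHSA-C2f.

```python
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Minimal multi-head self-attention implementing Equations (1)-(3)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.d_k = heads, dim // heads
        self.w_q = nn.Linear(dim, dim, bias=False)  # W^Q for all heads
        self.w_k = nn.Linear(dim, dim, bias=False)  # W^K
        self.w_v = nn.Linear(dim, dim, bias=False)  # W^V
        self.w_o = nn.Linear(dim, dim, bias=False)  # W^O

    def forward(self, x):                       # x: (batch, seq, dim)
        b, n, _ = x.shape
        # project, then split into heads: (batch, heads, seq, d_k)
        q, k, v = (w(x).view(b, n, self.heads, self.d_k).transpose(1, 2)
                   for w in (self.w_q, self.w_k, self.w_v))
        # Equation (1): scaled dot-product attention within each head
        scores = (q @ k.transpose(-2, -1)) / self.d_k ** 0.5
        out = scores.softmax(dim=-1) @ v
        # Equations (2)-(3): concatenate heads, then apply W^O
        return self.w_o(out.transpose(1, 2).reshape(b, n, -1))

# feature-map usage: flatten (B, C, H, W) to (B, H*W, C) before attention
f = torch.randn(2, 128, 20, 20)
tokens = f.flatten(2).transpose(1, 2)        # (2, 400, 128)
print(MHSA(dim=128)(tokens).shape)           # torch.Size([2, 400, 128])
```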
By replacing the higher-level C2f module with MHSA-C2f in the neck section of the model, each feature position in the model can adaptively focus on and integrate global information based on its correlation within the entire input feature sequence. This is particularly beneficial when dealing with images of wind turbine blades intertwined with complex backgrounds, as it enables the model to selectively attend to and integrate global information. This prioritizes the features of the main body of the blade, significantly enhancing the identification and localization capabilities for blade damage while avoiding redundant computations. This improvement ensures that the model effectively captures and utilizes key information closely related to damage detection when processing images of wind turbine blades in complex backgrounds.

2.3. Mini-Bidirectional Feature Pyramid Network (Mini-BiFPN)

In wind turbine blade damage detection, the performance of detecting small-scale target damage often falls short of expectations, primarily due to the attenuation of crucial semantic information related to small-scale target damage caused by multiple convolution operations in neural networks. Damage on wind turbine blade surfaces manifests in various forms and sizes, presenting unique features at different resolutions. While Path Aggregation Network (PANet) in the YOLOv8 model (as in Figure 9a) promotes multi-level feature fusion by constructing a feature pyramid, this fusion mechanism is not suitable for detecting damage on wind turbine blades. It indiscriminately merges features from different inputs without explicit differentiation or weighting, potentially leading to an inadequate balance and utilization of features at different scales during the fusion output stage. Specifically, the expression of small-sized features may be suppressed, thus reducing the accuracy of detecting small-scale target damage and increasing the risk of missed detections.
To address this challenge, this study proposes an improved approach called Mini-BiFPN based on BiFPN [29], aiming to enhance the performance of the YOLOv8 model in wind turbine blade damage detection. As shown in Figure 9, we simplify the five-layer input structure of BiFPN to three layers (Figure 9b) to integrate more efficiently with the YOLOv8 architecture. P3, P4, and P5 respectively represent different feature maps in the FPN. This streamlined structure maintains the vertical flow characteristics of features between layers, including top-down and bottom-up information propagation, and introduces a weighting mechanism and bidirectional information exchange. By adjusting the weight distribution when connecting each level with other layers, the ability of each level to acquire valuable information from other layers is enhanced, improving the efficiency of critical feature propagation. Specifically, this approach significantly enhances the model’s ability to retain crucial features of small-scale targets in shallow networks and strengthens the propagation and influence of these features in deep networks. Through this information fusion strategy, Mini-BiFPN significantly improves the detection accuracy of damage on wind turbine blade surfaces of various sizes, optimizing the performance of YOLOv8 in specific application scenarios.
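As a concrete illustration of the weighting mechanism, each fusion node in Mini-BiFPN can weight its inputs with learnable, normalized coefficients in the style of BiFPN’s fast normalized fusion [29]. This sketch assumes the incoming feature maps have already been resampled to a common resolution; the ε value is illustrative.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion: learnable non-negative weights
    balance the contribution of each incoming feature map."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):               # list of same-shape tensors
        w = self.w.relu()                    # keep weights non-negative
        w = w / (w.sum() + self.eps)         # normalize to sum to ~1
        return sum(wi * x for wi, x in zip(w, inputs))

# e.g., a P4 node fusing top-down, lateral, and bottom-up features
fuse = WeightedFusion(3)
p = [torch.randn(1, 128, 40, 40) for _ in range(3)]
print(fuse(p).shape)  # torch.Size([1, 128, 40, 40])
```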

3. Experimental Design and Result Analysis

3.1. Dataset

Wind turbine blades are prone to various types of surface damage, resulting in performance degradation and reduced lifespan, especially in challenging environments with fluctuating climates. Among the primary forms of damage is pitting corrosion, characterized by localized deep pits that may resemble dark spots or holes in images; it is prevalent in high-stress regions and exacerbated in salt-rich environments such as coastal areas. Cracks develop when internal stresses exceed the material’s strength, typically appearing as elongated, thin dark lines in images, sometimes with well-defined edges. When cracks extend far enough to compromise structural integrity, they can lead to partial or complete structural failure, manifested in images as wider fissures, occasionally with visible discontinuity and separation zones. Finally, general damage encompasses oxidation, coating delamination, fragmentation, or deformation, which may present as irregularly shaped blotches, masses, or other abnormal patterns in images.
To address the problem of wind turbine blade damage identification, we utilized a real dataset from a wind farm in Jiangsu, China, consisting of 4000 high-resolution color images of blade damage. To ensure training effectiveness, we selected 400 high-quality images from this dataset as the Wind Turbine Blade Damage Dataset (WTBDD) for this study, covering four common types of surface damage: pitting, cracking, splitting, and breakage. All images were resized to 640 × 640 resolution and annotated using LabelImg 1.8.6 to generate YOLO-compatible annotations for model training. To enhance the model’s generalization ability and reduce overfitting, we applied data augmentation techniques such as contrast adjustment, rotation, flipping, and the addition of salt-and-pepper and Gaussian noise (Figure 10), expanding the dataset to 2000 images. These samples were randomly divided into training, validation, and testing sets in an 8:1:1 ratio. The dataset contains detailed information on damage categories and locations, and all models in this study were tested on it; ablation studies and comparisons with existing methods were evaluated on the validation set. Figure 11 illustrates example images of the four damage types.
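For illustration, the noise-based augmentations and the 8:1:1 split mentioned above can be reproduced with a few lines of NumPy; the noise parameters here are illustrative, not the exact settings used to build the WTBDD.

```python
import random
import numpy as np

def add_gaussian_noise(img, sigma=15.0):
    """Additive Gaussian noise on a uint8 image (sigma is illustrative)."""
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_pepper_noise(img, amount=0.01):
    """Flip a small fraction of pixels to black (pepper) or white (salt)."""
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask < amount / 2] = 0
    out[mask > 1 - amount / 2] = 255
    return out

def split_dataset(paths, seed=0):
    """Shuffle and split image paths into train/val/test at 8:1:1."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    return (paths[:int(0.8 * n)],
            paths[int(0.8 * n):int(0.9 * n)],
            paths[int(0.9 * n):])
```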

3.2. Parameter Settings and Evaluation Metrics

In the experimental section of this study, we employed multiple performance metrics to comprehensively evaluate the model’s detection capability, including average precision (AP), mean average precision (mAP), parameter count, GFLOPs (giga floating-point operations, a measure of computational cost), and frames per second (FPS). These metrics reflect the model’s detection performance, resource consumption, and computational efficiency. The calculation formulas for these metrics are as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP represents positive samples correctly predicted as positive, FP represents negative samples incorrectly predicted as positive, and FN represents positive samples incorrectly predicted as negative.
$$\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R$$

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}(i)$$
where AP(i) represents the average precision AP for the ith class and N represents the total number of defect categories.
$$\mathrm{Parameters} = C_{\mathrm{in}} \times C_{\mathrm{out}} \times K \times K$$

$$\mathrm{GFLOPs} = W \times H \times K \times K \times C_{\mathrm{in}} \times C_{\mathrm{out}}$$
where W and H represent the width and height of the input feature map, Cin and Cout represent the number of input and output feature channels, respectively, and K represents the size of the convolution kernel.
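A minimal sketch of how these metrics can be computed from detection counts and a precision-recall curve follows; real evaluators additionally match predictions to ground truth by IoU and interpolate the curve, which is omitted here.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from the formulas above."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(precisions, recalls):
    """AP as the area under the precision-recall curve, approximated by
    trapezoidal integration over recall."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order],
                          np.asarray(recalls)[order]))

def mean_ap(ap_per_class):
    """mAP: the mean of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

print(precision_recall(tp=90, fp=10, fn=5))  # (0.9, 0.947...)
```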
The experimental environment and hyperparameters are shown in Table 1 and Table 2, respectively.

3.3. Ablation Experiment and Comparative Experiments

To thoroughly assess the contributions of each proposed module in this study to the enhancement of model performance, we conducted five targeted ablation experiments. These experiments aimed to analyze and compare the roles and importance of each module within the model. In each experimental group, aside from the module being validated, all other training parameters remained constant to ensure fairness and comparability of the experimental outcomes. Table 3 presents comprehensive results of each ablation experiment group, encompassing the performance of modules such as GhostCBS, DFSB-C2f, MHSA-C2f, and Mini-BiFPN.
In this study, we utilized YOLOv8 as the benchmark model for experimentation. According to the data presented in the table, replacing the C2f module in the backbone feature extraction network with DFSB-C2f raised the mAP from 96.1% to 97.0%, alongside a reduction in parameter count from 3.22 million to 1.86 million. Substituting the GhostCBS structure for the CBS module further decreased the parameter count while leaving the mAP almost unchanged. The adoption of the MHSA-C2f module yielded a 0.8% increase in mAP, at the cost of a 0.12 million increase in parameters and a decrease in FPS, suggesting that MHSA-C2f enhances object detection performance but slightly raises the model’s parameter count. Integrating Mini-BiFPN boosted the mAP to 98.3% while reducing the parameter count to 1.99 million, affirming the efficacy of BiFPN in amalgamating shallow feature information. These outcomes comprehensively illustrate that the structural and modular replacements proposed in this study not only enhance the detection accuracy of the model but also achieve further model lightweighting.
To validate the advantages of the WTBD-YOLOv8 network model in wind turbine blade damage detection, this study compared it with current mainstream object detection models, as shown in Table 4.
Experiments on a substantial validation set show that WTBD-YOLOv8 offers significant advantages. Compared with the mainstream two-stage detector Faster R-CNN and the single-stage detectors YOLOv5s, YOLOv7, and YOLOv8, WTBD-YOLOv8 improves average detection accuracy (mAP) by 1.2%, 3.7%, 6.4%, and 2.2%, respectively, reaching an exceptionally high 98.3%. Its parameter count is merely 1.99 million, far lower than the other models, making it more lightweight. Furthermore, WTBD-YOLOv8 requires fewer GFLOPs (7.1) and achieves a higher detection speed (112.25 FPS), enabling it to process large-scale data quickly and efficiently while maintaining outstanding detection performance.

The data in Table 5 show the per-category performance of WTBD-YOLOv8 against the original model across the four damage types, providing a clearer illustration of the performance difference between WTBD-YOLOv8 and YOLOv8. For the “Crack” category, WTBD-YOLOv8 achieved an AP50 of 98.7%, up from YOLOv8’s 97.6%. In the “Cracking” category, WTBD-YOLOv8 reached 99.5% versus YOLOv8’s 98.9%. For the “Damage” category, WTBD-YOLOv8 achieved 97.1%, surpassing YOLOv8’s 94.8%. Lastly, for small-target damage such as “Pitting corrosion”, WTBD-YOLOv8’s AP50 reached 97.9%, a marked improvement over YOLOv8’s 93.1%. In summary, WTBD-YOLOv8 demonstrates higher accuracy in detecting cracks, cracking, damage, and pitting corrosion, confirming its superior damage detection performance.
To visually demonstrate the improved model’s effectiveness in detecting damages, Figure 12 presents partial sample examples from the experimental validation conducted on a validation set consisting of 200 images in this study. Figure 12a displays all annotated defects within the images, while Figure 12b showcases the results post-detection by the WTBD-YOLOv8 model. The WTBD-YOLOv8 model accurately identifies all defects without any instances of missed detection, demonstrating high precision in detection.
To further showcase and compare the effectiveness of the model before and after improvements in damage detection, Figure 13 presents a comparison of the detection results of YOLOv8 and the proposed WTBD-YOLOv8 model on damaged wind turbine blade images using the test dataset. The first row of the figure displays a series of original images, all containing damaged wind turbine blades for detection. The second and third rows respectively show the detection results of YOLOv8 and the improved WTBD-YOLOv8 model for the same original images. By comparing Figure 13a–c, it can be observed that for the categories “Damage”, “Crack”, and “Cracking”, the improved WTBD-YOLOv8 model exhibits higher confidence scores in detection, indicating an enhancement in its detection performance. It is worth noting that in Figure 13d, YOLOv8 missed the detection of small-scale damage—“Pitting corrosion”. In contrast, the WTBD-YOLOv8 model proposed in this study successfully detects this small target damage, further demonstrating the effectiveness and robustness of the proposed method in detecting small target damages.
In summary, WTBD-YOLOv8 demonstrates superior performance compared to the original YOLOv8 model in wind turbine blade damage detection tasks, particularly in improving detection accuracy and reducing false negative rates. The model exhibits more balanced detection accuracy and speed. Moreover, the algorithm can be easily deployed on mobile devices, indicating its capability to efficiently meet real-world industrial inspection needs.

4. Conclusions

In this study, we developed WTBD-YOLOv8, a lightweight model for quickly and accurately detecting wind turbine blade damage. By integrating the GhostCBS structure into YOLOv8, we enhanced feature extraction and cross-channel feature interaction, maximizing learning and representation capabilities within limited computational resources. We introduced DFSB-C2f to improve feature extraction, reduce model parameters, and accelerate detection while maintaining accuracy. Incorporating MHSA into our feature extractor enabled better focus on blade areas, enhancing performance in complex backgrounds. Mini-BiFPN in our feature fusion structure improved deep semantic information capture and feature utilization efficiency, enhancing sensitivity and robustness in detecting small damages.
Experimental results confirmed WTBD-YOLOv8’s enhancements: a 98.3% mAP, 1.99 million parameters, and 112.25 FPS. Compared with YOLOv8, our model achieved a 2.2 percentage point mAP increase, a 38.2% parameter reduction, and a 51.7% faster processing speed, and it notably improved small-target damage detection by 4.8 percentage points. WTBD-YOLOv8 offers an efficient and accurate solution for wind turbine blade damage detection, promising wide application in the wind energy industry. It enhances maintenance efficiency and safety and contributes to sustainable development.

Author Contributions

Conceptualization, L.T.; data curation, C.F. and Z.P.; investigation, C.W. and J.H.; methodology, L.T. and S.S.; writing—original draft, L.T.; writing—review and editing, C.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Chongqing Graduate Joint Training Base Construction Project, grant number JDLHPYJD2019006.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to give special thanks to the editors and the anonymous reviewers of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. International Renewable Energy Agency (IRENA) Report. 2022. Available online: https://www.irena.org/ (accessed on 27 May 2023).
  2. Ahmed, S.D.; Al-Ismail, F.S.M.; Shafiullah, M.; Al-Sulaiman, F.A.; El-Amin, I.M. Grid integration challenges of wind energy: A review. IEEE Access 2020, 8, 10857–10878.
  3. Hu, B.; Wu, Y.K.; Guo, Z.J. Overview of crack monitoring techniques for wind power generation blades. High Volt. Appar. 2022, 58, 93–100.
  4. Hai, T.; Fan, H.; Wang, K.J.; Liu, Z.Y.; Chen, Y.J. Ice fault diagnosis of wind turbine units based on PSO-SVM algorithm. Smart Power 2021, 49, 1–6+74.
  5. Du, Y.; Zhou, S.; Jing, X.; Peng, Y.; Wu, H.; Kwok, N. Damage detection techniques for wind turbine blades: A review. Mech. Syst. Signal Process. 2020, 141, 106445.
  6. Zhou, J.; Shi, T.; Xu, B. Research Progress of Wind Turbine Blade Damage Fault Detection Technology. Adv. N&R Energy 2023, 11, 556–563.
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  8. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2980–2988.
  9. Shi, L.; Long, Y.; Wang, Y.; Chen, X.; Zhao, Q. Evaluation of internal cracks in turbine blade thermal barrier coating using enhanced multi-scale Faster R-CNN model. Appl. Sci. 2022, 12, 6446.
  10. Tong, W.G.; Yi, X.L.; Li, B.; Yang, K. Fusion of multi-scale features and attention mechanism for detecting defects in wind turbine blades. Electron. Meas. Technol. 2022, 45, 166–172.
  11. Zhang, J.; Cosma, G.; Watkins, J. Image enhanced mask R-CNN: A deep learning pipeline with new evaluation measures for wind turbine blade defect detection and classification. J. Imaging 2021, 7, 46.
  12. Zhang, C.; Wen, C.; Liu, J. Mask-MRNet: A deep neural network for wind turbine blade fault detection. J. Renew. Sustain. Energy 2020, 12, 053302.
  13. Diaz, P.M.; Tittus, P. Fast detection of wind turbine blade damage using Cascade Mask R-DSCNN-aided drone inspection analysis. Signal Image Video Process. 2023, 17, 2333–2341.
  14. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
  15. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016.
  16. Lv, L.; Yao, Z.; Wang, E.; Ren, X.; Pang, R.; Wang, H.; Zhang, Y.; Wu, H. Efficient and accurate damage detector for wind turbine blade images. IEEE Access 2022, 10, 123378–123386.
  17. Zhu, J.; Wen, C.; Liu, J. Defect Identification of Wind Turbine Blade Based on Multi-feature Fusion Residual Network and Transfer Learning. Energy Sci. Eng. 2022, 10, 219–229.
  18. Ran, X.; Zhang, S.; Wang, H.; Zhang, Z. An improved algorithm for wind turbine blade defect detection. IEEE Access 2022, 10, 122171–122181.
  19. Hao, W.X.; Li, J.J. Improvements on YOLOx for defect detection of wind turbine blades. Computer Era 2023, 9, 106–110+115.
  20. Yao, Y.; Wang, G.; Fan, J. WT-YOLOX: An Efficient Detection Algorithm for Wind Turbine Blade Damage Based on YOLOX. Energies 2023, 16, 3776.
  21. Liu, L.; Li, P.; Wang, D.; Zhu, S. A wind turbine damage detection algorithm designed based on YOLOv8. Appl. Soft Comput. 2024, 154, 111364.
  22. Yu, H.; Wang, J.; Han, Y.; Fan, B.; Zhang, C. Research on an Intelligent Identification Method for Wind Turbine Blade Damage Based on CBAM-BiFPN-YOLOV8. Processes 2024, 12, 205.
  23. Liu, Y.; Zheng, Y.; Shao, Z.; Wei, F.; Cui, T.; Xu, R. Defect detection of the surface of wind turbine blades combining attention mechanism. Adv. Eng. Inform. 2024, 59, 102292.
  24. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  26. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  27. Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 2017, 19, 263–272.
  28. Tan, H.; Liu, X.; Yin, B.; Li, X. MHSA-Net: Multi-Head Self-Attention Network for Occluded Person Re-Identification. arXiv 2020, arXiv:2008.04015.
  29. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
Figure 1. The structure of the YOLOv8 network.
Figure 2. The structure of the WTBD-YOLOv8 network.
Figure 3. Comparison of the structure between the C2f module and the proposed DFSB-C2f module: (a) the structure of the C2f module; (b) the structure of the DFSB-C2f module.
Figure 4. Comparison between the traditional convolution operation and the proposed GhostConv operation: (a) the convolutional layer; (b) the proposed GhostConv module.
Figure 5. The structure of the GhostCBS module.
Figure 6. The structures of different types of residual blocks and the proposed DFSB: (a) the structure of Bottleneck; (b) the structure of the Depthwise Separable Convolution; (c) the structure of the non-bottleneck-1D; (d) the structure of the proposed DFSB module.
Figure 7. The structure of MHSA-C2f.
Figure 8. The Self-Attention module and the MHSA module: (a) the Self-Attention module; (b) the MHSA module.
Figure 9. The structure of feature fusion networks: (a) the structure of PANet; (b) the proposed Mini-BiFPN network structure.
Figure 10. Samples of wind turbine blade defects after image enhancement: (a) original image; (b) mirror flip; (c) Gaussian noise addition; (d) random angle rotation after Gaussian noise processing.
Figure 11. Four types of wind turbine blade defect image samples: (a) pitting corrosion; (b) crack; (c) cracking; (d) damage.
Figure 12. Validation detection example: (a) validation ground truth; (b) detection results.
Figure 13. Comparison of detection results before and after model improvement: (a) mainly detection examples of Damage; (b) mainly detection examples of Crack; (c) mainly detection examples of Cracking; (d) mainly detection examples of Pitting corrosion.
Table 1. Environment of simulation experiment.

| Configuration | Setting |
| --- | --- |
| System | Windows 10 |
| CPU | AMD Ryzen 7 5800H, base frequency 3.5 GHz |
| GPU | NVIDIA RTX 3060, 6 GB |
| Storage | 1.5 TB SSD |
| Framework | PyTorch 1.12.0, Python 3.8, CUDA 11.3, cuDNN 8.2.1 |
Table 2. Hyperparameter settings of model training.

| Parameter | Value |
| --- | --- |
| Learning rate | 0.01 |
| Momentum | 0.937 |
| IoU threshold | 0.75 |
| Batch size | 8 |
| Warm-up epochs | 5 |
| Total epochs | 300 |
Table 3. Ablation experiment results of WTBD-YOLOv8.

| Strategy | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 |
| --- | --- | --- | --- | --- | --- |
| YOLOv8 | ✓ | ✓ | ✓ | ✓ | ✓ |
| +DFSB-C2f | × | ✓ | ✓ | ✓ | ✓ |
| +GhostCBS | × | × | ✓ | ✓ | ✓ |
| +MHSA-C2f | × | × | × | ✓ | ✓ |
| +Mini-BiFPN | × | × | × | × | ✓ |
| Params/M | 3.22 | 3.06 | 3.01 | 3.22 | 1.99 |
| mAP50/% | 96.1 | 97.0 | 96.9 | 97.6 | 98.3 |
| FPS | 54.2 | 57.9 | 59.8 | 48.9 | 112.25 |
Table 4. Comparative experimental results.

| Model | mAP/% | Params/M | GFLOPs | FPS |
| --- | --- | --- | --- | --- |
| Faster R-CNN | 97.1 | 63.32 | 61.7 | 10.4 |
| YOLOv5s | 94.6 | 7.22 | 15.8 | 68.3 |
| YOLOv7 | 91.7 | 37.21 | 102.7 | 54.7 |
| YOLOv8 | 96.1 | 3.22 | 8.3 | 54.2 |
| WTBD-YOLOv8 | 98.3 | 1.99 | 7.1 | 112.25 |
Table 5. Performance comparison of WTBD-YOLOv8 and YOLOv8 in detecting damage (AP50/%).

| Category | YOLOv8 | WTBD-YOLOv8 |
| --- | --- | --- |
| Crack | 97.6 | 98.7 |
| Cracking | 98.9 | 99.5 |
| Damage | 94.8 | 97.1 |
| Pitting corrosion | 93.1 | 97.9 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
