Article

DCW-YOLO: An Improved Method for Surface Damage Detection of Wind Turbine Blades

1 School of Intelligent Rail Engineering, Dalian Jiaotong University, Dalian 116028, China
2 Liaoning Key Laboratory of Welding and Reliability of Rail Transportation Equipment, Dalian Jiaotong University, Dalian 116028, China
3 School of Chinese Language and Literature, Tangshan Normal University, Tangshan 063000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8763; https://doi.org/10.3390/app14198763
Submission received: 21 August 2024 / Revised: 22 September 2024 / Accepted: 24 September 2024 / Published: 28 September 2024

Abstract
Wind turbine blades (WTBs) are prone to damage from their working environment, including surface peeling and cracks. Early and effective detection of surface defects on WTBs can avoid complex and costly repairs and serious safety hazards. Traditional object detection methods suffer from insufficient detection capability, long model inference times, and low recognition accuracy for the small objects and elongated strip defects common in WTB datasets. In light of these challenges, a novel model named DCW-YOLO for surface damage detection of WTBs is proposed in this research, which leverages image data collected by unmanned aerial vehicles (UAVs) and the YOLOv8 algorithm for image analysis. Firstly, dynamic snake convolution (DSConv) is introduced into the C2f module of YOLOv8, allowing the model to more effectively focus on the geometric structural details associated with damage on WTBs. Secondly, the upsampling method is replaced with the content-aware reassembly of features (CARAFE), which significantly minimizes the degradation of image characteristics throughout the upsampling process and boosts the network’s ability to extract features. Finally, the loss function is substituted with the WIoU (Wise-IoU) strategy. This strategy allows for a more accurate regression of the target bounding boxes and helps to improve the reliability of WTB damage localization, especially for low-quality examples. The model demonstrates a notable superiority in surface damage detection of WTBs compared to the original YOLOv8n, achieving a substantial improvement in the mAP@0.5 metric, which rises from 91.4% to 93.8%. Furthermore, in the more rigorous mAP@0.5–0.95 metric, it also sees an increase from 68.9% to 71.2%.

1. Introduction

Over the past few decades, to promote the sustainable development of the global economy, renewable energy has been widely recognized as an effective energy solution. This encompasses sources such as solar, hydrogen, tidal, wind power, etc. [1]. Among the various renewable energy technologies, wind energy stands out as one of the most rapidly advancing fields. According to the Global Wind Report 2023, by the year 2030, China’s wind energy installed capacity is expected to reach 700 to 800 gigawatts, maintaining its position as the largest wind energy market in the world [2]. With the sharp increase in demand for wind power, there is a growing need for wind turbines with larger sizes, higher production efficiency, and greater power generation stability. WTBs are one of the pivotal components for capturing wind energy within a system for producing wind electricity. They are also considered consumables and are expensive, usually making up between 15% and 20% of the wind turbine system’s overall cost [3,4]. However, the environmental conditions at most wind farms are often harsh; WTBs must withstand complex and variable climatic and mechanical loads throughout their service life. This renders many turbines potentially susceptible to damage over prolonged operational periods [5]. Although early damage may not initially affect the stable operation of wind turbines directly, surface defects that are not promptly repaired have the potential to gradually expand, causing further harm to the structural integrity of the blades and eventually posing a serious risk to their service life [6]. Therefore, employing efficient methods for detecting superficial defects on WTBs and promptly carrying out repairs is a critical strategy for minimizing maintenance expenses and prolonging the operational life of turbines. Simultaneously, it ensures that wind power plants can maintain efficient and continuous power generation.
Large-scale wind turbines are predominantly situated in remote and isolated regions, and their blades rotate at an altitude of 80 m or more [7]. This results in significant safety risks and lower efficiency in conventional manual inspections. Furthermore, such inspections must be conducted while the wind turbine is in a shutdown state, severely impacting its power generation efficiency [8]. In recent years, with the rapid development of deep learning technology, deep learning algorithms have been widely applied in fields such as damage detection [9,10,11]. One-stage and two-stage detectors are the two primary types of object detection models in use today. Representative models of the two-stage approach include Faster R-CNN [12], Mask R-CNN [13], and others. These models initially generate candidate bounding boxes, followed by classification and regression on each candidate to accomplish object detection. Representatives of one-stage models include the Single Shot MultiBox Detector (SSD) [14] algorithm and the YOLO algorithm series. The SSD algorithm immediately determines the targets’ positions and categories on the feature maps. The YOLO series divides the image into grids and predicts bounding boxes and categories for targets in each grid cell. The YOLO family of models is widely favored for their real-time capabilities and high precision. Despite the significant advancements in accuracy brought forth by deep learning-based detection algorithms, their performance is largely contingent upon the quality of the input images. Considering large-scale wind turbine images collected via UAVs, the damaged areas often manifest as small-scale features, leading to common instances of missed detections. Furthermore, during the process of image capture, light refraction and other visual disturbances inevitably reduce the recognizability of damage features. Particularly for cases of minor surface damage, these factors further increase the complexity of detection. Therefore, selecting an appropriate time for shooting is crucial. It is recommended to capture images during periods without precipitation, such as on clear evenings or at night, to effectively avoid optical interference and reflections caused by rain, thereby improving image clarity and detection accuracy.
This research is based on YOLOv8, one of the latest algorithms in the YOLO series. In order to enhance the detection performance of the algorithm, a series of improvements have been made. Firstly, inspired by tubular structure segmentation techniques based on topological geometric constraints, this research integrates DSConv into the cross-scale feature fusion C2f module of YOLOv8, reinforcing the algorithm’s focus on the geometric structures of damage within WTB images. Secondly, a lightweight upsampling module, CARAFE, is introduced, which preserves high-resolution details while reassembling and expanding features, aiming to enhance the detection accuracy of small-area, superficial damage in large-scale wind turbine images. Finally, the loss function of YOLOv8 is optimized through the integration of the WIoUv3 strategy. This method combines a dynamic non-monotonic focusing mechanism with a prudent gradient gain allocation strategy, enabling more accurate bounding box regression, particularly for low-quality examples. Building upon the aforementioned optimization techniques, the goal of this research is to improve damage detection accuracy without sacrificing real-time performance. The improved algorithm proposed in this research holds significant research value and practical implications for addressing damage overlap and small-area damage in large-scale WTB images under complex lighting conditions.
The following is a summary of this article’s primary contributions.
(1)
In the C2f module, we have incorporated DSConv, enabling the model to better focus on the geometric structures within WTB images, thus enhancing the accuracy of detecting small-area damage and intricate cracks.
(2)
Replacing the upsampling method with CARAFE effectively reduces the loss of image feature information during the upsampling process, enhancing the feature extraction capability of the feature enhancement network.
(3)
The loss function is optimized with the WIoUv3 strategy. This strategy enables more accurate regression of target bounding boxes, making them more reliable for WTB damage localization.
The remainder of this paper is structured as follows: An overview of the current WTB damage detection methods is given in Section 2. Section 3 presents the detailed description of the proposed improved method. Section 4 introduces the experiment details, including dataset preparation, parameter settings, and evaluation metrics, and demonstrates the efficacy of the suggested method through experiments on a self-made WTB damage dataset. Finally, some conclusions are drawn in Section 5.

2. Related Work

In the domain of WTB damage inspection, detection methodologies are primarily divided into conventional approaches and deep learning methods.

2.1. Conventional Approaches

Traditional damage identification techniques primarily include vibration analysis, acoustic emission, ultrasonic testing, and thermography.
Vibration-based detection methods monitor the vibrations of WTBs using vibration sensors. When a blade becomes damaged, its vibrational characteristics can change [15]. Ghoshal et al. [16] tested four vibration-based techniques to detect partial damage to the blades. These approaches rely on assessing the vibrational behavior of the blades upon stimulation, utilizing patches of piezoelectric ceramic actuators that are attached to the blades. By examining vibration data, Liu et al. [17] used an empirical wavelet thresholding approach to identify the kind of damage in the blade bearings. This method aims to enhance the processing capability of vibration data, thereby increasing the accuracy and efficiency of fault detection. Employing vibration-based techniques for blade damage detection presents a significant challenge: differentiating between vibrations resulting from damage and those arising from environmental and operational factors.
Acoustic emission (AE) technology is a non-destructive testing technique utilized to monitor internal damages, cracks, micro-defects, or other anomalies within materials or structures; it is extensively applied within engineering and structural health monitoring [18]. In the monitoring of WTBs, acoustic emission-based detection techniques not only allow for the assessment of the structural health status but also enable the evaluation of the severity of damage by analyzing the acoustic emission signals. Tang et al. [19] confirmed the practicality of AE technology for the real-time monitoring of WTBs structural health. Cyclic loading was conducted using a compact resonant mass to accurately replicate the operational loading conditions. Xu et al. [20] monitored the condition of a 59.5-m composite WTB under fatigue stress using AE technology, identifying and classifying different types of damage on the WTBs. Nonetheless, a significant quantity of acoustic emission sensors must be installed on the WTBs, and there is not any physical connection between the acoustic emission signals and the corresponding damage. Concurrently, a principal challenge of this technology lies in differentiating the acoustic emission signals attributable to damage from those caused by noise, which renders the data processing task complex and significantly more costly.
It is possible to use ultrasonic testing technologies to find exterior and internal damage in WTBs. By transmitting ultrasonic waves and receiving the reflected signals within or on the surface of the blades, the structural health of the blades can be assessed [21]. Ultrasonic technology is capable of detecting a variety of damage types, such as delamination and cracks, which alter the propagation path of ultrasound waves. Because of this, ultrasonic technology is an essential tool for the inspection and maintenance of WTBs since it can be used to assess the position and severity of damage by analyzing the received signals. Yang et al. [22] utilized ultrasonic non-destructive testing (NDT) and structural health monitoring (SHM) techniques to detect damage in an actual WTB within a laboratory setting. Oliveira et al. [23] suggested employing a novel detection technique alongside non-destructive ultrasonic examination to pinpoint structural defects in WTBs. Ultrasonic inspection technology typically presents a limitation in that it necessitates direct placement of an ultrasonic transducer on the surface or within the interior of a blade. This may require a shutdown and contact with the blade, leading to interruptions in production and additional maintenance costs.
Thermographic inspection techniques aim to detect variations in the thermodynamic properties of blades by capturing thermal images of the blade surface using a thermal imager or infrared camera. These images are then analyzed to identify any anomalous temperature distributions or thermal characteristics [24]. The advantage of this approach lies in its non-contact nature, as it does not necessitate direct contact with the blade surface, thereby avoiding production downtime. Additionally, it enables scanning of the entire blade surface rather than merely localized areas, making it suitable for the inspection of large-scale WTBs. Sanati et al. [25] investigated the application of active and passive thermographic techniques for the detection of defects in WTBs, utilizing several image processing techniques on the unprocessed thermal data to increase the defect diagnosis process’s signal-to-noise ratio. Hwang et al. [26] proposed a Continuous Line Laser Scanning Thermography (CLLST) system that remotely detects internal delamination and surface damage in WTBs by analyzing the propagation patterns of laser-induced thermal waves. Thermographic techniques rely on the detection of temperature differences on the surface of the target. When there are reflective properties or contamination on the blades’ surface, these surface characteristics can lead to misinterpretations, as thermography is sensitive to temperature variations. However, early-stage damage does not easily result in temperature changes, which means that thermography may not be suitable for the detection of early-stage damage.

2.2. Deep Learning Methods

With the ongoing advancement of drone technology equipped with high-definition cameras, deep learning techniques have been widely applied in the field of target detection, including the maintenance and monitoring of WTBs [27,28]. Detection methods based on deep learning leverage large volumes of annotated data for training, enabling automatic learning and recognition of features within images, thereby achieving precise target detection. Damage detection methods based on deep learning offer advantages such as non-contact, high-precision, and long-range detection. Compared to traditional detection approaches, these methods can significantly reduce labor costs and risks while being less susceptible to environmental factors. Guo et al. [29] employed Haar-like features, an Adaboost cascade classifier, and convolutional neural networks (CNN) to construct a classification network for detecting the presence and location of damage. Yang et al. [30] utilized deep learning models, transfer learning, and ensemble learning classifiers to identify damage in images of WTBs. Lv et al. [31] detected damage to the blades using an enhanced SSD algorithm. Zhang et al. [32] suggested using a lightweight YOLOv5s as the basis for a surface flaw detection model for wind turbine assembly, which not only exhibits high accuracy but also delivers strong real-time performance. Ran et al. [33] presented a modified algorithm named Attention and Feature Balance YOLO (AFB-YOLO) based on YOLOv5s. This updated model can identify subtle and low-resolution damage characteristics on WTBs, greatly improving the capacity to identify small-scale flaws. Hu et al. [34] made improvements to the YOLO-tiny algorithm, resulting in a modified model that demonstrates greater detection accuracy, especially for small-sized objects.
Although current methods can effectively detect damage scenarios of simple WTBs, they mainly focus on identifying single damage types. When it comes to detecting various types of damage in complex environments, their performance often leaves much to be desired, frequently resulting in missed and false detections. This phenomenon indicates that existing technologies still have significant limitations in improving the accuracy of detecting multiple types of damage. To address these shortcomings, firstly, the present research employed drones equipped with high-resolution cameras to conduct an extensive collection of blade damage images across various locations. Based on the acquired data, the damages were classified into two categories: cracks and peelings. Secondly, this study utilized the efficient YOLOv8 model, which is widely applied in the field of damage detection. Finally, to improve the model’s performance even further, improvements were made in three key aspects of the algorithm: the introduction of DSConv to enhance feature extraction capabilities, refinement of the upsampling process, and customized optimization of the loss function.

3. Methodology

In this section, firstly, the YOLOv8 model adopted is introduced. Secondly, the improvement methods of the DCW-YOLO model are elaborated in detail.

3.1. YOLO Algorithm

The YOLO series represents one of the most prevalent real-time object detection algorithms in modern times, including iterations such as YOLOv5, YOLOv7 [35], and YOLOv8, among others. YOLOv5 utilizes the cross stage partial (CSP) [36] network architecture, which significantly reduces computational expense, thereby offering enhanced real-time performance as well as the characteristic of a lightweight model. However, YOLOv5 encounters certain challenges and limitations within scenarios involving the identification of targets with high density and tiny size. In response to these shortcomings, YOLOv7 innovatively introduces a trainable bag of freebies strategy, which significantly enhances the model’s detection accuracy and generalization capability. Nevertheless, due to limitations related to training datasets, model architecture, and hyperparameters, YOLOv7 might undergo a decrease in performance under certain unique conditions. Recently, Ultralytics, the developer of YOLOv5, released the YOLOv8 algorithm in 2023. Figure 1 depicts the network architecture. The architecture of the algorithm primarily comprises three components: backbone, neck, and a detection head. This network topology will be explained in depth in the material that follows.
In the design of the backbone network, to achieve effective feature extraction, the hierarchical path aggregation concept of CSP is adopted, and the construction principles of the Efficient Layer Aggregation Network (ELAN), which showcased advanced performance in YOLOv7, are incorporated. By swapping YOLOv5’s C3 module for the more sophisticated C2f module, the structure not only maintains a lightweight profile but also enhances feature fusion capabilities, permitting a richer flow of gradient information through the network. The SPPF module, which follows the backbone network, further improves the model’s capacity to identify objects at different scales. This module applies three successive 5 × 5 max-pooling operations and fuses their outputs, ensuring detection accuracy for multi-scale targets while maintaining the network’s lightweight characteristics.
The integration of features from several scales is the neck network’s main purpose. By adopting a Path Aggregation Network-Feature Pyramid Network (PAN-FPN) architecture [37,38], both top-down and bottom-up streams of feature flow are constructed, facilitating the efficient integration of shallow spatial information and rich semantic information. This structure allows the network to gain a broader and more enriched feature representation, thereby enhancing the model’s capability for object detection.
YOLOv8 integrates YOLOX’s approach of using decoupled heads within its detection mechanism, opting for distinct pathways for classifying objects and for bounding box adjustments, employing unique loss functions for each [39]. Specifically, it utilizes binary cross-entropy (BCE) loss for object classification and combines distribution focal loss (DFL) [40] with complete intersection over union (CIoU) [41] loss for more accurate bounding box regression. This architecture not only boosts the accuracy of the model but also speeds up the convergence during training. Moreover, YOLOv8 adopts an anchor-free design, directly identifying object centers, which simplifies the process of non-maximum suppression. In the loss calculation process, the TaskAlignedAssigner method is used to determine positive and negative sample assignments. Regression and classification scores are combined and weighted to determine positive sample selection, thereby improving the model’s detection accuracy and robustness [42].

3.2. DCW-YOLO Algorithm

The improvements of the DCW-YOLO model mainly include three aspects: firstly, DSConv is integrated into the YOLOv8 C2f module; secondly, the CARAFE lightweight upsampling mechanism is introduced; finally, the loss function is optimized by adopting the WIoUv3 strategy.

3.2.1. DSConv

In order to overcome the difficulties in identifying small-area and serpentine crack damage in large-scale UAV-captured wind turbine photos, this study introduces an enhanced object detection method that integrates a dynamic convolution structure into the YOLOv8 model’s C2f module to improve detection performance [43].
Dynamic convolution, a multi-directional convolutional operation, is designed to enhance the model’s capacity to perceive complex geometric structures, thereby more accurately capturing the subtle characteristics and contours of cracks. During the model training phase, dynamic convolution adaptively focuses on elongated and curved regions with tube-like structures, treating these features as significant targets. The model’s capacity to resolve the geometric features seen in WTB pictures is much enhanced when dynamic convolution is applied to the C2f module, thus increasing the detection accuracy and robustness for small-area damage and serpentine cracks. The accompanying Figure 2 displays the DSConv schematic diagram.
Dynamic snake convolution’s fundamental idea involves incorporating adjustable offsets into convolutional kernels, enhancing their spatial adaptability, thereby more accurately capturing the complex geometric features within an image. In traditional convolutional processes, the receptive fields remain static, which can limit the model’s capacity to detect intricate characteristics, especially in cases of elongated or tubular shapes. DSConv makes it possible to learn flexible offsets, which enhances the model’s capacity to recognize these kinds of characteristics. Additionally, it utilizes an iterative approach (Figure 3) to prevent the receptive fields from straying too far from the intended target, processing one target at a time to ensure sustained attention. Below is the mathematical derivation of DSConv.
First, consider a standard 2D convolution kernel with coordinates $K$, where $K_i = (x_i, y_i)$ denotes the center coordinate. A 3 × 3 convolution kernel $K$ with a dilation factor of 1 can then be expressed as follows:
$$K = \{(x-1,\ y-1),\ (x-1,\ y),\ \cdots,\ (x+1,\ y+1)\} \tag{1}$$
A deformation offset $\Delta$ is introduced to improve the flexibility of the convolution kernels and let them concentrate more precisely on the intricate geometric characteristics of the target. However, if the model is permitted to adjust the offsets freely, the receptive field risks deviating greatly from the intended region. To mitigate this, DSConv modifies the convolutional kernel along the x-axis and y-axis directions separately.
As shown in Figure 3, for the x-axis direction, the coordinates are denoted as follows:
$$K_{i \pm c} = \begin{cases} (x_{i+c},\ y_{i+c}) = \left(x_i + c,\ y_i + \sum_{i}^{i+c} \Delta y\right), \\ (x_{i-c},\ y_{i-c}) = \left(x_i - c,\ y_i + \sum_{i-c}^{i} \Delta y\right) \end{cases} \tag{2}$$
The coordinates in the y-axis direction are expressed as follows:
$$K_{j \pm c} = \begin{cases} (x_{j+c},\ y_{j+c}) = \left(x_j + \sum_{j}^{j+c} \Delta x,\ y_j + c\right), \\ (x_{j-c},\ y_{j-c}) = \left(x_j + \sum_{j-c}^{j} \Delta x,\ y_j - c\right) \end{cases} \tag{3}$$
where the cumulative deformation is represented by ∑. Considering that Δ is typically fractional, we employ bilinear interpolation for processing, as shown below.
$$K = \sum_{K'} B(K', K) \cdot K' \tag{4}$$
Here, $K$ denotes the fractional position from Equations (2) and (3), $K'$ enumerates all integer spatial positions, and $B$ represents the bilinear interpolation kernel, which can be broken down into a pair of one-dimensional interpolation kernels, as shown in Equation (5), where $b$ denotes the one-dimensional interpolation function.
$$B(K', K) = b(K'_x, K_x) \cdot b(K'_y, K_y) \tag{5}$$
DSConv can dynamically change the direction of the x- and y-axes by using the mathematical analysis that is provided, as depicted in Figure 3. During its deformation process, DSConv covers a 9 × 9 region. Implementing this convolution within the C2f module greatly improves the model’s ability to thoroughly analyze geometric configurations in images of WTBs, thus enhancing the detection accuracy and reliability for minor damage and intricate crack patterns.
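To make the mechanism concrete, the following is a minimal PyTorch sketch of the x-axis branch of a snake-style convolution, written for illustration only: the module name `SnakeConvX`, the offset-bounding constant `extend`, and the fusion of the sampled taps with a 1 × 1 convolution are our assumptions, not the authors' implementation or the original DSCNet code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnakeConvX(nn.Module):
    """Illustrative x-axis branch of a snake-style convolution (not the
    authors' code): a small conv predicts a vertical offset (delta-y) for
    each of the k kernel taps, the offsets are accumulated so neighbouring
    taps stay connected, and the shifted positions are sampled with
    bilinear interpolation (grid_sample) before being fused."""

    def __init__(self, in_ch, out_ch, k=9, extend=1.0):
        super().__init__()
        self.k = k
        self.extend = extend                                   # bounds tap drift
        self.offset_conv = nn.Conv2d(in_ch, k, 3, padding=1)   # one delta-y per tap
        self.fuse = nn.Conv2d(in_ch * k, out_ch, 1)            # fuse the k sampled taps
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        b, c, h, w = x.shape
        # Predict and bound the offsets, then accumulate them along the kernel
        # axis so neighbouring taps stay connected (cf. Equation (2)).
        dy = torch.tanh(self.offset_conv(x)) * self.extend     # (b, k, h, w)
        dy = torch.cumsum(dy, dim=1)                           # connected offsets
        dy = dy - dy.mean(dim=1, keepdim=True)                 # keep the centre anchored

        # Base sampling grid in the normalized [-1, 1] coordinates of grid_sample.
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")          # each (h, w)

        taps, half = [], self.k // 2
        for i in range(self.k):
            # i-th tap: fixed horizontal shift plus its learned vertical offset.
            shift_x = (gx + 2.0 * (i - half) / max(w - 1, 1)).expand(b, h, w)
            shift_y = gy + 2.0 * dy[:, i] / max(h - 1, 1)
            grid = torch.stack((shift_x, shift_y), dim=-1)      # (b, h, w, 2)
            taps.append(F.grid_sample(x, grid, align_corners=True))
        return F.relu(self.bn(self.fuse(torch.cat(taps, dim=1))))
```

The key step mirrored from the derivation above is the cumulative sum over the predicted offsets, which keeps adjacent kernel taps connected instead of letting each tap drift independently.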

3.2.2. CARAFE

Upsampling is a common operation in deep neural networks, utilized to boost feature map resolution and size. The most prevalent techniques for feature upsampling are bilinear interpolation and nearest-neighbor interpolation; however, they come with certain limitations. These interpolation methods consider only simple comparisons or linear combinations between pixels, disregarding the semantic content of the feature maps. Moreover, the size of their receptive field is restricted to 1 × 1, which falls short of supplying the detailed semantic information necessary for dense prediction tasks. Deconvolution is another method of upsampling, which involves convolving the input feature map with a kernel, using padding and stride adjustments to expand the generated feature map’s size. Deconvolution, however, applies the same kernel throughout the image, which compromises its ability to respond to local variations. Moreover, due to its substantial number of parameters, it can impact computational efficiency.
In this research, we propose the adoption of a lightweight, universal upsampling technique, CARAFE [44], as a replacement for the nearest-neighbor interpolation upsampling method in the original YOLOv8 model, thereby capturing richer semantic information. Figure 4 shows the particular structure of CARAFE, comprising an upsampling kernel prediction module and a content-aware reassembly module. The upsampling kernel prediction module generates the upsampling kernels required by the reassembly module in a content-aware manner. Initially, a channel compressor employing a 1 × 1 convolutional layer decreases the number of channels in the input feature representation from $C$ to $C_m$, significantly decreasing the computational burden. Subsequently, content encoding is applied to the channel-compressed input feature map of size $H \times W \times C_m$, expanding the number of channels from $C_m$ to $\sigma^2 \times k_{up}^2$. The channels are then spread over the spatial dimension, producing an upsampling kernel of shape $\sigma H \times \sigma W \times k_{up}^2$. A softmax function is then applied to normalize the kernel, guaranteeing that the weights of each upsampling kernel sum to one. Overall, the upsampling kernel prediction module leverages the semantic information contained within each feature point of the input feature map and its surrounding feature points for content encoding. This process adaptively generates the corresponding upsampling reassembly kernels for different feature points.
In the content-aware reassembly module, each reassembly kernel $W_l$ reassembles the features within a local area. For a target position $l$ whose corresponding source position on the input feature map is $(i, j)$, the reassembly operates on the $k_{up} \times k_{up}$ square region $N(X_l, k_{up})$ centered at $(i, j)$, as presented in Equation (6), where $r = \lfloor k_{up}/2 \rfloor$. Each pixel in the region $N(X_l, k_{up})$ contributes to the upsampled pixel according to the content of the features rather than merely its position and distance. Therefore, the reconstructed feature map holds more robust semantic content compared to the initial features.
$$\chi'_{l} = \sum_{n=-r}^{r} \sum_{m=-r}^{r} W_{l}(n, m) \cdot \chi_{(i+n,\ j+m)} \tag{6}$$
In contrast to YOLOv8’s conventional nearest-neighbor interpolation upsampling, the CARAFE upsampling module employs adaptively generated reassembly kernels to upsample each feature point, thereby gathering contextual semantic information from a wider receptive field. This notably decreases the degradation of image feature details during the upsampling phase, effectively improving the network’s ability to extract features.
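For readers who want to trace the two modules step by step, below is a minimal PyTorch sketch of a CARAFE-style upsampler under the structure just described; the class name, the compressed channel count `c_mid = 64`, and the use of `unfold` plus nearest replication to gather source neighborhoods are our illustrative choices, not the official CARAFE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFEUpsample(nn.Module):
    """Illustrative CARAFE-style upsampler: kernel prediction (compress,
    content-encode, pixel-shuffle, softmax) followed by content-aware
    reassembly of k_up x k_up neighbourhoods."""

    def __init__(self, c, c_mid=64, scale=2, k_up=5, k_enc=3):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(c, c_mid, 1)                  # channel compressor
        self.encode = nn.Conv2d(c_mid, scale ** 2 * k_up ** 2,  # content encoder
                                k_enc, padding=k_enc // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # 1. Kernel prediction: one k_up*k_up kernel per upsampled position,
        #    normalised with softmax so the weights sum to one.
        kernels = self.encode(self.compress(x))                  # (b, s^2*k^2, h, w)
        kernels = F.pixel_shuffle(kernels, self.scale)           # (b, k^2, sh, sw)
        kernels = F.softmax(kernels, dim=1)

        # 2. Content-aware reassembly: gather each k_up x k_up neighbourhood
        #    of the source map, replicate it to the target resolution, and
        #    take the kernel-weighted sum (cf. Equation (6)).
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)          # (b, c*k^2, h*w)
        patches = patches.view(b, c * self.k_up ** 2, h, w)
        patches = F.interpolate(patches, scale_factor=self.scale, mode="nearest")
        patches = patches.view(b, c, self.k_up ** 2, h * self.scale, w * self.scale)
        return (patches * kernels.unsqueeze(1)).sum(dim=2)                # (b, c, sh, sw)

# Example: upsample a 256-channel feature map from 40x40 to 80x80.
if __name__ == "__main__":
    feat = torch.randn(1, 256, 40, 40)
    print(CARAFEUpsample(256)(feat).shape)   # torch.Size([1, 256, 80, 80])
```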

3.2.3. Improved Loss Function

The YOLOv8 algorithm has introduced significant improvements in its network architecture’s head, adopting a decoupled head architecture along with an anchor-free strategy. For the computation of the loss function, it covers two main parts: classification and regression losses. For the classification loss, the algorithm utilizes binary cross-entropy loss to quantify the error between predicted classes and actual classes. In terms of regression loss, the algorithm combines distribution focal loss and bounding box regression loss to enhance the accuracy of the location prediction. Overall, the loss function of YOLOv8 can be expressed as Equation (7).
$$L = L_{box} + L_{cls} + L_{DFL} \tag{7}$$
The distribution focal loss (DFL) aims to optimize, in the form of a cross-entropy, the probabilities of the two discrete points adjacent to the target label $y$, namely the left and right points. With this strategy, the network is encouraged to focus quickly on the distribution around the target location, thereby enhancing the accuracy of localization. The particular form of the loss function is given in Equation (8).
$$DFL(S_i, S_{i+1}) = -\big((y_{i+1} - y)\log(S_i) + (y - y_i)\log(S_{i+1})\big) \tag{8}$$
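As a concrete reading of Equation (8), the following is a minimal sketch (our own illustrative function, not the Ultralytics implementation) that splits a continuous regression target between its two neighbouring integer bins and weights the two cross-entropy terms by proximity:

```python
import torch
import torch.nn.functional as F

def dfl(dist_logits, target):
    """Distribution focal loss sketch for Eq. (8).

    dist_logits: (N, reg_max + 1) logits over the discretised distance bins.
    target:      (N,) continuous regression targets in [0, reg_max).
    """
    yl = target.long()                    # left neighbour  y_i
    yr = yl + 1                           # right neighbour y_{i+1}
    wl = yr.float() - target              # weight toward the left bin
    wr = target - yl.float()              # weight toward the right bin
    return (F.cross_entropy(dist_logits, yl, reduction="none") * wl
            + F.cross_entropy(dist_logits, yr, reduction="none") * wr).mean()
```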
Subsequently, YOLOv8 adopts CIoU as the loss function for the bounding box regression, as shown in Equation (9).
$$L_{CIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2} + \alpha \nu \tag{9}$$
where $b$ and $b^{gt}$ are the centroids of the predicted and actual bounding boxes, respectively, $\rho$ represents the Euclidean distance between the two points, $c$ is the diagonal length of the smallest box that contains both the predicted and actual boxes, and $\alpha$ stands for the weight coefficient; its expression is shown as follows.
$$\alpha = \frac{\nu}{1 - IoU + \nu} \tag{10}$$
where $IoU$ is defined as the ratio of the overlap between the predicted bounding box and the actual box to their union. $\nu$ is a measure of the aspect ratio similarity, and it can be defined in the form of Equation (11). In the equation, $w^{gt}$ and $h^{gt}$ correspond to the ground truth box’s width and height, respectively, while $w$ and $h$ indicate the width and height of the prediction box.
$$\nu = \frac{4}{\pi^2}\left[\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right]^2 \tag{11}$$
Despite the effectiveness of CIoU loss in object detection, it still harbors certain limitations. First and foremost, CIoU loss does not account for balancing sample difficulties, which may result in the model being biased towards detecting easier samples, especially in scenarios with an abundance of small targets, at the expense of performance on more challenging samples. Secondly, CIoU utilizes the aspect ratio as a penalty in its loss calculation; nonetheless, if the actual and predicted bounding boxes match in aspect ratio yet diverge in their width and height, this penalty fails to accurately capture the actual size disparities between the boxes. Moreover, the CIoU equation incorporates inverse trigonometric functions, significantly raising the computational demand and arithmetic processing expense of the model, which may limit its efficiency in environments with restricted resources.
In the specific scenario of detecting WTB damages from UAV imagery, the presence of a relatively high proportion of small target damage data makes the judicious design of the loss function crucial for enhancing the performance of object detection models. Furthermore, since there will always be some low-quality examples in the training data, placing too much emphasis on box regression for these samples could deteriorate the model’s ability to generalize. Ideally, the objective function should reduce the gradient penalty for geometric measures in cases of considerable congruence between the anchor boxes and the actual ground truth boxes, thus helping the model to enhance its generalization capabilities.
Given these considerations, this research adopts the dynamic non-monotonic focusing mechanism of gradient gain allocation from WIoUv3 [45], which allows the flexible adjustment of gradient gain allocation strategies at different stages of training. In the early phase of training, the approach emphasizes preserving high-quality anchor boxes and mitigating geometric penalties, enabling the model to acquire better generalization abilities. During the mid-to-late phase of training, WIoUv3 assigns reduced gradient gains to anchor boxes of inferior quality, suppressing harmful gradients originating from inferior samples and enhancing the model’s localization accuracy. WIoUv3 is defined as follows.
$$L_{WIoU} = \gamma \times R_{WIoU} \times L_{IoU} \tag{12}$$
$$R_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{W_g^2 + H_g^2}\right) \tag{13}$$
$$\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \in [0, +\infty) \tag{14}$$
$$\gamma = \frac{\beta}{\delta\, \alpha^{\beta - \delta}} \tag{15}$$
In Equation (13), $R_{WIoU}$ denotes the normalized distance between the centers of the predicted box and the ground truth box, where $x$, $y$ and $x_{gt}$, $y_{gt}$ denote the center coordinates of the predicted box and the ground truth box, respectively, while $W_g$ and $H_g$ represent the width and height of the ground truth box. In Equation (14), $\beta$ represents the outlierness, $L_{IoU}^{*}$ represents the monotonic focusing coefficient, and $\overline{L_{IoU}}$ represents its moving average with momentum $m$. In Equation (15), $\alpha$ and $\delta$ are hyperparameters controlling the gradient gain $\gamma$.
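To show how Equations (12)–(15) fit together in one pass, here is a minimal PyTorch sketch (our own illustration, not the authors' training code). The hyperparameter values alpha = 1.9 and delta = 3 are the defaults reported as effective in the WIoU paper and should be treated as assumptions here.

```python
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha=1.9, delta=3.0):
    """Wise-IoU v3 sketch for Eqs. (12)-(15).

    pred, target: (N, 4) boxes in (x1, y1, x2, y2) format.
    iou_mean:     running mean of L_IoU (the moving average in Eq. (14)),
                  maintained outside this function with some momentum m.
    Returns the loss and the batch mean of L_IoU for updating iou_mean.
    """
    eps = 1e-7
    # L_IoU = 1 - IoU
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1.0 - inter / (area_p + area_t - inter + eps)

    # R_WIoU (Eq. 13): normalised centre distance. W_g, H_g are taken here as
    # the ground-truth box size, following the text; the original WIoU paper
    # normalises by the smallest enclosing box instead.
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = target[:, 2] - target[:, 0]
    hg = target[:, 3] - target[:, 1]
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2)
                       / (wg ** 2 + hg ** 2 + eps).detach())

    # Dynamic non-monotonic focusing (Eqs. 14-15): outlierness beta and
    # gradient gain gamma; detaching keeps the focusing term out of autograd.
    beta = (l_iou.detach() / (iou_mean + eps)).clamp(min=0)
    gamma = beta / (delta * alpha ** (beta - delta))
    return (gamma * r_wiou * l_iou).mean(), l_iou.detach().mean()
```

During training, the caller would update the running mean after each batch, e.g. `iou_mean = (1 - m) * iou_mean + m * batch_liou.item()`, which realizes the moving average with momentum $m$ mentioned after Equation (15).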
After the improvements adopted in this research, the structure of the DCW-YOLO model is shown in Figure 5.

4. Experiment

4.1. Data Collection and Preparation

The dataset was collected using a high-resolution drone, specifically the DJI Mavic 2 Pro, equipped with a Hasselblad L1D-20c camera. This camera features a 20-megapixel 1-inch CMOS sensor with dimensions of 13.2 mm (width) × 8.8 mm (height), capable of capturing greater detail and richer color information. The lens has a 28 mm equivalent focal length, and the image resolution is 5472 × 3648 pixels. During the image capture process, the distance between the drone and the WTBs was approximately 5 to 15 m, ensuring precise detection of surface defects. According to imaging principles, the ratio of the target’s actual size to the sensor size is equal to the ratio of the shooting distance to the focal length. The following formula can be used to calculate the physical size of each pixel in the scene. Assuming a target distance of D meters, a focal length of f, a sensor width of W, and an image width resolution of X, the physical size per pixel S can be expressed using the following formula.
$$S = \frac{W \times D}{f \times X} \tag{16}$$
Thus, at a shooting distance of 5 m, the pixel size is approximately 0.43 mm. This means that each pixel represents a physical size of 0.43 mm in the actual scene, providing sufficient resolution to accurately detect and classify both peeling and crack defects. In evaluating the model’s performance, determining the minimum detectable defect size is crucial. The minimum detectable defect size primarily depends on image resolution, the distance between the target and the camera, and the downsampling factor of the model’s feature extraction. Typically, the downsampling factor of the final detection layer in YOLOv8 is 32, meaning that each pixel in the feature map corresponds to a 32 × 32 pixel area in the original image. Assuming a downsampling factor of σ , the minimum detectable size M can be expressed using the following formula.
$$M = S \times \sigma \tag{17}$$
By substituting the previously obtained minimum physical size per pixel (0.43 mm), we can determine the minimum defect size that the model can detect. Therefore, at a shooting distance of 5 m, the model can detect a minimum defect size of approximately 13.76 mm.
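As a quick numeric check of Equations (16) and (17), a small Python sketch using only the values stated in the text:

```python
# Values from the text: sensor width 13.2 mm, focal length 28 mm,
# image width 5472 px, shooting distance 5 m, YOLOv8 downsampling factor 32.
W, f, X = 13.2, 28.0, 5472      # mm, mm, pixels
D = 5000.0                      # shooting distance in mm
sigma = 32                      # downsampling factor of the final detection layer

S = (W * D) / (f * X)           # physical size per pixel, Eq. (16)
M = S * sigma                   # minimum detectable defect size, Eq. (17)
print(f"S = {S:.2f} mm/pixel, M = {M:.2f} mm")   # about 0.43 mm and 13.8 mm
```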
Image samples were obtained from two wind farms in Liaoning and Jiangsu provinces, China, and a total of 600 high-definition original damage images were collated. After image preprocessing, this sample size is sufficient for two-class damage detection and effectively supports the training and validation of the algorithm. To enhance data diversity and facilitate effective training of deep learning models, the original photos were subjected to a series of data augmentation processes, including image flipping, rotation, and HSV color transformation, as sketched below. Through these preprocessing operations, the dataset was expanded to 3550 images and then divided in a 9:1 ratio into a training set and a validation set. The dataset primarily consists of two types of damage: cracks and peelings. A subset of the dataset collected in this experiment is displayed in Figure 6.
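A minimal OpenCV sketch of the three augmentations named above; the flip probability, rotation range, and HSV jitter ranges are our assumptions, since the exact settings are not reported.

```python
import cv2
import numpy as np

def augment(img):
    """Flip, small rotation, and HSV colour jitter on a BGR image (sketch)."""
    if np.random.rand() < 0.5:
        img = cv2.flip(img, 1)                                   # horizontal flip
    angle = np.random.uniform(-15, 15)                           # assumed rotation range
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + np.random.uniform(-5, 5)) % 180      # hue shift
    hsv[..., 1:] *= np.random.uniform(0.7, 1.3, size=2)               # sat/value scale
    return cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)
```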
In this experiment, training was performed on a GPU. The operating system was Ubuntu 20.04, with an RTX 3080 graphics card and an Intel(R) Xeon(R) Platinum 8255C CPU, and the CUDA version was 11.8. The deep learning framework was PyTorch 2.0.0 with Python 3.8. The resolution of the training images was set to 640 × 640, with a batch size of 16. Training was run for 100 epochs using a stochastic gradient descent (SGD) optimizer with a learning rate of 0.01, a momentum of 0.937, and a weight decay of 0.0005.
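For the stock YOLOv8 baseline, this configuration corresponds roughly to the following Ultralytics training call (a sketch: the dataset YAML path is hypothetical, and the DCW-YOLO modifications would require a custom model definition rather than the off-the-shelf yolov8n weights).

```python
from ultralytics import YOLO

# Baseline YOLOv8n; the DCW-YOLO variant would load a modified model YAML instead.
model = YOLO("yolov8n.pt")
model.train(
    data="wtb_damage.yaml",   # hypothetical dataset config (train/val paths, 2 classes)
    imgsz=640,
    batch=16,
    epochs=100,
    optimizer="SGD",
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
)
```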

4.2. Evaluation Metrics

To objectively evaluate the detection performance of the algorithm on blade damage in complex environments, this experiment uses precision (P), recall (R), average precision (AP), mean average precision (mAP), parameter count (Params), and giga floating-point operations (GFLOPs) as the evaluation metrics for the model.
$$P = \frac{TP}{TP + FP} \tag{18}$$
$$R = \frac{TP}{TP + FN} \tag{19}$$
$$AP = \int_{0}^{1} P(R)\, dR \tag{20}$$
$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i \tag{21}$$
In the aforementioned formulas, TP denotes the count of instances that the algorithm properly classified as positive, FP denotes the count of instances erroneously classified as positive by the algorithm, and FN denotes the count of instances falsely classified as negative by the algorithm. Precision refers to the ratio of correctly classified positive instances among all samples predicted as positive by the algorithm, while recall refers to the ratio of accurately identified positive instances out of all actual positive instances. AP quantifies a model’s effectiveness by calculating the area under the precision-recall curve; a higher AP denotes better model performance across various thresholds. mAP represents the average value of AP over all classes; in the experiment, a higher mAP value indicates better performance of the model in detecting damage to WTBs. Params refers to the total number of parameters that need to be trained within the model. GFLOPs measures the computational cost of the model in billions of floating-point operations. Lower values of parameters and computational load contribute to making the model more lightweight.
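For concreteness, a small NumPy sketch of the AP computation in Equation (20) (all-point interpolation over the precision-recall curve); the toy detections below are illustrative only.

```python
import numpy as np

def voc_ap(recall, precision):
    """Area under the precision-recall curve (all-point interpolation);
    `recall`/`precision` are arrays of cumulative values sorted by
    descending detection confidence."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

# Toy usage: 4 detections (3 true positives) against 3 ground-truth objects.
tp = np.array([1, 1, 0, 1])
fp = 1 - tp
precision = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
recall = np.cumsum(tp) / 3
print(voc_ap(recall, precision))   # about 0.92 for this toy example
```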

4.3. Results and Analysis

4.3.1. Ablation Experiment

Three improvement methods, denoted D (DSConv), C (CARAFE), and W (WIoUv3), were proposed in the experiment. To verify the efficacy of these three enhancement methods, all ablation experiments were conducted on our self-made WTB damage dataset. The evaluation metrics for detection performance included P, R, mAP@0.5, and mAP@0.5–0.95, while Params and GFLOPs were used as metrics of model lightness. As shown in Table 1, using the original YOLOv8n model as a baseline, the effectiveness of each improvement method is evaluated individually by adding them one at a time. This approach reveals the specific impact of each improvement on model performance. Moreover, by combining different improvement strategies, it is possible not only to assess the effect of individual improvements but also to explore whether there is a synergistic effect between them, thereby further enhancing the performance of the model. Through these experiments, we can better evaluate the effect of each improvement method on the performance of the object detection algorithm.
The symbol “√” indicates the inclusion of the improvement method. As can be seen from Table 1, compared to the original YOLOv8n, the most significant improvement was observed by incorporating dynamic snake convolution into the C2f module at the neck end, with an increase of 1.3% in mAP@0.5 and 0.3% in mAP@0.5–0.95. Further enhancement in detection accuracy was achieved by improving the upsampling, leading to a rise of 0.9% in mAP@0.5 and 0.9% in mAP@0.5–0.95. Following loss function optimization, the bounding box regression became more accurate, with the final mAP@0.5 and mAP@0.5–0.95 reaching 92.5% and 69%, respectively. The tabular data also indicate that integrating any two improvement methods enhances detection performance. Ultimately, after integrating the three enhancement methods, the model’s parameter count and computational complexity are 3.35M and 8.5 GFLOPs, respectively. With only a minor increase in computational overhead, the DCW-YOLO model achieved significant performance improvements: its mAP@0.5 reached 93.7%, the precision reached up to 96.8%, and the recall up to 89%. Overall, the mAP@0.5 increased from the initial 91.4% to 93.7%, and the mAP@0.5–0.95 rose from the initial 68.9% to 71.2%. Figure 7 illustrates the comparison of mAP values obtained on the wind turbine blade damage dataset between the original YOLOv8 algorithm and its enhanced version, the DCW-YOLO model.
From Figure 7, it can be observed that the improved DCW-YOLO model achieves a higher mAP compared to the original YOLOv8 model, with an increase from 91.4% to 93.7%. Specifically, for crack damage, the mAP increases from the original 95.5% to 97%, and for peeling damage, the mAP increases from the original 87.2% to 90.3%.
To vividly demonstrate the effectiveness of the improved algorithm in practical applications, it was compared with the original YOLOv8 algorithm on a WTBs dataset. Two types of damaged image samples, captured under different background complexities and lighting conditions, were selected for comparison. As confirmed by the detection results shown in Figure 8, the improved algorithm significantly outperforms the original YOLOv8 algorithm in terms of detection confidence and accuracy in identifying blade damage. In summary, the improved algorithm proposed in this research can meet the needs of actual WTB damage detection scenarios and will have significant value and broad application prospects in future practical applications.

4.3.2. Comparison Experiment

To evaluate the impact of input data quantity on the algorithm’s performance, we designed a comparative experiment, training the algorithm with different amounts of input images. Specifically, we used a smaller amount of input data (e.g., 400 images) and larger amounts (e.g., 600 and 800 images) to observe their effects on the model’s convergence, generalization ability, and detection accuracy. The performance evaluation metrics used included P, R, mAP@0.5, mAP@0.5–0.95, and training time. Table 2 displays the outcomes of the experiment.
As shown in the table above, with the increase in the number of original training images, the algorithm’s mAP value gradually improves. However, when fewer training images are used, the mAP value is significantly lower, and the model’s performance fluctuates, indicating insufficient training data. This lack of data prevents the model from fully learning the important features in the dataset, thus affecting detection performance. Through experiments, we found that when the number of original training images reaches approximately 600, the algorithm achieves optimal output performance. At this data volume, the mAP curve stabilizes, showing that the model has reached a good level of convergence with strong detection capability and generalization performance. Further increasing the number of images does not result in a significant improvement in mAP, indicating that 600 images provide the algorithm with sufficiently rich samples to ensure stable performance across diverse scenarios. However, this also leads to significantly longer training times and higher computational resource consumption. Therefore, 600 images represent the optimal balance between performance and efficiency, meeting the needs of practical applications.
To further validate the superiority of the proposed model over other mainstream object detection network models, a comparative experiment was carried out on the self-made WTB damage dataset between the proposed DCW-YOLO algorithm and other models including Faster R-CNN, SSD, YOLOv5, YOLOv6 [46], YOLOv7-Tiny, YOLOv8n, YOLOv9 [47], YOLOv10 [48], and RT-DETR [49]. The performance evaluation metrics used were P, R, mAP@0.5, and mAP@0.5–0.95. Table 3 displays the outcomes of the experiment.
Firstly, data from Table 3 clearly demonstrate that YOLOv8 surpasses traditional object detection frameworks like Faster R-CNN in metrics such as recall and mean average precision. This improvement is primarily due to YOLOv8’s superior integration of multi-scale features within its feature extraction network, enhancing robustness in detecting objects of varying sizes. In contrast, the SSD algorithm, despite utilizing multiple anchor boxes for different scales, struggles with targets having inconsistent aspect ratios. YOLOv8’s enhancements in network architecture and training strategies have significantly boosted its performance in detecting these challenging targets, resulting in superior outcomes. Moreover, YOLOv8 maintains a significantly lower parameter and computational cost compared to traditional detection algorithms, achieving greater efficiency.
Secondly, compared to other iterations in the YOLO series such as YOLOv6, YOLOv7-Tiny, and YOLOv10, YOLOv8 continues to exhibit advantages in various detection metrics. The relatively shallow network architecture of YOLOv6 limits its learning and expressive capabilities, thus hindering its performance in complex scenarios. YOLOv7-Tiny, on the other hand, lacks sensitivity to changes in target sizes, which compromises its efficacy in detecting both small and large objects. YOLOv10’s attempt to balance network depth and width has not translated effectively in practical applications, failing to capture sufficient feature representations, particularly in complex scenarios like WTB damage detection. In contrast, YOLOv5, despite its efficiency in terms of parameter count and computational complexity (only requiring 2.51M parameters and 7.1 GFLOPs), falls short in precision compared to YOLOv8, largely due to its anchor-based training approach, which still requires further optimization for detecting smaller targets. Although YOLOv9 achieves a higher precision rate of 0.954, surpassing YOLOv8’s 0.936, its mAP@0.5 and mAP@0.5–0.95 are 0.902 and 0.668, respectively, both lower than YOLOv8’s 0.914 and 0.689, indicating that while YOLOv9 excels in accurately identifying targets, YOLOv8 maintains a superior overall performance in both recall and average precision. Additionally, the increased parameter count and computational demands of YOLOv9 (25.32M parameters and 102.3 GFLOPs) may not be optimal for scenarios with limited computational resources.
Thirdly, the detection performance of the RT-DETR model is observed to be unsatisfactory from the table data. This is primarily attributed to the fact that large-scale models like RT-DETR typically require substantial data support to perform optimally. However, the dataset available for WTB damage detection is relatively limited, greatly constraining RT-DETR’s effectiveness. In contrast, the YOLOv8 algorithm has proven to be more suitable for WTB damage detection in this study.
In summary, YOLOv8 strikes a good balance between resource efficiency and performance. Consequently, this paper selects YOLOv8 as the baseline model for improvements. The enhanced DCW-YOLO model, with a minor increase in computational overhead compared to YOLOv8, achieves substantial performance enhancements, with its mAP@0.5 reaching 93.7%. Figure 9 shows the mAP values obtained for the different models in the experiment.
From Figure 9, it is clear that when compared to other algorithms, the YOLOv8 algorithm has a considerable edge in terms of mAP value, demonstrating the accuracy and effectiveness of the baseline model chosen in this research. Furthermore, the DCW-YOLO model, based on the improvement of YOLOv8, demonstrated outstanding performance in the experiment of WTB damage detection, further confirming the effectiveness and applicability of the model improvement.
To more vividly demonstrate the detection performance of the DCW-YOLO algorithm proposed in this study compared to other algorithms, several models with similar parameter counts were selected for a visual comparison, including YOLOv5, YOLOv6, YOLOv7-Tiny, YOLOv8, and YOLOv10. The visual detection results are shown in Figure 10. From the visual comparison results, it can be seen that YOLOv8 demonstrates higher detection accuracy and more precise damage localization compared to YOLOv5, YOLOv6, YOLOv7-Tiny, and YOLOv10, with no occurrences of false positives or missed detections. This further confirms the appropriateness of selecting YOLOv8 as the baseline model for this study. Additionally, the DCW-YOLO model, which is an improvement based on YOLOv8, not only enhances detection accuracy but also completely eliminates false positives and missed detections, further validating the reliability and effectiveness of the proposed algorithm.

5. Conclusions

This study proposes a WTB damage detection model called DCW-YOLO, which is based on the one-stage object detection model YOLOv8. To enhance the efficacy of the algorithm in identifying complex small targets within extensive images of WTB damage, three improvements are made. Firstly, dynamic snake convolution is integrated into the C2f module before the three detection heads of the YOLOv8 network model. This advancement permits the model to efficiently detect the elongated and curved characteristics of targets under blur or overlap conditions, enhancing the algorithm’s focus on the geometric structure of the damage in WTB images. Secondly, the upsampling method is replaced with CARAFE, which mitigates the issue of minor damaged targets being ignored during upsampling and successfully minimizes feature loss. Finally, the optimization of the loss function is achieved through the integration of the WIoUv3 strategy, which allows for more accurate bounding box regression by combining a sensible gradient gain allocation technique with a dynamic non-monotonic focusing mechanism. According to the experimental data, the proposed DCW-YOLO model exhibits notable improvements in detection performance metrics when compared to the original YOLOv8 model. Specifically, precision is improved by 3.2%, recall is increased by 1.9%, mAP@0.5 is increased by 2.3%, and mAP@0.5–0.95 is increased by 2.3%. These experimental results indicate the outstanding performance of the DCW-YOLO model in intelligently identifying WTB damage. This model has important research value and practical significance in dealing with overlapping damage and small-area damage in large-scale WTB images under complex lighting conditions, offering dependable technical assistance for real-world implementations in monitoring the health of wind turbines.
However, it must be acknowledged that this study has certain limitations and there is room for further improvement. First, although our method has achieved significant achievements in feature extraction and detection accuracy, the corresponding parameter count and computational load have also slightly increased, which may affect the deployment efficiency of the algorithm on edge devices. Therefore, in future research, we will focus more on exploring the lightweight implementation of the algorithm to optimize its performance in resource-constrained environments. Secondly, subsequent studies will continue to delve into the accurate assessment of damage severity, as accurately determining the extent of blade damage is crucial for maintaining the normal operation of wind turbines. Finally, to enhance the resilience and generalizability of the algorithm, we will continue to refine and adjust it meticulously. Overall, the DCW-YOLO model proposed in this study provides an effective and practical method for detecting WTB damage, offering significant research value and practical significance for monitoring the health of wind turbines.

Author Contributions

Conceptualization, A.C.; methodology, A.C.; software, A.C.; validation, L.Z. and A.C.; formal analysis, L.Z.; investigation, L.Z. and A.C.; resources, A.C.; data curation, L.Z. and A.C.; writing—original draft preparation, A.C.; writing—review and editing, L.Z.; visualization, L.Z. and A.C.; supervision, C.L.; project administration, X.Y. and Y.S.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 52005071 and the Applied Basic Research Program Project of Liaoning Province under Grant 2023JH2/101300236.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dincer, I. Renewable energy and sustainable development: A crucial review. Renew. Sustain. Energy Rev. 2000, 4, 157–175.
2. Global Wind Energy Council. Global Wind Report 2023. Available online: https://gwec.net/globalwindreport2023 (accessed on 6 May 2023).
3. Du, Y.; Zhou, S.; Jing, X.; Peng, Y.; Wu, H.; Kwok, N. Damage detection techniques for wind turbine blades: A review. Mech. Syst. Signal Process. 2020, 141, 106445.
4. Jureczko, M.; Pawlak, M.; Mężyk, A. Optimisation of wind turbine blades. J. Mater. Process. Technol. 2005, 167, 463–471.
5. Mishnaevsky, L., Jr. Root causes and mechanisms of failure of wind turbine blades: Overview. Materials 2022, 15, 2959.
6. Kaewniam, P.; Cao, M.; Alkayem, N.F.; Li, D.; Manoach, E. Recent advances in damage detection of wind turbine blades: A state-of-the-art review. Renew. Sustain. Energy Rev. 2022, 167, 112723.
7. McKenna, R.; vd Leye, P.O.; Fichtner, W. Key challenges and prospects for large wind turbines. Renew. Sustain. Energy Rev. 2016, 53, 1212–1221.
8. Hang, X.; Zhu, X.; Gao, X.; Wang, Y.; Liu, L. Study on crack monitoring method of wind turbine blade based on AI model: Integration of classification, detection, segmentation and fault level evaluation. Renew. Energy 2024, 224, 120152.
9. Liu, J.; Wang, X.; Wu, S.; Wan, L.; Xie, F. Wind turbine fault detection based on deep residual networks. Expert Syst. Appl. 2023, 213, 119102.
10. Zhou, Y.; Cao, R.; Li, P.; Ma, X.; Hu, X.; Li, F. A damage detection system for inner bore of electromagnetic railgun launcher based on deep learning and computer vision. Expert Syst. Appl. 2022, 202, 117351.
11. de Paula Monteiro, R.; Lozada, M.C.; Mendieta, D.R.C.; Loja, R.V.S.; Bastos Filho, C.J.A. A hybrid prototype selection-based deep learning approach for anomaly detection in industrial machines. Expert Syst. Appl. 2022, 204, 117528.
12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149.
13. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
14. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; Part I; Volume 14, pp. 21–37.
15. Ou, Y.; Chatzi, E.N.; Dertimanis, V.K.; Spiridonakos, M.D. Vibration-based experimental damage detection of a small-scale wind turbine blade. Struct. Health Monit. 2017, 16, 79–96.
16. Ghoshal, A.; Sundaresan, M.J.; Schulz, M.J.; Pai, P.F. Structural health monitoring techniques for wind turbine blades. J. Wind Eng. Ind. Aerodyn. 2000, 85, 309–324.
17. Liu, Z.; Zhang, L.; Carrasco, J. Vibration analysis for large-scale wind turbine blade bearing fault detection with an empirical wavelet thresholding method. Renew. Energy 2020, 146, 99–110.
18. Kumpati, R.; Skarka, W.; Ontipuli, S.K. Current trends in integration of nondestructive testing methods for engineered materials testing. Sensors 2021, 21, 6175.
19. Tang, J.; Soua, S.; Mares, C.; Gan, T.H. An experimental study of acoustic emission methodology for in service condition monitoring of wind turbine blades. Renew. Energy 2016, 99, 170–179.
20. Xu, D.; Liu, P.; Chen, Z. Damage mode identification and singular signal detection of composite wind turbine blade using acoustic emission. Compos. Struct. 2021, 255, 112954.
21. Yang, B.; Sun, D. Testing, inspecting and monitoring technologies for wind turbine blades: A survey. Renew. Sustain. Energy Rev. 2013, 22, 515–526.
22. Yang, K.; Rongong, J.A.; Worden, K. Damage detection in a laboratory wind turbine blade using techniques of ultrasonic NDT and SHM. Strain 2018, 54, e12290.
23. Oliveira, M.A.; Simas Filho, E.F.; Albuquerque, M.C.; Santos, Y.T.; Da Silva, I.C.; Farias, C.T. Ultrasound-based identification of damage in wind turbine blades using novelty detection. Ultrasonics 2020, 108, 106166.
24. Mori, M.; Novak, L.; Sekavčnik, M. Measurements on rotating blades using IR thermography. Exp. Therm. Fluid Sci. 2007, 32, 387–396.
25. Sanati, H.; Wood, D.; Sun, Q. Condition monitoring of wind turbine blades using active and passive thermography. Appl. Sci. 2018, 8, 2004.
26. Hwang, S.; An, Y.K.; Yang, J.; Sohn, H. Remote inspection of internal delamination in wind turbine blades using continuous line laser scanning thermography. Int. J. Precis. Eng. Manuf.-Green Technol. 2020, 7, 699–712.
27. Zhu, X.; Hang, X.; Gao, X.; Yang, X.; Xu, Z.; Wang, Y.; Liu, H. Research on crack detection method of wind turbine blade based on a deep learning method. Appl. Energy 2022, 328, 120241.
28. Guo, S.; Tian, H.; Wang, W. A Method for Solving Path Planning Problems in Complex Scenarios. Comput. Technol. Dev. 2022, 32, 27–33.
29. Guo, J.; Liu, C.; Cao, J.; Jiang, D. Damage identification of wind turbine blades with deep convolutional neural networks. Renew. Energy 2021, 174, 122–133.
30. Yang, X.; Zhang, Y.; Lv, W.; Wang, D. Image recognition of wind turbine blade damage based on a deep learning model with transfer learning and an ensemble learning classifier. Renew. Energy 2021, 163, 386–397.
31. Lv, L.; Yao, Z.; Wang, E.; Ren, X.; Pang, R.; Wang, H.; Zhang, Y.; Wu, H. Efficient and Accurate Damage Detector for Wind Turbine Blade Images. IEEE Access 2022, 10, 123378–123386.
32. Zhang, Y.; Yang, Y.; Sun, J.; Ji, R.; Zhang, P.; Shan, H. Surface Defect Detection of Wind Turbine Based on Lightweight YOLOv5s Model. Measurement 2023, 220, 113222.
33. Ran, X.; Zhang, S.; Wang, H.; Zhang, Z. An Improved Algorithm for Wind Turbine Blade Defect Detection. IEEE Access 2022, 10, 122171–122181.
34. Hu, Y.; Wang, L.; Kou, T.; Zhang, M. YOLO-Tiny-attention: An Improved Algorithm for Fault Detection of Wind Turbine Blade. In Proceedings of the 2023 8th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 21–23 April 2023; pp. 1228–1232.
35. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
36. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-YOLOv4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Event, 19–25 June 2021; pp. 13029–13038.
37. Liu, Y.; Yang, F.; Hu, P. Parallel FPN algorithm based on cascade R-CNN for object detection from UAV aerial images. Laser Optoelectron. Prog. 2020, 57, 201505.
38. Wang, Y. Symposium Title: The Fronto-Parietal Network (FPN): Supporting a Top-Down Control of Executive Functioning. Int. J. Psychophysiol. 2021, 168, S39.
39. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
40. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012.
41. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
42. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada (Virtual Event), 11–17 October 2021; pp. 3490–3499.
43. Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6070–6079.
44. Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3007–3016.
45. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051.
46. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
47. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616.
48. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
49. Lv, W.; Xu, S.; Zhao, Y.; Wang, G.; Wei, J.; Cui, C.; Du, Y.; Dang, Q.; Liu, Y. DETRs beat YOLOs on real-time object detection. arXiv 2023, arXiv:2304.08069.
Figure 1. The structure of the YOLOv8 model.
Figure 2. The structure of DSConv.
Figure 3. Iteration of DSConv: (a) schematic of the coordinate calculation in DSConv; (b) the receptive field of DSConv.
Figure 4. The overall architecture of CARAFE.
Figure 5. The structure of the DCW-YOLO model.
Figure 6. Dataset samples of WTBs: (a,b) are peeling samples. (c,d) are crack samples.
Figure 7. mAP of the model: (a) the mAP of the original YOLOv8; (b) the mAP of the DCW-YOLO.
Figure 8. Detection performance of the model: (a) the detection performance of YOLOv8; (b) the detection performance of DCW-YOLO.
Figure 9. mAP values for different models.
Figure 10. WTB damage detection results: (a) YOLOv5; (b) YOLOv6; (c) YOLOv7-Tiny; (d) YOLOv8; (e) YOLOv10; (f) DCW-YOLO.
Table 1. Impact of different component modules.
DSConv | CARAFE | WIoUv3 | P | R | mAP@0.5 | mAP@0.5–0.95 | Params (M) | GFLOPs (G)
 |  |  | 0.936 | 0.871 | 0.914 | 0.689 | 3.01 | 8.1
 |  |  | 0.943 | 0.876 | 0.927 | 0.692 | 3.31 | 8.5
 |  |  | 0.944 | 0.882 | 0.923 | 0.698 | 3.14 | 8.6
 |  |  | 0.938 | 0.875 | 0.925 | 0.690 | 3.01 | 8.1
 |  |  | 0.949 | 0.887 | 0.925 | 0.704 | 3.49 | 9.0
 |  |  | 0.947 | 0.884 | 0.924 | 0.701 | 3.14 | 8.6
 |  |  | 0.945 | 0.877 | 0.931 | 0.697 | 3.36 | 8.5
 |  |  | 0.968 | 0.890 | 0.937 | 0.712 | 3.35 | 8.5
Table 2. The impact of different numbers of input images on the model.
Input Images | P | R | mAP@0.5 | mAP@0.5–0.95 | Training Time (Hours)
400 | 0.939 | 0.704 | 0.782 | 0.508 | 0.233
600 | 0.936 | 0.871 | 0.914 | 0.689 | 0.676
800 | 0.918 | 0.826 | 0.893 | 0.635 | 0.727
Table 3. Comparison of object detection models.
Method | P | R | mAP@0.5 | mAP@0.5–0.95 | Params (M) | GFLOPs (G)
Faster R-CNN [12] | 0.903 | 0.807 | 0.813 | 0.560 | 44.68 | 147.6
SSD [14] | 0.899 | 0.796 | 0.805 | 0.547 | 22.85 | 27.7
YOLOv5 | 0.929 | 0.797 | 0.862 | 0.593 | 2.51 | 7.1
YOLOv6 [46] | 0.897 | 0.747 | 0.824 | 0.566 | 4.23 | 11.8
YOLOv7-Tiny [35] | 0.911 | 0.799 | 0.856 | 0.544 | 6.02 | 13.2
RT-DETR [49] | 0.801 | 0.741 | 0.804 | 0.541 | 31.99 | 103.4
YOLOv8 | 0.936 | 0.871 | 0.914 | 0.689 | 3.01 | 8.1
YOLOv9 [47] | 0.954 | 0.842 | 0.902 | 0.668 | 25.32 | 102.3
YOLOv10 [48] | 0.874 | 0.736 | 0.831 | 0.572 | 2.69 | 8.2
DCW-YOLO | 0.968 | 0.890 | 0.937 | 0.712 | 3.35 | 8.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
