Article

Multi-Object Detection Algorithm in Wind Turbine Nacelles Based on Improved YOLOX-Nano

Chunsheng Hu, Yong Zhao, Fangjuan Cheng and Zhiping Li
1 School of Mechanical Engineering, Ningxia University, Yinchuan 750000, China
2 School of Advanced Interdisciplinary Studies, Ningxia University, Zhongwei 755000, China
* Author to whom correspondence should be addressed.
Energies 2023, 16(3), 1082; https://doi.org/10.3390/en16031082
Submission received: 24 November 2022 / Revised: 11 January 2023 / Accepted: 12 January 2023 / Published: 18 January 2023
(This article belongs to the Special Issue Wind Turbine Structural Control and Health Monitoring)

Abstract

With more and more wind turbines coming into operation, inspecting wind farms has become a challenging task. Inspection robots are now used to monitor essential components inside the wind turbine nacelle, and detecting multiple objects in the nacelle is a prerequisite for this condition monitoring. In this paper, we improve the original YOLOX-Nano model, motivated by the short time the inspection robot has to monitor each inspected object and by the slow inference speed of the original YOLOX-Nano. Both the accuracy and the inference speed of the improved model are enhanced; in particular, its inference speed increases by 72.8%, and it performs better than other lightweight network models on embedded devices. The improved YOLOX-Nano thus satisfies the need for a high-precision, low-latency algorithm for multi-object detection in wind turbine nacelles.

1. Introduction

Wind turbines are the main equipment for converting natural wind energy into electricity [1]. In recent years, as large numbers of wind turbines have been put into operation and the wind power industry has matured, the inspection and maintenance of wind turbines have become a challenge. Wind farms are generally located in remote areas, with scattered equipment and inconvenient transportation [2]. Manual inspection of wind farms is not only time- and labour-consuming but also cannot detect some abnormal conditions of wind turbines in time. There is also a risk of working at height when staff enter the wind turbine nacelle for inspection. In contrast, an inspection robot can work continuously around the clock and can inspect several important parts of the wind turbine nacelle in real time (e.g., checking the level of grease in the generator oil box, whether the oil in the waste oil tray under the hydraulic station has overflowed, and whether the water pump has leaked). Abnormalities of the equipment in the nacelle can thus be found in time and reported to maintenance personnel for prompt treatment, which prolongs the unit’s service life and saves manpower. Therefore, studying the detection of multiple objects in the wind turbine nacelle is not only a prerequisite for the condition monitoring of vital parts of the nacelle by the inspection robot but also of great significance for improving the robot’s inspection efficiency.
The development of machine learning has provided new ideas for object detection algorithms. Deng et al. employed a classifier based on the Histogram of Oriented Gradients (HOG) and a Support Vector Machine (SVM) to identify and classify defect types on wind turbine blades, effectively improving the identification accuracy of scratch-type, crack-type, sand-hole-type, and speckle-type defects [3]. Abedini et al. combined the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), Brute-Force, and Fast Library for Approximate Nearest Neighbors (FLANN) methods to detect wind turbine towers, achieving a detection accuracy of 0.894 [4]. Zhu et al. used a semi-supervised method based on anomaly detection to detect internal defects in aluminum conductor composite core (ACCC) wires, achieving a detection accuracy of 0.761 [5]. Although these methods improve detection accuracy to different degrees, the final accuracies lie roughly between 0.75 and 0.92, which is too low to meet the requirements of multi-object detection in the wind turbine nacelle. Moreover, these methods have poor robustness: they apply only to specific environments and are unsuitable for the complicated environment of wind turbine nacelles.
With the development of computer technology, deep learning-based object detection algorithms have been widely used in various fields in recent years [6,7]. Current deep learning-based object detection algorithms fall into two main categories: two-stage object detection algorithms such as the Region-based Convolutional Neural Network (R-CNN) [8], the Fast Region-based Convolutional Neural Network (Fast R-CNN) [9], and the Faster Region-based Convolutional Neural Network (Faster R-CNN) [10]; and one-stage object detection algorithms such as the You Only Look Once (YOLO) series [11,12,13,14] and the Single Shot MultiBox Detector (SSD) [15]. Two-stage object detection algorithms have high detection accuracy but slow inference. In contrast, one-stage algorithms offer much faster inference at comparable detection accuracy. Ran et al. achieved 58% detection accuracy at 4.9 frame s−1 using Faster R-CNN, a two-stage algorithm, to detect defects in wind turbine blades, versus 75.6% accuracy at 35.8 frame s−1 using YOLOv3, a one-stage algorithm [16]. Hu et al. obtained 83.32% detection accuracy at 3.2 frame s−1 using Faster R-CNN to detect fastener defects on high-speed railways, versus 82.34% accuracy at 47.27 frame s−1 using YOLOv5, a one-stage algorithm [17]. Therefore, a one-stage object detection algorithm is more suitable for detecting the multiple objects (oil box, pump, oil pan) in the wind turbine nacelle.
Ge et al. [18] proposed YOLOX, a one-stage object detection algorithm suited to industrial deployment, which incorporates the advantages of the YOLO family of networks, offers high detection accuracy, and has been widely applied. Yi et al. effectively improved the accuracy and speed of this algorithm for strip steel surface defect detection by enhancing the feature extraction layer and feature pyramid network of the YOLOX model [19]. Wu et al. proposed an improved YOLOX-TR model based on the Transformer encoder and structurally reparameterized VGG (RepVGG) blocks to achieve end-to-end tank detection and classification in dense regions of large-scale synthetic aperture radar (SAR) images [20]. Ru et al. presented a lightweight ECA-YOLOX-Tiny unmanned aerial vehicle inspection model by embedding the efficient channel attention (ECA) module into the lightweight YOLOX-Tiny model, effectively improving the localization accuracy of self-detonation areas on defective insulators [21]. All these methods are designed to detect objects of widely varying sizes and, in particular, to detect small objects with high accuracy. However, as shown in Figure 1, the space inside the wind turbine nacelle is small, and the detected objects occupy a relatively large portion of the inspection robot’s monitoring image, so the probability of small objects appearing in multi-object detection in the nacelle is extremely low. In addition, the robot’s monitoring time for each detected object in the nacelle is short, leaving little time to check the status of multiple objects. When the original YOLOX algorithm is used to detect multiple objects in the nacelle, its detection accuracy is high but its inference is slow. The layer structure of the original YOLOX for detecting large and medium objects is sufficient for detecting the objects in the nacelle, whereas the layer structure for detecting small objects contributes little accuracy, adds redundant computation during inference, and lengthens inference time, making it unsuitable for multi-object detection in the wind turbine nacelle.
An algorithm with high accuracy, low latency, and the capability of being deployed on embedded devices is needed for multi-object detection in wind turbine nacelles. Therefore, this study proposes a multi-object detection algorithm for wind turbine nacelles based on an improved YOLOX-Nano. The main contributions of this paper are as follows:
  • The feature extraction layer of YOLOX-Nano is replaced by CSPDarkNet-Tiny, the backbone network of YOLOv4-Tiny, to improve the speed of feature extraction for images.
  • The detection layer structure associated with the 8× downsampling rate feature layer in YOLOX-Nano is removed to reduce the model computation and speed up the inference.
  • The CSPLayer module of the YOLOX-Nano feature pyramid layer is replaced by the CSPBlock module to speed up feature fusion across feature layers with different downsampling rates.

2. Models and Datasets

2.1. Dataset

The dataset of multiple objects in the nacelle was collected on a wind farm in Northwest China. It was gathered from multiple wind turbine nacelles by an orbital inspection robot inside the nacelle using Hikvision surveillance cameras at different times, locations, and lighting conditions. The dataset contains 5000 RGB images with resolutions of 1920 × 1080 and 480 × 270. Figure 2a shows the hydraulic station in the wind turbine nacelle, where the inspection robot must determine whether the waste oil tray under the hydraulic station is full; Figure 2b shows the pump, where the robot must judge whether the pump is leaking water; and Figure 2c shows the oil box that supplies grease to the generator, where the robot must check whether there is grease in the box. Detecting these components is a prerequisite for the condition monitoring of the oil pan, oil box, and water pump in the nacelle by the inspection robot.
The LabelImg annotation tool was used to annotate the 5000 collected images, of which 3000 were randomly assigned to training and 2000 to testing. Models of different sizes have different learning abilities: for large models with strong learning ability, strong data augmentation can effectively improve generalization, whereas for small models with weak learning ability, strong augmentation prevents the model from fitting well. All the work in this paper builds on the lightweight YOLOX-Nano model. Therefore, only weak data augmentation methods, namely random horizontal flipping, random scaling, random translation, and random Hue, Saturation, Value (HSV) jitter, are used during training to enhance generalization while preserving the model’s fitting ability.
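The following is a minimal sketch of such a weak augmentation pipeline, written with OpenCV and NumPy. The flip probability, scale/translation ranges, and HSV gains are illustrative assumptions (the paper does not list exact values), and in a real detection pipeline the bounding boxes would have to undergo the same geometric transforms.

```python
import random

import cv2
import numpy as np

def weak_augment(image, hgain=0.015, sgain=0.7, vgain=0.4):
    """Weak augmentations: random horizontal flip, random scale and
    translation, and random HSV jitter. Parameter values here are
    hypothetical; boxes must be transformed alongside the image."""
    h, w = image.shape[:2]

    # Random horizontal flip
    if random.random() < 0.5:
        image = cv2.flip(image, 1)

    # Random scale and translation via a single affine warp
    scale = random.uniform(0.75, 1.25)
    tx = random.uniform(-0.1, 0.1) * w
    ty = random.uniform(-0.1, 0.1) * h
    M = np.array([[scale, 0.0, tx], [0.0, scale, ty]], dtype=np.float32)
    image = cv2.warpAffine(image, M, (w, h), borderValue=(114, 114, 114))

    # Random HSV jitter
    r = np.random.uniform(-1.0, 1.0, 3) * [hgain, sgain, vgain] + 1.0
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] * r[0]) % 180.0        # hue wraps around
    hsv[..., 1] = np.clip(hsv[..., 1] * r[1], 0, 255)  # saturation
    hsv[..., 2] = np.clip(hsv[..., 2] * r[2], 0, 255)  # value
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```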

2.2. YOLOX-Nano Network Model

YOLOX, as a current mainstream one-stage object detection algorithm, builds on YOLOv3 and incorporates the advantages of the previous YOLO series networks. It also introduces the decoupled head, the Simplified Optimal Transport Assignment (SimOTA) dynamic sample matching scheme, and an anchor-free detection method, achieving improvements in both detection accuracy and speed. YOLOX is available in various model sizes (e.g., YOLOX-s, YOLOX-m, YOLOX-l, YOLOX-x, YOLOX-DarkNet53, YOLOX-Tiny, and YOLOX-Nano).
YOLOX-Nano, one of the lightest YOLOX models, is divided into a backbone network, a neck module, and a head module; its network structure is shown in Figure 3. The backbone network is mainly employed to extract image feature information. Its design is based on the CSPDarkNet backbone used in YOLOv5 [22], with part of the regular convolutions replaced by depthwise separable convolutions (DWConv) to reduce the number of network parameters. It also adopts the Spatial Pyramid Pooling-Fast (SPPF) structure proposed by the YOLOv5 authors (Glenn Jocher et al.) to fuse local and global feature information of the image, which enriches the feature representation capability of the backbone network.
The neck module consists of Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) [23,24]. The top-down semantic information of the FPN is laterally connected with the bottom-up semantic information of the PAN to achieve the fusion of deep semantic information with shallow semantic information, ensuring the output of the neck module for high-resolution, strongly semantic multi-layer feature information.
Whereas the head modules of previous YOLO series models used a single convolutional module to predict both the bounding box regression and the category, the head module of YOLOX-Nano uses the decoupled head proposed by YOLOX, in which separate convolutional branches predict the category and the bounding box regression, effectively improving the convergence speed and accuracy of the network model.
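To illustrate the idea, here is a minimal PyTorch sketch of a YOLOX-style decoupled head for one feature level; the channel widths and activation choice are illustrative assumptions, not the exact configuration of YOLOX-Nano.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """YOLOX-style decoupled head: separate convolutional branches
    predict class scores and box regression/objectness instead of a
    single coupled convolution (channel widths are illustrative)."""
    def __init__(self, in_ch, num_classes, width=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, width, 1)
        self.cls_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, num_classes, 1),   # class logits per location
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
        )
        self.reg_pred = nn.Conv2d(width, 4, 1)  # box offsets (x, y, w, h)
        self.obj_pred = nn.Conv2d(width, 1, 1)  # objectness score

    def forward(self, x):
        x = self.stem(x)
        reg_feat = self.reg_branch(x)
        # Concatenate box, objectness, and class predictions per location
        return torch.cat(
            [self.reg_pred(reg_feat), self.obj_pred(reg_feat),
             self.cls_branch(x)], dim=1)
```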

2.3. Backbone Improvements

YOLOv4-Tiny is the lightest model in the YOLOv4 series. Its ultra-fast detection speed is widely recognized, making it suitable for devices with tight computing resources. Its backbone network, CSPDarkNet-Tiny, consists mainly of CSPBlock modules and Max Pooling layers. The CSPBlock module divides the feature map into two parts and then recombines them through a cross-stage residual edge. This lets the gradient flow propagate along two different network paths, increasing the diversity of gradient information and the learning ability of the convolutional network, while also reducing computational cost and speeding up image feature extraction to some extent [25,26]. In this study, to improve the inference speed of the network model without introducing too many parameters or reducing accuracy, we replace the backbone of the original YOLOX-Nano with the CSPDarkNet-Tiny backbone of YOLOv4-Tiny while halving the number of output channels of all its convolutions. The improved backbone network is shown in Figure 4.
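The following PyTorch sketch conveys the CSPBlock idea described above: a channel split, two convolutions on one half, and a cross-stage concatenation of both gradient paths. The channel widths here simply preserve the input count for clarity; the exact wiring and channel growth of YOLOv4-Tiny follow [26].

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=3):
    """Convolution + BatchNorm + LeakyReLU, the basic CSPDarkNet-Tiny unit."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class CSPBlock(nn.Module):
    """Sketch of the CSPBlock: split the feature map along the channel
    axis, transform one half with two convolutions, and concatenate
    both gradient paths via the cross-stage residual edge."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.conv1 = conv_bn_act(half, half)
        self.conv2 = conv_bn_act(half, half)
        self.transition = conv_bn_act(channels, half, k=1)

    def forward(self, x):
        shortcut, part = torch.chunk(x, 2, dim=1)    # two gradient paths
        y1 = self.conv1(part)
        y2 = self.conv2(y1)
        merged = self.transition(torch.cat([y1, y2], dim=1))
        return torch.cat([shortcut, merged], dim=1)  # cross-stage edge
```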

2.4. Neck Module and Head Module Improvements

The original YOLOX-Nano detects objects of different sizes on the 8×, 16×, and 32× downsampling rate feature layers of the network. The receptive field of an individual feature point on the 8× downsampling rate feature layer is small, so this layer focuses on local feature information and small objects in the image. The receptive field of a feature point on the 32× downsampling rate feature layer is large, so that layer focuses on global information and large objects, while the 16× downsampling rate feature layer lies in between. The 8× layer of the original YOLOX-Nano is mainly used to detect small objects: its focus on local feature information helps the bounding box regression of small objects, but it is ineffective for large and medium objects, whose bounding box regression is difficult at that scale. Large and medium objects are mainly detected on the 32× and 16× downsampling rate feature layers.
The space inside the wind turbine nacelle is small, and the detected objects occupy a relatively large proportion of the pictures taken by the inspection robot, so the probability of small objects appearing in multi-object detection inside the nacelle is extremely low. We therefore reduce the computation of the network model and speed up its inference by removing the layer structure associated with the 8× downsampling rate feature layer, so that the multiple objects in the nacelle are detected only through the layer structures of the 16× and 32× downsampling rate feature layers. Specifically, we remove the upsampling layer within the FPN for the 16× downsampling rate feature layer and its associated layer structure, and further remove the detection head that mainly detects small objects; the model then predicts the multiple objects in the nacelle on the 16× and 32× downsampling rate feature layers. To speed up the fusion of features across feature layers with different downsampling rates, we replace the CSPLayer module that fuses multi-scale feature information within the FPN with the CSPBlock module, and add a convolution layer after the CSPBlock to integrate the number of its output channels. The improved network model is shown in Figure 5: the backbone network extracts feature information at the 16× and 32× downsampling rates, this information is fed into the FPN module for feature fusion, and the fused features are finally processed by the detection head to perform classification and object box regression for the multiple objects in the wind turbine nacelle.
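A minimal sketch of this simplified two-level neck is shown below, assuming the CSPBlock class from the sketch in Section 2.3 is in scope; the channel counts are illustrative assumptions rather than the model’s actual widths.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelNeck(nn.Module):
    """Sketch of the simplified neck: only the 16x and 32x feature maps
    are kept, the 32x map is upsampled and fused into the 16x map by a
    CSPBlock (see the Section 2.3 sketch), and an added 1x1 convolution
    integrates the output channels. Channel counts are illustrative."""
    def __init__(self, c16=96, c32=192):
        super().__init__()
        self.lateral = nn.Conv2d(c32, c16, 1)        # shrink 32x channels
        self.fuse = CSPBlock(c16 * 2)                # fuse the two scales
        self.integrate = nn.Conv2d(c16 * 2, c16, 1)  # channel-integration conv

    def forward(self, f16, f32):
        # Upsample the 32x map to the 16x resolution and fuse
        top = F.interpolate(self.lateral(f32), scale_factor=2, mode="nearest")
        p16 = self.integrate(self.fuse(torch.cat([top, f16], dim=1)))
        return p16, f32  # detection heads run on the 16x and 32x outputs
```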

3. Experiment

The experimental environment of this study was built on the Windows 10 operating system, and the model was built, trained, and tested on the same PC. The computer configuration is as follows: the CPU is an 11th Gen Intel Core i5-11400H with 16 GB of RAM, the GPU is an NVIDIA GeForce GTX 3050 Laptop (8 GB), the Python version is 3.7, and the deep learning framework is PyTorch 1.10.0. The GPU’s general-purpose parallel computing architecture is CUDA 11.3, and the deep learning acceleration library is cuDNN 8.2.
In the field of object detection, precision, recall, and mean average precision (mAP) are usually used as model performance evaluation metrics. Precision is the percentage of predicted positive samples that are truly positive. Recall is the percentage of positive samples that are correctly predicted as positive. The AP of a single category is the area under its P–R curve, with recall and precision as the horizontal and vertical coordinates, respectively, and mAP denotes the average of the AP values over all categories. mAP@0.75 is the mAP obtained when a prediction counts as a positive sample only if the Intersection over Union (IoU) between the prediction box and the ground-truth box is greater than 0.75. mAP@0.5:0.95 is obtained by computing the mAP at IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averaging over all steps. In this study, mAP@0.75, mAP@0.5:0.95, and Frames Per Second (FPS) are used as the evaluation metrics of the model. Precision, recall, and mAP are calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\int_{0}^{1} P \, \mathrm{d}R$$
where TP stands for the number of correctly predicted positive samples, FP denotes the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative. N, P, and R represent the number of categories, the precision value, and the recall value, respectively.
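In practice, the AP integral above is evaluated as a discrete area under a monotone envelope of the P–R curve. The following is a generic NumPy sketch of that all-point interpolation (not the authors’ evaluation code):

```python
import numpy as np

def average_precision(precisions, recalls):
    """Area under the P-R curve via all-point interpolation; `recalls`
    must be sorted in ascending order with matching `precisions`."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make precision monotonically non-increasing from right to left
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas wherever recall changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(per_class_pr):
    """mAP = mean of per-class AP, matching the formula above;
    `per_class_pr` is a list of (precisions, recalls) pairs."""
    return float(np.mean([average_precision(p, r) for p, r in per_class_pr]))
```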
In this study, we design two sets of experiments. The first set performs ablation experiments to determine the effectiveness of each improvement to the model. The second set comprehensively analyzes the performance of the improved YOLOX-Nano model by comparing its experimental results with those of several current lightweight models. All models trained in this study use the same training parameters: the batch size is 8, the input image size is 416 × 416, the number of training iterations is set to 100, and the initial learning rate is fixed at 0.00125.
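For reference, these shared settings can be summarized as a configuration dictionary; the optimizer and device entries below are assumptions (SGD is YOLOX’s default optimizer) and are not stated in the text.

```python
# Hypothetical summary of the shared training setup described above.
train_cfg = dict(
    batch_size=8,
    input_size=(416, 416),
    num_iterations=100,       # "number of iterations" as stated in the text
    initial_lr=0.00125,
    optimizer="SGD",          # assumption: YOLOX's default, not stated here
    framework="PyTorch 1.10.0",
    device="cuda",            # CUDA 11.3 / cuDNN 8.2 per the setup above
)
```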

3.1. Model Ablation Experiments

In this study, we verify the effectiveness of the improvements for the original YOLOX-Nano by ablation experiments, as shown in Table 1. YOLOX-Nano-A represents the model after replacing the backbone network of the original YOLOX-Nano with CSPDarkNet-Tiny. YOLOX-Nano-B is the model after removing the correlation layer structure of the 8× downsampling rate feature layer from the original YOLOX-Nano. YOLOX-Nano-C is the model after removing the correlation layer structure of the 8× downsampling rate feature layer from the YOLOX-Nano-A. Improved YOLOX-Nano is the model after changing the CSPLayer in the FPN structure of YOLOX-Nano-C to CSPBlock and adding a convolutional layer after it to integrate the channels. These models are trained and tested with the same dataset, equipment, and training parameters.
Comparing the experimental results of YOLOX-Nano and YOLOX-Nano-A shows that replacing the backbone network of the original YOLOX-Nano with CSPDarkNet-Tiny and halving the output channels of all convolutional layers within CSPDarkNet-Tiny improves the model’s mAP@0.5:0.95 and mAP@0.75 by 0.12% and 0.18%, respectively, and raises the FPS from 81 frame s−1 to 84 frame s−1. These results show that the backbone improvement yields gains in both accuracy and inference speed. After further removing the layer structure associated with the 8× downsampling rate feature layer from YOLOX-Nano-A, the mAP@0.5:0.95 of YOLOX-Nano-C improved by 0.04%, its mAP@0.75 decreased by 0.01%, and its FPS rose from 84 frame s−1 to 117 frame s−1: the accuracy of the model is almost unchanged, but the inference speed improves by 33 frame s−1. After removing the layer structure associated with the 8× downsampling rate feature layer from the original YOLOX-Nano, YOLOX-Nano-B shows decreases of 0.12% and 0.19% in mAP@0.5:0.95 and mAP@0.75, respectively, and a 17 frame s−1 improvement in FPS compared with the original model. Within the FPN layer of the original YOLOX-Nano, the global information and local features of the 32×, 16×, and 8× downsampling rate feature layers are fused and output; the 32× and 16× layers fuse the local feature information captured by the 8× layer, which can effectively improve the detection accuracy of the model. Therefore, once the original YOLOX-Nano loses the layer structure associated with the 8× downsampling rate feature layer, its detection accuracy decreases while its parameters are reduced and its inference is accelerated. Comparing the original YOLOX-Nano, YOLOX-Nano-B, and YOLOX-Nano-C shows that both the detection accuracy and the speed of YOLOX-Nano-C are higher than those of the former two.
Therefore, compared with YOLOX-Nano-B, which directly removes the layer structure associated with the 8× downsampling rate feature layer from YOLOX-Nano, YOLOX-Nano-C, which replaces the backbone network before removing that layer structure, is more beneficial to both the detection accuracy and the speed of the model. The mAP@0.5:0.95, mAP@0.75, and FPS of the improved YOLOX-Nano improve by 0.14%, 0.14%, and 23 frame s−1, respectively, over the YOLOX-Nano-C model. Therefore, replacing the CSPLayer in the FPN with a CSPBlock and adding a convolutional layer after the CSPBlock to integrate the channels effectively improves the model’s accuracy and speed; in particular, the inference speed improves by 19.66%.
From Table 1, we can see that the mAP@0.5:0.95 and FPS of the improved YOLOX-Nano improve by 0.44% and 72.8%, respectively, over the original YOLOX-Nano. This verifies that the improvements to YOLOX-Nano effectively raise the model’s detection accuracy and inference speed, the latter dramatically. The low latency of the improved YOLOX-Nano algorithm effectively improves the efficiency with which the inspection robot checks the status of multiple objects in the nacelle within a short period. Although the number of parameters of the improved YOLOX-Nano increases by 2.03 M compared with the original, the improved model can still be deployed on embedded devices, and the drawback of the increased parameter count is far outweighed by the speed advantage.
To compare the detection accuracy of the improved and original YOLOX-Nano, three images randomly collected by the inspection robot were processed by each network model. The detection results are shown in Figure 6: (a) shows the detection results of the original YOLOX-Nano, and (b) shows those of the improved YOLOX-Nano. In the first comparison, the confidence of the original YOLOX-Nano for the oil box was only 0.21, while that of the improved YOLOX-Nano was 0.84. In the second comparison, the confidence of the original YOLOX-Nano for the oil pan was only 0.12, whereas that of the improved YOLOX-Nano was 0.63. In the third comparison, the confidences of the original and improved models for the pump were 0.82 and 0.88, respectively. These three comparisons show that the detection accuracy of the improved YOLOX-Nano model is higher than that of the original model in actual detection, demonstrating the effectiveness of the improvements.

3.2. Model Comparison

To verify the effectiveness of the improved YOLOX-Nano model, we compared its experimental results with those of the original YOLOX-Nano, YOLOv4-Tiny, and YOLOX-Tiny on the same datasets and devices, as shown in Table 2. The improved YOLOX-Nano has higher detection accuracy than the other models, fewer parameters than YOLOX-Tiny and YOLOv4-Tiny, and a faster detection speed than YOLOX-Nano and YOLOX-Tiny. Although it has more parameters than the original YOLOX-Nano, this does not prevent its deployment on embedded devices. The original YOLOX-Nano and YOLOX-Tiny have high detection accuracy but slow inference; YOLOv4-Tiny has fast detection but low accuracy. The improved YOLOX-Nano combines high detection accuracy with fast detection speed, and its relatively small parameter count allows deployment on embedded devices. Collectively, its overall performance is better than that of the other lightweight network models.
To verify the performance of the improved YOLOX-Nano on an embedded device, we compared the results of the original YOLOX-Nano, YOLOv4-Tiny, YOLOX-Tiny, and the improved YOLOX-Nano when deployed on the Jetson Nano, an embedded device from NVIDIA. The hardware is a Jetson Nano B01 (4 GB) running JetPack 4.6. As shown in Figure 7, the FPS of the original YOLOX-Nano, YOLOX-Tiny, YOLOv4-Tiny, and the improved YOLOX-Nano are 8.96 frame s−1, 10.53 frame s−1, 14.45 frame s−1, and 17.18 frame s−1, respectively. The improved YOLOX-Nano shows a 91.74% improvement in detection speed over the original YOLOX-Nano and outperforms the other lightweight models on the Jetson Nano. Ma et al. noted that the inference speed of a network model depends not only on the number of network parameters but also on the frequency of device memory access and the characteristics of the hardware platform [27]. The original YOLOX-Nano has only 0.88 M parameters, but its use of depthwise separable convolutions increases memory access frequency to some extent, leading to lower inference speed than the other lightweight models. The larger parameter counts of YOLOX-Tiny and YOLOv4-Tiny result in lower inference speed than the improved YOLOX-Nano when deployed on embedded devices with limited computational resources. The improved YOLOX-Nano maintains fewer parameters than YOLOX-Tiny and YOLOv4-Tiny without introducing too many depthwise separable convolutions, so when deployed on the Jetson Nano it achieves a faster detection speed than the other lightweight network models. On the whole, the improved YOLOX-Nano performs better on embedded devices and is more suitable for deployment on them.
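For context, FPS figures of this kind are typically obtained by timing repeated single-image inference after a warm-up phase. The following is a generic PyTorch measurement sketch (not the authors’ benchmarking script; the warm-up and iteration counts are arbitrary choices):

```python
import time

import torch

@torch.no_grad()
def measure_fps(model, input_size=(416, 416), warmup=20, iters=200):
    """Average single-image inference throughput (frames per second)
    after warm-up; synchronization ensures accurate GPU timing."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(1, 3, *input_size, device=device)
    for _ in range(warmup):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```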

4. Conclusions

Multi-object detection in the wind turbine nacelle requires algorithms with high accuracy and low latency, yet the original YOLOX-Nano suffers from slow inference. In this paper, an improved YOLOX-Nano algorithm is proposed. It speeds up image feature extraction by replacing the original YOLOX-Nano backbone with CSPDarkNet-Tiny, the backbone of YOLOv4-Tiny. Because the probability of small objects appearing in multi-object detection inside the nacelle is extremely low, we remove the detection layer structure associated with the 8× downsampling feature layer to reduce the model’s parameters and speed up inference. Additionally, the CSPLayer of the FPN module is replaced by a CSPBlock to speed up the fusion of feature information at different scales and achieve faster inference. The experimental results show that, compared with the original YOLOX-Nano, the improved model’s mAP@0.5:0.95 and mAP@0.75 improve by 0.44% and 0.34%, respectively, and its FPS improves by 72.8%. The improved YOLOX-Nano performs better on both the PC and embedded devices than the lightweight models YOLOv4-Tiny and YOLOX-Tiny. In general, the improvements let the model maintain high detection accuracy while greatly increasing inference speed, and it outperforms other lightweight network models on embedded devices. It satisfies the demand for high-accuracy, low-latency algorithms for multi-object detection in wind turbine nacelles and solves the problem of the original YOLOX-Nano’s slow inference. As a next step, we will prune and quantize the improved YOLOX-Nano to reduce its parameters and further improve its performance when deployed on embedded devices with limited computational resources.

Author Contributions

Methodology, C.H.; Data curation, Z.L.; Writing—original draft, Y.Z.; Writing—review & editing, F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gao, R.; Wang, T. Motion Deblurring Algorithm for Wind Power Inspection Images Based on Ghostnet and SE Attention Mechanism. IET Image Process. 2022, 17, 291–300.
  2. Wang, W.; Xue, Y.; He, C.; Zhao, Y. Review of the Typical Damage and Damage-Detection Methods of Large Wind Turbine Blades. Energies 2022, 15, 5672.
  3. Deng, L.; Guo, Y.; Chai, B. Defect Detection on a Wind Turbine Blade Based on Digital Image Processing. Processes 2021, 9, 1452.
  4. Abedini, F.; Bahaghighat, M.; S’hoyan, M. Wind Turbine Tower Detection Using Feature Descriptors and Deep Learning. Facta Univ.—Ser. Electron. Energetics 2020, 33, 133–153.
  5. Zhu, Y.; Chen, D.; Yang, L.; Yuan, G.; Wei, R.; Hu, Y. Defect Detection of Aluminum Conductor Composite Core (ACCC) Wires Based on Semi-Supervised Anomaly Detection. Energy Rep. 2021, 7, 183–189.
  6. Li, X.; Wang, W.; Sun, L.; Hu, B.; Zhu, L.; Zhang, J. Deep Learning-Based Defects Detection of Certain Aero-Engine Blades and Vanes with DDSC-YOLOv5s. Sci. Rep. 2022, 12, 13067.
  7. Wang, J.; Gao, Z.; Zhang, Y.; Zhou, J.; Wu, J.; Li, P. Real-Time Detection and Location of Potted Flowers Based on a ZED Camera and a YOLO V4-Tiny Deep Learning Algorithm. Horticulturae 2021, 8, 21.
  8. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013.
  9. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA; pp. 1440–1448.
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  11. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA; pp. 779–788.
  12. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA; pp. 6517–6525.
  13. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
  14. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
  15. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland; pp. 21–37.
  16. Ran, X.; Zhang, S.; Wang, H.; Zhang, Z. An Improved Algorithm for Wind Turbine Blade Defect Detection. IEEE Access 2022, 10, 122171–122181.
  17. Hu, J.; Qiao, P.; Lv, H.; Yang, L.; Ouyang, A.; He, Y.; Liu, Y. High Speed Railway Fastener Defect Detection by Using Improved YoLoX-Nano Model. Sensors 2022, 22, 8399.
  18. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
  19. Yi, C.; Xu, B.; Chen, J.; Chen, Q.; Zhang, L. An Improved YOLOX Model for Detecting Strip Surface Defects. Steel Res. Int. 2022, 93, 2200505.
  20. Wu, Q.; Zhang, B.; Xu, C.; Zhang, H.; Wang, C. Dense Oil Tank Detection and Classification via YOLOX-TR Network in Large-Scale SAR Images. Remote Sens. 2022, 14, 3246.
  21. Ru, C.; Zhang, S.; Qu, C.; Zhang, Z. The High-Precision Detection Method for Insulators’ Self-Explosion Defect Based on the Unmanned Aerial Vehicle with Improved Lightweight ECA-YOLOX-Tiny Model. Appl. Sci. 2022, 12, 9314.
  22. Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 October 2022).
  23. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  24. Zhang, R.; Wen, C. SOD-YOLO: A Small Target Defect Detection Algorithm for Wind Turbine Blades Based on Improved YOLOv5. Adv. Theory Simul. 2022, 5, 2100631.
  25. Zhao, S.; Zheng, J.; Sun, S.; Zhang, L. An Improved YOLO Algorithm for Fast and Accurate Underwater Object Detection. Symmetry 2022, 14, 1669.
  26. Jiang, Z.; Zhao, L.; Li, S.; Jia, Y. Real-Time Object Detection Method Based on Improved YOLOv4-Tiny. arXiv 2020, arXiv:2011.04244.
  27. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
Figure 1. Inspection robot inside the wind turbine nacelle.
Figure 2. Multiple objects in the nacelle: (a) the hydraulic station, (b) the pump, (c) the oil box.
Figure 3. YOLOX-Nano network structure.
Figure 4. CSPDarkNet-Tiny network structure.
Figure 5. Improved YOLOX-Nano model structure.
Figure 6. Detection results of (a) YOLOX-Nano before improvement, and (b) YOLOX-Nano after improvement.
Figure 7. Speed of different detection algorithms.
Table 1. Ablation experiments.

| Methods | mAP@0.5:0.95 [%] | mAP@0.75 [%] | FPS [frame s−1] | Parameters (×10^6) |
|---|---|---|---|---|
| YOLOX-Nano | 76.40 | 97.83 | 81 | 0.88 |
| YOLOX-Nano-A | 76.52 | 98.01 | 84 | 2.31 |
| YOLOX-Nano-B | 76.28 | 97.64 | 98 | 0.77 |
| YOLOX-Nano-C | 76.70 | 98.00 | 117 | 1.91 |
| Improved YOLOX-Nano | 76.84 | 98.14 | 140 | 2.91 |
Table 2. Comparison of different detection algorithms.

| Methods | mAP@0.5:0.95 [%] | mAP@0.75 [%] | FPS [frame s−1] | Parameters (×10^6) |
|---|---|---|---|---|
| YOLOX-Nano | 76.40 | 97.83 | 81 | 0.88 |
| YOLOv4-Tiny | 70.00 | 91.70 | 142 | 5.88 |
| YOLOX-Tiny | 76.78 | 97.83 | 87 | 5.03 |
| Improved YOLOX-Nano | 76.84 | 98.14 | 140 | 2.91 |