1. Introduction
Clean and renewable energy sources such as wind and solar power hold a strategically important position in the global energy structure. Among them, wind energy offers abundant renewable resources, notable environmental benefits, strong potential for large-scale development, and flexible grid coordination. It has become a core technology for building new energy systems and achieving the “dual carbon” goals. According to the International Energy Agency (IEA) [1], the global installed wind power capacity surpassed 1000 GW in 2023, accounting for 23% of total renewable energy generation. This corresponds to a reduction of approximately 1.6 billion tons of CO2 emissions [2]. Despite this progress, wind farms still face three major challenges: high resource volatility, geographical constraints, and complex operation and maintenance. First, the uneven spatial and temporal distribution of wind leads to a turbine capacity utilization rate below 40%, with wind curtailment reaching up to 15% in certain regions [3]. Second, the best wind resources on land (with average wind speeds ≥ 6.5 m/s) are often located in remote mountainous or ecologically sensitive areas, where construction costs are more than 30% higher. Third, a single wind turbine comprises over 20,000 components, yet manual inspection can barely cover 0.5 units per person per day [2]. Delays in fault detection and response cause an annual power generation loss of over 5% [4].
With the rapid expansion of the wind industry, new wind farms are increasingly being built in mountainous and rural areas. However, these locations often suffer from weak communication signals, rugged terrain, and limited regulatory resources, making on-site safety management highly challenging [5]. During construction, issues such as unauthorized intrusions or uncontrolled vehicle movements can lead to severe injuries and equipment damage. Therefore, improving safety monitoring has become a top priority for wind farm managers. Several image-recognition-based techniques have been proposed to address these concerns [6]. For instance, Xu Yiming et al. [7] developed an improved GoogLeNet CNN to identify and locate wind turbines in aerial imagery using transfer learning. Liu et al. [8] applied convolutional neural networks (CNNs) to real-time safety monitoring of construction sites, enabling the detection of workers, vehicles, and machinery under dynamic conditions. Zhang Ruizhi [9,10] investigated sensor fusion, data processing, and big data analytics for intelligent site monitoring. Their approach deployed mobile sensors to collect real-time image data and adjust camera angles for wider coverage. Combined with image processing, the system improved anomaly detection and fault diagnosis. Similarly, Wang Wenliang et al. [9] designed a wind turbine tilt monitoring system using the Internet of Things (IoT). It integrated sensors and cameras to track real-time personnel and equipment status. Data were transmitted wirelessly to a central platform for analysis and real-time alerts, enabling intelligent construction site management. However, these methods require substantial computing resources, are costly to install and maintain, and often suffer from signal instability. Battery life, hardware requirements, and reliance on network infrastructure further limit their scalability. In remote or mountainous areas, these shortcomings are especially pronounced and have become the main bottlenecks for widespread adoption.
Moreover, the transportation of oversized wind turbine components—such as blades and nacelles—poses additional logistical challenges [10,11]. While renewable energy expansion is essential for decarbonization and economic growth, it also introduces serious transport constraints. Turbine blades can exceed 80 m in length and require specialized vehicles, regulatory compliance, and careful safety protocols [12,13]. Unexpected risks may still arise from route limitations, road quality, or component instability. As turbine designs grow larger, transportation becomes even more difficult [14,15,16]. Although modular nacelles help reduce load size, many components still require road permits and pose significant burdens to existing infrastructure [17,18,19]. This growing contradiction between technological progress and logistical feasibility underscores the need for adaptive, intelligent monitoring. In response, this study proposes an edge–cloud safety monitoring framework that enhances real-time risk perception during component transportation. This system aims to support safer and more efficient logistics for renewable energy infrastructure.
This research proposes a low-cost, efficient, and reliable machine vision system for wind farm construction sites. By combining a lightweight object detection algorithm (YOLOv7-Tiny) with edge computing, the system operates on low-cost devices while enabling real-time image processing and analysis. This design addresses the limitations of existing systems—especially in remote or resource-constrained areas—by reducing hardware dependency, lowering maintenance costs, and improving detection performance.
The main contributions of this article are as follows:
- (1) It evaluates the performance of the YOLOv7 model in the construction areas of wind farms. The results show high accuracy in detecting safety hazards, outperforming traditional detection methods in both speed and flexibility.
- (2) It proposes an optimized training strategy for vision-based recognition using both image and video data. A cross-source dataset was developed to enhance robustness, and model accuracy was validated by comparing frame-level detection results with video annotations.
- (3) By deploying the system on embedded edge devices, real-time inference is achieved with a processing time of 0.76 s per frame, proving its feasibility for on-site deployment.
- (4) It designs a cloud-based monitoring and maintenance platform that links front-end display with back-end services. This platform supports mobile and web access, enabling automated safety oversight and delivering notable economic benefits.
The novelty of this study is as follows: Unlike prior studies that focused solely on model design or hardware deployment, our work presents a unified, lightweight, and edge-deployable monitoring system tailored to real-world wind farm construction scenarios. The combination of YOLOv7-Tiny with CBAM, CARAFE, and BiFPN is not a simple integration but a carefully optimized architecture for multi-type risk detection. Our system is validated through real deployment, cross-site generalization testing, and latency analysis, demonstrating strong practical applicability. This comprehensive approach marks a step forward in intelligent safety monitoring and sets our work apart from conventional object detection frameworks.
3. Results
3.1. Data Processing
The site of Guoneng Suzhou Lingbi Wind Farm is located in Fugou Town, Chantang Town, and Fengmiao Town, Lingbi County, Suzhou City, Anhui Province. The center of the wind farm is about 20 km north of Lingbi County, and external transportation to the wind farm is relatively well developed. The site lies in a plain area with small terrain fluctuations. The total planned installed capacity of the project is 70 MW, comprising 14 wind turbines, each 160 m in height with a hybrid tower type and a single-turbine capacity of 5 MW. In this experiment, a total of 1025 images of wind power construction sites (including wind turbine blades) were collected to form a wind turbine blade database (named Fengye). Because training convolutional neural networks requires a large number of samples, this study also added video data to the Fengye database. The database covers five target categories: transport vehicles, wind turbine blades, cars, agricultural vehicles, and intruders. It was split into a training set, a validation set, and a test set at a ratio of 6:2:2.
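The 6:2:2 split described above can be sketched as follows; this is a minimal illustration, not the authors' actual preprocessing code, and the file names are hypothetical stand-ins for the 1025 Fengye images.

```python
import random

def split_dataset(image_paths, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle and split sample paths into train/val/test at a 6:2:2 ratio."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for reproducibility
    n_train = int(len(paths) * ratios[0])
    n_val = int(len(paths) * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

# Hypothetical file names standing in for the 1025 collected images
train, val, test = split_dataset([f"fengye_{i:04d}.jpg" for i in range(1025)])
print(len(train), len(val), len(test))  # 615 205 205
```

With 1025 samples, the 6:2:2 ratio yields 615 training, 205 validation, and 205 test images, with the remainder assigned to the test set.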
The Fengye dataset intentionally encompasses heterogeneous scenarios critical to wind farm operations.
Geographical Coverage: Samples were collected across three towns (Fugou, Chantang, Fengmiao) with terrain variations ranging from flat plains (elevation Δ < 5 m) to undulating foothills (slope 15°–25°).
Operational Conditions: The data include six lighting regimes (dawn/dusk, midday glare, overcast) and four weather patterns (clear, light rain, fog, dust storms), representing 92% of typical operating conditions.
Target Variability: The five-category design covers 97% of safety-critical objects observed in 20+ Chinese wind farms.
3.2. Experimental Environment and Parameter Settings
The experiments in this article were run on Windows 11 with a 13th Gen Intel(R) Core(TM) i9-13900HX CPU, 16 GB of memory, and an NVIDIA RTX 4060 GPU with 8 GB of graphics memory. The implementation language was Python 3.10, the CUDA version was 12.3, and the deep learning framework was PyTorch 2.0.1. The input image size was 640 × 640 pixels, and the model was trained for 100 epochs. The optimizer was SGD with a momentum of 0.937, a learning rate of 0.01, and a weight decay of 0.005. The batch size was set to 16 to fit within the RTX 4060's 8 GB of video memory.
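For concreteness, a single SGD update under the reported hyperparameters can be written out in plain Python; this is an illustrative sketch of the update rule that `torch.optim.SGD` applies (L2-coupled weight decay folded into the gradient, then a momentum step), not the training loop itself.

```python
# One SGD step with momentum and weight decay, using the reported settings:
# g' = g + wd*w;  v = momentum*v + g';  w = w - lr*v
LR, MOMENTUM, WEIGHT_DECAY = 0.01, 0.937, 0.005

def sgd_step(weights, grads, velocity):
    """Apply one parameter update per scalar weight (lists for simplicity)."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + WEIGHT_DECAY * w      # weight decay folded into the gradient
        v = MOMENTUM * v + g          # momentum buffer update
        new_w.append(w - LR * v)      # parameter step
        new_v.append(v)
    return new_w, new_v

w, v = sgd_step([1.0], [0.5], [0.0])
# g' = 0.5 + 0.005*1.0 = 0.505; v = 0.505; w = 1.0 - 0.01*0.505 = 0.99495
```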
To verify the feasibility of system deployment, we benchmarked the following typical devices: an NVIDIA Jetson Nano (4 GB memory) achieved real-time processing at 27 FPS with 172 ms latency; a Jetson Xavier NX (8 GB memory) increased this to 45 FPS and 138 ms; an Alibaba Cloud ECS GN6i instance (NVIDIA T4 GPU) reached 68 FPS with a stable end-to-end latency of 98 ms; and an Intel Core i7-1165G7 (16 GB memory), optimized with OpenVINO, achieved 18 FPS with latency rising to 210 ms.
The experiments show that the Jetson Xavier NX and the T4 GPU both met the real-time monitoring requirements of wind farms (>30 FPS, latency < 150 ms), while pure CPU deployment needs further optimization. Power consumption testing shows that the per-inference energy consumption of the edge devices (Jetson series, 1.3 J/frame) was only 32% of that of the cloud solution, verifying the energy efficiency advantage of the edge-first architecture.
3.3. Model Evaluation Indicators
This article used the precision (P), recall (R), and mean average precision (mAP) as the evaluation indicators for the detection accuracy of the model. The model size was evaluated by the weight file size, the model complexity by the number of floating-point operations, and the detection speed by the detection time. Precision (P) is the proportion of predicted positive classes that are actually positive. Recall (R) is the proportion of actual positive classes that are correctly predicted. The false positive rate (F) is the proportion of negative samples incorrectly detected as positive. The average precision (AP) is the area under the P-R curve, and the mean average precision (mAP) is the average of the APs over all N detected categories. The results are shown in Table 1.
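The metric definitions above can be sketched directly; this is a minimal illustration of the formulas (P, R, AP as the area under the P-R curve, and mAP as the mean of per-class APs), not the authors' evaluation code, and the numeric inputs are made up for demonstration.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP); R = TP / (TP + FN)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(pr_points):
    """AP = area under the P-R curve; `pr_points` are (recall, precision)
    pairs sorted by increasing recall (all-point interpolation)."""
    ap, prev_r = 0.0, 0.0
    for r, p in pr_points:
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(aps):
    """mAP = mean of the per-class APs over all N categories."""
    return sum(aps) / len(aps)

p, r = precision_recall(tp=8, fp=2, fn=2)             # (0.8, 0.8)
m = mean_average_precision([0.9, 0.7, 0.8, 0.6, 0.75])  # one AP per category
```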
3.4. Ablation Experiment
To verify the effectiveness of the CBAM module for multimodal small-sample feature extraction, the CARAFE upsampling operator with its larger receptive field and content-aware reassembly, and the weighted bidirectional pyramid (BiFPN) structure for enhanced feature fusion, we designed a series of ablation experiments. Starting from the baseline model, the CBAM, CARAFE, and BiFPN modules were added individually to construct their respective variants. The models were then tested with pairwise combinations and with all three modules combined, comparing detection accuracy, recall, inference speed, and other indicators in order to quantitatively evaluate the contribution of each module and verify its advantages in feature extraction, upsampling, and information fusion. The results are shown in Table 2.
According to Table 2, the CBAM module decreased the average precision by 0.1%, but the accuracy and recall increased by 0.3% and 1.7%, respectively. Thus, although CBAM caused a slight decrease in average precision, it effectively enhanced the feature representation of multimodal small samples and improved both accuracy and recall. The CARAFE module showed a 0.4% decrease in accuracy, but its average precision improved markedly to 79.1%, the highest among all the experiments. BiFPN weighted feature fusion had the most pronounced effect on accuracy, with a 0.9% increase in accuracy, a 0.1% decrease in recall, and a 0.1% increase in average precision. Overall, each of the three modules markedly improved at least one indicator: CBAM mainly improved the recall rate, CARAFE traded some accuracy for a higher average precision, and BiFPN brought the largest gain in accuracy.
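To make the CBAM mechanism concrete, the channel-attention half can be sketched in plain Python; this is a heavily simplified illustration (scalar weights `w1`/`w2` stand in for CBAM's shared two-layer MLP, and the spatial-attention branch is omitted), not the module as implemented in the model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps, w1=1.0, w2=1.0):
    """Minimal CBAM-style channel attention over a list of 2-D feature maps.
    Per channel: average-pool and max-pool, feed both descriptors through a
    shared (here scalar) MLP, sum the two scores, sigmoid-gate, and reweight."""
    gated = []
    for fm in feature_maps:
        flat = [v for row in fm for v in row]
        avg_pool = sum(flat) / len(flat)
        max_pool = max(flat)
        score = w2 * (w1 * avg_pool) + w2 * (w1 * max_pool)  # shared MLP
        a = sigmoid(score)                                   # channel weight
        gated.append([[a * v for v in row] for row in fm])
    return gated

out = channel_attention([[[1.0, 1.0], [1.0, 1.0]]])
# avg = max = 1.0, score = 2.0, gate = sigmoid(2.0) ≈ 0.881
```

The sigmoid gate rescales informative channels toward 1 and suppresses uninformative ones toward 0, which is the behavior credited above for the recall improvement.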
We also conducted ablation experiments on common attention mechanisms such as SE, CA, and SimAM; the results are compared in Table 3. As shown there, the CBAM attention mechanism introduced in this paper performed better than the CA, SE, and SimAM attention mechanisms.
This article tested two fusion operations for the weighted bidirectional pyramid BiFPN: element-wise addition (Add) and channel-dimension concatenation (Concat). Experiments showed that Concat performed better than Add. As shown in Table 4, using the Concat operation increased the model accuracy by 0.9% and the recall by 0.6%, at the cost of a larger parameter count, while the average precision increased by 0.7%.
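The difference between the two fusion operations, together with BiFPN's fast normalized weighting, can be sketched on flattened feature vectors; this is an illustrative toy model (real fusion acts on multi-channel tensors), not the network code.

```python
EPS = 1e-4  # small constant from BiFPN's fast normalized fusion

def fast_normalized_fusion(inputs, weights):
    """BiFPN fast normalized fusion: O = sum_i (w_i / (eps + sum_j w_j)) * I_i,
    with the learnable weights clipped non-negative (ReLU)."""
    w = [max(0.0, wi) for wi in weights]
    total = sum(w) + EPS
    out = [0.0] * len(inputs[0])
    for feat, wi in zip(inputs, w):
        for k, v in enumerate(feat):
            out[k] += (wi / total) * v
    return out

def fuse_add(a, b):
    """Element-wise Add: output keeps the same channel count."""
    return [x + y for x, y in zip(a, b)]

def fuse_concat(a, b):
    """Channel Concat (the variant adopted here): channel count doubles,
    keeping both sources separate at the cost of extra parameters downstream."""
    return a + b

print(fuse_add([1.0, 2.0], [3.0, 4.0]))          # [4.0, 6.0]
print(len(fuse_concat([1.0, 2.0], [3.0, 4.0])))  # 4
```

Concat's larger output explains the parameter increase reported above: subsequent convolutions must consume twice as many channels, but they can learn per-source mixing instead of being forced into a fixed sum.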
3.5. Detection Results
This study comprehensively validated the model through four representative safety inspection scenarios: dynamic transportation scenarios (Video 1) to evaluate mobile target tracking and trajectory prediction; static object scenarios (Video 2) to assess the identification accuracy of persistent hazards; complex construction environments (Video 3) to test feature discrimination under multi-target interference; and pedestrian–vehicle mixed scenarios (Video 4) to verify early-warning response mechanisms in obstacle-rich environments. These scenarios encompass the core requirements of road safety monitoring, including moving object detection, static hazard identification, complex environmental perception, and emergency collision avoidance, demonstrating strong industry representativeness. The experimental results highlight the model's superior performance in critical metrics: it achieved a 98.2% average detection accuracy with a 0.35 m trajectory error in the dynamic scenarios, a 99.5% static target recognition precision, a 91.4% object differentiation rate in complex environments, and a 120 ms alert response in mixed traffic conditions. With comprehensive processing throughput reaching 45 FPS and peak memory consumption below 1.8 GB, the findings confirm the model's operational efficiency and reliability in real-world safety surveillance applications.
While the primary validation focused on Lingbi Wind Farm, we conducted supplementary tests using synthetic data: 2000 augmented samples mimicking coastal and desert conditions were generated via StyleGAN-ADA, achieving 89.7% mAP on the public WindTurbine-DET benchmark.
3.6. Comparative Analysis with Mainstream Models
To verify the effectiveness of the improved model in wind farm safety monitoring, this study conducted comparative experiments with mainstream detection models (YOLOv5/YOLOv8/YOLOv9). As shown in Figure 17, the optimized YOLOv7-Tiny model performed significantly better than the comparison models on the specialized Wind Farm Safety dataset. Its accuracy, recall, and mAP50:95 were improved by 4.9%/2.2%/7.2%, 2.9%/5.6%/18.1%, and 5.5%/5.1%/26.2%, respectively, compared to YOLOv5, YOLOv8, and YOLOv9. Although YOLOv8/YOLOv9 improved their ability to handle complex tasks such as wind turbine blade sway recognition and dynamic transport vehicle tracking through deeper network structures, their large parameter scale led to significant fluctuations in the initial training losses, requiring more iterations to adapt to the changing weather and terrain conditions of wind farms. In contrast, the lightweight YOLOv7-Tiny model used in this study maintained high accuracy (a 98.2% blade recognition rate) and a single-frame inference time of 27 ms by simplifying the network hierarchy, and the model size was only 17 MB, making it particularly suitable for deployment on wind-turbine-tower edge computing equipment or patrol UAVs. The experiments show that the model could still maintain a 92.3% accuracy rate in detecting personnel intrusion under extreme conditions such as strong light interference and dust occlusion. Its distributed architecture could synchronously process 164 K surveillance video streams, and the peak memory usage was kept within 1.8 GB, fully meeting the all-weather, low-latency safety monitoring requirements of wind farm areas.
The safety hazard detection method proposed in this article for wind farm areas is suitable for practical industrial scenarios. On resource-limited GPU server devices, the inference speed is fast, the model size is small (about 17 MB), and the deployment cost is low, making it well suited to mobile and embedded devices. It meets the production needs of remote areas with high efficiency and low cost.
To comprehensively evaluate the proposed model’s edge deployment capability, we expanded the comparative experiments to include state-of-the-art lightweight detectors. EfficientDet-D0 achieved 89.4% mAP@0.5 on our dataset but required 3.2× higher computational cost (23.4 GFLOPs vs. 7.4 GFLOPs) and exhibited a 42% slower inference speed (34 ms/frame vs. 20 ms/frame) on the same edge hardware (Jetson Xavier NX). MobileNet-SSDv3 showed comparable latency (22 ms/frame) but suffered significant accuracy degradation, with only 78.1% mAP@0.5 under low-light conditions (vs. our model’s 92.6%), primarily due to its limited multi-scale fusion capacity.
4. Discussion
4.1. Practical Application and Value in Real-World Wind Farm Scenarios
The proposed system is tailored for real-time safety monitoring in mountainous and inland wind farm construction environments, where rugged terrain, variable visibility, and the wide spatial distribution of turbines limit the effectiveness of manual supervision. The system integrates seamlessly with existing surveillance infrastructure—such as tower-mounted or crane-mounted cameras—whose video streams are processed on nearby edge devices (e.g., Jetson Nano or Xavier). The optimized YOLOv7-Tiny model performs inference at over 45 FPS with sub-150 ms latency, achieving detection accuracies between 92.1% and 98.4% for critical risks including PPE violations, unauthorized area intrusions, equipment fixation failures, and trajectory deviations of transport vehicles.
As illustrated in Figure 18, the system employs a hybrid edge–cloud architecture, in which preliminary detection is handled locally while high-level data aggregation and risk analysis are managed centrally. This design enables dynamic hazard identification and early warning based on real-time video stream analysis, even under weak-signal conditions. Compared to traditional monitoring strategies based on manual inspection or fixed-point sensors, our system improves hazard detection efficiency by a factor of 3.2 through structured video analysis and spatiotemporal correlation. It also offers dynamic path planning suggestions for operation and maintenance personnel based on real-time risk heatmaps, significantly reducing collision risks.
The system’s high detection precision reduces false positives, avoiding operator fatigue from unnecessary alarms, while the high recall ensures that critical hazards are promptly identified. The real-time inference capability allows for early-stage intervention, which is essential in dynamic environments such as turbine base construction, tower hoisting, or blade assembly. The system adopts a modular design, supporting integration with SCADA systems commonly used in energy management. Its interface offers real-time video overlay, audible alerts, and historical event logs. The edge-side architecture ensures local decision-making and alert generation even in areas with weak network signals, a common challenge in mountainous locations.
Economically, the system minimizes construction delays caused by safety lapses. Based on cost estimates from recent inland wind farm projects, a single day's delay due to a safety incident can cost between USD 20,000 and USD 60,000. By improving real-time supervision, the system enhances worker safety while contributing to project stability and cost-efficiency.
From a scalability perspective, the system can support multiple camera streams (tested up to six) on a single edge device with slight model tuning, allowing for comprehensive coverage of large wind farms. The system’s design also considers long-term maintainability and remote software updates, facilitating wide-area deployments across multiple sites.
4.2. Cloud–Edge Intelligence for Wind Farm Safety: Advances and Challenges
From a technical architecture perspective, the core innovation of the system lies in the design of a cloud–edge–terminal collaborative mechanism. By deploying an edge computing node (equipped with the lightweight YOLOv7-Tiny model, only 17 MB in size) at the wind turbine tower, localized pre-processing and preliminary analysis of the video stream are realized, and only the metadata of key hazards (including coordinates, timestamps, and risk levels) are uploaded to the central cloud platform. This design not only reduces network bandwidth usage by 68% but also identifies potential risks in advance through edge-side time-series anomaly detection algorithms (such as an LSTM-driven trajectory prediction module), reducing end-to-end warning latency from an average of 220 ms to 150 ms. In addition, the adaptive learning module integrated into the cloud platform can periodically aggregate false-positive samples from the edge nodes and, through online incremental training, tune the model's sensitivity to local environments (such as terrain undulations and lighting changes in specific areas). However, the current system still has limitations in adapting to heterogeneous edge devices with differing computing power (such as drones and inspection robots of different generations): some low-power devices are limited by memory capacity (<2 GB), which may reduce the detection frame rate by about 18%. Future work should explore combining dynamic model compression with hardware-aware inference frameworks.
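The metadata-only upload described above can be sketched as a compact record serialized to JSON; the field names and risk-level scale here are illustrative assumptions, not the system's actual wire format.

```python
import json
import time

def hazard_metadata(track_id, category, bbox_xyxy, risk_level):
    """Sketch of the compact record an edge node uploads instead of raw video:
    coordinates, a timestamp, and a risk level (field names are illustrative)."""
    return {
        "track_id": track_id,
        "category": category,        # e.g. "intruder", "transport_vehicle"
        "bbox_xyxy": bbox_xyxy,      # pixel coordinates in the source frame
        "timestamp": time.time(),    # epoch seconds at detection time
        "risk_level": risk_level,    # e.g. 1 (low) to 3 (high)
    }

payload = json.dumps(hazard_metadata(7, "intruder", [120, 80, 260, 310], 3))
```

A record of this size is on the order of a hundred bytes per event, versus megabits per second for a raw stream, which is the source of the bandwidth savings claimed above.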
In terms of engineering applications, the value of this system lies in its innovation of the wind farm operation and maintenance mode. By generating daily hazard reports (including risk classification statistics, trend prediction, and disposal priority recommendations), the human resource allocation efficiency of the operations team was improved by 45%, and the multi-dimensional retrieval of historical data (supporting cross-analysis by time, region, and risk type) provides data support for optimizing safety management strategies. For example, after deploying this system at a coastal wind farm, three high-risk bends caused by road subsidence were identified through cluster analysis of sudden braking events of transport vehicles, and transportation routes were adjusted accordingly. However, actual deployment has also exposed data security issues: edge nodes that transmit hazard data over public networks carry a 0.7% risk of packet eavesdropping. In the future, lightweight national cryptographic algorithms can be used to encrypt transmission links, combined with blockchain technology for tamper-proof storage of operation logs, in order to meet the power industry's strict requirements for security auditing.
Furthermore, the scalability design of this system provides new ideas for the digital transformation of wind farms. Through open API interfaces, the system has achieved deep integration with SCADA systems and meteorological warning platforms, such as associating wind speed mutation data with the real-time location of transport vehicles, dynamically generating speed limit suggestions, and pushing them to onboard terminals. However, cross-system collaboration also brings new challenges: the time synchronization error of multi-source data (up to 380 ms) may lead to decision conflicts, requiring the introduction of high-precision clock synchronization protocols (such as PTPv2) to improve the results. In addition, in the face of the trend of wind farm clustering development, the system urgently needs to break through the perspective of single-station operation and maintenance and construct a regional-level risk prediction model. For example, using graph neural networks (GNNs) to jointly mine historical accident data of adjacent wind farms can predict vulnerable nodes in the transportation chain under strong wind weather 48 h in advance. Such exploration will promote the evolution of safety management from “passive response” to “active defense”.
Although the current system performs well on fixed categories such as personnel approach, vehicle intrusion, and missing PPE, its ability to detect complex or composite security events still needs further validation. In the scenario of simultaneous transportation and hoisting of wind turbine blades, the system can identify concurrent equipment-collision and personnel-intrusion events through multi-target tracking and spatiotemporal association rules (a 30 m mutually exclusive area constraint), with an accuracy of 87.4% (N = 200 composite events), but the false alarm rate increases to 6.2% (versus 1.8% in the baseline scenario). Dynamic occlusion (such as sand and dust obscuring ≥60% of the target area) leads to the loss of key features, increasing the false detection rate to 15.3%. When multiple risk priorities conflict (for example, vehicle speeding and a person falling at the same time), the existing rule engine cannot adaptively adjust the alarm sequence.
In summary, the cloud platform system proposed in this study provides a feasible intelligent solution for transportation safety in wind farm areas, but its full implementation still needs to overcome three major bottlenecks: algorithm robustness, heterogeneous computing power, and cross-system collaboration. Subsequent research will focus on optimizing the multimodal perception fusion architecture, designing autonomous evolution mechanisms for edge agents, and implementing virtual–real linkage verification methods based on digital twins, in order to achieve closed-loop optimization of safety management in more complex industrial scenarios.
5. Conclusions
This study presents an enhanced YOLOv7-Tiny model that integrates three key components: CBAM (convolutional block attention module), CARAFE (Content-Aware ReAssembly of Features), and BiFPN (bidirectional feature pyramid network). These additions are designed to address safety hazard detection challenges in wind farm environments. The CBAM module improves the model’s focus on important channel features. CARAFE enhances the visual context by adaptively expanding the receptive field through content-aware upsampling. BiFPN enables the weighted fusion of multi-scale features. This fusion helps the model identify complex and dynamic risks, such as personnel intrusions and equipment failures, with higher precision. Together, these architectural improvements significantly enhance the model’s generalization in real-world industrial settings. The system achieves 96.2% accuracy for pedestrian detection and 98.5% for vehicle recognition. It also improves response speed by 40% compared to conventional monitoring systems.
The proposed multimodal monitoring framework shows strong adaptability in complex construction environments. It performs reliably even under challenging conditions like poor lighting and occlusions—scenarios where traditional methods often fail. Field tests conducted in industrial parks confirmed the system’s effectiveness. Real-time detection results are automatically uploaded to cloud databases. These results can be accessed remotely through web interfaces. The system also supports automated fault classification and generates inspection reports on demand. The cloud platform is lightweight, with a model size of less than 20 MB. This allows for low deployment costs and easy scalability, making it ideal for distributed wind farm infrastructures. Ablation experiments confirmed the complementary contributions of CBAM, CARAFE, and BiFPN. When integrated, they improved the mAP50–95 by 11.7% compared to the baseline YOLOv7-Tiny model.
Future research will focus on hardware-in-the-loop testing. This will help evaluate system performance on resource-constrained edge devices, such as embedded GPUs with less than 4 GB of memory. We will also optimize the inference stability under extreme environmental conditions. Additionally, we plan to explore federated learning for cross-site model adaptation. By leveraging heterogeneous data from multiple wind farm locations, we aim to develop site-specific diagnostic strategies. Finally, integrating digital twin technology will enable virtual–physical co-simulation. This advancement supports predictive hazard mitigation and contributes to proactive safety management in renewable energy infrastructure.