Article

Photovoltaic Cell Surface Defect Detection via Subtle Defect Enhancement and Background Suppression

School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
*
Author to whom correspondence should be addressed.
Micromachines 2025, 16(9), 1003; https://doi.org/10.3390/mi16091003
Submission received: 11 July 2025 / Revised: 24 August 2025 / Accepted: 28 August 2025 / Published: 30 August 2025
(This article belongs to the Special Issue Thin Film Photovoltaic and Photonic Based Materials and Devices)

Abstract

As the core component of photovoltaic (PV) power generation systems, PV cells are susceptible to subtle surface defects, including thick lines, cracks, and finger interruptions, primarily caused by stress and material brittleness during the manufacturing process. These defects substantially degrade energy conversion efficiency by inducing both optical and electrical losses, yet existing detection methods struggle to precisely identify and localize them. In addition, complex background noise further increases the difficulty of detecting these subtle defects. To address these challenges, we propose a novel PV Cell Surface Defect Detector (PSDD) that extracts subtle defects both within the backbone network and during feature fusion. In particular, we propose a plug-and-play Subtle Feature Refinement Module (SFRM) that integrates into the backbone to enhance fine-grained feature representation by rearranging local spatial features to the channel dimension, mitigating the loss of detail caused by downsampling. SFRM further employs a general attention mechanism to adaptively enhance key channels associated with subtle defects, improving the representation of fine defect features. In addition, we propose a Background Noise Suppression Block (BNSB) as a key component of the feature aggregation stage, which employs a dual-path strategy to fuse multiscale features, reducing background interference and improving defect saliency. Specifically, the first path uses a Background-Aware Module (BAM) to adaptively suppress noise and emphasize relevant features, while the second path adopts a residual structure to retain the original input features and prevent the loss of critical details. Experiments show that PSDD outperforms the compared methods, achieving the highest $\mathrm{mAP}_{50}$ of 93.6% on the PVEL-AD dataset.

1. Introduction

Solar energy, as a clean and renewable resource, has attracted worldwide attention in the global pursuit of carbon neutrality and the mitigation of climate change [1]. The manufacturing quality of photovoltaic (PV) cells, the core devices for converting solar energy into electricity, directly determines energy conversion efficiency and strongly influences overall system performance [2]. However, during the manufacturing process of PV cells, various defects may arise on the cell surface due to internal stress concentration, uneven thermal treatment, or minor damage [3]. These defects not only reduce energy conversion efficiency by creating carrier recombination centers [4], but also trigger localized hotspot effects, further accelerating the degradation of cell performance [5]. Therefore, the accurate and timely detection of surface defects in PV cells is of critical importance for the sustainable development of the photovoltaic industry.
In recent years, researchers have continuously explored the application of deep learning in detecting surface defects in PV cells. Akram et al. [6] proposed a simplified CNN architecture and incorporated data augmentation to improve defect detection performance while maintaining low model complexity; however, its limited representational capacity results in suboptimal performance in capturing complex defect features. Subsequently, Wang et al. [7] developed YOLO-PV, a customized network architecture specifically designed for EL images, which emphasizes fine-grained feature fusion to balance detection accuracy and computational efficiency; however, its generalization capability across different defect types remains limited. More recently, Liu et al. [8] proposed a CNN-based method that combines contrast-limited adaptive histogram equalization with a global context attention mechanism to enhance the recognition of fine-grained defects in EL images, although its relatively complex network structure also results in higher training and inference costs.
Existing deep learning methods for PV cell defect detection have demonstrated promising results; however, under complex background noise, achieving precise detection of subtle defects on the surface of photovoltaic cells remains challenging. As shown in Figure 1, Dynamic R-CNN [9] performs poorly in the localization of subtle defects, with detection regions appearing fragmented and discontinuous; YOLOv10 [10] is prone to missing small defects, resulting in incomplete detection; and DETR [11] often misclassifies regions unrelated to actual defects as defects.
To address the above issues, we propose a novel surface defect detector based on subtle defect enhancement and background suppression, which incorporates two key innovations: Subtle Feature Refinement Module (SFRM) and Background Noise Suppression Block (BNSB). Specifically, SFRM partitions spatial features and rearranges them into the channel dimension, combined with a channel attention mechanism, enhancing the representation of subtle defects and preserving their complete information. BNSB adopts a parallel dual-path structure to effectively suppress background noise interference: on the one hand, the first path incorporates the BAM to adaptively weaken irrelevant background information; on the other hand, the second path preserves the integrity of input features through a residual connection to prevent the loss of critical details. In summary, the main contributions of this paper are as follows:
  • We propose a plug-and-play SFRM module that preserves subtle defect features by rearranging spatial information and enhances key feature representation through channel attention.
  • We propose a dual-path BNSB in which the first path uses BAM to suppress noise and highlight key features, and the second path employs a residual structure to preserve the original features, thereby reducing background interference and enhancing defect saliency.
  • Based on SFRM and BNSB, we propose a novel detector, PSDD, which achieves efficient and accurate detection of PV cell defects by enhancing subtle features and suppressing noise.
  • Extensive experiments show that PSDD outperforms other advanced methods, achieving the best $\mathrm{mAP}_{50}$ of 93.6% on the PVEL-AD dataset.
This paper is organized as follows. Section 2 reviews the progress of related research. Section 3 provides a detailed introduction to the proposed method. Section 4 presents the experimental results. Section 5 concludes the paper.

2. Related Work

2.1. Attention Mechanism

The attention mechanism, inspired by the selective focus of the human visual system, adaptively recalibrates feature representations by assigning higher weights to informative signals while suppressing redundant or noisy information. This dynamic weighting enhances the model’s ability to capture salient patterns and contextual dependencies, improving both discriminative power and generalization capability. The advantages of the attention mechanism include stronger feature representation, the ability to capture long-range dependencies, and improved robustness against irrelevant information. However, attention mechanisms also introduce higher computational and memory costs, increase network complexity, and may require large or high-quality datasets to achieve optimal performance. Attention mechanisms can be categorized into three types based on their operational dimensions: channel attention, spatial attention, and self-attention. The channel attention mechanism adaptively assigns weights to each channel, enhancing the representation of defect-related features while suppressing redundant information and background noise. Hu et al. [12] proposed the Squeeze-and-Excitation (SE) network, which extracts global contextual information through global average pooling and adaptively recalibrates channel-wise feature responses, thereby enhancing the network’s focus on informative channels while suppressing irrelevant features. The spatial attention mechanism adaptively adjusts the weights of different positions in the feature map based on global or local spatial information, highlighting salient features in key regions while suppressing interference from non-critical areas. Jongchan Park et al. [13] proposed the bottleneck attention module, which generates attention maps along the spatial dimension to reinforce the representation of salient regions, enabling more precise detection of defect areas. 
The self-attention mechanism dynamically weights and fuses input features based on the correlations between elements, thereby capturing global long-range dependencies and enhancing feature representation. Chen et al. [14] introduced a lightweight self-attention mechanism within the DETR framework to achieve real-time detection of surface defects in crystalline silicon photovoltaic cells, balancing global modeling capability with inference efficiency. Their method effectively improves the accuracy of defect identification.

2.2. Photovoltaic Cell Defect Detection

Deep learning-based methods have become the mainstream approach for PV cell defect detection due to their ability to automatically learn discriminative features from raw images. These methods can be broadly categorized into single-stage, two-stage, and Transformer-based approaches, based on differences in detection processes and model architectures. Single-stage detection methods offer high inference speed and are suitable for real-time applications, but their accuracy may be limited for small or densely packed defects. Fioresi et al. [15] proposed a detection framework based on an improved YOLOX [16] model, which can efficiently identify and localize defects even with a limited number of training samples. Two-stage detection methods provide higher accuracy and better handling of small or overlapping defects, but their multi-step processing increases model complexity and slows inference. Su et al. [17] proposed a complementary attention network that sequentially combines channel and spatial attention to suppress background noise and enhance defect features. Transformer-based detection methods excel at capturing complex and multi-scale defects, but they typically require more computational resources, which makes them better suited to applications that prioritize detection accuracy over real-time performance. Lang et al. [18] proposed a YOLO-based PV defect detection method that integrates attention mechanisms and Transformer modules, enhancing detection accuracy by effectively capturing and fusing spatial and semantic features.
Although the above methods for detecting PV cell defects have achieved certain results, their applicability remains limited due to insufficient consideration of the characteristics of subtle defects and the impact of complex background noise. To more effectively capture the characteristics of PV cell surface defects and enhance detection accuracy, we design a plug-and-play SFRM module, which integrates a spatial-to-channel mapping strategy with a channel attention mechanism to effectively mitigate the loss of subtle defect features during the downsampling process. Meanwhile, we propose the BNSB module with a dual-path fusion strategy: the main path incorporates the BAM module to suppress interference from complex background noise, while the auxiliary path adopts a residual connection structure to preserve original feature information. On this basis, we propose the PSDD method, whose overall framework is mainly supported by both SFRM and BNSB. The detailed implementation is presented in Section 3.

3. Method

3.1. Overall Architecture

Figure 2 illustrates the overall architecture of our PSDD, which consists of three main components: the Backbone, the Feature Fusion Network (FFN), and the Prediction Head. We adopt Darknet53 [19] as the backbone, owing to its strong feature extraction capability and low computational cost, to extract multiscale features $\{f_i\}_{i=1}^{3}$ from the input image $x \in \mathbb{R}^{H \times W \times C}$ for subsequent processing. To enhance the backbone’s ability to capture subtle defect patterns, we integrate a set of plug-and-play Subtle Feature Refinement Modules (SFRMs) into selected backbone layers, as shown in Figure 2a. We append the lightweight Spatial Pyramid Pooling Fast (SPPF) module at the end of the backbone to efficiently aggregate multi-scale spatial information and enhance global feature representation. FFN employs a Bidirectional Feature Pyramid strategy to integrate and refine the multiscale features $\{f_i\}_{i=1}^{3}$. We further introduce the Background Noise Suppression Block (BNSB) in the feature fusion stage to suppress background interference and enhance the feature representation of foreground defects. For the final prediction stage, we adopt a decoupled head design that handles classification and regression tasks separately to improve detection accuracy and robustness. The regression branch consists of two convolutional layers and one Conv2D layer, while the classification branch employs two depthwise separable convolutional layers and one Conv2D layer [20], achieving precise defect localization while effectively reducing the number of parameters and computational cost. The main innovative components of this paper are SFRM and BNSB, which are detailed in Section 3.2 and Section 3.3, respectively.

3.2. Subtle Feature Refinement Module (SFRM)

Convolutional units are prone to losing fine-grained defect information as a result of strided convolutions and pooling operations. To mitigate this issue, we propose a plug-and-play SFRM that enhances the representation of subtle defect features. Figure 3a shows the architecture of the SFRM.
SFRM partitions the input feature $f \in \mathbb{R}^{h \times w \times c}$ into four spatial subsets of equal size:
$$
\begin{aligned}
f_1 &= f[:,\, 0{::}2,\, 0{::}2,\, :], & f_2 &= f[:,\, 1{::}2,\, 0{::}2,\, :], \\
f_3 &= f[:,\, 0{::}2,\, 1{::}2,\, :], & f_4 &= f[:,\, 1{::}2,\, 1{::}2,\, :]
\end{aligned}
$$
Each subset is represented as $f_i \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times c}$, achieving a twofold downsampling in spatial dimensions while fully preserving the original channel information. However, spatial division may also inadvertently retain background noise or irrelevant signals, thereby impacting the effectiveness of subsequent feature fusion. To mitigate this issue, an SE attention mechanism [12] is incorporated into SFRM (Figure 3b), which allows adaptive channel weighting to emphasize defect-relevant features. Formally,
$$
f_i' = \sigma\left(W_2\, \delta\left(W_1 \cdot \mathrm{GAP}(f_i)\right)\right) \otimes f_i, \quad i \in \{1, 2, 3, 4\}
$$
The refined sub-features $\{f_i'\}_{i=1}^{4}$ are concatenated along the channel axis to form a unified representation:
$$
\hat{f} = \mathrm{Concat}(f_1', f_2', f_3', f_4')
$$
Therefore, $\hat{f} \in \mathbb{R}^{\frac{h}{2} \times \frac{w}{2} \times 4c}$ retains half the spatial resolution of the original input $f$ while expanding the channel dimension fourfold to integrate complementary information from different subregions.
Finally, $\hat{f}$ is passed through a unit-stride $1 \times 1$ convolutional layer with $\tilde{c}$ output channels to preserve subtle defect information:
$$
\tilde{f} = \mathrm{Conv}_{1 \times 1}^{\tilde{c}}(\hat{f})
$$
In this way, the feature $\tilde{f}$ produced by the SFRM preserves subtle defect information and thus may enhance the model’s discriminative capacity.
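The data flow of SFRM described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random weights, the reduction ratio `r`, the output width `c_tilde`, the omission of biases, and the choice to give each subset its own SE weights are all assumptions made for illustration, and the 1 × 1 convolution is written as a per-pixel matrix product.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_gate(x, w1, w2):
    """SE-style channel gating for x of shape (n, h, w, c)."""
    squeezed = x.mean(axis=(1, 2))            # global average pooling -> (n, c)
    hidden = np.maximum(squeezed @ w1, 0.0)   # reduction layer + ReLU -> (n, c // r)
    weights = sigmoid(hidden @ w2)            # expansion layer + sigmoid -> (n, c)
    return x * weights[:, None, None, :]      # rescale every channel


def sfrm(f, se_params, w_out):
    """Space-to-channel rearrangement, per-subset SE gating, concat, 1x1 conv."""
    subsets = [
        f[:, 0::2, 0::2, :],
        f[:, 1::2, 0::2, :],
        f[:, 0::2, 1::2, :],
        f[:, 1::2, 1::2, :],
    ]
    refined = [se_gate(s, w1, w2) for s, (w1, w2) in zip(subsets, se_params)]
    f_hat = np.concatenate(refined, axis=-1)  # (n, h/2, w/2, 4c)
    return f_hat @ w_out                      # 1x1 conv as a per-pixel matmul


# Hypothetical sizes and random weights, for shape checking only.
rng = np.random.default_rng(0)
n, h, w, c, r, c_tilde = 1, 8, 8, 4, 2, 6
f = rng.standard_normal((n, h, w, c))
se_params = [
    (rng.standard_normal((c, c // r)), rng.standard_normal((c // r, c)))
    for _ in range(4)
]
w_out = rng.standard_normal((4 * c, c_tilde))
out = sfrm(f, se_params, w_out)
print(out.shape)  # (1, 4, 4, 6)
```

Because every input pixel lands in exactly one of the four subsets, the spatial resolution is halved without discarding any values, which is the property the module relies on to retain fine detail.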

3.3. Background Noise Suppression Block (BNSB)

Features at different scales from the backbone network often contain noisy information. During the fusion process, background noise interference may be further amplified, which can obscure true defect features and reduce the discriminative capacity of the model. We design a novel BNSB module to alleviate noise interference using a dual path optimization strategy, as shown in Figure 4a.
Let $f \in \mathbb{R}^{h \times w \times C}$ be the input feature obtained by fusing features from different levels, as illustrated in Figure 2. After reducing the channel dimension of $f$ to $c$ via a 1 × 1 convolution, BNSB splits it into two subsets, i.e., $f^{bam} \in \mathbb{R}^{h \times w \times \frac{c}{2}}$ and $f^{res} \in \mathbb{R}^{h \times w \times \frac{c}{2}}$. Formally,
$$
f^{bam},\, f^{res} = \mathrm{Split}\left(\mathrm{Conv}(f)\right)
$$
$f^{bam}$ is fed into the background noise suppression path, which consists of a sequence of background-aware modules (BAMs) designed to suppress background noise:
$$
f_i^{bam} =
\begin{cases}
\mathrm{BAM}(f^{bam}), & \text{if } i = 1 \\
\mathrm{BAM}(f_{i-1}^{bam}), & \text{if } i \in \{2, 3, \dots, T\}
\end{cases}
$$
where $T$, defaulting to 4, denotes the number of BAMs. $f^{res}$ is used as the residual path to retain detailed spatial information and complement the background-suppressed features, followed by a 1 × 1 convolution to restore the channel dimension. Formally,
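$$
f^{out} = \mathrm{Conv}\left(\bigoplus_{i=1}^{T} f_i^{bam} \oplus f^{res}\right)
$$
where $\oplus$ denotes the aggregation of the BAM outputs and their fusion with the residual feature $f^{res}$.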
Figure 4b,c shows the details of the proposed BAM with and without shortcut connections, respectively. BAM employs a bottleneck structure that compresses the channel dimension of $f^{bam}$ using a 1 × 1 convolution, followed by a 3 × 3 convolution to extract local spatial features of defect regions, enhancing the model’s ability to perceive background noise. Formally,
$$
\tilde{f}^{bam} = \mathrm{Conv}_{3 \times 3}\left(\mathrm{Conv}_{1 \times 1}(f^{bam})\right)
$$
The Convolutional Block Attention Module (CBAM) is then applied to suppress background noise and enhance feature representation. CBAM applies channel attention followed by spatial attention. In the channel attention step, global average pooling (GAP) and global max pooling (GMP) are applied to $\tilde{f}^{bam}$ to extract two complementary pooled features. These features are combined by element-wise multiplication and passed through an MLP, followed by a sigmoid activation to generate the channel attention $f^{ca}$. Formally,
$$
f^{ca} = \sigma\left(\mathrm{MLP}\left(\mathrm{GMP}(\tilde{f}^{bam}) \otimes \mathrm{GAP}(\tilde{f}^{bam})\right)\right)
$$
The output $f^{ca}$ is then element-wise multiplied with $\tilde{f}^{bam}$ to generate the channel-refined feature $\tilde{f}^{ca}$:
$$
\tilde{f}^{ca} = \tilde{f}^{bam} \otimes f^{ca}
$$
CBAM then applies spatial attention to $\tilde{f}^{ca}$. Specifically, average and max pooling (again denoted GAP and GMP) are applied across the channel axis of $\tilde{f}^{ca}$ at every spatial position, yielding two maps that capture complementary spatial context. The pooled results are then concatenated to form the spatial attention input $f^{sp}$, formulated as follows:
$$
f^{sp} = \mathrm{Concat}\left(\mathrm{GAP}(\tilde{f}^{ca}),\, \mathrm{GMP}(\tilde{f}^{ca})\right)
$$
$f^{sp}$ is processed by a 7 × 7 convolution to extract spatial contextual information, followed by a sigmoid activation function to generate spatial attention weights. These weights are then element-wise multiplied with $\tilde{f}^{ca}$, resulting in the final output feature of BAM, denoted as $f^{cbam}$:
$$
f^{cbam} = \sigma\left(\mathrm{Conv}_{7 \times 7}(f^{sp})\right) \otimes \tilde{f}^{ca}
$$
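To make the spatial attention step concrete, the following NumPy sketch computes the two channel-pooled maps, applies a 7 × 7 convolution with zero padding, and rescales the input with the resulting sigmoid weights. It is a sketch under stated assumptions rather than the authors' code: the function name `spatial_attention`, the random kernel, the absence of a bias term, and the single-image `(h, w, c)` layout are all hypothetical, and the convolution is written as explicit loops for readability instead of speed.

```python
import numpy as np


def spatial_attention(x, kernel):
    """CBAM-style spatial attention for x of shape (h, w, c); kernel is (k, k, 2)."""
    avg_map = x.mean(axis=-1)                      # (h, w) average over channels
    max_map = x.max(axis=-1)                       # (h, w) maximum over channels
    f_sp = np.stack([avg_map, max_map], axis=-1)   # (h, w, 2)

    k = kernel.shape[0]
    pad = k // 2
    # Zero padding keeps the output at the same h x w resolution.
    padded = np.pad(f_sp, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = f_sp.shape
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel)

    weights = 1.0 / (1.0 + np.exp(-logits))        # sigmoid -> one weight per position
    return x * weights[:, :, None], weights


# Hypothetical feature map and kernel, for illustration only.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
kernel = rng.standard_normal((7, 7, 2)) * 0.1
refined, weights = spatial_attention(x, kernel)
print(refined.shape, weights.shape)  # (16, 16, 8) (16, 16)
```

Each spatial position receives a single weight shared by all channels, so positions judged to be background are attenuated uniformly across the feature vector.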

3.4. Loss Function

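We use the Enhanced IoU (EIoU) [21] loss to improve boundary sensitivity and precisely localize subtle defects by leveraging the geometric structure of the boxes. Formally,
$$
\begin{aligned}
L_{EIoU} &= L_{IoU} + L_{dls} + L_{asp}, \\
L_{IoU} &= 1 - IoU, \\
L_{dls} &= \frac{\rho^2(b, b^{gt})}{(w^c)^2 + (h^c)^2}, \\
L_{asp} &= \frac{\rho^2(w, w^{gt})}{(w^c)^2} + \frac{\rho^2(h, h^{gt})}{(h^c)^2}
\end{aligned}
$$
where IoU (Intersection over Union) [21] measures the overlap between predicted and ground-truth boxes and reflects the localization accuracy; $\rho(\cdot, \cdot)$ denotes the Euclidean distance; $b$ and $b^{gt}$ are the centers of the predicted and ground-truth boxes; $w$, $h$ and $w^{gt}$, $h^{gt}$ are their widths and heights; and $w^c$ and $h^c$ are the width and height of the smallest box enclosing both. The distance loss $L_{dls}$ normalizes the center point deviation to improve positioning precision, while the aspect ratio loss $L_{asp}$ penalizes size mismatches to maintain shape consistency and enhance defect detection performance.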
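As a worked illustration of the three loss terms, the following plain-Python sketch evaluates the loss for a single pair of axis-aligned boxes. It follows the standard EIoU formulation, in which the center distance is normalized by the squared diagonal of the smallest enclosing box; the `(x1, y1, x2, y2)` box format and the function name `eiou_loss` are assumptions, and boxes are assumed to have positive area.

```python
def eiou_loss(pred, gt):
    """EIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    wc = max(px2, gx2) - min(px1, gx1)
    hc = max(py2, gy2) - min(py1, gy1)

    # Squared center distance, normalized by the squared enclosing diagonal.
    dx = (px1 + px2) / 2 - (gx1 + gx2) / 2
    dy = (py1 + py2) / 2 - (gy1 + gy2) / 2
    l_dls = (dx * dx + dy * dy) / (wc * wc + hc * hc)

    # Width and height mismatches, normalized by the enclosing width and height.
    dw = (px2 - px1) - (gx2 - gx1)
    dh = (py2 - py1) - (gy2 - gy1)
    l_asp = dw * dw / (wc * wc) + dh * dh / (hc * hc)

    return (1.0 - iou) + l_dls + l_asp


print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # 6/7 + 1/9, about 0.968
```

In the second call the boxes have equal size, so the size term vanishes and the loss consists only of the overlap term (1 - 1/7) and the center-distance term (2/18).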

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets

We conduct experiments on the publicly available PVEL-AD dataset [22] to evaluate the performance of our PSDD. PVEL-AD, jointly released by Hebei University of Technology and Beihang University, is a high-quality near-infrared EL dataset comprising 36,543 images with diverse internal defects and heterogeneous backgrounds across 12 defect types. Due to the extremely limited number of samples for Printing_Error, Corner, Fragment, and Scratch defect types, we constructed a representative subset consisting of 3524 images. This subset contains eight common defect types: Black_Core (Bc), Crack (Cr), Finger (Fi), Horizontal_Dislocation (Hd), Short_Circuit (Sci), Star_Crack (Scr), Thick_Line (Tl), and Vertical_Dislocation (Vd). Based on their spatial distribution and visual scale, these defects are grouped into large-scale (Bc, Hd, Vd, Sci) and subtle (Cr, Scr, Fi, Tl) types. This dataset is divided into training, validation, and test sets in a ratio of 7:1:2 to ensure the rationality of model training, parameter tuning, and performance evaluation.
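A 7:1:2 split of the 3524-image subset could be produced as in the sketch below. The shuffling seed, the rounding rule (remainder assigned to the test set), and the placeholder image identifiers are assumptions made for illustration; the exact per-split counts used in the experiments are not stated in the text.

```python
import random


def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Placeholder identifiers, not real file names from PVEL-AD.
image_ids = [f"img_{i:04d}" for i in range(3524)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 2466 352 706
```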

4.1.2. Implementation Details

Our PSDD is implemented using the PyTorch (1.13.1) framework and is trained on an NVIDIA A100 GPU with 80 GB of memory. During training and evaluation, all images are uniformly resized to a resolution of 640 × 640 pixels. Stochastic Gradient Descent (SGD) [23] is employed as the optimizer, with a learning rate of 0.01, a momentum factor of 0.937, and a batch size of 32. The maximum number of training epochs is 200.

4.2. Evaluation Metrics

We use $\mathrm{mAP}_{50}$ and $\mathrm{mAP}_{50:95}$ as key metrics to evaluate detection performance. $\mathrm{mAP}_{50}$ represents the average precision across all $N$ categories at a fixed IoU threshold of 0.50, defined as follows:
$$
\mathrm{mAP}_{50} = \frac{1}{N} \sum_{i=1}^{N} AP_i^{IoU = 0.50}
$$
$\mathrm{mAP}_{50:95}$ is computed as the mean $AP$ across $T$ IoU thresholds from 0.50 to 0.95 with an interval of 0.05, defined as follows:
$$
\mathrm{mAP}_{50:95} = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} AP_i^{IoU = \tau_t}
$$
where $T = 10$ is the number of evaluated thresholds and $AP_i^{IoU = \tau_t}$ is the Average Precision calculated for category $i$ at threshold $\tau_t$, defined as follows:
$$
AP = \int_{0}^{1} P(R)\, dR
$$
where $P(R)$ denotes precision ($P$) as a function of recall ($R$).
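The definitions above can be sketched in a few lines of plain Python. The trapezoidal integration is one simple way to approximate the area under the precision-recall curve and is an assumption here, since common evaluation toolkits typically use interpolated precision instead; the function names and the toy numbers are purely illustrative and are not results from the paper.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve by trapezoidal integration.

    recalls must be sorted in increasing order.
    """
    area = 0.0
    for k in range(1, len(recalls)):
        width = recalls[k] - recalls[k - 1]
        area += width * (precisions[k] + precisions[k - 1]) / 2.0
    return area


def mean_ap(ap_table):
    """ap_table[t][i] is the AP of category i at the t-th IoU threshold."""
    per_threshold = [sum(row) / len(row) for row in ap_table]
    return sum(per_threshold) / len(per_threshold)


# Toy numbers for illustration only.
print(average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0]))  # 0.75
ap_at_50 = [[1.0, 0.5]]                    # one threshold, two categories
ap_50_to_95 = [[1.0, 0.5], [0.5, 0.25]]    # two thresholds shown instead of ten
print(mean_ap(ap_at_50))     # 0.75
print(mean_ap(ap_50_to_95))  # 0.5625
```

With a single row the function reduces to the category average at one threshold; with ten rows, one per threshold from 0.50 to 0.95, it yields the stricter averaged metric.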

4.3. Comparative Experiment

We conducted comparative experiments between our PSDD and eighteen advanced detectors on the PVEL-AD dataset to verify the effectiveness of PSDD. These detectors span three categories: single-stage detectors, including TOOD [24], YOLOv8 [25], YOLOv10 [10], YOLOv11 [20], YOLOX [16], and Mamba YOLO [26]; two-stage detectors, including Dynamic R-CNN [9], Faster R-CNN [27], Mask R-CNN [28], Cascade RPN [29], SSOD [30], and IOD [31]; and Transformer-based detectors, including DETR [11], Deformable DETR [32], RT-DETR [33], Swin-Transformer [34], DINO [35], and Wave-ViT [36]. The comparative experiments cover key metrics, including $\mathrm{mAP}_{50}$, $\mathrm{mAP}_{50:95}$, precision, recall, model parameters, FLOPs, inference speed, and detection performance across all defect categories, with the overall performance and category-specific results summarized in Table 1.
As shown in Table 1, PSDD achieves 93.6%, 65.3%, and 90.8% on $\mathrm{mAP}_{50}$, $\mathrm{mAP}_{50:95}$, and P, respectively, ranking highest among all compared models. In terms of inference speed, PSDD processes a single image in only 10.0 ms, slightly faster than the single-stage detector YOLOv11 [20] (12.0 ms). This advantage is primarily attributed to PSDD’s efficient design of feature extraction and detection heads, which enables low latency while maintaining high accuracy. In contrast, the two-stage detectors SSOD [30] and IOD [31] require 25.6 ms and 19.2 ms per image, respectively. This is mainly because SSOD suffers from efficiency bottlenecks in feature fusion and region proposal generation, while IOD has relatively limited capability in modeling fine-grained defects, resulting in lower overall inference speed and detection performance. In terms of model complexity, PSDD contains only 4.0 M parameters and 13.8 G FLOPs, achieving an excellent balance between detection accuracy and computational cost. Wave-ViT [36], due to its complex self-attention mechanisms, has a substantially larger model size, with 66.1 M parameters and 96.4 G FLOPs. In addition, PSDD performs strongly across defect categories, achieving the highest mAP in five of the eight categories (Bc, Cr, Scr, Fi, and Tl) at 98.8%, 84.7%, 87.0%, 90.4%, and 91.3%, respectively. Notably, its detection performance on the subtle defects Cr, Scr, Fi, and Tl is particularly strong, further validating the model’s robustness in high-precision defect detection tasks.
Figure 5 presents the changes in the mAP metrics of the compared models on the PVEL-AD dataset over training iterations. The red curve at the top of the figure represents PSDD. The results indicate that the performance of PSDD gradually improves with training iterations and ultimately reaches the best result.
Notably, after approximately the 125th epoch, PSDD significantly outperforms all other models in $\mathrm{mAP}_{50}$, demonstrating its strong learning capability and faster convergence. In addition, PSDD consistently maintains an advantage in the $\mathrm{mAP}_{50:95}$ metric, indicating high sensitivity to defect details and stable recognition performance.

4.4. Ablation Experiments

To further analyze the contribution of each component, we progressively integrate the SFRM and BNSB modules into the baseline model and conduct ablation experiments. The results are reported in Table 2.
When the SFRM module is added to the baseline model alone, the model’s $\mathrm{mAP}_{50}$ increases by 2.5 percentage points (from 87.4% to 89.9%), with P and R improving by 1.6 and 6.4 percentage points, respectively. In particular, the most significant performance gains are observed in the Cr and Scr categories, increasing from 78.9% to 84.7% and from 73.2% to 81.2%, respectively. These results indicate that SFRM, by combining a spatial-to-channel mapping strategy with a channel attention mechanism during feature downsampling, assigns higher responses to subtle and critical defect regions, effectively preserving fine defect information and thereby enhancing the model’s capability to detect subtle defects.
When the BNSB module is integrated into the baseline model alone, the model’s $\mathrm{mAP}_{50}$ increases by 4.7 percentage points (reaching 92.1%), with P and R improving by 3.6 and 3.8 percentage points, respectively. The most significant performance gain is observed in the Vd category, increasing from 73.0% to 96.8%, while notable improvements are also achieved in the Fi and Tl categories. This improvement is mainly attributed to BNSB, which introduces the BAM module in the main path to effectively suppress complex background noise, while retaining residual connections in the auxiliary path to preserve original feature information, thereby significantly enhancing the representation of defect features. This mechanism not only improves the model’s detection performance for categories with strong background interference but also improves the overall detection accuracy.
When the SFRM and BNSB modules are integrated simultaneously, the model achieves the best overall performance, with $\mathrm{mAP}_{50}$, P, and R reaching 93.6%, 90.8%, and 89.2%, respectively. Furthermore, as shown in Figure 6, we compare the confusion matrix of the PSDD model (Figure 6b) with that of the baseline model (Figure 6a) to analyze their performance differences in the detection of various defect categories. Compared with the baseline model, PSDD improves the detection accuracy for the subtle defect categories Cr, Scr, and Tl by 0.04, 0.17, and 0.06, respectively. This demonstrates that our PSDD has superior capability in identifying subtle defects under complex backgrounds.

4.5. Visualization

Figure 7 presents a visual comparison between the proposed PSDD method and several mainstream object detectors on the PVEL-AD dataset. As shown in Figure 7, PSDD is capable of precisely localizing defect regions while maintaining stable confidence distributions, achieving high consistency with the ground-truth defect areas and demonstrating superior detection accuracy and robustness. In contrast, RT-DETR [33] fails to fully recognize Fi and Tl defects and produces false detections in the Cr category; YOLOv8 [25] exhibits missed detections for Fi and Tl, fragments the Cr defect into two separate regions, and misclassifies Scr as Cr; Faster R-CNN [27] tends to confuse Scr with Cr and assigns relatively low confidence scores to Fi and Tl.
We conducted Class Activation Mapping (CAM) [37] visualization analysis for PSDD and several comparative models to further evaluate PSDD’s capability in detecting subtle defects under complex backgrounds and to enhance its interpretability. The results in Figure 8 indicate that PSDD can accurately focus on subtle defect regions and generate strong activations at critical locations. The detected regions align closely with the ground-truth defect areas, underscoring the model’s accuracy and reliability in fine-grained defect localization. In contrast, RT-DETR [33] exhibits relatively broad activation regions, which weaken defect responses and increase the risk of false positives; YOLOv8 [25] shows insufficient feature representation for fine-grained defects, with dispersed activations failing to effectively cover defect areas; Faster R-CNN [27] produces some focused activations in certain discrete defect regions, but its overall response strength and spatial continuity are limited, constraining its ability to identify subtle defects in complex textured environments. In summary, PSDD demonstrates significant advantages in feature extraction and spatial localization, effectively improving the accuracy and robustness of subtle defect detection.

4.6. Generalization Experiments

To comprehensively assess the generalization performance of the proposed PSDD model, this study selects eleven representative object detection models for comparison, covering the mainstream detection architectures. Among them, the single-stage detection models include TOOD [24], YOLOv8 [25], YOLOv10 [38], YOLOv11 [20], YOLOX [16], and Mamba YOLO [26]; the two-stage detection models include Faster R-CNN [27]; and the Transformer-based detection models include Deformable DETR [32], RT-DETR [33], and Swin-Transformer [34]. To systematically compare these models, their overall performance was quantified using $\mathrm{mAP}_{50}$, P, and R, followed by a detailed evaluation of detection accuracy across different defect categories. All experiments are carried out on the NEU-Det dataset [39], a widely used benchmark for steel surface defect detection released by Northeastern University. The NEU-Det dataset consists of 1800 grayscale images with a resolution of 200 × 200 pixels, evenly distributed among six typical defect types. Among them, Crazing (Cr), Inclusion (In), and Pitted Surface (Ps) are fine-grained defects characterized by small sizes and low contrast, whereas Patches (Pa), Rolled-in Scale (Rs), and Scratches (Sc) are large-scale defects with clear contours.
Table 3 presents a comparative evaluation of PSDD against other representative object detection models on the NEU-Det dataset. PSDD achieves the best overall performance, with an $\mathrm{mAP}_{50}$ of 80.9%, P of 81.4%, and R of 80.6%. Across different defect categories, PSDD attains detection accuracies of 52.4% for Cr, 84.1% for In, 93.4% for Pa, 90.6% for Ps, and 96.4% for Sc. In particular, it demonstrates a clear advantage in detecting low-contrast and subtle defects (Cr and In), highlighting its enhanced capability to identify subtle defects under complex backgrounds.

5. Conclusions

This paper proposes a novel photovoltaic cell surface defect detector (PSDD), designed to improve detection performance under complex background conditions. Compared with existing methods, including traditional CNN-based detectors and attention-guided architectures, PSDD demonstrates superior capability in enhancing subtle defect features and effectively suppressing background noise. This improvement is primarily attributed to the Subtle Feature Refinement Module (SFRM), which captures fine-grained defect information through spatial feature splitting and channel attention, and the Background Noise Suppression Block (BNSB), which employs a dual-path fusion strategy to balance noise suppression and information retention. Experimental results on the public PVEL-AD dataset further validate the advantages of PSDD. In addition to achieving an $\mathrm{mAP}_{50}$ of 93.6%, the results demonstrate the effectiveness of SFRM in extracting subtle defect features and the role of BNSB in distinguishing foreground defects under complex backgrounds. Meanwhile, we note that PSDD may face challenges in detecting extremely small or low-contrast defects, and the added modules slightly increase computational complexity. Therefore, future work will explore optimization strategies for extremely small and low-contrast defects and refine the module designs to reduce computational overhead, thereby enhancing practicality and detection performance. We also plan to conduct evaluations on more diverse photovoltaic defect datasets to assess the generalization capability of PSDD and comprehensively examine its robustness under varying environmental and imaging conditions.

Author Contributions

Conceptualization, Y.S. and Y.F.; methodology, Y.S.; validation, G.H.; formal analysis, G.H., Y.F. and C.X.; resources, Y.F.; data curation, C.X.; writing—original draft preparation, G.H.; writing—review and editing, Y.S., H.G. and Y.F.; visualization, G.H.; supervision, Y.S. and H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by the Henan Provincial Science and Technology Program under Grants 241111212200 and 252102220046, in part by the Henan Joint Fund for Science and Technology Research under Grant 20240012, in part by the Key Scientific Research Projects of Higher Education Institutions in Henan Province under Grants 25B520004, 26A520036 and 26A520037, and in part by the Open Fund of the Engineering Research Center of Intelligent Swarm Systems, Ministry of Education under Grant ZZU-CISS-2024004.

Data Availability Statement

Our code is available at https://github.com/tm924222/PSDD (accessed on 27 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, S.; Cai, W.; Zheng, X.; Lv, X.; An, K.; Cao, Y.; Cheng, H.S.; Dai, J.; Dong, X.; Fan, S.; et al. Global readiness for carbon neutrality: From targets to action. Environ. Sci. Ecotechnol. 2025, 25, 100546.
  2. Rana, Z.; Zamora, P.P.; Soliz, A.; Soler, D.; Reyes Cruz, V.E.; Cobos-Murcia, J.A.; Galleguillos Madrid, F.M. Solar Panel Corrosion: A Review. Int. J. Mol. Sci. 2025, 26, 5960.
  3. Leung, T.L.; Willson, G.K.; Fimbres-Weihs, G.; Deng, R.; Chang, N.; Tan, V.; Abbas, A.; Ho-Baillie, A. A new perspective for evaluating circularity of photovoltaic module recycling and technology developments. Cell Rep. Phys. Sci. 2025, 6, 102547.
  4. Fu, B.; Xiong, J.; Jv, T.; Chen, S.; Liang, T.; Ma, H.; Zhang, X.; Pan, D.; Zou, B.; Liang, G.; et al. Reaction Kinetics Regulation Suppressed Carrier Recombination Loss for High-Efficient Solution-Based Antimony Selenosulfide Photovoltaic Devices. Adv. Energy Mater. 2025, 15, 2500586.
  5. Tarigan, E. Identification of Early Operational Defects in Photovoltaic Modules: A Case Study of a 24.9 MWp Solar PV System in Sumatra, Indonesia. Unconv. Resour. 2025, 6, 100156.
  6. Akram, M.W.; Li, G.; Jin, Y.; Chen, X.; Zhu, C.; Zhao, X.; Khaliq, A.; Faheem, M.; Ahmad, A. CNN based automatic detection of photovoltaic cell defects in electroluminescence images. Energy 2019, 189, 116319.
  7. Meng, Z.; Xu, S.; Wang, L.; Gong, Y.; Zhang, X.; Zhao, Y. Defect object detection algorithm for electroluminescence image defects of photovoltaic modules based on deep learning. Energy Sci. Eng. 2022, 10, 800–813.
  8. Liu, Q.; Liu, M.; Wang, C.; Wu, Q.J. An efficient CNN-based detector for photovoltaic module cells defect detection in electroluminescence images. Sol. Energy 2024, 267, 112245.
  9. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 260–275.
  10. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2025, 37, 107984–108011.
  11. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
  12. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  13. Park, J. Bam: Bottleneck attention module. arXiv 2018, arXiv:1807.06514.
  14. Chen, S.; Lu, Y.; Qin, G.; Hou, X.; Sun, Y. CSPD-DETR: Real-time silicon crystalline photovoltaic cell surface defect detection transformer for building photovoltaic systems. J. Build. Eng. 2025, 108, 112810.
  15. Fioresi, J.; Colvin, D.J.; Frota, R.; Gupta, R.; Li, M.; Seigneur, H.P.; Vyas, S.; Oliveira, S.; Shah, M.; Davis, K.O. Automated defect detection and localization in photovoltaic cells using semantic segmentation of electroluminescence images. IEEE J. Photovoltaics 2021, 12, 53–61.
  16. Ge, Z. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
  17. Su, B.; Chen, H.; Chen, P.; Bian, G.; Liu, K.; Liu, W. Deep learning-based solar-cell manufacturing defect detection with complementary attention network. IEEE Trans. Ind. Inform. 2020, 17, 4084–4095.
  18. Lang, D.; Lv, Z. A PV cell defect detector combined with transformer and attention mechanism. Sci. Rep. 2024, 14, 20671.
  19. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  20. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
  21. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
  22. Su, B.; Zhou, Z.; Chen, H. PVEL-AD: A large-scale open-world dataset for photovoltaic cell anomaly detection. IEEE Trans. Ind. Inform. 2022, 19, 404–413.
  23. Ketkar, N. Stochastic gradient descent. In Deep Learning with Python: A Hands-on Introduction; Springer: Berlin/Heidelberg, Germany, 2017; pp. 113–132.
  24. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. Tood: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 3490–3499.
  25. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
  26. Wang, Z.; Li, C.; Xu, H.; Zhu, X. Mamba YOLO: SSMs-based YOLO for object detection. arXiv 2024, arXiv:2406.05835.
  27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
  28. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  29. Vu, T.; Jang, H.; Pham, T.X.; Yoo, C. Cascade rpn: Delving into high-quality region proposal network with adaptive convolution. Adv. Neural Inf. Process. Syst. 2019, 32.
  30. Mustikovela, S.K.; De Mello, S.; Prakash, A.; Iqbal, U.; Liu, S.; Nguyen-Phuoc, T.; Rother, C.; Kautz, J. Self-supervised object detection via generative image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 8609–8618.
  31. Joseph, K.; Rajasegaran, J.; Khan, S.; Khan, F.S.; Balasubramanian, V.N. Incremental object detection via meta-learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 9209–9216.
  32. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
  33. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
  34. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
  35. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
  36. Yao, T.; Pan, Y.; Li, Y.; Ngo, C.W.; Mei, T. Wave-vit: Unifying wavelet and transformers for visual representation learning. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 328–345.
  37. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
  38. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–21.
  39. Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.
Figure 1. Heatmaps generated by our method and three object detection methods. The red boxes indicate crack defects, while the blue boxes represent star-shaped crack defects.
Figure 2. Overall architecture of our PSDD, which consists of the backbone network, feature fusion stage, and prediction head.
Figure 3. Architecture of the Subtle Feature Refinement Module.
Figure 4. Structure of the background noise suppression block.
Figure 5. The mAP variation curves of PSDD compared to other models on the PVEL-AD dataset. (a,b) depict the trends of mAP50 and mAP50:95, respectively.
Figure 6. Confusion matrix comparison between the baseline and PSDD models. Labels represent a 9-class classification, including ‘bg’ for background (non-defective regions).
Figure 7. Visualization comparison of different models on PVEL-AD.
Figure 8. Heatmap visualization comparison of different models on the PVEL-AD dataset. The yellow boxes indicate finger defects, the red boxes indicate crack defects, and the blue boxes represent star-shaped crack defects.
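Table 1. Performance comparison of multiple models on the PVEL-AD dataset and mAP (%) results across all defect categories.
| Model | mAP50 | mAP50:95 | P | R | Params (M) | FLOPs (G) | Speed (ms) | Bc | Cr | Scr | Fi | Tl | Sci | Hd | Vd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single-stage Detectors | | | | | | | | | | | | | | | |
| TOOD [24] | 66.0 | 47.9 | 61.3 | 62.7 | 3.7 | 17.8 | 5.9 | 98.0 | 75.2 | 78.1 | 89.3 | 88.2 | 99.2 | – | – |
| Yolov8 [25] | 87.4 | 61.9 | 90.3 | 81.5 | 3.5 | 8.1 | 9.0 | 97.0 | 78.9 | 73.2 | 88.9 | 89.7 | 99.5 | 98.8 | 73.0 |
| Yolov10 [10] | 79.2 | 51.1 | 78.4 | 72.6 | 6.5 | 16.6 | 17.4 | 98.6 | 77.4 | 79.2 | 90.6 | 91.2 | 99.5 | 98.5 | 89.4 |
| Yolov11 [20] | 90.6 | 51.7 | 78.9 | 72.9 | 8.0 | 14.5 | 12.0 | 98.3 | 83.8 | 83.7 | 89.7 | 90.8 | 99.5 | 98.5 | 74.4 |
| YoloX [16] | 90.3 | 62.2 | 85.7 | 87.1 | 9.0 | 7.0 | 20.0 | 98.4 | 80.6 | 81.8 | 89.8 | 89.8 | 99.5 | 98.6 | 83.9 |
| Mamba YOLO [26] | 91.6 | 62.1 | 87.9 | 85.5 | 5.8 | 13.2 | 14.4 | 98.8 | 80.7 | 80.7 | 89.9 | 90.8 | 99.4 | 96.2 | 96.9 |
| Two-stage Detectors | | | | | | | | | | | | | | | |
| Dynamic R-CNN [9] | 89.8 | 62.3 | 83.2 | 88.8 | 45.0 | 46.3 | 24.0 | 98.8 | 76.3 | 68.5 | 90.3 | 82.4 | 99.7 | 81.6 | – |
| Faster R-CNN [27] | 79.7 | 54.0 | 77.0 | 75.9 | 23.3 | 26.4 | 22.0 | 97.5 | 75.2 | 82.7 | 89.3 | 89.4 | 99.2 | 98.5 | – |
| Mask R-CNN [28] | 90.1 | 57.6 | 85.5 | 82.5 | 30.0 | 33.3 | 26.0 | 98.1 | 77.7 | 83.9 | 88.7 | 88.4 | 98.0 | 96.7 | 89.4 |
| Cascade RPN [29] | 87.8 | 61.9 | 87.6 | 79.5 | 47.3 | 136.6 | 35.5 | 98.8 | 79.6 | 82.2 | 89.7 | 88.9 | 99.5 | 96.9 | 67.9 |
| SSOD [30] | 77.3 | 52.4 | 76.9 | 71.3 | 26.4 | 19.2 | 25.6 | 98.0 | 80.7 | 80.8 | 83.6 | 88.2 | 98.5 | 89.3 | – |
| IOD [31] | 87.8 | 61.8 | 89.2 | 80.4 | 9.0 | 14.5 | 19.2 | 98.4 | 79.5 | 82.0 | 87.9 | 88.9 | 98.4 | 97.2 | 70.2 |
| Transformer-based Detectors | | | | | | | | | | | | | | | |
| DETR [11] | 90.4 | 61.2 | 78.9 | 89.7 | 41.6 | 60.5 | 27.0 | 95.8 | 80.1 | 71.8 | 87.4 | 90.0 | 99.2 | 86.8 | 99.5 |
| Deformable DETR [32] | 91.2 | 61.9 | 85.4 | 87.0 | 39.9 | 52.4 | 24.5 | 97.7 | 80.96 | 83.8 | 89.7 | 90.0 | 99.4 | 98.5 | 90.8 |
| Rt-DETR [33] | 89.1 | 63.0 | 89.3 | 78.9 | 19.9 | 57.0 | 20.7 | 98.3 | 82.1 | 78.9 | 89.9 | 90.3 | 98.3 | 97.5 | 77.4 |
| Swin-Transformer [34] | 78.4 | 54.6 | 71.3 | 87.1 | 54.6 | 136.0 | 33.7 | 89.9 | 60.0 | 42.1 | 88.9 | 74.8 | 99.6 | 99.6 | 69.7 |
| DINO [35] | 90.4 | 59.0 | 83.2 | 87.2 | 59.3 | 123.0 | 32.0 | 98.7 | 80.1 | 86.8 | 89.9 | 91.1 | 99.5 | 89.3 | 86.7 |
| Wave-ViT [36] | 90.6 | 61.8 | 85.9 | 88.3 | 66.1 | 96.4 | 12.4 | 98.6 | 78.2 | 81.0 | 89.9 | 91.1 | 99.5 | 96.3 | 90.6 |
| PSDD (Ours) | 93.6 | 65.3 | 90.8 | 89.2 | 4.0 | 13.8 | 10.0 | 98.8 | 84.7 | 87.0 | 90.4 | 91.3 | 99.5 | 99.3 | 99.4 |
The best results are highlighted in bold, and ‘–’ indicates unavailable results.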
Table 2. Results of ablation experiments for each module with 8 defect types.
| Baseline | SFRM | BNSB | mAP50 (%) | mAP50:95 (%) | P (%) | R (%) | Bc (%) | Cr (%) | Scr (%) | Fi (%) | Tl (%) | Sci (%) | Hd (%) | Vd (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ✓ | – | – | 87.4 | 61.9 | 83.6 | 81.5 | 97.0 | 78.9 | 73.2 | 77.2 | 89.7 | 99.5 | 98.8 | 73.0 |
| ✓ | | | 89.9 | 62.7 | 85.2 | 87.9 | 98.3 | 84.7 | 81.2 | 89.6 | 89.4 | 99.3 | 99.4 | 77.5 |
| ✓ | | | 92.1 | 64.5 | 87.2 | 85.3 | 98.2 | 80.8 | 81.0 | 90.9 | 90.4 | 99.5 | 99.4 | 96.8 |
| ✓ | ✓ | ✓ | 93.6 | 65.3 | 90.8 | 89.2 | 98.8 | 84.7 | 87.0 | 90.4 | 91.3 | 99.5 | 99.3 | 99.4 |
✓ indicates that the corresponding module is included; “–” means the module is not included.
Table 3. Comparison results of different models on the NEU-Det dataset.
| Model | mAP50 | P | R | Cr | In | Pa | Ps | Rs | Sc |
|---|---|---|---|---|---|---|---|---|---|
| TOOD [24] | 72.4 | 75.7 | 68.1 | 42.8 | 71.5 | 87.7 | 80.9 | 59.9 | 91.9 |
| Yolov8 [25] | 77.3 | 75.4 | 70.8 | 49.7 | 77.9 | 90.1 | 88.0 | 63.7 | 94.7 |
| Yolov10 [38] | 74.2 | 68.3 | 69.7 | 40.2 | 74.3 | 91.9 | 83.3 | 61.4 | 94.4 |
| Yolov11 [20] | 79.4 | 79.1 | 71.7 | 51.0 | 79.5 | 92.3 | 87.3 | 70.9 | 95.2 |
| YoloX [16] | 74.5 | 70.4 | 69.7 | 37.8 | 79.3 | 90.3 | 86.0 | 60.6 | 92.5 |
| Faster R-CNN [27] | 70.9 | 67.4 | 66.7 | 36.2 | 74.2 | 89.1 | 77.9 | 57.0 | 90.9 |
| Rt-DETR [33] | 70.6 | 69.3 | 68.1 | 33.3 | 73.5 | 88.0 | 81.7 | 55.4 | 91.6 |
| Deformable DETR [32] | 71.1 | 70.3 | 67.2 | 30.5 | 76.3 | 89.1 | 82.7 | 56.2 | 91.6 |
| Swin-Transformer [34] | 70.3 | 69.9 | 65.3 | 31.8 | 74.2 | 88.0 | 79.3 | 57.2 | 90.9 |
| Mamba YOLO [26] | 69.6 | 72.3 | 63.7 | 30.7 | 73.8 | 88.7 | 76.5 | 58.0 | 90.1 |
| PSDD (Ours) | 80.9 | 81.4 | 80.6 | 52.4 | 84.1 | 93.4 | 90.6 | 68.3 | 96.4 |
Note: Best results are shown in bold.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
