Enhanced Detection of Subway Insulator Defects Based on Improved YOLOv5

Huang, Lifeng; Li, Yongzhen; Wang, Weizu; He, Zemin

doi:10.3390/app132413044

¹ College of Engineering, South China Agricultural University, Guangzhou 510642, China

² ZKROT Technology Co., Ltd., Guangzhou 510710, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(24), 13044; https://doi.org/10.3390/app132413044

Submission received: 24 October 2023 / Revised: 30 November 2023 / Accepted: 1 December 2023 / Published: 7 December 2023

Abstract:

Insulators, pivotal to the integrity of railway catenaries, demand impeccable functioning to prevent system failures. Their consistent assessment is vital for railway safety. Current insulator evaluations in subways predominantly involve human intervention, a method fraught with inefficiencies, inaccuracies, and oversights, exacerbated by the complex backdrop of subway tunnels and minuscule defect dimensions. This study introduces an enhanced algorithm, anchored in the YOLOv5 framework, to refine insulator defect identification. Challenges in defect detection include limited and imbalanced data samples and poor adaptability. To address these, an accurate catenary model mirrors the subway line’s architecture, facilitating the creation of synthetic instances of both intact and impaired insulators. An atomization technique augments the dataset volume, fortifying the algorithm’s resilience in reduced visibility conditions, such as fog. Tackling sample equilibrium, the study introduces an equilibrium loss function, assigning disparate weights to various sample categories during training, thereby sharpening the algorithm’s focus on positive instances, particularly those that are challenging to discern, and rectifying the disproportion in sample categories. Incorporating lightweight structures like GhostNet and the Efficient Channel Attention Network (ECA-Net) channel attention scheme not only diminishes the network’s computational demands, thereby elevating the detection capabilities, but also minimizes superfluous data processing, enhancing the accuracy in identifying smaller targets. Empirical analyses indicate substantial model optimization: a size reduction to 60% of its original (from 15 MB to 9 MB), a near 1.4 pp increase in mean average precision (mAP) to 96.57%, and a tripling of the detection speed (from 30 to 90 FPS).
Real-world image assessments further reveal a mAP improvement of approximately 2.5 pp (reaching 98.43%), confirming the model’s suitability for real-time applications.

Keywords: YOLOv5; railway; insulator; sample scarcity; atomization; lightweight; attention mechanism

1. Introduction

With the rapid urbanization of China, the role and significance of urban rail transport in its public transportation system have become increasingly prominent. As of the end of 2022, 290 urban rail transit lines spanning a total length of 9584 km have been opened in 53 cities across mainland China, with the subway system accounting for the majority at 78.3%. Guangzhou has the third-longest metro network in China, with a total metro mileage of 621 km. The daily average passenger flow of the Guangzhou Metro network was more than 8.3 million in the first 9 months of 2023. The subway has emerged as the primary mode of transportation for urban residents, underscoring the criticality of ensuring its safe operation. Consequently, the operation, maintenance, and overhaul of the subway’s pantograph-catenary system have assumed greater importance. Subway catenary systems, integral to supplying power to electric locomotives through complex interactions between pantographs and overhead contact lines, are characterized by high failure rates due to their complex mechanical and electrical dynamics. Insulators, key components of these electrified railway lines, provide crucial mechanical support and electrical insulation. Nevertheless, these insulators are subject to degradation and defects owing to prolonged exposure to outdoor environments, with challenges stemming from variations in temperature, humidity, and other natural elements. Compromised insulators can lead to substantial service disruptions, triggering not only significant economic setbacks but also detrimental societal consequences [1]. Given these potential impacts, the rigorous inspection of insulators constitutes a critical measure in ensuring the uninterrupted functionality of contact line transmission systems [2]. Traditional inspection methodologies typically involve manual examinations or conventional image inspections.
However, manual inspection is both hazardous and inefficient, relying primarily on non-motorized vehicles such as railroad flatcars that require multiple individuals to push them along the track. An inspector sits on the herringbone ladder of the flatcar and uses a flashlight or other light source to observe and inspect the insulators and other components on the line. After a certain distance has been inspected, the inspector is relieved by a colleague, who continues the inspection. This maintenance process is characterized by poor working conditions, high labor intensity, low detection efficiency, and a high risk of misdetection and omission, which can result in operational safety accidents. Conventional image processing techniques, which employ edge detection and color features, tend to be sluggish and lack accuracy in defect identification [3]. For open-air railway settings, drones have emerged as an efficient mechanism to capture insulator images over extensive areas. However, the applicability of this method diminishes within the confines of railway tunnels, especially those of subway systems, areas that current research scarcely addresses. The rapid advancements in deep learning technology, by contrast, herald new opportunities for the accurate and efficient detection of insulator defects within subway environments. This burgeoning field shows substantial promise in overcoming the limitations of existing methods, potentially redefining the standards for insulator inspection in subway catenary systems.

Currently, insulator detection algorithms predominantly fall into two categories: two-stage and one-stage detection algorithms. The former, exemplified by models such as the Region-Based Convolutional Neural Network (R-CNN) [4], Fast R-CNN [5], Faster R-CNN [6], and Mask R-CNN, initiates the detection process by generating region proposals on the input image, followed by feature extraction and classification. Despite their efficacy, these models are often encumbered by extensive computational requirements, resulting in substantial sizes and slow processing speeds that impede real-time application. Conversely, one-stage algorithms, including the Single-Shot MultiBox Detector (SSD) [7,8], RetinaNet, and the You Only Look Once (YOLO) [9,10,11] series, streamline the detection process by treating object localization as a regression problem, eliminating the need to generate region proposals. This approach significantly enhances the detection speed. Nonetheless, a limitation arises when these algorithms are deployed for small target detection, where a compromise in detection accuracy is often observed. Given the critical balance between speed and accuracy in real-time applications, there is an imperative to optimize one-stage algorithms.

Yi et al. [12] modified the sliding window ratio and implemented a hard sample adversarial generation strategy to enhance the efficiency of insulator detection using the Faster R-CNN network, but the enhanced model’s detection speed remained limited at 1.2 frames per second (FPS). Zhao et al. [13] introduced an insulator identification technique merging an attention mechanism with Faster R-CNN to augment the recognition accuracy; however, the additional network parameters reduced the model’s detection speed. Aiming for meticulous insulator defect detection, Jiang et al. [14] devised a thorough multi-level perception approach based on the SSD algorithm, although this method required an extended duration for image processing. YOLOv3, a seminal model in the YOLO series, boasts substantial merits in detection speed and accuracy. Liu et al. [15] introduced an improved YOLOv3-based insulator recognition algorithm, integrating YOLOv3 with dense blocks to refine the feature extraction network and employing a multi-level feature mapping module to enhance the network’s feature fusion capabilities. Yao et al. [16] put forth a GIOU-YOLOv3 method for insulator detection and positioning, substituting the original loss function with the GIOU loss function to refine the insulator detection accuracy without increasing the model size, although substantial insulator target omissions were observed in the test results. Liu et al. [17] suggested an enhanced YOLOv4 method for power insulator defect detection, incorporating a weight coefficient in the balanced cross-entropy to amplify the loss function’s impact, and augmenting the network depth with additional convolution layers surrounding the spatial pyramid structure. While experimental analyses demonstrated the method’s proficiency in insulator defect identification, the efficiency did not meet optimal standards. In 2020, Ultralytics unveiled YOLOv5 [18], combining a focus structure, Generalized Intersection over Union (GIoU) Loss [19], and a feature pyramid structure, potentially addressing issues related to complex backgrounds, diminutive targets, and overlapping objects in conventional images. Jia et al. [20] formulated a DE-YOLO detection network predicated on YOLOv5, advancing the accuracy of insulator extraction from complex backgrounds, but its efficacy did not satisfy real-time detection criteria.

In machine learning, loss functions quantify the disparity between a model’s predictions and actual data. Altering these functions can enhance the accuracy and bolster the robustness of various models. Li et al. [21] innovated within this space by modifying YOLOv5 algorithm’s loss function, implementing dynamic weight adjustments for positive and negative samples to heighten the detection accuracy, albeit at the expense of a reduced detection speed. Tang et al. [22] introduced an approach that incorporated a triple attention mechanism into the network, employing Complete IoU (CIoU) Loss as the network regression loss function to expedite network convergence. However, this methodology has yet to achieve the benchmarks necessary for real-time detection, indicating a need for further refinements.

Subway insulators, predominantly situated in tunnel environments, present a unique challenge for defect inspection due to the constrained inspection windows—typically in the early hours—and the need for the immediate replacement of defective units upon detection. This practice renders the accumulation of a substantial defective insulator image dataset for training purposes challenging. Compounding this, the elevated humidity levels common in southern locales and the prevalent foggy conditions within tunnels further complicate inspection efforts. Addressing these multifaceted challenges, this study pioneers the construction of a partial contact network model, meticulously mirroring the subway line’s actual structure and dimensions. This innovative approach facilitates the generation of both normal and defective insulator samples, thereby mitigating the issue of sample insufficiency. To enhance the model’s applicability in adverse weather conditions, the dataset is augmented through the integration of a fogging algorithm [23], ensuring adaptability in fog-prevalent scenarios. Furthermore, acknowledging the necessity for sample equilibrium during model training, the study introduces a Balanced Loss (BL) function [24]. This strategic function assigns differential weighting to various sample categories—positive, negative, challenging, or straightforward—thereby optimizing the model’s attentiveness to positive and particularly elusive samples. This method effectively addresses the prevalent imbalance among diverse sample types. In the final phase of enhancement, the incorporation of a lightweight GhostNet module [25] and Efficient Channel Attention Network (ECA-Net) [26] channel attention mechanism proves instrumental. This dual integration not only curtails the network’s computational demands but also significantly accelerates the detection velocity, all without compromising the fidelity of image information capture. 
Simultaneously, the model’s propensity for redundant information is minimized, leading to an improvement in the detection accuracy concerning diminutive targets. Pollution, flashover, etc., can also affect the normal operation of insulators. Datasets for these conditions are more difficult to obtain because they occur far less frequently in the subway system; if sufficient data can be obtained, however, the present method will still be applicable.

2. Related Work

2.1. YOLOv5 Model Architecture

The YOLO series, renowned for its speed and exceptional portability, has garnered widespread utilization across various domains. The architecture of the YOLOv5 model is segmented into four principal components, the input, backbone, neck, and head, each depicted in Figure 1. The process initiates with the input network sampling the image, achieving downsampling while preserving comprehensive image information. Subsequently, the backbone network undertakes the primary task of image feature extraction, facilitated through convolutional modules coupled with cross-stage local modules ingrained with residual structures. Notably, the Conv+BN+LeakyReLu (CBL) module, comprising convolution, batch normalization, and the Leaky ReLU activation function, and the cross-stage local C3 module, contribute significantly to this phase. Progressing to the neck network, the generation of feature pyramids is conducted, incorporating both a feature pyramid network (FPN) and path aggregation network (PAN). The FPN serves a critical role in assimilating robust semantic attributes from the higher levels, employing a top-down approach. In contrast, the PAN complements the FPN by transmitting potent localization features, preserving a wealth of image attributes critical for the ensuing object detection derived from the image feature information extracted by the backbone network. This dual system enhances the network’s capacity for feature integration. Concluding the process, the head network is responsible for the ultimate detection stage. This involves the employment of a loss function for the predicted bounding box and the implementation of the Non-Maximum Suppression (NMS) algorithm. YOLOv5 employs GIOU_Loss as its chosen loss function for predicted boxes, effectively addressing the challenge of overlapping bounding boxes and augmenting the velocity and accuracy of box regression. 
Concurrently, NMS functions to refine the detection boxes during the prediction phase of target detection, thereby optimizing the model’s output.
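The section names GIoU loss and NMS without stating their formulas. The following is a minimal pure-Python sketch of the standard definitions, not the YOLOv5 implementation itself: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the function names and the 0.5 threshold are illustrative choices.

```python
def _inter_union(a, b):
    """Intersection and union areas of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter, area_a + area_b - inter


def iou(a, b):
    inter, union = _inter_union(a, b)
    return inter / union


def giou(a, b):
    """GIoU = IoU - (|C| - |A u B|) / |C|, with C the smallest enclosing box."""
    inter, union = _inter_union(a, b)
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def giou_loss(a, b):
    return 1.0 - giou(a, b)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes (it becomes negative), which is why it can still drive box regression when a prediction misses the target entirely.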

2.2. BL Function

The YOLOv5 model employs a cross-entropy loss function for classification tasks; however, this approach does not effectively mitigate the challenges posed by sample imbalance throughout the training process. Particularly in scenarios such as insulator defect detection, the majority of identifiable defects, though prevalent, are minuscule and elude unaided visual detection, especially within the complex confines of actual subway environments. This complexity leads to a preponderance of negative samples within the pre-selected boxes generated by YOLOv5, precipitating an inequitable distribution between positive and negative samples. Consequently, the acquisition of essential difficult and positive samples for training becomes cumbersome, while an abundance of negative samples exerts an undue influence over the gradient descent process, diminishing the accuracy of defect detection. Furthermore, during practical training scenarios, the proportion of intact and defective insulators is disproportionately skewed, with the latter being notably more challenging to discern, potentially undermining the algorithm’s detection efficacy. To navigate this predicament, this study introduces a BL function, supplanting the traditional cross-entropy loss. This strategic adaptation enables the model to allocate increased attention to positive and challenging samples throughout the training phase. Given that cross-entropy quantifies the divergences between two probabilistic distributions, it is conventionally harnessed in classification endeavors to gauge the discrepancy between the distributions of labels and predictions. The equation for cross-entropy loss [24] is shown as follows:

$$L_{CE} = \begin{cases} -\log(p), & \text{positive samples} \\ -\log(1 - p), & \text{negative samples} \end{cases} \tag{1}$$

where $p$ represents the probability that a sample is a positive sample. The balanced positive and negative sample loss is achieved by adding a parameter $\alpha$ before the cross-entropy loss, as shown in the following equation of $L_{CE^{r}}$:

$$L_{CE^{r}} = \begin{cases} -\alpha \log(p), & \text{positive samples} \\ -(1 - \alpha) \log(1 - p), & \text{negative samples} \end{cases} \tag{2}$$

where $\alpha$ can balance the ratio of positive and negative samples, but it does not rectify the disproportion between straightforward and complex samples. Typically, datasets exhibit a preponderance of simple samples, significantly influencing the cumulative total loss due to their abundance. However, in defect detection, an overemphasis on straightforward samples can undermine the defect identification accuracy. Consequently, the model necessitates a redirected focus towards more complex samples, such as damaged insulators. Therefore, reducing the loss associated with simple samples inherently elevates the importance of complex samples within the model’s purview. The equation of $L_{BL}$ to obtain BL is expressed as follows:

$$L_{BL} = \begin{cases} -\alpha (1 - p_{1})^{\gamma} \log(p_{1}), & \text{damaged insulators} \\ -\alpha (1 - p_{2})^{\gamma} \beta \log(p_{2}), & \text{normal insulators} \\ -\dot{\alpha} (1 - \dot{p})^{\gamma} \log(1 - \dot{p}), & \text{other} \end{cases} \tag{3}$$

where $\dot{p} = p_{1} + p_{2}$; $\dot{\alpha} = 1 - \alpha$; $p_{1}$ represents the probability of occurrence of damaged insulators; $p_{2}$ represents the probability of occurrence of normal insulators; $\beta$ is the ratio of the number of damaged-insulator samples to the number of normal-insulator samples; and $\gamma$ is used to reduce the loss of easily separable samples (i.e., normal insulator samples).
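Equation (3) can be read as a per-sample function of the two class probabilities. The sketch below transcribes it term by term for a single sample; the function name, the string labels, the clamping constant `eps`, and the default values of `alpha`, `gamma`, and `beta` are illustrative assumptions, since the text does not state the values used in training.

```python
import math


def balanced_loss(p1, p2, label, alpha=0.25, gamma=2.0, beta=1.0, eps=1e-12):
    """Per-sample BL following Equation (3).

    p1: predicted probability of a damaged insulator
    p2: predicted probability of a normal insulator
    label: "damaged", "normal", or "other" (hypothetical label names)
    """
    if label == "damaged":
        return -alpha * (1.0 - p1) ** gamma * math.log(max(p1, eps))
    if label == "normal":
        return -alpha * (1.0 - p2) ** gamma * beta * math.log(max(p2, eps))
    if label == "other":
        p_dot = p1 + p2          # \dot{p} = p1 + p2
        alpha_dot = 1.0 - alpha  # \dot{alpha} = 1 - alpha
        return -alpha_dot * (1.0 - p_dot) ** gamma * math.log(max(1.0 - p_dot, eps))
    raise ValueError(f"unknown label: {label}")


# A confidently classified damaged sample contributes far less loss than a hard one.
easy = balanced_loss(0.9, 0.05, "damaged")
hard = balanced_loss(0.1, 0.05, "damaged")
```

With `gamma=0` and `alpha=1` the damaged-insulator branch collapses to the plain cross-entropy term of Equation (1), which is a quick way to sanity-check an implementation.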

2.3. Improved Feature Extraction Network

Deep learning often employs particularly deep network models, which tend to generate analogous feature maps essential for effective input data utilization. Nonetheless, the creation of these congruent feature maps elevates the computational demands. GhostNet [25] presents a solution by revamping the YOLOv5 backbone network, using straightforward linear transformations to engender equally comprehensive feature maps while lessening the computational cost, thereby optimizing the network’s efficiency and portability. Central to GhostNet is its innovative ghost module, formulated to streamline the model architecture. This is achieved by employing depthwise separable convolution, a marked deviation from traditional convolution practices, leading to a significant reduction in the computational burden. Under standard convolution parameters, assuming an input feature map size of $l \times h \times c$ and an output of $l' \times h' \times c'$, with an $m \times m$ convolution kernel, the computational demand is $C_{1} = l' \times h' \times c' \times m \times m \times c$. The ghost module introduces an efficient convolution kernel sizing strategy, $m_{1}$ and $m_{2}$, for the initial standard and subsequent depthwise separable convolutions, respectively. This strategy not only generates $n$ extra feature maps but also dramatically reduces the computational needs to $1/n$ of the original. Within the training scope, the ghost module emphasizes pointwise convolution, enhancing the processing efficiency. This research integrates the ghost module into the foundational model, a strategic move that aims to minimize the model’s overall dimensions, alleviate its computational requirements, and boost the model’s detection alacrity.
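The cost comparison above can be checked numerically. The sketch below assumes the cost model from the GhostNet paper, in which the primary convolution produces $c'/n$ intrinsic channels and each of the remaining $n - 1$ groups is produced by a cheap $m_{2} \times m_{2}$ depthwise operation; the function names and the layer dimensions in the example are hypothetical.

```python
def standard_conv_cost(l_out, h_out, c_out, m, c_in):
    """C1 = l' * h' * c' * m * m * c (multiplications of a standard convolution)."""
    return l_out * h_out * c_out * m * m * c_in


def ghost_module_cost(l_out, h_out, c_out, c_in, m1, m2, n):
    """Primary m1 x m1 convolution plus (n - 1) cheap m2 x m2 depthwise operations."""
    intrinsic = c_out // n  # channels produced by the primary convolution
    primary = l_out * h_out * intrinsic * m1 * m1 * c_in
    cheap = l_out * h_out * intrinsic * (n - 1) * m2 * m2
    return primary + cheap


# Hypothetical layer: 40 x 40 output, 128 -> 256 channels, 3 x 3 kernels, n = 2.
c1 = standard_conv_cost(40, 40, 256, 3, 128)
c2 = ghost_module_cost(40, 40, 256, 128, 3, 3, 2)
speedup = c1 / c2
```

For this layer the ratio comes out just under 2, so the reduction to $1/n$ is an approximation that holds when the cheap depthwise term is small relative to the primary convolution.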

2.4. ECA-Net Feature Extraction

Utilizing lightweight networks restricts the computational parameters and model complexity; however, it risks the suboptimal extraction of pivotal feature information, potentially diminishing the detection accuracy. To counteract this and augment the model’s target feature detection capabilities, this study incorporates an attention mechanism within the network structure. In the realm of computer vision, attention mechanisms are pivotal in enabling models to distill more salient information, directing the focus towards specific target regions, and thereby enhancing the feature extraction efficacy. Specifically, channel attention mechanisms prioritize crucial channels within the input imagery, deepening the granularity of feature information across various channels. This proves advantageous for the model’s learning trajectory, aiding in the assimilation of detection target features and their subsequent localization. Notable examples include the Squeeze-and-Excitation Network (SE-Net) and ECA-Net [26]. ECA-Net, in particular, stands out due to its minimal operational demand while proficiently capturing cross-channel information. In pursuit of a balance between comprehensive feature learning—specific to insulators and their corresponding defects—and maintaining an overall lightweight framework, this research introduces ECA-Net, a streamlined channel attention mechanism. This strategic inclusion empowers the network to distill more significant data from the input imagery, all while operating with a reduced parameter set.

The strategic placement of attention mechanisms within a neural network model can significantly influence its performance, with various positions potentially yielding different results. This research proposes two methodologies for the integration of the ECA module, aiming to optimize the model’s capacity for salient information extraction.

1. One approach involves the incorporation of the ECA module within the ghost bottleneck module, specifically when the stride is configured to one. Here, the ECA module is inserted into the inaugural ghost module within the ghost bottleneck structure. This enhanced configuration, termed the ECA-Bottleneck module, serves as the building block for the ECA-GhostNet backbone network, with the intention of augmenting the network’s feature discernment capabilities. The architecture of this novel ECA-Bottleneck module is delineated in Figure 2.

2. During the feature fusion phase, the ECA module is positioned subsequent to the tri-layer feature extraction process, bolstering the model’s attentiveness to feature details. Additionally, the attention mechanism is deployed following PANet’s upsampling procedure. This strategic placement not only intensifies the amalgamation of global information but also fosters a more dynamic interplay of contextual information within the model, thereby enhancing its overall efficacy.
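The text does not spell out the computation inside the ECA module. As a hedged illustration, the sketch below follows the commonly described ECA-Net recipe: global average pooling per channel, a 1D convolution across neighboring channels with an adaptively chosen odd kernel size, and a sigmoid gate that rescales each channel. The uniform kernel weights stand in for learned parameters, and the nested-list feature map layout (channels, then rows, then columns) is an assumption made to keep the example dependency-free.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size derived from the channel count."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights=None):
    """feature_map: list of C channels, each an H x W nested list of floats."""
    c = len(feature_map)
    k = eca_kernel_size(c)
    if weights is None:
        weights = [1.0 / k] * k  # placeholder for the learned 1D kernel
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Local cross-channel interaction: zero-padded 1D convolution over channels.
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for i in range(c):
        z = sum(w * padded[i + j] for j, w in enumerate(weights))
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Excite: rescale every value of a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return scaled, gates
```

Because the only parameters are the `k` kernel weights, the module adds very little to the parameter count, which matches the lightweight motivation given in this section.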

3. Methods

3.1. Image Acquisition

The distinct characteristics of subway environments necessitate specific protocols, including restricting access to tunnels, typically limited to authorized personnel during pre-dawn maintenance hours. The operational constraints of subways preclude the use of unmanned aerial vehicles for image capture, a technique otherwise employed in conventional railway settings. Compounding these challenges is the need for the prompt replacement of defective insulators, which impedes the collection of authentic defect imagery in situ. In response to these limitations, and to obtain a robust dataset of defective insulators, this study engineered a representative model of the contact network, meticulously mirroring the actual structure and dimensions of a functioning subway contact network. The insulators incorporated into this model comprised both damaged and pristine specimens, sourced from the Guangzhou subway system. A comparative analysis between the constructed model and the authentic contact network was undertaken, with correlating visuals presented in Figure 3.

Upon the strategic configuration of the experimental framework and model, a diverse array of perspectives and proximities were employed to obtain a comprehensive dataset pertaining to the contact network insulators. Image acquisition was executed utilizing a Daheng ME2P-2621-4GC-P camera, the specifications of which are delineated in Table 1.

Within the constructed experimental simulation, image capture was primarily conducted during the early morning and evening hours under low-light conditions, mimicking the auxiliary illumination typically present in actual subway environments. Real-world parameters indicate a spatial range of 4 m to 6 m between the track and the contact wire, as evidenced by the 4-m tunnel height in the Guangzhou Metro Line 3 and a 6-m height in Line 18, with actual shooting distances varying with the camera angle. This study, therefore, employed six distinct shooting distances, ranging from 4 m to 6.5 m, in 0.5-m increments. Given the mobile nature of the detection equipment within the subway system, it is possible to secure images of insulators from multifaceted angles. Accordingly, the study utilized five shooting angles, dispersed at 22.5-degree intervals, spanning from 0 to 90 degrees. The predominant form of insulator impairment observed was structural damage (refer to Figure 4), with defect sizes ranging from 0.5 cm to 10.5 cm. The experimental dataset comprised a total of 1000 images, encompassing 647 instances of defective insulators and 353 instances of their intact counterparts.

To assess the applicability of the model derived from the experimental dataset, this study conducted an evaluation using images captured in authentic environments. Specifically, 200 images were procured from Guangzhou Metro Line 3, as depicted in Figure 5. The image capture process employed a detection vehicle outfitted with identical camera and lighting apparatuses, both affixed to a gimbal capable of angular adjustment. Throughout its operation, the vehicle traversed the track, enabling the acquisition of insulator images from diverse perspectives.

3.2. Image Preprocessing

Effective model training necessitates a substantial volume of images. However, reliance on images from a singular scene can precipitate model overfitting, undermining its generalization and robustness. In light of the constrained scene diversity and image volume, data augmentation strategies, encompassing rotation, stitching, and flipping, are imperative. To fortify the model’s resilience, particularly for detection under hazy conditions, this study innovates beyond traditional data augmentation techniques by incorporating a fogging preprocessing method.

1. Atomization algorithm

Color digital images are typically stored with each pixel harboring values for three color components, Red, Green, and Blue (RGB), where a higher value denotes a stronger presence of the corresponding color. In contrast, grayscale images consolidate the three RGB colors into a single channel by computing their average. Here, the maximum value (normalized to 1) signifies pure white, and the minimum value (0) denotes pure black. Customarily, in the majority of non-sky zones within color images, at least one channel of the pixel values is significantly low, a principle known as the “dark channel prior”. The mathematical representation [23] of this principle is shown as follows:

$$J^{dark}(x) = \min_{y \in \Omega(x)} \left( \min_{c \in \{r, g, b\}} J^{c}(y) \right) \to 0 \tag{4}$$

where $J^{c}$ represents each channel of the color image; $J^{dark}$ represents the output grayscale image; and $\Omega(x)$ represents a filtering window centered on the pixel $x$. The process of calculating the output using Equation (4) is to first find the minimum value of the RGB components of each pixel and store it in a grayscale image of the same size as the original image, and then to perform minimum value filtering on that grayscale image.
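The two steps of Equation (4), a per-pixel minimum over the color channels followed by a minimum filter over the window $\Omega(x)$, can be sketched as follows. The image layout (rows of `(r, g, b)` tuples normalized to [0, 1]), the square window, and the `radius` parameter are assumptions made for illustration.

```python
def dark_channel(image, radius=1):
    """image: H x W nested list of (r, g, b) tuples with values in [0, 1]."""
    height, width = len(image), len(image[0])
    # Step 1: minimum of the RGB components of every pixel.
    min_rgb = [[min(pixel) for pixel in row] for row in image]
    # Step 2: minimum filter over a (2 * radius + 1)^2 window, clipped at the borders.
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            dark[y][x] = min(min_rgb[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1))
    return dark


# A bright 3 x 3 patch with one pixel that is dark in the green channel.
patch = [[(0.8, 0.7, 0.9)] * 3 for _ in range(3)]
patch[1][1] = (0.5, 0.1, 0.6)
filtered = dark_channel(patch, radius=1)
```

With `radius=1` the single dark pixel dominates the whole 3 x 3 patch, which shows how the minimum filter spreads low values across each window.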

Building on these foundational concepts, a formation model for the generation of foggy images was proposed as follows [27]:

$$I(x) = J(x)\,t(x) + L\,(1 - t(x)) \tag{5}$$

where I(x) represents the resulting foggy image; J(x) represents the original haze-free image; L denotes the brightness of the atmospheric light component; and t(x) represents the transmission rate. The brightness L ranges from 0 to 1 and represents the color of the added haze, essentially its grayscale value. When L is set to 0 and 1, pure black haze and pure white haze are added, respectively; increasing L yields whiter haze.

In Equation (5), the transmittance t(x) is from 0 to 1, representing the ratio of the original image content to the fog component in the output image. t(x) of 1 yields an output identical to the original image, whereas t(x) of 0 indicates total obfuscation by fog, resulting in a pure fog image. Therefore, it is necessary to set a suitable t(x) for each pixel within the image, a process that is mathematically defined as follows [28]:

$$t(x) = \exp\left[-D\left(-0.0197\sqrt{(w - w_{c})^{2} + (h - h_{c})^{2}} + s\right)\right] \tag{6}$$

where −0.0197 is a fixed parameter selected in this study, signifying the maximum value attainable when the fogging spans from the image’s central point (where fogging is concentrated) to its periphery, contingent on the dimensions of the images captured within this investigation; $(w, h)$ denotes the coordinates of the pixel; $(w_{c}, h_{c})$ denotes the center of the image, which is selected as the center of fogging; $s$ represents the size of fogging, which is the square root of the maximum value of the width and height of the image; and $D$ represents the thickness coefficient of fog. From Equation (6), it can be observed that an increase in the value of $D$ correlates with a decrease in the value of $t(x)$, indicating a heightened fog intensity and, consequently, a reduction in transmittance. Conversely, a decrease in the value of $D$ results in an increase in the value of $t(x)$, signifying a reduced fog density and, consequently, enhanced transmittance.
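Equations (5) and (6) together describe a per-pixel blend between the original image and a uniform haze color. The following is a minimal sketch under stated assumptions: the image is a nested list of `(r, g, b)` tuples normalized to [0, 1], the image center is taken by integer division, and the coefficient −0.0197 is exposed as a parameter because the text ties it to the image dimensions used in the study. The function names are hypothetical.

```python
import math


def transmission(w, h, w_c, h_c, s, d, k=-0.0197):
    """t(x) from Equation (6) for the pixel at column w, row h."""
    dist = math.sqrt((w - w_c) ** 2 + (h - h_c) ** 2)
    return math.exp(-d * (k * dist + s))


def add_fog(image, l, d):
    """Apply I = J * t + L * (1 - t) (Equation (5)) to every pixel."""
    height, width = len(image), len(image[0])
    w_c, h_c = width // 2, height // 2  # fog is centered on the image center
    s = math.sqrt(max(width, height))   # fog size as defined in the text
    fogged = []
    for h in range(height):
        row = []
        for w in range(width):
            t = transmission(w, h, w_c, h_c, s, d)
            row.append(tuple(c * t + l * (1.0 - t) for c in image[h][w]))
        fogged.append(row)
    return fogged


gray = [[(0.5, 0.5, 0.5)] * 4 for _ in range(4)]
light_fog = add_fog(gray, l=0.8, d=0.03)
```

Setting `d` to 0 gives a transmittance of 1 everywhere and returns the image unchanged, while larger `d` pulls every pixel toward the haze brightness `l`, most strongly at the center.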

2. Optimal selection of L and D for image fogging

To augment the dataset realistically, this study meticulously calibrated the aerosol coefficients L and D based on empirical visibility conditions in fog. Rigorous testing indicated that with a constant fog thickness coefficient D, images became excessively dim and lost discernible features when the atmospheric light brightness coefficient L was below 0.2. Conversely, an L value exceeding 0.8 resulted in overly illuminated images, an unlikely scenario in practical diagnostics due to insufficient fill light. Accordingly, this research adopts an L value within the 0.2 to 0.8 range, specifically selecting 0.2, 0.4, 0.6, and 0.8 as optimal points.

Similarly, when the fog thickness coefficient D escalated beyond 0.04, the dense fog significantly obscured critical image information. Given that operational protocols preclude diagnostics in such low-visibility scenarios, instances with

D \geq 0.04

were deemed impractical and thus excluded. The study instead identified D values at 0.01, 0.02, and 0.03 as representative of realistic fog conditions.

The confluence of the meticulously selected L and D coefficients yielded a robust augmentation to the dataset. The final empirical formulation involved 12 distinct parameter pairings, derived from permutations of L at 0.2, 0.4, 0.6, 0.8 and D at 0.01, 0.02, 0.03. Visual representations of selected coefficient combinations are provided in Figure 6.
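The 12 pairings are simply the Cartesian product of the four L values and the three D values given above, which the short sketch below enumerates. The variable names are illustrative.

```python
from itertools import product

L_VALUES = (0.2, 0.4, 0.6, 0.8)  # atmospheric light brightness coefficients
D_VALUES = (0.01, 0.02, 0.03)    # fog thickness coefficients

# Every (L, D) pairing used to generate a fogged copy of an image.
fog_settings = list(product(L_VALUES, D_VALUES))
```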

4. Experimental Results and Analysis

4.1. Dataset Configuration and Experimental Environment

In conventional machine learning workflows, a dataset is typically segregated into three distinct subsets: a training set, a validation set, and a test set. The training set’s primary function is to facilitate the initial model learning process and the fine-tuning of its intrinsic parameters. Concurrently, the validation set serves as a benchmark for periodically assessing the model’s iterative performance, thereby guiding parameter optimization for enhanced model stability. The test set, however, is reserved for the ultimate evaluation of the model’s generalizability, post-training. To mitigate the risk of model overfitting and to foster a more comprehensive learning process, it is pivotal to ensure a uniform distribution of image samples within these subsets. This precaution helps to avoid the model’s overexposure to redundant images from identical locations, thereby bolstering both the training efficacy and the evaluation accuracy. Adhering to this principle, the study employs a random stratification approach to partition the dataset into an 8:1:1 ratio across the training, validation, and test sets, respectively. The main configuration of the experimental platform is shown in Table 2.
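
An 8:1:1 partition of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the helper `stratified_split`, the stratum labels, the fixed seed, and the 3000-item example list are all hypothetical, and the paper does not specify which attribute (class, location, or other) is used for stratification.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (item, stratum) pairs into train/val/test lists per stratum.

    Hypothetical helper: each stratum is shuffled and cut separately so that
    all three subsets keep roughly the same stratum proportions.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item, stratum in samples:
        groups[stratum].append(item)

    train, val, test = [], [], []
    for stratum in sorted(groups):
        items = groups[stratum]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Assumed example: 3000 image names with a 2:1 damaged-to-intact ratio.
samples = [(f"img_{i:04d}.jpg", "damaged" if i % 3 else "intact") for i in range(3000)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the partition reproducible between runs, which matters when several models are compared on the same test set.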

This study employs a dataset comprising 1000 images compiled within a controlled experimental setting. A common issue during the model training phase is overfitting, where the model exhibits high accuracy on the training set but fails to generalize effectively to new, unseen data. To counteract this and bolster the dataset’s robustness, this study implemented data augmentation strategies. Techniques including rotation, stitching, flipping, brightness modulation, and fogging were utilized to diversify and expand the dataset, culminating in a total of 3000 insulator images. The dataset was annotated using LabelImg version 1.3.0, following the annotation format of the PASCAL Visual Object Classes (VOC) dataset. Within this framework, insulators and their respective defects were marked separately and categorized as Insulator and Defect. As illustrated in Figure 7, insulators are demarcated with green bounding boxes, whereas defects are highlighted with yellow boxes, ensuring clear visual distinction and accuracy in subsequent model training.

Upon finalizing the experimental environment, the study progressed to the model training stage, characterized by specific parameter configurations: the training model selected was YOLOv5s, input image dimensions were standardized to $640 \times 640$ pixels, the batch size was established at 64, a learning rate of 0.01 was applied, and the training was designed to span 300 epochs.

4.2. Algorithm Evaluation Metrics

The efficacy of object detection algorithms is assessed quantitatively with standard evaluation metrics. The metrics employed here are the average precision (AP) for each detection target class and the mean average precision (mAP) across all classes. Additionally, the detection speed is measured in frames per second (FPS), complemented by the model size and the miss rate. Precision, recall, and miss rate are computed according to the following equations:

P_{re} = \frac{TP}{TP + FP}

(7)

R_{e} = \frac{TP}{TP + FN}

(8)

M_{r} = \frac{FN}{TP + FN}

(9)

where TP is the number of positive samples that are correctly predicted as positive (true positives); FN is the number of positive samples that are incorrectly predicted as negative (i.e., targets that are not detected); FP is the number of negative samples that are incorrectly predicted as positive; $R_{e}$ is the recall rate; $P_{re}$ is the precision rate; and $M_{r}$ is the miss rate, i.e., the proportion of positive samples that go undetected.
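
Equations (7) to (9) translate directly into a few lines of code. The sketch below is illustrative only; the function name `detection_metrics` and the example counts are hypothetical, and returning 0.0 for an empty denominator is an assumption made to avoid division by zero.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, miss_rate) from detection counts.

    Implements P_re = TP / (TP + FP), R_e = TP / (TP + FN), and
    M_r = FN / (TP + FN). Empty denominators yield 0.0 (an assumption).
    """
    predicted = tp + fp  # all boxes the detector reported
    actual = tp + fn     # all ground-truth targets
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    miss_rate = fn / actual if actual else 0.0
    return precision, recall, miss_rate


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed targets.
precision, recall, miss_rate = detection_metrics(tp=90, fp=10, fn=30)
print(precision, recall, miss_rate)
```

Because recall and miss rate share the denominator TP + FN, they always sum to 1 whenever at least one ground-truth target exists, so a lower miss rate is equivalent to a higher recall.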

The calculation method for AP is delineated as follows, with a total of n categories:

AP = \int_{0}^{1} P (r) d r

(10)

The mAP is the arithmetic mean of the AP values over the n categories. AP, the area beneath the precision–recall curve, quantifies the model’s average precision across varying levels of recall and serves as a holistic measure of model quality. Higher AP values indicate more accurate target recognition. Likewise, a higher FPS denotes faster detection, while a lower miss rate implies fewer undetected targets.
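
As a rough illustration of Equation (10), the integral can be approximated from sampled points on the precision-recall curve. The sketch below makes two assumptions that the text does not confirm: it uses all-point interpolation with a monotone precision envelope (a common convention), and it takes mAP as the unweighted mean of the per-class AP values, which is consistent with the experimental YOLOv5 row of Table 6. The function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Approximate the area under the precision-recall curve.

    `recalls` must be sorted in ascending order, with `precisions` aligned to
    it. All-point interpolation is an assumed convention, not the paper's.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(ap_values):
    """Unweighted mean of the per-class AP values (assumed definition)."""
    return sum(ap_values) / len(ap_values)


print(average_precision([0.5, 1.0], [1.0, 0.5]))
# Insulator and defect AP of the baseline YOLOv5 in the experimental setting.
print(round(mean_average_precision([96.83, 94.09]), 2))
```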

4.3. Ablation Study on BL Function Parameters

In Equation (3), the parameter $α$ is utilized to mediate the balance between positive and negative samples by adjusting their respective proportions. A heightened $α$ augments the relative prevalence of positive samples. The parameter $β$, in turn, is employed to mitigate the loss attributed to undamaged insulator samples. Given the approximate 2:1 ratio of damaged to undamaged insulator samples in this investigation, $β$ is assigned a value of 2. The parameter $γ$ serves to curtail the loss associated with samples that are readily distinguishable, specifically the undamaged insulator samples. An increase in $γ$ diminishes the weighting accorded to these easily classifiable samples. This study examines how changes in the values of $α$, $β$, and $γ$ affect the detection efficacy, as shown in Table 3 (with $α = 0.5$, $β = 1$, and $γ = 0$, the BL function reverts to the standard cross-entropy loss).

The BL function optimizes the distribution of sample losses by adjusting the weights assigned to positive and negative samples, as well as to easy and hard samples, within the loss function. This strategy bolsters the model’s proficiency in identifying challenging samples without compromising its capability to recognize simpler ones, thereby enhancing the accuracy of insulator defect detection. As indicated in Table 3, the parameters $α$, $β$, and $γ$ must be chosen in line with the sample distribution. Extremes in these values can negatively impact the model’s detection accuracy. Given that challenging samples are comparatively scarce, an unduly low weight diminishes the detection efficacy for these samples. In contrast, an excessively high weight can impede the accuracy in identifying the numerous straightforward samples. The experimental data in Table 3 demonstrate that setting $α = 0.75$, $β = 1.5$, and $γ = 1$ yields the highest accuracy in insulator defect detection, a gain of about 5.8 pp relative to the baseline YOLOv5 model trained with the standard cross-entropy loss (from 88.25% to 94.09% broken-insulator AP). This parameter combination is used in the subsequent experiments discussed in this manuscript.
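
The selection of the best setting can be checked directly against the rows of Table 3. The sketch below only re-reads the published numbers; the names `TABLE_3` and `best_setting` are hypothetical, and ranking by broken-insulator AP is an assumption about the selection criterion.

```python
# Rows of Table 3: (alpha, beta, gamma, normal_insulator_ap, broken_insulator_ap)
TABLE_3 = [
    (0.5, 1, 0, 90.62, 88.25),  # reverts to standard cross-entropy loss
    (0.5, 1.5, 1, 86.33, 91.48),
    (0.5, 2, 1, 87.79, 92.14),
    (0.5, 1.5, 2, 79.57, 87.38),
    (0.5, 2, 2, 80.65, 87.84),
    (0.75, 1.5, 1, 96.83, 94.09),
    (0.75, 2, 1, 94.75, 91.49),
    (0.75, 1.5, 2, 80.23, 90.44),
    (0.75, 2, 2, 78.19, 90.86),
]


def best_setting(rows):
    """Return the row with the highest broken-insulator AP (assumed criterion)."""
    return max(rows, key=lambda row: row[4])


baseline = TABLE_3[0]
best = best_setting(TABLE_3)
gain_pp = round(best[4] - baseline[4], 2)
print(best[:3], gain_pp)
```

In this table the same row also has the highest normal-insulator AP, so the choice does not depend on which of the two AP columns is used for ranking.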

4.4. Results and Analysis of Ablation and Comparative Studies

This study verifies the efficacy of integrating the ECA-Net attention mechanism alongside the GhostNet lightweight module through systematic ablation experiments. These experiments, run on the test set, compare models with specific modules included or excluded. The evaluation encompasses four distinct models, with the findings listed in Table 4.

The data in Table 4 reveal that the baseline YOLOv5 model achieved a mean average precision (mAP@0.5) of 95.06% across insulators and their defects. Replacing YOLOv5’s backbone network with the more streamlined GhostNet reduced the mean average precision by roughly 2.8 pp, with a more pronounced 3.98 pp reduction for insulator defect detection. In exchange, model complexity decreased: the model size shrank to half of the original (from 15 MB to 7.5 MB) and the detection speed rose from 30 FPS to 95 FPS, thereby satisfying the requirements of embedded edge computing devices. By contrast, integrating the ECA-Net attention mechanism into the original YOLOv5 model increased the mean average precision by approximately 2.68 pp, albeit with a speed reduction from 30 FPS to 24 FPS. This suggests that while the ECA-Net attention mechanism enriched the information captured during the feature extraction phase, it also raised the computational demands of the algorithm, consequently diminishing its efficiency. Further, when the ECA-Net attention mechanism was added to the YOLOv5-GhostNet framework during the feature fusion phase, the mean average precision improved by approximately 1.4 pp over the baseline YOLOv5, with a nearly 2 pp rise for insulator defects. Concurrently, the model size contracted to 60% of its initial size (from 15 MB to 9 MB), and the detection speed improved from 30 FPS to 90 FPS. These observations indicate the efficacy of the ECA-Net attention mechanism in improving target discernment and in extracting the required feature information more effectively. As a result, this research incorporates the attention mechanism within the feature fusion segment of the model.
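
The differences quoted in this paragraph follow from simple arithmetic on Table 4. The sketch below recomputes them from the published values; the dictionary layout and the helper name `compare` are hypothetical and serve only as a check.

```python
# Selected columns of Table 4 (mAP@0.5, defect AP@0.5, model size, speed).
TABLE_4 = {
    "YOLOv5": {"map50": 95.06, "defect_ap50": 94.09, "size_mb": 15.0, "fps": 30},
    "YOLOv5-GhostNet": {"map50": 92.26, "defect_ap50": 90.11, "size_mb": 7.5, "fps": 95},
    "YOLOv5-ECA": {"map50": 97.74, "defect_ap50": 96.76, "size_mb": 15.0, "fps": 24},
    "YOLOv5-GhostNet-ECA": {"map50": 96.43, "defect_ap50": 96.08, "size_mb": 9.0, "fps": 90},
}


def compare(model, baseline="YOLOv5", table=TABLE_4):
    """Return accuracy deltas (in pp) and size/speed ratios against a baseline."""
    m, b = table[model], table[baseline]
    return {
        "map50_delta_pp": round(m["map50"] - b["map50"], 2),
        "defect_delta_pp": round(m["defect_ap50"] - b["defect_ap50"], 2),
        "size_ratio": m["size_mb"] / b["size_mb"],
        "fps_ratio": m["fps"] / b["fps"],
    }


for name in ("YOLOv5-GhostNet", "YOLOv5-ECA", "YOLOv5-GhostNet-ECA"):
    print(name, compare(name))
```

Accuracy changes are differences in percentage points, whereas the size and speed changes are ratios, which is why the combined model is described as 60% of the baseline size and three times its speed.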

This study utilizes heat maps and result visualizations to further elucidate the influence of ECA-Net on the detection model, as depicted in Figure 8 and Figure 9. In these heat maps, areas of greater luminosity correspond to regions where the network directs more attention. Figure 8b highlights that the YOLOv5-GhostNet model fails to allocate sufficient attention to insulators, leading to suboptimal detection results and diminished confidence levels, especially for smaller insulators. Conversely, Figure 8c,d indicate that incorporating the ECA-Net mechanism enables the algorithm to concentrate more precisely on the intended targets. Specifically, the YOLOv5-GhostNet-ECA configuration succeeds in reliably identifying insulators, even against complex backdrops. When compared to the original YOLOv5 algorithm shown in Figure 8a, there is an enhancement in both the attention allocated and the detection accuracy pertaining to smaller targets.

The visualization results in Figure 9 reveal that while both the original and YOLOv5-GhostNet models demonstrate decent detection capabilities and substantial confidence levels for insulators, they underperform in identifying defects and smaller insulators, as evidenced by their lower confidence scores. Notably, the streamlined parameter set of the YOLOv5-GhostNet model correlates with a marked reduction in accuracy for diminutive targets. The integration of the ECA-Net attention mechanism noticeably amplifies the detection accuracy for minor defects. Consequently, the algorithm proposed herein exhibits a discernible accuracy enhancement relative to the original model.

To assess the efficacy and reliability of the proposed algorithm, it was benchmarked on the test set against an array of established algorithms, encompassing the single-stage detectors YOLOv3, YOLOv4, YOLOv5, YOLOv8, and SSD, as well as the two-stage detector Faster R-CNN. The experimental comparisons are detailed in Table 5.

The prevalent standard for real-time detection is a detection speed exceeding 30 FPS. Table 5 illustrates that while Faster R-CNN offers satisfactory accuracy, its large model size and high parameter count drastically reduce the detection speed, rendering it unsuitable for real-time insulator defect detection. The SSD algorithm, although fast, compromises on accuracy and has a larger model. Both YOLOv3 and YOLOv4 fall below the 30 FPS threshold, with accuracies inferior to that of YOLOv5. Compared with Faster R-CNN and SSD, the algorithm developed in this study improves the mean average precision (mAP@0.5) by 1.75 pp and 7.36 pp, respectively. Its model is also about 57.7 times smaller than that of Faster R-CNN (9 MB versus 519 MB) and about 9.8 times smaller than that of SSD (9 MB versus 88 MB). Furthermore, at 90 FPS its detection speed exceeds those of Faster R-CNN (19 FPS) and SSD (59 FPS), which are about 78.9% and 34.4% slower, respectively. Overall, the proposed algorithm stands out among its counterparts, achieving a favorable balance between detection accuracy and speed and thus fulfilling the requirements for real-time detection.
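
The real-time criterion and the size and speed comparisons can be reproduced from Table 5. The following sketch is only a check on the published numbers; the names `TABLE_5`, `REAL_TIME_FPS`, and `real_time_models` are hypothetical.

```python
# Selected columns of Table 5: model -> (mAP@0.5 in %, size in MB, FPS)
TABLE_5 = {
    "YOLOv3": (88.94, 235, 25),
    "YOLOv4": (89.50, 243, 22),
    "YOLOv5": (95.06, 15, 30),
    "YOLOv8": (95.11, 24, 27),
    "SSD": (89.07, 88, 59),
    "Faster R-CNN": (94.68, 519, 19),
    "YOLOv5-GhostNet-ECA": (96.43, 9, 90),
}

REAL_TIME_FPS = 30  # threshold stated in the text: speed must exceed 30 FPS


def real_time_models(table, threshold=REAL_TIME_FPS):
    """Return the models whose speed strictly exceeds the threshold."""
    return [name for name, (_, _, fps) in table.items() if fps > threshold]


ours_map, ours_size, ours_fps = TABLE_5["YOLOv5-GhostNet-ECA"]
print(real_time_models(TABLE_5))
for other in ("Faster R-CNN", "SSD"):
    other_map, other_size, other_fps = TABLE_5[other]
    print(
        other,
        round(ours_map - other_map, 2),                # mAP@0.5 gain in pp
        round(other_size / ours_size, 1),              # how many times larger
        round(100 * (ours_fps - other_fps) / ours_fps, 1),  # % slower than ours
    )
```

Under a strict reading of "exceeding 30 FPS", the baseline YOLOv5 at exactly 30 FPS does not qualify, leaving SSD and the proposed model as the only real-time candidates in Table 5.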

4.5. Field Detection and Visualization Analysis

This study further evaluates the improved algorithm in practice by deploying the refined model under real-world conditions and analyzing 200 field images. Subway operation and maintenance in China are managed very strictly: maintenance staff may enter the line only during the early-morning maintenance window, and any defective insulator they find is replaced with a sound one. The overhead contact system images used in this study were collected on a limited number of days, and no real image of a defective insulator on the line was obtained. Therefore, only insulator recognition is validated in real scenarios in this part. The results, detailed in Table 6, cover insulator detections only, owing to the absence of defects in the real-world images. Figure 10 illustrates the field detection, with insulators delineated by red boxes. This real-world application demonstrates the algorithm’s efficacy and utility in practical settings.

The visual results presented in Figure 10 and the data in Table 6 indicate that the YOLOv5-GhostNet-ECA model proposed in this study increases the detection speed relative to the baseline model while also improving the identification of smaller insulators, a direct benefit of integrating the ECA-Net attention mechanism, as evidenced by a marked rise in confidence levels. Notably, the model maintains robust detection and recognition capabilities under diverse environmental conditions, including fog. These findings demonstrate the model’s ability to detect and categorize targets across varying environmental contexts, especially in visibility-compromised situations.

5. Discussion

This study introduces an enhanced, lightweight algorithm for insulator detection and defect identification based on a modified YOLOv5 framework, addressing the challenges posed by limited sample availability and validating the algorithm’s efficacy in real-world conditions. First, a detailed model reflecting the specific catenary structure and dimensions of subway lines is established, facilitating the generation of artificial samples, including both intact and damaged insulators, to mitigate sample scarcity. To address sample imbalance, the study proposes a BL function that allocates distinct weights to positive, negative, and particularly challenging samples throughout the training phase, thereby increasing the model’s sensitivity to positive and hard-to-identify samples. Furthermore, the algorithm replaces YOLOv5’s backbone network with GhostNet, enabling more streamlined feature extraction from input images. During the feature fusion phase, the algorithm employs depth-wise separable convolution to improve computational efficiency while integrating the ECA-Net attention mechanism, ensuring that critical information is effectively captured. To equip the model for foggy conditions, the dataset undergoes fog simulation processing. Experimental results indicate that, compared to the original YOLOv5 framework, the proposed modifications raise the mAP from 95.46% to 96.57%, reduce the model size from 15 MB to 9 MB, and boost the detection speed from 30 FPS to 90 FPS in experimental settings. In real-world scenarios, the mAP increases by approximately 2.5 pp, demonstrating the practicality of using experimentally derived datasets for real-world detection tasks. Because it is difficult to obtain an insulator defect dataset in real scenarios, the effectiveness of the proposed method in detecting real defects needs to be tested further.

The developed algorithm fulfills the accuracy and speed criteria necessary for insulator defect detection in inspection vehicles, exhibits adaptability to diverse inspection environments, and is suitable for real-time deployment on inspection vehicles. Future investigations will extend to the recognition of other insulator anomalies, including pollution and flashover occurrences.

Author Contributions

Conceptualization, L.H., W.W. and Z.H.; methodology, L.H.; software, L.H.; validation, L.H. and Y.L.; formal analysis, L.H.; investigation, L.H.; resources, W.W. and Z.H.; data curation, L.H. and Y.L.; writing—original draft preparation, L.H.; writing—review and editing, L.H. and W.W.; visualization, L.H.; supervision, W.W. and Z.H.; project administration, W.W. and Z.H.; funding acquisition, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project ‘Intelligent Inspection Vehicle for Contact Grid Suspension Devices’, grant number: H220781.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset is available at GitHub at: https://github.com/Edwardjiuxia/YOLOv5--GhostNet--ECA, accessed on 1 December 2023.

Conflicts of Interest

Author Zemin He was employed by the company ZKROT Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. YOLOv5 network model.

Figure 2. Improved ECA-Bottleneck module.

Figure 3. Comparison of overhead contact systems (OCS) in real and experimental environments: (a) real environment; (b) experimental environment.

Figure 4. Schematic representation of insulator defects.

Figure 5. Schematic illustration of the shooting method and tunnel-shooting device for subway tunnels.

Figure 6. Visual representation of the results after atomization treatment: (a) original image; (b) image at $L = 0.2$, $D = 0.01$; (c) image at $L = 0.4$, $D = 0.02$; (d) image at $L = 0.6$, $D = 0.03$.

Figure 7. Visual representation of insulators and defects.

Figure 8. Heat map comparison for insulator detection: (a) YOLOv5; (b) YOLOv5-GhostNet; (c) YOLOv5-ECA; (d) YOLOv5-GhostNet-ECA.

Figure 9. Comparative detection results with different algorithms (red boxes indicate insulators and pink boxes indicate defects): (a) YOLOv5; (b) YOLOv5-GhostNet; (c) YOLOv5-ECA; (d) YOLOv5-GhostNet-ECA.

Figure 10. Comparative detection performance of different models in real-world scenarios (red box denotes insulator): (a) YOLOv5; (b) YOLOv5-GhostNet-ECA.

Table 1. Parameters of Daheng ME2P-2621-4GC-P camera.

Category	Parameter
Resolution	$5120 \times 5120$ pixels
FPS	4.5
Pixel size	2.5 μm
Pixel depth	8 bit, 12 bit
Signal-to-noise ratio (SNR)	35.65 dB

Table 2. Configuration details of experimental platform.

Configuration Category	Version
Operating system	Ubuntu 18.04
CPU	Intel Core i9-10900K
GPU	NVIDIA RTX 3090
Programming language	Python 3.8
Integrated development environment (IDE)	PyCharm 2023.1
GPU computing framework	CUDA 11.0
GPU computing acceleration library	cuDNN 8.3
Deep learning framework	PyTorch 1.7.1

Table 3. Influence of hyperparameters on accuracy.

$α$	$β$	$γ$	Normal Insulator AP (%)	Broken Insulator AP (%)
0.5	1	0	90.62	88.25
0.5	1.5	1	86.33	91.48
0.5	2	1	87.79	92.14
0.5	1.5	2	79.57	87.38
0.5	2	2	80.65	87.84
0.75	1.5	1	96.83	94.09
0.75	2	1	94.75	91.49
0.75	1.5	2	80.23	90.44
0.75	2	2	78.19	90.86

Table 4. Comparison results of ablation experiments.

Model	mAP@0.5:0.95 (%)	mAP@0.5 (%)	Insulator AP@0.5 (%)	Defect AP@0.5 (%)	Size (MB)	FPS
YOLOv5	84.62	95.06	96.83	94.09	15	30
YOLOv5-GhostNet	81.15	92.26	96.21	90.11	7.5	95
YOLOv5-ECA	87.24	97.74	98.68	96.76	15	24
YOLOv5-GhostNet-ECA	86.58	96.43	97.06	96.08	9	90

Table 5. Comparison of results from different algorithms.

Model	mAP@0.5:0.95 (%)	mAP@0.5 (%)	Insulator AP@0.5 (%)	Defect AP@0.5 (%)	Size (MB)	FPS
YOLOv3	76.48	88.94	92.61	86.93	235	25
YOLOv4	79.17	89.50	92.35	87.95	243	22
YOLOv5	84.62	95.06	96.83	94.09	15	30
YOLOv8	85.08	95.11	96.87	94.15	24	27
SSD	78.83	89.07	94.23	86.25	88	59
Faster R-CNN	83.97	94.68	94.73	94.65	519	19
YOLOv5-GhostNet-ECA	86.58	96.43	97.06	96.08	9	90

Table 6. Comparative detection results under different environments.

Test Environment *	mAP (%)	Insulator AP (%)	Defect AP (%)	Size (MB)	FPS
Experimental setting (v5)	95.46	96.83	94.09	15	30
Real-life scenarios (v5)	95.98	95.98	0	15	30
Experimental setting (our)	96.57	97.06	96.08	9	90
Real-life scenarios (our)	98.43	98.43	0	9	90

* Note: “our” refers to the algorithm proposed in this study and “v5” represents the use of the original YOLOv5 model.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
