Article

Infrared Image Enhancement Method of Substation Equipment Based on Self-Attention Cycle Generative Adversarial Network (SA-CycleGAN)

1 School of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710054, China
2 Xi’an Key Laboratory of Electrical Equipment Condition Monitoring and Power Supply Security, Xi’an 710054, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(17), 3376; https://doi.org/10.3390/electronics13173376
Submission received: 15 July 2024 / Revised: 3 August 2024 / Accepted: 21 August 2024 / Published: 26 August 2024
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)

Abstract

During the acquisition of infrared images in substations, low-quality images with poor contrast, blurred details, and missing texture information frequently appear, which adversely affects subsequent advanced visual tasks. To address this issue, this paper proposes an infrared image enhancement algorithm for substation equipment based on a self-attention cycle generative adversarial network (SA-CycleGAN). The proposed algorithm incorporates a self-attention mechanism into the CycleGAN model’s transcoding network to improve the mapping of infrared image information, enhance image contrast, and reduce the number of model parameters. The addition of an efficient local attention (ELA) mechanism and a feature pyramid structure within the encoding network strengthens the generator’s ability to extract features and texture information from small targets in infrared substation equipment images, effectively improving image details. In the discriminator, the model’s performance is further enhanced by constructing a two-channel feature network. To accelerate convergence, the loss function of the original CycleGAN is optimized. Compared to several mainstream image enhancement algorithms, the proposed algorithm improves the quality of low-quality infrared images by an average of 10.91% in colorfulness, 18.89% in saturation, and 29.82% in feature similarity indices. Additionally, the number of parameters in the proposed algorithm is reduced by 37.89% compared to the original model. Finally, the effectiveness of the proposed method in improving recognition accuracy is validated with the CenterNet target recognition algorithm.

1. Introduction

Substation equipment is critical apparatus in power systems [1,2,3]. Because substation equipment operates continuously under harsh weather and heavy load for long periods, it is prone to failure, which greatly affects the reliable operation of the power system [4]. Substation equipment failures are often accompanied by local overheating, so it is important to recognize this condition in time [5,6]. With the rapid development of infrared thermal imaging technology, drones, robots, and cameras equipped with infrared thermal imagers have been widely used in the daily inspection of substation equipment [7,8]. Manually analyzing the massive number of infrared images of substation equipment is inefficient, and automatic infrared image fault detection for substation equipment is an important part of building the smart power grid. However, during the collection of infrared images, factors such as camera shake, insufficient exposure time, unsuitable shooting angles, and environmental interference produce images with poor contrast, blurred details, and missing texture information [9], as shown in Figure 1. Such low-quality infrared images of substation equipment greatly hinder subsequent fault detection. Therefore, enhancing low-quality infrared images is of great research significance [10].
At present, mainstream infrared image enhancement methods fall into two types. One is the traditional type represented by physical and mathematical models, such as guided filtering (GF) [11,12], the Retinex algorithm [13,14], and histogram equalization (HE) [15,16]. However, the enhancement performance of such models is highly dependent on their adjustable parameters, making it challenging to adapt to a wide range of complex scenarios. The other is the deep learning type, represented by data-driven approaches. Deep neural networks possess robust feature extraction and nonlinear fitting capabilities and eliminate the need for additional parameter settings, so they are well suited to infrared image enhancement tasks. However, owing to the lack of sufficient paired infrared image datasets, the enhancement performance of these methods still requires improvement. With the rapid development of deep learning, the cycle-consistent generative adversarial network (CycleGAN) [17] can effectively enhance low-quality images without paired datasets. However, the original CycleGAN tends to produce images with blurred details and weak enhancement effects, and it struggles to balance target enhancement with noise suppression.
To address the above problems, this paper presents an innovative infrared image enhancement algorithm for substation equipment based on the cycle-consistent generative adversarial network with a self-attention mechanism (SA-CycleGAN), which achieves image enhancement through style transformation. The major contributions of our work are briefly summarized as follows:
(1)
To effectively improve the contrast and saturation of the generated image, the self-attention mechanism is employed to replace the residual structure in the transcoding network, enhancing the mapping of image color characteristics, improving image contrast and saturation, and greatly reducing the number of model parameters.
(2)
To address the issue of missing edge information in low-quality infrared images, a feature fusion structure is constructed using a feature pyramid between the generator’s encoding and decoding networks, improving the generator’s rendering of infrared image details.
(3)
An efficient local attention mechanism is added to comprehensively extract detailed information from infrared images and improve the clarity of image details.
(4)
In the discriminator part, a two-channel discriminator is adopted to capture the differences between the generated infrared image and the target image, improving the discriminator network’s ability to judge generated infrared images. Finally, the loss function is optimized to improve the model’s training convergence speed.
The rest of this paper is organized as follows. Section 2 describes related work on infrared image enhancement. Section 3 introduces the proposed method. Section 4 illustrates the experimental results of our work with comparisons to other methods. Section 5 presents concluding remarks.

2. Related Works

2.1. Traditional Infrared Image Enhancement Methods

In recent years, researchers at home and abroad have conducted extensive research on infrared image enhancement. Lu [18] proposed an infrared image enhancement method based on multi-scale cyclic convolution and multi-clustering space, and experimental results show that this method can enhance the contrast and brightness of an infrared image. Tan [19] presented a global and local contrast adaptive enhancement method, which combines a parameter self-adjusting Retinex algorithm with fast guided filtering to enhance dark images. Lv [20] proposed an infrared image enhancement algorithm based on adaptive histogram equalization coupled with the Laplace transform, and the results show improved contrast and clarity in the enhanced image. Lee [21] considered the characteristics of high-dynamic-range infrared images and introduced a ramp distribution that increases with a constant slope in an ordered histogram domain to enhance image contrast. Although the above methods can achieve infrared image enhancement, their generalization ability is weak, their enhancement of detailed features and image edge information is not obvious, and the models are relatively complex.

2.2. Deep Learning-Based Infrared Image Enhancement Methods

With the development of artificial intelligence technology, deep learning has been widely used in image enhancement tasks. Pang [22] designed a structural feature mapping network and a two-scale feature extraction network for infrared image enhancement, and experiments have proven that it can effectively improve contrast while avoiding over-enhancement. Ma [23] designed a spatial–parallax prior block with two symmetric branches to extract spatial and parallax features in an interactive guidance manner, and experimental results demonstrate that this method outperforms the current state-of-the-art methods. Wang [24] presented an innovative target attention deep neural network to balance target enhancement and background suppression in an end-to-end manner. Jiang [25] proposed a highly effective unsupervised generative adversarial network that can be trained without low/normal-light image pairs to enhance real-world images from various domains. Zhang [26] presented an infrared image data generation method based on DDR-CycleGAN, in which the discriminator replaces the relative probability that an image is real with the absolute probability, making the generated image closer to the real image. Owing to the intricate nature of infrared image backgrounds and the diverse array of equipment involved, the aforementioned approaches may struggle to accurately capture all image content, potentially leading to missing or obscured information.

3. The Proposed Method

During the acquisition of infrared images in substations, there is a significant likelihood of obtaining low-quality infrared images that exhibit abnormal contrast, irregular brightness, and blurred details. CycleGAN is an image style transformation model that can perform infrared image enhancement with unpaired data; it consists of two parts, a generator and a discriminator, and the generator in turn consists of an encoding network, a transcoding network, and a decoding network. However, the original CycleGAN suffers from over-enhancement during image enhancement. Therefore, this paper improves on the network structure of CycleGAN: a self-attention module, an efficient local attention module, and a feature pyramid network are introduced into the generator, and a two-channel feature extraction network is incorporated into the discriminator. The improved CycleGAN is named SA-CycleGAN, and its structure is shown in Figure 2.

3.1. Self-Attention Module

Low-quality infrared images of substation equipment suffer from low contrast and saturation; therefore, the self-attention mechanism is incorporated into the generator, replacing the residual network in the original model, to improve the mapping of detailed equipment information. The self-attention mechanism is a basic component of the Transformer [27]. It transforms the interactions between the input infrared image features into corresponding weights, updating each component of the sequence by integrating global information from the entire infrared image sequence. Incorporating it into the model enhances the adaptive infrared feature extraction capability of the generator network, thereby improving the detail generation for infrared images of substation equipment. The original CycleGAN model employs a residual structure to map image style features within the transcoding network, but it transmits infrared detail information inadequately. As shown in Figure 3, the residual structure consists of two convolution operations, which advantageously allows information to pass through multiple stacked layers and helps prevent overfitting. However, a traditional convolution operation has a limited receptive field and cannot capture global information. In the context of this paper, infrared images of substation equipment exhibit low contrast and saturation that are difficult to extract fully by convolution alone, so the self-attention mechanism is introduced as an effective way to model the global information of infrared images. The core idea is to compute the correlation between each pixel and all other pixels to obtain global context. The structure of the self-attention mechanism is shown in Figure 4, where $W_q$, $W_k$, and $W_v$ are trainable weight matrices.
The input X is mapped to a query, key, and value. The query vector Q encodes the locations the model is currently attending to, directing more attention to the details in the image of infrared substation equipment. The key is used to compute significance relative to the query: the correlation between Q and the keys forms the attention matrix A, which captures the model’s emphasis on different components of the input and lets it allocate attention accordingly. The output Z of the self-attention mechanism is the dot product of the attention matrix A with the value V. Because self-attention captures the intricate internal correlations among features, it extracts the contrast and saturation characteristics of infrared substation equipment images more robustly, ensuring that the model retains essential information from a complex image. In this way, the feature mapping capability of the transcoding network is improved, achieving the goal of enhancing the contrast and saturation of infrared images. The mathematical principle of self-attention can be expressed as
$A(Q, K) = \mathrm{Softmax}\left( \dfrac{Q K^{T}}{\sqrt{d}} \right)$  (1)
$Z = A(Q, K) \times V$  (2)
In addition, the self-attention mechanism has fewer parameters than the traditional convolution operation. In this paper, the parameter count of a self-attention operation is $3 \times d \times d$, where $d$ is the dimension of each input vector, while for a convolution operation with a $3 \times 3$ kernel, the parameter count is $3 \times 3 \times C_{in} \times C_{out}$, where $C_{in}$ and $C_{out}$ are the numbers of input and output channels. Therefore, the self-attention mechanism can greatly reduce the number of model parameters compared with the residual structure.
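As an illustration of Equations (1) and (2) and the parameter comparison above, the following NumPy sketch applies scaled dot-product self-attention to a toy sequence of flattened pixel features. The shapes, random weights, and the choice of $C_{in} = C_{out} = d$ for the convolution comparison are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of pixel features."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # map input to query/key/value
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)           # Eq. (1), before the Softmax
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)      # attention matrix A(Q, K)
    return A @ V, A                         # Eq. (2): Z = A(Q, K) x V

d = 256                                     # feature dimension of each token
rng = np.random.default_rng(0)
X = rng.standard_normal((64, d))            # 64 flattened spatial positions
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
Z, A = self_attention(X, Wq, Wk, Wv)

# Parameter counts quoted in the text, with C_in = C_out = d for comparison:
attn_params = 3 * d * d                     # three projections W_q, W_k, W_v
conv_params = 3 * 3 * d * d                 # one 3x3 convolution
```

Note that every row of A sums to one, so each output token is a convex combination of all value vectors, which is exactly the global-context property the text contrasts with the local receptive field of convolution.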

3.2. Efficient Local Attention Mechanism and Feature Fusion Structure

A low-quality infrared substation image is characterized by low contrast, colorfulness, and saturation, so the characteristic information must first be extracted from the infrared image through the encoding network when using CycleGAN for image style transformation. However, during downsampling, the receptive field of the convolution operation becomes too large, so the texture information and complex background information of infrared substation equipment images cannot be fully extracted. Therefore, this paper introduces the efficient local attention (ELA) mechanism [28], whose principle is shown in Figure 5. X Avg Pool and Y Avg Pool denote one-dimensional horizontal and one-dimensional vertical global pooling, respectively; their mathematical expressions are given in Equations (3) and (4). $Z_c^h(h)$ and $Z_c^w(w)$ apply average pooling across each channel in two distinct spatial directions to derive horizontal and vertical feature vectors from infrared substation images, capturing long-range dependencies effectively. The pooled results are then processed by a one-dimensional convolution $F$ and group normalization $G_n$, followed by activation via the sigmoid function $\sigma$; the underlying principles are illustrated in Equations (5) and (6). The output of the ELA module is denoted $Y$, as depicted in Equation (7). Compared to the traditional local attention mechanism, the ELA mechanism adopts sparse matrix operations, which greatly reduces computation cost and speeds up training. Moreover, the ELA mechanism computes attention weights within a local scope, maintaining local consistency and coherence when enhancing image details.
In this paper, by incorporating the ELA mechanism into the coding network of the generator, the network’s ability to capture local features is enhanced, thereby improving the clarity of details in the generated images:
$Z_c^h(h) = \dfrac{1}{H} \sum_{0 \le i \le H} x_c(h, i)$  (3)
$Z_c^w(w) = \dfrac{1}{W} \sum_{0 \le j \le W} x_c(j, w)$  (4)
$g^h = \sigma(G_n(F_h(Z^h)))$  (5)
$g^w = \sigma(G_n(F_w(Z^w)))$  (6)
$Y = x_c(i, j) \times g^h \times g^w$  (7)
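The pooling and gating steps of Equations (3)–(7) can be sketched as follows. This is a simplified NumPy illustration: the paper's one-dimensional convolution $F$ and group normalization $G_n$ are replaced here by identity mappings so the sketch stays dependency-free, which changes the gate values but not the structure of the computation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ela(x):
    """Directional attention over a feature map x of shape (C, H, W)."""
    # Eqs. (3)-(4): 1-D horizontal and vertical global average pooling,
    # giving per-channel feature vectors along each spatial axis.
    z_h = x.mean(axis=2)                  # (C, H), pooled over width
    z_w = x.mean(axis=1)                  # (C, W), pooled over height
    # Eqs. (5)-(6): the paper applies a 1-D convolution F and group
    # normalization G_n before the sigmoid; both are omitted here
    # (identity mappings) to keep the sketch self-contained.
    g_h = sigmoid(z_h)
    g_w = sigmoid(z_w)
    # Eq. (7): re-weight every position by its two directional gates.
    return x * g_h[:, :, None] * g_w[:, None, :]

features = np.random.default_rng(1).standard_normal((8, 16, 16))
attended = ela(features)
```

Because each gate is a function of a whole row or column average, every output position is modulated by long-range context along both axes while the per-position multiplication keeps the operation cheap.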
To address the issue of information loss during the feature extraction stage of the generator, this paper introduces a feature pyramid structure [29]. As shown in Figure 2, the features extracted in the first downsampling stage are fused with the corresponding upsampled features in the decoding network, and the features extracted in the second downsampling stage are fused with the transcoded features, which mitigates the loss of edge information during downsampling and improves the network’s ability to generate edge features. This structure lets the decoding network effectively integrate feature information from the encoding network, thereby enhancing the generator’s ability to capture global information. Compared to the original CycleGAN model, this approach mitigates the shortcomings of downsampled feature information loss and upsampled feature information dispersion, significantly improving the model’s image generation performance.

3.3. Two-Channel Discriminator

The original CycleGAN discriminator extracts image features through three convolution operations and judges the generated images based on the extracted results. However, recognizing infrared substation equipment images remains challenging: devices such as current transformers, voltage transformers, and bushings exhibit similar characteristics, making them difficult to differentiate, and a single-channel feature extraction network struggles to fully capture the image feature information, leading to potential misjudgments.
To enhance the discriminative ability of the discriminator network for generated images of infrared substation equipment, this paper introduces an additional image information extraction channel to the initial discriminator as illustrated in Figure 2. By employing a feature extraction network with 5 × 5 large convolutional kernels, the network can focus more on the overall structural information of the image. Conversely, using a feature extraction network with 3 × 3 small convolutional kernels allows the network to concentrate on details and texture information. This multi-faceted evaluation can enhance the discriminator’s ability to assess generated images, thereby enabling the generator to produce higher-quality images. The two-channel discriminator structure improves the network’s capability to extract texture features of devices in the complex background of infrared images. This enhancement allows for more effective differentiation of devices with similar features, thereby improving the overall discriminative performance of the discriminator.
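To make the dual-kernel idea concrete, the following sketch runs one single-channel image through a 5 × 5 branch (biased toward overall structure) and a 3 × 3 branch (biased toward texture and detail) and stacks the resulting feature maps. The naive `conv2d` helper, the random kernels, and the single-channel input are illustrative assumptions; the actual discriminator uses multi-channel strided convolutions.

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2-D convolution of a single-channel image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def two_channel_features(img, rng):
    # Branch 1: a 5x5 kernel widens the receptive field, emphasizing
    # the overall structural information of the image.
    coarse = conv2d(img, rng.standard_normal((5, 5)))
    # Branch 2: a 3x3 kernel keeps the focus on local texture and detail.
    fine = conv2d(img, rng.standard_normal((3, 3)))
    # Crop both maps to a common size and stack the two feature views,
    # giving the discriminator complementary evidence to judge from.
    h = min(coarse.shape[0], fine.shape[0])
    w = min(coarse.shape[1], fine.shape[1])
    return np.stack([coarse[:h, :w], fine[:h, :w]])

rng = np.random.default_rng(2)
feature_maps = two_channel_features(rng.standard_normal((32, 32)), rng)
```

The design choice mirrors the text: two parallel views of the same image let the discriminator penalize both global structural artifacts and local texture errors, which in turn pressures the generator to get both right.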

3.4. Improved Loss Function

The initial loss function of CycleGAN comprises two parts: the adversarial loss and the cycle consistency loss. CycleGAN has two discriminators, so there are two adversarial loss functions, as shown in Equations (8) and (9), where G and F are the two generators and $D_X$ and $D_Y$ are the two discriminators. The cycle consistency loss is shown in Equation (10). The total loss, Equation (11), is the sum of these terms:
$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$  (8)
$\mathcal{L}_{GAN}(F, D_X, X, Y) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$  (9)
$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1]$  (10)
$\mathcal{L} = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, X, Y) + \mathcal{L}_{cyc}(G, F)$  (11)
$\mathcal{L}'_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[D_Y(y)]^2 + \mathbb{E}_{x \sim p_{data}(x)}[1 - D_Y(G(x))]^2$  (12)
$\mathcal{L}'_{GAN}(F, D_X, X, Y) = \mathbb{E}_{x \sim p_{data}(x)}[D_X(x)]^2 + \mathbb{E}_{y \sim p_{data}(y)}[1 - D_X(F(y))]^2$  (13)
To further improve the quality of images generated by the model and accelerate its convergence, this paper optimizes both the adversarial loss and the cycle consistency loss. As shown in Equations (12) and (13), the logarithmic operation is replaced by a square operation to reduce computational complexity. For the cycle consistency loss, the L1 loss is replaced with the Smooth L1 loss [30], which handles large residuals better and thereby improves the convergence speed of the model.
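The two substitutions can be sketched as below. The squared adversarial terms follow the common least-squares GAN convention (real outputs pushed toward 1, fake outputs toward 0), whose sign placement may differ from the paper's exact Equations (12)–(13); the Smooth L1 threshold `beta = 1.0` is also an assumed default.

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Smooth L1: quadratic for small residuals, linear for large ones,
    which tames the gradient on badly mismatched pixels."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)

def cycle_loss(x, x_rec, y, y_rec):
    # Cycle consistency with Smooth L1 replacing the L1 norm of Eq. (10).
    return smooth_l1(x_rec - x).mean() + smooth_l1(y_rec - y).mean()

def lsq_adv_loss(d_real, d_fake):
    # Squared adversarial terms: the log of Eqs. (8)-(9) is replaced by a
    # square (least-squares GAN style), avoiding log evaluations entirely.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

# Toy discriminator outputs on real and generated batches.
adv = lsq_adv_loss(np.full(8, 0.9), np.full(8, 0.1))
```

The quadratic-to-linear switch in `smooth_l1` is what the text credits for faster convergence: outlier pixels contribute bounded gradients instead of the unbounded ones an L2 term would produce, while small residuals still get a smooth quadratic basin.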

4. Experiments

4.1. Datasets

The infrared image dataset used in this paper originates from the 2023 provincial substation inspection conducted by a southern network company. The dataset includes a total of 982 images of seven types of substation equipment, such as lightning arresters, circuit breakers, isolation switches, and current transformers. Of these, 282 images are of low quality and 700 are of high quality. The dataset is randomly divided into training and test sets in a 2:1 ratio, and 94 low-quality images were used to test model performance.

4.2. Implementation Details

All experiments in this paper were implemented under Windows 10. The deep learning framework, GPU, and CUDA versions are PyTorch 1.12.1, an Nvidia RTX 3060, and 12.0.13, respectively.

4.3. Objective Evaluation

To evaluate the performance of the algorithm objectively, image entropy, colorfulness, saturation, and feature similarity are selected as evaluation indices.
(1) Image entropy measures the amount of information contained in an image; the larger the value, the richer the information, so it effectively evaluates infrared image content. The formula is shown in Equation (14):
$Entropy = -\sum_{i} p(i) \log p(i)$  (14)
(2) Colorfulness measures the richness of image color; the larger the value, the richer the color, so it effectively evaluates infrared color richness. Its formula is shown in Equation (15), where $\sigma_{rgyb} = \sqrt{\sigma_{rg}^2 + \sigma_{yb}^2}$ and $\mu_{rgyb} = \sqrt{\mu_{rg}^2 + \mu_{yb}^2}$, with $\sigma_{rg}$, $\sigma_{yb}$, $\mu_{rg}$, and $\mu_{yb}$ the standard deviations and means of the opponent channels rg and yb, respectively:
$Colorfulness = \sigma_{rgyb} + 0.3 \times \mu_{rgyb}$  (15)
(3) Saturation indicates the vividness of colors in an image; a higher saturation value suggests the presence of more vivid colors, and for infrared images, increasing saturation effectively improves image quality. Here, saturation is computed by averaging the saturation of all pixels in the image, allowing an effective evaluation of the contrast information in the infrared image. The calculation is shown in Equation (16), where $R(u, v)$, $G(u, v)$, and $B(u, v)$ are the red, green, and blue channel values at pixel $(u, v)$ and $N$ is the number of pixels:
$Saturation = \dfrac{1}{N} \sum_{u=1}^{U} \sum_{v=1}^{V} \left[ 1 - \dfrac{3 \min\{R(u,v), G(u,v), B(u,v)\}}{R(u,v) + G(u,v) + B(u,v)} \right]$  (16)
(4) Feature similarity (FSIM) [31] regards phase congruency as the main feature and image gradient magnitude as the secondary feature; the two play complementary roles in characterizing image quality. The larger the FSIM value, the higher the image quality. Its calculation, shown in Equation (17), proceeds in two stages: the first computes the local similarity map $S_L(x)$, and the second pools the similarity map into a single score using the phase congruency $PC_m(x)$ as a weight. FSIM is used as an evaluation index to accurately measure the global feature similarity of enhanced infrared images:
$FSIM = \dfrac{\sum_{x \in \Omega} S_L(x) \, PC_m(x)}{\sum_{x \in \Omega} PC_m(x)}$  (17)
In addition to the above evaluation indexes, this paper also introduces the number of model parameters as an evaluation index.
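The first three scalar indices are simple enough to sketch directly. The following NumPy implementations assume 8-bit RGB input, a base-2 logarithm for entropy, and the Hasler–Süsstrunk opponent-channel definition ($rg = R - G$, $yb = \frac{R+G}{2} - B$) for colorfulness; these conventions are assumptions, as the paper does not pin them down.

```python
import numpy as np

def entropy(gray):
    # Eq. (14): Shannon entropy of the grey-level histogram (base 2 assumed).
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log(0) terms contribute nothing
    return -np.sum(p * np.log2(p))

def colorfulness(img):
    # Eq. (15): sigma_rgyb + 0.3 * mu_rgyb over the opponent channels.
    R, G, B = (img[..., c].astype(float) for c in range(3))
    rg, yb = R - G, 0.5 * (R + G) - B
    sigma = np.hypot(rg.std(), yb.std())   # sqrt(sigma_rg^2 + sigma_yb^2)
    mu = np.hypot(rg.mean(), yb.mean())    # sqrt(mu_rg^2 + mu_yb^2)
    return sigma + 0.3 * mu

def saturation(img):
    # Eq. (16): mean HSI saturation, 1 - 3*min(R,G,B)/(R+G+B), per pixel.
    rgb = img.astype(float)
    s = 1.0 - 3.0 * rgb.min(axis=-1) / (rgb.sum(axis=-1) + 1e-8)
    return s.mean()

rng = np.random.default_rng(3)
sample = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
scores = (entropy(sample.mean(axis=-1)), colorfulness(sample), saturation(sample))
```

Sanity checks follow directly from the definitions: a constant image has zero entropy, a grey image (R = G = B) has zero colorfulness and essentially zero saturation.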

4.4. Iterative Process Analysis

In this paper, the experimental parameters are set as follows: the input image size is 512 × 512 × 3, the number of training epochs is 300, the adaptive moment estimation (Adam) optimizer is used for training, and the initial learning rate is 2 × 10⁻⁴. As shown in Figure 6, compared with the original algorithm, the proposed algorithm converges faster and reaches a smaller loss value. The results show that the convergence speed of the model is greatly improved by replacing the logarithm operation with the computationally cheaper square operation, and that the Smooth L1 loss function greatly reduces the loss value and improves model performance.

4.5. Comparison with Other Methods

To evaluate the performance of the proposed algorithm objectively and accurately, this paper includes comparisons with several traditional image enhancement methods, such as histogram equalization, the Retinex algorithm, and guided filtering, as well as some of the latest deep learning image enhancement methods, including DCGAN and the diffusion model (DM). Table 1 presents the relevant evaluation indicators for these algorithms. The enhancement effects of the Retinex algorithm and the guided filtering algorithm are highly dependent on specific parameter settings; consequently, their performance is suboptimal when processing images of infrared substation equipment. As can be seen from Table 1, the maximum image entropy is obtained by the DM method, which indicates that the DM method has an advantage in the richness of the generated image content, although its image quality is poor. Compared with mainstream image enhancement algorithms, the proposed algorithm has obvious advantages, achieving optimal results in terms of image colorfulness, saturation, and feature similarity, with average improvements of 10.91%, 18.89%, and 29.82%, respectively. At the same time, owing to the introduction of the self-attention mechanism and the efficient local attention mechanism, which have fewer parameters, the number of model parameters of the proposed algorithm is reduced by 37.89% compared with the original algorithm. This shows that the proposed algorithm has obvious advantages in processing images of infrared substation equipment.

4.6. Visual Effect Comparisons

To better illustrate the superiority of the proposed algorithm, this section presents the enhancement effects of the various algorithms in Figure 7. As evident from the figure, the proposed algorithm effectively enhances low-quality images, even in scenarios with complex backgrounds and multiple types of substation equipment, significantly improving image contrast, brightness, saturation, and other features. Traditional methods, such as the Retinex algorithm and histogram equalization, have limited effectiveness in enhancing low-quality infrared images of substation equipment. While deep learning-based image enhancement methods can improve image quality, they often lose detailed features, which can result in incomplete images or alterations to the original features of the image. For example, in images enhanced by the diffusion model, the characteristics of the current transformer may appear at the voltage balancing ring of the arrester, while images enhanced by DCGAN tend to lose some features and exhibit blurriness. The contrast, saturation, and colorfulness of images enhanced by the original CycleGAN remain suboptimal. In contrast, the algorithm proposed in this paper significantly improves the contrast, colorfulness, and saturation of infrared images, indicating that the added self-attention mechanism effectively enhances image contrast. Furthermore, compared to the other algorithms, the proposed method addresses the issue of blurred image details, demonstrating its superiority.

4.7. Ablation Experiment

To verify the effectiveness of the improvements presented in this paper, we conduct an ablation experiment, whose results are shown in Table 2. Scheme 1 enhances the model by incorporating the self-attention mechanism, significantly reducing the number of model parameters and achieving a lightweight model with a parameter count of just 4.32 M; however, its FSIM value is low, indicating poor image enhancement performance. Scheme 2 builds upon Scheme 1 by adding the ELA attention mechanism, which enhances the generator’s ability to extract image details, as evidenced by the improved image quality. Scheme 3 further refines the model by introducing the feature pyramid structure on top of Scheme 2, improving the feature fusion capability of the generator network and addressing the issue of missing information in the generated images. Building on Scheme 3, the proposed method introduces the dual-channel feature extraction network in the discriminator, which improves the discriminator’s feature comparison capability and yields the highest FSIM value of 0.61. This indicates that the proposed algorithm significantly enhances image quality compared to the original algorithm.

4.8. Visualization of Image Recognition Results

The purpose of infrared image enhancement is to facilitate subsequent advanced machine vision tasks, such as target recognition and image segmentation. To evaluate the effectiveness of image enhancement, the CenterNet object recognition algorithm is employed in this section, with experimental results presented in Figure 8. Recognition performance on the unenhanced low-quality infrared images is suboptimal, leading to issues such as missed and false detections, whereas recognition performance on images enhanced by the proposed algorithm shows significant improvement. For instance, the contrast of the arrester in the unenhanced image is low, resulting in a relatively blurry image in which the device is not recognized; in the enhanced image, the target device is accurately identified. These results demonstrate that the proposed algorithm can effectively enhance image quality and improve the recognition performance of the network.

5. Conclusions

Low-quality infrared images frequently arise when substation equipment is inspected by UAVs. Therefore, an infrared image enhancement algorithm based on SA-CycleGAN is proposed in this paper. Firstly, a self-attention mechanism is introduced into the model, enhancing the network’s ability to map infrared features of substation equipment in complex backgrounds and improving the model’s capability to generate clear images from blurry features. An efficient local attention mechanism is incorporated into the encoding network, and feature fusion between the encoding and decoding networks is achieved through a feature pyramid structure, which enhances the contrast and brightness of the generated images. In the discriminator module, a dual-channel discriminator network is constructed, improving the network’s ability to extract differential information from images and thereby enhancing the quality of the generated images. Additionally, the convergence speed of the model is improved by optimizing the loss function. The proposed algorithm is tested on a self-constructed image dataset of infrared substation equipment. The experimental results demonstrate that the evaluation indices for colorfulness, saturation, and feature similarity of low-quality infrared images processed by the proposed algorithm reach 59.21, 0.89, and 0.61, respectively, indicating a significant improvement in infrared image quality. Furthermore, compared to the original algorithm, the number of model parameters is decreased by 37.89%, highlighting the superiority and practical value of the proposed algorithm.

Author Contributions

Conceptualization, methodology, Y.W.; software, writing, original draft preparation, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under Grant No. 52174198 and by the National Key Research and Development Program of Shaanxi Province (2023 YBSF-133).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to time limitations.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Different types of low-quality infrared images.
Figure 2. The structure of SA-CycleGAN.
Figure 3. The principle of the residual structure.
Figure 4. The structure of self-attention.
Figure 5. The structure of efficient localization attention.
Figure 6. The curve of loss with epoch.
Figure 7. Enhanced effect visualization.
Figure 8. Visualization of the image recognition results.
Table 1. Comparative experiments of different algorithms.

| Model      | Entropy | Colorfulness | Saturation | FSIM | Parameters (M) |
|------------|---------|--------------|------------|------|----------------|
| HE         | 7.03    | 54.05        | 0.74       | 0.52 | \              |
| Retinex    | 6.32    | 49.28        | 0.68       | 0.42 | \              |
| GF         | 4.31    | 44.26        | 0.55       | 0.38 | \              |
| DCGAN      | 6.54    | 50.33        | 0.72       | 0.49 | 10.66          |
| DM         | 7.22    | 56.69        | 0.83       | 0.48 | 12.97          |
| CycleGAN   | 6.81    | 52.51        | 0.86       | 0.47 | 7.84           |
| This paper | 7.03    | 59.21        | 0.89       | 0.61 | 4.87           |
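Two of Table 1's no-reference metrics can be sketched directly. The exact formulations used in the paper are not specified, so the sketch below assumes the standard definitions: Shannon entropy of the gray-level histogram, and the Hasler-Susstrunk colorfulness index over the opponent channels rg = R - G and yb = 0.5(R + G) - B. Input is a uint8 RGB array of shape (H, W, 3).

```python
import numpy as np

def entropy(img):
    """Shannon entropy (bits) of the gray-level histogram."""
    gray = img.mean(axis=2).astype(np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins: 0 * log(0) -> 0
    return float(-(p * np.log2(p)).sum())

def colorfulness(img):
    """Hasler-Susstrunk colorfulness index on opponent color channels."""
    r, g, b = (img[..., i].astype(float) for i in range(3))
    rg = r - g
    yb = 0.5 * (r + g) - b
    std = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return float(std + 0.3 * mean)
```

A perfectly uniform gray image scores zero on both metrics, so higher entropy and colorfulness in Table 1 indicate richer gray-level and color content in the enhanced output.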
Table 2. Ablation results.

| Method     | SA | ELA | FPN | Two-Channel | FSIM | PARA (M) |
|------------|----|-----|-----|-------------|------|----------|
| CycleGAN   | ×  | ×   | ×   | ×           | 0.47 | 7.84     |
| Scheme 1   | √  | ×   | ×   | ×           | 0.41 | 4.32     |
| Scheme 2   | √  | √   | ×   | ×           | 0.49 | 4.50     |
| Scheme 3   | √  | √   | √   | ×           | 0.55 | 4.51     |
| This paper | √  | √   | √   | √           | 0.61 | 4.87     |

Share and Cite

MDPI and ACS Style

Wang, Y.; Wu, B. Infrared Image Enhancement Method of Substation Equipment Based on Self-Attention Cycle Generative Adversarial Network (SA-CycleGAN). Electronics 2024, 13, 3376. https://doi.org/10.3390/electronics13173376
