Article

A Dual Convolutional Neural Network with Attention Mechanism for Thermal Infrared Image Enhancement

1 State Grid Jilin Electric Power Co., Ltd., Changchun Power Supply Company, Changchun 130041, China
2 School of Automation Engineering, Northeast Electric Power University, Jilin 132013, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(20), 4300; https://doi.org/10.3390/electronics12204300
Submission received: 11 September 2023 / Revised: 9 October 2023 / Accepted: 13 October 2023 / Published: 17 October 2023
(This article belongs to the Section Artificial Intelligence)

Abstract:
In industrial applications, thermal infrared images, which are commonly used, often suffer from issues such as low contrast and blurred details. Traditional image enhancement algorithms are limited in their effectiveness in improving the visual quality of thermal infrared images due to the specific nature of the application. Therefore, we propose a dual Convolutional Neural Network (CNN) combined with an attention mechanism to address the challenges of enhancing low-quality thermal infrared images and improving their visual quality. Firstly, we employ two parallel sub-networks to extract both global and local features. In one sub-network, we utilize a sparse mechanism incorporating dilated convolutions, while the other sub-network employs Feature Attention (FA) blocks based on channel attention and pixel attention. This architecture significantly enhances the feature extraction capability. The use of attention mechanisms allows the network to filter out irrelevant background information, enabling more flexible feature extraction. Finally, through a simple yet effective fusion block, we thoroughly integrate the extracted features to achieve an optimal fusion strategy, ensuring the highest quality enhancement of the final image. Extensive experiments on benchmark datasets and real images demonstrate that our proposed method outperforms other state-of-the-art models in terms of objective evaluation metrics and subjective assessments. The generated images also exhibit superior visual quality.

1. Introduction

Thermal infrared images possess excellent anti-interference capabilities and strong environmental adaptability, finding widespread application in civil aviation, industrial construction, outdoor transportation, and various other fields [1]. Particularly in the field of electric power, infrared thermography is often relied upon for accurate and rapid scanning and retrieval of damaged electrical equipment. However, due to the interference from thermal radiation in the detection environment, thermal infrared imaging suffers from poor resolution, blurred target margins, indistinct boundaries between targets and backgrounds, and inadequate image contrast. These issues significantly impact the accuracy of subsequent high-level visual tasks [2,3,4,5] involving thermal infrared images. Therefore, enhancing and improving the clarity of object contours, boosting the brightness contrast of images, and increasing the signal-to-noise ratio are crucial for elevating the quality of thermal infrared images. Such enhancements are of vital importance for the further development and application of infrared imaging technology.
In recent years, enhancement algorithms for thermal infrared images have been continuously evolving and improving. Starting from the early algorithms based on histogram equalization [6,7,8,9,10,11,12,13,14,15], advancements have led to the development of image enhancement techniques utilizing deep learning [16,17,18,19,20,21,22,23,24,25,26]. As a result, the enhanced effectiveness of thermal infrared images has demonstrated noticeable improvements. While traditional enhancement methods like histogram equalization have advantages in terms of simplicity and ease of implementation, they can introduce issues such as oversaturation, noise amplification, and extended processing times. Moreover, as the applications of thermal infrared images become more diverse and complex, traditional enhancement algorithms struggle to effectively handle thermal infrared images in different scenarios. This limitation significantly restricts the practical usage of thermal infrared images. In recent years, with the widespread application and development of deep learning in various image processing fields, attention has turned towards deep learning-based methods for enhancing thermal infrared images. However, there is still relatively limited research in this area. Although these approaches show some improvement compared with traditional methods, the enhancement results are not yet satisfactory.
To address this issue, we propose a dual convolutional neural network with an attention mechanism for enhancing thermal infrared images, aiming to further improve enhancement outcomes. In this approach, we employ two separate sub-networks to extract features at different scales. Subsequently, these features are effectively integrated using a simple yet powerful fusion block to ensure optimal visual effects in the enhanced images. In one of the sub-networks, we enhance the network’s feature extraction capability using a sparse mechanism based on dilated convolutions. In the other sub-network, we introduce an FA block [27] to filter out irrelevant background information in the images. This enhances the network’s focus on objects, textures, and details, ensuring the flexible extraction of meaningful features. By combining these techniques, our proposed method aims to overcome the limitations of both traditional enhancement methods and existing deep learning-based approaches, with the ultimate goal of achieving more effective and visually appealing enhancement results for thermal infrared images. The primary contributions of our proposed model are outlined as follows:
(1) We have introduced a dual convolutional neural network with an integrated attention mechanism for enhancing thermal infrared images. This network employs two parallel sub-networks to extract features at distinct scales. Furthermore, we utilize a fusion block to determine the optimal strategy for combining these features, resulting in enhanced images that boast rich details and improved visual quality.
(2) The FA block is introduced to enhance the flexibility of network feature extraction, enabling it to adapt to complex scenes more effectively. Through the incorporation of both channel attention and pixel attention mechanisms, the FA block efficiently filters out irrelevant background information, directing the network’s attention toward meaningful features. This integration significantly enhances the network’s ability to handle complex scenarios.
(3) The integration of the sparse mechanism using dilated convolutions is employed to enhance the network’s feature extraction capability. By utilizing dilated convolutions to expand the receptive field, the network can effectively capture information from a broader context, consequently enhancing its feature extraction capability.
The remaining sections of the article are organized as follows: Section 2 provides a brief overview of the developmental trajectory of thermal infrared image enhancement algorithms. Section 3 elaborates on the methodology we have proposed, including details about the network model’s loss function. In Section 4, we discuss the experimental setup along with the extensive experimental results. Finally, Section 5 concludes by summarizing the contributions and findings of our work.

2. Related Works

Like general image enhancement algorithms, thermal infrared image enhancement algorithms can also be broadly categorized into traditional image enhancement algorithms and deep learning-based image enhancement algorithms. However, due to the limited number of algorithms specifically designed for enhancing thermal infrared images, early thermal infrared image enhancement algorithms were similar to those used for regular images. With the advancement of deep learning and the construction of thermal infrared image datasets, there has been a gradual emergence of algorithms tailored for thermal infrared image enhancement. Nevertheless, such specialized algorithms remain relatively scarce.

2.1. Traditional Image Enhancement Methods

Among the diverse array of image enhancement methods, the histogram equalization algorithm is the most frequently employed. Classic histogram equalization (HE) [7] is a fundamental technique wherein the grayscale levels are transformed through a specific process, aiming to evenly distribute the image’s grayscale levels as much as possible. The underlying mechanism of “equalizing” the grayscale values consistently yields visually improved contrast in various types of images. Building upon this, researchers have continually refined the approach. For instance, the Adaptive Histogram Equalization (AHE) [8] divides images into grids of rectangular regions, performing contrast enhancement individually on each image block. The Plateau Equalization (PE) [10] algorithm selects an appropriate plateau threshold as a reference. When the probability distribution of a grayscale level in the histogram exceeds this threshold, the distribution is adjusted; if it falls below the threshold, it remains unchanged. Additionally, the Partially Overlapped Subblock Histogram Equalization (POSHE) [12] algorithm divides the original image into a limited number of regions. Based on corresponding weight values assigned to different regions, it ensures enhanced regional details. In comparison to local histogram equalization, this method reduces the computational load.
In addition to histogram equalization techniques, methods that enhance images by extracting details through filters are also widely used. For instance, the Bilateral Filtering (BF) [28] technique can extract image detail components to achieve image enhancement while avoiding excessive edge enhancement. However, its real-time performance is suboptimal. An improved version, the Fast Bilateral Filtering Enhancement Algorithm [29], developed by Paris and others, effectively reduces runtime. Advancements have also been made in infrared image enhancement through multiscale and multiresolution approaches. Pace et al. [30] introduced a multiscale pyramid method for image decomposition. To further enhance image enhancement, the Retinex-based image enhancement method [31], which simulates human visual mechanisms, has been introduced to infrared images. Addressing the issue of low human visual evaluation scores in wavelet-based infrared image enhancement methods, Zhan et al. [32] proposed an enhancement method based on both wavelet transform and Retinex for infrared images.
Frequency-domain-based thermal infrared image enhancement algorithms have also been widely employed. In 2001, Agaian et al. [33] transformed images into the frequency domain, utilized high-pass filters to extract the high-frequency components of the images, enhanced them, and then converted them back to the spatial domain. Li et al. [34] replaced the traditional Fourier transform with a fractional Fourier transform in their algorithm, preserving more image details. Shcherbinin et al. [35] used local phase congruency analysis to sharpen image details. Jiang et al. [36] proposed a novel rotation method to achieve higher-resolution digital imaging by utilizing higher-frequency diffraction patterns in the diagonal direction for resolution enhancement. Guo et al. [37] introduced an enhancement algorithm based on the fractional wavelet transform. They decomposed the input image using fractional wavelet transform and applied nonlinear enhancement to high-resolution subbands.
In summary, traditional infrared enhancement techniques amplify noise interference while enhancing details, making it challenging to improve the signal-to-noise ratio. Although artifacts in enhanced infrared images are somewhat reduced, they are still challenging to eliminate. The limited applicability of traditional algorithms significantly constrains the effectiveness of thermal infrared image enhancement.

2.2. Deep Learning-Based Image Enhancement Methods

With the widespread application of deep learning in the field of image processing, researchers have started exploring image enhancement techniques based on deep learning. Currently, deep learning-based image enhancement methods [17,20,21,22,23] mainly target the problems of low brightness, low contrast, noise, and artifacts in underexposed visible light images, aiming to improve visual quality. Research specifically focused on enhancing contrast and details in thermal infrared images is limited, but in general, methods using convolutional neural networks have significantly improved the quality of image enhancement.
For instance, Shen et al. [20] designed the Multi-Scale Retinex Net, which utilizes convolutional neural networks and the Retinex theory to achieve end-to-end mapping between dark and bright images. Similarly, the Kindling the Darkness network (KinD) [21], also based on Retinex, trains models using pairs of images captured under different exposure conditions. This model decomposes images into two parts, adjusting illumination in one and recovering reflectance in the other, and displays strong robustness against severe visual defects. The Global Illumination-Aware and Detail-Preserving Network [22] rescales input images to a specific size before feeding them into an encoder/decoder network to generate global illumination prior knowledge. The Multi-branch Low-light Enhancement Network (MBLLEN) [23] extracts rich features at different levels using multiple sub-networks, enhancing brightness and contrast while removing artifacts and noise. Addressing the need to extract weak infrared targets and background sub-images, Fan et al. [24] proposed an infrared image enhancement method that uses convolutional neural networks to highlight targets and suppress background clutter, effectively improving the contrast between weak targets and the background. Kuang et al. [25] introduced a contrast and detail enhancement method for infrared images based on conditional generative adversarial networks, preventing the amplification of background noise while further enhancing contrast and details. Wang et al. [26] introduced a Target Attention Deep Neural Network (TADNN), which achieves discriminative enhancement in an end-to-end manner. To further improve thermal infrared image enhancement and obtain a more outstanding visual effect, Park et al. [16] proposed imitation learning based on an enhanced Swin Transformer model. Furthermore, Pang et al. [17] leveraged a detail enhancement network composed of multiple Convolutional Mixed Attention Blocks (MAB), residual learning (RL), and upsampling units to extract deep features from the input and learn meaningful thermal radiation target information. While the aforementioned models have achieved good results in thermal infrared image enhancement, they often overlook the details and visual quality of the enhanced images. To address this issue, we have designed a dual-branch thermal infrared image enhancement network. This network leverages attention mechanisms and dilated convolutions to improve the model’s feature extraction capability and flexibility. Additionally, we employ a learnable fusion block to effectively fuse features, resulting in enhanced images with rich details and excellent visual quality.

3. Proposed Method

As is widely recognized, achieving rich image details and superior visual quality necessitates robust feature extraction. Effectively extracting these features represents a critical challenge. In this study, to ensure efficient feature extraction, we employ two parallel and independent sub-networks to extract multi-scale features. A fusion block is employed to integrate these features, ensuring their comprehensive utilization. The overall network architecture is depicted in Figure 1. Through the parallel and independent nature of these two sub-networks, we can capture diverse features at different scales. This strategy ensures the accurate reconstruction of image textures, details, and other distinctive features, thereby guaranteeing the effectiveness of image enhancement and elevating the visual quality of the enhanced images.

3.1. Overall Framework

The overall structure of the entire network model is depicted in Figure 2. The upper sub-network incorporates the Feature Attention (FA) block, leveraging attention mechanisms to filter out irrelevant background information and focus on the parts of the image that require enhancement. This approach enhances image details and textures more effectively. The lower sub-network introduces a sparse mechanism, significantly boosting the network’s feature extraction capability without increasing computational costs, resulting in richer feature information. The final fusion block ensures that the extracted features are fully utilized and effectively integrated into the enhanced image. Our proposed model combines extensive feature extraction with efficient feature fusion, enabling effective image enhancement and ensuring enhanced visual quality.
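To make this data flow concrete, the sketch below shows how the two branches and the fusion block fit together in PyTorch. This is a minimal skeleton under our own naming: the class and argument names are ours for illustration, and the sub-network and fusion modules are those described in Sections 3.2, 3.3 and 3.4.

```python
import torch
import torch.nn as nn

class DualEnhancementNet(nn.Module):
    """Minimal skeleton of the dual-branch design: two parallel
    sub-networks produce initial enhanced images, and a learnable
    fusion block merges them into the final output."""
    def __init__(self, attention_branch: nn.Module,
                 sparse_branch: nn.Module, fusion: nn.Module):
        super().__init__()
        self.attention_branch = attention_branch  # FA-block sub-network (Section 3.2)
        self.sparse_branch = sparse_branch        # dilated-conv sub-network (Section 3.3)
        self.fusion = fusion                      # learnable fusion block (Section 3.4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.attention_branch(x)  # initial enhancement, branch 1
        b = self.sparse_branch(x)     # initial enhancement, branch 2
        # Concatenate the two initial results along the channel axis and fuse
        return self.fusion(torch.cat([a, b], dim=1))
```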

3.2. Feature Attention

In the face of complex scenes, many models struggle to effectively extract features. To further enhance our model’s adaptability to complex scenarios and improve the flexibility of network feature extraction, we integrate the Feature Attention (FA) block to optimize the sub-network structure. As shown in Figure 2, the upper sub-network comprises 8 convolutional layers and 3 FA blocks. The network’s input first passes through four convolutional layers to extract features. It is then downsampled and fed into three FA blocks, which use attention mechanisms to filter out unimportant information. Finally, the output is upsampled and passed through four convolutional layers to generate the initial enhanced image. It is worth noting that all convolutional layers in this process have a kernel size of 3 × 3 and a stride of 1, and, based on prior related work, we set the number of feature maps in the network to 64 to achieve the best results. The FA block is designed around attention mechanisms, and its detailed structure is illustrated in Figure 3. The FA block consists of Channel Attention (CA), Pixel Attention (PA), and local residual learning. Through sigmoid activation, the CA and PA modules assign different weights to the channels and pixels of the feature maps, enabling flexible processing of diverse information types. Meanwhile, local residual learning effectively filters out unimportant background information, allowing the network to focus on the most relevant information. To maximize the benefit of the attention mechanisms, we employ downsampling and upsampling operations, which enlarge the receptive field of the convolutional layers within the FA block and capture more feature information; operating on these lower-resolution spatial features further strengthens the feature attention mechanism. This integration of the FA block enhances the model’s capability to handle complex scenarios by selectively attending to critical features and filtering out unnecessary background information, ultimately promoting robust feature extraction in challenging situations.
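A minimal PyTorch sketch of such an FA block, following the FFA-Net design [27], is given below. The channel reduction ratio of 8 in the attention branches is an assumption carried over from FFA-Net rather than a detail stated here.

```python
import torch
import torch.nn as nn

class FABlock(nn.Module):
    """Feature Attention block, sketched after the FFA-Net design [27]:
    local residual learning followed by channel and pixel attention."""
    def __init__(self, channels: int = 64, reduction: int = 8):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # Channel attention: global average pooling -> 1x1 convs -> per-channel sigmoid weights
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        # Pixel attention: 1x1 convs -> one sigmoid weight per spatial position
        self.pa = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
            nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x + self.conv2(self.relu(self.conv1(x)))  # local residual learning
        y = y * self.ca(y)                            # re-weight channels
        y = y * self.pa(y)                            # re-weight pixels
        return x + y
```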

3.3. Sparse Mechanism

The primary architecture of the lower sub-network consists of an 18-layer CNN, as shown in Figure 2; the first 15 layers of the network serve as a sparse mechanism for feature extraction from the image. The subsequent 3 layers are responsible for generating the initial enhanced image. In both cases, whether using dilated convolutions or regular convolutions, the convolutional kernels have a size of 3 × 3. The network employs 64 feature maps throughout. Here, we leverage the sparse mechanism to enhance the network’s feature extraction capabilities. This mechanism comprises both regular convolutional layers and dilated convolutional layers. Notably, in the 2nd, 5th, 9th, 12th, and 14th layers, we replace regular convolutional layers with dilated convolutional layers. By incorporating dilated convolutions, we can expand the receptive field without significantly increasing computational complexity. However, continuous usage of dilated convolutions might lead to the loss of some contextual information due to their spaced-out convolution kernels, potentially resulting in a grid effect. To mitigate this issue, we have thoughtfully designed a sparse mechanism involving alternating regular and dilated convolutional layers to ensure the integrity of feature information. This approach has been proven effective in other computer vision tasks [38,39], maintaining contextual awareness while benefiting from the enhanced receptive field provided by dilated convolutions.
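The layer pattern can be rendered as follows. This is a minimal PyTorch sketch of the 15-layer feature extractor only (the final 3 image-generation layers are omitted); the dilation rate of 2 and the single-channel input are our assumptions, since the text does not specify them.

```python
import torch.nn as nn

def make_sparse_backbone(in_ch: int = 1, channels: int = 64) -> nn.Sequential:
    """Sketch of the 15-layer sparse feature extractor: 3x3 convolutions
    with dilated convolutions substituted at layers 2, 5, 9, 12 and 14,
    alternating with regular convolutions to avoid the grid effect."""
    dilated = {2, 5, 9, 12, 14}
    layers = []
    for i in range(1, 16):
        d = 2 if i in dilated else 1  # assumed dilation rate of 2
        c_in = in_ch if i == 1 else channels
        # padding equal to the dilation keeps the spatial size fixed for 3x3 kernels
        layers.append(nn.Conv2d(c_in, channels, 3, padding=d, dilation=d))
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)
```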

3.4. Fusion Block

To ensure the final enhanced image’s visual quality, the feature fusion process is crucial, as it must leverage the unique features extracted by both sub-networks. However, to strike a balance between enhancement efficacy and model complexity, we have designed a simple, learnable fusion block composed of three convolutional layers. In this fusion block, we first increase the number of feature maps through the initial convolutional layer, capturing richer feature information and providing ample features for enhancing the image. The subsequent two convolutional layers are then employed to refine the enhanced image. This straightforward structure serves a dual purpose: it effectively learns the optimal fusion strategy while preventing overly complex network architectures from affecting model performance. This design decision ensures that the enhanced image benefits from the distinctive features captured by both sub-networks while maintaining manageable model complexity.
In the model, all convolutional layers use a kernel size of 3 × 3. This consistent kernel size ensures that the model extracts features from the input images in a standardized manner across different layers. This choice of kernel size is common in many convolutional neural network architectures and contributes to the model’s ability to capture local patterns and details in the images effectively.
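A minimal sketch of such a fusion block is shown below, assuming the two initial enhanced images are concatenated along the channel dimension before fusion; the channel widths here are illustrative rather than taken from the paper.

```python
import torch.nn as nn

class FusionBlock(nn.Module):
    """Sketch of the three-layer learnable fusion block: the first
    convolution expands the feature maps, and the last two refine
    them into the final enhanced image."""
    def __init__(self, in_ch: int = 2, mid_ch: int = 64, out_ch: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),   # expand the feature maps
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),  # refine the fused features
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 3, padding=1))  # produce the enhanced image

    def forward(self, x):
        return self.body(x)
```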

3.5. Loss Function

In order to achieve better visual quality and higher objective evaluation metrics in the final enhanced images, we have opted to train our model using a combination of L1 loss [40] and SSIM loss functions.
L1 Loss. The L1 loss, also known as the mean absolute error, measures the absolute pixel-wise differences between the enhanced image and the ground truth image. It encourages the model to generate images that closely match the ground truth in terms of pixel values, promoting the preservation of details and textures. The L1 loss is represented by the following:
$$ L_{l1} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - x_i \right| \quad (1) $$

where $y_i$ and $x_i$ denote the ground truth and the output of the network, respectively.
SSIM Loss. The Structural Similarity Index (SSIM) loss is a perceptual metric that evaluates the structural similarity between two images. It takes into account luminance, contrast, and structure information and aims to maintain the overall visual appearance of the enhanced image compared with the ground truth. The following is the definition of SSIM:
$$ SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \quad (2) $$

$$ L_{ssim} = 1 - SSIM(x, y) \quad (3) $$

where $\sigma_{xy}$ is the covariance of $x$ and $y$; $\mu$ and $\sigma^2$ denote the mean and variance of the corresponding image; and $c_1$ and $c_2$ are small constants that prevent division by zero.
The total loss function of our model is shown in Equation (4), where $\alpha$ and $\beta$ are positive weights:

$$ L_{total} = \alpha L_{l1} + \beta L_{ssim} \quad (4) $$
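A sketch of this combined objective in PyTorch is given below. It uses the third-party pytorch-msssim package for the differentiable SSIM term, which is our choice rather than something the paper specifies, and the default weights are placeholders, as the paper does not report the values of $\alpha$ and $\beta$.

```python
import torch
import torch.nn as nn
from pytorch_msssim import ssim  # any differentiable SSIM implementation works here

class TotalLoss(nn.Module):
    """Sketch of L_total = alpha * L_l1 + beta * L_ssim (Equation (4)).
    The weight values are assumptions, not taken from the paper."""
    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        super().__init__()
        self.alpha, self.beta = alpha, beta
        self.l1 = nn.L1Loss()  # mean absolute error, Equation (1)

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        l1_term = self.l1(output, target)
        ssim_term = 1.0 - ssim(output, target, data_range=1.0)  # Equation (3)
        return self.alpha * l1_term + self.beta * ssim_term
```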

4. Experiments

In this section, we begin by describing the dataset we used, along with our data preprocessing methods and experimental setup. Subsequently, on this dataset, we conducted both quantitative and qualitative comparisons between the proposed model and several state-of-the-art models. Additionally, we performed qualitative comparative experiments on real images. To demonstrate the effectiveness and rationality of our model’s network structure design and branch subnetwork settings, we conducted ablation experiments.

4.1. Dataset and Experimental Settings

Training Dataset. Due to the scarcity of paired low-quality and high-quality thermal infrared images in commonly used datasets, which are primarily intended for advanced visual tasks such as image classification, object detection, and fault diagnosis, we employed a novel approach to address the limited dataset issue. Specifically, we utilized a random contrast function to manipulate high-quality thermal infrared images and generate corresponding low-quality images. To overcome the lack of available paired data, we leveraged the VT5000 [41] dataset, designed for object detection, and the Server [42] dataset, intended for fault diagnosis. The VT5000 dataset encompasses a total of 5000 thermal infrared images across 11 different scenes and environments. Similarly, the Server dataset comprises 1350 thermal infrared images. The images in these datasets were all captured using FLIR (Forward-Looking Infrared) T640 and T610 cameras, with dimensions of 640 × 480 pixels. The operational temperature ranges for these devices are −40 °C to +2000 °C and −40 °C to +650 °C, respectively.
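As a rough illustration of this degradation step, the sketch below compresses an image’s dynamic range around its mean by a random factor. The authors’ exact contrast function and its parameter range are not specified, so both are assumptions here.

```python
import numpy as np

def degrade_contrast(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Generate a low-quality counterpart of a high-quality thermal image
    by randomly reducing its contrast (assumed form of the degradation)."""
    factor = rng.uniform(0.3, 0.7)  # assumed range of contrast reduction
    mean = img.mean()
    # Pull pixel values toward the mean, shrinking the dynamic range
    low = mean + factor * (img.astype(np.float32) - mean)
    return np.clip(low, 0, 255).astype(img.dtype)

# Example usage: low = degrade_contrast(img, np.random.default_rng(0))
```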
Testing Dataset. To comprehensively evaluate our model from both objective and subjective perspectives, we curated test datasets by selecting samples from the VT5000 and Server datasets. Specifically, we chose 500 images from the VT5000 dataset and 100 images from the Server dataset to construct the testing dataset, allowing us to assess the model’s performance using objective evaluation metrics. Moreover, we further validated the effectiveness and generalization capability of our proposed model by testing it on real-world low-quality thermal infrared images captured in authentic scenarios.
Experimental Settings. Our model is implemented in PyTorch and trained on an Nvidia RTX 3060 GPU. To augment the training dataset and enhance the model’s generalization capability, we performed random cropping on the images in the dataset, extracting image patches of size 256 × 256 as inputs for the model. Additionally, image patches were randomly rotated by 90, 180, and 270 degrees, as well as horizontally flipped, to further augment the data. For our experiments, we employed the Adam optimizer [43] with the default values of $\beta_1$ and $\beta_2$ set to 0.9 and 0.999, respectively. The initial learning rate was set to 0.0001, and we employed a cosine annealing strategy [44] to adjust the learning rate over the course of training. We present the convergence process of our model during training, including objective evaluation metrics and loss functions, in Figure 4. To demonstrate the effectiveness of our proposed model in enhancing thermal infrared images, we compared it against the following methods: HE [7], TEN [18], TIECNN [19], IE-GAN [25], and DBDNet [45]. We use objective evaluation metrics, namely peak signal-to-noise ratio (PSNR) and SSIM, to assess the enhancement performance of the aforementioned models. The definition of PSNR is as follows:
$$ PSNR = 10 \log_{10}\left(\frac{L^2}{MSE}\right) \quad (5) $$

where $L$ is the maximum possible pixel value, and $MSE$ is the mean squared error between the original and enhanced images.
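For reference, the PSNR definition above translates directly into code; a minimal NumPy version is shown below, assuming 8-bit images.

```python
import numpy as np

def psnr(reference: np.ndarray, enhanced: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB per Equation (5); max_val is the maximum possible
    pixel value (255 for 8-bit images)."""
    mse = np.mean((reference.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```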

4.2. The Experimental Results on the Dataset

Quantitative experimental results are summarized and compared in Table 1. The best-performing result is highlighted in bold, and the second-best result is indicated with an underline. It is evident from the table that our model achieves the best performance on both datasets. Specifically, on the VT5000 dataset, our model achieves a PSNR of 36.58 dB and an SSIM score of 0.8269, while on the Server dataset, it achieves a PSNR of 33.75 dB and an SSIM score of 0.7957. Compared with the second-best-performing method, DBDNet, our model achieves significant improvements in terms of PSNR and SSIM on both datasets. On the VT5000 dataset, our model exhibits a PSNR improvement of 5.93 dB and an SSIM improvement of 14.13% over DBDNet. Similarly, on the Server dataset, our model achieves a PSNR improvement of 5.22 dB and an SSIM improvement of 13.46% over DBDNet. These quantitative experimental results underscore the effectiveness and state-of-the-art performance of our model in the realm of thermal infrared image enhancement. The substantial improvements achieved in terms of image quality metrics reinforce the notion that our model outperforms existing methods, contributing to its significance in the field of thermal infrared imaging enhancement.
The qualitative comparison results of the enhancement effects are shown in Figure 5 and Figure 6. It is evident that our model achieves remarkable performance in enhancing textures and details. It effectively highlights the texture and detail information in the images, leading to a significant improvement in the overall visual quality of the images. When compared with methods such as TEN, TIECNN, and IE-GAN, our model demonstrates a clear enhancement in thermal infrared image quality. Furthermore, in comparison to DBDNet, the superiority of our model in enhancing textures and details is clearly evident.

4.3. The Experimental Results on Real-World Images

To demonstrate the generalization ability of our proposed model and its efficacy in handling real-world scenarios, we collected low-quality thermal infrared images captured in real-world scenes to test our model. The collected images were captured using FLIR (Forward-Looking Infrared) T640 cameras, which have an operational temperature range of −40 °C to +2000 °C. The captured images are all of size 640 × 480 pixels. Given that images from real scenes lack ground truth, making it impossible to calculate objective evaluation metrics like PSNR or SSIM, we conducted qualitative comparisons solely based on visual inspection. The comparison results in Figure 7 and Figure 8 support the effectiveness of our method. Compared with other thermal infrared image enhancement models, our model exhibits comparable or even superior performance. Whether in terms of enhanced image textures, details, or overall visual quality, our model consistently outperforms the other models. This showcases the robust generalization ability and effectiveness of our proposed model in handling real-world scenarios.

4.4. Ablation Study

As mentioned earlier, abundant features contribute to better image enhancement results. In this paper, we have utilized FA blocks and a sparse mechanism combining dilated convolutions (DC) to enhance the model’s feature extraction capability. To provide further evidence for the feasibility of employing FA blocks and dilated convolutions in our model, we conducted a series of ablation experiments to investigate how the efficiency of our network varies based on different configurations. Table 2 presents the results of experiments with different configurations: (1) Base: Two sub-networks with normal convolutions and the fusion block. (2) Base + DC: Add dilated convolutions to Base. (3) Base + FA: Add FA blocks into Base. (4) Base + FA + DC: Add both FA blocks and dilated convolutions into Base. As Table 2 shows, each component improves performance on its own, and combining them yields the best results, raising PSNR over the Base by 4.84 dB on VT5000 and 4.69 dB on Server.

5. Conclusions

In this paper, we have introduced a dual convolutional neural network enhanced with attention mechanisms for thermal infrared image enhancement. To achieve superior enhancement results, we have devised two parallel subnetworks to capture features at distinct scales. In one of the subnetworks, we have employed FA (Feature Attention) blocks to refine feature extraction. Through local residual learning, channel attention, and pixel attention mechanisms, these blocks allocate weights to feature maps, effectively filtering out irrelevant background information and focusing on objects, textures, and details. This adaptability allows our model to handle feature information flexibly and adapt well to complex scenes. In the other subnetwork, we have combined dilated convolutions to form a sparse mechanism. By alternating between regular and dilated convolutional layers, we have enlarged the receptive field, enhancing the network’s feature extraction capacity while mitigating the grid effect associated with dilated convolutions. Lastly, we have designed a simple yet effective fusion block that employs convolution operations to adaptively learn optimal fusion strategies. This ensures the final enhanced images exhibit superior visual quality. Extensive experiments have substantiated our model’s superior performance over state-of-the-art approaches in thermal infrared image enhancement. It is capable of generating images with remarkable visual quality. Moving forward, our aim is to further optimize our model, attaining even more remarkable enhancement results. Simultaneously, we plan to utilize our model for a broader range of image enhancement tasks.

Author Contributions

Methodology, P.G.; Software, W.Z.; Investigation, H.M.; Data curation, Z.W.; Writing—original draft, Z.L.; Writing—review & editing, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the State Grid Jilin Electric Power Co., Ltd. Science and Technology Project (Contract number: SGJLCC00KJJS2302567), which also funded the APC.

Data Availability Statement

Data will be shared upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Katırcıoğlu, F.; Çay, Y.; Cingiz, Z. Infrared image enhancement model based on gravitational force and lateral inhibition networks. Infrared Phys. Technol. 2019, 100, 15–27. [Google Scholar] [CrossRef]
  2. Liu, S.; Wang, S.; Liu, X.; Lin, C.-T.; Lv, Z. Fuzzy detection aided real-time and robust visual tracking under complex environments. IEEE Trans. Fuzzy Syst. 2020, 29, 90–102. [Google Scholar] [CrossRef]
  3. He, X.; Chen, C.Y.-C. Exploring reliable visual tracking via target embedding network. Knowl.-Based Syst. 2022, 244, 108584. [Google Scholar] [CrossRef]
  4. Abdar, M.; Fahami, M.A.; Rundo, L.; Radeva, P.; Frangi, A.F.; Acharya, U.R.; Khosravi, A.; Lam, H.-K.; Jung, A.; Nahavandi, S. Hercules: Deep hierarchical attentive multilevel fusion model with uncertainty quantification for medical image classification. IEEE Trans. Ind. Inform. 2022, 19, 274–285. [Google Scholar] [CrossRef]
  5. Zhang, H.; Li, M.; Miao, D.; Pedrycz, W.; Wang, Z.; Jiang, M. Construction of a feature enhancement network for small object detection. Pattern Recognit. 2023, 143, 109801. [Google Scholar] [CrossRef]
  6. Liu, J.; Zhou, X.; Wan, Z.; Yang, X.; He, W.; He, R.; Lin, Y. Multi-Scale FPGA-Based Infrared Image Enhancement by Using RGF and CLAHE. Sensors 2023, 23, 8101. [Google Scholar] [CrossRef] [PubMed]
  7. Hummel, R. Image enhancement by histogram transformation. Comput. Graph. Image Process. 1977, 6, 184–195. [Google Scholar] [CrossRef]
  8. Lee, J.-S. Digital image enhancement and noise filtering by use of local statistics. IEEE Trans. Pattern Anal. Mach. Intell. 1980, 2, 165–168. [Google Scholar] [CrossRef]
  9. Reza, A.M. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. Signal Process. Syst. 2004, 38, 35–44. [Google Scholar] [CrossRef]
  10. Vickers, V.E. Plateau equalization algorithm for real-time display of high-quality infrared imagery. Opt. Eng. 1996, 35, 1921–1926. [Google Scholar] [CrossRef]
  11. Kim, Y.-T. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Trans. Consum. Electron. 1997, 43, 1–8. [Google Scholar]
  12. Kim, J.-Y.; Kim, L.-S.; Hwang, S.-H. An advanced contrast enhancement using partially overlapped sub-block histogram equalization. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 475–484. [Google Scholar]
  13. Singh, K.; Vishwakarma, D.K.; Walia, G.S.; Kapoor, R. Contrast enhancement via texture region based histogram equalization. J. Mod. Opt. 2016, 63, 1444–1450. [Google Scholar] [CrossRef]
  14. Sim, K.; Tso, C.; Tan, Y. Recursive sub-image histogram equalization applied to gray scale images. Pattern Recognit. Lett. 2007, 28, 1209–1221. [Google Scholar] [CrossRef]
  15. Parihar, A.S.; Verma, O.P. Contrast enhancement using entropy-based dynamic sub-histogram equalisation. IET Image Process. 2016, 10, 799–808. [Google Scholar] [CrossRef]
  16. Park, Y.; Sung, Y. Imitation Learning through Image Augmentation Using Enhanced Swin Transformer Model in Remote Sensing. Remote Sens. 2023, 15, 4147. [Google Scholar] [CrossRef]
  17. Pang, Z.; Liu, G.; Li, G.; Gong, J.; Chen, C.; Yao, C. An Infrared Image Enhancement Method via Content and Detail Two-Stream Deep Convolutional Neural Network. Infrared Phys. Technol. 2023, 132, 104761. [Google Scholar] [CrossRef]
  18. Choi, Y.; Kim, N.; Hwang, S.; Kweon, I.S. Thermal image enhancement using convolutional neural network. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
  19. Lee, K.; Lee, J.; Lee, J.; Hwang, S.; Lee, S. Brightness-based convolutional neural network for thermal image enhancement. IEEE Access 2017, 5, 26867–26879. [Google Scholar] [CrossRef]
  20. Shen, L.; Yue, Z.; Feng, F.; Chen, Q.; Liu, S.; Ma, J. MSR-net: Low-light Image Enhancement Using Deep Convolutional Network. arXiv 2017, arXiv:1711.02488. [Google Scholar]
  21. Zhang, Y.; Zhang, J.; Guo, X. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019. [Google Scholar]
  22. Wang, W.; Wei, C.; Yang, W.; Liu, J. Gladnet: Low-light enhancement network with global awareness. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; IEEE: Piscataway, NJ, USA, 2018. [Google Scholar]
  23. Lv, F.; Lu, F.; Wu, J.; Lim, C. MBLLEN: Low-Light Image/Video Enhancement Using CNNs. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 5–8 September 2018; Volume 220. [Google Scholar]
  24. Fan, Z.; Bi, D.; Xiong, L.; Ma, S.; He, L.; Ding, W. Dim infrared image enhancement based on convolutional neural network. Neurocomputing 2018, 272, 396–404. [Google Scholar] [CrossRef]
  25. Kuang, X.; Sui, X.; Liu, Y.; Chen, Q.; Gu, G. Single infrared image enhancement using a deep convolutional neural network. Neurocomputing 2018, 332, 119–128. [Google Scholar] [CrossRef]
  26. Wang, D.; Lai, R.; Guan, J. Target attention deep neural network for infrared image enhancement. Infrared Phys. Technol. 2021, 115, 103690. [Google Scholar] [CrossRef]
  27. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. Proc. Conf. AAAI Artif. Intell. 2020, 34, 11908–11915. [Google Scholar] [CrossRef]
  28. Barash, D. Fundamental relationship between bilateral filtering, adaptive smoothing, and the nonlinear diffusion equation. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 844–847. [Google Scholar] [CrossRef]
  29. Paris, S.; Durand, F. A fast approximation of the bilateral filter using a signal processing approach. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Part IV 9. Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  30. Pace, T.; Manville, D.; Lee, H.; Cloud, G.; Puritz, J. A multiresolution approach to image enhancement via histogram shaping and adaptive wiener filtering. In Visual Information Processing XVII; SPIE: Bellingham, WA, USA, 2008; Volume 6978, p. 697804. [Google Scholar]
  31. Li, Y.; Hou, C.; Tian, F.; Yu, H.; Guo, L.; Xu, G.; Shen, X.; Yan, W. Enhancement of infrared image based on the retinex theory. In Proceedings of the 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; IEEE: Piscataway, NJ, USA, 2007. [Google Scholar]
  32. Zhan, B.; Wu, Y. Infrared image enhancement based on wavelet transformation and retinex. In Proceedings of the 2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics, Washington, DC, USA, 26–28 August 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 1. [Google Scholar]
  33. Agaian, S.; Panetta, K.; Grigoryan, A. Transform-based image enhancement algorithms with performance measure. IEEE Trans. Image Process. 2001, 10, 367–382. [Google Scholar] [CrossRef] [PubMed]
  34. Li, X.M. Image enhancement in the fractional Fourier domain. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP), Hangzhou, China, 16–18 December 2013; IEEE: Piscataway, NJ, USA, 2013; Volume 1. [Google Scholar]
  35. Shcherbinin, A.; Kolchin, K.; Glazistov, I.; Rychagov, M. Sharpening image details using local phase congruency analysis. Electron. Imaging 2018, 30, 218-1–218-5. [Google Scholar] [CrossRef]
  36. Jiang, S.; Guan, M.; Wu, J.; Fang, G.; Xu, X.; Jin, D.; Liu, Z.; Shi, K.; Bai, F.; Wang, S.; et al. Frequency-domain diagonal extension imaging. Adv. Photonics 2020, 2, 036005. [Google Scholar] [CrossRef]
  37. Guo, C. The application of fractional wavelet transform in image enhancement. Int. J. Comput. Appl. 2021, 43, 684–690. [Google Scholar] [CrossRef]
  38. Tian, C.; Xu, Y.; Li, Z.; Zuo, W.; Fei, L.; Liu, H. Attention-guided CNN for image denoising. Neural Netw. 2020, 124, 117–129. [Google Scholar] [CrossRef]
  39. Tian, C.; Xu, Y.; Zuo, W.; Du, B.; Lin, C.-W.; Zhang, D. Designing and training of a dual CNN for image denoising. Knowl.-Based Syst. 2021, 226, 106949. [Google Scholar] [CrossRef]
  40. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  41. Tu, Z.; Ma, Y.; Li, Z.; Li, C.; Xu, J.; Liu, Y. RGBT salient object detection: A large-scale dataset and benchmark. IEEE Trans. Multimed. 2022. [Google Scholar]
  42. Liu, H.; Bao, C.; Xie, T.; Gao, S.; Song, X.; Wang, W. Research on the intelligent diagnosis method of the server based on thermal image technology. Infrared Phys. Technol. 2018, 96, 390–396. [Google Scholar] [CrossRef]
  43. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  44. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  45. Ma, J.; Peng, C.; Tian, X.; Jiang, J. DBDnet: A deep boosting strategy for image denoising. IEEE Trans. Multimed. 2021, 24, 3157–3168. [Google Scholar] [CrossRef]
Figure 1. Network Architecture Diagram. Two parallel sub-networks extract features at different scales, and feature fusion is achieved through a fusion block, ensuring the enhanced image’s visual quality.
Figure 2. Overall Network Architecture. The utilization of attention mechanism and sparse mechanism significantly enhances the network’s feature extraction capabilities.
Figure 3. Architecture of the FA Block. The FA block’s architecture encompasses local residual learning, channel attention, and pixel attention, endowing it with flexibility in feature extraction.
Figure 4. Network convergence analysis. (a) Diagram of convergence process of PSNR and SSIM with epoch. (b) Diagram of convergence process of Loss with epoch. (c) The PSNR and SSIM of validation with epoch.
Figure 5. Qualitative comparison on VT5000 Dataset. (a) Low Quality, (b) TEN, (c) TIECNN, (d) IE-GAN, (e) DBDNet, (f) Ours, (g) GT.
Figure 6. Qualitative comparison on Server Dataset. (a) Low Quality, (b) TEN, (c) TIECNN, (d) IE-GAN, (e) DBDNet, (f) Ours, (g) GT.
Figure 7. Qualitative comparison of real thermal infrared images of electronic components. (a) Low Quality, (b) TEN, (c) TIECNN, (d) IE-GAN, (e) DBDNet, (f) Ours.
Figure 8. Qualitative comparison of real thermal infrared images of switchgear. (a) Low Quality, (b) TEN, (c) TIECNN, (d) IE-GAN, (e) DBDNet, (f) Ours.
Table 1. Quantitative results on the VT5000 and Server datasets.

| Dataset | Metric    | TEN    | TIECNN | IE-GAN | DBDNet | Ours   |
|---------|-----------|--------|--------|--------|--------|--------|
| VT5000  | PSNR (dB) | 20.58  | 22.73  | 25.83  | 30.65  | 36.58  |
| VT5000  | SSIM      | 0.5291 | 0.5949 | 0.6597 | 0.7245 | 0.8269 |
| Server  | PSNR (dB) | 19.66  | 21.31  | 25.02  | 28.53  | 33.75  |
| Server  | SSIM      | 0.5185 | 0.5809 | 0.6253 | 0.7013 | 0.7957 |
Table 2. The results of ablation experiments on the VT5000 and Server datasets.

| Model                 | VT5000 PSNR (dB) | VT5000 SSIM | Server PSNR (dB) | Server SSIM |
|-----------------------|------------------|-------------|------------------|-------------|
| Base                  | 31.74            | 0.7352      | 29.06            | 0.7119      |
| Base + DC             | 32.85            | 0.7503      | 30.19            | 0.7364      |
| Base + FA             | 35.61            | 0.7988      | 32.25            | 0.7739      |
| Ours (Base + DC + FA) | 36.58            | 0.8269      | 33.75            | 0.7957      |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation: Gao, P.; Zhang, W.; Wang, Z.; Ma, H.; Lyu, Z. A Dual Convolutional Neural Network with Attention Mechanism for Thermal Infrared Image Enhancement. Electronics 2023, 12, 4300. https://doi.org/10.3390/electronics12204300