1. Introduction
The development of laser technology has been changing rapidly in recent years, and new means of active laser imaging are constantly being produced. However, affected by imaging conditions and noise interference, the contrast of the original laser image is low and cannot directly meet practical needs. Image enhancement algorithms can improve the image’s overall and local contrast and highlight the image’s detailed information. The enhanced images fit the visual features of the human eye better and are easier for machine identification. Image enhancement algorithms have a wide range of applications in military and civilian fields. The new innovative algorithm targets laser images with low brightness, low contrast, and high noise. With the development of deep learning research and the expansion of its application in various fields, it also has an excellent performance in the field of laser image enhancement.
Existing laser image enhancement algorithms are mainly enhanced from visual and detail features. The histogram, wavelet transform [
1], and Retinex theory are three types of improved traditional algorithms. The histogram mainly enhances the algorithm by expanding the grayscale part with more pixels, the wavelet transform uses a method of decomposing sub-bands and layering processing for enhancement, and Retinex mainly relies on scientific experiments and analysis to enhance images. But these three algorithms require many parameters to be determined for improvement, and their impact and generalization abilities are unclear, which can easily lead to problems such as excessive enhancement, loss of important information, and artificial noise. In recent years, deep learning has been widely used in the image field. LL-Net [
2] proposed by Lore et al. was the first to use deep learning in image enhancement. They designed a low-light network depth autoencoder to simultaneously enhance and denoise low-light noisy images. Li et al. [
3] proposed a convolutional neural network (CNN) image enhancement method. It takes low-illumination images as the network input and the output results are enhanced using the Retinex [
4] model. The method solves the problem of distortion caused by over-enhancement in previous methods, but has limitations when enhancing low-quality low-light images.
Since the introduction of generative adversarial networks (GANs) in 2014 [
5], it has been widely used for generating high-quality samples using a unique zero-sum game and adversarial training approach. This has strengthened the feature learning and representation abilities of GANs, making it applicable in various fields such as computer vision. However, the lack of paired data for supervised learning makes it difficult to represent the performance. Zhu et al. designed a cyclic consistent adversarial network (Cycle GAN) based on GANs and pairwise learning [
6]. It overcomes the shortage of GANs but is prone to the loss of image information and cannot satisfy the structural similarity between the generated image and the real image. In 2021, the EnlightenGAN model was proposed, and it does not require paired training sets. One attention-guided U-Net [
7] generator and one global–local discriminator are included, and excellent performance is achieved on a range of standard test data. However, in the encoding–decoding stage, U-Net extracts different levels of features after multiple down-sampling and up-sampling. This can easily distort the generated images [
8], especially for darker low-light images, and make it difficult to restore the image detail information. In addition, the generated image tends to have a single color, and unknown artifacts are generated during the enhancement process. The laser image does not guarantee the same effect as other datasets do [
9].
To solve the above problems, the paper takes improving the generalization ability of unsupervised deep learning models on laser image datasets as the starting point. It combines the advantages of such models with a newly constructed laser image dataset to study low-light laser image enhancement techniques. The main contributions are as follows:
The algorithm is optimized and improved based on the EnlightenGAN model, and its loss function is redesigned to improve the generalization ability and enhancement effect of the model.
A deep connection between the global discriminator and the local discriminator is established on the original structure of the EnlightenGAN model, allowing the global loss of the global discriminator to better serve the local optimization of the local discriminator.
A new self-regularized attention mechanism applicable to laser images is established. The convolution mode of downsampling is improved to fuse the attention features and the original image features using residuals.
2. Proposed Algorithm
The EnlightenGAN model employs an attention-guided [
10] U-Net as a generator and uses a dual discriminator to guide the global and local information and a self-feature preservation loss to guide the training process, maintaining texture and structure [
11], as shown in
Figure 1. In this section, we focus on two important building blocks: the global–local discriminator and the self-feature retention loss.
2.1. Global–Local Discriminator
In the original GAN, there is a generator that is responsible for generating target samples from Gaussian samples, while there is also a discriminator that is responsible for determining whether the input image is a target sample. The discriminator and the generator have their respective optimization schemes. The discriminator reduces the loss by increasing the correct rate of real samples and decreasing the accuracy of incorrect samples, while the generator reduces the loss by increasing the correct rate of incorrect samples. The optimization function is as follows:
D represents the discriminator, G represents the generator, and V represents the difference between the real data and the generated data. x is the target distribution image; z is the Gaussian input sample, where E is the expectation; p(x) and p(z), respectively, refer to the distribution of the target sample and the random Gaussian sample. D(x) generates the probability for whether an image is true, G(x) generates a new image, max V represents maximizing the ability of the discriminator to identify the real data or the generated data, and min V represents minimizing the ability of the discriminator to identify the real data or the generated data.
In the input image, the global discriminator often fails to achieve individual enhancement of the local area [
12]. Therefore, in order to adaptively enhance local regions and improve the global illumination, we propose a global–local discriminator structure in the model.
The global discriminator uses PatchGAN [
13] for true–false discrimination. The authors of the original paper modified the loss function of the standard GAN. The relative meaning is that it considers both the probability that fake data are more real than real data and the probability that real data are more real than fake data. The function of the relative discriminator is:
In the formula above, C is the discriminator network;
and
are sampled from the true and false distributions, respectively;
denotes the sigmoid function [
14]; the
function is replaced by the least squares GAN (LSGAN) loss. In these two formulas, the first is the loss of the discriminator. When a real image is correctly identified and a generated fake image is identified as fake, the discriminator’s loss decreases. Conversely, when a real image is incorrectly identified or a generated fake image is identified as real, the loss increases. The second formula is the loss function of the generator. When a real image is misidentified and a generated fake image is identified as real, the generator’s loss increases [
15]. The final loss functions of the global discriminator D and generator G are as follows:
The local discriminator mainly addresses the need for some local regions to be enhanced differently from other parts. A total of 5 random patches are cropped at a time from the output image and the real image. Then, their truth is distinguished. The difference between this and the global discriminator is that the local discriminator does not use the relative discriminator function, but still uses the original discriminator function [
16]. Here, the unmodified LSGAN is used as the adversarial loss:
This global–local discriminator structure ensures that all local patches of the enhanced image look like true normal light, which is the key to avoiding local overexposure or underexposure.
2.2. U-Net Generator Guided with Self-regularized Attention
U-Net has achieved great success in semantic segmentation, image recovery, and enhancement. U-Net preserves rich texture information by extracting features from different depth layers at multiple levels, and synthesizes high-quality images using multi-scale contextual information [
17].
The adoption of U-Net as a generator backbone network further proposes an easy-to-use network of attention mechanisms for U-Net. In low-light images with spatial variations in light, we prefer to enhance the dark areas rather than the light areas so that the output image is neither overexposed nor underexposed. Therefore, the illumination channel I of the input RGB image is normalized to [0, 1]. Then, 1 to I (the difference between elements) is used as the self-normalized attention map. The attention map is resized to fit each feature map and multiplied with all intermediate feature maps and the output image.
The attention-guided U-Net generator is implemented by 8 convolutional blocks, each consisting of two 3 × 3 convolutional layers, followed by LeakyReLu and batch subsumption layers. In the up-sampling phase, the standard inverse convolution layer is replaced with a bilinear up-sampling layer plus a convolutional layer to reduce the tessellation effect.
3. Model Improvement
3.1. Limitations and Ideas
3.1.1. The Global Discriminator Is Not Related to the Local Discriminator
In the original model, the global discriminator discriminates the entire generated image, and the local discriminator randomly prunes five small fragments from the generated image for discrimination. We contact both the global discriminator and the local discriminator to better train the model for sparsely generated image patches and improve model generalization.
3.1.2. The Self-regularized Attention Is Inconsistent with the Laser Images
In the self-regularized attention mechanism of the original model, more attention is paid to the enhancement of dark regions and the enhancement of bright regions is weakened. Regions with small gray levels will have good enhancement effects, whereas regions with large gray levels will have no obvious enhancement effect. The major differences between the laser dataset and other datasets are that the laser-imaged images do not need to highlight certain dark regions, but the original methods will erroneously highlight those regions.
3.1.3. Refine the Modulus of the Dark Channel
We find that merging the dark channel module in advance can better improve the generalization capability of the model. So, the position of the dark channel module is refined, and on this basis, we also add a residual part to avoid exploding the gradient of the entire model.
3.2. Strong Connection between Global Discriminators and Local Discriminators
Establishing a strong connection between the global and local discriminators further improves the optimization capability of the model. This improved method is proven to be capable of improving the local loss of complex texture images, as shown in
Figure 2.
G represents the generator, D represents the discriminator, GL represents the global loss, randomcrop represents the random cut, LL represents the local loss, and selection represents the selection based on the global loss. The purpose of this paper is to add the strong connection between global and local loss based on the original EnlightenGAN model, which facilitates the computation of the region with a large global loss when computing the local loss.
Both the global discriminator and the local discriminator are based on the PatchGAN model, and their key feature is that they are replaced with full convolution networks. The discriminator of the joint GAN model maps the input to a real number, i.e., the probability that the input sample is a real sample, and PatchGAN maps the input to an matrix. In the matrix, value represents the probability of each patch being a real sample, the average of is the final output of the discriminator, and X is in fact the feature map of the output of the convolution layer. The feature map allows us to track back to a certain position in the original image and observe how much influence that position has on the final output.
This essay establishes the relation between two discriminators on the original base. The global discriminator finds the position with the largest likelihood difference in the patch and extracts the corresponding image position. The corresponding position in the local discriminator is first clipped, and then it is randomly clipped, optimizing local generation.
Figure 3 shows that after obtaining the maximum index, the block of images with the largest difference in likelihood is extracted from the original image and then placed in the local discriminator together with the randomly cropped image block.
3.3. Down-sampling Convolution Module Fitting
Downsampling can be commonly understood as shrinking the image and reducing the number of sampling points of the matrix, which serves to reduce the computational effort, reduce information redundancy, and increase the perceptual field. The network implements downsampling by several successive CLB convolutional modules, each consisting of one convolutional kernel, one LeakyReLu activation function in series, and one BN. The features extracted by multilayer convolution have stronger semantic properties compared with other methods.
The original network directly adds attention modules from different scales to the feature map after down-sampling. This attention mechanism has not been extracted by the convolution module, and will lose some of the local detailed textures when merged with the upsampled image. It greatly affects the fusion of features, and so this paper makes targeted improvements for that portion of the locally lost textures.
As shown in
Figure 4, the order of the attention map is flexibly adjusted and the attention map is adjusted before the second layer of the CLB convolution module, while the residual structure is introduced in the second layer of the CLB and the network is made deeper by two constant mappings of the skip connection and activation function. In this way, the attention map can better integrate the original feature map and improve the robustness of the model.
3.4. Self-regularized Attention Mechanism
There are a few “dark regions” in the laser data that do not need to be improved. The original model aims to enhance all the low pixels and weaken the high pixels. It does not work directly on our dataset.(
Figure 5) shows the effect of applying the model to the original self-regularized attentional diagram. The weights of the dark regions are much larger than those of the non-dark regions, and it does not achieve the expected effect.
The original self-regularized attention mechanism is similar to threshold image segmentation in image processing [
18]; a novel mechanism is therefore proposed in this paper using this idea.
Consider threshold segmentation to be a function operation:
The formula is as follows: x and y represent the horizontal and vertical coordinates of the pixel, p(x, y) represents the local features of the pixel, and f(x, y) represents the grey value of the pixel.
After thresholding, the image is defined as follows:
where
g(x, y) denotes the grayscale value of the processed image pixels, and
T1 and
T2 denote the different pixel thresholds. By adopting a new method of autoregulation, i.e., setting a threshold, when it is below the threshold, a low-drop and high-rise attention method is adopted; when it is above the threshold, a low-rise and -fall method is adopted. The improvement in “dark regions” is avoided.
Figure 5 shows the comparison between the original self-regularization and the enhanced self-regularization.
4. Experiment and Analysis
4.1. Experiment Design
4.1.1. Experimental Data and Parameter Tuning
The excellent effect of laser-range-gated technology [
19] in removing atmospheric backscatter and adjusting system distance has led countries to place great emphasis on relevant research and equipment development. In all countries, the application of range-gated imaging has become the priority imaging method when detecting targets. China, on the other hand, is still at the stage of theoretical research and experimental demonstration. A series of laser datasets is generated over the course of the experiments. For research into the follow-up processing of the laser images obtained, related projects are being carried out by large laboratories.
To assess the performance of this algorithm, unprocessed nighttime laser videos are deconvolved, and 927 high-luminance images and 642 low-luminance images are obtained. All of these images are converted to PNG format and set to 600 × 400 pixels.
Following the training pattern of the original algorithm, we first train 100 iterations with a learning rate of (1 × 10−4), and then we train 100 iterations with a linear attenuation of zero. The Adam optimizer is used.
4.1.2. Experimental Setting
Table 1 shows the hardware and software setups used in the experiment. The batch size of the EnlightenGAN program is fixed at 32 in this training environment, and the ratio of the training set, the verification set, and the test set is fixed at 6:3:1.
4.2. Experimentally Measured Indicators
There are three indices used in the experiment to assess the experimental results: NIQE (Natural Image Quality Evaluator) [
20], SSIM (Structural Similarity) [
21], and PSNR (Peak Signal to Noise Ratio) [
22].
4.2.1. NIQE
NIQE, also known as the no-reference image assessment index, is an assessment metric to compensate for all-reference assessment indexes (such as PSNR and SSIM). This primarily uses the regularized NSS model [
23] for extracting image features, and these features are taken as input to the MVG model [
24]. The distance to the quality perception features extracted from the natural landscape is computed to measure the effect of the image. The lower the NIQE value, the better the quality of the evidence image.
4.2.2. SSIM
SSIM, also called structural similarity, is a metric for measuring the similarity between two input images. This method primarily takes the luminance, contrast, and structural attributes of the objects in the image as the main metrics for measuring similarity. In our experiment, the enhanced image and metadata are used for SSIM evaluation. The greater the similarity, the better the enhancement algorithm.
and represent the mean of images X and Y, respectively; and represent the standard deviation of images X and Y, respectively; and represent the variance of images X and Y, respectively; C1 and C2 are constants.
4.2.3. PSNR
PSNR, also known as the peak signal-to-noise ratio, is an objective standard for assessing image quality. It primarily represents the ratio of the maximum possible signal power and the destructive noise power that affects the accuracy of its representation.
Given a clean image and a noisy image of dimension, where
I(i,j) and
K(i,j) are the gray levels of the pixels at position
(i,j) in the original image, respectively, the mean square error is defined as
4.3. Ablation Experiment
According to the feature map redundancy of the Ghost Convolutional layer, it can be inferred that the deep feature map is not suitable for feature redundancy inference using linear calculation. Therefore, the replaced convolutional layers in the experiment are all backbone network convolutional layers close to the input layer. The parameter settings such as the number of Ghost Convolutional layers replaced, training time, and recognition rate in the experiment are shown in
Table 2.
The ablation experiment [
25] is set-up to test the efficacy of the algorithm. The effects on image enhancement in this group of experiments without thresholded segmentation [
26], the fit of the down-sampling module, and the contact with the global and local discriminators are compared, and the experimental results are shown in
Figure 6.
The NIQE, SSIM, and PSNR of the original method and the improved method are compared to check the efficiency of the algorithm.
Table 2 shows the experimental results.
The results of the ablation experiments show that the improved method achieves some results under evaluation indices such as NIQE, PSNR, and SSIM. Self-regularized attention and down-sampling enhancement have similar effects on quantitative indices. But after local connectivity enhancement is removed, it has a larger impact on the model indices. It demonstrates that local connectivity has a larger impact on various indices of the model and can ameliorate model deficiencies very well. Particularly, below the peak signal-to-noise-ratio index, good results are obtained.
4.4. Comparison Experiment
In addition, our experiment compares the contrast effects of different reinforcement algorithms with EnlightenGAN and EnlightenGAN+. To ensure comparability and fairness of the experiment, when different algorithms are used, they are all performed according to the same training strategy.
Figure 7 shows the experimental results.
As can be seen in the figure, in comparison to other enhancement algorithms, the algorithm in this paper can produce enhanced images with higher contrast and lower distortion, and achieve a better enhancement effect on nighttime laser image data.
The NIQE, SSIM, and PSNR of different algorithms and the method proposed in this paper are compared to check the efficiency of the algorithm.
The experiment is designed to calculate the NIQE, SSIM, and PSNR of the results obtained by different algorithms and the method proposed in this paper, and to test the efficiency of the algorithms by comparison.
Table 3 shows the experimental results.
Based on the data in the table above, it can be analyzed that in comparison to the other enhancement algorithms and the EnlightenGAN+ algorithm, the mean PSNR, SSIM, and NIQE are increased by 12.3% and 0.7%, 57% and 10.3%, and 21% and 13%, respectively [
27].
Our experiment shows that the original Enlighten GAN, LLNET, RetinexNet, and other algorithms can both improve the original image to different extents after the same training. The numerical results show that our improved algorithm performs well (slightly decreased in some places) under evaluation indices such as SSIM, PSNR, and NIQE. The difference in terms of the true visual effect of the image is that our algorithm has a better enhancement effect in some images with a complex distinction between light and dark. Our model can better improve the real areas that need improvement, but the enhancement effect of our model is not evident in some regions with a clear light–dark distinction.
5. Conclusions
This paper enhances laser images based on the idea of GAN. Firstly, it introduces the mainstream laser image enhancement techniques currently available. Subsequently, EnlightenGAN is introduced and three improvements are made to the original model. The first improvement establishes a connection between the global discriminator and local discriminator. The second involves an improved downsampling module. The third corrects the self-regularized attention mechanism under laser images. Then, ablation experiments and enhancement algorithm comparison experiments are designed. In the ablation experiment, our three improved modules have certain improvements under different evaluation indicators, especially after establishing the connection between the global discriminator and local discriminator, where the indicators all show significant improvement. When comparing the indicators of different algorithms, the EnlightenGAN+ algorithm that we propose shows an average improvement of 12.3% and 0.7% in PSNR and 57% and 10.3% in SSIM, and a decrease of 21% and 13% in NIQE. The experimental results show that our proposed model improves the signal-to-noise ratio and contrast of laser images, providing new ideas for the preprocessing of laser images and more detailed image information.
This paper introduces our achievements in the laser image enhancement algorithm so far. In view of some shortcomings of the current algorithm, we will continue to conduct related research on image processing using deep learning to achieve better image processing under more image conditions.