Article

A Zero-Reference Low-Light Image-Enhancement Approach Based on Noise Estimation

Pingping Cao, Qiang Niu, Yanping Zhu and Tao Li
1 School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221006, China
2 Department of Civil, Architectural and Environmental Engineering, Missouri University of Science and Technology, Rolla, MO 65409, USA
3 Institute of Information Network and Artificial Intelligence, Jiaxing University, Jiaxing 314001, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(7), 2846; https://doi.org/10.3390/app14072846
Submission received: 5 February 2024 / Revised: 8 March 2024 / Accepted: 25 March 2024 / Published: 28 March 2024

Abstract

A novel zero-reference low-light image-enhancement approach based on noise estimation (ZLEN) is proposed to mitigate noise interference in image-enhancement processes, while the tenets of zero-reference and lightweight network architecture are maintained. ZLEN improves the high-order curve expression governing the mapping of low-light images to their enhanced counterparts, addressing image noise through a meticulously designed noise-estimation module and a zero-reference noise loss function. First, the higher-order curve expression with a noise term is defined, and then the noise map undergoes feature extraction through the semantic-aware attention module; following this, the resulting features are integrated with the low-light image. Ultimately, a lightweight convolutional neural network is adjusted to estimate the higher-order curve parameters that link the low-light image to its enhanced version. Notably, ZLEN achieves luminance enhancement and noise reduction without paired or unpaired training data. Rigorous qualitative and quantitative evaluations were conducted on diverse benchmark datasets, demonstrating that ZLEN attained state-of-the-art (SOTA) performance among existing zero-reference and unpaired-reference image-enhancement methodologies, while it exhibited comparable performance to full-reference image-enhancement methods. To confirm the practicality and robustness of ZLEN, it was applied to enhance the luminance of mine images, yielding satisfactory results.

1. Introduction

The efficiency and functionality of computer vision algorithms and various intelligent systems, such as object detection [1], target tracking [2], automated driving [3], and coal-mine-monitoring systems [4], heavily rely on the availability of high-quality images, which has been widely recognized in the field. However, the quality of many images in the real world is not satisfactory due to factors such as the environment, technology, and image acquisition equipment. Particularly in scenarios characterized by inadequate illumination and environmental imbalances, the acquired images manifest diminished contrast, reduced visibility, and elevated ISO noise. In computer vision tasks, low-light conditions often lead to image noise and graininess, making it difficult for traditional computer vision algorithms to extract relevant features and information. However, low-light images can provide valuable information in certain scenes. For example, in surveillance or nighttime imaging, low-light images may capture unique details or events that are not visible under well-lit conditions. Due to the need for noise reduction, contrast enhancement, and other preprocessing steps, processing low-light images typically requires additional computational resources. The processing speed of low-light images may vary depending on the complexity of the scene and the algorithm used. Typically, the more complex algorithms used for denoising and enhancing low-light images may require longer processing times. Real-time applications may face challenges in achieving high frame rates when dealing with low-light conditions.
High-light images can provide better visibility of objects and details in well-lit environments. In outdoor scenes or environments with strong lighting, high-light images can provide clearer features, making it easier to detect and recognize objects. Compared with low-light images, processing high-light images may require fewer computational resources, as they are typically less affected by noise and require less preprocessing. Due to the potentially lower computational requirements, processing high-light images may result in faster processing speeds compared with low-light images.
In addition, low-light images not only affect the visual experience but also convey erroneous information that affects the judgment of humans and of multiple intelligent systems. Hence, considerable emphasis has been placed on developing techniques to improve low-light image quality.
Many advanced approaches have been developed to enhance low-light images. Among them, deep learning methodologies [5,6,7,8,9] have shown commendable performance in image enhancement owing to the utilization of extensive datasets. However, a notable drawback is their reliance on paired data for training, since capturing degraded and ground-truth images of the same scene simultaneously is challenging, particularly in real-world scenarios [10]. While generative adversarial network (GAN)-based approaches address the need for paired data by generating reference images, they still need unpaired normal-light images as a reference and require careful adjustments to the training data to prevent overfitting [11]. Methods based on curve fitting [12,13] can achieve image enhancement without using paired or unpaired data, but these methods only consider the mapping between image pixels and image exposure, color, and luminance when performing curve fitting. This fails to address the issue of noise in the enhanced image, and the process of image enhancement concurrently amplifies the existing image noise, which seriously degrades the enhancement results and their visual quality. Consequently, existing low-light image-enhancement methods encounter two principal challenges: (1) dependency on paired or unpaired data for training and (2) inherent noise that persists in the enhanced images when no reference is available.
Motivated by the work of Guo et al. [13], we propose an innovative zero-reference low-light image-enhancement approach founded on noise estimation. This approach achieves satisfactory reference-free low-light image enhancement by estimating the luminance curves of the low-light fused noisy images with a lightweight convolutional neural network. In contrast to preceding curve-fitting-based methods [12,13], our method integrates the noise map with the original input image to augment image features, resulting in superior quality and visually appealing low-light image enhancement. This improvement is facilitated by the incorporation of a specifically designed noise loss function. During the noise-fusion process, our objective was to estimate the noise map of the low-light images, which serves as a representation of the noise distribution within each low-light image. As depicted in Figure 1, regions in the original input image exhibiting significant color-value disparities (i.e., content-rich regions) correspond to a sparser noise distribution in the enhanced image, whereas regions with small color-value disparities correspond to a denser noise distribution. Thus, the noise distribution of images can be estimated by analyzing the color map of the image. Subsequently, the noise map is amalgamated with the input image to enhance its features, with a lightweight convolutional neural network employed to derive the parameter matrix for the image-enhancement curve. Ultimately, the designed loss function facilitates the realization of zero-reference image enhancement and noise reduction.
Overall, the contributions of the proposed research are as follows:
  • Proposing an improved high-order curve expression for mapping low-light images to enhanced images, which involves adding a noise term to reduce the noise in the enhanced image;
  • Proposing a color-map-based noise-estimation method, fusing noise maps with low-light images to enhance the input features of the depth curve estimation network;
  • Designing a reference-free noise loss function and adjusting the depth curve estimation network structure to achieve noise reduction in reference-free enhanced images while reducing color distortion;
  • Achieving SOTA performance on several non-reference low-light datasets and successfully applying the method to real coal-mine image enhancement.

2. Related Works and Methods

2.1. Related Works

Data-driven approach: In the realm of low-light image enhancement, early methodologies predominantly centered around the Retinex theory [14,15,16,17,18], which breaks down an image into its reflectance and illumination components, allowing for the adjustment of brightness in low-light images and the alleviation of artifacts. Recently, the advent of deep learning has significantly impacted image-enhancement tasks, with data serving as a pivotal foundation. In this context, methods for enhancing images through data-driven approaches can be broadly categorized into two groups: convolutional neural network (CNN)-based methods [8,9,19] and generative adversarial network (GAN)-based methods [10,20]. CNN-based approaches predominantly rely on supervised training, where paired data are essential for model training. CNN-based enhancement models learn from paired low-light images and their corresponding normal-exposure counterparts. The efficacy of CNN-based methods hinges on the quantity and quality of the training data, emphasizing the need for adequate and standardized samples. However, the real-world constraints of economic and time investments in collecting paired data often result in insufficient datasets. The datasets may also include unrealistic samples, subsequently undermining the generalization capabilities of CNN-based methods. On the other hand, unsupervised GAN-based methods alleviate the necessity for paired training data. These approaches utilize unpaired input data to train a low-light image-enhancement network employing a specific discriminator and loss function. Nevertheless, meticulous screening of the input unpaired data becomes imperative to ensure the effectiveness of GAN-based methods.
Curve-estimation-based methods: Curve estimation-based image-enhancement methods [12,13,21,22,23] enable zero-reference image enhancement and avoid dependence on paired or unpaired data. General S-curve estimation methods [21,22,23,24] achieve regional exposure adjustment by applying nonlinear curve correction to each region of a given image to obtain attributes such as image exposure, color, hue, and contrast adjustment. However, the traditional S-curve adjustment methods have some drawbacks in maintaining local details and bringing halo effects. Yuan and Sun [12] proposed an S-curve global optimization algorithm called detail preservation, which pushes each region as far as possible to its desired region, thus preserving local details and avoiding the halo effect. However, the algorithm in the literature [12] does not apply to all low-light scenarios and is relatively computationally intensive. Albu et al. [25] proposed a color image-enhancement technique based on recursive-filtering and contrast-stretching techniques. This image-enhancement technique was driven by statistical measurements of images and implemented under a logarithmic image-processing model. Although this method is highly suitable for implementation on low-power, low-memory embedded devices, it has not been shown to be effective for images with a single color, such as mine low-light images. Guo et al. [13] proposed a lightweight deep CNN that learns the adjustable parameters of the curves through multiple iterations to achieve higher-order curves that are more robust, more accurate, and less computationally intensive for image enhancement. Although the design of Guo et al. is very skillful, the problem of noise and color distortion caused by image enhancement is still not properly solved in some scene images.
The ZLEN proposed in this paper surpasses existing curve-fitting methods in three crucial aspects. First, it introduces a color-map-based noise-estimation technique, enhancing image features through the fusion of noise maps with low-light images. Second, a meticulously designed noise loss function is formulated to achieve denoising in reference-free enhanced images. This noise loss function, together with a series of reference-free loss functions, such as exposure control, spatial consistency, illumination smoothness, and color consistency, is the key to achieving the zero-reference image enhancement in this paper. Third, in contrast to prevailing image-enhancement methods, our approach addresses both methodological efficiency and computational costs, striking a balance between performance and computational overhead. Our method achieved SOTA status among zero-reference and unpaired-reference image-enhancement methods, was comparable to full-reference image-enhancement methods, and showed a better enhancement effect on coal mine images, which verified the robustness and practicality of our method.

2.2. Method

Our proposed ZLEN method is shown in Figure 2. In our method, noise estimation and fusion are the keys to image enhancement and denoising. The improved higher-order curve and depth-curve-estimation networks are the main techniques to enhance the brightness of the image. Finally, zero-reference image enhancement and denoising can be realized by a few designed reference-free loss functions, and our method displayed better results compared with existing zero-reference and semi-reference image-enhancement methods. We describe the improvement of the higher-order curves, noise-estimation module, depth-curve-estimation network, and reference-free loss functions in detail in the following sections.
As shown in Figure 2, our zero-reference low-light image-enhancement method based on noise estimation mainly includes a noise-estimation module, high-order curve representation, and depth-estimation network. The noise-estimation module is used to obtain the noise map corresponding to the input image, and then obtain the semantic perception attention map of the noise map [26]. The high-order curve is mainly used to represent the mapping from low-light images to enhanced images, and the corresponding parameters are solved through a depth-curve-estimation network. The depth-curve-estimation network achieves convergence through several reference-free loss functions.
Figure 2. The framework of ZLEN.

2.2.1. Improved Higher-Order Curve Expressions

Motivated by the curve-adjustment functionalities found in photo-editing software, Guo et al. [13] represented the mapping to an enhanced image from a low-light image as a higher-order curve related only to the pixel coordinates of the input image. Although their design realizes the luminance enhancement of low-light images, the enhanced image still suffers from the problem of significant noise due to the lack of consideration of the underlying noise distribution of the image. Based on this, we redefined the higher-order curve expression incorporating noise to establish the mapping to an enhanced image from a low-light image, as expressed in Equation (1):
$$LE_n(x) = LE_{n-1}(x) + \alpha_n \, LE_{n-1}(x)\left(1 - LE_{n-1}(x) + N(x)\right), \tag{1}$$
where $x$ represents the pixel coordinates of the low-light image, $n$ represents the number of iterations of the network, $LE_{n-1}(x)$ denotes the image input to the $n$-th iteration, $N(x)$ denotes the estimated noise map, and $\alpha_n \in [-1, 1]$ denotes the parameters of the trainable curve.
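A minimal sketch of how Equation (1) is applied iteratively is given below; the tensor shapes and the use of one parameter map per iteration are assumptions made for illustration, not details fixed by the paper.

```python
import torch

def enhance_with_curve(x, alphas, noise):
    """Iteratively apply the improved higher-order curve of Eq. (1).

    x      : low-light image, tensor of shape (B, 3, H, W), values in [0, 1]
    alphas : sequence of per-pixel curve-parameter maps, one (B, 3, H, W)
             tensor per iteration, each with values in [-1, 1]
    noise  : estimated noise map N(x), tensor of shape (B, 3, H, W)
    """
    le = x
    for alpha_n in alphas:
        # LE_n = LE_{n-1} + alpha_n * LE_{n-1} * (1 - LE_{n-1} + N(x))
        le = le + alpha_n * le * (1.0 - le + noise)
    return le
```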

2.2.2. Noise Estimation and Fusion

We propose a noise-estimation approach based on the color map to derive the noise map of an image, which is subsequently fused with the low-light image to enhance its features. Within the noise-estimation module, the color map of the given input image is calculated, and the corresponding noise map is deduced from this color map. The color map, being an inherent property of the scene that remains unaltered under changing illumination [9], serves as valuable a priori information for enhancing low-light images; it is used to augment image saturation and mitigate color distortion. Drawing inspiration from Retinex theory [15], the computation of the color map is expressed as Equation (2):
$$C(x) = \frac{x}{\mathrm{mean}_c(x)}, \tag{2}$$
where $x$ represents the input low-light image, $C(x)$ represents the color map corresponding to $x$, and $\mathrm{mean}_c(x)$ denotes the per-pixel average over the RGB channels. Then, the estimation of noise in the input low-light image can be derived from Equation (3):
$$N(x) = \max\left(\left|\nabla_x C(x)\right|, \left|\nabla_y C(x)\right|\right), \tag{3}$$
where $\nabla_x$ and $\nabla_y$ denote the gradient operators in the x- and y-directions of the color map, respectively, and the max operator takes the pixel-wise maximum of the two absolute gradient maps.
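A minimal sketch of Equations (2) and (3) is shown below, assuming forward finite differences for the gradients and a small epsilon to avoid division by zero; both choices are illustrative assumptions.

```python
import torch

def estimate_noise_map(x, eps=1e-6):
    """Color-map-based noise estimate of Eqs. (2)-(3).

    x : low-light image tensor of shape (B, 3, H, W)
    """
    # Eq. (2): color map = image divided by its per-pixel mean over RGB channels
    c = x / (x.mean(dim=1, keepdim=True) + eps)

    # Finite-difference gradients of the color map in the x- and y-directions
    grad_x = torch.zeros_like(c)
    grad_y = torch.zeros_like(c)
    grad_x[..., :, 1:] = c[..., :, 1:] - c[..., :, :-1]
    grad_y[..., 1:, :] = c[..., 1:, :] - c[..., :-1, :]

    # Eq. (3): pixel-wise maximum of the absolute gradient maps
    return torch.maximum(grad_x.abs(), grad_y.abs())
```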
Inspired by [27,28], we adopted a semantic-aware module to extract features from noisy images. In the semantic-aware module, the activation mapping $M$ of the image pixels is first computed. For a given image, let $f_k(x, y)$ denote the activation of cell $k$ at the spatial location $(x, y)$; then, the activation mapping of a pixel can be computed by Equation (4):
$$M(x, y) = \sum_{k} w_k \, f_k(x, y), \tag{4}$$
where $w_k$ denotes the weight corresponding to the pixel in cell $k$.
Then, the activation mapping of pixels is used to map the corresponding RGB values of the image to semantically aware feature representations, with the conversion process shown in Equation (5):
$$F_s = m^{T} X = \sum_{i=1}^{W} \sum_{j=1}^{H} m_{i,j} \, N_{i,j}, \tag{5}$$
where $m_{i,j}$ denotes the weight of the activation map; $N_{i,j}$ denotes the individual pixel points of $N(x)$; and $W$ and $H$ denote the width and height of the image, respectively.
Subsequently, a convolutional layer is employed to align the semantic features $F_s$ and the image features $F_i$ to the same dimensions, and the attention map is then computed using the transposed attention mechanism of [26]; as a result, the semantic-aware attention map can be defined by Equation (6):
$$A = \mathrm{Softmax}\left(L_k(F_i) \times L_q(F_s) / C\right), \tag{6}$$
where $L_k$ and $L_q$ denote convolutional layers and $C$ denotes the number of feature channels. Since $A$ can represent the interrelationship between $F_s$ and $F_i$, the noisy image features obtained through the semantic-aware attention map are given by Equation (7):
$$F_{noise} = FN\left(L(F_i) \times A + F\right), \tag{7}$$
where $FN$ denotes the feedforward network, $L$ denotes a convolutional layer, and $F_{noise}$ is the feature map extracted from the input noise map.
Finally, the image features are enhanced by fusing the extracted noise feature map with the input low-light image (Equation (8)) to provide richer and more accurate image information for the subsequent image illumination enhancement and noise reduction:
$$F_{fusion} = F_i + \beta \, F_{noise}, \tag{8}$$
where $\beta$ denotes the coefficient of the noise feature.
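The sketch below illustrates one possible reading of the transposed attention and fusion in Equations (6)–(8); the 1×1 convolutions, the feed-forward design, the use of $F_i$ as the residual term in Equation (7), and the value of β are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SemanticAwareFusion(nn.Module):
    """Sketch of the transposed-attention fusion of Eqs. (6)-(8)."""

    def __init__(self, channels, beta=0.5):
        super().__init__()
        self.l_k = nn.Conv2d(channels, channels, 1)   # L_k in Eq. (6)
        self.l_q = nn.Conv2d(channels, channels, 1)   # L_q in Eq. (6)
        self.l_v = nn.Conv2d(channels, channels, 1)   # L   in Eq. (7)
        self.ffn = nn.Sequential(                     # FN  in Eq. (7)
            nn.Conv2d(channels, channels, 1), nn.GELU(),
            nn.Conv2d(channels, channels, 1))
        self.beta = beta                              # beta in Eq. (8)

    def forward(self, f_i, f_s):
        b, c, h, w = f_i.shape
        k = self.l_k(f_i).reshape(b, c, h * w)
        q = self.l_q(f_s).reshape(b, c, h * w)
        # Eq. (6): channel-wise (transposed) attention map, scaled by C
        attn = torch.softmax(k @ q.transpose(1, 2) / c, dim=-1)      # (B, C, C)
        v = self.l_v(f_i).reshape(b, c, h * w)
        # Eq. (7): attended features passed through the feed-forward network
        f_noise = self.ffn((attn @ v).reshape(b, c, h, w) + f_i)
        # Eq. (8): fuse the noise features back into the image features
        return f_i + self.beta * f_noise
```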

2.2.3. Depth-Curve-Estimation Network

The essence of the depth-curve-estimation network is to solve for the parameters of a set of pixel-level curves. The network consists of seven simple convolutional layers, two dropout layers, and a tanh activation function. Given that dropout layers excel at filtering redundant and noisy data, their incorporation into the last two layers of the depth-curve-estimation network serves to preserve image features while mitigating noise interference and color distortion. The detailed network architecture is shown in Figure 3. The input to the depth-curve-estimation network is the low-light image, which is first processed by the convolutional layers and then passed through the dropout layers to eliminate redundant or incoherent data. Subsequently, the tanh activation function generates the parameter maps, and through multiple iterations on the input low-light image, the corresponding curve parameters are derived. Notably, each iteration yields three curve-parameter maps, one for each of the RGB channels, further augmenting image contrast. Finally, the output of the depth-curve-estimation network is the final set of parameters for the low-light image-enhancement curves.
In Figure 3, n represents the number of iterations of the network.
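A possible PyTorch layout of the network described above is sketched below: seven convolutional layers, dropout after the fifth and last convolutional layers (the placement chosen in Section 3.3.3), and a tanh output that yields one set of three RGB curve-parameter maps per iteration. The channel width, dropout rate, and absence of skip connections are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DepthCurveEstimator(nn.Module):
    """Sketch of the depth-curve-estimation network."""

    def __init__(self, in_channels=3, features=32, n_iters=8, p_drop=0.2):
        super().__init__()
        layers = []
        ch = in_channels
        for i in range(6):
            layers += [nn.Conv2d(ch, features, 3, padding=1), nn.ReLU(inplace=True)]
            ch = features
            if i == 4:                       # dropout after the fifth conv layer
                layers += [nn.Dropout2d(p_drop)]
        # seventh conv predicts 3 curve parameters per RGB channel and iteration,
        # followed by the second dropout layer
        layers += [nn.Conv2d(features, 3 * n_iters, 3, padding=1), nn.Dropout2d(p_drop)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        alphas = torch.tanh(self.net(x))     # curve parameters in [-1, 1]
        return torch.split(alphas, 3, dim=1) # one (B, 3, H, W) map per iteration
```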

2.2.4. Zero-Reference Loss Function

The cornerstone of our method’s efficacy in achieving zero-reference learning lies in the utilization of four reference-free loss functions drawn from the existing literature [13], namely, the spatial consistency loss, exposure control loss, color consistency loss, and luminance-smoothing loss, together with a novel contribution of this work, the noise-estimation loss. Our network assesses the quality of the enhanced image by employing these loss functions, facilitating subsequent parameter adjustments. The expression of our noise-estimation loss function is given below; the other reference-free loss functions can be found in the literature [13].
The noise-estimation loss is used to maintain the signal-to-noise ratio (SNR) between the enhanced image and the input image; the result of each calculation is fed back into the curve-estimation network to continuously adjust the curve parameters and to ensure the quality of the enhanced image. The noise-estimation loss is defined as shown in Equation (9):
$$L_{nol} = \frac{1}{S}\left\| \mathrm{SNR} - W \right\|^{2}, \tag{9}$$
where $W$ denotes a constant matrix and $S$ denotes the size of the enhanced image.
The final loss of the zero-reference depth-curve-estimation network is shown in Equation (10):
$$L_{total} = W_{spa} L_{spa} + W_{exp} L_{exp} + W_{col} L_{col} + W_{nol} L_{nol} + W_{tvA} L_{tvA}, \tag{10}$$
where $W_{spa}$, $W_{exp}$, $W_{col}$, $W_{nol}$, and $W_{tvA}$ denote the weights corresponding to the spatial consistency, exposure control, color consistency, noise-estimation, and luminance-smoothing losses, respectively.
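Since the paper does not spell out how the SNR term in Equation (9) is computed, the sketch below shows one plausible reading, with a scalar SNR target standing in for the constant matrix $W$; the target value, the per-pixel SNR definition, and the loss weights are all hypothetical.

```python
import torch

def noise_estimation_loss(enhanced, low_light, w=30.0, eps=1e-6):
    """One reading of Eq. (9): penalize deviation of the per-pixel SNR between
    the enhanced and input images from a constant target W (here a scalar)."""
    noise = enhanced - low_light
    snr = 10.0 * torch.log10(
        (enhanced.pow(2).mean(dim=1, keepdim=True) + eps)
        / (noise.pow(2).mean(dim=1, keepdim=True) + eps))
    # averaging over all pixels plays the role of dividing by the image size S
    return ((snr - w) ** 2).mean()

def total_loss(losses, weights):
    """Eq. (10): weighted sum of the five reference-free losses."""
    return sum(weights[k] * losses[k] for k in weights)

# Hypothetical weights; the paper does not report the values it used.
weights = {"spa": 1.0, "exp": 10.0, "col": 5.0, "nol": 1.0, "tvA": 200.0}
```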

3. Results

3.1. Implementation Details

3.1.1. Dataset

The SICE [29] dataset is a publicly available collection of low-light image sequences proposed by Cai et al. Seven different camera devices were used to capture image sequences of various scenes and exposure values, each containing three to five images. After collecting the source images, the required sequences were further screened to generate reference images. The final dataset included 589 sequences and 4413 multi-exposure images.
In general, CNN-based models employ paired data for network training [30,31,32], whereas GAN-based models meticulously choose unpaired data for training [33,34]. In our experiments, we randomly divided 3022 images of 360 multiple-exposure sequences from part 1 of the SICE [29] dataset into two parts. A total of 2416 images were used for training and the rest for validation. Both the validation and training images were resized to dimensions of 512 × 512 .
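As an illustration of the data preparation described above, the snippet below resizes the images to 512 × 512 and splits the 3022 images into 2416 for training and the remainder for validation; the directory path and the ImageFolder layout are assumptions, since the paper does not describe how the SICE files are organized on disk.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Resize every training/validation image to 512 x 512 and convert to tensors.
transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

# "data/SICE_part1" is a hypothetical path; ImageFolder also assumes the images
# are arranged into subfolders, which the paper does not specify.
dataset = datasets.ImageFolder("data/SICE_part1", transform=transform)

# 2416 images for training, the remaining images for validation.
train_set, val_set = random_split(
    dataset, [2416, len(dataset) - 2416],
    generator=torch.Generator().manual_seed(0))
```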

3.1.2. Experiment Settings

Our framework was developed using PyTorch 1.8.1 on an NVIDIA 2080Ti GPU. The batch size was eight. The filter weights of each layer were initialized using a zero-mean Gaussian with a standard deviation of 0.02, and the biases were initialized to a constant value. Network optimization was conducted with the Adam optimizer at a learning rate of 1 × 10−4.
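The training configuration above can be reproduced roughly as follows; `DepthCurveEstimator` refers to the network sketched in Section 2.2.3, and the constant bias value of zero is an assumption.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Zero-mean Gaussian (std = 0.02) for conv weights, constant value for biases.
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)

model = DepthCurveEstimator()        # the network sketched in Section 2.2.3
model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```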

3.2. Benchmark Evaluations

To assess the validity and superiority of ZLEN, comprehensive qualitative and quantitative evaluations were conducted. These evaluations involved comparing ZLEN with SOTA approaches using both full-reference low-light datasets, namely, LOL [31], and non-reference low-light datasets, such as NPE [26], LIME [17], MEF [35], and DICM [36].

3.2.1. Validation on the Full-Reference Dataset LOL

The LOL dataset contains real captured images, encompassing 485 pairs of normal-light and low-light images for training, as well as an additional 15 image pairs for testing. We used 15 low-light test images stemming from the LOL dataset for validation. For the quantitative assessment of our methods, we employed PSNR, SSIM [31], and LPIPS [37] as evaluation metrics. The numerical evaluations, along with the visual results, are presented in Figure 4 and Table 1 for a comparative analysis.
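For reference, the three metrics can be computed per image pair roughly as below, assuming the scikit-image and lpips packages are available; the AlexNet backbone for LPIPS and the uint8 input convention are assumptions.

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")   # learned perceptual metric of [37]

def evaluate_pair(enhanced, reference):
    """enhanced, reference: H x W x 3 uint8 arrays of the same scene."""
    psnr = peak_signal_noise_ratio(reference, enhanced)
    ssim = structural_similarity(reference, enhanced, channel_axis=2)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_t = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    lp = lpips_fn(to_t(enhanced), to_t(reference)).item()
    return psnr, ssim, lp
```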
As depicted in Figure 4, our method exhibited reduced noise and color shifts, yielding superior visual outcomes when juxtaposed with the contemporary SOTA zero-reference method, namely, Zero-DCE [13]. Furthermore, our method showed comparable performance with the SOTA full-reference method in terms of visual quality.
In Table 1, an upward arrow (↑) signifies that a larger value is indicative of superior quality, while a downward arrow (↓) suggests the opposite; "(best)" designates the best result among the zero-reference methods. As shown in Table 1, our method achieved a more advanced performance compared with the existing zero-reference image-enhancement methods. Compared with the SOTA zero-reference image-enhancement method Zero-DCE, our method achieved a gain of +4.596 in PSNR, a gain of +0.224 in SSIM, and a reduction of 0.100 in LPIPS. Higher PSNR values indicate that our method can suppress artifacts and better reflect color information. Higher SSIM values indicate that our method better preserves structural information with high-frequency details. As for LPIPS, our method also achieved better performance. However, our method failed to achieve better performance compared with the full-reference methods KinD++ [38] and LLFlow [39], which rely on a large amount of paired data and reference images to train the model. Our method has a wider range of application scenarios than the above methods, which rely on full-reference image data for training.
Figure 4. Our method visually compared with the SOTA methods on LOL test data.
Table 1. Quantitative comparison with the SOTA methods on the LOL dataset.
Reference | Method | PSNR ↑ | SSIM ↑ | LPIPS ↓
With | KinD (ACM MM, 2019) [40] | 18.970 | 0.782 | 0.257
  | KinD++ (IJCV, 2021) [38] | 20.870 | 0.804 | 0.175
  | LLFlow (AAAI, 2022) [9] | 24.999 | 0.870 | 0.117
  | LLFlow-SKF (CVPR, 2023) [39] | 26.798 | 0.879 | 0.105
  | LIME (TIP, 2016) [17] | 16.760 | 0.560 | 0.350
  | DRBN (CVPR, 2020) [28] | 19.137 | 0.784 | 0.252
  | EnlightenGAN (TIP, 2021) [10] | 17.483 | 0.652 | 0.322
Zero | RetinexNet (BMVC, 2018) [31] | 16.770 | 0.462 | 0.474
  | Zero-DCE (CVPR, 2020) [13] | 14.861 | 0.562 | 0.335
  | Ours | 19.457 (best) | 0.786 (best) | 0.235 (best)
To further demonstrate the advantages of our method in other aspects, we compared it with other advanced non-full-reference methods, namely, RetinexNet [31], EnlightenGAN [10], and Zero-DCE [13], in terms of runtime, and the results are shown in Table 2.
According to the results in Table 2, our method ran faster than the other non-full-reference methods. Although it ran slower than the SOTA zero-reference method Zero-DCE, our method achieved a balance between enhancement quality and running speed.

3.2.2. Validation of Non-Reference Datasets

To further validate the superiority of ZLEN, we replicated the existing advanced zero-reference and semi-reference methods and processed low-light images from non-reference low-light datasets, including NPE (84 images), LIME (10 images), MEF (17 images), and DICM (64 images). For each image, we presented the enhancement results on-screen, providing the input image as a reference. To quantitatively assess the subjective visual quality of the different methods, we conducted a user study involving 20 human subjects who independently evaluated the visual quality of the enhanced images. The assessment criteria included observations on whether the enhanced image exhibited under/over-enhanced regions or under/over-exposed artifacts, the presence of color shifts, and the existence of noticeable noise and unnatural textures. Visual quality scores, between 1 and 5 (indicating the worst to the best quality), were assigned by the subjects. The average subjective scores are presented for each image set in Table 3, and both the user study (US) and the non-reference perceptual index [16] (PI) are listed to assess the perceptual quality. The PI metric, which was initially employed for measuring perceptual quality in image super-resolution, has gradually become the standard for evaluating the performance of image-enhancement and image-restoration tasks.
In Table 3, lower PI scores indicate higher perceptual quality, higher US scores indicate better human subjective visual quality, and the bold indicates the score with the best results. From the results in Table 3, our method had better visual and perceptual quality compared with the advanced non-reference methods.

3.3. Ablation Studies

In contrast to the advanced zero-reference low-light image-enhancement method, namely, Zero-DCE [13], our approach proposes a noise fusion module, a noise loss function, and structural adjustments to the depth curve estimation network. To assess the effectiveness of ZLEN, we conducted ablation studies involving the removal of these three components. The comparative results are presented in Figure 5 and Table 4.
In Figure 5 and Table 4, “w/o Noise Fusion”, “w/o $L_{noise}$”, and “w/o Adjust Network” denote the experimental results of removing the noise estimation, the noise loss function, and the deep-network adjustment module, respectively; the best results in Table 4 are those of our full model.

3.3.1. Contribution of the Noise Fusion Module

Experiments were conducted to evaluate the performance of ZLEN with the noise fusion module excluded. The original image served as the input to the depth-curve-estimation network, while the remaining modules and loss functions remained constant during training. A comparison was made with the full model of our method using a typical low-light image. As illustrated in Figure 5, when the noise fusion module was omitted, the enhanced image exhibited a diminished enhancement effect in contrast to the full model, characterized by increased noise and color shifts. The quantitative evaluation results in Table 4 further underscore the impact of removing the noise fusion module on the overall image-enhancement performance.

3.3.2. Contribution of Noise Loss

To assess the significance of the noise loss component in our method, we excluded it while keeping the other modules unchanged. The resulting enhancement effect was then compared with that of the full model. As depicted in the comparative results presented in Figure 5 and Table 4, the absence of the noise loss led to an inadequate removal of noise in the enhanced image, resulting in a diminished image-enhancement effect.

3.3.3. The Impact of Adjusting Depth Curve Estimate Networks

We validated the effect of the depth-curve-estimation network structure on our method by comparing the enhancement effect with and without dropout layers. We verified the image-enhancement effect of adding a dropout layer after every convolutional layer, after the fifth convolutional layer, and after the last convolutional layer; finally, we chose to add dropout layers after the fifth and last convolutional layers and compared the enhancement effect with that of our full model. As can be seen from the comparison results in Figure 5 and Table 4, the adjusted network reduced the color shift and image noise to some extent.

3.4. Application of Low-Light Images in Mines and Indoor Squares

To verify the practicality of our method, the proposed method was applied to enhance low-light images of a mine and of an indoor square; the original low-light images and the enhanced images are shown in Figure 6.
In Figure 6, the image enhanced by our method reveals a clearer distribution of debris, coal rods, and gangue in the coal flow in comparison with the original low-light image.

4. Conclusions

We propose a zero-reference low-light image-enhancement method based on noise estimation (ZLEN). Our approach, which was designed to mitigate the noise impact for image enhancement, incorporates a specially crafted noise-estimation module and a corresponding noise loss function. Simultaneously, color distortion is addressed by fine-tuning the depth-curve-estimation network. Subsequently, extensive experiments were conducted using prevalent low-light image-enhancement methods on a public dataset to demonstrate the superior performance of ZLEN. Finally, we applied ZLEN to enhance mine low-light images, achieving excellent results. This practical application affirmed the efficacy and effectiveness of ZLEN.

Author Contributions

Conceptualization, P.C. and Q.N.; methodology, P.C.; software, P.C.; validation, P.C., Q.N. and T.L.; formal analysis, Y.Z.; investigation, P.C.; resources, P.C.; data curation, Y.Z.; writing—original draft preparation, P.C.; writing—review and editing, P.C. and Q.N.; visualization, T.L.; supervision, Q.N. All authors read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/csjcai/SICE (accessed on 15 January 2018).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SOTA	State of the art
GAN	Generative adversarial network
CNN	Convolutional neural network
PSNR	Peak signal-to-noise ratio
SSIM	Structural similarity
LPIPS	Learned perceptual image patch similarity
US	User study
PI	Perceptual index

References

  1. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  2. Chu, Q.; Ouyang, W.; Li, H.; Wang, X.; Liu, B.; Yu, N. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4836–4845. [Google Scholar]
  3. Zhong, Z.; Lei, M.; Cao, D.; Fan, J.; Li, S. Class-specific object proposals re-ranking for object detection in automatic driving. Neurocomputing 2017, 242, 187–194. [Google Scholar] [CrossRef]
  4. Wang, Z.C.; Zhao, Y.Q. An image enhancement method based on the coal mine monitoring system. Adv. Mater. Res. 2012, 468, 204–207. [Google Scholar] [CrossRef]
  5. Ma, L.; Ma, T.; Liu, R.; Fan, X.; Luo, Z. Toward fast, flexible, and robust low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5637–5646. [Google Scholar]
  6. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Learning enriched features for real image restoration and enhancement. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 492–511. [Google Scholar]
  7. Chen, C.; Chen, Q.; Xu, J.; Koltun, V. Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3291–3300. [Google Scholar]
  8. Wang, R.; Zhang, Q.; Fu, C.W.; Shen, X.; Zheng, W.S.; Jia, J. Underexposed photo enhancement using deep illumination estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6849–6857. [Google Scholar]
  9. Wang, Y.; Wan, R.; Yang, W.; Li, H.; Chau, L.P.; Kot, A. Low-light image enhancement with normalizing flow. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 2604–2612. [Google Scholar]
  10. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. Enlightengan: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef] [PubMed]
  11. Wolf, V.; Lugmayr, A.; Danelljan, M.; Van Gool, L.; Timofte, R. Deflow: Learning complex image degradations from unpaired data with conditional flows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 94–103. [Google Scholar]
  12. Yuan, L.; Sun, J. Automatic exposure correction of consumer photographs. In Proceedings of the Computer Vision—ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 771–785. [Google Scholar]
  13. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789. [Google Scholar]
  14. Land, E.H. The retinex theory of color vision. Sci. Am. 1977, 237, 108–129. [Google Scholar] [CrossRef] [PubMed]
  15. Liu, R.; Ma, L.; Zhang, J.; Fan, X.; Luo, Z. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10561–10570. [Google Scholar]
  16. Fu, X.; Zeng, D.; Huang, Y.; Zhang, X.P.; Ding, X. A weighted variational model for simultaneous reflectance and illumination estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2782–2790. [Google Scholar]
  17. Guo, X.; Li, Y.; Ling, H. LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 2016, 26, 982–993. [Google Scholar] [CrossRef] [PubMed]
  18. Li, M.; Liu, J.; Yang, W.; Sun, X.; Guo, Z. Structure-revealing low-light image enhancement via robust retinex model. IEEE Trans. Image Process. 2018, 27, 2828–2841. [Google Scholar] [CrossRef] [PubMed]
  19. Lore, K.G.; Akintayo, A.; Sarkar, S. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognit. 2017, 61, 650–662. [Google Scholar] [CrossRef]
  20. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2223–2232. [Google Scholar]
  21. Battiato, S.; Bosco, A.; Castorina, A.; Messina, G. Automatic image enhancement by content dependent exposure correction. EURASIP J. Adv. Signal Process. 2004, 2004, 613282. [Google Scholar] [CrossRef]
  22. Bhukhanwala, S.A.; Ramabadran, T.V. Automated global enhancement of digitized photographs. IEEE Trans. Consum. Electron. 1994, 40, 1–10. [Google Scholar] [CrossRef]
  23. Lee, K.; Kim, S.; Kim, S.D. Dynamic range compression based on statistical analysis. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 3157–3160. [Google Scholar]
  24. Reinhard, E.; Stark, M.; Shirley, P.; Ferwerda, J. Photographic tone reproduction for digital images. Semin. Graph. Pap. Push. Boundaries 2023, 2, 661–670. [Google Scholar]
  25. Albu, F.; Vertan, C.; Florea, C.; Drimbarean, A. One scan shadow compensation and visual enhancement of color images. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 3133–3136. [Google Scholar]
  26. Liu, J.; Xu, D.; Yang, W.; Fan, M.; Huang, H. Benchmarking low-light image enhancement and beyond. Int. J. Comput. Vis. 2021, 129, 1153–1184. [Google Scholar] [CrossRef]
  27. Cao, P.; Chen, P.; Niu, Q. Multi-label image recognition with two-stream dynamic graph convolution networks. Image Vis. Comput. 2021, 113, 104238. [Google Scholar] [CrossRef]
  28. Yang, W.; Wang, S.; Fang, Y.; Wang, Y.; Liu, J. From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3063–3072. [Google Scholar]
  29. Cai, J.; Gu, S.; Zhang, L. Learning a deep single image contrast enhancer from multi-exposure images. IEEE Trans. Image Process. 2018, 27, 2026–2049. [Google Scholar] [CrossRef] [PubMed]
  30. Wang, W.; Lai, Q.; Fu, H.; Shen, J.; Ling, H.; Yang, R. Salient object detection in the deep learning era: An in-depth survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3239–3259. [Google Scholar] [CrossRef] [PubMed]
  31. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep retinex decomposition for low-light enhancement. arXiv 2018, arXiv:1808.04560. [Google Scholar]
  32. Xu, P.; Hospedales, T.M.; Yin, Q.; Song, Y.Z.; Xiang, T.; Wang, L. Deep learning for free-hand sketch: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 285–312. [Google Scholar] [CrossRef] [PubMed]
  33. Li, C.; Guo, J.; Guo, C. Emerging from water: Underwater image color correction based on weakly supervised color transfer. IEEE Signal Process. Lett. 2018, 25, 323–327. [Google Scholar] [CrossRef]
  34. Yu, R.; Liu, W.; Zhang, Y.; Qu, Z.; Zhao, D.; Zhang, B. Deepexposure: Learning to expose photos with asynchronously reinforced adversarial learning. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
  35. Ma, K.; Zeng, K.; Wang, Z. Perceptual quality assessment for multi-exposure image fusion. IEEE Trans. Image Process. 2015, 24, 3345–3356. [Google Scholar] [CrossRef] [PubMed]
  36. Lee, C.; Lee, C.; Kim, C.S. Contrast enhancement based on layered difference representation. In Proceedings of the 19th IEEE International Conference on Image Processing, Orlando, FL, USA, 30 September–3 October 2012; pp. 965–968. [Google Scholar]
  37. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  38. Zhang, Y.; Guo, X.; Ma, J.; Liu, W.; Zhang, J. Beyond brightening low-light images. Int. J. Comput. Vis. 2021, 129, 1013–1037. [Google Scholar] [CrossRef]
  39. Wu, Y.; Pan, C.; Wang, G.; Yang, Y.; Wei, J.; Li, C.; Shen, H.T. Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1662–1671. [Google Scholar]
  40. Zhang, Y.; Zhang, J.; Guo, X. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; Volume 129, pp. 1632–1640. [Google Scholar]
  41. Wang, Y.; Cao, Y.; Zha, Z.J.; Zhang, J.; Xiong, Z.; Zhang, W.; Wu, F. Progressive retinex: Mutually reinforced illumination-noise perception network for low-light image enhancement. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2015–2023. [Google Scholar]
  42. Ren, X.; Li, M.; Cheng, W.H.; Liu, J. Joint enhancement and denoising method via sequential decomposition. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5. [Google Scholar]
Figure 1. Content-rich areas in the original input image and noise distribution in the enhanced image.
Figure 3. Detailed architecture of depth-curve-estimation network.
Figure 5. Results of ablation experiments for each module.
Figure 6. Enhancement results of our method for low-light images of a mine and indoor squares.
Table 2. Comparison results of running time between non-full reference method and our method.
Method | Runtime (s) | Platform
RetinexNet (BMVC, 2018) [31] | 0.1700 | TensorFlow 2.0 (GPU)
EnlightenGAN (TIP, 2021) [10] | 0.0125 | PyTorch 1.8.1 (GPU)
Zero-DCE (CVPR, 2020) [13] | 0.0037 | PyTorch 1.8.1 (GPU)
Ours | 0.0052 | PyTorch 1.8.1 (GPU)
Table 3. User study (US) and perception index (PI) scores for different non-full-reference methods on non-reference datasets.
Method | NPE (US/PI) | LIME (US/PI) | MEF (US/PI) | DICM (US/PI) | Average (US/PI)
DeepUPE (ACM MM, 2019) [41] | 3.62/3.01 | 3.51/2.73 | 3.43/2.92 | 3.18/3.12 | 3.44/2.95
LIME (TIP, 2016) [17] | 3.72/2.98 | 3.92/2.94 | 3.68/3.17 | 3.22/3.26 | 3.64/3.09
DRBN (CVPR, 2020) [28] | 3.78/2.83 | 3.76/2.83 | 3.13/2.72 | 3.44/3.20 | 3.53/2.90
EnlightenGAN (TIP, 2021) [10] | 3.81/2.85 | 3.78/2.77 | 3.69/2.37 | 3.46/3.06 | 3.69/2.78
RetinexNet (BMVC, 2018) [31] | 3.20/3.14 | 2.23/3.01 | 2.69/2.73 | 2.72/3.12 | 2.71/3.00
JED (ISCAS, 2018) [42] | 3.65/3.05 | 3.50/3.01 | 2.93/3.61 | 3.47/3.43 | 3.39/3.28
SRIE (CVPR, 2016) [16] | 3.58/2.64 | 3.46/2.61 | 3.17/2.58 | 3.40/3.15 | 3.40/2.75
Zero-DCE (CVPR, 2020) [13] | 3.77/2.76 | 3.78/2.68 | 4.01/2.37 | 3.44/2.97 | 3.75/2.70
Ours | 3.92/2.71 | 3.84/2.58 | 4.62/2.25 | 3.83/2.79 | 4.05/2.58
Table 4. Quantitative comparison results of the ablation studies by module on the LOL dataset.
Method | PSNR ↑ | SSIM ↑ | LPIPS ↓
w/o Noise Fusion | 18.725 | 0.764 | 0.253
w/o $L_{noise}$ | 18.573 | 0.759 | 0.261
w/o Adjust Network | 18.209 | 0.753 | 0.267
Our full model | 19.457 | 0.786 | 0.235

