Article

Boosting the Performance of LLIE Methods via Unsupervised Weight Map Generation Network

1 School of Information Engineering, Nanchang University, Nanchang 330031, China
2 School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 4962; https://doi.org/10.3390/app14124962
Submission received: 23 April 2024 / Revised: 27 May 2024 / Accepted: 3 June 2024 / Published: 7 June 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Over the past decade, significant advancements have been made in low-light image enhancement (LLIE) methods due to the robust capabilities of deep learning in non-linear mapping, feature extraction, and representation. However, the pursuit of a universally superior method that consistently outperforms others across diverse scenarios remains challenging. This challenge primarily arises from the inherent data bias in deep learning-based approaches, stemming from disparities in image statistical distributions between training and testing datasets. To tackle this problem, we propose an unsupervised weight map generation network aimed at effectively integrating pre-enhanced images generated from carefully selected complementary LLIE methods. Our ultimate goal is to enhance the overall enhancement performance by leveraging these pre-enhanced images, organizing the enhancement workflow into a dual-stage execution paradigm. To be more specific, in the preprocessing stage, we initially employ two distinct LLIE methods, namely Night and PairLIE, chosen specifically for their complementary enhancement characteristics, to process the given input low-light image. The resultant outputs, termed pre-enhanced images, serve as dual target images for fusion in the subsequent image fusion stage. Subsequently, at the fusion stage, we utilize an unsupervised UNet architecture to determine the optimal pixel-level weight maps for merging the pre-enhanced images. This process is adeptly directed by a specially formulated loss function in conjunction with the no-reference image quality algorithm, namely the naturalness image quality evaluator (NIQE). Finally, based on a mixed weighting mechanism that combines generated pixel-level local weights with image-level global empirical weights, the pre-enhanced images are fused to produce the final enhanced image. Our experimental findings demonstrate exceptional performance across a range of datasets, surpassing the various state-of-the-art methods involved in the comparison, including the two pre-enhancement methods. This outstanding performance is attributed to the harmonious integration of diverse LLIE methods, which yields robust and high-quality enhancement outcomes across various scenarios. Furthermore, our approach exhibits scalability and adaptability, ensuring compatibility with future advancements in enhancement technologies while maintaining superior performance in this rapidly evolving field.

1. Introduction

The quality of LLIE can have a profound impact on various subsequent visual tasks, including image classification [1,2], object detection [3,4,5,6], face recognition [6,7,8], autonomous driving [9,10,11], and more, potentially compromising their effectiveness. These challenges arise when capturing images under poor lighting conditions such as low light, backlighting, or nighttime settings, which often results in issues like loss of detail, inadequate contrast, and color inconsistencies, thus impeding the extraction of crucial visual features. Addressing this fundamental challenge has been a longstanding issue in the field of computer vision. To uncover hidden details within low-light images and mitigate performance degradation in subsequent visual tasks, significant research efforts have been directed towards the development of improved LLIE techniques over the past decade. In the domain of LLIE, initial methodologies predominantly lean on traditional image processing techniques, such as histogram equalization (HE) [12] and the Retinex theory [13]. HE redistributes luminance values across the histogram to enhance contrast. However, these approaches often prioritize contrast enhancement over addressing the underlying causes of illumination variations, which can lead to the risks of over-enhancement or under-enhancement. Conversely, the Retinex model decomposes images into reflectance and illumination components, with the aim of alleviating lighting variations while preserving reflectance. Mathematically, this model can be represented as:
I = R · L ,
where R, L, and · denote reflectance, illumination, and element-wise multiplication, respectively. Despite the favorable results achieved by Retinex-based methods in improving image contrast, this approach can sometimes lead to significant color cast problems and inadequate enhancement of details in darker areas. These methods generally rely on hand-crafted features and utilize optimization strategies to improve image quality, heavily relying on the accuracy of these pre-defined assumptions. As a result, their ability to capture intricate data patterns, especially in complex lighting scenarios, is limited, which in turn restricts their adaptability and generalization capabilities. To address these limitations, Jobson et al. introduced two variants of Retinex: Single-scale Retinex (SSR) [14] and multi-scale Retinex (MSR) [15]. SSR initially employs a Gaussian filter to smooth the illumination map, while MSR extends SSR by incorporating multi-scale Gaussian filters and color restoration. However, both approaches often produce unnatural-looking images and suffer from over-enhancement due to their treatment of reflectance as the final outcome of enhancement. Moreover, Guo proposed LIME [16] as an improvement and extension of Retinex theory. LIME refines the initial illumination estimation based on the Max-RGB assumption and incorporates a structure-preserving constraint. This refinement effectively enhances low-light images while improving brightness and detail. However, there is still room for improvement in terms of the method’s complexity and computational efficiency. Furthermore, existing methods exhibit limited efficacy in color enhancement due to the non-linearity across color channels and the complexity of the data. This limitation becomes particularly noticeable in the presence of local color distortions frequently observed in enhanced images.
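To make the Retinex formulation above concrete, the following minimal sketch illustrates the single-scale Retinex idea: the illumination is estimated with a Gaussian filter and removed in the log domain. It assumes NumPy and OpenCV are available; the function name, the sigma value, and the final min-max stretch are illustrative choices, not details taken from the cited works.

```python
import cv2
import numpy as np

def single_scale_retinex(image_bgr, sigma=80):
    """Minimal SSR-style sketch: estimate the illumination with a Gaussian
    filter and take the log-domain difference as an approximate reflectance."""
    img = image_bgr.astype(np.float32) + 1.0               # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)    # smoothed L in I = R * L
    log_reflectance = np.log(img) - np.log(illumination)   # log R = log I - log L
    # Stretch the result back to a displayable range (real SSR/MSR variants
    # additionally apply gain/offset adjustment and color restoration).
    out = cv2.normalize(log_reflectance, None, 0, 255, cv2.NORM_MINMAX)
    return out.astype(np.uint8)
```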
Recently, deep learning has been widely applied in the field of LLIE and has made remarkable progress due to its ability to perform non-linear mapping, feature learning, and representation. In the early stages, Lore et al. developed the low-light net (LLNet) [17], which utilized a stacked-sparse denoising auto-encoder to simultaneously enhance low-light images and reduce noise. However, this approach does not fully consider the characteristics of low-light images, resulting in residual noise and over-smoothing issues in the processed images. To address these limitations, Tao et al. proposed the low-light convolutional neural network (LLCNN) [18]. LLCNN employs deeper convolutional neural networks specifically designed to capture and enhance features in low-light images. In comparison with LLNet, LLCNN not only achieves significant improvements in image quality but also demonstrates robust performance across various low-light conditions. Furthermore, Wei et al. introduced Retinex-Net [19] to overcome the limitations of LLNet. Retinex-Net utilizes an end-to-end network architecture that learns illumination-aware decompositions, striking a balance between natural colors and details in LLIE. However, Retinex-Net often produces unnatural enhanced results. To address this issue, Zhang et al. proposed Kindling the Darkness (KinD) [20], which improved upon Retinex-Net by introducing additional training losses and modifying the network architecture. However, KinD still faces challenges such as overexposure and halo artifacts. In response, the authors proposed KinD++ [21], which incorporates a multi-scale illumination attention module specifically designed to alleviate visual defects observed in KinD’s results. Additionally, Wang et al. addressed the limitation of deep Retinex-based methods, where noise is often neglected, by proposing a progressive Retinex network [22]. This approach consists of two interconnected subnetworks: the IM-Net, which estimates illumination, and the NM-Net, which assesses noise levels. These subnetworks collaboratively refine their outputs in a progressive manner until stable results are obtained. Overall, these supervision-based methods have significantly enhanced the quality of low-light images by continuously optimizing deep network architectures and improving training strategies. However, they rely on extensive paired data for supervised training, which can lead to model overfitting and limit their generalization capabilities.
In unsupervised settings, EnlightenGAN [23] introduced an attention-equipped generator for image enhancement without the need for paired datasets. This approach aims to tackle the domain shift issue between training data and real-world applications. However, the stability of EnlightenGAN in extreme environments remains a challenge. To address the influence of unstable training on the enhancement process, Guo et al. proposed Zero-DCE [24], a deep curve estimation network. Zero-DCE innovatively constructs a quadratic curve with learned parameters, taking a low-light image as input and generating high-order curves as output. These curves facilitate pixel-wise adjustments to the dynamic range of the input image, resulting in enhanced visual output. Additionally, an accelerated and lightweight variant of Zero-DCE, called Zero-DCE++ [25], was introduced. These curve-based methods do not require paired or unpaired data for training and rely on zero-reference learning using a set of non-reference loss functions. Compared to image reconstruction-based methods that demand high computational resources, the image-to-curve mapping in these methods only requires lightweight networks, enabling faster inference. Moreover, inspired by Deep Image Prior (DIP) [26], Zhao et al. proposed Retinex-DIP [27], which is grounded in the Retinex model. Retinex-DIP generates the reflectance and illumination components of an input image from randomly sampled white noise. During training, the model focuses on component characteristic-related losses, such as illumination smoothness, to optimize the enhancement process. While these methods alleviate the need for paired data, they primarily focus on light-related factors and exhibit limited capabilities in addressing other defects, as observed in EnlightenGAN and Zero-DCE. Therefore, there is still a need for effective and efficient designs capable of handling complex and multi-entangled image degradations in practical scenarios. Compared to supervised methods that directly learn from abundant normal illumination samples to restore low-light images, current unsupervised methods still fall short in terms of enhancement performance.
Overall, propelled by deep learning methodologies, LLIE methods have experienced remarkable improvement in performance. However, current LLIEs still have limitations, often struggling to ensure optimal enhancement results and exhibiting certain shortcomings when adopting a singular strategy. Interestingly, experimental evaluations conducted in this study reveal that different LLIEs often exhibit complementary characteristics, particularly in handling aspects such as image detail, color, textures, smoothing, and brightness. Inspired by this insight, we propose a two-stage LLIE approach that enhances performance through an unsupervised weight map generation network. Specifically, in the preprocessing stage, we employ two state-of-the-art (SOTA) LLIE methods, namely Night and PairLIE, selected for their complementary strengths, to generate two pre-enhanced images. Night and PairLIE are selectively drawn from existing mainstream self-supervised and unsupervised LLIEs, each exhibiting strong complementary advantages in their enhancement results. In the image fusion stage, the pre-enhanced images are utilized to fully leverage the distinct enhancement advantages of the Night and PairLIE methods across different regions of the low-light image. We propose an unsupervised approach to develop a UNet network for generating optimized pixel-level weighting maps. The generation process is guided by a multi-objective mixed loss function. To fuse the pre-enhanced images, we employ a mixed weighting mechanism that combines pixel-level weighting coefficients (referred to as weighting maps) and pre-defined global weighting coefficients for the entire pre-enhanced images. In the iterative generation of weighting maps, the iteration training process is terminated at appropriate steps based on the metric value of the no-reference image quality assessment algorithm, namely, NIQE [28]. The fused image obtained at this point serves as the final enhanced image. In summary, this research ingeniously combines the strengths of various LLIE methods, as validated by extensive experimental data. The effectiveness of this strategy stems from several innovative aspects:
  • The proposed method takes a novel approach by maximizing the complementary advantages of different approaches in low-light image enhancement. Instead of relying solely on a single technique or method, it combines the strengths of multiple LLIE methods to ensure robust and versatile enhancement results across diverse scenes. This not only augments the quality of the final image but also significantly mitigates the common pitfalls associated with any single enhancement approach. Moreover, the integration of an unsupervised training technique further refines the enhancement process, enabling the proposed method to handle a variety of lighting conditions, thus ensuring the generalizability of the enhancement and bringing it closer to real-world adaptability.
  • Distinct from other fusion strategies, we generate weight maps through a deep neural network (i.e., UNet) under the guidance of a multi-objective mixed loss function in an unsupervised manner, which allocates and determines the optimal pixel-level weights to each pre-enhanced image based on the no-reference metric value of the fused images. Note that the strategy of utilizing random input Ƶ as network input facilitates the generation of optimal pixel-level weight maps across different scenarios, significantly boosting its robustness. In addition to pixel-level weighting, we also incorporate global weighting coefficients for the pre-enhanced images, creating a mixed weighting mechanism that ensures the optimal enhancement of the fused image.
  • To the best of our knowledge, there are multiple objective evaluation metrics for assessing the enhancement performance of LLIEs, and each of these metrics has its own emphasis. To provide a more concise and comprehensive assessment of the various LLIE methods, we have devised a novel scoring system to evaluate their overall performance. Within this scoring system, an optimal combination pattern of LLIE methods (i.e., Night and PairLIE) was selected, which outperforms others comprehensively. Note that our method has potential scalability. In the future, if new high-performance low-light enhancement methods emerge, they can still be incorporated into the proposed fusion framework to achieve better enhancement effects, verified through our scoring system.
The organization of this paper is as follows: In Section 2, a comprehensive review is provided for the selected methods, which are chosen based on their complementary characteristics. These methods generate pre-enhanced images that serve as target images integrated into our fusion framework. Section 3 focuses on the backbone network used in the fusion stage. It covers the structure of the network, the processing of the generated image, and the composite loss function applied in the optimization process. Section 4 provides detailed information about the experimental dataset, settings, and the results obtained from comparing our approach with other LLIE methods. This comparison includes both classical methods and SOTA LLIEs, allowing for a comprehensive evaluation of our method. Finally, in Section 5, a comprehensive summary of our work is presented, highlighting the key contributions and findings.

2. Related Work

In this work, we meticulously select two methods, namely PairLIE and Night, from a diverse range of cutting-edge LLIEs based on ablation experiments. These two methods are chosen due to their complementary features. In the proposed weight map generation network, the processing results of the PairLIE and Night algorithms on any given low-light image will serve as target images. The objective is to generate enhancement results adaptable to various scenarios. Below is a detailed introduction to these two methods.

2.1. PairLIE Method

PairLIE, proposed by Zhang et al. [29], is an advanced LLIE method designed to enhance contrast and enrich image details in poor light conditions. Unlike traditional LLIE approaches that rely on a single input image and hand-crafted priors for illumination adjustments, PairLIE adopts a unique dual-input strategy. It utilizes two low-illuminated images with the same scene content as inputs, resulting in more adaptive and effective image enhancement. Figure 1 illustrates the PairLIE framework, which starts with the initial application of the P-Net module. This module removes noise and inappropriate features from the primary low-light images, denoted as I 1 and I 2 , thereby refining the inputs for further processing. The L-Net and R-Net modules are then employed to extract the latent illumination components, L 1 and L 2 , as well as the reflectance components, R 1 and R 2 , for each pair of input images. During the training phase, a composite loss function is used to shape the network’s performance, consisting of three distinct loss functions:
\mathcal{L}_{\text{Composite}} = w_0 \mathcal{L}_P + w_1 \mathcal{L}_C + w_2 \mathcal{L}_R,
where the hyperparameters w_0, w_1, and w_2 calibrate the influence of each constituent loss function, reflecting its prescribed significance. The self-supervised projection loss L_P assesses the variation between the unprocessed and refined images. Calculated using R_1 and R_2, the reflectance consistency loss L_C ensures consistency between the reflectance components. Additionally, the Retinex loss L_R enforces adherence to Retinex theory during the image decomposition process. During the testing stage, a given low-light image is initially preprocessed using the P-Net module. Then, the L-Net and R-Net modules are utilized to estimate the illumination and reflectance components, respectively. The final enhanced image is obtained by recombining these components, resulting in improved brightness and enhanced detail. This recombination can be represented mathematically as:
I_{\text{enhanced}} = g(L) \cdot R = L^{\lambda} \cdot R,
Here, λ represents the illumination correction factor, and I_enhanced represents the resulting enhanced image.
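As a rough illustration of this recombination step (a sketch only, not the authors' released code; the tensor shapes and the value of the correction factor are assumptions), the enhanced image can be computed from the estimated components as follows:

```python
import torch

def recombine(L, R, lam=0.2):
    """Sketch of the recombination above: I_enhanced = g(L) * R = L^lam * R.
    L: estimated illumination, shape (B, 1, H, W), values in (0, 1].
    R: estimated reflectance,  shape (B, 3, H, W), values in [0, 1].
    lam: illumination correction factor (value assumed for illustration)."""
    corrected = torch.clamp(L, min=1e-6) ** lam        # brighten dark illumination
    return torch.clamp(corrected * R, 0.0, 1.0)        # element-wise product
```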
In contrast to the conventional methodologies that rely on manually designed priors, PairLIE integrates deep learning methods at the cutting edge with the well-established Retinex theory. This integration considerably diminishes the traditional approach’s dependence on specific training datasets. Notably, PairLIE is proficient at maintaining the integrity of intricate textures and subtle details, which are frequently compromised under dim lighting conditions. This illustrates the method’s advanced and refined processing prowess. Additionally, PairLIE is characterized by a reduced reliance on hand-crafted priors and a more straightforward architecture, suggesting prospects for improved efficiency. However, its application in varied contexts may be constrained by its dependency on data-driven image pairs, and its capacity to reproduce colors and brightness with accuracy may not always be adequate, underscoring opportunities for further enhancement in subsequent versions.

2.2. Night Method

The challenge of uneven luminance distribution often plagues nighttime photography, frequently leading to an over-amplification and saturation of bright spots while leaving darker areas inadequately exposed. While prevailing night visibility enhancement strategies concentrate on improving these underexposed segments, they tend to unintentionally heighten the intensity of light, thereby diminishing the visibility in nighttime imagery. To tackle this issue, Jin et al. [30] proposed Night, an innovative unsupervised approach that artfully merges a layer decomposition network with a network tailored for diminishing light effects. Exploiting the Retinex theory, Night employs an unsupervised end-to-end architecture adept at concurrently intensifying dark regions and mitigating light disturbances. This architecture encompasses networks designated for image decomposition along with light-effects suppression. Figure 2 illustrates the initial processing phase, where the network takes a single night image as input, which is subsequently decomposed into three distinct layers (shading, reflectance, and light effects), each modulated by its own set of unsupervised prior losses, in accordance with the image-layer formulation:
I = R · L + G ,
where I is the input night image, R and L are the reflectance and shading layers, and G represents the light-effects layer.
Subsequently, the light-effects layer directs the initial input to the dedicated network for suppressing light effects, which has been trained on unpaired images exhibiting varying degrees of light effects. This specialized network generates attention maps with a focus on regions impacted by light effects, serving to restrain these effects while concurrently bolstering illumination in dark regions. Moreover, the network utilizes the gradient exclusion loss, denoted by L e x c l , which distinguishes between the gradient profiles of the light-effects layer and those of the background. This distinction is instrumental in refining the disentanglement of light effects from the overall scene imagery. The gradient exclusion loss L e x c l can be expressed as:
\mathcal{L}_{\text{excl}} = \sum_{n=1}^{3} \tanh\left(\lambda_{G}^{n} \left|\nabla G^{n}\right|\right) \odot \tanh\left(\lambda_{J_{\text{init}}}^{n} \left|\nabla J_{\text{init}}^{n}\right|\right),
where ∇ denotes the gradient, ⊙ represents element-wise multiplication, and the λ terms are normalization factors.
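A minimal PyTorch sketch of this idea is given below. The finite-difference gradients, the mean reduction, the fixed λ values, and the bilinear downsampling used to form the three scales are all assumptions made for illustration; they are not taken from the original implementation.

```python
import torch
import torch.nn.functional as F

def grads(img):
    """Finite-difference horizontal and vertical gradients of a (B, C, H, W) tensor."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def exclusion_loss(G, J_init, levels=3, lam_g=1.0, lam_j=1.0):
    """Sketch of the exclusion loss: penalize correlated gradients between the
    light-effects layer G and the estimated background J_init over several scales."""
    loss = 0.0
    for _ in range(levels):
        for g_grad, j_grad in zip(grads(G), grads(J_init)):
            loss = loss + (torch.tanh(lam_g * g_grad.abs())
                           * torch.tanh(lam_j * j_grad.abs())).mean()
        # Move to a coarser scale for the next term of the sum.
        G = F.interpolate(G, scale_factor=0.5, mode="bilinear", align_corners=False)
        J_init = F.interpolate(J_init, scale_factor=0.5, mode="bilinear",
                               align_corners=False)
    return loss
```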
To faithfully reconstruct background details and mitigate the emergence of hallucination or artifacts, the method employs a novel loss function that enforces consistency in structural and high-frequency features, denoted as L g r a y f e a t . This loss function is crucial for preserving the structural integrity and fine textural nuances characteristic of nocturnal scenes. The definition of the loss is as follows:
\mathcal{L}_{\text{gray-feat}} = \left\| \phi_{HF}(J_{\text{refine}}) - \phi_{HF}(I_{\text{gray}}) \right\|_1 + \left\| \phi_{l}^{VGG}(J_{\text{refine}}) - \phi_{l}^{VGG}(I_{\text{gray}}) \right\|_1,
where φ_HF and φ_l^VGG represent the high-frequency feature maps and the feature maps from the l-th layer of the VGG network, respectively. The light-effects suppression network’s ultimate output proficiently diminishes light disturbances as it correspondingly amplifies darker zones. This is accomplished by refining the initially approximated background scene J_init to a final light-effects-free output J_refine.
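The following sketch illustrates how such a consistency loss can be assembled in PyTorch. A Laplacian kernel stands in for the high-frequency extractor φ_HF, and an early slice of a pretrained VGG16 stands in for φ_l^VGG; both substitutions and the chosen layer index are assumptions for illustration, and the torchvision weights API shown requires a recent torchvision release.

```python
import torch
import torch.nn.functional as F
import torchvision

# A Laplacian kernel stands in for the high-frequency extractor phi_HF (assumption).
_LAPLACIAN = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).view(1, 1, 3, 3)

# Early VGG16 layers stand in for phi_l^VGG (the layer index is illustrative).
_vgg = torchvision.models.vgg16(weights="DEFAULT").features[:9].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def high_freq(gray):
    """gray: (B, 1, H, W) grayscale tensor -> high-frequency response."""
    return F.conv2d(gray, _LAPLACIAN.to(gray.device), padding=1)

def gray_feat_loss(J_refine_gray, I_gray):
    """Sketch of the gray-feature loss: L1 distance between high-frequency maps
    plus L1 distance between VGG features of the refined output and the input."""
    hf_term = F.l1_loss(high_freq(J_refine_gray), high_freq(I_gray))
    # VGG expects three input channels; replicate the grayscale channel.
    feat_term = F.l1_loss(_vgg(J_refine_gray.repeat(1, 3, 1, 1)),
                          _vgg(I_gray.repeat(1, 3, 1, 1)))
    return hf_term + feat_term
```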
The Night method leverages unsupervised learning techniques to enhance night images, achieving a harmonious blend of color and brightness while effectively reducing noise. However, its focus on smoothing and suppressing light effects can sometimes result in a loss of fine details, leading to excessively smooth images. While Night excels in addressing issues related to uneven light distribution and prominent light effects in nighttime photography, it is crucial to strike a careful balance between enhancement and the preservation of image details.

3. Methodology

3.1. Experimental Observations

Yi et al. suggested in [31] that classical algorithms are constrained by the limitations of manual design and optimization-driven approaches. Their generalization and robustness are limited, restricting their applicability. To overcome these shortcomings, deep learning has been employed to construct complex mappings from low to normal lighting conditions [17,32], significantly outperforming traditional LLIE methods [33,34,35]. However, Liu et al. contend that most deep learning-based LLIE approaches focus on learning mapping functions but ignore the guidance of auxiliary priors provided by normal-light images in the training dataset, resulting in enhanced images exhibiting unpleasant artifacts or distorted colors [11]. As illustrated in Figure 3, none of the LLIEs showcase a definitive advantage in terms of detail preservation, exposure, color fidelity, and other aspects. Traditional techniques such as LIME, Ying [36], and Retinex-Net, as well as deep learning methods like EnlightenGAN and PairLIE, primarily focus on preserving fine details. However, this focus frequently results in the unintended consequence of introducing noise. In terms of contrast, Zero-DCE and PairLIE serve as exemplars of the deep learning paradigm. When evaluating color fidelity, deep learning-based methods such as LLFlow [37], Night, and SCI [38] excel. LLFlow employs supervised learning, while Night and SCI rely on unsupervised approaches. Notably, LLFlow and Night demonstrate superior color reproduction, though they may face challenges in maintaining appropriate brightness levels. Night, specifically, stands out in mitigating overexposure. In terms of exposure correction, Bread [39] and LLFlow emerge as prominent contenders, achieving a more balanced distribution of light and dark regions in images. Furthermore, Night excels in reducing artifacts, effectively minimizing issues like halos and unnatural edges.
Therefore, we can naturally draw the following conclusion: no single LLIE method consistently outperforms others across all conditions. Instead, each method offers distinct and complementary advantages. This observation underscores the potential for developing hybrid approaches that combine the strengths of individual methods synergistically, thereby providing a more robust solution for enhancing low-light images.

3.2. Basic Idea

Our previous experiments have revealed the synergistic effects of various techniques in improving low-light images. Drawing inspiration from this observation, we devise a boosting strategy to effectively integrate the strengths of these complementary pre-enhanced images. Consequently, the main focus of this research centers around the task of image fusion. With regard to the problem of image fusion, we employ a hybrid weighted framework as follows:
\hat{x}(i,j) = \sum_{n=1}^{N} W_n \, s_n(i,j) \, x_n(i,j),
where N denotes the number of pre-enhanced images, and for each pixel at (i, j), this framework combines N pre-enhanced images with the corresponding weighted value s_n(i, j) for the n-th pre-enhanced image, along with the global weight W_n specific to the n-th pre-enhanced image. By incorporating a combination strategy that combines global and local mixed weighting, this approach effectively preserves the texture details of each pre-enhanced image at a finer granularity and maximizes the advantages offered by different LLIEs. Having established this fundamental principle, the subsequent crucial question is about determining the LLIE methods that should be involved in the image fusion process. To comprehensively identify the optimal combination patterns, we select n mainstream LLIEs to process the same image. Subsequently, we evaluate the degree of complementarity among various combination patterns based on the quality of the fused image.
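A direct implementation of this hybrid weighting is straightforward; the sketch below (tensor layouts are assumptions) computes the fused image from the pre-enhanced images, the pixel-level weight maps, and the global weights:

```python
import torch

def fuse(pre_enhanced, weight_maps, global_weights):
    """Sketch of Equation (7): x_hat(i, j) = sum_n W_n * s_n(i, j) * x_n(i, j).
    pre_enhanced:   list of N tensors, each of shape (B, 3, H, W).
    weight_maps:    tensor of shape (B, N, H, W) holding the pixel-level weights s_n.
    global_weights: list of N scalars holding the image-level weights W_n."""
    fused = torch.zeros_like(pre_enhanced[0])
    for n, (x_n, W_n) in enumerate(zip(pre_enhanced, global_weights)):
        s_n = weight_maps[:, n:n + 1, :, :]        # (B, 1, H, W), broadcasts over RGB
        fused = fused + W_n * s_n * x_n
    return fused
```

With the combination finally adopted in this paper, global_weights would be set to [2.0, 1.0] for PairLIE and Night, following the ratio determined in Section 4.2.2.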
Due to the intricate and complex nature of the pixel-level weight maps required for each pre-enhanced image, it becomes challenging to determine them through manual design. To overcome this challenge, we employ the classical UNet network, which is capable of generating the corresponding local weight maps at the pixel-level for each pre-enhanced image. By utilizing the UNet, we aim to preserve the texture details of each pre-enhanced image with finer granularity. To train the UNet, we adopt an unsupervised online iterative training mode. This training mode is guided by a specially designed loss function that facilitates iterative refinement. Through these iterations, the weights of the UNet are optimized to minimize the defined loss function, resulting in improved performance. Furthermore, we utilize the no-reference image quality assessment metric, NIQE, to measure the quality of the fused image. This metric allows us to evaluate the quality of the output image without requiring a reference image. By employing NIQE, we can determine the suitable iteration step to terminate the iterative training process. The fused image obtained at the i-th step serves as the final enhanced image, denoted as x̂_i.
The specific process is depicted in Figure 4. Initially, we take a low-light image y and apply N different LLIEs to enhance it, resulting in pre-enhanced images {x_n}_{n=1}^{N}. Subsequently, we employ the proposed UNet network to generate local pixel-wise weight maps {s_n}_{n=1}^{N}. These weight maps are combined with predefined global weight coefficients {W_n}_{n=1}^{N} to obtain the final enhanced image x̂_i. To facilitate the iterative updates of network parameters, we propose a multi-objective mixed loss function. This loss function aims to preserve common elements across the pre-enhanced images, similar to a voting mechanism. By incorporating a combination strategy that utilizes both global and local mixed weighting, we ensure that the output fusion image achieves an approximate optimal quality. It is worth noting that the initial input Ƶ to the network consists of random noise, and the corresponding local weight maps s_n are also generated randomly. However, this randomness does not hinder us from obtaining the best fusion result. This is because the network parameters are adaptively updated during each training iteration, which provides a certain level of robustness to our approach. The detailed structure of the backbone network and the specifics of the loss function will be further explored in the subsequent subsections.

3.3. Backbone Network

Our novel network structure, illustrated in Figure 5, builds upon the encoder-decoder design pioneered by the classic UNet [41], incorporating innovative adjustments in the input-output, encoder, and decoder layers. Distinctively, our network uses uniform noise matching the original image size as input to reduce dependency on specific training data and mitigate the risk of overfitting. In the encoder design, the network employs a series of convolutional modules. Each module is composed of two 3 × 3 convolutional layers to capture local features. Following each convolutional layer, batch normalization (BN) is employed to maintain activation stability. To enhance the non-linearity of the network, a Leaky ReLU activation function follows each BN. Interspersed between these modules are down-sampling operations that progressively distill abstract features in the depth dimension. Echoing the encoder, the decoder part also includes multiple convolutional modules. However, its goal is to reconstruct the image from the deep features. For a more accurate recovery of image details at the spatial resolution, up-sampling operations connected between modules are implemented using 3 × 3 transposed convolutions (also known as deconvolutions). Following every up-sampling operation, BN and Leaky ReLU activation functions are likewise applied. Additionally, to fuse deep features with shallow statistical information, skip connections are introduced between the encoder and decoder at the same levels to minimize losses from up-sampling and down-sampling. At the end of the proposed network, we add a convolutional layer with an appropriate activation function, which reduces the 128 channels to N channels used to output the weight maps {s_n}_{n=1}^{N}. Importantly, the proposed weight map generation network can implicitly capture and utilize hierarchical features. It effectively leverages the deep network’s non-linear mapping capability to capture local features and map them onto pixel-level weight maps. This process enables the model to provide more stable and high-quality weight maps by utilizing the proposed loss function and image quality assessment metrics.
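A compact PyTorch sketch of such a weight-map generator is shown below. The channel widths, the number of levels, the max-pooling choice, and the softmax used to normalize the N output channels are illustrative assumptions; the network used in the paper differs in its exact configuration (e.g., it reduces 128 channels to N at the output and, per Section 4.2.3, uses four levels).

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by BN and Leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True),
    )

class WeightMapUNet(nn.Module):
    """UNet-style encoder-decoder mapping a noise tensor to N per-pixel weight maps."""
    def __init__(self, n_outputs=2, base_ch=32):
        super().__init__()
        self.enc1 = conv_block(3, base_ch)
        self.enc2 = conv_block(base_ch, base_ch * 2)
        self.enc3 = conv_block(base_ch * 2, base_ch * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base_ch * 4, base_ch * 2, 2, stride=2)
        self.dec2 = conv_block(base_ch * 4, base_ch * 2)
        self.up1 = nn.ConvTranspose2d(base_ch * 2, base_ch, 2, stride=2)
        self.dec1 = conv_block(base_ch * 2, base_ch)
        self.head = nn.Conv2d(base_ch, n_outputs, 1)

    def forward(self, z):
        e1 = self.enc1(z)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))   # skip connection
        return torch.softmax(self.head(d1), dim=1)             # N weight maps per pixel

# Usage: feed uniform noise of the image size, obtain {s_n} for N = 2 pre-enhanced images.
# z = torch.rand(1, 3, 256, 256); s = WeightMapUNet(n_outputs=2)(z)   # s: (1, 2, 256, 256)
```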
In short, in this paper, we introduce a novel network structure based on UNet, which eliminates the need for prior knowledge in the construction of weight maps. Operating as a dynamic, non-linear mapping function, our network excels at autonomously generating custom pixel-wise weight maps for individual pre-enhanced images. The architecture of UNet, with its symmetrically organized contracting and expanding pathways, is specifically designed to capture contextual information while ensuring accurate localization. This design proves immensely advantageous for our application, enabling the network to effectively learn intricate feature representations across different scales, which is crucial for accurately evaluating the importance of each pixel during the enhancement procedure.

3.4. Multi-Objective Mixed Loss L_mul

The unsupervised weight map generation network presented in this paper differs from supervised LLIE methods as it operates independently of reference images, relying solely on a set of pre-enhanced images. It should be noted that each pre-enhanced image already closely approximates the reference image in the image space. The network’s objective is to explore the domain defined by these pre-enhanced images and identify the highest quality images suitable as fusion results. This task also involves fine-tuning the network parameters to generate the most effective weight maps and determine optimal parameter values. In this study, we introduce a multi-objective mixed loss function, denoted as L_mul, which consists of several L_MSE subterms. We leverage the multiple complementary pre-enhanced images obtained during the preprocessing phase to enhance its guidance capability. Specifically, the network learns to preserve common parts among the initial enhanced images, akin to a voting mechanism. The shared details are essential to preserve in the output fused image, while undesired artifacts or changes are avoided. Therefore, the overall loss function L_mul is defined as:
\mathcal{L}_{\text{mul}} = \sum_{n=1}^{N} \mathcal{L}_{\text{MSE}}^{n} = \mathcal{L}_{\text{MSE}}^{1} + \mathcal{L}_{\text{MSE}}^{2} + \cdots + \mathcal{L}_{\text{MSE}}^{N} = (\hat{x}_i - x_1)^2 + (\hat{x}_i - x_2)^2 + \cdots + (\hat{x}_i - x_N)^2,
where x_1, x_2, ..., x_N are the pre-enhanced images obtained with the mainstream low-light enhancement methods, which serve as the target images for the fusion stage, as shown in Figure 4.
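Assuming the fused image x̂ and the pre-enhanced targets are tensors of the same shape, this loss reduces to a few lines:

```python
import torch

def multi_objective_loss(fused, pre_enhanced):
    """Sketch of the multi-objective mixed loss: sum of per-target MSE terms
    between the fused image x_hat and every pre-enhanced image x_n."""
    return sum(torch.mean((fused - x_n) ** 2) for x_n in pre_enhanced)
```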

3.5. Automatic Termination of Training Mechanism

In addition to utilizing the proposed loss function, we incorporate the selection of distinctive image content in each pre-enhanced image by employing a no-reference image quality metric. Notably, we employ the NIQE metric, a non-reference metric that monitors the iterative process of unsupervised training. The training phase is deemed complete when NIQE indicates the attainment of an optimal value, signaling that the image quality has reached a desirable level of clarity and detail without the need for reference images. This method allows for a more autonomous and efficient approach to image enhancement, as it relies on statistical regularities of natural images rather than comparisons with a set of pre-defined quality standards.
In summary, our approach aims to achieve an optimal integration of both shared and distinct yet beneficial image contents across all pre-enhanced images. To accomplish this, we utilize a combination of the multi-objective mixed loss function and the NIQE image quality metric. By incorporating the multi-objective mixed loss function and the NIQE metric, we ensure that the selected image contents, whether they are common or distinct, contribute to the overall enhancement of image quality. The multi-objective mixed loss function helps guide the training process by exploring potential images within the image space constrained with pre-enhanced images, enabling iterative improvements. Meanwhile, the NIQE metric serves as a no-reference image quality metric, allowing us to monitor the effectiveness of the enhancements throughout the iterative process. Finally, the integration of these two components empowers us to make informed decisions, resulting in visually appealing and high-quality output.
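The overall unsupervised fusion procedure can then be sketched as the loop below, which reuses the fuse, multi_objective_loss, and weight-map network sketches given earlier. The compute_niqe argument is a placeholder for any NIQE implementation, and the patience-based stopping rule is an assumption standing in for the paper's criterion of terminating once NIQE reaches an optimal value.

```python
import torch

def run_fusion(pre_enhanced, model, global_weights, compute_niqe,
               max_iters=500, lr=1e-3, patience=20):
    """Sketch of the unsupervised fusion loop with NIQE-based termination.
    `model` is a weight-map generator; `fuse` and `multi_objective_loss`
    follow the earlier sketches."""
    z = torch.rand_like(pre_enhanced[0])               # random noise input to the network
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_niqe, best_image, stall = float("inf"), None, 0

    for _ in range(max_iters):
        weight_maps = model(z)                         # pixel-level weight maps {s_n}
        fused = fuse(pre_enhanced, weight_maps, global_weights)
        loss = multi_objective_loss(fused, pre_enhanced)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        niqe = compute_niqe(fused.detach())            # monitor no-reference quality
        if niqe < best_niqe:                           # keep the best fused image so far
            best_niqe, best_image, stall = niqe, fused.detach(), 0
        else:
            stall += 1
        if stall >= patience:                          # stop once NIQE no longer improves
            break
    return best_image
```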

4. Experimental Results

This section encompasses the presentation of our datasets, the description of the experimental setup, and the reporting of the experimental results. To begin with, we conduct ablation experiments to assess the impact of different factors. Subsequently, we compare our method against other LLIEs.

4.1. Datasets and Experimental Setup

To comprehensively assess the overall enhancement performance of the proposed enhancement model, extensive experiments were conducted. The evaluation was carried out using three benchmark datasets: the LOL dataset [19], consisting of 485 pairs of low/normal light images for training and 15 for evaluation; LOL-Real [42], comprising 100 testing images with more diverse scenes; and DICM [43], introduced by Lee et al., containing 69 images captured using a commercial digital camera. These images are particularly valuable for studying image enhancement under low-light conditions. Specifically, in the ablation experiment, only the LOL-test dataset was utilized. Evaluation was performed on the LOL-test dataset, LOL-Real test-split, and DICM (with 10 images randomly selected). Representative state-of-the-art methods used for comparison can be categorized into four groups: traditional methods (e.g., LIME, Ying), supervised approaches (e.g., R2R), self-supervised methods (e.g., PairLIE), and unsupervised methods (e.g., Zero-DCE, Enlighten, Night), all demonstrating significant enhancement results. Objective evaluation of enhancing methods employed image quality assessment metrics, namely peak signal-to-noise ratio (PSNR), SSIM [44], NIQE, and learned perceptual image patch similarity (LPIPS) [45]. PSNR measures intensity similarity between ground truth and generated images, with higher values indicating better enhancement capability. SSIM, aligned with human visual perception, indicates visually more satisfying results with higher values. LPIPS, a reference-based metric, quantifies perceptual similarity using deep learning features, with lower values indicating higher perceptual resemblance and superior image quality. NIQE, a no-reference metric, assesses image naturalness through statistical modeling, with lower scores implying a more natural appearance and enhanced image quality. Experiments were conducted using PyTorch on a single NVIDIA RTX 3090 GPU and a Lenovo desktop equipped with a 2.1 GHz Intel Core i7-6700k CPU and 16 GB of RAM.
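For reference, the full-reference metrics can be computed as in the sketch below, which assumes the scikit-image and lpips packages (the channel_axis argument requires a recent scikit-image release); NIQE, being no-reference, is computed with a separate implementation and is not shown.

```python
import torch
import lpips                                            # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

_lpips_fn = lpips.LPIPS(net="alex")                     # perceptual metric (AlexNet backbone)

def full_reference_metrics(gt, enhanced):
    """Sketch of the full-reference part of the evaluation; `gt` and `enhanced`
    are uint8 HxWx3 arrays."""
    psnr = peak_signal_noise_ratio(gt, enhanced, data_range=255)
    ssim = structural_similarity(gt, enhanced, channel_axis=-1, data_range=255)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    lp = _lpips_fn(to_tensor(gt), to_tensor(enhanced)).item()
    return psnr, ssim, lp
```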
Additionally, to comprehensively evaluate various LLIEs based on four evaluation metrics (PSNR, SSIM, NIQE, and LPIPS), we have implemented the following scoring mechanism. For metrics such as PSNR and SSIM, where higher values indicate better enhancement, the compared methods are ranked by their metric values so that the best-performing method is ranked first. With n denoting the total number of methods evaluated, the first-ranked method receives a score of n points, the second-ranked method receives n − 1 points, and so on. Conversely, for metrics like LPIPS and NIQE, where lower values indicate improved quality, we sorted the metric values in the opposite order and assigned scores using the same mechanism. Next, we calculated the total score for each method by multiplying its score on each metric by the corresponding metric’s weight and summing these weighted scores. Finally, we normalized the scores to a range of 0 to 1 by dividing the original score by the maximum possible score, which is determined by the sum of the weights of all metrics multiplied by the total number of methods evaluated.
\text{Score} = \frac{W_{\text{psnr}} \cdot N_{\text{psnr}} + W_{\text{ssim}} \cdot N_{\text{ssim}} + W_{\text{NIQE}} \cdot N_{\text{NIQE}} + W_{\text{LPIPS}} \cdot N_{\text{LPIPS}}}{m \cdot \left( W_{\text{psnr}} + W_{\text{ssim}} + W_{\text{NIQE}} + W_{\text{LPIPS}} \right)},
In the scoring formula, m represents the total number of methods evaluated, so that the denominator equals the maximum possible score. The values N_psnr, N_ssim, N_NIQE, and N_LPIPS correspond to the rank-based scores of each method on the respective PSNR, SSIM, NIQE, and LPIPS metrics. These scores lie within the closed interval [1, n], where n denotes the total number of methods evaluated (i.e., n = m). The parameters W_psnr, W_ssim, W_NIQE, and W_LPIPS are assigned a default value of 1, indicating equal weightage for each metric; these weights can be adjusted according to specific requirements. By incorporating this scoring system, we can effectively balance the influence of different quality metrics and determine the best image enhancement method.
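The scoring mechanism itself is simple to reproduce; the sketch below implements the ranking, weighting, and normalization described above, with equal default weights and hypothetical metric values in the commented example.

```python
def overall_score(metrics, weights=None):
    """Sketch of the scoring system. `metrics` maps each method name to a dict
    with 'psnr', 'ssim', 'niqe', and 'lpips' values; all weights default to 1."""
    weights = weights or {'psnr': 1, 'ssim': 1, 'niqe': 1, 'lpips': 1}
    methods = list(metrics)
    n = len(methods)                              # total number of methods evaluated
    higher_is_better = {'psnr': True, 'ssim': True, 'niqe': False, 'lpips': False}

    # Rank-based points: the best method on a metric gets n points, the worst gets 1.
    points = {m: {} for m in methods}
    for metric, higher in higher_is_better.items():
        ranked = sorted(methods, key=lambda m: metrics[m][metric], reverse=higher)
        for rank, m in enumerate(ranked):
            points[m][metric] = n - rank

    max_score = n * sum(weights.values())
    return {m: sum(weights[k] * points[m][k] for k in weights) / max_score
            for m in methods}

# Example with hypothetical values for three methods:
# scores = overall_score({
#     'A': {'psnr': 21.7, 'ssim': 0.80, 'niqe': 4.0, 'lpips': 0.12},
#     'B': {'psnr': 20.3, 'ssim': 0.72, 'niqe': 5.2, 'lpips': 0.15},
#     'C': {'psnr': 19.8, 'ssim': 0.75, 'niqe': 4.6, 'lpips': 0.14}})
```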

4.2. Ablation Experiments

Our main challenge lies in determining the type and number of LLIEs involved in our ensemble, as this directly impacts our approach’s performance. To strike a balance between effectiveness and efficiency, we initially conducted combination experiments using two types of enhanced images. From the extensive range of methodologies within the LLIE domain, we carefully selected nine representative and high-performing enhancers for inclusion in our ensemble. By randomly combining these nine methods, with the UNet’s layer set to five, we identified the best combination patterns for the two types of images. Table 1 presents the scores of different combination patterns on the LOL-test dataset. The combination of PairLIE and Night demonstrates a superior enhancing effect compared to other state-of-the-art LLIEs. Night excels in maintaining overall image brightness and color accuracy through its light-effects suppression network and high-frequency consistency losses. However, it may not perform as well in preserving fine details. On the other hand, PairLIE, utilizing Retinex decomposition and processing of original image features, exhibits superior performance in detail preservation but may be more conservative in brightness enhancement. By leveraging the strengths of each method in different scenarios, we achieve robust and optimal enhancement results.

4.2.1. The Extensibility of Combination Patterns

Theoretically, our strategy is flexible and can be extended to accommodate a greater number and variety of LLIEs. When we expand the LLIE categories to three, the resulting combination still exhibits positive effects, with improved indicators in all aspects compared to the two-method combination. However, the magnitude of performance improvement is no longer significant. For example, when considering the combination of PairLIE, Night, and Bread, we obtain the following results: PSNR = 21.73, SSIM = 0.8010, NIQE = 4.0356, and LPIPS = 0.1173, as shown in Table 1. It should be noted that extending the combination to four LLIEs does not significantly enhance performance and may even lead to a slight decrease. For instance, the combination of PairLIE, Night, Bread, and LIME yields the following results: PSNR = 20.30, SSIM = 0.7204, NIQE = 5.1908, and LPIPS = 0.1457. On the whole, considering the time-consuming nature of the preprocessing steps required to generate the pre-enhanced images, as well as the relatively marginal improvement in the quality of the fusion image, we have chosen to utilize only two LLIE methods for pre-enhancing low-light images in this paper.

4.2.2. The Global Weight of Combination Patterns

In our previous combination ablation experiments, we carefully selected two LLIEs for fusion: PairLIE and Night. Here, our primary goal was to fully exploit the strengths of these two methods to achieve an optimal combined enhancement effect. To achieve this, we manually assigned specific weight coefficients to each whole pre-enhanced image and conducted a series of weight coefficient combination ablation experiments. We evaluated the performance of different weight combinations using the PSNR metric. During the experiments, we maintained the weight coefficient of the Night method at a constant value while gradually increasing the weight coefficient of PairLIE. The weight coefficient for PairLIE started from 0.5 and went up to a maximum of 5. Our findings, as summarized in Table 2, reveal that the combination performs best when the weight ratio is set to 2:1. This corresponds to the global weight, denoted as W_n, in Equation (7). As a result, we finalized the weight coefficients for PairLIE and Night as 2 and 1, respectively. Notably, this configuration outperforms the combination without manually set weights across all evaluation metrics, indicating that the pixel-level weighting mechanism, when combined with the global image weighting mechanism, can achieve better enhancement results.

4.2.3. Network Architectures

To comprehensively evaluate the influence of different network architectures on the enhancement effect, we conducted additional ablation studies focusing on the layer number within the UNet architecture employed in our methodology. The results, presented in Table 3, showcase performance metrics and corresponding scores for networks with one, two, three, four, and five layers. Our experimental findings reveal intriguing insights. Firstly, networks with one, two, or three layers may lack sufficient nonlinear mapping capability to capture complex features, which can pose challenges when generalizing to unseen data. Secondly, increasing network depth does not exhibit a linear correlation with improved performance. A five-layer network introduces issues such as gradient instability and network degradation, despite partial mitigation attempts. Paradoxically, adding depth may even lead to performance decline. Moreover, deeper networks can suffer from saturation effects due to increased depth, which adversely impacts their enhancement capabilities. Additionally, deepening the model may diminish learning capabilities in shallower layers, thereby restricting the overall learning potential of the deeper network [46,47,48,49,50]. Considering these observations, we selected a four-layer UNet architecture, as it consistently delivers the best overall performance. The crucial role of structures like skip connections in facilitating effective learning within deeper networks underscores their significance in LLIE tasks.

4.3. Quantitative Results

The LOL-test dataset was initially utilized as the test data. Quantitative results in Table 4 illustrate a notable superiority of our method over other competitors, in both reference and no-reference metrics, thus demonstrating the effectiveness of our proposed combination approach with mixed weights. Specifically, relative to the best-performing method among the comparatives, the proposed method achieves an improvement of 0.46 dB in terms of the PSNR metric, 0.00244 in terms of the SSIM metric, 0.0102 in terms of the LPIPS metric, and 0.2575 in terms of the NIQE metric. These quantitative results underscore the effectiveness of the proposed approach in enhancing the quality of low-light images. Subsequently, to ascertain the generalization capability of our proposed method, we conducted evaluations on the LOL-Real dataset. The corresponding results are displayed in Table 5. Our method once again performs strongly relative to the baseline methods across the PSNR, SSIM, LPIPS, and NIQE metrics: while it ranks second in terms of PSNR, it surpasses the comparative algorithms in the other metrics to varying degrees. Overall, our approach demonstrates superior comprehensive performance. Analysis of the data from Table 4 and Table 5 reveals relatively poor results attained by traditional and unsupervised methods. This outcome is expected due to the inherent challenge of learning a robust enhancement model in the absence of a reference image. Furthermore, the efficacy of these methods heavily relies on hand-crafted priors. In summary, compared to the performance of other methods on the aforementioned datasets, our approach maximizes the advantages of various LLIE methods, resulting in significant performance enhancements and superior overall performance.
To showcase the robustness of our method, we conducted comparisons by randomly selecting 10 images from the DICM dataset, thereby imposing a more challenging scenario for image enhancement. The quantitative analysis comparing our method with competitors on the DICM dataset is summarized in Table 6. Specifically, our method demonstrates superior performance compared to several benchmarks. On average, it achieves a 0.4132 lower NIQE score than PairLIE, 0.0947 lower than Zero-DCE, 0.5834 lower than LIME, 1.9228 lower than R2R, 0.2962 lower than EnlightenGAN, 0.6483 lower than Ying, and 1.5721 lower than Night.
In summary, the experimental results indicate that our method surpassed other state-of-the-art enhancers across three representative datasets. Notably, our approach achieves a significantly superior enhancing effect compared to any other stand-alone deep learning-based LLIE method, as it leverages both self-supervised and unsupervised approaches to their fullest extent. Functioning as a fusion-based enhancer, it integrates two complementary enhancers to amplify the enhancing effect, surpassing many representative state-of-the-art methods across various datasets.

4.4. Qualitative Comparison

In the task of image enhancement, visual quality serves as a crucial metric for evaluating the performance of LLIE methods. To visually compare the enhancing effect of our proposed method, we employed various competing methods to enhance images in the LOL-test dataset. Subsequently, we calculated the PSNR values of the enhanced images. For better visualization, we focused specifically on enlarging the area enclosed by the blue box to enhance clarity. As depicted in Figure 6, in terms of contrast, EnlightenGAN, LIME, and Night face challenges in achieving balanced contrast. Specifically, EnlightenGAN and Night exhibit low contrast, while LIME shows excessive contrast. Regarding overall brightness, PairLIE and Ying tend to produce darker images overall. Notably, Zero-DCE and R2R induce color distortions. In terms of details and noise, traditional methods typically introduce varying levels of noise. Conversely, deep learning-based LLIEs often overly smooth the images to eliminate noise, as observed in Night, or retain details but struggle to effectively reduce noise, as exemplified by PairLIE. Regarding the pre-enhanced images used as target images during the image fusion stage, Night excels in maintaining overall brightness and color accuracy. However, it tends to fall short in preserving fine details. In contrast, PairLIE excels in detail preservation but is more conservative in enhancing brightness. Our proposed method uniquely strikes a balance among these aspects, successfully preserving details, minimizing noise, and maintaining color fidelity. It ultimately achieves a brightness level and natural color contrast that closely resemble those of the original images.
Furthermore, Figure 7 presents the results of all methods on the LOL-Real dataset. In the magnification box, it can be observed that while traditional methods perform well in preserving details, they often introduce some level of noise. On the other hand, deep learning-based LLIE methods excel at preserving details but face challenges of excessive smoothing. Additionally, in terms of overall color fidelity and natural color contrast, Zero-DCE and LIME exhibit excessive contrast, Night demonstrates low contrast, and R2R suffers from color distortion. In contrast, our proposed method achieves a balance across various aspects and comes closest to the ground truth. This highlights its capability to effectively reduce noise and preserve detail. Moreover, the color fidelity and contrast achieved by our approach are more closely aligned with the ground truth compared to competing methods.

5. Conclusions

In this work, we introduce a two-stage strategy to enhance the performance of LLIE methods, which leverages a novel unsupervised UNet network to generate pixel-wise weight maps, effectively combining the benefits of different complementary LLIEs. This approach significantly improves the overall image quality, marking a notable advancement in the field. In the pre-enhancement stage, our method initiates with producing two distinct pre-enhanced images through two LLIEs selected for their complementary strengths. During fusion, guided by a well-designed loss function and a termination criterion based on a no-reference metric, our proposed mixed fusion mechanism enables unsupervised modulation of image quality, leading to versatile and robust enhancement results suitable for various scenarios. Empirical evidence demonstrates that our technique surpasses traditional and advanced learning-based LLIE techniques, particularly in preserving detail and naturalness. Moreover, our approach distinguishes itself with its scalability and adaptability, developed to seamlessly incorporate future technological advancements within the LLIE domain. This ensures its long-term applicability and capacity to consistently deliver superior results. In conclusion, our method represents a significant leap forward. It provides a highly effective solution that not only leverages the complementary advantages of different LLIEs but also exhibits remarkable adaptability across various situations, positioning it as a novel contribution in the realm of LLIE.

Author Contributions

S.X. and S.J. contributed to the conception of the study; S.J. wrote the main manuscript text; N.X. and X.C. contributed significantly to the analysis and manuscript preparation; and Q.C. and X.J. conducted the experiments. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China, grant number 62162043.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to acknowledge the reviewers for their constructive comments and suggestions that helped to improve the paper’s quality.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lim, C.C.; Loh, Y.P.; Wong, L.K. LAU-Net: A low light image enhancer with attention and resizing mechanisms. Signal Process. Image Commun. 2023, 115, 116971. [Google Scholar] [CrossRef]
  2. Zheng, N.; Huang, J.; Zhou, M.; Yang, Z.; Zhu, Q.; Zhao, F. Learning semantic degradation-aware guidance for recognition-driven unsupervised low-light image enhancement. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 3678–3686. [Google Scholar]
  3. Chen, W.; Shah, T. Exploring Low-light Object Detection Techniques. arXiv 2021, arXiv:abs/2107.14382. [Google Scholar]
  4. Guo, H.; Lu, T.; Wu, Y. Dynamic Low-Light Image Enhancement for Object Detection via End-to-End Training. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5611–5618. [Google Scholar]
  5. He, Z.; Ran, W.; Liu, S.; Li, K.; Lu, J.; Xie, C.; Liu, Y.; Lu, H. Low-Light Image Enhancement with Multi-Scale Attention and Frequency-Domain Optimization. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2861–2875. [Google Scholar] [CrossRef]
  6. Hashmi, K.A.; Kallempudi, G.; Stricker, D.; Afzal, M.Z. FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6725–6735. [Google Scholar]
  7. Hai, J.; Hao, Y.; Zou, F.; Lin, F.; Han, S. Advanced RetinexNet: A fully convolutional network for low-light image enhancement. Signal Process. Image Commun. 2023, 112, 116916. [Google Scholar] [CrossRef]
  8. Jiang, H.; Luo, A.; Fan, H.; Han, S.; Liu, S. Low-Light Image Enhancement with Wavelet-Based Diffusion Models. ACM Trans. Graph. 2023, 42, 1–14. [Google Scholar] [CrossRef]
  9. Rashed, H.; Ramzy, M.; Vaquero, V.; El Sallab, A.; Sistu, G.; Yogamani, S. FuseMODNet: Real-Time Camera and LiDAR Based Moving Object Detection for Robust Low-Light Autonomous Driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 2393–2402. [Google Scholar]
  10. Pham, L.H.; Tran, D.N.N.; Jeon, J.W. Low-Light Image Enhancement for Autonomous Driving Systems using DriveRetinex-Net. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Republic of Korea, 1–3 November 2020; pp. 1–5. [Google Scholar]
  11. Liu, Y.; Yi, F.; Ma, Y.; Wang, Y. ASA-BiSeNet: Improved real-time approach for road lane semantic segmentation of low-light autonomous driving road scenes. Appl. Opt. 2023, 62, 5224–5235. [Google Scholar] [CrossRef] [PubMed]
  12. Pizer, S.; Amburn, E.; Austin, J.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368. [Google Scholar] [CrossRef]
  13. Land, E.H. The retinex theory of color vision. Sci. Am. 1977, 237, 108–129. [Google Scholar] [CrossRef] [PubMed]
  14. Jobson, D.J.; ur Rahman, Z.; Woodell, G.A. Properties and performance of a center/surround retinex. IEEE Trans. Image Process. 1997, 6, 451–462. [Google Scholar] [CrossRef]
  15. Jobson, D.J.; ur Rahman, Z.; Woodell, G.A. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 1997, 6, 965–976. [Google Scholar] [CrossRef]
  16. Guo, X.; Li, Y.; Ling, H. LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 2016, 26, 982–993. [Google Scholar] [CrossRef] [PubMed]
  17. Lore, K.G.; Akintayo, A.; Sarkar, S. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognit. 2017, 61, 650–662. [Google Scholar] [CrossRef]
  18. Tao, L.; Zhu, C.; Xiang, G.; Li, Y.; Jia, H.; Xie, X. LLCNN: A convolutional neural network for low-light image enhancement. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar]
  19. Chen, W.; Wang, W.; Yang, W.; Liu, J. Deep retinex decomposition for low-light enhancement. arXiv 2018, arXiv:1808.04560. [Google Scholar]
  20. Zhang, Y.; Zhang, J.; Guo, X. Kindling the Darkness: A Practical Low-light Image Enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, New York, NY, USA, 21–25 October 2019; pp. 1632–1640. [Google Scholar]
  21. Zhang, Y.; Guo, X.; Ma, J.; Liu, W.; Zhang, J. Beyond brightening low-light images. Int. J. Comput. Vis. 2021, 129, 1013–1037. [Google Scholar] [CrossRef]
  22. Wang, R.; Zhang, Q.; Fu, C.W.; Shen, X.; Zheng, W.S.; Jia, J. Underexposed Photo Enhancement Using Deep Illumination Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 6849–6857. [Google Scholar]
  23. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef] [PubMed]
  24. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S.; Cong, R. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789. [Google Scholar]
  25. Li, C.; Guo, C.; Loy, C.C. Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 4225–4238. [Google Scholar] [CrossRef] [PubMed]
  26. Lempitsky, V.; Vedaldi, A.; Ulyanov, D. Deep Image Prior. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9446–9454. [Google Scholar]
  27. Zhao, Z.; Xiong, B.; Wang, L.; Ou, Q.; Yu, L.; Kuang, F. RetinexDIP: A Unified Deep Framework for Low-Light Image Enhancement. IEEE Trans. Circuits Syst. Video Technol. (TCSVT) 2021, 32, 1076–1088. [Google Scholar] [CrossRef]
  28. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a ’completely blind’ image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  29. Fu, Z.; Yang, Y.; Tu, X.; Huang, Y.; Ding, X.; Ma, K.K. Learning a Simple Low-Light Image Enhancer from Paired Low-Light Instances. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 22252–22261. [Google Scholar]
  30. Jin, Y.; Yang, W.; Tan, R.T. Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression. In Proceedings of the Computer Vision–ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; pp. 404–421. [Google Scholar]
  31. Yi, X.; Xu, H.; Zhang, H.; Tang, L.; Ma, J. Diff-retinex: Rethinking Low-Light Image Enhancement with a Generative Diffusion Model. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 12302–12311. [Google Scholar]
  32. Wang, T.; Zhang, K.; Shen, T.; Luo, W.; Stenger, B.; Lu, T. Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2654–2662. [Google Scholar]
  33. Xu, X.; Wang, R.; Fu, C.W.; Jia, J. SNR-aware Low-Light Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17714–17724. [Google Scholar]
  34. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H.; Shao, L. Learning Enriched Features for Real Image Restoration and Enhancement. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XXV 16. pp. 492–511. [Google Scholar]
  35. Zhang, Z.; Zheng, H.; Hong, R.; Xu, M.; Yan, S.; Wang, M. Deep Color Consistent Network for Low-Light Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1899–1908. [Google Scholar]
  36. Ying, Z.; Li, G.; Ren, Y.; Wang, R.; Wang, W. A New Image Contrast Enhancement Algorithm Using Exposure Fusion Framework. In Proceedings of the Computer Analysis of Images and Patterns, Ystad, Sweden, 22–24 August 2017; pp. 36–46. [Google Scholar]
  37. Wang, Y.; Wan, R.; Yang, W.; Li, H.; Chau, L.P.; Kot, A. Low-Light Image Enhancement with Normalizing Flow. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 22 February–1 March 2022; Volume 36, pp. 2604–2612. [Google Scholar]
  38. Ma, L.; Ma, T.; Liu, R.; Fan, X.; Luo, Z. Toward Fast, Flexible, and Robust Low-Light Image Enhancement. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5637–5646. [Google Scholar]
  39. Guo, X.; Hu, Q. Low-light Image Enhancement via Breaking Down the Darkness. Int. J. Comput. Vis. 2023, 131, 48–66. [Google Scholar] [CrossRef]
  40. Hai, J.; Xuan, Z.; Ren, Y.; Hao, Y.; Zou, F.; Lin, F.; Han, S. R2RNet: Low-light image enhancement via real-low to real-normal network. J. Vis. Commun. Image Represent. 2023, 90, 103712. [Google Scholar] [CrossRef]
  41. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  42. Yang, W.; Wang, W.; Huang, H.; Wang, S.; Liu, J. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Trans. Image Process. 2021, 30, 2072–2086. [Google Scholar] [CrossRef] [PubMed]
  43. Lee, C.; Lee, C.; Kim, C.S. Contrast enhancement based on layered difference representation of 2D histograms. IEEE Trans. Image Process. 2013, 22, 5372–5384. [Google Scholar] [CrossRef] [PubMed]
  44. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  45. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  46. Bengio, Y.; LeCun, Y. Scaling learning algorithms towards AI. Large-Scale Kernel Mach. 2007, 34, 1–41. [Google Scholar]
  47. Montufar, G.F.; Pascanu, R.; Cho, K.; Bengio, Y. On the number of linear regions of deep neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2, Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 2924–2932. [Google Scholar]
  48. Pascanu, R.; Montufar, G.; Bengio, Y. On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv 2013, arXiv:1312.6098. [Google Scholar]
  49. Bianchini, M.; Scarselli, F. On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 1553–1565. [Google Scholar] [CrossRef]
  50. Raghu, M.; Poole, B.; Kleinberg, J.; Ganguli, S.; Sohl-Dickstein, J. On the expressive power of deep neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, Sydney, Australia, 6–11 August 2017; pp. 2847–2854. [Google Scholar]
Figure 1. The architecture of PairLIE. The P-Net module is employed to eliminate unsuitable features present in the original image, while L-Net and R-Net are utilized to estimate the illumination and reflectance components of the image, respectively. The optimization process of the network is guided by three loss functions: a self-supervised projection loss L_P, a reflectance consistency loss L_C, and a Retinex loss L_R. In the testing stage, PairLIE enhances the image by adjusting the illumination component and recombining it with the reflectance component, resulting in an enhanced output image.
Figure 2. The architecture of Night. Upon receiving an input night image, Night employs a layer decomposition network for the suppression of undesired light effects. This intricate network is adept at the extraction of light effects, shading, and reflectance layers, thereby facilitating their individual analysis and manipulation. Central to the architecture is an unsupervised light-effects suppression network, guided by the decomposed light-effects layer G. This mechanism capitalizes on unpaired learning techniques to diminish the prominence of light effects, while concurrently enhancing the visibility within the darker regions of the image. A distinctive feature of this unsupervised approach is its generation of an attention map. This map vividly delineates the network's targeted focus on areas predominantly influenced by light effects, ensuring a precise and effective suppression strategy. Consequently, this approach leads to a notable reduction of light effects in the refined output J_refine.
Figure 3. The enhanced results of different complementary methods on the LOL-test dataset. (a) label; (b) Bread [39]; (c) R2R [40]; (d) SCI [38]; (e) Night [30]; (f) PairLIE [29]; (g) LLFlow [37]; (h) Retinex-Net [19]; (i) EnlightenGAN [23]; (j) LIME [16]; (k) Ying [36]; (l) Zero-DCE [24].
Figure 4. The unsupervised weight map generation network used to fuse the pre-enhanced images. Initially, a low-light image is subjected to N distinct LLIE processes, producing pre-enhanced images {x_n}_{n=1}^N. Thereafter, a specialized UNet architecture is utilized to generate local pixel-wise weight maps {s_n}_{n=1}^N. These maps, alongside global weight coefficients {W_n}_{n=1}^N, are instrumental in the iterative refinement of network parameters, governed by the multi-objective mixed loss function L_mul. This process ultimately yields the final enhanced image, represented as x̂_i, encapsulating the integration of local and global mixed weighting methodologies.
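For reference, one plausible way to write this mixed weighting in the caption's notation (an assumed instantiation consistent with the description above, not necessarily our exact released formulation) is

$$
\hat{x}_i(p) \;=\; \frac{\sum_{n=1}^{N} W_n\, s_n(p)\, x_n(p)}{\sum_{n=1}^{N} W_n\, s_n(p)},
$$

where $s_n(p)$ denotes the pixel-level weight generated by the UNet at location $p$ and $W_n$ the image-level global weight assigned to the $n$-th pre-enhanced image $x_n$.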
Figure 5. The architecture of the proposed UNet. The network receives an input of random noise, symbolized as Ƶ, and produces the output weight maps {s_n}_{n=1}^N, with the UNet configured to have N output channels.
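The sketch below illustrates only the input/output convention described in this caption: an N-channel projection applied to decoder features computed from the random-noise input, normalized per pixel. The full encoder-decoder is omitted, the per-pixel softmax is an assumption, and all names (WeightMapHead, feat_channels) are hypothetical.

```python
# Minimal sketch of the N-channel weight-map output head; not the full UNet.
import torch
import torch.nn as nn

class WeightMapHead(nn.Module):
    def __init__(self, feat_channels: int = 64, n_methods: int = 2):
        super().__init__()
        self.proj = nn.Conv2d(feat_channels, n_methods, kernel_size=1)  # N output channels

    def forward(self, decoder_features: torch.Tensor) -> torch.Tensor:
        logits = self.proj(decoder_features)      # (B, N, H, W)
        return torch.softmax(logits, dim=1)       # per-pixel weights across the N maps

# Usage: decoder features obtained from the random-noise input Ƶ are projected to N maps.
head = WeightMapHead(feat_channels=64, n_methods=2)
z_features = torch.rand(1, 64, 128, 128)          # stands in for decoder output on noise Ƶ
s = head(z_features)                              # (1, 2, 128, 128) weight maps
```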
Figure 6. Visual enhancement results of various methods on the LOL-test dataset. (a) label; (b) PSNR = 13.88 dB, Ying [36]; (c) PSNR = 15.34 dB, EnlightenGAN [23]; (d) PSNR = 15.78 dB, Zero-DCE [24]; (e) PSNR = 16.38 dB, LIME [16]; (f) PSNR = 18.33 dB, PairLIE [29]; (g) PSNR = 20.48 dB, R2R [40]; (h) PSNR = 22.09 dB, Night [30]; (i) PSNR = 26.54 dB, ours.
Figure 7. Visual enhancement results of various methods on the LOL-Real test split. (a) label; (b) PSNR = 12.53 dB, Ying [36]; (c) PSNR = 15.32 dB, EnlightenGAN [23]; (d) PSNR = 17.42 dB, Zero-DCE [24]; (e) PSNR = 17.83 dB, LIME [16]; (f) PSNR = 19.76 dB, PairLIE [29]; (g) PSNR = 20.69 dB, R2R [40]; (h) PSNR = 22.60 dB, Night [30]; (i) PSNR = 23.24 dB, ours.
Table 1. The average values of metrics obtained by different combination patterns on the LOL-test dataset.
Methods considered in the combinations: PairLIE, Night, Bread, R2R, EnlightenGAN, Ying, Zero-DCE, LIME, and Fu.

Case Number   PSNR    SSIM     NIQE     LPIPS    Score
1             21.60   0.7893   3.9079   0.1310   0.9653
2             19.95   0.8012   3.9954   0.1109   0.9306
3             22.11   0.7835   4.1620   0.1426   0.8958
4             19.47   0.7772   4.0523   0.1409   0.8472
5             18.63   0.7560   3.6686   0.1626   0.8056
6             20.05   0.6912   3.9321   0.1464   0.7917
7             19.95   0.7564   4.8581   0.1527   0.7847
8             18.87   0.6915   3.7920   0.1453   0.7778
9             19.11   0.7012   4.2276   0.1344   0.7708
10            18.33   0.6961   3.9256   0.1324   0.7639
11            20.26   0.7403   4.1770   0.1808   0.7361
12            18.27   0.7638   5.2479   0.1315   0.7222
13            18.42   0.7424   4.4336   0.1635   0.7014
14            20.28   0.6910   6.1294   0.1767   0.6042
15            17.71   0.7184   4.9528   0.1753   0.5972
16            16.94   0.7069   5.1269   0.1719   0.5417
17            18.29   0.6880   5.7259   0.1802   0.5278
18            17.02   0.6283   4.9176   0.1672   0.4931
19            18.15   0.6872   5.5847   0.1976   0.4792
20            18.45   0.6725   6.1525   0.1832   0.4583
21            17.55   0.5769   5.8861   0.1672   0.4167
22            19.55   0.5991   6.6310   0.2331   0.4028
23            18.96   0.6106   6.8089   0.2166   0.3958
24            16.81   0.6754   5.7770   0.1932   0.3819
25            17.47   0.6469   6.0098   0.2165   0.3681
26            18.49   0.5947   6.1219   0.2484   0.3681
27            16.56   0.6622   5.7429   0.2008   0.3542
28            16.22   0.5596   5.6143   0.1807   0.3264
29            18.28   0.5942   6.6168   0.2419   0.3125
30            17.54   0.5017   6.4882   0.2315   0.2500
31            16.82   0.5859   7.1171   0.2605   0.1736
32            15.29   0.5934   6.9424   0.2422   0.1597
33            17.15   0.5280   7.2654   0.2994   0.1528
34            16.98   0.4759   7.5547   0.3202   0.0972
35            15.56   0.5332   7.5569   0.2833   0.0903
36            15.74   0.4872   7.6701   0.3077   0.0556
37            21.73   0.8010   4.0356   0.1173   -
38            20.30   0.7204   5.1908   0.1457   -
Table 2. Average PSNR on the LOL-test dataset under different global weight coefficient combinations.

Coefficient ratio   0.5:1   1:1     2:1     3:1     4:1     5:1
PSNR (dB)           21.32   21.63   21.70   21.48   21.55   21.53
Table 3. The impact of different numbers of UNet layers on the enhancement effect on the LOL-test dataset.

Layers   PSNR    SSIM     NIQE     LPIPS    Score
5        21.73   0.7840   3.8784   0.1323   0.60
4        21.96   0.7869   3.8285   0.1341   0.70
3        21.61   0.7615   3.7415   0.1541   0.40
2        21.69   0.7936   3.9558   0.1272   0.65
1        21.66   0.7940   3.9919   0.1268   0.65
Table 4. Quantitative evaluation of different LLIE methods on the LOL-test dataset. The best results are highlighted in bold.

Metric   PairLIE   Zero-DCE   LIME     R2R      Night    EnlightenGAN   Ying     Ours
PSNR     18.46     16.80      17.19    18.12    21.50    16.12          14.46    21.96
SSIM     0.7431    0.5644     0.5628   0.7381   0.7625   0.6623         0.5277   0.7869
LPIPS    0.1621    0.2486     0.1443   0.1864   0.1757   0.2058         0.2470   0.1341
NIQE     4.1014    7.9335     5.5859   4.0860   4.4938   6.2705         7.7122   3.8285
Table 5. Quantitative evaluation of different LLIE methods on the LOL-Real dataset. The best results are highlighted in bold.

Metric   PairLIE   Zero-DCE   LIME     R2R      Night    EnlightenGAN   Ying     Ours
PSNR     19.93     18.19      18.10    17.95    25.53    19.39          17.32    23.22
SSIM     0.7731    0.5542     0.5493   0.7733   0.7822   0.7035         0.5955   0.8203
LPIPS    0.1582    0.2540     0.1649   0.1760   0.1883   0.1781         0.2026   0.1340
NIQE     4.3791    8.2194     5.2983   4.9589   4.3157   6.3506         7.9347   3.8818
Table 6. Quantitative evaluation of different LLIE methods on the DICM dataset (10 selected images). The best results are highlighted in bold.

Metric   PairLIE   Zero-DCE   LIME     R2R      EnlightenGAN   Ying     Night    Ours
NIQE     2.9694    2.6509     3.1396   4.4790   2.8524         3.2045   4.1283   2.5562
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
