Article

Low-Light Image Enhancement Using CycleGAN-Based Near-Infrared Image Generation and Fusion

School of Electronic and Electrical Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(24), 4028; https://doi.org/10.3390/math12244028
Submission received: 6 December 2024 / Revised: 18 December 2024 / Accepted: 19 December 2024 / Published: 22 December 2024
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)

Abstract
Image visibility is often degraded under challenging conditions such as low light, backlighting, and inadequate contrast. To mitigate these issues, techniques like histogram equalization, high dynamic range (HDR) tone mapping, and near-infrared (NIR)–visible image fusion are widely employed. However, these methods have inherent drawbacks: histogram equalization frequently causes oversaturation and detail loss, while visible–NIR fusion requires complex and error-prone image acquisition and registration. The proposed algorithm, based on complementary cycle-consistent generative adversarial network (CycleGAN) training with visible and NIR images, leverages CycleGAN to generate fake NIR images that blend the characteristics of visible and NIR images. This approach provides tone compression and preserves fine details, effectively addressing the limitations of traditional methods. Experimental results demonstrate that the proposed method outperforms conventional algorithms, delivering superior quality and detail retention. This advancement holds substantial promise for applications where dependable image visibility is critical, such as autonomous driving and CCTV (Closed-Circuit Television) surveillance systems.

1. Introduction

The development of effective surveillance systems and driver assistance technologies has become increasingly critical in recent years, particularly under challenging visual conditions. Ensuring clear visibility and accurate object detection in low-light and otherwise adverse environments remains a significant challenge. To enhance image quality, techniques such as multi-exposure imaging or multi-band imaging (e.g., switching an IR-cut filter during the day and using auxiliary IR LEDs for night capture) are often employed [1,2]. However, these approaches face several limitations, including high manufacturing costs, lengthy image acquisition times, and technical challenges such as multi-image alignment [3].
Numerous algorithms have been proposed to address these challenges. The iCAM06 algorithm, for instance, enhances HDR images through a systematic workflow involving XYZ color space conversion, chromatic adaptation, IPT color space transformation, and tone mapping with color correction [4]. Despite its effectiveness, iCAM06 often struggles with performance consistency under varying lighting conditions and may introduce artifacts in extreme scenarios.
Similarly, the L1L0 algorithm employs a frequency-domain approach by decomposing images into multiple frequency bands and applying adaptive linear operators [5]. While computationally efficient, this method frequently fails to preserve fine details in complex scenes and struggles in regions with dramatic illumination changes.
Advanced methods such as Kwon et al.'s contrast sensitivity-based HDR image decomposition address these limitations by employing CSF-based multiscale decomposition into multiple base and detail layers [6]. This sophisticated processing pipeline, which includes Gaussian pyramid down-sampling and the application of CSF-based weights, excels at preserving fine details and minimizing halo artifacts. However, its performance can degrade in high-contrast regions.
The Reinhard algorithm, widely recognized for HDR tone mapping, supports both global and local approaches [7,8]. The global method relies on log-average luminance calculations and scale factors for luminance compression, while the local version uses Gaussian-weighted averages for adaptive tone mapping. This versatility makes it particularly effective for general-purpose HDR processing. However, it can struggle to maintain local contrast in complex scenes.
Modern deep learning-based approaches have also emerged as promising solutions. RetinexNet, which leverages Retinex theory within a neural network framework [9], enhances low-light images by decomposing them into illumination and reflectance components for targeted enhancements. Its unsupervised learning approach allows for natural low-light enhancement without the need for paired training data.
Kim et al.’s method combines Retinex-based image enhancement techniques with contrast-limited adaptive histogram equalization (CLAHE) to improve both global and local contrast while preserving natural colors [10]. While effective in enhancing low-light or unevenly illuminated images, it causes slight color desaturation in dark regions.
Despite these advancements, challenges remain in overcoming the limitations of traditional image enhancement techniques in low-light conditions. While previous methods focused on converting saturated areas into unsaturated areas using images taken at different exposures, they often degrade certain image areas while improving others. This study proposes a novel approach that leverages near-infrared images for training the model. The NIR wavelength band, which ranges from 800 to 1500 nm, offers better light penetration, diffraction capabilities, and reduced light saturation. In NIR images, object boundaries and textures are defined with strong contrast and sharpness, making them particularly useful for edge detection. In contrast, visible images are advantageous for expressing color components, and their fusion with NIR images allows for more accurate object identification, as NIR images emphasize the boundaries and textures of the entire image [11].
This work proposes a complementary cycle-consistent generative adversarial network (CycleGAN)-based training method, utilizing both visible and NIR images to create a sophisticated model. Given the difficulty of obtaining paired visible–NIR images for training, a sequential training strategy combining unpaired and paired image data within the CycleGAN framework is implemented. This approach enables the generation of effective NIR training datasets without the need for extensive paired data collection. The framework then performs a two-stage mixing process between low-light visible images and virtually transformed NIR images. In the first stage, the difference image between the visible and NIR images is calculated and used to blend them, combining the NIR image weighted by the inverse of the difference and the visible image weighted by the difference itself. In the second stage, a bilateral filter is applied to the difference image obtained from the first stage [12], and the filtered map weights the visible image by its inverse and the synthesized image from the first stage by the map itself [13]. This blending is followed by additional contrast-limited adaptive histogram equalization for final image enhancement. During the fusion process between NIR and visible images, imbalances can arise between the modified luminance channel and the color channels (a and b), potentially resulting in unnatural color representation. To address this, a color adjustment method is proposed, ensuring the restoration of natural color fidelity while preserving the enhanced luminance information.
By fusing NIR and visible images, this method can significantly improve image clarity in low-light conditions [14], reducing oversaturation in low-illumination areas. Experimental results show substantial improvements in image quality under challenging lighting conditions, making this approach particularly valuable for surveillance systems and driver-assistance technologies, where reliable visibility is crucial.
The main contributions of this study are as follows:
To address the lack of paired visible–NIR image datasets and improve training quality, two-stage training is adopted. The first stage uses CycleGAN to generate fake visible images from unpaired data, while the second stage fine-tunes the network with paired fake visible and real NIR images, enhancing domain translation and structural consistency;
The purpose of the two-stage image fusion is to blend luminance information from visible and NIR images effectively. In the first stage, luminance differences are calculated and combined to emphasize details, while the second stage applies a bilateral filter to suppress noise and refine blending, followed by gamma correction to enhance global tone;
Additional local tone processing is achieved using CLAHE to improve local contrast, while color adjustment restores natural chromatic balance by compensating for distortions caused during luminance enhancement.
The remainder of this paper is organized as follows: Section 2 reviews related work in deep learning-based approaches and image fusion. Section 3 details the proposed methodology, including the CycleGAN training strategy and visible–NIR image blending. Section 4 presents experimental results and comparative analysis. Finally, Section 5 concludes the paper and outlines future directions for research.

2. Related Works

2.1. Cycle-Consistent Generative Adversarial Networks

CycleGAN is a deep learning framework designed for unpaired image-to-image translation, addressing tasks where paired datasets are unavailable. Unlike traditional GANs, which rely on paired datasets for training, CycleGAN employs two generators and two discriminators to learn mappings between two domains [15,16], as shown in Figure 1.
The model includes two generators ($G$ and $F$) and two discriminators ($D_X$ and $D_Y$). The generator $G$ maps data from domain $X$ (visible images) to domain $Y$ (near-infrared images), while the generator $F$ maps data in the reverse direction, from domain $Y$ to domain $X$.
The adversarial loss helps the generator learn to deceive the discriminator into recognizing the generated images as authentic, whereas the cycle-consistency loss ($L_{cycle}$) ensures that the translations between domains $X$ and $Y$ remain consistent in both directions. The framework enforces a cycle consistency constraint, ensuring that an image translated from one domain to another can be reverted to its original form. This feature makes CycleGAN particularly valuable for tasks where paired data are unavailable, such as style transfer, domain adaptation, and cross-modal image generation.
$$ L_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))], $$
where $G$ and $D_Y$ represent the generator and discriminator for domain $Y$, respectively. $X$ and $Y$ denote the source and target domains, while $\mathbb{E}$ represents the expectation operator, which calculates the average value over the data distribution. The terms $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$ refer to samples drawn from the data distributions of $X$ and $Y$, respectively. The goal is for $G$ to minimize the difference between $G(x)$ and real images from $Y$, while $D_Y$ maximizes this difference. This creates a continuous adversarial process where $G$ and $D_Y$ iteratively improve.
The cycle consistency loss ensures that an image translated to the target domain and then back to the source domain remains consistent with the original input. Mathematically, this is expressed as follows:
$$ L_{cycle}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big], $$
where $G$ and $F$ are the generators for the transformations between domains, and $x$ and $y$ represent data points from the source and target domains, respectively. The notation $\mathbb{E}$ represents the expectation operator, which calculates the average value over the data distribution. Specifically, $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$ indicate that $x$ and $y$ are samples drawn from the data distributions of domains $X$ and $Y$, respectively. The term $\lVert \cdot \rVert_1$ denotes the L1 norm, which sums the absolute differences between elements. The first term, $\mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1]$, ensures that applying $G$ followed by $F$ to $x$ reconstructs $x$ itself. Similarly, the second term, $\mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]$, ensures that applying $F$ followed by $G$ to $y$ reconstructs $y$. This cycle-consistency loss $L_{cycle}(G, F)$ ensures that the transformations $G$ and $F$ are inverses of each other, promoting consistency between the source and target domains. This bidirectional cycle consistency preserves structural integrity and features during translation, ensuring minimal information loss.
$$ L_{total}(G, F, D_X, D_Y) = L_{GAN}(G, D_Y, X, Y) + L_{GAN}(F, D_X, Y, X) + \lambda L_{cycle}(G, F), $$
where $L_{total}(G, F, D_X, D_Y)$ is a combination of the adversarial losses from both domains and the cycle-consistency loss, with a weighting factor $\lambda$ applied to the cycle-consistency term. The weight $\lambda$ controls the importance of the cycle-consistency loss relative to the adversarial losses. This total loss is minimized during training so that both generators $G$ and $F$ perform effective domain transformations while maintaining cycle consistency.
Additionally, an adversarial loss is introduced for the mapping function $F: Y \to X$ and its discriminator $D_X$. The adversarial objective is defined as $\min_F \max_{D_X} L_{GAN}(F, D_X, Y, X)$, where $F$ represents the generator that transforms data from domain $Y$ to domain $X$, and $D_X$ is the discriminator for domain $X$. The notation $\min_F \max_{D_X}$ indicates a minimax optimization process, in which $F$ aims to minimize the adversarial loss $L_{GAN}(F, D_X, Y, X)$ while $D_X$ simultaneously tries to maximize it. The adversarial loss $L_{GAN}$ measures the difference between the generated distribution and the real data distribution in domain $X$. This creates a symmetric adversarial loss for the reverse translation from domain $Y$ to domain $X$, ensuring that both mapping functions are optimized through their respective discriminators.
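For reference, the following minimal PyTorch sketch evaluates the three terms of the objective above. The identity generators and pooling discriminators are placeholders so the snippet runs, not the networks used in this paper, and $\lambda = 10$ is a commonly used CycleGAN default assumed here for illustration; during training, the generators and discriminators are updated in alternation as described.

```python
# Sketch of the CycleGAN objective (adversarial + cycle-consistency + total loss).
import torch
import torch.nn.functional as Fnn

def adversarial_loss(D_Y, G, x, y):
    # L_GAN(G, D_Y, X, Y) = E[log D_Y(y)] + E[log(1 - D_Y(G(x)))]
    eps = 1e-8
    return torch.log(D_Y(y) + eps).mean() + torch.log(1.0 - D_Y(G(x)) + eps).mean()

def cycle_loss(G, F, x, y):
    # L_cycle(G, F) = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1]
    return Fnn.l1_loss(F(G(x)), x) + Fnn.l1_loss(G(F(y)), y)

def total_loss(G, F, D_X, D_Y, x, y, lam=10.0):
    # L_total = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + lambda * L_cycle(G, F)
    return (adversarial_loss(D_Y, G, x, y)
            + adversarial_loss(D_X, F, y, x)
            + lam * cycle_loss(G, F, x, y))

# Toy usage: identity "generators" and pooling "discriminators" stand in for the real networks.
G = F = (lambda t: t)
D_X = D_Y = (lambda t: torch.sigmoid(t.mean(dim=(1, 2, 3))))
x, y = torch.rand(2, 1, 8, 8), torch.rand(2, 1, 8, 8)
print(total_loss(G, F, D_X, D_Y, x, y))
```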
The network architecture comprises a sophisticated combination of generators and discriminators that work together to achieve bi-directional domain translation. The generator architecture includes three critical components: an encoder, a transformer, and a decoder. The encoder compresses input images into a low-dimensional feature space through convolution layers, instance normalization, and ReLU activation functions.
The transformer, the core component of the generator, incorporates nine residual blocks inspired by the ResNet architecture [17]. These blocks preserve input image features during transformation by leveraging residual connections, which help address optimization challenges. The decoder then generates the transformed image using a combination of transposed convolutions, instance normalization, and ReLU activation functions.
For the discriminator, we employ a PatchGAN architecture that operates on 70 × 70 patches. This design enables efficient parameter usage and localized feature discrimination, allowing the network to process various image sizes with fewer parameters while maintaining high accuracy. The discriminator architecture includes convolution layers, instance normalization, and leaky ReLU components, which work together to effectively distinguish between real and generated images.
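A sketch of a 70 × 70 PatchGAN discriminator in PyTorch is shown below; the specific channel widths (64–128–256–512) follow common CycleGAN implementations and are an assumption, since the exact layer configuration is not listed here.

```python
# Minimal 70x70 PatchGAN discriminator sketch (convolutions + instance norm + leaky ReLU).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride, norm=True):
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.InstanceNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.model = nn.Sequential(
            *conv_block(in_ch, 64, stride=2, norm=False),   # no normalization on the first layer
            *conv_block(64, 128, stride=2),
            *conv_block(128, 256, stride=2),
            *conv_block(256, 512, stride=1),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # per-patch real/fake score map
        )

    def forward(self, x):
        return self.model(x)   # each output element covers a 70x70 receptive field

D = PatchDiscriminator()
print(D(torch.rand(1, 3, 256, 256)).shape)   # -> torch.Size([1, 1, 30, 30])
```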
The loss calculation methodology utilizes the least squares generative adversarial networks (LSGAN) loss, which offers several advantages over conventional GAN loss functions. LSGAN improves stability during training, produces higher-quality images, and minimizes the risk of mode collapse. This loss function reduces the discrepancy between real and generated images, resulting in realistic and accurate transformations.
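In the LSGAN formulation, the logarithmic adversarial terms above are replaced by least-squares distances to the real/fake target labels, as in the following sketch:

```python
# Least-squares (LSGAN) adversarial terms: squared distances to targets 1 (real) and 0 (fake).
import torch

def lsgan_d_loss(D, real, fake):
    # Discriminator: push D(real) toward 1 and D(fake) toward 0
    return ((D(real) - 1.0) ** 2).mean() + (D(fake) ** 2).mean()

def lsgan_g_loss(D, fake):
    # Generator: push D(fake) toward the "real" label 1
    return ((D(fake) - 1.0) ** 2).mean()

# Stand-in discriminator so the snippet runs
D = lambda t: t.mean(dim=(1, 2, 3))
real, fake = torch.rand(2, 1, 8, 8), torch.rand(2, 1, 8, 8)
print(lsgan_d_loss(D, real, fake).item(), lsgan_g_loss(D, fake).item())
```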
This comprehensive framework effectively generates synthetic NIR images from visible inputs, overcoming traditional challenges posed by the scarcity of paired training data by handling unpaired datasets, thereby eliminating the need for time-consuming and costly data labeling.

2.2. Image Fusion

Multi-image fusion techniques combine information from multiple images of the same scene into a single composite image, preserving the most critical features from each input. This process is especially useful when images are captured under varying conditions, such as differences in lighting, focus, or wavelength. By merging these diverse inputs, the fused image enhances overall quality and retains essential details that might otherwise be lost when relying on a single exposure or setting. One method used to preserve critical features is a latent low-rank representation (LatLRR).
LatLRR was proposed to address the limitations of traditional low-rank representation (LRR), which struggles with insufficient or noisy observations [18]. To improve robustness, LatLRR incorporates both observed ($X_O$) and unobserved ($X_Z$) data into its dictionary formulation. This is achieved by minimizing the nuclear norm of the representation matrix ($Z$):
$$ X_O = [X_O,\, X_Z]\, Z, $$
where $X_O$ represents the observed data matrix, $X_Z$ denotes the unobserved (hidden) data, and $Z$ is the representation matrix that captures relationships among data points. By incorporating $X_Z$, LatLRR effectively accounts for hidden data effects, even when the hidden data are unavailable. To recover these hidden data effects, LatLRR formulates a convex optimization problem. For noiseless data, it minimizes the nuclear norms of both the representation matrix ($Z$) and a latent low-rank matrix ($L$):
$$ X = XZ + LX, $$
where $X$ is the input data matrix, $XZ$ represents the principal features that describe the core data structure, and $LX$ corresponds to the latent low-rank features that capture additional details. When data are corrupted, an additional sparse noise term ($E$) is introduced [19], leading to the extended problem:
$$ X = XZ + LX + E, $$
where E accounts for sparse noise or outliers in the data, ensuring the model remains robust under corrupted conditions. This formulation enables LatLRR to effectively reconstruct hidden effects and provide reliable representations, even in challenging environments.
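In compact form, the LatLRR problem combining the relations above is commonly written as the following convex program, with nuclear norms $\lVert \cdot \rVert_*$ on $Z$ and $L$, an $\ell_1$ penalty on the sparse noise $E$, and a balancing weight $\lambda$ [18,19]:

$$ \min_{Z,\, L,\, E} \; \lVert Z \rVert_{*} + \lVert L \rVert_{*} + \lambda \lVert E \rVert_{1} \quad \text{subject to} \quad X = XZ + LX + E, $$

where setting $E = 0$ recovers the noiseless case described above.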
The integration of NIR and visible imaging offers a promising solution to further enhance image quality. By combining spectral translation with advanced blending techniques, this fusion approach surpasses traditional multi-exposure fusion methods, addressing the limitations of individual modalities and advancing the field of image enhancement.

3. Proposed Method

As illustrated in Figure 2, the algorithm progresses through the following stages: CycleGAN training, image blending for detail enhancement, image blending with gamma correction for global toning, CLAHE, and color compensation. The primary objective is to enhance object details in dark regions by leveraging NIR data. Initially, the algorithm addresses the challenge of lacking paired NIR and visible datasets. While paired training was considered, the absence of matching datasets led to the adoption of unpaired learning (as described in Section 2.1) [20]. This unpaired approach facilitates the creation of a synthetic matching dataset, which is then used to train the model and generate synthetic NIR data [21,22]. Both the visible and NIR images are transformed from the RGB color space to the Lab color space, with blending performed exclusively on the L channel. The first step in image blending for detail enhancement involves incorporating details from the NIR image while preserving the natural features of the visible (VIS) image. To mitigate noise introduced by the NIR image, a bilateral filter is applied, reducing noise while retaining edge details.
Next, image blending with gamma correction for global toning enhances the image's overall brightness. This step helps reduce computational costs while producing a slightly brighter output. CLAHE is then applied to improve visibility and enhance finer details. The final step involves color compensation (Section 3.3): the synthesized luminance channel is combined with the color-restored a′ and b′ channels, and these components are then converted back into RGB channels to produce the final output image:
Extraction of hidden information: Sequential unpaired-to-paired transformations are conducted to generate a synthetic NIR image, revealing details hidden in dark environments;
Enhanced visibility: Dual blending is utilized to synthesize the NIR and VIS images, improving overall visibility;
Detail refinement: Gamma correction and CLAHE are applied to refine image details, enhance contrast, and improve brightness;
Color compensation: The final step involves applying color compensation to produce a visually balanced and color-accurate image.

3.1. Unpaired and Paired Dataset Training Using CycleGAN

This paper presents a novel two-stage training strategy designed to address the limited availability of paired visible–NIR training data, a common bottleneck in low-light enhancement systems. As illustrated in Figure 3, the first stage involves unpaired training to generate fake visible images from NIR inputs [23]. The real visible image passes through a ResNet generator (VIS→NIR) to create a fake NIR image, which then passes through another generator (NIR→VIS) to create a fake visible image. The loss is calculated after this process, ensuring that the generated images are as close to real images as possible. Real NIR images are then fed to the trained NIR→VIS generator to produce fake visible images. The necessity of a second stage is evident in Figure 4: the fake NIR image produced by the unpaired module in Figure 4b demonstrates significant limitations, failing to preserve essential information characteristic of the NIR domain. To address these limitations, we proceed with paired training to enhance the results.
The generated fake visible images and real NIR images are then used for paired learning. This process follows the same steps, leveraging the previously mentioned ResNet generators: the fake visible image passes through the VIS→NIR generator to create a fake NIR image, which is then processed through the NIR→VIS generator to create a fake visible image, and the loss is calculated so that the generated images closely resemble real ones. This refinement improves image quality, and the resulting fake NIR image, illustrated in Figure 4c, demonstrates significant improvements, successfully preserving essential information characteristic of the NIR domain. This improvement underscores the effectiveness of the proposed domain translation approach. By combining CycleGAN's ability to perform domain transformations without paired data with the accuracy of paired training, this method provides a robust and effective solution to this challenging task [24].
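To make the sequential strategy concrete, the following PyTorch sketch shows the unpaired pre-training step followed by paired fine-tuning on (fake visible, real NIR) pairs. The tiny generators, the omission of the adversarial/discriminator terms, and the explicit paired L1 term are simplifying assumptions for illustration only; the learning rate of $2 \times 10^{-4}$ follows the setup in Section 4.

```python
# Sketch of the two-stage (unpaired -> paired) training strategy of Section 3.1.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Placeholder for the 9-block ResNet generator (an assumption for illustration)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))
    def forward(self, x):
        return self.net(x)

G_vis2nir, G_nir2vis = TinyGenerator(), TinyGenerator()
opt = torch.optim.Adam(list(G_vis2nir.parameters()) + list(G_nir2vis.parameters()), lr=2e-4)
l1 = nn.L1Loss()

def cycle_step(vis, nir, paired):
    """One optimization step; adversarial and identity terms are omitted for brevity."""
    fake_nir = G_vis2nir(vis)
    fake_vis = G_nir2vis(nir)
    loss = l1(G_nir2vis(fake_nir), vis) + l1(G_vis2nir(fake_vis), nir)   # cycle consistency
    if paired:   # stage 2: vis and nir are corresponding images
        loss = loss + l1(fake_nir, nir) + l1(fake_vis, vis)              # assumed paired L1 term
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Stage 1: unpaired visible/NIR batches (random stand-ins here)
vis_batch, nir_batch = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
cycle_step(vis_batch, nir_batch, paired=False)

# Stage 2: synthesize a paired partner from a real NIR image, then fine-tune on the pair
fake_vis = G_nir2vis(nir_batch).detach()
cycle_step(fake_vis, nir_batch, paired=True)
```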

3.2. Visible–NIR Fusion

The luminance information is stored exclusively in the L channel, allowing it to be processed independently without directly affecting the original color information in the chromatic channels. As a result, the blending process is carried out solely on the L channel. Image blending for detail enhancement calculates the luminance difference between the NIR and visible images [25,26]. The initial blending operation is defined as follows:
$$ L_{diff} = \frac{(L_{NIR} - L_{VIS}) - \min(L_{NIR} - L_{VIS})}{\max(L_{NIR} - L_{VIS}) - \min(L_{NIR} - L_{VIS})}, $$
$$ L_{Blend1} = L_{NIR} \times (1 - L_{diff}) + L_{VIS} \times L_{diff}, $$
where $L_{VIS}$ and $L_{NIR}$ represent the luminance components of the visible and NIR images, respectively. The term $L_{diff}$ serves as a difference metric that measures the relative contribution of the two luminance components. It ranges from 0 to 1, where higher values emphasize $L_{VIS}$ and lower values prioritize $L_{NIR}$. The combined image retains the details of the visible image while simultaneously benefiting from the noise reduction and contrast enhancement effects of the NIR image. Figure 5 illustrates how $L_{Blend1}$ adapts the contributions of $L_{VIS}$ and $L_{NIR}$ based on the difference metric $L_{diff}$, ensuring a smooth and adaptive blending of the luminance components into a clearer intermediate image.
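A minimal NumPy sketch of this first blending stage, assuming the luminance channels are float arrays scaled to [0, 1] and adding a small epsilon to avoid division by zero:

```python
# First blending stage: normalized difference map L_diff and blended luminance L_Blend1.
import numpy as np

def blend_stage1(L_vis, L_nir, eps=1e-6):
    d = L_nir - L_vis
    L_diff = (d - d.min()) / (d.max() - d.min() + eps)     # normalized difference map in [0, 1]
    L_blend1 = L_nir * (1.0 - L_diff) + L_vis * L_diff     # high difference -> favor visible image
    return L_blend1, L_diff

# Example with random luminance maps:
L_vis = np.random.rand(64, 64).astype(np.float32)
L_nir = np.random.rand(64, 64).astype(np.float32)
L_blend1, L_diff = blend_stage1(L_vis, L_nir)
```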
However, this direct blending approach is susceptible to quality degradation due to inherent white noise in NIR images. Image blending with gamma correction for global toning applies a bilateral filter to the difference map to mitigate this limitation, producing $L_{switching}$, which effectively suppresses noise while preserving edge information [27]. The second blending operation is expressed as:
$$ L_{Blend2} = L_{VIS}^{\,\gamma} \times (1 - L_{switching}) + L_{Blend1}^{\,\gamma} \times L_{switching}, $$
where $L_{VIS}$ represents the luminance component of the visible image, $L_{Blend1}$ is the blended luminance obtained from the prior step, and $L_{switching}$ is a weighting factor derived from the bilateral filter. $L_{switching}$ ranges from 0 to 1 and adjusts the relative contributions of $L_{VIS}$ and $L_{Blend1}$. The exponent $\gamma$ (=0.4) is a gamma-correction parameter, a non-linear operation used to encode and decode luminance or tristimulus values in image processing. This approach ensures a smooth and adaptive transition between $L_{VIS}$ and $L_{Blend1}$ based on the bilateral filter's weighting, so that $L_{Blend2}$ integrates spatial information to refine the final luminance blend. The weighting factor $L_{switching}$ dynamically balances the contributions of $L_{VIS}$ and $L_{Blend1}$, giving more weight to the less noisy and more detailed image. When $L_{switching}$ is close to 0, the contribution from $L_{VIS}$ is dominant, preserving the details of the visible image; conversely, when $L_{switching}$ is close to 1, the contribution from $L_{Blend1}$ becomes dominant, leveraging the benefits of the noise-reduced NIR image. This refined blending strategy emphasizes details in regions with substantial differences and maintains the natural structure of the visible image in areas with smaller differences, so that the final blended image retains high-quality details while minimizing noise, leading to a sharper and more visually appealing result. In addition to gamma correction, CLAHE is applied to improve local contrast and enhance the visual quality of the image. The tile size used for CLAHE was (8, 8), and the clip limit was set to 3.0 [28]. This prevents excessive noise amplification while effectively improving the contrast in small, contextually defined regions of the image.
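The second stage and the local contrast step can be sketched with OpenCV as follows; the CLAHE tile size (8, 8) and clip limit 3.0 follow the text, while the bilateral filter parameters (diameter and sigmas) are illustrative assumptions:

```python
# Second blending stage: bilateral-filtered switching map, gamma-weighted blend, then CLAHE.
import cv2
import numpy as np

def blend_stage2(L_vis, L_blend1, L_diff, gamma=0.4):
    # Smooth the difference map to form L_switching (filter parameters are assumptions)
    L_switching = cv2.bilateralFilter(L_diff.astype(np.float32), 9, 0.1, 15)
    L_blend2 = (L_vis ** gamma) * (1.0 - L_switching) + (L_blend1 ** gamma) * L_switching
    return np.clip(L_blend2, 0.0, 1.0)

def apply_clahe(L, clip=3.0, tiles=(8, 8)):
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=tiles)
    L_u8 = np.clip(L * 255.0, 0, 255).astype(np.uint8)     # CLAHE expects an 8-bit input
    return clahe.apply(L_u8).astype(np.float32) / 255.0

# Example with random luminance maps in [0, 1]:
L_vis = np.random.rand(64, 64).astype(np.float32)
L_blend1 = np.random.rand(64, 64).astype(np.float32)
L_diff = np.random.rand(64, 64).astype(np.float32)
L_final = apply_clahe(blend_stage2(L_vis, L_blend1, L_diff))
```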
Figure 6a–e illustrate the process. Figure 6a shows the L channel of a twilight scene, representing the original input. Figure 6b,c show the intermediate $L_{Blend1}$ and $L_{Blend2}$ results. Figure 6d depicts the enhanced L′ image after applying the proposed fusion technique, and Figure 6e shows the result of applying CLAHE alone under the same conditions. Compared to Figure 6e, the image in Figure 6d displays the windows of the left building and the railroad tracks more brightly and clearly. The combined application of global gamma correction and CLAHE forms a robust post-processing framework that substantially enhances the perceptual quality of the fused image. By strategically manipulating luminance distribution and local contrast, this method extracts and highlights critical visual information that might otherwise be difficult to discern in the original spectral domains.

3.3. Color Space Transformation and Compensation

After converting the image to the Lab color space, the color information from the original image is separated into the a and b channels [29], while the luminance information is contained in the L channel. This separation enables independent processing of the luminance channel without directly affecting the original color information. However, during the fusion process of NIR and visible images, imbalances can arise between the modified luminance channel and the color channels (a and b), potentially leading to unnatural color representations.
When the modified luminance L′ and the original a and b channels are converted back to the RGB color space, excessive color distortion, such as oversaturation, can occur. This is because chroma, which represents color saturation, is defined as the distance of the (a, b) point from the origin of the chromatic plane, as shown below. The non-linear relationship between chroma ($c$) and luminance ($L$) exacerbates this issue, as the rate of change in chroma is not proportional to changes in luminance. The color correction gain, $CC_{gain}$, is determined experimentally through constants $\alpha$ and $\beta$, with values of $\alpha$ = 1.009 and $\beta$ = 0.705. Conventional color correction methods, which rely solely on the luminance variation ratio, often result in excessive color representation and significant distortion, compromising the natural appearance of the image. To address this issue and preserve color accuracy, we propose a color correction method that retains the original color characteristics while incorporating the enhanced luminance information from the fusion process. The color correction process is mathematically expressed as follows:
$$ L_{gain} = \frac{L'}{0.5\,(L_{VIS} + L_{NIR})}, $$
$$ CC_{gain} = \alpha \times L_{gain}^{\;\beta}, $$
$$ c = \sqrt{a^2 + b^2}, $$
where $L'$ is the refined luminance obtained from the previous blending step, and $L_{VIS}$ and $L_{NIR}$ are the luminance components of the visible and NIR images, respectively. The term $L_{gain}$ is a gain factor that relates the blended luminance $L'$ to the average of $L_{VIS}$ and $L_{NIR}$. The denominator $0.5\,(L_{VIS} + L_{NIR})$ represents the average luminance value, which serves as a baseline for normalization. Dividing $L'$ by this average highlights the relative enhancement or suppression of luminance in the final result.
Next, the $CC_{gain}$ term applies a color correction gain to $L_{gain}$, where $\alpha$ (=1.009) is a scaling factor and $\beta$ (=0.705) is an exponent that introduces a non-linear adjustment. This non-linear transformation ensures that the gain adapts perceptually, providing a more balanced color correction. The formula $CC_{gain} = \alpha L_{gain}^{\;\beta}$ allows for flexible control over the color correction process.
$$ a' = CC_{gain} \times a, $$
$$ b' = CC_{gain} \times b, $$
where $a$ and $b$ correspond to the chromatic components of the image, and $a'$ and $b'$ are the adjusted chromatic components after the color correction process. The multiplication by $CC_{gain}$ scales the chroma in a way that corresponds to the luminance adjustments, preserving the balance between brightness and color saturation.
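A minimal NumPy sketch of the color compensation step, using the stated $\alpha$ = 1.009 and $\beta$ = 0.705 and assuming float Lab channels with a and b centered at zero:

```python
# Color compensation: luminance gain L_gain, correction gain CC_gain, scaled chroma a', b'.
import numpy as np

def color_compensate(L_prime, L_vis, L_nir, a, b, alpha=1.009, beta=0.705, eps=1e-6):
    L_gain = L_prime / (0.5 * (L_vis + L_nir) + eps)   # luminance change relative to the mean
    CC_gain = alpha * np.power(L_gain, beta)           # non-linear chroma correction gain
    return CC_gain * a, CC_gain * b                    # adjusted chroma channels a', b'

# Example with random channels:
L_prime = np.random.rand(64, 64) + 0.1
L_vis, L_nir = np.random.rand(64, 64) + 0.1, np.random.rand(64, 64) + 0.1
a, b = np.random.randn(64, 64), np.random.randn(64, 64)
a_adj, b_adj = color_compensate(L_prime, L_vis, L_nir, a, b)
```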
Finally, the luminance channel L′ and the color channels a′ and b′, processed through the color correction method, are combined to produce the final synthesized image. As shown in Figure 7b, the region marked by the red rectangle exhibits distortion, appearing lighter compared to the original image in Figure 7a. In contrast, Figure 7c, which includes the proposed correction, eliminates this distortion, maintaining the original color fidelity and ensuring a more natural representation.

4. Simulations

In this study, our model was trained using an unpaired dataset comprising 3300 pairs from the low-light datasets of Wei et al. [9], Jiang et al. [30], and publicly available data from the internet. Additionally, a paired dataset, also comprising 3300 pairs, was drawn from the NIR datasets of Sa et al. [31]. A comprehensive performance evaluation of the proposed model was conducted using a range of image datasets, including established benchmark collections such as Loh et al. [32] and Meylan et al. [33,34]. To rigorously assess the effectiveness of this approach, we compared the results with several existing tone-mapping algorithms under different imaging conditions. The experimental setup involved testing various image types to address specific challenges: low-light scenarios to evaluate tone compression in both bright and dark regions, outdoor scenes to assess detail preservation, and color-rich environments to examine color reproduction and tone compression simultaneously. Image processing was performed on images with dimensions of 1312 × 2000 pixels, resulting in a processing time of 2.29 s per image. The evaluation was carried out on a Windows 10 Education 64-bit system with an Intel(R) Core™ i7-9700K CPU @ 3.6 GHz, a TITAN RTX GPU (NVIDIA, Santa Clara, CA, USA), and 96 GB of DDR4 memory, utilizing Python 3.8 and MATLAB R2020b for computational analysis. The training module was configured for 1000 epochs with a learning rate of $2 \times 10^{-4}$. Through comparative analysis, we found that existing methods such as Reinhard and L1L0 tended to apply a uniform tone increase across image regions, resulting in a lack of dynamic detail. While methods like iCAM06, Kwon et al., Retinexnet, and Kim et al. maintained visibility in dark scenes, they introduced noticeable color distortions, including unnatural saturation levels and an overall reddish color cast, especially in Retinexnet. In contrast, the proposed method outperformed these alternatives by effectively compressing tones in detailed regions. Specific elements, such as buses, transmission towers, cars, windows, and road surfaces, exhibited nuanced and accurate contrast representations. This allowed our approach to preserve and emphasize critical structural and textural information with remarkable fidelity, producing a balanced and natural visual output.

4.1. Comparative Experiments

To assess the performance of the proposed fusion method, we conducted comparative experiments against several widely used enhancement techniques, including iCAM06, L1L0, Kwon et al., Reinhard, Retinexnet, and Kim et al. Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14 display the results produced by each method using images with varying brightness levels, demonstrating the effectiveness of the proposed method in different scenarios.
When the input image depicted a night scene, all methods except the proposed one exhibited noticeable noise in the sky. As shown in Figure 8 and Figure 9, the proposed method effectively produced a clear output for the night scene, reducing sky noise. However, despite this improvement, the proposed method introduced some white noise, which is a limitation of the synthetic NIR generation process. This effect is linked to the combination of unpaired and paired training in the proposed modules, which may cause inconsistencies during image synthesis.
Figure 10 and Figure 11 present an image captured during twilight. It is clear that the iCAM06, L1L0, Kwon et al., Reinhard, and Kim et al. methods suffer from over-saturation, particularly in areas near the sun. Additionally, Retinexnet, shown in Figure 10 and Figure 11, exhibits a noticeable reddish bias, causing significant color distortion compared to the original input. In contrast, the proposed method avoids color distortion while maintaining high contrast. This allows for clear visibility of details, even in the darker areas of the image, revealing faint objects typically obscured in low-light conditions. The proposed method thus demonstrates superior performance in challenging low-light scenarios.
Figure 12 illustrates an indoor scene with an outdoor view, highlighting key differences among the evaluated methods. Traditional approaches such as iCAM06, L1L0, Kwon et al., and Reinhard show subtle noise artifacts in their outputs. While Retinexnet achieves high visibility through enhanced sharpness, it suffers from substantial global color distortion. Kim et al.’s method encounters over-saturation issues, resulting in an unnatural background. In contrast, the proposed method maintains natural saturation levels while preserving object clarity and color accuracy. This balanced approach allows for clear visualization of both indoor and outdoor features, such as trees and curtains, demonstrating the method’s robustness in handling scenes with complex lighting conditions.
Figure 13 and Figure 14 depict a bright outdoor scene, emphasizing several notable differences among the methods. Traditional approaches like iCAM06, L1L0, Kwon et al., and Reinhard exhibit over-saturation, which diminishes the natural appearance of the images. Although Retinexnet produces sharp images, it introduces a pronounced reddish bias, particularly affecting the natural tones of faces, as seen in Figure 13. Kim et al.’s method maintains good color fidelity but struggles with accurate color reproduction of objects like trucks, as seen in Figure 13. In contrast, the proposed method outperforms these techniques in daylight conditions by preserving natural color fidelity and avoiding color distortion, as demonstrated in Figure 13 and Figure 14.

4.2. Quantitative Evaluations

The performance of the proposed method was quantitatively evaluated using advanced image quality metrics. The blind/referenceless image spatial quality evaluator (BRISQUE) assesses image quality by modeling statistical features of locally normalized luminance coefficients, capturing natural scene statistics to quantify visual distortions [35]. The spatial and spectral entropy-based quality (SSEQ) metric evaluates image quality by analyzing spatial and spectral entropy features, effectively quantifying both spatial distortions and spectral inconsistencies, making it well suited for assessing images under diverse conditions [36]. The spatial-spectral sharpness measure (S3) evaluates image sharpness by combining spatial and spectral characteristics, utilizing local spectral slope and total variation to produce a comprehensive sharpness index [37].
The localized perceptual contrast-sharpness index (LPC-SI) analyzes how well contrast and structural information are preserved, demonstrating effectiveness in maintaining fine image details and structural integrity [38]. The natural image quality evaluator (NIQE) provides a no-reference perceptual quality assessment by comparing image characteristics against a statistical model of natural scene statistics, offering an objective measure of image quality without requiring a reference image [39].
The multi-dimension attention network for no-reference image quality assessment (MANIQA) leverages advanced attention mechanisms to assess image quality, incorporating transformer-based architectures to capture complex perceptual features and provide nuanced quality evaluation [40]. The convolutional neural network-based image quality assessment (CNNIQA) employs deep learning techniques to extract hierarchical features, utilizing convolutional neural networks to generate comprehensive quality scores that reflect perceptual image characteristics [41].
Figure 15 presents the results of the evaluation based on the scores for 25 comparison images and their averages. Figure 16 illustrates the images used for score metrics, showcasing the diversity of inputs evaluated and highlighting the consistent performance of the method across various scenarios.
The proposed method consistently ranks highly across a variety of quality assessment metrics, demonstrating exceptional stability and minimal performance variation. In contrast to existing synthesis methods, which often exhibit considerable discrepancies in evaluation results, our approach provides robust and uniform performance across different quality assessment frameworks.
Notably, the method excels in the BRISQUE assessment, which quantifies visual distortions by modeling natural scene statistics. As shown in Table 1, the proposed method achieved a BRISQUE score of 20.473, ranking first, and outperforming the second-best score of 20.895 by approximately 2.02%. This highlights the method’s ability to produce visually appealing results with minimal distortion.
The performance on the SSEQ metric, where lower values indicate better quality, is also remarkable. The proposed method scored 16.475, trailing the leading score of 16.311 by just 1.01%, reflecting its strong competitive performance in preserving structural quality.
In the S3 metric, the proposed method scored 0.245, showing an improvement of approximately 5.60% over the second-best score of 0.232. This demonstrates the method’s effectiveness in enhancing overall structural similarity and maintaining high-quality synthesis.
For LPC_SI, the proposed method achieved a score of 0.948, surpassing the second-best score of 0.925 by approximately 2.49%. This indicates its exceptional performance in detailed feature representation and perceptual quality.
In the NIQE metric, the proposed method scored 3.001, outperforming the second-best score of 3.289 by approximately 8.75%. This validates the method’s ability to maintain high perceptual quality without relying on reference images.
In the MANIQA assessment, the proposed method achieved a superior score of 0.3043, compared to 0.2851 for the second-best method, demonstrating an improvement of approximately 6.74%. This showcases the method’s effectiveness in attention-based image quality evaluation.
Finally, in the CNNIQA metric, which employs deep learning for evaluation, the proposed method scored 18.802, surpassing the second-best score of 19.254 by approximately 2.35%. This reinforces the method’s superior image synthesis capabilities.
Overall, the proposed method stands out in image quality performance, particularly in key areas such as detail representation, noise reduction, and tone-compression effects. The consistently high rankings across various sophisticated image quality assessment metrics highlight the method’s robustness and effectiveness in advanced image synthesis. The results in Table 1 further underscore the method’s ability to achieve competitive or superior performance across diverse evaluation frameworks, showcasing its versatility and reliability in different contexts.
Finally, the processing time for transforming images into enhanced outputs was measured at the 1312 × 2000 resolution criterion. As shown in Table 2, the proposed method was faster than all other techniques except Reinhard (2012), demonstrating its computational efficiency while maintaining image quality.
The proposed method shows effective image representation performance under high-contrast and dimly lit conditions. However, it faces limitations in real-time image transformation because of the multiple computational steps in the deep learning module and the blending process. Therefore, embedding the proposed algorithm into dedicated hardware such as specialized ASIC boards is necessary for practical application. Future research should also focus on real-time operation integrated into video surveillance systems.

5. Conclusions

This study presents a novel approach to enhancing image quality by combining image-to-image translation between the visible and NIR domains with advanced post-processing techniques. To address the challenge of limited paired visible–NIR training data, we employed an unpaired CycleGAN architecture to generate synthetic visible images. These synthetic images were then integrated into a paired training process with real NIR images, allowing for a robust and effective fusion of the two domains.
Our method consists of several key stages. It begins with initial image synthesis using CycleGAN, followed by the extraction of luminance channel information. Next, detail preservation techniques such as gamma correction and CLAHE are applied to enhance local contrast and structural details. Finally, a color compensation step is performed to ensure accurate RGB reconstruction, preserving the vibrancy and fidelity of the original color information.
The experimental results highlight the advantages of our approach over traditional tone-mapping algorithms. First, the proposed method achieves superior tone compression while enhancing local contrast, resulting in images with exceptional clarity. Second, unlike conventional methods that often produce muted or distorted colors, the color compensation technique ensures vibrant and accurate color reproduction. By combining the strengths of deep learning-based image translation with traditional image processing techniques, this hybrid approach effectively addresses the limitations of visible–NIR fusion. The proposed method successfully preserves structural detail and color fidelity, offering a promising solution for advanced image enhancement applications.

Author Contributions

Conceptualization, S.-H.L. (Sung-Hak Lee); methodology, M.-H.L. and S.-H.L. (Sung-Hak Lee); software, M.-H.L.; validation, M.-H.L., S.-H.L. (Seung-Hwan Lee) and S.-H.L. (Sung-Hak Lee); formal analysis, M.-H.L. and S.-H.L. (Sung-Hak Lee); investigation, M.-H.L. and S.-H.L. (Sung-Hak Lee); resources, M.-H.L. and S.-H.L. (Sung-Hak Lee); data curation, M.-H.L., Y.-H.G., S.-H.L. (Seung-Hwan Lee) and S.-H.L. (Sung-Hak Lee); writing—original draft preparation, M.-H.L.; writing—review and editing, S.-H.L. (Sung-Hak Lee); visualization, M.-H.L.; supervision, S.-H.L. (Sung-Hak Lee); project administration, S.-H.L. (Sung-Hak Lee); funding acquisition, S.-H.L. (Sung-Hak Lee). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Korea Creative Content Agency (KOCCA) grant funded by the Ministry of Culture, Sports and Tourism (MCST) in 2024 (Project Name: Development of optical technology and sharing platform technology to acquire digital cultural heritage for high quality restoration of composite materials cultural heritage, Project Number: RS-2024-00442410, Contribution Rate: 50%) and the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean government (MSIT) (IITP-2024-RS-2022-00156389, 50%).

Data Availability Statement

The data presented in this study are openly available in Wei et al. in reference [9], Jiang et al. in reference [30], Sa et al. in reference [31], Loh et al. in reference [32], Meylan et al. in references [33,34], and at https://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-images.html (accessed on 5 May 2024) in reference [42].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kwon, H.-J.; Lee, S.-H. Visible and Near-Infrared Image Acquisition and Fusion for Night Surveillance. Chemosensors 2021, 9, 75. [Google Scholar] [CrossRef]
  2. Park, C.-W.; Kwon, H.-J.; Lee, S.-H. Illuminant Adaptive Wideband Image Synthesis Using Separated Base-Detail Layer Fusion Maps. Appl. Sci. 2022, 12, 9441. [Google Scholar] [CrossRef]
  3. Sukthankar, R.; Stockton, R.G.; Mullin, M.D. Smarter presentations: Exploiting homography in camera-projector systems. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; pp. 247–253. [Google Scholar] [CrossRef]
  4. Kuang, J.; Johnson, G.M.; Fairchild, M.D. iCAM06: A refined image appearance model for HDR image rendering. J. Vis. Commun. Image Represent. 2007, 18, 406–414. [Google Scholar] [CrossRef]
  5. Ma, C.; Yeo, T.S.; Liu, Z.; Zhang, Q.; Guo, Q. Target imaging based on ℓ 1 ℓ 0 norms homotopy sparse signal recovery and distributed MIMO antennas. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 3399–3414. [Google Scholar] [CrossRef]
  6. Kwon, H.-J.; Lee, S.-H. Contrast Sensitivity Based Multiscale Base–Detail Separation for Enhanced HDR Imaging. Appl. Sci. 2020, 10, 2513. [Google Scholar] [CrossRef]
  7. Go, Y.-H.; Lee, S.-H.; Lee, S.-H. Multiexposed Image-Fusion Strategy Using Mutual Image Translation Learning with Multiscale Surround Switching Maps. Mathematics 2024, 12, 3244. [Google Scholar] [CrossRef]
  8. Reinhard, E.; Stark, M.; Shirley, P.; Ferwerda, J. Photographic Tone Reproduction for Digital Images. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2; ACM: New York, NY, USA, 2023; pp. 661–670. [Google Scholar] [CrossRef]
  9. Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. arXiv 2018, arXiv:1808.04560. [Google Scholar] [CrossRef]
  10. Kim, Y.-J.; Son, D.-M.; Lee, S.-H. Retinex Jointed Multiscale CLAHE Model for HDR Image Tone Compression. Mathematics 2024, 12, 1541. [Google Scholar] [CrossRef]
  11. Son, D.-M.; Kwon, H.-J.; Lee, S.-H. Visible and Near Infrared Image Fusion Using Base Tone Compression and Detail Transform Fusion. Chemosensors 2022, 10, 124. [Google Scholar] [CrossRef]
  12. Elad, M. On the origin of the bilateral filter and ways to improve it. IEEE Trans. Image Process. 2002, 11, 1141–1151. [Google Scholar] [CrossRef]
  13. Yan, B.; Guo, W. A novel identification method for CPPU-treated kiwifruits based on images. J. Sci. Food Agric. 2019, 99, 6234–6240. [Google Scholar] [CrossRef] [PubMed]
  14. Im, C.-G.; Son, D.-M.; Kwon, H.-J.; Lee, S.-H. Tone Image Classification and Weighted Learning for Visible and NIR Image Fusion. Entropy 2022, 24, 1435. [Google Scholar] [CrossRef] [PubMed]
  15. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  16. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  17. Szekeres, B.J.; Gyöngyössy, M.N.; Botzheim, J. A ResNet-9 Model for Insect Wingbeat Sound Classification. In Proceedings of the 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 5–8 December 2023; pp. 587–592. [Google Scholar] [CrossRef]
  18. Liu, G.; Yan, S. Latent Low-Rank Representation for subspace segmentation and feature extraction. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1615–1622. [Google Scholar] [CrossRef]
  19. Zarmehi, N.; Marvasti, F. Removal of sparse noise from sparse signals. Signal Process. 2019, 158, 91–99. [Google Scholar] [CrossRef]
  20. Borstelmann, A.; Haucke, T.; Steinhage, V. The Potential of Diffusion-Based Near-Infrared Image Colorization. Sensors 2024, 24, 1565. [Google Scholar] [CrossRef]
  21. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar] [CrossRef]
  22. Yang, Z.; Chen, Z. Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization. In Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau, China, 1–4 December 2020; pp. 467–470. [Google Scholar] [CrossRef]
  23. Su, H.; Jung, C.; Yu, L. Multi-Spectral Fusion and Denoising of Color and Near-Infrared Images Using Multi-Scale Wavelet Analysis. Sensors 2021, 21, 3610. [Google Scholar] [CrossRef]
  24. Radke, R.J.; Andra, S.; Al-Kofahi, O.; Roysam, B. Image change detection algorithms: A systematic survey. IEEE Trans. Image Process. 2005, 14, 294–307. [Google Scholar] [CrossRef]
  25. Zhang, H.; Ma, J. IID-MEF: A multi-exposure fusion network based on intrinsic image decomposition. Inf. Fusion 2023, 95, 326–340. [Google Scholar] [CrossRef]
  26. Lee, S.-H.; Kwon, H.-J.; Lee, S.-H. Enhancing Lane-Tracking Performance in Challenging Driving Environments through Parameter Optimization and a Restriction System. Appl. Sci. 2023, 13, 9313. [Google Scholar] [CrossRef]
  27. Lee, G.-Y.; Lee, S.-H.; Kwon, H.-J.; Sohng, K.-I. Visual sensitivity correlated tone reproduction for low dynamic range images in the compression field. Opt. Eng. 2014, 53, 113111. [Google Scholar] [CrossRef]
  28. Musa, P.; Al Rafi, F.; Lamsani, M. A Review: Contrast-Limited Adaptive Histogram Equalization (CLAHE) methods to help the application of face recognition. In Proceedings of the 2018 Third International Conference on Informatics and Computing (ICIC), Palembang, Indonesia, 17–18 October 2018; pp. 1–6. [Google Scholar] [CrossRef]
  29. Bartleson, C.J. Predicting corresponding colors with changes in adaptation. Color Res. Appl. 1979, 4, 143–155. [Google Scholar] [CrossRef]
  30. Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X. Enlightengan: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef] [PubMed]
  31. Sa, I.; Lim, J.Y.; Ahn, H.S.; MacDonald, B. DeepNIR: Datasets for Generating Synthetic NIR Images and Improved Fruit Detection System Using Deep Learning Techniques. Sensors 2022, 22, 4721. [Google Scholar] [CrossRef]
  32. Loh, Y.P.; Chan, C.S. Getting to know low-light images with the Exclusively Dark dataset. Comput. Vis. Image Underst. 2019, 178, 30–42. [Google Scholar] [CrossRef]
  33. Meylan, L.; Susstrunk, S. High dynamic range image rendering with a retinex-based adaptive filter. IEEE Trans. Image Process. 2006, 15, 2820–2830. [Google Scholar] [CrossRef]
  34. Meylan, L. Tone Mapping for High Dynamic Range Images; EPFL: Lausanne, Switzerland, 2006. [Google Scholar]
  35. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-Reference Image Quality Assessment in the Spatial Domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef]
  36. Liu, L.; Liu, B.; Huang, H.; Bovik, A.C. No-reference image quality assessment based on spatial and spectral entropies. Signal Process. Image Commun. 2014, 29, 856–863. [Google Scholar] [CrossRef]
  37. Vu, C.T.; Chandler, D.M. S3: A Spectral and Spatial Sharpness Measure. In Proceedings of the 2009 First International Conference on Advances in Multimedia, Colmar, France, 20–25 July 2009; pp. 37–43. [Google Scholar] [CrossRef]
  38. Hassen, R.; Wang, Z.; Salama, M.M.A. Image Sharpness Assessment Based on Local Phase Coherence. IEEE Trans. Image Process. 2013, 22, 2798–2810. [Google Scholar] [CrossRef]
  39. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a ‘Completely Blind’ Image Quality Analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
  40. Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1191–1200. [Google Scholar] [CrossRef]
  41. Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740. [Google Scholar]
  42. Computer Vision Test Images. Available online: https://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-images.html (accessed on 5 May 2024).
Figure 1. Overview of the CycleGAN principles: generator functions G: X → Y and F: Y → X, and adversarial discriminators D_Y and D_X of CycleGAN. The blue circles represent the source domain samples and their transformed versions, while the red circles represent the target domain samples and their transformed versions.
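For reference, the losses sketched in Figure 1 follow the standard CycleGAN formulation: each mapping is trained with an adversarial loss, and the two mappings are tied together by a cycle-consistency term. The equations below restate that general objective only; the weight λ is illustrative and the specific settings used in this work are not restated here.

\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right],
\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\lVert G(F(y)) - y \rVert_1\right],
\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) + \lambda\,\mathcal{L}_{\mathrm{cyc}}(G, F).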
Figure 2. Flowchart of the proposed method.
Figure 3. Visible–NIR image translation training process.
Figure 4. Generation results: (a) Input image, (b) unpaired module result, and (c) paired module result.
Figure 5. First image fusion scheme of the proposed method.
Figure 6. Blending results: (a) Original image (= L_VIS), (b) L_Blend1 image, (c) L_Blend2 image, (d) L′ image, and (e) CLAHE applied to the original image.
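Panel (e) of Figure 6 uses contrast-limited adaptive histogram equalization (CLAHE) [28] as a comparison baseline. A minimal sketch of how such a baseline can be generated with OpenCV is shown below; the clip limit, tile size, and file names are illustrative assumptions, and this reproduces only the baseline, not the proposed blending.

import cv2

# Read the visible-light input image (file name is a placeholder).
bgr = cv2.imread("input_visible.png")

# CLAHE works on a single channel, so equalize only the luminance (L)
# channel of the LAB representation and leave the chroma untouched.
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

# clipLimit and tileGridSize are illustrative defaults, not tuned values.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
l_eq = clahe.apply(l)

# Merge the equalized luminance back with the original chroma channels.
baseline = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
cv2.imwrite("clahe_baseline.png", baseline)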
Figure 7. Color distortion comparisons: (a) Original image, (b) without color correction, and (c) with color correction.
Figure 8. Input and result images for night case #1: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 9. Input and result images for night case #2: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 10. Input and result images for sunset case #1: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 11. Input and result images for sunset case #2: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 12. Input and result images for indoor case: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 13. Input and result images for daytime case #1: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 14. Input and result images for daytime case #2: (a) Input image, (b) iCAM06, (c) L1L0, (d) Kwon et al. [6], (e) Reinhard (2012) [8], (f) Retinexnet, (g) Kim et al. [10], and (h) proposed method.
Figure 15. Metric scores: (a) Blind/referenceless image spatial quality evaluator (BRISQUE) score, (b) spatial and spectral entropy-based quality (SSEQ) score, (c) spectral and spatial sharpness measure (S3) score, (d) local phase coherence-sharpness index (LPC_SI) score, (e) natural image quality evaluator (NIQE) score, (f) multi-dimension attention network for no-reference image quality assessment (MANIQA) score, and (g) convolutional neural networks for no-reference image quality assessment (CNNIQA) score. (The y-axis represents the metric score, and the x-axis indicates the image numbers.)
Figure 16. Test images (the numbers in the figures are the image numbers in Figure 15).
Table 1. Comparison of metric scores. (↑) Higher scores are preferable, and (↓) lower scores are preferable. Superscript numbers indicate the ranking among the compared models (superscripts 1 and 2 denote the first- and second-ranked scores). Bold font highlights the best result for each metric.

| Metric | iCAM06 | L1L0 | Kwon et al. [6] | Reinhard (2012) [8] | Retinexnet | Kim et al. [10] | Proposed |
|---|---|---|---|---|---|---|---|
| BRISQUE (↓) | 25.272 | 26.424 | 22.602 | 25.946 | 20.895 | 22.098 | 20.473 ¹ |
| SSEQ (↓) | 21.316 | 23.052 | 16.311 | 23.842 | 21.995 | 18.165 | 16.475 ² |
| S3 (↑) | 0.160 | 0.139 | 0.232 | 0.173 | 0.178 | 0.225 | 0.245 ¹ |
| LPC_SI (↑) | 0.902 | 0.888 | 0.925 | 0.917 | 0.919 | 0.921 | 0.948 ¹ |
| NIQE (↓) | 3.427 | 3.325 | 4.530 | 3.289 | 4.309 | 3.319 | 3.001 ¹ |
| MANIQA (↑) | 0.2246 | 0.2361 | 0.2392 | 0.2353 | 0.2851 | 0.2288 | 0.3043 ¹ |
| CNNIQA (↓) | 20.311 | 20.920 | 30.122 | 24.135 | 19.254 | 22.006 | 18.802 ¹ |
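Table 1 summarizes each no-reference metric with a single number per method. A minimal sketch of how per-image scores (as plotted in Figure 15) can be averaged into such a summary is given below; the averaging step and directory layout are assumptions, and compute_metric is a hypothetical placeholder for a published implementation of any one metric (BRISQUE, SSEQ, NIQE, etc.), not the authors' evaluation code.

import os
import cv2
import numpy as np

def compute_metric(gray: np.ndarray) -> float:
    # Hypothetical placeholder: substitute a published implementation of
    # BRISQUE, SSEQ, NIQE, or another no-reference metric here.
    raise NotImplementedError

def average_score(result_dir: str, metric=compute_metric) -> float:
    # Average one no-reference metric over every result image of one method.
    scores = []
    for name in sorted(os.listdir(result_dir)):
        img = cv2.imread(os.path.join(result_dir, name), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue  # skip files that are not readable images
        scores.append(metric(img))
    return float(np.mean(scores))

# Example usage (directory name is an assumption):
# print(average_score("results/proposed"))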
Table 2. Runtime performance: lower times are preferable. Superscript numbers indicate the ranking among the compared models (superscripts 1 and 2 denote the first- and second-ranked times). Bold font highlights the best result.

| | iCAM06 | L1L0 | Kwon et al. [6] | Reinhard (2012) [8] | Retinexnet | Kim et al. [10] | Proposed |
|---|---|---|---|---|---|---|---|
| Process time | 5.39 s | 5.57 s | 22.54 s | 1.10 s ¹ | 6.40 s | 26.78 s | 2.29 s ² |
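Per-method processing times of the kind reported in Table 2 can be obtained with a simple wall-clock harness such as the sketch below; enhance is a hypothetical stand-in for each method's entry point, and the warm-up pass and repeat count are assumptions rather than the measurement protocol used in this paper.

import time

def measure_runtime(enhance, image, repeats: int = 5) -> float:
    # Average wall-clock seconds of one enhancement call.
    enhance(image)  # warm-up so one-off model loading is not counted
    start = time.perf_counter()
    for _ in range(repeats):
        enhance(image)
    return (time.perf_counter() - start) / repeats

# Example usage (both names are placeholders):
# print(measure_runtime(proposed_enhance, test_image))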