Article

Self-Supervised and Supervised Image Enhancement Networks with Time-Shift Module

by Kubra Tuncal, Boran Sekeroglu and Rahib Abiyev

1 Department of Computer Engineering, Near East University, Nicosia 99138, Turkey
2 Department of Software Engineering, World Peace University, Nicosia 99010, Turkey
3 Artificial Intelligence Research and Application Center, World Peace University, Nicosia 99010, Turkey
4 Applied Artificial Intelligence Research Center, Near East University, Nicosia 99138, Turkey
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(12), 2313; https://doi.org/10.3390/electronics13122313
Submission received: 16 May 2024 / Revised: 3 June 2024 / Accepted: 5 June 2024 / Published: 13 June 2024

Abstract

Enhancing image quality provides more interpretability for both human beings and machines. Traditional image enhancement techniques work well for specific uses, but they struggle with images taken in extreme conditions, such as varied distortions, noise, and contrast deformations. Deep-learning-based methods produce superior quality in enhancing images since they are capable of learning the spatial characteristics within the images. However, deeper models increase the computational costs and require additional modules for particular problems. In this paper, we propose self-supervised and supervised image enhancement models based on the time-shift image enhancement method (TS-IEM). We embedded the TS-IEM into a four-layer CNN model and reconstructed the reference images for the self-supervised model. The reconstructed images are also used in the supervised model as an additional layer to improve the learning process and obtain better-quality images. Comprehensive experiments and qualitative and quantitative analysis are performed using three benchmark datasets of different application domains. The results showed that the self-supervised model could provide reasonable results for the datasets without reference images. On the other hand, the supervised model outperformed the state-of-the-art methods in quantitative analysis by producing well-enhanced images for different tasks.

1. Introduction

Cameras and imaging devices face various challenges in capturing an ideal, undistorted, high-contrast image, since the conditions of the 3D world introduce a variety of factors such as poor or excessive illumination, the characteristics of objects, etc. [1]. When degraded or corrupted images are acquired, they must be enhanced to improve their quality in terms of color correction, visual appearance, and detail preservation. Enhancing images therefore enables more effective interpretation by both humans and computers.
Spatial-domain image enhancement considers the intensity information of an image at its x and y coordinates or within regions and aims to provide more reasonable images than the originals by modifying these intensities. Several approaches exist to achieve this, such as histogram-based methods [2,3,4], gamma correction [5], and fusion-based methods [6,7]. Even though these approaches produce visually satisfying results for different problem domains, each suffers from its own disadvantages: considering a single pixel or a region of the image can produce over- or under-enhanced results, because similar intensity values in corrupted and distinct regions mislead the modification of the intensity values. These limitations prevent such methods from being applied universally.
Recently, the time-shift image enhancement method (TS-IEM) [1] was proposed to represent images in spacetime and to derive events from the inertial frame over predefined time sequences. It is combined with a gamma-correction phase to enhance the associated events and obtain a final reconstructed spatial-domain image. The method is based on the assumption that a 2D image is fixed at a constant moment, i.e., an inertial frame, with no velocity and no information associated with pre- or post-frames. The TS-IEM overcame the over- and under-enhancement problems of other methods and eliminated the parameter selection that made previous implementations challenging.
On the other hand, developments in deep learning techniques, with their ability to effectively extract features from images and to model complex relationships between those features, have overcome the disadvantages of conventional methods [8]. Deep learning methods automatically enhance images by learning image characteristics through the analysis and interpretation of a set of images. This allows different problem domains with varied pixel or regional characteristics to be learned using learnable parameters, producing more reasonable enhanced images [9]. However, these models suffer from long training times, high computational costs, and a focus on particular problems. Therefore, practical models that are suitable and effective across extensive applications are required to perform enhancement with higher efficacy in a shorter time.
In this paper, we propose both supervised and self-supervised lightweight four-layer convolutional neural network (CNN) models, named Time-Shift Enhancement Network (Tim-EN), which adapt the time-shift image enhancement method into the CNN as an independent module to enhance images for different purposes. In the supervised Tim-EN model, the TS-IEM is used to enhance the degraded input images and, in addition, the TS-IEM outputs are merged in the last layer of the CNN. In contrast, the self-supervised Tim-EN model uses the TS-IEM outputs as target images and aims to enhance the original images without original reference images. This enables the enhancement of unpaired or non-referenced images, which is necessary for real-life applications.
The rest of the paper is organized as follows: Section 2 summarizes the conventional and recent methods related to this work. Section 3 introduces the materials and methods considered. Section 4 presents the results of this study and discussion. Finally, Section 5 presents the concluding remarks of the study.

2. Related Works

2.1. Conventional Methods

Conventional image enhancement methods focus on improving image brightness and color correction in the spatial domain. The most widely used conventional approaches are histogram-based, gamma correction, and fusion-based methods.
Histogram-based methods aim to enhance images using the distribution of pixels within the image, globally or locally, and are mostly used to improve the contrast level of images. Even though they satisfactorily produce a dynamic range of pixels in the resultant image, different image characteristics can cause over- or under-enhancement. Contrast-limited adaptive histogram equalization (CLAHE) [2] was proposed to avoid over-enhancement by limiting the enhancement level. It is the basis for advanced histogram-based methods; however, the resultant images might include distortions and high luminosity. For that reason, the brightness-preserving bi-histogram equalization (BBHE) [3] method aimed to preserve brightness by applying thresholds based on the mean and median intensity values.
On the other hand, gamma-correction methods aimed to solve the problems that occur in histogram-based methods. However, they have a vital problem in producing satisfactorily enhanced images, since determining the gamma value is challenging and depends on the application domain. This problem led researchers to determine the gamma value automatically by considering the intensity distributions. Huang et al. [5] proposed adaptive gamma correction with weighting distribution (AGCWD), which uses the cumulative distribution function (CDF) and the probability density function (PDF) to determine the gamma value automatically. Even though the method effectively produces enhanced images without artifacts, bright objects cause excess luminosity in the enhanced images. To overcome this problem, Huang et al. [10] proposed truncated adaptive gamma correction (TAGC), which truncates the CDF by thresholding. This prevents over-enhancement and produces more reasonable dark and bright regions in the resultant image. Recently, Chang et al. [11] improved CLAHE to perform enhancement automatically with dual gamma correction (A-CLAHE-DGC); the dual gamma correction effectively limits the over-enhancement that a single gamma value could cause.
Fusion-based image enhancement methods consider different characteristics of images, such as intensity levels and exposures, and fuse the processed or extracted characteristics to reconstruct enhanced images. However, different problem domains and applications require different kinds of attention to the images, and this limitation means that fusion-based methods have mostly been proposed for particular applications [6].
Ancuti et al. [12] and Zhou et al. [13] proposed fusion-based methods to enhance underwater images. These methods achieved successful results in terms of contrast stretching, detail preserving, and color correction. A general-purpose fusion-based method was proposed by Fu et al. [14] to enhance weakly illuminated images.
Recently, Zhang et al. [15] proposed a method to enhance underwater images with minimal color loss and locally adaptive contrast enhancement (MLLE). The locally adaptive contrast process was used to reduce the over- or under-enhancement. The authors also tested the proposed method with different images under varied conditions to demonstrate its ability.

2.2. Deep-Learning-Based Methods

In the last two decades, deep-learning-based image enhancement methods have become common for image enhancement problems. However, training the models requires a large number of images to provide effective convergence. For that reason, deep-learning-based methods are widely used separately for particular problem domains, such as underwater image enhancement (UIE) [16], medical image enhancement, image dehazing, and exposure correction. In addition, the different characteristics of these problem domains require embedding individual modules into the deep learning models to overcome over- and under-enhancement by obtaining additional knowledge.
One of the most popular applications is the UIE [17,18], which includes challenging distortions, color changes, and illumination in raw images. In addition, obtaining the reference images for underwater images creates another challenge.
Recently, Ding et al. [19] developed a two-stream structure to enhance regional and global appearances separately. The resultant image is obtained by weighing the contributions of the streams. Fu et al. [20] proposed a novel probabilistic network to enhance underwater images effectively. The proposed method consisted of the conditional variational autoencoder and adaptive instance normalization. The final enhancement was based on predicting the result from the set of distributions.
Khandouzi and Ezoji [18] proposed an underwater image enhancement method based on deep learning. The proposed model consisted of three modules to provide local and global enhancement and intensity improvement, with an attention module in the final phase. The attention module was considered to combine the extracted characteristics and remove the irrelevant components.
Afifi et al. [21] proposed a coarse-to-fine deep network for exposure correction. Their method processed the raw images with Laplacian pyramid decomposition in different frequency bands. The sequential correction of the levels was performed by considering the global and image details. Mou et al. [22] proposed a supervised enhancement network to enhance low-light endoscopic images particularly. They designed a global illumination enhancement and a local feature extraction module for global illumination and detail preserving.
Even though the proposed deep-learning-based methods achieved reasonable results, their application dependency limits them to particular problems. Therefore, deep learning models that could produce successful and reasonable results in many applications need further development.

3. Materials and Methods

3.1. Datasets

In order to perform experiments and conduct a comparative study, we considered three datasets. We selected two referenced underwater image datasets, namely, the Underwater Image Enhancement Benchmark (UIEB) [23] and the underwater scenes (UWS) subset of the Enhancing Underwater Visual Perception (EUVP) dataset [24]. Underwater images are among the most challenging problem domains in image enhancement due to degradations such as lighting and environmental effects, absorption, scattering, low contrast, and color casts [23]. These degradations might occur together in a single image or separately in different images, creating varied and severe challenges. Therefore, the enhancement of underwater images is an effective way to assess the general enhancement ability of the proposed methods. The UIEB and EUVP underwater scenes datasets include 890 and 2185 paired images, respectively.
Additionally, the endoscopic real-synthetic over- and underexposed frames for image enhancement (Endo4IE) dataset [25] is considered to demonstrate the efficacy of the proposed model for different datasets of varied problem domains. The Endo4IE dataset consists of 4432 paired endoscopic images for normal, synthetic overexposed, and underexposed frames. Since the Endo4IE dataset consists of reference images for overexposed and underexposed images, it provides a different problem domain to evaluate the methods under different conditions. Table 1 summarizes the characteristics of the considered datasets.

3.2. Time-Shift Image Enhancement Method

The time-shift image enhancement method simulates the movement of objects in space using spacetime on the spatial intensity values of an image. It assumes that an image is an inertial frame of relativity theory without acceleration and generates T events from a single image.
The inertial frame is the basis for providing events associated with the image. In the time-shift algorithm, each pixel of a 2D spatial image has no velocity at a constant moment ($T = 0$), and the x and y planes have no change in perspective. Therefore, the z plane is associated with the intensity values to create a 4D vector space ($I_d$) associated with spacetime. The events in spacetime are generated using Equation (1) [1]:

$$(\Delta s_d^t)^2 = (\Delta c t_t)^2 - (\Delta I_d)^2 \tag{1}$$

where $c$ is the speed of light and is fixed as 1, the maximum intensity value of the normalized images, and $s$, $d$, and $t$ denote the spacetime interval, the RGB channel, and time, respectively. The image vector space $(\Delta I_d)^2$, the change in time $\Delta t_t$, and $(\Delta c t_t)^2$ were defined as [1]

$$(\Delta I_d)^2 = I_d - (I_d \gamma_t) \tag{2}$$

$$\Delta t_t = (1 - t)/T \tag{3}$$

$$(\Delta c t_t)^2 = (c\, I_d\, \Delta t_t)^2 \tag{4}$$

where $\gamma$ is the Lorentz factor.
Since a constant image contains no pre- or post-frames, it was assumed that the image travels over predetermined periods, at a different speed in each period, with respect to the reference frame. To account for the lack of other frames, the time-shift algorithm uses the Lorentz factor to generate the subsequent events observed at different speeds from the inertial frame. The final image of the time-shift algorithm is reconstructed as the mean of the generated events, with each event multiplied by the mean of the Lorentz factors.
Following the reconstruction of a single image, an automatic gamma-correction procedure is applied to obtain the final enhanced image. The procedure is based on the mean intensity values of the reconstructed image: if the mean intensity value of a channel (its activation level $\theta$) exceeds the predetermined threshold ($Th = 0.50$), pixel-based gamma correction is applied to that channel separately using its activation level ($Th + \theta$). Figure 1 demonstrates the time-shift image enhancement method with the generated events, the reconstructed single image, and the final enhanced image after gamma correction. The images $t = 1$ to $t = 10$ in Figure 1 visualize the spacetime events generated by the time-shift procedure at different velocities from the inertial frame. $\zeta_d$ is the single reconstructed time-shift image obtained from all spacetime events, and the final enhancement (E) is performed by applying the automatic gamma correction to $\zeta_d$.
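A minimal sketch of this automatic gamma-correction stage is given below (Python/NumPy). It assumes the reconstructed time-shift image $\zeta_d$ is already available as a normalized (H, W, 3) array; the event-generation stage of Equations (1)–(4) is available in the authors' repository listed in the Data Availability Statement. The function name and the behavior for channels whose activation level does not exceed the threshold (left unchanged here) are illustrative assumptions.

```python
import numpy as np

def auto_gamma_correction(zeta: np.ndarray, th: float = 0.50) -> np.ndarray:
    """Automatic gamma correction of the reconstructed time-shift image (sketch).

    `zeta` is the reconstructed single time-shift image, normalized to [0, 1],
    with shape (H, W, 3). For each RGB channel d, the activation level theta_d
    is the channel mean; if it exceeds the threshold Th = 0.50, the channel is
    gamma-corrected with exponent (Th + theta_d).
    """
    enhanced = zeta.copy()
    for d in range(zeta.shape[2]):          # iterate over the R, G, B channels
        theta = zeta[..., d].mean()         # channel activation level
        if theta > th:
            enhanced[..., d] = zeta[..., d] ** (th + theta)
    return enhanced
```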

3.3. Proposed Tim-EN Models

The proposed models are designed and implemented to provide effective image enhancement based on the time-shift module. The time-shift module is embedded into a simple yet effective CNN at both ends: as an enhancing stage at the input and as an image-fusion stage at the output layer. Enhancing the input images with the time-shift module produces less distorted inputs than the original images, which enables rapid and low-cost enhancement.
Tim-EN employs a four-layer convolutional neural network to minimize the computational cost while maintaining efficient feature extraction, since the time-shift module already produces well-enhanced versions of the raw inputs.
The CNN architecture of Tim-EN uses an increasing number of 3 × 3 filters in the first three layers in order to extract low- and high-level spatial features; these layers contain 32, 64, and 128 filters, respectively. This ordering of the layers was pre-tested, and the chosen filter progression produced slightly superior results compared to exchanging the order of the layers. The final convolutional layer consists of 3 filters to reconstruct RGB images, and the sharp decrease in the number of filters helps to eliminate noise within the reconstructed images [26,27].
Each layer uses a ReLU activation function, and batch normalization is applied to avoid overfitting. The pooling operation is not considered so as not to decrease the spatial dimensions of the extracted features. This also reduces the computational cost since upsampling is not required. The total number of trainable parameters of the proposed Tim-EN is 97,155.
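A minimal PyTorch sketch of this four-layer backbone is shown below. The exact placement of batch normalization and the absence of normalization or activation on the output layer are assumptions, but with this particular layout the number of trainable parameters matches the reported 97,155.

```python
import torch
import torch.nn as nn

class TimEN(nn.Module):
    """Four-layer CNN backbone of Tim-EN (sketch): 3x3 filters, 32 -> 64 -> 128 -> 3,
    no pooling, so the 112 x 112 spatial dimensions are preserved end to end."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, kernel_size=3, padding=1),  # reconstructs the RGB image
        )

    def forward(self, x):
        return self.features(x)

model = TimEN()
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # 97155
```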
In the self-supervised Tim-EN model, raw input images are fed to the time-shift module, and the generated images are used as reference images. The raw images are then fed to the four-layer CNN, which is trained to produce enhanced images without user-defined reference images. This self-supervision allows the model to be applied to non-referenced images, which are common in real-life applications. The time-shift module considers each image as a single input and generates a reference image for the self-supervised model while ignoring the general characteristics of the whole dataset; as a result, some of the generated reference images might suffer from distortions or errors in color correction. The aim of the self-supervised CNN model is to exploit the relationships between all training images, which provides learning of the different characteristics of images and improves the general enhancement ability. Figure 2 shows the architecture of the self-supervised Tim-EN model.
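The sketch below illustrates how the self-supervised variant could build its training pairs. It assumes a `time_shift_enhance` function implementing the TS-IEM for a single image tensor (a hypothetical helper; see the sketch in Section 3.2 and the authors' repository), with raw images supplied as a normalized float tensor.

```python
import torch

def build_self_supervised_pairs(raw_images: torch.Tensor):
    """raw_images: (N, 3, H, W) float tensor in [0, 1].

    Each raw image is passed once through the time-shift module; the resulting
    enhanced image becomes the training target, so no user-provided reference
    images are required. `time_shift_enhance` is a hypothetical TS-IEM helper.
    """
    with torch.no_grad():
        targets = torch.stack([time_shift_enhance(img) for img in raw_images])
    return raw_images, targets
```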
In the supervised Tim-EN model, the time-shift module generates the images for both the input and the final layer of the 4-layered CNN. The generated images are passed through the CNN, and in the final layer, extracted features of the CNN and the generated images are recombined by adding them together to improve the visual appearance of the reconstructed images and to decrease the loss. Figure 3 shows the architecture of the supervised Tim-EN model.
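A sketch of the supervised variant's forward pass is given below, using the `TimEN` backbone above and the same hypothetical `time_shift_enhance` helper applied per image; the element-wise addition at the output corresponds to the fusion described in the text.

```python
import torch
import torch.nn as nn

class SupervisedTimEN(nn.Module):
    """Supervised Tim-EN (sketch): the TS-IEM-enhanced image is both the CNN
    input and an additive term at the final layer."""

    def __init__(self):
        super().__init__()
        self.backbone = TimEN()  # four-layer CNN defined above

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        # Enhance each raw image with the time-shift module (hypothetical helper).
        ts = torch.stack([time_shift_enhance(img) for img in raw])
        # Recombine the CNN output with the time-shift images by addition.
        return self.backbone(ts) + ts
```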

3.4. Experiments and Evaluation

The models were trained for 250 epochs using the Adam optimizer with a fixed learning rate (0.0001) and batch size of 16. The number of epochs was determined after performing several experiments and by considering the change in loss. The optimal epoch number for different datasets varied between 235 and 260, and 250 epochs were chosen to make the models standard for each experiment. The determination of the learning rate and the batch size was performed in a similar manner by conducting experiments with different batch sizes and variable learning rates. However, the superior results were recorded with a fixed (0.0001) learning rate and batch size of 16. A decrease in the batch size worsened the obtained results, while an increase did not contribute to the convergence of the models.
The datasets were randomly split into 80:20 training and testing sets. Each training run was repeated three times, and the mean of the results was recorded. Each dataset was trained separately and tested on its own test data. The mean squared error was used as the loss function; other loss functions were not considered in these experiments and might improve or worsen the obtained results. All images were resized to 112 × 112 spatial dimensions to decrease the computational cost. The models were implemented on a Windows 11 PC with an Intel® Core™ i7-13700KF CPU, 64 GB memory, and a 24 GB RTX 4090 GPU, using Python 3.11.7 and PyTorch 2.2.0+cu118.
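Under the reported settings (Adam, fixed learning rate of 0.0001, batch size 16, 250 epochs, MSE loss, 112 × 112 inputs, 80:20 split), a training loop might look like the following sketch; the dataset handling and function names are illustrative assumptions rather than the authors' exact code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_tim_en(model, inputs, targets, epochs=250, lr=1e-4, batch_size=16, device="cuda"):
    """inputs/targets: (N, 3, 112, 112) float tensors in [0, 1] (images are
    assumed to have been resized to 112 x 112 beforehand)."""
    dataset = TensorDataset(inputs, targets)
    n_train = int(0.8 * len(dataset))                         # 80:20 random split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # fixed learning rate
    criterion = torch.nn.MSELoss()                            # loss used in the paper

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model, test_set
```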
Two full-reference evaluation metrics, the peak signal-to-noise ratio (PSNR) [23] and the structural similarity index (SSIM) [28], are used to assess the quality of the generated images. The PSNR measures the ratio between the maximum possible pixel value and the error between the reference and generated images. The SSIM quantifies the degradation in image quality after processing by measuring the structural similarity between the reference and the enhanced image. Higher PSNR and SSIM values indicate better quality.
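For reference, PSNR can be computed directly from the mean squared error, and SSIM is available in common libraries such as scikit-image; the paper does not state which implementation was used, so the following is only an illustrative sketch.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference: np.ndarray, enhanced: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) for images normalized to [0, max_val]."""
    mse = np.mean((reference - enhanced) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim(reference: np.ndarray, enhanced: np.ndarray) -> float:
    """Structural similarity for (H, W, 3) images in [0, 1] (scikit-image >= 0.19)."""
    return structural_similarity(reference, enhanced, channel_axis=2, data_range=1.0)
```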

4. Results

The results are analyzed both qualitatively and quantitatively. In the qualitative results, visual appearance, color saturation, and distortion are considered. The PSNR and SSIM metrics are used in the quantitative analysis to evaluate the results.

4.1. Visual Results

4.1.1. The UIEB Dataset Results

The results on the UIEB dataset showed that the time-shift algorithm might increase the brightness of the raw images depending on the illumination conditions; however, it produced reasonable results in general. When the color saturation of the enhanced images was compared to the reference images, the supervised Tim-EN model generated slightly more reasonable images than the self-supervised Tim-EN and the time-shift algorithm. It should be noted that none of the methods produced red, yellow, or green color domination on the UIEB dataset.
The supervised Tim-EN method was more successful in obtaining natural color tones; however, it might fail to properly enhance extremely dark and low-contrast images. In general, the supervised Tim-EN model was superior in color correction and enhancing the visual appearance of the degraded images compared to the time-shift algorithm and self-supervised Tim-EN model. Figure 4 presents sample visual results of the UIEB dataset.

4.1.2. The EUVP Underwater Scene Dataset Results

The experiments and results from the EUVP underwater scenes dataset demonstrated that each method produced different color corrections. The time-shift algorithm and self-supervised Tim-EN model reconstructed similar images in terms of color saturation and visual appearance. In darker images, all methods produced brighter images than the reference images, which shows that the methods adequately preserve details within the image; however, this led to a loss of color saturation in some images.
Even though both self-supervised and supervised models achieved reasonable results in quantitative analysis, the supervised Tim-EN model improved the images’ visual appearances without noise and distortions. Figure 5 presents sample visual results of the EUVP underwater scenes dataset.

4.1.3. The Endo4IE Dataset Results

In the endoscopy dataset, the time-shift algorithm and the self-supervised Tim-EN model tended to increase the brightness of the raw images. This led to sharpening the details of the image; however, it reduced the similarity compared to the original reference images. Even though they produced reasonable results in terms of not producing under- or over-contrasted images, the supervised Tim-EN model was superior due to the extraction of features using both the time-shift algorithm and original referenced images. The resultant images of the supervised Tim-EN were almost identical to the reference images by visual inspection. Figure 6 presents sample visual results of the endoscopy dataset.

4.2. Quantitative Results

Table 2 shows the quantitative results for the UIEB dataset. The time-shift algorithm and the proposed self-supervised Tim-EN model achieved close results for both metrics, since the self-supervision of the model was obtained from the time-shift algorithm; nevertheless, the learnable parameters provided by the CNN improved the enhancement. Although both achieved high quantitative results, the supervised Tim-EN model was superior to both and achieved the highest SSIM (0.924) and PSNR (29.24) scores. The obtained results suggest that the supervised model is highly effective in providing enhanced images while preserving texture and eliminating noise.
Table 3 shows the quantitative results for the EUVP underwater scenes dataset. Similarly, the time-shift algorithm and the proposed self-supervised Tim-EN model achieved close results for both metrics. Even though the self-supervised Tim-EN model slightly outperformed the time-shift algorithm in both metrics, it is remarkable that the time-shift algorithm produced high scores without any learnable parameters. The supervised Tim-EN model obtained the highest quantitative results (SSIM: 0.925, PSNR: 29.19), showing that the supervised model is significantly superior to the others.
In the Endo4IE dataset, all methods and models produced SSIM results of over 0.90, which means they were reasonably effective in image enhancement. The time-shift algorithm obtained an SSIM score of 0.921, while self-supervised Tim-EN achieved 0.951. However, the supervised model achieved outstanding scores and produced 0.984 for SSIM and 31.62 for PSNR. Table 4 presents the quantitative results obtained on the Endo4IE dataset.

4.3. Ablation Study

An ablation study on the UIEB dataset was performed by removing the time-shift module from the supervised model in order to show the module's contribution. Additionally, we trained the proposed supervised model with each convolutional layer removed in turn to demonstrate the efficacy of the simple yet effective architecture. Table 5 shows the results obtained with the different CNN architectures and without the time-shift module; the removal of a layer is denoted as a three-layer CNN together with the removed layer.
Close results were obtained in the ablation study with or without the time-shift module using the different architectures. The results showed that combining the time-shift module with the four-layer CNN model produced superior results. However, improving the results obtained with the time-shift module could be possible by embedding it into more advanced and deep models, which would require further investigation.

4.4. Comparative Evaluation and Discussion

We performed comparative evaluations and analyses primarily on the UIEB dataset by considering state-of-the-art and recent methods, focusing on the quantitative results. Additionally, qualitative comparisons with recent state-of-the-art methods were performed to analyze at which points the methods effectively enhance images. Note that the spatial dimensions of the training images might affect the appearance of the outputs. Since the other datasets, the EUVP and the Endo4IE, consist of sub-datasets, performing a fully fair comparison is challenging; nevertheless, we evaluated state-of-the-art and recent methods on these datasets as well.

4.4.1. Comparisons on the UIEB Dataset

The qualitative comparisons were performed by considering the enhanced images obtained by the FUnIE-GAN [24] and Water-Net [23]. Water-Net tends to darken images while preserving the contrast compared to the reference images. Even though it might fail to reduce the color distortions in underwater images, it produces satisfactory results for human perception. FUnIE-GAN generally produced reasonable results but failed to remove color distortions properly, causing foggy results compared to the reference images. The TS-IEM performed well on color corrections of distinct patterns. Its main disadvantage was the loss of color information in indistinct patterns. This also affected the performance of the proposed self-supervised model, and even though the contrast of the resultant images was clearly enhanced, it included a similar loss in background color. Since the time-shift module contributed to the convergence of reference images in the supervised model, it produced more reasonable color correction and background enhancement in the UIEB dataset. However, the sharpened details were lost under different conditions of underwater images compared to Water-Net. Figure 7 presents sample qualitative results for the UIEB dataset image.
Table 6 presents the comparative results for the UIEB dataset. Water-Net [23], which was a baseline model for the UIEB dataset, achieved an SSIM score of 0.80. Li et al. [29] proposed a novel comparative learning framework, CLUIENet, to enhance underwater images. The aim was to provide multiple reference images during learning. The CLUIENet achieved SSIM and PSNR scores of 0.89 and 20.37, respectively. Fu and Cao [30] performed experiments on the UIEB dataset using global–local networks and compressed-histogram equalization. The proposed method achieved SSIM and PSNR scores of 0.88 and 18.57.
A two-stream structure of Ding et al. [19] obtained SSIM and PSNR scores of 0.91 and 22.92. Fu et al. [20] achieved SSIM and PSNR scores of 0.89 and 21.84. Khandouzi and Ezoji [18] achieved SSIM and PSNR scores of 0.88 and 21.10 on the UIEB dataset.
Compared to the recent and state-of-the-art methods, the time-shift algorithm produces reasonable results even though it cannot outperform the deep learning methods. Considering the self-supervision strategy of the self-supervised Tim-EN, the results showed that the proper reconstruction of the reference images using the raw images might provide effective target images and improve results. This might provide real-time use of the self-supervised model since there are several application areas where obtaining reference images is challenging.
The supervised Tim-EN model outperformed the state-of-the-art models in quantitative analysis by obtaining higher SSIM and PSNR scores on the UIEB dataset. The results of supervised Tim-EN suggest that embedding the enhanced raw images into the CNN module produces improved results with a minimized computational cost.

4.4.2. Comparisons on the EUVP Underwater Scenes Dataset

Berman et al. [31] proposed an underwater image enhancement method that considers the spectral characteristics of different water types. They aimed to reduce the problems of underwater images into a single dehazing problem by considering different color channels. The proposed method achieved PSNR and SSIM scores of 18.85 and 0.77.
Chang et al. [32] developed another UWI enhancement method based on a GAN. A channel attention mechanism was embedded into the U-Net, and the generator was used to estimate the parameters of the physical model. Their method achieved PSNR and SSIM scores of 26.68 and 0.79 on the EUVP dataset. Islam et al. [24] proposed a conditional GAN-based model, FUnIE-GAN, to enhance underwater images. An objective function was formulated to evaluate the perceptual image quality. The proposed method obtained successful results on the EUVP dataset with PSNR and SSIM scores of 21.92 and 0.88.
The time-shift image enhancement method achieved SSIM and PSNR scores of 0.84 and 27.68 on the underwater scenes subset of the EUVP dataset. Even though it achieved a reasonable SSIM score without any learnable parameters or training, it could not reach the scores obtained by the other deep learning methods; however, it produced a higher PSNR with reduced noise. The self-supervised Tim-EN model obtained higher scores (0.86 SSIM and 28.64 PSNR) than the time-shift algorithm, showing that it is capable of enhancing images without reference or paired images. The supervised Tim-EN model achieved successful results with SSIM and PSNR scores of 0.925 and 29.19. Embedding the time-shift module into the CNN improved the SSIM by 0.08 and 0.06 compared to the time-shift method and the self-supervised Tim-EN model, respectively. However, the use of different training and testing splits of the EUVP dataset makes direct comparison between studies challenging and requires further investigation.

4.4.3. Comparisons on the Endo4IE Dataset

The coarse-to-fine deep network [21] for exposure correction achieved PSNR and SSIM scores of 22.28 and 0.772 [33] on the overexposed and 23.06 and 0.760 on the underexposed images of the Endo4IE dataset. Espinosa et al. [33] improved the method by adding an SSIM loss module into the model and improved the results to SSIM scores of 0.806 and 0.792 for over and underexposed images. The superior results were obtained by Mou et al. [22]. Their model achieved PSNR and SSIM scores of 27.32 and 0.83 on the Endo4IE dataset.
Both the proposed self-supervised and supervised Tim-EN models achieved superior SSIM scores on the Endo4IE dataset, by margins of 0.03 and 0.04. Table 7 presents the comparative results of the recent studies and the proposed models.

5. Conclusions

Image enhancement requires prior knowledge of the raw images to produce properly enhanced results. The limitations of conventional methods led to the development of deep-learning-based methods to overcome them. However, the characteristics of different problem domains also require additional improvements and modifications of the models to produce reasonable results. Therefore, deep-learning-based methods generally have limited application areas and might fail to produce optimal images in domains different from those in which they were trained.
This paper proposed two image-enhancement networks, self-supervised and supervised, with a time-shift image enhancement module, to provide effective models for different kinds of image enhancement applications. The self-supervised model was trained using reference images generated by the time-shift module, whereas the supervised model used the knowledge obtained from the time-shift module to enhance the raw inputs and improve the training process.
The proposed models were tested on two challenging underwater image datasets and a medical image dataset to demonstrate their efficacy in different problem domains. The results and the comparative study showed that the proposed supervised model outperformed the state-of-the-art models. In addition, the self-supervised model produced successful results without original reference images and could be used effectively with the non-referenced images.
Future work will include adding a learnable parameter in the time-shift module to improve the performance of the models. Additionally, we will consider training on all the considered datasets with additional datasets to test the proposed models’ enhancement ability with external datasets.

Author Contributions

Conceptualization, K.T.; methodology, K.T. and B.S.; software, K.T.; validation, K.T. and B.S.; formal analysis, K.T., B.S. and R.A.; data curation, K.T.; writing—original draft preparation, K.T.; writing—review and editing, K.T., B.S. and R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The considered UIEB [23], EUVP [24], and Endo4IE [25] datasets are available at https://doi.org/10.1109/TIP.2019.2955241, https://doi.org/10.1109/LRA.2020.2974710, and https://doi.org/10.17632/3j3tmghw33.1, respectively. The UIEB dataset was accessed on 1 August 2023; the EUVP and Endo4IE datasets were accessed on 8 February 2024. The results and the code of the Time-Shift Image Enhancement Method are available at https://github.com/BoranSekeroglu/Time_Shift_Enhancement (accessed on 1 January 2020).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sekeroglu, B. Time-shift image enhancement method. Image Vis. Comput. 2023, 138, 104810. [Google Scholar] [CrossRef]
  2. Zuiderveld, K. Contrast limited adaptive histogram equalization. In Graphics Gems; Academic Press Inc.: Cambridge, MA, USA, 1994; pp. 474–485. [Google Scholar]
  3. Kim, Y. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Trans. Consum. Electron. 1997, 43, 1–8. [Google Scholar]
  4. Wan, Y.; Chen, Q.; Zhang, B. Image enhancement based on equal area dualistic subimage histogram equalization method. IEEE Trans. Consum. Electron. 1997, 45, 68–75. [Google Scholar]
  5. Huang, S.C.; Cheng, F.C.; Chiu, Y.S. Efficient contrast enhancement using adaptive gamma correction with weighting distribution. IEEE Trans. Image Process. 2013, 22, 1032–1041. [Google Scholar] [CrossRef]
  6. Li, Y.; Zhu, C.; Peng, J.; Bian, L. Fusion-based underwater image enhancement with category-specific color correction and dehazing. Opt. Express 2022, 30, 33826–33841. [Google Scholar] [CrossRef] [PubMed]
  7. Saleem, A.; Beghdadi, A.; Boashash, B. Image fusion-based contrast enhancement. EURASIP J. Image Video Process 2012, 2012, 10. [Google Scholar] [CrossRef]
  8. Awan, H.S.A.; Mahmood, M.T. Underwater Image Restoration through Color Correction and UW-Net. Electronics 2024, 13, 199. [Google Scholar] [CrossRef]
  9. Yang, H.; Tian, F.; Qi, Q.; Wu, Q.M.J.; Li, K. Underwater image enhancement with latent consistency learning-based color transfer. IET Image Process. 2022, 16, 1594–1612. [Google Scholar] [CrossRef]
  10. Huang, L.; Cao, G.; Yu, L. Efficient contrast enhancement with truncated adaptive gamma correction. In Proceedings of the 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Datong, China, 15–17 October 2016; pp. 189–194. [Google Scholar] [CrossRef]
  11. Chang, Y.; Jung, C.; Ke, P.; Song, H.; Hwang, J. Automatic contrast-limited adaptive histogram equalization with dual gamma correction. IEEE Access 2018, 6, 11782–11792. [Google Scholar] [CrossRef]
  12. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2018, 27, 379–393. [Google Scholar] [CrossRef]
  13. Zhou, J.; Zhang, D.; Zhang, W. Underwater image enhancement method via multifeature prior fusion. Appl. Intell. 2022, 52, 16435–16457. [Google Scholar] [CrossRef]
  14. Fu, X.; Zeng, D.; Huang, Y.; Liao, Y.; Ding, X.; Paisley, J. A fusion-based enhancing method for weakly illuminated images. Signal Process. 2016, 129, 82–96. [Google Scholar] [CrossRef]
  15. Zhang, W.; Zhuang, P.; Sun, H.H.; Li, G.; Kwong, S.; Li, C. Underwater Image Enhancement via Minimal Color Loss and Locally Adaptive Contrast Enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010. [Google Scholar] [CrossRef]
  16. Raveendran, S.; Patil, M.D.; Birajdar, G.K. Underwater image enhancement: A comprehensive review, recent trends, challenges and applications. Artif. Intell. Rev. 2021, 54, 5413–5467. [Google Scholar] [CrossRef]
  17. Sun, S.; Wang, H.; Zhang, H.; Li, M.; Xiang, M.; Luo, C.; Ren, P. Underwater Image Enhancement with Reinforcement Learning. IEEE J. Ocean. Eng. 2024, 49, 249–261. [Google Scholar] [CrossRef]
  18. Khandouzi, A.; Ezoji, M. Coarse-to-fine underwater image enhancement with lightweight CNN and attention-based refinement. J. Vis. Commun. Image Represent. 2024, 99, 104068. [Google Scholar] [CrossRef]
  19. Ding, D.; Gan, S.; Chen, L.; Wang, B. Learning-based underwater image enhancement: An efficient two-stream approach. Displays 2023, 76, 102337. [Google Scholar] [CrossRef]
  20. Fu, Z.; Wang, W.; Huang, Y.; Ding, X.; Ma, K.K. Uncertainty inspired underwater image enhancement. In Proceedings of the 2022 European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022. [Google Scholar]
  21. Afifi, M.; Derpanis, K.; Ommer, B.; Brown, M. Learning Multi-Scale Photo Exposure Correction. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9153–9163. [Google Scholar] [CrossRef]
  22. Mou, E.; Wang, H.; Yang, M.; Cao, E.; Chen, Y.; Ran, C.; Pang, Y. Global and Local Enhancement of Low-light Endoscopic Images. Preprints 2023, 2023111954. [Google Scholar] [CrossRef]
  23. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef]
  24. Islam, M.J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  25. Garcia-Vega, A.; Ochoa, G.; Espinosa, R. Endoscopic real-synthetic over- and underexposed frames for image enhancement. Mendeley Data 2022, 1. [Google Scholar] [CrossRef]
  26. Azat, H.S.; Sekeroglu, B.; Dimililer, K. A Pre-study on the Layer Number Effect of Convolutional Neural Networks in Brain Tumor Classification. In Proceedings of the 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Kocaeli, Turkey, 25–27 August 2021; pp. 1–6. [Google Scholar] [CrossRef]
  27. Sadikoglu, F.; Alpan, K.; Sekeroglu, B. Defect Detection of Casting Products Using Convolutional Neural Network. In Proceedings of the 12th World Conference on Intelligent System for Industrial Automation (WCIS-2022), Tashkent, Uzbekistan, 25–26 November 2022; Springer: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
  28. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  29. Li, K.; Wu, L.; Qi, Q.; Liu, W.; Gao, X.; Zhou, L.; Song, D. Beyond Single Reference for Training: Underwater Image Enhancement via Comparative Learning. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 2561–2576. [Google Scholar] [CrossRef]
  30. Fu, X.; Cao, X. Underwater image enhancement with global–local networks and compressed-histogram equalization. Signal Process. Image Commun. 2020, 86, 115892. [Google Scholar] [CrossRef]
  31. Berman, D.; Levy, D.; Avidan, S.; Treibitz, T. Underwater single image color restoration using haze-lines and a new quantitative dataset. arXiv 2018, arXiv:1811.01343. [Google Scholar] [CrossRef]
  32. Chang, S.; Gao, F.; Zhang, Q. Underwater Image Enhancement Method Based on Improved GAN and Physical Model. Electronics 2023, 12, 2882. [Google Scholar] [CrossRef]
  33. Espinosa, R.; Garcia-Vega, A.; Ochoa-Ruiz, G. Multi-Scale Structural-aware Exposure Correction for Endoscopic Imaging. In Proceedings of the LatinX in AI at Computer Vision and Pattern Recognition Conference 2023, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar] [CrossRef]
Figure 1. Time-shift image enhancement procedure, t = 1 to t = 10 : generated spacetime events for T = 10 . ζ d is the reconstructed single time-shift image, and the final image is the gamma-corrected image E ( T = 10 ). Image courtesy of Sekeroglu [1].
Figure 2. Proposed self-supervised CNN model with time-shift module.
Figure 3. Proposed supervised CNN model with time-shift module.
Figure 4. Visual results on the UIEB dataset: (a) original input images, (b) TS-IEM, (c) proposed self-supervised model, (d) proposed supervised model, and (e) reference images.
Figure 5. Visual results on the EUVP underwater scenes dataset, (a) original input images, (b) TS-IEM, (c) proposed self-supervised model, (d) proposed supervised model, and (e) reference images.
Figure 6. Visual results on the Endo4IE dataset [25], (a) original input images, (b) TS-IEM, (c) proposed self-supervised model, (d) proposed supervised model, and (e) reference images.
Figure 7. Qualitative comparisons on the UIEB dataset: (a) raw image, (b) Water-Net [23], (c) FUnIE-GAN [24], (d) TS-IEM [1], (e) proposed self-supervised model, (f) proposed supervised model, and (g) reference image.
Table 1. Characteristics of the considered datasets.

Dataset                        Type         No. of Images   Problem Domain
UIEB [23]                      Referenced   890             Underwater image enhancement
EUVP underwater scenes [24]    Referenced   2185            Underwater image enhancement
Endo4IE [25]                   Referenced   4432            Exposure errors in endoscopic examinations
Table 2. Quantitative results on the UIEB dataset.

Metric   TS-IEM [1]   Proposed Self-Supervised Model   Proposed Supervised Model
SSIM     0.884        0.898                            0.924
PSNR     28.88        29.14                            29.24
Table 3. Quantitative results on the EUVP underwater scenes dataset.

Metric   Time-Shift Algorithm   Proposed Self-Supervised Model   Proposed Supervised Model
SSIM     0.848                  0.864                            0.925
PSNR     27.68                  28.64                            29.19
Table 4. Quantitative results on the Endo4IE dataset.

Method                           SSIM (Over)   PSNR (Over)   SSIM (Under)   PSNR (Under)
TS-IEM [1]                       0.82          26.11         0.81           25.23
Proposed Self-Supervised Model   0.88          29.11         0.86           27.23
Proposed Supervised Model        0.91          29.87         0.90           29.24
Table 5. Ablation study results on the UIEB dataset.

Model                                           SSIM    PSNR
4-layer CNN w/out Time-Shift Module             0.852   28.60
3-layer CNN + Time-Shift Module w/out lr-32     0.853   28.88
3-layer CNN + Time-Shift Module w/out lr-64     0.841   28.55
3-layer CNN + Time-Shift Module w/out lr-128    0.838   28.34
Proposed Supervised Model                       0.924   29.24
Table 6. Comparative results on the UIEB dataset.

Method                           SSIM   PSNR
CLUIENet [29]                    0.89   20.37
Ding et al. [19]                 0.91   22.92
Fu and Cao [30]                  0.88   18.57
Water-Net [23]                   0.80   17.30
Fu et al. [20]                   0.89   21.84
Khandouzi and Ezoji [18]         0.88   21.10
TS-IEM [1]                       0.88   28.88
Proposed Self-Supervised Model   0.89   29.14
Proposed Supervised Model        0.92   29.24
Table 7. Quantitative comparisons on the Endo4IE dataset.

Method                           SSIM (Over)   PSNR (Over)   SSIM (Under)   PSNR (Under)
Afifi et al. [21]                0.77          22.28         0.76           23.06
Espinosa et al. [33]             0.80          23.13         0.79           24.20
Mou et al. [22]                  0.83 / 27.32 (reported for the whole dataset)
TS-IEM [1]                       0.82          26.11         0.81           25.23
Proposed Self-Supervised Model   0.88          29.11         0.86           27.23
Proposed Supervised Model        0.91          29.87         0.90           29.24
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
