Article

FQ-UWF: Unpaired Generative Image Enhancement for Fundus Quality Ultra-Widefield Retinal Images

1 Department of Electrical and Computer Engineering, Automation and Systems Research Institute (ASRI), Seoul National University, Seoul 08826, Republic of Korea
2 Department of Ophthalmology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Republic of Korea
3 Biomedical Institute for Convergence (BICS), Sungkyunkwan University, Suwon 16419, Republic of Korea
4 School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea
5 Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Bioengineering 2024, 11(6), 568; https://doi.org/10.3390/bioengineering11060568
Submission received: 10 May 2024 / Revised: 1 June 2024 / Accepted: 1 June 2024 / Published: 4 June 2024
(This article belongs to the Special Issue AI and Big Data Research in Biomedical Engineering)

Abstract

Ultra-widefield (UWF) retinal imaging stands as a pivotal modality for detecting major eye diseases such as diabetic retinopathy and retinal detachment. However, UWF exhibits a well-documented limitation in terms of low resolution and artifacts in the macular area, thereby constraining its clinical diagnostic accuracy, particularly for macular diseases like age-related macular degeneration. Conventional supervised super-resolution techniques aim to address this limitation by enhancing the resolution of the macular region through the utilization of meticulously paired and aligned fundus image ground truths. However, obtaining such refined paired ground truths is a formidable challenge. To tackle this issue, we propose an unpaired, degradation-aware, super-resolution technique for enhancing UWF retinal images. Our approach leverages recent advancements in deep learning: specifically, by employing generative adversarial networks and attention mechanisms. Notably, our method excels at enhancing and super-resolving UWF images without relying on paired, clean ground truths. Through extensive experimentation and evaluation, we demonstrate that our approach not only produces visually pleasing results but also establishes state-of-the-art performance in enhancing and super-resolving UWF retinal images. We anticipate that our method will contribute to improving the accuracy of clinical assessments and treatments, ultimately leading to better patient outcomes.

1. Introduction

Ultra-widefield (UWF) retinal images have emerged as a revolutionary modality in ophthalmology [1,2]. As depicted in Figure 1, UWF provides an extensive field of view that enables the visualization of both central and peripheral retinal areas. This enables early detection and monitoring of peripheral retinal conditions that are often missed in standard fundus images. However, various artifacts, low macular area resolution, large data size, and lack of interpretation standardization act as impediments to widespread clinical use of UWF images.
Image enhancement techniques have the potential to improve UWF image quality, empowering healthcare professionals to make more accurate diagnoses and treatment plans. Ophthalmologists may better detect subtle early changes in the macular area and identify peripheral early signs of disease, leading to better patient outcomes. However, because UWF images contain multiple degradation factors scattered throughout the fundus in a complex manner, image enhancement is a significant challenge. Many recent image enhancement techniques are based on supervised learning and require a ground truth (GT) dataset of well-aligned low- and high-quality image pairs for training. Acquiring such a paired dataset is particularly difficult for UWF, where precise alignment between image pairs is extremely hard to achieve.
The application of deep learning algorithms has facilitated promising results in a wide range of image enhancement tasks, including super-resolution, image denoising, and image deblurring [3]. A variety of methods tailored for enhancement of retinal fundus images have also been proposed [4,5]. These methods can automatically learn and apply complex transformations to improve the visualization of critical structures such as blood vessels, the optic disc, and the macula. Despite the necessity, there has yet to be a comprehensive deep-learning-based enhancement method for UWF images.
We thus propose a comprehensive image enhancement method for UWF images, with the specific goal of improving the quality of conventional fundus images. Figure 2 presents sample results of the proposed method. As image quality can be subjective, we compare manual annotations of drusen from fundus images and UWF images after applying our enhancement method. Experimental evaluation demonstrates that the similarity between annotations after enhancement is considerably improved compared to annotations made on images before enhancement. Quantitative measurements of image quality are also assessed, demonstrating state-of-the-art results on several datasets. Based on our goal and the experimental findings, we refer to the enhanced images as fundus quality (FQ)-UWF images. We believe that our approach has the potential to improve the accuracy of clinical assessments and treatments, ultimately leading to better patient outcomes.
The proposed method is based on the generative adversarial network (GAN) framework to avoid the requirement of pairs of aligned high-quality images in pixelwise supervision. We employ a dual-GAN structure to jointly perform super-resolution, enhancing the low resolution of the macula in UWF, which has a critical impact on clinical practice. As image pairs are not required, training data are acquired by simply collecting sets of UWF and fundus images. We also incorporate appropriate attention mechanisms in the network for enhancement with regard to various degradations such as noise, blurring, and artifacts scattered throughout the UWF.
We summarize our contributions as follows:
  • We establish a method for UWF image enhancement and super-resolution from unpaired UWF and fundus image sets. We evaluate the clinical utility in the context of detecting and localizing drusen in the macula.
  • We propose a novel dual-GAN network architecture capable of effectively addressing diverse degradations in the retina while simultaneously enhancing the resolution of UWF images.
  • The proposed method is designed to be trained on unpaired sets of UWF and fundus images. We further present a corresponding multi-step training scheme that combines transfer learning and end-to-end dataset adaptation, leading to enhanced performance in both quantitative and qualitative evaluations.

2. Related Works

2.1. Retinal Image Enhancement

Owing to the relatively invariable appearance of retinal images, methods based on traditional image processing techniques continue to be proposed [6,7]. However, the majority of recent methods leverage deep neural networks [5,8], and GANs in particular [4].
Pham and Shin [9] considered additional factors such as drusen segmentation masks to not only improve image quality but also preserve crucial disease information during the enhancement process, addressing a common challenge in existing image enhancement techniques. To overcome the challenges of constructing a clean true ground truth (GT) dataset for retinal image data, particularly due to factors such as alignment, Yang et al. [4] introduced an unpaired image generation method for enhancing low-quality retinal fundus images. Lee et al. [5] proposed an attention module designed to automatically enhance low-quality retinal fundus images afflicted by complex degradation based on the specific nature of their degradation.

2.2. Blind and Unpaired Image Restoration

Blind image restoration is a computational process aimed at enhancing or recovering degraded images without prior knowledge of the degradation model or its parameters. Traditionally, methods for blind image restoration have estimated the parameters of the degradation model [10] or the degradation kernels [11]. More recently, the trend has been to directly generate high-quality images with deep learning models [12]. Shocher et al. [13] performed super-resolution without relying on training examples at the target resolution during the model's training phase. Yu et al. [14] proposed a blind image restoration toolchain for multiple tasks based on reinforcement learning.
Unpaired image restoration focuses on learning the difference between pairs of image domains rather than pairs of individual images. Multiple methods using GAN-based models [15] have been proposed [16,17] to learn the mapping between the low-quality and high-quality images while also incorporating a cycle-consistency constraint [18] to improve the quality of the generated images.

2.3. Hierarchical or Multi-Structured GAN

Recently, there has been significant progress in mitigating the instability associated with GAN training, leading to various approaches that connect two or more GANs for joint learning. Several works showed stable translation between two different image domains using coupled-GAN architectures [19], and further works extended their usage to multiple domains or modalities [20,21]. This approach has also been extended beyond random image generation to tasks such as image restoration [16], and more complex architectures have been explored as well [22].

2.4. Transfer Learning for GANs

Pre-trained GAN models have demonstrated considerable efficacy across various computer vision tasks, particularly in scenarios characterized by limited training data [23,24]. Typically trained on extensive datasets comprising millions of annotated images, these models offer a foundation of learned features. Through the process of fine-tuning on novel datasets, one can capitalize on these pre-trained features, leading to the attainment of state-of-the-art performance across a diverse spectrum of tasks.
Early works confirmed successful generation in a new domain by transferring a pre-trained GAN to a new dataset [25,26]. Other works enabled transfer learning for GANs with datasets of limited size [27,28]. Li et al. [29] proposed an optimization method for GAN transfer learning that is free from biases towards specific classes and resilient to mode collapse, achieved by fine-tuning only the class-embedding layer of the GAN architecture. Mo et al. [30] proposed fixing the lower layers of the discriminator, partitioning it into a feature extractor and a classifier, and fine-tuning only the classifier. Fregier and Gouray [26] performed transfer learning for GANs on a new dataset by freezing the low-level layers of the encoder, thereby preserving pre-trained knowledge to the maximum extent possible.

3. Methods

3.1. Overview of FQ-UWF Generation

To obtain the final enhanced FQ-UWF result I_FQ-UWF, we split the process of FQ-UWF generation into two steps: (i) degradation enhancement (DE) and (ii) super-resolution (SR). Figure 3 presents a visual overview of the framework. The order of the two steps is chosen to maximize the quality of the output FQ-UWF images. The generator networks of the two steps, denoted G_DE and G_SR, respectively, are coupled with adversarial discriminator networks D_DE and D_SR, which are designed to enforce that the generators' output images share the image characteristics of the fundus images in the training set.
G_DE performs degradation enhancement on the input image I_UWF to produce I_E-UWF. Training of G_DE is guided by D_DE so that the D_DE output scores are similar for the pair of I_E-UWF and I_DS-fundus, where I_DS-fundus is a ×4 bicubically downsampled version of I_fundus. Conversely, D_DE is trained so that the scores for this pair of images differ as much as possible.
G_SR performs ×4 super-resolution on I_E-UWF to produce I_FQ-UWF. G_SR and D_SR are trained in the same manner as G_DE and D_DE, respectively, using the pair of I_FQ-UWF and I_fundus. For D_SR, we also impose cyclic constraints, as in [18,31], by applying the G_SR operation not only to I_E-UWF but also to I_DS-fundus. For each module, we empirically determined appropriate network architectures; the following subsections describe the details of each module.
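At a high level, inference reduces to chaining the two generators, while ×4 bicubic downsampling of real fundus images provides the reference domain for D_DE. The following is a minimal PyTorch-style sketch of this pipeline; the module names (g_de, g_sr) and tensor conventions are our own placeholders for the architectures detailed in Section 3.2, not the exact implementation.

```python
import torch
import torch.nn.functional as F

def generate_fq_uwf(uwf_image: torch.Tensor, g_de: torch.nn.Module, g_sr: torch.nn.Module) -> torch.Tensor:
    """Two-stage FQ-UWF generation: degradation enhancement followed by x4 super-resolution."""
    with torch.no_grad():
        enhanced = g_de(uwf_image)   # I_E-UWF: same spatial size as the input, degradations suppressed
        fq_uwf = g_sr(enhanced)      # I_FQ-UWF: x4 up-scaled, fundus-quality output
    return fq_uwf

def downsample_fundus(fundus_image: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """I_DS-fundus: x4 bicubic downsampling of a real fundus image, used as the
    reference domain for the degradation-enhancement discriminator D_DE."""
    return F.interpolate(fundus_image, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
```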

3.2. Architecture Details

3.2.1. G_DE

We apply U-net [32] as the base architecture, as U-net has been proven to be effective for medical image enhancement [33]. Within the encoder–decoder structure of U-net, we embed attention modules to better enhance local degradation or artifacts scattered throughout the input image. We apply the attention layer structure proposed by [5], as it has been demonstrated to be effective for retinal image enhancement. The network structure is depicted in the top row of Figure 4.
The Conv box comprises a 3 × 3 convolutional layer that reduces the spatial size of the feature to 1/4 (both the height and the width are halved) and doubles the channel dimension. The Deconv box comprises a 3 × 3 deconvolutional layer that quadruples the spatial size of the feature (both the height and the width are doubled) and halves the channel dimension. The attention (Att) box comprises sequentially connected batch normalization, activation, an operation-wise attention module, and activation, where the operation-wise attention module enables the degradations to be better attended to.
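A minimal sketch of such an encoder–decoder generator is given below, assuming three scales and a simple channel-gating block as a stand-in for the operation-wise attention module of [5]; the layer counts, kernel choices, and attention internals are illustrative assumptions rather than the exact G_DE configuration.

```python
import torch
import torch.nn as nn

class AttBlock(nn.Module):
    """Stand-in for the Att box: BN -> activation -> attention -> activation.
    A squeeze-and-excitation style channel gate replaces the original
    operation-wise attention module for brevity."""
    def __init__(self, ch: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.act(self.bn(x))
        return self.act(x * self.gate(x))

class GDE(nn.Module):
    """U-Net-style degradation-enhancement generator: each Conv halves H and W and
    doubles the channels; each Deconv doubles H and W and halves the channels;
    skip connections link encoder and decoder stages."""
    def __init__(self, base: int = 64):
        super().__init__()
        self.act = nn.ReLU(inplace=True)
        self.stem = nn.Conv2d(3, base, 3, padding=1)
        self.enc1 = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)      # H/2 x W/2
        self.enc2 = nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1)  # H/4 x W/4
        self.att = AttBlock(base * 4)
        self.dec2 = nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(base * 4, base, 4, stride=2, padding=1)
        self.head = nn.Conv2d(base * 2, 3, 3, padding=1)

    def forward(self, x):
        s0 = self.act(self.stem(x))
        s1 = self.act(self.enc1(s0))
        s2 = self.att(self.enc2(s1))
        d2 = self.act(self.dec2(s2))
        d1 = self.act(self.dec1(torch.cat([d2, s1], dim=1)))
        return self.head(torch.cat([d1, s0], dim=1))
```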

3.2.2. G_SR

The network structure is depicted in the middle row of Figure 4. The FeatureExtractor box comprises a 3 × 3 convolutional layer followed by activation. The Conv+BN box comprises a 3 × 3 convolutional layer followed by batch normalization. The Conv+Shuffle box comprises a 3 × 3 convolutional layer followed by a pixel shuffler that doubles both the height and the width of the feature. Channel calibration reduces the channel dimension of the feature to three while maintaining its spatial dimensions. The Residual Block comprises a series of Conv+BN, activation, and Conv+BN layers with a residual connection for element-wise summation. We note that this structure is adopted from [15].
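The corresponding generator can be sketched as follows in PyTorch; the block count and activation choice (PReLU, as in [15]) are assumptions for illustration rather than the exact configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv+BN -> activation -> Conv+BN, with an element-wise residual sum."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

class GSR(nn.Module):
    """SRGAN-style x4 super-resolution generator: feature extractor, a stack of
    residual blocks, two Conv+PixelShuffle(x2) stages, and a final channel
    calibration back to three channels."""
    def __init__(self, ch: int = 64, n_blocks: int = 16):
        super().__init__()
        self.extract = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.PReLU())
        self.blocks = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU(),  # x2
            nn.Conv2d(ch, ch * 4, 3, padding=1), nn.PixelShuffle(2), nn.PReLU())  # x4 overall
        self.calibrate = nn.Conv2d(ch, 3, 3, padding=1)  # channel calibration to RGB

    def forward(self, x):
        feat = self.extract(x)
        return self.calibrate(self.upsample(self.blocks(feat) + feat))
```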
Figure 4. The detailed structure of the generators and discriminators. The detailed structures of the generators G_DE and G_SR and of the discriminator architecture shared between D_DE and D_SR are illustrated. Note that even though D_DE and D_SR use the same structure, they are fundamentally distinct discriminative networks.

3.2.3. D_DE and D_SR

The structures of the discriminator models D_DE and D_SR are depicted in Figure 4. The FeatureExtractor box comprises a 3 × 3 convolutional layer followed by activation. The Conv+BN box comprises a 3 × 3 convolutional layer followed by batch normalization. The Conv Block comprises a series of Conv+BN and activation layers. The final layer of the network is a score function for evaluating the similarity of the input images, with a Dense layer that reduces the feature to a single scalar score value. We follow the structure of the discriminator in [15] for D_DE. The inputs for D_DE are pairs of downsampled real fundus images I_DS-fundus and generated enhanced low-resolution UWF images I_E-UWF. The inputs for D_SR are pairs of real fundus images I_fundus and generated FQ-UWF images I_FQ-UWF.
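A minimal sketch of the shared discriminator structure is given below, assuming an SRGAN-like stack of strided Conv+BN blocks and a sigmoid-bounded scalar score; the depth and channel widths are illustrative.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Structure shared by D_DE and D_SR (the two are trained as separate
    instances): feature extractor, strided Conv+BN+activation blocks, and a
    dense head producing a single realism score per image."""
    def __init__(self, ch: int = 64):
        super().__init__()
        layers = [nn.Conv2d(3, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True)]
        in_ch = ch
        for out_ch in (ch, ch * 2, ch * 4, ch * 8):
            layers += [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                       nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(ch * 8, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x))  # scalar score in (0, 1) per image
```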

3.3. Loss Functions and Training Details

Given that end-to-end training of an architecture composed of multiple networks is highly challenging, we train the full architecture in three steps: (i) G_DE training, (ii) G_SR training, and (iii) overall fine-tuning.

3.3.1. G_DE Training

We first impose an adversarial loss on G_DE and D_DE as follows:
L_L = E_{x ~ I_DS-fundus}[log D_DE(x)] + E_{z ~ I_UWF}[log(1 − D_DE(G_DE(z)))].
The identity mapping loss is important when performing tasks such as super-resolution or enhancement, as it helps to maintain the style (color, structure, etc.) of the source domain’s image while applying the target domain’s information [18]. Thus, we use the loss function defined as:
L_I = E_{z ~ I_UWF}[ || G_DE(z) − z || ].
We additionally impose an L2 regularization loss [34] L_R on the weights of G_DE; when starting from a G_DE pre-trained on other datasets, this term retains the pre-trained knowledge by preventing abrupt changes to the weights. Finally, the loss function L_E that adapts G_DE to the fundus–UWF retinal image dataset is defined as:
L_E = L_L + λ_I L_I + λ_R L_R,
where λ_I and λ_R control the relative importance of L_I and L_R, respectively.
For more efficient adversarial training, we initialize the network parameters by pretraining as in [5]. We then freeze the encoder parameters and update only the decoder parameters.
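A minimal sketch of this generator-side objective is given below, assuming a binary cross-entropy form of the adversarial term, an L1 identity term, and an L2 penalty towards a frozen copy of the pre-trained weights; the helper names and the encoder-parameter prefixes are assumptions tied to the GDE sketch above, not the exact implementation.

```python
import torch
import torch.nn.functional as F

def gde_generator_loss(g_de, d_de, uwf_batch, pretrained_params, lambda_i=0.5, lambda_r=0.1):
    """Generator-side loss L_E = L_L + lambda_I * L_I + lambda_R * L_R (sketch).
    `pretrained_params` is a detached copy of the pre-trained G_DE weights; the
    L2 term discourages drifting away from them during dataset adaptation."""
    enhanced = g_de(uwf_batch)                                    # I_E-UWF
    real_target = torch.ones(uwf_batch.size(0), 1, device=uwf_batch.device)
    adv = F.binary_cross_entropy(d_de(enhanced), real_target)     # push D_DE(G_DE(z)) towards "real"
    identity = F.l1_loss(enhanced, uwf_batch)                     # keep the style of the source UWF image
    reg = sum(((p - p0) ** 2).sum() for p, p0 in zip(g_de.parameters(), pretrained_params))
    return adv + lambda_i * identity + lambda_r * reg

# Freeze the encoder and adapt only the decoder (parameter names follow the GDE sketch above):
# pretrained_params = [p.detach().clone() for p in g_de.parameters()]
# for name, p in g_de.named_parameters():
#     if name.startswith(("stem", "enc")):
#         p.requires_grad = False
```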

3.3.2. G_SR Training

In this step, we freeze all trainable parameters of G_DE and use it to generate I_E-UWF from I_UWF. After the adaptation process for G_DE is complete, we apply an adversarial loss to G_SR, which takes I_E-UWF from G_DE as input and outputs the FQ-UWF result I_FQ-UWF. The loss is defined as:
L_H = E_{x ~ I_fundus}[log D_SR(x)] + E_{z ~ I_E-UWF}[log(1 − D_SR(G_SR(z)))].
We also impose a cycle constraint [18], which maintains consistency between the two domains and yields more realistic and coherent image translations along the path I_fundus → I_DS-fundus → I_FQ-UWF. This can be denoted as follows:
L_C = E_{x ~ I_fundus}[ || G_SR(DS(x)) − x || ],
where DS(·) denotes the ×4 bicubic downsampling that produces I_DS-fundus. As noted in [17], applying a one-way cycle loss allows the network to handle various degradations by opening up the possibility of a one-to-many generation mapping.
Overall, the loss function for G_SR training is expressed as follows:
L_R = L_H + λ_C L_C,
where λ_C controls the relative importance of L_C.
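The SR-stage generator objective can be sketched as follows, again assuming a binary cross-entropy adversarial term and an L1 one-way cycle term; the downsampling operator reuses the ×4 bicubic interpolation shown earlier.

```python
import torch
import torch.nn.functional as F

def gsr_generator_loss(g_sr, d_sr, enhanced_uwf, fundus, lambda_c=0.5):
    """Generator-side loss L_R = L_H + lambda_C * L_C for the SR stage (sketch).
    The one-way cycle term downsamples a real fundus image (I_DS-fundus), pushes
    it back through G_SR, and compares the result against the original I_fundus."""
    fq_uwf = g_sr(enhanced_uwf)                                   # I_FQ-UWF, x4 up-scaled
    real_target = torch.ones(fq_uwf.size(0), 1, device=fq_uwf.device)
    adv = F.binary_cross_entropy(d_sr(fq_uwf), real_target)
    ds_fundus = F.interpolate(fundus, scale_factor=0.25, mode="bicubic", align_corners=False)
    cycle = F.l1_loss(g_sr(ds_fundus), fundus)
    return adv + lambda_c * cycle
```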

3.3.3. Overall Fine-Tuning

In the previous training steps, G_DE and G_SR are trained independently. To ensure stability and integration between the two generators, a final calibration of the entire architecture is therefore performed. Additionally, to improve the network's performance in clinical situations, where the diagnosis of lesions is mainly based on the macular region rather than the periphery of the fundus, we employ the same loss combination while using only patches from the macular region to fine-tune the entire model:
L_M = L_E + L_R.

4. Experiments

4.1. Datasets and Settings

We used 3744 UWF images and 3744 fundus images acquired from the Kangbuk Samsung Medical Center (KBSMC) Ophthalmology Department from 2017 to 2019. Although UWF and fundus images were acquired in pairs, we anonymized and shuffled the image sets and did not use information of paired images during training. To train the model proposed in this paper, we used 3370 UWF and 3370 fundus images (unpaired). We set the scaling factor for super-resolution to 4, which was close to the approximate average difference in resolution between the UWF and fundus images. To test the model, we used 374 UWF images that were not used during training.

4.2. Implementation Details

We use the AdamW [35] optimizer with a learning rate of 1 × 10^-3, β1 = 0.9, β2 = 0.999, and ε = 10^-8 to train G_DE and G_SR, applying weight decay with a rate of 1 × 10^-2 every 100K iterations. The learning rate is halved every 200K iterations, the batch size is 16, and the model is trained for more than 5 × 10^6 iterations on an NVIDIA RTX 2080Ti GPU. We feed 128 × 128 patches of I_UWF and I_fundus that are randomly extracted from the UWF and fundus retinal images, respectively. During training, we apply additional dataset augmentation using rotation and flipping for I_UWF and I_fundus.
We set λ_I, λ_R, and λ_C, which adjust the relative importance of L_I, L_R, and L_C, to 0.5, 0.1, and 0.5, respectively.
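A minimal sketch of this training setup is shown below; the helper names are our own, and the scheduler is stepped once per iteration to realize the halving-every-200K schedule.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

def build_optimizer(model: torch.nn.Module):
    """AdamW with lr = 1e-3, betas (0.9, 0.999), eps = 1e-8, and weight decay 1e-2;
    the learning rate is halved every 200K iterations."""
    opt = AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2)
    sched = StepLR(opt, step_size=200_000, gamma=0.5)  # call sched.step() once per iteration
    return opt, sched

def random_patch(image: torch.Tensor, size: int = 128) -> torch.Tensor:
    """Random 128 x 128 crop with flip/rotation augmentation; `image` is (B, C, H, W)."""
    _, _, h, w = image.shape
    top = int(torch.randint(0, h - size + 1, (1,)))
    left = int(torch.randint(0, w - size + 1, (1,)))
    patch = image[:, :, top:top + size, left:left + size]
    if torch.rand(1).item() < 0.5:
        patch = torch.flip(patch, dims=[-1])                       # horizontal flip
    return torch.rot90(patch, k=int(torch.randint(0, 4, (1,))), dims=[-2, -1])
```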

4.3. Baselines for Comparison

We choose the following baselines to compare with the proposed method on the KBSMC dataset: (i) ZSSR [13], (ii) cycle-in-cycle GAN [36], (iii) KMSR [37], (iv) CinCGAN [16], and (v) RLrestore [14] + bicubic upsampling. We train these five baselines on the KBSMC dataset from scratch.

4.4. Evaluation Metrics

As we do not assume paired images for training, we avoid the use of reference-based metrics such as the PSNR [38] or SSIM [39] that require paired GTs. Instead, we measure the LPIPS [40] and the FID [41]. Both metrics indicate a closer distance between the two images when their values are smaller.
Additionally, given that retinal images contain various degradations, achieving sharp images is an important consideration, which we quantify with the sharpness metric γ [42,43]. A lower γ value implies a higher level of sharpness in the generated images and, therefore, higher performance. We further substantiate the statistical validity of our comparisons using two-sided tests: we first apply ANOVA [44] to ascertain whether there are significant differences in the means among groups and then apply Bonferroni's correction [45] to identify the specific groups in which differences exist. These analyses are reported as p-values.
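For reference, the perceptual metrics can be computed with commonly used open-source packages, as in the sketch below; the lpips and torchmetrics calls are standard APIs of those libraries, while the tensor preprocessing conventions are our own assumptions.

```python
import torch
import lpips                                                   # pip install lpips
from torchmetrics.image.fid import FrechetInceptionDistance    # pip install torchmetrics
from scipy.stats import f_oneway

# LPIPS: lower values mean the enhanced image is perceptually closer to the reference.
lpips_fn = lpips.LPIPS(net="alex")

def lpips_distance(img_a: torch.Tensor, img_b: torch.Tensor) -> float:
    """Both tensors are (B, 3, H, W) scaled to [-1, 1], per the lpips convention."""
    with torch.no_grad():
        return lpips_fn(img_a, img_b).mean().item()

# FID: distributional distance between generated FQ-UWF images and real fundus images.
fid = FrechetInceptionDistance(feature=2048)
# fid.update(real_uint8_batch, real=True); fid.update(generated_uint8_batch, real=False)
# fid_score = fid.compute()

def anova_p(*per_method_scores) -> float:
    """One-way ANOVA across per-image metric values of the compared methods; pairwise
    follow-up comparisons then apply a Bonferroni correction to their p-values."""
    return f_oneway(*per_method_scores).pvalue
```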
Furthermore, we measure the clinical impact of our method through a comparative evaluation of the visibility of drusen in the I_UWF images before enhancement, the I_FQ-UWF images after enhancement, and the I_fundus images. In this process, medical practitioners annotated drusen masks in the order I_UWF → I_FQ-UWF → I_fundus to minimize potential biases.

4.5. Experiments on the KBSMC Dataset

Figure 2 depicts samples of the enhancement by the proposed method. Improved clarity of vessel lines and background patterns can be observed.

4.5.1. Domain Distance Measurement Results

Table 1 shows the γ, LPIPS, and FID results of the comparison baselines and our method. The proposed method yields the best results in terms of the γ and LPIPS metrics and the second-best result in terms of the FID. Figure 5 shows the corresponding sample results before and after enhancement with the given methods. Visible improvements in the patterns of the vessels and the macula can be observed, which is corroborated by the γ values in Table 1. The p-values below 0.001 in the table indicate the statistical significance of our method in terms of LPIPS, FID, and γ.

4.5.2. Enhancement Results for Severe Degradations

Figure 6 compares various unpaired super-resolution methods and our method in the challenging scenario wherein the input image is corrupted with the following synthetic degradations: (i) Gaussian blur with σ = 7, where the image is degraded with a Gaussian blur kernel of size σ × σ as in [46]; (ii) illumination with γ = 0.75, where the brightness of the image is unevenly altered by gamma correction with exponent γ as in [47]; (iii) JPEG compression with rate = 0.25, where the compression ratio equals the rate, as in [48]; and (iv) bicubic downsampling with scale = 0.25, where the size of the interpolation neighborhood is scale × scale, as in [49]. Table 2 presents the corresponding results in terms of the γ, LPIPS, and FID metrics. Considering these results collectively, our method demonstrates the most consistent and effective improvement across the majority of degradation types.
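The four degradations can be reproduced approximately as in the sketch below; the exact mapping of the stated parameters to library arguments (e.g., interpreting rate = 0.25 as JPEG quality 25) is our own assumption.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, kind: str) -> Image.Image:
    """Synthetic degradations used in the robustness experiment (approximate sketch)."""
    if kind == "gaussian_blur":                    # sigma = 7
        return img.filter(ImageFilter.GaussianBlur(radius=7))
    if kind == "illumination":                     # gamma correction with gamma = 0.75
        arr = np.asarray(img).astype(np.float32) / 255.0
        return Image.fromarray((255.0 * np.power(arr, 0.75)).astype(np.uint8))
    if kind == "jpeg":                             # aggressive JPEG compression
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=25)
        buf.seek(0)
        return Image.open(buf).convert("RGB")
    if kind == "bicubic_down":                     # x0.25 bicubic downsampling
        w, h = img.size
        return img.resize((w // 4, h // 4), Image.BICUBIC)
    raise ValueError(f"unknown degradation: {kind}")
```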

4.5.3. Drusen Detection Results

Figure 7 presents samples of I_UWF, I_FQ-UWF, and I_fundus images with the corresponding manually annotated drusen region masks. Quantitative comparative evaluations of the drusen region masks for I_UWF and I_FQ-UWF are presented in Table 3. Taking the I_fundus drusen mask as the GT, we measure the mean average precision (mAP) as the intersection over union (IoU) [50] averaged over all images. The increase in mAP highlights the improved diagnostic capability provided by the enhanced I_FQ-UWF images.
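Concretely, this amounts to averaging the per-image IoU between each annotated mask and the fundus reference mask, as in the short sketch below (binary masks of equal size are assumed).

```python
import numpy as np

def mean_iou(pred_masks, gt_masks) -> float:
    """mAP as used above: the IoU between each drusen mask and its fundus-based
    ground-truth mask, averaged over all images."""
    scores = []
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        union = np.logical_or(pred, gt).sum()
        if union == 0:
            continue                                # skip images with no annotated drusen
        scores.append(np.logical_and(pred, gt).sum() / union)
    return float(np.mean(scores))
```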

4.6. Ablation Study

Table 1 also reports the results of method variations, such as using the pre-trained G_DE with or without adaptation through L_E, using G_DE and G_SR individually, and changing their configuration order. Using the pre-trained G_DE before super-resolution, even without adapting it through L_E, yields significantly better γ, LPIPS, and FID results than performing super-resolution alone. Adapting G_DE via L_E and then applying super-resolution leads to markedly superior results. Moreover, the configuration order of G_DE and G_SR produces a substantial numerical difference, justifying the chosen ordering of the modules.
Table 4 shows the performance changes when different subsets of the loss functions that constitute the entire network are used. According to these results, the most significant performance improvement in our model, which is composed of both G_DE and G_SR, is achieved when fine-tuning G_DE to suit the I_DS-fundus image domain. Furthermore, utilizing G_DE even with simple bicubic upsampling outperforms using the super-resolution network G_SR alone, suggesting that super-resolution without adequate degradation removal has limited capacity to enhance retinal images. Figure 8 illustrates the importance of removing degradations before super-resolution: generating I_FQ-UWF from the improved I_E-UWF produced by G_DE shows a significantly superior enhancement capability compared to generating I_FQ-UWF directly from I_UWF without the prior degradation removal process.

5. Discussion

The proposed method can be trained on unpaired UWF and fundus image sets. By reducing dependency on paired and annotated data, our method becomes more pragmatic for integration into real-world medical settings, where the acquisition of such data is often a logistical challenge. The enhanced image quality facilitated by our approach holds the potential to significantly improve diagnostic accuracy. The ability to detect subtle changes in the retinal structure, often indicative of early-stage pathologies, is critical for timely interventions and effective disease management.
Despite the promising outcomes, our study prompts further investigation into several critical areas. The robustness and generalizability of our model need to be rigorously examined across a spectrum of imaging conditions, including instances with various ocular pathologies and diverse qualities of image acquisition. The influence of different imaging devices and settings on our model’s performance demands scrutiny to ensure broad applicability in clinical settings.
To validate the real-world impact of our enhancement method, collaboration with domain experts and comprehensive clinical validation are imperative. Ophthalmologists’ insights will provide essential perspectives on how the enhanced image quality translates into improved diagnostic accuracy and treatment planning. The feasibility of implementation in diverse clinical settings warrants further exploration considering factors such as computational requirements, integration with existing diagnostic workflows, and user-friendly interfaces for healthcare professionals.

Author Contributions

All authors have participated in the conception and design, analysis and interpretation of the data, and drafting the article and revising it critically for important intellectual content. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study adhered to the tenets of the Declaration of Helsinki, and the protocol was reviewed and approved by the Institutional Review Board (IRB) of Kangbuk Samsung Hospital (No. KBSMC 2019-08-031).

Informed Consent Statement

Our study is retrospective using medical records, and our data were fully anonymized before processing. The IRB waived the requirement for informed consent.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

None of the authors have any proprietary interests or conflicts of interest related to this submission.

Abbreviations

The following abbreviations are used in this manuscript:
UWF: Ultra-Widefield
FQ: Fundus Quality
GAN: Generative Adversarial Network
DE: Degradation Enhancement
SR: Super-Resolution
KBSMC: Kangbuk Samsung Medical Center
IoU: Intersection over Union
GT: Ground Truth

References

  1. Kumar, V.; Surve, A.; Kumawat, D.; Takkar, B.; Azad, S.; Chawla, R.; Shroff, D.; Arora, A.; Singh, R.; Venkatesh, P. Ultra-wide field retinal imaging: A wider clinical perspective. Indian J. Ophthalmol. 2021, 69, 824–835. [Google Scholar] [CrossRef] [PubMed]
  2. Midena, E.; Marchione, G.; Di Giorgio, S.; Rotondi, G.; Longhin, E.; Frizziero, L.; Pilotto, E.; Parrozzani, R.; Midena, G. Ultra-wide-field fundus photography compared to ophthalmoscopy in diagnosing and classifying major retinal diseases. Sci. Rep. 2022, 12, 19287. [Google Scholar] [CrossRef] [PubMed]
  3. Fei, B.; Lyu, Z.; Pan, L.; Zhang, J.; Yang, W.; Luo, T.; Zhang, B.; Dai, B. Generative Diffusion Prior for Unified Image Restoration and Enhancement. arXiv 2023, arXiv:2304.01247. [Google Scholar]
  4. Yang, B.; Zhao, H.; Cao, L.; Liu, H.; Wang, N.; Li, H. Retinal image enhancement with artifact reduction and structure retention. Pattern Recognit. 2023, 133, 108968. [Google Scholar] [CrossRef]
  5. Lee, K.G.; Song, S.J.; Lee, S.; Yu, H.G.; Kim, D.I.; Lee, K.M. A deep learning-based framework for retinal fundus image enhancement. PLoS ONE 2023, 18, e0282416. [Google Scholar] [CrossRef] [PubMed]
  6. Li, D.; Zhang, L.; Sun, C.; Yin, T.; Liu, C.; Yang, J. Robust Retinal Image Enhancement via Dual-Tree Complex Wavelet Transform and Morphology-Based Method. IEEE Access 2019, 7, 47303–47316. [Google Scholar] [CrossRef]
  7. Román, J.C.M.; Noguera, J.L.V.; García-Torres, M.; Benítez, V.E.C.; Matto, I.C. Retinal Image Enhancement via a Multiscale Morphological Approach with OCCO Filter. In Proceedings of the Information Technology and Systems, Libertad City, Ecuador, 4–6 February 2021; Rocha, Á., Ferrás, C., López-López, P.C., Guarda, T., Eds.; Springer: Cham, Switzerland, 2021; pp. 177–186. [Google Scholar]
  8. Abbood, S.H.; Hamed, H.N.A.; Rahim, M.S.M.; Rehman, A.; Saba, T.; Bahaj, S.A. Hybrid Retinal Image Enhancement Algorithm for Diabetic Retinopathy Diagnostic Using Deep Learning Model. IEEE Access 2022, 10, 73079–73086. [Google Scholar] [CrossRef]
  9. Pham, Q.T.M.; Shin, J. Generative Adversarial Networks for Retinal Image Enhancement with Pathological Information. In Proceedings of the 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM), Seoul, Republic of Korea, 4–6 January 2021; pp. 1–4. [Google Scholar] [CrossRef]
  10. Yang, J.; Wright, J.; Huang, T.; Ma, Y. Image super-resolution as sparse representation of raw image patches. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar] [CrossRef]
  11. Yang, C.Y.; Ma, C.; Yang, M.H. Single-Image Super-Resolution: A Benchmark. In Proceedings of the Computer Vision—ECCV 2014, Cham, Switzerland, 6–12 September 2014; pp. 372–386. [Google Scholar]
  12. Zheng, Z.; Nie, N.; Ling, Z.; Xiong, P.; Liu, J.; Wang, H.; Li, J. DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow. arXiv 2022, arXiv:2204.00330. [Google Scholar]
  13. Shocher, A.; Cohen, N.; Irani, M. “Zero-Shot” Super-Resolution using Deep Internal Learning. arXiv 2017, arXiv:1712.06087. [Google Scholar]
  14. Yu, K.; Dong, C.; Lin, L.; Loy, C.C. Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2443–2452. [Google Scholar]
  15. Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; Shi, W. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv 2016, arXiv:1609.04802. [Google Scholar]
  16. Yuan, Y.; Liu, S.; Zhang, J.; Zhang, Y.; Dong, C.; Lin, L. Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks. arXiv 2018, arXiv:1809.00437. [Google Scholar]
  17. Maeda, S. Unpaired Image Super-Resolution using Pseudo-Supervision. arXiv 2020, arXiv:2002.11397. [Google Scholar]
  18. Zhu, J.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv 2017, arXiv:1703.10593. [Google Scholar]
  19. Yi, Z.; Zhang, H.; Tan, P.; Gong, M. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation. arXiv 2018, arXiv:1704.02510. [Google Scholar]
  20. Xu, T.; Zhang, P.; Huang, Q.; Zhang, H.; Gan, Z.; Huang, X.; He, X. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. arXiv 2017, arXiv:1711.10485. [Google Scholar]
  21. Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. arXiv 2018, arXiv:1711.09020. [Google Scholar]
  22. Ye, L.; Zhang, B.; Yang, M.; Lian, W. Triple-translation GAN with multi-layer sparse representation for face image synthesis. Neurocomputing 2019, 358, 294–308. [Google Scholar] [CrossRef]
  23. Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  24. Kang, M.; Shin, J.; Park, J. StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2023, 45, 15725–15742. [Google Scholar] [CrossRef] [PubMed]
  25. Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial Discriminative Domain Adaptation. arXiv 2017, arXiv:1702.05464. [Google Scholar]
  26. Fregier, Y.; Gouray, J.B. Mind2Mind: Transfer Learning for GANs. In Proceedings of the Geometric Science of Information, Paris, France, 21–23 July 2021; Nielsen, F., Barbaresco, F., Eds.; Springer: Cham, Switzerland, 2021; pp. 851–859. [Google Scholar]
  27. Wang, Y.; Wu, C.; Herranz, L.; van de Weijer, J.; Gonzalez-Garcia, A.; Raducanu, B. Transferring GANs: Generating images from limited data. arXiv 2018, arXiv:1805.01677. [Google Scholar]
  28. Elaraby, N.; Barakat, S.; Rezk, A. A conditional GAN-based approach for enhancing transfer learning performance in few-shot HCR tasks. Sci. Rep. 2022, 12, 16271. [Google Scholar] [CrossRef]
  29. Li, Q.; Mai, L.; Alcorn, M.A.; Nguyen, A. A cost-effective method for improving and re-purposing large, pre-trained GANs by fine-tuning their class-embeddings. arXiv 2020, arXiv:1910.04760. [Google Scholar]
  30. Mo, S.; Cho, M.; Shin, J. Freeze the Discriminator: A Simple Baseline for Fine-Tuning GANs. arXiv 2020, arXiv:2002.10964. [Google Scholar]
  31. Mertikopoulos, P.; Papadimitriou, C.H.; Piliouras, G. Cycles in adversarial regularized learning. arXiv 2017, arXiv:1709.02738. [Google Scholar]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  33. Azad, R.; Aghdam, E.K.; Rauland, A.; Jia, Y.; Avval, A.H.; Bozorgpour, A.; Karimijafarbigloo, S.; Cohen, J.P.; Adeli, E.; Merhof, D. Medical Image Segmentation Review: The success of U-Net. arXiv 2022, arXiv:2211.14830. [Google Scholar]
  34. Cortes, C.; Mohri, M.; Rostamizadeh, A. L2 Regularization for Learning Kernels. arXiv 2012, arXiv:1205.2653. [Google Scholar]
  35. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2019, arXiv:1711.05101. [Google Scholar]
  36. Kim, G.; Park, J.; Lee, K.; Lee, J.; Min, J.; Lee, B.; Han, D.K.; Ko, H. Unsupervised Real-World Super Resolution with Cycle Generative Adversarial Network and Domain Discriminator. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1862–1871. [Google Scholar] [CrossRef]
  37. Zhou, R.; Süsstrunk, S. Kernel Modeling Super-Resolution on Real Low-Resolution Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2433–2443. [Google Scholar] [CrossRef]
  38. Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar] [CrossRef]
  39. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  40. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar] [CrossRef]
  41. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv 2018, arXiv:1706.08500. [Google Scholar]
  42. Bai, X.; Zhou, F.; Xue, B. Image enhancement using multi scale image features extracted by top-hat transform. Opt. Laser Technol. 2012, 44, 328–336. [Google Scholar] [CrossRef]
  43. Lai, R.; Yang, Y.T.; Wang, B.J.; Zhou, H.X. A quantitative measure based infrared image enhancement algorithm using plateau histogram. Opt. Commun. 2010, 283, 4283–4288. [Google Scholar] [CrossRef]
  44. Ståhle, L.; Wold, S. Analysis of variance (ANOVA). Chemom. Intell. Lab. Syst. 1989, 6, 259–272. [Google Scholar] [CrossRef]
  45. Bonferroni, C. Teoria Statistica delle Classi e Calcolo delle Probabilità; Pubblicazioni del R. Istituto superiore di scienze economiche e commerciali di Firenze; Seeber: Florence, Italy, 2010. [Google Scholar] [CrossRef]
  46. Gedraite, E.S.; Hadad, M. Investigation on the effect of a Gaussian Blur in image filtering and segmentation. In Proceedings of the ELMAR-2011, Zadar, Croatia, 14–16 September 2011; pp. 393–396. [Google Scholar]
  47. Shi, Y.; Yang, J.; Wu, R. Reducing Illumination Based on Nonlinear Gamma Correction. In Proceedings of the 2007 IEEE International Conference on Image Processing, San Antonio, TX, USA, 16 September–19 October 2007; Volume 1, pp. I-529–I-532. [Google Scholar] [CrossRef]
  48. Wallace, G. The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 1992, 38, xviii–xxxiv. [Google Scholar] [CrossRef]
  49. Rad, M.S.; Yu, T.; Musat, C.; Ekenel, H.K.; Bozorgtabar, B.; Thiran, J.P. Benefiting from Bicubically Down-Sampled Images for Learning Real-World Image Super-Resolution. arXiv 2020, arXiv:2007.03053. [Google Scholar]
  50. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870. [Google Scholar]
Figure 1. Conventional fundus image vs. ultra-widefield (UWF) image. (a) UWF images drastically increase the capability to observe the retina and can cover over 80% of the retina, which is more than a five-fold increase compared to (b) conventional fundus images. The diagrams on the left of (a,b) are reproduced from https://www.optomap.com/optomap-imaging/, accessed on 1 March 2022.
Figure 2. Sample results of the proposed UWF enhancement method. The top row depicts the input UWF images, and the bottom row depicts the FQ-UWF images enhanced by the proposed method. Numbered boxes are enlarged sample views of representative local regions. The clarity of anatomical structures such as vessels is greatly improved in the FQ-UWF images.
Figure 3. The overall architecture of the proposed method. I_UWF, with severe degradations and artifacts, is first enhanced to I_E-UWF via G_DE, whose output is fed to G_SR to generate the ×4 up-scaled I_FQ-UWF. I_fundus is down-scaled to I_DS-fundus with a scaling factor of 4. D_DE and D_SR measure the similarity between I_E-UWF and I_DS-fundus to train G_DE and the similarity between I_FQ-UWF and I_fundus to train G_SR, respectively.
Figure 5. The enhanced FQ-UWF results. Input I_UWF images are improved using various methods.
Figure 6. The enhanced FQ-UWF results. Different types of degradation are applied to the I_UWF images, and the degraded images are improved using various methods.
Figure 7. Qualitative drusen detection results.
Figure 8. The interim improvement results. (a) Input image, (b) I_UWF, (c) I_E-UWF, (d) I_FQ-UWF, and (e) direct super-resolution result obtained by applying G_SR to (b).
Table 1. Quantitative evaluation on the KBSMC dataset.
Method | γ ↓ (p-value) | LPIPS ↓ (p-value) | FID ↓ (p-value)
ZSSR [13] | 0.775 (<0.001) | 0.624 (<0.001) | 117.193 (<0.001)
cycle-in-cycle GAN [36] | 0.803 (<0.001) | 0.552 (<0.001) | 103.010 (<0.001)
KMSR [37] | 0.590 (<0.001) | 0.435 (<0.001) | 15.192 (<0.001)
CinCGAN [16] | 0.726 (<0.001) | 0.653 (<0.001) | 89.511 (<0.001)
RLrestore [14] + bicubic upsampling | 0.514 (<0.001) | 0.595 (<0.001) | 54.118 (<0.001)
Ours: G_DE w/o L_E → bicubic upsampling | 0.520 (<0.009) | 0.318 (<0.001) | 30.991 (<0.001)
Ours: G_DE w/ L_E → bicubic upsampling | 0.499 (<0.001) | 0.297 (<0.001) | 25.120 (<0.001)
Ours: G_DE w/o L_E → G_SR | 0.503 (<0.001) | 0.284 (<0.001) | 27.055 (<0.001)
Ours: G_SR only | 0.654 (<0.001) | 0.305 (<0.001) | 41.317 (<0.001)
Ours: G_SR → G_DE w/o L_E | 0.671 (<0.001) | 0.300 (<0.001) | 26.114 (<0.001)
Ours: G_SR → G_DE w/ L_E | 0.585 (<0.001) | 0.288 (<0.001) | 26.017 (<0.001)
Ours: full | 0.317 | 0.231 | 17.235
Values are mean ± standard deviation. For γ , LPIPS, and FID, smaller values indicate better performance. Bold values denote the most effective method corresponding to each evaluation metric.
Table 2. Quantitative comparison on the degraded KBSMC dataset.
Degradation Type | Method | γ ↓ | LPIPS ↓ | FID ↓
Gaussian Blur (σ = 7) | ZSSR [13] | 0.724 | 0.836 | 137.739
  | cycle-in-cycle GAN [36] | 0.799 | 0.889 | 140.350
  | KMSR [37] | 0.509 | 0.802 | 49.957
  | CinCGAN [16] | 0.710 | 0.790 | 92.041
  | RLrestore [14] + bicubic upsampling | 0.663 | 0.811 | 98.818
  | Ours | 0.471 | 0.599 | 31.535
Illumination (γ = 0.75) | ZSSR [13] | 0.632 | 0.777 | 109.176
  | cycle-in-cycle GAN [36] | 0.601 | 0.818 | 104.073
  | KMSR [37] | 0.456 | 0.659 | 23.717
  | CinCGAN [16] | 0.643 | 0.751 | 79.990
  | RLrestore [14] + bicubic upsampling | 0.589 | 0.612 | 88.235
  | Ours | 0.375 | 0.363 | 20.532
JPEG Compression (rate = 0.25) | ZSSR [13] | 0.721 | 0.809 | 119.501
  | cycle-in-cycle GAN [36] | 0.638 | 0.829 | 90.1199
  | KMSR [37] | 0.557 | 0.771 | 26.181
  | CinCGAN [16] | 0.699 | 0.832 | 84.595
  | RLrestore [14] + bicubic upsampling | 0.600 | 0.793 | 91.932
  | Ours | 0.497 | 0.552 | 34.172
Bicubic Downsampling (scale = 0.25) | ZSSR [13] | 0.703 | 0.813 | 163.115
  | cycle-in-cycle GAN [36] | 0.637 | 0.847 | 112.752
  | KMSR [37] | 0.553 | 0.728 | 36.114
  | CinCGAN [16] | 0.729 | 0.797 | 104.969
  | RLrestore [14] + bicubic upsampling | 0.581 | 0.607 | 82.032
  | Ours | 0.413 | 0.595 | 39.001
Values are mean ± standard deviation. For γ , LPIPS, and FID, smaller values indicate better performance. Bold values denote the most effective method corresponding to each evaluation metric and each degradation type.
Table 3. Quantitative drusen detection results.
Image Pair | mAP
I_UWF – I_fundus | 46.3%
I_FQ-UWF – I_fundus | 62.4%
Table 4. Ablation study.
Loss Combination | γ ↓ | LPIPS ↓ | FID ↓
L_H | 0.683 | 0.508 | 81.392
L_H + L_E | 0.415 | 0.329 | 37.508
L_R + L_E | 0.301 | 0.256 | 23.125
L_R + L_E + L_M | 0.317 | 0.231 | 17.235
Values are mean ± standard deviation. For γ , LPIPS, and FID, smaller values indicate better performance. Bold values denote the most effective method corresponding to each evaluation metric.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lee, K.G.; Song, S.J.; Lee, S.; Kim, B.H.; Kong, M.; Lee, K.M. FQ-UWF: Unpaired Generative Image Enhancement for Fundus Quality Ultra-Widefield Retinal Images. Bioengineering 2024, 11, 568. https://doi.org/10.3390/bioengineering11060568
