2.1. Existing Approaches for Defocus Deblurring
Traditional defocus deblurring methods first estimate the defocus map of the defocused image [2,3,4] and then use that map to perform non-blind deconvolution [5,6,7,8,9] to recover a sharp image. Because the performance of these approaches depends entirely on the accuracy of the defocus map, considerable effort has been devoted to improving it [10,11,12]. In addition, Andres et al. [13] proposed a method that estimates spatially varying defocus blur, training on local frequency image features with regression tree fields to obtain a coherent defocus blur map. Liu et al. [14] proposed estimating the parameters of generalized Gaussian kernels from image patches for non-blind image deblurring. Goilkar et al. [15] applied blind and non-blind deconvolution techniques to restore defocused images. Chan et al. [16] used blind deconvolution to estimate the blur kernel and then applied total variation minimization to recover the background and fuse it with the foreground to obtain a sharp image. However, the aforementioned methods not only require intensive computation but also produce unsatisfactory results owing to inaccurate defocus maps.
In recent years, with the rapid development of deep learning, significant breakthroughs have been made in defocused image restoration. Abuolaim et al. [17] proposed DPDD, the first public dataset for training and validating an end-to-end deep learning framework for defocus deblurring. Lee et al. [18] proposed a convolutional neural network for defocus map estimation and defocus deblurring. Lee et al. [19] proposed an end-to-end approach equipped with an Interactive Filter Adaptive Network (IFAN) and created a dataset named RealDOF (Real Depth of Field); rather than predicting pixel values directly, IFAN generates spatially adaptive per-pixel deblurring filters, which are then applied to features extracted from a blurred image to produce deblurred features. Abuolaim et al. [20] proposed a single-encoder multi-decoder deep neural network for single image deblurring that incorporates the two sub-aperture views into a multi-task framework; they found that jointly learning to predict the two DP (dual-pixel) views from a single blurry image improves the network's ability to deblur. Zhao et al. [21] implemented an adversarial promoting learning framework to jointly handle defocus detection and defocus deblurring. Quan et al. [22] likewise proposed a method for single image defocus deblurring (SIDD).
Zhang et al. [23] proposed a dual network with two subnets for estimating depth and defocus from a single image. Anwar et al. [24] estimated depth with cascaded convolutional and fully connected neural networks and then used the depth to recover sharp images. Karaali et al. [25] put forward a deep convolutional neural network to estimate defocus blur from a single defocused image. Yang et al. [26] first estimated a blur kernel and performed deblurring in the Fourier domain; their method then reuses the blur kernel in a simple convolution for reblurring. Quan et al. [27] proposed a learnable recursive kernel representation that provides a compact yet effective, physics-encoded parametrization of the spatially varying defocus blurring process, together with a physics-driven, efficient deep model with a cross-scale fusion structure. Li et al. [28] proposed GRL (Global, Regional, and Local), a network for image restoration based on anchored stripe self-attention, window self-attention, and channel attention. Ye et al. [29] introduced an approach that first estimates the defocus map of the scene and then learns a direct mapping from the blurry image to the sharp image guided by the estimated map.
Zhao et al. [30] put forward a focused area detection attack (FADA) that forces the focused area to be reblurred, together with a defocused region detection attack that guides realistically blurred regions to be deblurred while training the deblurring network. Ali et al. [31] introduced two encoder–decoder sub-networks fed with the blurry image and the estimated blur map, respectively; the method works well when combined with a variety of blur estimation techniques. Zhang et al. [32] proposed a dynamic multi-scale network for dual-pixel image defocus deblurring, whose encoder is composed of several vision transformer blocks and whose reconstruction module consists of several Dynamic Multi-scale Sub-reconstruction Modules. Saqib et al. [33] proposed a new Deep Neural Network (DNN) architecture in which depth estimation and image deblurring share the same encoder, achieving good results. Nazir et al. [34] created the Indoor Depth from Defocus (iDFD) dataset, which contains naturally defocused images, all-in-focus (AiF) images, and dense depth maps of indoor environments. Mazilu et al. [35] used implicit and explicit regularization techniques to train an autoencoder that enforces linearity relations among the representations of different blur levels in the latent space.
Zhao et al. [36] proposed a defocus blur detection (DBD) method based on adaptive cross-level feature fusion and refinement, which both captures the complementary information of cross-level features and refines it. Zhang et al. [37] reviewed common causes of image blur, benchmark datasets, performance metrics, and deep-learning-based image deblurring approaches. Chai et al. [38] proposed a hybrid CNN–Transformer architecture based on complementary residual learning for defocus blur detection. Fernando et al. [39] created a dataset and trained MobileNetV2 to classify image patches into one of 20 levels of blurriness; the result was then refined with an iterative weighted guided filter, yielding good results in adaptive image enhancement and defocus magnification. Zhang et al. [40] proposed a novel self-supervision training objective that enhances training consistency and stability, together with a hard mining strategy to accelerate the defocus blur detection model. Lin et al. [41] presented an iterative feedback framework for estimating depth maps and all-in-focus (AiF) images simultaneously.
Quan et al. [42] put forward a pixel-wise Gaussian kernel mixture (GKM) model for representing spatially variant defocus blur kernels and designed GKMNet, a lightweight scale-recurrent architecture with a scale-recurrent attention module that estimates the GKM mixing coefficients for defocus deblurring. Zhang et al. [43] proposed an efficient Multi-Refinement Network (MRNet) for dual-pixel image defocus deblurring, whose core components are an alignment module and a reconstruction module. Jung et al. [44] proposed a disparity probability volume module to predict pixel-wise disparity probabilities, which are then incorporated into the deblurring network to address defocus deblurring from dual-pixel image pairs. Zhai et al. [45] proposed a monocular depth estimation network to obtain a depth map and then used the map to guide the network for defocus deblurring. To further improve deblurring results, Ma et al. [46] introduced a network for single image defocus deblurring that uses defocus map estimation as an auxiliary task. Although these methods have achieved promising results in image defocus deblurring, they fail when the defocused image contains large blurry regions.
2.2. Defocus Deblurring Datasets
Extensive work suggests that high-quality, large-scale datasets are crucial for training an optimal model based on generative adversarial networks, so this section introduces the relevant datasets. Although several datasets are publicly available for defocus deblurring research, such as DED [46], LFDOF [47], CUHK [48], RealDOF, SDD (Single-Image Defocus Deblurring) [49], DPDD, and PixelDP, DPDD is the only extensively adopted real-world training dataset. The DED dataset contains 1012 training pairs and 100 test pairs; because defocus maps are used to train the network, the training set also includes 1012 defocus maps. Similarly, LFDOF contains 12,000 images. The CUHK dataset is employed for blur detection; it contains 1000 images with human-labeled ground-truth blur regions. The RealDOF dataset, constructed with dual cameras, contains only 50 pairs of test images and no training images. The DPDD dataset, collected from the real world, is currently the most widely and frequently used: it contains 350 pairs of training images, 74 pairs of validation images, and 76 pairs of test images, each provided as left and right dual-pixel views together with the combined image.
Recently, Li et al. [49] proposed a joint deblurring and reblurring learning (JDRL) framework for image defocus deblurring together with the SDD dataset, which includes 115 pairs of training images and 35 pairs of test images. However, the paired images were not captured simultaneously, and uncontrollable natural factors such as wind and changing illumination cause misalignment between the sharp and defocused images, while some image regions are overexposed; these issues negatively affect both model training and image quality evaluation. Detailed information on the aforementioned datasets is given in Table 1.
2.3. Generative Adversarial Networks
In 2014, Goodfellow et al. [50] proposed generative adversarial networks (GANs), which consist of a generator and a discriminator. The generator aims to produce fake samples that resemble real samples as closely as possible, while the discriminator aims to distinguish whether a given sample is real or fake. Their goals are opposed, and by competing with each other the two networks mutually improve; ideally, even a highly capable discriminator eventually cannot tell whether a given sample is real or was produced by the generator. The minimax objective of generator $G$ and discriminator $D$ is as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{\tilde{x} \sim p_g}[\log(1 - D(\tilde{x}))], \quad \tilde{x} = G(z), \tag{1}$$

where $p_{\mathrm{data}}$ is the initial given data distribution, $p_g$ is the distribution of data generated by the generator, $\tilde{x} = G(z)$ represents the fake image generated by $G$ from a random noise vector $z$, and $D(\cdot)$ represents the probability that the discriminator assigns to a sample being real: the more likely the sample is real, the closer the output is to 1, and the more likely it is fake, the closer the output is to 0.
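To make the adversarial objective concrete, the following is a minimal PyTorch sketch of one training step under Equation (1); the modules `G` and `D` (the latter ending in a sigmoid so it outputs probabilities), the optimizers, and all names are illustrative assumptions, not parts of the SIDGAN implementation.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=128):
    """One vanilla GAN step; assumes D outputs probabilities of shape (n, 1)."""
    n = real.size(0)
    ones = torch.ones(n, 1, device=real.device)
    zeros = torch.zeros(n, 1, device=real.device)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))).
    fake = G(torch.randn(n, z_dim, device=real.device)).detach()  # no grad into G
    d_loss = (F.binary_cross_entropy(D(real), ones)
              + F.binary_cross_entropy(D(fake), zeros))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator: the common non-saturating variant maximizes log D(G(z))
    # instead of minimizing log(1 - D(G(z))).
    g_loss = F.binary_cross_entropy(D(G(torch.randn(n, z_dim, device=real.device))), ones)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```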
GANs have received a great deal of attention since they were proposed, but as Salimans et al. [51] pointed out, vanilla GANs suffer from a series of problems such as vanishing gradients, mode collapse, and hyperparameter sensitivity. Because the original GANs were derived from the JS (Jensen–Shannon) divergence, their biggest flaw is that if the distributions of two samples do not overlap, the JS divergence remains constant at $\log 2$ regardless of how far apart they are. Arjovsky et al. [52] argued that the JS divergence is the cause of training instability in vanilla GANs, and they proposed a new measure, the Wasserstein distance, which in its dual form can be defined as follows:

$$W(p_{\mathrm{data}}, p_g) = \sup_{\|D\|_L \le 1} \Big( \mathbb{E}_{x \sim p_{\mathrm{data}}}[D(x)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})] \Big). \tag{2}$$
Here, $p_{\mathrm{data}}$, $p_g$, and $D$ have the same meanings as in Equation (1), except that $W$ denotes the Wasserstein distance and the discriminator $D$ must satisfy the 1-Lipschitz condition, which is expressed as:

$$|D(x_1) - D(x_2)| \le \|x_1 - x_2\|, \tag{3}$$

that is, the absolute difference between the discriminator's outputs for two images $x_1$ and $x_2$ must be less than or equal to the norm of their pixel-by-pixel difference. In other words, for different images, whether real or fake, the outputs of the discriminator should not differ too sharply. This means the function's gradient changes smoothly and there are no abrupt jumps during gradient descent.
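As a toy numeric illustration of Equation (3) (not part of any cited method), one can compare the two sides of the inequality for a small, hypothetical linear discriminator; all names here are illustrative:

```python
import torch

# A toy "discriminator" mapping 3x8x8 images to scalar scores.
D = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1))
x1, x2 = torch.rand(4, 3, 8, 8), torch.rand(4, 3, 8, 8)

lhs = (D(x1) - D(x2)).abs().squeeze(1)   # |D(x1) - D(x2)|
rhs = (x1 - x2).flatten(1).norm(dim=1)   # ||x1 - x2||
print(torch.all(lhs <= rhs))             # holds only if D is 1-Lipschitz in this norm
```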
Therefore, the question is how to satisfy the 1-Lipschitz constraint. Arjovsky et al. [52] first adopted the weight clipping approach, truncating the discriminator parameters to the range $[-c, c]$ (a minimal sketch of this scheme is given below). However, the discriminator's loss attempts to separate the scores of real and fake samples, and because weight clipping independently limits the range of each network parameter, it drives the discriminator's parameters to two extremes, concentrating most of them near $-c$ and $c$.
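The weight clipping scheme itself is simple; the following sketch assumes a PyTorch discriminator `D`, with the function name illustrative and the default bound taken from the value reported in the original WGAN paper:

```python
def clip_weights(D, c=0.01):
    # Truncate every discriminator parameter to [-c, c] after each
    # optimizer step, as in the original WGAN weight clipping scheme.
    for p in D.parameters():
        p.data.clamp_(-c, c)
```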
In response to the problems that arise from weight clipping, Gulrajani et al. [53] replaced it with a gradient penalty and proposed WGAN-GP. The gradient penalty enforces the 1-Lipschitz constraint softly: replacing the hard constraint of Equation (3) with a penalty term, the discriminator objective becomes

$$L = \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})] - \mathbb{E}_{x \sim p_{\mathrm{data}}}[D(x)] + \lambda\,\mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2\big], \tag{4}$$

where $\hat{x}$ is sampled uniformly along straight lines between pairs of points drawn from $p_{\mathrm{data}}$ and $p_g$, and $\lambda$ is the penalty coefficient.
The gradient penalty yields more stable training and has been widely used ever since. Because WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty) has achieved strong results in image generation tasks such as image super-resolution [54], image shadow removal [55], image inpainting [56], and illumination processing [57], our SIDGAN also adopts the gradient penalty of WGAN-GP to improve training stability.
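As a concrete reference for the penalty term in Equation (4), the following is a minimal PyTorch sketch of the WGAN-GP gradient penalty; the function and variable names are illustrative assumptions and do not come from the SIDGAN code.

```python
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    """Penalty term of Equation (4) for a critic D mapping images to scalars."""
    # Sample x_hat uniformly on straight lines between real and fake samples,
    # as prescribed by Gulrajani et al.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = D(x_hat)
    # Gradient of the critic output with respect to x_hat.
    grads = torch.autograd.grad(outputs=scores, inputs=x_hat,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    # Penalize deviation of the gradient norm from 1 (soft 1-Lipschitz constraint).
    return lam * ((grad_norm - 1.0) ** 2).mean()

# Usage in the critic loss of Equation (4):
#   d_loss = D(fake).mean() - D(real).mean() + gradient_penalty(D, real, fake)
```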