### **1. Introduction**

Image restoration [1] is an important research direction in image processing. It studies the causes of degradation during image acquisition and establishes a mathematical model to recover high-quality images from degraded ones. Image deblurring [2,3] is one branch of image restoration. It mainly addresses the blurring caused by relative displacement between the photographed object and the device, for example due to camera shake, as well as noise interference, restoring textures that are sharper and more natural. With this technology, low-quality blurred images can be restored to high-quality clear images. Many methods have been applied in the field of image deblurring; however, because of the high quality demanded of the restored images, image deblurring remains a very challenging research direction.

With the continuous development of industrial technology, characters printed on workpieces play an important role in industrial applications. The on-site environment of an industrial site is chaotic and complex: factors such as camera shake, subject movement, lighting, and on-site noise can blur the image captured by the camera. Since industrial sites urgently require complete information, solving this problem has practical significance and application value. In an intelligent express sorting system, for example, once the express code is obtained, the destination area of the parcel is known. If the code on a courier slip is blurred, deciphering the information with the naked eye is not only cumbersome but also inefficient. If a computer can automatically recover the blurred image information and restore the blurred three-segment code on the express slip to a clear image, the manpower and material resources spent on manual data processing can be reduced. Therefore, deblurring blurred express image information is very important in automatic sorting systems.

In recent years, with the rapid development of deep learning, many scholars have studied image deblurring methods based on machine learning and achieved good results [4]. Among these studies, image deblurring methods based on the generative adversarial network (GAN) have been widely recognized, because they account not only for the plausibility of image texture details but also for the consistency of the overall image structure. Therefore, this paper proposes an image deblurring method based on the generative adversarial network. First, to eliminate the difference between real blurred images and synthetically blurred images, we established an unpaired dataset and solved the image deblurring problem on this basis. Then, a GAN model was established, consisting of two generative adversarial networks: one for converting blurred images to clear images and one for converting clear images to blurred images. The loss function was optimized by combining adversarial loss and perceptual loss. Finally, a stable network model was obtained by iteratively training the GAN model on the unpaired dataset, updating the model during each training pass to achieve a better deblurring effect.

The rest of the article is organized as follows. Section 2 briefly reviews related work on image deblurring. Section 3 presents the proposed network model and explains each part in detail. Section 4 presents the experimental results, and Section 5 concludes the paper and discusses future work.

### **2. Related Works**

With the popularization of handheld devices and multimedia communication technology, image deblurring technology offers broad application scenarios and strong technical advantages: it avoids the high cost of additional hardware as well as the noise interference introduced by redundant components. If a deblurring method is first used to restore a clear image, and the restored image is then fed into a subsequent neural network, the accuracy of that network's output can be greatly improved. Therefore, image deblurring, as an important part of data preprocessing in computer vision, has become a research hotspot in the fields of computer vision and computer graphics.

At present, there are two main approaches to image deblurring. One is nonblind deblurring, which uses a known blur kernel to directly deconvolve the degradation model of the blurred image and obtain a restored high-definition image. The other is blind deblurring, which is used when the blur process is unknown. A brief introduction to these two approaches follows.

The nonblind deblurring method is the more traditional one. It first obtains the blur kernel information through some technique and then deconvolves the blurred image according to the obtained kernel to restore a high-definition image. Classic deconvolution algorithms include the Lucy–Richardson algorithm, the Wiener filter, and the Tikhonov filter.
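To make the nonblind pipeline concrete, the following is a minimal sketch of classical deconvolution with a known kernel, using scikit-image's Richardson–Lucy implementation; the 5 × 5 box kernel and the test image are illustrative assumptions, not data from this paper.

```python
# Minimal nonblind deblurring sketch: blur a test image with a known
# kernel, then invert the blur by Richardson-Lucy deconvolution.
import numpy as np
from scipy.signal import convolve2d
from skimage import data, restoration

image = data.camera() / 255.0        # clean test image, scaled to [0, 1]
psf = np.ones((5, 5)) / 25.0         # assumed known blur kernel (5x5 box)
blurred = convolve2d(image, psf, mode="same", boundary="symm")

# Iteratively invert the known kernel (the argument is named `iterations`
# in older scikit-image releases).
restored = restoration.richardson_lucy(blurred, psf, num_iter=30)
```

With the kernel known exactly, a few dozen iterations recover most of the detail; in practice the kernel estimate is imperfect, which is precisely what limits this family of methods.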

In reality, the blur function is unknown in most cases. It is therefore necessary to make assumptions about the source of the blur and to parameterize the blur function. The most common assumption is that the blur is uniformly distributed over the image. For example, the method proposed by Fergus et al. [5] achieved groundbreaking results, and later work [6–8] optimized it further. In addition, some methods handle the case where the blur is unevenly distributed over the image, but they also simplify the problem from different angles. For example, Whyte et al. [9] used a parametric geometric model to describe camera motion, and Gupta et al. [10] assumed that blur was caused solely by 3D camera motion. These traditional methods have achieved certain effects. However, because the models make too many assumptions, they are limited in their application scenarios and cannot solve image blur caused by the various complicated factors of real life.

With the development of deep learning in computer vision, scholars everywhere began to use deep learning for image deblurring. Earlier works were still based on the idea of nonblind deblurring, using neural networks to estimate the blur kernel. For example, Sun et al. [11] used a convolutional neural network (CNN) to estimate the blur kernel and then restored the image based on the estimate. Chakrabarti et al. [12] used a de-CNN to predict the Fourier coefficients of the blur kernel and deblurred the image in the frequency domain. Gong et al. [13] used a fully convolutional network (FCN) to estimate the motion flow of the entire image and restored the blurred image based on it. Because they rely on a nonblind deblurring algorithm, these methods must still apply a traditional deconvolution algorithm to the blurred image after the CNN has estimated the blur kernel. This makes the algorithms slow, and the restoration quality depends entirely on the accuracy of the kernel estimate.

In recent years, as deep learning has shown great promise in areas such as image semantic inpainting and image compression [14], more and more scholars have found that neural networks can do far more than estimate blur kernels. In 2017, Nah et al. [15] proposed using a multiscale convolutional neural network to deblur images directly. They used an end-to-end training method that allows the network to reproduce clear images directly, without first estimating the blur function. This type of method is called blind deblurring. Compared with previous methods, it greatly improves both restoration quality and running speed. Similar methods were proposed by Noroozi et al. [16], Ramakrishnan et al. [17], and Yao et al. [18,19]. Later, Kupyn et al. [20] proposed using a conditional generative adversarial network (CGAN) to deblur images. They followed the basic structure of pix2pix, the general framework for image translation tasks proposed by Isola et al. [21], and modified it to obtain the DeblurGAN image deblurring model. This model achieved a better deblurring effect than the multiscale convolutional neural network of Nah et al., with a simpler and faster network structure. To some extent, this shows that generative adversarial networks genuinely perform well on image deblurring tasks.

In this paper, an image deblurring method based on GAN is proposed for unpaired datasets. Because an unpaired dataset contains no blur–clear image pairs, a single GAN cannot directly compute the loss function. The proposed model therefore uses two generative adversarial networks, which realize the mutual conversion between blurred and clear images and indirectly compute the loss between the input and output images. This gives the model the ability to train and learn on unpaired datasets. At the same time, a loss function combining adversarial loss and perceptual loss is used for training, making the images generated by the model clearer and more realistic.

### **3. Image Deblurring Model**

In order to eliminate the difference between real blurred images and blurred images synthesized by an algorithm, and to achieve a better deblurring effect in real industrial scenes, we adopted the CycleGAN structure [22] under the premise of an unpaired dataset and established an image deblurring model based on a generative adversarial network. The overall structure of the model is shown in Figure 1. The model consists of two generative adversarial networks, A and B, which realize the conversion from blurred images to clear images and from clear images to blurred images, respectively. Networks A and B each consist of their own generator and discriminator. The model also uses a loss function that combines adversarial loss and perceptual loss. This forms a cyclic structure of clear → blur → clear and blur → clear → blur, which better constrains the content of the generated samples. Each part of the model is introduced separately below.

**Figure 1.** The overall structure of the model.

### *3.1. Structure of GAN*

### 3.1.1. Generator Model

In generative adversarial networks, the generative network is the key component. In previous studies, some classic network structures [23,24] achieved outstanding results. Among them, the deep residual network achieved better performance on tasks that generate high-definition images, because it can greatly increase the number of network layers, as shown in [25,26]. The deep residual network was proposed by He et al. [24] in 2016. It is composed of several residual blocks (ResBlocks) and other layers. With residual modules, the network can be made much deeper without suffering from difficult convergence, model degradation, or vanishing gradients. Therefore, we used the deep residual network as the network structure of the generator. In addition, we added a global skip connection [27] to the generative network, which directly connects the input of the entire network to its output. The intermediate layers then only need to learn the residual between output and input, which reduces the amount the network must learn, makes it converge faster, and improves the generalization ability of the model.

As for the specific network structure, combining CycleGAN, the deep residual network, and the global skip connection, the network contains 22 convolutional layers and two deconvolutional layers. A batch normalization (BN) layer [28,29] is added after each convolutional layer, and the rectified linear unit (ReLU) is used as the activation function. The network structure of the generator is shown in Figure 2.

**Figure 2.** Generator network.
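As a concrete illustration, below is a minimal PyTorch sketch of a residual block and a generator with the global skip connection; the channel widths, the number of residual blocks, and the downsampling depth are illustrative assumptions and do not reproduce the paper's exact 22-convolution configuration.

```python
# Sketch of the generator idea: conv head, stacked ResBlocks with BN+ReLU,
# deconv tail, and a global skip connection from input to output.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)  # local residual connection

class Generator(nn.Module):
    def __init__(self, n_blocks: int = 9):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(3, 64, 7, padding=3), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(True),
        )
        self.blocks = nn.Sequential(*[ResBlock(256) for _ in range(n_blocks)])
        self.tail = nn.Sequential(
            nn.ConvTranspose2d(256, 64, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.Conv2d(64, 3, 7, padding=3),
        )

    def forward(self, x):
        # Global skip connection: the intermediate layers learn only the
        # residual between the restored output and the blurred input.
        return torch.tanh(x + self.tail(self.blocks(self.head(x))))
```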

### 3.1.2. Discriminator Model

Compared with the generator, the task of the discriminator is to learn the differences between generated samples and real samples, including whether the structures in the image are natural and the content is clear. In order to measure more accurately whether a sample is clear and natural, we used PatchGAN, proposed by Isola et al. [21], as the discriminative network structure. Because PatchGAN pays more attention to the local information of the image, the generated images contain richer details and look more realistic. Unlike the original PatchGAN, we removed the final Sigmoid activation layer and used the Wasserstein distance instead of the original loss function.

Because the input of PatchGAN is an image block, we used a sliding window of size 70 × 70 to traverse the entire generated sample. Each image block produces one output value through PatchGAN, and the average over all image blocks gives the authenticity score of the entire image. The structure of the discriminative network is shown in Figure 3.

**Figure 3.** Discriminator network.
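A hedged PyTorch sketch of a 70 × 70 PatchGAN critic follows; the channel widths are the common pix2pix configuration and are an assumption, as is the normalization choice. Consistent with the text, the final Sigmoid is omitted and the raw patch scores are averaged.

```python
# PatchGAN-style critic: a fully convolutional network whose output is a
# grid of scores, one per ~70x70 receptive-field patch of the input.
import torch.nn as nn

def block(c_in, c_out, stride=2, norm=True):
    layers = [nn.Conv2d(c_in, c_out, 4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(c_out))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class PatchCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            *block(3, 64, norm=False),    # 256 -> 128
            *block(64, 128),              # 128 -> 64
            *block(128, 256),             # 64 -> 32
            *block(256, 512, stride=1),   # 32 -> 31
            nn.Conv2d(512, 1, 4, stride=1, padding=1),  # 31 -> 30 score map
        )

    def forward(self, x):
        # One raw score per patch; the mean over the score map serves as
        # the authenticity score of the whole image (no Sigmoid).
        return self.net(x).mean(dim=(1, 2, 3))
```

Note that the 70 × 70 sliding window need not be implemented explicitly: stacking strided convolutions gives every output cell a 70 × 70 receptive field, so the whole score map is computed in one forward pass.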

### *3.2. Loss Function of Network*

The loss function is the most basic and critical factor in deep learning. By designing the loss function rationally and optimizing it continuously, the network can learn what it should learn even without paired clear images, thereby achieving a deblurring effect. In this work, the loss function of the entire network is a combination of adversarial loss [30] and perceptual loss. With these two loss functions, the generator can produce clear and realistic images. For convenience of description, in the following, Z denotes the samples in the clear image set, T denotes the samples in the blurred image set, and N is the number of samples.

### 3.2.1. Adversarial Loss

Adversarial loss refers to the loss function of the two generative adversarial networks A and B. For A, its role is to make the generated image as realistic and clear as possible; for B, its role is to give the generated sample motion blur that is as realistic as possible. Over the development of generative adversarial networks, various adversarial loss functions have appeared, including the cross-entropy loss [31], the squared loss [32], and the Wasserstein distance loss [33]. WGAN-GP [33,34], which uses the Wasserstein distance as the adversarial loss and adds a gradient penalty term to the discriminative network, currently achieves the most stable training. Therefore, we used the Wasserstein distance loss function as the adversarial loss. The calculation process of the adversarial loss is shown in Figure 4.

**Figure 4.** The calculation process of adversarial loss.

The loss formulas of the two generative adversarial networks are given in Equations (1) and (2).

$$L_{\mathrm{GAN}}(A) = \frac{1}{N} \sum_{n=1}^{N} \left[ D_A(T) - D_A(G_A(Z)) \right] \tag{1}$$

$$L_{\mathrm{GAN}}(B) = \frac{1}{N} \sum_{n=1}^{N} \left[ D_B(Z) - D_B(G_B(T)) \right] \tag{2}$$

In the above formulas, $G_A(Z)$ denotes the sample generated by the generator of network A from the clear image set Z, and $G_B(T)$ denotes the sample generated by the generator of network B from the blurred image set T. $D_A(T)$ denotes the probability with which the discriminator of network A judges the blurred image set T to be real, and $D_B(Z)$ denotes the probability with which the discriminator of network B judges the clear image set Z to be real.
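The following is a hedged PyTorch sketch of the discriminator-side Wasserstein loss for network A, in the spirit of Equation (1), together with the WGAN-GP gradient penalty mentioned above; the penalty weight of 10 is the value suggested in the WGAN-GP paper, not a number taken from this one.

```python
# Critic loss for network A: Wasserstein estimate (Equation (1)) plus the
# WGAN-GP gradient penalty on interpolates between real and fake samples.
import torch

def critic_loss_A(D_A, G_A, clear_z, blur_t, lambda_gp=10.0):
    fake = G_A(clear_z).detach()   # sample generated from the clear set Z, per Eq. (1)
    wasserstein = D_A(blur_t).mean() - D_A(fake).mean()

    # Gradient penalty: push the critic's gradient norm toward 1 on random
    # interpolates between real and generated samples.
    eps = torch.rand(blur_t.size(0), 1, 1, 1, device=blur_t.device)
    interp = (eps * blur_t + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D_A(interp).sum(), interp, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

    # The critic maximizes the Wasserstein estimate, so we minimize its
    # negative; the generator's loss would be -D_A(G_A(clear_z)).mean().
    return -wasserstein + lambda_gp * penalty
```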

### 3.2.2. Perceptual Loss

Perceptual loss approximates the visual perception of the human eye. Compared with other pixel-level loss functions, it makes the generated image look more realistic and natural. Perceptual loss was originally proposed by Johnson et al. [35] and has achieved good results in several application areas, such as image style transfer [35], image segmentation [36,37], image super-resolution reconstruction [26,38], and image deblurring [20]. The calculation of the perceptual loss relies on the Visual Geometry Group (VGG) network [39] and proceeds as follows. First, the two images to be compared, a real image and a generated image, are fed into a pretrained VGG network. Then, the feature maps output by one or several convolutional layers are extracted from the VGG network. Finally, the mean square error (MSE) between the feature maps of the two images is calculated. In this work, the feature map output by the eighth convolutional layer of the VGG-16 network was selected to calculate the perceptual loss. The calculation process is shown in Figure 5, and the formula is given in Equation (3).

$$L_{\mathrm{per}}(G_A, G_B) = \frac{1}{N} \sum_{n=1}^{N} \left[ \frac{1}{shw} \left\| \phi(G_B(G_A(Z))) - \phi(Z) \right\|_2^2 + \frac{1}{shw} \left\| \phi(G_A(G_B(T))) - \phi(T) \right\|_2^2 \right] \tag{3}$$

**Figure 5.** The calculation process of perceptual loss.

Here, ϕ represents the feature map output from the eighth convolutional layer of VGG-16, and *s*, *h*, and *w* represent the number of channels, height, and width of the feature map, respectively.
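A hedged sketch of this computation with torchvision's pretrained VGG-16 is shown below; mapping "the eighth convolutional layer" onto torchvision's layer indices (the Conv2d at features[17]) is our assumption, and the weights API requires torchvision ≥ 0.13.

```python
# Perceptual loss: MSE between fixed VGG-16 feature maps of the two images.
import torch
import torch.nn.functional as F
from torchvision import models

# Truncate VGG-16 after its 8th convolution (features[17] in torchvision's
# indexing, an assumed mapping) and freeze it; the extractor is not trained.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:18].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(generated: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # F.mse_loss averages over all feature-map elements, which matches the
    # 1/(s*h*w) normalization of Equation (3).
    return F.mse_loss(vgg(generated), vgg(target))
```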

In summary, the overall loss function of the network is the result of the weighted summation of the above two loss functions, and the formula is shown in Equation (4).

$$L(A, B, Z, T) = L_{\mathrm{GAN}}(G_A, D_A, Z, T) + L_{\mathrm{GAN}}(G_B, D_B, T, Z) + \mu L_{\mathrm{per}}(G_A, G_B) \tag{4}$$

In the above formula, μ represents the weight of the perceptual loss function. As in the original generative adversarial network, the generator needs to minimize the loss function, and the discriminator needs to maximize it. Therefore, the optimal generators are obtained as follows:

$$G_A', G_B' = \arg \min_{G_A, G_B} \max_{D_A, D_B} L(A, B, Z, T) \tag{5}$$
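In code, the weighted combination of Equation (4) is a one-liner; the sketch below assumes the loss terms computed as in the earlier snippets, with μ = 10 as initialized later in Algorithm 1.

```python
def total_loss(loss_gan_A, loss_gan_B, loss_per, mu: float = 10.0):
    # Equation (4): the two adversarial losses plus the weighted
    # perceptual loss (mu is the perceptual-loss weight).
    return loss_gan_A + loss_gan_B + mu * loss_per
```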

### *3.3. Algorithm Implementation*

In the implementation of the entire network model, the role of Discriminator1 and Discriminator2 is to provide gradients for Generator1 and Generator2 and guide their optimization. The role of generative network B is to realize the conversion from clear images to blurred images and to assist generative network A in its learning process, so that the samples A generates are consistent with its input. After Discriminator1, Generator2, and Discriminator2 complete their respective auxiliary work, Generator1 restores the input blurred image to a clear image as the output of the entire model. During training, we used the buffer pool strategy proposed by Shrivastava et al. [40] to reduce model oscillation: when updating the parameters of the discriminative network, both historical samples generated by the generator and the new samples generated in the current iteration were used as its input, thereby increasing the stability of the model. The size of the generated-sample buffer pool was 50, the batch size for all model training was 1, and the number of iterations was 300 epochs. The Adam algorithm [41] was used for optimization, with an initial learning rate of 0.0002. The specific implementation process is given in Algorithm 1.

### **Algorithm 1:** The algorithm flow of this model.

1: Initialize the input shape, h = 256, w = 256, s = 3; the output shape of PatchGAN, pa = h/2\*\*4; the loss weight, μ = 10; and the optimizer, Adam(0.0002, 0.5)
2: Input(img\_1, h, w, s), Input(img\_2, h, w, s)
3: Build the combined model that trains the generators against the discriminators
4: **for** epoch **in** range(300):
5: **for** batch\_i, (img\_1, img\_2) **in** enumerate(dataloader.load\_batch(1)):
6: fake\_2 = generator\_1(img\_1), fake\_1 = generator\_2(img\_2)
7: recon\_1 = generator\_2(fake\_2), recon\_2 = generator\_1(fake\_1)
8: vali\_1 = discriminator\_1(fake\_1), vali\_2 = discriminator\_2(fake\_2)
9: **if** loss reaches $\arg \min_{G_A, G_B} \max_{D_A, D_B} L(A, B, Z, T)$**:**
10: clear\_image = concatenate(img\_1, fake\_2, recon\_1, img\_2, fake\_1, recon\_2)
11: **return** clear\_image
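The buffer pool strategy referenced above can be sketched as follows, assuming the common CycleGAN-style image pool: with 50% probability the discriminator is shown a stored historical fake instead of the newest one, which damps oscillation between the two networks. The class name and the 50% replay heuristic are assumptions of this sketch.

```python
# Buffer pool of generated samples: the discriminator sees a mix of the
# newest fakes and historical fakes, stabilizing adversarial training.
import random
import torch

class ImagePool:
    def __init__(self, size: int = 50):    # pool size 50, as in the text
        self.size = size
        self.images = []

    def query(self, image: torch.Tensor) -> torch.Tensor:
        if len(self.images) < self.size:    # fill the pool first
            self.images.append(image.detach().clone())
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.size)
            old = self.images[idx]
            self.images[idx] = image.detach().clone()
            return old                      # replay a historical sample
        return image                        # or pass through the new one
```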
