**1. Introduction**

Rain is a common weather condition that negatively impacts computer vision systems. Raindrops appear as bright streaks in images because of their high velocity and the light they scatter. Since image recognition and detection algorithms are designed for clean inputs, an effective mechanism for rain streak removal is essential.

A number of research efforts in the literature have focused on restoring rainy images, taking different approaches. Some have removed rain streaks using video [1–3], while others have recovered a rain-free image from a single image by treating the problem as a signal separation task [4–6].

Since rain streaks overlap with background texture patterns, it is quite a challenging task to remove the rain streaks while maintaining the original texture of the background. Often, this results in over-smoothed regions that are visible in the background after the de-raining process. De-raining algorithms [7,8] tend to over-de-rain or under-de-rain the original image. A key limitation of traditional, handcrafted methods is that the feature learning is manual and designed to deal only with certain types of rain streaks, so they do not perform well with varying scales, shapes, orientations, and densities
of raindrops [9,10]. In contrast, with convolutional neural networks (CNNs), the feature learning process becomes an integral part of the algorithm and can unveil many hidden features. CNN-based methods [11–13] have achieved substantial improvements in image de-raining over the last few years. These methods learn a nonlinear mapping between the input rainy image and the expected ground-truth image.

Still, there is potential for improvement and optimization in CNN-based image de-raining algorithms, which could lead to more accurate and visually appealing results. Rather than being constrained to characterizing rain streaks, the optimization function should also account for visual quality, which improves the visual appeal of the results. The objective function should likewise reflect the requirement that the performance of vision algorithms, such as classification and detection, not be affected by the presence of rain streaks. Adding this discriminative information ensures that the output is indistinguishable from its original counterpart.

Generative modeling is an unsupervised machine learning task that involves automatically discovering and learning the patterns in input data such that the model can generate new examples that are indistinguishable from reality. The concept of generative adversarial networks (GANs) was originally presented in [14] and has attracted a high level of interest, with several successful applications and directions reported within a short period in the machine learning community. Existing CNN-based mechanisms consider only *L*1 (least absolute deviations) or *L*2 (least squares) errors, whereas conditional GANs have an additional adversarial loss component, which results in visually appealing, high-quality image outputs.

In our approach, we propose a conditional generative adversarial network-based framework for rain streak removal. Our model consists of a densely connected generator (G) network and a CNN-based discriminator (D) network. The generator converts rainy images into de-rained images in such a way that it fools the discriminator. In certain scenarios, traditional GANs tend to produce artificial-looking, visually displeasing outputs. To mitigate this issue, we introduce a conditional CNN with skip connections for the generator. Skip connections promote better convergence by efficiently leveraging features from different layers of the network. The proposed model is based on the Pix2Pix framework by Isola et al. [15] and the conditional generative adversarial networks originally proposed by Fu et al. [16]. We also used the source code provided by the authors of LPNet [17] and GMM [18] for quantitative and qualitative comparisons with the proposed model.
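To illustrate the role of skip connections, the following minimal PyTorch sketch shows one encoder-decoder pair whose early features are carried forward by channel-wise concatenation. The layer sizes are placeholders for illustration only, not the actual CGANet generator configuration, which Section 3 details.

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Minimal encoder-decoder pair with one skip connection.

    Illustrative sketch only: layer sizes are placeholders,
    not the actual CGANet generator configuration.
    """
    def __init__(self, channels=64):
        super().__init__()
        self.encode = nn.Sequential(          # downsample by 2
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decode = nn.Sequential(          # upsample back
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Concatenating the skipped features doubles the channel
        # count, so fuse them back down with a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        skipped = x                  # early features, reused later
        h = self.decode(self.encode(x))
        # Skip connection: concatenate early and decoded features.
        return self.fuse(torch.cat([h, skipped], dim=1))
```

Because the decoder receives the unmodified early features alongside the upsampled ones, gradients have a short path back to the early layers, which is what aids convergence.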

This paper makes the following contributions:


The paper is organized as follows: Section 2 provides an overview of related methods for image de-raining and the basic concepts behind cGANs. Section 3 describes the proposed model (CGANet, the Conditional Generative Adversarial Network model) and its architecture in detail. Section 4 describes the experimental details and evaluation results. Section 5 provides the conclusion. Implementation details and the dataset used for the experiments are publicly available on GitHub (https://github.com/prasadmaduranga/CGANet (accessed on 11 December 2020)).

#### **2. Related Work**

Numerous methods and research approaches have been proposed for image de-raining. These methods can be categorized as single image-based methods and video-based methods. With the evolution of neural networks, deep learning-based methods have become more dominant and efficient than earlier state-of-the-art methods.

#### *2.1. Single Image-based Methods*

Single image-based methods have limited access to information compared to video-based methods, which makes it more challenging to remove the rain streaks. Single image-based methods include low-rank approximations [3,19], dictionary learning [4,5,20], and kernel-based methods [21]. In [4], the authors decomposed the image into high- and low-frequency components and recognized the rain streaks by processing the high-frequency components. Other mechanisms have used gradients [22] and mixture models [18] to model and remove rain streaks. In [18], the authors introduced a patch-based prior for both clean and rainy layers using Gaussian mixture models (GMM). The GMM prior for rainy layers was learned from rainy images, while for the clean images, it was learned from natural images. Nonlocal mean filtering and kernel regression were used to identify rain streaks in [21].

#### *2.2. Video-based Methods*

With the availability of inter-frame information, video-based image de-raining is more effective and easier than single image de-raining. Most research studies [1,23,24] have focused on detecting potential rain streaks using their physical characteristics and removing them with image restoration algorithms. In [25], the authors divided rain streaks into dense and sparse groups and removed the streaks using a matrix decomposition algorithm. Other methods have performed de-raining in the Fourier domain [1] and used Gaussian mixture models [23], matrix completion [24], and low-rank approximations [3].

#### *2.3. Deep Learning-based Methods*

Deep learning-based methods have gained much popularity and success in a variety of high-level computer vision tasks in the recent past [26–28], as well as in image processing problems [29–31]. Deep learning was introduced for de-raining in [11], where a three-layer CNN was used to remove rain streaks and dirt spots from an image taken through glass. In [12], a CNN was proposed for video-based de-raining, while a recurrent neural network was adopted by Liu in [32]. The authors in [33] proposed a residual-guide feature fusion network for single image de-raining. A pyramid of networks was proposed in [17], which used domain-specific knowledge to reinforce the learning process.

CNNs learn to minimize a loss function, and the loss itself determines the quality of the output. Significant design effort and domain expertise are required to define an effective loss function; in other words, the user must specify exactly what the CNN should minimize. Instead, if one can set a high-level, general goal such as "make the output image indistinguishable from the target images", the network can automatically learn a loss function that satisfies it. This is the basic concept underlying generative adversarial networks (GANs).

#### *2.4. Generative Adversarial Networks*

Generative adversarial networks [14] are unsupervised generative models that contain two deep neural networks, named the generator (*G*) and the discriminator (*D*), which are trained in parallel. GAN training can be considered a two-player min-max game in which the generator and discriminator compete against each other. The generator is trained to learn a mapping from a random noise vector (*z*) in latent space to an image (*x*) in a target domain: G(z) → x. The discriminator learns to classify a given image as real (output close to 1) or as fake, i.e., produced by the generator (output close to 0): D(x) → [0, 1]. Both the generator and the discriminator can be considered separate neural networks trained via backpropagation, each with its own loss function. Figure 1 shows the high-level architecture of the proposed conditional GAN model. The generator tries to generate synthetic images that resemble real images closely enough to fool the discriminator, while the discriminator learns to distinguish real images from the generator's synthetic ones.

**Figure 1.** High-level architecture of the proposed model (CGANet).
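To make the min-max dynamic concrete, the following minimal PyTorch sketch shows one training step. It is illustrative only: `G`, `D`, and their optimizers are assumed to be defined elsewhere, `D` is assumed to output one probability per image, and the common non-saturating variant is used for the generator update.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_train_step(G, D, opt_G, opt_D, real_images, z):
    """One adversarial step: D learns real vs. fake, G learns to fool D."""
    ones = torch.ones(real_images.size(0), 1)    # "real" labels
    zeros = torch.zeros(real_images.size(0), 1)  # "fake" labels

    # Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    opt_D.zero_grad()
    loss_D = bce(D(real_images), ones) + bce(D(G(z).detach()), zeros)
    loss_D.backward()        # .detach() keeps G frozen during this step
    opt_D.step()

    # Generator step: fool the discriminator, pushing D(G(z)) toward 1.
    opt_G.zero_grad()
    loss_G = bce(D(G(z)), ones)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```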

The most widespread application of GANs is data augmentation, that is, learning from existing real-world samples and generating new samples consistent with their distribution. Generative modeling has been used in a wide range of application domains, including computer vision, natural language processing, computer security, and medicine.

Xu et al. [34] used GANs to synthesize image data for training and validating perception systems for autonomous vehicles. In addition, [35,36] used GANs for data fusion when developing image classification models, mitigating the issue of small datasets. Furthermore, GANs were used to augment datasets for adversarial training in [37]. To increase the resolution of images, a super-resolution GAN was proposed by Ledig et al. [38], which took a low-resolution image as input and generated a high-resolution image with 4× upscaling. To convert image content from one domain to another, an image-to-image translation approach was proposed by Isola et al. [15] using cGANs. Roy et al. [39] proposed TriGAN, which addresses image translation by adapting multiple source domains. Experiments showed that the SeqGAN proposed in [40] outperformed traditional methods for music and speech generation. In the computer security domain, Hu and Tan [41] proposed a GAN-based model to generate malware. For private product customization, Hwang et al. [42] proposed GANs to manufacture medical products.

#### **3. Proposed Model**

The proposed approach treats image de-raining as an image-to-image translation task. In a GAN, the generator produces its output from the latent or noise variable (*z*). In the proposed approach, however, the generator output must be correlated with the source image. We therefore apply the conditional GAN [16], a variant of the traditional GAN that takes additional information, *y*, as input. In this case, we provide the source image with rain streaks as the additional information for both the generator and the discriminator; *x* represents the target image.

The objective of a conditional GAN is as follows:

$$L_{\text{cGAN}}(G, D) = E_{x \sim p_{\text{data}}(X)}[\log D(x, y)] + E_{z \sim p_z(Z)}[\log(1 - D(G(z, y), y))]\tag{1}$$

where $p_{\text{data}}(X)$ denotes the real data probability distribution defined on the data space $X$, and $p_z(Z)$ denotes the probability distribution of the latent variable $z$ defined on the latent space $Z$. $E_{x \sim p_{\text{data}}(X)}$ and $E_{z \sim p_z(Z)}$ represent the expectations over the data spaces $X$ and $Z$, respectively. $G(\cdot)$ and $D(\cdot)$ represent the nonlinear mappings of the generator and discriminator networks, respectively.
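As a sketch of how Equation (1) might be evaluated in code (illustrative only; it assumes the Pix2Pix convention of feeding the discriminator the image concatenated channel-wise with its condition $y$, and a discriminator that outputs one probability per image):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def cgan_loss(G, D, x, y, z):
    """Eq. (1): x = target image, y = rainy condition, z = latent noise.
    BCE with these labels is the negative of Eq. (1), so the
    discriminator maximizes Eq. (1) by minimizing this value."""
    real_pair = torch.cat([x, y], dim=1)        # corresponds to D(x, y)
    fake_pair = torch.cat([G(z, y), y], dim=1)  # corresponds to D(G(z, y), y)
    ones = torch.ones(x.size(0), 1)
    zeros = torch.zeros(x.size(0), 1)
    return bce(D(real_pair), ones) + bce(D(fake_pair), zeros)
```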

In an image de-raining task, the higher-order color and texture information has to be preserved during the image translation. This has a significant impact on the visual performance of the output. Adversarial loss alone is not sufficient for this task. The loss function should be optimized so that it penalizes the perceptual differences between the output image and the target image.

Our implementation architecture is based on Isola et al.'s Pix2Pix framework [15]. It learns a mapping from an input image to an output image, along with the objective function used to train the model. Pix2Pix suggests the *L*1 (mean absolute error) loss instead of the *L*2 (mean squared error) loss for the GAN objective function, since it encourages less blurring in the generator output. The *L*1 loss averages the pixel-level absolute difference between the target image $x$ and the generated image $G(z, y)$, taking the expectation over $x$, $y$, and $z$:

$$L_1(G) = E_{x, y, z}[\| x - G(z, y) \|_1]\tag{2}$$
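In code, Equation (2) reduces to a mean absolute error between the target and the generator output; a minimal sketch in the notation above (with `G`, `x`, `y`, and `z` as in the Equation (1) sketch):

```python
import torch

def l1_loss(G, x, y, z):
    """Eq. (2): mean absolute pixel difference between the
    target image x and the generated image G(z, y)."""
    return torch.mean(torch.abs(x - G(z, y)))
```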

Finally, the loss function for this work is as follows:

$$L(G, D) = L_{\text{cGAN}}(G, D) + \lambda L_1(G)\tag{3}$$

Lambda ($\lambda$) is a hyperparameter that controls the relative weight of the two terms; in this case, we kept $\lambda = 100$ [15]. During training, the combined objective was maximized to train the discriminator and minimized to train the generator. The final objective was to identify the generator $G^*$ by solving the following optimization problem:

$$G^* = \arg\min_G \max_D \left( L_{\text{cGAN}}(G, D) + \lambda L_1(G) \right)\tag{4}$$
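A sketch of one alternating optimization step for Equation (4), with $\lambda = 100$ as above. The networks, optimizers, and the channel-wise conditioning of the discriminator are assumptions carried over from the earlier sketches, not the exact CGANet training code:

```python
import torch
import torch.nn.functional as F

LAMBDA = 100  # weight of the L1 term, following [15]

def optimization_step(G, D, opt_G, opt_D, x, y, z):
    """One min-max step for Eq. (4): x = target, y = rainy condition,
    z = latent noise. Assumes D takes (image, condition) concatenated
    on the channel axis and outputs one probability per image."""
    ones = torch.ones(x.size(0), 1)
    zeros = torch.zeros(x.size(0), 1)

    # Discriminator step: maximize L_cGAN, i.e. minimize its BCE form.
    opt_D.zero_grad()
    fake = G(z, y).detach()                       # freeze G for this step
    loss_D = F.binary_cross_entropy(D(torch.cat([x, y], dim=1)), ones) + \
             F.binary_cross_entropy(D(torch.cat([fake, y], dim=1)), zeros)
    loss_D.backward()
    opt_D.step()

    # Generator step: minimize adversarial term + lambda * L1 (Eq. 3).
    opt_G.zero_grad()
    fake = G(z, y)
    adv = F.binary_cross_entropy(D(torch.cat([fake, y], dim=1)), ones)
    loss_G = adv + LAMBDA * torch.mean(torch.abs(x - fake))
    loss_G.backward()
    opt_G.step()
```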

#### *Model Overview*
