### 3.1. Generator

Our generator consists of a generator network G and an edge-enhancement network EEN. In this section, we describe the architectures of both networks and the corresponding loss function.

#### 3.1.1. Generator Network *G*

We use the generator architecture of ESRGAN [21], in which all batch normalization (BN) layers are removed and residual-in-residual dense blocks (RRDB) are used. The overall architecture of the generator *G* is shown in Figure 3, and the RRDB is depicted in Figure 4.

Inspired by the architecture of ESRGAN, we remove BN layers to increase the performance of the generator *G* and to reduce the computational complexity. The authors of ESRGAN also state that the BN layers tend to introduce unpleasant artifacts and limit the generalization ability of the generator when the statistics of training and testing datasets differ significantly.

We use RRDB as the basic block of the generator network *G*; it is a multi-level residual network with dense connections. These dense connections increase network capacity, and we also use residual scaling to prevent instabilities during the training phase [21]. We use the parametric rectified linear unit (PReLU) [38] in the dense blocks, so that its slope parameter is learned jointly with the other network parameters. As discriminator (*DRa*), we employ a relativistic average discriminator similar to the one used in [21].
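For illustration, the following is a minimal PyTorch sketch of one RRDB with PReLU activations and residual scaling; the layer widths (64 feature maps, growth rate 32) and the scaling factor β = 0.2 follow the ESRGAN defaults [21] and are illustrative, not our exact implementation.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, nf=64, gc=32, beta=0.2):
        super().__init__()
        self.beta = beta  # residual scaling to stabilize training
        # each conv sees the concatenation of all previous feature maps
        self.convs = nn.ModuleList(
            [nn.Conv2d(nf + i * gc, gc, 3, padding=1) for i in range(4)]
        )
        self.conv_last = nn.Conv2d(nf + 4 * gc, nf, 3, padding=1)
        self.act = nn.PReLU()  # PReLU: slope learned with the other parameters

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        out = self.conv_last(torch.cat(feats, dim=1))
        return x + self.beta * out  # scaled residual connection

class RRDB(nn.Module):
    def __init__(self, nf=64, beta=0.2):
        super().__init__()
        self.beta = beta
        self.blocks = nn.Sequential(*[DenseBlock(nf) for _ in range(3)])

    def forward(self, x):
        return x + self.beta * self.blocks(x)  # residual over three dense blocks
```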

In Equations (1) and (2), the relativistic average discriminator is formulated for our architecture. Since our generator *G* depends on the discriminator *DRa*, we briefly introduce *DRa* here and describe the details in Section 3.2. The discriminator predicts the probability that a real image (*IHR*) is relatively more realistic than a generated intermediate image (*IISR*).

$$D\_{\mathrm{Ra}}(I\_{HR}, I\_{ISR}) = \sigma(C(I\_{HR}) - \mathbb{E}\_{I\_{ISR}}[C(I\_{ISR})]) \to 1 \quad \text{(more realistic than fake data?)} \tag{1}$$

$$D\_{\mathrm{Ra}}(I\_{ISR}, I\_{HR}) = \sigma(C(I\_{ISR}) - \mathbb{E}\_{I\_{HR}}[C(I\_{HR})]) \to 0 \quad \text{(less realistic than real data?)} \tag{2}$$

In Equations (1) and (2), *σ* and *C*(·) represent the sigmoid function and the discriminator output, respectively, and E[·] denotes the operation of calculating the mean over the corresponding images in a mini-batch. The generated intermediate images are created by the generator, where *IISR* = *G*(*ILR*). It is evident from Equation (3) that the adversarial loss of the generator contains both *IHR* and *IISR*; hence, it benefits from the gradients of both generated and ground-truth images during training. The discriminator loss is given in Equation (4).

$$L\_{G}^{Ra} = -\mathbb{E}\_{I\_{HR}}\left[\log\left(1 - D\_{\mathrm{Ra}}(I\_{HR}, I\_{ISR})\right)\right] - \mathbb{E}\_{I\_{ISR}}\left[\log\left(D\_{\mathrm{Ra}}(I\_{ISR}, I\_{HR})\right)\right] \tag{3}$$

$$L\_{D}^{Ra} = -\mathbb{E}\_{I\_{HR}}\left[\log\left(D\_{\mathrm{Ra}}(I\_{HR}, I\_{ISR})\right)\right] - \mathbb{E}\_{I\_{ISR}}\left[\log\left(1 - D\_{\mathrm{Ra}}(I\_{ISR}, I\_{HR})\right)\right] \tag{4}$$
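As a minimal sketch of Equations (1)–(4), assuming PyTorch, the following computes the relativistic average discriminator and the resulting adversarial losses; `c_real` and `c_fake` stand for the raw (pre-sigmoid) discriminator outputs *C*(*IHR*) and *C*(*IISR*) over a mini-batch, and `eps` (our addition) guards against log(0).

```python
import torch

def d_ra(c_x, c_y):
    # sigma(C(x) - E_y[C(y)]): probability that x is more realistic than the
    # mini-batch average of the opposite class y, as in Equations (1) and (2)
    return torch.sigmoid(c_x - c_y.mean(dim=0, keepdim=True))

def adversarial_losses(c_real, c_fake, eps=1e-8):
    d_rf = d_ra(c_real, c_fake)  # real vs. average fake
    d_fr = d_ra(c_fake, c_real)  # fake vs. average real
    # Equation (3): the generator sees gradients from both real and fake images
    loss_g = -(torch.log(1 - d_rf + eps).mean() + torch.log(d_fr + eps).mean())
    # Equation (4): the discriminator loss mirrors Equation (3)
    loss_d = -(torch.log(d_rf + eps).mean() + torch.log(1 - d_fr + eps).mean())
    return loss_g, loss_d
```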

We use two more losses for generator *G*: a perceptual loss (*Lpercep*) and a content loss (*L*1) [21]. The perceptual loss is calculated on the feature maps (*vggfea*(·)) before the activation layers of a fine-tuned VGG19 [62] network, and the content loss is the 1-norm distance between *IISR* and *IHR*. The perceptual and content losses are shown in Equations (5) and (6).

$$L\_{percep} = \mathbb{E}\_{I\_{LR}} \left|\left| vgg\_{fea}(G(I\_{LR})) - vgg\_{fea}(I\_{HR}) \right|\right|\_1 \tag{5}$$

$$L\_1 = \mathbb{E}\_{I\_{LR}} \left|\left| G(I\_{LR}) - I\_{HR} \right|\right|\_1 \tag{6}$$
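A sketch of Equations (5) and (6), assuming torchvision's pretrained VGG19: here `vgg_fea` truncates the network just before the activation following conv5_4 (feature index 34, a common choice for ESRGAN-style perceptual losses); the exact layer, and the use of a stock pretrained model instead of the fine-tuned VGG19 described above, are assumptions for brevity.

```python
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # features up to, but not including, the ReLU after conv5_4;
        # inputs are assumed to be ImageNet-normalized
        self.vgg_fea = vgg19(weights="IMAGENET1K_V1").features[:35].eval()
        for p in self.vgg_fea.parameters():
            p.requires_grad = False  # VGG is a fixed feature extractor
        self.l1 = nn.L1Loss()

    def forward(self, i_sr, i_hr):
        return self.l1(self.vgg_fea(i_sr), self.vgg_fea(i_hr))  # Equation (5)

content_loss = nn.L1Loss()  # Equation (6): 1-norm between G(I_LR) and I_HR
```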

**Figure 3.** Generator *G* with RRDB (residual-in-residual dense blocks), convolutional and upsampling blocks.

**Figure 4.** Internal diagram of RRDB (residual-in-residual dense blocks); panel (**b**) shows a dense block from the RRDB.

#### 3.1.2. Edge-Enhancement Network EEN

The EEN removes noise and enhances the edges extracted from an image. An overview of the network is depicted in Figure 5. First, the Laplacian operator [28] is used to extract edges from the input image. The extracted edge information is then passed through convolutional, RRDB, and upsampling blocks. A mask branch with sigmoid activation removes edge noise, as described in [22]. Finally, the enhanced edges are added back to the input image from which the Laplacian-extracted edges were subtracted.
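A minimal sketch of the edge-extraction step, assuming PyTorch; the 4-neighbour Laplacian kernel shown is one common discrete form and is an assumption, as the exact kernel is not specified above.

```python
import torch
import torch.nn.functional as F

def laplacian_edges(img):
    # img: (N, C, H, W); apply the same Laplacian kernel to every channel
    k = torch.tensor([[0., 1., 0.],
                      [1., -4., 1.],
                      [0., 1., 0.]], device=img.device)
    kernel = k.expand(img.shape[1], 1, 3, 3)  # depthwise weight (C, 1, 3, 3)
    return F.conv2d(img, kernel, padding=1, groups=img.shape[1])
```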

The EEN is similar to the edge-enhancement subnetwork proposed in [22], with two improvements. First, we replace the dense blocks with RRDB, which shows improved performance according to ESRGAN [21]. Second, we introduce a new loss term to improve the reconstruction of the edge information.

**Figure 5.** Edge-enhancement network where the input is an ISR (intermediate super-resolution) image and the output is an SR (super-resolution) image.

In [22], the authors extracted edge information from *IISR* and enhanced it using an edge-enhancement subnetwork, whose output is afterwards added to the edge-subtracted *IISR*. To train the network, [22] proposed using the Charbonnier loss [34] between *ISR* and *IHR*. This function is called the consistency loss for images (*Limg*\_*cst*) and helps to obtain visually pleasant outputs with good edge information. However, the edges of some objects are sometimes distorted and noisy, and consequently do not provide good edge information. Therefore, we additionally introduce a consistency loss for the edges (*Ledge*\_*cst*). To compute *Ledge*\_*cst*, we evaluate the Charbonnier loss between the edges (*Iedge*\_*SR*) extracted from *ISR* and the edges (*Iedge*\_*HR*) extracted from *IHR*. The two consistency losses are given in Equations (7) and (8), where *ρ*(·) is the Charbonnier penalty function [63]. The total consistency loss for both images and edges is obtained by summing the individual losses; the loss of our EEN is shown in Equation (9).

$$L\_{\text{img\\_cst}} = \mathbb{E}\_{I\_{SR}} \left[ \rho(I\_{HR} - I\_{SR}) \right] \tag{7}$$

$$L\_{\text{edge\\_cst}} = \mathbb{E}\_{I\_{\text{edge\\_SR}}} \left[ \rho(I\_{\text{edge\\_HR}} - I\_{\text{edge\\_SR}}) \right] \tag{8}$$

$$L\_{\text{een}} = L\_{\text{img\\_cst}} + L\_{\text{edge\\_cst}} \tag{9}$$
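A sketch of the Charbonnier penalty and the consistency losses of Equations (7)–(9), assuming PyTorch; the value ε = 10⁻³ is a typical choice for the Charbonnier penalty [63] and is illustrative.

```python
import torch

def charbonnier(x, eps=1e-3):
    # rho(x) = sqrt(x^2 + eps^2): a smooth, differentiable variant of L1
    return torch.sqrt(x * x + eps * eps).mean()

def een_loss(i_sr, i_hr, edge_sr, edge_hr):
    l_img_cst = charbonnier(i_hr - i_sr)         # Equation (7)
    l_edge_cst = charbonnier(edge_hr - edge_sr)  # Equation (8)
    return l_img_cst + l_edge_cst                # Equation (9)
```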

Finally, we get the overall loss for the generator module by adding the losses of the generator *G* and the EEN. The overall loss is shown in Equation (10), where *λ*1, *λ*2, *λ*3, and *λ*4 are weight parameters that balance the different loss components. We empirically set *λ*1 = 1, *λ*2 = 0.001, *λ*3 = 0.01, and *λ*4 = 5.

$$L\_{\text{G\\_een}} = \lambda\_1 L\_{percep} + \lambda\_2 L\_{G}^{Ra} + \lambda\_3 L\_1 + \lambda\_4 L\_{\text{een}} \tag{10}$$
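For completeness, a short sketch of Equation (10) with the empirically chosen weights; the individual loss terms are assumed to come from functions like those sketched above.

```python
# weights from Equation (10): lambda_1 ... lambda_4
LAMBDA_1, LAMBDA_2, LAMBDA_3, LAMBDA_4 = 1.0, 0.001, 0.01, 5.0

def generator_total_loss(l_percep, l_g_ra, l_1, l_een):
    return (LAMBDA_1 * l_percep + LAMBDA_2 * l_g_ra
            + LAMBDA_3 * l_1 + LAMBDA_4 * l_een)
```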
