#### 3.1.3. Image Reconstruction

Recall that the pre-denoised image output by the generation network is *z*˜, and that the KPN yields a set of kernels *K* = {*K<sub>p</sub>* : *p* ∈ *z*˜}, where each *K<sub>p</sub>* is a *k* × *k* matrix whose element in row *x* and column *y* is denoted *K<sub>p</sub>*(*x*, *y*). To ensure that the weights of each kernel fall in the interval [0,1] and sum to 1 [21], we first normalize them with the SoftMax function:

$$\hat{K}\_p(x, y) = \frac{\exp\left(K\_p(x, y)\right)}{\sum\_{1 \le s, t \le k} \exp\left(K\_p(s, t)\right)} \tag{1}$$

Each element of the prediction kernel *K*ˆ*<sub>p</sub>* measures the degree of influence of the corresponding pixel in the *k* × *k* neighborhood around the pixel *p*. The final reconstructed image can therefore be calculated as follows [21]:

$$\hat{z}(p) = \sum\_{1 \le x, y \le k} \hat{K}\_p(x, y)\, \tilde{z}\big(p + (x, y)\big) \tag{2}$$

By normalizing the weights, the final color value estimated for each pixel is constrained to a convex combination of its neighborhood, which greatly reduces the search space of the output estimate during denoising and avoids phenomena such as the color-shift effect. Normalization also keeps the gradients of the weights relatively stable, avoiding the large gradient oscillations that the high-dynamic-range characteristics of the input image would otherwise cause during training.
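Equations (1) and (2) together amount to per-pixel filtering. A minimal NumPy sketch, assuming a hypothetical `(H, W, k, k)` layout for the predicted kernels and edge-replicated padding at the image borders:

```python
import numpy as np

def softmax_normalize(kernels):
    """SoftMax-normalize per-pixel kernels (Equation (1)).

    kernels: array of shape (H, W, k, k) -- one raw k x k kernel K_p
    per pixel p, as predicted by the KPN (hypothetical layout).
    Returns kernels whose weights lie in [0, 1] and sum to 1.
    """
    e = np.exp(kernels - kernels.max(axis=(2, 3), keepdims=True))  # stable exp
    return e / e.sum(axis=(2, 3), keepdims=True)

def apply_kernels(z_tilde, kernels):
    """Reconstruct the image by per-pixel filtering (Equation (2)).

    z_tilde: (H, W) pre-denoised generator output.
    kernels: (H, W, k, k) SoftMax-normalized kernels.
    """
    H, W, k, _ = kernels.shape
    r = k // 2
    padded = np.pad(z_tilde, r, mode="edge")  # replicate border pixels
    out = np.zeros_like(z_tilde)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]   # k x k neighborhood of p
            out[y, x] = np.sum(kernels[y, x] * patch)
    return out
```

Because each normalized kernel is a convex combination, the reconstructed value always stays within the dynamic range of its *k* × *k* neighborhood, which is what rules out the color-shift effect mentioned above.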

#### 3.1.4. Loss Function Design

The network proposed in this article is made up of a KPN and a DGAN, so designing a reasonable loss function that makes the two networks work together and improves denoising quality is very important. Specifically, our loss function consists of three parts.

Adversarial loss function *L<sub>DGAN</sub>*: the generator uses the input noisy image to produce a preliminary denoised rendered image, and the discriminators compare the generated image with the real image. Our dataset is *M* = {*m<sub>i</sub>* = ((*x<sub>i</sub>*, *f<sub>i</sub>*), *g<sub>i</sub>*) : *i* = 1, 2, . . . , *N*}, and *R* is the number of denoising iterations, which we set to 4 in our work. The generator is *G*, and the discriminators are *D* = {*D<sub>i</sub>*}, *i* = 1, 2, 3, 4. Training the adversarial network is the process of optimizing the loss function *L<sub>DGAN</sub>*, as in [23]:

$$\min\_{G} \max\_{D} L\_{DGAN}(G, D) \tag{3}$$

$$\min\_{G} \max\_{D\_1, \dots, D\_R} \sum\_{i=1}^{R} L\_{DGAN}(G, D\_i) \tag{4}$$

In the optimization of Equation (4), we first solve for the parameters of each *D<sub>i</sub>* so that *L<sub>DGAN</sub>* reaches its maximum; each *D<sub>i</sub>* is then fixed, and we solve for the *G* that minimizes *L<sub>DGAN</sub>*(*G*, *D<sub>i</sub>*):

$$G^{\*} = \arg\min\_{G} \max\_{D\_1, \dots, D\_R} \sum\_{i=1}^{R} L\_{DGAN}(G, D\_i) \tag{5}$$

The generator *G*∗ obtained from Equation (5) then has the model parameters needed to produce a plausible denoised image. Unlike the usual discriminator formulation, we adopt the form of [15]: for the loss function *L<sub>DGAN</sub>*(*G*, *D<sub>i</sub>*) of a single discriminator, instead of letting the discriminator output a probability that judges whether a sample is real or fake, the *L*1 distance between the discriminator outputs for the two samples is used, namely [15]:

$$L\_{DGAN}(G, D\_i) = E\left[\frac{1}{|z|} \sum\_{p \in z} \left\| D\_i(z(p), g(p)) - D\_i\left(z(p), G(z(p))\right) \right\|\_1 \right] \tag{6}$$

Here, |*z*| is the total number of pixels in the image; *D<sub>i</sub>*(*z*(*p*), *g*(*p*)) is the output of the *i*-th discriminator when the input image pixel *z*(*p*) and the corresponding real image pixel *g*(*p*) are fed to it, and likewise *D<sub>i</sub>*(*z*(*p*), *G*(*z*(*p*))) is its output when the generator output *G*(*z*(*p*)) corresponding to *z*(*p*) is fed to it. *E* denotes the mathematical expectation, i.e., the average of the loss values computed over all samples in the dataset.
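A minimal NumPy sketch of Equation (6); the `toy_d` map below is purely illustrative and stands in for a trained discriminator. The loss is zero exactly when the generated pair matches the real pair and grows with their *L*1 distance:

```python
import numpy as np

def l1_discriminator_loss(d_i, z, g, fake):
    """Single-discriminator loss of Equation (6).

    Instead of a real/fake probability, the discriminator d_i maps a
    (noisy pixel, candidate pixel) pair to a feature value, and the loss
    is the mean L1 distance between the outputs for the real pair (z, g)
    and the generated pair (z, fake).
    """
    real_out = d_i(z, g)
    fake_out = d_i(z, fake)
    return np.mean(np.abs(real_out - fake_out))  # E[(1/|z|) sum ||.||_1]

# Illustrative stand-in discriminator: any deterministic map of the pair
# works for demonstrating the loss; this one is NOT from the paper.
toy_d = lambda z, y: z * y + 0.5 * y
```

A generator that fools every *D<sub>i</sub>* under this loss must make the discriminator outputs for real and generated pairs indistinguishable, which is the feature-matching intuition behind Equation (6).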

Equation (6) is used only when the adversarial generation network is trained alone. When the KPN and the DGAN are trained jointly, Equation (6) takes the following form:

$$L\_{DGAN}(G, D\_i) = E\left[\frac{1}{|z|} \sum\_{p \in z} \left\|D\_i(z(p), g(p)) - D\_i(z(p), \hat{z}(p))\right\|\_1\right] \tag{7}$$

That is, the generator output *G*(*z*(*p*)) is replaced by the pixel estimate *z*ˆ(*p*) obtained by applying the prediction kernel from the KPN to the generator output.

Kernel prediction loss function *L<sub>kernel</sub>*: the true value of the prediction kernel cannot be obtained, because the dataset contains no such label. We therefore use the real images *g<sub>i</sub>* for supervision, which at the same time makes the two networks work together. *L<sub>kernel</sub>* is defined as:

$$L\_{kernel} = \sum\_{z\_i} \frac{1}{|z\_i|} \sum\_{p \in z\_i} \left\|\hat{z}(p) - g(p)\right\|\_1 \tag{8}$$

Several state-of-the-art studies [16,24,25] found that using *L*1 loss instead of *L*2 loss can also reduce speckle-like artifacts in the reconstructed image, because *L*2 is more sensitive to outliers such as bright highlights, which have a great influence on the squared error; compared with *L*2 loss, *L*1 loss is more robust to outliers, which is also confirmed in previous literature. However, although *L*1 or *L*2 loss usually obtains a higher peak signal-to-noise ratio (PSNR) [26], it blurs high-frequency components and leads to blurry texture. It is therefore necessary to adopt another loss function to compensate for the high-frequency details, so we add a tone loss function *L<sub>Tone</sub>*. Inspired by the method of [27], this new loss term sharpens the details of the generated denoised image, improves the denoising of low-contrast and darker noisy images, and raises their contrast. *L<sub>Tone</sub>* has the following form:

$$L\_{Tone} = \sum\_{z\_i} \frac{1}{|z\_i|} \sum\_{p \in z\_i} \left\| \frac{\hat{z}(p)}{1 + \hat{z}(p)} - \frac{g(p)}{1 + g(p)} \right\|\_1 \tag{9}$$

Equation (9) is inspired by tone mapping, a common operation in image processing that maps pixel values from a small range into a larger one so that the picture appears clearer and brighter. This penalty term improves the contrast and clarity of the scene. Finally, the overall loss function *L<sub>total</sub>* is defined as a mixture of the three terms above:

$$L\_{total} = \alpha L\_{DGAN} + \beta L\_{kernel} + \omega L\_{Tone} \tag{10}$$

The balance parameters *α*, *β*, and *ω* are set to 0.003, 0.008, and 0.09, respectively. With such a loss function, the overall network structure works together as one end-to-end structure.
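The three terms can be sketched per image in NumPy as follows (the sums over dataset images *z<sub>i</sub>* in Equations (8) and (9) are dropped for a single image, and the adversarial term is assumed precomputed). Note that the tone map *x* → *x*/(1 + *x*) has derivative 1/(1 + *x*)², so errors in dark regions are weighted relatively more, matching the stated goal for low-contrast, dark images:

```python
import numpy as np

def kernel_loss(z_hat, g):
    """Equation (8): mean L1 error of the reconstructed image z_hat."""
    return np.mean(np.abs(z_hat - g))

def tone_loss(z_hat, g):
    """Equation (9): L1 error in the tone-mapped domain x -> x/(1+x)."""
    return np.mean(np.abs(z_hat / (1 + z_hat) - g / (1 + g)))

def total_loss(l_dgan, z_hat, g, alpha=0.003, beta=0.008, omega=0.09):
    """Equation (10), with the balance parameters from the text.

    l_dgan is the (precomputed) adversarial term of Equation (7).
    """
    return alpha * l_dgan + beta * kernel_loss(z_hat, g) + omega * tone_loss(z_hat, g)
```

Since the tone map compresses bright values, `tone_loss` is always at most `kernel_loss` for non-negative images; its purpose is to redistribute the penalty toward dark, low-contrast regions rather than to add magnitude.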

Finally, this article chooses the gradient magnitude similarity deviation (GMSD) as the image noise estimate: compared with other metrics such as the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [21], it has achieved good results on public image quality assessment databases and is relatively fast to compute.
