• Generator Network

In image-to-image translation, a high-resolution input grid must be mapped to a high-resolution output grid. Although the input and output images differ in appearance, they share the same underlying structure, and the generator architecture must account for this. Most previous work used an encoder-decoder network [43] for such scenarios. In an encoder-decoder CNN, the input is progressively downsampled until the bottleneck layer, after which the process is reversed and the data are progressively upsampled. Convolutional layers use 4 × 4 filters with a stride of 2 for downsampling, and the same kernel size is used for the transposed convolutions during upsampling. Each convolution/deconvolution operation is followed by batch normalization and a Rectified Linear Unit (ReLU) activation. The weights of the generator are updated based on the adversarial loss from the discriminator and the *L*1 loss of the generator. Architecture details are shown in Table 1.
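As a minimal sketch of the encoder path, the snippet below shows how repeated 4 × 4 convolutions with stride 2 halve the spatial resolution at every layer down to the bottleneck. The 256 × 256 input size and the padding of 1 are illustrative assumptions; the exact layer configuration is given in Table 1.

```python
# Spatial size after one 4x4 convolution with stride 2.
# (A padding of 1 is assumed here; see Table 1 for the actual layers.)
def conv_out(size, kernel=4, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

# Encoder path: a hypothetical 256x256 input is halved at each layer
# until the 1x1 bottleneck; the decoder mirrors this with transposed
# convolutions of the same kernel size.
sizes = [256]
while sizes[-1] > 1:
    sizes.append(conv_out(sizes[-1]))

print(sizes)  # [256, 128, 64, 32, 16, 8, 4, 2, 1]
```

The decoder then reverses this sequence, doubling the resolution at each transposed-convolution layer until the original grid size is restored.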


**Table 1.** Generator architecture of the CGANet model.

These networks require all the input information to pass through each of the middle layers. In most image-to-image translation problems, however, it is desirable to share feature maps across the network, since the input and output images represent the same underlying structure. For this purpose, we added skip connections following the general configuration of a "U-Net" [44]. Each skip connection simply concatenates the channels at the *i*th layer with the channels at the (*n* − *i*)th layer, where *n* is the total number of layers.
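The skip connection itself is just a channel-wise concatenation, as sketched below. The tensor shapes (batch, channels, height, width) and channel counts are illustrative, not taken from Table 1.

```python
import numpy as np

# Minimal sketch of a U-Net skip connection: feature maps from encoder
# layer i are concatenated channel-wise with the decoder feature maps
# at layer (n - i). Shapes here are hypothetical: (batch, C, H, W).
encoder_feat = np.zeros((1, 64, 128, 128))  # from layer i of the encoder
decoder_feat = np.zeros((1, 64, 128, 128))  # upsampled to the same H x W

skipped = np.concatenate([encoder_feat, decoder_feat], axis=1)
print(skipped.shape)  # (1, 128, 128, 128): channel count doubles
```

Because the concatenation doubles the channel count, the subsequent decoder convolution must accept twice as many input channels as its mirror layer in the encoder.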

• Discriminator Network

We adapted the PatchGAN architecture [45] for the discriminator, which penalizes structure at the scale of patches: it attempts to classify each N × N patch of the image as real or fake. The final output of the discriminator (*D*) is calculated by running the discriminator convolutionally across the image and averaging the responses. In this case, the patch was 30 × 30 in size, and each convolutional layer was followed by a ReLU activation and batch normalization. Zero-padding layers were used to preserve the edge details of the input feature maps during convolution. The discriminator architecture is described in Table 2.
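As an illustration of how a 30 × 30 response map can arise, the sketch below assumes a pix2pix-style discriminator with 4 × 4 kernels, padding 1, and stride pattern (2, 2, 2, 1, 1) on a hypothetical 256 × 256 input; the actual layers are those in Table 2.

```python
# Sketch of the PatchGAN output-map size. The stride pattern below is
# an assumption modeled on the common pix2pix discriminator, not a
# statement of the CGANet layers (see Table 2 for those).
def conv_out(size, kernel=4, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

size = 256  # hypothetical input resolution
for stride in (2, 2, 2, 1, 1):
    size = conv_out(size, stride=stride)

print(size)  # 30: each position in the 30x30 map scores one patch
```

The scalar output of *D* is then the mean of all 30 × 30 responses, which is the averaging step described above.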


**Table 2.** Discriminator architecture of the CGANet model.
