**3. Method**

In this paper, we aim to improve the detection of small objects in remote sensing imagery. To this end, we propose an end-to-end network architecture that consists of two modules: a GAN-based SR network and a detector network. The whole network is trained end to end, and training requires pairs of HR and LR images.

The SR network has three components: a generator (G), a discriminator (*DRa*), and an edge-enhancement network (EEN). Training is end to end because the gradient of the detection loss from the detector is backpropagated into the generator. The detector therefore also acts like a discriminator and encourages the generator G to produce realistic images similar to the ground truth. The entire network can also be divided into two parts: a generator part, which comprises G and the EEN, and a discriminator part, which includes *DRa* and the detector network. Figure 2 shows the role of the detector as a discriminator.

**Figure 2.** Overall network architecture with a generator and a discriminator module.

The generator G produces intermediate super-resolution (ISR) images, and the final SR images are obtained after applying the EEN. The discriminator (*DRa*) discriminates between ground truth (GT) HR images and ISR images. The inverted gradients of *DRa* are backpropagated into the generator G in order to create SR images that allow accurate object detection. Edge information is extracted from the ISR image with the Laplacian operator, and the EEN enhances these edges. The original edges are then subtracted from the ISR image and the enhanced edges are added back, which yields the output SR image with enhanced edges. Finally, we detect objects in the SR images using the detector network.
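The edge recombination step described above can be sketched in a few lines. This is a minimal illustration rather than the network itself: the 3×3 Laplacian kernel, the edge-replicated padding, and the function `enhance_edges_placeholder` (a hypothetical stand-in that merely scales the edge map in place of the learned EEN) are all assumptions made for the example.

```python
# 3x3 discrete Laplacian kernel (a common choice; the exact kernel is an assumption).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian_edges(img):
    """Filter a 2D grayscale image (list of rows) with the Laplacian kernel.

    Borders are handled by replicating the nearest pixel (an assumption).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += LAPLACIAN[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def enhance_edges_placeholder(edges, gain=1.5):
    """Hypothetical stand-in for the learned EEN: scales the edge map by `gain`."""
    return [[gain * v for v in row] for row in edges]


def edge_enhanced_sr(isr, een=enhance_edges_placeholder):
    """Compute SR = ISR - edges(ISR) + EEN(edges(ISR))."""
    edges = laplacian_edges(isr)   # original edges extracted from the ISR image
    enhanced = een(edges)          # enhanced edges
    h, w = len(isr), len(isr[0])
    return [[isr[y][x] - edges[y][x] + enhanced[y][x] for x in range(w)]
            for y in range(h)]
```

If the edge enhancer returns its input unchanged, the subtraction and addition cancel and the SR image equals the ISR image, so any change in the output comes from the enhancement of the edge map.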

We use two different loss functions for the EEN: one measures the difference between the SR and ground truth images, and the other measures the difference between the edges extracted from the ISR and those of the ground truth. We also use the VGG19 [62] network to extract the features used for the perceptual loss [21]. As a result, the network generates more realistic images with more accurate edge information. We divide the whole pipeline into a generator and a discriminator, and these two components are elaborated in the following.
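As a rough illustration of the two EEN loss terms mentioned above, the sketch below computes an image-consistency term and an edge-consistency term. The Charbonnier penalty, the value of `eps`, and the names `charbonnier` and `een_losses` are assumptions chosen for the example; this section does not specify the distance function.

```python
import math


def charbonnier(a, b, eps=1e-6):
    """Mean Charbonnier penalty sqrt((a - b)^2 + eps^2) over two 2D arrays.

    The choice of penalty is an assumption for illustration only.
    """
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += math.sqrt((va - vb) ** 2 + eps ** 2)
            count += 1
    return total / count


def een_losses(sr, hr, pred_edges, gt_edges, eps=1e-6):
    """Return (image_loss, edge_loss) for one image.

    sr, hr:               super-resolved and ground truth images
    pred_edges, gt_edges: edge maps of the network output and the ground truth
    """
    image_loss = charbonnier(sr, hr, eps)
    edge_loss = charbonnier(pred_edges, gt_edges, eps)
    return image_loss, edge_loss
```

Both terms approach `eps` when the prediction matches the ground truth, and how they would be weighted against each other in a combined objective is left open here.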
