### 3.3. Training

Our architecture can be trained either in separate steps or jointly in an end-to-end manner. We discuss the details of these two training schemes in the next two sections.

#### 3.3.1. Separate Training

In separate training, we train the SR network (the generator module and the discriminator *DRa*) and the detector separately. The detector loss is not backpropagated to the generator module; therefore, the generator is not aware of the detector and only receives feedback from the discriminator *DRa*. For example, in Equation (11), no error is backpropagated to the *G*\_*een* network (the network is detached during the calculation of the detector loss) while calculating the loss *Lcls*\_*frcnn*.
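To make the detachment concrete, the following is a minimal PyTorch-style sketch of one separate-training step. The module and loss names (`g_een`, `d_ra`, `detector`, the `*_loss_fn` helpers, and the optimizers) are our own placeholders, not identifiers from the paper:

```python
def separate_training_step(g_een, d_ra, detector,
                           lr_img, hr_img, targets,
                           opt_g, opt_d, opt_det,
                           gen_loss_fn, disc_loss_fn, det_loss_fn):
    """One separate-training step: the SR network and the detector are
    optimized independently; the detector loss never reaches the generator."""
    # Generator (G_een) step: feedback comes only from the discriminator D_Ra.
    sr_img = g_een(lr_img)
    loss_g = gen_loss_fn(d_ra(sr_img), d_ra(hr_img))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # Discriminator (D_Ra) step: the SR output is detached so that only
    # D_Ra's parameters are updated here.
    loss_d = disc_loss_fn(d_ra(sr_img.detach()), d_ra(hr_img))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Detector step: detaching sr_img cuts the computation graph, so the
    # detector loss (e.g., L_cls_frcnn) cannot backpropagate into G_een.
    loss_det = det_loss_fn(detector(sr_img.detach()), targets)
    opt_det.zero_grad()
    loss_det.backward()
    opt_det.step()
    return loss_g, loss_d, loss_det
```

The `detach()` call on the SR output is what realizes the statement above: it blocks the detector's gradients from ever reaching the generator module.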

#### 3.3.2. End-to-End Training

In end-to-end training, we train the whole architecture jointly, which means that the detector loss is backpropagated to the generator module. Therefore, the generator module receives gradients from both the detector and the discriminator *DRa*. We get the final discriminator loss (*LD*\_*det*) as follows:

$$L_{D\_det} = L_{D}^{Ra} + \eta L_{det} \tag{17}$$

Here, *η* is a parameter that balances the contribution of the detector loss, and we empirically set it to 1. The detection loss from SSD or Faster R-CNN is denoted by *Ldet*. Finally, we get the overall loss (*Loverall*) for our architecture as follows.

$$L_{overall} = L_{G\_een} + L_{D\_det} \tag{18}$$
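As an illustration of how Equations (17) and (18) could be assembled, the following is a simplified PyTorch-style sketch of one end-to-end step. In practice the discriminator and generator are usually updated in alternation; here, as in the sketch above, all names (`g_een`, `d_ra`, `detector`, the loss helpers, `opt_all`) are our own placeholders:

```python
def end_to_end_training_step(g_een, d_ra, detector,
                             lr_img, hr_img, targets,
                             opt_all, gen_loss_fn, disc_loss_fn, det_loss_fn,
                             eta=1.0):
    """One end-to-end step: detector gradients flow back into G_een."""
    sr_img = g_een(lr_img)  # no detach: gradients may flow back to G_een

    # Eq. (17): L_D_det = L_D^Ra + eta * L_det (eta set to 1 empirically).
    loss_d_ra = disc_loss_fn(d_ra(sr_img), d_ra(hr_img))
    loss_det = det_loss_fn(detector(sr_img), targets)  # SSD or Faster R-CNN
    loss_d_det = loss_d_ra + eta * loss_det

    # Eq. (18): L_overall = L_G_een + L_D_det.
    loss_g_een = gen_loss_fn(d_ra(sr_img), d_ra(hr_img))
    loss_overall = loss_g_een + loss_d_det

    # One backward pass: the generator module receives gradients from both
    # the discriminator D_Ra and the detector.
    opt_all.zero_grad()
    loss_overall.backward()
    opt_all.step()
    return loss_overall
```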
