**4. Experiments**

As mentioned above, we trained our architecture both separately and in an end-to-end manner. For separate training, we first trained the SR network until convergence and then trained the detector networks on the resulting SR images. For end-to-end training, we also employed separate training as a pre-training step for weight initialization. Afterwards, the SR and object detection networks were trained jointly, i.e., the gradients from the object detector were propagated into the generator network.
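The difference between the two regimes can be illustrated with a toy scalar model. This is only a sketch and not our implementation: the "generator" and "detector" are each reduced to a single hypothetical weight (`w_g`, `w_d`), the detection loss is a squared error, and the generator's own SR losses are omitted. It shows only whether the detection-loss gradient reaches the generator weight.

```python
def detection_loss_grads(w_g, w_d, x, target):
    """Toy model: sr = w_g * x, pred = w_d * sr, loss = (pred - target)^2."""
    sr = w_g * x                      # stand-in for the super-resolved image
    pred = w_d * sr                   # stand-in for the detector output
    loss = (pred - target) ** 2
    dloss_dpred = 2.0 * (pred - target)
    grad_w_d = dloss_dpred * sr       # gradient w.r.t. the detector weight
    grad_w_g = dloss_dpred * w_d * x  # chain rule: detector gradient reaching the generator
    return loss, grad_w_g, grad_w_d


def train_step(w_g, w_d, x, target, lr, end_to_end):
    """One gradient-descent step on the detection loss.

    Separate training: only the detector weight is updated.
    End-to-end training: the generator weight is updated as well.
    """
    loss, grad_w_g, grad_w_d = detection_loss_grads(w_g, w_d, x, target)
    w_d = w_d - lr * grad_w_d
    if end_to_end:
        w_g = w_g - lr * grad_w_g
    return w_g, w_d, loss


# Same starting point, two regimes.
separate = train_step(1.0, 0.5, x=2.0, target=3.0, lr=0.1, end_to_end=False)
joint = train_step(1.0, 0.5, x=2.0, target=3.0, lr=0.1, end_to_end=True)
```

In the separate regime the generator weight is left untouched by the detection loss, whereas in the joint regime it moves in the direction that reduces the detector's error.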

In the training process, the learning rate was set to 0.0001 and halved after every 50,000 iterations. The batch size was set to 5. We used Adam [64] as the optimizer with *β*1 = 0.9 and *β*2 = 0.999, and updated the weights of the whole architecture until convergence. We used 23 RRDB blocks for the generator *G* and five RRDB blocks for the EEN network. We implemented our architecture with the PyTorch framework [65] and trained and tested it on two NVIDIA Titan X GPUs. The end-to-end training with COWC took 96 h for 200 epochs. The average inference speed was approximately four images per second with Faster R-CNN and seven images per second with SSD. Our implementation can be found on GitHub [66].
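As a rough illustration of the step-wise schedule described above, the following framework-independent sketch computes the learning rate at a given iteration. The function name, the constant names, and the convention that the halved rate takes effect exactly at iteration 50,000 are assumptions made for this example and are not taken from our code.

```python
# Hyperparameters as stated in the text; the names are illustrative only.
BASE_LR = 1e-4
HALVE_EVERY = 50_000          # iterations between halvings
ADAM_BETAS = (0.9, 0.999)
BATCH_SIZE = 5


def learning_rate(iteration, base_lr=BASE_LR, halve_every=HALVE_EVERY):
    """Return the learning rate in effect at a 0-based iteration index.

    The rate is halved once for every completed block of `halve_every`
    iterations (assumed boundary convention).
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return base_lr * 0.5 ** (iteration // halve_every)


# Rate at a few points along training.
schedule = {it: learning_rate(it) for it in (0, 49_999, 50_000, 125_000)}
```

In PyTorch, a comparable schedule could presumably be obtained with `torch.optim.lr_scheduler.StepLR` using `step_size=50000` and `gamma=0.5`, stepped once per iteration.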
