*3.1. ADGAN*

The architecture of Skip-GANomaly is shown in Figure 1. The Generator and the Discriminator consist of a U-Net and an encoder architecture, respectively [47]. In earlier work, we showed that replacing the Generator with an encoder–decoder architecture without skip-connections, such as GANomaly [17], does not always yield well-reconstructed fake images from Earth observation imagery [23]. The encoder–decoder architecture of Skip-GANomaly, in combination with the skip-connections, makes it effective at reconstructing even complex remote sensing imagery.
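To make the role of the skip-connections concrete, the following is a minimal, shape-level sketch of how a U-Net-style Generator forwards encoder feature maps to the decoder. This is an illustration, not the authors' implementation: the average pooling and nearest-neighbour upsampling stand in for the strided (de)convolutions of the real network.

```python
import numpy as np

def encoder_block(x):
    """Downsample a (H, W, C) feature map by 2 (stand-in for a strided conv)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def decoder_block(x, skip):
    """Upsample by 2 (nearest-neighbour) and concatenate the skip feature map."""
    up = x.repeat(2, axis=0).repeat(2, axis=1)
    return np.concatenate([up, skip], axis=-1)

# Toy forward pass: two encoder levels, then two decoder levels with skips.
x = np.random.rand(8, 8, 3)   # input "image"
e1 = encoder_block(x)         # (4, 4, 3)
e2 = encoder_block(e1)        # (2, 2, 3) -- bottleneck
d1 = decoder_block(e2, e1)    # (4, 4, 6): upsampled features + skip channels
d2 = decoder_block(d1, x)     # (8, 8, 9): fine detail re-injected from x
```

The concatenation step is what lets the decoder reuse fine spatial detail from the encoder, which is why the reconstructions stay sharp on complex imagery.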

**Figure 1.** Skip-GANomaly architecture. Adapted from [24].

Skip-GANomaly uses three distinct losses to guide its training: the latent loss (Llat), the adversarial loss (Ladv) and the contextual loss (Lcon). Ladv accounts for the correctness of the classification (fake or real). Lcon accounts for the generated image and steers the model to create fake images that are contextually sound, i.e., images that look realistic. Llat steers the encoders inside the Generator and Discriminator to create similar representations of the image latent vector *z* [24]. Each loss contributes to the overall loss according to its corresponding weight (w). The losses are described in the following equations:

$$L\_{\text{adv}} = \|f(\mathbf{x}) - f(\mathbf{\hat{x}})\|\_2 \tag{1}$$

where,

$$f(.) = \mathbb{E}\_{x \sim p\_x} \left[ \log D(.) \right] \tag{2}$$

$$L\_{\text{con}} = \|\mathbf{x} - \mathbf{\hat{x}}\|\_1 \tag{3}$$

$$L\_{\text{lat}} = \|z - \hat{z}\|\_2 \tag{4}$$

The overall loss is described as follows:

$$L = w\_{\text{adv}}L\_{\text{adv}} + w\_{\text{con}}L\_{\text{con}} + w\_{\text{lat}}L\_{\text{lat}} \tag{5}$$
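Assuming the discriminator features, images, and latent vectors are available as NumPy arrays, Equations (1)–(5) can be sketched as below. The function names are ours, and the default weights of 1.0 are placeholders: the actual weights are hyper-parameters tuned as described later in the text.

```python
import numpy as np

def adversarial_loss(f_x, f_xhat):
    # Eq. (1): L2 distance between discriminator features of real and fake images
    return np.linalg.norm(f_x - f_xhat)

def contextual_loss(x, xhat):
    # Eq. (3): L1 distance between the input image and its reconstruction
    return np.sum(np.abs(x - xhat))

def latent_loss(z, zhat):
    # Eq. (4): L2 distance between the two latent representations
    return np.linalg.norm(z - zhat)

def total_loss(f_x, f_xhat, x, xhat, z, zhat,
               w_adv=1.0, w_con=1.0, w_lat=1.0):
    # Eq. (5): weighted sum of the three losses
    return (w_adv * adversarial_loss(f_x, f_xhat)
            + w_con * contextual_loss(x, xhat)
            + w_lat * latent_loss(z, zhat))
```

Note that a perfect reconstruction with matching features and latent vectors drives all three terms, and hence the total loss, to zero.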

Several hyper-parameters influence the performance of the model. Besides general parameters such as batch size, learning rate and decay rate, model-specific parameters include the loss weights, the size of the latent vector *z*, and the number of encoder layers inside the Generator and Discriminator. Details on how these parameters were tuned can be found in Section 3.4.

A modification was made to the network. In the original network, after each epoch of training, the Area Under the Curve (AUC) score was calculated using the validation dataset. After training finished, the model used for inference was the one from the epoch with the highest AUC score [24]. This makes the original implementation not a truly unsupervised approach, since a validation dataset is still required (i.e., examples of damage are still needed). Therefore, we chose to save the best-performing model whenever a new lowest Generator loss was reached. This ensures that the selected model is the one best able to generate fake images, which is the main principle of Skip-GANomaly. We verified that this approach yielded performance comparable to the original implementation, without the need for annotated samples.
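The modified checkpoint selection can be sketched as follows; the training loop and the save call inside the comment are hypothetical placeholders for the actual implementation.

```python
def best_epoch_by_generator_loss(generator_losses):
    """Return the index of the epoch with the lowest Generator loss.

    Mirrors the modified checkpointing: instead of picking the epoch with
    the highest validation AUC, the model saved for inference is the one
    from the epoch where the Generator loss reached a new minimum.
    """
    best_epoch, best_loss = -1, float("inf")
    for epoch, loss in enumerate(generator_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            # In a real training loop, a checkpoint would be written here,
            # e.g. torch.save(generator.state_dict(), "best_generator.pt")
    return best_epoch
```

Because the criterion depends only on the training loss, no annotated validation data is touched during model selection.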

During inference, each image is classified as either damaged or undamaged based on its anomaly score. Per-pixel anomaly scores are derived by simple image differencing between the input and the generated image: the absolute difference between corresponding channels is computed and averaged per pixel. The image anomaly score is then obtained by averaging the per-pixel scores; the closer it is to one, the higher the probability that the image is anomalous. After obtaining anomaly scores for all test samples, a classification threshold was determined, defined as the intersection between the anomaly-score distributions of the normal and abnormal samples. Any sample with an anomaly score below the threshold was classified as normal, and any sample above it as abnormal. Ideally, a model with high descriptive value yields non-overlapping distributions for the normal and abnormal samples, with a clear threshold between them.
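The scoring procedure can be sketched as follows, assuming images are (H, W, C) NumPy arrays scaled to [0, 1]; the function names are ours, not the authors'.

```python
import numpy as np

def pixel_anomaly_map(x, xhat):
    """Per-pixel anomaly scores: channel-wise absolute difference, averaged."""
    return np.abs(x - xhat).mean(axis=-1)

def image_anomaly_score(x, xhat):
    """Image-level anomaly score: mean of the per-pixel anomaly scores."""
    return pixel_anomaly_map(x, xhat).mean()

def classify(scores, threshold):
    """Label each image: 1 (abnormal) if its score exceeds the threshold."""
    return (np.asarray(scores) > threshold).astype(int)
```

With inputs in [0, 1], the score is also bounded by [0, 1], matching the interpretation that values near one indicate anomalous (damaged) imagery.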

Finally, alterations and additions were applied to Skip-GANomaly in an attempt to boost results on the satellite imagery dataset. First, with the idea of weighting the generation of building pixels more heavily than other pixels, we attempted to direct the attention of Skip-GANomaly by adding building masks as input, in an approach similar to the one described in [48]. Furthermore, with the idea of utilizing the building information in both image epochs, similar to the approach described in [16], we stacked pre- and post-imagery into a 6-channel image and implemented early, late and full feature-fusion approaches. These additions provided only marginal improvements. Our findings for stacking pre- and post-imagery were in line with those reported in [29]. The goal of this study was to investigate the applicability of ADGANs for building damage detection. Since improvements of the model were beyond our scope of work and only marginal, these lines of investigation were not explored any further and the original implementation was maintained.
