**Data augmentation.**

**Figure 4.** An example of data augmentation for an input synthetic aperture radar (SAR) patch (upper leftmost) and its corresponding reference mask (lower leftmost). Note how the augmented versions continue segments without breaking lines, using the "reflect" mode to fill gaps following rotation. Contains modified Copernicus Sentinel–1 data 2020.

#### 2.4.3. U–Net Model

The deep learning model chosen for image segmentation is the modified U–Net architecture proposed by Jerin Paul [36]. This architecture has 58 layers, divided into a downsampling encoder part and an upsampling decoder part, which are connected via skip connections. The convolution layers are all 3 × 3, with exponential linear unit activations and the He normal initialiser. The only exception is the final output layer, which is a 1 × 1 convolution layer with sigmoid activation. Between the convolution layers, batch normalisation, max pooling and dropout layers were included, with dropout rates varying from 0.1 to 0.3. The total number of parameters was 1,946,993, of which 1,944,049 were trainable. All models were initiated with random weights.

The input to the network included up to three layers: average backscatter intensity in VV and VH, and average coherence. The models returned a segmented image for each input patch, with pixel values ranging from 0 to 1. Values closer to 0 indicate a high predicted probability of belonging to the non–road class, while values closer to 1 indicate a high probability of being a road.
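As an illustration of how the three Sentinel–1 derived layers could be assembled into a multi-channel model input, the following is a minimal numpy sketch. The patch size of 256 and the channel ordering are assumptions for illustration, not values stated in the text:

```python
import numpy as np

# Hypothetical sketch: stack the three Sentinel-1 derived layers
# (mean VV backscatter, mean VH backscatter, mean coherence) into a
# single channel-last patch for the U-Net. PATCH = 256 is an assumed
# patch size, not a value from the paper.
PATCH = 256

def make_input_patch(vv, vh, coh):
    """Stack per-pixel VV, VH and coherence layers along a channel axis."""
    layers = [np.asarray(a, dtype=np.float32) for a in (vv, vh, coh)]
    for a in layers:
        assert a.shape == (PATCH, PATCH)
    return np.stack(layers, axis=-1)  # shape (PATCH, PATCH, 3)

rng = np.random.default_rng(0)
vv = rng.random((PATCH, PATCH))
vh = rng.random((PATCH, PATCH))
coh = rng.random((PATCH, PATCH))
patch = make_input_patch(vv, vh, coh)
print(patch.shape)  # (256, 256, 3)
```

With fewer input types (e.g., VV only), the same pattern yields a single-channel patch instead.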

#### 2.4.4. Loss Function and Performance Metric

Road detection in desert regions is an unbalanced segmentation task, since in any given scene many more pixels fall into the non–road category than into the road category. The loss function applied during model training, and the accuracy metric used to assess the performance of the model, both need to take this class imbalance into account. The loss function applied in this case is the soft Dice loss which, being based on the Dice coefficient, normalises for class imbalance. For the accuracy metric, the Jaccard index, also known as the Jaccard similarity coefficient or intersection over union (IoU), was used, which likewise accounts for class imbalance [47].

The formulae for soft Dice loss and the Jaccard index for a model predicted class (*A*) and a known class (*B*) are the following:

$$\text{Soft Dice Loss} = 1 - \frac{2|A \cap B|}{|A| + |B|} \tag{1}$$

$$\text{Jaccard index } = \frac{|A \cap B|}{|A \cup B|} \tag{2}$$
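Equations (1) and (2) can be sketched in numpy as follows. This is an illustrative implementation, not the authors' code; the smoothing constant `eps` is an assumed safeguard against division by zero, and binarisation at 0.5 in the Jaccard computation is an assumption consistent with the thresholding described later:

```python
import numpy as np

# Minimal numpy sketch of Equations (1) and (2). With probabilistic
# predictions the element-wise product acts as a differentiable
# ("soft") intersection; for binary masks it reduces to the set form.
def soft_dice_loss(pred, truth, eps=1e-7):
    inter = np.sum(pred * truth)
    return 1.0 - (2.0 * inter) / (np.sum(pred) + np.sum(truth) + eps)

def jaccard_index(pred, truth, eps=1e-7):
    pred_b = pred >= 0.5     # assumed binarisation threshold
    truth_b = truth >= 0.5
    inter = np.logical_and(pred_b, truth_b).sum()
    union = np.logical_or(pred_b, truth_b).sum()
    return inter / (union + eps)

truth = np.array([[1, 1, 0, 0]], dtype=float)
pred  = np.array([[1, 0, 0, 0]], dtype=float)
print(round(soft_dice_loss(pred, truth), 3))  # 0.333
print(round(jaccard_index(pred, truth), 3))   # 0.5
```

Both measures ignore the large true-negative background, which is why they remain informative under heavy class imbalance.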

#### 2.4.5. Hyperparameters and Model Training

Different approaches were attempted for the model training. One approach was to include the available training data from all areas, with the aim of training a single model applicable to every sand covered desert landscape with characteristics similar to those of the test areas. It soon became apparent that this was not feasible, due mainly to the greatly varying sand dune forms between different desert environments. This led to systematic false detections and almost no positive road detections over any area. Another approach was to train a separate model for each specific desert region, using only the training data available for that area. With this approach, even with much less training data, the models performed much better.

In addition to experimenting with the geographic coverage, different types of Sentinel–1 input were tested. Various combinations of the VV and VH backscatter and coherence were included as inputs to the model, from individual bands to combinations of two or all three.

The model hyperparameters are listed in Table 2. After testing different values for each parameter, these provided the best results for all the regions in which the method was applied, and with all the options for the SAR inputs. The only area specific parameter to be adjusted is the number of steps per epoch, which depends on the amount of training samples over a given region. Apart from the steps per epoch, these are the same hyperparameters as in the model of Jerin Paul [36].
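The area-specific adjustment of steps per epoch can be sketched as below. The batch size of 8 is an illustrative assumption; the actual value is given in Table 2:

```python
import math

# Illustrative sketch: steps per epoch is the only area-specific
# hyperparameter, derived from the number of training patches in the
# region so that each epoch makes one pass over all samples.
# batch_size=8 is an assumption for illustration.
def steps_per_epoch(n_training_samples, batch_size=8):
    return math.ceil(n_training_samples / batch_size)

print(steps_per_epoch(1000))  # 125
```

Regions with more OSM-derived training patches therefore simply run more steps per epoch, with all other hyperparameters held fixed.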


**Table 2.** Model hyperparameters.

After randomly shuffling all samples, 10 percent were set aside for validation to assess the performance of the model during training, and another 10 percent for testing. After a review of this, a second round of training was carried out using all available data. Given the incompleteness of the OSM data in any given area, and the poor overlap between the OSM and detected roads discussed above, a more reliable accuracy assessment was carried out with the test data comprising manually digitised roads over subset areas. This is described in Section 2.5 below.
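The 80/10/10 shuffle-and-split described above can be sketched as follows; the sample count of 500 is arbitrary for illustration:

```python
import numpy as np

# Sketch of the split described in the text: shuffle all sample
# indices, then hold out 10% for validation and 10% for testing.
rng = np.random.default_rng(seed=0)
n = 500
idx = rng.permutation(n)

n_hold = n // 10
val_idx = idx[:n_hold]
test_idx = idx[n_hold:2 * n_hold]
train_idx = idx[2 * n_hold:]

print(len(train_idx), len(val_idx), len(test_idx))  # 400 50 50
```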

#### 2.4.6. Post–Processing and Final Map Generation

For each area, once the model had been trained with the image patches containing the available OSM data, it was applied to predict the presence of roads in all patches. The pixels in the resulting segmented patches ranged from 0 to 1: values closer to 0 represented a high probability of belonging to the non–road class, while those closer to 1 were considered likely to be roads. To create binary masks, all pixels with a value of less than 0.5 were converted to 0, and those greater than or equal to 0.5 were converted to 1. The patches were then put together to reconstruct the original image scene. Finally, the resulting single raster mask was converted to a vector layer containing all the predicted roads as polygon vectors in one shapefile.
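The thresholding and mosaicking steps can be sketched as below. The toy patch and grid sizes are illustrative assumptions; only the 0.5 threshold comes from the text:

```python
import numpy as np

# Sketch of the post-processing step: threshold each predicted patch
# at 0.5 to obtain a binary road mask, then tile the patches back
# into the full scene. Sizes here are toy values for illustration.
PATCH = 4        # assumed toy patch size
GRID = (2, 3)    # assumed patches per column/row of the scene

def binarise(patch, threshold=0.5):
    """Pixels >= threshold become road (1), the rest non-road (0)."""
    return (patch >= threshold).astype(np.uint8)

rng = np.random.default_rng(1)
patches = [binarise(rng.random((PATCH, PATCH)))
           for _ in range(GRID[0] * GRID[1])]

# Reassemble the scene row by row.
rows = [np.hstack(patches[r * GRID[1]:(r + 1) * GRID[1]])
        for r in range(GRID[0])]
scene = np.vstack(rows)
print(scene.shape)  # (8, 12)
```

The resulting binary raster would then be vectorised (e.g., with GDAL's polygonize tooling) to produce the road polygons in a shapefile.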

#### *2.5. Performance Evaluation*

A performance evaluation of the methodology was carried out by manually digitising all the roads in subset areas within each AOI, and comparing these with the model detections through the calculation of the Jaccard similarity coefficient and the rank distance. The rank distance is a measure which combines completeness (the percentage of reference data covered by model detections) and correctness (the percentage of model detections covered by reference data) [48]. A performance evaluation with manually digitised reference data was necessary given the following limitations of using the OSM data as a reference.



**Figure 5.** Above: mask of detected roads. Below: Sentinel–2 image. Red line shows the Open Street Map (OSM) road data overlaid. The yellow arrows highlight the misregistration between the OSM road and both detected (mask) and actual (Sentinel–2) roads. Green arrows show roads which are not in the OSM dataset. Blue arrows point to a road which was neither in the OSM data, nor detected by the model. Contains modified Copernicus Sentinel–2 data 2020.

Figure 5 demonstrates the success in using, in some cases, low quality OSM data to train the U–Net model (accurate detections despite misregistration of the OSM with actual roads), while also highlighting the various problems with using OSM data as a reference for performance evaluation: the varying width of roads, missing roads in the OSM data, and misregistration of the OSM. The top part of this figure shows the mask of detected roads over a part of the North Sinai AOI, while the lower part is a true colour Sentinel–2 image acquired on 2 August 2019 (roughly in the middle of the Sentinel–1 time series used as an input to the model). White lines in the mask correspond with road detections. In this case the model detected the correct location of the road despite the misregistration of the OSM data used to train it. Roads branching off the main road segment are not in the OSM dataset, but have been detected by the model (apart from one branch which was not detected).

These challenges resulted in the automatically calculated Jaccard index rarely exceeding 0.5 during model training, and the loss function seldom dropping below 0.3. As a relative assessment of performance this was sufficient for model training and validation; for an accurate quantitative assessment of the results, however, it was not, and a more thorough technique was adopted.

For a more robust accuracy assessment, the following procedure was carried out. For each area, a 0.2 × 0.2 degree image subset was randomly selected. To avoid subsets with sparse detections, a threshold was applied so that only subsets with at least 7000 road pixels were considered. In the selected subsets, all roads were manually digitised as vectors, using the Sentinel–1 data and Sentinel–2 data from the same date range as the references. The model detected roads for the same area were similarly digitised. The resulting vector layers were visually inspected and the model detected vector components were assigned labels for true or false positives. All vector layers were converted to raster (one pixel width), and a confusion matrix was created by quantifying the true and false positives and negatives. Based on this confusion matrix, the Jaccard index was calculated. Any OSM roads present in the selected subsets were discarded from the analysis, since these had been used for training.

While this method was suitable for quantifying true or false positives and negatives, another metric was required to assess the positional accuracy of the detections. For this, buffers of a two pixel width (40 m) were first created around both the reference and model detections. The percentage of reference data covered by model detections (completeness) and the percentage of model detections covered by the reference data (correctness) were calculated [15]. From these, the rank distance was derived, using the formula [47]:

$$\text{Rank Distance} = \sqrt{\frac{\% \text{Complete}^2 + \% \text{Correct}^2}{2}} \tag{3}$$
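The buffered completeness/correctness calculation and Equation (3) can be sketched in numpy as follows. This is an illustrative implementation: a simple 4-neighbour dilation stands in for the two pixel (40 m) buffer, and the toy road rasters are assumptions, not data from the study:

```python
import numpy as np

# Sketch of the rank-distance evaluation in Equation (3). Reference
# and detected roads are one-pixel-wide binary rasters.
def dilate(mask, iterations=2):
    """Binary dilation with a cross-shaped structuring element.
    Note: np.roll wraps at the borders; fine for this toy example
    where the roads sit away from the array edges."""
    m = mask.astype(bool)
    for _ in range(iterations):
        up, down = np.roll(m, 1, axis=0), np.roll(m, -1, axis=0)
        left, right = np.roll(m, 1, axis=1), np.roll(m, -1, axis=1)
        m = m | up | down | left | right
    return m

def rank_distance(reference, detected, buffer_px=2):
    ref_buf = dilate(reference, buffer_px)
    det_buf = dilate(detected, buffer_px)
    completeness = 100.0 * (reference & det_buf).sum() / reference.sum()
    correctness = 100.0 * (detected & ref_buf).sum() / detected.sum()
    return np.sqrt((completeness**2 + correctness**2) / 2.0)

ref = np.zeros((10, 10), dtype=bool)
det = np.zeros((10, 10), dtype=bool)
ref[5, 1:9] = True   # a horizontal reference road
det[6, 1:9] = True   # detection offset by one pixel
print(round(rank_distance(ref, det), 1))  # 100.0 (within the buffer)
```

A detection offset by one pixel still scores perfectly here, which is exactly the tolerance the two pixel buffer is meant to provide; detections outside the buffer reduce both completeness and correctness.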

The two pixel buffer was necessary to account for the varying width of roads and errors in manual digitising, and is comparable to values used in other studies, e.g., [4].

The manually digitised reference subsets in each AOI could have been used to assess the accuracy of the OSM data. However, each validation subset only had a small quantity of OSM roads—in the Taklimakan Desert subset there were none at all (the minimum 7000 road pixel threshold applied only to model detected roads). An assessment of the accuracy of these would not have been representative. Moreover, there have been several dedicated studies on the accuracy of OSM data, e.g., [49–51].
