**1. Introduction**

The mapping and monitoring of roads in desert regions are key concerns. Population growth and the continued development of urban centres have led to a corresponding expansion of transportation networks [1,2]. These networks are constantly evolving [1,3]. Knowledge of the location and state of road systems helps to monitor human activity and to identify any maintenance that the infrastructure may require. In many desert regions, roads and tracks are used for illicit activities, such as smuggling [4]. Sand drift and dune migration can rapidly bury roads, thus necessitating intervention [5–7].

Ground techniques used for surveying and monitoring road networks are expensive and time consuming [2]. This is especially true for desert regions, given the extensive areas involved, the often inhospitable landscapes and, in some cases, the political instability [8,9]. Remote sensing techniques can acquire information over large areas simultaneously, at frequent intervals and at a low cost [10,11]. The application of emerging technologies, such as big data, cloud computing, interoperable platforms and artificial intelligence (AI), has opened new scenarios in different geospatial domains [12], such as the monitoring of critical infrastructure, including roads.

Previously developed algorithms to automatically extract road features using techniques such as classification, segmentation, edge and line detection and mathematical morphology are summarised in a number of review papers, such as [13–16]. Since deep convolutional neural networks proved their effectiveness in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), deep learning has gained significant momentum. Among the first to apply deep learning for road extraction were Mnih and Hinton [17]. Saito and his colleagues later achieved even better results with convolutional neural networks (CNNs) [18]. Techniques using CNNs are now considered standard for image segmentation [19], with many studies proposing different CNN architectures for road detection and monitoring, e.g., [1,3,20–24]. This is a fast evolving domain, and new research is regularly published on architectures and methods to address some of the limitations of CNNs. These include, for example, the significant computing and memory requirements [25], the large amounts of training data often needed, and the difficulty in adapting models to varying conditions [26]. A particularly effective CNN model for semantic segmentation is the U–Net architecture. Devised by Ronneberger and his colleagues for medical image segmentation [27], U–Net has become a standard technique for semantic segmentation in many applications since it won the IEEE International Symposium on Biomedical Imaging (ISBI) cell tracking challenge in 2015 by a large margin. The popularity of this architecture, which consists of a contracting path to capture the context and a symmetric expanding path that enables precise localisation, is due partly to its speed, and to its ability to be trained end–to–end with very few images [27].
Many have applied variations of U–Net for road detection, e.g., [2,3,22–24], the majority basing their models on dedicated benchmark datasets of optical images for road identification, such as the Massachusetts roads dataset created by Mnih and Hinton [17].

Most remote sensing based techniques for road detection and monitoring have relied on very high resolution (VHR) optical data [13]. However, in desert regions the spectral signatures of roads are often similar to those of the surrounding landscape, making them difficult to distinguish. Synthetic aperture radar (SAR) data has characteristics which make it effective for the retrieval of roads in desert regions [9,28]. These include the sensitivity of the radar to surface roughness and to the relative permittivity of targets, and the fact that SAR is a coherent system [29]. Dry sand usually has a very low relative permittivity and is therefore not a strong reflector of microwave radiation. Sand covered areas are thus usually characterised by a very low SAR backscatter. Roads, on the other hand, may display a very different type of backscatter, which can contrast strongly with the surrounding sand, even if the roads are significantly narrower than the SAR resolution cell [9]. These characteristics can be exploited to retrieve roads from SAR amplitude data. SAR coherence can also help to detect roads in desert regions. The low relative permittivity of dry sand allows the transmission of the microwave SAR signal into the sand volume [30,31]. Coherence is rapidly lost in such areas due to volume decorrelation [32]. This low coherence may contrast with the higher coherence of roads, often made from materials with a higher relative permittivity, such as asphalt, tarmac, or gravel, which are therefore not affected by volume decorrelation.

Some studies nonetheless have demonstrated methodologies for road detection and monitoring using SAR data. A good review of many of these is provided by [14]. More recently, a few studies have successfully applied deep learning techniques to SAR based road detection, e.g., [1,2,21], but these have mainly focused on relatively small, local areas in developed landscapes, where good ground truth and training data have been available. Some have also used SAR for detecting roads and tracks in desert regions; e.g., Abdelfattah and his colleagues proposed a semi–automatic technique for SAR based road detection over a local area on the Tunisian–Libyan border [4], but again, this was applied to a specific area and was not fully automatic.

Robust methodologies are required for operational road detection and monitoring in desert regions over large areas without the need to acquire expensive reference data. Many desert areas are situated in developing countries, such as in North Africa, where accurate and abundant training data are not available, and budgets for infrastructure surveying are low.

The work presented in this paper aims to demonstrate a methodology for road detection and monitoring in desert regions, using free input and reference data, that can be scaled to desert regions globally. The approach takes input SAR data from the free and open Copernicus Sentinel–1 satellite constellation over the area to be surveyed. The input data comprise both the amplitude and coherence averages from a time series of around two and a half months acquired in the same geometry (around seven scenes). The time series average contributes to removing image speckle and improves the model performance. The reference data, on the other hand, comprise freely available OpenStreetMap (OSM) data. The combined use of OSM and Earth observation (EO) data in semantic segmentation has been much discussed, e.g., [33–35], but in most cases OSM has been used either with very high resolution (VHR) EO data, or for general classes with much less class imbalance than the road/no–road distinction. Roads are then extracted using a version of U–Net, which has the well–known advantages that it can be trained end–to–end with very few images and is fast [27]. This makes it suitable for cases where abundant, high quality reference data may not be available. One of the many versions of this architecture adapted to Earth observation data is that proposed by Jerin Paul, which was previously applied successfully to VHR optical data [36]. This was the version adopted in this methodology. Although it was developed for use with optical data, it performed well on SAR based inputs with similar class imbalance. The U–Net model is trained with SAR amplitude and coherence averages, using OSM reference masks, for each desert region. The model is then applied to detect roads in each of the desert areas for which it was trained.

The method proposed here for SAR based deep learning segmentation, trained on OSM data, has been applied to a number of test areas in various deserts in Africa and Asia. The high accuracy of the results suggests that a robust methodology involving the use of freely available input and reference data could potentially be used for operational road network mapping and monitoring.

This study has been carried out in the framework of a joint collaboration between the European Space Agency (ESA) Φ–Lab, and the European Union Satellite Centre (SatCen) Research, Technology Development and Innovation (RTDI) Unit. The ESA Φ–Lab carries out research in transformative technologies that may impact the future of EO. The SatCen RTDI Unit provides new solutions supporting the operational needs of the SatCen and its stakeholders by looking at the whole EO data lifecycle.

#### **2. Materials and Methods**

This section presents the methodology for road detection and monitoring using free and open data. The process can be divided into two steps: the first is a SAR pre–processing step, to obtain temporal averages of the calibrated backscatter amplitude and consecutive coherence for each time series, over each area. The second is the deep learning workflow. In this second step, the input SAR layers were divided into adjacent, non–overlapping patches of 256 × 256 pixels. Each patch was matched with a corresponding mask of the same resolution and dimensions showing the location of any OSM roads. In these masks, pixels coinciding with roads have a value of 1, while all other pixels have a value of 0. All SAR patches whose corresponding masks included OSM roads were used to train the U–Net model, initiated with random weights, using the OSM data as a reference. Subsequently, the model was applied to all patches in each area of interest (AOI) to extract the roads not included in the OSM dataset. The AOIs comprised three areas in three different desert environments in Africa and Asia, each the size of one Sentinel–1 IW scene (250 × 170 km).
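The tiling and mask-pairing step described above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`tile`, `training_pairs`); the published Jupyter Notebook in the project repository is the authoritative implementation:

```python
import numpy as np

PATCH = 256  # patch size in pixels, as used in the paper


def tile(array, patch=PATCH):
    """Split a 2-D raster into adjacent, non-overlapping patches.

    Edge areas that do not fill a whole patch are discarded in this sketch.
    """
    rows, cols = array.shape
    patches = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            patches.append(array[r:r + patch, c:c + patch])
    return patches


def training_pairs(amplitude, coherence, road_mask, patch=PATCH):
    """Pair SAR patches with OSM road masks (1 = road, 0 = background),
    keeping only patches whose mask contains at least one road pixel."""
    pairs = []
    for a, c, m in zip(tile(amplitude, patch),
                       tile(coherence, patch),
                       tile(road_mask, patch)):
        if m.any():                        # patch intersects an OSM road
            x = np.stack([a, c], axis=-1)  # 2-band input: amplitude + coherence
            pairs.append((x, m))
    return pairs
```

Patches without any OSM road pixels are excluded from training, consistent with the selection criterion stated above.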

While the OSM was used as the reference for model training, a more precise dataset was needed for the accuracy assessment. The reference masks only recorded roads present in the OSM dataset, so roads could be present within the coverage of a given reference mask patch yet absent from the OSM. Moreover, due to the varying quality of the OSM and the varying width of roads, precise overlap between the model detected roads and the OSM reference masks was difficult to achieve. To maintain automation and ensure the scalability of the method, there was no manual editing of these patches. Nonetheless, for the purpose of model training, using the OSM as the reference worked well. For a reliable accuracy assessment, however, a more rigorous technique was adopted: a subset area was randomly selected in each desert region, in which all roads were manually digitised. These data were then used as the reference for a more precise accuracy assessment.
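An accuracy assessment against the manually digitised reference can be computed pixel-wise; the sketch below shows common segmentation scores (precision, recall, F1). These particular metrics are illustrative assumptions, not necessarily the exact measures reported in the paper:

```python
import numpy as np


def pixel_scores(predicted, reference):
    """Pixel-wise precision, recall and F1 of a binary road prediction
    against a manually digitised reference mask (both arrays of 0/1)."""
    tp = np.sum((predicted == 1) & (reference == 1))  # true positives
    fp = np.sum((predicted == 1) & (reference == 0))  # false positives
    fn = np.sum((predicted == 0) & (reference == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because of the extreme road/no-road class imbalance noted earlier, scores of this kind are more informative than overall pixel accuracy, which would be dominated by the background class.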

#### *2.1. Areas of Interest (AOIs)*

Three AOIs in different types of sand covered deserts were chosen to apply the method. These include most of the North Sinai Desert of Egypt, a large part of the Grand Erg Oriental in the Algerian and Tunisian Sahara, and the central part of the Taklimakan Desert of China (see Figure 1). The size of each of these three areas corresponds to the extent of one Sentinel–1 interferometric wide swath (IW) scene: 250 × 170 km, covering an area of 42,500 km<sup>2</sup> in each desert region. They were chosen for their geographic and morphological variety, each having very different sand dune forms and local conditions.


**Figure 1.** Map showing the location of areas of interest (AOIs) on an ENVISAT MERIS true colour mosaic, in a geographic latitude, longitude map system, World Geodetic System 1984 (WGS84) datum. Insets show a close–up of the AOIs. Each AOI and inset has the dimension of one Sentinel–1 IW footprint (250 km East–West, 170 km North–South). Credits: CHELYS srl for the world map and the European Space Agency (ESA) GlobCover for insets.

The North Sinai Desert, in the north of the Sinai Peninsula, is composed mainly of aeolian sand dune fields and interdune areas. The sand dunes include barchan, seif or longitudinal linear dunes trending east–west, transverse and star dunes [37]. Linear dunes are the main aeolian form in North Sinai [5]. The climate of the study area is arid. The average annual rainfall is about 140 mm at El Arish [38], but drops in the south, where it does not exceed 28 mm per year [5].

The Grand Erg Oriental is a sand dune field in the Sahara desert, mainly in Algeria, but with its north–eastern edge in Tunisia. It is characterised by four large–scale dune pattern types with gradual transitions between them. These include large, branching linear dunes; small and widely spaced star and dome dunes; a network type created mostly from crescentic dunes; and large, closely spaced star dunes [39]. The average annual rainfall does not exceed 70 mm [40].

The Taklimakan Desert is the world's second–largest shifting sand desert, located in China, in the rain shadow of the Tibetan Plateau [41]. Three types of sand dunes exist in the Taklimakan Desert: compound, complex crescent dunes and crescent chains; compound dome dunes; and compound, complex linear dunes [42]. The mean annual precipitation varies between 22 and 70 mm [43].

#### *2.2. SAR and OSM Data*

To achieve the objective of demonstrating a robust and cost–effective methodology that can be applied globally, it was decided to exploit the Copernicus Sentinel–1 archive. The Sentinel–1 data are acquired at regular intervals worldwide and are available under a free and open access policy [44]. Over each of the AOIs, a time series of seven images was obtained, acquired every 12 days over an approximately two–and–a–half–month period (June/July to August/September 2019). The images were all interferometric wide swath (IW), all in ascending geometry and dual polarisation: vertical transmit–vertical receive, and vertical transmit–horizontal receive (VV and VH, respectively). In order to explore the use of both amplitude and coherence in road detection, the time series over each area was obtained in both ground range detected (GRD) and single look complex (SLC) formats. The spatial resolution of the Sentinel–1 IW data is approximately 20 × 20 metres for the GRD and 5 × 20 metres for the SLC. The pixel spacing of the GRD data is 10 × 10 metres. Table 1 shows the details of the Sentinel–1 data used in each AOI. All GPT graphs and bash scripts are available on GitHub [45]: a repository has been created containing all scripts used in this research, including the bash files and GPT graphs for the Sentinel–1 data processing, and the Python code for the deep learning workflow in a Jupyter Notebook. The repository also contains the results, in shapefile format, of the road detections over each of the AOIs (Supplementary data, available online: https://github.com/ESA-PhiLab/infrastructure).


**Table 1.** Details of the Sentinel–1 time series used in each of the AOIs.

OSM data, including all roads, were downloaded at continental scale. The original XML formatted .osm files were converted to vector shapefiles; subsequently, the OSM data were subset for each AOI and the attribute fields were reduced to road identification alone, in order to limit the file size.
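In OSM, roads are ways carrying a `highway` tag. As a minimal, self-contained illustration of the attribute reduction described above, the sketch below filters raw OSM XML down to road ways only, keeping just the way id, the `highway` value and the node references. In practice the conversion to shapefiles would be done with a GIS tool; this pure-Python version simply shows the filtering logic:

```python
import xml.etree.ElementTree as ET


def road_ways(osm_xml):
    """Extract ways tagged as roads ('highway' key) from raw OSM XML,
    discarding all other attributes to keep the output small."""
    root = ET.fromstring(osm_xml)
    roads = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        if "highway" in tags:  # OSM convention: roads/tracks carry a highway tag
            roads.append({
                "id": way.get("id"),
                "highway": tags["highway"],
                "nodes": [nd.get("ref") for nd in way.iter("nd")],
            })
    return roads
```

Ways without a `highway` tag (rivers, boundaries, buildings, etc.) are dropped, mirroring the reduction of the dataset to road identification alone.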

#### *2.3. SAR Pre–Processing*

Given that roads in desert areas can be distinguished in both SAR amplitude and coherence, it was decided to include both as inputs to the U–Net model. A virtual machine (VM), with Ubuntu as the operating system, was used for the Sentinel–1 pre–processing. This VM was connected to the CreoDIAS cloud environment, containing archive Sentinel–1 data. Processing was carried out automatically on the cloud using the command line graph processing tool (GPT) of the open source ESA Sentinel application platform (SNAP) software. Two GPT graphs, including all steps of each of the SLC and GRD processing chains, were applied in batch to the time series of data over each area using Linux bash scripts.
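The batch execution described above can be sketched as follows. The graph filename, output directory and scene naming pattern are illustrative assumptions; the actual bash scripts and GPT graphs are in the project repository:

```python
from pathlib import Path

GRAPH = "grd_chain.xml"      # hypothetical GPT graph holding the processing chain
OUT_DIR = Path("processed")  # hypothetical output directory


def gpt_commands(scene_dir):
    """Build one SNAP 'gpt' command line per Sentinel-1 scene in a directory,
    mirroring the batch bash scripts used in the paper."""
    cmds = []
    for scene in sorted(Path(scene_dir).glob("S1*.zip")):
        out = OUT_DIR / (scene.stem + "_proc.tif")
        cmds.append(["gpt", GRAPH, f"-Pinput={scene}", f"-Poutput={out}"])
    return cmds

# Each command can then be run with, e.g., subprocess.run(cmd, check=True).
```

Building the command list separately from executing it makes the batch easy to inspect or parallelise across the time series.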

#### 2.3.1. Amplitude Processing

For the amplitude processing, each Sentinel–1 scene, in GRD format, was calibrated to σ0 backscatter. The calibrated data were then terrain corrected to the European Petroleum Survey Group (EPSG) 4326 map system, i.e., geographic latitude and longitude with the World Geodetic System 1984 (WGS84) datum. The topographic distortion was corrected with the aid of the shuttle radar topography mission (SRTM) 3 arc–second global digital elevation model (DEM). The output pixel spacing was 10 m. The stack of calibrated and terrain corrected scenes was then co–registered using cross correlation. The co–registered stack was averaged into one scene to reduce speckle. This average was finally converted from the linear backscatter scale to logarithmic decibels, to improve visualisation and facilitate further pre–processing during the deep learning workflow.
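The final averaging and decibel conversion amount to a mean over the time axis followed by 10·log10, applied in that order (averaging is done on the linear scale). A minimal numpy sketch, assuming the scenes are already calibrated, terrain corrected and co-registered:

```python
import numpy as np


def temporal_average_db(stack):
    """Average a co-registered stack of calibrated sigma0 backscatter scenes
    (linear scale, one 2-D array per acquisition) to reduce speckle,
    then convert the temporal mean to decibels."""
    cube = np.asarray(stack, dtype=float)   # shape: (time, rows, cols)
    mean_linear = np.nanmean(cube, axis=0)  # temporal mean in linear units
    return 10.0 * np.log10(mean_linear)     # convert to dB for visualisation
```

Note that averaging in the linear domain and then converting to dB is not the same as averaging dB values; the order used here follows the processing chain described above.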

Some very good multitemporal speckle filters exist that preserve the spatial resolution while also keeping the temporal backscatter differences, such as the De Grandi speckle filter [46]. This allows for the monitoring of temporal intervals of less than the length of the time series. However, the emphasis of the study was to demonstrate a robust methodology that uses open data and tools. The most effective way to sufficiently reduce speckle while completely preserving the spatial resolution using the tools available was to average the data. Figure 2 shows the steps of the processing chain applied automatically to the time series of the Sentinel–1 GRD data.
