**1. Introduction**

Dryland biomes cover ~47% of the Earth's surface [1]. In these environments, vegetation appears scattered [2] and its structure, composition and spatial patterns are key indicators of biotic interactions [3], regulation of water, and nutrient cycles at landscape level [4]. Changes in the cover and spatial patterns of dryland vegetation occur in response to land degradation processes [5]. Hence, methods to identify and characterize vegetation patches and their structural characteristics can improve our ability to understand dryland functioning and to assess desertification risk [5–8]. Progress has been made using remote sensing tools in this regard (e.g., quantification of dryland vegetation structure at landscape scale [9], monitoring vegetation trends [10], spatial patterns identifying ecosystem

multifunctionality [11], characterizing flood dynamics [12], among many others). However, improving the accuracy of vegetation cover measurements remains an active research topic aimed at extracting the maximum performance from available data and technology. Estimating and monitoring changes in vegetation cover through remote sensing is key for dryland ecology and conservation [6]. Historical temporal and spatial data form the basis of remote sensing studies of vegetation functioning and structure [13,14].

The analysis of very high-resolution images to detect and measure vegetation cover and its spatial arrangement across the landscape typically starts by segmenting the objects to be identified in the images [7]. Object-Based Image Analysis (OBIA) [15] and Mask Region-based Convolutional Neural Networks (Mask R-CNN) [16] are among the most widely used, state-of-the-art segmentation methods. Although they provide a similar product, the two methods rely on very different approaches. OBIA combines the spectral information of each pixel with its spatial context [17,18]; similar pixels are then grouped into homogeneous objects that serve as the basis for further classification. Mask R-CNN, on the other hand, is a deep learning method (a type of artificial intelligence loosely inspired by the human brain) that provides models transferable between zones and segmentation with unprecedented accuracy [19,20]. In addition, data fusion has recently been used to improve the spectral, spatial, and temporal resolution of remote sensing images [21–23]. However, the fusion of segmentation methods for vegetation mapping has not yet been evaluated.

Remote sensing studies based on very high-resolution images have increased in recent years (e.g., [24–27]), partly because of the worldwide availability of Google Earth images [28–30] and the popularization of unmanned aerial vehicles (UAVs). Although these images have shown a high potential for vegetation mapping and monitoring [31–33], two main problems arise when they are used. First, higher spatial resolution increases the spectral heterogeneity among and within vegetation types, resulting in a salt-and-pepper effect in their identification that does not correctly characterize the actual surface [34]. Second, the processing time and computational power required for very high-resolution images are larger than for low-resolution images [35]. Under these conditions, traditional pixel-based analysis has proved to be less accurate than OBIA or Mask R-CNN for scattered vegetation mapping [15,36]. There are many applications of OBIA [37–39] and deep learning segmentation methods [40,41], for example, mapping greenhouses [42], monitoring disturbances affecting vegetation cover [5], or counting scattered trees in the Sahel and Sahara [43]. These methods have been compared, with excellent results, in both segmenting and detecting tree cover and scattered vegetation [7,44,45]. However, greater precision is always advisable in highly sensitive problems [46]. Despite these methodological advances, selecting the appropriate image source remains key to producing accurate object segmentations, as in vegetation maps [47,48], and there is still no clear answer to the question of which image or method to choose for segmenting objects. Understanding how the spatial resolution of the imagery affects these segmentation methods, or the fusion of both, is therefore key to applying them correctly and to obtaining more accurate object segmentation for vegetation mapping in drylands.

To determine which of OBIA and Mask R-CNN is the most accurate method for segmenting scattered vegetation in drylands, and to understand the effect of the spatial resolution of the images used in this process, we assessed the accuracy of both methods in the segmentation of scattered dryland shrubs and compared how the final accuracy varies with spatial resolution. We also checked the accuracy of the fusion of both methods.

This work is organized as follows. Section 2 describes the study area, the dataset used, and the methodologies tested. Section 3 describes the experiments designed to assess the accuracy of the methods used. The experimental results and discussion are presented in Section 4, and conclusions are given in Section 5.

#### **2. Materials and Methods**

#### *2.1. Study Area*

We focused on the community of *Ziziphus lotus* shrubs, an ecosystem of priority conservation interest at the European level (habitat 5220\* of Directive 92/43/EEC), located in the Cabo de Gata-Níjar Natural Park (36°49′43″ N, 2°17′30″ W, SE Spain), one of the driest areas of continental Europe. This type of vegetation is scarce and patchy, and appears surrounded by a matrix of bare soil and small shrubs (e.g., *Launaea arborescens*, *Lygeum spartum* and *Thymus hyemalis*). *Z. lotus* is a facultative phreatophyte [49] and forms large hemispherical canopies (1–3 m tall) that constitute fertility islands where many other species of plants and animals live [50]. These shrubs are long-lived species that contribute to the formation of geomorphological structures, called nebkhas [51], which protect against the intense wind erosion that characterizes the area, thereby retaining soil, nutrients, and moisture.

#### *2.2. Dataset*

The data set consisted of two plots (Plot 1 and Plot 2), each with three images of different spatial resolution. The plots covered an area of 250 × 250 m with scattered *Z. lotus* shrubs. The images were obtained from optical remote sensors in the visible spectral range (Red, Green and Blue bands, RGB) with spatial resolutions finer than 1 m/pixel:


#### *2.3. OBIA*

OBIA-based segmentation is a method of image analysis that divides the image into homogeneous objects of interest (i.e., groups of pixels, also called segments) based on similarities in shape, spectral information, and contextual information [17]. It identifies homogeneous and discrete image objects by setting an optimal combination of values for three parameters (Scale, Shape, and Compactness) related to their spectral and spatial variability. There are no unique values for these parameters, and their final combination always depends on the object of interest, so finding the optimal combination is a challenge given the vast number of possible combinations. First, it is necessary to establish an appropriate Scale level depending on the size of the object studied in the image [43]; for example, low Scale values for small shrubs and high Scale values for large shrubs [44,45]. Recent advances have been oriented towards developing techniques (e.g., [53–59]) and algorithms (e.g., [60–63]) to automatically find the optimal value of the Scale parameter [64], which is the most important one for determining the size of the segmented objects [65,66]. The Shape and Compactness parameters must also be configured. While high values of the Shape parameter prioritize shape over colour, high values of the Compactness parameter prioritize the compactness of the objects over the smoothness of their edges [67].
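As a simple illustration of this parameter search, the sketch below exhaustively scores candidate Scale, Shape and Compactness combinations against a set of reference polygons and keeps the combination with the lowest ED2 (Section 2.5). It is a minimal sketch under stated assumptions, not the software used in this study: `segment` and `ed2` are hypothetical wrappers around a multi-resolution segmentation tool and the accuracy metric, respectively.

```python
# Minimal sketch (not the authors' code): brute-force search over the Scale,
# Shape and Compactness parameters of a multi-resolution segmentation, scored
# with ED2 against reference polygons. `segment` and `ed2` are assumed wrappers
# around the segmentation software and the accuracy metric of Section 2.5.
import itertools

def best_obia_parameters(image, reference_polygons, segment, ed2):
    """Return (ed2, scale, shape, compactness) for the most accurate combination."""
    scales = range(50, 450, 5)                                  # candidate Scale values
    shapes = [round(0.1 * i, 1) for i in range(1, 10)]          # Shape: 0.1 ... 0.9
    compactnesses = [round(0.1 * i, 1) for i in range(1, 10)]   # Compactness: 0.1 ... 0.9

    best = None
    for scale, shape, compactness in itertools.product(scales, shapes, compactnesses):
        segments = segment(image, scale=scale, shape=shape, compactness=compactness)
        score = ed2(reference_polygons, segments)
        if best is None or score < best[0]:
            best = (score, scale, shape, compactness)
    return best
```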

#### *2.4. Mask R-CNN*

To locate and delimit the edges of scattered shrubs, we used a computer vision technique called instance segmentation [68]. This technique infers a label for each pixel while distinguishing nearby object instances, thus delineating the boundaries of each object. We used the Mask R-CNN segmentation model [16], which extends the Faster R-CNN detection model [16] and provides three outputs for each object: (i) a class label, (ii) a bounding box that delimits the object, and (iii) a mask that delimits the pixels constituting the object. In the binary problem addressed in this work, Mask R-CNN generates a binary mask (values of 0 and 1) for each predicted object instance, where 1 indicates a *Z. lotus* pixel and 0 indicates a bare soil pixel.

Mask R-CNN relies on a classification model for the task of feature extraction. In this work, we used ResNet-101 [69] to extract increasingly higher-level features from the shallowest to the deepest layers.

The learning process of Mask R-CNN is influenced by the number of epochs, i.e., the number of times the network goes through the training phase, and by other optimizations such as transfer-learning or data-augmentation (see Section 3.2). Finally, the 1024 × 1024 × 3 image input is converted to 32 × 32 × 2048 to represent objects at different scales via the feature pyramid network.
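As a rough, hedged illustration of this architecture (not the implementation used in this work), the sketch below builds a Mask R-CNN with a ResNet-101 backbone wrapped in a feature pyramid network for the binary shrub/bare-soil problem using PyTorch/torchvision. The library calls, class count and input size are assumptions made for the example, and argument names can vary between torchvision versions.

```python
# Illustrative sketch only: Mask R-CNN with a ResNet-101 + FPN backbone for a
# binary problem (background vs. Z. lotus), built with torchvision. This is an
# assumption for illustration, not the authors' implementation.
import torch
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

def build_shrub_mask_rcnn():
    # ResNet-101 feature extractor wrapped in a Feature Pyramid Network; the
    # ImageNet weights act as the starting point for transfer-learning
    # (the argument may be called `weights` in newer torchvision versions).
    backbone = resnet_fpn_backbone("resnet101", pretrained=True)
    # num_classes = 2 -> background + Z. lotus
    return MaskRCNN(backbone, num_classes=2)

model = build_shrub_mask_rcnn()
model.eval()
with torch.no_grad():
    # A single 1024 x 1024 RGB tile with values in [0, 1]
    prediction = model([torch.rand(3, 1024, 1024)])[0]
# prediction["boxes"], prediction["labels"] and prediction["masks"] correspond
# to the three outputs described above: bounding boxes, class labels and
# per-instance binary masks.
```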

#### *2.5. Segmentation Accuracy Assessment*

The accuracy of the segmentation task in this work was assessed against ground truth using the Euclidean Distance v.2 (ED2; [70]), which evaluates the geometric and arithmetic discrepancies between reference polygons and the segments obtained during the segmentation process; both types of discrepancy need to be assessed. As reference polygons, we used the perimeters of 60 *Z. lotus* shrubs delineated by photo-interpretation in all images by an expert technician. We estimated the geometric discrepancy with the "Potential Segmentation Error" (PSE; Equation (1)), defined as the ratio between the total area of the segments falling outside the reference polygons and the total area of the reference polygons:

$$\text{PSE} = \frac{\sum |s_i - r_k|}{\sum |r_k|} \tag{1}$$

where PSE is the "Potential Segmentation Error", $r_k$ is the area of the reference polygon and $s_i$ is the overestimated area of the segment obtained during the segmentation. A value of 0 indicates that the segments obtained from the segmentation fit well within the reference polygons; conversely, larger values indicate a discrepancy between the reference polygons and the segments.

Although the geometric relation is necessary, it is not enough to describe the discrepancies between the segments obtained during the segmentation process and the corresponding reference polygons. To address this, the ED2 index includes an additional factor, the "Number-of-Segmentation Ratio" (NSR), which evaluates the arithmetic discrepancy between the reference polygons and the generated segments (Equation (2)):

$$\text{NSR} = \frac{\text{abs}(m - v)}{m} \tag{2}$$

where NSR is the arithmetic discrepancy between the polygons of the resulting segmentation and the reference polygons, abs denotes the absolute value, m is the number of reference polygons, and v is the number of segments obtained.

Thus, the ED2 can be defined as the joint effect of geometric and arithmetic differences (Equation (3)), estimated from PSE and NSR, respectively, as:

$$\text{ED2} = \sqrt{(\text{PSE})^2 + (\text{NSR})^2} \tag{3}$$

where ED2 is the Euclidean Distance v.2, PSE is the Potential Segmentation Error, and NSR is the Number-of-Segmentation Ratio. According to Liu et al. [70], ED2 values close to 0 indicate good arithmetic and geometric agreement, while high values indicate a mismatch between them.
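A minimal sketch of these three measures (Equations (1)–(3)) is given below, assuming the reference polygons and the output segments are available as shapely geometries; it is an illustrative reimplementation under those assumptions, not the validation code used in this work.

```python
# Minimal sketch of PSE, NSR and ED2 (Equations (1)-(3)), assuming reference
# polygons and segments are shapely geometries. Illustrative only.
from math import hypot
from shapely.ops import unary_union

def ed2(reference_polygons, segments):
    """Euclidean Distance v.2 between a segmentation and its reference polygons."""
    reference_union = unary_union(reference_polygons)

    # PSE (Eq. 1): total segment area falling outside the reference polygons,
    # relative to the total reference area.
    outside_area = sum(s.difference(reference_union).area for s in segments)
    reference_area = sum(r.area for r in reference_polygons)
    pse = outside_area / reference_area

    # NSR (Eq. 2): discrepancy between the number of reference polygons (m)
    # and the number of segments produced (v).
    m, v = len(reference_polygons), len(segments)
    nsr = abs(m - v) / m

    # ED2 (Eq. 3): joint geometric and arithmetic discrepancy.
    return hypot(pse, nsr)
```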

#### **3. Experiments**

We set up several experiments to assess the accuracy of OBIA and Mask R-CNN in segmenting scattered vegetation in drylands. We used the images of Plot 1 to test the OBIA and Mask R-CNN segmentation methods. The images of Plot 2 were used exclusively for the training phase of the Mask R-CNN experiments (Figure 1). In Section 3.1, we describe the OBIA experiments, focused on finding the best parameters (i.e., Scale, Shape and Compactness) of the widely used "multi-resolution" segmentation algorithm [71]. In Section 3.2, we describe the Mask R-CNN experiments, in which we first evaluated the precision in the detection of shrubs (i.e., whether the presence of shrubs is correctly identified) and second the accuracy of the segmentation of the detected shrubs. Finally, in Section 3.3 we describe the fusion of both methods; all accuracies are compared in Section 4.3.

**Figure 1.** Workflow with the main processes carried out in this work. Asterisk shows an example of the result of the fusion of the segmentation results from OBIA and Mask R-CNN. OBIA: Object-Based Image Analysis; Mask R-CNN: Mask Region-based Convolutional Neural Networks; ESP v.2: Estimation of Scale Parameter v.2; SPR: Segmentation Parameters Range.

#### *3.1. OBIA Experiments*

To obtain the optimal value of each parameter of the OBIA segmentation, we used two approaches:


#### *3.2. Mask R-CNN Experiments*

Mask R-CNN segmentation is divided into two phases: (i) training and (ii) testing. In the training phase, we selected 100 training polygons representing 100 shrub individuals of different sizes. The sampling was done using the VGG Image Annotator [72] to generate a JSON file, which includes the coordinates of all the vertices of each segment, equivalent to the perimeter of each shrub. To increase the number of samples and reduce overfitting of the model, we applied data-augmentation and transfer-learning:


We tested three different learning periods (100 steps per epoch) per model:


We trained the algorithm, based on the ResNet architecture with a depth of 101 layers, with each of the three proposed spatial resolutions. We then evaluated the trained models in all possible combinations between the resolutions. We also evaluated the use of data-augmentation and of transfer-learning from the shallower layers to the whole architecture at different stages in the training process. Specifically:

(1.1) Trained with UAV images.

(1.2) Trained with UAV images and data-augmentation.


We carried out the test phase using Plot 1. To identify the most accurate experiments, we evaluated the detection performance of the CNN-based models and determined their Precision, Recall, and F1-measure [75] as:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}, \tag{4}$$

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}},\tag{5}$$

$$\text{F1} - \text{measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$
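The sketch below shows how these detection scores follow from the counts of true positives, false positives and false negatives; the matching of predicted against reference shrubs that produces those counts is not shown, and the numbers in the usage line are hypothetical.

```python
# Equations (4)-(6) from TP/FP/FN counts; the counts themselves come from
# matching predicted shrubs against reference shrubs (not shown here).
def detection_metrics(true_positives: int, false_positives: int, false_negatives: int):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 55 correct detections, 5 spurious, 6 missed shrubs.
print(detection_metrics(55, 5, 6))
```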

#### *3.3. Fusion of OBIA and Mask R-CNN*

We combined the most accurate segmentations obtained with OBIA and Mask R-CNN, according to their ED2 values (Figure 1). Let $o_i$ denote the i-th polygon of the OBIA segmentation, O, and $c_j$ denote the j-th polygon of the Mask R-CNN segmentation, C, so that $O = \{o_i : i = 1, 2, \ldots, m\}$ and $C = \{c_j : j = 1, 2, \ldots, n\}$. Here, the subscripts i and j are sequential numbers for the polygons of the OBIA and Mask R-CNN segmentations, respectively, and m and n are the total numbers of objects segmented with OBIA and Mask R-CNN, respectively. m and n must be equal. Finally, the corresponding segments extracted by the fusion (Equation (7)) are considered a consensus among the initially segmented objects:

$$\text{OC}_{ij} = \text{area}\left(o_i \cap c_j\right) \tag{7}$$

where $\text{OC}_{ij}$ is the intersected area between a segment of the OBIA segmentation ($o_i$) and a segment of the Mask R-CNN segmentation ($c_j$).
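A minimal sketch of this fusion step is shown below, assuming both segmentations are available as lists of shapely polygons; it keeps, for each pair of overlapping OBIA and Mask R-CNN segments, the area on which the two methods agree (Equation (7)). It is an illustration under these assumptions, not the authors' implementation.

```python
# Illustrative sketch of the fusion (Equation (7)): intersect each OBIA polygon
# (o_i) with each Mask R-CNN polygon (c_j) and keep the consensus areas.
from shapely.geometry import Polygon

def fuse_segmentations(obia_segments, maskrcnn_segments, min_area=0.0):
    """Return the consensus polygons OC_ij = area(o_i intersected with c_j)."""
    fused = []
    for o_i in obia_segments:
        for c_j in maskrcnn_segments:
            consensus = o_i.intersection(c_j)
            if not consensus.is_empty and consensus.area > min_area:
                fused.append(consensus)
    return fused

# Toy usage: two overlapping squares standing in for one shrub.
obia = [Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])]
mask_rcnn = [Polygon([(1, 1), (5, 1), (5, 5), (1, 5)])]
print(fuse_segmentations(obia, mask_rcnn)[0].area)  # 9.0, the consensus area
```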

Finally, we estimated the ED2 values of the final segmentation using the validation shrubs from Plot 1 and compared them with the segmentation accuracies obtained by the individual methods.

#### **4. Results and Discussion**

#### *4.1. OBIA Segmentation*

In total, 9234 segmentations were performed with the SPR approach, 3078 for each image type (Google Earth, airborne and UAV). OBIA segmentation accuracy using the SPR showed large variability (Table 1), with ED2 values ranging between 0.05 and 0.28. Segmentation accuracy increased with image spatial resolution: the higher the spatial resolution, the higher the optimal Scale values and the more accurate the segmentation. This trend was reflected in ED2 values of 0.14, 0.10 and 0.05 for the Google Earth, airborne and UAV images, respectively. The best combinations of segmentation parameters across the different images were (Figure 2): (i) for the Google Earth image, Scale values from 105 to 110, a low Shape value of 0.3 and high Compactness values from 0.8 to 0.9; (ii) for the orthoimage from the airborne sensor, Scale values between 125 and 155, a Shape value of 0.6 and a Compactness value of 0.9; and (iii) for the UAV image, the optimal segmentation showed the highest Scale values, ranging from 360 to 420, whereas the Shape and Compactness values were similar to those of the Google Earth image.

**Table 1.** Segmentation accuracies of Object-Based Image Analysis (OBIA) for the three spatial resolutions evaluated. For each segmentation type, only the most accurate combination of Scale, Shape, and Compactness is shown. ESP2/HB: Estimation of Scale Parameter v.2 (ESP2) with Bottom-up Hierarchy; ESP2/HT: ESP2 with Top-down Hierarchy; ESP2/NH: ESP2 Non-Hierarchical; SPR: Segmentation with Parameters Range. Values closer to 0 indicate more accurate segmentations. The most accurate results are in bold.


When we applied the semi-automatic ESP2 method to estimate the optimal value of the Scale parameter, we observed a pattern similar to that described for the SPR, with accuracy increasing with spatial resolution. The highest (worst) ED2 value was obtained for the Google Earth image segmentation (ED2 = 0.25), decreasing for the orthoimage from the airborne sensor (ED2 = 0.15) and reaching the minimum (best) value for the UAV image (ED2 = 0.12). However, the results obtained with ESP2 were worse than those obtained with the SPR method for all the images analysed (Table 1), with the largest differences in the image with the lowest spatial resolution (Google Earth). For the Google Earth image, the best of the three options offered by the ESP2 tool was the bottom-up hierarchical level, with acceptable ED2 values lower than 0.14 (Table 1). For the airborne image, the best option was the same as for the Google Earth image (bottom-up hierarchical level). Conversely, the segmentation of the UAV image produced the best ED2 values when applying ESP2 without a hierarchical level. The computational time for the segmentation of the images was higher for the ESP2 than for the SPR approach. In addition, the computation time was also influenced by the number of pixels to analyse, increasing for the higher spatial resolution images on a computer with a Core i7-4790K CPU at 4 GHz and 32 GB of RAM (Intel, Santa Clara, CA, USA) (Table 1).

**Figure 2.** Relationship between the Scale, Shape and Compactness parameters (X axis) and segmentation accuracy, evaluated using the Euclidean Distance v.2 (ED2; Y axis), for 9234 Object-Based Image Analysis (OBIA) segmentations of the Google Earth, airborne and unmanned aerial vehicle (UAV) images. The rainbow palette shows the density of validation results: red indicates high density and blue low density.

#### *4.2. Mask R-CNN Segmentation*

#### 4.2.1. Detection of Scattered Shrubs

We obtained the best detection results for the models trained and evaluated with UAV images (F1-measure = 0.91) and for the models trained with the highest number of epochs and with data-augmentation activated (Table 2). The best transfer from a model trained on UAV images to a test image of another resolution was to the image from the airborne sensor; nevertheless, the Google Earth test image produced a similar result (F1-measure = 0.90). We consider that a model trained with data-augmentation and very high spatial resolution images (0.03 m/pixel) can generalize well to coarser images such as those from Google Earth (0.5 m/pixel). Furthermore, when we trained the models with Google Earth images, we observed that they also generalized well to finer resolutions (F1-measure = 0.90). For this reason, the detection of *Z. lotus* shrubs might be generalizable from any resolution finer than 1 m/pixel.


**Table 2.** Test results of the Mask Region-based Convolutional Neural Network (Mask R-CNN) experiments for the three spatial resolutions. TP: True Positive; FP: False Positive; FN: False Negative. Precision, Recall, and F1-measure were used for the detection results. The most accurate results are in bold.

#### 4.2.2. Segmentation Accuracy for Detected Shrubs

The best segmentation accuracy was obtained with the models trained and tested on the same source of images, reaching ED2 = 0.07 for the Google Earth image. However, when the model trained with Google Earth images was tested on the UAV image, the ED2 was 0.08. Moreover, the effect of data-augmentation was counterproductive for models trained with airborne images and only lowered ED2 (best results) for models trained with the UAV image. In general, data-augmentation helped to generalize between images but did not yield a considerable increase in accuracy for models trained and tested on the same image resolution (Table 3 and Figure 3).

**Table 3.** Segmentation accuracies of the Mask Region-based Convolutional Neural Network (Mask R-CNN). PSE: Potential Segmentation Error; NSR: Number-of-Segmentation Ratio; ED2: Euclidean Distance v.2. The most accurate results are in bold.


**Figure 3.** Examples of segmentation of the Plot 1 images using Object-Based Image Analysis (OBIA; **top**) and the Mask Region-based Convolutional Neural Network (Mask R-CNN; **bottom**) on the Google Earth, airborne and unmanned aerial vehicle (UAV) images. The different colours in the Mask R-CNN results distinguish individual shrubs.

#### *4.3. Fusion of OBIA and Mask R-CNN*

Our results showed that the fusion of the OBIA and Mask R-CNN methods on very high-resolution RGB images is a powerful tool for mapping scattered shrubs in drylands. The accuracy of the fusion was higher than that of either segmentation on its own (Table 4), yielding the most accurate segmentation of all the experiments tested in this work (ED2 = 0.038), compared with ED2 = 0.05 for OBIA and ED2 = 0.07 for Mask R-CNN. For the Google Earth images, however, the fusion only improved the ED2 by 0.02. Overall, merging the results of both methodologies (OBIA ∩ Mask R-CNN) provided the best segmentation of scattered vegetation in drylands from very high-resolution images, with an ED2 of 0.03.

**Table 4.** Segmentation accuracies of the fusion of Object-Based Image Analysis (OBIA) and the Mask Region-based Convolutional Neural Network (Mask R-CNN). PSE: Potential Segmentation Error; NSR: Number-of-Segmentation Ratio; ED2: Euclidean Distance v.2. The most accurate results are in bold.


To our knowledge, the effect of combining these two methodologies had not been studied to date, and it may prove valuable for improving future segmentation methods. As shown in the conceptual framework (Figure 1), it is reasonable to expect that the higher the resolution, and therefore the greater the detail at the vegetation edges represented in the images, the more the fusion improves the final segmentation accuracy. In images with lower resolution, the fusion still improved accuracy, but to a lesser degree.

The spatial resolution of the images affected the accuracy of the segmentation, although all segmentation methods and spatial resolutions provided very good results. In agreement with [57], we observed that the spatial resolution and the Scale parameter played a key role during the segmentation process and controlled the accuracy of the final segmentations. For the non-fusion methods (OBIA or Mask R-CNN), segmentation accuracy was highest for the UAV image, with OBIA reaching ED2 = 0.05. However, when the object to be segmented is larger than the pixel size of the image, the spatial resolution of the image is of secondary importance [37,57,76,77]. For this reason, as the scattered vegetation in this area has a mean size of 100 m² [5], corresponding to 400 pixels in the Google Earth image, only slight increases in segmentation accuracy were observed as the spatial resolution increased. Moreover, the overestimation of the area of each shrub was not significant as the spatial resolution of the images increased.

Therefore, Google Earth images could be used to map scattered vegetation in drylands, provided that the plants to be mapped are larger than the pixel size. This result opens a wide range of new opportunities for vegetation mapping in remote areas where UAV or airborne image acquisition is difficult or where acquiring commercial very high-resolution imagery is very expensive. These results are promising and highlight the usefulness of freely available Google Earth images for mapping large shrubs, with only a negligible decrease in segmentation accuracy compared with commercial UAV or airborne images. However, the segmentation of vegetation could be further improved by using the near-infrared (NIR) band, since vegetation reflects strongly in this range of the spectrum (e.g., 750 to 2500 nm), or by using vegetation indices such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI). Finally, very high spatial resolution UAV images require much more computational time, are expensive, and are not always possible to obtain over large extents in remote areas, hampering their use.

#### **5. Conclusions**

Our results showed that both the OBIA and Mask R-CNN methods are powerful tools for mapping scattered vegetation in drylands, although both were affected by the spatial resolution of the orthoimages used. We have shown for the first time that the fusion of the results of these methods further increases the precision of the segmentation. We propose an approach that offers a new way of fusing these methodologies to increase accuracy in the segmentation of scattered shrubs; it should be evaluated on other types of vegetation and objects, and on very high-resolution and hyperspectral images, to confirm its general applicability.

Using images with very high spatial resolution could provide the precision required to further develop methodologies for evaluating the spatial distribution of shrubs and the dynamics of plant populations in global drylands, especially when utilizing free-to-use images such as those obtained from Google Earth. Such evaluations are of particular importance in the drylands of developing countries, which are particularly sensitive to anthropogenic and climatic disturbances and may not have enough resources to acquire airborne or UAV imagery. For these reasons, future methodologies such as the one presented in this work should focus on using freely available datasets.

In this context, the fusion of OBIA and Mask R-CNN could be extended to a larger number of shrub and tree species, or improved with the inclusion of more spectral and temporal information. Furthermore, this approach could improve the segmentation and monitoring of the crowns of trees and arborescent shrubs in general, which are of particular importance for biodiversity conservation and for reducing uncertainties in carbon storage estimates worldwide [78]. Recently, scattered trees have been identified as key structures for maintaining ecosystem services provision and high levels of biodiversity [43]. Global initiatives could benefit greatly from CNNs, including the one recently developed by FAO [79] to estimate forest extent in drylands; the uncertainties in this initiative [80,81] might be reduced by implementing our CNN-based approach to segment trees. Tree and shrub segmentation methods could provide a global characterization of forest ecosystem structures and population abundances as part of the essential biodiversity variables initiative [82,83]. In long-lived shrubs, the precision of the segmentation is key for monitoring and detecting disturbances (e.g., pests, soil loss or seawater intrusion) [5]. Finally, the monitoring of persistent vegetation with minimal cover changes over decades could benefit from fusion approaches such as the segmentation methods proposed here.

**Author Contributions:** Conceptualization, E.G., J.B.-S., E.R.-C. and S.T.; methodology, E.G. and J.B.-S.; writing—original draft preparation, E.G., J.B.-S. and E.R.-C.; writing—review and editing, E.G., J.B.-S., E.R.-C., S.T., J.M.-V., D.A.-S., J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the European Research Council (ERC Grant agreement 647038 [BIODESERT]), the European LIFE Project ADAPTAMED LIFE14 CCA/ES/000612, the RH2O-ARID (P18-RT-5130) and RESISTE (P18-RT-1927) projects funded by the Consejería de Economía, Conocimiento, Empresas y Universidad of the Junta de Andalucía, and by projects A-TIC-458-UGR18 and DETECTOR (A-RNM-256-UGR18), with the contribution of the European Union Funds for Regional Development. E.R.-C. was supported by the HIPATIA-UAL fellowship, funded by the University of Almeria. S.T. is supported by the Ramón y Cajal Program of the Spanish Government (RYC-2015-18136).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All drone and airborne orthomosaic data, shapefiles and code will be made available upon request to the corresponding author's email with appropriate justification.

**Acknowledgments:** We are very grateful to the reviewers for their valuable comments, which helped to improve the paper. We are grateful to Garnata Drone SL and the Andalusian Centre for the Evaluation and Monitoring of Global Change (CAESCG) for providing the data set for the experiments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**

