Article

Predicting Fractional Shrub Cover in Heterogeneous Mediterranean Landscapes Using Machine Learning and Sentinel-2 Imagery

1. Marine, Environment, and Technology Centre/The Laboratory of Robotics and Engineering Systems, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
2. Energias de Portugal, S.A., Rua Cidade de Goa, 2, 2685-038 Sacavém, Portugal
3. School of Digital Engineering and Artificial Intelligence, Euromed Research Center, Euromed University of Fes, Meknes Road (Bensouda Roundabout), Fes 30000, Morocco
4. Terraprima—Serviços Ambientais, Sociedade Unipessoal, Lda, 2135-199 Samora Correia, Portugal
* Author to whom correspondence should be addressed.
Forests 2024, 15(10), 1739; https://doi.org/10.3390/f15101739
Submission received: 24 July 2024 / Revised: 26 September 2024 / Accepted: 27 September 2024 / Published: 1 October 2024
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract

Wildfires pose a growing threat to Mediterranean ecosystems. This study employs advanced classification techniques for shrub fractional cover mapping from satellite imagery in a fire-prone landscape in Quinta da França (QF), Portugal. The study area is characterized by fine-grained heterogeneous land cover and a Mediterranean climate. In this type of landscape, shrub encroachment after land abandonment and wildfires constitutes a threat to ecosystem resilience—in particular, by increasing the susceptibility to more frequent and large fires. High-resolution mapping of shrub cover is, therefore, an important contribution to landscape management for fire prevention. Here, a 20 cm resolution land cover map was used to label 10 m Sentinel-2 pixels according to their shrub cover percentage (three categories: 0%, >0%–50%, and >50%) for training and testing. Three distinct algorithms, namely Support Vector Machine (SVM), Artificial Neural Networks (ANNs), and Random Forest (RF), were tested for this purpose. RF excelled, achieving the highest precision (82%–88%), recall (77%–92%), and F1 score (83%–88%) across all categories (test and validation sets) compared to SVM and ANN, demonstrating its superior ability to accurately predict shrub fractional cover. Analysis of confusion matrices revealed RF’s superior ability to accurately predict shrub fractional cover (higher true positives) with fewer misclassifications (lower false positives and false negatives). McNemar’s test indicated statistically significant differences (p value < 0.05) between all models, consolidating RF’s dominance. The development of shrub fractional cover maps and derived map products is anticipated to leverage key information to support landscape management, such as for the assessment of fire hazard and the more effective planning of preventive actions.

1. Introduction

Wildfires are becoming increasingly frequent and severe in Mediterranean landscapes, posing significant threats to ecosystem resilience and human safety. Effective landscape management hinges on a comprehensive understanding of land cover. High-resolution land cover maps provide valuable insights into the composition and spatial distribution of vegetation, offering essential information for informed decision making [1]. However, acquiring such detailed data in fine-grained, heterogeneous Mediterranean landscapes, where vegetation boundaries are often complex, presents challenges [2].
Accurate mapping of shrub cover is particularly critical in fire-prone ecosystems, as shrub encroachment contributes significantly to fuel load and influences fire behavior [3,4]. Characterized by intricate branching patterns and dense foliage, shrubs serve as both fuel and ignition sources for wildfires [5]. Moreover, the unique physiological adaptations of many shrub species, such as flammable resins and oils, heighten their vulnerability to ignition during fire events [6]. Accurately mapping shrub cover at high resolution is, therefore, necessary for landscape management and fire prevention, offering valuable insights into areas most susceptible to fire outbreaks [7]. While traditional methods can be expensive and time-consuming, posing challenges for ecological research and management projects [8], remote sensing provides a valuable tool for the generation of accurate maps of fractional cover [9,10]. Numerous studies have leveraged satellite and in situ data to extract key environmental variables for various applications, including vegetation mapping [11,12,13], enabling large-scale monitoring and analysis of landscapes [14].
High-quality, publicly available satellite imagery from automated Earth Observation (EO) systems readily supports land cover mapping [15,16] at various scales, offering significant advantages over traditional methods like ground surveys, which are often impractical for landscape-level efforts [17]. The European Space Agency’s (ESA) Multi-Spectral Instrument (MSI), Sentinel-2, offers a unique combination of high spatial, temporal, and multi-spectral data, making it particularly valuable for the monitoring of ecosystems at the landscape level [18]. Its 10- to 60-m resolution enables the identification of smaller features on the Earth’s surface. This combination makes Sentinel-2 a cost-effective, comprehensive, and time-effective tool for the monitoring and analysis of changes on the Earth’s surface, enabling a wide range of applications across various fields [19]. While airborne devices like drones can capture even higher-resolution imagery, their use often requires specialized knowledge for image processing and mapping and can be time-consuming, and the data may not be as widely available as satellite imagery like Sentinel or Landsat [20]. While satellite imagery offers a powerful tool for large-scale monitoring [21,22], its limitations in terms of spatial resolution and cloud cover can hinder accuracy when applied to the production of fine-grained land cover maps [23,24,25,26].
Advanced modeling approaches, such as machine learning (ML) algorithms, empower us to overcome these limitations [27,28]. When applied to satellite data, ML algorithms can extract finer details and insights, significantly enhancing the value and usefulness of satellite imagery [29,30,31].
Traditional vegetation mapping approaches using satellite imagery fall into the following two main categories: object-based and fractional cover methods. The former focuses on identifying individual land cover elements [32], while the latter aims to estimate the proportional composition of different cover types within a single pixel [10,33]. Both methods have their merits and drawbacks. Fractional cover mapping provides a more holistic view of land cover composition, especially in fine-grained heterogeneous landscapes [34]. This approach, although less precise than object-based methods, performs well when it is difficult to define vegetation-type boundaries [35]. Recent research has identified random forest (RF) [13,36], support vector machine (SVM) [11,37], and artificial neural networks (ANNs) [38] as some of the most widely used and effective methods for fractional cover mapping in remote sensing. These methods excel in handling complex and heterogeneous landscapes, making them particularly well-suited for applications such as fire hazard assessment and wildfire prevention. Vegetation fractional cover mapping, a specific type of fractional cover mapping that quantifies the proportional representation of different vegetation types within a pixel, is a valuable tool for assessing vegetation dynamics, particularly in fire-prone regions [39]. The spectral signatures and spatial distributions of different plant types exhibit distinct characteristics that allow for their identification and differentiation using multi-spectral satellite imagery. The application of fractional mapping extends to various fields, including ecology, precision agriculture, disaster management, and water resources, among others [40,41,42,43]. In ecology research, Kapitza et al. (2022) [40] introduced an advanced land use model for the prediction of ecological changes amid evolving socio-economic scenarios.
In the domain of precision agriculture, a study conducted in the Himalayan region of India by Khare et al. (2019) [41] leveraged very high-resolution optical multi-spectral and stereo imagery to quantify the presence of the invasive Lantana plant across large spatial scales. Similarly, within the realm of disaster management, Wessels et al. (2019) [42] conducted a study in Namibia addressing the pressing concern of bush encroachment and its adverse effects on livestock productivity. Water resource management has also benefited from fractional mapping techniques. Brinkhoff et al. (2018) [43] utilized remote sensing imagery from UAVs and satellite platforms to monitor and classify vegetation in irrigation channels.
Remote sensing methods for fractional cover mapping have been classified into six different groups [44,45], namely (1) Relative Vegetation Abundance Algorithms (RVAAs) that utilize maximum and minimum vegetation index values for scaling [46,47]; (2) Spectral Mixture Analysis (SMA) algorithms [48,49]; (3) Spectral-Based Supervised Classification (SBSC) algorithms [50,51]; (4) physically-based models, such as multi-angle geometric–optical models [52]; (5) ML algorithms [53,54]; and (6) other approaches [55]. Among these methods, supervised machine learning classification approaches have gained significant popularity due to their ability to effectively classify heterogeneous land cover. Notably, RF [13,36], SVM [11,37], and ANN [38] have emerged as some of the most widely used methods in this domain.
In ecosystems prone to wildfire, high-resolution land cover mapping is particularly relevant for mapping fire-prone vegetation. This is the case for pyrophytic shrubs, which, as noted above, contribute substantially to the fuel load, influence fire behavior, and serve as both fuel and ignition sources during fire events [3,4,5,6,7]. Remote sensing has been used in numerous studies to generate accurate maps of fractional cover for diverse purposes [9,10]. For example, Du et al. [33] employed a pixel-based approach to estimate fractional cover, while [32] used an object-oriented approach to map fractional vegetation cover.
ML algorithms—specifically, RF and Convolutional Neural Networks (CNNs)—have demonstrated promising results in land cover classification tasks [56,57]. For example, Houda et al. [58] employed Google Earth Engine and RF classifiers to enhance the accuracy of burned area identification in Morocco. Trenčanová et al. [59] investigated the use of CNNs for the mapping of four land cover classes in fire-prone landscapes. Additionally, Aragoneses et al. [37] developed a methodology for generating fuel maps across European regions to improve wild-land fire risk assessment. Mohammadpour et al. [36] introduced a method for mapping complex and mixed vegetation cover using RF models and Sentinel-2 data from the Lousã district in Portugal. Hernandez et al. [13] proposed a method for land cover and crop mapping using RF models and Sentinel-2 data in Portugal. However, high-resolution land cover mapping in fine-grained landscapes remains a challenge [60]. While fractional cover mapping is commonly employed at coarser resolutions (>10 m) [61], advancements in high-resolution mapping with finer details are needed. Ideally, a calibrated model solely based on satellite data could enable high-resolution land cover mapping tailored to specific user needs [62]. This approach might prioritize accuracy over precision, advocating for a 10 m resolution to enhance mapping outcomes [35]. Here, we focus on addressing these challenges by proposing an approach for high-resolution shrub cover mapping in fire-prone landscapes. This approach leverages advanced machine learning techniques and high-resolution satellite imagery to overcome limitations associated with traditional methods.
This study aims to develop a machine learning model using Sentinel-2 imagery to predict fractional shrub cover in fine-grained, heterogeneous landscapes. We address the challenges of mapping shrub cover in complex Mediterranean landscapes by employing a high-resolution land cover map (20 cm resolution) for labeling. Our study also contributes by focusing on these often-neglected heterogeneous landscapes and by utilizing McNemar’s test for a more rigorous statistical comparison of model performance, providing insights into the strengths and weaknesses of different algorithms. Finally, we aim to compare three commonly used algorithms (SVM, RF, and ANN) to identify the best performer for this specific task.

2. Materials and Methods

2.1. Study Area

Our study area is a large native oak forest patch situated in QF, Covilhã, Portugal (Figure 1). The forest occupies an area of approximately 200 hectares, contributing to QF’s landscape heterogeneity and ecological diversity. The region has a Mediterranean climate with warm, dry summers and most rainfall occurring between October and May. This poses a significant forest fire risk during the summer months, which are characterized by warm temperatures (average of 22.2 °C) and minimal rainfall [63].

2.2. Data Sources and Processing

2.2.1. Sentinel-2 Data Processing

Thirteen Sentinel-2A satellite images captured between June and September 2020 were used in this study (Table A1). Table A2 in Appendix A provides a detailed description of the Sentinel-2 Image ID structure, explaining the information encoded within each ID. Data collection focused on the summer months because the distinction between green shrub cover and senescent herbaceous vegetation is most pronounced during this season [64,65].
Our spatial analysis leveraged imagery from the Harmonized S2 MSI: Multi-Spectral Instrument, Level-2A collection, a product developed by the ESA under the Copernicus program [66]. The MSI instrument onboard Sentinel-2A offers a diverse range of spatial resolutions (10–60 m) depending on the specific spectral band [66] (Table 1).
To ensure comprehensive representation of the study area’s spectral characteristics, all 13 spectral bands, encompassing the available resolutions, were included in our analysis. To assess the relative importance of each band for shrub cover classification, we conducted a band importance analysis using the random forest algorithm’s inherent feature-ranking capability. The results revealed that the blue, red edge 1, and SWIR bands were particularly influential in classifying shrub cover. To ensure consistency and facilitate meaningful comparisons across this diverse spectral information, all imagery was resampled to a uniform spatial resolution of 10 m. This standardization enabled a streamlined analysis, capitalizing on the MSI’s high resolution, expansive field of view, and multi-spectral capabilities [67]. Following calibration and clipping, images derived from the EUMETSAT database were employed for training, testing, and validation purposes.
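The band importance analysis described above can be sketched using scikit-learn's impurity-based feature importances. This is a minimal illustration, not the authors' exact procedure: the band labels and the synthetic data are assumptions for demonstration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labels for the 13 Sentinel-2 bands of Table 1.
BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
         "B8", "B8A", "B9", "B10", "B11", "B12"]

def rank_band_importance(X, y, n_estimators=100, random_state=0):
    """Fit an RF on (n_samples, 13) band values and rank the bands by the
    classifier's impurity-based feature importances."""
    rf = RandomForestClassifier(n_estimators=n_estimators,
                                random_state=random_state)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(BANDS[i], float(rf.feature_importances_[i])) for i in order]

# Synthetic check: the class depends only on the blue band (index 1),
# so B2 should dominate the ranking.
rng = np.random.default_rng(0)
X = rng.random((200, 13))
y = (X[:, 1] > 0.5).astype(int)
ranking = rank_band_importance(X, y, n_estimators=50)
```

The importances sum to one across bands, so the ranked values can be read directly as relative contributions.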
Table 1. The used spectral bands of the Sentinel-2A multi-spectral instrument [68].
Band Number | Band Name       | Central Wavelength (nm) | Bandwidth (nm) | Resolution (m)
1           | Coastal aerosol | 443.9                   | 27             | 60
2           | Blue            | 496.6                   | 98             | 10
3           | Green           | 560.0                   | 45             | 10
4           | Red             | 664.5                   | 38             | 10
5           | Red Edge 1      | 703.9                   | 19             | 20
6           | Red Edge 2      | 740.2                   | 18             | 20
7           | Red Edge 3      | 782.5                   | 28             | 20
8           | NIR             | 835.1                   | 145            | 10
8A          | Narrow NIR      | 864.8                   | 33             | 20
9           | Water vapor     | 945.0                   | 26             | 60
10          | SWIR-Cirrus     | 1373.5                  | 75             | 60
11          | SWIR 1          | 1613.7                  | 143            | 20
12          | SWIR 2          | 2202.4                  | 242            | 20

2.2.2. Vegetation Coverage Percentage

A high-resolution land cover map of QF for 2020, with a pixel resolution of 20 cm × 20 cm and an F1 score of 0.82, was produced from drone imagery and manual interpretation and made available by the farm managers. It was intersected with the 10 m × 10 m Sentinel-2 pixel grid to determine the proportion of shrub cover within each pixel and label it accordingly. Each pixel within the study area was classified into one of the following categories:
- No shrub cover (0%): pixels dominated by other vegetation types, such as trees, or by bare ground.
- Moderate shrub cover (>0%–50%): pixels containing a mix of shrubs and other cover types, such as trees, grasses, or bare ground.
- Dense shrub cover (>50%): pixels primarily dominated by shrubs.
The intersection of the high-resolution land cover map with the Sentinel-2 pixel grid allowed us to assign each Sentinel-2 pixel to one of these three shrub cover categories based on the percentage of shrub cover within that pixel. This procedure ensured a consistent and accurate classification of shrub cover across the entire study area, allowing us to effectively train and validate our machine learning models for the prediction of fractional shrub cover.
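The intersection step can be sketched as a block aggregation: each 10 m Sentinel-2 pixel covers a 50 × 50 block of 20 cm pixels, so the shrub fraction per cell is the block mean of a binary shrub mask. This is an illustrative sketch under that assumption, not the authors' GIS workflow; the function name and thresholds mirror the three categories defined above.

```python
import numpy as np

def label_sentinel_pixels(shrub_mask, block=50):
    """Aggregate a 20 cm binary shrub mask (1 = shrub) into 10 m cells
    (50 x 50 high-resolution pixels per Sentinel-2 pixel) and assign each
    cell one of the three shrub-cover categories used in the study:
    1 -> 0%, 2 -> >0%-50%, 3 -> >50%."""
    h, w = shrub_mask.shape
    # Trim any partial border cells, then reshape into (rows, 50, cols, 50).
    cells = shrub_mask[: h - h % block, : w - w % block]
    cells = cells.reshape(h // block, block, w // block, block)
    frac = cells.mean(axis=(1, 3))  # shrub fraction per 10 m cell
    labels = np.where(frac == 0, 1, np.where(frac <= 0.5, 2, 3))
    return frac, labels
```

For a 100 × 100 mask this yields a 2 × 2 grid of fractions and category labels, one per simulated Sentinel-2 pixel.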

2.3. Modeling Approach

Figure 2 illustrates our methodology for predicting shrub cover. It outlines the key steps involved in achieving accurate results. The methodology involves the following series of steps: (1) acquiring and aligning Sentinel-2 imagery with a high-resolution land cover map; (2) labeling Sentinel-2 pixels based on the percentage of shrub cover within each pixel; (3) training three machine learning models (RF, SVM, and ANN) on the labeled data; and (4) validating model performance using an independent dataset. These machine learning models were chosen for their ability to effectively classify shrub cover in heterogeneous landscapes, which often exhibit complex patterns and variations, and their capacity to handle high-dimensional data. The labeled dataset was divided into training, testing, and validation subsets. Experiments were conducted on a PC with 16 GB RAM and an i7 Intel(R) 2.80 GHz processor, utilizing Python 3.11 and its machine learning libraries (scikit-learn) to construct the models. The entire process was carried out using data from August 2020.
First, Sentinel-2 images from August 2020 were acquired and aligned with a high-resolution land cover map. This involved obtaining the necessary Sentinel-2 imagery for the study area and ensuring that it was correctly georeferenced and aligned with the high-resolution map to ensure a common spatial reference (Figure 2).
Secondly, the Sentinel-2 pixels were labeled according to their shrub cover percentage category. Based on this percentage, a pixel was assigned to one of the following three categories (Figure 3):
- Category 1 (0% shrub cover);
- Category 2 (>0%–50% shrub cover);
- Category 3 (>50% shrub cover).
Thirdly, labeled Sentinel-2 pixels and their spectral information were used to train three machine learning models, namely RF, SVM, and ANN (details in Section 2.4). This involved using the labeled pixels as training data. The machine learning algorithms learned the relationships between the spectral characteristics of the Sentinel-2 pixels and the assigned shrub cover category labels.
Fourthly, models were validated using an independent dataset. This step involved testing the trained models on a separate set of labeled Sentinel-2 pixels that were not used for training. This helped to assess how well the models generalize to unseen data, providing an estimate of their accuracy in predicting the shrub cover percentage in new areas.
For the training and validation process, the dataset was divided into two subsets: a significant portion (90%) was designated for training and testing, while the remaining 10% was reserved for model validation. Within the 90% subset, data were further partitioned into training data (80%) and testing data (20%). It is worth noting that each specific pixel is represented in multiple samples due to the use of multiple Sentinel-2 images. The full dataset contains 62,283 samples at 23,478 unique locations (pixels). The training set (80% of the training and testing subset) comprises 18,245 unique pixels (77.7% of unique pixels in the full dataset), the testing set (20% of the training and testing subset) contains 6769 unique pixels (28.8%), and the validation set (10% of the full dataset) comprises 5247 unique pixels (22.3%).
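The 90/10 split followed by an 80/20 split can be sketched with scikit-learn's `train_test_split`, stratified by category as described for Table 2. This is a sample-level sketch under that assumption (the helper name is illustrative), not the authors' exact splitting code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed=42):
    """Split labeled samples into ~72% training / 18% testing / 10%
    validation, mirroring the study's 90/10 split followed by an 80/20
    split of the 90% subset, stratified by shrub-cover category."""
    X_tt, X_val, y_tt, y_val = train_test_split(
        X, y, test_size=0.10, stratify=y, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_tt, y_tt, test_size=0.20, stratify=y_tt, random_state=seed)
    return (X_tr, y_tr), (X_te, y_te), (X_val, y_val)

# Toy example with balanced categories 1-3.
X = np.random.default_rng(0).random((300, 13))
y = np.repeat([1, 2, 3], 100)
train, test, val = split_dataset(X, y)
```

With 300 samples this yields 216 training, 54 testing, and 30 validation samples, i.e., 72/18/10 percent of the total.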
Table 2 provides a quantitative overview of the data splitting process employed for the QF dataset, showing the exact number of samples of each shrub cover category used for training, testing, and validation. The table reflects the stratified sampling approach employed to ensure balanced representation of each category; the number of samples per category is approximately equal.

2.4. Machine Learning Models

The following specific machine learning models were employed in our study:
(a) Artificial Neural Networks
Artificial neural networks (ANNs) are computational models inspired by the human brain’s neural networks and capable of learning complex relationships within datasets through iterative training processes [69]. Here, ANNs are employed as a robust machine learning framework for the classification of shrub cover categories based on spectral data. The adaptability of ANNs makes them well-suited to discerning intricate patterns and non-linear relationships inherent in multi-dimensional environmental data. Our choice of ANNs as a classification tool is supported by their demonstrated efficacy in various remote sensing applications [70].
In the context of this research, we used a feed-forward neural network architecture with three layers, with the number of units per layer set at [100-100-3]. This architecture was chosen after preliminary experimentation, balancing model complexity and computational efficiency. Three layers were found to be sufficient to capture the non-linear relationships in the data, while the decreasing number of units per layer helps to prevent overfitting and to extract increasingly abstract features. The final layer, with three units, corresponds to the three shrub cover categories. ReLU activation functions were used for the first two layers, and softmax was used for the third layer. We utilized the Adam optimizer with 150 epochs and a batch size of 32, as described in Table 3.
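Since the article reports using scikit-learn, the [100-100-3] configuration above can be sketched with `MLPClassifier`; this is an assumption about the exact class used, and the synthetic data are stand-ins for labeled Sentinel-2 pixels. Note that `MLPClassifier` adds the softmax output layer implicitly for a 3-class target.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Sketch of the ANN configuration in Table 3: two ReLU hidden layers of
# 100 units each; the 3-unit softmax output layer is implicit.
ann = MLPClassifier(
    hidden_layer_sizes=(100, 100),  # hidden layers only; output is implicit
    activation="relu",
    solver="adam",                  # Adam optimizer
    max_iter=150,                   # upper bound on training epochs
    batch_size=32,
    random_state=0,
)

# Toy fit on synthetic 13-band samples with categories 1-3.
rng = np.random.default_rng(0)
X = rng.random((300, 13))
y = rng.integers(1, 4, size=300)
ann.fit(X, y)
```

The fitted network has four layers in total (input, two hidden, output), matching the [100-100-3] description once the 13-band input layer is counted.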
(b) Support Vector Machines
Support vector machines (SVMs) leverage vector representations of training instances to establish a hyperplane between two classes, maximizing the distance (margin) from the closest training examples to the hyperplane [71]. In shrub coverage assessment, the most common approach labels each individual pixel extracted from a multi-spectral or hyperspectral image as a data sample. SVMs hold a distinct appeal in the field of fire risk due to their ability to achieve robust generalization despite limited training samples [72]. Furthermore, SVMs are versatile and can be extended to multiclass classification tasks. The approach used in this study is a one-vs-one SVM approach, which decomposes the multiclass classification problem into binary classification subproblems, yielding one binary classifier for each pair of shrub coverage categories. The final prediction for any input is obtained by majority voting, with the distance from the margin as the confidence criterion. In this study, we leveraged an SVM model with an RBF kernel, a regularization parameter (C) of 1 (inversely proportional to regularization strength), and a kernel coefficient (γ) set to 1/n_features, as shown in Table 3.
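The configuration above can be sketched with scikit-learn's `SVC` (an assumption about the exact class used; the data are synthetic stand-ins). `SVC` performs one-vs-one decomposition internally for multiclass problems, and `gamma="auto"` corresponds to 1/n_features.

```python
import numpy as np
from sklearn.svm import SVC

# Sketch of the SVM configuration in Table 3: RBF kernel, C = 1,
# gamma = 1/n_features. decision_function_shape="ovo" exposes the
# pairwise (one-vs-one) decision values directly.
svm = SVC(kernel="rbf", C=1.0, gamma="auto", decision_function_shape="ovo")

# Toy fit on synthetic 13-band samples for the three categories.
rng = np.random.default_rng(0)
X = rng.random((300, 13))
y = rng.integers(1, 4, size=300)
svm.fit(X, y)
```

With three categories, the one-vs-one scheme trains 3 × 2 / 2 = 3 pairwise classifiers, which is why the decision function returns three values per input.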
(c) Random Forest Analysis
The RF algorithm is an ensemble method capable of handling both classification and regression tasks. It combines multiple decision trees, each trained on a randomly sampled subset of the data and features [73].
RF is popular in remote sensing due to its high accuracy and robustness to overfitting, which is attributed to the random selection of training samples and of the features considered for splitting at each tree node. RF is also effective at handling high-dimensional and correlated data, which are commonly encountered in hyperspectral imagery [74]. The RF model used in this research was configured with 250 estimators and ‘entropy’ as the split criterion, as represented in Table 3. The minimum number of samples required to split an internal node (min_samples_split) was set to 2, and the minimum number of samples required at a leaf node (min_samples_leaf) was 1. The maximum depth of each estimator (max_depth) was not set, as we wanted nodes to be expanded until all leaves were pure or until all leaves contained fewer than min_samples_split samples. Moreover, we used bootstrap samples to build the estimators.
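The RF configuration above maps directly onto scikit-learn's `RandomForestClassifier` parameters; this is a sketch of that configuration, with synthetic data standing in for the labeled Sentinel-2 pixels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch of the RF configuration described in Table 3.
rf = RandomForestClassifier(
    n_estimators=250,      # number of trees in the ensemble
    criterion="entropy",   # split criterion
    min_samples_split=2,   # minimum samples to split an internal node
    min_samples_leaf=1,    # minimum samples at a leaf node
    max_depth=None,        # grow trees until leaves are pure
    bootstrap=True,        # build each tree on a bootstrap sample
    random_state=0,
)

# Toy fit on synthetic 13-band samples with categories 1-3.
rng = np.random.default_rng(0)
X = rng.random((300, 13))
y = rng.integers(1, 4, size=300)
rf.fit(X, y)
```

All listed values except `random_state` come from the text; the seed is added only to make the sketch reproducible.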
To optimize the performance of each machine learning model, we conducted a systematic hyperparameter tuning process. This involved exploring a range of values for key hyperparameters and evaluating the model’s performance using cross-validation on the training dataset. The set of hyperparameters that yielded the highest performance metrics (precision, recall, and F1 score) on the validation set was selected as the optimal configuration for each model.

2.5. Model Evaluation and Comparison

Evaluating and comparing model performance is key for assessing their effectiveness in classifying fractional shrub cover. We used different performance measures to ensure accurate evaluations.

2.5.1. Model Evaluation

Following the systematic hyperparameter tuning described in Section 2.4, we employed a set of standard performance metrics to comprehensively assess the effectiveness of our models. These metrics encompassed accuracy, precision, recall, the confusion matrix, and the F1 score, collectively shedding light on various facets of the models’ performance [75].
- Accuracy: quantifies the ratio of correctly classified instances to the total number of instances, providing an overarching measure of a model’s correctness [76];
- Precision: the proportion of true-positive predictions out of all instances predicted as positive, particularly pertinent in scenarios where minimizing False Positives (FPs) is of paramount importance;
- Recall (also known as sensitivity): the proportion of true positives out of all actual positive instances, a critical consideration when the imperative is to minimize False Negatives (FNs) [77];
- Confusion Matrix: a more in-depth overview of a model’s performance, summarizing the counts of True Positive (TP), True Negative (TN), FP, and FN predictions made by the model on a set of data [76];
- F1 Score: the harmonic mean of precision and recall, offering a balanced assessment of a model’s performance and ensuring that both FPs and FNs are considered [77].
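The metrics listed above are all available in scikit-learn; the helper below is a minimal sketch of how they could be computed per shrub-cover category (the function name is illustrative, not from the article).

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

LABELS = [1, 2, 3]  # the three shrub-cover categories

def evaluate(y_true, y_pred):
    """Compute the metrics listed above, per category where applicable."""
    kw = dict(labels=LABELS, average=None, zero_division=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, **kw),
        "recall": recall_score(y_true, y_pred, **kw),
        "f1": f1_score(y_true, y_pred, **kw),
        "confusion": confusion_matrix(y_true, y_pred, labels=LABELS),
    }

# Toy check: one category-1 instance misclassified as category 2.
res = evaluate([1, 1, 2, 2, 3, 3], [1, 2, 2, 2, 3, 3])
```

With `average=None`, precision, recall, and F1 are returned as one value per category, matching the per-category reporting in Tables 5-7; rows of the confusion matrix are the actual categories and columns the predicted ones.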

2.5.2. Model Comparison

We compared the performance of the three models to identify the one with the highest accuracy, precision, recall, and F1 score. This comparison was key for selecting the most effective model for the prediction of shrub cover.
We also used McNemar’s test to statistically compare the performance of the models [78]. This test evaluates whether there is a significant difference in performance between two classification models on the same dataset.
(a) Contingency Table Construction:
McNemar’s test involves creating a contingency table for each pair of models, recording whether both models classified an instance correctly, both incorrectly, or one correctly and the other incorrectly (Table 4). The null hypothesis is that there is no difference in performance between the two models, while the alternative hypothesis is that there is a significant difference.
Let a be the count of instances where RF and SVM are both correct, b the count of instances where RF is correct and SVM is incorrect, c the count of instances where RF is incorrect and SVM is correct, d the count of instances where both are incorrect, and N the total number of instances in the dataset. Since we compared three ML algorithms, we established a contingency table for every pair of algorithms.
(b) Hypothesis Formulation:
We formulated the null hypothesis (H0) and alternative hypothesis (H1) such that
H0: (b + c)/N = 0.5 (no significant difference in performance);
H1: (b + c)/N ≠ 0.5 (significant difference in performance).
(c) Calculation of Test Statistic:
We used the following formula to calculate the test statistic (χ²):
χ² = (|b − c| − 1)² / (b + c)
(d) p-Value Calculation:
We compared the calculated test statistic to a χ² distribution with one degree of freedom to obtain the p value.
(e) Decision Rule:
We compared the p value to a chosen significance level (e.g., 0.05) as follows:
  • If the p value < the significance level, reject the null hypothesis H0;
  • If the p value ≥ the significance level, fail to reject the null hypothesis H0.
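The steps above can be sketched as a small helper implementing the continuity-corrected statistic from step (c); the function name and toy counts are illustrative, not from the article.

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_test(correct_a, correct_b):
    """McNemar's test with continuity correction, following the formula in
    the text: chi2 = (|b - c| - 1)^2 / (b + c), where b (resp. c) counts
    instances classified correctly only by model A (resp. model B)."""
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(correct_a & ~correct_b))
    c = int(np.sum(~correct_a & correct_b))
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = float(chi2.sf(stat, df=1))  # upper tail, 1 degree of freedom
    return stat, p_value

# Toy comparison: b = 20, c = 5, so chi2 = (15 - 1)^2 / 25 = 7.84.
hits_a = [True] * 20 + [False] * 5 + [True] * 10
hits_b = [False] * 20 + [True] * 5 + [True] * 10
stat, p = mcnemar_test(hits_a, hits_b)
```

Only the discordant counts b and c enter the statistic; instances where both models agree (a and d) do not affect the test.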
We used the training and testing datasets for model selection and the validation dataset for evaluation of the chosen models. This approach ensured a comprehensive assessment of the models’ performance for the classification of shrub cover categories.

3. Results

This section presents the evaluation results of the three ML algorithms (RF, SVM, and ANN) used for shrub coverage estimation. The evaluation focuses on the performance of these algorithms during the training, testing, and validation stages.

3.1. Training/Testing

Table 5, Table 6 and Table 7 present the algorithm evaluation results of the training and testing stages using RF, SVM, and ANN to estimate the three categories of shrub coverage.
RF consistently outperformed SVM and ANN in terms of precision, recall, and F1 score across all shrub cover categories (Table 5, Table 6 and Table 7). In terms of precision, RF achieved the highest scores in all fractional shrub coverage categories, suggesting that when RF predicts a positive outcome in any category, it tends to be accurate, with a low rate of FPs; SVM and ANN both displayed competitive precision scores, with ANN performing better than SVM in all categories. In recall, RF again stood out as the top-performing model, capturing a substantial portion of the actual positive instances and thus effectively identifying TPs; SVM and ANN showed competitive recall scores, with ANN performing better than SVM. For the F1 score, which balances precision and recall, RF consistently achieved the highest scores in all categories, indicating that it maintains a strong balance between precision and recall and offers an effective compromise in terms of model performance. SVM and ANN exhibited competitive F1 scores, with ANN standing as the second-best performer after RF.
The analysis of band importance revealed that the blue, red edge 1, and SWIR bands were particularly influential for the classification of shrub cover. This finding supports the rationale for including all 13 spectral bands in the analysis, as described in Section 2. The inclusion of a wide range of spectral information, especially in bands that are sensitive to vegetation structure, ensures a more comprehensive representation of the spectral characteristics of the study area, leading to improved model accuracy and a more robust understanding of the factors driving shrub cover classification.
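Band importance of this kind is typically read off the fitted RF's impurity-based importances; the sketch below uses synthetic reflectances and a label deliberately constructed to depend on three bands, so the band names and data are illustrative assumptions rather than the study's inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 13 Sentinel-2 band reflectances per pixel
bands = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
         "B08", "B8A", "B09", "B10", "B11", "B12"]
X = rng.random((500, len(bands)))
# Synthetic label driven by B02 (blue), B05 (red edge 1) and B11 (SWIR),
# mimicking the influential bands reported in the text
y = (X[:, 1] + X[:, 4] + X[:, 11] > 1.5).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in impurity: one score per band, summing to 1
ranking = sorted(zip(bands, rf.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for band, score in ranking[:3]:
    print(f"{band}: {score:.3f}")
```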
In summary, the RF algorithm is highly accurate in predicting shrub cover, effectively capturing both TP and TN instances while maintaining a good balance between precision and recall. ANN generally performed better than SVM in terms of precision, recall, and F1 score.
After completing the classification process, we generated two sets of maps to visually represent the assignment of pixels within the study area to the three pre-defined categories. The first map delineated the actual pixel classes, while the second displayed the predicted classifications, both based on the RF algorithm, which yielded the best performance among the tested algorithms.
Figure 4 presents the actual (Figure 4a) and predicted (Figure 4b) shrub cover maps for the testing dataset. The two maps show very similar spatial patterns: category 1 is concentrated in the eastern part of QF, with smaller segments dispersed throughout the mapping area; category 2 is prevalent across the region; and category 3 extends from the northwest to the northeast of the training area, with sporadic instances in the central and southwestern zones.

3.2. Validation

Table 8, Table 9 and Table 10 present the validation results for the three ML models (RF, SVM, and ANN).
RF achieved strong performance in precision, recall, and F1 score for all three shrub cover categories on the validation set (Table 8): 82% precision, 85% recall, and 83% F1 score for category 1; 98% precision, 77% recall, and 86% F1 score for category 2; and 78% precision, 92% recall, and 84% F1 score for category 3. Figure 5 shows the actual (Figure 5a) and predicted (Figure 5b) shrub cover maps for the validation dataset, the latter produced with our RF algorithm, which demonstrated superior performance among the tested algorithms.
Both the actual and predicted maps show strong similarities across all shrub cover categories. The RF model consistently outperformed SVM and ANN, confirming its selection as the primary model. However, there is potential for further refinement by reducing FNs in category 1 and category 3 and FPs in category 2.
Figure 6 shows confusion matrices for shrub cover classification on the test set and validation set.
On the test set:
- When comparing RF and SVM, we observe that RF generally has higher TP values across all categories. SVM tends to have slightly higher FP values, especially for category 1 and category 3. Moreover, RF has lower FN values than SVM for all categories. In terms of TNs, both algorithms show similar results for category 1 and category 3, with RF performing better in category 2 (Figure 6a).
- RF and ANN have comparable TP values across all categories. Nevertheless, ANN tends to have slightly higher FP values, especially for category 1 and category 2. In terms of FNs, RF generally has lower values than ANN for all categories (Figure 6a,c).
- ANN generally has higher TP values than SVM, especially for category 1 and category 3, and lower FN values for all categories. SVM tends to have slightly higher FP values, particularly for category 1 and category 3 (Figure 6c,e).
Overall, RF consistently achieved higher TP values and lower FN values than SVM and ANN for all shrub cover categories, indicating its ability to accurately identify both positive and negative instances. SVM and ANN each have strengths and weaknesses in differentiating certain categories, but their overall performance is comparable.
On the validation set:
- RF has high TP values for all categories, indicating good classification of positive instances. However, it has relatively higher FP values, suggesting a tendency to misclassify negative instances as positive (Figure 6b).
- ANN shows a balanced performance, with moderate TP and FP values across all categories. It demonstrates higher TN values than RF, indicating better classification of negative instances (Figure 6d).
- SVM also has competitive TP and TN values, but overall, RF and ANN perform well, with SVM slightly trailing behind in terms of overall accuracy (Figure 6f).
In summary, on the validation set, RF shows high TP values but relatively higher FP values, indicating a tendency to misclassify negative instances as positive; ANN demonstrates balanced performance, with moderate TP and FP values; and SVM has competitive TP and TN values but slightly lower overall accuracy. Overall, RF and ANN performed better than SVM on the validation set.
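The one-vs-rest TP, FP, FN, and TN counts discussed for Figure 6 can be derived mechanically from a K × K confusion matrix; the matrix below is an illustrative example, not the study's actual counts:

```python
import numpy as np

def one_vs_rest_counts(cm: np.ndarray) -> dict:
    """Derive TP, FP, FN and TN per class from a K x K confusion matrix
    (rows = actual class, columns = predicted class)."""
    total = cm.sum()
    counts = {}
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp   # predicted k but actually another class
        fn = cm[k, :].sum() - tp   # actually k but predicted another class
        tn = total - tp - fp - fn  # everything not involving class k
        counts[k] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts

# Hypothetical 3 x 3 confusion matrix for the three shrub cover categories
cm = np.array([[50,  8,  2],
               [ 5, 70, 10],
               [ 3,  6, 46]])
for k, c in one_vs_rest_counts(cm).items():
    print(f"category {k + 1}: {c}")
```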
Table 11 and Table 12 present McNemar’s test results.
The results in Table 11 and Table 12 show a statistically significant difference in performance between each pair of algorithms evaluated on both the test and validation sets. This is supported by the chi-square (χ²) statistics and the corresponding p values: all p values are effectively zero, leading to rejection of the null hypothesis of no difference in performance at a significance level of α = 0.05.
For each pair of algorithms, the χ² value is higher for the test set than for the validation set (Table 11 vs. Table 12), suggesting that differences in model performance are more pronounced on the unseen test data. The largest difference is observed between the RF and ANN algorithms (χ² = 13,101 on the test set and 7448 on the validation set).

4. Discussion and Conclusions

This study investigated the application of machine learning models for the prediction of fractional shrub cover in fine-grained, heterogeneous landscapes, comparing the performance of RF, ANN, and SVM algorithms. Our findings demonstrate that RF is the most effective model for this task, consistently outperforming both ANN and SVM across all shrub cover categories. Detailed analysis of each model revealed insights into their strengths and weaknesses.
RF exhibited higher FP rates for category 2, suggesting potential limitations in differentiating areas without shrub cover from those with shrub cover. This highlights the importance of feature selection and hyperparameter tuning tailored to specific shrub cover categories.
The model may misclassify areas with young tree saplings as shrub-dominated, particularly in areas with sparse shrub cover, because young trees can have a spectral signature similar to that of shrubs. Incorporating high-resolution satellite imagery or LiDAR data, which provide detailed information on vegetation structure, could potentially improve model accuracy in these challenging areas [79]. This approach would be particularly helpful for distinguishing mature trees from shrubs but less useful for differentiating between young saplings and shrubs. Since the same confusion can arise in other study areas with deciduous trees, a seasonal approach incorporating data from the winter, when deciduous trees are leafless, could help differentiate between trees and shrubs. This approach leverages the distinct spectral characteristics of deciduous trees in the winter, allowing for a more accurate classification of shrub cover.
ANN showed more balanced performance, with moderate TP and FP rates across categories, although its accuracy was slightly lower than that of RF. This suggests that while ANN may be suitable for applications where minimizing FPs is important, RF offers a better balance for predicting shrub cover. SVM, although competitive in terms of TP and TN values, lagged behind RF and ANN in accuracy. This emphasizes the importance of model selection based on the specific application requirements [20].
Our results align with previous research highlighting the effectiveness of RF for land cover classification [80]. Similar to Mellor et al. (2015) [81], our RF model facilitated the creation of detailed and accurate shrub cover maps, reinforcing the value of machine learning approaches for vegetation mapping [82].
While RF has emerged as a leading algorithm for land cover classification in various ecological settings [80,81], our study provides further evidence of its effectiveness in complex Mediterranean landscapes, particularly in fire-prone regions. RF consistently outperformed SVM and ANN in terms of precision, recall, and F1 score across all shrub cover categories, suggesting that it offers a robust and accurate approach for fractional shrub cover mapping in these challenging landscapes.
The analysis of misclassified pixels, while revealing the model’s overall strength, also highlighted some recurring patterns that merit further investigation. For example, pixels with very low shrub cover or areas containing a mix of grass and rocks were sometimes confused with category 2. Similarly, in category 2, pixels with lower shrub cover could be misclassified as category 1, particularly in areas with bare soil. This suggests that the model’s ability to differentiate between low and no shrub cover requires further refinement. In category 3, dense shrub cover with a mix of trees could sometimes be misclassified as category 1, and moderate shrub cover in category 3 pixels could be confused with category 2 due to spectral similarity. This highlights the inherent challenges in distinguishing between different vegetation categories, particularly in areas with overlapping spectral characteristics. Further research could explore strategies to improve the model’s sensitivity to subtle variations in shrub cover, potentially through feature engineering, incorporating additional data sources (e.g., LiDAR) or refining classification algorithms.
The analysis of misclassifications emphasizes the difficulty of distinguishing between different vegetation categories, particularly in areas with overlapping spectral characteristics. This “spectral confusion” is a common challenge in remote sensing, potentially leading to significant errors in land cover mapping [83]. Our findings highlight the need to incorporate additional features (e.g., texture or elevation) or refine classification algorithms to improve accuracy. Studies have shown that feature selection and algorithm choice can greatly influence classification results [84]. Ultimately, accurate land cover mapping depends on a thorough understanding of both the data and the methods used, as well as a robust accuracy assessment of the resulting map [85].
McNemar’s test confirmed statistically significant differences (p < 0.05) between all pairs of models (Table 11 and Table 12), further emphasizing the importance of model selection based on the specific application requirements. These statistically significant differences highlight the strengths and weaknesses of each model, reinforcing the need to choose the model that best balances accuracy and precision for the desired application. This finding aligns with previous studies that have employed McNemar’s test to compare machine learning algorithms for land cover classification tasks; for instance, Abdi et al. (2020) [86] also found statistically significant differences between various classification algorithms using McNemar’s test. This reinforces the notion that evaluating not just accuracy but also the nuanced differences in performance across algorithms is key, especially when specific application requirements exist [87].
Future research could focus on improving feature engineering techniques to enhance the ability of models to differentiate shrub cover from other features within high-resolution data. Exploring advanced software programs alongside remote sensing techniques for fractional cover mapping, as highlighted in previous studies, is another promising research direction [61,88].
Within the scope of vegetation management for fire prevention, the integration of shrub cover mapping with other relevant data, such as weather patterns, topography, and historical fire occurrences, can facilitate the development of comprehensive fire risk assessment models. This knowledge is essential for the design and implementation of proactive fire prevention strategies. Recognizing the spatial distribution of fire-prone vegetation, as predicted by our shrub cover models, is key for informed decision making and strategic planning in fire risk management.

Author Contributions

Conceptualization, V.P. and E.K.C.; methodology, E.K.C., V.P. and R.L.; software, E.K.C., R.L. and T.A.T.; validation, E.K.C. and T.A.T.; formal analysis, E.K.C. and T.A.T.; investigation, E.K.C.; resources, V.P. and I.R.; data curation, V.P. and I.G.; writing—original draft preparation, E.K.C.; writing—review and editing, E.K.C., V.P., R.L., I.R. and T.D.; visualization, E.K.C. and T.A.T.; supervision, V.P.; project administration, V.P. and T.D.; funding acquisition, V.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the SILVANUS project (European Union’s Horizon 2020 Research and Innovation Program Grant Agreement no. 101037247) and by FCT/MCTES (PIDDAC) through project LARSyS—FCT Pluriannual funding 2020–2023 (UIDP/50009/2020), 2020.06277.BD (I. Ribeiro) and CEECIND/04469/2017 (V. Proença). The study was also supported by FCT/MCTES (PIDDAC) through projects UIDB/50009/2020 and LA/P/0083/2020.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors express their gratitude and acknowledgment to Chaimaa Oulad Dahman for her diligent efforts in advancing the knowledge and enhancing the outcomes of this paper.

Conflicts of Interest

Authors Ivo Gama and Tiago Domingos were employed by the company Terraprima—Serviços Ambientais. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Sentinel-2 Image IDs.
Year | Image ID
2020 | S2A_MSIL2A_20200618T112121_N0214_R037_T29TPE_20200618T141236
2020 | S2A_MSIL2A_20200628T112121_N0214_R037_T29TPE_20200628T121758
2020 | S2A_MSIL2A_20200708T112121_N0214_R037_T29TPE_20200708T122745
2020 | S2A_MSIL2A_20200718T112121_N0214_R037_T29TPE_20200718T123331
2020 | S2A_MSIL2A_20200728T112121_N0214_R037_T29TPE_20200728T121745
2020 | S2A_MSIL2A_20200807T112121_N0214_R037_T29TPE_20200807T121816
2020 | S2A_MSIL2A_20200827T112121_N0214_R037_T29TPE_20200827T141333
2020 | S2A_MSIL2A_20200906T112121_N0214_R037_T29TPE_20200906T141119
2020 | S2B_MSIL2A_20200623T112119_N0214_R037_T29TPE_20200623T143258
2020 | S2B_MSIL2A_20200703T112119_N0214_R037_T29TPE_20200703T140926
2020 | S2B_MSIL2A_20200713T112119_N0214_R037_T29TPE_20200713T133039
2020 | S2B_MSIL2A_20200802T112119_N0214_R037_T29TPE_20200802T131456
2020 | S2B_MSIL2A_20200822T112119_N0214_R037_T29TPE_20200823T151025
Table A2. The interpretation of an example of Image ID.
S2A_MSIL2A_20200618T112121_N0214_R037_T29TPE_20200618T141236
1. Mission Identifier: “S2A” indicates that the image is from Sentinel-2A, one of the two satellites in the Sentinel-2 series.
2. Processing Level Identifier: “MSIL2A” indicates the processing level of the image. “MSI” stands for Multi-Spectral Instrument, the sensor on board the Sentinel-2 satellites, and “L2A” denotes that the data have been processed to Level-2A, i.e., corrected for atmospheric effects and expressed as surface reflectance values.
3. Date and Time: “20200618T112121” represents 18 June 2020, at 11:21:21 UTC. This is the date and time when the image was acquired.
4. Processing Baseline: “N0214” identifies the processing baseline number, i.e., the version of the processing software configuration used to generate the product.
5. Relative Orbit Number: “R037” represents the relative orbit number. Relative orbits are specific paths that the satellite follows as it orbits the Earth.
6. Tile Identifier: “T29TPE” indicates the specific tile or granule of the image. Each Sentinel-2 image covers a specific area divided into tiles, and this identifier specifies which tile the image corresponds to.
7. Processing Date and Time: “20200618T141236” indicates the date and time when the image was processed. This is the time when the Level-2A processing was completed.
Putting it all together, this Sentinel-2 image ID represents an image captured by the Sentinel-2A satellite on 18 June 2020, processed to Level-2A, and covering a specific area identified by the tile “T29TPE”.
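The decomposition above can be sketched as a small parser; the regular expression, field names, and helper function are illustrative assumptions for Level-2A product IDs of this shape, not code from the study:

```python
import re
from datetime import datetime

# Field layout follows the interpretation given in Table A2
ID_PATTERN = re.compile(
    r"(?P<mission>S2[AB])_"       # Sentinel-2A or -2B
    r"(?P<level>MSIL2A)_"         # MSI sensor, Level-2A processing
    r"(?P<sensing>\d{8}T\d{6})_"  # acquisition date and time (UTC)
    r"(?P<baseline>N\d{4})_"      # processing baseline number
    r"(?P<rel_orbit>R\d{3})_"     # relative orbit
    r"(?P<tile>T\w{5})_"          # tile identifier
    r"(?P<processing>\d{8}T\d{6})"  # processing date and time
)

def parse_s2_id(image_id: str) -> dict:
    m = ID_PATTERN.fullmatch(image_id)
    if m is None:
        raise ValueError(f"not a Sentinel-2 L2A product ID: {image_id}")
    fields = m.groupdict()
    fields["sensing"] = datetime.strptime(fields["sensing"], "%Y%m%dT%H%M%S")
    fields["processing"] = datetime.strptime(fields["processing"],
                                            "%Y%m%dT%H%M%S")
    return fields

info = parse_s2_id(
    "S2A_MSIL2A_20200618T112121_N0214_R037_T29TPE_20200618T141236")
print(info["mission"], info["tile"], info["sensing"].date())
```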

References

  1. Zhang, C.; Li, X. Land Use and Land Cover Mapping in the Era of Big Data. Land 2022, 11, 1692. [Google Scholar] [CrossRef]
  2. Toth, C.; Jóźków, G. Remote sensing platforms and sensors: A survey. ISPRS J. Photogramm. Remote Sens. 2016, 115, 22–36. [Google Scholar] [CrossRef]
  3. Tesha, D.L.; Madundo, S.D.; Mauya, E.W. Post-fire assessment of recovery of montane forest composition and stand parameters using in situ measurements and remote sensing data. Trees For. People 2024, 15, 100464. [Google Scholar] [CrossRef]
  4. Keeley, J.E.; Fotheringham, C. Impact of past, present, and future fire regimes on North American Mediterranean shrublands. In Fire and Climatic Change in Temperate Ecosystems of the Western Americas; Springer: Berlin/Heidelberg, Germany, 2003; pp. 218–262. [Google Scholar]
  5. Keeley, J.E.; Bond, W.J.; Bradstock, R.A.; Pausas, J.G.; Rundel, P.W. Fire in Mediterranean Ecosystems: Ecology, Evolution and Management; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  6. Keeley, J.E.; Zedler, P.H. Large, high-intensity fire events in southern California shrublands: Debunking the fine-grain age patch model. Ecol. Appl. 2009, 19, 69–94. [Google Scholar] [CrossRef] [PubMed]
  7. Enes, T.; Lousada, J.; Fonseca, T.; Viana, H.; Calvão, A.; Aranha, J. Large scale shrub biomass estimates for multiple purposes. Life 2020, 10, 33. [Google Scholar] [CrossRef]
  8. Chi, M.; Plaza, A.; Benediktsson, J.A.; Sun, Z.; Shen, J.; Zhu, Y. Big Data for Remote Sensing: Challenges and Opportunities. Proc. IEEE 2016, 104, 2207–2219. [Google Scholar] [CrossRef]
  9. Giri, C.P. Remote Sensing of Land Use and Land Cover: Principles and Applications; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
  10. Riihimäki, H.; Luoto, M.; Heiskanen, J. Estimating fractional cover of tundra vegetation at multiple scales using unmanned aerial systems and optical satellite data. Remote Sens. Environ. 2019, 224, 119–132. [Google Scholar] [CrossRef]
  11. Macintyre, P.; Van Niekerk, A.; Mucina, L. Efficacy of multi-season Sentinel-2 imagery for compositional vegetation classification. Int. J. Appl. Earth Obs. Geoinf. 2020, 85, 101980. [Google Scholar] [CrossRef]
  12. Forstmaier, A.; Shekhar, A.; Chen, J. Mapping of Eucalyptus in Natura 2000 areas using Sentinel 2 imagery and artificial neural networks. Remote Sens. 2020, 12, 2176. [Google Scholar] [CrossRef]
  13. Hernandez, I.; Benevides, P.; Costa, H.; Caetano, M. Exploring Sentinel-2 for Land Cover and Crop Mapping in Portugal. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 83–89. [Google Scholar] [CrossRef]
  14. Gamon, J.A.; Qiu, H.L.; Sanchez-Azofeifa, A. Ecological applications of remote sensing at multiple scales. In Functional Plant Ecology; CRC Press: Boca Raton, FL, USA, 2007; pp. 655–684. [Google Scholar]
  15. Suess, S.; van der Linden, S.; Okujeni, A.; Griffiths, P.; Leitão, P.J.; Schwieder, M.; Hostert, P. Characterizing 32 years of shrub cover dynamics in southern Portugal using annual Landsat composites and machine learning regression modeling. Remote Sens. Environ. 2018, 219, 353–364. [Google Scholar] [CrossRef]
  16. Xu, Z.y.; Sun, B.; Zhang, W.f.; Li, Y.f.; Yan, Z.y.; Yue, W.; Teng, S.h. An evaluation of a remote sensing method based on optimized triangular vegetation index (TVI) for aboveground shrub biomass estimation in shrub-encroached grassland. Acta Prataculturae Sin. 2023, 32, 1. [Google Scholar]
  17. Turner, W. Sensing biodiversity. Science 2014, 346, 301–302. [Google Scholar] [CrossRef] [PubMed]
  18. ESA. Sentinel-2. Available online: https://documentation.dataspace.copernicus.eu/Data/SentinelMissions/Sentinel2.html (accessed on 24 June 2024).
  19. MODIS. Available online: https://modis.gsfc.nasa.gov/about/ (accessed on 29 March 2023).
  20. Taha, A.T.; Cherif, E.K. Land Cover Classification for Fires Using Sentinel-2 Satellite RGB Images and Deep Transfer Learning. In Proceedings of the Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges, Montreal, QC, Canada, 21–25 August 2022; Rousseau, J.J., Kapralos, B., Eds.; Springer: Cham, Switzerland, 2023; pp. 142–150. [Google Scholar]
  21. Gruen, A.; Zhang, Z.; Eisenbeiss, H. UAV Photogrammetry in Remote Areas–3d Modeling of Drapham Dzong Bhutan. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 2012, XXXIX-B1, 375–379. [Google Scholar] [CrossRef]
  22. Lin, A.Y.M.; Novo, A.; Har-Noy, S.; Ricklin, N.D.; Stamatiou, K. Combining GeoEye-1 satellite remote sensing, UAV aerial imaging, and geophysical surveys in anomaly detection applied to archaeology. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 870–876. [Google Scholar] [CrossRef]
  23. Jensen, J. Remote Sensing of the Environment: An Earth Resource Perspective 2/e; Pearson Education: London, UK, 2009. [Google Scholar]
  24. Prudente, V.H.R.; Martins, V.S.; Vieira, D.C.; de França e Silva, N.R.; Adami, M.; Sanches, I.D. Limitations of cloud cover for optical remote sensing of agricultural areas across South America. Remote Sens. Appl. Soc. Environ. 2020, 20, 100414. [Google Scholar] [CrossRef]
  25. Al-Wassai, F.A.; Kalyankar, N. Major limitations of satellite images. arXiv 2013, arXiv:1307.2434. [Google Scholar]
  26. Du, Y.; Teillet, P.M.; Cihlar, J. Radiometric normalization of multitemporal high-resolution satellite images with quality control for land cover change detection. Remote Sens. Environ. 2002, 82, 123–134. [Google Scholar] [CrossRef]
  27. Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Joseph Hughes, M.; Laue, B. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 2017, 194, 379–390. [Google Scholar] [CrossRef]
  28. Zhang, L.; Dong, R.; Yuan, S.; Li, W.; Zheng, J.; Fu, H. Making Low-Resolution Satellite Images Reborn: A Deep Learning Approach for Super-Resolution Building Extraction. Remote Sens. 2021, 13, 2872. [Google Scholar] [CrossRef]
  29. Vaghela Himali, P.; Raja, R.A.A. Automatic Identification of Tree Species From Sentinel-2A Images Using Band Combinations and Deep Learning. IEEE Geosci. Remote Sens. Lett. 2024, 21, 2501405. [Google Scholar] [CrossRef]
  30. Devi, N.B.; Kavida, A.C.; Murugan, R. Feature extraction and object detection using fast-convolutional neural network for remote sensing satellite image. J. Indian Soc. Remote Sens. 2022, 50, 961–973. [Google Scholar] [CrossRef]
  31. Basu, S.; Ganguly, S.; Mukhopadhyay, S.; DiBiano, R.; Karki, M.; Nemani, R. DeepSat: A learning framework for satellite imagery. In Proceedings of the SIGSPATIAL ’15: 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2015; Association for Computing Machinery: New York, NY, USA, 2015. [Google Scholar] [CrossRef]
  32. Rau, J.Y.; Jhan, J.P.; Rau, R.J. Semiautomatic object-oriented landslide recognition scheme from multisensor optical imagery and DEM. IEEE Trans. Geosci. Remote Sens. 2013, 52, 1336–1349. [Google Scholar] [CrossRef]
  33. Du, Z.; Yu, L.; Chen, X.; Gao, B.; Yang, J.; Fu, H.; Gong, P. Land use/cover and land degradation across the Eurasian steppe: Dynamics, patterns and driving factors. Sci. Total Environ. 2024, 909, 168593. [Google Scholar] [CrossRef] [PubMed]
  34. Seo, B.; Bogner, C.; Koellner, T.; Reineking, B. Mapping Fractional Land Use and Land Cover in a Monsoon Region: The Effects of Data Processing Options. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3941–3956. [Google Scholar] [CrossRef]
  35. Zhang, C.; Chen, Y.; Lu, D. Detecting fractional land-cover change in arid and semiarid urban landscapes with multitemporal Landsat Thematic mapper imagery. GISci. Remote Sens. 2015, 52, 700–722. [Google Scholar] [CrossRef]
  36. Mohammadpour, P.; Viegas, D.X.; Viegas, C. Vegetation Mapping with Random Forest Using Sentinel 2 and GLCM Texture Feature—A Case Study for Lousã Region, Portugal. Remote Sens. 2022, 14, 4585. [Google Scholar] [CrossRef]
  37. Aragoneses, E.; Chuvieco, E. Generation and mapping of fuel types for fire risk assessment. Fire 2021, 4, 59. [Google Scholar] [CrossRef]
  38. Odebiri, O.; Mutanga, O.; Odindi, J. Deep learning-based national scale soil organic carbon mapping with Sentinel-3 data. Geoderma 2022, 411, 115695. [Google Scholar] [CrossRef]
  39. Deardorff, J.W. Efficient prediction of ground surface temperature and moisture, with inclusion of a layer of vegetation. J. Geophys. Res. Ocean. 1978, 83, 1889–1903. [Google Scholar] [CrossRef]
  40. Kapitza, S.; Golding, N.; Wintle, B.A. A fractional land use change model for ecological applications. Environ. Model. Softw. 2022, 147, 105258. [Google Scholar] [CrossRef]
  41. Khare, S.; Latifi, H.; Rossi, S.; Ghosh, S.K. Fractional cover mapping of invasive plant species by combining very high-resolution stereo and multi-sensor multispectral imageries. Forests 2019, 10, 540. [Google Scholar] [CrossRef]
  42. Wessels, K.; Mathieu, R.; Knox, N.; Main, R.; Naidoo, L.; Steenkamp, K. Mapping and monitoring fractional woody vegetation cover in the arid savannahs of Namibia using LiDAR training data, machine learning, and ALOS PALSAR data. Remote Sens. 2019, 11, 2633. [Google Scholar] [CrossRef]
  43. Brinkhoff, J.; Hornbuckle, J.; Barton, J.L. Assessment of aquatic weed in irrigation channels using UAV and satellite imagery. Water 2018, 10, 1497. [Google Scholar] [CrossRef]
  44. Guan, K.; Wood, E.F.; Caylor, K. Multi-sensor derivation of regional vegetation fractional cover in Africa. Remote Sens. Environ. 2012, 124, 653–665. [Google Scholar] [CrossRef]
  45. Jia, K.; Yao, Y.; Wei, X.; Gao, S.; Jiang, B.; Zhao, X. A review on fractional vegetation cover estimation using remote sensing. Adv. Earth Sci. 2013, 28, 774. [Google Scholar]
  46. Gutman, G.; Ignatov, A. Satellite-derived green vegetation fraction for the use in numerical weather prediction models. Adv. Space Res. 1997, 19, 477–480. [Google Scholar] [CrossRef]
  47. Wittich, K.; Hansing, O. Area-averaged vegetative cover fraction estimated from satellite data. Int. J. Biometeorol. 1995, 38, 209–215. [Google Scholar] [CrossRef]
  48. Roberts, D.A.; Gardner, M.; Church, R.; Ustin, S.; Scheer, G.; Green, R. Mapping chaparral in the Santa Monica Mountains using multiple endmember spectral mixture models. Remote Sens. Environ. 1998, 65, 267–279. [Google Scholar] [CrossRef]
  49. Settle, J.; Drake, N. Linear mixing and the estimation of ground cover proportions. Int. J. Remote Sens. 1993, 14, 1159–1177. [Google Scholar] [CrossRef]
  50. Friedl, M.A.; McIver, D.K.; Hodges, J.C.; Zhang, X.Y.; Muchoney, D.; Strahler, A.H.; Woodcock, C.E.; Gopal, S.; Schneider, A.; Cooper, A.; et al. Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ. 2002, 83, 287–302. [Google Scholar] [CrossRef]
  51. Okin, G.S.; Clarke, K.D.; Lewis, M.M. Comparison of methods for estimation of absolute vegetation and soil fractional cover using MODIS normalized BRDF-adjusted reflectance data. Remote Sens. Environ. 2013, 130, 266–279. [Google Scholar] [CrossRef]
  52. Chopping, M.; Su, L.; Rango, A.; Martonchik, J.V.; Peters, D.P.; Laliberte, A. Remote sensing of woody shrub cover in desert grasslands using MISR with a geometric-optical canopy reflectance model. Remote Sens. Environ. 2008, 112, 19–34. [Google Scholar] [CrossRef]
  53. Stojanova, D.; Panov, P.; Gjorgjioski, V.; Kobler, A.; Džeroski, S. Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecol. Inform. 2010, 5, 256–266. [Google Scholar] [CrossRef]
  54. Verrelst, J.; Muñoz, J.; Alonso, L.; Delegido, J.; Rivera, J.P.; Camps-Valls, G.; Moreno, J. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and-3. Remote Sens. Environ. 2012, 118, 127–139. [Google Scholar] [CrossRef]
Figure 1. The study area within Quinta da França farm (white dashed line), Portugal (top left). The solid blue line outlines the study area in detail. A zoomed-in view in the bottom-right corner highlights the land cover within the study area, as captured by drone imagery.
Figure 2. Methodology employed by this study for shrub cover assessment using Sentinel-2 imagery and a high-resolution land cover map. The process encompasses the following chain of steps: (1) Sentinel-2 images from August 2020 are acquired and aligned with a high-resolution land cover map; (2) Sentinel-2 pixels are labeled according to their shrub cover percentage category using a high-resolution land cover map; (3) labeled Sentinel-2 pixels and their spectral information are used to train three machine learning models (RF, SVM, and ANN); (4) models are validated using an independent dataset. Solid lines depict training and testing steps, and dashed lines represent the validation step.
Figure 3. Fractional shrub coverage in Quinta da França. Full dataset obtained from the intersection of the high-resolution land cover map and the Sentinel-2 pixel grid. This figure shows the spatial distribution of shrub cover across the study area, not the number of samples for each category.
Figure 4. (a) Actual shrub cover in the testing subset of pixels retrieved from the full dataset (Figure 3) to test the machine learning algorithms. (b) Shrub cover predicted by the random forest algorithm for the subset of pixels in the testing dataset.
Figure 5. (a) Validation dataset showing the subset of pixels retrieved from the full dataset (Figure 3) to validate the machine learning algorithms. (b) Shrub coverage predicted by the random forest algorithm for the subset of pixels in the validation dataset.
Figure 6. Confusion Matrices for shrub cover classification on the test set and validation set.
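Confusion matrices such as those in Figure 6 are built by cross-tabulating reference and predicted categories. A minimal sketch with scikit-learn; the labels below are illustrative stand-ins (categories 0: 0%, 1: >0%–50%, 2: >50% shrub cover), not the study's pixels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative reference and predicted labels for the three
# shrub-cover categories; not the study's actual data.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])

# Rows are reference categories, columns are predicted categories
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
```

True positives sit on the diagonal; off-diagonal cells are the false positives and false negatives discussed for the RF, SVM, and ANN comparisons.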
Table 2. Number of samples for training, testing, and validation across main shrub coverage categories (category 1, category 2, and category 3) based on a stratified sampling approach.
              Category 1    Category 2    Category 3    Total
Training      15,031        14,998        14,997        45,026
Testing       3730          3763          3764          11,257
Validation    2000          2000          2000          6000
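The stratified sampling behind Table 2 can be sketched with scikit-learn's `train_test_split`; the synthetic arrays below (10 assumed spectral features, three balanced categories) are stand-ins for the labeled Sentinel-2 pixels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled pixels: 1000 samples per category
rng = np.random.default_rng(42)
y = np.repeat([0, 1, 2], 1000)
X = rng.random((y.size, 10))

# A stratified split preserves the per-category proportions,
# as in the training/testing/validation counts of Table 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(np.bincount(y_train))  # 800 samples per category
print(np.bincount(y_test))   # 200 samples per category
```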
Table 3. Hyperparameter configuration for the used ML models.
ML Model    Hyperparameter Configuration
ANN         - Number of hidden layers: 3
            - Number of units in each layer: [100, 100, 3]
            - Activation in each layer: [relu, relu, softmax]
            - Optimizer: Adam
            - Epochs: 150
            - Batch size: 32
SVM         - Kernel: radial basis function (RBF)
            - C (regularization parameter): 1
            - γ (kernel coefficient for RBF): 1/n
RF          - Number of estimators: 250
            - Criterion: entropy
            - min_samples_split (minimum samples required to split an internal node): 2
            - min_samples_leaf (minimum samples required at a leaf node): 1
            - max_depth: None (nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples)
            - max_features: √n (number of features considered when looking for the best split; here, √n_features)
            - bootstrap (whether bootstrap samples are used when building trees): True
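The configurations in Table 3 map onto standard estimator constructors. The sketch below uses scikit-learn throughout; note that the paper's ANN settings (Adam, 150 epochs, batch size 32, explicit softmax output) suggest a deep learning framework, so the `MLPClassifier` here is only an approximate stand-in with the same hidden-layer widths:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# RF as in Table 3: 250 trees, entropy criterion,
# sqrt(n_features) candidate features per split, bootstrap sampling
rf = RandomForestClassifier(
    n_estimators=250, criterion="entropy", max_depth=None,
    min_samples_split=2, min_samples_leaf=1,
    max_features="sqrt", bootstrap=True, random_state=0)

# SVM as in Table 3: RBF kernel, C = 1; gamma="auto" gives 1/n_features
svm = SVC(kernel="rbf", C=1.0, gamma="auto")

# ANN stand-in: two ReLU hidden layers of 100 units; the size-3
# softmax output layer is implicit for a 3-class target
ann = MLPClassifier(hidden_layer_sizes=(100, 100), activation="relu",
                    solver="adam", batch_size=32, max_iter=150,
                    random_state=0)
```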
Table 4. Contingency table for McNemar’s test (RF vs. SVM example).
Simulation Results
               SVM Correct    SVM Incorrect    Total
RF Correct     a              b                a + b
RF Incorrect   c              d                c + d
Total          a + c          b + d            N
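McNemar's statistic depends only on the discordant cells b and c of Table 4 (samples on which exactly one of the two classifiers is correct). A minimal sketch; the counts used below are illustrative, and whether the paper applied Edwards' continuity correction is not stated, so both variants are supported:

```python
from scipy.stats import chi2


def mcnemar(b: int, c: int, correction: bool = True) -> tuple[float, float]:
    """McNemar's chi-squared test from the discordant cells b and c."""
    num = (abs(b - c) - 1) ** 2 if correction else (b - c) ** 2
    stat = num / (b + c)
    p_value = chi2.sf(stat, df=1)  # chi-squared with one degree of freedom
    return stat, p_value


# Illustrative counts, not the study's values
stat, p = mcnemar(b=310, c=150, correction=False)
print(round(stat, 2), p < 0.05)  # large statistic -> reject equal error rates
```

With counts of this magnitude the p value underflows toward zero, which is consistent with the "p value = 0" entries reported in Tables 11 and 12.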
Table 5. Performance evaluation on the test set using the RF algorithm. Definitions of precision, recall, and F score are provided in the main text (Section 2.5.1), and support represents the number of pixels in each category.
             Precision    Recall    F Score    Support
Category 1   88%          82%       85%        3730
Category 2   79%          87%       83%        3763
Category 3   86%          82%       83%        3764
Table 6. Performance evaluation on the test set using the SVM algorithm. Definitions of precision, recall, and F score are provided in the main text (Section 2.5.1), and support represents the number of pixels in each category.
             Precision    Recall    F Score    Support
Category 1   81%          74%       77%        3730
Category 2   71%          73%       72%        3763
Category 3   73%          77%       75%        3764
Table 7. Performance evaluation on the test set using the ANN algorithm. Definitions of precision, recall, and F score are provided in the main text (Section 2.5.1), and support represents the number of pixels in each category.
             Precision    Recall    F Score    Support
Category 1   86%          79%       82%        3730
Category 2   78%          81%       80%        3763
Category 3   78%          81%       80%        3764
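The per-category precision, recall, and F score values in Tables 5–7 can be computed directly from reference and predicted labels. A sketch with scikit-learn; the labels below are illustrative, not the study's test set:

```python
from sklearn.metrics import precision_recall_fscore_support

# Illustrative reference and predicted category labels (not the study's)
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

# One precision/recall/F score triple per shrub-cover category;
# support is the number of reference samples in each category
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0)
for k in range(3):
    print(f"Category {k + 1}: precision={prec[k]:.0%} "
          f"recall={rec[k]:.0%} F={f1[k]:.0%} support={support[k]}")
```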
Table 8. Performance evaluation on the validation set using the RF algorithm. Definitions of precision, recall, and F score are provided in the main text (Section 2.5.1), and support represents the number of samples in each category.
             Precision    Recall    F Score    Support
Category 1   82%          85%       83%        2000
Category 2   98%          77%       86%        2000
Category 3   78%          92%       84%        2000
Table 9. Performance evaluation on the validation set using the SVM algorithm. Definitions of precision, recall, and F score are provided in the main text (Section 2.5.1), and support represents the number of samples in each category.
             Precision    Recall    F Score    Support
Category 1   79%          81%       80%        2000
Category 2   94%          66%       77%        2000
Category 3   72%          92%       81%        2000
Table 10. Performance evaluation on the validation set using the ANN algorithm. Definitions of precision, recall, and F score are provided in the main text (Section 2.5.1), and support represents the number of samples in each category.
             Precision    Recall    F Score    Support
Category 1   77%          87%       82%        2000
Category 2   99%          70%       81%        2000
Category 3   73%          86%       79%        2000
Table 11. χ² evaluated on each pair of algorithms (test set).
Pair of Algorithms    χ²        p Value
(RF, SVM)             12,502    0
(RF, ANN)             13,101    0
(SVM, ANN)            10,404    0
Table 12. χ² evaluated on each pair of algorithms (validation set).
Pair of Algorithms    χ²      p Value
(RF, SVM)             8195    0
(RF, ANN)             7448    0
(SVM, ANN)            6338    0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Cherif, E.K.; Lucas, R.; Ait Tchakoucht, T.; Gama, I.; Ribeiro, I.; Domingos, T.; Proença, V. Predicting Fractional Shrub Cover in Heterogeneous Mediterranean Landscapes Using Machine Learning and Sentinel-2 Imagery. Forests 2024, 15, 1739. https://doi.org/10.3390/f15101739