1. Introduction
The argan tree (
Argania spinosa (L.) Skeels) is a rare species endemic to southwestern Morocco [
1]. Argan trees play a critical role in the environmental, social, and economic landscape. Known for their resilience in arid conditions, the argan trees contribute to biodiversity by supporting a rich network of plant and animal life while also enhancing soil stability [
2]. Beyond their ecological significance, they also serve as a source of livelihood and income for local communities through valuable products like argan oil and timber [
3]. The argan oil industry significantly contributes to the Moroccan economy, as highlighted in
Table 1, which presents the market size of Moroccan and global argan oil in 2019, along with forecasts for 2030 [
4,
5]. The growing global demand for argan oil across the cosmetics, skincare, and culinary sectors further underscores its increasing economic importance.
The argan forest considered in this study is located in the Souss region, centered around the city of Agadir (Morocco), an area whose economy is led by agriculture alongside tourism. This argan forest is one of the largest in the country [
6] and includes a section that was declared a Biosphere Reserve by UNESCO in 1998, dedicated to protecting this unique habitat [
7]. However, accurately assessing the distribution and abundance of argan trees in the region remains challenging [
8].
At present, the argan forest is threatened by deforestation driven by several factors [
9]. These include agricultural expansion, urban encroachment [
6], massive logging for firewood and charcoal production, and climate change and drought [
8,
10]. As a result, the argan forests have lost approximately 50% of their total area over the past century (
Table 2), with an estimated annual loss of 600 hectares [
11]. This has raised alarm bells regarding the possible extinction of this magnificent tree in the future [
12]. Accurate maps are therefore crucial for resource conservation: they are essential for monitoring and properly managing land use and land cover (LULC) in this strategic zone, given its role in food security and in Morocco’s environmental and economic development [
13].
In recent years, remote sensing technologies and satellite imagery have provided the capacity to collect accurate, comprehensive, and wide-ranging data, facilitating studies related to forest mapping and the accurate monitoring of forest habitat change and deforestation [
14,
15,
16,
17]. In particular, multispectral and multitemporal data from the European Space Agency’s Sentinel-2 satellites have contributed to the success of this technology in these applications [
18,
19,
20].
Tree detection studies using Sentinel-2 data fall into two main categories: those relying solely on single-date multispectral information and those incorporating both multispectral and multi-temporal data [
21]. Two primary methods are used to assess spectral band significance. The first calculates feature importance scores from classifiers, offering a quantitative approach to select optimal bands, though interpretability and classification performance can sometimes pose challenges [
22]. The second method involves individual classification experiments with each spectral feature, providing a clearer understanding of their impact [
23,
24]. Incorporating multi-temporal data is crucial due to the “same spectra but different objects, same objects but different spectra” phenomenon, caused by tree phenological cycles, which can limit detection accuracy when using single-date data [
21,
25]. The multi-temporal analysis allows for better tracking of these phenological stages, enhancing detection accuracy [
26]. However, cloud contamination can interfere with this process [
27]. Integrating multi-source data has been suggested, but this approach requires extensive preprocessing and sensor calibration [
28,
29]. In summary, using multispectral and multi-temporal Sentinel-2 data significantly improves tree detection accuracy, with results strongly dependent on information selection strategies [
21,
26].
In studies on tree detection and forest mapping using Sentinel-2 data, various classification algorithms have been employed, encompassing both machine learning and deep learning techniques [
18], with supervised classification generally preferred over unsupervised approaches [
30]. Before undertaking any remote sensing task, it is essential to identify the most appropriate approach for data analysis, especially with Sentinel-2. There are three primary approaches: pixel-based [
31], patch-based [
32], and object-based [
31], which differ primarily in the basic unit of analysis. Pixel-based and patch-based approaches often employ machine learning algorithms for LULC classification, with Random Forest (RF) and Support Vector Machine (SVM) being the most frequently used. These algorithms have shown promising results, often achieving an Overall Accuracy (OA) exceeding 80% [
18]. Other studies with similar objectives have used other machine learning algorithms, such as Maximum Likelihood Classification (MLC), Artificial Neural Networks (ANN), Decision Trees, k-Nearest Neighbors (KNN), and Bayes models [
18]. It is worth noting that pixel-based methods, while commonly used, can be prone to noise. Patch-based or object-based approaches are generally preferable, especially for change detection tasks [
33,
34]. Object-based methods, in particular, often take advantage of deep learning algorithms, which were first introduced to remote sensing in 2014 [
20]. Since then, deep learning has gained significant attention due to its success in this domain [
35]. Among the most commonly used deep learning models are Convolutional Neural Networks (CNN), which have demonstrated OA exceeding 90% in LULC classification and mapping tasks using Sentinel-2 data [
17,
36]. Overall, deep learning algorithms outperform traditional machine learning algorithms in terms of accuracy due to their superior feature extraction capabilities, non-linear modeling, and high-level semantic segmentation [
37,
38]. Nevertheless, the efficiency of machine learning models remains relevant, as they typically require less training data compared to deep learning models, making them well-suited for smaller datasets [
39]. Regardless of the type of algorithm chosen for a remote sensing task, careful data preprocessing and algorithm parameterization can significantly enhance accuracy [
18].
Finally, it is important to briefly review the role of remote sensing in detecting changes in the same geographical area over time, a key focus of this study [
40]. Change detection methods using multispectral satellite data can be categorized into four main types [
41]: Algebra-Based Methods [
42], Statistics-Based Methods [
43,
44], Transformation-Based Methods [
42], and Deep-Learning-Based Methods [
17,
45]. For deforestation mapping, two primary approaches are commonly used. The first involves classifying images for each time period into multiple land-cover labels (e.g., trees, bare soil, urban areas, water) using machine learning classifiers [
46] or deep learning algorithms [
47,
48]. By comparing these classification maps, deforestation maps are generated. However, this approach is prone to error propagation from the classification maps, underscoring the need to improve classification accuracy. The second approach directly compares two images from different dates. Deep neural network models, such as Improved UNet++ [
49], are specifically designed for this task. These models use multi-temporal data in an end-to-end manner, analyzing spectral, spatial, and structural features to produce a final change map. This method can also be accomplished using machine learning algorithms [
50].
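As a minimal sketch of the first (post-classification comparison) approach, the snippet below compares two binary argan maps pixel by pixel. It assumes that the deforestation rate is defined as the share of pixels labeled argan at the first date that are no longer argan at the second date; the function name `deforestation_rate` and the toy maps are hypothetical and serve only as an illustration.

```python
def deforestation_rate(map_t1, map_t2):
    """Compare two binary maps (1 = argan, 0 = other) pixel by pixel.

    Returns (change_map, rate): change_map marks pixels that were argan
    at t1 and non-argan at t2; rate is the percentage of the t1 argan
    pixels that were lost (an assumed definition, see the lead-in).
    """
    same_shape = len(map_t1) == len(map_t2) and all(
        len(r1) == len(r2) for r1, r2 in zip(map_t1, map_t2)
    )
    if not same_shape:
        raise ValueError("both maps must have identical shapes")

    change_map = [
        [1 if (a == 1 and b == 0) else 0 for a, b in zip(r1, r2)]
        for r1, r2 in zip(map_t1, map_t2)
    ]
    argan_t1 = sum(sum(row) for row in map_t1)
    lost = sum(sum(row) for row in change_map)
    rate = 100.0 * lost / argan_t1 if argan_t1 else 0.0
    return change_map, rate


# Toy 3 x 4 classification maps, invented for illustration only.
map_2017 = [[1, 1, 0, 0],
            [1, 1, 1, 0],
            [0, 1, 1, 0]]
map_2022 = [[1, 0, 0, 0],
            [1, 1, 1, 0],
            [0, 1, 0, 1]]

change, rate = deforestation_rate(map_2017, map_2022)
print(change)
print(f"deforestation rate: {rate:.2f}%")
```

Because the change map is derived entirely from the two classification maps, any misclassified pixel in either map appears directly as a false change or a missed change, which is the error propagation noted above.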
Although numerous studies have explored remote sensing data and developed algorithms for forest mapping, few have specifically focused on argan forests, either for general mapping or deforestation detection, largely due to the limited availability of argan tree-related data [
51]. Two studies focused on mapping argan trees near Essaouira, close to Agadir, using the NDVI index derived from Sentinel-2 time series data and the Support Vector Machine (SVM) algorithm, achieving overall accuracies of 89.78% [
52] and 92.60% [
53], respectively. Another study mapped argan using Sentinel-1 time series, Sentinel-2 data, and three machine learning algorithms, in addition to a Shuttle Radar Topography Mission Digital Elevation Model (DEM) layer, achieving a highest accuracy of 93.25% [
54]. Building on this foundation, our team researched mapping argan deforestation using deep learning algorithms and a patch-based approach [
20,
32,
55]. This study aims to further enhance these results.
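For reference, the NDVI used in the studies cited above is the normalized difference between near-infrared and red reflectance, which for Sentinel-2 corresponds to bands B8 and B4. The sketch below computes an NDVI time series for a single pixel; the reflectance values and the helper names `ndvi` and `ndvi_series` are hypothetical and do not reproduce the cited studies' processing chains.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 if both inputs are zero."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


def ndvi_series(observations):
    """Turn a list of (date, nir, red) tuples into a list of (date, ndvi)."""
    return [(date, ndvi(nir, red)) for date, nir, red in observations]


# Toy surface reflectances for one pixel (B8 = NIR, B4 = red), illustrative only.
pixel_observations = [
    ("2022-10-01", 0.30, 0.10),
    ("2023-03-01", 0.40, 0.08),
]

series = ndvi_series(pixel_observations)
for date, value in series:
    print(date, round(value, 3))
```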
Recognizing the ecological and economic importance of argan trees and the threats they face, this study aims to map argan forests and calculate the deforestation rate using remote sensing and machine learning tools. Specifically, we create argan tree maps for 2017/2018 and 2022/2023 and compare them to generate a deforestation map and calculate the deforestation rate over this period. Given the limited availability of argan-specific data, machine learning algorithms were selected for their efficiency, faster data analysis capabilities, and minimal computational requirements. This work seeks to provide specialists with valuable insights and statistics to inform strategies for mitigating argan forest deforestation.
4. Discussion
This study aimed to map the Admine argan forest located in the Souss region of Morocco for the periods 2017/2018 and 2022/2023, calculate the deforestation rate over this six-year period, and provide specialists with accurate insights to inform mitigation strategies for this UNESCO World Biosphere Reserve habitat. Using remote sensing and machine learning tools, we addressed the challenge of limited argan-specific data through efficient algorithms capable of analyzing large datasets with minimal computational requirements. Additionally, we sought to enhance the precision of these tools by conducting extensive experiments to identify the optimal scenarios for mapping and deforestation analysis.
The experiments provided valuable insights into the performance of different machine-learning algorithms, data levels, and spectral information. The Decision Tree, Random Forest, and XGBoost machine learning algorithms consistently performed well, in line with prior studies [
18], but LightGBM was selected for its outstanding results, with OAs greater than 98.0% in all cases. Additionally, the comparison between Sentinel-2 Levels 1C and 2A demonstrated minimal differences, primarily attributed to the negligible contribution of Band 10, the key differentiator between these data levels. These results suggest that both data levels are suitable for this task, offering flexibility in data selection for future studies.
The superior classification results for 2022/2023 compared to 2017/2018 can be attributed to the lack of available ground truth data for 2017/2018. Consequently, the 2022/2023 dataset is inherently more reliable than the 2017/2018 dataset.
Experiments also highlighted the importance of selecting optimal temporal windows for argan tree detection (
Figure 7). Variations in classification results were observed depending on the observation date, underscoring the need for a precise temporal analysis when mapping vegetation in dynamic ecosystems [
126,
127]. An analysis of classification performance throughout the year, as illustrated in
Figure 7, reveals some periods with higher accuracy; however, the difference between the average OA for each season is minimal (
Table 14), suggesting that seasonal variations have a limited impact on classification performance and argan tree detection accuracy. This can be attributed to the severe seven-year drought that Morocco is facing, which has affected the study area [
128], together with the decreasing rainfall and its seasonal irregularity (
Figure 12) and the relatively stable temperature throughout the year, which result in limited seasonal fluctuations in the region [
129]. On the other hand, the experiments demonstrated that combining spectral and temporal data significantly improves OA (
Table 12). Specifically, models trained on time-series data consistently outperformed those trained on single-date data, indicating that temporal information helps the models better learn the characteristics of argan trees, ultimately leading to improved classification accuracy.
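One simple way to combine spectral and temporal information for a tabular classifier is to concatenate the band values of every acquisition date into a single feature vector per pixel. The sketch below illustrates this idea under that assumption; the function name `stack_time_series`, the feature naming scheme, and the reflectance values are hypothetical and are not taken from the study's pipeline.

```python
def stack_time_series(per_date_bands):
    """Flatten {date: {band: value}} for one pixel into a feature vector.

    Features are ordered by date, then by band name, so that every pixel
    yields columns in the same order. Returns (feature_names, values).
    """
    names, values = [], []
    for date in sorted(per_date_bands):
        for band in sorted(per_date_bands[date]):
            names.append(f"{band}_{date}")
            values.append(per_date_bands[date][band])
    return names, values


# Toy reflectances for one pixel on two dates, illustrative only.
pixel = {
    "2023-03": {"B08": 0.33, "B04": 0.10},
    "2022-10": {"B04": 0.12, "B08": 0.28},
}

feature_names, feature_vector = stack_time_series(pixel)
print(feature_names)
print(feature_vector)
```

A single-date model sees only the columns of one date, whereas a time-series model sees all of them, which is how phenological differences between dates become available to the classifier.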
The argan tree is distinguished by its medium-density distribution, adaptation to semi-arid regions, and typical growth in areas surrounded by bare soil. Given these conditions, it was initially expected that incorporating additional spectral information, such as spectral indices, could enhance accuracy, provided an appropriate resampling method was applied. In this study, all Sentinel-2 bands were resampled to a 10-m resolution. While resampling coarser bands (20 m and 60 m) to a finer 10-m grid does not add new information, the interpolation method used minimized data distortion, preserved spatial patterns, and enabled the effective use of certain indices (e.g., ARVI and SCI), which combine bands of different native resolutions (10 m, 20 m, and 60 m). Standardizing spatial resolution across all input features benefits machine learning models, which tend to perform better when all variables share the same resolution, reducing potential biases during training and classification. Although resampling has inherent limitations, in this study it proved advantageous by enhancing model consistency, improving classification performance, and enabling a more effective integration of spectral information.
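To make the resampling step concrete, the sketch below upsamples a 20-m band onto a 10-m grid with nearest-neighbor replication. Nearest-neighbor is chosen here only as an assumption for illustration, since the interpolation method is not named in this section; the function name `upsample_nearest` and the pixel values are hypothetical.

```python
def upsample_nearest(band, factor):
    """Replicate every pixel `factor` times along both axes.

    For a 20-m Sentinel-2 band resampled to 10 m, factor = 2;
    for a 60-m band, factor = 6.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in band:
        widened = [value for value in row for _ in range(factor)]
        out.extend(list(widened) for _ in range(factor))
    return out


# Toy 2 x 2 band at 20 m, illustrative values only.
band_20m = [[1, 2],
            [3, 4]]

band_10m = upsample_nearest(band_20m, 2)
for row in band_10m:
    print(row)

# The set of distinct values is unchanged: the grid is finer,
# but no new information has been created.
distinct_before = {v for row in band_20m for v in row}
distinct_after = {v for row in band_10m for v in row}
```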
In this study, OA was assessed using both tabular test data and image-based test data, the latter providing a more realistic representation of real-world complexities. This resulted in a clear gap between the OA achieved with tabular data (100%) and that obtained with image-based data (85%). This discrepancy is primarily due to key classification challenges inherent in image data, including tree edge overlap, which leads to multiple land cover types within a single pixel, the loss of spatial contextual information when data are represented in tabular form, and the impact of mixed pixels, which complicates classification. Nevertheless, achieving perfect accuracy with tabular data does not imply that the task was easy or that the result is unreliable; rather, it reflects the quality of data preprocessing and the selection of relevant features. Since the models were trained on tabular data, it is natural for them to perform better on this format than on image-based data. Furthermore, applying the model to an entire scene, characterized by its vastness, class diversity, and factors such as terrain and spatial context, naturally leads to a decrease in accuracy compared to the training phase. This is expected in binary classification, where argan tree samples were carefully selected to represent their spectral characteristics.
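For clarity, OA is the percentage of test samples whose predicted label matches the reference label. The sketch below computes OA and the underlying confusion counts for a binary argan/non-argan case; the labels are toy values and the function names are hypothetical, so the printed figure does not correspond to any result of this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def overall_accuracy(y_true, y_pred):
    """OA (%) = correctly classified samples / all samples * 100."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


# Toy reference and predicted labels (1 = argan, 0 = other), illustrative only.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(reference, predicted)
oa = overall_accuracy(reference, predicted)
print(counts, oa)
```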
Pixel-based classification has significant limitations in argan tree detection, leading to errors in deforestation mapping. Factors already discussed, such as the 10-m resolution of Sentinel-2, the presence of mixed pixels, and daily spectral variations, contributed to the classification uncertainties. In an attempt to resolve these problems, a patch-based approach was applied.
Since the study area is located in the plain of Souss argan orchards, which has an average density of 10 trees/ha [
6], and since the argan tree may be shrubby or reach up to 10 m, occasionally 20 m [
130], we conclude that an argan pixel can be located in a window larger than 3 × 3 pixels (such as a 5 × 5, 7 × 7, or 9 × 9 pixel square). In this case, the most appropriate patch size would be 7 × 7 or even 9 × 9 (close to one hectare) to ensure a high probability of argan presence. However, determining the optimal patch size for deforestation detection was particularly challenging due to the lack of reference data for 2017/2018, which prevented direct numerical validation. Therefore, the two criteria described in
Section 2.8 were adopted to evaluate patch performance.
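The arithmetic behind this reasoning can be sketched as follows. At a 10-m pixel size, an n × n patch covers (10n)² m², and with an average density of 10 trees/ha the expected number of trees per patch follows directly. The calculation assumes, for illustration only, that trees are spread uniformly over the area; the function name `patch_stats` is hypothetical.

```python
PIXEL_SIZE_M = 10       # Sentinel-2 pixel size after resampling
TREES_PER_HA = 10       # average density reported for the Souss plain
M2_PER_HA = 10_000      # square meters in one hectare


def patch_stats(n):
    """Return (side in m, area in ha, expected tree count) for an n x n patch."""
    side_m = n * PIXEL_SIZE_M
    area_ha = side_m ** 2 / M2_PER_HA
    expected_trees = area_ha * TREES_PER_HA  # uniform-density assumption
    return side_m, area_ha, expected_trees


for n in (1, 3, 5, 7, 9):
    side_m, area_ha, expected_trees = patch_stats(n)
    print(f"{n} x {n}: {side_m} m side, {area_ha:.2f} ha, "
          f"about {expected_trees:.1f} trees expected")
```

Under this assumption, a 3 × 3 patch (0.09 ha) is expected to contain fewer than one tree, whereas a 9 × 9 patch (0.81 ha) is expected to contain about eight, which is consistent with the preference for the larger patch sizes.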
The visual assessment (first criterion) revealed that 1 × 1 and 3 × 3 were the least effective patch sizes: they produced excessive noise and classification errors, making them unreliable. In
Figure 13 (Study area—Region 1), which shows a newly constructed road in 2022/2023 that did not exist in 2017/2018, the 1 × 1 and 3 × 3 patches captured this change most clearly. However, in
Figure 14 (Study area—Region 2), a zone with no significant changes between 2017/2018 and 2022/2023, the 7 × 7 and 9 × 9 patches best preserved this stability, with 9 × 9 maintaining more relevant spatial details.
The second criterion involved comparing the deforestation rates obtained for each patch size with official reports and previous studies conducted in the same area using Sentinel-2 data and deep learning methodologies, which estimated deforestation rates between 2% and 5% [
20,
32,
131,
132]. For instance, a study employing Convolutional Neural Networks (CNN) with 32 × 32 patches estimated a deforestation rate of 2.56% between 2015 and 2020 [
20], while another study using U-Net with 16 × 16 and 32 × 32 patches reported a 2.59% deforestation rate between 2015 and 2022 [
32]. The 9 × 9 patch aligned most closely with this range, producing results that balanced noise reduction and fine-scale deforestation detection. The 1 × 1 and 3 × 3 patches significantly overestimated deforestation, making them unreliable. Therefore, the 9 × 9 patch size provided the best overall performance, preserving detail while maintaining classification stability (
Figure 15). This study highlights the importance of optimizing patch selection strategies to enhance deforestation detection, particularly in regions where ground truth data are limited.
Despite the progress made, several challenges remain. The first challenge arises from the model’s occasional difficulty in distinguishing between argan trees and soil, leading to errors in argan detection. Argan trees, native to arid and semi-arid regions of southwestern Morocco, have small, oval, leathery leaves spaced to conserve water in harsh conditions. The result is less dense foliage than trees from temperate or tropical climates, which tend to have larger and denser leaves. In addition, the gnarled, twisted, and widely spreading branches of the argan contribute to this effect. The low canopy density allows sunlight to reach the ground (
Figure 2), so the spectral signatures of the argan tree and the ground are similar, making it challenging for machine learning models to differentiate between the two. The model’s ability to accurately detect argan trees could be improved by using images with higher spatial resolution than those provided by Sentinel-2, ideally with a resolution of less than one meter, such as drone or airborne data. Alternatively, experimenting with other models, especially deep learning models, may yield better results.
The second challenge lies in the phenomenon of “same spectra but different objects, same objects but different spectra”, which limits classification accuracy when relying on single-date data [
21,
25]. This study mitigated this issue by employing multi-temporal data, demonstrating the value of temporal information in improving classification results. While challenges remain, our findings provide a solid foundation for further advancements in the detection and management of argan forests.