1. Introduction
Land cover mapping is a fundamental tool for land management, providing essential information for monitoring ecological, hydrological, and climatic processes at multiple scales. It allows the characterization of physical elements on the Earth’s surface, such as vegetation, water, soil, and urban areas, and is crucial for biodiversity conservation, urban planning, agriculture, and resource management [1,2,3,4,5,6,7]. Accurate land cover data support modeling of ecosystem processes, assessment of environmental changes, and evaluation of human impacts, making them a cornerstone for sustainable development and climate change mitigation. Land cover is also a key variable for monitoring several Sustainable Development Goals (SDGs), and its importance has increased due to the growing availability of high-resolution satellite data, which enables more detailed and frequent observations of land transformations [8].
In France, land cover mapping has traditionally relied on national and regional products such as Corine Land Cover (CLC), which provides categorical maps at 100 m resolution [9,10]. While useful for broad-scale applications, the relatively coarse spatial resolution of these maps often results in pixels containing mixtures of land cover types, and assigning a single dominant class oversimplifies heterogeneous landscapes. This limitation is especially pronounced in areas with mixed agriculture, forest–pasture mosaics, or urban–rural interfaces.
Very high-resolution imagery (<1 m), such as the IGN FLAIR datasets, allows detailed local studies by combining aerial imagery, Sentinel-1/2, and topographic layers [11,12]. However, its high cost and demanding processing requirements limit its applicability for large-scale mapping, which is why intermediate-resolution products such as Sentinel-2 are valuable for regional and national analyses [12,13,14].
Sentinel-2 imagery offers an intermediate solution, providing 10–20 m resolution data suitable for regional and national studies, with frequent temporal revisits [15,16,17,18]. Building on this resource, several complementary datasets have been developed. The CNES OSO dataset provides 10 m resolution maps with up to 23 classes for metropolitan France, derived from Sentinel-2 and integrated into the Theia Land Cover CES platform [19]. At the European scale, the S2GLC project delivers 10 m resolution maps with a standardized 13-class legend, enabling consistent temporal monitoring and landscape characterization across countries, including France [20]. At the global scale, the Copernicus Global Land Cover (CGLS-LC100) dataset provides 100 m resolution maps covering all major land cover classes, combining Sentinel-2 and other satellite data to ensure a consistent global product [21,22].
Although these advances have improved coverage and resolution, traditional categorical products still face important limitations in heterogeneous landscapes. Assigning a single dominant class to each pixel oversimplifies complex land patterns, making it difficult to capture sub-pixel mixtures and reducing accuracy in applications such as carbon flux estimation, hydrological modeling, or biodiversity assessment. Fractional land cover mapping provides a solution to this problem by describing the relative proportion of each class within a pixel, rather than forcing exclusive membership to a single category. This “soft” or continuous approach—also called fuzzy classification, subpixel mapping, or linear mixture modeling [23]—enables models to work on land cover characteristics, such as tree or herbaceous cover, instead of discrete pixel labels, providing a more realistic representation of the Earth’s surface and supporting more robust environmental analyses.
Efforts to map land cover fractions have been conducted previously, mostly focusing on 3–6 classes at local scales [24,25,26,27,28,29,30] and less frequently at regional scales with more detailed class sets [31]. Methods for evaluating accuracy vary considerably across studies. At the global level, several products have been developed targeting individual classes, such as tree cover [27,32,33], water bodies [34], or urban areas [35,36]. To date, only the Copernicus Global Land Cover (CGLS-LC100) dataset provides global fractional maps encompassing all major land cover classes [21,22]. More recently, Masiliūnas et al. (2021) [37] demonstrated the feasibility of estimating true fractional cover for multiple land cover classes using machine learning techniques, bridging the gap between local studies and global products. Together, these approaches illustrate the progression from local, limited-class studies to probabilistic and global fractional products, highlighting both the potential and the current limitations in representing landscape complexity across multiple classes. However, these methods still have weaknesses, and more advanced machine and deep learning architectures promise to surpass previous results, motivating the present study.
The main objective of this study is to develop and evaluate a methodology for estimating fractional land cover across a large and diverse set of classes using Sentinel-2 imagery and advanced machine and deep learning models. Using the high-resolution FLAIR dataset as reference (ground truth), the methodology is tested for 13 land cover classes in France, assessing its potential to better represent heterogeneous landscapes. The study evaluates multiple modeling architectures—including XGBoost, deep neural networks (DNNs), and convolutional neural networks (CNNs)—and examines the contribution of spectral and auxiliary variables, providing insights for potential operational applications. By addressing these limitations at the national scale, this study also contributes to the global effort to improve land cover monitoring, offering a methodology that can complement continental- and global-scale products and support applications in carbon flux estimation, hydrological modeling, and biodiversity assessment.
2. Materials and Methods
2.1. Workflow
The workflow presented in Figure 1 describes the process for obtaining a model that estimates the proportion of each land cover class within a Sentinel-2 pixel. The FLAIR (French Land cover from Aerospace ImageRy) dataset is used as the basis for training and validating this model. FLAIR is a very high-resolution (VHR) image dataset that provides detailed and labeled land cover information for French territory, derived from aerial imagery.
In the preprocessing stage, very high-resolution (VHR) images (0.2 m per pixel) are cropped and co-registered to align with Sentinel-2 images. Based on the location of the FLAIR tiles, four different datasets are generated, each containing increasing levels of information. The first dataset includes Sentinel-2 pixel reflectance values together with the Scene Classification Layer (SCL, a per-pixel mask distinguishing surface types such as vegetation, water, clouds, and shadows) at 20 m resolution. The second dataset extends this by incorporating spectral indices derived from the Sentinel-2 bands, such as NDVI and other vegetation indices, alongside the reflectance and SCL data. The third dataset combines reflectance and SCL information with auxiliary variables, including climatic data and the acquisition date of the imagery, to provide additional contextual information. Finally, the fourth dataset consists solely of Sentinel-2 reflectance values. Over these areas, the Sentinel-2 Scene Classification Layer (SCL) and spectral bands are clipped, spectral indices are calculated, and auxiliary information, such as predominant climate and image acquisition month, is also incorporated.
In the modeling phase, three approaches are employed, ranging from simplest to most complex: XGBoost, deep neural networks (DNN), and convolutional neural networks (CNN). The models are trained and optimized, and their performance is evaluated using metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R2). A two-level validation scheme is applied, considering both the Sentinel-2 pixel level and the VHR scene level (aggregates of 5 × 5 Sentinel-2 pixels). Model performance is assessed using both the validation datasets (which are also used during neural network training) and independent testing datasets.
2.2. Study Area
The study was conducted in France, a country located in Western Europe, approximately between 42° and 51° N latitude and 5° W and 8° E longitude, characterized by diverse geography including plains, plateaus, and mountain ranges such as the Alps and the Pyrenees. With a total area of about 643,801 km2, France has a population of approximately 67 million inhabitants, mainly concentrated in urban areas such as Paris, Lyon, and Marseille, while rural regions exhibit significantly lower population densities. The climate varies across regions, with oceanic conditions in the northwest, Mediterranean in the southeast, continental in the northeast, and mountainous in the Alps and Pyrenees, resulting in notable differences in temperature and precipitation throughout the country. This geographic and climatic diversity directly influences land use, which includes forests, agricultural areas, and urban settlements.
Dataset Employed
The FLAIR-one dataset, developed by the French National Institute of Geographic and Forest Information (IGN), is designed to address semantic segmentation of aerial imagery and domain adaptation challenges in land cover mapping [11]. It covers approximately 810 km2 of metropolitan French territory, divided into 54 spatial domains that represent a wide variety of landscapes and climatic conditions, including urban, rural, agricultural, forest, and coastal areas. Each domain corresponds to a French administrative département.
The dataset contains 951 areas with a total of 77,412 aerial images acquired at a 0.2 m spatial resolution. Each image has 512 × 512 pixels, equivalent to approximately 1.05 ha. This approach provides a wide variety of representative examples for training and evaluating segmentation models. The data were collected in several acquisition campaigns between 2018 and 2021, specifically on 17 and 19 April and 16 September 2018; 1 and 29 June, 4, 5, 10, 16, and 25 July, 3, 14, 16, 23, and 31 August, and 2 September and 11 October 2019; 25 June 2020; and 9 July 2021. The dataset is divided into three parts: 70% of the domains (38 domains) are allocated for training, 15% (8 domains) for validation, and the remaining 15% (8 domains) for testing. This partition ensures independent assessment during model validation and testing analyses, as shown in Figure 2.
The labeling of FLAIR was carried out through manual photointerpretation by experts. This resulted in the creation of semantic masks (MSK) that assign a class to each pixel in the image. The original dataset contains 19 land cover classes, ranging from urban areas to natural and agricultural zones, among others.
Figure 3 illustrates the percentage distribution of classes across the dataset’s training, validation, and test sets. As can be seen, the last seven classes are far less frequent than the rest; therefore, in this study, they are merged into a combined class called “Other”. This reduced scheme, with 13 classes, aims to balance precision and simplicity, facilitating the implementation of classification models.
2.3. Data Preprocessing
Sentinel-2 Imagery
The Sentinel-2 mission of the Copernicus Programme provides multispectral optical imagery with spatial resolutions of 10–60 m, depending on the band, and a revisit frequency of approximately 5 days, making it a key source for studies such as this one. The Sentinel-2 Level-2A images used in this study are atmospherically corrected products, derived from Level-1C data through surface reflectance processing. These data are distributed free of charge by the European Space Agency (ESA) via platforms such as the Copernicus Data Space Ecosystem, facilitating their use in scientific research and environmental management applications.
To enhance model training and ensure temporal consistency between data sources, the selected Sentinel-2 images cover a ±15-day window around the acquisition dates of the FLAIR very high-resolution (VHR) images. This approach minimizes discrepancies between data sources caused by the different acquisition times and eventual phenological or atmospheric variations. Moreover, this criterion allows for the inclusion of multiple Sentinel-2 images for a single reference date, thereby increasing the chances of cloud-free observations.
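As a minimal illustration of this selection criterion, the sketch below (with hypothetical dates) keeps only Sentinel-2 acquisitions falling within ±15 days of a FLAIR reference date:

```python
from datetime import date, timedelta

def select_s2_dates(flair_date: date, s2_dates: list, window_days: int = 15) -> list:
    """Keep Sentinel-2 acquisitions within +/- window_days of the FLAIR acquisition date."""
    window = timedelta(days=window_days)
    return [d for d in s2_dates if abs(d - flair_date) <= window]

# Hypothetical example: FLAIR scene acquired on 16 September 2018
candidates = [date(2018, 8, 28), date(2018, 9, 5), date(2018, 9, 20), date(2018, 10, 10)]
print(select_s2_dates(date(2018, 9, 16), candidates))  # keeps 2018-09-05 and 2018-09-20
```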
The Sentinel-2 Scene Classification Layer (SCL) is used to exclude observations contaminated by atmospheric effects or affected by clouds or cloud shadows, as indicated in Table 1. For the models, the SCL was encoded as a feature using its corresponding integer class code.
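A minimal sketch of this filtering step is shown below; the exact set of excluded SCL classes is given in Table 1, so the codes used here (typical no-data, cloud, and shadow classes) are an assumption rather than the paper’s exact selection:

```python
import numpy as np

# Assumed exclusion set (not necessarily the exact Table 1 selection):
# 0 = no data, 1 = saturated/defective, 3 = cloud shadow,
# 8 = cloud medium probability, 9 = cloud high probability, 10 = thin cirrus
EXCLUDED_SCL = {0, 1, 3, 8, 9, 10}

def valid_observation_mask(scl: np.ndarray) -> np.ndarray:
    """Return a boolean mask of pixels kept for training (True = usable observation)."""
    return ~np.isin(scl, list(EXCLUDED_SCL))
```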
Sentinel-2 provides spectral information across 12 bands, ranging from the visible spectrum (RGB: red, green, and blue) to the shortwave infrared (SWIR), including the near-infrared (NIR) and several red-edge bands. In this study, all 12 Sentinel-2 bands are used, enabling a highly detailed characterization of the spectral signature of different land cover types [16].
In addition to the spectral bands, several spectral indices were used to generate the training datasets, in order to assess their contribution to land use classification [16]. Spectral indices are algebraic operations applied to the Sentinel-2 image bands, designed to enhance the discrimination between different land cover types.
The indices calculated for the database include the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), and the Normalized Difference Built-up Index (NDBI).
NDVI is used to assess active vegetation, where values greater than 0.5 are typically associated with dense vegetation, while values below 0.2 are indicative of bare soil or urban areas [38]. This index plays a significant role in crop monitoring and vegetation dynamics analysis [39].
The NDWI highlights the presence of surface water, with positive values typically associated with water bodies and negative values indicating dry soils or vegetation [40]. This index allows for the analysis of water bodies and their temporal dynamics [39].
Finally, NDBI facilitates the detection of built-up areas by exploiting the high reflectance of constructed surfaces in the SWIR1 band, distinguishing them from vegetation and natural soils [41]. This proves useful in urban expansion studies and land use classification [39].
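The three indices are standard normalized band differences (NDVI from the red and NIR bands, NDWI from the green and NIR bands, NDBI from the SWIR1 and NIR bands); a minimal sketch of their computation from Sentinel-2 surface reflectance arrays is given below.

```python
import numpy as np

def normalized_difference(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Generic normalized difference (a - b) / (a + b), guarded against division by zero."""
    return (a - b) / (a + b + eps)

def spectral_indices(red, green, nir, swir1):
    """NDVI, NDWI (McFeeters), and NDBI from Sentinel-2 surface reflectance bands."""
    return {
        "NDVI": normalized_difference(nir, red),    # vegetation vigour
        "NDWI": normalized_difference(green, nir),  # surface water
        "NDBI": normalized_difference(swir1, nir),  # built-up surfaces
    }
```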
2.4. Auxiliary Information
In addition to the spectral information provided by Sentinel-2 bands and vegetation indices, variables at a different scale, such as climate and data acquisition time, have been included. The climate, or even the seasonality within the same climate, can affect satellite image classification in several ways. Seasons change the appearance of vegetation and soil, as plant phenology varies with temperature and precipitation over the course of the year. This can lead to differences between classifications performed at different times of the year, requiring adjustments or strategies such as combining multiple dates to improve result consistency. Meanwhile, different climates influence the relative prevalence of certain classes.
Figure 4 shows the seven main climate types across the scenes, based on the FLAIR climate classification [42]. The “altered Mediterranean” climate was renamed to “Mediterranean”. The climates of France were categorized as mountainous, oceanic, altered oceanic, degraded oceanic, Mediterranean, semi-continental, and southwestern basin; for the models, these climates were encoded numerically from 0 to 7 in the order listed above.
Given the marked seasonal variations in land cover—e.g., agricultural cycles, vegetation phenology, and changes in water bodies—the month of image acquisition has been incorporated as an additional variable to improve the accuracy of satellite classification [43]. This temporal information allows the models to better capture land surface dynamics, such as crop growth [44,45], leaf senescence in deciduous forests [46,47], and seasonal fluctuations in water bodies [48], thereby reducing errors stemming from interpreting these changes as permanent land-use transformations.
2.5. Reduction of the VHR Image Size
The Sentinel-2 bands used in this study have a spatial resolution of 20 m and were kept in their original format. Each FLAIR very high-resolution (VHR) scene, measuring 512 × 512 pixels at 0.2 m per pixel (covering 102.4 × 102.4 m), is co-registered with Sentinel-2, meaning that the center of the scene aligns with the center of a Sentinel-2 pixel. To match precisely a 5 × 5 pixel block of Sentinel-2 (covering 100 × 100 m), a small margin of 1.2 m (a strip of 6 × 512 VHR pixels) was clipped from each side of the VHR images, leaving 500 × 500 pixels, as shown in Figure 5.
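The clipping and the derivation of the reference fractions can be summarized as in the sketch below, which assumes the FLAIR mask has already been remapped to the reduced 13-class legend: each 512 × 512 mask is trimmed by 6 pixels per side to 500 × 500, so every 20 m Sentinel-2 pixel corresponds to a 100 × 100 block of VHR pixels whose class counts yield the reference fractions.

```python
import numpy as np

N_CLASSES = 13          # reduced FLAIR legend used in this study
VHR_PER_S2 = 100        # 20 m / 0.2 m = 100 VHR pixels per Sentinel-2 pixel

def reference_fractions(msk: np.ndarray) -> np.ndarray:
    """Per-class cover fractions (%) for each Sentinel-2 pixel of one FLAIR scene.

    msk: (512, 512) array of class indices in [0, N_CLASSES).
    Returns an array of shape (5, 5, N_CLASSES) summing to 100 along the last axis.
    """
    msk = msk[6:-6, 6:-6]                                 # 512 -> 500 px (1.2 m per side)
    blocks = msk.reshape(5, VHR_PER_S2, 5, VHR_PER_S2)    # 5 x 5 Sentinel-2 pixels
    fractions = np.zeros((5, 5, N_CLASSES))
    for i in range(5):
        for j in range(5):
            counts = np.bincount(blocks[i, :, j, :].ravel(), minlength=N_CLASSES)
            fractions[i, j] = 100.0 * counts / counts.sum()
    return fractions
```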
2.6. Datasets Employed
The study evaluates four distinct datasets, each composed of different combinations of input variables (Table 2). This approach allows the contribution of each set of variables to model performance to be assessed.
2.7. Architectures
In this study, diverse machine learning model architectures are evaluated.
2.7.1. Extreme Gradient Boosting (XGBoost)
XGBoost is a machine learning algorithm based on decision trees that uses “boosting”, an approach to improve model accuracy [49]. Unlike traditional approaches that train trees independently, XGBoost builds trees sequentially, with each one correcting the errors made by previous trees. This iterative process efficiently optimizes the model, producing highly accurate results even with complex data. XGBoost is particularly powerful for classification and regression tasks due to its ability to handle non-linear relationships between features and its robustness against issues such as overfitting. XGBoost has been successfully used in various studies focused on satellite image classification and land cover analysis, which supports its suitability for these types of applications [50,51,52]. The model was configured with a maximum tree depth of 10, a learning rate (eta) of 0.1, a subsample ratio of 0.8, a column sample by tree of 0.8, and the number of trees (estimators) set to 500.
Figure 6 illustrates the XGBoost model used, where n is 500.
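A sketch of this configuration using the xgboost Python API is shown below; wrapping the regressor in a multi-output estimator (one output per land cover class) is an assumption about how the 13 fractions are handled, not a detail reported above.

```python
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor

# Hyperparameters as reported above; the multi-output wrapper (one regressor per
# land cover class) is an assumption about the handling of the 13 fractions.
xgb = MultiOutputRegressor(
    XGBRegressor(
        max_depth=10,
        learning_rate=0.1,      # eta
        subsample=0.8,
        colsample_bytree=0.8,
        n_estimators=500,
        objective="reg:squarederror",
    )
)
# xgb.fit(X_train, y_train)    # y_train: (n_samples, 13) fractions summing to 100
```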
2.7.2. Deep Neural Networks (DNNs)
Deep neural networks (DNNs) are machine learning models composed of multiple hidden layers between the input and output, enabling them to learn hierarchical and non-linear representations of data. Compared to traditional neural networks, DNNs are capable of handling large datasets and autonomously extracting complex patterns [53]. They are highly versatile and have been successfully applied to tasks such as classification, regression, and time series analysis [54]. In particular, DNNs are well-suited for problems involving complex, non-linear relationships between input features, such as the analysis of spectral data from satellite images [50,55]. Their effectiveness improves with larger datasets, allowing for enhanced prediction accuracy.
Figure 7 illustrates the base DNN architecture employed in this study.
To assess the impact of model complexity, three DNN architectures with varying depths were tested:
Compact Architecture (DNN1)
This model features a simple architecture with three hidden layers of 128, 64, and 32 neurons, respectively, all activated with the ReLU function. Dropout layers have been incorporated after the first two hidden layers with a rate of 30% to mitigate overfitting, which helps improve model generalization. The output layer uses linear activation for regression. It was compiled with the Adam optimizer and the Mean Squared Error (MSE) loss function, which is suitable for regression tasks.
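A minimal Keras sketch of this compact architecture is given below; the input dimensionality depends on the dataset variant and is therefore left as a parameter.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dnn1(n_features: int, n_classes: int = 13) -> keras.Model:
    """Compact DNN: 128-64-32 ReLU layers, 30% dropout after the first two."""
    model = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(32, activation="relu"),
        layers.Dense(n_classes, activation="linear"),  # one fraction per class
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```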
Highly Dense Architecture (DNN3)
This model is the most complex and dense of the three, designed to capture more sophisticated patterns in the data. It starts with 512 neurons in the first layer and maintains a high number of neurons in the subsequent layers (512, 256, 256, 128, 128, 64, 64, and 32), allowing it to model highly complex non-linear relationships. The greater depth and larger number of parameters give the model higher predictive capacity, but also make it more susceptible to overfitting and increase its computational demands. ReLU activation is maintained in the hidden layers and linear activation in the output, along with the MSE loss function and the Adam optimizer.
2.7.3. Convolutional Neural Networks (CNNs)
CNNs are a type of deep neural network widely used for image processing and visual data analysis tasks. Their architecture is designed to recognize spatial and hierarchical patterns in data through convolutional layers that apply filters to extract important features, such as edges, textures, and shapes. These networks are especially effective in classification, segmentation, and object recognition tasks, as they can learn representations at different scales without manual intervention [56].
In the context of remote sensing, CNNs are ideal for analyzing satellite images or remote sensing data, as they can identify complex patterns in multispectral data and classify different land cover classes with high accuracy [57,58]. Their ability to generalize patterns from spatial data makes them a relevant tool in land use classification with Sentinel-2 images.
The implemented architecture consists of two convolutional layers with ReLU activation functions, configured with 32 and 64 filters, respectively, and a 3 × 3 kernel with a padding of 1, which preserves the spatial resolution of the input. These layers are designed to capture hierarchical and local representations of spatial features. Subsequently, a flattening operation is applied, followed by a fully connected layer of 128 units and an output layer corresponding to the number of target classes. No activation is applied in the output layer, as the BCEWithLogitsLoss function is used, which combines sigmoid activation with binary cross-entropy, making it suitable for multi-label classification tasks.
Figure 8 illustrates the base CNN architecture used in this study.
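A sketch of this architecture in PyTorch is given below; the input patch size, the number of input channels, and the ReLU on the 128-unit dense layer are assumptions, since they are not specified above.

```python
import torch
import torch.nn as nn

class FractionCNN(nn.Module):
    """Two 3x3 conv layers (32, 64 filters, padding 1), flatten, FC-128, linear output."""
    def __init__(self, in_channels: int = 12, patch_size: int = 5, n_classes: int = 13):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * patch_size * patch_size, 128), nn.ReLU(),
            nn.Linear(128, n_classes),  # raw logits; BCEWithLogitsLoss applies the sigmoid
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

criterion = nn.BCEWithLogitsLoss()  # assumes targets are per-class fractions scaled to [0, 1]
```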
2.8. Validation and Metrics
To ensure the accuracy and reliability of the estimated land cover fractions, the validation process is carried out at two levels: pixel-level validation and scene-level validation. These complementary approaches allow for a comprehensive evaluation of model performance.
Since the model outputs are constrained to range from 0 to 100 and the predicted fractions for all land cover classes within each pixel always sum to 100, the resulting RMSE and MAE values are directly interpretable as percentage errors. This also applies to scene-level validation, where aggregated predictions retain this percentage-based interpretation.
2.8.1. Pixel-Level Validation
At the pixel level, the estimated land cover fractions for each class are compared with the reference fractions obtained from the FLAIR dataset. This approach allows evaluating the model’s ability to accurately predict the fraction of each class within individual Sentinel-2 pixels. To this end, some of the metrics proposed by [59] for validating machine learning models were selected.
Root Mean Squared Error (RMSE): a metric used to evaluate the accuracy of a regression model. It is calculated as the square root of the mean squared error (MSE), which allows interpreting the error in the same units as the target variable.

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Mean Absolute Error (MAE): the average absolute difference between the estimated and actual fractions, providing an intuitive measure of the error.

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Coefficient of Determination (R2): evaluates the proportion of variance in the actual fractions that is explained by the model’s predictions. Values close to 1 indicate a strong correlation.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the observed fractional land cover derived from the FLAIR dataset, $\hat{y}_i$ is the estimated fractional land cover in Sentinel-2 pixel $i$, $\bar{y}$ is the mean of the observed fractions, and $n$ refers to the number of Sentinel-2 pixels.
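These pixel-level metrics can be computed directly from the arrays of observed and predicted fractions, for example:

```python
import numpy as np

def pixel_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, and R^2 over all Sentinel-2 pixels and classes (fractions in %)."""
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
    }
```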
2.8.2. Scene-Level Validation
A complete scene covers a 100 × 100 m surface and contains 25 Sentinel-2 20 m pixels. The same error metrics mentioned above (MAE, RMSE, R2) are used. In this case, the definitions of $y_i$ and $\hat{y}_i$ are modified: $i$ refers to a scene, $n$ to the number of scenes, and the scene-level fractions are obtained by averaging the pixel-level fractions,

$$y_i = \frac{1}{m}\sum_{k=1}^{m} y_{ik}, \qquad \hat{y}_i = \frac{1}{m}\sum_{k=1}^{m} \hat{y}_{ik},$$

where $k$ refers in this case to a Sentinel-2 pixel and $m$ to the number of Sentinel-2 pixels in each FLAIR scene ($m = 25$), which cover an area of 10,000 m2.
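In practice, scene-level validation amounts to averaging the 25 pixel-level fractions of each scene before applying the same metrics, as in the following sketch (reusing the pixel_metrics helper from the previous example):

```python
import numpy as np

def aggregate_to_scene(fractions: np.ndarray) -> np.ndarray:
    """Average per-pixel fractions (n_scenes, 5, 5, n_classes) to scene level (n_scenes, n_classes)."""
    return fractions.mean(axis=(1, 2))

# scene_scores = pixel_metrics(aggregate_to_scene(y_true), aggregate_to_scene(y_pred))
```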
2.9. Class-Level Model Evaluation Against the Test Set
To achieve a more detailed evaluation of model performance, global errors are complemented with a class-level analysis using the test set. This approach is particularly relevant for determining the model’s individual accuracy within each land cover category, offering a better characterization of the model’s performance on imbalanced classes or those with similar spectral characteristics. The same statistical metrics from previous sections are applied but calculated specifically for each class. This allows for the identification of the best and worst performing categories.
2.10. Comparison with Copernicus Global Land Cover
To assess the performance of our model, we conducted a comparison with the Copernicus Global Land Cover (CGLS-LC100, fractional) product. For this purpose, we built a homogeneous dataset integrating three sources of information: the manually labeled ground truth (GT), the predictions generated by our model from very high-resolution aerial imagery, and the Copernicus fractional cover estimates. Since the class definitions across the two datasets did not fully align, we harmonized the categories to enable direct comparison. Specifically, tree cover classes (Coniferous, Deciduous, Ligneous) were grouped under Tree cover; Brushwood was mapped to Shrubland; Agricultural land, Plowed land, and Vineyard were aggregated into Cropland; Herbaceous vegetation was retained as an independent class; Bare soil and Pervious surface were integrated into Bare/sparse vegetation; Water was directly aligned with the Water category; and urban/artificial surfaces (Building, Impervious surface) were grouped under Built-up. After this reaggregation, all class fractions were normalized so that each image summed to 100%. This harmonization provided a consistent framework to evaluate agreement between our predictions and the Copernicus reference product, allowing us to identify systematic differences across land cover types and to better understand the strengths and limitations of our approach relative to a widely used continental-scale dataset.
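This harmonization can be expressed as a simple class mapping followed by renormalization; the sketch below uses the class names listed above, and dropping any class outside the mapping (e.g., “Other”) before renormalizing is an assumption.

```python
# FLAIR/model classes -> harmonized classes for comparison with CGLS-LC100
HARMONIZATION = {
    "Coniferous": "Tree cover", "Deciduous": "Tree cover", "Ligneous": "Tree cover",
    "Brushwood": "Shrubland",
    "Agricultural land": "Cropland", "Plowed land": "Cropland", "Vineyard": "Cropland",
    "Herbaceous vegetation": "Herbaceous vegetation",
    "Bare soil": "Bare/sparse vegetation", "Pervious surface": "Bare/sparse vegetation",
    "Water": "Water",
    "Building": "Built-up", "Impervious surface": "Built-up",
}

def harmonize(fractions: dict) -> dict:
    """Aggregate per-class fractions into the harmonized legend and renormalize to 100%."""
    out = {}
    for cls, value in fractions.items():
        target = HARMONIZATION.get(cls)
        if target is not None:  # classes outside the mapping are dropped (assumption)
            out[target] = out.get(target, 0.0) + value
    total = sum(out.values())
    return {k: 100.0 * v / total for k, v in out.items()}
```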
3. Results
3.1. Validation
Table 3 and Table 4 show the model performance results evaluated against the validation dataset. When jointly analyzing the pixel-level and scene-level metrics for DNN1, DNN2, DNN3, XGBoost, and CNN, a clear pattern emerges: the mean errors (MSE, MAE, and RMSE) are higher in pixel-level validation and decrease significantly when pixels are aggregated into 100 × 100 m units (scene level). Conversely, the coefficient of determination (R2) increases at the scene level, reflecting a better capture of aggregated spatial variability.
At both levels, DNN2 and DNN3 yield the best results in terms of MSE and MAE, closely followed by XGBoost. DNN1, being the simplest architecture, shows slightly inferior performance. The CNN, on the other hand, exhibits the highest average error values in both tables, yet it achieves the highest R2 at both pixel and scene levels. This indicates that, despite its larger absolute errors, it better explains the overall variance.
The inclusion of spectral indices did not improve the performance of the deep neural networks (DNNs); in fact, their metrics remained largely unchanged (Table 5 and Table 6). This is likely because the indices introduced redundant information derived from the original bands, increasing the dimensionality without adding new value. Similarly, CNNs and XGBoost were barely affected by the addition of these indices. In the case of CNNs, the architecture already extracts relevant spatial and spectral patterns directly from the input bands. XGBoost, on the other hand, is capable of automatically disregarding non-informative variables, such as spectral indices that do not contribute additional information.
While none of the models showed improvements at the pixel level, XGBoost did achieve a noticeable enhancement at the scene level, reaching a mean absolute error (MAE) of 1.47. This improvement is likely due to its feature selection mechanism, which helps mitigate the impact of redundant or irrelevant inputs.
The inclusion of auxiliary information, such as temporal variables or climate data, does contribute to improved performance, particularly at the scene level (Table 7 and Table 8). While DNN2 achieves its best pixel-level MAE with this input combination (5.05), the most notable gains are observed at the scene level, where the results outperform those obtained with other input configurations.
In the last configuration, where the input is limited to the essential Sentinel-2 bands, DNN2 achieves the best results at both pixel and scene levels, outperforming both XGBoost and CNN (Table 9 and Table 10) and yielding the lowest MAE and MSE, though not the highest R2.
3.2. Dataset and Model Selection
The models that achieved the best results in terms of MAE and RMSE against the validation set for each dataset were then evaluated on the independent test set (unseen during training), yielding the following results at both global and class levels.
3.2.1. Pixel-Level
Despite the low coefficients of determination (R2) obtained across the four datasets—with values between 0.39 and 0.41—the other metrics are reasonably good given the problem’s context (Table 11). The Mean Absolute Error (MAE) ranges from 5.42 to 5.86.
The class-specific results reveal considerable variability in model performance (Figure 9). This figure includes the most representative and prevalent classes in the dataset, providing insight into how the model handles the most common land cover types. Generally, classes with lower spectral variability or greater homogeneity—such as Bare soil and Vineyard—exhibit the best performance metrics, with low MAE values (2.89 and 2.35, respectively) and high R2 coefficients (0.53 for both), indicating high precision and strong correspondence with ground truth. Water and Plowed land also demonstrate solid performance, with R2 values of 0.51 and 0.48, respectively, and low errors (MAE of 4.61 and 2.97), suggesting the model’s good capability in predicting these categories. Conversely, more heterogeneous classes or those with greater spectral similarity to others—such as Herbaceous vegetation, Deciduous, Agricultural land, Impervious surface, and Pervious surface—present the highest errors. Notably, Herbaceous vegetation has the lowest R2 (0.23) and the highest MAE (17.59), while Deciduous and Agricultural land also show elevated MAE values (16.24 and 10.83, respectively), reflecting the model’s difficulty in capturing their internal variability and effectively distinguishing them from spectrally similar classes.
3.2.2. Scene-Level
The scene-level results at 100 m resolution show a clear improvement over those obtained at 20 m, which is consistent with what was observed in the validation set. In this case, the coefficients of determination (R2) range between 0.59 and 0.67 (Table 12). Additionally, the MAE is significantly reduced, settling around 2.36–2.94. The RMSE also decreases notably, with values between 4.94 and 6.09.
The class-specific results from this analysis, conducted at the scene level (100 m resolution), demonstrate notable improvements in model performance across various categories compared to the previous pixel-level (20 m resolution) assessment (Figure 10). As in the previous analysis, Figure 10 includes only the most representative and frequent classes in the dataset, ensuring a meaningful comparison across dominant land cover types. The enhanced performance observed is largely attributable to the aggregation of information at a coarser scale, which reduces pixel-level noise and better captures the dominant land cover signals. Classes with inherently clearer spectral signatures or greater homogeneity—such as Bare soil, Water, and Vineyard—now exhibit excellent metrics, with R2 values of 0.81 in all three cases and remarkably low MAE values (2.18, 3.02, and 1.67, respectively), indicating very strong agreement with ground truth. Building also shows a significant improvement, achieving an R2 of 0.71 and an MAE of 4.52. Furthermore, classes that previously presented challenges at the pixel level, such as Impervious surface, Plowed land, and Agricultural land, now demonstrate solid performance with R2 values of 0.73, 0.72, and 0.74, and correspondingly lower errors (MAE of 6.06, 2.29, and 7.40). While classes such as Deciduous, Brushwood, Pervious surface, and Coniferous show good R2 values (ranging from 0.63 to 0.75), their MAE values remain comparatively higher, suggesting that some internal variability or spectral confusion persists even at the scene level. Herbaceous vegetation, although notably improved from its pixel-level performance, still presents the lowest R2 (0.58) and the highest MAE (8.90) among all classes, indicating that it remains the most challenging category for precise prediction—likely due to its high heterogeneity and temporal variability, which may persist even after spatial aggregation.
3.3. Comparison with CGLS-LC100
Figure 11 shows both visual and quantitative evaluations of land cover classification performance for two models: the model developed in this study (“Our Pred”) and the Copernicus model (CGLS-LC100). The assessment was conducted across three urban and peri-urban scenarios, using the high-resolution FLAIR-GT dataset as the ground truth.
Across all scenarios, the “Our Pred” model exhibited strong agreement with the FLAIR-GT reference. This is reflected in the visual representation of land cover classes and the corresponding area proportions shown in the histograms. The model produced well-balanced predictions across categories such as trees, herbaceous vegetation, water, built-up areas, and bare soil.
In contrast, the Copernicus (CGLS-LC100) model showed notable biases. In the first two scenarios, it consistently overestimated built-up areas, suggesting potential confusion between constructed structures and other land cover types such as vegetation or bare soil. In the third scenario, this trend reversed, with a significant underestimation of built-up areas, highlighting inconsistencies in local-level predictions.
4. Discussion
Although continuous or fractional mapping approaches have traditionally received less attention than their discrete counterparts, recent studies and operational products have begun to highlight their advantages and feasibility. One notable example is the work by Masiliūnas et al. [37], who employed Landsat 8 imagery and the Random Forest (RF) algorithm to estimate fractional cover for seven land cover classes. In a more advanced implementation, Copernicus Global Land Cover delivers global fractional cover for ten classes at a spatial resolution of 100 m, explicitly adopting a continuous perspective [21].
Compared to studies like that of Masiliūnas et al. [37], which relied on conventional architectures such as RF, the present study explores more advanced models. These include XGBoost—a more advanced tree-based ensemble method—as well as deep neural networks (DNNs) with varying depths and convolutional neural networks (CNNs). The results demonstrate a clear performance improvement: while Masiliūnas et al. [37] reported a mean absolute error (MAE) of 7.9 at 100 m resolution for seven classes, our approach achieved a substantially lower MAE of 2.36 despite working with a much broader set of 13 classes. These findings underscore the potential of combining finer input data, broader class diversity, and more advanced modeling architectures to improve fractional land cover estimation. Future work could directly compare both approaches under standardized dataset and resolution settings to isolate the effect of model complexity and input richness.
Another key advantage—though also a challenge—is the number of land cover classes considered. While the aforementioned studies and products typically work with fewer than 10 classes, this study addresses a broader thematic diversity with 13 classes. This increased thematic richness enables a more detailed characterization of the landscape.
Our results at the pixel level (20 m resolution) yielded MAEs around 5–6, which is reasonable for heterogeneous cover types. Nonetheless, the low coefficients of determination (mean R2 = 0.41) at this scale suggest that the variance explained at the individual pixel level is limited. This indicates that, although the model captures the overall trend, predicting fractions at the pixel level remains challenging in complex environments, particularly for classes with high variability. Classes such as “Herbaceous vegetation,” “Agricultural land,” and “Deciduous” showed the highest pixel-level errors, with MAEs up to 17.59 and RMSEs exceeding 24. In contrast, classes like “Water,” “Bare soil,” “Plowed land,” and “Vineyard” had lower MAEs and better R2 values. This pattern is also reflected in the scatter plots, where a wide dispersion of predicted versus reference values is observed at the pixel level.
The notable improvement in performance metrics when validating the models at the scene level (100 m resolution) underscores the suitability of the approach for regional-scale analyses. At 100 m, the mean R2 increased significantly to 0.67, while the MAE decreased to values between 2.36 and 2.94—indicating very low average errors. The scatter plots at this coarser resolution display a much tighter distribution around the 1:1 line, with reduced dispersion and fewer outliers, highlighting the improved consistency of predictions after spatial aggregation. At the class level, classes such as “Water,” “Vineyard,” “Bare soil,” and “Plowed land” achieved high R2 values (0.81, 0.81, 0.80, and 0.72, respectively), with very low MAEs, even approaching or falling below 2 in the case of “Bare soil,” “Water,” “Vineyard,” “Plowed land,” and “Coniferous.” This improved performance at coarser spatial resolution reflects, as expected, how errors tend to cancel each other as the size of the spatial unit of analysis increases, and supports the robustness of the model for regional applications. These findings align with previous relevant studies, such as Marceau et al. [60] and Buchhorn et al. [21].
The study evaluated the performance of different machine learning architectures—DNNs, CNNs, and XGBoost—using four distinct datasets to assess the contribution of various input variables. Overall, results were relatively consistent across models. The intermediate DNN2 architecture consistently achieved the best results at the pixel level (20 m), while XGBoost outperformed all other models at the scene level (100 m) on the independent test set. When spectral indices were included (Dataset 2), DNNs underperformed—likely due to increased dimensionality and redundancy, which added noise, as noted by Romero et al. [61]. In contrast, XGBoost and CNNs were less affected. XGBoost efficiently manages redundancy by discarding irrelevant features or assigning them low weights [49], and CNNs are designed to extract relevant patterns directly from the original spectral bands [62].
Dataset 3, which included auxiliary variables such as temporal and climate data, produced general improvements across all models—particularly at the scene level. This highlights the value of contextual information and suggests that such variables may enhance model performance more effectively than spectral indices alone. These results emphasize the importance of tailoring both model architecture and input features to the specific task and scale of analysis.
The comparison between “Our Pred” and Copernicus CGLS-LC100 highlights that while the Copernicus model is suitable for regional-scale analyses, it shows inconsistencies at the local scale, such as over- or underestimating built-up areas due to mixed land cover signals and limited spatial resolution. In contrast, the “Our Pred” model demonstrates higher accuracy and balanced classification, indicating that high-resolution, tailored approaches are better suited for detailed urban and peri-urban land cover mapping. These results underscore the value of localized model training and high-resolution reference data for improving predictive reliability and supporting planning and environmental monitoring in complex landscapes.
The use of fractional land cover maps provides a more detailed representation of landscape heterogeneity compared to traditional discrete maps. This approach enables a wide range of environmental applications: carbon stock estimation benefits from weighting the different pools according to the proportion of each class within a pixel, improving accuracy compared to assuming full occupation by a single class; hydrological models can incorporate actual proportions of vegetation, bare soil, and impervious surfaces to refine calculations of runoff, infiltration, and evapotranspiration; and biodiversity and habitat assessments are enhanced by capturing habitat mosaics and sub-pixel heterogeneity, supporting more realistic species distribution and habitat suitability models.
Despite the advances presented, the proposed approach has limitations. Although a pixel-level MAE of around 5 is reasonable, the low R2 at this scale suggests limited explanatory power for individual pixels. Thus, while the model captures general trends, pixel-level prediction remains challenging in complex environments. However, the significant improvement in scene-level metrics (100 m) validates the applicability of our model for regional-scale analysis.
A key limitation compared to global products like Copernicus CGLS-LC100, or even broader studies such as Masiliūnas et al. [37], is the geographical scope. Our current focus is regional—specifically France—using the FLAIR dataset. While this allows for more precise calibration and may explain the better MAE scores, broader applicability remains to be tested. Nonetheless, model transferability is feasible: studies such as Safarov et al. [63] and Sierra et al. [14] demonstrate that CNNs can be used to generate labels in new regions, reducing the need for manual annotation and enabling expansion of the training dataset.
Future research could explore incorporating the temporal dimension in Sentinel-2 time series to better capture phenological dynamics, thereby improving the classification of seasonally varying classes. Additionally, hybrid architectures that combine the feature-handling capabilities of XGBoost with the spatial pattern extraction power of CNNs may further enhance model robustness. Finally, testing across a wider range of environments will be crucial for evaluating model transferability and potential for global applications.
5. Conclusions
This study presents a methodology for mapping fractional land cover using Sentinel-2 imagery and artificial intelligence models. Unlike traditional discrete classification approaches, the proposed method offers a more realistic depiction of landscape heterogeneity, particularly in pixels composed of mixed land cover types.
A key contribution of this research lies in addressing a gap in the current state of the art: most previous studies on fractional land cover have focused on global scales with a limited number of broad classes. In contrast, this work broadens the thematic scope by estimating fractional cover for up to 13 land cover classes. The methodology is developed and validated using the FLAIR dataset, which covers 810 km2 of French territory at 0.2 m resolution and includes expert-annotated land cover data. Four dataset variants were tested, combining Sentinel-2 spectral bands, scene classification (SCL), derived vegetation indices, and auxiliary variables such as climate zone and image acquisition date.
France was chosen as the study area due to its wide range of climates and land cover types, including oceanic, Mediterranean, continental, and mountainous regions, ensuring that the methodology is exposed to diverse environmental conditions and better prepared for adaptation to other regions. Although developed and validated using French data, the methodology is transferable to other areas with similar Sentinel-2 coverage and heterogeneous landscapes. For example, it could be applied to temperate European countries such as Germany, Poland, or Spain, as well as to parts of North America like the northeastern United States or southern Canada. In these regions, the approach could be used directly, but fine-tuning with local training data would likely improve accuracy by adapting the models to region-specific land cover patterns and climate variability.
Three modeling strategies were compared—XGBoost, deep neural networks (DNNs), and convolutional neural networks (CNNs)—to evaluate the impact of model complexity and input configuration on estimation accuracy. DNN2 achieved the best RMSE and MAE at the pixel level (13.83 and 5.42, respectively), while XGBoost achieved the best RMSE and MAE at the scene level (4.94 and 2.36, respectively). CNNs, on the other hand, achieved the best coefficient of determination (R2), highlighting the potential of deep learning architectures to capture more complex spatial patterns, even if they do not always outperform in absolute error metrics.
Although results at the 20 m pixel level are promising, scatterplots reveal high variability in pixel-wise predictions. These findings suggest that aggregating predictions at coarser resolutions—such as 100 m—may yield more robust and consistent outputs, aligning better with current global-scale studies and reducing local noise. Finally, fractional land cover mapping represents a major step forward compared to discrete approaches, particularly in environmental applications such as carbon stock estimation. By avoiding the assumption that each pixel is fully occupied by a single class, fractional estimates offer a more nuanced view of land cover, especially in transitional or heterogeneous areas. Thanks to the use of Sentinel-2 data and relatively simple AI architectures, this approach remains scalable and cost-effective, with strong potential for frequent updates. The promising performance of deep learning models reinforces their value as tools for accurate and timely land monitoring.