1. Introduction
Accurate land cover and land use (LCLU) classification is crucial for environmental monitoring, urban planning, and climate change studies [1]. The advent of high-resolution satellite imagery and advanced machine learning techniques has significantly improved LCLU classification [2]. However, challenges remain in enhancing the classification accuracy and interpreting the decision-making processes of deep learning models [3].
Convolutional neural networks (CNNs) have been widely adopted for image segmentation tasks in remote sensing. Among these, the U-Net architecture [4] has gained prominence due to its ability to capture contextual information and produce detailed segmentation maps through its encoder–decoder structure with skip connections. The Attention U-Net [5] extends the U-Net by incorporating attention mechanisms that enable the model to focus on relevant regions in the input image, suppressing irrelevant features and potentially improving the performance in complex scenes.
Understanding how deep learning models make classification decisions is essential in building trust and improving the models. Grad-CAM++ [6], a generalization of gradient-weighted class activation mapping (Grad-CAM), is a visualization technique that provides insights into the features influencing a model’s predictions by highlighting the regions of the input image that contribute most to a given class. This enhanced interpretability can aid in diagnosing model weaknesses and guiding improvements.
In this study, the U-Net and Attention U-Net models were employed to classify 10 LCLU classes in Sentinel-2 imagery. The classes included water, wetland, deciduous forest, mixed forest, coniferous forest, barren, urban/development, agriculture, shrubland, and no data. The selection of these models was motivated by their proven effectiveness in image segmentation tasks and the potential of attention mechanisms to enhance the classification accuracy. Furthermore, the potential of Grad-CAM++ was explored to enhance the interpretability of the deep learning models, aiding in understanding and improving their decision-making processes.
2. Materials and Methods
2.1. Study Area and Data
The study area was located in Northern Ontario, Canada, centered at 49.17° N, 83.03° W (Figure 1). Sentinel-2 satellite imagery acquired on 15 August 2023 was used. Ten spectral bands, with native spatial resolutions of 10 m or 20 m, were selected for the analysis: B02 (Blue), B03 (Green), B04 (Red), B05 (Vegetation Red Edge), B06 (Vegetation Red Edge), B07 (Vegetation Red Edge), B08 (NIR), B8A (Narrow NIR), B11 (SWIR), and B12 (SWIR).
The dataset was created by dividing the Sentinel-2 image into 94,922 image patches of 64 × 64 pixels using a sliding window approach with 50% overlap. This method ensured the comprehensive coverage of the study area and increased the dataset size. The dataset was then randomly divided into training (70%), validation (15%), and testing (15%) sets, ensuring the representative distribution of the land cover types in each set.
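The patch extraction and random split described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the toy image size, and the random seed are assumptions, and only the 64 × 64 patch size, the 50% overlap, and the 70/15/15 split come from the text.

```python
import numpy as np

def extract_patches(image, patch=64, overlap=0.5):
    """Slide a patch x patch window over an (H, W, C) array with the given overlap."""
    stride = int(patch * (1 - overlap))  # 32 pixels for a 50% overlap
    h, w = image.shape[:2]
    patches = [
        image[r:r + patch, c:c + patch]
        for r in range(0, h - patch + 1, stride)
        for c in range(0, w - patch + 1, stride)
    ]
    return np.stack(patches)

def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test subsets."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice
    order = rng.permutation(n)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# Toy stand-in for the 10-band Sentinel-2 image (size chosen for illustration).
image = np.zeros((256, 320, 10), dtype=np.float32)
patches = extract_patches(image)
train_idx, val_idx, test_idx = split_indices(len(patches))
print(patches.shape, len(train_idx), len(val_idx), len(test_idx))
```

With a stride of 32 pixels, the toy 256 × 320 image yields 7 × 9 window positions; any remainder from rounding the split fractions is assigned to the test subset.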
2.2. Ground Truth Data Preparation
The ground truth data were derived by integrating three land cover maps: the AAFC Annual Crop Inventory 2023 [7], the NRCan Land Cover Map 2020 [8], and the Ontario Land Cover Compilation v2.0 [9]. These maps, each employing distinct classification schemes and spatial resolutions, provided complementary information for land cover characterization.
The AAFC Annual Crop Inventory 2023, produced by Agriculture and Agri-Food Canada, is a 30-m-resolution map offering detailed crop type information across Canada. The NRCan Land Cover Map 2020, produced by Natural Resources Canada, also has a 30 m resolution and classifies land cover types such as forests, wetlands, and barren lands. The Ontario Land Cover Compilation v2.0 provides land cover data at a 15 m resolution, detailing forests, wetlands, and urban areas specific to Ontario. All datasets were obtained under their respective Open Government Licenses.
To ensure consistency across the varying resolutions, all maps were resampled to a 10 m spatial resolution using the nearest neighbor method, aligning with the spatial resolution of the Sentinel-2 imagery. Each map was reclassified according to the 10-class land cover and land use (LCLU) scheme used in this study, encompassing no data, water, wetland, deciduous forest, mixed forest, coniferous forest, barren, urban/development, agriculture, and shrubland.
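As an illustration of this step, the sketch below resamples a coarse label grid to a finer one with nearest neighbor lookup and then remaps source class codes to the study scheme. It assumes aligned grids and non-negative integer codes. The source codes in `mapping` are hypothetical placeholders, not the actual AAFC, NRCan, or Ontario legend values, and the class indices (0 = no data, 1 = water, and so on) simply follow the order in which the classes are listed above.

```python
import numpy as np

def resample_nearest(grid, src_res, dst_res):
    """Nearest neighbor resampling of a 2D label grid from src_res to dst_res (metres)."""
    h, w = grid.shape
    out_h = int(round(h * src_res / dst_res))
    out_w = int(round(w * src_res / dst_res))
    # Map each target pixel centre back to the source pixel that contains it.
    rows = np.minimum(((np.arange(out_h) + 0.5) * dst_res / src_res).astype(int), h - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) * dst_res / src_res).astype(int), w - 1)
    return grid[np.ix_(rows, cols)]

def reclassify(grid, mapping, default=0):
    """Remap source class codes with a lookup table; unmapped codes become `default`."""
    lut = np.full(max(max(mapping), int(grid.max())) + 1, default, dtype=np.uint8)
    for src_code, dst_code in mapping.items():
        lut[src_code] = dst_code
    return lut[grid]

# Hypothetical source codes -> study classes (1 = water, 2 = wetland, 5 = coniferous forest).
mapping = {20: 1, 81: 2, 210: 5}
coarse = np.array([[20, 81], [210, 999]])  # toy 30 m grid; 999 is an unmapped code
fine = resample_nearest(reclassify(coarse, mapping), src_res=30, dst_res=10)
print(fine)
```

Because nearest neighbor lookup only copies existing labels, the resampled grid never contains interpolated class values, which is why it suits categorical maps.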
Given the shared spatial extent of the maps with the Sentinel-2 imagery, a majority voting approach was employed for class assignment. For each pixel, the most frequently assigned class among the three maps was selected. Pixels for which no consensus was reached (i.e., each map assigned a different class) were designated as ‘no data’ and omitted from further analysis.
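The voting rule can be written compactly for three co-registered label maps. The sketch below is an illustration under stated assumptions: the function name is hypothetical, class index 0 is assumed to denote ‘no data’, and the handling of pixels that are already ‘no data’ in a source map is not specified in the text and is not treated specially here.

```python
import numpy as np

def majority_vote(a, b, c, no_data=0):
    """Per-pixel majority of three label maps; no_data where all three disagree."""
    out = np.full(a.shape, no_data, dtype=a.dtype)
    # With three voters, any pair that agrees forms the majority.
    out = np.where(b == c, b, out)
    out = np.where(a == c, a, out)
    out = np.where(a == b, a, out)
    return out

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 5], [3, 6]])
c = np.array([[7, 2], [3, 8]])
print(majority_vote(a, b, c))  # the bottom-right pixel has no consensus
```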
This integration process aimed to enhance the accuracy and robustness of the ground truth dataset by leveraging the strengths of each map while minimizing their individual limitations.
2.3. Land Cover and Land Use Classification Scheme
The land cover and land use (LCLU) classification scheme employed in this study integrates both land cover and land use categories to capture the physical characteristics and human activities within the study area. Land cover refers to the physical materials present on the Earth’s surface, while land use describes the human utilization of specific land areas.
The classification includes water bodies, such as lakes and rivers, and wetlands, where the water table is close to the surface, supporting hydrophytic vegetation. Forested areas are categorized into deciduous forests, which consist predominantly of trees that shed leaves seasonally; mixed forests containing both deciduous and coniferous species; and coniferous forests that retain their needles throughout the year. Barren regions are characterized by minimal vegetation, such as exposed rock or sand, while urban/development areas include regions dominated by impervious surfaces due to human infrastructure. Agricultural areas are defined by their use for farming and cultivation, including croplands and pastures. Shrublands are characterized by dense, low-growing vegetation.
In addition, a ‘no data’ category is used to mask areas where classification is uncertain. By incorporating both land cover and land use dimensions, this scheme enables a detailed analysis of the landscape, capturing both natural features and anthropogenic influences.
2.4. Model Architecture and Training
This study employed two deep learning architectures: U-Net [4] and Attention U-Net [5]. Both models were configured to process input images with dimensions of 64 × 64 × 10 and produce classification maps across the 10 LCLU classes.
The U-Net architecture follows an encoder–decoder structure with skip connections. The encoder comprises repeated applications of convolutions and max pooling for downsampling, while the decoder path utilizes upsampling layers and concatenates features from corresponding encoder layers. The Attention U-Net extends the U-Net architecture by integrating attention gates within the skip connections, enabling the model to focus selectively on salient regions of the input image.
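To make the role of the attention gate concrete, the following NumPy sketch computes an additive attention gate of the kind described in [5] for a single skip connection. It is a simplified forward pass, not the model used in this study: the 1 × 1 convolutions are written as per-pixel matrix products, bias terms are omitted, the gating signal `g` is assumed to have already been resampled to the resolution of the skip features `x`, and all shapes and weights are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate on skip features x using gating signal g.

    x: (H, W, Cx) encoder features, g: (H, W, Cg) decoder features,
    w_x: (Cx, Ci), w_g: (Cg, Ci), psi: (Ci, 1) act as 1 x 1 convolutions.
    """
    q = np.maximum(x @ w_x + g @ w_g, 0.0)  # joint intermediate features, ReLU
    alpha = sigmoid(q @ psi)                # (H, W, 1) attention coefficients
    return x * alpha, alpha                 # skip features scaled per pixel

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 8))
g = rng.normal(size=(16, 16, 12))
w_x = rng.normal(size=(8, 4))
w_g = rng.normal(size=(12, 4))
psi = rng.normal(size=(4, 1))
gated, alpha = attention_gate(x, g, w_x, w_g, psi)
print(gated.shape, alpha.shape)
```

The gated features take the place of the raw encoder features in the concatenation step of the decoder, so pixels with coefficients near zero contribute little to the reconstruction.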
Both models were trained using a focal loss function to address class imbalance issues. The Adam optimizer was employed with an initial learning rate of 0.001. Learning rate reduction and early stopping strategies were implemented to improve the training stability and prevent overfitting. The training process was conducted for 15 epochs with a batch size of 64.
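For reference, a multi-class focal loss can be sketched as below. The focusing parameter `gamma` and weighting factor `alpha` are not reported in the text; the values used here (2.0 and 1.0) are assumptions for illustration only, and the function name is hypothetical.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=1.0, eps=1e-7):
    """Mean focal loss: -alpha * (1 - p_t)^gamma * log(p_t).

    probs: (..., n_classes) softmax outputs, labels: (...) integer class indices.
    """
    # Probability assigned to the true class of each pixel.
    p_t = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    p_t = np.clip(p_t, eps, 1.0)
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))

probs = np.array([[0.9, 0.1], [0.6, 0.4]])
labels = np.array([0, 0])
cross_entropy = focal_loss(probs, labels, gamma=0.0)  # gamma = 0 recovers cross-entropy
focal = focal_loss(probs, labels, gamma=2.0)
print(cross_entropy, focal)
```

The factor (1 - p_t)^gamma shrinks the contribution of pixels that are already classified confidently, so the gradient is dominated by hard examples, which tend to come from rare classes.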
Grad-CAM++ was employed as a post hoc interpretability tool to visualize and understand the regions within the input images that most influenced the models’ classification decisions. It did not have an active role in influencing the classification performance during training but provided valuable insights into the decision-making processes of the models. By highlighting important regions, Grad-CAM++ aids in diagnosing model weaknesses and can inform future improvements in model design and training.
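The Grad-CAM++ computation can be sketched from a layer's activations and the gradients of a class score with respect to them. The closed form below follows the derivation in [6], which assumes that the class score is passed through an exponential and that the network is piecewise linear. How the score was defined for segmentation in this study is not stated; one common choice, assumed here, is the class logit summed over all pixels. The arrays are random stand-ins for values that a deep learning framework would provide.

```python
import numpy as np

def grad_cam_pp(activations, grads, eps=1e-7):
    """Grad-CAM++ map from one layer's activations and gradients.

    activations, grads: (H, W, K) arrays, with grads = dS/dA for class score S.
    """
    g2 = grads ** 2
    g3 = g2 * grads
    sum_a = activations.sum(axis=(0, 1), keepdims=True)  # (1, 1, K)
    denom = 2.0 * g2 + sum_a * g3
    denom = np.where(denom != 0.0, denom, eps)            # avoid division by zero
    alpha = g2 / denom                                    # pixel-wise coefficients
    weights = (alpha * np.maximum(grads, 0.0)).sum(axis=(0, 1))  # (K,) channel weights
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # weighted sum, ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                             # scale to [0, 1] for display
    return cam

rng = np.random.default_rng(0)
activations = np.abs(rng.normal(size=(8, 8, 16)))  # stand-in for post-ReLU feature maps
grads = rng.normal(size=(8, 8, 16))                # stand-in for dS/dA
cam = grad_cam_pp(activations, grads)
print(cam.shape)
```

The resulting map has the spatial size of the chosen layer and is typically upsampled to the 64 × 64 input patch before being overlaid on the image.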
3. Results
3.1. Model Performance
The performance of the U-Net and Attention U-Net models on the test set is summarized in Table 1, with metrics including the test loss, test accuracy, mean intersection over union (IoU), F1 score, and recall. These metrics provide a comprehensive evaluation of model performance.
Test loss measures the discrepancy between the predicted labels and the ground truth, with lower values indicating better alignment.
Test accuracy reflects the percentage of correctly classified pixels in the test set, serving as an indicator of overall correctness.
Mean IoU assesses the average overlap between the predicted and true segmentation masks across all classes, offering insights into the spatial agreement.
F1 score is the harmonic mean of the precision and recall, balancing the trade-off between these two metrics.
Recall represents the proportion of actual positive instances that are correctly identified, providing a measure of the sensitivity.
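These definitions can all be derived from a single confusion matrix, as in the sketch below. The averaging scheme used for the F1 score and recall is not stated in the text; support-weighted averaging is assumed here. Under that assumption the averaged recall equals the overall accuracy, which is consistent with the matching Test Accuracy and Recall values in Table 1. The function names and the toy labels are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    idx = y_true.ravel() * n_classes + y_pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def summarize(cm):
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)       # true pixels per class
    predicted = cm.sum(axis=0)     # predicted pixels per class
    union = support + predicted - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        recall = np.where(support > 0, tp / support, 0.0)
        precision = np.where(predicted > 0, tp / predicted, 0.0)
        iou = np.where(union > 0, tp / union, np.nan)  # absent classes are skipped
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    weights = support / support.sum()  # support-weighted averaging (assumption)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "mean_iou": np.nanmean(iou),
        "weighted_f1": (weights * f1).sum(),
        "weighted_recall": (weights * recall).sum(),
    }

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
metrics = summarize(confusion_matrix(y_true, y_pred, n_classes=3))
print(metrics)
```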
Table 1. Performance comparison of U-Net and Attention U-Net models.
Model | Test Loss | Test Accuracy | Test Mean IoU | F1 Score | Recall |
---|---|---|---|---|---|
Base U-Net | 0.1864 | 70.68% | 0.4852 | 0.7150 | 0.7068 |
Attention U-Net | 0.1944 | 70.17% | 0.4771 | 0.7108 | 0.7017 |
The results indicate that the U-Net model slightly outperformed the Attention U-Net on every metric, achieving a lower test loss (0.1864 vs. 0.1944) and a test accuracy about 0.5 percentage points higher (70.68% vs. 70.17%). This suggests that the U-Net may be marginally more effective in classifying land cover and land use (LCLU) classes in this study area, although the differences are small.
3.2. Class-Wise F1 Scores
Table 2 presents the class-wise F1 scores for both the U-Net and Attention U-Net models, offering insights into their classification performance across different LCLU classes.
The results indicate that both models achieve higher F1 scores for classes such as mixed forest and coniferous forest, reflecting strong performance in these categories. In contrast, both models struggle with the classification of the barren class, showing an F1 score of zero. This could be attributed to the limited representation of this class in the training data or spectral similarities with other classes, which may hinder accurate differentiation.
3.3. Grad-CAM++ Visualization
Figure 2 and Figure 3 present the Grad-CAM++ visualizations for each LCLU class using the U-Net and Attention U-Net models, respectively.
The Grad-CAM++ visualizations demonstrate that both models effectively focus on relevant features corresponding to each LCLU class. These visualizations not only validate the models’ ability to capture spatially meaningful patterns but also assist in identifying potential sources of misclassification and areas for refinement. For instance, both models exhibit strong activations for forested areas, while the activations for barren land are notably weaker. This observation aligns with the land cover distribution within the study area and corresponds to the models’ lower F1 scores for the barren class, highlighting a potential area for further model improvement.
3.4. Prediction Results
Figure 4 and Figure 5 present the prediction outcomes for the U-Net and Attention U-Net models, respectively. Each figure includes three components: the original RGB true color image, the ground truth map derived from the integrated land cover datasets, and the predicted classification map.
Both the ground truth and the predicted maps cover the same spatial extent, enabling a direct comparison of the classification results. The ground truth map is based on the integration and reclassification of land cover data from the AAFC, NRCan, and Ontario Land Cover datasets. These individual maps are not displayed separately, as the primary focus is on assessing the alignment between the model predictions and the integrated ground truth.
The results indicate that both models capture the general land cover and land use (LCLU) patterns within the study area. Nevertheless, instances of misclassification are evident, particularly in regions with mixed or transitional land cover types. This aligns with the class-wise F1 scores, which reveal lower performance for certain categories, such as urban/development and agriculture. These findings highlight the challenges associated with classifying heterogeneous landscapes and suggest potential areas for model improvement.
3.5. Discussion
The results underscore the potential of Grad-CAM++ in enhancing the interpretability of deep learning models in LCLU classification. Although the U-Net demonstrated marginally better performance metrics compared to the Attention U-Net, both models encountered challenges with certain classes, particularly in accurately classifying barren land. This difficulty highlights the need to address class imbalances and suggests that incorporating additional relevant features could improve the model performance.
The minimal difference in performance between the U-Net and Attention U-Net models indicates that both architectures have room for further optimization. Refining the attention mechanisms could enhance the feature localization, potentially leading to more accurate classifications. Moreover, optimizing the baseline U-Net through hyperparameter tuning or the integration of advanced techniques could yield additional performance gains.
4. Conclusions
This study demonstrates the value of integrating Grad-CAM++ with deep learning architectures for LCLU classification using Sentinel-2 imagery. While the U-Net showed marginally better performance metrics compared to the Attention U-Net, both models benefited from Grad-CAM++ visualizations in enhancing their interpretability. The visualizations provided insights into the features that influenced the models’ decisions for different LCLU classes, revealing which regions of the input images were most significant for classification.
Although the Attention U-Net did not outperform the U-Net, the attention mechanisms within the model offer a valuable avenue through which to direct focus within the network. Further research is needed to optimize these mechanisms and assess their impacts on model performance and interpretability. The integration of Grad-CAM++ with both models emphasizes the importance of interpretability tools in deep learning applications for remote sensing.
Future work should focus on optimizing the attention mechanism in the Attention U-Net model and exploring ways to integrate Grad-CAM++ insights into the training process. Addressing class imbalance issues, particularly for underrepresented classes like barren land, is crucial. Techniques such as oversampling minority classes, undersampling majority classes, or employing more sophisticated data augmentation strategies could be employed.
Moreover, incorporating additional relevant features or ancillary data sources might be beneficial. Integrating textural features, digital elevation models, or multi-temporal data could potentially improve the models’ ability to distinguish between spectrally similar classes. By addressing these challenges, more robust and accurate LCLU classification models can be developed to better support environmental monitoring and land management decisions.
Author Contributions
Conceptualization, all; methodology, all; software, T.H.; investigation, all; data curation, T.H.; writing—original draft preparation and editing, T.H.; writing—review, B.H.; visualization, T.H.; supervision, B.H.; project administration, B.H.; funding acquisition, B.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research was partly funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), through a Discovery Grant (RGPIN-2021-03624) awarded to Dr. Baoxin Hu.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
The Sentinel-2 satellite imagery was obtained from the Copernicus Open Access Hub. The land cover classification maps were derived from datasets provided by Agriculture and Agri-Food Canada (AAFC), Natural Resources Canada (NRCan), and the Ontario Ministry of Natural Resources and Forestry (OMNRF). These data contain information licensed under the Open Government License—Canada and Ontario.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201.
2. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
3. Zhang, C.; Sargent, I.; Pan, X.; Li, H.; Gardiner, A.; Hare, J.; Atkinson, P.M. Joint Deep Learning for land cover and land use classification. Remote Sens. Environ. 2019, 221, 173–187.
4. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241.
5. Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
6. Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847.
7. Agriculture and Agri-Food Canada (AAFC). Annual Crop Inventory 2023; Government of Canada: Ottawa, ON, Canada, 2023. Available online: https://open.canada.ca/data/en/dataset/5d3ab93e-324a-41db-8d29-0f0813d0e9cd (accessed on 15 July 2024).
8. Natural Resources Canada (NRCan). 2020 Land Cover of Canada; Government of Canada: Ottawa, ON, Canada, 2022. Available online: https://open.canada.ca/data/en/dataset/ee1580ab-a23d-4f86-a09b-79763677eb47 (accessed on 15 July 2024).
9. Ontario Ministry of Natural Resources and Forestry (OMNRF). Ontario Land Cover Compilation v2.0—Data Specifications; Queen’s Printer for Ontario: Toronto, ON, Canada, 2014. Available online: https://geohub.lio.gov.on.ca/documents/7aa998fdf100434da27a41f1c637382c/about (accessed on 15 July 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).