1. Introduction
Forests occupy about one third of Earth’s land surface [1] and have a significant impact on the balance of our biosphere. As storehouses of biological diversity, they play an important role in Earth’s life support system. They moderate ecosystems, protect and replenish water and air, and are critical to the planet’s carbon balance. They provide a range of ecosystem services [2]: provisioning services such as material resources, timber, and non-timber products; regulating services such as climate and water-quality regulation; habitat and supporting services such as soil formation and nutrient cycling; and cultural services relating to recreation and spirituality.
A recent study on the effects of climate change on forestry [3] revealed that forest conditions, and the capacity of forests to deliver environmental services, are on the brink of being destabilized. Global warming is accompanied by fluctuations in rainfall, a higher frequency of high-intensity storms and hurricanes, and rising atmospheric CO2 concentrations, all of which affect tree growth [4]. Disturbances to forests caused by climate change reduce the amount of carbon dioxide they store, thereby accelerating global warming.
Given this important role of forests and trees, appropriate forest management and conservation efforts are essential. Such efforts are only possible when policymakers and decision makers have accurate information about forests and their potential threats [5]. An important part of this information is an accurate tree census and classification at high spatial resolution. A regular forest inventory helps assess the extent of forest depletion and deforestation, supports the planning of coordinated restoration measures, and allows the effectiveness of conservation programs to be evaluated [6]. Furthermore, identifying tree species and their health status allows accurate estimation of carbon stocks and other ecosystem services.
Performing a tree census at scale with traditional methods is resource-intensive, as it requires substantial effort from a large number of experts. Traditional methods are also error-prone because they rely heavily on manual work. They are costly and time-consuming, rather static, and unable to detect rapid disturbances and threats to trees and forests.
Recent technological advancements have allowed researchers to accelerate and scale up the tree census process by employing satellite imagery and artificial intelligence (AI) [7]. Satellite imagery has become increasingly accessible, offering improved spatial and temporal resolutions that enable detailed observations of forested areas. Freely available data from missions such as Landsat and the Copernicus Sentinel satellites provide open access to moderate-resolution imagery. However, high-resolution imagery from commercial providers such as Maxar (e.g., WorldView) and Planet Labs (e.g., PlanetScope) is not freely available, which can pose limitations [8]. Deep learning (DL) methods, including convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have been widely applied to tree classification from satellite imagery, demonstrating their effectiveness in leveraging spectral and spatial patterns for detailed species identification [9]. However, these models require supervised training, which usually means that experts must annotate a large number of images before the models can learn to identify different tree species [10]. This creates significant overhead, and the right experts are usually difficult to find.
In this paper, we propose a hybrid approach where experts label data only partially, and then weakly supervised learning techniques are employed to train AI models, taking satellite imagery as input. We argue that by harnessing historical and partially labeled data, even when a portion of these data is inconsistent, we can employ data imputation techniques such as pseudo-labeling to expand the training datasets to the extent required by AI models to tackle the tree classification task with state-of-the-art precision.
We further exploit the fact that large-scale geospatial tools and digital twins are becoming available [11,12], providing detailed and highly accurate geomorphological and topographical characteristics of Earth. By combining high spatial resolution visible-band and near-infrared (NIR) satellite imagery with the geoanalytical services provided by these tools, AI models can be trained on multiple modalities [13,14], allowing the tree census problem to be solved with high accuracy, minimal effort, and lower costs. Thus, the main contribution of this paper is a novel methodology for a tree census classification system that leverages historical and partially labeled data through data imputation and weakly supervised learning, achieving state-of-the-art precision in classifying the dominant tree species of Cyprus.
This study aims to build a cost-effective and scalable methodology that can be used to classify tree species over a wide geographical area. Instead of relying solely on large and accurate datasets, which are difficult to acquire, our approach combines weak supervision in the form of pseudo-labeling with auxiliary geomorphological data in a multi-modal deep learning model. This reduces the dependency on large, fully annotated datasets, allowing the methodology to be applied over large geographical areas. At the same time, this study provides new evidence that training deep learning models for tree mapping from satellite imagery on larger but rather incomplete datasets works well compared with training on smaller and more precise datasets. This evidence aligns with the work of Rolnick et al. [15], who showed that deep neural networks can generalize from training data in which true labels are massively outnumbered by incorrect labels. They diluted each clean training example with 100 randomly labeled examples and still achieved good results on several datasets. Training in this regime of substantial label noise requires a significant but manageable increase in dataset size, related to the factor by which the correct labels have been diluted.
Similarly, in our approach, each existing tree label is diffused to neighboring areas and assigned to unknown nearby trees. Since the probability of two trees of the same species growing close to each other is (at least slightly) higher than the probability of two different species growing in neighboring locations, we expect our pseudo-label assignment process not to harm training; on the contrary, it will most likely improve the results if it is applied systematically and consistently. Provided that the errors introduced by pseudo-labeling are spread uniformly, in the sense that the process is applied to large areas of evenly distributed tree species (in terms of the area covered by each species), the proposed weakly supervised approach is expected to improve the results and is more cost-effective than a smaller, more precisely annotated dataset produced through a costly and time-consuming process.
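The dilution argument can be made concrete with a little arithmetic. The sketch below assumes a simple uniform-noise model: every clean example is accompanied by `alpha` noisy examples of the same true class, and their labels are drawn uniformly from `k` classes. The model and the function name `label_shares` are illustrative assumptions and not the exact protocol of Rolnick et al. The sketch computes the share of those examples that carry the correct label versus any single wrong label.

```python
from fractions import Fraction


def label_shares(alpha: int, k: int) -> tuple[Fraction, Fraction]:
    """Label shares among training examples of one true class under uniform label noise.

    Hypothetical model: every clean example is accompanied by `alpha` noisy
    examples of the same true class whose labels are drawn uniformly from `k` classes.
    Returns (share carrying the correct label, share carrying any one given wrong label).
    """
    total = 1 + alpha
    # The clean label, plus the noisy labels that land on the true class by chance.
    correct = 1 + Fraction(alpha, k)
    # Noisy labels land on each of the k - 1 wrong classes with equal frequency.
    wrong_each = Fraction(alpha, k)
    return correct / total, wrong_each / total


correct, wrong = label_shares(alpha=100, k=10)
print(correct, wrong)  # 11/101 10/101
```

In this setting, about 89% of the labels are wrong. Even so, the correct class still receives the largest share (11/101 versus 10/101 for each wrong class), and that is the signal a sufficiently large model and dataset can exploit. The spatial argument above has the same structure: a pseudo-label only needs to be more likely to be the true species than any single alternative.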
2. Related Work
Ground assessment of tree resources is time-consuming, especially in densely forested areas. Recent progress in remote sensing and DL has improved the quality of tree species classification and mapping initiatives. Remote sensing provides large-scale coverage and more frequent surveying, making it a promising and effective alternative for classifying tree species. In this section, we describe prominent work in this field, while Table 1 summarizes the main results of the related efforts under study.
Regarding medium spatial resolution satellite imagery, Axelsson et al. [16] demonstrated that Sentinel-2 can be used effectively for tree species classification. Similarly, Persson et al. [17] and Puletti et al. [18] used Sentinel-2 to classify forest types and tree species, reaching similar conclusions.
High spatial resolution satellite imagery (e.g., from IKONOS or WorldView satellites) has also been employed, providing more detailed images of tree canopies. For instance, Immitzer et al. [19] studied the classification of three tree species in a temperate forest in Germany using WorldView-2. Likewise, Fang et al. [20] used multi-temporal WorldView-3 imagery to classify tree species at different taxonomic levels in Washington, D.C. In general, moving from medium to high spatial resolution imagery improves classification accuracy because of the finer detail captured.
Combining visible-band satellite imagery with other modalities, such as airborne LiDAR and multispectral imagery, can aid tree classification. For example, Wang et al. [21] showed that integrating LiDAR with visible-band satellite imagery increased classification accuracy by 10%. Immitzer et al. [22] found that using additional spectral bands from multispectral imagery significantly improved user’s accuracy, by 8%, when classifying various tree species.
Modern approaches incorporate AI-based techniques such as random forests [23], CNNs, LSTMs, and multilayer perceptrons (MLPs), which have shown strong capabilities in identifying and classifying tree species [9]. For instance, Welle et al. [24] used Sentinel-2 imagery and the XGBoost machine learning model to map tree species across Germany, harnessing multi-temporal and multi-spectral data to capture species phenology across seasons. Likewise, Lechner et al. [14] fused data from Sentinel-1 and Sentinel-2 to enhance species categorization in the Wienerwald Biosphere Reserve in Austria. Furthermore, a study by He et al. [7] in Qingyuan County, China showed that the ResNet50 [25] model performed best among several implemented DL models. The authors also highlighted the difficulty of differentiating relatively similar species and showed that alternative image analysis methods (such as PCA) and additional data inputs (e.g., NDVI) may improve species classification. Similarly, Li et al. [26] examined several CNN architectures (i.e., ResNet [27] and DenseNet [28]) for classifying individual tree species from high spatial resolution satellite imagery, further demonstrating the applicability of CNNs to the tree classification task. Together, these studies demonstrate the increasing precision and scalability of DL techniques across different landscapes.
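For reference, the NDVI mentioned above is the standard normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), computed per pixel. The sketch below is a generic illustration and not the preprocessing used by He et al. It makes three assumptions: the bands arrive as flattened lists of reflectance values, the function is named `ndvi`, and 0.0 is used as the fill value when the denominator is zero.

```python
def ndvi(red: list[float], nir: list[float]) -> list[float]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two flattened bands of equal length."""
    if len(red) != len(nir):
        raise ValueError("red and nir bands must have the same number of pixels")
    out = []
    for r, n in zip(red, nir):
        denom = n + r
        # Guard against division by zero (e.g., no-data pixels); 0.0 is an assumed fill value.
        out.append((n - r) / denom if denom != 0 else 0.0)
    return out


# Vegetation reflects strongly in NIR and absorbs red light, pushing NDVI towards +1.
print(ndvi([0.05, 0.30, 0.0], [0.45, 0.30, 0.0]))
```

Because NDVI is a ratio, it is partly insensitive to overall illumination differences. This is one reason it is a common auxiliary input alongside raw NIR and visible bands.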
Other studies have shown that including geomorphological data, such as digital elevation models (DEMs) and topographical features, enhances classification accuracy, especially in hilly or mountainous areas. Prodromou et al. [13] used random forests to map the main forest habitats of Cyprus from Sentinel-1 and Sentinel-2 satellite imagery along with topographical features, improving overall accuracy by 10% compared with using Sentinel-2 alone. Liu et al. [29] reported that integrating DEMs with Sentinel-1, Sentinel-2, and Landsat-8 imagery improved the overall accuracy of forest type classification in Wuhan, China by 5.96%. Similarly, Chiang et al. [30] identified the major tree species in northern Mongolia using Landsat 8 imagery along with topographic factors derived from ASTER GDEM v2. Their results showed that incorporating variables such as elevation, slope, aspect, and the topographic wetness index significantly increased overall accuracy, from 71% to 81%. The value of integrating geomorphological features into vegetation classification was also shown by Yu et al. [31] in a study of Inner Mongolia’s grasslands and by Chiang et al. [32] in a study of Mongolia’s mountainous regions.
Finally, weakly supervised learning approaches look promising for overcoming data limitations in tree species classification. Illarionova et al. [33] recently employed weakly supervised learning for tree classification in Russian boreal forests based on Sentinel-2 imagery. Their methodology addressed weak and uneven ground truth information with a weakly supervised neural network architecture that corrected the species labels according to the species typical of each stand. Combining this weak labeling with object-wise sampling improved the overall classification performance (F1 score from 0.68 to 0.76).
To sum up, Table 1 lists the key findings of each paper mentioned above, indicating the study area, the datasets used, the type of classification task, the techniques employed, the number of classes, and the results reported with each study’s own metric(s). We conclude that higher spatial resolution satellite data yield better results, with improvements of 5–10% in overall accuracy [26]. DL models perform better in classification tasks involving multiple classes, achieving 84.91% validation accuracy in [7]. Moreover, integrating topographic information improves classification results, with an overall accuracy increase of 10% in [13], because topographic features influence vegetation types and distributions.
Table 1. Overview of techniques and outcomes in forest classification and monitoring.
| Authors | Study Area | Data Used | Task | Technique or Model Used | Classes | Results |
|---|---|---|---|---|---|---|
| Immitzer et al. [19] | Bavaria, Germany | WorldView-2 and Landsat | Tree species classification and mapping | Random forest | 3 | R² = 0.76 |
| Li et al. [26] | York University, Toronto, Canada | WorldView-2 | Individual tree species classification | ResNet18 | 4 | Overall accuracy: 90.9% |
| Fang et al. [20] | Washington, DC, USA | WorldView-3 | Tree species classification | Random forest | 19 | Overall accuracy: 61.3% |
| Yin et al. [34] | Central Asia | WorldView-3 | Forest cover mapping | Random forest | 3 | Overall accuracy: 83% |
| Axelsson et al. [16] | Southern Sweden | Sentinel-2 | Tree species classification | Bayesian inference with maximum likelihood classification | 4 | Overall accuracy: 87% |
| Persson et al. [17] | Remningstorp, Sweden | Sentinel-2 | Tree species classification | Random forest | 5 | Overall accuracy: 88.2% |
| Puletti et al. [18] | Tuscany, Italy | Sentinel-2 | Tree species classification | Random forest | 4 | Overall accuracy: 86.2% |
| Welle et al. [24] | German forests | Sentinel-2 | Dominant tree species classification | XGBoost | 7 | F1 scores: 0.69–0.96 |
| Lechner et al. [14] | Wienerwald Biosphere Reserve, Austria | Sentinel-1 and Sentinel-2 | Tree species classification | Random forest | 12 | Overall accuracy: 83.7% |
| He et al. [7] | Qingyuan County, Zhejiang Province, China | Sentinel-2 | Forest tree species classification | ResNet50 | 8 | Validation accuracy: 84.91% |
| Prodromou et al. [13] | Cyprus (Paphos, Akamas, Troodos) | Sentinel-1, Sentinel-2, and topographical features | Forest habitat mapping in Natura 2000 sites | Random forest | 8 (Akamas), 9 (Paphos), 6 (Troodos) | Overall accuracy: 91–94% |
| Liu et al. [29] | Wuhan, China | Sentinel-1A, Sentinel-2A, Landsat-8, DEM | Forest type classification | Object-based random forest | 9 | Overall accuracy: 82.78% |
| Chiang et al. [30] | Erdenebulgan County, Mongolia | Landsat 8 and ASTER GDEM | Tree species classification | Maximum entropy (MaxEnt) | 4 | Overall accuracy: 81% |
| Yu et al. [31] | Inner Mongolia, China | Sentinel-2 and DEM | Grassland classification | Random forest | 3 | Overall accuracy: 83.41–96.97% |
| Illarionova et al. [33] | Leningrad Oblast, Russia | Sentinel-2 | Tree species classification | CNN with weakly supervised classification and object-wise sampling | 4 | F1 score: 0.76 |
5. Results
This section evaluates the model’s classification performance on the task under study.
Table 5 presents a comprehensive performance comparison of ResNet50 models trained with different combinations of the multimodal input data. We assumed that topographical and soil features (elevation, aspect, slope, and soil type), integrated as auxiliary input data, would provide additional contextual information about the likelihood of certain tree species occurring in different geomorphological landscapes of the island [50]. Using ResNet50 as the baseline, the lowest performance was observed when the model was trained on RGB images without pretrained weights (classification accuracy of 76%). With weights pretrained on ImageNet, the accuracy reached 85%. Incorporating soil data slightly improved the accuracy (86%). Similarly, adding NIR-band images to the RGB images improved the accuracy slightly further (87%). Incorporating elevation, slope, and aspect increased the accuracy to 88%. The highest performance was achieved when the model combined RGB and NIR images with all geomorphological characteristics (i.e., elevation, aspect, slope, and soil), reaching a classification accuracy of 90% and a score of 0.90 for all other performance metrics (F1 score, precision, and recall). These results highlight the importance of fusing multiple datasets and modalities to achieve high classification performance.
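A useful way to picture this fusion is to pair a 4-channel (R, G, B, NIR) image with a vector of auxiliary site descriptors for each tree sample. The sketch below shows one plausible way to build both inputs in plain Python. It is not necessarily the encoding used in this study. The scaling constants, the sine/cosine encoding of aspect, the soil category list, and the function names `stack_bands` and `auxiliary_features` are all illustrative assumptions.

```python
import math

# Hypothetical soil categories; the real soil map legend may differ.
SOIL_TYPES = ["loam", "clay", "rocky", "gravelly sand"]


def stack_bands(rgb, nir):
    """Append the NIR value to each RGB pixel, giving a 4-channel (R, G, B, NIR) image.

    rgb: rows of (r, g, b) tuples; nir: rows of NIR values with the same shape.
    """
    return [[(*px, n) for px, n in zip(rgb_row, nir_row)] for rgb_row, nir_row in zip(rgb, nir)]


def auxiliary_features(elevation_m, slope_deg, aspect_deg, soil):
    """Encode site descriptors as a flat numeric vector for a separate fusion branch."""
    aspect_rad = math.radians(aspect_deg)
    one_hot = [1.0 if soil == s else 0.0 for s in SOIL_TYPES]
    return [
        elevation_m / 2000.0,   # assumed scale: Cyprus peaks lie below 2000 m
        slope_deg / 90.0,       # slope mapped to [0, 1]
        math.sin(aspect_rad),   # aspect is circular, so 359 deg and 1 deg stay close
        math.cos(aspect_rad),
        *one_hot,
    ]
```

In a late-fusion design, the 4-channel image would feed the CNN backbone, whose first convolution then needs four input channels. The auxiliary vector would feed a small fully connected branch, and the two embeddings would be concatenated before the classifier. Encoding aspect as a sine/cosine pair avoids the artificial jump between 359° and 0° that a raw angle would introduce.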
Table 6 shows the per-class performance of the best-performing ResNet50 model (final row in Table 5). The model achieved high precision, recall, and F1 scores for most classes. It scored highest for olive trees, with a precision of 0.95, recall of 0.93, and F1 score of 0.94. Similarly, Juniperus and vine were accurately classified, with F1 scores of 0.91. The Quercus alnifolia and Pinus nigra classes performed slightly worse, with F1 scores of 0.84 and 0.87, respectively. Overall, the ResNet50 model performed well (above 0.80 on all metrics) for all tree classes.
The confusion matrix in Figure 10 provides a detailed breakdown of the model’s performance across vegetation classes, showing the number of correct and incorrect predictions for each class. Each cell contains the number of test-set samples of a given true class that the model assigned to a given predicted class. The diagonal cells (from top left to bottom right) show the correct predictions for each class, while the off-diagonal cells indicate misclassifications. Olive trees had the most correct predictions (4776), while Cedrus brevifolia was correctly predicted 274 times with few misclassifications. Misclassifications occurred mainly between similar classes (i.e., classes with low inter-class variability), such as Pinus brutia and Pinus nigra, as well as between fruit-bearing trees and olive trees. These misclassifications were likely due to shared spectral or textural features, as well as potential overlaps in ecological environments, which the model found challenging to separate. Overall, the model demonstrated strong classification ability, with most predictions correct, as indicated by the concentration along the diagonal. Nevertheless, further refinement of the input features, additional training data for overlapping classes, or more advanced feature engineering could help mitigate the confusion between closely related classes. This will be explored in future work.
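The per-class scores in Table 6 follow from a confusion matrix like the one in Figure 10:

- Precision, called user's accuracy in remote sensing, divides each diagonal count by its column total.
- Recall, called producer's accuracy, divides it by its row total.
- F1 is the harmonic mean of the two.

The sketch below assumes that rows are true classes and columns are predicted classes. The 3-class matrix is a toy example and not data from this study.

```python
def per_class_scores(cm):
    """Return a (precision, recall, f1) tuple per class from a square confusion matrix.

    Convention assumed here: cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column total: all samples predicted as c
        actual = sum(cm[c])                           # row total: all samples truly of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 0, 5]]
print(per_class_scores(toy))
print(overall_accuracy(toy))  # 0.88
```

As a consistency check on Table 6, the olive scores (precision 0.95, recall 0.93) give 2·0.95·0.93/(0.95 + 0.93) ≈ 0.94, which matches the reported F1 score.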
Geographical Distance for Pseudo-Labeling
Here, we describe the experiments performed to select the best distance for pseudo-labeling our ground truth data (see Section 4.2.2). We tested pseudo-labeling distances of 50, 100, 200, 300, and 400 m from the ground truth labels, training the ResNet50 model with the best configuration of data sources, namely the combination of RGB and NIR images, elevation, slope, aspect, and soil (see Table 5). The results, shown in Table 7, suggest that a distance of 200 m was the best choice for the pseudo-labeling process.
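The distance experiment can be illustrated with a simple radius rule. An unlabeled tree location inherits the species of the nearest expert-labeled tree if that tree lies within a threshold distance (200 m being the best value found above); otherwise it stays unlabeled. Several parts of this sketch are assumptions for illustration: the nearest-neighbor rule, the haversine distance on a spherical Earth, and the names `haversine_m` and `pseudo_label`. The exact procedure of Section 4.2.2 may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pseudo_label(unlabeled, labeled, max_dist_m=200.0):
    """Give each unlabeled (lat, lon) the species of the nearest labeled point within max_dist_m.

    labeled: list of ((lat, lon), species) pairs. Returns one species (or None) per input point.
    """
    result = []
    for lat, lon in unlabeled:
        best, best_d = None, max_dist_m
        for (llat, llon), species in labeled:
            d = haversine_m(lat, lon, llat, llon)
            if d <= best_d:  # keep the closest label found so far within the radius
                best, best_d = species, d
        result.append(best)
    return result


labeled = [((35.0, 33.0), "Pinus brutia"), ((35.01, 33.0), "Olive")]
unlabeled = [(35.001, 33.0), (35.005, 33.0), (35.0095, 33.0)]
print(pseudo_label(unlabeled, labeled))  # ['Pinus brutia', None, 'Olive']
```

The brute-force loop is quadratic in the number of points. At country scale, a spatial index such as a k-d tree on projected coordinates would be the natural replacement. Sweeping `max_dist_m` over 50–400 m mirrors the trade-off tested in Table 7: larger radii yield more training samples but also more label noise.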
6. Discussion
The adoption of satellite imagery and AI, together with topographical features, for tree classification not only improves the census of forest resources but also plays a crucial role in forest carbon inventory and stock assessment. This approach is particularly beneficial in remote areas that are difficult to reach on foot. It offers a cost-effective alternative to ground surveys and a dynamic methodology for frequent surveying, which is crucial for prompt decision making and policy development. The findings of our study underscore the effectiveness of integrating high spatial resolution visible-band satellite imagery, NIR imagery, and topographical features as multi-modal inputs to DL models for accurate tree species classification. Most importantly, our results indicate that country-scale surveys can be achieved using only sparsely labeled data and weakly supervised approaches for training AI models, where techniques such as pseudo-labeling appear to work well. This study also shows that training a deep learning model on an extensive but partially incomplete dataset can be more effective for tree mapping from satellite imagery than relying solely on smaller, highly accurate datasets.
6.1. AI Model Performance
The ResNet50 model trained on RGB and NIR images together with elevation, slope, aspect, and soil data performed best, with a classification accuracy of 90% on the test data for nine tree species classes.
Table 5 shows the impact of integrating multiple sources of relevant data when training DL models: each additional source contributed, to some extent, to better classification results. Multi-temporal and multi-spectral (beyond NIR) satellite data, as well as LiDAR data, could likely improve accuracy further; this is left for future work.
Compared with existing research on tree species classification based on satellite imagery and AI [16,17,24], our results are on par with the state of the art. They are all the more significant given the weakly supervised approach followed. Direct comparison with other works is difficult because of differences in techniques, datasets, metrics, numbers of classes, and input data. With some caution, we claim that our AI model surpasses the results of He et al. [7], who achieved an 84.91% validation accuracy on the FTSD dataset by utilizing PCA and NDVI based on Sentinel-2 satellite imagery to classify nine tree species. Similarly, Lechner et al. [14] used Sentinel-1 and Sentinel-2 data for tree classification, reaching a classification accuracy of 83.2%. By combining Landsat-8 data and topographic features with Sentinel-1 and Sentinel-2 imagery, Liu et al. [29] achieved an accuracy of 82.78%.
A possible reason for the good performance of our ResNet50 model is its capacity to capture and properly encode a wide range of diverse input modalities. The per-class results in Table 6 show that performance was high for the majority of tree classes. Our approach achieved precision and recall values close to 0.9 for the olive, Juniperus, and vine classes. Most misclassifications occurred between Pinus brutia and Pinus nigra, as well as between fruit-bearing and olive trees. This confusion can be attributed to the similar characteristics of those classes, in particular their similar spectral signatures in satellite imagery, which are challenging to distinguish. The lower scores for some classes, such as Quercus alnifolia and Pinus nigra, can be explained by the small number of training examples available.
6.2. Topographical Features and Tree Species
To examine the relation between topographical features and tree classes, we developed a range of geovisualizations. Figure 11 shows the distribution of tree classes with respect to elevation as a box plot. Tree classes have different elevation preferences. For example, Quercus alnifolia is found at higher elevations, with an interquartile range (IQR) of 600–1000 m. Cedrus brevifolia and Pinus nigra prefer hilly areas and flourish at elevations of around 1000 m. In contrast, vine, olive, and Juniperus trees span a wide elevation range but are more common below 500 m. Pinus brutia and Cedrus brevifolia seem to adapt to different elevations, whereas Ceratonio rhamnion has a narrower distribution.
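The interquartile ranges quoted above are the span between the first and third quartiles of each class's elevation values. A minimal sketch using Python's `statistics` module follows. The sample elevations are invented for illustration and are not the study's data. The quartile method (`"inclusive"`) and the function name `elevation_iqr` are assumptions, since box-plot tools differ in how they interpolate quartiles.

```python
from collections import defaultdict
from statistics import quantiles


def elevation_iqr(samples):
    """Map each class name to (Q1, Q3) of its elevations.

    samples: iterable of (class_name, elevation_m) pairs; each class needs at least two values.
    """
    by_class = defaultdict(list)
    for name, elev in samples:
        by_class[name].append(elev)
    out = {}
    for name, elevs in by_class.items():
        # n=4 returns the three quartile cut points: Q1, median, Q3.
        q1, _median, q3 = quantiles(elevs, n=4, method="inclusive")
        out[name] = (q1, q3)
    return out


toy = [("A", 600), ("A", 700), ("A", 800), ("A", 900), ("A", 1000),
       ("B", 100), ("B", 200), ("B", 300), ("B", 400), ("B", 500)]
print(elevation_iqr(toy))
```

The box in a box plot spans exactly this (Q1, Q3) interval per class, so comparing these pairs is a compact numeric counterpart to reading Figure 11.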
Similarly, regarding the relation between soil type and tree class, Figure 12 indicates that olive trees grow in diverse soils, especially loam, clay, and rocky types, while Ceratonio rhamnion appears mostly in rocky and loamy soils. Fruit-bearing trees grow in gravelly sand and loamy soil, while Quercus alnifolia prefers rocky and loamy soils.
6.3. Interpretation of Results
The Department of Forestry of the Republic of Cyprus is concerned about the spatial expansion of Pinus nigra, which competes with the more native Pinus brutia. Our tree census helps in understanding the trends in this expansion, as well as the geomorphological conditions that favor it. Figure 13 illustrates that Pinus nigra prefers higher elevations than Pinus brutia. While both species tend to grow in rocky mountain areas, Pinus brutia also grows in sandy areas, as can be seen in Figure 14. Such observations are significant for local policymakers, and the velocity of the expansion of Pinus nigra is also important for them to assess the urgency of the problem and whether direct measures are needed.
6.4. Limitations
While this study advances tree classification and forest censuses based on Earth observation and AI, mainly through its methodology (i.e., weakly supervised learning), it has certain limitations. First, the study included only nine tree classes, because only those classes had at least 50 annotated images, even with the data imputation methods used. Classes such as palm, banana, walnut, and fig trees were left out. Furthermore, our testing was based on the ground truth information originally provided by experts (Department of Forestry) rather than on a subsequent visual inspection of the results by the same or other experts.
6.5. Future Work
Future research will focus on including additional tree species classes and employing experts to further validate our classification results. We also plan to perform tree classification for previous years (e.g., 10 years ago) to assess the velocity of the spatial expansion of Pinus nigra, as requested by local policymakers. Moreover, we intend to experiment with multi-temporal and multi-spectral satellite imagery and investigate the potential improvements in accuracy. Specifically, we aim to reduce the misclassifications between Pinus brutia and Pinus nigra by incorporating additional spectral bands or higher-resolution imagery, potentially refining the model’s ability to capture subtle interspecies differences.
Finally, a common problem in related work is that most methodologies and implementations are site-specific and therefore of limited utility, as they are difficult to replicate in other ecosystems and landscapes. We will work on developing more robust and adaptive methodologies and models that can work effectively across diverse areas and scenarios.