1. Introduction
For several decades, there has been a growing interest in mapping the distribution and conservation status of ecosystems as an effective tool for the conservation of natural resources [
1,
2]. One of the main reactions to the biodiversity crisis was the development of conservation areas, primarily designed to maximize the performance of investments in conservation and minimize conflicts with human activities [
3]. For example, the Habitats Directive, adopted in 1992, launched the EU-wide Natura 2000 ecological network of protected sites to safeguard the natural habitats of wild fauna and flora against potentially damaging developments. The directive states that each European member state should monitor the conservation status of special areas of conservation (SACs) recognized in its territory, which in the Spanish case refers to 14.2% of its territory. SAC monitoring is generally carried out through the general matrix of conservation status assessment [
4], which contains the reference thresholds to assess the conservation status of these sites based on their area, range, structures and functions, and future prospects for the survival of the habitat. For each of these parameters, three possible values must be established: favorable, unfavorable-inappropriate, or unfavorable-bad. Traditionally, SAC monitoring has been carried out through photo interpretation techniques and field work performed by botanical experts [
5]. However, these traditional methods are not effective in developing information bases on the vegetation cover and structure at a large-spatial scale, due to its high economic and human cost for long-term monitoring under a limited budget [
6]. In this sense, it is urgent to find reliable, accurate, and inexpensive monitoring systems for natural habitats and terrestrial ecosystems [
6].
Alternatively, available techniques, such as remote sensing, could offer advances in mapping and managing ecosystem conservation, constituting a practical and economic alternative. In this sense, essential biodiversity variables (EBV-s) represent spatially explicit and scalable proxies of individual biophysical parameters relevant to biodiversity [
7,
8]. The development of EBV-s was raised with satellite remote sensing-based monitoring at a global scale in mind, but biodiversity monitoring at finer scales is currently a demand for biodiversity conservation. Forest structure is a relevant indicator of biodiversity evaluation and conservation status in forest ecosystems, since greater structural diversity correlates with the presence of more ecological niches and faunal habitats [
9,
10,
11,
12]. Structural diversity patterns are an essential source of information for the management of Natura 2000 Network sites [
13]. Even though the forest structure and its three-dimensional organization are among the most problematic components to quantify in natural ecosystems [
14], light detection and ranging (LIDAR) has been applied to analyze the structure of forests with success [
13,
15,
16], enabling the prediction of species richness [
17,
18]. Specifically, positive correlations have been found between the diversity of heights in forests and the richness of bird species [
19,
20], as well as with some other mammals, such as the endangered Delmarva fox squirrel [
21] or the red vole (
Arborimus longicaudus) [
22], among others. Furthermore, Guo et al. [
23] developed an inventory of the vegetation structure in Alberta (Canada) that efficiently synthesizes the vertical variation of forests into different categorical classes for their use in conservation and management activities.
Despite the trend of studies using LiDAR data to determine a habitat’s specific richness, which is gradually increasing, not many address the conservation status of different habitats of interest. Zlinszky et al. [
24] used the random forest (RF) classification technique to create conservation status fuzzy maps based on information obtained from leaf-on and leaf-off point clouds (10 pt/m
2) collected in an alkali grassland area. They concluded that very few (if any) remote sensing-derived variables correlate closely with biodiversity indicators, regardless of the local habitat. Simonson et al. [
25] tested the combination of airborne multispectral and laser scanning (LiDAR) for habitat mapping and a robust indication of conservation status in
Quercus suber forests in southern Portugal. By using relationships between forest vegetation structure, species diversity, and LiDAR height metrics, these authors predicted and mapped areas of high, medium, and low conditions, defined as the spatial distribution and structure of habitat patches. Although the combination of multispectral and LiDAR data improved the accuracy of habitat mapping [
24] and other forest structural variables, such as fuel types [
26], few examples have tested the combination of multiple sensors for conservation status estimation, and, to the best of our knowledge, none of them combined single-photon LiDAR and Sentinel-2 data. The combination of both types of remote sensing data can be essential, since LiDAR can provide information directly related to the structural diversity of the forest and multispectral data can be directly related to species richness and vegetative state.
In the last few years, new LiDAR metrics describing canopy height distributions have been developed to quantify the complexity of the canopy and classify the structure of the forest according to ecological functions [
27]. These are accurate and easily interpretable [
16], so they are frequently included as explanatory variables in habitat models for forest wildlife [
28,
29]. The L moments [
30], for example, have recently started to appear in the ecology and forest management LiDAR applications literature [
27,
31]. To study the occupation patterns of red tree voles, Johnston and Moskal [
22] have generated complex variables derived from LiDAR, such as the diversity of foliage height, canopy connectivity, or border density, to determine whether they are highly correlated with direct measures of canopy height. Listopad et al. [
15] developed the adaptation of the Shannon (LiDAR height diversity index, LHD) and Pielou (LiDAR height evenness index, LHEI) biodiversity indices, calculated from the three-dimensional data provided by LiDAR technology. The first one arises from a mathematical model [
32] that seeks to measure the diversity of species in a determined area. This index must be combined with the second one, since the equitability of species richness is also needed to conduct a correct study of structural biodiversity [
33]. However, the utility of many of these metrics has not been explored in wildlife and forest habitat models of conservation status.
The main objective of this research is to evaluate the potential of single photon LiDAR and Sentinel-2 data for the classification of conservation status in four special areas of conservation in Navarra Province (Spain) that comprise three habitats: Quercus faginea (official code 9240), Quercus pyrenaica (official code 9230), and Quercus robur (official code 9160) included in Annex I of the Habitats Directive. The secondary aims are: 1) to test the performance of structural biodiversity-related LiDAR metrics to classify oak grove conservation statuses, 2) to test the performance of classification with the combination of Sentinel-2 and simple photon LiDAR data, and 3) to test the performance of two different machine-learning methods: random forest (RF) and support vector machine (SVM).
2. Study Area
The oak groves under study are located in Navarra Province, which has one of the highest levels of biodiversity in the EU due to the confluence of the Alpine, Atlantic, and Mediterranean biogeographic regions [
34]. A total of four study zones have been selected, where three habitats were studied:
Quercus faginea (9240),
Quercus pyrenaica (9230), and
Quercus robur (9160) (
Figure 1).
Zone A includes 135 ha of Galician-Portuguese forest dominated by
Quercus robur (9160) and
Quercus pyrenaica (9230). It is located within the SAC “Belate” (ES2200018), characterized by the uniqueness and diversity of habitats, flora, and fauna, which have been recognized through its protection in the Habitats Directive. According to the Digital Climatic Atlas of the Iberian Peninsula (DCAIP) [
35], the average annual temperature in this area is 13 °C, and the annual precipitation ranges from 1000 to 1300 mm. Forests extend from the limits of Quinto Real to Usategieta in Leitza. Pyrenean oaks comprise relatively poor and open forests, with a distinctive acidophilic flora (Acer
sp., Sorbus sp., Ilex aquifolium, Crataegus monogyna, i.e.,). Habitat
Quercus pyrenaica (9230) is found on acidic substrates with elevations ranging between 400 and 1600 m above sea level (a.s.l.) and slopes ranging from 20% to more than 40% [
36].
Zone B covers 314 ha of sub-Atlantic and Central European oak-hornbeam forests (
Quercus robur habitat, 9160). The oak grove forests are managed by the Ultzama and Basaburua valley plans (ES2200043) and have been recognized through their protection as a Natura 2000 special area of conservation. Forests are characterized by deep and hydromorphic soils. Silty or clay-silty rocks dominate the lithology, whose limited extension prevents the existence of a specific fauna, such as Rana dalmatina or the middle-spotted woodpecker (
Dendrocopos medius) [
37], though the habitat serves as a refuge for other species, typical of deciduous forests or nearby habitats. The elevation ranges from 500 to 700 m a.s.l., and the slope steepness ranges from 2–3% to 0.4–0.5%. The lithology is mainly composed of washed brown earth acids [
38].
In this area, the annual mean precipitation reaches approximately 1347 mm, and the average annual temperature is 13 °C. Forests are characterized by a smooth topography, with elevations below 400 m.
Zones C and D include 533 ha of Iberian oak forests of
Quercus faginea and
Q. canariensis. These zones are located within the “Montes de Valdorba” (ES2200322) and “Sierra de Lokiz” (ES2200022) SACs, which have a management plan for monitoring the conservation status of
Quercus faginea habitat. Forests are dominated by
Quercus faginea, which lives on neutral substrates between 500 and 1500 m a.s.l. In “Sierra de Lokiz”, the annual precipitation ranges from 800 to 1200 mm, and the average annual temperature varies from 10 °C to 12 °C [
39]. However, in “Montes de Valdorba” the average annual temperature is 12.5 °C, and precipitation ranges from 600 to 900 mm.
3. Materials and Methods
A four-step methodology was conducted to classify and map the conservation status of oak grove habitats,
Quercus robur, Quercus pyrenaica, and
Quercus faginea (codes 9160, 9230, and 9240, respectively) (
Figure 2). Initially, Sentinel-2 and LiDAR data were gathered and processed to develop radiometric indices and LiDAR metrics related to biodiversity. Furthermore, to cover the variability of the study sites, a representative random sample was created. In a third step, the Kruskal–Wallis and Dunn statistical tests were used to compare conservation status. Finally, the most suitable variables with an ecological significance were selected to classify and map the conservation status.
3.1. Conservation Status Reference Data
Navarra’s provincial community carried out its own habitat inventory (2004) of the areas included in the Natura 2000 Network. It was executed taking as a reference the enclosures of Crop and Exploitation Map (MCA) at a 1/25,000 scale. The data derived from the qualitative evaluation of conservation status carried out between 2004 and 2010 [
36,
40] were used for the analysis. The conservation status was classified into three categories: 1. unfavorable-bad (UB); 2. unfavorable-inappropriate (UI); and 3. favorable (F).
Habitats in the favorable status category are mature forests with a balanced distribution of tree ages and size classes. The forests may have open gaps due to the fall of old trees, favoring natural regeneration. The stands present little sign of human activity, raising their ecological value and microhabitat presence. There is also a high presence of dead wood on the ground or standing in various states of decomposition.
Unfavorable-inappropriate status integrates forests with a closed canopy, but older trees have not reached mature age yet. Dense regrowth of young trees in high competition for light could be found below the main canopy. The presence of standing dead wood of a small diameter is mainly due to young trees’ mortality.
The unfavorable-bad status category is characterized by a predominance of young trees originated from seed that form a forest in an initial state of succession, although stump or root sprouts are also frequent. Habitats resemble young forests, generally with lower canopy density and a lower amount of dead wood, generated by encroachment or large forest clearings.
3.2. ALS and Sentinel-2 Data Description
In this section, we describe the ALS and Sentinel-2 datasets used in this work. The ALS data were captured in 2017 by the Spanish National Plan for Aerial Orthophotography (PNOA) using a Single Photon LiDAR (SPL100). The SPL100 operates at a wavelength of 532 nm and collects up to 6 million points per second, with an average pulse repetition frequency of 60 KHz and a flying altitude between 3,500 and 6,000 m above the ground. Its maximum scanning angle is 25° and RMSE obtained in Z values was 6–10 cm. Point clouds, with a density of 14 points per m
2, were provided in 1 × 1 km tiles in LAS format in European Terrestrial Reference System (ETRS) 1989 Universal Transverse Mercator (UTM) and orthometric heights [
41]. A total of 80 LAS files, publicly available at the CNIG Download Centre and at the Spatial Data Infrastructure of Navarra (IDENA), were selected to cover the extension of the Zones A, B, C, and D.
The cloud-free image from the Sentinel-2A/B MSI sensor, of medium-high spatial resolution, and captured on 26th September 2019, was obtained from the European Space Agency’s (ESA) Sentinel Scientific Data Hub (
https://scihub.copernicus.eu/) with a processing Level-2A that refers to the bottom of the atmosphere (BOA), being atmospherically and topographically corrected.
3.3. ALS Data Processing and Metrics Computation
The return heights of the point clouds were normalized using FUSION LDV 4.0 open-source software [
42] by subtracting the elevation data from a digital elevation model (DEM) with 2 m grid resolution, provided by IDENA.
A complete set of statistical metrics, related to vertical and horizontal structure metrics commonly used within forestry, were generated using FUSION LDV 4.0 at a spatial resolution of 20 m, due to the regional scale of the analysis and to better match the spatial resolution of Sentinel-2 data. Following Domingo et al. [
26], a threshold value of 0.2 m height was applied to remove understory and ground returns, considering also the RMSE in Z values of ALS-PNOA data. Metrics related to vegetation height include different intervals of return distribution percentiles (P01, P05, P10, P20, P30, P40, P50, P60, P70, P75, P80, P90, P95, and P99), the minimum, maximum, median, and mode of height distribution (Elev.min, Elev.max, Elev.mean, and Elev.mode), quadratic and cubic elevation (Elev. SQRT mean SQ, Elev. CUR mean CUBE), and several L moments (Elev. L1, Elev. L2, Elev. L3). L moments are canopy height metrics computed using linear functions of ordered data that suffer less from sample variability and are more robust to outliers in the data [
30]. The computed metrics related to vegetation height variability correspond to skewness (Elev.skewness), kurtosis (Elev.kurtosis), coefficient of variation (Elev.CV), variance (Elev.variance), standard deviation (Elev. SD), and interquartile distance (Elev.IQ). Metrics related to canopy cover include the percentage of first or all returns above a threshold, the canopy relief ratio (CRR), the mean or the mode (e.g., % first ret. above 0.20), the percentage of all returns with a range of 0.5 m (e.g., % all ret. between 1 and 1.5 m), and the ratio of all returns with respect to the number of total returns (e.g., (all ret. above 0.20)/(total first ret.) by 100)).
Furthermore, the return proportion at every 0.5 m was computed as a measure of strata density. The strata metrics were used to subsequently compute two structural diversity indexes. The LHEI (LiDAR height evenness index), which is an adaptation of the Pielou’s index, and the LHDI (LiDAR height diversity index), which resembles the Shannon-Weiner diversity index. According to Listopad et al. [
15], these structural diversity indices constitute valuable surrogates for structural and species biodiversity, allowing the development of ecosystem-specific parameters as quantifiable targets for ecosystem conservation. The two structural diversity indexes were computed by applying the formulas proposed by Listopad et al. [
15] in R software (see Equations (1) and (2)).
where p is the proportion of returns every 0.5 m (h).
3.4. Sentinel-2 Data Processing and NDVI Derivation
The normalized differential vegetation index (NDVI) has been derived from Sentinel-2A images that have been atmospherically corrected (level 2A) using SNAP software (Sentinel toolboxes:
https://step.esa.int/main/download/snap-download/). According to Equation (3), the red band related to chlorophyll absorption (Band 4) and the relatively high reflectance in the near infrared (NIR) (Band 8) of vegetation are used to calculate the NDVI.
3.5. Sample Selection
A sample of the data was selected in order to reduce computational effort, avoid overfitting the model, and have better control over the results. The variability of the forest was considered using a representative sample stratified using four variables: (1) canopy cover (using the FCC metric), (2) terrain slope, (3) canopy height (using the P95 metric), and (4) conservation status of the habitats, obtained through stratified random sampling [
43].
Four classes have been established with respect to canopy cover (FCC), ranging from open or treeless spaces to high forest stand densities (see
Table 1). Considering that there is no standard categorization for all habitats and that FCC varies according to local ecological characteristics within each biogeographic region [
6], we defined four FCC ranges.
The slope of the terrain was obtained from the DEMs available in the spatial data infrastructure of Navarra (IDENA), used in the normalization of the point cloud. For this variable, three categories commonly used in its cartographic expression were established (see
Table 1), differentiating between gentle, medium, and steep slopes.
P95 is related to a standard forest’s dominant height and, in turn, provides information about the age and condition of the stand. As can be seen in
Table 2, P95 was categorized into four classes, referring to the age and condition of the forest stands in the three habitats to be analyzed. The P95 thresholds vary according to the habitat by considering the morphological characteristics of the forest stands.
The categorization created by GAN-NIK [
36] in three classes in relation to the conservation status of the different habitats was used: from number 1 for the most poorly conserved habitats to number 3 for those in an optimal conservation status.
The mapping of the LiDAR-derived variables was generated from the LiDAR-PNOA data building raster layers with a 20-m resolution. All the variables were reclassified in a geographical information system (GIS), using ArcGIS software (
https://www.esri.com/en-us/home), and overlapped to generate one layer. A total of 1000 random points in habitats
Quercus robur (9160) and
Quercus faginea (9240), and 450 in habitat
Quercus pyrenaica (9230) were created, considering a minimum distance of 20 m between random points.
Finally, after the first statistical analysis, some of the sample points considered outliers, potentially linked to changes or data mismatches, were visited in the field in August 2020 in order to check the conservation status class assigned. A total of 150 points were distributed among the three sites that were visited. The agreement between the classification of the conservation status reference data and the field was approximately 50%. The misclassified points were reclassified for subsequent analyses.
3.6. Assessment of the Suitability of ALS Metrics for Conservation Status Discrimination
Preliminary analysis determined the absence of normality in the data. The transformation of the data to a logarithmic decimal scale was accomplished as a feasible alternative for normalizing the data. However, the Shapiro–Wilk test performed in an R environment revealed that a normal distribution was not reached (
p-value < 0.05). Accordingly, the nonparametric Kruskal–Wallis test was computed in R [
44,
45] to select the LiDAR variables that were subsequently included in the classification process. The final selection of variables was created based on the presence of significant differences in the medians of the subgroups (
p-value < 0.01) and their ecological significance, according to previous literature. Finally, the Dunn’s test of multiple comparisons was performed in R to pinpoint existing differences in each selected variable between the different categories of conservation.
3.7. Digital Classification of Conservation Status
The performance of two frequently used nonparametric machine-learning classification methods was tested to classify conservation status: SVM and RF. SVM was computed using radial (SVMr) and linear (SVMl) kernels. The SVM cost parameter was parametrized within an interval of 1–1000 and gamma within 0.01–1. The RF classifier was parametrized by applying between 1 and 3,000 trees to growth (ntrees) and between 1 and 3 metrics in each node (mtry), in accordance with Rodrigues et al. [
46], and bias was corrected. The models were computed in an R environment using “e1071”, “randomForest” [
47] and “caret” [
48] for SVM and RF, respectively.
The classification was carried out using training and testing datasets based on a random sample of pixels (see the sample data section). Models were validated using the testing dataset, computed by applying a stratified random sampling of 30% to include the different conservation status. To increase robustness in the results, the validation was executed 100 times, and average performance values were computed [
49]. In order to compare and, subsequently, determine the best classification model, the classification confusion matrices, user’s and producer’s accuracy, and overall accuracy were assessed [
50]. The analysis of the results using LiDAR metrics on their own and in combination with the NDVI index was also performed.
5. Discussion
Monitoring conservation status in special areas of conservation (SACs) is compulsory for all European member states, which was traditionally performed by using photo interpretation techniques and fieldwork. These practices present high economic and human costs, requiring the development of efficient alternatives on a larger spatial scale. The results reveal the suitability of single-photon LiDAR data for classifying conservation status in Quercus faginea (9240), Quercus pyrenaica (9230), and Quercus robur (9160) habitats under the Habitats Directive.
LHEI, LHDI, P95, FCC, CV, and percentage of returns above the mean (PRAM) were the most suitable LiDAR metrics to pinpoint the differences between conservation statuses in
Quercus robur (9160) and
Quercus faginea (9240) habitats. Though LHEI, LHDI, and FCC LiDAR metrics were the most relevant in the
Quercus pyrenaica 9230 habitat, these results are in accordance with Zlinszky et al. [
24], who determined that the selected variables depend on the habitat type. The selected variables provide direct or indirect information on FCC, age, presence of old trees, diversity, structural complexity, and strata development. This seems to indicate that LiDAR metrics and structural diversity indices provide suitable information about conservation status and ecological characteristics [
15]. Our results are in accordance with Gelabert et al. [
12], who concluded that structural diversity indexes, such as LHDI and canopy height variability metrics, were suitable for depicting structural differences in Mediterranean
Pinus halepensis forests affected by fires occurring on different dates. On the contrary, unlike the good results obtained by Valbuena et al. [
27] in the use of the L-moments to quantify the complexity of the canopy and classify the structure of the boreal forest according to its ecological function, in our case, no significant differences were found with respect to the conservation status estimation.
The classification results show the feasibility of single-photon LiDAR to discriminate conservation status in oak grove forests with an accuracy after validation of 83.01%, 75.51%, and 88.25% for Quercus robur (9160), Quercus pyrenaica (9230), and Quercus faginea (9240) habitats, respectively. Our model results have comparable accuracies to previous studies that classified the conservation status in grasslands [
53,
54], steppes [
24], or heatlands [
55] using LiDAR or hyperspectral data or their combinations, which range roughly from 70% to 89%. Though this comparison should be interpreted with caution as the habitat type was different, the lower penetrability of the SPL data appears to be complemented by the high density of points in the data source. The use of LiDAR data was in accordance with Simonson et al. [
25], who showed the applicability of three-dimensional data sources to predict conservation status indicators in
Quercus suber habitats in Portugal. A potential handicap of our approach could have been the discrepancy in dates between the ALS data (2017) and the conservation status categorization (2004–2010). However, the conservation status does not vary rapidly over time, and there was good agreement in classification status in the plots visited in the field in 2020, which potentially were affected by structural changes, supporting the usefulness of the selected approach and associated accuracies for conservation status classification. The best classification method for all the habitats, independent of the sample size and the number of variables included in the classification, was RF, in agreement with Marcinkowska-Ochtyra et al. [
54]. Furthermore, our results also suggest that radial kernels work better than linear kernels in the SVM classifier, agreeing with Domingo et al. [
26] and Gelabert et al. [
12].
Concerning the combination of passive and active sensors, several authors have successfully integrated hyperspectral and LiDAR for grassland habitat mapping. Furthermore, multispectral and LiDAR data have also been combined for forest structural variable characterization. For example, Alonso-Benito et al. [
56] reported an improvement of 10% in the overall accuracy of fuel type mapping when combining World View-2 and LiDAR data, and Mutlu et al. [
57] pointed out an improvement of 14% when combining QuickBird and LiDAR data. However, our results showed that the inclusion of the NDVI index from a Sentinel-2A image did not significantly improve the classification accuracy, in accordance with Domingo et al. [
26], who concluded that ALS-derived metrics have more importance in fuel model classification than multispectral Sentinel-2 data, which added a minor improvement in fuel type classification accuracy.
Future research should focus on testing the performance of further structural diversity indices, such as rumple, which were successfully used to characterize structural complexity [
58] and generate parsimonious models in fuel type mapping [
26]. In addition, further analysis should consider other habitat types, including grasslands and shrublands, to better provide regional-scale approaches for conservation status monitoring.