1. Introduction
Earth systems modeling relies on accurate estimates of vegetation composition and distribution to characterize atmospheric-terrestrial fluxes of water, carbon, nutrients, and energy. Within a landscape type, species richness positively correlates with rates of microbial activity and decomposition [1,2]. The composition of plant types influences soil temperature and moisture, which in turn regulate microbial activity and govern carbon and nitrogen cycling, and thus the long-term balance of greenhouse gas emission and sequestration [3,4,5,6,7,8]. The most recent iteration of the Model for Interdisciplinary Research on Climate, Earth System version 2 for Long-term simulations (MIROC-ES2L), incorporates a land-based biogeochemical component that links the interactions of soil nitrogen-carbon and vegetation [9]. Knowing the vegetation distribution in vast peatlands at a realistic spatial scale is thus key for modeling these stores and fluxes.
Spectra containing multiple vegetation types require a deconvolution approach that can determine the fractional abundance of each vegetation component [10]. Image analysis algorithms such as random forest and maximum likelihood produce a hard classification in which each pixel represents one class of material. More advanced spectral unmixing tools, such as spectral mixture analysis (SMA), analyze sub-pixel fractions of component materials. SMA can distinguish between materials that are spectrally and functionally similar, such as multiple anthropogenic surfaces or multiple plant functional types (PFTs) [11]. This ability makes SMA well suited to systems like peatlands that exhibit fine-scale variability in vegetation coverage [12].
Spectral unmixing compares each image pixel’s spectral profile to a reference set of spectral endmembers in order to estimate the fraction of the pixel occupied by each endmember. Simple unmixing models use a single set of endmembers to unmix each image [13], neglecting endmember variability [10,14,15]. Multiple endmember spectral mixture analysis (MESMA) is an extension of the simple unmixing model in which the number and types of endmembers are allowed to vary on a per-pixel basis, thus accounting for endmember variability [16,17].
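As an illustration only, the per-pixel model search that MESMA performs can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: the library spectra, the four-band reflectance values, the soft sum-to-one weight, the fraction bounds, the tie-breaking tolerance, and the function names `unmix_pixel` and `mesma_pixel` are all hypothetical, and shade endmembers are omitted.

```python
import itertools
import numpy as np


def unmix_pixel(pixel, endmembers, weight=1e3):
    """Least-squares fractions for one pixel with a soft sum-to-one constraint.

    endmembers has shape (n_bands, n_endmembers). The extra, heavily weighted
    row of ones pushes the fractions to sum to 1 (an assumed, simple approach).
    """
    A = np.vstack([endmembers, weight * np.ones(endmembers.shape[1])])
    b = np.append(pixel, weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = pixel - endmembers @ fractions
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return fractions, rmse


def mesma_pixel(pixel, library, max_classes=3, tol=1e-6):
    """Try every model built from one spectrum per class; keep the best fit.

    library maps a class name to an array of shape (n_spectra, n_bands).
    Simpler models are tried first and are only replaced when a later model
    lowers the RMSE by more than tol (an assumed parsimony rule).
    """
    best = None
    classes = sorted(library)
    for k in range(1, max_classes + 1):
        for combo in itertools.combinations(classes, k):
            index_ranges = (range(len(library[c])) for c in combo)
            for picks in itertools.product(*index_ranges):
                E = np.column_stack([library[c][i] for c, i in zip(combo, picks)])
                fractions, rmse = unmix_pixel(pixel, E)
                # Reject physically implausible fractions (assumed bounds).
                if np.any(fractions < -0.05) or np.any(fractions > 1.05):
                    continue
                if best is None or rmse < best[2] - tol:
                    best = (combo, fractions, rmse)
    return best


# Hypothetical four-band reflectance spectra, for illustration only.
library = {
    "sphagnum": np.array([[0.05, 0.10, 0.30, 0.45], [0.06, 0.12, 0.28, 0.40]]),
    "graminoid": np.array([[0.04, 0.08, 0.45, 0.30], [0.05, 0.09, 0.50, 0.35]]),
    "woody": np.array([[0.03, 0.05, 0.35, 0.20]]),
}
# Synthetic pixel built as a 60/40 mixture of two library spectra.
pixel = 0.6 * library["sphagnum"][0] + 0.4 * library["graminoid"][1]

combo, fractions, rmse = mesma_pixel(pixel, library)
print(combo, np.round(fractions, 3), rmse)
```

Because the synthetic pixel is built from exactly two library spectra, the search should settle on the two-class model and recover fractions close to the 60/40 mixture. Operational MESMA tools typically add shade endmembers and RMSE or fraction thresholds that this sketch leaves out.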
Here, we evaluate the effectiveness of using a field dataset of spectra and plot composition, collected across different wetland classes, to parameterize MESMA models to predict fractional coverage of PFTs. Specific questions include (1) does using combined ordination-partial least squares regression (PLSR) provide an effective feature selection tool for identifying bands for unmixing hyperspectral imagery, (2) can we scale MESMA predictive mapping using the feature selection tool, from a narrow sampled swath to a larger unsampled area, and (3) does constraining MESMA to the verified endmembers present at a site improve prediction of vegetation coverage.
This work complements the existing body of MESMA research, such as [16,17,18], in three important ways. First, the field-collected pure spectral libraries of PFTs enhance the accuracy of deconvolution of mixed peatland vegetation associations within a single pixel. Second, it demonstrates the use of PLSR as a dimension reduction technique, particularly as applied to vegetation feature extraction and the ability to scale from the small footprint of airborne hyperspectral imagery to the larger footprint of multispectral satellite imagery. Lastly, the focus across multiple wetland classes and hyperspectral tiles permits investigation of whether these methods can be used to estimate fractional composition in heterogeneous areas with difficult field access.
4. Discussion
In this research, we integrate field-collected data to create a spectral reference set appropriate for mapping across multiple wetland classes in a suite of subarctic wetland-peatlands. The MESMA mapping that results from the library created from this field data collection suggests that a relatively small investment in fieldwork can be leveraged to map wetlands across multiple hyperspectral tiles.
We use a set of study-defined PFT library spectra to apply PLSR as a feature extraction method that identifies a parsimonious set of bands capable of discriminating between PFTs. The four bands identified through feature extraction can be used in MESMA to achieve predictive mapping that appears to have similar accuracy to, but lower classification errors than, unmixing with the full set of 110 hyperspectral bands. After unmixing the imagery using MESMA, we find closer correlations between the sampled and the predicted fractional coverages using the PLSR 4-band library than using the 110-band library. These findings agree with previous work on the use of PLSR as a feature selection technique that reduces model complexity and produces more accurate predictions than full-wavelength spectral libraries [58].
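One way to picture PLSR-based band selection is to rank bands by the magnitude of their PLS regression coefficients. The sketch below is illustrative only. It assumes a single-response PLS (PLS1) fitted with the NIPALS algorithm, synthetic uncorrelated "bands", and a simple coefficient-magnitude ranking; the combined ordination-PLSR procedure used in this study may rank bands differently, and the function names `pls1_coefficients` and `top_bands` are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """PLS1 regression via NIPALS; returns one coefficient per band (column)."""
    Xk = X - X.mean(axis=0)          # centered predictors (spectra)
    yk = y - y.mean()                # centered response (e.g., PFT cover)
    n_bands = X.shape[1]
    W = np.zeros((n_bands, n_components))   # weight vectors
    P = np.zeros((n_bands, n_components))   # X loadings
    q = np.zeros(n_components)              # y loadings
    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                   # scores for this component
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = yk @ t / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])   # deflate X
        yk = yk - q[k] * t               # deflate y
    # Coefficients expressed in terms of the original (centered) bands.
    return W @ np.linalg.solve(P.T @ W, q)


def top_bands(X, y, n_components=3, n_bands=4):
    """Indices of the bands with the largest absolute PLS coefficients."""
    coef = pls1_coefficients(X, y, n_components)
    return np.argsort(np.abs(coef))[::-1][:n_bands]


# Synthetic example: 10 "bands", of which only bands 3 and 7 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = 2.0 * X[:, 3] - 3.0 * X[:, 7] + 0.1 * rng.normal(size=300)

selected = top_bands(X, y, n_components=3, n_bands=2)
print(sorted(selected.tolist()))
```

Real hyperspectral bands are strongly collinear, so in practice the number of components and the ranking criterion (for example, coefficient magnitude versus variable importance scores) would need to be chosen and validated against held-out data.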
With the exception of the Sphagnum category at BAR, we find that while the constrained spectral mixture analysis models return slightly better results in both model fit and accuracy (i.e., RMSE in Table 5), the unconstrained models’ results are not significantly different. Results in Table 7 also show that the field-observed values (column 4) most closely match the Band 64–67 data for most PFTs and sites. Agreement between unconstrained and constrained analyses suggests that the species composition predicted from the unconstrained set of PFTs is relatively close to that predicted from the constrained set. Thus, extensive fieldwork in remote wetlands may not be necessary for predicting species composition. Further, similarity between the constrained (columns 6 and 8) and unconstrained (columns 5 and 7) values indicates that unconstrained analysis (with 4 bands, 110 bands, or both) is sufficient for predictive purposes. This suggests that unconstrained libraries could be suitable for mapping areas where field sampling is not possible due to inaccessibility.
Using feature extraction to identify critical bands for unmixing has a clear computational cost benefit. The number of endmembers used for unmixing, in other words the unmixing complexity, is also a significant driver of processing time (Table 8). Some of the matches between predicted and actual coverage are high, an additional indication that the process is robust. Processing times are small for all small swaths, although they are greater when we constrain the unmixing to the PFTs known to be present at the site. The processing time for the scaled-up imagery used here increased dramatically and appears to be driven both by the number of bands used for unmixing (higher for the higher number of bands) and by the unmixing complexity (higher for four-endmember than for three-endmember models) (Table 8).
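The dependence of processing time on unmixing complexity follows from the number of candidate models that must be evaluated for each pixel. As a rough sketch, assuming each model takes exactly one spectrum from each of k distinct PFT classes (shade excluded) and using hypothetical library sizes, the model count can be computed as follows; `count_models` and `library_sizes` are illustrative names, not part of any MESMA software.

```python
from itertools import combinations
from math import prod


def count_models(library_sizes, n_classes):
    """Number of candidate models that use one spectrum from each of
    n_classes distinct classes."""
    return sum(
        prod(library_sizes[c] for c in combo)
        for combo in combinations(sorted(library_sizes), n_classes)
    )


# Hypothetical number of library spectra per PFT class.
library_sizes = {"sphagnum": 3, "graminoid": 4, "woody": 2, "conifer": 5}

two_class_models = count_models(library_sizes, 2)
three_class_models = count_models(library_sizes, 3)
print(two_class_models, three_class_models)  # 71 154
```

Under these assumptions, adding one class to each model roughly doubles the number of least-squares fits per pixel, and each fit also grows with the number of bands, which is consistent with the trend in processing time described above.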
In each of the unmixing model results (both 4-band and 110-band), the PFTs dominating the unmixed swaths are those anticipated from the wetland class identified using the Alaska Vegetation Classification [27]. The highest fractional coverage correlations between the reference and predicted plots are generally for the 4-band rather than the 110-band models. For example, the small-swath sampling area at Lily Lake, a low shrub-scrub wetland that, according to Viereck [27], would include Sphagnum and woody shrubs, has a predicted coverage of 58% Sphagnum and 26% woody, with only 13% graminoids. BAR, determined from our field sampling composition to be a Wet Herbaceous wetland that should be dominated by graminoids, especially Carex, is predicted to be 77% graminoid, close to our plot sampling, which found 74% graminoid. Finally, BC, which according to Viereck is a sedge peatland dominated by Sphagnum and graminoids, is predicted to be 48% Sphagnum and 33% graminoids. Overall, spectral unmixing results in fractional coverages that suggest all three sites are dominated by graminoids and/or Sphagnum. This finding is supported by our field sampling.
Based on the small-swath fractional coverages, we suggest that when Sphagnum is present at a site (as at LL and BC (Table 5)), spectral mixture analysis may over-predict Sphagnum spp., with the resulting maps displaying a greater fractional coverage than is present. This over-prediction may occur because Sphagnum is frequently found at wetland sites as a ground cover, such that even when graminoid or woody taxa are present, they co-occur with Sphagnum, and the woody (or other PFT) signal is dwarfed by a more dominant Sphagnum signal, which may cover a greater percentage of the pixel. We see a related effect in the black spruce wetlands, where, according to Viereck, spruce frequently appears together with Sphagnum. In these wetland-peatlands, where small spruce often occur singly and each cover only a small, meter-scale area, the Sphagnum signal may dominate.
In terms of accuracy assessment, the boxplots in Figure 6 show that mean fractional coverages calculated using the 4-band library for unmixing are closer to field-sampled results than those calculated using the 110-band library. We further found that constraining PFTs by field knowledge did not produce significantly better results, suggesting that a broad, locally produced spectral library is more helpful than a site-specific library. This finding makes the feature extraction method meaningful for mapping in areas where field sampling is logistically challenging.
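The kind of comparison summarized above can be sketched as a per-library RMSE between field-observed and predicted fractional cover. The snippet below is a minimal illustration, not the study's accuracy assessment: all fraction values are hypothetical, the library labels are placeholders, and `rmse` is an illustrative helper.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences of fractions."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical fractional cover for three PFTs at one plot (not study data).
field_observed = [0.50, 0.30, 0.20]
predicted_by_library = {
    "library_a": [0.55, 0.27, 0.18],
    "library_b": [0.40, 0.35, 0.25],
}

errors = {name: rmse(field_observed, pred)
          for name, pred in predicted_by_library.items()}
best_library = min(errors, key=errors.get)
print(errors, best_library)
```

A lower RMSE indicates predicted fractions that sit closer to the field observations; in practice the comparison would be repeated across plots, PFTs, and sites before drawing conclusions.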
We illustrate scaling up the spectral mixture analysis using a large cropped area from the LL site (Figure 8 and Figure 9). Figure 9A shows the 4-band library successfully unmixing water, while the 110-band library does not recognize the full extent of the water; because the image is close to 100% unmixed, we know the 110-band library is misclassifying water into another class. In Panel B, the 110-band library over-classifies conifers. Comparing the true-color image with the wetland class overlay shows that the 110-band library does not follow the wetland class map as closely as the 4-band library does. In Panel C, the 110-band library underestimates Sphagnum coverage: we know, both from our sampling plots and from field-based knowledge, that the entire site is underlain by Sphagnum even where conifers are present.
By identifying the small set of bands best able to discriminate between wetland PFTs, this research contributes toward the ability to scale from the high spatial and spectral resolution of hyperspectral imagery to the lower resolution of multispectral imagery. The advantages of scaling from airborne hyperspectral imagery to the lower spectral resolution of satellite imagery include the high temporal repeat of satellite imagery, its wall-to-wall rather than narrow-swath coverage of an area, and the ability to leverage broadly available data.