1. Introduction
Hyperspectral imagery (HSI) is acquired over numerous contiguous spectral bands, enabling the identification of materials that cannot be distinguished in traditional broadband imagery [1]. However, several materials may contribute to the spectral measurement of a single pixel. For such mixed pixels, the goal is to identify the materials present in the mixture along with their proportions. Hyperspectral unmixing (HU) decomposes the spectral measurements of mixed pixels into a set of constituent spectra, or endmembers, and a set of corresponding fractions, or abundances, which indicate the proportional presence of each endmember in the pixel [2,3]. Endmembers typically correspond to familiar macroscopic substances in the scene, such as soil, trees, water, or other natural or man-made materials. HU can reveal subpixel details, which is valuable in many practical scenarios [4,5,6].
Based on the spectral mixing mechanism, common HU models can be categorized into the Linear Mixing Model (LMM) and the Nonlinear Mixing Model (NLMM) [7,8]. LMM assumes that each pixel in an HSI is a linear combination of endmembers weighted by their abundances. Owing to its generality, LMM has been the primary model for HU over the past few decades [9]. LMM-based HU typically consists of two steps. The first step is endmember extraction, with typical methods including the Pixel Purity Index (PPI) [10], the N-FINDR algorithm [11], and Vertex Component Analysis (VCA) [12]. The second step estimates abundances from the spectral data and the extracted endmembers, typically through optimization under the abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC); the most common method is Fully Constrained Least Squares Unmixing (FCLSU) [13]. However, this two-step approach may lead to error accumulation [14]. To avoid such errors, blind unmixing techniques, which perform endmember extraction and abundance estimation simultaneously, have been widely studied [15,16,17]. Many of these methods rely on Nonnegative Matrix Factorization (NMF), and extended NMF variants add regularization terms to the factorization to incorporate spectral and spatial prior information, thereby improving the stability of unmixing [18,19,20].
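To make the LMM and its two constraints concrete, the sketch below solves the per-pixel problem min ||y − Ma||² subject to a ≥ 0 (ANC) and Σa = 1 (ASC). It is an illustrative substitute, not the original FCLSU algorithm: it uses projected gradient descent with a sort-based projection onto the probability simplex instead of FCLSU's active-set solver, and the function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                       # values in descending order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - css / idx > 0][-1]           # number of entries that stay positive
    theta = css[rho - 1] / rho                 # shift that makes the entries sum to 1
    return np.maximum(v - theta, 0.0)

def fcls_pgd(y, M, n_iter=2000):
    """Fully constrained least squares for one pixel via projected gradient descent.

    y: (B,) pixel spectrum; M: (B, K) endmember matrix, one column per endmember.
    Minimizes 0.5 * ||y - M a||^2 subject to ANC (a >= 0) and ASC (sum(a) = 1).
    """
    step = 1.0 / np.linalg.norm(M, 2) ** 2     # 1 / Lipschitz constant of the gradient
    a = np.full(M.shape[1], 1.0 / M.shape[1])  # start at the centre of the simplex
    for _ in range(n_iter):
        grad = M.T @ (M @ a - y)
        a = project_simplex(a - step * grad)
    return a

# Usage: a noise-free linear mixture of three random endmembers.
rng = np.random.default_rng(0)
M = rng.uniform(0.0, 1.0, size=(50, 3))
a_true = np.array([0.2, 0.5, 0.3])
y = M @ a_true
a_hat = fcls_pgd(y, M)
```

Replacing `project_simplex` with `np.maximum(·, 0.0)` would enforce only the ANC, giving a nonnegative least-squares variant instead.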
However, in practical scenarios, the spectra captured by detectors are not simply weighted sums of individual endmember spectra [21]. Spectral variability (SV) caused by illumination conditions, terrain, and atmospheric effects, together with nonlinear effects introduced by complex interactions among materials in the scene, limits the performance of LMM [22]. Many LMM-based methods introduce additional parameters to model SV, but these models generalize poorly under complex conditions [23,24,25,26]. NLMMs offer a natural way to address complex SV and nonlinearity, and they can be divided into model-based and model-free methods [27]. Model-based methods assume that the spectral mixing process is known a priori. A popular class of model-based NLMM is the Bilinear Mixing Model (BMM), which simplifies the nonlinear physics by assuming that the illuminating radiation undergoes at most two reflections before reaching the detector. A major variant of this model is the Fan model [28], which performs poorly in scenarios with only linear interactions. To improve generalization, the Generalized Bilinear Model (GBM) [29], the Linear-Quadratic Model (LQM) [8], and the Polynomial Post Nonlinear Mixing Model (PNMM) [30] were proposed; each incorporates parameters that balance the weights of the linear and nonlinear terms. However, the mixing priors are often unknown in practical applications, which leads to poor generalization and makes model selection difficult. Developing model-free unmixing methods is therefore necessary to improve generalization.
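The differences between these bilinear models are easiest to see side by side. The sketch below assumes the usual textbook forms: GBM adds γ_ij a_i a_j (m_i ⊙ m_j) for i < j to the linear mixture, the Fan model fixes every γ_ij = 1, and the polynomial post-nonlinear model maps the linear mixture x to x + b (x ⊙ x). Function names and test values are hypothetical, and noise is omitted.

```python
import numpy as np

def lmm(M, a):
    """Linear mixing: M (K, B) endmembers, a (K,) abundances -> (B,) spectrum."""
    return a @ M

def gbm(M, a, gamma):
    """Generalized bilinear model: LMM plus gamma_ij * a_i * a_j * (m_i * m_j) for i < j."""
    y = lmm(M, a)
    K = len(a)
    for i in range(K):
        for j in range(i + 1, K):
            y = y + gamma[i, j] * a[i] * a[j] * M[i] * M[j]
    return y

def fan(M, a):
    """Fan model: the GBM with every interaction coefficient fixed to 1."""
    return gbm(M, a, np.ones((len(a), len(a))))

def pnmm(M, a, b):
    """Polynomial post-nonlinear model: x + b * (x * x), where x is the linear mixture."""
    x = lmm(M, a)
    return x + b * x * x

rng = np.random.default_rng(0)
M = rng.uniform(0.1, 1.0, size=(3, 10))
a = np.array([0.2, 0.5, 0.3])
y_lin = lmm(M, a)
y_gbm0 = gbm(M, a, np.zeros((3, 3)))   # all gamma_ij = 0
y_fan = fan(M, a)
y_pnmm0 = pnmm(M, a, 0.0)              # b = 0
```

Setting all γ_ij to zero in GBM, or b to zero in the post-nonlinear model, recovers the LMM exactly, whereas the Fan model always adds the bilinear terms; this is why it struggles in purely linear scenes.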
In recent years, the strong learning and data-fitting capabilities of deep learning have provided powerful support for HU. Network architectures are primarily based on autoencoders and their variants: HSIs are encoded into the corresponding abundance fractions and decoded back into spectra, with the decoder weights representing the endmembers. Most deep learning-based HU approaches perform pixel-wise unmixing and employ various regularizations to constrain the solution space. mDAE [31] employs a nonnegative sparse autoencoder for unmixing and cascades a marginalized denoising autoencoder to mitigate the effects of noise. Recognizing that cascading introduces additional reconstruction errors, uDAS [32] incorporates denoising into the network optimization as a denoising constraint. To enhance the sparsity of the estimates, EndNet [33] introduces a loss function that combines a Kullback–Leibler divergence term with SAD similarity and several other penalty terms. In contrast to commonly used norm-based sparse priors, OSPAEU [34] observes that different abundance maps are nearly orthogonal and therefore proposes an orthogonal sparse prior that achieves better abundance sparsity. Recently, several methods have integrated discriminative networks into their models, using structural distribution similarity to guide spectral reconstruction [35,36,37]. However, these methods are limited to pixel-level unmixing, despite ample evidence of the advantages of incorporating spatial information into the unmixing process.
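A minimal forward pass of such an unmixing autoencoder is sketched below. It assumes a hypothetical two-layer encoder with a softmax output (one common way to satisfy ANC and ASC) and a bias-free linear decoder whose weight matrix holds the endmembers. Training is omitted, and the sketch does not reproduce any specific network cited here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encode(Y, W1, W2):
    """Map pixel spectra Y (N, B) to abundances (N, K).
    The softmax makes every row nonnegative and sum to one (ANC + ASC)."""
    H = np.maximum(Y @ W1, 0.0)                # hidden layer with ReLU
    return softmax(H @ W2, axis=1)

def decode(A, M):
    """Linear (LMM) decoder: the weight matrix M (K, B) holds the endmembers."""
    return A @ M

def sad_loss(Y, Y_hat, eps=1e-12):
    """Mean spectral angle (radians) between input and reconstructed spectra."""
    cos = np.sum(Y * Y_hat, axis=1) / (
        np.linalg.norm(Y, axis=1) * np.linalg.norm(Y_hat, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical sizes: N pixels, B bands, K endmembers, Hd hidden units.
rng = np.random.default_rng(0)
N, B, K, Hd = 100, 156, 3, 32
Y = rng.uniform(0.0, 1.0, size=(N, B))
W1 = rng.normal(0.0, 0.1, size=(B, Hd))
W2 = rng.normal(0.0, 0.1, size=(Hd, K))
M = rng.uniform(0.0, 1.0, size=(K, B))         # e.g. initialized from extracted endmembers

A = encode(Y, W1, W2)
Y_hat = decode(A, M)
loss = sad_loss(Y, Y_hat)
```

After training, the rows of `M` are read out as the estimated endmembers and `A` as the abundance maps.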
Leveraging the flexibility of neural network frameworks, autoencoder-based methods effectively exploit spatial features through convolutional layers [38]. CNNAEU [39] splits HSIs into a series of patches and extracts spatial information using 2D convolutional neural networks (CNNs). DEAS [40] designs a plug-and-play extended-aggregated convolutional module that enlarges the spatial receptive field with dilated convolutions at different scales, and demonstrates its effectiveness in enhancing the unmixing capability of CNNAEU. In practice, targets vary in scale and size, and pure pixels may be distributed throughout the entire HSI. Networks that rely on local convolutional filters overlook the global material distribution and long-range dependencies, losing essential spatial information during unmixing. MSNet [41] rescales the original HSI to expand the receptive field of its CNNs, but downsampling discards detailed information, making it difficult to preserve fine details while capturing global context.
Compared with the limited receptive fields of traditional CNNs, self-attention offers a viable way to model non-local spatial correlations between hyperspectral pixels. DeepTrans [42] pioneered the application of transformers [43] to HU, capturing non-local feature dependencies through interactions between image patches. However, patch-based operations introduce inconsistencies across patches. UST-Net [44] combines the advantages of MSNet and DeepTrans, applying a shifted-window multi-head self-attention mechanism (MHSAM) to HSIs at different scales, which enables operation on the entire HSI and eliminates inconsistencies between patches. Nevertheless, owing to computational constraints, current non-local spatial correlations are still computed between blocks and cannot establish connections between individual pixels. Moreover, although introducing spatial information yields favorable endmember estimates, it often over-smooths abundance transitions.
An ideal approach is to extract spatial-spectral information from HSI jointly with a 3D CNN, albeit at a higher computational cost. Hence, it is common practice either to extract spatial and spectral information sequentially or to employ dual-stream networks that extract them jointly. The former is exemplified by SSAE [45], in which spatial information is first used for effective endmember extraction, and the resulting endmembers are then fixed in the decoder of the abundance estimation network; a 1D CNN then extracts the spectral features of HSI for more accurate abundance estimation. To achieve end-to-end learning, SSANet [46] incorporates an adaptive spectral-spatial attention module consisting of a spatial attention module followed by a spectral attention module. A typical example of the latter is DBA [47], which extracts spatial and spectral information through two branches and treats their weighting ratio as a hyperparameter that regulates their impact on the unmixing results. SSCU-Net [48] and MSSS-Net [49] adopt weight-sharing mechanisms that let the information streams interact, reducing the number of hyperparameters to tune. In summary, owing to computational constraints, none of the existing unmixing algorithms fully accounts for both the spatial and spectral information of HSI, which inevitably degrades unmixing performance. Furthermore, the aforementioned algorithms are all based on the LMM and pair a carefully designed encoder with a simple single-layer decoder. Experiments in DAEU [50] reveal that this simple decoder structure limits how well the autoencoder reconstructs its inputs, indicating that a single-layer decoder fails to fully exploit the capabilities of the encoder.
Linear unmixing can be handled well by classical methods, whereas deep learning is more competitive on nonlinear problems [51,52]. NAE [53] redesigns the decoder based on the PNMM [30] and uses pre-training to enhance unmixing performance. Considering the higher degrees of freedom of nonlinear neural networks, AEC [54] designs the encoder as the inverse of the mixing process, thereby improving the algorithm's robustness. UHUNA [55] designs three specific nonlinear models for the decoder while remaining extensible to further models, thereby improving the algorithm's versatility. RDAE [56] unfolds the GBM [29] to construct the decoder, extracting both the endmembers and their second-order scattering interactions. 3DAEU [57] jointly extracts the spatial-spectral information of HSI using a 3D CNN, with a carefully designed decoder that covers several existing handcrafted models. Compared with linear methods, relatively few algorithms have been developed based on NLMM. On the one hand, existing nonlinear unmixing methods are often confined to specific mixing models, and data-driven, model-free unmixing methods can effectively improve generalization. On the other hand, owing to the inherent non-convexity of blind unmixing, the high degrees of freedom of nonlinear unmixing algorithms often produce meaningless endmembers. Setting appropriate initialization and regularization to guide convergence toward good solutions is therefore a key consideration.
1.1. Motivation
While 1D CNNs, 2D CNNs, and self-attention mechanisms have been widely employed for feature extraction, none of these approaches fully integrates the global spatial information and spectral properties of HSI. Therefore, this paper uses a dual-stream network to extract the spatial and spectral information of HSI separately, and achieves global pixel-level contextual communication through an MHSAM based on linear projection, which reduces the computational complexity from O(N²) to O(N) without compromising unmixing performance. To fully harness the feature-fitting capability of the encoder, a data-driven nonlinear decoder is adopted. The type of nonlinearity is learned entirely from the data, enabling effective handling of various complex nonlinear scenarios. Because nonlinear decoders easily fall into local optima during training, a stable initialization method is developed that effectively handles outliers and noise, significantly enhancing unmixing performance.
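The complexity reduction from linear projection can be illustrated with a Linformer-style, single-head attention in which learned projections E and F compress the N pixel tokens of the keys and values to k rows before the softmax. The sketch below is an assumption-laden illustration, not the paper's module: E and F are random here (learned in practice), only one head is shown, and the image size and all names are hypothetical; k = 128 matches the projection dimension chosen in Section 4.4.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def linear_projection_attention(X, Wq, Wk, Wv, E, F):
    """Single-head attention with keys and values compressed along the token axis.

    X: (N, C), one token per pixel; Wq, Wk, Wv: (C, d); E, F: (k, N).
    The score matrix is (N, k) instead of (N, N), so for a fixed k the
    cost grows linearly with the number of pixels N.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # (N, d) each
    K_proj, V_proj = E @ K, F @ V                   # (k, d): N tokens -> k rows
    scores = Q @ K_proj.T / np.sqrt(Q.shape[1])     # (N, k)
    P = softmax(scores, axis=1)                     # each pixel's weights sum to 1
    return P @ V_proj, P                            # (N, d), (N, k)

rng = np.random.default_rng(0)
N, C, d, k = 64 * 64, 156, 32, 128                  # hypothetical 64 x 64 image
X = rng.normal(size=(N, C))
Wq, Wk, Wv = (rng.normal(scale=C ** -0.5, size=(C, d)) for _ in range(3))
E, F = (rng.normal(scale=N ** -0.5, size=(k, N)) for _ in range(2))
out, P = linear_projection_attention(X, Wq, Wk, Wv, E, F)
```

With N = 4096 pixel tokens, full self-attention would form a 4096 × 4096 score matrix (16,777,216 entries), whereas the projected version forms 4096 × 128 (524,288 entries), while every pixel still attends to a summary of the whole image.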
1.2. Novelty and Contribution
The main contributions of this article are summarized as follows:
We propose a novel global spatial-spectral unmixing method that integrates global spatial-spectral information in HSI and achieves pixel-level global spatial interaction, reducing information loss and improving unmixing performance. Unlike conventional patch-based operations, it avoids discontinuities between pixel blocks; to the best of our knowledge, this is the first application of a pixel-level global attention mechanism in HU.
We introduce a decoder structure suited to nonlinear spectral unmixing. Unlike nonlinear decoders designed for specific models, the data-driven nonlinear decoder requires no prior knowledge of the scene's mixing model, enabling adaptive handling of complex mixed images, including linear and various nonlinear mixing scenarios.
We propose a simple and efficient endmember initialization method that mitigates interference from noise and outliers. Experimental results demonstrate that it maintains high accuracy across various complex datasets. Moreover, it can directly replace the commonly used VCA initialization in existing autoencoder-based unmixing algorithms, significantly enhancing their unmixing performance.
This article is organized as follows.
Section 2 provides a brief introduction to the mixing model and autoencoder structures.
Section 3 elaborates on the proposed unmixing autoencoder framework, including the specific network architecture, component modules, and loss functions.
Section 4 presents the experiments, comparing performance with existing state-of-the-art unmixing methods. Finally, Section 5 presents the conclusions of our work.
4. Experiments
In order to better evaluate the proposed method, detailed ablation experiments were conducted on the dual-branch spatial-spectral feature extraction module and the nonlinear decoder within the network. The proposed method was compared with several representative existing methods on three real datasets.
The mean spectral angle distance (mSAD) was used to evaluate the quality of the extracted endmembers, while the mean root mean square error (mRMSE) was utilized to assess the accuracy of the abundance estimation, which are defined as follows:
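Assuming the conventional definitions, SAD_k = arccos(m_kᵀ m̂_k / (‖m_k‖ ‖m̂_k‖)) and RMSE_k = sqrt((1/N) Σ_n (a_kn − â_kn)²), each averaged over the K endmembers, the two metrics can be computed as sketched below. The sketch assumes that estimated endmembers have already been matched to the reference ones (for example, by a Hungarian assignment on SAD) and that any scaling factor used when reporting the tables is applied afterwards; the function names are hypothetical.

```python
import numpy as np

def spectral_angle(m, m_hat):
    """Angle (radians) between a reference and an estimated endmember."""
    cos = m @ m_hat / (np.linalg.norm(m) * np.linalg.norm(m_hat))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def msad(M, M_hat):
    """Mean SAD over K matched endmember pairs; M, M_hat: (K, B)."""
    return float(np.mean([spectral_angle(m, mh) for m, mh in zip(M, M_hat)]))

def mrmse(A, A_hat):
    """Per-endmember RMSE over N pixels, and their mean; A, A_hat: (K, N)."""
    per_endmember = np.sqrt(np.mean((A - A_hat) ** 2, axis=1))
    return float(per_endmember.mean()), per_endmember

M = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
A = np.zeros((2, 4))
A_hat = np.vstack([np.full(4, 0.1), np.full(4, 0.3)])
sad_scaled = msad(M, 2.0 * M)        # SAD ignores the scale of the spectra
mean_rmse, per_em = mrmse(A, A_hat)
```

The scale invariance shown by `sad_scaled` is the reason SAD is preferred over Euclidean distance for comparing endmembers whose overall brightness may differ.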
The proposed method is implemented in Python 3.9 using the PyTorch framework. The network’s learning rate is initialized to
, while the linear decoder is initialized to
. The learning rate is multiplied by 0.8 every 30 epochs. The number of training epochs was adjusted per dataset: the network was trained for 200, 400, and 90 epochs on the Samson, Jasper Ridge, and Urban datasets, respectively. The decoder weights are initialized with the endmembers extracted in Section 4.2.
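The step-decay schedule described above can be written as a one-line function. The base learning rate below is a hypothetical placeholder, because the initial values are not given here. In PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.8)` yields the same schedule when `scheduler.step()` is called once per epoch.

```python
def step_decay_lr(epoch, base_lr, step_size=30, gamma=0.8):
    """Learning rate in effect during 0-based `epoch`:
    multiplied by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

BASE_LR = 1e-3  # hypothetical placeholder, not the paper's value
schedule = {e: step_decay_lr(e, BASE_LR) for e in (0, 29, 30, 60, 89)}
```

Under this schedule, the final epoch runs at 0.8^6 ≈ 0.26 of the initial rate for Samson (200 epochs), 0.8^13 ≈ 0.055 for Jasper Ridge (400 epochs), and 0.8^2 = 0.64 for Urban (90 epochs).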
4.1. Data Description
Because synthetic datasets struggle to reflect the complex nonlinear interactions of real-world scenes, only three widely used real datasets were employed to validate the unmixing results of the different algorithms, as illustrated in Figure 5.
Samson contains pixels, with each pixel encompassing 156 spectral channels ranging from 401 nm to 889 nm. This dataset is not affected by bad pixels or severe noise contamination and consists of three endmembers: Soil, Tree, and Water.
Jasper Ridge comprises pixels, with each pixel containing 198 spectral channels ranging from 380 nm to 2500 nm after the removal of channels 1–3, 108–112, 154–166, and 220–224 due to dense water vapor and atmospheric effects. Four endmembers are included in this dataset: Road, Soil, Tree, and Water.
Urban is a large dataset consisting of pixels, with each pixel containing 162 spectral channels spanning from 400 nm to 2500 nm after the removal of channels 1–4, 76, 87, 101–111, 136–153, and 198–210 due to dense water vapor and atmospheric effects. This dataset contains a significant number of outliers and encompasses four endmembers: Asphalt, Grass, Tree, and Roof.
4.2. Endmember Initialization
The unmixing problem can be formulated as a non-convex minimization problem, so a reasonable initialization can significantly improve unmixing performance. Most unmixing algorithms use geometric extraction algorithms such as VCA for initialization, but outliers and noise may lead to the extraction of meaningless endmembers, which severely hinder the unmixing process. DAEN [62] combines stacked autoencoders and VCA to generate well-initialized endmembers that are free of outlier influence, but it is computationally expensive. OSPAEU [34] removes outliers by measuring the uniformity of neighboring pixels over the entire image. MAAENet [51] proposes an SLIC-VCA algorithm that generates spatial groups through image segmentation [63]; the spectra within each group are averaged to alleviate the impact of outliers and noise. We further improve the SLIC-VCA algorithm by following the concept of endmember bundles.
The specific workflow is illustrated in Figure 6. Within compact spatial neighborhoods, HSI pixels exhibit similar spectral characteristics. Assuming that pure pixels do not occur in isolation, SLIC [63] is employed to segment the HSI. Notably, SLIC clusters pixels by considering both spatial Euclidean distance and spectral similarity. The formulation is given by:
where
is the squared Euclidean distance between two pixels,
S is the search size of SLIC, and
m is a hyperparameter balancing pixel distance and spectral similarity strength.
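As an illustration, the sketch below assumes the standard SLIC combined distance D = sqrt(d_c² + (d_s / S)² · m²), with the colour term d_c replaced by the Euclidean distance between spectra and d_s the spatial distance between pixel positions. Whether the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
import numpy as np

def slic_distance(x_i, x_j, p_i, p_j, S, m):
    """Combined SLIC distance between two pixels.

    x_i, x_j: (B,) spectra; p_i, p_j: (2,) row/column positions;
    S: grid interval (search size); m: weight of spatial proximity.
    """
    d_spec_sq = np.sum((x_i - x_j) ** 2)   # squared spectral Euclidean distance
    d_spat_sq = np.sum((p_i - p_j) ** 2)   # squared spatial Euclidean distance
    return float(np.sqrt(d_spec_sq + d_spat_sq / S**2 * m**2))

x = np.array([0.2, 0.4, 0.6])
# Same spectrum, 3 pixels apart, S = 10, m = 5: only the spatial term contributes.
d_same_spectrum = slic_distance(x, x, np.array([0.0, 0.0]), np.array([3.0, 0.0]), S=10, m=5)
# Same position with m = 0: only the spectral term contributes.
d_spectral_only = slic_distance(np.array([0.0, 0.0]), np.array([3.0, 4.0]),
                                np.array([1.0, 1.0]), np.array([1.0, 1.0]), S=10, m=0)
```

A larger m makes superpixels more compact and grid-like, whereas a smaller m lets them follow spectral boundaries more closely.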
After SLIC segmentation, the HSI is divided into highly correlated superpixel blocks. Averaging the spectra within each block effectively eliminates outliers and reduces noise. Following the concept of endmember bundles, a random subset of the averaged spectra is then selected to run VCA, and this is repeated several times. This approach assumes that a small percentage of image pixels can approximate the statistics of the original image. Each run of the endmember extraction algorithm yields a different set of endmember spectra.
Next, the resulting spectral library is clustered with k-means, using Euclidean distance as the similarity metric. This partitions the library into an independent endmember bundle for each ground component, so that each endmember is characterized by a set of spectra that exhibit spectral variability. Finally, each endmember bundle is averaged to obtain the initial endmembers.
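The workflow above can be sketched as follows, under several stated assumptions. Superpixel labels are taken as given (any SLIC implementation can produce them). The successive projection algorithm (SPA) stands in for VCA because a faithful VCA needs considerably more code. K-means is seeded with the first run's spectra. The subset fraction, number of runs, and all function names are hypothetical. The sketch illustrates the endmember-bundle idea and is not the authors' VCA-bundles implementation.

```python
import numpy as np

def superpixel_means(X, labels):
    """Average the spectra in each segment; X: (N, B) pixels, labels: (N,) segment ids."""
    return np.stack([X[labels == s].mean(axis=0) for s in np.unique(labels)])

def spa(Y, k):
    """Successive projection algorithm (a simple stand-in for VCA):
    repeatedly pick the spectrum with the largest residual norm, then
    project all spectra onto the orthogonal complement of that spectrum."""
    R = Y.astype(float).copy()
    picked = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        picked.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)
    return Y[picked]

def kmeans(P, init, n_iter=50):
    """Plain Lloyd iterations with Euclidean distance; returns the cluster means."""
    C = init.astype(float).copy()
    for _ in range(n_iter):
        d = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(len(C)):
            if np.any(assign == c):
                C[c] = P[assign == c].mean(axis=0)
    return C

def bundle_init(X, labels, k, n_runs=20, frac=0.5, seed=0):
    """Superpixel averaging -> extraction on random subsets -> k-means -> bundle means."""
    rng = np.random.default_rng(seed)
    means = superpixel_means(X, labels)
    n_sub = max(k, int(frac * len(means)))
    library = np.concatenate([
        spa(means[rng.choice(len(means), n_sub, replace=False)], k)
        for _ in range(n_runs)
    ])                                          # (n_runs * k, B) spectral library
    return kmeans(library, init=library[:k])    # first run's spectra seed the bundles

# Synthetic check: 3 endmembers, 60 segments of 10 noisy pixels (15 segments are pure).
rng = np.random.default_rng(1)
B, k = 20, 3
E = rng.uniform(0.1, 1.0, size=(k, B))
A = np.vstack([np.repeat(np.eye(k), 5, axis=0), rng.dirichlet(np.ones(k), size=45)])
X = np.repeat(A @ E, 10, axis=0) + rng.normal(0.0, 0.005, size=(600, B))
labels = np.repeat(np.arange(60), 10)
M0 = bundle_init(X, labels, k)
```

Because each bundle is averaged, an occasional run that misses a pure segment, and therefore returns a slightly mixed spectrum, shifts the final endmember only marginally.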
Using endmember bundles instead of applying the endmember extraction algorithm directly to the averaged spectra offers several advantages. First, running VCA on the entire image typically yields inconsistent results, whereas averaging over multiple samplings reduces this uncertainty; even if some VCA runs extract incorrect spectra, the final endmembers remain largely unaffected. Second, averaging a set of endmember bundles that contain various spectral variabilities further mitigates the impact of outliers and noise, yielding accurate and reliable endmember initialization.
The challenging Urban dataset is employed for comparative endmember extraction experiments, with VCA [12], N-FINDR [11], and SLIC-VCA [51] selected as the benchmark methods.
Figure 7 illustrates the endmembers extracted by the different algorithms, and Table 1 lists the quantitative performance of all compared methods. Conventional geometric extraction methods are adversely affected by outliers and produce the poorest results. SLIC-VCA mitigates the impact of outliers by locally averaging spectral signatures, which significantly improves the accuracy of endmember extraction; however, it fails to separate highly similar endmember spectra such as “Grass” and “Tree”. In comparison, VCA-bundles accurately extracts all endmembers, demonstrating stronger adaptability to complex natural scenes than the other methods.
To further investigate the robustness of VCA-bundles, Figure 8 visualizes the extracted endmember bundles. For each reference endmember, a set of spectra with varying spectral variability is generated, and averaging reduces the influence of this variability. Moreover, for endmembers that are difficult to extract, such as “Grass”, averaging suppresses the occasional erroneously extracted spectra, yielding robust results.
4.3. Ablation Experiments
The effectiveness of the proposed branch modules was analyzed on all three datasets by removing the SAFEM, SEFEM, and NLMM decoder modules individually and then in combination until all were removed. When the Spatial-Spectral Feature Extraction Module was removed entirely, the encoder was replaced with a 2D CNN incorporating spatial channel attention.
Table 2 summarizes the endmember extraction and abundance estimation results under the different configurations. As anticipated, removing all modules resulted in the poorest performance. When only one module was reintroduced, adding spatial or spectral information to the encoder proved more beneficial than adding the nonlinear decoder, which integrates local spatial information and yielded only marginal improvements on its own. However, when the nonlinear decoder was paired with SAFEM, which also extracts local spatial information, unmixing accuracy improved significantly. Ultimately, using all three modules simultaneously yielded the best results, underscoring the efficacy of the proposed dual-branch encoder and nonlinear decoder.
In addition, a highlight of this study is the proposed MHSAM based on linear projection, which enables pixel-level global spatial interaction. Ablation experiments on the Samson and Jasper Ridge datasets replaced this module with linear layers, CNN layers, or a 5 × 5 patch-based MHSAM, as shown in Table 3. Using only linear layers yielded the poorest performance, and CNN layers, which consider local spatial information, offered moderate improvement. In contrast, both MHSAM variants, which integrate global spatial information, significantly improved endmember accuracy.
Figure 9 illustrates abundance visualizations of tree, water, dirt, and road on the Jasper Ridge dataset, demonstrating that MHSAM without patch-based processing achieved the smoothest pixel transitions and best visual performance.
4.4. Projection Dimension Analysis
The projection dimension in the global spatial interaction module is a tunable hyperparameter. As demonstrated in Linformer [60], a lower projection dimension speeds up network training but may also degrade network performance. Different projection dimensions were tested on the Samson and Jasper Ridge datasets, with the results shown in Figure 10. Notably, network performance does not deteriorate significantly as the projection dimension decreases, possibly because the unmixing task is relatively simple. In contrast, computation time decreases more noticeably and levels off once the projection dimension falls below 128. The projection dimension is therefore set to 128 to minimize information loss while maintaining computational efficiency.
4.5. Experiments
In this section, we compare the proposed method with other methods. Both linear and nonlinear methods were selected for comparison, since the three chosen datasets are widely used to evaluate linear unmixing algorithms. The comparative methods are as follows:
- (1) FCLSU [13]: The most commonly used abundance estimation method. In our experiments, VCA-bundles was used for endmember extraction in conjunction with FCLSU.
- (2) DeepTrans [42]: A deep learning-based linear unmixing network that captures non-local feature dependencies through operations between image patches.
- (3) uDAS [32]: A deep learning-based linear unmixing network that incorporates denoising capability into network optimization in the form of denoising constraints.
- (4) SGSNMF [19]: An NMF-based linear unmixing method that organizes the data into spatial groups and integrates the group-structured prior information of HSI into the nonnegative matrix factorization.
- (5) NAE [53]: A deep learning-based nonlinear unmixing network trained pixel-wise.
- (6) rNMF [64]: An NMF-based nonlinear unmixing method that introduces an additional term into the model to account for nonlinear effects.
- (7) 3DAEU [57]: A deep learning-based nonlinear unmixing network that captures the spatial-spectral information of HSI with a 3D CNN; its nonlinear model is designed to encompass several existing handcrafted models.
- (8) A2SAN [65]: A deep learning-based linear unmixing network that uses spectral and spatial modules to extract the spatial-spectral information of HSI and employs attention mechanisms for direct reconstruction.
- (9) USTNet [44]: A deep learning-based linear unmixing network that employs shifted-window multi-head self-attention blocks to extract HSI feature maps at different scales, minimizing the loss of detailed information.
Each method was run independently ten times on each dataset, and the mean and standard deviation of each metric are reported.
4.5.1. Samson Dataset
Table 4 presents the abundance RMSE and endmember SAD obtained by the different unmixing methods on the Samson dataset. All unmixing methods extract the endmembers “Soil” and “Tree” accurately, but most have difficulty extracting “Water”, possibly because its low reflectance makes subtle differences hard to distinguish in the loss function. Only USTNet and the proposed method extract the “Water” endmember accurately, suggesting that introducing global spatial information aids endmember extraction, which is consistent with [45]. FCLSU with VCA-bundles as the endmember extraction method achieves the third-best mean SAD, further demonstrating the advantage of the proposed initialization method. Regarding abundance estimation, the proposed method and A2SAN clearly outperform the other methods, highlighting the advantage of using the scale-invariant SAD as a loss function in conjunction with spectral information extraction. Visual results of the abundances and endmembers are shown in Figure 11 and Figure 12, respectively. The comparisons indicate that deep learning methods have clear advantages over traditional methods in both endmember extraction and abundance estimation, with recent approaches yielding abundance maps closest to the ground truth.
4.5.2. Jasper Ridge Dataset
In this section, all the compared unmixing methods are applied to the Jasper Ridge dataset. Table 5 quantitatively reports the performance of all competing methods, and the abundance and endmember visualizations are presented in Figure 13 and Figure 14, respectively. NAE and rNMF confused “Dirt” and “Road”, resulting in the poorest performance. Although 3DAEU extracted all endmembers successfully, it completely ignored “Road” in abundance estimation. This indicates that nonlinear methods with more parameters are prone to falling into local minima and may require more prior information and careful hyperparameter tuning. Many linear methods performed better in endmember extraction on this dataset, although initialization hampered the extraction of “Road” by DeepTrans, A2SAN, and USTNet. FCLSU initialized by VCA-bundles achieved competitive performance, further suggesting that more accurate endmember initialization can bring greater benefits than sophisticated algorithm design. The abundance maps show that most comparative algorithms did not separate the roads accurately, whereas the proposed method, which fully considers the spatial and spectral information in HSI, achieved better road separation. Although it does not outperform the others in every individual comparison, the proposed method achieves the best mSAD and mRMSE on this dataset, demonstrating stable processing capability across different datasets.
4.5.3. Urban Dataset
Table 6 presents the abundance RMSE and endmember SAD achieved by the different unmixing methods on the Urban dataset. The Urban dataset is highly complex and contains a considerable number of outliers. Most methods struggle to distinguish “Grass” from “Tree”, and uDAS and SGSNMF cannot differentiate the two at all. Methods that incorporate spatial information, such as DeepTrans, 3DAEU, and A2SAN, perform well in extracting most endmembers, while the proposed method and USTNet achieve the best SAD for two of the endmembers. Owing to these endmember extraction difficulties, the abundance maps in Figure 15 show suboptimal performance for most methods, whereas A2SAN, USTNet, and the proposed method, all proposed within the last two years, integrate spatial-spectral information extraction with self-attention mechanisms and achieve the best results. Visual results of the endmembers are shown in Figure 16.
4.6. Processing Time
Table 7 presents the running times of all methods on the three datasets. FCLSU, uDAS, SGSNMF, and rNMF were implemented in MATLAB R2022a, while the remaining methods were implemented in Python using PyCharm 2022. The experiments were conducted on a computer equipped with an Intel Core i5-13600KF processor, 32 GB of memory, and an NVIDIA GeForce RTX 2080 Ti GPU. On the Samson and Jasper Ridge datasets, all methods ran within approximately one hundred seconds, except USTNet, which did not use GPU acceleration. On the larger Urban dataset, computation time increased sharply, particularly for the 3D CNN-based method 3DAEU. Despite extracting global spatial-spectral information, the proposed method ran faster than pixel-wise methods such as uDAS, keeping its computational cost acceptable.
Figure 11.
Abundance maps of soil, tree, water on the Samson dataset obtained by different methods.
Table 4.
Evaluation metrics SAD and RMSE on the Samson dataset (). Best results are in bold.
| Metric | Endmember | FCLSU | DeepTrans | uDAS | SGSNMF | NAE | rNMF | 3DAEU | A2SAN | USTNet | Proposed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SAD | Soil | 1.67 ± 0.06 | 2.49 ± 0.40 | 3.12 ± 0.09 | 1.63 ± 0.11 | 2.08 ± 0.03 | 3.52 ± 0.01 | 1.18 ± 0.06 | 2.26 ± 0.21 | 1.05 ± 0.02 | 1.46 ± 0.03 |
| | Tree | 3.71 ± 0.09 | 4.93 ± 0.24 | 5.44 ± 0.36 | 6.04 ± 0.38 | 4.96 ± 0.03 | 8.26 ± 0.01 | 2.97 ± 0.02 | 4.07 ± 0.04 | 3.35 ± 0.03 | 3.15 ± 0.02 |
| | Water | 9.85 ± 0.12 | 8.81 ± 0.56 | 13.93 ± 1.15 | 23.29 ± 0.30 | 13.31 ± 0.14 | 23.73 ± 0.02 | 24.26 ± 0.43 | 13.36 ± 0.37 | 2.47 ± 0.03 | 2.15 ± 0.13 |
| | Mean SAD | 5.08 ± 0.06 | 5.41 ± 0.34 | 7.49 ± 0.49 | 10.32 ± 0.03 | 6.78 ± 0.05 | 11.84 ± 0.01 | 9.49 ± 0.17 | 6.56 ± 0.08 | 2.29 ± 0.02 | 2.25 ± 0.04 |
| RMSE | Soil | 17.52 ± 0.04 | 16.40 ± 0.33 | 25.29 ± 0.84 | 20.12 ± 0.27 | 23.01 ± 1.95 | 25.87 ± 0.00 | 11.65 ± 0.05 | 7.63 ± 0.15 | 8.54 ± 0.13 | 9.20 ± 0.10 |
| | Tree | 16.28 ± 0.14 | 17.35 ± 1.03 | 25.29 ± 0.82 | 25.56 ± 0.55 | 22.19 ± 1.60 | 20.47 ± 0.00 | 6.45 ± 0.06 | 4.77 ± 0.08 | 11.11 ± 0.10 | 7.45 ± 0.07 |
| | Water | 28.29 ± 0.10 | 28.28 ± 0.96 | 41.37 ± 0.68 | 37.62 ± 0.27 | 36.07 ± 1.92 | 37.34 ± 0.00 | 10.03 ± 0.11 | 6.07 ± 0.18 | 9.37 ± 0.08 | 5.64 ± 0.16 |
| | Mean RMSE | 21.34 ± 0.07 | 20.68 ± 0.77 | 30.65 ± 0.57 | 27.76 ± 0.25 | 27.85 ± 1.43 | 28.77 ± 0.00 | 9.62 ± 0.04 | 6.16 ± 0.09 | 9.73 ± 0.09 | 7.57 ± 0.09 |
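The SAD and RMSE entries in Tables 4-6 can be computed along the lines of the following Python sketch. It assumes the standard definitions: SAD is the angle, in radians, between an estimated and a reference endmember spectrum, and RMSE is taken between an estimated and a reference abundance map over all pixels. It further assumes that the "Mean" rows are arithmetic averages over endmembers and that "±" denotes the population standard deviation over repeated runs. The function names are hypothetical, and any scaling applied to the tabulated values is not reproduced.

```python
import numpy as np


def sad(est, ref):
    """Spectral angle distance (radians) between two spectra."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    cos = est @ ref / (np.linalg.norm(est) * np.linalg.norm(ref))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding error


def rmse(est_map, ref_map):
    """Root-mean-square error between two abundance maps of equal shape."""
    diff = np.asarray(est_map, dtype=float) - np.asarray(ref_map, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def per_endmember_scores(E_est, E_ref, A_est, A_ref):
    """E_*: (bands, R) endmember matrices; A_*: (R, pixels) abundance matrices.

    Assumes columns/rows are already matched to the ground truth.
    Returns (per-endmember SAD, mean SAD, per-endmember RMSE, mean RMSE).
    """
    R = E_ref.shape[1]
    sads = np.array([sad(E_est[:, r], E_ref[:, r]) for r in range(R)])
    rmses = np.array([rmse(A_est[r], A_ref[r]) for r in range(R)])
    return sads, sads.mean(), rmses, rmses.mean()


def mean_pm_std(values):
    """Summarize repeated runs as 'mean ± std' (population std, ddof=0)."""
    v = np.asarray(values, dtype=float)
    return f"{v.mean():.2f} ± {v.std():.2f}"
```

Because SAD depends only on spectral shape, it is insensitive to the scale ambiguity between endmembers and abundances, whereas RMSE directly penalizes errors in the estimated fractions.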
Figure 12.
Comparison of the endmembers extracted by the different algorithms with the corresponding GTs on the Samson dataset.
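Because blind unmixing returns endmembers in an arbitrary order, each extracted endmember has to be paired with a ground-truth spectrum before endmember comparisons or per-endmember SAD scores can be reported. A common approach, assumed here rather than taken from this paper, is to choose the pairing that minimizes the total spectral angle. The Python sketch below does this by exhaustive search over permutations, which is cheap for the three or four endmembers used on these datasets; with many endmembers, the Hungarian algorithm (e.g., `scipy.optimize.linear_sum_assignment`) would be the usual substitute. The function names are hypothetical.

```python
import itertools

import numpy as np


def sad_matrix(E_est, E_ref):
    """Pairwise spectral angles (radians); entry [i, j] = angle(E_est[:, i], E_ref[:, j])."""
    est = E_est / np.linalg.norm(E_est, axis=0, keepdims=True)
    ref = E_ref / np.linalg.norm(E_ref, axis=0, keepdims=True)
    return np.arccos(np.clip(est.T @ ref, -1.0, 1.0))


def match_endmembers(E_est, E_ref):
    """Return perm so that E_est[:, perm[j]] is paired with E_ref[:, j].

    Exhaustive search over all R! pairings, minimizing the summed SAD.
    """
    D = sad_matrix(E_est, E_ref)
    R = E_ref.shape[1]
    best = min(
        itertools.permutations(range(R)),
        key=lambda p: sum(D[p[j], j] for j in range(R)),
    )
    return list(best)


# Example: the estimates are shuffled, rescaled copies of the references.
rng = np.random.default_rng(1)
E_ref = rng.random((50, 3)) + 0.1
E_est = 2.0 * E_ref[:, [2, 0, 1]]  # SAD is invariant to scaling
perm = match_endmembers(E_est, E_ref)
print(perm)  # [1, 2, 0]
```

After matching, `E_est[:, perm]` lines up column by column with `E_ref`, and the same permutation would be applied to the rows of the estimated abundance matrix before computing RMSE.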
Table 5.
SAD and RMSE results on the Jasper Ridge dataset (). Best results are in bold.
| Metric | Endmember | FCLSU | DeepTrans | uDAS | SGSNMF | NAE | rNMF | 3DAEU | A2SAN | USTNet | Proposed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SAD | Tree | 9.40 ± 0.20 | 5.42 ± 2.01 | 14.90 ± 1.76 | 13.71 ± 0.38 | 26.32 ± 0.06 | 24.94 ± 0.01 | 7.77 ± 0.93 | 11.41 ± 1.88 | 4.86 ± 0.17 | 4.03 ± 0.01 |
| | Water | 18.08 ± 1.12 | 11.18 ± 2.76 | 9.58 ± 1.63 | 21.27 ± 1.95 | 29.60 ± 0.28 | 28.69 ± 0.01 | 22.59 ± 2.68 | 12.45 ± 3.31 | 3.76 ± 0.03 | 5.15 ± 0.04 |
| | Dirt | 5.52 ± 0.28 | 6.24 ± 0.64 | 13.98 ± 5.00 | 13.94 ± 3.71 | 22.40 ± 0.02 | 5.49 ± 0.00 | 7.31 ± 0.50 | 11.30 ± 1.12 | 18.35 ± 0.29 | 2.42 ± 0.03 |
| | Road | 4.77 ± 0.30 | 16.75 ± 1.39 | 5.85 ± 0.20 | 4.07 ± 0.52 | 54.30 ± 0.24 | 70.20 ± 0.02 | 4.62 ± 0.19 | 17.70 ± 1.76 | 10.00 ± 0.24 | 4.14 ± 0.12 |
| | Mean SAD | 9.45 ± 0.31 | 9.90 ± 1.55 | 11.08 ± 0.72 | 13.25 ± 0.91 | 33.16 ± 0.09 | 32.33 ± 0.01 | 10.57 ± 0.83 | 13.21 ± 1.23 | 9.24 ± 0.14 | 3.93 ± 0.03 |
| RMSE | Tree | 9.29 ± 0.17 | 8.21 ± 0.55 | 16.16 ± 0.07 | 13.36 ± 0.44 | 18.89 ± 1.71 | 14.21 ± 0.01 | 7.06 ± 0.43 | 14.28 ± 1.94 | 12.05 ± 0.23 | 7.68 ± 0.09 |
| | Water | 9.03 ± 0.03 | 6.3 ± 0.52 | 19.91 ± 0.58 | 17.34 ± 0.67 | 8.42 ± 0.87 | 8.01 ± 0.00 | 6.64 ± 0.98 | 8.17 ± 1.05 | 6.20 ± 0.05 | 6.40 ± 0.03 |
| | Dirt | 11.09 ± 0.15 | 20.41 ± 0.90 | 12.58 ± 0.24 | 12.15 ± 0.31 | 25.61 ± 1.43 | 26.03 ± 0.01 | 23.34 ± 2.69 | 14.31 ± 1.17 | 23.51 ± 0.35 | 11.41 ± 0.08 |
| | Road | 6.46 ± 0.12 | 19.85 ± 1.00 | 12.45 ± 0.56 | 12.04 ± 0.32 | 21.70 ± 0.60 | 24.92 ± 0.00 | 22.76 ± 0.00 | 10.70 ± 0.77 | 31.60 ± 0.32 | 9.65 ± 0.04 |
| | Mean RMSE | 9.12 ± 0.07 | 15.15 ± 0.51 | 15.59 ± 0.15 | 13.89 ± 0.34 | 19.74 ± 0.81 | 19.78 ± 0.00 | 17.05 ± 0.82 | 17.05 ± 1.08 | 20.82 ± 0.20 | 8.99 ± 0.04 |
Figure 13.
Abundance maps of tree, water, dirt, and road on the Jasper Ridge dataset obtained by different methods.
Figure 14.
Comparison of the endmembers extracted by the different algorithms with the corresponding GTs on the Jasper Ridge dataset.
Table 6.
SAD and RMSE results on the Urban dataset (). Best results are in bold.
| Metric | Endmember | FCLSU | DeepTrans | uDAS | SGSNMF | NAE | rNMF | 3DAEU | A2SAN | USTNet | Proposed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SAD | Asphalt | 9.98 ± 0.24 | 15.36 ± 0.71 | 15.89 ± 2.88 | 27.64 ± 1.09 | 20.40 ± 0.02 | 19.30 ± 0.43 | 17.01 ± 0.45 | 14.86 ± 0.65 | 6.28 ± 0.04 | 7.90 ± 0.02 |
| | Grass | 22.41 ± 2.25 | 16.43 ± 1.54 | 114.0 ± 1.60 | 120.4 ± 7.80 | 63.40 ± 3.51 | 49.88 ± 3.87 | 20.03 ± 0.83 | 12.85 ± 0.80 | 9.79 ± 0.04 | 17.36 ± 0.03 |
| | Tree | 4.81 ± 0.21 | 10.36 ± 0.71 | 11.86 ± 2.60 | 8.56 ± 0.10 | 11.46 ± 0.13 | 11.78 ± 0.10 | 9.94 ± 0.12 | 9.29 ± 0.23 | 2.88 ± 0.01 | 2.60 ± 0.01 |
| | Roof | 4.04 ± 0.16 | 28.03 ± 3.19 | 27.31 ± 0.85 | 19.71 ± 3.24 | 81.15 ± 0.38 | 79.13 ± 0.58 | 47.44 ± 0.99 | 9.08 ± 0.51 | 3.46 ± 0.03 | 3.14 ± 0.03 |
| | Mean SAD | 10.31 ± 0.53 | 17.54 ± 1.42 | 42.27 ± 1.58 | 44.08 ± 1.00 | 44.20 ± 0.89 | 44.02 ± 0.70 | 23.60 ± 0.31 | 11.52 ± 0.41 | 5.60 ± 0.02 | 7.75 ± 0.01 |
| RMSE | Asphalt | 38.26 ± 0.14 | 13.09 ± 0.46 | 32.32 ± 1.26 | 30.08 ± 0.65 | 28.90 ± 0.05 | 26.91 ± 0.14 | 32.18 ± 0.87 | 12.79 ± 0.57 | 15.29 ± 0.12 | 13.07 ± 0.31 |
| | Grass | 54.05 ± 0.46 | 14.00 ± 0.71 | 45.31 ± 1.71 | 45.08 ± 0.36 | 25.76 ± 1.02 | 47.40 ± 0.20 | 27.50 ± 0.77 | 14.50 ± 0.65 | 13.53 ± 0.13 | 13.11 ± 0.53 |
| | Tree | 22.54 ± 0.77 | 11.65 ± 0.49 | 28.50 ± 2.42 | 26.15 ± 0.40 | 24.79 ± 0.22 | 40.15 ± 0.63 | 23.78 ± 0.21 | 13.83 ± 0.20 | 7.44 ± 0.06 | 10.84 ± 2.86 |
| | Roof | 21.83 ± 0.16 | 11.67 ± 0.32 | 20.73 ± 0.41 | 19.50 ± 0.26 | 19.97 ± 1.14 | 15.94 ± 0.02 | 14.86 ± 0.65 | 11.13 ± 0.44 | 8.47 ± 0.07 | 8.08 ± 0.20 |
| | Mean RMSE | 36.64 ± 0.24 | 12.60 ± 0.43 | 31.71 ± 0.59 | 30.20 ± 0.41 | 25.07 ± 0.10 | 34.77 ± 0.21 | 25.39 ± 0.44 | 13.13 ± 0.40 | 11.66 ± 0.08 | 11.46 ± 0.98 |
Figure 15.
Abundance maps of asphalt, grass, tree, and roof on the Urban dataset obtained by different methods.
Figure 16.
Comparison of the endmembers extracted by the different algorithms with the corresponding GTs on the Urban dataset.
Table 7.
Processing time (in seconds) of each method on the three datasets. Best results are in bold.
| Dataset | FCLSU | DeepTrans | uDAS | SGSNMF | NAE | rNMF | 3DAEU | A2SAN | USTNet | Proposed |
|---|---|---|---|---|---|---|---|---|---|---|
| Samson | 1.21 | 6.56 | 13.24 | 12.83 | 4.95 | 10.65 | 44.77 | 11.83 | 75.22 | 27.39 |
| Jasper Ridge | 1.93 | 12.86 | 62.95 | 16.63 | 6.62 | 19.34 | 98.89 | 9.12 | 227.43 | 71.01 |
| Urban | 9.20 | 78.04 | 429.65 | 192.83 | 41.00 | 113.02 | 7855.08 | 79.44 | 1261.15 | 331.91 |