1. Introduction
Hyperspectral imaging technology can discriminate between different materials on the basis of their unique spectral signatures. A hyperspectral image (HSI) can be seen as a set of “images” collected by a hyperspectral sensor, each representing a narrow wavelength range of the electromagnetic spectrum, known as a spectral band. These “images” are stacked to form a three-dimensional hyperspectral data cube that contains rich spatial and spectral information. Owing to this information, hyperspectral imaging has been widely used in practical applications [1,2], such as precision agriculture [1], identification of the germination status of tree seeds [2], discrimination between healthy and unhealthy skin [3], and target detection [4]. Target detection [3] has gained much attention over the years as a prominent hyperspectral application. One subcategory of hyperspectral target detection is anomaly detection (AD), which is particularly attractive and challenging because it requires no prior spectral knowledge about the target.
AD methods mainly rely on the difference between the pixel under test (PUT) and the local or global background to detect targets. Consequently, accurate estimation of the background plays an important role. Over the last two decades, many AD methods with good detection performance have been proposed. Statistical AD is the most traditional approach; it assumes that the HSI background obeys a particular statistical distribution. A well-known representative is the RX detector [4], which models the background as a multivariate Gaussian distribution and identifies targets by measuring the Mahalanobis distance between a pixel vector and the background [5]. Two extended versions of the RX detector, the global RX (GRX) and the local RX (LRX), have been developed [6]. They estimate the background statistics (i.e., the mean vector and covariance matrix) globally and locally, respectively, the latter using a sliding window. Although the Gaussian assumption is mathematically convenient, it struggles to describe complex backgrounds accurately. To overcome this shortcoming, other RX-based algorithms have been proposed. For example, the cluster-based anomaly detector (CBAD) clusters the background to estimate its statistics and conducts AD on individual clusters [7]. Gaussian mixture-based methods combine a set of unimodal Gaussian distributions to characterize the background, which reportedly describes complex backgrounds more accurately by accounting for the presence of multiple materials [8,9]. The weighted RXD (W-RXD) and linear filter-based RXD (LF-RXD) were introduced in [10]; both aim to estimate the covariance matrix and mean vector of the background more accurately. Furthermore, to exploit nonlinear relationships between spectral bands, a kernel version of the RX method was proposed [11]. Kernel RX extends RX to a higher-dimensional feature space associated with the original input space through a nonlinear mapping function. Other kernel-based AD methods include the Support Vector Data Description (SVDD) detector [12], the Robust Kernel Regression Analysis detector (RKRAD) [13], the nonlinear spectral-spatial composite kernel-based detector (SSCKD) [14], the Kernel Independent Component Analysis (KICA) detector [15], and the cluster Kernel RX (CKRX) detector [16]. The main disadvantage of these kernel-based methods is that an inappropriate choice of kernel function often leads to unstable detection performance. Another AD strategy is to reduce the spatial variation of the background through a preliminary step called background suppression. This step can be conducted using Orthogonal Subspace Projection (OSP) techniques [17], which suppress the background by projecting the data onto a subspace orthogonal to the background subspace.
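The two RX variants described above can be sketched as follows. This is a minimal NumPy sketch, assuming an image cube of shape (rows, cols, bands); it is not the toolbox implementation used in the experiments, and the window sizes `inner=3` and `outer=9` are illustrative assumptions rather than values from this study.

```python
import numpy as np

def global_rx(cube):
    """GRX: Mahalanobis distance of each pixel to the global background mean and covariance."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    # Pseudo-inverse guards against a singular covariance matrix
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    D = X - mu
    # Row-wise quadratic form d^T C^{-1} d for every pixel
    scores = np.einsum("ij,jk,ik->i", D, cov_inv, D)
    return scores.reshape(rows, cols)

def local_rx(cube, inner=3, outer=9):
    """LRX: statistics come from the ring between an inner (guard) window and an outer window."""
    rows, cols, _ = cube.shape
    r_in, r_out = inner // 2, outer // 2
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # Outer window, clipped at the image border
            i0, i1 = max(i - r_out, 0), min(i + r_out + 1, rows)
            j0, j1 = max(j - r_out, 0), min(j + r_out + 1, cols)
            ring = np.ones((i1 - i0, j1 - j0), dtype=bool)
            # Exclude the inner window, which contains the pixel under test
            ring[max(i - r_in, 0) - i0:min(i + r_in + 1, rows) - i0,
                 max(j - r_in, 0) - j0:min(j + r_in + 1, cols) - j0] = False
            bg = cube[i0:i1, j0:j1][ring].astype(float)
            d = cube[i, j].astype(float) - bg.mean(axis=0)
            scores[i, j] = d @ np.linalg.pinv(np.cov(bg, rowvar=False)) @ d
    return scores

# Toy usage: Gaussian background with one bright anomalous pixel
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 5))
cube[10, 10] += 10.0
g, l = global_rx(cube), local_rx(cube)
```

GRX uses one covariance for the whole scene, while LRX re-estimates it per pixel, which is why LRX is far slower but adapts to local background changes.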
Recently, representation-based AD has attracted much attention because it describes background features accurately and objectively. The most popular algorithms are based on sparse representation theory. Chen et al. [18] were the first to employ sparse representation for hyperspectral target detection. They constructed two sub-dictionaries, one from the background and one from the target, and treated target detection as a special binary sparse representation classification (SRC) [19] task, assuming that the background and the target lie in two different subspaces. In anomaly detection, however, prior information about the target is generally unavailable, so a target dictionary cannot be built if sparse representation is adopted. A few studies have investigated sparse representation-based AD, mainly focusing on the construction of a background dictionary. For example, Li et al. [20] proposed a background joint sparse representation AD algorithm. It employs a sparse representation model to select, from all pixels, those that best represent the local background. These local background pixels are then used to establish an orthogonal subspace, and finally a traditional subspace detection operator is used to extract the target. Because this algorithm uses concentric double sliding windows and all image pixels when processing each target pixel, it is very time-consuming and unsuitable for real-time processing. AD methods based on low-rank and sparse representation have also been proposed. Their main idea is to decompose the HSI into low-rank and sparse parts: if the background is low-rank and the target is sparse, the target can be extracted from the sparse part [21,22]. Sun et al. [23] used the GoDec algorithm to decompose the HSI into a low-rank background matrix and a sparse matrix, and then extracted the anomaly targets from the sparse matrix using the Euclidean distance. Zhang et al. [24] proposed an AD method based on the Mahalanobis distance within a low-rank decomposition framework, which extracts anomaly targets through the Mahalanobis distance after image decomposition. Xu et al. [25] and Niu et al. [22] also studied low-rank and sparse representation-based AD algorithms for hyperspectral images.
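A simplified low-rank plus sparse decomposition in the spirit of GoDec [23] can be sketched as follows. This is an illustration under stated assumptions, not the original GoDec algorithm: it uses an exact truncated SVD instead of GoDec's bilateral random projections, and the target rank `rank` and the sparse cardinality `card` are hypothetical parameters. Each pixel is then scored by the Euclidean norm of its sparse component.

```python
import numpy as np

def lowrank_sparse_decompose(X, rank=3, card=50, n_iter=30):
    """Alternate X ~ L + S with rank(L) <= rank and at most `card` nonzeros in S.

    X: (bands, pixels) matrix of the unfolded HSI.
    """
    L = np.zeros_like(X, dtype=float)
    S = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        # Low-rank update: best rank-`rank` approximation of X - S
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse update: keep only the `card` largest-magnitude residual entries
        R = X - L
        S = np.zeros_like(R)
        idx = np.argpartition(np.abs(R), -card, axis=None)[-card:]
        S.flat[idx] = R.flat[idx]
    return L, S

def sparse_anomaly_score(S):
    """Per-pixel anomaly score: Euclidean norm of each pixel's sparse component."""
    return np.linalg.norm(S, axis=0)

# Toy usage: rank-3 background with two anomalous pixels (10 bands each)
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 200))
X[:, 5] += 5.0
X[:, 50] += 5.0
L, S = lowrank_sparse_decompose(X, rank=3, card=20)
scores = sparse_anomaly_score(S)
```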
Although the above-mentioned sparse representation-based methods have achieved good performance, they neglect an important physical phenomenon: mixed pixels are very common in HSIs. Hyperspectral unmixing enables an accurate characterization of background features in AD [26,27]. Qu et al. [27] realized hyperspectral AD through spectral unmixing and dictionary-based low-rank decomposition. Their method first employed the traditional Minimum Volume Constrained Nonnegative Matrix Factorization (MV-NMF) method to obtain the endmembers and abundance matrix from the input HSI. Second, a dictionary was constructed from the abundance features. Third, a low-rank matrix factorization method decomposed the abundance features into a low-rank matrix containing the background and a sparse matrix containing the targets. Finally, the anomaly targets were extracted from the sparse part. These results show that unmixing can describe background features with high accuracy at the sub-pixel level and thereby improve detection performance. However, the available spectral unmixing-based AD methods have two main shortcomings: (1) they mostly employ a simple Nonnegative Matrix Factorization (NMF) unmixing framework in which the generation of the endmembers is not well constrained, so the endmembers may not be representative; and (2) the obtained endmembers and abundances have not been used effectively.
To address the first issue, which arises when MV-NMF unmixing is used for AD, the first objective of this study is to introduce the Archetypal Analysis (AA) model [28] to implement spectral unmixing and to aid sparse representation-based AD. This model has been shown to have great potential for spectral unmixing [29,30]. AA explicitly models how the endmembers are generated from the original data. Compared with MV-NMF, this interpretability makes it possible to control the generation of virtual endmembers and thus obtain more accurate endmembers and abundances. Moreover, the initialization strategy for the archetypes makes AA less likely than MV-NMF with random initialization to converge to a local optimum [29]. Instead of nonnegativity constraints, AA imposes simplex constraints to optimize the principal convex hull; these satisfy the nonnegativity and sum-to-one constraints of linear unmixing and yield both low-rank and sparse representation effects. Such properties have been shown to be of potential value for AD [21,22]. Therefore, in this study, AA-based spectral unmixing of the main components of the HSI is first conducted, and the spectral unmixing reconstruction error is used for AD. At the same time, assuming that the background is the dominant, low-rank component whereas the anomaly targets are sparse, the representative endmembers obtained by AA are argued to form a background dictionary. To make full use of the AA unmixing result, the generated endmembers are then embedded in a structured sparse representation model, and the sparse reconstruction error is used for AD. The final target determination is realized through a linear fusion of the spectral unmixing error and the sparse reconstruction error.
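The AA factorization referred to above can be written as X ≈ XBA, where the columns of B and of A lie on the probability simplex, so that each endmember E = XB is a convex combination of pixels and each abundance vector is nonnegative and sums to one. The sketch below is a minimal illustration, assuming NumPy and a plain alternating projected-gradient solver with random-pixel initialization; it is neither the authors' solver nor the archetype initialization strategy of [29]. The per-pixel residual norm plays the role of a spectral unmixing reconstruction error.

```python
import numpy as np

def project_simplex_columns(V):
    """Euclidean projection of each column of V onto the probability simplex (sort-based)."""
    m, n = V.shape
    U = -np.sort(-V, axis=0)                       # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    k = np.arange(1, m + 1)[:, None]
    cond = U - css / k > 0
    rho = m - np.argmax(cond[::-1], axis=0) - 1    # last row index where cond holds
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(V - theta, 0.0)

def archetypal_analysis(X, m, n_iter=200, seed=0):
    """X: (bands, pixels). Returns endmembers E = X @ B (bands, m) and abundances A (m, pixels)."""
    d, n = X.shape
    rng = np.random.default_rng(seed)
    # Assumed initialization: each archetype starts at a randomly chosen pixel
    B = np.zeros((n, m))
    B[rng.choice(n, size=m, replace=False), np.arange(m)] = 1.0
    A = project_simplex_columns(rng.random((m, n)))
    x_norm2 = np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        # Projected gradient step on A for 0.5 * ||X - E A||_F^2
        E = X @ B
        step_A = 1.0 / (np.linalg.norm(E, 2) ** 2 + 1e-12)
        A = project_simplex_columns(A - step_A * (E.T @ (E @ A - X)))
        # Projected gradient step on B for 0.5 * ||X - X B A||_F^2
        R = X @ B @ A - X
        step_B = 1.0 / (x_norm2 * np.linalg.norm(A, 2) ** 2 + 1e-12)
        B = project_simplex_columns(B - step_B * (X.T @ (R @ A.T)))
    return X @ B, A

def unmixing_error(X, E, A):
    """Per-pixel reconstruction error ||x_i - E a_i||_2."""
    return np.linalg.norm(X - E @ A, axis=0)

# Toy usage on random "spectra"
rng = np.random.default_rng(0)
X = rng.random((6, 300))
E, A = archetypal_analysis(X, m=4, n_iter=100)
err = unmixing_error(X, E, A)
```

Because B is simplex-constrained, every endmember stays inside the convex hull of the data, which is the property that distinguishes AA from unconstrained NMF-style endmember generation.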
The remainder of this paper is organized as follows. Section 2 provides a detailed description of the proposed method. Section 3 verifies the proposed algorithm and analyzes the parameters on simulated and real-world datasets. Conclusions are drawn in Section 4.
3. Experimental Results
3.1. Datasets Description
In this paper, four datasets are utilized to evaluate the detection performance of the proposed method.
3.1.1. Simulated Dataset
The simulated data, created using the linear weighting method, are based on real-world hyperspectral data. The real hyperspectral reflectance data were collected by AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) over San Diego Airport, CA, USA. The spatial resolution is 3.5 m, and the data have 224 spectral bands from 370 nm to 2510 nm. After removing the water absorption bands and bands with low signal-to-noise ratios or spurious data, 189 bands were retained for further analysis. The original size of the image is
pixels. In this paper, a sub-image of size
pixels was selected to create the simulated data. The original image and the selected area are shown in Figure 2. The method used to generate the simulated data is described in [25,35] and is not repeated here. The targets were embedded into the background by target implantation. Specifically, according to the linear mixing model, the current background pixel $\mathbf{q}$ was replaced by a new pixel $\mathbf{x}$, formed as a linear combination of the target spectrum $\mathbf{t}$ and $\mathbf{q}$ weighted by the abundance fraction $\alpha$:
$$\mathbf{x} = \alpha\,\mathbf{t} + (1-\alpha)\,\mathbf{q}$$
Figure 3 shows the simulated data and the ground truth. A total of 18 targets were implanted into the background, evenly distributed in 3 rows and 6 columns. Targets in the same row have the same size. The sizes (in pixels) of the targets, from top to bottom, are
, respectively. The abundance fraction is the same within each column; from left to right, it is
, respectively. The man-made aircraft in the HSI was treated as the target, and the spectrum extracted from the aircraft pixels was used as the target spectrum $\mathbf{t}$.
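The implantation rule x = αt + (1 − α)q can be sketched as below. The 3 × 6 grid follows the layout described above, but because the actual target sizes and abundance fractions are not given here, the `sizes_by_row` and `alphas_by_col` values, the grid spacing, and the zero-valued toy background are hypothetical.

```python
import numpy as np

def implant_targets(background, target, positions, sizes, alphas):
    """Implant square targets with x = alpha * t + (1 - alpha) * q.

    background: (rows, cols, bands); target: (bands,)
    positions: (row, col) top-left corners; sizes / alphas: side length and abundance per target.
    """
    cube = background.astype(float).copy()
    truth = np.zeros(cube.shape[:2], dtype=bool)
    for (r, c), s, a in zip(positions, sizes, alphas):
        q = cube[r:r + s, c:c + s]                  # current background pixels
        cube[r:r + s, c:c + s] = a * target + (1.0 - a) * q
        truth[r:r + s, c:c + s] = True
    return cube, truth

# Hypothetical 3 x 6 layout: size fixed per row, abundance fixed per column
sizes_by_row = [1, 2, 3]
alphas_by_col = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
positions, sizes, alphas = [], [], []
for i, s in enumerate(sizes_by_row):
    for j, a in enumerate(alphas_by_col):
        positions.append((10 + 20 * i, 10 + 20 * j))
        sizes.append(s)
        alphas.append(a)
cube, truth = implant_targets(np.zeros((60, 120, 4)), np.ones(4), positions, sizes, alphas)
```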
3.1.2. Real-World Dataset
There are three real-world datasets. The first is the Urban dataset shown in Figure 4a, which was collected by the HYDICE sensor. Its spectral coverage is 400–2500 nm, its spectral resolution is 10 nm, and its spatial resolution is 2 m. The entire image contains 210 bands and has a size of
pixels. This urban area contains vegetation, buildings, and several roads with some vehicles. The background mainly consists of trees, soil, grass, and asphalt, so man-made objects were used as anomaly targets. Ground-truth reference data for the entire image are difficult to obtain [36]. A sub-image (shown in Figure 4b) of size
from the upper right corner was selected as the test image. The bands with low signal-to-noise ratios and the water vapor absorption bands (1–4, 76, 87, 101–112, 136–153, and 197–210) were removed, leaving 160 spectral bands. In this dataset, 21 small man-made targets (cars and roofs) of different sizes, embedded in different backgrounds, were used as the anomaly targets [25,37]. The ground-truth map is shown in Figure 4c.
The second and third real-world datasets were selected from the San Diego Airport dataset, with the same areas and sizes as in previous studies [13,24,25]. The second dataset is a
pixels sub-image from the bottom left of the original HSI. As shown in Figure 5, this dataset is named “Sandiego60”, and the aircraft were used as the targets. The third dataset, named “Sandiego80”, is a sub-image of size
pixels from the upper left corner of the original image. The selected scene is mainly composed of buildings with different roofs, an airport runway, and a small amount of vegetation. The aircraft are the anomaly targets. The selected area and the ground-truth map are shown in Figure 6.
3.2. Comparison Methods
To evaluate the performance of the proposed method, eight state-of-the-art methods were employed for comparison: Global RX (GRX) [4], Local RX (LRX) [6], Background Joint Sparse Representation Detection (BJSRD) [20], the Local Summation Anomaly Detector (LSAD) [38], Collaborative Representation for Hyperspectral Anomaly Detection (CRD) [39], the Cluster-Based Anomaly Detector (CBAD) [7], Low-Rank and Sparse Representation (LRASR) [25], and Abundance and Dictionary-based Low Rank Decomposition (ADLR) [27]. Among these, a MATLAB toolkit (https://github.com/davidkun/HyperSpectralToolbox, accessed on 15 February 2016) was used to run the first two methods. The code of the CRD algorithm is available at https://sites.google.com/site/godecomposition/code (accessed on 15 June 2017). The code for the other methods is not publicly accessible; the low-rank representation and clustering steps they require were implemented with the corresponding MATLAB toolboxes.
(a). GRX and LRX are the benchmark algorithms for comparison. They assume that the background obeys a fixed statistical model.
(b). BJSRD is an AD algorithm based on sparse representation theory. It uses the traditional orthogonal matching pursuit (OMP) algorithm [40] to optimize the sparse coefficients.
(c). The LSAD algorithm is similar to the single-window LRX algorithm. It uses a multi-window sliding filter to capture the various spatial distributions of pixels adjacent to the pixel under test, and this local aggregation strategy integrates spectral and spatial information to improve detection performance.
(d). The CRD algorithm is based on the principle that each background pixel can be approximated by its spatial neighborhood, whereas an anomaly cannot. Background estimation is realized with a sliding dual window that approximates each central pixel as a linear combination of the surrounding samples. Anomalies are extracted from the residual image obtained by subtracting the predicted background from the original HSI.
(e). The CBAD method divides the HSI into clusters using a Gaussian mixture model (GMM). The Mahalanobis distance is then calculated between the pixel under test and the center of each cluster, and pixels whose distance exceeds a predefined threshold are declared anomalies.
(f). LRASR decomposes the HSI into a low-rank representation of the background and a sparse residual that contains the anomalies. The k-means clustering method is used for dictionary construction.
(g). ADLR is a typical anomaly detection method based on spectral unmixing. It first uses the traditional MV-NMF to unmix the input HSI, and then constructs a dictionary from clusters obtained from the abundance matrix with the mean-shift clustering algorithm. The dictionary is used in a low-rank matrix decomposition model to divide the abundance matrix into a low-rank part (background) and a sparse part (targets), and the anomalies are extracted from the sparse part. Like the proposed method, ADLR analyzes the HSI at the sub-pixel level using spectral unmixing.
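The collaborative representation used by CRD can be sketched for a single pixel under test as a distance-weighted ridge regression on the surrounding background samples, with the residual norm as the anomaly score. This minimal sketch reflects our reading of [39], not the released CRD code; it assumes the samples `Xs` have already been collected from the dual-window ring, and the regularization weight `lam` is an illustrative value.

```python
import numpy as np

def crd_score(y, Xs, lam=1e-2):
    """Collaborative-representation residual of one pixel under test.

    y  : (bands,) pixel under test
    Xs : (bands, s) background samples from the dual-window ring around y
    lam: regularization weight (illustrative)
    """
    # Distance-weighted Tikhonov term: samples far from y are penalized more
    gamma = np.linalg.norm(Xs - y[:, None], axis=0)
    G = Xs.T @ Xs + lam * np.diag(gamma ** 2)
    # Closed-form representation weights of the regularized least-squares problem
    alpha = np.linalg.lstsq(G, Xs.T @ y, rcond=None)[0]
    # Residual between the pixel and its collaborative background estimate
    return np.linalg.norm(y - Xs @ alpha)

# Toy usage: background samples spanning a 2-D subspace of a 10-band space
rng = np.random.default_rng(2)
basis = rng.normal(size=(10, 2))
Xs = basis @ rng.random((2, 8))
y_bg = basis @ np.array([0.5, 0.5])
v = rng.normal(size=10)
y_anom = v - basis @ np.linalg.lstsq(basis, v, rcond=None)[0]  # orthogonal to the background
r_bg, r_anom = crd_score(y_bg, Xs), crd_score(y_anom, Xs)
```

A background-like pixel is reproduced almost exactly by its neighbors, whereas a pixel outside their span keeps at least its distance to that span as residual.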
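Since the proposed method integrates the spectral unmixing reconstruction error (SURE) and the structured sparse representation reconstruction error (SSRRE) for AD, the detection results of each component are also presented separately in the experiments; they are denoted the SURE detector (SUREAD) and the SSRRE detector (SSRREAD).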
3.3. Experimental Results and Analysis
The Receiver Operating Characteristic (ROC) curve, generated on a per-pixel basis, and the Area Under the ROC Curve (AUC) are used to evaluate the methods. Given a threshold, the detection result can be transformed into a binary image, where 1 indicates that a target is present in the pixel and 0 indicates that it is absent. Based on the ground truth, the ROC curve traces the detection probability $P_d$ against the false alarm rate $P_f$ over all possible thresholds. $P_d$ and $P_f$ are defined as
$$P_d = \frac{N_{\text{hit}}}{N_{\text{target}}}, \qquad P_f = \frac{N_{\text{miss}}}{N_{\text{total}}},$$
where $N_{\text{hit}}$ is the number of anomalous target pixels detected at a given threshold, $N_{\text{target}}$ is the total number of anomalous target pixels in the image, $N_{\text{miss}}$ is the number of background pixels detected as targets, and $N_{\text{total}}$ is the total number of pixels in the image. A better detector has an ROC curve closer to the top-left corner, i.e., a larger AUC. The AUC is therefore also computed as a quantitative measure of detection performance. A detailed explanation of these two indexes can be found in [41].
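Following the definitions P_d = N_hit/N_target and P_f = N_miss/N_total, the per-pixel ROC curve and its AUC can be computed as sketched below (NumPy assumed; not the evaluation code used in this study). Many implementations instead normalize P_f by the number of background pixels so that the curve ends at (1, 1); the hypothetical flag `pf_over_background` switches between the two conventions, which give different AUC values for the same scores.

```python
import numpy as np

def pixel_roc(scores, truth, pf_over_background=False):
    """Per-pixel ROC: returns (P_f, P_d) over all distinct thresholds, starting at (0, 0)."""
    s = np.asarray(scores, dtype=float).ravel()
    t = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(-s, kind="stable")          # descending score = decreasing threshold
    s_sorted, t_sorted = s[order], t[order]
    n_hit = np.cumsum(t_sorted)                     # target pixels detected so far
    n_miss = np.cumsum(~t_sorted)                   # background pixels detected as targets
    # One ROC point per distinct threshold: last index of each run of tied scores
    last = np.r_[np.nonzero(np.diff(s_sorted))[0], s.size - 1]
    n_denominator = (~t).sum() if pf_over_background else t.size
    pd = np.r_[0.0, n_hit[last] / t.sum()]
    pf = np.r_[0.0, n_miss[last] / n_denominator]
    return pf, pd

def trapezoid_auc(pf, pd):
    """Area under the (P_f, P_d) curve by the trapezoidal rule."""
    return float(np.sum(np.diff(pf) * (pd[1:] + pd[:-1]) / 2.0))

# Toy usage: a detector that ranks both target pixels first
scores = np.array([0.9, 0.1, 0.2, 0.8])
truth = np.array([True, False, False, True])
auc_bg = trapezoid_auc(*pixel_roc(scores, truth, pf_over_background=True))
auc_total = trapezoid_auc(*pixel_roc(scores, truth))
```

In this toy case `auc_bg` is 1.0, while `auc_total` is 0.5 because P_f can only reach the background fraction of the image; on real scenes, where targets are a tiny fraction of pixels, the two conventions differ much less.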
Figure 7 and Figure 8 present the detection results of the methods on the simulated data in 2-D and 3-D views. The targets in this dataset were obtained as linear combinations of background pixels and the target spectrum; thus, mixed pixels account for a high proportion of the image. The two figures show that the simulated targets detected by our proposed method (Figure 7k) are visually closest to the ground truth shown in Figure 3b, which is further confirmed by the number of visible detected targets in Figure 8k. The SURE results, shown in Figure 7i, were obtained by reconstructing the original image via AA unmixing; they show that this component suppresses the background to a large extent. After the dictionary was constructed from the endmembers and inserted into the structured sparse representation model, the SSRRE detection result shown in Figure 7j was obtained. Although the targets are highlighted against the background, false alarms also appear. Figure 7k shows the final fused result of SURE and SSRRE, which achieves background suppression and target highlighting at the same time. The ROC curves and AUC values of the detection results on the simulated data are shown in Figure 9. The proposed algorithm achieved the highest AUC value (0.99711), indicating its effectiveness, although LRX came very close.
For the Urban data, the detection results in 2-D and 3-D views are shown in Figure 10 and Figure 11, and the corresponding ROC curves and AUC values are summarized in Figure 12. The SURE detection results (Figure 10i) clearly do not achieve background suppression. However, when combined with SSRRE, the targets are prominently highlighted, as shown in Figure 10k. The proposed method achieves the highest AUC value, 0.98817 (Figure 12), clearly surpassing the other methods. Unlike on the simulated dataset, its margin over LRX on the Urban data is particularly large.
The last two real-world datasets, Sandiego60 and Sandiego80, both come from the San Diego Airport dataset. The 18 aircraft in Sandiego60 are small, and the scene has relatively low background complexity. Sandiego80 contains three aircraft, which are larger than those in Sandiego60, but its background is more challenging.
The detection results, ROC curves, and AUC values on Sandiego60 are shown in Figure 13, Figure 14, and Figure 15, and those on Sandiego80 in Figure 16, Figure 17, and Figure 18. The traditional statistics-based methods (such as GRX, LRX, and LSAD) assume that the background obeys a fixed model and, as a result, have low detection accuracy. Although representation-based methods (such as CRD and LRASR) describe the background features better, they do not consider spectral mixing, and their detection results are not satisfactory. The spectral unmixing-based ADLR method uses only the abundance matrix obtained after unmixing, which leads to poor detection results. The proposed algorithm employs sparse representation theory to model the background features accurately and also accounts for mixed pixels at the sub-pixel level. A comparison of the best AUC values shows that all detection methods, including ours, performed better on Sandiego80 than on Sandiego60, and that on Sandiego80 our method surpassed the second-ranked method by a larger margin.
To evaluate the computational complexity of the proposed method, the running times of all methods are summarized in Table 1. All experiments were implemented in MATLAB on a laptop with 16 GB of RAM and a 64-bit Intel Core i7-9750H CPU running at 2.6 GHz. As expected, the running time of all algorithms increases with data size. Compared with the other methods, the time consumed by the proposed method is moderate and acceptable.
3.4. Analysis of Parameters and Unmixing Method
In this section, the main parameters involved in the proposed algorithm are briefly summarized and analyzed. In addition, the AA unmixing method is compared with the well-known Minimum Volume constrained Nonnegative Matrix Factorization (MV-NMF) spectral unmixing method [27] on all datasets to indirectly verify the effectiveness of the proposed method.
3.4.1. Parameter Analysis
The main parameters of the proposed method are the number of endmembers $m$ and the balance parameter in the final fusion error. To analyze how the number of endmembers affects detection performance in terms of AUC, 25 values were taken in the range 1–50 at an interval of 2, with the balance parameter fixed at its best value of 0.5.
Figure 19a presents the AUC values obtained by the proposed method with varying numbers of endmembers. When the number of endmembers is too small or too large, the background is poorly represented, resulting in poor detection performance on all datasets. The degree of fluctuation differs between datasets; the variation on the synthetic data is relatively smaller than on the three real datasets. Figure 19a shows that the proposed method performs well on the simulated data when the number of endmembers is in the range 10–18, and on the Urban dataset when it is 8–10. On the two San Diego subsets, more endmembers were needed for our method to represent the background well and obtain promising detection results, more so for Sandiego80 than for Sandiego60.
For the balance parameter, 11 values were selected in the range [0, 1] at a step of 0.1. As shown in Figure 19b, good results are obtained at values near 0.5, so the best results reported for our method on all datasets were obtained with the balance parameter set to 0.5. When the balance parameter is 0 or 1, only a single reconstruction error is used to extract anomaly targets; in these special cases, SUREAD or SSRREAD works alone. The detection performance is poor in both special cases, so Figure 19b further demonstrates the necessity of fusing the two parts.
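The linear fusion and the balance-parameter sweep can be sketched as follows. Two points are assumptions rather than details given in this section: both error maps are min-max normalized before fusion, and the weight `lam` multiplies SURE (so `lam = 1` reproduces SUREAD and `lam = 0` reproduces SSRREAD). The toy scores are made up only to show the mechanics, each single error having one false alarm, and the AUC uses the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve with P_f normalized by the number of background pixels.

```python
import numpy as np

def minmax(v):
    """Rescale an error map to [0, 1] (assumed normalization step)."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def fuse(sure, ssrre, lam):
    """Linear fusion; lam = 1 keeps only SURE, lam = 0 keeps only SSRRE (assumed convention)."""
    return lam * minmax(sure) + (1.0 - lam) * minmax(ssrre)

def auc_pairwise(scores, truth):
    """AUC as the probability that a target pixel outscores a background pixel (ties count 1/2)."""
    s = np.asarray(scores, dtype=float).ravel()
    t = np.asarray(truth, dtype=bool).ravel()
    pos, neg = s[t], s[~t]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Toy error maps: each single error has a different false alarm
truth = np.array([True, True, False, False, False, False])
sure = np.array([0.9, 0.5, 0.95, 0.1, 0.2, 0.1])
ssrre = np.array([0.6, 0.9, 0.1, 0.95, 0.1, 0.2])
lams = np.round(np.linspace(0.0, 1.0, 11), 1)   # 11 values in [0, 1] at a step of 0.1
aucs = {float(lam): auc_pairwise(fuse(sure, ssrre, lam), truth) for lam in lams}
```

In this toy case both single-error detectors reach an AUC of 0.75, while the equal-weight fusion ranks both targets above all background pixels, mirroring the qualitative behavior described for Figure 19b.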
3.4.2. Unmixing Method Analysis
The proposed method is inspired by AD strategies based on spectral unmixing, which attempt to analyze and extract background features at the sub-pixel level, and it adopts the AA spectral unmixing strategy. To analyze the advantages of AA unmixing, we replaced it with the well-known MV-NMF [27]. For a comprehensive evaluation, 25 values of the number of endmembers were taken in the range 1–50 at intervals of 2 to calculate the AUC. Figure 20 shows the detection performance in terms of AUC obtained with AA and MV-NMF on each dataset. AA unmixing-based AD outperforms MV-NMF-based AD under every background representation condition, which indicates that AA is the more suitable unmixing method for AD in this framework.
4. Discussion
Anomaly detection with hyperspectral remote sensing is an important task in both military and civilian monitoring. To identify abnormal targets accurately, a good representation of the background information plays a vital role. This task is particularly difficult because mixed pixels are common in remote sensing images. Previous studies have addressed this problem by using spectral unmixing to generate more representative features for AD. However, simple low-rank representation methods, such as PCA [20] and MV-NMF [27], have been adopted to learn the background information, which limits the improvement in detection performance. In this study, the well-constrained AA model is introduced to conduct spectral unmixing, mainly to extract representative signatures that directly represent the background information. Here, we discuss the implications of our results with respect to the applicability of the AA unmixing model for background representation, the key parameter in spectral unmixing, and the complexity of the background.
Because of mixed pixels in hyperspectral images, precise target detection needs to be conducted at the sub-pixel level, which makes spectral unmixing an essential step for accurate target detection. As mentioned in the previous section, AD methods based on low-rank and sparse representation decompose the HSI into low-rank and sparse parts, assuming that if the background is low-rank and the target is sparse, the target can be extracted from the sparse part [21,22]. The fact that AA also yields low-rank and sparse representations motivated this initial exploration of using AA unmixing to generate a representative background dictionary. From a physical point of view, AA finds distinct patterns in the data [28], and it has shown potential in spectral unmixing for generating pure, representative endmembers. It is well known in spectral unmixing that mixed data are distributed as a simplex in a low-dimensional feature space, with the pure endmembers located at the vertices of the simplex under ideal conditions (in practice, the simplex is not perfectly regular). The archetypes obtained by AA-based models are the main representatives of the different classes in the data. Anomalous targets are generally very small, occupy only a small proportion of the image, and differ markedly from the background materials. The targets may therefore lie far from the background samples, which means that, with a proper setting of the number of endmembers, AA can obtain endmember signatures for background representation. Figure 1 illustrates such a case. As also mentioned above, the well-constrained AA is superior to MV-NMF [27] in endmember extraction, and the experimental results in Figure 20 are consistent with this expectation. Moreover, the detection results on both simulated and real datasets verify the effectiveness of our AA unmixing-based AD method.
The number of endmembers plays a vital role in this study, in which AA unmixing is used for the first time to provide a low-dimensional representation of the background data. As shown in Figure 19a, the method is somewhat sensitive to this parameter. Many NMF variants developed for unsupervised unmixing also require the number of endmembers as prior information, so we likewise set this number manually based on a general understanding of the background from available studies. It should therefore be noted that a priori knowledge of the background is needed to ensure good behavior of the proposed method and, consequently, interpretable and exploitable results.
A priori knowledge of the background concerns its complexity. It is difficult to define a “complex background” clearly; at a minimum, it refers to a background containing more classes than the targets and, more importantly, obviously heterogeneous regions. Among the four experimental datasets, the Urban and Sandiego80 data are considered more complex, because, compared with the other two datasets, their targets are sparser and their backgrounds contain more heterogeneous regions. By comparison, dominant, large homogeneous regions can be found in the simulated and Sandiego60 datasets. As noted, our method is mainly intended for complex backgrounds. The best AUC of each method on the four datasets also indicates that the traditional statistical AD methods (such as GRX and LRX) fell further behind our method on the Urban and Sandiego80 datasets, whereas on the simulated and Sandiego60 datasets their accuracy approached ours. However, Figure 19a also shows that the simulated and Sandiego60 datasets needed more endmembers in the proposed algorithm, even though they were assumed to have simple backgrounds. This is likely due to intra-class spectral variability: in scenes with large homogeneous regions, intra-class spectral variability is a more dominant problem than in scenes with more heterogeneous regions. Multiple endmembers are thus likely to be extracted from the same background types, which under certain conditions would also reduce the chance of including targets among the endmembers extracted by our method. These factors deserve further study and verification in the future. This observation also suggests extending the present study by exploring effective strategies for extracting multiple endmembers per class to better represent the background for AD.