1. Introduction
Hyperspectral imaging is a developing area in remote sensing in which a hyperspectral spectrometer collects hundreds of narrow contiguous bands over a wide range of the electromagnetic spectrum [1]. Unlike target detection in natural images [2], hyperspectral target detection aims to distinguish specific target pixels from the background in a given hyperspectral image (HSI) with little prior spectral information about the target, and it has been a focus of remote sensing interpretation research [3].
In the past few decades, several classic hyperspectral target detectors have been proposed [3]. The spectral angle mapper (SAM) [4] and spectral information divergence (SID) [5] perform detection based on distance measurements. According to how the background is modeled, Zhang et al. [6] divided the traditional detection models into structured and unstructured background models. Structured background models include constrained energy minimization (CEM) [7], orthogonal subspace projection (OSP) [8], the target-constrained interference-minimized filter (TCIMF) [9], etc. Unstructured background models regard the background as samples from a multivariate Gaussian distribution; examples include the generalized likelihood ratio test (GLRT) [10], the adaptive coherence/cosine estimator (ACE) [11], and the adaptive matched filter (MF) [12]. Most traditional detectors rely on assumptions that may fail in practice [13]. For instance, the background and target may not share the same spectral covariance matrix.
In the past few years, machine learning-based and sparse representation-based methods have improved detection performance. The hierarchical CEM detector (HCEM) [14] and ensemble-based constrained energy minimization (ECEM) [15] introduce hierarchical and ensemble structures to the CEM detector, which improve its nonlinearity and generalization ability. The key idea of sparse representation-based methods is that each spectrum can be represented as a linear combination of very few atoms of an over-complete dictionary. Typical methods include the original sparse representation-based target detector (STD) and its variants: the sparse representation-based binary hypothesis detector (SRBBH) [16], coordinated-representation-based object detection (CR-TD) [17], and the sparse and dense hybrid representation-based target detector (SDRD) [18]. Many scholars have combined sparse representation with other techniques to develop novel methods. Li et al. [17] proposed a combined sparse and collaborative representation (CSCR) for target detection, and Du et al. [19] proposed a hybrid sparsity and statistics-based method (HSSD). Zhao et al. [
20] propose adaptive iterated shrinkage thresholding method (AISTM) for
-norm sparse representation and improve the detection performance.
Although traditional methods have achieved promising target detection results, a few problems remain. For instance, the low spatial resolution of HSIs, atmospheric absorption, and scattering make the target spectra differ from the prior spectra, which complicates the detection task [21]. Furthermore, handcrafted spectral feature extraction filters restrict further performance improvement under complicated background interference [6]. Meanwhile, deep learning-based methods have achieved great success in the remote sensing field, for example in anomaly detection [22,23,24,25], classification [26,27,28], and image unmixing [29,30,31,32]. Many deep learning-based hyperspectral detectors [33,34,35] outperform the traditional methods by relying on the feature extraction capability of neural networks. However, learning-based methods require numerous reliable training samples, and such samples are not directly obtainable from a single prior spectrum.
To overcome the problem of few labeled samples, some unsupervised methods [35,36,37,38] use neural networks as a feature extraction module and detect the target from the discriminative features with other detectors such as CEM. Specifically, Shi et al. [36] introduce distance-constrained stacked sparse autoencoders (SSAEs) to maximize the distinction between the target pixels and the background pixels and then detect the target with a simple detector. Xie et al. [38] employ an autoencoder and a variational autoencoder for discriminative feature selection. Similarly, the authors of [37] select a subset of all bands for detection with a deep latent spectral representation learning-based autoencoder. In [35], a background-learning method is proposed that obtains a coarse detection map from the reconstruction error. Although these unsupervised methods improve detection performance with discriminative features, they require other detectors to produce the final results, which makes them two-stage approaches.
Many supervised methods that integrate feature processing and detection have also been proposed. However, the imbalance between target and background samples is a critical problem for supervised methods. To generate numerous training samples, Zhang et al. [34] utilize generative adversarial networks to generate target and background spectra for pixel pairs. Gao et al. [21] generate simulated data with an auxiliary generative adversarial network. Zhu et al. [39] generate enough typical background pixels via a hybrid sparse representation and classification-based pixel selection strategy. Several convolutional network-based detectors have achieved strong performance with numerous target and background spectra. Du et al. [40] propose a convolutional neural network-based target detector (CNNTD), and Zhang et al. [34] propose a target detection framework called HTD-Net. HTD-Net and CNNTD feed the subtraction of pixel pairs to multi-layer convolutional networks. Instead of feeding the spectral subtraction to the networks, Zhu et al. [39] feed the prior spectrum and the generated spectrum to a two-stream convolutional network (TSCNTD) and detect the target with the subtraction of spectral features. Optimized with the generated target and background spectra, supervised target detection methods [21,34,39,40] have achieved notable performance. However, convolution-based feature extraction modules are designed with many convolutional layers, which introduces a heavy computational burden. In addition, suboptimal initialization of these supervised methods degrades network performance, and initializing networks with random parameters leads to fluctuations in performance [41].
Siamese networks, in which multiple networks share their parameters, are capable of recognition with little available data, as shown in [42], which makes them well suited to target detection with a single prior spectrum. This paper proposes a supervised Siamese fully connected target detector (SFCTD) composed of nonlinear feature extraction modules (NFEMs) and cosine angle distance-based classifiers. The two NFEMs, which extract discriminative spectral features from the input spectral pairs, are built from fully connected layers for efficient computation and share their parameters to ease optimization. We use the cosine angle value of the SAM measurement as the differential criterion to optimize the parameters of the NFEMs. The cosine angle distances of the spectral feature pairs represent the similarities of the input spectral pairs and serve as the target confidences of the test spectra. To solve the few-sample problem, we propose a pseudo data generation method based on the linear mixed model and the assumption that background pixels are dominant in HSIs. To avoid the impact of suboptimal initialization and achieve stable detection, we optimize several Siamese detectors independently and detect targets with the network ensemble.
The contributions of our work are summarized as follows.
- (1) A Siamese fully connected hyperspectral target detector (SFCTD) is proposed, consisting of nonlinear feature extraction modules and cosine angle distance-based classifiers.
- (2) A pseudo data generation method is proposed to create numerous positive and negative spectral pairs with discrete similarity labels, i.e., 0 or 1. The SFCTD is effectively optimized with the generated spectral pairs.
- (3) A detection ensemble method is proposed for improving detection performance and stability. The Siamese detector ensembles outperform other state-of-the-art algorithms regarding the accuracy, recall, and background suppression, validated on multiple complex HSI data sets.
The remainder of this paper is organized as follows: Section 2.2, Section 2.3, and Section 2.4 introduce the methods of the proposed hyperspectral detector. Section 2.5 and Section 2.6 describe the experimental data sets and implementation details. Section 3 presents the experimental results and ablation studies of the proposed method. The discussion and conclusion are given in Section 4 and Section 5, respectively.
2. Materials and Methods
2.1. Definition of Abbreviations
For the convenience of the subsequent description, let denote the i-th spectral vector of the HSI with prior target spectrum , where n is the number of pixels in the HSI and l is the number of spectral bands. The generated training data set consists of positive spectral pairs, and negative spectral pairs , where , represent spectral pairs associated with test spectrum . The batch size of the training data set is denoted b and number of mini batches of an HSI with n pixels equals . The NFEM is denoted f, and the test data pairs are denoted .
2.2. Siamese Hyperspectral Target Detector
As shown in Figure 1, the proposed Siamese detector consists of two NFEMs with shared structure and parameters. Each input spectral pair consists of a prior spectrum and a test spectrum. We feed the two spectra of each pair separately into the two NFEMs and compute the cosine angle distance of the transformed output features. The cosine angle values of the SAM measurement represent the probabilities that the test spectra belong to the target category. To extract features effectively from the spectra, which are 1-D vectors, the proposed NFEMs use fully connected networks instead of 1-D convolutional networks. Although convolutional networks have fewer parameters than fully connected networks because of the weight sharing of the convolution operation, they impose a much heavier computation and random-access memory burden. Specifically, each NFEM comprises a single batch norm layer and two fully connected blocks (FC blocks). Each FC block consists of a fully connected layer, a batch norm layer, and a nonlinear activation layer. We describe each component in turn in the following paragraphs.
The amplitudes and waveforms of spectra in an HSI vary with position because of different imaging and surface conditions, as shown in Figure 2a. Assuming that all the spectra belonging to one category are independently sampled from the same multidimensional random distribution, the distributions of target and background spectra differ because of their different physical properties. However, the significant variance and mean shift of the test spectral distribution may reduce the effectiveness of the feature extraction module. Hence, we preprocess the input spectral distribution at the beginning of feature extraction with a batch norm layer to reduce the shift and the optimization difficulty. Instead of normalizing the spectral distribution to zero mean and unit variance, which may not be ideal for optimizing loss functions, we use batch normalization (BN) to transform the original spectral distributions into distributions with learnable statistical parameters. Specifically, for each mini-batch of spectra, BN normalizes the distribution of the spectra batch and transforms it to a distribution with a learnable mean and variance. The target spectra after BN are shown in Figure 2b. The BN process uses the mean and variance of the input spectra, which comprise the positive and negative spectral pairs used for supervised learning optimization. The detailed method and purpose of generating these spectral pairs are described in Section 2.3. In the training stage, the batch mean and variance change with the forward propagation of each mini-batch, while the learnable mean and variance are optimized by the loss function in the backward pass. In the testing stage, all the parameters are fixed when processing each spectral pair. Experiments in Table 1 validate that BN enlarges the distribution difference between target and background spectra. We note that the learned parameters of the batch norm layers change with the HSIs and prior spectra.
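For reference, the standard batch normalization transform that matches this description can be written as below. The notation is assumed and may differ from the paper's own symbols: $x_i$ is a spectrum in the mini-batch, $\mu_B$ and $\sigma_B^2$ are the per-band mean and variance of the mini-batch, $\beta$ and $\gamma$ are the learnable shift and scale (the "learnable mean and variance" in the text), and $\epsilon$ is a small constant added for numerical stability.

```latex
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
\mathrm{BN}(x_i) = \gamma \, \hat{x}_i + \beta
```

With $\gamma = 1$ and $\beta = 0$, this reduces to plain zero-mean, unit-variance normalization, which is the case the learnable parameters are meant to generalize.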
After the BN operation, the spectral pairs are separately fed to two FC blocks to generate discriminative spectral features. In each FC block, spectra pass through the fully connected layer, the batch norm layer, and the nonlinear activation layer in turn. The fully connected layer transforms a spectrum with l bands into a low-level feature space through its weight matrix. Each weight vector serves as a linear classifier for the spectra of the test HSI, highlighting background or target spectra to improve spectral discriminability. Notably, the batch norm layer of an FC block plays a different role from that of the preprocessing layer. The BN of the FC block maps the spectral features into the unsaturated interval of the activation function, which is why it is usually applied before the nonlinear activation layer. The Sigmoid layer helps the NFEMs extract nonlinear spectral features for accurate detection. Finally, after the preprocessing layer and the two FC blocks, we obtain discriminative spectral features for the cosine angle distance computation.
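The NFEM structure described above (one preprocessing batch norm layer followed by two FC blocks of fully connected layer, batch norm, and Sigmoid) can be sketched in plain Python as follows. This is an illustrative forward pass only, not the authors' PyTorch implementation. The layer sizes, the Gaussian weight initialization, the zero biases, and the use of a single scalar `gamma` and `beta` per layer are all assumptions made to keep the sketch short.

```python
import math
import random

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then rescale and shift."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        for i in range(n):
            out[i][j] = gamma * (col[i] - mean) / math.sqrt(var + eps) + beta
    return out

def linear(batch, weight, bias):
    """Fully connected layer; weight has shape (out_dim, in_dim)."""
    return [[sum(w * x for w, x in zip(w_row, row)) + b
             for w_row, b in zip(weight, bias)] for row in batch]

def sigmoid(batch):
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in batch]

def init_fc(in_dim, out_dim, rng, std=0.001):
    # Gaussian weights with mean 0 and std 0.001; zero bias is an assumption.
    weight = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

def nfem_forward(batch, fc1, fc2):
    x = batch_norm(batch)                      # preprocessing batch norm
    for weight, bias in (fc1, fc2):            # two FC blocks
        x = sigmoid(batch_norm(linear(x, weight, bias)))
    return x

rng = random.Random(0)
bands, hidden, out_dim = 8, 6, 4               # hypothetical sizes
fc1 = init_fc(bands, hidden, rng)
fc2 = init_fc(hidden, out_dim, rng)
spectra = [[rng.random() for _ in range(bands)] for _ in range(5)]
features = nfem_forward(spectra, fc1, fc2)
```

In a Siamese setup, the same `fc1` and `fc2` would be applied to both the prior spectra and the test spectra, which is what parameter sharing means here. Because the last layer is a Sigmoid, every output feature lies strictly between 0 and 1.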
For each input spectral pair, the NFEMs output a pair of transformed spectral vectors. Different from [34,39], both of which use a fully connected layer to classify the feature subtraction of the input pairs, we derive a simple cosine angle distance classifier from the SAM measurement. Specifically, we use the cosine angle distance of the two output vectors as the classification confidence, which equals the cosine angle value of the SAM measurement. The angle distance is simple to compute and easy to differentiate. Following Equation (2), the cosine angle distance-based classifier divides the inner product of the two transformed vectors by the product of their Euclidean norms. Compared with the subtraction of spectral pairs in [34] and the subtraction of spectral features in [39], the cosine angle distance of the proposed method is magnitude invariant.
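A minimal sketch of the cosine angle classifier, written in plain Python for illustration, shows the magnitude invariance mentioned above. The function name, the example vectors, and the small `eps` safeguard are assumptions and are not taken from the paper.

```python
import math

def cosine_similarity(u, v, eps=1e-12):
    """Inner product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors (added safeguard).
    return dot / (norm_u * norm_v + eps)

prior_feature = [0.2, 0.5, 0.9]
test_feature = [0.4, 1.0, 1.8]    # same direction as the prior, twice the magnitude
other_feature = [0.9, 0.5, 0.1]   # different spectral shape

same_direction = cosine_similarity(prior_feature, test_feature)
different = cosine_similarity(prior_feature, other_feature)
```

Scaling a feature vector changes its norm but not its direction, so `same_direction` stays at (approximately) 1 while `different` is clearly lower. A subtraction-based comparison would instead report a large difference for `test_feature`.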
The similarity label of a spectral pair is a discrete value, 0 or 1, which supervises the cosine distance. Treating target detection as a classification problem, we use binary cross-entropy (BCE) to measure the distance between the similarity labels and the cosine similarities. The optimization function of a mini-batch, Equation (3), is the BCE computed over the similarities of the positive spectral pairs and the similarities of the negative spectral pairs, where b represents the batch size. It is worth noting that each mini-batch includes b positive spectral pairs and b negative spectral pairs generated from the identical spectra of the HSI. A detailed description is given in Algorithm 1.
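The BCE objective over one mini-batch can be sketched as follows. This is a plain-Python illustration under stated assumptions: the averaging over all 2b pairs and the clipping constant `eps` are choices made here, and the paper's exact normalization may differ. Since the features come out of a Sigmoid layer and are therefore positive, the cosine similarities fall in (0, 1] and can be used directly as probabilities.

```python
import math

def bce_loss(pos_sims, neg_sims, eps=1e-7):
    """Binary cross-entropy between cosine similarities and 0/1 labels.

    pos_sims: similarities of positive pairs (label 1)
    neg_sims: similarities of negative pairs (label 0)
    """
    def clip(s):
        # Keep similarities strictly inside (0, 1) so the logarithms stay finite.
        return min(max(s, eps), 1.0 - eps)

    pos_term = sum(-math.log(clip(s)) for s in pos_sims)
    neg_term = sum(-math.log(1.0 - clip(s)) for s in neg_sims)
    # Averaging over all 2b pairs is an assumed normalization.
    return (pos_term + neg_term) / (len(pos_sims) + len(neg_sims))

# Well-separated similarities give a small loss; inverted ones give a large loss.
good = bce_loss(pos_sims=[0.95, 0.90], neg_sims=[0.10, 0.05])
bad = bce_loss(pos_sims=[0.30, 0.20], neg_sims=[0.80, 0.90])
```

Minimizing this loss pushes the similarities of positive pairs toward 1 and those of negative pairs toward 0, which is the behavior the similarity labels are meant to enforce.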
Algorithm 1 Training stage of the SFCTD.
Input: the detected HSI; the stochastically initialized Siamese detector; the nonlinear feature extraction module of the detector; the prior spectrum of the target; the batch size, b.
Generate labeled data pairs:
- 1: Generate negative spectral pairs following Equation (4);
- 2: Augment the target spectra following Equation (5) and generate positive spectral pairs following Equation (7);
- 3: Concatenate the positive and negative spectral pairs to obtain the training data set.
Forward and backward propagation of the Siamese detector:
- 1: Shuffle the order of the spectral pairs of the training data set;
- 2: Feed each mini-batch to the Siamese network, obtaining the transformed features;
- 3: Compute the cosine angle distance of each transformed feature pair following Equation (2);
- 4: Compute the cross-entropy distance between the detection results and the labels with BCE following Equation (3);
- 5: Optimize the parameters of the detector with the BCE loss.
2.3. Pseudo Data Generation Method
We generate numerous pseudo data consisting of positive and negative spectral pairs to optimize the Siamese detectors. The negative spectral pairs comprise prior and background spectra and are used to optimize the detector to filter out background pixels. The positive spectral pairs comprise prior and target spectra and help optimize the detector to recognize target pixels despite spectral variations. However, known target and background spectra are not directly obtainable from the HSI. To solve this lack-of-data problem, we generate numerous background and target spectra based on the dominant background pixels and a linear mixed model (LMM), as illustrated in Figure 3.
To obtain background spectra from the test HSI, which contains both target and background spectra, we assume that background pixels are dominant in the HSI. Based on this assumption, we treat each spectrum of the test HSI as a background spectrum that differs from the prior spectrum. Pairing each of these background spectra with the prior spectrum yields the negative spectral pairs, as given in Equation (4). Although a few target spectra may be mistakenly labeled as background spectra, this does not reduce the detection performance because the mislabeled target pixels are far fewer than the correctly labeled background pixels. The effectiveness of the background spectra generation method is demonstrated by experiments conducted on several real data sets, as shown in Figure 4.
To create multiple target spectra beyond the single prior spectrum, we generate simulated target spectra by mixing the prior spectrum with background spectra based on the LMM. The LMM assumes that a mixed spectrum x is a linear combination of a target spectrum and a background spectrum, each weighted by its abundance coefficient. Since test spectra differ in amplitude, which may be much larger or smaller than that of the prior spectrum, we normalize the test spectra and adjust their amplitudes to that of the prior spectrum. The adjusted test spectra, multiplied by small random weights, are linearly mixed with the prior spectrum to generate the simulated target spectra. Background spectra and their associated simulated target spectra are visualized in Figure 5. In the formula for simulated target spectra generation, the ratio of the background spectrum is set to 0.1 for all the data sets. The abundance values of the target and background endmembers are therefore 0.9 and 0.1, which means the resulting spectrum is dominated by the target spectrum and can be regarded as a target spectrum. It is worth noting that our target spectra generation method does not need to estimate the specific categories of the background spectra. Each test spectrum is regarded as spectral noise added to the single prior spectrum. After obtaining the simulated target spectra, we combine the prior spectrum with each simulated target spectrum to generate the positive data pairs, as given in Equation (7).
The training data set, composed of positive and negative data pairs, is divided into mini-batches with batch size b. In each mini-batch, we use the identical spectra from the HSI to generate an equal number of positive and negative samples. Although the prior spectra in each mini-batch are the same, all the prior spectra are fed to the feature extraction module in the training stage for proper parameter updating of the batch norm layers.
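The pair generation described in this section can be sketched as follows in plain Python. The amplitude adjustment is only described qualitatively in the text, so matching mean amplitudes in `scale_to_prior` is an assumed choice, as are the function names and the toy data. The 0.9/0.1 abundances follow the background ratio of 0.1 stated above.

```python
import random

def scale_to_prior(spectrum, prior):
    """Rescale a spectrum so its mean amplitude matches the prior's.

    Matching mean amplitudes is an assumed normalization choice.
    """
    mean_s = sum(spectrum) / len(spectrum)
    mean_p = sum(prior) / len(prior)
    factor = mean_p / mean_s if mean_s != 0 else 1.0
    return [factor * v for v in spectrum]

def make_pairs(hsi_spectra, prior, background_ratio=0.1):
    """Build positive and negative spectral pairs with 0/1 similarity labels."""
    positives, negatives = [], []
    for spectrum in hsi_spectra:
        # Every HSI spectrum is treated as background: (prior, spectrum) has label 0.
        negatives.append((prior, spectrum, 0))
        # Simulated target: 0.9 * prior + 0.1 * amplitude-adjusted background.
        adjusted = scale_to_prior(spectrum, prior)
        simulated = [(1.0 - background_ratio) * p + background_ratio * a
                     for p, a in zip(prior, adjusted)]
        positives.append((prior, simulated, 1))
    return positives, negatives

rng = random.Random(0)
prior = [0.30, 0.45, 0.60, 0.50]                      # toy prior spectrum
hsi = [[rng.uniform(0.5, 5.0) for _ in prior] for _ in range(6)]
positives, negatives = make_pairs(hsi, prior)
```

Each HSI spectrum contributes exactly one negative pair and one positive pair, so the two classes are balanced by construction, which is consistent with the equal numbers of positive and negative samples per mini-batch.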
2.4. Detection Ensemble Method
The performance of deep learning-based detectors varies with model initialization, and suboptimal parameter initialization degrades the optimization and performance of the proposed detector. Specifically, some detectors trained with stochastic initialization and data set shuffling perform better than the average level. Although the probability of obtaining an ideal parameter initialization is moderate, it is not easy to find the specific distribution of the ideal initialization. To achieve stable detection, we propose a simple but effective ensemble method, as shown in Figure 6. Relying on the moderate probability of an ideal stochastic initialization, we optimize and aggregate multiple Siamese detectors to obtain a high-performance detector with a higher probability. Specifically, the final detection map is generated by averaging the detection maps of the individual Siamese detectors following Equation (8). Experiments validate that the Siamese detector ensemble outperforms every single detector, which means that the multiple independently optimized detectors complement each other. As shown in Table 2, the ensemble result is also more stable than each single detection result.
In the training stage, each Siamese detector is initialized with different stochastic parameters and trained with a different shuffle of the data set, which ensures that the Siamese detectors are independent. Compared with convolution-based detectors [34,39,40], the proposed detector is based on fully connected neural networks, whose computing burden is much lower. Hence, we can optimize multiple detectors in parallel on a GPU to improve detection stability. In the testing stage, we follow the pipeline illustrated in Algorithm 2. Before detection with the N Siamese detectors, we generate test spectral pairs by combining the prior spectrum with each test spectrum. Then, the spectral pairs of the test data set are fed to the N Siamese detectors to generate N detection maps. We ensemble all the results through the bagging approach to obtain high-performance detection results without ground truth labels or manual intervention.
Algorithm 2 Test stage of the SFCTD.
Input: the detected HSI; the prior spectrum; the N optimized detectors.
Generate test data pairs:
- 1: Duplicate the prior spectrum to generate a prior spectra matrix;
- 2: Generate test data pairs by pairing the prior spectra matrix with the test spectra.
Detection of the N Siamese detectors:
- 1: Feed the test spectral pairs to the nonlinear feature extraction modules of each Siamese detector, obtaining the transformed features;
- 2: Compute the cosine similarity of the transformed features for each detector following Equation (2);
- 3: Average the similarity predictions of the N detectors to obtain the final results following Equation (8).
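The averaging step at the end of Algorithm 2 can be sketched as follows. The detection maps are represented as flat lists of per-pixel similarities, and the function name and toy values are illustrative assumptions.

```python
def ensemble_average(detection_maps):
    """Average N detection maps, each a flat list of per-pixel similarities."""
    n_detectors = len(detection_maps)
    n_pixels = len(detection_maps[0])
    return [sum(m[i] for m in detection_maps) / n_detectors
            for i in range(n_pixels)]

# Four independently trained detectors scoring the same three pixels (toy values).
maps = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [1.0, 0.2, 0.0],
    [0.9, 0.2, 0.1],
]
final_map = ensemble_average(maps)
```

Averaging damps the disagreement between individual detectors: a pixel that one poorly initialized detector scores too high or too low is pulled back toward the consensus of the others.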
2.5. Information of Experimental Data Sets
We used six real data sets and one synthetic data set to validate the proposed method, and the pseudo-color images are shown in the first row of Figure 7. All the real HSIs were captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). All the selected data sets have been used as experimental data in other published hyperspectral target detection studies.
All the data sets provide target ground truth maps, but only two data sets (Cuprite and Synthetic) provide prior spectra, taken from the USGS Digital Spectral Library [43]. For the data sets without prior spectra, scholars usually follow Manolakis et al. [3] and average the target spectra selected by the ground truth maps to obtain simulated prior spectra. It is worth noting that the spectra of target boundary pixels mix the target and the background; they differ from pure target spectra and are impractical to obtain in actual application scenarios. Therefore, we did not average all the pixels of the ground truth map to generate the prior spectra.
Except for the Cuprite and Synthetic data sets, which provide pure endmember spectra as prior spectra, we applied a morphological erosion operation to the ground truth maps to obtain candidate maps for the prior spectra, and the average spectrum of each candidate map serves as the prior spectrum. Erosion is one of the two basic operators in mathematical morphology. The erosion of the binary ground truth map G by the structuring element B is defined as:
$$G \ominus B = \{ z \in E \mid B_z \subseteq G \}$$
where E is the Euclidean space and $B_z$ is B shifted by z. This paper uses a positive 3 × 3 kernel as B. The morphological erosion is applied to the ground truth maps to discard the edge pixels. The ground truth annotation maps G and the annotation maps after the erosion operation are shown in the last two rows of Figure 7, respectively. The capture locations and image details are listed below:
San Diego Airport: The San Diego Airport data set was captured at San Diego and has a size of 200 × 200 pixels. It contains six airplanes and several background classes such as buildings and the parking apron. We selected three planes of the same size as targets; the target pixel annotation is the same as in previous work for a fair comparison, and the number of target pixels is 134. After excluding bands affected by water-vapor absorption and atmospheric effects, we used 189 of the 224 bands for the experiments.
San Diego Beach: This hyperspectral data set was captured in San Diego with a spatial size of 100 × 100 pixels. The scene is a beach, and the number of target pixels is 202. The target annotation map follows the data set used in [21,38,44].
Texas Coast 1, Texas Coast 2: These two hyperspectral data sets were captured along the Texas coast, each with a spatial size of 100 × 100 pixels. Several storage tanks in the urban scene were selected as targets, comprising 67 and 155 pixels, respectively. The target annotation maps follow the data sets used in [21,38,44]. The two data sets have 204 and 205 bands, respectively.
Synthetic: The spectra of the synthetic data set were generated from the USGS Digital Spectral Library, and there are 15 endmember spectra. For comparison, we used the Labradorite HS17.3B endmember as the detection target, the same as in [15]. Since all the compared methods achieved an AUC of 1 on the clean data set, Gaussian white noise with signal-to-noise ratios (SNRs) of 15 dB and 20 dB was added to the original images.
Cuprite: The Cuprite data set was captured in the Cuprite mining district of Nevada in 1997; we use a subset of the image with a spatial size of 250 × 191 pixels. We selected buddingtonite from 14 kinds of minerals as the target for a fair comparison. The number of bands used is 188, and the spectrum of buddingtonite in the United States Geological Survey (USGS) Digital Spectral Library was selected as the single prior spectrum.
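The erosion-based derivation of prior spectra described earlier in this section, for the data sets without library spectra, can be sketched as follows. The sketch interprets the 3 × 3 kernel as all ones and treats pixels outside the image as background; both are assumptions, as are the function names and the toy ground truth map.

```python
def erode(mask, kernel_size=3):
    """Binary erosion with an all-ones square kernel; outside the image counts as 0."""
    rows, cols = len(mask), len(mask[0])
    r = kernel_size // 2
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            keep = True
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < rows and 0 <= nj < cols) or mask[ni][nj] == 0:
                        keep = False
            out[i][j] = 1 if keep else 0
    return out

def mean_spectrum(cube, mask):
    """Average the spectra of pixels where mask == 1; cube[i][j] is a list of bands."""
    selected = [cube[i][j] for i in range(len(mask))
                for j in range(len(mask[0])) if mask[i][j] == 1]
    if not selected:
        raise ValueError("erosion removed every target pixel")
    bands = len(selected[0])
    return [sum(s[k] for s in selected) / len(selected) for k in range(bands)]

ground_truth = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(ground_truth)   # only the interior target pixel survives
```

Calling `mean_spectrum(cube, eroded)` on the HSI cube then averages only interior target pixels, which keeps mixed boundary spectra out of the prior. For very small targets the erosion can remove every pixel, which is why the sketch raises an error in that case.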
2.6. Implementation Details
The experiments in this paper were run with PyTorch in Python on a computer with an Intel(R) Core(TM) i9-9900X CPU at 3.50 GHz, a GTX Titan Xp GPU, and 32 GB of memory. We used the Adam optimizer for all the experiments and set the learning rate to 0.0005 and the weight decay to 0.0005. Before training, the weights of the fully connected layers are initialized with a mean of 0 and a standard deviation of 0.001. We initialized the batch norm layers with a learnable mean of 0 and a learnable variance of 1.
3. Results
In this section, we present the experimental results. First, we conduct hyper-parameter sensitivity experiments and select proper parameters. Then, we compare the detection performance with seven state-of-the-art (SOTA) methods in terms of two-dimensional receiver operating characteristic (2-D ROC) curves and area under the curve (AUC) values. Since the 2-D ROC curve cannot reflect the background suppression capabilities of the detectors, we supplement it with box plots of the detection confidences of all the compared detectors for quantitative comparison and with detection map visualizations for qualitative comparison. For consistency, we use an ensemble of four Siamese detectors in all the comparison experiments. The compared methods include four traditional methods (SAM, MF, ACE, and CEM), two advanced CEM-based methods (HCEM and ECEM), and one deep learning detector (TSCNTD) similar to our method. Finally, we conduct an ablation experiment to study each component of the proposed method.
3.1. Hyper-Parameter Sensitivity Experiments
Hyper-parameters are the parameters that are set before the networks are trained, and well-chosen hyper-parameters can improve the performance of the trained networks. Before comparing our method with other detectors, we first conduct sensitivity experiments on the hyper-parameters of the proposed method and select proper values.
We vary the batch size, training epoch number, background spectra abundance, learning rate, and ensemble number, and evaluate the detector performance under each setting with AUC values on the five test data sets other than Cuprite. For each parameter setting, we repeat the experiment ten times and compute the mean and standard deviation of the AUC values. The hyper-parameter candidates and experimental results are illustrated in Figure 8.
The experimental results show that the detector is not sensitive to the background spectra abundance, and we select 0.1 for all the data sets. The results for the epoch number and ensemble number reveal that the detection performance of the proposed Siamese detector is better and more stable with more training epochs and more ensemble members. As a trade-off between performance and efficiency, we optimize four Siamese detectors for ten epochs using the parallel computing capability of the GPU. As for the learning rate, the detector’s performance fluctuates much more when the learning rate is too large. Therefore, we set the learning rate to 0.0005 on all the data sets. Since the proposed detector is trained for a fixed number of epochs on all the data sets, the network may not be trained well with a large batch size because the number of iterations is small. We set the batch size to 128 for the Cuprite data set because of its large image size and to 32 for all the other data sets.
3.2. Experiment Comparisons
3.2.1. Background Suppression Comparison
In Figure 9, six detection maps are visualized for a qualitative comparison of background suppression. Detection maps with higher visual contrast indicate better background suppression. Among all the detectors, ACE, HCEM, ECEM, TSCNTD, and the proposed detector show higher visual contrast than SAM, MF, and CEM. However, the target integrity of our detection results is better than that of ACE, HCEM, and ECEM. For example, in the experiment on the San Diego Airport data set, the left and top planes in the ACE detection map are somewhat blurred, and HCEM and ECEM fail to detect the margins of the targets. Furthermore, the false alarm rate of the Siamese detector with NFEMs is lower than that of TSCNTD, which detects many background pixels as targets.
Figure 10 shows a quantitative comparison of background suppression. The red and green boxes show the confidence distributions of target and background pixels, respectively. Specifically, the wider a green box is, the larger the standard deviation of the background confidence distribution, which means that the confidences of background pixels span a wide range. The lower a green box is, the smaller the mean of the background confidence distribution. Generally, methods with good background suppression have flat, low green boxes and red boxes whose lower quartiles are far from the green boxes, which means that most target pixels have higher confidence than background pixels. The box plot results are consistent with the detection map visualizations. CEM, SAM, and MF have wider and higher green boxes than the other methods. For the methods with low detection rates, such as HCEM, ECEM, and TSCNTD, the lower quartiles of their target boxes are close to zero on all data sets except Synthetic, because many target pixels are missed and their confidences are low. Although their green boxes are flat and low, the lower quartiles of their target boxes are close to the green boxes. For our method, the background boxes are flat and close to zero. Meanwhile, the lower quartiles of our target boxes are far from the upper limits of the background boxes.
To sum up, our method achieves an outstanding balance between background suppression and target detection recall and outperforms the other comparison methods.
3.2.2. ROC Curves Comparison
The ROC curve results are shown in
Figure 11. An ROC curve plots the detection rate against the false-positive rate as the detection threshold varies. The black line in each plot represents the ensembled detection results of our method. The proposed Siamese detector shows a competitive detection rate at most false-positive rate thresholds for all the data sets. For the San Diego Airport data set, our method achieves a competitive detection rate under a low false-positive rate (0–
) and the best detection rate as the false-positive rate grows. Except for the San Diego Airport data set, our method surpasses all the comparison methods under almost all the false-positive rate thresholds. In particular, for the two data sets captured at Texas Coast, our method’s curves are more than
higher than the other curves under
false-positive rate. For the San Diego Beach and Cuprite data sets, our curves outperform other methods almost
under the low false-positive rate between
–
.
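For reference, an ROC curve of this kind can be traced by sweeping a threshold over the per-pixel confidences. The sketch below is illustrative only and is not the authors' code: `roc_curve` and `detection_rate_at` are hypothetical helper names, and the toy scores are invented.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) lists obtained by lowering a threshold over scores.

    labels: 1 = target pixel, 0 = background pixel.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both target and background pixels")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all pixels tied at this score are counted.
        is_last = k + 1 == len(order)
        if is_last or scores[order[k + 1]] != scores[i]:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def detection_rate_at(fpr, tpr, max_fpr):
    """Highest detection rate reached without exceeding max_fpr."""
    return max(t for f, t in zip(fpr, tpr) if f <= max_fpr)

# Toy example: three target and three background pixels.
fpr, tpr = roc_curve([0.9, 0.8, 0.7, 0.6, 0.3, 0.2], [1, 1, 0, 1, 0, 0])
```

Reading the curve at a fixed false-positive budget, as the comparison above does, corresponds to calling `detection_rate_at(fpr, tpr, max_fpr)` for each detector.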
3.2.3. AUC Values Comparison
Table 3 lists the AUC values for all the test hyperspectral data sets except the Synthetic data set. The proposed Siamese network achieves the best AUC values on all the data sets except the Texas Coast 1 data set and surpasses the other methods by large margins, especially on the San Diego Beach, Cuprite, and Texas Coast 2 data sets. Specifically, our method outperforms the second-best methods by 0.008, 0.162, and 0.061 on the San Diego Beach, Cuprite, and Texas Coast 2 data sets, respectively. Since the Synthetic data set contains random white noise, we repeated the test 10 times and calculated the mean and standard deviation of the AUC values, which are reported in
Table 4. The proposed Siamese detector surpasses all the comparison methods under both noise conditions. The Siamese detector also has the lowest standard deviation of AUC values, which reflects its detection stability under noise interference.
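The repeated-test protocol can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: the AUC is computed with the rank (Mann-Whitney) formulation, the sample standard deviation is used, and `toy_run` is a hypothetical stand-in for running a detector on one noisy realization of the Synthetic data set.

```python
import random
import statistics

def auc(scores, labels):
    """Probability that a random target pixel outscores a random background
    pixel, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_auc(run_once, repeats=10):
    """Mean and sample standard deviation of the AUC over repeated runs.

    run_once(seed) must return (scores, labels) for one noisy test.
    """
    values = [auc(*run_once(seed)) for seed in range(repeats)]
    return statistics.mean(values), statistics.stdev(values)

def toy_run(seed, noise_std=0.1):
    """Hypothetical detector output: clean confidences plus white noise."""
    rng = random.Random(seed)
    labels = [1] * 20 + [0] * 200
    scores = [(0.8 if y else 0.2) + rng.gauss(0.0, noise_std) for y in labels]
    return scores, labels

mean_auc, std_auc = repeated_auc(toy_run, repeats=10)
```

A small `std_auc` across the ten runs is what the stability claim for Table 4 refers to.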
3.3. Comparison with TSCNTD
To validate the superiority of our proposed Siamese detector over TSCNTD, we train the TSCNTD and the Siamese detector with the same training data and compare them in terms of AUC values, test time, and stability. We repeat the training and testing ten times and compute the means and standard deviations of the two methods’ AUC values. The experimental results are reported in
Table 5.
According to the experimental results in
Section 3.2 and the AUC means in
Table 5, the proposed Siamese detector outperforms the TSCNTD in detection recall and precision. As shown by the standard deviations in
Table 5, the Siamese detector ensembles outperform the TSCNTD in terms of stability with the help of the detection ensemble method. Moreover, by using a few fully connected layers rather than many convolutional layers, our method’s test speed is six times faster than that of TSCNTD. Since the batch sizes and numbers of training epochs differ, we compare only the test times for fairness. In conclusion, the proposed Siamese detector ensembles are superior to the TSCNTD in detection performance, stability, and efficiency.
3.4. Ablation Study
In this section, we study the effectiveness of batch norm layers, Sigmoid layers, generated positive spectra pairs, and the detection ensemble method. We do not present the detection performance obtained without negative spectral pairs because the network could not be optimized with positive data alone.
Figure 4 shows the 2-D ROC curves of the single Siamese detector without batch norm layers, without Sigmoid layers, and without positive spectra pairs, compared with the normal one on two selected data sets. The ROC curves of the normal single Siamese detector are much better than those of the detectors without Sigmoid layers and without batch norm layers.
Table 1 shows the AUC values of the curves in
Figure 4. The Siamese detector with all the contributions has the largest AUC values.
To demonstrate the effectiveness of the proposed detection ensemble method, we optimize four randomly initialized Siamese detectors independently and compare their AUC values with those of the ensemble. We repeat the experiment 10 times and report the results in
Table 2. The ensemble result surpasses all the other detectors in terms of standard deviations and means of AUC values, which validates the stability improvement of the proposed detection ensemble method.
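As a rough illustration of a detection ensemble, the sketch below averages the per-pixel confidences of several independently trained detectors. Plain averaging is an assumption made here for illustration; this section does not specify the fusion rule, and `ensemble_scores` is a hypothetical name.

```python
def ensemble_scores(score_maps):
    """Average per-pixel confidences from several detectors.

    score_maps: list of equally long confidence lists, one per detector.
    Averaging is an assumed fusion rule, not necessarily the paper's.
    """
    if not score_maps:
        raise ValueError("need at least one detector output")
    length = len(score_maps[0])
    if any(len(m) != length for m in score_maps):
        raise ValueError("all detector outputs must cover the same pixels")
    n = len(score_maps)
    return [sum(pixel) / n for pixel in zip(*score_maps)]

# Four hypothetical detectors scoring the same three pixels.
maps = [
    [0.10, 0.90, 0.20],
    [0.30, 0.70, 0.10],
    [0.20, 0.80, 0.30],
    [0.20, 0.80, 0.20],
]
fused = ensemble_scores(maps)
```

Averaging independent outputs tends to damp the run-to-run fluctuation caused by random initialization, which is consistent with the lower standard deviations reported for the ensemble, at the cost of running every member detector at test time.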
4. Discussion
There are a few supervised learning-based detectors similar to our method: CNNTD [
40], HTD-Net [
34] and TSCNTD [
39]. These three methods adopt convolutional networks for spectral feature extraction and use no nonlinear activation layers in their network structures. CNNTD and HTD-Net feed spectral differences to the network, which reduces feature discriminability. TSCNTD uses a two-stream design that processes the prior spectrum and the other spectra separately, which solves this problem and makes TSCNTD superior to CNNTD and HTD-Net [
39]. However, TSCNTD uses two convolutional networks with nine layers to process the spectra pairs, making it slower than our method. Our proposed detector derives from the Siamese network, which is capable of recognition with little available data [
42]. Similar to the network structures in [
21,
35,
38], we design the Siamese detector with fully connected layers and nonlinear activation layers. Since Zhu et al. [
39] have shown the superiority of TSCNTD to CNNTD and HTD-Net in terms of performance and speed, we only compare the proposed detector with TSCNTD.
Although TSCNTD has achieved great performance, the computation burden of convolutional networks is heavy. Moreover, the parameters are redundant and introduce optimization difficulty. Specifically, the upper stream is only responsible for the feature extraction of the prior spectrum, a constant vector, which makes up almost half the parameters. Hence, Zhu et al. [
39] proposed a regularized cost to optimize these numerous parameters. Our method solves these problems with a Siamese detector that comprises two fully connected networks with shared parameters. Parameter sharing reduces the number of parameters and eases optimization. We also introduce nonlinear layers to improve the feature extraction capability. Experiments in
Table 5 validate that the proposed detector is more effective than TSCNTD.
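The effect of parameter sharing can be illustrated with a small sketch. The layer sizes, the weight initialization, and the Euclidean distance used to compare embeddings are all assumptions chosen for illustration, and batch norm is omitted; this is not the architecture evaluated in the paper.

```python
import math
import random

class FCEncoder:
    """Small fully connected encoder with Sigmoid activations.

    The layer sizes passed in are illustrative, not the paper's.
    """

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.layers = [
            ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])
        ]

    def __call__(self, x):
        for weights, biases in self.layers:
            x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                 for row, b in zip(weights, biases)]
        return x

    def num_parameters(self):
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)

def siamese_distance(encoder, prior_spectrum, pixel_spectrum):
    """Embed both spectra with the SAME encoder and return their distance."""
    a, b = encoder(prior_spectrum), encoder(pixel_spectrum)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

encoder = FCEncoder([8, 4, 2])           # hypothetical 8-band spectra
shared_params = encoder.num_parameters()  # one set of weights for both branches
two_stream_params = 2 * shared_params     # two separate branches of the same size
prior = [0.1 * i for i in range(8)]
same = siamese_distance(encoder, prior, prior)
```

Because both branches reuse one encoder, identical spectra always map to the same embedding, and the trainable parameter count is half that of two independent branches of the same size.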
Shi et al. [
41] propose a semisupervised domain adaptive few-shot learning (SDAFL) model and exhibit the standard deviation results to prove the detection stability of SDAFL. Other deep learning-based methods [
34,
39,
40] do not give the standard deviation results. To study the detection stability of TSCNTD, we repeat the experiments of TSCNTD ten times and find the stability is unsatisfactory, as shown in
Table 5. This paper pays attention to detection stability and improves it with a classical machine learning technique, ensemble learning. The detection ensemble method improves both stability and performance but introduces additional computation.
For a given HSI and prior spectrum, non-learning methods such as SAM, CEM, MF, ACE, HCEM, and ECEM give deterministic detection results. Our proposed Siamese detector outperforms them with the help of neural networks’ excellent feature extraction capability. However, the parameter initialization of the neural networks introduces fluctuation in performance. Therefore, the repeatability of non-learning methods is better than that of TSCNTD and SFCTD.