**3. Methodology**

In general, as mentioned before, anomaly detection methods for hyperspectral images follow one of two main strategies, global or local, and all of the methods developed so far can be placed in one of these two categories. Our proposed method belongs to the local strategies, which evaluate the probability of an anomaly's existence using various spatial neighborhood conditions around each pixel of interest (*PoI*). This is performed by sliding a window of pre-defined size around the *PoI*. In other words, considering the location of each *PoI* in the input image, e.g., Figure 1, the *PoI* occupies different spatial positions with respect to its spatial neighbors as the sliding window moves. In each position, the *PoI* is investigated for the presence or absence of an anomaly. Finally, the fusion of the results obtained for each *PoI* during its presence in the sliding window generates the anomaly detection criterion.

**Figure 1.** Structure of moving the sliding window around the *PoI*.

According to Figure 1, *m* is the number of elements of the sliding window. Thus, each *PoI* can be placed in *m* different positions with respect to the sliding window and, in each location, the corresponding index *fi*, *i* = 1, 2, ... , *m* is calculated, as described in Equation (3). In other words, through a complete transition of the sliding window over each *PoI*, the *PoI* occurs in *m* different positions of the window (*wi*, *i* = 1, 2, 3, ... , *m*). Finally, for each *PoI*, a feature vector with *m* members (*FPoI*) is calculated, and the variance of its elements is used as the anomaly detection index. The main motivation for this idea is the assumption of spatial asymmetry of the background elements around probable spectral anomalies. Prior to this idea, all local anomaly detection methods assumed spatial symmetry of the background elements around the pixels of interest, so each pixel of the image was evaluated for the presence of an anomaly only once, within its symmetric neighboring region. Accordingly, the proposed solution includes two main contributions: (1) the ability to judge each *PoI* under a variety of spatial neighborhoods; and, (2) the capability to combine the knowledge obtained for each *PoI* during the transition of the sliding window.
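To make the indexing concrete, the window transition described above can be sketched as follows. This is an illustrative sketch, not the authors' code; it assumes a square *w* × *w* window (so *m* = *w*²) and row-major ordering of the window cells:

```python
import numpy as np

def window_positions(poi_row, poi_col, w=3):
    """For a w x w sliding window (m = w*w elements), list the m
    window placements in which the PoI occupies each possible cell."""
    m = w * w
    positions = []
    for i in range(m):
        dr, dc = divmod(i, w)             # which window cell the PoI occupies
        top, left = poi_row - dr, poi_col - dc
        positions.append((top, left, i))  # window origin + PoI's index in it
    return positions

# Example: a 3 x 3 window gives m = 9 placements of the PoI
pos = window_positions(10, 10, w=3)
print(len(pos))   # 9
print(pos[0])     # (10, 10, 0): PoI at the window's top-left cell
print(pos[8])     # (8, 8, 8): PoI at the window's bottom-right cell
```

Each of the *m* placements then yields one *fi* value, and the *m* values together form the *FPoI* vector.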

The presentation of the proposed algorithm focuses on the estimation of the *fi* indices for each *PoI* in the *wi* locations of the sliding window. Obviously, generalizing this process to the other *wi* positions of each *PoI* yields its *m*-dimensional *FPoI* vector.

Simultaneous sparse representation of all pixels occurring in each *wi*, using a unique set of local background atoms from a learned dictionary, is the main idea of the proposed method for generating the anomaly index. Through this process, it is expected that the simultaneous estimation of all of the signals present in *wi* forces the selection of atoms that describe the background. Consequently, an increase in the *l*2-norm of the estimated residual of a signal can be interpreted as its level of anomaly. In this procedure, using the traditional RXD anomaly detector, the randomly selected signals used in dictionary learning are initially refined by excluding highly probable anomaly signals. Figure 2 illustrates the process of *fi* estimation in a *wi* occurrence.

Here, a spatial subset of a hyperspectral image coinciding with the position of *wi* is depicted, where the *PoI* inside *wi* is shown as a red pixel (parts a, b, and c). Obviously, the transition of *wi* changes the location of the *PoI* within it.

**Figure 2.** The process of the proposed anomaly detection algorithm.

According to Figure 2, as the first step, the potentially anomalous signals are removed using the traditional RXD by applying a proper threshold (*Th*-Plane). This threshold is set to twice the average (2*μ*) of the RXD map. Subsequently, the signals with values above *Th*-Plane are omitted from the random selection process of dictionary learning (parts c and d). The aim of this process is an initial refinement of the dictionary atoms, so that the initial dictionary atoms (*Dwi*) are more descriptive of the background signals.
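As an illustration of this refinement step, a minimal NumPy sketch follows. It is not the authors' implementation; it assumes the window signals are stacked as a *b* × *m* matrix and uses the window's own mean and covariance for the Mahalanobis-type RXD score:

```python
import numpy as np

def rxd_refine(S):
    """Flag signals whose RXD (Mahalanobis) score exceeds the Th-Plane
    threshold of twice the mean score (2*mu of the RXD map), so that
    they are excluded from the dictionary-learning candidate pool.
    S is b x m (bands x pixels)."""
    b, m = S.shape
    X = S - S.mean(axis=1, keepdims=True)
    # pinv guards against a rank-deficient covariance when m < b
    C_inv = np.linalg.pinv(X @ X.T / m)
    scores = np.einsum('ij,ji->i', X.T @ C_inv, X)  # RXD score per pixel
    keep = scores <= 2.0 * scores.mean()            # Th-Plane = 2*mu
    return keep, scores
```

A strong spectral outlier in the window receives a score far above 2*μ* and is dropped, while typical background pixels survive as candidates for the initial dictionary.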

In Figure 2, the matrix [*Swi*]<sub>*b*×*m*</sub> (*b* is the number of hyperspectral image bands) contains the constituent signals of *wi*, where the green columns indicate the possible anomaly signals detected using RXD. The black columns are randomly-chosen candidates for the initial atoms of the dictionary learning process (*Dwi*). The number of signals used in dictionary learning (*K*) is equal to *q* percent of the *Swi* signals remaining after the probable anomaly signals have been omitted. As a result, the matrix *Dwi* is constructed from the matrix *Swi* as the initial dictionary in the dictionary learning process (part d).

To optimize the initial atoms of *Dwi*, the K-SVD method [43] has been utilized, where the OMP algorithm [41] is used for the sparse coding of each signal (part e). In this method, the direction of each dictionary atom is updated in an iterative process composed of two main steps. In the first step, sparse coding of all of the input signals (*Swi*) is performed; in the second step, a new direction is calculated for each selected atom using only the signals coded by that atom. This new direction is estimated through the Singular Value Decomposition (SVD) of a matrix whose columns are the residual vectors of the affected signals, computed with the contribution of the selected atom removed. Finally, the *b*-dimensional singular vector corresponding to the largest singular value is chosen as the substitute direction of the selected atom. As described in [43], dictionary learning is thus an iterative procedure that includes two steps: (1) sparse coding of the *Swi* signals; and, (2) optimization of the directions of the *Dwi* atoms.
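The atom-update step can be sketched as below. This is a simplified illustration of a single K-SVD atom update, not the exact implementation of [43]; the assumed shapes are a *b* × *K* dictionary `D`, *K* × *m* sparse codes `A`, and *b* × *m* signals `S`:

```python
import numpy as np

def ksvd_update_atom(D, A, S, k):
    """One K-SVD atom update: re-estimate atom k of dictionary D from
    the signals that currently use it, via the SVD of their residual
    with atom k's contribution removed."""
    users = np.nonzero(A[k, :])[0]         # signals coded with atom k
    if users.size == 0:
        return D, A                        # atom unused: leave it as-is
    # residual of the affected signals, excluding atom k's contribution
    E = S[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
    U, sv, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                      # new direction: top singular vector
    A[k, users] = sv[0] * Vt[0, :]         # matching updated coefficients
    return D, A
```

Because the top singular vector of the residual matrix best explains the leftover energy of the affected signals, replacing the atom with it decreases the overall reconstruction error at each iteration.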

After *Dwi* is learned via K-SVD, a common subspace for all *Swi* signals is found through the SOMP algorithm (part e), which carries out the sparse coding of a set of signals simultaneously. This means that all of the *Swi* signals are estimated at once in the same subspace, spanned by a minimum number of atoms of the learned dictionary, while the *l*2-norm of the estimated residuals is minimized. The aim of this process is to choose the atoms that best describe the background when reconstructing the *Swi* signals. As discussed before, it is expected that, during this process, the background signals are estimated more precisely than the rare and anomalous signals. Given this expectation, after the estimation of the *Swi* residual vectors using SOMP, their *l*2-norms are calculated for all *wi* signals as *rj*, *j* = 1, 2, ... , *m*. Then, the index *fi* of the *PoI* is estimated by normalizing the *ri* corresponding to the location of the *PoI* in *wi* through Equation (3):

$$f_i = \frac{r_i}{\sum_{j=1}^{m} r_j}, \quad i = 1, 2, \dots, m \tag{3}$$
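A minimal sketch of this joint coding and residual normalization follows. It is an illustration, not the authors' exact SOMP implementation; `sparsity` caps the number of shared atoms:

```python
import numpy as np

def somp_residual_norms(D, S, sparsity=3):
    """Simultaneous OMP sketch: greedily pick at most `sparsity` atoms
    of D (b x K) that jointly best reconstruct all columns of S (b x m),
    then return the l2-norm of each column's residual (r_j in the text)."""
    R = S.copy()
    chosen = []
    for _ in range(sparsity):
        # atom most correlated with the residuals of ALL signals at once
        corr = np.abs(D.T @ R).sum(axis=1)
        corr[chosen] = -np.inf
        chosen.append(int(corr.argmax()))
        Dsub = D[:, chosen]
        # joint least-squares fit on the common subspace chosen so far
        coef, *_ = np.linalg.lstsq(Dsub, S, rcond=None)
        R = S - Dsub @ coef
    return np.linalg.norm(R, axis=0)       # one r_j per window pixel

def f_index(r, i):
    """Equation (3): normalized residual of the PoI at position i."""
    return r[i] / r.sum()
```

With a background-descriptive dictionary, the background columns of `S` recover almost exactly, so an anomalous column keeps a large residual norm and dominates the normalized *fi*.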

Through the transition of *wi* (*i* = 1, 2, ... , *m*) over the *PoI*, an *m*-dimensional vector *FPoI* is generated. Finally, its variance is selected as the criterion of the anomaly detector after a 3*σ* statistical test (Equation (4)). In this equation, *m*\* is the number of *fi* values remaining for each *PoI* after the blunder separation procedure of the 3*σ* test. The well-known 3*σ* test is a standard statistical test for removing blunders from a set of random variables (*fi*). In this procedure, assuming a normal distribution of the random variables and using the mean (*μ*) and standard deviation (*σ*) of the *fi* elements, the samples located within the range *μ* − 3*σ* < *fi* < *μ* + 3*σ* are taken as inliers, and the samples outside this interval are considered outliers. Finally, the outlier samples are removed from the data set and the anomaly index (Equation (4)) is calculated from the remaining samples [44]:

$$AD_{PoI} = \frac{1}{m^*} \sum_{i=1}^{m^*} \left( f_i - \frac{1}{m^*} \sum_{j=1}^{m^*} f_j \right)^2 \tag{4}$$
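A compact sketch of this final step (illustrative only; the inclusive 3*σ* band used here is an implementation choice):

```python
import numpy as np

def anomaly_index(f):
    """Equation (4): variance of the FPoI entries surviving the 3-sigma
    blunder test.  `f` collects the f_i values of one PoI."""
    f = np.asarray(f, dtype=float)
    mu, sigma = f.mean(), f.std()
    inliers = f[np.abs(f - mu) <= 3.0 * sigma]  # keep the 3-sigma band
    return inliers.var()                        # variance over m* samples
```

Intuitively, a background pixel reconstructs equally well in every window position, so its *fi* values are flat and the variance is small; an anomaly reconstructs poorly in only some neighborhoods, which inflates the variance.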

The following pseudo-code represents the general process of the proposed algorithm (Algorithm 1).


#### **4. Datasets and Pre-Processing**

Three real and two synthetic datasets were used in this research. The first real dataset covers an urban and forested region of Cooke City, Montana, USA, acquired in 2006 by the HyMap hyperspectral sensor with 126 spectral bands spanning approximately 0.45 to 2.5 μm. This dataset is freely available to the public through the Rochester Institute of Technology (RIT) and includes several targets with known spatial and spectral characteristics, so it is considered a reference for the evaluation of target and anomaly detection methods. Figure 3 shows this reference data and the locations of the spectral targets. The details of the spectral targets and their behavior can be studied in [45].

**Figure 3.** Rochester Institute of Technology (RIT) real dataset: (**a**) the spectral curve of targets; (**b**) the location of targets in selected subset; and, (**c**) original data.

A subset of 80 × 100 pixels from the first real dataset, containing six spectral targets, was selected for the analyses in this study. The second real dataset covers an airport zone of San Diego and was collected by the AVIRIS sensor with 224 spectral bands ranging from 350–2510 nm. After removal of the water absorption and noisy bands, it was converted to a 100 × 100 × 189 hypercube. In this dataset, three spectral targets (airplanes), each extending over more than a couple of pixels, were used to apply the anomaly detection algorithms. Figure 4 displays the original data, the selected subset, and the spectral curves of the targets.

**Figure 4.** San Diego real dataset: (**a**) the spectral curve of targets; (**b**) the location of targets in selected subset; and, (**c**) the original data.

The third real dataset was acquired over a region of the city of Viareggio in Italy by the SIM.GA airborne sensor [46]. The original dataset has 512 spectral bands ranging from 388–994 nm; after removal of the low Signal-to-Noise Ratio (SNR) bands and a spectral resampling with a 4 nm interval, it was converted to a 100 × 100 × 123 hypercube. In the area of interest of this image, five spectral targets, each extending over more than a few pixels, were chosen to apply the anomaly detection algorithms. Figure 5 shows the original data, the area of interest, and the spectral curves of the targets.

On the other hand, in the majority of previous works, the efficiency of target and anomaly detection methods has also been evaluated using synthetic data. In this research, two synthetic datasets were created considering two different strategies. In the first strategy, sub-pixel targets were implanted in a region near the location of the original targets of the Rochester Institute of Technology (RIT) data. Figure 6 shows the implanted targets and their spectral curves. In this way, the number of spectral targets is increased and a higher number of probable spectral anomalies has to be identified. According to Figure 6, seven spectral targets with 50–80% similarity to the original spectra were linearly added to the hyperspectral image, and a total of 13 potential anomaly pixels was constructed. Then, with the aim of simulating PSF effects, a Gaussian-weighted averaging process using a 3 × 3 window around the location of each implanted target was applied.

**Figure 5.** Viareggio real dataset: (**a**) the spectral curve of targets; (**b**) The location of targets in selected subset; and, (**c**) the original data.

**Figure 6.** Implanted RIT dataset: (**a**) the spectral curve of implanted targets; (**b**) the location of implanted targets in selected subset; and, (**c**) the original data.

In the second strategy of synthetic data generation, a spectral destruction of the original signals of the real RIT dataset was performed. In this strategy, a variation of between ±5 and ±20% with respect to the original signal was applied to a randomly selected number of spectral bands (ranging from 5–10% of the total image bands) for six candidate pixels, and a total of 12 potential anomaly pixels was constructed. As in the first strategy, the candidate pixels were chosen in relatively homogeneous regions. The positions of the destructed signals, a sample of a destructed spectral curve, and the related original data are displayed in Figure 7.

**Figure 7.** Destructed RIT dataset: (**a**) the spectral curve of destructed targets; (**b**) the location of destructed targets in the selected subset; and, (**c**) the original data.

#### **5. Results and Discussion**

As mentioned before, five datasets, consisting of three real and two synthetic datasets, were used to evaluate the results and efficiency of the proposed algorithm.

In this study, the functionality of the proposed method was assessed by performing the three-dimensional (3D)-ROC analysis [10,14] (Figure 8), evaluating the areas under the curves (Figure 9 and Table 1) and the background suppression criterion (Figure 10 and Table 1), and generating a target-background separation diagram [5] (Figure 11).

The traditional ROC curve is obtained by plotting the false alarm rate (*PFA*) versus the probability of correct detection (*PD*) for different thresholds through Equation (5):

$$P_D = \frac{N_{\text{Signal detected}}}{N_t}, \; P_{FA} = \frac{N_{\text{False Alarm}}}{N} \tag{5}$$

Since the output of an anomaly detection algorithm is an image with two classes, anomaly and background, the probability of correct detection for each threshold is calculated as the ratio of the number of correctly-detected anomaly pixels (*N*<sub>Signal detected</sub>) to the total number of anomaly pixels (*N*<sub>t</sub>). Additionally, the probability of wrong detection (known as the false alarm rate) for each threshold is obtained as the ratio of the number of background pixels wrongly placed in the anomaly class (*N*<sub>False Alarm</sub>) to the total number of pixels in the image (*N*).

Recently, the 3-D ROC analysis, with some advantages with respect to the 2-D one, was developed to evaluate anomaly detection algorithms [44]. In this case, varying the value of the threshold (*Th*) enables users to observe the progressive changes of *PD* and *PFA* independently. A 3-D ROC curve is generated by considering *PD*, *PFA*, and the threshold (*Th*) as the three components of a 3D point in a Cartesian coordinate system. In other words, it is a three-dimensional curve of (*PD*, *PFA*, *Th*), from which three different 2-D ROC curves can also be generated. The 2-D ROC obtained from (*PD*, *PFA*) is the traditional one, and the 2-D ROCs obtained from (*PFA*, *Th*) and (*PD*, *Th*) are the new ones.

The 2-D ROC of (*PD*, *Th*) represents the progressive detection power as the threshold changes, and the 2-D ROC of (*PFA*, *Th*) provides important information on the progressive background suppression as the threshold varies, especially in the case of visual interpretation when no ground truth data are available.

Having obtained the detection maps of each method in the different situations, this 3D curve was plotted for 5000 different thresholds between the minimum and maximum of the detection map, and the areas under the curves were considered as the measure for evaluating the efficiency.
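For reference, the threshold sweep and the AUC computation can be sketched as below. This is a simplified illustration assuming a 2-D detection map and a binary ground truth mask; the AUC integrals use the trapezoidal rule:

```python
import numpy as np

def roc_and_auc(det_map, gt_mask, n_thresholds=5000):
    """Sweep thresholds between the min and max of the detection map,
    compute PD and PFA per Equation (5), and return the areas under
    the (PD, PFA) and (PFA, Th) curves."""
    scores = det_map.ravel()
    truth = gt_mask.ravel().astype(bool)
    N_t, N = truth.sum(), truth.size
    th = np.linspace(scores.min(), scores.max(), n_thresholds)
    detected = scores[None, :] >= th[:, None]    # one row per threshold
    PD = (detected & truth).sum(axis=1) / N_t    # correct detections
    PFA = (detected & ~truth).sum(axis=1) / N    # false alarms
    # trapezoidal areas; sort by the x-axis variable for (PD, PFA)
    order = np.argsort(PFA)
    x, y = PFA[order], PD[order]
    auc_pd_pfa = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)
    auc_pfa_th = np.sum(np.diff(th) * (PFA[1:] + PFA[:-1]) / 2.0)
    return th, PD, PFA, auc_pd_pfa, auc_pfa_th
```

A higher AUC(*PD*, *PFA*) and a lower AUC(*PFA*, *Th*) then read directly as better detection and better background suppression, respectively.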

The separability diagram is another index for evaluating the efficiency of two-class classification algorithms; it shows the statistical separation of the anomaly and background data. This diagram is generated with the help of the ground truth map and shows the range of the values recorded at the anomaly and background locations of the detection map. The level of separation, or the presence of an overlap, between the ranges of the anomaly and background values indicates the level of success of the anomaly detection algorithm. In plotting this diagram, the following steps are taken: (1) generating the anomaly detection maps of all of the compared methods; (2) normalizing the detection maps considering the minimum and maximum of all of the anomaly detection maps simultaneously; (3) identifying the anomaly and background signals through a ground truth reference map; and, (4) estimating the minimum and maximum anomaly and background values of each detection map in two ways:


To compare the results obtained with the proposed algorithm, seven other anomaly detection algorithms were also implemented: the traditional Global and Local RX algorithms [12], the causal R-RX and K-RX [22], as well as the recently-developed CRD [23] and BJSR [33] algorithms. Except for Global RX, which has no setting parameters, the other six algorithms have several setting parameters; generally, the default settings proposed by their developers were used in our comparisons. The window size is the only setting parameter of the Local RX method, and the best result obtained from windows of 11 × 11, 13 × 13, and 15 × 15 pixels was used in the comparisons. In the causal R-RX and K-RX, the window width of the sliding array is the main setting parameter, which was set to CW = 900 according to the best result obtained in [22]. The CRD settings are the inner and outer window sizes as well as the regularization parameter; here, a 7 × 7 inner window, a 15 × 15 outer window, and a regularization parameter of 10<sup>−6</sup> were used [23]. The BJSR setting parameters are the background and guard window sizes, the search window size, and the level of sparsity. The window sizes were selected based on the optimum settings reported in [33]: a 17 × 17 background window, a 5 × 5 guard window, and a 19 × 19 search window were the spatial settings, and the sparsity level of the SOMP method was set to 3 [33]. Moreover, among the detection algorithms used in this paper, the Local RX algorithm (LRX) was easily adapted to the proposed sliding window. Thus, a new version of Local RX, called the Sliding-window Local RX anomaly detector (SLRX), is also used in the evaluations. In this version, the generation of *FPoI* for each *PoI* is based on the Mahalanobis distance calculated by the Local RX from the samples of the *wi* windows. Again, similar to the SWJSR method, the variance of *FPoI* is used as the measure of this detector for each *PoI*.

Table 1 presents the AUC(*PD*, *PFA*) and AUC(*PFA*, *Th*) results of the abovementioned algorithms and the best results of the proposed algorithm (SWJSR), implemented on the five real and synthetic datasets (implanted and destructed data).


**Table 1.** Efficiency (AUC) of the GRX, local RX algorithm (LRX), CRD, BJSR, CR-RXD, CK-RXD, sliding-window Local RX anomaly detector (SLRX), and SWJSR for all of the datasets. (Bold indicates the higher value in the case of AUC(*PD*, *PFA*) and the lower value in the case of AUC(*PFA*, *Th*)).

Considering the traditional AUC(*PD*, *PFA*) provided in Table 1, the higher efficiency of the proposed algorithm under similar conditions is observed, with a noticeable average improvement of 2% compared to the best results obtained by the other methods. On the other hand, the AUC(*PFA*, *Th*) values of the proposed method are lower than those of the other algorithms. It should be noted that the AUC of *PFA* vs. *Th* represents the level of background suppression, and lower values indicate a better performance of the algorithm [47]. To compare the anomaly detection algorithms, the 3D-ROC curves, the 2-D ROCs of (*PD*, *PFA*), the 2-D ROCs of (*PFA*, *Th*), and the target-background separation diagrams related to each dataset (Table 1) are shown in the following figures (Figures 8–11).


**Figure 8.** 3D ROC curves of the anomaly detection algorithms: (**a**) Real RIT dataset; (**b**) Real San Diego dataset; (**c**) Real Viareggio dataset; (**d**) Implanted RIT dataset; and, (**e**) Destructed RIT dataset.



**Figure 9.** 2-D ROC (*PD*, *PF*) of the anomaly detection algorithms: (**a**) Real RIT dataset; (**b**) Real San Diego dataset; (**c**) Real Viareggio dataset; (**d**) Implanted RIT dataset; and, (**e**) Destructed RIT dataset.


**Figure 10.** 2-D ROC (*PF*,*Th*) of the anomaly detection algorithms: (**a**) real RIT dataset; (**b**) real San Diego dataset; (**c**) real Viareggio dataset; (**d**) implanted RIT dataset; and, (**e**) destructed RIT dataset.


**Figure 11.** Target-background separation diagram of the anomaly detection algorithms (the green box shows the target and the red box shows background statistics): (**a**) real RIT dataset; (**b**) real San Diego dataset; (**c**) real Viareggio dataset; (**d**) implanted RIT dataset; and, (**e**) destructed RIT dataset.

Again, the higher efficiency of the SWJSR algorithm can be seen from the shapes of the 3-D ROC curves and the separability diagrams. Except for the implanted synthetic data, these diagrams also reveal a desirable separation between the anomaly and background elements for the proposed method, which verifies its better functionality when compared to the other methods.

In the case of the synthetic data, all of the compared methods yield similar results. For the ROC (*PD*, *PFA*) curves of nearly all of the experiments, the curve of the SWJSR is closer to the upper-left corner of the diagram, which yields the increase of the AUC(*PD*, *PFA*) value.

Figure 12 shows the detection maps obtained by the GRX, LRX, CRD, BJSR, CR-RXD, CK-RXD, SLRX, and the proposed (SWJSR) algorithms, together with the reference ground truth map, for all of the data used in this research.

Given the good results obtained by the suggested SWJSR algorithm, a sensitivity analysis of this algorithm with respect to its tuning parameters was also implemented. These parameters are: (1) the dimensions of the sliding window; and, (2) the level of sparsity used in the simultaneous reconstruction of the background signals by the SOMP algorithm. As the first investigation, the effect of changing the sliding window dimensions on the AUC index is presented in Table 2 for all of the used data.


**Figure 12.** Detection maps of compared algorithms for all datasets: (**a**) GRX detection map; (**b**) LRX detection map; (**c**) CRD detection map; (**d**) Background Joint Sparse Representation (BJSR) detection map; (**e**) SWJSR detection map; (**f**) CR-RXD detection map; (**g**) CK-RXD detection map. (**h**) SLRX detection map; and, (**i**) Reference anomaly map.




As shown in Table 2, the obtained results depend on a correct definition of the dimensions of the sliding window. In this regard, according to part b of Figure 2, when the traditional RXD is used to remove potentially anomalous signals by applying a proper threshold (*Th*-Plane), the covariance matrix estimated from a small number of data samples can be rank-deficient (non-invertible). To overcome numerical instabilities in the case of smaller sliding windows, the Moore-Penrose pseudo-inverse was used: whenever the number of samples (sliding window elements) was less than the number of spectral bands, the "pinv" function was used in MATLAB as an alternative to the common inversion function (inv). In the case of spectral anomalies with a large spatial extent, it is necessary to use large sliding windows to achieve better results. For example, since the multi-pixel anomaly regions of the San Diego airport zone and the city of Viareggio in Italy are more extended than the one-pixel anomalies of the RIT data, the optimum sliding window is also larger. The same rule also applies to the San Diego data compared with the Viareggio data, where larger sliding window dimensions are convenient. Thus, keeping this rule in mind, and considering the spatial resolution of the sensor, obtaining prior knowledge about the extent of the probable anomalies can be effective in tuning the sliding window size and reaching reliable results faster. This knowledge is less important when dealing with anomalies with an extent of one pixel or less.
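The pseudo-inverse fallback described above is a standard numerical trick; the text uses MATLAB's `pinv`/`inv`, and an equivalent NumPy sketch (an illustration, not the authors' code) might look like:

```python
import numpy as np

def safe_cov_inverse(X):
    """Inverse of the covariance of the window samples X (b x n).
    When n < b (or the matrix is otherwise rank-deficient), the
    covariance is non-invertible, so fall back to the Moore-Penrose
    pseudo-inverse, mirroring the pinv-vs-inv switch in the text."""
    b, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / n
    if n < b or np.linalg.matrix_rank(C) < b:
        return np.linalg.pinv(C)   # rank-deficient window
    return np.linalg.inv(C)        # ordinary inverse is safe
```

The pseudo-inverse inverts the covariance only on the subspace actually spanned by the window samples, which keeps the Mahalanobis-type RXD scores finite for small windows.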

One of the most important tuning parameters of the suggested algorithm is the level of the simultaneous sparse estimation of the background elements, called the level of sparsity. In other words, the maximum number of atoms of the learned dictionary used in the simultaneous estimation of all of the signals located in the sliding window is another tuning parameter. Obviously, the value of this parameter depends on the variety of the endmembers occurring in the window. Since this value indicates the maximum use of the dictionary atoms, a number of dictionary atoms lower than or equal to this tuning parameter is selected in the recovery of each sliding window position. Assigning a low value to this parameter yields an incomplete modeling of the background, whereas a higher value than necessary creates the possibility that unrelated atoms cooperate in decreasing the recovery residuals when an anomaly occurs. Accordingly, the optimum value of this parameter was selected so as to provide a balance between these two consequences of an incorrect selection.

In Table 3, by assigning the identified optimum sliding window dimension to each dataset, the effect of changing the level of sparsity on the AUC index has also been studied for all of the datasets.

As observed from the results of Table 3, an optimized selection of this parameter plays a significant role in the efficiency of the proposed algorithm. Indeed, considering the variety of the input data, choosing values of 5, 6, or 7 for this parameter mainly yields desirable results; in the range close to the optimum value, this parameter does not cause a significant change in the results, but a clearly incorrect determination considerably degrades them.

The proposed method involves a considerably higher processing load than the other methods, so it cannot compete with them in terms of computational cost and running time. For example, the running time for the RIT data in MATLAB, using a computer with an Intel Core i7 2.6 GHz processor and 16 GB of RAM under the Windows 10 64-bit operating system, was 129 s, which is longer than that of the other methods. The average running times of the proposed and compared methods are tabulated in Table 4; these values are the running times of each anomaly detection algorithm averaged over all of the datasets.

**Table 3.** Effect of the level of sparsity in the SWJSR detector for all datasets. (Bold indicates the highest value).



**Table 4.** Average running time of the compared algorithms using all datasets.

Finally, it seems that utilizing and developing parallel processing systems would increase the running speed of the proposed algorithm, which is the focus of future studies by the authors.
