**1. Introduction**

Today, hyperspectral imaging has become a powerful tool in remote sensing. It provides data acquired in hundreds of narrow spectral bands across the reflective electromagnetic spectrum, allowing different materials to be distinguished by their unique spectral responses [1]. Target detection and classification can be considered the most important information extraction approaches in hyperspectral data interpretation [2–4]. Target detection algorithms fall into supervised and unsupervised categories [4]. In the former, the spectral signatures of the targets are used in the detection algorithms, whereas in the latter no prior knowledge is available about the spectral characteristics of the targets, and only the detection of spectral anomalies is on the agenda [5]. In fact, anomaly detection can be regarded as an unsupervised classification with two classes (anomaly and background) [6]. Anomalies are thus unknown targets that differ significantly from their neighboring samples and whose probability of occurrence is low. Detecting these differences is independent of the spectral signatures of the targets and, therefore, of the parameters affecting them, including environmental and atmospheric conditions [7]. Applications such as search and rescue [8], detection of military vehicles and objects [9], detection of rare minerals in geology, recognition of vegetation stress [10], identification of toxic wastes in environmental monitoring, and of tumors in medical imaging all involve spectral anomalies that can be detected via hyperspectral anomaly detection algorithms.

All of the methods developed in the field of anomaly detection can be classified along two broad lines. The first distinction is between local and global methods. In global methods, each pixel is judged for the presence of an anomaly using indicators computed from all the signals recorded in the hyperspectral image [11], while in local methods only the spatial neighbors of each signal are used for this purpose. Whether the hyperspectral data are assumed to follow a normal distribution in the feature space leads to a second categorization. Parametric methods, such as those based on the covariance/correlation matrix, assume that the background data follow a normal distribution. In contrast, methods based on linear unmixing or sparse representation make no assumption about the statistical distribution of the hyperspectral data.

The Reed-Xiaoli (RX) method [12] is known as the traditional benchmark among hyperspectral anomaly detection algorithms. Its idea has served as the basis for the development of similar methods used in both local and global strategies, such as normalized RX, modified RX, causal RX [13,14], weighted RX [15], RX-UTD, and the Adaptive Causal Anomaly Detector (ACAD) [16]. The main assumption of these algorithms is that the hyperspectral data follow a multivariate normal distribution; anomalous signals are therefore expected to lie at a larger Mahalanobis distance from the centroid of the data. Although this seems reasonable in homogeneous regions, it does not represent the background signals well when the data do not follow a Gaussian distribution. Modified versions of RX, such as the Kernel-RX algorithm [17], were proposed to overcome this flaw in the RX background assumption. Kernel-RX maps the signals into a higher-dimensional space using non-linear kernels, increasing the tendency of the data in the feature space toward a Gaussian distribution. Because RX-based methods rely on the covariance/correlation matrix of the sampled data, they are categorized as parametric algorithms.
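To make the RX idea concrete, the following is a minimal sketch of a global RX detector in NumPy: each pixel's score is its Mahalanobis distance from the global mean, with the background covariance estimated from all pixels. The function name and the synthetic cube are illustrative, not taken from the cited papers.

```python
import numpy as np

def rx_detector(cube):
    """Global RX anomaly scores for a hyperspectral cube.

    cube: (rows, cols, bands) array. Returns a (rows, cols) map of
    Mahalanobis distances of each pixel from the global mean.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)             # background covariance (global RX)
    cov_inv = np.linalg.pinv(cov)             # pseudo-inverse for numerical stability
    diff = X - mu
    # Quadratic form diff_i^T * cov_inv * diff_i for every pixel i.
    scores = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return scores.reshape(rows, cols)

# Synthetic check: one implanted bright pixel should receive the top score.
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 1.0, size=(20, 20, 10))
cube[5, 7, :] += 8.0                          # implanted spectral anomaly
scores = rx_detector(cube)
```

The single pixel shifted by eight standard deviations in every band dominates the Mahalanobis map, while the Gaussian background stays near the chi-square baseline.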

Another algorithm developed to detect anomalies in hyperspectral data is the Dual Window-based Eigen Separation Transform (DWEST) algorithm [18]. Based on the linear transformation of EST, this algorithm is designed to maximize the separation between two classes in low-dimensional subspaces by using local windows [19]. The Nested Spatial Window-based Target Detector (NSWTD) [20] is another anomaly detection algorithm. Similar to DWEST, it uses nested spatial windows with pre-defined sizes as inner, middle, and outer windows, and evaluates the spectral differences between these windows with the Orthogonal Projection Divergence (OPD). Liu and co-workers extended the concept of DWEST to propose a new approach, called multiple-window anomaly detection (MWAD), which uses multiple windows to perform anomaly detection adaptively. This method can detect anomalies of various sizes because local spectral variations can be characterized and extracted by different window sizes [21]. Chang and co-workers proposed an anomaly detection method using causal sliding windows, which has real-time capability. They suggested three types of causal windows: causal sliding square matrix windows, causal sliding rectangular matrix windows, and causal sliding array windows. In this method a causal sample covariance/correlation matrix can be derived for causal anomaly detection; the variants using the covariance matrix and the correlation matrix are called CK\_RXD and CR-RXD, respectively. They also proposed a recursive update equation to speed up real-time processing [22]. Moreover, Li and co-workers introduced the CRD algorithm [23]. Its main assumption is that the background can be precisely estimated from the neighboring pixels; this does not hold for anomalous signals, which therefore yield high residuals.
Accordingly, in this method the l2-norm of the residual of each estimated signal is taken as the anomaly detection map. In other words, this detector locally estimates the background using a dynamic dual-window structure, and the estimation error vector of the signal located at the center of the window is then taken as the criterion for the probability of anomaly presence for that signal. Recovering the background signals using bases of the background subspace, and using the same bases to attempt to recover the anomalous signals, is considered the most important innovation of this detector. Yuan and co-workers proposed a fast and accurate hyperspectral anomaly detection method called 2DCAD [24]. It uses a high-order two-dimensional (2-D) crossing approach to find the regions of rapid change in the spectrum, and runs without any a priori assumption. Its low-complexity discrimination framework can be implemented as a series of filtering operators with linear time cost, and it can perform true pixel-level detection for real-time applications. Yuan and co-workers also proposed a graph-based anomaly detection method that makes no assumptions about the background distribution statistics [25]. After constructing a vertex- and edge-weighted graph, a pixel selection process is applied to locate the anomalies. The philosophy behind this method is that anomalies tend to be picked out more easily than background pixels in the constructed graph, because an anomalous pixel generally deviates from the background and its distinctiveness makes its connections with background pixels vulnerable. This method is robust to noise and adaptable to window sizes, which makes it more applicable in real situations.

Recently, methods applying sparse representation theory have been introduced and have become accepted as a strong tool for anomaly/target detection [26]. The main objective of these techniques is to recover high-dimensional signals via a low-dimensional subspace through a dictionary of normalized signals (atoms). In the sparse estimation of each signal, only a limited number of the dictionary atoms are active, and the majority of the coefficients related to the dictionary atoms are zero [27]. In other words, signals are recovered as a linear mixture of dictionary atoms weighted by the sparse coefficients.

Within sparse representation techniques, targets and anomalies can be detected using two different approaches. In the target detection approach, the main step is the creation of a dictionary containing background and target spectra; in other words, proper background modeling enables efficient estimation of the presence of spectral targets [5]. In this regard, Chen and co-workers [28] defined one dictionary of the targets of interest from their spectral signatures and another dictionary of the local background signals. These two dictionaries are then used to decide whether a pixel is a target or background, by sparsely estimating each pixel with both dictionaries and comparing the recovery errors. Furthermore, Du and co-workers [29] presented a target detection algorithm that integrates statistical methods and sparse representation, the Hybrid Sparsity and Statistic Detector (HSSD). Its primary assumption is that the pixel of interest follows a Gaussian distribution with the same covariance and different variance under the two statistical hypotheses of being or not being a target. To achieve efficient detection, probable target pixels are removed from the background dictionary using the SAM algorithm based on the initial target spectral signatures. Then, in an iterative process, sparse estimation is performed by the Orthogonal Matching Pursuit (OMP) method in two stages: (1) with a dictionary containing only background data; and (2) with the integrated dictionary of target and background data. Finally, comparing the difference in the recovery errors of the pixel between these two stages against a pre-determined threshold yields the decision as to whether the pixel is a target or background.

In anomaly detection methods, given the absence of prior knowledge about the spectral targets, the plan is to build a dictionary of atoms that exclusively models the background elements [30]. In other words, a dictionary composed of bases spanning the background subspace enables the precise recovery of background signals, while anomalous signals, assuming they deviate from the background subspace, cannot be estimated precisely by the background dictionary. The main idea of sparse-representation-based anomaly detection is therefore to evaluate the recovery errors of signals under a dictionary that describes the background subspace. Removing atoms that describe anomalies from the background dictionary is one of the essential steps in this procedure [31]. In this field, Yuan and co-workers [32] presented a method for anomaly detection in hyperspectral images that introduces a spatial-spectral evaluation index, called the Local Sparsity Divergence (LSD), in which the sparse matrix elements are estimated locally once the search window dimensions are determined. Lee et al. [33] suggested Background Joint Sparse Representation (BJSR) for anomaly detection, estimating the background locally using a limited number of subspaces extracted from the hyperspectral data through sparse coding. Zhao and co-workers [34] presented the Sparsity Score Estimation Anomaly Detector (SSEAD) for the same purpose. In this method, anomalies are detected using an index based on the frequency with which atoms participate in the dictionary learning process for estimating the background. The background estimate is optimized through an iterative process; by optimizing the weights of the atoms forming the background, each pixel in the hyperspectral data is scored and classified as anomaly or background. Zhang and co-workers [35] introduced the LLTSA-SSBJSR method as an extension of BJSR. It first uses the spectral space to identify anomalies, and then performs spatial analysis on the data after dimensionality reduction by the LLTSA method. Ma and co-workers proposed an anomaly detection method based on sparse dictionary learning with a capped-norm constraint using a sliding dual window, named SDLCN [36].
In this method, a number of patches of the same size are randomly selected from the entire image and stacked as training data to construct the background dictionary. A capped l1-norm-based loss function is then used to suppress the effect of anomalies in the training set, yielding a learned dictionary that is resistant to anomalies. After learning an optimized background dictionary, the sparse representation coefficient matrix is computed and the reconstruction errors are calculated, which can be regarded as the corresponding anomaly probability values.
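The scoring step shared by these reconstruction-based detectors can be sketched as follows: given a background dictionary and a coefficient matrix, the column-wise residual norms serve as anomaly scores. All names here are hypothetical, and a dense least-squares fit stands in for the sparse coding stage of the cited methods.

```python
import numpy as np

def reconstruction_error_scores(S, D, A):
    """Column-wise reconstruction errors ||s_i - D a_i||_2.

    S: (bands, pixels) signals, D: (bands, atoms) background dictionary,
    A: (atoms, pixels) coefficients. Large errors indicate signals the
    background dictionary cannot explain, i.e. likely anomalies.
    """
    R = S - D @ A
    return np.linalg.norm(R, axis=0)

# Toy demonstration: background signals lie in span(D); one column is
# perturbed off the background subspace and should score highest.
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 4))
D /= np.linalg.norm(D, axis=0)                  # unit-length atoms
A = rng.normal(size=(4, 5))
S = D @ A
S[:, 2] += 5.0 * rng.normal(size=8)             # push one signal off-subspace
A_hat = np.linalg.lstsq(D, S, rcond=None)[0]    # dense stand-in for sparse coding
scores = reconstruction_error_scores(S, D, A_hat)
```

The pure background columns are recovered almost exactly, so their residuals vanish, while the perturbed column retains the component of its perturbation orthogonal to the background subspace.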

Focusing on local anomaly detection algorithms, all of these methods assume the spatial symmetry of the background elements when judging a signal; each pixel of the hyperspectral image is therefore tested only once for anomaly presence. In such situations, if the anomalous pixels lie near the edges of the image, the probability of false detection increases and background signals may be flagged as anomalies. Given the lack of prior knowledge about the spatial distribution of similar signals in a geographic area, a voting-based approach over a diverse set of neighborhood definitions could be a good solution. Accordingly, this research proposes two measures to confront this challenge: creating diversity in the definition of the spatial neighborhood of each spectral signal, and voting-based judgment across the different spatial configurations. In other words, the most important aspect of this study is to improve the judgment on the probability of anomaly presence by diversifying the definition of a signal's spatial neighborhood. By designing an optimized local dictionary based on a sliding window with a new structure, the votes for each signal in terms of anomaly presence in each spatial neighborhood are accumulated with the aim of achieving a better judgment.

**2. Dictionary Learning and Joint Sparse Coding**

In sparse coding techniques, the b-dimensional signals $[\mathbf{s}]_{b \times 1}$ are mapped to a low-dimensional subspace through a dictionary of atoms [37]. Considering $[D]_{b \times n} = [\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_n]$ as a dictionary of unit-length atoms ($[\vec{d}_i]_{b \times 1}$, $i = 1, 2, \ldots, n$ and $\|\vec{d}_i\|_2 = 1$) where $b \ll n$, the aim of the sparse estimation of a signal is to find the sparse vector $[\boldsymbol{\alpha}]_{n \times 1}$ by solving the under-determined system of equations presented in Equation (1) [38]:

$$[\mathbf{s}]_{b \times 1} = [D]_{b \times n} \times [\boldsymbol{\alpha}]_{n \times 1}, \quad \hat{\boldsymbol{\alpha}} = \operatorname*{argmin}_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_{0} \quad \text{s.t.} \quad \|\mathbf{r}\|_2 = \|\mathbf{s} - D\boldsymbol{\alpha}\|_2 < \varepsilon \tag{1}$$

where $\|\cdot\|_0$ indicates the $L_0$-norm, which equals the number of non-zero elements of $\hat{\boldsymbol{\alpha}}$. Since there is no explicit method for solving this system of equations, greedy algorithms [39] are the general approach for estimating $\hat{\boldsymbol{\alpha}}$. OMP [40] and Simultaneous Orthogonal Matching Pursuit (SOMP) [41] are two common greedy techniques for the sparse estimation of signals using dictionaries. In the OMP algorithm the sparse vector is estimated for a single signal, while in the SOMP algorithm it is estimated simultaneously for several signals. Both algorithms iteratively search for atoms that describe the signals until the conditions in Equation (1) are satisfied [42]. In each iteration, the atom with the minimum spectral angle to the estimation error of the signal(s) is added to the set of previously selected atoms (active atoms); that is, the similarity of the residual vector(s) of the signals estimated with the previously spanned subspace is the criterion for choosing new atoms. In other words, with $R$ denoting the vector/matrix of the estimation error obtained from the previously activated atoms (Equation (2)), in each iteration the atom that maximizes $\|[\vec{d}_i]^{T} \times [R]\|_2$ ($i = 1, 2, \ldots, n$) is added as the new active atom to the set of previously activated atoms of the dictionary $D$:

$$[R] = [S] - D \times [A] \tag{2}$$

Here, when the OMP technique is used, $S$ is a vector containing a single signal ($S = [\mathbf{s}]_{b \times 1}$), and when the SOMP technique is used, it is a matrix containing all of the signals ($S = [\vec{s}_1, \vec{s}_2, \ldots, \vec{s}_t]$) to be estimated simultaneously. Likewise, $A$ is $[\boldsymbol{\alpha}]_{n \times 1}$ in the OMP technique and $[\vec{\alpha}_1, \vec{\alpha}_2, \ldots, \vec{\alpha}_t]$ in the SOMP technique, where the columns share the same zero rows. Notably, in the first iteration $R$ is initialized to $S$ ($R = S$) to select the first atom.
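The OMP iteration described above can be sketched as follows, assuming unit-norm atoms. The toy demonstration uses an orthonormal dictionary, for which recovery of the true support is guaranteed; a real redundant dictionary with b << n would not be orthonormal, and recovery would depend on its coherence.

```python
import numpy as np

def omp(s, D, k, eps=1e-6):
    """Orthogonal Matching Pursuit: greedy sparse coding of one signal.

    s: (b,) signal, D: (b, n) dictionary of unit-norm atoms,
    k: maximum number of active atoms. Returns the (n,) sparse vector.
    """
    b, n = D.shape
    residual = s.copy()                       # R = S on the first iteration
    active = []
    alpha = np.zeros(n)
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        i = int(np.argmax(np.abs(D.T @ residual)))
        if i not in active:
            active.append(i)
        # Jointly re-fit all active coefficients (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(D[:, active], s, rcond=None)
        residual = s - D[:, active] @ coeffs
        if np.linalg.norm(residual) < eps:    # Equation (1) condition met
            break
    alpha[active] = coeffs
    return alpha

# Toy demonstration on an orthonormal dictionary: a 2-sparse signal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))
s = 2.0 * Q[:, 3] - 1.5 * Q[:, 7]
alpha = omp(s, Q, k=2)
```

After re-fitting, the residual is orthogonal to every active atom, so an already selected atom is never picked again; SOMP follows the same loop but ranks atoms by the norm of their correlations with a residual matrix.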

In sparse coding procedures, the sparse recovery of signals is performed by creating a redundant dictionary from the probable endmembers in the feature space. The effective performance of a dictionary depends on the correct orientation of its atoms in the feature space; moreover, some of these bases may be missing from the input data when the corresponding endmembers were absent from the imaging process. Two main approaches to dictionary generation are the direct use of sampled signals and the learning of dictionary atoms. In the first approach, if all sampled signals are chosen, the sparse estimation of each signal reduces to a minimum-distance classification, and the $L_0$-norm of the sparse estimation vector $\hat{\boldsymbol{\alpha}}$ of each signal will be one. Choosing only a subset of the sampled signals faces two probable problems: (1) the occurrence of the same minimum-distance classification phenomenon ($\|\hat{\boldsymbol{\alpha}}\|_0 = 1$) for the chosen signals; and (2) the possibility that the dictionary atoms fail to span the signal subspace.

In anomaly detection applications using sparse coding methods, having a dictionary whose atoms are capable of spanning the space formed by the background signals is critical. In other words, because the sparse estimation error of a signal under the background dictionary is taken as the measure of whether that signal is an anomaly, correct extraction of the bases of the background subspace and their presence in the dictionary is necessary. Due to the limitations of using randomly selected signals to form the background dictionary (given the designed structure of the proposed anomaly detection algorithm), this research learns dictionary atoms that match bases capable of correctly recovering the space of the background signals.

The K-SVD technique [43], one of the dictionary learning methods, creates the initial dictionary by randomly choosing a percentage of the sampled signals and, during an iterative process, converges its atoms toward bases spanning the subspace of all input signals. In each iteration of the K-SVD algorithm, after the sparse estimation of all signals with the OMP technique, the effect of removing each atom on the estimation error of the signals that use that atom is evaluated. The main idea of this technique is to correct the basis of the selected atom toward the dominant basis of the estimation error vectors of those signals. To this end, the singular vector corresponding to the maximum singular value obtained from the singular value decomposition of the residual matrix is chosen as the substitute basis for the selected atom. This iterative procedure continues until the bases of all atoms of the dictionary stabilize.
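One dictionary-update sweep of this procedure can be sketched as follows, under the assumption that the coefficients come from a preceding OMP stage (here replaced by random coefficients purely for demonstration); the function name is illustrative.

```python
import numpy as np

def ksvd_update(S, D, A):
    """One K-SVD dictionary-update sweep.

    S: (b, t) training signals, D: (b, n) dictionary, A: (n, t) sparse
    coefficients from the preceding sparse coding stage. Each atom is
    replaced by the dominant left singular vector of the residual
    restricted to the signals that currently use that atom.
    """
    b, n = D.shape
    for j in range(n):
        users = np.flatnonzero(A[j, :])       # signals whose estimate uses atom j
        if users.size == 0:
            continue
        # Residual over the user signals with atom j's contribution removed.
        E = S[:, users] - D @ A[:, users] + np.outer(D[:, j], A[j, users])
        U, sigma, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]                     # new atom: dominant error direction
        A[j, users] = sigma[0] * Vt[0, :]     # matching coefficient row update
    return D, A

# Demonstration: one sweep never increases the total reconstruction error,
# because each rank-1 SVD replacement is optimal over the affected signals.
rng = np.random.default_rng(3)
S = rng.normal(size=(8, 40))
D0 = rng.normal(size=(8, 5))
D0 /= np.linalg.norm(D0, axis=0)
A0 = rng.normal(size=(5, 40))                 # stand-in for OMP coefficients
err_before = np.linalg.norm(S - D0 @ A0)
D1, A1 = ksvd_update(S, D0.copy(), A0.copy())
err_after = np.linalg.norm(S - D1 @ A1)
```

Because each replacement atom is a left singular vector, the updated atoms remain unit-length, so no separate renormalization step is needed.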
