2.1. Hyperspectral Harmonic Analysis
HSI contains a large amount of redundant information because of the high correlation between adjacent bands, and owing to the influence of noise and the atmosphere, not all bands can provide valid information [33]. For this reason, the hyperspectral data are regarded as a time-series signal from the perspective of time-domain signals, and HA is introduced to perform a time-frequency conversion on this signal. The theory of HA is to represent any time-series function f(t) with respect to time t by a superposition of sine or cosine waves (harmonics) [34].
Different from the traditional Fourier transform [35], which transforms from the spatial domain to the frequency domain, HA utilizes the spectral information and the correlation between bands in the HSI to transform from the time domain to the frequency domain. By making full use of the spectral characteristics, HA transforms the hyperspectral data into a set of harmonic energy characteristic components, while the spatial characteristics of the hyperspectral data remain unchanged. Specifically, for a single pixel, multiple harmonic analysis expresses the spectral information of the pixel as a superposition of sine (cosine) waves composed of a series of harmonic energy characteristic parameter components. In this way, the spectrum of each pixel can be represented as a complex, continuous and smooth curve.
When HA is employed to process spectral data, the approximately continuous spectral curve composed of B bands can be treated as a function with period B. After harmonic decomposition, the spectral curve of each pixel in the HSI can be expressed as a superposition of sine (cosine) waves composed of a series of harmonic energy characteristic parameter components, including the harmonic remainder, amplitude and phase [20]. Suppose the spectral vector consisting of B bands is represented as Yi = [y1, y2, …, yn, …, yB]^T, where yn is the spectral value of the n-th band and n is the band serial number (n = 1, 2, …, B). Then the harmonic decomposition expansion of the h-th harmonic analysis of the spectral vector Yi is:

yn ≈ Re + Σ_{k=1}^{h} [Ck cos(2πkn/B) + Sk sin(2πkn/B)],  n = 1, 2, …, B  (1)
The harmonic energy characteristic parameter components of the h-th harmonic decomposition of Yi are calculated as:

Re = (1/B) Σ_{n=1}^{B} yn  (2)

Ch = (2/B) Σ_{n=1}^{B} yn cos(2πhn/B)  (3)

Sh = (2/B) Σ_{n=1}^{B} yn sin(2πhn/B)  (4)

Hh = sqrt(Ch² + Sh²)  (5)

Ph = arctan(Sh/Ch)  (6)

where h (h = 1, 2, 3, …) is the number of harmonic decompositions; Re is the harmonic remainder; and Ch, Sh, Hh and Ph are the cosine amplitude, sine amplitude, harmonic component amplitude and harmonic component phase of the h-th harmonic decomposition, respectively. Here, the remainder represents the average of the spectrum, while the amplitude and phase reflect, respectively, the energy changes in different bands and the position of the band where the energy amplitude appears [36]. The detailed steps of hyperspectral HA are described in Algorithm 1.
In HA, the lower-order harmonics contain the main energy characteristics of the spectrum, while the higher-order harmonics are usually mixed with noise. Therefore, harmonic analysis has good noise-cancellation and main-energy-extraction capabilities, and it also preserves the spatial feature information of the hyperspectral data.
Since the low-order harmonics contain most of the energy of the HSI, not all harmonic images can provide crucial information. As such, we can select the first few low-order harmonic images as the subsequent test images. However, not all low-order harmonic images are suitable for anomaly detection. As shown in Figure 2, the phase in the low-order harmonic images clearly contains a great deal of interference and useless information. Thus, the phase images are discarded directly, while the remainder and amplitude images are preserved. Finally, the remainder and the first five amplitude images are adopted as the subsequent test images. In this way, the ultimate goal of hyperspectral HA in this part is to reduce dimensionality and remove redundant information.
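As a minimal sketch of this selection step (assuming the remainder Re with shape (rows, cols) and the amplitude stack H with shape (h_max, rows, cols) have already been computed; the function and variable names here are our own, not from the original):

```python
import numpy as np

def build_test_images(Re, H):
    """Assemble the subsequent test images: the remainder plus the first
    five amplitude images, stacked into a 6-band cube. The phase images
    are simply not included (i.e., discarded)."""
    return np.concatenate([Re[np.newaxis], H[:5]], axis=0)
```

The phase images never enter the stack, which mirrors the dimensionality-reduction goal described above.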
Algorithm 1. Harmonic analysis
1. Input: Hyperspectral matrix Y = [Y1, Y2, …, YM] ∈ R^(B×M), maximum harmonic number hmax.
2. Transformation process:
   for each pixel Yi do
      (1) Calculate remainder Re by Equation (2);
      for h = 1 to hmax do
         (2) Calculate coefficients of HA by Equations (3)–(6);
      end for
      (3) Get the reconstructed pixel Y′i by Equation (1);
   end for
3. Output: remainder Re, amplitude Hh and phase Ph.
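The per-pixel loop of Algorithm 1 can be sketched in vectorized form with NumPy (a sketch under our own naming assumptions; the original operates pixel by pixel, which is equivalent):

```python
import numpy as np

def harmonic_analysis(Y, h_max):
    """Decompose each pixel spectrum (columns of Y, shape B x M) into the
    harmonic remainder Re, amplitudes H and phases P of Equations (2)-(6).
    Returns Re (M,), H (h_max x M), P (h_max x M)."""
    B, M = Y.shape
    n = np.arange(1, B + 1)                    # band serial number n = 1..B
    Re = Y.mean(axis=0)                        # remainder: mean spectrum per pixel
    C = np.zeros((h_max, M))
    S = np.zeros((h_max, M))
    for h in range(1, h_max + 1):
        cos_h = np.cos(2 * np.pi * h * n / B)  # harmonic basis for order h
        sin_h = np.sin(2 * np.pi * h * n / B)
        C[h - 1] = (2.0 / B) * cos_h @ Y       # cosine amplitude C_h
        S[h - 1] = (2.0 / B) * sin_h @ Y       # sine amplitude S_h
    H = np.sqrt(C**2 + S**2)                   # harmonic component amplitude H_h
    P = np.arctan2(S, C)                       # harmonic component phase P_h
    return Re, H, P
```

For a pure test spectrum yn = 3 + 2·cos(2π·2n/B), the decomposition returns Re ≈ 3 and a single non-zero amplitude H2 ≈ 2, as orthogonality of the harmonic basis would predict.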
2.3. Low-Rank Decomposition
Since the anomaly is sparse and the background is low-rank in hyperspectral data, the low-rank decomposition used in the field of visible-light image processing is also applicable to HSI [44,45,46]. However, most of these methods directly implement low-rank decomposition on the original HSI without considering the redundant information of the HSI, the correlation between bands, and the isolated noise. In this respect, we propose a novel low-rank decomposition strategy: instead of the original HSI, we perform the low-rank decomposition on the initial smooth images S. The decomposition formula is represented as follows:

S = DF + A
where F ∈ R^(m×M) is the m × M coefficient matrix of all pixels in S, and A ∈ R^(l×M) is the l × M sparse matrix. Here, m represents the total number of atoms in the background dictionary D, M represents the total number of pixels in the HSI, and l indicates that each pixel in the initial smooth images S is an l-dimensional column vector.
For the low-rank decomposition problem, there are several solutions. RPCA [28] decomposes the hyperspectral data into a low-rank part and a sparse residual part. However, RPCA does not consider the noise of hyperspectral data, making the detection results prone to false alarms. The GoDec method [47,48] divides the hyperspectral data into a low-rank matrix, a sparse matrix and a noise matrix, thereby taking the noise into account. The LRASR method [30] solves the low-rank representation (LRR) [49] problem with low-rank and sparse regularization constraints by the linearized alternating direction method with adaptive penalty (LADMAP) [50]. This method imposes the l21 regularization constraint to force the majority of the columns in the matrix F to be zero. However, because of the characteristic differences of anomalies in practice, there are some non-zero values in the columns of the sparse matrix A. Therefore, the ADLR method [32] replaces the l21 regularization constraint with the l1 regularization constraint. In addition, the l1 regularization constraint can further reduce the impact of isolated noise on anomaly detection: since isolated noise has no sparse attributes in the image, it will be clustered into a group because of its similar coefficients. The l21 regularization is the l21 norm of a matrix, defined as the sum of the l2 norms of the columns of the matrix. The l1 regularization is the l1 norm of a matrix, defined as the sum of the absolute values of all the elements of the matrix.
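The difference between the two norms is easy to see numerically (a quick illustration; the helper names below are ours, not from the original):

```python
import numpy as np

def l1_norm(A):
    """l1 norm of a matrix: sum of the absolute values of all elements."""
    return np.abs(A).sum()

def l21_norm(A):
    """l21 norm of a matrix: sum of the l2 norms of its columns."""
    return np.linalg.norm(A, axis=0).sum()

# One non-zero column [3, 4]^T and one zero column:
A = np.array([[3.0, 0.0],
              [4.0, 0.0]])
print(l21_norm(A))  # 5.0 (l2 norm of the single non-zero column)
print(l1_norm(A))   # 7.0 (|3| + |4|)
```

Because the l21 norm only counts whole columns, it drives entire columns to zero; the l1 norm penalizes every element individually, which is why it tolerates a few non-zero entries scattered within columns.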
Through the above analysis and comparison, we exploit the ADLR method with the l1 regularization constraint for the low-rank decomposition. In the low-rank decomposition, the coefficient matrix F is low-rank and the matrix A is sparse. After introducing the l1 regularization constraint, the objective function is as follows:

min_{F,A} rank(F) + λ||A||1,  s.t. S = DF + A
where rank(·) represents the rank function; λ > 0 is the tradeoff parameter used to balance the low-rank part and the sparse part; and ||·||1 is the l1 norm. The abbreviation s.t. refers to "subject to", indicating that the constraint is S = DF + A. ||A||1 is specifically expressed as:

||A||1 = Σ_{row=1}^{l} Σ_{col=1}^{M} |A(row, col)|
where l indicates that each pixel in the initial smooth images S is an l-dimensional column vector, M represents the total number of pixels in the HSI, and |A(row, col)| is the absolute value of the element in the row-th row and col-th column of the matrix A.
To solve the objective function, the rank function is replaced by the matrix nuclear norm ||·||* [51]. Therefore, the objective function can be relaxed to:

min_{F,A} ||F||* + λ||A||1,  s.t. S = DF + A  (15)
In order to decouple the objective function (15), an auxiliary variable U is introduced in place of F [52]. The augmented Lagrange function of the objective function is:

L(U, F, A, L1, L2) = ||U||* + λ||A||1 + tr[L1^T(S − DF − A)] + tr[L2^T(F − U)] + (τ/2)(||S − DF − A||F² + ||F − U||F²)  (16)

where tr[·] represents the trace of a matrix, i.e., the sum of its diagonal elements; L1 and L2 are the Lagrange multipliers; τ > 0 is the penalty parameter; and ||·||F is the Frobenius norm. Since Equation (16) contains multiple variables, it can be solved by alternate iterative updates: when solving for one variable, the other variables are kept unchanged. Accordingly, the solution process can be broken down into the following subproblems.
(1) Update U while keeping F and A unchanged; the objective function can be reformulated as:

U = arg min_U ||U||* + (τ/2)||U − (F + L2/τ)||F²  (17)

where arg min refers to the value of the variable U at which the expression that follows reaches its minimum.
(2) Update F while keeping U and A unchanged; the objective function is quadratic in F and admits the closed-form solution:

F = (D^T D + I)^(−1) [D^T(S − A + L1/τ) + U − L2/τ]  (18)
(3) Update A while keeping F and U unchanged; the objective function can be reformulated as:

A = arg min_A λ||A||1 + (τ/2)||A − (S − DF + L1/τ)||F²  (19)
Equations (17) and (19) can be solved by utilizing the singular value thresholding operator [53] and Lemma 1 in [54], respectively. In the iterative process, the Lagrange multiplier update formulas are as follows:

L1 = L1 + τ(S − DF − A),  L2 = L2 + τ(F − U)  (20)
The iterative convergence condition is expressed as:

||S − DF − A||∞ < η and ||F − U||∞ < η  (21)
where η is the decomposition error and ||·||∞ is the infinity norm. After the sparse matrix A is obtained by the low-rank decomposition, we obtain the anomaly by the l2 norm, i.e., the square root of the sum of the squares of all the elements in each column of the matrix. The specific equation is as follows:

T(i) = ||A(:, i)||2 = sqrt(Σ_{row=1}^{l} A(row, i)²),  i = 1, 2, …, M  (22)
where T(i) represents the anomaly detection result of the i-th pixel, A(:, i) represents the i-th column of the matrix A, and M represents the total number of pixels in the HSI.
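Assuming the sparse matrix A is available as a NumPy array (the function name below is an illustrative assumption), the column-wise l2 score of Equation (22) reduces to a one-liner:

```python
import numpy as np

def anomaly_score(A):
    """Equation (22): per-pixel anomaly score as the l2 norm of each
    column of the sparse matrix A (shape l x M); returns a vector of M scores."""
    return np.linalg.norm(A, axis=0)
```

For example, a column [3, 4]^T yields a score of 5, while an all-zero column (pure background) yields 0.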
The complete steps of the low-rank decomposition are presented in Algorithm 2.
Algorithm 2. Low-rank decomposition
Input: Initial smooth image S, background dictionary D.
Initialization: Set λ according to the input data, τ = 10^−6, η = 10^−6, τmax = 10^10, ζ = 1.1, F = U = A = L1 = L2 = 0.
while the convergence condition (21) is not satisfied do
   (1) Update variable U according to Equation (17);
   (2) Update variable F according to Equation (18);
   (3) Update variable A according to Equation (19);
   (4) Update Lagrange multipliers L1 and L2 according to Equation (20);
   (5) Update variable τ, where τ = min(τmax, ζτ);
end while
Return: Coefficient matrix F, sparse matrix A.
Calculate anomaly result T according to Equation (22);
Output: Anomaly result T.
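The full iteration of Algorithm 2 can be sketched in Python with NumPy. This is a non-authoritative sketch: the function names, the max_iter safeguard, and the test data are our own assumptions; singular value thresholding solves Equation (17) and element-wise soft thresholding solves Equation (19).

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Element-wise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def low_rank_decomposition(S, D, lam, tau=1e-6, eta=1e-6,
                           tau_max=1e10, zeta=1.1, max_iter=500):
    """Sketch of Algorithm 2: min ||F||_* + lam*||A||_1  s.t.  S = D F + A,
    solved by alternating updates with auxiliary variable U and
    multipliers L1, L2. max_iter is a safeguard we add for robustness."""
    l, M = S.shape
    m = D.shape[1]
    F = np.zeros((m, M)); U = np.zeros((m, M)); A = np.zeros((l, M))
    L1 = np.zeros((l, M)); L2 = np.zeros((m, M))
    G = np.linalg.inv(D.T @ D + np.eye(m))          # constant factor of Eq. (18)
    for _ in range(max_iter):
        U = svt(F + L2 / tau, 1.0 / tau)            # Eq. (17) via SVT
        F = G @ (D.T @ (S - A + L1 / tau) + U - L2 / tau)   # Eq. (18)
        A = soft(S - D @ F + L1 / tau, lam / tau)   # Eq. (19) via soft thresholding
        R1 = S - D @ F - A                          # constraint residual
        R2 = F - U                                  # decoupling residual
        L1 = L1 + tau * R1                          # Eq. (20)
        L2 = L2 + tau * R2
        tau = min(tau_max, zeta * tau)              # step (5)
        if max(np.abs(R1).max(), np.abs(R2).max()) < eta:   # Eq. (21)
            break
    T = np.linalg.norm(A, axis=0)                   # Eq. (22)
    return F, A, T
```

The growing penalty τ forces both residuals toward zero, so the returned matrices satisfy S ≈ DF + A, with the anomaly energy concentrated in the columns of A.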