Article

Data-Wise Spatial Regional Consistency Re-Enhancement for Hyperspectral Image Classification

1 School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
2 Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(9), 2227; https://doi.org/10.3390/rs14092227
Submission received: 10 March 2022 / Revised: 3 May 2022 / Accepted: 3 May 2022 / Published: 6 May 2022
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Effectively using rich spatial and spectral information is the core issue of hyperspectral image (HSI) classification. The recently proposed Diverse Region-based Convolutional Neural Network (DRCNN) achieves good results by weighted averaging of the features extracted from several predefined regions, thus exploring the use of spatial consistency to some extent. However, such feature-wise spatial regional consistency enhancement does not effectively address the issue of wrong classifications at the edges of regions, especially when the edges are winding and rough. To improve on the feature-wise approach, Data-wise spAtial regioNal Consistency re-Enhancement ("DANCE") is proposed. Firstly, the HSIs are decomposed once using the Spectral Graph Wavelet (SGW) to enhance the intra-class correlation. Then, the image components in the different frequency domains obtained from the weight map are filtered using a Gaussian filter to "deburr" the non-smooth region edges. Next, the reconstructed image is obtained from all filtered frequency-domain components using the inverse SGW transform. Finally, a DRCNN is used for further feature extraction and classification. Experimental results show that the proposed method achieves pixel-level re-enhancement of image spatial consistency, and can effectively improve not only the performance of the DRCNN, but also that of other feature-wise approaches.

1. Introduction

In recent years, hyperspectral imaging and its applications have attracted great attention in the field of Earth remote sensing. HSI classification is a basic task of hyperspectral data analysis and application [1,2,3,4]. The spatial regional consistency characteristics [5] of an image should be considered during HSI processing, since correlations exist between the ground objects in an HSI. Moreover, the problem of background interference is widespread in the existing public HSI data [6], which also makes it difficult to accurately identify and classify ground objects. In summary, it is very important to make full use of the rich spatial consistency information [7] and to improve the quality of HSIs [8].
Research on spatial consistency has attracted increasing attention with the development of remote sensing classification techniques. The spatial consistency of an image can be simply defined as every small window having similarity with the other windows in the same image, especially the adjacent windows [9]. Therefore, the correlation between a pixel and its neighboring pixels should be considered during feature extraction. In addition, similar objects usually tend to be distributed in blocks; that is, pixels belonging to the same class are usually close to each other. Spatial consistency in HSI is used, firstly, to enhance the quality of the HSI. Based on the Gibbs algorithm, Rand et al. [10] regard an HSI as a set of high-dimensional vectors related to spectral information, and divide this large set into several subsets of vectors according to spatial similarity. The spatial consistency of the spectral information at each site is thus enhanced, facilitating subsequent spectral mixture analysis (SMA) of the HSI. Yue et al. [11] combine multiple similar pixels that are adjacent in the spatial domain into one block according to the spectral angle, realizing pixel reduction in the HSI. Secondly, spatial consistency is also used for feature extraction from HSIs. The spatial correlation features of HSIs can be obtained using the Spectral Graph Wavelet Transform (SGWT) [12], which fully considers the relationship between each pixel and its adjacent pixels. Zikiou et al. [13] use the SGWT to extract texture information of an HSI as secondary features for HSI classification. In addition, the SGW can also be seen as a filter, which can be used to extract the multi-scale characteristics of an image. The SGW [14] is used as the convolution kernel to construct a Graph Wavelet Neural Network (GWNN), which is used to classify the nodes of a graph. Dong et al. [15] decompose a vibration signal using the SGW to obtain its multi-scale characteristics, and convert the results into path graphs at multiple levels. The above spatial consistency enhancement methods are mainly based on the hyperspectral raw data (data-wise).
Chen et al. [16] applied deep learning to HSI classification for the first time and achieved good results. Convolutional Neural Networks (CNNs) in hyperspectral image classification tasks [17,18,19,20] use convolutional kernels to traverse the whole image and extract valuable features; the spatial consistency of the image is thus considered during convolution. The recently proposed DRCNN [21] divides an image block into multiple regions, which are sent into different CNN models instead of only a single CNN model. The classification process is more consistent under the regional consistency assumption in the spatial domain, since multiple feature extractions and weighted averages are performed in the DRCNN. Overall, the advantage of the DRCNN is that it strengthens the spatial consistency of the feature-wise approach and increases the number of samples through its multi-region operation. However, it has some limitations. Firstly, the DRCNN ignores spatial consistency at the data-wise (pixel) level, since the correlation between the pixels of an HSI is considered less. Secondly, the operation of multiple convolutions leads to the loss of image edge information, which causes the problem of edge point misclassification. Furthermore, to remove the noise of HSIs, the denoising effects of commonly used filters such as the bilateral filter [22], trilateral filter [23], and Gaussian filter have been compared. The Gaussian filter was selected in this paper since it can remove Gaussian noise and smooth the edges of images; in particular, the Gaussian filter greatly simplifies noise variance estimation and analysis [24]. Therefore, a Data-wise spAtial regioNal Consistency re-Enhancement (DANCE) method is proposed in this paper to further improve spatial consistency. Based on the above analysis, DANCE can overcome the shortcomings of the DRCNN in terms of data-wise spatial consistency to some extent.
The main contributions of this paper are as follows:
  • To solve the misclassification problem of HSI edge points, a novel and effective DANCE method is proposed to enhance the spatial regional consistency of data-wise approaches, which can improve the performance of some state-of-the-art methods.
  • To better integrate the feature-wise and data-wise methods, the structure of the DRCNN model is optimized through experiments, which comprehensively improves the spatial regional consistency.
The remainder of this paper is organized as follows. The related basic knowledge is introduced in Section 2. The proposed method is described in Section 3. The experiment results and analysis are discussed in Section 4. The discussion is given in Section 5. The conclusions are drawn in Section 6.

2. Preliminaries

2.1. Characteristics of Weighted Graphs

A hyperspectral image can be regarded as an undirected weighted graph $G = (V, E, w)$, where $V$ is the vertex set, $E$ is the edge set, and $w$ is the weight function between vertices, which is non-negative. Take an $M \times M$ image as an example; each pixel in the image has $k$ neighbor nodes $N_k$. The element $a_{i,j}$ of the adjacency matrix $A$ is defined as:

$$a_{i,j} = \begin{cases} w(i,j), & \text{if } j \in N_k(i) \\ 0, & \text{otherwise} \end{cases} \quad (1)$$

where $1 \le i, j \le M^2$. For a weighted graph $G$, the degree of each vertex $i$, denoted $d(i)$, is equal to the sum of the weights of all edges incident to vertex $i$:

$$d(i) = \sum_{j=1}^{M^2} a_{i,j} \quad (2)$$

The degree matrix $D$ is defined as:

$$D_{ij} = \begin{cases} d(i), & \text{if } i = j \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

Then the Laplace operator $L$ of the graph $G$ is defined as:

$$L = D - A \quad (4)$$

where $L$ is a real symmetric matrix whose eigenvalues $\lambda_l$ $(l = 0, 1, \dots, M^2 - 1)$ are all non-negative, with mutually orthogonal eigenvectors $\chi_l$. Sorting the eigenvalues as $0 = \lambda_0 < \lambda_1 \le \dots \le \lambda_{M^2-1}$, we have:

$$L \chi_l = \lambda_l \chi_l \quad (5)$$
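As a concrete illustration of Equations (1)–(5), the following minimal sketch (ours, not the authors' code) builds the adjacency matrix, degree matrix, and Laplacian of a small image graph with Nk = 4 spatial neighbors; the Gaussian similarity weight w(i, j) is an assumed choice, since the paper does not specify the weight function here.

```python
import numpy as np

def image_graph_laplacian(img, sigma=0.1):
    """Build L = D - A for an M x M image, linking each pixel to its
    4 spatial neighbours (Nk = 4); w(i, j) >= 0 is a similarity weight."""
    M = img.shape[0]
    n = M * M
    A = np.zeros((n, n))
    for r in range(M):
        for c in range(M):
            i = r * M + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < M and 0 <= cc < M:
                    j = rr * M + cc
                    # assumed Gaussian weight on the pixel-value difference
                    A[i, j] = np.exp(-(img[r, c] - img[rr, cc]) ** 2
                                     / (2 * sigma ** 2))
    D = np.diag(A.sum(axis=1))   # degree matrix of Equation (3)
    return D - A                 # Laplacian of Equation (4)

L = image_graph_laplacian(np.random.rand(8, 8))
lam, chi = np.linalg.eigh(L)     # lam[0] is ~0; all eigenvalues non-negative
```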

2.2. Graph Fourier Transform

The Fourier transform of a signal f is shown in Equation (6):

$$\hat{f}(\omega) = \int_{-\infty}^{\infty} f(x)\, e^{-j\omega x}\, dx \quad (6)$$

The inverse Fourier transform is given by:

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{j\omega x}\, d\omega \quad (7)$$

where $e^{j\omega x}$ is the complex exponential eigenfunction. The Spectral Graph Fourier Transform (SGFT) is obtained by replacing the eigenfunctions $e^{j\omega x}$ with the graph eigenvectors $\chi_l$; i.e., the SGFT of a function f of length L is:

$$\hat{f}(l) = \langle f, \chi_l \rangle = \sum_{n=0}^{L-1} f(n)\, \chi_l^*(n) \quad (8)$$

where $0 \le l, n \le L - 1$. The inverse SGFT is given by:

$$f(n) = \sum_{l=0}^{L-1} \hat{f}(l)\, \chi_l(n) \quad (9)$$
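Because the eigenvectors $\chi_l$ are orthonormal, the forward and inverse SGFT of Equations (8) and (9) reduce to two matrix products. A short sketch, reusing the Laplacian eigenpairs (lam, chi) from the example in Section 2.1:

```python
import numpy as np

f = np.random.rand(chi.shape[0])   # a graph signal: one value per vertex
f_hat = chi.T @ f                  # forward SGFT: f_hat(l) = <f, chi_l>
f_rec = chi @ f_hat                # inverse SGFT: sum_l f_hat(l) chi_l(n)
assert np.allclose(f, f_rec)       # exact, by orthonormality of chi
```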

2.3. Spectral Graph Wavelet Transform

The spectral wavelet kernel function g is analogous to the wavelet kernel function in the Fourier domain. Generally, g can be regarded as a band-pass filter satisfying $g(0) = 0$ and $\lim_{x \to \infty} g(x) = 0$. Each Fourier mode of a given function f can be modulated by the wavelet operator $T_g = g(L)$:

$$\widehat{T_g f}(l) = g(\lambda_l)\, \hat{f}(l) \quad (10)$$

Then the inverse Fourier transform is applied to Equation (10):

$$(T_g f)(m) = \sum_{l=0}^{M^2-1} g(\lambda_l)\, \hat{f}(l)\, \chi_l(m) \quad (11)$$

The wavelet operator at scale s is defined as $T_g^s = g(sL)$, where $T_g^s$ corresponds to the wavelet operator $\psi(s, \omega)$ of the classical wavelet transform. The spectral graph wavelet is then obtained as shown in Equation (12):

$$\psi_{s,n}(m) = \sum_{l=0}^{L-1} g(s\lambda_l)\, \chi_l^*(n)\, \chi_l(m) \quad (12)$$

Formally, the wavelet coefficients of a given function f are obtained by taking the inner product with the wavelets:

$$W_f(s, n) = \langle \psi_{s,n}, f \rangle \quad (13)$$

Then the SGWT of a graph function $f \in \mathbb{R}^L$ at vertex n and scale s is shown in Equation (14):

$$W_f(s, n) = (T_g^s f)(n) = \sum_{l=0}^{L-1} g(s\lambda_l)\, \hat{f}(l)\, \chi_l(n) \quad (14)$$
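Equation (14) can likewise be evaluated directly from the Laplacian eigenpairs. In the sketch below, the band-pass kernel g(x) = x·exp(−x) is an illustrative assumption (it satisfies g(0) = 0 and g(x) → 0), since the paper does not fix a specific g in this section; lam and chi are the eigenpairs from the Section 2.1 example.

```python
import numpy as np

def g(x):
    return x * np.exp(-x)             # assumed band-pass kernel: g(0) = 0

def sgwt(f, lam, chi, s):
    """Wavelet coefficients W_f(s, n) of Equation (14) for one scale s."""
    f_hat = chi.T @ f                 # SGFT of f
    return chi @ (g(s * lam) * f_hat)

f = np.random.rand(len(lam))
coeffs = [sgwt(f, lam, chi, s) for s in (1.0, 2.0, 4.0)]   # three scales
```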

3. Materials and Methods

3.1. Materials

In the later experiments, three public hyperspectral datasets were selected, namely, the Indian Pines data, the Salinas-small data, and the Pavia University data.

3.1.1. Indian Pines Data

The Indian Pines data are an image of a farm area in Indiana, USA, taken by the AVIRIS imager; this was the first dataset used for HSI classification tasks. The wavelength range of the AVIRIS spectral imager is 0.4 to 2.5 μm, with a spatial resolution of about 20 m. A total of 220 spectral bands are collected by the sensors. Since 20 bands cover water absorption regions, the remaining 200 bands were used for this study. The image size is 145 × 145, and there are 16 classes of ground objects in the image. The numbers of sample points per class and the numbers in the training and testing sets are shown in Table 1. There is a serious class imbalance in the Indian Pines data: there are only 46, 28, and 20 samples in classes 1, 7, and 9, respectively, and the ratio of the sample numbers between classes 9 and 11 is less than 1:100. Therefore, 25% of the samples were taken as training samples for classes 1, 7, and 9 in all of the experiments described in Section 3.2 and Section 4.
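For reference, the per-class sampling rule just described (25% of classes 1, 7, and 9; a fixed ratio, e.g., 10%, elsewhere) can be implemented as below; this is a hypothetical helper written for illustration, not the authors' code.

```python
import numpy as np

def split_train_indices(labels, small_classes=(1, 7, 9),
                        small_ratio=0.25, ratio=0.10, seed=0):
    """Return training indices over a flattened label map (0 = background)."""
    rng = np.random.default_rng(seed)
    train = []
    for c in np.unique(labels[labels > 0]):
        idx = rng.permutation(np.flatnonzero(labels == c))
        r = small_ratio if c in small_classes else ratio
        train.extend(idx[: max(1, round(r * len(idx)))])
    return np.asarray(train)
```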

3.1.2. Salinas Data

The Salinas data are an image (512 × 217) of the Salinas Valley, CA, USA, captured by the AVIRIS sensor, which contains 16 classes of ground objects with 224 bands. Since the computational complexity of using all samples is high, 9 classes of ground objects were taken to verify the effectiveness of the proposed method (referred to as the Salinas-small data). The spatial resolution of the Salinas data reaches 3.7 m. Since bands 108–112, 154–167, and 224 cover water absorption regions, the remaining 204 bands were used for this study. The sample numbers of the classes are relatively balanced.

3.1.3. Pavia University Data

The Pavia University data are a top-view image (610 × 340) of the University of Pavia, Italy, acquired by the ROSIS-03 sensor. The wavelength range of the sensor is 0.43–0.86 μm, and the spatial resolution of the data is 1.3 m. A total of 103 spectral bands were used in this paper; the other 12 bands were removed since they are heavily influenced by noise. There are 207,400 pixels in the image; among them, only 42,776 pixels include ground object information, and the remainder are background pixels.

3.2. Methods

3.2.1. Overview of the Classification Approach

The overall flowchart of the proposed method is shown in Figure 1. Two main stages are included, i.e., DANCE and DRCNN classification. The HSI is first processed by DANCE to obtain an image with enhanced spatial consistency; the size of the HSI is not changed in this stage. Then, the preprocessed image is sent to the DRCNN to obtain the classification result. The detailed processes are introduced in Section 3.2.2 and Section 3.2.3.

3.2.2. Data-wise spAtial regioNal Consistency re-Enhancement (DANCE)

The proposed DANCE method includes five stages: blocking, SGW frequency decomposition, filtering, SGW reconstruction, and splicing, as shown in Figure 1. First, the HSI is divided into sub-image blocks. Then, each sub-image block is decomposed into its spectral graph features using the SGW. Thirdly, a Gaussian filter is used to remove each block's noise and smooth its edges. Fourthly, the inverse SGW is used to reconstruct the filtered sub-image block. Finally, all sub-image blocks are spliced into a preprocessed HSI. A minimal sketch of the blocking and splicing stages is given below; the SGW decomposition, filtering, and SGW reconstruction operations are then described in detail.
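The sketch assumes, as in the experiments of Section 5.1, that the block size divides the image size exactly (e.g., 29 × 29 blocks of a 145 × 145 HSI); the function names are ours.

```python
import numpy as np

def to_blocks(hsi, b):
    """Stage 1: split an (H, W, B) cube into non-overlapping (b, b, B) blocks."""
    H, W = hsi.shape[:2]
    return [hsi[r:r + b, c:c + b] for r in range(0, H, b)
                                  for c in range(0, W, b)]

def splice(blocks, H, W, b):
    """Stage 5: reassemble the processed blocks into the full cube."""
    out = np.empty((H, W) + blocks[0].shape[2:])
    k = 0
    for r in range(0, H, b):
        for c in range(0, W, b):
            out[r:r + b, c:c + b] = blocks[k]
            k += 1
    return out
```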
(1) SGW Decomposition.
First, a weight map is calculated according to the distance between the pixels and their neighbor nodes, and denoted as an adjacency matrix. The neighborhood nodes are selected based on the minimum distance principle. Taking Nk = 4 as an example, the 4 nearest nodes are selected as in Figure 2.
The SGWT with scale s is carried out on the weight map, and the decomposed images are obtained in four frequency bands, which are a low-frequency component LF and three high-frequency components, HF1, HF2, and HF3, as shown in Figure 1.
(2) Gaussian Filtering.
Gaussian filtering is applied to the image components obtained in step (1). The parameters of the Gaussian filter are determined by the experiments, which are described in Section 4.1. Taking the image block of Pavia University as an example, the Gaussian filtering results of four components, GLF, GHF1, GHF2, and GHF3, are shown in Figure 3 for d = 5, σ = 0.5. The low-frequency component mainly includes the coarse information of the image according to Figure 3a. Furthermore, the edge information of the image is mainly in the high-frequency components (Figure 3b–d). It is clear that the image is smoothed and the noise is removed to a certain extent, which makes the processed image more consistent with the true spatial domain distribution.
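As a sketch of this filtering stage, each of the four SGW components can be smoothed with the d = 5, σ = 0.5 Gaussian selected in Section 4.1; OpenCV is used here purely for illustration.

```python
import cv2

def filter_components(components, d=5, sigma=0.5):
    """Apply the same Gaussian filter to the LF, HF1, HF2 and HF3 maps."""
    return [cv2.GaussianBlur(comp, (d, d), sigma) for comp in components]
```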
(3) SGW Reconstruction.
The filtered components in the different frequency bands are reconstructed by the inverse SGW to obtain the preprocessed hyperspectral feature map, as shown in Figure 4. The image reconstructed from the four frequency band images after Gaussian filtering is shown in Figure 4a, and Figure 4b shows the un-preprocessed image. For clear observation, two areas in the two images are marked with red rectangles and enlarged in Figure 4. It can be seen that preprocessing with the DANCE method improves the spatial consistency in both flat regions and at edges.
To quantitatively evaluate the spatial consistency enhancement of DANCE, regions of size 5 × 5, 10 × 10, and 15 × 15 were each randomly selected 10 times in the processed HSI and in the original HSI. The average Euclidean distances between all pixels in these regions were calculated, as shown in Table 2. It can be seen that the average Euclidean distance between pixels after DANCE is reduced regardless of the region size, which further verifies the enhancement of spatial consistency by DANCE.
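Our reading of this metric, sketched below: average the Euclidean distances between the spectra of all pixel pairs inside each randomly placed window, then average over the 10 trials (an assumed implementation, since the paper gives only the definition).

```python
import numpy as np
from itertools import combinations

def mean_pairwise_distance(hsi, size, trials=10, seed=0):
    """Mean spectral Euclidean distance over random size x size windows."""
    rng = np.random.default_rng(seed)
    H, W = hsi.shape[:2]
    means = []
    for _ in range(trials):
        r = rng.integers(0, H - size + 1)
        c = rng.integers(0, W - size + 1)
        pix = hsi[r:r + size, c:c + size].reshape(-1, hsi.shape[2])
        means.append(np.mean([np.linalg.norm(a - b)
                              for a, b in combinations(pix, 2)]))
    return float(np.mean(means))
```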

3.2.3. Construction of the DRCNN

The remote sensing image classification performance of the DRCNN [21] is effectively improved by fully considering feature-wise spatial consistency. Since the proposed DANCE provides additional spatial consistency information from the data-wise side, the preprocessed remote sensing image can be taken as the preliminary features, which are sent to the DRCNN to classify the ground objects. The structure of the DRCNN was adjusted through experiments based on the preprocessed data. The preprocessed image is divided into K × K sub-block images, where K is generally odd. A sub-block image is taken as a Global Region, and its left, right, top, bottom, and central regions are extracted as new features, which are trained using different networks. For convenience, the six selected regions are named GR (Global Region), RR (Right Region), LR (Left Region), TR (Top Region), BR (Bottom Region), and CR (Central Region). The sizes of five of the regions, with the exception of the CR, are determined by the window radius r: the size of the GR is (2r + 1) × (2r + 1), the size of the LR and RR is (2r + 1) × (r + 2), and the size of the TR and BR is (r + 2) × (2r + 1). Two series of experiments were performed to determine the sizes of the GR, RR, LR, TR, BR, and CR. Schematic examples of feature extraction with r = 3, 4, 5, 6 are shown in Figure 5; each region can be taken as a GR, and its TR, BR, LR, and RR are shown with rectangles of different colors. A slicing sketch of the six regions is given below.
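The sketch uses r = 6 and a 3 × 3 CR (the sizes of Table 3); the slicing below is one plausible placement of the side regions consistent with those sizes, not necessarily the authors' exact indexing.

```python
import numpy as np

def diverse_regions(patch, r=6, cr=3):
    """Split a (2r+1) x (2r+1) x B patch into the GR/LR/RR/TR/BR/CR regions."""
    n = 2 * r + 1
    assert patch.shape[:2] == (n, n)
    m, h = r, cr // 2                       # centre index and CR half-width
    return {
        "GR": patch,                        # (2r+1) x (2r+1)
        "LR": patch[:, :r + 2],             # (2r+1) x (r+2)
        "RR": patch[:, n - (r + 2):],       # (2r+1) x (r+2)
        "TR": patch[:r + 2, :],             # (r+2) x (2r+1)
        "BR": patch[n - (r + 2):, :],       # (r+2) x (2r+1)
        "CR": patch[m - h:m + h + 1, m - h:m + h + 1],   # 3 x 3 centre
    }
```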
(1) The selection of r and the proportion of the training set.
In the cross-validation experiments based on the Indian Pines data, Salinas-small data, and Pavia University data, r was set to 3, 4, 5, and 6, and the size of the CR was set to 3 × 3. The proportion of training samples was taken as 3%, 5%, 10%, 13%, and 15%. The results of the experiments are shown in Figure 6, where the different colored lines represent the classification Overall Accuracies (OAs) for the different r values.
As can be seen from Figure 6, the best OA is achieved with r = 6 on the Indian Pines and Salinas-small data, while the OA is best with r = 5 on the Pavia University data. The OA is clearly good when using 15% of the samples for training on all of these data. Based on the above analysis and the time complexity, 10% of the data were set as the training set, and the radius of the sub-block image was taken as 6. The resulting sizes of the five windows are shown in Table 3.
(2) The selection of the CR size.
A share of 10% of the samples of the Indian Pines data was selected as the training set, with r = 6. The size of the CR was set to 1 × 1, 3 × 3, and 5 × 5, and the OAs were obtained using all features of the six regions, as shown in Figure 7. It can be seen that the classification accuracy is highest when the size of the CR is 3 × 3; therefore, 3 × 3 was selected as the size of the CR.

4. Experiment Results and Analyses

The parameter selection experiment for the Gaussian filter was designed first to achieve the optimal performance of the proposed method. Then, multiple experiments were performed to demonstrate the effectiveness of DANCE and DANCE-DRCNN. The experimental environment was MATLAB (R2019a) and Python running on a workstation with a GeForce RTX 2080 Ti GPU.

4.1. Parameters Selection of Gaussian Filter

The parameters of the Gaussian filter applied to the four components extracted by SGWT are important parameters of the DANCE method. A (2k + 1) × (2k + 1) discrete Gaussian convolution kernel H is defined as:

$$H_{i,j} = \frac{1}{2\pi\sigma^2}\, e^{-\frac{(i-k-1)^2 + (j-k-1)^2}{2\sigma^2}} \quad (15)$$

where $1 \le i, j \le 2k + 1$ and σ is the standard deviation. Thus, the two parameters affecting the effectiveness of the Gaussian filter are the size of the Gaussian convolution kernel, d = 2k + 1, and σ. To determine the optimal values of these two parameters, d was taken as 3, 5, and 7, and σ was set as 0.3, 0.5, and 0.7, respectively. Taking the Indian Pines data processed by SGWT as an example (25% of the samples were taken as training samples for classes 1, 7, and 9, and 10% of the samples of the other classes were taken as the training set), the DRCNN constructed in Section 3.2.3 was used for training and classification. The classification results with the various parameters are shown in Table 4. According to Table 4, d was set to 5 and σ to 0.5 in this study.
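Equation (15) can be checked numerically as follows; the final normalization so that the weights sum to one is standard filtering practice and is our addition, not part of the equation.

```python
import numpy as np

def gaussian_kernel(k=2, sigma=0.5):
    """Build the (2k+1) x (2k+1) kernel H of Equation (15); d = 2k + 1 = 5."""
    i, j = np.mgrid[1:2 * k + 2, 1:2 * k + 2]     # 1 <= i, j <= 2k + 1
    H = np.exp(-((i - k - 1) ** 2 + (j - k - 1) ** 2) / (2 * sigma ** 2))
    H /= 2 * np.pi * sigma ** 2
    return H / H.sum()                            # assumed normalization

print(gaussian_kernel().shape)                    # (5, 5)
```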

4.2. Results and Comparisons

4.2.1. Comparisons with DRCNN and Baselines

To further demonstrate the effectiveness of the proposed method, comparative experiments between DANCE-DRCNN, the DRCNN, and the baselines were designed. The training set proportion was set to 10%. Seven state-of-the-art approaches were selected. SVMMRF [25] represents the traditional methods; it combines an SVM classifier using a Radial Basis Function (RBF) kernel with a Markov Random Field model. As an improvement on ResNet, A2S2K-ResNet [26] was also chosen for comparison. The HSID-CNN method [27] was selected since it fuses multi-scale features to remove noise. R-PCA-CNN [19] and Gabor-CNN [28] are both based on combining classical preprocessing methods with a CNN. SSRN [29] and 3D-CNN [20] were compared with the proposed method since their classification results are among the best at present. The classification maps for the three datasets are shown in Figure 8, Figure 9 and Figure 10.
In the three sets of maps (Figure 8, Figure 9 and Figure 10), panel (a) shows the ground truth, and panels (b–f) show the classification maps obtained using SVMMRF [25], R-PCA-CNN [19], 3D-CNN [20], DRCNN [21], and the proposed method, respectively. For clearer observation, some of the framed image edges in the classification maps are enlarged in panels (g) and (h), which correspond to panels (b–f) in order from top to bottom. Illegible misclassification points are labeled with dashed circles. Firstly, compared with the SVMMRF, R-PCA-CNN, and 3D-CNN methods, the proposed method can effectively remove the influence of Gaussian noise and enhance the continuity of the classification maps. In addition, the misclassification rate of the edge points is greatly reduced. For example, the edge misclassification of the Corn-mintill class in the Indian Pines data (the red area in Figure 8), the Grapes-untrained class in the Salinas-small data (the blackish-green area in Figure 9), and the Gravel class in the Pavia University data (the red area in Figure 10) is significantly corrected. Secondly, compared with the DRCNN, the misclassification rate at ground object edges is reduced obviously when using the proposed DANCE approach.
The comparisons of the classification performance between the proposed method, the DRCNN, and the baselines on the three datasets are shown in Table 5, Table 6 and Table 7, which list the accuracy of each class, the OA, and the AA. It can be seen that the proposed method achieves the highest accuracy on all three datasets. Compared with the traditional SVM-based classification method, the OA using DANCE-DRCNN is improved by about 20% on the Indian Pines data, and by about 8% on the Salinas-small and Pavia University data. At the same time, compared with the other CNN methods, the proposed method improves the OA to some extent. Moreover, several classes with low accuracy under the DRCNN are improved by the proposed method.
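For completeness, the OA and AA reported in these tables follow the standard definitions, sketched below (not code from the paper): OA is the fraction of correctly classified test pixels, and AA is the mean of the per-class accuracies.

```python
import numpy as np

def oa_aa(y_true, y_pred):
    """Overall Accuracy and Average Accuracy from label vectors."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return np.mean(y_pred == y_true), float(np.mean(per_class))
```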

4.2.2. Ablation Experiments

According to the strategy described in Section 3, the HSI was first decomposed into four components, Gaussian filtering was then applied, and the image was finally reconstructed. To verify the effectiveness of the proposed approach, the HSI processed only by GF (GF-HSI), each of the four components after GF (GLF, GHF1, GHF2, and GHF3), and the HSI after DANCE were used as the inputs of the DRCNN, respectively. The experimental results with the different inputs are shown in Table 8, which lists the accuracy of each class, the OA, and the Average Accuracy (AA).
Firstly, it can be seen that the classification accuracy when using only GF is lower than that of the proposed combination. This may be because the high-frequency and low-frequency features of the HSI are filtered together when the global HSI is filtered using the GF. Secondly, the classification accuracy using GLF is higher than that using the three other components (GHF1, GHF2, and GHF3) because it represents most of the image information; however, it is still lower than that of the DANCE method, since some edge information is lacking in the GLF. Finally, since GHF1, GHF2, and GHF3 only represent the high-frequency components of the image (edge information), the results using GHF1, GHF2, and GHF3 decrease in that order. In particular, the edge information is not vital for the classification of some large ground objects. In summary, the proposed DANCE method fusing the SGWT and GF achieves higher classification accuracy than any of the approaches using only one of these methods.

4.2.3. The Improvement with DANCE in Other Methods

To demonstrate the effectiveness of the proposed method for other feature-wise approaches, the data were first preprocessed using the DANCE method. Based on the preprocessed data, all comparison methods used in Section 4.2.1 were applied to classify the ground objects, as with the proposed method. The experimental results are shown in Table 9. As can be seen from Table 9, the DANCE method not only improves the classification accuracy of the DRCNN, but also that of the other methods. This further verifies that the proposed DANCE method is an effective solution for HSI classification.

5. Discussion

5.1. The Selection of the Input Size in DANCE

Before the use of DANCE, an HSI is divided into many blocks of the same size; then, the undirected graphs of the blocks are obtained by spectral graph theory. The size of the blocks therefore determines how many pixels are used simultaneously for spatial consistency, which can affect the performance of DANCE. Based on this analysis, selection experiments for the sub-block size in DANCE were designed. To generate a node graph for every HSI block, the number of blocks must be an integer, i.e., the block size must divide the image size exactly. Thus, the input size for the Indian Pines data was set to 145 × 145, 29 × 29, and 5 × 5. The image after the use of DANCE is sent to the DRCNN for classification, and the results are shown in Table 10. Smaller blocks mean more iterations, which in turn affects the running time; therefore, the running times with the different image block sizes are also shown in Table 10.
It can be seen that the best classification results are obtained when the size of the HSI block is set to 29 × 29. Compared with the size of 145 × 145, the smaller image block only contains the spatial consistency information of its neighborhoods, which works better than computing all pixel points together. In addition, when the size is set to 5 × 5, the larger number of iterations leads to huge computational complexity. Therefore, the input size in DANCE was set to 29 × 29 for the Indian Pines data.
In summary, the selection of the image block size needs to consider both the classification performance and the running time. According to the above experimental results, good performance can be achieved when an intermediate block size is selected from all possible sizes. Therefore, the sub-block sizes were set to 128 × 37 and 122 × 85 for the Salinas-small data and the Pavia University data, respectively. However, for different data, further study of the evaluation criteria for the optimum sub-block size is still necessary.

5.2. The Computation Cost of DANCE

To evaluate the computational cost of DANCE, the Indian Pines data were taken as an example. The HSI was first divided into blocks of size 29 × 29 and passed through DANCE ten times. The computational costs are shown in Table 11, which includes the averages and variances of the disk usage, CPU usage, and running time.
It can be seen that the proposed DANCE does not greatly increase the burden of image processing. However, the running time is still considerable, and thus needs to be optimized in future research.

6. Conclusions

Motivated by the DRCNN method using feature-wise spatial regional consistency, a method named Data-wise spAtial regioNal Consistency re-Enhancement (DANCE) is proposed, which fully considers the relationships between pixels and combines the SGWT with Gaussian filtering. The DRCNN is then used to realize the HSI classification. Experimental results show that the proposed DANCE method can effectively enhance the spatial regional consistency of images in a data-wise manner. As seen in Section 4.2.1, the proposed method performs better than the other baselines and the DRCNN. Firstly, compared with the other baselines, the proposed method makes full use of the spatial consistency of both the data-wise and feature-wise approaches; for both the middle and edge areas of ground objects, the misclassification points are evidently reduced. Secondly, compared with the DRCNN, DANCE improves the quality of HSIs by enhancing data-wise spatial consistency and removing Gaussian noise; in particular, the accuracy of edge points is improved in the classification maps. The disadvantage is that DANCE increases the computational cost compared with the DRCNN alone.
Several issues remain for future work. Firstly, the results of the six regions in the DRCNN are fused by a concatenation strategy; therefore, the central region does not play the role of re-correcting misclassified points. This issue should be given further attention in future work. In addition, the proposed method does not consider the spectral correlations between different bands, which leads to redundancy, longer training times, and larger storage requirements. These two issues were not addressed in this study and will be investigated in future research.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; validation, E.X.; formal analysis, S.H.; investigation, L.Z.; resources, Y.Y.; data curation, E.X.; writing—original draft preparation, E.X.; writing—review and editing, L.Z.; visualization, E.X.; supervision, K.Z.; project administration, L.Z.; funding acquisition, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 62171247 and 41921781.

Data Availability Statement

Acknowledgments

The authors would like to thank the editors and the reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yan, J.; Chen, H.; Liu, L. Overview of hyperspectral image classification. J. Sens. 2020, 2020, 4817234.
  2. Yang, F.; Chen, X.; Chai, L. Hyperspectral image destriping and denoising using stripe and spectral low-rank matrix recovery and global spatial-spectral total variation. Remote Sens. 2021, 13, 827.
  3. Saboori, A.; Ghassemian, H. Adversarial discriminative active deep learning for domain adaptation in hyperspectral images classification. Int. J. Remote Sens. 2021, 42, 3981–4003.
  4. Zhang, S.; Sun, B.; Li, S.; Kang, X. Noise estimation of hyperspectral image in the spatial and spectral dimensions. Natl. Remote Sens. Bull. 2021, 25, 1108–1123.
  5. Mohan, A.; Sapiro, G.; Bosch, E. Spatially coherent nonlinear dimensionality reduction and segmentation of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2007, 4, 206–210.
  6. Li, J.; Zhao, C.H.; Mei, F. Detecting hyperspectral anomaly by using background residual error data. J. Infrared Millim. Waves 2010, 29, 150–155.
  7. Imani, M.; Ghassemian, H. An overview on spectral and spatial information fusion for hyperspectral image classification: Current trends and challenges. Inf. Fusion 2020, 59, 59–83.
  8. Rasti, B.; Scheunders, P.; Ghamisi, P.; Licciardi, G.; Chanussot, J. Noise reduction in hyperspectral imagery: Overview and application. Remote Sens. 2018, 10, 482.
  9. Buades, A.; Coll, B.; Morel, J.M. On image denoising methods. A new nonlocal principle. SIAM Rev. 2010, 4, 490–530.
  10. Rand, R.S.; Keenan, D.M. A spectral mixture process conditioned by Gibbs-based partitioning. IEEE Trans. Geosci. Remote Sens. 2001, 39, 1421–1434.
  11. Yue, J.; Zhang, Y.; Xu, H.; Bai, L. An unsupervised classification of hyperspectral images based on pixels reduction with spatial coherence property. Spectrosc. Spectr. Anal. 2012, 32, 1860–1864.
  12. Hammond, D.K.; Vandergheynst, P.; Gribonval, R. Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal. 2011, 30, 129–150.
  13. Zikiou, N.; Lahdir, M.; Helbert, D. Hyperspectral image classification using graph-based wavelet transform. Int. J. Remote Sens. 2020, 41, 2624–2643.
  14. Zhang, M.; Li, Q. MS-GWNN: Multi-scale graph wavelet neural network for breast cancer diagnosis. arXiv 2020, arXiv:2012.14619.
  15. Dong, X.; Li, G.; Jia, Y.; Xu, K. Multiscale feature extraction from the perspective of graph for hob fault diagnosis using spectral graph wavelet transform combined with improved random forest. Measurement 2021, 176, 109178.
  16. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
  17. Quan, Y.; Dong, S.; Feng, W.; Dauphin, G.; Zhao, G.; Wang, Y.; Xing, M. Spectral-spatial feature extraction based CNN for hyperspectral image classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Waikoloa Village, HI, USA, 26 September–2 October 2020; pp. 485–488.
  18. Feng, Y.; Zheng, J.; Qin, M.; Bai, C.; Zhang, J. 3D octave and 2D vanilla mixed convolutional neural network for hyperspectral image classification with limited samples. Remote Sens. 2021, 13, 4407.
  19. Makantasis, K.; Karantzalos, K.; Doulamis, A.; Doulamis, N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4959–4962.
  20. Ahmad, M.; Khan, A.M.; Mazzara, M.; Distefano, S.; Ali, M.; Sarfraz, M.S. A fast and compact 3-D CNN for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  21. Zhang, M.; Li, W.; Du, Q. Diverse region-based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634.
  22. Zhang, Y.; He, J. Bilateral texture filtering for spectral-spatial hyperspectral image classification. J. Eng. 2019, 2019, 9173–9177.
  23. Gupta, V.; Sastry, S.; Mitra, S.K. Hyperspectral image classification using trilateral filter and deep learning. In Proceedings of the IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), Gunupur, Odisha, India, 16–17 December 2020.
  24. Landgrebe, D.; Malaret, E. Noise in remote-sensing systems: The effect on classification error. IEEE Trans. Geosci. Remote Sens. 1986, GE-24, 294–300.
  25. Tarabalka, Y.; Fauvel, M.; Chanussot, J.; Benediktsson, J.A. SVM- and MRF-based method for accurate classification of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2010, 7, 736–740.
  26. Roy, S.K.; Manna, S.; Song, T.; Bruzzone, L. Attention-based adaptive spectral-spatial kernel ResNet for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7831–7843.
  27. Yuan, Q.; Zhang, Q.; Li, J.; Shen, H.; Zhang, L. Hyperspectral image denoising employing a spatial-spectral deep residual convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1205–1218.
  28. Kang, X.; Li, C.; Li, S.; Lin, H. Classification of hyperspectral images by Gabor filtering based deep network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 11, 1166–1178.
  29. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858.
Figure 1. The overall flowchart of the proposed method.
Figure 2. The selection of neighborhood nodes.
Figure 3. Four components' images before and after application of the Gaussian filter on a Pavia University image block: (a) LF to GLF; (b) HF1 to GHF1; (c) HF2 to GHF2; (d) HF3 to GHF3.
Figure 4. Comparison of results with and without DANCE: (a) reconstructed image (with GF); (b) the un-preprocessed image (without GF).
Figure 5. Four kinds of feature extraction windows with different r values.
Figure 6. The overall accuracies for different r values and different training set ratios: (a) Indian Pines data; (b) Salinas-small data; (c) Pavia University data.
Figure 7. Overall classification accuracy for different CR sizes.
Figure 8. Classification maps from the proposed DANCE-DRCNN method and the baselines on the Indian Pines data: (a) ground truth; (b) SVMMRF; (c) R-PCA-CNN; (d) 3D-CNN; (e) DRCNN; (f) DANCE-DRCNN; (g,h) enlarged edge images.
Figure 9. Classification maps from the proposed DANCE-DRCNN method and the baselines on the Salinas-small data: (a) ground truth; (b) SVMMRF; (c) R-PCA-CNN; (d) 3D-CNN; (e) DRCNN; (f) DANCE-DRCNN; (g) enlarged edge image.
Figure 10. Classification maps from the proposed DANCE-DRCNN method and the baselines on the Pavia University data: (a) ground truth; (b) SVMMRF; (c) R-PCA-CNN; (d) 3D-CNN; (e) DRCNN; (f) DANCE-DRCNN; (g,h) enlarged edge images.
Table 1. The numbers of total, training, and testing samples for the Indian Pines data.

# | Class | Total | Training | Testing
1 | Alfalfa | 46 | 12 | 34
2 | Corn-notill | 1428 | 143 | 1285
3 | Corn-mintill | 830 | 83 | 747
4 | Corn | 237 | 24 | 213
5 | Grass-pasture | 483 | 48 | 435
6 | Grass-trees | 730 | 73 | 657
7 | Grass-pasture-mowed | 28 | 7 | 21
8 | Hay-windrowed | 478 | 48 | 430
9 | Oats | 20 | 5 | 15
10 | Soybean-notill | 1072 | 97 | 875
11 | Soybean-mintill | 2455 | 246 | 2209
12 | Soybean-clean | 593 | 59 | 534
13 | Wheat | 205 | 21 | 184
14 | Woods | 1265 | 127 | 1138
15 | Building-grass-trees-drives | 386 | 39 | 347
16 | Stone-steel-towers | 93 | 9 | 84
– | Total | 10,349 | 1041 | 9208
Table 2. The average Euclidean distances with and without DANCE on the Indian Pines data.

Region size | 5 × 5 | 10 × 10 | 15 × 15
without DANCE | 1.0775 × 10^−3 | 3.1700 × 10^−3 | 8.8460 × 10^−3
with DANCE | 1.0571 × 10^−3 | 1.4450 × 10^−4 | 8.0556 × 10^−3
Table 3. The sizes of the windows of the DRCNN.

Region | GR | RR | LR | TR | BR
Size | 13 × 13 | 13 × 8 | 13 × 8 | 8 × 13 | 8 × 13
Table 4. Classification accuracy (%) for different parameters of the GF.

σ \ d | 3 | 5 | 7
0.3 | 96.68 | 96.64 | 96.95
0.5 | 98.68 | 98.82 | 97.69
0.7 | 97.88 | 96.23 | 97.36
Table 5. Comparison of the classification accuracy (%) among the proposed method, DRCNN, and the baselines on the Indian Pines data.

Class | SVMMRF | HSID-CNN | A2S2K-ResNet | R-PCA-CNN | Gabor-CNN | SSRN | 3D-CNN | DRCNN | DANCE-DRCNN
1 | 44.62 | 100.00 | 98.37 | 100.00 | 95.62 | 52.73 | 100.00 | 71.74 | 100.00
2 | 68.51 | 88.47 | 90.51 | 92.20 | 89.20 | 98.28 | 94.27 | 96.18 | 98.90
3 | 70.36 | 88.71 | 98.55 | 97.62 | 85.65 | 97.23 | 99.03 | 90.99 | 96.93
4 | 57.07 | 92.23 | 91.12 | 100.00 | 90.31 | 96.71 | 95.51 | 87.63 | 98.04
5 | 92.20 | 99.16 | 94.67 | 100.00 | 88.92 | 97.56 | 95.94 | 100.00 | 99.27
6 | 86.70 | 88.76 | 97.71 | 98.65 | 93.17 | 98.62 | 98.64 | 100.00 | 99.20
7 | 85.71 | 100.00 | 96.41 | 100.00 | 96.88 | 98.60 | 100.00 | 96.18 | 100.00
8 | 97.05 | 91.58 | 98.72 | 100.00 | 90.67 | 95.42 | 100.00 | 96.75 | 100.00
9 | 36.36 | 33.33 | 99.46 | 96.23 | 96.35 | 98.62 | 100.00 | 100.00 | 100.00
10 | 69.76 | 90.87 | 95.38 | 95.79 | 89.87 | 96.90 | 97.66 | 95.14 | 99.38
11 | 75.48 | 84.64 | 96.46 | 95.93 | 94.78 | 98.02 | 97.77 | 97.89 | 98.29
12 | 81.38 | 92.14 | 92.28 | 96.72 | 98.26 | 96.91 | 91.53 | 93.22 | 99.21
13 | 93.16 | 99.42 | 95.66 | 100.00 | 95.46 | 99.59 | 100.00 | 100.00 | 100.00
14 | 92.18 | 91.90 | 89.76 | 93.86 | 90.18 | 98.91 | 99.80 | 97.32 | 99.35
15 | 76.11 | 88.19 | 95.67 | 97.50 | 89.63 | 97.66 | 96.69 | 89.69 | 93.84
16 | 95.95 | 96.01 | 96.69 | 100.00 | 97.30 | 98.67 | 97.37 | 88.91 | 98.73
OA | 78.54 | 88.80 | 97.31 | 96.97 | 95.75 | 97.71 | 97.27 | 96.15 | 98.61
AA | 73.37 | 89.09 | 95.46 | 97.78 | 92.82 | 95.03 | 95.88 | 93.85 | 98.82
Table 6. Comparison of the classification accuracy (%) among the proposed method, DRCNN, and the baselines on the Salinas-small data.

Class | SVMMRF | HSID-CNN | A2S2K-ResNet | R-PCA-CNN | Gabor-CNN | SSRN | 3D-CNN | DRCNN | DANCE-DRCNN
1 | 100.00 | 99.73 | 100.00 | 100.00 | 100.00 | 100.00 | 99.88 | 100.00 | 100.00
2 | 99.64 | 99.28 | 99.70 | 99.62 | 99.20 | 100.00 | 99.87 | 100.00 | 100.00
3 | 98.96 | 99.49 | 99.47 | 98.94 | 99.50 | 99.49 | 100.00 | 99.88 | 100.00
4 | 99.97 | 99.93 | 99.60 | 99.95 | 99.80 | 99.30 | 100.00 | 100.00 | 100.00
5 | 99.81 | 100.00 | 100.00 | 100.00 | 99.75 | 98.50 | 99.86 | 99.86 | 100.00
6 | 79.78 | 99.97 | 87.64 | 99.95 | 85.15 | 93.40 | 100.00 | 96.34 | 100.00
7 | 99.43 | 99.99 | 99.45 | 98.37 | 99.48 | 99.30 | 100.00 | 99.89 | 100.00
8 | 98.41 | 99.76 | 99.64 | 98.30 | 98.13 | 100.00 | 99.47 | 98.81 | 100.00
9 | 97.58 | 99.86 | 100.00 | 92.10 | 95.66 | 100.00 | 100.00 | 100.00 | 100.00
OA | 92.72 | 99.83 | 98.32 | 99.27 | 97.46 | 98.28 | 99.97 | 99.91 | 100.00
AA | 97.06 | 99.78 | 98.39 | 98.30 | 97.41 | 97.88 | 99.90 | 99.42 | 100.00
Table 7. Comparison of the classification accuracy (%) among the proposed method, DRCNN, and the baselines on the Pavia University data.

Class | SVMMRF | HSID-CNN | A2S2K-ResNet | R-PCA-CNN | Gabor-CNN | SSRN | 3D-CNN | DRCNN | DANCE-DRCNN
1 | 94.56 | 94.32 | 98.22 | 99.70 | 96.45 | 99.81 | 99.85 | 99.35 | 99.81
2 | 96.08 | 95.49 | 98.90 | 99.84 | 96.95 | 99.94 | 99.93 | 99.88 | 99.97
3 | 85.73 | 94.07 | 88.91 | 92.31 | 96.09 | 99.35 | 98.46 | 99.72 | 100.00
4 | 96.41 | 99.46 | 93.56 | 99.35 | 99.22 | 99.81 | 100.00 | 98.99 | 99.85
5 | 99.59 | 99.57 | 99.11 | 100.00 | 99.92 | 99.94 | 100.00 | 100.00 | 100.00
6 | 93.18 | 97.94 | 80.26 | 94.35 | 94.69 | 99.95 | 100.00 | 98.22 | 99.98
7 | 89.66 | 97.68 | 93.31 | 98.52 | 87.36 | 100.00 | 100.00 | 100.00 | 100.00
8 | 85.75 | 84.11 | 93.64 | 94.85 | 87.38 | 98.59 | 99.46 | 98.39 | 99.97
9 | 99.88 | 98.05 | 99.37 | 98.96 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
OA | 92.18 | 94.97 | 95.23 | 97.54 | 95.67 | 99.77 | 99.82 | 99.58 | 99.92
AA | 94.15 | 95.63 | 93.92 | 98.25 | 95.37 | 99.71 | 99.74 | 99.39 | 99.95
Table 8. Comparison of classification results (%) using the DRCNN with different inputs on the Indian Pines data.

Class | GF-HSI | GLF | GHF1 | GHF2 | GHF3 | DANCE
1 | 100.00 | 100.00 | 73.12 | 62.00 | 51.84 | 100.00
2 | 95.76 | 83.06 | 72.35 | 67.51 | 37.89 | 98.90
3 | 96.92 | 98.99 | 87.33 | 47.15 | 63.03 | 96.93
4 | 97.54 | 100.00 | 61.16 | 63.64 | 59.88 | 98.04
5 | 100.00 | 100.00 | 96.22 | 100.00 | 78.82 | 99.27
6 | 100.00 | 100.00 | 100.00 | 99.25 | 88.13 | 99.20
7 | 95.45 | 100.00 | 81.23 | 43.59 | 39.33 | 100.00
8 | 100.00 | 99.31 | 91.15 | 99.51 | 33.57 | 100.00
9 | 100.00 | 100.00 | 87.98 | 100.00 | 79.75 | 100.00
10 | 89.43 | 95.26 | 91.45 | 92.33 | 92.33 | 99.38
11 | 96.64 | 95.55 | 84.22 | 85.42 | 70.15 | 98.29
12 | 96.38 | 96.66 | 88.07 | 53.75 | 68.45 | 99.21
13 | 97.37 | 98.40 | 71.52 | 96.69 | 75.32 | 100.00
14 | 100.00 | 99.73 | 96.49 | 84.77 | 83.41 | 99.35
15 | 87.31 | 85.36 | 68.99 | 62.14 | 80.05 | 93.84
16 | 87.37 | 79.05 | 93.16 | 100.00 | 88.12 | 98.73
OA | 96.13 | 94.52 | 86.59 | 78.61 | 69.15 | 98.61
AA | 96.26 | 95.71 | 84.03 | 74.90 | 68.13 | 98.82
Table 9. Comparison of the classification accuracy (%) among the proposed method and the baselines on the Indian Pines data.

OA | SVMMRF | HSID-CNN | A2S2K-ResNet | R-PCA-CNN | Gabor-CNN | SSRN | 3D-CNN | DRCNN
without DANCE | 78.54 | 88.80 | 97.31 | 96.97 | 95.75 | 97.71 | 97.27 | 96.15
with DANCE | 81.67 | 90.12 | 98.29 | 97.35 | 96.82 | 97.91 | 98.13 | 98.61
Table 10. Comparison of the different sub-block sizes on the Indian Pines data.

Sub-block size | 145 × 145 | 29 × 29 | 5 × 5
OA (%) | 98.59 | 98.61 | 97.57
AA (%) | 97.21 | 98.82 | 96.22
Running time of DANCE | 58.44 ± 2.26 s | 368.38 ± 3.62 s | 4601.54 ± 9.21 s
Table 11. Computational cost of DANCE on the Indian Pines data.

Disk Usage | CPU Usage | Running Time
1236 ± 23.4 MB | 36.12 ± 1.4% | 368.38 ± 3.62 s
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
