#### *3.1. Spectral-Spatial Kernel*

Let us consider an HSI with $N$ pixels, whose $B$-band spectral vectors form $\mathbf{x}^{SPE} \equiv \left(x\_1^{SPE}, x\_2^{SPE}, \ldots, x\_N^{SPE}\right) \in \mathbb{R}^{B \times N}$, together with its EMP $\mathbf{x}^{SPA} \equiv \left(x\_1^{SPA}, x\_2^{SPA}, \ldots, x\_N^{SPA}\right) \in \mathbb{R}^{k(2n+1) \times N}$ and its hierarchical segmentation map $\mathbf{x}^{HIE} \equiv \left(x\_1^{HIE}, x\_2^{HIE}, \ldots, x\_N^{HIE}\right) \in \mathbb{R}^{S \times N}$. The supervised SVM classifier is widely used for statistical classification and regression analysis owing to its geometric margin maximization and empirical error minimization [62]. Because HSI pixels are generally not linearly separable, they are mapped from $\mathbf{x}^{SPE}$ into a kernel Hilbert space by a mapping function $\phi(\mathbf{x}^{SPE})$, in which the separating hyperplane is constructed. The decision function can then be defined as follows:

$$f(\mathbf{x}) = \sum\_{i=1}^{q} \alpha\_i y\_i \mathcal{K}(\mathbf{x}, \mathbf{x}\_i) + b \tag{24}$$

where $\boldsymbol{\alpha} = \left(\alpha\_1, \alpha\_2, \ldots, \alpha\_q\right)$ is the set of coefficients associated with the $q$ support vectors $\mathbf{x}\_i$ and their class labels $y\_i$, $b$ is the bias of the decision function $f$, and $\mathcal{K}(\mathbf{x}\_i, \mathbf{x}\_j) = \phi(\mathbf{x}\_i)^{T} \phi(\mathbf{x}\_j)$. For HSI classification using SVM, the Gaussian radial basis function (RBF) kernel is the most widely employed spectral kernel, measuring the similarity between two pixels. The typical spectral kernel is defined as follows:

$$K^{SPE}\left(\mathbf{x}\_i^{SPE}, \mathbf{x}\_j^{SPE}\right) = \exp\left(-\gamma \left\|\mathbf{x}\_i^{SPE} - \mathbf{x}\_j^{SPE}\right\|^2\right) \tag{25}$$

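where $\gamma > 0$ is the kernel parameter that controls the width of the RBF kernel (a larger $\gamma$ gives a narrower kernel). Similarly, the spatial kernel can be constructed using the RBF kernel. Specifically, for two vectors $\mathbf{x}\_i^{SPA}$ and $\mathbf{x}\_j^{SPA}$, the spatial kernel is defined as follows: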

$$K^{\rm SPA}\left(\mathbf{x}\_{i}^{\rm SPA}, \mathbf{x}\_{j}^{\rm SPA}\right) = \exp\left(-\gamma \left\|\mathbf{x}\_{i}^{\rm SPA} - \mathbf{x}\_{j}^{\rm SPA}\right\|^2\right) \tag{26}$$
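
The following is a minimal NumPy sketch, not the authors' code, of the Gaussian RBF kernels in Equations (25) and (26) and of the decision function (24). It assumes features are stored column-wise (one pixel per column), as in the $d \times N$ matrices defined above. The function names `rbf_kernel` and `decision_function` are hypothetical, and the coefficients $\alpha\_i$, labels $y\_i$ and bias $b$ are assumed to come from an already trained SVM.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||X[:, i] - Y[:, j]||^2).

    X has shape (d, n) and Y has shape (d, m): columns are pixels, matching the
    d x N layout used for x^SPE and x^SPA (hypothetical helper).
    """
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_x = np.sum(X ** 2, axis=0)[:, None]
    sq_y = np.sum(Y ** 2, axis=0)[None, :]
    d2 = np.maximum(sq_x + sq_y - 2.0 * X.T @ Y, 0.0)  # clip tiny negatives from round-off
    return np.exp(-gamma * d2)


def decision_function(K_test_sv, alpha, y, b):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x, x_i) + b for each test pixel.

    K_test_sv has shape (n_test, q): kernel values between the test pixels and
    the q support vectors; alpha and y have length q.
    """
    return K_test_sv @ (alpha * y) + b
```

The same `rbf_kernel` serves both Equation (25) and Equation (26); only the input features ($\mathbf{x}^{SPE}$ or $\mathbf{x}^{SPA}$) differ.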

As stated in [20,63], if $k\_1$ and $k\_2$ are two kernels, then $\mu\_1 k\_1 + \mu\_2 k\_2$ is also a valid kernel for any $\mu\_1, \mu\_2 \geq 0$. Using this property, Camps-Valls et al. [20] formulated an SVM classifier with a spectral-spatial composite kernel for HSI classification, defined as follows:

$$K^{SPE-SPA}(\mathbf{x}\_i, \mathbf{x}\_j) = \mu K^{SPE} \left( \mathbf{x}\_i^{SPE}, \mathbf{x}\_j^{SPE} \right) + (1 - \mu) K^{SPA} \left( \mathbf{x}\_i^{SPA}, \mathbf{x}\_j^{SPA} \right) \tag{27}$$

where $\mu \in [0, 1]$ is a weight that balances the spectral and spatial kernels. The authors of [20] extracted the spatial features of each pixel by computing the mean and variance within a fixed-size window. The SVM classifier with the composite kernel (27) effectively combines spectral and spatial information and achieves better results than one using the spectral kernel alone. However, spatial structure information may not be well represented for classification within such a predefined window.
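
As an illustration of the composite kernel (27), the sketch below computes window-based spatial features in the spirit of [20] (per-band mean and variance in a $w \times w$ window) and forms the weighted combination of two precomputed kernel matrices. The names `window_mean_var` and `composite_kernel` are hypothetical, and reflect padding at the image border is an assumption, not something specified in [20].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_mean_var(img, w):
    """Per-band mean and variance in a w x w window around each pixel (hypothetical helper).

    img has shape (H, W, B); returns a (2B, H*W) matrix whose columns are pixels.
    Border pixels use reflect padding (an assumption).
    """
    if w % 2 == 0:
        raise ValueError("window size must be odd so it is centred on the pixel")
    r = w // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    win = sliding_window_view(padded, (w, w), axis=(0, 1))  # shape (H, W, B, w, w)
    mean = win.mean(axis=(-2, -1))
    var = win.var(axis=(-2, -1))
    H, W, B = img.shape
    feats = np.concatenate([mean, var], axis=-1)  # shape (H, W, 2B)
    return feats.reshape(H * W, 2 * B).T


def composite_kernel(K_spe, K_spa, mu):
    """Spectral-spatial composite kernel of Eq. (27): mu * K_spe + (1 - mu) * K_spa."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1] so both weights are nonnegative")
    return mu * K_spe + (1.0 - mu) * K_spa
```

Because $\mu$ and $1-\mu$ are both nonnegative, the result stays positive semidefinite whenever the two input kernel matrices are, which is the closure property cited from [20,63].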

#### *3.2. The SVM-SSHK Method*

In this work, we propose an effective SVM classifier built on three kernels, which are computed on the pixels from the original, feature and hierarchical spaces to extract the spectral, spatial and hierarchical structure features, respectively. In the proposed framework, the spectral features are taken directly from each pixel's vector in the original HSI, while the spatial features are extracted with the EMP method because of its simplicity and effectiveness. As discussed earlier, the spectral information in HSIs can be represented by a limited number of PCs, so the spatial information of the HSI can be projected into a lower-dimensional space by the PCA transform. To construct the EMP of the HSI, we therefore first compute the MP for each PC instead of each spectral band, and then stack the MPs of all the PCs to produce the final EMP. Specifically, the PCA transform is first applied to the original HSI for feature extraction. Then, the first three PCs are used as a feature image to obtain the EMP, in which each pixel is represented by a stacked vector according to Equation (2).
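
A simplified sketch of the EMP construction: PCA via an SVD of the centred pixel matrix, then, for each of the first $k$ PCs, $n$ closings, the PC itself and $n$ openings are stacked into a $k(2n+1)$-dimensional vector per pixel. For brevity it uses plain greyscale openings and closings with square windows of increasing odd size, which is an assumption; the MPs of Section 2.1 (Equation (2)) may instead be built with operators by reconstruction and other structuring elements. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pca_first_k(img, k=3):
    """Project an (H, W, B) cube onto its first k principal components."""
    H, W, B = img.shape
    X = img.reshape(-1, B)
    X = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the PC directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (X @ Vt[:k].T).reshape(H, W, k)


def _filter(band, size, op):
    """Min or max filter over an odd size x size square window (edge padding)."""
    r = size // 2
    padded = np.pad(band, r, mode="edge")
    return op(sliding_window_view(padded, (size, size)), axis=(-2, -1))


def opening(band, size):
    # Erosion (min filter) followed by dilation (max filter)
    return _filter(_filter(band, size, np.min), size, np.max)


def closing(band, size):
    # Dilation (max filter) followed by erosion (min filter)
    return _filter(_filter(band, size, np.max), size, np.min)


def emp(pcs, sizes):
    """For each PC stack n closings (largest first), the PC and n openings -> (k(2n+1), N)."""
    profiles = []
    for c in range(pcs.shape[-1]):
        band = pcs[..., c]
        closings = [closing(band, s) for s in reversed(sizes)]
        openings = [opening(band, s) for s in sizes]
        profiles.extend(closings + [band] + openings)
    H, W = pcs.shape[:2]
    return np.stack(profiles, axis=-1).reshape(H * W, -1).T
```

With $k = 3$ PCs and two window sizes ($n = 2$), each pixel is described by $3 \times 5 = 15$ features, matching the $k(2n+1)$ dimension of $\mathbf{x}^{SPA}$.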

To remedy the shortcomings of the spatial feature extraction, the hierarchical structure information can be used as a supplement to the spatial features. Our previous study [32] showed that hierarchical structure information helps to improve HSI classification accuracy. As proposed in [32], the AMG method is very effective at modeling spatial structure information because its multigrid structure can serve as a hierarchical representation of HSIs. To construct the hierarchical kernel, the FMS algorithm is first applied to the original HSI for feature selection to obtain a spectral subset. Then, the multigrid representation of this subset is built using the AMG-based method. Next, the AMG-MHSEG algorithm is performed on each grid to obtain the corresponding segmentation map. Finally, these maps are combined to produce a stacked vector for each pixel, whose entries are the pixel's cluster labels in the different grids. The proposed hierarchical kernel is defined as follows:

$$\mathcal{K}^{\rm HIE}\left(\mathbf{x}\_i^{\rm HIE}, \mathbf{x}\_j^{\rm HIE}\right) = \exp\left(-\gamma \left\|\mathbf{x}\_i^{\rm HIE} - \mathbf{x}\_j^{\rm HIE}\right\|^2\right) \tag{28}$$
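
The sketch below assumes the $S$ segmentation maps have already been produced (Steps 4 and 5 of Algorithm 2, which are not reproduced here) as $H \times W$ arrays of integer cluster labels. It stacks them into $\mathbf{x}^{HIE} \in \mathbb{R}^{S \times N}$ and evaluates Equation (28); `stack_segmentation_maps` and `rbf_kernel` are hypothetical helper names.

```python
import numpy as np


def stack_segmentation_maps(label_maps):
    """Stack S label maps of shape (H, W) into x^HIE of shape (S, N), one column per pixel."""
    return np.stack([m.reshape(-1) for m in label_maps], axis=0).astype(float)


def rbf_kernel(X, Y, gamma):
    """RBF kernel between the columns of X (d, n) and Y (d, m), as in Eq. (28)."""
    d2 = (np.sum(X ** 2, axis=0)[:, None] + np.sum(Y ** 2, axis=0)[None, :]
          - 2.0 * X.T @ Y)
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

Two pixels that carry the same cluster label in every grid have identical stacked vectors and therefore a hierarchical kernel value of 1.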

To exploit the spectral, spatial and hierarchical structure information jointly for HSI classification, we combine the three kernels into a composite kernel. Specifically, we use the following weighted summation kernel:

$$\begin{aligned} K^{\rm SPE-SPA-HIE} \left( \mathbf{x}\_{i}, \mathbf{x}\_{j} \right) &= \mu^{\rm SPE} K^{\rm SPE} \left( \mathbf{x}\_{i}^{\rm SPE}, \mathbf{x}\_{j}^{\rm SPE} \right) + \mu^{\rm SPA} K^{\rm SPA} \left( \mathbf{x}\_{i}^{\rm SPA}, \mathbf{x}\_{j}^{\rm SPA} \right) \\ &+ \mu^{\rm HIE} K^{\rm HIE} \left( \mathbf{x}\_{i}^{\rm HIE}, \mathbf{x}\_{j}^{\rm HIE} \right) \end{aligned} \tag{29}$$

where $\mu^{\rm SPE}$, $\mu^{\rm SPA}$ and $\mu^{\rm HIE}$ are nonnegative weights that indicate the contribution of each type of feature information to HSI classification, subject to $\mu^{\rm SPE} + \mu^{\rm SPA} + \mu^{\rm HIE} = 1$. For clarity, the SVM-SSHK method is summarized in Algorithm 2.
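
A minimal sketch of the weighted summation kernel (29) on precomputed kernel matrices, with the weights checked to be nonnegative and to sum to one. The function name `sshk_kernel` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def sshk_kernel(K_spe, K_spa, K_hie, mu_spe, mu_spa, mu_hie, tol=1e-9):
    """Weighted sum of spectral, spatial and hierarchical kernel matrices, Eq. (29)."""
    weights = np.array([mu_spe, mu_spa, mu_hie], dtype=float)
    # Nonnegative weights keep the sum a valid kernel; the text also requires sum = 1
    if np.any(weights < 0) or abs(weights.sum() - 1.0) > tol:
        raise ValueError("weights must be nonnegative and sum to 1")
    return mu_spe * K_spe + mu_spa * K_spa + mu_hie * K_hie
```

In practice, the resulting training matrix (training pixels against training pixels) could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, whose `predict` then expects the kernel matrix between test pixels and training pixels.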

## **Algorithm 2: SVM-SSHK**

Input: An original hyperspectral image **u**, the available training samples, the required number of segmentation maps *S*, the time step size *τ*, the Gaussian scale *σ*, the gradient threshold *β*, the critical threshold *υ* and the number of morphological operators *n*.

**Step 1:** Initialize *S*, *τ*, *σ*, *β*, *υ* and *n*.

**Step 2:** Obtain the first three PCs of **u**.

**Step 3:** Construct the EMP by computing the MPs for all the PCs obtained in **Step 2**, as described in **Section 2.1**.

**Step 4:** Apply the FMS algorithm to **u** for feature selection to produce its spectral subset $\mathbf{u}\_1$ containing the most relevant spectral bands, as described in **Section 2.2**.

**Step 5:** For *i* = 1, 2, ..., *S*:

(a) Construct the *i*th grid of $\mathbf{u}\_1$ using the procedures described in **Section 2.3**.

(b) Select all the vertices in the *i*th grid as markers for the HSEG algorithm and initialize each vertex with a non-zero marker label.

(c) Obtain the *i*th segmentation map by using the MHSEG algorithm described in **Algorithm 1**.

End

**Step 6:** Normalize **u**, the EMP and the *S*-scale HSEG maps to [0, 1].

**Step 7:** Construct the spectral, spatial and hierarchical kernels as described in **Section 3.2**.

**Step 8:** Apply the SVM classifier with the proposed SSHK kernel (29) to classify **u** using the training samples, selecting the optimal penalty parameter *C* and kernel parameter *γ*.

**Step 9:** Obtain the final classification map.
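
Step 6 and the parameter selection in Step 8 can be sketched as follows. The algorithm does not say whether normalization is applied per feature or over the whole array; this sketch assumes per-feature (row-wise) min-max scaling. The candidate ranges for $C$ and $\gamma$ are hypothetical placeholders; a logarithmic grid scored by cross-validation on the training samples is a common choice.

```python
import numpy as np


def minmax_rows(X, eps=1e-12):
    """Rescale each feature (row) of a (d, N) matrix to [0, 1].

    Constant rows, which have no range, are mapped to 0 instead of dividing by zero.
    """
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return (X - lo) / np.maximum(hi - lo, eps)


# Hypothetical candidate grid for the SVM penalty C and RBF parameter gamma (Step 8);
# each pair would be scored, e.g. by cross-validation on the training samples.
C_values = 2.0 ** np.arange(-2, 9, 2)      # 0.25 ... 256
gamma_values = 2.0 ** np.arange(-6, 3, 2)  # 1/64 ... 4
param_grid = [(C, g) for C in C_values for g in gamma_values]
```

Normalizing before Step 7 matters because the RBF kernels depend on Euclidean distances, so features with larger numeric ranges would otherwise dominate each kernel.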
