## **1. Introduction**

Hyperspectral (HS) remote sensing is an emerging discipline. Traditional remote sensing sensors acquire images in only a few discrete bands and therefore lose a large amount of useful information. A hyperspectral remote sensing sensor is capable of acquiring numerous contiguous narrow bands in a certain wavelength range [1]. As a result, HS imagery has very high spectral resolution and forms a three-dimensional data cube: two spatial dimensions carry the spatial information, and the spectral dimension at each pixel holds a high-dimensional reflectance vector [2,3]. Such HS images, with their abundant spectral information, have been widely utilized in many domains, such as military surveillance [4], environmental monitoring [5], mineral exploration [6,7], and agriculture [8,9]. However, due to technical and budget constraints, HS images usually have low spatial resolution. Although high spectral resolution is crucial for identifying materials, high spatial resolution is also important for accurately locating objects. Various techniques have been developed to improve the spatial resolution of HS images, among which hyperspectral image fusion is one of the most important. Panchromatic (PAN) sensors provide PAN imagery with high spatial resolution. Fusing an HS image with a PAN image yields a fused HS image with both high spectral and high spatial resolution by integrating the spectral information of the former and the spatial information of the latter.

A large number of hyperspectral image fusion methods have been proposed, and they can be roughly divided into five families [10]. The first family is component substitution (CS), which first separates the spatial and spectral information of an HS image. The separated spatial component is then substituted by the PAN image, and a fused HS image is obtained by applying the inverse transformation [11]. The CS family includes algorithms such as intensity-hue-saturation (IHS) [12–14], principal component analysis (PCA) [15–17], Gram-Schmidt (GS) [18], adaptive GS (GSA) [19], Brovey transform (BT) [20], and partial replacement adaptive CS (PRACS) [21]. These CS-based methods work well from the spatial aspect [19] and are fast and simple to implement [13]. However, they may suffer from serious spectral distortion, caused by the difference between the PAN image and the substituted spatial component [22]. The second family is multiresolution analysis (MRA), which extracts the spatial details of a PAN image through multiscale decomposition or spatial filtering. The extracted spatial details are then injected into an HS image. Some well-known examples in the MRA family are smoothing filter based intensity modulation (SFIM) [23], decimated wavelet transform (DWT) [24], Laplacian pyramid [25], modulation transfer function (MTF) generalized Laplacian pyramid (MTF-GLP) [26], and MTF-GLP with high pass modulation (MTF-GLP-HPM) [27]. The MRA algorithms offer temporal coherence [28] and good spectral preservation. On the negative side, they carry a heavier computational burden and a more complicated implementation than CS-based algorithms [28]. The CS and MRA approaches are the traditional fusion methods, and have also been extended from multispectral (MS) pansharpening to hyperspectral pansharpening.

The other three families, Bayesian methods, matrix factorization based methods, and hybrid methods, have been proposed more recently. Bayesian methods cast the fusion problem into an explicit probabilistic framework and then define a suitable prior distribution to regularize the optimization model [29]. Bayesian sparsity promoted Gaussian prior (Bayesian sparse) [30], Bayesian HySure [31], and Bayesian naive Gaussian prior (Bayesian naive) [32] belong to this class of hyperspectral pansharpening methods. Matrix factorization based methods employ the nonnegative matrix factorization (NMF) model [33] and use the estimated solution of the NMF model to generate the fused HS image. The matrix factorization family contains algorithms such as nonnegative sparse coding (NNSC) [34] and constrained nonnegative matrix factorization (CNMF) [35]. Bayesian and matrix factorization approaches perform well in terms of preserving spectral information, but they have high computational cost. Hybrid methods combine algorithms from different families, for example the CS and MRA families, to form a new algorithm, which generally takes advantage of the algorithms in both families [36]. Examples include the curvelet and ICA fusion method [37], the guided filter PCA (GFPCA) method [38], and the non-linear PCA (NLPCA) and indusion method [39].

The key to hyperspectral pansharpening is to provide more spatial information while preserving the spectral information of the original HS image. To accomplish this goal, this paper presents a new hyperspectral image fusion algorithm based on the structure tensor. In this work, the structure tensor, which describes the geometric structure and spatial details of an image, is applied to the fusion of HS and PAN images for the first time. Traditional methods extract the spatial details only from the PAN image without considering the structure information of the HS image, and thus may cause spectral distortion or insufficient spatial enhancement. The proposed method considers the spatial details of the HS and PAN images simultaneously. The spatial details of the PAN image are extracted by calculating and analyzing the structure tensor and its eigenvalues, while the spatial details of the HS image are synchronously generated by an adaptive weighted method. To consider the HS and PAN images simultaneously and obtain complete spatial details, an appropriate weighted fusion strategy is introduced to merge the spatial information extracted from the PAN image with that obtained from the HS image. To avert artifacts at the boundaries, a guided filter, which is an edge-preserving filter, is applied to the merged spatial information image. Consequently, the method effectively provides spatial information and accomplishes sufficient spatial enhancement. To maintain the spectral information, an injection gains matrix is constructed; this gains matrix can also further reduce the spatial distortion through a defined tradeoff parameter. After the desired gains matrix is constructed, the fused HS image is obtained by adding the spatial details to the interpolated HS image. Extensive experiments have been conducted on both simulated and real hyperspectral remote sensing datasets to verify the fusion performance of the proposed method in both the spatial and spectral aspects.

The rest of this paper is organized as follows. Section 2 briefly introduces the basic theory of structure tensor. The proposed hyperspectral image fusion method is described in detail in Section 3. In Section 4, the experimental results and analysis for different datasets are presented. Conclusions are drawn in Section 5.

## **2. Related Work**

A structure tensor can represent the structure and spatial information of images and has been shown to be an important tool in the field of image analysis [40,41]. The structure tensor has been successfully applied to many image processing problems, such as texture analysis [42], anisotropic filtering [43], and motion detection [44].

For a gray image $I(x, y)$, the change generated by a shift $(\Delta x, \Delta y)$ can be described as

$$r = \sum_{(x, y)} w(x, y) \left[ I(x + \Delta x, y + \Delta y) - I(x, y) \right]^2 \tag{1}$$

where $(\Delta x, \Delta y)$ takes values in $\{(0, 1), (1, 0), (1, 1), (-1, 1)\}$, and $w$ is a smoothing window, such as a Gaussian window [40]. Then, by using the first-order Taylor expansion $I(x + \Delta x, y + \Delta y) = I(x, y) + I_x \Delta x + I_y \Delta y + O(\Delta x^2, \Delta y^2)$, the change $r$ is described as

$$r = \sum_{(x, y)} w(x, y) \left[ I_x \Delta x + I_y \Delta y + O(\Delta x^2, \Delta y^2) \right]^2 \tag{2}$$

where $I_x = \partial I / \partial x$ and $I_y = \partial I / \partial y$ are the horizontal and vertical components of the gradient vector. For a small shift, the change $r$ can be simplified as

$$r = [\Delta x, \Delta y] \, T \, [\Delta x, \Delta y]^{\mathrm{T}} \tag{3}$$

where the matrix $T$ is the structure tensor, defined as

$$T = \begin{bmatrix} \sum_{(x, y)} w(x, y) (I_x)^2 & \sum_{(x, y)} w(x, y) I_x I_y \\ \sum_{(x, y)} w(x, y) I_x I_y & \sum_{(x, y)} w(x, y) (I_y)^2 \end{bmatrix} \tag{4}$$

This structure tensor $T$ is a positive semi-definite matrix and can be decomposed as

$$T = \begin{bmatrix} e_1 & e_2 \end{bmatrix} \begin{bmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{bmatrix} \begin{bmatrix} e_1 & e_2 \end{bmatrix}^{\mathrm{T}} \tag{5}$$

where $\mu_1$ and $\mu_2$ are the nonnegative eigenvalues, and $e_1$ and $e_2$ are the corresponding eigenvectors. The two nonnegative eigenvalues describe the structure information of an image. When $\mu_1 \approx \mu_2 \approx 0$, the windowed image region is a flat area. If $\mu_1 > \mu_2 \approx 0$, the area belongs to an edge region. When $\mu_1 \geq \mu_2 > 0$, this indicates a corner. The trace is the sum of the eigenvalues, the determinant is their product, and thresholding on these quantities is used to classify and detect an edge or a corner [40]. For a single pixel of an image, the structure tensor matrix $J$ is defined as

$$J = \begin{bmatrix} (I_x)^2 & I_x I_y \\ I_x I_y & (I_y)^2 \end{bmatrix} = \nabla I \cdot \nabla I^{\mathrm{T}} \tag{6}$$

where $\nabla I = [I_x, I_y]^{\mathrm{T}}$ is the gradient vector, and $\cdot$ denotes the matrix product.
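To make Equations (4) and (6) concrete, the following minimal sketch computes the smoothed per-pixel structure tensor of a grayscale image. It is a sketch under stated assumptions, not the paper's implementation: the Sobel derivative, the window width `sigma`, and the function name `structure_tensor` are all illustrative choices.

```python
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(img, sigma=1.0):
    """Smoothed structure tensor components (Equation (4)) of a 2-D image."""
    Ix = sobel(img, axis=1)  # horizontal derivative I_x
    Iy = sobel(img, axis=0)  # vertical derivative I_y
    # Gaussian window w(x, y) applied to each product of derivatives
    T11 = gaussian_filter(Ix * Ix, sigma)
    T12 = gaussian_filter(Ix * Iy, sigma)
    T22 = gaussian_filter(Iy * Iy, sigma)
    return T11, T12, T22
```

The eigen-decomposition of Equation (5) can then be read off analytically from these three components, as done in Section 3.2.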

## **3. Proposed Hyperspectral Image Fusion Algorithm**

Figure 1 shows a diagram of the proposed method, which consists of the following steps. First, the spatial information of an HS image is obtained by using an adaptive weighted method. Then, an image enhancement approach is applied to the PAN image to sharpen its spatial information, and the structure tensor is introduced to extract the spatial details of the enhanced PAN image. Subsequently, the extracted spatial information of the HS and PAN images is merged via a matching weighted fusion method, and a guided filter is applied to the merged spatial information to prevent artifacts. Finally, an injection gains matrix is constructed to avoid spectral and spatial distortion, and a fused image is produced by injecting the integrated spatial details into each band of the interpolated HS image.

**Figure 1.** Diagram of the proposed hyperspectral image fusion algorithm. (*m* and *M* (*m* < *M*) represent the image heights of the original HS and PAN images, respectively; *n* and *N* (*n* < *N*) represent the image widths of the two images; and *d* represents the number of HS image bands.)

### *3.1. Upsampling and Adaptive Weighting for the HS Image*

For the same scene, let $\mathbf{Y}_{\mathrm{H}} \in \mathbb{R}^{m \times n \times d}$ represent the original low spatial resolution HS image, and $\mathbf{Y}_{\mathrm{P}} \in \mathbb{R}^{M \times N \times 1}$ represent the high spatial resolution PAN image. Fusing the HS and PAN images aims to obtain a fused high spatial resolution HS image $\mathbf{X}_{\mathrm{H}} \in \mathbb{R}^{M \times N \times d}$. Here, $m$ and $M$ ($m < M$) denote the image heights of the HS and PAN images, respectively, $n$ and $N$ ($n < N$) denote the image widths of these two images, and $d$ denotes the number of HS bands. The low spatial resolution HS image is upsampled to the scale of the PAN image by the finite impulse response (FIR) filter interpolation method, which first performs zero interpolation on the low spatial resolution HS image and then carries out FIR filtering to obtain the interpolated image

$$\widetilde{\mathbf{Y}}_{\mathrm{H}}^{l} = \uparrow \mathbf{Y}_{\mathrm{H}}^{l} \tag{7}$$

for $l = 1, 2, \ldots, d$, where $\uparrow$ is the upsampling operation, $\widetilde{\mathbf{Y}}_{\mathrm{H}} \in \mathbb{R}^{M \times N \times d}$ is the interpolated HS image, $\widetilde{\mathbf{Y}}_{\mathrm{H}}^{l} \in \mathbb{R}^{M \times N}$ is the $l$th band of the interpolated HS image, and $\mathbf{Y}_{\mathrm{H}}^{l} \in \mathbb{R}^{m \times n}$ is the $l$th band of the original HS image.
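The paper specifies zero insertion followed by FIR low-pass filtering but not the filter taps; the sketch below therefore assumes a separable windowed-sinc low-pass and an integer scale `factor`, with `fir_upsample` and the filter length being illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve

def fir_upsample(band, factor):
    """Zero insertion followed by separable FIR low-pass filtering (Equation (7))."""
    m, n = band.shape
    up = np.zeros((m * factor, n * factor))
    up[::factor, ::factor] = band                 # zero interpolation
    t = np.arange(-4 * factor, 4 * factor + 1)
    h = np.sinc(t / factor) * np.hamming(t.size)  # windowed-sinc prototype
    h *= factor / h.sum()                         # restore amplitude lost to the zeros
    up = convolve(up, h[:, None])                 # filter vertically
    up = convolve(up, h[None, :])                 # filter horizontally
    return up
```

Applying `fir_upsample` band by band yields the interpolated cube $\widetilde{\mathbf{Y}}_{\mathrm{H}}$.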

To extract the spatial information of the HS image, an adaptive weighted method [19] is applied to the interpolated HS image.

$$\mathbf{S}_{\mathrm{H}} = \sum_{l=1}^{d} \omega_{l} \widetilde{\mathbf{Y}}_{\mathrm{H}}^{l} \tag{8}$$

where $\mathbf{S}_{\mathrm{H}} \in \mathbb{R}^{M \times N}$ is the spatial information of the HS image, and $[\omega_1, \omega_2, \ldots, \omega_d]^{\mathrm{T}}$ is the weight vector. To obtain the weights $\{\omega_l\}_{l=1,\ldots,d}$, the PAN image is first reduced to the spatial scale of the low spatial resolution HS image; let us denote this reduced PAN image by $\overline{\mathbf{Y}}_{\mathrm{P}}$ and define $\hat{\mathbf{S}}_{\mathrm{H}} = \sum_{l=1}^{d} \omega_l \mathbf{Y}_{\mathrm{H}}^{l}$. The optimal set of weights $\{\omega_l\}_{l=1,\ldots,d}$ is calculated by linear ridge regression to minimize the mean squared error between $\overline{\mathbf{Y}}_{\mathrm{P}}$ and $\hat{\mathbf{S}}_{\mathrm{H}}$. We obtain the closed-form solution for the weight $\omega_l$ as follows

$$\omega_l = \left( \left( \mathbf{Y}_{\mathrm{H}}^{l} \right)^{\mathrm{T}} \mathbf{Y}_{\mathrm{H}}^{l} \right)^{-1} \left( \mathbf{Y}_{\mathrm{H}}^{l} \right)^{\mathrm{T}} \overline{\mathbf{Y}}_{\mathrm{P}} \tag{9}$$

for $l = 1, 2, \ldots, d$, where $(\cdot)^{\mathrm{T}}$ is the transpose operation and $(\cdot)^{-1}$ is the matrix inverse. In Equation (9), $\mathbf{Y}_{\mathrm{H}}^{l} \in \mathbb{R}^{m \times n}$ and $\overline{\mathbf{Y}}_{\mathrm{P}} \in \mathbb{R}^{m \times n}$ are reshaped into $mn \times 1$ vectors to calculate the solution.
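Because each weight in Equation (9) involves only a single band, the closed form reduces to a scalar least-squares fit per band. A minimal sketch, assuming `hs_lr` is the $(m, n, d)$ low-resolution HS cube and `pan_lr` the PAN image reduced to the same scale (all names illustrative):

```python
import numpy as np

def adaptive_weights(hs_lr, pan_lr):
    """Closed-form per-band weights of Equation (9)."""
    p = pan_lr.reshape(-1)              # reduced PAN image as an mn-vector
    w = np.empty(hs_lr.shape[2])
    for l in range(w.size):
        y = hs_lr[:, :, l].reshape(-1)  # l-th HS band as an mn-vector
        w[l] = (y @ p) / (y @ y)        # ((Y^T Y)^(-1)) Y^T p for one band
    return w
```

Equation (8) then follows as, e.g., `S_H = np.tensordot(hs_up, adaptive_weights(hs_lr, pan_lr), axes=([2], [0]))` for an interpolated cube `hs_up`.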

### *3.2. Image Enhancement and Structure Tensor Processing for the PAN Image*

To sharpen the spatial structure information of the PAN image, an image enhancement step based on spatial filtering is applied to the PAN image. Compared with the Laplace operator, the Laplacian of Gaussian (LOG) enhancement algorithm is more robust to noise and isolated points, so we choose the LOG algorithm to sharpen the PAN image. The LOG algorithm first reduces noise by Gaussian convolution filtering; the Laplace operator is then utilized to enhance the spatial details. The Laplacian-filtered image is finally combined with the PAN image to obtain the enhanced PAN image. This LOG enhancement procedure can be described as

$$\hat{\mathbf{Y}}_{\mathrm{P}} = \mathbf{Y}_{\mathrm{P}} + c \left[ \mathbf{Y}_{\mathrm{P}} * f_{\mathrm{LOG}}(x, y) \right] \tag{10}$$

where $\hat{\mathbf{Y}}_{\mathrm{P}} \in \mathbb{R}^{M \times N}$ denotes the enhanced PAN image, $f_{\mathrm{LOG}}(x, y)$ denotes the kernel function of the LOG operator, $*$ denotes the convolution operator, and $c$ is a constant: $c = -1$ if the central coefficient of the kernel $f_{\mathrm{LOG}}(x, y)$ is negative, and $c = 1$ if it is positive. In this work, the kernel size is set to $15 \times 15$; the central coefficient of the kernel is negative, so $c = -1$. Based on the principle of the LOG operator, the kernel function $f_{\mathrm{LOG}}(x, y)$ is defined as

$$f_{\mathrm{LOG}}(x, y) = \frac{\partial^2}{\partial x^2} f_{\mathrm{G}}(x, y) + \frac{\partial^2}{\partial y^2} f_{\mathrm{G}}(x, y) \tag{11}$$

where $f_{\mathrm{G}}(x, y)$ is the Gaussian convolution function, defined as

$$f_{\mathrm{G}}(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{12}$$

where $\sigma$ is the standard deviation, set to 0.43 in this work. Thus, the kernel function $f_{\mathrm{LOG}}(x, y)$ is calculated by

$$f_{\mathrm{LOG}}(x, y) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{13}$$
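A minimal sketch of the LOG enhancement of Equations (10)–(13) follows: the $15 \times 15$ kernel size, $\sigma = 0.43$, and $c = -1$ are taken from the text, while sampling the kernel on an integer grid and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def log_kernel(size=15, sigma=0.43):
    """Sample the LOG kernel of Equation (13) on an integer grid."""
    r = size // 2
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    s2 = sigma ** 2
    return (x**2 + y**2 - 2 * s2) / s2**2 * np.exp(-(x**2 + y**2) / (2 * s2))

def log_enhance(pan, c=-1.0):
    """LOG sharpening of Equation (10); c = -1 since the kernel center is negative."""
    return pan + c * convolve(pan, log_kernel())
```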


In order to extract the spatial details of the enhanced PAN image, structure tensor processing is introduced. Based on Equation (6), the structure tensor matrix of the enhanced PAN image at pixel $i$ is defined as

$$\mathbf{T}_i = \begin{bmatrix} \hat{\mathbf{Y}}_{\mathrm{P}x,i}^2 & \hat{\mathbf{Y}}_{\mathrm{P}x,i} \hat{\mathbf{Y}}_{\mathrm{P}y,i} \\ \hat{\mathbf{Y}}_{\mathrm{P}x,i} \hat{\mathbf{Y}}_{\mathrm{P}y,i} & \hat{\mathbf{Y}}_{\mathrm{P}y,i}^2 \end{bmatrix} \tag{14}$$

where $\hat{\mathbf{Y}}_{\mathrm{P}x} = \partial \hat{\mathbf{Y}}_{\mathrm{P}} / \partial x$ and $\hat{\mathbf{Y}}_{\mathrm{P}y} = \partial \hat{\mathbf{Y}}_{\mathrm{P}} / \partial y$ are the horizontal and vertical components of the gradient of the enhanced PAN image, and $\mathbf{T}_i \in \mathbb{R}^{2 \times 2}$ is the structure tensor matrix at pixel $i$ of the enhanced PAN image. To include the local spatial structure information, a Gaussian kernel is convolved with the above structure tensor.

$$\mathbf{T}_{i} = \begin{bmatrix} g_{r} * \hat{\mathbf{Y}}_{\mathrm{P}x,i}^{2} & g_{r} * \hat{\mathbf{Y}}_{\mathrm{P}x,i} \hat{\mathbf{Y}}_{\mathrm{P}y,i} \\ g_{r} * \hat{\mathbf{Y}}_{\mathrm{P}x,i} \hat{\mathbf{Y}}_{\mathrm{P}y,i} & g_{r} * \hat{\mathbf{Y}}_{\mathrm{P}y,i}^{2} \end{bmatrix} = \begin{bmatrix} P_{11} & P_{12} \\ P_{12} & P_{22} \end{bmatrix} \tag{15}$$

where $g_r$ represents a Gaussian kernel with standard deviation $r$; the kernel size and the standard deviation are set to $1 \times 2$ and 0.5, respectively. $\mathbf{T}_i$ represents the resulting structure tensor matrix at pixel $i$, whose entries are abbreviated as $P_{11}$, $P_{12}$, and $P_{22}$. As discussed in Section 2, the structure tensor $\mathbf{T}_i$ can be decomposed in the form shown in Equation (5). The two nonnegative eigenvalues, which represent the spatial structure information, are calculated using the following formula

$$\xi_{1,2} = \frac{1}{2} \left[ P_{11} + P_{22} \pm \sqrt{(P_{11} - P_{22})^2 + 4P_{12}^2} \right] \tag{16}$$

where $\xi_1$ and $\xi_2$ are the nonnegative eigenvalues of the structure tensor matrix in Equation (15). The values of the two nonnegative eigenvalues divide the structure information into three types: flat area, edge area, and corner. If $\xi_1$ and $\xi_2$ are both near zero, the pixel lies in a flat area. When $\xi_1 > \xi_2 \approx 0$ or $\xi_1 \geq \xi_2 > 0$, the pixel belongs to an edge area or a corner, respectively. For an image, we consider the effective spatial information to comprise the edge regions and corners.

The trace of the matrix, denoted by $R$, is the sum of the eigenvalues, and is also the sum of $P_{11}$ and $P_{22}$. The determinant, denoted by $D$, is the product of the eigenvalues. We examined the trace and determinant at each pixel on a large number of enhanced PAN images; Figure 2 shows the trace and determinant at each pixel on two of them. For a pixel, if $R$ is near zero, both eigenvalues are near zero, and the area is flat. If $R$ is larger than zero, at least one of the nonnegative eigenvalues is greater than zero, and the pixel belongs to an edge region or a corner. Similarly, when $D$ is near zero, at least one of the eigenvalues is near zero, and the area is a flat or edge region; when $D$ is larger than zero, both eigenvalues are larger than zero, and the pixel is a corner. The edge regions and corners constitute the important spatial information. Based on the analysis of numerous enhanced PAN images, we suggest the following guideline. When $R > 1 \times 10^{-5}$, the pixel is identified as an edge region or corner, its area is treated as effective spatial information, and its value is retained. Otherwise, the pixel is classified as a flat area, and its value is not retained. This procedure of extracting the spatial details of the enhanced PAN image can be described as

$$\mathbf{S}_{\mathrm{P},i} = \begin{cases} \hat{\mathbf{Y}}_{\mathrm{P},i}, & \text{if } R > 1 \times 10^{-5} \\ 0, & \text{otherwise} \end{cases} \tag{17}$$

where $\mathbf{S}_{\mathrm{P}} \in \mathbb{R}^{M \times N}$ is the spatial information of the enhanced PAN image, $\mathbf{S}_{\mathrm{P},i}$ is the value of the spatial information at pixel $i$, and $\hat{\mathbf{Y}}_{\mathrm{P},i}$ is the value of the enhanced PAN image at pixel $i$.
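Combining Equations (15)–(17), the sketch below reuses the `structure_tensor` helper sketched in Section 2; the trace threshold $1 \times 10^{-5}$ and the smoothing standard deviation of 0.5 follow the text, while the function names are illustrative.

```python
import numpy as np

def extract_pan_details(pan_enh, sigma=0.5, thresh=1e-5):
    """Keep edge/corner pixels of the enhanced PAN image (Equation (17))."""
    P11, P12, P22 = structure_tensor(pan_enh, sigma)  # sketch from Section 2
    trace = P11 + P22  # R: the sum of the two eigenvalues
    # Equation (16): the eigenvalues themselves (not needed for the trace test)
    disc = np.sqrt((P11 - P22) ** 2 + 4 * P12 ** 2)
    xi1, xi2 = (trace + disc) / 2, (trace - disc) / 2
    return np.where(trace > thresh, pan_enh, 0.0)  # Equation (17)
```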

**Figure 2.** Trace and determinant at each pixel on two enhanced PAN images. (**a1**,**a2**) PAN image; (**b1**,**b2**) Trace of structure tensor at each pixel; (**c1**,**c2**) Determinant of structure tensor at each pixel.

Figure 3 shows the spatial information obtained by the gradient methods and the structure tensor method. Figure 3a shows a PAN image, and Figure 3b shows the enhanced PAN image sharpened by the LOG image enhancement algorithm. Figure 3c,d show the spatial information extracted by horizontal gradient processing and vertical gradient processing, respectively. The spatial information extracted by the structure tensor method, according to Equation (17), is shown in Figure 3e. As shown in Figure 3, the horizontal and vertical gradient methods retain only part of the edge information of the original image. By contrast, the spatial information obtained by the structure tensor method contains most of the edge and structure information. This illustrates that the structure tensor processing method in this subsection can effectively extract the spatial details of the enhanced PAN image.

**Figure 3.** Spatial information of an enhanced PAN image extracted by the gradient methods and the structure tensor method. (**a**) PAN image; (**b**) Enhanced PAN image; (**c**) Horizontal gradient method; (**d**) Vertical gradient method; (**e**) Structure tensor method.

### *3.3. Weighted Fusion of Spatial Details*

As shown in Figure 1, $\mathbf{S}_{\mathrm{H}}$ contains the spatial information of the HS image, and $\mathbf{S}_{\mathrm{P}}$ retains the spatial details of the enhanced PAN image. For the same scene, both the HS image and the PAN image contain spatial information, and the spatial information of the two images is different and complementary. The PAN image has more spatial information, but may not include details about the spatial structure of the HS image. Most conventional approaches extract the spatial information only from the PAN image and do not consider the spatial structure of the HS image, which may lead to spectral distortion or inadequate spatial enhancement. To obtain complete spatial details and consider the spatial information of the HS and PAN images simultaneously, a weighted fusion method is presented to integrate the spatial information of the HS image with that of the PAN image.

$$\mathbf{S}_{\mathrm{F},i} = \begin{cases} \mathbf{S}_{\mathrm{H},i}, & \text{if } \mathbf{S}_{\mathrm{P},i} = 0 \\ \lambda_1 \mathbf{S}_{\mathrm{P},i} + \lambda_2 \mathbf{S}_{\mathrm{H},i}, & \text{if } \mathbf{S}_{\mathrm{P},i} \neq 0 \end{cases} \tag{18}$$

where $\lambda_1$ and $\lambda_2$ are weight coefficients, $\mathbf{S}_{\mathrm{F}} \in \mathbb{R}^{M \times N}$ is the complete spatial detail image, $\mathbf{S}_{\mathrm{F},i}$ is the value of $\mathbf{S}_{\mathrm{F}}$ at pixel $i$, $\mathbf{S}_{\mathrm{P}}$ and $\mathbf{S}_{\mathrm{H}}$ are the spatial information of the enhanced PAN image and the HS image, respectively, and $\mathbf{S}_{\mathrm{P},i}$ and $\mathbf{S}_{\mathrm{H},i}$ are the values of $\mathbf{S}_{\mathrm{P}}$ and $\mathbf{S}_{\mathrm{H}}$ at pixel $i$. Since the PAN image contains more spatial details than the HS image, $\lambda_1$ and $\lambda_2$ are set to 0.9 and 0.1, respectively. Subsequently, to avoid artifacts at the boundaries, a guided filter is applied to the fused spatial information image. The guided filter is an edge-preserving filter: it smooths the input image while transferring the structure information of the guidance image to the output image [45]. The fused spatial information $\mathbf{S}_{\mathrm{F}}$ serves as both the guidance image and the input image. The filtered image, which is a continuous and smooth version of the input, retains the spatial structure information of the guidance image; it thus averts artifacts at the boundaries while preserving the spatial details of the fused spatial information $\mathbf{S}_{\mathrm{F}}$. According to the principle of the guided filter, the output image is a local linear transformation of the guidance image. This procedure is described as

$$\mathbf{S}_{i} = \bar{a}_{i}\mathbf{S}_{\mathrm{F},i} + \bar{b}_{i} = \frac{1}{|s|}\sum_{k \in v_{i}} a_{k}\mathbf{S}_{\mathrm{F},i} + \frac{1}{|s|}\sum_{k \in v_{i}} b_{k} \tag{19}$$

where $\mathbf{S} \in \mathbb{R}^{M \times N}$ is the output image, $v_k$ is a local square window centered at pixel $k$ (the window size is set to 40), $a_k$ and $b_k$ are linear coefficients assumed to be constant in $v_k$, $\bar{a}_i$ and $\bar{b}_i$ are the averages of the coefficients of all windows overlapping pixel $i$, and $|s|$ is the number of pixels in $v_k$. The linear coefficients $a_k$ and $b_k$ are computed by minimizing the difference between the input image $\mathbf{S}_{\mathrm{F}}$ and the output image $\mathbf{S}$ while maintaining the linear transformation in the window $v_k$; the solution is obtained from the linear ridge regression model

$$a_k = \frac{\frac{1}{|s|} \sum_{i \in v_k} \mathbf{S}_{\mathrm{F},i} \mathbf{S}_{\mathrm{F},i} - \theta_k \overline{\mathbf{S}}_{\mathrm{F},k}}{\chi_k^2 + \varepsilon} \tag{20}$$

$$b_k = \overline{\mathbf{S}}_{\mathrm{F},k} - a_k \theta_k \tag{21}$$

where $\theta_k$ and $\chi_k^2$ are the mean and variance of the guidance image $\mathbf{S}_{\mathrm{F}}$ in $v_k$, $\overline{\mathbf{S}}_{\mathrm{F},k}$ is the mean of the input image $\mathbf{S}_{\mathrm{F}}$ in $v_k$, and $\varepsilon$ is a regularization parameter, set to $10^{-4}$.
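A minimal sketch of the weighted fusion of Equation (18) followed by the self-guided filter of Equations (19)–(21): $\lambda_1 = 0.9$, $\lambda_2 = 0.1$, the window size of 40, and $\varepsilon = 10^{-4}$ follow the text, while the box-filter formulation and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_details(S_P, S_H, lam1=0.9, lam2=0.1):
    """Weighted fusion of the two spatial-detail images (Equation (18))."""
    return np.where(S_P == 0, S_H, lam1 * S_P + lam2 * S_H)

def guided_filter_self(S_F, size=40, eps=1e-4):
    """Self-guided filter of Equations (19)-(21), guidance = input = S_F."""
    box = lambda img: uniform_filter(img, size=size)  # (1/|s|) * window sum
    mean = box(S_F)                     # theta_k (also the mean of the input)
    var = box(S_F * S_F) - mean * mean  # chi_k^2: variance of the guidance
    a = var / (var + eps)               # Equation (20) with identical input/guidance
    b = mean - a * mean                 # Equation (21)
    return box(a) * S_F + box(b)        # Equation (19): average overlapping windows
```

Because guidance and input coincide, the numerator of Equation (20) collapses to the window variance, so `a` approaches 1 in high-variance (edge) windows and 0 in flat ones, which is what preserves the boundaries.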

### *3.4. Constructing Gains Matrix and Injecting Spatial Details*

Before injecting the integrated continuous spatial details into the interpolated HS image, an injection gains matrix is constructed to control the spectral and spatial distortion. To reduce the spectral distortion, the ratios between each pair of HS bands should remain unchanged; preserving these ratios is essential for maintaining the spectral information. This is depicted as

$$\mathbf{G}^{l} \approx \frac{\widetilde{\mathbf{Y}}_{\mathrm{H}}^{l}}{(1/d)\sum_{l=1}^{d} \widetilde{\mathbf{Y}}_{\mathrm{H}}^{l}} \tag{22}$$

for $l = 1, 2, \ldots, d$, where $\mathbf{G} \in \mathbb{R}^{M \times N \times d}$ denotes the injection gains matrix, and $\mathbf{G}^{l} \in \mathbb{R}^{M \times N}$ denotes the $l$th band of the gains matrix. To further ensure the spatial quality, we define the following tradeoff parameter to regulate the amount of injected spatial details.

$$\mathbf{G}^{l} = \tau \frac{\widetilde{\mathbf{Y}}_{\mathrm{H}}^{l}}{(1/d)\sum_{l=1}^{d} \widetilde{\mathbf{Y}}_{\mathrm{H}}^{l}} \tag{23}$$

for $l = 1, 2, \ldots, d$, where $\tau$ is the defined tradeoff parameter. The influence and setting of the tradeoff parameter $\tau$ are discussed in the experimental section. Then, the spatial details are injected into the interpolated HS image to generate the fused HS image band by band.

$$\mathbf{X}_{\mathrm{H}}^{l} = \widetilde{\mathbf{Y}}_{\mathrm{H}}^{l} + \mathbf{G}^{l} \cdot \mathbf{S} \tag{24}$$

for $l = 1, 2, \ldots, d$, where $\cdot$ denotes element-wise multiplication.
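A minimal sketch of Equations (23) and (24), assuming `hs_up` is the interpolated $(M, N, d)$ HS cube, `S` the filtered spatial-detail image, and `tau` the tradeoff parameter; the names are illustrative, and strictly positive band means are assumed to avoid division by zero.

```python
import numpy as np

def inject_details(hs_up, S, tau=1.0):
    """Gain construction (Equation (23)) and detail injection (Equation (24))."""
    mean_band = hs_up.mean(axis=2, keepdims=True)  # (1/d) * sum of the d bands
    G = tau * hs_up / mean_band                    # Equation (23), all bands at once
    return hs_up + G * S[:, :, None]               # Equation (24), element-wise
```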

## **4. Experimental Results and Discussion**

In this section, we describe the experimental setup and analyze the setting of the tradeoff parameter. To evaluate the fusion performance of the proposed method, four hyperspectral remote sensing datasets are used in the experiments.
