**2. Related Work**

#### *2.1. Intrinsic Image Decomposition*

Based on the principle of human visual perception, intrinsic image decomposition (IID) aims to decompose an image into reflectance and illumination components [30]. The reflectance component depends on the materials of the objects in the image. An image also contains abundant edge and structure information; however, this information is not directly related to the reflectance component and is mainly preserved in the illumination component. The intrinsic image decomposition process can be expressed as:

$$I = SR \tag{3}$$

where the matrix *I* represents an input image, and the matrices *S* and *R* represent the illumination and reflectance components, respectively. It is evident from this equation that estimating *S* and *R* from *I* alone is an ill-posed problem. Many solutions have been presented in recent years to address it [31–34]. Among these methods, user intervention [34], extra constraints [30], and heuristic cues have produced good estimates of the intrinsic images. The Retinex algorithm is a widely used image enhancement method grounded in scientific experiments and analysis [35]. Compared with traditional methods that can only enhance a single characteristic, the Retinex algorithm balances dynamic range compression, contrast improvement, and color constancy. Importantly, detail information immersed in bright regions or shadows can be effectively revealed by the Retinex algorithm. However, the single-scale Retinex algorithm struggles to balance the detail contrast of the image against color preservation. Hyperspectral pansharpening aims at enhancing spatial information while preserving spectral information, and the multiscale Retinex algorithm performs well at exactly this trade-off [36]. Motivated by this finding, a multiscale-Retinex-based IID algorithm is well suited to hyperspectral pansharpening.

#### *2.2. Weighted Least Squares Filter*

The WLS filter, an edge-preserving filter, has become a highly active research topic in various image processing tasks. Since the WLS filter does not blur strong edges during image decomposition, ringing artifacts can be avoided. Compared with other edge-preserving filters, the WLS filter achieves the best compromise between sharpening and blurring [37]. The WLS filter can be used to estimate the low-frequency component of an input image. Specifically, it assumes that the filtered image *f* should be as close as possible to the input image *g* while being as smooth as possible everywhere except across edges. Farbman et al. [37] proposed obtaining the filtered image *f* by minimizing the following equation

$$\sum_{p} \left( (f_p - g_p)^2 + \gamma\left(\omega_{x,p}(g)\left(\frac{\partial f}{\partial x}\right)_p^2 + \omega_{y,p}(g)\left(\frac{\partial f}{\partial y}\right)_p^2\right) \right) \tag{4}$$

where *p* refers to the *p*th pixel. The first term $(f_p - g_p)^2$ ensures the minimum distance between the filtered image *f* and the input image *g*; $\omega_{x,p}(g)$ and $\omega_{y,p}(g)$, which depend on *g*, are smoothness weights; and *γ* is a regularization parameter that balances the two terms.
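As a concrete illustration, the objective of Equation (4) can be evaluated on a discrete image with forward differences. The sketch below is our own illustration, not code from [37]: the function name is hypothetical, and the weights are derived from log-luminance gradients of *g* in the spirit of the WLS construction. It shows why edges of *g* survive filtering: the weights shrink wherever *g* has a strong gradient, so smoothing is penalized less there.

```python
import numpy as np

def wls_energy(f, g, gamma=1.0, alpha=2.0, eps=1e-4):
    """Evaluate the WLS objective of Eq. (4) for a candidate image f.

    Smoothness weights are derived from the log-luminance gradients of
    the guide image g (illustrative choice, following the WLS idea).
    g is assumed non-negative.
    """
    # Data term: keep f close to the input g.
    data = np.sum((f - g) ** 2)
    # Forward differences (last row/column padded so shapes match).
    log_g = np.log(g + eps)
    gx = np.diff(log_g, axis=1, append=log_g[:, -1:])
    gy = np.diff(log_g, axis=0, append=log_g[-1:, :])
    fx = np.diff(f, axis=1, append=f[:, -1:])
    fy = np.diff(f, axis=0, append=f[-1:, :])
    # Weights are small across strong edges of g, so those edges are
    # penalized less and survive the smoothing.
    wx = 1.0 / (np.abs(gx) ** alpha + eps)
    wy = 1.0 / (np.abs(gy) ** alpha + eps)
    smooth = np.sum(wx * fx ** 2 + wy * fy ** 2)
    return data + gamma * smooth
```

A perfectly flat candidate zeroes the smoothness term and pays only the data cost, while *f* = *g* zeroes the data term and pays only for the gradients of *g*; the minimizer of Equation (4) sits between these extremes.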

## **3. Proposed Method**

The schematic diagram of the proposed method is shown in Figure 1. It consists of four major parts: (1) extracting the spatial details of the *P* image; (2) extracting the spatial details of the HS image; (3) generating the detail map; and (4) obtaining the fused HS image.

**Figure 1.** The schematic diagram of the proposed method.

#### *3.1. Extracting Spatial Details of the P Image with Weighted Least Squares Filter*

The *P* image contains plentiful spatial information. To enhance the spatial details of the *P* image and reduce noise, we use the Laplacian of Gaussian (LOG) enhancement algorithm to sharpen its spatial information. The LOG algorithm both reduces image noise and sharpens structural details. First, Gaussian convolution filtering is performed on the *P* image to remove noise. The Laplacian operator is then used to enhance the spatial details of the denoised *P* image. The enhanced *P* image is finally obtained by combining the *P* image with the Laplacian-filtered image. This process can be expressed as

$$P_S = P + a[P \ast f_L(x, y)] \tag{5}$$

where $P_S$ is the enhanced *P* image, $f_L(x, y)$ is the kernel function of the LOG operator, and *a* is a constant whose sign is determined by the central coefficient of the kernel $f_L(x, y)$. In this paper, the central coefficient of the kernel is negative, so we set *a* to −1. $f_L(x, y)$ is defined as

$$f_L(x, y) = \frac{\partial^2}{\partial x^2} f_g(x, y) + \frac{\partial^2}{\partial y^2} f_g(x, y) \tag{6}$$

where $f_g(x, y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ is the Gaussian convolution function, and *σ* is the standard deviation.
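The sharpening step of Equations (5) and (6) can be sketched with SciPy, which provides the Gaussian-then-Laplacian operation directly. The function name is our own; *σ* is a free parameter here, as the paper does not fix it for this step.

```python
import numpy as np
from scipy import ndimage

def log_sharpen(P, sigma=1.0, a=-1.0):
    """LOG sharpening of a panchromatic band, Eq. (5).

    gaussian_laplace smooths P with a Gaussian (noise suppression) and
    then applies the Laplacian, i.e., it convolves with f_L of Eq. (6).
    With a = -1 the response is subtracted, which boosts edges because
    the LoG kernel has a negative central coefficient.
    """
    lap = ndimage.gaussian_laplace(P.astype(float), sigma=sigma)
    return P + a * lap
```

On a step edge, the result overshoots on the bright side and undershoots on the dark side, which is the classic unsharp-masking effect; flat regions far from edges are left unchanged.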

To extract the spatial details of the enhanced *P* image, the WLS filter is applied to the $P_S$ image. Based on the principle of the WLS filter, the low-frequency image $P_L$, which should be as close as possible to $P_S$ and as smooth as possible except across edges, can be obtained from the following optimization problem.

$$\arg\min_{P_L} \left( \left\| P_L - P_S \right\|^2 + \gamma \left( \omega_x \left( \frac{\partial P_L}{\partial x} \right)^2 + \omega_y \left( \frac{\partial P_L}{\partial y} \right)^2 \right) \right) \tag{7}$$

The first term $\left\| P_L - P_S \right\|^2$ ensures the minimum distance between $P_L$ and $P_S$. *γ* is a regularization parameter that balances the first term against the second. The second term $\omega_x(\partial P_L/\partial x)^2 + \omega_y(\partial P_L/\partial y)^2$ smooths the $P_L$ image by minimizing the derivatives of $P_L$ with respect to *x* and *y*, where $\omega_x$ and $\omega_y$ are smoothness weights. We rewrite Equation (7) in matrix notation

$$(P_L - P_S)^T (P_L - P_S) + \gamma \left(P_L^T Z_x^T W_x Z_x P_L + P_L^T Z_y^T W_y Z_y P_L\right) \tag{8}$$

where $W_x$ and $W_y$ are diagonal matrices containing the smoothness weights $\omega_x$ and $\omega_y$, respectively, and $Z_x$ and $Z_y$ are discrete differentiation operators. The following linear equation can be obtained by setting the derivative of Equation (8) to zero

$$\left(I + \gamma (Z_x^T W_x Z_x + Z_y^T W_y Z_y)\right) P_L = P_S \tag{9}$$

where $Z_x^T W_x Z_x + Z_y^T W_y Z_y$ is a five-point spatially inhomogeneous Laplacian matrix. We define $\omega_x$ and $\omega_y$ in the same manner as in [38]:

$$\omega_{x,p}(P_S) = \left( \left| \frac{\partial l}{\partial x}(p) \right|^{\alpha} + \varepsilon \right)^{-1}, \quad \omega_{y,p}(P_S) = \left( \left| \frac{\partial l}{\partial y}(p) \right|^{\alpha} + \varepsilon \right)^{-1} \tag{10}$$

where *l* is the log-luminance channel of the $P_S$ image, *ε* is a small positive constant, and the exponent *α* controls the sensitivity to the gradients of $P_S$; it is set to 2.0 in this paper. With these definitions, the low-frequency image $P_L$ can be estimated. Considering the MRA formulation (Equation (2)), we obtain the high-frequency image $P_H$ by subtracting the low-frequency image $P_L$ from the enhanced *P* image $P_S$. This process can be expressed as

$$P_H = P_S - P_L \tag{11}$$

where $P_L$ and $P_H$ denote the low-frequency and high-frequency images of the $P_S$ image, respectively. We consider that most of the spatial details of the enhanced *P* image are contained in the high-frequency image.
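Equations (9)–(11) amount to solving one sparse linear system per image. The sketch below is a minimal illustration with SciPy sparse algebra; the function name, the forward-difference construction of $Z_x$ and $Z_y$, and the handling of border pixels (zeroing the weights that would cross the image boundary) are our own choices, under the assumption that $P_S$ is strictly positive so its log-luminance is defined.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def wls_filter(PS, gamma=1.0, alpha=2.0, eps=1e-4):
    """Low-frequency image P_L by solving Eq. (9):
    (I + gamma (Zx^T Wx Zx + Zy^T Wy Zy)) P_L = P_S.
    PS is assumed strictly positive (log-luminance must exist)."""
    r, c = PS.shape
    n = r * c
    l = np.log(PS + eps)
    # Smoothness weights of Eq. (10); the last column/row gets weight 0
    # so the difference operators never reach across the image border.
    wx = 1.0 / (np.abs(np.diff(l, axis=1)) ** alpha + eps)
    wy = 1.0 / (np.abs(np.diff(l, axis=0)) ** alpha + eps)
    wx = np.hstack([wx, np.zeros((r, 1))]).ravel()
    wy = np.vstack([wy, np.zeros((1, c))]).ravel()
    # Forward-difference operators on the row-major flattened image.
    ex = np.ones(n)
    Zx = sp.diags([-ex, ex[:-1]], [0, 1], shape=(n, n), format="csr")
    Zy = sp.diags([-ex, ex[:-c]], [0, c], shape=(n, n), format="csr")
    Wx, Wy = sp.diags(wx), sp.diags(wy)
    A = sp.eye(n) + gamma * (Zx.T @ Wx @ Zx + Zy.T @ Wy @ Zy)
    return spsolve(A.tocsc(), PS.ravel()).reshape(r, c)

# High-frequency image of Eq. (11): P_H = P_S - wls_filter(P_S)
```

By construction, a constant image passes through unchanged (its gradients are zero everywhere), while noise in flat regions is heavily smoothed because small log-luminance gradients give large weights.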

Figure 2 shows an example of spatial detail processing of the *P* image. In this experiment, the parameters are tuned to their optimal values. From Figure 2b, it can be observed that the Laplacian of Gaussian (LOG) enhancement algorithm indeed plays an important role in spatial detail enhancement. Figure 2c shows the high-frequency image of the enhanced *P* image obtained by the proposed method. To illustrate the effectiveness of the WLS filter in spatial detail extraction, Figure 2c is compared with Figure 2d, which is obtained by Gaussian filtering of the enhanced image. Similarly, to illustrate the effectiveness of the LOG algorithm in spatial enhancement, Figure 2c is compared with Figure 2e, which is obtained by WLS filtering of the original *P* image. Comparing Figure 2c with Figure 2d,e reveals fine distinctions between the three filtered images: in Figure 2d,e, some low-frequency components are mixed into the high-frequency image, especially in spherical regions. Therefore, the LOG enhancement algorithm performs well in spatial detail enhancement, and the WLS filter is well suited to extracting the high-frequency component of the *P* image.

**Figure 2.** Example of spatial detail processing of the *P* image. (**a**) Original *P* image; (**b**) Enhanced *P* image. High-frequency images: (**c**) Sharpening + WLS filtering; (**d**) Sharpening + Gaussian filtering; (**e**) WLS filtering.

#### *3.2. Extracting Spatial Detail of the HS Image with Intrinsic Image Decomposition*

The original hyperspectral image (*OH*) is upsampled to the size of the *P* image.

$$UH^k = \uparrow OH^k \tag{12}$$

for *k* = 1, 2, ... , *λ*, where *λ* is the number of HS image bands, ↑ denotes the upsampling operation, *UH* is the interpolated HS image, and $OH^k$ and $UH^k$ are the *k*th bands of the original and interpolated HS images, respectively.

A preprocessing step then performs MTF-based deblurring [39] on the interpolated HS image. The MTF-based deblurring is carried out in the frequency domain as follows

$$\mathcal{H}^k = \frac{(\mathcal{M}^k)^*}{\left| \mathcal{M}^k \right|^2 + \frac{1}{SNR}}\, \mathcal{UH}^k \tag{13}$$

where $\mathcal{H}$ is the Fourier transform of the deblurred interpolated HS image and $\mathcal{H}^k$ is its *k*th band; $\mathcal{UH}^k$ and $\mathcal{M}^k$ are the Fourier transforms of $UH^k$ and of the blurring kernel for the *k*th band, respectively; $(\mathcal{M}^k)^*$ is the complex conjugate of $\mathcal{M}^k$; and $1/SNR$ is the noise-to-power ratio, a constant close to zero. The deblurred interpolated HS image *H* is obtained by the inverse Fourier transform of $\mathcal{H}$. Subsequently, following [39], we adopt a de-ringing technique to reduce the ringing artifacts caused by non-periodic boundaries.
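The Wiener-style regularized inversion of Equation (13) can be sketched with NumPy's FFT. This is a minimal illustration, not the authors' code: the per-band blurring kernels are assumed to be given (zero-padded to the image size and centered), and the de-ringing step is omitted.

```python
import numpy as np

def mtf_deblur(UH, kernels, nsr=1e-3):
    """Frequency-domain deblurring of Eq. (13).

    UH:      interpolated HS cube (bands, rows, cols)
    kernels: per-band blurring kernels, zero-padded to the image size
             and centered (stand-ins for the sensor MTF)
    nsr:     the noise-to-power ratio 1/SNR (small constant)
    """
    H = np.empty_like(UH, dtype=float)
    for k in range(UH.shape[0]):
        # Move the kernel's center to the origin before taking its spectrum.
        M = np.fft.fft2(np.fft.ifftshift(kernels[k]))
        UHf = np.fft.fft2(UH[k])
        Hf = np.conj(M) / (np.abs(M) ** 2 + nsr) * UHf   # Eq. (13)
        H[k] = np.real(np.fft.ifft2(Hf))
    return H
```

With a delta (identity) kernel the spectrum is all ones, so the output is simply the input scaled by $1/(1 + 1/SNR)$, confirming that the regularizer only mildly attenuates well-conditioned frequencies.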

Following the CS-based formulation, the deblurred HS image *H* can be separated into spectral and spatial information, and IID is introduced to perform this separation. Based on the principle of IID, each band of the HS image is composed of two components: the illumination component and the reflectance component. For the HS image, the reflectance component is identified as the spectral information, while the illumination component carries the spatial information. Each band of the deblurred HS image can be represented as follows

$$H^k(x, y) = H_R^k(x, y) \times H_I^k(x, y) \tag{14}$$

where (*x*, *y*) is the spatial coordinate. $H_R^k(x, y)$, which depends on the intrinsic nature of the objects, represents the reflectance component of the *k*th band of the HS image. $H_I^k(x, y)$, which is related to the structure information of the objects, represents the illumination component of the *k*th band. Based on the single-scale Retinex algorithm, the output can be estimated as the difference between the input and the average of its neighborhood, described by the following equation

$$h_R^k(x, y) = \log(H^k(x, y)) - \log(H^k(x, y) \ast F(x, y)) \tag{15}$$

where $h_R^k(x, y) = \log(H_R^k(x, y))$, $H^k$ is the *k*th band of the input image, $h_R^k$ is the *k*th band of the output image, *F* is the Gaussian surround function, and ∗ denotes convolution. The illumination estimate is given by the convolution $H^k(x, y) \ast F(x, y)$. The Gaussian surround function $F(x, y)$ is defined as follows

$$F(x, y) = C \exp[-(x^2 + y^2)/2\sigma^2] \tag{16}$$

$$\iint F(x, y)\,dx\,dy = 1 \tag{17}$$

where *σ*, the scale factor of the Gaussian kernel, controls the color information and the spatial resolution of the image; it cannot be determined analytically. Generally, the larger the scale parameter *σ*, the better the color fidelity and the lower the spatial resolution of the output image. To strike a compromise between the extraction of spatial details and the preservation of spectral information, multiscale Retinex is used to separate the reflectance component from the illumination component of the deblurred HS image. This process is given by the following formulas

$$h_{MR}^k(x, y) = \sum_{n=1}^N \omega_n h_{R,n}^k = \sum_{n=1}^N \omega_n \left[\log(H^k(x, y)) - \log(H^k(x, y) \ast F_n(x, y))\right] \tag{18}$$

$$F_n(x, y) = C_n \exp[-(x^2 + y^2)/2\sigma_n^2] \tag{19}$$

where *N* represents the number of scales and $\omega_n$ the weighting factors. Based on experimental experience, *N* is set to 3, each $\omega_n$ is set to 1/3, and $\sigma_1$, $\sigma_2$, and $\sigma_3$ are set to 20, 40, and 80, respectively. Given the scale factors $\sigma_n$, the normalization factors $C_n$ can be determined from Equations (17) and (19). The reflectance component of the deblurred HS image is then obtained as

$$H_R^k(x, y) = \exp(h_{MR}^k(x, y)) \tag{20}$$

According to Equation (14), the illumination component of the deblurred HS image is calculated by

$$H_I^k = \frac{H^k}{H_R^k} \tag{21}$$

We consider that most of the spatial details of each band of the deblurred HS image are contained in $H_I^k$.
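The multiscale Retinex decomposition of Equations (18)–(21) can be sketched per band as follows. This is an illustrative implementation under two assumptions: the surround convolution $H^k \ast F_n$ is realized with `scipy.ndimage.gaussian_filter`, whose kernel is already normalized as Equation (17) requires, and a small offset keeps the logarithm well defined. The function name is our own.

```python
import numpy as np
from scipy import ndimage

def msr_decompose(Hk, sigmas=(20, 40, 80), eps=1e-6):
    """Split one deblurred HS band into reflectance and illumination
    components via multiscale Retinex, Eqs. (18)-(21)."""
    Hk = Hk.astype(float) + eps          # keep the log well defined
    w = 1.0 / len(sigmas)                # omega_n = 1/N
    h_mr = np.zeros_like(Hk)
    for s in sigmas:
        surround = ndimage.gaussian_filter(Hk, sigma=s)  # H^k * F_n
        h_mr += w * (np.log(Hk) - np.log(surround))      # Eq. (18)
    HR = np.exp(h_mr)                    # reflectance, Eq. (20)
    HI = Hk / HR                         # illumination, Eq. (21)
    return HR, HI
```

By construction the two components multiply back to the band (Equation (14)), and a perfectly uniform band yields a reflectance of one, with all its energy assigned to the illumination component.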

In [29], the spatial details of each band of the HS image are extracted by average filtering. The right picture of Figure 3b shows the detail map obtained by average filtering, while the right picture of Figure 3c shows the detail map obtained by the adopted IID technique. The detail map in Figure 3b contains only part of the spatial information of the HS image, whereas the detail map obtained by the IID technique contains most of it. IID is therefore an effective technique for extracting the spatial information of the HS image.

**Figure 3.** Spatial details extracted by average filtering and intrinsic image decomposition methods. (**a**) HS image; (**b**) Average filtering method (**left**: Spectral component of HS image; **right**: Spatial component of HS image); (**c**) Intrinsic image decomposition method (**left**: Reflectance component of HS image; **right**: Illumination component of HS image).

#### *3.3. Generating the Detail Map*

CS methods suffer from spectral distortion because their detail map depends only on the *P* image. To reduce spectral distortion, the detail map in this paper depends on both the *P* image and the HS image: the detail map of each band is determined by the illumination component of that HS band and the high-frequency component of the *P* image. Specifically, after obtaining the high-frequency component of the *P* image and the illumination component of the *k*th band of the HS image, the detail map is generated by

$$D^k = \zeta P_H + (1 - \zeta) H_I^k \tag{22}$$

for *k* = 1, 2, ... , *λ*, where *ζ* is a weight coefficient and $D^k$ is the initial detail map. Since the *P* image contains much more spatial information than the HS image, *ζ* is set to 0.9. Many traditional CS- and MRA-based methods assume that all bands of the HS image require the same detail map; this hypothesis often produces spectral and spatial distortion. We instead consider that each band of the HS image requires a different detail map, assuming that the amount of spatial detail required by a band is proportional to that band's share of the total information. Keeping this ratio unchanged helps reduce spectral distortion. Thus, we define the following formula

$$D_F^k = \frac{UH^k}{(1/\lambda)\sum_{j=1}^{\lambda} UH^j}\, D^k \tag{23}$$

for *k* = 1, 2, ... , *λ*, where *λ* is the number of HS image bands and $D_F^k$ denotes the final detail map required by the *k*th band of the HS image.
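Equations (22) and (23) combine into a short vectorized step. In this sketch (function name ours) the per-band "information" ratio is assumed to be computed pixel-wise from the interpolated HS cube, with all bands assumed strictly positive so the division is safe.

```python
import numpy as np

def detail_maps(PH, HI, UH, zeta=0.9):
    """Initial and final detail maps, Eqs. (22)-(23).

    PH: high-frequency P image (rows, cols)
    HI: illumination components of the HS bands (bands, rows, cols)
    UH: interpolated HS cube (bands, rows, cols), strictly positive,
        assumed here to supply each band's share of the information
    """
    D = zeta * PH[None, :, :] + (1.0 - zeta) * HI      # Eq. (22)
    # Band value divided by the mean over bands, i.e. (1/lambda) * sum.
    ratio = UH / UH.mean(axis=0, keepdims=True)
    return ratio * D                                    # Eq. (23)
```

When all bands carry equal energy the ratio is one everywhere and every band receives the same detail map; bands with above-average intensity receive proportionally more detail.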

#### *3.4. Obtaining the Fused HS Image*

According to Equation (1), we define a constraint parameter *α* to control the final detail map. The fused HS image is obtained by injecting the final detail map into the deblurred interpolated HS image band by band.

$$H_F^k = H^k + \alpha D_F^k \tag{24}$$

for *k* = 1, 2, ... , *λ*, where $H_F$ is the fused HS image and $H_F^k$ is its *k*th band. Since the amount of injected detail is regulated by the parameter *α*, spectral and spatial distortion can be restrained.
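The injection step of Equation (24) is a single broadcasted operation; the sketch below (function name ours) makes the role of *α* explicit: *α* = 0 returns the deblurred cube unchanged, and larger values trade spectral fidelity for sharpness.

```python
import numpy as np

def fuse(H, DF, alpha=1.0):
    """Inject the final detail maps into the deblurred HS cube, Eq. (24).

    H:  deblurred interpolated HS cube (bands, rows, cols)
    DF: final detail maps, same shape
    alpha: constraint parameter regulating the injected detail
    """
    return H + alpha * DF
```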
