1. Introduction
For several decades, the large volume of remote sensing images provided by optical satellites has played a crucial role in many human activities. With the increasing demand for very high-resolution (HR) products, high-performance acquisition devices are being developed rapidly. Nevertheless, due to physical constraints, a single acquisition device cannot provide very fine spatial and spectral resolutions simultaneously [1]. Normally, optical satellites are equipped with two types of imaging devices: multispectral (MS) and panchromatic (PAN). The MS image is composed of several spectral channels and has rich color information. However, its spatial resolution does not satisfy the requirements of some remote sensing applications, such as classification and object detection. The PAN image, with only one spectral channel, supplies high spatial resolution. Thus, the pansharpening (PS) technique, which fuses the MS image and the PAN image, was developed to acquire HR MS images [2].
Existing PS approaches can be classified into three categories: component substitution (CS), multiresolution analysis (MRA), and variational optimization (VO)-based methods [3]. The CS-based methods, also known as spectral approaches, project the MS image onto a specific space and substitute the component that contains the main spatial information with the histogram-matched PAN image. This category includes intensity-hue-saturation (IHS) [4], Gram–Schmidt (GS) spectral sharpening [5], and principal component analysis (PCA) [6]. Because the classical CS-based methods cause obvious spectral distortion, several improved methods in this category were presented [7,8,9,10]. The MRA-based methods, also known as spatial approaches, constitute another category of PS approaches. This category injects the spatial details, extracted from the PAN image through a multiresolution decomposition, into the MS image upsampled to the scale of the PAN image. Several modalities of MRA are used to extract the spatial details, such as the decimated wavelet transform [11], undecimated wavelet transform [12], “à trous” wavelet transform (ATWT) [13,14], Laplacian pyramid (LP) [15], contourlet transform [16], curvelet transform [17], and generalized LP based on Gaussian filters matching the modulation transfer function (MTF) of the MS sensor (MTF-GLP) [18,19,20]. The MRA-based methods preserve the spectral information well but may cause spatial distortions in local regions. Since these two categories of methods produce fused images with different characteristics, hybrid methods combining the advantages of CS-based and MRA-based methods were developed. Representative methods include IHS+Wavelet [21], PCA+Contourlet [6], and ICA+Curvelet or Wavelet [17,22].
As a new generation of PS methods, the VO-based methods have attracted much attention from researchers and have developed rapidly [3]. This category of PS methods generally converts the pansharpening process into the optimization of a variational model. A crucial part of these methods is the construction of the energy functional, which relies on the imaging mechanism of the observed measurements [23,24,25]. The energy functional generally consists of three parts, i.e., the spectral fidelity model, the spatial enhancement model, and the prior model [3]. The spectral fidelity model aims to preserve the color information of the MS image as much as possible. It describes the relationship between the ideal fused image and the low-resolution (LR) MS image. The LR MS image can be regarded as a degraded version of the ideal HR MS image after blurring, downsampling, and the addition of noise [25,26]. The spatial enhancement model constructs the relationship between the ideal fused image and the PAN image. Two main assumptions are usually made in this model: one is that the PAN band can be represented as a linear combination of the HR MS bands [26]; the other is that the spatial structures of the pansharpened image are approximately consistent with those of the PAN image [27,28]. The prior model, which imposes spatial constraints on the ideal HR MS image, is used to further enhance the spatial quality. Representative prior models include the Laplacian prior [29], total variation [30], nonlocal prior [31], and low-rank prior [32]. Based on the idea of image super-resolution, the sparse representation (SR) technique was successfully used in remote sensing image fusion [33,34]. The SR-based methods assume that the LR images and the HR images have the same coefficients under certain specific dictionaries. During the last ten years, many SR-based PS methods were proposed [35,36,37,38,39,40,41,42,43,44,45,46,47,48,49]. These methods adopt different dictionary construction approaches to improve the fusion performance. Although the SR-based methods perform better than the CS and MRA methods, their high computational complexity restricts practical applications.
With the rapid development of deep learning techniques, various convolutional neural network (CNN) structures that had proven effective in classification and super-resolution tasks were applied to remote sensing image fusion [50]. Representative methods include Pansharpening CNN (PNN) [51], multiscale and multidepth CNN (MSDCNN) [52], PNN with a spectral-spatial structure loss (S3) [53], Pan-GAN [54], and GTP-PNet [55]. These methods achieve better results than the traditional PS methods. However, the CNN-based pansharpening methods require large amounts of training data and generalize poorly across different types of remote sensing images.
Regardless of the type of PS method, however, the spectral mismatch between the MS channels and the PAN channel can degrade the fusion performance. Figure 1 shows the relative spectral response curves of WorldView-2; the colored lines and the black line indicate the spectral responses of the MS channels and the PAN channel, respectively. The yellow, red, and red edge bands are within the wavelength range well covered by the PAN band. The blue and green bands are partially outside the wavelength range covered by the PAN image, while the coastal, NIR1, and NIR2 bands are almost entirely outside it. Hence, an obvious difference exists in the spectral responses for the WorldView-2 data. This spectral mismatch makes most PS methods suffer from spectral and spatial distortions. For example, the VO-based methods usually adopt the linear combination model as the spatial enhancement term under the assumption that the spectral range of the PAN image almost covers that of all the MS channels. Hence, these methods are not well suited to pansharpening the WorldView-2 data. The sparse coding-based methods are based on the assumption that the LR and HR image patches have the same sparse representations over the dictionary pair learned from the PAN image and its degraded version. In our earlier work [56], we first introduced the graph regularized sparse coding (GRSC) [57] algorithm into pansharpening. That method only considers four-band MS images; for an eight-band MS image, due to the spectral mismatch, the dictionary learned from the PAN image may not be adequate to sparsely represent the MS image patches. To reduce the influence of the spectral mismatch, this paper proposes a PS method to sharpen the WorldView-2 data via graph regularized sparse coding and adaptive coupled dictionary (GRSC-ACD). Our contributions are as follows.
- (1)
Considering the degree of correlation among the MS channels and the PAN channel, the PS process of the WorldView-2 data is regarded as a multitask problem. The first task processes the adjacent MS channels, i.e., green, yellow, red, and red edge, which are highly correlated with the PAN band and lie within the wavelength range well covered by the PAN image. The second task processes a single MS channel, i.e., the blue band, which is partially outside the wavelength range covered by the PAN image and has low correlation with the PAN image. The third task processes the MS channels, i.e., coastal, NIR1, and NIR2, that lie outside the wavelength range covered by the PAN image.
- (2)
To acquire precise sparse representations of the MS image patches, the GRSC algorithm is used in the GRSC-ACD method by exploiting the local manifold structure that describes the spatial similarity of the image patches. In each task, the LR MS channels are tiled into image patches, which make up an image patch set. Then, the image patch set is clustered into several subsets using the K-means algorithm so that the structural similarities of the image patches are further strengthened. Finally, each subset is sparsely represented by the GRSC algorithm. The accurate sparse representations contribute to a high-quality reconstruction of the HR MS image.
- (3)
An adaptive coupled dictionary is constructed for each PS task. For the first task, a coupled dictionary learned from the PAN image and its degraded version is used to sparsely represent the MS image patches. For the second task, to effectively represent the single blue band, the PAN image and the reconstructed green band, which is highly correlated with the blue band, are selected as the image dataset to train the coupled dictionary. For the third task, the reconstructed blue band, which is highly correlated with the coastal band, is selected as the image dataset to learn the adaptive coupled dictionary for the coastal band. Meanwhile, the reconstructed red edge band is taken as the image dataset to learn the adaptive coupled dictionary for sharpening the NIR1 and NIR2 bands.
The rest of this article is organized as follows: Section 2 briefly introduces the SR-based PS methods, the SR theory, and the GRSC algorithm; the proposed GRSC-ACD method is presented in Section 3; Section 4 compares and analyzes the experimental results on degraded and real remote sensing data; and finally, Section 5 concludes this article.
3. Multitask Pansharpening Method: GRSC-ACD
In this section, we introduce the proposed multitask PS method, GRSC-ACD, for the WorldView-2 data. Figure 2 shows the scheme of the proposed method. The detailed steps of the algorithm are described as follows.
3.1. Description of Multitask Pansharpening
The first step of the proposed method is to divide the PS process into three tasks according to the degree of correlation among the MS channels and the PAN channel and the relative spectral response curves of the different channels. The WorldView-2 data used in this paper are shown in Figure 3.
Figure 3a shows the MS image with eight spectral bands with the size of
, and
Figure 3b shows the PAN image with the size of
. The PAN image is then blurred and downsampled to obtain a degraded PAN image that has the same spatial resolution and scale as the original MS image. The correlation coefficient matrix among the MS channels and the PAN channel is then computed and listed in Table 1. According to these correlation coefficients and the relative spectral response curves shown in Figure 1, the PS process of the WorldView-2 data is divided into three tasks.
- (1)
First task: The correlation coefficients between the PAN channel and the green, yellow, red, and red edge MS channels are highlighted in red in Table 1. These four bands are highly correlated with the PAN image and lie almost entirely within the wavelength range it covers. Hence, in the first task, these MS channels are sharpened together. For this task, the HR PAN image and its degraded version are used to learn the coupled dictionary pair.
- (2)
Second task: As shown in Figure 1, the blue band is mostly within the wavelength range covered by the PAN image. However, it has low correlation with the PAN image. Hence, the second task deals specifically with the blue band. The correlation coefficient highlighted in blue shows that the blue band and the green band are highly correlated. Hence, the PAN image and the reconstructed green band are used as the dataset to learn the adaptive coupled dictionary for this task.
- (3)
Third task: The remaining MS channels, i.e., coastal, NIR1, and NIR2, are almost entirely outside the wavelength range covered by the PAN image, as shown in Figure 1. In this task, the three MS channels are divided into two groups: (1) the coastal band; (2) the NIR1 and NIR2 bands. For these two groups, different reconstructed HR MS bands are chosen to learn the adaptive coupled dictionaries. The correlation coefficient highlighted in purple shows that the coastal band is highly correlated with the blue band. Hence, the reconstructed blue band is used to learn the coupled dictionary for sharpening the coastal band. The correlation coefficients highlighted in green show the high degree of correlation among the red edge, NIR1, and NIR2 bands. Hence, for sharpening the NIR1 and NIR2 bands, we use the reconstructed red edge band to train the coupled dictionary.
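The correlation analysis behind this task split can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the block-averaging `degrade_pan` is an assumed stand-in for the unspecified blurring and downsampling operators, and the function names, the `(bands, rows, cols)` array layout, and the default scale factor are hypothetical choices made for the example.

```python
import numpy as np

def degrade_pan(pan, scale=4):
    """Block-average the PAN image by `scale` (assumed stand-in for blur + downsampling)."""
    rows = pan.shape[0] - pan.shape[0] % scale
    cols = pan.shape[1] - pan.shape[1] % scale
    blocks = pan[:rows, :cols].reshape(rows // scale, scale, cols // scale, scale)
    return blocks.mean(axis=(1, 3))

def band_correlations(ms, pan_lr):
    """Correlation matrix of the B MS bands (rows 0..B-1) and the degraded PAN (last row)."""
    stack = np.vstack([ms.reshape(ms.shape[0], -1), pan_lr.reshape(1, -1)])
    return np.corrcoef(stack)

def best_reference(corr, target, candidates):
    """Among `candidates`, return the band index most correlated with `target`."""
    return max(candidates, key=lambda b: corr[target, b])

# Synthetic example: band 0 follows the degraded PAN closely, band 1 does not.
rng = np.random.default_rng(0)
pan = rng.random((64, 64))
pan_lr = degrade_pan(pan)
ms = np.stack([pan_lr + 0.01 * rng.standard_normal(pan_lr.shape),
               rng.random(pan_lr.shape)])
corr = band_correlations(ms, pan_lr)
print(np.round(corr, 2))
```

With real data, the last row of `corr` would play the role of the PAN row of Table 1, and `best_reference` mirrors the way a reconstructed band is chosen as the dictionary source for a band that correlates poorly with the PAN image.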
3.2. Pansharpening Algorithm via GRSC for Each Task
The purpose of PS is to generate an HR MS image with a LR MS image and an HR PAN image . For each task, the MS channels have high correlation to each other. Hence, the image patches from these MS channels have the same or similar manifold structures. Let be the band of the LR MS bands for the task, where , and . Then, all the LR MS bands are tiled into image patches with the patch size of and the overlapping size of . Each image patch is arranged in a column vector, and all the column vectors form an image patch set that is denoted as , where is the image patch of the spectral band of LR MS image. The PS process consists of three main steps which are described as follows.
- (1)
Constructing image patch sets with similar geometrical structure: To acquire the precise sparse representations of the image patches, the set is first separated into several subsets with K-means clustering algorithm. Let be the subset of each class, where , and is the total number of the subsets. All the image patches in a subset share the same or similar local geometrical structures.
- (2)
Sparse coding of the subsets via GRSC: The proposed method is based on the assumption that the LR MS image patch and its corresponding HR MS image patch share the same sparse representation over the coupled dictionary pair. Let
and
be the LR dictionary and the HR dictionary, respectively. The dictionary construction method will be introduced in the following subsection. Considering the graph regularized sparse coding for image representation, we first construct the weighted graph matrix
and the degree matrix
for the subset
. Then, the Laplacian matrix can be defined as
. The sparse representation of the subset
can be estimated by solving the following objective function:
where
is the sparse coefficient matrix for the subset
, and
is the sparse vector of the
image patch in the subset.
and
are the regularization parameters to balance the contribution of the two regularization terms. To solve the objective function by optimizing over each
, (7) can be rewritten in a vector form. First, the reconstruction error
can be written as
. The Laplacian regularizer
can be rewritten as follows:
Then, by combining (7) and (8), the problem (5) can be written as
Based on the feature-sign search algorithm proposed in [
59], the problem in (9) can be effectively solved to acquire the optimal sparse coefficient matrix
.
- (3)
Reconstructing the HR MS channels for each task: The estimated sparse coefficient matrix
for the subset
can be obtained by solving the problem in (9). Then, the HR MS image patch subset
corresponding to
can be calculated through the following Formula (10).
After all the HR MS image patch subsets are obtained, the MS bands for each task can be reconstructed by averaging the overlapped image patches.
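For reference, the standard GRSC formulation [57] that steps (2) and (3) rely on can be written as follows. The notation is an assumption made here for illustration and may differ from the symbols and equation numbering used in this paper: $\mathbf{X}^k$ is the LR patch subset with columns $\mathbf{x}_i$, $\mathbf{D}_l$ and $\mathbf{D}_h$ are the LR and HR dictionaries, $\mathbf{S}^k$ is the coefficient matrix with columns $\mathbf{s}_i$, $\mathbf{W}$ and $\mathbf{G}$ are the weight and degree matrices, and $\alpha$, $\beta$ are the two regularization parameters.

```latex
% Graph Laplacian of the subset (W: weight matrix, G: diagonal degree matrix)
\mathbf{L} = \mathbf{G} - \mathbf{W}, \qquad G_{ii} = \sum_{j} W_{ij}

% GRSC objective for the LR patch subset X^k over the LR dictionary D_l
\min_{\mathbf{S}^k} \; \left\| \mathbf{X}^k - \mathbf{D}_l \mathbf{S}^k \right\|_F^2
  + \alpha \, \mathrm{Tr}\!\left( \mathbf{S}^k \mathbf{L} (\mathbf{S}^k)^{T} \right)
  + \beta \sum_{i} \left\| \mathbf{s}_i \right\|_1

% Vector forms of the reconstruction error and the Laplacian regularizer
\left\| \mathbf{X}^k - \mathbf{D}_l \mathbf{S}^k \right\|_F^2
  = \sum_{i} \left\| \mathbf{x}_i - \mathbf{D}_l \mathbf{s}_i \right\|_2^2,
\qquad
\mathrm{Tr}\!\left( \mathbf{S}^k \mathbf{L} (\mathbf{S}^k)^{T} \right)
  = \sum_{i,j} L_{ij}\, \mathbf{s}_i^{T} \mathbf{s}_j

% Subproblem in s_i with all other coefficient vectors held fixed
\min_{\mathbf{s}_i} \; \left\| \mathbf{x}_i - \mathbf{D}_l \mathbf{s}_i \right\|_2^2
  + \alpha L_{ii}\, \mathbf{s}_i^{T} \mathbf{s}_i
  + \mathbf{s}_i^{T} \mathbf{h}_i
  + \beta \left\| \mathbf{s}_i \right\|_1,
\qquad \mathbf{h}_i = 2\alpha \sum_{j \neq i} L_{ij}\, \mathbf{s}_j

% HR reconstruction with the shared sparse coefficients
\hat{\mathbf{Y}}^k = \mathbf{D}_h \hat{\mathbf{S}}^k
```

The per-vector subproblem follows from the symmetry of $\mathbf{L}$: the terms of $\sum_{i,j} L_{ij}\, \mathbf{s}_i^{T} \mathbf{s}_j$ that involve $\mathbf{s}_i$ are $L_{ii}\, \mathbf{s}_i^{T} \mathbf{s}_i$ plus twice the cross terms, which is what $\mathbf{h}_i$ collects.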
3.3. Dictionary Learning
Dictionary learning is an essential step in the proposed GRSC-ACD method. For different tasks, different coupled dictionary pairs need to be learned according to the characteristics of the MS channels. In our method, the images used to learn the coupled dictionary should be updated according to the characteristics of the tasks. The detailed descriptions are as follows.
- (1)
First task: This task processes the green, yellow, red, and red edge MS channels. These MS bands are within the wavelength range covered by the PAN image and are highly correlated with the PAN image. Hence, the HR PAN image and its degraded version are suitable for learning the coupled dictionary pair for the first task.
- (2)
Second task: This task only processes the blue band, which is partially outside the wavelength range covered by the PAN image and has low correlation with the PAN image. Thus, using only the PAN image to learn the coupled dictionary is not suitable for this task. To effectively represent the image patch subsets, the PAN image and the reconstructed HR green band, which is highly correlated with the blue band, are selected to learn the coupled dictionary.
- (3)
Third task: This task sharpens the MS channels that are almost outside the wavelength range covered by the PAN image, i.e., coastal, NIR1, and NIR2. As shown in Table 1, the coastal band has very low correlation with the NIR1 and NIR2 bands. Hence, this task is divided into two subtasks. The first subtask processes the coastal band, for which the reconstructed blue band is used to learn the coupled dictionary. The second subtask processes the NIR1 and NIR2 bands, for which the reconstructed red edge band is used to learn the coupled dictionary.
Then, the dictionary construction method for each subset is introduced. Let , be high-resolution images for dictionary learning. The HR images are blurred and downsampled to obtain the corresponding LR images , . Then, the HR and LR image pairs are tiled into the image patches. The patch size for the LR images is , and the overlapping size is . While the patch size for the HR images is , and the overlapping size is , where is the scale factor between MS and PAN. The image patches are arranged into vectors; hence, the coupled dictionary is constructed by the raw LR and HR image patch vectors, which are defined as and , respectively.
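The raw-patch dictionary construction described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: block averaging stands in for the unspecified blur and downsampling operators, the HR patch size and overlap are taken to be the LR values multiplied by the scale factor so that the LR and HR patches cover the same area, and all function names and default parameter values are hypothetical.

```python
import numpy as np

def blur_downsample(img, scale):
    """Block-average by `scale` (assumed stand-in for the blur + downsampling operators)."""
    rows = img.shape[0] - img.shape[0] % scale
    cols = img.shape[1] - img.shape[1] % scale
    blocks = img[:rows, :cols].reshape(rows // scale, scale, cols // scale, scale)
    return blocks.mean(axis=(1, 3))

def patch_columns(img, size, overlap):
    """Tile `img` into size x size patches with the given overlap; one patch per column."""
    step = size - overlap
    return np.stack([img[i:i + size, j:j + size].ravel()
                     for i in range(0, img.shape[0] - size + 1, step)
                     for j in range(0, img.shape[1] - size + 1, step)], axis=1)

def coupled_dictionary(hr_images, scale=4, lr_size=4, lr_overlap=2):
    """Build (D_l, D_h) whose i-th columns are the raw LR and HR patches of the same area."""
    d_l, d_h = [], []
    for hr in hr_images:
        lr = blur_downsample(hr, scale)
        # Crop the HR image so that it is exactly `scale` times the LR image.
        hr = hr[:lr.shape[0] * scale, :lr.shape[1] * scale]
        d_l.append(patch_columns(lr, lr_size, lr_overlap))
        d_h.append(patch_columns(hr, scale * lr_size, scale * lr_overlap))
    return np.hstack(d_l), np.hstack(d_h)

# Second-task style usage: PAN plus a reconstructed HR band as training images.
rng = np.random.default_rng(1)
pan, green_hr = rng.random((32, 32)), rng.random((32, 32))
D_l, D_h = coupled_dictionary([pan, green_hr])
print(D_l.shape, D_h.shape)
```

Passing a different list of HR images (for example, only the PAN image, or only a reconstructed band) is how the training set would change from task to task in this sketch.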
In short, our algorithm can be summarized in Algorithm 1.
Algorithm 1. The GRSC-ACD Pansharpening Method.
Input: LR MS image , PAN image
Initialization: Set parameters , , , and
1: Split the PS process into multiple tasks according to the relative spectral response shown in Figure 1 and the channel correlation matrix listed in Table 1
2: for do
3:   Separate all the MS bands , into image patches and form an image patch set
4:   Generate each subset , using the K-means clustering algorithm
5:   for do
6:     Learn the LR dictionary and the HR dictionary
7:     Compute the sparse coefficient matrix according to (7)
8:     Compute the HR image patch subset through (10)
9:   end for
10:  Generate the HR MS bands ,
11: end for
Output: Target HR MS image .
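The inner loop of Algorithm 1 (sparse coding of one subset, then patch reconstruction) can be sketched as follows. Several parts are assumptions made for illustration: the graph is a symmetric binary k-nearest-neighbor graph, the solver is plain proximal gradient descent (ISTA) swapped in for the feature-sign search algorithm of [59], and the function names and parameter values are hypothetical.

```python
import numpy as np

def knn_laplacian(X, k=3):
    """Laplacian L = G - W of a symmetric binary k-NN graph over the columns of X."""
    n = X.shape[1]
    k = min(k, n - 1)
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)            # a patch is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W

def grsc_objective(X, D, S, L, alpha, beta):
    """||X - D S||_F^2 + alpha * Tr(S L S^T) + beta * sum of |S| entries."""
    return (np.linalg.norm(X - D @ S) ** 2
            + alpha * np.trace(S @ L @ S.T)
            + beta * np.abs(S).sum())

def grsc_ista(X, D, L, alpha=0.1, beta=0.05, n_iter=200):
    """Minimize the GRSC objective by proximal gradient descent (ISTA)."""
    S = np.zeros((D.shape[1], X.shape[1]))
    # Lipschitz constant of the gradient of the smooth part.
    lip = 2.0 * (np.linalg.norm(D, 2) ** 2 + alpha * np.linalg.norm(L, 2))
    step = 1.0 / lip
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ S - X) + 2.0 * alpha * S @ L
        Z = S - step * grad
        S = np.sign(Z) * np.maximum(np.abs(Z) - step * beta, 0.0)  # soft threshold
    return S

def average_patches(cols, corners, size, shape):
    """Rebuild an image from patch columns by averaging the overlapped pixels."""
    acc, cnt = np.zeros(shape), np.zeros(shape)
    for col, (i, j) in zip(cols.T, corners):
        acc[i:i + size, j:j + size] += col.reshape(size, size)
        cnt[i:i + size, j:j + size] += 1.0
    return acc / np.maximum(cnt, 1.0)
```

For a subset `X_l` with dictionaries `D_l` and `D_h`, the HR patches would then be obtained as `D_h @ grsc_ista(X_l, D_l, knn_laplacian(X_l))` and placed back into the band with `average_patches`. ISTA is chosen here only because it is short and easy to verify; it converges more slowly than feature-sign search.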