**4. Experiments**

#### *4.1. Data Sets and Implementation Details*

We conduct experiments on three data sets collected by the GeoEye-1, WorldView-2, and QuickBird satellites, respectively. Each data set is split into two nonoverlapping subsets: a training set and a testing set. Each sample consists of an MS and PAN image pair, with the PAN image of size 64 × 64 and the MS image of size 16 × 16. To verify the generalization of the proposed method, we also perform pansharpening on a QuickBird scene acquired on a different day in the full-resolution experiment. More detailed information on the three data sets is reported in Table 1.


**Table 1.** Information of the three data sets.

As for the training of our proposed PWNet, since ideal HRMS images are unavailable, we follow Wald's protocol [46], as in other CNN-based pansharpening methods [32–34]: the training input MS and PAN pairs are generated by downsampling both the original PAN and MS images with scale factor *r* = 4 (i.e., the resolution of the MS and PAN images is reduced by applying the MTF-matched low-pass filters [21]), and the original MS images are then treated as the target outputs. A minimal sketch of this degradation step is given below.
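The following NumPy sketch illustrates the protocol under the simplifying assumption that the MTF-matched low-pass filters of [21] are replaced by a plain Gaussian; the function names and the filter width are ours for illustration, not part of the released code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(img, r=4, sigma=1.0):
    """Low-pass filter each band and decimate by factor r.
    The Gaussian is only a stand-in for the MTF-matched filters of [21]."""
    img = np.atleast_3d(img).astype(np.float32)
    blurred = np.stack(
        [gaussian_filter(img[..., b], sigma) for b in range(img.shape[-1])],
        axis=-1)
    return blurred[::r, ::r, :]

def wald_training_pair(ms, pan, r=4):
    """Wald's protocol: degrade both observations by r and keep the
    original MS image as the reference (target) output."""
    ms_in  = degrade(ms,  r)   # reduced-resolution MS input
    pan_in = degrade(pan, r)   # reduced-resolution PAN input
    return ms_in, pan_in, ms   # the original MS acts as the HRMS target
```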

The PWNet is implemented in TensorFlow and trained on an Intel(R) Core(TM) i5-4210U CPU. We use the Adam algorithm [47] with an initial learning rate of 0.001 to optimize the network parameters, and we set the maximum number of epochs to 1000 and the mini-batch size to 32. It takes about 1 h to train our network.
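These settings correspond to the training configuration sketched below in modern tf.keras form for clarity; the model and data arguments are placeholders for the PWNet of Section 3 and the Wald-protocol pairs above, and the MSE loss is an assumption rather than a detail stated in this section.

```python
import tensorflow as tf

def train_pwnet(model, train_inputs, train_targets):
    """Training configuration described in the text; `model`, `train_inputs`,
    and `train_targets` stand in for the actual PWNet and data pipeline."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # initial learning rate
        loss="mse")                                                # loss function assumed here
    model.fit(train_inputs, train_targets, batch_size=32, epochs=1000)
    return model
```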

We first evaluate the methods at a reduced resolution. In addition to *visual analysis* of the experimental results, the proposed PWNet and the compared methods are also evaluated by five widely used quantitative metrics, namely, the universal image quality index [48] averaged over the bands (Q\_avg) and its four-band extension Q4 [3], the *spectral angle mapper* (SAM) [49], the *Erreur Relative Globale Adimensionnelle de Synthèse* (ERGAS) [50], and the *spatial correlation coefficient* (SCC) [46]. The closer Q\_avg, Q4, and SCC are to one, the better the quality of the fused results, while the lower the SAM and ERGAS, the better the fusion quality.
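As a concrete reference, the sketch below computes SAM and ERGAS from their standard definitions; small implementation choices (angle units, handling of zero-mean bands) may differ from the evaluation toolbox actually used.

```python
import numpy as np

def sam(ref, fused, eps=1e-12):
    """Spectral angle mapper in degrees, averaged over all pixels.
    ref, fused: (H, W, B) arrays."""
    dot = np.sum(ref * fused, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(fused, axis=-1) + eps
    angles = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    return np.degrees(angles).mean()

def ergas(ref, fused, r=4):
    """Relative dimensionless global error in synthesis; r is the PAN/MS
    resolution ratio (4 for these data sets)."""
    rmse_per_band = np.sqrt(np.mean((ref - fused) ** 2, axis=(0, 1)))
    mean_per_band = np.mean(ref, axis=(0, 1))
    return 100.0 / r * np.sqrt(np.mean((rmse_per_band / mean_per_band) ** 2))
```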

We also evaluate the methods at full resolution. In this case, the *quality with no-reference index* (QNR) [51] and its spatial index (*DS*) and spectral index (*Dλ*) are employed for the quantitative assessment. It should be pointed out that quantitative assessment at full resolution is challenging, since these indexes (i.e., QNR, *DS*, and *Dλ*) cannot be computed against the unattainable ground truth and instead rely heavily on the original MS and PAN images [43]. They thus quantify the similarity of certain components of the fused images to the low-resolution observations, which biases the index estimates. For this reason, some methods can generate images with high QNR values but poor image quality [52].
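For reference, QNR aggregates the two distortion indexes as (standard definition, with both exponents commonly set to one; these exponents are unrelated to the hyper-parameter *α* of PWNet):

$$\mathrm{QNR} = \left(1 - D_{\lambda}\right)^{\alpha}\left(1 - D_{S}\right)^{\beta},\qquad \alpha = \beta = 1,$$

so QNR approaches one only when both the spectral distortion *Dλ* and the spatial distortion *DS*, each measured against the original MS and PAN observations, approach zero.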

In the following, we carry out five sets of experiments to comprehensively analyze the proposed PWNet, namely, the effect of the hyper-parameter *α*, the number of CS-based and MRA-based methods, the number of weight map channels, and the quantitative, visual, and running-time comparisons with the CS-based, MRA-based, and learning-based methods at reduced and full resolution.

#### *4.2. Analysis of the Hyper-Parameter α*

There is a hyper-parameter *α* in our proposed PWNet, which balances the contributions of the CS-based and MRA-based methods. In this experiment, we analyze this parameter to optimize the performance of PWNet. We fix the number of CS-based and MRA-based methods to six and vary *α* from 0.1 to 1 with an interval of 0.1. The results are shown in Table 2. As we can see, the PWNet attains consistently good performance when *α* varies from 0.7 to 0.9. Specifically, the best results are obtained for *α* = 0.7. It is worth noting that when *α* approaches 1, the quantitative indexes become worse. Thus, *α* = 0.7 is a relatively good choice for the following experiments.
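One illustrative way to read this balance, assuming *α* acts as a convex mixing coefficient between the outputs of the CS branch and the MRA branch (the exact formulation is given in Section 3, so this is only a sketch of the intuition), is

$$\hat{X} = \alpha\,\hat{X}_{\mathrm{CS}} + (1-\alpha)\,\hat{X}_{\mathrm{MRA}},$$

under which *α* = 0.7 would weight the CS branch somewhat more heavily than the MRA branch, while *α* = 1 would discard the MRA branch entirely, consistent with the degradation observed in Table 2.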


**Table 2.** Quantitative results obtained by PWNet with different *α*.

#### *4.3. Impact of the Number of the CS-Based and MRA-Based Methods*

This experiment investigates the influence of the number of CS-based and MRA-based methods under the condition *nCS* = *nMRA*, both for simplicity and to keep the balance between the CS class and the MRA class. The number of methods to be averaged is important for balancing the spectral and spatial information from the LRMS and high-resolution PAN images. Too few methods might extract features incompletely and lead to poor performance, while too many would increase the computational burden during testing. We therefore seek a trade-off value based on performance and running time, and limit the range of *nCS* (or *nMRA*) to between 5 and 8.

Table 3 gives the quantitative results and the running time (in seconds) when the number of averaged methods varies from 5 to 8. When *nCS* and *nMRA* equal 5, the selected CS-based methods are PCA [11], GIHS [10], Brovey [12], GS [13], and GSA [15], and the selected MRA-based methods are HPF [18], SFIM [19], AWLP [24], MTF-GLP-HPM [22], and MTF-GLP-CBD [38]. When both *nCS* and *nMRA* are equal to 6, PRACS [17] and ATWT-M3 [39] are added to the CS-based and MRA-based methods, respectively. When *nCS* and *nMRA* are equal to 7, we add BDSD [14] to the CS weight network and Indusion [23] to the MRA weight network, and BDSD-PC [16] and MTF-GLP [21] are added to the CS and MRA weight networks as method modules when *nCS* and *nMRA* are equal to 8. As reported in Table 3, the performance of the proposed PWNet improves as the number of averaged methods increases from 5 to 7, while it decays when *nCS* and *nMRA* are equal to 8. This reveals that using too few averaged methods reduces the performance of the PWNet, while further increasing their number does not continuously bring improvements but does incur more computation. In principle, our proposed PWNet is data-driven and can thus automatically weight different kinds of CS-based and/or MRA-based methods. In practice, we suggest two criteria for selecting the CS-based and MRA-based methods for the proposed PWNet. First, the number of CS-based methods and the number of MRA-based methods should be equal, in order to keep the contributions of the CS class and the MRA class balanced. Second, to further improve the performance and robustness of the PWNet, we suggest selecting the CS-based and MRA-based methods according to their performances reported in [3]. A sketch of the resulting model-average operation is given below.
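The sketch below illustrates the model-average operation for *nCS* = *nMRA* = 7: each traditional method's HRMS estimate is multiplied element-wise by the weight map predicted for it and the results are summed. Whether and how the weight maps are normalized, and how the hyper-parameter *α* enters, follow the definitions in Section 3; here we simply assume the maps are already the final fusion weights.

```python
import numpy as np

def model_average(method_outputs, weight_maps):
    """Pixel-wise weighted average of the traditional methods' HRMS estimates.
    method_outputs: (n, H, W, B) stack of CS-based and MRA-based results.
    weight_maps:    (n, H, W, 1) one shared map per method (broadcast over bands),
                    assumed here to be the final fusion weights."""
    return np.sum(weight_maps * method_outputs, axis=0)

# Illustrative shapes: 7 CS-based + 7 MRA-based methods, 4 MS bands.
n, H, W, B = 14, 64, 64, 4
outputs = np.random.rand(n, H, W, B)    # would be the PCA, GIHS, ..., MTF-GLP-CBD results
weights = np.random.rand(n, H, W, 1)    # would be predicted by the two weight networks
fused = model_average(outputs, weights)  # (H, W, B) pansharpened estimate
```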


**Table 3.** Quantitative results and running time with different numbers of averaged methods.

#### *4.4. Impact of the Number of Weight Map Channels*

To reduce the number of parameters and the computational cost, we set the weight map for each CS-based or MRA-based method to be a single channel. That is, all MS bands of the HRMS image obtained by a CS-based or MRA-based method share the same weight map. In general, model capacity increases with the number of model parameters. We therefore conduct experiments with different numbers of output weight map channels to verify whether the capacity of our model suffers from this reduction, as illustrated in the sketch below.
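The difference between the two settings lies only in the shape of the predicted weight maps and hence in the number of output channels the network must learn; the shapes below are illustrative (14 methods, 4 bands), reusing the averaging from the sketch above.

```python
import numpy as np

n, H, W, B = 14, 64, 64, 4                 # 7 CS-based + 7 MRA-based methods, 4 bands
outputs = np.random.rand(n, H, W, B)       # HRMS estimates of the n methods

# Shared setting: one map per method, reused for every band (14 output channels).
w_shared = np.random.rand(n, H, W, 1)
fused_shared = np.sum(w_shared * outputs, axis=0)    # broadcast over the MS bands

# Per-band setting: four maps per method (4 * 14 = 56 output channels to learn).
w_perband = np.random.rand(n, H, W, B)
fused_perband = np.sum(w_perband * outputs, axis=0)  # each band weighted separately
```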

The results of PWNet with one shared weight map channel and with four different weight map channels for each CS-based or MRA-based method are reported in Table 4. As we can see, the PWNet with one shared weight map channel attains consistently good performance in terms of the five commonly used metrics on the three satellite data sets, while the PWNet with four different weight map channels per method performs relatively poorly under the same training conditions. This may be because the excessive number of parameters leads to under-fitting of the four-channel PWNet under the same training budget. This further verifies the advantage of the PWNet with one shared weight map channel, which lowers the training difficulty.


**Table 4.** Results of the proposed PWNet with one shared weight map channel or four different weight map channels for each CS-based or MRA-based method. The best results are highlighted in bold.

#### *4.5. Comparison with the CS-Based and MRA-Based Methods*

A key question for the proposed PWNet is whether its fusion result is better than that of each participating method. Only if the answer is yes can we claim that the proposed PWNet produces appropriate weight maps for each CS-based or MRA-based method, so that the methods involved in the model average complement each other and the result is improved. Here we set *nCS* and *nMRA* to 7, since this setting was shown above to offer a good balance between performance and running time. The compared pansharpening methods include seven methods belonging to the CS class, namely, PCA [11], GIHS [10], Brovey [12], BDSD [14], GS [13], GSA [15], and PRACS [17], and seven methods belonging to the MRA class, namely, HPF [18], SFIM [19], Indusion [23], AWLP [24], ATWT-M3 [39], MTF-GLP-HPM [22], and MTF-GLP-CBD [38]. All methods follow the experimental settings recommended by their authors.

We first inspect the visual quality of the pansharpening results. Figures 5–7 present the pansharpened images on the three data sets, obtained by our proposed PWNet and the other fourteen methods. As we can see from these figures, the CS-based methods produce relatively sharp spatial features in Figure 5a–g, but they suffer from spectral distortions, as highlighted in the small window, where the trees around the buildings in Figure 5 and the bare soil in Figure 6 are a little darker than in the ground truth. In contrast, fewer spectral distortions appear in the results of the MRA-based methods; however, they show poor spatial rendering, as they appear slightly blurred in Figure 6h–n, especially the results of AWLP and ATWT-M3. Compared with the CS-based and MRA-based methods, the proposed PWNet achieves results closer to the ground truth. From the enlarged area in the upper left corner of Figure 7o, we can see that PWNet performs best in both improving the spatial details and keeping the spectral fidelity of the roads and trees. In summary of the visual analysis, the proposed PWNet can debias the spectral and spatial distortions of the CS-based and MRA-based methods and effectively combine the advantages of these two types of methods, thus showing better visual performance.

Besides visual inspection, we apply numerical metrics to assess the quality of the pansharpened images. Tables 5–7 report the comparison of the CS-based methods, the MRA-based methods, and the proposed PWNet on the three data sets. As can be seen from these tables, for the WorldView-2 and GeoEye-1 data sets the BDSD method shows the best performance among the fourteen traditional methods, while AWLP achieves the best performance among the CS-based and MRA-based methods for the QuickBird data set. None of the CS-based and MRA-based methods systematically obtains the best performance on all three data sets. The proposed PWNet yields results with the best spatial and spectral accuracy over the CS-based and MRA-based methods on all three data sets. This proves once again that the proposed method can combine the advantages of the two types of methods to produce an optimal result.

**Figure 5.** Visual comparison of the CS-based and MRA-based methods and the proposed PWNet method on the WorldView-2 images, (**a**) PAN; (**b**) LRMS; (**c**) PCA; (**d**) GIHS; (**e**) Brovey; (**f**) BDSD; (**g**) GS; (**h**) GSA; (**i**) PRACS; (**j**) HPF; (**k**) SFIM; (**l**) Indusion; (**m**) AWLP; (**n**) ATWT-M3; (**o**) MTF-GLP-HPM; (**p**) MTF-GLP-CBD; (**q**) PWNet (ours); (**r**) Ground Truth.

**Figure 6.** Visual comparison of the CS-based and MRA-based methods and the proposed PWNet method on the GeoEye-1 images, (**a**) PAN; (**b**) LRMS; (**c**) PCA; (**d**) GIHS; (**e**) Brovey; (**f**) BDSD; (**g**) GS; (**h**) GSA; (**i**) PRACS; (**j**) HPF; (**k**) SFIM; (**l**) Indusion; (**m**) AWLP; (**n**) ATWT-M3; (**o**) MTF-GLP-HPM; (**p**) MTF-GLP-CBD; (**q**) PWNet (ours); (**r**) Ground Truth.

**Figure 7.** Visual comparison of the CS-based and MRA-based methods and the proposed PWNet method on the QuickBird images, (**a**) PAN; (**b**) LRMS; (**c**) PCA; (**d**) GIHS; (**e**) Brovey; (**f**) BDSD; (**g**) GS; (**h**) GSA; (**i**) PRACS; (**j**) HPF; (**k**) SFIM; (**l**) Indusion; (**m**) AWLP; (**n**) ATWT-M3; (**o**) MTF-GLP-HPM; (**p**) MTF-GLP-CBD; (**q**) PWNet (ours); (**r**) Ground Truth.


**Table 5.** Quantitative comparison of the CS-based and MRA-based methods and the proposed PWNet method on the WorldView-2 images. The best and second best results are highlighted in bold.

**Table 6.** Quantitative comparison of the CS-based and MRA-based methods and the proposed PWNet method on the GeoEye-1 images. The best and second best results are highlighted in bold.


For interpretability, we also visualize some of the weight maps of the selected traditional methods used in the PWNet, as shown in Figures 8–10. It can be seen from these figures that the edges of the road and the buildings are extracted by the weight maps of both the CS-based and MRA-based methods. Typically, for the WorldView-2 and GeoEye-1 data sets, the BDSD method plays an important role, as its weight map is clearer than any other, as shown in Figures 8b and 9b, while the AWLP and MTF-GLP-CBD methods contribute somewhat more to the averaged results of the PWNet for the QuickBird data set, as can be seen from Figure 10d,e. As for the PCA method, its weight maps are all black, which means that the PCA method makes almost no contribution to the final result on any of the three tested data sets. This observation is consistent with the previous visual inspection in Figures 5–7 and the quantitative results reported in Tables 5–7. It demonstrates the adaptive characteristic of our PWNet, as it accounts for the different performances of the selected CS-based and MRA-based methods. From these experimental results, we can conclude that the proposed PWNet is adaptive and robust across different data sets.


**Table 7.** Quantitative comparison of the CS-based and MRA-based methods and the proposed PWNet method on the QuickBird images. The best and second best results are highlighted in bold.

**Figure 8.** Visualization of weight maps on the WorldView-2 images, (**a**) PCA; (**b**) BDSD; (**c**) PRACS; (**d**) AWLP; (**e**) MTF-GLP-HPM; (**f**) MTF-GLP-CBD.

**Figure 9.** Visualization of weight maps on the GeoEye-1 images, (**a**) PCA; (**b**) BDSD; (**c**) PRACS; (**d**) AWLP; (**e**) MTF-GLP-HPM; (**f**) MTF-GLP-CBD.

**Figure 10.** Visualization of weight maps on the QuickBird images, (**a**) PCA; (**b**) BDSD; (**c**) PRACS; (**d**) AWLP; (**e**) MTF-GLP-HPM; (**f**) MTF-GLP-CBD.

#### *4.6. Comparison with the CNN-Based Methods*

So far, the proposed PWNet has shown its superiority over the selected traditional CS-based and MRA-based methods. In this subsection, we compare it with CNN-based methods to further verify its effectiveness. Three other *state-of-the-art* (SOTA) methods, namely, *pansharpening by convolutional neural networks* (PNN) [32], the *deep residual pan-sharpening neural network* (DRPNN) [42], and the *multiscale and multidepth convolutional neural network* (MSDCNN) [43], are used for comparison. All the compared methods follow the experimental settings of their original papers. Note that the source code of PNN is provided by the original authors, and the codes of DRPNN and MSDCNN are available at https://github.com/Decri.

Figures 11–13 show some example regions selected from the pansharpened images on the three test data sets. In Figure 11, by magnifying the selected area in the image three times, it can be clearly seen that the other CNN-based methods appear slightly blurred compared with the ground truth, while the edges produced by our proposed PWNet are clearer and more natural, as shown in the zoomed areas. Although MSDCNN, DRPNN, and PNN can produce results with few spatial distortions, they sometimes suffer from slight spectral distortions, as shown in Figure 12b–d, where the bare soil is a little darker than in the reference. This can also be seen in Figure 13b, where the buildings in the scene are dark yellow while they are white in the ground truth shown in Figure 13f. Compared with the other CNN-based methods, the proposed PWNet shows a good balance between injecting spatial details and preserving the original spectral information; this is clearly visible in the vegetated areas and textures (e.g., edges of the roofs and roads), as shown in Figures 11e–13e.

In addition, Table 8 shows the quantitative results obtained by the compared CNN-based methods and our proposed PWNet on the three test data sets. It should be pointed out that, for each test experiment, one test sample is chosen randomly from the test data set rather than cherry-picked; thus, the results listed in Table 8 and in Tables 5–7 are based on different PAN and MS image pairs and therefore differ quantitatively. For better comparison, the best results among the four methods are highlighted in boldface. According to this table, the performance of the proposed PWNet is better than that of the other three CNN-based methods in terms of the five indexes.

**Figure 11.** Visual comparison of the CNN-based methods on the WorldView-2 data set, (**a**) LRMS; (**b**) PNN; (**c**) DRPNN; (**d**) MSDCNN; (**e**) PWNet (ours); (**f**) Ground Truth.

**Figure 12.** Visual comparison of the CNN-based methods on the GeoEye-1 data set, (**a**) LRMS; (**b**) PNN; (**c**) DRPNN; (**d**) MSDCNN; (**e**) PWNet (ours); (**f**) Ground Truth.

**Figure 13.** Visual comparison of the CNN-based methods on the QuickBird data set, (**a**) LRMS; (**b**) PNN; (**c**) DRPNN; (**d**) MSDCNN; (**e**) PWNet (ours); (**f**) Ground Truth.

**Table 8.** Quantitative comparison of the CNN-based methods on three test data sets. The best results are highlighted in bold.


#### *4.7. Comparison at Full Resolution*

The comparison results on the three tested images at full resolution are shown in Figures 14–16 and Table 9. As we can see from the table, for the WorldView-2 and GeoEye-1 data sets the DRPNN and PNN methods, respectively, show the best performance, while the proposed PWNet holds the second-best position for all three data sets. On the whole, the CNN-based methods perform better than the traditional methods (i.e., the CS-based and MRA-based methods). By visual inspection, the PNN, DRPNN, and MSDCNN methods tend to produce blurred results, while the proposed PWNet is able to enhance the spatial quality and shows clearly sharper fusion results, as shown in panel (r) of Figures 14–16. In summary, compared with the other methods at full resolution, the proposed PWNet consistently reconstructs sharper HRMS images with less spectral and spatial distortion.


**Figure 14.** Visual comparison of different methods on the WorldView-2 data set at full resolution, (**a**) PCA; (**b**) GIHS; (**c**) Brovey; (**d**) BDSD; (**e**) GS; (**f**) GSA; (**g**) PRACS; (**h**) HPF; (**i**) SFIM; (**j**) Indusion; (**k**) AWLP; (**l**) ATWT-M3; (**m**) MTF-GLP-HPM; (**n**) MTF-GLP-CBD; (**o**) PNN; (**p**) DRPNN; (**q**) MSDCNN; (**r**) PWNet (ours).



**Figure 15.** Visual comparison of different methods on the GeoEye-1 data set at full resolution, (**a**) PCA; (**b**) GIHS; (**c**) Brovey; (**d**) BDSD; (**e**) GS; (**f**) GSA; (**g**) PRACS; (**h**) HPF; (**i**) SFIM; (**j**) Indusion; (**k**) AWLP; (**l**) ATWT-M3; (**m**) MTF-GLP-HPM; (**n**) MTF-GLP-CBD; (**o**) PNN; (**p**) DRPNN; (**q**) MSDCNN; (**r**) PWNet (ours).

**Figure 16.** Visual comparison of different methods on the QuickBird data set at full resolution, (**a**) PCA; (**b**) GIHS; (**c**) Brovey; (**d**) BDSD; (**e**) GS; (**f**) GSA; (**g**) PRACS; (**h**) HPF; (**i**) SFIM; (**j**) Indusion; (**k**) AWLP; (**l**) ATWT-M3; (**m**) MTF-GLP-HPM; (**n**) MTF-GLP-CBD; (**o**) PNN; (**p**) DRPNN; (**q**) MSDCNN; (**r**) PWNet (ours).


**Table 9.** Performance comparison on three test data sets at full resolution. The best and second best results are highlighted in bold and underlined, respectively.

#### *4.8. Running Time Analysis*

In this subsection, we compare the running time of the proposed method with the others on a 64 × 64 LRMS and 256 × 256 PAN image pair. The experiments are performed with MATLAB R2016b on the same platform with a Core i5-4210U/1.7 GHz/4 GB configuration. The running times of the different methods are listed in Table 10, in which time is measured in seconds. From this table, it can be seen that DRPNN is the most time-consuming method, because it has more hidden layers than the other CNN-based methods. In addition, MSDCNN needs slightly more time to obtain the fusion result than the proposed PWNet. In short, the proposed PWNet is more efficient than the other CNN-based methods because it has fewer hidden layers and only outputs weight maps rather than directly producing an estimated HRMS image.


**Table 10.** Running time comparison of different methods (in seconds).
