4.6.1. Reduced-Resolution Experiments

The reduced-resolution experimental results of the different methods on the QB dataset are shown in Table 4. It can be clearly observed from Table 4 that the DL-based methods outperform the traditional methods on the evaluation indexes, which reflects the powerful fusion capability of deep learning. Among them, our proposed SSIN performs the best on all reduced-resolution indexes, followed by MDA-Net [46], which demonstrates the effectiveness of SSIN. Similar results also appear on the WV4 and WV2 datasets, as shown in Tables 5 and 6: SSIN achieves the best results except for Q2n on WV2.


**Table 4.** Quantitative results of different methods on the QB dataset. The best results are in bold and the second-best results are underlined.

**Table 5.** Quantitative evaluation comparison of different methods on the WV4 dataset. The best results are in bold and the second-best results are underlined.


**Table 6.** Quantitative evaluation comparison of different methods on the WV2 dataset. The best results are in bold and the second-best results are underlined.


Although the quantitative evaluation shows the excellent performance of SSIN, we also conduct a subjective visual comparison on samples from the above datasets. To demonstrate the generality of SSIN, we selected images of three different scenes: a harbor image from the QB dataset, a forest image from the WV4 dataset, and a city image from the WV2 dataset.

Figures 7–9 present the visual comparison of the different methods on the three satellite datasets. For intuitive comparison, residual maps between each fusion result and the ground truth are also presented; each residual map is obtained by averaging the absolute residuals over all bands.
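The residual-map computation described above (per-pixel mean of the absolute per-band differences) can be sketched as follows; the function name and array shapes are illustrative, not taken from the paper:

```python
import numpy as np

def residual_map(fused, gt):
    """Per-pixel residual between a fusion result and the ground truth.

    fused, gt: arrays of shape (H, W, C), one channel per spectral band.
    Returns an (H, W) map: the mean of the absolute per-band residuals,
    so brighter pixels indicate larger fusion error.
    """
    diff = np.abs(fused.astype(np.float64) - gt.astype(np.float64))
    return diff.mean(axis=2)

# Illustrative usage with random 4-band images (e.g., QB/WV4 have 4 bands).
gt = np.random.rand(64, 64, 4)
fused = gt + 0.01 * np.random.randn(64, 64, 4)  # a near-perfect fusion
res = residual_map(fused, gt)                    # shape (64, 64)
```

A map like this is what Figures 7–9 visualize alongside the fused images: a method whose residual map is uniformly dark is closer to the ground truth.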

**Figure 7.** The reduced-resolution experimental results of different methods on the QB dataset.

**Figure 8.** The reduced-resolution experimental results of different methods on the WV4 dataset.

**Figure 9.** The reduced-resolution experimental results of different methods on the WV2 dataset.

Figure 7 displays the fusion results of an image from the QB dataset. We can clearly see from Figure 7 that the results of the EXP and PRACS methods are blurred and contain serious spatial distortion compared with the ground truth (GT). The results of BDSD-PC, PNN, and MUCNN are slightly darker than the GT. The residual maps in Figure 7 show that the traditional methods deviate more from the GT, while the DL-based methods have smaller errors. Moreover, comparing the DL-based methods, our proposed SSIN is the closest to the GT, which indicates that our model performs better in spatial recovery and spectral preservation.

A visual comparison on the WV4 dataset is shown in Figure 8. As can be seen, EXP, PRACS, PNN, and MSDCNN produce very blurry images with serious spatial distortion, while BDSD-PC, GSA, and MTF-GLP introduce significant spectral distortion in the forest area. The results of MDA-Net, GGPCRN, and our proposed method are difficult to distinguish visually. The residual maps in Figure 8 further show that, as in the QB case, the DL-based methods are closer to the GT than the traditional methods. Among the DL-based methods, the residual images of MDA-Net and SSIN are closest to the GT, which is consistent with the quantitative results in Table 5. Although the residual image of MDA-Net is very close to that of SSIN, closer observation reveals that it is slightly brighter, indicating a slightly larger error.

The visual comparison on the WV2 dataset is depicted in Figure 9. It can be observed that, compared with the first two datasets, every method yields larger errors on the WV2 dataset. The reason is that the WV2 dataset has twice as many bands as the first two datasets, which makes reconstruction more difficult; this can also be seen by comparing the objective indicators across the three datasets. As the residual maps in Figure 9 show, the results of the proposed SSIN are closest to the GT. In particular, the circle in the upper-right corner of the residual image clearly shows that SSIN has the smallest error.

The above comparisons at reduced resolution demonstrate the superior performance of SSIN.
