**1. Introduction**

Due to the limitations of remote sensing satellite-imaging systems, remote sensing images with both high spatial and high spectral resolution are difficult to obtain. This problem can, in principle, be mitigated by improving the hardware, but doing so is arduous because of the strict limit on the signal-to-noise ratio of satellite products [1]. The pansharpening technique was proposed to alleviate this problem. The main purpose of pansharpening is to generate a high-spatial-resolution (HR) multispectral (MS) image by fusing a low-spatial-resolution (LR) MS image with an HR panchromatic (PAN) image, so that the result contains the spatial information of the PAN image and the spectral information of the corresponding MS image. As one of the most basic and dynamic research topics in remote sensing, pansharpening has a significant impact on many remote sensing applications, such as crop mapping [2], land cover classification [3], and target detection [4]. In recent years, with the wide application of deep learning (DL) in various computer vision tasks, DL-based pansharpening methods have developed rapidly. Inspired by SRCNN [5], Masi et al. [6] made the first attempt to use a convolutional neural network (CNN) for pansharpening; their network (PNN), a stack of three convolutional layers, achieved state-of-the-art results. Motivated by the results of the PNN, many pansharpening methods based on deep learning have emerged in recent years [7–13].

**Citation:** Nie, Z.; Chen, L.; Jeon, S.; Yang, X. Spectral-Spatial Interaction Network for Multispectral Image and Panchromatic Image Fusion. *Remote Sens.* **2022**, *14*, 4100. https://doi.org/10.3390/rs14164100

Academic Editor: Adrian Stern

Received: 28 June 2022 Accepted: 19 August 2022 Published: 21 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Although these methods have different structures and achieve the desired effect, they usually underutilize the spectral information in the MS images and the spatial information in the PAN images. Most of them concatenate the PAN and MS images at the beginning of the network and extract features from the concatenated input with a single network, which is simple and easy to implement but is not conducive to spectral-spatial information fusion. Even though some DL-based methods [14–18] are designed as multi-branch structures that extract the features of the input PAN and MS images separately, they fail to consider the interaction and mutual influence of spectral and spatial information, which hinders information transmission and conversion within the network.
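To make the single-network design criticized above concrete, the following minimal NumPy sketch builds the kind of early-fusion input such methods feed to the network: the LR MS image is upsampled to the PAN size and the two are stacked along the channel axis. The function names, the nearest-neighbour upsampling, the 4-band MS image, and the scale factor of 4 are illustrative assumptions, not details taken from the cited methods.

```python
import numpy as np

def upsample_nearest(ms: np.ndarray, scale: int) -> np.ndarray:
    """Upsample an (H, W, C) image by an integer factor via pixel repetition.

    Nearest-neighbour is used only to keep the sketch dependency-free;
    real pipelines typically use bicubic or learned upsampling.
    """
    return ms.repeat(scale, axis=0).repeat(scale, axis=1)

def early_fusion_input(ms_lr: np.ndarray, pan_hr: np.ndarray, scale: int = 4) -> np.ndarray:
    """Stack the upsampled MS bands and the PAN band along the channel axis."""
    ms_up = upsample_nearest(ms_lr, scale)
    if ms_up.shape[:2] != pan_hr.shape[:2]:
        raise ValueError("upsampled MS and PAN must share the same spatial size")
    # PAN is single-band, so add a trailing channel axis before concatenating.
    return np.concatenate([ms_up, pan_hr[..., None]], axis=-1)

# Hypothetical sizes: a 16x16 MS image with 4 bands and a 64x64 PAN image.
rng = np.random.default_rng(0)
ms_lr = rng.random((16, 16, 4))
pan_hr = rng.random((64, 64))
x = early_fusion_input(ms_lr, pan_hr)
print(x.shape)  # (64, 64, 5)
```

After this step the network sees one 5-channel tensor and can no longer treat spectral and spatial content as separate streams, which is the limitation the paragraph above points to.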

In view of the above issues, we propose a spectral-spatial interaction network (SSIN) for pansharpening. Specifically, to fully extract the features of the MS and PAN images, SSIN is designed as a dual-branch structure, in which the spatial branch extracts spatial information from the PAN image and the spectral branch extracts spectral information from the MS image. Inspired by [19], we take into account information guidance between the branches and design a spectral-spatial attention (SSA) module to fully extract the advantageous information from the two branches. Moreover, we introduce an information interaction block (IIB) into the network for information interaction between the spectral branch and the spatial branch. Furthermore, we assemble the IIB and SSA into an information interaction group (IIG), which serves as the basic structure of our network. Finally, we use a long skip connection to pass the upsampled MS image directly to the end of the network; many DL-based methods [8,20–22] have demonstrated the effectiveness of this approach.
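The overall data flow described above (two branches, repeated cross-branch exchange, and a long skip connection) can be sketched as follows. This is only a structural illustration under strong assumptions: the interaction step here is a plain channel concatenation followed by a 1x1 convolution, which is a placeholder and not the IIB or SSA design, and all names, layer sizes, and the number of groups are hypothetical.

```python
import numpy as np

def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Pointwise convolution: mixes channels at every pixel, (H, W, C) @ (C, F)."""
    return x @ w

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def init_params(bands: int, feats: int, n_groups: int, rng) -> dict:
    """Random weights for the placeholder layers (hypothetical sizes)."""
    def w(c_in, c_out):
        return rng.normal(0.0, 0.1, size=(c_in, c_out))
    return {
        "spec_in": w(bands, feats),   # spectral branch stem (MS input)
        "spat_in": w(1, feats),       # spatial branch stem (PAN input)
        "spec_mix": [w(2 * feats, feats) for _ in range(n_groups)],
        "spat_mix": [w(2 * feats, feats) for _ in range(n_groups)],
        "out": w(2 * feats, bands),   # maps fused features back to MS bands
    }

def dual_branch_forward(ms_up: np.ndarray, pan: np.ndarray, params: dict) -> np.ndarray:
    spec = relu(conv1x1(ms_up, params["spec_in"]))
    spat = relu(conv1x1(pan[..., None], params["spat_in"]))
    # Placeholder for the interaction groups: each branch is updated from
    # the features of both branches. This is NOT the paper's IIB/SSA.
    for w_spec, w_spat in zip(params["spec_mix"], params["spat_mix"]):
        both = np.concatenate([spec, spat], axis=-1)
        spec, spat = relu(conv1x1(both, w_spec)), relu(conv1x1(both, w_spat))
    residual = conv1x1(np.concatenate([spec, spat], axis=-1), params["out"])
    # Long skip connection: the network only has to predict the detail
    # that is missing from the upsampled MS image.
    return ms_up + residual

rng = np.random.default_rng(0)
ms_up = rng.random((64, 64, 4))   # MS image already upsampled to PAN size
pan = rng.random((64, 64))
params = init_params(bands=4, feats=8, n_groups=2, rng=rng)
fused = dual_branch_forward(ms_up, pan, params)
print(fused.shape)  # (64, 64, 4)
```

Because of the long skip connection, a network whose final layer outputs zeros simply returns the upsampled MS image, so training starts from a sensible baseline. The 1x1 convolutions keep the sketch short but capture no spatial context; a real implementation would use spatial kernels in both branches.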

In summary, the main contributions of this article are as follows:


The remainder of this paper is organized as follows. Section 2 reviews related work, and Section 3 describes the proposed SSIN and each part of the network in detail. Section 4 presents the data sets, evaluation indexes, ablation study, parameter analysis, and comparison with state-of-the-art (SOTA) methods on three data sets. Section 5 presents the efficiency study. Finally, Section 6 draws conclusions.
