*3.2. Network Architecture*

**Pansharpening Weight Network (PWNet):** The key to model averaging is assigning proper weights to the different models according to their performance. Unlike traditional model averaging methods, which rely on hand-crafted designs to assign weights, we resort to a neural network that generates the weights adaptively. The proposed PWNet is composed of two subnets: the *CS weight network* and the *MRA weight network*. Each subnet takes the original LRMS and/or PAN images as input. Similar to [33], a high-pass filter is first applied in order to preserve edges and details. For the CS weight subnet, the high-pass filtered MS image is upsampled through a *transposed convolution*, concatenated with the high-pass filtered PAN image, and then fed into four *residual blocks* [44]. In contrast, the MRA weight subnet passes only the high-pass filtered PAN image into the subsequent residual blocks. Each residual block consists of three convolutional layers, each with a learnable 3 × 3 filter and a *rectified linear unit* (ReLU) activation function [45]. To generate weight maps, the activation function of the output layer is set to a *softmax* function. For computational efficiency, and to keep the proportions between each pair of MS bands unchanged, the PWNet outputs only one weight map for each CS-based or MRA-based method to achieve a *pixel-wise* aggregation, as discussed below and sketched in the code example that follows. The detailed architecture of PWNet for generating the weight maps is shown in Figure 4.
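To make the architecture concrete, the following is a minimal PyTorch sketch of the two weight subnets. The layer width (32 channels), the box-filter high-pass, the ×4 scale ratio, and the 4-band MS input are illustrative assumptions, not values taken from the paper; only the overall structure (high-pass inputs, transposed-convolution upsampling, residual blocks of three 3 × 3 convolutions with ReLU, and a softmax output) follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def high_pass(x, k=5):
    """G(.): a simple high-pass filter, here the residual of a box-filter low-pass.

    The kernel size k=5 is an illustrative assumption.
    """
    pad = k // 2
    low = F.avg_pool2d(F.pad(x, (pad,) * 4, mode='replicate'), k, stride=1)
    return x - low

class ResidualBlock(nn.Module):
    """Three 3x3 convolutional layers, each followed by ReLU, plus a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return x + self.body(x)

class WeightSubnet(nn.Module):
    """Maps high-pass inputs to n_methods weight maps, softmax-normalized per pixel."""
    def __init__(self, in_channels, n_methods, width=32, n_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(in_channels, width, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(width, n_methods, 3, padding=1)
    def forward(self, x):
        w = self.tail(self.blocks(F.relu(self.head(x))))
        return torch.softmax(w, dim=1)  # at every pixel, the weights sum to one

class PWNet(nn.Module):
    def __init__(self, ms_bands=4, n_cs=3, n_mra=3, scale=4):
        super().__init__()
        # Transposed convolution upsamples the high-pass MS to the PAN resolution.
        self.upsample = nn.ConvTranspose2d(ms_bands, ms_bands,
                                           kernel_size=2 * scale, stride=scale,
                                           padding=scale // 2)
        # CS subnet sees high-pass MS (upsampled) concatenated with high-pass PAN.
        self.cs_subnet = WeightSubnet(ms_bands + 1, n_cs)
        # MRA subnet sees the high-pass PAN image only.
        self.mra_subnet = WeightSubnet(1, n_mra)
    def forward(self, hp_pan, hp_ms):
        w_cs = self.cs_subnet(torch.cat([self.upsample(hp_ms), hp_pan], dim=1))
        w_mra = self.mra_subnet(hp_pan)
        return w_cs, w_mra
```

Because the output layer is a softmax over the method dimension, the $n_{CS}$ (or $n_{MRA}$) weights form a convex combination at every pixel, which is what makes the aggregation below a pixel-wise model average.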

**Figure 3.** Block diagram of the proposed PWNet for pansharpening. The proposed method can be divided into three independent parts, i.e., the weight maps network, the CS inference modules, and the MRA inference modules. The weight maps network takes the reduced MS and PAN images as inputs and outputs a weight map for each of the CS-based and MRA-based methods. At the same time, the CS inference modules and the MRA inference modules generate HRMS images according to the specific pansharpening methods. Finally, the estimated HRMS image is obtained by averaging all HRMS images estimated by the selected CS-based and MRA-based methods, using the weight maps generated by the weight maps network.

**Figure 4.** The architecture of the proposed PWNet for generating the weight maps.

**The CS-based Result Average:** Suppose we employ $n_{CS}$ CS-based methods in our PWNet; then, for the $i$th CS-based method, the estimated HRMS image is

$$\left[ \widehat{MS}_{CS_{i1}}, \widehat{MS}_{CS_{i2}}, \cdots, \widehat{MS}_{CS_{iN}} \right] = CS_i(P, MS) \tag{5}$$

$$i = 1, 2, \cdots, n_{CS} \tag{6}$$

where $\widehat{MS}_{CS_{ik}}$, $k = 1, 2, \cdots, N$, is the $k$th MS band and $CS_i(\cdot)$ denotes the $i$th CS-based method. From the CS weight network, we get adaptive weight maps for the employed CS-based methods as:

$$\left[ W_{CS_1}, W_{CS_2}, \cdots, W_{CS_{n_{CS}}} \right] = f\left( G(P), G(MS); \theta_{CS} \right) \tag{7}$$

where $W_{CS_i}$ is the $i$th output weight map generated by the CS weight network, $f(G(P), G(MS); \theta_{CS})$ denotes the CS weight network, $\theta_{CS}$ is its parameter set, and $G(\cdot)$ is the high-pass filter. Finally, we conduct pixel-wise multiplication between each estimated HRMS image and its corresponding weight map $W_{CS_i}$ and sum the products to get the CS-based result average, i.e., for each band, the averaged result is

$$\widehat{MS}_{CS_k} = \sum_{i=1}^{n_{CS}} \widehat{MS}_{CS_{ik}} \odot W_{CS_i} \tag{8}$$

$$k = 1, \ldots, N, \tag{9}$$

where $\widehat{MS}_{CS_k}$ denotes the $k$th band of the CS-based module's averaged result and $\odot$ denotes point-wise multiplication. A sketch of this pixel-wise aggregation is given below.
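Equations (8) and (9) amount to a single broadcasted multiply-and-sum over the method dimension. The tensor layout in the following sketch is an assumption for illustration, continuing the earlier PyTorch snippet.

```python
import torch

def average_cs_results(hrms_stack, w_cs):
    """Pixel-wise model averaging, Eqs. (8)-(9).

    hrms_stack: (n_cs, B, N, H, W) HRMS estimates from the n_cs CS-based methods.
    w_cs:       (B, n_cs, H, W)    weight maps from the CS weight subnet.
    Returns:    (B, N, H, W)       the averaged CS result, one map per band.
    """
    # Broadcast each method's single weight map across all N bands, which
    # keeps the proportions between the MS bands unchanged.
    w = w_cs.permute(1, 0, 2, 3).unsqueeze(2)   # (n_cs, B, 1, H, W)
    return (hrms_stack * w).sum(dim=0)          # sum over the n_cs methods
```

Since the softmax weights sum to one at every pixel, each output pixel is a convex combination of the corresponding pixels of the $n_{CS}$ candidate HRMS images.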

**The MRA-based Result Average:** Consistent with the procedure of the CS-based module, the averaged results for the $n_{MRA}$ MRA-based methods are given as

$$\widehat{MS}_{MRA_k} = \sum_{i=1}^{n_{MRA}} \widehat{MS}_{MRA_{ik}} \odot W_{MRA_i} \tag{10}$$

$$k = 1, \ldots, N, \tag{11}$$

where $\widehat{MS}_{MRA_{ik}}$ and $W_{MRA_i}$ are the $k$th HRMS band obtained by the $i$th MRA-based method and the $i$th output weight map generated by the MRA weight network, which are given, respectively, as

$$\left[ W_{MRA_1}, W_{MRA_2}, \ldots, W_{MRA_{n_{MRA}}} \right] = g\left( G(P); \theta_{MRA} \right) \tag{12}$$

and

$$\left[ \widehat{MS}_{MRA_{i1}}, \widehat{MS}_{MRA_{i2}}, \ldots, \widehat{MS}_{MRA_{iN}} \right] = MRA_i(P, MS), \quad i = 1, 2, \ldots, n_{MRA} \tag{13}$$

where $g(G(P); \theta_{MRA})$ denotes the MRA weight network, $\theta_{MRA}$ is its parameter set, and $MRA_i(\cdot)$ is the $i$th MRA-based method. Note that, unlike the CS-based weight network, we take only the PAN image as the input of the MRA-based weight network, since the MRA-based methods extract the spatial details relying only on the PAN image, as the sketch below illustrates.
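For illustration only, a minimal MRA-style inference module $MRA_i(\cdot)$ can be sketched as a generic box-filter high-pass-injection scheme; this is not necessarily one of the methods adopted in the paper, but it makes the point above explicit: the injected spatial details come from the PAN image alone.

```python
import torch.nn.functional as F

def hpf_mra(pan, ms_up, k=5):
    """A generic high-pass-injection MRA method, standing in for MRA_i in Eq. (13).

    pan:   (B, 1, H, W) PAN image.
    ms_up: (B, N, H, W) MS image upsampled to the PAN grid.
    """
    pad = k // 2
    pan_low = F.avg_pool2d(F.pad(pan, (pad,) * 4, mode='replicate'), k, stride=1)
    details = pan - pan_low    # spatial details extracted from the PAN image only
    return ms_up + details     # inject the same details into every MS band
```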

**The Final Result Aggregation:** After obtaining the averaged result of the CS-based module, $\widehat{MS}_{CS} = [\widehat{MS}_{CS_1}, \cdots, \widehat{MS}_{CS_N}]$, and the averaged result of the MRA-based module, $\widehat{MS}_{MRA} = [\widehat{MS}_{MRA_1}, \cdots, \widehat{MS}_{MRA_N}]$, we aggregate them to compensate for their respective spatial and/or spectral distortions. The final estimated HRMS image $\widehat{MS}$ is given as

$$\widehat{MS} = \widehat{MS}_{CS} + \alpha \widehat{MS}_{MRA} \tag{14}$$

where $\alpha$ is a factor balancing the contributions of the CS-based and the MRA-based methods.

**The Loss Function:** To learn the model parameters $\theta = \{\theta_{CS}, \theta_{MRA}\}$, we minimize the reconstruction error between the estimated HRMS image, $\widehat{MS}$, and its corresponding ideal (ground-truth) image, $Y$, i.e.,

$$\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left\| \widehat{MS}^{i} - Y^{i} \right\|_{F}^{2} \tag{15}$$

where *n* is the number of training samples.
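Putting the pieces together, a single training step under Equations (14) and (15) might look as follows. This sketch reuses the hypothetical PWNet, high_pass, and average_cs_results helpers from the earlier snippets, treats the CS and MRA inference outputs as precomputed (the inference modules are classical, non-learned methods, so only $\theta$ is optimized), and replaces the per-sample squared Frobenius norm with the per-pixel mean squared error, which differs only by a constant scale.

```python
import torch

# Hypothetical setup; sizes and learning rate are illustrative defaults.
model = PWNet(ms_bands=4, n_cs=3, n_mra=3, scale=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
alpha = 1.0  # balancing factor in Eq. (14); an assumed value

def train_step(pan, ms, cs_stack, mra_stack, target):
    """One optimization step of Eq. (15) on a training batch.

    pan:       (B, 1, H, W)        PAN images.
    ms:        (B, N, h, w)        LRMS images.
    cs_stack:  (n_cs, B, N, H, W)  precomputed CS-based HRMS estimates, Eq. (5).
    mra_stack: (n_mra, B, N, H, W) precomputed MRA-based HRMS estimates, Eq. (13).
    target:    (B, N, H, W)        ideal HRMS images Y.
    """
    w_cs, w_mra = model(high_pass(pan), high_pass(ms))
    ms_cs = average_cs_results(cs_stack, w_cs)     # Eq. (8)
    ms_mra = average_cs_results(mra_stack, w_mra)  # Eq. (10), the same operation
    ms_hat = ms_cs + alpha * ms_mra                # Eq. (14)
    loss = torch.mean((ms_hat - target) ** 2)      # Eq. (15), up to a constant scale
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the candidate HRMS estimates are held fixed, the gradient flows only into the two weight subnets, so training teaches the PWNet where each classical method should dominate.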
