## 3. Results

### 3.1. Experimental Setup

#### 3.1.1. Quality Evaluation

Visual and numerical evaluations were performed according to the Wald protocol [36]. The original MS image was used as the reference image. Correlation coefficient (CC), universal image quality index (UIQI) [37], *erreur relative globale adimensionnelle de synthèse* (ERGAS) [38], and spectral angle mapper (SAM) [39] were used for numerical evaluation. These are major evaluation criteria, used in almost all PS-related research [2]. The CC is given by

$$\begin{aligned} \text{CC} &= \frac{1}{|B|} \times \sum_{b \in B} \text{CC}_b, \\ \text{CC}_b &= \frac{\sum_{i=1}^{N} \left( O_b(i) - \overline{O_b} \right) \times \left( PS_b(i) - \overline{PS_b} \right)}{\sqrt{\sum_{i=1}^{N} \left( O_b(i) - \overline{O_b} \right)^2} \times \sqrt{\sum_{i=1}^{N} \left( PS_b(i) - \overline{PS_b} \right)^2}} \end{aligned} \tag{19}$$

where a value closer to 1.0 implies a smaller loss of intensity correlation and a better result. $N$ and $|B|$ are the total number of pixels per band and the number of bands in the PS image, respectively. $O_b(i)$ and $\overline{O_b}$ denote the $i$th pixel value of the $b$-band reference image and its mean value, respectively, and $PS_b(i)$ and $\overline{PS_b}$ denote the $i$th pixel value of the $b$-band PS image and its mean value, respectively.
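For concreteness, Equation (19) can be computed in a few lines of NumPy. The sketch below is our own illustration, not code from the paper; the `(bands, height, width)` array layout and the function name are assumptions:

```python
import numpy as np

def cc(reference: np.ndarray, fused: np.ndarray) -> float:
    """Band-averaged correlation coefficient of Eq. (19).

    Both inputs are assumed to be shaped (bands, height, width).
    """
    scores = []
    for o_b, ps_b in zip(reference.astype(float), fused.astype(float)):
        o = o_b.ravel() - o_b.mean()            # center the reference band
        p = ps_b.ravel() - ps_b.mean()          # center the PS band
        scores.append((o * p).sum() / np.sqrt((o ** 2).sum() * (p ** 2).sum()))
    return float(np.mean(scores))
```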

UIQI is an index for measuring the loss of intensity correlation, intensity distortion, and contrast distortion and is given by

$$\begin{aligned} \text{UIQI} &= \frac{1}{|B|} \times \sum_{b \in B} \text{UIQI}_b, \\ \text{UIQI}_b &= \frac{\sigma_{O_b,PS_b}}{\sigma_{O_b} \cdot \sigma_{PS_b}} \times \frac{2 \cdot \overline{O_b} \cdot \overline{PS_b}}{\left(\overline{O_b}\right)^2 + \left(\overline{PS_b}\right)^2} \times \frac{2 \cdot \sigma_{O_b} \cdot \sigma_{PS_b}}{\sigma_{O_b}^2 + \sigma_{PS_b}^2} \end{aligned} \tag{20}$$

where $\sigma_{O_b}$ and $\sigma_{PS_b}$ are the standard deviations of the reference and PS images in the $b$-band, respectively, and $\sigma_{O_b,PS_b}$ denotes their covariance in the $b$-band. A value closer to 1.0 implies that these losses are small. The size of the UIQI sliding window was 8 × 8.
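A minimal sketch of the per-band computation of Equation (20) follows, assuming the same array layout as above. For brevity it steps the 8 × 8 window without overlap; a stride-1 sliding window is also common:

```python
import numpy as np

def uiqi_band(o_b: np.ndarray, ps_b: np.ndarray, win: int = 8) -> float:
    """Mean UIQI of one band, Eq. (20), over win x win windows."""
    scores = []
    h, w = o_b.shape
    for y in range(0, h - win + 1, win):        # non-overlapping steps for brevity
        for x in range(0, w - win + 1, win):
            o = o_b[y:y + win, x:x + win].ravel().astype(float)
            p = ps_b[y:y + win, x:x + win].ravel().astype(float)
            c = np.cov(o, p)                    # [[var(o), cov], [cov, var(p)]]
            den = (c[0, 0] + c[1, 1]) * (o.mean() ** 2 + p.mean() ** 2)
            if den > 0:
                # The three-factor product of Eq. (20) collapses to this form.
                scores.append(4.0 * c[0, 1] * o.mean() * p.mean() / den)
    return float(np.mean(scores))
```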

The ERGAS is given by

$$\begin{aligned} \text{ERGAS} &= 100 \times \frac{h}{l} \times \sqrt{\frac{1}{|B|} \times \sum_{b \in B} \frac{\left(\text{RMSE}_b\right)^2}{\left( \overline{PS_b} \right)^2}}, \\ \text{RMSE}_b &= \sqrt{\frac{1}{N} \times \sum_{i=1}^{N} \left( O_b(i) - PS_b(i) \right)^2} \end{aligned} \tag{21}$$

where $h$ and $l$ denote the spatial resolutions of the PAN and MS images, respectively. The smaller the ERGAS value, the better the image quality.
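A corresponding sketch of Equation (21), normalizing by the PS band mean exactly as written above (the array layout is again an assumption):

```python
import numpy as np

def ergas(reference: np.ndarray, fused: np.ndarray, h: float, l: float) -> float:
    """ERGAS of Eq. (21); h and l are the PAN and MS spatial resolutions."""
    terms = []
    for o_b, ps_b in zip(reference.astype(float), fused.astype(float)):
        rmse = np.sqrt(np.mean((o_b - ps_b) ** 2))
        terms.append((rmse / ps_b.mean()) ** 2)  # normalized by the PS band mean
    return 100.0 * (h / l) * np.sqrt(np.mean(terms))
```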

The SAM is an index for measuring spectral distortion and is given by

$$\begin{aligned} \text{SAM} &= \frac{1}{N} \sum_{i=1}^{N} \text{SAM}(i), \\ \text{SAM}(i) &= \cos^{-1} \left( \frac{\sum_{b \in B} O_b(i) \times PS_b(i)}{\sqrt{\sum_{b \in B} \left( O_b(i) \right)^2} \times \sqrt{\sum_{b \in B} \left( PS_b(i) \right)^2}} \right) \end{aligned} \tag{22}$$

A value closer to 0.0 implies that the spectral ratios between bands are closer to those of the reference image.
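A sketch of Equation (22) under the same assumed layout; it returns the mean angle in radians, and converting to degrees is a single multiplication by 180/π:

```python
import numpy as np

def sam(reference: np.ndarray, fused: np.ndarray) -> float:
    """Mean spectral angle of Eq. (22) in radians; inputs shaped (bands, H, W)."""
    o = reference.reshape(reference.shape[0], -1).astype(float)   # bands x N
    p = fused.reshape(fused.shape[0], -1).astype(float)
    cos = (o * p).sum(axis=0) / (np.linalg.norm(o, axis=0) * np.linalg.norm(p, axis=0))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)).mean())      # clip guards rounding
```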

#### 3.1.2. Dictionary Learning

We used PAN images from the Nihonmatsu and Yokohama datasets as training images for dictionary learning. For the training image $X$, the number of training iterations was 40, the number of atoms in the dictionary $D$ was 1024, and the size of each atom was $p = 4$. The number of extracted patches was 6433 (Nihonmatsu) and 10,000 (Yokohama); the training set consisted of 1000 patches randomly selected from them. As the training image $X$, a training PAN image was used as the high-resolution data $X_{high}$ for the high-resolution dictionary, and its low-resolution counterpart was used as the low-resolution data $X_{low}$ for the low-resolution dictionary. The low-resolution image was generated by downsampling and then upsampling via bicubic interpolation. The sparsity regularization parameter was λ = 0.1. The sparse representation β was calculated using Equation (10) by solving the L1-norm-regularized least-squares problem, and the dictionary $D$ was obtained using Equation (11) by solving the least-squares problem with quadratic constraints. We used the code provided by Lee et al. [40]. The learned high-resolution dictionaries are shown in Figure 3; a minimal sketch of this training pipeline is given after the figure. Each test scene was pansharpened with the dictionary trained on its own dataset: the Nihonmatsu dictionary for Nihonmatsu images and the Yokohama dictionary for Yokohama images.


**Figure 3.** Trained high-resolution dictionaries: (**a**) Nihonmatsu, (**b**) Yokohama.
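The sketch below reproduces this setup in outline, using scikit-learn's `DictionaryLearning` as a stand-in for the efficient sparse coding code of Lee et al. [40]; the 1/4 resolution ratio, the random seed, and the helper names are our assumptions, not details from the paper:

```python
import numpy as np
from scipy.ndimage import zoom
from sklearn.decomposition import DictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def fit_dict(patches: np.ndarray, n_atoms: int = 1024, lam: float = 0.1) -> np.ndarray:
    """Learn a dictionary of n_atoms rows from flattened patches via L1 sparse coding."""
    learner = DictionaryLearning(n_components=n_atoms, alpha=lam, max_iter=40,
                                 transform_algorithm="lasso_lars")
    return learner.fit(patches).components_

def train_dictionaries(pan: np.ndarray, p: int = 4, n_train: int = 1000):
    """Train high- and low-resolution dictionaries from one training PAN image."""
    pan = pan.astype(float)
    # Low-resolution counterpart: bicubic (order=3) down- then upsampling.
    # The 1/4 factor is an assumption; image dims should be divisible by 4.
    low = zoom(zoom(pan, 0.25, order=3), 4.0, order=3)

    # The same integer seed yields the same patch locations in both images,
    # keeping the high/low patch pairs spatially aligned.
    x_high = extract_patches_2d(pan, (p, p), max_patches=n_train, random_state=0)
    x_low = extract_patches_2d(low, (p, p), max_patches=n_train, random_state=0)

    d_high = fit_dict(x_high.reshape(n_train, -1))
    d_low = fit_dict(x_low.reshape(n_train, -1))
    return d_high, d_low
```

Note that 1024 atoms over 4 × 4 patches is heavily overcomplete, so a pure scikit-learn run is slow; the authors' use of the Lee et al. [40] code serves the same role more efficiently.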
