*4.2. Training Details*

In the training stage, we optimize our network with the Adam algorithm [53], using an initial learning rate of 0.0005 that is halved every 200 epochs. The network is trained for 1200 epochs with a batch size of 10 and an LRMS patch size of 16. We use the ℓ1 norm as the loss function.
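The hyperparameters above can be collected into a short PyTorch training sketch. The network itself is not specified here, so a single convolutional layer and a 4-band input are used as hypothetical stand-ins; the optimizer, learning-rate schedule, loss, batch size, and patch size follow the values stated in the text.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the actual pansharpening network.
model = nn.Conv2d(4, 4, kernel_size=3, padding=1)

# Adam optimizer with the stated initial learning rate of 0.0005.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

# Learning rate drops by a factor of two every 200 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

# L1 norm loss, as stated in the text.
criterion = nn.L1Loss()

# Dummy batch: batch size 10, 16x16 LRMS patches (4 spectral bands assumed).
lrms = torch.randn(10, 4, 16, 16)
target = torch.randn(10, 4, 16, 16)

for epoch in range(2):  # the paper trains for 1200 epochs
    optimizer.zero_grad()
    loss = criterion(model(lrms), target)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

With `StepLR(step_size=200, gamma=0.5)`, the learning rate is 0.0005 for epochs 0-199, 0.00025 for epochs 200-399, and so on, matching the schedule described above.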

In our experiments, all the DL-based approaches are implemented in the PyTorch framework and trained on a GTX-1080Ti GPU, while the traditional methods are implemented in MATLAB. In the test phase, all objective evaluation indexes are computed in MATLAB, and the results are averaged over the corresponding test set.
