5.3.1. Effectiveness of Sub-Network Structure
To assess the efficacy of the proposed sub-network structure in multi-scale SR, we first selected two key scale factors for the experiment: 2× and 8×. In the 2× model, we removed the final sub-network so that the network extracts low-magnification image features, whereas in the 8× model we appended an additional sub-network after the final one to capture high-magnification image details.
(1) Effectiveness of Sub-Networks with 2× Super-Resolution on SRTM Dataset
Table 3 displays the 2× SR results after removing the final sub-network (as depicted in Figure 1). In terms of PSNR, the EDSR, RCAN, EDEM, and DDMAFN models exhibit similar values, with RCAN slightly outperforming the others at 57.13. The Bicubic method also reaches a relatively high PSNR of 55.58, whereas SRCNN yields the lowest PSNR of 45.44. Despite the simplicity and relatively high PSNR of the traditional Bicubic algorithm, its limitations become evident when compared with the 4× super-resolution reconstruction results on the SRTM and HMA data. In terms of RMSE, the EDSR, RCAN, and DDMAFN models achieve the best performance, with identical values of 2.56, owing to their adoption of advanced network structures, which showcases the potential and advantages of deep learning in DEM SR tasks. In terms of MAE, the EDSR, RCAN, and DDMAFN models also perform favorably, with DDMAFN achieving the lowest value of 1.78. Taking PSNR, RMSE, and MAE together, DDMAFN closely approaches the best-performing model in PSNR (57.11 versus 57.13 for RCAN), matches the lowest RMSE (2.56), and attains the lowest MAE (1.78), demonstrating a robust capability to minimize reconstruction error. DDMAFN also performs well on the slope and aspect metrics, suggesting its effectiveness and robustness in restoring terrain features in DEM data, owing to the advanced network structure design and multi-scale loss function constraints.
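The three pixel-level metrics compared above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the experiments: the function names are hypothetical, grids are plain nested lists, and the PSNR peak value is assumed to default to the elevation range of the reference grid, which the text does not specify.

```python
import math


def _differences(pred, ref):
    """Flatten two equally sized elevation grids into a list of signed errors."""
    return [p - r for row_p, row_r in zip(pred, ref) for p, r in zip(row_p, row_r)]


def rmse(pred, ref):
    """Root-mean-square error between reconstructed and reference elevations."""
    diffs = _differences(pred, ref)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mae(pred, ref):
    """Mean absolute error between reconstructed and reference elevations."""
    diffs = _differences(pred, ref)
    return sum(abs(d) for d in diffs) / len(diffs)


def psnr(pred, ref, peak=None):
    """PSNR in dB; `peak` defaults to the reference elevation range (an assumption)."""
    if peak is None:
        flat = [v for row in ref for v in row]
        peak = max(flat) - min(flat)
    err = rmse(pred, ref)
    if err == 0:
        return math.inf  # identical grids
    return 20.0 * math.log10(peak / err)
```

Because PSNR depends on the chosen peak value, PSNR figures are only comparable between experiments that share the same normalization.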
From the 2× SR results depicted in Figure 5a, it is difficult to distinguish the 16 models visually. For further evaluation, error maps between the 2× SR images and the original HR images were calculated and plotted (Figure 5b), alongside histograms of the error data (Figure 5c).
Figure 5b,c collectively illustrate that the errors of the DDMAFN, RSPCN, EDSR, and RCAN models are closer to zero, signifying their superior performance in the SR task and their ability to recover the HR image information more accurately. By contrast, the error distribution of SRCNN appears irregular, suggesting instability or inaccuracy in its reconstruction in certain regions. SRFBN errors are predominantly negative, whereas SRGAN errors lean positive, indicating a systematic bias in the reconstruction process of these models. RSPCN, FEN, and EDEM display mean error values that are close to 0 but negative, suggesting limited generalization ability and reconstructed elevations slightly lower than the actual values. The error distributions of SAN and D-SRGAN are similar, possibly reflecting commonality in their SR strategies. The error distribution of GISR is approximately normal but not completely symmetric and, like that of DWSR, spans a wider range.
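The signed-error analysis behind such error maps and histograms can be sketched as below. The function name and the bin edges are illustrative assumptions; the binning used for Figure 5c is not given in the text.

```python
def signed_error_summary(pred, ref, bin_edges):
    """Return (mean signed error, histogram counts) for pred - ref.

    A negative mean indicates reconstructed elevations that are, on average,
    lower than the reference. Bins are half-open [lo, hi), except the last
    bin, which also includes its upper edge. Errors outside the edges are ignored.
    """
    errors = [p - r for row_p, row_r in zip(pred, ref) for p, r in zip(row_p, row_r)]
    mean_error = sum(errors) / len(errors)
    counts = [0] * (len(bin_edges) - 1)
    for e in errors:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= e < hi or (is_last and e == hi):
                counts[i] += 1
                break
    return mean_error, counts
```

A histogram concentrated around zero with a near-zero mean corresponds to the behaviour described for the best-performing models, while a mean clearly away from zero corresponds to the positive or negative bias described for SRGAN and SRFBN.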
(2) Effectiveness of Sub-Networks with 8× Super-Resolution on SRTM Dataset
In the 8× SR experiments (Table 4), the Bicubic method yields a PSNR of 40.08, with RMSE and MAE values of 18.54 and 13.46, respectively. SRCNN exhibits a slightly lower PSNR than Bicubic, at 39.45, accompanied by higher RMSE and MAE. VDSR and DWSR marginally improve PSNR and reduce the errors. EDSR improves PSNR to 41.15 while also substantially reducing RMSE and MAE. RCAN, SAN, SRFBN, EFDN, and DDMAFN achieve higher PSNR values and lower errors; in particular, DDMAFN achieves a PSNR of 41.24, with corresponding RMSE and MAE values of 16.11 and 11.49. Notably, the SRGAN model performs inadequately in this task, exhibiting a substantially lower PSNR than the other models and significantly higher RMSE and MAE. Additionally, given the importance of terrain slope and aspect in DEM data, the VDSR, SAN, EFDN, GISR, and DDMAFN models exhibit comparable performance on the slope and aspect indexes, with DDMAFN performing best at 7.8 and 93.75, respectively. This further validates the strong terrain feature reconstruction ability of the sub-network and showcases the robustness and effectiveness of the DDMAFN model in DEM super-resolution reconstruction. Overall, the DDMAFN model excels in image quality improvement, error control, and detail recovery in the 8× SR task.
Figure 6a displays the error maps of the SR results. Comparing the 8× SR images with the HR images reveals considerable variation in the error distributions of the 16 models (Figure 6b). The error distributions of Bicubic, VDSR, RCAN, EDEM, D-SRGAN, EFDN, and DDMAFN are more centralized and closer to zero, possibly because these models better preserve the original image information during reconstruction. The peaks of DWSR, EDSR, RSPCN, and DDMAFN lie in the positive error region, while the peaks of SRFBN, SAN, and EFDN lie in the negative error region, possibly reflecting differences in their reconstruction strategies or network structures that lead to some deviation. The wider and irregular error distribution of SRCNN may arise from its limitations in handling complex textures or details. The gradual increase in error from negative to positive for SRGAN may relate to the way it balances perceptual quality against pixel accuracy. Among all the models, GISR exhibits the smoothest error distribution, suggesting greater consistency and stability across different regions. These 8× ablation experiments show that the model structure proposed in this paper continues to yield satisfactory results in the SR task after the addition of a sub-network, validating the effectiveness of the model design.
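Since the comparison above also relies on terrain slope and aspect, the sketch below shows one common way to derive both from a DEM grid using central differences. It is an illustration under stated assumptions, not the paper's metric definition: the function names are hypothetical, the cell size is taken as uniform, and aspect is expressed as a mathematical angle of the downslope direction (counter-clockwise from the positive column axis toward the positive row axis), whereas GIS packages usually report compass bearings.

```python
import math


def slope_aspect(dem, cell_size=1.0):
    """Slope and aspect (degrees) for the interior cells of a DEM grid.

    Uses central differences, so the output is two rows and two columns
    smaller than the input. Aspect is not meaningful on perfectly flat
    cells, where it evaluates to 0.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    aspect = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dzdx = (dem[i][j + 1] - dem[i][j - 1]) / (2.0 * cell_size)
            dzdy = (dem[i + 1][j] - dem[i - 1][j]) / (2.0 * cell_size)
            slope[i - 1][j - 1] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
            aspect[i - 1][j - 1] = math.degrees(math.atan2(-dzdy, -dzdx)) % 360.0
    return slope, aspect


def mean_abs_error(a, b, circular=False):
    """Mean absolute difference; with circular=True, angles wrap at 360 degrees."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            d = abs(x - y)
            if circular:
                d = d % 360.0
                d = min(d, 360.0 - d)
            total += d
            count += 1
    return total / count
```

A slope error can then be taken as `mean_abs_error(slope_sr, slope_hr)` and an aspect error as `mean_abs_error(aspect_sr, aspect_hr, circular=True)`, where the circular option avoids treating 350° and 10° as 340° apart.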
5.3.2. The Effect of Weights on Super-Resolution Results in Multi-Scale Attention Mechanisms
To explore the impact of different weight assignments in the multi-scale attention mechanism on the SR results, we conducted four sets of experiments for comparative analysis (Table 5). The weight assignments were (0:0:1), (0.1:0.1:0.8), (0.2:0.2:0.6), and an adaptive weighting scheme (Self-Adaption). In the adaptive scheme, variance is used as an evaluation metric to dynamically assign weights to each output. In the (0:0:1) experiment, where the weight is placed entirely on the last scale, the model achieves a PSNR of 48.94, an RMSE of 14.80, and an MAE of 11.35, with slope and aspect values of 4.42 and 113.52, respectively. Adopting a more balanced weight distribution, such as 0.2:0.2:0.6, significantly enhances the model’s performance, as reflected by the rise in PSNR and the decline in RMSE, MAE, and the slope and aspect indexes. Performance can be further optimized by moderately increasing the weight of the last branch, for instance to 0.1:0.1:0.8. Although RMSE increases slightly, it remains low, while the reduction in MAE suggests that the model improves other evaluation metrics while preserving high reconstruction quality. Notably, this weight allocation also further reduces the slope and aspect indexes. This may be attributed to the last branch’s greater expressiveness in reconstructing specific details or terrain textures; thus, appropriately increasing its weight enhances the model’s performance. The Self-Adaption strategy, by contrast, did not bring the expected performance improvement.
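The weighting schemes compared in this subsection can be illustrated with a short sketch. The fixed-weight combination follows directly from the ratios given above. The adaptive variant is only a hypothetical stand-in: the text states that variance is used to assign the weights but not how, so the inverse-variance rule below is an assumption, as are both function names.

```python
def weighted_multiscale_loss(scale_losses, weights):
    """Combine per-scale loss values with fixed weights, e.g. (0.1, 0.1, 0.8)."""
    if len(scale_losses) != len(weights):
        raise ValueError("one weight is required per scale")
    return sum(w * loss for w, loss in zip(weights, scale_losses))


def variance_based_weights(scale_errors, eps=1e-8):
    """Assumed adaptive rule: weight each output by the inverse of the
    variance of its residuals, then normalize the weights to sum to 1."""
    inverse_variances = []
    for errors in scale_errors:
        mean = sum(errors) / len(errors)
        variance = sum((e - mean) ** 2 for e in errors) / len(errors)
        inverse_variances.append(1.0 / (variance + eps))
    total = sum(inverse_variances)
    return [v / total for v in inverse_variances]
```

For example, `weighted_multiscale_loss([l1, l2, l3], (0.1, 0.1, 0.8))` corresponds to the (0.1:0.1:0.8) setting, and `(0, 0, 1)` reduces the objective to the last scale alone.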
5.3.3. Effectiveness of the Main Modules
Table 6 delineates the contributions of, and interactions among, the DWT-IWT forward and inverse wavelet transforms, the MAFBlock, and the customized loss function in the SR task. Experimental results demonstrate that the DWT-IWT wavelet transform, as a fundamental component, significantly enhances SR performance and achieves satisfactory reconstruction quality when applied independently. In the absence of the other key components, the MAFBlock performs poorly: PSNR drops to 42.20, and RMSE, MAE, and the slope and aspect indexes rise to 30.98, 23.11, 46.31, and 155.58, respectively, suggesting that its benefits must be complemented by other components to be fully realized. Additionally, the customized multi-scale loss function proves significant in enhancing SR performance. The combination of DWT-IWT and MAFBlock alone does not yield a satisfactory performance improvement. However, introducing the customized loss function atop DWT-IWT notably enhances the model’s performance, with PSNR increasing to 48.93 and RMSE, MAE, and the slope and aspect indexes decreasing slightly. This indicates the effectiveness of integrating the customized loss function with the wavelet transform in super-resolution tasks. Introducing DWT-IWT, MAFBlock, and the customized loss function simultaneously optimizes model performance, yielding the highest PSNR of 49.37 and the lowest RMSE and MAE of 14.02 and 10.80, respectively.
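To make the DWT-IWT pairing concrete, the sketch below implements a single-level 2D wavelet transform and its inverse. It assumes the Haar wavelet with orthonormal scaling, which the text does not specify, and the function names are illustrative. Sub-band naming conventions (LH versus HL) also differ between libraries.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of a grid with even height and width.

    Returns four half-resolution sub-bands (LL, LH, HL, HH): the LL band is a
    scaled local average, the other three hold row, column, and diagonal detail.
    """
    h, w = len(x), len(x[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b, c, d = x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2.0
            lh[i // 2][j // 2] = (a + b - c - d) / 2.0  # difference between rows
            hl[i // 2][j // 2] = (a - b + c - d) / 2.0  # difference between columns
            hh[i // 2][j // 2] = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_iwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution grid exactly."""
    h, w = len(ll) * 2, len(ll[0]) * 2
    x = [[0.0] * w for _ in range(h)]
    for i in range(h // 2):
        for j in range(w // 2):
            s, r, c, d = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + r + c + d) / 2.0
            x[2 * i][2 * j + 1] = (s + r - c - d) / 2.0
            x[2 * i + 1][2 * j] = (s - r + c - d) / 2.0
            x[2 * i + 1][2 * j + 1] = (s - r - c + d) / 2.0
    return x
```

The transform is lossless, so a network can process the low- and high-frequency sub-bands separately and still return to the spatial domain without losing information.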
5.3.4. Impact of Weight Settings on Network Performance
(1) Effect of Edge and Pixel Loss Weight Setting on Super-Resolution Reconstruction Performance
In the super-resolution reconstruction of DEM data, topographic relief and morphological changes are crucial for the fineness and accuracy of the reconstruction. Edge information, an important indicator of terrain changes, significantly impacts reconstruction quality. To focus the network model on edge information in DEM data, we set different weight ratios for edge loss and pixel loss in the loss function (as shown in Table 7) and compare the reconstruction results with the real data in terms of edge features. Experimental results show a significant performance improvement when the edge loss weight is increased appropriately (e.g., an edge loss weight of 0.9 and a pixel loss weight of 0.1 in Experiment 12). While pixel loss also plays an important role, it mainly captures pixel-level differences and is less sensitive to terrain details than edge loss. Decreasing the pixel loss weight and increasing the edge loss weight therefore helps the network model achieve better super-resolution reconstruction results. However, when the edge loss weight is too high (e.g., an edge loss weight of 1 and a pixel loss weight of 0 in Experiment 3), the network model’s performance decreases. Therefore, in super-resolution reconstruction of DEM data, adjusting the weight ratio of edge loss and pixel loss in the loss function is key to achieving high-performance reconstruction.
(2) Effect of Edge Detection Loss Weight Settings on Super-Resolution Reconstruction Performance
The weight settings assigned to the different edge extraction methods (Sobel X, Sobel Y, and the Laplace operator) in the edge loss function affect the super-resolution reconstruction of DEM data. As shown in Table 8, model performance declines when the edge weight in a specific direction is too high (e.g., Experiments 23 and 7). This may be because the model overemphasizes edge information in one direction and ignores critical information in the others, which affects the comprehensiveness and accuracy of the reconstruction results. Conversely, when the weight allocation is more balanced (e.g., Experiments 3 and 5), the model performs better, highlighting the importance of considering edge information from multiple directions. Additionally, the Laplace operator shows unique advantages in edge extraction. Experiments indicate better reconstruction results when the Laplace operator weight is moderate (e.g., Experiments 5 and 19). This is because the Laplace operator captures second-order derivative information, which is sensitive to abrupt changes and details in the terrain data. Thus, reasonable use of the edge information extracted by the Laplace operator enhances the fineness and clarity of super-resolution reconstruction in DEM data. Furthermore, the weight assignment of the Sobel operator in the x and y directions significantly impacts model performance. The Sobel operator extracts edge information horizontally and vertically, which is crucial for recovering topographic relief and orientation in DEM data. Experimental results show that balanced weights for Sobel X and Sobel Y better retain the topographic structure of the original DEM data (e.g., Experiments 3 and 11).
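A loss of the kind discussed in this subsection can be sketched as follows. The 3×3 Sobel and Laplace kernels are the standard ones. Everything else is an assumption made for illustration: the function names, the use of a mean absolute (L1) distance for both the edge and pixel terms, and the way the two terms are combined.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # first derivative along columns
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # first derivative along rows
LAPLACE = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]     # second-order derivative


def filter3x3(img, kernel):
    """Valid-mode 3x3 cross-correlation; output loses a one-cell border."""
    rows, cols = len(img), len(img[0])
    return [
        [
            sum(kernel[di + 1][dj + 1] * img[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1))
            for j in range(1, cols - 1)
        ]
        for i in range(1, rows - 1)
    ]


def l1_distance(a, b):
    """Mean absolute difference between two equally sized grids."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def edge_loss(pred, ref, w_sobel_x, w_sobel_y, w_laplace):
    """Weighted sum of L1 distances between edge maps of pred and ref."""
    terms = ((w_sobel_x, SOBEL_X), (w_sobel_y, SOBEL_Y), (w_laplace, LAPLACE))
    return sum(w * l1_distance(filter3x3(pred, k), filter3x3(ref, k)) for w, k in terms)


def total_loss(pred, ref, w_edge, w_pixel, edge_weights):
    """Assumed combination: w_edge * edge loss + w_pixel * pixel-wise L1 loss."""
    return w_edge * edge_loss(pred, ref, *edge_weights) + w_pixel * l1_distance(pred, ref)
```

With all of the edge weight on a single operator, errors that the other two operators would detect contribute nothing to the edge term, which is consistent with the performance drop reported for strongly unbalanced settings.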
5.3.5. Analysis of the Generalization Ability of Models across Different Terrains
In this paper, we propose the DDMAFN to address the challenge of super-resolution reconstruction of DEM data. First, we consider the complex topography of the Tibetan Plateau, which ranges from 3963.211 m to 7924.954 m in elevation and encompasses steep mountains, deep canyons, and varied geomorphic textures. The experimental results indicate that DDMAFN achieves excellent reconstruction results even in this challenging environment, surpassing the other deep learning models and the traditional Bicubic algorithm in several evaluation metrics, including PSNR, RMSE, MAE, and the slope and aspect indexes (see Table 1 for details). Meanwhile, DDMAFN performs comparably to other state-of-the-art models, such as EDEM, EFDN, and SAN, on the SRTM dataset, which has lower elevations (0–1794 m) and gentler terrain. Notably, although SRCNN slightly underperforms the traditional interpolation algorithm owing to its simple network structure, DDMAFN still maintains the best performance in metrics such as RMSE and the slope and aspect indexes. This demonstrates the strong generalization ability of the DDMAFN model in handling terrains of varying complexity.
Second, the visualization of the reconstruction results from GDEM to HMA data (Figure 3) shows that DDMAFN excels in restoring DEM terrain features and detailed textures, accurately reproducing the complex features in the original data. This high compatibility with high-resolution DEM data underscores its effectiveness. In the reconstruction of SRTM data (Figure 4), models like EDSR, RCAN, EFDN, EDEM, and DDMAFN generate mountain shadow maps similar to those of the high-resolution DEM data, indicating a degree of generalization ability. Considering the results from both datasets, DDMAFN maintains stable performance in both complex and simple terrain conditions.
5.3.6. Interpretability Analysis
This section provides an in-depth exploration of the specific regions in the input LR DEM that are crucial for high-frequency detail extraction and edge enhancement. To achieve this, we utilize the Grad-CAM visualization technique to comprehensively visualize and analyze the core nodes in the DDMAFN. The results are presented in Figure 7.
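The core Grad-CAM computation can be sketched without a deep learning framework, given the feature maps of a layer and the gradients of a scalar score with respect to them. The function name is hypothetical, and which scalar is differentiated for an SR network (for example, the sum of the output over a region) is an assumption, since the text does not state it.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from K feature maps and their gradients.

    `activations` and `gradients` are lists of K grids of size H x W.
    Each channel weight is the spatial mean of its gradient; the map is the
    ReLU of the weighted sum of activations, scaled to [0, 1] when non-zero.
    """
    k_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(k_maps)))
         for j in range(w)]
        for i in range(h)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the LR DEM, which is how heat maps of this kind are typically rendered.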
By closely examining the heat maps of the Discrete Wavelet Transform (DWT) in the three reconstruction sub-networks, we observe that the region of interest in the wavelet guidance module is refined from coarse to fine as the network hierarchy progresses. In the first sub-network, the initial wavelet guidance module effectively guides the model to focus on high-frequency detail features while providing a preliminary understanding of the overall structure of the LR DEM data. At this stage, the model demonstrates a fundamental ability to recognize the layout of each region within the LR DEM data. Simultaneously, the convolution operations in its parallel branch prioritize local feature extraction but lack the capacity to analyze global features. The two approaches are complementary, enabling the model to begin identifying key areas in the LR DEM data, such as the contours of complex mountainous and flat regions, thereby enhancing its learning ability.
In the second reconstruction sub-network, the DWT component of the wavelet guidance module significantly enhances the model’s focus on high-frequency regions while also improving its ability to understand the overall structure. At this stage, edge delineation becomes clearer and more effective, allowing the model to better distinguish between different terrain features, such as peaks. Notably, although the LR DEM data may exhibit issues such as low quality and incoherent values due to acquisition conditions, the wavelet guidance module in the third sub-network focuses more finely on high-frequency details and edge features, providing a deeper understanding of the numerical distribution of the LR DEM. In complex terrains, this capability enables the model to reconstruct subtle features such as undulations and slope changes more accurately, and it facilitates high-frequency detail extraction and edge feature learning in subsequent modules.
However, as the network depth increases, the convolutional branch of the wavelet guidance module increasingly neglects various regions of the DEM data, nearly losing its ability to recognize them in the final stage. We therefore posit that the DWT in the wavelet guidance module is crucial for mitigating this unfavorable effect, while the residual links play a secondary, auxiliary role.
Next, we analyze three different forms of the multi-scale attention module, MAFBlock, in the context of the third reconstruction sub-network. Although the different attention mechanism modules attend to the LR DEM data in slightly different ways, they all place strong emphasis on high-frequency regions and effectively delineate the regional layout of the LR DEM. Notably, in MAFBlock3, with the addition of the edge enhancement module, the model’s ability to control high-frequency details improves, enabling it to better perceive changes between high and low frequencies. Finally, by examining the overall output of each sub-network, we find that as network depth increases, the model gains a more comprehensive understanding of the overall terrain structure while still capturing local high-frequency details, achieving an effective balance between the two.
5.3.7. Sensitivity Analysis
In this section, we assess the performance of the DDMAFN model and comparison models under varying noise levels (Figure 8). Metrics including PSNR, RMSE, MAE, and
were calculated by introducing Gaussian noise with variances ranging from 0.1 to 40 (shown in
Figure 9). Experimental results indicate that while all models degrade with increasing noise, the DDMAFN model maintains optimal reconstruction accuracy. At low noise levels, model
PSNR decreases slightly, and
shows a limited increase, indicating strong robustness to low noise. At high noise levels, the DDMAFN model exhibits a smaller PSNR reduction and smaller RMSE and MAE increases than the comparison models, owing to its multi-scale attention mechanism and progressive upsampling architecture, which effectively suppress noise and minimize information loss. Additionally, the SRGAN model shows the least change under noise perturbation, possibly due to adversarial training, but it also has the weakest overall performance. Models like RCAN and EDEM experience greater performance degradation under high noise conditions but still outperform SRGAN.
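The noise perturbation used in this kind of sensitivity test can be sketched as follows. The function name is hypothetical, the noise is assumed to be zero-mean and added independently to every cell of the LR input, and only the end points of the variance range (0.1 and 40) come from the text.

```python
import math
import random


def add_gaussian_noise(dem, variance, seed=None):
    """Return a copy of `dem` with zero-mean Gaussian noise of the given variance.

    The standard deviation passed to the generator is sqrt(variance).
    A fixed seed makes the perturbation reproducible across models.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in dem]
```

Reusing the same seed for every model at a given variance ensures that all models are evaluated on identical noisy inputs, so differences in the metrics reflect the models rather than the noise realization.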
Figure 10 presents the terrain reconstruction results for hilly and flat regions, with the numerical values on each plot representing PSNR, RMSE, MAE, and the slope and aspect indexes. In the hilly region, the DDMAFN model achieves a PSNR of 48.85 and an
of 4.09, indicating its ability to capture terrain changes and learn these features during reconstruction. In the flat region, the DDMAFN model slightly improves, with
PSNR increasing to 52.90 and
decreasing to 3.21. This can be attributed to the flat region’s lower terrain variation, making it easier for the model to capture and replicate features. As shown in
Figure 10, DDMAFN reconstructs terrain closest to the HR terrain, while the EDSR, SRGAN, SRFBN, D-SRGAN, and GISR methods exhibit varying levels of checkerboard effects across different terrains, reducing reconstruction quality. The SRCNN and VDSR models display considerable morphological differences from the HR data, with significant elevation errors and a failure to capture many fine-grained features. Hillshade maps generated by EDSR, RCAN, EFDN, EDEM, and DDMAFN across different terrains closely resemble those from the HR DEM data, reflecting relatively high reconstruction quality. These differences may arise from the differing learning requirements that different terrain features impose on each model.