This section gives an overview of the results of this research, their interpretation, and quality assessment. The predictions should show the right level of detail and brightness, which we identify for each scale and resolution through both visual and quantitative inspection. By the right level of detail, we mean the level of geographic detail that is useful for reading a map without overloading it [17] and, with regard to the relief, a balanced amount of smaller detail combined with still identifiable larger relief forms.
3.1. Predictions
Table 2 assembles the predictions obtained from the four models corresponding to the four scales 1:25,000, 1:50,000, 1:100,000, and 1:200,000, trained with all the training points. The scales 1:500,000 and 1:1,000,000 were excluded from this table since the number of training points (2 out of 3 and 8 out of 10) was not sufficient for reliable predictions. Here, the resolutions used when training the network at each of the four scales correspond to those from
Table 1 and are outlined in green. This means that the first model is the result of training the network with 1:25,000 manual relief shading at 12.5 m resolution, the second model with 1:50,000 shading at 25 m resolution, the third model with 1:100,000 shading at 50 m resolution, and the fourth model with 1:200,000 shading at 100 m resolution. The remaining images in each row were generated with the model for that scale but applied to the other three input resolutions between 12.5 m and 100 m. To demonstrate the training results, we clipped smaller areas from the canton of Ticino for better readability.
Naturally, models trained at a specific resolution do not perform equally well at other resolutions. Coarser resolutions result in blurry images, as for 1:25,000 and 1:50,000, while finer resolutions at smaller scales end up de-emphasising larger landforms. The latter is to be expected, since the smaller the search window, the less area the network has access to and can learn from.
On the other hand,
Table 3 and
Table 4 display the predictions of 24 models (6 scales × 4 resolutions) generated with 80% and 100% of the training points at each scale and resolution, with the aim of finding the scale–resolution combination that visually corresponds best to its manual shaded relief analogue. As in
Table 2, we outline in green the predictions based on the input resolutions from
Table 1. The images at resolutions adjacent to those outlined in green potentially also match the quality of the manual relief shadings, which will be determined at the evaluation stage. Since the goal of this research is to apply the generated models to any area, the testing area is located outside the training area, as shown in
Figure 4, but within Switzerland to make the comparison possible.
Table 3 and
Table 4 show that the predictions highlighted in green deliver a better balance between detail level and tonal brightness than those located further from them in each row, though the visual differences between predictions at adjacent resolutions are often barely noticeable.
The 1:500,000 and 1:1,000,000 scales are also included. They show rather similar predictions with slight contrast and brightness differences and some artefacts at the finest resolution of 12.5 m. This means that resolutions much finer than those needed for these two scales would result in nearly no change in the predictions but in unnecessarily large image sizes; thus, small scales do require coarser resolutions. To obtain predictions at the coarser resolutions of 250 m and 500 m, we would have to collect more training points by changing the base and the padding (the size of the training window), whereas it is important to keep the same base and padding of the training window for consistency and further comparison between the scales. Therefore, in our research, we only evaluate predictions based on resolutions of up to 100 m, and the resulting predictions for the smallest scales support the validity of the scale/resolution ratios by Tobler from
Table 1.
When comparing the highlighted predictions from the last two tables to the manual relief shadings as our ground truth data in
Table 5, one can see that the neural relief shadings generally show a slightly lower level of detail (especially on the slopes) and slightly less contrast between opposing slopes on different sides of ridgelines. At the same time, there is a slightly stronger aerial perspective effect in terms of subtle, blurry transitions from lowlands to slopes and the highest peaks [19], as suggested by Imhof [1].
While at large scales the manual shadings reveal more detail on slopes than both predictions, at the smaller scales of 1:100,000 and 1:200,000 the manual relief presents a higher level of generalisation than the neural reliefs. Small relief features such as alluvial fans (at the top middle) are clearly better emphasised at all scales of the manual reliefs. On the neural shadings, they either do not display enough detail and merge with the valley at large scales, or lack a convex upward appearance or a well-defined edge at small scales.
The white patches in the middle of the valley in the predictions generated with 80% of the training points are not artefacts; they appear on predictions since lakes on manual shadings are depicted as white. However, this is well learnt by the network with more training data, and in the middle column of
Table 5, we see no white patches within valleys where there are no lakes. The neural shadings also show less obvious borders between the sheets (tiles), as seen, for instance, on the manual shadings at scales of 1:50,000 and 1:100,000, where the lighter-to-darker grey boundary in the valleys moving from north to south is caused by seams between the paper sheets of the manual relief shadings.
To sum up, the more training data are available, the better the neural shadings are expected to be after training. However, overall, the models trained with 80% of the training points deliver fine predictions with levels of detail and brightness comparable to those of manual shadings.
The predictions with higher resolutions than those highlighted in green (
Table 3 and
Table 4) contain the missing finer details, but not necessarily the overall contrast level or the emphasis on larger relief forms. Thus, we proceed with a quantitative evaluation in order to assess the validity of the scales and resolutions from
Table 1 with regard to relief shading.
3.2. Evaluation
When interpreting the neural shadings in the previous section, we performed a brief visual similarity check of the predictions vs. manual relief shadings. In this section, we quantitatively evaluate them by means of pixel subtraction, tonal distribution, and heat maps.
3.2.1. Interactive Web Application for Manual Relief Shadings and Prediction Comparisons
We designed a web application that helps users explore the manual and neural relief shadings and compare them side by side (
Figure 5). It displays the swisstopo manual relief shadings, already generated predictions according to
Table 3 for the area of Switzerland, and their difference images. In addition, it allows users to outline an area, for instance, an illuminated or a shadowed slope, download the cropped raster files, show overlaid histograms of the brightness values of the manual and neural shadings and of their difference values, and obtain confusion matrices for the outlined areas, presented as a sort of heat map. This interactive tool can be found and tested at
www.reliefshading.ethz.ch (accessed on 3 September 2024).
The histogram of the outlined area, visible on the rightmost difference image, indicates that the neural shading tends towards middle-grey tones in the range 90–160 on the non-illuminated slopes, with far fewer dark tones (less contrast) and almost no bright values in the range 240–255. The histogram of the difference values shows how close the mean values are to zero and whether they lie on the darker or the brighter side. In the subsections below, we take a closer look at these differences.
3.2.2. Subtraction
To visualise differences between manual and neural shadings, we performed pixel-based subtraction (
Figure 6), where yellow and close-to-yellow tones show similar values, red tones demonstrate values that are darker on manual reliefs, and green tones depict lighter areas on manual reliefs.
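The core of this step can be sketched as a signed per-pixel subtraction; this is a minimal sketch assuming 8-bit greyscale rasters already aligned to the same grid (file handling and the exact colour palette of the figure are omitted, and the toy arrays are our own stand-ins):

```python
import numpy as np

def shading_difference(manual, neural):
    """Signed per-pixel difference between two greyscale shadings (0-255).

    Positive values mean the manual shading is lighter (mapped to green
    in the difference images), negative values mean it is darker (red),
    and values near zero map to yellow. Both arrays must share the same
    shape and resolution.
    """
    return manual.astype(np.int16) - neural.astype(np.int16)

# Toy 2x2 example instead of real rasters (hypothetical values):
manual = np.array([[250, 40], [128, 128]], dtype=np.uint8)
neural = np.array([[200, 90], [128, 120]], dtype=np.uint8)
diff = shading_difference(manual, neural)
# diff[0, 0] == 50: manual lighter on the illuminated slope (green)
# diff[0, 1] == -50: manual darker on the shadowed slope (red)
```

Casting to a signed type before subtracting is essential: subtracting unsigned 8-bit arrays directly would wrap around instead of producing negative differences.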
Based on this, we may see similar patterns at the large scales (
Figure 6a,b). Here, the illuminated slopes are clearly lighter overall on the manual reliefs, whereas the shadowed slopes are consistently darker than those on the neural shadings. This means that manually created reliefs still show higher contrast at the ridges and shadowed slopes, with a large presence of white and close-to-white values on the illuminated slopes. In addition, there is a higher level of detail on the manual relief shadings, while the neural networks tend to smooth out the smallest details and only keep the larger ones. Large valleys also show a consistently lighter grey tone on the manual relief shadings, whereas the neural shadings deliver overall darker middle-grey tones, which improves with more training data. At the same time, the range of the difference values at these scales is the lowest among the four scales.
At the smaller scales, i.e., 1:100,000 and 1:200,000, there are higher contrast differences, and the overall pattern changes (
Figure 6c,d). Detail is mainly missing on the slopes; the tonal changes are equally spread over both illuminated and shadowed slopes, with overall darker values on the manual shadings, and the ridges still lack greyscale variation from one side to the other.
Figure 6c shows the subtraction at 1:100,000, and here we can again see tonal differences scattered on the slopes independent of the illumination direction, although these differences are noticeably higher than those at larger scales. In general, at smaller scales, manual relief shadings tend to deliver higher contrast between light and dark values, while the predictions, in turn, give darker middle-grey values and even smoother slope shading.
The latter relates also to the smallest scale of 1:200,000 presented in
Figure 6d. At the small scales, there are again both long and short ridgelines that mark the break between tonal differences, while more of the tonal values across the slopes tend to match between the manual relief shadings and the predictions.
At small scales, there are yellowish stripes at the foot of the mountains, which indicate very similar tonal values and may speak in favour of the aerial perspective effect being well preserved in the predictions at all scales. At the same time, this may also be because we employ the elevation of the DEM, so that differences in the elevation of valley floors are accounted for.
To sum up, the tonal differences demonstrate that despite the generally high quality of the predictions, contrast changes are still present at all scales to a certain degree. Manual relief shadings still possess many more bright tones and deliver more contrast at the ridgelines and overall.
3.2.3. Tonal Distribution
In this subsection, we check if tonal variations in predictions and our ground truth data, i.e., manual relief shadings, have similarities.
Table 6 shows comparisons of the tonal distribution of predictions at different scales and resolutions.
The first noticeable trend is that the number of the brightest pixels in the manual shadings is up to ten times higher than in the neural shadings. Such a high amount of the brightest values is only present in the manually shaded reliefs, probably due to bright spots left white on paper; these bright values are essentially paper-white. In addition, cartographers made extensive use of white gouache in the drawing process to emphasise the bright areas.
Second, the prediction distribution curves tend to follow those of the manual shadings in most cases, but they appear shifted towards the middle tones and cover a shorter range, excluding the brightest and the darkest values. By adjusting the range of pixel counts, we can see whether black and white values are present in the manual relief shadings and the predictions at the corresponding scales (
Figure 7 and
Figure 8).
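The tonal comparison can be sketched as computing 256-bin histograms for both rasters; the toy arrays below are our own stand-ins for the real shadings, chosen only to mimic the observed behaviour:

```python
import numpy as np

def tonal_distribution(shading):
    """Count of pixels per grey value (0-255) in an 8-bit shaded relief."""
    return np.bincount(shading.ravel(), minlength=256)

# Hypothetical stand-ins: a "manual" raster using the full tonal range
# and a "neural" raster compressed towards the middle greys.
rng = np.random.default_rng(0)
manual = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)
neural = rng.integers(40, 216, size=(100, 100), dtype=np.uint8)

hist_manual = tonal_distribution(manual)
hist_neural = tonal_distribution(neural)
# hist_neural has no counts below 40 or above 215: the curve is
# "shorter in length", as observed for the predictions.
```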
There are more dark values present in the manual relief shadings in comparison to the predictions. One of the reasons is that the network tends to even out extreme values during training. The same happens with the bright values (
Figure 8), where the y range for manual shadings stays unchanged to emphasise the large number of bright values.
When looking closer at the clipped area, one can notice outliers on manual shadings at 1:100,000 and 1:200,000 (
Figure 9) scales, which may be scanning artefacts. At the same time, the manual relief shadings at small scales miss a number of values, while the neural shadings show a more consistent tonal distribution with all values present. Generally, the smoother tonal values of the predictions compared to the rather grainy manual shadings can probably also be explained by the fact that the manual relief shadings were originally scanned at five times higher resolution than those presented in
Table 1; thus, they were downsampled to match the resolution of the predictions.
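The downsampling step mentioned above can be sketched as block averaging; note that the actual resampling filter used for the scanned reliefs is not specified here, so block averaging is only an assumed, illustrative choice:

```python
import numpy as np

def downsample_mean(raster, factor=5):
    """Downsample a greyscale raster by averaging factor x factor blocks.

    Block averaging is one possible resampling method; other filters
    (e.g., bilinear or cubic) would serve the same purpose.
    """
    h, w = raster.shape
    h, w = h - h % factor, w - w % factor   # crop to a multiple of factor
    blocks = raster[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# A uniform 10x10 toy raster shrinks to 2x2 with the same grey value.
coarse = downsample_mean(np.full((10, 10), 200, dtype=np.uint8))
```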
Overall, the histograms of the differently generated shaded reliefs at the same scales and resolutions share high numbers of bright values and similar patterns, but with more outliers, missing values, and grainy distribution curves for the manual relief shadings vs. smoother curves and more middle-grey values for the neural relief shadings.
3.2.4. Confusion Matrices as Heat Maps
When cropping an area in the interactive web tool, along with the histograms the user will see a confusion matrix for that area. It resembles a heat map and shows the pixel values of the manual reliefs on the
x-axis and the predicted values on the
y-axis. For each pixel value on the manual relief, the user can see the frequency of the corresponding predicted values. The greener the cell, the more instances the model predicted for a certain manual value. If the model perfectly reconstructed the manual reliefs, this diagram would show a straight diagonal line. One can see that there is roughly a straight line surrounded by a lot of noise (
Figure 10). The larger the cropped area, the more balanced and even the distribution of green values becomes. While heat maps can be useful for obtaining a better understanding of the predictive quality of relief shading models, we suggest using them only for clearly delineated features (e.g., spatially delineated landforms, single slopes, ridges), as their expressiveness may diminish when computed for larger areas (e.g.,
Figure 10c).
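The heat map shown in the tool can be sketched as a 2D histogram of (manual, predicted) value pairs; `shading_confusion` is our own name for this computation:

```python
import numpy as np

def shading_confusion(manual, neural):
    """2D histogram with manual grey values on the x-axis (columns) and
    predicted values on the y-axis (rows). A perfect prediction puts
    all counts on the diagonal."""
    counts, _, _ = np.histogram2d(
        neural.ravel(), manual.ravel(),   # first arg -> rows (y), second -> columns (x)
        bins=256, range=[[0, 256], [0, 256]],
    )
    return counts

# A model that reproduces the manual shading exactly:
manual = np.arange(256, dtype=np.uint8).reshape(16, 16)
counts = shading_confusion(manual, manual)
# All 256 pixels fall on the diagonal; off-diagonal cells are zero.
```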
3.2.5. Final Assessment
We present the quantification of the differences from the previous subsections by the mean values and their standard deviations in
Table 7 and
Table 8 (80% and 100% of the training points) and visualise mean values as clustered columns and standard deviations as error bars (
Figure 11 and
Figure 12).
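The quantities in these tables can be sketched as simple statistics over the signed difference image; the example below uses hypothetical toy values:

```python
import numpy as np

def difference_stats(manual, neural):
    """Mean and standard deviation of the signed difference image.

    A mean close to zero means the prediction matches the overall
    brightness of the manual shading; the sign indicates whether the
    prediction is darker (positive) or lighter (negative) on average.
    """
    diff = manual.astype(np.float64) - neural.astype(np.float64)
    return diff.mean(), diff.std()

# Hypothetical example: a prediction uniformly 5 grey values darker.
manual = np.full((10, 10), 130, dtype=np.uint8)
neural = manual - 5
mean, std = difference_stats(manual, neural)
# mean == 5.0 (prediction darker), std == 0.0 (constant offset)
```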
The lower the absolute mean values, the closer the tonal values of the predictions are to those of the manual relief shadings. Tonal values on manual shadings do not depict absolute heights, but “the approximate appearance of differences in relative elevation” [1]. Thus, they are not an absolute measure to refer to; however, in our research, comparing the predicted shadings to the manual ones helps us to see whether there is a correlation between scale and resolution, since manual relief shadings present a historically important and carefully considered level of detail for a certain scale. As we can see, at scales of 1:25,000 and 1:100,000 (
Figure 11) and 1:25,000 and 1:50,000 (
Figure 12), the lowest absolute mean values are those of the resolutions 12.5 m and 100 m, and 12.5 m and 25 m, respectively (the numbers closest to zero for each of the scales). The negative values only indicate on which side (darker or lighter) the mean values lie; what we look at is how close the value is to zero. For smaller scales, the 50 m (
Figure 11) and 25 m resolutions (
Figure 12) show the values closest to zero, while 50 m and 100 m resolutions give the second-best values (
Figure 12). Here, the difference between the quantities of training data is not large; the main distinction concerns the consistency of the values. Overall, there is a tendency of ascending mean values across the resolutions within one scale, which fits the scale/resolution ratios from
Table 1. There are outliers, e.g., the highest values of 50 m resolution for the 1:50,000 scale (
Figure 11) or the 12.5 m resolution for the 1:200,000 scale (
Figure 12), as well as second-best values for certain scales and resolutions; given that every single model gives a slightly different prediction, we may assume that further models would give a potentially better and more consistent picture. For some scales, the mean values are not the closest to zero for the best-fit resolutions but are still much lower than those of the least-fit ones.
As one can see from
Table 7 and
Table 8, the standard deviation values are fairly close to one another and yet tend to change slightly along each scale in favour of the suitable resolutions from
Table 1.
This lets us affirm that the correlation between scales and resolutions presented in
Table 1 is not a coincidence and that, based on the assessment conducted, the scale does require and define the ideal resolution, and vice versa.
3.3. Testing
Below are the neural shadings produced by the models that we tested on the two areas in the USA, i.e., outside the training area, where the multi-resolution DEMs [
20] are available. The resolution range (
Figure 13 and
Figure 14) is different from the one we used for training. However, we generated predictions by applying the model trained at 1:25,000 and 12.5 m to resolutions finer than 12.5 m, the model trained at 1:50,000 and 25 m to 15 m, the model trained at 1:100,000 and 50 m to 30 m, and the model trained at 1:200,000 and 100 m to all resolutions from 90 m and coarser; all models were trained with 80% of the points.
These testing results demonstrate that it is possible to employ the models not only on different areas but also on resolutions different from those the models were trained on. To compare the neural shadings against ground truth data, we also tested the multi-resolution DEMs of the Churfirsten and Säntis area, available at resolutions of 30–2000 m (
Table 9).
Here, two resolutions lie outside the range we trained the models with: 30 m (between 25 m and 50 m) and 60 m (between 50 m and 100 m). Therefore, in the first two rows, two neural shadings are displayed, each generated with the models of the lower and higher resolutions, i.e., for the 30 m DEM, Model 1 at 1:50,000 and 25 m and Model 2 at 1:100,000 and 50 m. The remaining resolutions of 120 m, 250 m, and 500 m are all coarser than 100 m, so we applied to each of them only the model trained at 1:200,000 and 100 m. The resulting images show that the higher-resolution models (Model 1 in the first two rows) deliver a level of detail and contrast closer to that of the ground truth data (swisstopo manual relief shadings, marked green). For this reason, we recommend choosing the model with the higher resolution if the input resolution lies between the values 12.5 m, 25 m, 50 m, and 100 m. The eight models (80% and 100% training points) are publicly available for download from polybox (
https://www.polybox.ethz.ch/index.php/s/G5KpbhAwAPOtPXJ, accessed on 1 September 2024).
Based on the last tests presented in
Table 9, here are the guidelines derived from the analysis in the previous chapters (
Figure 15). Some of these guidelines relate to the quality of input data and some to the training itself.
According to
Table 9 and
Figure 15, the following models are to be applied in different scenarios:
The corresponding four models: for the exact same scales and resolutions;
Model 1:25,000 and 12.5 m: for scales < 1:25,000–1:50,000 and/or resolutions < 25 m;
Model 1:50,000 and 25 m: for scales 1:50,000–1:100,000 and/or resolutions < 50 m;
Model 1:100,000 and 50 m: for scales 1:100,000–1:200,000 and/or resolutions < 100 m;
Model 1:200,000 and 100 m: for scales > 1:200,000 and/or resolutions > 100 m
For resolutions finer than 12.5 m or coarser than 100 m, we recommend creating neural shadings with the models 1:25,000 and 12.5 m and 1:200,000 and 100 m, respectively. For resolutions and scales between those listed, we recommend employing the next-higher-resolution and/or larger-scale model.
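The selection rules above can be sketched as a small helper; the function name and the returned labels are our own, and the thresholds follow the list above together with the recommendation to prefer the higher-resolution model for in-between values:

```python
def pick_model(resolution_m: float) -> str:
    """Return the recommended model (scale / training resolution) for a
    given input DEM resolution in metres. For resolutions between the
    trained values, the higher-resolution (finer) model is preferred."""
    if resolution_m < 25:
        return "1:25,000 / 12.5 m"
    if resolution_m < 50:
        return "1:50,000 / 25 m"
    if resolution_m < 100:
        return "1:100,000 / 50 m"
    return "1:200,000 / 100 m"

# Examples: a 30 m DEM falls between 25 m and 50 m, so the finer
# 1:50,000 / 25 m model is recommended; 120 m is coarser than 100 m,
# so the 1:200,000 / 100 m model applies.
```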