*3.2. Cross-Validation Statistics for Hydrological Simulations*

#### 3.2.1. Spatially Distributed Variables

How are the differences in the ability of the four methods to adjust the joint distribution of temperature and precipitation reflected in the hydrological simulations? Figure 6 shows the average cross-validation statistics for the five hydrological components (R, E, S, SWE and SWEmax) in the two scenario periods (see Figure S5 for the statistics in years 1981–2010). For clarity, the results are shown only for the two-month time window in the remainder of the paper. For the MAE in monthly mean runoff (R) in the late 21st century period, M4 outperforms the other methods in pseudo-reality approach V0, although its advantage over M1 is small. In pseudo-reality approach V1, however, the MAE is practically identical for M1–M2 and M4. M3 performs somewhat worse than the other methods, which is likely caused by its larger remaining errors in the precipitation distribution. For evapotranspiration (E) as simulated by HYPE, the bi-variate methods M3 and M4 have the smallest MAE in monthly mean values, and M3 has the smallest MAE of all methods in pseudo-reality approach V0. In addition, the results for E illustrate a side effect of the additional pseudo-reality adjustment (V1): the MAE in E is systematically larger in V1 for all methods, as the systematic underestimation of temperature in V0 likely leads to too weak evapotranspiration in comparison to real-world hydrological simulations.

We next consider the cross-validation statistics for the two storage variables. The MAE of soil moisture (S) is almost identical for all methods, which indicates that it is relatively insensitive to the adjustment of daily-scale inter-variable correlations. The small differences between the four methods tend to follow those seen in the MAE calculated over the precipitation distribution, as the MAE is smallest and almost identical for methods M2 and M4. The last two panels in Figure 6 show the MAE of the monthly mean SWE and SWEmax. These results illustrate the main benefit of pseudo-reality approach V1. As predicted, the adjustment of pseudo-reality GCM-RCMs reduces biases in snow variables, although at the expense of increased MAE for E, as discussed above. This also causes differences in the relative ranking of the correction methods between the two pseudo-reality approaches (V0 and V1): M4 performs slightly worse than M1 in pseudo-reality approach V0, whereas the opposite is seen after adjusting the pseudo-reality GCM-RCMs towards WFDEI (V1). Overall, these results indicate that the simulation of most hydrological aspects is only marginally improved by joint bias correction and that the accurate adjustment of marginal distributions plays a more important role, at least when only temperature and precipitation are used as input to a hydrological model such as HYPE.

**Figure 6.** Similar to Figure 5 but for the cross-validated MAE of monthly mean (**a**) total runoff, (**b**) evapotranspiration, (**c**) soil moisture, (**d**) snow water equivalent and (**e**) the mean annual maximum of snow water equivalent in years (**bottom**) 2011–2040 and (**top**) 2061–2090.

The cross-validation statistics for the near-term scenario period are in line with the corresponding statistics of temperature and precipitation, with generally smaller errors in all studied hydrological aspects than in the later scenario period. The relatively better performance of M1 in adjusting the joint distribution of temperature and precipitation in that period is to some extent reflected in the hydrological simulations (bottom row of Figure 6), as R, E and S are all better captured by M1 in the near-future period. In contrast to the later scenario period, M2 and M4 have smaller MAE values in monthly mean evapotranspiration than M3. The cross-validation statistics of monthly mean SWE and SWEmax show the largest differences between bias adjustment methods also in this period, indicating that, among the studied hydrological aspects, the choice of method matters most for the snow variables.
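To make the statistic behind Figure 6 concrete, the following is a minimal sketch of an MAE computed over monthly means, comparing one bias-adjusted simulation against one pseudo-reality. The function names and the toy numbers are hypothetical and are not taken from the study; the averaging over grid cells, time windows and pseudo-realities that the cross-validation involves is left out and would follow the definitions given in the methods section.

```python
from statistics import mean


def monthly_means(daily_values, months):
    """Average daily values by calendar month (month labels 1-12)."""
    buckets = {}
    for value, month in zip(daily_values, months):
        buckets.setdefault(month, []).append(value)
    return {month: mean(values) for month, values in buckets.items()}


def mae_of_monthly_means(corrected, pseudo_reality, months):
    """MAE between the monthly means of a bias-adjusted run and a pseudo-reality.

    Both series are assumed to share the same daily time axis ('months').
    """
    sim = monthly_means(corrected, months)
    ref = monthly_means(pseudo_reality, months)
    return mean(abs(sim[m] - ref[m]) for m in ref)


# Toy data (hypothetical): two days in January, two days in February.
months = [1, 1, 2, 2]
pseudo_reality = [1.0, 3.0, 2.0, 4.0]  # monthly means: 2.0 and 3.0
corrected = [2.0, 3.0, 2.0, 2.0]       # monthly means: 2.5 and 2.0

score = mae_of_monthly_means(corrected, pseudo_reality, months)
print(score)  # 0.75
```

In this toy case the absolute errors of the two monthly means are 0.5 and 1.0, so the MAE is 0.75. The same function could be applied to any of the variables R, E, S or SWE, one method and one pseudo-reality at a time.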

#### 3.2.2. Evaluation of Future River Discharges

The analysis is complemented by the cross-validated LAR10 for Q5 and Q99 as well as the distribution-averaged LAR10 in the two scenario periods (Figure 7). The absolute values of LAR10 vary to some extent between the two pseudo-reality approaches. For example, the LAR10 of Q5 is systematically smaller in V1 for all methods (apart from M0) in both periods, while the opposite is seen for Q99 in the early 21st century period. Furthermore, the performance of all methods is extremely consistent when the distribution-averaged LAR10 is considered. Methods M2 and M4 have a marginally smaller LAR10 than M1 and M3, while in the earlier scenario period method M1 performs equally well or even better than M2 and M4. The best-performing method also depends on the pseudo-reality approach when low flows (Q5) are considered. In V0, method M2 somewhat outperforms the other methods in both periods, while in V1 method M3 performs slightly better than the other methods. On the other hand, M1 has the largest LAR10 values in both periods, which is probably related to its larger errors in temperature and, accordingly, in evapotranspiration. The simulation of Q99 seems to benefit marginally from the adjustment of inter-variable correlations, as M4 has the smallest LAR10 among the four methods, particularly in years 2011–2040. Again, the LAR10 is larger for M3 than for the other methods, most likely due to the combination of the aforementioned issues.

**Figure 7.** Similar to Figure 5 but for the cross-validated LAR10 in (**a**) the 5th and (**b**) 99th percentile of flow duration curves shown together with (**c**) the distribution-averaged LAR10 in years 2011–2040 (**bottom**) and 2061–2090 (**top**).
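As an illustration of the discharge statistics in Figure 7, the sketch below computes flow percentiles and a log accuracy ratio for a toy discharge series. It rests on two assumptions that are not stated in this section: that LAR10 is the absolute base-10 logarithm of the ratio between the simulated and the pseudo-reality flow quantile, and that the distribution-averaged LAR10 is the mean of that ratio over percentiles 1 to 99. The definitions given in the methods section take precedence over these. All function names and numbers are hypothetical.

```python
import math


def percentile(values, p):
    """Percentile (p in 0-100) of a sample, using linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def lar10(simulated, reference):
    """Assumed definition: |log10(simulated / reference)| for positive flows."""
    return abs(math.log10(simulated / reference))


def distribution_averaged_lar10(simulated_q, reference_q, percentiles=range(1, 100)):
    """Assumed definition: mean LAR10 over a set of flow percentiles."""
    ratios = [
        lar10(percentile(simulated_q, p), percentile(reference_q, p))
        for p in percentiles
    ]
    return sum(ratios) / len(ratios)


# Toy data (hypothetical): the simulated discharge is twice the reference.
reference_q = [float(q) for q in range(1, 102)]  # 1.0, 2.0, ..., 101.0
simulated_q = [2.0 * q for q in reference_q]

lar_q5 = lar10(percentile(simulated_q, 5), percentile(reference_q, 5))
lar_q99 = lar10(percentile(simulated_q, 99), percentile(reference_q, 99))
lar_avg = distribution_averaged_lar10(simulated_q, reference_q)
print(lar_q5, lar_q99, lar_avg)
```

Because the toy simulation overestimates every flow by a factor of two, all three values equal log10(2), roughly 0.30. Under the assumed definition an underestimation by the same factor would give the same value, which makes the measure symmetric for low and high flows.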
