*3.1. Distribution-Averaged Statistics for Daily Mean Temperature and Precipitation*

We first inspect the overall performance of the four methods from the GCM-RCM perspective. Figure 5 illustrates the distribution-averaged cross-validation statistics in the two scenario periods (see Figure S4 for the statistics in years 1981–2010). Concentrating first on the results for the years 2061–2090, bias correction methods M2–M4 slightly outperform M1 in adjusting the temperature distribution in terms of both CM and MAE. On the other hand, both CM and MAE of M3 are very close to those of M2 and M4, which indicates that temperature can be reasonably modeled with a normal distribution. Although the general picture is broadly similar for precipitation, CM and MAE give a partially contrasting view of the relative method performance. The CM values are smallest and almost identical for methods M2–M4, while M3 performs slightly worse than M2 and M4 in terms of MAE. This might be partially explained by the fact that M3 explicitly accounts for the fraction of dry days in its corrections, whereas modeling the precipitation distribution with a gamma distribution might not capture biases in that distribution as efficiently as non-parametric quantile mapping. Using a two-month time window generally reduces errors in both the temperature and precipitation distributions, in line with the results of Räisänen and Räty [21] and Räty et al. [22]. As expected, all methods have substantially smaller errors in the marginal distributions of temperature and precipitation than the uncorrected model simulations (M0) in both periods.
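To make the contrast between parametric and non-parametric corrections concrete, the following is a minimal sketch of empirical (non-parametric) quantile mapping, the approach the text compares against gamma-distribution-based correction. The function name and the plotting-position details are illustrative choices, not the implementation used in the study.

```python
import numpy as np

def empirical_qm(model_cal, obs_cal, model_target):
    """Non-parametric quantile mapping (illustrative sketch): place each
    target value on the empirical CDF of the calibration-period model
    data, then read off the corresponding quantile of the observations."""
    n = len(model_cal)
    # Empirical non-exceedance probability of each target value
    probs = np.searchsorted(np.sort(model_cal), model_target, side="right") / n
    # Keep probabilities strictly inside (0, 1) to avoid edge effects
    probs = np.clip(probs, 0.5 / n, 1.0 - 0.5 / n)
    return np.quantile(obs_cal, probs)
```

A parametric variant would instead fit a distribution (e.g., a gamma for wet-day precipitation) to `model_cal` and `obs_cal` and map through the fitted CDFs; the empirical version above makes no distributional assumption, which is why it can track distribution shapes that a two-parameter gamma misses.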

**Figure 5.** Cross-validated CM and MAE for (**a**,**b**) the daily mean temperature and (**c**,**d**) daily precipitation distributions in years 2011–2040 (**bottom**) and 2061–2090 (**top**). Also shown is the MAE in (**e**) the Pearson correlation coefficient and (**f**) the empirical copula density. Black denotes the cross-validation statistics for the pseudo-reality approach without additional adjustments (V0), while red shows the results for the approach in which the pseudo-realities have been adjusted for biases relative to WFDEI (V1). Furthermore, crosses (bars) indicate the results for the one-month (two-month) time window used to estimate simulated changes or model biases, shown for both V0 and V1. Note that the differences between the one- and two-month time windows are typically small, as indicated by the small differences between the bars and crosses.

The MAE for the Pearson correlation coefficient and the empirical copula density, calculated over the full monthly time series of temperature and precipitation, is also shown in Figure 5. The results for the Pearson correlation coefficient show that, although M3 and M4 improve on method M2, M1 captures the linear correlation between temperature and precipitation slightly better than the other methods. Moreover, M4 seems to be susceptible to noise, as M3 has a somewhat smaller MAE when the one-month time window is used. The situation is slightly different for the MAE in the empirical copula density. While M1 again performs best of all methods, M2 now has MAE values closer to those of the bi-variate methods. The modest improvement obtained with M4 over M1 is again at least partially related to the small sample size, as indicated by the reduction in the MAE values for the two-month time window; this highlights the difficulty of robustly estimating biases in inter-variable correlations in a changing climate. As M1 performs best in terms of both measures regardless of the period considered, this suggests that the inter-variable correlations do not change substantially among the selected models and within the studied regions.
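The two verification measures discussed above can be sketched as follows. This is an illustrative rank-histogram estimate of the empirical copula density, not necessarily the binning used in the study; the function names and the number of bins are assumptions.

```python
import numpy as np

def pearson_r(x, y):
    """Linear (Pearson) correlation between two series."""
    return np.corrcoef(x, y)[0, 1]

def empirical_copula_density(x, y, bins=4):
    """Illustrative empirical copula-density estimate: a normalised 2-D
    histogram of the joint ranks (pseudo-observations) of (x, y)."""
    u = (np.argsort(np.argsort(x)) + 0.5) / len(x)  # ranks scaled into (0, 1)
    v = (np.argsort(np.argsort(y)) + 0.5) / len(y)
    hist, _, _ = np.histogram2d(u, v, bins=bins,
                                range=[[0.0, 1.0], [0.0, 1.0]], density=True)
    return hist

def mae(a, b):
    """Mean absolute error between two arrays of statistics."""
    return np.mean(np.abs(np.asarray(a) - np.asarray(b)))
```

The MAE values in Figure 5e,f would then be obtained by comparing these statistics between corrected simulations and the (pseudo-)observations, averaged over the evaluation cases.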

The bottom row shows the cross-validation statistics for the near-term scenario period (2011–2040). As expected, the remaining errors are generally smaller for all methods in this period. The marginal distributions of both temperature and precipitation are captured slightly better by method M1 than by the other methods, while the relative performance of the other methods does not show marked differences between the two periods. Furthermore, the MAE in the Pearson correlation coefficient and the copula density indicates a slightly improved performance for M3 and M4 relative to M1, although M1 still has the smallest MAE in all cases.

In qualitative terms, the cross-validation statistics are similar for temperature and precipitation regardless of the pseudo-reality approach. By far the largest differences are shown by method M0, for which the cross-validation statistics calculated for temperature deteriorate when the pseudo-reality GCM-RCMs are corrected toward WFDEI (V1), while the opposite happens for the precipitation statistics. For temperature, the larger MAE in V1 is explained by the systematic cold bias within the GCM-RCM ensemble. For methods M1–M4, however, the results are mostly similar between the two pseudo-reality approaches, although the cross-validation statistics for the temperature and precipitation distributions tend to be slightly worse for the two-month time window after the pseudo-realities have been adjusted against WFDEI (V1). This suggests that, from the climate modeling perspective, the additional adjustment step does not substantially modify the cross-validation statistics apart from those of the uncorrected model simulations, supporting its use in the hydrological modeling step.

While not the specific target of this study, it should be noted that an inherent property of M4 is that, in order to obtain correct ranks for each temperature–precipitation pair, both time series need to be temporally re-ordered. This is less of an issue in M3, in which only the temporal sequence of precipitation is potentially modified. As the temporal re-ordering might affect the hydrological simulations, a modified version of M4 was tested. First, the time series of uncorrected temperature and precipitation were divided into dry and wet days in a similar manner as in M3. Next, M2 was applied separately to the wet-day and dry-day distributions to retain the improved statistics obtained with M4. Finally, the N-pdft algorithm was applied only to the wet-day distributions of temperature and precipitation. Tests with the modified algorithm showed, however, that although the cross-validated MAE of both correlation measures decreased slightly, the changes in the cross-validation statistics for the hydrological variables varied non-systematically with the season, region and variable considered (not shown); the modification thus did not offer systematic improvements over the original algorithm.
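The wet/dry splitting step that underlies this modification can be sketched as follows. This is a schematic only: the wet-day threshold, function name, and the simple empirical quantile mapping used here are assumptions for illustration, not the study's actual implementation.

```python
import numpy as np

def correct_wet_dry(model, obs, threshold=0.1):
    """Schematic wet/dry-day split: dry days (below `threshold`, an
    assumed cutoff in mm/day) are left at zero, while wet-day amounts
    are quantile-mapped onto the observed wet-day distribution."""
    corrected = np.zeros_like(model, dtype=float)
    wet = model >= threshold
    obs_wet = np.sort(obs[obs >= threshold])
    if wet.any() and obs_wet.size:
        # Plotting-position probabilities of the wet-day model values
        ranks = np.argsort(np.argsort(model[wet]))
        probs = (ranks + 0.5) / wet.sum()
        corrected[wet] = np.quantile(obs_wet, probs)
    return corrected
```

In the modified M4 of the text, a multivariate step (the N-pdft algorithm) would additionally be applied to the wet-day temperature–precipitation pairs; the sketch above only shows how the dry-day sequence is preserved so that less temporal re-ordering is needed.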
