*2.4. Model Output Statistics*

Non-parametric quantile mapping applied both in the delta change (M1) and bias correction (M2) mode is used to benchmark the ability of joint bias correction methods to adjust temperature and precipitation for biases in the GCM-RCM simulations (see Table 3). In the following, the formulation is shown for the bias correction form of quantile mapping. For given simulated values of daily mean

temperature or precipitation in the scenario period (*si*), the projected values *pi* are obtained by transforming *si* according to

$$p\_{\bar{i}} = F\_o^{-1} \left( F\_c \left( s\_{\bar{i}} \right) \right). \tag{1}$$

here *Fc* denotes the cumulative distribution function of the baseline period simulation and *F*−<sup>1</sup> *<sup>o</sup>* its inverse estimated from the observations. Formulation for the delta change form is simply obtained by switching the indexes for the observations (*o*) and the future period simulation (*s*).

Before the transformation shown in Equation (1) is applied, the quantile-quantile relationship between *Fc* and *Fo* is smoothed by replacing individual quantiles with a running average taken over a specified quantile range using the approach and numerical values described in Räty [21] and Räty et al. [22]. If the future simulated values are outside the baseline period observations and model simulation, the quantile relationship is extrapolated assuming a constant, additive, relationship above the highest and below the lowest quantile for daily mean temperature. For precipitation, relative values are used. For further implementation details, the reader is referred to Räty [21] and Räty et al. [22].

**Table 3.** List of bias adjustment methods used in this study together with a short description.


To take biases in the co-variations of daily mean temperature and precipitation into account, two bi-variate bias correction methods were implemented and compared with their univariate counterparts. In the first one (M3), the dependence structure is modeled separately from the marginal (i.e., unconditional) distributions of temperature and precipitation using a copula-based approach as described in Li et al. [14]. The implementation of this method is based on the properties of copula described by Sklar's theorem [37], which states that, given two random variables *X* and *Y* such as daily mean temperature and precipitation, their joint cumulative distribution (*H*(*x*, *y*)) can be constructed as

$$H(\mathbf{x}, y) = \mathbb{C}(F(\mathbf{x}), G(y)),\tag{2}$$

where *C*() denotes the cumulative copula distribution and *F*(*x*) and *G*(*y*) are the cumulative marginal distributions for *X* and *Y*. Here, it is assumed that the dependence structure of daily mean temperature and precipitation can be reasonably modeled using a Gaussian copula as in Li et al. [14]. The Gaussian copula was chosen due to its relatively simple formulation and its ability to model both positive and negative correlations. Although several other parametric copulas are available for modeling the dependence structure, testing them is beyond the scope of this paper. Marginal distributions were modeled with parametric distributions in a similar manner as in Yang et al. [2] and Li et al. [14]. Temperature is assumed to follow Gaussian distribution, while gamma distribution is used to model precipitation above a wet-day threshold, here defined as 0.1 mmd<sup>−</sup>1. To improve the performance of M3 at daily temporal scales, separate temperature distributions were fitted both on dry and wet days following a pre-adjustment of the fraction of wet days in model simulations to the observed one [2,35]. In M3, bias correction of the wet-day part of the joint distribution needs to be applied conditionally on either temperature or precipitation, thus offering two approaches to correct the joint distribution. Based on tests with these two alternatives, we decided to bias correct precipitation conditionally on temperature due to the slightly better overall performance of this option (not shown). The correlation parameter values obtained from fitting Gaussian copula to the baseline period simulations are illustrated in the supplementary material (Figure S1). For further details, the reader is referred to an R package available in GitHub [38], which contains implementations of methods M1–M3 written by the authors.

The second bi-variate bias correction method (M4) was recently proposed by Cannon [17] based on the N-pdft algorithm designed by Pitié et al. [36]. In this method the full 2-dimensional distribution structure is adjusted iteratively by reducing the adjustment of the 2-dimensional distribution to a series of 1-dimensional bias corrections of the marginal distributions. First, both temperature and precipitation distributions are normalized and randomly rotated to a new orthogonal coordinate system. Second, quantile mapping is applied to the rotated distributions. The adjusted distributions are then rotated back to the original coordinate system before repeating the described sequence. After several successive iterations it can be shown that the joint distribution converges to the target distribution [36]. As discussed by Cannon [17], the algorithm constructs the joint distribution at each iteration step as a linear combination of the bias corrected marginal distributions, which allows modifying the dependence structure. In the original article quantile delta mapping [4,8] was used to adjust the marginal distributions along the iteration cycles. Here, we use the same quantile mapping algorithm as in M2 instead of quantile delta mapping when bias correcting the marginal distributions of temperature and precipitation. Doing this, the bias corrected temperature and precipitation distributions are identical in M2 and M4. No smoothing was applied to the marginal distributions in the rotation step, as this would had contracted the underlying joint distribution of observations and the control period simulation. The algorithm was terminated after 50 iterations, which should be sufficient for the algorithm to converge to the target distribution, as illustrated by Cannon [17]. The implementation of M4 was based on the R package [39] available in the CRAN repository [40].

To take biases in the annual cycle into account, daily mean temperature and precipitation time series were adjusted on a monthly basis at each sub-model domain. As sampling errors are likely to affect the estimation of simulated changes (M1) and biases (M2–M4) in the GCM-RCM distributions, the effect of increased sample size on method performance was addressed by using both one- and two-month time windows when estimating model biases and simulated changes from GCM-RCM simulations. Using an even larger time window could in principle reduce the sampling noise [41], although with the expense of possibly introducing systematic biases to the future results. Tests with longer time windows did not show significant changes in the results, although a more systematic comparison would be needed to fully assess the potential benefits of reducing sampling noise.

To illustrate how each method represents the dependence structure in the calibration period, Figure 2 shows differences in the empirical copula density for the wet-day values of temperature and precipitation in comparison to the WFDEI copula in winter months of years 1981–2010 in the Tornio sub-model. The copula density has been estimated as a 2-dimensional histogram of the normalized ranks for both variables (see a more detailed description in Section 2.6). The density values can be interpreted as the ratio of the joint probability density to the case, where both variables were independent from each other. For example, values larger than one suggest a larger-than expected joint probability density in this part of the two-dimensional space. Differences in the empirical copula density roughly denote the difference in the strength of dependence for particular cumulative probability values of both temperature and precipitation. The panel for the reference data shows that temperature and precipitation are positively correlated in this example (the highest values slope from bottom-left to top-right). By design, M1 takes the inter-variable relationships directly from the reference. In contrast to the delta change mode, M2 inherits the multivariate dependence structure from the uncorrected GCM-RCM simulation (M0), with some modifications to it due to changes in the fraction of wet days [42]. Therefore, these methods can be thought to give the "limits" in which the multivariate methods can operate on adjusting the inter-variable correlations. Although the dependence structure of temperature and precipitation can be reasonably modeled using Gaussian copula on monthly scales, this might not be as feasible at daily scales, at least in cold climates. From Figure 2, it is immediately seen that although the Pearson correlation coefficient is well captured by M3, the overall copula structure shows noticeable deviations from the target distribution. In this particular case, M3 tends to overestimate the strength of the co-occurrence of low precipitation intensities and medium temperature values. Although the behavior of M3 strongly depends on the selected GCM-RCM, season and location, this example highlights the importance of evaluating the full multivariate dependence structure to reveal such issues. In contrast to M3, M4 performs very well in capturing the WFDEI dependence structure and differences in the copula density field are small in most parts of the distribution.

**Figure 2.** Empirical copula density of wet-day (P > 0.1 mmd<sup>−</sup>1) precipitation and temperature in winter months (December-January-February) estimated from the reference data (WFDEI) in Tornio river catchment separately for each sub-basin and then averaged over the whole domain. In addition, differences in the estimated densities, when compared against WFDEI, are shown for CNRM-A GCM-RCM without bias adjustments (M0) and after applying each of the four methods (M1–M4) in the baseline period (1981–2010). The sub-model-averaged Pearson correlation coefficient is also shown on the top-left corner of each panel.

As another example, Figure 3 shows how the four methods capture the hydrological conditions in the Tornio sub-model, when adjusting the GCM-RCM simulations against WFDEI in the baseline period (1981–2010). For simplicity, a one-month time window has been used when estimating the simulated changes and biases in the GCM-RCM simulations. As expected, M1 has essentially a perfect correspondence with the observed hydrological conditions. The remaining biases in temperature and precipitation are very similar for M2 and M4 but not identical as small differences arise from the re-shuffling of daily values over the full 32-year period in M4, which is an inherent property of this and many other multivariate bias correction methods. Furthermore, M3 shows a very similar pattern for temperature, while the differences to M2 and M4 are more visible in the remaining precipitation biases. The differences between the methods are also visible in the annual monthly mean cycle of different aspects of the simulated surface hydrology. The differences to WFDEI are largest for M3 in most cases, which is expected, as the (potentially sub-optimal) parametric marginal distributions used in M3 match the GCM-RCM-simulated temperature and particularly precipitation less accurately with WFDEI than the non-parametric versions used in M2 and M4. Apart from M1, M4 has generally the smallest differences to WFDEI, particularly in total runoff and snow water equivalent, although the remaining evapotranspiration biases are similar for M2 and M4.

**Figure 3.** A real-world example showing the annual cycle of (**a**) daily mean temperature, (**b**) precipitation, (**c**) total runoff, (**d**) evapotranspiration, (**e**) snow water equivalent and (**f**) soil moisture in the Tornio river catchment in years (solid lines) 1981–2010 and (dashed lines) 2061–2090 separately for each of the GCM-RCMs when adjusted against WFDEI using the four (M1–M4) bias correction and delta change methods. To readily illustrate differences in comparison to WFDEI, the remaining biases in 1981–2010 are shown below the annual cycle panels separately for each method and variable.
