3.1.3. Ranking Based on Skill in Reproducing the Reference Climate

The models were also evaluated with respect to their skill in simulating the past climate during the reference period (1976–2005). The selected climate model simulations were compared to the reference gridded temperature and precipitation dataset [7] and were assigned skill scores. Different methods were used to assign skill scores to temperature and precipitation. For assessing the performance of models in simulating past temperature, we applied the method of Perkins et al. [41], in which the skill score is based on the similarity between the PDFs of the modelled data and the observed reference data. The metric sums, over all bins, the minimum of the two binned frequencies, which represents the common area between the two PDFs. This skill score (*SkTmp*) can be expressed as follows:

$$Sk\_{Tmp} = \sum\_{1}^{n} \min \left( Z\_{CM}, \ Z\_{Obs} \right) \tag{3}$$

where *n* is the number of bins used to calculate the PDF, *ZCM* is the frequency of values in a given bin from the model, while *ZObs* is the frequency of values in the same bin from the observed data. This skill score is 1 when there is a perfect match between the simulated and the observed data, while a score of 0 means no overlap at all.

The number of bins used in this study to generate the PDFs was 50.
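The PDF-overlap metric of Equation (3) can be sketched as follows. This is a minimal illustration, not the authors' code; it assumes both series are binned on a common range with the counts normalised to relative frequencies:

```python
import numpy as np

def perkins_skill_score(model, obs, n_bins=50):
    """PDF-overlap skill score (Eq. (3), after Perkins et al.).

    Bins both series on a common range, normalises the counts so each
    empirical PDF sums to 1, and sums the bin-wise minimum of the two
    frequency distributions (the common area under the two PDFs).
    """
    edges = np.linspace(min(model.min(), obs.min()),
                        max(model.max(), obs.max()), n_bins + 1)
    z_cm, _ = np.histogram(model, bins=edges)
    z_obs, _ = np.histogram(obs, bins=edges)
    # Normalise counts to relative frequencies.
    z_cm = z_cm / z_cm.sum()
    z_obs = z_obs / z_obs.sum()
    return float(np.minimum(z_cm, z_obs).sum())

rng = np.random.default_rng(0)
obs = rng.normal(10.0, 3.0, 10_000)        # synthetic "observed" temperatures
perfect = perkins_skill_score(obs, obs)    # identical PDFs give a score of 1
biased = perkins_skill_score(obs + 5.0, obs)  # a warm-biased model scores < 1
```

A score of 1 is recovered exactly when the model reproduces the observed distribution, and any shift or shape mismatch reduces the overlap.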

In the case of precipitation, the skill score is calculated by a method proposed by [42] as the product of five skill functions, each assessing similarities between modelled and observed data, while covering different aspects of precipitation behavior. These five skill score functions for a particular model *j* are listed below:

$$f\_{1j} = 1 - \left(\frac{|A\_{CMj} - A\_{Obs}|}{2 \cdot A\_{Obs}}\right)^{0.5} \tag{4}$$

$$f\_{2j} = 1 - \left(\frac{\left|A\_{CMj}^+ - A\_{Obs}^+\right|}{2 \cdot A\_{Obs}^+}\right)^{0.5} \tag{5}$$

$$f\_{3j} = 1 - \left(\frac{\left|A\_{CMj}^{-} - A\_{Obs}^{-}\right|}{2 \cdot A\_{Obs}^{-}}\right)^{0.5} \tag{6}$$

$$f\_{4j} = 1 - \left(\frac{\left|\overline{P\_{CMj}} - \overline{P\_{Obs}}\right|}{2 \cdot \overline{P\_{Obs}}}\right)^{0.5} \tag{7}$$

$$f\_{5j} = 1 - \left(\frac{\left|\sigma\_{CMj} - \sigma\_{Obs}\right|}{2 \cdot \sigma\_{Obs}}\right)^{0.5} \tag{8}$$

where *ACMj* and *AObs* are the total areas below the simulated (climate model *j*) and the observed precipitation probability density function (PDF) curves, respectively, and *A+* and *A−* are the fractional areas above (+) and below (−) the 50th percentile. *P* denotes the average annual precipitation over the UIB and *σ* is the standard deviation of the probability density function.

Each of the above factors is intended to cover different aspects of probability distribution characteristics of the climate models, so that the distribution as a whole is taken into account through the mean and the total area (Equations (4) and (7)), the smaller and higher precipitation amounts are accounted for, through the 50th-percentile limit (Equations (5) and (6)), while the shape of the distribution is defined through the variance (Equation (8)).

These five factors are multiplied together to yield a single final skill score *(SkPrec)* for precipitation estimated by each model *j*:

$$Sk\_{\text{Prec}} = f\_{1\text{j}} \cdot f\_{2\text{j}} \cdot f\_{3\text{j}} \cdot f\_{4\text{j}} \cdot f\_{5\text{j}} \tag{9}$$
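The five factors and their product can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the empirical PDFs are built on a common set of bins, and the observed median is taken as the common 50th-percentile threshold for the fractional areas:

```python
import numpy as np

def skill_factor(sim, ref):
    """Generic factor f = 1 - (|sim - ref| / (2*ref))**0.5 (Eqs. (4)-(8))."""
    return 1.0 - (abs(sim - ref) / (2.0 * ref)) ** 0.5

def precip_skill_score(model, obs, n_bins=50):
    """Product of the five precipitation skill factors (Eq. (9))."""
    edges = np.linspace(min(model.min(), obs.min()),
                        max(model.max(), obs.max()), n_bins + 1)
    widths = np.diff(edges)

    def pdf_areas(x):
        # Empirical PDF on the common bins; areas via density * bin width.
        dens, _ = np.histogram(x, bins=edges, density=True)
        total = float((dens * widths).sum())
        below = edges[1:] <= np.percentile(obs, 50)  # bins under the median
        a_minus = float((dens * widths)[below].sum())
        return total, total - a_minus, a_minus

    a_cm, ap_cm, am_cm = pdf_areas(model)
    a_ob, ap_ob, am_ob = pdf_areas(obs)

    factors = [
        skill_factor(a_cm, a_ob),                # f1: total area under PDF
        skill_factor(ap_cm, ap_ob),              # f2: area above 50th pct
        skill_factor(am_cm, am_ob),              # f3: area below 50th pct
        skill_factor(model.mean(), obs.mean()),  # f4: mean precipitation
        skill_factor(model.std(), obs.std()),    # f5: standard deviation
    ]
    return float(np.prod(factors))
```

Because the score is a product, a model must do reasonably well on every aspect of the distribution: a single poor factor drags the whole score down.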

As a final step, all the rankings/scores, based on the changes in the means and in the extremes, as well as the skill scores for reproducing reference temperature and precipitation, are multiplied together to get the final overall skill or rank as follows:

$$\text{Final Skill Score} = Sk\_{EI\_1} \cdot Sk\_{EI\_2} \cdot Sk\_{\Delta Tmp} \cdot Sk\_{\Delta Prec} \cdot Sk\_{Tmp} \cdot Sk\_{Prec} \tag{10}$$

Under this scheme, a higher skill score indicates better performance, while a lower value indicates poorer performance. These skill scores can further be translated into a simple ranking from 1 to 4 for each group of climate models.
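The multiplicative combination of Equation (10) and the subsequent 1-to-4 ranking can be illustrated as follows; the run names and sub-score values here are entirely hypothetical, and the labels are only illustrative names for the six sub-scores:

```python
import numpy as np

# Hypothetical sub-scores for four model runs (illustrative values only),
# ordered as [Sk_EI1, Sk_EI2, Sk_dTmp, Sk_dPrec, Sk_Tmp, Sk_Prec].
sub_scores = {
    "run_a": [0.90, 0.80, 0.70, 0.90, 0.85, 0.80],
    "run_b": [0.60, 0.90, 0.80, 0.70, 0.90, 0.75],
    "run_c": [0.95, 0.85, 0.90, 0.80, 0.88, 0.82],
    "run_d": [0.50, 0.60, 0.60, 0.70, 0.70, 0.65],
}

# Eq. (10): the final score is the product of all six sub-scores.
final = {run: float(np.prod(s)) for run, s in sub_scores.items()}

# Higher product -> better rank (1 = best of the four runs).
ordered = sorted(final, key=final.get, reverse=True)
rank = {run: i + 1 for i, run in enumerate(ordered)}
```

Sorting by the product and numbering the runs reproduces the simple 1-to-4 ranking described above.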

The climate model selection procedure adopted in this study is in line with the approach and methods suggested by [21,41,42], although with certain modifications in the evaluation criteria. For assessing the performance of models in simulating past temperature, the method applied is adopted from [41], while in the case of precipitation, guidance is taken from [42]. The major differences from [21] in assessing model performance in simulating the past climate are the use of a new long-term climate dataset and an additional evaluation step that assesses model runs for their skill in reproducing the annual cycles of precipitation and temperature.
