*2.2. Methods*

To quantitatively compare the consistency of the three reanalysis datasets with in-situ observations and to evaluate the errors in their daily average temperatures, the reanalysis data from 2017–2018 are sampled at eight times per day (00, 03, 06, 09, 11, 14, 17, 20 UTC), which is the division used in GLDAS. The arithmetic mean of these eight values is taken as the daily mean temperature for each reanalysis dataset. Based on the latitude and longitude of the observation sites, the daily mean temperatures from the reanalysis are interpolated to the sites using nearest-neighbor interpolation. Two sequences of daily temperatures, one from the reanalysis and one from the in-situ observations, with 11,635 samples in each sequence, are then compared. The Pearson correlation coefficient (CC), mean bias error (MBE), root-mean-square error (RMSE), Nash–Sutcliffe efficiency coefficient (NSE) [50,51], Kling–Gupta efficiency (KGE) [52,53], and Willmott's Index of Agreement (WIA) [54] are calculated to evaluate the accuracy and applicability of CLDAS, ERA5L, and GLDAS temperature data in the alpine region of the QTP. The indices are calculated as follows:

$$\text{CC} = \frac{\sum\_{\mathbf{i}=1}^{\text{n}} (\mathbf{R}\_{\mathbf{i}} - \overline{\mathbf{R}})(\mathbf{S}\_{\mathbf{i}} - \overline{\mathbf{S}})}{\sqrt{\sum\_{\mathbf{i}=1}^{\text{n}} (\mathbf{R}\_{\mathbf{i}} - \overline{\mathbf{R}})^2} \sqrt{\sum\_{\mathbf{i}=1}^{\text{n}} (\mathbf{S}\_{\mathbf{i}} - \overline{\mathbf{S}})^2}} \tag{1}$$

$$\text{MBE} = \frac{1}{\mathbf{n}} \sum\_{\mathbf{i}=1}^{\mathbf{n}} (\mathbf{R}\_{\mathbf{i}} - \mathbf{S}\_{\mathbf{i}}) \tag{2}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (\mathbf{R}\_i - \mathbf{S}\_i)^2} \tag{3}$$

$$\text{NSE} = 1 - \frac{\sum\_{i=1}^{n} (\mathbf{R}\_i - \mathbf{S}\_i)^2}{\sum\_{i=1}^{n} (\mathbf{S}\_i - \overline{\mathbf{S}})^2} \tag{4}$$

$$\text{KGE} = 1 - \sqrt{(\text{CC} - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}, \text{ with } \alpha = \frac{\overline{\mathbf{R}}}{\overline{\mathbf{S}}} \text{ and } \beta = \frac{\sigma\_{\mathbf{R}}}{\sigma\_{\mathbf{S}}} \tag{5}$$

$$\text{WIA} = 1 - \frac{\sum\_{i=1}^{n} \left( \mathbf{R}\_i - \mathbf{S}\_i \right)^2}{\sum\_{i=1}^{n} \left( \left| \mathbf{R}\_i - \overline{\mathbf{S}} \right| + \left| \mathbf{S}\_i - \overline{\mathbf{S}} \right| \right)^2} \tag{6}$$

where $\mathbf{R}\_i$ is the reanalysis temperature interpolated to the observation site, $\mathbf{S}\_i$ is the in-situ observation at the site, n is the total number of records used in the evaluation, and $\overline{\mathbf{R}}$ and $\overline{\mathbf{S}}$ denote the averages of the reanalysis data and the observations over the study period, respectively. CC (Equation (1)) measures the linear correlation between reanalysis and observations and ranges within [−1, 1]. |CC| = 1 indicates that the two sequences are perfectly linearly correlated, CC = 0 means that there is no linear correlation, and 0 < |CC| < 1 indicates a partial linear correlation: the closer |CC| is to 1, the stronger the linear relationship, and the closer |CC| is to 0, the weaker it is. CC > 0 indicates that the reanalysis and the in-situ observations vary in the same direction, and CC < 0 means that they vary in opposite directions. MBE (Equation (2)) reflects the mean deviation of the reanalysis data from the observations. A negative MBE indicates that the reanalysis data are on average lower than the observations, and a positive MBE indicates the opposite. RMSE (Equation (3)) shows the overall difference between the reanalysis and the observations, including systematic and non-systematic errors. The closer the RMSE is to 0, the more accurate the reanalysis dataset. NSE (Equation (4)) is widely applied to quantify the predictive ability of hydrological models. It reflects the consistency of two datasets: NSE = 1 indicates that the reanalysis data are completely consistent with the observations; NSE ≤ 0 indicates that the reanalysis matches the observations no better than the observed mean does, i.e., the two datasets are inconsistent with each other. The KGE (Equation (5)) is based on a decomposition of the NSE into its constitutive components (correlation, mean bias, and variability bias) and is increasingly used for model calibration and evaluation. $\sigma\_{\mathbf{R}}$ and $\sigma\_{\mathbf{S}}$ are the standard deviations of the reanalysis and the in-situ observations, respectively. KGE can vary from negative infinity to 1, and KGE = 1 indicates perfect agreement between the reanalysis and the observations. WIA (Equation (6)) is similar to NSE, but the denominator of the main term in the equation is the potential maximum difference [54]. The value of WIA ranges between 0 (not consistent) and 1 (perfectly consistent). On the scatterplot of reanalysis versus in-situ observations, both WIA and NSE indicate how close the data points are to the 1:1 line. During the evaluation period, all samples used for evaluation are calculated based on the cumulative results of daily observations.
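As an illustration of the procedure and of Equations (1)–(6), the following minimal sketch (not the authors' code) averages eight 3-hourly fields into daily means, picks the nearest grid cell for a site, and computes the six indices. The function names, the array shapes, and the regular latitude/longitude grid are assumptions made for the example, and the numbers are synthetic.

```python
import numpy as np

def daily_mean(three_hourly):
    """Arithmetic mean of the eight 3-hourly fields of each day.

    three_hourly: array shaped (n_days * 8, n_lat, n_lon), time first.
    """
    t = np.asarray(three_hourly, dtype=float)
    n_days = t.shape[0] // 8
    return t[: n_days * 8].reshape(n_days, 8, *t.shape[1:]).mean(axis=1)

def nearest_grid_value(field, grid_lat, grid_lon, site_lat, site_lon):
    """Nearest-neighbor pick on a regular lat/lon grid (assumed layout)."""
    i = int(np.abs(np.asarray(grid_lat) - site_lat).argmin())
    j = int(np.abs(np.asarray(grid_lon) - site_lon).argmin())
    return field[..., i, j]

def evaluate_indices(r, s):
    """CC, MBE, RMSE, NSE, KGE and WIA as written in Equations (1)-(6).

    r: reanalysis values at the site(s); s: matching in-situ observations.
    """
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    diff = r - s
    s_anom = s - s.mean()
    cc = np.corrcoef(r, s)[0, 1]
    alpha = r.mean() / s.mean()   # ratio of means, as in Equation (5)
    beta = r.std() / s.std()      # ratio of standard deviations
    return {
        "CC": cc,
        "MBE": diff.mean(),
        "RMSE": np.sqrt(np.mean(diff ** 2)),
        "NSE": 1.0 - np.sum(diff ** 2) / np.sum(s_anom ** 2),
        "KGE": 1.0 - np.sqrt((cc - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
        "WIA": 1.0 - np.sum(diff ** 2)
               / np.sum((np.abs(r - s.mean()) + np.abs(s_anom)) ** 2),
    }

# Synthetic example: a reanalysis that is uniformly 1 degree too warm.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rea = obs + 1.0
scores = evaluate_indices(rea, obs)
```

In this toy case CC stays at 1 because the offset does not affect the correlation, while MBE, RMSE, NSE, KGE, and WIA all register the bias. The ratio β is the same whether population or sample standard deviations are used, as long as both use the same convention.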

**Figure 1.** Elevation of the study area and distribution of in-situ observation sites.


Note: The properties of these sites are derived from the metadata provided at http://data.tpdc.ac.cn (accessed on 18 May 2022). Latitude and longitude given in degrees, minutes, and seconds are converted to decimal degrees and rounded to four decimal places. For a few sites where the metadata are missing, the properties are derived from other information. For example, the elevation at Shuanghu is derived from a 90 m-resolution DEM, and the land cover type at Ali is derived from http://www.horn.ac.cn/index.jsp (accessed on 18 May 2022).
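The coordinate conversion mentioned in the note can be sketched as follows. The function name and the example coordinate are hypothetical and are not taken from the site metadata.

```python
def dms_to_decimal(degrees, minutes, seconds, places=4):
    """Convert degrees/minutes/seconds to decimal degrees.

    A negative `degrees` value marks the southern or western hemisphere.
    The result is rounded to `places` decimal places (four in the note).
    """
    sign = -1.0 if degrees < 0 else 1.0
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return round(sign * value, places)

# Made-up coordinate: 31 degrees, 22 minutes, 12 seconds north.
example_lat = dms_to_decimal(31, 22, 12)
```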

### **3. Results Analysis**

*3.1. Comparative Analysis of Spatial Distribution Characteristics*

Spatial distributions of average temperature for the reanalysis datasets and in-situ observations during the study period of 2017–2018 are displayed in Figure 2, which shows that the three reanalysis datasets and the in-situ observations roughly follow the variation of latitude and elevation. Despite slight differences at local or regional scales, the magnitude and spatial distribution of temperature are basically the same for these datasets (Figure 2a). Temperature gradually increases from north to south, and high temperature centers are located in southeastern Tibet, southwestern Sichuan, and northwestern Yunnan. The Qaidam Basin in the northwest of Qinghai is surrounded by mountains. The average elevation of the basin is about 2600 m, which is lower than the surrounding areas. The annual average temperatures in the three reanalysis datasets in the Qaidam Basin are all significantly higher than those of the surrounding areas. The average elevation of the Kunlun Mountains and Karakoram Mountains located in western Tibet is more than 5500 m, and the temperature there is significantly lower than in other areas at the same latitude. Compared to ERA5L and GLDAS, CLDAS describes more details of temperature changes with altitude. For example, CLDAS aptly describes the dramatic temperature changes caused by large altitude differences in the Hengduan Mountains region located at the junction of Tibet, Sichuan, and Yunnan, where mountains, valleys, and rivers are intertwined. In contrast, the other two reanalysis datasets can barely reflect this characteristic distribution of temperature in the Hengduan Mountains.

**Figure 2.** Spatial distributions of annual mean temperature over 2017–2018 ((**a**) CLDAS; (**b**) ERA5L; (**c**) GLDAS; (**d**) in-situ observations).

Spatial distributions of seasonal temperature are displayed in Figures 3–6. In the spring (Figure 3), CLDAS temperature is higher than ERA5L and GLDAS in the entire study area except the Qaidam Basin and the low-elevation region of southern Tibet, where CLDAS is lower than ERA5L and GLDAS. ERA5L and GLDAS show large differences in the spatial distribution of air temperature in the plateau area, while the differences for CLDAS are small. ERA5L is also significantly lower than CLDAS and GLDAS in the central QTP. In the summer (Figure 4), the spatial distributions of CLDAS and GLDAS are similar to each other, while ERA5L is clearly lower than the other two reanalysis datasets. In the autumn (Figure 5), CLDAS and ERA5L are closer to each other, while GLDAS is lower than CLDAS and ERA5L in the high-elevation region of the western QTP, but higher in the low-elevation region of the southeastern QTP. In the winter (Figure 6), the spatial distributions of the three reanalysis datasets are basically consistent, although ERA5L is lower than CLDAS and GLDAS in southeastern Qinghai and northeastern Tibet. Overall, compared to GLDAS and ERA5L, CLDAS is closer to the observations and demonstrates higher spatial consistency.

**Figure 3.** Spatial distributions of spring mean temperature over 2017–2018 ((**a**) CLDAS; (**b**) ERA5L; (**c**) GLDAS; (**d**) in-situ observations).

**Figure 4.** Spatial distributions of summer mean temperature over 2017–2018 ((**a**) CLDAS; (**b**) ERA5L; (**c**) GLDAS; (**d**) in-situ observations).

**Figure 5.** Spatial distributions of autumn mean temperature over 2017–2018 ((**a**) CLDAS; (**b**) ERA5L; (**c**) GLDAS; (**d**) in-situ observations).

**Figure 6.** Spatial distributions of winter mean temperature over 2017–2018 ((**a**) CLDAS; (**b**) ERA5L; (**c**) GLDAS; (**d**) in-situ observations).

### *3.2. Accuracy of the Reanalysis Datasets for the Evaluation Period*

Table 3 lists the evaluation results over the period 2017–2018. The mean temperatures of CLDAS, ERA5L, and GLDAS are 1.49 °C, −2.491 °C, and −0.44 °C, respectively. The mean value of CLDAS is the closest to the average of the in-situ observations (0.956 °C). The correlation coefficient (CC) between CLDAS and the observations is the highest (0.969), followed by the correlation between ERA5L and the observations (0.934); the correlation coefficient between GLDAS and the observations is the lowest (0.92). The MBEs of ERA5L and GLDAS are −3.45 °C and −1.40 °C, respectively, which suggests that temperature is underestimated in these two reanalysis datasets. Conversely, the MBE of CLDAS is 0.53 °C, which indicates that CLDAS overestimates temperature at the in-situ observation sites. The RMSEs of CLDAS, ERA5L, and GLDAS are 2.18 °C, 4.83 °C, and 3.64 °C, respectively, which indicates that the errors of CLDAS are smaller than those of the other two reanalyses. The values of NSE and WIA are close to 1 (the optimal value) for all three reanalysis datasets, suggesting that they are highly consistent with the in-situ observations, especially CLDAS. The KGE of CLDAS is the closest to 1, which indicates that it is better than ERA5L and GLDAS. This result agrees with NSE and WIA. In general, CLDAS is noticeably better than GLDAS and ERA5L during the evaluation period based on the evaluation indices of correlation, bias, and consistency. GLDAS is better than ERA5L, although the differences between them are relatively small.
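Because the mean is linear, Equation (2) reduces to the difference between the two dataset means whenever both are taken over the same samples. This gives a quick consistency check on the means and MBEs quoted above (a worked check added for illustration, using only the reported values):

$$\text{MBE} = \frac{1}{n} \sum\_{i=1}^{n} (\mathbf{R}\_i - \mathbf{S}\_i) = \overline{\mathbf{R}} - \overline{\mathbf{S}}$$

$$\text{CLDAS: } 1.49 - 0.956 = 0.534 \approx 0.53, \quad \text{ERA5L: } -2.491 - 0.956 = -3.447 \approx -3.45, \quad \text{GLDAS: } -0.44 - 0.956 = -1.396 \approx -1.40$$

All three differences match the reported MBEs to two decimal places.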

**Table 3.** Accuracy evaluation results of CLDAS, ERA5L and GLDAS for the period 2017–2018.


To better display the consistency of the three reanalysis datasets with the observations during the evaluation period, Figure 7 shows the scatter plots of reanalysis data versus in-situ observations and the results of univariate linear regression. The goodness-of-fit values (R²) for CLDAS, ERA5L, and GLDAS are 0.939, 0.872, and 0.847, respectively, which indicates that CLDAS is more consistent with in-situ observations. This result agrees with the results shown in Table 3.
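For a univariate least-squares fit with an intercept, R² equals the square of the Pearson CC, so the regression results and the CC values reported in Table 3 can be cross-checked. Squaring the reported CCs of 0.969 and 0.934 gives 0.939 and 0.872, and squaring 0.92 gives about 0.846, which is consistent with the reported 0.847 given that this CC is quoted to two decimals. The sketch below verifies the identity on synthetic data; the function name and the data are assumptions for illustration only.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of the least-squares line y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic "observations" and a noisy, scaled "reanalysis".
rng = np.random.default_rng(42)
obs = rng.normal(0.0, 8.0, size=500)
rea = 0.9 * obs + rng.normal(0.0, 2.0, size=500)

cc = np.corrcoef(obs, rea)[0, 1]
r2 = r_squared(obs, rea)  # agrees with cc ** 2 up to rounding error
```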

**Figure 7.** Scatter plots of reanalysis datasets versus in-situ observations ((**a**) CLDAS; (**b**) ERA5L; (**c**) GLDAS; n: total number of samples).

### *3.3. Evaluation of Temporal Variation*

### 3.3.1. Daily Variation

To analyze differences in daily temperature of the reanalysis datasets during the evaluation period, daily average temperatures of CLDAS, GLDAS, ERA5L, and in-situ observations are displayed in Figure 8a, which shows that the daily variations and temporal changes of surface air temperature are basically consistent between the three reanalysis datasets and the observations, and that CLDAS is closer to the observations than GLDAS and ERA5L are. Looking at the time series of daily CC (Figure 8b), we found that on 85.6% of the days, the CCs of CLDAS with observations are above 0.8. However, the CCs of GLDAS and ERA5L with observations are below 0.8 on 60.7% and 90.5% of the days, respectively. Furthermore, the magnitude of daily variation of the CC of CLDAS is relatively small, which implies a more stable correlation with in-situ observations. The daily RMSEs of CLDAS, ERA5L, and GLDAS (Figure 8c) are within the ranges of 0.61–2.35 °C, 1.97–3.80 °C, and 2.43–3.76 °C, respectively. Note that the daily RMSE of CLDAS is clearly lower than that of the other two reanalysis datasets. On 76% of the days, the RMSE values of GLDAS are lower than those of ERA5L, which indicates that the quality of GLDAS is higher than that of ERA5L on most days. The time series of daily MBE are displayed in Figure 8d, which shows that the MBEs of CLDAS are closer to the zero line than those of GLDAS and ERA5L, suggesting that CLDAS is more consistent with observations. The MBE of CLDAS is positive on 78.5% of the days, whereas the MBE of GLDAS is negative on 90% of the days, and that of ERA5L is negative throughout the study period. This result indicates that daily temperature is overestimated by CLDAS and underestimated by GLDAS on most days, and that it is always underestimated by ERA5L. The consistency indices of NSE (Figure 8e), KGE (Figure 8f), and WIA (Figure 8g) of CLDAS are closer to 1 with a smaller range of variation compared to those of ERA5L and GLDAS, which shows that CLDAS is more consistent with observations and demonstrates higher stability.
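Statistics such as "the share of days with CC above 0.8" can be obtained by computing an index for each day and then counting days. The sketch below assumes that each daily CC is computed across the stations available on that day, which the text does not state explicitly, and it ignores missing data. The function names and the synthetic data are illustrative.

```python
import numpy as np

def daily_cc(reanalysis, observed):
    """Pearson CC across stations for each day.

    Both inputs are arrays shaped (n_days, n_stations) without gaps.
    """
    r = reanalysis - reanalysis.mean(axis=1, keepdims=True)
    s = observed - observed.mean(axis=1, keepdims=True)
    return (r * s).sum(axis=1) / np.sqrt((r ** 2).sum(axis=1) * (s ** 2).sum(axis=1))

def share_of_days(values, threshold):
    """Percentage of days on which `values` exceeds `threshold`."""
    return 100.0 * np.mean(np.asarray(values) > threshold)

# Synthetic example: 10 days, 17 stations.
rng = np.random.default_rng(1)
obs = rng.normal(0.0, 5.0, size=(10, 17))
rea = obs + rng.normal(0.0, 1.0, size=(10, 17))
cc_by_day = daily_cc(rea, obs)
pct_above = share_of_days(cc_by_day, 0.8)
```

The same pattern applies to daily RMSE or MBE by replacing `daily_cc` with the corresponding per-day reduction.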

**Figure 8.** Daily evaluation during 2017–2018. (**a**) Time series of daily mean temperature; (**b**) CC; (**c**) RMSE; (**d**) MBE; (**e**) NSE; (**f**) KGE; (**g**) WIA.

### 3.3.2. Monthly Variation

Figure 9 presents characteristic changes in monthly mean errors of the reanalysis datasets. The time series of monthly mean temperature (Figure 9a) indicates that the variation trends of the three reanalysis datasets are similar to that of the observations, i.e., temperature is lowest in January, gradually increases to its maximum in July, and then gradually decreases. Monthly CCs for CLDAS are all higher than those for GLDAS and ERA5L (Figure 9b). CCs for GLDAS are higher than those for ERA5L in all months except March 2017 and February 2018, when the CCs for GLDAS are slightly lower than those of ERA5L. The RMSEs of CLDAS, ERA5L, and GLDAS (Figure 9c) are within the ranges of 1.637–3.046 °C, 2.535–8.353 °C, and 2.682–5.054 °C, respectively. Note that the RMSEs of CLDAS are smaller than those of GLDAS and ERA5L in all months, while the RMSEs of GLDAS are lower than those of ERA5L in all months except August and September of 2017 and July and August of 2018, when the RMSEs of GLDAS are slightly higher than those of ERA5L. Monthly MBE variations (Figure 9d) indicate that CLDAS overestimates monthly mean temperature in all months except December 2018, when it slightly underestimates the monthly mean temperature (MBE of −0.025 °C). The largest overestimation of 1.172 °C occurs in March 2017. Monthly MBEs of ERA5L are negative in all months, with the largest negative bias of −7.395 °C occurring in November 2018. Monthly MBEs of GLDAS are negative in all months except December 2017, when the monthly mean temperature of GLDAS is higher than the observation by 0.445 °C. The largest negative bias of GLDAS occurs in March 2017 with a value of −2.993 °C. Monthly consistency indices of NSE (Figure 9e) for CLDAS, ERA5L, and GLDAS are within the ranges of 0.581 to 0.847, −2.253 to 0.363, and −0.191 to 0.541, respectively, and the ranges of KGE (Figure 9f) are 0.027 to 0.892, −26.714 to 0.736, and −8.948 to 0.709, respectively. The indices of WIA (Figure 9g) are within the ranges of 0.903–0.961, 0.545–0.855, and 0.685–0.865, respectively. The lowest value of NSE occurs in either July or August for all three reanalysis datasets, whereas the lowest value of WIA occurs in either October or August. Compared to the other two reanalysis datasets, monthly values of NSE, KGE, and WIA for CLDAS are closer to 1, suggesting that CLDAS is more consistent with observations. GLDAS is better than ERA5L overall, with the exception of a few months.

### 3.3.3. Seasonal Analysis

Figure 10 displays seasonal error characteristics during the evaluation period. The histograms of seasonal mean air temperature from the reanalysis datasets and in-situ observations are displayed in Figure 10a, which shows that seasonal temperatures of CLDAS, ERA5L, and GLDAS as well as in-situ observations all present a unimodal pattern, being low in winter and high in summer. This result indicates that the three reanalysis datasets describe the seasonal variation of temperature in the QTP well. Seasonal CCs (Figure 10b) of the three reanalysis datasets with observations are all the highest in autumn, while the CCs of CLDAS and ERA5L with observations are the lowest in winter and higher in spring than in summer. Although the CC of GLDAS with observations is the lowest in summer, the difference between its CCs in winter and summer is quite small. Seasonal RMSEs (Figure 10c) of CLDAS, ERA5L, and GLDAS all gradually increase from the minimum values in summer (1.819 °C, 2.863 °C, and 2.828 °C) to the maximum values in winter (2.62 °C, 5.693 °C, and 4.451 °C), and then decrease in the spring. Seasonal MBEs are displayed in Figure 10d, which indicates that CLDAS overestimates seasonal mean temperature in all seasons, though the overestimation is relatively small in autumn. In contrast to CLDAS, ERA5L and GLDAS both underestimate seasonal mean temperature, and the underestimation is more severe in ERA5L. The largest negative bias occurs in autumn and the smallest negative bias occurs in summer for both ERA5L and GLDAS. The histograms of seasonal NSE (Figure 10e) and WIA (Figure 10g) show that the consistency of the three reanalysis datasets with in-situ observations is relatively poor in winter and is best in autumn. However, from the perspective of KGE (Figure 10f), the three reanalysis datasets are worst in spring and better in summer.
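A seasonal breakdown of this kind can be sketched by grouping samples by month before computing an index. The text does not define the seasons, so the sketch below assumes the common meteorological convention (spring = March–May, summer = June–August, autumn = September–November, winter = December–February). The names and numbers are illustrative only.

```python
import numpy as np

# Assumed season definition; not stated in the source.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),
}

def seasonal_rmse(months, reanalysis, observed):
    """RMSE per season; `months` holds the calendar month of each sample."""
    months = np.asarray(months)
    err = np.asarray(reanalysis, dtype=float) - np.asarray(observed, dtype=float)
    result = {}
    for name, members in SEASONS.items():
        mask = np.isin(months, members)
        # Seasons without samples are reported as NaN.
        result[name] = float(np.sqrt(np.mean(err[mask] ** 2))) if mask.any() else float("nan")
    return result

# Synthetic samples: two in January, two in July, one in October.
months = [1, 1, 7, 7, 10]
obs = np.array([-10.0, -8.0, 12.0, 14.0, 3.0])
rea = obs + np.array([3.0, -3.0, 1.0, -1.0, 2.0])
rmse_by_season = seasonal_rmse(months, rea, obs)
```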

**Figure 9.** Monthly evaluation during 2017–2018. (**a**) Time series of monthly mean temperature; (**b**) CC; (**c**) RMSE; (**d**) MBE; (**e**) NSE; (**f**) KGE; (**g**) WIA.

### *3.4. Comparative Reanalysis at Individual Sites*

Figure 11 shows box plots of temperature errors of CLDAS, GLDAS, and ERA5L during 2017–2018. The stations with CC (Figure 11a) higher than 0.95 account for 82.4%, 52.9%, and 70.6% of the total number of stations for CLDAS, ERA5L, and GLDAS, respectively. The lowest CCs, with respective values of 0.902, 0.915, and 0.913 for CLDAS, ERA5L, and GLDAS, all occur at Ruoergai (Elinghu), while the highest CCs occur at Ali and Golmud (0.992 for CLDAS), Haibei (0.974 for ERA5L), and Mushitage (0.979 for GLDAS). RMSEs (Figure 11b) are within the ranges of 1.222–4.289 °C, 2.345–6.076 °C, and 2.366–5.736 °C for CLDAS, ERA5L, and GLDAS, respectively. The largest RMSEs of CLDAS and ERA5L occur at Ruoergai (Elinghu), where the correlation is the lowest. The largest RMSE of GLDAS is found at Lasa. The smallest RMSEs of the three datasets occur at different sites. The box plot of MBE (Figure 11c) shows that CLDAS is lower than observations at only 4 sites, i.e., Ruoergai (Elinghu), Sanjiangyuan, Golmud, and Shenzha, which account for 23.5% of the total observation sites. The largest negative bias (−1.995 °C) is found at Ruoergai (Elinghu), and the largest positive bias (1.99 °C) occurs at Naqu (Hanhansuo). ERA5L data are lower than observations at all sites, and the largest bias is found at Mushitage (−4.968 °C). Positive biases of GLDAS only occur at Ruoergai (Maqu), Naqu (Qingzangsuo), Naqu (Hanhansuo), and Shuanghu, which account for 29.4% of the total stations. The largest positive bias occurs at Shuanghu (2.37 °C); the biases of GLDAS are negative at all other sites, with the largest negative bias (−5.385 °C) at Lasa. The consistency indices of NSE for CLDAS, ERA5L, and GLDAS (Figure 11d) are within the ranges of 0.748–0.974, 0.102–0.919, and 0.219–0.934, respectively, and the ranges of KGE (Figure 11e) are −157.108 to 0.918, −389.193 to 0.537, and −138.881 to 0.882, respectively. The ranges of WIA (Figure 11f) are 0.936–0.994, 0.85–0.978, and 0.839–0.982, respectively. Based on NSE, KGE, and WIA, the consistency of CLDAS is the worst at Ruoergai (Elinghu) and the consistency of GLDAS is the worst at Lasa. The consistency of ERA5L is worst at Zhufeng, Zangdongnan, and Lasa. In summary, the reanalysis datasets differ in quality and applicability from site to site.
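The very large negative KGE values deserve a note. With Equation (5) as written, α is a ratio of means, so when temperatures are expressed in °C and the observed mean at a site is close to 0 °C, |α − 1| can become very large even for a modest bias. Whether this explains the values reported here is not stated in the source; the sketch below only illustrates the sensitivity with synthetic numbers, and the function name is an assumption.

```python
import numpy as np

def kge(r, s):
    """KGE following Equation (5): alpha is the ratio of means."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    cc = np.corrcoef(r, s)[0, 1]
    alpha = r.mean() / s.mean()
    beta = r.std() / s.std()
    return 1.0 - np.sqrt((cc - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Synthetic annual cycle whose mean is 0.01 degC, and a reanalysis that is
# uniformly 2 degC warmer (perfect correlation, identical variability).
days = np.arange(365)
obs_c = 10.0 * np.sin(2.0 * np.pi * days / 365.0) + 0.01
rea_c = obs_c + 2.0

kge_celsius = kge(rea_c, obs_c)                   # strongly negative
kge_kelvin = kge(rea_c + 273.15, obs_c + 273.15)  # close to 1
```

The same 2 °C bias yields a KGE far below zero in °C and a value near 1 in kelvin, so KGE values computed this way are best compared only among sites or periods with similar mean temperatures.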

**Figure 10.** Seasonal evaluation during 2017–2018. (**a**): Seasonal changes of the average temperature; (**b**): CC; (**c**): RMSE; (**d**): MBE; (**e**): NSE; (**f**): KGE; (**g**): WIA.

To intuitively and easily understand the relationship between the consistency and errors of CLDAS, ERA5L, and GLDAS at the 17 observation stations, Taylor diagrams between the three reanalysis datasets and in-situ observations at each individual observation site are displayed in Figure 12. Figure 12a–q show that the standard deviations of CLDAS and GLDAS are relatively large at 11 and 7 sites, respectively, while ERA5L shows greater variability at 15 sites. The correlation coefficient between CLDAS and in-situ observations is larger than those between the other two reanalysis datasets and observations at all observation sites except Ruoergai (Elinghu), where the CC of CLDAS is slightly lower than the CC of GLDAS and ERA5L. The Taylor diagram between the three reanalysis datasets and all the in-situ observations (Figure 12r) indicates that CLDAS is closer to, and more consistent with, observations with smaller deviation.
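A Taylor diagram can show three statistics with a single point because they are linked geometrically. In the notation of Equations (1) and (5), the standard relation (not spelled out in the source) between the centered root-mean-square difference $E'$, the two standard deviations, and the correlation coefficient is the law of cosines:

$$E'^2 = \sigma\_{\mathbf{R}}^2 + \sigma\_{\mathbf{S}}^2 - 2 \sigma\_{\mathbf{R}} \sigma\_{\mathbf{S}} \, \text{CC}, \quad \text{with } E'^2 = \frac{1}{n} \sum\_{i=1}^{n} \left[ (\mathbf{R}\_i - \overline{\mathbf{R}}) - (\mathbf{S}\_i - \overline{\mathbf{S}}) \right]^2$$

The mean bias is not shown on the diagram; it is related to the RMSE of Equation (3) by $\text{RMSE}^2 = \text{MBE}^2 + E'^2$.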

**Figure 11.** Box plots of temperature errors in CLDAS, ERA5L, and GLDAS: (**a**) CC, (**b**) RMSE, (**c**) MBE, (**d**) NSE, (**e**) KGE, (**f**) WIA.

### *3.5. Comparative Reanalysis at Different Terrain Elevations*

To explore the temperature variation characteristics of the three reanalysis datasets at different elevations, the observation sites are divided into four elevation categories: <3500 m, 3500–4000 m, 4000–4500 m, and ≥4500 m (lower bounds inclusive). Figure 13 shows the bias characteristics of the three gridded datasets at different elevations. In terms of the consistency indices (CC, NSE, KGE, and WIA), the agreement between CLDAS and the in-situ observations is higher than that of the other reanalysis products at every elevation. The MBEs of CLDAS are positive relative to the station observations, while those of ERA5L and GLDAS are negative. The RMSEs of CLDAS are lower than those of the other two reanalysis datasets, and GLDAS is better than ERA5L. Although the CCs of ERA5L are slightly higher than those of GLDAS, its other indices (NSE, KGE, and WIA) are lower. Compared to ERA5L and GLDAS, the CLDAS temperature data are less affected by elevation.
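The elevation grouping can be sketched with a simple binning step, after which any index can be computed per class. The bin edges follow the four categories in the text with inclusive lower bounds; the function name, the label strings, and the test elevation are illustrative assumptions.

```python
import numpy as np

# Class boundaries in metres, taken from the four categories in the text.
EDGES = np.array([3500.0, 4000.0, 4500.0])
LABELS = ["<3500 m", "3500-4000 m", "4000-4500 m", ">=4500 m"]

def elevation_class(elevation_m):
    """Return the elevation category of a site (lower bounds inclusive)."""
    # np.digitize returns i such that EDGES[i-1] <= x < EDGES[i].
    return LABELS[int(np.digitize(elevation_m, EDGES))]

# Hypothetical site at 4210 m.
example_class = elevation_class(4210.0)
```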

### *3.6. Comparative Reanalysis at Different Land Covers*

According to the land cover type, the observation sites are divided into seven categories: alpine meadow (AE), desert (DT), grassland in forests (GF), gravel (GL), peatland (PD), sand and gravel (SG), and artificial grassland (AG). Figure 14 shows the bias characteristics of CLDAS, ERA5L, and GLDAS for the different land covers. The MBEs of ERA5L and GLDAS are negative relative to the in-situ observations for all land covers, while those of CLDAS are positive. The deviation of CLDAS is smallest over artificial grassland, while those of ERA5L and GLDAS are smallest over alpine meadow and peatland, respectively. Overall, the consistency indices (CC, NSE, KGE, and WIA) and error measures (MBE and RMSE) of CLDAS against the in-situ observations vary within a small range and are better than those of ERA5L and GLDAS for every land cover.

**Figure 12.** Taylor diagrams of (**a**–**q**) CLDAS, ERA5L, and GLDAS against in-situ observations at 17 stations and (**r**) all observation stations.

**Figure 13.** The errors at different altitudes: (**a**) CC; (**b**) RMSE; (**c**) MBE; (**d**) NSE; (**e**) KGE; (**f**) WIA.

**Figure 14.** The errors at different land covers: (**a**) CC; (**b**) RMSE; (**c**) MBE; (**d**) NSE; (**e**) KGE; (**f**) WIA.
