*Article* **Analyses on the Multimodel Wind Forecasts and Error Decompositions over North China**

**Yang Lyu <sup>1</sup>, Xiefei Zhi <sup>1,</sup>\*, Hong Wu <sup>2,</sup>\*, Hongmei Zhou <sup>3</sup>, Dexuan Kong <sup>4</sup>, Shoupeng Zhu <sup>2</sup>, Yingxin Zhang <sup>5</sup> and Cui Hao <sup>5</sup>**


**Abstract:** In this study, wind forecasts derived from the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction (NCEP), the Japan Meteorological Agency (JMA) and the United Kingdom Meteorological Office (UKMO) are evaluated for lead times of 1–7 days at 10 m and at multiple isobaric surfaces (500 hPa, 700 hPa, 850 hPa and 925 hPa) over North China for 2020. The straightforward multimodel ensemble mean (MME) method is utilized to improve forecasting abilities. In addition, the forecast errors are decomposed to further diagnose the error sources of the wind forecasts. The results indicate that there is little difference in the performances of the four models in terms of wind direction forecasts (DIR), but obvious differences occur in the zonal wind (U), meridional wind (V) and wind speed (WS) forecasts. Among them, the ECMWF and NCEP showed the highest and lowest abilities, respectively. The MME effectively improved wind forecast abilities, with more evident superiorities at higher levels for longer lead times. Meanwhile, all of the models and the MME manifested consistent trends of increasing (decreasing) errors for U, V and WS (DIR) with rising height. On the other hand, the main source of errors for wind forecasts at both 10 m and the isobaric surfaces was the sequence component (SEQU), which rose rapidly with increasing lead times. The deficiency of the less proficient NCEP model at the 10 m and isobaric surfaces could mainly be attributed to the bias component (BIAS) and SEQU, respectively. Furthermore, the MME tended to produce lower SEQU than the individual models at all layers, which was more obvious at longer lead times. However, the MME showed a slight deficiency in reducing BIAS and the distribution component of the forecast errors.
The results not only characterize the model forecast performances in detail, but also provide important references for the use of wind forecasts in operational departments and associated scientific research.

**Keywords:** wind forecast; error decomposition; bias; distribution; sequence

### **1. Introduction**

Wind, the movement of air, is one of the most important meteorological elements, and plays a significant role in determining and controlling climate and weather [1]. It has various impacts on human life and economic society, in both positive and negative ways. Appropriate wind conditions benefit many industries, such as wind power production, whereas high winds can bring down trees and power lines, generate flying debris and collapse buildings, which may lead to power outages, transportation disruptions, damage to buildings and vehicles, and injury or death [2]. With respect to transportation near the surface, windy conditions can create dangerous driving situations on highways [3]. At higher levels, abnormal winds increase the risk of unstable aircraft, posing profound threats to aviation safety [4]. Thus, accurate and reliable wind forecasts play an important role in both reducing traffic accidents and improving the efficiency of traffic operations [5,6].

**Citation:** Lyu, Y.; Zhi, X.; Wu, H.; Zhou, H.; Kong, D.; Zhu, S.; Zhang, Y.; Hao, C. Analyses on the Multimodel Wind Forecasts and Error Decompositions over North China. *Atmosphere* **2022**, *13*, 1652. https://doi.org/10.3390/atmos13101652

Academic Editors: Jimy Dudhia and Leonardo Primavera

Received: 10 August 2022; Accepted: 7 October 2022; Published: 10 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

So far, owing to improved understanding of atmospheric physical processes and the rapid development of computer technology, numerical weather prediction (NWP) has been greatly developed and used in various predictions of weather and climate [7–9]. Taking wind as an example, subjective forecasts are always limited by insufficient observations, while NWP can enrich wind forecasts with multiple lead times and multiple levels, as required [10]. In addition, it has been demonstrated that NWP models are generally capable of reasonably forecasting atmospheric conditions. However, forecasting abilities often differ markedly among NWP models and regions. Comprehensive assessments are therefore necessary for the rational application of NWP products and for further enhancing forecast abilities [11–13].

On the other hand, considering the chaotic characteristics of atmosphere dynamics, even the best NWP model has inevitable systematic biases. Therefore, it is important to further post-process NWP model outputs to effectively improve forecasting abilities [14–16]. Correspondingly, many statistical post-processing methods, which enhance forecast abilities by learning a function derived from the historical performances of models, have been developed and widely utilized in recent years, such as the frequency matching method [17,18], the mean bias removal [19], the pattern projection methods [20,21] and the decaying average method [22,23]. Moreover, due to the inherent limitations and uncertainty of an individual NWP model, multimodel ensemble methods, including the straightforward ensemble mean, the bias-removed ensemble mean and other advanced superensemble algorithms, have been proposed to calibrate the forecast errors of temperature, precipitation, wind and other variables, making full use of valid information from various NWP models [24–27].
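As a concrete illustration of one of these post-processing methods, a minimal sketch of the decaying average bias estimate [22,23] is given below; the update rule is the standard running weighted mean of recent errors, and the weight value 0.02 is an illustrative assumption, not one stated in the text.

```python
import numpy as np

def decaying_average_bias(forecasts, observations, weight=0.02):
    """Decaying-average bias estimate: the running bias is updated
    each step as b_t = (1 - w) * b_{t-1} + w * (f_t - o_t),
    so recent errors carry more weight than old ones."""
    b = 0.0
    for f, o in zip(forecasts, observations):
        b = (1.0 - weight) * b + weight * (f - o)
    return b

# With a constant systematic error of +0.5, the estimate converges
# toward 0.5, which can then be subtracted from future forecasts.
f = np.full(500, 2.0)   # hypothetical forecasts
o = np.full(500, 1.5)   # hypothetical observations
b = decaying_average_bias(f, o)
```

A small weight adapts slowly but smooths noise; a large weight tracks regime changes faster at the cost of noisier estimates.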

Over the past few decades, multimodel ensemble forecasts based on various algorithms have been demonstrated as capable of effectively improving single NWP results, typically yielding lower root mean square errors, higher correlation coefficients and improved scores on many other metrics [28–30]. However, most of these assessments provide only composite scores, which lack physical interpretability and give little insight into which aspects of the forecasts are good or bad. In this regard, decomposing performance measures into multiple interpretable elements has been considered an intelligent option for obtaining more realistic and insightful assessments and comparisons between different forecast systems [31–33]. At present, error decomposition is widely utilized to analyze the sources of errors and to indicate future directions for improvement [34,35]. Taking the metric of mean square error (MSE) as an example [32], Murphy et al. [36] decomposed the MSE into correlation, conditional bias, unconditional bias and possible other contributions. Afterwards, Geman et al. [37] decomposed the MSE into bias and variance. More recently, Hodson et al. [38] further decomposed it into components of bias, distribution and sequence.

In this study, the wind forecasts derived from the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction (NCEP), the United Kingdom Meteorological Office (UKMO) and the Japan Meteorological Agency (JMA), accompanied by a multimodel ensemble mean (MME), are evaluated and compared for multiple layers, including the ground (10 m) and isobaric surfaces (500 hPa, 700 hPa, 850 hPa and 925 hPa). The selected study area is North China (36°–46° N, 111°–119° E; NC), which is among the most populous regions and a major agricultural and industrial sector [39,40]. Meanwhile, the forecast errors are decomposed to diagnose the error sources of wind forecasts in the NWP models, and analyzed to determine which aspects of the forecasts are improved by the MME. The manuscript is organized as follows. The datasets and methods are briefly described in Section 2. Section 3 displays the comprehensive evaluation of the wind forecast abilities of ECMWF, NCEP, UKMO, JMA and the MME. Finally, a summary and discussion are presented in Section 4.
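The straightforward MME used here is simply the equal-weight average of the participating model fields. A minimal sketch follows; the array shapes and the toy values are illustrative assumptions, not details from the study.

```python
import numpy as np

def multimodel_ensemble_mean(forecasts):
    """Straightforward multimodel ensemble mean (MME): average the
    fields of all models with equal weights.

    forecasts: dict mapping model name -> ndarray of shape
               (time, lat, lon), all regridded to a common grid.
    """
    stacked = np.stack(list(forecasts.values()), axis=0)
    return stacked.mean(axis=0)

# toy example: two hypothetical model fields on a 1x2x2 grid
f = {
    "ECMWF": np.full((1, 2, 2), 3.0),
    "NCEP":  np.full((1, 2, 2), 5.0),
}
mme = multimodel_ensemble_mean(f)  # every grid point becomes 4.0
```

More elaborate schemes (bias-removed means, superensembles) replace the equal weights with weights learned from historical performance, as noted in the introduction.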

### **2. Data and Method**

### *2.1. Data*

The forecast datasets of zonal wind (u) and meridional wind (v) at the ground (10 m) and at isobaric surfaces (500 hPa, 700 hPa, 850 hPa, 925 hPa) with lead times of 1–7 days were derived from ECMWF, NCEP, UKMO and JMA within The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE).

In addition, the ERA5 reanalysis is selected for verification. ERA5 is a product of the Integrated Forecast System (IFS) Cycle 41r2, which was operational at ECMWF from March 2016 to November 2016; ERA5 therefore benefits from a decade of developments in model physics, core dynamics and data assimilation [41]. Various considerations have to be made when choosing the verification dataset to evaluate the performance of NWP models. Station observations have the advantage of being independent of all models, but wind observational datasets over isobaric surfaces are difficult to obtain. Meanwhile, reanalysis provides consistent "maps without gaps" of essential climate variables by optimally combining observations and models [42]. Moreover, ERA5 data have been demonstrated to be capable of effectively reflecting and describing local atmospheric conditions, and have been widely used in associated studies, including forecast error evaluation, analysis of the thermodynamic characteristics of warm-sector heavy rainfall, etc. [43–46]. On the other hand, a previous study showed that the choice between reanalysis and observations as verification data has little impact on the final assessment results [47]; therefore, we chose ERA5 for verification in this study.

Correspondingly, the study area is unified as North China (36°–46° N, 111°–119° E; NC), with a horizontal resolution of 0.5° × 0.5°, and the entire year of 2020 is selected for evaluation. Both forecast and verification datasets are obtained from the ECMWF archive at https://apps.ecmwf.int/datasets/, accessed on 1 August 2022. The topography of North China and its surrounding area is shown in Figure 1.
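Since the models provide u and v components while the evaluation covers wind speed (WS) and direction (DIR), the derived quantities can be sketched as below. The sketch assumes the meteorological convention for direction (the bearing the wind blows *from*, in degrees clockwise from north), which the text does not state explicitly.

```python
import numpy as np

def wind_speed_direction(u, v):
    """Derive wind speed and meteorological wind direction from the
    zonal (u) and meridional (v) components.

    Speed is the vector magnitude; direction is the bearing the wind
    blows FROM, degrees clockwise from north (assumed convention).
    """
    ws = np.hypot(u, v)
    dir_deg = (270.0 - np.degrees(np.arctan2(v, u))) % 360.0
    return ws, dir_deg

# a pure westerly (u = 1 m/s, v = 0) blows from the west:
# speed 1 m/s, direction 270 degrees
ws, d = wind_speed_direction(np.array([1.0]), np.array([0.0]))
```

Note that direction errors must be handled modulo 360° when verified; a naive difference between 355° and 5° would otherwise report 350° instead of 10°.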

**Figure 1.** Topography (m) of North China (marked region) and its surrounding area.

### *2.2. Verification Metrics*

Aimed at quantitative assessments of the forecast results of the different NWP models and the MME method over North China for the assessed period, several metrics are employed, including the root mean square error (*RMSE*) and the temporal correlation coefficient (*TCC*):

$$RMSE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (f\_i - o\_i)^2} \tag{1}$$

$$TCC = \frac{\sum\_{i=1}^{n} \left(f\_i - \overline{f}\right) (o\_i - \overline{o})}{\sqrt{\sum\_{i=1}^{n} \left(f\_i - \overline{f}\right)^2} \sqrt{\sum\_{i=1}^{n} (o\_i - \overline{o})^2}} \tag{2}$$

where *n* indicates the total number of samples, *fi* and *oi* represent the forecast and observation of sample *i*, respectively, and *f* and *o* refer to the average forecast and observation, respectively.
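Equations (1) and (2) translate directly into code; a minimal sketch using Python with NumPy (an implementation choice, not one specified by the study) is given below.

```python
import numpy as np

def rmse(f, o):
    """Root mean square error, Equation (1)."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

def tcc(f, o):
    """Temporal correlation coefficient, Equation (2): the Pearson
    correlation between the forecast and observation time series."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    fa, oa = f - f.mean(), o - o.mean()
    return float(np.sum(fa * oa)
                 / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2)))
```

In practice both metrics are computed per grid point over the validation samples and then averaged over the region, as in the figures of Section 3.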

In addition, the error decomposition proposed by Hodson et al. [38] is utilized to diagnose the sources of error for both the NWP models and the MME method. Firstly, the *MSE* at each grid point can be calculated by Equation (3):

$$MSE = \frac{1}{n} \sum\_{i=1}^{n} (f\_i - o\_i)^2 \tag{3}$$

where *fi* and *oi* represent the forecast and observation of sample *i*, respectively. According to the decomposing method proposed by Geman et al. [37], the *MSE* can be decomposed into bias and variance:

$$\begin{array}{l} MSE(e) = \left( E(e^2) - E(e)^2 \right) + E(e)^2\\ = Var(e) + Bias(e)^2 \end{array} \tag{4}$$

where *e* represents the forecast error of the model, i.e., the difference between the forecast and the observation; *E*(*e*) represents the mean of the forecast error, which is equal to *Bias*(*e*); and *Var*(*e*) represents the variance of the forecast error. The variance component quantifies the extent to which the model reproduces the observed variability, while the bias component quantifies the ability of the model to reproduce the average characteristics of the observations. Meanwhile, the variance component can be further decomposed to obtain a deeper understanding of model performance [38]. The derivation begins by monotonically sorting the model predictions and observations, then decomposing the *MSE* of the result:
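The bias–variance split of Equation (4) can be checked numerically; the synthetic biased, noisy forecast below is an illustrative assumption used only to exercise the identity.

```python
import numpy as np

# Equation (4): MSE(e) = Var(e) + Bias(e)^2, with e = f - o.
rng = np.random.default_rng(0)
o = rng.normal(size=200)                         # synthetic observations
f = o + 0.5 + rng.normal(scale=0.3, size=200)    # biased, noisy forecast
e = f - o

mse = np.mean(e ** 2)
bias_sq = np.mean(e) ** 2
var = np.var(e)          # population variance (ddof=0) matches Eq. (4)
# mse equals var + bias_sq up to floating-point rounding
```

The `ddof=0` (population) form of the variance is required here; the sample variance (`ddof=1`) would break the exact identity.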

$$w = \text{sort}(f) - \text{sort}(o) \tag{5}$$

$$MSE(w) = Bias(w)^2 + Var(w) \tag{6}$$

where *sort*(*f*) and *sort*(*o*) represent the sorted forecasts and observations, respectively, and *w* represents the forecast error after sorting. Considering that changing the sequence of the data does not change its mean error, the bias before and after sorting is equal. Meanwhile, the sorted observations and forecasts share the same time series, and the variance at this point, *Var*(*w*), describes the error caused by the data distribution (*Dist*(*e*)); thus, Equations (7) and (8) can be obtained:

$$Var(w) = Dist(e)\tag{7}$$

$$MSE(w) = Bias(e)^2 + Dist(e) \tag{8}$$

Furthermore, the difference between *MSE*(*e*) and *MSE*(*w*) can be attributed to the time series variation, *Sequence*(*e*); thus, the following equation can be obtained:

$$\begin{array}{c} MSE(e) - MSE(w) = Var(e) - Var(w) \\ = Sequence(e) \end{array} \tag{9}$$

In conclusion, the MSE can be decomposed into the bias element, the distribution element and the sequence element as follows:

$$\begin{array}{l} MSE(e) = Bias(e)^2 + Var(e) \\ = Bias(e)^2 + (Var(e) - Var(w)) + Var(w) \\ = Bias(e)^2 + Sequence(e) + Distribution(e) \end{array} \tag{10}$$

where *Bias*(*e*)<sup>2</sup> is the bias component, which characterizes the ability of the forecast to reproduce the average characteristics of the observations; *Sequence*(*e*) is the sequence error component, which characterizes the error due to the forecast being ahead of (or lagging behind) the observations; and *Distribution*(*e*) is the distribution error component, which characterizes the error due to the difference in data distribution between the forecasts and the observations. In order to convert the units of the associated error components from *m*<sup>2</sup>/*s*<sup>2</sup> into *m*/*s*, we divide both sides of the equation by the RMSE, which yields the error decomposition of the RMSE.
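The full three-component decomposition of Equation (10) can be sketched as follows; the lagged, rescaled synthetic forecast is an illustrative assumption chosen so that all three components are non-trivial.

```python
import numpy as np

def decompose_mse(f, o):
    """Three-component MSE decomposition after Hodson et al. [38]:
    MSE(e) = Bias(e)^2 + Distribution(e) + Sequence(e), with e = f - o.
    """
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    e = f - o
    w = np.sort(f) - np.sort(o)      # Equation (5): sorted error
    bias_sq = np.mean(e) ** 2        # bias component
    dist = np.var(w)                 # Equation (7): distribution component
    seq = np.var(e) - np.var(w)      # Equation (9): sequence component
    return bias_sq, dist, seq

rng = np.random.default_rng(1)
o = rng.normal(size=300)                 # synthetic observations
f = 1.2 * np.roll(o, 2) + 0.2            # lagged, rescaled, biased forecast
bias_sq, dist, seq = decompose_mse(f, o)
mse = np.mean((f - o) ** 2)
# bias_sq + dist + seq reproduces mse up to floating-point rounding
```

Because sorting both series minimizes the paired squared differences while leaving the means unchanged, *Var*(*w*) ≤ *Var*(*e*), so the sequence component is always non-negative.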

### **3. Results**

### *3.1. Evaluation of Multiple NWP Models and the MME*

Figure 2 describes the regional averaged RMSE and TCC of ECMWF, NCEP, UKMO, JMA and the MME for wind forecasts at the 10 m level over North China (NC) over a validation period of 1–7 lead days, including the zonal wind (U10), meridional wind (V10), wind speed (WS10) and wind direction (DIR10). Generally, the multiple forecasts are characterized by consistent trends of increasing RMSE and decreasing TCC with growing lead times. The ECMWF shows the best performance, but with limited superiority over UKMO and JMA, while the NCEP shows the lowest ability among the four NWP models. Specifically, the ECMWF features the lowest RMSEs and the highest TCCs at most lead times for all the elements. On the other hand, NCEP tends to show the highest RMSEs and the lowest TCCs, but it does not differ much from the other models in terms of WS10 forecasts. Furthermore, the MME is significantly superior to the individual NWP models, which is more evident for longer lead times. The RMSEs of the MME are lower than those of ECMWF by 0.3–0.5 m/s (12°–35°) for U10, V10 and WS10 (DIR10) for all lead times, and the MME shows TCCs 0.1–0.15 higher than ECMWF for the wind forecasts.

For assessments of the spatial distribution of forecast abilities for the NWP models and the MME, with the lead time of 1 day taken as an example, Figure 3 describes the spatial distributions of RMSE for U10, V10, WS10 and DIR10 derived from ECMWF, NCEP and MME, which denote the best NWP model, the worst NWP model and the multimodel ensemble mean, respectively. In terms of U10 and V10, the lower RMSEs are continuously seen around central NC, whereas the highest RMSEs occur around northwestern NC. Meanwhile, the RMSEs of NCEP are higher than those of ECMWF over the whole area, and the advantages of the MME over ECMWF are mainly reflected over southwestern NC. As for DIR10, the RMSE spatial distributions of ECMWF, NCEP and MME are generally consistent, with the largest RMSEs, reaching up to 120°, occurring at central NC, while the lowest RMSEs, below 40°, are seen at northwestern NC. It is worth noting that the RMSEs are obviously lower over all regions in the MME than in ECMWF.

In order to assess the wind forecasts at multiple isobaric surfaces, Figure 4 describes the regional averaged RMSE of U, V, WS and DIR at 500 hPa, 700 hPa, 850 hPa and 925 hPa, derived from ECMWF, NCEP, UKMO, JMA and the MME over NC, with lead times of 1, 4 and 7 days taken as examples. Generally, the multiple forecasts are characterized by consistent trends of increasing (decreasing) RMSE for U, V and WS (DIR) with rising height. Among them, the RMSEs of U, V and WS show the highest growth rates between 925 hPa and 850 hPa, while the highest growth rate for DIR is seen between 700 hPa and 500 hPa. Furthermore, the ECMWF shows lower RMSEs than the other NWP models at all isobaric surfaces, which is more evident at higher levels, although its advantages diminish with increasing lead times. Furthermore, the MME tends to show lower RMSEs for U, V and WS (DIR) than ECMWF at all levels for all lead times, which is more obvious at higher (lower) levels for longer lead times.

**Figure 2.** Variations in RMSE and TCC of U10, V10, WS10 and DIR10 at lead times of 1–7 days derived from ECMWF, NCEP, UKMO, JMA and MME, averaged over North China.

To reveal the spatial distribution of wind forecast abilities at the isobaric surfaces for the NWP models and the MME, Figure 5 describes the RMSE spatial distributions for U500, V500, WS500 and DIR500 derived from ECMWF, NCEP and MME, with the lead time of 1 day taken as an example. Generally, the multiple forecasts show similar error distribution characteristics for U500, V500 and WS500. Specifically, the lower RMSEs are seen at central and northeastern NC, while the largest RMSEs occur at northwestern NC. Furthermore, NCEP shows limited forecast ability, with RMSEs reaching up to 2.2 m/s over most areas for U500, V500 and WS500, while the RMSEs of the MME are mostly lower than 2 m/s. In terms of DIR500, the lowest RMSEs are seen at central NC for ECMWF, NCEP and MME, while the largest occur at northwestern and southern NC. Furthermore, the MME shows clear superiority over the two NWP models, with RMSEs lower than 60° for most areas.

To summarize, there is little difference in the performances of the four NWP models in terms of wind direction forecasts, but clear differences occur in the zonal wind, meridional wind and wind speed forecasts. The ECMWF shows general advantages over the other three models at both 10 m and the isobaric surfaces, which are more pronounced at the isobaric surfaces. Furthermore, the forecast abilities of the MME are superior to ECMWF for U, V, WS and DIR, which is more distinct at higher levels for longer lead times. It is worth noting that the multiple forecasts manifest consistent trends of increasing (decreasing) RMSE for U, V and WS (DIR) with rising height. In addition, all the NWP models and the MME tend to show higher forecast abilities at central NC, while they manifest lower abilities at northwestern NC for both the ground and isobaric surfaces.

**Figure 3.** Spatial distributions of RMSEs for U10, V10, WS10 and DIR10 with a lead time of 1 day derived from ECMWF, NCEP and MME.

**Figure 4.** Variations in RMSE for U, V, WS and DIR at isobaric surfaces (500 hPa, 700 hPa, 850 hPa, 925 hPa) for lead times of 1–7 days, derived from ECMWF, NCEP, UKMO, JMA and MME, averaged over North China.

#### *3.2. Error Decompositions of the Wind Forecasts*

Although the forecast abilities of the NWP models and the MME have been assessed in Section 3.1 via metrics including the RMSE and TCC, such metrics tend to provide overall ability scores and give little insight into which aspects of the models are good or bad. Thus, the error decomposition method is utilized in this section to diagnose the error sources of wind forecasts in the NWP models, and to analyze which aspects of the forecasts are improved by the MME method.

Figure 6 describes the regional averaged RMSE, the decomposed bias component (BIAS), the distribution error component (DIST) and the sequence error component (SEQU) of the 10 m wind speed (WS10) and direction (DIR10) over NC derived from ECMWF, NCEP, UKMO, JMA and the MME for lead times of 1–7 days. Generally, SEQU is the main source of error for both WS10 and DIR10, and rises rapidly with increasing lead times, while BIAS and DIST account for a relatively small proportion of the total error and do not increase with growing lead times. This implies that the 10 m wind forecast errors are mainly attributable to the forecasts being ahead of (or lagging behind) the observations. However, the deficiency of NCEP for WS10, compared with the other NWP models, could mainly be attributed to the BIAS and DIST. Furthermore, the MME tends to generate lower SEQU than the four NWP models for both WS10 and DIR10, which is more evident at longer lead times, while the BIAS and DIST of the MME show no obvious superiority over the best NWP model.

**Figure 5.** Spatial distributions of RMSEs for U500, V500, WS500 and DIR500 with a lead time of 1 day derived from ECMWF, NCEP and MME.

**Figure 6.** Variations in RMSE, decomposed BIAS, DIST and SEQU for WS10 and DIR10 at lead times of 1–7 days derived from ECMWF, NCEP, UKMO, JMA and MME, averaged over North China.

To assess the spatial distributions of each error component, Figures 7 and 8 describe the spatial distributions of BIAS, DIST and SEQU derived from ECMWF, NCEP and MME over NC for WS10 and DIR10, respectively, with the lead time of 1 day taken as an example. Generally, the multiple forecasts perform with consistent spatial distributions for both WS10 and DIR10. In terms of WS10, the largest BIASs and DISTs occur at central NC, which is also characterized by the lowest SEQUs. In addition, the largest SEQUs of up to 1 m/s can be seen at northwestern and southeastern NC. Although the MME is generally superior to ECMWF, its DISTs at northwestern NC are obviously higher than the ECMWF results. For DIR10, the largest BIASs, DISTs and SEQUs mainly occur at central NC, and the lowest DISTs and SEQUs can be seen at northwestern NC. Moreover, the MME shows lower SEQUs than the two NWP models over most areas, but the DISTs of the MME are generally higher than those of the two NWP models, which is more distinct at southeastern NC. It is worth noting that the higher BIASs and DISTs tend to occur in regions characterized by high altitudes, while the SEQUs are less affected. This implies that the BIASs and DISTs might be associated with the deficiency of the NWP models in simulating real terrain.

Aiming at diagnoses of the wind forecast errors at the isobaric surfaces, Figure 9 shows the regional averaged RMSE and the components of BIAS, DIST and SEQU for WS500 and DIR500 over NC derived from the four NWP models and the MME, with lead times of 1–7 days. Generally, the SEQU remains the main source of errors and rises rapidly with increasing lead times for both WS500 and DIR500. Furthermore, the proportions of the total errors accounted for by SEQU are higher than those in the 10 m wind forecasts for both WS500 and DIR500. Unlike the 10 m wind forecasts, the insufficiency of the NCEP forecasts at 500 hPa could mainly be attributed to the SEQU. On the other hand, the MME is characterized by lower SEQU, along with higher BIAS and DIST, than all the NWP models for WS500, which is more evident at longer lead times.

**Figure 7.** Spatial distributions of decomposed BIAS, DIST and SEQU for WS10 with a lead time of 1 day derived from ECMWF, NCEP and MME.

**Figure 8.** Spatial distributions of decomposed BIAS, DIST and SEQU for DIR10 with a lead time of 1 day derived from ECMWF, NCEP and MME.

**Figure 9.** Variations in RMSE, decomposed BIAS, DIST and SEQU for WS500 and DIR500 at lead times of 1–7 days derived from ECMWF, NCEP, UKMO, JMA and MME, averaged over North China.

Figures 10 and 11 further describe the spatial distributions of the BIAS, DIST and SEQU components derived from ECMWF, NCEP and MME over NC for WS500 and DIR500, with the lead time of 1 day taken as an example. In terms of WS500, the SEQUs of NCEP over most areas are greater than 2 m/s, which accounts for the overall insufficiency of the model. Furthermore, the MME shows generally lower SEQUs than the two NWP models, while the BIASs of the MME at northern NC are higher than those of ECMWF and NCEP. For DIR500, the three forecast systems show generally consistent distributions, and the largest SEQUs are mainly distributed at northern NC. Furthermore, the MME performs with lower SEQUs than ECMWF and NCEP for most areas, but there are higher DISTs at northwestern NC in the MME than in the two models. In addition, the MME could not produce overt improvements over ECMWF and NCEP in terms of the BIAS component.

In summary, the main source of wind forecast errors at both 10 m and the isobaric surfaces is the SEQU component, which rises rapidly with increasing lead times. The proportions of the total errors accounted for by SEQU at the isobaric surfaces are higher than those at the 10 m level. The deficiencies of NCEP at the 10 m and isobaric surfaces could mainly be attributed to the BIAS and SEQU terms, respectively. Furthermore, the MME tends to perform with lower SEQU than the NWP models at both 10 m and the isobaric surfaces, which is more distinct for longer lead times. However, the MME shows a slight deficiency in reducing BIAS and DIST; there are even higher DISTs for the MME than for the NWP models, the causes of which are not examined in detail here and require exploration in future work.

**Figure 10.** Spatial distributions of decomposed BIAS, DIST and SEQU for WS500 with a lead time of 1 day derived from ECMWF, NCEP and MME.

**Figure 11.** Spatial distributions of decomposed BIAS, DIST and SEQU for DIR500 with a lead time of 1 day derived from ECMWF, NCEP and MME.

### **4. Conclusions and Discussion**

In this study, the wind forecasts of 2020 derived from ECMWF, NCEP, UKMO and JMA over NC for lead times of 1–7 days at 10 m and at isobaric surfaces (500 hPa, 700 hPa, 850 hPa and 925 hPa) were evaluated, and the straightforward multimodel ensemble mean (MME) method was utilized to improve wind forecast abilities. Furthermore, the error decomposition method was applied to diagnose the error sources of wind forecasts in the NWP models and to analyze which aspects of the forecasts were improved by the MME method. The associated results are as follows.

Generally, there was little difference in the performances of the four NWP models in terms of wind direction forecasts, but evident differences occurred in the zonal wind, meridional wind and wind speed forecasts. The ECMWF showed general advantages over the other three NWP models at both 10 m and the isobaric surfaces, which were more pronounced at the isobaric surfaces. Furthermore, the forecast abilities of the MME were superior to ECMWF for U, V, WS and DIR, which was more obvious at higher levels for longer lead times. It is worth noting that the multiple forecasts manifested consistent trends of increasing (decreasing) RMSE for U, V and WS (DIR) with rising height. In addition, all the NWP models and the MME tended to show higher forecast abilities at central NC, while they manifested lower abilities at northwestern NC for both the ground and isobaric surfaces.

The main source of wind forecast errors at both 10 m and the isobaric surfaces was the SEQU component, which rose rapidly with increasing lead times. In addition, the proportions of the total errors accounted for by SEQU at the isobaric surfaces were higher than those at the 10 m level. Furthermore, the deficiencies of NCEP at the 10 m and isobaric surfaces could mainly be attributed to the BIAS and SEQU terms, respectively. The MME tended to perform with lower SEQU than the NWP models at both 10 m and the isobaric surfaces, which was more distinct for longer lead times. However, the MME showed a slight deficiency in reducing BIAS and DIST, and there were even higher DISTs for the MME than for the NWP models. These results not only provide an important reference for the use of wind NWP products in operational departments and scientific research, but also help direct further improvement of NWP models in the future.

Moreover, according to the current study, higher BIASs and DISTs tended to occur at regions with high altitudes for wind forecasts at 10 m, which implies that the BIAS and DIST might be associated with the deficiency of the models in simulating real terrain [48,49]. Thus, calibration methods incorporating geographic information should also be examined in the future [50,51]. On the other hand, the examined MME method is one of the most basic and straightforward multimodel ensemble methods, assigning the same role to all models. Considering the deficiency of the MME in reducing the BIAS and DIST of wind forecasts, multimodel ensemble methods based on more complex algorithms that assign different weights to different models, including the Kalman filter [52,53], object-based diagnosis [54] and deep learning methods [6,55], are promising candidates for further improving wind forecast abilities. Furthermore, with the development of modern observation channels and technologies, enriched observations could be taken into consideration to assess and calibrate the model products in a more realistic way.

**Author Contributions:** Y.L. and X.Z. contributed to conception and design of the study. H.W., S.Z. and Y.Z. contributed to the analysis. H.Z., D.K. and C.H. organized the database. All authors contributed to manuscript revision, read, and approved the submitted version. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was jointly supported by the Collaboration Project of Urumqi Desert Meteorological Institute of China Meteorological Administration "Precipitation forecast based on machine learning", the National Key R&D Program of China (Grant No. 2017YFC1502002), the Basic Research Fund of CAMS (Grant No. 2022Y027), and the research project of Jiangsu Meteorological Bureau (Grant No. KQ202209).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The forecast and observation data in this paper are publicly available. The datasets are obtained from the ECMWF archive at https://apps.ecmwf.int/datasets/, accessed on 1 August 2022.

**Acknowledgments:** The authors are grateful to ECMWF, NCEP, UKMO and JMA for their datasets.

**Conflicts of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

### **References**

