1. Introduction
The accuracy of numerical weather prediction (NWP) depends strongly on the initial state, and a good initial state in turn depends on high-quality data assimilation. In a variational data assimilation system [1], the background error covariance matrix ($\mathbf{B}$ matrix) plays a crucial role in weighting the a priori state, spreading and smoothing information, constructing balance relationships, and improving the efficiency of observation usage [2]. Therefore, the precise specification of $\mathbf{B}$ is of great significance for improving the forecast skill of a model. In practice, the dimension of $\mathbf{B}$ is vast, and a complete description of the background error cannot be obtained, so it must be modeled under appropriate assumptions. Usually, under the assumptions of isotropy, homogeneity, and static conditions, the $\mathbf{B}$ matrix is implicit in a set of control variable transformations [3, 4].
The control variable transformation (CVT) technique converts the minimization of the objective function with respect to the model variables into a minimization with respect to control variables. It consists of two parts: a physical transformation and a spatial transformation. The physical transformation expresses the dynamic balance constraints between different variables and transforms the model variables into quasi-independent analysis variables. Generally, the variable that best characterizes the main balance pattern of the atmosphere is selected as the first independent analysis variable [5], and the remaining model variables are decomposed into parts that are balanced with the selected variable and unbalanced parts. The unbalanced parts are the remaining independent analysis variables. For example, at the European Centre for Medium-Range Weather Forecasts (ECMWF), these analysis variables are vorticity, unbalanced divergence, unbalanced mass (temperature and surface pressure), and specific humidity (considered separately) [6].
The assumptions made for modeling $\mathbf{B}$ are only a rough approximation of the forecast error. The true error structure is inhomogeneous and anisotropic and varies with the weather conditions. The wavelet form of the background error covariance model proposed by Fisher [7] enables the matrix $\mathbf{B}$ to vary with scale and spatial position and accounts for the non-separability of the covariance in the horizontal and vertical directions. Regarding the time-varying characteristics of the flow, ensemble data assimilation (EDA) can provide flow-dependent estimates of short-term forecast errors from ensemble members and construct a $\mathbf{B}$ matrix reflecting the uncertainty of the real-time meteorological situation [8, 9]. Hybrid data assimilation methods that combine climatological and ensemble covariances have also been implemented and continuously developed at some operational NWP centers (e.g., the Met Office and the National Centers for Environmental Prediction) [10, 11].
The use of EDA to estimate the background error variance of vorticity was implemented in the ECMWF model in May 2011 [12]. Raynaud et al. [13] used the Arpège system to examine the effect of extending the flow-dependent variance to all variables, emphasizing the importance of correctly representing unbalanced statistics. Bonavita et al. [14] determined that for temperature and divergence, the variances of the unbalanced variables in the control vector, which are not constrained by dynamic balance, contribute significantly to the total variances, especially for temperature errors in the troposphere. For high-resolution assimilation systems, and particularly in areas where quasi-geostrophic theory is less suitable (e.g., the tropics and extratropical low-pressure systems), the proportion of unbalanced variances increases further.
Tropical cyclones (TCs) are low-pressure systems accompanied by high-impact weather, and they pose a great threat to nearby areas [15]. Currently, the track forecast error of TCs shows a decreasing trend, but the intensity forecast error has not changed much. One of the key factors affecting the intensity forecast is the accurate description of the TC in the initial field of a model [16, 17]. Li et al. [18] studied Hurricane Ike (2008) using the Weather Research and Forecasting (WRF) model hybrid ensemble-three-dimensional variational data assimilation system and found that flow-dependent background errors can produce more dynamically consistent state estimates. Gopalakrishnan and Chandrasekar [19] compared the performance of 4D variational and 3D variational assimilation schemes for the forecasting of two TCs in the Indian Ocean. Previous studies are mostly based on regional models and focus only on specific TC cases. Statistical results on the effect of flow-dependent background error variance on TC forecasting in a global model remain to be evaluated.
The current trend in data assimilation technology is hybrid data assimilation, the key component of which is the use of background error information that reflects the current flow characteristics. For the development of operational NWP systems, it is important to evaluate the impact of day-to-day background error variance, since these results can help guide the design of the assimilation scheme. Improvements in TC forecasting skill also require a flow-dependent description of the background error field. Therefore, it is necessary to analyze the effects of different variables in the flow-dependent control vector (especially the unbalanced components, which may play a leading role) on the development of TCs.
The primary purpose of this study is to evaluate the effect of ensemble-based unbalanced variances on assimilation and forecasting and to conduct a detailed analysis of high-impact weather events such as TCs. Specifically, we used the En4DVar assimilation system (Section 2.1) to estimate the flow-dependent background error variances. By comparing the results obtained when the ensemble estimation is applied only to vorticity and when it is applied to all control variables, the effect of the flow-dependent unbalanced variances on the TC forecast is obtained. The objectives of this study are twofold. On the one hand, we diagnose the background error variance of the control variables estimated by EDA to verify their flow-dependent characteristics and to inform the further development of the assimilation system (a fully flow-dependent background error covariance matrix). On the other hand, in the framework of the global model, we analyze the influence of the flow-dependent variance of unbalanced control variables on the track and intensity forecasts of TCs in different regions, to give a preliminary view of the role of different control variables and to provide possible reasons for improved TC forecasts from a dynamical perspective.
This article is composed of five sections. Section 2 briefly introduces the data assimilation system and forecast model used in this study, the theoretical background of the background error covariance matrix $\mathbf{B}$, and the related experimental design. Section 3 presents a diagnostic study of TC Saudel, including a comparison of background error statistics under various configurations and the analysis increments of single-observation experiments. In Section 4, the forecast skill is considered: the effects of the unbalanced variances on TC forecasts are investigated through batch comparison experiments and the case study of Saudel. Section 5 provides a summary and discussion of the main conclusions of this study.
2. Experimental Configuration
2.1. Assimilation System and Forecast Model
The assimilation system used in this paper is the Yin-He four-dimensional variational data assimilation system (YH4DVAR), which was developed based on the work of Zhang et al. [20, 21]. It assimilates global meteorological data and provides a high-quality initial field for global short- and medium-range NWP. The system is designed with an integrated cost function that accounts for the background field, observation processing, gravity wave control, and bias correction. The model variables are vorticity $\zeta$, divergence $\eta$, temperature and the logarithm of surface pressure $(T, \ln p_s)$, and specific humidity $q$. As an operational system, YH4DVAR assimilates global data twice a day, providing a high-precision initial field for the global model. Therefore, the assimilation window was set to 12 h, and the analysis and forecast products of 0000 UTC (1200 UTC) use the observations in the time window from 2100 UTC to 0900 UTC (0900 UTC to 2100 UTC).
Approximately 3.5 million observations can be assimilated in a single cycle. These include conventional data (humidity, temperature, pressure, and wind) from radiosondes, surface stations, aircraft, etc. Although conventional data account for less than 10% of the total, they play an important role in the bias correction process. Most of the observations are unconventional data, such as those from the Advanced Microwave Sounding Unit-A (AMSU-A), the Advanced Technology Microwave Sounder (ATMS), and Global Positioning System radio occultation (GPSRO), which can detect radiation in the atmospheric column. The radiative transfer model converts the prior information of the atmosphere into equivalent atmospheric radiances, and the differences between these and the measured data form the innovations that drive the assimilation system.
The current resolution of the system is TL1279L137 (indicating a triangular spectral truncation at wavenumber 1279, a linear grid, and 137 hybrid-coordinate vertical levels), and the detailed settings of the vertical model levels follow ECMWF (https://confluence.ecmwf.int/display/UDOC/L137+model+level+definitions (accessed on 20 September 2022)). To improve computational efficiency, YH4DVAR adopts a multiresolution incremental scheme [22, 23] and configures three inner loops (i.e., three minimization iterations). The resolution of the first loop is T159, and the resolution of the other two is T255.
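The 12 h window arrangement described above can be written down compactly. The following is a minimal sketch, assuming only what the text states (windows of 2100 to 0900 UTC and 0900 to 2100 UTC for the 0000 and 1200 UTC analyses); the helper name `observation_window` is hypothetical and is not a YH4DVAR routine.

```python
from datetime import datetime, timedelta

# Values taken from the text: a 12 h window that opens 3 h before the
# nominal analysis time (2100-0900 UTC for 0000 UTC, 0900-2100 UTC for 1200 UTC).
WINDOW_LENGTH = timedelta(hours=12)
LEAD_BEFORE_ANALYSIS = timedelta(hours=3)


def observation_window(analysis_time: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the observation window for one analysis time.

    Hypothetical helper for illustration only.
    """
    on_the_hour = (
        analysis_time.minute,
        analysis_time.second,
        analysis_time.microsecond,
    ) == (0, 0, 0)
    if analysis_time.hour not in (0, 12) or not on_the_hour:
        raise ValueError("analysis times are 0000 UTC and 1200 UTC only")
    start = analysis_time - LEAD_BEFORE_ANALYSIS
    return start, start + WINDOW_LENGTH


# Window used by the 0000 UTC analysis on 20 October 2020.
start, end = observation_window(datetime(2020, 10, 20, 0))
print(start, end)
```

Because consecutive windows share their end and start times, the two daily cycles cover the full 24 h without gaps.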
The model that matches the assimilation system used in the experiment is the Yin-He Global Spectral Model (YHGSM), a global NWP model developed by the College of Meteorology and Oceanography, National University of Defense Technology. The dynamical core of the YHGSM satisfies the dry-mass conservation constraint proposed by Peng et al. [24]. The time discretization adopts a semi-implicit semi-Lagrangian scheme, and the spatial discretization adopts a spherical harmonic expansion (horizontal) and the finite-element method (vertical). For details, see Wu et al. [25], Yang et al. [26, 27], and Yin et al. [28, 29].
2.2. Background Error Covariance Modelling
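The four-dimensional variational (4DVar) cost function in incremental form can be expressed as:
$$J(\boldsymbol{\chi}) = \frac{1}{2}\boldsymbol{\chi}^{\mathrm{T}}\boldsymbol{\chi} + \frac{1}{2}\sum_{i}\left(\mathbf{H}_i\mathbf{M}_i\mathbf{L}\boldsymbol{\chi} - \mathbf{d}_i\right)^{\mathrm{T}}\mathbf{R}_i^{-1}\left(\mathbf{H}_i\mathbf{M}_i\mathbf{L}\boldsymbol{\chi} - \mathbf{d}_i\right) \qquad (1)$$
In the actual algorithm implementation, the variable transformation matrix $\mathbf{L}$ is used to transform the model variables from the incremental space to the control vector space [4, 5], which can be expressed as:
$$\delta\mathbf{x} = \mathbf{L}\boldsymbol{\chi}$$
The control vector $\boldsymbol{\chi}$ is the vector processed by the minimization algorithm in the assimilation system. The analysis field is the model variable field at which the cost function is minimized. The difference between the analysis field and the background field is the analysis increment $\delta\mathbf{x}$, i.e., $\delta\mathbf{x} = \mathbf{x}_a - \mathbf{x}_b$. The subscript $i$ is the time index corresponding to the observation, and $\mathbf{x}_b$ is the a priori estimate (background field) of the target atmospheric state analysis field $\mathbf{x}_a$. $\mathbf{d}_i = \mathbf{y}_i - \mathcal{H}_i[\mathcal{M}_i(\mathbf{x}_b)]$ is the innovation vector, $\mathbf{y}_i$ is the observation vector, $\mathcal{H}_i$ is the observation operator that maps the model variables to the observation space, and $\mathbf{H}_i$ is obtained by linearizing the observation operator near the background state. $\mathcal{M}_i(\mathbf{x}_b)$ represents the state obtained by propagating the background field to time $i$ through the complete nonlinear model, and $\mathbf{M}_i\mathbf{L}\boldsymbol{\chi}$ represents the state obtained by propagating the control vector at the initial time through the tangent linear model to time $i$. $\mathbf{R}_i$ is the observation error covariance matrix. The background error covariance matrix $\mathbf{B}$ does not appear in Equation (1) but is implicitly represented by the matrix $\mathbf{L}$ ($\mathbf{B} = \mathbf{L}\mathbf{L}^{\mathrm{T}}$).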
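To make the role of the control variable transformation concrete, the sketch below evaluates a toy incremental 4DVar cost function in control space and checks that the background term implicitly contains the background error covariance. It is only an illustration under stated assumptions: the standard form J = ½χᵀχ + ½Σ(HMLχ − d)ᵀR⁻¹(HMLχ − d) is assumed, all operators are small dense matrices, the Cholesky factor is used as one possible square root of the covariance, and the function names are hypothetical rather than part of YH4DVAR.

```python
import numpy as np


def cost_and_gradient(chi, L, M_list, H_list, d_list, Rinv_list):
    """Incremental 4DVar cost in control space and its gradient.

    J(chi) = 0.5 chi^T chi
             + 0.5 sum_i (H_i M_i L chi - d_i)^T R_i^{-1} (H_i M_i L chi - d_i)
    All operators are small dense matrices here (toy assumption).
    """
    dx0 = L @ chi                      # increment at the initial time
    cost = 0.5 * chi @ chi             # background term: B is implicit in L
    grad = chi.copy()
    for M, H, d, Rinv in zip(M_list, H_list, d_list, Rinv_list):
        departure = H @ (M @ dx0) - d  # linearized model equivalent minus innovation
        cost += 0.5 * departure @ Rinv @ departure
        grad += L.T @ (M.T @ (H.T @ (Rinv @ departure)))  # adjoint chain
    return cost, grad


def solve_quadratic(L, M_list, H_list, d_list, Rinv_list):
    """Exact minimizer of the quadratic cost (feasible only for toy sizes)."""
    n = L.shape[1]
    hessian = np.eye(n)
    rhs = np.zeros(n)
    for M, H, d, Rinv in zip(M_list, H_list, d_list, Rinv_list):
        G = H @ M @ L
        hessian += G.T @ Rinv @ G
        rhs += G.T @ Rinv @ d
    return np.linalg.solve(hessian, rhs)


rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
B = A @ A.T + n * np.eye(n)            # toy symmetric positive-definite B
L = np.linalg.cholesky(B)              # one valid choice with B = L L^T
M_list = [np.eye(n), 0.9 * np.eye(n)]  # toy tangent linear propagators
H_list = [np.eye(2, n), np.eye(2, n)]  # observe the first two components
d_list = [np.array([1.0, -0.5]), np.array([0.3, 0.2])]
Rinv_list = [np.eye(2) / 0.25, np.eye(2) / 0.25]

chi_a = solve_quadratic(L, M_list, H_list, d_list, Rinv_list)
cost_a, grad_a = cost_and_gradient(chi_a, L, M_list, H_list, d_list, Rinv_list)
cost_0, _ = cost_and_gradient(np.zeros(n), L, M_list, H_list, d_list, Rinv_list)
dx_a = L @ chi_a                       # analysis increment in model space
```

In this toy setting the background term ½χᵀχ equals ½δxᵀB⁻¹δx, which is why the covariance never has to be formed or inverted explicitly. An operational system replaces the direct solve with an iterative minimization that only needs the cost and its gradient.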
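YH4DVAR uses a spherical wavelet background error covariance model [7, 30], and the corresponding CVT relationship based on spherical wavelets is:
$$\delta\mathbf{x} = \mathbf{K}\,\boldsymbol{\Sigma}_b^{1/2}\sum_{j}\psi_j \otimes \left[\mathbf{C}_j^{1/2}(\lambda,\phi)\,\boldsymbol{\chi}_j\right]$$
where $\psi_j$ represent wavelets of different scales, $\mathbf{C}_j(\lambda,\phi)$ is the vertical covariance matrix for the $j$-th wavelet scale at horizontal position $(\lambda,\phi)$ ($\lambda$ is longitude, $\phi$ is latitude), $\boldsymbol{\chi}_j$ is the $j$-th wavelet control variable, $\otimes$ stands for convolution, $\psi_j\,\otimes$ is the wavelet transform, $\boldsymbol{\Sigma}_b$ is the background error variance in the grid space, and $\mathbf{K}$ represents the balance constraint relationship between different variables.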
The control variable transformation matrix usually includes a balance transformation and a spatial transformation. The balance transformation deals with the correlations between different variables. It transforms the original background error covariance matrix into a block-diagonal matrix, so that the independent analysis variables can be considered separately. Thus, the original multivariate problem is transformed into a set of univariate problems [31]. The spatial transformation deals with the spatial correlation of each variable. In the YH4DVAR system, vorticity, as the variable that characterizes the main balance mode of the atmosphere, is called the balanced variable. The remaining variables are decomposed into balanced parts and unbalanced parts by the balance operator. This process can be represented in matrix form:
$$\begin{pmatrix}\zeta\\ \eta\\ (T,\ln p_s)\\ q\end{pmatrix} = \begin{pmatrix}\mathbf{I} & 0 & 0 & 0\\ \mathbf{M} & \mathbf{I} & 0 & 0\\ \mathbf{N} & \mathbf{P} & \mathbf{I} & 0\\ 0 & 0 & 0 & \mathbf{I}\end{pmatrix}\begin{pmatrix}\zeta\\ \eta_u\\ (T,\ln p_s)_u\\ q\end{pmatrix}$$
where the first matrix on the right is the balance operator matrix $\mathbf{K}$, $\mathbf{M}$ represents the correlation between the divergence increment and the vorticity increment, and $\mathbf{N}$ and $\mathbf{P}$ represent the balance between the mass field and the wind field [4, 7, 32]. These matrices are estimated with the National Meteorological Centre (NMC) method and calibrated using linear regression [33]. The second matrix on the right consists of the control variables, which are vorticity $\zeta$, unbalanced divergence $\eta_u$, unbalanced mass $(T,\ln p_s)_u$, and specific humidity $q$. The model variables on the left side of the equation are mutually correlated, whereas the control variables on the right are treated as independent of each other and can therefore be considered separately.
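The balance transformation can be illustrated with a small numerical sketch. It assumes the usual block lower-triangular form of the balance operator, in which divergence receives a vorticity-balanced part and the mass variable receives parts balanced with vorticity and with unbalanced divergence. The block sizes, the random regression blocks, the diagonal variance matrix, and the names `build_balance_operator` and `implied_covariance` are all assumptions made for illustration.

```python
import numpy as np


def build_balance_operator(M, N, P, n_q):
    """Assemble a block lower-triangular balance operator K (toy sizes).

    Row blocks: vorticity, divergence, mass (T, ln ps), specific humidity.
    M, N, P are the regression blocks; humidity is left uncoupled.
    """
    n_d, n_z = M.shape
    n_m = N.shape[0]
    if N.shape != (n_m, n_z) or P.shape != (n_m, n_d):
        raise ValueError("inconsistent block shapes")
    zeros = np.zeros
    return np.block([
        [np.eye(n_z), zeros((n_z, n_d)), zeros((n_z, n_m)), zeros((n_z, n_q))],
        [M, np.eye(n_d), zeros((n_d, n_m)), zeros((n_d, n_q))],
        [N, P, np.eye(n_m), zeros((n_m, n_q))],
        [zeros((n_q, n_z)), zeros((n_q, n_d)), zeros((n_q, n_m)), np.eye(n_q)],
    ])


def implied_covariance(K, variances):
    """B = K D K^T with D diagonal (spatial correlations ignored in this toy)."""
    return K @ np.diag(variances) @ K.T


rng = np.random.default_rng(1)
n = 3                                   # toy degrees of freedom per variable
M = 0.3 * rng.normal(size=(n, n))
N = 0.5 * rng.normal(size=(n, n))
P = 0.2 * rng.normal(size=(n, n))
K = build_balance_operator(M, N, P, n_q=n)

# Control variables: vorticity, unbalanced divergence, unbalanced mass, humidity.
zeta, div_u, mass_u, q = (rng.normal(size=n) for _ in range(4))
control = np.concatenate([zeta, div_u, mass_u, q])
full = K @ control                      # correlated model-variable increments
div, mass = full[n:2 * n], full[2 * n:3 * n]

# Changing only the vorticity variance also changes the implied divergence
# and mass variances, through their balanced parts.
var_clim = np.ones(4 * n)
var_flow = var_clim.copy()
var_flow[:n] *= 4.0
delta_B = implied_covariance(K, var_flow) - implied_covariance(K, var_clim)
```

The last lines show why flow-dependent vorticity variances alone already affect the other model variables: the change in the divergence block of the implied covariance is proportional to M Mᵀ, while the humidity block is untouched.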
2.3. Flow-Dependent Estimation and Postprocessing of Variances
The data assimilation system used in this work consists of an EDA cycle and a deterministic 4DVar cycle. The flow-dependent background error variances were estimated from the EDA cycle. First, the background field, observations, and sea surface temperature (SST) field were perturbed. The background was perturbed implicitly by perturbing the physical tendencies in the numerical prediction model. The observations and SST were perturbed by superimposing random noise drawn from their respective error distributions. Then, the perturbed inputs were fed into the EDA cycle to obtain the perturbed analysis fields, and the ensemble of forecast fields (background fields) was obtained through forward integration of the forecast model. Next, the raw variances estimated from the ensemble samples were scaled and filtered to obtain the EDA variances. Finally, the flow-dependent background error variances were used as input to the deterministic 4DVar cycle (only in the minimization step). To reduce the computational cost, the resolution of the EDA is usually lower than that of the 4DVar cycle. In the EDA cycle, two inner loops with resolutions of T95 and T159 and one outer loop with a resolution of T399 were used. In the deterministic 4DVar cycle, three inner loops (T159/T255/T255) and one outer loop (T1279) were used. Similar configurations can also be seen in the ECMWF operational system [9].
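The variance estimation step of the EDA cycle can be sketched as follows. This is a minimal illustration, assuming the raw variance is the unbiased sample variance across members and that the scaling is a single multiplicative factor; the factor value and the function names are hypothetical, and the synthetic members merely stand in for perturbed background forecasts.

```python
import numpy as np


def raw_ensemble_variance(members):
    """Unbiased sample variance across ensemble members.

    `members` has shape (n_members, n_points): one background field per row.
    """
    members = np.asarray(members, dtype=float)
    if members.shape[0] < 2:
        raise ValueError("at least two members are needed")
    return members.var(axis=0, ddof=1)


def calibrate(raw_variance, scale_factor):
    """Apply a multiplicative calibration (the factor is a made-up placeholder)."""
    return scale_factor * np.asarray(raw_variance, dtype=float)


rng = np.random.default_rng(2)
n_members, n_points = 10, 200
x = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
true_std = 1.0 + 0.8 * np.sin(x)                 # spatially varying "flow-dependent" error
mean_field = 280.0 + 5.0 * np.cos(x)             # arbitrary background state
perturbations = rng.normal(size=(n_members, n_points)) * true_std
members = mean_field + perturbations             # stand-in for perturbed backgrounds

raw = raw_ensemble_variance(members)
scaled = calibrate(raw, scale_factor=1.2)        # 1.2 is illustrative only
```

With only 10 members the raw estimate is visibly noisy, which is the motivation for filtering it before use in the deterministic 4DVar cycle.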
When ensembles are used to estimate the flow-dependent background error variance, a key point is selecting an appropriate number of members, because sampling noise considerably affects the accuracy of the estimate. Pereira and Berre [34] noted that the accuracy of variance estimation based on EDA is proportional to the square root of the number of ensemble members. Increasing the ensemble size is beneficial, but it also significantly increases the computational cost. Because of limited computational resources and the demand for timeliness, the operational number of members is on the order of 10. Several studies have focused on the impact of finite-size ensemble samples [35, 36, 37, 38]. Bonavita et al. [39] showed that a relatively small ensemble (e.g., 10 members) can adequately characterize the large-scale error structure of extratropical cyclones, and that a larger ensemble can represent finer features. Liu et al. [40] found that the variance estimates from 10 members are very similar to those from 30 members apart from the relatively lower noise in the latter, with no essential difference between the two. The number of EDA members in this article is therefore also set to 10.
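The dependence of sampling noise on ensemble size can be checked numerically. The sketch below assumes independent Gaussian perturbations, for which the relative standard deviation of the unbiased sample variance is sqrt(2/(N − 1)); real ensemble perturbations are neither independent in space nor exactly Gaussian, so this is only an idealized reference. The function names are hypothetical.

```python
import numpy as np


def relative_sampling_error(n_members):
    """Theoretical relative std of the sample variance for Gaussian samples."""
    return np.sqrt(2.0 / (n_members - 1))


def monte_carlo_relative_error(n_members, n_trials=20000, seed=0):
    """Empirical relative std of the sample variance (true variance is 1)."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(size=(n_trials, n_members))
    variance_estimates = samples.var(axis=1, ddof=1)
    return variance_estimates.std()


for n_members in (10, 30):
    theory = relative_sampling_error(n_members)
    empirical = monte_carlo_relative_error(n_members)
    print(n_members, round(float(theory), 3), round(float(empirical), 3))
```

Under these assumptions, tripling the ensemble from 10 to 30 members reduces the relative noise by a factor of about 1.8, which is consistent with the square-root scaling and helps explain why a filtered 10-member estimate can remain close to a 30-member one.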
To reduce the effect of random errors, a computationally inexpensive objective filtering method [41] was used in the variance postprocessing. The estimated variances were filtered using the following spectral low-pass filter:
$$\rho(n) = \cos^{2}\left(\frac{\pi}{2}\,\frac{\min(n, N_{\mathrm{trunc}})}{N_{\mathrm{trunc}}}\right)$$
where $n$ is the wavenumber, $N_{\mathrm{trunc}}$ is the truncation wavenumber (i.e., the wavenumber at which the signal energy spectrum equals the energy spectrum of the random sampling noise variance), and $\rho$ is the spectral filter coefficient. The filtering process converts the ensemble-estimated variance from the grid space to the spectral space and multiplies it by the filter coefficient, so that the larger-scale signal passes through and the smaller-scale noise is filtered out.
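The filtering step can be illustrated on a periodic one-dimensional grid. The sketch assumes a cosine-squared low-pass coefficient, ρ(n) = cos²(½π·min(n, N_trunc)/N_trunc), which is one common form of such an objective filter, and it uses a Fourier transform as a stand-in for the spherical harmonic transform of a real system. The truncation wavenumber, the synthetic signal, and the function names are assumptions for illustration.

```python
import numpy as np


def filter_coefficient(n, n_trunc):
    """Assumed filter form: cos^2(pi/2 * min(n, n_trunc) / n_trunc)."""
    n = np.asarray(n, dtype=float)
    return np.cos(0.5 * np.pi * np.minimum(n, n_trunc) / n_trunc) ** 2


def filter_variance_1d(raw_variance, n_trunc):
    """Low-pass filter a variance field on a periodic 1-D grid.

    A real system would use spherical harmonics; the FFT is a toy stand-in.
    """
    raw_variance = np.asarray(raw_variance, dtype=float)
    spectrum = np.fft.rfft(raw_variance)
    wavenumbers = np.arange(spectrum.size)
    damped = spectrum * filter_coefficient(wavenumbers, n_trunc)
    return np.fft.irfft(damped, n=raw_variance.size)


rng = np.random.default_rng(3)
n_points = 256
x = 2.0 * np.pi * np.arange(n_points) / n_points
signal = 1.0 + 0.5 * np.sin(2.0 * x)               # large-scale "true" variance
raw = signal + 0.3 * rng.normal(size=n_points)     # add small-scale sampling noise
filtered = filter_variance_1d(raw, n_trunc=16)     # 16 is an arbitrary choice here

rmse_raw = np.sqrt(np.mean((raw - signal) ** 2))
rmse_filtered = np.sqrt(np.mean((filtered - signal) ** 2))
```

The coefficient equals 1 at wavenumber 0, so the domain-mean variance is preserved, and it falls smoothly to 0 at the truncation wavenumber. A smooth roll-off of this kind avoids the ringing that a sharp spectral cut-off would introduce, at the price of slightly damping the retained scales.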
2.4. Experimental Design
In the operational YH4DVAR system, the variances at each analysis time are first calculated from the 10-member EDA (at lower resolution) and then applied to the operational 4DVar. Among the control variables, specific humidity is analyzed independently of the other variables and is not considered here. In addition, this study focuses only on the flow-dependent variances, and the correlations (i.e., the off-diagonal elements of $\mathbf{B}$) are climatological.
The objective of the experiments is to evaluate the effect of flow-dependent variances of the unbalanced control variables (i.e., unbalanced divergence and unbalanced temperature and surface pressure). Therefore, we performed three experiments. In the first experiment, the variance of vorticity is calculated from the EDA, while the variances of the remaining control variables are derived from climatological statistics. As the equations in Section 2.2 indicate, vorticity plays a crucial role in describing the balance relationships among different variables: the flow dependence of the vorticity variances is projected onto the balanced parts of divergence, temperature, and surface pressure through the balance operator. Therefore, we call this experiment “flow-dependent balanced”, abbreviated as “fbal”. A fully flow-dependent description of the variances is investigated in the second experiment, in which the variances of all control variables are provided by the ensemble. By comparing it with “fbal”, we can determine the effect of the variances of the unbalanced variables. The second experiment is abbreviated as “fall”. As a reference, we also ran a control experiment (abbreviated as “ctrl”), in which the variances of all control variables are static and derived from climatology.
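The three configurations differ only in which control variables take their variances from the EDA. A minimal sketch of that selection logic is given below; the dictionary layout, the variable labels, and the function `select_variances` are hypothetical and do not reflect how YH4DVAR is actually configured.

```python
# Control variables considered in the experiments (specific humidity is excluded).
CONTROL_VARIABLES = ("vorticity", "unbalanced_divergence", "unbalanced_mass")

# Which control variables use EDA (flow-dependent) variances in each experiment.
EXPERIMENTS = {
    "ctrl": frozenset(),                      # all variances climatological
    "fbal": frozenset({"vorticity"}),         # flow-dependent vorticity only
    "fall": frozenset(CONTROL_VARIABLES),     # all variances flow-dependent
}


def select_variances(experiment, climatological, eda):
    """Pick the variance source for each control variable.

    `climatological` and `eda` map variable names to variance fields.
    """
    flow_dependent = EXPERIMENTS[experiment]
    return {
        name: eda[name] if name in flow_dependent else climatological[name]
        for name in CONTROL_VARIABLES
    }


# Placeholder labels stand in for real variance fields.
clim = {name: f"clim_{name}" for name in CONTROL_VARIABLES}
eda = {name: f"eda_{name}" for name in CONTROL_VARIABLES}
fbal_variances = select_variances("fbal", clim, eda)
fall_variances = select_variances("fall", clim, eda)
```

Comparing "fall" with "fbal" isolates the unbalanced variances, because the two configurations differ only in the unbalanced divergence and unbalanced mass entries.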
One-month assimilation/forecast experiments with the three configurations were run for October 2020 (0000 UTC 1 October to 1200 UTC 31 October). The assimilated observations include surface observations, aircraft data, sea surface observations (e.g., drifting buoys and ship reports), in situ sounding data, wind profiler radar data, Global Positioning System radio occultation bending angles, and radiances from polar-orbiting satellites (e.g., AMSU-A and MHS). The observations are subjected to various quality control steps, such as bias correction, variational quality control, and thinning, before entering the assimilation system.
5. Conclusions and Discussion
Accurate representation of the background error variance considerably affects the forecast performance of variational data assimilation systems for TCs. In this work, an EDA system was used to estimate the day-to-day variances, and three experiments (ensemble estimates for all control variables, ensemble estimates for vorticity only, and climatological estimates for all variables) were conducted to examine the effects.
The mechanism by which flow-dependent variances affect the analysis was investigated through a single-observation experiment and a case study of TC Saudel. Introducing flow-dependent unbalanced variances further strengthens the connection with the underlying flow and describes the background error characteristics in more detail. It also makes better use of the observations, that is, it gives larger weight to observations near areas of enhanced uncertainty in the analysis. For dynamically active areas, such as those affected by TCs, the contribution of the unbalanced background error variances to the total variances is higher for mass variables. The use of EDA variances of unbalanced variables also propagates flow-dependent information into the analysis increments, as indicated by the anisotropic increments for Saudel. It is expected that the analysis will be further improved once flow-dependent correlation information is used.
Statistical results from a series of TC forecasting experiments verify the advantage of fully flow-dependent control variable variances. Although there are differences across time, region, and TC, the EDA-based unbalanced variances improve the overall track and intensity forecasts. The track and intensity of TCs are determined by a combination of external forcing (e.g., steering by the large-scale circulation, the influence of approaching cyclones, etc.) and internal dynamical factors (e.g., convective asymmetry, vertical coupling of high- and low-level vortices, etc.). The increase in the 500 hPa geopotential height forecast scores and the more realistic wind–pressure balance relationship indicate that the improved large-scale circulation forecast and the better description of the dynamic balance between different variables are responsible for the improvement in track forecasting. Nevertheless, the global system’s description of the TC internal dynamics may not be fully accurate (due to resolution limitations), resulting in less pronounced improvements in intensity than in track. Although more in-depth analysis is needed, these results are valuable in guiding the design of operational assimilation systems.
Furthermore, we found that the two experiments that introduced flow-dependent information did not accurately predict the peak magnitude and temporal evolution of the intensity of Saudel (Figure 7). To analyze whether this is an effect of the number of ensemble members, a single forecast for Saudel was performed using a 20-member ensemble (similar to Section 4.1). However, increasing the number of ensemble members did not significantly improve the results (not shown). Next, we will diagnose these failed cases and identify possible causes to further improve the system. There are also some fluctuations in the “fall” experiment across different TC cases, and the reasons for these case-to-case differences are worth analyzing: are they related to the intensity of the TC itself or to external environmental factors (e.g., sea surface temperature)?
We only assessed the effect of the variances of the control variables, but the correlations are also an important part of the background error covariance. A covariance model with full flow dependence needs to be evaluated to investigate the combined effect of flow-dependent variances and correlations. Finally, the main disadvantage of En4DVar is its high computational cost. Because the minimization processes of the ensemble members are similar, whether the gradient descent information (such as the search direction) of one member can be used to accelerate the minimization of the next member is also a direction worthy of further exploration.