1. Introduction
Numerous studies have documented the critical role of sub-mesoscale processes (~10 km) in the ocean’s energy cascade [1,2,3,4]. However, existing observing approaches struggle to represent the occurrence and evolution of these processes continuously and accurately because their measurements are sparse and discrete. This situation is expected to be alleviated by the Surface Water and Ocean Topography (SWOT) mission [5]. Its Ka-band Radar Interferometer (KaRIn) measures global Sea Surface Height (SSH) on a 21-day cycle, providing along- and across-track SSH observations at very high resolution (~2 km); the Low Rate (LR) products, which focus on the ocean, were released in December 2023. These new SSH observations are considered an effective way to resolve signals with wavelengths down to 15 km [6,7], and they should enhance research on sub-mesoscale processes. Recent studies have quantified the potential contribution of SWOT observations to ocean state estimation and forecasting [8,9,10,11].
Multi-scale features, encompassing large-scale currents, mesoscale eddies, and sub-mesoscale processes, coexist in the SWOT wide-swath maps. However, this mixture of scales may exceed what the model can represent. Assimilation with a single decorrelation scale, as highlighted by D’Addezio et al. [12], is likely to lose information at other scales and may impede the model’s ability to capture interactions and feedback between scales, degrading overall model performance. To address this observation aliasing in data assimilation, multi-scale methodologies have been proposed and developed in previous studies [13,14,15,16,17,18,19,20]. The primary concept is to decompose the information in high-resolution observations into large-scale and small-scale components. These components are then either combined separately with the model results at the corresponding scales or assimilated into the model through a sequential recursive process. Driven by the distinct features of contemporary high-resolution observations and the demands of assimilation, multi-scale assimilation methods continue to be refined and are widely applied across diverse models. For instance, when confronted with locally dense glider observations, Carrier et al. [17] implemented a two-step assimilation strategy within the Four-Dimensional Variational Data Assimilation (4DVAR) framework, a methodology later refined and extended to assimilate SWOT observations in a Three-Dimensional Variational Data Assimilation (3DVAR) system [18]. As another example, Li et al. [21] decomposed SWOT observational data into three scale components and combined them individually with the model results at the corresponding scales using an extended 3DVAR method.
To enhance the application of SWOT observations and advance the multi-scale assimilation strategy, we implemented an extension of multi-scale assimilation methods within the ROMS-4DVAR framework and verified its positive impact through an Observing System Simulation Experiment (OSSE). The paper is structured as follows: Section 2 provides an overview of the methods employed in this study, covering the model setup, the simulation of synthetic SWOT data, and the implementation of the multi-scale 4DVAR method. Section 3 presents the research findings, emphasizing the validation of the assimilation method and comparisons with the nature run within the OSSE framework. Finally, Section 4 concludes the study and outlines future perspectives.
2. Methods
This section describes the construction of the OSSE framework. The primary objective was to enhance the assimilation of SWOT SSH observations. To this end, a twin experiment comprising a nature run, a free run, and multiple assimilation cases was executed as follows. First, a nature run with higher horizontal and vertical resolution was carried out, and its output was taken to represent the “true” state of the ocean. Second, synthetic SWOT observations were simulated by sampling the surface elevation field of the nature run at the predetermined orbit locations. Third, these synthetic observations were assimilated into the free run using both the primitive single-scale and the proposed multi-scale ROMS-4DVAR systems. Finally, the impact of the SWOT observations on the simulated ocean state was evaluated.
2.1. Model Configuration
The Regional Ocean Modeling System (ROMS) was employed to generate three-dimensional oceanic fields and quantify the assimilation of SWOT observations within the OSSE system. The primitive equations governing ROMS are formulated under the hydrostatic and Boussinesq assumptions. This study is centered on the Northern South China Sea (NSCS) domain, spanning from 105.5°E to 125.7°E and from 14.5°N to 26.5°N. Within the OSSE framework, a twin experiment featuring two cases was conducted: the nature run used a horizontal resolution of 1/36° × 1/36° with 50 vertical S-coordinate levels, and the free run used 1/10° × 1/10° with 30 levels. The topography fields for the two cases were derived from the General Bathymetric Chart of the Oceans (GEBCO) data and the ETOPO1 global relief model integrated topography. The minimum and maximum model depths across the entire domain were set at 10 and 5000 m, respectively. Figure 1 illustrates the smoothed bathymetry used in the nature run. Model inputs, encompassing the initial state, surface air–sea forcing fields, and boundary conditions for both the nature run and free run, are detailed in Table 1.
The surface water elevation and the three-dimensional temperature, salinity, and velocity fields for both the nature run and free run were initialized from the HYbrid Coordinate Ocean Model and Navy Coupled Ocean Data Assimilation (HYCOM and NCODA) global reanalysis dataset. For the nature run, the atmospheric forcing fields were derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis [22] with a 3 h temporal resolution and a spatial resolution of 0.125° × 0.125°, while the free run used data with a 6 h temporal resolution and a spatial resolution of 0.25° × 0.25°. The analysis and forecast products of ERA-Interim were combined for this purpose. Boundary conditions were established using the 5-day and monthly Simple Ocean Data Assimilation (SODA) reanalysis dataset, providing temperature, salinity, sea surface height, and velocity fields. Additionally, tidal forcing along the open boundaries of the nature run was taken from the global Oregon Tidal Inverse Solution (OTIS). Vertical mixing was parameterized using the level 2.5 Mellor–Yamada method [23].
2.2. Simulated SWOT Observations
The SWOT observations were synthetically simulated using the results from the nature run. Previous investigations of SWOT Calibration/Validation (Cal/Val) underscored the critical importance of reproducing realistic internal tide energy in the nature run [8,12]. To ensure the reliability of the simulated SWOT observations, the SSH accuracy of the nature run was examined first. Because tidal forcing was included in the model, the barotropic tidal signals contained in the surface elevation field were removed by harmonic analysis based on least-squares fitting.
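As an illustration of the harmonic analysis step, the following is a minimal sketch of a least-squares tidal fit at a single grid point. It is not the authors' implementation: the function name `fit_tides` and the choice of four constituents (M2, S2, K1, O1) are assumptions made here for the example, and a real analysis would typically include more constituents and nodal corrections.

```python
import numpy as np

# Periods (hours) of four major tidal constituents (assumed set for this sketch).
TIDAL_PERIODS_H = {"M2": 12.4206012, "S2": 12.0, "K1": 23.9344696, "O1": 25.8193417}


def fit_tides(t_hours, ssh, periods_h=TIDAL_PERIODS_H):
    """Least-squares harmonic fit of one SSH time series.

    Returns (tide, detided), where `tide` is the fitted tidal signal and
    `detided` is the SSH with that signal removed (the mean is retained).
    """
    t = np.asarray(t_hours, dtype=float)
    ssh = np.asarray(ssh, dtype=float)
    # Design matrix: a constant column plus a cosine/sine pair per constituent.
    columns = [np.ones_like(t)]
    for period in periods_h.values():
        omega = 2.0 * np.pi / period
        columns += [np.cos(omega * t), np.sin(omega * t)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, ssh, rcond=None)
    tide = design[:, 1:] @ coef[1:]  # exclude the mean term
    return tide, ssh - tide
```

The record must be long enough to separate neighbouring constituents (roughly 15 days for M2 and S2); the multi-month nature run comfortably satisfies this.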
Qualitatively, the monthly mean SSH field of the nature run for June 2012 compares favorably with the assimilative HYCOM reanalysis data. As depicted in Figure 2a,b, the spatial patterns, including height magnitude and location, are notably similar, particularly along the Kuroshio western boundary current. The assimilation of multi-platform data by HYCOM, such as satellite altimeter observations and in situ temperature and salinity profiles from XBTs, Argo floats, and moored buoys, contributes to a more realistic reconstruction of mesoscale characteristics. We also explored the daily variation in sea surface elevation from 1 January to 30 June 2012. The spatially averaged daily SSH derived from HYCOM and from the nature run is presented in Figure 2c. The two SSH series show a consistent seasonal tendency, and the difference from the HYCOM reanalysis product has remained small since the middle of February.
Furthermore, the internal tidal energy contained in the surface elevation field was examined. For simplicity, a subdomain near the Luzon Strait, marked by the red box in Figure 1, was selected for analysis. The two-dimensional power spectral density (PSD) was computed at each grid point for the period from 1 January to 31 August 2012. The spatially averaged energy spectrum is depicted in Figure 3. The spectrum reveals prominent peaks near the semidiurnal and diurnal frequencies. The power spectral densities adjacent to these frequencies reach values as high as $10^{-1}$ m$^2$/cpd, indicating that the primary internal tidal energy is well reproduced in the nature run. Consequently, the SSH fields from the nature run are suitable for simulating SWOT measurements.
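A spatially averaged frequency spectrum of this kind can be sketched as below. This is a simplified periodogram, assuming SSH is stored as an array of shape (time, points) with uniform sampling; the function name `mean_psd` and the Hann window are choices made for this example and are not taken from the paper.

```python
import numpy as np


def mean_psd(ssh, dt_days):
    """Spatially averaged one-sided PSD of SSH time series.

    ssh     : array of shape (ntime, npoints), in metres
    dt_days : sampling interval in days (0.125 for 3-hourly output)
    Returns (freq, psd) with freq in cycles per day and psd in m^2/cpd.
    """
    ssh = np.asarray(ssh, dtype=float)
    n = ssh.shape[0]
    anomalies = ssh - ssh.mean(axis=0)       # remove the time mean per point
    window = np.hanning(n)[:, None]          # taper to limit spectral leakage
    spec = np.fft.rfft(anomalies * window, axis=0)
    freq = np.fft.rfftfreq(n, d=dt_days)
    # Scale so that the PSD integrates approximately to the variance.
    psd = (dt_days / np.sum(window**2)) * np.abs(spec) ** 2
    psd[1:] *= 2.0                           # fold negative frequencies
    if n % 2 == 0:
        psd[-1] /= 2.0                       # the Nyquist bin is not doubled
    return freq, psd.mean(axis=1)
```

With 3-hourly output the Nyquist frequency is 4 cpd, so both the diurnal (about 1 cpd) and semidiurnal (about 2 cpd) bands are resolved.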
To generate the SWOT observations, we employed version 1.2.4 of the SWOT simulator software [5] developed by the Jet Propulsion Laboratory (JPL). The simulated SWOT measurements were sampled instantaneously from the 3-hourly surface elevation outputs of the nature run. The swath width was set to 120 km, and the along- and across-track resolutions were both set to 2 km, consistent with previous studies [18]. Figure 4 presents the 5-day simulated SWOT observations, with a magnified view of the measurements along the wide-swath track near the Luzon Strait in the right panel.
2.3. Data Assimilation Method
The multi-scale assimilation method used in this study was developed based on the ROMS 4DVAR system [24]. The target of primitive 4DVAR is to identify the best estimate of the circulation by minimizing the difference between the model background and the observations, subject to prior hypotheses about errors and possibly additional constraints. Generally, the variational cost function for the incremental approach is given by the equation below:
$$J(\delta\mathbf{x}) = \frac{1}{2}\,\delta\mathbf{x}^{T}\mathbf{B}^{-1}\delta\mathbf{x} + \frac{1}{2}\,(\mathbf{H}\,\delta\mathbf{x} - \mathbf{d})^{T}\mathbf{R}^{-1}(\mathbf{H}\,\delta\mathbf{x} - \mathbf{d}) \qquad (1)$$
Here, $\mathbf{y}$ is the observation sequence and $\mathbf{x}_b$ represents the model background state. $\mathbf{H}$ denotes the observation operator that maps the model state fields to the observation positions. $\delta\mathbf{x}$ represents the increment of the state variables, surface forcing, and open boundary conditions, and $\mathbf{d} = \mathbf{y} - \mathbf{H}\mathbf{x}_b$ is the innovation vector representing the difference between the observations and the model values. $\mathbf{B}$ and $\mathbf{R}$ are the background and observation error covariance matrices. As stated by Moore et al. [24], the innovation vector $\mathbf{d}$ combines all the observations located in the specified window, and the matrices $\mathbf{R}$ and $\mathbf{H}$ are block diagonal with blocks $\mathbf{R}_i$ and $\mathbf{H}_i$, respectively.
The analysis increment $\delta\mathbf{x}_a$ derived by minimizing $J$ in a least-squares sense is added to the background $\mathbf{x}_b$ to form the analysis solution $\mathbf{x}_a$, so we have
$$\mathbf{x}_a = \mathbf{x}_b + \delta\mathbf{x}_a \qquad (2)$$
and the increment $\delta\mathbf{x}_a$ can be expressed as
$$\delta\mathbf{x}_a = \mathbf{K}\,\mathbf{d} \qquad (3)$$
Here, $\mathbf{K}$ denotes the gain matrix. According to Weaver and Courtier [25] and Moore et al. [24], the background error covariance of ROMS-4DVAR can be expanded as $\mathbf{B} = \mathbf{K}_b\,\boldsymbol{\Sigma}\,\mathbf{C}\,\boldsymbol{\Sigma}^{T}\mathbf{K}_b^{T}$, where $\mathbf{C}$ is the univariate correlation matrix. $\mathbf{K}_b$ is the multivariate balance operator, which uses hydrostatic and geostrophic balance to constrain different model variables. In the case of rough topography, the balance operator can be disabled. Moreover, the tangent linear and adjoint models transfer information between variables through the model dynamics, so disabling $\mathbf{K}_b$ does not destroy the solutions. $\boldsymbol{\Sigma}$ denotes the standard deviation matrix and is calculated from free run results over a period long enough to compute meaningful circulation statistics, such as the mean and standard deviation, for all prognostic state variables.
The ROMS-4DVAR system has been applied efficiently to assimilate multi-platform oceanic observations. In particular, dense datasets from instruments such as gliders and the SWOT mission have significantly improved the correction of the model state at higher resolutions. “Thinning” of observations has been employed as an immediate step to accommodate high-frequency sampling. It reduces the number of observations within a specified radius based on the local decorrelation length scale, thereby constraining the spatial density of the retained observations. Consequently, dense measurements cannot be fully exploited. To assimilate observations beyond the mesoscale in a reasonable way, a reliable option is the multi-step strategy, which applies large-scale corrections to the background field in the first cycle and then updates the modified analysis using the small-scale innovation. Several analogous multi-scale assimilation frameworks have been implemented in prior studies [8,15,16,17,18,19,20,21]. In this study, we developed a multi-step 4DVAR system based on ROMS, which splits the high-resolution observations into different scales and corrects the model state accordingly, thus avoiding scale aliasing between the model background and the observation fields.
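In the development of the multi-step ROMS-4DVAR framework, we followed the established two-step assimilation methodology outlined in prior literature [13,15,16]. This approach assimilates large-scale features during the initial step and then adopts the modified analysis field as the background for the second step. The overall analysis increment, denoted as $\delta\mathbf{x}_a$, is further decomposed into two distinct components, namely $\delta\mathbf{x}_L$ and $\delta\mathbf{x}_S$. These components arise from the low- and high-resolution observation assimilation cycles, respectively. Thus, we have
$$\delta\mathbf{x}_a = \delta\mathbf{x}_L + \delta\mathbf{x}_S \qquad (4)$$
In the computational process, the analysis increment is estimated through Equation (3). By decomposing the high-resolution observation vector, denoted as $\mathbf{y}$, into its low- and high-frequency components (designated as $\mathbf{y}_L$ and $\mathbf{y}_S$) and separating the large- and small-scale information in the model background, specifically $\mathbf{x}_b = \mathbf{x}_b^{L} + \mathbf{x}_b^{S}$, the innovation vector ($\mathbf{d}$) for the two-step procedure can be reformulated as $\mathbf{d}_L = \mathbf{y}_L - \mathbf{H}\mathbf{x}_b^{L}$ and $\mathbf{d}_S = \mathbf{y}_S - \mathbf{H}\mathbf{x}_b^{S}$, where $\mathbf{H}$ is the observation operator. Consequently, the large-scale increment component can be written as
$$\delta\mathbf{x}_L = \mathbf{K}_L\left(\mathbf{y}_L - \mathbf{H}\mathbf{x}_b^{L}\right) \qquad (5)$$
and the small-scale increment is expressed as
$$\delta\mathbf{x}_S = \mathbf{K}_S\left(\mathbf{y}_S - \mathbf{H}\mathbf{x}_b^{S}\right) \qquad (6)$$
where $\mathbf{K}_L$ and $\mathbf{K}_S$ are the gain matrices of the two steps. Supposing the large-scale field has been corrected in the first step, the term $\mathbf{y}_L - \mathbf{H}\mathbf{x}_b^{L}$ can be estimated using $\mathbf{H}\,\delta\mathbf{x}_L$. Since $\mathbf{y}_S - \mathbf{H}\mathbf{x}_b^{S} = (\mathbf{y} - \mathbf{H}\mathbf{x}_b) - (\mathbf{y}_L - \mathbf{H}\mathbf{x}_b^{L})$, Equation (6) can be represented as
$$\delta\mathbf{x}_S = \mathbf{K}_S\left[\mathbf{y} - \mathbf{H}\left(\mathbf{x}_b + \delta\mathbf{x}_L\right)\right] \qquad (7)$$
By combining (5) and (7), the complete analysis for the multi-step ROMS-4DVAR is obtained. Notice that the increment in the first step, given by (5), is calculated using the large-scale component of the background field. In this study, we adopted the approach introduced by [18], which replaces the large-scale background component in (5) with the complete background field. Accordingly, the first-step analysis is conducted as follows:
$$\delta\mathbf{x}_L = \mathbf{K}_L\left(\mathbf{y}_L - \mathbf{H}\mathbf{x}_b\right) \qquad (8)$$
Considering the limited capability of the primary model in simulating small-scale features, the biases and uncertainties introduced by this approximation are generally manageable.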
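The two-step update described above can be illustrated with a deliberately simplified linear analogue. The sketch below is not ROMS-4DVAR: it replaces the variational minimization with a direct gain-matrix (optimal interpolation) update on a one-dimensional grid, and the function names, the Gaussian covariance model, and the tuple layout of the observation arguments are all assumptions made for this example.

```python
import numpy as np


def gaussian_cov(x, sigma, length):
    """Gaussian background error covariance on 1-D positions x (km)."""
    dist = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-0.5 * (dist / length) ** 2)


def analysis_increment(B, H, R, d):
    """Increment K d with gain K = B H^T (H B H^T + R)^{-1}."""
    S = H @ B @ H.T + R
    return B @ H.T @ np.linalg.solve(S, d)


def two_step_analysis(x_b, obs_large, obs_full, B_large, B_small):
    """Two-step (large scale, then small scale) linear analysis.

    obs_large, obs_full : tuples (y, H, R) for the first and second step
    B_large, B_small    : background covariances with long/short scales
    Returns (x_L, x_a): the first-step analysis and the final analysis.
    """
    # Step 1: large-scale correction against the complete background.
    y_L, H_L, R_L = obs_large
    x_L = x_b + analysis_increment(B_large, H_L, R_L, y_L - H_L @ x_b)
    # Step 2: the first-step analysis serves as the new background.
    y, H, R = obs_full
    x_a = x_L + analysis_increment(B_small, H, R, y - H @ x_L)
    return x_L, x_a
```

In this toy setting, calling `two_step_analysis` with `B_large = gaussian_cov(x, 1.0, 50.0)` and `B_small = gaussian_cov(x, 0.3, 25.0)` mimics decorrelation scales of 1.0 and 0.5 times a 50 km deformation radius; the standard deviations 1.0 and 0.3 are arbitrary illustrative values.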
For the two-step assimilation, the high-resolution SWOT observations were first organized into bins corresponding to model grid cells. If multiple measurements fell into the same model cell, they were averaged to generate “super observations”. In this way, the coupled features were split into spatially averaged terms and the corresponding departures. Furthermore, the background error covariance $\mathbf{B}$ was configured following the methodology outlined by Carrier et al. [17]. The static background error variance was computed from the history file of the preceding model run, and the decorrelation scales were set proportional to the Rossby deformation radius. Specifically, the typical averaged Rossby deformation scale in the domain is 50 km [26], and the proportionality constants used in the two-step analysis were 1.0 and 0.5 in our study.
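The binning into super observations can be sketched as follows, assuming a regular latitude/longitude grid described by its cell edges. The function name `super_observations` and its return layout are hypothetical; the paper does not describe its implementation at this level of detail.

```python
import numpy as np


def super_observations(lon, lat, ssh, lon_edges, lat_edges):
    """Average observations falling into the same model grid cell.

    Returns (cell_mean, departures, inside):
      cell_mean  : (ny, nx) array of super observations, NaN where empty
      departures : each retained observation minus its cell mean
      inside     : boolean mask of observations within the grid
    """
    lon, lat, ssh = (np.asarray(a, dtype=float) for a in (lon, lat, ssh))
    nx, ny = len(lon_edges) - 1, len(lat_edges) - 1
    ix = np.digitize(lon, lon_edges) - 1
    iy = np.digitize(lat, lat_edges) - 1
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    cell = iy[inside] * nx + ix[inside]            # flattened cell index
    count = np.bincount(cell, minlength=nx * ny)
    total = np.bincount(cell, weights=ssh[inside], minlength=nx * ny)
    cell_mean = np.full(nx * ny, np.nan)
    filled = count > 0
    cell_mean[filled] = total[filled] / count[filled]
    departures = ssh[inside] - cell_mean[cell]
    return cell_mean.reshape(ny, nx), departures, inside
```

The cell means play the role of the large-scale component assimilated in the first step, while the departures carry the fine-scale signal.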
3. Results
The assimilation experiment spanned from 1 January to 30 June 2012, employing both the developed multi-scale and primitive ROMS-4DVAR formulations simultaneously. The assimilation window for the two-step analysis was set at 7 days. It is worth noting that a shorter assimilation interval could potentially enhance small-scale analysis, particularly in the second step, a facet reserved for subsequent investigations.
The analysis fields produced by both the multi-scale and primitive ROMS-4DVAR methodologies were exported at 3 h intervals. The initial evaluation assessed the bias relative to the assimilated observations. Throughout the 6-month assimilation period, over 700,000 synthetic SWOT measurements were assimilated into the 4DVAR system. Let $\hat{y}_i$ denote the model outputs mapped to the observations’ locations. The error statistical indicators, including the coefficient of determination ($R^2$), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE), were computed as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$
Here, $N$ is the total number of assimilated observations, $y_i$ represents the observations, and $\bar{y}$ is the averaged observation value.
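These three indicators can be computed as in the sketch below, which uses the common textbook definitions. The function name `error_statistics` is hypothetical, and the definition of the coefficient of determination as one minus the ratio of residual to total sum of squares is an assumption about the form used in the paper.

```python
import numpy as np


def error_statistics(model, obs):
    """Return (R2, RMSE, MAPE in percent) of model values against observations."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    resid = model - obs
    r2 = 1.0 - np.sum(resid**2) / np.sum((obs - obs.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid**2))
    # MAPE is undefined where an observation equals zero.
    mape = 100.0 * np.mean(np.abs(resid / obs))
    return r2, rmse, mape
```

Because MAPE divides by the observed value, it is sensitive to SSH values close to zero, so the reference level of the SSH field affects the result.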
As illustrated in Figure 5, the model analysis states derived from both the single-scale and multi-scale assimilation approaches are consistent with the observations. However, the statistical error comparison indicates that the multi-scale strategy fits the SSH observations more closely. The coefficient of determination between the corrected model fields and the observations improves from 0.991 to 0.998, the RMSE is reduced by about half, and the MAPE decreases from 2.237% to 0.93%. Moreover, the points in the multi-scale panel are more concentrated near the diagonal line, and after fitting the two sets of scattered “model-observation” data with a polynomial basis, the fitted function of the multi-scale case lies closer to the line $y = x$. Therefore, the improvement obtained with the proposed multi-scale 4DVAR scheme is validated by the residual difference between the model state and the assimilated observations.
Moreover, an error inter-comparison between the single-scale and multi-scale ROMS-4DVAR schemes was conducted. As illustrated in Figure 6, the bias of the single-scale approach lies within [−0.1, 0.1], while that of the multi-scale 4DVAR is reduced significantly and restricted to the interval [−0.04, 0.04], indicating that the posterior analysis after the two-step correction comes closer to the target observations than the primitive 4DVAR. Because large-scale features are corrected in the first round and the subsequent analysis step primarily updates small-scale features, we examined the respective increments of the two assimilation cycles. The results, presented in Figure 7, comprise the large-scale (a) and fine-scale (b) components of the increments along with their sum (c). The two-step assimilation procedure ensures that multi-scale features in the SSH field are updated comprehensively, thereby circumventing the scale aliasing of errors mentioned in a prior study [12].
The robustness of the proposed multi-scale modification of the ROMS-4DVAR system has been demonstrated by the agreement between the corrected model state and the assimilated observations. To further assess the updated surface elevation fields obtained through multi-scale assimilation, the true state generated by the nature run is used here. The errors in surface elevation relative to the nature run from 1 January to 30 June are presented in Figure 8. First, the high-resolution SSH fields of the nature run were mapped onto the coarse grid of the free run. Subsequently, the mean absolute error (MAE) averaged over the entire region was calculated for the free run and for the single-scale and multi-scale 4DVAR assimilation scenarios. The temporal evolution of the MAE shows the beneficial impact of assimilation on model state correction for both the single-scale and multi-scale strategies. The accuracy of the SSH fields following the first assimilation step was comparable to that of the single-scale analysis and, in some instances, slightly better. Furthermore, the single-scale analysis may encounter difficulties in specific situations, as illustrated by the dashed rectangle in Figure 8. These arise when multi-scale features, aliased in high-resolution SWOT measurements, are directly injected into the model, leading to a subsequent negative correction. In contrast, for the multi-scale assimilation process, an overall improvement in accuracy was observed, with the MAE decreasing from 3.97 cm to 3.04 cm following the correction of large-scale features. A further error reduction of 1.0 cm was obtained after the assimilation of fine-scale information.
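The region-averaged MAE against the nature run can be sketched as below. For brevity the mapping from the fine grid to the coarse grid is done by block averaging with an integer factor, which is an assumption of this example: the actual grid ratio (1/36° to 1/10°) is not an integer, so a real implementation would need interpolation or conservative remapping. The function names are hypothetical.

```python
import numpy as np


def block_mean(field, factor):
    """Coarsen a 2-D field by averaging factor x factor blocks."""
    ny, nx = field.shape
    ny_c, nx_c = ny // factor, nx // factor
    trimmed = field[: ny_c * factor, : nx_c * factor]
    return trimmed.reshape(ny_c, factor, nx_c, factor).mean(axis=(1, 3))


def mae_series(truth_fine, model_coarse, factor):
    """Region-averaged MAE for each time step.

    truth_fine   : (nt, ny_f, nx_f) nature-run SSH
    model_coarse : (nt, ny_c, nx_c) free-run or analysis SSH
    """
    return np.array([
        np.nanmean(np.abs(block_mean(truth, factor) - model))
        for truth, model in zip(truth_fine, model_coarse)
    ])
```

Using `np.nanmean` lets land points marked as NaN in either field drop out of the regional average.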
Moreover, to distinguish the performance of the single- and multi-scale assimilation schemes, a spatial analysis of errors was undertaken. A representative case of horizontal SSH updates is presented in Figure 9, with the comparison between the free run and the nature run serving as a reference. In summary, the findings agree with the earlier time series analysis of the regionally averaged mean absolute error (MAE), indicating overall improvements in the SSH analysis fields across the entire region for all assimilation cases.
The efficacy of the single-scale method is comparable to that achieved in the first assimilation step of the multi-scale scheme. This suggests that a single-scale assimilation efficiently absorbs only the large-scale features present in the observations. Furthermore, the analysis after the two-step assimilation process is the most accurate, with the second assimilation cycle primarily contributing to modifications at fine scales, particularly in regions surrounding islands. For example, the analyzed SSH fields in the vicinity of Taiwan Island exhibit a substantial reduction in error. This improvement is noteworthy given the challenges arising from the intricate topography and the coupling of multi-scale ocean dynamics, including tides, the Kuroshio, and coastal currents. The diminished error in these areas underscores the efficiency of the second-step assimilation in updating smaller-scale patterns.
Table 2 provides the RMSE and MAE for each experimental group. The accuracy values for the first step of multi-scale and single-scale assimilation are equivalent, with MAE and RMSE values of approximately 5.3 cm and 8.3 cm, respectively.
A comprehensive error analysis of the ocean state resulting from the assimilation cases was undertaken, covering SSH, three-dimensional (3D) temperature (temp), salinity (salt), and the eastward and northward velocity components (u and v). To facilitate a comparison of the impacts of single-scale and multi-scale assimilation, the error reduction was quantified relative to the accuracy of the free run case. The MAE and RMSE reductions are presented in Figure 10. For the 3D temperature and salinity fields, the influence of SSH assimilation was negligible: the reductions in MAE and RMSE were minimal and occasionally negative. In line with prior studies [27,28,29], the direct assimilation of altimetry measurements proved insufficient for improving the estimation of temperature and salinity. Additional processing is typically required before SSH data assimilation, although this aspect was not the primary focus of the present study and is left for future exploration. In contrast to the water properties, the dynamic model state variables, including surface elevation and the momentum components, improved significantly. Notably, the two-step multi-scale assimilation produced better results than the single-scale approach, reinforcing the conclusions above.
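The error reduction relative to the free run can be expressed as a simple relative difference. The sketch below assumes a percentage definition, which the text does not state explicitly; the function name `error_reduction` is hypothetical.

```python
import numpy as np


def error_reduction(err_free, err_assim):
    """Percentage error reduction relative to the free run.

    Positive values mean the assimilation case is more accurate than the
    free run; negative values mean it is less accurate.
    """
    err_free = np.asarray(err_free, dtype=float)
    err_assim = np.asarray(err_assim, dtype=float)
    return 100.0 * (err_free - err_assim) / err_free
```

Applied per variable (SSH, temp, salt, u, v) and per metric (MAE, RMSE), negative entries correspond to cases where assimilation degraded the field, as noted above for temperature and salinity.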
4. Summary and Discussion
To address the imperative of assimilating high-resolution observations such as the SWOT KaRIn SSH products, this study introduced a multi-scale 4DVAR system within the ROMS framework. Building upon the validated fine-scale correcting capabilities of multi-scale assimilation schemes [15,16,17,18], we extended this approach to the ROMS-4DVAR system. Employing the OSSE framework, a twin experiment comprising a nature run and a free run case was executed. The SSH fields estimated by the nature run were validated to be comparable with HYCOM reanalysis products, affirming their reliability in simulating realistic internal tide energy and their applicability for reproducing SWOT observations.
Subsequently, synthetic SWOT SSH measurements were decomposed into spatially averaged terms and their corresponding departures, according to the resolution of the model configuration. These components, derived from dense SSH observations, were then assimilated with the proposed two-step 4DVAR scheme. The first cycle used large-scale features to correct the model fields, and the updated analysis served as the background for the second assimilation step, which targeted the fine-scale observation component. A comparison with primitive ROMS-4DVAR using a single-scale scheme revealed that the multi-scale strategy fits the SSH observations more closely. The RMSE was approximately halved, and the MAPE decreased from 2.237% to 0.93%. The two-step assimilation process ensured comprehensive multi-scale updates in the SSH field, enhancing the fine-scale features in the analysis fields. A quantitative comparison with the nature run further validated the efficiency and superiority of the multi-scale 4DVAR approach, demonstrating both accuracy enhancement and fine-scale correction. Moreover, the equivalent accuracy of the first step of the multi-scale approach and the single-scale approach suggests that the single-scale 4DVAR system predominantly captures large-scale features.
Direct assimilation of SSH has little impact on the oceanic 3D temperature and salinity fields, so it is essential to project the surface dynamic measurements onto the water property fields in advance. However, projecting dense SSH observations raises other issues that need further consideration, such as the quality of the produced 3D temperature and salinity profiles, the decomposition of the generated high-resolution subsurface pseudo-observations, and the necessary modification of the multi-scale 4DVAR approach. In addition, the updating of the spatial–temporal windows and the background error variance during the assimilation process should be considered further, especially for the fine-scale correction step of the multi-scale assimilation method. It is also worth decomposing high-resolution observations into more components with different scales in order to optimize the multi-scale features of the model results more precisely. Above all, the verification in this study was performed in the OSSE framework, and the beta products of the SWOT KaRIn L2 SSH have been available since December 2023, so the main focus in the future will be to explore the assimilation of real SWOT observation data.